Article: Robustness of statistical methods when measure is affected by ceiling and/or floor effect

I was following up on citations to a paper @f2harrell cited in a number of threads supporting his preference for ordinal methods in applied problems:

The following PLOS paper approaches the problem from a formal measurement theory perspective.

I’m still digesting the paper, but its reference section alone is an excellent historical review of the simulation studies, and of the long-running debate, comparing the performance of textbook parametric and nonparametric methods on ordinal data.

They go beyond the usual comparisons of parametric t-test variants and rank-based methods to include Bayesian t-tests, TOST (two one-sided tests) equivalence-testing procedures, and some robust frequentist techniques (i.e., the trimmed t-test).
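Here is a small illustration of why the parametric versus rank-based distinction matters once the measure is a nonlinear function of the latent trait. This is my own sketch, not code from the paper. The two "samples" are deterministic stand-ins built from normal quantiles, and `ceiling_measure` is a hypothetical strictly increasing function that flattens toward a ceiling of 0. The Mann-Whitney U statistic depends only on the ordering of the observations, so that transform leaves it unchanged. Welch's t statistic, computed on the observed scale, drops noticeably in this setup.

```python
import math
from statistics import NormalDist, mean, variance


def quantile_grid(mu, n):
    """Deterministic stand-in for a N(mu, 1) sample: the n mid-point quantiles."""
    dist = NormalDist(mu, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def welch_t(x, y):
    """Welch's t statistic for mean(y) - mean(x), using the unpooled standard error."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


def mann_whitney_u(x, y):
    """Number of (x, y) pairs with y > x, counting ties as 1/2."""
    u = 0.0
    for a in x:
        for b in y:
            if b > a:
                u += 1.0
            elif b == a:
                u += 0.5
    return u


def ceiling_measure(z):
    """Hypothetical measurement: strictly increasing, flattening toward a ceiling of 0."""
    return -math.exp(-z)


n = 200
latent_ctrl = quantile_grid(0.0, n)   # latent trait, control group
latent_trt = quantile_grid(1.0, n)    # latent trait, shifted up by 1 SD
obs_ctrl = [ceiling_measure(z) for z in latent_ctrl]
obs_trt = [ceiling_measure(z) for z in latent_trt]

t_latent, t_observed = welch_t(latent_ctrl, latent_trt), welch_t(obs_ctrl, obs_trt)
u_latent, u_observed = mann_whitney_u(latent_ctrl, latent_trt), mann_whitney_u(obs_ctrl, obs_trt)

print(f"Welch t:        latent {t_latent:.2f}, observed {t_observed:.2f}")
print(f"Mann-Whitney U: latent {u_latent:.1f}, observed {u_observed:.1f}")
```

Swapping `ceiling_measure` for any other strictly increasing function leaves `u_observed` unchanged (barring floating-point ties), whereas the t statistic depends on which transform happens to link the measure to the latent trait.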

Some conclusions from their simulations that I found interesting:

  1. > Rank-based tests performed fairly well when considered relative to other alternatives. The results however suggest that methods that utilize rank-transform perform similar to methods that utilize log or logit transform. As with other transforms, if the fit between the choice of rank-based transform and the data generating mechanism is poor, test performance degrades and main effects and interactions may not be correctly identified. The same applies to linear methods. However, across wide range of scenarios, t-test and F-test showed inferior performance and their use with data with CFE should be discouraged.

  2. > The use of modern inferential methods, that were considered in the current work, can’t be recommended in their current form. The trimmed t-test showed worst performance of all tests and was ineffective at countering CFE. The equivalence testing methods, depended on the correct data transformation and otherwise produced false rejection of alternative hypothesis when the measure was affected by CFE. On occasion, confidence intervals manifested patterns of biased inference, where the estimate became more biased and more certain as the magnitude of CFE increased. Cohen’s d was biased by CFE as well and hence its use in meta-analysis or for research planning with measures affected by CFE is problematic.

  3. > In sum, CFE describes a constellation of several phenomena, such as heterogeneous variance, strong skew or nonlinear relation between measurement and the latent trait. The measure discreteness may add to that. The overview of the robustness literature suggested that these factors are detrimental to the performance of popular inferential methods. The current study illustrated, that when these phenomena co-occur, the resulting performance loss is not just sum of its parts, but ranges from cases of biased noisy inference, in which the detrimental effects cancel out, to cases of biased inference in which the detrimental effects reinforce each other. Hence, these phenomena need to be considered in conjunction.
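Several of the phenomena in these quotes can be reproduced in a few lines. Again, this is a sketch under my own assumptions, not the authors' simulation design. Both groups have unit latent variance and a one-SD latent difference. The hypothetical `to_bounded_scale` maps the latent trait onto a 1-5 rating by shifting, rounding and clipping, so the treatment group piles up at the ceiling. On the observed scale the group variances differ, the treatment group is left-skewed, and Cohen's d no longer matches the latent value of about 1.

```python
import math
from statistics import NormalDist, mean, variance


def quantile_grid(mu, n):
    """Deterministic stand-in for a N(mu, 1) sample: the n mid-point quantiles."""
    dist = NormalDist(mu, 1.0)
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def to_bounded_scale(z, offset=4.0, low=1, high=5):
    """Hypothetical 1-5 rating: shift, round to the nearest integer, clip at floor/ceiling."""
    return min(high, max(low, math.floor(z + offset + 0.5)))


def cohens_d(x, y):
    """Cohen's d for mean(y) - mean(x), standardized by the pooled SD."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(y) - mean(x)) / math.sqrt(pooled)


def skewness(x):
    """Moment skewness m3 / m2**1.5 (population moments)."""
    m = mean(x)
    m2 = sum((v - m) ** 2 for v in x) / len(x)
    m3 = sum((v - m) ** 3 for v in x) / len(x)
    return m3 / m2 ** 1.5


n = 2000
latent_ctrl = quantile_grid(0.0, n)   # equal latent variances by construction
latent_trt = quantile_grid(1.0, n)    # latent shift of 1 SD
obs_ctrl = [to_bounded_scale(z) for z in latent_ctrl]
obs_trt = [to_bounded_scale(z) for z in latent_trt]

d_latent = cohens_d(latent_ctrl, latent_trt)
d_observed = cohens_d(obs_ctrl, obs_trt)
var_ratio = variance(obs_ctrl) / variance(obs_trt)  # heterogeneous variance from the ceiling
skew_trt = skewness(obs_trt)                          # negative: mass stacked at the top

print(f"Cohen's d: latent {d_latent:.3f}, observed {d_observed:.3f}")
print(f"observed variance ratio (control / treatment): {var_ratio:.2f}")
print(f"observed skewness, treatment group: {skew_trt:.2f}")
```

How large these distortions are depends on where the groups sit relative to the scale limits; changing `offset` is an easy way to explore that.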