You’ve missed the point. The fact that the model is initially stated in terms of probabilities doesn’t mean you can’t get “regular” outputs. After all, you can phrase the ordinary linear model as $\Pr(Y \leq y \mid X) = \Phi\left(\frac{y - X\beta}{\sigma}\right)$, yet from that model you still get $E(Y \mid X) = X\beta$. I have case studies in my RMS course notes using ordinal regression as a replacement for GLS.
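To make the point concrete, here is a minimal Python sketch, assuming only the formula stated above. It turns exceedance probabilities $\Pr(Y \geq y_j \mid X)$ into cell probabilities and then into a mean. It then applies that arithmetic to the normal-CDF form of the linear model, treating $Y$ as ordinal on a fine grid. The grid, the function names, and the numbers are illustrative assumptions, not code from the RMS notes. The same cell-probability arithmetic applies to other links, such as the logit.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mean_from_exceedance(y_values, exceed):
    """E(Y | X) from a cumulative probability model.

    y_values: ordered distinct outcome values y_1 < ... < y_k.
    exceed:   exceed[j] = Pr(Y >= y_j | X); exceed[0] must equal 1.
    Pr(Y = y_j | X) = Pr(Y >= y_j | X) - Pr(Y >= y_{j+1} | X), with Pr(Y >= y_{k+1} | X) = 0,
    and the mean is sum_j y_j * Pr(Y = y_j | X).
    """
    k = len(y_values)
    cell = [exceed[j] - (exceed[j + 1] if j + 1 < k else 0.0) for j in range(k)]
    return sum(y * p for y, p in zip(y_values, cell))


def probit_model_mean(xb: float, sigma: float, step: float = 0.01, width: float = 10.0) -> float:
    """Approximate E(Y | X) for Pr(Y <= y | X) = Phi((y - xb) / sigma).

    Y is treated as ordinal on a hypothetical fine grid spanning xb +/- width * sigma.
    Each interval's mass is assigned to its lower grid point, so the result sits
    about step / 2 below xb.
    """
    n = int(round(2 * width * sigma / step))
    grid = [xb - width * sigma + i * step for i in range(n + 1)]
    # Pr(Y >= y_j | X) = 1 - Phi((y_j - xb) / sigma); the lowest grid point absorbs the lower tail.
    exceed = [1.0] + [1.0 - norm_cdf((y - xb) / sigma) for y in grid[1:]]
    return mean_from_exceedance(grid, exceed)


# Three-level example: cell probabilities 0.5, 0.3, 0.2, so the mean is approximately 1.7.
print(mean_from_exceedance([1, 2, 3], [1.0, 0.5, 0.2]))
# Normal-CDF linear model with X*beta = 3.0: the mean comes out close to 3.0.
print(probit_model_mean(xb=3.0, sigma=2.0))
```

The only model-specific piece is how `exceed` is computed. Everything after that is the same bookkeeping, which is why a probability-based model can still report a conditional mean.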
Having “odds” in the name of the model says nothing about what the outcome looks like, other than that it is either discrete ordinal or continuous.
Can ordinal regression be applied when my outcome variable has directionality?
A particular case is when 0 is the reference point, and negative and positive numbers both represent extremes. For example, when looking at range of motion/flexibility of the hip, age can influence how stiff you are. Extreme numbers are better, but values close to 0 are not as good. Another case is when one particular muscle is affected, so the patient is able to rotate his hip one way (i.e., the positive direction) but not the other way (i.e., the negative direction).
In such situations I usually fit two models: one for the signed measurement and one for the absolute value. In your particular setting you’d expect the model for the signed measurement to be weak.
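The reason the signed model should be weak can be seen in a small simulation. In this hypothetical setup, the magnitude of rotation shrinks with age, but the direction (+/−) is unrelated to age. A Spearman rank correlation is used here as a quick stand-in for fitting the two ordinal models. The data-generating numbers and function names are assumptions made purely for illustration.

```python
import random


def ranks(values):
    """Average ranks (1-based), with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def pearson(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: magnitude depends on age; direction is a coin flip unrelated to age.
rng = random.Random(2024)
n = 2000
age = [rng.uniform(20, 80) for _ in range(n)]
magnitude = [40 - 0.4 * a + rng.gauss(0, 3) for a in age]
direction = [rng.choice([-1, 1]) for _ in range(n)]
signed = [d * m for d, m in zip(direction, magnitude)]
absolute = [abs(s) for s in signed]

rho_signed = spearman(age, signed)
rho_absolute = spearman(age, absolute)
print(f"signed: {rho_signed:.3f}   absolute: {rho_absolute:.3f}")
```

Under these assumptions, age is strongly related to the absolute value but shows almost no association with the signed value. The sign scatters large magnitudes to both ends of the scale.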
I think @EpiLearneR is referring to the link printed in the article itself (at the very end of the Introduction), which I imagine you won’t be able to correct.
Cross-sectional data;
N = 147;
7 categorical & 6 continuous Xs;
Y is continuous (from 5 to 66, integer values).
The goal is to estimate associations between the Xs and Y using ordinal regression (orm()). First, I fitted a full model and used the bootstrap to validate it (validate()). Then I also fitted smaller models using different data reduction strategies (e.g., redun(), varclus(), and fastbw()), always validating these models/strategies with the bootstrap.
Question: Which of these models should I interpret for making my inferences on the association between the Xs and Y (e.g., through anova(), summary(), contrast(), Predict())? The one with the best predictive performance? If so, which index from the validation table (validate()) should I use?
E.g., ρ, R², Mean $\lvert \Pr(Y \geq Y_{0.5}) - 0.5 \rvert$, etc.
From Section 4.12.2 (Developing Models for Effect Estimation), I take it that in the above case there is little need for data reduction and model validation. If that holds, should I simply use the full model for making the inferences?
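For readers unfamiliar with bootstrap validation, below is a minimal sketch of the optimism bootstrap idea. To keep it self-contained, it uses ordinary least squares and R² rather than an ordinal model and the indexes validate() reports for orm(). It is not rms's implementation, and the data and function names are hypothetical. The key detail is the refit line inside the loop. For the validation to be honest, every data-driven step (variable clustering, redundancy analysis, backward elimination, etc.) would have to be repeated there on each resample.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    xs = [[1.0] + list(r) for r in rows]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(xs, y)) for i in range(p)]
    return solve(xtx, xty)


def r_squared(beta, rows, y):
    """1 - SSE/SST for fixed coefficients on a dataset (can be negative out of sample)."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], r)) for r in rows]
    my = sum(y) / len(y)
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1.0 - sse / sst


def optimism_bootstrap(rows, y, n_boot=200, seed=1):
    """Efron-Gong optimism bootstrap for R^2; returns (apparent, optimism-corrected)."""
    rng = random.Random(seed)
    n = len(y)
    apparent = r_squared(fit_ols(rows, y), rows, y)
    optimism = 0.0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        b_rows, b_y = [rows[i] for i in idx], [y[i] for i in idx]
        beta_b = fit_ols(b_rows, b_y)  # refit the whole modeling strategy on the resample
        # Optimism: performance on the resample minus performance on the original data.
        optimism += r_squared(beta_b, b_rows, b_y) - r_squared(beta_b, rows, y)
    return apparent, apparent - optimism / n_boot


# Hypothetical data: one real predictor plus five pure-noise predictors, n = 40.
rng = random.Random(7)
rows = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(40)]
y = [r[0] + rng.gauss(0, 1) for r in rows]
apparent, corrected = optimism_bootstrap(rows, y)
print(f"apparent R^2 = {apparent:.3f}, optimism-corrected R^2 = {corrected:.3f}")
```

The corrected index estimates how the model would perform on new data from the same population. That is a predictive question, distinct from the confidence-interval coverage that matters for inference.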
There are many model performance criteria. But one of the most important ones for inference is confidence interval coverage. For that a full, pre-specified model is hard to beat. There are a few cases where choosing between two competing models and using the “winner” is OK for inference. Example: Both models contain 3 thought-to-be-especially-important covariates, one model contains 4 principal components computed on the remaining 7 variables and the other model includes all 7 variables separately.
Suppose the outcome of interest (Y) is ordinal (almost continuous, or at least many distinct values), measured at baseline and one follow-up timepoint, and roughly 15% of the outcome data are missing at follow-up (probably not MCAR, but maybe MAR). Would it be reasonable to use aregImpute and fit.mult.impute, similar to the example in RMS 15.5 to impute the missing Y at follow-up, in part based on Y at baseline and other baseline predictors? The example shows imputation of missing predictors only.
On 1. I don’t know of further work, but the referenced Stat in Med paper establishes everything I need. Some indirect evidence includes (1) what is established for the similar Cox model, even though it uses partial likelihood and not full likelihood; (2) a model without covariates is just the ECDF and properties of the ECDF are well established; (3) a model with a single binary covariate is the Wilcoxon test which is well established.
On 2, there are related references in Chapter 3 of RMS. Assuming only one follow-up time and assuming no surrogate outcome variable exists, the 15% are basically non-recoverable and imputation won’t help very much. You would have to make a big MAR assumption conditional on baseline X (including the baseline version of Y).
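The two special cases cited in point 1 can be made concrete. For an intercept-only cumulative model, the fitted $\Pr(Y \geq y_j)$ are the empirical proportions, so the logit-link intercepts are their logits. With a single binary covariate, the proportional odds score test is equivalent to the Wilcoxon test, which is built on the Mann–Whitney concordance probability. The sketch below computes only those quantities and does not fit a model. The function names and toy data are illustrative assumptions.

```python
import math
from fractions import Fraction


def empirical_exceedance(y):
    """Pr(Y >= y_j) for each distinct value y_j (the intercept-only fit), as exact fractions."""
    n = len(y)
    return {v: Fraction(sum(1 for yi in y if yi >= v), n) for v in sorted(set(y))}


def intercepts_from_exceedance(exceed):
    """Logit-link intercepts alpha_j = logit Pr(Y >= y_j); the lowest level has none (Pr = 1)."""
    return {v: math.log(float(p) / (1 - float(p))) for v, p in exceed.items() if p < 1}


def concordance(group0, group1):
    """Mann-Whitney concordance: Pr(Y1 > Y0) + 0.5 * Pr(Y1 = Y0) over all cross-group pairs."""
    score = Fraction(0)
    for a in group0:
        for b in group1:
            score += 1 if b > a else (Fraction(1, 2) if b == a else 0)
    return score / (len(group0) * len(group1))


y = [1, 2, 2, 3, 5]
exceed = empirical_exceedance(y)             # {1: 1, 2: 4/5, 3: 2/5, 5: 1/5}
alphas = intercepts_from_exceedance(exceed)  # alpha for y = 2 is log(4)
c = concordance([1, 2, 3], [2, 5, 5])        # 7 wins + 1 tie out of 9 pairs
print(exceed, alphas, c)
```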
Thank you so much for the prompt response! Regarding #2, can you recommend a better alternative? Surely an ordinal outcome in a clinical trial with some dropout is a common problem…?
In a clinical trial the efficacy analysis dataset (unlike the safety dataset) requires at least one post-randomization visit. You might call this “modified intent-to-treat” but I’m not sure. This is sometimes a reason to do longitudinal studies; the loss of the sole follow-up measurement is pretty fatal. Especially fatal is having such dropouts when double blinding is not in effect.
I am trying to model a continuous outcome which is defined as “Total of correct: sum of right answers”. This simply means that the outcome can vary between 0 (0 correct answers on the test) and 36 (the maximum score you can get on the test).
As of now, I am transforming it to a score between 0 and 1 and modeling it using logistic regression. In previous studies, people have used linear regression with the raw score. Would this be a good case for using ordinal regression? Thanks.
This is ideal for ordinal regression, with no transformation and no collapsing of Y-levels. This will handle floor and ceiling effects, bimodality, non-normality, etc.
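One way to see why no transformation is needed: a semiparametric ordinal model uses only the ordering of the distinct Y values. Any strictly increasing rescaling, such as dividing by 36, presents the model with the same data. Floor and ceiling values are simply the lowest and highest levels, with one intercept per distinct value except the lowest. The scores and function name below are hypothetical, chosen only to illustrate this.

```python
def level_pattern(y):
    """Map each observation to the index of its distinct value: all an ordinal model uses from Y."""
    levels = sorted(set(y))
    pos = {v: i for i, v in enumerate(levels)}
    return [pos[v] for v in y]


# Hypothetical test scores on the 0-36 scale, including floor (0) and ceiling (36) values.
scores = [0, 0, 3, 17, 17, 22, 35, 36, 36, 36]
proportion = [s / 36 for s in scores]   # the 0-1 rescaling described in the question
n_intercepts = len(set(scores)) - 1     # one intercept per distinct value except the lowest
print(n_intercepts, level_pattern(scores) == level_pattern(proportion))
```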