Sample Size Calculation for Ordinal Logistic Regression

A retrospective study of risk factors for elevation of a serum marker. The outcome is the number of times during a specific treatment phase that this serum marker is elevated above a specified cutoff.

Proposed risk factors are nominal (gender, genetics) and continuous (age, BMI, etc…). Again, the outcome is the number of “episodes” of elevated marker.

My question is: for ordinal logistic regression, should I calculate the minimum sample size based on a “rule of thumb” (events per predictor / events per parameter)? What other options are available?


Although there is no specific discussion of ordinal outcomes in this article, it presents the principles of minimum sample size determination for binary outcomes and may be helpful:


The number of times that a marker is elevated above an always-arbitrary cutpoint fails to take close calls or irregular measurement times into account. So it can be biased (if measurement times are irregular or there are missing time periods) and inefficient (losing power). That kind of analysis is highly discouraged. Instead use a full longitudinal model on the continuous marker, choosing a model that allows for irregular measurement times if that pertains. This will greatly increase power and generalizability, allowing you to get (especially using an ordinal model) the exceedance probability estimates you originally sought. You can also better handle interruptions due to death and other events this way. Ordinal longitudinal models will handle all of this. See for example here.


I have a question for you and @f2harrell

I have been tasked with calculating the minimum required sample size for a multivariable ordinal logistic model with three outcomes. I am not aware of any guidance or methodological papers that specifically address how to do this for an ordinal model. I was therefore going to adhere to the guidance discussed in the paper you linked, and also summarised here:

My simple approach was to follow steps 1-4 for a binary outcome model, calculating the required minimal sample size for each of the three outcomes at each step. I was then going to target the highest minimal sample size obtained from any of the four steps on any of the three outcomes, my thinking being that an ordinal model will be at least as powerful as a dichotomous logistic model.

However, I run into problems in steps 3 (What sample size will produce a small required shrinkage of predictor effects?) and 4 (What sample size will produce a small optimism in apparent model fit?). Because one of my ordinal outcomes is rare, these steps produce a ridiculously high minimal sample size, upwards of 150,000 patients. This is because the calculation assumes I am dichotomously trying to predict this rare outcome alone, and that I have only this rare information to estimate the coefficients. This is of course not the case: I have plenty of information contained within the other, more common outcomes and intend to “exploit” the proportional odds assumption to “borrow” information from them.

Because the coefficients are estimated using all events, it seems intuitive to me that the minimal sample sizes calculated in steps 3 and 4 should be based on the overall proportion of individuals who experience at least one of the three possible outcomes. For this to be incorrect, the power of the ordinal logistic model would have to be less than that of a logistic model based on a dichotomisation of the same outcomes (i.e. experiencing no outcome vs. any one of the three outcomes).

Does this intuition make sense? Do you know of any papers that discuss this?


These are the calculations, if this helps to explain my concerns. The output of the estimated minimal required sample size is shown at the bottom for each of the four steps. We see that for steps 3 and 4, the estimated minimal sample size is almost an order of magnitude smaller if the proportion experiencing any of the three events is used, compared to the largest estimate produced by looking at any single outcome:

Step 1: What sample size will produce a precise estimate of the overall outcome risk or mean outcome value?

Assumed proportion experiencing an event

p_any <- 0.07
p_outp <- 0.06
p_hosp <- 0.03
p_icu <- 0.002

Targeted margin of error for estimating the overall risk of the outcome

target_margin_error_any <- 0.02
target_margin_error_outp <- 0.02
target_margin_error_hosp <- 0.01 # lower target as the proportion is low
target_margin_error_icu <- 0.005 # lower target as the proportion is very low

For any outcome

n1_any <- (1.96/target_margin_error_any)^2 * p_any * (1 - p_any)

For outpatient visit or worse

n1_outp <- (1.96/target_margin_error_outp)^2 * p_outp * (1 - p_outp)

For hospital admission or worse

n1_hosp <- (1.96/target_margin_error_hosp)^2 * p_hosp * (1 - p_hosp)

For ICU admission or death

n1_icu <- (1.96/target_margin_error_icu)^2 * p_icu * (1 - p_icu)
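As a quick cross-check of the step-1 arithmetic (in Python rather than R, purely for verification), the margin-of-error formula n = (z/d)^2 * p * (1 - p) gives:

```python
# Step-1 precision calculation: the n at which a 95% CI for a proportion p
# has half-width `margin`. Values are the ones used in the thread.
Z = 1.96  # standard normal quantile for a 95% confidence interval

def n_precision(p, margin):
    return (Z / margin) ** 2 * p * (1 - p)

n1 = {
    "any":  n_precision(0.07, 0.02),
    "outp": n_precision(0.06, 0.02),
    "hosp": n_precision(0.03, 0.01),
    "icu":  n_precision(0.002, 0.005),
}
# the hospital-admission target dominates (~1118), matching the step-1
# maximum reported at the bottom of the post
```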

Step 2: What sample size will produce predicted values that have a small mean error across all individuals?

Number of candidate predictor parameters

cand_pred <- 22

Targeted Mean Absolute Prediction Error (MAPE)

targeted_MAPE_any <- 0.03
targeted_MAPE_outp <- 0.03
targeted_MAPE_hosp <- 0.02 # lower as outcome rarer and more important
targeted_MAPE_icu <- 0.01 # lower as outcome rarer and considerably more important

For any outcome

n2_any <- exp((-0.508 + 0.259 * log(p_any) + 0.504 * log(cand_pred) - log(targeted_MAPE_any))/0.544)

For outpatient visit or worse

n2_outp <- exp((-0.508 + 0.259 * log(p_outp) + 0.504 * log(cand_pred) - log(targeted_MAPE_outp))/0.544)

For hospital admission or worse

n2_hosp <- exp((-0.508 + 0.259 * log(p_hosp) + 0.504 * log(cand_pred) - log(targeted_MAPE_hosp))/0.544)

For ICU admission or death

n2_icu <- exp((-0.508 + 0.259 * log(p_icu) + 0.504 * log(cand_pred) - log(targeted_MAPE_icu))/0.544)
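The same step-2 arithmetic (Riley et al.'s closed-form MAPE approximation) can be checked in Python; the hospital-admission outcome turns out to drive the maximum:

```python
from math import exp, log

# Riley et al.'s approximation for the sample size giving a target mean
# absolute prediction error (MAPE), with outcome proportion p and
# P candidate predictor parameters. Inputs are those used in the thread.
def n_mape(p, P, mape):
    return exp((-0.508 + 0.259 * log(p) + 0.504 * log(P) - log(mape)) / 0.544)

n2 = {
    "any":  n_mape(0.07, 22, 0.03),
    "outp": n_mape(0.06, 22, 0.03),
    "hosp": n_mape(0.03, 22, 0.02),
    "icu":  n_mape(0.002, 22, 0.01),
}
# max(n2.values()) reproduces the step-2 maximum (~1723) reported below
```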

Step 3: What sample size will produce a small required shrinkage of predictor effects?

For this step one must specify the expected uniform shrinkage factor (1 minus the acceptable amount of shrinkage)

expected_uniform_shrinkage_factor <- 0.95

and choose a conservative value of anticipated model performance (Cox-Snell R2). I will use the method described in the linked paper to estimate a conservative Cox-Snell R2 from a C-statistic, as I find this more intuitive:

library(rms)  # provides lrm()

approximate_R2 <- function(auc, prev, n = 1000000){
  # define mu as a function of the C statistic
  mu <- sqrt(2) * qnorm(auc)
  # simulate a large-sample linear predictor from two normals:
  # non-events ~ N(0, 1) and events ~ N(mu, 1)
  LP <- c(rnorm(prev * n, mean = 0, sd = 1), rnorm((1 - prev) * n, mean = mu, sd = 1))
  y  <- c(rep(0, prev * n), rep(1, (1 - prev) * n))
  # fit a logistic regression with LP as covariate; this is essentially a
  # calibration model, and the intercept and slope estimates ensure the outcome
  # proportion is accounted for without changing the C statistic
  fit <- lrm(y ~ LP)
  # maximum attainable Cox-Snell R2 for a given outcome proportion
  max_R2 <- function(prev){
    1 - (prev^prev * (1 - prev)^(1 - prev))^2
  }
  list(R2.nagelkerke = as.numeric(fit$stats['R2']),
       R2.coxsnell   = as.numeric(fit$stats['R2']) * max_R2(prev))
}


anticipated_R2_any <- approximate_R2(auc = 0.8, prev = p_any)$R2.coxsnell
anticipated_R2_outp <- approximate_R2(auc = 0.8, prev = p_outp)$R2.coxsnell
anticipated_R2_hosp <- approximate_R2(auc = 0.8, prev = p_hosp)$R2.coxsnell
anticipated_R2_icu <- approximate_R2(auc = 0.8, prev = p_icu)$R2.coxsnell

For any outcome

n3_any <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_any/expected_uniform_shrinkage_factor)))

For outpatient visit or worse

n3_outp <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_outp/expected_uniform_shrinkage_factor)))

For hospital admission or worse

n3_hosp <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_hosp/expected_uniform_shrinkage_factor)))

For ICU admission or death

n3_icu <- cand_pred / ((expected_uniform_shrinkage_factor - 1) * log(1 - (anticipated_R2_icu/expected_uniform_shrinkage_factor)))
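To see why step 3 explodes for rare outcomes, here is the shrinkage criterion n = P / ((S - 1) * ln(1 - R2cs/S)) evaluated in Python for a few hypothetical Cox-Snell R2 values (rarer outcomes have a smaller maximum, hence a smaller anticipated, Cox-Snell R2):

```python
from math import log

# Riley et al.'s shrinkage criterion: the n at which the expected uniform
# shrinkage factor equals S, given P candidate parameters and an anticipated
# Cox-Snell R^2. The R^2 values looped over are hypothetical, for illustration.
def n_shrinkage(P, S, r2cs):
    return P / ((S - 1) * log(1 - r2cs / S))

for r2 in (0.10, 0.03, 0.01, 0.003):
    print(f"R2_CS = {r2:5.3f}  ->  n = {round(n_shrinkage(22, 0.95, r2))}")
```

As the anticipated Cox-Snell R2 falls toward the tiny maximum attainable for a 0.2%-prevalence outcome, the required n climbs from a few thousand to well over 100,000, which is exactly the behaviour the calculations above exhibit.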

Step 4: What sample size will produce a small optimism in apparent model fit?

First one must calculate the maximum Cox-Snell R2, given the estimated proportion experiencing the outcome, as shown in the supplement of the linked paper:

n <- 100

ln_Lnull_any <- p_any*n * log((p_any*n)/n) + (n - p_any*n) * log(1 - ((p_any*n)/n))
ln_Lnull_outp <- p_outp*n * log((p_outp*n)/n) + (n - p_outp*n) * log(1 - ((p_outp*n)/n))
ln_Lnull_hosp <- p_hosp*n * log((p_hosp*n)/n) + (n - p_hosp*n) * log(1 - ((p_hosp*n)/n))
ln_Lnull_icu <- p_icu*n * log((p_icu*n)/n) + (n - p_icu*n) * log(1 - ((p_icu*n)/n))

maxR2cs_any <- 1 - exp((2 * ln_Lnull_any)/n)
maxR2cs_outp <- 1 - exp((2 * ln_Lnull_outp)/n)
maxR2cs_hosp <- 1 - exp((2 * ln_Lnull_hosp)/n)
maxR2cs_icu <- 1 - exp((2 * ln_Lnull_icu)/n)
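The likelihood-based expression above reduces to the closed form max R2cs = 1 - (p^p * (1-p)^(1-p))^2, which gives an easy cross-check (sketched in Python):

```python
from math import exp, log

# Max Cox-Snell R^2 via the null log-likelihood, as in the thread (E = p*n).
def max_r2cs(p, n=100):
    e = p * n
    ln_lnull = e * log(e / n) + (n - e) * log(1 - e / n)
    return 1 - exp(2 * ln_lnull / n)

# Equivalent closed form: 1 - (p^p * (1-p)^(1-p))^2.
def max_r2cs_closed(p):
    return 1 - (p ** p * (1 - p) ** (1 - p)) ** 2

# the two agree for every outcome proportion used in the thread
for p in (0.07, 0.06, 0.03, 0.002):
    assert abs(max_r2cs(p) - max_r2cs_closed(p)) < 1e-12
```

Note how small the ceiling is for the rare ICU outcome (about 0.028 at p = 0.002, versus about 0.40 at p = 0.07); this is what squeezes the anticipated R2 in steps 3 and 4.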

delta <- 0.05 # expected optimism

and then calculate the estimated optimism-corrected shrinkage factor

est_optimism_corrected_shrinkage_any <- anticipated_R2_any / (anticipated_R2_any + delta * maxR2cs_any)
est_optimism_corrected_shrinkage_outp <- anticipated_R2_outp / (anticipated_R2_outp + delta * maxR2cs_outp)
est_optimism_corrected_shrinkage_hosp <- anticipated_R2_hosp / (anticipated_R2_hosp + delta * maxR2cs_hosp)
est_optimism_corrected_shrinkage_icu <- anticipated_R2_icu / (anticipated_R2_icu + delta * maxR2cs_icu)

For any outcome

n4_any <- cand_pred / ((est_optimism_corrected_shrinkage_any - 1) * log(1 - (anticipated_R2_any/est_optimism_corrected_shrinkage_any)))

For outpatient visit or worse

n4_outp <- cand_pred / ((est_optimism_corrected_shrinkage_outp - 1) * log(1 - (anticipated_R2_outp/est_optimism_corrected_shrinkage_outp)))

For hospital admission or worse

n4_hosp <- cand_pred / ((est_optimism_corrected_shrinkage_hosp - 1) * log(1 - (anticipated_R2_hosp/est_optimism_corrected_shrinkage_hosp)))

For ICU admission or death

n4_icu <- cand_pred / ((est_optimism_corrected_shrinkage_icu - 1) * log(1 - (anticipated_R2_icu/est_optimism_corrected_shrinkage_icu)))
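To isolate step 4's sensitivity to outcome rarity, the criterion can be evaluated with hypothetical anticipated R2 values pegged at 30% of each outcome's maximum Cox-Snell R2 (Python, for illustration only; the thread's actual R2 values come from the simulation above):

```python
from math import log

# Closed-form maximum Cox-Snell R^2 for outcome proportion p.
def max_r2cs(p):
    return 1 - (p ** p * (1 - p) ** (1 - p)) ** 2

# Riley et al.'s step-4 criterion: n at which the expected optimism in
# apparent fit is `delta`, given P candidate parameters.
def n_optimism(P, r2cs, max_r2, delta=0.05):
    s = r2cs / (r2cs + delta * max_r2)   # required shrinkage factor
    return P / ((s - 1) * log(1 - r2cs / s))

for p in (0.07, 0.002):
    m = max_r2cs(p)                      # anticipated R^2 = 0.3 * m (hypothetical)
    print(p, round(n_optimism(22, 0.3 * m, m)))
```

Because both the anticipated and the maximum R2 collapse for a 0.2%-prevalence outcome, the required n inflates by more than an order of magnitude, mirroring the step-4 numbers reported below.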


max(n1_any, n1_outp, n1_hosp, n1_icu)
[1] 1117.906

max(n2_any, n2_outp, n2_hosp, n2_icu)
[1] 1722.742

max(n3_any, n3_outp, n3_hosp, n3_icu)
[1] 151392.2
n3_any
[1] 4770.586

max(n4_any, n4_outp, n4_hosp, n4_icu)
[1] 15437.67
n4_any
[1] 1047.569


Wow, a lot of good stuff here. As in the R Hmisc package's popower function, I would compute the factor k such that the effective sample size for ordinal Y is the same as that of a continuous variable with kN observations. Then use a Richard Riley-type sample size calculation for continuous Y and apply the factor 1/k to it.
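One way to sketch this suggestion (in Python, assuming the Whitehead (1993) efficiency factor that the Hmisc popower/posamsize documentation describes: an ordinal outcome with category proportions p_i carries roughly the information of k*N continuous observations, with k = 1 - sum(p_i^3)):

```python
# Effective-sample-size factor for an ordinal outcome relative to continuous Y,
# assuming k = 1 - sum(p_i^3) (Whitehead 1993, as used by Hmisc::posamsize).
def ordinal_efficiency(probs):
    assert abs(sum(probs) - 1) < 1e-9, "category proportions must sum to 1"
    return 1 - sum(p ** 3 for p in probs)

# Category proportions implied by the cumulative proportions quoted earlier
# (0.93 no event, then 0.07-0.06, 0.06-0.03, 0.03-0.002, 0.002); hypothetical.
probs = [0.93, 0.01, 0.03, 0.028, 0.002]
k = ordinal_efficiency(probs)       # ~0.20: 93% in one category costs a lot

n_continuous = 1000                 # hypothetical Riley-type n for continuous Y
n_ordinal = n_continuous / k        # inflate by 1/k for the ordinal outcome
```

With 93% of patients in the no-event category, k is only about 0.2, so the continuous-Y sample size gets multiplied by roughly 5; the heavy no-event mass, not the rare ICU category alone, is what limits the information in the ordinal outcome.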


Thank you for your suggestion. I will try to work through the documentation of popower() to understand and implement the answer.

I also asked this same question on Twitter and linked to the discussion on datamethods. I received some interesting answers that I am still trying to wrap my head around. Ben Van Calster and Maarten van Smeden pointed to this interesting paper, [2104.09282] Risk prediction models for discrete ordinal outcomes: calibration and the impact of the proportional odds assumption, which suggests that even when the proportional odds assumption holds, calibration may still be poor using a proportional odds model, and recommends using a multinomial logistic model for this reason. This does not directly answer the question of how to calculate a minimal sample size requirement for proportional odds models, but it ties into their suggestion, which was to approach the problem as if it were a multinomial logistic model. Maarten van Smeden explains that, given a multinomial logistic model, the smallest outcome category generally dictates the minimum required sample size. This I find highly intuitive, as a multinomial logistic regression model does not ‘share’ information for the parameter coefficients between the different outcome categories. I still fail to understand why this should be the case for an ordinal proportional odds logistic model.

Given my assumption that the parameters increase the log odds of the outcomes proportionally (i.e. the proportional odds assumption holds), my intuition is that I should be able to ‘borrow’ information from the more common outcome categories to help predict the risk of the rare outcome category. I do not doubt that my intuition is incorrect, but it would help me considerably if someone could attempt to explain why it is wrong – where my thought process deviates from what is actually happening.

Thinking about a multinomial model doesn’t help us in thinking about an ordinal model in this context.


Would you agree with my intuition that the minimal required sample size calculated by following the steps in the linked BMJ article (treating the proportional odds model as if it were a dichotomous logistic model of no event vs. any of the ordinal events) would be equal to or larger than the true minimal required sample size of the proportional odds model? This assumes that the minimal required sample sizes in steps 1 and 2 do not dominate.

I am still working through your suggestion of popower and trying to understand how to implement your solution.

No, translating it to a binary outcome is too conservative. That would only work if the 3rd most frequent category is infrequent and there are not many categories in total.


To clarify, when you say that translating to a binary outcome (no event vs any event) is too conservative, do you mean that the resulting minimum required sample size from that calculation would be too small or too large?


The minimum sample size would be too large.