Quality of life surveys and proportional odds models

Hello all,

I was asked by a colleague to examine a data set that looked at pre-operative frailty as a predictor of post-operative quality of life, but I am struggling to identify a suitable model due to the nature of the input variables.

The outcome is a RAND 36-item health survey questionnaire which collapses down to eight categories. For example, one category, “physical functioning”, is the mean of 10 individual questions, each scored on a three-point scale; another, “pain”, is the mean of two questions, each scored on a five-point scale. All said and done, you are left with a score for each category that can take defined values between 0 and 100.

Predictors of these values are frailty (measured as the proportion of 11 defined co-morbidities that a patient presents with), sex and age. Patients are classified as frail when they have three or more conditions (frailty index ≥ 3/11, i.e. > 0.27).

The publications I’ve seen using similar data appear not to address the underlying data or justify the model selected. My own path has arrived at using a proportional odds ordinal logistic model (rms::orm) where I treat the outcome as continuous and frailty as ordinal.

library(rms)
fit <- orm(physical_functioning ~ scored(frailty) + rcs(age, 4) + sex, data = data)

My question is, am I approaching this correctly? Any thoughts, suggestions or direction when facing this type of data would be greatly welcomed.


I think that the proportional odds model is a really good choice here. But IMHO you are not sufficiently questioning two of the premises of the research. I am convinced that neither premise is correct.

  • Some of the components of the frailty index may have been dichotomized versions of fundamental measures. The original ordinal or continuous values of these may contain more information about frailty than the entire frailty index.
  • The definition of frailty = frailty index \geq 3 was pulled out of thin air and is not justified.

You seem to not be using those premises, thank goodness, but I suggest you don’t let them go by without criticism.


Good questions by the OP. I’d like some input on this as well, as this problem of information loss also impacts formal approaches to research synthesis.

On the frailty index: I assume only the dichotomized scores are available. Is there some way of incorporating what little information remains in the reported data in a principled way?

Also, related to the question by the OP: how do you feel about constructing these scores by averaging inherently ordinal variables? What metric (aside from averages of ordinal items) could be proposed?

@f2harrell: do you have any ideas on how this information loss would relate to the evidence metrics described by Jeff Blume? I haven’t worked out all of the math yet, but it seems like it would amount to a discounted sample size adjustment based on ARE (asymptotic relative efficiency).

AFAICT, the problem of “dichotomania” would affect power. I was considering some sort of metric that takes the ratio \frac{1 - \beta}{\alpha}, looks for errors that would increase \beta or \alpha, and then discounts accordingly. The closer the \frac{1 - \beta}{\alpha} of a study or test is to 1, the less valuable it is as an evidence measure.

My inspiration is the following: E. Lehmann, Some Principles of the Theory of Testing Hypotheses

I understand from BBR that dichotomization is terrible for power. If I am doing the math right, a study originally powered at 80% that then dichotomizes can throw away as much as 80% of its pre-study power, leaving only 16% actual power: 0.8 * (1 - 0.8) = 0.16.

Besides just being terrible for power, dichotomizing continuous outcomes has a consequence that is an even bigger peeve of mine: it can give nonsensical answers.

I am currently working on a trial design where the primary outcome variable will be left ventricular ejection fraction (LVEF) measured 6 months after initiation of treatment. I have successfully lobbied the investigator to use LVEF as a continuous variable, thank goodness (ANCOVA with 6-month LVEF as the endpoint and covariate adjustment for baseline LVEF), although many of his collaborators were pushing for something like a dichotomized “LVEF recovery” (defined as >=10% increase from the baseline LVEF). If we do this, a patient who had a 5% increase is treated the same as a patient with zero improvement, while a patient who had a 10% increase is treated the same as a patient who had a 20% increase. If one treatment improved every single patient by exactly 5% while an alternative treatment improved half of patients by 10% and WORSENED half of patients by 10% (so mean improvement is zero), a dichotomized version of the outcome variable (“improvement” = >=10% increase) would suggest that the latter treatment is superior even though the ‘average’ benefit of the former is better (5% improvement versus zero).

Not only do you lose power, you can actually get completely nonsense results like this.
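For anyone who wants to see the numbers, here is a minimal base R sketch of that thought experiment. The arm size of 100 is a made-up value, and I am assuming the "10%" means an absolute change in LVEF percentage points.

```r
# Hypothetical arms of 100 patients each; values are changes in LVEF
# (assumed here to be absolute percentage points, not relative change).
n <- 100
change_a <- rep(5, n)                      # treatment A: everyone improves by exactly 5
change_b <- rep(c(10, -10), each = n / 2)  # treatment B: half improve by 10, half worsen by 10

# Continuous summary: mean change per arm
mean_a <- mean(change_a)
mean_b <- mean(change_b)

# Dichotomized summary: proportion meeting "recovery" (>= 10 point increase)
recovered_a <- mean(change_a >= 10)
recovered_b <- mean(change_b >= 10)

print(data.frame(
  arm            = c("A", "B"),
  mean_change    = c(mean_a, mean_b),
  prop_recovered = c(recovered_a, recovered_b)
))
```

The continuous summary favours A (mean change 5 versus 0) while the dichotomized summary favours B (0% versus 50% "recovered"), which is exactly the reversal described above.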


The formula isn’t so easy but that’s in the right direction. The efficiency loss is first thought of as variance ratios, which become sample size ratios in effect. And I don’t think of dichotomization hurting just one style of analysis such as second generation p-values. It affects all methods.
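A rough way to see the "variance ratio becomes a sample size ratio" idea, using only the power calculators in base R. This is a sketch under assumptions that are mine, not from the thread: two arms, a normal outcome with SD 1, a true difference of 0.5 SD, 64 patients per arm, and a cut placed midway between the two arm means.

```r
# Assumed design values (hypothetical, for illustration only)
delta <- 0.5   # true mean difference in SD units
n     <- 64    # patients per arm

# Power when the outcome is analysed as continuous (two-sample t-test)
power_cont <- power.t.test(n = n, delta = delta, sd = 1)$power

# Dichotomize at a cut midway between the arm means and compare proportions
cut    <- delta / 2
p_ctrl <- 1 - pnorm(cut, mean = 0)      # P(outcome > cut) in control arm
p_trt  <- 1 - pnorm(cut, mean = delta)  # P(outcome > cut) in treated arm
power_dich <- power.prop.test(n = n, p1 = p_ctrl, p2 = p_trt)$power

# Sample size per arm the dichotomized analysis needs to match the
# continuous analysis, and the implied relative efficiency
n_dich     <- power.prop.test(p1 = p_ctrl, p2 = p_trt, power = power_cont)$n
efficiency <- n / n_dich

print(c(power_cont = power_cont, power_dich = power_dich,
        n_dich = n_dich, efficiency = efficiency))
```

Under these assumptions the efficiency comes out close to the large-sample value of 2/pi (about 0.64) for a split at the median of a normal variable. The power lost depends on the effect size and on where the cut is placed, so a single multiplier applied to the original power will not capture it.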

Thanks for the comments. It’s reassuring to know I’m on the right track with my analysis choices (whatever about the analysis question).

@f2harrell: I only use frail/not frail for initial visualisation to get a sense of the data distributions and I intend on using the nice example offered by @ADAlthousePhD when I next meet my colleague.

As indicated in my original post, the literature appears to be awash with dubious reports of pre-operative measures predicting post-operative QoL. Early on in this project, I’m pretty sure I saw a paper where the authors reversed the outcome and the predictor in the model. I don’t have a lot of miles behind me with respect to medical studies, but the prevalence of dichotomising data in just the projects that have crossed my desk is alarming.

The initial look that assumes the patient falls off a cliff at the threshold for ‘frail’ will be misleading.

I tried looking up your reference for this; sadly it appears that the 2009 paper is not available online.

I was able to find someone who addressed this prior:

He describes the problems both Andrew and you describe, but it seems to depend on the actual effect size. I will have to study this more closely, maybe do some R simulations.

In Jeff Blume’s framework, the evidential value of a paper (that rejects) is \frac{1 - \beta_{\alpha}}{\alpha}. Both \beta and \alpha get increased by this practice, driving the likelihood ratio towards 1 and possibly making the data summary worthless.
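To make the direction of that ratio concrete, here is a small worked example. All numbers are hypothetical: a design with 80% power at \alpha = 0.05, then the same design with power reduced to 60%, then with \alpha also inflated to 0.10.

```latex
\frac{1-\beta}{\alpha} = \frac{0.80}{0.05} = 16,
\qquad
\frac{0.60}{0.05} = 12,
\qquad
\frac{0.60}{0.10} = 6 .
```

Each step moves the ratio closer to 1, i.e. a rejection becomes weaker evidence. How far a real analysis moves depends on the actual loss of power and inflation of \alpha, which these illustrative values do not estimate.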

Worth reading for anyone who does research synthesis and is concerned about the impact of the errors BBR describes:


Here is the paper


Of course loss of power is only one of several problems associated with artificial grouping. For example, under some circumstances a cut can also introduce a spurious association.

Bivariate median splits and spurious statistical significance. SE Maxwell, HD Delaney. Psychological Bulletin 113 (1), 181, 1993.
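A small simulation sketch of the spurious-association problem, in base R. The setup is my own assumption rather than a reproduction of the paper's analysis: two predictors correlated at 0.7, an outcome that depends only on the first, 200 subjects per data set and 500 replications.

```r
set.seed(1)
n_rep <- 500   # simulated data sets (hypothetical choice)
n     <- 200   # subjects per data set
rho   <- 0.7   # correlation between the two predictors

reject <- replicate(n_rep, {
  x1 <- rnorm(n)
  x2 <- rho * x1 + sqrt(1 - rho^2) * rnorm(n)  # correlated with x1
  y  <- x1 + rnorm(n)                          # y depends on x1 ONLY

  # Median-split versions of both predictors
  d1 <- as.numeric(x1 > median(x1))
  d2 <- as.numeric(x2 > median(x2))

  # p-value for the second predictor, which has no true effect
  p_cont <- summary(lm(y ~ x1 + x2))$coefficients["x2", "Pr(>|t|)"]
  p_dich <- summary(lm(y ~ d1 + d2))$coefficients["d2", "Pr(>|t|)"]

  c(continuous = p_cont < 0.05, dichotomized = p_dich < 0.05)
})

# Proportion of data sets in which the null predictor is "significant"
rates <- rowMeans(reject)
print(rates)
```

With the continuous predictors the false positive rate for x2 should sit near the nominal 5%. After the median splits it is far higher, because the dichotomized x1 no longer fully adjusts for x1 and the split x2 picks up the leftover signal.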


Thanks for the reference; looks like the author put it up on researchgate

In relation to meta-analysis, Hunter and Schmidt wrote this article on potential for correction

I came across this critique of the use of logistic regression. I don’t agree with the thesis, but the reference section is good.

From reading the various papers in this thread, the big loss of information from categorizing continuous variables comes from treating items that have order as equivalent. An ordinal analysis (i.e. converting interval or ratio scale data to ranks) loses distance among items, but still maintains order among observations. That doesn’t lose nearly as much information, and can have some robustness benefits.
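A quick simulation sketch of that contrast, in base R. The setup is assumed for illustration: two arms of 64, normal outcomes, a true shift of 0.5 SD, and a split at the pooled sample median for the dichotomized analysis.

```r
set.seed(3)
n_rep <- 1000  # simulated trials (hypothetical choice)
n     <- 64    # patients per arm
delta <- 0.5   # true shift in SD units

reject <- replicate(n_rep, {
  ctrl <- rnorm(n)
  trt  <- rnorm(n, mean = delta)
  cut  <- median(c(ctrl, trt))  # pooled median split

  c(
    # uses the actual values
    t_test = t.test(trt, ctrl)$p.value < 0.05,
    # uses ranks only: keeps order, drops distances
    wilcoxon = wilcox.test(trt, ctrl)$p.value < 0.05,
    # uses above/below the cut only: drops order within each side
    dichotomized = prop.test(c(sum(trt > cut), sum(ctrl > cut)),
                             c(n, n), correct = FALSE)$p.value < 0.05
  )
})

# Estimated power of each analysis
power_est <- rowMeans(reject)
print(power_est)
```

In this setup the rank-based test should give up very little power relative to the t-test, while the dichotomized comparison loses a good deal more.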

There’s lots I disagree with in that paper. It also forgot the fact that with a proportional odds model you can get an odds ratio without losing a significant amount of information.

Re attenuation of the correlation due to dichotomizing, Peters and van Voorhis published a correction factor in 1940 in their book Statistical Procedures and their Mathematical Bases (NY: McGraw-Hill). It was apparently soon forgotten.

Gary McLelland has a nice little slider demo of this.

EDIT: There were originally two different demos, but Gary has apparently consolidated them.
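If the demo is not to hand, the attenuation can be checked with a few lines of base R. This sketch uses the standard biserial-type factor dnorm(qnorm(p)) / sqrt(p * (1 - p)) for a normal variable split so that a proportion p falls above the cut. Whether that matches the exact form given in the 1940 book is an assumption on my part, and the correlation of 0.5 and the sample size are made-up values.

```r
set.seed(2)
n   <- 1e5   # large sample so the estimates are stable (hypothetical)
rho <- 0.5   # true correlation between x and y (hypothetical)

x <- rnorm(n)
y <- rho * x + sqrt(1 - rho^2) * rnorm(n)

# Median split of x
x_split <- as.numeric(x > median(x))

r_cont <- cor(x, y)        # correlation using the continuous x
r_dich <- cor(x_split, y)  # attenuated correlation after the split

# Attenuation factor for a normal variable split at proportion p;
# for a median split (p = 0.5) this equals sqrt(2 / pi), about 0.80
p <- 0.5
attenuation <- dnorm(qnorm(p)) / sqrt(p * (1 - p))

# Dividing by the factor approximately recovers the original correlation
r_corrected <- r_dich / attenuation

print(c(r_cont = r_cont, r_dich = r_dich,
        attenuation = attenuation, r_corrected = r_corrected))
```

The correction only undoes the average shrinkage under bivariate normality. It does not restore the precision lost by splitting.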

This discussion is very much on my mind as I wrestle with data on depression and stress in our medical students.

While I realise the perils of dichotomising depression scores, I have the problem that the scores themselves are arbitrary units. No-one can interpret, say, a four-point difference between two groups of students.

Second, dichotomising the scores using an optimal cutoff for clinical diagnosis of depression allows me to calculate prevalence rate ratios and attributable risks for each of the stressors.

I’ve opted for the solution of running duplicate analyses in the background using olr and the original scale scores, to make sure that the simplifications I have introduced in order to produce interpretable measures of effect size haven’t resulted in failure to detect important relationships.

Any reactions on the wisdom of all this? Am I doing a bad thing through a good motive?

You may well be confronting here a fatal flaw in the whole enterprise of dealing with experiences such as depressive symptoms as if they were quantities that can be tallied and otherwise subjected to the sorts of analysis that work in the physical sciences. When you find yourself destroying information in your data in order to render it more ‘interpretable’, perhaps that’s a sign?

Genuine concern about the psychological distress of your medical students would almost certainly require an approach that treats each one as a precious and unique individual.

Every person, you are precious. You are priceless. And you have a specialness. #povertytour http://ow.ly/6aT2f

— Cornel West (@CornelWest) August 23, 2011

A statistical analysis, by contrast, necessarily regards them as exchangeable units. May I ask what humanistic end(s) you wish to achieve by “wrestl[ing] with these data”?

A thoughtful question. The trouble with unique and precious individuals is that if they have nothing in common, then we cannot apply previous knowledge to help them. This would apply equally to, say, kidney disease or dyslexia.

My heroes in psychiatry were the clinicians who spent years talking to the inmates in asylums, trying to discern commonalities and patterns to mental illness, against a prevailing belief that people went mad in their own individual ways: Charcot, Breuer, Kraepelin and Freud, and the many others who laid the foundations for a more methodical approach to the understanding of mental illness.

To reassure you: our next phase in the research is to take what we have learned from the quantitative work (which was based in turn on focus group interviews) and conduct more qualitative research with students to try to understand the phenomena that we have uncovered. It’s an iterative process.

But it’s based on the idea that people have sufficient in common to be understood, in part, by generalisations. If they hadn’t, music wouldn’t work, or anything that harnesses our common humanity.
