Estimating variance with heteroscedastic regression models

RMazzolari001 · April 16, 2021, 6:39am

I am concerned that the variability in oxygen consumption in response to exercise might depend on the individual’s fitness: less ‘fit’ individuals might show a larger variability than more ‘fit’ individuals.
I only have one continuous predictor (maximal oxygen consumption, which defines the fitness level) and one continuous outcome (oxygen uptake in response to exercise) for about 30-35 independent observations. I don’t want to get rid of the heteroscedasticity but rather to estimate the variability in the outcome across the range of values of the predictor. I have searched a lot but haven’t found much on this topic.
Any suggestions?

f2harrell · April 17, 2021, 1:08pm

I can think of two different approaches:

1. See if a semiparametric model fits the data, i.e., whether a certain link function (logit, probit, etc.) satisfies the parallelism assumption. Parallelism of, e.g., the logit of the cumulative distribution functions of Y implies that there is a transformation f(Y) for which the dispersion is constant. A logit link gives the semiparametric proportional odds ordinal logistic model.
2. Fit a model that allows you to specify covariates for the log of $\sigma$ as well as for the mean. This is implemented beautifully as a Bayesian model by the brm fitting function in the R brms package.
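
To make the second approach concrete, here is one way such a model could be written down. This is only a sketch under assumptions not stated in the thread: $y_i$ is the oxygen uptake response, $x_i$ is maximal oxygen consumption, the errors are Gaussian, and both the mean and the log standard deviation are taken to be linear in $x_i$ (the simplest choice; smooth terms could replace either).

$$
y_i \sim \mathcal{N}(\mu_i,\ \sigma_i^2), \qquad \mu_i = \beta_0 + \beta_1 x_i, \qquad \log \sigma_i = \gamma_0 + \gamma_1 x_i
$$

Under this parameterization, $\gamma_1 < 0$ would correspond to fitter individuals being less variable. In brms, a model of this shape is specified by giving sigma its own formula, e.g. bf(y ~ x, sigma ~ x), where y and x are placeholder variable names; sigma is modeled on the log scale by default.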

JDruns · April 22, 2021, 2:13am

There’s a chapter in the 2003 textbook ‘Semiparametric Regression’ by Ruppert, Wand, and Carroll that’s dedicated to variance function estimation, and it starts with a problem very similar to yours. Their solution is to model the variance function semiparametrically, which sounds like what you want: to understand how the variance in oxygen consumption changes as a function of fitness. It might be worth checking out (it is Chapter 14, pages 261-267).
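
As a rough illustration of what estimating a variance function means, here is a minimal two-step sketch in plain Python. It is not the book's semiparametric method and not the brms model. It is a simple parametric stand-in that assumes the log-variance is linear in the predictor: fit the mean by ordinary least squares, then regress the log squared residuals on the predictor. All function names and the simulated numbers are hypothetical.

```python
import math
import random

# E[log(chi-square with 1 df)]; log squared Gaussian residuals are biased
# downward by this amount, so it is subtracted from the fitted intercept.
LOG_CHI2_MEAN = -1.2704

def ols(x, y):
    """Closed-form simple linear regression; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_log_variance(x, y):
    """Step 1: OLS for the mean. Step 2: OLS of log(residual^2) on x."""
    b0, b1 = ols(x, y)
    resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Tiny constant guards against log(0) for an exactly zero residual.
    log_r2 = [math.log(r * r + 1e-12) for r in resid]
    g0, g1 = ols(x, log_r2)
    return (b0, b1), (g0 - LOG_CHI2_MEAN, g1)

def predicted_sd(x_new, logvar_coef):
    """Estimated residual SD at x_new from the fitted log-variance line."""
    g0, g1 = logvar_coef
    return math.exp(0.5 * (g0 + g1 * x_new))

def simulate(n, seed=0):
    """Made-up data: the SD shrinks as the predictor (fitness) increases."""
    rng = random.Random(seed)
    x = [rng.uniform(30.0, 60.0) for _ in range(n)]
    y = [5.0 + 0.5 * xi + rng.gauss(0.0, math.exp(2.0 - 0.03 * xi)) for xi in x]
    return x, y

if __name__ == "__main__":
    x, y = simulate(35, seed=1)
    mean_coef, logvar_coef = fit_log_variance(x, y)
    print("mean (intercept, slope):", mean_coef)
    print("log-variance (intercept, slope):", logvar_coef)
    print("estimated SD at x = 35 and x = 55:",
          predicted_sd(35.0, logvar_coef), predicted_sd(55.0, logvar_coef))
```

With only 30-35 observations the log-variance slope from a sketch like this will be very noisy, which is one reason to prefer a joint likelihood or Bayesian fit that also quantifies the uncertainty in the variance function.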