Quantile regression on zero-inflated continuous outcomes - two-part modeling approaches

mkvdp · June 4, 2020, 5:08pm

Hi there,

I have a zero-inflated continuous dependent variable that I would like to analyze with quantile regression. Approximately half of the observations are true zeros, and the remaining values vary widely and are highly right-skewed. I first attempted a multivariable quantile regression at the 0.1, 0.25, 0.5, 0.75, and 0.9 quantiles. Given the number of zeros, I was unable to obtain estimates for the 0.1 and 0.25 quantiles.

I’m wondering if I can use a two-part model: first model the probability of a value >0 using a logit or probit link, and then use quantile regression to estimate the effect of my independent variables across the distribution of the values >0. So far I’ve only found approaches to dealing with zero-inflation for count data. I found the Stata package “twopm”, which provides methods for two-part models on continuous data, but only using glm approaches (lm or Gamma) for the positive part. I know that this is possible, as I found this dissertation, which uses the approach I would like to use (although I haven’t been able to get hold of the full text). Would anyone be able to point me toward a way of fitting a two-part model with a quantile regression second part in R or Stata?

Any help would be greatly appreciated. Thank you.
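
To make the failure at the lower quantiles concrete, here is a minimal sketch in Python (not R or Stata, and not taken from any package). It assumes a point mass p0 at zero and a continuous distribution above zero; the function name `positive_part_level` and the value p0 = 0.5 are illustrative assumptions. Under those assumptions, any quantile level tau at or below p0 is exactly zero, and a level above p0 corresponds to level (tau - p0) / (1 - p0) of the positive values.

```python
def positive_part_level(tau, p0):
    """Map a quantile level tau of the full outcome Y to the matching
    quantile level of the positive part (Y given Y > 0).

    Assumes P(Y = 0) = p0 and that Y is continuous above zero.
    Returns None when the tau-th quantile of Y is exactly zero.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must be strictly between 0 and 1")
    if not 0.0 <= p0 < 1.0:
        raise ValueError("p0 must be in [0, 1)")
    if tau <= p0:
        return None  # the quantile sits inside the clump at zero
    return (tau - p0) / (1.0 - p0)


# Toy setting loosely matching the post: about half of the values are zero.
p0 = 0.5
for tau in (0.1, 0.25, 0.5, 0.75, 0.9):
    level = positive_part_level(tau, p0)
    if level is None:
        print(f"tau={tau}: quantile of Y is 0")
    else:
        print(f"tau={tau}: matches the {level:.2f} quantile of Y given Y > 0")
```

In a regression setting the zero probability would depend on the covariates, so this mapping would differ from one covariate pattern to the next. That is one reason the second-part quantile coefficients describe the positive values only, not the full outcome.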

f2harrell · June 4, 2020, 6:28pm

You can certainly do a Heckman two-part model. Quantile regression is only for continuous Y. Semiparametric ordinal models on the other hand will handle even the most extreme “clumping at zero”. See the case study on ordinal analysis of continuous Y here.
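
For orientation, the logit-link (proportional odds) form of such an ordinal model can be sketched as follows. The notation is generic and not from the thread: \alpha_y is an intercept for each distinct cutoff y, and \beta is a single coefficient vector shared by all cutoffs.

```latex
\Pr(Y \ge y \mid X) = \frac{1}{1 + \exp\left[-(\alpha_y + X\beta)\right]}
```

Under this form the clump at zero is simply the lowest level of Y, and the intercept for the smallest positive value determines \Pr(Y > 0 \mid X).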

mkvdp · June 4, 2020, 10:57pm

Thanks for your reply! I used the orm() function in your rms package (default settings) and obtained the following output:

   Coef    S.E.   Wald Z Pr(>|Z|)
x1 -0.8423 0.1987 -4.24  <0.0001
x2  0.1366 0.2509  0.54  0.5861
x3  0.0366 0.1945  0.19  0.8509
x4  0.2587 0.3470  0.75  0.4559

I couldn’t find an intuitive (at least for me) example of how to interpret the coefficients. I’m assuming that, because a logit link was used, I’m dealing with log odds, but I’m not exactly sure what the exponentiated coefficients mean with respect to my continuous variable. Would it be something like (1 - exp(-0.8423))*100, so that I could interpret the coefficient for x1 (a categorical variable) as a 57% difference compared with the reference group?

Thanks again!

f2harrell · June 4, 2020, 11:06pm

Don’t convert to %; just talk about the odds ratio for Y ≥ y for a certain fixed y. And go through the case study to see how to convert to predicted means or quantiles. But for now use exceedance probabilities and odds ratios. See the rms functions Mean, Quantile, and ExProb as used in the case study.
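
As a rough illustration of that advice, here is a minimal Python sketch of the arithmetic only; it does not use rms. The coefficient and standard error are the x1 row of the output above. The Wald-type 95% interval, the helper name `shift_exceedance`, and the reference-group probabilities are assumptions made for the example.

```python
import math

# x1 row of the orm() output quoted in the thread
coef, se = -0.8423, 0.1987

# Odds ratio for Y >= y (x1 group vs. reference), same for every cutoff y
odds_ratio = math.exp(coef)

# Wald-type 95% interval on the odds-ratio scale (normal approximation)
ci_low = math.exp(coef - 1.96 * se)
ci_high = math.exp(coef + 1.96 * se)
print(f"OR = {odds_ratio:.3f}, 95% CI {ci_low:.3f} to {ci_high:.3f}")


def shift_exceedance(p_ref, log_or):
    """Apply a log odds ratio to a reference P(Y >= y); return the new P(Y >= y)."""
    odds = p_ref / (1.0 - p_ref) * math.exp(log_or)
    return odds / (1.0 + odds)


# Hypothetical reference-group exceedance probabilities, for illustration only
for p_ref in (0.5, 0.25, 0.1):
    p_new = shift_exceedance(p_ref, coef)
    print(f"P(Y >= y): reference {p_ref:.2f} -> x1 group {p_new:.3f}")
```

The odds ratio is constant across cutoffs, but the change in probability it implies depends on the reference probability. That is why a single percentage difference does not summarize the effect well.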