Quantile regression with splines and penalization?

I’m looking for ways to estimate quantiles with (1) splines, (2) penalization and (3) the ability to print an equation. The rms package makes it easy to get (1) and (3), but I have not found a way to implement penalization:

library(dplyr)
library(rms)
set.seed(123)
n <- 100
x1 <- rnorm(n)   # x1 must exist before y is computed from it
df <- data.frame(x1 = x1,
                 y  = exp(x1 + rnorm(n)/4))

dd <- datadist(df); options(datadist='dd')
fq50 <- Rq(y ~ rcs(x1, 3), tau = 0.5, data = df)

pred.q <- Function(fq50)
sascode(pred.q)

I see that Roger Koenker’s quantreg package, on whose rq() function rms::Rq() is based, has a method = "lasso" or "scad" that “…implement the lasso penalty and Fan and Li smoothly clipped absolute deviation penalty, respectively. These methods should probably be regarded as experimental”.

In my application, I have 16 continuous predictors and n~3000.
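For reference, here is a minimal sketch of what the lasso option looks like when calling quantreg::rq() directly on a spline basis. It uses splines::ns() instead of rms::rcs(), and lambda = 1 is an arbitrary illustrative value, not a recommendation; as I read the quantreg documentation, a scalar lambda is applied to every coefficient except the intercept.

```r
library(quantreg)
library(splines)

set.seed(123)
n  <- 100
x1 <- rnorm(n)
df <- data.frame(x1 = x1, y = exp(x1 + rnorm(n) / 4))

# Median regression on a natural cubic spline basis with a lasso penalty.
# lambda = 1 is an assumed, untuned value chosen only for illustration.
fit_lasso <- rq(y ~ ns(x1, df = 3), tau = 0.5, data = df,
                method = "lasso", lambda = 1)

# Intercept plus three spline-basis coefficients
coef(fit_lasso)
```

The coefficients here are on the ns() basis, so writing out an equation would also require the knot locations used to build that basis.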

Hi Thomas - it’s wonderful to have you join datamethods. For that application I’d need a volunteer to join the GitHub project for rms and to extend the Rq function to use these new options, which I was unaware of. I don’t think it would take a lot of programming. You can fork the package and then I can merge the new changes in the code and help file into the main package source.


Thanks Frank - DataMethods is such a wonderful community. I have silently participated in many discussions and found them very useful.

I’m surprised there isn’t as large a body of work on quantile regression as there is on other methods (e.g. only a handful of packages in R). But maybe that’s me committing the cardinal sin of falling in love with a method!

Do you have any thoughts of where else to look when one needs robustness, nonlinearity, penalization and a method that yields a transportable equation (i.e. no CART and such)?

One other option might be the relatively new qgam package, which extends the penalized spline capabilities of mgcv to estimating (one or more) quantiles.
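A minimal sketch of a qgam fit for the median, assuming simulated data like that in the original post; the sample size and the basis dimension k = 10 are arbitrary choices for illustration.

```r
library(qgam)

set.seed(123)
n  <- 300
x1 <- rnorm(n)
df <- data.frame(x1 = x1, y = exp(x1 + rnorm(n) / 4))

# Penalized-spline fit of the conditional median (qu = 0.5).
# The smoothing penalty for s(x1) is selected automatically.
fit_qgam <- qgam(y ~ s(x1, k = 10), data = df, qu = 0.5)

summary(fit_qgam)
```

The returned object inherits from mgcv's gam class, so the usual plot() and predict() methods should apply.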

Thanks, John. I did consider GAMs. Do you know if one can get an equation? A cursory search seems to indicate you can’t.

“Equation” in the sense of the actual values in the predictor matrix in the fitted model and the generic basis functions for each smooth? I believe it’s possible… but it might involve some intensive digging around under the hood, so to speak. The mgcv functions that handle this are documented pretty well (e.g., here), but they are complicated. If you just need the equation for porting the model to some other software, the "lpmatrix" option (predict with type = "lpmatrix") might do the trick.
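A rough sketch of the lpmatrix idea, assuming a qgam fit on simulated data (the data and the prediction grid `nd` are made up for illustration). The linear-predictor matrix times the coefficient vector reproduces the model's predictions, which is the piece one would carry over to other software.

```r
library(qgam)

set.seed(123)
n  <- 300
x1 <- rnorm(n)
df <- data.frame(x1 = x1, y = exp(x1 + rnorm(n) / 4))

fit <- qgam(y ~ s(x1, k = 10), data = df, qu = 0.5)

# Hypothetical grid of new predictor values
nd <- data.frame(x1 = seq(-2, 2, length.out = 50))

# Basis functions evaluated at the new data (one column per coefficient)
Xp <- predict(fit, newdata = nd, type = "lpmatrix")

# Predictions rebuilt by hand: basis matrix times coefficients
eta_manual  <- drop(Xp %*% coef(fit))
eta_predict <- as.numeric(predict(fit, newdata = nd))

all.equal(unname(eta_manual), eta_predict)
```

Note that Xp is tied to the rows of `nd`: the target software either needs its own way to evaluate the spline basis, or has to interpolate from a table of basis values computed in R.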

One reason: quantile regression requires large sample sizes, as it is only as efficient as sample quantiles. For example, if there are no covariates in the model, the quantile regression estimate of the median is exactly the sample median, which has asymptotic efficiency $\frac{2}{\pi} \approx 0.64$ relative to the sample mean under a Gaussian distribution.
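A quick simulation sketch of that efficiency figure: the ratio of the sampling variance of the mean to that of the median for Gaussian data should come out near 2/π. The sample size and number of replications are arbitrary.

```r
set.seed(1)
n    <- 201     # observations per simulated sample
reps <- 5000    # number of simulated samples

# Each row is one Gaussian sample of size n
sims <- matrix(rnorm(n * reps), nrow = reps, ncol = n)

means   <- rowMeans(sims)
medians <- apply(sims, 1, median)

# Relative efficiency of the median versus the mean
rel_eff <- var(means) / var(medians)

c(simulated = rel_eff, asymptotic = 2 / pi)
```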
