This is a place for discussions arising from Frank Harrell’s Regression Modeling Strategies (RMS) full or short course and for regression modeling topics from the MSCI Biostatistics II course. The next short course will be held virtually in May 2021.
Here is the first question of the RMS 2020 short course.
Is it fair to say that the overarching objectives of regression models in general (and RMS in particular) are (1) to develop models that will make accurate predictions of responses for future observations, (2) to properly estimate and represent uncertainty, and (3) to support valid inference (whatever that actually is)? And that everything else follows from these objectives and consists of the necessary details?
If not, how can this proposition be improved?
In the session, we talked about the SNR (signal-to-noise ratio), and it was pointed out that machine learning may work best where the SNR is very high or infinite, such as in games or visual processing. This might be a qualifier to add.
In reference to the course philosophy and the perspective that models are usually the best descriptive statistics (because descriptive statistics do not work in higher dimensions), I was wondering on which side of the inferential model vs. descriptive statistics divide you’d place techniques like multidimensional scaling. Do you see value in such techniques? I’m thinking in particular of things like ordination in ecology, and of situations where the signal-to-noise ratio is very low in a high-dimensional setting (and we still need to assess/summarize the data in a reasonable way).
Could you please clarify the use of regression splines, based on the discussion in today’s lecture? Is the decision about how many knots to include for each continuous term in a regression model on the same scale as how many covariates one can include (i.e., how many degrees of freedom can be afforded)? From the lecture, it seemed that one can routinely allow at least 3 knots regardless of how many other terms are in the model.
I had a similar question today, Abraham. I assumed that when we discussed the number of ‘variables’ a particular sample size can support, this referred to the number of actual parameters (betas) in the model, including the dummy variables and nonlinear terms needed to model all of the variables.
I was wondering whether there is a minimum number of unique values a “continuous” x needs to have to be suitable for modelling with restricted cubic splines.
I think that techniques such as multidimensional scaling are very useful, especially for data reduction and for better understanding the interrelationships among predictors or among multiple outcome variables.
Adding knots is almost like adding a new variable to the model. Perhaps one could say that having a variable with 4 knots (3 d.f.) is like adding 2.5 variables to the model. If the nonlinear terms are penalized, the cost would be less. And the ability to use 3 knots will be hampered if there are many continuous variables in the model, unless the sample size is very large.
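To make the d.f. accounting concrete: a restricted cubic spline with k knots contributes k - 1 parameters, one linear term plus k - 2 nonlinear terms. Here is a minimal pure-Python sketch of the unnormalized restricted cubic spline basis as written in the RMS book; the function name rcs_basis is hypothetical, this is not rms's rcspline.eval, and rms additionally rescales the nonlinear terms (which does not change the count).

```python
def rcs_basis(x, knots):
    """Unnormalized restricted cubic spline basis evaluated at a single x.

    Returns k - 1 values for k knots: the linear term x followed by
    k - 2 nonlinear terms, so a k-knot spline uses k - 1 regression d.f.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("a restricted cubic spline needs at least 3 knots")

    def pos3(u):
        # truncated power function (u)_+^3
        return u ** 3 if u > 0 else 0.0

    denom = t[k - 1] - t[k - 2]  # t_k - t_{k-1} in 1-based notation
    basis = [float(x)]
    for j in range(k - 2):
        basis.append(
            pos3(x - t[j])
            - pos3(x - t[k - 2]) * (t[k - 1] - t[j]) / denom
            + pos3(x - t[k - 1]) * (t[k - 2] - t[j]) / denom
        )
    return basis


for k in (3, 4, 5):
    knots = list(range(1, k + 1))  # hypothetical equally spaced knots
    print(f"{k} knots -> {len(rcs_basis(2.5, knots))} d.f.")
```

The linearity constraints beyond the outer knots are what keep the cost at k - 1 d.f., compared with k + 3 for an unrestricted cubic spline with the same knots.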
Sometimes I require 10 or more distinct values. But a slightly different problem occurs if you have a lot of ties at one value, making it hard for default knot placement algorithms to work.
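A small illustration of the ties problem: quantile-based knot placement can put several knots on the same value when many observations are tied. The quantiles below are assumed to be the defaults tabulated in the RMS book for 3 to 5 knots; rms's own placement differs in details, and the names KNOT_QUANTILES and place_knots are hypothetical.

```python
import math

# Default knot quantiles for 3-5 knots, assumed here to follow the table
# in the RMS book; rms's own rcspline.eval differs in some details.
KNOT_QUANTILES = {
    3: (0.10, 0.50, 0.90),
    4: (0.05, 0.35, 0.65, 0.95),
    5: (0.05, 0.275, 0.50, 0.725, 0.95),
}


def quantile(sorted_x, p):
    """Linear-interpolation sample quantile (same rule as R's default, type 7)."""
    h = (len(sorted_x) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(sorted_x) - 1)
    return sorted_x[lo] + (h - lo) * (sorted_x[hi] - sorted_x[lo])


def place_knots(x, k=4):
    """Quantile-based knots and the number of them that are actually distinct."""
    xs = sorted(x)
    knots = [quantile(xs, p) for p in KNOT_QUANTILES[k]]
    return knots, len(set(knots))


spread_out = list(range(100))               # 100 distinct values
heavy_ties = [0] * 70 + list(range(1, 31))  # 70% of values tied at 0
print(place_knots(spread_out))  # 4 distinct knots
print(place_knots(heavy_ties))  # three knots collapse onto 0
```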
Are data reduction techniques (like multidimensional scaling), in and of themselves, modeling or descriptive statistics (of high dimensional data)?
I’d like to learn more about the problems hierarchical random-effects models have with nonlinear (e.g., logistic) regression, which Prof. Harrell referred to at the end of this afternoon’s class.
Gelman is enthusiastic about hierarchical logistic regression, and I’ve read a lot of his work on the subject, so I’d also like to know more about the limitations and what to worry about when using such models.
Good question. I tend to use them descriptively but they are kind of modeling too.
I’ll bet that there is a single best blog article of his on andrewgelman.com. If you find it and point us to it that would be great.
A lot of Gelman’s blog posts on nonlinear hierarchical modeling focus on specific details rather than the big picture, but there’s a 2018 paper by Weber, Gelman, Lee, Betancourt, Vehtari, and Racine-Poon on hierarchical nonlinear regression related to meta-analysis and drug development: http://www.stat.columbia.edu/~gelman/research/published/AOAS1122.pdf
Here is a recent blog post from Gelman saying that nonlinear hierarchical models can be handled by the stan_nlmer() function in the rstanarm package: https://statmodeling.stat.columbia.edu/2020/03/30/fit-nonlinear-regressions-in-r-using-stan_lmer/
And here’s an R Journal paper by Paul Buerkner about the brms package, which has a worked example of fitting a nonlinear hierarchical model using brms (see Example 3 on pp. 403ff).
I’ll keep looking for a great blog post by Gelman, but the links above may be good starting points.
Datamethods.org would only allow me to put 2 links in a post, so here’s a link to Buerkner’s R-journal paper that I cite in my previous comment: https://journal.r-project.org/archive/2018/RJ-2018-017/RJ-2018-017.pdf
My impression from today’s discussions is that you do not advocate dropping variables from a model or simplifying nonlinearities. But you also talk about block tests. Wouldn’t block tests be a way to see whether clusters of variables and/or nonlinearities are necessary? If a simpler model has an equivalent AIC/BIC/c-index, is well calibrated, and the lrtest does not favor the complex model, would this be enough to justify keeping the simpler model? If not, can you please clarify when to use block tests and for what purpose?
I have questions related to the rms package and today’s discussions:
My take from today is that variable selection is to be avoided if possible and taken with a grain of salt if not. rms has implemented fastbw, which performs fast backward variable selection with an AIC or p-value rule. I am not familiar with the procedure, but does this avoid the pitfalls of variable selection mentioned today? If not, what was the motivation behind its implementation?
rms also has pentrace to implement penalized regression with continuous and binary outcomes. Is there something similar for cph? And for polytomous/ordinal regression?
In the RMS book, Section 5.5.2, you describe backward selection from the ‘gold standard’ model based on R-squared. Is this implemented in the rms package, and can it be specified within the validate and calibrate functions?
Great question. We’ll soon talk in class about chunk tests with large degrees of freedom being a safe way to possibly delete a pre-specified block of variables.
fastbw implements a fast approximate (exact if doing a linear model) stepdown. It doesn’t avoid any of the pitfalls except that backwards stepdown works slightly better than forward selection.
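A rough sketch of the general idea behind a fast approximate stepdown, assuming every term has 1 d.f. and that only the full-model coefficients and their covariance matrix are available: the smallest Wald chi-square decides what to drop next, and the remaining coefficients and covariances are adjusted as if the dropped coefficient had been constrained to zero. The function name and stopping cutoffs are assumptions for illustration; this is not rms's fastbw code (which also handles multi-d.f. factors and has its own rules), and it inherits every pitfall of stepwise selection.

```python
def approx_backward_stepdown(names, beta, cov, rule="aic"):
    """Approximate backward elimination from a single full-model fit.

    rule="aic": drop a 1-d.f. term if its Wald chi-square < 2 (AIC improves);
    rule="p":   drop it if its Wald chi-square < 3.84 (P > 0.05).
    Returns (deleted names in order, dict of remaining adjusted coefficients).
    """
    cutoff = 2.0 if rule == "aic" else 3.841458820694124
    names = list(names)
    b = list(beta)
    v = [list(row) for row in cov]
    deleted = []
    while names:
        wald = [b[i] ** 2 / v[i][i] for i in range(len(b))]
        j = min(range(len(b)), key=lambda i: wald[i])
        if wald[j] >= cutoff:
            break
        # adjust the other coefficients for constraining beta_j to 0
        b = [b[i] - v[i][j] * b[j] / v[j][j] for i in range(len(b)) if i != j]
        keep = [i for i in range(len(v)) if i != j]
        v = [[v[r][c] - v[r][j] * v[j][c] / v[j][j] for c in keep] for r in keep]
        deleted.append(names.pop(j))
    return deleted, dict(zip(names, b))


# Hypothetical full-model estimates for two correlated 1-d.f. terms
deleted, remaining = approx_backward_stepdown(
    ["x1", "x2"], beta=[1.0, 0.2], cov=[[0.10, 0.05], [0.05, 0.10]]
)
print(deleted, remaining)  # x2 deleted; x1's coefficient adjusted from 1.0 to about 0.9
```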
The survival package implements some sorts of penalization for the Cox model.
rms::lrm implements penalization for binary and ordinal regression. Someone has done some work on penalized polytomous regression.
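As a rough illustration of why penalized terms cost less than their nominal d.f. (the point made earlier about penalized nonlinear terms), here is the textbook effective-d.f. formula for ridge regression in a linear model: with eigenvalues d_j of X'X for the penalized, standardized columns and penalty λ, effective d.f. = Σ d_j / (d_j + λ). The eigenvalues are hypothetical, and this is a generic ridge result, not the computation that lrm or pentrace performs.

```python
def ridge_effective_df(eigenvalues, penalty):
    """Effective d.f. of a ridge (quadratic-penalty) linear model.

    Equals the trace of X (X'X + penalty * I)^-1 X', i.e. sum d_j / (d_j + penalty),
    where d_j are the eigenvalues of X'X for the penalized columns.
    """
    if penalty < 0:
        raise ValueError("penalty must be non-negative")
    return sum(d / (d + penalty) for d in eigenvalues)


# Hypothetical eigenvalues for three penalized nonlinear spline terms
eig = [40.0, 12.0, 3.0]
for lam in (0, 1, 5, 20):
    print(lam, round(ridge_effective_df(eig, lam), 2))
```

With no penalty the three terms cost 3 d.f.; as the penalty grows, the effective cost shrinks toward zero.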
No specific code has been written to do model validation in the context of model approximation. Some of my students have identified one possible problem with model approximation.
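For the ‘gold standard’ approximation asked about above, the idea is to regress the full model’s linear predictor on subsets of the predictors and step down while the approximation R² stays high. Here is a minimal pure-Python sketch; the function names, the 0.95 default threshold, the rule of dropping whichever term costs the least R², and the example data are all assumptions for illustration, not rms code, and (as noted above) nothing here validates the approximation.

```python
def ols_r2(columns, y):
    """R^2 from least-squares regression of y on an intercept plus `columns`."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # normal equations A b = rhs, with A = X'X and rhs = X'y
    A = [[sum(X[i][r] * X[i][s] for i in range(n)) for s in range(p)] for r in range(p)]
    rhs = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    for k in range(p):  # Gaussian elimination with partial pivoting
        piv = max(range(k, p), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        rhs[k], rhs[piv] = rhs[piv], rhs[k]
        for r in range(k + 1, p):
            f = A[r][k] / A[k][k]
            for s in range(k, p):
                A[r][s] -= f * A[k][s]
            rhs[r] -= f * rhs[k]
    b = [0.0] * p
    for k in reversed(range(p)):  # back substitution
        b[k] = (rhs[k] - sum(A[k][j] * b[j] for j in range(k + 1, p))) / A[k][k]
    fitted = [sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    ybar = sum(y) / n
    sst = sum((yi - ybar) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return 1.0 - sse / sst


def approximate_model(names, columns, lp, r2_min=0.95):
    """Step down from the full set of predictors while R^2 against `lp` >= r2_min."""
    names, columns = list(names), list(columns)
    dropped = []
    while len(names) > 1:
        fits = [(ols_r2(columns[:j] + columns[j + 1:], lp), j) for j in range(len(names))]
        r2, j = max(fits)
        if r2 < r2_min:
            break
        dropped.append((names.pop(j), r2))
        columns.pop(j)
    return names, dropped


n = 50
x1 = [i / 10 for i in range(n)]
x2 = [((i * 7) % 13) / 13 for i in range(n)]
x3 = [((i * 3) % 7) / 7 for i in range(n)]
lp = [2 * a + b + 0.01 * c for a, b, c in zip(x1, x2, x3)]  # hypothetical full-model linear predictor
print(approximate_model(["x1", "x2", "x3"], [x1, x2, x3], lp, r2_min=0.999))
```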
Laura, these are good questions, and I am sure your inquiry will be of interest to many.
Frank will have a better answer, but until then, let me take a shot at it for you.
Your impressions are largely correct. Frank advises against dropping variables and making other simplifications because, when these ‘parsimonious’ models are presented, they are reported and interpreted as though the final model had been correctly divined, so the standard errors, CIs, and p-values are all too small, overstating the certainty and precision of the model. This is another instance where “phantom d.f.” come into play: the d.f. spent during model development, on the variables and parameterizations that were considered and evaluated, ‘disappear’. This leads to misrepresentation of the real degree of randomness in the (entire) process of predicting responses.
This is the essence of RMS: statistical forensics that exposes unwitting statistical misdirection in misstating the real uncertainty. These overly optimistic models tend not to validate well or to be reproducible. This stuff is subtle, but important.
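A small simulation can make phantom d.f. tangible. In this hypothetical setting, the outcome is pure noise, k candidate predictors are screened, and only the best-looking one is reported with its naive p-value. With independent candidates the chance of a "significant" finding is roughly 1 - 0.95^k rather than 0.05. The sample size, number of replications, and function names are assumptions; the p-value uses Fisher's z normal approximation.

```python
import math
import random


def corr_pvalue(x, y):
    """Two-sided P-value for H0: correlation = 0, via Fisher's z (normal approx.)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n_candidates, n=50, reps=1000, alpha=0.05, seed=1):
    """Share of pure-noise datasets in which the best of several candidate
    predictors looks 'significant' when its naive P-value is reported."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]
        best_p = min(
            corr_pvalue([rng.gauss(0, 1) for _ in range(n)], y)
            for _ in range(n_candidates)
        )
        hits += best_p < alpha
    return hits / reps


for k in (1, 5, 10):
    print(k, "candidates:", false_positive_rate(k))
```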
“Chunk tests” are used to guide decisions while minimizing the temptation to hack significance by scrutinizing individual terms. They are a more disciplined approach to making judicious, conservative modeling decisions.
I hope that is helpful.
Frank, if I have this wrong, please set us straight.
Thank you and Drew very much for all these answers. I’m looking forward to the discussions about chunk tests and more!
Regarding my question about imputing missing data in a new dataset using a fitted object imp = aregImpute(): you said that it is not possible because the imputation model should include the outcome variable. But when we are doing external validation, we do have the outcome variable. So my question is: does it make sense to impute the missing data in the validation data using the imputation model trained on the derivation data set? Or would you recommend fitting a separate imputation model to the validation data?
Would it be possible to allow the aregImpute() function to perform something like imp.newx = predict(imp, newx)?
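One way to picture what such a predict()-style step would have to do: keep the imputation model and donor pool from the derivation data and apply them to the validation rows, using the validation outcome as a predictor. This is a minimal sketch of single predictive mean matching for one variable x, with the outcome y as the only predictor in the imputation model; the function names and data are hypothetical, it is far simpler than aregImpute (which uses flexible transformations, bootstrap draws, and multiple imputations), and it only shows the mechanics, not whether this beats refitting in the validation data.

```python
def fit_line(pred, resp):
    """Least-squares intercept and slope for resp = a + b * pred."""
    n = len(pred)
    mp, mr = sum(pred) / n, sum(resp) / n
    sxy = sum((p - mp) * (r - mr) for p, r in zip(pred, resp))
    sxx = sum((p - mp) ** 2 for p in pred)
    slope = sxy / sxx
    return mr - slope * mp, slope


def pmm_impute_new(deriv_y, deriv_x, new_y, new_x):
    """Impute missing x (None) in new data by predictive mean matching, using an
    imputation model (x regressed on the outcome y) fitted on derivation data only.

    Each missing x gets the observed x of the derivation donor whose predicted
    value is closest to the new row's predicted value.
    """
    obs = [(yy, xx) for yy, xx in zip(deriv_y, deriv_x) if xx is not None]
    a, b = fit_line([yy for yy, _ in obs], [xx for _, xx in obs])
    donors = [(a + b * yy, xx) for yy, xx in obs]
    out = []
    for yy, xx in zip(new_y, new_x):
        if xx is not None:
            out.append(xx)
            continue
        pred = a + b * yy
        out.append(min(donors, key=lambda d: abs(d[0] - pred))[1])
    return out


# Hypothetical derivation and validation data (None marks a missing x)
print(pmm_impute_new([1, 2, 3, 4, 5], [2.0, 4.1, None, 8.0, 9.9], [2.2, 4.8], [None, 7.0]))
```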
I’d also like to hear more about the advantages/disadvantages of hierarchical (w/ random effects) v. the nonlinear fixed effects models described so far.