Sorry, looks like a bug. lrm only seems to support offset if there are other covariates in the model and you are not suppressing the intercept.
Hi Dr. Harrell.
Is there a way to have your anova.rms function print out Likelihood Ratio tests instead of Wald? This would be especially useful in the logistic regression setting.
Thanks.
I wish there were. This would have been much better, but it requires multiple model fits. The best that rms offers is lrtest() once you make two fits.
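For instance, a minimal sketch (the variable names y, x1, x2 and the data frame d are hypothetical):

require(rms)
# Fit the reduced and full models on the same data; lrtest() then gives
# the likelihood ratio chi-square, d.f., and p-value for the added term
f1 <- lrm(y ~ x1, data = d)
f2 <- lrm(y ~ x1 + x2, data = d)
lrtest(f1, f2)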
Is the “proportion \chi^2” (from the plot.anova function) still correct in the logistic regression setting if it’s Wald-based?
Would you instead recommend using AIC, which is likelihood-based, to rank predictors in the plot.anova function in order to decide how many d.f. to allocate to different variables?
Thanks.
AIC or any measure that is a function of -2 log likelihood will be better; Wald is just quicker. The proportion of \chi^2 from Wald tests is OK (anova.rms) but could be improved by instead basing it on the log likelihood.
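For reference, a short sketch of both rankings from one anova object (the fit f is hypothetical; the what options are those of plot.anova.rms):

a <- anova(f)                       # Wald chi-squares from an rms fit
plot(a, what = 'proportion chisq')  # proportion of total Wald chi-square
plot(a, what = 'aic')               # AIC-like ranking on the chi-square scale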
To clarify, is the AIC printed out in your plot.anova function based on the Wald \chi^2 or on -2 log likelihood? It seems to be on a scale that I am not used to seeing (the terms in the output sometimes have positive AIC, sometimes negative).
Thanks.
AIC is defined to be negative, but sometimes I convert it to a \chi^2 scale, i.e., \chi^2 minus 2 \times d.f. If I do that in the anova context it is only approximate, since it substitutes the Wald \chi^2 for the LR \chi^2.
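As a sketch, using the columns returned by anova.rms (f again being a hypothetical fit):

a <- anova(f)
a[, 'Chi-Square'] - 2 * a[, 'd.f.']   # Wald chi-square minus 2 x d.f., the approximate AIC scale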
Hi Dr. Harrell, I have another basic question regarding regression modelling. I built a model using variables chosen for their clinical meaning and on the basis of previously published data. None of them are significant. How would you present such a model? Is the nomogram still useful? I know reviewers will not like this model.
The main thing to concentrate on is the adequacy of the effective sample size (e.g., the number of events if Y is binary) and whether you can show that the sample size was adequate for identification of prognostic factors. Don’t delete “insignificant” variables. Also emphasize the model L.R. \chi^2 statistic and its degrees of freedom. And you may show a predictor-variable clustering dendrogram using Spearman \rho^2 as the similarity measure. The nomogram will not be very helpful.
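A sketch of the clustering step (predictor names are hypothetical; varclus is in the Hmisc package):

require(Hmisc)
v <- varclus(~ x1 + x2 + x3 + x4 + x5, data = d, similarity = 'spearman')
plot(v)   # dendrogram of predictors, using squared Spearman correlations as the similarity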
Hi Dr. Harrell. Is there a way to apply penalized regression (via pentrace) to a model fit with multiple imputation? It seems that pentrace requires specifying x = TRUE, y = TRUE in your model call, but fit.mult.impute doesn’t seem to allow for this. Is there a way around this problem?
Thanks!
It is hard to know which order of analytical steps to use so I don’t have an answer for you. This provides more reason to use a Bayesian model where penalization is more automatically a part of the whole process through the use of skeptical prior distributions (‘shrinkage priors’).
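For illustration only, a sketch with rmsb::blrm; I am assuming the priorsd argument (prior standard deviations for the non-intercept coefficients), and the variables are hypothetical:

require(rmsb)
# Skeptical (shrinkage) priors: small prior SDs pull the coefficients toward zero
f <- blrm(y ~ x1 + x2 + x3, data = d, priorsd = rep(1, 3))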
Hi Dr. Harrell. I know that you generally recommend likelihood ratio based statistics to compare models (https://www.fharrell.com/post/addvalue/). However, I am under the impression that these statistics are primarily designed for nested models. Is there a particular statistic you’d recommend for comparing non-nested models? A typical use case would be if we were exploring two potential biomarkers and wanted to choose only one. Currently, I’ve seen people compare AUCs and then calculate a p-value to test whether they are different, but I recall from your course that you made it very clear not to use the AUC for model comparison.
Any suggestions would be greatly appreciated!
Thanks!
In my book, the chapter on maximum likelihood estimation covers this. Briefly, you form a “super model” containing both biomarkers and compare each of your models to that one. From that you can compute the adequacy index: the proportion of the log likelihood explained by a submodel relative to that explained by the super model.
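A rough sketch of that calculation (biomarker and outcome names are hypothetical; 'Model L.R.' is the LR chi-square stored in an lrm fit's stats vector):

require(rms)
f1  <- lrm(y ~ b1, data = d)        # biomarker 1 alone
f2  <- lrm(y ~ b2, data = d)        # biomarker 2 alone
f12 <- lrm(y ~ b1 + b2, data = d)   # the "super model"
# Adequacy index: proportion of the super model's LR chi-square explained by each submodel
f1$stats['Model L.R.'] / f12$stats['Model L.R.']
f2$stats['Model L.R.'] / f12$stats['Model L.R.']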
Thanks. Can you (or is it appropriate to) calculate a p-value from this adequacy index to quantify the difference between the models?
You can compute p for (1+2) - (1) and (1+2) - (2). You may find that each biomarker adds ‘significant’ information to the other; if not, it’s more clear-cut which one is better.
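Continuing the hypothetical sketch above:

lrtest(f1, f12)   # does b2 add information to b1?
lrtest(f2, f12)   # does b1 add information to b2?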
Hi Dr. Harrell. Are the performance metrics given for a Bayesian logistic regression model subject to “optimism” as we find with frequentist models? For example, I would never use the apparent R^2 to summarize my model’s performance; rather, I’d apply the bootstrap or repeated CV to calculate an optimism-corrected version. Does the same have to be done for a Bayesian model? If so, do “validate” or “calibrate” work with a model fit using blrm?
Thanks.
Great questions, and someday I hope we have a central knowledge base about this. blrm does not have validate or calibrate functions, mainly because the very meaning of a Bayesian model parameter estimate is different. With Bayes you have a posterior distribution for each parameter as well as for derived parameters such as the c-index (for the latter you get an uncertainty interval measuring how well you can estimate it from the sample at hand). We are used to thinking of validation of statistical indexes and of calibration curves as validating a single set of parameter values.
Then there is the issue of what is meant by overfitting. In Bayesian modeling there is no overfitting per se, only a failure to agree on the priors. A judge who believes that a prior on \beta_3 should have been much narrower around zero will see the posterior distribution for \beta_3 from the wider prior as representing overfitting.
The Bayesian world, especially the Stan implementation of it, is favoring a quick leave-one-out cross-validation approach. As best I can tell this is on the log-likelihood scale and is used to compare the performance of competing models, not to say that a single model is OK. I hope someone can respond with more information about that.
For more discussion about this go here.
Hello,
I’m trying to fit a logistic regression model where the response variable follows a Binomial distribution rather than a Bernoulli distribution.
Using glm, the code would be:
m1 <- glm(prop ~ X1, data = mydata, family = binomial, weights = total)
In the glm function, weights is the total number of trials. The response variable prop is a vector containing the proportion of successes, and total is a vector containing the number of trials.
When I tried to implement this using lrm, weights means something else and the code throws an error.
Have you come across this before? Is there an argument that needs to be included?
Thanks in advance for any advice.
rms::lrm implements weights. If you have a bug to report, please see hbiostat.org/fh/contacts.html for how-to information. The current version on CRAN may have a bug related to weights that will be fixed in the next release, but I’m not sure.
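In the meantime, one possible workaround (my own suggestion, not part of the reply above) is to expand the grouped data to one Bernoulli row per trial so lrm can be fit without weights; this assumes mydata has the prop, total, and X1 columns from the glm call, and the names events, long, and y are made up here:

require(rms)
mydata$events <- round(mydata$prop * mydata$total)          # successes per group
long <- mydata[rep(seq_len(nrow(mydata)), mydata$total), ]  # one row per trial
long$y <- unlist(Map(function(e, n) c(rep(1L, e), rep(0L, n - e)),
                     mydata$events, mydata$total))
m2 <- lrm(y ~ X1, data = long)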
Hi Dr. Harrell. I would like to know whether the rms package can generate a p-value for the interaction between two splines. As we all know, a 3-D surface plot is a good way to present the interaction between two continuous predictor variables. However, can we do a test of the interaction between these two splines and obtain a p-value?
Any suggestions would be greatly appreciated!