A likelihood ratio test of two nested models with or without interaction could work for that. Let’s see what Frank has to say.
When approximate Wald $\chi^2$ tests are accurate enough, you get this test automatically by running anova(fit object). anova also provides separate tests for whether the interaction is nonlinear in each variable.
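A minimal sketch of the likelihood ratio comparison mentioned at the top of the thread, for a logistic model with a spline-by-spline interaction. It uses only base R, so splines::ns stands in for rms::rcs, and the data, variable names, and coefficients are simulated assumptions rather than anything from the original posts.

```r
library(splines)  # natural splines; an assumed stand-in for rms::rcs

set.seed(1)
n    <- 500
age  <- rnorm(n, 50, 10)
chol <- rnorm(n, 200, 25)
sex  <- factor(sample(c("female", "male"), n, replace = TRUE))

# Simulated outcome with no true interaction (illustration only)
lp <- -1 + 0.03 * (age - 50) + 0.01 * (chol - 200) + 0.4 * (sex == "male")
y  <- rbinom(n, 1, plogis(lp))
d  <- data.frame(y, age, chol, sex)

# Nested models: additive splines vs. splines plus their interaction
fit_add <- glm(y ~ ns(age, 3) + ns(chol, 3) + sex, family = binomial, data = d)
fit_int <- glm(y ~ ns(age, 3) * ns(chol, 3) + sex, family = binomial, data = d)

# Likelihood ratio chunk test for all interaction terms at once
lrt   <- anova(fit_add, fit_int, test = "Chisq")
print(lrt)
lr_df <- lrt$Df[2]            # d.f. of the interaction chunk (3 x 3 terms)
lr_p  <- lrt$`Pr(>Chi)`[2]    # LR test P-value
```

The single P-value covers the whole set of interaction terms, which is the "chunk test" idea: the terms are tested jointly rather than one coefficient at a time.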
And how do you do it in a Bayesian model, for example with blrm? An interaction with a spline implies k terms, each with its posterior probability…
This is a question that I’ve wondered about, since we don’t have composite assessments with Bayesian models in the same way as we have chunk tests with frequentist models. I think that with Bayes the leave-one-out approach to assessing the log likelihood is probably the way to go, as described in McElreath’s Statistical Rethinking, where he shows how to compute the probability that one model is “truer” than another.
Oops… and how do you assess that with rmsb?
Hi Dr. Harrell. I have a logistic regression model which includes 4 treatment arms and several other predictors (no interactions). I produced a nomogram based on information from this model to be used for future stratification in clinical trials. However, towards that end, we wouldn’t have information on which treatment the patient will be receiving. Is there a way to yield a “score” from a nomogram ignoring one of the predictors? Can I, for example, take the average number of “points” provided by the 4 different treatments and just add that to each new patient to yield a predicted score independent of treatment?
Thanks.
See the loo option in the help file and look at Statistical Rethinking.
I think you are confusing “nomogram” with “model development”. A nomogram is just a graphical device to present a model once it’s developed.
To answer your question: when a variable is missing, it is not appropriate to scale up the linear predictor as if that variable weren’t in the model.
Would you suggest instead building a model that ignores treatment? Is there any way to use a model containing treatment to help stratify patients for a future clinical trial (where treatment is not known)? I thought there might be, considering that no interactions are assumed in the model.
Thanks.
Not sure. It may depend on whether treatment is orthogonal to baseline characteristics (i.e., whether the trial was randomized).
Thank you very much for your reply! However, the anova(fit.object) command works well only in linear regression, not in logistic regression. For instance, I typed help(lrm) in R and found an example of yours that fits a logistic model containing the predictors age, blood.pressure, sex and cholesterol. I will use it as an example in the following.
fit1 <- lm( blood.pressure ~ rcs(age,4) + rcs(cholesterol,4)+rcs(age,4)*rcs(cholesterol,4)+sex, x=TRUE, y=TRUE)
anova(fit1)
fit2 <- lrm(y ~ rcs(age,4) + rcs(cholesterol,4)+rcs(age,4)*rcs(cholesterol,4) +sex, x=TRUE, y=TRUE)
anova(fit2)
By anova(fit1), I can get the P value for the interaction between age and cholesterol. This is the result:
Analysis of Variance Table
Response: blood.pressure
                                  Df Sum Sq Mean Sq F value Pr(>F)
rcs(age, 4)                        3   1209  402.97  1.7833 0.1487
rcs(cholesterol, 4)                3    692  230.51  1.0201 0.3829
sex                                1      5    4.51  0.0200 0.8876
rcs(age, 4):rcs(cholesterol, 4)    9   2756  306.22  1.3551 0.2042
Residuals                        980 221457  225.98
But with anova(fit2), R reported:
singular information matrix in lrm.fit (rank= 16 ). Offending variable(s): age'' * cholesterol''
Warning message:
In lrm(y ~ rcs(age, 4) + rcs(cholesterol, 4) + rcs(age, 4) * rcs(cholesterol, :
Unable to fit model using “lrm.fit”
So, what can I do to test the significance of the interaction between two splines in a logistic regression model? Thanks in advance for any advice.
Please block quote code so it displays properly.
To fix your problem use anova(fit2, tol=1e-16), as there is a matrix near-singularity. If that doesn’t work, reduce the number of knots by one.
In a repeated measures study, you recommend including baseline as a predictor. However, with this setup, I am struggling with how to set up the right contrast to compare a later measurement to the baseline measurement. For example, we may be interested in testing, within a given treatment arm, whether the biomarker measurement at week 2 differs from what it was at baseline. If baseline is on the LHS of the equation, this contrast is straightforward to set up (since I can get an estimate of the biomarker’s value at baseline and at week 2 and test whether the difference = 0), but when it’s on the RHS, I can’t figure out the right contrast. Perhaps, if time is treated as continuous, this can still be estimated by setting time = 0. However, if time is categorical (e.g., there are only 3 time points and it was preferable to model time as categorical), is there a contrast which would address that question?
Thanks.
Clarify the study design. Your question makes sense only if this is a single-arm study with no control group.
Thank you very much for your quick reply.
Let’s say a randomized clinical trial with 2 treatment arms (A & B). They measure a continuous biomarker right before treatment (baseline) and then 2 and 16 weeks after treatment. If I wanted to test within treatment A if the continuous biomarker at week 2 is different than it was at baseline, I can see how to set up a contrast with a model which includes baseline values in the outcome, but I don’t see how I can test for this with a model which doesn’t (and treats time as categorical). Does that help clarify the question?
Thanks!
You are trying to turn a parallel group study into a pre-post study. Don’t. The response variable will be different from baseline just because of regression to the mean and other non-biological effects.
Think of the key question in a parallel group study: do two patients who started at the same biomarker level end at different levels due to treatment? This is a contrast between, say, week 16 values for treatments A and B covariate adjusted for baseline biomarker level.
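The between-arm contrast described above can be sketched as an ordinary baseline-adjusted comparison at week 16. This is a simplified illustration in base R with simulated data: the single follow-up time, the linear baseline effect, and all effect sizes are assumptions made up for the example.

```r
set.seed(2)
n        <- 200
trt      <- factor(rep(c("A", "B"), each = n / 2))
baseline <- rnorm(n, 100, 15)

# Simulated week-16 biomarker: depends on baseline plus an assumed B - A effect of -5
week16 <- 30 + 0.6 * baseline - 5 * (trt == "B") + rnorm(n, 0, 10)
d <- data.frame(trt, baseline, week16)

# Baseline enters only as a covariate, never as part of the response
fit <- lm(week16 ~ trt + baseline, data = d)

# B vs. A difference at week 16 for two patients with the same baseline value
est <- coef(summary(fit))["trtB", ]
ci  <- confint(fit)["trtB", ]
print(est)
print(ci)
```

The trtB coefficient is the quantity of interest: the expected week-16 difference between arms for patients who started at the same biomarker level.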
Hi Dr. Harrell. I apologize, but I am struggling to understand the issue with using a parallel group study to assess a significant change from baseline within a treatment arm. While I understand that the main goal in a parallel group study is to compare the effect of treatment on the outcome, I don’t understand why we also can’t be interested in a secondary goal of assessing if there is a change from baseline within each arm. Why doesn’t the parallel group study have the benefit of containing a “pre-post study” within each arm in addition to the benefit of being able to compare the arms?
Also, regarding your comment “The response variable will be different from baseline just because of regression to the mean and other non-biological effects.”, wouldn’t this be an issue for any study design? What would you consider to be an appropriate way to assess if a given treatment will change the values of a continuous biomarker over time?
Thanks.
Change from baseline is an invalid outcome variable in a parallel group study. Think of it in these ways:
- regression to the mean can make changes uninteresting
- inclusion/exclusion criteria on the baseline variable can make the change from baseline arbitrary
I am missing why you would even be interested in this question. The only time pre/post calculations are done is when you have no other alternative (say in a pre-post design with no randomization, which is the weakest of all study designs). Capitalize on the fact that you have a good design, and use the baseline only for covariate adjustment.
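A small simulation can illustrate the regression-to-the-mean point. In this sketch nobody receives any treatment and each patient's true biomarker level never changes, yet enrolling only patients whose measured baseline exceeds a cutoff produces an apparent drop from baseline. The cutoff of 110 and all distribution parameters are assumptions chosen for illustration.

```r
set.seed(3)
n          <- 10000
true_level <- rnorm(n, 100, 10)              # stable underlying level per patient
baseline   <- true_level + rnorm(n, 0, 10)   # noisy measurement at baseline
followup   <- true_level + rnorm(n, 0, 10)   # noisy measurement later, no treatment

# Inclusion criterion applied to the measured baseline value
enrolled <- baseline > 110

# Mean change from baseline among enrolled patients
mean_change <- mean(followup[enrolled] - baseline[enrolled])
print(mean_change)   # clearly negative although nothing biological changed

# Without the inclusion criterion the mean change is near zero
print(mean(followup - baseline))
```

Changing the cutoff changes the size of the apparent "effect", which is the sense in which inclusion criteria make change from baseline arbitrary.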
I was catching up on the course and the materials and questions are excellent!
Quick question: during the titanic case study Dr. @f2harrell mentioned that there is an updated version, titanic5. Is it available on Encyclopedia Titanica or somewhere else? I wandered through Encyclopedia Titanica and found no link to a complete dataset.
Here is a temporary place holding a version that shows old (titanic3) and new data: https://hbiostat.org/attach/Titanic_v3_and_v5_data.xlsx courtesy of David Beltran of Honda Inc.