RMS Discussions

Dear @f2harrell, how can I change the features of the prediction/effect line on a partial effect plot?

Using 'colfill' only changes the color of the confidence bands.
Is there a way to change, e.g., the color and width of the line/curve?

ggplot(Predict(fit, x),
colfill = 'red',
anova = an, pval = T, size.anova = 5) +
theme_classic()

Thank you!

You can access the information provided by Predict() and incorporate it into a standard ggplot() object that you can then manipulate like any other ggplot object. See below for pseudocode that might give you an idea of how to do this:

ggplot(
  Predict(
    fit,
    kint = 2,
    x,
    ref.zero = TRUE,
    fun = exp
  ))  +
  geom_hline(yintercept = 1, lty = 2) +
  geom_vline(xintercept = 0.7, lty = 2) +
  geom_vline(xintercept = 3.7, lty = 2) +
  geom_rug(data = df, aes(x = x, y = NULL)) +
  scale_y_continuous(breaks = c(0.1, 0.25, 0.5, 0.75, 1, 1.5, 2, 4, 8, 16, 32)) +
  scale_x_continuous(limits = c(NA, 10)) +
  coord_trans(y = "log10") +
  theme_bw() +
  theme(legend.position = "bottom")

This is excellent. Sometimes you can "+" onto a result from ggplot(Predict()) to get certain things changed, and there are many arguments to ggplot.Predict. But taking control using ggplot2 as @Elias_Eythorsson has done is often cleaner.
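For instance, one way to restyle the curve itself is to add a geom_line() layer on top of the object returned by ggplot(Predict()). This is only a sketch, assuming fit and the datadist are already set up and that the Predict data frame has its usual x and yhat columns (check names(Predict(fit, x)) for your model; linewidth requires ggplot2 >= 3.4.0, use size in older versions):

library(rms)
library(ggplot2)

p <- ggplot(Predict(fit, x), colfill = 'red')
# Overlay a restyled prediction line; x and yhat are the default
# column names in the data frame returned by Predict()
p +
  geom_line(aes(x = x, y = yhat), color = 'red', linewidth = 1.2) +
  theme_classic()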

I’d say it’s a slight mistake. Relative log odds would be a little better.

For the Cox model, it should be showing the plain linear predictor, which is centered so that the mean linear predictor over all observations is zero. For Cox, you could say that the y-axis (if not anti-logged) is always a relative log hazard since the Cox model has no intercept.

It’s often easiest to understand if you take charge by using the contrast function instead of Predict. Then you can form any differences you want, using any reference values you want, and anti-log those if you want relative effect ratios. The help file for contrast.rms has several examples. You usually have to pass the result of contrast through as.data.frame to make it work with ggplot2.

Example: get relative log odds, relative to age=50, and make the age effect apply to females if there is a sex interaction.

k <- contrast(fit, list(sex='female', age=30:70),
                          list(sex='female', age=50))
k <- as.data.frame(k)
ggplot(k) + ...
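To complete the sketch, the data frame k typically contains the varying predictor plus Contrast, Lower, and Upper columns (check names(k) for your model); anti-logging gives odds ratios:

library(ggplot2)
ggplot(k, aes(x = age, y = exp(Contrast))) +
  geom_ribbon(aes(ymin = exp(Lower), ymax = exp(Upper)), alpha = 0.2) +
  geom_line() +
  scale_y_log10() +
  labs(x = 'Age', y = 'Odds ratio relative to age 50 (females)')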

Thank you, I appreciate it a lot. My confusion is finally resolved. The left-hand figure I produced shows the log relative hazard, where "relative" means relative to the baseline hazard; the CI doesn’t collapse, due to the uncertainty in the spline estimation. The right-hand figure shows the log relative hazard, where "relative" means relative to the hazard at the reference value; the CI collapses there (since exp(f(4.12) - f(4.12)) = 1). I’ll make that clear in my future figures by labelling the y-axes accordingly.


My outcome is a normally distributed continuous variable (cognition scores). The exposure is a log-transformed continuous variable. Covariates are 1) sex, 2) race, 3) education, and 4) year of data collection.

When I fit the model using ols, I get a very large estimate for the intercept, something in the 1000s, and an equally large standard error. Estimates for the other covariates (sex, race, education) look fine.

I am wondering what could be the root cause of such a large intercept estimate and correspondingly large SE for the intercept, and how I should investigate this issue?

Thanks in advance.

Intercepts move around as needed to properly center the estimates and can be extremely far from zero if the predictors have means that are far from zero. This is expected, and we don’t use the standard error of the intercept per se in any important contrasts or predictions.
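A minimal simulated illustration of this (the variable names are made up): with a predictor whose mean is far from zero, the intercept is an extrapolation to x = 0, so its standard error is inflated; centering the predictor moves the intercept to the mean of x, shrinking the SE without changing any slope or contrast.

set.seed(1)
year <- rnorm(200, mean = 2000, sd = 5)   # predictor with mean far from zero
y <- 0.5 * year + rnorm(200)
# Intercept is the predicted value at year = 0: a long extrapolation,
# hence a large standard error
summary(lm(y ~ year))$coefficients['(Intercept)', c('Estimate', 'Std. Error')]
# Centering moves the intercept to the predicted value at the mean year;
# its SE becomes small, and the slope estimate is unchanged
summary(lm(y ~ I(year - mean(year))))$coefficients['(Intercept)', ]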


Dear @f2harrell,

I have identified that there is a significant effect modification by baseline age on the association of a genetic variant (let’s call it “snp”, a numeric variable) on risk of heart failure with preserved ejection fraction over time. The variant is more strongly associated with the outcome among those at the highest ages.

I am trying to plot this relationship using the rms package, incorporating a spline term for age and an interaction term between age and snp. I would like to plot age on the x-axis and the HR for HFpEF per 1-unit increase in "snp" on the y-axis.

My attempts have only allowed me to plot the HR for HFpEF as a function of age stratified by SNP level (0,1,2). However, this is not totally what I’d like to graph.

My code is below:

library(rms)
d <- datadist(df)
options(datadist = 'd')
attach(df)

model <- cph(Surv(chftt, hfpef) ~ snp * rcs(AGE, 4), data = df, x = TRUE, y = TRUE)

snp_pred <- Predict(model, AGE, snp, fun = exp)

plot(snp_pred, xlab = 'Age', ylab = 'Hazard Ratio', main = 'HR of SNP on hfpef across Age (Restricted Cubic Spline)')

Side note: ggplot gives better output than plot.

A very useful interaction-type plot is obtained by ggplot(snp_pred), but you may want to take total control of the calculations to plot an actual snp hazard ratio as a function of age. Type ?contrast.rms to see examples of this. Briefly, you ask contrast() to contrast two values of snp at a vector of values of age, anti-log the result and the confidence intervals, and plot with the ggplot2 package. See the line # show separate estimates by treatment and sex in the examples here.
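A hedged sketch of that contrast() approach, reusing the model fit above (the age grid is an assumption; adjust it to the observed range in your data):

# HR per 1-unit increase in snp, as a function of age
ages <- seq(40, 90, by = 1)
k <- contrast(model, list(snp = 1, AGE = ages),
                     list(snp = 0, AGE = ages))
k <- as.data.frame(k)

library(ggplot2)
ggplot(k, aes(x = AGE, y = exp(Contrast))) +
  geom_ribbon(aes(ymin = exp(Lower), ymax = exp(Upper)), alpha = 0.2) +
  geom_line() +
  labs(x = 'Age', y = 'HR for HFpEF per 1-unit increase in snp')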


Hi Frank, can I use the lrm function to fit repeated-measures ordinal outcomes (y: 3 = Above Average, 2 = Average, 1 = Below Average)? If lrm cannot do it, which R package would you recommend for fitting repeated-measures ordinal outcomes? Thanks.

The definitions of y imply some grouping of data which does not seem proper.

lrm and orm can do this if using a GEE approach (by way of rms::robcov).
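As a minimal sketch (the predictor and cluster variable names are placeholders), the working-independence fit with a cluster sandwich covariance estimate looks like:

# Fit ignoring within-subject correlation, keeping the design matrices
f <- lrm(y ~ trimester + exposure, data = df, x = TRUE, y = TRUE)
# Cluster sandwich (GEE-style) covariance estimate by subject id
f <- robcov(f, cluster = df$id)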

What is the timing of the repeats? For rapid repeats you can use a random-effects ordinal model such as the one provided by the ordinal package, although I’ve had some convergence issues with it. Random effects are also implemented as Bayesian models in the rmsb package’s blrm function.
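For the Bayesian route, a minimal sketch (variable names are placeholders; blrm uses cluster() in the formula to specify random intercepts):

library(rmsb)
# Random intercept per subject in a Bayesian proportional odds model
f <- blrm(y ~ trimester + cluster(id), data = df)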

For long-time-duration repeats a serial correlation approach such as a Markov process is more likely to work.


Frank, good question about Y; for now I will say that Y is created based on expert opinion. I am planning to check the PO assumption using the method suggested in your article, doi: 10.1002/(sici)1097-0258(19980430)17:8<909::aid-sim753>3.0.co;2-o.

Timing is every trimester, but some participants have only one observation in any one trimester, some have one every trimester (3 observations), and some have more than one within a single trimester. So all possible combinations.

Assuming that Y is not a simplification of an underlying continuous variable, it still needs to be validated for inter-rater agreement before analyzing it.


I will plan for this. Thanks Frank.

Dear All,

If I may ask another question on the topic of point-wise confidence bands for continuous interaction plots. If I understand correctly, the width of the point-wise confidence band at the extremes of the x-axis depends on how far those points are from the mean of the x values, but not necessarily on the sparsity of the data points (in survival analysis, the events) at those extreme x values. Is this correct?

And is there a better way of constructing point-wise CIs so that they depend on the number of events at the extreme ends of the x-axis? In other words, to reflect in the width of the confidence interval the fact that at some points there are not enough events (making the interval wider there). One critique of confidence bands (either point-wise or simultaneous) is that at the extreme ends of the x-axis, where we usually do not have enough events, they give a false sense of confidence, since the confidence interval at those points draws on events from the overall data.

Regards,
Samvel

This is a bit off topic compared to RMS General Regression - modeling strategy - Datamethods Discussion Forum, but briefly: you are right if linearity is assumed. If you use a spline, then confidence bands are wider where you allow shapes to change (at the knots). Widths come from the model assumptions and the distance from the x-means, and it’s not good to dictate widths and then work backwards to a model specification.

Dear prof. Harrell,
Thank you very much for this answer and sorry for the off topic question.

Samvel

I think this is another good reason to select splines over linear fits, since splines allow the confidence bands to capture the change in variance due to the sparsity of the data.

Is there any utility to centering continuous variables before applying the restricted cubic spline transformation? What about standardizing?


This topic is for rms package software questions. Please move this question to RMS Modeling Longitudinal Responses - models - Datamethods Discussion Forum and I’ll delete it here.
