Logistic Model Questions

rafdoodle1 · December 13, 2024, 6:26pm

Hello there,

Hope everyone is having a wonderful holiday! In the photo below, I am trying to fit the following logistic regression model with rms’ lrm function in R, but I seem to be running into the shown errors/warnings and I don’t know why. Does the lrm function not support survey weights yet? I am currently using the svyglm function from survey instead and that is working fine for now.

Additionally, I plan on approximating the full model with a reduced model using the stepdown approach described by Harrell and Ambler, in which predictors and interactions that contribute the least to each model’s R2 will be dropped sequentially from each respective model until removing the next predictor or interaction would decrease R2 to less than 95% of the initial model’s R2. I didn’t find anything in the course notes about this approach or code for it, so below is the R function I made for it and I am wondering if it is correct.

# Function for stepdown procedure by Harrell and Ambler
stepdown <- function(full_model, data, threshold = 0.95) {
  initial_r2 <- calculate_nagelkerke_r2(full_model, data)
  current_model <- full_model
  current_r2 <- initial_r2
  removed_terms <- c()
  
  # Get the terms in the model
  terms_in_model <- attr(terms(full_model), "term.labels")
  
  # Iteratively remove terms and check R²
  while (length(terms_in_model) > 0) {
    r2_drop <- sapply(terms_in_model, function(term) {
      reduced_model <- try(update(current_model, as.formula(paste(". ~ . -", term))), silent = TRUE)
      if (inherits(reduced_model, "try-error")) return(Inf)
      calculate_nagelkerke_r2(reduced_model, data)
    })
    
    # Identify the term with the least impact on R²
    term_to_remove <- terms_in_model[which.max(r2_drop)]
    max_r2_drop <- r2_drop[which.max(r2_drop)]
    
    # Check if the current R² after removing the term is still above the threshold
    if (max_r2_drop < threshold * initial_r2) {
      break
    } else {
      current_model <- update(current_model, as.formula(paste(". ~ . -", term_to_remove)))
      terms_in_model <- setdiff(terms_in_model, term_to_remove)
      removed_terms <- c(removed_terms, term_to_remove)
      current_r2 <- max_r2_drop
    }
  }
  
  return(current_model)
}

I would appreciate any feedback!

Thanks,
Raf

f2harrell · December 13, 2024, 6:43pm

Weights are supported for model fitting, just not bootstrapping. You have a different problem related to a similarity in the design matrix. The varclus function may help you with this. The new version of the rms package that appeared today may also help as lrm was completely re-written and a better singularity tolerance is used.

You may have to simplify your model or try the new transx option in lrm described in the lrm.fit help file for the new version.

The looping for model approximation is automated in rms if you like the “fast backward” approach. Case studies are in https://hbiostat.org/rmsc.

rafdoodle1 · December 14, 2024, 5:56am

Hello Dr. Harrell,

Thanks for the response, as well as for running this amazing course and forum! Very much appreciated! What version of R is needed to run the latest version of rms with the newest lrm function with the new transx option? I currently have R version 4.4.2.

Regarding the similarity in the design matrix, I am developing my model from synthetic data, not the actual restricted data for which the model’s design matrix could be different. I would simplify my model, but for my initial model, I would like to include all the candidate predictors and interactions I identified in the literature for my thesis.

If I continue to use the survey package, would my stepdown function I wrote be a valid replacement for the fast backward approach (i.e. valid way to perform your and Ambler’s stepdown procedure)?

Thanks,
Raf

f2harrell · December 14, 2024, 1:07pm

That version of R should work. The new version is on CRAN for Mac now, and Linux. It will probably be there for Windows on 2024-12-15.

The stepdown algorithm is probably OK. I didn’t check it in detail. Just make sure you are predicted the full model predicted values, not the raw outcomes. And I’d use ordinary R^2 to measure approximation accuracy.