Survival Modeling with Weibull: Why am I overestimating the effect size in my simulation?

Johannes_Schwenke · March 11, 2025, 10:21am

Context

I’m an MD currently doing a PhD in Clinical Epidemiology with a focus on Clinical Trial Design. My statistical foundation isn’t the best, but I’m trying to improve. I’d really appreciate some insight on survival modeling with the Weibull distribution.

TLDR

I’m planning the analysis of an RCT and rely on simulation because I don’t trust myself. I’m creating potential outcomes in survival time, dependent on covariates for semi-simulated data (based on real patients, details below).

I simulate survival times Y_i for each patient by drawing from a Weibull distribution, whose mean is predicted for each patient based on covariates, while the shape parameter is the same for everyone. For Y^{a=1}, I increase the scale parameter by a factor x, e.g., 1.25. For every simulation, patients are randomly allocated to treatment or control and their survival time is sampled from their Weibull distribution under a = 1 and a = 0 respectively.

I then try to recover the introduced effect size (increase in expected survival time, expressed as a ratio) using brms and two Weibull models: one without covariates and one with covariates, using standardization from marginaleffects. Somehow, I always seem to overestimate the effect size by a bit and I don’t understand why.

Long Version

I’m currently planning the statistical analysis of an RCT. For planning, I’m using routinely collected data of past patients from our hospital’s registry who would have been eligible for the trial, if it had taken place at that time. Therefore, I can’t share data as of now, but will try to produce a fully simulated, reproducible example if that’s needed.

When I run some visual diagnostics, the log(survival_time) and log(cumulative_hazard) seem to be roughly linear parallel for some important predictors, so at first glance a Weibull distribution seems appropriate. (Even if it wasn’t, I don’t think it would affect my simulation, as the Weibull distribution for each patient will be the ground truth later, not their actual survival time from the registry). Note: if there are some checks I should do for continuous predictors, let me know.

As I’m not allowed to add more than 1 picture to this post, please find the album here: imgur album

Modelling

First, I fit a Weibull model with some covariates to predict the survival time of the patients in my data set. I use this model for predictions under no treatment (a = 0). I’m using brms, and I think the model is parametrized as follows. I wrote this up myself after going through the brms documentation and Stan forum, so there might be errors.

We model the log of the expected survival time as a linear function of covariates:

\log(\mu_i) = \beta_0 + \beta_1x_{i1} + \beta_2x_{i2} + ... + \beta_px_{ip} \tag{0}

where \mu_i is the expected survival time for individual i, and x_{i1}, x_{i2}, ..., x_{ip} are their covariate values.

For notational simplicity, I drop the individual-level subscript i in the subsequent equations, but all individual-specific quantities (\mu, \lambda, and S(t)) should be understood as \mu_i, \lambda_i, and S_i(t) - representing the expected survival time, scale parameter, and survival probability for a specific individual with their corresponding covariate values.

The relationship between the mean \mu and the Weibull parameters is given by:

\mu = \lambda \Gamma(1 + \frac{1}{k}) \tag{1}

where \lambda is the scale parameter and k is the shape parameter. The survival function for the Weibull distribution is:

S(t) = \exp\left(-\left(\frac{t}{\lambda}\right)^k\right) \tag{2}

To calculate survival probabilities, we need to convert the model output \log(\mu) into the scale parameter \lambda. By rearranging equation (1), we get:

\lambda = \frac{\mu}{\Gamma(1 + \frac{1}{k})} \tag{3}

Note that the shape parameter k is assumed constant and does not vary with covariates in the regression model.

Substituting equation (3) into equation (2), we obtain the survival probability at time t in terms of \mu:

S(t) = \exp\left(-\left(\frac{t}{\frac{\mu}{\Gamma(1 + \frac{1}{k})}}\right)^k\right) \tag{4}

Another quick check: When I fit the model with ECOG as the only covariate, and overlay it on the KM curves, it looks okay. There probably are better checks which I don’t know.

# model fitting
fit.surv.ecog <- brm(
  formula = survival_time | cens(censoring) ~ 1 + ecog_fstcnt,
  data = surv_df,
  family = weibull(),
  cores = 4,
  seed = 1,
  iter = 4000,
  control = list(adapt_delta = 0.85), #increased because 7 divergent transitions
  file = "models/fit_surv_a0.rds",
  file_refit = "on_change"
)

# manually calculating survival time + CrIs
# for the CI we obtain the scale parameter for every draw from the posterior, by:
# - getting all the posterior draws for the shape parameter (the same for every group)
# - joining this using draw id on each drop with the ecog conditional estimate
# - use these to calculate scale parameter for every group for every draw 
# - then obtain mean, median, and quantiles for uncertainty

shape_draws <- fit.surv.ecog |> 
  spread_draws(shape) |> 
  mutate(drawid = factor(.draw)) |> 
  select(drawid, shape)

avg_pred_ci <- avg_predictions(fit.surv.ecog, type = "response", 
                            variables = "ecog_fstcnt") |> 
  posterior_draws() |> 
  left_join(shape_draws, by = "drawid") |> 
  mutate(scale = draw/gamma(1+1/shape)) |> 
  crossing(etime) |> 
  mutate(surv_prob = exp(-(etime/scale)^shape)) |> 
  group_by(ecog_fstcnt, etime) |> 
  summarise(mean = mean(surv_prob),
            median = median(surv_prob),
            conf.lower = quantile(surv_prob, 0.025),
            conf.higher = quantile(surv_prob, 0.975),
            .groups = "drop")

Next, I fit a model to make individual level predictions of survival time. I will use this model to create potential outcomes for each patient for each simulation.

fit.surv.indiv <- brm(
  formula = survival_time | cens(censoring) ~ 1 + diagnosis_lumped + ecog_fstcnt + poly(c_reactive_protein, 2) + poly(albumin, 2) + poly(lactate_dehydrogenase, 2),
  data = surv_df,
  family = weibull(),
  cores = 4,
  seed = 1,
  iter = 4000,
  control = list(adapt_delta = 0.85),
  file = "models/fit_surv_indiv.rds",
  file_refit = "on_change"
)

fit.surv.indiv summary

> fit.surv.indiv
 Family: weibull 
  Links: mu = log; shape = identity 
Formula: survival_time | cens(censoring) ~ 1 + diagnosis_lumped + ecog_fstcnt + poly(c_reactive_protein, 2) + poly(albumin, 2) + poly(lactate_dehydrogenase, 2) 
   Data: surv_df (Number of observations: 264) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Regression Coefficients:
                                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                                          6.89      0.32     6.32     7.57 1.00     5368     4580
diagnosis_lumpedLungtumorincludingmesothelioma     0.05      0.35    -0.64     0.73 1.00     5528     5478
diagnosis_lumpedProstatecancer                     0.53      0.45    -0.34     1.43 1.00     5933     5744
diagnosis_lumpedOther                             -0.24      0.34    -0.93     0.41 1.00     5381     5242
ecog_fstcnt.L                                     -1.27      0.31    -1.90    -0.67 1.00     7101     5924
ecog_fstcnt.Q                                     -0.15      0.26    -0.66     0.39 1.00     7166     5895
ecog_fstcnt.C                                     -0.27      0.21    -0.71     0.13 1.00     6799     6001
polyc_reactive_protein21                           0.13      2.28    -4.19     4.78 1.00     5408     5334
polyc_reactive_protein22                          -1.02      1.44    -3.81     1.83 1.00     7947     5796
polyalbumin21                                      6.88      2.76     1.68    12.61 1.00     6195     5352
polyalbumin22                                      3.85      2.34    -0.43     8.79 1.00     6381     5799
polylactate_dehydrogenase21                       -3.45      1.61    -6.35    -0.01 1.00     4795     4350
polylactate_dehydrogenase22                        3.14      1.38     0.77     6.10 1.00     5928     3885

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     1.09      0.11     0.89     1.31 1.00     4435     4919

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

These are the treatment models I will use to recover the effect I’ll introduce later:

# model with only trt_group
fit.surv.trt <- brm(
  formula = survival_time | cens(censoring) ~ 1 + trt_group,
  data = surv_trt,
  family = weibull(),
  cores = 4,
  seed = 1,
  iter = 4000,
  control = list(adapt_delta = 0.85),
  file = "models/fit_surv_trt.rds",
  file_refit = "on_change"
)

# model with adjustment variables (currently all from pred model)
fit.surv.trt.adj <- brm(
  formula = survival_time | cens(censoring) ~ 1 + trt_group + 
    diagnosis_lumped + ecog_fstcnt + poly(c_reactive_protein, 2) + 
    poly(albumin, 2) + poly(lactate_dehydrogenase, 2),
  data = surv_trt,
  family = weibull(),
  cores = 4,
  seed = 1,
  iter = 4000,
  control = list(adapt_delta = 0.85),
  file = "models/fit_surv_trt_adj.rds",
  file_refit = "on_change"
)

Simulations

This is (part of) the code I use for the simulations and recovery of the treatment effect. This is inside a parallelized function, with pred_model using fit.surv.indiv from above:

Code for Simulations

# shape is assumed to be constant/the same for all patients
shape <- posterior_summary(pred_model)["shape", 1]
    
# expected value of survival time for each patient
pred <- predictions(pred_model, type = "response") |> 
  data.frame()
      
# Set seed for reproducibility
set.seed(base_seed + i)

newdata <- pred |> 
  mutate(
    shape = shape,
    # calculate scale parameter with estimate (mu) and global shape
    scale_a0 = estimate/gamma(1+1/shape),
    
    # increase scale by factor of 1.25 (or any other treatment effect)
    scale_a1 = (estimate*effect_size)/gamma(1+1/shape))|> 
  rename(pat_id = rowid) |> 
  group_by(pat_id) |>
  
  # realization of weibull random variable for each patient for each potential outcome
  mutate(
    weibull_draw_1 = rweibull(1, shape = shape, scale = scale_a1),
    weibull_draw_0 = rweibull(1, shape = shape, scale = scale_a0)) |> 
  ungroup() |> 
  
  # create treatment assignment (1:1) and choose potential outcome belonging to treatment
  mutate(
    trt_group = sample(c("A", "B"), n(), replace = TRUE),
    survival_time = case_when(
      trt_group == "A" ~ weibull_draw_0,
      trt_group == "B" ~ weibull_draw_1),
      
      # create follow-up time dependent of recruitment time (365 days fu time, 500 days recruitment)
      fu_time = fu_time+ifelse(
        is.infinite(recruitment_time),
        0,  # No additional time when recruitment_time is infinite
        round(runif(n(), min = 0, max = recruitment_time))
        ),
      event_death = case_when(
        survival_time <= fu_time ~ 1, 
        TRUE ~ 0
        ),
      survival_time = case_when(
        survival_time > fu_time ~ fu_time,
        TRUE ~ survival_time
        ),
      censoring = case_when(
        event_death == 1 ~ "none",
        TRUE ~ "right")
  )
      
# Update the treatment model with new data
updated_fit <- update(
  trt_model,
  newdata = newdata,
  seed = base_seed,
  cores = 1
)
      
# Calculate comparisons
comparisons <- avg_comparisons(
  updated_fit,
  variables = "trt_group",
  type = "response",
  comparison = "ratio"
  ) |> 
  data.frame()

Given that the ground truth for each patient is a Weibull distribution, I’d assume that a Weibull model should accurately recover the effect size, especially the model which includes all predictive covariates. However, I seem to overestimate this slightly.

For 200 simulations with effect size (ratio of the expected survival times of the groups) 1.25 I get on average:

# With heavy right censoring (about 70%)
> mean(results_adj$estimate)
[1] 1.298035
> mean(results_unadj$estimate)
[1] 1.285314

# With infinite follow-up
> mean(results_adj_inf$estimate)
[1] 1.276131
> mean(results_unadj_inf$estimate)
[1] 1.323146

Red dashed line at 1.25 = truth.

image_1741687113487_0

For the null (only 100 iterations) these are the results:

# With heavy right censoring (about 70%)
> mean(results_adj_null$estimate)
[1] 1.016193
> mean(results_unadj_null$estimate)
[1] 1.046222

# With infinite follow-up
> mean(results_adj_null_inf$estimate)
[1] 1.020453
> mean(results_unadj_null_inf$estimate)
[1] 1.074315

Is this expected behavior? Am I making any mistakes? Should I approach the planning entirely differently? Thanks a lot for your input! If I should provide any additional info, let me know.

davidcnorrismd · March 11, 2025, 10:45am

What are your Monte Carlo standard errors (MCSE’s)?

Johannes_Schwenke · March 11, 2025, 11:35am

For one iteration with 70% right censoring these are as follows.
I haven’t extracted the MCSE for all the iterations, but could of course implement that, if needed. Tbh, not yet super comfortable on deciding when the MCSE is good enough, for the treatment parameter I thought it looked fine (?). FWIW, I tried a couple of seeds and am always overestimating.

bayestestR::mcse(fit.surv.trt)

#      Parameter        MCSE
# 1  b_Intercept 0.004562840
# 2 b_trt_groupB 0.004673126

bayestestR::mcse(fit.surv.trt.adj)
#                                           Parameter        MCSE
# 1                                       b_Intercept 0.008114060
# 2                                      b_trt_groupB 0.003297236
# 3  b_diagnosis_lumpedLungtumorincludingmesothelioma 0.008185067
# 4                  b_diagnosis_lumpedProstatecancer 0.009864573
# 5                           b_diagnosis_lumpedOther 0.008383347
# 6                                   b_ecog_fstcnt.L 0.005113869
# 7                                   b_ecog_fstcnt.Q 0.004314383
# 8                                   b_ecog_fstcnt.C 0.003486006
# 9                        b_polyc_reactive_protein21 0.037594315
# 10                       b_polyc_reactive_protein22 0.023718682
# 11                                  b_polyalbumin21 0.041053805
# 12                                  b_polyalbumin22 0.026856403
# 13                    b_polylactate_dehydrogenase21 0.028984758
# 14                    b_polylactate_dehydrogenase22 0.020225976

Model output below

Summary

Adjusted

Family: weibull 
  Links: mu = log; shape = identity 
Formula: survival_time | cens(censoring) ~ 1 + trt_group + diagnosis_lumped + ecog_fstcnt + poly(c_reactive_protein, 2) + poly(albumin, 2) + poly(lactate_dehydrogenase, 2) 
   Data: surv_trt (Number of observations: 264) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Regression Coefficients:
                                               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept                                          7.78      0.50     6.92     8.86 1.00     4045     4485
trt_groupB                                         0.69      0.31     0.11     1.32 1.00     8744     5847
diagnosis_lumpedLungtumorincludingmesothelioma    -0.54      0.50    -1.58     0.38 1.00     3816     4108
diagnosis_lumpedProstatecancer                     0.08      0.61    -1.11     1.29 1.00     3852     4624
diagnosis_lumpedOther                             -0.30      0.51    -1.34     0.67 1.00     3769     4182
ecog_fstcnt.L                                     -1.05      0.42    -1.89    -0.25 1.00     6770     5933
ecog_fstcnt.Q                                     -0.06      0.35    -0.74     0.66 1.00     6854     5072
ecog_fstcnt.C                                     -0.53      0.30    -1.14     0.02 1.00     7403     5688
polyc_reactive_protein21                           0.12      2.88    -5.37     5.94 1.00     5959     5155
polyc_reactive_protein22                           1.32      1.98    -2.39     5.43 1.00     7085     6133
polyalbumin21                                      6.80      3.13     0.83    13.17 1.00     5851     5724
polyalbumin22                                     -0.45      2.39    -4.99     4.39 1.00     7995     6575
polylactate_dehydrogenase21                       -5.54      2.09    -9.66    -1.38 1.00     5214     5257
polylactate_dehydrogenase22                        3.87      1.66     0.77     7.40 1.00     7046     4885

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.85      0.09     0.67     1.04 1.00     3925     4891

Unadjusted

Family: weibull 
  Links: mu = log; shape = identity 
Formula: survival_time | cens(censoring) ~ 1 + trt_group 
   Data: surv_trt (Number of observations: 264) 
  Draws: 4 chains, each with iter = 4000; warmup = 2000; thin = 1;
         total post-warmup draws = 8000

Regression Coefficients:
           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept      7.50      0.26     7.05     8.09 1.00     3428     3055
trt_groupB     0.57      0.28     0.04     1.15 1.00     3742     3853

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
shape     0.83      0.09     0.67     1.02 1.00     2957     3544

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).

davidcnorrismd · March 11, 2025, 11:53am

What are the MCSE’s for the estimates you’re comparing? I’m wondering (to begin) how much of the ‘overestimation’ that worries you is accounted-for by an MCSE ‘±’.

Johannes_Schwenke · March 11, 2025, 12:15pm

If I understand correctly, you mean the MCSE for the ratios of the expected survival time? If I computed this correctly, the MCSE seems small-ish, doesn’t it?

ratios <- avg_comparisons(
  fit.surv.trt.adj,
  variables = "trt_group",
  type = "response",
  comparison = "ratio") |> 
  posterior_draws() |> 
  pull(draw)

> sd(ratios) / sqrt(posterior::ess_basic(ratios))
[1] 0.007753826
> sd(ratios) / sqrt(posterior::ess_bulk(ratios))
[1] 0.007270282
> sd(ratios) / sqrt(posterior::ess_tail(ratios))
[1] 0.008898318

js592 · March 11, 2025, 7:41pm

I think what David may be getting at is the MCSE of the mean of the overall simulation (not the MSCE of the individual parameter estimates per round – these should be virtually ignorable if you are using enough MCMC iterations). I.e., you are computing the mean point estimate of the treatment effect with 200 Monte Carlo simulations, what is its MCSE?

The only advice I can offer at the moment is that your simulation seems somewhat complicated. I would suggest focusing first on a simulated dataset without any patient specific covariates (I couldn’t tell if your “unadjusted” model is doing this). Also, you may want to review how many patients are included in the simulated datasets (this number should also affect the MCSE I think).

davidcnorrismd · March 11, 2025, 7:57pm

So it does seem the [upward] bias in the estimates exceeds the MCSE, but this isn’t as worrisome as one might suppose. Biased estimators are actually okay! What would be troubling, though, is a lack of consistency. If your bias fails to converge to zero as the size of your simulated population increases, then you might worry about an error in your code.

f2harrell · March 11, 2025, 8:20pm

This is probably not helpful but the spower function in the R Hmisc package simulates Weibull random survival times, with complexities such as time-varying treatment effect, dropout, dropping, etc. This may just help you double check the simulation.

Johannes_Schwenke · March 13, 2025, 8:33am

I will simplify the simulation and shall report back in due course.

js592 · March 13, 2025, 3:28pm

To elaborate on my final point here: I noticed your brms code does not set any priors. I think it will set some defaults on the backend which are probably sensible but if these are in conflict with your assumed ground truth effect size and/or the sample size per simulation is small then this could result in the observed bias. You may also want to look into this.

Johannes_Schwenke · March 13, 2025, 3:56pm

Yes, that’s correct, I didn’t set priors on purpose for the time being, but pretty sure these were not responsible for the bias.

Prior Summary

rior_summary(fit.surv.trt.adj)
                  prior     class                                           coef group resp dpar nlpar lb ub       source
                 (flat)         b                                                                                 default
                 (flat)         b diagnosis_lumpedLungtumorincludingmesothelioma                             (vectorized)
                 (flat)         b                          diagnosis_lumpedOther                             (vectorized)
                 (flat)         b                 diagnosis_lumpedProstatecancer                             (vectorized)
                 (flat)         b                                  ecog_fstcnt.C                             (vectorized)
                 (flat)         b                                  ecog_fstcnt.L                             (vectorized)
                 (flat)         b                                  ecog_fstcnt.Q                             (vectorized)
                 (flat)         b                                  polyalbumin21                             (vectorized)
                 (flat)         b                                  polyalbumin22                             (vectorized)
                 (flat)         b                    polylactate_dehydrogenase21                             (vectorized)
                 (flat)         b                    polylactate_dehydrogenase22                             (vectorized)
                 (flat)         b                                     trt_groupB                             (vectorized)
 student_t(3, 6.3, 2.5) Intercept                                                                                 default
      gamma(0.01, 0.01)     shape                                                                       0         default
>

I rewrote the function which creates the simulated datasets. Started from very simple, purely simulated datasets, where everyone had the same Weibull distribution, up to predicting an individual Weibull for every patient based on his/her covariates. There was en error in handling non-infinite follow-up times, but this should not have affected the simulation when setting follow-up to infinity. But for some reason I now no longer seem to be overestimating the truth. At least according to my preliminary testing. I think it’s very strange, so will definitely do some more testing.

Rewritten sim function

run_iteration <- function(i, data, trt_model, base_seed, fu_time, 
                               recruitment_time, effect_size) {
  
    # Set seed for reproducibility
    set.seed(base_seed + i)
    
    newdata <- data |> 
      mutate(
        weibull_draw_0 = rweibull(n(), shape = shape, scale = scale),
        weibull_draw_1 = rweibull(n(), shape = shape, scale = effect_size * scale),
        trt_group = sample(c("A", "B"), size = n(), replace = TRUE),
        survival_time = if_else(
          trt_group == "B", 
          weibull_draw_1, 
          weibull_draw_0), 
        recruitment_time_indiv = runif(n(), min = 0, max = recruitment_time),
        # censoring
        fu_time = if (is.infinite(fu_time)) {
                    Inf
                  } else {
                    fu_time + recruitment_time_indiv  # Add individual times
                  },
        event_death = case_when(
          survival_time <= fu_time ~ 1, 
          TRUE ~ 0
        ),
        survival_time = case_when(
          survival_time > fu_time ~ fu_time,
          TRUE ~ survival_time
        ),
        censoring = case_when(
          event_death == 1 ~ "none",
          TRUE ~ "right")
        )
    
    # Update the model with new data
    updated_fit <- update(
      trt_model,
      newdata = newdata,
      seed = base_seed,
      cores = 1
    )
    
    # Calculate comparisons
    comparisons <- avg_comparisons(
      updated_fit,
      variables = "trt_group",
      type = "response",
      comparison = "ratio"
    ) |> 
      data.frame()
    
    # Calculate probability of benefit
    draws <- posterior_draws(
      avg_comparisons(
        updated_fit,
        variables = "trt_group",
        type = "response",
        comparison = "ratio"
      )
    )
    p_benefit <- mean(draws$draw > 1)
    
    # Return results for this iteration
    list(
      iteration = i,
      contrast = comparisons$contrast,
      estimate = comparisons$estimate,
      conf.low = comparisons$conf.low,
      conf.high = comparisons$conf.high,
      p_benefit = p_benefit
    )
  }

Input data for sim-function

shape_constant <- posterior_summary(fit.surv.indiv)["shape", 1]

pred_mult_df <- predictions(fit.surv.indiv) |> 
  data.frame() |> 
  select(rowid, 
         estimate, 
         ecog_fstcnt, 
         diagnosis_lumped, 
         c_reactive_protein,
         albumin,
         lactate_dehydrogenase) |> 
  rename(pat_id = rowid,
         mu = estimate) |> 
  mutate(shape = shape_constant,
         scale = mu/gamma(1+1/shape))

Here’s some reproducible code, for the purely simulated examples.

Summary

Setup

library(tidyverse)
library(ggplot2)
library(brms)
library(tidybayes)
library(patchwork)
library(marginaleffects)
library(future)
library(furrr)
library(arrow)
library(mice)

options(
  ggplot2.discrete.colour = ggokabeito::palette_okabe_ito(),
  ggplot2.discrete.fill = ggokabeito::palette_okabe_ito(),
  ggplot2.continuous.colour = "viridis",
  ggplot2.continuous.fill = "viridis",
  book.base_family = "sans",
  book.base_size = 14
)

# set default theme
theme_set(theme_gray(base_size = 12) +
            theme(panel.grid = element_blank()))

Same Weibull for everyone

Fit unadjusted model

We need to fit the Weibull model once, before we can update it in the iterations.

n <- 250
global_shape <- 1.1
global_mu <- 1200
trt_effect <- 1.25

pat_id <- 1:n

set.seed(1234)

simple_surv_df <- tibble(pat_id) |> 
    mutate(
      shape = global_shape, 
      mu = global_mu,
      scale = mu/gamma(1+ 1/shape),
      weibull_draw_0 = rweibull(n, shape = shape, scale = scale),
      weibull_draw_1 = rweibull(n, shape = shape, scale = trt_effect * scale),
      trt_group = sample(c("A", "B"), size = n, replace = TRUE),
      survival_time = if_else(
        trt_group == "B", 
        weibull_draw_1, 
        weibull_draw_0), 
      censoring = "none")


fit.surv.trt.simple <- brm(
  survival_time | cens(censoring) ~ 1 + trt_group,
  data = simple_surv_df,
  family = weibull(),
  cores = 4,
  seed = 1,
  iter = 2000,
  file = "models/fit_surv_trt_simple.rds",
  file_refit = "on_change"
)

Function used in parallel iterations

# Define the iteration function
run_iteration <- function(i, n, trt_model, base_seed, fu_time, recruitment_time, 
                          effect_size, global_shape, global_mu) {
  
  # Set seed for reproducibility
  set.seed(base_seed + i)
  
  pat_id <- 1:n
  
  newdata <- tibble(pat_id) |> 
    mutate(
      shape = global_shape, 
      mu = global_mu,
      scale = mu/gamma(1+ 1/shape),
      weibull_draw_0 = rweibull(n, shape = shape, scale = scale),
      weibull_draw_1 = rweibull(n, shape = shape, scale = effect_size * scale),
      trt_group = sample(c("A", "B"), size = n, replace = TRUE),
      survival_time = if_else(
        trt_group == "B", 
        weibull_draw_1, 
        weibull_draw_0), 
      recruitment_time_indiv = runif(n(), min = 0, max = recruitment_time),
      # censoring
      fu_time = if (is.infinite(fu_time)) {
                  Inf  # Just use original fu_time
                } else {
                  fu_time + recruitment_time_indiv  # Add individual times
                },
      event_death = case_when(
        survival_time <= fu_time ~ 1, 
        TRUE ~ 0
      ),
      survival_time = case_when(
        survival_time > fu_time ~ fu_time,
        TRUE ~ survival_time
      ),
      censoring = case_when(
        event_death == 1 ~ "none",
        TRUE ~ "right"))  
  
  # Update the model with new data
  updated_fit <- update(
    trt_model,
    newdata = newdata,
    seed = base_seed,
    cores = 1
  )
  
  # Calculate comparisons
  comparisons <- avg_comparisons(
    updated_fit,
    variables = "trt_group",
    type = "response",
    comparison = "ratio"
  ) |> 
    data.frame()
  
  # Calculate probability of benefit
  draws <- posterior_draws(
    avg_comparisons(
      updated_fit,
      variables = "trt_group",
      type = "response",
      comparison = "ratio"
    )
  )
  p_benefit <- mean(draws$draw > 1)
  
  
  
  # Return results for this iteration
  list(
    iteration = i,
    contrast = comparisons$contrast,
    estimate = comparisons$estimate,
    conf.low = comparisons$conf.low,
    conf.high = comparisons$conf.high,
    p_benefit = p_benefit
  )
}

Run simulation

base_seed <- 2025
recruitment_time <- 550

n_iterations <- 1e2
n_cores <- 10

# Set up parallel processing
plan(multisession, workers = n_cores, gc = TRUE)

# Create sequence of iterations
iterations <- 1:n_iterations

# Run iterations in parallel
results <- future_map(
  iterations,
  ~run_iteration(
    .x, n = n, trt_model = fit.surv.trt.simple, base_seed = base_seed, 
    fu_time = Inf, recruitment_time = recruitment_time,
    effect_size = effect_size, global_shape = global_shape, 
    global_mu = global_mu),
  .options = furrr_options(seed = TRUE),
  .progress = TRUE
)

results_df <- do.call(rbind, lapply(results, data.frame))

# Run iterations in parallel
results_365 <- future_map(
  iterations,
  ~run_iteration(
    .x, n = n, trt_model = fit.surv.trt.simple, base_seed = base_seed, 
    fu_time = 365, recruitment_time = recruitment_time,
    effect_size = effect_size, global_shape = global_shape, 
    global_mu = global_mu),
  .options = furrr_options(seed = TRUE),
  .progress = TRUE
)

results_365_df <- do.call(rbind, lapply(results_365, data.frame))

plan(sequential)

Plot simulation

MCMC_mean <- mean(results_df$estimate)

p1 <- results_df |> 
  ggplot(aes(x = iteration, y = estimate)) +
  geom_hline(yintercept = effect_size, color = "red", linetype = "dashed") +
  geom_hline(yintercept = MCMC_mean, color = "orange", linetype = "dotted") +
  geom_hline(yintercept = 1, color = "blue", linetype = "dashed") +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
  scale_y_continuous(limits = c(0, 5)) +
  labs(
    subtitle = "Point estimates and 95% CrI with Inf FU"
  )

MCMC_mean <- mean(results_365_df$estimate)

p2 <- results_365_df |> 
  ggplot(aes(x = iteration, y = estimate)) +
  geom_hline(yintercept = effect_size, color = "red", linetype = "dashed") +
  geom_hline(yintercept = MCMC_mean, color = "orange", linetype = "dotted") +
  geom_hline(yintercept = 1, color = "blue", linetype = "dashed") +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
  scale_y_continuous(limits = c(0, 5)) +
  labs(
    subtitle = "Point estimates and 95% CrI with 365 days FU"
  )

p1 + p2 + plot_annotation(
  title = 'Simple Simluation'
)

Different Weibull for different ECOG

global_shape <- 1.1
intercept <- 6.63
ecog_L <- -1.75
ecog_Q <- -0.18
ecog_C <- -0.29
fu_time <- Inf
 
ecog_data <- tibble(pat_id) |> 
    mutate(
      ecog_fstcnt = sample(
        c("0", "1", "2", "3"), size = n, replace = TRUE, 
        prob = c(0.35, 0.45, 0.15, 0.5)),
      ecog_fstcnt = factor(ecog_fstcnt, levels = 0:3, ordered = TRUE),
      shape = global_shape,
      ecog_effect = case_when(
      ecog_fstcnt == "0" ~ ecog_L * (-0.6708204) + 
        ecog_Q * (0.5) + ecog_C * (-0.2236068),
      ecog_fstcnt == "1" ~ ecog_L * (-0.2236068) + 
        ecog_Q * (-0.5) + ecog_C * (0.6708204),
      ecog_fstcnt == "2" ~ ecog_L * (0.2236068) + 
        ecog_Q * (-0.5) + ecog_C * (-0.6708204),
      ecog_fstcnt == "3" ~ ecog_L * (0.6708204) + 
        ecog_Q * (0.5) + ecog_C * (0.2236068)
      ),
      mu = exp(intercept + ecog_effect),
      scale = mu/gamma(1+ 1/shape),
      weibull_draw_0 = rweibull(n, shape = shape, scale = scale),
      weibull_draw_1 = rweibull(n, shape = shape, scale = effect_size * scale),
      trt_group = sample(c("A", "B"), size = n, replace = TRUE),
      survival_time = if_else(
        trt_group == "B", 
        weibull_draw_1, 
        weibull_draw_0), 
      recruitment_time_indiv = runif(n(), min = 0, max = recruitment_time),
      # censoring
      fu_time = if (is.infinite(fu_time)) {
                  Inf  # Just use original fu_time
                } else {
                  fu_time + recruitment_time_indiv  # Add individual times
                },
      event_death = case_when(
        survival_time <= fu_time ~ 1, 
        TRUE ~ 0
      ),
      survival_time = case_when(
        survival_time > fu_time ~ fu_time,
        TRUE ~ survival_time
      ),
      censoring = case_when(
        event_death == 1 ~ "none",
        TRUE ~ "right"))  

ecog_data |> 
  group_by(trt_group) |> 
  summarise(mean = mean(survival_time)) |> 
  pivot_wider(names_from = trt_group,
              values_from = mean,
              names_prefix = "mean_") |> 
  mutate(ratio = mean_B / mean_A)

fit.surv.trt.ecog <- brm(
  survival_time | cens(censoring) ~ 1 + trt_group + ecog_fstcnt,
  data = ecog_data,
  family = weibull(),
  cores = 4,
  seed = 1,
  iter = 2000,
  file = "models/fit_surv_trt_simple_ecog.rds",
  file_refit = "on_change"
)

New function for parallel processing

run_iteration_ecog <- function(i, n, trt_model, base_seed, fu_time, 
                               recruitment_time, effect_size, global_shape) {
  
  # Set seed for reproducibility
  set.seed(base_seed + i)
  
 # Predictor values taken from model
  ecog_L <- -1.75
  ecog_Q <- -0.18
  ecog_C <- -0.29
  
  pat_id <- 1:n
  
  newdata <- tibble(pat_id) |> 
    mutate(
      shape = global_shape, 
      ecog_fstcnt = sample(
        c("0", "1", "2", "3"), size = n, replace = TRUE, 
        prob = c(0.35, 0.45, 0.15, 0.5)),
      ecog_fstcnt = factor(ecog_fstcnt, levels = 0:3, ordered = TRUE),
      ecog_effect = case_when(
      ecog_fstcnt == "0" ~ ecog_L * (-0.6708204) + 
        ecog_Q * (0.5) + ecog_C * (-0.2236068),
      ecog_fstcnt == "1" ~ ecog_L * (-0.2236068) + 
        ecog_Q * (-0.5) + ecog_C * (0.6708204),
      ecog_fstcnt == "2" ~ ecog_L * (0.2236068) + 
        ecog_Q * (-0.5) + ecog_C * (-0.6708204),
      ecog_fstcnt == "3" ~ ecog_L * (0.6708204) + 
        ecog_Q * (0.5) + ecog_C * (0.2236068)
      ),
      mu = exp(intercept + ecog_effect),
      scale = mu/gamma(1+ 1/shape),
      weibull_draw_0 = rweibull(n, shape = shape, scale = scale),
      weibull_draw_1 = rweibull(n, shape = shape, scale = effect_size * scale),
      trt_group = sample(c("A", "B"), size = n, replace = TRUE),
      survival_time = if_else(
        trt_group == "B", 
        weibull_draw_1, 
        weibull_draw_0), 
      recruitment_time_indiv = runif(n(), min = 0, max = recruitment_time),
      # censoring
      fu_time = if (is.infinite(fu_time)) {
                  Inf
                } else {
                  fu_time + recruitment_time_indiv
                },
      event_death = case_when(
        survival_time <= fu_time ~ 1, 
        TRUE ~ 0
      ),
      survival_time = case_when(
        survival_time > fu_time ~ fu_time,
        TRUE ~ survival_time
      ),
      censoring = case_when(
        event_death == 1 ~ "none",
        TRUE ~ "right"))  
  
  # Update the model with new data
  updated_fit <- update(
    trt_model,
    newdata = newdata,
    seed = base_seed,
    cores = 1
  )
  
  # Calculate comparisons
  comparisons <- avg_comparisons(
    updated_fit,
    variables = "trt_group",
    type = "response",
    comparison = "ratio"
  ) |> 
    data.frame()
  
  # Calculate probability of benefit
  draws <- posterior_draws(
    avg_comparisons(
      updated_fit,
      variables = "trt_group",
      type = "response",
      comparison = "ratio"
    )
  )
  p_benefit <- mean(draws$draw > 1)
  
  
  
  # Return results for this iteration
  list(
    iteration = i,
    contrast = comparisons$contrast,
    estimate = comparisons$estimate,
    conf.low = comparisons$conf.low,
    conf.high = comparisons$conf.high,
    p_benefit = p_benefit
  )
}

Run simulations where ecog affects survival.

base_seed <- 2025

# Set up parallel processing
plan(multisession, workers = n_cores, gc = TRUE)

# Create sequence of iterations
iterations <- 1:n_iterations

# Run iterations in parallel
results <- future_map(
  iterations,
  ~run_iteration_ecog(
    .x, n = n, trt_model = fit.surv.trt.ecog, base_seed = base_seed, 
    fu_time = Inf, recruitment_time = recruitment_time,
    effect_size = effect_size, global_shape = global_shape),
  .options = furrr_options(seed = TRUE),
  .progress = TRUE
)

results_ecog_df <- do.call(rbind, lapply(results, data.frame))

results_365 <- future_map(
  iterations,
  ~run_iteration_ecog(
    .x, n = n, trt_model = fit.surv.trt.ecog, base_seed = base_seed, 
    fu_time = 365, recruitment_time = recruitment_time,
    effect_size = effect_size, global_shape = global_shape),
  .options = furrr_options(seed = TRUE),
  .progress = TRUE
)

results_ecog_365_df <- do.call(rbind, lapply(results_365, data.frame))

plan(sequential)

MCMC_mean <- mean(results_ecog_df$estimate)

p3 <- results_ecog_df |> 
  ggplot(aes(x = iteration, y = estimate)) +
  geom_hline(yintercept = effect_size, color = "red", linetype = "dashed") +
  geom_hline(yintercept = MCMC_mean, color = "orange", linetype = "dotted") +
  geom_hline(yintercept = 1, color = "blue", linetype = "dashed") +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
  scale_y_continuous(limits = c(0, 5)) +
  labs(
    subtitle = "Point estimates and 95% CrI with Inf FU"
  )

MCMC_mean <- mean(results_ecog_365_df$estimate)

p4 <- results_ecog_365_df |> 
  ggplot(aes(x = iteration, y = estimate)) +
  geom_hline(yintercept = effect_size, color = "red", linetype = "dashed") +
  geom_hline(yintercept = MCMC_mean, color = "orange", linetype = "dotted") +
  geom_hline(yintercept = 1, color = "blue", linetype = "dashed") +
  geom_point() +
  geom_errorbar(aes(ymin = conf.low, ymax = conf.high)) +
  scale_y_continuous(limits = c(0, 5)) +
  labs(
    subtitle = "Point estimates and 95% CrI with 365 days FU"
  )

p3 + p4 + plot_annotation(
  title = 'Simluation with ECOG as predictor'
)

When using fully simulated data, the estimator seems to be unbiased.

davidcnorrismd · March 14, 2025, 12:26pm

The MRE concept elaborated on Stack Overflow may prove helpful.