Hello everyone,
I’m working on a dataset of N trials containing treatment effects for outcomes X and Y, the standard errors of these treatment effects, and the within-trial correlation of treatment effects on X and Y.
Y is generally considered more important/relevant for patients and clinicians than X but, due to restraints on time & resources, most trials focus on X. I am therefore interested in analyzing to what extent a treatment’s effect on X is a good surrogate/predictor of its effect on Y.
Ideally, I would like my analysis to take into account both the uncertainties involved in estimating X and Y as well as their within-trial correlation. An example of a paper that (I think) does this: https://pubmed.ncbi.nlm.nih.gov/30635226/ (the associated statistical appendix clarifies their methodological approach, although it is behind a paywall).
Now, onto a simulated data example and the approaches I’ve tried to tackle this:
#Load foreach and data.table
require("foreach")
require("data.table")
#Sequence over range of (somewhat) plausible surrogate treatment effects "stes"
stes <- seq(-0.5, 0.5, 0.05)
#Run a loop over stes
tes <- foreach(ste = stes,
counter_id = 1:length(stes), #Add a loop counter as a trial id
.combine = "rbind") %do% {
#Sample some amount of error from 0.1 to 1
error <- sample(seq(0.1, 1, 0.1), 1)
#Simulate a distribution of surrogate treatment effects
#giving rise to the observed treatment effect in this trial
surrogate_dist <- rnorm(n = 1000,
mean = ste, #With a given mean
sd = error) #With a given standard deviation
#Sample some amount of within-trial correlation of treatment effects from 0.1 to 1
rho <- sample(seq(0.1, 0.8, 0.1), 1)
#Simulate a distribution of true treatment effects
#giving rise to the observed treatment effect in this trial
true_dist <- (rho * surrogate_dist) + sqrt(1 - rho*rho) * rnorm(n = 1000, mean = 0, sd = error)
#Create a data.table representing this trial
data.table(trial_id = counter_id,
surrogate_te = mean(surrogate_dist),
true_te = mean(true_dist),
surrogate_sete = sd(surrogate_dist),
true_sete = sd(true_dist),
intra_cor = rho)
}
We now have a dataset of 21 trials, each with a treatment effect on X and Y, standard errors for these treatment effects, and a within-trial correlation coefficient. The effects on X and Y vary between the trials (reflecting the different degrees of effectiveness of treatments), as do the associated standard errors (reflecting the different degrees of precision due to variation in trial size).
Based on some of the previous literature (https://10.0.3.233/jamainternmed.2021.5726 among other examples), my first impulse was to just run a weighted linear regression. I set the weights as being equal to the inverse of the sum of the variances of treatment effects (such that we let more precise estimates shape our results).
#Run weighted least squares with weights equal to the precision of the observed
#treatment effect
#Run weighted least squares with weights equal to the precision of the observed
#treatment effect
weighted_model <- lm(data = tes,
formula = true_te ~ surrogate_te,
weights = 1/(surrogate_sete^2 + true_sete^2))
The 2 problems with this are (I think) that it ignores the absolute uncertainty in estimating the treatment effects (the weighting procedure only utilizes the relative precision of each trial’s estimate but not the absolute magnitude of their standard errors) and it also ignores within-trial correlations.
Other approaches I’ve tried include:
1) Incorporating error in the treatment effects using the se
and me
functions of brms
:
require("parallel")
#Model 1
m1 <- brm(data = tes,
prior = prior(normal(0, 1), class = "b"),
cores = parallel::detectCores() - 1,
formula = true_te | se(true_sete) ~ me(surrogate_te, surrogate_sete)
)
As I understand it, this models the actual treatment effects on X and Y as arising from normal distributions, each with a given (observed) mean and standard deviation (equal to the observed standard error). This looks like it accounts for uncertainty in treatment effects, but it ignores within-trial correlation.
2) Using a bivariate normal and modeling the correlation between the residuals:
#Model 2
m2 <- brm(data = tes,
prior = prior(normal(0, 1), class = "Intercept"),
cores = parallel::detectCores() - 1,
formula = bf(true_te | se(true_sete) ~ 1) +
bf(surrogate_te | se(surrogate_sete) ~ 1) +
set_rescor(TRUE)
)
This also looks like it appropriately models observed treatment effects as arising from an unobserved distribution with a given mean and standard error. But again, this would not account for within-trial correlation.
3) Reframing the problem and viewing the two types of outcomes (surrogate and true) as being random effects nested within trials and using the correlation parameter between them to assess surrogacy:
#Convert tes to long format first
tes_long <- data.table(
te = tes[, .(c(true_te, surrogate_te)), by = trial_id][[2]],
se = tes[, .(c(true_sete, surrogate_sete)), by = trial_id][[2]],
outcome = rep(c("True", "Surrogate"), times = nrow(tes)),
trial_id = rep(tes$trial_id, each = 2),
intra_cor = rep(tes$intra_cor, each = 2)
)
#Create a variance-covariance matrix
cov_matrix <- matrix(nrow = nrow(tes_long),
ncol = nrow(tes_long)
)
#Fill up the diagonals with the variance for surrogate and true treatment effects
diag(cov_matrix) <- tes_long$se^2
#Calculate and fill in the covariance of all-cause and HF hospitalizations for each trial
for(i in 1:uniqueN(tes_long$trial_id)) {
cov_matrix[2*i, (2*i) - 1] <- tes_long[trial_id == i, intra_cor][1]*sqrt(cov_matrix[(2*i) - 1, (2*i) - 1])*sqrt(cov_matrix[2*i, 2*i])
cov_matrix[(2*i) - 1, 2*i] <- tes_long[trial_id == i, intra_cor][1]*sqrt(cov_matrix[(2*i) - 1, (2*i) - 1])*sqrt(cov_matrix[2*i, 2*i])
}
#Convert NAs to 0
cov_matrix[is.na(cov_matrix)] <- 0
#Model 3
m3 <- brm(data = tes_long,
data2 = list(cov_matrix = cov_matrix),
prior =
c(prior(constant(1), class = "sigma")),
cores = parallel::detectCores() - 1,
formula = te ~
0 + #No global intercept
outcome + #A separate intercept for each of the 2 outcomes
(0 + outcome|trial_id) + #Nest the 2 outcomes within their trials
fcor(cov_matrix) #Pass the covariance matrix
)
This last option seems like it takes both the standard errors and within-trial correlations into account since I’m allowed to pass a variance-covariance matrix containing said information.
That said, given my lack of formal training, I would greatly appreciate any help regarding what approach(es) may be best suited for this task (and what the key differences between them might be).