Randomized trial with variation around pre-specified follow-up times

SpiekerStats · November 18, 2020, 11:26pm

Good evening, datamethods world! I’ve been twisting myself in knots over this one and could use some advice to straighten me out. Sometimes when you sit with a problem for too long you just need another set of eyes looking at your work.

Background/setup: A two-arm, 1:1 randomized trial of 200 individuals, with outcome measurements planned for 30 days and 90 days. The goal is to understand the treatment effect, defined as a difference in mean outcome between groups at 30 days and at 90 days.

The challenge: In reality, there is a good deal of variation in the actual follow-up times. What was planned for 30 days ended up ranging between 15 and 60 days. What was planned for 90 days ended up ranging between 60 and 120 days.

Standard (?) approach: Usually what I see people do is trim out the observations that are “too far” outside the visit window and just treat the time points as visit numbers (i.e., follow-up visit 1 vs. follow-up visit 2).

Possible (?) alternative: To avoid grouping people into buckets that don’t adequately capture their true time post randomization, one possibility seems to be an adjusted GEE model with an indicator for treatment, a restricted cubic spline on time, and a spline-treatment interaction. From this model, I would, in theory, be able to characterize the treatment effect across continuous time, of course with the most precision available at time points where people tended to visit. But the only function of this model is to form point and interval estimates at 30 and 90 days, as pre-specified. At first blush, this seems to be a reasonable way to deal with variable follow-up times.
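To make the "spline-treatment interaction" idea concrete, here is a minimal sketch of how the treatment effect at a given day would be read off such a model as a contrast. It does not fit the GEE. The knot locations and coefficients are hypothetical placeholders, and the basis is the usual truncated-power form of a restricted cubic spline (linear beyond the outer knots). In a real analysis the coefficients and their robust covariance would come from the fitted model, and the interval estimate would need that covariance.

```python
def rcs_basis(t, knots):
    """Restricted cubic spline basis at time t: linear term plus k-2 nonlinear terms.

    Truncated-power form constrained to be linear beyond the outer knots,
    with nonlinear terms scaled by (last knot - first knot)^2.
    """
    t_last, t_prev = knots[-1], knots[-2]
    scale = (knots[-1] - knots[0]) ** 2

    def pos3(x):
        return max(x, 0.0) ** 3

    basis = [float(t)]
    for tj in knots[:-2]:
        term = (
            pos3(t - tj)
            - pos3(t - t_prev) * (t_last - tj) / (t_last - t_prev)
            + pos3(t - t_last) * (t_prev - tj) / (t_last - t_prev)
        )
        basis.append(term / scale)
    return basis


def treatment_effect(t, beta_trt, gamma, knots):
    """Model-implied difference in mean outcome (treated minus control) at day t.

    beta_trt is the treatment main effect; gamma holds the coefficients of the
    treatment-by-spline interaction, one per basis column.
    """
    return beta_trt + sum(g * b for g, b in zip(gamma, rcs_basis(t, knots)))


# Hypothetical knots (days) and hypothetical coefficients, for illustration only.
KNOTS = [20, 30, 60, 90, 110]
BETA_TRT = 0.5
GAMMA = [0.01, -0.02, 0.03, -0.01]

effect_30 = treatment_effect(30, BETA_TRT, GAMMA, KNOTS)
effect_90 = treatment_effect(90, BETA_TRT, GAMMA, KNOTS)
```

The point of the sketch is only that the pre-specified estimands at 30 and 90 days are two evaluations of one smooth curve, so every observation, whatever its actual day, contributes to both.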

Question: With some reflection, I see some possible cause for concern. Does the proposed alternative compromise randomization? My worry is that this formulation may require follow-up time to be non-informative, an assumption that is not testable in practice. If the kind of subject who follows up at 25 days is just fundamentally different than the kind of subject who follows up at 60 days, it feels as though this ends up creating more problems than it solves. If the time of follow-up is not random, then it seems as if we are estimating the (adjusted) difference in mean outcome between groups among folks who happen to follow up at 30 days–a condition that may induce systematic differences between the two randomization groups being compared because it’s post-randomization. If the follow up times are truly random, then my sense is that there’s no cause for concern.
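A toy simulation can illustrate the worry. Every number below is invented for illustration, and selecting subjects who visit within 25 to 35 days is only a crude stand-in for conditioning on follow-up time in the spline model. The true treatment effect is fixed at 1.0 at all times; the only thing that changes between the two runs is whether visit timing in the control arm depends on an unmeasured prognostic factor `u`.

```python
import random
from statistics import mean


def simulate(informative, n=20000, true_effect=1.0, seed=1):
    """Naive treated-minus-control mean difference among subjects
    whose follow-up visit lands within 25-35 days."""
    rng = random.Random(seed)
    in_window = {0: [], 1: []}
    for i in range(n):
        trt = i % 2                      # stand-in for 1:1 randomization
        u = rng.gauss(0, 1)              # unmeasured prognostic factor
        y = true_effect * trt + u + rng.gauss(0, 0.5)
        if informative and trt == 0:
            # Control subjects with higher u tend to return earlier.
            day = 40 - 10 * u + rng.gauss(0, 3)
        else:
            # Visit timing unrelated to prognosis.
            day = 30 + rng.gauss(0, 8)
        if 25 <= day <= 35:
            in_window[trt].append(y)
    return mean(in_window[1]) - mean(in_window[0])


diff_random = simulate(informative=False)       # timing unrelated to prognosis
diff_informative = simulate(informative=True)   # timing depends on u in one arm
```

Under these assumptions, `diff_random` should sit near the true effect, while `diff_informative` is pulled well away from it, because the control subjects who happen to appear near day 30 are a selected subgroup.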

Do you see the tension? What are your thoughts? Am I worried about nothing? Should I abandon this idea, delete this post, and never make mention of it again?

Looking forward to discussing.

f2harrell · November 19, 2020, 12:22pm

You set up the problem well, Andrew. It all depends on the reasons someone returned for a follow-up visit. In a randomized trial the follow-up times are usually random with respect to the patient’s emerging condition, so censoring is uninformative and you can use traditional methods like the Cox model for dealing with the exact censoring times.

The effect of follow-up time on treatment effect is a somewhat separate issue, but can’t be addressed using time as a covariate. Instead a formal time-dependent covariate should be used, which changes the likelihood function in the Cox model.

Whether doing a longitudinal analysis or a time-to-event analysis it is even advantageous to have random follow-up times that don’t follow the prescribed visits, because then you can estimate the smooth effect of time on treatment effect. In a longitudinal study with a continuous or categorical outcome you can randomize say 3 follow-up times for each patient and then do the usual combined analysis to estimate the entire time-response profile. In a time-to-event analysis you can have a spline function of time interacting with treatment to get a smooth treatment effect estimate over time.

A common problem in analysis of data like yours is the use of “visits” instead of “days” in the analysis. The model works better and the results are more informative if the actual day is used. In the longitudinal setting this calls for a continuous time correlation structure, which can be handled quite elegantly.
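As one illustration of what a continuous-time correlation structure can look like, the sketch below uses a continuous-time AR(1) form, where the correlation between two measurements on the same subject decays with the number of days separating them. The choice of this particular structure and the value of `rho` are assumptions made for the example, not something specified above.

```python
def car1_correlation(days, rho):
    """Correlation matrix with corr(i, j) = rho ** |days[i] - days[j]|."""
    return [[rho ** abs(s - t) for t in days] for s in days]


# Hypothetical subject measured on days 28 and 95 rather than 30 and 90.
R = car1_correlation([28, 95], rho=0.99)
```

With visit numbers, every subject would get the same within-subject correlation; with actual days, a subject whose visits were 67 days apart and one whose visits were 45 days apart get different correlations.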

I’m not sure this answers your question and I hope others chime in.

pmbrown · November 19, 2020, 12:49pm

It’s a good question. I can’t remember what rules were applied in industry, only that there was a large degree of convention (i.e., different companies handled it in the same way, and not with much imagination). E.g., if you look at an industry SAP on clinicaltrials.gov you see standard remarks (see p. 24-25): https://clinicaltrials.gov/ProvidedDocs/99/NCT02194699/SAP_001.pdf
“Any data collected at unscheduled visits will be listed, included within baseline data in shift plots and reversibility summaries, and will be included in the definition of maximum /minimum within-period value, but will not be included in summaries by visit. In case of a missing assessment at a scheduled visit followed by an unscheduled visit, the unscheduled assessment will not replace the missing result in the summary outputs by period and visit. If appropriate, i.e. if a substantial percentage of observations for a variable fall outside the adjusted window, sensitivity analysis will be performed”
It all sounds familiar and routine, but not very thoughtful. If there was a large proportion of visits outside the time windows then I would consider a random coefficients model, I guess. But this would never happen in industry because everything is monitored quite tightly.

MSchwartz · November 19, 2020, 10:45pm

Hi,

Frank has referenced some good approaches and I am not sure that I can offer anything additional from a methodology view. The distinction between designated visit numbers and the actual timing of the visits is a critical one. In essence, you end up with a model-predicted outcome assessment at specific time points, as opposed to the empirically observed outcome at those time points.

@pmbrown has also referenced some industry related considerations, and I can speak from experience, from the view of serving on a number of DSMB/DMC bodies for both pharma and device trials. In that setting, there are typically protocol defined, acceptable windows of time around planned follow up encounters.

So, for example, for a 7 day visit, that would be +/- 1 day. For a 30 day visit, that might be +/- no more than 7 days. For a 90 day visit, that might be no more than +/- 14 days. As you get farther out, the allowable window around the follow up time will get wider, to a point, recognizing that the time sensitivity decreases over time.

Patients seen outside of those windows would be considered protocol deviations and recorded as such, as an indicator of protocol compliance. The DSMB/DMC would also review those as part of an assessment of the conduct of the trial. They would look to see if those deviations occur generally over most/all sites, in a multi-site study, which would be suggestive of widespread protocol compliance issues, perhaps requiring a protocol change, or if they occur at a limited number of sites, suggesting the need for remediation at those specific sites. There would also be an assessment of the per arm incidence of these protocol deviations, that might be suggestive of specific issues in one arm or the other and to understand the etiology, if there is a material differential in incidence.
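A minimal sketch of that bookkeeping, using the example windows above (7 +/- 1, 30 +/- 7, 90 +/- 14 days) and a handful of hypothetical visit records; the arm labels, function names, and data are made up for illustration.

```python
from collections import Counter

# Example windows from the discussion: planned day -> allowed +/- days.
WINDOWS = {7: 1, 30: 7, 90: 14}


def is_deviation(planned_day, actual_day, windows=WINDOWS):
    """True if the visit falls outside the allowed window for its planned day."""
    return abs(actual_day - planned_day) > windows[planned_day]


def deviations_by_arm(visits):
    """Count out-of-window visits per arm.

    visits: iterable of (arm, planned_day, actual_day) tuples.
    """
    counts = Counter()
    for arm, planned, actual in visits:
        if is_deviation(planned, actual):
            counts[arm] += 1
    return counts


# Hypothetical records, for illustration only.
visits = [
    ("A", 30, 33), ("A", 90, 118),
    ("B", 30, 52), ("B", 90, 97), ("B", 30, 15),
]
counts = deviations_by_arm(visits)
```

The same tabulation could be grouped by site instead of arm to separate widespread compliance problems from site-specific ones.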

For your ITT analysis, you would want to consider the approaches that Frank defines, given the variability in the intervals. That may have to be considered in light of any pre-specified analyses in a formal SAP.

For a Per Protocol (PP) analysis, if one is defined, you might exclude the patients that were not compliant with the defined follow up time points. It would be important to know if the conclusions of the PP analysis differ materially from the ITT analysis and to consider the implications of such a finding.

pmbrown · November 20, 2020, 8:40am

It seems that if many fall outside the specified windows, then ITT and PP will surely differ, and then it promotes cynicism unnecessarily? I’d discard industry thinking here and make no mention of a PP analysis. Academics are lucky to have the freedom to do more clever/appropriate analyses, I feel.