Data collected outside of guidelines/window

I am involved in a longitudinal study in which each person is measured, say,
quarterly for 4 quarters, with each measurement occasion defined as, say,
90 days from the prior occasion, plus or minus 10 days (call this their “window”). However, some subjects
missed their window but came in later and had their data collected anyway; some
were very close (e.g., they missed the window by a day or so) and some were not
nearly so close (e.g., they missed it by more than 30 days).
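As a small illustration of the situation described above, here is a sketch in plain Python using the 90 ± 10 day window from the post; the function name and the 5-day "near miss" cutoff are arbitrary choices for illustration, not from any protocol:

```python
# Classify actual inter-visit gaps against a protocol window of
# 90 days +/- 10 days (the example window from the post above).
TARGET, TOLERANCE = 90, 10

def classify_gap(days_since_prior, target=TARGET, tol=TOLERANCE, near=5):
    """Label one inter-visit gap relative to the protocol window.

    `near` (days beyond the window still counted as a near miss)
    is an arbitrary choice for illustration.
    """
    deviation = abs(days_since_prior - target)
    if deviation <= tol:
        return "in window"
    elif deviation <= tol + near:
        return "near miss"
    return "large deviation"

gaps = [88, 101, 135]  # hypothetical days between consecutive visits
labels = [classify_gap(g) for g in gaps]
```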

I have been unable to find any literature on how to deal with subjects who
miss their measurement window but still come in and have their data collected,
though I believe and hope some relevant work exists.

So, I am looking for 2 things: (1) advice on how to deal with the measurement
occasions that are outside of the pre-specified window and (2) citations, if
any, that discuss this issue in any way.

I note that I was not involved in setting the size of the windows, but I
was told that the window size was chosen so that the clinicians would be
“comfortable” that the value was appropriate for that time point. That is
all I know about the issue (and I don’t fully understand the answer!)



That is a fairly common occurrence in longitudinal studies, where the study protocol defines discrete follow-up windows around scheduled time points.

You will never get patients to come in to an outpatient clinical setting for data collection on exactly the desired day, especially as time goes on. Even phone contacts can be an issue. The logistics become problematic, not least the availability and schedules of the clinical team.

It is also common for the width of the windows to increase as you go out in time, reflecting that temporal changes in patient status are likely to be more stable the further out from time 0 you get.

So, for example, where the 1-month window might be plus or minus a few days, by the time you get out to a year it might be plus or minus a week, and by five years it could be plus or minus a month. It all depends on the realities of the logistics involved and the likelihood that the data to be collected would materially vary from the start to the end of the window, biasing your observations.

Violations of those protocol-defined follow-up windows, as well as entirely missed visits, would be formally tracked as protocol violations.

In most cases, your intention-to-treat (ITT) analysis would include those patients, using their data as if collected at the scheduled discrete time point.

However, a per protocol analysis, if one is performed, might exclude those patients due to the violations.

The protocol and/or the statistical analysis plan (SAP) for the study would pre-define your analysis cohorts, such as ITT, Per Protocol, Safety, As Treated, and so forth, along with the inclusion/exclusion criteria for each.

Marc, I’ve long thought that even for ITT we need to use actual dates rather than intended dates. Any further thoughts or literature references on that?

I think it’s important to use continuous time correlation structures that can readily accommodate irregular measurement times.
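A minimal sketch of what such a structure can look like, assuming a continuous-time AR(1) form in which the correlation between two measurements decays as rho raised to the time lag between them (the per-day rho and the visit days below are invented for illustration):

```python
# Continuous-time AR(1) correlation between repeated measures:
# corr(y_i, y_j) = rho ** |t_i - t_j|, which accommodates
# irregular visit times directly. rho here is per-day and invented.

def ct_ar1_matrix(times, rho):
    """Correlation matrix for measurements at the given actual times."""
    return [[rho ** abs(ti - tj) for tj in times] for ti in times]

# One subject's actual (irregular) visit days:
R = ct_ar1_matrix([0, 95, 178, 284], rho=0.995)
```

Because the lag enters in days rather than in visit number, two subjects with different actual visit schedules each get a correctly scaled correlation matrix from the same rho.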


Hi Frank, thanks for your reply.

I have not located much literature that addresses this particular situation, that is, dealing with time-based variation in the collection of follow-up data at points that fall outside of, and may materially deviate from, the narrower, discrete, protocol-defined windows. It is possible that I am just not using the right keywords, but I did try various terms and combinations.

There are some general papers that I located, a number outside clinical trials and even outside the life sciences (e.g., engineering, economics), that discuss general issues around treating time as a discrete versus continuous variable in modeling. Frequently, those deal with time-series analyses rather than the finite number of time points typically seen in clinical trials.

So, this is a topic that would seem to need further comparative work to elucidate the pros and cons of the various analytic methods and their impact (e.g., bias) on conclusions from relevant studies.

Fundamentally, I agree with your view that time is, in general, best treated as a continuous variable to account for irregularly spaced intervals across patients in this setting. Where I wrestle with the notion is that one can envision circumstances where it may not always be apropos.

In favor of using time as a continuous variable, as you note, is that you can deal with variation in the actual data collection times, both within and between patients, given the realities of trial conduct raised in this thread. One can then use a model-based approach (e.g., mixed effects, GLS, GEE) to estimate the outcome of interest at the specific time points relevant to the trial design and the questions being posed.
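As a toy illustration of that last point, under the strong assumption of a linear time trend (no mixed-effects, GLS, or GEE machinery, just ordinary least squares in plain Python with made-up data), one can fit on the actual visit days and then read off predictions at the protocol-scheduled days:

```python
# Fit a straight line y = a + b*t to outcomes measured at actual
# (irregular) visit days, then predict at the nominal protocol days.
# Data are invented for illustration; a real analysis would use a
# mixed-effects/GLS/GEE model with a suitable correlation structure.

actual_days = [0, 95, 178, 284, 355]       # when visits really happened
outcomes    = [10.0, 11.8, 13.1, 15.3, 16.9]

n = len(actual_days)
mean_t = sum(actual_days) / n
mean_y = sum(outcomes) / n
b = sum((t - mean_t) * (y - mean_y) for t, y in zip(actual_days, outcomes)) \
    / sum((t - mean_t) ** 2 for t in actual_days)
a = mean_y - b * mean_t

nominal_days = [0, 90, 180, 270, 360]       # protocol-scheduled visits
predicted = [a + b * t for t in nominal_days]
```

The point is only that the model is fit at the times the data were actually collected, while estimates are reported at the times the protocol cares about.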

Another potential benefit of using time as a continuous variable is that it costs only one degree of freedom, if time is not transformed, versus a larger number of degrees of freedom for time as a discrete variable. So you may get more power for the model in that setting, if presuming a linear relationship is reasonable. However, if you perform a non-linear transformation of time, such as a spline, you can lose that advantage, depending on the number of time points in the study timeline.

The potential downside to using time as a continuous variable is that, with only a few post-baseline time points, you may not be able to reasonably transform time with a spline or another non-linear method without risking overfitting. You then effectively revert to presuming a linear relationship between continuous time and the outcome measures of interest, which may not be reasonable. In that setting, using time as a discrete variable allows more flexibility, perhaps at the expense of presuming consistency in the data collection times. So there may be tradeoffs, and one may want to run sensitivity analyses to assess potential bias in the resulting estimates.

I look forward to your thoughts on the above. Thanks!


Fantastic thoughts Marc. I’ll just add that discrete time works better primarily when there are fewer than about 4 distinct follow-up times. With 4 times it takes 3 d.f. to model time discretely, versus 2 d.f. for a quadratic or a restricted cubic spline with 3 knots.
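To make the d.f. accounting concrete, here is a small sketch in plain Python; the restricted cubic spline uses the usual truncated-power parameterization (a linear term plus one nonlinear term per interior knot), and the time and knot values are arbitrary illustrations:

```python
# Degrees of freedom for time coded discretely vs. as a restricted
# cubic spline. Knot placement here is arbitrary, for illustration.

def dummy_columns(t, levels):
    """Reference-cell dummy coding: len(levels) - 1 columns."""
    return [1.0 if t == lev else 0.0 for lev in levels[1:]]

def rcs_basis(x, knots):
    """Truncated-power restricted cubic spline basis: len(knots) - 1
    columns (a linear term plus len(knots) - 2 nonlinear terms).
    The basis is linear beyond the boundary knots."""
    k = len(knots)
    cube = lambda v: max(v, 0.0) ** 3
    denom = (knots[-1] - knots[0]) ** 2
    cols = [x]
    for j in range(k - 2):
        term = (cube(x - knots[j])
                - cube(x - knots[k - 2]) * (knots[k - 1] - knots[j])
                  / (knots[k - 1] - knots[k - 2])
                + cube(x - knots[k - 1]) * (knots[k - 2] - knots[j])
                  / (knots[k - 1] - knots[k - 2]))
        cols.append(term / denom)
    return cols

times = [90, 180, 270, 360]                    # 4 distinct follow-up times
discrete_df = len(dummy_columns(180, times))   # 3 d.f. for discrete time
spline_df = len(rcs_basis(180, [90, 225, 360]))  # 2 d.f. for a 3-knot RCS
```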


Hi Frank, thanks for your follow up!

Yah, I suspected that a practical limit on the number of follow-ups in favor of the discrete case would be around 3 or 4 in most situations.

This is an interesting topic, and it will be of value to see if anyone takes on a more formalized comparative analysis, ideally using real-world trial data, to assess differences between the two approaches under varying timeline parameters.



P.S. For visits that are scheduled because of changing patient status, where the tendency to schedule a visit is a function of current risks not captured in the last visit’s data, special methods apply, as discussed here.


thank you both - rich


forgot to note that I had also (1) asked a couple of friends about this and (2) asked on Statalist. The one friend not involved in either public forum agreed with Frank about using actual time; if anyone is interested in the discussion on Statalist, see data collected at the "wrong" time - Statalist


Hi Frank,

Thanks for finding that article. I will take the time to review it further.

It raises an issue we had not really touched on in the prior discussion: how to deal with interim unscheduled visits and, depending upon the reason for them, whether the outcomes of interest are measured at those time points, and if so, how to handle them in the relevant analyses.

Given the publication date (2004) of that paper, I wondered what its citation history might be, which I found here:

One of the more recent (2022) papers in the above list is:

Randomized Trials With Repeatedly Measured Outcomes: Handling Irregular and Potentially Informative Assessment Times
by Pullenayegum and Scharfstein

The latter co-author is also a co-author on the 2004 paper you cited. Upon a brief review of the abstract, the paper appears to conduct a literature review and pose recommendations for trialists regarding study design and the publication of study results.


Great work Marc. Their new paper is the one we need.

My summary at present: Use of actual assessment dates is better than using intended dates, but may still have problems if measurement times are informative.
