Intermittent exposure variable in causal inference

I have been asked to assist in the design and analysis of an observational study where the exposure is the administration of a medication (furosemide) to racehorses and the outcome of interest is career length. That is, does the administration of furosemide to horses affect the length of their racing careers? I foresee two problems. First, furosemide is administered to some horses for exercise-induced pulmonary hemorrhage on the day that they race, so the exposure is determined by another variable, the number of race starts. How does one account for that, since the impact may be cumulative? Second, the majority of horses will receive furosemide, so how do you handle an exposure variable where the percentage exposed is much higher than the percentage unexposed? Thank you.

I will try to ask a better question. I have been asked to help in the design of an observational study with the hypothesis that the administration of furosemide to racing Thoroughbreds lengthens a horse’s career. The database is large, with most horses receiving the medication and a small percentage not. The exposure will be whether the horse consistently received furosemide during its career vs. never received furosemide. The outcome will be dichotomized into a long career (more than 20 starts) and a short career (fewer than 20 starts). The question is what factors to consider in performing a retrospective cohort study, particularly when there is a much smaller number of unexposed horses.


Some initial questions for you:

  1. It sounds like you may actually have 3 treatment groups:
  • Horses that got furosemide routinely over the course of their careers, presumably under a regular dosing regimen, with regular doses between races

  • Horses that only got furosemide on the day of each race and not otherwise. BTW, a quick Google search this morning suggests that due to doping concerns, as well as drug side effects, discussions over a possible general ban of furosemide have been raised, and the use of furosemide on the day of the race has been banned by a number of race tracks.

  • Horses that never got furosemide at all (edited to add: presuming that this is verifiable)

  2. In your second post, you dichotomize career length. Why and how was 20 chosen? Why not use number of race starts as a continuous outcome variable or consider using time to the end of the horse’s career from a common time 0?

  3. Will you know the actual career length for every horse, or will there be horses that are still actively racing at the end of your study observation window? If the latter, you need to consider censoring as part of your study design and analysis methods, which might make time to career end as the outcome of interest more attractive. Then you can consider Kaplan-Meier analyses, Cox regression or perhaps multi-state models as analytic methods which can handle censored observations.

  4. What is the actual proportion of horses that would be expected in each of the treatment groups? You mention a small percentage of horses that would not get furosemide at all. How small? 10%, 5%, 1%?
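To make the censoring point concrete, here is a minimal sketch of a Kaplan-Meier estimate in plain Python. The function name and the five example careers are made up for illustration; in practice a dedicated survival analysis package would be the better choice. A horse still racing at the end of the observation window contributes to the risk set up to its last observed time but is never counted as a career end.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival probability)] at each distinct event time.

    durations: time from a common time 0 to career end, or to the end of
               the observation window for horses still racing
    observed:  True if the career end was seen, False if censored
    """
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)          # events and censorings leaving the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]             # censored horses drop out after time t
    return curve

# Hypothetical careers in years; False means still racing at study end.
durations = [2, 3, 3, 5, 6]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for t, s in curve:
    print(f"t={t}: S(t)={s:.2f}")
```

The horse censored at year 3 still counts in the risk set at year 3, which is the conventional handling of ties between events and censorings.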

As I did quickly this morning, you should do some searching to see how other relevant studies have been designed, and see what other factors may be relevant to your new study. For example, this site that I found in my search this morning on sudden death in horses associated with the use of furosemide:

New Study Finds Horses Racing On Lasix At 62 Percent Increased Risk Of Sudden Death

which references this study:

Fifteen risk factors associated with sudden death in Thoroughbred racehorses in North America (2009–2021)

I do not have any sense of bias or conflict of interest in the initial web site or the study authors and I did not take the time to review the paper in detail.

That kind of information may very well affect your study design. A horse that dies suddenly, where the death may be attributable to the use of furosemide, has its effective career length shortened by the drug, which is different from a horse that is still alive and whose career is deemed to be over for various other reasons. Your analyses may become more complicated, since the length of a horse’s career may then be confounded by various factors, which may make analytic methods such as multi-state models more attractive.

You should also review some of the considerations/limitations in the above study, since it would appear that there can be seasonal variations (e.g. the impact of summer heat), variations in the type and length of the track that the horse is racing on, the time interval between races, and numerous other factors.

If this is a retrospective study, as you note, you might use a source such as the EID referenced in the above study:

Equine Injury Database

That may enable you to obtain data on a far larger pre-existing cohort, at the potential expense of being limited to the data available in the existing source. It also removes some of the biggest drivers of sample size and budget in prospective studies, namely the cost per observational unit, the study duration and related factors.

Even so, you may have to consider temporal changes in various factors that can introduce time related effects in the characteristics of the horses, changes in race and track conditions, regulatory changes and other relevant factors.


M, thank you for your reply. The studies are intended to be presented to a government agency to demonstrate whether Lasix may in fact lengthen a horse’s career by limiting lung damage and perhaps through some unknown process. The data are available from the Jockey Club, as you suggest. The risk factor studies you link to suffer from the Table 2 fallacy if causality is our goal. Unfortunately, I don’t think it is possible with observational data. My colleague suggested taking one year, selecting all the horses that did not receive furosemide (about 5% of the racing population), and comparing them to horses that did receive furosemide and were in the same race. Multiple potential confounders exist, such as musculoskeletal injury, which is the primary cause of attrition in racehorses. I like the idea of a Kaplan-Meier analysis and would categorize horses as always, never or intermittently receiving furosemide. As you suggest, I think there will be a small percentage of horses still racing at the end of the study.

Another possibility is comparing 2-year-olds pre-2020 vs. post-2020, since furosemide was outlawed for that cohort in 2020.



I just had a chance to review the paper that I referenced in my earlier reply and note a number of issues with their methods, including, but not limited to:

  1. The use of univariate pre-screening of potential covariates for their LR model with p < 0.2 for inclusion.

  2. The use of forward stepwise selection for variable inclusion in the LR model.

  3. The use of the H-L test for assessing model fit.

  4. The use of post hoc power calculations.

One positive, I guess, is that they used R for their analyses…

They do, however, raise some interesting issues vis-a-vis pre-existing factors for the horse, track and environmental conditions, race frequency and intervals, and so forth, where I would defer to your knowledge as a veterinarian. You note one in your earlier reply, being prior musculoskeletal injury.

From a search where this particular study was also referenced in other sources, there seem to be suggestions that the timing of the pre-race furosemide treatment, apparently usually around 4 hours before the race, may vary and may be a factor. So there may be treatment optimization strategies that could be important here, and I would envision that there is existing research along those lines.

Additional discussions seem to suggest that post-race/exercise recovery treatments, to replace minerals and electrolytes lost from furosemide use, are also important considerations.

It is not clear whether that information would be available in the EID.

You are correct in that showing causation will be problematic here, as it is with all, even well designed, observational studies, especially when it comes to the estimation of potential effect sizes. Associations are more typically the primary goal in this setting.

What you may be able to achieve here, presuming that the data support a positive association in your cohort, is to provide ethical and ultimately, financial, justification for a prospective, randomized study. However, I am not sufficiently familiar with all of the nuances in an animal research setting, especially as they are relevant here, as to the key issues that would impact your ability to move forward in that manner.


Your inquiry prompts a question familiar to patient advocates: Is career length the primary endpoint of interest to the horses? (No way to know of course, and it could be, right? These creatures do seem to love a good race.)

Might shortening of the animal’s race career and/or life be a possible finding for the use of this drug? It has side effects, and I assume the remedy might be a variable to account for.

“Horses receiving furosemide lose valuable electrolytes in their urine. To replenish those losses, KER formulated an electrolyte product called Race Recovery.”

I apologize for complicating things, and I don’t know the specific domain well enough, but generally I would refrain from classifying exposure as never/always/intermittent.
Usually (at least in medicine), one might care about incident users, and to support that, an analysis should assess the effect of starting medication (vs. not starting it).
When, instead, one compares never users to always users, it often has little clinical benefit. The question you end up answering is “if the patient starts taking a drug and keeps doing so for X years (i.e., becomes a persistent user), then the effect it will have is Y”. But a lot can happen between starting medication and becoming a persistent user, and during that time window selection bias can creep in (e.g., immortal time bias).

Therefore, and again sorry for complicating things, you might benefit from modeling a dynamic/sustained treatment strategy rather than a static/point treatment strategy.
Under this approach you could then assess the cumulative effect of furosemide.
One related approach to do this time-varying analysis while respecting time-zero is with sequential “trials” analysis.
If this approach sounds like it fits your use-case (e.g. you have time-varying data for each horse in the database), there are plenty more details in Hernan and Robins’ What If book, sections 19.2, 19.3, 20.4, and 21.2 (but really chapters 19 and 21).
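As a rough illustration of the sequential “trials” idea, the sketch below expands per-start records into one row per horse per trial. It assumes a furosemide flag exists for every start, which is an assumption about the data, and the function name, the row layout and the two example horses are hypothetical. It only shows the eligibility and time-zero bookkeeping; artificial censoring of non-initiators who start later, weighting and the outcome model are all omitted.

```python
def expand_to_sequential_trials(history):
    """Build one row per horse per trial in which the horse is eligible.

    history: dict mapping horse id -> list of booleans, one per start in
             order, True if furosemide was given at that start.
    A horse is eligible for trial k (time zero = its k-th start) only if it
    has not received furosemide at any earlier start.
    Returns a list of (trial, horse, initiated, later_starts) tuples.
    """
    rows = []
    for horse, doses in history.items():
        for k, dosed in enumerate(doses):
            later_starts = len(doses) - k - 1   # follow-up counted from time zero
            rows.append((k, horse, dosed, later_starts))
            if dosed:
                break   # once treatment starts, no longer eligible for later trials
    return rows

# Hypothetical per-start records for two horses.
history = {
    "A": [False, True, True],   # initiates at its second start
    "B": [False, False],        # never initiates
}
rows = expand_to_sequential_trials(history)
for row in rows:
    print(row)
```

Each horse can appear as a non-initiator in several trials and as an initiator in at most one, so follow-up always begins at the start where treatment status is assigned.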


Thank you, Ehud. I just found out that we will not have information on the horse’s individual races. We will have total lifetime starts and total lifetime starts with furosemide. I was thinking the exposure could be the percentage of starts with furosemide. I am unsure whether this should be a continuous variable from 0 to 100 or, as my colleague suggests, divided into quintiles. I suspect it will make it challenging to eliminate immortal time bias.

I apologize, but I’m not sure I understand what “total lifetime starts” means. I do agree that not having temporal/longitudinal data can make it challenging (but not necessarily detrimental).
Regardless (and again, I’m generalizing and your specific case might be different), I would prefer to avoid dichotomization/categorization of variables, both for the exposure (quintiles) and for the outcome (>20 starts). There shouldn’t be anything wrong with a fractional exposure analytically speaking; however, I would suggest thinking about whether it is interpretable (namely, how would you intervene on a percentage: by changing the denominator or the numerator? These can be two different actions).
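A small arithmetic sketch of the numerator/denominator point, with made-up numbers and a hypothetical helper name: the same percentage can be reached by two quite different actions.

```python
def percent_with_furosemide(starts_with_furosemide, total_starts):
    """Percentage of lifetime starts run on furosemide."""
    if total_starts <= 0:
        raise ValueError("total_starts must be positive")
    return 100.0 * starts_with_furosemide / total_starts

# Hypothetical horse: 10 lifetime starts, 4 of them with furosemide.
baseline = percent_with_furosemide(4, 10)

# Action 1: change the numerator (medicate at 8 of the same 10 starts).
via_numerator = percent_with_furosemide(8, 10)

# Action 2: change the denominator (same 4 medicated starts, only 5 starts in total).
via_denominator = percent_with_furosemide(4, 5)

print(baseline, via_numerator, via_denominator)
```

Both actions give 80%, but one horse has 10 lifetime starts and the other has 5. If career length is measured in starts, the second action changes the outcome itself, which is one reason the percentage may be hard to interpret as an intervention.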
