Individual response

Sander, this post is extraordinarily helpful. Thank you. The only thing that would make it better would be a practical, fully worked-out case study along those lines. Can you recommend one? This is slightly related to Avoiding One-Number Summaries of Treatment Effects for RCTs with Binary Outcomes | Statistical Thinking.

3 Likes

This formulation of the problem seems equivalent to the approach to statistical inference recommended by Seymour Geisser that I discussed in this thread:

On the first two pages of chapter 1 of Predictive Inference: An Introduction, he summarizes a number of your cautions in one lengthy paragraph (bolded sections are my emphasis):

Blockquote
…This measurement error convention, often assumed to be justifiable from central limit theorem considerations…was seized upon and indiscriminately adopted for situations where it is dubious at best and erroneous at worst. … applications of this sort regularly occur … much more frequently in the softer sciences. The variation here is rarely of the measurement error variety. As a true physical description, the statistical model used is often inappropriate if we stress testing and estimation of “true” entities, the parameters. If these and other models are considered in their appropriate context they are potentially very useful, i.e. their appropriate use is as models that can yield approximations for prediction of further observables, presumed to be exchangeable in some sense with the process under scrutiny. Clearly hypothesis testing and estimation as stressed in almost all statistics books involve parameters. Hence this presumes the truth of the model and imparts an inappropriate existential meaning to an index or parameter. Model selection, contrariwise, is a preferable activity, because it consists of searching for a single model (or a mixture of several) that is adequate for the prediction of observables even though it is unlikely to be the true one. This is particularly appropriate in the softer areas of application, which are legion, where the so-called true explanatory model is virtually so complex as to be unattainable.

Geisser later points out (page 3):

Blockquote:
As regards to statistical prediction, the amount of structure one can reasonably infuse into a given problem could very well determine the inferential model, whether it be frequentist, fiducialist, likelihood, or Bayesian. Any one of them possesses the capacity for implementing the predictive approach, but only the Bayesian mode is always capable of producing probability distributions for prediction.

Given that background, what am I to make of the criticisms of statisticians by causal inference proponents? It seems to me none of their criticisms are relevant to either a Bayesian approach or the frequentist approach recommended by @f2harrell in Regression Modeling Strategies. At best, they criticize a practice of statistics that no one competent would advocate.

Thanks Frank,

For illustration you could take any paper that assumes f(y_x;z,u) = E(y;x,z,u) or equivalent (e.g., that z is sufficient to “control bias” in the sense of forcing this equality) and then illustrates fitting of E(y;x,z,u) with an example. A classic used when I was a student was Cornfield’s fitting a logistic model and then using that model with x fixed at a reference level across patients to compute risk scores of patients. See for example p. 1681 of Cornfield, ‘The University Group Diabetes Program: A Further Statistical Analysis of the Mortality Findings’, JAMA 1971. His is a purely verbal description, there being no established notation back then. In modern terms what he was doing there is arguing against there being much confounding in the trial because the distribution of the fitted E(y;0,z,u) - his proxy for the f(y_0;z,u) distribution - was similar in the treated and untreated groups.
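To make that concrete, here is a minimal sketch (simulated data and hypothetical variable names, not Cornfield's actual analysis) of the kind of check described above: fit a logistic model for the outcome given treatment x and covariates z, fix x at its reference level for every patient, and compare the distribution of the fitted reference-level risks between the treated and untreated groups.

```python
# Sketch only: a Cornfield-style confounding check on simulated data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=(n, 3))                    # hypothetical baseline covariates
x = rng.binomial(1, 0.5, size=n)               # treatment indicator
logit = -1.0 + 0.6 * z[:, 0] - 0.4 * z[:, 1] + 0.3 * x
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))  # binary outcome

model = LogisticRegression().fit(np.column_stack([x, z]), y)

# Fitted risk with treatment fixed at the reference level (x = 0) for everyone:
# a proxy for the f(y_0;z,u) distribution.
risk_ref = model.predict_proba(np.column_stack([np.zeros(n), z]))[:, 1]

# Similar distributions in the two groups argue against strong confounding by z.
print("mean fitted reference risk, treated:  ", round(risk_ref[x == 1].mean(), 3))
print("mean fitted reference risk, untreated:", round(risk_ref[x == 0].mean(), 3))
```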

More recent examples in modern notation can be found under the topics of g-estimation and, especially, finding “optimal” treatment regimes. Once the fitted function f(y_x;z,u) is in hand, focus then turns to external validity: whether z is sufficient for transporting the function to a new target population beyond those studied. This problem is more general and difficult than the internal validity issue of whether the fitted potential outcomes can be transported across the treatment or exposure groups within the study - see p. 46 of the Hernan-Robins book for a discussion.

There are now many papers on finding functions for making treatment choices (“optimizing” treatment regimes) and transporting those functions. I haven’t even begun to read them all and would not claim to be able to recommend one best for your purposes. Nonetheless, most I see are illustrated with real examples. One clearly centered around individual patient choices is Msaouel et al. “A Causal Framework for Making Individualized Treatment Decisions in Oncology” in Cancers 2022, which I believe was posted earlier in this thread. One could object to its use of additive risk models and risk differences, but the same general framework can be used with other models.

Regardless of the chosen smoothing model, one can display the risk estimates directly instead of their differences. Survival times and their differences might however be more relevant than risks (probabilities). In either case, the use of differences is defensible to the extent the chosen differences are proportional to loss differences, which I am pretty sure is far more often the case for risk and survival-time differences than for odds ratios (I’ve yet to see a real medical example in which odds ratios are proportional to actual loss differences).

5 Likes

R^3: I don’t see where Geisser or others making similar arguments (there have been many) address the key distinction between the causal (potential-outcome) function f(y_x;z,u) and the purely predictive (regression) function E(y;x,z,u). That is the core criticism, as I see it. The Bayesian framework does not include this key component, so it has to be added on; this gap in the framework may explain why some Bayesians failed to understand the role of randomization. Interestingly, apparently in e-mails with Pearl toward the end of his life, Lindley recognized and conceded the need for a causal extension.

In contrast, starting in the 1920s frequentists developed the necessary language for that component as it arose naturally from randomization theory, so it is rather startling that it failed to take firm hold until recent decades. Yes, experienced, intuitively smart statisticians got by without causal formalisms, and “causal inference” can be framed as a prediction problem, enabling the vast toolboxes of statistical prediction to be applied (whether frequentist, Bayesian, hybrids, etc.); e.g., see
Greenland, S. (2012). Causal inference as a prediction problem: Assumptions, identification, and evidence synthesis. Ch. 5 in: Berzuini, C., Dawid, A.P., and Bernardinelli, L. (eds.). Causality: Statistical Perspectives and Applications. John Wiley and Sons, Chichester, UK, 43-58.
But methodologies still need to make the fundamental distinction encapsulated as “correlation is not causation” in order to derive sound algorithms for making decisions.

I argue further that any sound statistical algorithm needs an explicit causal foundation, even if it is only a survey method, because all studies need to consider causes of observation selection and missing entries:
Greenland, S. (2022). The causal foundations of applied probability and statistics. Ch. 31 in: Dechter, R., Halpern, J., and Geffner, H. (eds.), Probabilistic and Causal Inference: The Works of Judea Pearl. ACM Books, no. 36, 605-624; corrected version at arXiv:2011.02677.
I think recognition of this need for explicit causal models is one way the AI/computer-science literature pulled ahead of the statistics literature in the 1990s-2000s.

6 Likes

Appreciate the reference. I am stunned by how elegantly you have been summarizing such complex topics in these last few posts. And you even predicted our upcoming project which is indeed to apply the framework using more flexible models. It is foreshadowed in the parts where we discuss Bayesian nonparametric modeling. Having said that, because we do not always have the data or the methodology available to apply such flexible modeling for many clinical decisions, I do indeed also think that the research done by @AndersHuitfeldt and others can be important for practical applications in medicine. My intuition is that his insistence on anchoring relative treatment effect summary measures to potential biological / mechanistic interpretation will pay off.

5 Likes

Isn’t an RCT directly transportable to clinical settings? If an RCT shows ATE > 0, then, on average, the treatment has a positive effect. That seems like useful information in a clinical setting.

The problem from a doctor’s point of view is that there are varying degrees of disease severity, so that when the result of an RCT is applied to an individual patient the risk difference (AKA absolute risk reduction) has to take this into consideration. This means that baseline numerical test results that provide a measure of disease severity for the treatment and control groups have to be available from the RCT in order to make the above assessment.

By choosing treatment, confounding can occur.

A doctor advising a patient whether to accept or choose treatment will take into account not only the severity of the condition but also what dose to take. Both affect not only the intended effect of treatment but also its adverse effects. I can’t think offhand of an example where this process results in confounding, when the choice and ‘doing’ of treatment results in an increased frequency of a desired outcome over and above what would be expected from an RCT.

Why couldn’t this excess death rate be due to poor choices of treatment in the observational setting?

I suppose that this could happen if a drug was given inappropriately (e.g. giving intravenous fluids to someone in congestive heart failure who is already overloaded with fluid, which would be perverse and malpractice). Excess death due to a drug would be by definition an adverse effect or ‘harm’. All treatment can cause benefit and harm but by different causal mechanisms. One cannot prove benefit or harm in an individual because, as I understand it, many may experience a good outcome anyway without a drug (e.g. no MI without a statin) and many will also suffer what appears to be a drug’s adverse outcome without taking the drug (e.g. muscle pain without taking a statin).

Context is important, so I would have thought any discussion about cause and effect must take place with a detailed understanding of the biological mechanisms, including the multiple feedback mechanisms involved. The result of such reasoning will always be a hypothesis that has to be tested by experimentation (i.e. RCTs, observational studies etc.), the results of which will be uncertain due to limited data and stochastic processes. I have experimented with fitting parametric distributions, splines and logistic regression functions and then placing confidence intervals on the estimated probabilities of outcomes conditional on disease severity, treatment and control, and also confidence intervals on the differences between these probabilities.
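For what it is worth, here is a minimal sketch of that kind of exercise (simulated data, arbitrary variable names, and a simple bootstrap for the intervals rather than any particular analytic method):

```python
# Sketch only: probabilities of the outcome conditional on severity and
# treatment vs control, with bootstrap intervals for the probabilities and
# for their difference at a chosen severity value.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 1000
severity = rng.normal(size=n)                     # hypothetical severity measure
treat = rng.binomial(1, 0.5, size=n)
p_true = 1 / (1 + np.exp(-(-0.5 + 1.0 * severity - 0.8 * treat)))
y = rng.binomial(1, p_true)

def fitted_probs(idx, s0=1.0):
    """Fit on rows idx; return estimated P(y | s0, control) and P(y | s0, treated)."""
    X = np.column_stack([severity[idx], treat[idx]])
    m = LogisticRegression().fit(X, y[idx])
    return m.predict_proba([[s0, 0], [s0, 1]])[:, 1]

boot = np.array([fitted_probs(rng.integers(0, n, n)) for _ in range(500)])
diff = boot[:, 0] - boot[:, 1]                    # control risk minus treated risk
print("control risk 95% interval:   ", np.percentile(boot[:, 0], [2.5, 97.5]).round(3))
print("treated risk 95% interval:   ", np.percentile(boot[:, 1], [2.5, 97.5]).round(3))
print("risk difference 95% interval:", np.percentile(diff, [2.5, 97.5]).round(3))
```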

Although I get the gist of the discussion between @Sander_Greenland, @F2harrell, @R_cubed, @Pavlos_Msaouel, @AndersHuitfeldt, @davidcnorrismd and @S_doi, I am still not clear as to how it relates in detail to my attempts to understand these processes. My aim is to foster a better mutual understanding between statisticians, researchers and practicing physicians. I would therefore be grateful to @Sander_Greenland especially and others if they could comment on the suggestions in my recent posts of Should one derive risk difference from the odds ratio? - #340 by HuwLlewelyn and Should one derive risk difference from the odds ratio? - #359 by HuwLlewelyn and how especially disease severity, treatment, control, probabilities of outcomes and their confidence limits relate to the expressions f(y_x;z,u) and E(y;x,z,u).

5 Likes

One way to connect these would be to take, for example, a patient who has prostate cancer. X is the choice between a treatment or control (control here would be surveillance, i.e., no treatment). Y is the outcome of interest, which here can be overall survival time (expressed for example as posterior mean overall survival probability). U denotes the specific patient for whom we are called upon to make the choice. Z is a vector of all his covariates that can influence Y. Thus Z includes disease severity, but also other considerations such as the patient’s comorbidities as well as biomarkers on the tumor cells reflecting the mechanisms targeted by the treatment. Small letters denote specific values of these variables.

While disease severity is certainly an important consideration when estimating the effect of X = x on Y, it is not enough. For example, the patient may have relatively indolent prostate cancer (disease severity) and severe cardiovascular comorbidities that could actually result in decreased survival time if we choose a therapy such as androgen deprivation therapy (ADT) as opposed to surveillance. His cancer may also harbor mutations in androgen receptor signaling (again different from disease severity) that can negate the effect of ADT on the cancer even though it will still yield cardiovascular toxicity.

2 Likes

I think these are important questions and look forward to specific clinical illustrations as answers. One thing I would add is that in your first post tagged above you plotted probabilities in the example from a model, and different models (logit or log-binomial, for example) will return similar probabilities on the same dataset; what will differ is the value of the effect measure in strata with different baseline risks. A dataset can indeed be created that returns a constant RR, OR or RD over strata that differ by baseline risk, but predictive margins plotted from the different models on datasets with different constant effects should be illustrative.
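As a toy numerical illustration of that point (my own numbers, not taken from either post): hold the odds ratio constant, and the risk difference and risk ratio still vary across strata with different baseline risks.

```python
# Toy illustration: a constant odds ratio implies different risk differences
# (and risk ratios) in strata with different baseline risks.
def risk_under_or(p0, odds_ratio):
    """Treated-stratum risk implied by baseline risk p0 and a constant odds ratio."""
    odds1 = odds_ratio * p0 / (1 - p0)
    return odds1 / (1 + odds1)

for p0 in (0.05, 0.20, 0.50, 0.80):
    p1 = risk_under_or(p0, odds_ratio=0.5)
    print(f"baseline risk {p0:.2f} -> treated risk {p1:.3f}, "
          f"RD {p1 - p0:+.3f}, RR {p1 / p0:.2f}")
```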

Thank you @Pavlos_Msaouel and @S_doi. I agree that there are many ways of creating predictive models, many of which may give similar results. I also agree that there are a number of ways of measuring disease ‘severity’. However, we have to identify measures of severity that are highly predictive of the chosen outcome and which also identify subjects that will respond to the treatment. Instead of using one measure of severity (e.g. the albumin excretion rate (AER), as I do) one could incorporate other measures to form a score (e.g. combining the AER, HbA1c etc.) and then compare the predictive power of the calibrated score with each individual measure (e.g. AER and HbA1c). There might well be little difference, for example because the angiotensin receptor blocker (ARB) would have little effect on those diabetic patients with a high HbA1c. The baseline risk might be higher conditional on the placebo, HbA1c and the AER, but the risk reduction due to the ARB would be no greater than that conditional on the AER alone, as the ARB would not improve diabetic control.

Another central issue is how well calibrated the probabilities of the outcome are, which I don’t think has featured in the recent discussion, including by @Sander_Greenland. This of course raises the question of how we should calibrate. One simple preliminary test of calibration that I use is that (1) the average of all probabilities read from a curve should match the overall frequency of the outcome and (2) the average of all probabilities read from a curve up to some point should match the overall frequency of the outcome up to that point, the same applying to those above the point. If the probabilities are not well calibrated then we have to derive some function that does make them so. The data used to calibrate then have to be regarded as ‘training’ data and the calibrated curve tested on another data set. I think that the issue of calibration needs clarifying.
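To make checks (1) and (2) concrete, here is a minimal sketch with hypothetical predictions and outcomes (the cut-point and data are arbitrary):

```python
# Sketch of the two simple calibration checks described above:
# (1) overall mean predicted probability vs overall outcome frequency, and
# (2) the same comparison below and above a chosen cut-point on the curve.
import numpy as np

def calibration_checks(p_hat, y, cut=0.5):
    p_hat, y = np.asarray(p_hat), np.asarray(y)
    lo, hi = p_hat <= cut, p_hat > cut
    return {
        "overall": (p_hat.mean(), y.mean()),
        f"p_hat <= {cut}": (p_hat[lo].mean(), y[lo].mean()),
        f"p_hat > {cut}": (p_hat[hi].mean(), y[hi].mean()),
    }

rng = np.random.default_rng(2)
p_hat = rng.uniform(0.05, 0.95, 500)     # hypothetical predicted probabilities
y = rng.binomial(1, p_hat)               # outcomes generated to be well calibrated
for label, (mean_pred, freq) in calibration_checks(p_hat, y).items():
    print(f"{label}: mean predicted {mean_pred:.3f}, observed frequency {freq:.3f}")
```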

1 Like

My journey to understand this thing called “evidence based decision making” started with a vague intuition that the norms taught to me and my colleagues were logically flawed. Since then I’ve been motivated to borrow tools from mathematical and philosophical logic to formalize this beautiful narrative description of principled and honest scientific discourse by Paul Rosenbaum in his book Observational Studies (section 1.3). Here is an excerpt, but the entire short section is worth thinking about.

I’ve learned much from Sander’s (and Frank’s) writings and posts. I credit his presentations on rebuilding statistics upon information theoretic grounds as crucial for scientific understanding.

I find it surprising that there are a large number of scholars who think practicing good science is distinct from (Bayesian) decision theoretic considerations. As a first order approximation of a rational scientific actor, an agent who attempts to maximize the information from “experiments” (defined to include observational studies) seems like a good starting point.

I acknowledge this is a minority position, but after much study, I have to disagree with the causal inference scholars who claim probability is not a strong enough language with which to express causal concepts. There were some interesting Twitter threads (now deleted, sadly) where Harry Crane and Nassim Taleb challenged Pearl on his position that causation is outside of standard statistical inference.

Causal inference is closely related to exchangeability, and disagreements about study design are better discussed in terms of what factors render the groups being considered not exchangeable.

Causal inference is just inference.[1] A community of scholars can be modeled as a group of bettors: those with the best models of future observations (in the sense that their forecasts enable them to win more than they lose on a consistent basis) come out ahead. Converging to the best causal model ends the process of betting on outcomes, unless someone finds an anomaly worth betting on, of course.

Possession of good causal models enables one to be like the gambler in JL Kelly’s paper A New Interpretation of the Information Rate.

  1. Rohde, D. (2022). Causal Inference, is just Inference: A beautifully simple idea that not everyone accepts. Proceedings of “I (Still) Can’t Believe It’s Not Better!” at NeurIPS 2021 Workshops, PMLR 163:75-79. The supplement is just as valuable as the main paper.

Further Reading

Gelman’s blog had an in-depth discussion: Causal inference in AI: Expressing potential outcomes in a graphical-modeling framework that can be fit using Stan

This comment from Daniel Lakeland in that thread sums up my attitude on this elegantly:

Blockquote
I found it very frustrating to talk with Pearl regarding these issues (there was a long exchange between us on this blog about 3 or 4 years back), because I came to the conclusion just as you have that his understanding of what is probability theory and statistics is entirely frequentist … and my understanding was Bayesian… and so we talked past each other… He even acknowledged knowing about the development of the Cox/Jaynes theory of probability as extended logic, but seemed to gloss over any actual understanding of it.

Another Gelman post is here: Resolving disputes between J. Pearl and D. Rubin on causal inference

2 Likes

The outcome in this case is a continuous variable, or categorical with bins for ranges of severity. The ATE is a difference of expected values: E[Y_1 - Y_0]. I understand that a physician will still want more information from the studies. Separating the ATE into its components would seem to be useful: E[Y_1] and E[Y_0]. Nevertheless, if no other information is available, an RCT’s ATE result would seem to be useful to some degree.

One simple example of a confounder would be income. A patient with high income can afford a treatment recommendation and is also less likely to suffer from low-income situations like malnutrition or financial stress. This can increase both the likelihood of taking the treatment and the likelihood of a positive outcome. This particular confounder may be weak, and a situation may simply not have any strong confounders. Then the data from the observations would not contribute much to probabilities of causation such as P(\text{benefit}) and P(\text{harm}).

Confounding in observational studies is actually helpful when estimating probabilities of causation, because it can tighten the bounds on P(\text{benefit}) and P(\text{harm}). But a good observational study may not be feasible or available.

True, we can’t go back in time and withhold a drug that was administered (or vice-versa) to see what the two results would have been for an individual. We can only estimate ranges of probabilities of how a person will react, with and without treatment.

Yes, ideally we encapsulate these biological mechanisms in the form of a DAG and possibly functional mechanisms between variables (or at least constraints on these functions). However, in the absence of biological mechanisms, we can still compute bounds (probability ranges) on probabilities of causation. Those ranges might be looser/wider without the expert knowledge, but they can often be narrow enough to make good decisions from.
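For readers wondering where such ranges come from, here is a sketch of the bounds on the probability of benefit, PNS = P(y_x, y'_{x'}), as I understand them from Tian and Pearl (2000); the input numbers are placeholders, not data from any study discussed in this thread.

```python
# Sketch: Tian-Pearl style bounds on the probability of benefit (PNS).
def pns_bounds(py_do_x, py_do_xp, p_xy=None, p_xpy=None, p_xyp=None, p_xpyp=None):
    """
    py_do_x  : P(y | do(x))   outcome risk under treatment (experimental data)
    py_do_xp : P(y | do(x'))  outcome risk under no treatment (experimental data)
    p_xy, p_xpy, p_xyp, p_xpyp : observational joint probabilities
        P(x, y), P(x', y), P(x, y'), P(x', y'); omit them to use the
        experimental-data-only bounds.
    """
    lower = [0.0, py_do_x - py_do_xp]
    upper = [py_do_x, 1.0 - py_do_xp]
    if None not in (p_xy, p_xpy, p_xyp, p_xpyp):
        py = p_xy + p_xpy                          # observational P(y)
        lower += [py - py_do_xp, py_do_x - py]
        upper += [p_xy + p_xpyp,
                  py_do_x - py_do_xp + p_xyp + p_xpy]
    return max(lower), min(upper)

# Experimental data only:
print(pns_bounds(py_do_x=0.49, py_do_xp=0.21))
# Adding (hypothetical, consistent) observational joints can tighten the bounds:
print(pns_bounds(0.49, 0.21, p_xy=0.05, p_xpy=0.20, p_xyp=0.45, p_xpyp=0.30))
```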

Upon reading your sentence again, I think I misunderstood. With disease severity, I thought you were referring to the outcome, but now I think you were referring to a covariate. In this case, what you’d want is a Conditional Average Treatment Effect (CATE): E[Y_1 - Y_0|S], where S is disease severity. So the RCT results would either be grouped by S or a function would be fit with S as input.
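A minimal sketch of the "grouped by S" version, with made-up severity strata and effect sizes:

```python
# Sketch: CATE estimated within severity strata of a (simulated) randomized trial.
import numpy as np

rng = np.random.default_rng(3)
n = 3000
severity = rng.choice(["mild", "moderate", "severe"], size=n)
treat = rng.binomial(1, 0.5, size=n)
base = {"mild": 0.10, "moderate": 0.30, "severe": 0.60}      # control risks (made up)
effect = {"mild": -0.02, "moderate": -0.08, "severe": -0.15} # risk reductions (made up)
p = np.array([base[s] + effect[s] * t for s, t in zip(severity, treat)])
y = rng.binomial(1, p)

for s in ("mild", "moderate", "severe"):
    grp = severity == s
    cate = y[grp & (treat == 1)].mean() - y[grp & (treat == 0)].mean()
    print(f"{s:9s} estimated CATE (risk difference): {cate:+.3f}")
```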

Thank you @Scott for answering my questions. You are proposing different ways of reasoning with medical knowledge by invoking some principles of causal inference. @R_cubed points out that these methods are debatable. There is also of course much disagreement amongst medical scientists when they advance alternative hypotheses based on different theories and background knowledge. The way forward of course is to conduct RCTs and observational studies to test these hypotheses, to find out what happens in practice, and to calibrate probabilities arrived at from RCTs with or without causal inference.

I remain concerned about the different advice that you and I would give a patient based on the result of the RCT and observational study described in your paper. You would assure a female patient that no harm can occur by choosing to take an ‘over-the-counter’ drug, whereas I would warn her that it is unsafe unless taken with the same close supervision as during the RCT. I would be grateful if you could explain why we would give this different advice.

My numbers and hypothetical advice assume that the patient would administer treatment properly (at the same level as was done in the RCT). Of course, if there’s a risk of the patient not administering treatment properly then that has to be taken into account in the advice and discussion about the treatment.

But the observational study shows clear evidence that more die after choosing to take the drug than die after choosing not to take it. This implies that allowing the patient to choose ‘causes’ many more to die (due to an adverse effect from one causal mechanism, such as not taking the drug properly, etc.) than are ‘saved’ (by some other causal mechanism suggested by the RCT). Accordingly you surely cannot assume that such an adverse effect will not happen again and then conclude that a future patient choosing to take the drug has a zero probability of dying from the adverse effect. In the light of the observational study, the FDA would surely refuse a license for the proposed use (i.e. allowing the patient to choose). There seems to be a divergence here between our causal inference processes! It seems that, amongst other things, you may not be allowing for the possibility of more than one causal mechanism happening ‘in parallel’ at the same time.

2 Likes

I found a number of excellent videos on this dispute about causal inference that should be of interest.

Speaker bios

The presenters include Larry Wasserman giving the frequentist POV, and Philip Dawid and Finnian Lattimore giving the Bayesian one.

The ones I’ve made it through so far:
Larry Wasserman - Problems With Bayesian Causal Inference

His main complaint is that Bayesian credible intervals don’t necessarily have frequentist coverage. I’d simply point out that, from Geisser’s perspective, in large areas of application there is no parameter. Parameters are useful fictions. This is evident in extracting risk-neutral implied densities from options prices (i.e. in a bankruptcy or M&A scenario). But he discusses how causal inference questions are more challenging from a computational POV than the simpler versions seen in the assessment of interventions.

Philip Dawid - Causal Inference Is Just Bayesian Decision Theory

This is pretty much my intuition on the issue; Dawid shows how causal inference problems are merely an instance of the more general Bayesian process and equivalent to Pearl’s DAGs, given the assumption of vast amounts of data.

Finnian Lattimore - Causal Inference with Bayes Rule

Excellent discussion on how to do Bayesian (causal) inference with finite data. I’m very sympathetic to her POV that causal inference isn’t all that different from “statistical” inference, but one participant (Carlos Cinelli) kept insisting there is some distinction between “causal inference” and decision-theory applications. He seemed to insist that modelling the relation among probability distributions is meta to the “statistical inference” process, while to a Bayesian that step seems a natural part of it.

Panel Discussion: Does causality mean we need to go beyond Bayesian decision theory?

The key segment is the exchange between Lattimore and Carlos Cinelli from the 19:00 mark to 24:10.

From 19:00 to 22:46, Carlos made the claim that “causal inference” is deducing the functional relation of variables in the model. Statistical inference for him is simply an inductive procedure for choosing the most likely distribution from a sample. He calls the deductive process “logical” and, in an effort to make Bayesian inference appear absurd, concludes that “If logical deduction is Bayesian, everything is Bayesian inference.”

From 22:46 to 23:36 Lattimore describes the Bayesian approach as a general procedure for learning, and uses Planck’s constant as an example. In her description, you “define a model linking observations to the Planck’s constant that’s going to be some physics model” using known physical laws.

Cinelli objected to characterizing the use of physical laws to specify variable constraints by asking:

Cinelli: Do you consider the physics part, as part of statistics?
Lattimore: Yes. It is modelling.
Cinelli: (Thinking he is scoring a debate point) But it is a physical model, not a statistical model…
Lattimore: (Laughing at the absurdity) It is a physical model with random variables. It is a statistical model.

Lattimore effectively refuted the Pearlian distinction between “statistical inference” and “causal inference.” Does “statistical mechanics” cease to be part of physics because of the use of “statistical” models?

The Pearlian CGM (causal graphical model) school of thought essentially cannot conceive of the use of informative Bayesian inference, which the physicists RT Cox and ET Jaynes advocated.

Another talk by David Rohde with the same title as the paper mentioned above. This was from March 2022.

David Rohde - Causal Inference is Inference – A beautifully simple idea that not everyone accepts

There are still interesting questions and conjectures I got from these talks:

  • Why do frequentist procedures, which ignore information in this context, do so well?
  • How might a Bayesian use frequentist methods when the obvious Bayesian mechanism is not
    computable?
  • How does the Approximate Bayesian Computation program (ABC) relate to frequentist methods?

This paper by Donald Rubin sparked the research into avoiding intensive likelihood computations that characterizes ABC.

A relatively recent primer on ABC

2 Likes

I would agree with @HuwLlewelyn and will try to summarize what he has said thus far (please correct me if I have made a mistake, Huw).

RCT
Gender has no prognostic value for the outcome
Drug decreases odds of death by 72.3%
In the trial this translates to a reduction in death from 79% to 51%

Observational study
Untreated and treated males have the same death proportion
Untreated males and untreated females have the same death proportion
Treated females have a 6.3 fold increase in odds of death
This translates to a change from 30% baseline risk of death to 73% in treated females

Interpretation
We port the RCT to the observational study thus:
a) Odds in treated males go from 0.43 to 0.43*(1-0.723) = 0.12 and then increase by choice back to 0.43 (i.e. the choice-related factor exactly offsets the 72.3% reduction in odds). Therefore the type of male that chooses therapy is one that for some reason has harm equal to the benefit of treatment
b) Odds in treated females go from 0.43 to 0.43*(1-0.723) = 0.12 and then, because of choice, increase to 0.43*6.3 = 2.71, i.e. an increase of about 22-fold from 0.12 (the odds-risk arithmetic is checked in the sketch below). Therefore the type of female that chooses treatment is one that will have a much, much greater harm than the benefit of treatment
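A small sketch checking the odds-risk arithmetic used in a) and b) above (it only re-derives the numbers already quoted):

```python
# Check of the odds/risk arithmetic in the interpretation above.
def risk_to_odds(p):
    return p / (1 - p)

def odds_to_risk(o):
    return o / (1 + o)

baseline_odds = risk_to_odds(0.30)       # 30% baseline risk -> odds ~0.43
rct_odds = baseline_odds * (1 - 0.723)   # RCT effect: 72.3% reduction in odds
female_odds = baseline_odds * 6.3        # observational: 6.3-fold odds increase

print(f"baseline odds:         {baseline_odds:.2f}")
print(f"odds after RCT effect: {rct_odds:.2f} (risk {odds_to_risk(rct_odds):.1%})")
print(f"treated-female odds:   {female_odds:.2f} (risk {odds_to_risk(female_odds):.1%})")
print(f"choice-related increase for females: {female_odds / rct_odds:.1f}-fold")
```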

Conclusion from observational data
We need to find out what the factor related to choice is that harms both men and women but harms women much more than men

Comparison to @scott
a) First, they tell us that the drug is not as safe as the RCT would have us believe, it may cause death in a sizable fraction of patients.
I would say this does not follow from the above – rather it is the choice-related factor that is the culprit here
b) Second, they tell us that a woman is totally clear of such dangers, and should have no hesitation to take the drug, unlike a man, who faces a decision; a 21% chance of being harmed by the drug is cause for concern.
This again does not follow from the data because both seem to be harmed by the ability to choose but women much more than men
c) Physicians, likewise, should be aware of the risks involved before recommending the drug to a man.
Does not seem the right decision again for a physician
d) Third, the data tell policy makers what the overall societal benefit would be if the drug is administered to women only; 28% of the drug-takers would survive who would die otherwise.
Again the data seem to suggest otherwise

I would be grateful for input from @HuwLlewelyn and @scott and am happy to be corrected in these calculations, but my concerns seem mostly in line with what Huw has been saying all along regarding the clinical decision-making implications of these studies.

[Figure: predictive margins from logistic regression (hypothetical)]

1 Like

Thank you. You do give a very good representation of my reasoning.

1 Like

I have decided to delete the previous post and put up a more intuitive one for possible comments from @scott and @HuwLlewelyn, which may be more helpful in understanding the issues raised:
[Image: Bayes]

I saw an exchange recently on Twitter and thought it was relevant to this thread. One poster wrote:

Can we express this sentence in mathematics: “similar patients given identical treatments will have different values in different studies”.

The assumption overriding counterfactuals is that Y(1, u) and Y(0,u) exist, and are immutable properties of u (the patient). Is it wrong?

Clinically speaking, the answer is “yes,” this assumption usually IS wrong when “u” is a human being. And it is exactly this fundamental misunderstanding that clinicians and statisticians find so frustrating about the hype surrounding the potential for “personalized medicine.”

Those who traffic, professionally, in stochasticity (as physicians and statisticians do), seem better placed to appreciate its scope (and therefore to give it the respect that it deserves) than those in other fields. Human biology/physiology and behaviour are each far more complex and far less predictable than a circuit board- and when human physiology and behaviour interact with each other in determining “response” to a treatment, look out- the number of possible outcomes is unfathomable.

Provided that nobody has tampered with the wiring in my house, I expect that when I turn off the breaker to my stove, it will shut off. When I flip the breaker the other way, I expect my oven to turn on. Sadly, patients are not as predictable as my oven. “Responsiveness” to a treatment is only credibly viewed as an “immutable” property of a patient in a very narrow set of clinical scenarios. Even in situations where a patient’s response to a given exposure has, historically, been highly predictable (e.g., allergic reactions), responses nonetheless often attenuate over time. In oncology, where a tumour might initially respond to a treatment that blocks a biologic pathway driving the tumour’s growth, patients eventually, unfortunately, often stop responding to treatment.

Physicians have seen so many “unexpected” outcomes in their careers, that the unexpected is the only thing we have learned to expect in terms of patients’ response to treatment. If I have a patient with recurrent major depression, I am not the least bit surprised if the antidepressant that worked for her 5 years ago does not seem to work this time around. The same is true for treatment of many other conditions, including acute and chronic pain (e.g., migraine), therapies for substance abuse, epilepsy, lung disease, gynecologic disease, infectious disease…the list is endless. Rarely will a physician be surprised when a previously effective treatment does not generate the “expected” response.

An illustration of the pervasiveness of stochasticity in medicine and its impact on treatment “response”: I am not overly surprised if an otherwise healthy older patient who is anticoagulated for chronic atrial fibrillation nonetheless presents one day to the ER with a TIA. This is an “unexpected” event only in the sense that we had hoped that her anticoagulant would have made her absolute risk for TIA/stroke very low. But then, in follow-up a few days later in my office, the mystery is solved, when, on taking a careful history, the patient recalls that she had been distracted by an unplanned visit from her daughter and forgot to take her DOAC for 3 days prior to the event…

In short, human biology/physiology changes constantly and a patient’s “response” to a treatment is affected not only by these changes (which, in turn, are often affected by his environment), but also by innumerable ways in which his comorbidities interact over time, by his behaviour/decisions (in which case, physiologic complexity is effectively multiplied by behavioural complexity), and by innumerable stochastic factors that are part of everyday life. Physicians know that there are very few “immutable properties of u.”

4 Likes