Individual response

This discussion is incredibly helpful. @HuwLlewelyn joins @Stephen Senn in being the most impressive scientists I’ve known in their abilities to cut through arguments of others and to make cogent new arguments. It confirms what @Stephen has argued repeatedly: that principles of experimental and clinical trial design must be brought to causal inference about treatment effects. The discussion also confirms my previous feeling that outside of special situations (such as analysis of treatment effects within RCTs compensating for non-adherence to treatment) causal inference remains a theoretical nicety and a great thought organizer but has not yet been translated to practical application in treatment evaluation. Hence the lack of uptake on the challenge put at https://discourse.datamethods.org/t/examples-of-solid-causal-inferences-from-purely-observational-data.

6 Likes

One area where causal inference might be translated to a practical application in treatment evaluation is when taking HTE into consideration. This was a tweet that I addressed to Judea Pearl recently to which he did not reply:

In RCTs Irbesartan reduces risk of nephropathy. HbA1c & AER are risk factors. According to ‘causal’ medical theory, Irbesartan should reduce AER but not HbA1c. For HTE, should risk reduction be estimated due to that of AER alone & not HbA1c? How does CI notation express this?

How would @Stephen and others in this discussion design a study to answer this question?

I have yet to study Huw’s reply in detail but on a brief read I think that it gets to the nub of the argument. It seems baffling to me that consistency is considered to be reasonable or practical. However, I wonder if in fact M&P depends on more than just “that an individual response to treatment depends entirely on biological factors, unaffected by the settings in which treatment is taken”. The individuals contributing information from the observational studies are not the same individuals as in the RCTs. Thus we have to be able to assume that the two sets of individuals are exchangeable to the extent needed in order to be able to solve for the unknowns. I do not consider this to be a reasonable assumption and referred to “study effects” as being a problem. The TARGET study is an excellent example of the problem: see “Lessons from TGN1412 and TARGET: implications for observational studies and meta-analysis” (Senn, 2008, Pharmaceutical Statistics).
The way that study effects are dealt with in conventional statistical approaches is either by declaring them as fixed and hence eliminating them by contrasts or as declaring them as random and then trying to estimate the variance component. All of this was extensively developed in connection with incomplete block designs by the Rothamsted school in the period 1925-1945.
My view is that adding observational data does not pull the rabbit out of the hat. Adding extra equations does not necessarily render a system identifiable, in particular, if in doing so one adds more unknowns.

5 Likes

I would like to sum up following @Stephen’s and my latest skirmish with Judea Pearl on Twitter. He wrote that I was wrong to assume that p(yt) from the RCT should have been equal to p(y|t) from the observation study. However, he reasserted that p(y|t) was equal to p(yt|t), the latter being the result of a ‘Level 3’ or imaginary RCT that applies to choosers (it can be imagined after reasoning from other established beliefs but cannot be done in reality). It seems that the assumption of ‘consistency’ is therefore a Level 3 or imagined result p(yt|t) that is equal to p(y|t), the observation study result. This assumption of ‘consistency’ is therefore unverifiable and irrefutable by study, and based on personal belief leading to a forceful assertion.

The only probabilities supported by reliable data are the results of the RCT. If we are only prepared to rely on the RCT results (but not on forceful assertions based on imagination) then all we can conclude from counterfactual concepts is that P(Individual Benefit) ≥ P(yt) − P(yc), P(Individual Benefit) ≤ P(yt), and P(Individual Harm) = P(Individual Benefit) − (P(yt) − P(yc)), as I explained in a previous post. However, the latter probabilities of imaginary individual counterfactual outcomes do not seem to make any difference to practical decisions, which result in the reasoning set out in @Stephen’s Twitter response [See https://twitter.com/stephensenn/status/1617807975858704385?s=20&t=8Kkuwt3CM9K7ceCUYGnSXA].
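For readers who prefer code to symbols, these relations can be written as a short Python sketch. This is my own illustration, not from the original post; the example figures 0.49 and 0.21 are the treated and untreated survival proportions that appear later in this thread.

```python
# Sketch (illustrative, not from the post) of the counterfactual bounds stated
# above, using only the RCT survival proportions P(y_t) and P(y_c).

def benefit_bounds(p_yt, p_yc):
    """P(y_t) - P(y_c) <= P(Individual Benefit) <= P(y_t)."""
    return max(0.0, p_yt - p_yc), p_yt

def individual_harm(p_benefit, p_yt, p_yc):
    """P(Individual Harm) = P(Individual Benefit) - (P(y_t) - P(y_c))."""
    return p_benefit - (p_yt - p_yc)

lo, hi = benefit_bounds(0.49, 0.21)   # e.g. 49% survive treated, 21% untreated
# 0.28 <= P(Benefit) <= 0.49; if P(Benefit) took its upper value of 0.49,
# P(Harm) would be individual_harm(0.49, 0.49, 0.21), i.e. about 0.21.
```

The point of the sketch is that nothing beyond the two RCT margins is needed to compute the bounds, and that P(Harm) is fully determined once a value of P(Benefit) within the bounds is assumed.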

1 Like

Huw, I and others greatly appreciate your diligence on this incredibly important topic. I tried my very best to get Judea to join us here so that he could try to expand his arguments and provide details as you have done, and also to carefully read all the posts here, but to no avail. But your posts, like those of @Stephen, are also highly useful for citing in tweets. If you haven’t done this already: clicking on the 3 dots at the bottom of a post pulls up a chain-link symbol that can be clicked to get the URL leading directly to a specific reply, for inclusion in a tweet.

I agree that it is a pity that Judea Pearl does not engage in our discussions on this site. I suppose he can still follow links to find out what we are writing; I will link my Twitter posts to this site more consistently from now on! He has now responded in a general way this morning to my question about how to verify his assumptions about consistency and I have asked for a link or reference to his source. You will have learnt from my recent post on ‘solid causal inferences’ [Examples of solid causal inferences from purely observational data - #26 by HuwLlewelyn] that I have spent a lot of thought and time on how we can use post-licensing ‘observational’ studies to learn how to apply RCT results to patient care and to monitor our effectiveness. I am hoping that my diligence in participating in these discussions will help me to learn how best to explain my own ideas to the statistical and CI communities (in addition to clinicians in my own community).

1 Like

@Pavlos_Msaouel - this new R package is relevant: Multi-State Models for Oncology

Very nice. Would prefer if the survival curves show the confidence bands for the difference as per your approach, e.g., here.

You may also find interesting how we modeled disease status here in an oncology phase I-II design scenario.

“If you are referring to the example in our paper, then my conclusion is somewhat different: The FDA should license the drug for all females and lounch (sic) a study to explore the existence of features E and F that produce benefit in some males and harm in others.”

I’d love to see how this would work in practice. It’s a shame the author won’t engage here to describe his proposal further.

Mathematics is not MY TERMS or YOUR TERMS, it is a useful language to communicate ideas unambiguously, even across disciplines.

Strong disagree. This statement is only true when all important and relevant stakeholders are able to communicate with, and understand, math and symbols- and the pool of such people is very small indeed.

Communicating in math and symbols is a great way to alienate a huge swathe of relevant stakeholders who might otherwise be able to identify major conceptual blindspots. Ultimately, the rate-limiting step in the process of getting any idea implemented is the ability to make ourselves understood by others…

I will argue here that those males and females in the observation study must have been given advice based on the results of the RCT and that all the required information would have been available from the RCTs so that the observation study is not required. However, the RCT results had to be re-constructed by working backwards from the observation study. I will also address the point made by @ESMD that mathematical symbols should be linked to verbal reasoning in order to broaden discussions to make use of broader expertise.

The assumption that allowed this reconstruction was that the proportion of patients dying on no treatment in the observation study was the same as in the RCTs. Similarly, the proportion surviving on treatment in the observational study was the same as in the RCT. This information can therefore be used to reconstruct what would have happened in the RCT if information about the nature of treatment was not available or had been withheld from the participants, so that potential treatment choosers as well as those refusing had been randomised to be given treatment or no treatment.

There was clearly a big difference in the outcome of those patients choosing to take the treatment in the observational studies compared to those refusing, suggesting that it was not due to chance from some random or uninformed choice. This suggests that during the observational study, the choice was informed and based on advice as a result of what was discovered in the RCTs (or less likely known before the RCT was done but unethically withheld from the patients agreeing to participate). It can therefore be assumed that this knowledge would not have been available before the RCTs on females and males otherwise those patients who would be harmed or not helped significantly would have been excluded.

Disease severity is always known in patients recruited into a RCT. Those with minimal disease or very severe disease are usually excluded. Typically those with severe disease feel more uncomfortable and develop an unwanted outcome (e.g. death) more often than those with less severe disease and given the choice they would opt for treatment. For the sake of argument, the label {s} for severe will be applied to those who chose treatment in the observation study. However, the patient characteristic represented by {s} might have been something different (e.g. a known gene, DNA pattern or family history of anaphylaxis).

Figure 1 is what I call a ‘P Map’ that I use in my teaching and in the Oxford Handbook of Clinical Diagnosis to try to translate verbal reasoning with probabilities into mathematical symbols. The arrows represent probability statements, e.g. in Figure 1 the top arrow from right to left states that ‘of those with Not Severe {s’}, a proportion/probability of 210/300 = P(y_c|s’) = 0.7 lead to Survival (y_c)’. The remainder of Figure 1 represents the proportions and probabilities arising from those male and female participants who were randomised to the control (no treatment) group in the RCTs. They are represented by one figure because the results were identical for males and females.

Figure 1: The results of randomisation to control group in the RCTs on males and females

Referring to the notation in Figure 1, we know from the RCT that P(y_c) = 0.21, P(y’_c) = 0.79 (see green type). We are told from the Observational Study (see red type) that the feature (s) that prompted choosing treatment occurred in 70% of males and females, so P(s) = 0.7 and P(s’) = 0.3. We are also told that the 30% frequency of death in those on no treatment in the Observational Study was the same as in the RCT, so P(y’_c|s’) = 0.3. This information so far allows us to calculate all the other probabilities and proportions in Figure 1. Thus, from Bayes’ rule, P(s’|y’_c) = 0.3×0.3/0.79 = 0.114, so that P(s|y’_c) = 1 − 0.114 = 0.886. From Bayes’ rule again, P(y’_c|s) = 0.79×0.886/0.7 = 1, so that P(y_c|s) = 1 − 1 = 0.
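As a check on the arithmetic, the Figure 1 derivation can be reproduced in a few lines of Python (my own sketch; the variable names are mine, not from the post):

```python
# Sketch reproducing the Figure 1 (control-arm) arithmetic via Bayes' rule.
p_dead_c = 0.79            # P(y'_c): proportion dying on no treatment (RCT)
p_s, p_s_not = 0.7, 0.3    # P(s), P(s') from the Observational Study
p_dead_c_given_s_not = 0.3 # P(y'_c|s'): 30% of {s'} die untreated

# Bayes' rule: P(s'|y'_c) = P(y'_c|s') P(s') / P(y'_c)
p_s_not_given_dead = p_dead_c_given_s_not * p_s_not / p_dead_c   # ~ 0.114
p_s_given_dead = 1 - p_s_not_given_dead                          # ~ 0.886

# Bayes' rule again: P(y'_c|s) = P(y'_c) P(s|y'_c) / P(s)
p_dead_c_given_s = p_dead_c * p_s_given_dead / p_s               # = 1, so P(y_c|s) = 0
```

Running this confirms that all untreated patients with feature {s} die, while 70% of those with {s'} survive, exactly as the P Map shows.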

Figure 2: The results of randomisation to the treatment group in the RCT on females


The result of the RCT on females when they were randomised to treatment is shown in Figure 2. This time we were told that the 27% survival of those with the feature (s) who chose treatment in the Observational Study would have been the same in the RCT, so in the latter, P(y_t|s) = 0.27 and P(y’_t|s) = 0.73. From Bayes’ rule, P(s|y’_t) = 0.7×0.73/0.511 = 1, so that P(s’|y’_t) = 1 − 1 = 0 and, by Bayes’ rule again, P(y’_t|s’) = 0. This also means that P(s’∩y’_t) = 0 and, from Figure 1, P(s’∩y’_c) = 0.09. If p(Benefit) = [P(s’∩y_t) − P(s’∩y_c)] + [P(s∩y_t) − P(s∩y_c)] = [0.3 − 0.21] + [0.189 − 0] = 0.09 + 0.189 = 0.279, then p(Harm) = p(Benefit) − ATE = 0.279 − 0.279 = 0.
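The female treatment-arm numbers can be verified the same way (my own sketch; control-arm values are taken from Figure 1):

```python
# Sketch reproducing the Figure 2 (female treatment arm) numbers.
p_s, p_s_not = 0.7, 0.3
p_dead_t = 0.511             # P(y'_t): overall deaths on treatment
p_dead_t_given_s = 0.73      # P(y'_t|s)

# Bayes' rule: P(s|y'_t) = P(y'_t|s) P(s) / P(y'_t) = 1, hence P(s'|y'_t) = 0
p_s_given_dead_t = p_dead_t_given_s * p_s / p_dead_t

# Benefit and harm from the joint proportions (control-arm values from Figure 1)
p_benefit = (p_s_not * 1.0 - p_s_not * 0.7) + (p_s * (1 - p_dead_t_given_s) - 0.0)
ate = (1 - p_dead_t) - 0.21  # P(y_t) - P(y_c) = 0.279
p_harm = p_benefit - ate     # numerically zero
```

The code confirms that for females p(Benefit) = 0.279 equals the ATE, so p(Harm) = 0.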

The above results mean that those with and without feature {s} benefit from treatment, with more surviving (and fewer dying) on treatment than on placebo. In other words, between subsets {s’∩y_t} and {s’∩y_c}, and also between subsets {s∩y_t} and {s∩y_c}, there was only benefit from treatment and no harm, so p(Harm) was zero. However, in those with {s’}, few (30%) die on placebo. If the treatment had an unpleasant adverse effect (e.g. brain damage with life-long mental and physical incapacity) the treatment might be refused. This is what might have happened in the observation study. However, of those with the feature {s}, 100% would die without treatment, so the latter subgroup would choose it in an observation study after being so advised.

The result of the RCT on males when they were randomised to treatment is shown in Figure 3. This time we were told that the 70% survival of those with feature (s) who chose treatment in the Observational Study would have been the same in the RCT, so in the latter, P(y_t|s) = 0.7 and P(y’_t|s) = 0.3. From Bayes’ rule, P(s|y’_t) = 0.7×0.3/0.51 = 0.412, so that P(s’|y’_t) = 0.588 and, by Bayes’ rule again, P(y’_t|s’) = 0.51×0.588/0.3 = 1. This also means that P(s’∩y’_t) = 0.3×1 = 0.3 and, from Figure 1, P(s’∩y’_c) = 0.09. In contrast to the female data, ‘benefit’ only occurs between {s∩y_t} and {s∩y_c}, so for males, if P(Benefit) = [P(s∩y_t) − P(s∩y_c)] = [0.49 − 0] = 0.49, then p(Harm) = p(Benefit) − ATE = 0.49 − 0.28 = 0.21.

Figure 3: The results of randomisation to the treatment group in the RCT on males

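The male treatment-arm reconstruction described above can likewise be checked mechanically (my own sketch; variable names are mine):

```python
# Sketch reproducing the Figure 3 (male treatment arm) numbers.
p_s, p_s_not = 0.7, 0.3
p_surv_t_given_s = 0.7       # P(y_t|s)
p_dead_t = 0.51              # P(y'_t): overall deaths on treatment

# Bayes' rule: P(s|y'_t) = P(y'_t|s) P(s) / P(y'_t)
p_s_given_dead_t = (1 - p_surv_t_given_s) * p_s / p_dead_t        # ~ 0.412
p_s_not_given_dead_t = 1 - p_s_given_dead_t                       # ~ 0.588

# Bayes' rule again: P(y'_t|s') = P(y'_t) P(s'|y'_t) / P(s')
p_dead_t_given_s_not = p_dead_t * p_s_not_given_dead_t / p_s_not  # = 1: all {s'} die on treatment

p_benefit = p_s * p_surv_t_given_s - 0.0   # P(s & y_t) - P(s & y_c) = 0.49
ate = (1 - p_dead_t) - 0.21                # 0.49 - 0.21 = 0.28
p_harm = p_benefit - ate                   # 0.21
```

The code confirms the striking male result: every {s'} male dies on treatment, giving p(Harm) = 0.21 despite a positive ATE.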

In the case of males, the reconstructed RCT result was very surprising. Many more men without feature {s} were dying after treatment (actually 100%) than on no treatment, when it was 30% (exactly the same as in females). This suggested that the extra deaths on treatment were due to an adverse effect. It was also clear that none of those men surviving had taken the drug but all those dying had taken it. This would have been very noticeable to those conducting the RCT and would have prompted a detailed investigation leading to discovery of the cause (e.g. anaphylaxis or fatal failure of an organ). Those males in the observational study would therefore have been forewarned not to take the drug unless they had the feature {s}.

The optimum strategy would therefore be to treat males with the feature {s} but not to treat those without that feature (i.e. {s’}). This means that 49% would survive with {s} and treatment, and 21% with {s’} and no treatment, giving a total of 49 + 21 = 70% surviving. If none of the men were treated, 21% would survive. If they were all treated, 49% would survive. By contrast, if all the females were treated, 49% would survive compared to 21% if none were treated. If only those females with {s} were treated, 18.9% would survive, together with 21% of those not treated, giving a total of 39.9%. This is what happened in the observation study.
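The policy comparison above can be tabulated with a small sketch (mine, not from the post; the subgroup survival proportions are those reconstructed in Figures 1–3):

```python
# Sketch comparing treatment policies using the reconstructed subgroup proportions.
p = {"s": 0.7, "s'": 0.3}
surv = {  # surv[sex][subgroup][arm]: proportion of that subgroup surviving
    "female": {"s": {"t": 0.27, "c": 0.0}, "s'": {"t": 1.0, "c": 0.7}},
    "male":   {"s": {"t": 0.70, "c": 0.0}, "s'": {"t": 0.0, "c": 0.7}},
}

def policy_survival(sex, treat_s, treat_s_not):
    """Overall survival when {s} and {s'} are treated as specified."""
    total = 0.0
    for group, treated in (("s", treat_s), ("s'", treat_s_not)):
        total += p[group] * surv[sex][group]["t" if treated else "c"]
    return total

# Males: treat only {s} -> 0.70; treat all -> 0.49; treat none -> 0.21.
# Females: treat all -> 0.489 (the "49%" quoted); treat only {s} -> 0.399.
```

This reproduces the post's totals, including the 70% male survival under the optimum "treat only {s}" policy and the 39.9% female figure said to match the observation study.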

The CSM or FDA might license the treatment for all females but only the males with feature {s}.

1 Like

I’m sure there will be some who can follow the arguments above- unfortunately, I’m not among them. It would be very helpful if the authors of the paper in question could propose, in a simple, narrative way (no math, no symbols), how they would take a drug that showed promise in animal studies through the regulatory process to approval. I would like to see estimates of how many patients would be recruited to clinical trials, where patients would come from who might be included in observational studies, and when each type of study would occur relative to regulatory approval. All other arguments are moot if proposals for how such studies would be (ethically) conducted are based in a fantastic conceptualization of drug regulatory processes.

On another note, possibly of interest to you:

From post 104 in this thread:

“In any set of patients, with binary treatment and outcome, there are four types of patients: never-recoverer, benefiter, harmed-by-treatment, and always-recoverer.”

I wonder if this proposed approach is derived from the econometrics literature(?) See section 6.1 of this paper by Guido Imbens. It seems like this idea of “4 latent groups” might have originated in econometrics in the context of describing potential reactions of young men to the Vietnam war draft:

https://onlinelibrary.wiley.com/doi/10.3982/ECTA21204?af=R

"Although we initially worked within that traditional latent index framework, our then- colleague at Harvard, Gary Chamberlain, suggested that it would improve transparency to remove what he called “the somewhat mysterious variable νi,” and to use a potential outcome notation not just for the outcomes, but also for the decision to serve in the military. Here, the pair of potential treatment values,

(Wi(0), Wi(1))

denotes whether a particular individual would serve if draft-eligible (the potential outcome Wi(1) ∈ {0, 1}), and whether they would serve if not draft-eligible (the potential outcome Wi(0) ∈ {0, 1}). This notation greatly clarified our argument and made clear that there are, in principle, four different types of individuals, as presented in Table I. There are never-takers, who do not serve irrespective of their draft-eligibility status, always-takers who serve irrespective of their draft-eligibility status, compliers, who only serve if draft eligible, and defiers, who only serve if not draft-eligible."

The authors seem to be trying to extrapolate the “four latent group” econometrics concept to human biologic responses to drug treatments. The extrapolation fails in this context, but this fact might not be obvious to those trained in computer science rather than biologic science…

2 Likes

The 4 part individual effect of an intervention proposed by the causal inference community is based on counterfactuals. For example, if a group of 10 people are treated and 6 survive and then we go back in time and don’t treat, 4 survive. However 2 individuals would have survived with or without treatment (always survivors), 4 would have survived with treatment but not without (benefited), 2 would have survived without treatment but not with treatment (harmed) and 2 would not have survived with or without treatment (never survivors).

In order to discover what happened to each individual above we would need a Time Machine to treat, go back in time and not treat and then compare what happened to each individual. However Pearl & Muller calculated the above proportions (but not what happened to each individual) using various inequalities from a combination of RCTs and observational studies.

There is also a question of stochastic processes. In the messy real world if the above counterfactual study was repeated a few days later the above 2 individuals ‘harmed’ in the first study might appear in the benefit group the second time and 2 of those in the ‘benefit’ group in the first study might appear in the ‘harm’ group during the second study. The overall proportions of 6/10 and 4/10 would stay the same suggesting that the treatment was beneficial on the whole. Individuals from all 4 groups would probably jump around leaving the overall proportions the same.
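The non-identifiability behind this point is easy to demonstrate: the two margins (6/10 survive treated, 4/10 untreated) are consistent with several different four-group splits, all sharing the same benefit minus harm. A short illustrative Python sketch (mine; the group names follow the post):

```python
# Sketch: the RCT margins alone do not identify the four counterfactual groups.
# With 10 people, 6 surviving if treated and 4 if untreated, several
# (always, benefit, harm, never) splits fit both margins.

def consistent_decompositions(n, survive_t, survive_c):
    """All (always, benefit, harm, never) counts matching both margins."""
    out = []
    for always in range(n + 1):
        benefit = survive_t - always   # treated survivors = always + benefit
        harm = survive_c - always      # untreated survivors = always + harm
        never = n - always - benefit - harm
        if min(benefit, harm, never) >= 0:
            out.append((always, benefit, harm, never))
    return out

decomps = consistent_decompositions(10, 6, 4)
# The post's split (2 always, 4 benefit, 2 harm, 2 never) is only one of five:
assert (2, 4, 2, 2) in decomps
# benefit - harm equals 2 (the ATE times n) in every consistent split:
assert all(b - h == 2 for _, b, h, _ in decomps)
```

Every split satisfies the same margins, which is why, without further assumptions, the individual-level proportions "jump around" while the group-level result stays fixed.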

The problem is that Pearl and Muller don’t explain how knowing the above 4 proportions changes the advice given to an individual deciding whether to accept or decline a treatment. @Stephen and @phildawid have written a paper recently explaining why “the approach is dangerously misguided and should not be used in practice” https://arxiv.org/pdf/2301.11976.pdf. I agree that the 4 proportions are of theoretical interest only and have no place in practical decisions, including those made using established decision theory.

In my latest post 220 Individual response - #224 by HuwLlewelyn I suggest that the 4 proportions (for what they are worth) can be arrived at by using traditional diagnostic reasoning from RCT results alone using covariates (e.g. those that represent disease severity or other information such as genetic markers). Observational studies are not necessary. Also, the 4 proportions provide less information than diagnostic reasoning as explained in my ‘P Maps’.

The only way that I can envision individuals really being harmed and also benefiting from a single treatment is via two different causal mechanisms. For example, a drug might benefit by killing cancer cells but harm by killing bone marrow cells. You would then have 2×4 theoretical proportions, 4 for each of the 2 causal mechanisms, for what they are worth.

5 Likes

Thank you for this terrific narrative explanation; all your hard work to arrive at this point is greatly appreciated. But I can’t help but think that it shouldn’t have required such slogging by any reader to arrive at this elegant summary. Also, thanks for the link to the arxiv publication- mathematically-inclined readers will surely appreciate its insights.

3 Likes

How would we put that to the test? Here in this link the RCTs found no effect of perioperative pulse oximetry on the rate of postoperative complications, yet it is the standard of care. Why?

First, relying solely on RCT data, without “observational evidence” (whatever that is), pulse oximetry might be abandoned. Indeed, considered from a utilitarian perspective, perioperative pulse oximetry may be a waste of money. However, from the perspective of “individual response”, pulse oximetry is viewed as pivotal by clinicians. If there is individual benefit from pulse oximetry, how does individual harm from pulse oximetry balance that out to render no net benefit for the group under test?

Arguably an RCT is too simple to allow discovery of the individual responses which, on the extant theory, balance the harm and benefit of perioperative pulse oximetry. For this reason clinicians may largely ignore the results of RCTs when they are applied to the study of treatment or testing in a largely unmeasured bucket of highly complex, heterogeneous pathophysiologic conditions.

So RCT have limits which are more definable when observational studies are contemporaneously or previously applied.

1 Like

I am not sure this logic follows. A well conducted observational study may give more information than a poorly conducted RCT but clearly a well conducted RCT is better. These RCTs seem poorly conducted to me as both arms had the intervention and the fact that oximetry is “limited” in the control arm seems unethical to say the least. As Huw said previously, we do not need a RCT to compare outcomes between jumpers with intact parachutes and parachutes with a hole in them so the research question needs to be ethical and well chosen

The observation study described by Scott and Pearl involved not giving treatment to those in one (refusenik) group (they had mild disease perhaps) and giving it to those in another, compliant group (perhaps they had severe disease). Therefore, a comparison of treatment against no treatment in both groups was not possible in the observational study. However, they ASSUMED that the proportion surviving in the ‘mild’ refusenik no-treatment observation group was the same as in the RCT (implying that they could have discovered it from the RCT). They also assumed that the proportion dying in the treated compliant ‘severe’ group was the same as in the RCT.

In practice, an observational study and RCT should be done using the mild/severe split to check that the above proportions are the same and if not to question the fact (e.g. Was the treatment given properly in the RCT but not in the observational study?). The observational study could also contain many more subjects than the RCT and might detect rare adverse effects that were not detected in the smaller sized RCT. Even so, there could be an objection that the adverse effects could have occurred equally often in a large control group that was of course absent in the observational study.

Regarding the oximeter example, perhaps they should have by way of analogy included patients at high and low risk of hypoxaemia (or better still a range of risks from very low to very high) in their RCTs. They might then be able to tell which patients with different levels of risk (if any) would benefit.

Why? Pulse oximetry clearly can be harmful. Without an RCT how do we know whether the ATE is positive or negative? So why is it unethical? Furthermore, the studies were done, so the instant discussion relates to the science and math of the studies.

Pulse oximeters are not simple parachutes. More importantly, the many pathophysiologies of unexpected death and their evolution in the hospital are not comparable to death caused by precipitous deceleration after an inadequately impeded force of gravity…

Interesting idea but it is actually the timing of the hypoxemia in relation to the death pattern not its occurrence which present theory suggests determines the benefit or risk of pulse oximetry and that cannot be known prior to the RCT.

I provide this example to show that, when engaging complex and dynamic questions in the setting of highly heterogeneous pathophysiology, the RCT cannot provide the “why” which is often needed to understand results. Why didn’t the RCT show a benefit or harm? We are left speculating and, worse, if it does not give us the result we were sure it would, we default to arguing that the RCTs were poorly done and ignore the results. But here, if we believe that pulse oximeters provide benefit, then the result of the RCTs suggests that they cause harm which balanced out the benefit. So there is important information derived from the RCT; you just have to put the bias aside, trust the results, and then explore the reasons why pulse oximetry might be harmful.

I agree. The ‘why’ in the form of a possible subsequent explanation for an outcome with and without intervention should have been part of the original hypothesis being tested by that RCT. We should try to reason why and under what circumstances a pulse oximeter should help by reducing the frequency of some unwanted outcome, and design an RCT to test this hypothesis.

1 Like

Yes. It’s important not to indict the RCT method because some people, historically, haven’t designed them with enough foresight/care.

Assay sensitivity seems to be the key ingredient missing in the design of many of the RCTs Lawrence is concerned about. “Post-operative status” isn’t a disease, so maybe it’s not a great inclusion criterion for an RCT (in contrast to e.g., acute occlusion MI or gallstone pancreatitis).

Randomly assigning an experimental intervention to subjects who are in a “physiologic state” that can be arrived at by many different biologic pathways, without a deep understanding of the prognostic distribution of untreated subjects, isn’t an ideal approach. If we could turn back the clock to a time before perioperative pulse oximetry became routine, we could maybe imagine a better way to design RCTs to reveal its benefits. For example, a reasonable first step might have been careful analysis of cases involving patients who died suddenly in the perioperative period while not being monitored. Attention to cause of death and measures that might plausibly have averted a bad outcome (e.g., a pulse oximeter alarming), might have identified a patient subset that is more likely to benefit from oximetry. After that, an RCT aiming to corroborate a benefit for oximetry could have been enriched with higher risk patients. Observing more adverse outcomes might have allowed any intrinsic benefit of oximetry to be detected more efficiently. For example, maybe a trial that enrolled only post-op patients with COPD or neuromuscular disease could show a benefit, whereas an RCT involving “all-comers” in the post-op period, would not.

The above process sounds logical enough. But once a medical practice has become firmly established, there will be many who argue that clinical equipoise has been lost. This is especially true if the intervention is cheap and doesn’t use a lot of resources, where downsides to empiric intervention, even without RCT “proof” of benefit, are hard to fathom, and where the potential consequences of not intervening are serious. The second to last point is the real nub of the issue. Often, some stakeholders perceive potential downsides to an intervention, where other stakeholders don’t (see the endless debate re mask mandates during the pandemic). In the case of pulse oximetry, every anesthetist probably can recall a few cases where a pulse oximeter was the first indicator of a patient’s unanticipated abrupt postoperative decompensation- those types of cases probably stick with a person for a very long time…

Finally, it seems important not to start seeing the potential for qualitative interactions everywhere we look. While their presence might be more plausible in a poorly-designed RCT that has lumped a pile of patients together who have no business being part of the same experiment, a well-designed RCT, focusing on patients with more homogeneous disease (e.g., acute occlusion MI) would probably be much less likely to involve important treatment by patient qualitative interactions.

Arguing that EVERY ostensibly neutral RCT plausibly might be “hiding” signals of efficacy that have simply been “obscured” by qualitative interactions, assumes that EVERY treatment we can imagine plausibly has the potential to benefit some patients and harm others- we just need to keep examining people on a more and more granular level in order to distinguish “responders” from “non-responders.” But of course, this argument is susceptible to infinite regress and isn’t a realistic basis for approving new drugs and devices.

4 Likes

Yes, here they were studying a continuous testing device generating a dynamic testing result. Such dynamic testing invariably transitions from true negative to false negative to true positive. The period of false negativity poses risk to any subset of patients requiring time-sensitive intervention because it induces a false sense of security which may cause a delay in critical, time-sensitive intervention.

Now we see that the mix of pathophysiologies renders a mix of patients wherein the period of false negativity is long (harm) and wherein it is short (benefit). So this is the same fundamental problem I discussed previously in relation to RCTs of poorly defined synthetic syndromes, like sleep apnea, sepsis, and ARDS.

The key here is that there has been a decades-long general misunderstanding of the applicability of RCTs in the investigation of testing or intervention in poorly defined populations, where the measurement (e.g. “all patients having procedure X”) is not a valid measurement for the treatment or test being studied.

Finally, the dynamic behavior of the individual confusion matrices in specific relation to the range of pathophysiologies under test must be understood. All of these require deep observational research to learn the dynamic relational patterns of the target adverse conditions. The idea that an RCT can routinely replace discovery in complex heterogenous environments is not true. OS are the source of requisite initial discovery.

I bring this to this “individual response” discussion because it shows the nuanced relationship between individual harm and individual benefit, how these can be routinely hidden within the average, and how this can result in wrongful conclusions. Applying the test in an RCT to a select population most likely to benefit might bias the result towards benefit if the test is then applied to a broader population, because the number of those most likely to be harmed might be diminished.

These fundamental considerations underlie the potential effects of the severity disparities induced by choice. Unless the OS is constructed with an informed and narrow focus, not only might the severity be different in the refuseniks, the pathophysiology itself might be different.