Henry Ford Observational Study on HCQ in COVID-19: a failure of basic peer review

A basic criterion for CRITICAL illness is mechanical ventilatory support.
https://dx.doi.org/10.1016%2Fj.cjca.2020.04.010

So we talked about the limitations of selecting a few lab tests like D-Dimer or a SOFA score plus demographics to adjust for mortality. We talked about determining when those selections are sufficient or insufficient (as suggested in the initial post for HCQ).

Now let's consider the tests which might be needed. In the excellent article linked below we see Figure 4. Note the cytokines (IL-6 in particular). Note the potential presence or absence of cytokine release syndrome (CRS).

Most of these are not measured routinely. This raises the question: if a signal is not measured or determined, can it be ignored? In other words, if we did not ask the age, can we ignore age when adjusting?

This is not a rhetorical question.

My comments on datamethods have been primarily focused on seeking methods to achieve reproducibility of critical care research.

The initial post in this thread related to an HCQ observational study, and the author identified lab values not considered re: the mortality endpoint. I agreed and pointed out that this is true of all critical care studies. The problem should be addressed in a formal review.

Yet what if the primary endpoint itself is complex (like a SOFA score or “time to recovery”)?

Now SARS-CoV-2 is the deadliest pandemic virus in 100 years, but despite that, those studying Remdesivir did not choose mortality as the primary endpoint. If they had, it would have been a negative study. Over 50 days out, we are still waiting for the 28-day mortality data.

Yet “time to recovery” is not a trivial endpoint; how reproducible is the math that determines this endpoint in this study?

Here is the primary endpoint of this RCT.

"The primary analysis was a stratified log-rank test of the time to recovery with remdesivir as compared with placebo, with stratification by disease severity…

The primary outcome measure was the time to recovery, defined as the first day, during the 28 days after enrollment, on which a patient satisfied categories 1, 2, or 3 on the eight-category ordinal scale… "
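To make that definition concrete, here is a minimal sketch of how it could be computed from a patient's daily record of the eight-category ordinal score. The function name, data layout, and scores are hypothetical illustrations, not the trial's actual code:

```python
def time_to_recovery(daily_scores, recovery_cats=(1, 2, 3), window=28):
    """Return (day, event) where day is the first day within `window`
    days after enrollment on which the ordinal score falls in
    `recovery_cats`; patients who never reach a recovery category
    are censored at `window` (event = False)."""
    for day, score in enumerate(daily_scores[:window], start=1):
        if score in recovery_cats:
            return day, True   # recovery event observed
    return window, False       # censored at day 28

# Hypothetical patient: category 5 at enrollment, improves to 3 on day 6
scores = [5, 5, 5, 4, 4, 3] + [2] * 22
print(time_to_recovery(scores))  # -> (6, True)
```

Note that everything after the first crossing is discarded; the stratified log-rank test then compares only these (day, event) pairs between arms.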

The goal is to have reproducible math defining the condition AND defining the primary endpoint so the results of the continuous formula including the math in between (the statistics) are reproducible.

Here the term “with stratification by disease severity” is concerning. In critical care, “disease severity” stratification is very difficult and may not be reproducible. In the ICU we have “severity of illness” (the overall severity of a patient's condition) and “disease severity” (the severity of the disease under test). Even the narrower of the two (disease severity) is difficult. For example, treatment of pneumonia with a mechanical ventilator does not necessarily indicate greater disease severity than treatment with HFNO2.

As always, I am asking for help here.
Is this method of time to recovery (TTR) a solid primary endpoint?
How might TTR be better measured?


I am in the process of compiling a list of faults with the time to recovery endpoint. Some of the points I’m listing involve problems in counting bad events when considering time to a good event, informative censoring, inability to deal with missing data in the middle, and lack of statistical power. It would be helpful if everyone can think of particular clinical outcome scenarios that would fool time to recovery.

In my new detailed COVID-19 design document I’m pushing for longitudinal ordinal outcomes for therapeutic studies. These encompass TTR as a special case but capture so much more information, hence will lower the sample sizes needed to obtain sufficient evidence.


What about relapse? For example, the patient is enrolled at the lower end of the disease spectrum and initially improves enough to meet the “recovery” endpoint. But then the patient relapses and gets even worse than the initial disease severity and remains in that condition at the end of the trial. Is that a true “recovery” event?
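A toy trajectory (hypothetical numbers) makes the concern concrete: a first-crossing definition of recovery records an event even when the patient later deteriorates and finishes the trial worse than at entry:

```python
# Hypothetical 28-day record on the eight-category ordinal scale:
# improves to category 3 by day 3, relapses to 7 by day 8, stays there.
traj = [4, 4, 3, 3, 3, 5, 6, 7, 7, 7] + [7] * 18

# First-crossing "time to recovery" (categories 1-3):
ttr = next(day for day, s in enumerate(traj[:28], start=1) if s <= 3)

print(ttr)        # -> 3: scored as recovered on day 3
print(traj[-1])   # -> 7: yet worse than baseline at end of trial
```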


> I am in the process of compiling a list of faults with the time to recovery endpoint.

I’m looking forward to this.

I doubt I’m the only one, but I would greatly benefit from a post where you analyze the various endpoints used in clinical research from a statistical point of view, their limitations, and what could better answer the clinical question.

This goes back to a point you expressed in a very old thread:

> 1. Understand the measurements you are analyzing and don’t hesitate to question how the underlying information was captured.

One definition I’ve seen requires “sustained recovery,” which will solve that problem if the “sustain” period is long enough, but then it’s not a proper endpoint because it requires a peek into the future for a patient to be classified as “recovered”.
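A minimal sketch shows both the fix and the look-ahead problem. The function name and the 7-day window are my assumptions for illustration, not from any particular protocol:

```python
def sustained_recovery_day(scores, sustain=7, recovery_cats=(1, 2, 3)):
    """First (1-indexed) day d such that the score stays in
    `recovery_cats` for `sustain` consecutive days starting at d.
    Note the look-ahead: classifying day d requires data through
    day d + sustain - 1, so it is not observable in real time."""
    for d in range(len(scores) - sustain + 1):
        if all(s in recovery_cats for s in scores[d:d + sustain]):
            return d + 1
    return None  # never sustained within the observed record

# Hypothetical patient: reaches category 3 on day 4 but relapses on
# day 7; sustained recovery is only declared starting day 8.
scores = [5, 4, 4, 3, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 2]
print(sustained_recovery_day(scores))  # -> 8
```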

I need advice about the most convincing and insightful way to present this. There are 3 approaches I can think of:

  • mock up 3 clinical scenarios and show how they are scored re: time to recovery vs. using more granular time-oriented information
  • re-analyze a completed clinical trial that used an inefficient endpoint (there are way too many to choose from but data availability is at issue) by using an efficient longitudinal endpoint and show the difference
  • simulate a trial that is analyzed multiple ways (but how to create the simulation model? how to keep it from being biased towards one analysis method being better than another?)

I would argue for adopting a dynamical systems perspective, especially in the critical-care context where equilibria, departures from them, and efforts to restore them, are central to clinicians’ thinking.

A suitable generic framework for inferring states of dynamical systems from measurements on them is provided by filtering aka ‘data assimilation’ [1]. These are inherently Bayesian concepts, it seems to me, so should have some broad appeal here.

From a philosophy-of-science perspective, I think this formulation helps clarify that the measurements—notwithstanding the great importance of good measurement—are not the (noumenal) primary objects of interest, but are merely (phenomenal) proxies for the underlying, latent quantities (‘states’) that substantive theories of disease and therapy will concern themselves with.

  1. Künsch HR. Particle filters. Bernoulli. 2013;19(4):1391-1403. doi:10.3150/12-BEJSP07 [open access]

I can’t actualize that.

You’re using dynamical systems concepts when you run Stan; why not use them when modeling the real systems of ultimate interest? I’m sure that in your years of collaboration with cardiologists you must have heard them use language like “falling off the Starling curve”. Such catastrophe-theoretic intuitions in turn invite consideration of physiology in terms of gradient dynamical systems, which are generic and easily simulated.

Regardless of whatever particular underlying formalism you might adopt, I do think a convincing and insightful development & presentation will require positing a DGP at least one level deeper (closer to reality) than any phenomenological treatment you might offer in terms of measurements and their statistical analysis.

Sounds good but is beyond my ability to translate into a simulation. I’m able to simulate longitudinal ordinal data and time to first event data but so far haven’t thought beyond that. The longitudinal models I’m familiar with do not use time-dependent covariates or state transitions.


The DGP can have state transitions, etc., even if the models you build & estimate regard such concepts as a ‘black box’. Checking CRAN just now, I note several off-the-shelf solutions there, including this interesting package “Dynamical Systems Approach to Immune Response Modeling” updated only last week:

https://cran.r-project.org/package=DSAIRM
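To illustrate the point that the DGP can have state transitions even when the fitted model treats them as a black box, a DGP can be as simple as a first-order Markov chain over a collapsed ordinal scale. The transition matrix below is invented for illustration, nothing clinical:

```python
import random

# Hypothetical daily transition probabilities over a simplified
# 3-state scale: 1 = recovered, 2 = hospitalized, 3 = ventilated.
# Rows are the current state; columns are next-state probabilities.
P = {1: [0.95, 0.05, 0.00],
     2: [0.10, 0.80, 0.10],
     3: [0.00, 0.15, 0.85]}

def simulate_patient(start=2, days=28, seed=0):
    """Simulate one patient's daily ordinal trajectory as a Markov chain."""
    rng = random.Random(seed)
    traj, state = [], start
    for _ in range(days):
        state = rng.choices([1, 2, 3], weights=P[state])[0]
        traj.append(state)
    return traj

traj = simulate_patient()
print(traj)  # one 28-day trajectory of states in {1, 2, 3}
```

Trajectories generated this way could then be scored by time to recovery and by a longitudinal ordinal analysis, without either analysis model knowing the transition structure.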


I think this would be very instructive in the instant case.

New Remdesivir data for this old discussion.

As I pointed out in a previous post, across the US the correct, science-based skepticism of the efficacy of HCQ was not balanced with similar skepticism of Remdesivir. This was disconcerting and raised the question of political bias. We saw too much of that in 2020. It has taught the public that political bias often drives the behavior of scientists; I think they were not aware of that before. Here are some fresh Remdesivir data. I think the company also has some new data with the opposite conclusion, but I don’t have that publication.

But the final data for the ACTT-1 NIAID NIH Remdesivir study showed a mortality reduction.


Yes. The paper cites that.

The original thread here was started while awaiting the delayed mortality data.

My point was that the original NEJM pub was quite weak but few highlighted that at the time.

I suppose that ACTT-1 dominates these observational trials.


Here are more data. The findings are as expected.

https://www.acpjournals.org/doi/10.7326/M21-0653

There are several things the authors could perhaps do to make this study more informative:

  1. The authors say that
    There was a total of 2,948 COVID-19 admissions, of these, 267 (9%) patients had not been discharged, 15 (0.5%) left against medical advice, and four (0.1%) were transferred to another healthcare facility; these patients were excluded from analysis as we could not ascertain their outcome. In addition, there were 121 (4.1%) readmissions, which were also excluded
    Thus there was complete ascertainment of in-hospital outcome in all patients, so the time-to-event analysis should have been a logistic regression analysis since time to the outcome is just a proxy for severity and death.
  2. Percent O2 saturation, admission to ICU and ventilator use are all proxies for the outcome as presumably most deaths will be of those with more severe disease needing ICU care and/or ventilation. Why would one adjust for these? Better to create a proxy for ventilation or ICU admission or death and use that as the outcome.
  3. Patients selected into the analysis need to be those treated for at least X days before the outcome (I leave X for the authors to justify).
  4. Adjustments for important confounders need to be robust; e.g., older age is the most important risk factor for severity, and from Table 1 older patients were less likely to receive the intervention, so the age grouping is inadequate.

Perhaps, if the authors are reading this, they can run the appropriate logistic regression model according to points 1-4 and report the results?
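As a toy illustration of point 1 (the counts below are invented, not the study's data): with complete in-hospital outcome ascertainment, the unadjusted analysis reduces to a simple odds ratio on the binary death outcome; the full version would be a logistic regression adjusting for the confounders in point 4, fit with standard software.

```python
import math

# Hypothetical 2x2 table of in-hospital death by treatment (not real data)
treated_died, treated_survived = 50, 450
control_died, control_survived = 90, 410

or_ = (treated_died / treated_survived) / (control_died / control_survived)

# Wald 95% CI on the log odds-ratio scale
se = math.sqrt(sum(1 / x for x in
                   (treated_died, treated_survived,
                    control_died, control_survived)))
lo = math.exp(math.log(or_) - 1.96 * se)
hi = math.exp(math.log(or_) + 1.96 * se)

print(round(or_, 2), (round(lo, 2), round(hi, 2)))  # -> 0.51 (0.35, 0.73)
```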

I am joining this conversation very late. Last July (July 19, 2020 to be exact), I sent Dr. Zervos, the corresponding author for this paper, a six page “technical review” of the paper, explaining that I did not believe it had been properly peer-reviewed and noting that the journal did not accept correspondence. I probably should have posted my review somewhere like here.

Other commentators on the paper have raised the points made in my review (and some new ones), with one exception: the problem of dealing with patients admitted with a “do not resuscitate” or “do not ventilate” directive. Here is what was in my (unsolicited) review.

It is not clear whether patients with a DNR advance directive or a do not ventilate advance directive at admission could have been admitted to the hospital and included in the analysis. If so, the publication should have stated how many patients had a DNR advance directive or a do not ventilate advance directive. If such patients were admitted, the publication should have described the approach to treatment (ICU admission, ventilation, HCQ, AZM, steroids) for these patients.

The clinical approach to the handling of DNR and do not ventilate advance directives and their effect on decision to give HCQ, AZM, and/or steroids and to admit to the ICU and ventilate has a potentially important effect on the results of the analysis.

A better model would not eliminate bias due to this unmeasured (or ignored) variable.


Excellent points. Feel free to post your entire review here if you want.