Henry Ford Observational Study on HCQ in COVID-19: failure of basic peer review

llynn · July 14, 2020, 3:29pm

So we talked about the limitations of selecting a few lab tests like D-Dimer or a SOFA score plus demographics to adjust for mortality. We talked about determining when those selections are sufficient or insufficient (as suggested in the initial post for HCQ).

Now let’s consider the tests which might be needed. In the excellent article linked below we see figure 4. Note the cytokines (IL6 in particular). Note the potential presence or absence of cytokine release syndrome (CRS).

Most of these are not measured routinely. This raises the question: if a signal is not measured or determined, can it be ignored? In other words, if we did not ask the age, can we ignore age when adjusting?

This is not a rhetorical question.

llynn · July 21, 2020, 10:40am

My comments on datamethods have been primarily focused on seeking methods to achieve reproducibility of critical care research.

The initial post in this thread related to an HCQ observational study, and its author identified lab values not considered re: the mortality endpoint. I agreed and pointed out that this is true of all critical care studies. The problem should be addressed in a formal review.

Yet what if the primary endpoint itself is complex (like a SOFA score or “time to recovery”)?

Now SARS-COV-2 is the deadliest pandemic virus in 100 years, but despite that, those studying Remdesivir did not choose mortality as the primary endpoint. If they had, it would have been a negative study. We, over 50 days out, are still waiting for the 28 d. mortality data.

Yet “time to recovery” is not a trivial endpoint, but how reproducible is the math which determines this endpoint in this study?

Here is the primary endpoint of this RCT.

"The primary analysis was a stratified log-rank test of the time to recovery with remdesivir as compared with placebo, with stratification by disease severity…

The primary outcome measure was the time to recovery, defined as the first day, during the 28 days after enrollment, on which a patient satisfied categories 1, 2, or 3 on the eight-category ordinal scale… "
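
As a concrete reading of the quoted definition, here is a minimal sketch of how time to recovery could be computed from one patient's daily ordinal scores. The function name, the list layout (index 0 is day 1 after enrollment), and the use of `None` for a missing assessment are assumptions made for illustration; the trial's actual data handling is not described in the quoted text.

```python
from typing import Optional, Sequence

RECOVERED_CATEGORIES = {1, 2, 3}  # categories 1, 2, or 3 of the eight-category scale
FOLLOW_UP_DAYS = 28               # "during the 28 days after enrollment"


def time_to_recovery(daily_scores: Sequence[Optional[int]]) -> Optional[int]:
    """Return the first day (1-based) on which the score is 1, 2, or 3.

    daily_scores[i] is the ordinal category on day i + 1 (assumed layout);
    None marks a missing assessment and is simply skipped here.
    Returns None when no recovery day is observed within follow-up,
    i.e. the patient would be treated as censored.
    """
    for day, score in enumerate(daily_scores[:FOLLOW_UP_DAYS], start=1):
        if score in RECOVERED_CATEGORIES:
            return day
    return None


improving = [5, 5, 4, 4, 3, 2, 1]
never_recovers = [6, 7, 7, 7]
print(time_to_recovery(improving), time_to_recovery(never_recovers))
```

Note that the function stops at the first qualifying day, so everything that happens afterwards has no influence on the endpoint.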

The goal is to have reproducible math defining the condition AND defining the primary endpoint so the results of the continuous formula including the math in between (the statistics) are reproducible.

Here the term “with stratification by disease severity” is concerning. In critical care, “disease severity” stratification is very difficult and may not be reproducible. In the ICU we have “severity of illness” (the overall severity of a patient) and “disease severity” (the severity of the disease under test). Even the narrower of the two (disease severity) is difficult. For example, treatment of pneumonia with a mechanical ventilator does not necessarily indicate greater disease severity than treatment with HFNO2.
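
To make the role of the strata explicit, here is a minimal sketch of a stratified log-rank statistic written from the textbook formula (observed minus expected events and hypergeometric variances, summed over event times and strata). It is not the trial's code, and the record layout `(stratum, arm, time, event)` is an assumption for illustration.

```python
from collections import defaultdict


def stratified_logrank(records):
    """Chi-square statistic (1 df) of a stratified log-rank test.

    records: iterable of (stratum, arm, time, event), with arm in {0, 1}
    and event True when recovery was observed at `time` (False = censored).
    """
    by_stratum = defaultdict(list)
    for stratum, arm, time, event in records:
        by_stratum[stratum].append((arm, time, event))

    observed_minus_expected = 0.0
    variance = 0.0
    for rows in by_stratum.values():
        for t in sorted({tt for _, tt, e in rows if e}):
            # Patients still at risk at t, flagged by whether their event is at t
            at_risk = [(a, e and tt == t) for a, tt, e in rows if tt >= t]
            n = len(at_risk)
            n1 = sum(a for a, _ in at_risk)
            d = sum(ev for _, ev in at_risk)
            d1 = sum(ev for a, ev in at_risk if a == 1)
            observed_minus_expected += d1 - d * n1 / n
            if n > 1:
                variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


toy = [
    ("mild", 1, 1, True), ("mild", 0, 2, True),
    ("severe", 1, 1, True), ("severe", 0, 2, True),
]
print(stratified_logrank(toy))
```

Because comparisons are made only within a stratum, the statistic depends on which stratum each patient is assigned to. If two sites would classify the same patient's disease severity differently, the same outcome data can yield a different test result.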

As always, I am asking for help here.
Is this method of time to recovery (TTR) a solid primary endpoint?
How might TTR be better measured?

f2harrell · July 21, 2020, 11:29am

I am in the process of compiling a list of faults with the time to recovery endpoint. Some of the points I’m listing involve problems in counting bad events when considering time to a good event, informative censoring, inability to deal with missing data in the middle, and lack of statistical power. It would be helpful if everyone can think of particular clinical outcome scenarios that would fool time to recovery.

In my new detailed COVID-19 design document I’m pushing for longitudinal ordinal outcomes for therapeutic studies. These encompass TTR as a special case but capture so much more information, hence will lower the sample sizes needed to obtain sufficient evidence.

DavidJCohen · July 21, 2020, 2:08pm

What about relapse? For example, the patient is enrolled at the lower end of the disease spectrum and initially improves enough to meet the “recovery” endpoint. But then the patient relapses and gets even worse than the initial disease severity and remains in that condition at the end of the trial. Is that a true “recovery” event?

R_cubed · July 21, 2020, 2:25pm

“I am in the process of compiling a list of faults with the time to recovery endpoint.”

I’m looking forward to this.

I doubt I’m the only one, but I would greatly benefit from a post where you analyze the various endpoints used in clinical research from a statistical point of view, their limitations, and what could better answer the clinical question.

This goes back to a point you expressed in a very old thread:

“Understand the measurements you are analyzing and don’t hesitate to question how the underlying information was captured.”

f2harrell · July 21, 2020, 3:45pm

One definition I’ve seen requires “sustained recovery”, which will solve that problem if the “sustain” period is long enough, but then it’s not a proper endpoint because it requires a peek into the future for a patient to be classified as “recovered”.

I need advice about the most convincing and insightful way to present this. There are 3 approaches I can think of:

1. mock up 3 clinical scenarios and show how they are scored re: time to recovery vs. using more granular time-oriented information
2. re-analyze a completed clinical trial that used an inefficient endpoint (there are way too many to choose from but data availability is at issue) by using an efficient longitudinal endpoint and show the difference
3. simulate a trial that is analyzed multiple ways (but how to create the simulation model? how to keep it from being biased towards one analysis method being better than another?)
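
The first approach could be mocked up along the following lines. This is only a sketch: the three 28-day score sequences are invented, the summary fields are hypothetical choices, and recovery is taken to mean categories 1 to 3 of an eight-category ordinal scale as in the definition quoted earlier in the thread.

```python
RECOVERED = {1, 2, 3}  # assumed "recovered" categories on an 8-category scale


def summarize(scores):
    """Summarize one patient's daily ordinal scores (index 0 = day 1)."""
    first = next((d for d, s in enumerate(scores, start=1) if s in RECOVERED), None)
    return {
        "time_to_recovery": first,                              # what TTR records
        "days_recovered": sum(s in RECOVERED for s in scores),  # granular information
        "worst_category": max(scores),
        "final_category": scores[-1],
    }


# Three invented scenarios, 28 daily scores each
scenarios = {
    "steady recovery": [4] * 4 + [3] * 3 + [2] * 7 + [1] * 14,
    "recovery then relapse": [4] * 4 + [3] * 3 + [5] * 5 + [6] * 16,
    "slow recovery": [5] * 10 + [4] * 10 + [3] * 4 + [2] * 4,
}

for name, scores in scenarios.items():
    print(name, summarize(scores))
```

The first two patients receive the same time to recovery even though one of them relapses and ends the trial worse than at enrollment, while the third is scored as much worse on time to recovery despite ending in a better state than the relapsed patient.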

davidcnorrismd · July 21, 2020, 9:02pm

I would argue for adopting a dynamical systems perspective, especially in the critical-care context where equilibria, departures from them, and efforts to restore them, are central to clinicians’ thinking.

A suitable generic framework for inferring states of dynamical systems from measurements on them is provided by filtering aka ‘data assimilation’ [1]. These are inherently Bayesian concepts, it seems to me, so should have some broad appeal here.

From a philosophy-of-science perspective, I think this formulation helps clarify that the measurements—notwithstanding the great importance of good measurement—are not the (noumenal) primary objects of interest, but are merely (phenomenal) proxies for the underlying, latent quantities (‘states’) that substantive theories of disease and therapy will concern themselves with.

[1] Künsch HR. Particle filters. Bernoulli. 2013;19(4):1391-1403. doi:10.3150/12-BEJSP07 [open access]

f2harrell · July 22, 2020, 1:18pm

I can’t actualize that.

davidcnorrismd · July 22, 2020, 4:32pm

You’re using dynamical systems concepts when you run Stan; why not use them when modeling the real systems of ultimate interest? I’m sure that in your years of collaboration with cardiologists you must have heard them use language like “falling off the Starling curve”. Such catastrophe-theoretic intuitions in turn invite consideration of physiology in terms of gradient dynamical systems, which are generic and easily simulated.

Regardless of whatever particular underlying formalism you might adopt, I do think a convincing and insightful development & presentation will require positing a DGP at least one level deeper (closer to reality) than any phenomenological treatment you might offer in terms of measurements and their statistical analysis.

f2harrell · July 22, 2020, 4:36pm

Sounds good but is beyond my ability to translate into a simulation. I’m able to simulate longitudinal ordinal data and time to first event data but so far haven’t thought beyond that. The longitudinal models I’m familiar with do not use time-dependent covariates or state transitions.

davidcnorrismd · July 22, 2020, 5:04pm

The DGP can have state transitions, etc., even if the models you build & estimate regard such concepts as a ‘black box’. Checking CRAN just now, I note several off-the-shelf solutions there, including this interesting package “Dynamical Systems Approach to Immune Response Modeling” updated only last week:

https://cran.r-project.org/package=DSAIRM
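
A data-generating process with state transitions can also be prototyped without any package. The sketch below is a hypothetical first-order Markov chain over an eight-category ordinal scale; it is not taken from DSAIRM, and the transition probabilities, the starting state, the treatment effect, and the coding of 8 as an absorbing death state are all assumptions chosen for illustration.

```python
import random

DEATH = 8  # assumed coding: 8 = death (absorbing), 1 = best state


def simulate_patient(rng, start=5, days=28, p_improve=0.15, p_worsen=0.10):
    """Simulate one patient's daily ordinal state.

    Each day the state moves one category better with probability p_improve,
    one category worse with probability p_worsen, and otherwise stays put.
    All numbers are illustrative, not estimates from any trial.
    """
    state, path = start, []
    for _ in range(days):
        if state != DEATH:
            u = rng.random()
            if u < p_improve and state > 1:
                state -= 1
            elif u >= 1 - p_worsen:
                state += 1  # moving from 7 to 8 is death
        path.append(state)
    return path


def simulate_trial(n_per_arm=500, seed=2020, treatment_boost=0.05):
    """Two-arm trial; treatment raises the daily probability of improving."""
    rng = random.Random(seed)
    return {
        "control": [simulate_patient(rng) for _ in range(n_per_arm)],
        "treated": [
            simulate_patient(rng, p_improve=0.15 + treatment_boost)
            for _ in range(n_per_arm)
        ],
    }


trial = simulate_trial()
```

The simulated paths can then be scored by each candidate analysis (time to first recovery, a longitudinal ordinal model, and so on). Because the generator is specified in terms of daily state transitions rather than any one of those analysis models, it is at least not built to favor one of them by construction.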

llynn · July 22, 2020, 6:49pm

I think this would be very instructional in the instant case.

llynn · July 1, 2021, 1:31am

New Remdesivir data for this old discussion.

As I pointed out in a previous post, across the US the correct, science-based skepticism about the efficacy of HCQ was not balanced with similar skepticism about Remdesivir. This was disconcerting and raised the question of political bias. We saw too much of that in 2020. It has taught the public that political bias often drives the behavior of scientists. I think they were not aware of that. Here are some fresh Remdesivir data. I think the company also has some new data which reach the opposite conclusion. I don’t have that publication.

f2harrell · July 1, 2021, 2:14am

But the final data for the ACTT-1 NIAID NIH Remdesivir study showed a mortality reduction.

llynn · July 1, 2021, 2:22am

Yes. The paper cites that.

The original thread here was made awaiting the delayed mortality data.

My point was that the original NEJM pub was quite weak but few highlighted that at the time.

I suppose that ACTT1 dominates these Ops trials.

llynn · July 14, 2021, 12:44pm

Here are more data. The findings are as expected.

https://www.acpjournals.org/doi/10.7326/M21-0653

s_doi · July 15, 2021, 3:30pm

There are several things the authors could perhaps do to make this study more informative:

1. The authors say that “There was a total of 2,948 COVID-19 admissions, of these, 267 (9%) patients had not been discharged, 15 (0.5%) left against medical advice, and four (0.1%) were transferred to another healthcare facility; these patients were excluded from analysis as we could not ascertain their outcome. In addition, there were 121 (4.1%) readmissions, which were also excluded”. Thus there was complete ascertainment of in-hospital outcome in all patients, so the time-to-event analysis should have been a logistic regression analysis, since time to the outcome is just a proxy for severity and death.
2. Percent O2 saturation, admission to ICU and ventilator use are all proxies for the outcome, as presumably most deaths will be of those with more severe disease needing ICU care and/or ventilation. Why would one adjust for these? Better to create a proxy for ventilation or ICU admission or death and use that as the outcome.
3. Patients selected into the analysis need to be those treated for at least X days before the outcome (I leave X for the authors to justify).
4. Adjustments for important confounders need to be robust, e.g. older age is the most important risk factor for severity and, from Table 1, older patients were less likely to receive the intervention, so the age grouping is inadequate.

Perhaps if the authors are reading this they can run the appropriate logistic regression model according to 1-4 and report the results?

EpiMD5 · July 15, 2021, 6:12pm

I am joining this conversation very late. Last July (July 19, 2020 to be exact), I sent Dr. Zervos, the corresponding author for this paper, a six page “technical review” of the paper, explaining that I did not believe it had been properly peer-reviewed and noting that the journal did not accept correspondence. I probably should have posted my review somewhere like here.

Other commentators on the paper have raised the points made in my review (and some new ones) with one exception. The issue is the problem of dealing with patients admitted with a “do not resuscitate” or “do not ventilate” directive. Here is what was in my (unsolicited) review.

It is not clear whether patients with a DNR advance directive or a do not ventilate advance directive at admission could have been admitted to the hospital and included in the analysis. If so, the publication should have stated how many patients had a DNR advance directive or a do not ventilate advance directive. If such patients were admitted, the publication should have described the approach to treatment (ICU admission, ventilation, HCQ, AZM, steroids) for these patients.

The clinical approach to the handling of DNR and do not ventilate advance directives and their effect on decision to give HCQ, AZM, and/or steroids and to admit to the ICU and ventilate has a potentially important effect on the results of the analysis.

A better model would not eliminate bias due to this unmeasured (or ignored) variable.

f2harrell · July 15, 2021, 6:35pm

Excellent points. Feel free to post your entire review here if you want.

EpiMD5 · July 15, 2021, 10:42pm

REVIEW OF YOUR ARTICLE

Why I am Writing This

The question of whether hydroxychloroquine (HCQ) affects outcomes in patients hospitalized with COVID-19 is important. The inherent importance of the question has been enhanced because of the near-hysterical political environment in which the question is being asked. The high profile nature of research about HCQ treatment of patients with COVID-19 conveys to researchers examining the question of HCQ and COVID-19 outcomes a high burden for assuring that the data the researchers present are valid and that the conclusions the researchers draw are supported by the data.

The publication of data on this question from the Henry Ford Health system has been described in the media as being peer-reviewed. I believe that the pre-publication peer review was not thorough. The research does not appear to have been reviewed by a statistician. There is evidence that any non-statistician reviewer did not pay close attention to the description of the methods.

Below I identify some serious issues with the analytic approach. I believe that these issues would have been identified by a statistical reviewer. I identify many issues with the description of Methods. I believe that these issues would have been identified by a more careful non-statistical peer reviewer.

Below I also describe an alternative approach to analysis of these observational data.

I do not know what an analysis that would have taken into account these comments would show about HCQ and mortality in hospitalized patients with COVID-19. I don’t care about the conclusions of an analysis that would have taken into account these comments. I care deeply about assuring that research published anywhere is the best that it can be. This publication is not the best that it could have been.

Other Treatments and Their Timing / Ability of Cox Regression to Account for Interactions

The analysis of observational data from hospitals in the Henry Ford Health system by Arshad et al. (1) focuses on the association of mortality with hydroxychloroquine (HCQ) alone or in combination with azithromycin (AZM) in patients with COVID-19. Of the 2,541 patients in the study, 1,733 (68.2%) were given steroids and 114 (4.5%) were given tocilizumab, 614 (24.2%) were managed in the ICU, and 448 (17.6%) were mechanically ventilated. The percentage of patients who had these treatments/interventions differed between the patients treated with neither HCQ nor AZM and patients treated with HCQ alone or in combination with AZM.

The effect of HCQ on mortality might be modified by steroid use, treatment in the ICU, and/or mechanical ventilation. Treatment with HCQ might modify the use of the ICU and/or mechanical ventilation. Patients in the analysis received HCQ with or without AZM in a protocol-defined narrow time window after admission to the hospital. Even if protocol driven, the timing of steroid administration, the timing of ICU admission, and the timing of ventilation are not specified in the publication and might have differed among patients given HCQ with or without AZM compared with patients given neither HCQ nor AZM.

Of particular concern is the difference in the use of steroids between the patients given neither HCQ nor AZM and those given HCQ. Of the 409 patients given neither HCQ nor AZM, 146 (35.7%) were given steroids, compared with 948 of the 1,202 patients given HCQ alone (78.9%) and 582 of the 783 patients given HCQ + AZM (74.3%).
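
The size of this imbalance can be checked directly from the counts quoted above. The short sketch below only redoes that arithmetic; the group labels are the ones used in this review, and the AZM-alone group is left out because its steroid count is not given here.

```python
# (patients given steroids, patients in group), as quoted in the text above
steroid_use = {
    "neither HCQ nor AZM": (146, 409),
    "HCQ alone": (948, 1202),
    "HCQ + AZM": (582, 783),
}

percent_on_steroids = {
    group: round(100 * on_steroid / total, 1)
    for group, (on_steroid, total) in steroid_use.items()
}
without_steroids = {
    group: total - on_steroid for group, (on_steroid, total) in steroid_use.items()
}

for group in steroid_use:
    print(f"{group}: {percent_on_steroids[group]}% on steroids, "
          f"{without_steroids[group]} patients without")
```

Steroid use in the two HCQ groups is roughly double that in the comparison group, which is why any adjusted comparison leans heavily on how steroid use is modeled.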

Given these co-interventions and their timing, disentangling any effect of HCQ in causing a reduction in mortality is an analytic challenge.

The multivariate Cox regression model reported in Table 3 treats steroid use, ICU use, and ventilation dichotomously when estimating the HR for mortality in patients given HCQ alone, AZM alone, HCQ + AZM compared with patients given neither HCQ nor AZM. Modeling these co-interventions as dichotomies does not take into account possible interactions between steroids, ICU use, and ventilation in affecting mortality. It does not account for any effect of steroids, HCQ or HCQ + AZM on the timing of ICU admission and ventilation.

Adjustment for ICU Admission and Ventilator Use in Cox Regression Analysis

The rationale for adjusting for admission to ICU and use of a ventilator is not provided. Both of these variables are clearly related to the outcome—mortality. They also are related to exposure. It is not clear that these variables are confounders for which adjustment is appropriate. They might also be considered to be intermediate variables whose adjustment might constitute over-adjustment. This issue should have been discussed.
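
The concern about intermediate variables can be illustrated with a toy simulation. Everything in it is hypothetical: treatment is assigned independently of prognosis, it halves the probability of ICU admission, and death depends only on ICU admission. These assumptions are chosen to make the point visible and say nothing about what the Henry Ford data would show.

```python
import random


def simulate(n_per_arm=200_000, seed=1):
    """Return {(treated, icu): [deaths, patients]} for a toy cohort."""
    rng = random.Random(seed)
    counts = {}
    for treated in (0, 1):
        p_icu = 0.15 if treated else 0.30    # hypothetical: treatment halves ICU admission
        for _ in range(n_per_arm):
            icu = rng.random() < p_icu
            p_death = 0.40 if icu else 0.05  # hypothetical: death depends on ICU only
            died = rng.random() < p_death
            cell = counts.setdefault((treated, icu), [0, 0])
            cell[0] += died
            cell[1] += 1
    return counts


def risk(counts, treated, icu=None):
    """Mortality risk in one arm, overall (icu=None) or within an ICU stratum."""
    cells = [v for (t, i), v in counts.items()
             if t == treated and (icu is None or i == icu)]
    return sum(d for d, _ in cells) / sum(n for _, n in cells)


counts = simulate()
crude_rr = risk(counts, 1) / risk(counts, 0)
rr_icu = risk(counts, 1, True) / risk(counts, 0, True)
rr_no_icu = risk(counts, 1, False) / risk(counts, 0, False)
print(crude_rr, rr_icu, rr_no_icu)
```

In this toy cohort the crude risk ratio reflects the full benefit of treatment, while the risk ratios within ICU strata are close to 1: conditioning on the intermediate variable removes the very effect being estimated. In real observational data ICU admission may be partly a confounder and partly an intermediate, which is why the choice to adjust for it needs to be argued.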

Definition of Covariates in Cox Regression Analysis

In the Cox regression analysis, age is modeled in two groups: age <65 years and age >=65 years. The rationale for dichotomizing age is not provided. The practice of dichotomizing continuous variables in regression analysis is generally frowned on. (Royston P, Altman DG, Sauerbrei W. Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006 Jan 15;25(1):127-41. doi:10.1002/sim.2331. PMID: 16217841.)

Age is an important predictor of mortality in COVID-19, with older people at substantially higher risk. In the Henry Ford data, the mean age of patients given neither HCQ nor AZM was 68.1 (±18.9) years and the median was 71 years. Patients given HCQ alone or HCQ + AZM were quite a lot younger—mean 63.2 (±15.6) years (difference of 5 years) and median 53 years (difference of 18 years) for patients given HCQ alone; mean 62.3 (±15.9) years (difference of 6 years) and median 62 years (difference of 9 years) for patients given HCQ + AZM. The difference in the percentage of very old patients is even more striking. The upper limit of the interquartile range for patients given neither HCQ nor AZM was 83 (Table 1), meaning that 25% of these patients were age 83 years or more. The upper limit of the interquartile range was 74 (Table 1) both for patients given HCQ alone and HCQ + AZM, meaning that 25% of these patients were age 74 years or more.

The confounding effect of age on mortality in patients with COVID-19 could have been taken into account better by creating dummy variables for age based on quartiles. Better yet, an exploratory analysis could have been done to identify an optimal approach to adjustment for differences in age among the groups studied.
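
The residual confounding left by a two-group age variable can be shown with a small simulation. The sketch assumes a treatment with no effect at all, a treated group that is younger (the means and spreads only loosely echo the figures above), and an invented logistic risk curve for death by age. Five-year age bands and a Mantel-Haenszel risk ratio stand in for "finer adjustment"; they are not the approach used in the paper or the specific one proposed here.

```python
import math
import random


def simulate_cohort(n_per_arm=50_000, seed=7):
    """Rows of (treated, age, died) with NO treatment effect on death."""
    rng = random.Random(seed)
    rows = []
    for treated, (mean, sd) in ((0, (68.0, 19.0)), (1, (63.0, 16.0))):
        for _ in range(n_per_arm):
            age = min(100.0, max(18.0, rng.gauss(mean, sd)))
            p_death = 1 / (1 + math.exp(-(-7.0 + 0.08 * age)))  # hypothetical risk curve
            rows.append((treated, age, rng.random() < p_death))
    return rows


def mantel_haenszel_rr(rows, stratum_of):
    """Mantel-Haenszel risk ratio (treated vs untreated) across age strata."""
    cells = {}  # stratum -> [treated deaths, treated n, untreated deaths, untreated n]
    for treated, age, died in rows:
        c = cells.setdefault(stratum_of(age), [0, 0, 0, 0])
        offset = 0 if treated else 2
        c[offset] += died
        c[offset + 1] += 1
    num = den = 0.0
    for a, n1, c, n0 in cells.values():
        total = n1 + n0
        num += a * n0 / total
        den += c * n1 / total
    return num / den


rows = simulate_cohort()
rr_dichotomized = mantel_haenszel_rr(rows, lambda age: age >= 65)
rr_five_year = mantel_haenszel_rr(rows, lambda age: int(age // 5))
print(rr_dichotomized, rr_five_year)
```

With only two age groups the inert treatment appears protective, because within the 65-and-over stratum the untreated patients are still older. With five-year bands the estimate moves close to 1, the true value in this simulation.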

The distributions of both BMI and O2 saturation are complexly related to HCQ/AZM and they are also important predictors of mortality in prior studies. In the Cox regression analysis, BMI and O2 saturation were also modeled as dichotomies without providing a rationale for the choice of cut-points or for treating the variables as dichotomies. A rationale for the choice of cut-point for the variables should have been given and the decision to dichotomize justified.

Presentation of Kaplan Meier Plots, Not Plots Based on the Cox Regression Analysis

Figure 1 shows a Kaplan-Meier plot of survival probability by days after admission. A plot of the estimated adjusted survival probability for the four treatment groups (neither HCQ nor AZM, HCQ only, AZM, HCQ + AZM) based on the Cox regression model (preferably with 95% confidence intervals) should have been presented. The Cox regression model plots of estimated survival probabilities are not difficult to generate but they are not “automatic.”

Here is a link to a UCLA tutorial that provides detailed instruction on how to generate these plots using SAS: Introduction to Survival Analysis in SAS.

Propensity-matched Analysis

In addition to the Cox regression analysis, the publication presents an analysis of survival probability based on propensity-matched analysis for patients given HCQ with or without AZM compared with patients not given HCQ. The comparator, patients not given HCQ, includes patients given AZM but not HCQ as well as patients given neither HCQ nor AZM. Presumably by design, an identical percentage (44.2%) of the 190 patients in the two groups examined were given steroids, an identical percentage (6.3%) were admitted to the ICU, an identical percentage (5.3%) were mechanically ventilated, and an identical percentage (1.1%) were given tocilizumab. But the propensity-matched analysis is based on data from 190 of the 556 patients not given HCQ (34.2%) compared with 190 of the 1,985 patients given HCQ with or without AZM (9.6%), throwing away a lot of data.

More important, the 380 patients whose data were used in the propensity-matched analysis are a highly select and unrepresentative sample of the 2,541 patients included in the study. For example, only 6.3% of the 380 patients in the propensity-matched analysis were admitted to the ICU compared with 24.2% of all patients; only 5.3% of the 380 patients in the propensity-matched analysis were mechanically ventilated compared with 17.6% of all patients. The implications of the unrepresentativeness of the patients in the propensity-matched analysis should have been discussed.

Finally, it is impossible to draw a conclusion about a specific and independent effect of HCQ on mortality from the propensity-matched analysis because the 190 patients in the “HCQ with or without AZM” group were a mix of patients given HCQ alone and those given HCQ + AZM. The comparison group was a mix of patients given AZM but not HCQ and patients given neither HCQ nor AZM. The number and percentage of the 190 patients in the two groups who were given AZM should have been stated.

Puzzling Data on Severity of Illness Comparing Patients Given Neither HCQ nor AZM and Patients Given HCQ and AZM

Fifteen percent (15.2%) of patients given neither HCQ nor AZM were ever in the ICU compared with 37.0% of patients given HCQ + AZM. Eight percent (8.3%) of patients given neither HCQ nor AZM were mechanically ventilated compared with 29.9% of patients given HCQ + AZM.

Ignoring missing data, the mean mSOFA score was 4.0 (+/- 3.7) for the patients given neither HCQ nor AZM and 4.2 (+/- 3.1) for the patients given HCQ + AZM. The percentages of patients with severe hypoxemia at admission (max O2 sat <=85%) were almost identical in patients given neither HCQ nor AZM (15.9%) and patients given HCQ + AZM (16%). Thus, these two measures of illness severity at admission (mSOFA and O2 saturation) are very similar in the patients given neither HCQ nor AZM and the patients given HCQ + AZM.

The substantially higher percentage of patients given HCQ + AZM who were ever in the ICU and the higher percentage ventilated is puzzling given the measures of illness severity at admission. The publication should have discussed the issue and offered an explanation.

Died in ED

The abstract states that “all patients….were treated as inpatients for at least 48 hours unless expired within 24 hours.” The methods section states that “all patients ….were treated as inpatients for at least 48 hours unless they expired within that time frame.” Which is it—24 hours or 48 hours? It seems that patients who expired in the ED were not included. The publication should have stated this so there is no possibility of confusion. The publication should have given the number of patients treated as inpatients for at least 48 hours who were excluded because they expired within 24 hours or 48 hours (whichever is correct).

Do Not Resuscitate

It is not clear whether patients with a DNR advance directive or a do not ventilate advance directive at admission could have been admitted to the hospital and included in the analysis. If so, the publication should have stated how many patients had a DNR advance directive or a do not ventilate advance directive. If such patients were admitted, the publication should have described the approach to treatment (ICU admission, ventilation, HCQ, AZM, steroids) for these patients.

The clinical approach to the handling of DNR and do not ventilate advance directives and their effect on decision to give HCQ, AZM, and/or steroids and to admit to the ICU and ventilate has a potentially important effect on the results of the analysis.

Missing Data on mSOFA

Table 1 should have provided the number and percentage of patients according to mSOFA with missing data in addition to the number and percentage ignoring missing data. Possible reasons for the differences in missing mSOFA data comparing the four groups should have been given.

Exclusion of Patients Not Yet Discharged as of Some Date

The analysis was based on 2,541 of the 2,948 patients who were admitted RT-PCR positive. Patients who had not been discharged, who left against medical advice, who were transferred, or who represented re-admissions were excluded. The cut-off date for “not discharged” should have been stated (i.e., “Patients who had not been discharged by month/day/year were excluded”).

The Discussion should have addressed how exclusion of these patients might have affected the results. These patients are a complex mix of patients who have been ill for a long time but are not “hopeless” and those who have been recently admitted but have not recovered. These patients could have been treated as censored in a Cox regression analysis. The outcome of the analysis would then have been “discharged alive.”

Patients Already in the Hospital on March 10, 2020

The analysis included patients who had been admitted to the hospital prior to March 10, 2020 but tested positive for the SARS-CoV-2 virus using RT-PCR on or after March 10, 2020. The manuscript should have provided the number of patients admitted prior to March 10, 2020 but tested positive on or after March 10, 2020. The manuscript should have provided information on the number of these patients who were in the various treatment categories (neither HCQ nor AZM, HCQ only, AZM only, HCQ + AZM).

There appears to be a steep learning curve in management of hospitalized COVID-19 patients with improved outcomes as experience is gained. Exclusion of these patients from the analysis would have been preferable.

Protocol Implementation Date

The publication states that treatments were driven by a protocol that was uniform system-wide and established by a COVID-19 Task Force. The publication should have stated the date of implementation of the protocol system-wide. It should have described any activities to ensure uniformity of implementation across hospitals.

Another Way to Look at HCQ and AZM Independent of Steroids

From Table 2, it is possible to determine that, among the patients given neither HCQ nor AZM, 263 were given no steroid and 146 were given a steroid but not HCQ. Among patients classified as being given HCQ alone, 254 were given HCQ and no steroid and 948 were given both HCQ and a steroid. A straightforward way to assess the association of HCQ, AZM, and HCQ + AZM with mortality independent of any effect of steroids on mortality is an analysis that stratifies by treatment and presents crude mortality rates and adjusted HRs as laid out below.

The TABLE has been omitted. It suggested separate multivariate (logistic or Cox) analyses in strata of treatment defined by HCQ, AZM, steroid alone and in combination, with adjustments first only for age (better modeled) and sex and then for “other stuff.” The numbers get small, of course.