Fully sequential Bayesian clinical trial design for in-hospital treatment of COVID-19

Our Research Methods Program of the Vanderbilt Institute for Clinical and Translational Research and the Vanderbilt Department of Biostatistics have created a fully sequential design and analysis plan for the COVID-19 ordinal outcome scale that we’d like to share; suggestions are welcome. The plan, along with related resources, may be found at http://hbiostat.org/proj/covid19 . The plan will evolve, especially with the inclusion of some Bayesian operating characteristic simulations. We hope that other academic groups and pharmaceutical companies may benefit from it. By allowing unlimited looks at the data with no need for conservative multiplicity adjustments (there is not even a way to implement such adjustments), this design allows the most rapid learning about treatment effects.


I am curious about the data that goes into the model.

First, is the outcome for each patient the worst outcome experienced by the patient, or the outcome on day 15 after randomization? I’d hope that some patients would improve (and therefore their ordinal outcome would increase in level), but would it be better to model the final outcome (at day 15) or the overall severity of disease?

Second, the document states that the SOFA score is going to have a linear effect in the model. I may be mistaken but isn’t this a point-based score that could be broken up into a number of sub-scores or even the continuous underlying measurements? As a practical matter, is it required that all measurements be performed on the same day and how do multiple measurements come into play? For example, the bilirubin and CBC might be measured occasionally and the MAP measured continually. The ventilation status is a part of the SOFA score (as part of the respiratory status) but also part of the ordinal outcome (as outcome levels 2 and 3). Would it be the worst SOFA score in the day prior to randomization that would be used or the most recent score prior to randomization?

Overall, it’s really interesting to think about. Perhaps in the future an Excel spreadsheet could be provided for standardized data entry, and a Shiny app distributed as an R package for analysis. Stan models can be a bit compute-intensive to run on a public server, but a virtual machine or Docker image might make it easier for those without R/Rcpp experience to use.


Great questions Eric.

I know of one study being planned that will measure the ordinal scale daily. For others I think it’s the status at day 14 regardless of the path to get there. I may be wrong.

Sample sizes are not large enough to allow for adjustment for individual SOFA components. I have not discussed with the critical care docs how the component measurements are made, but I assume they are taken on the day of randomization.

Regarding data entry the best solution is REDCap and I am talking to our PIs about sharing the REDCap metadata/forms online.


At VUMC, if an IRB protocol has been approved, a REDCap project can be linked to Epic directly. This allows pulling (some) data automagically from the EMR into REDCap, and other data can be updated from inside Epic/eStar, making it easier for clinicians and researchers to update patients’ information. It would also be possible to use the REDCap API to pull data directly into R and update an RMarkdown doc with the latest results (there may even be a way to use a webhook to trigger recompiling the doc when the REDCap data change, or a cron job could be used).

Alternatively, it shouldn’t be too difficult to put together some Epic Flowsheets, SmartTexts, and SmartLinks to pull the necessary data from the chart for use in documentation. VUMC eStar is currently on a build freeze pending the upgrade this weekend, but once that upgrade’s bugs get settled, this should be a straightforward project for a physician builder or Epic analyst. Epic itself is even providing help to streamline the process (https://userweb.epic.com/Thread/96179/COVID-19-Clinical-Trials-Recruitment-Support/).

I’ll keep an eye out on the Epic forums for other Epic sites trying to do something similar. Most of what is up on the Epic userweb concerns ordersets for treating patients or dashboards to track testing and positive patients.

The main downside of this approach is that while it could make things really easy for VUMC studies, it could be harder to translate to other institutions that don’t have REDCap.


I emailed @f2harrell some comments that are now posted at http://hbiostat.org/proj/covid19 . So, I guess it would be best for everyone to comment on the code in this thread.

Do people want me to amend the Stan code for the “basic ordered logit model” to do the thing where you put the Dirichlet prior on the conditional probability of falling in categories 1 … 7 for someone who has average predictors and then derive the cutpoints as transformed parameters from the inverse CDF of the logistic distribution?


I think that would be good, and anything you want to work on in the Bayesian proportional odds world would be helpful to many. As an aside, I’m not a fan of the term “cutpoint” or the cutpoint notation, but prefer to think this way (as in here):

Let the ordered unique values of Y be denoted by y_{1}, y_{2}, \dots, y_{k} and
let the intercepts associated with y_{1}, \dots, y_{k} be \alpha_{1}, \alpha_{2}, \dots, \alpha_{k}, where \alpha_{1} = \infty because \text{Prob}[Y \geq y_{1}] = 1. Let \alpha_{y} = \alpha_{i}, i:y_{i}=y. Then

\text{Prob}[Y \geq y_{i} | X] = F(\alpha_{i} + X\beta) = F(\alpha_{y_{i}} + X\beta)

I don’t want “derive cutpoints”; I’d rather it be “estimate intercepts”. I know this is picky.

OK, I rewrote the Stan code to correspond to 13.3.1 of your PDF. The only differences are

  • The lowest outcome category is 1 (as in the draft design) rather than 0 (as in the PDF)
  • The Stan code assumes the design matrix has centered columns so that \alpha_j is the logit of \Pr\left(Y \geq j \mid \mathbf{x} = \mathbf{0}\right) for a person whose predictors are all average (even dummy variables).

This makes it possible to declare a simplex \mathbf{\pi} whose 7 elements can be interpreted as the probability of a person with average predictors falling into the corresponding outcome category. You can then put an uninformative Dirichlet prior on \mathbf{\pi} and derive \mathbf{\alpha} as an intermediate variable given \mathbf{\pi}.
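As a quick numeric illustration of that derivation (a minimal sketch in plain Python with made-up \pi values; this is not part of the Stan code):

```python
import math

def logit(p):
    return math.log(p / (1 - p))

def expit(x):  # the logistic CDF F
    return 1 / (1 + math.exp(-x))

# hypothetical category probabilities for a person with average
# (i.e., centered-to-zero) predictors; k = 7 as in the design
pi = [0.05, 0.10, 0.15, 0.30, 0.20, 0.15, 0.05]
k = len(pi)

# alpha[j - 1] = logit(Pr(Y >= j | x = 0)) = logit(sum(pi[j:k]))
alpha = [logit(sum(pi[j:])) for j in range(1, k)]

# round trip: differencing the exceedance probabilities recovers pi
exceed = [1.0] + [expit(a) for a in alpha] + [0.0]
pi_back = [exceed[j] - exceed[j + 1] for j in range(k)]
assert all(abs(a - b) < 1e-12 for a, b in zip(pi, pi_back))
```

The list comprehension for alpha is the same computation as the loop in the transformed parameters block.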

However, the \boldsymbol{\alpha} in the Stan code are shifted by \overline{\mathbf{x}} \boldsymbol{\beta} relative to those in rms::orm where the predictors are not centered.

// Based on section 13.3.1 of http://hbiostat.org/doc/rms.pdf
functions { // can check these match rms::orm via rstan::expose_stan_functions
  // pointwise log-likelihood contributions
  vector pw_log_lik(vector alpha, vector beta, row_vector[] X, int[] y) {
    int N = size(X);
    vector[N] out;
    int k = max(y); // assumes all possible categories are observed
    for (n in 1:N) {
      real eta = X[n] * beta;
      int y_n = y[n];
      if (y_n == 1) out[n] = log1m_exp(-log1p_exp(-alpha[1] - eta));
      else if (y_n == k) out[n] = -log1p_exp(-alpha[k - 1]  - eta);
      else out[n] = log_diff_exp(-log1p_exp(-alpha[y_n - 1] - eta),
                                 -log1p_exp(-alpha[y_n]     - eta));
    }
    return out;
  }

  // Pr(y == j)
  matrix Pr(vector alpha, vector beta, row_vector[] X, int[] y) {
    int N = size(X);
    int k = max(y); // assumes all possible categories are observed
    matrix[N, k] out;
    for (n in 1:N) {
      real eta = X[n] * beta;
      out[n, 1] = log1m_exp(-log1p_exp(-alpha[1] - eta));
      out[n, k] = -log1p_exp(-alpha[k - 1]  - eta);
      for (y_n in 2:(k - 1))
        out[n, y_n] = log_diff_exp(-log1p_exp(-alpha[y_n - 1] - eta),
                                   -log1p_exp(-alpha[y_n]     - eta));
    }
    return exp(out);
  }
}
data {
  int<lower = 1> N;   // number of observations
  int<lower = 1> p;   // number of predictors
  row_vector[p] X[N]; // columnwise CENTERED predictors
  int<lower = 3> k;   // number of outcome categories (7)
  int<lower = 1, upper = k> y[N]; // outcome on 1 ... k
  // prior standard deviations
  vector<lower = 0>[p] sds;
}
parameters {
  vector[p] beta; // coefficients
  simplex[k] pi;  // category probabilities for a person w/ average predictors
}
transformed parameters {
  vector[k - 1] alpha;                               // intercepts
  vector[N] log_lik;                                 // log-likelihood pieces
  for (j in 2:k) alpha[j - 1] = logit(sum(pi[j:k])); // predictors are CENTERED
  log_lik = pw_log_lik(alpha, beta, X, y);
}
model {
  target += log_lik;
  target += normal_lpdf(beta | 0, sds);
  // implicit: pi ~ dirichlet(ones)
}
generated quantities {
  vector[p] OR = exp(beta);
}
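To see concretely what pw_log_lik is doing, here is the same arithmetic as a minimal sketch in plain Python (hypothetical \alpha, x\beta, and k = 7, not the trial's values), checking the stable log-space forms against a naive difference of logistic CDFs:

```python
import math

def log1p_exp(x):        # Stan's log1p_exp: stable log(1 + exp(x))
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def log1m_exp(x):        # log(1 - exp(x)) for x < 0
    return math.log(-math.expm1(x))

def log_diff_exp(a, b):  # log(exp(a) - exp(b)) for a > b
    return a + log1m_exp(b - a)

def expit(x):            # the logistic CDF F
    return 1 / (1 + math.exp(-x))

alpha = [2.0, 1.0, 0.0, -1.0, -2.0, -3.0]  # hypothetical intercepts, k = 7
eta = 0.5                                  # hypothetical x * beta
k = len(alpha) + 1

def log_lik(y):  # log Pr(Y = y), mirroring pw_log_lik (0-based alpha)
    if y == 1:
        return log1m_exp(-log1p_exp(-alpha[0] - eta))
    if y == k:
        return -log1p_exp(-alpha[k - 2] - eta)
    return log_diff_exp(-log1p_exp(-alpha[y - 2] - eta),
                        -log1p_exp(-alpha[y - 1] - eta))

def naive(y):    # naive: log(F(alpha_{y-1} + eta) - F(alpha_y + eta))
    upper = 1.0 if y == 1 else expit(alpha[y - 2] + eta)
    lower = 0.0 if y == k else expit(alpha[y - 1] + eta)
    return math.log(upper - lower)

for y in range(1, k + 1):
    assert abs(log_lik(y) - naive(y)) < 1e-9
```

The naive version is fine at these values; the log1p_exp/log_diff_exp forms matter when the intercepts or linear predictor get large in magnitude.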


This looks great @bgoodri! I have a development version of a package which implements a closely related Stan model on github here. The goal of this package/code is to analyze continuous or mixed continuous/discrete outcomes using cumulative probability models (i.e., the number of outcome categories k is close or equal to the number of observations n). One big difference is the concentration parameter for the Dirichlet prior on \pi. Since we only have a small number of observations in each category, using the usual uninformative symmetric Dirichlet doesn’t work very well. Currently the package uses a concentration parameter of 1/k based on a heuristic argument that this is equivalent to adding 1 pseudo-observation, but I’m exploring other options.
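To make the pseudo-observation heuristic concrete, a minimal sketch in plain Python (the counts and k below are hypothetical): a symmetric Dirichlet(a, ..., a) prior behaves like adding a*k pseudo-observations spread across the k cells, so the posterior mean probability of a cell with count c is (c + a) / (n + a*k):

```python
k, n = 100, 120          # hypothetical: many outcome categories, modest sample size
c_empty, c_obs = 0, 1    # an unobserved category vs. one with a single observation

for a in (1.0, 1.0 / k):  # "uninformative" Dirichlet(1) vs. Dirichlet(1/k)
    p_empty = (c_empty + a) / (n + a * k)  # posterior mean cell probability
    p_obs = (c_obs + a) / (n + a * k)
    print(f"a = {a}: total pseudo-obs = {a * k:g}, "
          f"empty cell -> {p_empty:.5f}, one-obs cell -> {p_obs:.5f}")
```

With a = 1 the prior contributes as many pseudo-observations as there are categories, swamping the sparse cell counts when k is near n; with a = 1/k it contributes one pseudo-observation in total.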

Where in this design would substantive theories of postulated therapeutic effects enter into consideration? Take for example the several theories advanced in [1,pp5–6]:

Findings from previous studies have suggested that chloroquine and hydroxychloroquine may inhibit the coronavirus through a series of steps. Firstly, the drugs can change the pH at the surface of the cell membrane and thus, inhibit the fusion of the virus to the cell membrane. It can also inhibit nucleic acid replication, glycosylation of viral proteins, virus assembly, new virus particle transport, virus release and other process to achieve its antiviral effects.

… and then later on pp.15–16:

In some patients it has been reported that their immune response to the SARS-CoV-2 virus results in the increase of cytokines IL-6 and IL-10. This may progress to a cytokine storm, followed by multi-organ failure and potentially death. Both hydroxychloroquine and chloroquine have immunomodulatory effects and can suppress the increase of immune factors[29, 30]. Bearing this in mind, it is possible that early treatment with either of the drugs may help prevent the progression of the disease to a critical, life-threatening state.

How would continuous learning proceed about the several mechanisms of action posited here? I have to think that incorporating more information via multivariate, time-series outcomes would be essential. As I read it, the design in its present form seems organized around a single, ordinal outcome measure as its central feature.

  1. Yao X, Ye F, Zhang M, et al. In Vitro Antiviral Activity and Projection of Optimized Dosing Design of Hydroxychloroquine for the Treatment of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). Clinical Infectious Diseases. March 2020:ciaa237. doi:10.1093/cid/ciaa237


A March 23 paper [2] came to my attention this morning, which illustrates some of the opportunities for obtaining quantitative time-series data on the course of illness. N=23 patients were followed serially, with a mean of 7.5 respiratory specimens per patient on which reverse transcriptase quantitative PCR (RT-qPCR) was done to quantify SARS-CoV-2 RNA. Moreover, serology for IgG and IgM antibodies to the virus surface spike receptor binding domain (RBD) and internal nucleoprotein (NP) was used to (semi?)quantitate antibody responses. The paper reported, e.g., that viral load was high initially and declined steadily. This suggests an opportunity to model the slope of log10 copies/mL RNA—a chance to estimate rather than hypothesis-test, which should please some of you! Furthermore, the onset of antibody response offers up a time-to-event outcome that might be worth modeling. (But on that latter point, please note that the IgG response and its timing may have a subtler—spline-able?—relation to outcomes, as discussed at the top of the left-hand column on p.9, including correlation with macaque experimental studies. This thread by virologist Akiko Iwasaki has a nice further explanation of the point made there.)

  2. To KK-W, Tsang OT-Y, Leung W-S, et al. Temporal profiles of viral load in posterior oropharyngeal saliva samples and serum antibody responses during infection by SARS-CoV-2: an observational cohort study. The Lancet Infectious Diseases. March 2020:S1473309920301961. doi:10.1016/S1473-3099(20)30196-1
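As a minimal sketch of the slope-estimation idea (plain Python with simulated data; the numbers below are invented, not the paper's): fit a per-patient least-squares slope of log10 copies/mL against day of illness.

```python
import random

random.seed(0)

# simulate one patient's serial log10 RNA copies/mL: high initially, declining
days = [1, 3, 5, 7, 9, 11, 13, 15]
true_intercept, true_slope = 8.0, -0.4   # hypothetical values
log10_load = [true_intercept + true_slope * d + random.gauss(0, 0.3)
              for d in days]

# ordinary least-squares slope
n = len(days)
mx = sum(days) / n
my = sum(log10_load) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(days, log10_load))
         / sum((x - mx) ** 2 for x in days))
print(f"estimated decline: {slope:.2f} log10 copies/mL per day")
```

In a real analysis the per-patient slopes would of course come from a hierarchical model rather than separate regressions, but the estimand is the same.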