Best approach to comparing time-of-day effects for binary event (admission to hospital)

Hi, I have a dataset that I have some basic intuition about how to analyze, but I could use some further guidance.

I am looking at the likelihood that a patient presenting to the emergency department will be admitted to the hospital based on time-of-day of presentation. My data looks like this:


I have thought about several different ways to approach this data, but none seem perfect yet:

  1. Group by hour (24 groups), calculate the odds of admission in each hour, odds_hour = SUM_hour(TRUE)/SUM_hour(FALSE) (strictly an odds rather than an odds ratio), with standard errors SE(log(odds_hour)) = SQRT(1/SUM_hour(TRUE) + 1/SUM_hour(FALSE)), then look for a relationship log(odds) ~ Time. This seems like it would be highly dependent on how I group, and I’m not sure how to incorporate the SE into a linear model like that.

  2. Take a cumulative sum of admission (TRUE) and non-admission (FALSE) events across time from 00:00 to 23:59. This gives me two lines. If I normalize to total admissions and non-admissions, they span [0,1], but then I lose information about the total number, which seems like it would affect the uncertainty in my calculations. I feel like this is analogous to survival curves, and a Cox proportional-hazards model might be appropriate, but the starting conditions feel different from those in Kaplan-Meier survival curves.
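A minimal sketch of option 1, using simulated data because the real dataset is not shown. The names `hour`, `admitted`, `hourly`, `n_true`, and `n_false` are made up for illustration. It computes the hourly log odds and their standard errors by hand, then fits a grouped binomial GLM on the same counts. A binomial GLM weights each hour by its counts automatically, which is one way to avoid having to feed the SEs into a linear model separately.

```r
set.seed(42)

# Simulated stand-in for the real data (assumption: one row per ED visit)
n <- 5000
hour <- sample(0:23, n, replace = TRUE)
admitted <- rbinom(n, 1, plogis(-0.5 + 0.6 * sin(2 * pi * hour / 24)))

# Hourly counts of admissions (TRUE) and non-admissions (FALSE)
hourly <- data.frame(
  hour    = 0:23,
  n_true  = as.vector(tapply(admitted == 1, hour, sum)),
  n_false = as.vector(tapply(admitted == 0, hour, sum))
)

# Log odds of admission and its standard error, per hour
hourly$log_odds <- log(hourly$n_true / hourly$n_false)
hourly$se <- sqrt(1 / hourly$n_true + 1 / hourly$n_false)

# Grouped logistic regression on the counts; with hour as a factor this
# reproduces the hand-computed log odds and SE for the reference hour (0)
fit <- glm(cbind(n_true, n_false) ~ factor(hour), family = binomial, data = hourly)
c(by_hand = hourly$log_odds[1], glm = unname(coef(fit)[1]))
```

Replacing `factor(hour)` with a smooth or periodic function of time keeps the same weighting while removing most of the dependence on how the hours are grouped.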

I would appreciate any guidance to any papers, resources, or other threads (that I failed to identify) that might put me on the right path here.



Something like option 1, e.g. a log-binomial regression with some nonlinear terms, then plot the proportion moved from ED to hospital over 24 hours?

Edit: I guess some will say don't categorise a continuous scale, but maybe it depends on the hypothesis? E.g. maybe they are interested in when shifts start and end?

lrm(Admission ~ rcs(time, 4), data = data), from the rms package, avoids grouping and allows nonlinearity. It ignores periodicity, though (the relationship should be continuous between 23:59 and 00:00); perhaps that could be modeled with periodic splines?
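To illustrate the periodicity point without needing rms installed, here is a rough sketch that swaps in base R's `glm` with `splines::ns`. A natural cubic spline is the same kind of curve as a restricted cubic spline, although the knot placement differs from what `rcs` would choose. The data are simulated and all names are illustrative.

```r
library(splines)
set.seed(7)

# Simulated stand-in data: arrival time in hours and a 0/1 admission flag
n <- 3000
time <- runif(n, 0, 24)
admitted <- rbinom(n, 1, plogis(-0.5 + 0.6 * sin(2 * pi * time / 24)))

# Natural cubic spline with 3 degrees of freedom (4 knots including boundaries)
fit_ns <- glm(admitted ~ ns(time, df = 3, Boundary.knots = c(0, 24)),
              family = binomial)

# Fitted admission probability on a grid across the day
grid <- data.frame(time = seq(0, 24, by = 0.25))
grid$p_hat <- predict(fit_ns, newdata = grid, type = "response")

# Fitted values at 00:00 and 24:00: nothing in this model forces them to match
grid$p_hat[c(1, nrow(grid))]
```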

This is pretty easy to implement in mgcv using cyclic cubic regression splines:

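library(mgcv)
# Placeholder data: uniform arrival times (hours) and a coin-flip admission flag
time <- runif(100, 0, 24)
admitted <- rbinom(100, size = 1, prob = 0.5)
# bs = "cc" requests a cyclic cubic regression spline
my_mod <- gam(admitted ~ s(time, bs = "cc"), family = binomial(), method = "REML")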

Seems like a natural way to enforce continuity at midnight.
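For the plotting side of this, a sketch of how the fitted curve and an approximate 95% band could be pulled out of such a model. The data are simulated with a made-up daily pattern, and the `knots` argument is an addition here, used to pin the cycle to the 0 to 24 hour range.

```r
library(mgcv)
set.seed(1)

# Simulated stand-in data with an invented daily pattern
n <- 2000
time <- runif(n, 0, 24)
admitted <- rbinom(n, 1, plogis(-0.5 + 0.6 * sin(2 * pi * time / 24)))

# Cyclic cubic regression spline; the two knots set the ends of the cycle
fit_cc <- gam(admitted ~ s(time, bs = "cc"), family = binomial(),
              method = "REML", knots = list(time = c(0, 24)))

# Predict on the logit scale, then back-transform to probabilities
grid <- data.frame(time = seq(0, 24, by = 0.25))
pr <- predict(fit_cc, newdata = grid, type = "link", se.fit = TRUE)
grid$p_hat <- plogis(pr$fit)
grid$lower <- plogis(pr$fit - 1.96 * pr$se.fit)
grid$upper <- plogis(pr$fit + 1.96 * pr$se.fit)

if (interactive()) {
  plot(grid$time, grid$p_hat, type = "l", ylim = c(0, 1),
       xlab = "Hour of presentation", ylab = "P(admitted)")
  lines(grid$time, grid$lower, lty = 2)
  lines(grid$time, grid$upper, lty = 2)
}
```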


Thank you for your responses!

I like that this doesn’t force a sine wave on the data (no reason peak and trough should be 12 hours apart).

What is it about these splines that ensures continuity at midnight?

It seems one could also do this in a Bayesian fashion as follows:

I ended up implementing the following,

library(brms)

# Nonlinear logistic model: logit(p) = a + b * sin((Time + c) * d)
fit.sin <- brm(data = df, family = binomial,
               bf(Outcome | trials(1) ~ a + b * sin((Time + c) * d), a + b + c + d ~ 1, nl = TRUE),
               prior = c(prior(normal(0, 1), nlpar = "a"),
                         prior(cauchy(0, 1), nlpar = "b", lb = -0.001), ## keep b positive, run phase across 24 hours
                         prior(cauchy(0, 24), nlpar = "c", lb = 0, ub = 24 - 0.001), ## phase across 24 hours since b > 0
                         prior(constant(3.141593/12), nlpar = "d")), ### correction factor pi/12
               seed = 10)

based on this example:

What I did like about the sine fit was that it gave me firm parameters for peak and phase (via b and c), which were otherwise difficult to estimate from my data, since the peak and phase vary somewhat depending on how finely or coarsely I bin my data.
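For comparison, the same single-sine curve can be fitted by ordinary maximum likelihood, because b * sin(d * (t + c)) = b * cos(d * c) * sin(d * t) + b * sin(d * c) * cos(d * t) is linear in the two coefficients of sin(d * t) and cos(d * t). The sketch below is not the brms model above; it is a swapped-in frequentist version on simulated data, with b and c recovered from the fitted coefficients. All names and the "true" values are made up.

```r
set.seed(10)

omega <- pi / 12                 # same role as d in the model above
n <- 20000
time <- runif(n, 0, 24)
b_true <- 1                      # invented amplitude on the logit scale
c_true <- 5                      # invented phase shift in hours
admitted <- rbinom(n, 1, plogis(-0.5 + b_true * sin(omega * (time + c_true))))

# Linearized sine model: logit(p) = a + beta1 * sin(omega t) + beta2 * cos(omega t)
fit_lin <- glm(admitted ~ sin(omega * time) + cos(omega * time), family = binomial)
beta <- unname(coef(fit_lin))

# Back-transform: beta1 = b cos(omega c), beta2 = b sin(omega c)
a_hat <- beta[1]
b_hat <- sqrt(beta[2]^2 + beta[3]^2)
c_hat <- (atan2(beta[3], beta[2]) / omega) %% 24

# The sine peaks where omega * (t + c) = pi / 2, i.e. t = 6 - c (mod 24)
peak_time <- (6 - c_hat) %% 24
round(c(a = a_hat, b = b_hat, c = c_hat, peak = peak_time), 2)
```

Uncertainty for b and c would still need something extra here (the delta method or a bootstrap, for example), which the Bayesian fit provides directly through the posterior.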

The next step, which I tried but could not figure out, was making an asymmetric sine wave, where I still use a parametric sine function but the peak-to-trough distance does not equal the trough-to-peak distance. It ended up requiring mod-24 arithmetic and if/else statements.

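# Piecewise sine: one half-wave of length tau, the other of length per - tau
f.asym.3 <- function(t, a, b0, b1, tau, ga, per = 24) {
  t <- (t - ga) %% per  # shift by phase ga and wrap onto [0, per)
  ifelse(t < tau,       # vectorized replacement for if/else
         a + b0 * sin(pi / tau * (t - tau)),
         a + b1 * sin(pi / (per - tau) * (t - tau)))
}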

I was able to piece together an implementation in brms using step functions to turn functions on and off based on the time, but it was too unwieldy and I abandoned it before finishing the implementation. Have to think there was an easier way…
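A quick way to see what this piecewise form does before committing to a full fit is to evaluate it on a grid. The sketch below uses a vectorized copy of the function under a hypothetical name (`asym_wave`) and arbitrary parameter values chosen only for illustration.

```r
# Vectorized piecewise sine, mirroring the function above (hypothetical name)
asym_wave <- function(t, a, b0, b1, tau, ga, per = 24) {
  s <- (t - ga) %% per                                # shifted, wrapped time
  ifelse(s < tau,
         a + b0 * sin(pi / tau * (s - tau)),          # half-wave below a, width tau
         a + b1 * sin(pi / (per - tau) * (s - tau)))  # half-wave above a, width per - tau
}

# Arbitrary illustrative values: trough depth 1 over 8 h, peak height 2 over 16 h
grid <- seq(0, 24, by = 0.01)
y <- asym_wave(grid, a = 0, b0 = 1, b1 = 2, tau = 8, ga = 3)

t_trough <- grid[which.min(y)]   # location of the minimum
t_peak   <- grid[which.max(y)]   # location of the maximum
c(trough = t_trough, peak = t_peak, gap = t_peak - t_trough)

# Values on either side of the seam at t = ga + tau, and at the day boundary
asym_wave(c(11 - 1e-9, 11 + 1e-9, 0, 24), a = 0, b0 = 1, b1 = 2, tau = 8, ga = 3)
```

One thing this makes visible: in this particular form the trough and the peak sit at the midpoints of their segments, so they are always per/2 apart regardless of tau. The asymmetry is in the widths and heights of the two half-waves, not in the peak-to-trough spacing. The curve is continuous at the seams, but its slope only matches there when b0/tau equals b1/(per - tau).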


They are periodic by design and smoothly continuous?

More specifically, I don’t clearly see what tells the function that 0 and 24 (or 0 and 2 \pi) are the same. If I were to feed in data from hours 0 to 22, would it assume that 0 and 22 are the same?

Look at the implementation for the R package. They require you to specify the domain, i.e., 0 through 24. Xmin and xmax or something similar. is one example, another is mentioned above in the thread.
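In mgcv specifically, the domain can be set through the `knots` argument of `gam`. As far as I understand the documentation, a "cc" smooth otherwise takes the range of the observed covariate as the ends of the cycle, so data covering only hours 0 to 22 would indeed be wrapped at roughly 0 and 22. A sketch on simulated data (all names illustrative) comparing the two:

```r
library(mgcv)
set.seed(1)

# Simulated data that only cover hours 0 to 22
n <- 2000
time <- runif(n, 0, 22)
admitted <- rbinom(n, 1, plogis(-0.5 + 0.8 * sin(2 * pi * time / 24)))

# Default: cycle ends are taken from the observed range of time
fit_default <- gam(admitted ~ s(time, bs = "cc"), family = binomial(),
                   method = "REML")

# Two knots supplied: they are used as the ends of the cycle (0 and 24)
fit_24 <- gam(admitted ~ s(time, bs = "cc"), family = binomial(),
              method = "REML", knots = list(time = c(0, 24)))

# Linear predictor at the ends of each model's cycle
ends_default <- predict(fit_default, newdata = data.frame(time = range(time)))
ends_24      <- predict(fit_24, newdata = data.frame(time = c(0, 24)))

# Each pair matches, but only fit_24 joins up at midnight
rbind(default = ends_default, pinned_0_24 = ends_24)
```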

Also consider a repeating spline function which doesn’t require special nonlinear estimation.