Most published studies of predictive models for medication non-adherence compare models mainly by their c-statistic (discrimination ability). Non-adherence is usually measured using the proportion of days covered (PDC), a continuous variable that is then dichotomized (PDC ≥ 80% is considered good adherence). I have learned from this forum about the undesirability of dichotomizing continuous variables, but that is not my main concern here.
By comparing c-statistics, some researchers have shown that prediction of PDC in the second year after the first fill is greatly improved when the PDC of the first three months is used as a predictor in the model.
Is this not using part of Y to predict Y? Doesn't this approach result in over-optimistic c-statistics (even after 10-fold cross-validation)?
Thanks,
Elias
Since it is not valid to dichotomize adherence at 0.8, and is completely unnecessary, it’s hard to get motivated to go further along those lines. But in general you can create a first-order Markov model where adherence in the previous period is used as a covariate for the current period. You can also interact that variable with absolute time since study start to give an even more flexible correlation pattern. You can marginalize the model (de-condition on previous periods) to get marginal quantities such as cumulative incidence. The Bayesian framework is easier in this context because it will automatically take parameter estimate uncertainty into account as described here.
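To make the first-order Markov idea concrete, here is a minimal Python sketch (NumPy only). For simplicity it assumes a binary adherent/non-adherent state per period and made-up logistic transition coefficients (`B0`, `B_PREV`, `B_TIME`, `B_PREV_TIME` are hypothetical, not estimates from any study). The previous period's state enters as a covariate and is interacted with time; the marginal (unconditional) probability of adherence for one covariate pattern is then obtained by de-conditioning through a forward recursion over periods, together with the cumulative incidence of a first non-adherent period.

```python
import numpy as np


def expit(z):
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical coefficients for illustration only (not estimated from data):
# logit P(Y_t = 1 | Y_{t-1} = prev) = B0 + B_PREV*prev + B_TIME*t + B_PREV_TIME*prev*t
B0, B_PREV, B_TIME, B_PREV_TIME = -0.5, 2.5, -0.05, 0.02


def transition_prob(prev, t):
    """P(adherent in period t | state in period t-1) under a first-order
    Markov logistic model with a previous-state x time interaction."""
    return expit(B0 + B_PREV * prev + B_TIME * t + B_PREV_TIME * prev * t)


def marginalize(p1, n_periods):
    """De-condition on earlier periods for one covariate pattern.

    Returns the unconditional P(adherent in period t) and the cumulative
    incidence of a first non-adherent period by period t, t = 1..n_periods.
    """
    marginal = [p1]
    always_adherent = [p1]  # P(adherent in every period up to t)
    for t in range(2, n_periods + 1):
        p_prev = marginal[-1]
        # Law of total probability over the previous period's state
        marginal.append(p_prev * transition_prob(1, t)
                        + (1 - p_prev) * transition_prob(0, t))
        # Staying adherent requires a 1 -> 1 transition in every period
        always_adherent.append(always_adherent[-1] * transition_prob(1, t))
    return np.array(marginal), 1 - np.array(always_adherent)


marg, cuminc = marginalize(p1=0.9, n_periods=12)
print("P(adherent) by month:", np.round(marg, 3))
print("cumulative incidence of first lapse:", np.round(cuminc, 3))
```

In a real analysis the coefficients would come from a fitted model; in a Bayesian fit one could repeat this recursion for every posterior draw so that parameter uncertainty carries through to the marginal curves.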
It would be better to use finer time periods than a year, even to the point of having an observation at each prescription refill, treating assessment time as a continuous variable.
Thanks for your suggestions.
Suppose that I want to predict adherence to statins or antithrombotics as a continuous variable (calculated as PDC) in the first year after myocardial infarction (MI), which is the most important period, with the highest risk of recurrence.
If I use hierarchical modeling with country of birth as level 2 (I live in Israel, and country of birth can capture some of the differences between social groups):
Is it right to use, in addition to baseline patient-related covariates, the PDC (as a continuous variable) of the first month or first three months as a covariate in predicting first-year post-MI PDC?
Would you still recommend using finer time periods?
The goal of the model is to identify patients at risk of non-adherence in the first year post-MI, preferably before discharge from the cardiology unit, so that an intervention to improve adherence can be planned before discharge. However, most studies show improved prediction when patients' refill behaviour in the first 1-3 months is added.
Since you are probably interested in patients not on statins before their MI and would like to know about instantaneous adherence post-MI, the covariate to condition on may be their adherence history for non-cardiac medications. So you might study the subset of patients with a few other chronic diseases for which you could obtain such information. At any rate, you can semiparametrically model the PDC longitudinally within patients, which would allow estimation of many interesting quantities, including the “adherence decay curve”, i.e., the shape of the estimated PDC over time. PDC could be computed monthly.
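A monthly PDC series like the one suggested above could be computed from refill records roughly as follows. This is a minimal sketch under stated assumptions: fills are given as (fill day, days' supply) pairs counted from a hypothetical day 0 (e.g. discharge), a "month" is a fixed 30-day window, and overlapping supplies are not shifted forward (many PDC definitions do shift them, so check the definition you need). The function name `monthly_pdc` is hypothetical.

```python
import numpy as np


def monthly_pdc(fills, followup_days, month_len=30):
    """Proportion of days covered in consecutive month_len-day windows.

    fills: iterable of (fill_day, days_supply), day 0 = start of follow-up.
    A day counts as covered if any fill covers it; overlapping supplies
    are NOT carried forward in this simple version.
    """
    covered = np.zeros(followup_days, dtype=bool)
    for fill_day, days_supply in fills:
        covered[fill_day:fill_day + days_supply] = True  # slicing clips at the end
    n_months = followup_days // month_len
    windows = covered[: n_months * month_len].reshape(n_months, month_len)
    return windows.mean(axis=1)  # one continuous PDC value per month


# Hypothetical refill history: three 30-day fills with gaps between them.
pdc = monthly_pdc([(0, 30), (35, 30), (70, 30)], followup_days=90)
print(np.round(pdc, 3))
```

Keeping each monthly value continuous (rather than cutting it at 0.8) is what allows the longitudinal PDC model and the adherence decay curve described above.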
Back to this subject: if I want to use a first-order Markov model that includes continuous variables (using restricted cubic splines, rcs) and categorical variables, so that I will have around 30 parameters to estimate, do you have any suggestions on how to decide the required sample size?
Each observation will be a month, in which we will check whether the patient was adherent or not (as you suggested to me).
Given another dataset, one can use methods like those I used in some of the reports here to estimate the effective sample size (ESS) per day. For example, in the VIOLET study with 28 daily measurements, I estimated that there is effectively one subject per 6 days of measurement on one subject. At any rate, the ESS will be larger than the number of subjects. To be conservative, you might start by taking it to be 1.5 times the number of subjects for the purpose of planning model complexity.
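Reading the VIOLET figure as roughly 6 daily measurements per effective subject, the arithmetic behind "the ESS will be larger than the number of subjects" works out as follows (a back-of-the-envelope sketch, where N is the number of subjects):

```latex
\frac{\text{ESS}}{N} \approx \frac{28}{6} \approx 4.7
\qquad\text{versus the conservative planning value}\qquad
\text{ESS} \approx 1.5\,N
```

The planning factor of 1.5 is far below 4.7, which is what makes it a conservative starting point when no pilot data are available.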
So suppose I have a binary outcome that I measure every month, say for 12 or 24 months. Then I can calculate the minimal sample size for a binary outcome (according to [this article](https://onlinelibrary.wiley.com/doi/full/10.1002/sim.7992)), treat that sample size as the required ESS, and divide it by 1.5 to get a conservative estimate of the number of subjects required for such a model.
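The calculation described above could be sketched as follows. This assumes our reading of the three criteria for binary outcomes in the linked article by Riley et al.: (i) expected uniform shrinkage of at least 0.9, (ii) a gap of at most 0.05 between apparent and adjusted Nagelkerke R², and (iii) estimating the overall outcome proportion to within ±0.05. The anticipated Cox-Snell R² (0.10) and the proportion of non-adherent monthly observations (0.30) are hypothetical inputs; the 30 parameters come from the question, and the 1.5 ESS-per-subject factor is the conservative planning value suggested earlier in the thread. The function name `riley_binary_min_n` is hypothetical; please check the formulas against the paper or the authors' pmsampsize software before relying on them.

```python
import math


def riley_binary_min_n(n_params, r2_cs, prevalence,
                       shrinkage=0.9, r2_gap=0.05, margin=0.05):
    """Minimum n for a binary-outcome prediction model (our reading of
    Riley et al.'s criteria; verify against the paper before use)."""
    # (i) expected uniform shrinkage factor >= `shrinkage`
    n1 = n_params / ((shrinkage - 1) * math.log(1 - r2_cs / shrinkage))
    # (ii) apparent vs adjusted Nagelkerke R^2 differ by at most `r2_gap`
    ln_null_per_obs = (prevalence * math.log(prevalence)
                       + (1 - prevalence) * math.log(1 - prevalence))
    max_r2_cs = 1 - math.exp(2 * ln_null_per_obs)
    s_vh = r2_cs / (r2_cs + r2_gap * max_r2_cs)
    n2 = n_params / ((s_vh - 1) * math.log(1 - r2_cs / s_vh))
    # (iii) overall outcome proportion estimated within +/- `margin`
    n3 = (1.96 / margin) ** 2 * prevalence * (1 - prevalence)
    return math.ceil(max(n1, n2, n3))


# Hypothetical planning inputs: 30 parameters (from the question),
# anticipated Cox-Snell R^2 of 0.10, 30% of monthly observations non-adherent.
required_ess = riley_binary_min_n(n_params=30, r2_cs=0.10, prevalence=0.30)
# Treat the result as the required ESS and assume ESS ~ 1.5 x number of subjects.
subjects_needed = math.ceil(required_ess / 1.5)
print(required_ess, subjects_needed)
```

With these inputs the shrinkage criterion (i) dominates; the other two criteria matter more when R² is higher or the outcome proportion is extreme.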
Sort of. It would be far better to have pilot longitudinal data on which to base estimates of ESS. The way I estimate ESS is to take a single outcome assessment in the middle of follow-up and compute the standard error of a key baseline variable in predicting the single binary Y. Then gradually add more time points, fit an appropriate correlation structure (or possibly use the cluster sandwich covariance estimator), and compute the standard error of the same predictor. The ratio of the squares of the standard errors estimates the relative efficiency (ESS per actual subject). Gradually adding time points allows you to learn about the frequency of outcome assessment needed. Or you can cut through that and just use all time points to get a single longitudinal standard error.
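The procedure described above can be sketched with simulated data in plain NumPy. Everything here is an assumption for illustration: the simulated "pilot" data (500 subjects, 12 monthly binary outcomes, one baseline covariate `x`, and a subject-level random effect that induces within-subject correlation) and the choice of the cluster sandwich option rather than an explicit correlation structure. The sketch fits a logistic model to the single middle time point (model-based standard error), then to all time points with an independence working model and a subject-clustered sandwich standard error, and takes the squared ratio of the two standard errors as the relative efficiency.

```python
import numpy as np

rng = np.random.default_rng(2024)


def fit_logistic(X, y, n_iter=30):
    """Maximum likelihood logistic regression by Newton-Raphson.
    Returns coefficients, fitted probabilities, inverse Fisher information."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(info, X.T @ (y - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1 - p))[:, None])
    return beta, p, np.linalg.inv(info)


def cluster_sandwich(X, y, p, bread, groups, n_groups):
    """Cluster-robust (sandwich) covariance, clustering on subject."""
    scores = X * (y - p)[:, None]
    per_subject = np.zeros((n_groups, X.shape[1]))
    np.add.at(per_subject, groups, scores)  # sum score contributions per subject
    meat = per_subject.T @ per_subject
    return bread @ meat @ bread


# Hypothetical pilot data (illustrative values only)
n, T = 500, 12
x = rng.normal(size=n)
u = rng.normal(scale=1.5, size=n)  # subject effect -> within-subject correlation
eta = -0.5 + 0.7 * x + u
y = (rng.random((n, T)) < 1 / (1 + np.exp(-eta[:, None]))).astype(float)

# (a) single assessment in the middle of follow-up, model-based SE
X1 = np.column_stack([np.ones(n), x])
_, _, cov1 = fit_logistic(X1, y[:, T // 2])
se_single = np.sqrt(cov1[1, 1])

# (b) all time points, independence working model + cluster sandwich SE
groups = np.repeat(np.arange(n), T)
Xall = np.column_stack([np.ones(n * T), np.repeat(x, T)])
yall = y.reshape(-1)  # row-major: subject 0's T values, then subject 1's, ...
_, p_all, bread = fit_logistic(Xall, yall)
se_long = np.sqrt(cluster_sandwich(Xall, yall, p_all, bread, groups, n)[1, 1])

rel_eff = (se_single / se_long) ** 2  # ESS per actual subject
ess = n * rel_eff
print(f"relative efficiency {rel_eff:.2f}, ESS ~ {ess:.0f} for {n} subjects")
```

With real pilot data you would replace the simulated arrays and, to learn about assessment frequency, repeat step (b) using progressively more time points.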