Most published predictive models of medication non-adherence are compared mainly on the c-statistic (discrimination ability). Non-adherence is usually measured with the proportion of days covered (PDC), a continuous variable that is then dichotomized (PDC ≥ 80% is considered good adherence). I have learned from this forum that dichotomizing continuous variables is undesirable, but that is not my main concern here.
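To make the quantities concrete, here is a minimal Python sketch of how I understand them. The function names (`pdc`, `is_adherent`, `c_statistic`) and the day-index representation of coverage are my own illustrative assumptions, not taken from any of the published models. The c-statistic is computed as the fraction of (adherent, non-adherent) patient pairs in which the adherent patient gets the higher predicted score.

```python
from itertools import product


def pdc(covered_days, period_days):
    """Proportion of days covered: distinct covered days / days in the period.

    covered_days: iterable of 0-based day indices with medication on hand
    (hypothetical representation; real claims data need fill dates and supply).
    """
    covered = {d for d in covered_days if 0 <= d < period_days}
    return len(covered) / period_days


def is_adherent(pdc_value, threshold=0.80):
    """Dichotomize PDC at the conventional 80% cutoff."""
    return pdc_value >= threshold


def c_statistic(scores, labels):
    """Concordance between predicted scores and binary labels.

    Counts each (positive, negative) pair as 1 if the positive case has the
    higher score, 0.5 if the scores tie, and 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case in each class")
    concordant = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return concordant / (len(pos) * len(neg))


if __name__ == "__main__":
    # A made-up patient covered on the first 292 of 365 days.
    value = pdc(range(292), 365)
    print(value, is_adherent(value))
    # Made-up predicted scores and observed adherence labels.
    print(c_statistic([0.9, 0.2, 0.3, 0.4], [1, 1, 0, 0]))
```

The pairwise loop is quadratic in the number of patients, which is fine for illustration but not for large claims datasets.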
By comparing c-statistics, some researchers have shown that prediction of PDC in the second year after the first fill is greatly improved when PDC over the first three months is included as a predictor in the model.
Is this not using part of Y to predict Y? Doesn't this approach yield overoptimistic c-statistics (even after 10-fold cross-validation)?
Thanks,
Elias