In an ecological study, we investigated whether systemic drug prescribing for psoriasis varies by season and by other exacerbating factors. Eligible patients with psoriasis were assessed in each season from 2016 to 2019 for initiation, discontinuation and switching of systemic drugs. The incidence of initiation, switching and discontinuation, with 95% CIs, was calculated separately for every season in 2016-2019. In addition, the mean incidence and 95% CI for each season (winter through fall) across 2016-2019 were calculated. To determine a seasonal trend, we compared the 95% CIs for the incidence of initiation, switching and discontinuation across seasons: if the 95% CIs for two seasons did not overlap, the two incidences were considered statistically different. Line graphs were also used to show the seasonal trend for systemic drugs overall and stratified by each covariate. We received the following comments from a reviewer.
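A minimal sketch of the per-season calculation and the non-overlap rule described above, written in Python for illustration. It assumes each incidence is a proportion (events divided by eligible patients) and uses a Wilson score 95% CI; the post does not say which CI method the platform used, and the counts below are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def wilson_ci(events: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Wilson score interval for the binomial proportion events / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def cis_overlap(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True if the two intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts: (initiations, eligible patients) for one season-year each.
seasons = {
    "winter": (120, 10_000),
    "spring": (150, 10_000),
    "summer": (95, 10_000),
    "fall": (130, 10_000),
}

cis = {s: wilson_ci(e, n) for s, (e, n) in seasons.items()}
for s, (lo, hi) in cis.items():
    e, n = seasons[s]
    print(f"{s:7s} {e / n:.4f} ({lo:.4f}, {hi:.4f})")

print("spring vs summer overlap:", cis_overlap(cis["spring"], cis["summer"]))
```

Requiring non-overlapping 95% CIs is stricter than a direct 5%-level test of the difference, so overlapping intervals alone do not show that two seasons are equal.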
“The authors have not accounted for repeated measures over time by using a generalized linear model with log-link (mixed-effects logistic regression model) to derive the seasonal percentages. The confidence intervals in Tables 2-4 also do not account for the correlation due to repeated measures.
The data are presented clearly; however, if the correlation among repeated measures is not considered, the results may not be accurate. Given that the differences found are small, this could change some of the results.
We cannot determine the accuracy without repeated-measures logistic regression models being considered.”
The exposure in this study is a time interval (season), an aggregate variable rather than a patient-level variable. Are the reviewer’s comments legitimate? We have only 16 time points, and for the aggregate analysis we have four seasonal averages. Because the analysis was generated on a software platform, we do not have patient-level data. How could we address these comments? Thank you so much.
I believe the reviewer is right that this approach is not adequate and that you must take a modelling approach, although it’s not clear to me how random effects necessarily fit into that. The simplest thing would be to see how others have analysed such data in the journal you are aiming for, and which statistical sources they reference.
I echo the sentiment that random effects don’t make sense here. fharrell.com/post/re relates to this.
Thank you so much for your reply. If a logistic regression includes both individual-level covariates (age, sex, psoriatic arthritis) and ecologic measures (season, humidity, latitude), an important limitation of the analysis is that observations for individuals within groups are unlikely to be independent, whereas logistic regression assumes that observations are independent.
Specifically, if a patient was not on a biologic drug in the spring, summer, fall and winter of a particular year, that patient would contribute four records for that year, and those records would not be independent when we assessed biologic initiation. Likewise, if a patient was on a biologic drug in the spring, summer, fall and winter of a particular year and did not switch from or discontinue the biologic, that patient would again contribute four records for that year, which would not be independent when we assessed biologic switching or discontinuation. In all these situations, the independence assumption is not met, so I am quite confused about how the modeling can be done.
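One rough way to gauge how much repeated records like these could matter, without patient-level data, is the Kish design effect for clustered observations. For an estimate that pools all of a patient's records, with m records per patient and an assumed intra-patient correlation ρ, the variance is inflated by roughly 1 + (m − 1)ρ. The sketch below is illustrative only: the ρ values and the incidence and CI are assumptions, not estimates from the study, and it does not replace a model that represents time within patient.

```python
import math


def design_effect(records_per_patient: float, icc: float) -> float:
    """Kish design effect 1 + (m - 1) * rho for equal-sized clusters."""
    return 1 + (records_per_patient - 1) * icc


def inflate_ci(estimate: float, lower: float, upper: float, deff: float) -> tuple[float, float]:
    """Widen each half of a CI by sqrt(deff) around the point estimate."""
    factor = math.sqrt(deff)
    return estimate - (estimate - lower) * factor, estimate + (upper - estimate) * factor


# Assumed: 4 records per patient; the intra-patient correlations are guesses.
for icc in (0.0, 0.1, 0.3):
    deff = design_effect(4, icc)
    lo, hi = inflate_ci(0.015, 0.0128, 0.0176, deff)  # hypothetical incidence and CI
    print(f"rho={icc:.1f}: deff={deff:.2f}, widened CI=({lo:.4f}, {hi:.4f})")
```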
In addition, the patient-level data cannot be downloaded from the analytic platform. Is there an alternative way that we could move forward without the modeling step?
Thank you for your review and comment, Dr. Harrell.
The non-independence that you describe is time within patient, which could be represented in the model, but you don't have the patient-level data?
Thank you for your reply. Unfortunately, the analysis was performed by our vendor on their platform, which does not allow patient-level data to be downloaded because of the size of the data. I wish we had used SAS, which would have given us a lot of flexibility.
If you think SAS has flexibility, take a look at R.
I wish we had used SAS and generated the patient-level data. Are you saying R can do data manipulation as well? We do not have patient-level data for this manuscript because the platform has no data-download feature.
But you could analyse the aggregated data using R, if you have it, to satisfy the peer reviewer's request for a repeated-measures analysis that allows for the correlated data. I think the reviewer should be understanding about the lack of access to patient-level data.
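As one minimal sketch of an aggregate-level analysis that respects the repeated yearly structure, assuming only the 16 season-year incidences are available: treat each year as a block and compare two seasons through their within-year differences, so that year-to-year variation shared by both seasons cancels out. This is a paired comparison on four year-level differences, not the mixed-effects model the reviewer named; it is shown in Python for illustration (the same arithmetic works in R), and the incidences are hypothetical.

```python
import math
import statistics

# 97.5% quantile of Student's t with 3 degrees of freedom (4 years -> 3 df).
T_975_DF3 = 3.182446305284263


def paired_season_difference(a: list[float], b: list[float], t_crit: float = T_975_DF3):
    """Mean within-year difference a - b with a t-based 95% CI.

    a[i] and b[i] must come from the same year; the default t_crit
    is only correct for exactly four years.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need matched yearly values for both seasons")
    diffs = [x - y for x, y in zip(a, b)]
    mean = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean, mean - t_crit * se, mean + t_crit * se


# Hypothetical incidences per 1,000 eligible patients, 2016-2019 in order.
incidence = {
    "winter": [12.1, 11.8, 12.6, 12.3],
    "summer": [9.6, 9.9, 10.2, 9.4],
}

mean, lo, hi = paired_season_difference(incidence["winter"], incidence["summer"])
print(f"winter - summer: {mean:.2f} per 1,000 (95% CI {lo:.2f} to {hi:.2f})")
```

With only four years the t quantile is large, so the interval is wide; this reflects how little information 16 aggregate points carry about a seasonal effect.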
That is great to know. Could you share an appropriate R program and the data structure needed for this type of analysis? That would be greatly appreciated. I am afraid that the data I have do not meet the requirements for an aggregated-data analysis in R.
Riffing off @pmbrown, some ideas that come to mind:
- Try something like the robumeta or clubSandwich packages to do a meta-regression of your seasonality effect accounting for clustering within your data. Essentially you treat each season as a separate study conducted within the same database.
- See if you can adapt code from MBNMAtime to work with a single study by providing informative priors for the correlation structure
- Use this sort of hacky solution to adjust standard errors manually based on some assumed correlation and then either meta-regression
- For any two comparisons, you could look at the sensitivity of your claims to various correlations by treating the variance of the difference as the sum of the variances minus 2 × covariance. (This might be a useful starting point to see how much of a concern it is.)
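The last suggestion can be sketched as follows, assuming two seasonal estimates with known standard errors and an assumed correlation ρ between them, so that Var(A − B) = Var(A) + Var(B) − 2ρ·SE(A)·SE(B). The estimates, standard errors and ρ values are hypothetical; the point is to see whether a conclusion flips across plausible correlations.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def diff_ci(est_a: float, se_a: float, est_b: float, se_b: float, rho: float, z: float = Z_95):
    """95% CI for a - b when the two estimates have correlation rho.

    Var(a - b) = Var(a) + Var(b) - 2 * Cov(a, b), with Cov = rho * se_a * se_b.
    """
    var = se_a**2 + se_b**2 - 2 * rho * se_a * se_b
    se = math.sqrt(max(var, 0.0))  # guard against tiny negative values from rounding
    d = est_a - est_b
    return d, d - z * se, d + z * se


# Hypothetical seasonal estimates (per 1,000) and standard errors.
for rho in (0.0, 0.3, 0.6):
    d, lo, hi = diff_ci(11.0, 0.55, 10.0, 0.50, rho)
    excludes_zero = lo > 0 or hi < 0
    print(f"rho={rho:.1f}: diff={d:.2f} ({lo:.2f}, {hi:.2f}) excludes 0: {excludes_zero}")
```

A positive correlation narrows the interval for a difference and a negative one widens it, so the direction of the assumed correlation matters as much as its size.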
The data input for all of the above would just be your estimates and standard errors, back-calculated from the 95% CIs.
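A minimal sketch of that back-calculation, assuming each reported 95% CI is symmetric (Wald-type) on the scale used, so that SE = (upper − lower) / (2 × 1.96). For ratio-type estimates the interval is usually symmetric on the log scale instead; whether the platform's CIs are symmetric on either scale is an assumption to check.

```python
import math

Z_95 = 1.959963984540054  # two-sided 95% standard normal quantile


def se_from_ci(lower: float, upper: float, log_scale: bool = False, z: float = Z_95) -> float:
    """Approximate standard error from a two-sided 95% CI.

    Assumes the interval is symmetric on the chosen scale. With
    log_scale=True the result is the SE of log(estimate), which suits
    ratio-type estimates with positive bounds.
    """
    if log_scale:
        lower, upper = math.log(lower), math.log(upper)
    return (upper - lower) / (2 * z)


# Hypothetical reported incidence CI (per 1,000): 10.9 to 13.1.
print(f"SE = {se_from_ci(10.9, 13.1):.3f}")
```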
Thank you very much for your help. For each of your suggestions, I will do some research.