In RCTs in cognition research, the primary outcomes are usually summary measures (e.g., a weighted average of several reaction time tasks) which putatively measure some latent cognitive ability, such as short-term memory. These summary measures sometimes inappropriately average over cognitive tests that measure distinct cognitive processes. So, as a secondary/exploratory analysis, it can be valuable to look at each of the cognitive tests individually.
An immediate problem with this approach is that, with a large number of tests (~20), the treatment effect estimates will be noisy, and spurious evidence of benefit/harm may emerge.
One solution is to use a hierarchical model as described by Gelman et al. (who specifically use multiple cognition outcomes as an example). This way, we estimate the standard deviation of the treatment effect across the different outcomes, and partially pool the estimates accordingly. Our treatment effect estimates are then less noisy and less error prone.
But this doesn't seem workable when the outcomes are a mix of reaction time tasks, accuracy tasks, etc., each of which is very differently distributed and needs to be modeled in a different way. One overall hierarchical model isn't an option.
So, my question is: is it possible to fit several independent models, retain the treatment effect estimates from each, and then partially pool them in a second modelling stage?
In other words, is it possible to take a vector of treatment effect estimates from different models and appropriately shrink them towards their common mean? Or is there a different/better option available?
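One way to picture that second stage: treat each model's estimate and standard error as data in a normal-normal hierarchical (random-effects meta-analysis) model, estimate the between-outcome SD by marginal maximum likelihood, and shrink each estimate toward the common mean. Here is a minimal sketch, assuming the estimates have been put on a comparable scale; the function name `partial_pool` is my own, not from any particular package:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def partial_pool(beta, se):
    """Empirical-Bayes shrinkage of per-outcome treatment effects.

    Model: beta_j ~ N(theta_j, se_j^2), theta_j ~ N(mu, tau^2).
    Returns the pooled mean mu, the between-outcome SD tau, and the
    partially pooled (shrunken) estimates of theta_j.
    """
    beta, se = np.asarray(beta, float), np.asarray(se, float)

    def neg_marginal_loglik(log_tau):
        tau2 = np.exp(log_tau) ** 2
        v = se**2 + tau2                       # marginal variance of beta_j
        mu = np.sum(beta / v) / np.sum(1 / v)  # profiled-out common mean
        return 0.5 * np.sum(np.log(v) + (beta - mu) ** 2 / v)

    res = minimize_scalar(neg_marginal_loglik, bounds=(-10, 5), method="bounded")
    tau2 = np.exp(res.x) ** 2
    v = se**2 + tau2
    mu = np.sum(beta / v) / np.sum(1 / v)
    # Posterior mean of theta_j: precision-weighted average of the raw
    # estimate and the pooled mean.
    w = tau2 / (se**2 + tau2)
    return mu, np.sqrt(tau2), w * beta + (1 - w) * mu
```

When the estimated `tau` is near zero the estimates collapse toward the common mean; when it is large they stay close to the raw per-outcome estimates. A fully Bayesian version would also propagate uncertainty in `tau`, which this sketch ignores.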
When I have encountered such cognitive scores I displayed the estimates in a forest plot, with the estimate for the aggregated scores at the bottom (usually there are several levels at which aggregation occurs). Incidentally, as @f2harrell says, I think proportional odds modelling would be useful here, although you don't see it in the literature; it is usually the difference between mean scores that is given, but even the subject matter experts don't know exactly how to interpret the magnitude of the effect on that scale. Also, I thought these scores were designed to be aggregated and are not considered meaningful when teased apart?
"a mix of reaction time tasks, accuracy tasks etc" - I know these outcomes, and they are rarely analysed properly. Is it time to complete tasks and number of tasks completed? So some joint model is needed?
I had never heard of this approach. It makes a lot of sense for testing a "global" treatment effect without relying on summary scores. I will read about O'Brien's approach in more detail.
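For reference, the rank-sum version of O'Brien's global test is simple: rank subjects within each outcome (oriented so higher = better), sum each subject's ranks across outcomes, then compare the rank sums between arms with a t-test. A hedged sketch, assuming complete data and my own function name:

```python
import numpy as np
from scipy import stats

def obrien_rank_sum_test(y_treat, y_ctrl):
    """O'Brien's rank-sum global test across multiple outcomes.

    y_treat, y_ctrl : (n_subjects, n_outcomes) arrays, each column
    oriented so that larger values indicate benefit.
    """
    y = np.vstack([y_treat, y_ctrl])
    ranks = stats.rankdata(y, axis=0)   # rank subjects within each outcome
    score = ranks.sum(axis=1)           # per-subject rank sum
    n_t = y_treat.shape[0]
    return stats.ttest_ind(score[:n_t], score[n_t:])
```

Because it works on ranks, it sidesteps the problem of the outcomes being on different scales, at the cost of only testing a global direction of benefit rather than estimating per-outcome effects.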
This is what I had done initially - combining mean differences and log odds ratios into a forest plot. What I had not done is include the aggregate score at the bottom. That's a good idea, as it will make clear that if there is no evidence of benefit on the aggregate scale, large effects on individual tests can probably be considered somewhat suspect. Otherwise, you do not make any formal adjustment for multiple comparisons?
Most commonly it is response time and the proportion of correct responses out of N trials. There probably is a good generative joint model for this process, and I should look into that. People tend to treat the RT and accuracy components as independent, and model both as Gaussian. Obviously with accuracy scores this produces a terrible fit to the data, particularly when there are floor/ceiling effects. A shifted lognormal for RT and a beta-binomial model for accuracy seem to fit the data very well, though this still (problematically) treats accuracy and RT as independent…
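Both of those marginal models can be fitted with scipy; a minimal sketch, with my own function names, which (as noted above) still treats RT and accuracy as independent:

```python
import numpy as np
from scipy import stats
from scipy.optimize import minimize

def fit_shifted_lognormal(rt):
    """Fit a shifted lognormal to reaction times. scipy's lognorm `loc`
    parameter acts as the shift (e.g., non-decision time)."""
    shape, shift, scale = stats.lognorm.fit(rt)
    return shape, shift, scale

def fit_beta_binomial(k, n):
    """Fit a beta-binomial to accuracy: k correct out of n trials per
    subject. Maximizes the likelihood over log(a), log(b)."""
    k = np.asarray(k)

    def nll(params):
        a, b = np.exp(params)  # exponentiate to keep a, b positive
        return -stats.betabinom.logpmf(k, n, a, b).sum()

    res = minimize(nll, x0=[0.0, 0.0], method="Nelder-Mead")
    return np.exp(res.x)
```

The beta-binomial handles the overdispersion and ceiling effects that a Gaussian on proportions cannot; a genuinely joint account of speed and accuracy would need something like an evidence-accumulation (e.g., diffusion) model instead.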
If you could imagine overall utility as some linear combination of partial utility functions for each outcome (no interactions), then stochastic multicriteria acceptability analysis might help you cast things in terms of decision theory. It is relatively easy to output summaries like the relative importance of outcomes required to prefer one therapy over the other, and how confident you can be in that ranking. Tommi Tervonen (Tommi Tervonen's personal web space).
Both of these ideas miss out on the benefit of Gelman's approach of sharing information across outcomes, and instead just try to turn your 20 outcomes into 1.