A colleague brought to my attention the following peer-reviewed meta-analysis of prospective prognostic studies that attempted to predict future ACL injury risk from knee kinematics.
Title: Do knee abduction kinematics and kinetics predict future anterior cruciate ligament injury risk? A systematic review and meta-analysis of prospective studies
The authors found 9 studies, using 3 different assessments, that attempted to detect excess angular motion as a predictor of future ACL injury. These studies reported 12 confidence intervals, which were grouped according to the assessment used. Figure 2 of their report shows 3 separate estimates of aggregate effect size, in either degrees or centimeters of displacement.
An examination of their aggregate compatibility intervals showed a shift in favor of the prognostic value of knee kinematics, but all of the computed intervals included 0, leading the authors to conclude:
> Contrary to clinical opinion, our findings indicate that knee abduction kinematics and kinetics during weight-bearing activities may not be risk factors for future ACL injury.
Using only the data reported in their Figure 2, a re-analysis combining the p-values implied by the intervals, with \alpha selected using Bayesian reasoning, shows that, contrary to the authors' conclusion, disbelieving the predictive value of knee kinematics requires unreasonably high prior odds in favor of the null.
Methods
Bayarri, Berger, and Sellke (2017) provide the rationale for a Bayesian interpretation of frequentist procedures.
Expressing Bayes’ Theorem in odds:
O_{pre} = \frac{\pi_{1}}{\pi_{0}} \times \frac{1 - \bar\beta }{\alpha}

where \pi_1 / \pi_0 is the prior odds in favor of an effect, 1 - \bar\beta is the power, and \alpha is the type I error rate.
The evidential value of an experiment designed to detect an effect is the ratio \frac{1 - \bar\beta}{\alpha}, a quantity they call the rejection ratio. Given the context, the prior odds, power, and \alpha can be chosen to design an experiment that shifts our beliefs by the amount needed for the decision at hand.
At conventional type I (0.05) and type II (0.2) error probabilities, a single experiment can be expected to produce evidence of at most 16:1 (0.8 / 0.05) in favor of an alternative, conditional on actually obtaining a p-value below the specified \alpha. For hypotheses with lower prior odds, multiple conceptually similar studies must be performed and combined before enough evidence accumulates to overcome a skeptical prior.
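In code, the pre-experimental odds calculation is a one-liner (a minimal sketch; the even prior odds in the example are arbitrary):

```python
def posterior_odds(prior_odds, power, alpha):
    """Pre-experimental rejection odds: prior odds times the rejection ratio."""
    return prior_odds * (power / alpha)

# Conventional error rates: power = 0.8, alpha = 0.05
rejection_ratio = 0.8 / 0.05            # 16 -- evidence of at most 16:1
odds = posterior_odds(1.0, 0.8, 0.05)   # even prior odds shift to 16:1 in favor
```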
For the purpose of this re-analysis, the following assumptions are made:
- Prior odds on detecting an effect = 1:19; Prior probability 0.05 in favor of effect.
- Type II error of a single study = 0.6; Power = 0.4; \alpha = 0.05. Since \frac{0.4}{0.05} = 8 > 1, a single study has evidential value, but is unlikely to reject the null reference hypothesis of no effect on its own.
- Target Posterior odds 9:1 in favor of kinematics predicting injury risk.
- Combined power = 1 - 0.6^{12} \approx 0.998
- Solving for \alpha = \frac{0.998}{9 \times 19} \approx 0.0058.
Despite discounting the power of any individual study, combining the studies and reducing \alpha by roughly a factor of 10 yields posterior odds greater than 9:1 that at least 1 study detected a true effect, provided the observed combined p-value falls below this level (specified before looking at the data).
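The arithmetic behind these assumptions can be checked directly (a sketch showing the values before rounding):

```python
beta = 0.6                  # assumed type II error for any single study
n_intervals = 12            # confidence intervals reported in Figure 2
prior_odds = 1 / 19         # prior odds in favor of an effect
target_posterior_odds = 9   # desired shift in belief

# Chance that at least one study rejects, if an effect truly exists
combined_power = 1 - beta ** n_intervals            # about 0.998

# Rearranging O_post = prior_odds * combined_power / alpha for alpha
alpha = combined_power * prior_odds / target_posterior_odds   # about 0.0058
```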
P-values are difficult to interpret. Greenland and Rafi (2020) recommend an alternative representation, the S-value: the negative base-2 logarithm of the p-value, s = log_2(\frac{1}{p}) = -log_2(p). This rescaling expresses the information against the asserted test hypothesis in bits, and is known as the surprisal. Intuitively, it compares the observed p-value to a run of flips of a fair coin: very low p-values translate to high surprisal values, indicating strong information against the asserted model.
The important point: a surprisal of 0 asserts there is no information, not that the information supports the null.
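The surprisal calculation is trivial to implement; observing s heads in a row from a fair coin is exactly as surprising as a p-value of 2^{-s}:

```python
import math

def s_value(p):
    """Bits of information against the test hypothesis (Shannon surprisal)."""
    return -math.log2(p)

s_value(0.05)   # about 4.3 bits: as surprising as ~4 heads in a row
s_value(1.0)    # 0 bits: no information against the hypothesis
```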
The data reported in Figure 2 were manually extracted and entered into a spreadsheet (LibreOffice Calc 5.4.2.2). One-sided p-values can be recovered by the procedure described in Knapp, Hartung, and Sinha (p. 31) (link) and steps 1 and 2 of Altman (2011) (link).
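Concretely, under a normal approximation the standard error is the 95% CI width divided by 2 × 1.96, and the Z score is the mean divided by that standard error. A sketch, using the first row of the table below as the example:

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def z_and_one_sided_p(mean, lower, upper):
    # Standard error recovered from a 95% CI under a normal approximation
    se = (upper - lower) / (2 * 1.96)
    z = mean / se
    return z, norm_cdf(z)   # lower-tail one-sided p-value

# First table row: mean -7.71, 95% CI (-18.09, 2.67)
z, p = z_and_one_sided_p(-7.71, -18.09, 2.67)   # z ~ -1.456, p ~ 0.073
```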
The equally weighted Stouffer method will be used to combine the Z scores.
Author | Number | Task | Factor | Mean | Sign | Lower | Upper | CI Width | Std Error | Z score | One-sided p | Study S-val | Log Bayes | Study Bayes’ |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Raisanen | 1 | SLS | 2d peak | -7.71 | -1 | -18.09 | 2.67 | 20.76 | 5.296 | -1.456 | 0.0727 | 3.782 | -0.658 | 1.930 |
Dingenen | 2 | SDJ | 2d peak | -0.7 | -1 | -8.19 | 6.79 | 14.98 | 3.821 | -0.183 | 0.4273 | 1.227 | -0.012 | 1.013 |
Georger | 3 | VDJ | 3d IC | 1.65 | 1 | -2.56 | 5.86 | 8.42 | 2.148 | 0.768 | 0.7788 | 0.361 | 0.636 | 0.529 |
Hewett | 4 | VDJ | 3d IC | -8.11 | -1 | -14.25 | -1.97 | 12.28 | 3.133 | -2.589 | 0.0048 | 7.698 | -2.662 | 14.319 |
Crosshaug | 5 | VDJ | 3d IC | -0.5 | -1 | -2.01 | 1.01 | 3.02 | 0.770 | -0.649 | 0.2582 | 1.954 | -0.051 | 1.052 |
Leppanen | 6 | VDJ | 3d IC | -2.7 | -1 | -6.15 | 0.75 | 6.9 | 1.760 | -1.534 | 0.0625 | 3.999 | -0.753 | 2.122 |
Georger | 7 | VDJ | 3d Peak | 1.33 | 1 | -3.97 | 6.63 | 10.6 | 2.704 | 0.492 | 0.6886 | 0.538 | 0.359 | 0.698 |
Hewett | 8 | VDJ | 3d Peak | -7.6 | -1 | -13.36 | -1.84 | 11.52 | 2.939 | -2.586 | 0.0049 | 7.687 | -2.655 | 14.226 |
Nilstad | 9 | VDJ | 3d Peak | -0.65 | -1 | -5.55 | 4.25 | 9.8 | 2.500 | -0.260 | 0.3974 | 1.331 | -0.003 | 1.003 |
Crosshaug | 10 | VDJ | MKD | -0.3 | -1 | -0.78 | 0.18 | 0.96 | 0.245 | -1.225 | 0.1103 | 3.181 | -0.414 | 1.513 |
Leppanen | 11 | VDJ | MKD | 0.4 | 1 | -0.63 | 1.43 | 2.06 | 0.526 | 0.761 | 0.7767 | 0.365 | 0.628 | 0.533 |
Numata | 12 | VDJ | MKD | -1.3 | -1 | -3.34 | 0.84 | 4.18 | 1.066 | -1.219 | 0.1114 | 3.166 | -0.409 | 1.505 |
Sum | | | | | | | | | | -9.680 | | | -5.992 | 400.406* |
Stouffer test | | | | | | | | | | -2.794 | | | | |
One-sided p value | | | | | | | | | | | 0.003 | | | |
Omnibus S-val | | | | | | | | | | | | 8.587 | | |

\* The Study Bayes’ column combines by product rather than sum: 400.406 ≈ exp(5.992).
Looking at the Z score column, the Stouffer method computes an aggregate Z of -2.794, corresponding to a one-sided p-value of 0.003. A Bayesian interpretation of the p-value combination method suggests there is at least 1 study that detected a true effect, with \ge 90% posterior probability.
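The omnibus numbers can be reproduced from the Z score column alone (a sketch using the rounded table values):

```python
import math

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Z score column of the table
z_scores = [-1.456, -0.183, 0.768, -2.589, -0.649, -1.534,
            0.492, -2.586, -0.260, -1.225, 0.761, -1.219]

# Equally weighted Stouffer combination: sum of Z over sqrt(k)
z_stouffer = sum(z_scores) / math.sqrt(len(z_scores))   # about -2.794
p_one_sided = norm_cdf(z_stouffer)                      # about 0.003
omnibus_s = -math.log2(p_one_sided)                     # about 8.6 bits
```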
Alternatively, we could say that, conditional on the assumption that there was no effect in any study, these data provide about 8.6 bits of information against that null.
Later, I’ll describe how I’d use the log Bayes factor bound.
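For reference, the per-study Log Bayes and Study Bayes’ columns appear consistent in magnitude with the Sellke–Bayarri–Berger bound on the Bayes factor implied by a p-value; this is an assumption on my part about how the table was computed, but the numbers line up. A sketch:

```python
import math

def bayes_factor_bound(p):
    """Upper bound on the Bayes factor for H1 over H0 implied by a p-value:
    BF <= 1 / (-e * p * ln p), valid for p < 1/e (Sellke-Bayarri-Berger)."""
    return 1 / (-math.e * p * math.log(p))

# One-sided p = 0.0727 from the first table row gives a bound of about 1.93,
# matching that row's Study Bayes' entry
bayes_factor_bound(0.0727)
```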