Exposure E is known to be a rare cause of a common condition C, which has many other possible causes. When C is caused by E, this is defined as a new clinical entity, EC. Confirming EC requires an invasive and costly test, which is seldom performed. Some hypothesize that EC is underdiagnosed and much more common (that is, it accounts for a higher percentage of C) than currently thought, and advocate for increased use of the invasive, costly test on these grounds.
We have a large dataset of individuals who were prospectively screened for exposure E, and we know whether or not each individual has the common condition C. We have not performed the invasive test on any of them. We have done several analyses and shown that the prevalence of C does not differ by exposure E.
I am looking for guidance on how to word these results in a statistically defensible manner and how to quantify the certainty of these “negative” results. Would I be correct in thinking that the confidence limits of the risk difference would be useful? Let us say that the risk difference of C between those with and without E is 0.9% (95% CI -0.5% to 2.1%). Can I say something along the lines of: “We found no association between E and C. Our data are unable to directly estimate the prevalence of EC. However, assuming that E does not confer protection against other causes of C, and that EC is the only explanation for a difference in the prevalence of C between those with and without E, our data are incompatible with an EC prevalence of 2.1% or higher.”
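To make the confidence-interval approach concrete, here is a minimal sketch of how such an interval could be computed, assuming a simple Wald (normal approximation) interval for the difference of two independent proportions. The counts are hypothetical and chosen only to give a risk difference of 0.9%; they are not my real data, and the function name `risk_difference_ci` is made up for illustration.

```python
import math
from statistics import NormalDist


def risk_difference_ci(events_exposed, n_exposed,
                       events_unexposed, n_unexposed, level=0.95):
    """Wald confidence interval for a risk difference (exposed minus unexposed).

    Assumes two independent binomial samples that are large enough for the
    normal approximation to be reasonable.
    """
    p_exposed = events_exposed / n_exposed
    p_unexposed = events_unexposed / n_unexposed
    rd = p_exposed - p_unexposed
    # Unpooled standard error of the difference in proportions.
    se = math.sqrt(p_exposed * (1 - p_exposed) / n_exposed
                   + p_unexposed * (1 - p_unexposed) / n_unexposed)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return rd, rd - z * se, rd + z * se


# Hypothetical counts: 309 of 3000 exposed and 940 of 10000 unexposed have C.
rd, lower, upper = risk_difference_ci(309, 3000, 940, 10000)
print(f"RD = {rd:.2%}, 95% CI {lower:.2%} to {upper:.2%}")
```

The upper limit of an interval like this is the number I would be quoting as the largest excess prevalence of C among those with E that remains compatible with the data at the chosen confidence level.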
Or should I be justifying or quantifying this negative result with reference to a power analysis, showing what true effect I should have been able to detect given the data? How would I word this?
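As a sketch of what I mean by a power analysis here, the code below uses a normal approximation for a two-sided test of two proportions to compute the power for a hypothesized true risk difference, and then searches for the smallest difference detectable with 80% power. The baseline prevalence (9.4%), the group sizes, and the function names are all hypothetical, and using the unpooled standard error under the alternative is a simplifying assumption.

```python
import math
from statistics import NormalDist


def power_risk_difference(p_unexposed, true_rd, n_exposed, n_unexposed,
                          alpha=0.05):
    """Approximate power of a two-sided test for a given true risk difference.

    Normal approximation with the unpooled standard error; the negligible
    rejection probability in the opposite tail is ignored.
    """
    p_exposed = p_unexposed + true_rd
    se = math.sqrt(p_exposed * (1 - p_exposed) / n_exposed
                   + p_unexposed * (1 - p_unexposed) / n_unexposed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(true_rd) / se - z_crit)


def minimum_detectable_rd(p_unexposed, n_exposed, n_unexposed,
                          target_power=0.80, alpha=0.05):
    """Smallest positive risk difference reaching the target power (bisection)."""
    low, high = 0.0, 1.0 - p_unexposed - 1e-9
    for _ in range(60):
        mid = (low + high) / 2
        if power_risk_difference(p_unexposed, mid, n_exposed,
                                 n_unexposed, alpha) < target_power:
            low = mid
        else:
            high = mid
    return high


# Hypothetical design: 9.4% prevalence of C without E, 3000 exposed, 10000 unexposed.
mdd = minimum_detectable_rd(0.094, 3000, 10000)
print(f"Smallest risk difference detectable with 80% power: {mdd:.2%}")
print(f"Power for a true difference of 2.1%: "
      f"{power_risk_difference(0.094, 0.021, 3000, 10000):.1%}")
```

The inputs are the sample sizes and a hypothesized true difference, not the observed difference, so the output describes what the study could have detected rather than what it found.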
For several pragmatic reasons, I am bound to using frequentist statistics and language.
Can anyone point me to a resource that discusses wording for “negative” results and resources on power analysis?