Predicting survival of cancer patients using the horoscope: astrology, causal inference, seasonality, and frequentist & Bayesian approaches

Sorry to report that the only aspect of this discussion that makes sense to me is the comment that seasonality is at play here, and the note that the data are not about astrology; they are simply about astrological month of birth, so that the chi-squared analysis makes no sense given the cyclical ordering.

Turning to biology, the chi-squared analysis fares even worse given the astronomical (as opposed to astrological) irrelevance of the month categories (there are after all about 12.4 lunar-phase cycles and 13.3 lunar-orbit cycles a year). That aside, lifetime effects of season of gestation are very plausible based on the embryology and life-course epidemiology I know of; for example sunlight exposure affects vitamin D levels which are central to bone formation and maintenance, as well as maternal resistance to infections which in turn can have dramatic effects on the fetus that last a lifetime (e.g., impaired hearing from congenital rubella). And infections themselves can have dramatic seasonal patterns, even in the tropics.

To model seasonality in a way connected to the leading hypothetical mechanisms (gestational nutrition, infections, and sunlight) I would instead start with a 2-parameter single-annual cycle on the circle of months: one parameter for the cycle origin (circle location) and another that distorts away from the sinusoidal shape via horizontal stretching and compression. Other parameters essential for serious seasonality analysis would include monotone interactions with birth year (recognizing that seasonal nutrition and infection have been flattened out by modern food distribution and infection controls); and some sort of climate interaction to allow for sunlight exposure and respiratory infections. These items compose the qualitative prior information represented by the data model (as per Box 1980).
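
For concreteness, here is a minimal R sketch of such a cycle term; the particular distortion parameterization (a phase warp d) is my own illustrative choice, not a prescription:

# One possible 2-parameter annual cycle on the circle:
# f = birth date as a fraction of the year in [0, 1),
# a = cycle origin, d = horizontal distortion (d = 0 gives a pure sinusoid)
season_curve <- function(f, a = 0, d = 0) {
  sin(2*pi*f + a + d*sin(2*pi*f))
}
curve(season_curve(x, a = pi/3, d = 0.8), from = 0, to = 1,
      xlab = "Fraction of year", ylab = "Seasonal effect (arbitrary units)")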

For an analysis of astrology, those seasonal terms would be needed as potential confounders. A fair analysis would also group the signs by alleged sign characteristics or some other aspect of astrological theory. The latter is quite elaborate according to some aficionados and includes lunar-cycle elements (I don’t know the details and would consult with those who do).

This is all to say: I don’t believe in astrology any more than others here, but the knee-jerk “this is noise” responses to “significant” analyses purporting to be about astrology betray how statistics has its own severe problems with irrational belief and sheer prejudice passed off as scientific skepticism, as often imposed by “skeptical” priors. That prejudice is most prominent in declarations that a priori an association must be random if you can’t immediately think of a plausible mechanism for the association. Such imagination failure usually happens because you don’t know the background topic well enough to realize the many not-unreasonable ways the association could be “real” (i.e., due to a causal mechanism, albeit perhaps not the one under study).

I find it particularly disturbing when commentators double down on their ignorance with claims that there are data proving there is no association (perhaps because they read reviews saying there is none, based on all the underpowered studies that reported p>0.05 as “no association”), having never bothered to dispassionately analyze the actual background literature. This pseudo-empirical pseudo-skepticism is one version of what I call nullism; it plagues the “reproducibility crisis” industry as transported from experimental psychology into medical epidemiology, where it has been weaponized for the risk-denial industry.

BTW my jaundiced view of pseudoscientific prejudice passed off as skepticism is derived from one of Feyerabend’s themes, which he illustrated using astrology as an example. And no, he didn’t believe in astrology; he just thought most scientists attacking it were hypocritical as “science defenders”, given their sloppy, prejudicial refutations.

12 Likes

Prioritising the least far-fetched hypotheses cannot be considered an act of ignorance. Obviously I cannot rule out seasonality, nor the infinite other conceivable hypotheses (including the astrological one), but this p-value is not enough even to create a conjecture. In the hypothetico-deductive method applied to medicine, the usual approach is to construct a solid rationale, generally reached by induction. When the rational basis is mature, the hypotheses are formalized and a frequentist test is applied to challenge them.
I agree with your criticism of nullism, but it should be nuanced, as it is also influenced by the prior plausibility of the hypothesis. You will agree with me that, faced with the same p-value, declaring negative an immunotherapy clinical trial with divergent survival curves and a powerful rationale is not the same as declaring negative this horoscope analysis carried out for fun.
If the above argument were true, any significant p-value, however far-fetched, would have to be followed by the post-hoc elaboration of a rationale, no matter how implausible.
In this case, the argument of seasonality could obviously be a further hypothesis. However, it is a hypothesis that is not compatible with what is currently known about the carcinogenesis and tumorigenesis of gastric cancer in adults. Although the biological basis of tumours was mysterious a few years ago, that no longer applies today: the entire genome of hundreds of stomach cancers has been sequenced, and the genetic mechanisms of their pathogenesis are fairly well known. Since Vogelstein it has been known that these tumours require 6-8 somatic mutations, which develop stochastically over decades. Evidently, there are examples of children who develop childhood tumours due to germline mutations, and there are known polymorphisms that predispose to gastric cancer decades after birth. Throughout life, however, the subject suffers millions of exposures: smoking, alcohol, the silent action of Helicobacter pylori over decades, obesity, Epstein-Barr virus infection, and so on.

To believe that the cancer of a heavy smoker in his 70s is due to an intrauterine exposure is not impossible, although it is logical to think that this effect should be tiny compared with his continuous environmental exposures. Believing that the cause is a transient, seasonal maternal exposure, one that can be separated from other factors in this observational study, is even more complicated. And it does not end there, because we are not talking about cancer incidence: all the subjects in this cohort have the exact same type of cancer. Even less plausible, therefore, is the hypothesis that a transient and unknown intrauterine exposure is capable of modifying the prognosis of a metastatic cancer developed 70 years later, with a magnitude of effect large enough to be discerned in a study that has difficulty capturing more evident effects. If this prognostic (non-oncogenic) effect existed, it would most likely have been detected already because, as I explained before, the complete sequencing of tumours is nowadays an easy procedure, already done.
Precisely because this type of hypothesis-free study is becoming more and more frequent, the basis of my analysis was to speculate with a hypothesis-free frequentist test, and then to reflect on the result. To do this, I chose the craziest variable I could think of, to show the dangers of the hypothetico-deductive method when it is applied without hypotheses.

2 Likes

Thanks but sorry, I don’t buy any of your rationale for dismissing confounding by seasonality. I’ve observed directly for a half century (and far longer from history of science) claims along the lines of “today we understand this process/disease/etc. so well that we can rule out everything that doesn’t fit neatly into our current conceptualization”. I think it’s hubris, pure and simple. Where (for example) are overwhelming data showing epigenetic factors determined prenatally (and thus subject to season of conception and gestation effects) play no role in this or any cancer? And then, seasonal factors are not at all “transient” relative to critical gestation spans (like the first trimester).

The attitude (and it is just that) that there is such overwhelming null data has been labeled “spinning knowledge out of ignorance”. It’s an attitude not even warranted in the physical sciences - see the now absurd-looking null claims made by leading scientists of their time, like Harold Jeffreys (a god to some Bayesians, one as capable of foolishness as any of the gods of statistics) asserting well into my lifetime that continental drift was physically impossible!

You seem to have missed my main point that your calendar variable was not at all crazy: It was just a recategorization of the year into 12 months, with astrological labels that deceived you and others into thinking there should be no association. Like everyone attacking the straw man of astrology, you made no effort whatsoever to build a model from an astrological theory (there are several) to predict your outcome and thus refute the theory. Again, we agree astrology is rubbish, but our view is NOT based on statistical analyses; it is because the whole area fails to fit with our established science and in fact looks utterly fanciful and archaic, in that it does not subject itself to empirical refutation and revision in light of that.

The nonscientific nature of astrology does NOT mean signs won’t be predictive because of confounding by mundane, earthly seasonal effects or the like. Causation and confounding by poorly understood or even unknown causal processes (whether plate tectonics or epigenetics) is always a risk when the targeted factor has not been randomized, as with month of birth.

So it is that your response seems to me to carry on the usual confusion of belief in the null (which scientists can and should suspend far more often than they do) with decision to use the null based on other considerations far beyond a smallish P-value. Here the latter indicate there is not enough basis for pursuing this association, all things considered - not only random error but the entire contextual background of the variables including what you listed.

This kind of confusion of belief with decision seems a consequence of human factors, such as a need for faith and certainty otherwise denied to most scientists via disavowal of traditional religion. The confusion persists and is perpetuated by bad examples in the literature, despite warnings that (1) A P-value is just a summary of a relation between a model and the data, conveying very narrow specialized information, useful if understood correctly but far from enough to carry the weight of an inference or decision by itself, and (2) decision rules should not be mistaken for belief functions.

Unfortunately, today we are witnessing reactionary movements to keep researchers “barefoot and ignorant”, stuck in the mire of treating NHST as some apotheosis of falsificationism (or worse, of the scientific method). But a P-value tells us nothing more than where the data landed along one of many axes pointing away from the assumptions used to compute it. Not even a 5-sigma p (3/10^7) will get you the Nobel prize in physics without extensive documentation that your experiment forced every assumption to hold except the null hypothesis - and even then a replication will be needed. Yet medical research still enshrines as a universal inference rule a “significance” indicator with 5 orders of magnitude more error and which throws away most of the information in the P-value. This ongoing standard practice (forced on authors by prestigious journals like JAMA) makes the most fanciful astrology seem rational.

8 Likes

After thinking about it for a while, and leaving aside the plausibility of the idea, I find the proposal fascinating, but I need help to fit it. I don’t know how to fit a circle into a Cox regression, though I certainly think it’s a beautiful idea. Please help me complete the code:

library(survival)
library(visreg)

dat$date_nac <- as.Date(dat$"Date of birth", format = "%d.%m.%Y")
dat$Day_S <- as.numeric(strftime(dat$date_nac, format = "%j"))  # Day_S is the day in the year (1-366)
# Align leap and common years so Day_S runs 1-365 for every birth year:
# merge Feb 29 into Feb 28 and shift later days down by one in leap years
yr <- as.numeric(format(dat$date_nac, "%Y"))
leap <- (yr %% 4 == 0 & yr %% 100 != 0) | (yr %% 400 == 0)
dat$Day_S <- ifelse(leap & dat$Day_S >= 60, dat$Day_S - 1, dat$Day_S)
fit <- coxph(Surv(SG, Die) ~ sin(2*pi*Day_S/365) + cos(2*pi*Day_S/365), data = dat)
visreg(fit, "Day_S", ylab = "Hazard ratio", trans = exp)

big data association versus causality or monarchy versus democracy

1 Like

Thanks but I think we’re still too short on context to jump into code…

First, on plausibility and confounding: The point is, to deal with confounding we need information that bounds it (even if only stochastically), not just a declaration that this or that is “implausible”. An example of such information is randomization, absent here. Another example is observed independence of potential confounders from the targeted potential cause (treatment or exposure); another would be observed independence of those from the targeted outcome variable. Without such information, we need to adjust for potential confounding, and that requires the target not be perfectly predicted by the potential confounders (in propensity-score terms, we need some overlap of the treated and untreated groups conditional on the score).

Next, the big problem here is that you have not defined the target exposure variable! Apparently it is something to do with astrology, but what? Neither I nor apparently you know enough about astrology to come up with a variable that represents what an astrology aficionado would consider an astrologically causal or at least predictive function of the only measurement you have, date of birth - which is also a proxy for seasonal causes.

Next, let’s suppose we do obtain an acceptable astrological function, call it x(date) [which might be a vector of astrologic properties, e.g., 3 indicators for “earth/water/fire/air” sign]. The confounder I was concerned about is a seasonal function, call it z(date), and I imagined it would here suffice to use a scalar covering one cycle over a year (one max, one min). There is a huge literature on seasonal regression (google it) in sales and marketing but some also in infectious disease, from which to draw z(date).
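
For illustration, one way such an x(date) could be coded in R; this sketch assumes a hypothetical dat$sign factor already derived from the birth dates:

# Map each sun sign to its classical element; entered as a factor, this
# yields 3 indicators relative to a reference element
elements <- c(Aries = "fire", Taurus = "earth", Gemini = "air", Cancer = "water",
              Leo = "fire", Virgo = "earth", Libra = "air", Scorpio = "water",
              Sagittarius = "fire", Capricorn = "earth", Aquarius = "air", Pisces = "water")
dat$element <- factor(elements[as.character(dat$sign)])  # dat$sign assumed precomputed
# model.matrix(~ element, data = dat) then gives the 3 indicators (reference = "air")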

As a simple thought I suggested entering z(date) as something like b*sin(a + 2*pi*f), where b is an unknown regression parameter, a is an unknown location parameter (cycle origin), and f is the birth date expressed as a fraction of the year from Jan. 1 to Dec. 31 (f runs from 0 to 1). That is however an intrinsically nonlinear term, so you need nonlinear regression software to use it [although in this case it is not hard to approximate by cycling between fitting b with the cycle origin a held fixed, and then holding b fixed and fitting a by maximizing the partial likelihood; this technically invalidates the final x(date) coefficient inferences, but not enough to worry about here]. I have no doubt you can find easier, packaged approaches online.
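
As a sketch of that idea (a grid-profiling variant of the cycling just described, assuming the dat, SG, and Die of the earlier code plus a hypothetical f = birth date as a fraction of the year):

library(survival)

# For each fixed cycle origin a, b*sin(a + 2*pi*f) is linear in b, so coxph
# fits b directly; profile the partial log-likelihood over a grid of a values
a_grid <- seq(0, 2*pi, length.out = 200)
pll <- sapply(a_grid, function(a) {
  coxph(Surv(SG, Die) ~ sin(a + 2*pi*f), data = dat)$loglik[2]
})
a_hat <- a_grid[which.max(pll)]   # origin maximizing the profile partial likelihood
fit <- coxph(Surv(SG, Die) ~ sin(a_hat + 2*pi*f), data = dat)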

What you won’t find packaged is how to create x(date), the target variable for the analysis effort. That reflects a general problem far beyond the silliness of astrology or of trying to control confounding while estimating an effect with only one measurement (date) on which to base both variables: Far too many regression analyses I see, even of a plausible target or confounding variable t (e.g., t = a nutrient, or t = age), fail to create credible regressor functions x(t) or z(t); instead they dump untransformed measurements into the model as single linear terms, which leads to junk inferences when (as here) the effects, if any, of the variables would be intrinsically nonlinear.
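
For instance (my example, one option among many), a flexible regressor function for a variable like age is easy to obtain with a spline in place of the single linear term:

library(rms)
# Restricted cubic spline with 4 knots for age rather than one linear term;
# dat$age is hypothetical here
fit <- cph(Surv(SG, Die) ~ rcs(age, 4), data = dat)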

7 Likes

I find this discussion very fascinating, as I actually did a seasonality analysis for autism recently (Eur J Epi 2019, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6602987/). I’m not sure I did it correctly, but to the point about transforming the date variable, we actually went a step further to decompose the raw signal into component signals, and then made inferences based on the components that made sense. Clearly there is a lot of noise in data like these, and it’s very likely that associations would pop up due to chance. But it’s also fair to acknowledge we don’t know enough about the etiology of many complex diseases to be putting heavily nullist priors on things just because we can.

1 Like

Because the association lacks a plausible mechanism of action (the month I’m born influences my risk of death from a hundred-plus unique complex diseases), the result calls out for doing the analysis again on another set of records. If the finding is reproducible … well then we have to think about it more : )

Layton et al. published a NEJM paper on the increase in ADHD diagnoses among children with August birthdays, using a regression-discontinuity analysis: https://www.nejm.org/doi/full/10.1056/NEJMoa1806828
Their explanation (which I have personal experience with) is that, when states have a September 1 cutoff for school entry, children with August birthdays are the youngest in their class. The age-related behaviors lead to inappropriate diagnoses of ADHD. Some unlikely seasonal variation may occur from non-obvious mechanisms. Here, birth month is a common cause of immature behavior (and, indirectly, inappropriate ADHD diagnosis) and school enrollment.

4 Likes

I’m not sure I really understand the point of your analysis. First, it seems like your data are not public so no one can confirm or reproduce your analysis to any extent.

And if you wanted to show that spurious associations/tiny p-values can occur when testing many many parameters, why not use a random number generator and do a simulation study, where you’d actually have an understanding of the data-generating mechanism?

What I suggested seems like a much better way to study the behavior and properties of these methods, and is what is commonly done to do so. There have already been many studies of this sort published, and many contrasting classical approaches with Bayesian ones.

Using a data set that you don’t have deity-like control over, with a theory you barely understand, doesn’t seem like a good idea to me.

1 Like

Your reply pasted below once again confirms that you don’t really have much interest in seasonality, the theory of astrology, or the other topics mentioned above (even though there are some great suggestions above on how to study them), but that you’re mainly interested in the properties/behavior of these methods. However, as Sander mentioned above, without even a surface-level understanding of the theory you’re invoking and full control of the data set you’re using, I don’t think the results you get will actually mean what you think they mean.

Here’s a great resource if you want to do your own sim study to examine the properties of the methods in question…

3 Likes

The ultimate goal of my question was just to learn. I tried to make a simulation at the beginning.
This is the code I naively tried; I don’t know if I did it right:

library(survival)

number_parameters <- 80   # number of categories in the random grouping factor
n <- 100                  # simulations per row of the results matrix
m <- matrix(0, nrow = number_parameters, ncol = n)

for (p in 1:number_parameters) {   # rows are replicate batches of simulations
  for (i in 1:n) {
    # random group labels, unrelated to survival by construction
    dat$sim <- factor(sample(1:number_parameters, nrow(dat), replace = TRUE))
    d <- survdiff(Surv(SG, Die) ~ sim, data = dat)
    m[p, i] <- pchisq(d$chisq, df = number_parameters - 1, lower.tail = FALSE)
  }
}

mean(m < 0.05)   # empirical type I error rate over all simulated tests

I wasn’t able to find anything interesting, and I assumed that the chi-squared distribution was so old that there was little more to dig into on that side. Other people have answered fascinating things. Among these insights, the issue of seasonality has been an unexpected twist that came up during the discussion. Indeed I think that seasonal effects are unlikely here, as @karlamoPA also thinks.

However, I was not aware of the existence of seasonal models, and I found them to be very nice tools and concepts worth exploring here, regardless of my prior belief.
The database is part of a national gastric cancer registry sponsored by the Spanish Society of Oncology (AGAMENON registry).
Thank you for your contribution.

1 Like

By the way, a fellow oncologist wrote me to complain that we’ll have them testing outlandish hypotheses from small P-values, or from subgroup analyses, until the end of time.

I don’t think this result calls out for reproduction yet, because the data haven’t yet been analyzed using a model grounded in a theory that would predict reproducibility. I suggested one way to do that.

To elaborate, I’m going to give you my version of a neo-Fisherian/post-Popperian take which parallels the usage of P-values in the Higgs boson experiments. Those experiments were set in the opposite extreme of the present case, insofar as most everyone was sure the background theory was correct and yet they wanted to put it through a severe test - severe in the physical sense of demanding extremely expensive precision equipment and personnel (not “severe” as an absurd malapropism referring to a non-null P-value).

Ideally, the theory would be a causal mechanism on which to build up a statistical model form. Regardless, only if an analysis grounded in such a theory produced a lot of information (= tiny P-value = large S-value) against the null in the direction predicted by the theory (thus corroborating the theory) could one say further pursuit could be worth the effort. Otherwise, without a theoretical derivation of the statistical model used to derive the test (whether from the standard model of physics or some astrological tale), one has to ask: what is the likely yield or value of pursuing some randomly observed association?

7 Likes

As a counterpoint to physics, what now concerns many oncologists, such as @RamonSalazarS and others, is the multiplication, with the rise of omics, of dubious results neither generated by solid hypotheses nor validated. We don’t have the billions of observations physicists have; we look not so much for absolute truths through severe tests, as Deborah Mayo explains, as for practical ways to improve the day-to-day lives of patients. That is why my feeling is that sometimes we get caught up in the scale of the philosophical discussion, applying concepts that would be very valid in the field of theoretical frontier physics to the mundane realm of daily clinical practice.

On the other hand, evaluating temporality is interesting as a confounding factor, but it can also be of interest in itself, even if only to conclude that the effect does not exist or is minuscule.

I see no “counterpoint to physics”, because (like physicists and everyone else) oncologists will only spend their constrained resources on associations whose pursuit they expect to pay off. This means they won’t be pursuing astrology (except perhaps as a parlor game) because the stars are much too far removed from tumor genesis, and won’t pursue gestational season either because the outcome (while not as far as from the stars) is still in their minds too far removed from that factor (as encoded in their accepted theory, as you described).

I feel like I have not been clear enough, so let me recap how I see our discussion: The issue I think perhaps you meant to raise at the start is what to make of small P-values when there is no theory to explain such an observation (at least, no theory we aren’t 100% sure is false, like astrology). My counterpoint was in part this: Why were you examining the association at all? It was in fact because you had some ill-formed notion of some theory to test that you were sure was false, namely astrological theory; and you thought that redividing the year into horoscope months would provide a test of that theory, which you expected to be falsified by a high P-value (which is itself the reverse of Fisher’s logic for using P-values).

I pointed out that (1) your P-value did not test astrological theory because you did not derive any prediction (statistical model) from astrology, or for that matter from any theory at all, and that (2) an observed P-value could have a myriad of other explanations which, even if seemingly implausible, were not absurd like effects of stars; I gave one example: season. So all you tested was an unordered-categorical statistical model with category labels derived from astrology. I went on to describe how to test astrology if that was someone’s actual intent, allowing for unknown confounding by season.

If we now discard all that theory, we can return to the question: What to make of a P-value with no derivation from a background theory? Very little, almost nothing; I’d say just this much: According to a particular measure of distance called the Shannon or binary surprisal or S-value, and making no use of any structural information like time ordering of dates, the data you gave are about 6 or 7 bits away from what is predicted by the null model in which the underlying outcome means are uniform over birthday [bits are tiny units so let’s not fuss over decimals; each one represents the maximum information we could gain from one binary observation, e.g., the maximum information that we could gain from one toss about loading for heads in a coin-tossing mechanism]. Without more input, that’s the end of the story supplied by statistical theory. And anything more (besides more of the same sort of data) has to be supplied by the context, which is what determines whether this analysis is in retrospect a trifling diversion like a horoscope, or the lead into the greatest breakthrough of the decade in oncology.
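
As a quick arithmetic anchor (my own illustration): s bits correspond to p = 2^(-s), so 6 to 7 bits amount to p between 1/64 ≈ 0.016 and 1/128 ≈ 0.008, the same surprise as seeing 6 or 7 heads in a row from a tossing mechanism you had assumed was fair.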

6 Likes

Your response conveys very stimulating ideas. I can’t help but dig a little deeper into the S-value, and I have two questions. Shannon’s formula, relating information to entropy, is relatively easy to understand: microstates increase exponentially in relation to the minimum information needed to describe them. However, I do not understand the intuition for extending that concept to p-values. Since some readers here may not be clear about it either, perhaps it would be nice if you could explain it in a simple way, as intuitively as possible. What makes the p-value somewhat similar to the number of microstates in Boltzmann’s equation? On the other hand, as a result of @R_cubed’s previous comment, I was asking a colleague whether 6-7 bits were too much or too little, looking for analogies with worldly issues. We did not reach a clear conclusion. The idea is beautiful, but how do you interpret the number of bits to know if 6 are too little?

Answering that gets into a big topic area…

Various transforms of the observed P-value p such as 1−p, 1/p, and log(1/p) have been used as measures of evidence against statistical models or hypotheses (which are parts of models). Part of the interpretation problem is that those statistical models get confused with the theories that predict them. Correct understanding however requires keeping them distinct, because the measures refer only to the statistical model used to derive the p-value; purely logically the measures say nothing about any theory unless that model can be deduced from the theory. Thus (as Shannon pointed out) the information they measure is only information in the narrow syntactic sense of data deviation from statistical prediction; they connect to semantic information (contextual meaning) only to the extent that information gets encoded in the theory in a form that enters into the deduction of the statistical model from the theory. You may see my earlier comments as stemming from this distinction. [Parallel comments apply to other statistical measures in other systems, such as likelihood ratios and Bayes factors, which have huge literatures including math connections to P-values - but again those literatures often confuse the statistical model (which may now include an explicit prior distribution) with the causal network or theory under study.]

The idea of re-expressing P-values as surprisals via the S-value s = log(1/p) = −log(p ) transform goes back (at least) to the 1950s, using various log bases (which only change the unit scale). The transform also arises in studies of test behavior under alternatives, with ln(1/p) as an example of a betting score and a “safe” test statistic.
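
To make the transform concrete, a two-line R illustration (mine, not drawn from the references below):

p <- c(0.05, 0.01, 0.005)
-log2(p)   # S-values in bits: about 4.32, 6.64, and 7.64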

I’ve authored or coauthored several articles trying to explain the S-value’s motivation and interpretation from a neo-Fisherian (refutational statistics) perspective. The following should be free downloads, and may most directly answer your question:
Greenland, S. (2019). Some misleading criticisms of P-values and their resolution with S-values. The American Statistician, 73(sup1), 106-114. Open access at www.tandfonline.com/doi/pdf/10.1080/00031305.2018.1529625
Rafi, Z., and Greenland, S. Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise. BMC Medical Research Methodology, in press. http://arxiv.org/abs/1909.08579
Greenland, S., and Rafi, Z. To aid scientific inference, emphasize unconditional descriptions of statistics. http://arxiv.org/abs/1909.08583
See also this background essay against nullism, dichotomania, and model reification:
Greenland, S. (2017). The need for cognitive science in methodology. American Journal of Epidemiology, 186, 639-645. https://academic.oup.com/aje/article/186/6/639/3886035
More quick, basic treatments of key topics in the above articles are in:
Amrhein, V., Trafimow, D., and Greenland, S. (2019). Inferential statistics as descriptive statistics: There is no replication crisis if we don’t expect replication. The American Statistician, 73(sup1), 262-270. Open access at www.tandfonline.com/doi/pdf/10.1080/00031305.2018.1543137
Greenland, S. (2019). Are “confidence intervals” better termed “uncertainty intervals”? No: Call them compatibility intervals. British Medical Journal, 366:l5381. https://www.bmj.com/content/366/bmj.l5381
Cole, S.R., Edwards, J., and Greenland, S. (2020). Surprise! American Journal of Epidemiology, in press. https://academic.oup.com/aje/advance-article-abstract/doi/10.1093/aje/kwaa136/5869593

8 Likes

P.S. Re: “how do you interpret the number of bits to know if 6 are too little?”
Without extensive context, that question is far too ill-posed to answer, yet conventional stat training and practice pretends otherwise. The problem is that, as used in such questions, qualitative terms like “too little”, “large”, “very small” etc. are contextual valuations that refer to conditions beyond the study results, and can go far beyond even the bare scientific context.
Consider: How do you interpret the number of grams to know if 6 are too little, or too much? You have to narrow the context quite a bit to answer that. Is the question about daily nutrient intake? Then 6 grams is far too little protein to maintain good health, but far too much salt.

With statistical tests, is 6 bits of information against a hypothesis too little or more than enough to pursue further study? Well how much would a further study cost? 1 hour and $100 as in replicating a very simple psych experiment in a lab already set up for it? Or 10 years and $100 million as in replicating certain long-term clinical trials?

The need for a loss or cost function for value declarations is well recognized in statistical decision theory but fatally neglected in most basic stat education and publication, where instead the magical “p < 0.05” = “s > 4.32 bits” is enforced as a universal rule. This practice is vigorously defended and adhered to by prestigious medical journals and opinion leaders, despite the fact that the founders of current conventional statistical testing theory and practice (Fisher and Neyman) warned against such nonsense - and despite plain demonstrations of how nonsensical it is.

This kind of convoluted defense of utter nonsense is exactly what one should expect when the defenders have built their careers and policy decisions using and enforcing the nonsense. Resistance will be especially fierce among those who face enormous liability if their practices and policies are revealed to be damaging (read about the resistance to antiseptic procedures in the 19th century, or to dietary origins of obesity and diabetes in the 20th century).

Although there were always objections to such universal decision rules (including, again, from Fisher and Neyman), of late there have been more objections, some even from defenders of fixed-alpha-level testing. See for example last year’s special issue of The American Statistician, introduced by Wasserstein, Schirm & Lazar at https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1583913

9 Likes