Predicting survival of cancer patients using the horoscope: astrology, causal inference, seasonality, frequentist & Bayesian approach

Correct me if I’m wrong, but can’t the “sufficiently skeptical prior” be translated to a roughly equivalent frequentist p value?

If I did the math right, a p of 0.01343 gives about 6 bits of information against the null, using the p value transformation recommended by Greenland in this article:

Greenland, S. (2019). Valid P-Values Behave Exactly as They Should: Some Misleading Criticisms of P-Values and Their Resolution With S-Values (link).
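For what it's worth, a one-line check of that conversion in R (the S-value is just the negative base-2 log of the p value):

-log2(0.01343)    # about 6.22 bits of information against the test model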

I don't remember which Sander Greenland paper I read this in (I was thinking of a different paper from the one linked above), but he also discussed the critical distinction between the hypothesis testing procedures actually used by Fisher, Neyman, and Pearson vs. the misleading textbook descriptions of them. This "fixed level" significance testing (i.e., using the same magical \alpha of 0.05, 0.01, etc. without consideration of the effect or sample size) is something that neither Fisher nor Neyman would have approved of.

3 Likes

I don’t see how to encode skepticism in the right place in the logic flow with the frequentist approach. You are suggesting being more skeptical about data you witnessed whereas I’m being more skeptical about hypotheses.

2 Likes

Not sure if I got what you're trying to say, but a frequentist who is sceptical about the tested hypothesis would not be willing to reject that hypothesis unless p is really, really small.

PS: great paper linked by R_cubed, thanks for bringing this to my attention!

1 Like

But how do you interpret the s-value in this case? My intuition is not clear about whether six bits is too little information or not.

Context is important when looking at p-values (or their s-value transformation). What you did is the equivalent of taking multiple presumed-fair coins from your wallet at random and flipping each of them seven times (p ≈ 0.01 per coin). It is not that surprising that you observed one of them come up tails on all seven tosses.
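A rough illustration of that multiplicity point in R, assuming (hypothetically) 12 coins, one per zodiac sign:

p_single <- 0.5^7               # one presumed-fair coin landing tails on all 7 tosses
1 - (1 - p_single)^12           # chance at least one of 12 such coins does so: about 0.09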

2 Likes

@f2harrell: I think I’ve had enough coffee this morning to clean up my initial post.

To say a skeptical prior would not find an association seems to be related to a frequentist advising someone that his/her \alpha level is too high for a particular context.

Added on 8/17/2020: I should have simply cited an instructive paper by Bayarri, Benjamin, Berger, and Sellke on what they call the rejection ratio to demonstrate what I mean. This perspective is useful whether you approach a problem as a frequentist or a Bayesian. I wish it had been taught to me in my intro stats class.

Expressing Bayes' Theorem (pre-experiment) in odds form, where \pi_1/\pi_0 is the prior odds of the alternative to the null, 1 - \bar\beta is the average power, and \alpha is the type I error rate:
O_{pre} = \frac{\pi_{1}}{\pi_{0}} \times \frac{1 - \bar\beta}{\alpha}

The value of an experiment (conditional on rejection of the null) is driven by the ratio of power to \alpha, which the authors call the rejection ratio. We can see that the more skeptical the prior, the smaller \alpha needs to be for a rejection to shift the prior odds appreciably. They use the average or expected power, since this is considered before any data are seen.
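A small numerical sketch of that pre-experimental calculation in R (the prior odds and power below are illustrative values, not taken from the thread):

# Pre-experimental odds that a rejection is a true rather than a false positive:
# prior odds of the alternative times the rejection ratio (average power / alpha)
prior_odds <- 1/9        # skeptical prior: P(H1)/P(H0)
power_bar  <- 0.80       # average (expected) power
prior_odds * power_bar / 0.05     # about 1.8 to 1: rejection barely moves a skeptical prior
prior_odds * power_bar / 0.005    # about 17.8 to 1: a much smaller alpha is needed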

From the post-data point of view, they use the \frac{1}{-e \times p \times \ln(p)} bound to relate the p value to a Bayes factor bound, the largest amount of evidence against the null that the data could provide under any prior. This is the most an honest advocate can assert as evidence in favor of an effect from a particular study.

For any retrospective look at a data set, we can calculate a Bayes factor, a Bayes factor bound, or a p value. There is a function that outputs the Bayes factor bound when given a p value, and its inverse recovers the p value. Given the Bayes factor for the data, we can also solve for the prior required to assert any particular posterior.
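A minimal sketch of that reverse calculation in R (the Bayes factor and the target posterior odds below are illustrative values):

# posterior odds = prior odds x Bayes factor, so the prior odds needed to assert a
# given posterior are simply posterior odds / Bayes factor
bf          <- 6.36      # e.g. the bound for p = 0.01343 discussed further down
target_post <- 3         # desired posterior odds of 3:1 in favor of an effect
target_post / bf         # prior odds required: about 0.47, i.e. roughly 1:2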

After much study, I try to place frequentist reports in a Bayesian context. Robert Matthews (Aston University) has written a few instructive papers on how to derive the implied prior when presented with "confidence" (a.k.a. compatibility) intervals. This is his earliest one:

Matthews, R. (2001). Methods for Assessing the Credibility of Clinical Trial Outcomes. Drug Information Journal, 35(4), 1469-1478. (link)

Addendum: An educational paper on placing p values in a Bayesian context:

@albertoca: Sander Greenland likes to convert the p value into a proper binary unit of refutational information (the S-value). In this case, 6 bits of information against a model is akin to flipping a fair coin 6 times and having all of the flips come up heads: 0.5^6 = 0.0156.

Using the \frac{1}{-e \times p \times \ln(p)} bound, your p value of 0.01343 converts to a best case Bayes’ Factor Bound of 6.36 to 1 in favor of the alternative.
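For concreteness, the arithmetic in R:

p <- 0.01343
1 / (-exp(1) * p * log(p))    # upper bound on the Bayes factor: about 6.36 to 1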

But isn't it precisely to correct for this that the \chi^2 distribution was invented, since it becomes more and more spread out as the degrees of freedom increase?

I understand that, and the connection with entropy/information theory seemed like a beautiful idea to me. However, in this case it is not clear to me that the s-value will help me to interpret the result differently from the p-value.

The use of context to map the observed data into a generating process is something that informs the statistical analysis but lies outside of it. This is a topic of intense debate but I found this article by Judea Pearl to be an extremely insightful introduction.

1 Like

To pick up the gauntlet, I've fitted a Bayesian Cox model with skeptical priors ~ Normal(mean = 0, sd = 0.1) for all the signs of the zodiac except the reference category (Capricorn). With the Capricorns I did not know very well what to do: I don't know whether a skeptical prior should be placed on the intercept, or what that prior would be (the brms library defaults to a three-parameter Student-t prior for the intercept).
In any case, this is the result. The Bayes factor against a null (intercept-only) model is 8 in this example. What do you think?

library(brms)    # set_prior(), brm(), bayes_factor()
library(rstan)

# Skeptical Normal(0, 0.1) priors on each zodiac-sign log hazard ratio
prior <- c(set_prior("normal(0,0.1)", class = "b", coef = "zodAquarius"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodAries"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodCancer"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodGemini"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodLeo"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodLibra"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodPisces"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodSagittarius"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodScorpio"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodTaurus"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodVirgo")
           
)
> rstan_options (auto_write=TRUE)
> options (mc.cores=parallel::detectCores ())
> fit <- brm(SG | cens(1-Die) ~ 1 + zod, data = dat, prior=prior, family = brmsfamily("cox"),chains=4, iter=2000, save_all_pars = TRUE)
Compiling Stan program...
Start sampling

> fit2 <- brm(SG | cens(1-Die) ~ 1 , data = dat, family = brmsfamily("cox"),chains=4, iter=2000, save_all_pars = TRUE)
Compiling Stan program...
Start sampling

> summary(fit) 
 Family: cox 
  Links: mu = log 
Formula: SG | cens(1 - Die) ~ 1 + zod 
   Data: dat (Number of observations: 2473) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          1.56      0.08     1.42     1.73 1.00     1803     1982
zodAquarius        0.05      0.06    -0.07     0.18 1.00     4366     3145
zodPisces          0.02      0.06    -0.11     0.13 1.00     3462     2786
zodAries          -0.08      0.06    -0.20     0.04 1.00     4652     3343
zodTaurus         -0.06      0.06    -0.19     0.07 1.00     3922     2963
zodGemini          0.08      0.06    -0.04     0.20 1.00     4058     3030
zodCancer         -0.04      0.06    -0.16     0.08 1.00     4030     2859
zodLeo             0.11      0.07    -0.03     0.24 1.00     4300     3243
zodVirgo           0.03      0.06    -0.09     0.15 1.00     3874     2963
zodLibra          -0.02      0.07    -0.15     0.11 1.00     4226     2854
zodScorpio        -0.02      0.07    -0.15     0.11 1.00     4043     2882
zodSagittarius    -0.15      0.07    -0.28    -0.01 1.00     4076     3180

> bayes_factor(fit,fit2)
Estimated Bayes factor in favor of fit over fit2: 8.02152
1 Like

"However, in this case it is not clear to me that the s-value will help me to interpret the result differently from the p-value."

An advantage of the S-value over p is that one can add the surprisals from different studies, knowing only their p values, to come up with an aggregate surprisal value. This is essentially what Fisher's combination procedure does (except it uses the natural log as the information measure).
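A minimal sketch of that additivity in R, using three made-up p values:

p <- c(0.04, 0.20, 0.01)       # hypothetical p values from three studies
sum(-log2(p))                  # total surprisal: about 13.6 bits against the shared null

# Fisher's combination uses the natural log; -2 * sum(log(p)) is chi-squared with 2k df
X2 <- -2 * sum(log(p))
pchisq(X2, df = 2 * length(p), lower.tail = FALSE)    # combined p value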

In the context of the OP, given the sample size (2400 subjects), I don’t find a rejection of the exact null of no relationship between survival and zodiac sufficiently surprising to warrant another study.

2 Likes

This seems somehow related to ‘preregistration’. Is that the major difference between the Bayesian and S-value approaches here?

There is nothing exceptional about your p-value. Back around 2005 I was analysing the results of a trial I believed had been validly randomised. There was a big imbalance on the most important prognostic factor. One doesn't normally calculate a p-value for such an imbalance, as it should be simply a random number between 0 and 1, but it turned out to be rather below 0.001. On reflection, this is the only time I can recall seeing this in a lifetime working as a statistician (since 1971), so really it is not at all surprising.

There is no inherent difference between parametrising into 12 months the way we do, the Hebrew way, or the astrological way, each of which is simply offset by some days from ours. There are methods available that fit an a priori meaningful sinusoidal model: basically 2 df, regressing on the sine and cosine of the date, which are orthogonal to one another.
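A minimal sketch of that 2-df seasonal term in R, assuming a hypothetical month-of-birth variable mob (1-12) in the data frame dat used elsewhere in the thread:

library(survival)

# position of the (mid-)month within the annual cycle, as a fraction of a full turn
frac <- (dat$mob - 0.5) / 12
dat$sin1 <- sin(2 * pi * frac)
dat$cos1 <- cos(2 * pi * frac)

# single annual cycle; the global likelihood ratio test is the 2-df test of seasonality
fit_season <- coxph(Surv(SG, Die) ~ sin1 + cos1, data = dat)
summary(fit_season)$logtest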

I recall a former colleague, Prof Colin Roberts, rather a sceptical guy, presenting data back around 1980 on birth defects not by month but by star sign. There were some patterns evident, which he interpreted intelligently and simply as obvious seasonal trends, i.e. just what a sinusoidal model would seek to pick up. The pattern here, in contrast, only very slightly supports any idea of sinusoidal seasonality.

4 Likes

Hang on – these aren’t signs of the zodiac, they are, roughly, months of birth. And there is a seasonality effect on disease risk for some cancers, both childhood and adult.

And the births are seasonal –

zod <- c(247,214,215,238,196,215,222,167,226,182,178,173)
chisq.test(zod)

Chi-squared test for given probabilities

data: zod
X-squared = 37.154, df = 11, p-value = 0.0001086

So season of birth, which is associated with differential in utero exposure to factors such as viral illness, as well as postnatal exposure to infectious disease and micropollutants, is linked with cancer death.

Rather than arguing about how sceptical we are about astrology, I would be diving into the literature and finding other work in this area. My over-the-second-cup-of-coffee literature search suggests that the effect may be driven by some particular cancers such as breast.

6 Likes

As far as I know, the effect of seasonality of birth has been described for the incidence of some types of cancer. Seasonality with respect to date of diagnosis has also been described, for both incidence and prognosis. However, I am not aware of articles describing an influence of month of birth on the prognosis of advanced tumours diagnosed more than 40-60 years later, and the long interval makes a biological relationship largely implausible.
My question was really about which approach is best, frequentist or Bayesian, and about the properties of the \chi^2 distribution with many degrees of freedom. I found it surprisingly easy to obtain significant results with the log-rank test (p-value < 0.05), in different databases, for the zodiac sign or month of birth. Indeed, this can be explored with the months of the year, the signs of the zodiac, or the Chinese calendar.

3 Likes

The point I was making is that you are not analysing signs of the zodiac but rather season of birth.

Chinese birth signs are fascinating. Many years ago some researchers published a study of cause-specific deaths in California among those with Chinese and non-Chinese names. The rationale is that, according to this belief system, people born under different signs have different vulnerabilities to disease. Fire signs, for example, are vulnerable to lung disease.

If you didn't have a Chinese name, then your Chinese birth sign had no effect on your age at death.

If you had a Chinese name and you died of a disease that was not associated with your birth sign, you died at the same age, on average, as a person with a non-Chinese name.

However, if you had a Chinese name and died of a cause of death predicted by your birth sign, then you died two to five years earlier than a non-Chinese person with the same disease.

3 Likes

It’s good to see the Bayarri, Benjamin & Berger approach cited here.

In the TAS 2019 edition on ‘A World beyond p < 0.05’, Benjamin & Berger proposed that p values should be supplemented by the maximum posterior odds on H1.

Of all the suggestions made in that edition of TAS, that was the closest to mine in the same edition, The false positive risk: a proposal concerning what to do about p values. I proposed supplementing the p value with the similar quantity L10, the likelihood ratio in favour of H1. Or, more comprehensibly perhaps, with the false positive risk when prior odds are 50:50,

FPR50 = 1 / (1 + L10).

In a Bayesian context, this can be interpreted as the posterior probability of H0 in the light of the observed p value.

In the case of two independent samples, these two approaches give similar results, especially around p = 0.05. Berger's is simpler to calculate, but mine is simpler to derive.

1 Like

Using your “best case Bayes’ Factor Bound of 6.36 to 1 in favor of the alternative”, we’d get

FPR50 = 1 / (1 + 6.36) = 0.136

That can be compared with the p value of 0.013. Of course, if the prior odds of there being a real effect were less than 1, the false positive risk would be a good deal bigger.
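A small sketch in R of how the false positive risk moves with the prior odds, using that same likelihood ratio:

L10 <- 6.36                                            # likelihood ratio (here the Bayes factor bound)
fpr <- function(prior_odds) 1 / (1 + prior_odds * L10)
fpr(1)      # prior odds 1:1  -> about 0.14 (the FPR50)
fpr(0.1)    # prior odds 1:10 -> about 0.61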

1 Like

Sorry to report that the only aspect of this discussion that makes sense to me is the comment that seasonality is at play here, and the note that the data are not about astrology; they are simply about astrological month of birth, so that the chi-squared analysis makes no sense given the cyclical ordering.

Turning to biology, the chi-squared analysis fares even worse given the astronomical (as opposed to astrological) irrelevance of the month categories (there are after all about 12.4 lunar-phase cycles and 13.3 lunar-orbit cycles a year). That aside, lifetime effects of season of gestation are very plausible based on the embryology and life-course epidemiology I know of; for example sunlight exposure affects vitamin D levels which are central to bone formation and maintenance, as well as maternal resistance to infections which in turn can have dramatic effects on the fetus that last a lifetime (e.g., impaired hearing from congenital rubella). And infections themselves can have dramatic seasonal patterns, even in the tropics.

To model seasonality in a way connected to the leading hypothetical mechanisms (gestational nutrition, infections, and sunlight) I would instead start with a 2-parameter single-annual cycle on the circle of months: one parameter for the cycle origin (circle location) and another that distorts away from the sinusoidal shape via horizontal stretching and compression. Other parameters essential for serious seasonality analysis would include monotone interactions with birth year (recognizing that seasonal nutrition and infection have been flattened out by modern food distribution and infection controls); and some sort of climate interaction to allow for sunlight exposure and respiratory infections. These items compose the qualitative prior information represented by the data model (as per Box 1980).
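One hedged way to sketch such a cycle in R (illustrative only: the phase-modulated cosine below is just one convenient way to stretch and compress a sinusoid horizontally, and the variables month, y, and dat are hypothetical):

# single annual cycle with a phase (origin) parameter and a shape parameter that
# distorts the pure sinusoid by horizontal stretching and compression
cyc <- function(month, phase, shape) {
  ang <- 2 * pi * (month - phase) / 12
  cos(ang + shape * sin(ang))        # shape = 0 recovers the plain sinusoid
}

# e.g. as a predictor of some generic outcome y via nonlinear least squares
fit <- nls(y ~ b0 + b1 * cyc(month, phase, shape),
           data = dat,
           start = list(b0 = 0, b1 = 1, phase = 1, shape = 0))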

For an analysis of astrology, those seasonal terms would be needed as potential confounders. A fair analysis would also group the signs by alleged sign characteristics or some other aspect of astrological theory. The latter is quite elaborate according to some aficionados and includes lunar-cycle elements (I don't know the details and would consult with those who do).

This is all to say: I don’t believe in astrology any more than others here, but the knee-jerk “this is noise” responses to “significant” analyses purporting to be about astrology betray how statistics has its own severe problems with irrational belief and sheer prejudice passed off as scientific skepticism, as often imposed by “skeptical” priors. That prejudice is most prominent in declarations that a priori an association must be random if you can’t immediately think of a plausible mechanism for the association. Such imagination failure usually happens because you don’t know the background topic well enough to realize the many not-unreasonable ways the association could be “real” (i.e., due to a causal mechanism, albeit perhaps not the one under study).

I find it particularly disturbing when commentators double down on their ignorance with claims that there are data proving there is no association (perhaps because they read reviews saying there is none, based on all the underpowered studies that reported p>0.05 as “no association”), having never bothered to dispassionately analyze the actual background literature. This pseudo-empirical pseudo-skepticism is one version of what I call nullism; it plagues the “reproducibility crisis” industry as transported from experimental psychology into medical epidemiology, where it has been weaponized for the risk-denial industry.

BTW my jaundiced view of pseudoscientific prejudice passed off as skepticism is derived from one of Feyerabend’s themes, which he illustrated using astrology as an example. And no he didn’t believe in astrology, he just thought most scientists attacking it were hypocritical as “science defenders”, given their sloppy, prejudicial refutations.

12 Likes

Prioritising the least far-fetched hypotheses cannot be considered an act of ignorance. Obviously I cannot rule out seasonality, nor the infinitely many other conceivable hypotheses (including the astrological one), but this p-value is not enough even to form a conjecture. In the hypothetico-deductive method applied to medicine, the usual practice is to construct a solid rationale, which is generally reached by induction. When that rational basis is mature, the hypotheses are formalized and a frequentist test is applied to challenge them.
I agree with your criticism of nullism, but it should be nuanced, since it is also influenced by the prior plausibility of the hypothesis. You will agree with me that, faced with the same p-value, declaring negative an immunotherapy clinical trial with divergent survival curves and a powerful rationale is not the same as declaring negative this analysis of the horoscope, carried out for fun.
If the above argument were true, any significant p-value, however far-fetched, would have to be followed by the post-hoc elaboration of a rationale, no matter how implausible it might be.
In this case, the argument of seasonality could obviously be a further hypothesis. However, it is a hypothesis that is not compatible with what is currently known about the carcinogenesis and tumorigenesis of gastric cancer in adults. Although the biological basis of tumours was mysterious a few years ago, that is no longer the case: the entire genome of hundreds of stomach cancers has been sequenced, and the genetic mechanisms of their pathogenesis are fairly well known. Since Vogelstein it has been known that these tumours require 6-8 somatic mutations, which accumulate stochastically over decades. Evidently, there are examples of children who develop childhood tumours due to germline mutations, and there are known polymorphisms that predispose to gastric cancer decades after birth. However, throughout life the subject undergoes millions of exposures: smoking, alcohol, the silent action of Helicobacter pylori over decades, obesity, Epstein-Barr virus infection, and so on.

To believe that the cancer of a heavy smoker in his 70s is due to an intrauterine exposure is not impossible, although it is logical to think that such an effect should be tiny compared with his continuous environmental exposures. Believing that the cause is a transient, seasonal maternal exposure, and that it can be separated from other factors in this observational study, is even more complicated. And it does not end there, because we are not talking about cancer incidence: all the subjects in this cohort have exactly the same type of cancer. Even less plausible, then, is the hypothesis that a transient and unknown intrauterine exposure could modify the prognosis of a metastatic cancer developed 70 years later, with a magnitude of effect large enough to be discerned in a study that has difficulty capturing far more evident effects. If this prognostic (non-oncogenic) effect existed, it would most likely have been detected already because, as I explained before, complete sequencing of tumours is nowadays an easy procedure and has already been done.
Precisely because this type of hypothesis-free study is becoming more and more frequent, the basis of my analysis was to run a hypothesis-free frequentist test and then reflect on the result. To do this, I chose the craziest variable I could think of, to show the dangers of the hypothetico-deductive method when it is applied without hypotheses.

2 Likes