Predicting survival of cancer patients using the horoscope: astrology, causal inference, seasonality, frequentist & Bayesian approach

To explore non hypothesis-driven statistical analyses, I have evaluated the prognostic effect of the zodiac sign on the overall survival, in a real database of >2500 cancer patients:

> library(DescTools)
> dat$zod<-Zodiac(dat$date_birth, lang = c("engl"), stringsAsFactors = TRUE)
> survdiff(Surv(OS,Die)~zod, data=dat)
survdiff(formula = Surv(OS, Die) ~ zod, data = dat)

n=2473, 97 observations deleted due to missingness.

                  N Observed Expected (O-E)^2/E (O-E)^2/V
zod=Capricorn   247      203  195.667  0.274836  0.304873
zod=Aquarius    214      187  169.646  1.775170  1.942353
zod=Pisces      215      183  176.597  0.232181  0.254755
zod=Aries       238      187  209.565  2.429634  2.714388
zod=Taurus      196      158  171.088  1.001254  1.096064
zod=Gemini      215      180  157.237  3.295255  3.583045
zod=Cancer      222      185  191.979  0.253737  0.282298
zod=Leo         167      139  114.379  5.300093  5.657195
zod=Virgo       226      190  179.263  0.643120  0.706881
zod=Libra       182      158  162.612  0.130789  0.142481
zod=Scorpio     178      145  148.864  0.100272  0.108424
zod=Sagittarius 173      142  180.104  8.061506  8.890433
   Chisq= 23.8  on 11 degrees of freedom, p= 0.01343 

To my surprise, I observed that the log-rank test is significant (p=0.01), similar to the likelihood ratio test.

I wanted to ask for some insight on the type I error & the behavior of these tests, depending on the degrees of freedom, and the causes of this result.


The luck of the draw. A Bayesian analysis would not find any evidence for an association, because you and I would put such a skeptical prior on the association.


Correct me if I’m wrong, but can’t the “sufficiently skeptical prior” be translated to a roughly equivalent frequentist p value?

If I did the math right, a p of 0.01343 gives about 6 bits of information against the null, using the p value transformation recommended by Greenland in this article:

Greenland, S. (2019) Valid P -Values Behave Exactly as They Should: Some Misleading Criticisms of P -Values and Their Resolution With S -Values (link).

I don’t remember which Sander Greenland paper I read this in (I was thinking of a distinct paper from the one linked to above), but he also discussed the critical distinction between the hypothesis testing procedures used by Fisher, Neyman, and Pearson, vs. the misleading textbook descriptions of them. This “fixed level” significance testing (ie. using the same magical \alpha of 0.05, 0.01, etc. without consideration of the effect or sample size), is something that neither Fisher, nor Neyman would have approved of.


I don’t see how to encode skepticism in the right place in the logic flow with the frequentist approach. You are suggesting being more skeptical about data you witnessed whereas I’m being more skeptical about hypotheses.


Not sure if I got what you try to say, but a frequentist being sceptic about the tested hypothesis would not be willing to reject this hypothesis when p is not really, really small.

PS: great paper linked by R_cubed, thanks for bringing this to my attention!

1 Like

But how do you interpret the s-value in this case? My intuition is not clear about whether six bits is too little information or not .

Context is important when looking at p-values (or their s-value transformation). What you did is the equivalent of taking multiple random presumed fair coins from your wallet and flipping each of them seven times (p=0.01). It is not that surprising that you observed one of them to come up as tails on all seven tosses.


@f2harrell: I think I’ve had enough coffee this morning to clean up my initial post.

To say a skeptical prior would not find an association seems to be related to a frequentist advising someone that his/her a level is too high for a particular context.

Added on 8/17/2020: I should have simply cited an instructive paper by Bayarri, Benjamin, Berger, and Selke on what they call the rejection ratio to demonstrate what I mean. This perspective is useful whether you approach a problem as a frequentist or a Bayesian. I wish it was taught to me in my intro stats class.

Expressing Bayes Theorem (pre-experiment) in odds:
O_{pre} = \frac{\pi_{0}}{\pi_1} \times \frac{1 - \bar\beta }{\alpha}

The value of an experiment (conditional on rejection of the null) is the ratio of power to \alpha, or what the authors call the rejection ratio. We can see that the more skeptical the prior, the smaller \alpha needs to be for an experiment to shift the prior odds. They use the average or expected power, as this is thought about before any data are seen.

In the post-data POV, they use the \frac{1}{-e \times p \times ln(p) } bound to relate the p value to a Bayes factor bound, which provides the highest amount of evidence against the null provided by the data, for any prior. This is the most an honest advocate can assert as evidence in favor of an effect for a particular study.

For any retrospective look at a data set, we can calculate a Bayes’ factor, a Bayes’ factor bound, or a p value. There exists a function that outputs a Bayes’ factor when given a p value; the inverse gives a p value. We can always solve for the prior required when we assert any particular posterior, when given the Bayes’ factor of the data.

After much study, I try to place frequentist reports in a Bayesian context. Robert Matthews (Aston University) has written a few instructive papers on how derive the implied prior when presented with “confidence” (aka. compatibility) intervals. This is his earliest one.

Matthews, R. (2001) Methods for Assessing the Credibility of Clinical Trial Outcomes Drug Information Journal, Volume: 35 issue: 4, page(s): 1469-1478 (link)

Addendum: An educational paper on placing p values in a Bayesian context:

@albertoca: Sander Greenland likes to convert the p value into a proper binary unit of refutation. In this case, 6 bits of information against a model is akin to flipping a fair coin 6 times, and having them all come up heads - 0.5^6 = 0.0156.

Using the \frac{1}{-e \times p \times ln(p)} bound, your p value of 0.01343 converts to a best case Bayes’ Factor Bound of 6.36 to 1 in favor of the alternative.

But isn’t it to correct this that the chi2 distribution was invented, which becomes more and more elongated as the degrees of freedom increase?

I understand that, and the connection with entropy/information theory seemed like a beautiful idea to me. However, in this case it is not clear to me that the s-value will help me to interpret the result differently from the p-value.

The use of context to map the observed data into a generating process is something that informs the statistical analysis but lies outside of it. This is a topic of intense debate but I found this article by Judea Pearl to be an extremely insightful introduction.

1 Like

To pick up the gaunlet, I’ve fitted a Bayesian Cox model with skeptical priors ~normal (mean=0, sd=0.1) for all the signs of the zodiac, except for the basal category (the sign of Capricorn). With the Capricorns I did not know very well what to do. I don’t know if a skeptical prior should be placed on the intercept and what the prior would be. The brms library uses the Student-t distribution with 3 parameters for the intercept.
In any case this is the result. The Bayes factor against a null (intercept-only) model is 8 in this example. What do you think?

prior <- c(set_prior("normal(0,0.1)", class = "b", coef = "zodAquarius"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodAries"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodCancer"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodGemini"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodLeo"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodLibra"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodPisces"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodSagittarius"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodScorpio"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodTaurus"),
           set_prior("normal(0,0.1)", class = "b", coef = "zodVirgo")
> rstan_options (auto_write=TRUE)
> options (mc.cores=parallel::detectCores ())
> fit <- brm(SG | cens(1-Die) ~ 1 + zod, data = dat, prior=prior, family = brmsfamily("cox"),chains=4, iter=2000, save_all_pars = TRUE)
Compiling Stan program...
Start sampling

> fit2 <- brm(SG | cens(1-Die) ~ 1 , data = dat, family = brmsfamily("cox"),chains=4, iter=2000, save_all_pars = TRUE)
Compiling Stan program...
Start sampling

> summary(fit) 
 Family: cox 
  Links: mu = log 
Formula: SG | cens(1 - Die) ~ 1 + zod 
   Data: dat (Number of observations: 2473) 
Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup samples = 4000

Population-Level Effects: 
               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept          1.56      0.08     1.42     1.73 1.00     1803     1982
zodAquarius        0.05      0.06    -0.07     0.18 1.00     4366     3145
zodPisces          0.02      0.06    -0.11     0.13 1.00     3462     2786
zodAries          -0.08      0.06    -0.20     0.04 1.00     4652     3343
zodTaurus         -0.06      0.06    -0.19     0.07 1.00     3922     2963
zodGemini          0.08      0.06    -0.04     0.20 1.00     4058     3030
zodCancer         -0.04      0.06    -0.16     0.08 1.00     4030     2859
zodLeo             0.11      0.07    -0.03     0.24 1.00     4300     3243
zodVirgo           0.03      0.06    -0.09     0.15 1.00     3874     2963
zodLibra          -0.02      0.07    -0.15     0.11 1.00     4226     2854
zodScorpio        -0.02      0.07    -0.15     0.11 1.00     4043     2882
zodSagittarius    -0.15      0.07    -0.28    -0.01 1.00     4076     3180

> bayes_factor(fit,fit2)
Estimated Bayes factor in favor of fit over fit2: 8.02152
1 Like

However, in this case it is not clear to me that the s-value will help me to interpret the result differently from the p-value.

An advantage of the S-value over p is one can add the surprisals of different studies knowing only the p value alone to come up with an aggregate surprisal value. This is what Fisher’s combination procedure does (except using the natural log as the information measure).

In the context of the OP, given the sample size (2400 subjects), I don’t find a rejection of the exact null of no relationship between survival and zodiac sufficiently surprising to warrant another study.


This seems somehow related to ‘preregistration’. Is that the major difference between the Bayesian and S-value approaches here?

There is nothing exceptional about your p-value. Back in around 2005 I was analysing the results of a trial I believed had been validly randomised. There was a big imbalance on the most important prognostic factor. One doesn’t normally calculate a p-value for such imbalance as it should be simply a random number between 0 and 1 - but it turned out to be rather below 0.001. On reflection - this is the only time I can recall seeing this in a lifetime working as a statistician (since 1971), so really it is not at all surprising.

There is no inherent difference between parametrising into 12 months the way we do, or the Hebrew way, or the astrological way, each of which is simply offset some days from ours. There are methods available that fit an a priori meaningful sinusoidal model - basically 2 df, regress on sine and cosine of date which are orthogonal to one another.

I recall a former colleague, Prof Colin Roberts, rather a sceptical guy, back around 1980 presenting data on birth defects not by month but by star sign. There were some patterns evident, which he interpreted intelligently, simply as obvious seasonal trends - i.e. just what a sinusoidal model would seek to pick up. In contrast to the pattern here which only very slightly supports any idea of sinusoidal seasonality.


Hang on – these aren’t signs of the zodiac, they are, roughly, months of birth. And there is a seasonality effect on disease risk for some cancers, both childhood and adult.

And the births are seasonal –

zod <- c(247,214,215,238,196,215,222,167,226,182,178,173)

Chi-squared test for given probabilities

data: zod
X-squared = 37.154, df = 11, p-value = 0.0001086

So season of birth, which is associated with differential in utero exposure to factors such as viral illness, as well as postnatal exposure to infectious disease and micropollutants, is linked with cancer death.

Rather than arguing about how sceptical we are about astrology, I would be diving into the literature and finding other work in this area. My over-the-second-cup-of-coffee literature search suggests that the effect may be driven by some particular cancers such as breast.


As far as I know, the effect of the seasonality of birth has been described for the incidence of some types of cancer. Seasonality with respect to date of diagnosis has also been described for incidence and prognosis. However, I am not aware of articles that have described that the month of birth has influence on the prognosis of advanced tumors diagnosed >40-60 years later, and the long period makes a biological relationship largely implausible.
The interest of my question was about the best approach, frequentist or Bayesian, and about the properties of the chi2 distribution with many degrees of freedom. I felt it was surprisingly easy to find significant results with the log-rank test (p-value <0.05), in different databases, for the zodiac sign or month of birth. Indeed that can be explored with the months of the year, the signs of the zodiac or the Chinese calendar.


The point I was making is that you are not analysing signs of the zodiac but rather season of birth.

Chinese birth signs are fascinating. Many years ago some researchers published a study of cause specific deaths in California among those with chinese and non-chinese names. The rationale is that people born under different signs have different vulnerabilities to disease, according to this belief system. Fire signs, for example, are vulnerable to lung disease.

If you didn’t have a chinese name, then your chinese birth sign had no effect on your age at death.

If you had a chinese name and you died of a disease that was not associated with your birth sign, you died at the same age, on average, as a person with a non-chinese name.

However, if you had a chinese name and died of a cause of death predicted by your birth sign, then you died two to five years earlier than a non-chinese with the same disease.


It’s good to see the Bayarri, Benjamin & Berger approach cited here.

In the TAS 2019 edition on ‘A World beyond p < 0.05’, Benjamin & Berger proposed that p values should be supplemented by the maximum posterior odds on H1.

Of all the suggestions made in that edition if TAS, that was the closest to mine in the same edition, The false positive risk: a proposal concerning what to do about p values. I proposed supplementing the p value with the similar quantity L10, the likelihood ratio in favour of H1. Or, more comprehensibly perhaps, with the false positive risk when prior odds are 50:50,

FPR50 = 1 / (1 + L10).

In a Bayesian context, this could interpreted as the posterior probability of H0 in the light of the observed p value.

In the case of a two independent samples these two approaches give similar results, especially around p = 0.05. Berger’s is simpler to calculate, but mine is simpler to derive.

1 Like

Using your “best case Bayes’ Factor Bound of 6.36 to 1 in favor of the alternative”, we’d get

FPR50 = 1 / (1 + 6.36) = 0.135

That can be compared with the p value of 0.013. Of course if the prior odds of there being a real effect were less than 1, the FPR50 would be a good deal bigger.

1 Like