# When is testing superior to estimation?

Can anybody give me a real world example where testing is superior to estimating?

Say we compare two different treatments (A and B). I find it hard to see under what circumstances testing would be superior to an estimation that gives me two posterior distributions (one for A and one for B).

Can anyone give an example or explain what I am missing here? Thanks. Links to papers/articles/websites are also appreciated.

Edit: if this question is overly broad, please tell me and I’ll close/delete this item.

Continuing the discussion from Most reasonably hypothesised effects cannot be exactly zero?:


I think this is an excellent question. My career has been devoted to clinical trials and health services/outcomes research. Even though I use hypothesis testing for expediency I do not recall a single example where hypothesis testing was the best way to meet a research project’s goals. Since I became Bayesian, what I want to see is the entire posterior distribution for the unknown effect, the probability that the effect is in the right direction, and the probability that the effect is more than trivial, for a reasonable choice of ‘trivial’. More on that here.
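The posterior summaries described above can be sketched in a few lines. This is a minimal illustration, not any specific trial: the posterior draws are simulated from a Normal distribution standing in for real MCMC output, and the "trivial" threshold of 0.1 is an arbitrary made-up choice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical posterior draws for an unknown treatment effect
# (in practice these would come from an MCMC sampler, not simulation)
draws = rng.normal(loc=0.3, scale=0.15, size=100_000)

trivial = 0.1  # a chosen threshold below which the effect is deemed 'trivial'

p_right_direction = np.mean(draws > 0)          # P(effect > 0 | data)
p_more_than_trivial = np.mean(draws > trivial)  # P(effect > trivial | data)

print(f"P(effect > 0)   = {p_right_direction:.3f}")
print(f"P(effect > {trivial}) = {p_more_than_trivial:.3f}")
```

Both quantities are direct probability statements about the effect itself, which a p-value does not provide.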


What about drug development with generics? I guess equivalence testing is estimation, but really we don't care what the estimate is as long as it falls within certain boundaries; thus it is the statistical testing that matters, a kind of go vs. no-go.
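For concreteness, here is a sketch of the two one-sided tests (TOST) procedure via its confidence-interval formulation, which also shows how closely equivalence testing leans on estimation. All numbers (difference, standard error, degrees of freedom) are hypothetical; the 0.80–1.25 margins on the ratio scale are the conventional bioequivalence limits.

```python
import numpy as np
from scipy import stats

# Hypothetical summary statistics from a bioequivalence study
diff = 0.02   # mean difference in log(AUC), generic minus reference
se = 0.058    # standard error of that difference
df = 46       # degrees of freedom

# Conventional bioequivalence margins, expressed on the log scale
lo, hi = np.log(0.8), np.log(1.25)   # roughly -0.223 and +0.223

# TOST at alpha = 0.05 is equivalent to requiring that the 90%
# confidence interval lie entirely inside (lo, hi)
t = stats.t.ppf(0.95, df)
ci = (diff - t * se, diff + t * se)
equivalent = (lo < ci[0]) and (ci[1] < hi)

print(f"90% CI: ({ci[0]:.3f}, {ci[1]:.3f}); equivalent: {bool(equivalent)}")
```

Note that the "test" here is literally a statement about where the interval estimate falls, which supports the point that equivalence testing is estimation plus a decision rule.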

I don’t see that. Go/no-go decisions also need to make use of evidence about the magnitude of effects.

The only advantage I can think of is a psychological one: the fact that the conclusion was the result of a procedure makes it look more ‘objective’. “Our magic test says that Treatment A is superior” sounds more convincing to some than “Looking at the posterior distribution, one sees that it is more probable that Treatment A is superior”.


Though it is possible to use such testing wisely, in reality a procedure that results in positive/negative, reject/not reject, etc. provides only the illusion of objectivity and is in effect attempting to move the thinking component from the researcher to a black box.


It still puzzles me how and why NHST got its prominent place in science and medicine. The method is (afaik) good for:

• frequent decision making
• in a closed context
• where an occasional wrong decision is not fatal
• but where one wants to minimize the number of mistakes

Example: quality control in a screw factory. Based on samples the quality manager decides if a batch of screws is deemed “good” or “faulty”. Minimize Type I and Type II errors; but a single wrong decision is not fatal.
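The screw-factory decision rule above can be sketched as an exact one-sided binomial test. The sample size, defect count, acceptable rate, and alpha level below are all made-up numbers for illustration.

```python
from scipy import stats

# Hypothetical acceptance sampling: inspect 200 screws from a batch
n, defects = 200, 9
p0 = 0.02  # acceptable defect rate under the null hypothesis

# One-sided exact binomial test: P(X >= 9 | n = 200, p = 0.02)
result = stats.binomtest(defects, n, p0, alternative="greater")
print(f"p-value = {result.pvalue:.4f}")

# Decision rule at alpha = 0.05: reject the batch if the null is rejected
decision = "faulty" if result.pvalue < 0.05 else "good"
print(decision)
```

In this repeated, closed-context setting the long-run Type I/Type II error rates are exactly the quantities the quality manager cares about, which is the poster's point.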

Contrast this to science in general.

• the goal is not to make decisions but to gain knowledge
• the context is usually open. It’s not just H0 vs. H1.

Or, when we go to decision making, let’s look at medical decisions:

• many once in a lifetime decisions (have a bone marrow transplant yes or no?)
• long-term error rates are not relevant; each individual decision counts. “I don’t care that 80% of all patients survive, I want your best calculation of MY chances of survival”
• and of course a patient is only interested in the relevant probability: the probability of the hypothesis given the data. “Given these data, there is a 95% probability that you have this disease”, rather than the probability of the data given some hypothesis.
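The gap between those two probabilities can be made concrete with a textbook Bayes' theorem calculation. The sensitivity, specificity, and prevalence below are hypothetical numbers chosen only to illustrate how far apart P(data | hypothesis) and P(hypothesis | data) can be.

```python
# Hypothetical diagnostic test: 95% sensitivity, 95% specificity,
# applied in a population with 1% disease prevalence
sens, spec, prev = 0.95, 0.95, 0.01

# P(data | hypothesis): probability of a positive test given disease
p_pos_given_disease = sens  # 0.95

# P(hypothesis | data): probability of disease given a positive test,
# via Bayes' theorem
p_pos = sens * prev + (1 - spec) * (1 - prev)
p_disease_given_pos = sens * prev / p_pos

print(f"P(+ | disease) = {p_pos_given_disease:.2f}")
print(f"P(disease | +) = {p_disease_given_pos:.3f}")
```

Here the test is positive in 95% of diseased patients, yet a positive result implies only about a 16% chance of disease, because the disease is rare. Conflating the two conditional probabilities is exactly the mistake the poster warns about.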

As you all probably know, many of these objections were voiced long ago, e.g. by Sir Ronald Fisher in 1955. https://www.phil.vt.edu/dmayo/personal_website/Fisher-1955.pdf

Or am I missing the point here? Thanks.


I don’t think you are missing the point. But even with the screw factory, the question is “how good”, and estimation seems in order. I’m not convinced that a point null is the way to go. A Bayesian analysis would emphasize the probability that the parameter representing “bad” exceeds a given tolerance.


There is a real-life example, but not from the context of comparing two treatments. See Section 1.5.1 of my book:

Newcombe RG. Confidence intervals for proportions and related measures of effect size. Chapman & Hall/CRC Biostatistics Series, Taylor & Francis, Boca Raton, FL, 2012. ISBN: 978-1-4398-1278-5.