Some quotes from Freedman and Box that you may have seen before. Boldface is my emphasis.
“Statisticians generally prefer to make causal inferences from randomized controlled experiments, using the techniques developed by [Ronald A.] Fisher and [Jerzy] Neyman. In many situations, of course, experiments are impractical or unethical. Most of what we know about causation in such contexts is derived from observational studies. Sometimes, these are analyzed by regression models; sometimes, these are treated as natural experiments, perhaps after conditioning on covariates. Delicate judgments are required to assess the probable impact of confounders (measured and unmeasured), other sources of bias, and the adequacy of the statistical models used to make adjustments. There is much room for error in this enterprise, and much room for legitimate disagreement.
“[John] Snow’s work on cholera, among other examples, shows that sound causal inferences can be drawn from nonexperimental data. On the one hand, no mechanical rules can be laid down for making such inferences. Since [David] Hume’s day, that is almost a truism. On the other hand, an enormous investment of skill, intelligence and hard work seems to be a requirement. Many convergent lines of evidence must be developed. Natural variation needs to be identified and exploited. Data must be collected. Confounders need to be considered. Alternative explanations have to be exhaustively tested. Above all, the right question needs to be framed.
“Naturally, there is a strong desire to substitute intellectual capital for labor. That is why investigators often try to base causal inference on statistical models. With this approach, P-values play a crucial role. The technology is relatively easy to use and promises to open a wide variety of questions to the research effort. However, the appearance of methodological rigor can be deceptive. Like confidence intervals, P-values generally deal with the problem of sampling error not the problem of bias. Even with sampling error, artifactual results are likely if there is any kind of search over possible specifications for a model, or different definitions of exposure and disease. Models may be used in efforts to adjust for confounding and other sources of bias, but many somewhat arbitrary choices are made. Which variables to enter in the equation? What functional form to use? What assumptions to make about error terms? These choices are seldom dictated either by data or prior scientific knowledge. That is why judgment is so critical, the opportunity for error so large and the number of successful applications so limited.”
Source: Excerpt from the final (“Summary and Conclusions”) section of David Freedman (1999), “From Association to Causation: Some Remarks on the History of Statistics”, Statistical Science, 14 (3): 243–258.
“Statistics has no reason for existence except as a catalyst for scientific enquiry in which only the last stage, when all the creative work has already been done, is concerned with a final fixed model and a rigorous test of conclusions. The main part of such an investigation involves an inductive-deductive iteration with input coming from the subject-matter specialist at every stage. This requires a continuously developing model in which the identity of the measured responses, the factors considered, the structure of the mathematical model, the number and nature of its parameters and even the objective of the study change. With its present access to enormous computer power and provocative and thought-provoking graphical display, modern statistics could make enormous contributions to this – the main body of scientific endeavour. But most of the time it does not.”
Source: Excerpt from G. E. P. Box’s Discussion of David Draper (1995), “Assessment and Propagation of Model Uncertainty” (with discussion), Journal of the Royal Statistical Society, Series B, 57 (1): 45–97.
“Statistics is, or should be, about scientific investigation and how to do it better, but many statisticians believe it is a branch of mathematics….”
“So I think it ludicrous to suppose that anyone who has no experience of real scientific inquiry is qualified to teach or research statistics. Unhappily, many of the people in our most prestigious universities who are teaching future statisticians and conducting research in statistics are precisely in this category….”
“When I talk to engineers and physical scientists whom I am hoping to persuade to give statistical methods a chance, I have come to dread the comment ‘Yes, I once took a course in statistics,’ because I know it usually means that instead of starting from scratch, I now must start with a severe handicap….”
“Much philosophical argumentation about the nature of statistical inference is, I believe, irrelevant because it contains the hidden but profound assumption of a one-shot approach, in spite of the fact that the majority of scientific investigations follow an iterative and adaptive sequence. Since the inductive-deductive iteration which is scientific method cannot be readily fitted into a purely mathematical model, we too often agree to concentrate on the deductive bit we can study mathematically and pretend that the rest does not exist. This cuts the investigatory process in two and kills it. Students absorb the impression that models are true and data are wrong instead of the other way around. The assumption of normality is stressed but the devastating effect of dependence in space and time in our essentially nonstationary world is not. The dominant role of the design of experiments and surveys, as compared with their analysis, is not understood…”
“It seems a pity that while we statisticians have an opportunity to rate as first-class scientists we should settle for the rather dreary role of second-class mathematicians. So I am not surprised by the situation we are in. I believe scientists and engineers have little time for statistics because they judge much of it to be irrelevant to what they are doing, and they are right.”
Source: Excerpts from G. E. P. Box’s Commentary on A. Bruce Hoadley and J. R. Kettenring (1990), “Communications Between Statisticians and Engineers/Physical Scientists” (with discussion), Technometrics, 32 (3): 243–274.