Very good comments and references about the use of p-values and measures of variability in the context of observational studies. In my opinion, descriptive tables in observational studies have two purposes: 1) to provide a picture of who was included in the study and, therefore, to whom the findings of the study could be extrapolated; and 2) to provide a picture of the exchangeability (comparability) of the exposed (treated) and non-exposed (untreated) groups.

I think using confidence intervals for purpose 1 would be fine. For instance, one could report the average age of the participants with its 95% confidence interval. Reporting standard deviations (SD) or standard errors (SE) seems of little use to me, but that is because I have a hard time making sense of an SD or SE unless I use it to calculate a 95% CI. P-values are useless for purpose 1, since the goal is different from testing a null hypothesis.

Regarding purpose 2, I do not use p-values. Moreover, if I can get away with it, I do not report crude estimates of the effect of the exposure (i.e., crude comparisons between exposed and non-exposed), nor crude or adjusted effects for other exposures. I just report the frequency/mean of the outcome in each group, exposed and non-exposed, with their 95% CIs. I shy away from reporting crude estimates of effect because they are likely biased and, therefore, not amenable to causal inference. Statistical tests and confidence intervals for crude comparisons are not valid; indeed, they are uninterpretable, because they assume that systematic sources of error (say, confounding) have been eliminated before the test is conducted. Reporting them may also mislead some readers.

As explained before, in the context of an RCT, p-values for a comparison of the distribution of prognostic factors between treated and untreated groups are pointless, because we already know that the process that generated any difference between the two groups was random assignment.
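As a concrete illustration of purpose 1, here is a minimal sketch of reporting a group mean with its 95% CI rather than an SD or SE. The data are hypothetical, and the interval uses the normal approximation (mean ± 1.96 × SE); for small samples one would substitute the appropriate t critical value.

```python
import math
import statistics

def mean_ci95(values):
    """Mean with a normal-approximation 95% CI (mean +/- 1.96 * SE)."""
    n = len(values)
    mean = statistics.fmean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # standard error of the mean
    return mean, mean - 1.96 * se, mean + 1.96 * se

# Hypothetical ages of participants in each exposure group
exposed = [54, 61, 47, 58, 66, 52, 59, 63, 49, 57]
unexposed = [50, 55, 62, 48, 53, 60, 45, 51, 58, 56]

for label, ages in [("exposed", exposed), ("non-exposed", unexposed)]:
    m, lo, hi = mean_ci95(ages)
    print(f"{label}: mean age {m:.1f} (95% CI {lo:.1f} to {hi:.1f})")
```

Note that the two CIs are reported side by side, descriptively, rather than being fed into a test of the difference between groups.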
When explaining this to my students, I tell them that Table 1 in an RCT aims to alert us to differences that may make the two treatment groups non-exchangeable (non-comparable). This lack of comparability is a systematic error, not a random error. Statistical tests aim to quantify random error, under the assumption of little or no systematic error. Therefore, a statistical test and the corresponding p-value are the wrong medicine for this patient (Table 1).