At last, Frank, you have stated your perception in a way that looks to me truly colorblind to explicit statements of why noncollapsibility is considered by so many to be bad. It’s bad because it means the subgroup ORs (which are all clinicians know about, because those are what is highlighted in reports and quoted by editorials and medical media) can be arbitrarily far from every individual effect, even if all the individual effects are identical and there is no uncontrolled study bias (such as confounding). Furthermore, this phenomenon should be expected in practice because there is always unmeasured baseline risk variation (as shown by the fact that individual outcomes vary within our finest measured subgroupings). So the clinician can never supply a patient-specific OR estimate; she can only supply the OR estimate for the finest subgroup she’s seen reported (which is rarely anything finer than sex plus some very crude age grouping like “above/below 60”).
I am thus forced to conclude that you think it not bad that a reported OR will always mislead the user when applied to individual decisions, unless it is combined with other information to transform it into a collapsible measure like the RD or component risks. You need to confirm or deny that impression.
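To make the phenomenon concrete, here is a minimal Python sketch with made-up numbers. The two hidden strata, their baseline risks, the common OR of 4, and all function names are illustrative assumptions, not taken from any study. Treatment is taken to be randomized, so the hidden strata are equally represented in both arms and there is no confounding.

```python
def odds(p):
    """Convert a risk (probability) to odds."""
    return p / (1.0 - p)


def risk_from_odds(o):
    """Convert odds back to a risk (probability)."""
    return o / (1.0 + o)


def marginal_or(baseline_risks, weights, common_or):
    """OR seen in a subgroup that pools hidden strata.

    Every stratum has the same OR (common_or); weights are the stratum
    proportions, identical in treated and untreated (no confounding).
    """
    treated_risks = [risk_from_odds(common_or * odds(p)) for p in baseline_risks]
    p0 = sum(w * p for w, p in zip(weights, baseline_risks))  # untreated risk
    p1 = sum(w * p for w, p in zip(weights, treated_risks))   # treated risk
    return odds(p1) / odds(p0)


# Hypothetical hidden strata of equal size; every individual OR is 4.
moderate = marginal_or([0.2, 0.8], [0.5, 0.5], common_or=4.0)
# More extreme hidden variation in baseline risk, same individual OR of 4.
extreme = marginal_or([0.01, 0.99], [0.5, 0.5], common_or=4.0)

print(f"subgroup OR, moderate variation: {moderate:.2f}")  # about 2.58
print(f"subgroup OR, extreme variation:  {extreme:.2f}")   # about 1.08
```

In this sketch the reported subgroup OR drifts toward 1 as the hidden variation in baseline risk grows, even though every stratum-specific OR stays at 4.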
Mathematically, what is going on is that the OR is variation (logically) independent of the baseline risk, which is a technical advantage for model simplicity and fitting but which also means that it conveys less information about individual effects than an RD. The RD has exactly the opposite pairing: It is severely dependent on the baseline risk, which is a technical disadvantage for model simplicity and fitting, but which also means it conveys more information about individual effects (specifically, the individual RDs must average to the group RD). The RR falls somewhere between these extremes: it is not as technically well behaved as the OR but provides some information about individual effects, and it is not as ill-behaved technically as the RD but provides less simple information about individual effects.
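The contrast between the OR and the RD can be sketched numerically. The four strata, their baseline risks, and the common OR of 4 below are hypothetical values chosen only for illustration, and the helper name `apply_or` is my own.

```python
def apply_or(p0, odds_ratio):
    """Risk under treatment when the given OR is applied to baseline risk p0."""
    o1 = odds_ratio * p0 / (1.0 - p0)
    return o1 / (1.0 + o1)


# Hypothetical equal-sized strata with different baseline risks.
baseline = [0.05, 0.20, 0.50, 0.80]
weights = [0.25, 0.25, 0.25, 0.25]
common_or = 4.0

# Any positive OR paired with any baseline risk gives a valid risk in (0, 1);
# this is the sense in which the OR does not depend on the baseline risk.
treated = [apply_or(p, common_or) for p in baseline]

# The RD, by contrast, changes with the baseline risk under the same OR.
stratum_rd = [p1 - p0 for p0, p1 in zip(baseline, treated)]

# Collapsibility of the RD: the weighted average of stratum RDs
# equals the RD computed from the pooled risks.
avg_stratum_rd = sum(w * rd for w, rd in zip(weights, stratum_rd))
marginal_rd = (sum(w * p for w, p in zip(weights, treated))
               - sum(w * p for w, p in zip(weights, baseline)))

for p0, rd in zip(baseline, stratum_rd):
    print(f"baseline risk {p0:.2f}: RD = {rd:.3f}")
print(f"average of stratum RDs = {avg_stratum_rd:.3f}")
print(f"marginal RD            = {marginal_rd:.3f}")
```

The last two printed values agree, which is the averaging property referred to above; no such identity holds for the OR.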
If after a few generations there is no agreement about noncollapsibility then we should not expect any in your and my remaining lifetimes. What we might agree on is that, for clinical applications, the focus of reporting should be not on effect measures but on full reporting of risk functions. I have been advocating that from the beginning of this controversy in the 1970s, as have others.
This resolution would however require concessions on your part that (1) people should stop touting the OR as some kind of magic special measure (which is what the papers by Doi et al. and Doi’s comments here come across as doing) and that (2) for those who (unlike you and Doi) do want a measure that represents average individual effects, the OR simply will not suffice if the risk is not always “low” (in the sense explained in my papers, e.g., if 10% deviation from the average is the maximum tolerable error then the OR shouldn’t be used if the odds can exceed 10%).
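One way to see where a 10% threshold on the odds could come from is the algebraic identity OR/RR = (1 + odds1)/(1 + odds0), which holds for any pair of risks. Reading the rule as a bound on how far the OR can be from the RR is my assumption here; the precise sense is the one given in the papers referred to above. The risks used below are made up for illustration.

```python
def odds(p):
    """Convert a risk (probability) to odds."""
    return p / (1.0 - p)


def or_over_rr(p0, p1):
    """Ratio of the odds ratio to the risk ratio for risks p0 and p1."""
    odds_ratio = odds(p1) / odds(p0)
    risk_ratio = p1 / p0
    return odds_ratio / risk_ratio


def or_over_rr_from_odds(odds0, odds1):
    """Same quantity via the identity OR/RR = (1 + odds1) / (1 + odds0)."""
    return (1.0 + odds1) / (1.0 + odds0)


# Hypothetical low-risk setting: both odds are below 0.10.
low = or_over_rr(0.02, 0.08)
# Hypothetical high-risk setting: both odds are well above 0.10.
high = or_over_rr(0.30, 0.60)
# Worst case when both odds are capped at 0.10.
worst = or_over_rr_from_odds(0.0, 0.10)

print(f"low-risk OR/RR:  {low:.3f}")
print(f"high-risk OR/RR: {high:.3f}")
print(f"cap at odds 0.10: {worst:.3f}")
```

With both odds at or below 0.10 the OR stays within 10% of the RR in this sketch; once the odds are larger, the two measures can diverge substantially.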