This is a really terrific question!
Disclosure: I am a trained frequentist clinical trial statistician whose knowledge of Bayesian statistical approaches is presently “Know just enough about Bayesian approaches to read a paper and explain the results to a layperson” - not at all an expert.
Just to briefly recap the issue: traditional frequentist null-hypothesis testing boils the result down to “significant” or “not significant” based on how likely it would be to observe data at least as extreme as the trial’s if there were no treatment effect, while Bayesian approaches produce a posterior probability distribution for the estimated treatment effect (though the pragmatist may point out that for decision making, this is still often going to be reduced to a yes/no decision).
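To make that recap concrete, here is a minimal sketch with entirely made-up numbers (a hypothetical two-arm trial with binary outcomes and flat Beta(1, 1) priors, not anything from the trials discussed here), showing how the same data get summarized each way:

```python
# Hypothetical counts for illustration only: 40/100 responders on treatment, 28/100 on control.
import numpy as np
from scipy import stats

x_trt, n_trt = 40, 100   # responders / patients, treatment arm (invented)
x_ctl, n_ctl = 28, 100   # responders / patients, control arm (invented)

# Frequentist summary: a two-proportion z-test reduced to a single p-value,
# i.e. how surprising data at least this extreme would be if the true effect were zero.
p1, p0 = x_trt / n_trt, x_ctl / n_ctl
p_pool = (x_trt + x_ctl) / (n_trt + n_ctl)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_trt + 1 / n_ctl))
z = (p1 - p0) / se
p_value = 2 * stats.norm.sf(abs(z))          # two-sided
print(f"z = {z:.2f}, p = {p_value:.3f}  ->  'significant' or 'not significant'")

# Bayesian summary: a posterior distribution for the risk difference
# (conjugate Beta-Binomial model with flat priors, sampled by Monte Carlo).
rng = np.random.default_rng(0)
post_trt = rng.beta(1 + x_trt, 1 + n_trt - x_trt, 100_000)
post_ctl = rng.beta(1 + x_ctl, 1 + n_ctl - x_ctl, 100_000)
diff = post_trt - post_ctl
print(f"Pr(treatment better)       = {np.mean(diff > 0):.3f}")
print(f"Pr(risk difference > 0.05) = {np.mean(diff > 0.05):.3f}")
print(f"95% credible interval      = ({np.quantile(diff, 0.025):.3f}, "
      f"{np.quantile(diff, 0.975):.3f})")
```

The frequentist output is a single yes/no-flavored number; the Bayesian output is a whole distribution you can interrogate (probability of any benefit, probability of a clinically meaningful benefit, etc.), though as noted above, someone still has to turn that into a decision.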
Caveat before I answer: the honest answer is “it depends,” and the comments below will vary somewhat depending on the specific clinical situation, the sponsors of the trials, the financial incentives in play, etc. With that said, I’ll venture a few comments:
Re: design-related flaws, it is certainly possible that some of the smaller trials that first led to the excitement have more of these problems than the larger trials that follow. This need not be an intrinsic feature of small trials; it may simply reflect that as the stakes are raised, a greater degree of rigor and regulatory zeal is applied, more experienced investigators get involved, and the trial design is more carefully scrutinized. For one example, see this thread:
Briefly, the authors performed a crossover trial in which OSA patients received one night on the drug and one night on placebo, then did a subgroup analysis restricted to the patients with the poorest results on the placebo night (using the justification that only these patients met the criteria for OSA on their ‘placebo’ night). The authors proceeded to conclude that the drug was more effective (“greatly reduces” appears in the title!) for patients with more severe OSA, though my thread illustrates that regression to the mean could explain part or all of these results: if they wanted to do subgroup analyses by severity, they should have used a pre-study measurement of OSA severity, not the placebo night itself, as the assessment of severity. I would not be surprised at all if a more rigorous subsequent study (which will also probably be larger) shows less benefit than this result, because the analysis they used inherently overestimates the treatment benefit by design. (Yes, I wrote a cranky letter about this, and no, it didn’t seem to make any impression on the authors.)
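To show the mechanism, here is a toy simulation (my own invented severity scale and noise levels, not the trial’s data) in which the drug has exactly zero effect, yet the subgroup selected on its own placebo-night value appears to improve on the drug night:

```python
# Regression-to-the-mean demo: no treatment effect is built in anywhere.
import numpy as np

rng = np.random.default_rng(42)
n = 200

true_ahi = rng.normal(30, 10, n)                # each patient's underlying severity (made up)
placebo_night = true_ahi + rng.normal(0, 8, n)  # one noisy measurement on the placebo night
drug_night = true_ahi + rng.normal(0, 8, n)     # one noisy measurement on the drug night, NO drug effect

# Subgroup defined the way the paper did it: conditioning on the placebo night itself.
severe = placebo_night > np.quantile(placebo_night, 0.75)

print("All patients:    mean(placebo - drug) =",
      round(np.mean(placebo_night - drug_night), 2))
print("'Severe' subset: mean(placebo - drug) =",
      round(np.mean(placebo_night[severe] - drug_night[severe]), 2))
# The full sample shows roughly zero difference, while the subgroup selected on its
# own noisy placebo value shows a sizeable apparent 'benefit' purely by construction.
```

Selecting patients because their placebo-night measurement was bad guarantees that, on average, their other night looks better, drug or no drug; a pre-study severity measurement would not have this problem.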
So the answer is yes, with caveats: sometimes design flaws in early-stage RCTs (which also tend to be on the smaller side) are better addressed in subsequent larger trials with more oversight (and/or more experienced investigators), and this probably explains some of the problem you describe here.
Re: publication bias, I generally agree that this is a greater issue with studies other than randomized trials, though again it may vary a bit depending on the specifics. Some of the meta-researchers have found a disturbingly high number of clinical trials with no reported results, but (at least from my personal experience) I think most randomized trials ultimately will find a home in the published literature. It is also my personal belief that all trials should be published: some folks dislike the idea that even bad trials get published because it rewards the authors with a publication, but IMO the results of the science should still be made public, with the journal editors/reviewers pointing out all of the issues and making sure they are adequately discussed.
Re: the perils of an “all or nothing” frequentist approach, you are certainly correct that there are perils in attaching too much confidence to the findings of any single trial. Whether further research should be performed, or whether the practice being tested should be adopted, depends on the specifics of the clinical situation and on regulatory considerations. For example, a drug company may decide that it needs a “go” or “no go” decision on whether to pursue something further based on the results of its trial. In an ideal world, the trial would be designed with a more flexible sample size so it could reach a more conclusive result, but sponsors only have so much money, and there are only so many patients and so much time that will be invested in studying a given agent before it is either considered market-ready or abandoned.
I admit that despite my enthusiasm for the potential of Bayesian approaches, I feel some caution about something we have started to see in medicine: the idea that any “negative” trial can simply be reanalyzed using a Bayesian approach and then declared a positive-ish trial.