I might add that the general notion of giving multiple researchers the same dataset and having them arrive at differing results, at least to some extent, is not a new topic.
Back in the 1990s, David Naftel at UAB was a valued mentor of mine. At the time I was working on the STS National Databases with Fred Edwards and Richard Clark, developing risk adjustment models for operative mortality. We had both the benefit and the curse of access to what were, for the time, huge datasets, with hundreds of thousands of observations.
David and I engaged in numerous discussions about the approaches one could take to developing multivariable models, including feature selection and so forth.
Predicated at least in part upon those discussions, David wrote an editorial in 1994 in the Journal of Thoracic and Cardiovascular Surgery:
Do different investigators sometimes produce different multivariable equations from the same data?
June 1994, Volume 107, Issue 6, Pages 1528–1529
I would say that it is worth a read 20+ years later for perspective.
In the fall of 1995, I had the privilege of attending a research workshop at KU Leuven, co-hosted by Paul Sargeant at KUL and by UAB faculty, including David, John Kirklin and Gene Blackstone. The focus of the workshop was the team-based approach to analysis, where statisticians and clinical subject matter experts collaborate to derive reasonable results.
It is an experience that still influences how I go about analyses today, and an approach I would advocate for others.
I might also remind readers of the oft-cited and oft-paraphrased quote by George Box: “Essentially, all models are wrong, but some are useful.”