Critique of a published prediction model

Elias_Eythorsson · May 20, 2024, 10:59am

I agree with all your points, and they have strengthened my resolve to campaign for a full-outcome-level proportional odds model in the next iteration of our prediction model. I freely admit that I did not realise the bias I was introducing into the higher outcome levels by combining them; this is an important lesson I have gained from posting this call for critique on datamethods.

My strategy for this version was designed around its suggested use, which was to inform the decision to obtain a bone marrow sample in individuals with presumed MGUS, based on the probability that this test would reveal the person to have SMM (by bone marrow criteria) rather than MGUS. Based on my own experience, that of my haematologist colleagues, and conversations at the American Society of Hematology conferences from 2021-2023, this is the most common rationale for obtaining a bone marrow sample in individuals with presumed MGUS. Our aim with the model is to show clinicians how low-yield their current testing strategy is and how unlikely it is to change management, with the end goal of decreasing bone marrow sampling in the MGUS population. It should be noted that the Mayo Clinic risk stratification model of MGUS has been repurposed for this decision for decades. Our paper is the first to rigorously evaluate the Mayo Clinic model for this purpose, showing our prediction model to be superior for any risk threshold by decision curve analysis. The biases introduced by combining higher outcome levels into the ≥15% BMPC outcome level (the highest in our model) do not affect the recommended use of the model as presented in our online calculator, because the model is exclusively evaluated on the ≥10-14% outcome level.
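For readers unfamiliar with decision curve analysis, here is a minimal sketch of the net benefit quantity it compares across risk thresholds. This is not the code from our paper; the function names and the toy data are made up for illustration, and the only thing taken as given is the standard definition, net benefit = TP/n - (FP/n) * pt/(1 - pt), where pt is the risk threshold.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of the policy 'sample if predicted risk >= threshold'.

    y_true: 0/1 outcomes (1 = outcome present), illustrative data only.
    y_prob: predicted probabilities from any model.
    threshold: risk threshold pt, strictly between 0 and 1.
    """
    n = len(y_true)
    flagged = [(y, p) for y, p in zip(y_true, y_prob) if p >= threshold]
    true_pos = sum(1 for y, _ in flagged if y == 1)
    false_pos = sum(1 for y, _ in flagged if y == 0)
    # False positives are weighted by the odds of the threshold.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_sample_all(y_true, threshold):
    """Net benefit of the reference policy 'sample everyone'."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data, not from the study.
outcomes = [1, 0, 0, 1, 0]
risks = [0.8, 0.1, 0.3, 0.6, 0.05]
nb_model = net_benefit(outcomes, risks, 0.2)
nb_all = net_benefit_sample_all(outcomes, 0.2)
```

A model is preferred at a given threshold when its net benefit exceeds that of the alternatives (another model, sampling everyone, or sampling no one, which has net benefit 0).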

The rationale for using a proportional odds model, even though I only intended the model to be evaluated on one of the outcome levels (≥10-14%), was statistical efficiency. I learned this from reading Regression Modeling Strategies, 2nd edition, and we discussed it on this forum: https://discourse.datamethods.org/t/using-a-proportional-odds-model-instead-of-logistic-model-to-improve-power/7256 . I never figured out how to formally incorporate the efficiency gained from using a proportional odds model rather than a logistic model into the sample size calculation, so I based the calculation for the prediction model on procedures for a logistic model, on the rationale that the proportional odds model is at least as efficient. Informally, I showed that for my use case the proportional odds model with four outcome levels was 40% (range 18-80%) more efficient (as shown in the attachment to the original post titled MGUS_prediction_ordinal_vs_logistic.html and in the thread linked above).
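To make concrete what "evaluating the model on one outcome level" means, here is a rough sketch of how a proportional odds model with four outcome levels turns one linear predictor into exceedance probabilities, P(Y >= j) = expit(alpha_j + X*beta). The intercepts and the linear predictor value below are invented for illustration and are not the fitted values from our model; the function names are also my own.

```python
import math


def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))


def exceedance_probs(intercepts, linear_predictor):
    """P(Y >= j) for each cut-point j under a proportional odds model.

    intercepts: one alpha_j per cut-point, in decreasing order
                (three cut-points for four outcome levels).
    linear_predictor: X*beta, shared across all cut-points.
    """
    return [expit(alpha + linear_predictor) for alpha in intercepts]


def level_probs(intercepts, linear_predictor):
    """P(Y = level) for each outcome level, by differencing exceedances."""
    exceed = [1.0] + exceedance_probs(intercepts, linear_predictor) + [0.0]
    return [exceed[j] - exceed[j + 1] for j in range(len(exceed) - 1)]


# Hypothetical intercepts and linear predictor, for illustration only.
alphas = [0.0, -1.0, -2.0]
exceed = exceedance_probs(alphas, 0.0)
levels = level_probs(alphas, 0.0)
```

The efficiency argument is that the shared slopes are estimated using information from every cut-point, while the probability reported for a single threshold still uses only that threshold's intercept together with those shared slopes.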

So while using all the outcome levels is more efficient (and I will be incorporating this in future iterations of the model when I figure out how to present the results and justify its use to clinicians), the current model derivation is more than adequately powered/efficient for its intended use.