I am trying to wrap my head around methodological issues while constructing a predictive model.
The predictive model was developed following the TRIPOD guidelines and has a defined use case. An ordinal logistic regression model was chosen, and the outcome variable was defined from the subject-matter knowledge of clinical collaborators without "looking at the data". The covariates were also chosen from subject-matter knowledge, while respecting the number of candidate parameters the data could support.
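For reference, one common parameterisation of the ordinal (proportional odds, cumulative logit) model, for an outcome $Y \in \{0, 1, \dots, K\}$ and covariate vector $x$, is sketched below. The key assumption is that the slope vector $\beta$ is shared by every cutpoint $j$; only the intercepts $\alpha_j$ vary. Some software uses the mirror form (R's `MASS::polr`, for example, models $\operatorname{logit} P(Y \le j \mid x) = \zeta_j - x^\top\beta$), so signs may differ from this sketch.

$$
\operatorname{logit} P(Y \ge j \mid x) = \alpha_j + x^\top \beta, \qquad j = 1, \dots, K.
$$

For a binary covariate $x_k$, this says the odds ratio $e^{\beta_k}$ comparing $x_k = 1$ with $x_k = 0$ is the same for every dichotomisation "$Y \ge j$ versus $Y < j$"; a violation means that odds ratio changes across cutpoints.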
I have now fit a tentative ordinal model. Examining the residuals, some covariates show strong evidence of violating the proportional odds assumption. However, most of the covariates are binary (coded 0/1), which I suspect affects how important this violation is. I am also aware of a literature on the proportional hazards assumption suggesting that violations of it may not matter much. How should I proceed when I have already specified the model and the covariates, and only now find violations?
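One common informal way to see what a proportional odds violation looks like (not necessarily the residual check used above) is to fit a separate binary logistic regression for each cutpoint, "Y >= j versus Y < j", and compare the slope estimates across cutpoints. Under proportional odds the rows should agree up to sampling noise. The numpy-only sketch below assumes that parameterisation (logit P(Y >= j) = alpha_j + x'beta); the data are simulated and the helper names (`fit_logistic`, `cumulative_logit_slopes`, `simulate_po`) are hypothetical.

```python
import numpy as np

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Binary logistic regression by Newton-Raphson; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def cumulative_logit_slopes(X, y_ord):
    """Fit logit P(Y >= j) separately for each cutpoint j > min(Y).

    Returns one row of slope estimates per cutpoint; under proportional
    odds the rows should be roughly equal.
    """
    levels = np.sort(np.unique(y_ord))
    Xi = np.column_stack([np.ones(len(X)), X])  # add intercept
    rows = []
    for j in levels[1:]:
        b = fit_logistic(Xi, (y_ord >= j).astype(float))
        rows.append(b[1:])  # drop the cutpoint-specific intercept
    return np.array(rows)

def simulate_po(n, beta, alphas, rng):
    """Simulate an ordinal outcome (0..K) from a proportional odds model with binary covariates."""
    X = rng.integers(0, 2, size=(n, len(beta))).astype(float)
    eta = X @ beta
    # cum[:, k] = P(Y >= k+1); alphas must be decreasing so these are nested
    cum = 1.0 / (1.0 + np.exp(-(np.asarray(alphas)[None, :] + eta[:, None])))
    u = rng.random(n)
    y = (u[:, None] < cum).sum(axis=1)
    return X, y

# Hypothetical example: two binary covariates, three outcome levels (0, 1, 2)
rng = np.random.default_rng(2024)
X, y = simulate_po(5000, beta=np.array([1.0, -0.5]), alphas=[0.5, -1.5], rng=rng)
slopes = cumulative_logit_slopes(X, y)
print(np.round(slopes, 2))  # rows: cutpoints Y>=1, Y>=2; columns: covariates
```

Because the data here are simulated under proportional odds, the two rows should differ only by noise. With real data, a covariate whose column drifts across rows is one that violates the assumption, and the size of the drift (on the log-odds-ratio scale) gives a sense of whether the violation is practically important rather than merely statistically detectable.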
This is especially pressing because I do not believe the violations materially affect model performance. Here are the calibration plots for the first, second and third outcome categories. The third outcome is rare, so I am not surprised that it is calibrated relatively poorly. Overall, the model seems to perform well. How do I reconcile the assumption violations with this?
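To make the point about the rare outcome concrete, here is a minimal sketch of a binned calibration table: predictions are grouped into quantile bins and the mean predicted probability is compared with the observed event proportion, together with a crude binomial (Wald) standard error. This is only one simple approach (smoothed calibration curves are a common alternative), the function name `binned_calibration` is hypothetical, and the predictions below are simulated to be perfectly calibrated so that any gap is pure sampling noise.

```python
import numpy as np

def binned_calibration(p_hat, y, n_bins=10):
    """Return rows of (mean predicted, observed proportion, Wald SE, n) per quantile bin."""
    p_hat = np.asarray(p_hat, dtype=float)
    y = np.asarray(y, dtype=float)
    edges = np.quantile(p_hat, np.linspace(0, 1, n_bins + 1))
    # Digitize against the interior edges so bin indices run from 0 to n_bins - 1
    idx = np.digitize(p_hat, edges[1:-1], right=True)
    rows = []
    for b in range(n_bins):
        m = idx == b
        n = int(m.sum())
        if n == 0:
            continue
        obs = y[m].mean()
        se = np.sqrt(obs * (1.0 - obs) / n)  # crude; equals 0 when a bin has no events
        rows.append((p_hat[m].mean(), obs, se, n))
    return np.array(rows)

rng = np.random.default_rng(7)
n = 20000
# Hypothetical, perfectly calibrated predictions for a common and a rare outcome
p_common = rng.uniform(0.2, 0.8, n)
y_common = (rng.random(n) < p_common).astype(int)
p_rare = rng.uniform(0.005, 0.05, n)
y_rare = (rng.random(n) < p_rare).astype(int)

common_tab = binned_calibration(p_common, y_common)
rare_tab = binned_calibration(p_rare, y_rare)
for name, tab in [("common", common_tab), ("rare", rare_tab)]:
    print(name)
    print(np.round(tab, 3))
```

Even with perfectly calibrated predictions, the rare outcome's bins contain only a handful of events each, so its observed proportions scatter more relative to the predicted values; apparently poor calibration for a rare category can partly reflect this noise rather than model misspecification.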