That sounds reasonable, assuming β₁ applies to the logit of the propensity score. But the propensity score does not have enough variables in it to be worth the trouble. You can do a better job by just including all the variables as covariates. PS is used when there are too many potential confounders to model as separate covariates, due to limits on the effective sample size for Y.
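To make the contrast concrete, here is a minimal sketch on simulated data of the two strategies being compared: direct covariate adjustment versus adjustment for the logit of an estimated propensity score. All variable names and the data-generating mechanism are invented for illustration.

```r
# Simulated example contrasting covariate adjustment with
# propensity-score adjustment (illustrative only)
set.seed(1)
n  <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))
y     <- rbinom(n, 1, plogis(-1 + treat + x1 + x2))

# (a) Direct covariate adjustment -- preferred when the number of
#     confounders is modest relative to the effective sample size:
fit_cov <- glm(y ~ treat + x1 + x2, family = binomial)

# (b) Adjustment via the logit of the estimated propensity score --
#     mainly useful when there are too many potential confounders
#     to model individually:
ps     <- fitted(glm(treat ~ x1 + x2, family = binomial))
fit_ps <- glm(y ~ treat + qlogis(ps), family = binomial)
```

With only two confounders, as here, approach (a) uses the same information with fewer moving parts; (b) earns its keep only when the confounder list is long.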
Q: How should one define knots for restricted cubic splines when dealing with missing data?
I understand from the rms package documentation that fit.mult.impute estimates knot locations from the first imputation and fixes them for all subsequent computations to ensure parameter consistency.
However, I have not found explicit theoretical guidance in RMS (e.g., Section 2.4.6) distinguishing between using the "first imputation" and simply using the observed marginal distribution (e.g., rcspline.eval(na.omit(x))) to set these fixed knots.
Is using the first imputation merely a pragmatic software choice in rms, or is there a theoretical reason to prefer it over defining fixed knots based on the available observed values?
I am currently calculating knots on the observed data (na.omit(x)) and fixing them in the formula across all imputed datasets. I would value your confirmation that this approach is methodologically sound.
This is a great point to bring up, and one I wish I had devoted attention to much earlier. In hindsight I should have recorded default knots in the datadist object so that rcs could use the distribution of the non-missing x values instead of the distribution of x after first-imputation fill-in.
I think your approach is best: discard NAs and save your computed knot locations in individual vectors that you can insert as second arguments to rcs. But keep in mind that aregImpute probably recomputes knot locations upon each fill-in (this needs to be checked), so there will be a slight incompatibility between how the imputations are done and how the outcome model is fitted. Someday I could add an option for pre-specified knot locations to be used by aregImpute.
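A minimal sketch of the endorsed workflow, with illustrative object names. The quantiles shown are the usual rms defaults for five knots (.05, .275, .50, .725, .95); verify against the rcspline.eval documentation for your version.

```r
# Illustrative data: x has missing values
set.seed(1)
x <- c(rnorm(200), rep(NA, 20))

# Compute knots once from the observed (non-missing) values of x,
# then reuse them in every imputed dataset so all fits share the
# same spline basis
x_obs   <- na.omit(x)
knots_x <- quantile(x_obs, probs = c(.05, .275, .50, .725, .95))

# Supply the fixed knots as the second argument to rcs, e.g.
# (assuming imp is an aregImpute object and d is the data frame):
# f <- fit.mult.impute(y ~ rcs(x, knots_x) + z, lrm,
#                      xtrans = imp, data = d)
```

Because knots_x is computed once and passed explicitly, every imputed dataset is fitted with an identical spline basis, which is what makes the pooled coefficients comparable across imputations.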