The simulation result would do nothing more than reflect the true model that was used to simulate its data.

Can’t you simulate random continuous versus binary variables and test whether there are more deviations from the PO in one case or the other? My guess is that when you use quadratic terms or splines, the PO fails much more.

No please consider my earlier point. Whatever bias you put into the simulation model will be revealed by such simulation. Not really worth doing IMHO.

So how can you investigate if continuous variables modeled as splines or quadratic terms generate more PO violations? Don’t you have this intuition?

You would have to assemble lots of real datasets to study that properly. My intuition is that it does not matter whether a variable is continuous or categorical, in terms of likelihood of a PO violation. What matters more is the strength of the predictor,as violations of assumptions typically matter more for stronger predictors.

I have read the entire chapter 7. I loved it, a new world to learn these years. I guess the compareBmods function gives you weights based on the WAIC differences of each model with the best model (formula in page 229 in the McElreath’s book), is that it? These weights indicate the probability that each model is the best, is that right?

That is my understanding. But look at the code underneath `compareBmods`

to check.

In the case of the previous example, I have compared two models, one additive with a treatment + age formula, versus one with an interaction with a treatment * age formula. In principle, compareBmods favours the additive model without interaction. However, WAICs compare the accuracy of the prediction. Here I do not want to predict, but make inferences about the values of the therapeutic effect in each age range. For this I have to include the interaction necessarily even though the most likely data generation process likely does not. How do you solve this dissonance?

I have a doubt.

When I use the predict function I get negative probabilities for some levels of the end point.

This determines that the cumulative probability, for example, the probability of quality of life of at least y (Pr >=y), does not decrease monotonically when increasing y. For example:

```
> predict(bs_final,newdata, type='fitted.ind')
y x Mean Lower Upper
1 HE6=0 1 0.624 0.131 0.985
2 HE6=8.3 1 -0.094 -0.177 -0.005
3 HE6=16.7 1 -0.053 -0.174 0.068
4 HE6=25 1 -0.118 -0.202 -0.013
5 HE6=33.3 1 0.035 -0.060 0.144
6 HE6=41.7 1 -0.118 -0.212 -0.020
7 HE6=50 1 0.172 0.061 0.290
8 HE6=58.3 1 -0.050 -0.136 0.045
9 HE6=66.7 1 0.043 -0.044 0.129
10 HE6=75 1 -0.024 -0.106 0.062
11 HE6=83.3 1 0.180 0.063 0.289
12 HE6=91.7 1 -0.009 -0.094 0.071
13 HE6=100 1 0.411 0.054 0.774
```

Provide the simplest possible reproducible example using self-generated data and I’ll get to work quickly on this worrisome problem.

Done in github, thanks.