Dear all,

I have run into a problem while planning my study design.

The study is an exploratory prognostic factor study with 10 factors at the individual patient level and 2 factors at the hospital level. I also plan to test one factor as an interaction term.

I have two outcomes. One is length of stay, which will be modeled with negative binomial regression. The other is the reoperation rate, which will be modeled with survival analysis, treating death as a competing risk.

I am wondering how to confirm that I have a large enough sample to explore these factors, or at least to obtain reasonably accurate estimates. For length of stay this seems fine, because with imputation of missing data I can use all of my data. For the reoperation rate, things become much less tractable.

I have two questions. The first is whether I should fit a separate model for each time point (for example, the 1-, 5-, and 9-year reoperation rates). Perhaps I should fit the model only once if the proportional hazards assumption holds? I am not sure.
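To make the first question concrete, here is a minimal sketch of what "fitting once" could look like at the descriptive level: a single cumulative incidence curve for reoperation with death as a competing risk, read off at several horizons. The function name, the event coding (0 = censored, 1 = reoperation, 2 = death), and the toy data are all assumptions for illustration, not taken from any real dataset, and this is an unadjusted estimate rather than the regression model itself.

```python
def cumulative_incidence(times, events, cause, horizons):
    """Nonparametric cumulative incidence of `cause` with competing risks.

    Hypothetical coding: 0 = censored, 1 = reoperation, 2 = death.
    Returns the estimated cumulative incidence at each horizon.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any cause so far
    cif = 0.0
    steps = []  # (time, cumulative incidence at that time)
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            e = data[i][1]
            n_here += 1
            d_any += e != 0
            d_cause += e == cause
            i += 1
        if d_any:
            # Increment uses the event-free probability just before t.
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
        steps.append((t, cif))
        n_at_risk -= n_here
    return [max((c for s, c in steps if s <= h), default=0.0) for h in horizons]


# Toy follow-up times in years (made up for illustration).
toy_times = [1, 2, 3, 4, 5]
toy_events = [1, 0, 2, 1, 0]
estimates = cumulative_incidence(toy_times, toy_events, cause=1, horizons=[1, 3, 5])
print(estimates)
```

In this sketch every patient contributes until the end of their own follow-up, so patients with short follow-up still inform the early part of the curve, and the estimates at all horizons come from one pass over the data.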

The second is that the sample size differs between time points. For example (using the data from JAMA. 2018;320(16):1659-1669), the crude 1-year reoperation rate is 1.4% (1275/90215), the 5-year rate is 2.9% (1508/52715), and the 9-year rate is 3.4% (240/6981). If I fit the model for the 9-year reoperation rate, the effective sample size is only the 240 events. Considering the possible non-linearity, categorical outcomes, and the interaction term, this seems too small. I am wondering whether I should use some form of variable selection (for example, LASSO or backward elimination). If not, what is the right way to fit the model?
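The arithmetic behind this concern can be sketched as follows. The counts are the ones quoted above; the parameter count is an assumption that each of the 10 patient-level factors, the 2 hospital-level factors, and the interaction costs exactly one parameter. Non-linear terms or multi-level categorical variables would raise that number and lower the events per parameter further.

```python
# (events, patients) at each horizon, as quoted from the JAMA example.
counts = {1: (1275, 90215), 5: (1508, 52715), 9: (240, 6981)}

# Assumption: one parameter per factor plus one for the interaction term.
n_params = 10 + 2 + 1


def summarize(events, patients, n_params):
    """Return (crude risk in percent, events per candidate parameter)."""
    return 100 * events / patients, events / n_params


summary = {year: summarize(e, n, n_params) for year, (e, n) in counts.items()}

for year, (risk, epp) in summary.items():
    print(f"{year}-year: crude risk {risk:.1f}%, {epp:.1f} events per parameter")
```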

Thank you for your time, and I wish you a great day!

Best wishes,

Lingxiao