Hey there,

I hope this is not too dumb a question, but until last week I thought I understood the main principles of sample size calculation. Now I have been brought into a project where I have to do the sample size calculation, and I am rather confused.

**Setting:**

N patients with a specific disease X (and N*constant healthy controls) will be recruited, and a specific value will be measured; let's call this value MTV.

X has several disease subgroups. Every patient who has X belongs to at least one (but possibly more than one) of the subgroups X1, X2, X3, X4, X5.

For example, one patient can have X expressed as X1 and X3.

Another patient has disease X, but expressed only as X4.

One could think of symptoms in this setting, which would lead to a similar idea.

- goal: hypothesis: “MTV is increased in patients with X.”
- goal: hypothesis: “MTV is increased in patients with expressions X1, X2, X3, X4 in comparison to patients with X5.” (X1 vs X5, X2 vs X5, X3 vs X5 and X4 vs X5).
- goal: “Increased MTV is associated with higher event rates in patients with X.”

For goal 1, my idea is simple:

I need power = 0.8, significance level = 0.05, and both expected means with standard deviations; then I can simply use G*Power to calculate the sample size.

Let's take an example: mean(MTV|X) = 50, std(MTV|X) = 10, mean(MTV|noX) = 40, std(MTV|noX) = 8. Then I get an effect size d of 1.38675 and can calculate a total sample size of 16 (8 vs. 8).
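
To make this calculation easy to recheck, here is a minimal Python sketch. It assumes the equal-weight pooled SD, sqrt((sd1² + sd2²) / 2), and a normal approximation instead of the noncentral t distribution that G*Power uses, so it can come out a little smaller than G*Power. The function names are my own. With the example values, this convention gives d of about 1.10 rather than the value quoted above, so that figure may rest on different inputs or a different SD convention and is worth double-checking.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d using the equal-weight pooled SD (an assumed convention)."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return abs(mean1 - mean2) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.8, two_sided=True):
    """Per-group n for a two-sample comparison (normal approximation)."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Example values from the text above.
d = cohens_d(50, 10, 40, 8)
n_two_sided = n_per_group(d)
n_one_sided = n_per_group(d, two_sided=False)
print(f"d = {d:.3f}")
print(f"n per group (two-sided) = {n_two_sided}")
print(f"n per group (one-sided) = {n_one_sided}")
```

Whether the test is one-sided or two-sided changes the result noticeably, so this choice should be fixed before comparing with the G*Power output.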

For goal 2, I am rather stuck for several reasons:

- I only found expected MTV means for subgroups X1, X2 and X5, but not for X3 and X4 (we expect the values to be in a similar range as those in X1 and X2).
- I think the best design would be a 1:1:1:1:1 target allocation (with randomization one could fit a small margin for bias prevention here). But this doesn't fit goal 1. Or can I just treat the estimated 16 patients as the minimum size for recruitment?
- I would analyze this in a multivariable model, with the influence factor “subgrouptype”. But how can I calculate sample sizes for a GLM-type model?
- I am currently in a very hard discussion with my mentor about whether it would be better to use the continuous values that describe X1,…,X5, and then build a model like `MTV = baseline + a*contX1 + b*contX2 + c*contX3 + d*contX4 + e*subgrouptype + error`. But the problem remains: I don't know how to estimate the sample size for such a model.
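
The only starting point I can come up with myself for goal 2 is to treat it as four pairwise comparisons against X5 and apply a Bonferroni correction. The sketch below does that under a normal approximation. The effect size is a hypothetical placeholder, not a value from the literature, and the calculation ignores that a patient can belong to more than one subgroup, so it is not a replacement for a proper calculation for the multivariable model.

```python
from math import ceil
from statistics import NormalDist


def n_per_subgroup(d, n_comparisons, alpha=0.05, power=0.8):
    """Per-subgroup n for two-sided pairwise tests with Bonferroni correction."""
    z = NormalDist().inv_cdf
    alpha_adj = alpha / n_comparisons  # Bonferroni-adjusted significance level
    return ceil(2 * (z(1 - alpha_adj / 2) + z(power)) ** 2 / d ** 2)


# Hypothetical standardized difference between each of X1..X4 and X5.
HYPOTHETICAL_D = 0.8

n_unadjusted = n_per_subgroup(HYPOTHETICAL_D, n_comparisons=1)
n_adjusted = n_per_subgroup(HYPOTHETICAL_D, n_comparisons=4)
total_patients = 5 * n_adjusted  # 1:1:1:1:1 allocation, no overlap assumed

print(f"n per subgroup without correction: {n_unadjusted}")
print(f"n per subgroup with Bonferroni (4 tests): {n_adjusted}")
print(f"total patients across 5 subgroups: {total_patients}")
```

The smallest expected difference among the four comparisons would drive the result, so the missing means for X3 and X4 matter directly here.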

For goal 3, I don't know of any tools or starting points, as I have never seen a sample size calculation for a survival analysis study. I would appreciate any hint regarding this.
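
One formula I have come across for this kind of question is the required number of events for a Cox model with a single continuous covariate, D = (z_{1-α/2} + z_{power})² / (σ² · (ln HR)²), where σ is the SD of the covariate and HR is the hazard ratio per unit. The sketch below applies it; the hazard ratio, the SD of MTV and the event probability are all hypothetical placeholders, and I am not sure this is the right approach for my setting.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hr_per_unit, covariate_sd, alpha=0.05, power=0.8):
    """Events needed to detect a log hazard ratio per unit of a continuous covariate."""
    z = NormalDist().inv_cdf
    log_hr = log(hr_per_unit)
    return ceil((z(1 - alpha / 2) + z(power)) ** 2 / (covariate_sd ** 2 * log_hr ** 2))


def required_patients(events, event_probability):
    """Patients needed so that the expected number of events reaches `events`."""
    return ceil(events / event_probability)


# All three inputs are hypothetical and only illustrate the mechanics.
HYPOTHETICAL_HR_PER_UNIT = 1.06   # hazard ratio per one unit of MTV
HYPOTHETICAL_MTV_SD = 10          # SD of MTV among patients with X
HYPOTHETICAL_EVENT_PROB = 0.25    # share of patients with an event during follow-up

events = required_events(HYPOTHETICAL_HR_PER_UNIT, HYPOTHETICAL_MTV_SD)
patients = required_patients(events, HYPOTHETICAL_EVENT_PROB)
print(f"required events: {events}")
print(f"required patients: {patients}")
```

The point of the sketch is that power in a survival analysis depends on the number of events, not directly on the number of patients, so the expected event rate during follow-up is an extra input I would need.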

To be clear, 1. and 2. are the main goals. And if I see it correctly, 1. is powered if 2. is powered, so I have to work out how to power goal 2.

Thanks in advance for any hint!