# Sample Size calculation - "subgroup hypothesis", Guidance needed

Hey there,
I hope this is not too dumb a question, but until last week I thought I had understood the main principles of sample size calculation. Now I have been brought into a project where I have to do the sample size calculation, and I am kind of confused.

Setting:
N patients with a specific disease X (and N*constant healthy controls) should be recruited, and a specific value will be measured; let's call this value MTV.
X has several disease subgroups. Every patient who has X belongs to at least one (but possibly more than one) of these subgroups: X1, X2, X3, X4, X5.
So, for example, one patient can have X expressed as X1 and X3.
Another patient has disease X, but expressed only as X4.
One could think of symptoms in this setting, which leads to a similar idea.

1. goal: hypothesis: “MTV is increased in patients with X.”
2. goal: hypothesis: “MTV is increased in patients with expressions X1, X2, X3, X4 in comparison to patients with X5.” (X1 vs X5, X2 vs X5, X3 vs X5 and X4 vs X5).
3. goal: “increased MTV is associated with higher event rates in patients with X.”

For the 1. goal my idea is simple:
I need power = 0.8, significance level = 0.05, and both expected means with their standard deviations, and I can simply use G*Power to calculate the sample size.
Let's make an example: mean(MTV|X) = 50, std(MTV|X) = 10, mean(MTV|noX) = 40, std(MTV|noX) = 8. Then I get an effect size d of 1.38675 and can calculate a total sample size of 16 (8 vs. 8).
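
As a cross-check on the arithmetic in this example, here is a minimal Python sketch (not the G*Power procedure itself). It assumes equal group sizes, computes d from the root mean square of the two standard deviations, and uses the normal approximation for a two-sample comparison, which tends to come out slightly below an exact t-test calculation. The function names `cohens_d` and `n_per_group` are made up for illustration. With the means and SDs stated above it gives d of about 1.10 rather than 1.39, so the inputs entered into G*Power may be worth rechecking.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohens_d(mean1, sd1, mean2, sd2):
    """Standardized mean difference; pooled SD assumes equal group sizes."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.8, two_sided=True):
    """Per-group n for comparing two means (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = z.inv_cdf(power)
    return ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


d = cohens_d(50, 10, 40, 8)   # values from the example above
print(round(d, 3))            # about 1.104
print(n_per_group(d))         # two-sided, alpha 0.05, power 0.8
print(n_per_group(d, two_sided=False))
```

The one-sided call is included because goal 1 is phrased directionally ("increased"); whether a one-sided test is acceptable is a design decision, not something the sketch settles.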

For the 2. goal, I am kind of stuck for several reasons:

• I only found expected MTV means for subgroups X1, X2 and X5, but not for X3 and X4 (we expect the values to be in a similar range as those in X1 and X2).
• I think the best design would be a 1:1:1:1:1 target allocation (with randomization one could allow a small margin here for bias prevention). But this does not fit goal 1. Or can I simply treat the estimated 16 patients as the minimum recruitment size?
• I would analyze this in a multivariable model with the influence factor “subgrouptype”. But how can I calculate sample sizes for a GLM-type model?
• I am currently in a very hard discussion with my mentor, who argues that it could be better to use the continuous values which describe X1, …, X5, and then build a model like `MTV = baseline + a*contX1 + b*contX2 + c*contX3 + d*contX4 + e*subgrouptype + error`. But the problem remains: I don't know how to estimate the sample size for such a model.

For the 3. goal, I don't know of any tools or starting points, as I have never seen a sample size calculation for a survival analysis study. I would appreciate any hint regarding this.

To be clear, 1. and 2. are the main goals. And if I see it correctly, 1. is powered if 2. is powered, so I have to work out how to power the 2. goal.

Thanks in advance for every hint!


Does anyone have a hint or idea regarding this?

Your mentor is correct that the most general and elegant way to handle this is using a model. This will handle arbitrary overlap between groups. It’s best to concentrate on the power for whether any groups are different from one another, starting with a model with no interactions. This will be a 5 d.f. “chunk test”. There is a complex way to compute the sample size needed for 0.9 power (0.8 is not good enough) if you have another dataset with the same variables. But in general you’d need to write an R program to simulate the F statistics for the chunk test under different assumptions for regression coefficient values, and solve for n such that the fraction of F values exceeding the critical value at the, say, 0.05 level, is 0.9.
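
To make the simulation idea concrete, here is a rough sketch in Python (standing in for the R program mentioned above; the logic would carry over directly). Everything numeric in it is a hypothetical placeholder, not a value from the study: the coefficients `betas`, the residual SD of 10, the intercept of 40, and the assumption that each subgroup is present independently with probability 0.4, redrawn until a patient has at least one. To stay within the standard library, the critical value is estimated by Monte Carlo from a ratio of scaled chi-square draws; in practice an exact F quantile function (for example `qf` in R) would be the cleaner choice.

```python
import random


def ols_rss(X, y):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    p = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        piv = max(range(c, p), key=lambda i: abs(A[i][c]))
        A[c], A[piv] = A[piv], A[c]
        for i in range(p):
            if i != c:
                f = A[i][c] / A[c][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((yi - sum(b * x for b, x in zip(beta, r))) ** 2
               for r, yi in zip(X, y))


def draw_membership(rng, k, prob):
    """Overlapping subgroup indicators; every patient has at least one."""
    while True:
        m = [1 if rng.random() < prob else 0 for _ in range(k)]
        if any(m):
            return m


def chunk_f(rng, n, betas, intercept, sigma, prob):
    """Simulate one dataset; F statistic for H0: all subgroup effects are 0."""
    q = len(betas)
    X = [[1] + draw_membership(rng, q, prob) for _ in range(n)]
    y = [intercept + sum(b * x for b, x in zip(betas, r[1:]))
         + rng.gauss(0, sigma) for r in X]
    ybar = sum(y) / n
    rss0 = sum((v - ybar) ** 2 for v in y)   # intercept-only model
    rss1 = ols_rss(X, y)                     # model with all indicators
    return ((rss0 - rss1) / q) / (rss1 / (n - q - 1))


def f_critical(rng, q, df2, alpha=0.05, draws=20000):
    """Monte Carlo estimate of the upper-alpha quantile of F(q, df2)."""
    sims = sorted((rng.gammavariate(q / 2, 2) / q)
                  / (rng.gammavariate(df2 / 2, 2) / df2)
                  for _ in range(draws))
    return sims[int((1 - alpha) * draws)]


def power(n, betas, intercept=40.0, sigma=10.0, prob=0.4, sims=300, seed=1):
    """Fraction of simulated F statistics exceeding the critical value."""
    rng = random.Random(seed)
    q = len(betas)
    crit = f_critical(rng, q, n - q - 1)
    hits = sum(chunk_f(rng, n, betas, intercept, sigma, prob) > crit
               for _ in range(sims))
    return hits / sims


def smallest_n(betas, target=0.9, start=30, step=10, max_n=400, **kw):
    """First n on the grid whose simulated power reaches the target."""
    for n in range(start, max_n + 1, step):
        if power(n, betas, **kw) >= target:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical effects: X1 and X2 raise MTV by 5, the others by 2, 2, 0.
    print(smallest_n([5, 5, 2, 2, 0]))
```

The result is only as good as the assumed coefficients, residual SD and subgroup prevalences, so it is worth rerunning over a range of plausible values and with more simulations per n before settling on a number.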
