Sample Size calculation - "subgroup hypothesis", Guidance needed

JuliaRose · January 16, 2024, 9:39am

Hey there,
I hope this is not too dumb a question, but until last week I thought I had understood the main principles of sample size calculation. Now I have been brought into a project and have to do the sample size calculation, and I am kind of confused.

Setting:
N patients with a specific disease X (and N*constant healthy controls) should be recruited, and a specific value will be measured; let's call this value MTV.
X has different disease subgroups. Every patient who has X is in at least one (but can be in more than one) subgroup of this disease: X1, X2, X3, X4, X5.
So, for example, one patient can have X expressed as X1 and X3.
Another patient has disease X, but expressed only as X4.
One could think of symptoms in this setting, which would lead to a similar idea.

1. goal: hypothesis: “MTV is increased in patients with X.”
2. goal: hypothesis: “MTV is increased in patients with expressions X1, X2, X3, X4 in comparison to patients with X5.” (X1 vs X5, X2 vs X5, X3 vs X5 and X4 vs X5).
3. goal: “increased MTV is associated with higher event rates in patients with X.”

For goal 1, my idea is simple:
I need power = 0.8, significance level = 0.05, and both expected means with standard deviations, and then I can simply use G*Power to calculate the sample size.
Let's take an example: mean(MTV|X) = 50, std(MTV|X) = 10, mean(MTV|noX) = 40, std(MTV|noX) = 8. Then I get an effect size d of 1.38675 and can calculate a total sample size of 16 (8 vs. 8).
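
As a cross-check on the arithmetic above, here is a minimal Python sketch. It assumes equal group sizes, Cohen's d with the pooled SD sqrt((sd1^2 + sd2^2)/2), and a normal-approximation sample size formula, so it will give slightly smaller n than an exact t-test calculation in G*Power. The function names `cohens_d` and `n_per_group` are made up for illustration. With the means and SDs quoted, this formula gives d of about 1.10 rather than the roughly 1.39 quoted, so the inputs may be worth rechecking.

```python
import math
from statistics import NormalDist


def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d for two equal-sized groups, using the pooled SD."""
    pooled_sd = math.sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (mean1 - mean2) / pooled_sd


def n_per_group(d, alpha=0.05, power=0.8, two_sided=True):
    """Normal-approximation n per group for comparing two means."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    z_beta = z(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)


# Values taken from the example in the post.
d = cohens_d(50, 10, 40, 8)
print(round(d, 3))                      # effect size from the quoted inputs
print(n_per_group(d, two_sided=False))  # one-sided test ("MTV is increased")
print(n_per_group(d, two_sided=True))   # two-sided test
```

Whether the test is one-sided or two-sided changes the answer noticeably at this sample size, so that choice should be stated alongside the result.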

For goal 2, I am kind of stuck for several reasons:

I only found expected MTV means for subgroups X1, X2 and X5, but not for X3 and X4 (we expect the values to be in a similar range as those in X1 and X2).
I think the best design would be a 1:1:1:1:1 target allocation (with randomization one could fit a small margin for bias prevention here). But this doesn't fit goal 1. Or can I just treat the estimated 16 patients as the minimum size for recruitment?
I would analyze this in a multivariable model with the influence factor “subgrouptype”. But how can I calculate sample sizes for a GLM-type model?
I am currently in a tough discussion with my mentor about whether it could be better to use the continuous values that describe X1,…,X5 and then build a model like “MTV = baseline + a*contX1 + b*contX2 + c*contX3 + d*contX4 + e*subgrouptype + error”. But the problem remains: I don't know how to estimate the sample size for such a model.

For goal 3, I don't know of any tools or starting points, as I have never seen a sample size calculation for a survival-analysis study. I would appreciate every hint regarding this.

To be clear, goals 1 and 2 are the main goals. And if I see it correctly, goal 1 is powered if goal 2 is powered, so I have to work out how to power goal 2.

Thanks in advance for every hint!

JuliaRose · February 3, 2024, 7:36pm

Does anyone have a hint or idea regarding this?

f2harrell · February 4, 2024, 2:00pm

Your mentor is correct that the most general and elegant way to handle this is using a model. This will handle arbitrary overlap between groups. It’s best to concentrate on the power for whether any groups are different from one another, starting with a model with no interactions. This will be a 5 d.f. “Chunk test”. There is a complex way to compute sample size needed for 0.9 power (0.8 is not good enough) if you have another dataset with the same variables. But in general you’d need to write an R program to simulate the F statistics for the chunk test under different assumptions for regression coefficient values, and solve for n such that the fraction of F values exceeding the critical value at the, say, 0.05 level, is 0.9.
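
The simulation idea described above can be sketched roughly as follows. This is a standard-library Python sketch rather than the R program mentioned, and almost everything in it is an assumption made for illustration: the additive model with five overlapping 0/1 subgroup indicators, the membership probability `p_member`, the coefficient values in `betas`, the residual SD `sigma`, and the helper names (`solve`, `rss`, `chunk_f`, `f_crit`, `power`). The critical value is approximated by Monte Carlo draws from the F distribution; with SciPy available, `scipy.stats.f.ppf(1 - alpha, q, df2)` would give it exactly.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(X, y))


def chunk_f(X, y):
    """F statistic for all non-intercept columns jointly (the chunk test)."""
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    rss0 = sum((v - ybar) ** 2 for v in y)  # intercept-only model
    rss1 = rss(X, y)                        # full model
    q = p - 1
    return ((rss0 - rss1) / q) / (rss1 / (n - p))


def f_crit(q, df2, alpha, rng, draws=20000):
    """Monte Carlo approximation of the upper-alpha quantile of F(q, df2)."""
    sims = sorted((rng.gammavariate(q / 2, 2) / q) / (rng.gammavariate(df2 / 2, 2) / df2)
                  for _ in range(draws))
    return sims[int((1 - alpha) * draws)]


def power(n, betas, sigma, alpha=0.05, sims=300, p_member=0.4, seed=1):
    """Fraction of simulated F statistics exceeding the critical value."""
    rng = random.Random(seed)
    k = len(betas)
    crit = f_crit(k, n - k - 1, alpha, rng)
    hits = 0
    for _ in range(sims):
        X, y = [], []
        for _ in range(n):
            # Assumed design: independent membership, at least one subgroup each.
            g = [1.0 if rng.random() < p_member else 0.0 for _ in range(k)]
            if not any(g):
                g[rng.randrange(k)] = 1.0
            X.append([1.0] + g)
            y.append(sum(b * v for b, v in zip(betas, g)) + rng.gauss(0, sigma))
        hits += chunk_f(X, y) > crit
    return hits / sims


# Hypothetical effects: X1..X4 each raise MTV by 5, X5 by 0; residual SD 10.
for n in (40, 80, 120, 160):
    pw = power(n, [5, 5, 5, 5, 0], sigma=10)
    print(n, pw)
    if pw >= 0.9:
        break
```

The loop stops at the first n on the grid whose simulated power reaches 0.9. The result depends heavily on the assumed coefficients and overlap pattern, so it is worth rerunning under several plausible scenarios, and with more simulations per n than used here.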