Hey there,
I hope this is not too dumb a question, but until last week I thought I had understood the main principles of sample size calculation. Now I have been brought into a project where I have to do the sample size calculation, and I am rather confused.
Setting:
N patients with a specific disease X (and N*constant healthy controls) will be recruited, and a specific value will be measured; let's call this value MTV.
X has several disease subgroups: X1, X2, X3, X4, X5. Every patient with X belongs to at least one (and possibly more than one) of these subgroups.
For example, one patient can have X expressed as X1 and X3.
Another patient has X expressed only as X4.
One could think of the subgroups as symptoms, which leads to a similar idea.
1. Goal: hypothesis: “MTV is increased in patients with X.”
2. Goal: hypothesis: “MTV is increased in patients with expressions X1, X2, X3, X4 in comparison to patients with X5.” (X1 vs X5, X2 vs X5, X3 vs X5 and X4 vs X5).
3. Goal: “Increased MTV is associated with higher event rates in patients with X.”
For goal 1, my idea is simple:
I need power = 0.8, significance level = 0.05, and both expected means with their standard deviations; then I can simply use G*Power to calculate the sample size.
Let's take an example: mean(MTV|X) = 50, std(MTV|X) = 10, mean(MTV|noX) = 40, std(MTV|noX) = 8. Then I get an effect size d of 1.38675 and a total sample size of 16 (8 vs. 8).
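As a cross-check of this example, here is a minimal Python sketch of the calculation. It is not what G*Power does internally: it assumes Cohen's d with the equal-n pooled SD and uses the normal approximation instead of the noncentral t distribution, so it tends to give slightly smaller numbers than G*Power. The function names `cohens_d` and `n_patients` and the `ratio` parameter (standing in for the "constant" in N*constant controls) are made up for illustration. Note that with this formula the inputs above give d of roughly 1.10 rather than 1.38675, so I may have entered something differently in G*Power.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cohens_d(mean1, sd1, mean2, sd2):
    """Cohen's d with the equal-n pooled SD: sqrt((sd1^2 + sd2^2) / 2)."""
    pooled_sd = sqrt((sd1**2 + sd2**2) / 2)
    return (mean1 - mean2) / pooled_sd


def n_patients(d, alpha=0.05, power=0.80, ratio=1.0, two_sided=True):
    """Normal-approximation number of patients for a two-group mean comparison.

    ratio = controls per patient (assumed meaning of the "constant" in the
    setting); the number of controls is then ratio * n_patients.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return ceil((1 + 1 / ratio) * (z_alpha + z_beta) ** 2 / d**2)


d = cohens_d(50, 10, 40, 8)
print(round(d, 3))                            # effect size from the example inputs
print(n_patients(d))                          # patients per group, two-sided test
print(n_patients(d, ratio=2.0))               # patients when recruiting 2 controls each
print(n_patients(1.38675, two_sided=False))   # my reported d, one-sided test
```

Since the t-test needs a few more subjects than the normal approximation, the one-sided result for d = 1.38675 is at least in the neighbourhood of my 8 vs. 8, which makes me suspect that I ran a one-tailed test with a different SD input.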
For goal 2, I am rather stuck, for several reasons:
- I only found expected MTV means for subgroups X1, X2 and X5, but not for X3 and X4 (we expect the values to be in a similar range to those for X1 and X2).
- I think the best design would be a 1:1:1:1:1 target allocation (with randomization one could allow a small margin here to prevent bias). But this doesn't fit goal 1. Or can I simply treat the estimated 16 patients as the minimum size for recruitment?
- I would analyze this in a multivariable model with “subgrouptype” as a factor. But how can I calculate sample sizes for a GLM-type model?
- I am currently in a tough discussion with my mentor about whether it would be better to use the continuous values that describe X1,…,X5 and build a model like “MTV = baseline + a\*contX1 + b\*contX2 + c\*contX3 + d\*contX4 + e\*subgrouptype + error”. But the problem remains: I don't know how to estimate the sample size for such a model.
For goal 3, I don't know of any tools or starting points, as I have never seen a sample size calculation for a survival analysis study. I would appreciate any hint on this.
To be clear, 1. and 2. are the main goals. And if I see it correctly, 1. is powered if 2. is powered, so I have to work out how to power goal 2.
Thanks in advance for any hint!