Hi,
Needless to say, comparing Medicare versus commercial introduces all kinds of confounding, not just because of discrete age differences (>=65 versus <65), but also because you likely have a number of relevant comorbidities at a higher prevalence in the Medicare group, which further complicates comparisons. In other words, age and gender adjustments are not likely to be sufficient, unless you case-matched on other characteristics a priori.
Since you lack those data, you cannot effectively control for this, so it really comes down to what you can actually say about the observed differences between the groups, to some extent, in a vacuum.
It may come down to not making the formal inference that the event rate in the Medicare group is “significantly” higher than in the commercial group, which would reasonably be expected, but rather asking whether the event rate in the Medicare group is acceptable within some margin of error, which, given your sample size, may be too large at present.
For example, for the 3 out of 31, which is 9.7%, you would have a 95% confidence interval of 3.3% to 24.9% (using the Wilson score interval). That is a wide range.
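As a sanity check, the Wilson score interval above is easy to compute directly. A minimal sketch in Python, using only the standard library (the function name is my own; any stats package will give the same numbers):

```python
from math import sqrt

def wilson_ci(events, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = events / n
    center = p + z * z / (2 * n)
    margin = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - margin) / denom, (center + margin) / denom

lo, hi = wilson_ci(3, 31)
print(f"{lo:.1%} to {hi:.1%}")  # 3.3% to 24.9%
```

Note that the interval spans nearly an order of magnitude around the 9.7% point estimate, which is exactly the uncertainty problem described above.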
How did you manage to get the same number of patients in each group? Did you plan to collect a consecutive series of patients in each group over time, until you enrolled the same number of patients in each, or is there some other mechanism involved?
The key question embodied in the above is whether or not you also have to consider patient selection bias in the two groups, which may further confound your comparisons. In other words, if you did not enroll a consecutive series, is there something different about the patients who were eligible for the study but were not enrolled, as compared to the patients who were enrolled, that further complicates your comparisons?
With respect to the methods for small samples, I am not saying that they are invalid at all. However, all formal statistical methods have underlying assumptions, and one of the key general assumptions is having a sufficient sample size to yield stable estimates of the parameters of interest, within an acceptable level of uncertainty.
The point is that, even with the application of those specific methods, you will still have “large” (for some definition of large) standard errors, which in turn likely means overly wide confidence intervals for your odds ratios. In effect, those methods are still saying, “get more data”.
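To make the “overly wide confidence intervals” point concrete, here is a sketch of the standard Wald-type confidence interval for an odds ratio, using your 3/31 Medicare count and an assumed, purely hypothetical 1/31 count in the commercial group (your actual commercial count was not stated):

```python
from math import exp, log, sqrt

# Hypothetical 2x2 table for illustration only:
a, b = 3, 28   # Medicare: events, non-events (3 of 31, per the post)
c, d = 1, 30   # commercial: events, non-events (assumed 1 of 31)

or_hat = (a * d) / (b * c)
se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of the log odds ratio
lo = exp(log(or_hat) - 1.96 * se)
hi = exp(log(or_hat) + 1.96 * se)
print(f"OR = {or_hat:.2f}, 95% CI {lo:.2f} to {hi:.1f}")
```

With counts this small, the interval runs from well below 1 to well above 30, spanning roughly two orders of magnitude, so it is consistent with essentially any effect in either direction.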
Bear in mind, that formalized statistical methods are about engaging in inference. You have some, presumably random, sample from a population. That population may be real or theoretical.
Using the sample, you are then attempting to make statements about the characteristics of the population from which the sample is drawn.
If your sample is too small, your estimates of various parameters will embody a level of uncertainty that is not likely acceptable.
As an example, you have a coin that you presume to be fair. You toss it four times, with an a priori expectation of two heads and two tails, but you get four heads. The probability of getting four heads in a row, if the coin is fair, is 6.25%. Would you now reject the hypothesis that the coin is fair, or would you toss it a larger number of times to observe the results and reduce the uncertainty?
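The arithmetic in the coin example is easy to verify, and to extend to longer runs:

```python
# Probability of tossing all heads in n tosses of a fair coin.
def p_all_heads(n):
    return 0.5 ** n

print(p_all_heads(4))   # 0.0625, i.e. the 6.25% above
print(p_all_heads(10))  # under 0.1%: ten straight heads is far stronger evidence
```

The same observed pattern (all heads) becomes much more informative as the number of tosses grows, which is the whole argument for collecting more data before drawing an inference.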
With your limited dataset, you are essentially saying that you want to make a decision about the fairness of the coin after only four tosses.
You can still narrowly describe what was observed in the sample empirically, but you will have trouble making inferences about the population via formalized statistical methods.
That is why, in the aborted study that I referenced above, we did not engage in any formal statistical analyses. We simply reported what was observed, and in the discussion of the study, were sure to mention that the study was aborted, that the enrollment was far short of what was planned, and that we were only reporting the observed results empirically, given the uncertainty in the results. In essence, we treated it as if it were a pilot study.