Immunoglobulins can be measured in plasma. In children, reference limits have been shown to vary with the child’s age. In adults, the reference limits have been assumed to be invariant to age, based on older studies with relatively small samples. I have access to data from a large prospective population-based screening study that measured immunoglobulins in all participants. My aim is to test the hypothesis that reference limits vary by age. I intend to use quantile regression of the 2.5th and 97.5th percentiles of immunoglobulin level, with age modelled as a restricted cubic spline with four knots, and to compare these visually to the currently accepted limits as a function of age.
My question is how I could provide evidence for or against a difference between the age-adjusted and currently accepted reference limits, other than visually. I share the misgivings of most of datamethods’ readers about P-values and hypothesis testing, but I wonder if this particular question would fit into the null hypothesis testing framework. I am thinking along the lines of performing an equivalence test between the currently accepted and the predicted age-adjusted reference limit for pre-specified age groups.
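If the equivalence-test route is taken, the logic for a single pre-specified age group could look like the following two one-sided tests (TOST) sketch; the difference, standard error, and margin below are entirely hypothetical placeholders (the standard error would come from the quantile-regression fit or a bootstrap, and the margin would need to be pre-specified on clinical grounds):

```python
from scipy import stats

# Hypothetical inputs: estimated age-specific upper limit minus the
# currently accepted limit (g/L), its standard error, and the
# pre-specified equivalence margin.
diff, se, margin = 0.4, 0.3, 1.0

# TOST: reject both "difference <= -margin" and "difference >= +margin"
# to conclude equivalence within the margin.
z_low = (diff + margin) / se
z_high = (margin - diff) / se
p_tost = max(stats.norm.sf(z_low), stats.norm.sf(z_high))
print(round(p_tost, 4))  # equivalence claimed if below the chosen alpha
```

Note that the conclusion is entirely driven by the choice of margin, which is where the clinical judgment enters.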
Does anyone have any thoughts on this approach or suggestions on better ways to achieve the same goals?
One initial thought is to quantify the extent of the age effect, either by computing an adjusted generalized R^2 or by forming a contrast to estimate the difference in 0.975 quantiles across quantiles of age. A partial effect plot of age would be even better than choosing just a few quantiles.
I would be inclined to seek out a more sharply defined question by placing the question of ‘reference limits’ in the context of a particular clinical decision. How much evidence does an extreme immunoglobulin value provide for diagnosis X, conditional on age?
It might well be that seeking ‘One Reference Limit to Rule Them All’ isn’t a productive research agenda.
I agree. Reference limits are often not indicators of natural phenomena but simply the 2.5th and 97.5th percentiles of the values obtained from a sample of healthy people; anything outside that range may be an indicator of a problem.
The current usage of reference limits more or less boils down to answering the question: “Is this person, with this value, at risk for any kind of problem that needs additional investigation, based on this particular value?” It is a quick, vague kind of risk assessment that would be better accomplished with context-dependent models. As you suggest: “In a patient with x, y, z features/results, what is the probability of diagnosis X based on additional lab value L?”
Age-related immune senescence is reasonably well taught, with declines in immunity (including humoral immunity) anticipated over time in many cases. Can we expect a 90-year-old to have the same levels of Ig as a 40-year-old, all else constant?
I wonder what the goal is in your context. The reference limits are often arbitrary based on what the 2.5th and 97.5th percentile values were in healthy individuals (for example, and I wouldn’t be surprised about different derivation methods for different labs). Since this often isn’t the indicator of disease, I suggest considering if disease risk prediction is better augmented by a regression model which accounts for the relevant variables and the underlying Ig (or whichever lab you want to de-throne the reference limits for) to predict some disease/outcome probability. The comparison comes back to the problem @f2harrell has reviewed in many cases for us: compare the model performance (including out of sample) of the model with the underlying (or splined, for example) Ig number against an identical model which instead uses the threshold limit in a binary manner to predict the disease risk.
Moving the somewhat arbitrary cutoff is less ideal than showing the superiority of the actual value of Ig for predicting an outcome, while accounting for variation in the outcome due to other known variables. In other words, compare performance of the reference cutoffs against the actual lab values and move the field toward a problem specific regression model to validate.
I do agree with your sentiment. What you are describing has been termed “functional reference intervals” and is an entirely different research question, which our group also intends to answer in the course of my PhD student’s thesis. I share all your misgivings about conventional laboratory reference intervals as currently used in clinical practice. However, it is a fact that reference intervals are used in daily practice, and their validity does have consequences. I still feel that testing the formalised hypothesis that the extremes of the distribution of immunoglobulin values do not change as a function of age (and other continuous baseline variables) is of value, and exactly how one would do this is the topic of the current datamethods discussion.
This is exactly the question we intend to answer, among others. I wholeheartedly agree with your analysis and, rest assured, we do intend to do exactly as you describe. As with many research projects, there are several related but functionally independent research questions that we intend to answer. The first question, which was sourced by interviewing several clinicians about what questions they find interesting and would like to learn the answer to, will be to provide evidence for or against the hypothesis that immunoglobulin reference intervals are invariant to age. We have been thinking deeply about how to do this in an efficient, convincing and informative way, which is the topic of this datamethods thread.
In some manner, this gets at the notion that you shouldn’t test statistically something that is a given theoretical point (i.e. odd how many studies discuss “significance” relating age with mortality…). If there is sufficient literature discussing the evidence for decreased immunoglobulin production with age, it could be taken as a given and used to point out the inconsistency of using only a single range. This doesn’t alleviate the issue of claiming there exists (in a natural sense) a reference range which is what is used in clinical practice.
I wonder if it is better to construct the argument around the idea that reference intervals are unnatural and artificial (and should be done away with after proper vetting of models) as opposed to creating new or additional reference intervals which would still be artifacts of data and our opinion “how abnormal” something is before it’s on the “other side” of the line. This would take massive work over time, however.
Paraproteins are a huge thorn in the side of primary care physicians: specifically, what to do with them once we find them? As you know, they are very common in the elderly population, yet primary care physicians aren’t really trained in how to “risk-stratify” these patients with regard to their future risk of progression to myeloma. Many patients end up being stratified as “low risk of progression” (to multiple myeloma) following hematology consultation, but recommendations from hematologists regarding the need for periodic serologic monitoring (or not) for this group, and how often monitoring should occur, seem inconsistent. There seems to be some consensus around “higher risk” features (e.g., M-protein level >15 g/L, IgA or IgM M-protein type, abnormal free light chain ratio), but hematologists’ advice on what to do with everybody else (i.e., the “low risk” group) feels like a bit of a crapshoot right now.
Is the purpose of your research to determine whether there might be justification for “raising the threshold” for identifying paraproteins in the elderly population, with a view to labelling a smaller proportion as having MGUS? To do this, would you need to show that the risk of progression to myeloma among patients with “low risk” serologic findings (e.g., lower Ig levels) is so low that labelling these patients as “diseased” is clinically unhelpful?
It might also be useful to look at the evidence that other medical specialties have used as the basis for “dialing back” their recommendations for monitoring patients who were previously considered to be at increased future risk for developing a disease. For example, colonoscopic surveillance guidelines are about to change in my area. Previously, identification of any polyp(s) on a scope would lead to a recommendation for ongoing scope surveillance every 5 years. Guidance is now changing to acknowledge that patients without a family history, who have only 1 or 2 low risk adenomas on their scope, can return to the fecal immunochemical testing (FIT) screening track: