Advice for appropriate causal model and analysis method

We are designing a study to understand how the radiation dose to the small bowel blood pool affects the risk of lymphopenia (reduced lymphocyte count) during radiotherapy (see protocol here).

Problem Statement

Lymphocytes are highly sensitive to radiotherapy, and we know that lymphocytopenia during radiotherapy is associated with poorer outcomes across multiple solid cancers. The reasons are not clearly understood, and as most of the studies are retrospective, multiple sources of bias are possible. Nonetheless, avoiding lymphocytopenia is useful in itself, as it also reduces the risk of infections (especially viral and fungal) during treatment.

In patients with cervical cancer who receive radiotherapy, lymphocytopenia is nearly universal after radiotherapy. Specialized radiotherapy techniques like bone marrow sparing radiotherapy prioritize sparing the bone marrow (especially the pelvic bone marrow) from radiation. Randomized trials investigating bone marrow sparing radiotherapy have demonstrated that while this reduces neutropenia (reduced neutrophil counts), lymphopenia does not change. Mechanistically this is not surprising, as in adults only 2-3% of the total lymphocyte pool is in the bone marrow.

Evidence from other cancer sites (e.g. thoracic cancers) suggests that lymphopenia during radiotherapy may be related to irradiation of the blood as it circulates through organs such as the lung and heart. In the case of cervical cancer, the organ with the maximum blood flow is the small intestine (approximately 11% of the total cardiac output).

Surprisingly, to date no one has investigated the impact of irradiation of the small bowel blood pool on lymphopenia.

Study Objective

To determine if radiation doses to the small bowel have an impact on the lymphocyte count of the patients during treatment.

Measurements

Lymphocyte counts are measured once before starting radiotherapy and then weekly during treatment (usually 5 - 6 measurements till the treatment completes).

Doses to organs such as the bone marrow and small bowel will be available as a dose matrix, from which we can obtain various dose parameters such as the average dose, the median dose, and dose-volume data. As a starting point, we will make the simplifying assumption that the dose to the small bowel is a surrogate for the dose to the blood pool in the small bowel.

Other factors that can influence this toxicity

There are several other potential factors that can influence this toxicity like:

  1. Age
  2. Menopausal status
  3. Height and weight (these determine the total dose of chemotherapy delivered during treatment and also influence the volume of tissue receiving radiotherapy)
  4. Chemotherapy dose (patients receive chemotherapy weekly and each week the dose of chemotherapy (cisplatin) affects the lymphocyte count)
  5. Bone marrow dose

We have made the following causal diagram:
dagitty-model

As lymphocyte counts are repeated measurements, they will depend on the baseline lymphocyte count (patients with pre-existing lymphopenia are likely to have greater lymphopenia later), the radiation dose to the bone marrow (this has a strong mechanistic basis), and potentially the radiation dose to the small bowel, which is what we are interested in.

We know that height and weight directly influence the total dose of cisplatin administered, as we calculate the dose of the drug based on body surface area. Additionally, height and weight determine the dose to the bone marrow and the bowel (radiation physics).
PTV is the planning target volume (i.e. the volume of tissue we are treating to take care of the cancer), and this also depends on height and weight.
Age and menopausal status can influence weight and may influence the baseline lymphocyte count.
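
As a rough illustration of why height and weight feed into the cisplatin dose, here is a minimal sketch of a dose calculation based on body surface area (BSA). It assumes the Mosteller formula for BSA and a placeholder dose of 40 mg per square metre. Neither the formula nor the dose level is stated in this post, so both should be replaced by whatever the protocol actually specifies.

```python
import math


def bsa_mosteller(height_cm: float, weight_kg: float) -> float:
    """Body surface area in m^2 using the Mosteller formula (an assumption here)."""
    return math.sqrt(height_cm * weight_kg / 3600.0)


def weekly_cisplatin_mg(height_cm: float, weight_kg: float,
                        dose_per_m2: float = 40.0) -> float:
    """Weekly dose in mg; dose_per_m2 is an illustrative placeholder value."""
    return dose_per_m2 * bsa_mosteller(height_cm, weight_kg)


if __name__ == "__main__":
    # Hypothetical patients (height in cm, weight in kg), not study data.
    for height, weight in [(150, 45), (160, 60), (165, 80)]:
        bsa = bsa_mosteller(height, weight)
        dose = weekly_cisplatin_mg(height, weight)
        print(f"{height} cm, {weight} kg: BSA {bsa:.2f} m^2, dose {dose:.1f} mg")
```

Because the dose is a deterministic function of height and weight at prescription time, any deviation from this value during treatment reflects something else (for example a dose reduction or a held cycle), which is the time-varying part discussed below.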

Proposed analytical methodology

A linear mixed model where the cluster variable is patient ID and the fixed effects are PTV, height, weight and baseline lymphocyte count. The dependent variable is the weekly lymphocyte count.
Conditional independencies will need to be checked as per the DAG.

Question

As per the DAG, the regression model needs to adjust for the PTV, height, weight and bone marrow dose. While cisplatin dose is important, the DAG indicates that we do not need to adjust for it if we can demonstrate the conditional independence. However, cisplatin is given weekly, and the dose delivered each week will be determined by the presence of hematological toxicity. We do not adjust or hold the cisplatin dose based on the lymphocyte counts per se, but it is likely that a person with severe lymphocytopenia will have other hematological toxicities or secondary infections that prevent us from giving cisplatin. In other words, the lymphocyte counts can potentially influence subsequent cisplatin doses, as shown in the illustration below.

CDDP - Lymphopenia Relation

I understand that in this case a separate DAG should be built for the week 2 lymphocyte count, the week 3 lymphocyte count, etc. to account for this temporal relationship, but how should we model this relationship? One way I have thought of is to consider the cisplatin dose and weight for each week as a random variable and include them in the linear mixed model. Is this the correct way, or should I use another method?

Also I would appreciate feedback about the overall design and how I should approach the sample size calculation.

DAG model code for DAGITTY

dag {
bb="-5.478,-5.027,4.025,5.389"
"baseline lymphocyte count" [pos="-2.421,1.130"]
"bowel dose" [exposure,pos="-2.169,-0.350"]
"cisplatin dose" [pos="-3.144,2.716"]
"lymphocyte count" [outcome,pos="0.902,-0.947"]
"marrow dose" [adjusted,pos="-1.473,-1.961"]
"menopausal status" [pos="-4.914,-0.183"]
age [pos="-5.027,1.332"]
height [adjusted,pos="-3.502,-0.612"]
ptv [adjusted,pos="-2.328,-3.595"]
weight [adjusted,pos="-4.762,-1.913"]
"baseline lymphocyte count" -> "lymphocyte count"
"bowel dose" -> "lymphocyte count"
"cisplatin dose" -> "lymphocyte count"
"marrow dose" -> "bowel dose"
"marrow dose" -> "lymphocyte count"
"menopausal status" -> "baseline lymphocyte count"
"menopausal status" -> weight
age -> "baseline lymphocyte count"
age -> "menopausal status"
height -> "baseline lymphocyte count"
height -> "bowel dose"
height -> "cisplatin dose"
height -> "marrow dose"
height -> ptv
ptv -> "bowel dose"
ptv -> "lymphocyte count"
ptv -> "marrow dose"
weight -> "baseline lymphocyte count"
weight -> "bowel dose"
weight -> "cisplatin dose"
weight -> "marrow dose"
weight -> ptv
}
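
The independencies and the adjustment set implied by this DAG can be read off mechanically. Below is a minimal sketch in plain Python (no dagitty) that tests d-separation with the standard "moralised ancestral graph" procedure. The edge list is copied from the code above; the function name `d_separated` and the two example queries are illustrative choices, not part of the original analysis plan.

```python
from collections import defaultdict

# Edges copied from the dagitty code above.
EDGES = [
    ("baseline lymphocyte count", "lymphocyte count"),
    ("bowel dose", "lymphocyte count"),
    ("cisplatin dose", "lymphocyte count"),
    ("marrow dose", "bowel dose"),
    ("marrow dose", "lymphocyte count"),
    ("menopausal status", "baseline lymphocyte count"),
    ("menopausal status", "weight"),
    ("age", "baseline lymphocyte count"),
    ("age", "menopausal status"),
    ("height", "baseline lymphocyte count"),
    ("height", "bowel dose"),
    ("height", "cisplatin dose"),
    ("height", "marrow dose"),
    ("height", "ptv"),
    ("ptv", "bowel dose"),
    ("ptv", "lymphocyte count"),
    ("ptv", "marrow dose"),
    ("weight", "baseline lymphocyte count"),
    ("weight", "bowel dose"),
    ("weight", "cisplatin dose"),
    ("weight", "marrow dose"),
    ("weight", "ptv"),
]


def d_separated(edges, x, y, given):
    """Return True if x and y are d-separated given the set `given`."""
    parents = defaultdict(set)
    for a, b in edges:
        parents[b].add(a)
    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])
    # 2. Moralise: connect each node to its parents, and co-parents to each other.
    nbrs = defaultdict(set)
    for child in keep:
        ps = list(parents[child])
        for i, p in enumerate(ps):
            nbrs[child].add(p)
            nbrs[p].add(child)
            for q in ps[i + 1:]:
                nbrs[p].add(q)
                nbrs[q].add(p)
    # 3. Drop the conditioning nodes and check whether y is reachable from x.
    blocked, seen, stack = set(given), set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False
        if node in seen or node in blocked:
            continue
        seen.add(node)
        stack.extend(nbrs[node])
    return True


# Back-door check: remove edges leaving the exposure, then test separation.
BACKDOOR_EDGES = [(a, b) for a, b in EDGES if a != "bowel dose"]
ADJUSTMENT_SET = {"marrow dose", "height", "ptv", "weight"}

if __name__ == "__main__":
    print(d_separated(BACKDOOR_EDGES, "bowel dose", "lymphocyte count", ADJUSTMENT_SET))
    print(d_separated(EDGES, "cisplatin dose", "bowel dose", {"height", "weight"}))
```

Under this graph, the set {marrow dose, height, ptv, weight} closes every back-door path from bowel dose to lymphocyte count, and cisplatin dose is independent of bowel dose given height and weight. The second statement is the kind of implied independence that could be checked against data, although such a check only tells you whether the data contradict the DAG, and with a small sample it will have limited power.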


Anyone considering using a DAG for complex relationships such as this, which include time-varying treatment-covariate feedback, should invest 4-5 weeks in Miguel Hernán's excellent (and free) edX course on DAGs.

https://www.edx.org/learn/data-analysis/harvard-university-causal-diagrams-draw-your-assumptions-before-your-conclusions

Thank you. Will certainly do this.

Based on the feedback received from @ehudk I have attempted to modify the DAG to explicitly show the relationship between the cisplatin dose received each week and the lymphocyte counts observed each week (i.e. cisplatin dose in the 1st cycle → causes hematological toxicity and reduces the lymphocyte count in the 1st week of treatment → if severe, this influences the dose of cisplatin administered at the 2nd cycle, and the reduced lymphocyte count in the 1st week itself influences the lymphocyte count in the 2nd week, and so on). The revised DAG is shown below.
image
DAG code is available here:

dag {
bb="-5.478,-5.027,4.025,5.389"
"Cisplatin Dose 2nd Cycle" [pos="0.888,2.982"]
"Lymphocyte Count Week 2" [outcome,pos="1.770,2.076"]
"baseline lymphocyte count" [pos="-2.348,0.574"]
"bowel dose" [exposure,pos="-1.380,-1.428"]
"cisplatin dose 1st Cycle" [pos="-3.144,2.716"]
"lymphocyte count Week 1" [outcome,pos="-0.597,2.040"]
"marrow dose" [adjusted,pos="0.769,-3.978"]
"menopausal status" [pos="-4.914,-0.183"]
age [pos="-5.027,1.332"]
height [adjusted,pos="-3.223,1.254"]
ptv [adjusted,pos="-1.665,-4.145"]
weight [adjusted,pos="-4.357,-2.679"]
"Cisplatin Dose 2nd Cycle" -> "Lymphocyte Count Week 2"
"baseline lymphocyte count" -> "lymphocyte count Week 1"
"bowel dose" -> "Lymphocyte Count Week 2"
"bowel dose" -> "lymphocyte count Week 1"
"cisplatin dose 1st Cycle" -> "lymphocyte count Week 1"
"lymphocyte count Week 1" -> "Cisplatin Dose 2nd Cycle"
"lymphocyte count Week 1" -> "Lymphocyte Count Week 2"
"marrow dose" -> "Lymphocyte Count Week 2"
"marrow dose" -> "bowel dose"
"marrow dose" -> "lymphocyte count Week 1"
"menopausal status" -> "baseline lymphocyte count"
"menopausal status" -> weight
age -> "baseline lymphocyte count"
age -> "menopausal status"
height -> "Cisplatin Dose 2nd Cycle"
height -> "Lymphocyte Count Week 2"
height -> "baseline lymphocyte count"
height -> "bowel dose"
height -> "cisplatin dose 1st Cycle"
height -> "marrow dose"
height -> ptv
ptv -> "Lymphocyte Count Week 2"
ptv -> "bowel dose"
ptv -> "lymphocyte count Week 1"
ptv -> "marrow dose"
weight -> "Cisplatin Dose 2nd Cycle"
weight -> "Lymphocyte Count Week 2"
weight -> "baseline lymphocyte count"
weight -> "bowel dose"
weight -> "cisplatin dose 1st Cycle"
weight -> "marrow dose"
weight -> ptv
}
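
To make the week-to-week structure of this revised DAG concrete, the sketch below enumerates every directed path from bowel dose to the week 2 lymphocyte count. Only the edges among the descendants of bowel dose are copied from the code above, since no other edge can lie on such a path. The helper name `directed_paths` is just for illustration.

```python
# Subset of edges from the revised dagitty code: bowel dose and its descendants.
EDGES = [
    ("bowel dose", "lymphocyte count Week 1"),
    ("bowel dose", "Lymphocyte Count Week 2"),
    ("lymphocyte count Week 1", "Cisplatin Dose 2nd Cycle"),
    ("lymphocyte count Week 1", "Lymphocyte Count Week 2"),
    ("Cisplatin Dose 2nd Cycle", "Lymphocyte Count Week 2"),
]


def directed_paths(edges, start, end):
    """List all directed paths from start to end in an acyclic edge list."""
    children = {}
    for a, b in edges:
        children.setdefault(a, []).append(b)
    paths = []

    def walk(node, path):
        if node == end:
            paths.append(path)
            return
        for nxt in children.get(node, []):
            walk(nxt, path + [nxt])

    walk(start, [start])
    return paths


if __name__ == "__main__":
    for path in directed_paths(EDGES, "bowel dose", "Lymphocyte Count Week 2"):
        print(" -> ".join(path))
```

Two of the three paths pass through the week 1 lymphocyte count, one of them via the second cisplatin cycle. If the DAG is right, a model for week 2 that conditions on the week 1 count or on the second-cycle cisplatin dose blocks those paths, so the bowel dose coefficient would then describe the remaining direct effect for that week rather than the total effect.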

Additionally, to clarify the point about bowel dose: in radiotherapy we generally make a plan at the start of treatment, which provides us with an estimate of the cumulative dose that the bowel will receive over 5 weeks of therapy. In practice the patient receives radiotherapy 5 days a week, so technically the dose received each day influences the lymphocyte count. However, we do not change our radiotherapy treatment plan based on the lymphocyte count, hence I have retained it as a "single point exposure".

Thanks for clarifying @S_Chakraborty.
As I said over BlueSky, this seems to be a covariate-outcome feedback loop with a point exposure (or a constant sustained exposure, but not dynamic). This somewhat simplifies things as you don't need g-formula methods to account for treatment-confounder loops.

A second thing to note is that although cisplatin dose might not be required for identification of the bowel dose - lymphocyte count relationship, it should be predictive of the outcome and will therefore explain away some of its variance and make the bowel dose estimate more precise [see Model 8].

In this case, I think your proposed GLMM is quite reasonable if I understand it correctly. You can structure your data in a person-time format, which allows you to incorporate the changes in cisplatin, weight, and "baseline" lymphocyte count* that occur every week.
*(i.e. the lymphocyte count at week T-1, but you can also have a constant column of lymphocyte count at actual baseline if you think it's not fully Markovian and there's residual information there).

I would then fit a GLMM in the form of
lymphocyte.count.t ~ age + menopausal.status + weight.t + height + ptv + marrow.dose + cisplatin.dose.t + lymphocytes.tminus1 + bowel.dose + time + (1|patient.id).
With several comments:

  1. slap splines everywhere, especially important for time to allow non-linear time progression, but I would even consider doing everything with GAMs.
  2. I think there's significant benefit in including the interaction of bowel.dose:time to allow the effect to vary with time.
  3. the *.t are time-varying covariates.
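
The person-time layout behind this formula can be sketched as follows: one row per patient per week, with the time-varying columns (`*_t`), the lagged outcome (`lymphocytes_tminus1`), and the time-fixed exposure repeated on every row. The input structure, column names, and numbers below are all made up for illustration and are not study data.

```python
# Toy, made-up values purely to show the layout (not study data).
PATIENTS = {
    "P1": {
        "baseline_lymphocytes": 1.8,
        "bowel_dose": 25.0,
        "visits": [
            {"week": 1, "lymphocytes": 1.1, "cisplatin_dose": 60.0, "weight": 58.0},
            {"week": 2, "lymphocytes": 0.7, "cisplatin_dose": 60.0, "weight": 57.5},
            {"week": 3, "lymphocytes": 0.5, "cisplatin_dose": 0.0, "weight": 57.0},
        ],
    },
    "P2": {
        "baseline_lymphocytes": 2.2,
        "bowel_dose": 18.0,
        "visits": [
            {"week": 1, "lymphocytes": 1.6, "cisplatin_dose": 70.0, "weight": 66.0},
            {"week": 2, "lymphocytes": 1.2, "cisplatin_dose": 70.0, "weight": 66.0},
        ],
    },
}


def to_person_time(patients):
    """Flatten per-patient records into one row per patient-week."""
    rows = []
    for patient_id, record in patients.items():
        # For the first week, the "previous" count is the true baseline.
        previous = record["baseline_lymphocytes"]
        for visit in sorted(record["visits"], key=lambda v: v["week"]):
            rows.append({
                "patient_id": patient_id,
                "time": visit["week"],
                "lymphocyte_count_t": visit["lymphocytes"],
                "lymphocytes_tminus1": previous,
                "baseline_lymphocytes": record["baseline_lymphocytes"],
                "cisplatin_dose_t": visit["cisplatin_dose"],
                "weight_t": visit["weight"],
                "bowel_dose": record["bowel_dose"],  # point exposure, constant
            })
            previous = visit["lymphocytes"]
    return rows


if __name__ == "__main__":
    for row in to_person_time(PATIENTS):
        print(row)
```

Keeping both `lymphocytes_tminus1` and a constant `baseline_lymphocytes` column leaves the option open of including either or both in the model, as mentioned in the footnote above.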

I don't know how lymphocyte counts are distributed, so I can't weigh in on what family/link function to use. Probably identity, but maybe gamma regression or negative binomial if they are actual counts (positive, possibly right-skewed).

I'm ~70% confident in this answer.

This seems to be a very well thought-out question and answer. Forgive me for asking this question: the model you proposed is the one I would propose without knowing anything about DAGs. Does the DAG just help us understand our assumptions better, or did it really fundamentally guide the model specification?

I can't speak for Santam, but generally in my work, laying out the structure of the problem in a DAG is often the most convenient way to bridge the language barrier between clinicians and statisticians.

Laying out the structure can tell you how hard it would be to answer your question: Can it be answered with a regression on existing data? Do you need some more sophisticated modeling approach? Do you even have the required data? Is it just impossible?
I think this is what you would call "understand the assumptions better". It is especially important in those non-experimental settings where you hope to achieve unconfoundedness without randomization.

And, at the very end, the DAG tells you what input variables are required to go into your estimator.
Which is what I think you would call "model specification".
Again, I think DAGs are helpful here: the graphical back-door criterion makes identification of exchangeability easier, in my opinion. They also help with more efficient model specification: points like my comment above about the benefits of including cisplatin dose become very clear with a DAG (say, for me, a fellow statistician on the internet with zero knowledge of lymphopenia and radiotherapy; again, bridging domain experts with data experts).

Thank you for the detailed response. I agree, drawing the DAG did help a lot in clarifying the ideas. The lymphocyte count at T-1 is available, and that seems like a very elegant solution to the problem. The dataset is going to be formatted in a person-time format (I will keep the dates so that we can have the most flexible analysis) and will include time as days after the start of radiotherapy. I will certainly include splines on all quantitative variables as well as the interaction term.
I have a follow-up question on the *.t time-varying covariates: in my dataset I will have the date when the blood test was done and the date when the chemotherapy was administered. I am planning to convert the dates into days after the start of radiotherapy (partly because I want to make this dataset open, and this helps with the issues related to confidentiality). Will this be correct?
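
A minimal sketch of that conversion, using only the Python standard library, is shown below. The field names and dates are hypothetical. The idea is simply to subtract each patient's radiotherapy start date from every event date and then drop the raw dates from the shared file.

```python
from datetime import date


def days_since_start(event_date: date, rt_start: date) -> int:
    """Days from the start of radiotherapy (negative for pre-treatment events)."""
    return (event_date - rt_start).days


def deidentify_events(events, rt_start):
    """Replace each event's calendar date with a relative day offset."""
    return [
        {"event": e["event"], "day": days_since_start(e["date"], rt_start)}
        for e in events
    ]


if __name__ == "__main__":
    # Hypothetical dates for one patient, not study data.
    rt_start = date(2024, 1, 1)
    events = [
        {"event": "baseline blood test", "date": date(2023, 12, 28)},
        {"event": "cisplatin cycle 1", "date": date(2024, 1, 2)},
        {"event": "blood test week 1", "date": date(2024, 1, 8)},
    ]
    for row in deidentify_events(events, rt_start):
        print(row)
```

With day 0 defined as the first radiotherapy fraction, the baseline blood test gets a negative day and each cisplatin cycle and blood test can be ordered against one another on the same scale. Relative days remove the calendar dates themselves, though whether that alone is sufficient for open release depends on the local data-sharing rules.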

I agree with this. Learning about DAGs has certainly helped me clarify my assumptions and think more deeply about what I am doing with the variables. I am now trying to have DAGs for all the studies that I propose (observational ones especially).

I am still unclear, though, about the process for checking the conditional independence assumptions, and about how to interpret these checks (especially in situations where the observational studies may have a relatively small sample size). @ehudk, what are your thoughts on that?

I'm not clear about what happens when exactly in your setting. But I think your plan is correct - if bowel-dose is the main variable of interest, then I would set it as time zero and align everything else according to it.

Thanks. As bowel dose is the main variable of interest and is also available in the treatment plan before the treatment starts I will align it with the start of treatment and days for the rest of the treatment will be counted according to it.

What fraction of the body's lymphocytes will pass through the radiation field while it is 'on'? What will be the distribution of radiation doses absorbed by this fraction of lymphocytes? Does the radiation dose induce lymphocytopenia primarily by direct killing of the lymphocytes? (Or are there more subtle interactions, e.g. with irradiated tissues that the lymphocytes encounter as they circulate?)

Thanks @davidcnorrismd, these are all excellent questions, and for some of them we do not have very reliable answers. Regarding the fraction of the body's lymphocytes irradiated in a single fraction, we know from a modelling study that in a single fraction of radiotherapy to the brain for a malignant glioma about 5% of the total circulating cells receive > 0.5 Gy of irradiation, and after 30 fractions of treatment nearly the entire circulating cell compartment has been irradiated (see:
Yovino S, Kleinberg L, Grossman SA, Narayanan M, Ford E. The etiology of treatment-related lymphopenia in patients with malignant gliomas: modeling radiation dose to circulating lymphocytes explains clinical observations and suggests methods of modifying the impact of radiation on immune cells. Cancer Invest 2013;31:140–4. https://doi.org/10.3109/07357907.2012.762780). Interestingly, while the brain gets 12% of the cardiac output, the small intestine gets 11% (ICRP 89)!

There are other modelling studies which also show very interesting results with respect to the kinetics of radiation dose delivery and the technique implemented. We do know that lymphocytopenia is primarily due to the direct killing of lymphocytes, and this is a phenomenon observed during irradiation of thoracic tumors also. In the study we are planning, all patients have been treated with intensity modulated radiotherapy, a technique that "paints" a variable dose across the volume, and it would also be of interest to see how this spatial dose distribution affects the lymphocytopenia. There is a very interesting package which we found during our literature review (Hedos - https://github.com/MGHPhysicsResearch/hedos) which we will be using as a follow-up to this (but only if this basic clinical question is answered). Regarding the last question, which is of interest as we have some evidence that non-circulating lymphocytes may have a different pattern of radio-resistance: I do not think we will have any answer from this study directly.

I would consider using PyAgrum, which can model dynamic Bayesian networks and interventions thereof.

This method
https://ar-tiste.xyz/?page_id=613
generates the DAG automatically, without domain expert help, from temporal data. It might interest you at some point. It would involve a fresh new start, so I don't advise you to use it immediately. Finish your current method and then maybe you might be interested in considering this after that.

Lastly, you might find my free book helpful in some serendipitous way
https://ar-tiste.xyz/?page_id=459