Good afternoon all. I’m new here!! @f2harrell thanks for the suggestion to come on in!
We’ve had an unusual occurrence in the deployment of a survey and could use some advice.
Here are the details of what was intended:
The online survey will be programmed in REDCap and conducted among ~3200 participants using XXX, a national survey research aggregation firm. A selection of XXX’s panel members will be shown basic, non-content information about the survey (i.e., length of time to complete and a compensation amount). These members of XXX’s database browse for survey opportunities on their own and will self-select to learn more about this one. Those who self-select will be sent to REDCap, where they will view an Introduction page that describes the purpose of the study and the expectations of participants. Those who choose to continue will be redirected to the anonymous Demographic screening form; those who choose not to participate will be thanked for their time and will not continue to the screening form.
Participants will be randomized to one of four conditions defined by the wording used (community-informed versus traditional) and the method of presentation (video versus text). To minimize the potential for imbalance on key covariates, randomization will be stratified by the following (a toy sketch of the intended allocation follows the list):
● Health Literacy: High (Extremely/Quite a bit confident filling out medical forms by myself); Low (Somewhat/A little bit/Not at all confident filling out medical forms by myself)
● Educational Attainment: Less than High School, High School Diploma or Equivalent, Some College/Technical School or Greater
● Age: Less than 55, 55+
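For concreteness, here is a toy sketch of the intended allocation in Python. This is only my shorthand for the design, not the actual REDCap external module: the arm labels, stratum labels, and block size are illustrative assumptions.

```python
import itertools
import random

# Four arms: wording (community-informed vs. traditional) x presentation (video vs. text).
ARMS = [
    "community_video", "community_text",
    "traditional_video", "traditional_text",
]

# Stratification factors (2 x 3 x 2 = 12 strata).
HEALTH_LITERACY = ["high", "low"]
EDUCATION = ["lt_high_school", "hs_diploma_equiv", "some_college_plus"]
AGE = ["lt_55", "55_plus"]

def make_block(block_size=8):
    """One permuted block: each arm appears block_size / len(ARMS) times."""
    block = ARMS * (block_size // len(ARMS))
    random.shuffle(block)
    return block

# One allocation queue per stratum.
queues = {
    stratum: make_block()
    for stratum in itertools.product(HEALTH_LITERACY, EDUCATION, AGE)
}

def assign(health_literacy, education, age):
    """Draw the next arm for this participant's stratum, refilling the block as needed."""
    queue = queues[(health_literacy, education, age)]
    if not queue:
        queue.extend(make_block())
    return queue.pop()

print(assign("high", "some_college_plus", "55_plus"))
```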
What happened:
An unusual technical glitch allowed 328 participants to enter the survey without being randomized and without receiving any intervention. The problem was in the connection to the external module in REDCap that controls the randomization/stratification and the insertion into one of the 4 arms for intervention delivery. The “bucket” capturing higher educational attainment was full, and there was no stopgap in place to route those participants out. So they were skipped past randomization and pushed straight into answering the outcomes survey.
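Again for concreteness, here is a toy sketch of the failure mode as I understand it; the per-stratum cap, the function names, and the internals are my assumptions, not REDCap’s actual logic.

```python
import random

ARMS = ["community_video", "community_text", "traditional_video", "traditional_text"]
CAP = 100      # assumed per-stratum quota (the "bucket" size); the real number isn't the point here
enrolled = 0   # running count for the affected stratum

def assign_without_stopgap():
    """What seems to have happened: once the bucket is full, no arm is
    returned, and nothing stops the participant from continuing."""
    global enrolled
    if enrolled >= CAP:
        return None  # participant "passes through" unrandomized
    enrolled += 1
    return random.choice(ARMS)

def assign_with_stopgap():
    """What a stopgap would have done: explicitly route the participant out
    rather than let them reach the outcomes survey without an assignment."""
    global enrolled
    if enrolled >= CAP:
        raise RuntimeError("Stratum full: send participant to a 'survey closed' page")
    enrolled += 1
    return random.choice(ARMS)
```

The point is only the missing guard: the module returned “no arm” for a full bucket, and the survey flow treated that the same as a successful assignment.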
Now, the question here is what to do.
- If the participants had been randomized but then just happened to receive no intervention, we could proceed via ITT. But there was no randomization.
- Normally I would be jumping up and down to say we don’t just “remove” or “replace” them, because that would be more post-randomization exclusion. But that’s not the case here, because they were never randomized.
I’ve tried to think through a few options:
- Redeploying those slots in the survey to capture an additional 328 participants. If we do this, should enrollment be general and open to all comers, or should we aim to capture 328 additional participants with the higher level of educational attainment, since the pass-through differentially hit that subgroup over time (it wasn’t that 328 participants in a row happened to fall in that category; the effect was cumulative)? Note that we are not trying to achieve a representative sample on education. In fact, we are oversampling for low education, so targeting 328 with higher education may seem counterproductive, but there could be a data-integrity or statistical-impact element to consider.
- Do we randomize post hoc and then proceed via ITT (even though we know no intervention was delivered)?
- Do we take the loss against the planned sample size because of the unforeseen complication?
Appreciate any thoughts or ideas!!