Newsworthy item here, I think. The authors of this EPPC report veritably invite a replication at #7 in this FAQ; somebody ought to take up the gauntlet.
The authors of this report describe the source of data used in this analysis as follows:
“Our team has purchased access to a commercially available all-payer health insurance claims database including de-identified data for all U.S. patients during the years 2017 to 2023. It includes information on hospital and office visits, diagnoses, procedures, and prescriptions processed through private health insurance, Medicaid, Medicare, TRICARE, and the Department of Veterans Affairs (VA).”
I question the existence of a database that includes data for “all U.S. patients” during the years 2017 to 2023 and contains information on hospitalizations, office visits, diagnoses, procedures, and prescriptions for all of the listed insurers: private health insurance, Medicaid, Medicare, TRICARE, and the Department of Veterans Affairs (VA).
Interesting! (This is exactly the type of expertise I was hoping someone would offer.) Is there anywhere a compendium of such commercially-available databases, detailing their content and pricing?
ALL PAYER CLAIMS DATABASES
Overview
For at least the last decade and a half, attempts have been made to develop comprehensive databases of health information based on insurance claims. An all payer claims database like the one that these researchers state they used in their analysis is referred to by health services researchers as an APCD (All Payer Claims Database).
Existing APCDs are operated at the state level, usually enabled by legislation that mandates that insurers provide data, specifies requirements to assure protection of the privacy of patients whose data are included in the database, and shields entities that provide claims data from litigation related to provision of the data.
Even with enabling legislation at the state level, claims information from federal insurers such as the Veterans Health Administration (VHA), TRICARE, the Federal Employees Health Benefits Program (FEHBP), and the Indian Health Service (IHS) may not be included in the APCD. A 2016 ruling by the U.S. Supreme Court (Gobeille v. Liberty Mutual Insurance Company) held that plans regulated under the federal Employee Retirement Income Security Act (ERISA) could not be compelled by state governments to submit data to APCDs. State APCDs that continue to collect data from self-insured ERISA plans must rely on voluntary participation from employers and third-party administrators.
There is no federal legislation that mandates that insurers provide claims data to an APCD covering the entire United States.
Challenges of Creating an APCD and Limitations of Existing APCDs
A 2023 document from the Department of Health and Human Services (DHHS) about APCDs references three reports, all prepared by researchers at the RAND Corporation.
The first report (dated 2021) discusses the history of attempts to create APCDs and the associated challenges and limitations.
The second report (released in June 2022) provides additional detail on APCD data collection and access procedures. https://aspe.hhs.gov/sites/default/files/documents/96f34fd0474b3da4884836c4341f1bbe/Linking-State-Health-Care-Data.pdf
The third report discusses the particular challenges of using APCDs for multi-state studies.
Status of State APCDs in 2023
The APCD Council is an organization that has been working to facilitate the creation of APCDs for every state in the United States. A map posted on its website shows the status of efforts to create APCDs by state as of 2023.
The APCD Council described the status of APCDs in 2023 as follows:
Existing APCD—25 states
Strong interest—10 states
Voluntary APCD—4 states
No current activity—11 states.
The APCD Council website also provides links to detailed information about the status of development of APCDs for all 50 states and additional information about the APCD (or links to this information) for states that have an APCD.
COMMERCIAL DATABASES BASED ON CLAIMS
The referenced report states that the de-identified data used were purchased. A number of health datasets are available for purchase.
A 2023 Lancet article (NOT paywalled) describes (somewhat briefly) the 11 major commercial health datasets and vendors in the United States. Several use data only from paid claims. Several use data from claims and electronic health records (and sometimes other sources).
None of these commercial datasets contain data on claims from “all payers” for any period.
https://www.thelancet.com/pdfs/journals/landig/PIIS2589-7500(23)00025-0.pdf
SUGGESTION FOR REPLICATION
Dr. Norris has suggested that a replication of the analysis of complications of medication abortion might be useful. Because the authors of this analysis describe the source of their data as an “all payers claims database,” perhaps a replication in one or more of the 25 states that are described as having a functional APCD would be useful.
Thanks for the great resources. My take on claims databases is that they miss the mark on extent of disease, poverty, baseline physical function, and functional/quality of life outcomes.
And, maybe most importantly, time of disease “onset.” There will often be no relationship between the time of disease/symptom onset and when a particular diagnosis is first coded by a primary care physician. Patients often wait months to years before first reporting non-severe symptoms to their physician. It’s also common for patients to stockpile their complaints and then present with 5 or more in a single visit, meaning that only a single complaint or a subset of complaints might be coded. As a result, chicken/egg scenarios will often be unresolvable with this type of study.