the task: analysing patient registry data (millions of patients)
the problem: the data are so large that the stats software complains of “insufficient” memory when attempting to fit ‘complex’ models (I'll leave out the details)
the solution: buy more RAM, but I'm already at 64 GB. Anything else?
some background info:
-there are plenty of tips online about making code more efficient, limiting CPU time, data storage, etc.
-this person reports the same problem: “PROC PHREG: with frailty (random effect) + counting process” [originally in French; ultimately the advice there too is to buy more RAM]
-a potential solution?: Biom J, Analyzing Large Datasets with Bootstrap Penalization [they claim to make their code available, but as usual the link is broken]
-maybe the best/simplest option: Analyzing Big Data in Psychology: A Split/Analyze/Meta-Analyze Approach (SAM); a rough sketch of the idea follows this list
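to make the SAM idea concrete, here is a minimal sketch of how I picture it, using Python's lifelines and a plain Cox model as a stand-in for the actual ‘complex’ (frailty) model; the file name, chunk size and the time/event column names are placeholders, not anything taken from the paper:

```python
# SAM-style sketch: split the registry into chunks that fit in memory,
# fit the model on each chunk (analyze), then pool the coefficients by
# fixed-effect inverse-variance weighting (meta-analyze).
# Assumes the file holds only numeric covariates plus "time" and "event".
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

chunk_coefs, chunk_vars = [], []

# read the (huge) file in manageable pieces instead of loading it all at once
for chunk in pd.read_csv("registry.csv", chunksize=200_000):
    cph = CoxPHFitter()
    # all columns other than time/event are treated as covariates
    cph.fit(chunk, duration_col="time", event_col="event")
    chunk_coefs.append(cph.params_)                 # log hazard ratios
    chunk_vars.append(cph.standard_errors_ ** 2)    # their variances

coefs = np.column_stack(chunk_coefs)   # rows = covariates, cols = sub-samples
vars_ = np.column_stack(chunk_vars)

# meta-analyze step: fixed-effect inverse-variance pooling per covariate
weights = 1.0 / vars_
pooled_coef = (coefs * weights).sum(axis=1) / weights.sum(axis=1)
pooled_se = np.sqrt(1.0 / weights.sum(axis=1))

print(pd.DataFrame({"coef": pooled_coef, "se": pooled_se,
                    "HR": np.exp(pooled_coef)},
                   index=chunk_coefs[0].index))
```

how the data are split (random sub-samples vs. by calendar year, region, etc.) presumably matters, but that is the basic split/analyze/meta-analyze loop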
has anyone implemented this before? I assume this is something statisticians run into all the time, unless we limit ourselves to only the crudest models, yet it isn't raised in discussions of registry-data issues such as Analysis, Interpretation, and Reporting of Registry Data To Evaluate Outcomes. I can't find any references to the SAM approach in the medical literature, and I would have to let my computer run for weeks
edit: I would also have to monitor progress and flag the sub-samples where the model fails to converge, which is a further complication, especially if the whole thing takes weeks to run
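for what it's worth, a sketch of how the loop above could flag non-converging sub-samples and keep going instead of dying partway through a weeks-long run (ConvergenceError is lifelines' exception for a failed fit; in SAS one would presumably inspect PROC PHREG's convergence status output instead, but I haven't set that up):

```python
# Sketch of per-chunk convergence flagging, same placeholder file/column
# names as above. Non-converging chunks are logged and skipped rather than
# crashing the whole run.
import logging
import pandas as pd
from lifelines import CoxPHFitter
from lifelines.exceptions import ConvergenceError

logging.basicConfig(filename="sam_progress.log", level=logging.INFO)

chunk_coefs, chunk_vars, failed_chunks = [], [], []

for i, chunk in enumerate(pd.read_csv("registry.csv", chunksize=200_000)):
    try:
        cph = CoxPHFitter()
        cph.fit(chunk, duration_col="time", event_col="event")
        chunk_coefs.append(cph.params_)
        chunk_vars.append(cph.standard_errors_ ** 2)
        logging.info("chunk %d: converged", i)
    except ConvergenceError as err:
        failed_chunks.append(i)   # flag this sub-sample for later inspection
        logging.warning("chunk %d: failed to converge (%s)", i, err)

logging.info("done: %d failures out of %d chunks",
             len(failed_chunks), len(failed_chunks) + len(chunk_coefs))
```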