Tools for exploratory development of databases?

davidcnorrismd · May 2, 2024, 9:45am

For data sets of any relational complexity, I have generally been in the habit of using a full-featured database engine (usually Postgres) to develop a highly normalized schema and impose lots of integrity constraints.

But for an exploratory analysis I’m undertaking now, I’ve found GNU recutils, plus Emacs rec-mode, to be just the right size. I think the issue is that I need to explore the available data and their relationships freely and organically on my way toward a focused data set for analysis. So I’m finding in recutils something akin to the freedom people often seek in scripting languages: relief from the rigorous plodding that strongly typed programming languages demand. Indeed, just as some scripting languages offer optional type checking, recutils has fairly rich facilities for imposing constraints on record sets; a single keypress in Emacs rec-mode runs an integrity check after editing.

Another benefit of recutils’ plain-text format, as noted here, is that the record sets can be kept under version control.

What software do you all use for work of this kind?

f2harrell · May 2, 2024, 12:17pm

David, does this exploration mask outcome variables to avoid downstream overfitting problems?

davidcnorrismd · May 2, 2024, 4:21pm

Good question! I’m perhaps using ‘exploratory’ in an overloaded sense that could be misleading on Datamethods. This particular investigation aims to find a group of orphan approvals in some fairly focused disease area and then to examine the character of the evidence put forward in support of each orphan indication.

So in this case, ‘exploration’ means something like ‘bushwhacking’. I won’t be claiming that the final results in any way constitute a representative sample of anything. At best, I am aiming for a comprehensive collection of fewer than a dozen orphan designations within some disease type that I will choose (and narrow) for possibly idiosyncratic reasons.