Devising a database

I'm new to datamethods and to statistical methods in general, and I have a question about the principles of data entry for a prospective observational study. The health care institution where I am doing my residency sees a relatively high flow of patients with psoriasis receiving biological therapy.

What I want to do is devise a protocol for a prospective observational study. At present there is no database for this cohort of patients. I'm struggling with several problems:

  1. Where do I start with the protocol? Could you recommend any literature on the subject? There seem to be so many guidelines, which is a little frustrating.
  2. What is the best approach at the data entry stage? I understand that rows are for observations and columns are for variables, continuous or categorical. But there are so many variables I'd like to include that the Excel table becomes hard to read, especially since all the variables will have to be appropriately coded.
  3. One of the reasons I'm taking on this task is my interest in adverse events in patients receiving various kinds of systemic therapy. In the future I hope to accumulate enough data for a survival analysis. Are there any guidelines on data entry for a survival analysis?

In general, a lot seems to get left out when I try to build a table for this task. For example, a patient can receive a certain kind of therapy for a definite period of time and then switch to another therapy for some particular reason; this sort of “change-of-state” information seems to be totally lost. In my search I have come across relational databases (queried with SQL), which are able to preserve such changes. Are they used in medical research databases?

Thank you all in advance for any help or literature you might recommend.

If you have a research library, you would want to see if the following texts by Paul Rosenbaum are available:

1. Observational Studies (second edition)
2. Design of Observational Studies (new edition published in 2020)

Consider the following: Registries for Evaluating Patient Outcomes: A User's Guide, 4th Edition (Effective Health Care Program)

If you must use Excel, Darren Dahly has a good video:


Your problem is better described as one of data modeling, rather than data entry. For purposes like yours, the relational model is the obvious first step. Not only is the relational model a powerful discipline (it includes nontrivial theorems), but it is also ubiquitous and supported by many robust RDBMS implementations including several open-source projects (MySQL, PostgreSQL). You might even do well to collaborate with someone at your institution who is already conversant in data modeling.
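
As a rough illustration of how a relational layout keeps the "change-of-state" information from the question, here is a minimal sketch using Python's built-in `sqlite3` module. The table names, column names, and the two example rows (`drug_A`, `drug_B`, the dates and the stop reason) are all made up for illustration; a real study would take them from the protocol and case report forms.

```python
import sqlite3

# In-memory database, just for the sketch; a real project would use a file
# or a server-based RDBMS.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default

conn.executescript("""
CREATE TABLE patient (
    patient_id INTEGER PRIMARY KEY,
    birth_year INTEGER,
    sex        TEXT CHECK (sex IN ('F', 'M'))
);

-- One row per period on a given therapy, so every switch is preserved.
CREATE TABLE therapy_episode (
    episode_id  INTEGER PRIMARY KEY,
    patient_id  INTEGER NOT NULL REFERENCES patient(patient_id),
    drug        TEXT NOT NULL,
    start_date  TEXT NOT NULL,  -- ISO 8601 date, e.g. 2021-01-10
    stop_date   TEXT,           -- NULL while the therapy is ongoing
    stop_reason TEXT            -- NULL while the therapy is ongoing
);
""")

# Hypothetical patient who switched therapy once.
conn.execute("INSERT INTO patient VALUES (1, 1980, 'F')")
conn.executemany(
    "INSERT INTO therapy_episode "
    "(patient_id, drug, start_date, stop_date, stop_reason) "
    "VALUES (?, ?, ?, ?, ?)",
    [
        (1, "drug_A", "2021-01-10", "2021-09-01", "loss of efficacy"),
        (1, "drug_B", "2021-09-15", None, None),
    ],
)

# The full treatment history of one patient, in chronological order.
history = conn.execute(
    "SELECT drug, start_date, stop_date, stop_reason "
    "FROM therapy_episode WHERE patient_id = ? ORDER BY start_date",
    (1,),
).fetchall()

for row in history:
    print(row)
```

The point of the second table is that a therapy switch becomes a new row rather than an overwritten cell, which is what a single wide spreadsheet tends to lose.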

Strongly consider REDCap.


I second this.
REDCap has powerful tools for exporting and cleaning the data.
Its branching logic is also superb.


Others have offered some resources, but I would further recommend seeking an experienced partner to guide you with this study. Learning from a book can be helpful, but there are nuances in study design and conduct that only an experienced partner can help you with, to anticipate issues and to avoid mistakes that can bite you later on.

It sounds like you are at or are affiliated with an academic institution. Many, if not most, will have a biostatistical consulting service for faculty and staff, that you may be able to leverage as a partner resource. Some will likely have a fee structure associated with the provision of services, and you may either need to seek a grant (which they can likely help with) or they may have their own funding resources available to help cover those costs for you.

There are a plethora of issues to consider:

  1. You will need to obtain institutional IRB approval for the study before proceeding, which will include reviewing and approving your protocol, case report forms and related documentation.

  2. As a prospective observational study, you will need to draft an informed consent document that the patients (or their legal representative) will need to sign before you can enroll them in the study and collect any data on them. The IRB will need to approve this, along with the other documentation before you can proceed.

  3. You mentioned needing a protocol, and an experienced partner can help guide you with the identification and expression of key focal points for the study, development of the protocol and relevant content including background, inclusion/exclusion criteria, patient pathway, data to be collected, regulatory content, and planned analyses.

  4. Similarly, for the case report forms, they can help guide you in how to structure them, how to collect certain data, and related design conceptual issues. With an observational study, what data are reasonable to collect given standard clinical practice, what might be available for all patients, what might only be available on a subset of patients? How are you going to deal with missing data? How much data are reasonable to collect, given the resources that you will have available and how will that enable you to meet your study goals? Might you want to use any validated patient reported outcomes instruments and if so, which ones would make sense in context and might there be licensing issues for some?

  5. REDCap has been mentioned as a sophisticated database platform that can make sense here, and I agree. Data modeling has been raised, and that is an important consideration in how to design the technical implementation of your case report forms within a database application. What data will only be collected once per patient? What data may be collected serially over time, or have multiple occurrences (e.g. lab values and adverse events), and so may require dedicated table structures for the efficiency of data collection and storage?

  6. What software application will you be using to analyze the data and what is your comfort level with it? How much programming may be required to manipulate and restructure your data into formats that are conducive to performing certain analyses, given the inputs required? An experienced partner can be helpful here, in collaboration with your clinical expertise, to aid in the selection of statistical methods, the interpretation of results, and what next steps may be apropos.

  7. Presuming that you may have publication plans, an experienced partner can help guide you with suitable analyses, table/figure generation, medical writing and the drafting and production of abstracts/posters/manuscripts, journal/meeting submissions, responding to peer review comments, and related tasks.

Collaboration with an experienced partner, or team, as may be apropos, is an important part of clinical research and I would advocate taking advantage of whatever human resources that you can avail yourself of at your institution, to help maximize your probability of conducting a successful study.
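
To make points 5 and 6 a little more concrete for the survival analysis mentioned in the original question, here is a minimal Python sketch of one common layout for time-to-event data: one row per patient with a follow-up duration and an event indicator (1 = event observed, 0 = censored). The field names, the helper `time_to_event`, and the two example records are hypothetical, and the rule used for censoring is an assumption that your protocol would have to define properly.

```python
from datetime import date


def time_to_event(start, event, last_followup):
    """Return (days_of_followup, event_flag) for one patient.

    event_flag is 1 if an event date is recorded on or before the last
    follow-up, and 0 if the patient is censored at the last follow-up.
    """
    if event is not None and event <= last_followup:
        return (event - start).days, 1
    return (last_followup - start).days, 0


# Hypothetical raw records: dates are stored as entered, nothing is derived yet.
episodes = [
    {"patient_id": 1, "drug": "drug_A", "start": date(2021, 1, 10),
     "first_ae": date(2021, 3, 11), "last_followup": date(2021, 9, 1)},
    {"patient_id": 2, "drug": "drug_A", "start": date(2021, 2, 1),
     "first_ae": None, "last_followup": date(2021, 12, 1)},
]

# Derived analysis table: (patient_id, drug, days, event_flag).
rows = []
for ep in episodes:
    days, flag = time_to_event(ep["start"], ep["first_ae"], ep["last_followup"])
    rows.append((ep["patient_id"], ep["drug"], days, flag))

for row in rows:
    print(row)
```

The design choice worth noting is that only the raw dates are entered by hand; durations and event flags are computed afterwards, so a change in the censoring rule does not require re-entering any data.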