Open data repositories

This is a list of open dataset repositories related to health, medicine and epidemiology.

Some of the listed resources require registration or access approval (DiA = N), some may be used for education but not for research (Edu = Y, Res = N), and some are not free of charge (Free = N). Verify and respect the restrictions of each dataset before using it.
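To make the flag notation concrete, here is a minimal sketch of how the four Y/N flags could be stored and filtered. The `Entry` class and `parse_flags` helper are hypothetical names introduced for illustration only; the three example rows are copied from the tables in this list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One row of the list, with its four Y/N flags stored as booleans."""
    name: str
    dia: bool   # direct access without registration
    res: bool   # usable for research
    edu: bool   # usable for education
    free: bool  # usable free of charge


def parse_flags(name: str, flags: str) -> Entry:
    """Build an Entry from a flag string such as "N Y Y Y" (DiA Res Edu Free)."""
    values = [f == "Y" for f in flags.split()]
    if len(values) != 4:
        raise ValueError(f"expected 4 flags, got {flags!r}")
    return Entry(name, *values)


# Three rows copied from the tables in this list.
entries = [
    parse_flags("Vanderbilt Biostats data repository", "Y Y Y Y"),
    parse_flags("Sage research methods", "N N Y Y"),
    parse_flags("MIMIC dataset", "N Y Y N"),
]

# Entries that need no registration, allow research use, and are free.
open_for_research = [e.name for e in entries if e.dia and e.res and e.free]
print(open_for_research)
```

The flags only summarize the typical situation for each resource, so a filter like this is a first pass and not a substitute for reading the terms of each dataset.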

This list purposely excludes medical imaging data repositories. For imaging and other medical data for machine learning, see the list by Andrew Beam.

Contribute to the list

This list started with a tweet (which obviously misspelled "imaging").

Suggestions for new entries and reports of errors or broken links are appreciated. Please notify me by replying to the starting tweet or by email: M.van_Smeden@lumc.nl.

Finally: this list is for information purposes only and comes with no guarantees.

Repositories of multiple datasets

Repository | DiA | Res | Edu | Free | Positive notes | Limitations | h/t | Last accessed
Vanderbilt Biostats data repository | Y | Y | Y | Y | Diverse useable datasets (50+) | Not all datasets directly linked to primary publication | Lucy | 12/07/2018
CDC NHANES datasets | Y | Y | Y | Y | Nutrition/health survey (since 1999), RNHANES R-package | Massive: easy to get lost; complete dataset might require merging of datasets | James | 12/07/2018
CDC NHIS datasets | Y | Y | Y | Y | Household interviews about health (care access) (since 1961) | Massive: easy to get lost; complete dataset might require merging of datasets | Catherine | 12/07/2018
CDC BRFSS | Y | Y | Y | Y | Annual survey data on health risk behaviors and chronic conditions (since 1984) | Massive: easy to get lost; complete dataset might require merging of datasets | Catherine | 12/07/2018
CDC Birth/Death datasets | Y | Y | Y | Y | Annual birth and death data (since 1968) | | Lisa | 13/07/2018
Minnesota Population Center | N | Y | Y | Y | Survey data sources on health/treatments/death etc. | Need to create a free account (5 min) with identifiable info | Nicole | 12/07/2018
SEER incidence database | N | Y | Y | Y | Longitudinal database with 10,000+ cancer patients | Need signed agreement that can take several days | Tim | 12/07/2018
Dryad data | Y | Y | Y | Y | Data of papers published in BMJ Open, PLOS Medicine etc. | Not all ‘data packages’ contain actual raw data | Paul | 12/07/2018
Biomedical data journal | Y | Y | Y | Y | Biomedical datasets accompanied by a citable manuscript | Stopped its activities back in 2016 | Paul | 12/07/2018
UKDataservice | N | Y | Y | Y | Major collection of surveys, UK census data and medical data | Access approval for non-UK-based users takes days | Michelle | 12/07/2018
Sage research methods | N | N | Y | Y | Small collection of medical datasets (6) | Need university login and can only be used as teaching materials | Chelsea | 12/07/2018
ClinEpiDB | Y | Y | Y | Y | Easy access to a couple of open datasets | Currently only 1 of these studies accessible | Brianna | 12/07/2018
ICPSR | N | Y | Y | Y | Large archive with social science and a couple of health datasets | Easy to get lost. Need free account (5 min) with identifiable info | Rohit | 12/07/2018
CloserUK | N | Y | Y | Y | Data of eight large (cohort) studies | Half of the studies not directly accessible for non-UK users | Mel | 12/07/2018
Figshare | N | Y | Y | Y | Contains data on several studies | Difficult to find raw data files | Maaike | 13/07/2018
DABS | Y | Y | Y | Y | Biomarker/diagnostic datasets (19) | | Noah | 13/07/2018
UNICEF MICS | N | Y | Y | Y | Multi-country well-being surveys of women/children | Access approval takes several days | Filipa | 13/07/2018
Cebu | Y | Y | Y | Y | Cebu Longitudinal Health and Nutrition Survey | Massive: easy to get lost; complete dataset might require merging of datasets | Darren | 13/07/2018
Klein & Moeschberger | Y | Y | Y | Y | Data from Survival Analysis book (1997) | Data stored in an R-package; might also be an advantage | Benjamin | 14/07/2018
Davison | Y | Y | Y | Y | Data from Statistical Models book (2003) | | Benjamin | 14/07/2018
Royston & Sauerbrei | Y | Y | Y | Y | Data from Multivariable Model-building book (2008) | | Tim | 14/07/2018
California Health Interview | N | Y | Y | Y | California Health Interview Survey | Requires registration. Have not yet succeeded in gaining access | Julia | 14/07/2018
UK Biobank | N | Y | Y | Y | UK Biobank with longitudinal data on 500,000 participants | Elaborate registration procedure | Tom | 14/07/2018
Note:
DiA: direct access without registration (Y: Yes, N: No); Res: (most) data can in principle be used for research and in scientific publications; Edu: (most) data can in principle be used for education; Free: (most) data can be used without financial or other compensation. Mistakes/misclassifications are possible.
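Several entries above warn that a complete dataset might require merging of datasets. The following is a minimal sketch of such a merge, assuming the component files share a participant identifier column. NHANES files use `SEQN` for this; other repositories may use a different key, so check their documentation. The function name `merge_on_key`, the columns `age` and `bmi`, and all row values are hypothetical and invented for illustration; they are not real survey data or real variable names.

```python
def merge_on_key(left, right, key="SEQN"):
    """Inner-join two lists of dict rows on a shared identifier column."""
    right_by_id = {row[key]: row for row in right}
    merged = []
    for row in left:
        match = right_by_id.get(row[key])
        if match is not None:
            # Combine the columns of both rows into one record.
            merged.append({**row, **match})
    return merged


# Toy rows, invented for illustration; not real survey data.
demographics = [
    {"SEQN": 1, "age": 34},
    {"SEQN": 2, "age": 61},
    {"SEQN": 3, "age": 47},
]
examination = [
    {"SEQN": 1, "bmi": 22.5},
    {"SEQN": 3, "bmi": 27.1},
]

combined = merge_on_key(demographics, examination)
print(combined)
```

This is an inner join: participants missing from either file are dropped, which is worth checking explicitly because not every participant takes part in every survey component.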

Single datasets

Dataset | DiA | Res | Edu | Free | Positive notes | Limitations | h/t | Last accessed
Acupuncture headache trial | Y | Y | Y | Y | Trial with 401 patients treated for chronic headache by acupuncture | | Graeme | 12/07/2018
Vital signs dataset | Y | Y | Y | Y | Vital signs data for 32 patients who underwent anesthesia | | Rmadillo | 12/07/2018
MIMIC dataset | N | Y | Y | N | Large dataset with 50,000+ hospital admissions | Mandatory online course to gain access that isn’t free for non-US researchers | Tim | 12/07/2018
PPMI dataset | N | Y | Y | Y | Detailed cohort with 500 Parkinson's patients | Access to data based on request (1 week). Annual update on analyses requested | Dr. MJ | 12/07/2018
LINCS L1000 | Y | Y | Y | Yn | Elaborate gene-expression data repository | | Carlos | 12/07/2018
Note:
DiA: direct access without registration (Y: Yes, N: No); Res: (most) data can in principle be used for research and in scientific publications; Edu: (most) data can in principle be used for education; Free: (most) data can be used without financial or other compensation. Mistakes/misclassifications are possible.