Research Guides: Research Data: Finding Statistics & Data Sets

Highlighted Tool: PolicyMap

Map and explore health data and more with PolicyMap.

PolicyMap is a browser-based mapping tool that provides access to a wealth of data concerning physical health, infant and maternal health, uninsured populations, various federal programs, and the location of health facilities such as hospitals and Federally Qualified Health Centers (FQHCs). In addition, the tool includes a broad array of data related to Social Determinants of Health (SDOH), such as demographics, income, healthy food access, the economy, housing, public transportation, and more.

Tips for Finding and Using Data

When searching for data, think about who would collect it. Clinicians? Government agencies?
Pay attention to the data sources used in books and articles from your literature review.
- Did the authors deposit their research data in a repository? (Learn more about data sharing and new funder mandates)
- Even if they don't share their research data, you may find "Data available upon request" statements that allow case-by-case use.
When you find a relevant data set, learn as much as you can about it before beginning any analyses.
- Always read the codebook, paying particular attention to data definitions!
- Read publications that used that data set and note how those researchers used the data.
The BERD clinic supports faculty researchers doing statistical analysis.
The Health Science Library can help you with data visualization and cleaning tools.
Cite the data set! Data is an increasingly important product of research and can contribute significantly to a researcher's scholarly reputation. APA 7 provides examples and guidance for citing datasets. Make sure to include the following elements:
- who created the dataset
- what the dataset is named
- what year the dataset was published or released
- what version of the dataset was used
- where the dataset is hosted
- what unique identifiers have been assigned to the dataset, such as a Digital Object Identifier (DOI) or Archival Resource Key (ARK)
- what date the dataset was accessed
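As a rough illustration of how the elements above fit together, the following Python sketch assembles them into a single citation string. The ordering only approximates a common APA-style pattern and is not an authoritative implementation of APA 7; the class name, function name, and all sample values (including the DOI) are hypothetical placeholders, not a real dataset. Check the APA manual or ask a librarian for the exact format your publication requires.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetCitation:
    """Holds the citation elements listed above (hypothetical helper)."""
    creator: str                      # who created the dataset
    title: str                        # what the dataset is named
    year: int                         # year published or released
    host: str                         # where the dataset is hosted
    version: Optional[str] = None     # which version was used
    identifier: Optional[str] = None  # DOI, ARK, or other unique identifier
    accessed: Optional[str] = None    # date the dataset was accessed


def format_citation(c: DatasetCitation) -> str:
    """Join the elements in an approximate APA-style order."""
    version = f" (Version {c.version})" if c.version else ""
    parts = [f"{c.creator}. ({c.year}). {c.title}{version} [Data set]. {c.host}."]
    if c.identifier:
        parts.append(c.identifier)
    if c.accessed:
        parts.append(f"(Accessed {c.accessed}.)")
    return " ".join(parts)


# Placeholder values for illustration only; this is not a real dataset.
example = DatasetCitation(
    creator="Example Research Group",
    title="Example Health Survey",
    year=2020,
    host="Example Repository",
    version="2",
    identifier="https://doi.org/10.0000/example",
    accessed="January 5, 2024",
)
print(format_citation(example))
```

Keeping the elements in a structured record like this, rather than as free text, makes it easier to notice when one of them (for example, the version) is missing before you cite.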

Data Sources

Local & National Health Data

Shelby County Health Department

Tennessee Department of Health
The Tennessee Department of Health (TDH) is responsible for collecting, analyzing, and sharing data to inform health policy, programming and planning. TDH provides health reports, fact sheets, and visualizations of data related to Tennesseans' health.

Centers for Disease Control and Prevention (CDC)

NIH Repositories
Explore NIH-supported repositories for submitted data sets, organized by domain or by NIH Institute, Center, or Office (ICO).

HHS (Health and Human Services) Surveys and Data Resources
The Guide to HHS Surveys and Data Resources is a compilation of information about all major data collection systems sponsored by the U.S. Department of Health and Human Services (HHS). These surveys and data collection systems allow us to monitor and track the health of the population and the functioning of the health care system.

Social Determinants of Health & GIS

PolicyMap
PolicyMap is a browser-based mapping tool that provides access to a wealth of data concerning physical health, infant and maternal health, uninsured populations, various federal programs, and the location of health facilities such as hospitals and FQHCs. In addition, the tool includes a broad array of data related to Social Determinants of Health (SDOH), such as demographics, income, healthy food access, the economy, housing, public transportation, and more. (View a recorded PolicyMap webinar.)

National Neighborhood Data Archive (NaNDA)
The National Neighborhood Data Archive (NaNDA) is a publicly available data archive containing measures of the physical, economic, demographic, and social environment at multiple spatial scales (e.g., census tract, ZIP code tabulation area, county). Each NaNDA dataset covers all or most of the nation (including both rural and urban areas) and represents a set of measures on a single topic of interest, including socioeconomic disadvantage, healthcare, housing, partisanship, and public transit, with temporal coverage dating back to 2000.

Medical Records & Marketing Data via UTHSC

Please contact the UT Health Science Center CBMI (Center for Biomedical Informatics) for assistance accessing and using the resources listed below.

CERNER Health Facts
Since 2000, the CERNER Health Facts® database has captured and stored de-identified, longitudinal electronic health record (EHR) patient data, aggregated and organized to facilitate analysis and reporting. It currently contains data on almost 50 million patients and almost 300 million encounters.

Research Enterprise Data Warehouse (rEDW)
The Center for Biomedical Informatics (CBMI) at UTHSC provides access to TriNetXLive, a cloud-based health research platform. The platform provides visual and tabular summaries of the research Enterprise Data Warehouse (rEDW), which contains standardized, aggregated pediatric and adult healthcare data from Methodist Le Bonheur Health System.

MarketScan
The Truven Health MarketScan® Research Databases capture person-specific clinical utilization, expenditures, and enrollment across inpatient, outpatient, prescription drug, and carve-out services. The data come from a selection of large employers, health plans, and government and public organizations. The MarketScan Research Databases link paid claims and encounter data to detailed patient information across sites and types of providers and over time.

Nielsen Data Set
The size, scope, breadth, and longitudinal time frame of the Nielsen data make them unique. They cover a wide range of products, categories, retail channels, stores, and geographic markets in the United States.

General Data Sources

Google Dataset Search
Using a simple keyword search, users can discover datasets hosted in thousands of repositories across the Web.

Inter-university Consortium for Political and Social Research (ICPSR)
ICPSR maintains a data archive of more than 250,000 files of research in the social and behavioral sciences. It hosts 21 specialized collections of data in education, aging, criminal justice, substance abuse, terrorism, and other fields.

Harvard Dataverse
The Harvard Dataverse Repository is a free data repository open to all researchers from any discipline, both inside and outside of the Harvard community, where you can share, archive, cite, access, and explore research data. Explore data by subject, including "Medicine, Health and Life Sciences."

OSF (Open Science Framework)
OSF is a free, open multi-use platform that can be used to work collaboratively on research projects and openly share research data.

Figshare
Figshare is a free, open data-sharing repository that can be searched or browsed by category. Many journal publishers partner with Figshare to share datasets associated with publications.