Where to get your data (on)

After blogging for weeks about various trends in mortality, aid and disease risk, we received a simple but critical question in our email inbox this week: where do you get your data?

The message wasn’t from someone desperate to finish their term paper; it’s a valid question about how we make conclusions about what problems need attention, what the underlying causes might be, and what new issues are arising in public health practice.

In this week’s blog, we attempt to summarize some of the most useful (and occasionally off-the-wall) resources we use to gather and analyze data on public health, from a website that lets you directly compare the strength of evidence behind different health interventions to a map that compares the number of liquor stores to the number of healthy food outlets in your neighborhood.

Domestic Portal sites

One of the greatest challenges to finding and analyzing health data is that large, specialized databases designed for researchers are opaque at best, formatted for 1980s software, and rarely advertise themselves. But a series of newer websites offers the opportunity for even the most statistically-averse person to access and easily display critical health indicators.

Several of these web portals are based on efforts by the Kaiser Family Foundation to meaningfully translate data for the public. Among sites that offer domestic health data, http://www.statehealthfacts.org/ is perhaps the most accessible and comprehensive. During the recent healthcare reform debates, it was difficult to tell which (if any) “facts” being reported in editorials and Sunday morning talk shows were correct. On the state facts website, it’s easy to verify, for example, that Medicaid is actually more administratively efficient than private health insurance companies (the overhead costs are about 14% for private coverage as compared to 7% for Medicaid).

On the state facts website, it’s also possible to quickly make tables and maps comparing the 50 states on policy-relevant variables ranging from the number of tax credits used by private insurance companies to the percent of adults who had their teeth cleaned last year (see the map below; here’s to Connecticut for getting almost 80% of their residents’ teeth to look less British):

One of the disadvantages of the site, however, is that it only offers the most recent year’s data; so while it carefully catalogues its sources for you to trace, longitudinal comparisons can be difficult.

Another site that offers this kind of cross-sectional data is http://www.healthindicators.gov/, which is focused less on policy-related variables (like levels of funding for various programs, or insurance issues) and more on health statistics for specific diseases, the availability of health services by region, and the risk factors associated with disease. This site is most practical for the person trying to correlate data on health outcomes (such as maternal deaths), with healthcare service availability (such as the number of publicly-funded family planning clinics in an area), or with indicators of the social determinants of health (such as the rate of domestic violence in the area).
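For readers who want to try the kind of comparison described above, here is a minimal sketch of a Pearson correlation between a service-availability indicator and a health outcome. The numbers are made up purely for illustration (they are not taken from any of the sites mentioned), and the variable names are our own placeholders rather than field names from a real download.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up values for five regions (NOT real data)
clinics_per_100k = [1.0, 2.0, 3.0, 4.0, 5.0]
maternal_deaths_per_100k = [30.0, 26.0, 25.0, 19.0, 15.0]

r = pearson_r(clinics_per_100k, maternal_deaths_per_100k)
print(f"Pearson r = {r:.2f}")
```

A coefficient near -1 or +1 only says the two indicators move together across regions; it says nothing about which one (if either) is driving the other.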

What’s most useful about this website is that it’s linked to the hundreds of U.S. government surveys and statistical records that you’d never know existed. For instance, if you wanted to compare the number of liquor stores to the number of healthy food outlets in your neighborhood, would it be obvious that you’d have to look in the U.S. Census Bureau’s County Business Patterns subsection of the Population Estimates Database, using the edition formatted according to the North American Industry Classification System? It’s a bit easier to just click on “liquor store rate” to find out that Dorothy’s got a problem in Kansas:

A caveat to the Health Indicators Warehouse is that it doesn’t yet include every form of government-level health data you might need, and it’s focused on the state level. For more disaggregated county-level data, you’ll need to visit the Community Health Status Indicators website to find the data in table format and the County Health Rankings website to make maps of that data. The county-level data was recently used in a major new study to show that while people in Japan, Canada, and other nations are enjoying significant gains in life expectancy every year, most counties within the United States are falling behind.
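If you do end up with raw county-level counts rather than a ready-made “liquor store rate,” the arithmetic is simple enough to do yourself. Below is a rough sketch that turns establishment counts into rates per 10,000 residents and a liquor-to-healthy-food ratio. The counties, counts, and field names are all hypothetical; they are not the actual column names used by County Business Patterns or any other source.

```python
def rate_per(count, population, per=10_000):
    """Number of establishments per `per` residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count * per / population


# Hypothetical counties with made-up counts (NOT real data)
counties = [
    {"county": "County A", "population": 50_000,
     "liquor_stores": 20, "healthy_food_outlets": 10},
    {"county": "County B", "population": 200_000,
     "liquor_stores": 30, "healthy_food_outlets": 60},
]

summary = []
for c in counties:
    liquor_rate = rate_per(c["liquor_stores"], c["population"])
    food_rate = rate_per(c["healthy_food_outlets"], c["population"])
    summary.append({
        "county": c["county"],
        "liquor_rate": liquor_rate,
        "food_rate": food_rate,
        # A ratio above 1 means more liquor stores than healthy food outlets
        "liquor_to_food": liquor_rate / food_rate,
    })

for row in summary:
    print(row)
```

Using rates rather than raw counts matters here: County B has more liquor stores in absolute terms, but far fewer per resident.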

For data that’s more demographically detailed (e.g., for research on health disparities by race), you’ll have to visit the Healthy People website, which will also gradually incorporate longitudinal time-series data. Finally, for measures of healthcare system variables (such as emergency department visits), you’ll have to visit the CMS Community Utilization and Quality Indicators website. Note that you should try to avoid accessing the utilization data through the main Centers for Medicare & Medicaid Services website, because there’s practically no way to easily tabulate or map the data from the main CMS statistics site (of course, how intuitive!). A lot of other healthcare access and disparities data is available through the AHRQ.

If you’re lost, you can always go back to http://www.data.gov/health and look at all the options under “data/tools”.

Global Health Resources

How many of these resources are available to folks interested in global health? International statistics are notorious for their unreliability and inaccessibility, but there are a few new portal websites available to get easy-to-interpret data quickly, often in map or table format.

The old-guard of public health websites was the WHO’s statistical information system (WHOSIS). This recently underwent a major makeover to become more publicly accessible, and was rebranded the “Global Health Observatory” (GHO). The GHO offers three major categories of data: standard mortality statistics by disease, information on the prevalence of major risk factors (tobacco, alcohol, and even environmental risks), and some limited data on health systems (physicians per capita, etc.). A major disadvantage is that it’s inconsistent in how many years’ worth of data it displays (sometimes just the most recent year), and it vomits out the data in a format that’s really difficult to use in Excel or statistical programs unless you spend hours manually reformatting it. Much of the data on the diseases related to the Millennium Development Goals can be tabulated, exported and mapped more easily on the MDG Indicators website; the Hans-Rosling-style GapMinder charts on this site are also nice for web display, though you’ll have to put up with some bugs in the program:

For non-MDG data, many of us bypass the GHO altogether and get easily-analyzed WHO data from the global mortality database to find death rates by disease, the InfoBase for non-communicable disease indicators including risk factors like physical activity levels and tobacco, or the Global Health Atlas for communicable disease indicators like the number of malaria bed nets distributed in each country. These three databases make it much easier than the GHO to find longitudinal time-series data. If you need to make a quick table or graph of just core-level health and demographic indicators (basic indicators like crude mortality or access to water), then DOLPHN is much easier to manipulate than the WHO website, although some pre-prepared maps are available on the WHO Map Repository. If you still can’t find what you need, the WHO does have a guide to all of its indicators, although that website is frequently broken.
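Much of the “manual reformatting” pain with exported health data comes from tables that put each year in its own column, while most statistical programs want one row per country per year. The sketch below shows one way to reshape such a table using only Python’s standard library. The layout and values are assumptions made for illustration; we are not claiming this is the exact export format of the GHO or any other site.

```python
import csv
import io


def wide_to_long(rows, id_field, value_name="value"):
    """Reshape one-column-per-year records into one row per country-year."""
    long_rows = []
    for row in rows:
        for column, cell in row.items():
            if column == id_field:
                continue
            if cell is None or cell.strip() == "":
                continue  # skip years with no reported value
            long_rows.append({
                id_field: row[id_field],
                "year": int(column),
                value_name: float(cell),
            })
    return long_rows


# Hypothetical export with made-up numbers (NOT real data)
wide_csv = """country,2006,2007,2008
Country A,10.5,,9.8
Country B,20.0,19.5,19.1
"""

rows = list(csv.DictReader(io.StringIO(wide_csv)))
long_rows = wide_to_long(rows, id_field="country")

for record in long_rows:
    print(record)
```

Once the data is in this long format, it can be written back out with `csv.DictWriter` and opened directly in Excel or a statistics package for time-series work.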

Do you sense a headache coming on? Never fear. Based on the state health facts website, the Kaiser Family Foundation also built http://www.globalhealthfacts.org/. As with the state health data, this is mostly cross-sectional data from the latest available year. And while much of the data is redundant with the World Health Organization’s GHO, it’s much easier to access. Some of the most useful data here is on aid and public health infrastructure, which is difficult to find on the WHO or other international websites. Like most Kaiser Foundation sites, it’s also easy to interpret (for instance, look at the OECD DAC and CRS databases to see how messy disaggregated aid data looks); the Global Health Facts website nicely parses the data down to relevant variables without losing critical information. What’s more, it allows for side-by-side comparisons between major aid initiatives, such as PEPFAR, the Global Fund, USAID and the GHI. You can quickly map data, for example, showing which countries get USAID water and sanitation aid (and the map is accompanied, of course, by the detailed tabular data):

And then compare that information to data on the regions that actually need water and sanitation assistance (here’s the map of the percent of population with sustainable sanitation access):

Of course, the king of all datasets is the World Bank’s World Development Indicators database, which incorporates much of the WHO’s data, and makes longitudinal analysis easy. Many global health folks secretly admit that they go first to this website and check whether the data they’re looking for are on the Bank’s indicators page, because it’s so much easier to download to Excel and analyze the data from the Bank’s site than from nearly any global health database. The data are usually available from 1960 to the present, for every country in the world. For those who want data for a stats class or research paper, the data are all cleaned and sorted, so that you don’t have the messy task of relabeling Bosnia’s country name every third year in the dataset.
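For datasets that haven’t been cleaned this way, the country-relabeling chore can at least be automated. Here is a small sketch of the idea: map every spelling you encounter to one canonical name before merging or sorting. The alias table is our own hypothetical example, not a list taken from the World Bank or any other source, and a real project would need a much longer one.

```python
# Hypothetical alias table: lowercase variant -> canonical name
ALIASES = {
    "bosnia": "Bosnia and Herzegovina",
    "bosnia-herzegovina": "Bosnia and Herzegovina",
    "bosnia & herzegovina": "Bosnia and Herzegovina",
    "bosnia and herzegovina": "Bosnia and Herzegovina",
}


def canonical_country(name, aliases=ALIASES):
    """Return the canonical spelling of a country name, if we know one."""
    # Normalize case and collapse stray whitespace before the lookup
    key = " ".join(name.strip().lower().split())
    return aliases.get(key, name.strip())


# Made-up records showing the same country labeled three different ways
records = [
    {"country": "Bosnia", "year": 1996},
    {"country": "Bosnia-Herzegovina", "year": 1999},
    {"country": "  Bosnia &  Herzegovina ", "year": 2002},
    {"country": "Kenya", "year": 2002},
]

cleaned = [
    {**rec, "country": canonical_country(rec["country"])} for rec in records
]

for rec in cleaned:
    print(rec)
```

Names that aren’t in the alias table pass through unchanged, so it’s worth printing the distinct country names afterwards to spot any variants the table missed.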

Survey data

The big elephant in the room is how to access surveys, especially household surveys upon which most of these websites are based.

At a domestic level, the three big survey databases all require a good degree of statistical training to use. They’re the National Health and Nutrition Examination Survey (NHANES) and its related National Health Interview Survey (NHIS), which combines interviews with laboratory studies and physical examinations on representative populations in the United States to study everything from longitudinal kidney disease rates to hearing loss; the Healthcare Cost and Utilization Project (HCUP), which includes the largest collection of longitudinal hospital care data in the United States, with all-payer, encounter-level data beginning in 1988; and the Behavioral Risk Factor Surveillance System (BRFSS), the world’s largest ongoing telephone health survey, tracking health conditions and risk behaviors in the United States yearly since 1984. We wouldn’t recommend touching any of these unless you are dedicating yourself to a serious research project. What’s nice is that the NHANES folks have put up a tutorial for those who want to learn how to use their data, but it’s still at a professional-researcher level. It’s easier to access the BRFSS data, because of the availability of an interactive map-making website and online analysis tool. The HCUP is also easier to access with its online query site.

In global health, the Demographic and Health Surveys (DHS) are perhaps the most important survey data to mention. They’re a collection of more than 240 surveys in over 85 countries, collecting indicators of fertility, family planning, maternal and child health, gender, HIV/AIDS, malaria, and nutrition. While you can access all of the raw data for free, the easiest way to visualize and download it is to use the online portal http://www.statcompiler.com/, where you can pick the relevant indicators, countries and years.

The DHS is notably limited. There’s a lot of other useful household survey information out there, some of it not available in the World Bank database, ranging from the World Health Surveys of 2002 to 2004, which focused on health systems, to the Reproductive Health Surveys, which revealed critical information from the 1980s and 1990s, especially for Latin America and Eastern Europe. If you didn’t have a nerdy professor to tell you where to find these, how would you know about them? Thankfully, the folks at University of Washington’s Institute for Health Metrics and Evaluation have set up a Global Health Data Exchange (GHDx), which is in the process of incorporating hundreds of surveys and other databases into one common website. It carries raw data as well as nice interactive tables and charts for easy viewing and downloads.

Newer applications

As public data access and fancy online graphing software become a more popular way of increasing transparency, a couple folks stand out as having used data access for great new purposes.

The transparency award really must go to the British Guardian newspaper’s Datablog, probably the most effective use of (often health-related) data access we’ve ever seen. The Datablog tracks everything from gypsy caravan travel patterns to how the economic crisis has affected the poor in Eastern Europe, and it does so in a digestible one-page format with the data openly-accessible to anyone for reanalysis.

A new invention of the Kaiser Family Foundation is also innovative for making policy-relevant comparisons of how global health interventions stack up. Called the Global Health Interventions website, it tries to synthesize boat-loads of data about “what works” and make it accessible using simple rating bars for the strength of the evidence about the intervention and the likely impact of the intervention. For example, look at how we can compare isoniazid preventive therapy for tuberculosis prevention with the BCG vaccine:

And of course, here’s to Hans Rosling for making data exciting and accessible to folks whose eyes glaze over when the rest of us talk. Even though we’re not big fans of flashy TED presentations that sometimes go off the deep end, some of the GapMinder charts really provide insight:
