Health datasets - what do we want to measure?

Hello all,

We are in the midst of our dataset selection, and we are considering adding Health Performance data. The problem is that health performance has so many datasets in so many formats. There is a lot of variety, and we want to narrow it down a bit so that we have a better definition of this dataset.

We see two different stories of this data -

a. Measure the state of health in a country - this will probably include parameters like common diseases, number of doctors per capita, waiting times for specialists, etc.
b. Parameters that can help citizens in real-time decision making - this will probably include number of available beds in ERs, locations and services of different clinics, etc.

Can you help us to understand which health data we want to measure? What would you like us to measure? What do you think is more likely to be measured on a global level?

Real-time decision making? Deciding which ER to go to based on which is the least busy at this moment seems a bit ambitious for now. Yet equally, some national-level public health performance statistics may not be very useful either.

In education the sweet spot is often school-level data: that is useful for overall education outcome work (particularly as school-level data is sufficiently granular to relate to other indicators such as family income), for public service management and accountability and for individual choice/consumer pressure by parents.

So in health the sweet spot may be data at the level of the individual hospital/clinic - again useful for overall health outcome work as it can be related to other indicators, useful for management and for civic engagement, and valuable for individual decisions. The data needs to cover location, facilities, resources and performance.

Finely-grained public health data (for example diseases by district) may also be useful, but it is a different dataset. District level health service performance does not allow an individual to judge which clinic to go to.

Hear, hear.

The FSA lets me know which restaurants are safe; it is only logical that the NHS do the same with hospitals.
This is available as open data in some countries. See for instance and the associated dataset here:
Data about the prevalence and/or incidence of hospital-acquired infection (HAI) might be especially interesting to have.

I think you could think of two main categories of data (they somewhat reflect what you mean by “measuring health” and “helping with decision making”).

I would think about “disease rate/incidence” (probably there’s a better name for that), and “performance indicators” that would measure the service side of things.

In that way, you can clearly separate statistics on national health (what you mention in point A) from everything else that has to do with the way people access health services (related to point B, but I agree with @dirdigeng that taking out your smartphone after you’ve been hit by a car is probably ambitious and maybe a bad idea ;)). So, on one hand you have actual “health” data and on the other hand performance data from providers.

I have mixed feelings about which datasets should be included in each category. Diseases are somewhat listed by organizations such as WHO and we could take that, but then, could you blame a country that hasn’t had a case of malaria in 20 years for not publishing a dataset on that disease?

Performance indicators are even harder. Once again, there are a few that seem obvious and universal (e.g. mortality), but just through our experience (DATA Uruguay) of launching in Mexico with Codeando MX, we’ve realized there are HUGE differences in the way health systems work, and we’re having a tough time trying to arrive at a single data standard.

@danielcarranza - can you elaborate more on the differences in performance data between Mexico City and Uruguay? Which datasets were almost the same? Which ones were difficult?

@dirdigeng - From your experience in other countries, is this performance data really available? How do we define performance? Internal reports by the ministry of health? From your experience, can we find these reports in developing countries? If so, how often are they produced?

I do think that health is an important topic, and the beauty of the Index is that we can try something this year and iterate on it with changes next year. However, I feel like it shouldn’t be as wide and vague as it is in the Barometer’s definition. Who can we consult about this? WHO?

On Australia’s Regional Open Data Census we use the following definition for Health Performance data:

Statistics generated from administrative data that could be used to indicate performance of specific healthcare services, or the healthcare system as a whole (e.g. emergency care, patients treated, elective surgery, quality, safety, patient experience, dental care, mental health).

@mor A common pattern in small middle-income countries is that data comes from two sources. Mortality data comes through the Registrar of Births and Deaths. In addition each hospital makes a periodic return (often monthly) to the Ministry of Health. This covers the number of beds available, admissions and other basic throughput information. It often goes into great detail on diseases/ailments etc and to a lesser extent on throughput by speciality. Private hospitals tend only to be required to report diseases, although most Ministries of Health would like private hospitals to report on the same basis as public hospitals. Public health statistics are published in some detail, generally as PDFs of underlying spreadsheets, together with aggregated throughput statistics. Ministries of Health typically have a directory of health facilities, although it is sometimes not up to date. Information on “individual hospital budgets” is hard to find - often different inputs (staff, premises, medicines, equipment) are funded in different ways.

@Mor It has to do with very different systems. Here in UY you’ll get reports by health provider (which doesn’t even translate to “health insurance”, it’s way more complicated), and one provider might have its own hospital or use different hospitals, or even specialized clinics. My own health provider uses another’s hospital (the facilities) but their own doctors. So, mortality by hospital could be a pretty useless measure for me, as I could go to several places depending on availability, my health plan, my condition or even prices (for my provider). In Mexico, from what I could understand, what you have are actually several systems running at the same time (all different). Some of them are “location centered” but others aren’t. So, mortality by hospital again… mostly useless for me, might be useful in Mexico, depending on the system you belong to. Don’t get me wrong, all data is welcome and can be used. But for mortality, I’d care about mortality by health provider. Even by doctor would be much more useful than by hospital. Here a few doctors work in most places, so interesting data to gather would be whether their success rate is the same when operating in different hospitals or for different “clients” (providers).

I like @Stephen’s definition. It’s broad enough to include the very different ways the systems work, but pretty clear in specifying that that data (whatever it is) should enable me (in my context) to measure and compare performance.

I agree with @dirdigeng that data at health services provider level is what should be assessed for the health sector, and this should include (geo)location, resources, facilities, and health indicators. I would avoid focusing primarily on performance data, as performance data only becomes relevant when you have core and granular data on the health system.

WHO’s glossary may help in refining the health dataset definition.

Also, I think that a good exercise for recognizing the key datasets within a sector is to think about datasets as “unlockers” of a system, i.e. a dataset that, once opened, can help to make connexions with other key data in other sectors, and can add great value to other data within the sector.

Many governments do publish health performance data at macro level or sub-level, because this is asked for by development partners, but sometimes without even knowing where their hospitals really are …


Guys, this discussion was really helpful, thank you!

It seems like there is still some basic data we need to get before we can go to a more complicated level (as @pzwsk says). I think this can be a really good exercise for us to see what data is available on a global level.

@mor Is it worth considering some of the suggested datasets as “candidate” datasets for the Open Data Index? - a bit like candidate sports for the Olympics? In this year’s exercise we could ask reporters to give details of what is available in their country and give a preliminary assessment against the standard criteria. We could then use that information to firm up the definition for making the dataset a full member of the Index next year (or drop it altogether if there is no clear picture of what data could be available or it depends too much on national structures).


It is a good suggestion, but I am a bit concerned it will confuse the submitters as well. We can ask submitters to add it to the comments, and analyse it there.

I am afraid we are too late in the game to make the changes to the platforms that would allow us to implement this suggestion…

Also, building on my other comment - this discussion made me realise that we are speaking about 3 different datasets.

  • Location and type of facilities (Geo location theme)
  • Resources (Fiscal theme)
  • Health indicators (Statistics theme)

All of them will probably be scattered across different places - service providers, insurance agencies, health ministry, etc. It will be impossible to add all of them to the index. It does sound, though, that locations and facilities is a basic dataset that needs to be collected (which reminds me of something nice that PAMI, a health provider, did in Argentina -

Resources, as @dirdigeng mentioned before, are hard to find, and reports and health indicators need specific characteristics that fit all countries, which I think will be hard to find (see our new datasets definition guide).

We should try and collect more data on what health data is out there in each country. I’ll mark it so we can explain it to submitters.


Why not both? We can measure both the qualitative and quantitative properties of health care this way…

My concern is that the index is crowdsourced and it will be difficult to look for more than one dataset. Moreover, I think we still haven’t figured out what the most crucial dataset to measure is.

From the three categories:

Health indicators are likely to be included in the national statistics, and health facilities answer to various levels of administration (local, regional, national, federal). So, if the aim is to add a new dataset to the global index, the resources category seems more promising.

Since most countries have comprehensive public health funds, which are often managed by institutions with a certain degree of independence, one option is to ask for the detailed budget of these funds.

Another interesting option is to ask for data that shows the results of the medicines included in publicly funded programs.

In Ghana, it would be great to have data generated by the Ghana Health Service to indicate the performance of the health service and also to track the occurrence of certain deadly illnesses/diseases.

I think the name of the dataset currently used on the GODI is misleading. “Performance” suggests either the amount of patient intake, recovery rate, or something relating to the “actual performance” of the institutions. The dataset description talks only about hospital locations and the disease rate in the country, and neither, IMO, is about the ‘performance’ of the health system.

I also think that location of the hospitals is a totally separate topic from the disease rate and could be published by two different entities in the government.

It might make sense to reconsider in the future the label of the dataset and/or split it into two.


In Australia’s Regional Open Data Census I have split location of government facilities (including hospitals) out from health performance.

@bluechi - yes, the name is far from perfect. I will look at this note before we publish the full index. As this thread mentioned, there are some issues with health data, so the more feedback the better.