Trustworthy Data in an Authority Crisis
It's still possible to build trustworthy institutions to deliver quality data in an authority crisis, but they don't look anything like the institutions we currently have.
If you’ve followed me for any length of time, you may be surprised that I like data.
There are so many reasons I like data, but a big part of it is that data makes me feel like I’m reaching down to the bones of a question. Well-captured data brings the complexities of the world into view. It helps us make sense of things we might only know through intuition. It’s the first step of my rough vision of how we create institutional scientific knowledge.
Today, we’ll talk about just that first part of this diagram. Data gathering seems like the simplest part of the process, but it is more complex than most people realize.
Data gathering is a rich technical topic. What most people see as a straightforward reporting of facts is actually the product of a complex life cycle of information gathering, transfer, and statistical analysis. I love this process; I find it endlessly fascinating and important.
This is why I joined the COVID Tracking Project back when there were fewer than 1,000 confirmed cases.
The COVID Tracking Project was a remarkable achievement in citizen-led data gathering. Conceived by Alexis Madrigal and driven by hundreds of incredible technical volunteers, the CTP provided insight into COVID numbers for months while the CDC was scrambling to provide basic data on what was happening.
The CTP system worked like this: volunteers would sign up for a data input time slot. At the appointed time, we would check the state health departments for their latest COVID numbers and enter them into a spreadsheet. Each number would be verified by a secondary reviewer (who would ensure, for example, that 14,290 new cases weren’t entered when there were actually 1,429) and then reported by CTP as its “official” number.
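The reviewer's plausibility check can be sketched in a few lines. This is a hypothetical illustration, not CTP's actual tooling: it assumes the entry is a cumulative count and uses an invented threshold (`max_ratio`) to catch order-of-magnitude typos like the 14,290-for-1,429 example above.

```python
# Hypothetical sketch of a secondary reviewer's sanity check: flag an
# entered cumulative count that decreases or jumps by an implausible
# factor relative to the previous value (e.g. a dropped or added digit).

def flag_suspicious_entry(previous: int, entered: int, max_ratio: float = 5.0) -> bool:
    """Return True if the entered count looks like a data-entry error.

    `max_ratio` is an illustrative threshold, not a CTP parameter.
    """
    if entered < previous:  # cumulative counts should never decrease
        return True
    if previous > 0 and entered / previous > max_ratio:
        return True  # e.g. 14,290 entered when 1,429 was meant
    return False

# A typo that adds a digit gets flagged; a plausible increase does not.
print(flag_suspicious_entry(1300, 14290))  # True
print(flag_suspicious_entry(1300, 1429))   # False
```

In practice a human reviewer brings far more context than any threshold, which is part of why the double-entry step mattered.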
This process worked well, producing good, reliable, clean data that others could easily access. But it was the errors in this process that taught me the most about reliability and trust in data gathering.
There was a stretch in those early days when the Washington health department was trying to update its COVID dashboard. During this time, their updates simply stopped. It looked, from the outside, like the virus had stopped spreading in Washington. This was clearly not true; everyone knew that. My solution was to manually enter data from each county health department until the new dashboard was live. This was an enormous time-suck, but it provided continuity in the data that we felt was important at this critical time.
CTP’s work was interesting because we weren’t even involved in the core of the data gathering. The data was being gathered from thousands of hospitals by thousands of county health departments and reported by state health departments. Many of these health departments were working seven days a week to help the public know what was going on. Several of their employees were part of the CTP Slack group, which gave us incredible insight into the sausage-making of data gathering. When there were problems with the data, we would hear about them directly from the people collating it. It was wonderful to be in the middle of this hard work, to be a part of the scientific machine that brings insight about the world.
Why Did This Work?
As I reviewed my participation in this endeavor, I started asking myself, “How was the COVID Tracking Project able to establish trust and become an example of reliable data?” CTP set out a mission of data collection, and every step in that process was a girder reinforcing a stable platform worthy of trust.