The Fallacy of Red Covid
The New York Times stated that Covid has become a virus afflicting Republicans, but their methodology was deceptive through inexplicable omissions
Let’s talk about Covid counties, and red vs blue.
Since the early days of this pandemic, there has been a big push implying that red states are doing worse with Covid deaths than blue states. This is not really true. The differences have been regional and, to the extent that one side has seen more deaths than the other, it has simply been because the places within a given region tend to align in politically similar ways.
Even so, the theory that political alignment is driving death rates persists. Since September, the preferred method for demonstrating this has been to go county-by-county and look at the percentage of the Trump vote vs Biden votes within those counties. When we do this, we see what looks like a pretty stark difference between red America and blue America. I’ve noticed that there are some people who have come to the deeply mistaken conclusion that Covid is killing mainly Republicans.
This chart is a textbook example of how data is used to manipulate framing. According to the author, “In Trump and Biden counties, one candidate won at least 70 percent of the vote; in swing counties, both won at least 45 percent.”
Let’s start with this strangely arbitrary choice of which counties to include. First of all, the framing of this chart excludes over a hundred million people, nearly a third of the country. If you were in a county that voted for Biden with 54% of the vote and 44% going to Trump, your county doesn’t fit any of the metrics here and is simply left out.
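To make the exclusion concrete, here is a minimal Python sketch of the grouping rule as it is quoted above. The function name classify_county and the choice to pass vote shares as fractions are assumptions made for illustration. The article does not publish its code, so this is one reading of the quoted rule, not the New York Times's actual method.

```python
from typing import Optional


def classify_county(trump_share: float, biden_share: float) -> Optional[str]:
    """Apply the quoted rule: 70%+ for one candidate, or 45%+ for both.

    Returns "Trump", "Biden", "swing", or None when the county fits
    none of the three groups and would be left off the chart.
    (Hypothetical helper; names and signature are illustrative only.)
    """
    if trump_share >= 0.70:
        return "Trump"
    if biden_share >= 0.70:
        return "Biden"
    if trump_share >= 0.45 and biden_share >= 0.45:
        return "swing"
    return None


# The example from the text: Biden 54%, Trump 44%.
print(classify_county(trump_share=0.44, biden_share=0.54))  # -> None
```

Under this reading, any county where the winner took between roughly 55% and 70% falls through every branch, which is how so many people end up outside the chart.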
Next, let’s compare these county groups that are included. This chart implies that the major difference between these counties is their political alignment, but when we look at them in detail, we see there are a lot of other distinctions that aren’t brought to our attention.
A caveat here: The New York Times county data set (which I used when writing this) does not track Covid data for counties within New York City, which account for 15% of the population in the “Biden counties”. Sadly, David Leonhardt does not mention this in his article, so I am unsure whether his data set includes New York City. This is just one more frustratingly opaque aspect of this weird data set.
We can see that this chart claims to be about politics but, when we see the details, might be more about an urban-rural divide.
Or perhaps this is about age. The Trump counties have 30% more elderly per capita than the Biden counties, so we would naturally expect older counties to have higher rates of death.
Or perhaps this is about income. If we look at just the “Biden counties”, we find that there is a strong correlation between high-income counties and lower Covid rates.
With a little bit of context, we can see what is happening here: There are about two dozen high-population, high-income counties that vote heavily for Democrats. This group skews heavily to a few large metros, especially if we’re not including the counties of New York City. For a little bit of perspective of how regionally concentrated this group is, 20% of the “Biden counties” population lives in Los Angeles.
These high-population counties tend to have younger residents and a higher concentration of workers able to work remotely. When those residents catch Covid, they have quick access to medical care. The low-income counties that vote heavily for Democrats are rural and poor, and their residents are much more likely to die of Covid, not because of their political alignment (which is identical to that of the high-income counties) but because of structural disadvantages.
These structural disadvantages don’t go away due to voting patterns. They persist across all rural and poor communities regardless of what hole they punched in a voting booth on November 3rd, 2020.
Any investigation of this data outside of this simplistic frame presented to us by the New York Times shows this complex story. Collating the data into three coarse categories (and intentionally discarding all data that does not fit this arbitrary framing) doesn’t give us some great insight. On the contrary, it allows the author to make sweeping generalizations while blinding his audience to the details.
I cannot say how much I hate this framing, not because it is a manipulation of the data (though that component is certainly present in the omission of a myriad of essential details) but because it combines two intellectual practices I most loathe:
The practice of filtering all data patterns into “red vs blue” because that happens to be the data we have most readily available
The implication that COVID is a moral disease and the people who are dying are dying because they did the wrong things
This incessant red vs blue framing says, sometimes implicitly and sometimes explicitly, that it’s mainly Republicans dying from COVID and that’s their fault.
Predicting the Wrong Things
A huge frustration with this particular metric is that it demands a level of data management that raises the technical bar for anyone who wants to engage with the topic, and then pairs that high bar with a vast over-simplification along a single metric. Working with county data is not easy, and it is even harder to replicate the work that the New York Times and David Leonhardt have done here (especially when I’m not sure what their strategy is when it comes to including New York City counties).
I have a rickety facsimile of what I think the NYT has been using to generate their data and what I found is that combining counties using this metric obfuscates more than it reveals. Before the winter surge, the New York Times published its first piece on Red Covid. They claimed the gap between red and blue America was directly traceable to vaccine differences and confidently proclaimed “the gap will probably keep growing”.
But the summer surge was a regional surge that hit southern states. If you look at blue counties and red counties in those states (which the New York Times specifically did not do), you find that they were hit at similar rates. There is no clear partisan pattern to be seen.
As it turns out, the gap did not keep growing. When the winter surge hit every region of the United States, the gap between red and blue counties shrank. David Leonhardt explained this away:
The most likely explanation seems to be that the number of Trump voters vulnerable to severe illness — which was still very large earlier last year — has declined, because more of them have built up some immunity to Covid from a previous infection.
I find this explanation opportunistic. He’s looking for a reason to explain why he was wrong and this one seems to fit the bill. He has no data to support it other than the fact that he made a prediction based on his assumptions in September and got it terribly wrong. Rather than admit his prediction failed because his theory was probably wrong, he fishes around for a mentally satisfying explanation that allows him to keep his theory intact.
That is the entire problem with this kind of journalism. If we always find excuses to meet our biases, we’re not really doing helpful data work. We’re just pattern-matching past data to explanations that seem plausible. We’re just guessing in the dark.
Narratives Constrained By Knowledge
I wish I had the full and complete story of how to constrain Covid. That would be handy and probably very profitable. But the reality is that these monocausal explanations do more harm than good.
To demonstrate the absurdity of this data strategy, I decided to adopt the approach of the Red Covid pieces and split counties into three groups, but to define those groups by non-political metrics.
If we decided to make income inequality our focus, we could make a solid case that the poor counties have suffered Covid far worse than the rich ones. It’s right there in the chart, a clear and consistent separation between poor and rich counties. From this, I conclude that Covid outcomes are driven by wealth.
We know that age makes a difference, so let’s look at these counties grouped by how much of the population is over 65. If we wanted to focus on COVID’s risks to the elderly, we could set forth a narrative about how the real problem is our neglect of older populations.
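For anyone who wants to try this kind of regrouping, the following is a rough Python sketch of splitting counties into three groups by any single metric and computing a death rate for each group. The County fields, the equal-count split, and the six sample rows are all assumptions for illustration. The numbers are placeholders, not real county data, and the cutoffs behind the charts in this piece may have been chosen differently.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class County:
    name: str
    population: int
    deaths: int
    metric: float  # e.g. median income, or share of residents over 65


def split_into_three(counties: List[County]) -> Dict[str, List[County]]:
    """Sort by the metric and cut into low / middle / high groups.

    Assumption: groups hold near-equal numbers of counties.
    """
    ranked = sorted(counties, key=lambda c: c.metric)
    n = len(ranked)
    cut1, cut2 = n // 3, 2 * n // 3
    return {
        "low": ranked[:cut1],
        "middle": ranked[cut1:cut2],
        "high": ranked[cut2:],
    }


def deaths_per_100k(group: List[County]) -> float:
    """Pooled death rate for a group: total deaths over total population."""
    population = sum(c.population for c in group)
    if population == 0:
        return 0.0
    return 100_000 * sum(c.deaths for c in group) / population


# Placeholder rows only, invented to exercise the code.
SAMPLE = [
    County("A", 100_000, 300, 40_000),
    County("B", 200_000, 500, 45_000),
    County("C", 150_000, 300, 60_000),
    County("D", 250_000, 500, 65_000),
    County("E", 300_000, 300, 90_000),
    County("F", 500_000, 500, 95_000),
]

groups = split_into_three(SAMPLE)
for label, members in groups.items():
    print(label, round(deaths_per_100k(members), 1))
```

The point of the sketch is how little it takes: swap the metric column and the same three-line chart tells a completely different single-cause story.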
The reason for focusing on politics isn’t to discover anything important or helpful. It is to assign blame, attack political enemies, and revel in self-satisfaction. The implication is ultimately to say that the red regions have eschewed vaccines and therefore condemned themselves to higher death rates, those fools.
But we can investigate this theory fairly easily. Last year during the winter COVID surge, almost no one was vaccinated. This is our control group; every county in the country had a minuscule number of effective vaccinations.
This year, the narrative is that the lack of vaccination is causing more death in red regions than in blue regions. All we need to do is compare the impact of the winter surge this year (when anyone who wanted a vaccine could get one) to last year’s (when almost no one was vaccinated).
I looked at how many deaths were in each group (GOP, swing, Dem) this year compared to last year. My hope in doing this was to flatten the differences in income, age, population density, and all the other factors that contribute to higher or lower COVID death rates. The result is that this year’s winter surge looks a lot like last year’s.
The percentage of COVID deaths as measured by county-based political alignment is nearly identical. This implies that it isn’t politics driving these deaths, but whatever structural components were present last year and continue to be present this year. That is most likely a complex combination of age, income, healthcare access, genetics, multi-generational living spaces, community engagement, and education.
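The year-over-year comparison described above comes down to a small calculation: each group's share of total deaths in one winter, set against its share in the other. Here is a hedged Python sketch of that arithmetic. The function names and the death counts are hypothetical placeholders (chosen so the shares happen to match), not the figures behind the chart.

```python
from typing import Dict


def death_shares(deaths_by_group: Dict[str, int]) -> Dict[str, float]:
    """Each group's fraction of all deaths in one period."""
    total = sum(deaths_by_group.values())
    if total == 0:
        raise ValueError("no deaths recorded for this period")
    return {group: d / total for group, d in deaths_by_group.items()}


def share_shift(last_year: Dict[str, int], this_year: Dict[str, int]) -> Dict[str, float]:
    """Percentage-point change in each group's share between two periods."""
    before = death_shares(last_year)
    after = death_shares(this_year)
    return {group: 100 * (after[group] - before[group]) for group in before}


# Placeholder counts only: raw totals differ, but the shares do not.
LAST_WINTER = {"GOP": 400, "swing": 350, "Dem": 250}
THIS_WINTER = {"GOP": 200, "swing": 175, "Dem": 125}

for group, shift in share_shift(LAST_WINTER, THIS_WINTER).items():
    print(f"{group}: {shift:+.1f} percentage points")
```

Comparing shares rather than raw counts is what lets each group act as its own baseline: age, income, and density are roughly the same in both winters, so a vaccine-driven partisan effect would have to show up as a shift in these shares.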
I’m certain vaccines saved many lives this year. But vaccine administration for the highest-risk group (ages 65 and over) is up to 80-90% even in the reddest states. The narrative of red counties as anti-science backwaters whose residents are dying because of their stupidity is only possible with the most tortured manipulation of the data. To my deep sadness, this dishonest presentation has become a mainstream position.
We are never really comfortable with uncertainty. We seek causal answers to structure the world we see around us. The patterns of COVID over the last 2 years have not been particularly kind to that part of our psyches. The more work I’ve done with COVID data, the more aware I become that (as a friend put it to me) we go looking for the answer that satisfies our mental search for narrative closure. Once we find an answer that intuitively makes sense we stop investigating.
Our narratives naturally bend toward the metrics we can measure. Data on voting patterns is abundant and easily accessible so it is light work to overlay that pattern on all our other data. This isn’t good. This blinds us because we make these intuitive leaps with the same data set over and over. We use voting data as proxy data for patterns in income, age, race, population density, and a host of other cultural and economic factors.
But my biggest frustration is that this causes us to be unkind. These articles and tortured charts drive people to believe that COVID is the fault of their political enemies or that it is a disease that afflicts the outgroup. This is a view being explicitly promoted and it damages everything from national social cohesion to individual family relationships. It’s something worth fighting against.
Disney Shorts - Wynken, Blynken, and Nod
The poem Wynken, Blynken, and Nod by Eugene Field is a lullaby love song to childhood dreams. Adapted from that poem, this Silly Symphony is one of those gentle and beautiful shorts that Disney really loved to make. We join three little tots on their wooden shoe voyage through the sky. The first half of the short shows the whimsy of fishing for the little star herring fish, every visual soaked in toddler cuteness. The tots are soon tossed about their boat in a storm driven by the blowing of some playful clouds. This creates just enough tension and danger to give us relief when they float down from their dream adventure to the room of the sleeping child.
This short is overwhelmingly cute and a great example of how early animation was able to elevate the moving picture art form with fantastical visuals that weren’t possible in a live-action format.
“But my biggest frustration is that this causes us to be unkind. These articles and tortured charts drive people to believe that COVID is the fault of their political enemies or that it is a disease that afflicts the outgroup. This is a view being explicitly promoted and it damages everything from national social cohesion to individual family relationships. It’s something worth fighting against.”
Well said!
Being a bit lazy on vacation, so I’m recycling a previous comment of mine to Eugyppius, but it provides a source showing that the “analysis” Leonhardt has been trying to push would produce the same outcome if run for any prior year. It’s been known for decades that, for several reasons, people in rural areas die at higher rates from the 5 leading causes of death than their urban counterparts. It shouldn’t take much imagination to figure out why.
https://www.eugyppius.com/p/nyt-do-covid-precautions-work-yes/comment/5480187
“For the full effect, you have to follow him as he regurgitates this story on a monthly basis over the last year, each iteration inching slowly toward the obviousness of it all. [1]
I feel bad for the guy, because the final piece of the puzzle is right in front of him... if only he would for a moment re-run his "Trump counties have slightly higher mortality than Biden counties" for, say, 2019 or 2009 and discover the same pattern was there all along. He is SO CLOSE....
Or maybe he can just google "why do people die at slightly higher rates in rural areas compared to urban areas" and find that the answer was there all along, right from his favorite institution:
https://www.cdc.gov/media/releases/2017/p0112-rural-death-risk.html
Unbelievable.
[1] (one such iteration of his "journalism": https://www.nytimes.com/2021/11/08/briefing/covid-death-toll-red-america.html)”