First published Wed Mar 24, 2021
Simpson’s Paradox is a statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations. For instance, two variables may be positively associated in a population, but be independent or even negatively associated in all subpopulations. Cases exhibiting the paradox are unproblematic from the perspective of mathematics and probability theory, but nevertheless strike many people as surprising. Additionally, the paradox has implications for a range of areas that rely on probabilities, including decision theory, causal inference, and evolutionary biology. Finally, there are many instances of the paradox, including in epidemiology and in studies of discrimination, where understanding the paradox is essential for drawing the correct conclusions from the data.
Simpson’s paradox describes how, when we combine all of the groups together and look at the data in aggregate form, the correlation that we noticed within the groups may reverse itself. This is most often due to lurking variables that have not been considered, but sometimes it is due to the numerical values of the data. (1 Jul 2019)
The context missing here is Simpson’s Paradox, a “statistical phenomenon where an association between two variables in a population emerges, disappears or reverses when the population is divided into subpopulations.” In other words, if we just slap data on a graph, it looks like one very clear story. However, when we take into account confounders (other variables that could also explain the phenomenon), the data tell another story.
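To make the reversal concrete, here is a minimal Python sketch. The counts are invented purely for illustration (they are not real mortality data), and the group names `A` and `B`, the two strata, and the helper functions are all hypothetical. It only shows that a group can have the lower rate in every subgroup and still have the higher rate in aggregate when the subgroups are of very different sizes.

```python
# Hypothetical counts, invented for illustration only (not real data).
# Each entry is (deaths, population) for one subgroup.
data = {
    "A": {"young": (2, 20_000), "old": (80, 80_000)},   # mostly old
    "B": {"young": (16, 80_000), "old": (30, 20_000)},  # mostly young
}


def rate_per_100k(deaths, population):
    """Deaths per 100,000 people."""
    return deaths * 100_000 / population


def stratum_rates(group):
    """Rate within each subgroup (stratum) of the given group."""
    return {s: rate_per_100k(d, p) for s, (d, p) in data[group].items()}


def aggregate_rate(group):
    """Rate for the whole group, ignoring the subgroups."""
    deaths = sum(d for d, _ in data[group].values())
    population = sum(p for _, p in data[group].values())
    return rate_per_100k(deaths, population)


for g in data:
    print(g, stratum_rates(g), aggregate_rate(g))
```

With these made-up numbers, group B has the higher rate in both strata (20 vs. 10 among the young, 150 vs. 100 among the old), yet group A has the higher aggregate rate (82 vs. 46) simply because most of group A falls in the high-risk stratum.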
In this case, we need to take into account age. White Americans are far more likely to outlive Black or Hispanic Americans. They live longer for a myriad of reasons, such as differences in access to care, trauma, and stress. This is important because age is, by far, the strongest risk factor for dying of COVID.
If we account for age, we see a very different story. Below is data from a database called CDC WONDER. The 2022 death data is provisional (meaning it’s not the official count, because death certificates take a long time to process), but it’s the best we have. I pulled all COVID deaths for 2022 and organized them by race. Before adjusting for age, the COVID death rate for White Americans was 43 per 100,000 in 2022 compared to, for example, 37 per 100,000 for Black Americans. After we adjust for age, the story changes: 31 per 100,000 for White Americans and 40 per 100,000 for Black Americans. A complete switch.
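For readers who want to see what "adjusting for age" can look like mechanically, below is a rough sketch of direct age standardization, one common adjustment method. The article does not spell out the method or standard population behind its figures, so treat the approach, the age bands, the group names, and every number here as assumptions made up for illustration, not as CDC WONDER data. The idea is to compute each group's age-specific rates and then weight them by one shared standard population, so that both groups are compared as if they had the same age structure.

```python
# Hypothetical inputs, invented for illustration (NOT CDC WONDER data).
# Per age band: (deaths, population) for each group.
groups = {
    "older_group": {
        "0-49": (30, 50_000), "50-74": (120, 30_000), "75+": (400, 20_000),
    },
    "younger_group": {
        "0-49": (70, 70_000), "50-74": (125, 25_000), "75+": (125, 5_000),
    },
}

# Hypothetical standard population shared by both groups.
standard_population = {"0-49": 60_000, "50-74": 30_000, "75+": 10_000}


def crude_rate(strata):
    """Deaths per 100,000 with no adjustment for age structure."""
    deaths = sum(d for d, _ in strata.values())
    population = sum(p for _, p in strata.values())
    return deaths * 100_000 / population


def age_adjusted_rate(strata, standard):
    """Directly standardized rate: age-specific rates weighted by the
    standard population's share in each age band."""
    total_std = sum(standard.values())
    return sum(
        (deaths * 100_000 / population) * standard[band] / total_std
        for band, (deaths, population) in strata.items()
    )


for name, strata in groups.items():
    crude = crude_rate(strata)
    adjusted = age_adjusted_rate(strata, standard_population)
    print(f"{name}: crude={crude:.0f}, age-adjusted={adjusted:.0f}")
```

In this invented example the older group has the higher crude rate but the lower age-adjusted rate, the same kind of switch described above. The adjusted figures depend on the standard population chosen, so they are only comparable across groups that were adjusted to the same standard.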
If we look at this phenomenon over time, the gap is narrowing. Below I graphed the difference between crude and age-adjusted deaths over time by race/ethnicity. Public health departments and grassroots organizations have been working tirelessly to get vaccinations to underserved communities. And deaths among White Americans have increased faster over the past 3 years. But the association has not flipped, contrary to what the NYT article concluded. Unfortunately, outrunning the decades-long health disparities that underserved communities face in this country is incredibly difficult.
There are really important, big public health questions we need answered. And bringing them to light and having discussions is more than important. But we need to answer them responsibly, because, as we’ve seen throughout the pandemic, misinformation can do serious damage to public health responses.
“Your Local Epidemiologist (YLE)” is written by Dr. Katelyn Jetelina, MPH PhD, an epidemiologist, biostatistician, wife, and mom of two little girls. During the day she works at a nonpartisan health policy think tank, and at night she writes this newsletter. Her main goal is to “translate” the ever-evolving public health science so that people will be well equipped to make evidence-based decisions. This newsletter is free thanks to the generous support of fellow YLE community members.