Do not use choropleths for your COVID-19 counts, ever!
In a hilarious contribution to Medium, Dr. Noah Haber et al. issued a call to “Flatten the Curve of Armchair Epidemiology”. They analyze the transmission of “well-intended partial truths” about COVID-19 and caution against hidden “viral reservoirs throughout the internet”. To flatten this curve, they recommend fact-checking before posting and go as far as endorsing social-media distancing measures. As with general COVID-19 tips based on armchair epidemiology, misinformation can also be spread through the numerous COVID-19 maps that are circulating widely on the Web. In this article I want to focus on one particular instance of armchair cartography: wrongly mapping COVID-19 count data using choropleth symbology.
Choropleths are great-looking maps, my favourite thematic map type! They use graduated colour schemes to fill areas (the spatial units of analysis) to represent the magnitude (usually in ranges) of data collected for, or aggregated to, these units. But they can be deceptive in many ways, one of which arises from using raw-count data without adjusting for the different sizes of the spatial units. The above gallery of cartographic failures shows a small selection of misleading choropleth maps of COVID-19 cases published by major government and news media Web sites as of March 26, 2020.
Representing raw-count variables using choropleth mapping is a mistake that is notoriously difficult to explain. In “Mapping coronavirus, responsibly”, Dr. Kenneth Field notes the need to normalize raw COVID-19 totals to account for different underlying population sizes of China’s provinces. But in a related debate on Twitter, Dr. Stephanie Tuerk, a Senior Data Visualization Engineer at Mathematica, pointedly asks: “Can you further articulate the problem with using a choropleth to display counts? What precisely will people misunderstand?” She also questions the recommendation to transform count data into normalized rates, if the goal is to map the original counts. Indeed, I tell my cartography students that normalizing their data (by area, total population, or another reference total) will create a new variable and they need to think about whether that’s what they actually want to visualize.
The best explanation that I have seen as to the actual reason for the misrepresentation of raw-count data through choropleth maps was written by GIS Consultant and former Harvard Lecturer Paul Cote under the heading “Effective Cartography – Mapping with Aggregated Statistics”. Using the schematic figures shown above, Paul underlines our cognitive ability to understand quantity from graphics that vary in one dimension (size), such as in proportional symbols, in contrast to how we read intensity from colour (lightness, value), such as on choropleth maps. It appears that we are wired to understand a choropleth map as a representation of an intensity (e.g. population density per km², infection rate per one million people), not as a count, and therefore this map type does not fit with raw-count data.
The cartography textbook by Dr. Terry Slocum et al. (2009) proposes an additional explanation. They note that we read information from a choropleth map as the probability of encountering a phenomenon. For example, if we look at Google’s world map of COVID-19 cases, China’s 80,000 cases put it in the highest class (dark blue). We’d therefore expect to be exposed to many infected people if we were to travel around that country. Conversely, we’d expect to find fewer cases in Canada, since this country’s 4,000 cases are mapped two classes lower (medium blue). Assuming we run into comparable numbers of people given space-time constraints (but ignoring current travel restrictions!), this is a wrong conclusion since Canada’s COVID-19 infection rate of 103 cases per one million population is roughly twice as high as China’s 53 (March 26 data from https://www.worldometers.info/coronavirus/#countries).
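The arithmetic behind this comparison can be sketched in a few lines of Python. The case counts are the rounded figures quoted above; the population values are rough assumptions of mine for illustration and do not come from the cited data, so the resulting rates differ slightly from the Worldometer numbers. The function name cases_per_million is likewise just a label chosen for this sketch.

```python
def cases_per_million(cases: int, population: int) -> float:
    """Normalize a raw case count to cases per one million residents."""
    return cases / population * 1_000_000


# Case counts: rounded figures quoted in the text.
# Populations: rough assumptions for illustration only.
units = {
    "China": {"cases": 80_000, "population": 1_400_000_000},
    "Canada": {"cases": 4_000, "population": 37_600_000},
}

rates = {
    name: cases_per_million(u["cases"], u["population"])
    for name, u in units.items()
}

for name, rate in rates.items():
    cases = units[name]["cases"]
    print(f"{name}: {cases:,} cases, about {rate:.0f} per million")
```

Under these assumptions, China has twenty times Canada's raw count but only about half its rate, which is exactly the reversal that a choropleth of raw counts hides.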
It is important to note that this issue does not automatically occur on every choropleth map or between any two spatial units on a given map. In fact, I had a hard time finding a suitable pair of provinces or countries in which the relationship between raw counts was inverted compared to that between normalized data. Yet the mere possibility of such an inversion is what makes the choropleth map a no-go for visualizing total counts.
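For readers who want to look for such inverted pairs in their own data, here is a minimal sketch. It assumes each spatial unit comes with a case count and a population; the function name inverted_pairs and the three example units are hypothetical placeholders, not real statistics.

```python
from itertools import combinations


def inverted_pairs(units):
    """Return pairs of units ranked one way by raw count and the other way by rate.

    `units` maps a unit name to a (cases, population) tuple.
    """
    result = []
    for (a, (cases_a, pop_a)), (b, (cases_b, pop_b)) in combinations(units.items(), 2):
        count_diff = cases_a - cases_b
        rate_diff = cases_a / pop_a - cases_b / pop_b
        # Opposite signs mean the count ranking and the rate ranking disagree.
        if count_diff * rate_diff < 0:
            result.append((a, b))
    return result


# Hypothetical placeholder values, chosen only to illustrate the check.
example = {
    "Unit A": (80_000, 1_400_000_000),
    "Unit B": (4_000, 37_600_000),
    "Unit C": (500, 60_000_000),
}

print(inverted_pairs(example))
```

In this made-up example only one of the three possible pairs is inverted, which matches the observation that the problem is possible on any raw-count choropleth but does not arise between every two units.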
The above example also highlights another serious issue of the choropleth technique: It maps each value homogeneously across its entire spatial unit, while in reality many phenomena are unevenly distributed within the units. Infectious disease is a good example of a phenomenon that produces highly localized clusters (China’s city of Wuhan, Italy’s Lombardy region, Germany’s Heinsberg district), which are poorly represented on any choropleth map that uses data aggregated to larger spatial units. The coronavirus pandemic demonstrates that improper cartography is not just an academic concern but can have serious real-life implications – on public attitudes and even on policy decisions!