Mobility Data Used in COVID-19 Response Could Exclude Older and Non-white People

Information on people’s mobility, usually measured through their smartphones, is widely used to inform the response to COVID-19. A recent study tested the reliability and bias of a widely used mobility dataset and found that older and non-white voters are less likely to be captured. If public health resources were allocated based solely on this information, elderly and minority groups could be harmed.

The study was carried out by researchers at Carnegie Mellon University (CMU) and Stanford University. It appears in the Proceedings of the ACM Conference on Fairness, Accountability and Transparency, a publication of the Association for Computing Machinery.

‘Older age is a major risk factor for COVID-19 related mortality, and African-American, Native-American and Latinx communities bear a disproportionately high burden of COVID-19 cases and deaths’, said Amanda Coston, a doctoral student at CMU’s Heinz College and Machine Learning Department. She led the study as a summer research fellow at Stanford University’s Regulation, Evaluation and Governance Lab. ‘If these demographic groups are not well represented in data that are used to inform policymaking, we risk enacting policies that fail to help those at greatest risk and further exacerbating serious disparities in the healthcare response to the pandemic’.

Mobility data has been used during the pandemic to examine how effective social distancing policies were and how people’s travel affected the spread of the virus. Despite the high-stakes settings in which this information has been used, there has been no adequate independent assessment of the data’s reliability.

In this study, the first independent audit of demographic bias in a smartphone-based mobility dataset used in the response to COVID-19, the researchers assessed the validity of SafeGraph data. This popular mobility dataset contains information from about 47 million mobile devices in the United States. The data comes from mobile applications, such as weather, navigation and social media apps, whose users have turned on location sharing.

At the beginning of the pandemic, SafeGraph released a sizeable amount of its data for free to help researchers and governments inform their responses. Consequently, SafeGraph’s mobility data has been used widely in pandemic research, including by the Centers for Disease Control and Prevention, and to inform public health orders issued by various authorities. The researchers in this study set out to determine whether SafeGraph data accurately represents the broader population.

SafeGraph has publicly reported on the representativeness of its data, but the researchers argued that an independent audit was necessary because the company’s analysis examined demographic bias only at Census-aggregated levels and did not address demographic bias specific to places of interest (such as polling places).

The biggest challenge in conducting such an audit is the lack of demographic information: SafeGraph data does not include attributes such as age or race. The researchers showed how administrative data could supply the information required for a bias audit, complementing SafeGraph’s data. They used North Carolina’s voter registration and turnout records, which include information on age, gender, race and voters’ travel to the polling station on Election Day. Their data came from a private voter file that combines publicly available voter records. The study included a total of 539,000 North Carolina voters who voted at 558 locations during the 2018 general election.

The study identifies a sampling bias in the data that underrepresents two high-risk groups: older and minority voters were less likely to be captured by the mobility data. This could lead to insufficient allocation of health resources, such as testing sites and masks, to vulnerable populations.

‘While SafeGraph information may help people make policy decisions, auxiliary information, including prior knowledge about local populations, should also be used to make policy decisions about allocating resources’, suggested Alexandra Chouldechova, Assistant Professor of Statistics and Public Policy, who coauthored the study.

The authors called on firms to provide data that is more transparent and more representative. They also noted that voters in the United States tend to be older and to include more white people than the general population. Additionally, because SafeGraph provides researchers with an aggregated version of the data for privacy reasons, the researchers could not test for bias at the individual voter level. They instead tested for bias at the level of physical places of interest, finding evidence that SafeGraph is more likely to capture traffic to places frequented by younger, white visitors than to places frequented by older, largely non-white visitors.
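
The idea of a place-level audit can be illustrated with a minimal Python sketch. It is not the authors’ method, whose statistical details are not described here. It assumes that, for each polling place, one knows the number of voters from administrative records, the number of devices seen in the mobility data, and the share of voters aged 65 or older. All figures and names below (`places`, `coverage_rate`, `pearson`) are hypothetical and made up for illustration.

```python
from math import sqrt

# Hypothetical toy records, one per polling place (made-up numbers, not study data):
# (voters in administrative records, devices seen in mobility data,
#  share of voters aged 65 or older)
places = [
    (1200, 60, 0.15),
    (900, 36, 0.25),
    (1500, 45, 0.35),
    (800, 16, 0.45),
]


def coverage_rate(voters, devices):
    """Fraction of known visitors that the mobility data captured."""
    return devices / voters


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Coverage per place, then its association with the share of older voters.
rates = [coverage_rate(v, d) for v, d, _ in places]
older_share = [share for _, _, share in places]
r = pearson(older_share, rates)

print("coverage rates:", rates)
print(f"correlation with share of older voters: {r:.2f}")
```

In this toy example, a negative correlation would mean that places with more older voters are captured less well by the mobility data, which is the kind of pattern the audit reports. A real analysis would need far more locations and would have to account for confounding factors.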

The study shows how administrative data can be used to overcome the lack of demographic information, a common challenge in conducting bias audits.

The study was supported by Stanford University’s Institute for Human-Centered Artificial Intelligence, the Stanford RISE COVID-19 Crisis Response Faculty Seed Grant Program, CMU’s K & L Gates Presidential Fellowship and the National Science Foundation.

By Marvellous Iwendi.

Source: Heinz College