Researchers Can Predict Vaccine Hesitancy at the Zip Code Level

With new variants of COVID-19 emerging every other day, vaccine hesitancy is becoming more present and powerful in the U.S.

Researchers at the USC Viterbi School of Engineering recently published a paper proposing Natural Language Processing (NLP) software that can pinpoint, in real time, where skepticism around vaccines exists.

Mayank Kejriwal, research team lead at USC’s Information Sciences Institute (ISI), was motivated by how poorly existing technology predicts vaccine hesitancy. The software uses word embedding algorithms that can detect keywords related to vaccines. These improvements make data collection at the zip code level much faster, simpler and more accurate.

The study’s system makes use of publicly available Twitter data and existing machine learning algorithms to process that data. It has outperformed local and national survey data in its task of reflecting public opinion on the COVID-19 vaccine.

Sara Melotte, a research assistant at ISI, described the study’s findings and how they make community-level predictions possible.

‘We show that only the text tweet and hashtags are sufficient to predict zip code-level vaccine hesitancy with reasonable accuracy, even if the tweets are not all related to the COVID-19 pandemic,’ said Melotte.
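As a rough illustration of the idea Melotte describes, the sketch below averages word vectors over the text and hashtags of each tweet and then averages those tweet vectors per zip code. It is not the study's code: the tiny TOY_EMBEDDINGS table, the function names and the sample zip codes are all assumptions made for this example, and a real system would load trained embeddings and feed the resulting features to a predictive model.

```python
import re
from collections import defaultdict

# Hypothetical 2-dimensional word vectors (assumption for illustration only);
# a real system would load trained word embeddings instead.
TOY_EMBEDDINGS = {
    "vaccine": [1.0, 0.0],
    "safe": [0.0, 1.0],
    "scared": [0.0, -1.0],
    "#novax": [0.5, -1.0],
}


def tokenize(text):
    """Lowercase a tweet and keep words and hashtags as tokens."""
    return re.findall(r"#?\w+", text.lower())


def embed_tweet(text, embeddings=TOY_EMBEDDINGS):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [embeddings[t] for t in tokenize(text) if t in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def zip_features(tweets):
    """Average tweet vectors per zip code.

    `tweets` is a list of (zip_code, text) pairs. Tweets with no known
    tokens are skipped, so a zip code may be absent from the result.
    """
    grouped = defaultdict(list)
    for zip_code, text in tweets:
        vec = embed_tweet(text)
        if vec is not None:
            grouped[zip_code].append(vec)
    return {
        z: [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]
        for z, vecs in grouped.items()
    }
```

Each zip code ends up with one fixed-length feature vector, which could then be passed to any standard regression or classification model to estimate hesitancy.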

The approach also sidesteps the response bias that can affect surveys when people know their personal information is being collected.

‘Historically, a lot of things depend on surveys. When you see poll numbers, those are collected by surveys, which are expensive,’ said Kejriwal. Cost, slow turnaround and constantly evolving opinions all complicate the process of obtaining current and accurate data.

‘What typically ends up happening is we have to wait for the survey to come out, and by then, you’d already be too late,’ said Kejriwal. ‘But we showed that you can use publicly available Twitter data and scrape it out using a program.’

The model also draws on external data sources, such as the number of hospitals or scientific establishments in a neighborhood. ‘We investigate the extent to which the use of these independent sets of features helps in improving the model,’ said Kejriwal.
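One simple way to picture this combination is to append the external counts to each zip code's text-derived feature vector. The sketch below is only an assumption about how such features might be joined; the function name and the "hospitals" and "science_sites" keys are hypothetical and do not come from the study.

```python
def combine_features(text_features, external_features):
    """Concatenate text-derived and external features per zip code.

    `text_features` maps zip code -> list of floats.
    `external_features` maps zip code -> dict with the hypothetical keys
    "hospitals" and "science_sites". Zip codes without external data are skipped.
    """
    combined = {}
    for zip_code, text_vec in text_features.items():
        extra = external_features.get(zip_code)
        if extra is None:
            continue  # no external data available for this zip code
        combined[zip_code] = list(text_vec) + [extra["hospitals"], extra["science_sites"]]
    return combined
```

Training one model on the text features alone and another on the combined features would show how much the external data helps.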

However, the collection of such data is limited by state and city regulations, which vary in how much public information they make available. Even so, the study provides reliable methods and data for predicting vaccine hesitancy in cities with heavy Twitter traffic.

‘For any public health crisis, there will always be signals in social media,’ said Kejriwal. ‘This is an opportunity because it’s a living record and can provide us with a blueprint for getting signals in any public health crisis.’

By Marvellous Iwendi.

Source: USC Viterbi