Researchers Say Anonymous Data Doesn’t Equal Private Data

A research team at the Illinois Institute of Technology has succeeded in extracting personal information such as gender and age from anonymous cellphone data using artificial intelligence and machine learning algorithms. The result raises questions about how much protection anonymization actually provides.

The team consisted of Vijay K. Gurbani, research associate professor of computer science; Matthew Shapiro, professor of political science; and Yuri Mansury, associate professor of social sciences. The researchers obtained their data from a Latin American cellphone company and, with relative ease, used it to guess the age and gender of users from their private communications.

The research team designed a neural network model that guessed gender with 67% accuracy, outperforming modern techniques including random forest, decision tree, and gradient boosting models. The same model estimated the age of individual users with 78% accuracy.
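To give a sense of what "guessing with a given accuracy" means, here is a minimal sketch in Python. It is not the researchers' model or data: the records are synthetic, the two features are hypothetical stand-ins for usage statistics, and a single-neuron logistic model stands in for the neural network, whose architecture is not described here. The sketch trains the model on one set of records and compares its accuracy on unseen records with a baseline that always guesses the most common label.

```python
import math
import random


def make_synthetic_users(n, rng):
    """Return n (features, label) pairs. Purely synthetic, not real user data."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)  # hypothetical binary attribute
        shift = 0.5 if label == 1 else -0.5
        # Two hypothetical usage features whose averages differ by label.
        features = [rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)]
        rows.append((features, label))
    return rows


def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    return sigmoid(weights[0] * x[0] + weights[1] * x[1] + bias)


def train_logistic(rows, epochs=200, lr=0.1):
    """Fit a single-neuron (logistic) model with full-batch gradient descent."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in rows:
            err = predict_proba(x, weights, bias) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def accuracy(rows, weights, bias):
    """Fraction of records whose label the model guesses correctly."""
    hits = sum(
        1 for x, y in rows
        if (1 if predict_proba(x, weights, bias) >= 0.5 else 0) == y
    )
    return hits / len(rows)


def majority_baseline(train, test):
    """Accuracy of always guessing the most common training label."""
    ones = sum(y for _, y in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return sum(1 for _, y in test if y == guess) / len(test)


rng = random.Random(0)  # fixed seed so the run is repeatable
train = make_synthetic_users(800, rng)
test = make_synthetic_users(400, rng)

weights, bias = train_logistic(train)
model_acc = accuracy(test, weights, bias)
baseline_acc = majority_baseline(train, test)
print(f"model accuracy: {model_acc:.2f}, baseline accuracy: {baseline_acc:.2f}")
```

The comparison with the baseline is the point of the sketch: an accuracy figure is only informative relative to what blind guessing would achieve on the same records.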

‘Age and gender information does seem innocuous, but this information is used in nefarious ways by people, many times with devastating consequences,’ said Shapiro. ‘When someone with bad intentions targets young children for anything, ranging from sales to sexual predation, it violates a number of laws designed to protect minors, such as the Children’s Online Privacy Protection Act and HIPAA. At the other end of the age spectrum, seniors are targeted by sophisticated spam and phishing efforts given their susceptibility and their access to savings.’

The team ran the neural network model on a laptop running Linux (Fedora) with 16 GB of memory and an Intel i5-6200U CPU with four cores.

‘The laptop we used for this work is not exclusive at all,’ said Gurbani. ‘To a well-resourced adversary, there will be much more powerful machines available, including access to cluster computing, where multiple computers are configured in a cluster to provide the computer power for the AI/ML models.’

Although the dataset used in the research is not publicly available, Gurbani believes that an adversary could gather similar datasets by attacking the computing infrastructure of service providers or by collecting data through public Wi-Fi hotspots.

‘As we mentioned in our paper, such attacks unfortunately do occur and are not rare,’ said Gurbani. ‘The process to collect this data would not be easy, but it would not be impossible either.’

The paper’s objective is to initiate a dialogue on how emerging artificial intelligence and machine learning techniques affect privacy regulations. Because the United States has no nationwide privacy regulations, the researchers studied how these techniques affect the articles of the European Union’s General Data Protection Regulation.

‘Machine learning and automated decision making will be a mainstream of business processes, and there is no escaping that reality,’ said Gurbani. ‘The issue at hand is how to protect individual privacy as well as societal and economic interests from fraud using the appropriate regulatory framework.’

Mansury believes that one way to do that is to give consumers an ‘opt-out option’ that lets them keep their personal information private when installing an app.

By Marvellous Iwendi.

Source: Illinois Tech