Scientists Find that ChatGPT Can Disclose People's Private Data

By Max Radchuk

12.01.2023 16:32

Photo: A team of researchers found out that the popular chatbot ChatGPT discloses people's personal information. Source: The Gaze collage

A team of researchers has published a study that shows that the popular OpenAI chatbot with artificial intelligence ChatGPT discloses people's personal information. Tech Policy Press writes about this with reference to a study published on the arXiv server.

A team of scientists from Google DeepMind and the universities of Washington, Cornell, Carnegie Mellon, the University of California at Berkeley, and ETH Zurich found that the machine learning model on which ChatGPT and all so-called large language models (LLMs) are based was trained on huge amounts of data collected from the Internet. Thanks to this, it is adept at generating new strings of texts without repeating the original texts it has absorbed.

However, it has previously been shown that image generators can be forced to generate examples from copyrighted data. The new study shows that ChatGPT is also susceptible to this.

Scientists extracted some of the training data and found that some of it contained identifying information of real people: names, email addresses and phone numbers, etc.

"Using queries to ChatGPT, we were able to extract more than 10 thousand unique verbatim memorised training examples. We can assume that targeted attackers will be able to extract much more data," the researchers said.

The experiment was based on the search for keywords that would disable the chatbot and force it to disclose training data. Thus, the researchers asked ChatGPT to repeat certain words, such as "poem", endlessly.

Their goal was to force ChatGPT to "deviate" from its task of being a chatbot and "return to its original purpose of modelling language". Although much of the generated text was nonsense, the researchers say that in some cases ChatGPT deviated to copy the source information directly from its training data.

The attack was carried out on the GPT 3.5 AI version, which is free for users.

"OpenAI claims that 100 million people use ChatGPT every week. Therefore, probably more than a billion man-hours have interacted with the model. Until this article, no one had ever noticed that ChatGPT was producing training data with such a high frequency. It is worrying that language models can have such hidden vulnerabilities," the researchers emphasise.

Read The Gaze articles in Google News

Recommended

How Ukrainian Innovators Shaped the Modern World

Russian Threats to NATO and Europe in 2025

Propaganda in Asia: How to Fight the Kremlin's Influence

Five Writers from Other Cultures Born and Connected to Ukraine

New Sanctions against Russia – How they Reduce its Ability to Wage War

Similar articles

Ukraine Launches K4 Program to Fund AI Startups for Military Innovation

Zelenskyy Urges International Partners to Restrict AI Technology Exports to Russia

Ukraine Leaps from 102nd to 5th in Global Digital Services Ranking

Ukraine Adopts Law Enabling Free EU-Wide Mobile Roaming

From Battlefield to Breakthrough: Ukraine’s AI Research Earns Spot in NATO’s Top 10 Scientific Works of 2025

Ukraine to Launch AI Assistant in Diia App, Pioneering AI-Driven Governance