Menu

Scientists Find that ChatGPT Can Disclose People's Private Data

By
Photo: A team of researchers found out that the popular chatbot ChatGPT discloses people's personal information. Source: The Gaze collage
Photo: A team of researchers found out that the popular chatbot ChatGPT discloses people's personal information. Source: The Gaze collage

A team of researchers has published a study that shows that the popular OpenAI chatbot with artificial intelligence ChatGPT discloses people's personal information. Tech Policy Press writes about this with reference to a study published on the arXiv server.

A team of scientists from Google DeepMind and the universities of Washington, Cornell, Carnegie Mellon, the University of California at Berkeley, and ETH Zurich found that the machine learning model on which ChatGPT and all so-called large language models (LLMs) are based was trained on huge amounts of data collected from the Internet. Thanks to this, it is adept at generating new strings of texts without repeating the original texts it has absorbed.

However, it has previously been shown that image generators can be forced to generate examples from copyrighted data. The new study shows that ChatGPT is also susceptible to this.

Scientists extracted some of the training data and found that some of it contained identifying information of real people: names, email addresses and phone numbers, etc.

"Using queries to ChatGPT, we were able to extract more than 10 thousand unique verbatim memorised training examples. We can assume that targeted attackers will be able to extract much more data," the researchers said.

The experiment was based on the search for keywords that would disable the chatbot and force it to disclose training data. Thus, the researchers asked ChatGPT to repeat certain words, such as "poem", endlessly.

Their goal was to force ChatGPT to "deviate" from its task of being a chatbot and "return to its original purpose of modelling language". Although much of the generated text was nonsense, the researchers say that in some cases ChatGPT deviated to copy the source information directly from its training data.

The attack was carried out on the GPT 3.5 AI version, which is free for users.

"OpenAI claims that 100 million people use ChatGPT every week. Therefore, probably more than a billion man-hours have interacted with the model. Until this article, no one had ever noticed that ChatGPT was producing training data with such a high frequency. It is worrying that language models can have such hidden vulnerabilities," the researchers emphasise.

Similar articles

We use cookies to personalize content and ads, to provide social media features and to analyze our traffic. We also share information about your use of our site with our social media, advertising and analytics partners who may combine it with other information that you've provided to them. Cookie Policy

Outdated Browser
Для комфортної роботи в Мережі потрібен сучасний браузер. Тут можна знайти останні версії.
Outdated Browser
Цей сайт призначений для комп'ютерів, але
ви можете вільно користуватися ним.
67.15%
людей використовує
цей браузер
Google Chrome
Доступно для
  • Windows
  • Mac OS
  • Linux
9.6%
людей використовує
цей браузер
Mozilla Firefox
Доступно для
  • Windows
  • Mac OS
  • Linux
4.5%
людей використовує
цей браузер
Microsoft Edge
Доступно для
  • Windows
  • Mac OS
3.15%
людей використовує
цей браузер
Доступно для
  • Windows
  • Mac OS
  • Linux