
Competition for Leadership: Digital Giants OpenAI and Meta Prepare New AI Models Capable of 'Reasoning'

Photo: Competition for Leadership: Digital Giants OpenAI and Meta Prepare New AI Models Capable of 'Reasoning'. Source: Collage The Gaze \ by Leonid Lukashenko

The race among digital giants for leadership in artificial intelligence has turned into a hunt for the digital data needed to train models and develop the technology. To get this data, tech companies including OpenAI, Google, and Meta ignored corporate policies and discussed breaking the law, The New York Times reports.

Meta said it would begin rolling out Llama 3 in the coming weeks, while Microsoft-backed OpenAI indicated that its next model, expected to be called GPT-5, was coming “soon”.

In particular, managers, lawyers, and engineers at Meta, which owns Facebook and Instagram, last year discussed buying Simon & Schuster, one of the largest English-language publishing houses, to acquire long works for use in artificial intelligence training. This is known from records of internal meetings obtained by The Times. Employees also agreed to collect copyrighted data from the Internet, even if it meant facing lawsuits. They said it would take too long to negotiate licences with publishers, artists, musicians, and the news industry.

Like OpenAI, Google has transcribed YouTube videos to collect text for its artificial intelligence models, according to five people with knowledge of the company's practices. This potentially violated the copyrights of the videos, which belong to their creators.

Last year, Google also expanded its terms of service. According to members of the company's privacy team and an internal memo reviewed by The Times, one reason for the change was to allow Google to use publicly available Google Docs, restaurant reviews on Google Maps, and other online material for its AI products.

For many years, the Internet, with sites like Wikipedia and Reddit, was a seemingly endless source of data. But as AI has evolved, tech companies have been looking for additional repositories of data. Google and Meta, which have billions of users generating search queries and social media posts every day, were largely restricted by privacy laws and their own policies from using much of this content for AI.

Today, the volume of new data is crucial for AI development. Leading chatbot systems have been trained on pools of digital text covering three trillion words, or roughly twice the number of words held by the Bodleian Library at Oxford University, which has been collecting manuscripts since 1602. Artificial intelligence researchers say that the most valuable data is high-quality material, such as published books and articles that have been carefully written and edited by professionals.

More than 10,000 professional groups, authors, companies, and others submitted comments last year on the use of creative works by AI models to the Copyright Office, a federal agency that is preparing guidance on how to apply copyright law in the AI era.

Justine Bateman, a filmmaker, former actress, and author of two books, told the Copyright Office that AI models have been taking content, including her writings and films, without permission or payment.

"This is the biggest theft in the United States, period," she said in an interview.

Technology companies are so eager for new data that some are developing "synthetic" information. This is not organic data created by humans, but text, images, and code created by artificial intelligence models - in other words, systems learning from what they generate. 
