The New York Times wants OpenAI and Microsoft to pay for training data


The New York Times is suing OpenAI and its close partner (and investor), Microsoft, for allegedly violating copyright law by training generative AI models on the Times’ content.

In the lawsuit, filed in federal district court in Manhattan, The Times argues that millions of its articles were used to train AI models, including OpenAI’s ultra-popular ChatGPT and Microsoft’s Copilot, without its consent. The Times is calling on OpenAI and Microsoft to “destroy” models and training data containing the offending material and to be held responsible for “billions of dollars in statutory and actual damages” related to the “unlawful copying and use” of The Times’s uniquely valuable works.

“If The Times and other news organizations cannot produce and protect their independent journalism, there will be a void that no computer or artificial intelligence can fill,” The Times’ complaint reads. “Less journalism will be produced, and the cost to society will be much higher.”

Generative AI models “learn” from examples to produce essays, code, emails, articles, and more, and vendors like OpenAI scour the web for millions to billions of these examples to add to their training sets. Some examples are in the public domain. Others are not, or fall under restrictive licenses that require citation or specific forms of compensation.

Vendors argue that the fair use doctrine provides blanket protection for their web-scraping practices. Copyright holders disagree; hundreds of news organizations are now using code to block OpenAI, Google and others from scanning their websites for training data.
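The article does not say what that blocking code looks like. One common mechanism is a robots.txt file that disallows specific crawlers. The sketch below assumes that mechanism and uses the published crawler tokens GPTBot (OpenAI) and Google-Extended (Google); the rules, the example URL and the helper name `allowed_agents` are illustrative, not taken from any publisher’s site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve: block two AI crawlers,
# allow everyone else. The rules are illustrative only.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt, agents, url):
    """Return a dict mapping each user agent to whether it may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "Mozilla/5.0"],
    "https://example.com/news/story.html",  # placeholder URL
)
for agent, ok in result.items():
    print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Note that robots.txt is a voluntary convention: it only works if the crawler chooses to honor it.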

The vendor-outlet conflict has led to a growing number of legal battles, The Times’ being the latest.

Actress Sarah Silverman joined a pair of lawsuits in July, accusing Meta and OpenAI of “swallowing” Silverman’s memoirs to train their AI models. In a separate lawsuit, thousands of novelists, including Jonathan Franzen and John Grisham, claimed that OpenAI sourced their work as training data without their permission or knowledge. And several programmers have a case underway against Microsoft, OpenAI and GitHub over Copilot, an AI-powered code-generating tool that the plaintiffs say was developed using their IP-protected code.

While The Times is not the first to sue generative AI vendors over alleged IP violations involving written works, it is the largest publisher involved in such a lawsuit to date, and one of the first to highlight the potential damage to its brand through “hallucinations,” or made-up facts generated by generative AI models.

The Times’ complaint cites several instances in which Microsoft’s Bing Chat (now called Copilot), which is powered by an OpenAI model, provided false information that it said came from The Times, including results for “the 15 most heart-healthy foods,” 12 of which were not mentioned in any Times article.

The Times also makes the case that OpenAI and Microsoft are effectively building news publisher competitors using The Times’ works, harming The Times’ business by providing information that could not normally be accessed without a subscription. That information is not always cited, is sometimes monetized, and is stripped of the affiliate links that The Times uses to generate commissions.

As The Times’ complaint states, generative AI models have a tendency to regurgitate training data, for example by reproducing text from articles almost verbatim. Beyond regurgitation, OpenAI has, on at least one occasion, inadvertently enabled ChatGPT users to access paywalled news content.

“Defendants sought to take advantage of The Times’s substantial investment in journalism,” the complaint says. The complaint accuses OpenAI and Microsoft of “using The Times content without payment to create products that replace The Times and drive audiences away from it.”

The impact on news subscription businesses, and on publisher web traffic, is at the center of a similar lawsuit filed by publishers earlier in the month against Google. In that case, the plaintiffs, like The Times, argue that Google’s AI experiments, including its AI-powered Bard chatbot and Search Generative Experience, siphon off publishers’ content, readers and advertising revenue in anti-competitive ways.

There is merit to the publishers’ claims. A recent model from The Atlantic found that, if a search engine like Google integrated AI into search, it would answer a user’s query 75% of the time without requiring a click-through to the publisher’s website. Publishers in the Google suit estimate they would lose up to 40% of their traffic.

Some news outlets, rather than fighting vendors in court, have opted to sign licensing agreements with them. The Associated Press struck a deal with OpenAI in July, and German publisher Axel Springer, owner of Politico and Business Insider, did the same this month.

In its complaint, The Times says it attempted to reach a licensing arrangement with Microsoft and OpenAI in April, but negotiations were ultimately not successful.
