AI chatbots are illegally ripping off copyrighted news, says media group

AI developers are taking revenue, data and users away from news publications by building competing products, the News Media Alliance claims.

Artificial intelligence developers heavily rely on illegally scraping copyrighted material from news publications and journalists to train their models, a news industry group has claimed.

On Oct. 30, the News Media Alliance (NMA) published a 77-page white paper and an accompanying submission to the United States Copyright Office claiming that the data sets used to train AI models contain significantly more news publisher content than content from other sources.

As a result, AI models “copy and use publisher content in their outputs,” which the group says infringes publishers' copyright and puts news outlets in competition with AI models.

“Many generative AI developers have chosen to scrape publisher content without permission and use it for model training and in real-time to create competing products,” the NMA stressed in an Oct. 31 statement.

On Monday, the News/Media Alliance published a White Paper and a technical analysis and submitted comments to the @CopyrightOffice on the use of publisher content to power generative artificial intelligence technologies (#GAI). https://t.co/Zr05e7nZTS

— News/Media Alliance (@newsalliance) October 31, 2023

The group argues that while news publishers make the investments and take on the risks, AI developers are the ones rewarded “in terms of users, data, brand creation, and advertising dollars.”

Reduced revenues, fewer employment opportunities and tarnished relationships with their audiences are other setbacks publishers face, the NMA noted in its submission to the Copyright Office.

To combat the issues, the NMA recommended the Copyright Office declare that using a publication’s content to monetize AI systems harms publishers. The group also called for various licensing models and transparency measures to restrict the ingestion of copyrighted materials.

The NMA also recommends the Copyright Office adopt measures to curb the scraping of protected content from third-party websites.

The Guardian has accused Microsoft of damaging its journalistic reputation by publishing an AI-generated poll speculating on the cause of a woman’s death next to an article by the news publisher. https://t.co/tOie87HSyA

— News/Media Alliance (@newsalliance) November 1, 2023

The NMA acknowledged the benefits of generative AI and noted that publications and journalists can use AI for proofreading, idea generation and search engine optimization.

OpenAI’s ChatGPT, Google’s Bard and Anthropic’s Claude are three AI chatbots that have seen increased use over the last 12 months. However, the methods used to train these AI models have been criticized, with all three developers facing copyright infringement claims in court.

Comedian Sarah Silverman sued OpenAI and Meta in July, claiming the two firms used her copyrighted work to train their AI systems without permission.

OpenAI and Google were hit with separate class-action suits over claims they scraped private user information from the internet.

Google has said it will assume legal responsibility if its customers are alleged to have infringed copyright by using its generative AI products on Google Cloud and Workspace.