Google becomes the first AI company to be fined over training data – Yahoo Finance

Nearly five years ago, I reported on how France’s news publishers had gone to war with Google over the issue of “ancillary copyright” fees—payments for including snippets of article text in Google’s search results. I wrote that the media houses were unlikely to win. Well, they just did. And as with everything else these days, AI suddenly became part of the equation.

This morning, the French competition authority fined Google €250 million ($271 million) for failing to comply with commitments it made a couple of years ago regarding how it would negotiate the fees with the French news outlets. Google already received a €500 million fine over the matter in 2021. It’s not arguing back this time, which has probably spared it an even higher fine.

I’m not going to go into the details of the revised negotiation process to which Google has agreed—it’s as dull as it sounds and, if you really want to dive in, here’s Google’s statement in French and English. The interesting part of this episode is about Google’s Bard AI, which these days goes by the name of Gemini.

As far as I’m aware, this is the first fine to be levied on an AI company at least partially because of its free-wheeling incorporation of everything it can grab into its training data.

According to the French Competition Authority (FCA), the fine takes account of the fact that Google “used content from press agencies and publishers to train its [Bard] foundation model, without notifying either them or the [FCA].”

As Google tells it, the FCA “does not challenge the way web content is used to improve newer products like generative AI.” Google claimed this issue is “already addressed” in Article 4 of the EU Copyright Directive, which provides an exception for text and data mining, and in the upcoming EU AI Act, which tells AI companies to respect the Copyright Directive (and to publish “sufficiently detailed” summaries of their training data).

But according to the FCA, the question of whether article-scraping AI companies qualify for that text and data mining exception “has not yet been settled.” It said Google had “at the very least” broken its commitment to be transparent in its commercial dealings with French news publishers, making this a fining matter.

Now, Google did recently launch a new control in the robots.txt file that web publishers use to send signals to Google’s web crawlers. The setting is called Google-Extended, and it’s supposed to let publishers opt out of having their data become Bard/Gemini training fodder without also having their articles disappear from Google Search and Google News.

But it only added that control at the end of September, more than two months after Bard’s European launch. During that period, French publishers effectively had to allow the unrestricted hoovering of their output into Bard if they also wanted it to appear in Search and News—which, remember, is how they then get to claim ancillary copyright fees from Google. That broke another of Google’s commitments, again contributing to today’s fine. The FCA also told Google to explain to publishers how the opt-out mechanism works.
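For readers wondering what that opt-out looks like in practice, here is a rough sketch. The robots.txt rules below are an illustrative example for an imaginary publisher, not anything taken from Google’s or the FCA’s documents, and example.com is a placeholder. The check uses the robots.txt parser in Python’s standard library, which only approximates how a rule-following crawler would read the file; it says nothing about how Google’s own systems apply the Google-Extended token.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary publisher (an assumption for
# illustration): opt out of the Google-Extended token while leaving every
# other crawler, including Google's search crawler, free to fetch pages.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

# Placeholder article URL; example.com is not a real publisher.
ARTICLE_URL = "https://example.com/news/some-article"


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Google-Extended", "Googlebot"):
        print(agent, is_allowed(agent, ARTICLE_URL))
```

Under these assumed rules, the article is off-limits to the Google-Extended token but still available to Googlebot, which is the separation between AI training and Search that publishers lacked until the control arrived.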

So in summary, Google’s new control for publishers has belatedly fixed one of the issues with the unpaid incorporation of news articles into its AI training material, but the overall legality of this practice under EU copyright law remains an open question. It’s not hard to see why Google’s big rival, OpenAI, has started cutting licensing deals with European press publishers like Axel Springer and Le Monde.

One final note while we’re on the subject of Big Tech and European antitrust regulators: Margrethe Vestager, the European Commission’s competition chief, just told Reuters that Apple may face a probe over the €0.50-per-app-installation “core technology fee” it’s levying on developers who dare to use the third-party iOS app stores that Apple must allow under the new EU Digital Markets Act. “There are things that we take a keen interest in, for instance, if the new Apple fee structure will de facto not make it in any way attractive to use the benefits of the DMA,” she said.

Given that Microsoft and Meta have both complained to Vestager that the fee makes third-party app stores unviable, there’s a storm a-coming, and I do not hate to say I told you so. More news below—and by the way, the Data Sheet team offers our red-cheeked apologies for misspelling “Nvidia” in yesterday’s email subject line…

David Meyer

Want to send thoughts or suggestions to Data Sheet? Drop a line here.
