Here’s the deal: AI giants get to grab all your data unless you say they can’t. Fancy that? No, neither do I | Chris Stokel-Walker


Imagine someone drives up to a pub in a top-of-the-range sports car – a £1.5m Koenigsegg Regera, to pick one at random – parks up and saunters out of the vehicle. They come into the pub you’re drinking in and begin walking around its patrons, slipping their hand into your pocket in full view, smiling at you as they take out your wallet and empty it of its cash and cards.

The not-so-subtle pickpocket stops if you shout and ask what the hell they’re doing. “Sorry for the inconvenience,” the pickpocket says. “It’s an opt-out regime, mate.”

Sounds absurd. Yet it seems to be the approach the government is pursuing in order to placate AI companies. A consultation is soon to open, the Financial Times reports, that will allow AI companies to scrape content from individuals and organisations unless they explicitly opt out of their data being used.

The AI revolution has been as all-encompassing as it has been rapid. Even if you’re not one of the 200 million people who log on to ChatGPT every week, or dabble with its generative AI competitors such as Claude and Gemini, you will undoubtedly have interacted with an AI system – knowingly or unknowingly. But the fire of AI needs two constantly replenished fuels in order to keep burning. One is energy – which is why AI companies are getting into the business of buying nuclear power plants. The other is data.

Data is vital to AI systems because it helps them to develop facsimiles of how we interact. If AI has any “knowledge” – and that’s highly disputed, given it’s really a fancy pattern-matching machine – then it stems from the data on which it is trained.

One study forecast that large language models such as ChatGPT will run out of training data by 2026, so voracious is their appetite. Yet, without that data, the AI revolution may stall. Tech companies know that, which is why they are penning licensing deals for content left, right and centre. But that introduces friction, and a sector whose unofficial motto for the past decade or more has been “move fast and break things” doesn’t do friction.

Which is why they are already trying to nudge us towards an opt-out approach to copyright, where everything we type, post and share is destined to become AI training data by default unless we say no, rather than an opt-in regime, where companies have to ask before using our data. We can already see how companies are priming us for this reality: this week, X began notifying users of a change to its terms and conditions that would enable all posts to be used to train Grok, Elon Musk’s AI model designed to compete with ChatGPT. And Meta, the parent company of Facebook and Instagram, has made a similar change – prompting the viral “Goodbye Meta AI” urban legend, a copy-and-paste post that supposedly overrides legal agreements.

The reason AI companies want an opt-out regime is obvious: if you ask most people if they want anything from the books they write or music they produce, or the posts and photos they share on social networks, to be used to train AI, they’ll say no. And then the wheels come off the AI revolution. The reason governments want to enable such a change to the concept of copyright ownership that has existed for more than 300 years, and has been enshrined in law for more than 100, is less obvious. But like many things, it seems to come down to money.

The government has been confronted with lobbying from big tech companies suggesting that this is a requirement for them to consider the country as a place to invest in and share the spoils of AI innovation. One lobbying document penned by Google suggested backing its approach for an opt-out copyright regime would “ensure the UK can be a competitive place to develop and train AI models in the future”. The government’s mooted framing of the issue, which already puts the opt-out approach on the table as the method to be argued against, is therefore a big win for big tech lobbyists.

With the amount of money washing around the tech sector and the levels of investment being thrown at AI projects, it’s unsurprising Keir Starmer doesn’t want to miss out on the potential bounty available. The government would be remiss not to consider how to appease tech companies as they develop a world-changing technology, and to try to make the UK an AI powerhouse.

But this isn’t the answer. Let’s be clear: the UK’s mooted copyright scheme would effectively enable companies to nick our data – every post we make, every book we write, every song we create – with impunity. It would require us to sign up to every individual service and tell them that no, we don’t want them to chew up our data and spit out a poor composite image of us. Potentially hundreds of them, from big tech companies to small research labs.

Lest we forget, OpenAI – a company now valued at more than $150bn – is planning to forswear its founding non-profit principles to become a for-profit company. It has more than enough money in its coffers to pay for training data, rather than rely on the beneficence of the general public. Companies like that can certainly afford to put their hands in their own pockets, rather than ours. So hands off.

  • Chris Stokel-Walker is the author of TikTok Boom: China’s Dynamite App and the Superpower Race for Social Media
