AI Companies Spent $2.9 Billion on Training Data Last Year. Most Creators Got Nothing.
The AI training data market reached $2.9 billion in 2024 and is projected to hit $13.3 billion by 2034, growing at a 16.5% compound annual rate. But almost none of that money is reaching the creators whose work makes it all possible.
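The growth figures above can be checked with a few lines of arithmetic. The sketch below is a sanity check, not the forecaster's method. It assumes the 16.5% rate compounds once a year over the ten years from 2024 to 2034. The function names are illustrative.

```python
def project(start_value: float, annual_rate: float, years: int) -> float:
    """Compound start_value forward at annual_rate for the given number of years."""
    return start_value * (1 + annual_rate) ** years


def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Back out the compound annual growth rate implied by two endpoints."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures quoted in the article, in billions of US dollars.
market_2024 = 2.9
market_2034 = 13.3
years = 2034 - 2024

projected_2034 = project(market_2024, 0.165, years)
rate = implied_cagr(market_2024, market_2034, years)

print(f"2.9B compounded at 16.5% for {years} years: {projected_2034:.2f}B")
print(f"CAGR implied by 2.9B -> 13.3B: {rate:.2%}")
```

Both directions agree to within rounding, so the three quoted numbers are consistent with one another.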
While News Corp struck a deal worth over $250 million with OpenAI for access to the archives of The Wall Street Journal and The Times of London, the vast majority of web publishers (the bloggers, journalists, technical writers, and independent creators who produce the content AI systems consume) receive nothing.
The Scale of the Problem
Google traffic to publishers dropped by a third in 2025, according to Press Gazette, as AI-generated summaries replaced the need to click through to original sources. Seer Interactive measured a 61% drop in organic click-through rates when AI Overviews appeared in search results.
The economics are stark. AI model training costs have increased 4,300% since 2020, yet content — the raw material that makes these models useful — remains the one input that companies have been taking for free.
“The training data market has grown to $2.9 billion, but almost all of that money flows between AI companies and data brokers. The actual creators are cut out of the value chain entirely.” — Dataset Licensing Alliance, June 2025
A New Approach
Some companies are starting to build licensing infrastructure that works at web scale. Rather than requiring individual negotiations — practical only for publishers the size of News Corp — these systems use machine-readable license tags embedded directly in web content.
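To make the idea of a machine-readable license tag concrete, here is a minimal sketch of how a crawler might read one. The tag name `content-license`, its `key=value; key=value` format, and every field in the sample page are hypothetical, invented for illustration. The article does not name a specific standard, and none is implied here.

```python
from html.parser import HTMLParser

# Hypothetical tag name, used only for this illustration.
LICENSE_META_NAME = "content-license"


class LicenseTagParser(HTMLParser):
    """Collects key=value license terms from a matching <meta> tag."""

    def __init__(self):
        super().__init__()
        self.terms = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = dict(attrs)
        if attr_map.get("name") != LICENSE_META_NAME:
            return
        content = attr_map.get("content") or ""
        # Terms are assumed to be separated by semicolons.
        for pair in content.split(";"):
            key, sep, value = pair.partition("=")
            if sep:
                self.terms[key.strip()] = value.strip()


def read_license_terms(html: str) -> dict:
    """Return the license terms found in the page, or an empty dict."""
    parser = LicenseTagParser()
    parser.feed(html)
    return parser.terms


# Sample page with made-up terms.
page = (
    '<html><head>'
    '<meta name="content-license" '
    'content="ai-training=paid; rate-usd-per-1k-words=0.50; '
    'contact=licensing@example.com">'
    '</head><body>...</body></html>'
)

terms = read_license_terms(page)
print(terms)
```

A page with no such tag yields an empty dictionary. A real system would need an agreed policy for that case, which this sketch leaves open.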
The model mirrors how music licensing already works. ASCAP and BMI don’t require every restaurant and radio station to negotiate individually with every songwriter. Instead, standardized licenses create a market that works for everyone, from stadium tours to coffee shop playlists.
What Comes Next
Getty Images, Adobe, and music labels have all moved from litigation to licensing deals in the past year. Getty now runs a “commercially safe” program that pays creators recurring royalties. Adobe pays creators $0.06–$0.16 per image for training data. GEMA, the German music rights organization, has proposed an ongoing royalty model for AI-generated music.
The question isn’t whether AI content licensing will become standard — it’s whether individual creators will be included in that standard, or whether only the largest publishers will have seats at the table.