Stack Overflow Will Begin Charging AI Companies for Training Data

Apr 28, 2023

Following Twitter and Reddit, the popular question-and-answer website Stack Overflow has announced plans to charge AI companies for training data. The website’s CEO, Prashanth Chandrasekar, said the policy would go into effect soon after this year’s halfway point.

As many know, Stack Overflow is considered the “go-to” resource for many programmers thanks to its community’s rich archive of questions and answers. But with the release of AI tools such as Microsoft/GitHub’s Copilot, it’s unsurprising that the website has decided to charge AI companies for the ability to train their models on its data.

Stack Overflow CEO Prashanth Chandrasekar made the announcement while speaking with Wired. In plain English, it’s simple compensation. He also pointed to Reddit’s approach to charging AI companies as a model moving forward. “Community platforms that fuel LLMs absolutely should be compensated for their contributions so that companies like us can reinvest back into our communities to continue to make them thrive…We’re very supportive of Reddit’s approach.”

All of this comes after a report by The Washington Post revealed that, thanks to Google’s C4 dataset, millions of websites are inadvertently training AI technology. This happens simply through web scraping of data found online. As you might expect, websites such as Wikipedia, Medium, The New York Times, and Open Data Science Conference have been used to train models such as Meta’s LLaMA.

It makes sense that data from ODSC and many others is considered valuable for training models. But more and more websites are beginning to push back on their information being used to train models without compensation. In Chandrasekar’s case, his hope is simple: revenue from AI companies that train their models on this data can become a source of funding to enrich the same communities that make these models possible.

If all goes well, it could become a feedback loop in which AI companies and the websites they scrape for data mutually benefit: the data stays fresh, and the communities flourish.

Originally posted on OpenDataScience.com

Written by ODSC - Open Data Science