Will you let ChatGPT access your content?

ChatGPT's product loop and bootstrapping LLMs

Most people are looking at the graph below and professing ChatGPT’s decline. I believe OpenAI knew this would happen and pulled off an incredible bootstrapping strategy.

Here’s why:

ChatGPT’s new content policy

OpenAI is giving website owners the ability to choose whether they want their data crawled by ChatGPT.

Website owners, or publishers for simplicity, can decide whether they want their content to be used in ChatGPT’s responses. It’s driven by publishers complaining (understandably) that their content was used, without consent, to make a commercial product.
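In practice, an opt-out like this is usually expressed in a site's robots.txt file. Here is a minimal sketch of what that could look like, assuming OpenAI's crawler identifies itself with the user agent GPTBot (a name that is not mentioned in this post) and using a hypothetical example.com URL. The helper `is_allowed` is my own illustration, built on Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a publisher who opts out of OpenAI's crawler
# (assumed here to use the user agent "GPTBot") but allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder domain, not a real publisher.
print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/post"))     # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/post"))  # True
```

The point is that the choice is per crawler: a publisher can keep sending traffic through Google while withholding content from ChatGPT.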

The change prompted me to think about a classic network effects problem with an important caveat. Today, I want to cover the following:

  1. Will publishers share data?

  2. How can OpenAI incentivise publishers?

  3. Why OpenAI pulled off the biggest bootstrap in recent times

Will publishers want to share their data?

It depends on how the publisher makes money.

If a publisher makes money directly from the content on its website, it has no incentive to share that content with ChatGPT. Think paid blogs, newsletters or any other case where the content is the product. Unlike Google, ChatGPT will not send most of its users to your website. It does not provide links for you to click on. Even if it did (I suspect it will at some point), you would not click on them very often. ChatGPT is designed to give you answers without you having to navigate to a different page. If this were not the case, you’d just use Google.

If you make money indirectly from the content, you will share your content. Consider a brand like Nike: all visibility is good visibility. They have an established brand. If ChatGPT recommends the latest Nike shoe, it’s valuable for Nike. The benefit of sharing information with ChatGPT outweighs the benefit of withholding it. Even if a customer doesn’t go on to purchase in the same session, there is a benefit to them knowing about Nike’s latest shoe.

How can OpenAI incentivise publishers?

To incentivise all publishers to share data, OpenAI needs to leverage its product loop:

  1. Publishers need to be paid, directly or indirectly, to share content

  2. To be paid, ChatGPT needs to be used

  3. Users will only use ChatGPT if quality remains high

  4. Content is critical for LLM quality

The only way OpenAI can get owners of private data to publish content is by offering them money. There are two ways to do this: a) you offer a direct reward for crawling their site, or b) you increase the number of people landing on their website (which is what Google does). Providing a direct reward is hard. Even if OpenAI made a significant investment on this front, the individual amount earned by a publisher may not be sufficient.
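To see why a direct reward can end up too small, here is a back-of-the-envelope sketch. Both figures are made up purely for illustration (they are not OpenAI numbers), and the even split is a deliberately naive assumption; `payout_per_publisher` is a hypothetical helper.

```python
def payout_per_publisher(total_budget_usd: float, num_publishers: int) -> float:
    """Split a fixed budget evenly across publishers (a deliberately naive model)."""
    if num_publishers <= 0:
        raise ValueError("num_publishers must be positive")
    return total_budget_usd / num_publishers


# Hypothetical figures, chosen only to illustrate the dilution effect.
budget_usd = 100_000_000   # made-up annual budget for paying publishers
publishers = 10_000_000    # made-up number of participating websites

print(payout_per_publisher(budget_usd, publishers))  # 10.0 dollars per publisher per year
```

Even a large pot gets thin quickly once it is spread across every site on the web.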

Sending users to publishers’ websites is equally challenging. OpenAI would need to provide links to the original sources, either via blue links like Google or using plugins. You ask ChatGPT about a topic, it gives you the gist and then a link to click for more detailed information. I’m not optimistic about this outcome because it turns ChatGPT into a glorified search engine, which is kind of pointless.

Either way, let’s assume that OpenAI figures out how to do the above. To pay publishers a meaningful amount of money, ChatGPT needs enough users. Users will only use ChatGPT if the quality remains high. To maintain the quality of an LLM like ChatGPT, you need content to continuously refine and update it.

So there you go: you need enough users to make it interesting for publishers, users only stay if the product is good enough, and for the product to be good enough you need the content.

How OpenAI bootstrapped the network

This is a classic network effects problem. It’s a bit like your local market: sellers want to buy a stall only if there are enough buyers, and buyers will come to the market only if there are enough sellers.

Getting the market going, i.e. the initial push, is called bootstrapping. OpenAI has killed it on this front. Most people look at the graph below and go “the blue line is tanking”. Truth be told, the OpenAI team probably expected this.

They knew they were going up against Google, which has historically invested more in AI than any other company. Google has a monopoly on search. OpenAI needed something to kick things off and give it a boost, and that was ChatGPT’s launch in November 2022.

The caveat in OpenAI’s favour

Unlike other networks, you don’t lose quality quickly with large language models.

At your local market, you will leave as soon as you notice there aren’t enough sellers. This won’t happen with ChatGPT because the quality of the network compounds. Even if every single publisher decided to stop sharing content with OpenAI, ChatGPT would remain at its current quality. It is future quality that would suffer. In addition, ChatGPT has collected incredible data on what users are asking, what answers it gave and the feedback on those answers.

This was the bootstrapping effort of a lifetime, and only time will tell if it’s good enough.