Innovation Insights
by Stephen Shapiro


AI Inbreeding

AI is creating massive amounts of content quickly. And a lot of it ends up posted on the internet.

This got me thinking about the future of Large Language Models (LLMs).

At some point, a large percentage of content on the internet will be AI-generated. And if the LLMs continue to be trained using that content, a form of inbreeding will occur.

As the models get trained on the content they already spit out, any newly generated content will be a regurgitated version of that old information. The quality, instead of improving, will decline rapidly.

To make matters even worse, AI hallucinations (plausible-sounding but inaccurate content generated by AI) will raise the level of invalid information on the internet.

As the LLMs get trained on old or invalid information, the usefulness and accuracy of what they generate will decrease over time.

I knew I wasn’t the first person to consider this, so I Googled (not GPTed) the concept and found an interesting article on this topic.

Researchers call this concept “model collapse.”
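
For the technically curious, here’s a minimal sketch of that feedback loop (my own toy illustration in Python, not from the article). The “model” is just a Gaussian fitted to data; each generation is trained only on samples produced by the previous generation’s model.

```python
# Toy illustration of model collapse: a "model" (here, just a Gaussian
# fit) is retrained each generation solely on its own synthetic output.
import numpy as np

rng = np.random.default_rng(seed=0)

# Generation 0: "human-written" data with mean 0 and standard deviation 1.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(31):
    # "Train": estimate the distribution from the current data.
    mu, sigma = data.mean(), data.std()
    if generation % 5 == 0:
        print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
    # "Publish" synthetic content, then train the next generation on it alone.
    data = rng.normal(loc=mu, scale=sigma, size=50)
```

Run it and the spread tends to shrink generation after generation: the rare, interesting “tail” content disappears first, leaving an ever-narrower caricature of the original data.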

This reminds me of something we used to do years ago with a Xerox machine. If you photocopied a picture of your face, then copied the copy, and copied that copy again, after a few iterations you looked like a cartoon. Each copy looked less and less like reality.

Will we get cartoon versions of reality with AI in the future? As the content copies itself over and over, will it be less and less accurate?

And given the hallucinations, it’s even worse. It’s not a copy of a copy but rather a copy of a lie.

It will be interesting to see what happens.