The AI caching problem that everyone is ignoring
There is a lot of talk about AI lately, but I haven't heard much chatter about one particularly difficult problem facing the industry: caching.
It's not a sexy topic to discuss. It involves some esoteric technical intricacies that only web developers will understand. But caching is a big problem for AI.
Scaling traditional web services
A typical web request goes from your web browser to a web server somewhere in the world, which processes the request and sends back the content you asked for. That content might be a blog post, a social media post, a video on a streaming site, or sports scores. However, in a traditional web service, many users are requesting the same thing. In other words, the server is giving the exact same sports scores to every user who requests them.
Because users are requesting the same thing, the web service can optimize its system by caching that information: storing it in a way that is much faster and cheaper to access. One server running a caching system like Redis can hold the responses to millions of different requests and serve each one in a tiny fraction of a second. Those requests never hit the main application server, which is much more expensive to run.
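To make this concrete, here is a minimal sketch of the cache-aside pattern using the redis-py client. The key scheme, the 30-second TTL, and the fetch_scores_from_origin helper are my own illustrative choices, not anything from a specific production system.

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)

def fetch_scores_from_origin(league: str) -> dict:
    # Stand-in for the expensive request to the main application server.
    return {"league": league, "home": 3, "away": 1}

def get_sports_scores(league: str) -> dict:
    """Return scores for a league, hitting the origin only on a cache miss."""
    key = f"scores:{league}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: served without touching the origin

    scores = fetch_scores_from_origin(league)  # cache miss: do the real work
    r.set(key, json.dumps(scores), ex=30)      # store the response for 30 seconds
    return scores
```

Every user asking for the same league gets the same bytes back, and that is exactly what makes the pattern so cheap.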
This is how traditional web services scale. If performant caching didn't work, then all web services would be much slower and much more expensive.
AI requests can't be cached
The fundamental problem facing everyone in the AI race is that requests to AI can't be cached in the same way. Even if users are making very similar requests, they might not want exactly the same response. In other words, when user A tries to generate a cartoon image of a cat, they probably want one that looks like their cat. When user B makes the same request, they probably don't want the same image as user A.
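Here is a toy illustration of why the exact-match pattern above breaks down for generative requests. The generate function is a stand-in, not a real model: identical prompts produce identical cache keys, but sampled generation means each call, and each user, expects a different output.

```python
import hashlib
import random

def cache_key(prompt: str) -> str:
    # Identical prompts always hash to identical cache keys.
    return hashlib.sha256(prompt.encode()).hexdigest()

def generate(prompt: str) -> str:
    # Stand-in for a generative model: the output depends on random
    # sampling, not just on the prompt, so identical inputs diverge.
    seed = random.randrange(2**32)
    return f"image-of-{prompt.replace(' ', '-')}-variant-{seed}"

prompt = "a cartoon drawing of my cat"
print(cache_key(prompt) == cache_key(prompt))  # True: the keys match...
print(generate(prompt) == generate(prompt))    # False: ...but the outputs don't
```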
Everyone knows that AI involves some very complex math: enormous neural networks such as transformers and diffusion models (not, for the most part, the Generative Adversarial Networks of a few years ago). These networks, even when they're done training and only serving requests, can't be cached like traditional web requests. They need to be run on actual servers, every single time.
This is what has resulted in the big debate about server resources. AI inherently requires more computing power than traditional web services.
The big problem is profitability
The computing issue is a big problem on its own, but focusing on it alone means missing the larger one.
AI can't be profitable if it can't be cached.
If traditional web services couldn't be cached, then they would have a hard time being profitable. In other words, if the companies had to pay for many more servers to run their sites, then users would have to pay a monthly fee to use their services. If everyone had to pay a monthly fee to use social media, would they do so? The number of users would go way down. What about sites like Craigslist or YouTube or Yelp? Would those be able to exist if every user had to pay a monthly fee?
AI services might be free for now, and that's nice for regular people, but the services aren't going to last if they aren't making money.
And they won't make money unless they can solve the caching problem.
So the big problem facing the AI industry today is that we have a bunch of huge companies racing to own a field that, so far, is a money loser. It probably can't be profitable without a significant innovation in hardware or software.
Potential solutions to the AI caching problem
There are several potential solutions to this problem.
A hardware breakthrough could solve the issue. If someone found a way to make hardware that can execute AI models extremely efficiently, then the requests wouldn't need to be cached.
A software breakthrough could also solve the issue. If someone invented a way to break down AI requests into smaller pieces that could be cached, then that would make the services potentially scalable.
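One direction along these lines that has been explored is "semantic" caching: embed each prompt as a vector and reuse a cached response when a new prompt lands close enough in embedding space. Below is a minimal sketch of the idea; embed() is a hypothetical placeholder for a real embedding model, and the 0.95 similarity threshold is arbitrary.

```python
import numpy as np

cache: list[tuple[np.ndarray, str]] = []  # (prompt embedding, cached response)
THRESHOLD = 0.95                          # illustrative similarity cutoff

def embed(prompt: str) -> np.ndarray:
    # Placeholder: a real system would call an embedding model here.
    rng = np.random.default_rng(abs(hash(prompt)) % 2**32)
    v = rng.standard_normal(8)
    return v / np.linalg.norm(v)

def lookup(prompt: str) -> str | None:
    q = embed(prompt)
    for vec, response in cache:
        if float(np.dot(q, vec)) >= THRESHOLD:  # cosine similarity of unit vectors
            return response                     # near-duplicate: reuse the answer
    return None

def store(prompt: str, response: str) -> None:
    cache.append((embed(prompt), response))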
But I have a lot of experience in web development and in artificial intelligence. I'm not in academia (at the moment), but I'm familiar enough with the complexities involved to know that these are extremely difficult problems.
In fact, there are some problems in computing that we know are intractable: solvable in principle, but not without more computing power than the universe can offer. The caching problem in AI looks a lot like one of those problems that can't be solved efficiently.
Of course, the counterargument to this is the existence of the human brain. The human brain somehow runs these kinds of computations very efficiently. Its existence implies that there must be some way to optimize these queries.
I'm looking forward to seeing how the industry takes on this problem.