Evan X. Merz

Programmer / Master Gardener / Doctor of Music / Curious Person

Caching: The most important topic in software development

I increasingly think that caching is the most crucial concept required to be a good software developer. Understanding caching is vital at the micro level of computer engineering, at the macro level of system design, and at the level in between, software engineering. You really can't be a good software developer if you don't have a strong understanding of caching.

This is the third in a series of three articles about caching. I wrote one article about how caching is limiting the AI industry in a way that nobody is talking about. I wrote another article about how Spotify's failure to understand how and when to use caching is hurting the user experience of their app. This article will give you three reasons why a deep understanding of caching is one of the most important skills for a software developer to master.

SPOILER: It's because the data structures that underlie all caches are critical to everything that software developers do.

An AI generated image of a hash table as a tree.

1. You cannot understand the web without understanding caching

How many layers of caching are involved in any web request?

Take a minute to think about it. It's not an easy question to answer. In fact, the correct answer is that it varies depending on context, but it's generally at least three.

  1. All web browsers have at least one cache layer.
  2. Most websites use a Content Delivery Network (CDN) that caches static assets such as images.
  3. Most servers cache requests or pieces of requests.

And that is a simplified view of a very simple website with few users. Once you get databases involved, you're almost certainly introducing another layer of caching. The web server will typically have one layer of caching around requests, or pieces of requests, and another layer of caching around database access.
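To make the server-side layer concrete, here is a minimal Python sketch of caching around database access. The names `TTLCache`, `get_user`, and `fetch_user_from_db` are made up for illustration, and the 30-second lifetime is an arbitrary assumption. A real service would more likely use a shared cache than an in-process dictionary, but the shape of the logic is the same.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry time, cached value)
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: skip the expensive call
        value = loader()  # miss or expired entry: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value


def fetch_user_from_db(user_id: int) -> dict:
    """Hypothetical stand-in for a slow database query."""
    return {"id": user_id, "name": f"user-{user_id}"}


user_cache = TTLCache(ttl_seconds=30)


def get_user(user_id: int) -> dict:
    # Only the first call per user within 30 seconds reaches the database.
    return user_cache.get_or_load(
        f"user:{user_id}", lambda: fetch_user_from_db(user_id)
    )
```

The expiry time is the simplest possible invalidation strategy: readers may see data that is up to 30 seconds old, which is the usual trade-off a cache asks you to accept.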

The fact of the matter is that you can't understand how the web works without understanding the way the various standard layers of caching interact. Since almost any software you write is going to interact with the web in some way, a software developer has to have a basic understanding of web related caching in order to write effective software.

2. Misuse or underuse of caching is a common source of failure as systems are built or scaled

Using caching incorrectly, or not at all, is one of the most common sources of failure when building or scaling a web service.

I was once at a company where they had a notification list for each user. They were struggling with the scalability of their website. They didn't understand why everything seemed to load so slowly, especially notifications. Well, I asked how they were storing notifications, and the answer was a search-optimized database called Elasticsearch. Elasticsearch is a great tool ... for search. However, every new document has to be analyzed and written into the search index, and that's expensive for data that is written constantly and only ever looked up by user. This was a failure to understand how notifications should be stored or cached to make them fast for users to access.
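I can't show what that company ended up building, so treat the following as a sketch of the general idea only: notifications are looked up by user, so a structure keyed by user ID makes reads cheap. The `NotificationStore` class and the cap of 50 notifications per user are assumptions for illustration.

```python
from collections import deque
from typing import Deque, Dict, List


class NotificationStore:
    """Keeps the most recent notifications per user, newest first."""

    def __init__(self, max_per_user: int = 50):
        self._max_per_user = max_per_user
        self._by_user: Dict[str, Deque[str]] = {}

    def add(self, user_id: str, message: str) -> None:
        inbox = self._by_user.setdefault(
            user_id, deque(maxlen=self._max_per_user)
        )
        # Newest goes on the left; once the deque is full, the oldest
        # entry is dropped from the right automatically.
        inbox.appendleft(message)

    def latest(self, user_id: str, limit: int = 10) -> List[str]:
        inbox = self._by_user.get(user_id, ())
        return list(inbox)[:limit]
```

Both operations are a single hash lookup plus a small amount of work on one user's list, no matter how many users or notifications exist in total.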

I was at an ecommerce company where list pages were slow. The problem was that the list pages had lots of parameters, and every time a parameter was changed the page had to hit the API to get a new list of items. Well, the items didn't change that much, and the parameters changed even more rarely. We figured out that it would be easier just to cache the list for every set of parameters on a given list page, and then the list pages loaded almost instantly. The difference was night and day.
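Here is roughly what caching by parameter set can look like in Python. This is a simplified sketch, not the code we shipped: `ListPageCache`, `make_cache_key`, and the `fetch_items` callback are hypothetical names, and the cache lives in process memory for brevity.

```python
from typing import Callable, Dict, List, Mapping, Tuple

CacheKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def make_cache_key(page: str, params: Mapping[str, str]) -> CacheKey:
    """Sort the parameters so the same filters in any order share one key."""
    return (page, tuple(sorted(params.items())))


class ListPageCache:
    def __init__(
        self, fetch_items: Callable[[str, Mapping[str, str]], List[dict]]
    ):
        self._fetch_items = fetch_items  # the slow API call
        self._pages: Dict[CacheKey, List[dict]] = {}

    def get(self, page: str, params: Mapping[str, str]) -> List[dict]:
        key = make_cache_key(page, params)
        if key not in self._pages:
            # First request for this combination: hit the API once.
            self._pages[key] = self._fetch_items(page, params)
        return self._pages[key]

    def invalidate_page(self, page: str) -> None:
        """Drop every cached parameter combination for one list page."""
        for key in [k for k in self._pages if k[0] == page]:
            del self._pages[key]
```

Normalizing the key matters as much as the cache itself. Without the sort, `color=red&size=9` and `size=9&color=red` would be stored as two separate entries and the hit rate would suffer.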

The point is that almost all scaling problems can be solved by changing the way items are stored or cached.

3. Hash tables are crucial to software developer interviews

The final, crucial reason that all good software developers need a deep understanding of caching is that they need it to be hired in the first place. Questions about caching are critical to both coding problems and system design problems.

Most caches are implemented using a data structure called a hash table (also known as a lookup table, dictionary, map, or associative array). In almost every coding interview I've ever been part of (on either side), there has been a question that involved the use of a hash table. Some of the most frequently asked questions are based on hash tables: Two Sum, Ransom Note, and Top K Frequent Elements.
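Two Sum is a good illustration of why. The usual statement of the problem is: given a list of numbers and a target, return the indices of two numbers that add up to the target. One common hash table solution, sketched here in Python with a plain `dict`, remembers every value it has already seen so that each lookup is a single hash access.

```python
from typing import Dict, List, Optional, Tuple


def two_sum(nums: List[int], target: int) -> Optional[Tuple[int, int]]:
    """Return indices (j, i) with nums[j] + nums[i] == target, or None."""
    seen: Dict[int, int] = {}  # value -> index where it was seen
    for i, n in enumerate(nums):
        j = seen.get(target - n)
        if j is not None:
            return (j, i)
        seen[n] = i
    return None


print(two_sum([2, 7, 11, 15], 9))  # prints (0, 1)
```

The dictionary turns a nested-loop search into a single pass over the list, which is the same trade a cache makes: spend some memory to avoid repeating work.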

Caching is equally important to any system design interview. Whether you're designing Netflix, or a parking garage, or any other service, if you forget the cache layer, then you won't get the job.