Effectively Leveraging Browser Cache with Some Bundler Help
Hello there 👋, I feel that caching is an under-documented and overlooked aspect of reducing the loading times of web apps. I previously worked on reducing the TTI and LCP of a client-rendered web app with a large JS bundle, and thought I would write about what I learned.
For this post, I will specifically focus on caching your build output to ensure that your web apps load Blazingly Fast™️.
A Refresher on Browser Caching
So, before we delve into how bundlers help with caching, I first want to cover a few different ways of caching in the browser.
Short-term Cache Using the Max-Age Directive
I'll begin with the simplest form of caching in the browser: the max-age directive of the Cache-Control header. The max-age directive lets us specify a value in seconds, instructing the browser to treat the cached response as fresh for that duration.
However, this approach has a drawback: until that time elapses, the browser won't check with the server whether the response has changed. For instance, if you need to deploy a critical bug fix in your JS code, the browser won't download the updated bundle as long as the cached bundle is considered "fresh".
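To make the freshness rule concrete, here is a minimal sketch of how a cache might decide whether a stored response is still fresh. The helper names parseMaxAge and isFresh are hypothetical, and real browsers also account for other directives and headers that this ignores.

```typescript
// Extract the max-age value (in seconds) from a Cache-Control header value.
// Returns null when the directive is absent.
function parseMaxAge(cacheControl: string): number | null {
  const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
  return match ? Number(match[1]) : null;
}

// A stored response counts as fresh while its age is below max-age.
function isFresh(cacheControl: string, ageSeconds: number): boolean {
  const maxAge = parseMaxAge(cacheControl);
  return maxAge !== null && ageSeconds < maxAge;
}

const header = "public, max-age=600"; // cache for 10 minutes

console.log(isFresh(header, 30)); // true: served from cache, no request made
console.log(isFresh(header, 900)); // false: the browser goes back to the server
```

While isFresh returns true, nothing the server deploys will reach that user, which is exactly the bug-fix problem described next.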
As a result, short-term cache is not ideal for caching your JS bundles.
Understanding ETags and Their Advantages Over Using the max-age Directive
I don’t want to delve too deeply into the specifics of how ETags (also known as Entity Tags) can be used for caching, but I will provide a brief overview:
Each time the server sends a response, it also includes an ETag header in the response headers. This header contains an identifier for that version of the response, typically a hash generated from its contents.
When the browser requests the same resource next time, it sends the ETag it received in an If-None-Match request header. The server then compares the value of this header to the ETag of the current version of the resource. If they match, it responds with a status code of 304 Not Modified, without a body, indicating to the browser that its cached version is still up to date.
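The exchange above can be sketched as a small function that plays the server's role. This is an illustration under assumptions: respond and contentHash are hypothetical names, and the tiny djb2 hash is only a stand-in for whatever hash a real server would use.

```typescript
// Illustrative stand-in for a real content hash (djb2, XOR variant).
function contentHash(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

interface CachedResponse {
  status: 200 | 304;
  headers: Record<string, string>;
  body: string | null;
}

// Answer a request for `body`, given the If-None-Match value (if any)
// that the browser sent along.
function respond(body: string, ifNoneMatch?: string): CachedResponse {
  const etag = `"${contentHash(body)}"`;
  if (ifNoneMatch === etag) {
    // The browser's copy matches the current version: no body needed.
    return { status: 304, headers: { ETag: etag }, body: null };
  }
  // "no-cache" lets the browser store the response but makes it
  // revalidate with the server before reusing it.
  return { status: 200, headers: { ETag: etag, "Cache-Control": "no-cache" }, body };
}

const first = respond("console.log('v1')");
const repeat = respond("console.log('v1')", first.headers.ETag);
const afterDeploy = respond("console.log('v2')", first.headers.ETag);

console.log(first.status, repeat.status, afterDeploy.status); // 200 304 200
```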
This method is an improvement over relying on the max-age directive alone: instead of setting an expiration time and assuming the resource is fresh until that time elapses, the browser actively validates each resource with the server. However, this approach also means an extra round trip to the server for every resource, even when nothing has changed.
You could also use stale-while-revalidate here, but it would mean that the outdated content is being used while it is being revalidated in the background. If you are shipping a bug fix, this is not what you want since your users will still use a broken build at least once.
Immutable Resources and Hashing
We can instruct the browser to cache a resource for a very long time by combining a long max-age with the immutable directive, for example Cache-Control: max-age=31536000, immutable. While the resource is fresh, the browser will not revalidate it and will always use it from cache, saving a trip to the server. However, this creates a new problem: how do we tell the browser that a newer version of the resource is available?
The way to address this issue is to include the version or a hash of the content in the resource URL itself. So, whenever you update the contents of the file, generate a new hash based on its contents. The URL changes with it, so the browser treats the new version as a resource it has never seen and downloads it.
If you have ever seen your JS bundle URL like bundle.3cdbc6.js, this is the reason behind it.
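As a rough sketch of how such a file name could be derived, the snippet below puts a hash of the contents between the base name and the extension. The names hashedFileName and contentHash are hypothetical, the djb2 hash is an illustrative stand-in, and real bundlers use their own hashing and naming schemes.

```typescript
// Illustrative stand-in for a real content hash (djb2, XOR variant).
function contentHash(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

// Insert the content hash between the base name and the extension.
function hashedFileName(fileName: string, contents: string): string {
  const dot = fileName.lastIndexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const ext = dot === -1 ? "" : fileName.slice(dot);
  return `${base}.${contentHash(contents)}${ext}`;
}

// Safe for hashed URLs only: new contents always get a new URL.
const IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable";

const v1 = hashedFileName("bundle.js", "console.log('v1')");
const v2 = hashedFileName("bundle.js", "console.log('v2')");

console.log(v1 === v2); // false: the fix ships under a new URL
console.log(IMMUTABLE_CACHE_CONTROL);
```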
Understanding the Role of Bundlers
When it comes to the role of bundlers, two key factors come into play: deterministic builds and managing small changes in input.
Deterministic builds
Ensuring that builds are consistently deterministic is crucial. When running the build process on the same code multiple times, the output should always be identical. This is significant because any randomisation in the bundler would cause cache invalidation with every build.
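One common source of non-determinism is output that depends on the order in which modules happened to be discovered. The sketch below shows the general idea of removing it by sorting before emitting. emitBundle is a hypothetical function for illustration only, not how any particular bundler works internally.

```typescript
// Concatenate modules in sorted ID order so the output does not depend on
// the order in which the modules were added to the map.
function emitBundle(modules: Record<string, string>): string {
  return Object.keys(modules)
    .sort()
    .map((id) => `/* ${id} */\n${modules[id]}`)
    .join("\n");
}

// Same modules, discovered in a different order.
const buildA = emitBundle({ "b.js": "export const b = 2;", "a.js": "export const a = 1;" });
const buildB = emitBundle({ "a.js": "export const a = 1;", "b.js": "export const b = 2;" });

console.log(buildA === buildB); // true: identical output, identical content hash
```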
Small changes in input result in small changes in output
We usually change only small parts of our codebase via pull requests. A small change in the input should result in a small change in the output as well.
So, let’s consider the following file dependency structure: Module 1 imports Asset, and Module 2 imports Module 1.
If you use content hashes and each file embeds the URLs of its imports directly, any change in the Asset will result in its URL changing. This will cause the contents of Module 1 to change, thereby updating its content hash and consequently leading to a change in the contents of Module 2. Ultimately, this would invalidate the cache for all your files.
Bundlers use a manifest file to keep track of all the URLs, preventing this problem from occurring. This bundler feature makes it easier to use content hashes for files. Chunks reference other chunks through the manifest file rather than referencing them directly, so a change to the Asset only changes the Asset's URL and the manifest.
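To see the difference, here is a toy model of the Asset, Module 1, Module 2 chain built both ways. Everything in it is an assumption made for illustration: the SourceFile shape, the function names, and the djb2 stand-in hash. Real bundlers implement the manifest inside their runtime rather than as a plain map.

```typescript
// Illustrative stand-in for a real content hash (djb2, XOR variant).
function contentHash(text: string): string {
  let hash = 5381;
  for (let i = 0; i < text.length; i++) {
    hash = ((hash * 33) ^ text.charCodeAt(i)) >>> 0;
  }
  return ("0000000" + hash.toString(16)).slice(-8);
}

interface SourceFile {
  name: string;
  code: string;
  imports: string[]; // names of the files this one depends on
}

type UrlMap = Record<string, string>;

// Direct references: a file's output embeds the hashed URLs of its imports,
// so a dependency's new URL changes the importer's content and hash too.
// Assumes `files` is ordered dependencies-first.
function buildWithDirectRefs(files: SourceFile[]): UrlMap {
  const urls: UrlMap = {};
  for (const file of files) {
    const embedded = file.imports.map((dep) => urls[dep]).join(",");
    urls[file.name] = `${file.name}.${contentHash(file.code + "|" + embedded)}.js`;
  }
  return urls;
}

// Manifest indirection: output only mentions stable logical names; the
// returned manifest maps each name to its current hashed URL.
function buildWithManifest(files: SourceFile[]): UrlMap {
  const manifest: UrlMap = {};
  for (const file of files) {
    const embedded = file.imports.join(",");
    manifest[file.name] = `${file.name}.${contentHash(file.code + "|" + embedded)}.js`;
  }
  return manifest;
}

// Names whose URL differs between two builds.
function changedFiles(before: UrlMap, after: UrlMap): string[] {
  return Object.keys(after).filter((name) => before[name] !== after[name]);
}

function project(assetCode: string): SourceFile[] {
  return [
    { name: "asset", code: assetCode, imports: [] },
    { name: "module1", code: "use(asset)", imports: ["asset"] },
    { name: "module2", code: "use(module1)", imports: ["module1"] },
  ];
}

const direct = changedFiles(buildWithDirectRefs(project("logo-v1")), buildWithDirectRefs(project("logo-v2")));
const viaManifest = changedFiles(buildWithManifest(project("logo-v1")), buildWithManifest(project("logo-v2")));

console.log(direct); // ["asset", "module1", "module2"]
console.log(viaManifest); // ["asset"]
```

In the manifest version, the manifest itself still changes on every build, so it needs to live somewhere that is revalidated rather than cached indefinitely.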
So, if you were building a web app tomorrow, what would I recommend to you?
Since content hash cannot be used for HTML files (the URL of index.html needs to stay the same), ETag is a really good strategy here. You only download a new HTML file if the contents of the HTML file itself change or the URL to your JS entrypoint changes.
When it comes to the JS entrypoint (and other initial chunks), use ETag for these as well. If their URLs contained a content hash, every change to them would also change the contents of your HTML file and invalidate its cached copy.
For everything else, use content hashes in the URLs to aggressively cache all assets in the browser.
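Put together, these recommendations could look something like the following when choosing a Cache-Control value per file. This is a sketch under assumptions: cacheControlFor is a hypothetical helper, and the pattern assumes hashed assets are named like name.<8 hex characters>.ext, which will differ depending on your bundler's configuration.

```typescript
// Assumed naming convention for hashed assets: name.<8 hex chars>.ext
const HASHED_ASSET = /\.[0-9a-f]{8}\.[a-z0-9]+$/i;

function cacheControlFor(path: string): string {
  if (HASHED_ASSET.test(path)) {
    // The URL changes whenever the content does, so cache it for a year
    // and never revalidate.
    return "public, max-age=31536000, immutable";
  }
  // HTML and the un-hashed entrypoint: the browser may store them but must
  // revalidate (via ETag / If-None-Match) before every reuse.
  return "no-cache";
}

console.log(cacheControlFor("/index.html")); // no-cache
console.log(cacheControlFor("/main.js")); // no-cache
console.log(cacheControlFor("/chunk.3cdbc6aa.js")); // public, max-age=31536000, immutable
```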