How Google is wasting your bandwidth
Nicholas Piël | November 30, 2009
Using a Content Delivery Network (CDN) is a method to improve the performance of your website. Some of the reasons for using a CDN are:
- Placing content geographically close to the end user and thus lowering latency and increasing bandwidth.
- Increasing the number of parallel downloads at the client by distributing content over different domains
- Offloading the burden on your servers
- Facilitating long-term caching by using a robust source for libraries
This last point especially is why I looked at Google’s CDN for Ajax libraries. It is a beautiful idea. When more people are using the same CDN, the cost of downloading an Ajax library can be ignored because it is very likely that the web browser will already have the library in its belly. Wonderful!
For example, when I try to download the Prototype library everything goes well and Google’s CDN spits the following back:
Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:31:50 GMT
Expires:Tue, 30 Nov 2010 13:56:34 GMT
As you can see, Google tells your browser to cache the file for a full year, as it should. Now, let’s look at what happens when trying this with jQuery 1.3.2:
Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:40:15 GMT
Expires:Tue, 30 Nov 2010 14:40:15 GMT
Again, everything is OK. Now, let’s try a different version, jQuery 1.3:
Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:41:42 GMT
Expires:Mon, 30 Nov 2009 15:41:42 GMT
Huh? When requesting the 1.3 version, Google is basically telling us to ‘remember it only for one hour’. This is wrong imho. When you specify 1.3, you are telling Google you want ‘the latest version in the 1.3 series’. The jquery.com site links to the 1.3 version as well. This means that for a page hit on the jQuery website you will re-download 60k of minified jQuery goodness when this file is not in your cache. A better approach would be for Google to just let the client do its own cache revalidation (which it can do by using the ‘If-Modified-Since’ header).
But wait, there is more. WordPress, for example, adds an extra version argument to the file (?ver=<bla>). This can be handy when you want to generate a certain script or CSS file dynamically, and it really should not be a problem with Google’s CDN, should it? Well, let’s see what happens when we request http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js?ver=1.3.2
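To make the difference concrete, here is a minimal sketch that computes how long a cache may reuse each response, using the Date and Expires values quoted above. It assumes that no `Cache-Control: max-age` directive is present (that would take precedence over Expires); the helper name `freshness_lifetime` is made up for this illustration.

```python
from email.utils import parsedate_to_datetime


def freshness_lifetime(date_header: str, expires_header: str) -> int:
    """Seconds a cache may reuse a response, judged by Date and Expires only."""
    served = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return int((expires - served).total_seconds())


# (Date, Expires) pairs copied from the three responses shown in the post.
responses = {
    "prototype": ("Mon, 30 Nov 2009 14:31:50 GMT", "Tue, 30 Nov 2010 13:56:34 GMT"),
    "jquery 1.3.2": ("Mon, 30 Nov 2009 14:40:15 GMT", "Tue, 30 Nov 2010 14:40:15 GMT"),
    "jquery 1.3": ("Mon, 30 Nov 2009 14:41:42 GMT", "Mon, 30 Nov 2009 15:41:42 GMT"),
}

lifetimes = {name: freshness_lifetime(d, e) for name, (d, e) in responses.items()}
for name, seconds in lifetimes.items():
    print(f"{name}: {seconds} s")
```

The fully versioned files come out at roughly a year, while the 1.3 alias is fresh for only 3600 seconds.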
Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 15:00:00 GMT
Expires:Fri, 01 Jan 1990 00:00:00 GMT
Last-Modified:Mon, 23 Nov 2009 18:54:21 GMT
Holy cow, Google invented time travel! The implications of this are pretty big: this may affect a lot of people with WordPress blogs who were ‘smart enough’ to use the Google CDN but never really tested whether it worked.
Basically what this means is that http://ajax.googleapis.com isn’t really your performance safety net. You need to know exactly what you’re doing, otherwise it will bite you back and you might be better off just hosting those libraries on your own site. Thus my recommendation would be to use the Google CDN, but specify exactly which version of the library you need and make sure you do not provide any arguments.
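A recommendation like this is easy to check automatically. The following is a rough sketch, not an official tool: the function `cdn_url_problems` is hypothetical, it assumes the `/ajax/libs/<library>/<version>/<file>` path layout seen in the URL above, and it treats "exactly specified" as a version with at least three numeric components, which is my own assumption.

```python
import re
from urllib.parse import urlsplit


def cdn_url_problems(url: str) -> list[str]:
    """Return the reasons a CDN library URL may defeat long-term caching."""
    parts = urlsplit(url)
    problems = []
    if parts.query:
        problems.append("query string present")
    # Assumed layout: /ajax/libs/<library>/<version>/<file>
    segments = parts.path.strip("/").split("/")
    version = segments[3] if len(segments) >= 5 else ""
    # Assumption: a pinned version has three or more numeric parts, e.g. 1.3.2
    if not re.fullmatch(r"\d+(\.\d+){2,}", version):
        problems.append("version is not fully specified")
    return problems


base = "http://ajax.googleapis.com/ajax/libs/jquery"
print(cdn_url_problems(f"{base}/1.3.2/jquery.min.js"))            # no problems
print(cdn_url_problems(f"{base}/1.3.2/jquery.min.js?ver=1.3.2"))  # query string
print(cdn_url_problems(f"{base}/1.3/jquery.min.js"))              # version alias
```

Running something like this over the script tags of a theme or template would flag both pitfalls described in this post.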
When you request 1.3, Google serves the latest 1.3 version, that is, 1.3.2. That would explain why it is not cached.
Very interesting findings. Personally not that fond of CDNs, usually more headache than pleasure imho!
I can understand the implication for the users (sites take longer to load) but does anyone really have to worry about bandwidth anymore?
@Michael,
Page load time is getting more and more important. People are not willing to wait; page load time already influences your bounce and conversion rates and might soon even influence your search ranking.
But yes, it is wrong to talk about bandwidth in the title, but then again the complete title is somewhat exaggerated, as it is not really Google that is screwing up your latency; it is the misuse of their CDN which is at the heart of the problem.
Good catch.
I’ve never used a CDN before, because I’m kinda scared to download a JS file from third-party servers. I try to avoid this as much as possible. And your post gave me one more reason to avoid CDNs!
I don’t see the point.
jQuery 1.3 is linked to the latest version. Setting the expiration time to one hour is completely understandable: maybe a new version will be released in the next hour?
However, your final note is correct.
Google also sets:
So, if the version changes the client will download the new version independent of the expires header.
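[...] In “How Google is wasting your bandwidth”, someone found that Google AJAX Libraries handles a few things in a rather extreme way; if you are not careful, it will actually cost your users quite a bit of extra bandwidth. [...]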
I think you have misunderstood the way HTTP cache revalidation works. There are two important points to clarify:
1. Cache revalidation only happens *after* a cache entry goes stale.
2. Cache revalidation prevents re-downloading the entire content if it has not been modified.
So, what really happens when you use jquery version 1.3 from Google? The cached version will be used for 1 hour without revalidation, then the content will be revalidated once every hour to make sure it hasn’t changed. If the content hasn’t changed, the revalidation request will be a lightweight 304 (Not Modified) response, thus avoiding an unnecessary 60k download.
Let me know if I’m misunderstanding something myself. See http://www.freesoft.org/CIE/RFC/2068/168.htm for details.
…On the other hand, the “time travel” issue for libraries with URL parameters is a completely separate issue. It seems that caching is disabled for requests with URL parameters. Maybe that’s a bug.
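The revalidation cycle described in this comment can be sketched as follows. This is a simplified model and not how any particular browser implements it: the names `plan_request` and `body_source` are hypothetical, and the Last-Modified value is borrowed from the query-string response in the post purely for illustration.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def plan_request(expires: datetime, last_modified: datetime, now: datetime):
    """Decide what a cache does with a stored entry at time `now`."""
    if now < expires:
        # Still fresh: serve from cache without touching the network.
        return ("use-cache", {})
    # Stale: ask the server whether the stored copy is still valid.
    header = format_datetime(last_modified, usegmt=True)
    return ("revalidate", {"If-Modified-Since": header})


def body_source(status: int) -> str:
    """A 304 means the cached body is reused; anything else is a new download."""
    return "cache" if status == 304 else "network"


expires = datetime(2009, 11, 30, 15, 41, 42, tzinfo=timezone.utc)
last_modified = datetime(2009, 11, 23, 18, 54, 21, tzinfo=timezone.utc)  # illustrative

fresh_now = datetime(2009, 11, 30, 15, 0, 0, tzinfo=timezone.utc)
stale_now = datetime(2009, 11, 30, 16, 0, 0, tzinfo=timezone.utc)

print(plan_request(expires, last_modified, fresh_now))
print(plan_request(expires, last_modified, stale_now))
print(body_source(304))
```

Within the hour nothing is sent at all; after it, the cost is one small conditional request rather than a full 60k download, provided the server answers 304.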
Hi Brett,
Thanks for your comment. I did some testing in Safari and it revalidates the cache *before* it goes stale. However, after some more testing it seems that Safari is kinda special in this. Opera and Firefox do not show this behavior.
I should also note that Safari revalidates in an asynchronous way: it first shows you the cached copy and then puts the validity check on some sort of queue. Closing the browser before it fires the conditional GET does not seem to clear this queue, as reopening simply fires an ‘If-Modified-Since’ request.
From reading the specification:
It seems to only say how to handle the ‘must-revalidate’ directive AFTER the cache goes stale. And when interpreting the next part:
One could say that the gist of providing a 1.3 version by the CDN is this: “always make sure the client has the latest (bug free / patched) version of the library”. If that is the case, then the cache control setting ‘must-revalidate’ is indeed correct but setting a cache expiry time of one hour isn’t imho.
ps,
You’re right that when the cache expires, the browsers do not automatically purge the item from their cache. At least not in the 1 hour interval I have tested. I have modified my post accordingly, thanks!
[...] specific version (http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js) is undesirable; Google will expire the “latest” content after one hour, as opposed to one year for a specific version. This is for good reason of course, as the latest [...]