Nicholas Piël

How Google is wasting your bandwidth

Nicholas Piël | November 30, 2009

Using a Content Delivery Network (CDN) is one way to improve the performance of your website. Some of the reasons for using a CDN are:

  • Placing content geographically close to the end user, thus lowering latency and increasing bandwidth
  • Increasing the number of parallel downloads at the client by distributing content over different domains
  • Offloading the burden on your servers
  • Facilitating long-term caching by using a robust source for libraries

This last point in particular is why I looked at Google’s CDN for Ajax libraries. It is a beautiful idea. When more people use the same CDN, the cost of downloading an Ajax library becomes negligible because it is very likely that the web browser will already have the library in its belly. Wonderful!

For example, when I try to download the Prototype library everything goes well and Google’s CDN sends back the following:

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:31:50 GMT
Expires:Tue, 30 Nov 2010 13:56:34 GMT

As you can see, Google tells your browser to cache the file for a full year, as it should. Now, let’s look at what happens when trying this with jQuery 1.3.2:

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:40:15 GMT
Expires:Tue, 30 Nov 2010 14:40:15 GMT

Again, everything is OK. Now, let’s try a different version, jQuery 1.3:

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 14:41:42 GMT
Expires:Mon, 30 Nov 2009 15:41:42 GMT

Huh? When requesting the 1.3 version, Google is basically telling us to ‘remember it only for one hour’. This is wrong imho. When you specify 1.3, you are telling Google you want ‘the latest version in the 1.3 series’. The jquery.com site links to the 1.3 version as well. This means that for a page hit on the jQuery website you will re-download 60k of minified jQuery goodness when this file is not in your cache. A better approach would be for Google to just let the client do its cache revalidation (which it can do by using the ‘If-Modified-Since’ header).
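To make the revalidation idea concrete, here is a minimal Python sketch of the decision a server makes when it receives an ‘If-Modified-Since’ header. It is a simplified model, not Google’s implementation: the function name `conditional_status` and the sample dates are made up for illustration.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return the HTTP status a server would answer with.

    304 means "your cached copy is still good" (no body is sent),
    200 means "here is the full file again".
    """
    if if_modified_since is None:
        # Unconditional request: the client has nothing cached.
        return 200
    client_copy = parsedate_to_datetime(if_modified_since)
    current = parsedate_to_datetime(last_modified)
    # Unchanged since the client's copy: answer 304 Not Modified.
    return 304 if current <= client_copy else 200


# Hypothetical sample values, for illustration only.
last_modified = "Mon, 23 Nov 2009 18:54:21 GMT"
print(conditional_status(None, last_modified))
print(conditional_status(last_modified, last_modified))
print(conditional_status("Sun, 22 Nov 2009 10:00:00 GMT", last_modified))
```

The point is that the 304 answer carries only headers, so a revalidation costs a round trip but not the 60k download.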

But wait, there is more. WordPress, for example, adds an extra version argument to the file (?ver=<bla>). This can be handy when you want to generate a certain script or CSS file dynamically, and it really should not be a problem with Google’s CDN, should it? Well, let’s see what happens when we request http://ajax.googleapis.com/ajax/libs/jquery/1.3.2/jquery.min.js?ver=1.3.2

Content-Type:text/javascript; charset=UTF-8
Date:Mon, 30 Nov 2009 15:00:00 GMT
Expires:Fri, 01 Jan 1990 00:00:00 GMT
Last-Modified:Mon, 23 Nov 2009 18:54:21 GMT

Holy cow, Google invented time travel! The implications of this are pretty big: this may affect a lot of people with WordPress blogs who were ‘smart enough’ to use the Google CDN but without really testing if it worked.
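The four responses can be compared by subtracting the Date header from the Expires header. The sketch below does just that in Python with the header values quoted in this post; the helper name `freshness_seconds` is my own, and it deliberately looks only at Date and Expires (a real cache would give a Cache-Control max-age directive precedence when one is present).

```python
from email.utils import parsedate_to_datetime


def freshness_seconds(date_header, expires_header):
    """Return Expires minus Date in seconds; negative means already stale."""
    date = parsedate_to_datetime(date_header)
    expires = parsedate_to_datetime(expires_header)
    return int((expires - date).total_seconds())


# (Date, Expires) pairs as quoted in the post.
responses = {
    "prototype": ("Mon, 30 Nov 2009 14:31:50 GMT", "Tue, 30 Nov 2010 13:56:34 GMT"),
    "jquery 1.3.2": ("Mon, 30 Nov 2009 14:40:15 GMT", "Tue, 30 Nov 2010 14:40:15 GMT"),
    "jquery 1.3": ("Mon, 30 Nov 2009 14:41:42 GMT", "Mon, 30 Nov 2009 15:41:42 GMT"),
    "jquery 1.3.2 ?ver=1.3.2": ("Mon, 30 Nov 2009 15:00:00 GMT", "Fri, 01 Jan 1990 00:00:00 GMT"),
}

for name, (date_header, expires_header) in responses.items():
    print(name, freshness_seconds(date_header, expires_header))
```

The first two come out at roughly a year, the third at exactly one hour, and the last one is negative: the response is stale the moment it arrives.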

Basically what this means is that http://ajax.googleapis.com isn’t really your performance safety net. You need to know exactly what you’re doing, otherwise it will bite you and you might be better off just hosting those libraries on your own site. Thus my recommendation would be to use the Google CDN, but specify exactly which version of the library you need and make sure you do not append any query arguments.
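If you want to check your own script URLs against this recommendation, something like the following Python sketch could do it. The helper `cdn_url_problems` is hypothetical, the path layout `/ajax/libs/<library>/<version>/<file>` is inferred from the URL shown earlier, and treating "three or more numeric components" as a fully specified version is only a heuristic.

```python
import re
from urllib.parse import urlsplit

# Assumed layout: /ajax/libs/<library>/<version>/<file>
_PATH = re.compile(r"^/ajax/libs/([^/]+)/([^/]+)/[^/]+$")
# Heuristic: at least three numeric components, e.g. 1.3.2
_FULL_VERSION = re.compile(r"^\d+(\.\d+){2,}$")


def cdn_url_problems(url):
    """Return a list of reasons why this URL may defeat long-term caching."""
    parts = urlsplit(url)
    problems = []
    if parts.query:
        problems.append("query string present")
    match = _PATH.match(parts.path)
    if match is None:
        problems.append("unrecognized path layout")
    elif not _FULL_VERSION.match(match.group(2)):
        problems.append("version not fully specified")
    return problems


base = "http://ajax.googleapis.com/ajax/libs/jquery/"
print(cdn_url_problems(base + "1.3.2/jquery.min.js"))
print(cdn_url_problems(base + "1.3/jquery.min.js"))
print(cdn_url_problems(base + "1.3.2/jquery.min.js?ver=1.3.2"))
```

An empty list means the URL follows both parts of the recommendation; it says nothing about what headers the CDN actually returns, so testing the real response is still worthwhile.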


11 Responses to “How Google is wasting your bandwidth”

  1. Artur Honzawa says:
    November 30, 2009 at 5:07 pm

    When you request 1.3, Google serves the latest 1.3 version, that is, 1.3.2. That would explain why it is not cached.

  2. screenshotscores says:
    November 30, 2009 at 5:30 pm

    Very interesting findings. Personally not that fond of CDNs, usually more headache than pleasure imho!

  3. Michael Wales says:
    November 30, 2009 at 5:58 pm

    I can understand the implication for the users (sites take longer to load), but does anyone really have to worry about bandwidth anymore?

  4. Nicholas Piël says:
    November 30, 2009 at 6:15 pm

    @Michael,

    Page load is getting more and more important. People are not willing to wait; page load time already influences your bounce and conversion rates and might soon even influence your search ranking.

    But yes, it is wrong to talk about bandwidth in the title, but then again the complete title is somewhat exaggerated, as it is not really Google that is screwing up your latency; it is the misuse of their CDN that is at the heart of the problem.

  5. Veera says:
    November 30, 2009 at 7:12 pm

    Good catch.

    I’ve never used a CDN before, because I’m kinda scared to download a JS file from third-party servers. I try to avoid this as much as possible. And your post gave me one more reason to avoid CDNs! :-)

  6. bluszcz says:
    November 30, 2009 at 9:14 pm

    I don’t see the point.

    jQuery 1.3 is linked to the latest version. Setting the expiration time to one hour is completely understandable – maybe a new version will be released in the next hour?

    However, your final note is correct.

    • Nicholas Piël says:
      November 30, 2009 at 9:21 pm

      Google also sets:

      Cache-Control:public, must-revalidate, proxy-revalidate, max-age=3600

      So, if the version changes, the client will download the new version independent of the Expires header.

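  7. Details to watch out for when using Google’s AJAX Libraries… at Gea-Suan Lin’s BLOG says:
    November 30, 2009 at 11:13 pm

    [...] In “How Google is wasting your bandwidth” I saw that someone discovered that the Google AJAX Libraries service handles some cases in a very extreme way; if you are not careful, it can actually make your users spend a lot more bandwidth. [...]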

  8. Brett Bavar says:
    December 1, 2009 at 3:22 am

    I think you have misunderstood the way HTTP cache revalidation works. There are two important points to clarify:
    1. Cache revalidation only happens *after* a cache entry goes stale.
    2. Cache revalidation prevents re-downloading the entire content if it has not been modified.

    So, what really happens when you use jquery version 1.3 from Google? The cached version will be used for 1 hour without revalidation, then the content will be revalidated once every hour to make sure it hasn’t changed. If the content hasn’t changed, the revalidation request will be a lightweight 304 (Not Modified) response, thus avoiding an unnecessary 60k download.

    Let me know if I’m misunderstanding something myself. See http://www.freesoft.org/CIE/RFC/2068/168.htm for details.

    …On the other hand, the “time travel” issue for libraries with URL parameters is a completely separate issue. It seems that caching is disabled for requests with URL parameters. Maybe that’s a bug.

    • Nicholas Piël says:
      December 1, 2009 at 11:45 am

      Hi Brett,

      Thanks for your comment. I did some testing in Safari and it revalidates the cache *before* it goes stale. However, after some more testing it seems that Safari is kinda special in this. Opera and Firefox do not show this behavior.

      I should also note that Safari revalidates in an asynchronous way: it first shows you the cached copy and then puts the validity check on some sort of queue. Closing the browser before it fires the conditional GET does not seem to clear this queue, as reopening simply fires an ‘If-Modified-Since’ request.

      From reading the specification:

      When the must-revalidate directive is present in a response received by a cache, that cache MUST NOT use the entry after it becomes stale to respond to a subsequent request without first revalidating it with the origin server.

      It seems to only say how to handle the ‘must-revalidate’ directive AFTER the cache goes stale. And when interpreting the next part:

      Servers should send the must-revalidate directive if and only if failure to revalidate a request on the entity could result in incorrect operation, such as a silently unexecuted financial transaction. Recipients MUST NOT take any automated action that violates this directive, and MUST NOT automatically provide an unvalidated copy of the entity if revalidation fails.

      One could say that the gist of providing a 1.3 version by the CDN is this: “always make sure the client has the latest (bug free / patched) version of the library”. If that is the case, then the cache control setting ‘must-revalidate’ is indeed correct, but setting a cache expiry time of one hour isn’t imho.

      PS:
      You’re right that when the cache expires, the browsers do not automatically purge the item from their cache. At least not in the 1 hour interval I have tested. I have modified my post accordingly, thanks!

  9. Google jQuery CDN says:
    March 19, 2010 at 12:23 pm

    [...] specific version (http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js) is undesirable; Google will expire the “latest” content after one hour, as opposed to one year for a specific version. This is for good reason of course, as the latest [...]
