Nicholas Piël


Climategate battle — start sharing data

Nicholas Piël | December 11, 2009

Now that the dust has somewhat settled after climategate, the consensus seems to be that it has been overblown. If you look at the timeline of events this isn’t surprising. Between the public appearance of the report and the first damning articles on the 20th there was less than a single day. It is not that difficult to question how thorough the review of 160 MB of data was. It simply wasn’t.

It was as if some people thought they had struck gold and were aggressively searching for that one quote within the leaked emails that would make them instantly famous. But all in all it was a bit disappointing if you were hoping to find exciting revelations. The thing that could be distilled from the e-mails was that most researchers have strong opinions and big egos, but this shouldn’t really be a surprise.

It is naive to think that scientists are unbiased; they simply aren’t. However, they are expected to back up their views with unbiased facts. The main argument that’s left if we ignore all personal slander seems to focus on a quote in one of the emails concerning the WMO Statement on the Status of the Global Climate in 1999. The front page of this report shows the picture below and indicates that 1990-1999 was the hottest decade on record. So yes, it is an argument about a 10-year-old report. It might be worth noting that a few days ago (8 Dec 2009), the World Meteorological Organization issued a new press release stating that our current decade is the warmest on record. That information probably got lost in the heated debate.

[Image: front page of the WMO Statement on the Status of the Global Climate in 1999]

Conservative news sources state that the following quote from the leaked emails is a clear sign of manipulation of evidence:

“I’ve just completed Mike’s Nature trick of adding in the real temps to each series for the last 20 years (ie from 1981 onwards) amd from 1961 for Keith’s to hide the decline.”

But is it? In a rebuttal, the Climatic Research Unit (CRU) states the following:

This email referred to a “trick” of adding recent instrumental data to the end of temperature reconstructions that were based on proxy data.

Phil Jones comments further: “One of the three temperature reconstructions was based entirely on a particular set of tree-ring data that shows a strong correlation with temperature from the 19th century through to the mid-20th century, but does not show a realistic trend of temperature after 1960. This is well known and is called the ‘decline’ or ‘divergence’. The use of the term ‘hiding the decline’ was in an email written in haste. CRU has not sought to hide the decline. Indeed, CRU has published a number of articles that both illustrate, and discuss the implications of, this recent tree-ring decline, including the article that is listed in the legend of the WMO Statement figure.”

They also provide an extra graph where they show the climate reconstruction and the recent instrumental data separately:

[Image: the climate reconstruction and the recent instrumental data shown separately]

So, as you can see there isn’t really anything shocking to report.
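
For readers who want to see what the difference between the two presentations amounts to, here is a minimal sketch in Python. Every number in it is invented for illustration and is not CRU or WMO data, and the function names `splice` and `divergence` are my own hypothetical helpers, not anything taken from the emails or the CRU code. It only shows the general idea: appending instrumental values to the end of a proxy series versus keeping the two series apart.

```python
# Toy illustration only: every number below is invented, not CRU or WMO data.

def splice(proxy, instrumental, cutoff_year):
    """Proxy values before cutoff_year, instrumental values from cutoff_year on."""
    merged = {year: value for year, value in proxy.items() if year < cutoff_year}
    merged.update(
        {year: value for year, value in instrumental.items() if year >= cutoff_year}
    )
    return dict(sorted(merged.items()))


def divergence(proxy, instrumental):
    """Proxy minus instrumental for the years that both series cover."""
    shared = sorted(set(proxy) & set(instrumental))
    return {year: round(proxy[year] - instrumental[year], 2) for year in shared}


# Hypothetical temperature anomalies in degrees C, one value per decade.
proxy = {1940: 0.00, 1950: -0.05, 1960: -0.05, 1970: -0.15, 1980: -0.25, 1990: -0.30}
instrumental = {1940: 0.00, 1950: -0.05, 1960: -0.05, 1970: 0.00, 1980: 0.15, 1990: 0.30}

# One combined line: the proxy values after 1960 are replaced by measurements.
spliced = splice(proxy, instrumental, cutoff_year=1961)

# Two separate lines: the gap between them stays visible.
gap = divergence(proxy, instrumental)

print("spliced:", spliced)
print("gap:", gap)
```

In the spliced series only the instrumental values remain after the cutoff, so the divergence can only be seen when both series are kept side by side, which is what the extra graph does.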

Our viewpoint concerning climate change seems closely linked to our position on the political spectrum. In the red corner, we have the conservatives, who consider any idea that might require them to change their way of living threatening. In the blue corner, we have the progressives, those who feel that change is a goal, not just a method. During the first round of the climategate boxing match we mainly heard the conservative viewpoint, represented by the Telegraph, Fox News, the Washington Times and lots of infuriated bloggers, but now that the round is over I think the focus will shift to a more progressive point of view. You see, whether or not climate change is happening, we will have to think about how we manage our environment. We are running out of resources and we are polluting our environment. If we do not act accordingly, we will end up like Easter Island.

Round 2: The need for data sharing

A positive result of this climate battle is the renewed focus on the public availability of data and methodologies. CRU claims that 95% of their data is already open to the public and that they will make the remaining 5% publicly available, which is great news.  This movement of ‘data freeing’ is a great initiative, certainly in this time of collective sharing. John Wilbanks of Science Commons says the following:

“the irony that right at the historical moment when we have the technologies to permit worldwide availability and distributed process of scientific data, broadening collaboration and accelerating the pace and depth of discovery…..we are busy locking up that data and preventing the use of correspondingly advanced technologies on knowledge”.

Making our research widely available is a great way to catalyze progress in the broadest sense. This is probably better illustrated by the following video by Jesse Dylan.

The importance of data sharing is already recognized by the government of the USA, which has created data.gov with the purpose of increasing public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government. It currently has over 118,000 different datasets, which really makes it a data miner’s wet dream.

My efforts on data sharing

In an effort to not just stand on the sidelines but participate in this ‘release your data’ party, I have decided to put my master’s thesis and its results in the public domain. For my master’s thesis I implemented a system, mostly in Python, that does person recognition on static images. You can compare it with what Google’s Picasa does. However, I was able to outperform Picasa in recognition rate on a few datasets. I have already released some of the source code on BitBucket and you can find a little more information on the Projects page.

In the next few months I am going to explain this approach in more detail and will put up my collected resources and a BibTeX file. I think this will be a great start for anyone interested in machine vision and person recognition. If you’re interested, just follow me on Twitter!

Tags
rant
