This page contains a selection of previous projects. I do not support any of this code, and I can guarantee that most of it will not run out of the box. However, you might still find the following stuff helpful, and I think it is better to have the code on BitBucket and the reports online, where they can be indexed, than to let them rot on my hard drive. All code, unless otherwise specified, is released under the MIT license.
Person recognition by Pictorial and Contextual Cues
Identifying persons is a complicated task. In my master's thesis I investigated whether the recognition of persons can be improved by using not only pictorial cues, which is the information that can be extracted directly from the region of interest, but also cues in the nearby context such as clothing, background or co-occurrence with other people. I implemented a working system in Python, with the most intensive parts running on the GPU. The recognition results outperformed those of Google’s Picasa. The thesis can be found here and the code is on BitBucket. I plan to write some more about this soon.
Language and Speech
In the domain of language and speech processing I did some research and produced some Python code. You can find the papers below.
- Part of Speech (POS) Tagger, Paper
- Probabilistic parser using Viterbi, Paper
- Statistical Machine Translator IBM Model 1, Paper
Python Blog Software
I implemented a blog on top of Pylons, STORM and Mako. It has the following features: an XML-RPC interface, RSS feeds for posts, categories and comments, syntax highlighting, Markdown syntax support, inline editing and support for LaTeX code. The code can be found on BitBucket.
Mr. Tag: The Image Labeler
When analyzing image data you often need to label your data first. This can be a painstaking task, but this labeling tool lets you quickly label a large amount of (sequential) image data and allows for easy keyboard navigation. The source of this program can be found on BitBucket. We used it extensively in our Intelligent Surveillance Project, of which you can get the final paper.
Below, you will find some lab exercises, all implemented in Python using NumPy and Matplotlib.
- Bayesian Classifier, Lab report and Code
- Maximum Likelihood Estimation (MLE) and Bayesian Estimator, Lab report and Code
- Density Estimation on a Gaussian distribution, Lab report and Code
Topic Detection and Document Clustering
Say you are presented with a document or text snippet and you want to know what it is about or how to classify it. How do you do that? Well, first you need a taxonomy, and then you need some kind of distance measure between your document and each class of the taxonomy. In this paper we present a method that uses the categorization of Wikipedia pages and the cumulative occurrence of each word within a category as the features for classifying a document. The results are promising and the code can be found here.
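The idea can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the code from the paper: the toy pages, the whitespace tokenizer, the function names and the choice of cosine similarity as the distance measure are all made up for the example.

```python
from collections import Counter
import math


def tokenize(text):
    # Hypothetical tokenizer: lowercase, split on whitespace, keep alphabetic words.
    return [word for word in text.lower().split() if word.isalpha()]


def build_category_profiles(pages):
    # pages: iterable of (category, text) pairs.
    # Each profile is the cumulative word count over all pages in that category.
    profiles = {}
    for category, text in pages:
        profiles.setdefault(category, Counter()).update(tokenize(text))
    return profiles


def cosine_similarity(a, b):
    # Assumed similarity measure; the paper may use a different one.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(document, profiles):
    # Pick the category whose cumulative word profile is closest to the document.
    doc_counts = Counter(tokenize(document))
    return max(profiles, key=lambda c: cosine_similarity(doc_counts, profiles[c]))


# Toy stand-in for categorized Wikipedia pages (invented data).
toy_pages = [
    ("Sports", "the team won the football match after a late goal"),
    ("Sports", "the striker scored a goal in the final match"),
    ("Music", "the band played a new song at the concert"),
    ("Music", "the singer recorded a song with a guitar"),
]

profiles = build_category_profiles(toy_pages)
label = classify("a late goal decided the match", profiles)
print(label)
```

In a real setting you would probably want to down-weight very common words such as "the", for example with a TF-IDF style weighting, before comparing profiles.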
Genetic Algorithms: Incremental Tree Induction
In this paper I describe an experiment with Genetic Algorithms. I show good performance on small and simple datasets compared to other approaches such as C4.5's nephew J48. However, I should warn you that these are pretty lame experiments; it could very well be that the algorithm explodes on more complex and deeper data. But take a look anyway.
No Free Lunch and MDL
The No-Free-Lunch theorem (NFL) states that no learning algorithm outperforms every other algorithm over the complete domain of problems. In other words, all learning algorithms perform equally well when averaged over the complete problem domain.
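A toy experiment makes this concrete. The sketch below is my own illustration, not taken from the paper: it enumerates every possible boolean target function on 3-bit inputs, trains two made-up learners on a fixed training subset, and averages their accuracy on the unseen inputs. The split into five training and three test points, and both learners, are arbitrary assumptions.

```python
from itertools import product

# All 3-bit inputs; the first five are seen in training, the rest are not.
inputs = list(product([0, 1], repeat=3))
train_x, test_x = inputs[:5], inputs[5:]


def majority_learner(train):
    # Always predicts the most frequent training label (ties go to 1).
    ones = sum(label for _, label in train)
    guess = 1 if 2 * ones >= len(train) else 0
    return lambda x: guess


def nearest_neighbour_learner(train):
    # Predicts the label of the training point with the smallest Hamming distance.
    def predict(x):
        def hamming(pair):
            return sum(a != b for a, b in zip(pair[0], x))
        return min(train, key=hamming)[1]
    return predict


def average_off_training_accuracy(learner):
    # Average accuracy on unseen inputs over ALL 2**8 possible target functions.
    targets = list(product([0, 1], repeat=len(inputs)))
    total_correct = 0
    for labels in targets:
        target = dict(zip(inputs, labels))
        model = learner([(x, target[x]) for x in train_x])
        total_correct += sum(model(x) == target[x] for x in test_x)
    return total_correct / (len(targets) * len(test_x))


majority_score = average_off_training_accuracy(majority_learner)
neighbour_score = average_off_training_accuracy(nearest_neighbour_learner)
print(majority_score, neighbour_score)
```

Both learners end up at exactly chance level on the unseen inputs, because for any fixed training labelling every labelling of the unseen points is equally likely. Any advantage one learner has on some targets is paid back on others.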
The minimum description length (MDL) principle is a formalization of Occam’s Razor in which the best hypothesis for a given set of data is the one that leads to the largest compression of the data. This helps against overfitting, as the compressed result is a tradeoff between the complexity of the hypothesis and the complexity of the data given this hypothesis. In this paper I discuss how these views relate and differ.
For the course Game Theory by Peter van Emde Boas we had to finish a variety of exercises. Some of these exercises were real brain crunchers, and I think it can be interesting to see our answers.
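For reference, the tradeoff is usually written as a two-part code. This is the common textbook formulation and the notation is my own choice: $L(H)$ is the number of bits needed to describe the hypothesis $H$, and $L(D \mid H)$ is the number of bits needed to describe the data $D$ with the help of $H$.

```latex
H_{\mathrm{MDL}} = \arg\min_{H \in \mathcal{H}} \bigl[\, L(H) + L(D \mid H) \,\bigr]
```

A very complex hypothesis makes $L(D \mid H)$ small but $L(H)$ large, and a very simple one does the opposite, so the minimum sits somewhere in between.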
Opponent Modelling in Texas Hold’Em
I wrote a Texas Hold’em simulation and an ultra-fast poker odds calculator, which also looked at the playing behavior of your opponents. The code is partly based on Cactus Kev’s Poker Hand Evaluator and Paul Senzee’s perfect hash. The Python code for this project can be found on BitBucket, and I have also put the accompanying Dutch paper online.
Other Stuff for which I cannot release the source
- A Dutch company, Visual Recognition, created software that allows a computer to measure your emotion by visually recognizing your facial expressions. We worked on improving their image mapping technique and have described some of our approach in this paper.
- When labeling images it is often difficult to come up with the right tags. We presented an approach, basically a mashup of Flickr’s API and WordNet, that is able to suggest relevant tags. More can be found here.