Nicholas Piël

Person Recognition (with Python)

Nicholas Piël | December 21, 2009

For my MSc thesis I developed a system, built in Python, that does person recognition, and I have shown that it can obtain a better recognition rate than Google’s Picasa. I have put the source code online, and I hereby announce that I will try my best to spend some time explaining how to do person and face recognition with Python. I hope that a public announcement such as this will instantly create some public debt, forcing me to actually complete this task. We shall see. :)

Bert and Ernie

The approach described in my thesis uses a combination of pictorial cues (Eigenfaces, SIFT points, color histogram) and contextual cues (co-occurrence with other persons or background). The idea behind this is really simple: in most cases we don’t even need to see a person’s face to recognize them, as long as we have more contextual information about the setting in which the picture was taken. For example, let’s say you are looking at some pictures from ‘Sesame Street’. When you detect Ernie, there is a high probability that the other person in that same photo will be Bert, even more so if we detect that the main component of that other person’s color histogram is yellow.
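To make the Bert and Ernie intuition concrete, here is a minimal sketch of how co-occurrence counts could be turned into a context prior and then combined with a color cue. It is only an illustration under my own assumptions: the function names, the add-one smoothing, the toy photo list and the "yellow" likelihood values are all made up for this example and are not taken from the thesis code.

```python
from collections import Counter
from itertools import permutations


def cooccurrence_counts(photos):
    """Count how often each ordered pair of people shares a photo."""
    counts = Counter()
    for people in photos:
        for a, b in permutations(set(people), 2):
            counts[(a, b)] += 1
    return counts


def companion_probabilities(counts, seen, candidates):
    """Estimate P(candidate | seen) from the counts, with add-one smoothing."""
    raw = {c: counts[(seen, c)] + 1 for c in candidates}
    total = sum(raw.values())
    return {c: v / total for c, v in raw.items()}


def combine_cues(context_probs, color_likelihood):
    """Multiply the context prior by a color likelihood and renormalise."""
    joint = {c: p * color_likelihood.get(c, 0.0) for c, p in context_probs.items()}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()} if total else dict(context_probs)


# Toy, hand-labelled photo collection (hypothetical data).
photos = [
    ["Ernie", "Bert"],
    ["Ernie", "Bert", "Elmo"],
    ["Ernie", "Bert"],
    ["Elmo", "Big Bird"],
]
counts = cooccurrence_counts(photos)
probs = companion_probabilities(counts, "Ernie", ["Bert", "Elmo", "Big Bird"])

# Hypothetical likelihood of observing a mostly yellow histogram per person.
yellow = {"Bert": 0.7, "Elmo": 0.05, "Big Bird": 0.8}
posterior = combine_cues(probs, yellow)
print(max(posterior, key=posterior.get))
```

Note that Big Bird is just as yellow as Bert, so the color cue alone would not settle it; it is the combination with the co-occurrence prior that favors Bert here.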

The approach can be divided into three different subtasks:

  1. Detecting the person and segmenting their specific region
  2. Extracting the features
  3. Clustering over these features

In the first task, we will detect the different people by using a face detection technique based on Haar cascades. I plan on showing how to use OpenCV with Python and how you can improve its performance by combining the results of multiple Haar cascades.
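As a rough idea of what combining cascades can look like, the sketch below merges face rectangles by letting the cascades vote. It assumes each cascade has already produced a list of `(x, y, w, h)` boxes, for instance from OpenCV's `cv2.CascadeClassifier.detectMultiScale`, and keeps only boxes that enough different cascades agree on. The helper names, the overlap threshold and the voting rule are my own assumptions for illustration, not necessarily the scheme used in the thesis.

```python
def iou(a, b):
    """Intersection over union of two (x, y, w, h) rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union else 0.0


def merge_detections(per_cascade, min_votes=2, min_iou=0.5):
    """Keep boxes that at least `min_votes` different cascades agree on."""
    groups = []  # each group is a list of (cascade_index, rect)
    for idx, rects in enumerate(per_cascade):
        for rect in rects:
            for group in groups:
                # Compare against the first box that started the group.
                if iou(group[0][1], rect) >= min_iou:
                    group.append((idx, rect))
                    break
            else:
                groups.append([(idx, rect)])

    merged = []
    for group in groups:
        voters = {idx for idx, _ in group}
        if len(voters) >= min_votes:
            n = len(group)
            # Average the coordinates of all boxes in the group.
            merged.append(tuple(round(sum(r[i] for _, r in group) / n) for i in range(4)))
    return merged


# Hypothetical outputs of three cascades run on the same image.
frontal_default = [(100, 100, 50, 50), (300, 40, 30, 30)]
frontal_alt = [(104, 98, 48, 52)]
profile = []
faces = merge_detections([frontal_default, frontal_alt, profile])
print(faces)
```

Requiring agreement between cascades trades some recall for fewer false positives; lowering `min_votes` to 1 turns the same function into a plain union with duplicate suppression.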

With the detected face we only have a certain square region within a picture which is very likely to contain a face. In order to detect the rest of the body I used a graph-based segmentation technique and highly optimized the segmentation algorithm by using implementations in NumPy, Fortran and finally PyCUDA.
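For readers who have not seen graph-based segmentation before, here is a deliberately tiny pure-Python sketch in the spirit of the Felzenszwalb and Huttenlocher algorithm listed in the references below. It works on a small grayscale grid with 4-connected neighbours and a single parameter `k`; the real thing operates on smoothed color images and, as said, needs NumPy, Fortran or a GPU to be fast. Treat the function name and the simplifications as assumptions of this sketch.

```python
def segment(image, k=10.0):
    """Label a 2D grid of gray values with a simplified graph-based pass."""
    h, w = len(image), len(image[0])
    parent = list(range(h * w))
    size = [1] * (h * w)
    internal = [0.0] * (h * w)  # largest edge weight inside each component

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Build edges between horizontal and vertical neighbours.
    edges = []
    for y in range(h):
        for x in range(w):
            p = y * w + x
            if x + 1 < w:
                edges.append((abs(image[y][x] - image[y][x + 1]), p, p + 1))
            if y + 1 < h:
                edges.append((abs(image[y][x] - image[y + 1][x]), p, p + w))
    edges.sort()

    for weight, a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        # Merge when the edge is no heavier than either component's internal
        # difference plus the size-dependent tolerance k / |C|.
        if weight <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = weight  # edges are sorted, so this is the max so far

    return [[find(y * w + x) for x in range(w)] for y in range(h)]


# Toy image: a dark region on the left, a bright region on the right.
image = [
    [10, 11, 200, 201],
    [12, 10, 202, 199],
]
labels = segment(image, k=10.0)
for row in labels:
    print(row)
```

A larger `k` favours larger segments, because small components get a more generous tolerance before a merge is rejected.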

From this segmented region we will then extract pictorial features such as a color histogram and SIFT features. With this information we can then try to extract and use our contextual information.
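The color histogram part is easy to sketch without any libraries. The example below builds a normalised joint RGB histogram for a list of pixels and compares two regions with histogram intersection, the measure from Swain and Ballard's "Color indexing" in the references. The bin count, the function names and the toy pixel lists are assumptions for this example; SIFT extraction is left out because it needs a proper implementation.

```python
def color_histogram(pixels, bins=4):
    """Normalised joint RGB histogram; pixels are (r, g, b) tuples in 0..255."""
    hist = [0.0] * (bins ** 3)
    n = 0
    for r, g, b in pixels:
        # Map each channel to a bin index in 0..bins-1.
        ri, gi, bi = r * bins // 256, g * bins // 256, b * bins // 256
        hist[ri * bins * bins + gi * bins + bi] += 1
        n += 1
    return [v / n for v in hist] if n else hist


def histogram_intersection(h1, h2):
    """Similarity in [0, 1] for two normalised histograms (1 means identical)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy regions: two mostly yellow ones and one red one (hypothetical pixels).
yellow_a = [(250, 220, 30)] * 90 + [(40, 40, 40)] * 10
yellow_b = [(240, 210, 20)] * 80 + [(200, 30, 30)] * 20
red = [(200, 30, 30)] * 100

h_a, h_b, h_red = (color_histogram(p) for p in (yellow_a, yellow_b, red))
print(histogram_intersection(h_a, h_b), histogram_intersection(h_a, h_red))
```

With coarse bins like these, small lighting changes usually land in the same bin, at the price of confusing colors that are genuinely different but close.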

Global Overview of the Person Recognition System

In the schematic overview we can see the different steps: we start with some images and segment them into regions of interest. From these regions we then extract features to build our person models, over which we can then cluster by using pictorial features (SIFT points and color histograms) and contextual features (i.e., co-occurrence with detected background or other persons). The code for this can already be found on BitBucket. It is a bit rough, but as I said, I promised to do some explaining. So keep an eye on this blog if you’re interested.
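The clustering step can be illustrated with a small k-medoids loop, a method that only needs pairwise distances between person models (the reference list includes a k-medoids paper). In a real setup the distance matrix could be, say, a weighted sum of pictorial and contextual distances; that weighting, the deterministic initialisation and the one-dimensional toy data below are assumptions of this sketch and not a description of the code on BitBucket.

```python
def k_medoids(dist, k, max_iter=100):
    """Cluster items given a symmetric distance matrix; returns (medoids, labels)."""
    n = len(dist)
    # Deterministic start: the k items with the smallest total distance to all others.
    medoids = sorted(range(n), key=lambda i: sum(dist[i]))[:k]
    labels = []
    for _ in range(max_iter):
        # Assignment step: every item joins its nearest medoid.
        labels = [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]
        # Update step: per cluster, pick the member with the smallest total distance.
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep the old medoid for an empty cluster
                continue
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, labels


# Toy "person models" reduced to one number each, forming two obvious groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = k_medoids(dist, k=2)
print(medoids, labels)
```

Because the medoids are actual items from the data, the same loop works for any feature mix as long as a distance between two person models can be computed.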

For now, I present you my collected list of references (also in BibTeX format) regarding person recognition. It could be a nice starting point for anyone interested in this domain.

References

davis2006rbp
The relationship between precision-recall and ROC curves
J. Davis and M. Goadrich
233--240  (2006)
raghavan1989cir
A critical investigation of recall and precision as measures of retrieval system performance
V. Raghavan and P. Bollmann and G. S. Jung
ACM Transactions on Information Systems (TOIS)  7  205--229  (1989)
wilson2006ffd
Facial feature detection using Haar classifiers
P. I. Wilson and J. Fernandez
Journal of Computing Sciences in Colleges  21  127--133  (2006)
freund1997dtg
A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Y. Freund and R. E. Schapire
Journal of Computer and System Sciences  55  119--139  (1997)
graham2002tep
Time as essence for photo browsing through personal digital libraries
A. Graham and H. Garcia-Molina and A. Paepcke and T. Winograd
326--335  (2002)
cooper2005tec
Temporal event clustering for digital photo collections
M. Cooper and J. Foote and A. Girgensohn and L. Wilcox
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP)  1  269--288  (2005)
moon2001cap
Computational and performance aspects of PCA-based face-recognition algorithms
H. Moon and P. J. Phillips
Perception-London  30  303--322  (2001)
otoole2005fra
Face Recognition Algorithms Surpass Humans Matching Faces over Changes in Illumination
A. O'Toole and P. J. Phillips and F. Jiang and J. Ayyad and N. Pénard and H. Abdi
IEEE Transactions on Pattern Analysis and Machine Intelligence  1642--1646  (2007)
liu2006cdi
Capitalize on Dimensionality Increasing Techniques for Improving Face Recognition Grand Challenge Performance
C. Liu
IEEE Transactions on Pattern Analysis and Machine Intelligence  725--737  (2006)
beis1997siu
Shape Indexing Using Approximate Nearest-Neighbour Search in High-Dimensional Spaces
J. Beis and D. Lowe
1000--1006  (1997)
bay2006ssu
SURF: Speeded Up Robust Features
H. Bay and T. Tuytelaars and L. Van Gool
Lecture Notes in Computer Science  3951  404  (2006)
ke2004psm
PCA-SIFT: A More Distinctive Representation for Local Image Descriptors
Y. Ke and R. Sukthankar
2  (2004)
kryszczuk:ccf
Color correction for face detection based on human visual perception metaphor
K. Kryszczuk and A. Drygajlo
138--143  (2007)
gevers1999cbo
Color-based object recognition
T. Gevers and A. W. M. Smeulders
Pattern Recognition  32  453--464  (1999)
swain1991ci
Color indexing
M. J. Swain and D. H. Ballard
International Journal of Computer Vision  7  11--32  (1991)
schmid2000eip
Evaluation of Interest Point Detectors
C. Schmid and R. Mohr and C. Bauckhage
International Journal of Computer Vision  37  151--172  (2000)
lowe2004dif
Distinctive Image Features from Scale-Invariant Keypoints
D. G. Lowe
International Journal of Computer Vision  60  91--110  (2004)
mikolajczyk2004sai
Scale & Affine Invariant Interest Point Detectors
K. Mikolajczyk and C. Schmid
International Journal of Computer Vision  60  63--86  (2004)
mikolajczyk2005pel
A Performance Evaluation of Local Descriptors
K. Mikolajczyk and C. Schmid
IEEE Transactions on Pattern Analysis and Machine Intelligence  1615--1630  (2005)
comaniciu2002msr
Mean Shift: A Robust Approach Toward Feature Space Analysis
D. Comaniciu and P. Meer
IEEE Transactions on Pattern Analysis and Machine Intelligence  603--619  (2002)
grabner2006fas
Fast Approximated SIFT
M. Grabner and H. Grabner and H. Bischof
Lecture Notes in Computer Science  3851  918  (2006)
castrillonsantana2008faf
Face and Facial Feature Detection Evaluation
M. Castrillón-Santana and L. Déniz-Suárez and L. Antón-Canalís and J. Lorenzo-Navarro
7  (2008)
lienhart2002esh
An Extended Set of Haar-like Features for Rapid Object Detection
R. Lienhart and J. Maydt
1  900--903  (2002)
turk1991er
Eigenfaces for Recognition
M. Turk and A. Pentland
Journal of Cognitive Neuroscience  3  71--86  (1991)
felzenszwalb2004egb
Efficient Graph-Based Image Segmentation
P. F. Felzenszwalb and D. P. Huttenlocher
International Journal of Computer Vision  59  167--181  (2004)
Hae-sang:2006rz
A K-means-like Algorithm for K-medoids Clustering and Its Performance
P. Hae-sang and L. Jong-seok and J. Chi-hyuck
1222--1231  (2006)
cui2007eip
EasyAlbum: an interactive photo annotation system based on face clustering and re-ranking
J. Cui and F. Wen and R. Xiao and Y. Tian and X. Tang
367--376  (2007)
elgammal2001pfs
Probabilistic framework for segmenting people under occlusion
A. Elgammal and L. Davis
2  (2001)
2008_garcia_cvgpu
Fast k nearest neighbor search using GPU
V. Garcia and E. Debreuve and M. Barlaud
(2008)
VanDeSandeCVPR2008
Evaluation of Color Descriptors for Object and Scene Recognition
K. E. A. van de Sande and T. Gevers and C. G. M. Snoek
(2008)
Image category recognition is important to access visual information on the level of objects and scene types. So far, intensity-based descriptors have been widely used. To increase illumination invariance and discriminative power, color descriptors have been proposed only recently. As many descriptors exist, a structured overview of color invariant descriptors in the context of image category recognition is required. Therefore, this paper studies the invariance properties and the distinctiveness of color descriptors in a structured way. The invariance properties of color descriptors are shown analytically using a taxonomy based on invariance properties with respect to photometric transformations. The distinctiveness of color descriptors is assessed experimentally using two benchmarks from the image domain and the video domain. From the theoretical and experimental results, it can be derived that invariance to light intensity changes and light color changes affects category recognition. The results reveal further that, for light intensity changes, the usefulness of invariance is category-specific.
wagstaff2001ckm
Constrained k-means clustering with background knowledge
K. Wagstaff and C. Cardie and S. Rogers and S. Schroedl
577--584  (2001)
gallagher_cvpr_08_clothing
Clothing Cosegmentation for Recognizing People
A. Gallagher and T. Chen
(2008)
Lepetit:cr
Keypoint Recognition using Randomized Trees
V. Lepetit and P. Fua
IEEE Transactions on Pattern Analysis and Machine Intelligence  (2006)
Indyk:1999dq
Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality
P. Indyk and R. Motwani
Proceedings of the thirtieth annual ACM symposium on Theory of computing  605--613  (1998)
Gionis:1999bh
Similarity Search in High Dimensions via Hashing
A. Gionis and P. Indyk and R. Motwani
???  518-529  (1999)
Sivic:2004qf
Efficient Visual Content Retrieval and Mining in Videos
J. Sivic and A. Zisserman
???  (2004)
Clayton:2007vn
A learning framework for nearest neighbor search
L. Clayton and S. Dasgupta
Advances in Neural Information Processing Systems 20  (2007)
ozuysal2007fkr
Fast keypoint recognition in ten lines of code
M. Ozuysal and P. Fua and V. Lepetit
Proc. IEEE Conference on Computer Vision and Pattern Recognition  (2007)
shi2000nca
Normalized cuts and image segmentation
J. Shi and J. Malik
IEEE Transactions on Pattern Analysis and Machine Intelligence  22  888--905  (2000)
marfil2006psa
Pyramid segmentation algorithms revisited
R. Marfil and L. Molina-Tanco and A. Bandera and J. Rodríguez and F. Sandoval
Pattern Recognition  39  1430--1451  (2006)
Zhang:2007pd
Local features and kernels for classification of texture and object categories
J. Zhang and M. Marszalek and S. Lazebnik and C. Schmid
International Journal of Computer Vision  73  213--238  (2007)
lowe1999orl
Object recognition from local scale-invariant features
D. G. Lowe
International Conference on Computer Vision  2  1150--1157  (1999)
SandeMSC07
Coloring Concept Detection in Video using Interest Regions
K. E. A. v. d. Sande
(2007)
Video concept detection aims to detect high-level semantic information present in video. State-of-the-art systems are based on visual features and use machine learning to build concept detectors from annotated examples. The choice of features and machine learning algorithms is of great influence on the accuracy of the concept detector. So far, intensity-based SIFT features based on interest regions have been applied with great success in image retrieval. Features based on interest regions, also known as local features, consist of an interest region detector and a region descriptor. In contrast to using intensity information only, we will extend both interest region detection and region description with color information in this thesis. We hypothesize that automated concept detection using interest region features benefits from the addition of color information. Our experiments, using the Mediamill Challenge benchmark, show that the combination of intensity features with color features improves significantly over intensity features alone.
ramanan2003fat
Finding and tracking people from the bottom up
D. Ramanan and D. Forsyth
Computer Vision and Pattern Recognition, 2003. Proceedings. 2003 IEEE Computer Society Conference on  2  (2003)
felzenswalb
Efficient Matching of Pictorial Structures
P. Felzenszwalb and D. Huttenlocher
66--73  (2000)
girgensohn2004lfr
Leveraging face recognition technology to find and organize photos
A. Girgensohn and J. Adcock and L. Wilcox
Proceedings of the 6th ACM SIGMM international workshop on Multimedia information retrieval  99--106  (2004)
naaman2005lcr
Leveraging context to resolve identity in photo albums
M. Naaman and R. B. Yeh and H. Garcia-Molina and A. Paepcke
Proceedings of the 5th ACM/IEEE-CS joint conference on Digital libraries  178--187  (2005)
berg2007naf
Names and Faces
T. L. Berg and A. C. Berg and J. Edwards and M. Maire and R. White and Y. W. Teh and E. Learned-Miller and D. Forsyth
University of California Berkeley. Technical report  (2007)
tian2007faf
A Face Annotation Framework with Partial Clustering and Interactive Labeling
Y. Tian and W. Liu and R. Xiao and F. Wen and X. Tang
Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on  1--8  (2007)
zhao2006apa
Automatic Person Annotation of Family Photo Album
M. Zhao and Y. Teo and S. Liu and T. Chua and R. Jain
International Conference on Image and Video Retrieval  163--172  (2006)
zhang2005rfa
Robust Face Alignment Based on Local Texture Classifiers
L. Zhang and H. Ai and S. Xin and C. Huang and S. Tsukiji and S. Lao
The IEEE International Conference on Image Processing  354--357  (2005)
arandjelovic2006acl
Automatic Cast Listing in Feature-Length Films with Anisotropic Manifold Space
O. Arandjelovic and R. Cipolla
2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition  2  1513--1520
jaffre11ipl
Improvement of a person labelling method using extracted knowledge on costume
G. Jaffre and P. Joly
jaffre:cnf
Costume: A New Feature for Automatic Video Content Indexing
G. Jaffre and P. Joly
Coupling approaches, coupling media and coupling languages for information retrieval (RIAO)  314--325  (2004)
anguelov2007cir
Contextual Identity Recognition in Personal Photo Albums
D. Anguelov and K. Lee and S. B. Gokturk and B. Sumengen
Computer Vision and Pattern Recognition, 2007. CVPR'07. IEEE Conference on  1--7  (2007)
yacoob2005daa
Detection, Analysis and Matching of Hair
Y. Yacoob and L. Davis
Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on  1  (2005)
song2006cah
Context-Aided Human Recognition--Clustering
Y. Song and T. Leung
European Conference on Computer Vision  (2006)
sivic2006fpr
Finding people in repeated shots of the same scene
J. Sivic and C. L. Zitnick and R. Szeliski
British Machine Vision Conference  (2006)
yilmaz2006ots
Object tracking: A survey
A. Yilmaz and O. Javed and M. Shah
ACM computing surveys  38  1--45  (2006)
dalal:hdu
Human Detection Using Oriented Histograms of Flow and Appearance
N. Dalal and B. Triggs and C. Schmid
kpalma:oap
An Overview of Advances of Pattern Recognition Systems in Computer Vision
K. Kpalma and J. Ronsin
gavrila1999vah
Visual analysis of human movement: A survey
D. M. Gavrila
Computer Vision and Image Understanding  73  82--98  (1999)
The ability to recognize humans and their activities by vision is key for a machine to interact intelligently and effortlessly with a human-inhabited environment. Because of many potentially important applications, "looking at people" is currently one of the most active application domains in computer vision. This survey identifies a number of promising applications and provides an overview of recent developments in this domain. The scope of this survey is limited to work on whole-body or hand motion; it does not include work on human faces. The emphasis is on discussing the various methodologies; they are grouped in 2-D approaches with or without explicit shape models and 3-D approaches. Where appropriate, systems are reviewed. We conclude with some thoughts about future directions.
jones2002scm
Statistical Color Models with Application to Skin Detection
M. J. Jones and J. M. Rehg
International Journal of Computer Vision  46  81--96  (2002)
The existence of large image datasets such as the set of photos on the World Wide Web make it possible to build powerful generic models for low-level image attributes like color using simple histogram learning techniques. We describe the construction of color models for skin and non-skin classes from a dataset of nearly 1 billion labelled pixels. These classes exhibit a surprising degree of separability which we exploit by building a skin pixel detector achieving a detection rate of 80% with 8.5% false positives. We compare the performance of histogram and mixture models in skin detection and find histogram models to be superior in accuracy and computational cost. Using aggregate features computed from the skin pixel detector we build a surprisingly effective detector for naked people. Our results suggest that color can be a more powerful cue for detecting people in unconstrained imagery than was previously suspected. We believe this work is the most comprehensive and detailed exploration of skin color models to date.
diplaros2004sdu
Skin detection using the EM algorithm with spatial constraints
A. Diplaros and T. Gevers and N. Vlassis
Systems, Man and Cybernetics, 2004 IEEE International Conference on  4  (2004)
In this paper, we propose a color-based method for skin detection and segmentation, which also takes into account the spatial coherence of the skin pixels. We treat the problem of skin detection as an inference problem. We assume that each pixel in an image has a hidden binary label associated with it, that specifies if it is skin or not. In order to solve the inference problem, we use a variational EM algorithm which incorporates the spatial constraints with just a small computational overhead in the E-step. Finally, we show that our method provides better results than the standard EM algorithm and a state-of-art skin-detection method from the literature [9].
gavrila07eb
A Bayesian, Exemplar-Based Approach to Hierarchical Shape Matching
D. M. Gavrila
29  (2007)
gavrila2007mcp
Multi-cue Pedestrian Detection and Tracking from a Moving Vehicle
D. M. Gavrila and S. Munder
International Journal of Computer Vision  73  41--59  (2007)
This paper presents a multi-cue vision system for the real-time detection and tracking of pedestrians from a moving vehicle. The detection component involves a cascade of modules, each utilizing complementary visual criteria to successively narrow down the image search space, balancing robustness and efficiency considerations. Novel is the tight integration of the consecutive modules: (sparse) stereo-based ROI generation, shape-based detection, texture-based classification and (dense) stereo-based verification. For example, shape-based detection activates a weighted combination of texture-based classifiers, each attuned to a particular body pose. Performance of individual modules and their interaction is analyzed by means of Receiver Operator Characteristics (ROCs). A sequential optimization technique allows the successive combination of individual ROCs, providing optimized system parameter settings in a systematic fashion, avoiding ad-hoc parameter tuning. Application-dependent processing constraints can be incorporated in the optimization procedure. Results from extensive field tests in difficult urban traffic conditions suggest system performance is at the leading edge.
bowyer2006saa
A survey of approaches and challenges in 3D and multi-modal 3D+ 2D face recognition
K. W. Bowyer and K. Chang and P. Flynn
Computer Vision and Image Understanding  101  1--15  (2006)
This survey focuses on recognition performed by matching models of the three-dimensional shape of the face, either alone or in combination with matching corresponding two-dimensional intensity images. Research trends to date are summarized, and challenges confronting the development of more accurate three-dimensional face recognition are identified. These challenges include the need for better sensors, improved recognition algorithms, and more rigorous experimental methodology.
phillips2005ofr
Overview of the face recognition grand challenge
P. J. Phillips and P. J. Flynn and T. Scruggs and K. W. Bowyer and J. Chang and K. Hoffman and J. Marques and J. Min and W. Worek
Proceedings of IEEE Conference on Computer Vision and Pattern Recognition  1  947--954  (2005)
Over the last couple of years, face recognition researchers have been developing new techniques. These developments are being fueled by advances in computer vision techniques, computer design, sensor design, and interest in fielding face recognition systems. Such advances hold the promise of reducing the error rate in face recognition systems by an order of magnitude over Face Recognition Vendor Test (FRVT) 2002 results. The Face Recognition Grand Challenge (FRGC) is designed to achieve this performance goal by presenting to researchers a six-experiment challenge problem along with data corpus of 50,000 images. The data consists of 3D scans and high resolution still imagery taken under controlled and uncontrolled conditions. This paper describes the challenge problem, data corpus, and presents baseline performance and preliminary results on natural statistics of facial imagery.
zhu2006fhd
Fast Human Detection Using a Cascade of Histograms of Oriented Gradients
Q. Zhu and S. Avidan and M. C. Yeh and K. T. Cheng
Computer Vision and Pattern Recognition  1  4  (2006)
We integrate the cascade-of-rejectors approach with the Histograms of Oriented Gradients (HoG) features to achieve a fast and accurate human detection system. The features used in our system are HoGs of variable-size blocks that capture salient features of humans automatically. Using AdaBoost for feature selection, we identify the appropriate set of blocks, from a large set of possible blocks. In our system, we use the integral image representation and a rejection cascade which significantly speed up the computation. For a 320 × 280 image, the system can process 5 to 30 frames per second depending on the density in which we scan the image, while maintaining an accuracy level similar to existing methods.
dalai2005hog
Histograms of oriented gradients for human detection
N. Dalal and B. Triggs
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on  1  (2005)
We study the question of feature sets for robust visual object recognition; adopting linear SVM based human detection as a test case. After reviewing existing edge and gradient based descriptors, we show experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance, concluding that fine-scale gradients, fine orientation binning, relatively coarse spatial binning, and high-quality local contrast normalization in overlapping descriptor blocks are all important for good results. The new approach gives near-perfect separation on the original MIT pedestrian database, so we introduce a more challenging dataset containing over 1800 annotated human images with a large range of pose variations and backgrounds.
schneiderman2004odu
Object Detection Using the Statistics of Parts
H. Schneiderman and T. Kanade
International Journal of Computer Vision  56  151--177  (2004)
In this paper we describe a trainable object detector and its instantiations for detecting faces and cars at any size, location, and pose. To cope with variation in object orientation, the detector uses multiple classifiers, each spanning a different range of orientation. Each of these classifiers determines whether the object is present at a specified size within a fixed-size image window. To find the object at any location and size, these classifiers scan the image exhaustively. Each classifier is based on the statistics of localized parts. Each part is a transform from a subset of wavelet coefficients to a discrete set of values. Such parts are designed to capture various combinations of locality in space, frequency, and orientation. In building each classifier, we gathered the class-conditional statistics of these part values from representative samples of object and non-object images. We trained each classifier to minimize classification error on the training set by using Adaboost with Confidence-Weighted Predictions (Shapire and Singer, 1999). In detection, each classifier computes the part values within the image window and looks up their associated class-conditional probabilities. The classifier then makes a decision by applying a likelihood ratio test. For efficiency, the classifier evaluates this likelihood ratio in stages. At each stage, the classifier compares the partial likelihood ratio to a threshold and makes a decision about whether to cease evaluation---labeling the input as non-object---or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image. Our trainable object detector achieves reliable and efficient detection of human faces and passenger cars with out-of-plane rotation.
zhao2003frl
Face Recognition: A Literature Survey
W. Zhao and R. Chellappa and P. Phillips and A. Rosenfeld
ACM Computing Surveys  35  399--458  (2003)
As one of the most successful applications of image analysis and understanding, face recognition has recently received significant attention, especially during the past several years. At least two reasons account for this trend: the first is the wide range of commercial and law enforcement applications, and the second is the availability of feasible technologies after 30 years of research. Even though current machine recognition systems have reached a certain level of maturity, their success is limited by the conditions imposed by many real applications. For example, recognition of face images acquired in an outdoor environment with changes in illumination and/or pose remains a largely unsolved problem. In other words, current systems are still far away from the capability of the human perception system. This paper provides an up-to-date critical survey of still- and video-based face recognition research. There are two underlying motivations for us to write this survey paper: the first is to provide an up-to-date review of the existing literature, and the second is to offer some insights into the studies of machine recognition of faces. To provide a comprehensive survey, we not only categorize existing recognition techniques but also present detailed descriptions of representative methods within each category. In addition, relevant topics such as psychophysical studies, system evaluation, and issues of illumination and pose variation are covered.
viola2001rod
Rapid object detection using a boosted cascade of simple features
P. Viola and M. Jones
Proc. CVPR  1  511--518  (2001)
yang2002dfi
Detecting faces in images: a survey
M. H. Yang and D. Kriegman and N. Ahuja
Pattern Analysis and Machine Intelligence, IEEE Transactions on  24  34--58  (2002)
osuna1997tsv
Training support vector machines: an application to face detection
E. Osuna and R. Freund and F. Girosi and others
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition  24  (1997)
rowley1998nnb
Neural network-based face detection
H. Rowley and S. Baluja and T. Kanade
Pattern Analysis and Machine Intelligence, IEEE Transactions on  20  23--38  (1998)
vezhnevets2003spb
A survey on pixel-based skin color detection techniques
V. Vezhnevets and V. Sazonov and A. Andreeva
Proc. Graphicon  85--92  (2003)
Skin color has proven to be a useful and robust cue for face detection, localization and tracking. Image content filtering, content-aware video compression and image color balancing applications can also benefit from automatic detection of skin in images. Numerous techniques for skin color modelling and recognition have been proposed during several past years. A few papers comparing different approaches have been published [Zarit et al. 1999], [Terrillon et al. 2000], [Brand and Mason 2000]. However, a comprehensive survey on the topic is still missing. We try to fill this vacuum by reviewing most widely used methods and techniques and collecting their numerical evaluation results.
terrillon1998adh
Automatic detection of human faces in natural scene images by use of a skin color model and of invariant moments
J. C. Terrillon and M. David and S. Akamatsu
Proc. of the Third International Conference on Automatic Face and Gesture Recognition  112--117  (1998)
yang1997scm
Skin-color Modeling and Adaptation
J. Yang and W. Lu and A. Waibel
(1997)
sigal2000eap
Estimation and prediction of evolving color distributions for skin segmentation under varying illumination
L. Sigal and S. Sclaroff and V. Athitsos
Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition  2  152--159  (2000)
A novel approach for real-time skin segmentation in video sequences is described. The approach enables reliable skin segmentation despite wide variation in illumination during tracking. An explicit second order Markov model is used to predict evolution of the skin color (HSV) histogram over time. Histograms are dynamically updated based on feedback from the current segmentation and based on predictions of the Markov model. The evolution of the skin color distribution at each frame is parameterized by translation, scaling and rotation in color space. Consequent changes in geometric parameterization of the distribution are propagated by warping and re-sampling the histogram. The parameters of the discrete-time dynamic Markov model are estimated using Maximum Likelihood Estimation, and also evolve over time. Quantitative evaluation of the method was conducted on labeled ground-truth video sequences taken from popular movies.
raja1998tas
Tracking and segmenting people in varying lighting conditions using colour
Y. Raja and S. J. McKenna and S. Gong
Third International Conference on Automatic Face and Gesture Recognition, Nara, Japan, IEEE Computer Society Press  228--233  (1998)
Colour cues were used to obtain robust detection and tracking of people in relatively unconstrained dynamic scenes. Gaussian mixture models were used to estimate probability densities of colour for skin, clothing and background. These models were used to detect, track and segment people, faces and hands. A technique for dynamically updating the models to accommodate changes in apparent colour due to varying lighting conditions was used. Two applications are highlighted: (1) actor segmentation for virtual studios, and (2) focus of attention for face and gesture recognition systems. A system implemented on a 200MHz PC tracks multiple objects in real-time.
drew1998iic
Illumination-invariant color object recognition via compressed chromaticity histograms of color-channel-normalized images
M. Drew and J. Wei and Z. N. Li
Computer Vision, 1998. Sixth International Conference on  533--540  (1998)
chang1996cts
Color texture segmentation for clothing in a computer-aided fashion design system
C. C. Chang and L. L. Wang
Image and Vision Computing  14  685--702  (1996)
A traditional fashion designer has to draw a large number of drafts in order to accomplish an ideal style. Better performance can be achieved if these operations are done on computers, because the designer can easily make changes for various patterns and colors. To develop a computer-aided fashion design system, one of the most difficult tasks is to automatically separate the clothing from the background so that a new item can be `put on'. One difficulty of the segmentation work arises from the diverse patterns on the clothing, especially with folds or shadows. In this study, circular histograms are first utilized to quantize color and to reduce shadow/highlight effects. Then a color co-occurrence matrix and a color occurrence vector are proposed to characterize the color spatial dependence and color occurrence frequency of the clothing's texture. Next, based on the two color features blocks on the clothing are found by a region growing method. Finally, post-processing is applied to obtain a smooth clothing boundary. Experimental results are presented to show the feasibility of the proposed approach.
darrell2000ipt
Integrated Person Tracking Using Stereo, Color, and Pattern Detection
T. Darrell and G. Gordon and M. Harville and J. Woodfill
International Journal of Computer Vision  37  175--185  (2000)
We present an approach to real-time person tracking in crowded and/or unknown environments using integration of multiple visual modalities. We combine stereo, color, and face detection modules into a single robust system, and show an initial application in an interactive, face-responsive display. Dense, real-time stereo processing is used to isolate users from other objects and people in the background. Skin-hue classification identifies and tracks likely body parts within the silhouette of a user. Face pattern detection discriminates and localizes the face within the identified body parts. Faces and bodies of users are tracked over several temporal scales: short-term (user stays within the field of view), medium-term (user exits/reenters within minutes), and long term (user returns after hours or days). Short-term tracking is performed using simple region position and size correspondences, while medium and long-term tracking are based on statistics of user appearance. We discuss the failure modes of each individual module, describe our integration method, and report results with the complete system in trials with thousands of users.
arandjelovic2005afr
Automatic face recognition for film character retrieval in feature-length films
O. Arandjelovic and A. Zisserman
Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on  1  (2005)
everingham2005iiv
Identifying individuals in video by combining generative and discriminative head models
M. Everingham and A. Zisserman
Proc. ICCV  1103--1110  (2005)
zhang2003aah
Automated annotation of human faces in family albums
L. Zhang and L. Chen and M. Li and H. Zhang
Proceedings of the eleventh ACM international conference on Multimedia  355--358  (2003)
Automatic annotation of photographs is one of the most desirable needs in family photograph management systems. In this paper, we present a learning framework to automate the face annotation in family photograph albums. Firstly, methodologies of content-based image retrieval and face recognition are seamlessly integrated to achieve automated annotation. Secondly, face annotation is formulated in a Bayesian framework, in which the face similarity measure is defined as maximum a posteriori (MAP) estimation. Thirdly, to deal with the missing features, marginal probability is used so that samples which have missing features are compared with those having the full feature set to ensure a non-biased decision. The experimental evaluation has been conducted within a family album of few thousands of photographs and the results show that the proposed approach is effective and efficient in automated face annotation in family albums.
apostof07
Who Are You? realtime person identification
N. Apostoloff and A. Zisserman
(2007)
Everingham06a
Hello! My name is... Buffy -- Automatic Naming of Characters in TV Video
M. Everingham and J. Sivic and A. Zisserman
(2006)
kruppa
Fast and Robust Face Finding via Local Context
H. Kruppa and M. Castrillón-Santana and B. Schiele
Joint IEEE International Workshop on Visual Surveillance and Performance Evaluation of Tracking and Surveillance  (2003)
santana
ENCARA2: Real-time Detection of Multiple Faces at Different Resolutions in Video Streams
M. Castrillón Santana and O. Déniz Suárez and M. Hernández Tejera and C. Guerra Artal
Journal of Visual Communication and Image Representation  130-140  (2007)
sanne
Classifying the Head-shoulder region and orientation in pedestrians
S. Korzec
(2007)
mori
Recovering Human body configurations: Combining segmentation and recognition
G. Mori and X. Ren and A. A. Efros and J. Malik
IEEE Computer Vision and Pattern Recognition  326-333  (2004)
partassembly
Detection and tracking of Humans by Probabilistic Body Part Assembly
A. Micilotta
This paper presents a probabilistic framework of assembling detected human body parts into a full 2D human configuration. The face, torso, legs and hands are detected in cluttered scenes using boosted body part detectors trained by AdaBoost. Body configurations are assembled from the detected parts using RANSAC, and a coarse heuristic is applied to eliminate obvious outliers. An a priori mixture model of upper-body configurations is used to provide a pose likelihood for each configuration. A joint-likelihood model is then determined by combining the pose, part detector and corresponding skin model likelihoods. The assembly with the highest likelihood is selected by RANSAC, and the elbow positions are inferred. This paper also illustrates the combination of skin colour likelihood and detection likelihood to further reduce false hand and face detections.
viola
Robust real-time face detection
P. Viola and M. J. Jones
International Journal of Computer Vision  57  137-154  (2004)

Tags
ai, computer vision, programming, Python
