Iacopo Masi, PhD
Postdoctoral Scholar - Research Associate
University of Southern California (USC)
Postdoc at MICC, University of Florence
PhD Student at MICC, University of Florence
iacopoma at usc dot edu
cv / google scholar / Ph.D. thesis
papers: arXiv / journal / conference
My previous website @ MICC, Florence

About me

Ciao,
<Welcome · Benvenuto · خوش آمدید · 欢迎 · Bienvenido · ברוך הבא · ترحيب · Bienvenue · Chào mừng · 歓迎 · добро пожаловать · Willkommen · स्वागत · 환영 · Boas-vindas · kuwakaribisha> to my web-page!

I am a computer vision researcher interested in face recognition, person re-identification, and tracking.

I pursued my PhD in Computer Vision with Prof. Alberto Del Bimbo, Prof. Andrew D. Bagdanov and Dr. Federico Pernici, working at the Media Integration and Communication Center, University of Florence, and defended the dissertation "From Motion to Faces: 3D-assisted automatic analysis of people".

I was a visiting scholar at the USC IRIS Computer Vision Lab in Los Angeles, CA, with Prof. G. Medioni.

Recently, I joined the USC IRIS Computer Vision Lab as a postdoctoral scholar with Prof. G. Medioni.

My research spans a broad spectrum of computer vision, pattern recognition and machine learning with applications in the field of video-surveillance with PTZ cameras, tracking, person re-identification and 2D/3D face modeling and recognition.

News

  • Jul. 2017, Co-organizing the CHI workshop at ICCV17 with Giuseppe, Tal, and Shaogang!  
  • Feb. 2017, One paper accepted at CVPR17!  
  • Jan. 2017, ACM TOMM article on re-identification published!  
  • Jan. 2017, Check out my talk at ECCV16 on Face-Specific Data Augmentation for Effective Face Recognition  
  • Dec. 2016, Project page available (with code!) to regress 3DMM with no hassles!
  • Oct. 2016, Project page available (with code!) for Face-Specific Data Augmentation for Effective Face Recognition  
  • Jul. 2016, One paper accepted at ECCV16!  
  • Jul. 2016, Project page available for the Pose-Aware Models!  
  • Jul. 2016, New tech-report on re-identification!  
  • Apr. 2016, One paper accepted at CVPRW16!  
  • Feb. 2016, One paper accepted at CVPR16!  
  • Jan. 2016, One paper accepted at WACV16!  
  • Sep. 2015, Re-identification tutorial at BTAS15!  

Journal papers

Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification
G. Lisanti, S. Karaman, I. Masi
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM) 2017
abstract / bibtex / arxiv / Code and Project Page
    @article{lisanti2016mckcca,
      title={Multi Channel-Kernel Canonical Correlation Analysis for Cross-View Person Re-Identification},
      author={Giuseppe Lisanti and Svebor Karaman and Iacopo Masi},
      journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
      year={2017},
    }
        
In this paper we introduce a method to overcome one of the main challenges of person re-identification in multicamera networks, namely cross-view appearance changes. The proposed solution addresses the extreme variability of person appearance in different camera views by exploiting multiple feature representations. For each feature, Kernel Canonical Correlation Analysis (KCCA) with different kernels is exploited to learn several projection spaces in which the appearance correlation between samples of the same person observed from different cameras is maximized. An iterative logistic regression is finally used to select and weigh the contributions of each feature projections and perform the matching between the two views. Experimental evaluation shows that the proposed solution obtains comparable performance on VIPeR and PRID 450s datasets and improves on PRID and CUHK01 datasets with respect to the state of the art.
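The projection-learning step can be illustrated with a minimal regularized kernel CCA in numpy. This is only a sketch of the core idea, not the released code: the RBF kernel, regularization value and toy data are my own choices, and the paper's multi-feature combination and iterative logistic-regression weighting are omitted.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    # Pairwise squared distances -> Gaussian (RBF) kernel matrix.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def center(K):
    # Double-center a kernel matrix (zero mean in feature space).
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def kcca(Ka, Kb, reg=1e-3, dim=2):
    # Regularized kernel CCA: dual coefficients for view A that maximize
    # correlation with view B (standard two-view eigenproblem reduction).
    n = Ka.shape[0]
    Ka, Kb = center(Ka), center(Kb)
    I = np.eye(n)
    M = np.linalg.solve(Ka + reg * I, Kb) @ np.linalg.solve(Kb + reg * I, Ka)
    w, V = np.linalg.eig(M)
    order = np.argsort(-w.real)
    return V.real[:, order[:dim]]     # top correlated directions for view A
```

Samples of the same person observed from two cameras would then be matched after projecting each view's kernel vector onto the learned directions.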
Continuous Localization and Mapping of a Pan Tilt Zoom Camera for Wide Area Tracking
G. Lisanti, I. Masi, F. Pernici, A. Del Bimbo
Machine Vision and Applications 2016 (MVA)
abstract / bibtex
    @article{lisanti2016mva,
      author    = {Giuseppe Lisanti and
                   Iacopo Masi and
                   Federico Pernici and
                   Alberto Del Bimbo},
      title     = {Continuous Localization and Mapping of a Pan Tilt Zoom Camera for
                   Wide Area Tracking},
      journal   = {Machine Vision and Applications},
      year      = {2016},
    }
        
Pan-tilt-zoom (PTZ) cameras are powerful tools for supporting object identification and recognition in far-field scenes. However, the effective use of PTZ cameras in real contexts is complicated by the fact that continuous on-line camera calibration is needed, and the absolute pan, tilt and zoom positional values provided by the camera actuators cannot be used because they are not synchronized with the video stream. Accurate calibration must therefore be extracted directly from the visual content of the frames. Moreover, the large and abrupt scale changes, the scene background changes due to camera operation and the need for camera motion compensation make target tracking with these cameras extremely challenging. In this paper, we present a solution that provides continuous on-line calibration of PTZ cameras, is robust to rapid camera motion and to changes of the environment due to illumination or moving objects, and scales beyond thousands of landmarks. The method directly derives the relationship between the position of a target on the 3D world plane and the corresponding scale and position in the 2D image, and allows real-time tracking of multiple targets with a high and stable degree of accuracy even at far distances and any zoom level.
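For points on the ground plane, the world-to-image relationship mentioned above is a planar homography. The sketch below (with a toy matrix `H`, not the paper's calibration procedure) shows how a target's image position, and a generic notion of local image scale, follow from its world-plane position:

```python
import numpy as np

def to_image(H, xw):
    # Map a world ground-plane point (x, y) through homography H (3x3) to pixels.
    p = H @ np.array([xw[0], xw[1], 1.0])
    return p[:2] / p[2]

def local_scale(H, xw, eps=1e-4):
    # Local image magnification at xw: sqrt(|det J|) of the Jacobian of the
    # world->image map, estimated by finite differences. This is a generic
    # stand-in for the paper's position-to-scale relation.
    p0 = to_image(H, xw)
    dx = (to_image(H, (xw[0] + eps, xw[1])) - p0) / eps
    dy = (to_image(H, (xw[0], xw[1] + eps)) - p0) / eps
    J = np.stack([dx, dy], axis=1)
    return np.sqrt(abs(np.linalg.det(J)))
```

With `H` the identity the scale is 1 everywhere; a pure zoom (diagonal scaling of `H`) magnifies it uniformly.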
Person Re-identification by Iterative Re-weighted Sparse Ranking
G. Lisanti, I. Masi, A. D. Bagdanov, A. Del Bimbo
IEEE Transactions on Pattern Analysis and Machine Intelligence 2015 (TPAMI)
abstract / bibtex / WHOS descriptor / code
@article{lisanti:pami14, 
    author = {Lisanti, Giuseppe and Masi, Iacopo and Bagdanov, Andrew D. 
    and {Del Bimbo}, Alberto}, 
    title = {Person Re-identification by Iterative Re-weighted Sparse Ranking}, 
    journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
    year = {2015},
    }
        
In this paper we introduce a method for person re-identification based on discriminative, sparse basis expansions of targets in terms of a labeled gallery of known individuals. We propose an iterative extension to sparse discriminative classifiers capable of ranking many candidate targets. The approach makes use of soft- and hard- re-weighting to redistribute energy among the most relevant contributing elements and to ensure that the best candidates are ranked at each iteration. Our approach also leverages a novel visual descriptor which we show to be discriminative while remaining robust to pose and illumination variations. An extensive comparative evaluation is given demonstrating that our approach achieves state-of-the-art performance on single- and multi-shot person re-identification scenarios on the VIPeR, i-LIDS, ETHZ, and CAVIAR4REID datasets. The combination of our descriptor and iterative sparse basis expansion improves state-of-the-art rank-1 performance by six percentage points on VIPeR and by 20 on CAVIAR4REID compared to other methods with a single gallery image per person. With multiple gallery and probe images per person our approach improves by 17 percentage points the state-of-the-art on i-LIDS and by 72 on CAVIAR4REID at rank-1. The approach is also quite efficient, capable of single-shot person re-identification over galleries containing hundreds of individuals at about 30 re-identifications per second.
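The core ranking step — expand the probe on a basis of gallery descriptors with an ℓ1 penalty and rank identities by class-wise reconstruction residual — can be sketched as below. This is a simplified SRC-style sketch with a plain ISTA solver and toy data; the paper's soft/hard re-weighting iterations and the WHOS descriptor are omitted.

```python
import numpy as np

def ista(D, y, lam=0.05, n_iter=300):
    # Minimize 0.5*||y - D x||^2 + lam*||x||_1 by iterative soft-thresholding.
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ x - y)              # gradient of the quadratic term
        x = x - g / L
        x = np.sign(x) * np.maximum(np.abs(x) - lam / L, 0.0)   # shrinkage
    return x

def rank_gallery(D, labels, y):
    # Rank gallery identities by class-wise reconstruction residual:
    # the identity whose columns best reconstruct the probe comes first.
    x = ista(D, y)
    scores = {}
    for c in set(labels):
        mask = np.array(labels) == c
        xc = np.where(mask, x, 0.0)        # keep only this identity's coefficients
        scores[c] = np.linalg.norm(y - D @ xc)
    return sorted(scores, key=scores.get)  # smallest residual first
```

Columns of `D` are (unit-normalized) gallery descriptors; `labels` gives the identity of each column.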
Compact and efficient posterity logging of face imagery for video surveillance
A. D. Bagdanov, A. Del Bimbo, F. Dini, G. Lisanti, I. Masi
IEEE Multimedia 2012 (IEEEMM)
abstract / bibtex / code & dataset
@article{bagdanov2012ieee, 
      author = {Bagdanov, Andrew D. and {Del Bimbo}, Alberto and Dini, Fabrizio and Lisanti, Giuseppe and Masi, Iacopo}, 
      title = {Compact and efficient posterity logging of face imagery for video surveillance}, 
      journal = {IEEE Multimedia}, 
      year = {2012},
    }
        
A real-time posterity logging system detects and tracks multiple targets in video streams, grabbing face images and retaining only the best quality for each detected target.

Conference papers

2017

Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network
A. Tran, T. Hassner, I. Masi, G. Medioni
Computer Vision and Pattern Recognition 2017 (CVPR17)
abstract / bibtex / project page / code (Python) / arxiv
@article{tran2016regressing,
  title={Regressing Robust and Discriminative {3D} Morphable Models with a very Deep Neural Network},
  author={Anh Tu\~{a}n Tr\~{a}n and Tal Hassner and Iacopo Masi and G\'{e}rard Medioni},
  journal={arXiv preprint arXiv:1612.04904},
  year={2016}
}
        
The 3D shapes of faces are well known to be discriminative. Yet despite this, they are rarely used for face recognition and always under controlled viewing conditions. We claim that this is a symptom of a serious but often overlooked problem with existing methods for single view 3D face reconstruction: when applied "in the wild", their 3D estimates are either unstable and change for different photos of the same subject or they are over-regularized and generic. In response, we describe a robust method for regressing discriminative 3D morphable face models (3DMM). We use a convolutional neural network (CNN) to regress 3DMM shape and texture parameters directly from an input photo. We overcome the shortage of training data required for this purpose by offering a method for generating huge numbers of labeled examples. The 3D estimates produced by our CNN surpass state of the art accuracy on the MICC data set. Coupled with a 3D-3D face matching pipeline, we show the first competitive face recognition results on the LFW, YTF and IJB-A benchmarks using 3D face shapes as representations, rather than the opaque deep feature vectors used by other modern systems.
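What the network regresses is a vector of 3DMM coefficients; decoding them back into a shape is just a linear model over the 3DMM mean and principal components. The sketch below uses random stand-in bases with toy dimensions (a real model such as the Basel Face Model has tens of thousands of vertices and its own learned components), so it only illustrates the decoding step, not the CNN.

```python
import numpy as np

# Toy stand-in for a 3DMM shape decoder. Dimensions and bases are random
# placeholders for illustration only.
n_vertices, n_components = 500, 10
rng = np.random.default_rng(0)
mean_shape = rng.standard_normal(3 * n_vertices)            # flattened (x,y,z) per vertex
components = rng.standard_normal((3 * n_vertices, n_components))
sigma = np.linspace(1.0, 0.1, n_components)                  # per-component std devs

def decode_shape(alpha):
    # alpha: coefficients as regressed by the CNN, in units of std dev.
    return mean_shape + components @ (sigma * alpha)

shape = decode_shape(np.zeros(n_components))                 # zero coefficients -> mean face
```

Texture parameters decode the same way against the model's texture basis.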
Rapid Synthesis of Massive Face Sets for Improved Face Recognition
I. Masi, T. Hassner, A. Tran, G. Medioni
International Conference on Automatic Face and Gesture Recognition (FG17)
abstract / bibtex / project page / code (Python)
@inproceedings{masi2017rapid,
  title={Rapid Synthesis of Massive Face Sets for Improved Face Recognition},
  author={Iacopo Masi and Tal Hassner and Anh Tu\~{a}n Tr\~{a}n and G\'{e}rard Medioni},
  booktitle={International Conference on Automatic Face and Gesture Recognition (FG)},
  year={2017}
}
        
Recent work demonstrated that computer graphics techniques can be used to improve face recognition performance by synthesizing multiple new views of faces available in existing face collections. By doing so, more images and more appearance variations are available for training, thereby improving the deep models trained on these images. Similar rendering techniques were also applied at test time to align faces in 3D and reduce appearance variations when comparing faces. These previous results, however, did not consider the computational cost of rendering: at training, rendering millions of face images can be prohibitive; at test time, rendering can quickly become a bottleneck, particularly when multiple images represent a subject. This paper builds on a number of observations which, under certain circumstances, allow rendering new 3D views of faces at a computational cost equivalent to simple 2D image warping. We demonstrate this by showing that an optimized OpenGL rendering engine is slower than the simple Python implementation we designed for the same purpose. The proposed rendering is used in a face recognition pipeline and tested on the challenging IJB-A and Janus CS2 benchmarks. Our results show that our rendering is not only fast, but also improves recognition accuracy.

2016

Do We Really Need to Collect Millions of Faces for Effective Face Recognition?
I. Masi*, A. Tran*, T. Hassner*, J. Leksut, G. Medioni
European Conference on Computer Vision (ECCV16)
abstract / bibtex / project page / code (Python) / arxiv
    @inproceedings{MTLHM:2016:dowe,
      title={Do We Really Need to Collect Millions of Faces 
      for Effective Face Recognition?},
      author={Iacopo Masi 
      and Anh Tu\~{a}n Tr\~{a}n 
      and Tal Hassner 
      and Jatuporn Toy Leksut 
      and G\'{e}rard Medioni},
      booktitle={European Conference on Computer Vision},
      year={2016},
    }
        
Face recognition capabilities have recently made extraordinary leaps. Though this progress is at least partially due to ballooning training set sizes -- huge numbers of face images downloaded and labeled for identity -- it is not clear if the formidable task of collecting so many images is truly necessary. We propose a far more accessible means of increasing training data sizes for face recognition systems. Rather than manually harvesting and labeling more faces, we simply synthesize them. We describe novel methods of enriching an existing dataset with important facial appearance variations by manipulating the faces it contains. We further apply this synthesis approach when matching query images represented using a standard convolutional neural network. The effect of training and testing with synthesized images is extensively tested on the LFW and IJB-A (verification and identification) benchmarks and Janus CS2. The performances obtained by our approach match state of the art results reported by systems trained on millions of downloaded images.
Pooling Faces: Template based Face Recognition with Pooled Face Images
T. Hassner, I. Masi, J. Kim, J. Choi. S. Harel, P. Natarajan, G. Medioni
Computer Vision and Pattern Recognition Workshops 2016 (CVPRW16)
abstract / bibtex
    @inproceedings{hassner2016pool, 
    author = {Hassner, Tal and Masi, Iacopo and Kim, Jungyeon and Choi,
    Jongmoo and Harel, Shai and Natarajan, Prem and Medioni, G\'{e}rard}, 
    title = {Pooling Faces: Template based Face Recognition with Pooled Face Images}, 
    booktitle = {Computer Vision and Pattern Recognition Workshops}, 
    year = {2016},
    }
    
We propose a novel approach to template based face recognition. Our dual goal is to both increase recognition accuracy and reduce the computational and storage costs of template matching. To do this, we leverage an approach which was proven effective in many other domains, but, to our knowledge, never fully explored for face images: average pooling of face photos. We show how (and why!) the space of a template's images can be partitioned and then pooled based on image quality and head pose, and the effect this has on accuracy and template size. We perform extensive tests on the IJB-A and Janus CS2 template based face identification and verification benchmarks. These show that not only does our approach outperform published state of the art despite requiring far fewer cross-template comparisons, but also, surprisingly, that image pooling performs on par with deep feature pooling.
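The partition-then-pool idea can be sketched as follows: bin a template's images by quality and yaw, and replace each bin with its pixel-wise average, shrinking the template. The bin edges and quality threshold below are illustrative values of my own, not the paper's.

```python
import numpy as np

def pool_template(images, quality, yaw, q_th=0.5, yaw_bins=(-90, -30, 30, 90)):
    # Partition a template's images into (quality, yaw) bins and average
    # each non-empty bin into a single pooled image.
    images, quality, yaw = map(np.asarray, (images, quality, yaw))
    pooled = []
    for hi_q in (False, True):                       # low- vs high-quality split
        for lo, hi in zip(yaw_bins[:-1], yaw_bins[1:]):  # coarse pose bins
            mask = ((quality >= q_th) == hi_q) & (yaw >= lo) & (yaw < hi)
            if mask.any():
                pooled.append(images[mask].mean(axis=0))
    return np.stack(pooled)
```

Matching then compares the few pooled images per template instead of every image pair.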
Pose-Aware Face Recognition in the Wild
I. Masi, S. Rawls, G. Medioni, P. Natarajan
Computer Vision and Pattern Recognition 2016 (CVPR16)
abstract / bibtex / Download PAM CNNs / IJB-A yaw estimates
    @INPROCEEDINGS{masi2016cvpr, 
    author={Iacopo Masi and Stephen Rawls and G{\'e}rard Medioni and Prem Natarajan}, 
    booktitle={CVPR}, 
    title={Pose-{A}ware {F}ace {R}ecognition in the {W}ild}, 
    year={2016}
    }
        
We propose a method to push the frontiers of unconstrained face recognition in the wild, focusing on the problem of extreme pose variations. As opposed to current techniques which either expect a single model to learn pose invariance through massive amounts of training data, or which normalize images to a single frontal pose, our method explicitly tackles pose variation by using multiple pose-specific models and rendered face images. We leverage deep Convolutional Neural Networks (CNNs) to learn discriminative representations we call Pose-Aware Models (PAMs) using 500K images from the CASIA WebFace dataset. We present a comparative evaluation on the new IARPA Janus Benchmark A (IJB-A) and PIPA datasets. On these datasets PAMs achieve remarkably better performance than commercial products and surprisingly also outperform methods that are specifically fine-tuned on the target dataset.
Face Recognition Using Deep Multi-Pose Representations
W. AbdAlmageed*, Y. Wu*, S. Rawls*, S. Harel, T. Hassner, I. Masi, J. Choi, J. Leksut, J. Kim,
P. Natarajan, R. Nevatia, G. Medioni
Winter Conference on Applications of Computer Vision 2016 (WACV16)
abstract / bibtex
    @inproceedings{AbdAlmageed2016multipose,
      title={Face Recognition Using Deep Multi-Pose Representations},
      author={AbdAlmageed, Wael and Wu, Yue and Rawls, Stephen and Harel, Shai and Hassner, Tal and
      Masi, Iacopo and Choi, Jongmoo and Leksut, Jatuporn and Kim, Jungyeon and Natarajan, Prem and
      Nevatia, Ram and Medioni, G\'{e}rard},
      booktitle={Winter Conference on Applications of Computer Vision (WACV)},
      year={2016}
    }
        
We introduce our method and system for face recognition using multiple pose-aware deep learning models. In our representation, a face image is processed by several pose-specific deep convolutional neural network (CNN) models to generate multiple pose-specific features. 3D rendering is used to generate multiple face poses from the input image. Sensitivity of the recognition system to pose variations is reduced since we use an ensemble of pose-specific CNN features. The paper presents extensive experimental results on the effect of landmark detection, CNN layer selection and pose model selection on the performance of the recognition pipeline. Our novel representation achieves better results than the state-of-the-art on IARPA's CS2 and NIST's IJB-A in both verification and identification (i.e. search) tasks.

2014

Information Theoretic Sensor Management for Multi-Target Tracking with a Single Pan-Tilt-Zoom Camera
P. Salvagnini, F. Pernici, M. Cristani, G. Lisanti, I. Masi, A. Del Bimbo, V. Murino
Winter Conference on Applications of Computer Vision 2014 (WACV14)
abstract / bibtex
    @inproceedings{salvagnini2014sensor,
      title={Information Theoretic Sensor Management for Multi-Target Tracking with a Single Pan-Tilt-Zoom Camera},
      author={P. Salvagnini and F. Pernici and M. Cristani and G. Lisanti and I. Masi and A. {Del Bimbo} and V. Murino},
      booktitle={Winter Conference on Applications of Computer Vision (WACV)},
      year={2014}
    }
        
Automatic multiple-target tracking with pan-tilt-zoom (PTZ) cameras is a hard task, with few approaches in the literature, most of them addressing simplistic scenarios. In this paper, we present a PTZ camera management framework that rests on information-theoretic principles: at each time step, the next camera pose (pan, tilt, focal length) is chosen according to a policy which ensures maximum information gain. The formulation takes into account occlusions, the physical extension of targets, realistic pedestrian detectors and the mechanical constraints of the camera. Convincing comparative results on synthetic data, realistic simulations and the implementation on a real video surveillance camera validate the effectiveness of the proposed method.
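The greedy information-gain policy can be illustrated in one dimension: for each candidate pose, sum the expected entropy reduction of the Gaussian target states that would fall inside the field of view. All numbers below are toy values, and the paper's formulation also models occlusions, target extent and detector behavior, which this sketch ignores.

```python
import numpy as np

def entropy(var):
    # Differential entropy of a 1-D Gaussian with variance var.
    return 0.5 * np.log(2 * np.pi * np.e * var)

def best_pose(target_pos, target_var, poses, fov=20.0, obs_var=0.5):
    # Pick the (pan-like, 1-D) pose whose field of view yields the largest
    # expected entropy reduction over the currently tracked targets.
    gains = []
    for p in poses:
        g = 0.0
        for x, v in zip(target_pos, target_var):
            if abs(x - p) <= fov / 2:                      # target visible from this pose
                post = 1.0 / (1.0 / v + 1.0 / obs_var)     # Kalman-style variance update
                g += entropy(v) - entropy(post)            # information gained on this target
        gains.append(g)
    return poses[int(np.argmax(gains))]
```

With equal visibility, the policy naturally prefers poses covering the most uncertain targets.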
Matching People across Camera Views using Kernel Canonical Correlation Analysis
G. Lisanti, I. Masi, A. Del Bimbo
International Conference on Distributed Smart Cameras (ICDSC14)
abstract / bibtex / code
    @inproceedings{lisanti2014kccareid,
      title={Matching People across Camera Views using Kernel Canonical Correlation Analysis},
      author={G. Lisanti and I. Masi and A. {Del Bimbo}},
      booktitle={Eighth ACM/IEEE International Conference on Distributed Smart Cameras},
      year={2014}
    }
        
Matching people across views is still an open problem in computer vision and in video surveillance systems. In this paper we address the problem of person re-identification across disjoint cameras by proposing an efficient but robust kernel descriptor to encode the appearance of a person. The matching is then improved by applying a learning technique based on Kernel Canonical Correlation Analysis (KCCA) which finds a common subspace between the proposed descriptors extracted from disjoint cameras, projecting them into a new description space. This common description space is then used to identify a person from one camera to another with a standard nearest-neighbor voting method. We evaluate our approach on two publicly available datasets for re-identification (VIPeR and PRID), demonstrating that our method yields state-of-the-art performance with respect to recent techniques proposed for the re-identification task.
Pose Independent Face Recognition by Localizing Local Binary Patterns via Deformation Components
I. Masi, C. Ferrari, A. Del Bimbo, G. Medioni
International Conference on Pattern Recognition 2014 (ICPR14)
abstract / bibtex / video
    @inproceedings{masiicpr14facepose,
      author = {Masi, Iacopo and Ferrari, Claudio and {Del Bimbo}, Alberto and Medioni, Gerard}, 
      title = {Pose Independent Face Recognition by Localizing Local Binary Patterns via Deformation Components}, 
      booktitle = {International Conference on Pattern Recognition}, 
      year = {2014},
    }
        
In this paper we address the problem of pose-independent face recognition with a gallery set containing one frontal face image per enrolled subject, while the probe set is composed of single face images undergoing pose variations. The approach uses a set of aligned 3D models to learn deformation components using a 3D Morphable Model (3DMM). This allows fitting a 3DMM efficiently to an image using a ridge-regression solution, regularized on the face space estimated via PCA. The approach then describes each profile face by computing Local Binary Pattern (LBP) histograms localized on each deformed vertex, projected on a rendered frontal view. In the experimental results we evaluate the proposed method on CMU Multi-PIE to assess face recognition across pose. We show how our process leads to higher performance than regular baselines, reporting high recognition rates for a range of facial poses in the probe set, up to 45 degrees. Finally, we remark that our approach can handle continuous pose variations and is comparable with recent state-of-the-art approaches.
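The local descriptor above is the standard 8-neighbour LBP histogram. A compact numpy version is sketched below, computing dense per-pixel codes on a single patch; the paper's vertex localization and frontal projection are omitted.

```python
import numpy as np

def lbp_histogram(patch):
    # Basic 8-neighbour Local Binary Pattern: threshold each interior pixel's
    # ring of neighbours against the centre, pack the 8 bits into a code in
    # [0, 255], and histogram the codes over the patch.
    p = np.asarray(patch, dtype=float)
    c = p[1:-1, 1:-1]                                  # interior (centre) pixels
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]        # 8-neighbour offsets
    code = np.zeros_like(c, dtype=int)
    for bit, (dy, dx) in enumerate(shifts):
        nb = p[1 + dy:p.shape[0] - 1 + dy, 1 + dx:p.shape[1] - 1 + dx]
        code |= (nb >= c).astype(int) << bit           # set bit if neighbour >= centre
    hist, _ = np.histogram(code, bins=256, range=(0, 256))
    return hist / hist.sum()                           # normalized 256-bin histogram
```

Concatenating such histograms over localized regions yields the face descriptor.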

2013

Multi-Target Data Association using Sparse Reconstruction
A. D. Bagdanov, A. Del Bimbo, D. Di Fina, S. Karaman, G. Lisanti, I. Masi
International Conference on Image Analysis and Processing 2013 (ICIAP2013)
abstract / bibtex
    @InProceedings{difina2013assoc, 
      author = {Bagdanov, Andrew D. and {Del Bimbo}, Alberto and {Di Fina}, Dario and Karaman, Svebor and Lisanti, Giuseppe and Masi, Iacopo}, 
      title = {Multi-Target Data Association using Sparse Reconstruction}, 
      booktitle = {International Conference on Image Analysis and Processing (ICIAP)}, 
      year = {2013},
    }
        
In this paper we describe a solution to multi-target data association problem based on ℓ1-regularized sparse basis expansions. Assuming we have sufficient training samples per subject, our idea is to create a discriminative basis of observations that we can use to reconstruct and associate a new target. The use of ℓ1-regularized basis expansions allows our approach to exploit multiple instances of the target when performing data association rather than relying on an average representation of target appearance. Preliminary experimental results on the PETS dataset are encouraging and demonstrate that our approach is an accurate and efficient approach to multi-target data association.
Using 3D Models to Recognize 2D Faces in the Wild
I. Masi, G. Lisanti, A. D. Bagdanov, P. Pala, and A. Del Bimbo
CVPR Workshop on Socially Intelligent Surveillance and Monitoring (CVPRW2013)
abstract / bibtex
    @InProceedings{masi2013models, 
      author = {Masi, Iacopo and Lisanti, Giuseppe and Bagdanov, Andrew D. and Pala, Pietro and {Del Bimbo}, Alberto}, 
      title = {Using {3D} Models to Recognize {2D} Faces in the Wild}, 
      booktitle = {CVPR Workshop on Socially Intelligent Surveillance and Monitoring},
      year = {2013}, 
      }
        
In this paper we consider the problem of face recognition in imagery captured in uncooperative environments using PTZ cameras. For each subject enrolled in the gallery, we acquire a high-resolution 3D model from which we generate a series of rendered face images of varying viewpoint. The result of regularly sampling face pose for all subjects is a redundant basis that over represents each target. To recognize an unknown probe image, we perform a sparse reconstruction of SIFT features extracted from the probe using a basis of SIFT features from the gallery. While directly collecting images over varying pose for all enrolled subjects is prohibitive at enrollment, the use of high speed, 3D acquisition systems allows our face recognition system to quickly acquire a single model, and generate synthetic views offline. Finally we show, using two publicly available datasets, how our approach performs when using rendered gallery images to recognize 2D rendered probe images and 2D probe images acquired using PTZ cameras.

2012 and earlier

Multi-pose Face Detection for Accurate Face Logging
A. D. Bagdanov, A. Del Bimbo, G. Lisanti, and I. Masi
International Conference on Pattern Recognition 2012 (ICPR2012)
abstract / bibtex / code & dataset
    @InProceedings{bagdanov2012log, 
      author="Andrew D. Bagdanov and Alberto {Del Bimbo} and Giuseppe Lisanti and Iacopo Masi",
      title="Multi-pose Face Detection for Accurate Face Logging",
      booktitle="International Conference on Pattern Recognition",
      year=2012, 
      }
        
In this paper we present a technique for real-time face logging in video streams. Our system is capable of detecting faces across a range of poses and of tracking multiple targets in real time, grabbing face images and evaluating their quality in order to store only the best for each detected target. An advantage of our approach is that we qualify every logged face in terms of a quality measure based both on face pose and on resolution. Extensive qualitative and quantitative evaluation of the performance of our system is provided on many hours of realistic surveillance footage captured in different environments. Results show that our system can simultaneously minimize false positives and identity mismatches, while balancing this against the need to obtain face images of all people in a scene.
Florence faces: a dataset supporting 2D/3D face recognition
A. D. Bagdanov, A. Del Bimbo, I. Masi
International Symposium on Communications, Control, and Signal Processing 2012 (ISCCSP 2012)

The Florence 2D/3D Hybrid Face Dataset
A. D. Bagdanov, A. Del Bimbo, I. Masi
Joint ACM Workshop on Human Gesture and Behavior Understanding (J-HGBU’11) ACM Multimedia Workshop 2011
abstract / bibtex / dataset
    @InProceedings{bagdanov2011florence, 
      author="Andrew D. Bagdanov and Alberto {Del Bimbo} and Iacopo Masi", 
      title="The Florence 2D/3D Hybrid Face Dataset", 
      booktitle="Joint ACM Workshop on Human Gesture and Behavior Understanding (J-HGBU)",
       year="2011"
      }
        
This article describes a new dataset under construction at the Media Integration and Communication Center of the University of Florence. The dataset consists of high-resolution 3D scans of human faces from each subject, along with several video sequences of varying resolution and zoom level. Each subject is recorded in a controlled setting in HD video, then in a less constrained (but still indoor) setting using a standard PTZ surveillance camera, and finally in an unconstrained, outdoor environment with challenging conditions. In each sequence the subject is recorded at three levels of zoom. This dataset is being constructed specifically to support research on techniques that bridge the gap between 2D, appearance-based recognition techniques and fully 3D approaches. It is designed to simulate, in a controlled fashion, realistic surveillance conditions and to probe the efficacy of exploiting 3D models in real scenarios.
Continuous Recovery for Real Time Pan Tilt Zoom Localization and Mapping
A. Del Bimbo, G. Lisanti, I. Masi, F. Pernici
Advanced Video and Signal based Surveillance (AVSS2011)
abstract / bibtex / poster / dataset / video
    @InProceedings{delbimbo2011recovery, 
      author="Alberto {Del Bimbo} and Giuseppe Lisanti and Iacopo Masi and Federico Pernici",
      title="Continuous Recovery for Real Time Pan Tilt Zoom Localization and Mapping",
      booktitle="Advanced Video and Signal based Surveillance (AVSS 2011)",
      year=2011,
      }
        
We propose a method for real-time recovery from tracking failure in monocular localization and mapping with a Pan Tilt Zoom (PTZ) camera. The method automatically detects and seamlessly recovers from tracking failure while preserving map integrity. By extending recent advances in PTZ localization and mapping, the system can quickly and continuously recover from tracking failures by determining the best way to task two different localization modalities. The tradeoff involved when choosing between the two modalities is captured by maximizing the information expected to be extracted from the scene map. This is especially helpful in four main viewing conditions: blurred frames, weakly textured scenes, out-of-date maps and occlusions due to sensor quantization or moving objects. Extensive tests show that the resulting system is able to recover from several different failures while zooming in on weakly textured scenes, all in real time.
Person Detection using Temporal and Geometric Context with a PTZ Camera
A. Del Bimbo, G. Lisanti, I. Masi, F. Pernici
International Conference on Pattern Recognition 2010 (ICPR 2010)
abstract / bibtex
   @InProceedings{delbimbo2010icpr,
  author       = {{Del Bimbo}, Alberto and Lisanti, Giuseppe and Masi, Iacopo and Pernici, Federico},
  title        = {Person Detection using Temporal and Geometric Context with a Pan Tilt Zoom Camera},
  booktitle    = {IAPR International Conference on Pattern Recognition},
  year         = {2010},
}     
        
In this paper we present a system that integrates automatic camera geometry estimation and object detection from a Pan Tilt Zoom camera. We estimate camera pose with respect to a world scene plane in real time and perform human detection exploiting the relative space-time context. Using camera self-localization, 2D object detections are clustered in a 3D world coordinate frame. Target scale inference is further exploited to reduce the number of false alarms and also to increase the detection rate in the final non-maximum suppression stage. Our integrated system, applied on real-world data, shows superior performance with respect to the standard detector used.
Device-Tagged Feature-based Localization and Mapping of Wide Areas with a PTZ Camera
A. Del Bimbo, G. Lisanti, I. Masi, F. Pernici
CVPR Int'l Workshop on Socially Intelligent Surveillance (CVPRW10)
abstract / bibtex / video
@InProceedings{delbimbo2010sism,
  author       = {{Del Bimbo}, Alberto and Lisanti, Giuseppe and Masi, Iacopo and Pernici, Federico},
  title        = {Device-Tagged Feature-based Localization and Mapping of Wide Areas with a PTZ Camera},
  booktitle    = {Proc. of CVPR Int'l Workshop on Socially Intelligent Surveillance
and Monitoring},
  year         = {2010},
}
        
This paper proposes a new method for estimating and maintaining over time the pose of a single Pan-Tilt-Zoom (PTZ) camera. This is achieved first by building a keypoint database of the scene offline; then, in the online step, a coarse localization is obtained from camera odometry and refined by matching visual landmarks. A maintenance step is also performed at runtime to keep the geometry and appearance of the map up to date. At the present state of the art, there are no methods that address the problem of remaining operative for a long period of time; differently from our proposal, these methods also do not account for variations in focal length. Experimental evaluation shows that the proposed approach delivers stable camera pose tracking over time with hundreds of thousands of landmarks, which can be kept updated at runtime.
Authors listed alphabetically
* Indicates equal contributions

Ph.D. dissertation

From Motion to Faces: 3D-assisted automatic analysis of people
I. Masi
Ph.D. dissertation, University of Florence, March 2014

I really like these two websites.