TY - GEN
T1 - Multiview random forest of local experts combining RGB and LIDAR data for pedestrian detection
AU - Gonzalez, Alejandro
AU - Villalonga, Gabriel
AU - Xu, Jiaolong
AU - Vazquez, David
AU - Amores, Jaume
AU - Lopez, Antonio M.
N1 - Publisher Copyright:
© 2015 IEEE.
PY - 2015/8/26
Y1 - 2015/8/26
N2 - Despite recent significant advances, pedestrian detection remains an extremely challenging problem in real-world scenarios. To develop a detector that operates successfully under these conditions, it is critical to leverage multiple cues, multiple imaging modalities, and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multi-modality, and a strong multi-view classifier) affects performance, both individually and when integrated together. In the multi-modality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a modality that has only recently started to receive attention. As our analysis reveals, although all of these aspects significantly improve performance, the fusion of visible-spectrum and depth information boosts accuracy by a much larger margin. The resulting detector not only ranks among the top performers on the challenging KITTI benchmark, but is also built from very simple blocks that are easy to implement and computationally efficient. These simple blocks can easily be replaced with more sophisticated ones recently proposed, such as convolutional neural networks for feature representation, to further improve accuracy.
AB - Despite recent significant advances, pedestrian detection remains an extremely challenging problem in real-world scenarios. To develop a detector that operates successfully under these conditions, it is critical to leverage multiple cues, multiple imaging modalities, and a strong multi-view classifier that accounts for different pedestrian views and poses. In this paper we provide an extensive evaluation that gives insight into how each of these aspects (multi-cue, multi-modality, and a strong multi-view classifier) affects performance, both individually and when integrated together. In the multi-modality component we explore the fusion of RGB and depth maps obtained by high-definition LIDAR, a modality that has only recently started to receive attention. As our analysis reveals, although all of these aspects significantly improve performance, the fusion of visible-spectrum and depth information boosts accuracy by a much larger margin. The resulting detector not only ranks among the top performers on the challenging KITTI benchmark, but is also built from very simple blocks that are easy to implement and computationally efficient. These simple blocks can easily be replaced with more sophisticated ones recently proposed, such as convolutional neural networks for feature representation, to further improve accuracy.
UR - http://www.scopus.com/inward/record.url?scp=84951177187&partnerID=8YFLogxK
U2 - 10.1109/IVS.2015.7225711
DO - 10.1109/IVS.2015.7225711
M3 - Conference contribution
AN - SCOPUS:84951177187
T3 - IEEE Intelligent Vehicles Symposium, Proceedings
SP - 356
EP - 361
BT - IV 2015 - 2015 IEEE Intelligent Vehicles Symposium
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - IEEE Intelligent Vehicles Symposium, IV 2015
Y2 - 28 June 2015 through 1 July 2015
ER -