Smart Cities rely on the use of ICTs for a more efficient and intelligent use of resources, whilst improving citizens' quality of life and reducing the environmental footprint. As far as the livability of cities is concerned, traffic is one of the most frequent and complex factors directly affecting citizens. Particularly, drivers in search of a vacant parking spot are a non-negligible source of atmospheric and acoustic pollution. Although some cities have installed sensor-based vacant parking spot detectors in some neighbourhoods, the cost of this approach makes it unfeasible at large scale. As an approach to implement a sustainable solution to the vacant parking spot detection problem in urban environments, this work advocates fusing the information from small-scale sensor-based detectors with that obtained from exploiting the widely-deployed video surveillance camera networks. In particular, this paper focuses on how video analytics can be exploited as a prior step towards Smart City solutions based on data fusion. Through a set of experiments carefully planned to replicate a real-world scenario, the vacant parking spot detection success rate of the proposed system is evaluated through a critical comparison of local and global visual features (either alone or fused at feature level) and different classifier systems applied to the task. Furthermore, the system is tested under setup scenarios of different complexities, and experimental results show that while local features are best when training with small amounts of highly accurate on-site data, they are outperformed by their global counterparts when training with more samples from an external vehicle database.