Adaptation of Deep Neural Networks for Visual Recognition
General presentation of the topic:
The proliferation of inexpensive sensors paves the way for innovative AI technologies and applications. For instance, data from distributed RGB, IR, LiDAR, and depth sensors are often integrated in mobile robotics, autonomous driving, and video surveillance applications to enhance perception. In such applications, cost-effective systems are required to recognize individuals, objects, and their behaviours from the massive amounts of data captured across multiple sensors. Beyond the computational complexity, video recognition across multiple domains (sensor modalities and operational conditions) may degrade in real-world scenarios due to cross-domain shifts, background clutter, variations in illumination, occlusion, etc. Deep learning (DL) models based on convolutional neural networks (CNNs) and vision transformers provide state-of-the-art performance in many visual recognition applications. Yet, their performance can decline in real-world applications when training on noisy data with limited or no annotations, and in the presence of a domain shift between source and target (operational) data.
Objectives:
The main objective of this project is to investigate and develop DL models for accurate visual recognition across multiple diverse domains with limited supervision. Given the cost of collecting and annotating target data for training, these models will rely on methods for domain adaptation, weakly supervised learning, and data generation to sustain a high level of performance. We are looking for highly motivated students interested in performing cutting-edge research on machine learning algorithms applied to video-based object detection, tracking, retrieval, and classification, with a particular focus on deep learning architectures (e.g., auto-encoders, vision transformers, convolutional neural networks) for source-free and test-time domain adaptation, and domain generalization. Applications of interest include person and vehicle recognition in video analytics and surveillance, as well as cross-modal recognition.
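To make the notion of adapting a model under domain shift concrete, the sketch below shows one classic, lightweight domain adaptation technique: CORAL (correlation alignment), which matches the second-order statistics of source features to those of the unlabelled target domain. This is only an illustrative example of the family of methods the project covers, not a method prescribed by the project; the function names and the toy data are our own.

```python
import numpy as np

def coral(Xs, Xt, eps=1e-5):
    """Align source features Xs (n_s, d) to the covariance of
    target features Xt (n_t, d), as in CORAL-style adaptation."""
    # Regularized feature covariances (rows are samples)
    Cs = np.cov(Xs, rowvar=False) + eps * np.eye(Xs.shape[1])
    Ct = np.cov(Xt, rowvar=False) + eps * np.eye(Xt.shape[1])

    def sym_sqrt(C, inv=False):
        # Symmetric (inverse) matrix square root via eigendecomposition
        w, V = np.linalg.eigh(C)
        w = np.clip(w, eps, None)
        p = -0.5 if inv else 0.5
        return (V * w**p) @ V.T

    # Whiten the source features, then re-color with target statistics
    return Xs @ sym_sqrt(Cs, inv=True) @ sym_sqrt(Ct)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy source and target domains with different feature covariances
    Xs = rng.normal(size=(500, 3)) @ np.diag([1.0, 2.0, 0.5])
    Xt = rng.normal(size=(500, 3)) @ np.array(
        [[1.0, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 3.0]])
    Xa = coral(Xs, Xt)
    # After alignment, source covariance matches the target covariance
    print(np.max(np.abs(np.cov(Xa, rowvar=False) - np.cov(Xt, rowvar=False))))
```

After the transform, a classifier trained on the aligned source features sees statistics closer to the deployment domain; deep variants apply the same idea as a loss on intermediate network activations.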
Required knowledge
Expected abilities of the student:
• Strong academic record in computer science, applied mathematics, or electrical engineering, preferably with expertise in one or more of the following areas: machine learning, computer vision, pattern recognition, artificial intelligence.
• Good programming skills in languages such as C, C++, and Python. Knowledge of deep learning frameworks would be a plus.