On Friday, October 2, the Computer Science Department will host its first colloquium of the Fall 2020 semester. Niluthpol Chowdhury Mithun of SRI International will give a technical talk entitled “Learning Multimodal Retrieval Models with Limited Labeled Data”. An abstract of his talk can be found below.
In recent years, tremendous success has been achieved in many computer vision and multimedia tasks using deep neural network models trained on large hand-labeled datasets. In many applications, however, collecting such datasets may be impractical or infeasible, either because large datasets are unavailable or because of the time and resources needed for labeling. An increasingly important problem, in light of data-hungry deep neural network models, is therefore how to learn useful models with limited labeled data. Developing robust models with a limited degree of supervision could be extremely useful for multi-modal retrieval and analysis tasks, as collecting training data for these tasks is extremely labor-intensive and prone to significant errors. In this talk, I will go over several multi-modal retrieval tasks (i.e., video-text retrieval, RGB-LiDAR localization, and text-based video moment retrieval), focusing on developing efficient solutions that leverage available incidental signals or weak labels.
Niluthpol Chowdhury Mithun is currently an Advanced Computer Scientist at the Center for Vision Technologies, SRI International in Princeton, NJ, USA. He graduated with a Ph.D. degree in 2019 from the Video Computing Group at the University of California, Riverside (UCR). Before joining UCR, he was a Sr. Software Engineer at Samsung R&D Institute Bangladesh. Previously, he received his Bachelor's and Master's degrees from Bangladesh University of Engineering and Technology. His current research is focused on solving fundamental problems in computer vision and machine learning, with particular focus on representation learning with multiple modalities (e.g., vision, language, LiDAR), learning under limited or weak supervision, and multi-modal embedding. He has successfully applied these methods to several real-world problems such as image-text retrieval, video moment localization, video summarization, object detection, and visual localization. His work has been published at several high-quality venues such as CVPR, MM, TIP, and T-ITS. He has won the ACM International Conference on Multimedia Retrieval 2018 Best Paper Award and the SRI CVT SharkTank 2019. He serves as a program-committee member/reviewer for venues such as CVPR, ICCV, ECCV, AAAI, MM, ICIP, T-PAMI, T-MM, T-CSVT, T-ITS, PR, and PRL.