Mihai Gabriel CONSTANTIN
Date and time: 2021-03-23 17:00
Location: Microsoft Teams
PhD thesis summary: Access
Machine learning methods that automatically analyze images and videos are an important part of the Artificial Intelligence domain and have led to the creation and popularization of novel and sometimes groundbreaking methods, such as Convolutional Neural Networks. Researchers have generally addressed the prediction of objective concepts in multimedia, such as object detection, gesture recognition, or scene classification. However, recent advances in machine learning algorithms and the emergence of social networks containing immense quantities of user-generated data have allowed for the development of methods that analyze subjective concepts in multimedia data. Researchers therefore seek to create methods that answer new questions about images and videos, questions that entail a subjective response from human annotators, such as “is this image interesting?” or “is this video violent?”. The subjective nature of these concepts implies the need for close collaboration between researchers from different domains, including computer vision, psychology, and human behavior. This thesis presents an analysis of, and some prediction methods for, five of these concepts: interestingness, aesthetic value, memorability, violence, and affective value. Specifically, it starts by analyzing the state-of-the-art literature and the concepts themselves: the definitions and taxonomies currently proposed, how these concepts affect human perception, which datasets are currently published, and the state-of-the-art computer vision methods that attempt to predict these concepts. It then presents the author's contributions to this domain, starting with the creation of some publicly available datasets and the common evaluation benchmarks they are part of, and continuing with machine learning methods that predict these concepts.
The datasets are developed for the prediction of interestingness, violence, and memorability and for the creation of video recommender systems, while the machine learning methods are used for interestingness, memorability, violence, arousal/valence, and fear. The machine learning methods include both traditional and deep learning-based systems, and also introduce a novel approach to late fusion that uses several types of deep learning networks in the ensembling process, with results that significantly surpass the current state of the art. Finally, the thesis ends with the conclusions, summarizing the main contributions, identifying the most promising results, and proposing some future development directions.

PhD supervisor

Prof. dr. ing. Bogdan Emanuel IONESCU, Universitatea Politehnica din București, Romania.

Doctoral committee

Prof. dr. ing. Gheorghe BREZEANU, Universitatea Politehnica din București, Romania
Prof. dr. ing. Martha LARSON, Radboud University, Netherlands
Dr. ing. Claire-Hélène DEMARTY, InterDigital, France
Prof. dr. ing. Mihai CIUC, Universitatea Politehnica din București, Romania.

Advisory committee

Prof. dr. ing. Mihai CIUC, Universitatea Politehnica din București, Romania
Assoc. Prof. dr. ing. Horia CUCU, Universitatea Politehnica din București, Romania
Lecturer dr. ing. Marta ZAMFIR, Universitatea Politehnica din București, Romania.