Publications

Peer-reviewed publications by year.


2025

03/02/2025
Privacy Preserving Histopathological Image Augmentation with Conditional Generative Adversarial Networks
Author(s)
Andrei Alexandra‑Georgiana; Constantin Mihai Gabriel; Graziani Mara; Müller Henning; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania; University of Applied Sciences Western Switzerland, Switzerland
Abstract
Deep learning approaches for histopathology image processing and analysis are gaining increasing interest in the research field, and this comes with a demand to extract more information from images. Pathological datasets are relatively small, mainly due to the confidentiality of medical data, legal questions, data complexity, and labeling costs. Typically, a large number of annotated images for different tissue subtypes is required as training samples for the learning algorithms. In this paper, we present a latent-to-image approach for generating synthetic images, applying a Conditional Deep Convolutional Generative Adversarial Network to generate images of human colorectal cancer and healthy tissue. We generate high-quality images of various tissue types that preserve the general structure and features of the source classes, and we investigate an important yet overlooked aspect of data generation: ensuring privacy-preserving capabilities. The quality of these images is evaluated through perceptual experiments with pathologists and the Fréchet Inception Distance (FID) metric. Using the generated data to train classifiers improved MobileNet’s accuracy by 35.36%, and also enhanced the accuracies of DenseNet, ResNet, and EfficientNet. We further validated the robustness and versatility of our model on a different dataset, yielding promising results. Additionally, we make a novel contribution by addressing security and privacy concerns in personal medical image data, ensuring that the “fingerprints” of the training medical images are not contained in the synthetic images generated with the model we propose.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Elsevier, Pattern Recognition Letters
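
The Fréchet Inception Distance (FID) used above to evaluate image quality compares the Gaussian statistics of real and generated image features. A minimal sketch of the computation, assuming Inception-v3 features have already been extracted into two NumPy arrays (the feature-extraction step is omitted):

```python
import numpy as np
from scipy import linalg

def frechet_inception_distance(feats_real, feats_fake):
    """FID between two feature sets of shape (n_samples, dim)."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)        # matrix square root of the product
    if np.iscomplexobj(covmean):
        covmean = covmean.real                   # discard tiny imaginary residue
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)

# Toy usage with random stand-in features; real code would use Inception-v3 activations.
rng = np.random.default_rng(0)
print(frechet_inception_distance(rng.normal(size=(256, 64)),
                                 rng.normal(loc=0.1, size=(256, 64))))
```
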
10/03/2025
Unsupervised Learning from EEG Data for Epilepsy: A Systematic Literature Review
Author(s)
Tautan Alexandra Maria; Andrei Alexandra‑Georgiana; Smeraldi Carmelo Luca; Viatti Giampaolo; Rossi Simone; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania; University of Siena, Italy; Harvard Medical School, USA
Abstract
Epilepsy is a neurological disorder characterized by recurrent epileptic seizures, whose neurophysiological signatures are altered electroencephalographic (EEG) activity. The use of artificial intelligence (AI) methods on EEG data can positively impact the management of the disease, significantly improving diagnostic and prognostic accuracy as well as treatment outcomes. Our work aims to systematically review the available literature on the use of unsupervised machine learning methods on EEG data in epilepsy, focusing on methodological and clinical differences in terms of algorithms used and clinical applications. Methods: Following the PRISMA guideline, a systematic literature search was performed in several databases for papers published in the last 10 years. Studies employing both unsupervised and self-supervised methods for the classification of EEG data in epilepsy patients were included. The main outcomes of the study were: (i) to provide an overview of the datasets used as input to train the algorithms; (ii) to identify trends in pre-processing, algorithm architectures, validation, and metrics for performance estimation; (iii) to identify and review the clinical applications of AI in epilepsy patients. Results: A total of 108 studies met the inclusion criteria. Of them, 86 (79.6%) were published in the last 5 years and 60 (55.6%) in the last two years. The most used validation methods were hold-out in 37 (34.2%), k-fold cross-validation in 35 (32.4%), and leave-one-out in 19 (17.6%) studies, respectively. Accuracy, sensitivity, and specificity were the most used performance metrics, being reported in 71 (65.7%), 57 (44.9%), and 28 (25.9%) studies, respectively, followed by F1 score (27 studies; 25%), precision (26 studies; 24%), area under the curve (23 studies; 21.3%), and positive rate (21 studies; 20.3%). Furthermore, 42 (38.9%) studies used individual-patient models, compared to 53 (38.3%) that used multiple-patient models. Finally, concerning the clinical application of unsupervised learning methods on epilepsy patients, we identified six main fields of interest: seizure detection (53 studies; 53.9%), seizure prediction (27 studies; 25.4%), signal propagation and characterization (21 studies; 18.1%), seizure localization (10 studies; 3.7%), and seizure classification (22 studies; 20.3%), respectively. Conclusion: The results of the review suggest that the interest in the use of unsupervised learning methods in epilepsy has significantly increased in recent years. From a methodological perspective, the input EEG datasets used for training and testing the algorithms remain the hardest challenge. From a clinical standpoint, the vast majority of studies addressed seizure detection, prediction, and classification, whereas studies focusing on seizure characterization and localization are lacking. Future work that can potentially improve the performance of these algorithms includes the use of context information via reinforcement learning and a focus on model explainability.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Elsevier, Artificial Intelligence in Medicine
11/04/2025
Exploring Generative Adversarial Networks for Augmenting Network Intrusion Detection Tasks
Author(s)
Constantin Mihai Gabriel; Stanciu Dan‑Cristian; Stefan Liviu‑Daniel; Dogariu Mihai; Mihailescu Dan; Cioban George; Bergeron Matt; Liu Winston; Belov Konstantin; Radu Octavian
Institution
University Politehnica of Bucharest, Romania; Keysight Technologies, Bucharest, Romania; Keysight Technologies, Santa Rosa, CA, USA
Abstract
The advent of generative networks and their adoption in numerous domains and communities have led to a wave of innovation and breakthroughs in AI and machine learning. Generative Adversarial Networks (GANs) have expanded the scope of what is possible with machine learning, allowing for new applications in areas such as computer vision, natural language processing, and creative AI. GANs, in particular, have been used for a wide range of tasks, including image and video generation, data augmentation, style transfer, and anomaly detection. They have also been used for medical imaging and drug discovery, where they can generate synthetic data to augment small datasets, reduce the need for expensive experiments, and lower the number of real patients that must be included in medical trials. Given these developments, we propose using the power of GANs to create and augment flow-based network traffic datasets. We evaluate a series of GAN architectures, including Wasserstein, conditional, energy-based, gradient penalty, and LSTM-GANs. We evaluate their performance on a set of flow-based network traffic data collected from 16 subjects who used their computers for home, work, and study purposes. The performance of these GAN architectures is described according to metrics that involve networking principles, data distribution among a collection of flows, and temporal data distribution. Given the tendency of network intrusion detection datasets to have a very imbalanced data distribution, i.e., a large number of samples in the “normal traffic” category and a comparatively low number of samples assigned to the “intrusion” categories, we test our GANs by augmenting these intrusion datasets.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Association for Computing Machinery, ACM Transactions on Multimedia Computing Communications and Applications
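
The gradient-penalty GAN variant evaluated above (WGAN-GP) constrains the critic to be approximately 1-Lipschitz by penalizing its gradient norm on samples interpolated between real and generated data. A minimal PyTorch sketch of that term; the critic, feature dimension, and penalty weight below are illustrative assumptions, not the paper's configuration:

```python
import torch

def gradient_penalty(critic, real, fake):
    """WGAN-GP term: (||grad critic(x_hat)||_2 - 1)^2 on interpolated samples."""
    eps = torch.rand(real.size(0), 1)                       # per-sample mixing weight
    x_hat = (eps * real + (1.0 - eps) * fake).requires_grad_(True)
    scores = critic(x_hat)
    grads, = torch.autograd.grad(outputs=scores, inputs=x_hat,
                                 grad_outputs=torch.ones_like(scores),
                                 create_graph=True)
    return ((grads.norm(2, dim=1) - 1.0) ** 2).mean()

# Toy usage: a linear critic on 16-dimensional flow-feature vectors.
critic = torch.nn.Linear(16, 1)
real, fake = torch.randn(8, 16), torch.randn(8, 16)
critic_loss = critic(fake).mean() - critic(real).mean() \
              + 10.0 * gradient_penalty(critic, real, fake)
```
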
01/01/2025
MelanoDet: Multimodal Imaging System for Screening and Analysis of Cutaneous Melanoma
Author(s)
Sultana Alina Elena; Oniga Maria; Rus Paul Florin; Dobre Andra Laura; Orzan Olguta Anca
Institution
National University of Science and Technology POLITEHNICA Bucharest, Romania; 4PSA SRL, Bucharest, Romania; Carol Davila University of Medicine and Pharmacy, Dermatology Department, Bucharest, Romania; Elias Emergency University Hospital, Bucharest, Romania
Abstract
Cutaneous melanoma is one of the most aggressive skin cancers, and early detection is critical for effective treatment. Although traditional imaging techniques such as dermoscopy improve diagnostic accuracy, many cases remain undetected. Recent advancements in non-invasive imaging, including multispectral imaging and infrared thermal imaging, offer new opportunities for early diagnosis. In this study, we introduce MelanoDet, a multimodal imaging system integrating three cameras corresponding to different spectra: visible, near-infrared, and long-wavelength infrared. To ensure accurate lesion inspection, an analytical method based on the optical cameras’ parameters estimated the cameras’ common field of view, so that all three sensors capture the same region of interest. Since the acquired images differ in resolution, angle, and orientation, a second, refined registration step was applied. A standardized image acquisition protocol was established in collaboration with an experienced team of dermatologists from Elias Emergency University Hospital, Bucharest. The MelanoDet system, along with its protocol and image processing techniques, was validated on 25 cases of suspicious nevi, demonstrating its potential for improving melanoma screening.
Access
Open Access
Type of Publication
Journal Article
Publisher
IEEE Access
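
The refined registration step mentioned in the abstract above can be illustrated with feature matching and a homography; ORB keypoints and RANSAC below are our illustrative choices, not necessarily the authors' method:

```python
import cv2
import numpy as np

def register(moving, reference):
    """Warp the 8-bit image `moving` onto `reference` via ORB matches and a RANSAC homography."""
    orb = cv2.ORB_create(1000)
    k1, d1 = orb.detectAndCompute(moving, None)
    k2, d2 = orb.detectAndCompute(reference, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:200]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = reference.shape[:2]
    return cv2.warpPerspective(moving, H, (w, h))
```
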

2024

01/12/2024
SalvAIoT platform for mountain accidents prevention and search and rescue missions
Author(s)
Dragulinescu Ana-Maria; Zamfirescu Ciprian; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
Between 2015 and 2022, the Romanian Mountain Rescuers Association conducted over 57,000 search and rescue (SAR) operations, addressing incidents primarily caused by route deviation, failure to adapt to environmental conditions, and natural disasters. However, SAR operations in remote mountainous areas face significant challenges due to limited connectivity and the lack of an integrated platform linking users and SAR teams. This paper introduces the SalvAIoT platform, which addresses these challenges by equipping tourists and SAR unmanned vehicles with low-power, long-range communication devices, sensors, state-of-the-art embedded systems, and innovative TinyML algorithms for route assistance and victim localization. Additionally, the paper designs five practical scenarios for SalvAIoT use cases, involving end-devices (ED) for users and unmanned aerial and surface vehicles (UAVs, USVs) able to provide connectivity in harsh environments. Furthermore, the paper investigates the impact of antenna position deviation, UAV antenna height, and USV and ED position on the communication link. Experimental results show that LoRa technology can provide a detection range of 437.2 m for a direct link between user ED and UAV, 310.5 m for the ED-USV link, and 480 m for the UAV-USV link when using the smallest spreading factor. Moreover, the paper highlights that increasing the UAV’s altitude from 30 m to 80 m results in a maximum difference of 8 dB for distances below 250 m. However, for distances between 250 m and the maximum coverage range, similar or better Received Signal Strength Indicator (RSSI) characteristics were observed when the UAV’s height was 30 m. To conclude, LoRa emerges as a promising technology for providing network coverage for the SalvAIoT platform, facilitating mountain accident prevention and intervention, particularly when employing unmanned vehicles.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE International Conference on Communications
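
The RSSI-versus-distance behaviour reported above is commonly summarized with a log-distance path-loss model; the sketch below is a generic illustration with assumed parameters, not the paper's measured fit:

```python
import numpy as np

def rssi_log_distance(d, rssi_d0=-40.0, d0=1.0, n=2.7):
    """Log-distance model: RSSI(d) = RSSI(d0) - 10 * n * log10(d / d0)."""
    return rssi_d0 - 10.0 * n * np.log10(np.asarray(d, dtype=float) / d0)

# Predicted RSSI at the ED-UAV detection range reported above (437.2 m),
# under the assumed reference RSSI and path-loss exponent.
print(rssi_log_distance(437.2))
```
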
14/11/2024
Freezing of Gait detection in Parkinson's disease: comparison of deep learning frameworks
Author(s)
Andrei Alexandra-Georgiana; Tautan Alexandra-Maria; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
Parkinson’s Disease (PD) is a neurodegenerative disorder characterized by various motor symptoms. One of the most prominent is Freezing of Gait (FoG). In this paper, we explore different deep learning frameworks (deep convolutional neural networks and an autoencoder-based network) for the automatic detection of FoG episodes in PD. Our study focuses on 3D accelerometer signals from the Daphnet dataset, which were used for both the training and testing of our models. We propose several robust methods inspired by well-known deep learning frameworks such as AlexNet, VGG and LeNet, as well as an algorithm that combines autoencoder-based feature extraction and classic machine learning classification. All algorithms are independent of the input window size and utilize raw data as input. Individual accelerometer axes as well as the magnitude from sensors placed on the ankle, thigh, and trunk are considered for their potential impact on the classification results. The potential of each individual algorithm to accurately detect FoG episodes is investigated in a 10-fold cross-validation. We achieved the highest accuracy of 90.92% using both an autoencoder and various convolutional neural network architectures applied to the magnitude of the accelerometer signals. For the LeNet-based and VGG-16-based architectures, sensors positioned on the thigh and trunk provide better results, while other deep learning frameworks are sensor-independent.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE International Symposium on Medical Measurements and Applications
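
The magnitude channel used above is the Euclidean norm of the three accelerometer axes. A small sketch of computing it and slicing fixed-length raw windows for a classifier (the window and step sizes are illustrative assumptions):

```python
import numpy as np

def magnitude(acc_xyz):
    """Euclidean norm of an (n_samples, 3) accelerometer recording."""
    return np.linalg.norm(acc_xyz, axis=1)

def windows(signal, size=256, step=128):
    """Overlapping fixed-length windows, usable as raw CNN input."""
    return np.stack([signal[i:i + size]
                     for i in range(0, len(signal) - size + 1, step)])

acc = np.random.randn(10_000, 3)   # stand-in for a Daphnet ankle-sensor recording
X = windows(magnitude(acc))        # shape: (n_windows, 256)
```
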
29/01/2024
Few-Shot Object Detection as a Service: Facilitating Training and Deployment for Domain Experts
Author(s)
Bailer Werner; Dogariu Mihai; Ionescu Bogdan; Fassold Hannes
Institution
Joanneum Research – Digital, Graz, Austria; National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
We propose a service-based approach for training few-shot object detectors and running inference with these models. This eliminates the need to write code or execute scripts, thus enabling domain experts to train their own detectors. The training service implements an efficient ensemble learning method in order to obtain more robust models without parameter search. The entire pipeline is deployed as a single container and can be controlled from a web user interface.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
Lecture Notes in Computer Science
28/11/2024
Overview of the ImageCLEF 2024: Multimedia Retrieval in Medical Applications
Author(s)
Ionescu Bogdan; Müller Henning; Dragulinescu Ana-Maria; Rückert Johannes; Ben Abacha Asma; de Herrera Alba Garcia Seco; Bloch Louise; Brüngel Raphael; Idrissi-Yaghir Aymen; Schäfer Henning; Schmidt Cynthia Sabrina; Pakull Tabea M. G.; Damm Hendrik; Bracke Benjamin; Friedrich Christoph M.; Andrei Alexandra-Georgiana; Prokopchuk Yuri; Karpenka Dzmitry; Radzhabov Ahmedkhan; Kovalev Vassili; Macaire Cecile; Schwab Didier; Lecouteux Benjamin; Esperança-Rodier Emmanuelle; Yim Wen-Wai; Fu Yujun; Sun Zhaoyi; Yetisgen Meliha; Xia Fei; Hicks Steven A.; Riegler Michael A.; Thambawita Vajira; Storås Ståle; Halvorsen Pål; Heinrich Maximilian; Kiesel Johannes; Potthast Martin; Stein Benno
Institution
National University of Science and Technology Politehnica Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Switzerland; University of Applied Sciences and Arts, Dortmund, Germany; Microsoft, Redmond, USA; University of Essex, Colchester, UK; UNED, Madrid, Spain; University Hospital Essen, Institute for Artificial Intelligence in Medicine, Germany; University Hospital Essen, Institute for Transfusion Medicine, Germany; Belarusian National Academy of Sciences, Minsk, Belarus; Belarus State University, Minsk, Belarus; University Grenoble Alpes, Inria, France; University of Washington, Seattle, USA; SimulaMet, Oslo, Norway; Bauhaus-Universität Weimar, Germany; University of Kassel, Germany; HessianAI, Darmstadt, Germany; ScaDS.AI, Leipzig, Germany
Abstract
This paper presents an overview of the ImageCLEF 2024 lab, organized as part of the Conference and Labs of the Evaluation Forum CLEF Labs 2024. ImageCLEF, an ongoing evaluation event since 2003, encourages the evaluation of technologies for annotation, indexing and retrieval of multimodal data. The goal is to provide information access to large collections of data across various usage scenarios and domains. In 2024, the 22nd edition of ImageCLEF runs three main tasks: (i) a medical task, continuing the caption analysis, Visual Question Answering for colonoscopy images alongside GANs for medical images, and medical dialogue summarization; (ii) a novel task related to image retrieval/generation for arguments for visual communication, aimed at augmenting the effectiveness of arguments; and (iii) ToPicto, a new task focused on translating natural language, whether spoken or textual, into a sequence of pictograms. The benchmarking campaign was a real success and received the participation of over 35 groups submitting more than 220 runs.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction
24/05/2024
Advancing Multimedia Retrieval in Medical, Social Media and Content Recommendation Applications with ImageCLEF 2024
Author(s)
Ionescu Bogdan; Müller Henning; Dragulinescu Ana Maria; Idrissi-Yaghir Ahmad; Radzhabov Ahmedkhan; de Herrera Alba Garcia Seco; Andrei Alexandra; Stan Alexandru; Storås Andrea M.; Ben Abacha Asma; Lecouteux Benjamin; Stein Benno; Macaire Cecile; Friedrich Christoph M.; Schmidt Cynthia Sabrina; Schwab Didier; Esperança-Rodier Emmanuelle; Ioannidis George; Adams G.; Schäfer Henning; Manguinhas Hugo; Coman Ioan; Schöler Johanna; Kiesel Johannes; Rückert Johannes; Bloch Louise; Potthast Martin; Heinrich Maximilian; Yetisgen Meliha; Riegler Michael; Snider Neal; Halvorsen Pål; Hicks Steven A.; Thambawita Vajira; Kovalev Vassili; Prokopchuk Yuri; Yim Wen-Wai
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Switzerland; CEA, LIST, Paris, France; University of Applied Sciences and Arts Dortmund, Germany; University of Essex, England; IN2 Digital Innovation, Germany; SimulaMet, Oslo, Norway; Microsoft, Redmond, WA, USA; Columbia University, USA; University Hospital Essen, Germany; Europeana Foundation, The Hague, Netherlands; Belarusian State University, Belarus; Sahlgrenska University Hospital, Sweden; University of Washington, USA; Microsoft, Nuance, Burlington, MA, USA; Belarusian National Academy of Sciences, Belarus; Bauhaus-Universität Weimar, Germany; University of Leipzig, Germany; ScaDS.AI, Leipzig, Germany; Université Grenoble Alpes, CNRS, Grenoble INP, France
Abstract
The ImageCLEF evaluation campaign has been integrated with CLEF (Conference and Labs of the Evaluation Forum) for more than 20 years and represents a Multimedia Retrieval challenge aimed at evaluating the technologies for annotation, indexing, and retrieval of multimodal data. Thus, it provides information access to large data collections in usage scenarios and domains such as medicine, argumentation and content recommendation. ImageCLEF 2024 has four main tasks: (i) a Medical task targeting automatic image captioning for radiology images, synthetic medical images created with Generative Adversarial Networks (GANs), Visual Question Answering and medical image generation based on text input, and multimodal dermatology response generation; (ii) a joint ImageCLEF-Touché task, Image Retrieval/Generation for Arguments, to convey the premise of an argument; (iii) a Recommending task addressing cultural heritage content recommendation; and (iv) a joint ImageCLEF-ToPicto task aiming to provide a translation in pictograms from natural language. In 2023, participation increased by 67% with respect to 2022, which reveals the campaign's impact on the community.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Advances in Information Retrieval, ECIR 2024, PT VI
01/01/2024
Multimodal active speaker detection using cross-attention and contextual information
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
Institut Polytechnique de Paris, Télécom SudParis, Laboratoire SAMOVAR, Evry, France; National University of Science and Technology Politehnica Bucharest, Romania
Abstract
An active speaker detection (ASD) framework aims to identify whether an on-screen person is speaking or not in each frame of a video. In this paper, we introduce a novel ASD system based on the mindful integration of audio-video cues through a cross-attention module that captures inter-modal information while retaining the distinct intra-modal features. Furthermore, the system models the inter-speaker relations between the speakers within the same scene. The experimental evaluation validates the effectiveness of the approach, achieving an average mAP score of 94.8%.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE International Conference on Consumer Electronics
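
The audio-video cross-attention described above can be sketched with PyTorch's nn.MultiheadAttention; the symmetric two-way design, dimensions, and residual connections are our assumptions for illustration:

```python
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    """Each modality attends to the other while keeping its own stream."""
    def __init__(self, dim=128, heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, audio, video):
        # Queries come from one modality, keys/values from the other.
        v_ctx, _ = self.a2v(video, audio, audio)
        a_ctx, _ = self.v2a(audio, video, video)
        return audio + a_ctx, video + v_ctx   # residuals retain intra-modal features

audio, video = torch.randn(2, 50, 128), torch.randn(2, 50, 128)  # (batch, frames, dim)
a_fused, v_fused = CrossModalAttention()(audio, video)
```
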
01/01/2024
SemanticAd: A Multimodal Contextual Advertisement Framework for Online Video Streaming Platforms
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
National University of Science and Technology Politehnica Bucharest, Romania; Institut Polytechnique de Paris, Télécom SudParis, Laboratoire SAMOVAR, France
Abstract
In the past few years, the online video streaming market has witnessed rapid growth and has become the most important form of entertainment. Motivated by the huge business opportunities, the advertisement insertion mechanisms have become a hot topic of research and represent the most important component of an online delivery ecosystem. In this paper, we introduce SemanticAd, a multimodal ad insertion framework designed from the viewers' perspective in terms of the quality of experience and degree of intrusiveness. The core of the proposed approach involves a novel temporal segmentation algorithm that extracts story units with a frame-level precision. To the best of our knowledge, the proposed solution is the most robust and accurate solution dedicated to TV news videos. In addition, by taking into consideration ad temporal distribution and semantic information, the framework proposes commercials that are contextually relevant with respect to video content. The quantitative and qualitative experimental results conducted on a challenging set of 50 multimedia documents validate the SemanticAd methodology, returning an F1-score superior to 92%. Moreover, when compared to other state-of-the-art methods, our system demonstrates its superiority with gains in performance ranging in the [4.19%, 10.22%] interval.
Access
Open Access
Type of Publication
Journal Article
Publisher
IEEE Access
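
Temporal segmentation into story units, as described above, typically starts from a frame-to-frame similarity signal. A hedged, generic sketch using colour-histogram intersection to flag candidate boundaries (the actual SemanticAd algorithm is more elaborate; the threshold and features here are assumptions):

```python
import numpy as np

def shot_boundaries(hists, thresh=0.6):
    """Flag frames whose histogram intersection with the previous frame drops.

    `hists`: (n_frames, n_bins) array of L1-normalised colour histograms.
    """
    sims = np.minimum(hists[1:], hists[:-1]).sum(axis=1)  # intersection in [0, 1]
    return np.flatnonzero(sims < thresh) + 1              # candidate cut indices

hists = np.random.dirichlet(np.ones(64), size=300)        # stand-in frame histograms
print(shot_boundaries(hists))
```
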
01/01/2024
Intelligent System for Monitoring Skin Lesions From Dermatoscopic Images
Author(s)
Oniga Maria; Sultana Alina Elena; Orzan Olguta Anca
Institution
National University of Science and Technology Politehnica Bucharest, Romania; Carol Davila University of Medicine and Pharmacy, Bucharest, Romania
Abstract
Skin cancer is one of the most aggressive types of cancer, causing over 20,000 deaths/year in Europe. If diagnosed early, this type of cancer is the most curable one. Thus, this paper proposes an intelligent system designed specifically to aid clinicians in diagnosing and monitoring multiple types of skin lesions. The system is composed of a Graphical User Interface (GUI) that encapsulates classification (using DenseNet-121) and segmentation (using U-Net) of images, from which features are extracted based on the type of diagnosis determined. The extracted features are the ones corresponding to the ABCD rule of dermatology, to which we added area and redness, which are specific to vascular lesions. The system has two essential windows, one that refers to image interpretation, namely, diagnosis determination and feature extraction, while the other window allows extracting and viewing information about previous screenings. The solution was evaluated on HAM10000 and on a proprietary dataset that contains images captured with conventional cameras. Results show that the proposed application can serve as a second opinion in the decision-making process for dermatologists. Thus, the proposed system can provide all the necessary information for a better personalization of the therapy, as well as monitoring of therapeutic effects, to improve the efficiency of the medical act.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
15th International Conference on Communications (COMM 2024)
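
For the area and redness features added above to the ABCD rule, one plausible computation from an RGB image and a binary lesion mask looks as follows; the redness definition is our illustrative assumption:

```python
import numpy as np

def lesion_features(image_rgb, mask):
    """`image_rgb`: (h, w, 3) uint8 image; `mask`: (h, w) boolean lesion segmentation."""
    area = int(mask.sum())                         # lesion area in pixels
    pixels = image_rgb[mask].astype(float)
    # Redness as the mean excess of red over the green/blue average inside the lesion.
    redness = float((pixels[:, 0] - pixels[:, 1:].mean(axis=1)).mean())
    return {"area_px": area, "redness": redness}
```
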
01/01/2024
Research on Best Practices for EEG Analysis in Sleep Stage Scoring
Author(s)
Preoteasa Rareș-Marin; Oniga Maria; Sultana Alina Elena; Orzan Olguța Anca; Țarălungă Dragoș-Daniel; Vasile Titus Mihai; Neagu Georgeta-Mihaela
Institution
National University of Science and Technology Politehnica Bucharest, Romania; Oncological Dermatology, Elias Emergency University Hospital, Carol Davila University of Medicine and Pharmacy, Bucharest, Romania; Neurology, Central Military Emergency University Hospital, Carol Davila University of Medicine and Pharmacy, Bucharest, Romania; Faculty of Medical Engineering, Romania
Abstract
This paper outlines the best practices in utilizing electroencephalogram (EEG) signals for scoring sleep stages, deliberates on their effectiveness in practical applications, and examines the impact of various feature selection approaches. The research findings indicate optimal outcomes when EEG signals recorded from electrodes Fp2-F4 are used. This is achieved by i) implementing fourth-order Butterworth filters and isolating the five EEG rhythm bands (alpha, beta, gamma, delta, and theta), and ii) employing the spectra and spectrogram of the extracted EEG bands as input for a support vector machine (SVM) for automatic sleep stage scoring. The SVM separates the sleep states based on energy estimation and the maximum-minimum distance for samples of 10-second duration, using a limited amount of training data in order to minimize the volume of overlapping samples and to avoid overfitting. Other electrodes can be further considered, as the strongest received signal changes over time and among patients. It is important to highlight that this approach proves highly valuable for accelerating the classification of sleep stages, particularly when a sufficient amount of input data is available.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE International Symposium on Medical Measurements and Applications
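
A sketch of the pre-processing described above: fourth-order Butterworth band-pass filters isolating the five EEG rhythms. The band edges are the usual textbook values and the sampling rate is an assumption:

```python
from scipy.signal import butter, filtfilt

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def eeg_bands(signal, fs=256):
    """Split one EEG channel into rhythm bands with 4th-order Butterworth filters."""
    out = {}
    for name, (lo, hi) in BANDS.items():
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        out[name] = filtfilt(b, a, signal)   # zero-phase filtering
    return out
```
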
01/01/2024
Neural Network Based Fetal ECG Extraction from Abdominal Signals
Author(s)
Țarălungă Dragoș-Daniel; Botezatu Radu; Sultana Alina-Elena; Vasile Titus Mihai; Neagu Georgeta-Mihaela
Institution
National University of Science and Technology Politehnica Bucharest, Romania; Carol Davila University of Medicine and Pharmacy, Bucharest, Romania
Abstract
Non-invasive fetal ECG extraction from abdominal signals might provide significant information for long-term fetal monitoring, making it very attractive for physicians. Nevertheless, accurate extraction of the fetal ECG is a challenging task, due to the disturbing signals, which overlap the signal of interest in the frequency domain. Among the current denoising methods, neural networks are very attractive due to their performance. The current paper proposes a linear feed-forward neural network that estimates very accurately the abdominal mECG, the strongest disturbing signal, based on two thoracic mECG leads, removing it thereafter. The obtained results are very promising, allowing the further investigation of the fHR for the evaluation of fetal well-being. The comparison with the event-synchronous interference canceller shows the advantage of the neural network in preserving the fECG morphology, at the cost of higher computation time. Both methods require the preprocessing of the abdominal signal in order to remove the power line interference and the baseline wander.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
Springer, IFMBE Proceedings
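
Since the proposed network is purely linear, its core idea, estimating the abdominal mECG as a linear function of two thoracic leads and subtracting it, can be sketched equivalently with ordinary least squares (a simplification standing in for the trained network):

```python
import numpy as np

def cancel_mecg(abdominal, thoracic):
    """`abdominal`: (n,) abdominal signal; `thoracic`: (n, 2) maternal reference leads."""
    X = np.column_stack([thoracic, np.ones(len(abdominal))])   # affine linear model
    coeffs, *_ = np.linalg.lstsq(X, abdominal, rcond=None)
    mecg_estimate = X @ coeffs
    return abdominal - mecg_estimate     # the residual retains the fetal ECG
```
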
01/01/2024
Evaluation of BSS Methods for Fetal ECG Extraction and Analysis
Author(s)
Manea Ionut; Taralunga Dragos
Institution
National University of Science and Technology POLITEHNICA Bucharest, Applied Electronics and Information Engineering, Romania
Abstract
Assessing the health state of the fetal cardiovascular system during pregnancy represents a major challenge in the current medical system. To be able to prevent cardiac malformations, a non-invasive fetal diagnosis method is needed, which highlights the cardiac status of the fetus. The fetal electrocardiogram (fECG), extracted from abdominal signals recorded non-invasively, can overcome the limitations of the classic methods of intrauterine monitoring, without endangering the fetus and highlighting more diagnostic features compared to the ones obtained with the methods in clinical practice. However, the fECG is a very noisy signal, being sensitive to fetal or maternal movements and to other physiological signals that can be recorded from the abdomen, like the maternal electrocardiogram (mECG) and the powerline interference (PLI). This paper evaluates the performance of two methods for fECG extraction from abdominal signals. One is based on the cancellation of the interferences through time-frequency signal processing methods and band-pass filtering to isolate fetal QRS complexes. The other is based on a Blind Source Separation (BSS) method, namely Independent Component Analysis (ICA). Isolation of the PLI, the motion artifacts (MA) and the baseline wander (BW) is performed in both cases using Empirical Mode Decomposition (EMD) and the Empirical Wavelet Transform (EWT). The mECG is approximated and eliminated using the Fractional Fourier Transform (FrFT) and Maximum Likelihood Estimation (MLE). The two methods are evaluated on a synthetic database in order to assess the effectiveness of the BSS approach compared to a more general one.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
8th IEEE-EMBS Conference on Biomedical Engineering and Sciences
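
The ICA route described above can be sketched with scikit-learn's FastICA applied to a multichannel abdominal recording; the channel count and component number are illustrative:

```python
import numpy as np
from sklearn.decomposition import FastICA

def separate_sources(abdominal_channels, n_components=4):
    """`abdominal_channels`: (n_samples, n_channels) recording matrix."""
    ica = FastICA(n_components=n_components, random_state=0)
    return ica.fit_transform(abdominal_channels)   # (n_samples, n_components)

X = np.random.randn(5000, 8)      # stand-in for an 8-channel abdominal recording
S = separate_sources(X)           # one component should carry the fetal QRS complexes
```
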
01/01/2024
Artificial Intelligence Advancements in Fetal Monitoring: Enhancing Prenatal Care
Author(s)
Țarălungă Dragoș Daniel; Manea Ionut; Preoteasa Rareș-Marin; Florea Bogdan Cristian; Neagu Georgeta Mihaela
Institution
National University of Science and Technology Politehnica Bucharest, Romania
Abstract
Pregnancy and the delivery of a healthy baby rank among the crucial milestones in the human life cycle. Obstetrical science is devoted to ensuring these events unfold as smoothly as possible, promoting the well-being of both the infant and the mother. The primary challenge in achieving this objective is the occurrence of fetal deaths within the uterus, which represents a significant obstacle to the overall goal of a healthy outcome for the baby and the mother. Fetal monitoring is an essential aspect of prenatal care, aiming to assess the health and well-being of the developing fetus during pregnancy. There are various approaches to fetal monitoring used in clinical practice: cardiotocography (CTG) and Doppler ultrasound, fetal electrocardiography, fetal echography, etc. However, there are crucial limitations: inter- and intra-observer variability, invasiveness, low signal-to-noise ratio (SNR), etc. The aim of the present study is to evaluate the current contribution of artificial intelligence (AI) advancements in offering innovative solutions to enhance accuracy, early detection, and overall efficiency in assessing fetal health.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
Springer, IFMBE Proceedings
01/01/2024
Small-Scale V2V-VLC for Enhanced Road Traffic Safety
Author(s)
Moussa Elie Hajj; Marcu Alina-Elena; Chehade Rima Abdallah; Ballouz Gilles; Ionescu Bogdan
Institution
Lebanese University, Faculty of Engineering-Branch II, Roumieh, Lebanon; National University of Science and Technology POLITEHNICA Bucharest, Romania
Abstract
In the context of global research aimed at enhancing data transfer rates and efficiency, light emerges as a cost-free and unregulated asset that has the potential to revolutionize communication technologies. The complete replacement of conventional lighting with LEDs led to the development of a novel optical communication system. Visible light communication (VLC) uses LEDs as transmitters and photodetectors or image sensors as receivers. The technology has significant potential for a range of applications, especially in intelligent transport systems (ITS). In this context, the headlights and taillights of vehicles can transmit different types of information to other vehicles, enhancing traffic safety and ultimately preventing fatalities on the road. The paper proposes a small-scale vehicle-to-vehicle ITS that uses robot-type cars that are equipped with emission and reception modules. The cars transmit various information using VLC to implement a variety of events, including emergency braking, deceleration, left or right turns, and engine on/off, all of which are essential for road traffic safety. The maximum distance between vehicles that ensures successful data transmission will also be determined.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE International Symposium for Design and Technology of Electronics Packages
01/01/2024
Wide Temperature Range Modeling Considerations for Silicon Carbide Schottky Diodes
Author(s)
Pristavu Gheorghe; Oneata Dan-Theodor; Pascu Răzvan; Marcu Alina-Elena; Draghici Florin; Brezeanu Gheorghe
Institution
National University of Science and Technology POLITEHNICA Bucharest; National Institute for R&D in Microtechnology, Romania
Abstract
The paper analyzes the challenges of modeling silicon carbide Schottky diodes with low barrier height values over extensive temperature ranges. Such devices, with Cr/4H-SiC contacts, are fabricated and measured in the 25-366 °C interval. Excellent fitting of all experimental forward curves (R² = 99.89%) is achieved using the p-diode model. The degree of contact inhomogeneity is evaluated and parameter relevance is discussed.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE Conference on Advanced Topics on Measurement and Simulation
01/01/2024
Accurate Numerical Methods for Modeling Forward Characteristics of High Temperature Capable Schottky Diodes
Author(s)
Pristavu Gheorghe; Oneață Dan-Theodor; Pascu Răzvan; Marcu Alina Elena; Șerbănescu Matei-Constantin; Enache Andrei; Drăghici Florin; Brezeanu Gheorghe
Institution
National University of Science and Technology Politehnica Bucharest, Romania; National Institute for R&D in Microtechnology, Bucharest, Romania
Abstract
The paper discusses two algorithms for accurately determining solutions to the transcendental thermionic emission equation, which is the cornerstone of forward electrical behavior in Schottky diodes. The numerical techniques are developed based on the Newton-Raphson and Halley methods. Both approaches use distinct forms for the thermionic emission expression, emphasizing robustness against numerical overflows. Parameter initialization, complexity and applicability are discussed for each technique. A comparison is carried out between forward characteristics simulated with the two methods, which are then also used for characterizing real SiC-Schottky diodes. Results evince complete compatibility and highly accurate approximations of experimental measurements (R² ≈ 99.9%) on devices with different contact compositions.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Publishing House of the Romanian Academy, Romanian Journal of Information Science and Technology
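
A sketch of the Newton-Raphson route discussed above, solving the implicit thermionic-emission equation I = Is(exp(q(V - I*Rs)/(nkT)) - 1) for the forward current; all parameter values are illustrative, and a production implementation would also guard the exponential against overflow, as the paper emphasizes:

```python
import math

Q_OVER_K = 11604.5   # q/k in kelvin per volt

def forward_current(v, i_s=1e-12, n=1.05, r_s=5.0, t=300.0, iters=50):
    """Newton-Raphson on f(i) = i_s*(exp((v - i*r_s)/vt) - 1) - i = 0."""
    vt = n * t / Q_OVER_K            # n*k*T/q, the thermal slope voltage
    i = i_s                          # small positive initial guess
    for _ in range(iters):
        e = math.exp((v - i * r_s) / vt)
        f = i_s * (e - 1.0) - i
        df = -i_s * e * r_s / vt - 1.0
        i -= f / df                  # Newton update; monotone for this convex f
    return i

print(forward_current(0.8))   # forward current at V = 0.8 V, illustrative parameters
```
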

2023

11/01/2024
Deepfake Sentry: Harnessing Ensemble Intelligence for Resilient Detection and Generalisation
Author(s)
Stefan Liviu‑Daniel; Stanciu Dan‑Cristian; Dogariu Mihai; Constantin Mihai Gabriel; Jitaru Andrei Cosmin; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
Recent advancements in Generative Adversarial Networks (GANs) have enabled photorealistic image generation with high quality. However, the malicious use of such generated media has raised concerns regarding visual misinformation. Although deepfake detection research has demonstrated high accuracy, it is vulnerable to advances in generation techniques and adversarial iterations on detection countermeasures. To address this, we propose a proactive and sustainable deepfake training augmentation solution that introduces artificial fingerprints into models. We achieve this by employing an ensemble learning approach that incorporates a pool of autoencoders that mimic the effect of the artefacts introduced by the deepfake generator models. Experiments on three datasets reveal that our proposed ensemble improves generalisation, resistance against basic data perturbations such as noise, blurring, sharpness enhancement, and affine transforms, resilience to commonly used lossy compression algorithms such as JPEG, and resistance against adversarial attacks.
Access
Closed Access
Type of Publication
Journal Article
Publisher
University Politehnica of Bucharest, Scientific Bulletin Series C-Electrical Engineering and Computer Science
11/01/2024
Inducer Selection Principles for DeepFusion Systems
Author(s)
Constantin Mihai Gabriel; Stefan Liviu‑Daniel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Bucharest, Romania
Abstract
The current landscape of ensemble learning or late fusion approaches is dominated by methods that employ a very low number of inducer systems, while using traditional approaches with regard to the fusion engine, predominantly statistical, weighted, Bagging or Random Forests. Even with the advent of deep learning, few approaches use deep neural networks in building the ensemble decision and improving the results of single-system approaches. One of these methods is represented by the DeepFusion set of approaches, which integrate a very large number of inducer systems, while providing significantly improved final performance over the performance of their component inducers. However, no attempt has yet been made for DeepFusion with regard to reducing and optimizing the set of inducers while maintaining the same level of performance. Thus, this paper proposes a set of methods for inducer selection and reduction, based on their performance and on their similarity computed via clustering. Our methods are tested on the popular Interestingness10k dataset, which provides data and inducers for the prediction of image and video visual interestingness. We present an in-depth analysis of the performance of the optimization methods, with regard to the results according to the main performance metric associated with this dataset, as well as the degree to which these methods reduce the number of utilized inducers.
Access
Closed Access
Type of Publication
Journal Article
Publisher
University Politehnica of Bucharest, Scientific Bulletin Series C-Electrical Engineering and Computer Science
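
A hedged sketch of the selection idea above: cluster the inducers by the similarity of their prediction vectors and keep the best-scoring inducer per cluster. k-means is our illustrative clustering choice, not necessarily the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans

def select_inducers(preds, scores, n_keep=10):
    """`preds`: (n_inducers, n_samples) predictions; `scores`: per-inducer metric."""
    labels = KMeans(n_clusters=n_keep, n_init=10, random_state=0).fit_predict(preds)
    keep = [max(np.flatnonzero(labels == c), key=lambda i: scores[i])
            for c in range(n_keep)]
    return sorted(keep)    # indices of retained, mutually dissimilar inducers

preds = np.random.rand(60, 500)    # stand-in outputs of 60 inducers
scores = np.random.rand(60)        # stand-in per-inducer performance
print(select_inducers(preds, scores))
```
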
11/06/2023
TMS-EEG Perturbation Biomarkers for Alzheimer's Disease Patients Classification
Author(s)
Tautan Alexandra-Maria; Casula Elias; Pellicciari Maria Concetta; Borghi Ilaria; Maiella Michele; Bonni Sonia; Minei Marilena; Assogna Martina; Palmisano Annalisa; Smeralda Carmelo; Romanella Sara; Ionescu Bogdan; Koch Giacomo; Santarnecchi Emiliano
Institution
Harvard Medical School, Massachusetts General Hospital, Precision Neuroscience & Neuromodulation Program, USA; Harvard Medical School, Massachusetts General Hospital, Network Control Lab, USA; Harvard Medical School, Berenson-Allen Center for Noninvasive Brain Stimulation, USA; University of Ferrara, Italy; Santa Lucia Foundation, Italy; University Politehnica of Bucharest, Romania; University of Siena, Italy; University of Bari Aldo Moro, Italy; University of Rome La Sapienza, Italy
Abstract
The combination of TMS and EEG has the potential to capture relevant features of Alzheimer's disease (AD) pathophysiology. We used a machine learning framework to explore time-domain features characterizing AD patients compared to age-matched healthy controls (HC). More than 150 time-domain features, including some related to local and distributed evoked activity, were extracted from TMS-EEG data and fed into a Random Forest (RF) classifier using a leave-one-subject-out validation approach. The best classification accuracy, sensitivity, specificity and F1 score were 92.95%, 96.15%, 87.94% and 92.03%, respectively, when using a balanced dataset of features computed globally across the brain. The feature importance and statistical analysis revealed that the maximum amplitude of the post-TMS signal, its Hjorth complexity and the amplitude of the TEP calculated in the window 45-80 ms after the TMS pulse were the most relevant features differentiating AD patients from HC. TMS-EEG metrics can be used as a non-invasive tool to further understand AD pathophysiology and possibly contribute to patient classification as well as longitudinal disease tracking.
Access
Open Access
Type of Publication
Journal Article
Publisher
Springer Nature, Scientific Reports
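
The leave-one-subject-out validation described above maps directly onto scikit-learn's LeaveOneGroupOut splitter; feature extraction from the TMS-EEG epochs is omitted and all shapes below are illustrative:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

X = np.random.randn(120, 150)           # stand-in: 150 time-domain features per epoch
y = np.random.randint(0, 2, 120)        # AD patient vs. healthy control
groups = np.repeat(np.arange(30), 4)    # subject identifier for each epoch

scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         groups=groups, cv=LeaveOneGroupOut())
print(scores.mean())
```
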
28/08/2024
Maritime Internet of Things meets LoRaWAN: a Container Testbed, Measurement Campaign and Dataset Analysis
Author(s)
Dragulinescu Ana-Maria; Constantin Mihai Gabriel; Ionescu Bogdan; Vochin Marius; Tamas Razvan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania; Constanta Maritime University, Romania
Abstract
The Internet of Things has improved decision-making processes related to ships and the maritime environment and has brought software and hardware platforms for data acquisition, transmission, and processing closer to users through the so-called Maritime IoT (MIoT), enabling maritime services such as cargo, container, and ship monitoring. Recently, commercial solutions employing LoRa/LoRaWAN as a communication technology for container monitoring were announced, but no experimental testbed and associated experiments were identified in the literature to assess the performance of such solutions, even though the maritime environment is very challenging and different communication technologies and topologies are proposed to meet its requirements. The contribution of the current paper is three-fold: i) proposing a practical implementation of a LoRaWAN-based container monitoring system; ii) analysing the collected dataset and solving a machine learning problem, i.e., classifying the origin location of LoRaWAN packets using a k-best feature selection algorithm and classifiers such as k-Nearest Neighbours and Gradient Boosting; iii) assessing the performance of the communication in multiple-container scenarios. The results show that LoRaWAN has a huge potential to ensure communication between containers and outdoor gateways.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE World Forum on Internet of Things
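
The classification pipeline described above, k-best feature selection feeding a k-Nearest Neighbours classifier, can be sketched as follows; the feature dimensions, k values, and labels are illustrative:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

# Stand-in packet features (e.g. RSSI, SNR, spreading factor) and location labels.
X = np.random.randn(1000, 12)
y = np.random.randint(0, 4, 1000)

model = make_pipeline(SelectKBest(f_classif, k=6),
                      KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.score(X, y))
```
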
28/08/2024
Tridimensional Representation and Analysis of Ultra Wideband using Two Way Ranging for Maritime and Sports Applications
Author(s)
Tanase Cristian-Alexandru; Dumitrescu Anamaria; Dragulinescu Ana-Maria Claudia; Ionescu Bogdan Emanuel
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
Ultra-Wideband (UWB) technology has proved its applicability in maritime, medical and sport environments, e.g., for tracking assets and personnel in maritime settings and the elderly in retirement homes. In this paper, we analyse the behaviour of Two Way Ranging (TWR), a UWB radio-based localisation method. The purpose of the study is to understand UWB behaviour at short and medium range for future dataset creation, with the constraint of using a single anchor or a reduced number of devices. To this end, we created two-dimensional (2D) and three-dimensional (3D) representations of the localisation information provided by an NXP SR150 anchor and an NXP SR040 tag. The determination was made using the angle of arrival (AoA) in a frontal cone, with the phase difference of arrival (PDoA) measured between the anchor's antennae. The outcome of the study and measurements shows that the standard deviation of the position measured by means of UWB increases with range, from an average of 0.84 cm at 0.25 m to 3.68 cm at 3 m. If the anchor is covered by a blocking medium such as ice or by a human hand, the AoA will be highly affected and the percentage of non-line-of-sight transmissions will increase. The general behaviour is quite linear and can be used for future datasets useful for training machine learning algorithms.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE World Forum on Internet of Things
08/03/2024
Assessing the difficulty of predicting media memorability
Author(s)
Constantin Mihai Gabriel; Dogariu Mihai; Jitaru Andrei-Cosmin; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
Memorability is a critical aspect of human cognition that has been studied extensively in various fields, including psychology, education, and computer vision. The ability to remember information and experiences over time is essential for learning, decision-making, and creating lasting impressions. While the number of computer vision works that attempt to predict the memorability score of videos has recently seen a significant boost, thanks to several benchmarking tasks and datasets, some questions related to the performance of automated systems on certain types of videos are still largely unexplored. Given this, we are interested in discerning what makes a video sample easy or hard to classify or predict from a memorability standpoint. In this paper, we use a large set of runs, created and submitted by the participants to the MediaEval Predicting Video Memorability task, and, using their results and a set of visual, object, and annotator-based features and analyses, we attempt to find and define common traits that make the memorability scores of videos hard or easy to predict.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
20th International Conference on Content-Based Multimedia Indexing
04/10/2023
Autoencoder-based Data Augmentation for Deepfake Detection
Author(s)
Stanciu Dan-Cristian; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Bucharest, Romania
Abstract
Image generation has seen huge leaps in the last few years. Less than 10 years ago we could not generate accurate images using deep learning at all, and now it is almost impossible for the average person to distinguish a real image from a generated one. In spite of the fact that image generation has some amazing use cases, it can also be used with ill intent. As an example, deepfakes have become more and more indistinguishable from real pictures and that poses a real threat to society. It is important for us to be vigilant and active against deepfakes, to ensure that the false information spread is kept under control. In this context, the need for good deepfake detectors feels more and more urgent. There is a constant battle between deepfake generators and deepfake detection algorithms, each one evolving at a rapid pace. But, there is a big problem with deepfake detectors: they can only be trained on so many data points and images generated by specific architectures. Therefore, while we can detect deepfakes on certain datasets with near 100% accuracy, it is sometimes very hard to generalize and catch all real-world instances. Our proposed solution is a way to augment deepfake detection datasets using deep learning architectures, such as Autoencoders or U-Net. We show that augmenting deepfake detection datasets using deep learning improves generalization to other datasets. We test our algorithm using multiple architectures, with experimental validation being carried out on state-of-the-art datasets like CelebDF and DFDC Preview. The framework we propose can give flexibility to any model, helping to generalize to unseen datasets and manipulations.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
2nd ACM International Workshop on Multimedia AI against Discrimination
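
A minimal PyTorch sketch of the augmentation idea above: pass images through a small convolutional autoencoder so its reconstruction artefacts diversify what the detector sees during training; the architecture sizes are illustrative, not the paper's exact models:

```python
import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    """Downsample then upsample; reconstruction artefacts act as augmentation."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

ae = TinyAutoencoder()
batch = torch.rand(4, 3, 128, 128)   # stand-in face crops scaled to [0, 1]
augmented = ae(batch)                # reconstructions join the detector's training set
```
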
11/12/2024
Overview of the ImageCLEF 2023: Multimedia Retrieval in Medical, Social Media and Internet Applications
Author(s)
Ionescu Bogdan; Müller Henning; Dragulinescu Ana-Maria; Yim Wen-Wai; Abacha Asma Ben; Snider Neal; Adams Griffin; Yetisgen Meliha; Rueckert Johannes; de Herrera Alba Garcia Seco; Friedrich Christoph M.; Bloch Louise; Brüngel Raphael; Idrissi-Yaghir Ahmad; Schäfer Henning; Hicks Steven A.; Riegler Michael A.; Thambawita Vajira; Storås Andrea M.; Halvorsen Pål; Papachrysos Nikolaos; Schöler Johanna; Jha Debesh; Andrei Alexandra-Georgiana; Coman Ioan; Kovalev Vassil; Radzhabov Ahmedkhan; Prokophchuk Yurii; Stefan Liviu-Daniel; Constantin Mihai-Gabriel; Dogariu Mihai; Deshayes Jerome; Popescu Adrian
Institution
University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland; Microsoft, Redmond, WA, USA; Columbia University, New York, NY, USA; University of Washington, Seattle, WA, USA; University of Applied Sciences and Arts Dortmund, Dortmund, Germany; University of Essex, Colchester, England; SimulaMet, Oslo, Norway; Sahlgrenska University Hospital, Gothenburg, Sweden; Northwestern University, Chicago, IL, USA; Belarusian State University, Minsk, Belarus; Belarusian National Academy of Sciences, Minsk, Belarus; CEA LIST, Palaiseau, France
Abstract
This paper presents an overview of the ImageCLEF 2023 lab, which was organized in the frame of the Conference and Labs of the Evaluation Forum - CLEF Labs 2023. ImageCLEF is an ongoing evaluation event that started in 2003 and encourages the evaluation of technologies for annotation, indexing and retrieval of multimodal data with the goal of providing information access to large collections of data in various usage scenarios and domains. In 2023, the 21st edition of ImageCLEF runs three main tasks: (i) a medical task including the sequel of the caption analysis task and new tasks such as GANs for medical images, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) a fusion task addressing late fusion schemes for performance boosting, including real-world applications like image search diversification (retrieval) and prediction of visual interestingness (regression); and (iii) a social media-aware task on the potential real-life effects of online image sharing. The benchmark campaign was a success with over 45 groups submitting more than 240 runs.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction, CLEF 2023
21/06/2023
ImageCLEF 2023 Highlight: Multimedia Retrieval in Medical, Social Media and Content Recommendation Applications
Author(s)
Ionescu Bogdan; Müller Henning; Dragulinescu Ana-Maria; Popescu Adrian; Idrissi-Yaghir Ahmad; de Herrera Alba Garcia Seco; Andrei Alexandra; Stan Alexandru; Storås Andrea M.; Ben Abacha Asma; Friedrich Christoph M.; Ioannidis George; Adams Griffin; Schäfer Henning; Manguinhas Hugo; Filipovich Ihar; Coman Ioan; Deshayes Jerome; Schöler Johanna; Rückert Johannes; Stefan Liviu-Daniel; Bloch Louise; Yetisgen Meliha; Riegler Michael A.; Dogariu Mihai; Constantin Mihai Gabriel; Snider Neal; Papachrysos Nikolaos; Halvorsen Pål; Brüngel Raphael; Kozlovski Serge; Hicks Steven; de Lange Thomas; Thambawita Vajira; Kovalev Vassil; Yim Wen-Wai
Institution
University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland; CEA LIST, Gif-sur-Yvette, France; University of Applied Sciences and Arts Dortmund, Dortmund, Germany; University of Essex, Colchester, England; IN2 Digital Innovation, Lindau, Germany; SimulaMet, Oslo, Norway; Microsoft, Redmond, WA, USA; Columbia University, New York, NY, USA; University Hospital Essen, Essen, Germany; Europeana Foundation, The Hague, Netherlands; Belarusian State University, Minsk, Belarus; Sahlgrenska University Hospital, Gothenburg, Sweden; University of Washington, Seattle, WA, USA; Nuance, Burlington, MA, USA; Belarusian Academy of Sciences, Minsk, Belarus
Abstract
In this paper, we provide an overview of the upcoming ImageCLEF campaign. ImageCLEF has been part of CLEF, the Conference and Labs of the Evaluation Forum, since 2003. ImageCLEF, the Multimedia Retrieval task in CLEF, is an ongoing evaluation initiative that promotes the evaluation of technologies for annotation, indexing, and retrieval of multimodal data with the aim of providing information access to large collections of data in various usage scenarios and domains. In its 21st edition, ImageCLEF 2023 will have four main tasks: (i) a Medical task addressing automatic image captioning, synthetic medical images created with GANs, Visual Question Answering for colonoscopy images, and medical dialogue summarization; (ii) an Aware task addressing the prediction of real-life consequences of online photo sharing; (iii) a Fusion task addressing late fusion techniques based on the expertise of a pool of classifiers; and (iv) a Recommending task addressing cultural heritage content recommendation. In 2022, ImageCLEF received the participation of over 25 groups submitting more than 258 runs. These numbers show the impact of the campaign. With the COVID-19 pandemic now over, we expect that the interest in participating, especially in the physical CLEF sessions, will increase significantly in 2023.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Advances in Information Retrieval, ECIR 2023
18/09/2023
Overview of ImageCLEFfusion 2023 Task - Testing Ensembling Methods in Diverse Scenarios
Author(s)
Ştefan Liviu-Daniel; Constantin Mihai Gabriel; Dogariu Mihai; Ionescu Bogdan
Institution
University Politehnica of Bucharest
Abstract
This paper presents a comprehensive overview of the second edition of the ImageCLEFfusion task, held in 2023. The primary goal of this endeavor is to facilitate the advancement of late fusion or ensembling methodologies, which possess the capability to leverage prediction outcomes derived from pre-computed inducers to generate superior and enhanced prediction outputs. The present iteration of this task encompasses three distinct challenges: the continuation of the previous year’s regression challenge utilizing media interestingness data, where performance is measured via the mAP at 10 metric; the continuation of the retrieval challenge involving image search result diversification data, where performance is measured via the F1-score and Cluster Recall at 20; and the addition of a new multi-label classification task focused on concepts detection in medical data, where performance is measured via the F1-score. Participants were provided with a predetermined set of pre-computed inducers and were strictly prohibited from incorporating external inducers during the competition. This ensured a fair and standardized playing field for all participants. A total of 23 runs were received and the analysis of the proposed methods shows diversity among them ranging from machine learning approaches that join the inducer predictions to ensemble schemes that learn the results of other methods.
Access
Open Access
Type of Publication
Conference Paper
Publisher
24th Working Notes of the Conference and Labs of the Evaluation Forum
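As an editorial aside for readers new to late fusion: the sketch below shows, under purely illustrative assumptions (three inducers, five items, hand-picked weights), how pre-computed inducer scores can be combined by a weighted late-fusion rule and evaluated with the AP@10 building block of the task's mAP@10 metric. It is not code from the paper.

```python
import numpy as np

def weighted_late_fusion(inducer_scores: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Fuse per-inducer relevance scores into a single score per item.
    inducer_scores: (n_inducers, n_items); weights: (n_inducers,)."""
    weights = weights / weights.sum()   # normalise to a convex combination
    return weights @ inducer_scores     # (n_items,) fused scores

def average_precision_at_k(fused: np.ndarray, relevant: set, k: int = 10) -> float:
    """AP@k for a single query -- the building block of the task's mAP@10 metric."""
    ranking = np.argsort(-fused)[:k]
    hits, ap = 0, 0.0
    for rank, item in enumerate(ranking, start=1):
        if item in relevant:
            hits += 1
            ap += hits / rank
    return ap / min(len(relevant), k) if relevant else 0.0

# Toy run: three inducers score five items; items 0 and 3 are relevant.
scores = np.array([[0.9, 0.1, 0.4, 0.8, 0.2],
                   [0.7, 0.3, 0.2, 0.9, 0.1],
                   [0.6, 0.2, 0.5, 0.7, 0.3]])
fused = weighted_late_fusion(scores, np.array([1.0, 2.0, 1.0]))
print(average_precision_at_k(fused, relevant={0, 3}))
```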
18/09/2023
AIMultimediaLab at ImageCLEFmedical GANs 2023: Determining “Fingerprints” of Training Data in Generated Synthetic Images
access here
Author(s)
Andrei Alexandra-Georgiana; Ionescu Bogdan
Institution
University Politehnica of Bucharest
Abstract
This paper presents the participation of the AI Multimedia Lab in the 2023 ImageCLEFmedical GANs task. The 2023 ImageCLEFmedical GANs task challenges participants to examine the hypothesis that generative models (Generative Adversarial Networks, GANs) produce medical images that contain "fingerprints" of the real images used for training the network. We present our team's approach to tackling this task, which consists of generating synthetic images from the development dataset. Subsequently, features were extracted from both sets of generated images (the one provided in the development dataset and the one we generated). A binary Support Vector Machine (SVM) classifier was trained using these features, and labels were predicted for the real images from the test dataset. Experiments on the testing dataset show promising results.
Access
Open Access
Type of Publication
Conference Paper
Publisher
24th Working Notes of the Conference and Labs of the Evaluation Forum
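A minimal scikit-learn sketch of the pipeline the abstract describes (features from two sets of generated images feeding a binary SVM). The feature vectors here are random stand-ins for a real feature extractor; all shapes and labels are assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for a real feature extractor (e.g., a pretrained CNN);
# here each "image" is already a fixed-size feature vector.
rng = np.random.default_rng(0)
feats_used     = rng.normal(0.5, 1.0, size=(100, 128))  # images tied to "used" training data
feats_not_used = rng.normal(0.0, 1.0, size=(100, 128))  # images tied to other data

X = np.vstack([feats_used, feats_not_used])
y = np.array([1] * 100 + [0] * 100)          # 1 = "fingerprint present"

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X, y)

# Predict, for unseen real test images, whether they were part of the GAN's training set.
test_feats = rng.normal(0.25, 1.0, size=(10, 128))
print(clf.predict(test_feats))
```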
18/09/2023
Overview of ImageCLEFmedical GANs 2023 Task — Identifying Training Data “Fingerprints” in Synthetic Biomedical Images Generated by GANs for Medical Image Security
access here
Author(s)
Andrei Alexandra-Georgiana; Radzhabov Ahmedkhan; Coman Ioan; Kovalev Vassili; Ionescu Bogdan; Müller Henning
Institution
University Politehnica of Bucharest, Romania; Belarusian Academy of Sciences, Minsk, Belarus; University of Applied Sciences Western Switzerland, Sierre, Switzerland
Abstract
The 2023 ImageCLEFmedical GANs task is the first edition of this task, examining the hypothesis that GANs (Generative Adversarial Networks) generate medical images that contain the "fingerprints" of the real images used for training the generative network. The objective proposed to the participants is to identify the real images that were used to obtain given synthetic images produced with generative models. Overall, 23 teams registered for the task, 8 of them finalizing it and submitting runs. A total of 40 runs were received. An analysis of the proposed methods shows a great diversity among them, ranging from texture analysis and similarity-based approaches that join inducer predictions, such as SVM or KNN, to deep learning approaches and even multi-stage transfer learning. This paper presents an overview of the 2023 ImageCLEFmedical GANs task by describing its datasets and evaluation metrics, discussing the participants' runs and results, and outlining future challenges.
Access
Open Access
Type of Publication
Conference Paper
Publisher
24th Working Notes of the Conference and Labs of the Evaluation Forum
12/01/2023
Overview of The MediaEval 2022 Predicting Video Memorability Task
access here
Author(s)
Sweeney Lorin; Constantin Mihai Gabriel; Demarty Claire-Hélène; Fosco Camilo; Seco de Herrera Alba G.; Halder Sebastian; Healy Graham; Ionescu Bogdan; Matran-Fernandez Ana; Smeaton Alan F.; Sultana Mushfika
Institution
Dublin City University, Ireland; University Politehnica of Bucharest, Romania; InterDigital, France; Massachusetts Institute of Technology, Cambridge, USA; University of Essex, UK
Abstract
This paper describes the 5th edition of the Predicting Video Memorability Task as part of MediaEval 2022. This year we have reorganised and simplified the task in order to facilitate a greater depth of inquiry. Similar to last year, two datasets are provided in order to facilitate generalisation; however, this year we have replaced the TRECVid 2019 Video-to-Text dataset with the VideoMem dataset in order to remedy underlying data quality issues, and to prioritise short-term memorability prediction by elevating the Memento10k dataset as the primary dataset. Additionally, a fully fledged electroencephalography (EEG)-based prediction sub-task is introduced. In this paper, we outline the core facets of the task and its constituent sub-tasks, describing the datasets, evaluation metrics, and requirements for participant submissions.
Access
Open Access
Type of Publication
Conference Paper
Publisher
2022 MediaEval Workshop
12/01/2023
AIMultimediaLab at MediaEval 2022: Predicting Media Memorability Using Video Vision Transformers and Augmented Memorable Moments
access here
Author(s)
Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
This paper describes AIMultimediaLab's approach and results for the 2022 MediaEval Predicting Video Memorability task. The proposed approach is a continuation of last year's work, using, updating, and further analysing the concept of Memorable Moments. This is done by improving the scheme we use for selecting Memorable Moments, allowing for the possibility that more than one video segment is representative of the entire video clip from a memorability standpoint. Furthermore, we propose a new architecture for processing the selected Memorable Moments by implementing a variant of the popular ViViT architecture that is better suited to analysing pure video content.
Access
Open Access
Type of Publication
Conference Paper
Publisher
2022 MediaEval Workshop
01/01/2023
Multimodal emotion recognition using cross modal audio-video fusion with attention and deep metric learning
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra; Zaharia Titus
Institution
University Politehnica of Bucharest, Romania; Institut Polytechnique de Paris, Télécom SudParis, France
Abstract
In the last few years, multi-modal emotion recognition has become an important research topic in the affective computing community due to its wide range of applications, including mental disease diagnosis, human behavior understanding, human-machine/robot interaction, and autonomous driving systems. In this paper, we introduce a novel end-to-end multimodal emotion recognition methodology based on audio and visual fusion, designed to leverage the mutually complementary nature of the features while maintaining the modality-specific information. The proposed method integrates spatial, channel, and temporal attention mechanisms into a visual 3D convolutional neural network (3D-CNN) and temporal attention into an audio 2D convolutional neural network (2D-CNN) to capture the intra-modal feature characteristics. Further, the inter-modal information is captured with the help of an audio-video (A-V) cross-attention fusion technique that effectively identifies salient relationships across the two modalities. Finally, by considering the semantic relations between the emotion categories, we design a novel classification loss based on an emotional metric constraint that guides the attention generation mechanisms. We demonstrate that by exploiting the relations between the emotion categories our method yields more discriminative embeddings, with more compact intra-class representations and increased inter-class separability. The experimental evaluation carried out on the RAVDESS (The Ryerson Audio-Visual Database of Emotional Speech and Song) and CREMA-D (Crowd-sourced Emotional Multimodal Actors Dataset) datasets validates the proposed methodology, which leads to average accuracy scores of 89.25% and 84.57%, respectively. In addition, when compared to state-of-the-art techniques, the proposed solution shows superior performance, with gains in accuracy ranging in the [1.72%, 11.25%] interval.
Access
Open Access
Type of Publication
Journal Article
Publisher
Elsevier, Image and Vision Computing
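To make the A-V cross-attention fusion idea concrete, here is a minimal PyTorch sketch in which each modality attends to the other before the pooled streams are concatenated for classification. Dimensions, head counts, and the 8 emotion classes are illustrative assumptions; the paper's full attention stack (spatial, channel, temporal) and metric loss are not reproduced.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Minimal audio-video cross-attention: each modality attends to the other,
    then the attended streams are pooled and concatenated for classification."""

    def __init__(self, dim: int = 256, heads: int = 4, n_classes: int = 8):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, video: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # video: (B, Tv, dim) frame-level features; audio: (B, Ta, dim) spectrogram features.
        v_att, _ = self.a2v(query=video, key=audio, value=audio)  # video attends to audio
        a_att, _ = self.v2a(query=audio, key=video, value=video)  # audio attends to video
        fused = torch.cat([v_att.mean(dim=1), a_att.mean(dim=1)], dim=-1)
        return self.classifier(fused)

model = CrossModalFusion()
logits = model(torch.randn(2, 16, 256), torch.randn(2, 40, 256))
print(logits.shape)  # torch.Size([2, 8])
```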
01/01/2023
Facial Emotion Recognition using Video Visual Transformer and Attention Dropping
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
University Politehnica of Bucharest, Romania; Institut Polytechnique de Paris, Télécom SudParis, France
Abstract
Understanding human emotions is a fundamental task in the affective computing community due to its wide range of applications, including robotics, psychology, and computer science. Recognizing emotions is an important factor in human interaction, helping people to convey intentions, empathy, and in some cases the actual meaning of a message. In this paper, we introduce a novel discrete emotion recognition framework based on visual information analysis. At the technical level, the core of the proposed methodology is a novel video-visual transformer extended with an attention dropping stage that allows extracting the spatiotemporal locations of the most relevant facial regions illustrating the peak of emotion. The experimental evaluation conducted on two publicly available datasets, CREMA-D and RAVDESS, validates the proposed methodology, which leads to average accuracy scores of 82.16% and 85.56%, respectively.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
2023 International Symposium on Signals, Circuits and Systems
01/01/2023
Inducing and Obtaining Cognitive Load Ground Truth Data in Automotive Scenarios
access here
Author(s)
Sultana Alina E.; Nicolae Irina E.; Fulop Szabolcs; Aursulesei Ruxandra; O’Callaghan David
Institution
University Politehnica of Bucharest, Bucharest, Romania; University Transilvania of Brasov, Brasov, Romania; ICM Concept Team, Xperi Corporation, Brasov, Romania; Alexandru Obregia Hospital, Bucharest, Romania; Xperi Corporation, Galway, Ireland
Abstract
The rise of sophisticated in-car multimedia solutions has had both positive and negative impacts on the road user's driving experience. A drastic increase in the number of road accidents due to drivers' inattention is a clear negative consequence. Thus, there has been increased interest lately in measuring drivers' cognitive load in real time to alert them to focus on driving. Quantifying the ability to solve a task, such as driving safely, is difficult in terms of the diversity of subjects and their emotional state or fatigue at a given time. In this paper, a pipeline is presented that obtains ground truth labels for cognitive load from video and biosignal data. The experimental design for inducing the cognitive load state and the data processing are presented as part of the pipeline. The methodology was validated using biosignal data collected from 31 subjects and a comparative analysis between cognitive and non-cognitive states.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
SPIE, The International Society for Optical Engineering
01/01/2023
Classification of Skin Lesions from Dermatoscopic Images Using Convolutional Neural Networks
access here
Author(s)
Oniga Maria; Sultana Alina-Elena; Popescu Dan; Merezeanu Daniel-Marian; Ichim Loretta
Institution
University Politehnica of Bucharest, Romania; 'St. S. Nicolau' Institute of Virology, Bucharest, Romania
Abstract
Skin cancer is one of the most common types of cancer, and it is caused by a variety of dermatological conditions. Identifying abnormalities in skin images is an important pre-diagnostic step to assist physicians in determining the patient's condition. Thus, to aid dermatologists in the diagnosis process, we propose five CNN-based classification approaches, namely the ResNet-101, DenseNet-121, GoogLeNet, VGG16, and MobileNetV2 architectures, on which the transfer learning process was applied. The HAM10000-N database, consisting of 7,120 images and obtained from the original HAM10000 dataset through an augmentation process, was used to train the proposed methods. Moreover, the images from HAM10000-N were pre-processed by removing hair with the DullRazor algorithm. To evaluate and compare the performance of all networks, four metrics were calculated: accuracy, precision, recall, and F1-score. The best results for the seven-class classification of the HAM10000-N dataset were obtained with the DenseNet-121 architecture, with 87% accuracy, 0.871 precision, 0.87 recall, and 0.872 F1-score.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
24th International Conference on Control Systems and Computer Science
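A minimal sketch of the transfer-learning recipe the abstract describes, applied to DenseNet-121 with torchvision: load ImageNet weights, freeze the backbone, and attach a new 7-class head for the HAM10000 classes. The hyperparameters are assumptions, and downloading the pretrained weights requires network access.

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from ImageNet weights, freeze the backbone, re-head for the 7 lesion classes.
model = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False                                    # freeze convolutional features
model.classifier = nn.Linear(model.classifier.in_features, 7)      # new trainable head

optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-4)
print(model(torch.randn(1, 3, 224, 224)).shape)                    # torch.Size([1, 7])
```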
01/06/2023
Fetal ECG Signal Processing for Fetal Monitoring Based on BSS and EMD
access here
Author(s)
Manea Ionuţ; Ţarǎlungǎ Dragoş-Daniel
Institution
University Politehnica of Bucharest, Romania
Abstract
Cardiac abnormalities are currently among the most widespread causes of intrauterine deaths resulting from congenital malformations, and an early diagnosis could contribute to reducing their number. Isolation of the fetal electrocardiogram (fECG) from abdominal signals is one of the most promising approaches due to its advantages compared to other fetal monitoring methods. However, the multiple sources of interference that alter abdominal signals make them difficult to process and interpret. Thus, in the present work, an algorithm is proposed to isolate the fECG from such signals and to extract the fetal heart rate (fHR). It is divided into three main stages: (1) preprocessing, in which both Empirical Mode Decomposition (EMD) and the Empirical Wavelet Transform (EWT) are used to eliminate motion artifacts (MA), baseline wander (BW), and powerline interference (PLI); (2) maternal electrocardiogram (mECG) approximation and removal, using the Fractional Fourier Transform (FrFT) and Maximum Likelihood Estimation (MLE); and (3) fECG approximation using Independent Component Analysis (ICA) and a bandpass filter to facilitate the detection of the fetal R peaks and to obtain an approximation of the fHR. The algorithm was tested on the Abdominal and Direct Fetal ECG Database (adfecgdb) available on the PhysioNet platform.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
International Computer Software and Applications Conference
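A toy sketch of the EMD-denoising and ICA-separation stages on synthetic data, assuming the PyEMD (EMD-signal) and scikit-learn packages are available; the maternal-ECG removal stage (FrFT/MLE) is omitted, and the synthetic channels are purely illustrative, not the adfecgdb data.

```python
import numpy as np
from PyEMD import EMD                      # pip install EMD-signal (assumed dependency)
from scipy.signal import butter, filtfilt
from sklearn.decomposition import FastICA

fs = 500                                    # sampling rate in Hz, assumed
t = np.arange(0, 10, 1 / fs)
# Synthetic stand-in for one abdominal channel: maternal + fetal rhythms + drift.
abdominal = (np.sin(2 * np.pi * 1.2 * t)            # ~72 bpm maternal component
             + 0.2 * np.sin(2 * np.pi * 2.3 * t)    # ~138 bpm fetal component
             + 0.5 * np.sin(2 * np.pi * 0.1 * t))   # baseline wander

# Stage 1 (preprocessing): drop the slowest IMFs, which capture baseline wander.
imfs = EMD().emd(abdominal)
denoised = imfs[:-2].sum(axis=0) if len(imfs) > 2 else abdominal

# Stage 3 (fECG approximation): ICA across several channels, then a bandpass
# filter to emphasise the fetal QRS band before R-peak detection.
channels = np.stack([denoised, np.roll(denoised, 40), np.roll(denoised, 90)], axis=1)
sources = FastICA(n_components=3, random_state=0).fit_transform(channels)
b, a = butter(4, [1.0, 40.0], btype="band", fs=fs)
fecg_estimate = filtfilt(b, a, sources[:, 0])
print(fecg_estimate.shape)
```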
01/01/2023
Fetal Monitoring: Multi-Channel Fetal ECG Denoising Based on Artificial Intelligence Approach
access here
Author(s)
Taralunga Dragoș Daniel
Institution
University Politehnica of Bucharest, Romania
Abstract
Continuous electronic fetal monitoring using cardiotocography (CTG) represents the standard for evaluating the health status of the fetus and the risk of the pregnancy in developed countries. However, CTG has many limitations: high false-positive rates, poor sensitivity, unsuitability for long-term monitoring, and the fact that it offers only the fetal heart rate and its variability. In this context, the fetal electrocardiogram (fECG) signal is used to obtain additional diagnostic information. On the other hand, the clinical standard for obtaining the fECG is invasive, can pose a risk to both mother and fetus, and can only be used during birth (a very limited time window). An alternative is the abdominal fECG, recorded using a matrix of electrodes placed on the maternal abdomen. This approach is noninvasive and can be used for long-term monitoring. Its main drawback is the small signal-to-noise ratio of the abdominal fECG. Thus, the challenge is to isolate the fECG signal from the other types of noise recorded by the abdominal electrodes: the maternal electrocardiogram (mECG), the electromyogram (EMG), the electrohysterogram (EHG), powerline interference (PLI), etc. In this paper, the author proposes an algorithm based on an artificial neural network approach to extract the fECG signal waveform from abdominally recorded signals (ADS). The performance of the proposed approach is evaluated on a database of simulated abdominal signals, and a comparison is provided with other approaches described in the literature for fECG denoising from abdominal signals.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
46th ICT and Electronics Convention

2022

13/06/2022
Automatic Sleep Scoring with LSTM Networks: Impact of Time Granularity and Input Signals
access here
Author(s)
Tautan Alexandra‑Maria; Rossi Alessandro; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania; Onera Health, Netherlands
Abstract
Supervised automatic sleep scoring algorithms are usually trained using sleep stage labels manually annotated on 30 s epochs of PSG data. In this study, we investigate the impact of using shorter epochs with various PSG input signals for training and testing a Long Short-Term Memory (LSTM) neural network. An LSTM model is evaluated on the provided 30 s epoch sleep stage labels from a publicly available dataset, as well as on 10 s subdivisions. Additionally, three independent scorers re-labeled a subset of the dataset on shorter time windows, and the automatic sleep scoring experiments were repeated on the re-annotated subset. The highest performance is achieved on features extracted from 20 s epochs of a single-channel frontal EEG. The resulting accuracy, precision, and recall were 92.22%, 67.59%, and 86.00%, respectively. When using a shorter epoch as input, the performance decreased by approximately 20%. Re-annotating a subset of the dataset on shorter time epochs did not improve the results and further altered the sleep stage detection performance. Our results show that our feature-based LSTM classification algorithm performs better on 30 s PSG epochs than on 10 s epochs used as input. Future work could determine whether varying the epoch size improves classification outcomes for different types of classification algorithms.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Walter de Gruyter, Biomedical Engineering / Biomedizinische Technik
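A minimal PyTorch sketch of an LSTM sleep stager of the kind discussed above: one feature vector per PSG epoch in, one stage label per epoch out. The feature dimension, hidden size, and 5-stage label set are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SleepStageLSTM(nn.Module):
    """Minimal LSTM sleep stager: one feature vector per PSG epoch in,
    one sleep-stage label (W, N1, N2, N3, REM) per epoch out."""

    def __init__(self, n_features: int = 24, hidden: int = 64, n_stages: int = 5):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_stages)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_epochs, n_features); e.g., features from 30 s (or 10/20 s) epochs.
        out, _ = self.lstm(x)
        return self.head(out)               # (batch, n_epochs, n_stages) logits

model = SleepStageLSTM()
logits = model(torch.randn(4, 120, 24))     # a recording split into 120 epochs
print(logits.shape)                         # torch.Size([4, 120, 5])
```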
14/04/2022
Generation of Realistic Synthetic Financial Time-series
access here
Author(s)
Dogariu Mihai; Stefan Liviu‑Daniel; Boteanu Bogdan Andrei; Lamba Claudiu; Kim Bomi; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Hana 1st Tech, Hana TI, Big Data & AI Lab, Seoul, South Korea
Abstract
Financial markets have always been a point of interest for automated systems. Due to their complex nature, financial algorithms and fintech frameworks require vast amounts of data to accurately respond to market fluctuations. This data availability is tied to the daily market evolution, so it is impossible to accelerate its acquisition. In this article, we discuss several solutions for augmenting financial datasets via synthesizing realistic time-series with the help of generative models. This problem is complex, since financial time series present very specific properties, e.g., fat-tail distribution, cross-correlation between different stocks, specific autocorrelation, cluster volatility and so on. In particular, we propose solutions for capturing cross-correlations between different stocks and for transitioning from fixed to variable length time-series without resorting to sequence modeling networks, and adapt various network architectures, e.g., fully connected and convolutional GANs, variational autoencoders, and generative moment matching networks. Finally, we tackle the problem of evaluating the quality of synthetic financial time-series. We introduce qualitative and quantitative metrics, along with a portfolio trend prediction framework that validates our generative models’ performance. We carry out experiments on real-world financial data extracted from the US stock market, proving the benefits of these techniques.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Association for Computing Machinery, ACM Transactions on Multimedia Computing Communications and Applications
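To illustrate the generative setup in miniature: a toy GAN pair for fixed-length return series, with an MLP generator and a 1D-convolutional discriminator. The series length and layer sizes are assumptions; the paper's cross-correlation and variable-length machinery is not reproduced.

```python
import torch
import torch.nn as nn

SERIES_LEN = 64                              # fixed-length log-return window (assumed)

class Generator(nn.Module):
    """Maps Gaussian noise to a synthetic log-return series."""
    def __init__(self, latent: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent, 128), nn.ReLU(),
            nn.Linear(128, SERIES_LEN))
    def forward(self, z): return self.net(z)

class Discriminator(nn.Module):
    """1D-convolutional critic scoring series as real vs. synthetic."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.Linear(32 * SERIES_LEN, 1))
    def forward(self, x): return self.net(x.unsqueeze(1))

G, D = Generator(), Discriminator()
fake = G(torch.randn(8, 32))                 # 8 synthetic series
print(D(fake).shape)                         # torch.Size([8, 1])
```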
24/03/2022
Affect in Multimedia: Benchmarking Violent Scenes Detection
access here
Author(s)
Constantin Mihai Gabriel; Stefan Liviu‑Daniel; Ionescu Bogdan; Demarty Claire‑Helene; Sjöberg Mats; Schedl Markus; Gravier Guillaume
Institution
University Politehnica of Bucharest, Romania; InterDigital R&D, France; CSC IT Center for Science Ltd, Finland; Johannes Kepler University Linz, Austria; Institut National de Recherche en Informatique et en Automatique, France
Abstract
In this article, we report on the creation of a publicly available, common evaluation framework for Violent Scenes Detection (VSD) in Hollywood and YouTube videos. We propose a robust data set, the VSD96, with more than 96 hours of video of various genres, annotations at different levels of detail (e.g., shot-level, segment-level), annotations of mid-level concepts (e.g., blood, fire), various pre-computed multi-modal descriptors, and over 230 system output results as baselines. This is the most comprehensive data set available to date tailored to the VSD task and was extensively validated during the MediaEval benchmarking campaigns. Furthermore, we provide an in-depth analysis of the crucial components of VSD algorithms by reviewing the capabilities and the evolution of existing systems (e.g., overall trends and outliers, the influence of the employed features and fusion techniques, the influence of deep learning approaches). Finally, we discuss the possibility of going beyond state-of-the-art performance via an ad-hoc late fusion approach. Experimentation is carried out on the VSD96 data, and we provide the most important lessons learned and gained insights. The increasing number of publications using the VSD96 data underlines the importance of the topic. The presented and published resources are a practitioner's guide and also a strong baseline to overcome, which will help researchers in the coming years in analyzing aspects of audio-visual affect and violence detection in movies and videos.
Access
Closed Access
Type of Publication
Journal Article
Publisher
IEEE Transactions on Affective Computing
13/07/2022
Two-Stage Spatio-Temporal Vision Transformer for the Detection of Violent Scenes
access here
Author(s)
Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
The rapid expansion and adoption of CCTV systems brings with it a series of problems that, if left unchecked, have the potential of hindering the advantages brought by such systems and reducing their effectiveness in security surveillance scenarios. The possibly vast quantities of data associated with a CCTV system that covers a city or problematic areas of that city, venues, events, industrial sites, or even smaller security perimeters can overwhelm human operators and make it hard to distinguish important security events from the rest of the normal data. Therefore, the creation of automated systems that are able to provide operators with accurate alarms when certain events take place is of paramount importance, as this can heavily reduce their workload and improve the efficiency of the system. In this regard, we propose a Two-Stage Vision Transformer-based (2SViT) system for the detection of violent scenes. In this setup, the first stage handles frame-level processing, while the second stage processes temporal information by gathering frame-level features. We train and validate our proposed Transformer architecture on the popular XD-Violence dataset, while testing several size variations of the architecture, and show good results when compared with baseline scores.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
14th International Conference on Communications
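A structural sketch of a two-stage setup like 2SViT, assuming the timm package for the frame-level ViT: stage one embeds frames independently, stage two aggregates them with a temporal Transformer. Sizes and depths are illustrative, not the authors' configuration.

```python
import timm                                   # assumed available; any frame encoder works
import torch
import torch.nn as nn

class TwoStageViolenceDetector(nn.Module):
    """Sketch of a two-stage setup: a frame-level ViT produces per-frame
    embeddings (stage 1), a temporal Transformer aggregates them (stage 2)."""

    def __init__(self):
        super().__init__()
        self.frame_encoder = timm.create_model(
            "vit_base_patch16_224", pretrained=False, num_classes=0)  # 768-d features
        layer = nn.TransformerEncoderLayer(d_model=768, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(768, 1)          # violent vs. non-violent logit

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        b, t, c, h, w = clip.shape             # (batch, frames, 3, 224, 224)
        frames = self.frame_encoder(clip.flatten(0, 1)).view(b, t, -1)
        return self.head(self.temporal(frames).mean(dim=1))

model = TwoStageViolenceDetector()
print(model(torch.randn(1, 16, 3, 224, 224)).shape)  # torch.Size([1, 1])
```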
13/07/2022
Deep Learning-based Object Searching and Reporting for Aerial Surveillance Systems
access here
Author(s)
Jitaru Andrei-Cosmin; Barbu Cosmina-Elena; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Access
Closed Access
Type of Publication
Conference Paper
Publisher
14th International Conference on Communications
01/01/2022
Uncovering the Strength of Capsule Networks in Deepfake Detection
access here
Author(s)
Stanciu Dan-Cristian; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
Information is everywhere, and sometimes we have no idea if what we read, watch, or listen to is accurate, real, or authentic. This paper focuses on detecting deep learning generated videos, or deepfakes – a phenomenon which is more and more present in today's society. While there are some very good methods of detecting deepfakes, there are two key elements that should always be considered, i.e., no method is perfect and deepfake generation techniques continue to evolve, sometimes even faster than detection methods. In our proposed architectures, we focus on a family of deep learning methods that is new, has several advantages over traditional Convolutional Neural Networks, and has been underutilized in the fight against fake information, namely Capsule Networks. We show that: (i) state-of-the-art Capsule Network architectures can be improved in the context of deepfake detection, (ii) they can be used to obtain accurate results using a very small number of parameters, and (iii) Capsule Networks are a viable option over deep convolutional models. Experimental validation is carried out on two publicly available datasets, namely FaceForensics++ and CelebDF, showing very promising results.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
1st ACM International Workshop on Multimedia AI against Disinformation
01/01/2022
Characterizing TMS-EEG perturbation indexes using signal energy: initial study on Alzheimer's Disease classification
access here
Author(s)
Tautan Alexandra-Maria; Casula Elias; Borghi Ilaria; Maiella Michele; Bonni Sonia; Minei Marilena; Assogna Martina; Ionescu Bogdan; Koch Giacomo; Santarnecchi Emiliano
Institution
University Politehnica of Bucharest, Romania; Harvard Medical School, USA; Santa Lucia Foundation, Rome, Italy; University of Ferrara, Italy
Abstract
Transcranial Magnetic Stimulation (TMS) combined with EEG recordings (TMS-EEG) has shown great potential in the study of the brain and in particular of Alzheimer's Disease (AD). In this study, we propose an automatic method for determining the duration of the TMS-induced perturbation of the EEG signal as a potential metric reflecting the brain's functional alterations, with a preliminary study conducted in patients with AD. Three metrics for characterizing the strength and duration of TMS-evoked potential (TEP) activity are proposed, and their potential in distinguishing AD patients from healthy controls (HC) is investigated. A dataset of TMS-EEG recordings from 17 AD patients and 17 HC was used in our analysis. A Random Forest classification algorithm was trained on the extracted TEP metrics and its performance evaluated in a leave-one-subject-out cross-validation. The resulting model showed promising results in identifying AD patients from HC, with an accuracy, sensitivity, and specificity of 69.32%, 72.23%, and 66.41%, respectively.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society
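The leave-one-subject-out protocol mentioned above is easy to get wrong; the scikit-learn sketch below shows one way to enforce it with LeaveOneGroupOut so that no subject's trials appear in both training and test folds. The TEP feature matrix here is random placeholder data, not the study's measurements.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
# Hypothetical TEP metrics: 3 features per trial, 34 subjects (17 AD, 17 HC),
# several trials per subject; labels are per-subject diagnoses.
n_subjects, trials = 34, 10
X = rng.normal(size=(n_subjects * trials, 3))
y = np.repeat(np.array([0] * 17 + [1] * 17), trials)     # 0 = HC, 1 = AD
groups = np.repeat(np.arange(n_subjects), trials)        # subject ID of each trial

# Leave-one-subject-out: every fold tests on all trials of one held-out subject.
scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=0),
                         X, y, groups=groups, cv=LeaveOneGroupOut())
print(scores.mean())
```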
01/01/2022
Preliminary study on the impact of EEG density on TMS-EEG classification in Alzheimer's disease
access here
Author(s)
Tautan Alexandra-Maria; Casula Elias; Borghi Ilaria; Maiella Michele; Bonni Sonia; Minei Marilena; Assogna Martina; Ionescu Bogdan; Koch Giacomo; Santarnecchi Emiliano
Institution
National University of Science & Technology POLITEHNICA Bucharest, Romania; Harvard Medical School, USA; Santa Lucia Foundation, Rome, Italy; University of Ferrara, Italy
Abstract
Transcranial magnetic stimulation co-registered with electroencephalography (TMS-EEG) has previously proven a helpful tool in the study of Alzheimer's disease (AD). In this work, we investigate the use of TMS-evoked EEG responses to classify AD patients against healthy controls (HC). Using a dataset containing 17 AD patients and 17 HC, we extract various time-domain features from individual TMS responses and average them over a low-, medium-, and high-density EEG electrode set. Within a leave-one-subject-out validation scenario, the best classification performance for AD vs. HC was obtained using the high-density electrode set with a Random Forest classifier. The accuracy, sensitivity, and specificity were 92.7%, 96.58%, and 88.82%, respectively.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
44th Annual International Conference of the IEEE Engineering in Medicine & Biology Society
06/07/2022
Face Verification with Challenging Imposters and Diversified Demographics
access here
Author(s)
Popescu Adrian; Stefan Liviu-Daniel; Deshayes-Chossart Jerome; Ionescu Bogdan
Institution
Université Paris-Saclay, CEA, LIST, France; National University of Science & Technology POLITEHNICA Bucharest, Romania
Abstract
Face verification aims to distinguish between genuine and imposter pairs of faces, which include the same or different identities, respectively. The performance reported in recent years gives the impression that the task is practically solved. Here, we revisit the problem and argue that existing evaluation datasets were built using two oversimplifying design choices. First, the usual identity selection to form imposter pairs is not challenging enough because, in practice, verification is needed to detect challenging imposters. Second, the underlying demographics of existing datasets are often insufficient to account for the wide diversity of facial characteristics of people from across the world. To mitigate these limitations, we introduce the FaVCI2D dataset. Imposter pairs are challenging because they include visually similar faces selected from a large pool of demographically diversified identities. The dataset also includes metadata related to gender, country and age to facilitate fine-grained analysis of results. FaVCI2D is generated from freely distributable resources. Experiments with state-of-the-art deep models that provide nearly 100% performance on existing datasets show a significant performance drop for FaVCI2D, confirming our starting hypothesis. Equally important, we analyze legal and ethical challenges which appeared in recent years and hindered the development of face analysis research. We introduce a series of design choices which address these challenges and make the dataset constitution and usage more sustainable and fair. FaVCI2D is available at: https://github.com/AIMultimediaLab/FaVCI2D-Face-Verification-with-Challenging-Imposters-and-Diversified-Demographics.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
22nd IEEE/CVF Winter Conference on Applications of Computer Vision
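As a minimal illustration of the verification setting: embeddings are compared with cosine similarity against a threshold, and a challenging imposter is precisely a different-identity pair whose similarity approaches that threshold. The embedding size, threshold, and toy vectors below are assumptions, not the FaVCI2D protocol.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(emb1: np.ndarray, emb2: np.ndarray, threshold: float = 0.5) -> bool:
    """Declare a pair 'genuine' (same identity) when similarity clears the threshold."""
    return cosine_similarity(emb1, emb2) >= threshold

rng = np.random.default_rng(1)
anchor = rng.normal(size=512)                    # 512-d face embedding, assumed
genuine = anchor + 0.1 * rng.normal(size=512)    # same person: near-duplicate embedding
imposter = rng.normal(size=512)                  # different person: unrelated embedding
print(verify(anchor, genuine), verify(anchor, imposter))  # True False
```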
02/11/2022
Overview of the ImageCLEF 2022: Multimedia Retrieval in Medical, Social Media and Nature Applications
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Rückert Johannes; Ben Abacha Asma; de Herrera Alba G. Seco; Friedrich Christoph M.; Bloch Louise; Brüngel Raphael; Idrissi-Yaghir Ahmad; Schäfer Henning; Kozlovski Serge; Cid Yashin Dicente; Kovalev Vassil; Stefan Liviu-Daniel; Constantin Mihai Gabriel; Dogariu Mihai; Popescu Adrian; Deshayes-Chossart Jerome; Schindler Hugo; Chamberlain Jon; Campello Antonio; Clark Adrian
Institution
University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland; University of La Rochelle, France; University of Applied Sciences and Arts Dortmund, Germany; Microsoft, Redmond, WA, USA; University of Essex, Colchester, UK; Institute of Informatics, Minsk, Belarus; University of Warwick, Coventry, UK; Université Paris-Saclay, CEA, LIST, Palaiseau, France; Wellcome Trust Research Labs, London, UK
Abstract
This paper presents an overview of the ImageCLEF 2022 lab that was organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2022. ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2022, the 20th edition of ImageCLEF runs four main tasks: (i) a medical task that groups two previous tasks, i.e., caption analysis and tuberculosis prediction, (ii) a social media aware task on estimating potential real-life effects of online image sharing, (iii) a nature coral task about segmenting and labeling collections of coral reef images, and (iv) a new fusion task addressing the design of late fusion schemes for boosting the performance, with two real-world applications: image search diversification (retrieval) and prediction of visual interestingness (regression). The benchmark campaign received the participation of over 25 groups submitting more than 258 runs.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction
07/05/2022
ImageCLEF 2022: Multimedia Retrieval in Medical, Nature, Fusion, and Internet Applications
access here
Author(s)
de Herrera Alba G. Seco; Ionescu Bogdan; Müller Henning; Péteri Renaud; Ben Abacha Asma; Friedrich Christoph M.; Rückert Johannes; Bloch Louise; Brüngel Raphael; Idrissi-Yaghir Ahmad; Schäfer Henning; Kozlovski Serge; Cid Yashin Dicente; Kovalev Vassili; Chamberlain Jon; Clark Adrian; Campello Antonio; Schindler Hugo; Deshayes Jerome; Popescu Adrian; Stefan Liviu-Daniel; Constantin Mihai Gabriel; Dogariu Mihai
Institution
University of Essex, Colchester, UK; University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Delémont, Switzerland; University of La Rochelle, France; NLM, Bethesda, MD, USA; University of Applied Sciences and Arts Dortmund, Germany; Belarusian Academy of Sciences, Minsk, Belarus; University of Warwick, Coventry, UK; Wellcome Trust Research Labs, London, UK; CEA LIST, Palaiseau, France
Abstract
ImageCLEF has been part of the Conference and Labs of the Evaluation Forum (CLEF) since 2003. CLEF 2022 will take place in Bologna, Italy. ImageCLEF is an ongoing evaluation initiative which promotes the evaluation of technologies for annotation, indexing, and retrieval of visual data, with the aim of providing information access to large collections of images in various usage scenarios and domains. In its 20th edition, ImageCLEF will have four main tasks: (i) a Medical task addressing concept annotation, caption prediction, and tuberculosis detection; (ii) a Coral task addressing the annotation and localisation of substrates in coral reef images; (iii) an Aware task addressing the prediction of real-life consequences of online photo sharing; and (iv) a new Fusion task addressing late fusion techniques based on the expertise of a pool of classifiers. In 2021, over 100 research groups registered at ImageCLEF, with 42 groups submitting more than 250 runs. These numbers show that, despite the COVID-19 pandemic, there is strong interest in the evaluation campaign.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Advances in Information Retrieval
01/01/2022
Exploring deep fusion ensembling for automatic visual interestingness prediction
access here
Author(s)
Constantin Mihai Gabriel; Stefan Liviu-Daniel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In the context of the ever-growing quantity of multimedia content from social, news, and educational platforms, generating meaningful recommendations and ratings now requires a more advanced understanding of their impact on the user, such as their subjective perception. One of the important subjective concepts explored by researchers is visual interestingness. While several definitions of this concept are given in the current literature, in a broader sense, this property attempts to measure the ability of audio-visual data to capture and keep the viewer's attention for longer periods of time. While many computer vision and machine learning methods have been tested for predicting media interestingness, overall, due to the heavily subjective nature of interestingness, the precision of the results is relatively low. In this chapter, we investigate several methods that address this problem from a different angle. We first review the literature on interestingness prediction and present an overview of the traditional fusion mechanisms, such as statistical fusion, weighted approaches, boosting, random forests, or randomized trees. Further, we explore the possibility of employing a stronger, novel deep learning-based system fusion for enhancing the performance. We investigate several types of deep networks for creating the fusion systems, including dense, attention, convolutional, and cross-space-fusion networks, while also proposing input decoration methods that help these networks achieve optimal performance. We present the results, as well as an analysis of the correlation between network structure and overall system performance. Experimental validation is carried out on a publicly available data set and on the systems benchmarked during the 2017 MediaEval Predicting Media Interestingness task.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Human Perception of Visual Information: Psychological and Computational Perspectives
05/09/2022
Overview of ImageCLEFfusion 2022 Task – Ensembling Methods for Media Interestingness Prediction and Result Diversification
access here
Author(s)
Ştefan Liviu-Daniel; Constantin Mihai Gabriel; Dogariu Mihai; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
The 2022 ImageCLEFfusion task is the first edition of this task, targeting the creation of late fusion or ensembling methods in two different scenarios: (i) the prediction of media visual interestingness, and (ii) social media image search result diversification. The objective proposed to participants is to train and test their fusion schemes on a set of pre-computed inducers, without creating or bringing in inducers from the outside. The two scenarios correspond to a regression scenario in the case of media interestingness, where performance is measured via the mean average precision at 10 (MAP@10) metric, and to a retrieval scenario in the case of result diversification, where performance is measured via the F1-score and Cluster Recall at 20 (F1@20, CR@20). Overall, 6 teams registered for ImageCLEFfusion, 5 of them submitting runs, while only one team submitted runs to both the interestingness and diversification tasks. A total of 39 runs were received, and an analysis of the proposed methods shows a great diversity among them, ranging from statistical weighted approaches and weighted approaches that use learning stages for creating the weights, to machine learning approaches that join the inducer predictions, such as SVM or KNN, deep learning approaches, and even fusion schemes that join the results of other fusion schemes.
Access
Open Access
Type of Publication
Conference Paper
Publisher
2022 Conference and Labs of the Evaluation Forum
05/09/2022
Overview of the ImageCLEF 2022 Aware Task
access here
Author(s)
Popescu Adrian; Deshayes-Chossart Jérôme; Schindler Hugo; Ionescu Bogdan
Institution
Université Paris-Saclay, CEA, LIST, France; University Politehnica of Bucharest, Romania
Abstract
The paper presents an overview of the ImageCLEF 2022 Aware task, whose final objective is to make users more aware of the consequences of posting information on social networks. This is important insofar as users are often unaware of the effects of personal data sharing. Focus is put on modeling the impact of sharing in impactful real-life situations such as searching for a bank loan, an accommodation, or a job. Since photos are one of the main types of data shared online, the task is instantiated as a photographic user profile assessment. Participants receive a training and validation dataset which includes a set of photographic profiles that are manually rated for each situation. They are required to train algorithms which rate and then rank test profiles in each tested situation. The correlation between automatic and manual profile rankings is used to measure the performance of algorithms. The overview discusses the task settings, the dataset constitution process, and the approaches proposed this year.
Access
Open Access
Type of Publication
Conference Paper
Publisher
2022 Conference and Labs of the Evaluation Forum
01/01/2022
Automatic Alignment of Human Generated Transcripts to Speech Signals
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
Institut Polytechnique de Paris, Télécom SudParis, ARTEMIS Department, France; University “Politehnica” of Bucharest, Romania
Abstract
This paper introduces a novel, completely automatic audio-subtitle synchronization algorithm designed to increase the accessibility and comprehension of video documents for hearing-impaired people. The major contribution of the paper is the anchor-word matching strategy, which reliably aligns the subtitle document generated by a human transcriber with the automatic speech recognition (ASR) textual file. In addition, we propose a subtitle positioning strategy that automatically determines the optimal location of each phrase on the user's screen with respect to the video's semantic content. The experimental evaluation validates the proposed method with average accuracy scores superior to 90%.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
10th E-Health and Bioengineering Conference
01/01/2022
Emotion Recognition in Video Streams Using Intramodal and Intermodal Attention Mechanisms
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
ARTEMIS Department, Institut Polytechnique de Paris, Télécom SudParis, France; University “Politehnica” of Bucharest, Romania
Abstract
Automatic emotion recognition from video streams is an essential challenge for various applications including human behavior understanding, mental disease diagnosis, surveillance, and human-machine interaction. In this paper we introduce a novel, completely automatic, multimodal emotion recognition framework based on audio and visual fusion of information, designed to leverage the mutually complementary nature of features while maintaining the modality-distinctive information. Specifically, we integrate spatial, channel, and temporal attention into the visual processing pipeline and temporal self-attention into the audio branch. Then, a multimodal cross-attention fusion strategy is introduced that effectively exploits the relationship between the audio and video features. The experimental evaluation performed on RAVDESS, a publicly available database, validates the proposed approach with average accuracy scores superior to 87.85%. When compared with state-of-the-art methods, the proposed framework returns accuracy gains of more than 1.85%.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
Springer, Lecture Notes in Computer Science
01/09/2022
Active Speaker Recognition using Cross Attention Audio-Video Fusion
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
University “Politehnica” of Bucharest, Romania; Institut Polytechnique de Paris, Télécom SudParis, ARTEMIS Department, France
Abstract
Audio-video based multimodal active speaker recognition from video streams has attracted the attention of the scientific community due to its wide range of applications, such as human-centered computing or semantic video understanding. Most existing techniques use early or late audio-video (A-V) fusion strategies without fully considering the inter-modal and intra-modal interactions. In this context, this work proposes a novel cross-modal attention mechanism based on the visual and audio modalities, designed to capture the complex spatiotemporal relationship between descriptors and to fuse complementary information from multiple modalities. First, we perform representation learning of audio and video using deep convolutional neural networks (CNNs). Second, we feed the features of both modalities to a cross-attention block, fusing A-V features at the model level. Finally, we obtain the identity of the active speaker and associate with each character the corresponding subtitle segment. The experimental evaluation performed on 30 videos validates the approach with average F1-scores superior to 88%. The effectiveness of the proposed system architecture is compared against state-of-the-art methods and demonstrates accuracy gains of more than 3%.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
European Workshop on Visual Information Processing
01/01/2022
Audio-Video Fusion with Double Attention for Multimodal Emotion Recognition
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
Institut Polytechnique de Paris, Télécom SudParis, ARTEMIS Department, France; University 'Politehnica' of Bucharest, Romania
Abstract
Recently, multimodal emotion recognition has become a hot research topic within the affective computing community due to its robust performance. In this paper, we propose to analyze emotions in an end-to-end manner based on various convolutional neural network (CNN) architectures and attention mechanisms. Specifically, we develop a new framework that integrates spatial and temporal attention into a visual 3D-CNN and temporal attention into an audio 2D-CNN in order to capture the intra-modal feature characteristics. Further, the system is extended with an audio-video cross-attention fusion approach that effectively exploits the relationship across the two modalities. The proposed method achieves 87.89% accuracy on the RAVDESS dataset. When compared with state-of-the-art methods, our system demonstrates accuracy gains of more than 1.89%.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop
01/01/2022
Emotion Recognition from Raw Speech Signals Using 2D CNN with Deep Metric Learning
access here
Author(s)
Mocanu Bogdan; Tapu Ruxandra
Institution
Institut Polytechnique de Paris, Télécom SudParis, ARTEMIS Department, France; University 'Politehnica' of Bucharest, Romania
Abstract
In this paper we introduce a novel emotion recognition framework operating on raw speech signals. The system is based on a ResNet architecture fed with spectrogram inputs. The CNN is further extended with a GhostVLAD feature aggregation layer that extracts a single, fixed-size descriptor constructed at the level of the utterance. The system adopts a sentiment metric loss that integrates the relations between the various classes of emotions. The experimental evaluation conducted on two publicly available databases, RAVDESS and CREMA-D, validates the proposed methodology with average accuracy scores of 82% and 63%, respectively.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE International Conference on Consumer Electronics
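A minimal sketch of the raw-speech front end described above (waveform to log-mel spectrogram to a ResNet), using torchaudio and torchvision. The GhostVLAD aggregation and sentiment metric loss are omitted; the pooling, class count, and signal sizes are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn
import torchaudio
from torchvision import models

# Raw waveform -> log-mel spectrogram -> ResNet classifier (6 emotion classes assumed).
melspec = torchaudio.transforms.MelSpectrogram(sample_rate=16000, n_mels=64)

backbone = models.resnet18(weights=None)
backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
backbone.fc = nn.Linear(backbone.fc.in_features, 6)

waveform = torch.randn(8, 16000 * 3)            # batch of 3 s utterances at 16 kHz
spec = melspec(waveform).unsqueeze(1).log1p()   # (8, 1, 64, time) log-mel "images"
print(backbone(spec).shape)                     # torch.Size([8, 6])
```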

2021

22/05/2022
An Annotated Video Dataset for Computing Video Memorability
access here
Author(s)
Kiziltepe Rukiye Savran; Sweeney Lorin; Constantin Mihai Gabriel; Doctor Faiyaz; de Herrera Alba Garcia Seco; Demarty Claire‑Helene; Healy Graham; Ionescu Bogdan; Smeaton Alan F.
Institution
University of Essex, Colchester, Essex, England; Dublin City University, Insight Centre for Data Analytics, Dublin, Ireland; University Politehnica of Bucharest, Bucharest, Romania; InterDigital, R&D, Paris, France
Abstract
Using a collection of publicly available links to short-form video clips of an average duration of 6 seconds each, 1275 users manually annotated each video multiple times to indicate both the long-term and short-term memorability of the videos. The annotations were gathered as part of an online memory game and measured a participant's ability to recall having seen the video previously when shown a collection of videos. The recognition tasks were performed on videos seen within the previous few minutes for short-term memorability and within the previous 24 to 72 hours for long-term memorability. The data includes the reaction times for each recognition of each video. Associated with each video are text descriptions (captions) as well as a collection of image-level features applied to 3 frames extracted from each video (start, middle, and end). Video-level features are also provided. The dataset was used in the Video Memorability task as part of the MediaEval benchmark in 2020.
Access
Open Access
Type of Publication
Journal Article
Publisher
Elsevier, Data in Brief
12/09/2021
Backpropagation Aided Logo Generation Using Generative Adversarial Networks
access here
Author(s)
Dogariu Mihai; Le Borgne Herve; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; University Paris Saclay, CEA, LIST, Palaiseau, France
Abstract
Logo detection algorithms rely on comprehensive datasets in order to achieve a high precision. The scarcity of these resources represents the motivation to augment them through artificial logo generation. In this paper, we propose a logo dataset augmentation technique that leverages the generalization power of generative adversarial networks (GANs). We train a GAN on a highly complex dataset formed of single logo instances such that we are able to generate random logos. Then, by applying deep gradient back-propagation, we manage to reconstruct very specific logos with this model. We validate our approach by replacing original logos from in-the-wild images with their reconstructed versions and running logo detection algorithms on the newly created images.
Access
Closed Access
Type of Publication
Journal Article
Publisher
University Politehnica of Bucharest, Scientific Bulletin Series C-Electrical Engineering and Computer Science
02/07/2021
Artificial Intelligence in Neurodegenerative Diseases: A Review of Available Tools with a Focus on Machine Learning Techniques
access here
Author(s)
Tautan Alexandra‑Maria; Ionescu Bogdan; Santarnecchi Emiliano
Institution
University Politehnica of Bucharest, Romania; Harvard Medical School, Beth Israel Deaconess Medical Center, USA
Abstract
Neurodegenerative diseases have shown an increasing incidence in the older population in recent years. A significant amount of research has been conducted to characterize these diseases. Computational methods, and particularly machine learning techniques, are now very useful tools in helping and improving the diagnosis as well as the disease monitoring process. In this paper, we provide an in-depth review of existing computational approaches used across the whole neurodegenerative spectrum, namely for Alzheimer's, Parkinson's, and Huntington's Disease, Amyotrophic Lateral Sclerosis, and Multiple System Atrophy. We propose a taxonomy of the specific clinical features and of the existing computational methods. We provide a detailed analysis of the various modalities and decision systems employed for each disease. We identify and present the sleep disorders which are present in various diseases and which represent an important asset for onset detection. We overview the existing data set resources and evaluation metrics. Finally, we identify the remaining open challenges and discuss future perspectives.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Elsevier, Artificial Intelligence in Medicine
22/02/2021
Visual Interestingness Prediction: A Benchmark Framework and Literature Review
access here
Author(s)
Constantin Mihai Gabriel; Stefan Liviu‑Daniel; Ionescu Bogdan; Duong Ngoc Q. K.; Demarty Claire‑Helene; Sjoberg Mats
Institution
University Politehnica of Bucharest, Romania; InterDigital, Paris, France; CSC IT Center for Science, Finland
Abstract
In this paper, we report on the creation of a publicly available, common evaluation framework for image and video visual interestingness prediction. We propose a robust data set, the Interestingness10k, with 9831 images and more than 4 h of video, interestingness scores determined based on more than 1M pair-wise annotations of 800 trusted annotators, some pre-computed multi-modal descriptors, and 192 system output results as baselines. The data were validated extensively during the 2016–2017 MediaEval benchmark campaigns. We provide an in-depth analysis of the crucial components of visual interestingness prediction algorithms by reviewing the capabilities and the evolution of the MediaEval benchmark systems, as well as of prominent systems from the literature. We discuss overall trends, influence of the employed features and techniques, generalization capabilities and the reliability of results. We also discuss the possibility of going beyond state-of-the-art performance via an automatic, ad-hoc system fusion, and propose a deep MLP-based architecture that outperforms the current state-of-the-art systems by a large margin. Finally, we provide the most important lessons learned and insights gained.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Springer, International Journal of Computer Vision
25/05/2021
Dimensionality Reduction for EEG-based Sleep Stage Detection: Comparison of Autoencoders, Principal Component Analysis and Factor Analysis
access here
Author(s)
Tautan Alexandra‑Maria; Rossi Alessandro C.; de Francisco Ruben; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Onera Health, Netherlands
Abstract
Methods developed for automatic sleep stage detection make use of large amounts of data in the form of polysomnographic (PSG) recordings to build predictive models. In this study, we investigate the effect of several dimensionality reduction techniques, i.e., principal component analysis (PCA), factor analysis (FA), and autoencoders (AE), on common classification models, e.g., random forests (RF), multilayer perceptrons (MLP), and long short-term memory (LSTM) networks, for automated sleep stage detection. Experimental testing is carried out on the MGH Dataset provided in the "You Snooze, You Win: The PhysioNet/Computing in Cardiology Challenge 2018". The signals used as input are the six available electroencephalographic (EEG) channels and their combinations with the other PSG signals provided: electrocardiogram (ECG), electrooculogram (EOG), electromyogram (EMG), and respiration-based signals (respiratory efforts and airflow). We observe that a similar or improved accuracy is obtained in most cases when using any of the dimensionality reduction techniques, which is a promising result, as it allows reducing the computational load while maintaining performance and, in some cases, even improves the accuracy of automated sleep stage detection. In our study, using autoencoders for dimensionality reduction maintains the performance of the model, while using PCA and FA in most cases improves the accuracy of the models.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Walter de Gruyter, Biomedical Engineering / Biomedizinische Technik
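A small scikit-learn sketch comparing classification with and without PCA/FA dimensionality reduction, in the spirit of the study above (the autoencoder variant would need a neural-network library and is omitted). The feature matrix is random placeholder data, so the printed scores are meaningless except as a template.

```python
import numpy as np
from sklearn.decomposition import PCA, FactorAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 120))                 # hypothetical epoch-level PSG features
y = rng.integers(0, 5, size=500)                # 5 sleep stages

for name, reducer in [("none", FunctionTransformer()),     # identity baseline
                      ("PCA", PCA(n_components=20)),
                      ("FA", FactorAnalysis(n_components=20))]:
    pipe = make_pipeline(reducer, RandomForestClassifier(random_state=0))
    print(name, cross_val_score(pipe, X, y, cv=3).mean())
```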
23/02/2021
Benchmarking Image Retrieval Diversification Techniques for Social Media
access here
Author(s)
Ionescu Bogdan; Röhm Maia; Boteanu Bogdan Andrei; Gînsca Alexandru Lucian; Lupu Mihai; Müller Henning
Institution
University Politehnica of Bucharest, Romania; Vienna University of Technology, Austria; Atos, Information Management & Preservation Group, France; University of Applied Sciences Western Switzerland, Switzerland
Abstract
Image retrieval has been an active research domain for over 30 years, and historically it has focused primarily on precision as an evaluation criterion. Similar to text retrieval, where the number of indexed documents became large and many relevant documents exist, it is of high importance to highlight diversity in the search results to provide better results for the user. The Retrieving Diverse Social Images Task of the MediaEval benchmarking campaign has addressed exactly this challenge of retrieving diverse and relevant results over the past years, specifically in the social media context. Multimodal data (e.g., images, text) was made available to the participants, including metadata assigned to the images, user IDs, and precomputed visual and text descriptors. Many teams have participated in the task over the years. The large number of publications employing the data, as well as the citations of the overview articles, underline the importance of this topic. In this paper, we introduce these publicly available data resources as well as the evaluation framework, and provide an in-depth analysis of the crucial aspects of social image search diversification, such as the capabilities and the evolution of existing systems. These evaluation resources will help researchers for the coming years in analyzing aspects of multimodal image retrieval and diversity of the search results.
Access
Closed Access
Type of Publication
Journal Article
Publisher
IEEE, IEEE Transactions on Multimedia
10/06/2022
SmartEEG: An End-to-End Framework for the Analysis and Classification of EEG signals
access here
Author(s)
Ciurea Alexe; Manoila Cristina-Petruta; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest; Cris Engn Ltd, Bishops Itchington, England
Abstract
EEG analysis frameworks are scientific tools that make neuroscientists' research less cumbersome. However, the focus has been on manual feature extraction and signal localization, with limited machine classification functionality. In this paper, we propose a new framework for developing deep learning models suited for analyzing and classifying EEG signals. After training FCNNs, CNNs, and RNNs to provide baselines for comparison, we show that deep learning models have more potential for general EEG monitoring than classical algorithms. Furthermore, their performance can be independent of equipment and patient.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
9th IEEE International Conference on e-Health and Bioengineering
15/07/2021
Toward Language-independent Lip Reading: A Transfer Learning Approach
access here
Author(s)
Jitaru Andrei-Cosmin; Stefan Liviu-Daniel; Ionescu Bogdan
Institution
University Politehnica of Bucharest
Abstract
Automated lip-reading, i.e., translating lip movements into text, has received growing interest in recent years with the success of deep learning across a wide variety of tasks. One major obstacle to progress in this field has been the lack of suitable training resources, with the vast majority being limited to a selective set of languages. In this paper, we study the effectiveness of transfer learning to address the lack of massive amounts of labeled data for building a language-independent lip-reading system. Towards this target, we exploit existing knowledge and generalize to new languages via deep neural networks. Experimental validation is carried out on several publicly available datasets, i.e., LRW for English, LRRo for Romanian, and LRW-1000 for Mandarin, showing promising results with significant performance improvements of the multilingual models compared to the monolingual ones.
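A minimal sketch of the transfer-learning idea described above, not the paper's actual pipeline: a backbone pre-trained on a large source corpus (stood in for here by an ImageNet ResNet-18 from recent torchvision) is adapted to a new target language by replacing and re-training only the classifier head. The vocabulary size and dummy tensors are illustrative assumptions.

    import torch
    import torch.nn as nn
    from torchvision import models

    NUM_TARGET_WORDS = 48  # hypothetical target-language vocabulary size

    # Load a pre-trained backbone as a stand-in for source-language knowledge.
    backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in backbone.parameters():      # freeze the transferred features
        p.requires_grad = False
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_TARGET_WORDS)  # new head

    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()

    frames = torch.randn(8, 3, 224, 224)          # dummy mouth-region crops
    labels = torch.randint(0, NUM_TARGET_WORDS, (8,))
    loss = criterion(backbone(frames), labels)    # fine-tune only the new head
    loss.backward()
    optimizer.step()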
Access
Closed Access
Type of Publication
Conference Paper
Publisher
International Symposium on Signals, Circuits and Systems
15/07/2021
Hateful meme detection with multimodal deep neural networks
access here
Author(s)
Constantin Mihai Gabriel; Parvu Dan-Stefan; Stanciu Cristian; Ionascu Denisa; Ionescu Bogdan
Institution
University Politehnica of Bucharest
Abstract
The modern advances of social media platforms and content sharing websites led to the popularization of Internet memes, and today's Internet landscape contains websites that are predominantly dedicated to meme sharing. While at their inception memes were mostly humorous, this concept evolved and nowadays memes cover a wide variety of subjects, including political and social commentaries. Considering the widespread use of memes and their power of conveying distilled messages, they became an important method for spreading hate speech against individuals or targeted groups. Given the multimodal nature of Internet memes, our proposed approach is also a multimodal one, consisting of two parallel processing branches, one textual and one visual, that are joined in a final classification step, providing prediction results for the samples. We test our approach on the publicly available Memotion 7k dataset and compare our results with the baseline approach developed for the dataset.
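The two-branch design described in the abstract might look roughly like the following sketch (hypothetical dimensions and layer choices, not the authors' exact architecture): a textual and a visual branch are processed in parallel and joined in a final classification step.

    import torch
    import torch.nn as nn

    class TwoBranchMemeClassifier(nn.Module):
        # Hypothetical parallel text/visual architecture joined in a final
        # classification step; embedding sizes are illustrative only.
        def __init__(self, text_dim=300, visual_dim=512, hidden=128, n_classes=2):
            super().__init__()
            self.text_branch = nn.Sequential(nn.Linear(text_dim, hidden), nn.ReLU())
            self.visual_branch = nn.Sequential(nn.Linear(visual_dim, hidden), nn.ReLU())
            self.head = nn.Linear(2 * hidden, n_classes)  # joint classification step

        def forward(self, text_emb, visual_emb):
            t = self.text_branch(text_emb)
            v = self.visual_branch(visual_emb)
            return self.head(torch.cat([t, v], dim=1))

    model = TwoBranchMemeClassifier()
    logits = model(torch.randn(4, 300), torch.randn(4, 512))  # dummy embeddings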
Access
Closed Access
Type of Publication
Conference Paper
Publisher
International Symposium on Signals, Circuits and Systems
15/07/2021
Deepfake Video Detection with Facial Features and Long-Short Term Memory Deep Networks
access here
Author(s)
Stanciu Cristian; Ionescu Bogdan
Institution
University Politehnica of Bucharest
Abstract
Generative models have evolved immensely in the last few years. GAN-based video and image generation has become very accessible due to open source software available to anyone, and that may pose a threat to society. Deepfakes can be used to intimidate or blackmail public figures, or to mislead the public. At the same time, with the rising popularity of deepfakes, detection algorithms have also evolved significantly. The majority of those algorithms focus on images rather than exploring the temporal evolution of the video. In this paper, we explore whether the temporal information of the video can be used to increase the performance of state-of-the-art deepfake detection algorithms. We also investigate whether certain facial regions contain more information about the authenticity of the video by using the entire aligned face as input for our model and by selecting only certain facial regions. We use late fusion to combine those results for increased performance. To validate our solution, we experiment on two state-of-the-art datasets, namely FaceForensics++ and CelebDF. The results show that using the temporal dimension can greatly enhance the performance of a deep learning model.
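A minimal sketch of the temporal approach under assumed dimensions (the 512-d per-frame features, region choices, and uniform fusion weights are illustrative, not the paper's exact configuration): per-frame facial features feed an LSTM, and late fusion averages the outputs of region-specific models.

    import torch
    import torch.nn as nn

    class TemporalDeepfakeDetector(nn.Module):
        # Per-frame feature vectors (e.g., from a CNN over an aligned face or a
        # facial region) feed an LSTM, so the temporal evolution of the video
        # contributes to the real/fake decision.
        def __init__(self, feat_dim=512, hidden=256):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 2)  # real vs. fake

        def forward(self, frame_feats):           # (batch, time, feat_dim)
            _, (h_n, _) = self.lstm(frame_feats)
            return self.head(h_n[-1])

    # Late fusion over region-specific models: average their softmax outputs.
    full_face = TemporalDeepfakeDetector()
    eyes_only = TemporalDeepfakeDetector()
    x = torch.randn(4, 30, 512)                   # 30 frames of dummy features
    probs = (full_face(x).softmax(-1) + eyes_only(x).softmax(-1)) / 2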
Access
Closed Access
Type of Publication
Conference Paper
Publisher
International Symposium on Signals, Circuits and Systems
17/04/2022
Towards Realistic Financial Time Series Generation via Generative Adversarial Learning
access here
Author(s)
Dogariu Mihai; Stefan Liviu-Daniel; Boteanu Bogdan Andrei; Lamba Claudiu; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Hana Institute of Technology, Big Data & AI Lab, South Korea
Abstract
Training network models to accurately respond to market fluctuations requires access to vast amounts of data. Data availability is strictly bound to the market's evolution, which updates only on a daily basis. In this paper, we propose several solutions based on Generative Adversarial Networks for providing artificially generated time series data with realistic properties. The main challenge here is the specificity of the target data, which has properties that are difficult to control and vary widely in time, e.g., central moment statistics, autocorrelation, or volatility clustering. Another contribution is the assessment of the quality of synthetic data in general, as there is no standard metric for this. Experimental validation is carried out on real-world financial data retrieved from the US stock market.
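For orientation, a bare-bones GAN training step on fixed-length return series might look like the sketch below; this is a generic illustration under assumed shapes and hyperparameters, not the paper's proposed solutions, which add mechanisms to control the stylized facts mentioned above.

    import torch
    import torch.nn as nn

    SEQ_LEN, NOISE_DIM = 64, 16  # illustrative sequence and noise sizes
    G = nn.Sequential(nn.Linear(NOISE_DIM, 128), nn.ReLU(), nn.Linear(128, SEQ_LEN))
    D = nn.Sequential(nn.Linear(SEQ_LEN, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))
    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    real = torch.randn(32, SEQ_LEN)               # placeholder for real daily returns

    # Discriminator step: push real toward label 1 and generated samples toward 0.
    fake = G(torch.randn(32, NOISE_DIM)).detach()
    loss_d = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator step: make the discriminator label fakes as real.
    fake = G(torch.randn(32, NOISE_DIM))
    loss_g = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()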
Access
Closed Access
Type of Publication
Conference Paper
Publisher
29th European Signal Processing Conference
21/01/2021
DeepFusion: Deep Ensembles for Domain Independent System Fusion
access here
Author(s)
Constantin Mihai Gabriel; Ștefan Liviu-Daniel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
While ensemble systems and late fusion mechanisms have proven their effectiveness by achieving state-of-the-art results in various computer vision tasks, current approaches are not exploiting the power of deep neural networks as their primary ensembling algorithm, but only as inducers, i.e., systems that are used as inputs for the primary ensembling algorithm. In this paper, we propose several deep neural network architectures as ensembling algorithms with various network configurations that use dense and attention layers, an input pre-processing algorithm, and a new type of deep neural network layer denoted the Cross-Space-Fusion layer, that further improves the overall results. Experimental validation is carried out on several data sets from various domains (emotional content classification, medical data captioning) and under various evaluation conditions (two-class regression, binary classification, and multi-label classification), proving the efficiency of DeepFusion.
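The core idea of using a deep network as the ensembling algorithm itself can be sketched as follows (a hypothetical minimal version with dense layers only; the Cross-Space-Fusion layer and attention variants are not reproduced here): the inputs are the per-sample scores of the inducer systems, and a small network learns how to combine them.

    import torch
    import torch.nn as nn

    N_INDUCERS = 10  # illustrative number of inducer systems
    meta = nn.Sequential(
        nn.Linear(N_INDUCERS, 64), nn.ReLU(),
        nn.Linear(64, 1), nn.Sigmoid(),           # fused prediction in [0, 1]
    )
    inducer_scores = torch.rand(32, N_INDUCERS)   # dummy scores from the inducers
    fused = meta(inducer_scores)                  # learned fusion, not voting/averaging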
Access
Closed Access
Type of Publication
Conference Paper
Publisher
International Conference on Multimedia Modeling
01/01/2021
The 2021 ImageCLEF Benchmark: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Ben Abacha Asma; Demner-Fushman Dina; Hasan Sadid A.; Sarrouti Mourad; Pelka Obioma; Friedrich Christoph M.; de Herrera Alba G. Seco; Jacutprakart Janadhip; Kovalev Vassili; Kozlovski Serge; Liauchuk Vitali; Cid Yashin Dicente; Chamberlain Jon; Clark Adrian; Campello Antonio; Moustahfid Hassan; Oliver Thomas; Schulz Abigail; Brie Paul; Berari Raul; Fichou Dimitri; Tauteanu Andrei; Dogariu Mihai; Stefan Liviu Daniel; Constantin Mihai Gabriel; Deshayes Jerome; Popescu Adrian
Institution
Univ Politehn Bucharest, Romania; Univ Appl Sci Western Switzerland HES SO, Sierre, Switzerland; La Rochelle Univ, La Rochelle, France; Natl Lib Med, Bethesda, MD USA; Univ Appl Sci & Arts Dortmund, Dortmund, Germany; Belarussian Acad Sci, Minsk, BELARUS; Univ Essex, Colchester, Essex, England; NOAA US IOOS, Silver Spring, MD USA; Wellcome Trust Res Labs, London, England; teleportHQ, Cluj Napoca, Romania; Univ Paris Saclay, CEA, List, F-91120 Palaiseau, France; CVS Health, Wellesley, MA USA; Univ Warwick, Coventry, W Midlands, England; United Inst Informat Problems, Minsk, BELARUS
Abstract
This paper presents the ideas for the 2021 ImageCLEF lab that will be organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2021 in Bucharest, Romania. ImageCLEF is an ongoing evaluation initiative (active since 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2021, the 19th edition of ImageCLEF will organize four main tasks: (i) a Medical task addressing visual question answering, a concept annotation and a tuberculosis classification task, (ii) a Coral task addressing the annotation and localisation of substrates in coral reef images, (iii) a DrawnUI task addressing the creation of websites from either a drawing or a screenshot by detecting the different elements present on the design and (iv) a new Aware task addressing the prediction of real-life consequences of online photo sharing. The strong participation in 2020, despite the COVID pandemic, with over 115 research groups registering and 40 submitting over 295 runs for the tasks, shows an important interest in this benchmarking campaign. We expect the new tasks to attract at least as many researchers for 2021.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Advances in Information Retrieval, ECIR 2021
01/01/2021
Overview of the ImageCLEF 2021: Multimedia Retrieval in Medical, Nature, Internet and Social Media Applications
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Ben Abacha Asma; Sarrouti Mourad; Demner-Fushman Dina; Hasan Sadid A.; Kozlovski Serge; Liauchuk Vitali; Cid Yashin Dicente; Kovalev Vassili; Pelka Obioma; de Herrera Alba Garcia Seco; Jacutprakart Janadhip; Friedrich Christoph M.; Berari Raul; Tauteanu Andrei; Fichou Dimitri; Brie Paul; Dogariu Mihai; Stefan Liviu Daniel; Constantin Mihai Gabriel; Chamberlain Jon; Campello Antonio; Clark Adrian; Oliver Thomas A.; Moustahfid Hassan; Popescu Adrian; Deshayes-Chossart Jerome
Institution
Univ Politehn Bucharest, Romania; Univ Appl Sci Western Switzerland HES SO, Delemont, Switzerland; Univ La Rochelle, La Rochelle, France; Natl Lib Med, Bethesda, MD USA; CVS Health, Wellesley, MA USA; United Inst Informat Problems, Minsk, BELARUS; Univ Warwick, Coventry, W Midlands, England; Univ Appl Sci & Arts Dortmund, Dortmund, Germany; Univ Essex, Colchester, Essex, England; TeleportHQ, Cluj Napoca, Romania; Wellcome Trust Res Labs, London, England; Pacific Isl Fisheries Sci Ctr, Silver Spring, MD USA; Univ Paris Saclay, List, CEA, Palaiseau, France
Abstract
This paper presents an overview of the ImageCLEF 2021 lab that was organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2021. ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2021, the 19th edition of ImageCLEF runs four main tasks: (i) a medical task that groups three previous tasks, i.e., caption analysis, tuberculosis prediction, and medical visual question answering and question generation, (ii) a nature coral task about segmenting and labeling collections of coral reef images, (iii) an Internet task addressing the problems of identifying hand-drawn and digital user interface components, and (iv) a new social media aware task on estimating potential real-life effects of online image sharing. Despite the current pandemic situation, the benchmark campaign received a strong participation with over 38 groups submitting more than 250 runs.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction
13/12/2021
Overview of The MediaEval 2021 Predicting Media Memorability Task
access here
Author(s)
Kiziltepe Rukiye Savran; Constantin Mihai Gabriel; Demarty Claire-Hélène; Healy Graham; Fosco Camilo; Seco de Herrera Alba G.; Halder Sebastian; Ionescu Bogdan; Matran-Fernandez Ana; Smeaton Alan F.; Sweeney Lorin
Institution
University of Essex, United Kingdom; University Politehnica of Bucharest, Romania; InterDigital, France; Dublin City University, Ireland; Massachusetts Institute of Technology, Cambridge, United States
Abstract
This paper describes the MediaEval 2021 Predicting Media Memorability task, which is in its 4th edition this year, as the prediction of short-term and long-term video memorability remains a challenging task. In 2021, two datasets of videos are used: first, a subset of the TRECVid 2019 Video-to-Text dataset; second, the Memento10K dataset, in order to provide opportunities to explore cross-dataset generalisation. In addition, an Electroencephalography (EEG)-based prediction pilot subtask is introduced. In this paper, we outline the main aspects of the task and describe the datasets, evaluation metrics, and requirements for participants' submissions.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2021 Workshop
15/12/2021
Using Vision Transformers and Memorable Moments for the Prediction of Video Memorability
access here
Author(s)
Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
This paper describes the approach taken by the AI Multimedia Lab team for the MediaEval 2021 Predicting Media Memorability task. Our approach is based on a Vision Transformer-based learning method, which is optimized by filtering the training sets for the two proposed datasets. We attempt to train the methods we propose with video segments that are more representative of the videos they are part of. We test several types of filtering architectures, and submit and test the architectures that performed best in our preliminary studies.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2021 Workshop
24/09/2021
Overview of the 2021 ImageCLEFdrawnUI Task: Detection and recognition of hand drawn and digital website UIs
access here
Author(s)
Berari Raul; Tăuteanu Andrei; Fichou Dimitri; Brie Paul; Dogariu Mihai; Ștefan Liviu Daniel; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
TeleportHQ, Romania; Politehnica University of Bucharest, Romania
Abstract
An appealing web page is a must-have for most companies nowadays. The creation of such user interfaces is a complex process involving various actors, such as project managers, designers, and developers. Facilitating this process can democratize web creation for non-experts. The second edition, ImageCLEFdrawnUI 2021, addresses this issue by fostering systems that are capable of automatically generating a web page from a sketch. Participants were challenged to develop machine learning solutions to analyze images of user interfaces and extract the position and type of their different elements, such as images, buttons, and text. The task is separated into two subtasks: the wireframe subtask with hand drawn images and the screenshot subtask with digital images. In this article, we overview the task requirements and data as well as the participants' results. For the wireframe subtask, three teams submitted 21 runs and two of the teams outperformed the baseline, with the best run scoring 0.9 compared to a baseline of 0.747 in terms of mAP@0.5 IoU. For the screenshot subtask, one team submitted 7 runs and all runs scored better than the baseline in terms of mAP@0.5 IoU, the best run obtaining 0.628 against 0.329 for the baseline.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, Working Notes of CLEF 2021

2020

01/01/2020
Low Latency Automated Epileptic Seizure Detection: Individualized vs. Global Approaches
access here
Author(s)
Ciurea Alexe; Manoila Cristina-Petruta; Tautan Alexandra-Maria; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Cris Engn Ltd, Bishops Itchington, England
Abstract
Seizures significantly reduce the quality of life for epilepsy patients. Short detection latency allows the implementation of a fast response algorithm that can be used for real-time epileptic seizure detection applications. In this paper, we propose an efficient algorithm that extracts time-domain features, using as input one-second-long EEG recordings, to ensure reduced computational complexity. The features are fed to a simple neural network for detection. We validate the model, individually and globally, on two real-world data sets, namely Upenn-Mayo Clinic and CHB-MIT. On a per patient basis, results show an accuracy, sensitivity, and specificity of up to 99.17%, 99.44%, and 98.89%, respectively, while globally we reach 92.69%, 91.99%, and 93.38%, respectively.
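A minimal sketch of this style of pipeline, with an illustrative feature set and dummy data (the paper's exact features and network are not specified here): low-cost time-domain features are computed per one-second window and fed to a simple neural network.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def time_domain_features(window):
        # Illustrative low-cost time-domain features for one 1-second,
        # multi-channel EEG window (channels x samples); not the paper's set.
        return np.concatenate([
            window.mean(axis=1), window.std(axis=1),
            np.abs(np.diff(window, axis=1)).mean(axis=1),  # line-length proxy
        ])

    rng = np.random.default_rng(0)
    windows = rng.normal(size=(500, 18, 256))  # 500 windows, 18 channels, 256 samples
    labels = rng.integers(0, 2, size=500)      # seizure / non-seizure (dummy)
    X = np.stack([time_domain_features(w) for w in windows])
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300).fit(X, labels)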
Access
Closed Access
Type of Publication
Conference Paper
Publisher
8th International Conference on E-Health and Bioengineering
01/01/2020
Freezing of Gait Detection for Parkinson's Disease Patients using Accelerometer Data: Case Study
access here
Author(s)
Tautan Alexandra-Maria; Andrei Alexandra-Georgiana; Ionescu Bogdan
Institution
National University of Science & Technology POLITEHNICA Bucharest
Abstract
Freezing of Gait (FoG) is a common symptom of Parkinson's Disease (PD), and its automatic detection would allow for an improvement of disease tracking and rehabilitation possibilities. In this study, we investigate a deep convolutional neural network for the automatic detection of FoG episodes in PD patients. The Daphnet dataset, containing three 3D accelerometer signals, was used for training and testing the proposed algorithm. Some of the benefits of this approach include: (i) the use of simple, raw data for classification; (ii) a method which is independent of the input window size. Using a 10-fold cross validation, we achieve a sensitivity and specificity of up to 93.44% and 87.38%, respectively.
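As a rough sketch of a window-size-independent convolutional detector (hypothetical layer sizes, not the paper's architecture): a small 1-D CNN consumes raw 3-axis accelerometer windows directly, and global average pooling removes the dependence on window length.

    import torch
    import torch.nn as nn

    model = nn.Sequential(
        nn.Conv1d(3, 16, kernel_size=5, padding=2), nn.ReLU(),
        nn.Conv1d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
        nn.AdaptiveAvgPool1d(1), nn.Flatten(),    # makes the net window-size independent
        nn.Linear(32, 2),                         # FoG vs. normal gait
    )
    window = torch.randn(8, 3, 192)               # any window length would work here
    logits = model(window)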
Access
Closed Access
Type of Publication
Conference Paper
Publisher
8th International Conference on E-Health and Bioengineering
02/03/2021
Deep Learning-based Person Search with Visual Attention Embedding
access here
Author(s)
Stefan Liviu-Daniel; Abdulamit Seila; Dogariu Mihai; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In this work, we consider the problem of person search, a challenging task that requires person detection and person re-identification to run concurrently. In this context, we propose a person search approach based on deep neural networks that incorporates attention mechanisms to perform retrieval more robustly. Global and local features are extracted for person detection and person identification, respectively, boosted by attention layers that allow the extraction of discriminative feature representations, all in an end-to-end manner. We evaluate our approach on three challenging data sets and show that our proposed method improves over state-of-the-art networks.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
13th International Conference on Communications
02/03/2021
Human-Object Interaction: Application to Abandoned Luggage Detection in Video Surveillance Scenarios
access here
Author(s)
Dogariu Mihai; Stefan Liviu-Daniel; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
CCTV systems bring numerous advantages to security systems, but they require notable effort from human operators to pinpoint the precise triggering moments of alarming events. This paper proposes a system that can automatically trigger alarms when it detects abandoned luggage, detect the person that left the baggage, and then track the suspicious person throughout the perimeter covered by a CCTV system. The system is based on Mask R-CNN and has been tested with several backbone configurations. We evaluate each subsystem independently on datasets specific to its task. The network model proves to be robust enough to carry out all three tasks, as demonstrated by our tests.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
13th International Conference on Communications
08/06/2020
System fusion with deep ensembles
access here
Author(s)
Ştefan Liviu-Daniel; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
Deep neural networks (DNNs) are universal estimators that have achieved state-of-the-art performance in a broad spectrum of classification tasks, opening new perspectives for many applications. One of them is addressing ensemble learning. In this paper, we introduce a set of deep learning techniques for ensemble learning with dense, attention, and convolutional neural network layers. Our approach automatically discovers patterns and correlations between the decisions of individual classifiers, therefore, alleviating the difficulty of building such architectures. To assess its robustness, we evaluate our approach on two complex data sets that target different perspectives of predicting the user perception of multimedia data, i.e., interestingness and violence. The proposed approach outperforms the existing state-of-the-art algorithms by a large margin.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
International Conference on Multimedia Retrieval
16/03/2021
Automatic Sleep Stage Detection: A Study on the Influence of Various PSG Input Signals
access here
Author(s)
Tautan Alexandra-Maria; Rossi Alessandro; de Francisco Ruben; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Onera Hlth, Eindhoven, Netherlands
Abstract
Automatic sleep stage detection can be performed using a variety of input signals from a polysomnographic (PSG) recording. In this study, we investigate the effect of different input signals on the performance of feature-based automatic sleep stage classification algorithms with both a Random Forest (RF) and a Multilayer Perceptron (MLP) classifier. Combinations of the EEG (electroencephalographic) signal with ECG (electrocardiographic), EMG (electromyographic), and respiratory signals as input are investigated with respect to using single-channel and multi-channel EEG as input. The Physionet "You Snooze, You Win" dataset is used for the study. The RF classifier consistently outperforms our MLP implementation in all cases and is positively affected by specific signal combinations. The overall classification performance using a single-channel EEG is high (an accuracy, precision, and recall of 86.91%, 89.52%, and 86.91%, respectively) using RF. The results are comparable to the performance obtained using six EEG channels as input. Adding respiratory signals to the inputs processed by RF increases the N2 stage detection performance by 20%, while adding the EMG signal improves the accuracy of REM stage detection by 5%. Our analysis shows that adding specific signals as input to RF improves the accuracy of specific sleep stages and increases the overall performance. Using a combination of EEG and respiratory signals, we achieved an accuracy of 93% for the RF classifier.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society
08/06/2020
LRRo: A Lip Reading Data Set for the Under-resourced Romanian Language
access here
Author(s)
Jitaru Andrei Cosmin; Abdulamit Seila; Ionescu Bogdan
Institution
University Politehnica of Bucharest
Abstract
Automatic lip reading is a challenging and important research topic as it makes it possible to transcribe visual-only recordings of a speaker into editable text. There are many useful applications of such technology, starting from aiding hearing impaired people to improving general automatic speech recognition. In this paper, we introduce and publicly release lip reading resources for the Romanian language. Two distinct collections are proposed: (i) the wild LRRo data, designed for an Internet-in-the-wild, ad-hoc scenario, coming with more than 35 different speakers, 1.1k word instances, a vocabulary of 21 words, and more than 20 hours of footage; (ii) the lab LRRo data, addressing a lab-controlled scenario for more accurate data, coming with 19 different speakers, 6.4k word instances, a vocabulary of 48 words, and more than 5 hours of footage. This is the first resource available for Romanian lip reading and serves as a pioneering foundation for this under-resourced language. Nevertheless, given that word-level models are not strongly language dependent, these resources will also contribute to the general lip-reading task via transfer learning. To provide a validation and reference for future developments, we propose two strong baselines via the VGG-M and Inception-V4 state-of-the-art deep network architectures.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM Multimedia Systems Conference
01/01/2020
Overview of the ImageCLEF 2020: Multimedia Retrieval in Medical, Lifelogging, Nature, and Internet Applications
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Ben Abacha Asma; Datla Vivek; Hasan Sadid A.; Demner-Fushman Dina; Kozlovski Serge; Liauchuk Vitali; Cid Yashin Dicente; Kovalev Vassili; Pelka Obioma; Friedrich Christoph M.; de Herrera Alba Garcia Seco; Ninh Van-Tu; Le Tu-Khiem; Zhou Liting; Piras Luca; Riegler Michael; Halvorsen Pal; Tran Minh-Triet; Lux Mathias; Gurrin Cathal; Dang-Nguyen Duc-Tien; Chamberlain Jon; Clark Adrian; Campello Antonio; Fichou Dimitri; Berari Raul; Brie Paul; Dogariu Mihai; Stefan Liviu Daniel; Constantin Mihai Gabriel
Institution
Univ Politehn, Bucharest, Romania; Univ Appl Sci Western Switzerland HES SO, Sierre, Switzerland; Univ La Rochelle, La Rochelle, France; Natl Lib Med, Bethesda, MD USA; Philips Res Cambridge, Cambridge, MA, USA; CVS Health, Monroeville, PA, USA; United Inst Informat Problems, Minsk, BELARUS; Univ Warwick, Coventry, W Midlands, England; Univ Appl Sci & Arts Dortmund, Dortmund, Germany; Univ Essex, Colchester, Essex, England; Dublin City Univ, Dublin, Ireland; Pluribus One, Cagliari, Italy; Univ Cagliari, Cagliari, Italy; Univ Oslo, Oslo, Norway; Univ Sci, Ho Chi Minh City, Vietnam; Klagenfurt Univ, Klagenfurt, Austria; Univ Bergen, Bergen, Norway; Wellcome Trust Res Labs, London, England; TeleportHQ, Cluj Napoca, Romania
Abstract
This paper presents an overview of the ImageCLEF 2020 lab that was organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2020. ImageCLEF is an ongoing evaluation initiative (first run in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2020, the 18th edition of ImageCLEF runs four main tasks: (i) a medical task that groups three previous tasks, i.e., caption analysis, tuberculosis prediction, and medical visual question answering and question generation, (ii) a lifelog task (videos, images and other sources) about daily activity understanding, retrieval and summarization, (iii) a coral task about segmenting and labeling collections of coral reef images, and (iv) a new Internet task addressing the problems of identifying hand-drawn user interface components. Despite the current pandemic situation, the benchmark campaign received a strong participation with over 40 groups submitting more than 295 runs.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction
01/01/2020
ImageCLEF 2020: Multimedia Retrieval in Lifelogging, Medical, Nature, and Internet Applications
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Dang-Nguyen Duc-Tien; Zhou Liting; Piras Luca; Riegler Michael; Halvorsen Pal; Tran Minh-Triet; Lux Mathias; Gurrin Cathal; Chamberlain Jon; Clark Adrian; Campello Antonio; de Herrera Alba G. Seco; Ben Abacha Asma; Datla Vivek; Hasan Sadid A.; Liu Joey; Demner-Fushman Dina; Pelka Obioma; Friedrich Christoph M.; Cid Yashin Dicente; Kozlovski Serge; Liauchuk Vitali; Kovalev Vassili; Berari Raul; Brie Paul; Fichou Dimitri; Dogariu Mihai; Stefan Liviu Daniel; Constantin Mihai Gabriel
Institution
Univ Politehn Bucharest, Romania; Univ Appl Sci Western Switzerland HES SO, Delemont, Switzerland; Univ La Rochelle, La Rochelle, France; Univ Bergen, Bergen, Norway; Dublin City Univ, Dublin, Ireland; Pluribus One, Cagliari, Italy; Univ Cagliari, Cagliari, Italy; SimulaMet, Oslo, Norway; Univ Sci, Ho Chi Minh City, Vietnam; Klagenfurt Univ, Klagenfurt, Austria; Univ Essex, Colchester, Essex, England; Wellcome Trust Res Labs, London, England; Natl Lib Med, Bethesda, MD USA; Philips Res Cambridge, Cambridge, MA USA; CVS Health, Monroeville, PA USA; Univ Appl Sci & Arts Dortmund, Dortmund, Germany; Univ Warwick, Coventry, W Midlands, England; United Inst Informat Problems, Minsk, BELARUS; teleportHQ, Cluj Napoca, Romania
Abstract
This paper presents an overview of the 2020 ImageCLEF lab that will be organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2020 in Thessaloniki, Greece. ImageCLEF is an ongoing evaluation initiative (run since 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2020, the 18th edition of ImageCLEF will organize four main tasks: (i) a Lifelog task (videos, images and other sources) about daily activity understanding, retrieval and summarization, (ii) a Medical task that groups three previous tasks (caption analysis, tuberculosis prediction, and medical visual question answering) with new data and adapted tasks, (iii) a Coral task about segmenting and labeling collections of coral images for 3D modeling, and (iv) a new Web user interface task addressing the problems of detecting and recognizing hand drawn website UIs (User Interfaces) for generating automatic code. The strong participation, with over 235 research groups registering and 63 submitting over 359 runs for the tasks in 2019, shows an important interest in this benchmarking campaign. We expect the new tasks to attract at least as many researchers for 2020.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Advances in Information Retrieval
15/12/2020
Overview of MediaEval 2020 Predicting Media Memorability Task: What Makes a Video Memorable?
access here
Author(s)
De Herrera Alba G. Seco; Kiziltepe Rukiye Savran; Chamberlain Jon; Constantin Mihai Gabriel; Demarty Claire-Hélène; Doctor Faiyaz; Ionescu Bogdan; Smeaton Alan F.
Institution
University of Essex, United Kingdom; University Politehnica of Bucharest, Romania; InterDigital, R&I, France; Dublin City University, Ireland
Abstract
This paper describes the MediaEval 2020 Predicting Media Memorability task. After first being proposed at MediaEval 2018, the Predicting Media Memorability task is in its 3rd edition this year, as the prediction of short-term and long-term video memorability (VM) remains a challenging task. In 2020, the format remained the same as in previous editions. This year the videos are a subset of the TRECVid 2019 Video-to-Text dataset, containing more action-rich video content as compared with the 2019 task. In this paper, a description of some aspects of this task is provided, including its main characteristics, a description of the collection, the ground truth dataset, evaluation metrics and the requirements for participants' run submissions.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2020
25/09/2020
Overview of the 2020 ImageCLEFdrawnUI Task: Detection and Recognition of Hand Drawn Website UIs
access here
Author(s)
Fichou Dimitri; Berari Raul; Brie Paul; Dogariu Mihai; Ștefan Liviu Daniel; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
teleportHQ, Romania; University Politehnica of Bucharest, Romania
Abstract
Nowadays, companies’ online presence and user interfaces are critical for their success. However, such user interfaces involve multiple actors. Some of them, like project managers, designers or developers, are hard to recruit and train, making the process slow and prone to errors. There is a need for new tools to facilitate the creation of user interfaces. In this context, the detection and recognition of hand drawn website UIs task was run in its first edition with ImageCLEF 2020. The task challenged participants to provide automatic solutions for annotating different user interface elements, e.g., buttons, paragraphs and checkboxes, starting from their hand drawn wireframes. Three teams submitted a total of 18 runs using different object detection techniques and all teams obtained better scores compared to the recommended baseline. The best run in terms of mAP@0.5 IoU obtained a score of 0.793 compared to the baseline score of 0.572. The leading overall precision score was 0.970, compared to the baseline score of 0.947. In this overview working notes paper, we present in detail the task and these results.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, CLEF 2020

2019

04/06/2019
Movie Genome: Alleviating New Item Cold Start in Movie Recommendation
access here
Author(s)
Deldjoo Yashar; Dacrema Maurizio Ferrari; Constantin Mihai Gabriel; Eghbal‑Zadeh Hamid; Cereda Stefano; Schedl Markus; Ionescu Bogdan; Cremonesi Paolo
Institution
Politecnico di Milano, Italy; University Politehnica of Bucharest, Romania; Johannes Kepler University Linz, Austria
Abstract
As of today, most movie recommendation services base their recommendations on collaborative filtering (CF) and/or content‑based filtering (CBF) models that use metadata (e.g., genre or cast). In most video‑on‑demand and streaming services, however, new movies and TV series are continuously added. CF models are unable to make predictions in such a scenario, since the newly added videos lack interactions, a problem technically known as new item cold start (CS). Currently, the most common approach to this problem is to switch to a purely CBF method, usually by exploiting textual metadata. This approach is known to have lower accuracy than CF because it ignores useful collaborative information and relies on human‑generated textual metadata, which are expensive to collect and often prone to errors. User‑generated content, such as tags, can also be rare or absent in CS situations. In this paper, we introduce a new movie recommender system that addresses the new item problem in the movie domain by: (i) integrating state‑of‑the‑art audio and visual descriptors, which can be automatically extracted from video content and constitute what we call the movie genome; (ii) exploiting an effective data fusion method named canonical correlation analysis, which was successfully tested in our previous works; (iii) proposing a two‑step hybrid approach which trains a CF model on warm items (items with interactions) and leverages the learned model on the movie genome to recommend cold items (items without interactions). Experimental validation is carried out using a system‑centric study on a large‑scale, real‑world movie recommendation dataset, both in an absolute cold start and in a cold-to-warm transition, and a user‑centric online experiment measuring different subjective aspects, such as satisfaction and diversity. Results show the benefits of this approach compared to existing approaches.
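The two-step hybrid in point (iii) can be illustrated with a toy sketch (dummy data; truncated SVD and ridge regression stand in for the actual CF model and genome mapping): item factors are learned on warm items, a regressor maps genome features to those factors, and cold items are scored from content alone.

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)
    n_users, n_warm, n_factors, n_feat = 100, 50, 8, 40
    R = rng.random((n_users, n_warm))             # dummy warm-item interactions

    # Step 1: truncated SVD as a stand-in for a trained CF model.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    user_f = U[:, :n_factors] * s[:n_factors]     # user factors
    item_f = Vt[:n_factors].T                     # warm-item factors

    # Step 2: map genome features -> item factors, then score cold items.
    genome_warm = rng.random((n_warm, n_feat))    # dummy content descriptors
    genome_cold = rng.random((10, n_feat))
    reg = Ridge(alpha=1.0).fit(genome_warm, item_f)
    cold_f = reg.predict(genome_cold)             # factors for items with no ratings
    cold_scores = user_f @ cold_f.T               # users x cold items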
Access
Closed Access
Type of Publication
Journal Article
Publisher
Springer, User Modeling and User‑Adapted Interaction
17/07/2019
Computational Understanding of Visual Interestingness Beyond Semantics: Literature Survey and Analysis of Covariates
access here
Author(s)
Constantin Mihai Gabriel; Redi Miriam; Zen Gloria; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; King's College London, England; University of Trento, Italy
Abstract
Understanding visual interestingness is a challenging task addressed by researchers in various disciplines ranging from humanities and psychology to, more recently, computer vision and multimedia. The rise of infographics and the visual information overload that we are facing today have given this task a crucial importance. Automatic systems are increasingly needed to help users navigate through the growing amount of visual information available, either on the web or our personal devices, for instance by selecting relevant and interesting content. Previous studies indicate that visual interest is highly related to concepts like arousal, unusualness, or complexity, where these connections are found based on psychological theories, user studies, or computational approaches. However, the link between visual interestingness and other related concepts has been only partially explored so far, for example, by considering only a limited subset of covariates at a time. In this article, we present a comprehensive survey on visual interestingness and related concepts, aiming to bring together works based on different approaches, highlighting controversies, and identifying links that have not been fully investigated yet. Finally, we present some open questions that may be addressed in future works. Our work aims to support researchers interested in visual interestingness and related subjective or abstract concepts, providing an in-depth overlook at state-of-the-art theories in humanities and methods in computational approaches, as well as providing an extended list of datasets.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Association for Computing Machinery, ACM Computing Surveys
23/11/2019
Automatic Sleep Stage Detection using a Single Channel Frontal EEG
access here
Author(s)
Tautan Alexandra-Maria; Rossi Alessandro; de Francisco Ruben; Ionescu Bogdan
Institution
University Politehnica of Bucharest; Onera Health, Netherlands
Abstract
Sleep stage detection algorithms can significantly reduce the workload of manual sleep staging and help improve sleep disorder diagnostics. In this paper, we focus on the automatic detection of sleep stages from a frontal channel EEG using expert-defined features in both the time and frequency domains, fed to a random forest classifier. The proposed approach shows that using a single frontal channel EEG signal as input to automated sleep scoring algorithms is as effective as using EEGs recorded from the central and occipital regions. Mean overall accuracy, precision, and recall were 72.98%, 79.75%, and 71.83%, respectively, when validating our method on the MGH (Massachusetts General Hospital) "You Snooze, You Win" dataset.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
E-Health and Bioengineering Conference
21/11/2019
Parkinson's Disease Detection from Gait Patterns
access here
Author(s)
Andrei Alexandra-Georgiana; Tautan Alexandra-Maria; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
Parkinson's disease (PD) patients display abnormal gait patterns with impairments and postural instability. In this paper, we propose an automatic system for extracting gait parameters. Various features were extracted from force sensors and analyzed using a threshold-based algorithm and machine learning techniques, with the objective of identifying the most significant features that would best characterize the presence of the disease. A machine learning algorithm using the support vector machine (SVM) method was developed to identify the presence of the disease. The analysis of the results shows that the machine learning algorithm achieves its best accuracy of 100% in distinguishing between the two groups when looking at features based on the stride, swing, and stance phases.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
E-Health and Bioengineering Conference
10/09/2020
Detection of Epileptic Seizures using Unsupervised Learning Techniques for Feature Extraction
access here
Author(s)
Tautan Alexandra-Maria; Dogariu Mihai; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
Automatic epileptic seizure prediction from EEG (electroencephalogram) data is a challenging problem. This is due to the complex nature of the signal itself and of the generated abnormalities. In this paper, we investigate several deep network architectures, i.e., stacked autoencoders and convolutional networks, for unsupervised EEG feature extraction. The proposed EEG features are used to solve the prediction of epileptic seizures via Support Vector Machines. This approach has many benefits: (i) it achieves a high accuracy using small-sized sample data, e.g., 1-second EEG data; (ii) features are determined in an unsupervised manner, without the need for manual selection. Experimental validation is carried out on real-world data, i.e., the CHB-MIT dataset. We achieve an overall accuracy, sensitivity, and specificity of up to 92%, 95%, and 90%, respectively.
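A minimal sketch of the unsupervised-features-plus-SVM idea, with dummy data and an illustrative bottleneck size (not the paper's exact architectures): an autoencoder is trained to reconstruct raw windows without labels, and its bottleneck codes feed a supervised SVM.

    import torch
    import torch.nn as nn
    from sklearn.svm import SVC

    enc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 16))
    dec = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 256))
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

    x = torch.randn(512, 256)                     # dummy 1-second EEG windows
    for _ in range(50):                           # unsupervised reconstruction training
        opt.zero_grad()
        loss = nn.functional.mse_loss(dec(enc(x)), x)
        loss.backward()
        opt.step()

    codes = enc(x).detach().numpy()               # learned features, no labels used
    y = (torch.rand(512) > 0.5).long().numpy()    # dummy seizure labels
    svm = SVC(kernel="rbf").fit(codes, y)         # supervised step on top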
Access
Closed Access
Type of Publication
Conference Paper
Publisher
41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society
01/01/2019
ImageCLEF 2019: Multimedia Retrieval in Medicine, Lifelogging, Security and Nature
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Cid Yashin Dicente; Liauchuk Vitali; Kovalev Vassili; Klimuk Dzmitri; Tarasau Aleh; Ben Abacha Asma; Hasan Sadid A.; Datla Vivek; Liu Joey; Demner-Fushman Dina; Dang-Nguyen Duc-Tien; Piras Luca; Riegler Michael; Tran Minh-Triet; Lux Mathias; Gurrin Cathal; Pelka Obioma; Friedrich Christoph M.; de Herrera Alba Garcia Seco; Garcia Narciso; Kavallieratou Ergina; del Blanco Carlos Roberto; Cuevas Carlos; Vasillopoulos Nikos; Karampidis Konstantinos; Chamberlain Jon; Clark Adrian; Campello Antonio
Institution
Univ Politehn Bucharest, Romania; Univ Appl Sci Western Switzerland HES SO, Sierre, Switzerland; La Rochelle Univ, La Rochelle, France; Dublin City Univ, Dublin, Ireland; Pluribus One, Cagliari, Italy; Univ Cagliari, Cagliari, Italy; Univ Oslo, Oslo, Norway; Univ Sci, Ho Chi Minh City, Vietnam; Klagenfurt Univ, Klagenfurt, Austria; Inst Informat, Minsk, BELARUS; Philips Res Cambridge, Cambridge, MA USA; Natl Lib Med, Bethesda, MD USA; Univ Appl Sci & Arts Dortmund, Dortmund, Germany; Univ Essex, Colchester, Essex, England; ETS Ingn Telecomunicac, Madrid, Spain; Univ Aegean, Mitilini, Greece; Univ Bergen, Bergen, Norway; Republican Res & Pract Ctr Pulmonol & TB, Minsk, BELARUS; Filament, London, England
Abstract
This paper presents an overview of the ImageCLEF 2019 lab, organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2019. ImageCLEF is an ongoing evaluation initiative (started in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2019, the 17th edition of ImageCLEF runs four main tasks: (i) a medical task that groups three previous tasks (caption analysis, tuberculosis prediction, and medical visual question answering) with new data, (ii) a lifelog task (videos, images and other sources) about daily activities understanding, retrieval and summarization, (iii) a new security task addressing the problems of automatically identifying forged content and retrieving hidden information, and (iv) a new coral task about segmenting and labeling collections of coral images for 3D modeling. The strong participation, with 235 research groups registering and 63 submitting over 359 runs, shows an important interest in this benchmark campaign.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction
01/01/2019
ImageCLEF 2019: Multimedia Retrieval in Lifelogging, Medical, Nature, and Security Applications
access here
Author(s)
Ionescu Bogdan; Müller Henning; Péteri Renaud; Dang-Nguyen Duc-Tien; Piras Luca; Riegler Michael; Tran Minh-Triet; Lux Mathias; Gurrin Cathal; Cid Yashin Dicente; Liauchuk Vitali; Kovalev Vassili; Ben Abacha Asma; Hasan Sadid A.; Datla Vivek; Liu Joey; Demner-Fushman Dina; Pelka Obioma; Friedrich Christoph M.; Chamberlain Jon; Clark Adrian; de Herrera Alba Garcia Seco; Garcia Narciso; Kavallieratou Ergina; del Blanco Carlos Roberto; Rodríguez Carlos Cuevas; Vasillopoulos Nikos; Karampidis Konstantinos
Institution
Univ Politehn Bucharest, Romania; Univ Appl Sci Western Switzerland HES SO, Sierre, Switzerland; Univ La Rochelle, La Rochelle, France; Univ Bergen, Bergen, Norway; Univ Cagliari, Cagliari, Italy; Univ Oslo, Oslo, Norway; Univ Sci, Ho Chi Minh City, Vietnam; Klagenfurt Univ, Klagenfurt, Austria; Inst Informat, Minsk, BELARUS; Philips Res Cambridge, Cambridge, MA USA; Natl Lib Med, Bethesda, MD USA; Univ Appl Sci & Arts, Dortmund, Germany; Univ Essex, Colchester, Essex, England; ETS Ingenieros Telecomunicac, Madrid, Spain; Univ Aegean, Mitilini, Greece; Univ Geneva, Geneva, Switzerland
Abstract
This paper presents an overview of the foreseen ImageCLEF 2019 lab that will be organized as part of the Conference and Labs of the Evaluation Forum - CLEF Labs 2019. ImageCLEF is an ongoing evaluation initiative (started in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval of visual data with the aim of providing information access to large collections of images in various usage scenarios and domains. In 2019, the 17th edition of ImageCLEF will run four main tasks: (i) a Lifelog task (videos, images and other sources) about daily activities understanding, retrieval and summarization, (ii) a Medical task that groups three previous tasks (caption analysis, tuberculosis prediction, and medical visual question answering) with newer data, (iii) a new Coral task about segmenting and labeling collections of coral images for 3D modeling, and (iv) a new Security task addressing the problems of automatically identifying forged content and retrieving hidden information. The strong participation, with over 100 research groups registering and 31 submitting results for the tasks in 2018, shows an important interest in this benchmarking campaign and we expect the new tasks to attract at least as many researchers for 2019.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Advances in Information Retrieval
01/01/2019
Overview of the Multimedia Information Processing for Personality & Social Networks Analysis Contest
access here
Author(s)
Ramírez Gabriela; Villatoro Esau; Ionescu Bogdan; Escalante Hugo Jair; Escalera Sergio; Larson Martha; Müller Henning; Guyon Isabelle
Institution
Univ Autonoma Metropolitana, Unidad Cuajimalpa UAM C, Mexico City, DF, Mexico; Univ Politeh Bucharest, Romania; Inst Nacl Astrofis Opt & Electr, Cholula, Mexico; ChaLearn, Berkeley, CA, USA; Delft Univ Technol, MIR Lab, Delft, Netherlands; Univ Appl Sci Western Switzerland HES SO, Sierre, Switzerland; Univ Barcelona, Comp Vis Ctr UAB, Barcelona, Spain; Univ Paris Saclay, Paris, France
Abstract
Progress in the autonomous analysis of human behavior from multimodal information has led to very effective methods able to deal with problems like action/gesture/activity recognition, pose estimation, opinion mining, user-tailored retrieval, etc. However, it is only recently that the community has been starting to look into related problems associated with more complex behavior, including personality analysis and deception detection. We organized an academic contest co-located with ICPR 2018, running two tasks in this direction. One task addressed information fusion in the context of multimodal image retrieval in social media. The other focused on inferring personality traits from written essays, including textual and handwritten information. This paper describes both tasks in detail, outlining the associated problems, data sets, evaluation metrics, and protocols, as well as providing an analysis of the performance of simple baselines.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Pattern Recognition and Information Forensics
30/10/2019
The predicting media memorability task at MediaEval 2019
access here
Author(s)
Constantin Mihai Gabriel; Ionescu Bogdan; Demarty Claire-Hélène; Duong Ngoc Q.K.; Alameda-Pineda Xavier; Sjöberg Mats
Institution
University Politehnica of Bucharest, Romania; InterDigital, France; INRIA, France; CSC, Finland
Abstract
In this paper, we present the Predicting Media Memorability task, which is running for the second year at the MediaEval 2019 Benchmarking Initiative for Multimedia Evaluation. Participants are required to create systems that are able to automatically predict the memorability scores of a collection of videos, which should represent the “short-term” and “long-term” memorability of the samples. We will describe all the aspects of this task, including its main characteristics, a description of the development and test data sets, the ground truth, the evaluation metrics and the required runs.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2019
01/09/2019
Using aesthetics and action recognition-based networks for the prediction of media memorability
access here
Author(s)
Constantin Mihai Gabriel; Kang Chen; Dinu Gabriela; Dufaux Frédéric; Valenzise Giuseppe; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Université Paris-Saclay, France
Abstract
In this working note paper we present the contribution and results of the participation of the UPB-L2S team to the MediaEval 2019 Predicting Media Memorability Task. The task requires participants to develop machine learning systems able to automatically predict whether a video will be memorable for the viewer, and for how long (e.g., hours or days). To solve the task, we investigated several aesthetics and action recognition-based deep neural networks, either by fine-tuning models or by using them as pre-trained feature extractors. Results from different systems were aggregated in various fusion schemes. Experimental results are positive, showing the potential of transfer learning for this task.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2019
01/09/2019
Multimedia Lab @ ImageClef 2019 lifelog moment retrieval task
access here
Author(s)
Dogariu Mihai; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
This paper presents the participation of the Multimedia Lab to the 2019 ImageCLEF Lifelog Moment Retrieval task. Given 10 topics in natural language description, participants are expected to retrieve 50 images for each topic that best correspond to its description. Our method uses the data provided by the organizers, without adding any further annotations. We first remove severely blurred images. Then, according to a list of constraints concerning the images' metadata, we remove uninformative images. Finally, we compute a relevance score based on the detection scores provided by the organizers and select the 50 highest ranked images for submission as these should best match the search query.
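The blur-removal step could be implemented along the lines of the following sketch; the variance-of-the-Laplacian criterion and the threshold value are assumptions for illustration, since the abstract does not specify the blur detector used.

    import cv2

    BLUR_THRESHOLD = 100.0                        # illustrative threshold, not the paper's

    def is_sharp(path, threshold=BLUR_THRESHOLD):
        # Variance of the Laplacian is a common sharpness proxy: blurred images
        # have few edges, hence a low variance of the second derivative.
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        if img is None:
            return False
        return cv2.Laplacian(img, cv2.CV_64F).var() >= threshold

    kept = [p for p in ["a.jpg", "b.jpg"] if is_sharp(p)]  # dummy file names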
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, CLEF 2019

2018

01/03/2019
Audio-visual Encoding of Multimedia Content for Enhancing Movie Recommendations
access here
Author(s)
Deldjoo Yashar; Constantin Mihai Gabriel; Eghbal-Zadeh Hamid; Ionescu Bogdan; Schedl Markus; Cremonesi Paolo
Institution
Polytechnic University of Milan, Italy; University Politehnica of Bucharest, Romania; Johannes Kepler Univ Linz, Austria
Abstract
We propose a multi-modal content-based movie recommender system that replaces human-generated metadata with content descriptions automatically extracted from the visual and audio channels of a video. Content descriptors improve over traditional metadata in terms of both richness (it is possible to extract hundreds of meaningful features covering various modalities) and quality (content features are consistent across different systems and immune to human errors). Our recommender system integrates state-of-the-art aesthetic and deep visual features as well as block-level and i-vector audio features. For fusing the different modalities, we propose a rank aggregation strategy extending the Borda count approach. We evaluate the proposed multi-modal recommender system comprehensively against metadata-based baselines. To this end, we conduct two empirical studies: (i) a system-centric study to measure the offline quality of recommendations in terms of accuracy-related and beyond-accuracy performance measures (novelty, diversity, and coverage), and (ii) a user-centric online experiment, measuring different subjective metrics, including relevance, satisfaction, and diversity. In both studies, we use a dataset of more than 4,000 movie trailers, which makes our approach versatile. Our results shed light on the accuracy and beyond-accuracy performance of audio, visual, and textual features in content-based movie recommender systems.
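The rank aggregation extension itself is not detailed in the abstract, but the plain Borda count it builds on works as in this short sketch (hypothetical item IDs): each modality contributes points inversely proportional to an item's rank, and the totals decide the fused order.

    def borda_fuse(rankings):
        # One ranked list per modality; an item at position pos in a list of
        # length n earns (n - pos) points, and totals give the fused ranking.
        scores = {}
        for ranking in rankings:
            n = len(ranking)
            for pos, item in enumerate(ranking):
                scores[item] = scores.get(item, 0) + (n - pos)
        return sorted(scores, key=scores.get, reverse=True)

    visual = ["m1", "m3", "m2"]                   # dummy per-modality rankings
    audio = ["m3", "m1", "m2"]
    print(borda_fuse([visual, audio]))            # ['m1', 'm3', 'm2'] (m1 and m3 tie)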
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM, 12th ACM Conference on Recommender Systems (RecSys)
22/11/2018
Little-Big Deep Neural Networks for Embedded Video Surveillance
access here
Author(s)
Mitrea Catalin Alexandru; Constantin Mihai-Gabriel; Stefan Liviu-Daniel; Ghenescu Marian; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania; Softrust Vision Analytics, Romania
Abstract
Embedded systems are continuously shaped by innovative technological trends, such as the adoption of smart devices capable of running complex video analytics tasks locally. Lately, deep neural networks have been successfully applied in the field of computer vision, achieving state-of-the-art results. These techniques are not yet suitable for resource-limited deployments due to their high memory footprint and computational cost, factors that affect the inference time. To tackle these constraints, we propose a person re-identification architecture based on the DarkNet deep neural network architecture for person detection and segmentation, combined with the SIFT algorithm for feature extraction and an SVM for the classification task. The algorithm is implemented on a low-processing-power embedded hardware system, namely a Raspberry Pi. The experiments were conducted on the proposed SPOTTER dataset. The results encourage further investigation of applying specialized algorithms for real-time applications that can run on resource-limited embedded systems.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
12th IEEE International Conference on Communications (COMM)
22/01/2019
SubDiv17: A Dataset for Investigating Subjectivity in the Visual Diversification of Image Search Results
access here
Author(s)
Rohm Maia; Ionescu Bogdan; Gînșca Alexandru Lucian; Santos Rodrygo L. T.; Müller Henning
Institution
TU Wien, Vienna, Austria; University Politehnica of Bucharest, Romania; CEA LIST, Palaiseau, France; Universidade Federal de Minas Gerais, Belo Horizonte, Brazil; University of Applied Sciences & Arts Western Switzerland, Switzerland
Abstract
In this paper, we present a new dataset that facilitates the comparison of approaches aiming at the diversification of image search results. The dataset was explicitly designed for general-purpose, multi-topic queries and provides multiple ground truth annotations to allow for the exploration of the subjectivity aspect in the general task of diversification. The dataset provides images and their metadata retrieved from Flickr for around 200 complex queries. Additionally, to encourage experimentation (and cooperation) across different communities, such as information and multimedia retrieval, a broad range of pre-computed descriptors is provided. The proposed dataset was successfully validated during the MediaEval 2017 Retrieving Diverse Social Images task using 29 submitted runs.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM, 9th ACM Multimedia Systems Conference (MMSys)
22/01/2019
MMTF-14K: A Multifaceted Movie Trailer Feature Dataset for Recommendation and Retrieval
access here
Author(s)
Deldjoo Yashar; Constantin Mihai Gabriel; Ionescu Bogdan; Schedl Markus; Cremonesi Paolo
Institution
Polytechnic University of Milan, Italy; University Politehnica of Bucharest, Romania; Johannes Kepler University Linz, Austria
Abstract
In this paper we propose a new dataset, i.e., the MMTF-14K multi-faceted dataset. It is primarily designed for the evaluation of video-based recommender systems, but it also supports the exploration of other multimedia tasks such as popularity prediction, genre classification and auto-tagging (aka tag prediction). The data consists of 13,623 Hollywood-type movie trailers, rated by 138,492 users, generating a total of almost 12.5 million ratings. To address a broader community, metadata, audio and visual descriptors are also pre-computed and provided along with several baseline benchmarking results for uni-modal and multi-modal recommendation systems. This creates a rich collection of data for benchmarking and supports future development of the field.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM, 9th ACM Multimedia Systems Conference (MMSys)
01/01/2018
Overview of ImageCLEF 2018: Challenges, Datasets and Evaluation
access here
Author(s)
Ionescu Bogdan; Müller Henning; Villegas Mauricio; de Herrera Alba Garcia Seco; Eickhoff Carsten; Andrearczyk Vincent; Cid Yashin Dicente; Liauchuk Vitali; Kovalev Vassili; Hasan Sadid A.; Ling Yuan; Farri Oladimeji; Liu Joey; Lungren Matthew; Dang-Nguyen Duc-Tien; Piras Luca; Riegler Michael; Zhou Liting; Lux Mathias; Gurrin Cathal
Institution
University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland (HES-SO), Sierre, Switzerland; omni:us, Berlin, Germany; University of Essex, Colchester, England; Brown University, Providence, RI, USA; United Institute of Informatics Problems, Minsk, Belarus; Artificial Intelligence Lab, Philips Research North America, Cambridge, MA, USA; Stanford University, Department of Radiology, Stanford, CA, USA; Dublin City University, Dublin, Ireland; University of Cagliari & Pluribus One, Cagliari, Italy; University of Oslo, Oslo, Norway; Simula Metropolitan Center for Digital Engineering, Oslo, Norway; Klagenfurt University, Klagenfurt, Austria
Abstract
This paper presents an overview of the ImageCLEF 2018 evaluation campaign, an event that was organized as part of the CLEF (Conference and Labs of the Evaluation Forum) Labs 2018. ImageCLEF is an ongoing initiative (it started in 2003) that promotes the evaluation of technologies for annotation, indexing and retrieval with the aim of providing information access to collections of images in various usage scenarios and domains. In 2018, the 16th edition of ImageCLEF ran three main tasks and a pilot task: (1) a caption prediction task that aims at predicting the caption of a figure from the biomedical literature based only on the figure image; (2) a tuberculosis task that aims at detecting the tuberculosis type, severity and drug resistance from CT volumes of the lung; (3) a LifeLog task (videos, images and other sources) about daily activities understanding and moment retrieval; and (4) a pilot task on visual question answering where systems are tasked with answering medical questions. The strong participation, with over 100 research groups registering and 31 submitting results for the tasks, shows an increasing interest in this benchmarking campaign.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Springer, Lecture Notes in Computer Science, Experimental IR Meets Multilinguality, Multimodality, and Interaction
01/09/2018
MediaEval 2018: Predicting media memorability
access here
Author(s)
Cohendet Romain; Demarty Claire-Hélène; Duong Ngoc Q.K.; Sjöberg Mats; Ionescu Bogdan; Do Thanh-Toan
Institution
Technicolor, Rennes, France; Aalto University, Finland; University Politehnica of Bucharest, Romania; University of Adelaide, Australia
Abstract
In this paper, we present the Predicting Media Memorability task, which is proposed as part of the MediaEval 2018 Benchmarking Initiative for Multimedia Evaluation. Participants are expected to design systems that automatically predict memorability scores for videos, which reflect the probability of a video being remembered. In contrast to previous work in image memorability prediction, where memorability was measured a few minutes after memorization, the proposed dataset comes with "short-term" and "long-term" memorability annotations. All task characteristics are described, namely: the task's challenges and expected breakthroughs, the released dataset and ground truth, the required runs, and the evaluation metrics.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2018
01/09/2018
The MediaEval 2018 movie recommendation task: Recommending movies using content
access here
Author(s)
Deldjoo Yashar; Constantin Mihai Gabriel; Dritsas Athanasios; Ionescu Bogdan; Schedl Markus
Institution
Politecnico di Milano, Italy; University Politehnica of Bucharest, Romania; Delft University of Technology, Netherlands; Johannes Kepler University Linz, Austria
Abstract
In this paper we introduce the MediaEval 2018 task Recommending Movies Using Content. It focuses on predicting overall scores that users give to movies, i.e., average rating (representing overall appreciation of the movies by the viewers) and the rating variance/standard deviation (representing agreement/disagreement between users) using audio, visual and textual features derived from selected movie scenes. We release a dataset of movie clips consisting of 7K clips for 800 unique movies. In the paper, we present the challenge, the dataset and ground truth creation, the evaluation protocol and the requested runs.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2018
01/09/2018
Multimedia Lab @ ImageCLEF 2018 lifelog moment retrieval task
access here
Author(s)
Dogariu Mihai; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
This paper describes the participation of the Multimedia Lab team at the ImageCLEF 2018 Lifelog Moment Retrieval Task. Our method makes use of visual information, text information and metadata. Our approach consists of the following steps: we reduce the number of images to analyze by eliminating those that are blurry or do not meet certain metadata criteria, extract relevant concepts with several Convolutional Neural Networks, perform K-means clustering on Histograms of Oriented Gradients and color histogram features, and re-rank the remaining images according to a relevance score computed between each image concept and the queried topic.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, CLEF 2018

2017

01/11/2017
Efficient human action recognition using histograms of motion gradients and VLAD with descriptor shape information
access here
Author(s)
Duta Ionut; Uijlings Jasper; Ionescu Bogdan; Aizawa Kiyoharu; Hauptmann Alexander; Sebe Nicu
Institution
University of Trento, Italy; University of Edinburgh, Scotland; University Politehnica of Bucharest, Romania; University of Tokyo, Japan; Carnegie Mellon University, Pittsburgh, PA, USA
Abstract
Feature extraction and encoding represent two of the most crucial steps in an action recognition system. For building a powerful action recognition pipeline it is important that both steps are efficient and at the same time provide reliable performance. This work proposes a new approach for feature extraction and encoding that allows us to obtain real-time frame rate processing for an action recognition system. The motion information represents an important source of information within the video. The common approach to extract the motion information is to compute the optical flow. However, the estimation of optical flow is very demanding in terms of computational cost, in many cases being the most significant processing step within the overall pipeline of the target video analysis application. In this work we propose an efficient approach to capture the motion information within the video. Our proposed descriptor, Histograms of Motion Gradients (HMG), is based on a simple temporal and spatial derivation, which captures the changes between two consecutive frames. For the encoding step a widely adopted method is the Vector of Locally Aggregated Descriptors (VLAD), which is an efficient encoding method; however, it considers only the difference between local descriptors and their centroids. In this work we propose Shape Difference VLAD (SD-VLAD), an encoding method which brings complementary information by using the shape information within the encoding process. We validated our proposed pipeline for action recognition on three challenging datasets, UCF50, UCF101 and HMDB51, and we also propose a real-time framework for action recognition.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Springer, Multimedia Tools and Applications
03/08/2017
People Instance Retrieval from Highly Challenging Video Surveillance Real-World Footage
access here
Author(s)
Mitrea Catalin Alexandru; Mironica Ionut; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
This paper addresses the task of automated multiple-instance people retrieval from video surveillance footage. Such "real-world" datasets raise particular issues in terms of low image quality, multiple image perspectives, variable lighting conditions and, most distinctly, the lack of training samples. A proposed classification-based method is adapted for experiments on two public datasets. Also, comprehensive pairs of state-of-the-art descriptors and classifiers are explored and evaluated in terms of average F2 score. Results show promising performance even when the training frames are reduced to a single instance.
Access
Closed Access
Type of Publication
Journal Article
Publisher
University Politehnica of Bucharest, Scientific Bulletin Series C-Electrical Engineering and Computer Science
31/05/2017
Pseudo-relevance feedback diversification of social image retrieval results
access here
Author(s)
Boteanu Bogdan; Mironica Ionut; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In this paper we introduce a novel pseudo-relevance feedback (RF) perspective on social image search results diversification. Traditional RF techniques introduce the user into the processing loop by harvesting feedback about the relevance of the query results. This information is used for recomputing a better representation of the needed data. The novelty of our work is in exploiting the automatic generation of user feedback in a completely unsupervised diversification scenario, where positive and negative examples are used to generate better representations of visual classes in the data. First, user feedback is simulated automatically by selecting positive and negative examples from the initial query results. Then, an unsupervised hierarchical clustering is used to regroup images according to their content. Diversification is finally achieved by re-ranking the previously obtained clusters. Experimental validation on real-world data from Flickr shows the benefits of this approach, achieving very promising results.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Springer, Multimedia Tools and Applications
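As background for the entry above, a toy sketch of pseudo-relevance feedback diversification: hierarchically cluster the top-ranked results, then interleave clusters during re-ranking so consecutive results come from different visual groups. The Ward linkage, cluster count and round-robin interleaving are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

def diversify(features, n_clusters=10):
    """features: (n_images, d) array ordered by initial retrieval rank."""
    labels = fcluster(linkage(features, method="ward"),
                      t=n_clusters, criterion="maxclust")
    buckets = [list(np.flatnonzero(labels == c)) for c in range(1, n_clusters + 1)]
    order = []
    while any(buckets):                    # round-robin over the clusters
        for bucket in buckets:
            if bucket:
                order.append(bucket.pop(0))  # best-ranked image left in cluster
    return order                           # re-ranked indices into the input
```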
09/02/2017
The Benchmarking Initiative for Multimedia Evaluation: MediaEval 2016
access here
Author(s)
Larson Martha; Soleymani Mohammad; Gravier Guillaume; Ionescu Bogdan; Jones Gareth J.F.
Institution
Radboud University Nijmegen, Netherlands; University of Geneva, Switzerland; IRISA, France; University Politehnica of Bucharest, Romania; Dublin City University, Ireland
Abstract
The Benchmarking Initiative for Multimedia Evaluation (MediaEval) organizes an annual cycle of scientific evaluation tasks in the area of multimedia access and retrieval. The tasks offer scientific challenges to researchers working in diverse areas of multimedia technology. The tasks, which are focused on the social and human aspects of multimedia, help the research community tackle challenges linked to less widely studied user needs. They also support researchers in investigating the diversity of perspectives that naturally arise when users interact with multimedia content. Here, the authors present highlights from the 2016 workshop.
Access
Open Access
Type of Publication
Editorial
Publisher
IEEE MultiMedia
09/01/2018
Diversity and Credibility for Social Images and Image Retrieval
access here
Author(s)
Ionescu Bogdan; Lupu Mihai; Röhm Maia; Gînscă Alexandru Lucian; Müller Henning
Institution
University Politehnica of Bucharest, Romania; Vienna University of Technology, Austria; CEA LIST, France; University of Applied Sciences Western Switzerland, Switzerland
Abstract
The Retrieving Diverse Social Image task datasets, as their name indicates, address the problem of retrieving images taking into account both the need to diversify the results presented to the user, as well as the potential lack of credibility of the users in their tagging behavior. They are based on already state-of-the-art retrieval technology (i.e., the Flickr retrieval system), which makes it possible to focus on the challenge of image diversification. Moreover, the datasets are not limited to images, but also include rich social information. The credibility component, represented by the credibility subsets of the last three collections, is unique to this set of benchmark datasets.
Access
Open Access
Type of Publication
Editorial
Publisher
ACM SIGMultimedia Records
01/01/2017
Simple, Efficient and Effective Encodings of Local Deep Features for Video Action Recognition
access here
Author(s)
Duta Ionut Cosmin; Ionescu Bogdan; Aizawa Kiyoharu; Sebe Nicu
Institution
University of Trento, Italy; University Politehnica of Bucharest, Romania; University of Tokyo, Japan
Abstract
For an action recognition system, a decisive component is the feature encoding part, which builds the final representation that serves as input to a classifier. One shortcoming of existing encoding approaches is that they are built around hand-crafted features and are not highly competitive at encoding current deep features, which are necessary in many practical scenarios. In this work we propose two solutions specifically designed for encoding local deep features, taking advantage of the nature of deep networks and focusing on capturing the highest feature response of the convolutional maps. The proposed approaches for deep feature encoding provide a solution to encapsulate the features extracted with a convolutional neural network over the entire video. In terms of accuracy our encodings outperform by a large margin the currently most widely used and powerful encoding approaches, while remaining computationally very efficient. Evaluated in the context of action recognition tasks, our pipeline obtains state-of-the-art results on three challenging datasets: HMDB51, UCF50 and UCF101.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM, International Conference on Multimedia Retrieval (ICMR)
09/01/2018
Spatio-Temporal Vector of Locally Max Pooled Features for Action Recognition in Videos
access here
Author(s)
Duta Ionut Cosmin; Ionescu Bogdan; Aizawa Kiyoharu; Sebe Nicu
Institution
University of Trento, Italy; University Politehnica of Bucharest, Romania; University of Tokyo, Japan
Abstract
We introduce Spatio-Temporal Vector of Locally Max Pooled Features (ST-VLMPF), a super vector-based encoding method specifically designed for local deep features encoding. The proposed method addresses an important problem of video understanding: how to build a video representation that incorporates the CNN features over the entire video. Feature assignment is carried out at two levels, by using the similarity and spatio-temporal information. For each assignment we build a specific encoding, focused on the nature of deep features, with the goal of capturing the highest feature responses from the strongest neuron activations of the network. Our ST-VLMPF clearly provides a more reliable video representation than some of the most widely used and powerful encoding approaches (Improved Fisher Vectors and Vector of Locally Aggregated Descriptors), while maintaining a low computational complexity. We conduct experiments on three action recognition datasets: HMDB51, UCF50 and UCF101. Our pipeline obtains state-of-the-art results.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE, 30th Conference on Computer Vision and Pattern Recognition (CVPR)
03/01/2018
Spatio-Temporal VLAD Encoding for Human Action Recognition in Videos
access here
Author(s)
Duta Ionut Cosmin; Ionescu Bogdan; Aizawa Kiyoharu; Sebe Nicu
Institution
University of Trento, Italy; University Politehnica of Bucharest, Romania; University of Tokyo, Japan
Abstract
Encoding is one of the key factors for building an effective video representation. In recent work, super-vector-based encoding approaches are highlighted as one of the most powerful representation generators. Vector of Locally Aggregated Descriptors (VLAD) is one of the most widely used super-vector methods. However, one of the limitations of VLAD encoding is the lack of spatial information captured from the data. This is critical, especially when dealing with video information. In this work, we propose Spatio-temporal VLAD (ST-VLAD), an extended encoding method which incorporates spatio-temporal information within the encoding process. This is carried out by proposing a video division and extracting specific information over the feature group of each video split. Experimental validation is performed using both hand-crafted and deep features. Our pipeline for action recognition with the proposed encoding method obtains state-of-the-art performance over three challenging datasets: HMDB51 (67.6%), UCF50 (97.8%) and UCF101 (91.5%).
Access
Closed Access
Type of Publication
Conference Paper
Publisher
Springer, 23rd International Conference on MultiMedia Modeling
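For reference, a compact sketch of the standard VLAD encoding that ST-VLAD extends (the per-split spatio-temporal bookkeeping of ST-VLAD is omitted); the power and L2 normalization shown here is common practice rather than a detail taken from the paper.

```python
import numpy as np

def vlad_encode(descriptors, centroids):
    """Standard VLAD: accumulate residuals of local descriptors to their
    nearest codebook centroid, then power- and L2-normalize.

    descriptors: (n, d) local features; centroids: (k, d) visual words."""
    k, d = centroids.shape
    # hard-assign each descriptor to its nearest centroid
    dists = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = dists.argmin(axis=1)
    v = np.zeros((k, d))
    for i in range(k):
        if np.any(assign == i):
            v[i] = (descriptors[assign == i] - centroids[i]).sum(axis=0)
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))       # power normalization
    return v / (np.linalg.norm(v) + 1e-12)    # L2 normalization
```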
02/03/2018
End to End Very Deep Person Re-identification
access here
Author(s)
Stefan Liviu-Daniel; Mironica Ionut; Mitrea Catalin Alexandru; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
Convolutional Neural Networks (CNNs) are responsible for major breakthroughs in object recognition in still images. This work presents an end-to-end architecture with small convolutional kernel sizes, small convolutional strides and a very deep network structure for person re-identification in video streams. To build such a system, several good training practices were tested, namely: (i) training from scratch, (ii) pre-training the last layer, (iii) small learning rates, (iv) data augmentation techniques, and (v) a high dropout ratio. The key contribution of this paper is a trainable, end-to-end deep network approach that allows for effective real-time re-identification of people in multi-stream video from various sources (indoor and outdoor). Experimental evaluation was conducted on a real-world, publicly available dataset, showing the benefits of this approach.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE, International Symposium on Signals, Circuits and Systems (ISSCS)
02/03/2018
Content Description for Predicting Image Interestingness
access here
Author(s)
Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In this article we analyze the prediction of image interestingness, a domain that is gaining importance in fields such as recommendation systems, social media and advertising. We investigate the contribution of early and late fusion techniques using a set of image descriptors, and analyze the combinations that best predict interestingness. Experimental validation is carried out on the MediaEval 2016 Predicting Media Interestingness image dataset. Results show the benefit of introducing late fusion approaches to solve the task, achieving better results than the state of the art.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE, International Symposium on Signals, Circuits and Systems (ISSCS)
25/10/2018
Predicting Interestingness of Visual Content
access here
Author(s)
Demarty Claire-Helene; Sjöberg Mats; Constantin Mihai Gabriel; Duong Ngoc Q. K.; Ionescu Bogdan; Do Thanh-Toan; Wang Hanli
Institution
Technicolor R&I, Rennes, France; University of Helsinki, Department of Computer Science, Helsinki Institute for Information Technology HIIT, Helsinki, Finland; University Politehnica of Bucharest, Romania; Singapore University of Technology and Design, Singapore; University of Science, Ho Chi Minh City, Vietnam; Tongji University, Department of Computer Science and Technology, Shanghai, China
Abstract
The ability of multimedia data to attract and keep people's interest for longer periods of time is gaining more and more importance in the fields of information retrieval and recommendation, especially in the context of the ever-growing market value of social media and advertising. In this chapter we introduce a benchmarking framework (dataset and evaluation tools) designed specifically for assessing the performance of media interestingness prediction techniques. We release a dataset which consists of excerpts from 78 movie trailers of Hollywood-like movies. These data are annotated by human assessors according to their degree of interestingness. A real-world use scenario is targeted, namely interestingness is defined in the context of selecting visual content for illustrating a Video on Demand (VOD) website. We provide an in-depth analysis of the human aspects of this task, i.e., the correlation between perceptual characteristics of the content and the actual data, as well as of the machine aspects, by overviewing the participating systems of the 2016 MediaEval Predicting Media Interestingness campaign. After discussing state-of-the-art achievements, valuable insights, current capabilities and future challenges are presented.
Access
Closed Access
Type of Publication
Book Chapter
Publisher
Multimedia Systems and Applications, Visual Content Indexing and Retrieval with Psycho-Visual Models
31/12/2017
Retrieving diverse social images at MediaEval 2017: Challenges, dataset and evaluation
access here
Author(s)
Zaharieva Maia; Ionescu Bogdan; Ginsca Alexandru Lucian; Santos Rodrygo L.T.; Müller Henning
Institution
TU Wien, Austria; University Politehnica of Bucharest, Romania; CEA LIST, France; Universidade Federal de Minas Gerais, Brazil; University of Applied Sciences Western Switzerland, Switzerland
Abstract
This paper provides an overview of the Retrieving Diverse Social Images task that is organized as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation. The task addresses the challenge of visual diversification of image retrieval results, where images, metadata, user tagging profiles, and content and text models are available for processing. We present the task challenges, the employed dataset and ground truth information, the required runs, and the considered evaluation metrics.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2017
01/09/2017
MediaEval 2017 Predicting Media Interestingness Task
access here
Author(s)
Demarty Claire-Helene; Sjöberg Mats; Ionescu Bogdan; Do Thanh-Toan; Gygli Michael; Duong Ngoc Q. K.
Institution
Technicolor, Rennes, France; Department of Computer Science, Helsinki Institute for Information Technology HIIT, University of Helsinki, Finland; LAPI, University Politehnica of Bucharest, Romania; University of Adelaide, Australia; ETH Zurich, Switzerland
Abstract
In this paper, the Predicting Media Interestingness task which is running for the second year as part of the MediaEval 2017 Benchmarking Initiative for Multimedia Evaluation, is presented. For the task, participants are expected to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. All task characteristics are described, namely the task use case and challenges, the released data set and ground truth, the required participant runs and the evaluation metrics.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2017
01/09/2017
LAPI@2017 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective
access here
Author(s)
Boteanu Bogdan; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In this paper we present the results achieved during the 2017 MediaEval Retrieving Diverse Social Images Task, using an approach based on pseudo-relevance feedback (RF), in which human feedback is replaced by an automatic selection of images. The proposed approach is designed to prioritize the diversification of the results, in contrast to most existing techniques that address only relevance. Diversification is achieved by exploiting a hierarchical clustering (HC) scheme followed by a diversification strategy. Methods are tested on the benchmarking data and the results are analyzed. Insights for future work conclude the paper.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2017
01/09/2017
LAPI at MediaEval 2017 - Predicting Media Interestingness
access here
Author(s)
Constantin Mihai Gabriel; Boteanu Bogdan; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In this paper we present our contribution, approach and results for the MediaEval 2017 Predicting Media Interestingness task. We studied several visual descriptors and created early and late fusion approaches for our machine learning system, optimized for the best results in this benchmarking competition.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2017
01/09/2017
A textual filtering of HOG-based hierarchical clustering of lifelog data
access here
Author(s)
Dogariu Mihai; Ionescu Bogdan
Institution
Multimedia Lab at CAMPUS, University Politehnica of Bucharest, Romania
Abstract
In this paper we address the issue of lifelog information retrieval and introduce an approach that filters the output of a hierarchical clustering of the data by assessing word similarities. Word similarity is computed using the WordNet and Retina ontologies. We tested our method during the 2017 ImageCLEF Lifelog challenge, in the Summarization subtask. We discuss the performance, limitations and future improvements of our method.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, ImageCLEF 2017
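A toy illustration of the word-similarity filtering described in the entry above, using only the WordNet side (the Retina ontology is omitted); the max-over-synsets scoring rule is an assumption, and the snippet needs the NLTK WordNet corpus installed.

```python
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

def topic_relevance(concepts, topic_word):
    """Score a set of detected image concepts against a queried topic
    by the best WordNet path similarity over all synset pairs."""
    topic_synsets = wn.synsets(topic_word)
    best = 0.0
    for concept in concepts:
        for s1 in wn.synsets(concept):
            for s2 in topic_synsets:
                best = max(best, s1.path_similarity(s2) or 0.0)
    return best

print(topic_relevance(["beach", "sand"], "coast"))  # higher = more relevant
```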
01/09/2017
Generating captions for medical images with a deep learning multi-hypothesis approach: MedGIFT-UPB participation in the ImageCLEF 2017 caption task
access here
Author(s)
Ştefan Liviu-Daniel; Ionescu Bogdan; Müller Henning
Institution
University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland, Switzerland; University of Geneva, Switzerland
Abstract
In this report, we summarize our solution to the ImageCLEF 2017 caption detection task. ImageCLEF's concept detection task provides a testbed for figure-caption prediction systems using medical concepts, extracted from the Unified Medical Language System (UMLS), as sentence-level descriptions for images. The goal of the task is to efficiently identify the relevant medical concepts in medical images as a predictor of figure captions. For representing the images we used a very deep Convolutional Neural Network, namely ResNet-152 pre-trained on ImageNet, together with a binary annotation of the concepts. In the concept detection subtask, the MedGIFT-UPB group took 3rd place out of 9 groups. The proposed approach obtained the 12th position according to the F1 score (0.89) out of 20 participant runs (runs without external resources). The paper presents the procedure employed and provides an analysis of the evaluation results, which show the difficulty of the task when no external resources are used.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, ImageCLEF 2017
01/09/2017
Finding and classifying tuberculosis types for a targeted treatment: MedGIFT-UPB participation in the ImageCLEF 2017 tuberculosis task
access here
Author(s)
Ştefan Liviu-Daniel; Cid Yashin Dicente; Jimenez-Del-Toro Oscar; Ionescu Bogdan; Müller Henning
Institution
University Politehnica of Bucharest, Romania; University of Applied Sciences Western Switzerland, Sierre, Switzerland; University of Geneva, Switzerland
Abstract
This paper describes the participation of the MedGIFT/UPB group in the ImageCLEF 2017 tuberculosis task. This task includes two subtasks: (1) multi-drug resistance detection (MDR), with the goal of determining the probability of a tuberculosis patient having a resistant form of tuberculosis and (2) tuberculosis type detection (TBT), with the goal of classifying each tuberculosis patient into one of the following five types: infiltrative, focal, tuberculoma, miliary and fibro-cavernous. Two runs were submitted for the TBT subtask and one run for the MDR subtask. Both of them use visual features learned with a deep learning approach directly from slices of patient CT (Computed Tomography) scans. For the TBT subtask the submitted runs obtained the 3rd and 8th position out of 23 runs submitted for this task, with a top Kappa value of 0.2329. In the MDR subtask, the proposed approach obtained the 7th position according to the accuracy (0.5352) out of 20 participant runs. Three main techniques were exploited during model training: pre-training the last layer of a neural network, small learning rates and data augmentation techniques. Data augmentation resulted in an effective and efficient data transformation that enhanced small lesions in the full image space.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, ImageCLEF 2017
01/09/2017
UPB HES SO @ PlantCLEF 2017: Automatic plant image identification using transfer learning via Convolutional neural networks
access here
Author(s)
Toma Alexandru; Ştefan Liviu Daniel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
Recent advances in computer vision have made possible the use of neural networks in large-scale image retrieval tasks. An example application is automated plant classification. However, training a network from scratch takes a lot of computational effort and turns out to be very time-consuming. In this paper, we investigate a transfer learning approach in the context of the 2017 PlantCLEF task, for automatic plant image classification. The proposed approach is based on the well-known AlexNet Convolutional Neural Network (CNN) model. The network was fine-tuned using the 2017 PlantCLEF Encyclopedia of Life (EOL) training data, which consists of approximately 260,000 plant images belonging to 10,000 species. The learning process was sped up in the upper layers, leaving the original features almost untouched. Our best official run scored 0.361 in terms of Mean Reciprocal Rank (MRR) when evaluated on the test dataset.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, PlantCLEF 2017

2016

01/01/2016
Result Diversification in Social Image Retrieval: A Benchmarking Framework
access here
Author(s)
Ionescu Bogdan; Popescu Adrian; Radu Anca-Livia; Müller Henning
Institution
University Politehnica of Bucharest, Romania; CEA LIST, France; University of Trento, Italy; University of Applied Sciences Western Switzerland, Switzerland
Abstract
This article addresses the diversification of image retrieval results in the context of image retrieval from social media. It proposes a benchmarking framework together with an annotated dataset and discusses the results achieved during the related task run at the MediaEval 2013 benchmark. The results of 38 multimedia diversification systems, ranging from graph-based representations, re-ranking and optimization approaches to data clustering and hybrid approaches that included a human in the loop, are described and analyzed in this text. A comparison of expert vs. crowdsourcing annotations shows that crowdsourcing results have a slightly lower inter-rater agreement, but are comparable at a much lower cost than expert annotators. Multimodal approaches achieve the best results in terms of cluster recall. Manual approaches can lead to high precision but often lower diversity. With this detailed analysis of the results we give future insights into diversity in image retrieval and into preparing new evaluation campaigns in related areas.
Access
Open Access
Type of Publication
Journal Article
Publisher
Springer, Multimedia Tools and Applications
01/11/2016
3D Human Pose Estimation: A Review of the Literature and Analysis of Covariates
access here
Author(s)
Sarafianos Nikolaos; Boteanu Bogdan; Ionescu Bogdan; Kakadiaris Ioannis
Institution
University of Houston System, USA; University Politehnica of Bucharest, Romania
Abstract
Estimating the pose of a human in 3D given an image or a video has recently received significant attention from the scientific community. The main reasons for this trend are the ever-increasing new range of applications (e.g., human-robot interaction, gaming, sports performance analysis) driven by current technological advances. Although recent approaches have dealt with several challenges and reported remarkable results, 3D pose estimation remains a largely unsolved problem because real-life applications impose several challenges not fully addressed by existing methods. For example, estimating the 3D pose of multiple people in an outdoor environment remains a largely unsolved problem. In this paper, we review the recent advances in 3D human pose estimation from RGB images or image sequences. We propose a taxonomy of the approaches based on the input (e.g., single image or video, monocular or multi-view), and in each case categorize the methods according to their key characteristics. To provide an overview of the current capabilities, we conducted an extensive experimental evaluation of state-of-the-art approaches on a synthetic dataset created specifically for this task, which along with its ground truth is made publicly available for research purposes. Finally, we provide an in-depth discussion of the insights obtained from reviewing the literature and the results of our experiments. Future directions and challenges are identified.
Access
Open Access
Type of Publication
Journal Article
Publisher
Elsevier, Computer Vision and Image Understanding
01/09/2016
Event‑based media processing and analysis: A survey of the literature
access here
Author(s)
Tzelepis Christos; Ma Zhigang; Mezaris Vasileios; Ionescu Bogdan; Kompatsiaris Ioannis; Boato Giulia; Sebe Nicu; Yan Shuicheng
Institution
Information Technologies Institute (ITI), CERTH, Greece; Carnegie Mellon University, USA; University Politehnica of Bucharest, Romania; University of Trento, Italy; National University of Singapore, Singapore
Abstract
Research on event-based processing and analysis of media is receiving an increasing attention due to the proliferation of multimedia content and the demand for automatic understanding of events. This survey reviews the state-of-the-art approaches for event representation, modeling, detection, and retrieval, and discusses benchmarking efforts and future challenges in the field.
Access
Closed Access
Type of Publication
Journal Article
Publisher
Elsevier, Image and Vision Computing
05/10/2016
A Modified Vector of Locally Aggregated Descriptors Approach for Fast Video Classification
access here
Author(s)
Mironică Ionuț; Duță Ionuț Cosmin; Ionescu Bogdan; Sebe Nicu
Institution
University Politehnica of Bucharest, Romania; University of Trento, Italy
Abstract
In order to reduce the computational complexity, most video classification approaches represent video data at frame level. In this paper we investigate a novel perspective that combines frame features to create a global descriptor. The main contributions are: (i) a fast algorithm to densely extract global frame features, which are easier and faster to compute than spatio-temporal local features; (ii) replacing the traditional k-means visual vocabulary from Bag-of-Words with a Random Forest approach, allowing a significant speedup; (iii) the use of a modified Vector of Locally Aggregated Descriptors (VLAD) combined with a Fisher kernel approach that replaces the classic Bag-of-Words approach, allowing us to achieve high accuracy. By doing so, the proposed approach combines frame-based features while effectively capturing video content variation in time. We show that our framework is highly general and is not dependent on a particular type of descriptors. Experiments performed on four different scenarios: movie genre classification, human action recognition, daily activity recognition and violence scene classification, show the superiority of the proposed approach compared to the state of the art.
Access
Open Access
Type of Publication
Journal Article
Publisher
Springer, Multimedia Tools and Applications
01/02/2016
Multiview Plus Depth Video Coding With Temporal Prediction View Synthesis
access here
Author(s)
Purica Andrei; Mora Elie; Pesquet‑Popescu Beatrice; Cagnazzo Marco; Ionescu Bogdan
Institution
Télécom ParisTech, France; University Politehnica of Bucharest, Romania
Abstract
Multiview video (MVV) plus depth formats use view synthesis to build intermediate views from existing adjacent views at the receiver side. Traditional view synthesis exploits disparity information to interpolate an intermediate view by considering inter-view correlations. However, temporal correlation between different frames of the intermediate view can be used to improve the synthesis. We propose a new coding scheme for 3-D High Efficiency Video Coding (HEVC) that allows us to take full advantage of temporal correlations in the intermediate view and improve the existing synthesis from adjacent views. We use optical flow techniques to derive dense motion vector fields (MVFs) from the adjacent views and then warp them at the level of the intermediate view. This allows us to construct multiple temporal predictions of the synthesized frame. A second contribution is an adaptive fusion method that judiciously selects between temporal and inter-view prediction to eliminate artifacts associated with each prediction type. The proposed system is compared against the state-of-the-art view synthesis reference software 1-D Fast technique used in 3-D HEVC standardization. Three intermediary views are synthesized. Gains of up to 1.21 dB Bjontegaard Delta peak SNR are shown when evaluated on several standard MVV test sequences.
Access
Closed Access
Type of Publication
Journal Article
Publisher
IEEE Transactions on Circuits and Systems for Video Technology
02/03/2016
Fisher Kernel Temporal Variation-based Relevance Feedback for Video Retrieval
access here
Author(s)
Mironică Ionuț; Ionescu Bogdan; Uijlings Jasper; Sebe Nicu
Institution
University Politehnica of Bucharest, Romania; University of Edinburgh, Scotland; University of Trento, Italy
Abstract
This paper proposes a novel framework for Relevance Feedback based on the Fisher Kernel (FK). Specifically, we train a Gaussian Mixture Model (GMM) on the top retrieval results (without supervision) and use this to create a FK representation, which is therefore specialized in modelling the most relevant examples. We use the FK representation to explicitly capture temporal variation in video via frame-based features taken at different time intervals. While the GMM is being trained, the user selects, from the top examples, those they are looking for. This feedback is used to train a Support Vector Machine on the FK representation, which is then applied to re-rank the top retrieved results. We show that our approach outperforms other state-of-the-art relevance feedback methods. Experiments were carried out on the Blip10000, UCF50, UCF101 and ADL standard datasets using a broad range of multi-modal content descriptors (visual, audio, and text).
Access
Closed Access
Type of Publication
Journal Article
Publisher
Elsevier, Computer Vision and Image Understanding
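A minimal sketch of the Fisher-kernel relevance feedback loop described in the entry above, assuming scikit-learn and keeping only the gradient with respect to the GMM means (the dominant term in practice); the synthetic data and the feedback labels are placeholders, not the paper's setup.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

def fisher_vector(frames, gmm):
    """Fisher-kernel video representation from per-frame features sampled
    at different time instants, so temporal variation enters through the
    soft assignments. frames: (n_frames, d); gmm: diagonal-covariance."""
    gamma = gmm.predict_proba(frames)                   # (n, k) responsibilities
    diff = frames[:, None, :] - gmm.means_[None]        # (n, k, d) residuals
    fv = (gamma[..., None] * diff / np.sqrt(gmm.covariances_)[None]).mean(0)
    return (fv / np.sqrt(gmm.weights_)[:, None]).ravel()

# Toy run: fit the GMM (without supervision) on frames of the top results,
# then train an SVM on simulated user feedback and re-rank everything.
rng = np.random.default_rng(0)
videos = [rng.normal(size=(40, 16)) for _ in range(30)]  # fake frame features
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(np.vstack(videos[:10]))
X = np.stack([fisher_vector(v, gmm) for v in videos])
feedback = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 0])     # labels for top 10
svm = SVC(kernel="linear").fit(X[:10], feedback)
reranked = np.argsort(-svm.decision_function(X))
```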
27/09/2017
Boosting VLAD with Double Assignment using Deep Features for Action Recognition in Videos
access here
Author(s)
Duta Ionut C.; Nguyen Tuan A.; Aizawa Kiyoharu; Ionescu Bogdan; Sebe Nicu
Institution
University of Trento, Italy; University of Tokyo, Japan; University Politehnica of Bucharest, Romania
Abstract
The encoding method is an important factor in an action recognition pipeline, and one of its key points is the assignment step. A very widely used super-vector encoding method is the Vector of Locally Aggregated Descriptors (VLAD), with very competitive results in many tasks. However, it considers only hard assignment, and the criterion for the assignment is computed only from the feature side, by looking at which visual word the features vote for. In this work we propose to encode deep features for videos using a double-assignment VLAD (DA-VLAD). In addition to the traditional VLAD assignment, we perform a second assignment that takes into account the perspective of the codebook side: which are the nearest features to a visual word, and not only which is the nearest centroid for the features, as in the standard assignment. Another important factor for the performance of an action recognition system is the feature extraction step. Recently, deep features have obtained state-of-the-art results in many tasks, and have also been adopted for action recognition with competitive results over hand-crafted features. This work includes a pipeline to extract local deep features for videos using any available network as a black box, and we show competitive results even when the network was trained for another task or on another dataset. Our DA-VLAD encoding method outperforms traditional VLAD, and we obtain state-of-the-art results on the UCF50 dataset and competitive results on the UCF101 dataset.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE, 23rd International Conference on Pattern Recognition (ICPR)
01/01/2016
Deep Learning vs Spectral Clustering into an Active Clustering with Pairwise Constraints Propagation
access here
Author(s)
Voiron Nicolas; Benoît Alexandre; Lambert Patrick; Ionescu Bogdan
Institution
University Savoie Mont Blanc, LISTIC, France; University Politehnica of Bucharest, Romania
Abstract
In our data-driven world, categorization is of major importance to help end-users and decision makers understand information structures. Supervised learning techniques rely on annotated samples that are often difficult to obtain, and training often overfits. On the other hand, unsupervised clustering techniques study the structure of the data without requiring any training data. Given the difficulty of the task, supervised learning often outperforms unsupervised learning. A compromise is to use partial knowledge, selected in a smart way, in order to boost performance while minimizing learning costs; this is called semi-supervised learning. In such a use case, Spectral Clustering has proved to be an efficient method. Deep Learning has also outperformed several state-of-the-art classification approaches, and it is interesting to test it in our context. In this paper, we first introduce the concept of Deep Learning into an active semi-supervised clustering process and compare it with Spectral Clustering. Second, we introduce constraint propagation and demonstrate how it maximizes partitioning quality while reducing annotation costs. Experimental validation is conducted on two different real datasets. Results show the potential of the clustering methods.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE, 14th International Workshop on Content-Based Multimedia Indexing
01/01/2016
Histograms of Motion Gradients for Real-time Video Classification
access here
Author(s)
Duta Ionut C.; Uijlings Jasper R. R.; Nguyen Tuan A.; Aizawa Kiyoharu; Hauptmann Alexander G.; Ionescu Bogdan; Sebe Nicu
Institution
University of Trento, Italy; University of Edinburgh, Scotland; University of Tokyo, Japan; Carnegie Mellon University, USA; University Politehnica of Bucharest, Romania
Abstract
Besides appearance information, the video contains temporal evolution, which represents an important and useful source of information about its content. Many video representation approaches are based on the motion information within the video. The common approach to extract the motion information is to compute the optical flow from the vertical and the horizontal temporal evolution of two consecutive frames. However, the computation of optical flow is very demanding in terms of computational cost, in many cases being the most significant processing step within the overall pipeline of the target video analysis application. In this work we propose a very efficient approach to capture the motion information within the video. Our method is based on a simple temporal and spatial derivation, which captures the changes between two consecutive frames. The proposed descriptor, Histograms of Motion Gradients (HMG), is validated on the UCF50 human action recognition dataset. Our HMG pipeline with several additional speed-ups is able to achieve real-time video processing and outperforms several well-known descriptors including descriptors based on the costly optical flow.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
IEEE, 14th International Workshop on Content-Based Multimedia Indexing
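A rough sketch of the core HMG computation described in the entry above, assuming grayscale frames: a temporal derivative (frame difference) followed by spatial gradients, pooled into a magnitude-weighted orientation histogram. The bin count and global pooling here are illustrative, not the paper's configuration.

```python
import numpy as np

def motion_gradient_histogram(prev_frame, frame, n_bins=8):
    """Temporal derivation (frame difference), then spatial derivation,
    then an orientation histogram weighted by gradient magnitude."""
    temporal = frame.astype(np.float32) - prev_frame.astype(np.float32)
    gy, gx = np.gradient(temporal)                 # spatial gradients
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)               # in [-pi, pi]
    bins = ((orientation + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=magnitude.ravel(),
                       minlength=n_bins)
    return hist / (hist.sum() + 1e-12)             # normalized histogram
```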
01/01/2016
Div150Multi: A Social Image Retrieval Result Diversification Dataset with Multi-topic Queries
access here
Author(s)
Ionescu Bogdan; Gînșca Alexandru Lucian; Boteanu Bogdan; Lupu Mihai; Popescu Adrian; Müller Henning
Institution
University Politehnica of Bucharest, Romania; CEA LIST, Palaiseau, France; Vienna Univ Technology, Austria; University of Applied Sciences Western Switzerland, Switzerland
Abstract
In this paper we introduce a new dataset, Div150Multi, designed to support shared evaluation of diversification techniques in social media photo retrieval and related areas. The dataset comes with associated relevance and diversity assessments performed by trusted annotators. The data consists of around 300 complex queries represented via 86,769 Flickr photos, around 27M photo links for around 6,000 users, metadata, Wikipedia pages and content descriptors for text and visual modalities, including state-of-the-art deep features. To facilitate distribution, only Creative Commons content allowing redistribution was included in the dataset. The proposed dataset was validated during the 2015 Retrieving Diverse Social Images Task at the MediaEval benchmark.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM, 7th International Conference on Multimedia Systems (MMSys)
01/01/2016
View synthesis based on temporal prediction via warped motion vector fields
access here
Author(s)
Purica Andrei; Cagnazzo Marco; Pesquet-Popescu Beatrice; Dufaux Frederic; Ionescu Bogdan
Institution
Université Paris-Saclay, France; University Politehnica of Bucharest, Romania
Abstract
The demand for 3D content has increased over the last years, as 3D displays are now widespread. View synthesis methods, such as depth-image-based rendering, provide an efficient tool in 3D content creation or transmission, and are integrated in coding solutions for multiview video content such as 3D-HEVC. In this paper, we propose a view synthesis method that takes advantage of temporal and inter-view correlations in multiview video sequences. We use warped motion vector fields computed in reference views to obtain temporal predictions of a frame in a synthesized view and blend them with depth-image-based rendering synthesis. Our method is shown to bring average gains of 0.42 dB when tested on several multiview sequences.
Access
Closed Access
Type of Publication
Conference Paper
Publisher
ACM, 7th International Conference on Multimedia Systems (MMSys)
01/10/2016
MediaEval 2016 predicting media interestingness task
access here
Author(s)
Demarty Claire-Hélène; Sjöberg Mats; Ionescu Bogdan; Do Thanh-Toan; Wang Hanli; Duong Ngoc Q. K.; Lefebvre Frédéric
Institution
Technicolor, Rennes, France; University of Helsinki, Finland; University Politehnica of Bucharest, Romania; Singapore University of Technology and Design, Singapore; University of Science, Viet Nam; Tongji University, China
Abstract
This paper provides an overview of the Predicting Media Interestingness task that is organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task, which is running for the first year, expects participants to create systems that automatically select images and video segments that are considered to be the most interesting for a common viewer. In this paper, we present the task use case and challenges, the proposed data set and ground truth, the required participant runs and the evaluation metrics.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2016
01/10/2016
Retrieving diverse social images at MediaEval 2016: Challenge, dataset and evaluation
access here
Author(s)
Ionescu Bogdan; Gînscă Alexandru Lucian; Zaharieva Maia; Boteanu Bogdan; Lupu Mihai; Müller Henning
Institution
University Politehnica of Bucharest, Romania; CEA, LIST, France; University of Vienna, Austria; Vienna University of Technology, Austria; University of Applied Sciences Western Switzerland
Abstract
This paper provides an overview of the Retrieving Diverse Social Images task that is organized as part of the MediaEval 2016 Benchmarking Initiative for Multimedia Evaluation. The task addresses the problem of result diversification in the context of social photo retrieval where images, metadata, text information, user tagging profiles and content and text models are available for processing. We present the task challenges, the proposed data set and ground truth, the required participant runs and the evaluation metrics.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2016
01/10/2016
LAPI at MediaEval 2016 predicting media interestingness task
access here
Author(s)
Constantin Mihai Gabriel; Boteanu Bogdan; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
This paper presents our results for the MediaEval 2016 Predicting Media Interestingness task. We propose an approach based on video descriptors and study several machine learning models, in order to find the optimal configuration and combination of descriptors and algorithms for our system.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2016
01/10/2016
LAPI @ 2016 Retrieving Diverse Social Images Task: A Pseudo-Relevance Feedback Diversification Perspective
access here
Author(s)
Boteanu Bogdan; Constantin Mihai Gabriel; Ionescu Bogdan
Institution
University Politehnica of Bucharest, Romania
Abstract
In this paper we present the results achieved during the 2016 MediaEval Retrieving Diverse Social Images Task, using an approach based on pseudo-relevance feedback, in which human feedback is replaced by an automatic selection of images. The proposed approach is designed to prioritize the diversification of the results, in contrast to most existing techniques that address only relevance. Diversification is achieved by exploiting a hierarchical clustering scheme followed by a diversification strategy. Methods are tested on the benchmarking data and the results are analyzed. Insights for future work conclude the paper.
Access
Open Access
Type of Publication
Conference Paper
Publisher
CEUR Workshop Proceedings, MediaEval 2016