PhD Theses

Segmentation and Classification of Multimodal Medical Images based on Generative Adversarial Learning and Convolutional Neural Networks.

Vivek Kumar Singh

Candidate: Vivek Kumar Singh
PhD Advisors: Dr. Domènec Puig and Dr. Santiago Romaní
Date of defense: 2019-11-22
File: Thesis download (Coming soon)

Abstract: Medical imaging is an important means of early illness detection in most medical fields, and it leads to a better prognosis for patients. However, properly interpreting medical images requires highly trained medical experts, and it is difficult, time-consuming, expensive, and error-prone. It would therefore be beneficial to have a computer-aided diagnosis (CAD) system that can automatically outline possibly diseased tissue and suggest a diagnosis to the doctor. Recent developments in deep learning motivate us to improve current medical image analysis systems. In this thesis, we consider three diagnostic tasks: breast cancer in mammograms and ultrasound images, skin lesions in dermoscopic images, and retinal diseases in fundus images. These tasks are very challenging due to the many sources of variability in the image acquisition process.

First, we propose a method to analyze breast cancer in mammograms. In the first stage, we use the Single Shot Detector (SSD) method to locate possibly abnormal regions, called regions of interest (ROIs). In the second stage, we apply a conditional generative adversarial network (cGAN) to segment possible masses within the ROIs. This network works efficiently with a reduced number of training images. In the third stage, a convolutional neural network (CNN) classifies the shape of the masses (round, oval, lobular or irregular). We also try to classify the masses into four breast cancer molecular subtypes (Luminal-A, Luminal-B, Her-2 and Basal-like), based on their shape and on the micro-texture rendered in the image pixels. Moreover, for ultrasound image processing, we extend the proposed cGAN model with a novel channel attention and weighting (CAW) block, which improves the robustness of the segmentation by emphasizing the most relevant features of the masses. Statistical analysis corroborates the accuracy of the segmented masks. Finally, we also classify tumors as benign or malignant based on the shape of the segmented masks.
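
The internals of the CAW block are not given in this abstract. As a rough illustration of the general channel-attention idea it builds on, the sketch below assumes a squeeze-and-excitation style design: global average pooling per channel, a small bottleneck, a sigmoid gate, and per-channel rescaling. The function name, weights and shapes are hypothetical, and the "weighting" part of CAW is not reproduced.

```python
import math


def channel_attention(feature_maps, w1, w2):
    """Squeeze-and-excitation style channel attention (illustrative sketch only).

    feature_maps: list of C channels, each an H x W list of lists of floats.
    w1: bottleneck weights, R rows of C values (C -> R).
    w2: expansion weights, C rows of R values (R -> C).
    Returns the rescaled feature maps and the per-channel gates.
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feature_maps]
    # Excitation: FC -> ReLU -> FC -> sigmoid models channel interdependencies.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: every value of channel c is multiplied by its gate in (0, 1).
    scaled = [[[g * v for v in row] for row in ch]
              for g, ch in zip(gates, feature_maps)]
    return scaled, gates


# Two 2 x 2 channels: one strongly activated, one silent (hypothetical values).
maps = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]
w1 = [[1.0, 0.0]]        # C = 2 -> R = 1
w2 = [[2.0], [-2.0]]     # R = 1 -> C = 2
scaled, gates = channel_attention(maps, w1, w2)
```

Here the active channel receives a gate of about 0.88 and the silent one about 0.12, which is the sense in which attention "fosters" the more relevant feature channels.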

Second, skin lesion segmentation in dermoscopic images is still challenging due to the low contrast and fuzzy boundaries of lesions, which are also highly similar to healthy regions. To overcome these problems, we introduce a novel layer inside the encoder of the cGAN, called the factorized channel attention (FCA) block. It integrates a channel attention mechanism and a residual 1-D kernel factorized convolution. The channel attention mechanism increases the discriminability between lesion and non-lesion features by taking feature channel interdependencies into account. The 1-D factorized kernels provide extra convolutional layers with a minimal number of parameters, and the residual connection minimizes the impact of image artifacts and irrelevant objects.
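
To illustrate why 1-D factorized kernels save parameters, the sketch below applies a 3 x 1 kernel followed by a 1 x 3 kernel and compares the result with the equivalent 3 x 3 kernel. The exact equivalence holds only for rank-1 (separable) kernels; in a learned block the factorized pair is simply a cheaper layer. The helper names and kernel values are hypothetical, and the residual connection and channel attention are omitted.

```python
def conv2d_valid(image, kernel):
    """Plain 2-D cross-correlation (as used in CNN layers) with 'valid' padding."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]


def factorized_conv(image, col, row):
    """Apply a k x 1 kernel, then a 1 x k kernel (2k weights instead of k * k)."""
    vertical = conv2d_valid(image, [[v] for v in col])   # k x 1 pass
    return conv2d_valid(vertical, [row])                  # 1 x k pass


image = [[(3 * r + 5 * c) % 7 for c in range(6)] for r in range(6)]
col, row = [1, 2, 1], [-1, 0, 1]                    # Sobel-like rank-1 pair
full_kernel = [[a * b for b in row] for a in col]   # equivalent 3 x 3 kernel

full = conv2d_valid(image, full_kernel)
fact = factorized_conv(image, col, row)
params_full, params_fact = len(col) * len(row), len(col) + len(row)
```

Both paths produce the same 4 x 4 output, while the factorized version uses 6 weights instead of 9; the saving grows with kernel size (2k versus k²).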

Third, segmentation of the optic disc in retinal fundus photographs plays a critical role in the diagnosis, screening and treatment of many ophthalmologic diseases. We have therefore applied our cGAN method to optic disc segmentation, obtaining promising results with a very small number of training samples (fewer than twenty). Experiments on these three kinds of medical image diagnosis provide quantitative and qualitative comparisons with other state-of-the-art methods and show the advantages of the proposed detection, segmentation and classification techniques.

Keywords: Medical image analysis, deep learning, conditional generative adversarial network, segmentation.


Efficient Deep Learning Models and Their Applications to Health Informatics.

Mostafa Kamal Sarker

Candidate: Mostafa Kamal Sarker
PhD Advisors: Dr. Domènec Puig and Dr. Petia Radeva
Date of defense: 2019-11-12
File: Thesis download (Coming soon)

Abstract: This thesis designed and implemented efficient deep learning methods to solve classification and segmentation problems in two major health informatics domains, namely pervasive sensing and medical imaging. In the area of pervasive sensing, this thesis focuses only on food and related scene classification for health and nutrition analysis. It used deep learning models to answer two important questions, “Where do we eat?” and “What do we eat?”, in order to properly monitor our health and nutrition. Because this is a new research domain, the thesis built the entire pipeline from scratch (e.g., dataset creation, model selection and parameter optimization). To answer the first question, “Where do we eat?”, it introduced two new datasets, “FoodPlaces” and “EgoFoodPlaces”, and two models, “MACNet” and “MACNet+SA”, based on multi-scale atrous convolutional networks with the self-attention mechanism. To answer the second question, “What do we eat?”, it presented a new dataset, “Yummly48K”, and a model, “CuisineNet”, designed by aggregating convolution layers with various kernel sizes, followed by a residual and pyramid pooling module with two fully connected pathways. The proposed models achieved state-of-the-art classification accuracy on their respective datasets. In the field of medical imaging, this thesis targets the skin lesion segmentation problem in dermoscopic images. It introduced two novel deep learning models to accurately segment skin lesions: “SLSDeep”, based on a dilated residual network with pyramid pooling, and “MobileGAN”, based on conditional Generative Adversarial Networks (cGANs). Both models show excellent performance on public benchmark datasets.

Keywords: Deep Learning, Wearable Device, Food Places Classification, Convolutional Neural Network, Recurrent Neural Network, Skin Lesion Segmentation, Dilated Convolutional Neural Network, Generative Adversarial Network.


Empowering Cognitive Stimulation Therapy (CST) with Socially Assistive Robotics (SAR) and Emotion Recognition.

Jainendra Shukla

Candidate: Jainendra Shukla
PhD Advisor: Dr. Domènec Puig
Date of defense: 2018-05-24
File: Thesis download

Abstract: Robot-assisted systems for cognitive rehabilitation can extend the reach of the potential benefits of evidence-based psychological or psychosocial interventions to individuals with a wide range of mental health concerns. Existing research on socially assistive robots (SAR) lacks clinical validation; hence, medical practitioners have little motivation to use them in clinical practice. Moreover, existing human-robot interactions are inattentive to the user’s current emotional state and engagement. Cognitive rehabilitation interventions for individuals with mental health concerns demand complex human-robot interaction, and the ubiquity of wearable devices motivates robot interaction systems that can autonomously acquire information about the user’s emotional state, intentions and surrounding context, so that the robot can adapt its interactions accordingly. In this thesis, I describe the design and implementation of robot-assisted cognitive rehabilitation activities and of real-time emotion recognition from electrodermal activity (EDA) signals. The design of the robot-assisted interventions presents a coherent framework to produce positive effects on both the users and the caregivers. The implementation of the system confirms increased engagement among users and a significant reduction in caregiver burden. The development of the emotion recognition algorithms has shown that it is possible to process EDA signals in real time, with minimal lag, to infer the emotional state of individuals with intellectual disability (ID).

Keywords: Socially Assistive Robotics; Emotion Recognition; Stimulation Therapy.


Understanding Road Scenes using Deep Neural Networks.

Hamed Habibi

Candidate: Hamed Habibi
PhD Advisor: Dr. Domènec Puig
Date of defense: 2017-07-06
File: Thesis download

Abstract: Understanding road scenes is crucial for autonomous cars. This requires segmenting road scenes into semantically meaningful regions and recognizing objects in a scene. While objects such as cars and pedestrians have to be segmented accurately, it might not be necessary to detect and locate these objects in a scene. However, detecting and classifying objects such as traffic signs is essential for complying with road rules. In this thesis, we first propose a method for classifying traffic signs using visual attributes and Bayesian networks. Then, we propose two neural networks for this purpose and develop a new method for creating an ensemble of models. Next, we study the sensitivity of neural networks to adversarial samples and propose two denoising networks that are attached to the classification networks to increase their stability against noise. In the second part of the thesis, we first propose a network to detect traffic signs in high-resolution images in real time and show how to implement the scanning window technique within our network using dilated convolutions. Then, we formulate the detection problem as a segmentation problem and propose a fully convolutional network for detecting traffic signs. Finally, in the last part of the thesis, we propose a new fully convolutional network composed of fire modules, bypass connections and consecutive dilated convolutions for segmenting road scenes into semantically meaningful regions, and show that it is more accurate and computationally more efficient than similar networks.
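
Dilated convolutions enlarge the window each output sees without adding weights. A small sketch, assuming stride-1 layers and the standard receptive-field arithmetic, shows how stacking dilated 3-tap kernels grows that window; the dilation rates 1, 2, 4, 8 are illustrative and not taken from the thesis.

```python
def effective_kernel(k, d):
    """Span covered by a k-tap kernel with dilation d (k + (k - 1)(d - 1))."""
    return k + (k - 1) * (d - 1)


def receptive_field(layers):
    """Receptive field of stacked stride-1 convolutions given (kernel, dilation) pairs."""
    rf = 1
    for k, d in layers:
        rf += (k - 1) * d   # each layer widens the window by (k - 1) * d samples
    return rf


plain = receptive_field([(3, 1)] * 4)                        # four ordinary 3x3 layers
dilated = receptive_field([(3, 1), (3, 2), (3, 4), (3, 8)])  # same weight count per layer
```

With four layers of 9 weights each (per channel pair), the plain stack sees a 9-pixel window along each axis while the dilated stack sees 31, which is why dilation can stand in for a sliding scanning window at no extra parameter cost.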

Keywords: Deep Neural Networks; Understanding Road Scenes; Semantic Segmentation.


Active contours for intensity inhomogeneous image segmentation.

Farhan Akram

Candidate: Farhan Akram
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2017-07-06
File: Thesis download

Abstract: Intensity inhomogeneity is a well-known problem in image segmentation, which affects the accuracy of intensity-based segmentation methods. In this thesis, edge-based and region-based active contour methods are proposed to segment intensity inhomogeneous images. First, we propose an edge-based active contour method based on the Difference of Gaussians (DoG), which helps to segment the global structure of the image. Second, we propose a region-based active contour method to both correct and segment intensity inhomogeneous images. A phase stretch transform (PST) kernel is used to compute new intensity means and the bias field, which are employed to define a bias-fitted image. Third, another region-based active contour method is proposed using an energy functional based on local and global fitted images. The bias field is approximated with a Gaussian distribution, and the bias of intensity inhomogeneous regions is corrected by dividing the original image by the approximated bias field. Finally, a hybrid region-based multiphase (four-phase) active contour method is proposed to partition a brain MR image into three distinct regions: white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF). In this work, a post-processing (pixel correction) method has also been devised to improve the accuracy of the segmented WM, GM and CSF regions. Experimental results on both synthetic and real brain MR images provide a quantitative and qualitative comparison with state-of-the-art active contour methods and show the advantages of the proposed segmentation techniques.
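
The correction step described above follows the usual multiplicative model, observed image = bias field × true image, so the true image is recovered by division. The sketch below shows only that step on a 1-D signal and assumes the bias field is already known; the Gaussian-shaped bias with a constant floor is a hypothetical stand-in, and the estimation of the bias inside the active contour energy is not reproduced.

```python
import math


def gaussian_bias(n, center, sigma, floor=0.5):
    """Hypothetical smooth bias field: a Gaussian bump on top of a constant floor."""
    return [floor + math.exp(-((x - center) ** 2) / (2 * sigma ** 2)) for x in range(n)]


true_image = [50.0] * 5 + [150.0] * 5           # two piecewise-constant tissue classes
bias = gaussian_bias(len(true_image), center=4.5, sigma=3.0)
observed = [b * j for b, j in zip(bias, true_image)]   # I = B * J (inhomogeneous)
corrected = [i / b for i, b in zip(observed, bias)]    # J = I / B (bias removed)
```

Before correction, pixels of the same class differ by more than 30 intensity levels; after division each class is flat again, which is what makes a subsequent intensity-based segmentation reliable.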

Keywords: Image segmentation; Active contours; Intensity inhomogeneity.


Human-robot interaction and computer-vision-based services for autonomous robots.

Jordi Bautista Ballester

Candidate: Jordi Bautista Ballester
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Jaume Vergés
Date of defense: 2016-07-14
File: Doctoral thesis download

Abstract: Imitation Learning (IL), or robot Programming by Demonstration (PbD), covers methods by which a robot learns new skills through human guidance and imitation. PbD takes its inspiration from the way humans learn new skills by imitation, in order to develop methods by which new tasks can be transmitted to robots. This thesis is motivated by the generic question of “what to imitate?”, which concerns the problem of how to extract the essential features of a task. To this end, we adopt an Action Recognition (AR) perspective in order to allow the robot to decide what has to be imitated or inferred when interacting with a human. The proposed approach is based on a well-known method from natural language processing, namely Bag of Words (BoW). This method is applied to large databases in order to obtain a trained model. Although BoW is a machine learning technique used in various fields of research, it is far from accurate in action classification for robot learning. Moreover, it focuses on the classification of objects and gestures rather than actions. Thus, in this thesis we show that the method is suitable in action classification scenarios for merging information from different sources or different trials. This thesis makes three contributions: (1) it proposes a general method for dealing with action recognition, thereby contributing to imitation learning; (2) the methodology can be applied to large databases that include different modes of action capture; and (3) the method is applied specifically in a real international innovation project called Vinbot.
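
The core BoW step turns a variable number of local descriptors from one action clip into a fixed-length histogram over a vocabulary of "visual words". The sketch below assumes the codebook has already been learned (typically by clustering training descriptors); the 2-D descriptors and the 3-word vocabulary are hypothetical.

```python
def nearest_word(descriptor, codebook):
    """Index of the closest visual word (squared Euclidean distance)."""
    return min(range(len(codebook)),
               key=lambda k: sum((a - b) ** 2 for a, b in zip(descriptor, codebook[k])))


def bow_histogram(descriptors, codebook):
    """Normalized bag-of-words histogram for one action clip."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    return [c / len(descriptors) for c in counts]


codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # hypothetical 3-word vocabulary
clip = [(0.1, -0.1), (0.9, 0.2), (1.1, -0.1), (0.2, 0.8)]
hist = bow_histogram(clip, codebook)               # one word 0, two word 1, one word 2
```

Because every clip maps to a histogram of the same length, clips of different durations, or histograms computed from different sensors, can be fed to a standard classifier such as an SVM.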

Keywords: Imitation Learning, Sensor Fusion, Robotics, Action Recognition, Human Robot Interaction, Computer Vision, Bag of Words, Multikernel SVM.


Development of advanced computer methods for breast cancer image interpretation through texture and temporal evolution analysis

Mohamed Abdel-Nasser

Candidate: Mohamed Abdel-Nasser
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Antonio Moreno
Date of defense: 2016-07-08
File: Download

Abstract: Breast cancer is one of the most dangerous diseases affecting women. Computer-aided diagnosis systems may help to detect breast cancer early and reduce mortality. This thesis proposes several methods for analyzing breast cancer images. We analyze breast cancer in mammograms, ultrasound images and thermograms. Our analysis includes mass/normal breast tissue classification, benign/malignant tumor classification in mammograms and ultrasound images, nipple detection in thermograms, mammogram registration and analysis of the evolution of breast tumors.

We considered well-known texture analysis methods and proposed two new texture descriptors. We also studied the effect of pixel resolution, integration scale, preprocessing and feature normalization on the performance of these texture analysis methods for tumor classification. Finally, we used super-resolution approaches to improve the performance of texture analysis methods when classifying breast tumors in ultrasound images.
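
The abstract does not name the well-known texture methods it evaluated. Local binary patterns (LBP) are one widely used example; the sketch below, offered as an assumption rather than as one of the thesis's descriptors, computes 8-neighbour LBP codes and their histogram, the kind of feature vector a tumor classifier would consume.

```python
def lbp_code(img, r, c):
    """8-neighbour local binary pattern code of pixel (r, c)."""
    center = img[r][c]
    # Neighbours visited clockwise starting at the top-left corner; bit i <- neighbour i.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (dr, dc) in enumerate(offsets):
        if img[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(img):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            hist[lbp_code(img, r, c)] += 1
    return hist


patch = [[5, 9, 1],
         [4, 6, 7],
         [2, 3, 8]]
code = lbp_code(patch, 1, 1)   # neighbours 9, 7 and 8 are >= 6 -> bits 1, 3, 4 set
```

The histogram summarizes local micro-structure independently of absolute brightness, which is why such descriptors are sensitive to the pixel resolution and integration scale studied in the thesis.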

For the analysis of breast cancer in thermograms, we propose an automatic method for detecting nipples that is accurate and simple. To analyze the evolution of breast cancer, we propose a temporal mammogram registration method based on curvilinear coordinates. We also propose a method for quantifying and visualizing the evolution of breast tumors in patients undergoing medical treatment. Overall, the methods proposed in this thesis improve the performance of the state-of-the-art approaches and may help to improve the diagnosis of breast cancer.


Swarm robotic systems: Y-Pod formation with the analysis on scalability and stability

Purushotham Muniganti

Candidate: Purushotham Muniganti
PhD Advisor: Dr. Albert Oller Pujol
PhD Advisor: Dr. Domènec Puig
Date of defense: 2016-02-08
File: Download

Abstract: This work addresses “swarm formation”, an active area of research. The most striking examples of swarm systems come from nature: social insect colonies are able to build sophisticated structures and regulate the activities of millions of individuals by endowing each individual with simple rules. Applying rules extracted from natural systems to artificial problems, however, requires different control parameters in order to achieve the desired system performance in terms of scalability, flexibility and robustness.

This thesis investigates swarm formation shapes and controllers, which are also important in swarm robotics because they determine how a group of robots coordinates its behaviour to form a pattern when viewed globally. In this regard, global shape formation is one of the open problems in artificial swarm intelligence. Shape formation serves various purposes, for instance in natural disaster scenarios, and in nature flocks of large birds fly together in a shape that reduces air resistance. Various shape formations exist in the literature, but this thesis explores a new strategy, the Y-Pod, which has broader applications than other formation techniques. The Y-Pod is a node connected to three segments, and its appearance differs between 2D and 3D environments with respect to angles and shapes.

The main objective of the proposed approach is to form a Y-Pod shape using a linear controller that largely defines the resulting behavior. We propose an approach based on system settling time and poles, with respect to an equilibrium strategy, to control the swarm system. The proposed linear controller guarantees system stability and scalability, based on steering analysis and pattern index matching techniques. In addition, with the help of the pattern index matching technique, we address the absolute minima and system synchronization problems in order to overcome redundancy issues in communication networks. In this process, parameters are chosen based on the desired formation as well as user-defined constraints. Compared to others, this approach is simple, computationally efficient, scales well to different swarm sizes, and applies to both centralized and decentralized swarm models.
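
The controller details (steering analysis, pattern index matching) are not given here. The sketch below only illustrates the settling-time and pole reasoning under a simplifying assumption: each robot is assigned a slot in a hypothetical 2-D Y-Pod layout (a central node plus three arms 120 degrees apart) and follows the decoupled linear law dx/dt = -k (x - x*), whose single pole at -k gives a 2 % settling time of about 4/k. The gain, spacing and start positions are illustrative.

```python
import math


def ypod_targets(n_per_arm, spacing, center=(0.0, 0.0)):
    """Hypothetical 2-D Y-Pod layout: one node plus three arms 120 degrees apart."""
    targets = [center]
    for arm in range(3):
        angle = math.pi / 2 + arm * 2 * math.pi / 3
        for j in range(1, n_per_arm + 1):
            targets.append((center[0] + j * spacing * math.cos(angle),
                            center[1] + j * spacing * math.sin(angle)))
    return targets


def simulate(start, targets, k, dt, t_end):
    """Forward-Euler integration of x_i' = -k (x_i - x_i*) (pole at s = -k)."""
    pos = [list(p) for p in start]
    for _ in range(int(round(t_end / dt))):
        for p, (tx, ty) in zip(pos, targets):
            p[0] += dt * -k * (p[0] - tx)
            p[1] += dt * -k * (p[1] - ty)
    return pos


k = 2.0
settling_time = 4.0 / k                  # 2 % criterion for a first-order pole at -k
targets = ypod_targets(n_per_arm=2, spacing=1.0)
start = [(0.5 * i, -1.0) for i in range(len(targets))]
final = simulate(start, targets, k, dt=0.01, t_end=settling_time)
```

Because every robot obeys the same first-order law, the settling time does not depend on how many robots join the formation, which is one simple way to see why a linear controller can scale with swarm size.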

Generation and control of locomotion for biped robots based on biologically inspired approaches

Julián Efrén Cristiano Rodríguez

Candidate: Julián Efrén Cristiano Rodríguez
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2016-01-15
File: Download

Abstract: This thesis proposes the use of biologically inspired control approaches to generate and control the omnidirectional gait of humanoid robots, adapting their movement to various types of flat terrain using multi-sensory feedback. The proposed locomotion control systems were implemented using Central Pattern Generator (CPG) networks based on Matsuoka’s neuron model. CPGs are biological neural networks located in the central nervous system of vertebrates or in the main ganglia of invertebrates, which can control coordinated movements, such as those involved in locomotion, respiration, chewing or swallowing.
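
A minimal two-neuron (half-centre) oscillator built from Matsuoka's neuron model can be sketched as below, integrated with forward Euler. The parameter values are common illustrative choices that satisfy the usual oscillation conditions 1 + τ/τ_a < w < 1 + β; they are not the GA-optimized values of the thesis, and sensory feedback and phase resetting are omitted.

```python
def matsuoka(t_end=20.0, dt=0.001, tau=0.25, tau_a=0.5, beta=2.5, w=2.5, s=1.0):
    """Two mutually inhibiting Matsuoka neurons (a minimal half-centre CPG).

    tau: membrane time constant; tau_a: adaptation (fatigue) time constant;
    beta: adaptation gain; w: mutual inhibition weight; s: tonic input.
    Returns the output y1 - y2 sampled at every Euler step.
    """
    u = [0.1, 0.0]          # slight asymmetry so the oscillation can start
    v = [0.0, 0.0]
    out = []
    for _ in range(int(t_end / dt)):
        y = [max(0.0, u[0]), max(0.0, u[1])]
        du = [(-u[i] - beta * v[i] - w * y[1 - i] + s) / tau for i in range(2)]
        dv = [(-v[i] + y[i]) / tau_a for i in range(2)]
        u = [u[i] + dt * du[i] for i in range(2)]
        v = [v[i] + dt * dv[i] for i in range(2)]
        out.append(y[0] - y[1])
    return out


signal = matsuoka()
steady = signal[5000:]                                   # drop the first 5 s of transient
signs = [1 if x > 0 else -1 for x in steady if x != 0]
crossings = sum(1 for a, b in zip(signs, signs[1:]) if a != b)
```

The two neurons fire alternately, so y1 - y2 swings between positive and negative values; in a joint-space CPG such a signal would drive, for example, the flexion and extension of one joint.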

The fact that, in nature, human and animal locomotion is controlled by CPG networks has inspired the theory on which the present thesis is based. In particular, two closed-loop control architectures based on CPG-joint-space control methods have been proposed and tested by using both a simulated and a real NAO humanoid robot. The first control architecture identified some important features that a CPG-joint-space control scheme must have if a useful locomotion pattern is to be described. On the basis of this analysis, the second control architecture was proposed to describe well-characterized locomotion patterns. The new system, characterized by optimized parameters obtained with a genetic algorithm (GA), effectively generated and controlled locomotion patterns for biped robots on flat and sloped terrain.

To improve the closed-loop behavior of the system, a phase resetting mechanism for CPG networks based on Matsuoka’s neuron model has been proposed. It makes it possible to design and study feedback controllers that can quickly modify the generated locomotion pattern.

The results obtained show that the proposed control schemes can yield well-characterized locomotion patterns with a fast response suitable for humanoid robots with a reduced processing capability. These experiments also indicate that the proposed system enables the robot to respond quickly and robustly, and to cope with complex situations.

Robust analysis and protection of dynamic scenes for privacy-aware video surveillance

Hatem Abd Ellatif FatahAllah Ibrahim Mahmoud Rashwan

Candidate: Hatem Abd Ellatif FatahAllah Ibrahim Mahmoud Rashwan
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Antoni Martínez Ballesté
Date of defense: 2014-05-26
File: Download

Abstract: Recent advances in pervasive video surveillance systems pave the way for comprehensive surveillance of every aspect of our lives. Computerized and interconnected camera systems can be used to profile, track and monitor individuals for the sake of security. Nevertheless, these systems clearly interfere with the fundamental right of individuals to privacy. To alleviate this privacy problem and avert the so-called Big Brother effect, the use of privacy-enhancing technologies is mandatory.

Privacy-aware video surveillance systems are based on a Detection Submodule, which detects the so-called regions of interest (i.e., the areas to protect in order to achieve privacy) in the captured video, and on a Protection Submodule, which protects the detected areas (aiming to prevent identity disclosure). Only a trusted manager should be able to access the protected video and unprotect it, for instance in criminal investigations and, in general, with the permission of a law enforcer (judge, police, etc.). Most literature on privacy in video surveillance systems concentrates on detecting faces and other regions of interest and on proposing different methods to protect them. However, the trustworthiness of those systems and, by extension, the privacy they provide is neglected.

In this thesis, the topic of privacy-aware video surveillance is tackled from a holistic point of view. Firstly, an introductory chapter defines the properties of a trustworthy privacy-aware video surveillance system, and reviews the techniques that can be used in the Detection Submodule and in the Protection Submodule.

The remainder of the thesis is divided into two parts. The first part develops contributions aimed at improving the detection of regions of interest. Specifically, it addresses our contributions to optical flow techniques: despite its usefulness, the widely known variational optical flow model has several limitations and shortcomings in providing accurate flow fields for motion estimation problems in computer vision. To overcome these limitations, new models are introduced as alternatives to the classic concepts. Two models are proposed in this dissertation that use tensor voting to make the variational optical flow model more robust against noise and better at preserving discontinuities. In addition, the data term of the optical flow model, which is based on the brightness constancy assumption, is replaced by a rich descriptor in order to obtain an illumination-robust optical flow model.
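
The brightness constancy assumption says a moving point keeps its intensity, which after linearization gives I_x · u + I_t = 0. The sketch below solves this constraint by least squares over a small 1-D window; this is a local (Lucas-Kanade-style) estimate swapped in for clarity, not the variational model with tensor voting described above. The signals are hypothetical.

```python
def flow_1d(frame0, frame1, window):
    """Least-squares 1-D displacement from the linearized brightness constancy
    constraint I_x * u + I_t = 0, pooled over the given pixel indices."""
    num = den = 0.0
    for x in window:
        ix = (frame0[x + 1] - frame0[x - 1]) / 2.0   # central spatial derivative
        it = frame1[x] - frame0[x]                    # temporal derivative
        num += ix * it
        den += ix * ix
    return -num / den


frame0 = [2.0 * x for x in range(10)]              # linear intensity ramp
frame1 = [2.0 * (x - 0.5) for x in range(10)]      # same ramp shifted right by 0.5 px
u = flow_1d(frame0, frame1, window=range(1, 9))

# A global brightness change violates the assumption and corrupts the estimate.
u_lit = flow_1d(frame0, [v + 4.0 for v in frame1], window=range(1, 9))
```

The pure shift is recovered as u = 0.5, but adding a constant 4 to the second frame yields u = -1.5: exactly the failure mode that motivates replacing the brightness-based data term with an illumination-robust descriptor.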

In the second part, the protection of regions of interest is addressed. A method based on coefficient alteration in the compressed domain of the video is presented and tested in terms of robustness and efficiency. The processes related to the information security of the data involved in the protection and unprotection processes are also comprehensively taken into account.
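
The abstract does not specify which coefficients are altered or how. One common reversible approach, assumed here purely for illustration, is to flip the signs of the transform coefficients of a region of interest using a keyed pseudo-random sequence, so that only a key holder can undo the protection. Python's `random` module is not cryptographically secure, so a real system would use a keyed cryptographic generator; the coefficient values and key are hypothetical.

```python
import random


def scramble_signs(coeffs, key):
    """Flip the sign of each coefficient according to a keyed pseudo-random bit.

    Applying the same key twice restores the original coefficients, so the same
    function serves for protection and for (authorized) unprotection.
    NOTE: random.Random is for illustration only and is not cryptographically secure.
    """
    rng = random.Random(key)
    return [-c if rng.getrandbits(1) else c for c in coeffs]


roi_block = [52, -7, 3, 0, -2, 1, 0, 4]         # hypothetical quantized DCT coefficients
protected = scramble_signs(roi_block, key=2024)  # what an unauthorized viewer decodes
restored = scramble_signs(protected, key=2024)   # trusted manager with the key
```

Sign flipping leaves coefficient magnitudes, and therefore the compressed bit rate, essentially unchanged while visibly scrambling the region, which is why it is a popular choice for compressed-domain protection.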

The thesis includes tests and implementations for all the theoretical proposals, aiming to demonstrate their validity in a real video surveillance scenario. Finally, a chapter with a summary of the advances presented and further work concludes the thesis.

Modeling and applications of the focus cue in conventional digital cameras

Said David Pertuz Arroyo

Candidate: Said David Pertuz Arroyo
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2013-07-17
File: Download

Abstract: The focus of digital cameras plays a fundamental role in both the quality of the acquired images and the perception of the imaged scene. This thesis studies the focus cue in conventional cameras with focus control, such as cellphone cameras, photography cameras, webcams and the like. An in-depth review of the theoretical concepts behind focus in conventional cameras reveals that, despite its usefulness, the widely known thin lens model has several limitations for solving different focus-related problems in computer vision. To overcome these limitations, the focus profile model is introduced as an alternative to classic concepts, such as the near and far limits of the depth of field. The new concepts introduced in this dissertation are exploited to solve diverse focus-related problems, such as efficient image capture, depth estimation, visual cue integration and image fusion. The results obtained through an exhaustive experimental validation demonstrate the applicability of the proposed models.
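
For reference, the classic concepts that the focus profile model is meant to replace can be computed directly from the thin lens equation 1/f = 1/s + 1/v and the standard near/far depth-of-field formulas. The sketch below uses illustrative camera settings (50 mm lens, f/2.8, 0.03 mm circle of confusion, focused at 3 m); the focus profile model itself is not reproduced.

```python
def image_distance(f, s):
    """Thin lens 1/f = 1/s + 1/v, solved for the image distance v (same units as f)."""
    return 1.0 / (1.0 / f - 1.0 / s)


def depth_of_field(f, n, c, s):
    """Classic near/far depth-of-field limits for focal length f, f-number n,
    circle of confusion c and focus distance s (all lengths in mm)."""
    h = f * f / (n * c) + f                      # hyperfocal distance
    near = s * (h - f) / (h + s - 2 * f)
    far = s * (h - f) / (h - s) if s < h else float("inf")
    return near, far


v = image_distance(f=50.0, s=3000.0)                          # about 50.85 mm
near, far = depth_of_field(f=50.0, n=2.8, c=0.03, s=3000.0)   # about 2.73 m to 3.33 m
```

These sharp near/far limits treat everything inside the interval as equally in focus, a simplification that a per-distance focus profile avoids.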

Robust perceptual organization techniques for analysis of color images

Rodrigo Moreno Serrano

Candidate: Rodrigo Moreno Serrano
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2013-07-17
File: Download

Abstract: This thesis focuses on the development of new robust image analysis techniques more closely related to the way the human visual system behaves. One of the pillars of the thesis is the so-called tensor voting technique. This is a robust perceptual organization technique that propagates and aggregates information encoded by means of tensors through a convolution-like process. Its robustness and adaptability have been key reasons for using tensor voting in this thesis. These two properties are verified in the thesis by applying tensor voting to three applications where it had not been applied so far: image structure estimation, edge detection and segmentation of images acquired through stereo vision.
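
The encoding and aggregation described above can be illustrated in 2-D: an oriented element is encoded as the tensor n·nᵀ of its unit normal, votes are summed, and the eigenvalues λ1 ≥ λ2 of the result give a stick saliency (λ1 - λ2, evidence for a curve) and a ball saliency (λ2, evidence for a junction or no preferred orientation). This minimal sketch omits the voting fields (decay with distance and curvature), which are central to the actual technique; the angles are hypothetical.

```python
import math


def stick_tensor(theta):
    """Tensor n n^T for a unit normal at angle theta, stored as (a, b, c) of [[a, b], [b, c]]."""
    nx, ny = math.cos(theta), math.sin(theta)
    return (nx * nx, nx * ny, ny * ny)


def accumulate(tensors):
    """Aggregate votes by summing the tensors component-wise."""
    return tuple(sum(t[i] for t in tensors) for i in range(3))


def saliencies(t):
    """Eigenvalues l1 >= l2 of a symmetric 2x2 tensor -> (stick, ball) saliency."""
    a, b, c = t
    mean = (a + c) / 2.0
    radius = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    l1, l2 = mean + radius, mean - radius
    return l1 - l2, l2


aligned = accumulate([stick_tensor(0.3)] * 4)                          # consistent orientations
crossing = accumulate([stick_tensor(0.0), stick_tensor(math.pi / 2)])  # orthogonal votes
stick_a, ball_a = saliencies(aligned)
stick_c, ball_c = saliencies(crossing)
```

Four consistent votes reinforce each other into a strong stick saliency of 4, whereas two orthogonal votes cancel orientation and produce pure ball saliency, which is the kind of evidence aggregation that makes the technique robust to outliers.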

The most important drawback of tensor voting is that its usual implementations are highly time-consuming. To address this, the thesis proposes two new efficient implementations of tensor voting, both derived from an in-depth analysis of this technique.

Despite its adaptability, this thesis shows that the original formulation of tensor voting (hereafter, classical tensor voting) is not adequate for some applications, since the hypotheses on which it is based are not suitable for all applications. This is particularly true for color image denoising. Thus, this thesis shows that, more than a method, tensor voting can be thought of as a methodology in which the encoding and voting processes can be tailored to every specific application while maintaining the spirit of tensor voting.

Following this reasoning, this thesis proposes a unified framework for both image denoising and robust edge detection. This framework is an extension of classical tensor voting in which both color and edginess (the likelihood of finding an edge at every pixel of the image) are encoded through tensors, and in which the voting process takes into account a set of plausible perceptual criteria related to the way the human visual system processes visual information. Recent advances in the perception of color have been essential for designing such a voting process.

This new approach has been found effective, since it yields excellent results for both applications. In particular, the new method applied to image denoising outperforms other state-of-the-art methods on real noise. This makes it more suitable for real applications, in which an image denoiser is indeed required. In addition, the method applied to edge detection yields more robust results than state-of-the-art techniques and has competitive performance in recall, discriminability, precision and false alarm rejection.

Moreover, this thesis shows how the results of this new framework can be combined with other techniques to tackle the problem of robust color image segmentation. The tensors obtained by applying the new framework are used to classify pixels as likely homogeneous or likely inhomogeneous. Those pixels are then sequentially segmented through a variation of an efficient graph-based image segmentation algorithm. Experiments show that the proposed segmentation algorithm yields better scores than state-of-the-art techniques in three of the five evaluation metrics applied, at a competitive computational cost.
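
The segmentation step relies on a variation of an efficient graph-based algorithm. Assuming the base algorithm is the Felzenszwalb-Huttenlocher method, its core merging rule can be sketched on a 1-D signal: two components are merged when the edge between them is no heavier than either component's internal variation plus a size-dependent tolerance k/|C|. The tensor-based pixel classification and the sequential ordering used in the thesis are not reproduced, and the signal and `k` are illustrative.

```python
def segment_1d(values, k):
    """Felzenszwalb-Huttenlocher style merging on a 1-D signal (sketch).

    Edges join neighbouring samples with weight |difference|. Components C1, C2
    are merged when the edge weight is at most min(Int(C) + k / |C|) over both.
    """
    n = len(values)
    parent = list(range(n))
    size = [1] * n
    internal = [0.0] * n            # largest edge weight inside each component

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted((abs(values[i + 1] - values[i]), i, i + 1) for i in range(n - 1))
    for w, a, b in edges:           # process edges from lightest to heaviest
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        if w <= min(internal[ra] + k / size[ra], internal[rb] + k / size[rb]):
            if size[ra] < size[rb]:
                ra, rb = rb, ra
            parent[rb] = ra
            size[ra] += size[rb]
            internal[ra] = max(internal[ra], internal[rb], w)
    return [find(i) for i in range(n)]


labels = segment_1d([0, 0, 1, 10, 11, 10], k=2.0)   # two plateaus -> two segments
```

Small intensity fluctuations inside each plateau are absorbed, while the large jump between plateaus exceeds both components' tolerance and becomes a segment boundary.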

This thesis also proposes new evaluation techniques in the scope of image processing. First, two new metrics are proposed in the field of image denoising: one measures how well an algorithm preserves edges, and the other measures how well a method avoids introducing undesirable artifacts. Second, a new methodology for assessing edge detectors that avoids possible bias introduced by post-processing is proposed. It consists of five new metrics for assessing recall, discriminability, precision, false alarm rejection and robustness. Finally, two new non-parametric metrics are proposed for estimating the degree of over- and under-segmentation yielded by image segmentation algorithms.

Supervised and unsupervised segmentation of textured images by efficient multi-level pattern classification

Jaime Christian Meléndez Rodríguez

Candidate: Jaime Christian Meléndez Rodríguez
PhD Advisor: Dr. Domènec Puig
PhD Advisor: Dr. Miguel Ángel García
Date of defense: 2010-10-08
File: Download

Abstract: This thesis proposes new, efficient methodologies for supervised and unsupervised image segmentation based on texture information. For the supervised case, a technique for pixel classification based on a multi-level strategy that iteratively refines the resulting segmentation is proposed. This strategy utilizes pattern recognition methods based on prototypes (determined by clustering algorithms) and support vector machines. In order to obtain the best performance, an algorithm for automatic parameter selection and methods to reduce the computational cost associated with the segmentation process are also included. For the unsupervised case, the previous methodology is adapted by means of an initial pattern discovery stage, which allows transforming the original unsupervised problem into a supervised one. Several sets of experiments considering a wide variety of images are carried out in order to validate the developed techniques.
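
The prototype idea mentioned above can be sketched as follows: cluster each class's training features to obtain a few prototypes, then label a new pixel by its nearest prototype. This is a simplified assumption for illustration only; the thesis additionally uses support vector machines, multi-level refinement and automatic parameter selection, none of which appear here. The 2-D "texture features" and the deterministic k-means initialization are hypothetical.

```python
def dist2(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic initialization (the first k points)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: dist2(p, centers[j]))].append(p)
        for j, g in enumerate(groups):
            if g:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(g) for col in zip(*g)]
    return [tuple(c) for c in centers]


def nearest_prototype(x, prototypes):
    """Label of the closest prototype; prototypes is a list of (vector, label) pairs."""
    return min(prototypes, key=lambda pl: dist2(x, pl[0]))[1]


# Hypothetical 2-D texture features for two classes, each with two sub-modes.
train = {
    "smooth": [(0.0, 0.0), (0.2, 0.1), (5.0, 0.0), (5.1, 0.2)],
    "rough": [(0.0, 5.0), (0.1, 5.2), (5.0, 5.0), (4.9, 5.1)],
}
prototypes = [(c, label) for label, pts in train.items() for c in kmeans(pts, k=2)]
```

Using several prototypes per class lets a multi-modal texture class be represented compactly, and classifying against a handful of prototypes instead of every training sample is one way to keep the per-pixel cost low.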