
Publications


The publications of our faculty researchers are available on the HAL platform:


The thesis publications of LTCI PhD graduates are available on the HAL platform:


Find the publications listed in the HAL open archive by year:

2020

  • Description de la méthode SCRUM à travers deux expériences en entreprise
    • Memmi Gérard
    , 2020. Two experiences in American start-ups serve as the basis for describing and discussing the Scrum method. The start-up context makes the questions of allocating scarce resources and of strictly meeting deadlines even more critical. After introducing the Agile manifesto, the Scrum process is described. The now-classical concept of backlogs, together with the other Scrum artefacts, is integrated with more traditional product-development artefacts in order to scale up. The sprint concept is slightly modified to better fit an environment where resources are scarce. Product release management is then discussed, in particular for major and critical versions of the product, which require organising and coordinating several Scrum teams. This aspect of the method is essential, at least for the two experiences considered. A set of fundamental questions must be resolved for the Scrum method to be adopted by the development teams and to succeed at scale. The report ends with several recommendations for improving product-development productivity in a context similar to that of our two start-ups.
  • Juger du Beau avec subjectivité : le défi de l’esthétique computationnelle
    • Maître Henri
    ISTE OpenScience, 2020, 4 (4). Artificial-intelligence techniques aimed at automatically assessing the aesthetic quality of a photograph have received considerable attention in recent years and can claim promising performance. Most of them, however, suffer from limitations stemming from their underlying paradigm, borrowed from Platonic aesthetics, which attributes all criteria of beauty to the beautiful object or person. Given the very large body of work devoted to aesthetics over the past 25 centuries, these limitations could have been anticipated. The most frequent criticism is that the judgement made on the image does not take the observer and their subjectivity into account. Unsurprisingly, several very recent works tackle this delicate point with a variety of approaches; we discuss them here. (10.21494/ISTE.OP.2020.0588)
    DOI : 10.21494/ISTE.OP.2020.0588
  • Modélisation géométrique, simplification et visualisation des fibres de la matière blanche du cerveau
    • Mercier Corentin
    , 2020. Tractography data (fibers) obtained from diffusion MRI present several challenges. In this thesis, we propose useful methods and algorithms for the simplification, visualization, and manipulation of these data. We introduce a new multi-resolution representation for tractograms that is faster and geometrically more accurate than existing simplification approaches. We also investigate various geometric representations and focus on moving least squares (MLS) projection with algebraic point set surfaces (APSS), whose complexity we reduce, allowing the use of global kernels for analysis and modeling. A segmentation technique using the multi-resolution representation is presented, achieving better reproducibility than other approaches. Tractograms being massive, we also introduce a compression algorithm that takes advantage of how the data are acquired from diffusion MRI. The algorithm is fast enough to allow the direct use of compressed data for visualization, as they can be decompressed on the fly on the GPU. This research and the obtained results lie at the intersection of Computer Graphics and Medical Data Analysis, paving the way for numerous perspectives.
  • Contributions to RSSI-based geolocation
    • Elgui Kevin
    , 2020. Network-based geolocation has raised a great deal of attention in the context of the Internet of Things. In many situations, low-consumption connected objects must be geolocated without the use of GPS or GSM. Geolocation techniques based on the Received Signal Strength Indicator (RSSI) stand out, because other location techniques may fail in the context of urban environments and/or narrow-band signals. First, we propose methods for the RSSI-based geolocation problem, where the observation is a vector of RSSI values received at the various base stations. In particular, we introduce a semi-parametric Nadaraya-Watson estimator of the likelihood, followed by a maximum a posteriori estimator of the object's position. Experiments demonstrate the interest of the proposed method, both in terms of location estimation performance and of the ability to build radio maps. An alternative approach is given by a k-nearest-neighbors regressor that uses a suitable metric between RSSI vectors. Results also show that the quality of the prediction is highly related to the chosen metric. We therefore turn our attention to the metric learning problem and introduce an original task-driven objective for learning a similarity between pairs of data points. The similarity is chosen as a sum of regression trees and is learned sequentially by means of a modified version of the eXtreme Gradient Boosting algorithm (XGBoost). The last part of the thesis is devoted to a Conditional Independence (CI) hypothesis test, motivated by the fact that many estimators assume the components of the RSSI vectors to be independent given the position; the contribution is, however, provided in a general statistical framework. We introduce the weighted partial copula function for testing conditional independence. The proposed test procedure results from the following ingredients: (i) the test statistic is an explicit Cramér-von Mises transformation of the weighted partial copula, and (ii) the rejection regions are computed using a bootstrap procedure that mimics conditional independence by generating samples. Under the null hypothesis, the weak convergence of the weighted partial copula process is established and endorses the soundness of our approach.
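
    A hedged illustration of the fingerprinting idea behind the k-nearest-neighbors regressor mentioned above (not the thesis code; the function name and the log-distance channel model used to fabricate data are assumptions): the position is predicted as the average position of the k reference fingerprints whose RSSI vectors are closest, here under a plain Euclidean metric, precisely the metric choice the thesis argues should be learned.

import numpy as np

def knn_locate(rssi_query, rssi_db, positions_db, k=5):
    """Estimate a position from an RSSI fingerprint by k-NN regression.

    rssi_query:   (n_stations,) RSSI vector of the object to locate
    rssi_db:      (n_samples, n_stations) reference fingerprints
    positions_db: (n_samples, 2) known (x, y) positions of the fingerprints
    The Euclidean metric below is a placeholder: the thesis shows that the
    choice of metric between RSSI vectors strongly drives accuracy.
    """
    dists = np.linalg.norm(rssi_db - rssi_query, axis=1)
    nearest = np.argsort(dists)[:k]
    return positions_db[nearest].mean(axis=0)

# Toy usage with synthetic fingerprints from a simple log-distance path-loss model
rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(200, 2))                    # known positions
stations = np.array([[0, 0], [100, 0], [0, 100], [100, 100]])     # base stations
d = np.linalg.norm(positions[:, None, :] - stations[None, :, :], axis=2)
rssi = -30 - 30 * np.log10(d + 1) + rng.normal(0, 2, d.shape)
print(knn_locate(rssi[0], rssi[1:], positions[1:], k=5), "vs true", positions[0])
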
  • Solutions aux limites des interrogateurs B-OTDR pour la surveillance d'infrastructures : augmentation de la portée de mesure et décorrélation des paramètres de température et de déformation
    • Clement Pierre
    , 2020. This thesis studies distributed measurement systems based on Brillouin backscattering in an optical fiber, known as B-OTDR, which are sensitive to the temperature and strain of the fiber. Current interrogator solutions make it possible to instrument large infrastructures. However, limits inherent to the underlying physical phenomenon prevent certain specific applications from being addressed. These limits concern the maximum measurement distance reachable by such interrogators, but also the decorrelation of the temperature and strain measurements. In this thesis we therefore sought solutions to these problems. A new optical re-amplification system, based on EDFA technology, was developed. Combined with a B-OTDR system, this solution allowed us to perform a distributed temperature measurement over 150 km of fiber with a repeatability of 1.5 °C. To our knowledge, this is the best result obtained with such a system, and it lets us envisage its deployment for monitoring long-distance energy-transport infrastructures. We then designed a new interrogator, using Brillouin backscattering, that decorrelates temperature and strain measurements on a single optical fiber. This new interrogator, characterized and patented during this thesis, enabled independent temperature and strain measurements on a dedicated cable inserted in a borehole. The measurement results showed both temperature and strain variations on the cable, providing valuable information to the well operator. The new interrogator separates these two parameters with a repeatability better than 1 °C and 20 μm/m over a distance of the order of one kilometer; for distances of the order of ten kilometers, the measurement repeatability is 3 °C and 75 μm/m. This result constitutes the state of the art in temperature/strain separation by B-OTDR. Finally, the work carried out on these two problems led to the development of an interrogator prototype that opens the way to simultaneous measurement of temperature, strain, acoustic vibrations, and hydrostatic pressure, offering interesting perspectives for a complete infrastructure-monitoring solution.
  • Recent Trends in Statistical Analysis of Event Logs for Network-Wide Intrusion Detection
    • Larroche Corentin
    • Mazel Johan
    • Clémençon Stéphan
    , 2020. Event logs are information-rich and complex data that keep track of the activity taking place in a computer network, and can therefore contain traces of malicious activity when an intrusion happens. However, such traces are scarce and buried under considerable volumes of unrelated information, making the use of event logs for intrusion detection a challenging research topic. We review some recent contributions to that area of research, focusing on the application of statistical analysis to various types of event logs collected over a computer network. Emphasis is put on the formalism used to translate the data into a collection of mathematical objects suited to statistical modelling.
  • Networks with mixed-delay constraints
    • Nikbakht Homa
    , 2020. Modern wireless communication networks have to accommodate different types of data traffic with different latency constraints. In particular, delay-sensitive video applications represent an increasing portion of data traffic. Modern networks also have to accommodate high total data rates, which they can accomplish for example with cooperating terminals or with helper relays such as drones. However, cooperation typically introduces additional communication delays, and is thus not applicable to delay-sensitive data traffic. This thesis focuses on interference networks with mixed-delay constraints and on system architectures where neighbouring transmitters and/or neighbouring receivers can cooperate. In such systems, delay-sensitive messages have to be encoded and decoded without further delay and thus cannot benefit from available cooperation links. We propose various coding schemes that can simultaneously accommodate the transmission of both delay-sensitive and delay-tolerant messages. For the proposed schemes we analyze the multiplexing gains (MG) they achieve over Wyner's soft hand-off network, Wyner's symmetric network, the hexagonal network and the sectorized hexagonal network. For Wyner's soft hand-off network and Wyner's symmetric network, we also provide tight information-theoretic converse results and thus establish the exact set of MG pairs that can simultaneously be achieved for delay-sensitive and delay-tolerant data. These results demonstrate that when both transmitters and receivers cooperate and the cooperation rates are sufficiently large, it is possible to achieve the largest MG for delay-sensitive messages without penalizing the maximum sum MG of both delay-sensitive and delay-tolerant messages. In contrast, under our proposed schemes, the sending of delay-sensitive data in hexagonal models decreases the maximum sum MG. This penalty vanishes when we consider the sectorized hexagonal network, where each cell is divided into three non-interfering sectors by employing directional antennas at the base stations. We further propose similar coding schemes for scenarios with different types of random user activity. We specifically consider two setups. In the first setup, each active transmitter always has delay-tolerant data to send and delay-sensitive data arrival is random. In the second setup, both delay-tolerant and delay-sensitive data arrivals are random. The obtained MG regions show that in the first setup, increasing the delay-sensitive MG always decreases the sum MG. In contrast, in the second setup, for certain parameters, the highest sum MG is achieved at maximum delay-sensitive MG, and thus increasing the delay-sensitive MG provides a gain in sum MG. Additionally, we study a cloud radio access network with mixed delay constraints, i.e., where each mobile user can simultaneously send a delay-sensitive and a delay-tolerant stream and only the delay-tolerant data is jointly decoded at the cloud unit. For this network, we derive inner and outer bounds on the capacity region under mixed delay constraints, and we exactly characterize the optimal MG region. At high signal-to-noise ratio (SNR), our results show that for moderate fronthaul capacities, the maximum MG for delay-sensitive messages remains unchanged over a large regime of small and moderate MGs of delay-sensitive messages. The sum MG is thus improved if some of the messages can directly be decoded at the base stations. At moderate SNR, the results show that when the data rate of delay-sensitive messages is small or moderate, the achievable sum rate is constant.
  • Using Digital Sensors to Leverage Chips' Security
    • Ebrahimabadi Mohammad
    • Anik Md Toufiq Hasan
    • Danger Jean-Luc
    • Guilley Sylvain
    • Karimi Naghmeh
    , 2020, pp.1-6. (10.1109/PAINE49178.2020.9337730)
    DOI : 10.1109/PAINE49178.2020.9337730
  • Characterization of Electromagnetic Fault Injection on a 32-bit Microcontroller Instruction Buffer
    • Trabelsi Oualid
    • Sauvage Laurent
    • Danger Jean-Luc
    , 2020, pp.1-6. (10.1109/AsianHOST51057.2020.9358270)
    DOI : 10.1109/AsianHOST51057.2020.9358270
  • Radio Resource Dimensioning for Low Delay Access in Licensed OFDMA IoT Networks
    • Yu Yi
    • Mroueh Lina
    • Martins Philippe
    • Vivier Guillaume
    • Terré Michel
    Sensors, MDPI, 2020, 20 (24), pp.7173. In this paper, we focus on radio resource planning in the uplink of licensed Orthogonal Frequency Division Multiple Access (OFDMA) based Internet of Things (IoT) networks. The average behavior of the network is considered by assuming that active sensors and collectors are distributed according to independent Poisson Point Processes (PPP) marked by channel randomness. Our objective is to statistically determine the optimal total number of Radio Resources (RRs) required for a typical cell. On the one hand, the allocated bandwidth should be sufficiently large to support the traffic of the devices and to guarantee a low access delay. On the other hand, over-dimensioning is costly from an operator's point of view and induces spectrum wastage. To this end, we propose statistical tools derived from stochastic geometry to evaluate, adjust and adapt the allocated bandwidth according to the network parameters, namely the required Quality of Service (QoS) in terms of rate and access delay, the density of the active sensors, the collector intensities, the antenna configurations and the transmission modes. The optimal total number of RRs required for a typical cell is then calculated by jointly considering the constraints of low access delay, limited power per RR, target data rate and network outage probability. Different types of networks are considered, including Single Input Single Output (SISO) systems, Single Input Multiple Output (SIMO) systems using antenna selection or a Maximum Ratio Combiner (MRC), and Multiuser Multiple Input Multiple Output (MU-MIMO) systems using a Zero-Forcing decoder. (10.3390/s20247173)
    DOI : 10.3390/s20247173
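
    The paper's closed-form stochastic-geometry expressions are not reproduced here; the following Monte Carlo sketch (illustrative parameter values and a hypothetical function name) only conveys the dimensioning question itself: draw the number of active sensors in a cell from a Poisson distribution, as under a PPP, and keep the smallest number of RRs whose outage probability stays below a target.

import numpy as np

rng = np.random.default_rng(1)

def required_rrs(sensor_density, cell_area_km2, bits_per_sensor, rr_rate_bits,
                 outage_target=0.05, n_trials=10_000):
    """Smallest number of radio resources keeping the outage below the target.

    The number of active sensors in the cell is Poisson(density * area), as
    for a PPP; each sensor is assumed to consume ceil(bits / rr_rate) RRs.
    Fading, SIMO and MU-MIMO refinements treated in the paper are ignored.
    """
    lam = sensor_density * cell_area_km2
    n_sensors = rng.poisson(lam, size=n_trials)
    rrs_needed = n_sensors * int(np.ceil(bits_per_sensor / rr_rate_bits))
    for n_rr in range(1, 100_000):
        # outage = probability that the demand exceeds the provisioned RRs
        if np.mean(rrs_needed > n_rr) <= outage_target:
            return n_rr
    raise RuntimeError("no feasible dimensioning in the explored range")

print(required_rrs(sensor_density=500, cell_area_km2=1.0,
                   bits_per_sensor=1000, rr_rate_bits=400))
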
  • Improving parallel performance of ensemble learners for streaming data through data locality with mini-batching
    • Cassales Guilherme Weigert
    • Gomes Heitor Murilo
    • Bifet Albert
    • Pfahringer Bernhard
    • Senger Hermes
    , 2020, pp.138--146. Machine Learning techniques have been employed in virtually all domains in the past few years. New applications demand the ability to cope with dynamic environments such as data streams with transient behavior. Such environments present new requirements, like incrementally processing incoming data instances in a single pass, under both memory and time constraints. Furthermore, prediction models often need to adapt to concept drifts observed in non-stationary data streams. Ensemble learning comprises a class of stream mining algorithms that achieve remarkable prediction performance in this scenario. Implemented as a set of (several) individual component classifiers whose predictions are combined to predict new incoming instances, ensembles are naturally amenable to task parallelism. Despite their relevance, an efficient implementation of ensemble algorithms is still challenging. For example, the dynamic data structures used to model non-stationary data behavior and detect concept drifts cause inefficient memory usage patterns and poor cache performance in multi-core environments. In this paper, we propose a mini-batching strategy that can significantly reduce cache misses and improve the performance of several ensemble algorithms for stream mining in multi-core environments. We assess our strategy on four different state-of-the-art ensemble algorithms, using four widely used machine learning benchmark datasets with varied characteristics. Results on two different hardware platforms show speedups of up to 5X on 8-core processors with ensembles of 100 and 150 learners. The benefits come at the cost of changes in predictive performance. (10.1109/HPCC-SMARTCITY-DSS50907.2020.00018)
    DOI : 10.1109/HPCC-SMARTCITY-DSS50907.2020.00018
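
    A minimal sketch of the mini-batching idea, assuming scikit-learn's SGDClassifier as a stand-in for the stream learners used in the paper (class name, batch size and ensemble size are illustrative): instances are buffered, and each ensemble member then processes a whole mini-batch at a time instead of being touched once per instance, which is the cache-locality argument made above.

import numpy as np
from sklearn.linear_model import SGDClassifier

class MiniBatchEnsemble:
    """Illustrative stream ensemble trained batch-by-batch for cache locality."""

    def __init__(self, n_learners=10, batch_size=64, classes=(0, 1)):
        self.learners = [SGDClassifier() for _ in range(n_learners)]
        self.batch_size = batch_size
        self.classes = np.array(classes)
        self._X, self._y = [], []

    def predict(self, x):
        votes = [clf.predict(x.reshape(1, -1))[0]
                 for clf in self.learners if hasattr(clf, "coef_")]
        return np.bincount(votes).argmax() if votes else self.classes[0]

    def partial_fit(self, x, y):
        self._X.append(x); self._y.append(y)
        if len(self._X) == self.batch_size:                    # flush one mini-batch
            X, Y = np.array(self._X), np.array(self._y)
            for clf in self.learners:                          # each learner sees the
                clf.partial_fit(X, Y, classes=self.classes)    # whole batch at once
            self._X, self._y = [], []

# Toy stream (real ensembles would also randomize each member, e.g. ARF-style)
rng = np.random.default_rng(0)
ens = MiniBatchEnsemble()
for _ in range(1000):
    x = rng.normal(size=4)
    ens.partial_fit(x, int(x.sum() > 0))
print("prediction:", ens.predict(np.array([1.0, 1.0, 1.0, 1.0])))
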
  • Efficient distributed solutions for sharing energy resources at local level: a cooperative game approach
    • Kiedanski Diego
    • Bušić Ana
    • Kofman Daniel
    • Orda Ariel
    , 2020. Local energy generation as well as local energy storage represent key opportunities for the energy transition. Nevertheless, their massive deployment is being delayed, mainly for cost reasons. Sharing resources at the local level enables not only a significant reduction of these costs, but also further optimization of the cost of the energy exchanged with providers external to the local community. A key question that arises while sharing resources is how to distribute the obtained benefits among the various local players that cooperate. In this paper we propose a cooperative game model, where the players are the holders of energy resources (generation and storage); they cooperate in order to reduce their individual electricity costs. We prove that the core of the game is non-empty, i.e., the proposed cooperative game has a stable solution (a distribution of the payoffs among the players) for the case where all players participate in a unique community, and no strict subset of players can obtain a better gain by leaving the community. We propose a formulation of this game, based on the theory of linear production games, which leads us to the two main contributions of this paper. First, we propose an efficient (linear-complexity) centralized algorithm for finding a stable payoff. Second, we provide an efficient distributed algorithm that computes an allocation in the core of the game without requiring the players to share any private information. The distributed algorithm requires the exchange of intermediate solutions among players, and the topology of the network that enables these exchanges is closely related to the performance of the algorithm. We show, by way of simulations, which topologies are best for these communication graphs.
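
    The paper formulates the problem as a linear production game; a classical way to obtain a core payoff in such games (Owen's construction) is to price each player's resources at the optimal dual variables of the grand coalition's linear program. The sketch below illustrates that construction on toy numbers with SciPy; it is neither the paper's centralized nor its distributed algorithm, and the data are invented.

import numpy as np
from scipy.optimize import linprog

# Toy linear production game: each player holds a resource vector b_i and the
# coalition maximizes c @ x subject to A @ x <= sum_i b_i.  Owen's construction
# pays player i the value of her resources at the optimal dual prices.
c = np.array([3.0, 5.0])                  # profit per unit of each product
A = np.array([[1.0, 2.0],                 # resource usage per unit produced
              [3.0, 1.0]])
b = np.array([[4.0, 6.0],                 # player 0's resource endowment
              [2.0, 9.0],                 # player 1's
              [5.0, 3.0]])                # player 2's

res = linprog(-c, A_ub=A, b_ub=b.sum(axis=0), bounds=(0, None), method="highs")
duals = -res.ineqlin.marginals            # shadow prices of the pooled resources
payoffs = b @ duals                       # core allocation a la Owen
print("grand-coalition value:", -res.fun)
print("per-player payoffs   :", payoffs, " sum =", payoffs.sum())
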
  • Review of the Third Summer School on the Practice and Theory of Distributed Computing SPTDC 2020
    • Kuznetsov Petr
    • Aksenov Vitaly
    ACM SIGACT News, Association for Computing Machinery (ACM), 2020, 51 (4), pp.82-84. The third edition of the Summer School on Practice and Theory of Distributed Computing (SPTDC 2020) took place on July 6-9, 2020. Because of the ongoing pandemic, the school was held in a virtual format. Following the strategy of the first and second editions, we invited prominent researchers working on distributed systems to give lectures on their favourite topics and hold virtual discussions with the students. This year the school pursued three major themes: persistent computing, large-scale replicated systems and cryptographic tools in distributed systems. The program featured classes by Alexey Gotsman, Michael Scott, Christian Cachin, Prasad Jayanti, Idit Keidar, Rodrigo Rodrigues and Jing Chen. We had a record number of 150 registered participants: undergraduate and graduate students, as well as industrial researchers and software engineers from all around the globe. (10.1145/3444815.3444828)
    DOI : 10.1145/3444815.3444828
  • Mechanisms and architectures to encourage the massive and efficient use of local renewable energy
    • Kiedanski Diego
    , 2020. To meet carbon-reduction goals in Europe and worldwide, a large number of renewable distributed energy resources (DER) still need to be deployed. Aiming at mobilizing private capital, several plans have been developed to put end customers at the heart of the energy transition, hoping to accelerate the adoption of green energy by increasing its attractiveness and profitability. The proposed models include local energy markets, where households can sell their energy to their neighbors at a higher price than what the government would be willing to pay (but lower than what other customers normally pay); shared investment models, in which consumers own a carbon-free power plant such as a wind turbine or a solar farm and obtain dividends from its production; and collective auto-consumption models, in which several families are 'hidden' behind the same smart meter, allowing them to optimize their aggregated consumption profile and therefore maximize the value of their DER. One of the main objectives of the thesis is to understand these different incentives, as they will play a crucial role in tackling climate change if correctly implemented. To do so, we design a framework, 'local energy trading', that encompasses a large number of incentives. In the context of local energy trading, we study the interactions of prosumers (consumers with generation capabilities) located in the same low-voltage network, possibly behind the same feeder. These prosumers remain connected to the main power grid and keep the option, as they have today, to buy from and sell to their utility company at a fixed price (a flat rate or a Time-of-Use tariff, for example). For these agents to fully benefit from the advantages of local energy trading, we assume that they own appliances (such as batteries) that, without changing their perceived energy demand, enable them to change their net energy demand as seen from outside their homes. Modeled as rational utility maximizers, prosumers schedule their batteries to decrease the cost associated with their net energy demand (their perceived demand remaining unchanged). In the first part of the thesis, we investigate competitive models in which prosumers sell their surplus to their neighbors via a local energy market. We analyze different strategies that players could use to participate in these markets and their impact on the normal operation of the power grid and on the Distribution System Operator. In this regard, we show that sequential markets can pose a problem to the system, and we propose a new market mechanism that exploits domain knowledge to increase the efficiency of the local trades. In the second part of the thesis, we delve into incentives that can be implemented through cooperation. We use cooperative game theory to model the shared investment in energy storage and photovoltaic panels (PV) by a group of prosumers. For the studied model we show that a stable solution (in the core of the game) exists in which all participants cooperate, and we provide an efficient algorithm to find it. Furthermore, we show that cooperation is stable for participants that already own batteries and PV but prefer to operate them in coordination to increase their value, effectively implementing collective auto-consumption. Finally, we demonstrate how to integrate both models, the shared investment and the cooperative control of existing resources, into a single cooperative framework that also enjoys the existence of stable outcomes. For this latter model, we propose to decouple the return on investment (ROI) into the ROI produced by the investment in hardware and the ROI obtained from cooperation itself. By doing so, we can offer the former profit to external investors to raise the required capital (although nothing forbids the members of the coalition from contributing) and the latter to the actual consumers.
  • Secure authentication protocol for Internet of Things
    • Fayad Achraf
    , 2020. The interconnection of private resources over public infrastructure, user mobility and the emergence of new technologies (vehicular networks, sensor networks, Internet of Things, etc.) have added new requirements in terms of security on both the server side and the client side. Examples include processing time, mutual authentication, client participation in the choice of security settings and protection against traffic analysis. The Internet of Things (IoT) is in widespread use and its applications cover many aspects of today's life, which results in a huge and continuously increasing number of objects distributed everywhere. Security is without doubt the element that will improve and strengthen the acceptability of IoT, especially as this large-scale deployment of IoT systems will attract the appetite of attackers. The cyber-attacks that currently operate on traditional networks will be projected onto the Internet of Things. Security is therefore critical in this context given the underlying stakes; in particular, authentication is of critical importance given the impact that malicious nodes within IoT systems can have on the overall system. The research work in this thesis aims to advance the literature on IoT authentication by proposing three authentication schemes that satisfy the needs of IoT systems in terms of security and performance, while taking practical deployment-related concerns into consideration. One-Time Password (OTP) is an authentication scheme that represents a promising solution for IoT and smart-city environments. This work extends the OTP principle and proposes a new approach to generate OTPs based on Elliptic Curve Cryptography (ECC) and isogenies to guarantee the security of such a protocol. The performance results obtained demonstrate the efficiency and effectiveness of our approach in terms of security and performance. We also rely on blockchains to propose two authentication solutions: first, a simple and lightweight blockchain-based authentication scheme for IoT systems based on Ethereum, and second, an adaptive blockchain-based authentication and authorization approach for IoT use cases. We provide a real implementation of our proposed solutions, and the extensive evaluation clearly shows the ability of our schemes to meet the different security requirements at a lightweight performance cost.
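
    For readers unfamiliar with the OTP principle mentioned above, here is a hedged sketch of the classical HMAC-based, time-based OTP (RFC 6238 flavour); the thesis instead derives the one-time value from elliptic-curve and isogeny computations, which are not shown here.

import hmac, hashlib, struct, time

def totp(secret: bytes, timestep: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (classical HMAC construction)."""
    counter = int(time.time()) // timestep             # both sides share the clock
    msg = struct.pack(">Q", counter)
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                         # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

print(totp(b"shared-device-secret"))
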
  • Spectral Approach to Process the (Multivariate) High-Order Template Attack against Any Masking Scheme
    • Ouladj Maamar
    • Guilley Sylvain
    • Guillot Philippe
    • Mokrane Farid
    Journal of Cryptographic Engineering, Springer, 2020. Cryptographic software is particularly vulnerable to side-channel attacks when programmed in embedded devices. Indeed, the leakage is particularly intense compared to the noise level, making it mandatory for the developer to implement side-channel attack protections. Random masking is a customary option, but in this case the countermeasure must be high-order, meaning that each sensitive variable is split into multiple (at least two) shares. Attacks therefore become computationally challenging. In this paper, we show that high-order template attacks can be expressed in the form of a convolution. This formulation allows for a considerable speed-up in their computation thanks to fast Fourier transforms. To further speed up the attack, we also provide a multi-threaded implementation of this approach. This strategy naturally applies to template attacks where the leakage of each share is multivariate. We show that it can be adapted to several masking schemes, according to the way the splitting is realized. This technique allows us to validate multiple very high-order attacks (of order up to some tens). In particular, it revealed a non-trivial flaw (hard to detect otherwise) in a multivariate extension of the DSM masking, and subsequently allowed us to fix it and validate its rationale.
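
    A minimal numerical illustration of the convolution observation: for additive masking modulo 256, combining two per-share likelihood tables is a cyclic convolution, which an FFT computes in O(n log n) instead of O(n^2). The toy tables and the function name are assumptions, and Boolean masking would call for a Walsh-Hadamard transform instead of the classical FFT.

import numpy as np

def combine_shares_fft(p_share0, p_share1):
    """Cyclic convolution of two share likelihood tables via the FFT.

    For x = s0 + s1 mod 256, the likelihood of the unmasked value x is
    sum_s p0[s] * p1[(x - s) mod 256], i.e. a circular convolution.
    """
    return np.real(np.fft.ifft(np.fft.fft(p_share0) * np.fft.fft(p_share1)))

rng = np.random.default_rng(0)
p0 = rng.random(256); p0 /= p0.sum()        # toy per-share likelihoods
p1 = rng.random(256); p1 /= p1.sum()

fast = combine_shares_fft(p0, p1)
naive = np.array([sum(p0[s] * p1[(x - s) % 256] for s in range(256))
                  for x in range(256)])
print(np.allclose(fast, naive))             # True: both give the same table
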
  • Statistical control of sparse models in high dimension
    • Chevalier Jérôme-Alexis
    , 2020. In this thesis, we focus on the multivariate inference problem in the context of high-dimensional structured data. More precisely, given a set of explanatory variables (features) and a target, we aim at recovering the features that are predictive conditionally on the others, i.e., at recovering the support of a linear predictive model. We concentrate on methods that come with statistical guarantees, since we want to control the occurrence of false discoveries. This is relevant to inference problems on high-resolution images, where one aims at pixel- or voxel-level analysis (e.g., in neuroimaging or astronomy), but also to other settings where features have a spatial structure (e.g., in genomics). In such settings, existing procedures are not helpful for support recovery since they lack power and are generally not tractable; the problem is hard both from the statistical modeling point of view and from a computational perspective. Feature values typically reflect the underlying spatial structure, which can thus be leveraged for inference. For example, in neuroimaging, a brain image has a 3D representation and a given voxel is highly correlated with its neighbors. We notably propose the ensemble of clustered desparsified Lasso (ecd-Lasso) estimator, which combines three steps: i) a spatially constrained clustering procedure that reduces the problem dimension while taking the data structure into account, ii) the desparsified Lasso (d-Lasso) statistical inference procedure, which is tractable on reduced versions of the original problem, and iii) an ensembling method that aggregates the solutions of different compressed versions of the problem to avoid relying on a single arbitrary clustering choice. We also consider new ways to control the occurrence of false discoveries with a given spatial tolerance, a control well adapted to spatially structured data. In this work, we focus on neuroimaging datasets, but the methods we present can be adapted to other fields that share similar setups.
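
    A hedged toy version of the three-step pipeline (clustering, sparse inference on the reduced problem, ensembling over clusterings), written with scikit-learn. A plain LassoCV stands in for the desparsified Lasso, so this sketch returns importance scores rather than the calibrated p-values of ecd-Lasso, and the clustering below omits the spatial connectivity constraint used in the thesis.

import numpy as np
from sklearn.cluster import FeatureAgglomeration
from sklearn.linear_model import LassoCV

def clustered_lasso_scores(X, y, n_clusters=50, n_bootstraps=10, seed=0):
    """(i) group correlated features into clusters, (ii) fit a sparse linear
    model on the cluster averages, (iii) map coefficients back to features and
    average over clusterings obtained from bootstrap samples."""
    rng = np.random.default_rng(seed)
    scores = np.zeros(X.shape[1])
    for _ in range(n_bootstraps):
        idx = rng.choice(len(X), size=len(X), replace=True)
        agglo = FeatureAgglomeration(n_clusters=n_clusters).fit(X[idx])
        Xc = agglo.transform(X[idx])                        # reduced problem
        coef = LassoCV(cv=3).fit(Xc, y[idx]).coef_
        scores += np.abs(agglo.inverse_transform(coef.reshape(1, -1)).ravel())
    return scores / n_bootstraps

# Toy data: 500 features, only the first 10 are truly predictive
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 500))
y = X[:, :10].sum(axis=1) + rng.normal(scale=0.5, size=100)
print(np.argsort(clustered_lasso_scores(X, y))[-10:])       # top-scored features
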
  • Streaming Time Series Forecasting using Multi-Target Regression with Dynamic Ensemble Selection
    • Boulegane Dihia
    • Bifet Albert
    • Elghazel Haytham
    • Madhusudan Giyyarpuram
    , 2020, pp.2170--2179. In mining temporal data streams, Dynamic Ensemble Selection (DES) has emerged as one of the most promising ensemble approaches, based on the assumption that each member of the ensemble is an expert in some local region of the stream. The aim is to select on the fly, for a given test instance x, a subset of experts from a pool of various models. To this end, meta-learning has been widely studied to predict the performance of each base model, select the best ones accordingly, and combine their outputs to compute the final prediction. However, most existing selection methods for time series forecasting on data streams do not handle dependencies between models, and may therefore miss useful insights. In this paper, we propose a novel approach that harnesses the potential dependencies within base models' behavior, based on Incremental Multi-Target Regression (MTR), to achieve Dynamic Ensemble Selection. We show that explicitly considering models' dependencies improves overall performance. This work is the first to use incremental MTR for learning the behavior of each component of an ensemble of forecasters on data streams. Finally, we conduct an extensive experimental study to compare the performance of the proposed methods against state-of-the-art approaches. (10.1109/BIGDATA50022.2020.9378264)
    DOI : 10.1109/BIGDATA50022.2020.9378264
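
    A minimal batch sketch of meta-learning-driven dynamic selection (illustrative data and model choices, not the paper's incremental setup): a multi-output regressor predicts every base forecaster's absolute error on the incoming instance, and the forecaster with the lowest predicted error is selected.

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.default_rng(0)

# Toy one-step-ahead forecasting task with three lag features
t = np.arange(600, dtype=float)
series = np.sin(0.1 * t) + 0.1 * rng.normal(size=t.size)
X = np.column_stack([series[i:i - 3] for i in range(3)])    # lags t-3 .. t-1
y = series[3:]

# Pool of base forecasters (hypothetical choices)
pool = [LinearRegression().fit(X[:400], y[:400]),
        RandomForestRegressor(n_estimators=50, random_state=0).fit(X[:400], y[:400])]

# Meta-learner: predict every base model's absolute error from the features
# (the paper learns this incrementally with a multi-target regressor).
errors = np.column_stack([np.abs(m.predict(X[:400]) - y[:400]) for m in pool])
meta = MultiOutputRegressor(RandomForestRegressor(n_estimators=50, random_state=0))
meta.fit(X[:400], errors)

# Dynamic selection: keep the model with the lowest predicted error per instance
pred_err = meta.predict(X[400:])
forecasts = np.column_stack([m.predict(X[400:]) for m in pool])
des_pred = forecasts[np.arange(len(pred_err)), pred_err.argmin(axis=1)]
print("DES MAE:", np.mean(np.abs(des_pred - y[400:])))
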
  • Random walk on simplicial complexes
    • Zhang Zhihan
    , 2020. The notion of the Laplacian of a graph can be generalized to simplicial complexes and hypergraphs, and contains information on the topology of these structures. In the first part of this thesis, we define a new Markov chain on simplicial complexes. For a given degree k of simplices, the state space is not the set of k-simplices, as in previous papers on this subject, but rather the set of k-chains or k-cochains. This new framework is the natural generalization of the canonical Markov chains on graphs. We show that the generator of our Markov chain is related to the upper Laplacian defined in algebraic topology for discrete structures. We establish several key properties of this new process and show that, when the simplicial complexes under scrutiny are a sequence of ever finer triangulations of the flat torus, the Markov chains tend to a differential-form-valued continuous process. In the second part of the thesis, we explore some applications of this random walk: random-walk-based hole detection and simplicial-complex kernels. For hole detection, we introduce an algorithm to simulate the cycle-valued random walk (k = 1) on a simplicial complex with holes. For simplicial-complex kernels, we extend the definition of random-walk-based graph kernels in order to measure the similarity between two simplicial complexes.
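
    For reference, the upper Laplacian mentioned above is the "up" part of the standard combinatorial Hodge Laplacian acting on k-chains (standard notation, which may differ from the thesis's exact conventions), with \partial_k the k-th boundary operator in matrix form:

L_k \;=\; \underbrace{\partial_k^{\mathsf{T}}\,\partial_k}_{L_k^{\mathrm{down}}}
      \;+\; \underbrace{\partial_{k+1}\,\partial_{k+1}^{\mathsf{T}}}_{L_k^{\mathrm{up}}},
\qquad
L_0 \;=\; \partial_1\,\partial_1^{\mathsf{T}} \;=\; D - A \quad \text{(the graph Laplacian).}
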
  • AutoML for Stream k-Nearest Neighbors Classification
    • Bahri Maroua
    • Veloso Bruno
    • Bifet Albert
    • Gama João
    , 2020, pp.597--602. The last few decades have witnessed a significant evolution of technology in different domains, changing the way the world operates and leading to an overwhelming amount of data generated in an open-ended way as streams. Over the past years, we have observed the development of several machine learning algorithms to process big data streams. However, the accuracy of these algorithms is very sensitive to their hyper-parameters, which require expertise and extensive trials to tune. Another relevant aspect is the high dimensionality of data, which can degrade computational performance. To cope with these issues, this paper proposes a stream k-nearest neighbors (kNN) algorithm that applies an internal dimension reduction to the stream in order to reduce resource usage, together with an automatic monitoring system that dynamically tunes the configuration of the kNN algorithm and the output dimension size for big data streams. Experiments over a wide range of datasets show that the predictive and computational performance of the kNN algorithm is improved. (10.1109/BIGDATA50022.2020.9378396)
    DOI : 10.1109/BIGDATA50022.2020.9378396
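
    A hedged sketch of the two ingredients named in the abstract, with a fixed Gaussian random projection standing in for the internal dimension reduction in front of a sliding-window kNN; the paper's automatic hyper-parameter monitoring is deliberately omitted, and all names and sizes are illustrative.

import numpy as np
from collections import deque

class ProjectedStreamKNN:
    """Sliding-window kNN on randomly projected instances (illustrative).

    A fixed Gaussian random projection maps each incoming instance from d_in
    to d_out dimensions before it enters the window, cutting the memory and
    distance-computation cost of kNN. The paper additionally tunes k, the
    window size and the output dimension on the fly; that tuner is omitted."""

    def __init__(self, d_in, d_out=8, k=5, window=500, seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.normal(size=(d_in, d_out)) / np.sqrt(d_out)
        self.k, self.window = k, deque(maxlen=window)

    def partial_fit(self, x, y):
        self.window.append((x @ self.P, y))

    def predict(self, x):
        z = x @ self.P
        Z = np.array([w for w, _ in self.window])
        labels = np.array([l for _, l in self.window])
        nearest = np.argsort(np.linalg.norm(Z - z, axis=1))[:self.k]
        return np.bincount(labels[nearest]).argmax()

# Prequential (test-then-train) loop on a toy 50-dimensional stream
rng = np.random.default_rng(1)
clf, correct = ProjectedStreamKNN(d_in=50), 0
for t in range(2000):
    x = rng.normal(size=50); y = int(x[0] > 0)
    if t > 50:
        correct += int(clf.predict(x) == y)
    clf.partial_fit(x, y)
print("prequential accuracy:", correct / (2000 - 51))
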
  • Low-power and radiation resilient approaches for spectrum sensing
    • Maciel de Paiva Junior Nilson
    , 2020. The advancement of technology has enabled a great increase in the number of users and in the amount of information to be transmitted. In recent years, the demand for high download rates, massive connectivity, low latency and energy efficiency has increased, mainly due to the popularization of IoT devices and the introduction of Industry 4.0. This has led to a significant increase in demand for frequency spectrum to accommodate new services or to improve existing ones. One alternative to deal with this problem is the use of cognitive radios (CRs), which are able to sense the spectrum and determine which bands are not currently being used. Among the various challenges related to CR, spectrum sensing is one of the most important and one of the primary functions of these radios. Wideband spectrum sensing presents several challenges, including antennas and the processing of large amounts of data. However, in many situations the spectrum can be considered sparse, allowing the use of compressive sensing (CS) to reduce the number of samples required and thereby the processing resources. In terms of hardware, the use of CS translates into analog-to-information converters (AICs) instead of analog-to-digital converters (ADCs) with high sampling rates. Furthermore, it is interesting to implement low-power devices. Downscaling transistors to nanometer sizes helps to reduce consumption and area, but other alternatives have been studied to decrease leakage power; among them, the Magnetic Tunnel Junction (MTJ) is very promising. In addition, downscaling transistors makes circuits more sensitive to Single Event Transients (SET), and although MTJs are more robust to radiation than transistors, it is necessary to study the extent of this impact and how to reduce it. In this context, this thesis focuses on the analysis of SET effects and on MTJ applications that can be used in an AIC to perform wideband spectrum sensing. The main contributions of this thesis are the analysis of SET effects in a comparator, which is one of the main components of an ADC, the analysis of SET effects in MTJ structures, and the proposal of an MTJ-based ADC that can be used in an AIC to perform wideband spectrum sensing.
  • C-SMOTE: Continuous Synthetic Minority Oversampling for Evolving Data Streams
    • Bernardo Alessio
    • Gomes Heitor Murilo
    • Montiel Jacob
    • Pfahringer Bernhard
    • Bifet Albert
    • Valle Emanuele Della
    , 2020, pp.483--492. Streaming Machine Learning (SML) studies single-pass learning algorithms that update their models one data item at a time, given an unbounded and often non-stationary flow of data (i.e., in the presence of concept drift). Online class imbalance learning is a branch of SML that combines the challenges of both class imbalance and concept drift. In this paper, we investigate the binary classification problem of rebalancing an imbalanced stream of data in the presence of concept drift, accessing one sample at a time. We propose Continuous Synthetic Minority Oversampling Technique (C-SMOTE), a novel rebalancing meta-strategy to pipeline with SML classification algorithms. C-SMOTE is inspired by the popular SMOTE algorithm but operates continuously. We benchmark C-SMOTE pipelines on ten different groups of data streams and bring empirical evidence that models learnt with C-SMOTE pipelines outperform models trained on imbalanced data streams, without losing the ability to deal with concept drift. Moreover, we show that they outperform other stream balancing techniques from the literature. (10.1109/BIGDATA50022.2020.9377768)
    DOI : 10.1109/BIGDATA50022.2020.9377768
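
    The sketch below shows only the generic SMOTE interpolation step applied continuously over a window of recent minority-class samples (synthetic points drawn on the segment between a minority instance and one of its minority neighbours); it illustrates the principle, not the full C-SMOTE meta-strategy with its drift handling, and all names and numbers are assumptions.

import numpy as np
from collections import deque

class ContinuousSMOTE:
    """Window-based, continuously applied SMOTE step (illustrative)."""

    def __init__(self, k=5, window=200, n_synthetic=3, seed=0):
        self.k, self.n_synthetic = k, n_synthetic
        self.minority = deque(maxlen=window)
        self.rng = np.random.default_rng(seed)

    def augment(self, x):
        """Store a new minority sample and return synthetic ones."""
        self.minority.append(x)
        if len(self.minority) <= self.k:
            return []
        M = np.array(self.minority)
        neigh = np.argsort(np.linalg.norm(M - x, axis=1))[1:self.k + 1]
        return [x + self.rng.random() * (M[self.rng.choice(neigh)] - x)
                for _ in range(self.n_synthetic)]            # points on the segment

# Toy usage: rebalance a stream where class 1 is rare (~5%)
rng = np.random.default_rng(1)
smote, train_X, train_y = ContinuousSMOTE(), [], []
for _ in range(1000):
    y = int(rng.random() < 0.05)
    x = rng.normal(loc=3 * y, size=2)
    train_X.append(x); train_y.append(y)
    if y == 1:
        for s in smote.augment(x):                           # synthetic minority samples
            train_X.append(s); train_y.append(1)
print("minority share after rebalancing:", round(np.mean(train_y), 2))
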
  • Encoding the latent posterior of Bayesian Neural Networks for uncertainty quantification
    • Franchi Gianni
    • Bursuc Andrei
    • Aldea Emanuel
    • Dubuisson Séverine
    • Bloch Isabelle
    , 2020. Bayesian Neural Networks (BNNs) have long been considered an ideal yet unscalable solution for improving the robustness and predictive uncertainty of deep neural networks. While they could capture the posterior distribution of the network parameters more accurately, most BNN approaches are either limited to small networks or rely on constraining assumptions, e.g., parameter independence. These drawbacks have favored the prominence of simple but computationally heavy approaches such as Deep Ensembles, whose training and testing costs increase linearly with the number of networks. In this work we aim for efficient deep BNNs amenable to complex computer vision architectures (e.g., ResNet50, DeepLabV3+) and tasks (e.g., semantic segmentation), with fewer assumptions on the parameters. We achieve this by leveraging variational autoencoders (VAEs) to learn the interaction and the latent distribution of the parameters at each network layer. Our approach, Latent-Posterior BNN (LP-BNN), is compatible with the recent BatchEnsemble method, leading to ensembles that are highly efficient in terms of both computation and memory during training and testing. LP-BNNs attain competitive results across multiple metrics on several challenging benchmarks for image classification, semantic segmentation and out-of-distribution detection.
  • Decoding algorithms for lattices
    • Corlay Vincent
    , 2020. This thesis discusses two problems related to lattices, an old one and a new one. Both are lattice decoding problems: namely, given a point in space, find the closest lattice point. The first problem is related to channel coding in moderate dimensions. While efficient lattice schemes exist in low dimensions (n < 30) and high dimensions (n > 1000), this is not the case for intermediate dimensions. We investigate the decoding of interesting lattices in these intermediate dimensions and introduce new families of lattices obtained by recursively applying parity checks. These families include famous lattices, such as the Barnes-Wall lattices and the Leech and Nebe lattices, as well as new parity lattices. We show that all these lattices can be efficiently decoded with an original recursive list decoder. The second problem involves neural networks. Since 2016, countless papers have tried to use deep learning to solve the decoding/detection problem encountered in digital communications. We propose to investigate the complexity of the problem that neural networks should solve. We introduce a new approach to the lattice decoding problem that fits the operations performed by a neural network. This makes it possible to better understand what a neural network can and cannot do within the scope of this problem, and to get hints regarding the best architecture for the neural network. Computer simulations validating our analysis are provided.
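
    As a classical baseline for the closest-lattice-point problem discussed above (and explicitly not the recursive list decoder of the thesis), Babai's nearest-plane algorithm returns an approximate closest lattice vector from a basis:

import numpy as np

def babai_nearest_plane(B, t):
    """Approximate closest lattice vector to t in the lattice spanned by the
    rows of B (Babai's nearest-plane algorithm on the Gram-Schmidt basis)."""
    B = np.asarray(B, dtype=float)
    # Gram-Schmidt orthogonalisation of the basis rows
    Bs = B.copy()
    for i in range(1, len(B)):
        for j in range(i):
            Bs[i] -= (B[i] @ Bs[j]) / (Bs[j] @ Bs[j]) * Bs[j]
    # Walk down the planes, from the last basis vector to the first
    x, v = np.array(t, dtype=float), np.zeros(B.shape[1])
    for i in reversed(range(len(B))):
        c = round((x @ Bs[i]) / (Bs[i] @ Bs[i]))
        x -= c * B[i]
        v += c * B[i]
    return v

# Toy usage on a small 2-D lattice with a skewed basis
B = [[1.0, 0.0], [0.5, 1.0]]
print(babai_nearest_plane(B, [3.3, -1.7]))        # -> [ 3. -2.]
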
  • Modélisation stochastique et physique de la propagation 5G indoor en bandes millimétriques
    • Nassif Georges
    , 2020. 5G introduces new frequency bands, the millimeter waves, which are a promising solution to spectrum exhaustion thanks to the availability of wide bandwidths. However, because of their propagation properties, notably their very short wavelength, this frequency band can have a severe impact on transmission, leading to a new major challenge: indoor coverage. This calls for new studies to answer three fundamental questions: what is the impact of the various environment geometries (apartment, factory, etc.) on indoor millimeter-wave propagation? What is the impact of the various environment materials (concrete, wood, etc.) on indoor millimeter-wave propagation? And which planning methods should be used for the various indoor 5G applications (fixed access, Industry 4.0, etc.)? In this manuscript, we address this problem through a new theoretical framework that combines stochastic modeling of the indoor environment with advanced simulation of physical propagation. This approach is particularly suited to studying 5G millimeter-wave propagation when both the transmitter and the receiver are indoors. The software implementation of this approach, called iGeoStat, generates parameterized typical environments that account for indoor spatial variations, then simulates radio propagation based on the physical interaction between electromagnetic waves and material properties. This framework is not dedicated to a particular environment, material, frequency or use case; it aims to statistically understand the influence of indoor environment parameters on millimeter-wave propagation properties, in particular coverage, SINR and path loss. Implementing iGeoStat raises many computational challenges, which we solve by formulating an adapted link budget and new memory-optimization algorithms. The first simulation results for two major 5G applications (fixed access and Industry 4.0) are compared with measurement data and show the efficiency of iGeoStat in simulating multiple diffusion in realistic environments, within reasonable time and memory resources. iGeoStat generates maps confirming that diffusion potentially has a major impact on indoor millimeter-wave propagation and that appropriate physical modeling is of the utmost importance to generate relevant propagation models. Using iGeoStat, the main propagation parameters are studied in various scenarios, showing that the complex refractive index of indoor materials has a moderate impact on the received power, whereas surface roughness has a major impact and can profoundly modify the measured power profile in the environment. Various acceleration techniques are also presented; simulation results show that the performance of iGeoStat can be further improved, with a reduction of at least 50% in simulation time and memory usage achievable without affecting the physical aspect of the propagation.