Publications

2021

  • Data expression : understanding and supporting alternatives in data analysis processes
    • Liu Jiali
    , 2021. To make sense of data, analysts consider different kinds of alternatives: they explore diverse sets of hypotheses, try out different types of methods, and experiment with a broad space of possible solutions. These alternatives influence each other within a dynamic and complex sensemaking process. Current analytic tools, however, rarely consider such alternatives as an integral part of the analysis, making the process cumbersome and cognitively demanding. Applying various empirical methods and tool designs, we address the following questions: (1) What are alternatives and how do they fit within the sensemaking process? (2) How can tools better support the exploration and management of alternatives? This dissertation contains three parts. Part I explores the role of alternatives through interviews and observations with analysts. Based on the results and our analysis, we contribute characterisations of alternatives and a framework to help describe and reason about them. Part II focuses on supporting alternatives in the context of affinity diagramming for qualitative data analysis. Through interviews with practitioners, combined with our own experience, we propose a design space to characterise the various kinds of alternatives engaged in such a sensemaking process. We further provide a vision and proof-of-concept system, ADQDA, to show how analysts can fluidly transition between alternative analysis phases, methods, and representations, and how they can flexibly appropriate various devices to suit the tasks at hand or to extend the analysis space. Part III discusses alternatives in the context of reuse. We envision a novel reuse technique, "computational transclusion", which maintains various dynamic links between the original and the reused contents (the alternatives) to facilitate tracking and coordinating changes. We built a sandbox system to probe different reuse scenarios and explore the various links between alternatives and their possible reifications in a notebook-like user interface.
  • Longitudinal, large-scale and unbiased Internet measurements
    • Salutari Flavia
    , 2021. Today, a world without the Internet is unimaginable. By interconnecting billions of people worldwide and by offering an uncountable number of services, it is now fully embedded in modern society. Yet, despite technology evolution and development, its pervasiveness and heterogeneity still raise new challenges, such as security concerns, monitoring of the users' Quality of Experience (QoE), and care for transparency and fairness. Accordingly, the goal of this thesis is to shed new light on some of the challenges that have emerged in recent years. In particular, we provide an in-depth analysis of some of the most prominent aspects of the modern Internet. Particular emphasis is placed on the World Wide Web, undoubtedly one of the most popular Internet applications, and on its interaction with machine learning. The first part of this work studies the Quality of Experience of users browsing the Web, with measurements conducted both in the wild and in controlled environments. Our contributions follow with an original analysis of both the subjective user feedback and the objective QoE metrics, showing how hard it is to build accurate supervised data-driven models capable of predicting user satisfaction, along with an in-depth discussion of the multi-modal nature of the subjective user opinions. In the second part of this work, we analyze and discuss the fairness of state-of-the-art transformer-based language models, which are pre-trained on Web-based corpora and which are typically used to solve a wide variety of Natural Language Processing (NLP) tasks. Here, we question whether the sheer size and heterogeneity of the Web guarantee diversity in the models. The core of our contributions lies in measuring the bias embedded in the models, which we discuss from different angles. Finally, the last part of this dissertation addresses the classification of objects generated by machines through some of the simplest state-of-the-art supervised machine learning algorithms. Through a minimally intrusive, robust and lightweight framework, we show that the different behaviors of a field of the IP packet, the IP identification (IP-ID), can be easily classified with a few features of high discriminative power. We finally apply our technique to an Internet-wide census and provide an updated view of the adoption of the different implementations in the Internet.
  • From local hesitations to global impressions of a speaker's feeling of knowing
    • Dinkar Tanvi
    • Biancardi Beatrice
    • Clavel Chloé
    , 2021.
  • Learning to Rank Anomalies: Scalar Performance Criteria and Maximization of Two-Sample Rank Statistics
    • Limnios Myrto
    • Noiry Nathan
    • Clémençon Stéphan
    , 2021, 154. The ability to collect and store ever more massive databases has been accompanied by the need to process them efficiently. In many cases, most observations have the same behavior, while a probably small proportion of these observations are abnormal. Detecting the latter, defined as outliers, is one of the major challenges for machine learning applications (e.g. in fraud detection or in predictive maintenance). In this paper, we propose a methodology addressing the problem of outlier detection, by learning a data-driven scoring function defined on the feature space which reflects the degree of abnormality of the observations. This scoring function is learnt through a well-designed binary classification problem whose empirical criterion takes the form of a two-sample linear rank statistic on which theoretical results are available. We illustrate our methodology with preliminary encouraging numerical experiments. (10.48550/arXiv.2109.09590)
    DOI : 10.48550/arXiv.2109.09590
  • Laser Fault Injection in a 32-bit Microcontroller: from the Flash Interface to the Execution Pipeline
    • Khuat Vanthanh
    • Danger Jean-Luc
    • Dutertre Jean-Max
    , 2021, pp.74-85. (10.1109/FDTC53659.2021.00020)
    DOI : 10.1109/FDTC53659.2021.00020
  • Towards Finding Best Linear Codes for Side-Channel Protections
    • Cheng Wei
    • Liu Yi
    • Guilley Sylvain
    • Rioul Olivier
    , 2021. Side-channel attacks aim at extracting secret keys from cryptographic devices. Randomly masking the implementation is a provable way to protect the secrets against this threat. Recently, various masking schemes have converged to the "code-based masking" philosophy. In code-based masking, different codes allow for different levels of side-channel security. In practice, for a given leakage function, it is important to select the code which enables the best resistance, i.e., which forces the attacker to capture and analyze the largest number of side-channel traces. This paper is a first attempt to address the constructive selection of the optimal codes in the context of side-channel countermeasures, in particular for code-based masking when the device leaks information in the Hamming weight leakage model. We show that the problem is related to the weight enumeration of the extended dual of the masking code. We first present mathematical tools to study those weight enumeration polynomials, and then provide an efficient method to search for good codes, based on a lexicographic sorting of the weight enumeration polynomial from lowest to highest degrees.
  • Mode locking and frequency comb generation by four-wave mixing in a semiconductor quantum-dot active medium
    • Grillot Frederic
    , 2021.
  • Secondary constructions of (non)weakly regular plateaued functions over finite fields
    • Mesnager Sihem
    • Özbudak Ferruh
    • Sinak Ahmet
    Turkish Journal of Mathematics, Scientific and Technical Research Council of Turkey, 2021, 45 (5), pp.2295-2306. (10.3906/mat-2104-5)
    DOI : 10.3906/mat-2104-5
  • IETF Draft, "Secure Element for TLS Version 1.3" https://datatracker.ietf.org/doc/html/draft-urien-tls-se-03
    • Urien Pascal
    , 2021.
  • Perspectives on Advances in Quantum Dot Lasers and Integration with Si Photonic Integrated Circuits
    • Shang Chen
    • Wan Yating
    • Selvidge Jennifer
    • Hughes Eamonn
    • Herrick Robert
    • Mukherjee Kunal
    • Duan Jianan
    • Grillot Frederic
    • Chow Weng
    • Bowers John
    ACS photonics, American Chemical Society, 2021, 8 (9), pp.2555-2566. (10.1021/acsphotonics.1c00707)
    DOI : 10.1021/acsphotonics.1c00707
  • 12-Core Erbium/Ytterbium-Doped Fiber Amplifier for 200G/400G Long-Haul, Metro-Regional, DCI Transmission Applications with ROADM
    • Pincemin Erwan
    • Jauffrit Jeremie
    • Disez Pierre-Yves
    • Loussouarn Yann
    • Le Bouette Claude
    • Kerampran Romain
    • Bordais Sylvain
    • Melin Gilles
    • Taunay Thierry
    • Jaouen Yves
    • Morvan Michel
    , 2021, pp.1-4. A 12-core Er/Yb-doped fiber amplifier with 21-dBm output power per core and 5.3-Watts multimode pump is used here to address various transmission applications with ROADM. 1200-km with 200G DP-QPSK and 300-km with 400G DP-16QAM are achieved in serial configuration at 1550-nm. Parallel 12x100-km transport with 400-ZR+ transceiver is also implemented. (10.1109/ECOC52684.2021.9606073)
    DOI : 10.1109/ECOC52684.2021.9606073
  • DDX Add-On Card: Transforming Any Optical Legacy Network into a Deterministic Infrastructure
    • Benzaoui Nihel
    • Soudais Guillaume
    • Angot Olivier
    • Bigo Sebastien
    , 2021, pp.1-4. We propose a novel slotted, scheduled, and synchronous add-on modular card to deliver data with truly deterministic performance over legacy optical networks. We achieve ultralow 50ns jitter and 25μs latency end-to-end for edge-cloud scenarios. (10.1109/ECOC52684.2021.9605880)
    DOI : 10.1109/ECOC52684.2021.9605880
  • Cellular traffic type recognition and prediction
    • Nguyen Tuan Anh
    • Martins Philippe
    , 2021, pp.1167-1172. (10.1109/PIMRC50174.2021.9569524)
    DOI : 10.1109/PIMRC50174.2021.9569524
  • Towards a Model Checking Tool for Strategy Logic with Simple Goals
    • Malvone Vadim
    • Stranieri Silvia
    , 2021.
  • Per Packet Distributed Monitoring Plane with Nanoseconds Measurements Precision
    • Soudais Guillaume
    • Bigo Sebastien
    • Benzaoui Nihel
    , 2021, pp.1-4. We propose a per-packet distributed monitoring plane for time-sensitive optical networking. Over a packet switched network and using our FPGA-based prototype, we demonstrate latency measurements of 6.4ns precision per hop with an offset of only 1.22µs. (10.1109/ECOC52684.2021.9605806)
    DOI : 10.1109/ECOC52684.2021.9605806
  • UCSL : A Machine Learning Expectation-Maximization framework for Unsupervised Clustering driven by Supervised Learning
    • Louiset Robin
    • Gori Pietro
    • Dufumier Benoit
    • Houenou Josselin
    • Grigis Antoine
    • Duchesnay Edouard
    , 2021. Subtype Discovery consists in finding interpretable and consistent subparts of a dataset, which are also relevant to a certain supervised task. From a mathematical point of view, this can be defined as a clustering task driven by supervised learning in order to uncover subgroups in line with the supervised prediction. In this paper, we propose a general Expectation-Maximization ensemble framework entitled UCSL (Unsupervised Clustering driven by Supervised Learning). Our method is generic: it can integrate any clustering method and can be driven by both binary classification and regression. We propose to construct a non-linear model by merging multiple linear estimators, one per cluster. Each hyperplane is estimated so that it correctly discriminates, or predicts, only one cluster. We use SVC or Logistic Regression for classification and SVR for regression. Furthermore, to perform cluster analysis within a more suitable space, we also propose a dimension-reduction algorithm that projects the data onto an orthonormal space relevant to the supervised task. We analyze the robustness and generalization capability of our algorithm using synthetic and experimental datasets. In particular, we validate its ability to identify suitable consistent sub-types by conducting a psychiatric-diseases cluster analysis with known ground-truth labels. The gain of the proposed method over previous state-of-the-art techniques is about +1.9 points in terms of balanced accuracy. Finally, we make codes and examples available in a scikit-learn-compatible Python package.
  • Studying and Exploiting the Relationship Between Model Accuracy and Explanation Quality
    • Jia Yunzhe
    • Frank Eibe
    • Pfahringer Bernhard
    • Bifet Albert
    • Lim Nick Jin Sean
    , 2021, 12976, pp.699-714. Many explanation methods have been proposed to reveal insights about the internal procedures of black-box models like deep neural networks. Although these methods are able to generate explanations for individual predictions, little research has been conducted to investigate the relationship of model accuracy and explanation quality, or how to use explanations to improve model performance. In this paper, we evaluate explanations using a metric based on area under the ROC curve (AUC), treating expert-provided image annotations as ground-truth explanations, and quantify the correlation between model accuracy and explanation quality when performing image classifications with deep neural networks. The experiments are conducted using two image datasets: the CUB-200-2011 dataset and a Kahikatea dataset that we publish with this paper. For each dataset, we compare and evaluate seven different neural networks with four different explainers in terms of both accuracy and explanation quality. We also investigate how explanation quality evolves as loss metrics change through the training iterations of each model. The experiments suggest a strong correlation between model accuracy and explanation quality. Based on this observation, we demonstrate how explanations can be exploited to benefit the model selection process, even if simply maximising accuracy on test data is the primary goal. (10.1007/978-3-030-86520-7_43)
    DOI : 10.1007/978-3-030-86520-7_43
  • Combine Model Checking and Runtime Verification in Multi-Agent Systems
    • Ferrando Angelo
    • Malvone Vadim
    , 2021.
  • IETF Draft, "Internet of Secure Elements", https://datatracker.ietf.org/doc/html/draft-urien-coinrg-iose-03
    • Urien Pascal
    , 2021.
  • Physics-based Deep Learning
    • Thuerey Nils
    • Holl Philipp
    • Mueller Maximilian
    • Schnell Patrick
    • Trost Felix
    • Um Kiwon
    , 2021. This digital book contains a practical and comprehensive introduction to everything related to deep learning in the context of physical simulations. As much as possible, all topics come with hands-on code examples in the form of Jupyter notebooks to quickly get started. Beyond standard supervised learning from data, we'll look at physical loss constraints, more tightly coupled learning algorithms with differentiable simulations, as well as reinforcement learning and uncertainty modeling. We live in exciting times: these methods have a huge potential to fundamentally change what computer simulations can achieve. (10.48550/arXiv.2109.05237)
    DOI : 10.48550/arXiv.2109.05237
  • Rank Aggregation for Non-stationary Data Streams
    • Irurozki Ekhine
    • Perez Aritz
    • Lobo Jesus
    • del Ser Javier
    , 2021, pp.297-313. The problem of learning over non-stationary ranking streams arises naturally, particularly in recommender systems. The rankings represent the preferences of a population, and the non-stationarity means that the distribution of preferences changes over time. We propose an algorithm that learns the current distribution of ranking in an online manner. The bottleneck of this process is a rank aggregation problem. We propose a generalization of the Borda algorithm for non-stationary ranking streams. As a main result, we bound the minimum number of samples required to output the ground truth with high probability. Besides, we show how the optimal parameters are set. Then, we generalize the whole family of weighted voting rules (the family to which Borda belongs) to situations in which some rankings are more reliable than others. We show that, under mild assumptions, this generalization can solve the problem of rank aggregation over non-stationary data streams. (10.1007/978-3-030-86523-8_18)
    DOI : 10.1007/978-3-030-86523-8_18
  • Literature Classification Data for a Systematic Mapping Study on Multi-Paradigm Modeling for Cyber-Physical Systems
    • Barisic Ankica
    • Cicchetti Antonio
    • Ruchkin Ivan
    • Blouin Dominique
    Data in Brief, Elsevier, 2021.
  • Vehicular traffic analysis based on Bluetooth sensors traces
    • Boudabous Safa
    , 2021. The pervasiveness of personal radio devices and the high penetration rate of these technologies in vehicles have, in recent years, made a strong case for the development of new traffic measurement techniques based on the analysis of the radio access network activity levels. In this thesis, we explore the use of sensor data gathered through Bluetooth (BT) passive scanning. Bluetooth sensors provide a cost-effective, low-impact and easy-to-deploy alternative to conventional techniques. They are suited to mass deployment in urban areas at relatively low investment and maintenance costs. However, the BT indirect detection process may introduce bias and uncertainties that hinder the accuracy of the derived vehicular traffic metrics. In this context, we investigate the capacity to use Bluetooth sensors as a reliable sole data source for intelligent traffic systems in urban areas. Our work focuses on improving the accuracy of the obtained estimations of the traffic flow and the travel speed. The first part of this work concerns the task of vehicular traffic flow quantification from Bluetooth sensor data. We adopted a data-driven approach relying on statistical and machine learning models. We first considered traffic flow estimation in one sensing pose. Then, we proposed a model for network-scale flow estimation. In this contribution, we also introduced the transfer learning problem required to limit the need to acquire labelled training data for each new deployment. In the second part, we focus on the task of estimating the average travel speed. We propose an algorithm that uses the collected data about the quality of the received signal to improve the matching process and weigh individual vehicle speed contributions in calculating the average speed. During this work, we also developed a simulation framework of BT scanning for vehicular traffic monitoring. The simulator allows us to study and identify the factors impacting the probability, for one sensor, of detecting an active BT connection in its detection range, and to generate synthetic training datasets to handle data scarcity.
  • Damped Chirp Mixture Estimation via Nonlinear Bayesian Regression
    • Neri Julian
    • Depalle Philippe
    • Badeau Roland
    , 2021. Estimating mixtures of damped chirp sinusoids in noise is a problem that affects audio analysis, coding, and synthesis applications. Phase-based non-stationary parameter estimators assume that sinusoids can be resolved in the Fourier transform domain, whereas high-resolution methods estimate superimposed components with accuracy close to the theoretical limits, but only for sinusoids with constant frequencies. We present a new method for estimating the parameters of superimposed damped chirps that has an accuracy competitive with existing non-stationary estimators but also has a high resolution, like subspace techniques. After providing the analytical expression for a Gaussian-windowed damped chirp signal's Fourier transform, we propose an efficient variational EM algorithm for nonlinear Bayesian regression that jointly estimates the amplitudes, phases, frequencies, chirp rates, and decay rates of multiple non-stationary components that may be obfuscated under the same local maximum in the frequency spectrum. Quantitative results show that the new method not only has an estimation accuracy that is close to the Cramér-Rao bound, but also a high resolution that outperforms the state-of-the-art.
  • Analysis of Multi-Messages Retransmission Schemes
    • Khreis Alaa
    • Bassi Francesca
    • Ciblat Philippe
    • Duhamel Pierre
    , 2021. The Hybrid Automatic Repeat reQuest (HARQ) protocol enables reliable communications in wireless systems. Usually, several parallel streams are sent in successive timeslots following a time-sharing approach. Recently, multi-layer HARQ has been proposed by superposing packets within a timeslot. In this paper, we evaluate the potential of this multi-layer HARQ by varying some design parameters. We show that a gain in throughput is only obtained at mid-Signal-to-Noise Ratio (SNR). (10.1109/ISWCS49558.2021.9562216)
    DOI : 10.1109/ISWCS49558.2021.9562216