
Publications

 

The publications of our faculty researchers are available on the HAL platform:

 

The PhD theses of LTCI doctoral graduates are available on the HAL platform:

 

Browse the publications in the HAL open archive by year:

2020

  • Detection of Parkinson’s disease from handwriting using deep learning: a comparative study
    • Taleb Catherine
    • Likforman-Sulem Laurence
    • Mokbel Chafic
    • Khachab Maha
    Evolutionary Intelligence, Springer, 2020. (10.1007/s12065-020-00470-0)
    DOI : 10.1007/s12065-020-00470-0
  • All-optical modulation at mid-infrared wavelength with QCLs
    • Spitz Olivier
    • Herdt Andreas
    • Maisons Gregory
    • Carras Mathieu
    • Elsaser Wolfgang
    • Grillot Frederic
    , 2020, pp.1-2. (10.1109/IPC47351.2020.9252250)
    DOI : 10.1109/IPC47351.2020.9252250
  • Regularized SAR Tomography Approaches
    • Budillon Alessandra
    • Denis Loïc
    • Rambour Clément
    • Schirinzi Gilda
    • Tupin Florence
    , 2021. Synthetic Aperture Radar (SAR) tomographic techniques enable the reconstruction of the scene scattering structure along the vertical direction and can provide the temporal evolution of a cloud of reliable points located in 3D space. The use of Generalized Likelihood Ratio Test approaches has been shown to be effective in selecting reliable multiple scatterers. Recently, regularized tomographic methods have been proposed for increasing the density of the recovered scatterers in urban environments. This paper discusses the differences between these two approaches and compares reconstruction results obtained from a stack of TerraSAR-X images over a region of interest located in the city of Paris, France. (10.1109/IGARSS39084.2020.9323807)
    DOI : 10.1109/IGARSS39084.2020.9323807
  • Quantum dot lasers based photonics integrated circuits (invited)
    • Grillot Frederic
    • Duan Jianan
    • Dong Bozhang
    • Huang Heming
    • Liu Songtao
    • Norman Justin C
    • Bowers John E
    , 2020.
  • Nonlinear optical properties of epitaxial quantum dot lasers on silicon (invited)
    • Grillot Frederic
    • Chow Weng
    • Duan Jianan
    • Norman Justin C
    • Liu Songtao
    • Bowers J. E.
    , 2020.
  • Frequency-domain modeling of semiconductor mode-locked lasers (invited)
    • Chow Weng
    • Liu Songtao
    • Norman Justin C
    • Duan Jianan
    • Grillot Frédéric
    • Bowers J. E.
    , 2020.
  • Comparison between multitemporal graph-based classical learning and LSTM model classifications for SITS analysis
    • Chaabane Ferdaous
    • Réjichi Safa
    • Tupin Florence
    , 2020. Very High Resolution (VHR) multispectral Satellite Image Time Series (SITS) enable the production of temporal land cover maps, thanks to the high spatial, temporal and spectral resolution of modern earth observation programs. Besides, statistical learning methods applied to SITS monitoring and analysis have produced relatively efficient semi-automatic classification techniques. It would therefore be natural to think that the use of deep learning methods on SITS would lead to advances comparable to those seen in computer vision. However, when applied to concrete cases, the results are not as convincing. This paper proposes a comparison between a SOTAG (Spatial-Object Temporal Adjacency Graphs) SVM-based spatio-temporal classification approach and a Recurrent Neural Network (RNN) LSTM (Long Short-Term Memory) model trained on historical SITS. The trained LSTM networks are then used to predict new time series data. Both methods produce a spatio-temporal map indicating the temporal profiles of cartographic regions. The proposed approaches are applied to real and simulated SITS data. We demonstrate that both results are comparable despite differences in computational time and algorithmic complexity.
  • Collaborative compression of light rays for distributed Monte Carlo rendering and applications
    • Rousseau Sylvain
    , 2020. This thesis belongs to the field of computer graphics and studies one of its key elements: unit vectors. We propose a new representation space for sets of unit vectors, then present several applications adapting it to different types of data. In the first part, we propose a compression method for unordered sets of unit vectors. This method, named UniQuant, compresses data collaboratively, generating coherence and then exploiting it to change the representation space of the data. It is first exploited in an application that compresses sets of point clouds equipped with normals, enabling on-the-fly compression of the data. We then propose an application to a key element of Monte Carlo rendering: the light ray. The light ray is the basic data structure used to simulate light transport in a 3D virtual scene by building light paths, represented as 3D polylines, connecting the virtual sensor (camera) to the various light sources. The compression is applied in the distributed setting, using an engine built to exploit a set of machines over remote networks. Hardware architectures of this kind have become increasingly popular with the emergence of projects such as SETI@Home. They could easily be extended to exploit the machines in public institutions or companies that are used less than half the time, thereby harnessing otherwise wasted computing power.
The proposed technique uses the multitude of rays available in a distributed engine exploiting light portals to perform a collaborative compression, speeding up data transfers over a non-local network. The compression of directions is extended to that of origins in order to examine the impact of reduced precision on the renderings. We also show that the precision of the direction compression can be correlated with the materials encountered. Finally, we present QFib, an adaptation of UniQuant to another type of data subject to the same kind of mathematical constraints as sets of rays: tractograms. Tractograms are commonly used in neuroscience to visualize areas of neuronal influence in the brain. They allow neurosurgeons to predict the possible effects of an operation, and researchers to better understand how the brain works. Using this type of data is difficult because of its size, which makes it hard to view, process, store or even exchange. The algorithm introduced divides this size by 10 in a few seconds for typical datasets, while guaranteeing a loss below the precision of the MRI scans from which the datasets were obtained.
  • Gender Identification through Handwriting: an Online Approach
    • Cordasco Gennaro
    • Buonanno Michele
    • Faundez-Zanuy Marcos
    • Riviello Maria Teresa
    • Likforman-Sulem Laurence
    • Esposito Anna
    , 2020, pp.000197-000202. The present study was designed to identify a writer's gender through online handwriting and drawing analysis. Two groups of participants - 126 males (mean age 24.65, SD=2.45) and 114 females (mean age 24.51, SD=2.50) - were recruited for the experiment. They were asked to perform seven writing and drawing tasks using a digitizing tablet and a special writing device. Seventeen writing features grouped into five categories were considered. The experiment's results show that the set of considered features enables discrimination between male and female writers by investigating their performance while copying a house drawing (task 2), writing words in capital letters (task 3) and writing a complete sentence in cursive letters (task 7), focusing in particular on the Ductus (number of strokes) and Time categories of writing features. (10.1109/CogInfoCom50765.2020.9237863)
    DOI : 10.1109/CogInfoCom50765.2020.9237863
  • Algorithmic and software contributions to graph mining
    • de Lara Nathan
    , 2020. Since the introduction of Google's PageRank method for Web searches in the late 1990s, graph algorithms have been part of our daily lives. In the mid 2000s, the arrival of social networks amplified this phenomenon, creating new use-cases for these algorithms. Relationships between entities can be of multiple types: user-user symmetric relationships for Facebook or LinkedIn, follower-followee asymmetric ones for Twitter, or even user-content bipartite ones for Netflix or Amazon. They all come with their own challenges and the applications are numerous: centrality calculus for influence measurement, node clustering for knowledge discovery, node classification for recommendation, or embedding for link prediction, to name a few. In the meantime, the context in which graph algorithms are applied has rapidly become more constrained. On the one hand, the increasing size of the datasets, with millions of entities and sometimes billions of relationships, bounds the asymptotic complexity of the algorithms for industrial applications. On the other hand, as these algorithms affect our daily lives, there is a growing demand for explainability and fairness in the domain of artificial intelligence in general. Graph mining is no exception. For example, the European Union has published a set of ethics guidelines for trustworthy AI. This calls for further analysis of the current models and even new ones. This thesis provides specific answers via a novel analysis of not only standard graph algorithms, but also extensions, variants, and original ones. Scalability is taken into account every step of the way. Following what the Scikit-learn project does for standard machine learning, we deem it important to make these algorithms available to as many people as possible and to participate in popularizing graph mining. We have therefore developed an open-source software package, Scikit-network, which implements and documents the algorithms in a simple and efficient way.
With this tool, we cover several areas of graph mining such as graph embedding, clustering, and semi-supervised node classification.
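The centrality computations this thesis covers can be illustrated with a minimal PageRank by power iteration, sketched here in plain Python on a hypothetical three-node graph (the Scikit-network implementation itself is sparse-matrix based and far more efficient):

```python
# Minimal PageRank by power iteration; the toy graph is hypothetical,
# chosen only to illustrate the centrality computation.

def pagerank(edges, n, damping=0.85, tol=1e-10, max_iter=100):
    """edges: list of (src, dst) pairs over nodes 0..n-1."""
    out_deg = [0] * n
    for s, _ in edges:
        out_deg[s] += 1
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Teleportation term plus redistributed mass of dangling nodes.
        new = [(1.0 - damping) / n] * n
        dangling = damping * sum(r for r, d in zip(rank, out_deg) if d == 0) / n
        for i in range(n):
            new[i] += dangling
        # Each node spreads its rank evenly over its out-edges.
        for s, t in edges:
            new[t] += damping * rank[s] / out_deg[s]
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            rank = new
            break
        rank = new
    return rank

# Small directed graph: node 2 is pointed to by both other nodes.
scores = pagerank([(0, 2), (1, 2), (2, 0)], n=3)
print(max(range(3), key=scores.__getitem__))  # node 2 is the most central
```

The scores form a probability distribution (they sum to one), which is the usual normalization for centrality comparison across nodes.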
  • ORSUM 2020- Workshop on Online Recommender Systems and User Modeling
    • Vinagre João
    • Jorge Alípio Mário
    • Al-Ghossein Marie
    • Bifet Albert
    , 2020, pp.619--620. Modern online web-based systems continuously generate data at very fast rates. This continuous flow of data encompasses web content – e.g. posts, news, products, comments – but also user feedback – e.g. ratings, views, reads, clicks, thumbs up – as well as context information – device used, geographic info, social network, current user activity, weather. This is potentially overwhelming for systems and algorithms designed to train in offline batches, given the continuous and potentially fast change of content, context and user preferences. Therefore, it is important to investigate online methods able to transparently adapt to the inherent dynamics of online systems. Incremental models that learn from data streams are gaining attention in the recommender systems community, given their natural ability to deal with data generated in dynamic, complex environments. User modeling and personalization can particularly benefit from algorithms capable of maintaining models incrementally and online, as data is generated. The objective of this workshop is to foster contributions and bring together a growing community of researchers and practitioners interested in online, adaptive approaches to user modeling, recommendation and personalization, as well as other related tasks, such as evaluation, reproducibility, privacy and explainability. (10.1145/3383313.3411531)
    DOI : 10.1145/3383313.3411531
  • ABR prediction using supervised learning algorithms
    • Yousef Hiba
    • Le Feuvre Jean
    • Storelli Alexandre
    , 2020. With the massive increase of video traffic over the internet, HTTP adaptive streaming has become the main technique for infotainment content delivery. In this context, many bandwidth adaptation algorithms have emerged, each aiming to improve the user QoE using different session information, e.g. TCP throughput, buffer occupancy, download time... Notwithstanding the differences in their implementation, they mostly use the same inputs to adapt to the varying conditions of the media session. In this paper, we show that it is possible to predict the bitrate decision of any ABR algorithm thanks to machine learning techniques, and supervised classification in particular. This approach has the benefit of being generic: it does not require any knowledge about the player's ABR algorithm itself, but assumes that, whatever the logic behind it, it will use a common set of input features. Then, using machine learning feature selection, it is possible to identify the relevant features and train the model on real observations. We test our approach using simulations on well-known ABR algorithms, then verify the results on commercial closed-source players, using different realistic VoD and Live data sets. The results show that both Random Forest and Gradient Boosting achieve very high prediction accuracy compared with other ML classifiers. (10.1109/MMSP48831.2020.9287123)
    DOI : 10.1109/MMSP48831.2020.9287123
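The paper's core idea – treating a player's bitrate choice as a label to predict from observed session features – can be sketched very simply. The features, training pairs, and the 1-nearest-neighbour rule below are hypothetical stand-ins chosen for brevity; the paper itself trains Random Forest and Gradient Boosting classifiers:

```python
# Toy sketch: predict an ABR player's bitrate decision from session
# features (throughput, buffer level). All numbers are made up for
# illustration; the paper uses Random Forest / Gradient Boosting.

def predict_bitrate(train, query):
    """train: list of ((throughput_mbps, buffer_s), bitrate_kbps) pairs."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    # 1-nearest-neighbour: copy the decision of the closest observation.
    features, label = min(train, key=lambda obs: dist(obs[0], query))
    return label

# Hypothetical observations of a player's past decisions.
history = [
    ((1.0, 5.0), 750),    # low throughput, short buffer -> low bitrate
    ((4.0, 20.0), 3000),  # high throughput, long buffer -> high bitrate
    ((2.0, 10.0), 1500),
]
print(predict_bitrate(history, (3.8, 18.0)))  # prints 3000
```

The generic point survives the simplification: the predictor never inspects the player's internal logic, only the (feature, decision) pairs it exposes.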
  • Surrogate-Model-Aided Optimization of the Bandwidth for a Planar UWB Inverted-F Antenna
    • Du Jinxin
    • Roblin Christophe
    • Yang Xue-Xia
    , 2020, pp.1-3. (10.1109/ICMMT49418.2020.9386410)
    DOI : 10.1109/ICMMT49418.2020.9386410
  • RFC 8902 TLS Authentication Using Intelligent Transport System (ITS) Certificates
    • Msahli Mounira
    • Cam-Winget Nancy
    • Whyte William
    • Serhrouchni Ahmed
    • Labiod Houda
    , 2020.
  • On Performance Measurement in Psychology and Other Fields
    • Guiard Yves
    , 2020. The concept of quantitative performance has been used increasingly outside work management, its main field of origin, since the advent of the industrial era, pervading not just experimental psychology and many domains of science and engineering, but virtually all sectors of social life. Surprisingly, the key defining characteristic of performance measures seems to have systematically escaped notice: a performance is a numerical score subject to a deliberate extremization (i.e., minimization or maximization) effort exerted by a human agent against the resistance of a limit. Because of this characteristic, performances must be recognized to constitute measures of a very special kind, where the numerical is marked axiologically. The paper contrasts the extremized scores of performance measurement with the optimized measures of feedback-controlled, regulated systems. In performance measurement the best numerical values are extrema, rather than optima, and the function that links the axiological value to the numerical value is strictly convex, rather than strictly concave. One-dimensional performance measurement is analyzed in the extremely simple case of spirometry testing, where forced vital capacity, a measure of respiratory performance, is shown to be determined by the interplay of two variables, neither of which can be directly measured: the maximization effort, which varies haphazardly from trial to trial, and the patient's total lung capacity, a personal upper bound, whose inductive estimation is the goal of spirometry testing. The paper shows that the magnitude of the estimation error decreases linearly with the magnitude of the patient's effort, explaining why respirologists so strongly urge their patients to blow as hard as they can into the spirometer. The paper then turns to two-dimensional performance, analyzing distributional data from a psychology experiment on speeded aimed movements.
The variation of the speed/accuracy balance is shown to entail systematic changes in the markedly asymmetrical shapes of movement time and error distributions: The stronger the directional compression effect observable on one performance measure, the weaker this effect on the other. These observations are hard to reconcile with the traditional view that performance measures are random variables and raise doubts on the suitability of the classic descriptive tools of statistics, whether parametric or nonparametric, when it comes to the decidedly special case of performance data. One possible direction for a more appropriate statistical approach to performance data is tentatively outlined.
  • IoT data stream analytics
    • Bifet Albert
    • Gama João
    Annals of Telecommunications - annales des télécommunications, Springer, 2020, 75 (9-10), pp.491--492. The volume of IoT data is rapidly increasing due to developments in information and communication technology. This data comes mostly in the form of streams. Learning from this ever-growing amount of data requires flexible learning models that self-adapt over time. Traditional one-shot, memory-based learning methods trained offline on static historical data cannot cope with evolving data streams: firstly, it is not feasible to store all incoming data over time, and secondly, the generated models quickly become obsolete due to changes in the data distribution, also known as “concept drift.” The basic assumption of offline learning is that data is generated by a stationary process and the learning models are consistent with future data. However, in multiple applications like IoT, web mining, social networks, network monitoring, sensor networks, telecommunications, financial forecasting, etc., data samples arrive continuously as unlimited streams, often at high speed. Moreover, the phenomena generating these data streams may evolve over time. In this case, the environment in which the system or the phenomenon generated the data is considered to be dynamic, evolving, or non-stationary. Learning methods used to learn from data generated by dynamically evolving and potentially non-stationary processes must take into account many constraints: (pseudo) real-time processing, high velocity, and dynamic multiform change such as concept drift and novelty. In addition, in data stream scenarios, the number of classes is often unknown in advance. New classes can therefore appear at any time; they must be detected, and the predictor structure must be updated.
It is worth emphasizing that streams are very often generated by distributed sources, especially with the advent of the Internet of Things, and processing them centrally may therefore not be efficient, particularly if the infrastructure is large and complex. Scalable and decentralized learning algorithms are potentially more suitable and efficient. This special issue aims at discussing the problem of learning from IoT data streams generated by evolving non-stationary processes. It centers on advances in techniques, methods, and tools dedicated to managing, exploiting, and interpreting data streams in non-stationary environments. In particular, it focuses on the problems of modeling, prediction, and classification based on learning from data streams. (10.1007/S12243-020-00811-1)
    DOI : 10.1007/S12243-020-00811-1
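The incremental, single-pass learning style this editorial describes can be sketched with a minimal online perceptron in plain Python; the two-point stream below is synthetic and purely illustrative:

```python
# A minimal incremental learner: the model updates one sample at a time
# and never stores the stream, in the spirit of stream learning.
# The data stream is synthetic.

class OnlinePerceptron:
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict(self, x):
        s = sum(wi * xi for wi, xi in zip(self.w, x)) + self.b
        return 1 if s >= 0 else -1

    def learn_one(self, x, y):
        # Update only on mistakes (classic perceptron rule).
        if self.predict(x) != y:
            self.w = [wi + self.lr * y * xi for wi, xi in zip(self.w, x)]
            self.b += self.lr * y

# Synthetic stream: class +1 at (1, 0), class -1 at (0, 1), alternating.
stream = [((1.0, 0.0), 1) if i % 2 == 0 else ((0.0, 1.0), -1)
          for i in range(100)]
model = OnlinePerceptron(n_features=2)
mistakes = 0
for x, y in stream:
    if model.predict(x) != y:
        mistakes += 1
    model.learn_one(x, y)
print(mistakes)  # only the first few samples are misclassified
```

Handling concept drift would require extra machinery (e.g. drift detectors or decaying statistics) that this sketch deliberately leaves out.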
  • Electricity consumption disaggregation in large buildings: analyses, simulations and unsupervised learning through matrix factorization
    • Henriet Simon
    , 2020. With increasing awareness of climate change and high levels of energy consumption, a need for energy efficiency has emerged, especially for electric power consumption in buildings. To spur energy savings, industry has been looking for measurement methods to monitor power consumption. Appliance load monitoring has thus become an active research field. Monitoring and understanding the electrical consumption of appliances can also be useful for predictive maintenance, power quality analysis, demand forecasting or occupancy detection. Thirty years ago, a method called Non-Intrusive Load Monitoring (NILM) was introduced. It consists of estimating individual appliance energy consumption from the measurement of the total consumption of the building. Its main advantage over traditional sub-metering methods is to use a single electric power meter at the main breaker of the building and then a disaggregation algorithm to separate the contributions of each appliance. The goal of this thesis is to address the algorithmic challenge posed by NILM. The NILM problem can be formulated as a source separation problem, where the sources are the individual electric consumptions and the mixed observation is simply the sum of individual consumptions. Its main difficulties are: (i) the standardization of the formulation, (ii) the ill-posedness of the problem, (iii) the lack of knowledge and (iv) the design of the machine learning algorithm. All our contributions follow from the principal objective, which is to solve the NILM problem for large systems such as commercial or industrial buildings using high-frequency current and voltage measurements. However, houses and the specific equipment found inside these buildings are not excluded from the study. This thesis is split into two parts. In the first part, we tackle the lack of knowledge and datasets for NILM in commercial buildings.
First of all, the NILM community has mostly focused on residential NILM applications using low-frequency data provided by power meters installed by utility providers. To address the lack of knowledge on higher-frequency data and on other kinds of buildings such as commercial or industrial installations, we propose a statistical analysis based on public and private datasets. Our study of the rank of the current matrix, conducted for individual devices, serves as the basis of a new device taxonomy and of prior assumptions used in the rest of this thesis. Secondly, we address the lack of datasets, especially for commercial buildings, by developing an algorithm for generating synthetic current data based on a model of the current flowing through an electrical device. To encourage research on commercial buildings, we release a synthesized dataset called SHED that can be used to evaluate NILM algorithms. In the second part, we deal with the algorithmic challenges of NILM by exploring unsupervised source separation techniques. To overcome the unaddressed difficulties of processing high-frequency current signals measured in large buildings, we propose a novel technique called Independent-Variation Matrix Factorization (IVMF), which expresses an observation matrix as the product of two matrices: the "signature" and the "activation". Motivated by the nature of the current signals, it uses a regularization term on the temporal variations of the activation matrix and a positivity constraint, and the columns of the signature matrix are constrained to lie in a specific set. To solve the resulting optimization problem, we rely on an alternating minimization strategy involving dual optimization and quasi-Newton algorithms. IVMF is the first algorithm especially designed for high-frequency NILM in large buildings. We finally show that IVMF outperforms competing methods (Independent Component Analysis, Semi-Non-negative Matrix Factorization) on NILM datasets.
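The signature × activation decomposition at the heart of IVMF can be illustrated, in a much simplified form, with plain non-negative matrix factorization via Lee–Seung multiplicative updates. The data below is synthetic, and IVMF itself adds the temporal-variation regularization and signature constraints described above, which this sketch leaves out:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic mix: two "appliance" signatures active at different times.
S_true = np.array([[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]])  # 3 samples x 2 sources
A_true = rng.random((2, 50))                              # activations over time
V = S_true @ A_true                                       # observed aggregate

# Plain NMF, V ~ S @ A with S, A >= 0 (Lee-Seung multiplicative updates).
k = 2
S = rng.random((3, k)) + 0.1
A = rng.random((k, 50)) + 0.1
for _ in range(1000):
    A *= (S.T @ V) / (S.T @ S @ A + 1e-12)
    S *= (V @ A.T) / (S @ A @ A.T + 1e-12)

err = np.linalg.norm(V - S @ A) / np.linalg.norm(V)
print(round(err, 3))  # relative reconstruction error, close to zero
```

Since the synthetic aggregate has an exact rank-2 non-negative factorization, the updates recover it up to the usual permutation and scaling ambiguity of factorization methods.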
  • Protecting processors against cyber-attacks by verifying control-flow integrity
    • Timbert Michaël
    , 2020. Cyber-attacks rely on intruding into digital systems, exploiting vulnerabilities to take control of the system. Many protections against cyber-attacks exist. Among them are code obfuscation techniques, memory integrity checks, instruction-set personalization, address space layout randomization (ASLR), anticipation through canaries or sandboxing, process isolation (virtual machines), and access-rights management. At the hardware level, modern processors provide security techniques through zone isolation (protection rings, MMU, NX bit, TrustZone). Control Flow Integrity (CFI) is a technique proposed by Abadi et al. to prevent the corruption of a program. It has given rise to many implementations, but none is at once complete, fast, and easily incorporable into existing processors. This thesis is inspired by the HCODE work, which implements code integrity by computing a signature for each executed basic block. HCODE is a hardware module designed to be connected read-only to the interface between the processor and the instruction cache. In this thesis we present an improvement of HCODE, named CCFI, that provides both code integrity and control-flow integrity. We propose an architecture able to protect direct and indirect jumps as well as interrupts. The proposed solution relies both on hardware modules and on code modifications to ensure its speed and flexibility. To guarantee complete CFI protection, metadata describing the Control Flow Graph (CFG) of the program are added to the code.
These metadata are computed statically during the compilation phase and are used by the CCFI hardware module, in conjunction with the executed code, to guarantee that the CFG is respected. We demonstrate that our solution can provide complete control-flow integrity while being both fast and easily adaptable to existing processors. We illustrate it on two RISC-V processors.
  • Malliavin calculus and Dirichlet structures for independent random variables
    • Halconruy Hélène
    , 2020. Malliavin calculus was initially developed to provide an infinite-dimensional variational calculus on the Wiener space, and was later extended to other spaces. In this work, we develop such a calculus in two discrete frameworks. First, we equip any countable product of probability spaces with a discrete Dirichlet-Malliavin structure, consisting of a family of Malliavin operators (gradient, divergence, number operator), an integration by parts formula, and the induced Dirichlet forms. We obtain the analogues of the classical functional identities and retrieve the usual Poisson and Brownian Dirichlet structures as limits of our induced structures. We provide discrete Stein-Malliavin criteria for the Normal and the Gamma approximations. Second, we study insider trading in a ternary model, relying on a three-point compound geometric process. We state a modified chaotic decomposition and define the geometric gradient and divergence operators as the annihilation and creation operators acting on it. We state a geometric Ocone-Karatzas formula. We express the insider's additional expected logarithmic utility in terms of relative entropy, as in the continuous case.
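For reference, the integration by parts (duality) formula that such a structure provides can be written, in its classical Wiener-space form (the thesis builds discrete analogues of it):

```latex
% Duality between the Malliavin gradient D and the divergence \delta,
% for suitable F and u; classical Wiener-space form.
\mathbb{E}\bigl[\langle DF, u\rangle_{H}\bigr] = \mathbb{E}\bigl[F\,\delta(u)\bigr]
```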
  • Challenges of Stream Learning for Predictive Maintenance in the Railway Sector
    • Nguyen Minh-Huong Le
    • Turgis Fabien
    • Fayemi Pierre-Emmanuel
    • Bifet Albert
    , 2020, 1325, pp.14--29. Smart trains nowadays are equipped with sensors that generate an abundance of data during operation. Such data may, directly or indirectly, reflect the health state of the trains. Thus, it is of interest to analyze these data in a timely manner, preferably on-the-fly as they are being generated, to make maintenance operations more proactive and efficient. This paper provides a brief overview of predictive maintenance and stream learning, with the primary goal of leveraging stream learning in order to enhance maintenance operations in the railway sector. We justify the applicability and promising benefits of stream learning via the example of a real-world railway dataset of the train doors. (10.1007/978-3-030-66770-2_2)
    DOI : 10.1007/978-3-030-66770-2_2
  • Dynamic properties of two-state lasing quantum dot laser for external optical feedback resistant applications
    • Duan Jianan
    • Zhou Yueguang
    • Huang Heming
    • Dong Bozhang
    • Wang Cheng
    • Grillot Frédéric
    , 2020.
  • Single-bit Laser Fault Model in NOR Flash Memories
    • Menu Alexandre
    • Dutertre Jean-Max
    • Rigaud Jean-Baptiste
    • Colombier Brice
    • Moellic Pierre-Alain
    • Danger Jean-Luc
    , 2020, pp.41-48. Laser injection is a powerful fault injection technique with a high spatial accuracy which allows an adversary to efficiently extract secret information from an electronic device. The control and the repeatability of faults require the attacker to understand the relation of the fault model to the setup (notably the laser spot size) and the process node of the target device. Most studies on laser fault injection report fault models resulting from a photo-electric current in CMOS transistors. This study provides a black-box analysis of the effect of a photo-electric current in floating-gate transistors of two embedded NOR Flash memories from two different manufacturers. Experimental results demonstrate that single-bit bit-set faults can be injected in code and data without corrupting the Flash memory, even with a laser spot of more than 20 μm in diameter, which is several orders of magnitude larger than the process node of the floating-gate transistors in the experiments. This article also presents the specifics of performing a "safe-error" attack on AES, leveraging the previously detailed single-bit bit-set fault model. (10.1109/FDTC51366.2020.00013)
    DOI : 10.1109/FDTC51366.2020.00013
  • Balancing expressiveness and inexpressiveness in view design
    • Benedikt Michael
    • Bourhis Pierre
    • Jachiet Louis
    • Tsamoura Efthymia
    , 2020, pp.109--118. We study the design of data publishing mechanisms that allow a collection of autonomous distributed data sources to collaborate to support queries. A common mechanism for data publishing is via views: functions that expose derived data to users, usually specified as declarative queries. Our autonomy assumption is that the views must be on individual sources, but with the intention of supporting integrated queries. In deciding what data to expose to users, two considerations must be balanced. The views must be sufficiently expressive to support queries that users want to ask – the utility of the publishing mechanism. But there may also be some expressiveness restriction. Here we consider two restrictions: a minimal information requirement, saying that the views should reveal as little as possible while supporting the utility query, and a non-disclosure requirement, formalizing the need to prevent external users from computing information that data owners do not want revealed. We investigate the problem of designing views that satisfy both an expressiveness and an inexpressiveness requirement, for views in a restricted declarative language (conjunctive queries), and for arbitrary views. (10.24963/kr.2020/12)
    DOI : 10.24963/kr.2020/12
  • Heavy-tailed Representations, Text Polarity Classification & Data Augmentation
    • Jalalzai Hamid
    • Colombo Pierre
    • Clavel Chloé
    • Gaussier Éric
    • Varni Giovanna
    • Vignon Emmanuel
    • Sabourin Anne
    , 2020. The dominant approaches to text representation in natural language processing rely on learning embeddings on massive corpora, which have convenient properties such as compositionality and distance preservation. In this paper, we develop a novel method to learn a heavy-tailed embedding with desirable regularity properties regarding the distributional tails, which makes it possible to analyze the points far away from the distribution bulk using the framework of multivariate extreme value theory. In particular, a classifier dedicated to the tails of the proposed embedding is obtained which exhibits a scale invariance property exploited in a novel text generation method for label-preserving dataset augmentation. Experiments on synthetic and real text data show the relevance of the proposed framework and confirm that this method generates meaningful sentences with controllable attributes, e.g. positive or negative sentiments.
  • 80 GHz beatnote generation in a single tapered distributed feedback hybrid III-V / silicon laser
    • Grillot Frederic
    • Callado G.
    • Verolet T.
    • Jany Christophe
    • Hassan K.
    • Malhouitre Stephane
    • Coquiard A.
    • Combrie Sylvain
    • Shen A.
    • de Rossi A.
    , 2020.