Publications

Publications by our faculty researchers are available on the HAL platform:

HAL publications

Doctoral theses by LTCI PhD graduates are available on the HAL platform:

HAL theses

Browse the publications in the HAL open archive by year:

2022

Worldwide Gender Differences in Public Code Contributions
- Rossi Davide
- Zacchiroli Stefano
, 2022. Gender imbalance is a well-known phenomenon observed throughout sciences which is particularly severe in software development and Free/Open Source Software communities. Little is known yet about the geography of this phenomenon, in particular when considering large scales for both its time and space dimensions. We contribute to filling this gap with a longitudinal study of the population of contributors to publicly available software source code. We analyze the development history of 160 million software projects for a total of 2.2 billion commits contributed by 43 million distinct authors over a period of 50 years. We classify author names by gender using name frequencies and author geographical locations using heuristics based on email addresses and time zones. We study the evolution over time of contributions to public code by gender and by world region. For the world overall, we confirm previous findings about the low but steadily increasing ratio of contributions by female authors. When breaking down by world regions we find that the long-term growth of female participation is a worldwide phenomenon. We also observe a decrease in the ratio of female participation during the COVID-19 pandemic, suggesting that women's ability to contribute to public code has been more hindered than that of men. (10.1145/3510458.3513011)
DOI : 10.1145/3510458.3513011
“You might think about slightly revising the title”: Identifying Hedges in Peer-tutoring Interactions
- Raphalen Yann
- Clavel Chloé
- Cassell Justine
, 2022, 1, pp.2160-2174. Hedges play an important role in the management of conversational interaction. In peer-tutoring, they are notably used by tutors in dyads (pairs of interlocutors) experiencing low rapport to tone down the impact of instructions and negative feedback. Pursuing the objective of building a tutoring agent that manages rapport with students in order to improve learning, we used a multimodal peer-tutoring dataset to construct a computational framework for identifying hedges. We compared approaches relying on pre-trained resources with others that integrate insights from the social science literature. Our best performance involved a hybrid approach that outperforms the existing baseline while being easier to interpret. We employ a model explainability tool to explore the features that characterize hedges in peer-tutoring conversations, and we identify some novel features, and the benefits of such a hybrid model approach. (10.18653/v1/2022.acl-long.153)
DOI : 10.18653/v1/2022.acl-long.153
Gaussian bounds for discrete entropies
- Rioul Olivier
, 2022. It is well known that the Gaussian distribution has the largest differential entropy amongst all distributions of equal variance. In this paper, we derive similar (generalized) Gaussian upper bounds for discrete (Rényi) entropies of integer-valued variables. Using a mixed discrete-continuous bounding technique and the Poisson summation formula from Fourier analysis, it is proved that in many cases, such Gaussian bounds hold with an additive term that vanishes exponentially as the variance increases.
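As a rough numerical illustration of what a Gaussian bound on a discrete entropy looks like, the sketch below checks the classical Massey-type inequality H(X) < ½ log₂(2πe(σ² + 1/12)) on a truncated Poisson distribution. The abstract does not state this formula; it is the commonly cited classical form, used here as an assumption, and the function names and the choice of test distribution are hypothetical.

```python
import math

def discrete_entropy_bits(pmf):
    """Shannon entropy (in bits) of a probability mass function given as a list."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def variance(pmf):
    """Variance of an integer-valued variable with P(X = k) = pmf[k]."""
    mean = sum(k * p for k, p in enumerate(pmf))
    return sum((k - mean) ** 2 * p for k, p in enumerate(pmf))

def massey_gaussian_bound_bits(var):
    """Classical Gaussian-type bound: 0.5 * log2(2*pi*e*(var + 1/12))."""
    return 0.5 * math.log2(2 * math.pi * math.e * (var + 1.0 / 12.0))

def poisson_pmf(lam, n_terms=200):
    """Truncated, renormalized Poisson pmf (illustrative test distribution)."""
    probs = [math.exp(-lam)]
    for k in range(1, n_terms):
        probs.append(probs[-1] * lam / k)
    total = sum(probs)
    return [p / total for p in probs]

pmf = poisson_pmf(4.0)
h = discrete_entropy_bits(pmf)
bound = massey_gaussian_bound_bits(variance(pmf))
print(f"entropy = {h:.4f} bits, Gaussian bound = {bound:.4f} bits")
```

The gap between the two printed values shrinks as the variance grows, which is the regime the abstract refers to.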
Learning Disentangled Textual Representations via Statistical Measures of Similarity
- Colombo Pierre
- Staerman Guillaume
- Noiry Nathan
- Piantanida Pablo
, 2022, pp.2614–2630. When working with textual data, a natural application of disentangled representations is fair classification, where the goal is to make predictions without being biased (or influenced) by sensitive attributes that may be present in the data (e.g., age, gender or race). Dominant approaches to disentangle a sensitive attribute from textual representations rely on learning simultaneously a penalization term that involves either an adversary loss (e.g., a discriminator) or an information measure (e.g., mutual information). However, these methods require the training of a deep neural network with several parameter updates for each update of the representation model. As a matter of fact, the resulting nested optimization loop is time-consuming, adds complexity to the optimization dynamics, and requires fine hyperparameter selection (e.g., learning rates, architecture). In this work, we introduce a family of regularizers for learning disentangled representations that do not require training. These regularizers are based on statistical measures of similarity between the conditional probability distributions with respect to the sensitive attributes. Our novel regularizers do not require additional training, are faster and do not involve additional tuning while achieving better results both when combined with pretrained and randomly initialized text encoders. (10.18653/v1/2022.acl-long.187)
DOI : 10.18653/v1/2022.acl-long.187
Imputing out-of-vocabulary embeddings with LOVE makes language models robust with little cost
- Chen Lihu
- Varoquaux Gaël
- Suchanek Fabian
, 2022. State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT), and makes it robust to OOV with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performances than prior competitors, both on original datasets and on corrupted variants. Moreover, it can be used in a plug-and-play fashion with FastText and BERT, where it significantly improves their robustness.
Support for Detecting Integer Overflow Vulnerability
- Kissi Salim Yahia
- Seladji Yassamine
- Ameur-Boulifa Rabéa
, 2022. Link to video presentation: https://www.youtube.com/watch?v=2LHRU_Y6SM4 EXTENDED ABSTRACT. Organizations and companies develop very complex software today. Errors and flaws can be introduced at different phases of the software development life cycle and can lead to exploitable vulnerabilities. Furthermore, considering that most systems are exposed to multiple users and environments, such flaws can lead to attacks (or actions) with unpredictable consequences in terms of damage and costs. It is therefore crucial that developers and users know how to detect and prevent them. Despite the substantial knowledge about vulnerabilities available nowadays, the number of reported vulnerabilities is still increasing, which is why software security has become an active research area. This involves various approaches [4, 1]: reverse engineering, code review, static and dynamic analysis, fuzzing and debugging. The increasing scale of software systems requires vulnerability scanning tools and supports that ease detection and help coders avoid vulnerabilities during development. Furthermore, the use of tools that integrate formal methods might provide evidence for security goals. Towards this end, we have proposed an approach [3] for detecting security bugs which are due to memory safety issues. We were particularly interested in detecting exploits that may be caused by integer overflow. The relevance of our approach lies in the fact that it is based on hardware/software co-analysis. We provided a uniform method for software analysis, considering the specifications of their execution environment (CPUs, compilers, operating systems). The main idea is to build a formula based on the path condition of a given target location in conjunction with the formula (assertions) specifying the environment of its execution, and to ask an SMT solver for a satisfying solution, to find out whether the unintended solution is possible.
Formally speaking, we use symbolic execution to generate a program constraint (PC), and derive a security constraint (SC) from predefined security requirements. In addition, based on precise knowledge of the execution context of the analyzed program (EC), we propose to solve the statement EC ⊢ PC ∧ ¬SC: we seek to find out whether there is an assignment of values to program inputs, executed in a certain context, that satisfies PC but violates SC. This paper is an extension of our previous work. In this paper, we have significantly extended and implemented the proposed methodology. First, we enrich the set of formulas specifying the effects of memory reference instructions across different execution environments to address more platforms. The assertions are derived from various sources, including bad programming practices, compiler configuration settings, operating systems, Common Weakness Enumeration and C standards. Second, we give the technical approach for the vulnerability detection: we give details of the tool we are developing for the analysis of C programs. The tool relies on the Clang/LLVM compiler infrastructure [2]. This option was motivated by the benefits offered by the infrastructure, including the fact that it supports various target environments (x86, x86-64, arm, riscv64, ...). (10.13140/RG.2.2.24226.71363)
DOI : 10.13140/RG.2.2.24226.71363
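The idea of searching for an input that satisfies PC but violates SC in a given execution context can be sketched on a toy case. This is not the authors' tool: an exhaustive search over a small integer width stands in for symbolic execution and the SMT solver, and the program path, the security constraint, and the names `to_signed` and `find_violation` are all assumptions made for illustration.

```python
from itertools import product

def to_signed(value, bits):
    """Interpret `value` modulo 2**bits as a two's-complement signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= 1 << (bits - 1) else value

def find_violation(bits):
    """Search for inputs satisfying PC but violating SC under the given EC.

    EC (assumed): integers are two's-complement values of width `bits`.
    PC (assumed): the path guarded by `a > 0 and b > 0` is taken.
    SC (assumed): the machine sum a + b equals the mathematical sum.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    for a, b in product(range(lo, hi + 1), repeat=2):
        pc = a > 0 and b > 0
        sc = to_signed(a + b, bits) == a + b
        if pc and not sc:
            return a, b  # witness: PC holds, SC is violated
    return None          # no violating assignment exists in this context

witness = find_violation(bits=8)
print("violating inputs:", witness)
```

Changing `bits` changes the execution context and therefore which inputs, if any, count as a violation, which is the point of including EC in the formula.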
Wind power predictions from nowcasts to 4-hour forecasts: a learning approach with variable selection
- Bouche Dimitri
- Flamary Rémi
- d'Alché-Buc Florence
- Plougonven Riwal
- Clausel Marianne
- Badosa Jordi
- Drobinski Philippe
, 2022. We study the prediction of short-term wind speed and wind power (every 10 minutes up to 4 hours ahead). Accurate forecasts for those quantities are crucial to mitigate the negative effects of wind farms' intermittent production on energy systems and markets. For those time scales, outputs of numerical weather prediction models are usually overlooked even though they should provide valuable information on higher-scale dynamics. In this work, we combine those outputs with local observations using machine learning. So as to make the results usable for practitioners, we focus on simple and well-known methods which can handle a high volume of data. We first study variable selection through two simple techniques, a linear one and a nonlinear one. Then we exploit those results to forecast wind speed and wind power, still with an emphasis on linear models versus nonlinear ones. For the wind power prediction, we also compare the indirect approach (wind speed predictions passed through a power curve) and the direct one (directly predicting wind power). (10.48550/arXiv.2204.09362)
DOI : 10.48550/arXiv.2204.09362
RISC-V Extension for Lightweight Cryptography, Protection against SCA
- Tehrani Etienne
- Graba Tarik
- Si Merabet Abdelmalek
- Danger Jean-Luc
, 2022.
Delay Measurement of 0-RTT Transport Layer Security (TLS) Handshake Protocol
- Goncharskyi Danylo
- Kim Sung Yong
- Serhrouchni Ahmed
- Gu Pengwenlong
- Khatoun Rida
- Hachem Joel
, 2022, pp.1450-1454. (10.1109/CoDIT55151.2022.9803984)
DOI : 10.1109/CoDIT55151.2022.9803984
Realization and measurement of a wideband metamaterial absorber composed with structural composite materials
- Begaud Xavier
- Lepage Anne Claire
- Rance Olivier
- Soiron Michel
- Barka André
- Laybros Sarah
, 2022. This contribution presents the realization and measurement of a metamaterial absorber first designed with RF materials, which were then replaced by structural (i.e. fiber-reinforced) composite materials. First, the optimization of the absorbing material took into account the electrical characteristics of the materials compatible with the targeted application. In a second step, it was necessary to optimize the whole again to take into account the process constraints, in particular the thickness of the composite ply combining fiber and resin. After measurement, this absorbing material has a magnitude of the reflection coefficient at normal incidence less than -13.2 dB from 5.2 GHz to 18 GHz, for a total thickness of 8.9 mm. (10.1109/iWAT54881.2022.9811007)
DOI : 10.1109/iWAT54881.2022.9811007
Par(ab)oles... Par(ab)oles...
- Zayana Karim
- Boyer Ivan
- Hormière Pierre-Jean
- Rabiet Victor
CultureMath, ENS, 2022.
Reasoning about Human-Friendly Strategies in Repeated Keyword Auctions
- Belardinelli Francesco
- Jamroga Wojciech
- Malvone Vadim
- Mittelmann Munyque
- Murano Aniello
- Perrussel Laurent
, 2022, pp.62-71. In online advertising, search engines sell ad placements for keywords continuously through auctions. This problem can be seen as an infinitely repeated game since the auction is executed whenever a user performs a query with the keyword. As advertisers may frequently change their bids, the game will have a large set of equilibria with potentially complex strategies. In this paper, we propose the use of natural strategies for reasoning in such a setting, as they are processable by artificial agents with limited memory and/or computational power as well as understandable by human users. To reach this goal, we introduce a quantitative version of Strategy Logic with natural strategies in the setting of imperfect information. In a first step, we show how to model strategies for repeated keyword auctions and take advantage of the model for proving properties evaluating this game. In a second step, we study the logic in relation to the distinguishing power, expressivity, and model-checking complexity for strategies with and without recall.
Finding Optimal Moving Target Defense Strategies: A Resilience Booster for Connected Cars
- Ayrault Maxime
- Kühne Ulrich
- Borde Etienne
Information, MDPI, 2022, 13 (5), pp.242. During their life-cycle, modern connected cars will have to face various and changing security threats. As for any critical embedded system, security fixes in the form of software updates need to be thoroughly verified and cannot be deployed on a daily basis. The system needs to commit to a defense strategy, while attackers can examine vulnerabilities and prepare possible exploits before attacking. In order to break this asymmetry, it can be advantageous to use proactive defenses, such as reconfiguring parts of the system configuration. However, resource constraints and losses in quality of service need to be taken into account for such Moving Target Defenses (MTDs). In this article, we present a game-theoretic model that can be used to compute an optimal MTD defense for a critical embedded system that is facing several attackers with different objectives. The game is resolved using off-the-shelf MILP solvers. We validated the method with an automotive use case and conducted extensive experiments to evaluate its scalability and stability. (10.3390/info13050242)
DOI : 10.3390/info13050242
Process for processing data by an artificial neural network with grouped executions of individual operations to avoid side-channel attacks, and corresponding system
- Danger Jean-Luc
- Chabanne Hervé
- Guiga Linda
, 2022.
Demonstrating Virtual IO For Internet Of Things Devices Secured By TLS Server In Secure Element
- Urien Pascal
, 2022, pp.111-112. This demonstration presents an Internet of Things device (thermostat), whose security is enforced by a secure element (smartcard) running a TLS server, and using Virtual Input/Output technology. The board comprises a Wi-Fi system on chip (SoC), a micro-controller managing a sensor (temperature probe) and an actuator (relay), and a javacard. All device messages are sent/received over TLS, and processed by the secure element. Some of them are exported to the micro-controller in clear form, which returns a response, sent over TLS by the smartcard. (10.1109/IoTDI54339.2022.00025)
DOI : 10.1109/IoTDI54339.2022.00025
Explicit values of the DDT, the BCT, the FBCT, and the FBDT of the inverse, the gold, and the Bracken-Leander S-boxes
- Eddahmani Said
- Mesnager Sihem
Cryptography and Communications - Discrete Structures, Boolean Functions and Sequences, Springer, 2022, 14 (6), pp.1301-1344. The inverse, the Gold, and the Bracken-Leander functions are crucial for building S-boxes of block ciphers with good cryptographic properties in symmetric cryptography. These functions have been intensively studied, and various properties related to standard attacks have been investigated. Thanks to novel advances in symmetric cryptography and, more precisely, those pertaining to boomerang cryptanalysis, this article continues to follow this momentum and further examine these functions. More specifically, we revisit and bring new results about their Difference Distribution Table (DDT), their Boomerang Connectivity Table (BCT), their Feistel Boomerang Connectivity Table (FBCT), and their Feistel Boomerang Difference Table (FBDT). For each table, we give explicit values of all entries by solving specific systems of equations over the finite field $\mathbb{F}_{2^n}$ of cardinality $2^n$ and compute the cardinalities of their corresponding sets of such values. The explicit values of the entries of these tables and their cardinalities are crucial tools to test the resistance of block ciphers based on variants of the inverse, the Gold, and the Bracken-Leander functions against cryptanalytic attacks such as differential and boomerang attacks. The computation of these entries and the cardinalities in each table aimed to facilitate the analysis of differential and boomerang cryptanalysis of S-boxes when studying distinguishers and trails. (10.1007/s12095-022-00581-8)
DOI : 10.1007/s12095-022-00581-8
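To make the notion of a Difference Distribution Table concrete, here is a minimal brute-force sketch for the inverse S-box over a small field. It enumerates entries rather than deriving them in closed form as the article does; the field size (n = 4), the reduction polynomial, and all function names are assumptions chosen for illustration.

```python
N = 4              # assumed small field GF(2^4), for illustration only
MODULUS = 0b10011  # x^4 + x + 1, irreducible over GF(2)

def gf_mul(a, b):
    """Multiply two elements of GF(2^N) modulo MODULUS."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << N):
            a ^= MODULUS
    return result

def gf_inverse(x):
    """Inverse S-box x -> x^(2^N - 2); by convention it maps 0 to 0."""
    result = 1
    for _ in range((1 << N) - 2):
        result = gf_mul(result, x)
    return result

def ddt(sbox):
    """DDT[a][b] = number of x such that sbox[x ^ a] ^ sbox[x] == b."""
    size = len(sbox)
    table = [[0] * size for _ in range(size)]
    for a in range(size):
        for x in range(size):
            table[a][sbox[x ^ a] ^ sbox[x]] += 1
    return table

SBOX = [gf_inverse(x) for x in range(1 << N)]
TABLE = ddt(SBOX)
# Largest entry over nonzero input differences (the differential uniformity).
DIFFERENTIAL_UNIFORMITY = max(max(row) for row in TABLE[1:])
print("differential uniformity:", DIFFERENTIAL_UNIFORMITY)
```

The same `ddt` function works for any S-box given as a lookup list, so a Gold or Bracken-Leander power map could be substituted for `gf_inverse`.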
Estimation of RF and ELF dose by anatomical location in the brain from wireless phones in the MOBI-Kids study
- Calderón Carolina
- Castaño-Vinyals Gemma
- Maslanyj Myron
- Wiart Joe
- Lee Ae-Kyoung
- Taki Masao
- Wake Kanako
- Abert Alex
- Badia Francesc
- Hadjem Abdelhamid
- Kromhout Hans
- de Llobet Patricia
- Varsier Nadège
- Conil Emmanuelle
- Choi Hyung-Do
- Sim Malcolm
- Cardis Elisabeth
Environment International, Elsevier, 2022, 163, pp.107189. Wireless phones (both mobile and cordless) emit not only radiofrequency (RF) electromagnetic fields (EMF) but also extremely low frequency (ELF) magnetic fields, both of which should be considered in epidemiological studies of the possible adverse health effects of use of such devices. This paper describes a unique algorithm, developed for the multinational case-control MOBI-Kids study, that estimates the cumulative specific energy (CSE) and the cumulative induced current density (CICD) in the brain from RF and ELF fields, respectively, for each subject in the study (aged 10–24 years old). Factors such as age, tumour location, self-reported phone models and usage patterns (laterality, call frequency/duration and hands-free use) were considered, as was the prevalence of different communication systems over time. Median CSE and CICD were substantially higher in GSM than 3G systems and varied considerably with location in the brain. Agreement between RF CSE and mobile phone use variables was moderate to null, depending on the communication system. Agreement between mobile phone use variables and ELF CICD was higher overall but also strongly dependent on communication system. Despite ELF dose distribution across the brain being more diffuse than that of RF, high correlation was observed between RF and ELF dose. The algorithm was used to systematically estimate the localised RF and ELF doses in the brain from wireless phones, which were found to be strongly dependent on location and communication system. Analysis of cartographies showed high correlation across phone models and across ages; however, diagonal agreement between these cartographies suggests these factors do affect dose distribution to some level. Overall, duration and number of calls may not be adequate proxies of dose, particularly as communication systems available for voice calls tend to become more complex with time. (10.1016/j.envint.2022.107189)
DOI : 10.1016/j.envint.2022.107189
OPCoSA: an Optimized Product Code for space applications
- Freitas David
- Silveira Jarbas
- Marcon César
- Naviner Lirida
- Mota João
Integration, the VLSI Journal, Elsevier, 2022, 84, pp.131-141. Integrated circuit shrinkage increases the probability and the number of errors in memories due to the increase in sensitivity to electromagnetic radiation. Critical application systems employ Error Correction Codes (ECC) to mitigate memory faults. This work introduces the Optimized Product Code for Space Applications (OPCoSA), an ECC that optimizes its original version, called PCoSA, saving 16 redundancy bits while keeping high error correction capacity. We evaluated the optimized ECC through tests with 36 specific error patterns, burst errors, and exhaustive analysis. Additionally, we compared the synthesis results in hardware, reliability, and redundancy to four other ECCs dedicated to space applications. Tests have shown that OPCoSA corrects all 36 error patterns and 100% of cases for up to four burst errors; besides, it has correction rates of 100%, 100%, 95.4%, and 78.9% for exhaustive errors of dimension one to four, respectively. (10.1016/j.vlsi.2022.02.005)
DOI : 10.1016/j.vlsi.2022.02.005
Sommes : géométriques = mirifiques ?
- Zayana Karim
- Rabiet Victor
CultureMath, ENS, 2022. Everything here is free... until you reach the checkout! Financial mathematics goes hand in hand with geometric sequences. That is why school curricula model situations, admittedly simplified, drawn from this context. After summarizing them, we broaden the discussion in search of one of the latest marketing finds: the reusable voucher.
Produits scolaires
- Zayana Karim
- Boyer Ivan
- Rabiet Victor
CultureMath, ENS, 2022.
Variations on a Theme by Massey
- Rioul Olivier
IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2022, 68 (5), pp.2813-2828. In 1994, Jim Massey proposed the guessing entropy as a measure of the difficulty that an attacker has to guess a secret used in a cryptographic system, and established a well-known inequality between entropy and guessing entropy. Over 15 years before, in an unpublished work, he also established a well-known inequality for the entropy of an integer-valued random variable of given variance. In this paper, we establish a link between the two works by Massey in the more general framework of the relationship between discrete (absolute) entropy and continuous (differential) entropy. Two approaches are given in which the discrete entropy (or Rényi entropy) of an integer-valued variable can be upper bounded using the differential (Rényi) entropy of some suitably chosen continuous random variable. As an application, lower bounds on guessing entropy and guessing moments are derived in terms of entropy or Rényi entropy (without side information) and conditional entropy or Arimoto conditional entropy (when side information is available).
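For readers unfamiliar with guessing entropy, the sketch below computes it for a small distribution and compares it with Massey's 1994 lower bound in its commonly cited form, G ≥ 2^(H−2) + 1 for H ≥ 2 bits. The abstract does not spell out this formula, so it is stated here as an assumption; the example distribution and function names are hypothetical.

```python
import math

def entropy_bits(pmf):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)

def guessing_entropy(pmf):
    """Expected number of guesses when values are tried in decreasing order of probability."""
    ordered = sorted(pmf, reverse=True)
    return sum(rank * p for rank, p in enumerate(ordered, start=1))

def massey_lower_bound(h_bits):
    """Massey's bound G >= 2**(H - 2) + 1, stated for H >= 2 bits."""
    if h_bits < 2:
        raise ValueError("the bound is stated for entropies of at least 2 bits")
    return 2 ** (h_bits - 2) + 1

# Hypothetical distribution of a secret over 8 candidate values.
pmf = [0.3, 0.2, 0.15, 0.1, 0.1, 0.05, 0.05, 0.05]
h = entropy_bits(pmf)
g = guessing_entropy(pmf)
print(f"H = {h:.3f} bits, G = {g:.3f}, Massey bound = {massey_lower_bound(h):.3f}")
```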
On One-Dimensional Linear Minimal Codes Over Finite (Commutative) Rings
- Maji Makhan
- Mesnager Sihem
- Sarkar Santanu
- Hansda Kalyan
IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2022, 68 (5), pp.2990-2998. (10.1109/TIT.2021.3133959)
DOI : 10.1109/TIT.2021.3133959
Neural network approaches to point lattice decoding
- Corlay Vincent
- Boutros Joseph J
- Ciblat Philippe
- Brunel Loïc
IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2022, 68 (5). We characterize the complexity of the lattice decoding problem from a neural network perspective. The notion of Voronoi-reduced basis is introduced to restrict the space of solutions to a binary set. On the one hand, this problem is shown to be equivalent to computing a continuous piecewise linear (CPWL) function restricted to the fundamental parallelotope. On the other hand, it is known that any function computed by a ReLU feed-forward neural network is CPWL. As a result, we count the number of affine pieces in the CPWL decoding function to characterize the complexity of the decoding problem. It is exponential in the space dimension n, which induces shallow neural networks of exponential size. For structured lattices we show that folding, a technique equivalent to using a deep neural network, makes it possible to reduce this complexity from exponential in n to polynomial in n. Regarding unstructured MIMO lattices, contrary to dense lattices, many pieces in the CPWL decoding function can be neglected for quasi-optimal decoding on the Gaussian channel. This makes the decoding problem easier and it explains why shallow neural networks of reasonable size are more efficient with this category of lattices (in low to moderate dimensions). (10.1109/TIT.2022.3147834)
DOI : 10.1109/TIT.2022.3147834
Cryptanalysis of the AEAD and hash algorithm DryGASCON
- Liang Huicong
- Mesnager Sihem
- Wang Meiqin
Cryptography and Communications - Discrete Structures, Boolean Functions and Sequences, Springer, 2022, 14 (3), pp.597-625. (10.1007/s12095-021-00542-7)
DOI : 10.1007/s12095-021-00542-7
Constructions of two-dimensional Z-complementary array pairs with large ZCZ ratio
- Zhang Hui
- Fan Cuiling
- Mesnager Sihem
Designs, Codes and Cryptography, Springer Verlag, 2022, 90 (5), pp.1221-1239. (10.1007/s10623-022-01035-1)
DOI : 10.1007/s10623-022-01035-1

Back to years