Publications

2021

A Primer on Alpha-Information Theory with Application to Leakage in Secrecy Systems
- Rioul Olivier
, 2021, 12829, pp.459 - 467. We give an informative review of the notions of Rényi's αentropy and α-divergence, Arimoto's conditional α-entropy, and Sibson's α-information, with emphasis on the various relations between them. All these generalize Shannon's classical information measures corresponding to α = 1. We present results on data processing inequalities and provide some new generalizations of the classical Fano's inequality for any α > 0. This enables one to α-information as a information theoretic metric of leakage in secrecy systems. Such metric can bound the gain of an adversary in guessing some secret (any potentially random function of some sensitive dataset) from disclosed measurements, compared with the adversary's prior belief (without access to measurements). (10.1007/978-3-030-80209-7_50)
DOI : 10.1007/978-3-030-80209-7_50
Metamorphic image registration using a semi-Lagrangian scheme
- François Anton
- Gori Pietro
- Glaunès Joan
, 2021, 12829, pp.781-788. In this paper, we propose an implementation of both Large Deformation Diffeomorphic Metric Mapping (LDDMM) and Metamorphosis image registration using a semi-Lagrangian scheme for geodesic shooting. We propose to solve both problems as an inexact matching providing a single and unifying cost function. We demonstrate that for image registration the use of a semi-Lagrangian scheme is more stable than a standard Eulerian scheme. Our GPU implementation is based on PyTorch, which greatly simplifies and accelerates the computations thanks to its powerful automatic differentiation engine. It will be freely available at https://github.com/antonfrancois/Demeter metamorphosis. (10.1007/978-3-030-80209-7_84)
DOI : 10.1007/978-3-030-80209-7_84
Zero-Knowledge Proofs for Committed Symmetric Boolean Functions
- Ling San
- Nguyen Khoa
- Phan Duong Hieu
- Tang Hanh
- Wang Huaxiong
, 2021, 12841, pp.339-359. (10.1007/978-3-030-81293-5_18)
DOI : 10.1007/978-3-030-81293-5_18
Modeling of a quantum dot gain chip in an external cavity laser configuration
- Ehlert Jannik
- Mugnier Alain
- He Gang
- Grillot Frédéric
Laser Physics, MAIK Nauka/Interperiodica, 2021, 31 (8), pp.085002. (10.1088/1555-6611/ac1073)
DOI : 10.1088/1555-6611/ac1073
Learning from Biased Data: A Semi-Parametric Approach
- Clémençon Stéphan
- Bertail Patrice
- Guyonvarch Yannick
- Noiry Nathan
, 2021 (139), pp.803-812. We consider risk minimization problems where the (source) distribution P S of the training observations Z 1 ,. .. , Z n differs from the (target) distribution P T involved in the risk that one seeks to minimize. Under the natural assumption that P S dominates P T , i.e. P T < < P S , we develop a semiparametric framework in the situation where we do not observe any sample from P T , but rather have access to some auxiliary information at the target population scale. More precisely, assuming that the Radon-Nikodym derivative dP T /dP S (z) belongs to a parametric class {g(z, α), α ∈ A} and that some (generalized) moments of P T are available to the learner, we propose a two-step learning procedure to perform the risk minimization task. We first selectα so as to match the moment constraints as closely as possible and then reweight each (biased) training observation Z i by g(Z i ,α) in the final Empirical Risk Minimization (ERM) algorithm. We establish a O P (1/ √ n) generalization bound proving that, remarkably, the solution to the weighted ERM problem thus constructed achieves a learning rate of the same order as that attained in absence of any sampling bias. Beyond these theoretical guarantees, numerical results providing strong empirical evidence of the relevance of the approach promoted in this article are displayed.
Relative Positional Encoding for Transformers with Linear Complexity
- Liutkus Antoine
- Cífka Ondřej
- Wu Shih-Lun
- Şimşekli Umut
- Yang Yi-Hsuan
- Richard Gael
, 2021, Proceedings of Machine Learning Research (139), pp.7067-7079. Recent advances in Transformer models allow for unprecedented sequence lengths, due to linear space and time complexity. In the meantime, relative positional encoding (RPE) was proposed as beneficial for classical Transformers and consists in exploiting lags instead of absolute positions for inference. Still, RPE is not available for the recent linear-variants of the Transformer, because it requires the explicit computation of the attention matrix, which is precisely what is avoided by such methods. In this paper, we bridge this gap and present Stochastic Positional Encoding as a way to generate PE that can be used as a replacement to the classical additive (sinusoidal) PE and provably behaves like RPE. The main theoretical contribution is to make a connection between positional encoding and cross-covariance structures of correlated Gaussian processes. We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation.
Confident Interpretations of Black Box Classifiers
- Radulovic Nedeljko
- Bifet Albert
- Suchanek Fabian
, 2021, pp.1-8. (10.1109/IJCNN52387.2021.9534234)
DOI : 10.1109/IJCNN52387.2021.9534234
Generalization Bounds in the Presence of Outliers: a Median-of-Means Study
- Laforgue Pierre
- Staerman Guillaume
- Clémençon Stéphan
, 2021. In contrast to the empirical mean, the Median-ofMeans (MoM) is an estimator of the mean θ of a square integrable r.v. Z, around which accuratenonasymptotic confidence bounds can be built, even when Z does not exhibit a sub-Gaussian tail behavior. Thanks to the high confidence it achieves on heavy-tailed data, MoM has found various applications in machine learning, where it is used to design training procedures that are not sensitive to atypical observations. More recently, a new line of work is now trying to characterize and leverage MoM’s ability to deal with corrupted data. In this context, the present work proposes a general study of MoM’s concentration properties under the contamination regime, that provides a clear understanding of the impact of the outlier proportion and the number of blocks chosen. The analysis is extended to (multisample) U-statistics, i.e. averages over tuples of observations, that raise additional challenges due to the dependence induced. Finally, we show that the latter bounds can be used in a straightforward fashion to derive generalization guarantees for pairwise learning in a contaminated setting, and propose an algorithm to compute provably reliable decision functions.
Cross-Modal Music-Video Recommendation: A Study of Design Choices
- Prétet Laure
- Richard Gael
- Peeters Geoffroy
, 2021.
Unbalanced Mallows Models for Optimizing Expensive Black-Box Permutation Problems
- Irurozki Ekhine
- López-Ibáñez Manuel
, 2021. Expensive black-box combinatorial optimization problems arise in practice when the objective function is evaluated by means of a simulator or a real-world experiment. Since each fitness evaluation is expensive in terms of time or resources, the number of possible evaluations is typically several orders of magnitude smaller than in non-expensive problems. Classical optimization methods are not useful in this scenario. In this paper, we propose and analyze UMM, an estimation-of-distribution (EDA) algorithm based on a Mallows probabilistic model and unbalanced rank aggregation (uBorda). Experimental results on black-box versions of LOP and PFSP show that UMM outperforms the solutions obtained by CEGO, a Bayesian optimization algorithm for combinatorial optimization. Nevertheless, a slight modification to CEGO, based on the different interpretations for rankings and orderings, significantly improves its performance, thus producing solutions that are slightly better than those of UMM and dramatically better than the original version. Another benefit of UMM is that its computational complexity increases linearly with both the number of function evaluations and the permutation size, which results in computation times an order of magnitude shorter than CEGO, making it specially useful when both computation time and number of evaluations are limited. CCS CONCEPTS • Mathematics of computing → Combinatorial optimization; • Theory of computation → Random search heuristics. (10.1145/3449639.3459366)
DOI : 10.1145/3449639.3459366
Dimensioning cellular IoT network using stochastic geometry and machine learning techniques
- Nguyen Tuan Anh
, 2021. Narrowband Internet of Things (NB-IoT) is a Low Power Wide Area technology, which was standardized in the Third Generation Partnership Project release, specifies a new random access procedure and a new transmission scheme for IoT. The advantages of the NB-IoT network are providing deep coverage, low power consumption, and support of a huge number of connections. Especially, NB-IoT can efficiently connect up to 50,000 devices per NB-IoT network cell.We focus our work on the study of NB-IoT network dimensioning. In this regard, we use stochastic geometry and machine learning techniques along with the thesis to characterize key performance indicators of the NB-IoT network, such as coverage probability, the number of required radio resource blocks, and the traffic pattern recognition and prediction based on the downlink control information. The thesis is divided into three major studies. Firstly, we derive the performance of uplink coverage probability in single-cell and multi-cell of NB-IoT network. The analytical expressions of the coverage and successful access probabilities in a single-cell NB-IoT network are presented by considering the packet arrival distribution. In the multi-cell scenario, a prediction of coverage probability is determined directly from the network parameters by using a Deep Neural Network. The subsequent analysis consists of an analytical model to calculate the required radio resource blocks in the multi-cell NB-IoT network and determine the network outage probability. This model is beneficial for operators because it clarifies how they should manage the available spectrum. Finally, the thesis addresses the recognition and prediction traffic type problems using the data collected from the Downlink Control Information. A wide group of machine learning algorithms are implemented and compared to identify the highest performances.The analysis conducted in this thesis demonstrates that stochastic geometry and machine learning techniques can serve as powerful tools to analyze the performance of the NB-IoT network. The frameworks developed in this work provide general analytical tools that can be readily extended to facilitate other research in 5G networks. (10.70675/444eba01ze277z4479zbf2eza143c513e1c4)
DOI : 10.70675/444eba01ze277z4479zbf2eza143c513e1c4
Error-Correction for Sparse Support Recovery Algorithms
- Mehrabi Mohammad
- Tchamkerten Aslan
, 2021, pp.1754-1759. This article proposes LiRE, a low complexity algorithm designed to efficiently correct errors made by any given baseline compressed sensing support recovery algorithms. LiRE takes as input an estimate sin of the true support s∗ of an m -sparse d -dimensional signal x observed through n linear measurements, and outputs a refined support estimate sout of size m . Sufficient conditions are established in the noiseless setup under which LiRE recovers all missed true features, i.e., sout⊇s∗ . Experimental results with Gaussian design matrices show that LiRE reduces the number of measurements needed for perfect support recovery via CoSaMP, BP, and OMP by up to 15%, 25%, and 40%, respectively, depending on the level of sparsity. Interestingly, adding LiRE to OMP yields a support recovery algorithm that is more accurate and significantly faster than Basis Pursuit. This conclusion carries over in the noisy measurement setup with the combination of LiRE and OMP against LASSO. These results suggest that LiRE may be used generically, on top of any baseline support recovery algorithm, to boost support recovery or to operate with a smaller number of measurements, at the cost of a relatively small computational overhead. Finally, with a random initialization LiRE becomes a standalone algorithm with OMP-like complexity, and whose reconstruction performance lies between OMP and BP. (10.1109/ISIT45174.2021.9518027)
DOI : 10.1109/ISIT45174.2021.9518027
A review of deep-learning techniques for SAR image restoration
- Denis Loïc
- Dalsasso Emanuele
- Tupin Florence
, 2021. The speckle phenomenon remains a major hurdle for the analysis of SAR images. The development of speckle reduction methods closely follows methodological progress in the field of image restoration. The advent of deep neural networks has offered new ways to tackle this longstanding problem. Deep learning for speckle reduction is a very active research topic and already shows restoration performances that exceed that of the previous generations of methods based on the concepts of patches, sparsity, wavelet transform or total variation minimization. The objective of this paper is to give an overview of the most recent works and point the main research directions and current challenges of deep learning for SAR image restoration. (10.1109/IGARSS47720.2021.9555039)
DOI : 10.1109/IGARSS47720.2021.9555039
Arithmétique efficace des extensions de corps finis
- Rousseau Édouard
, 2021. Finite fields are ubiquitous in cryptography and coding theory, two fields that are of utmost importance in modern communications. For that reason, it is crucial to represent finite fields and compute in them in the most efficient way possible. In this thesis, we investigate the arithmetic of finite field extensions in two different and independent ways.In the first part, we study the arithmetic of one fixed finite field extension F_{p^k}. When estimating the complexity of an algorithm in a finite field extension, we often count the arithmetic operations that are needed in the base field F_p. In such a model, all operations have the same unit cost. This is known as the algebraic complexity model. Nevertheless, it is known that multiplications are more expensive, i.e. take more time, than additions. For that reason, alternative models were studied, such as the bilinear complexity model, in which the assumption is that additions have no cost, thus we only count the multiplications. To have an efficient multiplication algorithm in the extension F_{p^k}, research has been done to obtain formulas in which the number of multiplications in the base field F_p are minimized. The optimal number of such multiplication is, by definition, the bilinear complexity of the multiplication in F_{p^k}. Finding the exact value of the bilinear complexity of the multiplication in finite field extensions is hard, but there exist algorithms to find optimal formulas in small dimension. Asymptotically, there exist different algorithms that give formulas that are not necessarily optimal but still give a linear upper bound on the bilinear complexity in the degree of the extension. We generalize these results to a new kind of complexity, called the hypersymmetric complexity, that is linked with formulas possessing extra properties of symmetry. We provide an ad hoc algorithm finding hypersymmetric formulas in small dimension, as well as an implementation and experimental results. Generalizing the proofs of the literature, we also prove that the hypersymmetric complexity is still linear in the degree of the extension.In the second part, we work with multiple finite field extensions simultaneously. In most computer algebra systems, it is possible to deal with finite fields, but two arbitrary extensions are often seen as independent objects, and the links between them are not necessarily accessible to the user. Our goal in this part is to construct an efficient data structure to represent multiple extensions, and the embeddings between them. We also want the embeddings to be compatible, i.e. if we have three integers a, b, c such that a | b | c, we want the composition of the embeddings from F_{p^a} to F_{p^b} and F_{p^b} to F_{p^c} to be equal to the embedding from F_{p^a} to F_{p^c}. We call this data structure a lattice of compatibly embedded finite fields. We provide an implementation of the Bosma-Canon-Steel framework, a lattice of compatibly embedded finite fields that was only available in MAGMA, as well as experimental results. After this work, we also added the Bosma-Canon-Steel framework to the computer algebra system Nemo.Another popular method to obtain lattices of compatibly embedded finite fields is to use Conway polynomials. It is quite efficient but the extensions have to be defined using these precomputed special polynomials to obtain compatibility between embeddings. Inspired by both the Bosma-Canon-Steel framework and the Conway polynomials, we construct a new kind of lattice, that we call standard lattice of compatibly embedded finite fields. This construction allows us to use arbitrary finite field extensions, while being rather efficient. We provide a detailed complexity analysis of the algorithms involved in this construction, as well as experimental results to show that the construction is practical. (10.70675/5094cdcbz9865z4674z8591z6383aef42092)
DOI : 10.70675/5094cdcbz9865z4674z8591z6383aef42092
Bent Sequences over Hadamard Codes for Physically Unclonable Functions
- Solé Patrick
- Cheng Wei
- Guilley Sylvain
- Rioul Olivier
, 2021, pp.801-806. We study challenge codes for physically unclonable functions (PUFs). Starting from the classical Hadamard challenge code, we augment it by one vector. Numerical values suggest that the optimal choice of this vector for maximizing the entropy is to pick a vector the farthest away from the code formed by the challenges and their binary complements. This leads us to study the covering radius of Hadamard codes. A notion of bent sequence that generalizes the classical notion from Hadamard matrices of Sylvester type to general Hadamard matrices is given. Lower bounds for Paley-type Hadamard matrices are given. (10.1109/ISIT45174.2021.9517752)
DOI : 10.1109/ISIT45174.2021.9517752
On the Capacity Region of Gaussian Broadcast Channels under Two-Sided Noisy Feedback
- Ravi Aditya Narayan
- Pillai Sibi Raj B.
- Prabhakaran Vinod M
- Wigger Michèle
, 2021, pp.2870-2875. The capacity region of several multiuser models in information theory can be enlarged by utilizing feedback of the received symbols. This is in contradiction to the discrete memoryless case, where feedback is known not to change the capacity. In this paper, we consider two broadcast models with noisy feedback from both the receivers. The models are derived from a standard memoryless scalar GBC, where two intermediate passive nodes are assumed to be observing the transmissions via separate noisy links corrupted by independent AWGN. In our first model, the scalar output from each intermediate node is passed through two additional independent AWGN links, called feedback and forward links. The output of the feedback link is observed by the transmitter as feedback, whereas only the forward link is observed by the corresponding decoder. We derive conditions that are both necessary and sufficient for feedback to enlarge the capacity region. In the second model, the two outputs of a standard GBC are observed by the respective decoders, but the transmitter observes the sum of the symbols at the receivers using causal feedback. We show that such a feedback has no effect on the capacity region. (10.1109/ISIT45174.2021.9518098)
DOI : 10.1109/ISIT45174.2021.9518098
Exploiting multi-temporal information for improved speckle reduction of Sentinel-1 SAR images by deep learning
- Dalsasso Emanuele
- Meraoumia Inès
- Denis Loïc
- Tupin Florence
, 2021. Deep learning approaches show unprecedented results for speckle reduction in SAR amplitude images. The wide availability of multi-temporal stacks of SAR images can improve even further the quality of denoising. In this paper, we propose a flexible yet efficient way to integrate temporal information into a deep neural network for speckle suppression. Archives provide access to long time-series of SAR images, from which multi-temporal averages can be computed with virtually no remaining speckle fluctuations. The proposed method combines this multi-temporal average and the image at a given date in the form of a ratio image and uses a state-of-the-art neural network to remove the speckle in this ratio image. This simple strategy is shown to offer a noticeable improvement compared to filtering the original image without knowledge of the multi-temporal average. (10.1109/IGARSS47720.2021.9554555)
DOI : 10.1109/IGARSS47720.2021.9554555
Interaction Fidelity vs User’s Workload in a VR Environment: A Pilot Study
- Mancini Maurizio
- Spreadborough Jake
- Maye Laura
- Biancardi Beatrice
- Varni Giovanna
, 2021.
Cumulant Expansion of Mutual Information for Quantifying Leakage of a Protected Secret
- Rioul Olivier
- Cheng Wei
- Guilley Sylvain
, 2021. The information leakage of a cryptographic implementation with a given degree of protection is evaluated in a typical situation when the signal-to-noise ratio is small. This is solved by expanding Kullback-Leibler divergence, entropy, and mutual information in terms of moments/cumulants.
Concentric Mixtures of Mallows Models for Top-k Rankings: Sampling and Identifiability
- Collas Fabien
- Irurozki Ekhine
, 2021. In this paper, we study mixtures of two Mallows models for top-k rankings with equal location parameters but with different scale parameters (a mixture of concentric Mallows models). These models arise when we have a heterogeneous population of voters formed by two populations, one of which is a subpopulation of expert voters. We show the identifiability of both components and the learnability of their respective parameters. These results are based upon, first, bounding the sample complexity for the Borda algorithm with top-k rankings. Second, we characterize the distances between rankings, showing that an off-theshelf clustering algorithm separates the rankings by components with high probability-provided the scales are well-separated.As a by-product, we include an efficient sampling algorithm for Mallows top-k rankings. Finally, since the rank aggregation will suffer from a large amount of noise introduced by the non-expert voters, we adapt the Borda algorithm to be able to recover the ground truth consensus ranking which is especially consistent with the expert rankings.
First-and Second-Moment Constrained Gaussian Channels
- Ma Shuai
- Wigger Michèle
, 2021. This paper studies the channel capacity of intensitymodulation direct-detection (IM/DD) visible light communication (VLC) systems under both optical and electrical power constraints. Specifically, it derives the asymptotic capacities in the high and low signal-to-noise ratio (SNR) regimes under peak, first-moment, and second-moment constraints. The results show that first-and second-moment constraints are never simultaneously active in the asymptotic low-SNR regime, and only in few cases in the asymptotic high-SNR regime. Moreover, the secondmoment constraint is more stringent in the asymptotic low-SNR regime than in the high-SNR regime.
Dynamic Membership for Regular Languages
- Amarilli Antoine
- Jachiet Louis
- Paperman Charles
, 2021, 198, pp.116:1--116:17. We study the dynamic membership problem for regular languages: fix a language L, read a word w, build in time O(|w|) a data structure indicating if w is in L, and maintain this structure efficiently under letter substitutions on w. We consider this problem on the unit cost RAM model with logarithmic word length, where the problem always has a solution in O(log |w| / log log |w|) per operation. We show that the problem is in O(log log |w|) for languages in an algebraically-defined, decidable class QSG, and that it is in O(1) for another such class QLZG. We show that languages not in QSG admit a reduction from the prefix problem for a cyclic group, so that they require {\Omega}(log |w| / log log |w|) operations in the worst case; and that QSG languages not in QLZG admit a reduction from the prefix problem for the multiplicative monoid U 1 = {0, 1}, which we conjecture cannot be maintained in O(1). This yields a conditional trichotomy. We also investigate intermediate cases between O(1) and O(log log |w|). Our results are shown via the dynamic word problem for monoids and semigroups, for which we also give a classification. We thus solve open problems of the paper of Skovbjerg Frandsen, Miltersen, and Skyum [30] on the dynamic word problem, and additionally cover regular languages. (10.4230/LIPIcs.ICALP.2021.116)
DOI : 10.4230/LIPIcs.ICALP.2021.116
Despeckling Sentinel-1 GRD images by deep learning and application to narrow river segmentation
- Gasnier Nicolas
- Dalsasso Emanuele
- Denis Loïc
- Tupin Florence
, 2021. This paper presents a despeckling method for Sentinel-1 GRD images based on the recently proposed framework "SAR2SAR": a self-supervised training strategy. Training the deep neural network on collections of Sentinel 1 GRD images leads to a despeckling algorithm that is robust to space-variant spatial correlations of speckle. Despeckled images improve the detection of structures like narrow rivers. We apply a detector based on exogenous information and a linear features detector and show that rivers are better segmented when the processing chain is applied to images pre-processed by our despeckling neural network. (10.1109/igarss47720.2021.9554350)
DOI : 10.1109/igarss47720.2021.9554350
Learning Output Embeddings in Structured Prediction
- Brogat-Motte Luc
- Rudi Alessandro
- Brouard Celine
- Rousu Juho
- d'Alché-Buc Florence
, 2021. A powerful and flexible approach to structured prediction consists in embedding the structured objects to be predicted into a feature space of possibly infinite dimension by means of output kernels, and then, solving a regression problem in this output space. A prediction in the original space is computed by solving a pre-image problem. In such an approach, the embedding, linked to the target loss, is defined prior to the learning phase. In this work, we propose to jointly learn a finite approximation of the output embedding and the regression function into the new feature space. For that purpose, we leverage a priori information on the outputs and also unexploited unsupervised output data, which are both often available in structured prediction problems. We prove that the resulting structured predictor is a consistent estimator, and derive an excess risk bound. Moreover, the novel structured prediction tool enjoys a significantly smaller computational complexity than former output kernel methods. The approach empirically tested on various structured prediction problems reveals to be versatile and able to handle large datasets.
Information Leakages in Code-based Masking: A Unified Quantification Approach
- Cheng Wei
- Guilley Sylvain
- Carlet Claude
- Danger Jean-Luc
- Mesnager Sihem
IACR Transactions on Cryptographic Hardware and Embedded Systems, IACR, 2021, pp.465-495. This paper presents a unified approach to quantifying the information leakages in the most general code-based masking schemes. Specifically, by utilizing a uniform representation, we highlight first that all code-based masking schemes’ side-channel resistance can be quantified by an all-in-one framework consisting of two easy-tocompute parameters (the dual distance and the number of conditioned codewords) from a coding-theoretic perspective. In particular, we use signal-to-noise ratio (SNR) and mutual information (MI) as two complementary metrics, where a closed-form expression of SNR and an approximation of MI are proposed by connecting both metrics to the two coding-theoretic parameters. Secondly, considering the connection between Reed-Solomon code and SSS (Shamir’s Secret Sharing) scheme, the SSS-based masking is viewed as a particular case of generalized code-based masking. Hence as a straightforward application, we evaluate the impact of public points on the side-channel security of SSS-based masking schemes, namely the polynomial masking, and enhance the SSS-based masking by choosing optimal public points for it. Interestingly, we show that given a specific security order, more shares in SSS-based masking leak more information on secrets in an information-theoretic sense. Finally, our approach provides a systematic method for optimizing the side-channel resistance of every code-based masking. More precisely, this approach enables us to select optimal linear codes (parameters) for the generalized code-based masking by choosing appropriate codes according to the two coding-theoretic parameters. Summing up, we provide a best-practice guideline for the application of code-based masking to protect cryptographic implementations. (10.46586/tches.v2021.i3.465-495)
DOI : 10.46586/tches.v2021.i3.465-495

Retour aux années