Publications

2020

A Model-Based Combination Language for Scheduling Verification
- Zhao Hui
- Apvrille Ludovic
- Mallet Frédéric
, 2020. Cyber-Physical Systems (CPSs) are built upon discrete software and hardware components, as well as continuous physical components. Such heterogeneous systems involve numerous domains with competencies and expertise that go far beyond traditional software engineering: systems engineering. In this paper , we explore a model-based approach for systems engineering that advocates the composition of several heterogeneous artifacts (called views) into a sound and consistent system model. A model combination Language is proposed for this purpose. Thus, rather than trying to build the universal language able to capture all possible aspects of systems, the proposed language proposes to relate small subsets of languages in order to offer specific analysis capabilities while keeping a global consistency between all joined models. We demonstrate the interest of our approach through an industrial process based on Capella, which provides (among others) a large support for functional analysis from requirements to components deployment. Even though Capella is already quite expressive, it lacks support for schedulability analysis. AADL is also a language dedicated to system analysis. If it is backed with advanced schedulability tools, it lacks support for functional analysis. Thus, instead of proposing ways to add missing aspects in either Capella or AADL, we rather extract a relevant subset of both languages to build a view adequate for conducting schedulability analysis of Capella functional models. Finally, our combination language is generic enough to extract pertinent subsets of languages and combine them to build views for different experts. It also helps maintaining a global consistency between different modeling views.
Monadic Datalog, Tree Validity, and Limited Access Containment
- Benedikt Michael
- Bourhis Pierre
- Gottlob Georg
- Senellart Pierre
ACM Transactions on Computational Logic, Association for Computing Machinery, 2020, 21 (1), pp.6:1-6:45. We reconsider the problem of containment of monadic datalog (MDL) queries in unions of conjunctive queries (UCQs). Prior work has dealt with special cases of the problem, but has left the precise complexity characterization open. In addition, the complexity of one important special case, that of containment under access patterns, was not known before. We start by revisiting the connection between MDL/UCQ containment and containment problems involving regular tree languages. We then present a general approach for getting tighter bounds on the complexity of query containment, based on analysis of the number of mappings of queries into tree-like instances. We give two applications of the machinery. We first give an important special case of the MDL/UCQ containment problem that is in EXPTIME, and use this bound to show an EXPTIME bound on containment under access patterns. Secondly we show that the same technique can be used to get a new tight upper bound for containment of tree automata in UCQs. We finally show that the new MDL/UCQ upper bounds are tight. We establish a 2EXPTIME lower bound on the MDL/UCQ containment problem, resolving an open problem from the early 1990s. This bound holds for the MDL/CQ containment problem as well. We also show that changes to the conditions given in our special cases can not be eliminated, and that in particular slight variations of the problem of containment under access patterns become 2EXPTIME-complete. (10.1145/3344514)
DOI : 10.1145/3344514
SPECTRAL EMBEDDING OF REGULARIZED BLOCK MODELS
- de Lara Nathan
- Bonald Thomas
, 2020. Spectral embedding is a popular technique for the representation of graph data. Several regularization techniques have been proposed to improve the quality of the embedding with respect to downstream tasks like clustering. In this paper, we explain on a simple block model the impact of the complete graph regularization, whereby a constant is added to all entries of the adjacency matrix. Specifically, we show that the regularization forces the spectral embedding to focus on the largest blocks, making the representation less sensitive to noise or outliers. We illustrate these results on both on both synthetic and real data, showing how regularization improves standard clustering scores.
Template-Based Graph Clustering
- Riva M.
- Yger F.
- Gori P.
- Cesar R.
- Bloch Isabelle
, 2020.
Quantitative Propagation of Chaos for SGD in Wide Neural Networks
- de Bortoli Valentin
- Durmus Alain
- Fontaine Xavier
- Şimşekli Umut
, 2020. In this paper, we investigate the limiting behavior of a continuous-time counterpart of the Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural networks, as the number or neurons (ie, the size of the hidden layer) $N \to +\infty$. Following a probabilistic approach, we show 'propagation of chaos' for the particle system defined by this continuous-time dynamics under different scenarios, indicating that the statistical interaction between the particles asymptotically vanishes. In particular, we establish quantitative convergence with respect to $N$ of any particle to a solution of a mean-field McKean-Vlasov equation in the metric space endowed with the Wasserstein distance. In comparison to previous works on the subject, we consider settings in which the sequence of stepsizes in SGD can potentially depend on the number of neurons and the iterations. We then identify two regimes under which different mean-field limits are obtained, one of them corresponding to an implicitly regularized version of the minimization problem at hand. We perform various experiments on real datasets to validate our theoretical results, assessing the existence of these two regimes on classification problems and illustrating our convergence results.
Statistical and Topological Properties of Sliced Probability Divergences
- Nadjahi Kimia
- Durmus Alain
- Chizat Lénaïc
- Kolouri Soheil
- Shahrampour Shahin
- Şimşekli Umut
, 2020. The idea of slicing divergences has been proven to be successful when comparing two probability measures in various machine learning applications including generative modeling, and consists in computing the expected value of a `base divergence' between one-dimensional random projections of the two measures. However, the computational and statistical consequences of such a technique have not yet been well-established. In this paper, we aim at bridging this gap and derive some properties of sliced divergence functions. First, we show that slicing preserves the metric axioms and the weak continuity of the divergence, implying that the sliced divergence will share similar topological properties. We then precise the results in the case where the base divergence belongs to the class of integral probability metrics. On the other hand, we establish that, under mild conditions, the sample complexity of the sliced divergence does not depend on the dimension, even when the base divergence suffers from the curse of dimensionality. We finally apply our general results to the Wasserstein distance and Sinkhorn divergences, and illustrate our theory on both synthetic and real data experiments.
Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks
- Şimşekli Umut
- Sener Ozan
- Deligiannidis George
- Erdogdu Murat A.
, 2020. Despite its success in a wide range of applications, characterizing the generalization properties of stochastic gradient descent (SGD) in non-convex deep learning problems is still an important challenge. While modeling the trajectories of SGD via stochastic differential equations (SDE) under heavy-tailed gradient noise has recently shed light over several peculiar characteristics of SGD, a rigorous treatment of the generalization properties of such SDEs in a learning theoretical framework is still missing. Aiming to bridge this gap, in this paper, we prove generalization bounds for SGD under the assumption that its trajectories can be well-approximated by a \emph{Feller process}, which defines a rich class of Markov processes that include several recent SDE representations (both Brownian or heavy-tailed) as its special case. We show that the generalization error can be controlled by the \emph{Hausdorff dimension} of the trajectories, which is intimately linked to the tail behavior of the driving process. Our results imply that heavier-tailed processes should achieve better generalization; hence, the tail-index of the process can be used as a notion of "capacity metric". We support our theory with experiments on deep neural networks illustrating that the proposed capacity metric accurately estimates the generalization error, and it does not necessarily grow with the number of parameters unlike the existing capacity metrics in the literature.
Stein's method for rough paths
- Coutin Laure
- Decreusefond Laurent
Potential Analysis, Springer Verlag, 2020, 53, pp.387--406. The original Donsker theorem says that a standard random walk converges in distribution to a Brownian motion in the space of continuous functions. It has recently been extended to enriched random walks and enriched Brownian motion. We use the Stein-Dirichlet method to precise the rate of this convergence in the topology of fractional Sobolev spaces. (10.1007/s11118-019-09773-z)
DOI : 10.1007/s11118-019-09773-z
Discrepancies of Measured SAR between Traditional and Fast Measuring Systems
- Liu Zicheng
- Allal Djamel
- Cox Maurice
- Wiart Joe
International Journal of Environmental Research and Public Health, MDPI, 2020. Human exposure to mobile devices is traditionally measured by a system in which the human body (or head) is modelled by a phantom and the energy absorbed from the device is estimated based on the electric fields measured with a single probe. Such a system suffers from low efficiency due to repeated volumetric scanning within the phantom needed to capture the absorbed energy throughout the volume. To speed up the measurement, fast SAR (specific absorption rate) measuring systems have been developed. However, discrepancies of measured results are observed between traditional and fast measuring systems. In this paper, the discrepancies in terms of post-processing procedures after the measurement of electric field (or its amplitude) are investigated. Here, the concerned fast measuring system estimates SAR based on the reconstructed field of the region of interest while the amplitude and phase of the electric field are measured on a single plane with a probe array. The numerical results presented indicate that the fast SAR measuring system has the potential to yield more accurate estimations than the traditional system, but no conclusion can be made on which kind of system is superior without knowledge of the field-reconstruction algorithms and the emitting source. (10.3390/ijerph17062111)
DOI : 10.3390/ijerph17062111
Stein's method for diffusive limit of queueing processes
- Besançon Eustache
- Decreusefond Laurent
- Moyal Pascal
Queueing Systems, Springer Verlag, 2020, 95, pp.173--201. Donsker Theorem is perhaps the most famous invariance principle result for Markov processes. It states that when properly normalized, a random walk behaves asymptotically like a Brownian motion. This approach can be extended to general Markov processes whose driving parameters are taken to a limit, which can lead to insightful results in contexts like large distributed systems or queueing networks. The purpose of this paper is to assess the rate of convergence in these so-called diffusion approximations, in a queueing context. To this end, we extend the functional Stein method introduced for the Brownian approximation of Poisson processes, to two simple examples: the single-server queue and the infinite-server queue. By doing so, we complete the recent applications of Stein's method to queueing systems, with results concerning the whole trajectory of the considered process, rather than its stationary distribution. (10.1007/s11134-020-09658-8)
DOI : 10.1007/s11134-020-09658-8
GAME-ON: A Multimodal Dataset for Cohesion and Group Analysis
- Maman Lucien
- Ceccaldi Eleonora
- Lehmann-Willenbrock Nale
- Likforman-Sulem Laurence
- Chetouani Mohamed
- Volpe Gualtiero
- Varni Giovanna
IEEE Access, IEEE, 2020, 8, pp.124185-124203. (10.1109/ACCESS.2020.3005719)
DOI : 10.1109/ACCESS.2020.3005719
The Need to Move beyond Triples
- Suchanek Fabian
, 2020. Almost all major knowledge bases are concerned mainly with binary relationships between entities. In this vision paper, we argue that it is time to broaden this view: first to relations of higher arity, complex objects, and events, and then also to knowledge about knowledge: We should be able to represent why something is true, that something is not true, that something happened before something else, or that something is mainly believed. While this idea is as old as Artificial Intelligence itself, we argue that only now we have the tools to achieve it: a better understanding of our use-cases and large amounts of data. We survey relevant approaches, and point out avenues of research.
On the boomerang uniformity of quadratic permutations
- Mesnager Sihem
- Tang C.
- Xiong M.
Journal of Designs, Codes, and Cryptography, 2020.
Solving $x+x^{2^l}+\cdots+x^{2^{ml}}=a$ over $\mathbb{F}_{2^n}$.
- Mesnager Sihem
- Kim K.H.
- Choe J.H.
- Lee D.N.
- Go D.S.
Cryptography and Communications-Discrete Structures, Boolean Functions, and Sequences (CCDS), 2020.
Advances in Neural Information Processing Systems 32 (NeurIPS 2019)
- Wallach Hanna M.
- Larochelle Hugo
- Beygelzimer Alina
- d'Alché-Buc Florence
- Fox Emily B.
- Garnett Roman
, 2020.
Matrix Factorization for High Frequency Non Intrusive Load Monitoring
- Henriet Simon
- Fuentes Benoît
- Şimşekli Umut
- Richard Gael
, 2020, pp.20-24. Non Intrusive Load Monitoring has been introduced 30 years ago in order to monitor the electric consumption of specific equipments inside a building without the need of installing multiples sensors. During three decades, researchers and industrials have described the NILM problems according to the electric data available, the desired quantity to be monitored and the application it was used for. As a consequence of the multitude of choices, a lot of different formulations can be found in the literature. This diversity makes it difficult for researchers from general domains such as machine learning to tackle the NILM problem. In this paper we aim at defining the NILM problem as a Matrix Factorization task using high frequency measurements and also to review methods to solve this problem. We start by defining the general concepts driving the NILM problem and then show how to cast high frequency NILM into a Matrix Factorization problem. Once casted as a machine learning problem, we will review general purposes algorithms applicable to this problem such as Independent Component Analysis, Sparse Coding or Semi Non-negative Matrix Factorization and specific NILM methods such as BOLT and IVMF. (10.1145/3427771.3427847)
DOI : 10.1145/3427771.3427847
QFib: Fast and Efficient Brain Tractogram Compression
- Mercier Corentin
- Rousseau S.
- Gori P.
- Bloch Isabelle
- Boubekeur T.
Neuroinformatics, Springer, 2020, 18, pp.627-640. Diffusion MRI fiber tracking datasets can contain millions of 3D streamlines, and their representation can weight tens of gigabytes of memory. These sets of streamlines are called tractograms and are often used for clinical operations or research. Their size makes them difficult to store, visualize, process or exchange over the network. We propose a new compression algorithm well-suited for trac-tograms, by taking advantage of the way streamlines are obtained with usual tracking algorithms. Our approach is based on unit vector quantization methods combined with a spatial transformation which results in low compression and decompression times, as well as a high compression ratio. For instance, a 11.5GB tractogram can be compressed to a 1.02GB file and decompressed in 11.3 seconds. Moreover, our method allows for the compression and decompression of individual streamlines, reducing the need for a costly out-of-core algorithm with heavy datasets. Last, we open a way toward on-the-fly compression and decompression for handling larger datasets without needing a load of RAM (i.e. in-core handling), faster network exchanges and faster loading times for visualization or processing.
Motion correction for PET data using subspace-based real-time MR imaging in simultaneous PET/MR
- Marin Thibault
- Djebra Y.
- Han P.
- Chemli Y.
- Bloch Isabelle
- El Fakhri G.
- Ouyang J.
- Petibon Y.
- Ma C.
Physics in Medicine and Biology, IOP Publishing, 2020, 65.
A Surrogate Model Based on Artificial Neural Network for RF Radiation Modelling with High-Dimensional Data
- Cheng Xi
- Henry Clément
- Andriulli Francesco
- Person Christian
- Wiart Joe
International Journal of Environmental Research and Public Health, MDPI, 2020. This paper focuses on quantifying the uncertainty in the specific absorption rate values of the brain induced by the uncertain positions of the electroencephalography electrodes placed on the patient's scalp. To avoid running a large number of simulations, an artificial neural network architecture for uncertainty quantification involving high-dimensional data is proposed in this paper. The proposed method is demonstrated to be an attractive alternative to conventional uncertainty quantification methods because of its considerable advantage in the computational expense and speed. (10.3390/ijerph17072586)
DOI : 10.3390/ijerph17072586
Introducing coherent MIMO sensing, a fading-resilient, polarization-independent approach to ϕ-OTDR
- Guerrier Sterenn
- Dorize Christian
- Awwad Elie
- Renaudier Jeremie
Optics Express, Optical Society of America - OSA Publishing, 2020, 28 (14), pp.21081. (10.1364/OE.396460)
DOI : 10.1364/OE.396460
Storage-Computation-Communication Tradeoff in Distributed Computing: Fundamental Limits and Complexity
- Yan Qifa
- Yang Sheng
- Wigger Michèle
IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2020. Distributed computing has become one of the most important frameworks in dealing with large computation tasks. In this paper, we propose a systematic construction of coded computing schemes for MapReduce-type distributed systems. The construction builds upon placement delivery arrays (PDA), originally proposed by Yan et al. for coded caching schemes. The main contributions of our work are threefold. First, we identify a class of PDAs, called Comp-PDAs, and show how to obtain a coded computing scheme from any Comp-PDA. We also characterize the normalized number of stored files (storage load), computed intermediate values (computation load), and communicated bits (communication load), of the obtained schemes in terms of the Comp-PDA parameters. Then, we show that the performance achieved by Comp-PDAs describing Maddah-Ali and Niesen's coded caching schemes matches a new information-theoretic converse, thus establishing the fundamental region of all achievable performance triples. In particular, we characterize all the Comp-PDAs achieving the pareto-optimal storage, computation, and communication (SCC) loads of the fundamental region. Finally, we investigate the file complexity of the proposed schemes, i.e., the smallest number of files required for implementation. In particular, we describe Comp-PDAs that achieve pareto-optimal SCC triples with significantly lower file complexity than the originally proposed Comp-PDAs.
Degrees-of-Freedom in Multi-Cloud Based Sectored Cellular Networks
- Gelincik Samet
- Rekaya-Ben Othman Ghaya
Entropy, MDPI, 2020.
Combined Vector Perturbation for Adaptive Modulation in MU-MIMO
- Askri Aymen
- Rekaya-Ben Othman Ghaya
, 2020.
Space-Time Coding Performance Analysis for CDL-impaired Multi-Core Fiber Transmission
- Abouseif Akram
- Rekaya-Ben Othman Ghaya
- Jaouën Yves
, 2020.
An analysis of the transfer learning of convolutional neural networks for artistic images
- Gonthier Nicolas
- Gousseau Yann
- Ladjal Saïd
, 2020. Transfer learning from huge natural image datasets, fine-tuning of deep neural networks and the use of the corresponding pre-trained networks have become de facto the core of art analysis applications. Nevertheless, the effects of transfer learning are still poorly understood. In this paper, we first use techniques for visualizing the network internal representations in order to provide clues to the understanding of what the network has learned on artistic images. Then, we provide a quantitative analysis of the changes introduced by the learning process thanks to metrics in both the feature and parameter spaces, as well as metrics computed on the set of maximal activation images. These analyses are performed on several variations of the transfer learning procedure. In particular, we observed that the network could specialize some pre-trained filters to the new image modality and also that higher layers tend to concentrate classes. Finally, we have shown that a double fine-tuning involving a medium-size artistic dataset can improve the classification on smaller datasets, even when the task changes. (10.1007/978-3-030-68796-0_39)
DOI : 10.1007/978-3-030-68796-0_39

Retour aux années