
Publications

2018

  • Handwriting Recognition of Historical Documents with Few Labeled Data
    • Chammas Edgard
    • Mokbel Chafic
    • Likforman-Sulem Laurence
    , 2018, pp.43-48. Historical documents present many challenges for offline handwriting recognition systems, among them the segmentation and labeling steps. Carefully annotated text lines are needed to train an HTR system. In some scenarios, transcripts are only available at the paragraph level with no text-line information. In this work, we demonstrate how to train an HTR system with few labeled data. Specifically, we train a deep convolutional recurrent neural network (CRNN) system on only 10% of manually labeled text-line data from a dataset and propose an incremental training procedure that covers the rest of the data. Performance is further increased by augmenting the training set with specially crafted multiscale data. We also propose a model-based normalization scheme which considers the variability in the writing scale at the recognition phase. We apply this approach to the publicly available READ dataset. Our system achieved the second best result during the ICDAR2017 competition [1]. (10.1109/DAS.2018.15)
    DOI : 10.1109/DAS.2018.15
  • Softwarized and Distributed Learning for SON Management Systems
    • Daher Tony
    • Jemaa Sana Ben
    • Decreusefond Laurent
    , 2018. Self-Organizing Networks (SON) functions have already proven to be useful for network operations. However, a higher automation level is required to make a network enabled with SON capabilities respond as a whole to the operator's objectives. For this purpose, a Policy Based SON Management (PBSM) layer has been proposed to manage the deployed SON functions. In this paper, we propose to empower the PBSM with cognition capability in order to efficiently manage SON-enabled networks. We focus particularly on the implementation of such a Cognitive PBSM (C-PBSM) on a large-scale network and propose a scalable approach based on distributed Reinforcement Learning (RL): RL agents are deployed on different clusters of the network. These clusters should be defined in such a way that the RL agents can learn independently. As the interaction between these clusters may evolve in time due, for instance, to traffic dynamics, we propose a flexible implementation of this C-PBSM framework with dynamic clustering to adapt to the network's evolutions. We show how this flexible implementation is rendered possible under the Software Defined Networks (SDN) framework. We also assess the performance of the proposed distributed learning approach on an LTE-A simulator.
  • Listing k-cliques in Sparse Real-World Graphs
    • Danisch Maximilien
    • Balalau Oana
    • Sozio Mauro
    , 2018, pp.589-598. Motivated by recent studies in the data mining community that require efficiently listing all k-cliques, we revisit the iconic algorithm of Chiba and Nishizeki and develop the most efficient parallel algorithm for such a problem. Our theoretical analysis provides the best asymptotic upper bound on the running time of our algorithm for the case when the input graph is sparse. Our experimental evaluation on large real-world graphs shows that our parallel algorithm is faster than state-of-the-art algorithms, while boasting an excellent degree of parallelism. In particular, we are able to list all k-cliques (for any k) in graphs containing up to tens of millions of edges as well as all 10-cliques in graphs containing billions of edges, within a few minutes and a few hours respectively. Finally, we show how our algorithm can be employed as an effective subroutine for finding the k-clique core decomposition and an approximate k-clique densest subgraph in very large real-world graphs. (10.1145/3178876.3186125)
    DOI : 10.1145/3178876.3186125
  • A Knowledge Base for Personal Information Management
    • Montoya David
    • Tanon Thomas Pellissier
    • Abiteboul Serge
    • Senellart Pierre
    • Suchanek Fabian M.
    , 2018. Internet users have personal data spread over several devices and across several web systems. In this paper, we introduce a novel open-source framework for integrating the data of a user from different sources into a single knowledge base. Our framework integrates data of different kinds into a coherent whole, starting with email messages, calendar, contacts, and location history. We show how event periods in the user's location data can be detected and how they can be aligned with events from the calendar. This allows users to query their personal information within and across different dimensions, and to perform analytics over their emails, events, and locations. Our system models data using RDF, extending the schema.org vocabulary and providing a SPARQL interface.
  • Fully Dynamic k-Center Clustering
    • Chan T-H. Hubert
    • Guerqin Arnaud
    • Sozio Mauro
    , 2018, pp.579-587. Static and dynamic clustering algorithms are a fundamental tool in any machine learning library. Most of the efforts in developing dynamic machine learning and data mining algorithms have been focusing on the sliding window model (where at any given point in time only the most recent data items are retained) or more simplistic models. However, in many real-world applications one might need to deal with arbitrary deletions and insertions. For example, one might need to remove data items that are not necessarily the oldest ones, because they have been flagged as containing inappropriate content or due to privacy concerns. Clustering trajectory data might also require dealing with more general update operations. We develop a (2 + ε)-approximation algorithm for the k-center clustering problem with "small" amortized cost under the fully dynamic adversarial model. In such a model, points can be added or removed arbitrarily, provided that the adversary does not have access to the random choices of our algorithm. The amortized cost of our algorithm is poly-logarithmic when the ratio between the maximum and minimum distance between any two points in input is bounded by a polynomial, while k and ε are constant. Our theoretical results are complemented with an extensive experimental evaluation on dynamic data from Twitter, Flickr, as well as trajectory data, demonstrating the effectiveness of our approach. (10.1145/3178876.3186124)
    DOI : 10.1145/3178876.3186124
  • Property Label Stability in Wikidata
    • Pellissier Tanon Thomas
    • Kaffee Lucie-Aimée
    , 2018. Stability in Wikidata's schema is essential for the reuse of its data. In this paper, we analyze the stability of the data based on the changes in labels of properties in six languages. We find that the schema is overall stable, making it a reliable resource for external usage. (10.1145/3184558.3191643)
    DOI : 10.1145/3184558.3191643
  • Making Sense of Data Workers' Sense Making Practices
    • Liu Jiali
    • Boukhelifa Nadia
    • Eagan James R
    , 2018. Data workers are non-professional data scientists who engage in data analysis activities as part of their daily work. In this position paper, we draw on our past experience in studying their data analysis processes and workflows, and the tools we built to support sensemaking. We describe our background as computer scientists and our multidisciplinary approach. Finally, we conclude with open questions and research directions, and argue for more research into the challenges faced by data workers.
  • Self-Reflection and Personal Physicalization Construction
    • Thudt Alice
    • Hinrichs Uta
    • Huron Samuel
    • Carpendale Sheelagh
    , 2018. Self-reflection is a central goal of personal informatics systems, and constructing visualizations from physical tokens has been found to help people reflect on data. However, so far, physicalization construction has only been studied in lab environments with provided datasets. Our qualitative study investigates the construction of personal physicalizations in people's domestic environments over 2-4 weeks. It contributes an understanding of (1) the process of creating personal physicalizations, (2) the types of personal insights facilitated, (3) the integration of self-reflection in the physicalization process, and (4) its benefits and challenges for self-reflection. We found that in constructive personal physicalization, data collection, construction and self-reflections are deeply intertwined. This extends previous models of visualization creation and data-driven self-reflection. We outline how benefits such as reflection through manual construction, personalization, and presence in everyday life can be transferred to a wider set of digital and physical systems. (10.1145/3173574.3173728)
    DOI : 10.1145/3173574.3173728
  • BIGFile: Bayesian Information Gain for Fast File Retrieval
    • Liu Wanyu
    • Rioul Olivier
    • Mcgrenere Joanna
    • Mackay Wendy
    • Beaudouin-Lafon Michel
    , 2018. We introduce BIGFile, a new fast file retrieval technique based on the Bayesian Information Gain framework. BIGFile provides interface shortcuts to assist the user in navigating to a desired target (file or folder). BIGFile's split interface combines a traditional list view with an adaptive area that displays shortcuts to the set of file paths estimated by our computationally efficient algorithm. Users can navigate the list as usual, or select any part of the paths in the adaptive area. A pilot study of 15 users informed the design of BIGFile, revealing the size and structure of their file systems and their file retrieval practices. Our simulations show that BIGFile outperforms Fitchett et al.'s AccessRank, a best-of-breed prediction algorithm. We conducted an experiment to compare BIGFile with ARFile (AccessRank instantiated in a split interface) and with a Finder-like list view as baseline. BIGFile was by far the most efficient technique (up to 44% faster than ARFile and 64% faster than Finder), and participants unanimously preferred the split interfaces to the Finder. (10.1145/3173574.3173959)
    DOI : 10.1145/3173574.3173959
  • Decentralized joint cache-channel coding over erasure broadcast channels
    • Kamel Sarah
    • Sarkiss Mireille
    • Wigger Michèle
    , 2018. We derive upper bounds on the rate-memory trade-off of cache-aided erasure broadcast channels with K_w weak receivers and K_s strong receivers. We follow a decentralized placement scenario, where coordination is not needed prior to the delivery phase. We study two setups: a standard scenario without eavesdropper and a wiretap scenario with an external eavesdropper. For both scenarios, we propose joint cache-channel coding schemes that efficiently exploit the cache contents and take into consideration the users' channel characteristics at the same time. We show that the decentralized placement strategy causes only a small increase in delivery rate compared to a centralized strategy. Similarly, when cache sizes are moderate, the rate is increased only slightly by securing the communication against external eavesdroppers. This is not the case when cache memories are small and large. (10.1109/MENACOMM.2018.8371012)
    DOI : 10.1109/MENACOMM.2018.8371012
  • Innovative paradigms and architecture for future distribution electricity networks supporting the energy transition
    • Horta José Luis
    , 2018. Future electricity distribution grids will host an important and growing share of variable renewable energy sources and local storage resources. Moreover, they will face new load structures due, for example, to the growth of the electric vehicle market. These trends raise the need for new distribution grid architecture and operation paradigms to keep the grid stable and to ensure quality of supply. In addition, these new paradigms will enable the provision of advanced new services. In this thesis we propose a novel architecture capable of fostering collaboration among wholesale market actors, distribution system operators and end customers, to leverage flexible distributed energy resources while respecting distribution system constraints. The architecture is designed for providing innovative residential demand side management services, with a special focus on services enabled by self-consumption at the household and neighborhood level. Following these general objectives, the thesis provides three main contributions. First, based on internet of things and blockchain technology, we propose the building blocks for future distribution grid energy management architectures. Then, focusing on the services enabled by such architectures, we propose hour-ahead markets for the local exchange of renewable energy among households together with a dynamic phase allocation mechanism to improve the quality of electricity supply. Finally, we propose a real time control mechanism for the adjustment of market decisions to satisfy distribution system operator constraints.
  • Random forests resource allocation for 5G systems: Performance and robustness study
    • Imtiaz Sahar
    • Ghauch Hadi
    • Koudouridis George
    • Gross James
    , 2018, pp.326-331. (10.1109/WCNCW.2018.8369028)
    DOI : 10.1109/WCNCW.2018.8369028
  • Precoding Matrix Design in Linear Video Coding
    • Zheng S.
    • Cagnazzo M.
    • Kieffer Michel
    , 2018, pp.1198-1202. (10.1109/icassp.2018.8461287)
    DOI : 10.1109/icassp.2018.8461287
  • Video enhancement with convex optimization methods
    • Boyadjis Benoit
    • Purica Andrei
    • Pesquet-Popescu Béatrice
    • Dufaux Frédéric
    , 2018. Video enhancement methods make it possible to optimize the viewing of video content at the end-user side. Most approaches do not consider the compressed nature of the available content. In the present work, we build upon a recently proposed video enhancement approach that explicitly models a compression stage. Applying the enhancement framework to compressed representations requires extracting specific syntax elements during their decoding. This additional information embeds the enhanced result in a domain that closely fits the observation. We evaluate the framework's performance in a single-source resolution enhancement scenario, and show the method's efficiency with respect to state-of-the-art approaches. (10.1109/icassp.2018.8462357)
    DOI : 10.1109/icassp.2018.8462357
  • An ensemble learning approach to detect epileptic seizures from long intracranial EEG recordings
    • Schiratti Jean-Baptiste
    • Le Douget Jean-Eudes
    • Le van Quyen Michel
    • Essid Slim
    • Gramfort Alexandre
    , 2018. This paper proposes a patient-specific supervised classification algorithm to detect seizures in long offline intracranial electroencephalographic (iEEG) recordings. The main idea of the proposed algorithm is to combine a set of probabilistic classifiers, trained on a dataset of 1 s epochs, into a weighted ensemble classifier which can be used to analyze longer 5 s data segments. The method is trained and evaluated on 24 patients, all suffering from focal medically intractable epilepsy, from the Epilepsiae database. The evaluation of the method, conducted using an average of 113 hours (min: 32 h, max: 229 h) of iEEG data per patient, shows that the proposed algorithm improves upon existing methods for seizure detection with iEEG.
  • A Constant Step Stochastic Douglas-Rachford Algorithm with Application to Non Separable Regularizations
    • Salim Adil
    • Bianchi Pascal
    • Hachem Walid
    , 2018, pp.2886-2890. The Douglas-Rachford algorithm is an algorithm that converges to a minimizer of a sum of two convex functions. The algorithm consists of fixed point iterations involving computations of the proximity operators of the two functions separately. The paper investigates a stochastic version of the algorithm where both functions are random and the step size is constant. We establish that the iterates of the algorithm stay close to the set of solutions with high probability when the step size is small enough. An application to structured regularization is considered. (10.1109/ICASSP.2018.8461469)
    DOI : 10.1109/ICASSP.2018.8461469
  • Attitude Classification in Adjacency Pairs of a Human-Agent Interaction with Hidden Conditional Random Fields
    • Barriere Valentin
    • Clavel Chloé
    • Essid Slim
    , 2018, pp.4949-4953. (10.1109/ICASSP.2018.8462160)
    DOI : 10.1109/ICASSP.2018.8462160
  • Alpha-stable low-rank plus residual decomposition for speech enhancement
    • Şimşekli Umut
    • Erdogan Halil
    • Leglaive Simon
    • Liutkus Antoine
    • Badeau Roland
    • Richard Gael
    , 2018, pp.651-655. In this study, we propose a novel probabilistic model for separating clean speech signals from noisy mixtures by decomposing the mixture spectrograms into a structured speech part and a more flexible residual part. The main novelty in our model is that it uses a family of heavy-tailed distributions, the so-called α-stable distributions, for modeling the residual signal. We develop an expectation-maximization algorithm for parameter estimation and a Monte Carlo scheme for posterior estimation of the clean speech. Our experiments show that the proposed method outperforms relevant factorization-based algorithms by a significant margin. (10.1109/ICASSP.2018.8461539)
    DOI : 10.1109/ICASSP.2018.8461539
  • Scalable and Cost-Efficient Algorithms for Baseband Unit (BBU) Function Split Placement
    • Mharsi Niezi
    • Hadji Makhlouf
    • Niyato Dusit
    • Diego William
    • Krishnaswamy Ruby
    , 2018. This paper considers the optimal placement of Baseband Unit (BBU) function split in Cloud Radio Access Networks (C-RANs) which is an essential key technology in C-RAN deployment. In particular, the BBU function split is modeled as directed chains to be mapped to a network infrastructure. As such, we propose an Integer Linear Program (ILP) formulation for small and medium size networks. Alternatively, we introduce four heuristic algorithms with significantly less complexity. We then benchmark the four heuristic algorithms based on the construction of a multi-stage graph. The simulation results strongly confirm the efficiency and scalability of our algorithms as well as their ability to achieve an optimal solution.
  • 5W 1952nm Brillouin-Free Efficient Single Clad TDFA
    • Romano Clément
    • Tench Robert E
    • Delavaux Jean-Marc
    , 2018. We report the performance of a two-stage single-clad (SC) Thulium-doped fiber amplifier (TDFA), delivering an output power of 5 W at 1952 nm without stimulated Brillouin scattering (SBS) for a single-frequency input signal. A slope efficiency greater than 60%, a signal gain greater than 60 dB and an input dynamic range > 30 dB are achieved. The amplifier topology was optimized with a modeling tool for SC TDFA performance; experimental results and simulations are in good agreement. (10.1117/12.2304910)
    DOI : 10.1117/12.2304910
  • Patch-Based image fusion for computational photography
    • Ocampo Blandon Cristian Felipe
    , 2018. The most common computational techniques to deal with the limited dynamic range and reduced depth of field of conventional cameras are based on the fusion of images acquired with different settings. These approaches require aligned images and motionless scenes, otherwise ghost artifacts and irregular structures can arise after the fusion. The goal of this thesis is to develop patch-based techniques in order to deal with motion and misalignment for image fusion, particularly in the case of variable illumination and blur. In the first part of this work, we present a methodology for the fusion of bracketed exposure images for dynamic scenes. Our method combines a carefully crafted contrast normalization, a fast non-local combination of patches and different regularization steps. This yields an efficient way of producing contrasted and well-exposed images from hand-held captures of dynamic scenes, even in difficult cases (moving objects, non planar scenes, optical deformations, etc.). In a second part, we propose a multifocus image fusion method that also deals with hand-held acquisition conditions and moving objects. At the core of our methodology, we propose a patch-based algorithm that corrects local geometric deformations by relying on both color and gradient orientations. Our methods were evaluated on common and new datasets created for the purpose of this work. From the experiments we conclude that our methods are consistently more robust than alternative methods to geometric distortions and illumination variations or blur. As a byproduct of our study, we also analyze the capacity of the PatchMatch algorithm to reconstruct images in the presence of blur and illumination changes, and propose different strategies to improve such reconstructions.
  • Advances in automating analysis of neural time series data
    • Jas Mainak
    , 2018. Electrophysiology experiments have long relied upon small cohorts of subjects to uncover statistically significant effects of interest. However, the low sample size translates into a low power which leads to a high false discovery rate, and hence a low rate of reproducibility. To address this issue means solving two related problems: first, how do we facilitate data sharing and reusability to build large datasets; and second, once big datasets are available, what tools can we build to analyze them? In the first part of the thesis, we introduce a new data standard for sharing data known as the Brain Imaging Data Structure (BIDS), and its extension MEG-BIDS. Next, we introduce the reader to a typical electrophysiological pipeline analyzed with the MNE software package. We consider the different choices that users have to deal with at each stage of the pipeline and provide standard recommendations. Next, we focus our attention on tools to automate analysis of large datasets. We propose an automated tool to remove segments of data corrupted by artifacts. We develop an outlier detection algorithm based on tuning rejection thresholds. More importantly, we use the HCP data, which is manually annotated, to benchmark our algorithm against existing state-of-the-art methods. Finally, we use convolutional sparse coding to uncover structures in neural time series. We reformulate the existing approach in computer vision as a maximum a posteriori (MAP) inference problem to deal with heavy tailed distributions and high amplitude artifacts. Taken together, this thesis represents an attempt to shift from slow and manual methods of analysis to automated, reproducible analysis.
  • Self-integrating Organic Control Systems: from Crayfish to Smart Homes
    • Diaconescu Ada
    • Mata Pembe
    • Bellman Kirstie
    , 2018.
  • Beating Monte-Carlo Integration: a Nonparametric Study of Kernel Smoothing Methods
    • Clémençon Stéphan
    • Portier François
    , 2018, 84. Evaluating integrals is a ubiquitous issue and Monte Carlo methods, exploiting advances in random number generation over the last decades, offer a popular and powerful alternative to deterministic integration techniques, unsuited in particular when the domain of integration is complex. This paper is devoted to the study of a kernel smoothing based competitor built from a sequence of n ≥ 1 i.i.d. random vectors with arbitrary continuous probability distribution f(x)dx, originally proposed in Delyon et al. (2016), from a nonasymptotic perspective. We establish a probability bound showing that the method under study, though biased, produces an estimate approximating the target integral ∫_{x ∈ ℝ^d} φ(x)dx with an error bound of order o(1/√n) uniformly over a class Φ of functions φ, under weak complexity/smoothness assumptions related to the class Φ, outperforming Monte-Carlo procedures. This striking result is shown to derive from an appropriate decomposition of the maximal deviation between the target integrals and their estimates, highlighting the remarkable benefit of averaging strongly dependent terms regarding statistical accuracy in this situation. The theoretical analysis then rests on sharp probability inequalities for degenerate U-statistics. It is illustrated by numerical results in the context of covariate shift regression, providing empirical evidence of the relevance of the approach.
  • Generalized Concomitant Multi-Task Lasso for Sparse Multimodal Regression
    • Massias Mathurin
    • Fercoq Olivier
    • Gramfort Alexandre
    • Salmon Joseph
    , 2018. In high dimension, it is customary to consider Lasso-type estimators to enforce sparsity. For standard Lasso theory to hold, the regularization parameter should be proportional to the noise level, which is often unknown in practice. A remedy is to consider estimators such as the Concomitant Lasso, which jointly optimize over the regression coefficients and the noise level. However, when data from different sources are pooled to increase sample size, noise levels differ and new dedicated estimators are needed. We provide new statistical and computational solutions to perform heteroscedastic regression, with an emphasis on brain imaging with magneto- and electroencephalography (M/EEG). When instantiated to de-correlated noise, our framework leads to an efficient algorithm whose computational cost is not higher than for the Lasso, but addresses more complex noise structures. Experiments demonstrate improved prediction and support identification with correct estimation of noise levels.