Publications

2019

Don't hide in the frames: Note- and pattern-based evaluation of automated melody extraction algorithms
- Frieler Klaus
- Başaran Doğaç
- Höger Frank
- Crayencour Hélène-Camille
- Peeters Geoffroy
- Dixon Simon
, 2019. (10.1145/3358664.3358672)
DOI : 10.1145/3358664.3358672
Streaming Random Patches for Evolving Data Stream Classification
- Gomes Heitor Murilo
- Read Jesse
- Bifet Albert
, 2019, pp.240-249. Ensemble methods are a popular choice for learning from evolving data streams. This popularity is due to (i) the ability to simulate simple yet successful ensemble learning strategies, such as bagging and random forests; (ii) the possibility of incorporating drift detection and recovery in conjunction with the ensemble algorithm; (iii) the availability of efficient incremental base learners, such as Hoeffding Trees. In this work, we introduce the Streaming Random Patches (SRP) algorithm, an ensemble method specially adapted to stream classification which combines random subspaces and online bagging. We provide theoretical insights and empirical results illustrating different aspects of SRP. In particular, we explain how the widely adopted incremental Hoeffding trees are not, in fact, unstable learners, unlike their batch counterparts, and how this fact significantly influences ensemble methods design and performance. We compare SRP against state-of-the-art ensemble variants for streaming data on a multitude of datasets. The results show that SRP produces high predictive performance for both real and synthetic datasets. In addition, we analyze the diversity over time and the average tree depth, which provides insights on the differences between local subspace randomization (as in random forest) and global subspace randomization (as in random subspaces). (10.1109/ICDM.2019.00034)
DOI : 10.1109/ICDM.2019.00034
Cover detection using dominant melody embeddings
- Doras Guillaume
- Peeters Geoffroy
, 2019. Automatic cover detection (the task of finding in an audio database all the covers of one or several query tracks) has long been seen as a challenging theoretical problem in the MIR community and as an acute practical problem for authors and composers societies. Original algorithms proposed for this task have proven their accuracy on small datasets, but are unable to scale up to modern real-life audio corpora. On the other hand, faster approaches designed to process thousands of pairwise comparisons resulted in lower accuracy, making them unsuitable for practical use. In this work, we propose a neural network architecture that is trained to represent each track as a single embedding vector. The computation burden is therefore left to the embedding extraction, which can be conducted offline and stored, while the pairwise comparison task reduces to a simple Euclidean distance computation. We further propose to extract each track's embedding out of its dominant melody representation, obtained by another neural network trained for this task. We then show that this architecture improves state-of-the-art accuracy both on small and large datasets, and is able to scale to query databases of thousands of tracks in a few seconds.
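As a minimal sketch of the retrieval step described in the abstract above, assuming each track has already been reduced to a fixed-length embedding vector, ranking candidates only needs one Euclidean distance per pair. The function name `rank_by_distance` and the toy three-dimensional embeddings are illustrative assumptions and do not come from the paper.

```python
import math

def rank_by_distance(query, database):
    """Return (track_id, distance) pairs sorted by Euclidean distance to the query."""
    scored = [(track_id, math.dist(query, emb)) for track_id, emb in database.items()]
    return sorted(scored, key=lambda pair: pair[1])

# Toy embeddings (hypothetical values, for illustration only).
database = {
    "track_a": [0.0, 1.0, 0.0],
    "track_b": [1.0, 0.0, 0.0],
    "track_c": [0.1, 0.9, 0.0],
}
ranking = rank_by_distance([0.0, 1.0, 0.1], database)
```

Because the embeddings can be computed offline and stored, only the distance loop runs at query time.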
Tracking beats and microtiming in afro-latin american music using conditional random fields and deep learning
- Fuentes Magdalena
- Maia Lucas S
- Rocamora Martín
- Biscainho Luiz W P
- Crayencour Hélène C
- Essid Slim
- Bello Juan P.
, 2019.
DEEP-RHYTHM FOR TEMPO ESTIMATION AND RHYTHM PATTERN RECOGNITION
- Foroughmand Hadrien
- Peeters Geoffroy
, 2019. It has been shown that the harmonic series at the tempo frequency of the onset-strength-function of an audio signal accurately describes its rhythm pattern and can be used to perform tempo or rhythm pattern estimation. Recently, in the case of multi-pitch estimation, the depth of the input layer of a convolutional network has been used to represent the harmonic series of pitch candidates. We use a similar idea here to represent the harmonic series of tempo candidates. We propose the Harmonic-Constant-Q-Modulation which represents, using a 4D tensor, the harmonic series of modulation frequencies (considered as tempo frequencies) in several acoustic frequency bands over time. This representation is used as input to a convolutional network which is trained to estimate tempo or rhythm pattern classes. Using a large number of datasets, we evaluate the performance of our approach and compare it with previous approaches. We show that it slightly increases Accuracy-1 for tempo estimation but not the average-mean-Recall for rhythm pattern recognition.
Supervised Symbolic Music Style Translation Using Synthetic Data
- Cífka Ondřej
- Şimşekli Umut
- Richard Gael
, 2019. Research on style transfer and domain translation has clearly demonstrated the ability of deep learning-based algorithms to manipulate images in terms of artistic style. More recently, several attempts have been made to extend such approaches to music (both symbolic and audio) in order to enable transforming musical style in a similar manner. In this study, we focus on symbolic music with the goal of altering the 'style' of a piece while keeping its original 'content'. As opposed to the current methods, which are inherently restricted to be unsupervised due to the lack of 'aligned' data (i.e. the same musical piece played in multiple styles), we develop the first fully supervised algorithm for this task. At the core of our approach lies a synthetic data generation scheme which allows us to produce virtually unlimited amounts of aligned data, and hence avoid the above issue. In view of this data generation scheme, we propose an encoder-decoder model for translating symbolic music accompaniments between a number of different styles. Our experiments show that our models, although trained entirely on synthetic data, are capable of producing musically meaningful accompaniments even for real (non-synthetic) MIDI recordings. (10.5281/zenodo.3527878)
DOI : 10.5281/zenodo.3527878
CONDITIONED-U-NET: INTRODUCING A CONTROL MECHANISM IN THE U-NET FOR MULTIPLE SOURCE SEPARATIONS
- Meseguer-Brocal Gabriel
- Peeters Geoffroy
, 2019. Data-driven models for audio source separation such as U-Net or Wave-U-Net are usually models dedicated to and specifically trained for a single task, e.g. a particular instrument isolation. Training them for various tasks at once commonly results in worse performances than training them for a single specialized task. In this work, we introduce the Conditioned-U-Net (C-U-Net) which adds a control mechanism to the standard U-Net. The control mechanism allows us to train a unique and generic U-Net to perform the separation of various instruments. The C-U-Net decides the instrument to isolate according to a one-hot-encoding input vector. The input vector is embedded to obtain the parameters that control Feature-wise Linear Modulation (FiLM) layers. FiLM layers modify the U-Net feature maps in order to separate the desired instrument via affine transformations. The C-U-Net performs different instrument separations, all with a single model achieving the same performances as the dedicated ones at a lower cost. (10.5281/zenodo.3527766)
DOI : 10.5281/zenodo.3527766
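The conditioning idea in the abstract above can be sketched as follows, under stated assumptions: a one-hot vector selects the instrument, a control mapping turns it into per-channel scale and shift parameters, and a FiLM layer applies them as an affine transform to the feature maps. In this sketch a fixed lookup table stands in for the learned embedding, and all names and numbers (`film`, `control_parameters`, the tables) are hypothetical.

```python
def film(feature_maps, gammas, betas):
    """Apply a per-channel affine transform: gamma_c * x + beta_c."""
    return [
        [gamma * value + beta for value in channel]
        for channel, gamma, beta in zip(feature_maps, gammas, betas)
    ]

def control_parameters(one_hot, gamma_table, beta_table):
    """Look up (gammas, betas) for the selected instrument.

    A trained control network would learn this mapping; the lookup table
    here is an assumption made only for illustration.
    """
    index = one_hot.index(1)
    return gamma_table[index], beta_table[index]

# Two channels with three values each (flattened feature maps, toy numbers).
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
gamma_table = [[1.0, 0.0], [0.5, 2.0]]  # one row per instrument
beta_table = [[0.0, 0.0], [1.0, -1.0]]

# Select the second instrument with the one-hot vector [0, 1].
gammas, betas = control_parameters([0, 1], gamma_table, beta_table)
modulated = film(features, gammas, betas)
```

The separation network itself is unchanged between instruments; only the scale and shift parameters differ.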
Kolmogorov Model for Large Millimeter-Wave Antenna Arrays: Learning-based Beam-Alignment
- Chan Wai Ming
- Ghauch Hadi
- Kim Taejoon
- de Carvalho Elisabeth
- Fodor Gabor
, 2019, pp.411-415. (10.1109/IEEECONF44664.2019.9048734)
DOI : 10.1109/IEEECONF44664.2019.9048734
Counting Lattice Points in the Sphere using Deep Neural Networks
- Askri Aymen
- Rekaya-Ben Othman Ghaya
- Ghauch Hadi
, 2019, pp.2053-2057. This paper presents a deep learning model for regression to predict the number of lattice points inside the n-dimensional hypersphere. The number of points depends primarily on the lattice generator matrix and the sphere radius, which are used as inputs for the proposed deep neural network (DNN). To assess the accuracy of the DNN model, we use some known lattices. The results obtained are compared to existing mathematical bounds in the literature. Our numerical results reveal that our model gives an accurate prediction, of around 80%, of the number of lattice points in the sphere. (10.1109/IEEECONF44664.2019.9048858)
DOI : 10.1109/IEEECONF44664.2019.9048858
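For context, the quantity that the model in the entry above predicts can be computed exactly by brute force in small dimensions. The sketch below is not the paper's method: it enumerates integer coefficient vectors within a caller-supplied range and counts those whose lattice point falls inside the sphere. The function name and the `bound` parameter are hypothetical, and `bound` is assumed to be large enough to cover every point in the sphere.

```python
import itertools

def count_lattice_points(generator, radius, bound):
    """Count points G·x with integer x in [-bound, bound]^n and norm <= radius.

    `bound` must cover every integer vector whose image lies in the sphere;
    choosing it is left to the caller in this sketch.
    """
    n = len(generator)
    count = 0
    for coeffs in itertools.product(range(-bound, bound + 1), repeat=n):
        point = [sum(generator[i][j] * coeffs[j] for j in range(n)) for i in range(n)]
        # Compare squared norms to avoid a square root.
        if sum(p * p for p in point) <= radius * radius:
            count += 1
    return count

# Integer lattice Z^2 (identity generator matrix), radius 2.
count = count_lattice_points([[1, 0], [0, 1]], radius=2.0, bound=2)
```

The enumeration grows as (2·bound + 1)^n, which is why an approximate predictor becomes attractive as the dimension increases.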
Adaptive Algorithms for Estimating Betweenness and k -path Centralities
- Haghir Chehreghani Mostafa
- Bifet Albert
- Abdessalem Talel
, 2019, pp.1231-1240. (10.1145/3357384.3358064)
DOI : 10.1145/3357384.3358064
Commonsense Properties from Query Logs and Question Answering Forums
- Romero Julien
- Razniewski Simon
- Pal Koninika
- Pan Jeff
- Sakhadeo Archit
- Weikum Gerhard
, 2019. Commonsense knowledge about object properties, human behavior and general concepts is crucial for robust AI applications. However, automatic acquisition of this knowledge is challenging because of sparseness and bias in online sources. This paper presents Quasimodo, a methodology and tool suite for distilling commonsense properties from non-standard web sources. We devise novel ways of tapping into search-engine query logs and QA forums, and combining the resulting candidate assertions with statistical cues from encyclopedias, books and image tags in a corroboration step. Unlike prior work on commonsense knowledge bases, Quasimodo focuses on salient properties that are typically associated with certain objects or concepts. Extensive evaluations, including extrinsic use-case studies, show that Quasimodo provides better coverage than state-of-the-art baselines with comparable quality. (10.1145/3357384.3357955)
DOI : 10.1145/3357384.3357955
From the Token to the Review: A Hierarchical Multimodal approach to Opinion Mining
- Garcia Alexandre
- Colombo Pierre
- Essid Slim
- d'Alché-Buc Florence
- Clavel Chloé
, 2019. The task of predicting fine grained user opinion based on spontaneous spoken language is a key problem arising in the development of Computational Agents as well as in the development of social network based opinion miners. Unfortunately, gathering reliable data on which a model can be trained is notoriously difficult and existing works rely only on coarsely labeled opinions. In this work we aim at bridging the gap separating fine grained opinion models already developed for written language and coarse grained models developed for spontaneous multimodal opinion mining. We take advantage of the implicit hierarchical structure of opinions to build a joint fine and coarse grained opinion model that exploits different views of the opinion expression. The resulting model shares some properties with attention-based models and is shown to provide competitive results on a recently released multimodal fine grained annotated corpus.
OPC UA PubSub Implementation and Configuration
- Liu Zepeng
- Bellot Patrick
, 2019.
Sharp exponential inequalities in survey sampling: conditional Poisson sampling schemes
- Bertail Patrice
- Clémençon Stéphan
Bernoulli, Bernoulli Society for Mathematical Statistics and Probability, 2019, 25 (4B). (10.3150/18-BEJ1101)
DOI : 10.3150/18-BEJ1101
Experimenting with Power Divergences for Language Modeling
- Labeau Matthieu
- Cohen Shay B
, 2019, pp.4102-4112. (10.18653/v1/D19-1421)
DOI : 10.18653/v1/D19-1421
Extreme events in mid-infrared quantum cascade lasers: from randomness to advanced controllability
- Spitz O
- Herdt A
- Wu J
- Carras M
- Wong C W
- Elsässer W
- Grillot F
, 2019. We experimentally generate rogue waves in a mid-infrared quantum cascade laser with external optical feedback. These giant pulses become controllable when adding a low-amplitude periodic perturbation. This paves the way for applications where mid-infrared bursts can be of prime importance such as optical neuromorphic clusters and countermeasure systems.
Bernstein-type exponential inequalities in survey sampling: Conditional Poisson sampling schemes
- Bertail Patrice
- Clémençon Stéphan
Bernoulli, Bernoulli Society for Mathematical Statistics and Probability, 2019, 25 (4B), pp.3527-3554. (10.3150/18-BEJ1101)
DOI : 10.3150/18-BEJ1101
Investigation of Chaotic and Spiking Dynamics in Mid-Infrared Quantum Cascade Lasers Operating Continuous-Waves and Under Current Modulation
- Spitz Olivier
- Wu Jiagui
- Herdt Andreas
- Carras Mathieu
- Elsässer Wolfgang
- Wong Chee-Wei
- Grillot Frederic
IEEE Journal of Selected Topics in Quantum Electronics, Institute of Electrical and Electronics Engineers, 2019, 25 (6), pp.1-11. This study investigates chaotic and spiking dynamics of mid-infrared quantum cascade lasers operating under external optical feedback and emitting at 5.5 µm and 9 µm. In order to deepen the understanding, the route to chaos is experimentally studied in the case of continuous-wave and current modulation operation. The non-linear dynamics are analyzed with bifurcation diagrams. While for quasi-continuous wave operation, chaos is found to be more complex, pure continuous wave pumping always leads to the generation of regular spiking-induced low-frequency fluctuation dynamics. In the latter case, results show that combining external optical feedback with periodic forcing and further induced current modulation allows better control of the chaotic dropouts. This work provides a novel insight into the development of future secure free-space communications based on quantum cascade lasers or unpredictable optical countermeasure systems operating within the two atmospheric transparency windows, namely 3 µm-5 µm and 8.5 µm-11 µm. (10.1109/JSTQE.2019.2937445)
DOI : 10.1109/JSTQE.2019.2937445
SAMBASET: a dataset of historical samba de Enredo recordings for computational music analysis
- Maia Lucas S
- Fuentes Magdalena
- Biscainho Luiz W P
- Rocamora Martín
- Essid Slim
, 2019. In the last few years, several datasets have been released to meet the requirements of "hungry" yet promising data-driven approaches in music technology research. Since, for historical reasons, most investigations conducted in the field still revolve around music of the so-called "Western" tradition, the corresponding data, methodology and conclusions carry a strong cultural bias. Music of non-"Western" background, whenever present, is usually underrepresented, poorly labeled, or even mislabeled, the exception being projects that aim at specifically describing such music. In this paper we present SAMBASET, a dataset of Brazilian samba music that contains over 40 hours of historical and modern samba de enredo commercial recordings. To the best of our knowledge, this is the first dataset of this genre. We describe the collection of metadata (e.g. artist, composer, release date) and outline our semiautomatic approach to the challenging task of annotating beats in this large dataset, which includes the assessment of the performance of state-of-the-art beat tracking algorithms for this specific case. Finally, we present a study on tempo and beat tracking that illustrates SAMBASET's value, and we comment on other tasks for which it could be used.
Independent-Variation Matrix Factorization With Application to Energy Disaggregation
- Henriet Simon
- Şimşekli Umut
- Santos Sérgio F.
- Fuentes Benoît
- Richard Gael
IEEE Signal Processing Letters, Institute of Electrical and Electronics Engineers, 2019, 26 (11), pp.1643-1647. Matrix factorization techniques have proven to be useful in many unsupervised learning applications. Such techniques have been recently applied to Non Intrusive Load Monitoring (NILM), the process of breaking down the total electric consumption of a building into consumptions of individual appliances. While several studies addressed the NILM problem for small-scale buildings, only few studies considered the problem for large buildings, where the signals exhibit significantly different behavior. To overcome the unaddressed difficulties of processing high frequency current signals that are measured in large buildings, we propose a novel technique called Independent-Variation Matrix Factorization (IVMF), which expresses an observation matrix as the product of two matrices: the signature and the activation. Motivated by the nature of the current signals, it uses a regularization term on the temporal variations of the activation matrix and a positivity constraint, and the columns of the signature matrix are constrained to lie in a specific set. To solve the resulting optimization problem, we rely on an alternating minimization strategy involving dual optimization and quasi-Newton algorithms. The algorithm is tested against Independent Component Analysis (ICA) and Semi Nonnegative Matrix Factorization (SNMF) on a synthetic source separation problem and on a realistic NILM application for large commercial buildings. We show that IVMF outperforms competing methods and is particularly appropriate to recover positive sources that have a strong temporal dependency and sources whose variations are independent from each other. (10.1109/LSP.2019.2941428)
DOI : 10.1109/LSP.2019.2941428
Asymptotic Normality of Q-ary Linear Codes
- Shi Minjia
- Rioul Olivier
- Solé Patrick
IEEE Communications Letters, Institute of Electrical and Electronics Engineers, 2019, 23 (11), pp.1895-1898. Sidel’nikov proved in 1971 that the weight distribution of long binary codes is asymptotically Gaussian. Delsarte sketched in 1975 an extension of this result to Q-ary codes when Q > 2. In this note, we complete Delsarte’s proof.
Introducing spatial regularization in SAR tomography reconstruction
- Rambour Clément
- Denis Loïc
- Tupin Florence
- Oriot Hélène
IEEE Transactions on Geoscience and Remote Sensing, Institute of Electrical and Electronics Engineers, 2019. The resolution achieved by current Synthetic Aperture Radar (SAR) sensors provides detailed visualization of urban areas. Spaceborne sensors such as TerraSAR-X can be used to analyze large areas at a very high resolution. In addition, repeated passes of the satellite give access to temporal and interferometric information on the scene. Because of the complex 3-D structure of urban surfaces, scatterers located at different heights (ground, building façade, roof) produce radar echoes that often get mixed within the same radar cells. These echoes must be numerically unmixed in order to get a fine understanding of the radar images. This unmixing is at the core of SAR tomography. SAR tomography reconstruction is generally performed in two steps: (i) reconstruction of the so-called tomogram by vertical focusing, at each radar resolution cell, to extract the complex amplitudes (a 1-D processing); (ii) transformation from radar geometry to ground geometry and extraction of significant scatterers. We propose to perform the tomographic inversion directly in ground geometry in order to enforce spatial regularity in 3-D space. This inversion requires solving a large-scale non-convex optimization problem. We describe an iterative method based on variable splitting and the augmented Lagrangian technique. Spatial regularizations can easily be included in this generic scheme. We illustrate on simulated data and a TerraSAR-X tomographic dataset the potential of this approach to produce 3-D reconstructions of urban surfaces. (10.1109/TGRS.2019.2921756)
DOI : 10.1109/TGRS.2019.2921756
Contribution of Different Positron Emission Tomography Tracers in Glioma Management: Focus on Glioblastoma
- Moreau Aurélie
- Febvey Olivia
- Mognetti Thomas
- Frappaz Didier
- Kryza David
Frontiers in Oncology, Frontiers Media, 2019, 9. (10.3389/fonc.2019.01134)
DOI : 10.3389/fonc.2019.01134
Benefits of Cache Assignment on Degraded Broadcast Channels
- Saeedi Bidokhti Shirin
- Wigger Michèle
- Yener Aylin
IEEE Transactions on Information Theory, Institute of Electrical and Electronics Engineers, 2019, 65 (11), pp.6999-7019. Degraded K-user broadcast channels (BCs) are studied when the receivers are facilitated with cache memories. Lower and upper bounds are derived on the capacity-memory tradeoff, i.e., on the largest rate of reliable communication over the BC as a function of the receivers' cache sizes, and the bounds are shown to match for interesting special cases. The lower bounds are achieved by two new coding schemes that benefit from nonuniform cache assignments. Lower and upper bounds are also established on the global capacity-memory tradeoff, i.e., on the largest capacity-memory tradeoff that can be attained by optimizing the receivers' cache sizes subject to a total cache memory budget. The bounds coincide when the total cache memory budget is sufficiently small or sufficiently large, where the thresholds depend on the BC statistics. For small cache memories, it is optimal to assign all the cache memory to the weakest receiver. In this regime, the global capacity-memory tradeoff grows by the total cache memory budget divided by the number of files in the system. In other words, a perfect global caching gain is achievable in this regime and the performance corresponds to a system where all the cache contents in the network are available to all receivers. For large cache memories, it is optimal to assign a positive cache memory to every receiver, such that the weaker receivers are assigned larger cache memories compared to the stronger receivers. In this regime, the growth rate of the global capacity-memory tradeoff is further divided by the number of users, which corresponds to a local caching gain. It is observed numerically that a uniform assignment of the total cache memory is suboptimal in all regimes, unless the BC is completely symmetric. For erasure BCs, this claim is proved analytically in the regime of small cache sizes. (10.1109/TIT.2019.2926714)
DOI : 10.1109/TIT.2019.2926714
Large Normal Dispersion Mode-Locked Erbium-Doped Fiber Laser
- Tang Mincheng
- Granger Geoffroy
- Lesparre Fabien
- Wang Hongjie
- Qian Kai
- Lecaplain Caroline
- Oudar Jean-Louis
- Jaouën Yves
- Gabet Renaud
- Gaponov Dmitry
- Likhachev Mikhail
- Godin Thomas
- Février Sébastien
- Hideur Ammar
Fibers, MDPI, 2019, 7 (11), pp.97. We report on a passively mode-locked oscillator based on an erbium-doped dual concentric core fiber combining high normal dispersion and large mode area. This large normal dispersion laser generates long pulses with 30 ps duration and 0.17 nm spectral width at 1530 nm wavelength. The source delivers an average power of 64 mW at a repetition rate of 16 MHz, corresponding to 4 nJ energy. This concept opens up new degrees of freedom in the design of mode-locked fiber lasers. (10.3390/fib7110097)
DOI : 10.3390/fib7110097
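The pulse energy quoted in the abstract above follows from the average power and the repetition rate, since each pulse carries the average power divided by the number of pulses per second. A minimal check, with a hypothetical helper name:

```python
def pulse_energy_joules(average_power_watts, repetition_rate_hertz):
    """Energy per pulse = average power / repetition rate."""
    return average_power_watts / repetition_rate_hertz

# 64 mW average power at a 16 MHz repetition rate.
energy = pulse_energy_joules(64e-3, 16e6)
energy_nanojoules = energy * 1e9
```

This reproduces the 4 nJ figure stated in the abstract.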