Publications

2023

  • Interactive Depixelization of Pixel Art through Spring Simulation
    • Matusovic Marko
    • Parakkat Amal Dev
    • Eisemann Elmar
    Computer Graphics Forum, Wiley, 2023, 42 (2). We introduce an approach for converting pixel art into high-quality vector images. While much progress has been made on automatic conversion, there is an inherent ambiguity in pixel art, which can lead to a mismatch with the artist’s original intent. Further, there is room for incorporating aesthetic preferences during the conversion. Consequently, this work introduces an interactive framework that enables users to guide the conversion process towards high-quality vector illustrations. A key idea of the method is to cast the conversion process as a spring-system optimization that can be influenced by the user. In this way, it is possible to resolve various ambiguities that cannot be handled by an automatic algorithm.
  • The Software Heritage License Dataset (2022 Edition)
    • González-Barahona Jesús M.
    • Montes-Leon Sergio
    • Robles Gregorio
    • Zacchiroli Stefano
    Empirical Software Engineering, Springer Verlag, 2023. Context: When software is released publicly, it is common to include with it either the full text of the license or licenses under which it is published, or a detailed reference to them. Therefore public licenses, including FOSS (free, open source software) licenses, are usually publicly available in source code repositories. Objective: To compile a dataset containing as many documents as possible that contain the text of software licenses, or references to the license terms. Once compiled, characterize the dataset so that it can be used for further research, or practical purposes related to license analysis. Method: Retrieve from Software Heritage (the largest publicly available archive of FOSS source code) all versions of all files whose names are commonly used to convey licensing terms. All retrieved documents will be characterized in various ways, using automated and manual analyses. Results: The dataset consists of 6.9 million unique license files. Additional metadata about shipped license files is also provided, making the dataset ready to use in various contexts, including: file length measures, MIME type, SPDX license (detected using ScanCode), and oldest appearance. The results of a manual analysis of 8102 documents are also included, providing a ground truth for further analysis. The dataset is released as open data as an archive file containing all deduplicated license files, plus several portable CSV files with metadata, referencing files via cryptographic checksums. Conclusions: Thanks to the extensive coverage of Software Heritage, the dataset presented in this paper covers a very large fraction of all software licenses for public code. We have assembled a large body of software licenses, characterized it quantitatively and qualitatively, and validated that it is mostly composed of licensing information and includes almost all known license texts. The dataset can be used to conduct empirical studies on open source licensing, training of automated license classifiers, natural language processing (NLP) analyses of legal texts, as well as historical and phylogenetic studies on FOSS licensing. It can also be used in practice to improve tools detecting licenses in source code. (10.1007/s10664-023-10377-w)
    DOI : 10.1007/s10664-023-10377-w
  • Learning Raw Image Denoising Using a Parametric Color Image Model
    • Achddou Raphaël
    • Gousseau Yann
    • Ladjal Saïd
    , 2023. Deep learning methods for image restoration have produced impressive results over recent years. Nevertheless, they generalize poorly and need large learning image datasets to be collected for each new acquisition modality. In order to avoid the building of such datasets, it has been recently proposed to develop synthetic image datasets for training image restoration methods, using scale invariant dead leaves models. While the geometry of such models can be successfully encoded with only a few parameters, the color content cannot be straightforwardly encoded. In this paper, we leverage the concept of color lines prior to build a light parametric color model relying on a chromaticity/luminance factorization. Further, we show that the corresponding synthetic dataset can be used to train neural networks for the denoising of RAW images from different camera-phones, without using any image from these devices. This shows the potential of our approach to increase the generalization capacity of learning-based denoising approaches in real case scenarios.
  • Visualization Empowerment: How to Teach and Learn Data Visualization
    • Bach Benjamin
    • Carpendale Sheelagh
    • Hinrichs Uta
    • Huron Samuel
    , 2023. Data visualization is becoming an important asset for a data-literate, informed, and critical society. Despite the variety of existing resources to teach theories and practical skills in this domain, little is known about 1) how learning processes in the context of visualization unfold and 2) best practices for engaging and teaching data visualization to diverse audiences and in different contexts. This Dagstuhl Seminar invited practitioners, researchers, and teachers from the areas of visualization, design, education and cognitive psychology to explore these questions from multiple perspectives. Through a range of practical activities, talks, and discussions, we have begun characterizing and classifying teaching methodologies. We have drafted a pedagogical manifesto, and started formalizing the concept of improvisation with visualization in the context of teaching and learning. We have also interrogated creativity as an important aspect of visualization teaching and learning and explored links between data physicalization and visualization teaching activities. Across these different themes, we have begun to map out the challenges of visualization teaching and learning and the opportunities for research and practice in this area. (10.4230/DagRep.12.6.83)
    DOI : 10.4230/DagRep.12.6.83
  • Tracking Intermittent Particles with Self-Learned Visual Features
    • Reme Raphael
    • Piriou Victor
    • Hanson Alison
    • Yuste Rafael
    • Newson Alasdair
    • Angelini Elsa
    • Olivo-Marin Jean-Christophe
    • Lagache Thibault
    IEEE Xplore, 2023, pp.1-5. In time-lapse fluorescence imaging, single-particle tracking is a powerful tool to monitor the dynamics of objects of interest, and extract information about biological processes. However, tracked particles can be subject to occlusion and intermittent detectability. When these phenomena persist over a few frames, tracking algorithms tend to produce multiple tracklets for the same particle. In this work, we introduce self-supervised learning of visual features to compare tracked particles, and we exploit both visual and positional distances to robustly stitch tracklets representing the same particle. We demonstrate the performance of our stitching framework on time-lapse fluorescence sequences of Hydra vulgaris neurons. Results show high stitching precision, and reduction of errors made by previous algorithms on the same data by a factor of two. (10.1109/ISBI53787.2023.10230664)
    DOI : 10.1109/ISBI53787.2023.10230664
  • Face Aging via Diffusion-based Editing
    • Chen Xiangyi
    • Lathuilière Stéphane
    , 2023. In this paper, we address the problem of face aging: generating past or future facial images by incorporating age-related changes to the given face. Previous aging methods rely solely on human facial image datasets and are thus constrained by their inherent scale and bias. This restricts them to a limited generatable age range and leaves them unable to handle large age gaps. We propose FADING, a novel approach to address Face Aging via DIffusion-based editiNG. We go beyond existing methods by leveraging the rich prior of large-scale language-image diffusion models. First, we specialize a pre-trained diffusion model for the task of face age editing by using an age-aware fine-tuning scheme. Next, we invert the input image to latent noise and obtain optimized null text embeddings. Finally, we perform text-guided local age editing via attention control. The quantitative and qualitative analyses demonstrate that our method outperforms existing approaches with respect to aging accuracy, attribute preservation, and aging quality.
  • Monotonic Alpha-divergence Minimisation for Variational Inference
    • Daudel Kamélia
    • Douc Randal
    • Roueff François
    Journal of Machine Learning Research, Microtome Publishing, 2023, 24 (62), pp.1-76. In this paper, we introduce a novel family of iterative algorithms which carry out $\alpha$-divergence minimisation in a Variational Inference context. They do so by ensuring a systematic decrease at each step in the $\alpha$-divergence between the variational and the posterior distributions. In its most general form, the variational distribution is a mixture model and our framework allows us to simultaneously optimise the weights and component parameters of this mixture model. Our approach permits us to build on various methods previously proposed for $\alpha$-divergence minimisation such as Gradient or Power Descent schemes and we also shed new light on an integrated Expectation Maximization algorithm. Lastly, we provide empirical evidence that our methodology yields improved results on several multimodal target distributions and on a real data example.
  • On the Hardness of Module Learning with Errors with Short Distributions
    • Boudgoust Katharina
    • Jeudy Corentin
    • Roux-Langlois Adeline
    • Wen Weiqiang
    Journal of Cryptology, Springer Verlag, 2023, 36 (1), pp.1-70. The Module Learning With Errors (M-LWE) problem is a core computational assumption of lattice-based cryptography which offers an interesting trade-off between guaranteed security and concrete efficiency. The problem is parameterized by a secret distribution as well as an error distribution. There is a gap between the choices of those distributions for theoretical hardness results (standard formulation of M-LWE, i.e., uniform secret modulo $q$ and Gaussian error) and practical schemes (small bounded secret and error). In this work, we make progress towards narrowing this gap. More precisely, we prove that M-LWE with uniform $\eta$-bounded secret for any $1 \leq \eta \ll q$ and Gaussian error, in both its search and decision variants, is at least as hard as the standard formulation of M-LWE, provided that the module rank $d$ is at least logarithmic in the ring degree $n$. We also prove that the search version of M-LWE with large uniform secret and uniform $\eta$-bounded error is at least as hard as the standard M-LWE problem, if the number of samples $m$ is close to the module rank $d$ and with further restrictions on $\eta$. The latter result can be extended to provide the hardness of search M-LWE with uniform $\eta$-bounded secret and error under specific parameter conditions. Overall, the results apply to all cyclotomic fields, but most of the intermediate results are proven in more general number fields. (10.1007/s00145-022-09441-3)
    DOI : 10.1007/s00145-022-09441-3
  • Describing movement learning using metric learning
    • Loriette Antoine
    • Liu Wanyu
    • Bevilacqua Frédéric
    • Caramiaux Baptiste
    PLoS ONE, Public Library of Science, 2023, 18 (2), pp.e0272509. Analysing movement learning can rely on human evaluation, e.g. annotating video recordings, or on computational means, by applying metrics to behavioural data. However, it remains challenging to relate human perception of movement similarity to computational measures that aim at modelling such similarity. In this paper, we propose a metric learning method bridging the gap between human ratings of movement similarity in a motor learning task and computational metric evaluation on the same task. It applies metric learning to a Dynamic Time Warping algorithm to derive an optimal set of movement features that best explain human ratings. We evaluated this method on an existing movement dataset, which comprises videos of participants practising a complex gesture sequence toward a target template, as well as the collected data that describes the movements. We show that it is possible to establish a linear relationship between human ratings and our learned computational metric. This learned metric can be used to describe the most salient temporal moments implicitly used by annotators, as well as movement parameters that correlate with motor improvements in the dataset. We conclude with possibilities to generalise this method for designing computational tools dedicated to movement annotation and evaluation of skill learning. (10.1371/journal.pone.0272509)
    DOI : 10.1371/journal.pone.0272509
  • Stein's method for discrete alpha stable point processes
    • Decreusefond Laurent
    • Vasseur Aurélien
    , 2023.
  • Majorana stellar representation of twisted photons
    • Fabre Nicolas
    • Klimov Andrei B
    • Murenzi Romain
    • Gazeau Jean-Pierre
    • Sánchez-Soto Luis L
    Physical Review Research, American Physical Society, 2023, 5 (3), pp.L032006. Majorana stellar representation, which visualizes a quantum spin as points on the Bloch sphere, allows quantum mechanics to accommodate the concept of trajectory, the hallmark of classical physics. We extend this notion to the discrete cylinder, which is the phase space of the canonical pair angle and orbital angular momentum. We demonstrate that the geometrical properties of the ensuing constellations aptly encapsulate the quantumness of the state. (10.1103/PhysRevResearch.5.L032006)
    DOI : 10.1103/PhysRevResearch.5.L032006
  • Bendima: a database for marine macro-invertebrate bycatch data designed to improve reproducibility in benthic ecology
    • Martin Alexis
    • Blettery Jonathan
    • Dettaï Agnès
    • Rosset Nicolas
    • Gousseau Yann
    Cybium : Revue Internationale d’Ichtyologie, Paris : Muséum national d'histoire naturelle, 2023. The difficulty of identifying marine macro-invertebrates and the lack of experts, added to the growing use of complex modeling approaches based on massive datasets, has led to a reproducibility crisis in benthic ecology. Improving the reliability of identification remains a key factor to increase the quality of raw data. We developed the database Bendima to manage benthic macro-invertebrate bycatch data from the scientific survey of the French Southern Ocean and Indian Ocean fisheries. This database is structured to store observations of macro-invertebrates in the form of images of the caught organisms associated with sampling effort data and molecular data, which allows for ongoing amendments to identifications and cross-referencing with barcode data. Once uploaded and stored as digital images, the Bendima observation data underpinning models can be fully assessed, criticized and compared. Here, we describe the Bendima system and provide an overview of the contents for teams involved in biodiversity database development, benthic ecology or fisheries monitoring. (10.26028/cybium/2023-020)
    DOI : 10.26028/cybium/2023-020
  • Runaway signals: Exaggerated displays of commitment may result from second-order signaling
    • Lie-Panis Julien
    • Dessalles Jean-Louis
    Journal of Theoretical Biology, Elsevier, 2023, 572, pp.111586. To demonstrate their commitment, for instance during wartime, members of a group will sometimes all engage in the same ruinous display. Such uniform, high-cost signals are hard to reconcile with standard models of signaling. For signals to be stable, they should honestly inform their audience; yet, uniform signals are trivially uninformative. To explain this phenomenon, we design a simple model, which we call the signal runaway game. In this game, senders can express outrage at non-senders. Outrage functions as a second-order signal. By expressing outrage at non-senders, senders draw attention to their own signal, and benefit from its increased visibility. Using our model and a simulation, we show that outrage can stabilize uniform signals, and can lead signal costs to run away. Second-order signaling may explain why groups sometimes demand displays of commitment from all their members, and why these displays can entail extreme costs. (10.1016/j.jtbi.2023.111586)
    DOI : 10.1016/j.jtbi.2023.111586
  • Learning finitely correlated states: stability of the spectral reconstruction
    • Fanizza Marco
    • Galke Niklas
    • Lumbreras Josep
    • Rouzé Cambyse
    • Winter Andreas
    , 2023. We show that marginals of subchains of length $t$ of any finitely correlated translation invariant state on a chain can be learned, in trace distance, with $O(t^2)$ copies -- with an explicit dependence on local dimension, memory dimension and spectral properties of a certain map constructed from the state -- and computational complexity polynomial in $t$. The algorithm requires only the estimation of a marginal of a controlled size, in the worst case bounded by a multiple of the minimum bond dimension, from which it reconstructs a translation invariant matrix product operator. In the analysis, a central role is played by the theory of operator systems. A refined error bound can be proven for $C^*$-finitely correlated states, which have an operational interpretation in terms of sequential quantum channels applied to the memory system. We can also obtain an analogous error bound for a class of matrix product density operators reconstructible by local marginals. In this case, a linear number of marginals must be estimated, obtaining a sample complexity of $\tilde{O}(t^3)$. The learning algorithm also works for states that are only close to a finitely correlated state, with the potential of providing competitive algorithms for other interesting families of states.
  • Method and system for automatic data placement
    • Shaar Atef
    • Boukhatem Nadia
    • Baccouch Hana
    , 2023. The invention relates to the field of data placement for storage systems, and more particularly to a method and a system for automatic data placement.
  • Cooperative Self-Training for Multi-Target Adaptive Semantic Segmentation
    • Zhang Yangsong
    • Roy Subhankar
    • Lu Hongtao
    • Ricci Elisa
    • Lathuilière Stéphane
    , 2022. In this work we address multi-target domain adaptation (MTDA) in semantic segmentation, which consists in adapting a single model from an annotated source dataset to multiple unannotated target datasets that differ in their underlying data distributions. To address MTDA, we propose a self-training strategy that employs pseudo-labels to induce cooperation among multiple domain-specific classifiers. We employ feature stylization as an efficient way to generate image views that form an integral part of self-training. Additionally, to prevent the network from overfitting to noisy pseudo-labels, we devise a rectification strategy that leverages the predictions from different classifiers to estimate the quality of pseudo-labels. Our extensive experiments on numerous settings, based on four different semantic segmentation datasets, validate the effectiveness of the proposed self-training strategy and show that our method outperforms state-of-the-art MTDA approaches. Code available at: https://github.com/Mael-zys/CoaST