
Publications

2025

  • Improving Ergonomic Viewing of Spatial XR Workspaces Through 2D Rotational Assistance
    • O'Hagan Joseph
    • Medeiros Daniel
    • Wilson Graham
    • McDermid Robert
    • McGill Mark
    , 2025, Article No. 336, pp. 1-8. Extended Reality unlocks the capability to create virtual workspaces that address and exceed the limitations of existing physical multi-monitor arrangements. We extend the ergonomic benefits of virtual workspaces by applying rotational assistance based on user gaze transitions between displays: as a user looks towards a given display, the workspace counter-rotates to reduce the amount of head/neck rotation required to view that display. Where prior work examined rotational assistance on one axis (horizontal), we extend this to movements across two axes, examining its impact on horizontal, vertical, and mixed arrangements of displays. In a user study (n=20), we found that rotational assistance improves ergonomic comfort, decreases necessary head/neck movement, lowers workload, and decreases fatigue when viewing wide and tall virtual display spaces, further motivating the transition from physical to virtual displays for productivity. (10.1145/3706599.3719920)
    DOI : 10.1145/3706599.3719920
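The counter-rotation mechanism described in the abstract above can be made concrete with a minimal sketch. The linear gain model, the `gain` parameter, and the function name are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch of two-axis rotational assistance: as the user's gaze
# moves toward a display at (yaw, pitch) degrees, the workspace counter-rotates
# by a fraction `gain` of that angle, so the head/neck only covers the rest.
def required_head_rotation(display_yaw, display_pitch, gain):
    """Head rotation (degrees) still needed after the workspace counter-rotates."""
    workspace_yaw = -gain * display_yaw      # counter-rotation, horizontal axis
    workspace_pitch = -gain * display_pitch  # counter-rotation, vertical axis
    return (display_yaw + workspace_yaw, display_pitch + workspace_pitch)
```

With `gain=0.5`, a display 40 degrees to the side would require only 20 degrees of head rotation under this toy model.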
  • SpineLoft: Interactive Spine-based 2D-to-3D Modeling
    • Thiault Alexandre
    • Philippe Telo
    • Parakkat Amal Dev
    • Eisemann Elmar
    • Muthuganapathy Ramanathan
    • Igarashi Takeo
    , 2025. 3D artists (professionals and novices alike) often take inspiration from sketches or photos to guide their designs. Yet, existing modeling systems are not tailored to fully make use of such input. Consequently, significant effort and expertise are needed when creating model prototypes or exploring design options. In this work, we introduce a system to support the exploratory modeling process by enabling the transformation of 2D image elements into geometric 3D objects. Our solution relies on a novel d2 distance function, supporting a region-based lofting process, and delivers easily-editable 3D geometric "spine-rib" representations. The user draws a spine, and the system generates and modifies a generalized cylinder around it, considering image edges. The proposed approach, driven by simple user-defined scribble definitions, can robustly handle various image sources, ranging from photos to hand-drawn content. (10.1145/3706598.3713439)
    DOI : 10.1145/3706598.3713439
  • Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
    • Parekh Jayneel
    • Bouniot Quentin
    • Mozharovskyi Pavlo
    • Newson Alasdair
    • d'Alché-Buc Florence
    , 2025. Developing inherently interpretable models for prediction has gained prominence in recent years. A subclass of these models, wherein the interpretable network relies on learning high-level concepts, are valued because of the closeness of their concept representations to human communication. However, the visualization and understanding of the learnt unsupervised dictionary of concepts encounters major limitations, especially for large-scale images. We propose here a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. The use of a generative model enables high quality visualization, and lays out an intuitive and interactive procedure for better interpretation of the learnt concepts by imputing concept activations and visualizing generated modifications. Furthermore, leveraging pretrained generative models has the additional advantage of making the training of the system more efficient. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts. The experiments are conducted on multiple image recognition benchmarks for large-scale images.
  • Tailoring Mixup to Data for Calibration
    • Bouniot Quentin
    • Mozharovskyi Pavlo
    • d'Alché-Buc Florence
    , 2023. Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a large panel of applications. Along with improved predictive performance, Mixup is also a good technique for improving calibration. However, mixing data carelessly can lead to manifold mismatch, i.e., synthetic data lying outside original class manifolds, which can deteriorate calibration. In this work, we show that the likelihood of assigning a wrong label with Mixup increases with the distance between the data points to mix. To this end, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between the samples to mix, and define a flexible framework to do so without losing diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves predictive performance and calibration of models, while being much more efficient.
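The idea of adapting the interpolation coefficient to the distance between samples can be sketched in a few lines. The mapping from distance to the Beta parameter below is an illustrative assumption, not the paper's exact construction:

```python
import math
import random

def similarity_aware_mixup(x1, x2, base_alpha=1.0, scale=1.0):
    """Mix two feature vectors with a coefficient whose distribution depends
    on their distance: close pairs are mixed broadly, distant pairs get lambda
    pushed toward 0 or 1 so synthetic points stay near the class manifolds.
    The distance-to-alpha mapping is illustrative, not the paper's."""
    dist = math.dist(x1, x2)
    alpha = base_alpha / (1.0 + scale * dist)   # distant pairs -> smaller alpha
    lam = random.betavariate(alpha, alpha)      # Beta(alpha, alpha), as in Mixup
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    return mixed, lam
```

A small alpha concentrates Beta(alpha, alpha) near 0 and 1, so distant pairs produce mixtures close to one of the two originals.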
  • Quantum Key Distribution with Efficient Post-Quantum Cryptography-Secured Trusted Node on a Quantum Network
    • Piétri Yoann
    • Verdier Pierre-Enguerrand
    • Lacour Baptiste
    • Gautier Maxime
    • Huang Heming
    • Camus Thomas
    • Pegon Jean-Sébastien
    • Zuber Martin
    • Faugère Jean-Charles
    • Schiavon Matteo
    • Rhouni Amine
    • Jaouën Yves
    • Fabre Nicolas
    • Alléaume Romain
    • Rivera Thomas
    • Diamanti Eleni
    , 2025. Quantum Key Distribution (QKD) enables two distant users to exchange a secret key with information-theoretic security, based on the fundamental laws of quantum physics. While it is arguably the most mature application of quantum cryptography, it has inherent limitations in the achievable distance and the scalability to large-scale infrastructures. While the applicability of QKD can be readily increased with the use of intermediary trusted nodes, this adds additional privacy requirements on third parties. In this work, we present an efficient scheme leveraging a trusted node with lower privacy requirements thanks to the use of post-quantum cryptographic techniques, and implement it on a deployed fiber optic quantum communication network in the Paris area. (10.48550/arXiv.2504.01454)
    DOI : 10.48550/arXiv.2504.01454
  • Robust and Reliable PUF Protocol Exploiting Non-Monotonic Quantization and Neyman-Pearson Lemma
    • Nasir Neelam
    • Béguinot Julien
    • Cheng Wei
    • Kühne Ulrich
    • Danger Jean-Luc
    , 2025. Strong physical unclonable functions (PUFs) provide a cost-effective authentication solution for resource-limited devices. However, they are susceptible to machine learning (ML) attacks. The lightweight defenses against ML rely on adding non-linearity to the PUF behavior (as in the XOR-PUF), or limiting the number of challenges at the protocol level (as in the lockdown protocol) to constrain learning. Another low-cost approach is to use a non-linear quantization of the response when the PUF provides an integer response, like the RO-PUF. This paper studies non-monotonic quantization (NMQ), which greatly enhances security when a large number of quantization levels is used. Unfortunately, this makes the PUF highly unreliable, rendering it impractical for authentication purposes. In this study, we propose a solution which circumvents the intrinsic PUF unreliability of NMQ to build an effective authentication protocol. It relies on the Neyman-Pearson test, which turns the native unreliability of responses into an asset for a reliable authentication protocol. To validate this approach, we evaluate our solution on FPGA using a loop PUF (a ring-oscillator-based PUF), which is a multibin PUF. The results show that an authentication success rate of nearly 100% can be obtained with high resistance to three types of ML attacks, whose accuracy remains at most 60%.
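A Neyman-Pearson style decision can be illustrated with a binary toy model (the paper's PUF responses are multibin, and the error model and parameter names below are assumptions for illustration only): accept when the log-likelihood ratio of "genuine device with bit-error rate p_err" versus "random impostor" exceeds a threshold.

```python
import math

def np_authenticate(reference, response, p_err=0.1, threshold=0.0):
    """Toy Neyman-Pearson test: likelihood of the observed bit mismatches
    under 'genuine device with error rate p_err' vs. 'uniform impostor'.
    Accept when the log-likelihood ratio exceeds the threshold, which would
    in practice be set for a target false-accept rate."""
    n = len(reference)
    k = sum(a != b for a, b in zip(reference, response))  # mismatching bits
    llr = (k * math.log(p_err) + (n - k) * math.log(1.0 - p_err)
           - n * math.log(0.5))
    return llr > threshold
```

The point of the construction is that a moderate number of mismatches (noise) still yields acceptance, while impostor-level mismatch rates are rejected.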
  • Revisiting Anatomy of Anorectal Malformations with a Symbolic AI Segmentation Method Applied on Diffusion MRI: The Lumbosacral Plexus Development and Microarchitecture Is Different in High and Low Types
    • Goulin J.
    • La Barbera G.
    • Delmonte A.
    • Bonnot E.
    • Berteloot L.
    • Lozach C.
    • Beaudoin S.
    • Blanc T.
    • Cretolle C.
    • Muller C.
    • Meignan P.
    • Peyrot Q.
    • Mille E.
    • Marret J.
    • Zerah M.
    • Boddaert N.
    • Gori P.
    • Bloch Isabelle
    • Sarnacki Sabine
    Journal of Imaging Informatics in Medicine, Springer Nature, 2025, pp.1-11. Anorectal malformations (ARMs) are congenital anomalies of the distal part of the hindgut often associated with sacral and/or spinal anomalies. We investigated anatomical and microstructural properties of the lumbosacral plexus of ARM patients from imaging data. Twenty-five patients (16 males), median age 4 months (2-49), 13 high and 12 low ARM, underwent 3 Tesla magnetic resonance imaging with diffusion tensor sequences (dMRI) before repair. A 3D model was built from manual segmentation and used to guide novel AI algorithms for the segmentation of the nervous pelvic network. Volume and diffusion parameters were obtained for each root (L5 to S4) and compared among patients with high and low ARMs using a nonparametric Wilcoxon test. Comparison was also made between the groups with (n = 9) or without (n = 16) sacral and/or spinal cord anomalies. When compared with low ARMs, high ARMs exhibited the following: a smaller volume of S1, S2, and S3 roots and of S1 and S3 for patients without sacral and/or spinal cord abnormalities; an overall significant alteration of the roots' micro-architecture reflected by a diminution of the fractional anisotropy and an increase of the axial diffusivity and radial diffusivity measures. This first analysis of the lumbosacral plexus from dMRI in children with ARMs shows differences in the development and microarchitecture of the lumbosacral nerve roots between high and low ARMs. This observation supports the hypothesis that high ARMs may result from a more regional developmental abnormality than low ARMs and opens new ways to visualize and assess the lumbosacral plexus in children and adults. (10.1007/s10278-024-01378-2)
    DOI : 10.1007/s10278-024-01378-2
  • Guiding the Classification of Hepatocellular Carcinoma on 3D CT-Scans Using Deep and Handcrafted Radiological Features
    • Sarfati Emma
    • Bône Alexandre
    • Rohé Marc-Michel
    • Aubé Christophe
    • Ronot Maxime
    • Gori Pietro
    • Bloch Isabelle
    , 2025. Hepatocellular carcinoma (HCC) is the most widespread primary liver cancer across the world (∼80% of liver tumors). The gold standard for HCC diagnosis is liver biopsy. However, in the clinical routine, expert radiologists provide a visual diagnosis by interpreting hepatic CT-scans according to a standardized protocol, the LI-RADS, which uses five radiological criteria with an associated decision tree. In this paper, we propose an automatic approach to predict histology-proven HCC from CT images in order to reduce radiologists' inter-observer variability. We first show that standard deep learning methods fail to accurately predict HCC from CT-scans on a challenging database, and propose a two-step approach inspired by the LI-RADS system to improve the performance. We achieve improvements from 6 to 18 points of AUC with respect to deep learning baselines trained with different architectures. We also provide clinical validation of our method, achieving results that outperform non-expert radiologists and are on par with expert ones.
  • Solver-in-the-loop approach to closure of shell models of turbulence
    • Freitas André
    • Um Kiwon
    • Desbrun Mathieu
    • Buzzicotti Michele
    • Biferale Luca
    Physical Review Fluids, American Physical Society, 2025, 10 (4), pp.044602. This work studies an a posteriori data-driven approach (known as solver-in-the-loop) for subgrid modeling of a shell model for turbulence. This approach takes advantage of the differentiable physics paradigm of deep learning, allowing a neural network model to interact with the differential equation solver over time during the training process. The closure model is, then, naturally exposed to equations-informed input distributions by accounting for prior corrections over the temporal evolution in training. Such a characteristic makes this approach depart from the conventional a priori instantaneous training paradigm and often leads to a more accurate and stable closure model. Our study demonstrates that the closure learned via this a posteriori approach is able to reproduce high-order statistical moments of interest also in closures of high Reynolds number turbulence. Moreover, we investigate the performance of the learned model by experimenting with the effect of unrolling in time, which has remained for the most part unexplored in the literature. Finally, we discuss potential extensions of this approach to Navier-Stokes equations. (10.1103/PhysRevFluids.10.044602)
    DOI : 10.1103/PhysRevFluids.10.044602
  • Backward Diffusion iterates Noising-Relaxed Denoising
    • Leclaire Arthur
    • Guez Eliot
    • Galerne Bruno
    , 2025. The goal of this paper is to offer a synthetic view of the connections between the celebrated diffusion models and classical additive Gaussian denoising. It allows us to formulate standard diffusion schemes as a simple iterative noising-relaxed denoising process. By bringing new understanding of the noise schedules, this allows us to accelerate model sampling, and also to use diffusion schemes with off-the-shelf denoisers. Finally, we question the use of diffusion-based denoisers to regularize inverse problems in a plug-and-play fashion, and highlight potential stability problems induced by such very deep regularizations.
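The "iterative noising-relaxed denoising" view can be illustrated on a 1-D toy problem where the MMSE Gaussian denoiser is known in closed form. The data model (N(3, 1)) and the noise schedule are illustrative assumptions, not from the paper:

```python
import random

def gaussian_denoiser(y, sigma, mu=3.0, s2=1.0):
    """Exact MMSE denoiser for data ~ N(mu, s2) corrupted by N(0, sigma^2) noise."""
    return mu + (s2 / (s2 + sigma ** 2)) * (y - mu)

def iterative_sample(schedule=(2.0, 1.0, 0.5, 0.25)):
    """Toy 1-D version of the scheme: alternate a denoising step at the
    current noise level with re-injection of the (smaller) next-level noise.
    The Gaussian data model and the schedule are illustrative."""
    sigma0 = schedule[0]
    x = random.gauss(3.0, (1.0 + sigma0 ** 2) ** 0.5)  # noisy marginal at sigma0
    for i, sigma in enumerate(schedule):
        x = gaussian_denoiser(x, sigma)                # denoise at level sigma
        if i + 1 < len(schedule):
            x += random.gauss(0.0, schedule[i + 1])    # relaxed re-noising
    return x
```

Averaged over many runs, the samples concentrate around the data mean, illustrating how the noising-denoising loop transports noise toward the data distribution.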
  • Relations Among New CCA Security Notions for Approximate FHE
    • Brzuska Chris
    • Canard Sébastien
    • Fontaine Caroline
    • Phan Duong Hieu
    • Pointcheval David
    • Renard Marc
    • Sirdey Renaud
    IACR Communications in Cryptology, International Association for Cryptologic Research (IACR), 2025, 2 (1), pp.1-35. In a recent Eurocrypt'24 paper, Manulis and Nguyen have proposed a new CCA security notion, vCCA, and associated construction blueprints to leverage both CPA-secure and correct FHE beyond the CCA1 security barrier. However, because their approach is only valid under the correctness assumption, it leaves a large part of the FHE spectrum uncovered, as many FHE schemes used in practice turn out to be approximate and, as such, do not satisfy the correctness assumption. In this paper, we improve their work by defining and investigating a variant of their security notion which is suitable for a more general case where approximate FHE are included. As the passive security of approximate FHE schemes is more appropriately captured by CPAD rather than CPA security, we start from the former notion to define our vCCAD new security notion. Although we show that vCCA and vCCAD are equivalent when the correctness assumption holds, we establish that vCCAD security is strictly stronger than vCCA security in the general case. In doing so, we interestingly establish several new separation results between variants of CPAD security of increasing strength. This allows us to clarify the relationship between vCCA security and CPAD security, and to reveal that the security notions landscape is much simpler for correct FHE than when approximate ones are included — in which case, for example, we establish that multiple challenges security notions are strictly stronger than single-challenge ones for both CPAD and vCCAD security. Lastly, we also give concrete construction blueprints, showing how to leverage some of the blueprints proposed by Manulis and Nguyen to achieve vCCAD security. As a result, vCCAD security is the strongest CCA security notion known so far to be achievable by both correct and approximate FHE schemes. (10.62056/aee0iv7sf)
    DOI : 10.62056/aee0iv7sf
  • Enhancing surrogate regression methods for structured prediction : An odyssey with loss functions
    • Yang Junjie
    , 2025. Machine learning, a rapidly evolving field at the intersection of mathematics and computer science, has transformed both scientific research and real-world applications. Beyond classification and regression, it now allows tackling structured prediction, enabling breakthroughs in machine translation, metabolite identification, and protein structure prediction, to name a few.Structured prediction (SP) is challenging due to its large, combinatorial output space. Surrogate regression methods like implicit loss embedding (ILE) and output kernel regression (OKR) address this by mapping structured outputs into a Hilbert space, converting SP into a vector-valued learning problem. However, they still face several challenges: (i) their performance depends heavily on complex loss function design, (ii) the implicit or infinite-dimensional nature of surrogate spaces limits neural network integration, and (iii) inference remains computationally demanding. This thesis aims to improve surrogate regression methods to overcome these limitations. For this purpose, we leverage several families of mathematical tools, including optimal transport (OT), kernel methods, and contrastive learning.We first address structured prediction for labeled graphs, leveraging recent advances in optimal transport distances. We introduce the fused network Gromov-Wasserstein (FNGW) distance, which incorporates edge features into computations. Using FNGW as a loss function in the ILE framework, we develop ILE-FNGW, generating predictions as FNGW barycenters. To tackle inference complexity, we propose Any2Graph-FNGW, a neural network-based model that predicts directly in a relaxed surrogate graph space, simplifying inference through efficient decoding.Next, building on OKR, we introduce deep sketched output kernel regression (DSOKR), a new framework that extends neural networks as surrogate hypothesis spaces for general structured outputs. 
DSOKR constructs a finite-dimensional subspace of a reproducing kernel Hilbert space (RKHS) using random sketching. This approach preserves flexibility by allowing any neural architecture for input processing while requiring only the prediction of coefficients for a finite-dimensional basis in the output layer.Finally, we introduce a novel SP framework, explicit loss embedding (ELE), which replaces predefined loss functions for structured data with a learnable, differentiable loss. This loss is defined as the squared Euclidean distance between neural network-parameterized embeddings and is learned directly from output data using contrastive learning. The new loss serves a dual purpose: during training, it formulates a finite-dimensional surrogate regression problem, and during inference, it defines a differentiable decoding objective.We evaluate all proposed methods on supervised graph prediction tasks, highlighting the distinct characteristics of each SP approach.
  • Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping
    • Berger Clémentine
    • Badeau Roland
    • Essid Slim
    , 2025. People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. Indeed, a music signal can mask some of the noise's frequency components due to the effect of simultaneous masking. In this article, we propose a neural network based on a psychoacoustic masking model, designed to enhance the music's ability to mask ambient noise by reshaping its spectral envelope with predicted filter frequency responses. The model is trained with a perceptual loss function that balances two constraints: effectively masking the noise while preserving the original music mix and the user's chosen listening level. We evaluate our approach on simulated data replicating a user's experience of listening to music with headphones in a noisy environment. The results, based on defined objective metrics, demonstrate that our system improves the state of the art.
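The trade-off described above (mask the noise, but preserve the mix and the chosen listening level) can be caricatured with a rule-based per-band gain. This is a crude stand-in for the learned spectral-envelope filters in the paper; the margin and clipping parameters are illustrative assumptions:

```python
def masking_gains(music_bands_db, noise_bands_db, margin_db=3.0, max_boost_db=12.0):
    """Per-band boost (dB) so the music level exceeds the noise level by a
    safety margin, clipped to a maximum boost so the original mix and the
    user's listening level are not distorted too much. All parameters are
    illustrative, not taken from the paper."""
    gains = []
    for m, n in zip(music_bands_db, noise_bands_db):
        shortfall = (n + margin_db) - m        # dB missing for masking in this band
        gains.append(min(max(shortfall, 0.0), max_boost_db))
    return gains
```

Bands where the music already dominates get no boost; bands drowned by noise are boosted, but only up to the clip.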
  • A Hybrid Model for Weakly-Supervised Speech Dereverberation
    • Bahrman Louis
    • Fontaine Mathieu
    • Richard Gael
    , 2025. This paper introduces a new training strategy to improve speech dereverberation systems using minimal acoustic information and reverberant (wet) speech. Most existing algorithms rely on paired dry/wet data, which is difficult to obtain, or on target metrics that may not adequately capture reverberation characteristics and can lead to poor results on non-target metrics. Our approach uses limited acoustic information, like the reverberation time (RT60), to train a dereverberation system. The system's output is resynthesized using a generated room impulse response and compared with the original reverberant speech, providing a novel reverberation matching loss replacing the standard target metrics. During inference, only the trained dereverberation model is used. Experimental results demonstrate that our method achieves more consistent performance across various objective metrics used in speech dereverberation than the state-of-the-art.
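A key ingredient above is resynthesizing reverberant speech from limited acoustic information such as RT60. A standard way to parameterize a room impulse response from RT60 alone is white noise under an exponential envelope whose energy decays by 60 dB at t = RT60; this classic noise-decay model is shown below purely as an illustration (the paper's RIR generator may differ):

```python
import math
import random

def synthetic_rir(rt60, sr=16000, length_s=0.5):
    """Toy room impulse response from RT60 alone: white noise shaped by an
    exponential envelope whose energy drops by 60 dB at t = RT60.
    decay = 3*ln(10)/RT60 gives exactly that amplitude decay rate."""
    decay = 3.0 * math.log(10) / rt60
    n = int(sr * length_s)
    return [random.gauss(0.0, 1.0) * math.exp(-decay * i / sr) for i in range(n)]
```

Convolving the dereverberation output with such an RIR and comparing against the original wet speech is the kind of reverberation-matching step the abstract describes.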
  • Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
    • Quelennec Aurian
    • Chouteau Pierre
    • Peeters Geoffroy
    • Essid Slim
    , 2025, pp.1-5. Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be more suited for downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (MATPAC), which is trained with two pretext tasks solved jointly. As in previous work, the first pretext task is a masked latent prediction task, ensuring a robust input representation in the latent space. The second one is unsupervised classification, which utilises the latent representations of the first pretext task to match probability distributions between a teacher and a student. We validate the MATPAC method by comparing it to other state-of-the-art proposals and conducting ablation studies. MATPAC reaches state-of-the-art self-supervised learning results on reference audio classification datasets such as OpenMIC, GTZAN, ESC-50 and US8K and outperforms comparable supervised methods' results for musical auto-tagging on Magna-tag-a-tune. (10.1109/ICASSP49660.2025.10887666)
    DOI : 10.1109/ICASSP49660.2025.10887666
  • Multiple Choice Learning for Efficient Speech Separation with Many Speakers
    • Perera David
    • Derrida Francois
    • Mariotte Théo
    • Richard Gael
    • Essid Slim
    , 2025. Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performances of PIT, while being computationally advantageous. This opens the door to a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting. (10.1109/ICASSP49660.2025.10888528)
    DOI : 10.1109/ICASSP49660.2025.10888528
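The permutation problem that PIT solves can be written in a few lines: the loss is the per-source error under the best assignment of predictions to targets. The exhaustive search below is the standard formulation and makes the factorial cost with many speakers apparent (variable names are illustrative; real systems use spectrogram- or SI-SDR-based losses rather than plain MSE):

```python
from itertools import permutations

def pit_loss(predictions, targets):
    """Permutation Invariant Training loss: minimum total per-source MSE over
    all assignments of predicted sources to ground-truth sources. Exhaustive
    over permutations, hence factorial in the number of speakers -- the cost
    that motivates alternatives such as Multiple Choice Learning."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return min(
        sum(mse(predictions[i], targets[p[i]]) for i in range(len(predictions)))
        for p in permutations(range(len(targets)))
    )
```

When the predictions are the targets in swapped order, the best permutation recovers a loss of zero, which is exactly the invariance PIT provides.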
  • Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges
    • Peeters Geoffroy
    • Rafii Zafar
    • Fuentes Magdalena
    • Duan Zhiyao
    • Benetos Emmanouil
    • Nam Juhan
    • Mitsufuji Yuki
    , 2025, pp.1-5. In this paper, we trace the evolution of Music Information Retrieval (MIR) over the past 25 years. While MIR gathers all kinds of research related to music informatics, a large part of it focuses on signal processing techniques for music data, fostering a close relationship with the IEEE Audio and Acoustic Signal Processing Technical Committee. In this paper, we reflect on the main research achievements of MIR along the three EDICS related to music analysis, processing, and generation. We then review a set of successful practices that fuel the rapid development of MIR research. One practice is the annual research benchmark, the Music Information Retrieval Evaluation eXchange, where participants compete on a set of research tasks. Another practice is the pursuit of reproducible and open research. The active engagement with industry research and products is another key factor for achieving large societal impacts and motivating younger generations of students to join the field. Last but not least, the commitment to diversity, equity, and inclusion ensures that MIR remains a vibrant and open community where various ideas, methodologies, and career pathways converge. We finish by outlining the future challenges MIR will have to face. (10.1109/ICASSP49660.2025.10888947)
    DOI : 10.1109/ICASSP49660.2025.10888947
  • Learning Source Disentanglement in Neural Audio Codec
    • Bie Xiaoyu
    • Liu Xubo
    • Richard Gaël
    , 2024. Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, i.e., sets of discrete representations. Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space, thereby enhancing interpretability in audio codecs and providing potentially finer control over the audio generation process. (10.1109/ICASSP49660.2025.10888065)
    DOI : 10.1109/ICASSP49660.2025.10888065
  • O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
    • Gruttadauria Elio
    • Fontaine Mathieu
    • Le Roux Jonathan
    • Essid Slim
    , 2025. We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: a hyperparameter-free solution compared to unsupervised clustering approaches, and a more efficient alternative to current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a great trade-off between DER and complexity, even when working on independent chunks with no overlap, making the system extremely efficient.
  • Contrastive Knowledge Distillation for Embedding Refinement in Personalized Speech Enhancement
    • Serre Thomas
    • Fontaine Mathieu
    • Benhaim Éric
    • Essid Slim
    , 2025, pp.1-5. Personalized speech enhancement (PSE) has shown convincing results when it comes to extracting a known target voice among interfering ones. The corresponding systems usually incorporate a representation of the target voice within the enhancement system, which is extracted from an enrollment clip of the target voice with upstream models. Those models are generally heavy, as the speaker embedding's quality directly affects PSE performance. Yet, embeddings generated beforehand cannot account for the variations of the target voice during inference time. In this paper, we propose to perform on-the-fly refinement of the speaker embedding using a tiny speaker encoder. We first introduce a novel contrastive knowledge distillation methodology in order to train a 150k-parameter encoder from complex embeddings. We then use this encoder within the enhancement system during inference and show that the proposed method greatly improves PSE performance while maintaining a low computational load. (10.1109/icassp49660.2025.10887609)
    DOI : 10.1109/icassp49660.2025.10887609
  • AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
    • Sadok Samir
    • Leglaive Simon
    • Girin Laurent
    • Richard Gaël
    • Alameda-Pineda Xavier
    , 2025, pp.1-5. This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes, such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allow precise control of the synthesized speech by modifying them. Extensive experiments demonstrated the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement. Code and audio examples are available online.
  • Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects
    • Deng Victor
    • Wang Changhong
    • Richard Gael
    • McFee Brian
    , 2025. In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we investigate the sensitivity to audio effects of audio embeddings extracted from widely-used foundation models, including OpenL3, PANNs, and CLAP. We focus on audio effects as the source of sensitivity due to their prevalent presence in large audio datasets. By applying parameterized audio effects (gain, low-pass filtering, reverberation, and bitcrushing), we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We propose to quantify the dimensionality and linearizability of the deformation trajectories induced by audio effects using canonical correlation analysis. We find that there exists a direction along which the embeddings move monotonically as the audio effect strength increases, but that the subspace containing the displacements is generally high-dimensional. This shows that pre-trained audio embeddings do not globally linearize the effects. Our empirical results on instrument classification downstream tasks confirm that projecting out the estimated deformation directions cannot generally improve the robustness of pre-trained embeddings to audio effects.
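The paper's observation that embeddings move monotonically along some direction as effect strength grows can be checked with a simple projection test. The trajectory below is synthetic (real embeddings would come from models such as OpenL3, PANNs, or CLAP), and the endpoint-to-endpoint direction is a simplification of the CCA-based analysis in the paper:

```python
def monotone_along_endpoint_direction(embeddings):
    """Project a trajectory of embeddings (one per increasing effect strength)
    onto its endpoint-to-endpoint direction and test whether the projections
    increase monotonically."""
    d = [b - a for a, b in zip(embeddings[0], embeddings[-1])]
    proj = [sum(x * y for x, y in zip(e, d)) for e in embeddings]
    return all(q > p for p, q in zip(proj, proj[1:]))

# A curved (non-linear) path whose drift is still monotone along one
# direction, mimicking high-dimensional but directed displacement.
traj = [[s, s ** 2, (1 - s) ** 2] for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

A curved path can thus be monotone along one direction while still spanning a high-dimensional subspace, which is exactly the distinction the abstract draws.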
  • Decoding the Hierarchy: A Hybrid Approach to Hierarchical Multi-label Text Classification
    • Torba Fatos
    • Gravier Christophe
    • Laclau Charlotte
    • Kammoun Abderrhammen
    • Subercaze Julien
    , 2025, 15572, pp.405-420. Hierarchical multi-label text classification (HMTC) aims to predict multiple labels from a tree-like hierarchy for a given input text. Recent approaches frame HMTC as a seq2seq problem, where the objective is to predict the sequence of associated labels, regardless of their order or position in the hierarchy. Despite promising results, these approaches rely solely on attention mechanisms from previously generated tokens. This limitation prevents them from acquiring information about the global hierarchy and may lead to the accumulation of errors as the model learns hierarchical cues among labels. We propose a novel HMTC model based on a hybrid version of the encoder-decoder architecture where the decoder is pre-populated with the entire label embeddings. By leveraging the decoder’s Cross-Attention and Hierarchical Self-Attention mechanisms, we achieve a label representation that benefits from instance and global label-wise information. Empirical experiments on four HMTC benchmark datasets demonstrated the effectiveness of our approach by setting new state-of-the-art results. Code (https://github.com/FatosTorba/HLPD) and datasets are made available to facilitate the reproducibility and future work. (10.1007/978-3-031-88708-6_26)
    DOI : 10.1007/978-3-031-88708-6_26
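The decoder design described above can be pictured with plain scaled dot-product attention: label embeddings first attend to each other (global hierarchy information), then attend to the encoded document (instance information). The NumPy sketch below is an illustrative simplification with made-up dimensions, not the HLPD implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16                       # embedding size (illustrative)
n_tokens, n_labels = 10, 6   # encoder tokens, hierarchy labels

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention."""
    w = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return w @ V

# Hypothetical encoder output for one document, and one embedding per
# label of the hierarchy (the decoder is pre-populated with all of them).
enc = rng.normal(size=(n_tokens, d))
labels = rng.normal(size=(n_labels, d))

# Hierarchical self-attention: every label attends to every other label,
# injecting global hierarchy information.
labels_sa = attention(labels, labels, labels)
# Cross-attention: labels attend to the encoded document tokens,
# injecting instance-specific information.
labels_ca = attention(labels_sa, enc, enc)

print(labels_ca.shape)  # (6, 16): one instance-aware vector per label
```

Because all label embeddings are present from the start, no label's representation depends on an autoregressive generation order, which is the error-accumulation problem the abstract points to.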
  • Convex Quartic Problems: Homogenized Gradient Method and Preconditioning
    • Dragomir Radu-Alexandru
    • Nesterov Yurii
    SIAM Journal on Optimization, Society for Industrial and Applied Mathematics, 2025, 35 (2), pp.651-677. We consider a convex minimization problem for which the objective is the sum of a homogeneous polynomial of degree four and a linear term. Such a task arises as a subproblem in algorithms for quadratic inverse problems with a difference-of-convex structure. We design a first-order method called Homogenized Gradient, along with an accelerated version, which enjoy fast convergence rates of respectively O(κ²/K²) and O(κ²/K⁴) in relative accuracy, where K is the iteration counter. The constant κ is the quartic condition number of the problem. Then, we show that for a certain class of problems, it is possible to compute a preconditioner for which this condition number is √n, where n is the problem dimension. To establish this, we study the more general problem of finding the best quadratic approximation of an ℓp norm composed with a quadratic map. Our construction involves a generalization of the so-called Lewis weights. (10.1137/23M1583363)
    DOI : 10.1137/23M1583363
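To make the problem class concrete: an instance is f(x) = ρ(x) + ⟨b, x⟩ where ρ is a convex homogeneous quartic, e.g. ρ(x) = ¼(xᵀAx)² with A positive definite. The sketch below minimizes such an instance with plain gradient descent plus backtracking line search; it is a baseline on the problem class only, not the paper's Homogenized Gradient method, and A, b, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n))
A = M @ M.T + np.eye(n)          # positive definite, so rho is convex
b = rng.normal(size=n)

def f(x):
    q = x @ A @ x
    return 0.25 * q * q + b @ x  # homogeneous quartic + linear term

def grad(x):
    q = x @ A @ x
    return q * (A @ x) + b       # chain rule on (x^T A x)^2 / 4

# Gradient descent with Armijo backtracking (a generic baseline; the
# paper's method instead exploits the quartic homogeneity of rho).
x = np.zeros(n)
for _ in range(2000):
    g = grad(x)
    t = 1.0
    while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
        t *= 0.5
    x = x - t * g

print(np.linalg.norm(grad(x)) < 1e-2)  # near-stationary at the end
```

The quartic growth of ρ is exactly why a fixed step size is awkward here, and why the paper designs methods whose rates depend on a quartic condition number κ rather than a global Lipschitz constant.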
  • On the compressibility of large-scale source code datasets
    • Boffa Antonio
    • Di Cosmo Roberto
    • Ferragina Paolo
    • Guerra Andrea
    • Manzini Giovanni
    • Vinciguerra Giorgio
    • Zacchiroli Stefano
    Journal of Systems and Software, Elsevier, 2025, 227, pp.112429. Storing ultra-large amounts of unstructured data (often called objects or blobs) is a fundamental task for several object-based storage engines, data warehouses, data-lake systems, and key-value stores. These systems cannot currently leverage similarities between objects, which could be vital in improving their space and time performance. An important use case in which we can expect the objects to be highly similar is the storage of large-scale versioned source code datasets, such as the Software Heritage Archive (Di Cosmo and Zacchiroli, 2017). This use case is particularly interesting given the extraordinary size (1.5 PiB), the variegated nature, and the high repetitiveness of the corpus in question. In this paper we discuss and experiment with content- and context-based compression techniques for source-code collections that tailor known and novel tools to this setting, in combination with state-of-the-art general-purpose compressors and the information coming from the Software Heritage Graph. We experiment with our compressors over a random sample of the entire corpus, and four large samples of source code files written in different popular languages: C/C++, Java, JavaScript, and Python. We also consider two scenarios of usage for our compressors, called the Backup and File-Access scenarios, where the latter adds to the former the support for single-file retrieval. As a net result, our experiments show (i) how much "compressible" each language is, (ii) which content- or context-based techniques compress better and are faster to (de)compress by possibly supporting individual file access, and (iii) the ultimate compressed size that, according to our estimate, our best solution could achieve in storing all the source code written in these languages and available in the Software Heritage Archive: namely, 3 TiB (down from their original 78 TiB total size, with an average compression ratio of 4%). (10.1016/j.jss.2025.112429)
    DOI : 10.1016/j.jss.2025.112429
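The core intuition of context-based compression can be shown in a few lines with the standard `zlib` module: two revisions of the same file compress far better together, where the compressor's window can reuse one as context for the other, than separately. The two "revisions" below are synthetic stand-ins, not Software Heritage data.

```python
import zlib

# Hypothetical stand-in for two revisions of one source file: v2 is v1
# with a single identifier renamed, mimicking the high cross-version
# similarity of versioned source code corpora.
v1 = "\n".join(f"int field_{i} = {i * i};" for i in range(200)).encode()
v2 = v1.replace(b"field_7 ", b"field_x ")

# Content-oblivious baseline: compress each blob independently.
separate = len(zlib.compress(v1)) + len(zlib.compress(v2))
# Context-aware variant: compress near-duplicates together, letting the
# compressor's window exploit cross-object redundancy.
together = len(zlib.compress(v1 + v2))

print(together < separate)  # True: grouping similar blobs saves space
```

At archive scale, the hard part the paper addresses is choosing which of billions of blobs to group (e.g. using the Software Heritage Graph) while still supporting single-file retrieval in the File-Access scenario.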