
Publications

2025

  • Improving Ergonomic Viewing of Spatial XR Workspaces Through 2D Rotational Assistance
    • O'Hagan Joseph
    • Medeiros Daniel
    • Wilson Graham
    • McDermid Robert
    • McGill Mark
    , 2025, Article No. 336, pp. 1-8. Extended Reality unlocks the capability to create virtual workspaces that address and exceed the limitations of existing physical multi-monitor arrangements. We extend the ergonomic benefits of virtual workspaces by applying rotational assistance based on user gaze transitions between displays: as a user looks towards a given display, the workspace counter-rotates to reduce the amount of head/neck rotation required to view that display. Where prior work examined rotational assistance on one axis (horizontal), we extend this to movements across two axes, examining its impact on horizontal, vertical, and mixed arrangements of displays. In a user study (n=20), we found that rotational assistance improves ergonomic comfort, decreases necessary head/neck movement, lowers workload, and decreases fatigue when viewing wide and tall virtual display spaces, further motivating the transition from physical to virtual displays for productivity. (10.1145/3706599.3719920)
    DOI : 10.1145/3706599.3719920
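The counter-rotation mechanism described in the abstract above can be made concrete with a minimal sketch. The linear gain model, the `gain` parameter, and the function name are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch of two-axis rotational assistance: as the user's gaze
# moves toward a display at (yaw, pitch) degrees, the workspace counter-rotates
# by a fraction `gain` of that angle, so the head/neck only covers the rest.
def required_head_rotation(display_yaw, display_pitch, gain):
    """Head rotation (degrees) still needed after the workspace counter-rotates."""
    workspace_yaw = -gain * display_yaw      # counter-rotation, horizontal axis
    workspace_pitch = -gain * display_pitch  # counter-rotation, vertical axis
    return (display_yaw + workspace_yaw, display_pitch + workspace_pitch)
```

With `gain=0.5`, a display 40 degrees to the side would require only 20 degrees of head rotation under this toy model.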
  • SpineLoft: Interactive Spine-based 2D-to-3D Modeling
    • Thiault Alexandre
    • Philippe Telo
    • Parakkat Amal Dev
    • Eisemann Elmar
    • Muthuganapathy Ramanathan
    • Igarashi Takeo
    , 2025. 3D artists (professionals and novices alike) often take inspiration from sketches or photos to guide their designs. Yet, existing modeling systems are not tailored to fully make use of such input. Consequently, significant effort and expertise are needed when creating model prototypes or exploring design options. In this work, we introduce a system to support the exploratory modeling process by enabling the transformation of 2D image elements into geometric 3D objects. Our solution relies on a novel d2 distance function, supporting a region-based lofting process, and delivers easily-editable 3D geometric "spine-rib" representations. The user draws a spine, and the system generates and modifies a generalized cylinder around it, considering image edges. The proposed approach, driven by simple user-defined scribble definitions, can robustly handle various image sources, ranging from photos to hand-drawn content. (10.1145/3706598.3713439)
    DOI : 10.1145/3706598.3713439
  • Restyling Unsupervised Concept Based Interpretable Networks with Generative Models
    • Parekh Jayneel
    • Bouniot Quentin
    • Mozharovskyi Pavlo
    • Newson Alasdair
    • d'Alché-Buc Florence
    , 2025. Developing inherently interpretable models for prediction has gained prominence in recent years. A subclass of these models, wherein the interpretable network relies on learning high-level concepts, are valued because of the closeness of their concept representations to human communication. However, the visualization and understanding of the learnt unsupervised dictionary of concepts encounters major limitations, especially for large-scale images. We propose here a novel method that relies on mapping the concept features to the latent space of a pretrained generative model. The use of a generative model enables high quality visualization, and lays out an intuitive and interactive procedure for better interpretation of the learnt concepts by imputing concept activations and visualizing generated modifications. Furthermore, leveraging pretrained generative models has the additional advantage of making the training of the system more efficient. We quantitatively ascertain the efficacy of our method in terms of accuracy of the interpretable prediction network, fidelity of reconstruction, as well as faithfulness and consistency of learnt concepts. The experiments are conducted on multiple image recognition benchmarks for large-scale images.
  • Tailoring Mixup to Data for Calibration
    • Bouniot Quentin
    • Mozharovskyi Pavlo
    • d'Alché-Buc Florence
    , 2023. Among all data augmentation techniques proposed so far, linear interpolation of training samples, also called Mixup, has been found to be effective for a large panel of applications. Along with improved predictive performance, Mixup is also a good technique for improving calibration. However, mixing data carelessly can lead to manifold mismatch, i.e., synthetic data lying outside original class manifolds, which can deteriorate calibration. In this work, we show that the likelihood of assigning a wrong label with Mixup increases with the distance between the data points to mix. To this end, we propose to dynamically change the underlying distributions of interpolation coefficients depending on the similarity between the samples to mix, and define a flexible framework to do so without losing diversity. We provide extensive experiments for classification and regression tasks, showing that our proposed method improves predictive performance and calibration of models, while being much more efficient.
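The idea of adapting the interpolation coefficient to the distance between samples can be sketched in a few lines. The mapping from distance to the Beta parameter below is an illustrative assumption, not the paper's exact construction:

```python
import math
import random

def similarity_aware_mixup(x1, x2, base_alpha=1.0, scale=1.0):
    """Mix two feature vectors with a coefficient whose distribution depends
    on their distance: close pairs are mixed broadly, distant pairs get lambda
    pushed toward 0 or 1 so synthetic points stay near the class manifolds.
    The distance-to-alpha mapping is illustrative, not the paper's."""
    dist = math.dist(x1, x2)
    alpha = base_alpha / (1.0 + scale * dist)   # distant pairs -> smaller alpha
    lam = random.betavariate(alpha, alpha)      # Beta(alpha, alpha), as in Mixup
    mixed = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    return mixed, lam
```

A small alpha concentrates Beta(alpha, alpha) near 0 and 1, so distant pairs produce mixtures close to one of the two originals.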
  • Quantum Key Distribution with Efficient Post-Quantum Cryptography-Secured Trusted Node on a Quantum Network
    • Piétri Yoann
    • Verdier Pierre-Enguerrand
    • Lacour Baptiste
    • Gautier Maxime
    • Huang Heming
    • Camus Thomas
    • Pegon Jean-Sébastien
    • Zuber Martin
    • Faugère Jean-Charles
    • Schiavon Matteo
    • Rhouni Amine
    • Jaouën Yves
    • Fabre Nicolas
    • Alléaume Romain
    • Rivera Thomas
    • Diamanti Eleni
    , 2025. Quantum Key Distribution (QKD) enables two distant users to exchange a secret key with information-theoretic security, based on the fundamental laws of quantum physics. While it is arguably the most mature application of quantum cryptography, it has inherent limitations in the achievable distance and the scalability to large-scale infrastructures. While the applicability of QKD can be readily increased with the use of intermediary trusted nodes, this adds additional privacy requirements on third parties. In this work, we present an efficient scheme leveraging a trusted node with lower privacy requirements thanks to the use of post-quantum cryptographic techniques, and implement it on a deployed fiber optic quantum communication network in the Paris area. (10.48550/arXiv.2504.01454)
    DOI : 10.48550/arXiv.2504.01454
  • Robust and Reliable PUF Protocol Exploiting Non-Monotonic Quantization and Neyman-Pearson Lemma
    • Nasir Neelam
    • Béguinot Julien
    • Cheng Wei
    • Kühne Ulrich
    • Danger Jean-Luc
    , 2025. Strong physical unclonable functions (PUFs) provide a cost-effective authentication solution for resource-limited devices. However, they are susceptible to machine learning (ML) attacks. The lightweight defenses against ML rely on adding non-linearity to the PUF behavior (as in the XOR-PUF), or limiting the number of challenges at the protocol level (as in the lockdown protocol) to constrain learning. Another low-cost approach is to use a non-linear quantization of the response when the PUF provides an integer response, like the RO-PUF. This paper studies non-monotonic quantization (NMQ), which greatly enhances security when a large number of quantization levels is used. Unfortunately, this makes the PUF highly unreliable, rendering it impractical for authentication purposes. In this study, we propose a solution which circumvents the intrinsic PUF unreliability of NMQ to build an effective authentication protocol. It relies on the Neyman-Pearson test, which turns the native unreliability of responses into an asset for a reliable authentication protocol. To validate this approach, we evaluate our solution on FPGA using a loop PUF (a ring-oscillator-based PUF), which is a multibin PUF. The results show that an authentication success rate of nearly 100% can be obtained with high resistance to three types of ML attacks, whose accuracy remains at most 60%.
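A Neyman-Pearson style decision can be illustrated with a binary toy model (the paper's PUF responses are multibin, and the error model and parameter names below are assumptions for illustration only): accept when the log-likelihood ratio of "genuine device with bit-error rate p_err" versus "random impostor" exceeds a threshold.

```python
import math

def np_authenticate(reference, response, p_err=0.1, threshold=0.0):
    """Toy Neyman-Pearson test: likelihood of the observed bit mismatches
    under 'genuine device with error rate p_err' vs. 'uniform impostor'.
    Accept when the log-likelihood ratio exceeds the threshold, which would
    in practice be set for a target false-accept rate."""
    n = len(reference)
    k = sum(a != b for a, b in zip(reference, response))  # mismatching bits
    llr = (k * math.log(p_err) + (n - k) * math.log(1.0 - p_err)
           - n * math.log(0.5))
    return llr > threshold
```

The point of the construction is that a moderate number of mismatches (noise) still yields acceptance, while impostor-level mismatch rates are rejected.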
  • Revisiting Anatomy of Anorectal Malformations with a Symbolic AI Segmentation Method Applied on Diffusion MRI: The Lumbosacral Plexus Development and Microarchitecture Is Different in High and Low Types
    • Goulin J.
    • La Barbera G.
    • Delmonte A.
    • Bonnot E.
    • Berteloot L.
    • Lozach C.
    • Beaudoin S.
    • Blanc T.
    • Cretolle C.
    • Muller C.
    • Meignan P.
    • Peyrot Q.
    • Mille E.
    • Marret J.
    • Zerah M.
    • Boddaert N.
    • Gori P.
    • Bloch Isabelle
    • Sarnacki Sabine
    Journal of Imaging Informatics in Medicine, Springer Nature, 2025, pp.1-11. Anorectal malformations (ARMs) are congenital anomalies of the distal part of the hindgut often associated with sacral and/or spinal anomalies. We investigated anatomical and microstructural properties of the lumbosacral plexus of ARM patients from imaging data. Twenty-five patients (16 males), median age 4 months (2-49), 13 high and 12 low ARM, underwent 3 Tesla magnetic resonance imaging with diffusion tensor sequences (dMRI) before repair. A 3D model was built from manual segmentation and used to guide novel AI algorithms for the segmentation of the nervous pelvic network. Volume and diffusion parameters were obtained for each root (L5 to S4) and compared among patients with high and low ARMs using a nonparametric Wilcoxon test. Comparison was also made between the groups with (n = 9) or without (n = 16) sacral and/or spinal cord anomalies. When compared with low ARMs, high ARMs exhibited the following: a smaller volume of S1, S2, and S3 roots and of S1 and S3 for patients without sacral and/or spinal cord abnormalities; an overall significant alteration of the roots' micro-architecture reflected by a diminution of the fractional anisotropy and an increase of the axial diffusivity and radial diffusivity measures. This first analysis of the lumbosacral plexus from dMRI in children with ARMs shows differences in the development and microarchitecture of the lumbosacral nerve roots between high and low ARMs. This observation supports the hypothesis that high ARMs may result from a more regional developmental abnormality than low ARMs and opens new ways to visualize and assess the lumbosacral plexus in children and adults. (10.1007/s10278-024-01378-2)
    DOI : 10.1007/s10278-024-01378-2
  • Guiding the Classification of Hepatocellular Carcinoma on 3D CT-Scans Using Deep and Handcrafted Radiological Features
    • Sarfati Emma
    • Bône Alexandre
    • Rohé Marc-Michel
    • Aubé Christophe
    • Ronot Maxime
    • Gori Pietro
    • Bloch Isabelle
    , 2025. Hepatocellular carcinoma (HCC) is the most widespread primary liver cancer across the world (∼80% of liver tumors). The gold standard for HCC diagnosis is liver biopsy. However, in the clinical routine, expert radiologists provide a visual diagnosis by interpreting hepatic CT-scans according to a standardized protocol, the LI-RADS, which uses five radiological criteria with an associated decision tree. In this paper, we propose an automatic approach to predict histology-proven HCC from CT images in order to reduce radiologists' inter-observer variability. We first show that standard deep learning methods fail to accurately predict HCC from CT-scans on a challenging database, and propose a two-step approach inspired by the LI-RADS system to improve the performance. We achieve improvements from 6 to 18 points of AUC with respect to deep learning baselines trained with different architectures. We also provide clinical validation of our method, achieving results that outperform non-expert radiologists and are on par with expert ones.
  • Solver-in-the-loop approach to closure of shell models of turbulence
    • Freitas André
    • Um Kiwon
    • Desbrun Mathieu
    • Buzzicotti Michele
    • Biferale Luca
    Physical Review Fluids, American Physical Society, 2025, 10 (4), pp.044602. This work studies an a posteriori data-driven approach (known as solver-in-the-loop) for subgrid modeling of a shell model for turbulence. This approach takes advantage of the differentiable physics paradigm of deep learning, allowing a neural network model to interact with the differential equation solver over time during the training process. The closure model is, then, naturally exposed to equations-informed input distributions by accounting for prior corrections over the temporal evolution in training. Such a characteristic makes this approach depart from the conventional a priori instantaneous training paradigm and often leads to a more accurate and stable closure model. Our study demonstrates that the closure learned via this a posteriori approach is able to reproduce high-order statistical moments of interest also in closures of high Reynolds number turbulence. Moreover, we investigate the performance of the learned model by experimenting with the effect of unrolling in time, which has remained for the most part unexplored in the literature. Finally, we discuss potential extensions of this approach to Navier-Stokes equations. (10.1103/PhysRevFluids.10.044602)
    DOI : 10.1103/PhysRevFluids.10.044602
  • Backward Diffusion iterates Noising-Relaxed Denoising
    • Leclaire Arthur
    • Guez Eliot
    • Galerne Bruno
    , 2025. The goal of this paper is to offer a synthetic view of the connections between the celebrated diffusion models and classical additive Gaussian denoising. It allows us to formulate standard diffusion schemes as a simple iterative noising-relaxed denoising process. By bringing new understanding of the noise schedules, this allows us to accelerate model sampling, and also to use diffusion schemes with off-the-shelf denoisers. Finally, we question the use of diffusion-based denoisers to regularize inverse problems in a plug-and-play fashion, and highlight potential stability problems induced by such very deep regularizations.
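The "iterative noising-relaxed denoising" view can be illustrated on a 1-D toy problem where the MMSE Gaussian denoiser is known in closed form. The data model (N(3, 1)) and the noise schedule are illustrative assumptions, not from the paper:

```python
import random

def gaussian_denoiser(y, sigma, mu=3.0, s2=1.0):
    """Exact MMSE denoiser for data ~ N(mu, s2) corrupted by N(0, sigma^2) noise."""
    return mu + (s2 / (s2 + sigma ** 2)) * (y - mu)

def iterative_sample(schedule=(2.0, 1.0, 0.5, 0.25)):
    """Toy 1-D version of the scheme: alternate a denoising step at the
    current noise level with re-injection of the (smaller) next-level noise.
    The Gaussian data model and the schedule are illustrative."""
    sigma0 = schedule[0]
    x = random.gauss(3.0, (1.0 + sigma0 ** 2) ** 0.5)  # noisy marginal at sigma0
    for i, sigma in enumerate(schedule):
        x = gaussian_denoiser(x, sigma)                # denoise at level sigma
        if i + 1 < len(schedule):
            x += random.gauss(0.0, schedule[i + 1])    # relaxed re-noising
    return x
```

Averaged over many runs, the samples concentrate around the data mean, illustrating how the noising-denoising loop transports noise toward the data distribution.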
  • Relations Among New CCA Security Notions for Approximate FHE
    • Brzuska Chris
    • Canard Sébastien
    • Fontaine Caroline
    • Phan Duong Hieu
    • Pointcheval David
    • Renard Marc
    • Sirdey Renaud
    IACR Communications in Cryptology, International Association for Cryptologic Research (IACR), 2025, 2 (1), pp.1-35. In a recent Eurocrypt'24 paper, Manulis and Nguyen have proposed a new CCA security notion, vCCA, and associated construction blueprints to leverage both CPA-secure and correct FHE beyond the CCA1 security barrier. However, because their approach is only valid under the correctness assumption, it leaves a large part of the FHE spectrum uncovered, as many FHE schemes used in practice turn out to be approximate and, as such, do not satisfy the correctness assumption. In this paper, we improve their work by defining and investigating a variant of their security notion which is suitable for a more general case where approximate FHE are included. As the passive security of approximate FHE schemes is more appropriately captured by CPAD rather than CPA security, we start from the former notion to define our vCCAD new security notion. Although we show that vCCA and vCCAD are equivalent when the correctness assumption holds, we establish that vCCAD security is strictly stronger than vCCA security in the general case. In doing so, we interestingly establish several new separation results between variants of CPAD security of increasing strength. This allows us to clarify the relationship between vCCA security and CPAD security, and to reveal that the security notions landscape is much simpler for correct FHE than when approximate ones are included — in which case, for example, we establish that multiple challenges security notions are strictly stronger than single-challenge ones for both CPAD and vCCAD security. Lastly, we also give concrete construction blueprints, showing how to leverage some of the blueprints proposed by Manulis and Nguyen to achieve vCCAD security. As a result, vCCAD security is the strongest CCA security notion known so far to be achievable by both correct and approximate FHE schemes. (10.62056/aee0iv7sf)
    DOI : 10.62056/aee0iv7sf
  • Enhancing surrogate regression methods for structured prediction : An odyssey with loss functions
    • Yang Junjie
    , 2025. Machine learning, a rapidly evolving field at the intersection of mathematics and computer science, has transformed both scientific research and real-world applications. Beyond classification and regression, it now allows tackling structured prediction, enabling breakthroughs in machine translation, metabolite identification, and protein structure prediction, to name a few.Structured prediction (SP) is challenging due to its large, combinatorial output space. Surrogate regression methods like implicit loss embedding (ILE) and output kernel regression (OKR) address this by mapping structured outputs into a Hilbert space, converting SP into a vector-valued learning problem. However, they still face several challenges: (i) their performance depends heavily on complex loss function design, (ii) the implicit or infinite-dimensional nature of surrogate spaces limits neural network integration, and (iii) inference remains computationally demanding. This thesis aims to improve surrogate regression methods to overcome these limitations. For this purpose, we leverage several families of mathematical tools, including optimal transport (OT), kernel methods, and contrastive learning.We first address structured prediction for labeled graphs, leveraging recent advances in optimal transport distances. We introduce the fused network Gromov-Wasserstein (FNGW) distance, which incorporates edge features into computations. Using FNGW as a loss function in the ILE framework, we develop ILE-FNGW, generating predictions as FNGW barycenters. To tackle inference complexity, we propose Any2Graph-FNGW, a neural network-based model that predicts directly in a relaxed surrogate graph space, simplifying inference through efficient decoding.Next, building on OKR, we introduce deep sketched output kernel regression (DSOKR), a new framework that extends neural networks as surrogate hypothesis spaces for general structured outputs. 
DSOKR constructs a finite-dimensional subspace of a reproducing kernel Hilbert space (RKHS) using random sketching. This approach preserves flexibility by allowing any neural architecture for input processing while requiring only the prediction of coefficients for a finite-dimensional basis in the output layer.Finally, we introduce a novel SP framework, explicit loss embedding (ELE), which replaces predefined loss functions for structured data with a learnable, differentiable loss. This loss is defined as the squared Euclidean distance between neural network-parameterized embeddings and is learned directly from output data using contrastive learning. The new loss serves a dual purpose: during training, it formulates a finite-dimensional surrogate regression problem, and during inference, it defines a differentiable decoding objective.We evaluate all proposed methods on supervised graph prediction tasks, highlighting the distinct characteristics of each SP approach.
  • Perceptual Noise-Masking with Music through Deep Spectral Envelope Shaping
    • Berger Clémentine
    • Badeau Roland
    • Essid Slim
    , 2025. People often listen to music in noisy environments, seeking to isolate themselves from ambient sounds. Indeed, a music signal can mask some of the noise's frequency components due to the effect of simultaneous masking. In this article, we propose a neural network based on a psychoacoustic masking model, designed to enhance the music's ability to mask ambient noise by reshaping its spectral envelope with predicted filter frequency responses. The model is trained with a perceptual loss function that balances two constraints: effectively masking the noise while preserving the original music mix and the user's chosen listening level. We evaluate our approach on simulated data replicating a user's experience of listening to music with headphones in a noisy environment. The results, based on defined objective metrics, demonstrate that our system improves the state of the art.
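The trade-off described above (mask the noise, but preserve the mix and the chosen listening level) can be caricatured with a rule-based per-band gain. This is a crude stand-in for the learned spectral-envelope filters in the paper; the margin and clipping parameters are illustrative assumptions:

```python
def masking_gains(music_bands_db, noise_bands_db, margin_db=3.0, max_boost_db=12.0):
    """Per-band boost (dB) so the music level exceeds the noise level by a
    safety margin, clipped to a maximum boost so the original mix and the
    user's listening level are not distorted too much. All parameters are
    illustrative, not taken from the paper."""
    gains = []
    for m, n in zip(music_bands_db, noise_bands_db):
        shortfall = (n + margin_db) - m        # dB missing for masking in this band
        gains.append(min(max(shortfall, 0.0), max_boost_db))
    return gains
```

Bands where the music already dominates get no boost; bands drowned by noise are boosted, but only up to the clip.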
  • A Hybrid Model for Weakly-Supervised Speech Dereverberation
    • Bahrman Louis
    • Fontaine Mathieu
    • Richard Gael
    , 2025. This paper introduces a new training strategy to improve speech dereverberation systems using minimal acoustic information and reverberant (wet) speech. Most existing algorithms rely on paired dry/wet data, which is difficult to obtain, or on target metrics that may not adequately capture reverberation characteristics and can lead to poor results on non-target metrics. Our approach uses limited acoustic information, like the reverberation time (RT60), to train a dereverberation system. The system's output is resynthesized using a generated room impulse response and compared with the original reverberant speech, providing a novel reverberation matching loss replacing the standard target metrics. During inference, only the trained dereverberation model is used. Experimental results demonstrate that our method achieves more consistent performance across various objective metrics used in speech dereverberation than the state-of-the-art.
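A key ingredient above is resynthesizing reverberant speech from limited acoustic information such as RT60. A standard way to parameterize a room impulse response from RT60 alone is white noise under an exponential envelope whose energy decays by 60 dB at t = RT60; this classic noise-decay model is shown below purely as an illustration (the paper's RIR generator may differ):

```python
import math
import random

def synthetic_rir(rt60, sr=16000, length_s=0.5):
    """Toy room impulse response from RT60 alone: white noise shaped by an
    exponential envelope whose energy drops by 60 dB at t = RT60.
    decay = 3*ln(10)/RT60 gives exactly that amplitude decay rate."""
    decay = 3.0 * math.log(10) / rt60
    n = int(sr * length_s)
    return [random.gauss(0.0, 1.0) * math.exp(-decay * i / sr) for i in range(n)]
```

Convolving the dereverberation output with such an RIR and comparing against the original wet speech is the kind of reverberation-matching step the abstract describes.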
  • Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
    • Quelennec Aurian
    • Chouteau Pierre
    • Peeters Geoffroy
    • Essid Slim
    , 2025, pp.1-5. Recently, self-supervised learning methods based on masked latent prediction have proven to encode input data into powerful representations. However, during training, the learned latent space can be further transformed to extract higher-level information that could be more suited for downstream classification tasks. Therefore, we propose a new method: MAsked latenT Prediction And Classification (MATPAC), which is trained with two pretext tasks solved jointly. As in previous work, the first pretext task is a masked latent prediction task, ensuring a robust input representation in the latent space. The second one is unsupervised classification, which utilises the latent representations of the first pretext task to match probability distributions between a teacher and a student. We validate the MATPAC method by comparing it to other state-of-the-art proposals and conducting ablation studies. MATPAC reaches state-of-the-art self-supervised learning results on reference audio classification datasets such as OpenMIC, GTZAN, ESC-50 and US8K and outperforms comparable supervised methods' results for musical auto-tagging on Magna-tag-a-tune. (10.1109/ICASSP49660.2025.10887666)
    DOI : 10.1109/ICASSP49660.2025.10887666
  • Multiple Choice Learning for Efficient Speech Separation with Many Speakers
    • Perera David
    • Derrida Francois
    • Mariotte Théo
    • Richard Gael
    • Essid Slim
    , 2025. Training speech separation models in the supervised setting raises a permutation problem: finding the best assignment between the model predictions and the ground truth separated signals. This inherently ambiguous task is customarily solved using Permutation Invariant Training (PIT). In this article, we instead consider using the Multiple Choice Learning (MCL) framework, which was originally introduced to tackle ambiguous tasks. We demonstrate experimentally on the popular WSJ0-mix and LibriMix benchmarks that MCL matches the performances of PIT, while being computationally advantageous. This opens the door to a promising research direction, as MCL can be naturally extended to handle a variable number of speakers, or to tackle speech separation in the unsupervised setting. (10.1109/ICASSP49660.2025.10888528)
    DOI : 10.1109/ICASSP49660.2025.10888528
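The permutation problem that PIT solves can be written in a few lines: the loss is the per-source error under the best assignment of predictions to targets. The exhaustive search below is the standard formulation and makes the factorial cost with many speakers apparent (variable names are illustrative; real systems use spectrogram- or SI-SDR-based losses rather than plain MSE):

```python
from itertools import permutations

def pit_loss(predictions, targets):
    """Permutation Invariant Training loss: minimum total per-source MSE over
    all assignments of predicted sources to ground-truth sources. Exhaustive
    over permutations, hence factorial in the number of speakers -- the cost
    that motivates alternatives such as Multiple Choice Learning."""
    def mse(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return min(
        sum(mse(predictions[i], targets[p[i]]) for i in range(len(predictions)))
        for p in permutations(range(len(targets)))
    )
```

When the predictions are the targets in swapped order, the best permutation recovers a loss of zero, which is exactly the invariance PIT provides.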
  • Twenty-Five Years of MIR Research: Achievements, Practices, Evaluations, and Future Challenges
    • Peeters Geoffroy
    • Rafii Zafar
    • Fuentes Magdalena
    • Duan Zhiyao
    • Benetos Emmanouil
    • Nam Juhan
    • Mitsufuji Yuki
    , 2025, pp.1-5. In this paper, we trace the evolution of Music Information Retrieval (MIR) over the past 25 years. While MIR gathers all kinds of research related to music informatics, a large part of it focuses on signal processing techniques for music data, fostering a close relationship with the IEEE Audio and Acoustic Signal Processing Technical Committee. In this paper, we reflect on the main research achievements of MIR along the three EDICS related to music analysis, processing, and generation. We then review a set of successful practices that fuel the rapid development of MIR research. One practice is the annual research benchmark, the Music Information Retrieval Evaluation eXchange, where participants compete on a set of research tasks. Another practice is the pursuit of reproducible and open research. The active engagement with industry research and products is another key factor for achieving large societal impacts and motivating younger generations of students to join the field. Last but not least, the commitment to diversity, equity, and inclusion ensures that MIR remains a vibrant and open community where various ideas, methodologies, and career pathways converge. We finish by outlining the future challenges MIR will have to face. (10.1109/ICASSP49660.2025.10888947)
    DOI : 10.1109/ICASSP49660.2025.10888947
  • Learning Source Disentanglement in Neural Audio Codec
    • Bie Xiaoyu
    • Liu Xubo
    • Richard Gaël
    , 2024. Neural audio codecs have significantly advanced audio compression by efficiently converting continuous audio signals into discrete tokens. These codecs preserve high-quality sound and enable sophisticated sound generation through generative models trained on these tokens. However, existing neural codec models are typically trained on large, undifferentiated audio datasets, neglecting the essential discrepancies between sound domains like speech, music, and environmental sound effects. This oversight complicates data modeling and poses additional challenges to the controllability of sound generation. To tackle these issues, we introduce the Source-Disentangled Neural Audio Codec (SD-Codec), a novel approach that combines audio coding and source separation. By jointly learning audio resynthesis and separation, SD-Codec explicitly assigns audio signals from different domains to distinct codebooks, i.e., sets of discrete representations. Experimental results indicate that SD-Codec not only maintains competitive resynthesis quality but also, supported by the separation results, demonstrates successful disentanglement of different sources in the latent space, thereby enhancing interpretability in audio codecs and providing potentially finer control over the audio generation process. (10.1109/ICASSP49660.2025.10888065)
    DOI : 10.1109/ICASSP49660.2025.10888065
  • O-EENC-SD: Efficient Online End-to-End Neural Clustering for Speaker Diarization
    • Gruttadauria Elio
    • Fontaine Mathieu
    • Le Roux Jonathan
    • Essid Slim
    , 2025. We introduce O-EENC-SD: an end-to-end online speaker diarization system based on EEND-EDA, featuring a novel RNN-based stitching mechanism for online prediction. In particular, we develop a novel centroid refinement decoder whose usefulness is assessed through a rigorous ablation study. Our system provides key advantages over existing methods: a hyperparameter-free solution compared to unsupervised clustering approaches, and a more efficient alternative to current online end-to-end methods, which are computationally costly. We demonstrate that O-EENC-SD is competitive with the state of the art in the two-speaker conversational telephone speech domain, as tested on the CallHome dataset. Our results show that O-EENC-SD provides a great trade-off between DER and complexity, even when working on independent chunks with no overlap, making the system extremely efficient.
  • Contrastive Knowledge Distillation for Embedding Refinement in Personalized Speech Enhancement
    • Serre Thomas
    • Fontaine Mathieu
    • Benhaim Éric
    • Essid Slim
    , 2025, pp.1-5. Personalized speech enhancement (PSE) has shown convincing results when it comes to extracting a known target voice among interfering ones. The corresponding systems usually incorporate a representation of the target voice within the enhancement system, which is extracted from an enrollment clip of the target voice with upstream models. Those models are generally heavy, as the speaker embedding's quality directly affects PSE performance. Yet, embeddings generated beforehand cannot account for the variations of the target voice during inference time. In this paper, we propose to perform on-the-fly refinement of the speaker embedding using a tiny speaker encoder. We first introduce a novel contrastive knowledge distillation methodology in order to train a 150k-parameter encoder from complex embeddings. We then use this encoder within the enhancement system during inference and show that the proposed method greatly improves PSE performance while maintaining a low computational load. (10.1109/icassp49660.2025.10887609)
    DOI : 10.1109/icassp49660.2025.10887609
  • AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
    • Sadok Samir
    • Leglaive Simon
    • Girin Laurent
    • Richard Gaël
    • Alameda-Pineda Xavier
    , 2025, pp.1-5. This article introduces AnCoGen, a novel method that leverages a masked autoencoder to unify the analysis, control, and generation of speech signals within a single model. AnCoGen can analyze speech by estimating key attributes, such as speaker identity, pitch, content, loudness, signal-to-noise ratio, and clarity index. In addition, it can generate speech from these attributes and allow precise control of the synthesized speech by modifying them. Extensive experiments demonstrated the effectiveness of AnCoGen across speech analysis-resynthesis, pitch estimation, pitch modification, and speech enhancement. Code and audio examples are available online.
  • Investigating the Sensitivity of Pre-trained Audio Embeddings to Common Effects
    • Deng Victor
    • Wang Changhong
    • Richard Gael
    • McFee Brian
    , 2025. In recent years, foundation models have significantly advanced data-driven systems across various domains. Yet, their underlying properties, especially when functioning as feature extractors, remain under-explored. In this paper, we investigate the sensitivity to audio effects of audio embeddings extracted from widely-used foundation models, including OpenL3, PANNs, and CLAP. We focus on audio effects as the source of sensitivity due to their prevalent presence in large audio datasets. By applying parameterized audio effects (gain, low-pass filtering, reverberation, and bitcrushing), we analyze the correlation between the deformation trajectories and the effect strength in the embedding space. We propose to quantify the dimensionality and linearizability of the deformation trajectories induced by audio effects using canonical correlation analysis. We find that there exists a direction along which the embeddings move monotonically as the audio effect strength increases, but that the subspace containing the displacements is generally high-dimensional. This shows that pre-trained audio embeddings do not globally linearize the effects. Our empirical results on instrument classification downstream tasks confirm that projecting out the estimated deformation directions cannot generally improve the robustness of pre-trained embeddings to audio effects.
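The paper's observation that embeddings move monotonically along some direction as effect strength grows can be checked with a simple projection test. The trajectory below is synthetic (real embeddings would come from models such as OpenL3, PANNs, or CLAP), and the endpoint-to-endpoint direction is a simplification of the CCA-based analysis in the paper:

```python
def monotone_along_endpoint_direction(embeddings):
    """Project a trajectory of embeddings (one per increasing effect strength)
    onto its endpoint-to-endpoint direction and test whether the projections
    increase monotonically."""
    d = [b - a for a, b in zip(embeddings[0], embeddings[-1])]
    proj = [sum(x * y for x, y in zip(e, d)) for e in embeddings]
    return all(q > p for p, q in zip(proj, proj[1:]))

# A curved (non-linear) path whose drift is still monotone along one
# direction, mimicking high-dimensional but directed displacement.
traj = [[s, s ** 2, (1 - s) ** 2] for s in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

A curved path can thus be monotone along one direction while still spanning a high-dimensional subspace, which is exactly the distinction the abstract draws.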
  • Decoding the Hierarchy: A Hybrid Approach to Hierarchical Multi-label Text Classification
    • Torba Fatos
    • Gravier Christophe
    • Laclau Charlotte
    • Kammoun Abderrhammen
    • Subercaze Julien
    , 2025, 15572, pp.405-420. Hierarchical multi-label text classification (HMTC) aims to predict multiple labels from a tree-like hierarchy for a given input text. Recent approaches frame HMTC as a seq2seq problem, where the objective is to predict the sequence of associated labels, regardless of their order or position in the hierarchy. Despite promising results, these approaches rely solely on attention mechanisms from previously generated tokens. This limitation prevents them from acquiring information about the global hierarchy and may lead to the accumulation of errors as the model learns hierarchical cues among labels. We propose a novel HMTC model based on a hybrid version of the encoder-decoder architecture where the decoder is pre-populated with the entire label embeddings. By leveraging the decoder’s Cross-Attention and Hierarchical Self-Attention mechanisms, we achieve a label representation that benefits from instance and global label-wise information. Empirical experiments on four HMTC benchmark datasets demonstrated the effectiveness of our approach by setting new state-of-the-art results. Code (https://github.com/FatosTorba/HLPD) and datasets are made available to facilitate the reproducibility and future work. (10.1007/978-3-031-88708-6_26)
    DOI : 10.1007/978-3-031-88708-6_26
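The decoder design described above can be pictured with plain scaled dot-product attention: label embeddings first attend to each other (global hierarchy information), then attend to the encoded document (instance information). The NumPy sketch below is an illustrative simplification with made-up dimensions, not the HLPD implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16                       # embedding size (illustrative)
n_tokens, n_labels = 10, 6   # encoder tokens, hierarchy labels

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention."""
    w = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))
    return w @ V

# Hypothetical encoder output for one document, and one embedding per
# label of the hierarchy (the decoder is pre-populated with all of them).
enc = rng.normal(size=(n_tokens, d))
labels = rng.normal(size=(n_labels, d))

# Hierarchical self-attention: every label attends to every other label,
# injecting global hierarchy information.
labels_sa = attention(labels, labels, labels)
# Cross-attention: labels attend to the encoded document tokens,
# injecting instance-specific information.
labels_ca = attention(labels_sa, enc, enc)

print(labels_ca.shape)  # (6, 16): one instance-aware vector per label
```

Because all label embeddings are present from the start, no label's representation depends on an autoregressive generation order, which is the error-accumulation problem the abstract points to.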
  • Convex Quartic Problems: Homogenized Gradient Method and Preconditioning
    • Dragomir Radu-Alexandru
    • Nesterov Yurii
    SIAM Journal on Optimization, Society for Industrial and Applied Mathematics, 2025, 35 (2), pp.651-677. We consider a convex minimization problem for which the objective is the sum of a homogeneous polynomial of degree four and a linear term. Such a task arises as a subproblem in algorithms for quadratic inverse problems with a difference-of-convex structure. We design a first-order method called Homogenized Gradient, along with an accelerated version, which enjoy fast convergence rates of respectively O(κ²/K²) and O(κ²/K⁴) in relative accuracy, where K is the iteration counter. The constant κ is the quartic condition number of the problem. Then, we show that for a certain class of problems, it is possible to compute a preconditioner for which this condition number is √n, where n is the problem dimension. To establish this, we study the more general problem of finding the best quadratic approximation of an ℓp norm composed with a quadratic map. Our construction involves a generalization of the so-called Lewis weights. (10.1137/23M1583363)
    DOI : 10.1137/23M1583363
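To make the problem class concrete: an instance is f(x) = ρ(x) + ⟨b, x⟩ where ρ is a convex homogeneous quartic, e.g. ρ(x) = ¼(xᵀAx)² with A positive definite. The sketch below minimizes such an instance with plain gradient descent plus backtracking line search; it is a baseline on the problem class only, not the paper's Homogenized Gradient method, and A, b, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5
M = rng.normal(size=(n, n))
A = M @ M.T + np.eye(n)          # positive definite, so rho is convex
b = rng.normal(size=n)

def f(x):
    q = x @ A @ x
    return 0.25 * q * q + b @ x  # homogeneous quartic + linear term

def grad(x):
    q = x @ A @ x
    return q * (A @ x) + b       # chain rule on (x^T A x)^2 / 4

# Gradient descent with Armijo backtracking (a generic baseline; the
# paper's method instead exploits the quartic homogeneity of rho).
x = np.zeros(n)
for _ in range(2000):
    g = grad(x)
    t = 1.0
    while f(x - t * g) > f(x) - 0.5 * t * (g @ g):
        t *= 0.5
    x = x - t * g

print(np.linalg.norm(grad(x)) < 1e-2)  # near-stationary at the end
```

The quartic growth of ρ is exactly why a fixed step size is awkward here, and why the paper designs methods whose rates depend on a quartic condition number κ rather than a global Lipschitz constant.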
  • On the compressibility of large-scale source code datasets
    • Boffa Antonio
    • Di Cosmo Roberto
    • Ferragina Paolo
    • Guerra Andrea
    • Manzini Giovanni
    • Vinciguerra Giorgio
    • Zacchiroli Stefano
    Journal of Systems and Software, Elsevier, 2025, 227, pp.112429. Storing ultra-large amounts of unstructured data (often called objects or blobs) is a fundamental task for several object-based storage engines, data warehouses, data-lake systems, and key-value stores. These systems cannot currently leverage similarities between objects, which could be vital in improving their space and time performance. An important use case in which we can expect the objects to be highly similar is the storage of large-scale versioned source code datasets, such as the Software Heritage Archive (Di Cosmo and Zacchiroli, 2017). This use case is particularly interesting given the extraordinary size (1.5 PiB), the variegated nature, and the high repetitiveness of the corpus in question. In this paper we discuss and experiment with content- and context-based compression techniques for source-code collections that tailor known and novel tools to this setting, in combination with state-of-the-art general-purpose compressors and the information coming from the Software Heritage Graph. We experiment with our compressors over a random sample of the entire corpus, and four large samples of source code files written in different popular languages: C/C++, Java, JavaScript, and Python. We also consider two scenarios of usage for our compressors, called the Backup and File-Access scenarios, where the latter adds to the former the support for single-file retrieval. As a net result, our experiments show (i) how much "compressible" each language is, (ii) which content- or context-based techniques compress better and are faster to (de)compress by possibly supporting individual file access, and (iii) the ultimate compressed size that, according to our estimate, our best solution could achieve in storing all the source code written in these languages and available in the Software Heritage Archive: namely, 3 TiB (down from their original 78 TiB total size, with an average compression ratio of 4%). (10.1016/j.jss.2025.112429)
    DOI : 10.1016/j.jss.2025.112429
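The core intuition of context-based compression can be shown in a few lines with the standard `zlib` module: two revisions of the same file compress far better together, where the compressor's window can reuse one as context for the other, than separately. The two "revisions" below are synthetic stand-ins, not Software Heritage data.

```python
import zlib

# Hypothetical stand-in for two revisions of one source file: v2 is v1
# with a single identifier renamed, mimicking the high cross-version
# similarity of versioned source code corpora.
v1 = "\n".join(f"int field_{i} = {i * i};" for i in range(200)).encode()
v2 = v1.replace(b"field_7 ", b"field_x ")

# Content-oblivious baseline: compress each blob independently.
separate = len(zlib.compress(v1)) + len(zlib.compress(v2))
# Context-aware variant: compress near-duplicates together, letting the
# compressor's window exploit cross-object redundancy.
together = len(zlib.compress(v1 + v2))

print(together < separate)  # True: grouping similar blobs saves space
```

At archive scale, the hard part the paper addresses is choosing which of billions of blobs to group (e.g. using the Software Heritage Graph) while still supporting single-file retrieval in the File-Access scenario.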