Publications

Publications by our faculty members are available on the HAL platform:

HAL publications

Theses by LTCI doctoral graduates are available on the HAL platform:

HAL theses

Browse the publications listed in the HAL open archive by year:

2024

Distributed Decoding Scheme for Uplink C-RAN System with Limited Backhaul Capacity
- Chêne Thomas
- Rekaya Ghaya
- Damen Mohamed Oussama
, 2024.
Summary of the Workshop on Visual Methods and Analyzing Visual Data in Human Computer Interaction
- Wang Zezhong
- Huron Samuel
- Sturdee Miriam
- Carpendale Sheelagh
, 2024, pp.29-32. As visualization (VIS) and human-computer interaction (HCI) scientists, researchers, and practitioners, we are deeply involved in analyzing visual data. We use the term visual data to mean artifacts that have been created to be seen, for instance hand sketches [25], photographs [5], physical artifacts [2], screenshots of graphical user interfaces [10], videos [12, 21], information visualizations [18], and others. However, while we are rapidly moving towards more active use of qualitative methods in empirical research, the emphasis on qualitative approaches has favored verbal and textual analysis (verbatim, transcript, etc.). In contrast, more and more researchers in psychology [19], ethnography [17], and other domains are describing new methods for analyzing visual data. However, the recognition and formulation of these methods in HCI and VIS remain poorly defined. For instance, we do not have a common methodological language and vocabulary to speak about the variety of visual methods that are already in use. This workshop is especially relevant for the ACM Interactive Surfaces and Spaces community because 1) visual data is an emerging trend, 2) visual data is particularly relevant for studying diverse interaction modalities, and 3) visual methods have proven their usefulness in studying surface applications and human interactions [11, 21, 23].
In this workshop, we aim to gather the community of HCI researchers who use qualitative methods to analyze visual data, in order to identify and clarify their specific findings. We aim to collect and reflect on the strategies, processes, and challenges of using visual data as material for qualitative analysis. We will explore the methodologies and workflows involved in analyzing visual materials within qualitative human-centered research. In particular, we would like to address the challenges of the visual data analysis pipeline, from raw data to the final analysis results.
Both static and moving visuals play a pivotal role in qualitative inquiry, notably within the visualization community, which engages in the collection, analysis, and presentation of, and interaction with, visual data. Different in form from text but in some ways parallel to it, visual data presents distinct challenges that require specialized analysis techniques, processes, and expertise for thorough analysis and theory development. Researchers often find themselves modifying existing qualitative methods or devising new approaches to meet the unique demands of visual data. In this workshop, we want to gather people in the HCI community who are interested in using, and discussing the use of, visual methods in their research, to build a common language and outline the challenges and opportunities of these approaches. (10.1145/3696762.3698047)
DOI : 10.1145/3696762.3698047
Gabic: Graph-Based Attention Block for Image Compression
- Spadaro Gabriele
- Presta Alberto
- Tartaglione Enzo
- Giraldo Jhony
- Grangetto Marco
- Fiandrotti Attilio
, 2024, pp.1802-1808. While standardized codecs like JPEG and HEVC-intra represent the industry standard in image compression, neural Learned Image Compression (LIC) codecs represent a promising alternative. In particular, integrating attention mechanisms from Vision Transformers into LIC models has shown improved compression efficiency. However, extra efficiency often comes at the cost of aggregating redundant features. This work proposes a Graph-based Attention Block for Image Compression (GABIC), a method to reduce feature redundancy based on a k-Nearest Neighbors enhanced attention mechanism. Our experiments show that GABIC outperforms comparable methods, particularly at high bit rates, enhancing compression performance. (10.1109/ICIP51287.2024.10647413)
DOI : 10.1109/ICIP51287.2024.10647413
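The abstract describes using a k-nearest-neighbours step to limit which features an attention block aggregates. As a rough illustration only, the sketch below implements generic kNN-sparsified scaled dot-product attention on plain Python lists: each query keeps its k most similar keys and applies a softmax over those alone. The function names and the dot-product similarity are assumptions; this is not GABIC's graph-based block, whose details the abstract does not give.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def knn_attention(queries, keys, values, k):
    """Generic kNN-sparsified attention (illustrative, not GABIC itself).

    Each query attends only to its k keys with the highest scaled
    dot-product similarity; all other keys receive zero weight.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    dim = len(values[0])
    outputs = []
    for q in queries:
        sims = [scale * dot(q, key) for key in keys]
        # Indices of the k most similar keys (the query's neighbourhood).
        top = sorted(range(len(keys)), key=lambda j: sims[j], reverse=True)[:k]
        # Softmax restricted to the neighbourhood; subtract the max for stability.
        m = max(sims[j] for j in top)
        weights = [math.exp(sims[j] - m) for j in top]
        total = sum(weights)
        outputs.append(
            [sum(w * values[j][d] for w, j in zip(weights, top)) / total for d in range(dim)]
        )
    return outputs


# Toy example: with k=1 the query copies the value of its single nearest key.
print(knn_attention([[1.0, 0.0]], [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]], [[10.0], [20.0], [30.0]], k=1))
# [[10.0]]
```

Setting k to the number of keys recovers ordinary dense attention, which is one way to see what the sparsification removes.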
SALT: Standardized Audio Event Label Taxonomy
- Stamatiadis Paraskevas
- Olvera Michel
- Essid Slim
, 2024. Machine listening systems often rely on fixed taxonomies to organize and label audio data, which is key for training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints: they are composed of application-dependent predefined categories, which hinders the integration of new or varied sounds, and they exhibit limited cross-dataset compatibility due to inconsistent labeling standards. To overcome these limitations, we introduce SALT: Standardized Audio event Label Taxonomy. Building upon the hierarchical structure of AudioSet's ontology, our taxonomy extends and standardizes labels across 24 publicly available environmental sound datasets, allowing class labels from diverse datasets to be mapped to a unified system. Our proposal comes with a new Python package designed for navigating and using this taxonomy, easing cross-dataset label searching and hierarchical exploration. Notably, our package allows effortless data aggregation from diverse sources, and hence easy experimentation with combined datasets.
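To make the idea of mapping dataset-specific labels onto one shared hierarchy concrete, here is a minimal sketch built on a hand-written excerpt of an AudioSet-style hierarchy and two dataset label mappings. The dictionaries, function names, mappings, and clip file names are all illustrative assumptions; this is not the API of the SALT Python package, which the abstract does not describe.

```python
# Hypothetical excerpt of an AudioSet-style hierarchy: child -> parent (None = root).
PARENT = {
    "Bark": "Dog",
    "Dog": "Domestic animals, pets",
    "Domestic animals, pets": "Animal",
    "Animal": None,
}

# Hypothetical mapping from (dataset, native label) to a standardized label.
DATASET_TO_STANDARD = {
    ("ESC-50", "dog"): "Dog",
    ("UrbanSound8K", "dog_bark"): "Bark",
}


def standardize(dataset, label):
    """Translate a dataset-specific label into the shared vocabulary."""
    return DATASET_TO_STANDARD[(dataset, label)]


def ancestors(node):
    """Walk up the hierarchy from a node to the root."""
    chain = []
    parent = PARENT[node]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def matches(dataset, label, query):
    """True if the label maps to `query` itself or to one of its descendants."""
    node = standardize(dataset, label)
    return node == query or query in ancestors(node)


# Aggregating clips from two datasets under one standardized class.
clips = [("ESC-50", "dog", "clip_001.wav"), ("UrbanSound8K", "dog_bark", "clip_002.wav")]
print([f for d, l, f in clips if matches(d, l, "Dog")])
# ['clip_001.wav', 'clip_002.wav']
```

Searching for a parent class collects everything below it, which is what makes combining differently labeled datasets straightforward once their labels share a hierarchy.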
A Sound Description: Exploring Prompt Templates and Class Descriptions to Enhance Zero-Shot Audio Classification
- Olvera Michel
- Stamatiadis Paraskevas
- Essid Slim
, 2024. Audio-text models trained via contrastive learning offer a practical approach to performing audio classification through natural language prompts, such as "this is a sound of" followed by category names. In this work, we explore alternative prompt templates for zero-shot audio classification, demonstrating the existence of higher-performing options. First, we find that the formatting of the prompts significantly affects performance, to the point that simply prompting the models with properly formatted class labels performs competitively with optimized prompt templates and even prompt ensembling. Moreover, we look into complementing class labels with audio-centric descriptions. By leveraging large language models, we generate textual descriptions that prioritize acoustic features of sound events to disambiguate between classes, without extensive prompt engineering. We show that prompting with class descriptions leads to state-of-the-art results in zero-shot audio classification across major ambient sound datasets. Remarkably, this method requires no additional training and remains fully zero-shot.
Shackling Uncertainty using Mixed Criticality in Monte-Carlo Tree Search
- Cordeiro Franco
- Tardieu Samuel
- Pautet Laurent
, 2024, pp.34-41. In the world of embedded systems, optimizing actions with uncertain costs in multiple resources in order to achieve an objective is a complex challenge. Existing methods include plan building based on Monte Carlo Tree Search (MCTS), an approach that thrives in multiple online planning scenarios. However, these methods often overlook uncertainty in worst-case cost estimations. A system can fail to operate before achieving a critical objective when actual costs exceed optimistic worst-case estimates, even if replanning is considered. Conversely, a system based on pessimistic worst-case estimates would lead to resource over-provisioning, even for less critical objectives. To solve similar issues, the Mixed Criticality (MC) approach has been developed in the real-time systems community. We therefore propose to extend the MCTS-based heuristic in three directions.
Firstly, we reformulate the concept of MC to account for uncertain worst-case costs, including optimistic and pessimistic worst-case estimates. High-criticality tasks must be executed regardless of their uncertain costs. Low-criticality tasks are either executed in low-criticality mode, using resources up to their optimistic worst-case estimates, executed in high-criticality mode in a degraded form, or discarded when resources are scarce. In such cases, resources previously devoted to low-criticality tasks are reallocated to high-criticality tasks.
Secondly, although the MC approach was originally developed for real-time systems, focusing primarily on worst-case execution time as the only uncertain resource, our approach extends the concept of resources to deal with several resources at once, such as the time and energy required to perform an action.
Finally, we propose (MC)²TS, an extension of MCTS with MC concepts that efficiently adjusts resource allocation to uncertain costs according to the criticality of actions. We demonstrate our approach in an active perception scenario. Our evaluation shows that (MC)²TS outperforms traditional MCTS regardless of whether the worst-case estimates are optimistic or pessimistic. (10.1109/SIES62473.2024.10767910)
DOI : 10.1109/SIES62473.2024.10767910
Robust Multiparty Computation from Threshold Encryption Based on RLWE
- Urban Antoine
- Rambaud Matthieu
, 2025, 15257, pp.294-314. We consider robust protocols for secure multi-party computation (MPC) built from FHE under honest majority, i.e., for n = 2t+1 players of which t are corrupt. Surprisingly, there exists no robust threshold FHE scheme based on BFV to design such MPC protocols. Precisely, all existing methods for generating a common relinearization key can abort as soon as one player deviates. We address this issue with a new relinearization key (adapted from [CDKS19, CCS’19]), which we show how to securely generate in parallel with the threshold encryption key, in the same broadcast. We thus obtain the first robust threshold BFV scheme, moreover using only one broadcast for key generation instead of the two previously needed. Of independent interest, as an optional alternative, we propose the first threshold FHE decryption simultaneously enabling: (i) robustness over asynchronous channels with honest majority; (ii) tolerance of a power-of-small-prime ciphertext modulus, e.g., ; and (iii) secret shares of sizes quasi-independent of n. (10.1007/978-3-031-75757-0_15)
DOI : 10.1007/978-3-031-75757-0_15
How much secure randomness is in a quantum state?
- Anco Kriss Gutierrez
- Nemoz Tristan
- Brown Peter
, 2024. How much cryptographically secure randomness can be extracted from a quantum state? This fundamental question probes the absolute limits of quantum random number generation (QRNG), and yet, despite the technological maturity of QRNGs, it remains unsolved. In this work, we consider a general adversarial model that allows for an adversary who has quantum side information about both the source and the measurement device. Using links between randomness extraction rates and sandwiched Rényi entropies, we provide compact, easy-to-compute, achievable rates of secure randomness extraction from quantum states. In turn, this provides a simple-to-evaluate benchmarking tool for the randomness generation rates of QRNG protocols. (10.48550/arXiv.2410.16447)
DOI : 10.48550/arXiv.2410.16447
A Scalable Algorithm for the Optimal Trajectory of a Massive Swarm of UAV Base Stations Using Lagrangian Mechanics
- Coupechoux Marceau
- Darbon Jérôme
- Kélif Jean-Marc
- Sigelle Marc
, 2024, pp.683-688. In this paper, we consider multiple Unmanned Aerial Vehicles (UAVs) serving as flying Base Stations (BSs) of a wireless network, and the problem of jointly optimizing their trajectories with respect to a running cost. This cost accounts for the consumed energy, related to the vehicle velocity, and for the amount of data traffic collected or served by the UAVs. The data traffic is assumed to be spatially distributed around a hotspot and is equivalent to a potential in physics. Using the principles of Lagrangian mechanics, we derive a scalable algorithm able to optimize the trajectories of thousands of drones in milliseconds on an off-the-shelf laptop. Our model makes it possible to control the distance between UAVs, to avoid collisions, by using a coupling between the drone trajectories. (10.1109/WiMob61911.2024.10770532)
DOI : 10.1109/WiMob61911.2024.10770532
Adversarial Attacks on Autonomous Driving Systems in the Physical World: a Survey
- Chi Lijun
- Msahli Mounira
- Zhang Qingjie
- Qiu Han
- Zhang Tianwei
- Memmi Gerard
- Qiu Meikang
IEEE Transactions on Intelligent Vehicles, Institute of Electrical and Electronics Engineers, 2024, pp.1-22. Autonomous Driving Systems (ADS) represent a revolutionary advancement in transportation and offer unprecedented safety and convenience. Because ADS depend heavily on sensors and perception modules to detect and interpret their surroundings, security is a critical concern, and real-world physical attacks warrant particular emphasis. Defenders usually have the upper hand in the digital sphere, but they are challenged in the physical world, where attackers have greater flexibility for covert operations. A comprehensive analysis is essential for understanding attack trends, their evolution, and defense directions. This paper provides a survey of state-of-the-art physical attacks that threaten ADS perception. A novel multi-label classification method is introduced to categorize these attacks along four main dimensions. Visualization and analysis of the classification enhance the understanding of these multidimensional threats. Five research directions for future exploration are also proposed. (10.1109/TIV.2024.3484152)
DOI : 10.1109/TIV.2024.3484152
Unlocking Ground-to-Space Optical Links Capacity with Optimized AO Pre-Compensation Using Spatio-Temporal Measurements and Priors
- Lognoné Perrine
- Rekaya Ghaya
- Osborn James
- Conan Jean Marc
, 2025, 13699, pp.1369923. To enable future space internet networks, very high data rate communication between the Earth and satellites (GEO or LEO) is needed. Currently, the capacity of these links is drastically reduced by optical signal losses induced by atmospheric turbulence. Adaptive optics (AO) pre-compensation of the uplink beam has the potential to mitigate turbulence-induced signal losses. However, because of the point-ahead angle (PAA) separating the downlink and uplink optical paths, the current AO pre-compensation technique, based on AO correction of the downlink beam, is suboptimal, and long and deep signal fades can still be observed, degrading the link capacity. In earlier work, we optimised ground-to-space AO pre-compensation using an MMSE phase estimation at the PAA, based on measurements and statistical priors of: (i) the downlink beam phase and log-amplitude; (ii) the downlink beam phase and log-amplitude collected from several ground apertures; (iii) the downlink beam phase and log-amplitude together with the phase sensed from a laser guide star at the PAA. All of these methods were shown to improve the link capacity under various atmospheric conditions. We present a fourth MMSE estimator based on past downlink measurements and priors. We compare the performance improvement brought by the different methods and discuss the corresponding system complexity, in order to identify the best performance/complexity trade-off and prepare future experimental demonstrations. (10.1117/12.3075399)
DOI : 10.1117/12.3075399
The Factuality of Large Language Models in the Legal Domain
- El Hamdani Rajaa
- Bonald Thomas
- Malliaros Fragkiskos D.
- Holzenberger Nils
- Suchanek Fabian M.
, 2024, pp.3741-3746. This paper investigates the factuality of large language models (LLMs) as knowledge bases in the legal domain, in a realistic usage scenario: we allow for acceptable variations in the answer, and let the model abstain from answering when uncertain. First, we design a dataset of diverse factual questions about case law and legislation. We then use the dataset to evaluate several LLMs under different evaluation methods, including exact, alias, and fuzzy matching. Our results show that the performance improves significantly under the alias and fuzzy matching methods. Further, we explore the impact of abstaining and in-context examples, finding that both strategies enhance precision. Finally, we demonstrate that additional pre-training on legal documents, as seen with SaulLM, further improves factual precision from 63% to 81%. (10.1145/3627673.3679961)
DOI : 10.1145/3627673.3679961
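The abstract contrasts exact, alias, and fuzzy matching and lets the model abstain. A minimal sketch of how such scoring can work, assuming string-valued answers, a hypothetical alias table, and difflib's similarity ratio with an arbitrary 0.75 threshold; precision is computed over answered questions only, which is why abstaining on uncertain questions can raise it. This is not the paper's evaluation code, and the example questions are invented.

```python
from difflib import SequenceMatcher

# Hypothetical alias table: accepted alternative answers per gold answer.
ALIASES = {
    "Court of Justice of the European Union": {"CJEU", "ECJ"},
}


def normalize(text):
    """Lowercase and collapse whitespace before comparing answers."""
    return " ".join(text.lower().split())


def exact_match(pred, gold):
    return normalize(pred) == normalize(gold)


def alias_match(pred, gold):
    # Accept the gold answer itself or any of its listed aliases.
    accepted = {normalize(gold)} | {normalize(a) for a in ALIASES.get(gold, set())}
    return normalize(pred) in accepted


def fuzzy_match(pred, gold, threshold=0.75):
    # SequenceMatcher.ratio() is a similarity in [0, 1]; 0.75 is an arbitrary cut-off.
    return SequenceMatcher(None, normalize(pred), normalize(gold)).ratio() >= threshold


def precision(preds, golds, matcher):
    """Precision over answered questions only; None marks an abstention."""
    answered = [(p, g) for p, g in zip(preds, golds) if p is not None]
    if not answered:
        return 0.0
    return sum(matcher(p, g) for p, g in answered) / len(answered)


preds = ["CJEU", None, "Article 6 ECHR"]
golds = ["Court of Justice of the European Union", "Marbury v. Madison", "Article 6 of the ECHR"]
for name, matcher in [("exact", exact_match), ("alias", alias_match), ("fuzzy", fuzzy_match)]:
    print(name, precision(preds, golds, matcher))
# exact 0.0
# alias 0.5
# fuzzy 0.5
```

Each relaxation credits a different kind of acceptable variation: aliases catch known alternative names, while the fuzzy ratio catches near-identical wording.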
Learning to rank anomalies: scalar performance criteria and maximization of rank statistics
- Limnios Myrto
- Noiry Nathan
- Clémençon Stéphan
Machine Learning, Springer Verlag, 2024, 113 (11-12), pp.8623-8653. The ability to collect and store ever more massive data, unlabeled in many cases, has been accompanied by the need to process them efficiently in order to extract relevant information and possibly design solutions based on it. In various situations, the vast majority of the observations exhibit the same behavior, while a small proportion deviates from it. Detecting these outlying observations (equivalently, anomalies) is now one of the major challenges for machine learning applications (e.g., fraud detection or predictive maintenance). We propose here a novel methodology for outlier/anomaly detection, by learning a scoring function defined on the feature space that ranks the observations by degree of abnormality. The scoring function is built through maximization of an empirical performance criterion taking the form of a (two-sample) linear rank statistic. We show that bipartite ranking algorithms can thus be used to learn nearly optimal scoring functions with provable theoretical guarantees. We illustrate our methodology with numerical experiments based on open-access online code. (10.1007/s10994-024-06609-9)
DOI : 10.1007/s10994-024-06609-9
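The criterion mentioned in the abstract is a two-sample linear rank statistic. For a fixed scoring function, one common form pools the scores of both samples, ranks them, and sums a score-generating function phi of the normalized ranks of the first sample. The sketch below evaluates that statistic and the related Mann-Whitney AUC; the choice phi(u) = u (a rescaled Wilcoxon rank-sum) and the toy scores are assumptions, and the learning step that maximizes the criterion over scoring functions is not shown.

```python
def pooled_ranks(values):
    """1-based ranks in the pooled sample; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def linear_rank_statistic(scores_first, scores_second, phi=lambda u: u):
    """W_phi = sum over the first sample of phi(R_i / (N + 1)), R_i = pooled rank."""
    pooled = list(scores_first) + list(scores_second)
    ranks = pooled_ranks(pooled)
    n_total = len(pooled)
    return sum(phi(r / (n_total + 1)) for r in ranks[: len(scores_first)])


def empirical_auc(scores_first, scores_second):
    """Mann-Whitney form of the AUC, derived from the rank sum of the first sample."""
    n, m = len(scores_first), len(scores_second)
    rank_sum = sum(pooled_ranks(list(scores_first) + list(scores_second))[:n])
    return (rank_sum - n * (n + 1) / 2) / (n * m)


# Toy scores: the first sample plays the role of the observations to rank highest.
high = [0.9, 0.8]
low = [0.1, 0.2, 0.3]
print(linear_rank_statistic(high, low))  # (5 + 4) / 6, up to floating-point rounding
print(empirical_auc(high, low))          # 1.0: every first-sample score is higher
```

Other choices of phi weight the top of the ranking more heavily, which is the usual motivation for considering a whole family of linear rank statistics rather than the AUC alone.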
Impact of scaling up the sensor sampling frequency on the reliability of edge processing systems in tolerating soft errors caused by neutrons
- Minelli de Carvalho Matheus
- Laurini Luiz Henrique
- Atukpor Emmanuel
- Naviner Lirida
- Possamai Bastos Rodrigo
, 2024.
Link Prediction Without Learning
- Delarue Simon
- Bonald Thomas
- Viard Tiphaine
, 2024, 392, pp.2274-2281. Link prediction is a fundamental task in machine learning for graphs. Recently, Graph Neural Networks (GNNs) have gained in popularity and have become the default approach for solving this type of task. Despite the considerable interest in these methods, simple topological heuristics persistently emerge as competitive alternatives to GNNs. In this study, we show that this phenomenon is not an exception and that GNNs do not consistently set the performance standard for link prediction on graphs. To this end, we identify several limitations in the current GNN evaluation methodology, such as the lack of variety in benchmark dataset characteristics and the limited use of diverse baselines outside of neural methods. In particular, we highlight that integrating feature information into topological heuristics remains a little-explored path. In line with this observation, we propose a simple non-neural model that leverages local structure, node feature, and graph feature information within a weighted combination. Experiments conducted on a large variety of networks indicate that the proposed approach outperforms existing state-of-the-art GNNs and improves generalization ability. In contrast to GNNs, our approach does not rely on any learning process and therefore achieves superior results without sacrificing efficiency, reducing computation time by one to three orders of magnitude. (10.3233/FAIA240750)
DOI : 10.3233/FAIA240750
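The abstract refers to topological heuristics and a weighted combination of structural and feature information. A minimal sketch of two classic heuristics (common neighbours and Adamic-Adar) plus a feature-similarity term, on a simple undirected graph stored as a dict of neighbour sets. The combination and its weights alpha and beta are hypothetical placeholders, not the model proposed in the paper, and no learning is involved.

```python
import math


def common_neighbours(adj, u, v):
    """Number of neighbours shared by u and v."""
    return len(adj[u] & adj[v])


def adamic_adar(adj, u, v):
    """Sum of 1/log(degree) over shared neighbours; low-degree ones count more.

    Assumes u != v in a simple undirected graph: every shared neighbour is
    adjacent to both u and v, so its degree is >= 2 and log(degree) > 0.
    """
    return sum(1.0 / math.log(len(adj[w])) for w in adj[u] & adj[v])


def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def combined_score(adj, feats, u, v, alpha=1.0, beta=1.0):
    """Hypothetical weighted combination of a structural and a feature term."""
    return alpha * adamic_adar(adj, u, v) + beta * cosine(feats[u], feats[v])


adj = {"a": {"b", "c"}, "b": {"a", "c", "d"}, "c": {"a", "b", "d"}, "d": {"b", "c"}}
feats = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [1.0, 0.0]}

# Score the only missing edge, (a, d).
print(common_neighbours(adj, "a", "d"))  # 2
print(combined_score(adj, feats, "a", "d"))
```

Each score needs only set intersections and a dot product per candidate pair, which illustrates why such heuristics can be orders of magnitude cheaper than training a GNN.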
Temporal extrapolation for Zero Latency video transmission
- Vijayaratnam Melan
, 2024. In the past few years, several efforts have been devoted to reducing individual sources of latency in video delivery, including acquisition, coding, and network transmission. The goal is to improve the quality of experience in applications requiring real-time interaction. Nevertheless, these efforts are fundamentally constrained by technological and physical limits. In this thesis, we investigate a radically different approach that can arbitrarily reduce the overall latency by means of video extrapolation. We propose two latency compensation schemes in which video extrapolation is performed either at the encoder or at the decoder side. Since a loss of fidelity is the price to pay for compensating latency arbitrarily, we evaluate the three-way trade-off between latency, distortion, and rate, showing the potential of this approach using three recent video prediction schemes. We go beyond the Bjøntegaard metrics to propose a novel three-way metric for the rate-distortion-latency trade-off. Furthermore, the bottleneck of the latency compensation scheme lies in the quality of the extrapolation. We therefore propose solutions to improve it. First, we introduce an online learning algorithm for video prediction designed to leverage the redundancy of computations when predicting at long horizons into the future. Then, we propose an approach for video prediction based on neural radiance fields, which learns 3D representations of dynamic objects. (10.70675/2795e597z4ab8z4d4az91bezfd33eb13df89)
DOI : 10.70675/2795e597z4ab8z4d4az91bezfd33eb13df89
Do Recommender Systems Promote Local Music? A Reproducibility Study Using Music Streaming Data
- Matrosova Kristina
- Marey Lilian
- Salha-Galvan Guillaume
- Louail Thomas
- Bodini Olivier
- Moussallam Manuel
, 2024, pp.148-157. This paper examines the influence of recommender systems on local music representation, discussing prior findings from an empirical study on the LFM-2b public dataset. This prior study argued that different recommender systems exhibit algorithmic biases shifting music consumption either towards or against local content. However, LFM-2b users do not reflect the diverse audience of music streaming services. To assess the robustness of this study’s conclusions, we conduct a comparative analysis using proprietary listening data from a global music streaming service, which we publicly release alongside this paper. We observe significant differences in local music consumption patterns between our dataset and LFM-2b, suggesting that caution should be exercised when drawing conclusions on local music based solely on LFM-2b. Moreover, we show that the algorithmic biases exhibited in the original work vary in our dataset, and that several unexplored model parameters can significantly influence these biases and affect the study’s conclusion on both datasets. Finally, we discuss the complexity of accurately labeling local music, emphasizing the risk of misleading conclusions due to unreliable, biased, or incomplete labels. To encourage further research and ensure reproducibility, we have publicly shared our dataset and code. (10.1145/3640457.3688065)
DOI : 10.1145/3640457.3688065
AI-Driven Intrusion Detection Systems (IDS) on the ROAD Dataset: A Comparative Analysis for Automotive Controller Area Network (CAN)
- Guerra Lorenzo
- Xu Linhan
- Bellavista Paolo
- Chapuis Thomas
- Duc Guillaume
- Mozharovskyi Pavlo
- Nguyen Van-Tam
, 2024, pp.39-49. The integration of digital devices in modern vehicles has revolutionized automotive technology, enhancing safety and the overall driving experience. The Controller Area Network (CAN) bus is a central system for managing in-vehicle communication between the electronic control units (ECUs). However, the CAN protocol poses security challenges due to inherent vulnerabilities, lacking encryption and authentication, which, combined with an expanding attack surface, necessitates robust security measures. In response to this challenge, numerous Intrusion Detection Systems (IDS) have been developed and deployed. Nonetheless, an open, comprehensive, and realistic dataset to test the effectiveness of such IDSs remains absent in the existing literature. This paper addresses this gap by considering the latest ROAD dataset, containing stealthy and sophisticated injections. The methodology involves dataset labeling and the implementation of both state-of-the-art deep learning models and traditional machine learning models to show the discrepancy in performance between the datasets most commonly used in the literature and the ROAD dataset, a more realistic alternative. (10.1145/3689936.3694696)
DOI : 10.1145/3689936.3694696
MyWebstrates: Webstrates as Local-first Software
- Klokmose Clemens Nylandsted
- Eagan James R
- van Hardenberg Peter
, 2024. Webstrates are web substrates, a practical realization of shareable dynamic media in which distributability, shareability, and malleability are fundamental software principles. Webstrates blur the distinction between application and document in a way that enables users to share, repurpose, and refit software across a variety of domains, but their reliance on a central server constrains their use, is at odds with personal and collective control of data, and limits applications to the web. We extend the fundamental principles to include interoperability and sovereignty over data and propose MyWebstrates, an implementation of Webstrates on top of a new, lower-level substrate for synchronization built around local-first software principles. MyWebstrates registers itself in the user’s browser and functions as a piece of local software that can selectively synchronise data over sync servers or peer-to-peer connections. We show how MyWebstrates extends Webstrates to enable offline collaborative use, interoperate between Webstrates running on non-web technologies such as Unity, and maintain personal and collective sovereignty over data. We demonstrate how this enables new types of Webstrates applications and discuss the limitations of this approach and the new challenges it reveals. (10.1145/3654777.3676445)
DOI : 10.1145/3654777.3676445
Jumping to Conclusions: A Visual Comparative Analysis of Online Debate Platform Layouts
- Frappier Tallullah
- Bressa Nathalie
- Huron Samuel
, 2024, pp.1-15. There has been an increase in online debate platforms in recent years that allow individuals to exchange their opinions or to foster civic engagement. Even though the design of platforms has a significant influence on the quality of debates, research has yet to systematically analyze the graphical user interfaces of debate tools that are currently in use. To address this, we collected 25 off-the-shelf online debate platforms and conducted a comparative visual analysis of their graphical user interfaces. We identified different types of platforms, interface blocks, hierarchies, and display layout patterns. We found a strong similarity among these platforms in their design and a shared emphasis on individual input. Drawing from these insights, we discuss how the design of platforms frames the practice of debate and identify potential design dimensions in order to move beyond the existing boundaries of online debate. (10.1145/3679318.3685377)
DOI : 10.1145/3679318.3685377
Towards a foundation model for cortical folding
- Laval Julien
- Chavas Joël
- Troiani Vanessa
- Snyder William
- Patti Marisa
- Moyal Mylène
- Plaze Marion
- Cachia Arnaud
- Sun Zhong Yi
- Frouin Vincent
- Gori Pietro
- Rivière Denis
- Mangin Jean-François
, 2024. The brain surface is composed of humps called gyri, separated by grooves called sulci. Although the main folds are common to all individuals, their shape varies, making them unique to each individual. Cortical folding may contain biomarkers that have yet to be deciphered. While conventional geometric approaches fail to fully characterize the high inter-individual variability, recent efforts in large-scale MRI data collection allow us to leverage the statistical power of deep neural networks. Here, we introduce Champollion V0, a self-supervised learning (SSL) algorithm to sort sulcal variability based on 21,070 subjects from the UK Biobank dataset. We revisit an existing model from scratch and optimize its ability to retrieve hand-labeled patterns defined by the neuroscientific community. Under linear evaluation on the latent space, Champollion V0 significantly improves the detection of three different kinds of folding patterns, in the cingulate, orbital, and central regions respectively: the presence of a parallel sulcus (AUC increases from 73% to 84%), the presence of specific interruptions (AUC increases from 50% to 79%), and the detection of a specific folding shape (R² increases on each of the six main geometric features). These hand-labeled patterns have been found to correlate with neurodevelopmental pathologies. Champollion V0 could enable the automatic labeling of larger datasets for future studies. The code is available on GitHub.
Exploring User Placement for VR Remote Collaboration in a Constrained Passenger Space
- Medeiros Daniel
- Wilson Graham
- Sousa Mauricio
- Pantidi Nadia
- Mcgill Mark
- Drago Diego
- Brewster Stephen
, 2024, pp.1-11. Extended Reality (XR) offers the potential to transform the passenger experience by allowing users to inhabit varied virtual spaces for entertainment, work or social interaction, whilst escaping the constrained transit environment. XR allows remote collaborators to feel like they are together and enables them to perform complex 3D tasks. However, the social and physical constraints of the passenger space pose unique challenges to productive and socially acceptable collaboration. Using a collaborative VR puzzle task, we examined the effects of five different f-formations of collaborator placement and orientation in an interactive workspace on social presence, task workload, and implications for social acceptability. Our quantitative and qualitative results showed that face-to-face formations were preferred for tasks with a high need for verbal communication but may lead to social collisions, such as inadvertently staring at a neighbouring passenger, or physical intrusions, such as gesturing in another passenger’s personal space. More restrictive f-formations, however, were preferred for passenger use as they caused fewer intrusions on other passengers’ visual and physical space. (10.1145/3641825.3687722)
DOI : 10.1145/3641825.3687722
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
- Malard Hugo
- Olvera Michel
- Lathuiliere Stéphane
- Essid Slim
Advances in Neural Information Processing Systems, Morgan Kaufmann Publishers, 2024. Multimodal large language models have fueled progress in image captioning. These models, fine-tuned on vast image datasets, exhibit a deep understanding of semantic concepts. In this work, we show that this ability can be repurposed for audio captioning, where the joint image-language decoder can be leveraged to describe auditory content associated with image sequences within videos featuring audiovisual content. This can be achieved via multimodal alignment. Yet this multimodal alignment task is non-trivial due to the inherent disparity between audible and visible elements in real-world videos. Moreover, multimodal representation learning often relies on contrastive learning, which faces the challenge of the so-called modality gap that hinders smooth integration between modalities. We introduce a novel methodology for bridging the audiovisual modality gap by matching the distributions of tokens produced by an audio backbone with those of an image captioner. Our approach aligns the audio token distribution with that of the image tokens, enabling the model to perform zero-shot audio captioning in an unsupervised fashion while keeping the initial image captioning component unaltered. This alignment allows for the use of either audio or audiovisual input by combining or substituting the image encoder with the aligned audio encoder. Our method achieves significantly improved performance in zero-shot audio captioning compared to existing approaches.
Intelligent UAV Swarm Coexistence in DTV Bands
- Dubey Rajrshi
- Balakrishnan Ashutosh
- De Swades
, 2024, pp.1-5. Intelligent spectrum sharing is one of the key enablers of upcoming sixth-generation (6G) communications. Unmanned aerial vehicles (UAVs) have emerged as attractive low-altitude aerial base stations (BSs), providing on-demand capacity, especially in urban areas. This work aims to demonstrate the feasibility of a coexisting UAV-to-UAV ad hoc communication network operating over digital television (DTV) bands governed by the latest ATSC 3.0 standard. We propose an adaptive modulation and dynamic subcarrier (AMDS) allocation framework that intelligently allocates resources in the UAV network through adaptive bit loading and frequency allocation. The work aims to maximize the capacity of the coexisting UAV network while protecting the performance of the TV receiver from the resulting interference. A rate maximization problem is formulated and solved using a low-complexity bisection method. Extensive simulation results indicate that the connected UAV link can achieve a capacity of up to 40 Mbps at a distance of 1 km, while coexisting with and guaranteeing the performance of the DTV network. (10.1109/VTC2024-Fall63153.2024.10757670)
DOI : 10.1109/VTC2024-Fall63153.2024.10757670
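The abstract states that the rate maximization problem is solved with a low-complexity bisection method but does not give the formulation. To show how bisection typically enters such problems, the sketch below solves classic water-filling power allocation across subcarriers by bisecting on the water level. The optional per-subcarrier power caps are a hypothetical stand-in for an interference limit protecting the TV receiver. This is not the AMDS framework, which also performs adaptive bit loading.

```python
import math


def water_filling(gains, total_power, caps=None, tol=1e-12, max_iter=200):
    """Maximize sum(log2(1 + p_k * g_k)) s.t. sum(p_k) <= total_power, 0 <= p_k <= caps[k].

    Bisection on the water level mu: p_k(mu) = min(cap_k, max(0, mu - 1/g_k)),
    which is non-decreasing in mu, so the budget-meeting mu can be bracketed.
    Assumes all gains are strictly positive.
    """
    caps = list(caps) if caps is not None else [math.inf] * len(gains)
    if sum(caps) <= total_power:  # budget cannot be exhausted: saturate every cap
        return caps

    def allocate(mu):
        return [min(c, max(0.0, mu - 1.0 / g)) for g, c in zip(gains, caps)]

    # At mu = 0 nothing is allocated; at the upper bound the budget is met or exceeded.
    lo, hi = 0.0, total_power + max(1.0 / g for g in gains)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if sum(allocate(mid)) > total_power:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return allocate(lo)  # lo always respects the power budget


def sum_rate(gains, powers):
    """Achievable sum rate in bits per channel use."""
    return sum(math.log2(1.0 + p * g) for g, p in zip(gains, powers))


powers = water_filling([4.0, 1.0], total_power=1.0)
print(powers)  # approximately [0.875, 0.125]
print(sum_rate([4.0, 1.0], powers))
```

Because each bisection step halves the bracket on a single scalar, the cost grows only linearly with the number of subcarriers per iteration, which is the kind of low complexity the abstract points to.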
Tip tilt and focus estimation based on LGS and downlink joint measurements for ground to GEO satellite optical communication link
- Lognoné Perrine
- Rekaya Ghaya
- Montmerle-Bonnefois Aurélie
- Paillier Laurie
- Conan Jean-Marc
Optics Express, Optical Society of America - OSA Publishing, 2024, 32 (21), pp.37739-37757. Achieving high data rates in GEO feeder optical uplinks faces challenges due to the fading nature of the channel induced by atmospheric turbulence. Adaptive optics precompensation using downlink measurements is a solution to mitigate the impact of the turbulence. However, the point-ahead angle (PAA) anisoplanatism, inherent to the bidirectional link geometry, limits the uplink correction efficiency, leading to persistent signal fades and loss of information onboard the satellite. We recently proposed a new minimum mean square error (MMSE) method that improves the phase estimation at the PAA based on the downlink phase and log-amplitude measurements, reducing the anisoplanatism impact on the coupled flux. Alternatively, a laser guide star (LGS) can be used to measure the phase at the PAA. However, it is currently challenging to retrieve the tip, tilt, and focus modes, whose correction is essential to improve the link quality. In this article, we propose to combine both techniques to estimate the tip, tilt, and focus at the PAA by incorporating the LGS high-order measurements in the MMSE formalism. We develop the associated analytical reconstructor and evaluate the performance of the phase estimation and the gain on the coupled flux statistics aboard the GEO satellite, considering an idealized LGS system. The new estimator is shown to reduce the tip, tilt, and focus error variances by up to 70% of their initial value. (10.1364/oe.538333)
DOI : 10.1364/oe.538333
