The Last Human-Written Paper: Agent-Native Research Artifacts

Published: 2026-04-27 16:23:09

Authors: Jiachen Liu, Jiaxin Pei, Jintao Huang, Chenglei Si, Ao Qu, Xiangru Tang, Runyu Lu, Lichang Chen, Xiaoyan Bai, Haizhong Zheng, Carl Chen, Zhiyang Chen, Haojie Ye, Yujuan Fu, Zexue He, Zijian Jin, Zhenyu Zhang, Shangquan Sun, Maestro Harmon, John Dianzhuo Wang, Jianqiao Zeng, Jiachen Sun, Mingyuan Wu, Baoyu Zhou, Yuchen You, Shijian Lu, Yiming Qiu, Fan Lai, Yuan Yuan, Yao Li, Junyuan Hong, Ruihao Zhu, Beidi Chen, Alex Pentland, Ang Chen, Mosharaf Chowdhury, Zechen Zhang

Categories: cs.LG

Abstract:
Scientific publication compresses a branching, iterative research process into a linear narrative, discarding the majority of what was discovered along the way. This compilation imposes two structural costs: a Storytelling Tax, where failed experiments, rejected hypotheses, and the branching exploration process are discarded to fit a linear narrative; and an Engineering Tax, where the gap between reviewer-sufficient prose and agent-sufficient specification leaves critical implementation details unwritten. Tolerable for human readers, these costs become critical when AI agents must understand, reproduce, and extend published work. We introduce the Agent-Native Research Artifact (Ara), a protocol that replaces the narrative paper with a machine-executable research package structured around four layers: scientific logic, executable code with full specifications, an exploration graph that preserves the failures compilation discards, and evidence grounding every claim in raw outputs. Three mechanisms support the ecosystem: a Live Research Manager that captures decisions and dead ends during ordinary development; an Ara Compiler that translates legacy PDFs and repos into Aras; and an Ara-native review system that automates objective checks so human reviewers can focus on significance, novelty, and taste. On PaperBench and RE-Bench, Ara raises question-answering accuracy from 72.4% to 93.7% and reproduction success from 57.4% to 64.4%. On RE-Bench's five open-ended extension tasks, preserved failure traces in Ara accelerate progress, but, depending on the agent's capabilities, can also discourage it from stepping outside the prior-run box.
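
The abstract names the four Ara layers but gives no concrete schema; purely as a hypothetical illustration of how such a package might be organized, one could imagine something like the sketch below, in which every class and field name is invented rather than taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of the four Ara layers described in the abstract.
# All names here are invented for illustration; the paper defines the actual protocol.

@dataclass
class Claim:
    statement: str
    evidence_paths: list[str] = field(default_factory=list)  # raw outputs grounding the claim (layer 4)

@dataclass
class ExplorationNode:
    hypothesis: str
    outcome: str                                              # e.g. "confirmed", "rejected", "dead end"
    children: list["ExplorationNode"] = field(default_factory=list)

@dataclass
class AgentNativeResearchArtifact:
    scientific_logic: list[Claim]       # layer 1: claims, each grounded in evidence
    code_spec: dict[str, str]           # layer 2: executable code plus full specifications
    exploration_graph: ExplorationNode  # layer 3: the branching process, failures included
```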

arXiv Page | PDF

Score: 0

Benefits and Costs of Adaptive Sampling

Published: 2026-04-27 16:19:53

Authors: Yu-Shiou Willy Lin, Dae Woong Ham, Iavor Bojinov

Categories: stat.ME, econ.EM

Abstract:
Multi-armed bandits are widely used for sequential experimentation in clinical trials, recommendation systems, and online platforms. While regret minimization and valid inference from adaptively collected data have each been studied extensively, a basic question remains: when does adaptivity \emph{improve estimation precision} relative to uniform designs, and how should inference be balanced against the online cost of experimentation? We first study arm-level mean estimation under mean-squared-error (MSE) objectives. We characterize when an adaptive Neyman allocation, which allocates samples according to arm variance, yields strict MSE improvements over uniform sampling. When there is variance heterogeneity across arms, these improvements arise at modest sample sizes, clarifying that adaptivity can be preferable for inference not only asymptotically, but also in many practical finite-sample settings. We then study a joint inference-regret objective that accounts for the cost of assigning units to inferior arms during experimentation. We propose the Static-Allocation Rate Policy (SARP) and Neyman-Adaptive Rate Policy (NARP), which interpolate between inference- and regret-oriented policies by adjusting exploration to the local structure of the instance. We show that SARP and NARP converge to the complete-information benchmark at the optimal rate as the sampling budget grows. Our proposed policies are practically attractive, as they linearly interpolate between any standard regret-minimizing algorithm and inference-targeting adaptive policies, yet we show they still enjoy the oracle-based asymptotically optimal rate. Simulations support the theory by demonstrating improved precision over uniform allocation while controlling performance loss across a range of instances.
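
For intuition about the Neyman-versus-uniform comparison in the first part of the abstract, the closed-form MSE totals under the two allocations can be checked numerically; this sketch covers only the textbook variance-proportional allocation, not the paper's SARP/NARP policies, and the example variances are made up.

```python
import numpy as np

# Total MSE (sum over arms of Var[sample mean]) for uniform vs. Neyman allocation.
# Minimizing sum_i sigma_i^2 / n_i subject to sum_i n_i = N gives n_i proportional to sigma_i.

def total_mse_uniform(sigmas, N):
    K = len(sigmas)
    return sum(s**2 / (N / K) for s in sigmas)

def total_mse_neyman(sigmas, N):
    return sum(sigmas)**2 / N          # optimal value under n_i proportional to sigma_i

sigmas = np.array([0.5, 1.0, 3.0])     # heterogeneous arm standard deviations (made up)
N = 300
print(total_mse_uniform(sigmas, N))    # 0.1025
print(total_mse_neyman(sigmas, N))     # 0.0675 -- strictly smaller under heterogeneity
```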

arXiv Page | PDF

Score: 0

Electrical tunability of terahertz nonlinearity in graphene

Published: 2026-04-27 16:17:56

Authors: Sergey Kovalev, Hassan A. Hafez, Klaas-Jan Tielrooij, Jan-Christoph Deinert, Igor Ilyakov, Nilesh Awari, David Alcaraz, Karuppasamy Soundarapandian, David Saleta, Semyon Germanskiy, Min Chen, Mohammed Bawatna, Bertram Green, Frank H. L. Koppens, Martin Mittendorff, Mischa Bonn, Michael Gensch, Dmitry Turchinovich

Categories: cond-mat.mes-hall

Abstract:
Graphene is conceivably the most nonlinear optoelectronic material. Its nonlinear optical coefficients in the terahertz (THz) frequency range surpass those of other materials by many orders of magnitude. This, in particular, allows one to use graphene for extremely efficient up-conversion of sub-THz electronic input signals into the THz frequency range at room temperature and under ambient conditions, thus paving the way for practical graphene-based ultrahigh-frequency electronic technology. Here, we show that the THz nonlinearity of graphene can be efficiently controlled using electrical gating, with gating voltages as low as a few volts. For example, optimal electrical gating enhances the power conversion efficiency in THz third-harmonic generation in graphene by about two orders of magnitude. This essentially converts graphene from an almost perfectly linear, inert electronic material to a material with the highest possible THz nonlinearity. We demonstrate gating control of THz nonlinearity of graphene for both ultrashort single-cycle and quasi-monochromatic multi-cycle input signals. Our experimental results are in quantitative agreement with a physical model of graphene nonlinearity, describing the time-dependent thermodynamic balance maintained within the electronic population of graphene during interaction with ultrafast electric fields. Our results can serve as a basis for straightforward and accurate design of devices and applications for efficient electronic signal processing in graphene at ultra-high frequencies.

arXiv Page | PDF

Score: 0

K-MetBench: A Multi-Dimensional Benchmark for Fine-Grained Evaluation of Expert Reasoning, Locality, and Multimodality in Meteorology

Published: 2026-04-27 16:13:14

Authors: Soyeon Kim, Cheongwoong Kang, Myeongjin Lee, Eun-Chul Chang, Jaedeok Lee, Jaesik Choi

Categories: cs.CL, cs.AI

Abstract:
The development of practical (multimodal) large language model assistants for Korean weather forecasters is hindered by the absence of a multidimensional, expert-level evaluation framework grounded in authoritative sources. To address this, we introduce K-MetBench, a diagnostic benchmark grounded in national qualification exams. It exposes critical gaps across four dimensions: expert visual reasoning of charts, logical validity via expert-verified rationales, Korean-specific geo-cultural comprehension, and fine-grained domain analysis. Our evaluation of 55 models reveals a profound modality gap in interpreting specialized diagrams and a reasoning gap where models hallucinate logic despite correct predictions. Crucially, Korean models outperform significantly larger global models in local contexts, demonstrating that parameter scaling alone cannot resolve cultural dependencies. K-MetBench serves as a roadmap for developing reliable, culturally aware expert AI agents. The dataset is available at https://huggingface.co/datasets/soyeonbot/K-MetBench .
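
The dataset repository id is given in the abstract; a minimal way to inspect it with the Hugging Face `datasets` library is shown below. Split names and fields are not stated in the abstract, so the sketch only prints them.

```python
from datasets import load_dataset

# Load K-MetBench from the Hub (repository id taken from the abstract).
# Splits and columns are not specified in the abstract, so just inspect them.
ds = load_dataset("soyeonbot/K-MetBench")
print(ds)                          # available splits and their sizes
first_split = next(iter(ds))
print(ds[first_split][0])          # peek at one example's fields
```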

arXiv Page | PDF

Score: 0

Workplace Demands and Emotional Expression Among Early Childhood Educators: A Computational Analysis of Professional Online Discourse

Published: 2026-04-27 16:11:18

Authors: Hailong Jiang

Categories: cs.CY

Abstract:
Early childhood educators work in settings characterized by heavy regulation, emotional labor, staffing instability, and low pay. Although these conditions are well documented in survey-based research, less is known about how they manifest in the day-to-day language educators use in peer spaces. This study examines 7,506 posts from r/ECEProfessionals, a large online community used by early childhood education practitioners. Using a structured, computer-assisted thematic coding workflow and transformer-based emotion classification, posts were organized into 15 themes and mapped onto an adapted Job Demands-Resources (JD-R) framework. Across the corpus, 56.7% of posts centered on demands when task-level and core job demands were combined, compared with 33.6% focused on resources and 9.6% on career conditions. Emotion estimates indicated a broadly neutral tone overall; however, fear emerged as the most prominent non-neutral emotion. Demand-related categories also exhibited higher levels of sadness and anger than resource-related categories. These findings suggest that professional online discourse in early childhood education reflects a work environment structured more around strain than support. The study offers a practical framework for examining how occupational conditions are discussed and emotionally experienced in large-scale professional texts.
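
The abstract does not name the transformer used for emotion classification; as a purely illustrative stand-in, a widely used public emotion checkpoint can be run over posts like this (the example posts are invented).

```python
from transformers import pipeline

# Sketch of transformer-based emotion classification over forum posts.
# "j-hartmann/emotion-english-distilroberta-base" is a common public checkpoint
# chosen here only for illustration; the paper does not name its model.
classifier = pipeline("text-classification",
                      model="j-hartmann/emotion-english-distilroberta-base",
                      top_k=None)

posts = [
    "Our ratios are out of compliance again and I'm scared we'll be shut down.",
    "My co-teacher covered my break today, which was a huge help.",
]
results = classifier(posts)                 # one list of {label, score} dicts per post
for post, scores in zip(posts, results):
    top = max(scores, key=lambda s: s["score"])
    print(top["label"], round(top["score"], 3), "-", post[:40])
```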

arXiv Page | PDF

Score: 0

DiffQEC: A versatile diffusion model for quantum error correction

Published: 2026-04-27 16:08:38

Authors: Tianyi Xu, Qinglong Liu, Maolin Wang, Fei Zhang, Zhe Zhao, Yang Wang, Ye Wei

Categories: quant-ph

Abstract:
Quantum computers could solve problems beyond the reach of classical devices, but this potential depends on quantum error correction (QEC) to protect fragile quantum states from noise. A central challenge in QEC is decoding: inferring likely physical errors from syndrome patterns generated by repeated stabilizer measurements. Existing decoders, including graph-based and neural approaches, typically return a single correction hypothesis and therefore discard the richer posterior structure of the error distribution conditioned on the observed syndrome. Here we recast QEC decoding as posterior inference using discrete denoising diffusion, exploiting the analogy between stochastic error accumulation and the forward diffusion process. We introduce DiffQEC, a generative decoder that combines a syndrome processor for multi-round spatial-temporal syndrome histories with syndrome feature modulation to condition denoising on the observed syndrome throughout inference. On experimental data from Google's superconducting quantum processor, DiffQEC reduces logical error rates by up to 10.2% relative to minimum-weight perfect matching and by about 5% relative to tensor-network decoding. These improvements persist for larger code distances up to 17 under depolarizing noise and for logical circuits of increasing depth. Beyond accuracy, the learned posterior provides confidence estimates for post-selection and reveals physically meaningful error structure, establishing posterior generative decoding as a practical framework for QEC.

arXiv Page | PDF

Score: 0

Meta-CoT: Enhancing Granularity and Generalization in Image Editing

Published: 2026-04-27 15:52:48

Authors: Shiyi Zhang, Yiji Cheng, Tiankai Hang, Zijin Yin, Runze He, Yu Xu, Wenxun Dai, Yunlong Lin, Chunyu Wang, Qinglin Lu, Yansong Tang

Categories: cs.CV, cs.AI, cs.LG, cs.MM

Abstract:
Unified multi-modal understanding/generative models have shown improved image editing performance by incorporating fine-grained understanding into their Chain-of-Thought (CoT) process. However, a critical question remains underexplored: what forms of CoT and training strategy can jointly enhance both the understanding granularity and generalization? To address this, we propose Meta-CoT, a paradigm that performs a two-level decomposition of any single-image editing operation with two key properties: (1) Decomposability. We observe that any editing intention can be represented as a triplet - (task, target, required understanding ability). Inspired by this, Meta-CoT decomposes both the editing task and the target, generating task-specific CoT and traversing editing operations on all targets. This decomposition enhances the model's understanding granularity of editing operations and guides it to learn each element of the triplet during training, substantially improving the editing capability. (2) Generalizability. In the second decomposition level, we further break down editing tasks into five fundamental meta-tasks. We find that training on these five meta-tasks, together with the other two elements of the triplet, is sufficient to achieve strong generalization across diverse, unseen editing tasks. To further align the model's editing behavior with its CoT reasoning, we introduce the CoT-Editing Consistency Reward, which encourages more accurate and effective utilization of CoT information during editing. Experiments demonstrate that our method achieves an overall 15.8% improvement across 21 editing tasks, and generalizes effectively to unseen editing tasks when trained on only a small set of meta-tasks. Our code, benchmark, and model are released at https://shiyi-zh0408.github.io/projectpages/Meta-CoT/

arXiv Page | PDF

Score: 0

NeSyCat: A Monad-Based Categorical Semantics of the Neurosymbolic ULLER Framework

Published: 2026-04-27 15:40:15

Authors: Daniel Romero Schellhorn, Till Mossakowski

Categories: cs.AI, cs.LO, math.CT, math.LO

Abstract:
ULLER (Unified Language for LEarning and Reasoning) offers a unified first-order logic (FOL) syntax, enabling its knowledge bases to be used directly across a wide range of neurosymbolic systems. The original specification endows this syntax with three pairwise independent semantics: classical, fuzzy, and probabilistic, each accompanied by dedicated semantic rules. We show that these seemingly disparate semantics are all instances of one categorical framework based on monads, the very construct that models side effects in functional programming. This enables the modular addition of new semantics and systematic translations between them. As an example, we outline the addition of generalised quantification in Logic Tensor Networks (LTN) to arbitrary (also infinite) domains by extending the Giry monad to probability spaces. In particular, our approach allows a modular implementation of ULLER in Python and Haskell, of which we have published initial versions on GitHub.

arXiv Page | PDF

Score: 0

Evaluation of Pose Estimation Systems for Sign Language Translation

Published: 2026-04-27 15:38:20

Authors: Catherine O'Brien, Gerard Sant, Mathias Müller, Sarah Ebling

Categories: cs.CL

Abstract:
Many sign language translation (SLT) systems operate on pose sequences instead of raw video to reduce input dimensionality, improve portability, and partially anonymize signers. The choice of pose estimator is often treated as an implementation detail, with systems defaulting to widely available tools such as MediaPipe Holistic or OpenPose. We present a systematic comparison of pose estimators for pose-based SLT, covering widely used baselines (MediaPipe Holistic, OpenPose) and newer whole-body/high-capacity models (MMPose WholeBody, OpenPifPaf, AlphaPose, SDPose, Sapiens, SMPLest-X). We quantify downstream impact by training a controlled SLT pipeline on RWTH-PHOENIX-Weather 2014 where only the pose representation varies, evaluating with BLEU and BLEURT. To contextualize translation outcomes, we analyze temporal stability, missing hand keypoints, and robustness to occlusion using higher-resolution videos from the Signsuisse dataset. SDPose and Sapiens achieve the best translation performance (BLEU ~11.5), outperforming the common MediaPipe baseline (BLEU ~10). In occlusion cases, Sapiens is correct in all tested instances (15/15), while OpenPifPaf fails in nearly all (1/15) and also yields the weakest translation scores. Estimators that frequently leave out hand keypoints are associated with lower BLEU/BLEURT. We release code that can be used not only to reproduce our experiments but also to considerably lower the barrier for other researchers to use alternative pose estimators.
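
For reference, extracting pose and hand keypoints with the MediaPipe Holistic baseline mentioned above typically looks like the sketch below; this is generic usage of the library, not the released evaluation pipeline, and the input video path is a placeholder.

```python
import cv2
import mediapipe as mp

# Generic keypoint extraction with the MediaPipe Holistic baseline.
holistic = mp.solutions.holistic.Holistic(static_image_mode=False,
                                          model_complexity=1)

cap = cv2.VideoCapture("signing_clip.mp4")    # placeholder input video
while True:
    ok, frame_bgr = cap.read()
    if not ok:
        break
    results = holistic.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    # Missing hand keypoints (a failure mode analyzed in the paper) show up as None here.
    if results.left_hand_landmarks is None and results.right_hand_landmarks is None:
        print("no hand keypoints in this frame")
cap.release()
holistic.close()
```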

arXiv Page | PDF

Score: 0

Children's Online Safety Risks and Ethical Considerations in XR Games

Published: 2026-04-27 15:28:53

Authors: Zinan Zhang, Xinning Gui, Yubo Kou

Categories: cs.HC

Abstract:
Emerging extended reality technologies are reshaping how children play, learn, and socialize. Yet, they also present serious safety risks. Gaming, a primary form of entertainment for children, is also one of the key applications of XR. While XR platforms offer immersive and engaging gaming experiences, recent news has highlighted safety concerns such as car accidents, lower judgment for real-world situations, and exposure to disturbing content like virtual rape. This research examines how XR game design may lead to online safety risks for children. Through analysis of player forums, game developer forums, and interviews with child players, we identify harmful XR design patterns, explore how developers collaboratively generate and implement risky game ideas, and document children's firsthand experiences of online safety risks. Existing ethical frameworks often fail to address the immersive and socially dynamic nature of XR games. We advocate for a child-centered, design-aware approach to ethical considerations in XR games, urging platforms and policymakers to prioritize children's developmental needs. Our work aims to help shape safer, more inclusive XR environments through research and cross-sector collaboration.

arXiv Page | PDF

Score: 0

DETOUR: A Practical Backdoor Attack against Object Detection

Published: 2026-04-27 15:25:55

Authors: Dazhuang Liu, Yanqi Qiao, Rui Wang, Kaitai Liang, Georgios Smaragdakis

Categories: cs.CR

Abstract:
Object detection (OD) is critical to real-world vision systems, yet existing backdoor attacks on detection transformers (DETRs) for OD tasks rely on patch-wise triggers optimized at fixed locations with minimal perturbations. Such attacks overlook that backdoor triggers in the real world may appear at different sizes, fields of view (FoVs), and locations in images, while minimal perturbations are difficult for cameras to capture, limiting attack practicality. We first observe that a patch-wise trigger in DETR delivers high attack effectiveness when activating the backdoor across neighboring locations, a phenomenon we term the trigger radiating effect (TRE). Meanwhile, inserting patch-wise triggers across multiple locations synergistically enhances TRE, resulting in high attack effectiveness across images. We propose DETOUR, a practical backdoor attack by using semantic triggers that are effective in real-world object detection systems. To ensure attack practicality, we rescale trigger patterns to different sizes and insert them at various predefined locations during backdoor training, enabling the model to recognize the trigger regardless of its spatial configurations. To address FoV variations in physical deployments, we extract the trigger pattern from a real-world object (e.g., a mug) captured under multiple FoVs and inject the trigger accordingly, promoting viewpoint-invariant backdoor activation and enhancing TRE across the entire image. As a result, the backdoor can be reliably activated under diverse FoVs and spatial configurations.

arXiv Page | PDF

Score: 0

Mass spectra of charged mesons and the quenching of vector meson condensation via exact phase-space diagonalization

Published: 2026-04-27 15:20:07

Authors: Jingyi Chao, Kun Xu

Categories: hep-ph

Abstract:
We investigate the dynamics and mass spectra of charged pseudoscalar ($π^+$) and vector ($ρ^+$) mesons in a background magnetic field at finite temperature using the two-flavor Nambu--Jona-Lasinio (NJL) model. By employing a quark propagator that isolates the Schwinger phase from its Landau level expansion, we formulate an exact non-commutative phase-space framework utilizing the Wigner-Weyl transform and the Moyal star product. This approach enables the algebraic diagonalization of the Bethe-Salpeter equations for composite states with asymmetric fractional constituent charges. For the pseudoscalar channel, we analytically verify the exact cancellation between the dynamical random phase approximation spatial sum rules and the vacuum gap equation. This identity preserves the generalized Goldstone theorem, causing the $π^+$ pole mass to strictly track the kinematic zero-point energy drift at order $eB$. In the vector channel, our full phase-space evaluation reveals that the Zeeman spin-splitting emerges dynamically from microscopic threshold truncations governed by the chiral Dirac algebra. Notably, we find that the tachyonic instability of the spin-aligned $ρ^+$ state is quenched. The magnetic catalysis of the chiral condensate drives the continuum threshold ($2M$) upwards, overtaking the Zeeman attraction and preventing vector meson condensation within this mean-field framework. Furthermore, finite-temperature evaluations show a monotonic thermal suppression of the meson masses driven by Pauli blocking, yet all modes remain bound without undergoing Mott dissociation prior to chiral symmetry restoration.

arXiv Page | PDF

Score: 0

A Six-Term Functional Equation and a Quartic Dilogarithm Ladder

Published: 2026-04-27 15:11:28

Authors: Cetin Hakimoglu-Brown

Categories: math.CA

Abstract:
We introduce dilogarithm identities through a beta integral-based technique that we apply to construct new 3- and 6-term functional identities of the type $\sum r^{-1}_i L(X_i) \in π^2 \mathbf{Q}$. We derive a simplified proof of the Loxton-Lewin identities through a new analytic method, which we call the ``radius method'', and which, in conjunction with the above methods, is used to derive a pair of quartic-base dilogarithm ladders, also believed to be new.
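
For orientation, the classical and Rogers dilogarithms in which such ladder relations are usually phrased are recalled below, together with two well-known values of the stated type $\sum r_i^{-1} L(X_i) \in π^2 \mathbf{Q}$; the paper's normalization of $L$ may differ.

$$
\mathrm{Li}_2(x)=\sum_{n\ge 1}\frac{x^n}{n^2}=-\int_0^x\frac{\ln(1-t)}{t}\,dt, \qquad L(x)=\mathrm{Li}_2(x)+\tfrac12\ln x\,\ln(1-x),
$$

$$
L\!\left(\tfrac12\right)=\frac{\pi^2}{12}, \qquad L\!\left(\tfrac{\sqrt5-1}{2}\right)=\frac{\pi^2}{10}, \qquad L\!\left(\tfrac{3-\sqrt5}{2}\right)=\frac{\pi^2}{15}.
$$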

arXiv Page | PDF

Score: 0

Point-MF: One-step Point Cloud Generation from a Single Image via Mean Flows

Published: 2026-04-27 15:10:47

Authors: Yuta Baba, Keiji Yanai

Categories: cs.CV

Abstract:
Single-image point cloud reconstruction must infer complete 3D geometry, including occluded parts, from a single RGB image. While diffusion-based reconstructors achieve high accuracy, they typically require many denoising iterations, resulting in slow and expensive inference. We propose Point-MF, a Mean-Flow-based framework for low-NFE single-image point cloud reconstruction that couples a Mean-Flow-compatible architecture with an auxiliary loss. Specifically, Point-MF operates directly in point-cloud space to learn the mean velocity field and enables one-step reconstruction with a single network function evaluation (1-NFE), without relying on VAE-based latent representations. To make Mean Flow effective under large interval jumps, Point-MF employs a Diffusion Transformer tailored to the Mean-Flow setting, conditioned on frozen DINOv3 image features via a lightweight token adapter and equipped with explicit interval/time conditioning. Moreover, we introduce Denoised Space Anchor, a set-distance auxiliary loss on the denoised-space estimate $x_θ$ induced by the predicted velocity field, to stabilize large-step generation and reduce outliers and density artifacts. On ShapeNet-R2N2 and Pix3D, Point-MF strikes a strong balance between reconstruction quality and inference speed compared to multi-step diffusion baselines and competitive feedforward models, while generating high-quality point clouds with millisecond-level latency.

arXiv Page | PDF

Score: 0

Pair-Dependent Drift of Kerr Neighboring-Overtone Gap Minima

Published: 2026-04-27 15:09:07

Authors: Yuye Wu, Hong-Bo Jin

Categories: gr-qc

Abstract:
Quasinormal modes (QNMs) underpin black-hole ringdown modeling and spectroscopy, where higher overtones can contribute at early times and neighboring modes can become locally organized in the complex-frequency plane. Motivated by this, we continuously track Kerr overtones along a spin scan and study a raw neighboring-overtone quantity: the complex-frequency gap between adjacent overtones. We find robust interior minima whose spin locations drift from pair to pair, even within the same \((s,\ell,m)\) sector. We explain this by reformulating minimum setting as a local zero-setting problem for the complex separation between the two modes. Differentiating the squared gap yields a denominator-free real-projection diagnostic (the real part of the product between the complex separation, with conjugation, and its spin derivative). The sampled minimum is governed by an approximate local zero of this diagnostic, so minimum drift becomes the drift of its dominant zero crossing. This also yields a geometric picture: the minimum is a local radial turning event of the separation vector, while angular motion in the complex plane may persist. Finally, an expanded but deliberately restricted representative set (including same-family continuations, external positive transfers, and smooth no-trigger controls) supports this local picture for the triggered cases examined here. At the same time, the detailed local crossing environment remains intrinsically pair dependent.
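
In the notation implied by the abstract (our symbols, not necessarily the paper's), the "denominator-free real-projection diagnostic" is simply the calculus identity for the derivative of the squared gap:

$$
\Delta(a) \;\equiv\; \omega_{n+1}(a)-\omega_{n}(a), \qquad \frac{d}{da}\,\bigl|\Delta(a)\bigr|^{2} \;=\; 2\,\mathrm{Re}\!\left[\overline{\Delta(a)}\,\frac{d\Delta}{da}\right],
$$

so an interior gap minimum occurs where $\mathrm{Re}\bigl[\overline{\Delta}\,\Delta'\bigr]$ crosses zero from negative to positive, and the pair-dependent drift of the minimum is the drift of this zero crossing.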

arXiv Page | PDF

Score: 0

Improving Vision-language Models with Perception-centric Process Reward Models

Published: 2026-04-27 15:08:02

Authors: Yingqian Min, Kun Zhou, Yifan Li, Yuhuan Wu, Han Peng, Yifan Du, Wayne Xin Zhao, Min Yang, Ji-Rong Wen

Categories: cs.CV

Abstract:
Recent advancements in reinforcement learning with verifiable rewards (RLVR) have significantly improved the complex reasoning ability of vision-language models (VLMs). However, its outcome-level supervision is too coarse to diagnose and correct errors within the reasoning chain. To this end, we propose Perceval, a process reward model (PRM) that enables token-level error grounding, which can extract image-related claims from the response and compare them one by one with the visual evidence in the image, ultimately returning claims that contain perceptual errors. Perceval is trained with perception-intensive supervised training data. We then integrate Perceval into the RL training process to train the policy models. Specifically, compared to traditional GRPO, which applies sequence-level advantages, we apply token-level advantages by targeting penalties on hallucinated spans identified by Perceval, thus enabling fine-grained supervision signals. In addition to augmenting the training process, Perceval can also assist VLMs during the inference stage. Using Perceval, we can truncate the erroneous portions of the model's response, and then either have the model regenerate the response directly or induce the model to reflect on its previous output. This process can be repeated multiple times to achieve test-time scaling. Experiments show significant improvements on benchmarks from various domains across multiple reasoning VLMs trained with RL, highlighting the promise of perception-centric supervision as a general-purpose strategy. For test-time scaling, it also demonstrates consistent performance gains over other strategies, such as majority voting. Our code and data will be publicly released at https://github.com/RUCAIBox/Perceval.

arXiv Page | PDF

Score: 0

Measuring the Unmeasurable: Markov Chain Reliability for LLM Agents

Published: 2026-04-27 15:05:45

Authors: Phat T. Tran-Truong, Xuan-Bach Le

Categories: cs.SE

Abstract:
Large language model (LLM) agents increasingly operate as sequential software systems, but their reliability is often summarized by scalar benchmark metrics. Metrics such as pass$@k$, pass$^k$, and the reliability decay curve (RDC) are useful summaries, but they do not identify the success-time distribution being estimated, test whether traces support that distribution, or quantify finite-trace uncertainty. We present \textsc{TraceToChain}, a reproducible pipeline that fits agent execution traces to an absorbing discrete-time Markov chain (DTMC), $\hat M=(\hat Q,\hat R_\oplus,\hat R_\ominus)$, with explicit diagnostics and uncertainty. The pipeline builds an automatic cluster taxonomy, estimates transitions with Laplace-smoothed maximum-likelihood estimation (MLE), checks fit with a composite Akaike information criterion (AIC) and Kolmogorov--Smirnov (KS) goodness-of-fit certificate, and reports Dirichlet-posterior credible intervals and non-parametric bootstrap intervals. We adapt classical reliability mathematics (Kemeny--Snell~\cite{kemenysnell}, Cheung~\cite{cheung1980}, Goel--Okumoto~\cite{goelokt}) to agent traces. The resulting first-passage view reconciles metrics usually reported separately: pass$@k$, pass$^k$, and the RDC are projections of one success-time distribution. On seven controlled MAST-style frameworks with a strict 50/50 fit/test protocol, held-out empirical RDCs overlay their analytic counterparts with max $L_\infty^{\mathrm{RDC}} = 0.053$ (median $0.048$). A two-sample KS test on the first-passage cumulative distribution function (CDF) accepts the fitted chain with $p>0.05$ on $7/7$ frameworks (min $p = 0.78$), and per-entry $95\%$ posterior and bootstrap intervals agree to $\approx\!0.01$ at the median.
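
As a small illustration of the first-passage view (pass@k-style summaries as projections of one success-time distribution), the sketch below builds a toy absorbing chain with made-up numbers; it omits the TraceToChain estimation, smoothing, AIC/KS diagnostics, and interval machinery.

```python
import numpy as np

# Toy absorbing DTMC: 2 transient states, plus success and failure absorbing states.
Q = np.array([[0.6, 0.2],              # transient -> transient
              [0.1, 0.5]])
r_success = np.array([0.15, 0.30])     # transient -> success absorption
start = np.array([1.0, 0.0])           # initial distribution over transient states

T = 50
succ_at_t = [start @ np.linalg.matrix_power(Q, t - 1) @ r_success for t in range(1, T + 1)]
p_success = float(np.sum(succ_at_t))   # overall single-trace success probability
cdf = np.cumsum(succ_at_t)             # first-passage success-time CDF
print("P(success by step 50):", cdf[-1])

# pass@k under an independent-retry reading: 1 - (1 - p_success)^k
print([round(1 - (1 - p_success) ** k, 3) for k in (1, 3, 5)])
```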

arXiv Page | PDF

Score: 0

Hybrid Path-Sums for Hybrid Quantum Programs

Published: 2026-04-27 15:05:17

Authors: Christophe Chareton, Jad Issa, Mathieu Nguyen, Nicolas Blanco, Sébastien Bardin

Categories: cs.PL

Abstract:
As quantum computing becomes an emerging reality, designing efficient quantum programming capabilities is becoming more and more important. Particularly, the debugging and validation of quantum programs is of paramount importance, as these programs are by definition hard to test. Static analysis and formal verification methods for quantum programs started to emerge a few years now, yet they often miss hybrid quantum/classical reasoning facilities with, e.g., generic quantum control, classical control and classical computation instructions. In this paper, we lay out the foundations of a framework for the automated formal verification of (full) hybrid quantum programs featuring both classical and quantum control, measurement and hybrid data structures. In particular, we propose: (1) a novel symbolic representation for describing and manipulating sets of hybrid quantum/classical states called Hybrid Path-Sums (HPS); (2) a set of rewriting rules providing a rich mechanism for simplifying and reasoning on these symbolic hybrid states, and (3) a core assertion language to specify equivalence of hybrid quantum programs, the satisfaction of properties on (parts of) hybrid states, and the extraction of probabilistic statements about the program behavior. We prove the correctness of the novel symbolic representation, of its rewriting system and of the specification system. Finally, we propose a full implementation of this framework as a dedicated symbolic execution engine for hybrid programs. We present an evaluation of a set of representative hybrid case-studies from the literature, showcasing the advantage of our approach and its efficiency compared to state-of-the-art solutions.

arXiv Page | PDF

Score: 0

Diffusion Model as a Generalist Segmentation Learner

Published: 2026-04-27 15:04:13

Authors: Haoxiao Wang, Antao Xiang, Haiyang Sun, Peilin Sun, Changhao Pan, Yifu Chen, Minjie Hong, Weijie Wang, Shuang Chen, Yue Chen, Zhou Zhao

Categories: cs.CV

Abstract:
Diffusion models are primarily trained for image synthesis, yet their denoising trajectories encode rich, spatially aligned visual priors. In this paper, we demonstrate that these priors can be utilized for text-conditioned semantic and open-vocabulary segmentation, and this approach can be generalized to various downstream tasks to make a general-purpose diffusion segmentation framework. Concretely, we introduce DiGSeg (Diffusion Models as a Generalist Segmentation Learner), which repurposes a pretrained diffusion model into a unified segmentation framework. Our approach encodes the input image and ground-truth mask into the latent space and concatenates them as conditioning signals for the diffusion U-Net. A parallel CLIP-aligned text pathway injects language features across multiple scales, enabling the model to align textual queries with evolving visual representations. This design transforms an off-the-shelf diffusion backbone into a universal interface that produces structured segmentation masks conditioned on both appearance and arbitrary text prompts. Extensive experiments demonstrate state-of-the-art performance on standard semantic segmentation benchmarks, as well as strong open-vocabulary generalization and cross-domain transfer to medical, remote sensing, and agricultural scenarios, without domain-specific architectural customization. These results indicate that modern diffusion backbones can serve as generalist segmentation learners rather than pure generators, narrowing the gap between visual generation and visual understanding.

arXiv Page | PDF

Score: 0

Commutation classes of reduced words and higher Bruhat orders for affine permutations

Published: 2026-04-27 15:02:36

Authors: Sara Billey, Herman Chau, Kevin Liu

Categories: math.CO

Abstract:
The higher Bruhat orders are partial orders that generalize the weak order on the symmetric group $S_n$, and the second higher Bruhat order is a poset on commutation classes of reduced words for the longest element in $S_n$, where covering relations correspond to braid relations. Constructing analogs in other settings is an area of recent interest, and we present an analog that generalizes any interval $[id,w]$ in the weak order of both the symmetric group and the affine symmetric group. Paralleling the classical case, we show that the second higher Bruhat order is a poset on commutation classes of reduced words for any affine permutation. For the symmetric group, we also establish results for all higher Bruhat orders that are direct analogs of those in the classical case.

arXiv Page | PDF

Score: 0

Limiter Spaces: A Universal Extension for Limits of Real Sequences

Published: 2026-04-27 15:00:32

Authors: Steven Lapp, Marina Tvalavadze

Categories: math.GN

Abstract:
We introduce the Limiter, a universal extension of the real numbers and of the limit functional that assigns a canonical limit in an enlarged space to every real sequence. Motivated by generalized summation methods such as Borel summation and Ramanujan's assignments to divergent series, we require our extension to respect classical limits and assign limits in a way that depends only on the cluster points of a sequence and varies continuously when the cluster set is slightly modified.

arXiv Page | PDF

Score: 0

The Main Problem of Block Theory: Picky Elements and Subnormalizers

Published: 2026-04-27 14:55:40

Authors: Alexander Moretó

Categories: math.RT, math.GR

Abstract:
This article is essentially an English translation of a paper of mine, published in \emph{La Gaceta de la RSME}. Its aim is to present, for a broad mathematical audience, a research programme in local representation theory that goes beyond the classical restrictions to characters of $p'$-degree, characters of height zero, and blocks of abelian defect. The final and most recent part of this programme concerns Alperin's main problem of block theory: the search for local rules for character values. In that direction I describe the conjectures on picky elements and subnormalizers, which suggest that the sets ${\rm Irr}^x(G)$ and the subgroups ${\rm Sub}_G(x)$ are the natural objects attached to a $p$-element $x$.

arXiv Page | PDF

Score: 0

MEG-RAG: Quantifying Multi-modal Evidence Grounding for Evidence Selection in RAG

Published: 2026-04-27 14:51:00

Authors: Xihang Wang, Zihan Wang, Chengkai Huang, Quan Z. Sheng, Lina Yao

Categories: cs.CL, cs.IR, cs.IT

Abstract:
Multimodal Retrieval-Augmented Generation (MRAG) addresses key limitations of Multimodal Large Language Models (MLLMs), such as hallucination and outdated knowledge. However, current MRAG systems struggle to distinguish whether retrieved multimodal data truly supports the semantic core of an answer or merely provides superficial relevance. Existing metrics often rely on heuristic position-based confidence, which fails to capture the informational density of multimodal entities. To address this, we propose Multi-modal Evidence Grounding (MEG), a semantic-aware metric that quantifies the contribution of retrieved evidence. Unlike standard confidence measures, MEG utilizes Semantic Certainty Anchoring, focusing on high-IDF information-bearing tokens that better capture the semantic core of the answer. Building on MEG, we introduce MEG-RAG, a framework that trains a multimodal reranker to align retrieved evidence with the semantic anchors of the ground truth. By prioritizing high-value content based on semantic grounding rather than token probability distributions, MEG-RAG improves the accuracy and multimodal consistency of generated outputs. Extensive experiments on the M$^2$RAG benchmark show that MEG-RAG consistently outperforms strong baselines and demonstrates robust generalization across different teacher models.

arXiv Page | PDF

Score: 0

Towards Lawful Autonomous Driving: Deriving Scenario-Aware Driving Requirements from Traffic Laws and Regulations

Published: 2026-04-27 14:49:44

Authors: Bowen Jian, Rongjie Yu, Hong Wang, Liqiang Wang, Zihang Zou

Categories: cs.AI, cs.CL, cs.CY

Abstract:
Driving in compliance with traffic laws and regulations is a basic requirement for human drivers, yet autonomous vehicles (AVs) can violate these requirements in diverse real-world scenarios. To encode law compliance into AV systems, conventional approaches use formal logic languages to explicitly specify behavioral constraints, but this process is labor-intensive, hard to scale, and costly to maintain. With recent advances in artificial intelligence, it is promising to leverage large language models (LLMs) to derive legal requirements from traffic laws and regulations. However, without explicitly grounding and reasoning in structured traffic scenarios, LLMs often retrieve irrelevant provisions or miss applicable ones, yielding imprecise requirements. To address this, we propose a novel pipeline that grounds LLM reasoning in a traffic scenario taxonomy through node-wise anchors that encode hierarchical semantics. On Chinese traffic laws and OnSite dataset (5,897 scenarios), our method improves law-scenario matching by 29.1\% and increases the accuracy of derived mandatory and prohibitive requirements by 36.9\% and 38.2\%, respectively. We further demonstrate real-world applicability by constructing a law-compliance layer for AV navigation and developing an onboard, real-time compliance monitor for in-field testing, providing a solid foundation for future AV development, deployment, and regulatory oversight.

arXiv Page | PDF

Score: 0

Optimization of two-photon excitation by indistinguishable photons in a three-level atom

Published: 2026-04-27 14:48:49

Authors: Masood Valipour, Gniewomir Sarbicki, Karolina Słowik, Anita Dąbrowska

Categories: quant-ph

Abstract:
We investigate the excitation of a three-level ladder-type atom by a unidirectional field with a pair of indistinguishable photons. Starting from an analytical expression for the two-photon absorption probability, we determine the two-photon state that maximizes the population of the upper atomic state at a chosen time and show that, in the limit of an infinitely long pulse, perfect excitation is possible. The optimal state is identified as the time-reversed counterpart of the two-photon state emitted in spontaneous cascade decay. We then compare this ideal excitation strategy with experimentally accessible families of states, including symmetrized Gaussian product states, temporally correlated Gaussian states, and coherent pulses. We analyze how the optimal excitation conditions depend on the ratio of atomic decay rates and on the separation of the atomic transition frequencies. For indistinguishable photons described by Gaussian pulses, quantum interference may shift the maxima of the marginal spectral distribution away from the atomic resonances and qualitatively modify the optimal excitation strategy. Our results clarify the role of indistinguishability and correlations in two-photon absorption and provide guidance for designing realistic excitation schemes in quantum-optical light-matter interfaces.

arXiv Page | PDF

Score: 0

Algebraic expansivity on abelian groups

Published: 2026-04-27 14:46:46

Authors: Mauricio Achigar

Categories: math.DS, math.GR

Abstract:
Building on the author's earlier work on topological and abstract expansivity, this paper introduces and explores the notion of algebraic expansivity for endomorphisms of abelian groups. We analyze the fundamental properties of this algebraic analogue, establish its relationship with Weiss's algebraic entropy, and prove that positively expansive epimorphisms are necessarily restricted to finite systems. Finally, we demonstrate a robust connection with topological dynamics via Pontryagin duality: algebraic expansivity on torsion abelian groups is shown to be exactly the dual property of topological expansivity on totally disconnected compact groups.

arXiv Page | PDF

Score: 0

Energetics of stochastic limit-cycle oscillators: when does coupling reduce dissipation?

Published: 2026-04-27 14:45:02

Authors: Anton F. Burnet, Vansh Kharbanda, David Tobias, Benedikt Sabass

Categories: cond-mat.stat-mech

Abstract:
Non-linear oscillators serve important functions in many biological systems, including within the inner ear and neuronal networks. The sustainment of oscillations in noisy environments requires continuous energy dissipation, quantified by the steady-state entropy production rate (EPR). We study an idealized, analytically tractable model of a stochastic circular limit cycle and examine how mutual coupling in pairs and populations alters dissipation. For a single oscillator, the EPR depends on three key factors: intrinsic frequency, tangential velocity fluctuations, and mean tangential velocity. The dynamics are characterized by a dimensionless effective temperature given by the ratio of intrinsic relaxation and diffusion timescales. For radial (amplitude), phase (Kuramoto-like), and Cartesian couplings, we derive analytical expressions for the EPR and confirm them numerically. Varying the effective temperature and system size strongly influences how the EPR depends on coupling strength and, in some cases, results in qualitatively distinct behaviors. Moreover, the coupling types affect the tangential velocity distributions differently. Notably, in all cases studied, Cartesian coupling reduces the EPR relative to the uncoupled system, irrespective of effective temperature and system size. The analysis of idealized non-linear oscillators reveals that different classes of coupling interactions and competing timescales present in the oscillators have distinct effects on energy dissipation.

arXiv Page | PDF

Score: 0

GradMAP: Gradient-Based Multi-Agent Proximal Learning for Grid-Edge Flexibility

Published: 2026-04-27 14:43:02

Authors: Yihong Zhou, Hongtai Zeng, Thomas Morstyn

Categories: cs.LG, cs.AI

Abstract:
Coordinating large populations of grid-edge devices requires learning methods that remain fully decentralised in deployment while still respecting three-phase AC distribution-network physics. This paper proposes gradient-based multi-agent proximal learning (GradMAP) to address this challenge. GradMAP trains independent neural-network policies for each agent without any parameter sharing, and each agent uses only its own local observation for online decision-making without communication. During offline training, GradMAP embeds a differentiable three-phase AC power-flow model in a primal-dual learning loop and uses implicit differentiation to propagate exact network-constraint violations to update the policy parameters. To speed up training, GradMAP reuses expensive environment gradients through a proximal surrogate within a trust region defined in the more direct policy-output (action) space, instead of the probability distribution space used in other works, such as PPO. In case studies with 1,000 agents managing batteries, heat pumps, and controllable generators on the IEEE 123-bus feeder, GradMAP learns decentralised policies that minimise three-phase AC load-flow constraint violations within 15 minutes of training on a single workstation-class NVIDIA RTX PRO 5000 Blackwell 48GB GPU. This is a 3--5x training speed-up over gradient-based self-supervised learning benchmarks and substantially better training efficiency than multi-agent reinforcement-learning benchmarks. In out-of-sample tests, GradMAP also delivers among the lowest operating cost and constraint violations.

arXiv Page | PDF

Score: 0

Extreme bandits

Published: 2026-04-27 14:40:22

Authors: Alexandra Carpentier, Michal Valko

Categories: stat.ML, cs.LG

Abstract:
In many areas of medicine, security, and life sciences, we want to allocate limited resources to different sources in order to detect extreme values. In this paper, we study an efficient way to allocate these resources sequentially under limited feedback. While sequential design of experiments is well studied in bandit theory, the most commonly optimized property is the regret with respect to the maximum mean reward. However, in other problems such as network intrusion detection, we are interested in detecting the most extreme value output by the sources. Therefore, in our work we study extreme regret which measures the efficiency of an algorithm compared to the oracle policy selecting the source with the heaviest tail. We propose the ExtremeHunter algorithm, provide its analysis, and evaluate it empirically on synthetic and real-world experiments.
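
As a toy illustration of the extreme-regret objective (expected maximum found by the oracle that always samples the heaviest-tailed source, minus that of a given policy), the following simulation compares the oracle against naive uniform allocation on made-up Pareto sources; it does not implement ExtremeHunter.

```python
import numpy as np

# Extreme-bandit toy: K heavy-tailed sources, budget n pulls.
rng = np.random.default_rng(0)
K, n, trials = 3, 300, 2000
tail_indices = np.array([3.0, 2.5, 1.8])    # smaller Pareto index = heavier tail (made up)

def expected_max(policy_arms):
    maxima = []
    for _ in range(trials):
        samples = rng.pareto(tail_indices[policy_arms]) + 1.0
        maxima.append(samples.max())
    return float(np.mean(maxima))

oracle_val = expected_max(np.full(n, tail_indices.argmin()))   # heaviest-tailed source only
uniform_val = expected_max(rng.integers(0, K, size=n))         # spread budget uniformly
print("oracle:", oracle_val, "uniform:", uniform_val,
      "extreme regret of uniform ~", oracle_val - uniform_val)
```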

arXiv Page | PDF

Score: 0

Quantization-Aware EE Optimization and SE-EE Tradeoff for MiLAC-Aided MU-MISO Beamforming

Published: 2026-04-27 14:35:13

Authors: Yuchen Zhang, Pinjun Zheng, Tareq Y. Al-Naffouri

Categories: eess.SP

Abstract:
In large antenna arrays, hardware power consumption becomes a dominant design constraint, making energy efficiency (EE) a first-class objective alongside spectral efficiency (SE). Microwave linear analog computer (MiLAC)-aided beamforming, whose front end is a passive reciprocal stream-to-antenna network, addresses this tension by reducing the active radio-frequency chain count to the stream number, at a moderate SE cost. Despite this promise, no EE optimization framework has been established for MiLAC-aided beamforming that accounts for digital-to-analog converter quantization noise and post-quantized transmit power. We fill this gap for downlink multiuser multiple-input single-output (MU-MISO) systems by formulating quantization-aware EE maximization over the MiLAC-feasible beamformer and characterizing the resulting SE-EE tradeoff. Three contributions follow. First, we prove a row-space optimality property of the effective MiLAC-aided beamformer, yielding an equivalent reduced-dimension reformulation whose complexity scales with the stream number rather than the antenna number. Second, we develop a low-complexity Dinkelbach-weighted minimum mean-square error algorithm aided by projected gradient descent that is guaranteed to converge to a stationary point. Third, we cast the SE-EE tradeoff as a multi-objective problem and trace its Pareto boundary via a weighted-sum method that combines an alternative reduced-dimension coordinate with auxiliary-variable successive convex approximation, yielding convex per-iteration subproblems with guaranteed convergence. Numerical results on a DeepMIMO v4 deployment show MiLAC-aided beamforming substantially improves EE over digital and hybrid benchmarks at a moderate SE cost and significantly expands the achievable SE-EE operating region.
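
For readers unfamiliar with the Dinkelbach step used in the proposed EE algorithm, the generic fractional-programming iteration is easy to state; the sketch below applies it to a made-up scalar rate/power ratio, whereas the paper's inner problem is a weighted-MMSE beamforming design with quantization noise.

```python
import numpy as np

# Generic Dinkelbach iteration for maximizing f(x)/g(x) with g > 0:
# solve x* = argmax f(x) - lam * g(x), then update lam = f(x*)/g(x*), and repeat.
f = lambda x: np.log2(1.0 + 4.0 * x)     # "rate"-like numerator (toy)
g = lambda x: 2.0 * x + 1.0              # "power"-like denominator (toy)
grid = np.linspace(0.0, 5.0, 100001)     # brute-force inner solver for this 1-D toy

lam = 0.0
for _ in range(20):
    x_star = grid[np.argmax(f(grid) - lam * g(grid))]
    new_lam = f(x_star) / g(x_star)
    if abs(new_lam - lam) < 1e-9:
        break
    lam = new_lam
print("EE-optimal x ~", x_star, "ratio ~", lam)
```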

arXiv Page | PDF

Score: 0

Search for associated production of a Higgs boson and two vector bosons via vector boson scattering at $\sqrt{s}$ = 13 TeV

Published: 2026-04-27 14:30:10

Authors: CMS Collaboration

Categories: hep-ex

Abstract:
A search for Higgs boson (H) production in association with two vector bosons (V = W, Z) via vector boson scattering (VBS) is presented using proton-proton collision data collected at $\sqrt{s}$ = 13 TeV by the CMS experiment, corresponding to an integrated luminosity of 138 fb$^{-1}$. Events containing two forward jets consistent with VBS, a large-radius jet from the decay of a boosted H to a pair of b quarks, and 0, 1, or 2 charged leptons coming from V decays are selected. The process is excluded at 95% CL for observed (expected) values of the VVHH coupling modifier $κ_\mathrm{VV}$ outside the interval 0.40 $<$ $κ_\mathrm{VV}$ $<$ 1.60 (0.34 $<$ $κ_\mathrm{VV}$ $<$ 1.66), assuming standard model values for all other couplings, thus establishing a novel probe of the VVHH interaction. Constraints are also set on the individual $κ_\mathrm{2W}$ and $κ_\mathrm{2Z}$ coupling modifiers, and on the allowed region in the $κ_\mathrm{2W}$-$κ_\mathrm{2Z}$ plane.

arXiv Page | PDF

Score: 0

Understanding the Limits of Automated Evaluation for Code Review Bots in Practice

Published: 2026-04-27 14:25:35

Authors: Veli Karakaya, Utku Boran Torun, Baykal Mehmet Uçar, Eray Tüzün

Categories: cs.SE, cs.AI

Abstract:
Automated code review (ACR) bots are increasingly used in industrial software development to assist developers during pull request (PR) review. As adoption grows, a key challenge is how to evaluate the usefulness of bot-generated comments reliably and at scale. In practice, such evaluation often relies on developer actions and annotations that are shaped by contextual and organizational factors, complicating their use as objective ground truth. We examine the feasibility and limitations of automating the evaluation of LLM-powered ACR bots in an industrial setting. We analyze an industrial dataset from Beko comprising 2,604 bot-generated PR comments, each labeled by software engineers as fixed/wontFix. Two automated evaluation approaches, G-Eval and an LLM-as-a-Judge pipeline, are applied using both binary decisions and a 0-4 Likert-scale formulation, enabling a controlled comparison against developer-provided labels. Across Gemini-2.5-pro, GPT-4.1-mini, and GPT-5.2, both evaluation strategies achieve only moderate alignment with human labels. Agreement ratios range from approximately 0.44 to 0.62, with noticeable variation across models and between binary and Likert-scale formulations, indicating sensitivity to both model choice and evaluation design. Our findings highlight practical limitations in fully automating the evaluation of ACR bot comments in industrial contexts. Developer actions such as resolving or ignoring comments reflect not only comment quality, but also contextual constraints, prioritization decisions, and workflow dynamics that are difficult to capture through static artifacts. Insights from a follow-up interview with a software engineering director further corroborate that developer labeling behavior is strongly influenced by workflow pressures and organizational constraints, reinforcing the challenges of treating such signals as objective ground truth.
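
Reproducing the agreement numbers above is straightforward bookkeeping once judge decisions and developer fixed/wontFix labels are aligned; a minimal sketch (with invented labels, independent of the G-Eval or LLM-as-a-Judge pipelines themselves) is:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical aligned label lists: developer actions vs. automated-judge decisions.
dev_labels   = ["fixed", "wontFix", "fixed", "fixed", "wontFix", "wontFix"]
judge_labels = ["fixed", "fixed",   "fixed", "wontFix", "wontFix", "wontFix"]

print("agreement ratio:", accuracy_score(dev_labels, judge_labels))
print("cohen's kappa:  ", cohen_kappa_score(dev_labels, judge_labels))
```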

arXiv Page | PDF

Score: 0

Point Cloud Registration for Fusion between SPECT MPI and CTA Images

Published: 2026-04-27 14:24:49

Authors: Ni Yao, Xiangyu Liu, Shaojie Tang, Danyang Sun, Chuang Han, Yanting Li, Jiaofen Nan, Chengyang Li, Fubao Zhu, Chen Zhao, Zhihui Xu, Weihua Zhou

Categories: cs.CV

Abstract:
Clinical fusion of Single Photon Emission Computed Tomography Myocardial Perfusion Imaging (SPECT MPI) and Computed Tomography Angiography (CTA) remains limited by cross-modality misregistration and reliance on manual landmarks, which can hinder accurate ischemia localization and lesion-level functional assessment. To address this issue, we propose a registration and fusion framework for SPECT MPI and CTA that integrates functional and structural information for comprehensive cardiac evaluation. The proposed pipeline performs U-Net-based segmentation on both modalities. On SPECT MPI, only the left ventricle (LV) is extracted, and anatomical landmarks are automatically derived from characteristic LV structures. On CTA, both ventricles are segmented, and their spatial relationship is used to automatically define landmarks at the interventricular septal junction. Scale-space consistency preprocessing and landmark-driven coarse registration are applied to mitigate initial misalignment. Based on this initialization, multiple fine registration methods are evaluated on LV epicardial surface point clouds, including ICP, SICP, CPD, CluReg, FFD, and BCPD-plus-plus. The resulting transformations are then propagated to voxel-level resampling for high-precision SPECT-CTA fusion. In a retrospective cohort of 60 patients, the proposed framework preserved sub-millimeter coronary detail from CTA while accurately overlaying quantitative SPECT perfusion. Among the evaluated methods, BCPD-plus-plus achieved the highest accuracy with a mean point cloud distance of 1.7 mm. By combining robust initialization, comparative fine registration, and voxel-level fusion, the proposed approach provides a practical solution for myocardial ischemia localization and functional evaluation of coronary lesions, while remaining independent of any specific fine registration algorithm.
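
Among the fine-registration baselines compared above, ICP is the simplest; a schematic rigid ICP iteration (nearest-neighbour correspondences plus an SVD/Kabsch update) is sketched below, omitting the scale-space preprocessing, landmark-driven initialization, and voxel-level fusion described in the abstract.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(source, target, iters=50):
    """Schematic rigid ICP: returns the source cloud aligned to the target (N x 3 arrays)."""
    src = source.copy()
    tree = cKDTree(target)
    for _ in range(iters):
        _, idx = tree.query(src)                 # nearest-neighbour correspondences
        matched = target[idx]
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        H = (src - mu_s).T @ (matched - mu_t)    # cross-covariance of centered clouds
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T                           # Kabsch: best-fit rotation
        if np.linalg.det(R) < 0:                 # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
    return src

# Toy usage: align a shifted copy of a cloud back onto the original.
pts = np.random.default_rng(0).normal(size=(500, 3))
aligned = icp(pts + np.array([0.5, 0.2, -0.3]), pts)
print("mean point-to-point distance:", np.linalg.norm(aligned - pts, axis=1).mean())
```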

arXiv Page | PDF

Score: 0

NTP2 topological structures

Published: 2026-04-27 14:24:17

Authors: Pablo Andújar Guerrero

Categories: math.LO

Abstract:
A subset of a topological space is constructible if it is a finite Boolean combination of closed sets. We prove that every NTP$_2$ expansion of $(\mathbb{R},<,+)$ by constructible sets defines only constructible sets, and that definable functions are generically piecewise continuous. The result also holds for all NTP$_2$ expansions of $(\mathbb{Q}_p,+,\cdot)$, and all NTP$_2$ definably complete expansions of ordered groups. In the latter case, the structure is generically locally o-minimal, has definable choice, and carries a well-behaved notion of naive topological dimension. For NIP uniform topological structures, constructibility of definable sets is preserved in the Shelah expansion. We classify strong expansions of $(\mathbb{R},<,+)$ by constructible sets, and obtain results on NTP$_2$ d-minimal structures.

arXiv Page | PDF

Score: 0

Sliding Mode Control for Safe Trajectory Tracking with Moving Obstacles Avoidance: Experimental Validation on Planar Robots

Published: 2026-04-27 14:20:10

Authors: Shubham Sawarkar, P Sangeerth, S Saharsh, Pushpak Jagtap

Categories: eess.SY, cs.RO, math.OC

Abstract:
This paper presents a unified control framework for robust trajectory tracking and moving obstacle avoidance applicable to a broad class of mobile robots. By formulating a generalized kinematic transformation, we convert diverse vehicle dynamics into a strict feedback form, facilitating the design of a Sliding Mode Control (SMC) strategy for precise and robust reference tracking. To ensure operational safety in dynamic environments, the tracking controller is integrated with a Collision Cone Control Barrier Function (C3BF) based safety filter. The proposed architecture guarantees asymptotic tracking in the presence of external disturbances while strictly enforcing collision avoidance constraints. The novelty of this work lies in designing a sliding mode controller for ground robots like the Ackermann drive, which has not been done before. The efficacy and versatility of the approach are validated through numerical simulations and extensive real-world experiments on three distinct platforms: an Ackermann-steered vehicle, a differential drive robot, and a quadrotor drone. A video of the experiments is available at https://youtu.be/dWcxwum96vk
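
For intuition only, the core sliding-mode tracking idea on a scalar double integrator (sliding surface $s = \dot e + λ e$, switching control with a boundary layer) can be sketched as follows; the paper's contribution is the generalized kinematic transformation that puts car-like, differential-drive, and quadrotor models into such a strict-feedback form, together with the C3BF safety filter, neither of which this toy includes.

```python
import numpy as np

# Toy sliding-mode tracking for x_ddot = u + d(t), following x_d(t) = sin(t).
# s = e_dot + lam*e;  u = x_d_ddot - lam*e_dot - k*sat(s/phi)  (boundary layer limits chatter).
lam, k, phi, dt = 2.0, 5.0, 0.05, 1e-3
x, xdot = 0.5, 0.0
for step in range(int(10 / dt)):
    t = step * dt
    xd, xd_dot, xd_ddot = np.sin(t), np.cos(t), -np.sin(t)
    e, e_dot = x - xd, xdot - xd_dot
    s = e_dot + lam * e
    u = xd_ddot - lam * e_dot - k * np.clip(s / phi, -1.0, 1.0)
    d = 0.3 * np.sin(5 * t)                # bounded matched disturbance (k > |d| ensures reaching)
    xdot += (u + d) * dt
    x += xdot * dt
print("final tracking error:", abs(x - np.sin(10.0)))
```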

arXiv Page | PDF

Score: 0

Self-Supervised Representation Learning via Hyperspherical Density Shaping

Published: 2026-04-27 14:03:01

Authors: Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

Categories: cs.CV

Abstract:
Modern self-supervised representation learning methods often rely on empirical heuristics that are not theoretically grounded. In this study we propose HyDeS, a theoretically grounded method based on multi-view mutual information maximization within a hyperspherical space, using Shannon differential entropy with a non-parametric von Mises-Fisher density estimator. We show that HyDeS biases the trained model towards focusing on foreground features of the images and performs well on segmentation tasks such as PASCAL VOC, while it lags in fine-grained classification. We provide a detailed analysis of the induced latent space geometry and learning dynamics, which can be used for designing other theoretically grounded self-supervised learning methods.

arXiv Page | PDF

Score: 0

Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI

Published: 2026-04-27 13:58:18

Authors: Parampuneet Kaur Thind, Vaibhav Katturu, Giacomo Zema, Roberto Del Prete

Categories: cs.CV, cs.AI, cs.ET, cs.LG, cs.NE

Abstract:
Designing deep networks that meet strict latency and accuracy constraints on edge accelerators increasingly relies on hardware-aware optimization, including neural architecture search (NAS) guided by device-level metrics. Yet most hardware-aware NAS pipelines still optimize architectures under full-precision assumptions and apply low-precision adaptation only after the search, leading to a mismatch between optimization-time behavior and deployment-time execution on low-precision hardware that can substantially degrade accuracy. We address this limitation by integrating deployment-aligned low-precision training directly into hardware-aware NAS. Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, enabling joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. We evaluate the proposed framework on vessel segmentation for spaceborne maritime monitoring, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU). While post-training precision conversion reduces on-device performance from 0.85 to 0.78 mIoU, deployment-aligned low-precision training achieves 0.826 mIoU on-device for the same architecture (95,791 parameters), recovering approximately two-thirds of the deployment-induced accuracy gap without increasing model complexity. These results demonstrate that incorporating deployment-consistent numerical constraints into hardware-aware NAS substantially improves robustness and alignment between optimization and deployment for resource-constrained edge Artificial Intelligence (AI).
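
A minimal sketch of "deployment-aligned" low-precision fine-tuning is shown below, using PyTorch autocast so the candidate network is optimised and scored under reduced-precision numerics. The tiny convolutional model, data, and hyperparameters are placeholders, and the sketch does not reproduce the paper's NAS loop or the Myriad X toolchain; on machines without a GPU it falls back to bfloat16 just so it runs.

```python
# Sketch: expose a candidate architecture to low-precision numerics during fine-tuning,
# so the metrics used to rank it reflect deployment-time precision.
# The tiny segmentation head, data, and hyperparameters are placeholders.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16  # FP16 on GPU, bf16 CPU fallback

model = nn.Sequential(                       # stand-in for a NAS candidate
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, 1),
).to(device)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

x = torch.randn(4, 3, 64, 64, device=device)                       # placeholder images
y = (torch.rand(4, 1, 64, 64, device=device) > 0.5).float()        # placeholder masks

for step in range(10):
    opt.zero_grad()
    # Forward/backward under low precision so the candidate is trained with the same
    # numerical constraints it will face on the low-precision accelerator.
    with torch.autocast(device_type=device, dtype=amp_dtype):
        logits = model(x)
        loss = nn.functional.binary_cross_entropy_with_logits(logits, y)
    scaler.scale(loss).backward()
    scaler.step(opt)
    scaler.update()
print(f"final loss: {loss.item():.4f}")
```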

arXiv Page | PDF

Score: 0

Integral representation of polynomial local functionals on convex functions

Published: 2026-04-27 13:54:23

Authors: Jonas Knoerr

Categories: math.FA

Abstract:
Integral representations for continuous polynomial local functionals on convex functions are established in terms of a finite family of polynomials. This result is obtained by approximation from a classification of the dense subspace of smooth polynomial local functionals, which is based on a Paley--Wiener--Schwartz-type classification of the Goodey--Weil distributions associated to these functionals under support restrictions. As an application, density results for various families of Monge--Ampère-type operators are established.

arXiv Page | PDF

Score: 0

Blur Effects on User Performance in Target-Pointing Tasks

Published: 2026-04-27 13:48:21

Authors: Ryuto Tomihari, Taiki Kinoshita, Yosuke Oba, Shota Yamanaka, Homei Miyashita

Categories: cs.HC

Abstract:
In projectors and head-mounted displays, an out-of-focus image appears blurred. Even when a display itself is in focus, computer operation may be hindered if the display is far from the user or if a user has poor visual acuity, because the user cannot see the screen clearly. In this study, we conducted an experiment in which participants performed a pointing task under blurred display conditions and investigated the relationship between blur strength and user performance. The results showed that movement time and error rate increased as blur became stronger, and that the effect of blur on movement time was larger when targets were smaller. We further showed that movement time can be estimated with high accuracy by a model that improves on Fitts' law. In a follow-up experiment to examine the applicability of this model, we adjusted target size for each participant and showed that the effect of blur level on movement time could be reduced. These findings suggest potential use in tools that adapt user interfaces to users' visual acuity.
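
The abstract refers to a model that improves on Fitts' law but does not give its form; the sketch below fits only the standard law, MT = a + b·log2(D/W + 1), as the baseline against which a blur-dependent extension would be compared. All distances, widths, and movement times in the example are invented for illustration.

```python
# Baseline Fitts' law fit: MT = a + b * log2(D/W + 1).
# The paper's blur-aware extension is not specified in the abstract, so only the
# standard model is shown; the data values below are invented.
import numpy as np

D  = np.array([128, 256, 512, 128, 256, 512], dtype=float)   # target distances (px)
W  = np.array([16, 16, 16, 48, 48, 48], dtype=float)         # target widths (px)
MT = np.array([612, 718, 824, 421, 509, 603], dtype=float)   # movement times (ms), invented

ID = np.log2(D / W + 1.0)               # index of difficulty (bits)
b, a = np.polyfit(ID, MT, 1)            # least-squares slope and intercept
pred = a + b * ID
r2 = 1 - np.sum((MT - pred) ** 2) / np.sum((MT - MT.mean()) ** 2)
print(f"a = {a:.1f} ms, b = {b:.1f} ms/bit, R^2 = {r2:.3f}")
```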

arXiv Page | PDF

Score: 0

Agentic clinical reasoning over longitudinal myeloma records: a retrospective evaluation against expert consensus

Published: 2026-04-27 13:41:18

Authors: Johannes Moll, Jannik Lübberstedt, Christoph Nuernbergk, Jacob Stroh, Luisa Mertens, Anna Purcarea, Christopher Zirn, Zeineb Benchaaben, Fabian Drexel, Hartmut Häntze, Anirudh Narayanan, Friedrich Puttkammer, Andrei Zhukov, Jacqueline Lammert, Sebastian Ziegelmayer, Markus Graf, Marion Högner, Marcus Makowski, Florian Bassermann, Lisa C. Adams, Jiazhen Pan, Daniel Rueckert, Krischan Braitsch, Keno K. Bressem

Categories: cs.AI, cs.CL

Abstract:
Multiple myeloma is managed through sequential lines of therapy over years to decades, with each decision depending on cumulative disease history distributed across dozens to hundreds of heterogeneous clinical documents. Whether LLM-based systems can synthesise this evidence at a level approaching expert agreement has not been established. A retrospective evaluation was conducted on longitudinal clinical records of 811 myeloma patients treated at a tertiary centre (2001-2026), covering 44,962 documents and 1,334,677 laboratory values, with external validation on MIMIC-IV. An agentic reasoning system was compared against single-pass retrieval-augmented generation (RAG), iterative RAG, and full-context input on 469 patient-question pairs from 48 templates at three complexity levels. Reference labels came from double annotation by four oncologists with senior haematologist adjudication. Iterative RAG and full-context input converged on a shared ceiling (75.4% vs 75.8%, p = 1.00). The agentic system reached 79.6% concordance (95% CI 76.4-82.8), exceeding both baselines (+3.8 and +4.2 pp; p = 0.006 and 0.007). Gains rose with question complexity, reaching +9.4 pp on criteria-based synthesis (p = 0.032), and with record length, reaching +13.5 pp in the top decile (n = 10). The system error rate (12.2%) was comparable to expert disagreement (13.6%), but severity was inverted: 57.8% of system errors were clinically significant versus 18.8% of expert disagreements. Agentic reasoning was the only approach to exceed the shared ceiling, with gains concentrated on the most complex questions and longest records. The greater clinical consequence of residual system errors indicates that prospective evaluation in routine care is required before these findings translate into patient benefit.

arXiv Page | PDF

Score: 0

Zero-shot Large Language Models for Automatic Readability Assessment

Published: 2026-04-27 13:38:44

Authors: Riley Grossman, Yi Chen

Categories: cs.CL

Abstract:
Unsupervised automatic readability assessment (ARA) methods have important practical and research applications (e.g., ensuring medical or educational materials are suitable for their target audiences). In this paper, we propose a new zero-shot prompting methodology for ARA and present the first comprehensive evaluation of using large language models (LLMs) as an unsupervised ARA method by testing 10 diverse open-source LLMs (e.g., different sizes and developers) on 14 diverse datasets (e.g., different text lengths and languages). Our findings show that our proposed prompting methodology outperforms prior methods on 13 of the 14 datasets. Furthermore, we propose LAURAE, which combines LLM and readability formula scores to improve robustness by capturing both contextual and shallow (e.g., sentence length) features of readability. Our evaluation demonstrates that LAURAE robustly outperforms prior methods across languages, text lengths, and amounts of technical language.
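
The abstract does not give LAURAE's exact combination rule, so the sketch below illustrates only the general idea: a zero-shot LLM readability score and a shallow formula score (Flesch Reading Ease via a crude syllable heuristic) are standardised and averaged. The function `llm_readability_score` is a hypothetical stand-in for whatever prompting method is used, and the z-score average is an assumed combination, not LAURAE's formulation.

```python
# Sketch: combine an LLM readability score with a shallow readability-formula score.
# Assumptions: `llm_readability_score` is a hypothetical placeholder for a zero-shot
# prompting call, the syllable counter is a rough heuristic, and the z-score average
# is illustrative rather than LAURAE's actual rule.
import re
import statistics

def count_syllables(word):
    """Crude English syllable heuristic: count runs of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z]+", text)
    n = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

def llm_readability_score(text):
    """Hypothetical stand-in: would prompt an LLM to rate readability, e.g. on 0-100."""
    return 50.0

def combined_score(texts):
    """Z-score each signal across the corpus and average them (illustrative rule)."""
    llm = [llm_readability_score(t) for t in texts]
    formula = [flesch_reading_ease(t) for t in texts]
    def z(xs):
        mu, sd = statistics.mean(xs), statistics.pstdev(xs) or 1.0
        return [(x - mu) / sd for x in xs]
    return [(a + b) / 2 for a, b in zip(z(llm), z(formula))]

docs = ["The cat sat on the mat.", "Pharmacokinetic heterogeneity complicates dosing."]
print(combined_score(docs))
```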

arXiv Page | PDF

Score: 0

A Survey on Split Learning for LLM Fine-Tuning: Models, Systems, and Privacy Optimizations

Published: 2026-04-27 13:36:54

Authors: Zihan Liu, Yizhen Wang, Rui Wang, Xiu Tang, Sai Wu

Categories: cs.CR, cs.CL, cs.DC, cs.LG

Abstract:
Fine-tuning unlocks large language models (LLMs) for specialized applications, but its high computational cost often puts it out of reach for resource-constrained organizations. While cloud platforms could provide the needed resources, data privacy concerns make sharing sensitive information with third parties risky. A promising solution is split learning for LLM fine-tuning, which divides the model between clients and a server, allowing collaborative and secure training through exchanged intermediate data, thus enabling resource-constrained participants to adapt LLMs safely. In light of this, a growing body of literature has emerged to advance this paradigm, introducing varied model methods, system optimizations, and privacy defense-attack techniques for split learning. To bring clarity and direction to the field, a comprehensive survey is needed to classify, compare, and critique these diverse approaches. This paper fills the gap by presenting the first extensive survey dedicated to split learning for LLM fine-tuning. We propose a unified, fine-grained training pipeline to pinpoint key operational components and conduct a systematic review of state-of-the-art work across three core dimensions: model-level optimization, system-level efficiency, and privacy preservation. Through this structured taxonomy, we establish a foundation for advancing scalable, robust, and secure collaborative LLM adaptation.
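
To make the split-learning exchange concrete, the sketch below partitions a small network at a cut layer: the "client" computes activations locally, and only those activations and their gradients cross the client/server boundary, never the raw inputs. The toy MLP, the cut point, and the single-process "server" are placeholders; real systems add communication, parameter-efficient adapters, and the privacy defences the survey covers.

```python
# Sketch of the basic split-learning exchange: raw data stays on the client,
# only cut-layer activations and their gradients cross the boundary.
# The tiny MLP, cut point, and single-process "server" are illustrative placeholders.
import torch
import torch.nn as nn

client_part = nn.Sequential(nn.Linear(32, 64), nn.ReLU())                      # runs on the client
server_part = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 2))    # runs on the server
opt = torch.optim.SGD(list(client_part.parameters()) + list(server_part.parameters()), lr=0.1)

x = torch.randn(8, 32)                 # private client data (never sent)
y = torch.randint(0, 2, (8,))

for step in range(5):
    opt.zero_grad()
    smashed = client_part(x)                          # client forward up to the cut layer
    smashed_sent = smashed.detach().requires_grad_()  # what actually crosses the boundary
    logits = server_part(smashed_sent)                # server continues the forward pass
    loss = nn.functional.cross_entropy(logits, y)
    loss.backward()                                   # server backward down to the cut layer
    smashed.backward(smashed_sent.grad)               # activation gradients returned to the client
    opt.step()
print(f"loss after a few steps: {loss.item():.4f}")
```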

arXiv Page | PDF

Score: 0

Adaptive Tensor Network Sampling for Quantum Optimal Control

Published: 2026-04-27 13:36:00

Authors: Zeki Zeybek, Rick Mukherjee, Peter Schmelcher

Categories: quant-ph, physics.comp-ph

Abstract:
Quantum optimal control (QOC) provides a systematic framework for achieving high-fidelity operations in quantum systems and plays a central role in tasks such as gate synthesis, state transfer, and pulse design. Existing QOC methods broadly fall into two categories: gradient-based and gradient-free algorithms. The associated optimization landscape is often high-dimensional, non-convex, and populated by numerous local minima, making efficient gradient-free search strategies essential. To address this, we introduce a gradient-free matrix product state/tensor train (MPS/TT) sampling heuristic for discrete quantum optimal control. In our approach, the MPS defines a score function over the space of discrete control parameters, which in turn induces a sampling distribution over candidate control sequences. This distribution is iteratively refined through selection of better-performing candidates and local tensor updates that bias the search toward high-performing sequences. We evaluate the method on a range of benchmark problems, including single-qubit state transfer, Bell-pair preparation, qutrit gate implementation, and open-system population transfer. Across these tasks, the method exhibits stable convergence behavior and competitive empirical performance relative to established gradient-free baselines. These results suggest that tensor network sampling offers a viable heuristic framework for discrete quantum control.
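
The sample-select-update loop can be illustrated with a drastic simplification: replacing the MPS score function with independent per-timestep probability tables (a rank-1 factorisation), which reduces the heuristic to a cross-entropy-method-style search over discrete pulse sequences. The "fidelity" below is a toy objective (agreement with a hidden target sequence) standing in for an actual state-transfer or gate-fidelity evaluation; none of this is the paper's tensor-network construction.

```python
# Rank-1 simplification of the sample/select/update loop: independent per-timestep
# probability tables instead of an MPS, i.e. a cross-entropy-method-style search.
# The "fidelity" is a toy objective, not a quantum simulation.
import numpy as np

rng = np.random.default_rng(0)
T, A = 20, 4                              # time steps, discrete control amplitudes per step
target = rng.integers(0, A, size=T)       # hidden optimum (placeholder for the best pulse)
probs = np.full((T, A), 1.0 / A)          # per-step sampling distribution (rank-1 "MPS")

def fidelity(seq):
    return np.mean(seq == target)         # toy score in [0, 1]

for it in range(30):
    # sample a batch of candidate control sequences from the current distribution
    batch = np.array([[rng.choice(A, p=probs[t]) for t in range(T)] for _ in range(64)])
    scores = np.array([fidelity(s) for s in batch])
    elites = batch[np.argsort(scores)[-8:]]          # keep the best-performing sequences
    for t in range(T):                               # local update toward elite statistics
        counts = np.bincount(elites[:, t], minlength=A)
        probs[t] = 0.7 * probs[t] + 0.3 * counts / counts.sum()
print("best fidelity found:", fidelity(batch[np.argmax(scores)]))
```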

arXiv Page | PDF

Score: 0

TextGround4M: A Prompt-Aligned Dataset for Layout-Aware Text Rendering

Published: 2026-04-27 13:28:57

Authors: Dongxing Mao, Yilin Wang, Linjie Li, Zhengyuan Yang, Alex Jinpeng Wang

Categories: cs.CV

Abstract:
Despite recent advances in text-to-image generation, models still struggle to accurately render prompt-specified text with correct spatial layout -- especially in multi-span, structured settings. This challenge is driven not only by the lack of datasets that align prompts with the exact text and layout expected in the image, but also by the absence of effective metrics for evaluating layout quality. To address these issues, we introduce TextGround4M, a large-scale dataset of over 4 million prompt-image pairs, each annotated with span-level text grounded in the prompt and corresponding bounding boxes. This enables fine-grained supervision for layout-aware, prompt-grounded text rendering. Building on this, we propose a lightweight training strategy for autoregressive T2I models that appends layout-aware span tokens during training, without altering model architecture or inference behavior. We further construct a benchmark with stratified layout complexity to evaluate both open-source and proprietary models in a zero-shot setting. In addition, we introduce two layout-aware metrics to address the long-standing lack of spatial evaluation in text rendering. Our results show that models trained on TextGround4M outperform strong baselines in text fidelity, spatial accuracy, and prompt consistency, highlighting the importance of fine-grained layout supervision for grounded T2I generation.

arXiv Page | PDF

Score: 0

Thermodynamic Parametrisation of the Vertebrate Lifetime Cycle Invariant: Biological Proper Time, Allometric Mass-Cancellation, and Clade-Specific Predictions

Published: 2026-04-27 13:28:43

Authors: Mesfin Taye

Categories: cond-mat.stat-mech

Abstract:
Warm-blooded vertebrates accumulate approximately $N^{\ast} \approx 10^9$ cardiac cycles over a natural lifetime, a striking empirical regularity first quantified by Lindstedt and Calder yet lacking a physical interpretation. We propose that this invariance is consistent with a conserved thermodynamic budget, formulated here as the Principle of Biological Time Equivalence (PBTE). The framework rests on a constitutive closure $\dot{\Sigma} = \sigma_0 f$, which links the entropy production rate to the intrinsic physiological frequency; integration over the lifespan yields $\Sigma_{\mathrm{life}} = \sigma_0 N^{\ast}$, so that the observed constancy of $N^{\ast}$ corresponds to an approximately constant lifetime entropy budget. Algebraic exponent cancellation under Kleiber and Calder scaling laws, $\sigma^{\ast} \propto M^{3/4+1/4-1} = M^{0}$, is consistent with mass-independence and reproduces the numerical value $N_0 \approx 1.52\times10^9$ without free parameters. The framework offers a thermodynamically consistent account of two outstanding problems: the origin of the numerical value of $N^{\ast}$ and the systematic deviations observed across clades. A multiplicative correction factor $\Phi_C$, constructed from physiological determinants -- activity allocation, body temperature, mitochondrial efficiency, and extrinsic hazard -- predicts long-lived clades as regimes of reduced effective entropy production per cardiac cycle.
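
For readers unfamiliar with the allometric bookkeeping, one plausible reading of the exponent cancellation quoted in the abstract is sketched below; the identification of each exponent with a particular Kleiber or Calder scaling law is our assumption, not the paper's derivation.

```latex
% One plausible reading of the exponent cancellation (our assumption, not the paper's text):
% Kleiber scaling:  whole-body metabolic rate  B \propto M^{3/4}
% Calder scaling:   lifespan  T \propto M^{1/4},  heart rate  f \propto M^{-1/4}
\begin{align*}
  N^{\ast} &= \int_{0}^{T} f \, dt \;\propto\; M^{-1/4} \cdot M^{1/4} \;=\; M^{0}, \\
  \sigma^{\ast} &\;\propto\; \frac{B \, T}{M}
     \;\propto\; M^{3/4 + 1/4 - 1} \;=\; M^{0},
\end{align*}
% so both the lifetime cycle count and the mass-specific lifetime budget
% come out independent of body mass.
```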

arXiv Page | PDF

Score: 0

On the Footprints of Reviewer Bots Feedback on Agentic Pull Requests in OSS GitHub Repositories

Published: 2026-04-27 13:17:13

Authors: Syeda Kaneez Fatima, Yousuf Abrar, Abdul Rehman Tahir, Amelia Nawaz, Shamsa Abid, Abdul Ali Bangash

Categories: cs.SE

Abstract:
Autonomous coding agents are reshaping software development by creating pull requests (PRs) on GitHub, referred to as agentic PRs. In parallel, the review process is also becoming autonomous, thereby making reviewer bots key actors in the assessment of these agentic PRs. However, their influence on PR acceptance and resolution remains unclear. This study empirically investigates the relationship between reviewer-bot feedback and PR outcomes by analyzing how Reviewer Bot Feedback Quality (relevance, clarity, conciseness) and Reviewer Bot Activity Volume (comment count) are associated with PR acceptance and resolution time. We analyze 7,416 reviewer-bot comments on 4,532 PRs from the AI_Dev dataset (a dataset that captured AI agents' PRs in GitHub projects). Our results show that reviewer-bot comments mainly focus on bug fixes, testing, and documentation, are civil in tone, and are prescriptive in nature. Reviewer bots generally produce clear and concise feedback, though the semantic relevance of comments to underlying code changes is moderate. We find that higher Reviewer Bot Activity volume is associated with longer PR resolution times and lower average feedback quality, showing that as bots generate more comments on a PR, the average pertinence of that feedback appears to degrade. At the same time, Reviewer Bot Feedback Quality shows no meaningful association with workflow outcomes. Our findings suggest that, in agentic PR workflows, reviewer bots should prioritize targeted high-relevance feedback over generating large numbers of comments.

arXiv Page | PDF

Score: 0

Envisioning Mobile Data Visualization Libraries for Digital Health

Published: 2026-04-27 13:13:50

Authors: Bongshin Lee, Seongjae Bae, Mengying Li, Eun Kyoung Choe

Categories: cs.HC

Abstract:
Mobile health (mHealth) applications support health management through rich data collection and self-reflection, yet the quality of their visualizations varies widely. A key limitation is the suboptimal design of visualizations for small-screen devices. We argue that this gap is partly driven by a lack of specialized developer tools. Existing libraries primarily target desktop or general-purpose mobile use, providing limited support for health-specific semantics such as normal ranges, thresholds, and goals. As a result, developers often resort to custom solutions that are inconsistent or hard to interpret. We therefore advocate for dedicated mobile visualization libraries tailored to personal health data and mobile contexts, and discuss key design considerations including intelligent defaults, built-in health annotations, and fluid interactions. Such libraries can lower barriers, promote consistency, and enable more accessible and interpretable mHealth applications.

arXiv Page | PDF

Score: 0

Energy spectrum of magnetic fields from electroweak symmetry breaking

Published: 2026-04-27 13:12:08

Authors: Károly Seller, Günter Sigl

Categories: hep-ph

Abstract:
We study the magnetic fields produced in the early Universe during electroweak symmetry breaking by considering random configurations of an inhomogeneous Higgs field. By exploiting the inherent randomness of the initial configurations, the spectrum of the produced magnetic field is essentially analytic, which bypasses the need for costly lattice simulations. On the numerical side, we devise a simulation framework which results in continuous fields capable of resolving the small-scale structure of the fields that was inaccessible to the lattice-based calculation. Finally, by revisiting the effects of statistical isotropy and causality on the spectrum, we define general correlation functions that are then fitted to the simulation data and compared to the analytic results.

arXiv Page | PDF

Score: 0

Can You Make It Sound Like You? Post-Editing LLM-Generated Text for Personal Style

Published: 2026-04-27 13:11:57

Authors: Connor Baumler, Calvin Bao, Huy Nghiem, Xinchen Yang, Marine Carpuat, Hal Daumé

Categories: cs.CL

Abstract:
Despite the growing use of large language models (LLMs) for writing tasks, users may hesitate to rely on LLMs when personal style is important. Post-editing LLM-generated drafts or translations is a common collaborative writing strategy, but it remains unclear whether users can effectively reshape LLM-generated text to reflect their personal style. We conduct a pre-registered online study ($n=81$) in which participants post-edit LLM-generated drafts for writing tasks where personal style matters to them. Using embedding-based style similarity metrics, we find that post-editing increases stylistic similarity to participants' unassisted writing and reduces similarity to fully LLM-generated output. However, post-edited text remains closer in style to LLM text than to participants' unassisted control text, and it exhibits reduced stylistic diversity compared to unassisted human text. We find a gap between perceived stylistic authenticity and model-measured stylistic similarity, with post-edited text often perceived as representative of participants' personal style despite retaining detectable LLM stylistic traces.

arXiv Page | PDF

Score: 0

Weighted Directional Total Nuclear Variation for Joint Yttrium-90 PET/SPECT Reconstruction with CTAC-derived Guidance

Published: 2026-04-27 13:03:35

Authors: S Porter, D Deidda, D R McGowan, J Anton-Rodriguez, S Arridge, K Thielemans

Categories: physics.med-ph

Abstract:
Quantitative post-treatment activity imaging is essential for personalised dosimetry after Yttrium-90 selective internal radiation therapy (SIRT). Yttrium-90 PET offers high spatial resolution but is extremely low-count, whereas bremsstrahlung SPECT has higher count statistics but is degraded by blur, scatter, and septal penetration. Since both modalities image the same microsphere distribution, joint synergistic reconstruction can exploit their physical coupling, with CT attenuation correction (CTAC) providing additional anatomical guidance. We propose weighted directional total nuclear variation (w-dTNV), a joint variational regulariser for coupled PET/SPECT reconstruction with CTAC-guided anisotropy. w-dTNV penalises the nuclear norm of a dual-modality Jacobian, promoting co-located, geometry-consistent edges without forcing intensity correlation. Directionality is derived from the CTAC attenuation map $\mu$ and applied to PET/SPECT gradients, allowing efficient per-voxel spectral computations. PET/SPECT scale disparity is mitigated using data-driven modality normalisation from preliminary reconstructions. We evaluate w-dTNV on a NEMA IEC phantom with 20 bootstrapped PET noise realisations and on 9 post-SIRT patients with 45 lesions, against dTV, w-TNV, and sequential hybrid kernel expectation maximisation (SHKEM). In the phantom, w-dTNV improves recovery coefficients over dTV and w-TNV, and improves recovery over SHKEM for the smallest spheres. In patients, w-dTNV gives higher tumour-to-background ratios than SHKEM at comparable background activity. These results suggest that CTAC-guided synergistic variational coupling improves lesion recovery and clinical lesion contrast, offering a practical route towards more stable post-SIRT Yttrium-90 activity estimates for personalised dosimetry.
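
A schematic form of the regulariser described in the abstract is sketched below, assuming the standard total-nuclear-variation construction with a per-voxel directional weighting operator built from the CT attenuation map; the precise weighting and normalisation used in the paper are not given in the abstract.

```latex
% Schematic (assumed) form of a weighted directional TNV coupling term: the nuclear
% norm ||.||_* of the stacked, directionally weighted PET/SPECT gradients, with the
% weighting operator D_mu(x) built from the CTAC attenuation map mu (e.g. from grad mu).
\begin{align*}
  R_{\mathrm{w\text{-}dTNV}}(u_{\mathrm{PET}}, u_{\mathrm{SPECT}})
    \;=\; \sum_{x} \Big\lVert
      \big[\, D_{\mu}(x)\,\nabla u_{\mathrm{PET}}(x)
      \;\;\; D_{\mu}(x)\,\nabla u_{\mathrm{SPECT}}(x) \,\big]
    \Big\rVert_{*}
\end{align*}
```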

arXiv Page | PDF

Score: 0