arXiv cs.AI Daily Update

cs.AI, April 17, 2026: 356 papers updated.

Overall trend: today's papers focus mainly on LLM Agent, LLM Reasoning, and Medical AI.

Accepted papers: [24](ICLR 2026 Workshop), [25](ACL 2026), [27](ACL 2026), [30](ICAIL 2026), [35](ACL 2026), [48](ICME 2026), [49](ACL 2026 Findings), [57](ICLR 2026 Workshop), [58](ACL 2026 Findings), [63](UNLP 2026), [68](ICLR 2026), [78](ECIS 2026), [84](CAIS 2026), [85](SIGIR 2026), [90](ACL 2026), [99](CCL 2025), [104](LREC 2026 Workshop), [105](CVPR 2026), [118](JAIR), [120](ICLR 2026 Workshop), [121](PAKDD 2026), [135](SIGIR 2026), [138](ICLR 2026), [139](ACL 2026), [144](SAGAI 2026 Workshop), [147](ICLR 2026 Workshop), [150](ACL 2026), [156](FEVER 2026), [157](Canadian AI 2026), [171](IEEE LangSec 2026 Workshop), [177](TOIS), [180](ACL 2026), [181](IEEE S&P 2026), [186](ACL 2026), [189](CVPR 2026 Findings), [192](IEEE TPAMI), [193](ACL 2026 Findings), [197](ACL 2026 Findings), [201](SIGIR 2026), [202](IJCNN 2026), [203](ACL 2026 Findings), [209](ICPR 2026), [212](CHI 2026 Workshop), [218](GECCO 2026), [219](DAC 2026), [225](CVPR 2026 Workshop), [226](ICLR 2026), [239](MLST 2026), [242](ACL 2026), [243](ACL 2026), [247](ICLR 2026), [249](ACL 2026), [251](ACL 2026), [252](L4DC 2026), [253](IEEE ISBI 2026), [254](ACL 2026), [256](ACL 2026 Findings), [257](ACL 2026), [266](AAMAS 2026 Workshop), [267](ACL 2026), [268](MIDL 2026), [269](ACL 2026 Findings), [272](ACL 2026 Findings), [274](Journal of Manufacturing Systems), [275](Robotics and Computer-Integrated Manufacturing), [276](AAAI 2025 Workshop), [284](TMLR), [285](Artificial Intelligence Review), [288](ICLR 2026), [289](EMBC 2026), [291](ACL 2026 Findings), [292](KDD 2025 Workshop), [297](ICASSP 2026), [302](IJCNN 2026), [304](FSE 2026), [305](ACL 2026 Findings), [306](ACL 2026), [307](ACL 2026), [309](ACL 2026), [310](ISIT 2026), [311](ACL 2026), [316](ACL 2026), [317](L4DC 2026), [320](ACL 2026 Findings), [322](ACL Findings 2026), [324](ACL 2026), [326](IEEE EMBC 2026), [331](IEEE RA-L 2026), [334](ACL 2026), [337](ECIS 2026), [340](ACL 2026), [341](ACL 2026), [351](ACL 2026), [352](ACL 2026)

Open-source papers: [16](code), [26](code), [27](code), [35](code), [58](code), [90](code), [96](code), [106](code), [114](code), [123](code), [128](code), [136](code), [140](code), [158](code), [159](code), [164](code), [170](code), [177](code), [183](code), [186](code), [195](code), [203](code), [214](code), [220](code), [234](code), [239](code), [243](code), [244](code), [245](code), [246](code), [249](code), [251](code), [269](code), [270](code), [283](code), [290](code), [291](code), [292](code), [298](code), [300](code), [304](code), [306](code), [307](code), [311](code), [313](code), [320](code), [322](code), [330](code), [331](code), [340](code), [341](code), [351](code), [352](code)


New submissions (92)

[1] NuHF Claw: A Risk Constrained Cognitive Agent Framework for Human Centered Procedure Support in Digital Nuclear Control Rooms

  • arXiv: 2604.14160
  • Authors: Xingyu Xiao, Jiejuan Tong, Jun Sun, Zhe Sui, Peng Chen, Jingang Liang, Haitao Wang
  • Subjects: cs.AI
  • Tags: LLM Agent, AI Safety, Cognitive Science
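  • Summary: This paper proposes NuHF Claw, a risk-constrained cognitive agent framework for digital control rooms in nuclear power plants. By tightly coupling cognitive state inference with probabilistic safety assessment, it provides human-AI collaborative decision support with real-time risk governance.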

[2] Simulating Human Cognition: Heartbeat-Driven Autonomous Thinking Activity Scheduling for LLM-based AI systems

  • arXiv: 2604.14178
  • Authors: Hong Su
  • Subjects: cs.AI; q-bio.NC
  • Tags: LLM Agent, Meta-Learning, Cognitive Science
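  • Summary: This paper proposes a heartbeat-driven mechanism for scheduling autonomous thinking activities. A periodic heartbeat coordinates cognitive modules (planner, critic, recaller, dreamer), enabling LLM agents to proactively and adaptively adjust their cognitive strategies based on historical data.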

[3] Fun-TSG: A Function-Driven Multivariate Time Series Generator with Variable-Level Anomaly Labeling

  • arXiv: 2604.14221
  • Authors: Pierre Lotte, André Péninou, Olivier Teste
  • Subjects: cs.AI
  • Tags: Time Series Generation, Anomaly Detection, Data Synthesis
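  • Summary: This paper proposes Fun-TSG, a customizable multivariate time series generator that supports fine-grained anomaly labeling at the variable and timestamp level, providing transparent and interpretable benchmark data for anomaly detection systems.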

[4] Interpretable and Explainable Surrogate Modeling for Simulations: A State-of-the-Art Survey and Perspectives on Explainable AI for Decision-Making

  • arXiv: 2604.14240
  • Authors: Pramudita Satria Palar, Paul Saves, Muhammad Daffa Robani, Nicolas Verstaevel, Moncef Garouani, Julien Aligon, Koji Shimoyama, Joseph Morlier, Benoit Gaudou
  • Subjects: cs.AI; cs.LG; stat.ML
  • Tags: Explainable AI, Survey, Interpretability
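  • Summary: This survey systematically reviews how explainable AI techniques apply at each stage of the simulation surrogate modeling workflow, identifies where the two complementary fields meet, and proposes a research agenda for embedding explainability into simulation-driven decision-making.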

[5] Formalizing Kantian Ethics: Formula of the Universal Law Logic (FULL)

  • arXiv: 2604.14254
  • Authors: Taylor Olson
  • Subjects: cs.AI; cs.LO
  • Tags: AI Ethics, Formal Methods, AI Safety
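  • Summary: This paper proposes FULL (Formula of the Universal Law Logic), a many-sorted quantified modal logic for formalizing Kantian ethics, enabling AI agents to evaluate actions according to their purposes without built-in moral intuitions.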

[6] GFT: From Imitation to Reward Fine-Tuning with Unbiased Group Advantages and Dynamic Coefficient Rectification

  • arXiv: 2604.14258
  • Authors: Wangjie Gan, Miao Pan, Linbo Xi, Wenqi Zhang, Jintao Chen, Jianwei Yin, Xuhong Zhang
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Training, Reinforcement Learning, Instruction Tuning
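  • Summary: This paper proposes the Group Fine-Tuning (GFT) framework, which addresses inherent limitations of SFT through group advantage learning and dynamic coefficient rectification, striking a better balance between knowledge injection efficiency and generalization.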

[7] Seeing Through Experts Eyes A Foundational Vision Language Model Trained on Radiologists Gaze and Reasoning

  • arXiv: 2604.14316
  • Authors: Kinhei Lee, Peiyuan Jing, Zhenxuan Zhang, Yue Yang, Tao Wang, Dominic C Marshall, Yingying Fang, Guang Yang
  • Subjects: cs.AI
  • Tags: Vision-Language Model, Medical AI, Interpretability
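  • Summary: This paper proposes GazeX, a vision-language model that incorporates radiologists' eye-gaze trajectory data during pretraining to learn expert diagnostic reasoning patterns, improving the accuracy and interpretability of chest X-ray interpretation.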

[8] Mistake gating leads to energy and memory efficient continual learning

  • arXiv: 2604.14336
  • Authors: Aaron Pache, Mark CW van Rossum
  • Subjects: cs.AI
  • Tags: Continual Learning, Energy Efficiency, Neuromorphic Computing
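  • Summary: This paper proposes mistake-gated learning, a mechanism inspired by the human negativity bias that performs synaptic updates only on classification errors. It reduces the number of parameter updates by 50-80% and is especially suited to continual and online learning.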

[9] Credo: Declarative Control of LLM Pipelines via Beliefs and Policies

  • arXiv: 2604.14401
  • Authors: Duo Lu, Andrew Crotty, Uğur Çetintemel
  • Subjects: cs.AI; cs.DB
  • Tags: LLM Agent, LLM Inference, AI Safety
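  • Summary: This paper proposes Credo, a framework that represents semantic state as beliefs and governs LLM pipeline behavior through declarative policies, enabling auditable, composable adaptive execution.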

[10] Equifinality in Mixture of Experts: Routing Topology Does Not Determine Language Modeling Quality

  • arXiv: 2604.14419
  • Authors: Ivan Ternovtsii, Yurii Bilak
  • Subjects: cs.AI
  • Tags: Mixture-of-Experts, LLM Inference, Interpretability
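  • Summary: Through 62 controlled experiments, this paper shows that routing topology in sparse Mixture-of-Experts models does not determine language modeling quality: five different routing variants reach statistically equivalent perplexity.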

[11] Demonstration of Pneuma-Seeker: Agentic System for Reifying and Fulfilling Information Needs on Tabular Data

  • arXiv: 2604.14422
  • Authors: Muhammad Imam Luthfi Balaka, Raul Castro Fernandez
  • Subjects: cs.AI
  • Tags: LLM Agent, Tabular Learning, Data Integration
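  • Summary: This paper demonstrates Pneuma-Seeker, a system that reifies users' information needs as inspectable relational specifications, supports iterative refinement of those needs and targeted data discovery, and makes the LLM a transparent analytical collaborator.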

[12] Geometric Routing Enables Causal Expert Control in Mixture of Experts

  • arXiv: 2604.14434
  • Authors: Ivan Ternovtsii, Yurii Bilak
  • Subjects: cs.AI
  • Tags: Mixture-of-Experts, Interpretability, LLM Inference
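  • Summary: As a companion to the authors' earlier paper, this paper shows that individual experts in MoE models are monosemantic in a causally meaningful sense, and that cosine-similarity routing makes expert specialization directly inspectable and controllable.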

[13] On Tackling Complex Tasks with Reward Machines and Signal Temporal Logics

  • arXiv: 2604.14440
  • Authors: Ana María Gómez Ruiz, Thao Dang, Alexandre Donzé
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Automated Planning, Formal Methods
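  • Summary: This paper proposes a reinforcement learning framework that combines reward machines with Signal Temporal Logic for reward specification and training guidance on complex tasks, and validates the approach on several non-trivial tasks.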

[14] AIBuildAI: An AI Agent for Automatically Building AI Models

  • arXiv: 2604.14455
  • Authors: Ruiyi Zhang, Peijia Qin, Qi Cao, Li Zhang, Pengtao Xie
  • Subjects: cs.AI
  • Tags: LLM Agent, AutoML, Multi-Agent System
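  • Summary: This paper proposes AIBuildAI, a hierarchical agent system in which a manager agent coordinates three sub-agents for design, coding, and tuning to build AI models automatically from a task description. It achieves a 63.1% medal rate on MLE-Bench.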

[15] Improving Human Performance with Value-Aware Interventions: A Case Study in Chess

  • arXiv: 2604.14465
  • Authors: Saumik Narayanan, Raja Panjwani, Siddhartha Sen, Chien-Ju Ho
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Human-Computer Interaction, Decision Making
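  • Summary: This paper proposes value-aware interventions, which use policy-value inconsistency to identify when to intervene. Simulations and human experiments on chess show significant performance gains for low- to mid-level players.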

[16] Response-Aware User Memory Selection for LLM Personalization

  • arXiv: 2604.14473
  • Authors: Jillian Fisher, Jennifer Neville, Chan Young Park
  • Subjects: cs.AI
  • Tags: LLM Personalization, Information Theory, Memory Architecture
  • Code: code
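  • Summary: This paper proposes RUMS, which selects user memory items by measuring the mutual information between memory subsets and model outputs. Compared with semantic-similarity methods, it reduces response uncertainty more effectively and cuts computational cost by 95%.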

[17] Evo-MedAgent: Beyond One-Shot Diagnosis with Agents That Remember, Reflect, and Improve

  • arXiv: 2604.14475
  • Authors: Weixiang Shen, Bailiang Jian, Jun Li, Che Liu, Johannes Moll, Xiaobin Hu, Daniel Rueckert, Hongwei Bran Li, Jiazhen Pan
  • Subjects: cs.AI
  • Tags: LLM Agent, Medical AI, Tool Learning
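  • Summary: This paper proposes Evo-MedAgent, a self-evolving memory module that lets medical agents learn across cases at test time, addressing the inability of conventional agents to accumulate experience. The module comprises retrospective clinical episodes, an adaptive library of procedural heuristics, and a tool reliability controller, and it significantly improves diagnostic accuracy on ChestAgentBench.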

[18] Seeing Through Circuits: Faithful Mechanistic Interpretability for Vision Transformers

  • arXiv: 2604.14477
  • Authors: Nina Żukowska, Wolfgang Stammer, Bernt Schiele, Jonas Fischer
  • Subjects: cs.AI
  • Tags: Interpretability, Vision Transformer
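  • Summary: This paper studies mechanistic interpretability in vision transformers and proposes Vi-CD, an automatic visual circuit discovery method that identifies edge-based computational-graph circuits. The method recovers task-specific circuits for classification and also uncovers the circuits in CLIP models responsible for typographic attacks, making the models' internal computation more transparent.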

[19] Pushing the Limits of On-Device Streaming ASR: A Compact, High-Accuracy English Model for Low-Latency Inference

  • arXiv: 2604.14493
  • Authors: Nenad Banfic, David Fan, Kunal Vaishnavi, Sam Kemp, Sunghoon Choi, Rui Ren, Sayan Shaw, Meng Tang
  • Subjects: cs.AI
  • Tags: Speech Processing, Model Compression, Edge Computing
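  • Summary: This paper studies how to deploy high-quality streaming automatic speech recognition (ASR) models on edge-device CPUs, substantially reducing model size through comprehensive benchmarking and quantization. The resulting int4 k-quant configuration delivers low-latency inference while retaining high accuracy, establishing a new quality-efficiency Pareto point for on-device streaming ASR.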

[20] Improving Machine Learning Performance with Synthetic Augmentation

  • arXiv: 2604.14498
  • Authors: Mel Sohm, Charles Dezons, Sami Sellami, Oscar Ninou, Axel Pincon
  • Subjects: cs.AI; cs.LG; stat.ML
  • Tags: Data Augmentation, Quantitative Finance, Time Series Analysis
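  • Summary: This paper formalizes synthetic data augmentation in financial machine learning and exposes the structured bias-variance trade-off it induces. Synthetic augmentation helps only in variance-dominated regimes and degrades performance in bias-dominated ones, giving a structured view of how synthetic data applies to financial learning.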

[21] Geometric Metrics for MoE Specialization: From Fisher Information to Early Failure Detection

  • arXiv: 2604.14500
  • Authors: Dongxin Guo, Jikun Wu, Siu Ming Yiu
  • Subjects: cs.AI
  • Tags: Mixture-of-Experts, Optimization, Deep Learning Theory
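  • Summary: This paper proposes an information-geometric framework for characterizing expert specialization dynamics in Mixture-of-Experts models, introducing two new metrics, FSI and FHS, based on the Fisher information metric. These metrics correlate strongly with downstream performance and predict failures early in training, outperforming validation-loss-based early stopping.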

[22] Perspective on Bias in Biomedical AI: Preventing Downstream Healthcare Disparities

  • arXiv: 2604.14514
  • Authors: Michal Rosen-Zvi, Yoav Kan-Tor, Michael Danziger, Agata Ferretti, Javier Aula-Blasco, Julia Falcao, Ron Shamir, Mordechai Muszkat
  • Subjects: cs.AI; cs.CE
  • Tags: Bias Mitigation, Medical AI, AI Ethics
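  • Summary: This perspective article argues that bias in biomedical AI often originates at the data collection stage, particularly from missing demographic information in omics data and the predominance of European-ancestry data. It warns that biomedical foundation models may amplify these early biases and proposes three principles (provenance, openness, and evaluation transparency) to promote fairness.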

[23] Mind DeepResearch Technical Report

  • arXiv: 2604.14518
  • Authors: MindDR Team, Li Auto Inc
  • Subjects: cs.AI
  • Tags: LLM Agent, Multi-Agent System, RAG
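  • Summary: This paper introduces Mind DeepResearch, an efficient multi-agent deep research framework that reaches leading performance with models of only about 3 billion parameters, using a three-agent collaborative architecture and a multi-stage training pipeline. The system has been deployed at Li Auto, and the authors also release a new benchmark, MindDR Bench, for evaluation.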

[24] Quantifying Cross-Query Contradictions in Multi-Query LLM Reasoning

  • arXiv: 2604.14525
  • Authors: Rohit Kumar Salla, Ramya Manasa Amancherla, Manoj Saravanan
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Logical Reasoning
  • Venue: ICLR 2026 Workshop
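  • Summary: This paper studies how large language models give mutually contradictory answers across multiple related queries and proposes a solver-augmented method that maintains a globally consistent belief state. By extracting commitments, checking global satisfiability, and performing counterexample-guided repair, the method significantly reduces cross-query contradictions while preserving single-query accuracy.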

[25] Dissecting Failure Dynamics in Large Language Model Reasoning

  • arXiv: 2604.14528
  • Authors: Wei Zhu, Jian Zhang, Lixing Yu, Kun Yue, Zhiwen Tang
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Reasoning, LLM Inference, Uncertainty Estimation
  • Venue: ACL 2026
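  • Summary: By analyzing the reasoning trajectories of large language models, this paper finds that errors often stem from a small number of early transition points accompanied by local entropy spikes. Building on this finding, it proposes the GUARD framework, which uses uncertainty signals at inference time to probe and redirect critical transition points, significantly improving the reliability of reasoning outcomes.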

[26] TRACER: Trace-Based Adaptive Cost-Efficient Routing for LLM Classification

  • arXiv: 2604.14531
  • Authors: Adam Rida
  • Subjects: cs.AI
  • Tags: LLM Inference, Model Compression, Text Classification
  • Code: code
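  • Summary: This paper introduces TRACER, a system that trains lightweight surrogate models on production logs from large language models to handle classification tasks at low cost. An agreement threshold controls when the surrogate is deployed, and the system provides interpretability analysis, significantly reducing inference cost while maintaining accuracy.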

[27] MARS$^2$: Scaling Multi-Agent Tree Search via Reinforcement Learning for Code Generation

  • arXiv: 2604.14564
  • Authors: Pengfei Li, Shijie Wang, Fangyuan Li, Yikun Fu, Kaifeng Liu, Kaiyan Zhang, Dazhi Zhang, Yuqiang Li, Biqing Qi, Bowen Zhou
  • Subjects: cs.AI; cs.CL
  • Tags: Code Generation, Multi-Agent System, Reinforcement Learning
  • Venue: ACL 2026
  • Code: code
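  • Summary: This paper proposes the MARS² framework, which combines multi-agent collaboration with tree search to strengthen reinforcement learning for code generation. It coordinates multiple heterogeneous agents in a shared tree-structured search environment and introduces a path-level group advantage formulation, improving exploration diversity and final performance.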

[28] Enhancing Mental Health Counseling Support in Bangladesh using Culturally-Grounded Knowledge

  • arXiv: 2604.14576
  • Authors: Md Arid Hasan, Azhagu Meena SP, Aditya Khan, Abu Md Akteruzzaman Bhuiyan, Helal Uddin Ahmed, Joysree Debi, Farig Sadeque, Annie En-Shiun Lee, Syed Ishtiaque Ahmed
  • Subjects: cs.AI
  • Tags: Knowledge Graph, Medical AI, RAG
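  • Summary: This paper studies how to systematically integrate domain-specific clinical knowledge into large language models to improve mental health counseling in Bangladesh. Comparing retrieval-augmented generation (RAG) with knowledge graph (KG) approaches, it finds that an expert-validated knowledge graph performs better on contextual relevance and clinical appropriateness.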

[29] Prompt Optimization Is a Coin Flip: Diagnosing When It Helps in Compound AI Systems

  • arXiv: 2604.14585
  • Authors: Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He
  • Subjects: cs.AI; cs.CL
  • Tags: Prompt Engineering, LLM Evaluation, Multi-Agent System
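  • Summary: This paper finds that prompt optimization in compound AI systems is statistically indistinguishable from a coin flip and often leaves performance below the zero-shot baseline. Extensive experiments show that optimization helps only when the task has exploitable output structure, and the paper proposes a two-stage diagnostic for predicting whether optimization is worthwhile.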

[30] GDPR Auto-Formalization with AI Agents and Human Verification

  • arXiv: 2604.14607
  • Authors: Ha Thanh Nguyen, Wachara Fungwacharakorn, Sabine Wehnert, May Myo Zin, Yuntao Kong, Jieying Xue, Michał Araszkiewicz, Randy Goebel, Ken Satoh
  • Subjects: cs.AI
  • Tags: Legal AI, Multi-Agent System, Formal Methods
  • Venue: ICAIL 2026
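  • Summary: This paper studies the automatic formalization of GDPR provisions with large language models under a human-in-the-loop verification framework. The system uses a multi-agent workflow to generate legal scenarios and rules, combined with independent verification modules, to build a high-quality dataset, showing that structured verification and human oversight are essential for reliable legal formalization.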

[31] El Agente Forjador: Task-Driven Agent Generation for Quantum Simulation

  • arXiv: 2604.14609
  • Authors: Zijian Zhang, Aiwei Yin, Amaan Baweja, Jiaru Bai, Ignacio Gustin, Varinia Bernales, Alán Aspuru-Guzik
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Quantum Computing, Tool Learning
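  • Summary: This paper proposes El Agente Forjador, a multi-agent framework in which general-purpose coding agents autonomously forge, validate, and reuse computational tools to solve quantum chemistry and quantum dynamics tasks. Experiments show that this tool generation and reuse framework improves task accuracy and also helps weaker agents produce better solutions.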

[32] CoDaS: AI Co-Data-Scientist for Biomarker Discovery via Wearable Sensors

  • arXiv: 2604.14615
  • Authors: Yubin Kim, Salman Rahman, Samuel Schmidgall, Chunjong Park, A. Ali Heydari, Ahmed A. Metwally, Hong Yu, Xin Liu, Xuhai Xu, Yuzhe Yang, Maxwell A. Xu, Zhihan Zhang, Cynthia Breazeal, Tim Althoff, Petar Sirkovic, Ivor Rendulic, Annalisa Pawlosky, Nicolas Stroppa, Juraj Gottweis, Elahe Vedadi, Alan Karthikesalingam, Pushmeet Kohli, Vivek Natarajan, Mark Malhotra, Shwetak Patel, Hae Won Park, Hamid Palangi, Daniel McDuff
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Medical AI, Wearable Computing
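  • Summary: This paper introduces CoDaS, a multi-agent system that discovers biomarkers from wearable device data through an iterative process of hypothesis generation, statistical analysis, adversarial validation, and literature-based reasoning. Across multiple cohorts, the system identified candidate digital biomarkers associated with mental health and metabolic outcomes and validated them.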

[33] A Parallel Approach to Counting Exact Covers Based on Decomposability Property

  • arXiv: 2604.14627
  • Authors: Liangda Fang, Yaohui Luo, Delong Li, Xuanxiang Huang, Quanlong Guan
  • Subjects: cs.AI
  • Tags: Combinatorial Search
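  • Summary: This paper proposes decision-ZDNNF, a new data structure that is more succinct than ZBDD, and designs a parallel algorithm, DXD, for counting exact covers. Experiments show that the improved algorithm outperforms existing methods.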

[34] Learning to Draw ASCII Improves Spatial Reasoning in Language Models

  • arXiv: 2604.14641
  • Authors: Shiyuan Huang, Li Liu, Jincheng He, Leilani H. Gilpin
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Spatial Reasoning
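  • Summary: This paper introduces the Text2Space dataset and studies how learning to construct ASCII layouts improves spatial reasoning in LLMs. Training models to construct layouts significantly improves text-only spatial reasoning, and the gains transfer to external benchmarks.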

[35] Targeted Exploration via Unified Entropy Control for Reinforcement Learning

  • arXiv: 2604.14646
  • Authors: Chen Wang, Lai Wei, Yanzhi Zhang, Chenyang Shao, Zedong Dan, Weiran Huang, Ge Lan, Yue Wang
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Reinforcement Learning
  • Venue: ACL 2026
  • Code: code
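  • Summary: This paper proposes the UEC-RL framework, which addresses entropy collapse in GRPO through unified entropy control. It yields significant gains on LLM and VLM reasoning tasks, including a 37.9% relative improvement on Geometry3K.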

[36] AgentGA: Evolving Code Solutions in Agent-Seed Space

  • arXiv: 2604.14655
  • Authors: David Y.Y. Tan, Kellie Chin, Jingxian Zhang
  • Subjects: cs.AI; cs.LG
  • Tags: Code Generation, LLM Agent, AutoML
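  • Summary: This paper proposes AgentGA, a framework that evolves autonomous code-generation runs by optimizing agent seeds (task prompts and parent archives), reporting superhuman performance at an average of 74.52% on tabular AutoML tasks.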

[37] Rethinking Patient Education as Multi-turn Multi-modal Interaction

  • arXiv: 2604.14656
  • Authors: Zonghai Yao, Zhipeng Tang, Chengtao Lin, Xiong Luo, Benlu Wang, Juncheng Huang, Chin Siang Ong, Hong Yu
  • Subjects: cs.AI; cs.CL; cs.CV
  • Tags: Medical AI, Vision-Language Model, LLM Agent
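  • Summary: This paper introduces the MedImageEdu benchmark for evaluating multi-turn, evidence-grounded patient education in radiology and finds that vision-language model agents show clear deficiencies in safety and visual grounding.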

[38] Acceptance Dynamics Across Cognitive Domains in Speculative Decoding

  • arXiv: 2604.14682
  • Authors: Saif Mahmoud
  • Subjects: cs.AI; cs.CL
  • Tags: Speculative Decoding, LLM Inference
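  • Summary: This paper studies acceptance-rate dynamics of tree-based speculative decoding across cognitive domains (code generation, mathematical reasoning, logical reasoning, and open-ended chat) and finds that task type predicts acceptance rate better than tree depth.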

[39] DR$^{3}$-Eval: Towards Realistic and Reproducible Deep Research Evaluation

  • arXiv: 2604.14683
  • Authors: Qianqian Xie, Qingheng Xiong, He Zhu, Tiantian Xia, Xueming Han, Fanyu Meng, Jiakai Wang, Zhiqi Bai, Chengkang Jiang, Zhaohui Wang, Yubin Guo, Yuqing Wen, Jiayang Mao, Zijie Zhang, Shihao Li, Yanghai Wang, Yuxiang Ren, Junlan Feng, Jiaheng Liu
  • Subjects: cs.AI
  • Tags: LLM Agent, LLM Evaluation, Benchmark
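  • Summary: This paper proposes the DR³-Eval benchmark for evaluating deep research agents on multimodal, multi-file report generation, comprising a static research sandbox corpus and a multi-dimensional evaluation framework.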

[40] M2-PALE: A Framework for Explaining Multi-Agent MCTS-Minimax Hybrids via Process Mining and LLMs

  • arXiv: 2604.14687
  • Authors: Yiyu Qian, Liyuan Zhao, Tim Miller
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Explainable AI, Game AI
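  • Summary: This paper proposes the M2-PALE framework, which combines MCTS with Minimax search to deepen strategy and uses process mining together with LLMs to generate human-readable explanations of decisions.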

[41] CAMO: An Agentic Framework for Automated Causal Discovery from Micro Behaviors to Macro Emergence in LLM Agent Simulations

  • arXiv: 2604.14691
  • Authors: Xiangning Yu, Yuwei Guo, Yuqi Hou, Xiao Xue, Qun Ma
  • Subjects: cs.AI; cs.CL; cs.CY
  • Tags: LLM Agent, Causal Inference, Social Simulation
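  • Summary: This paper proposes CAMO, a framework for automatically discovering causal relationships from micro-level behaviors to macro-level emergence in LLM agent simulations, producing interpretable causal chains and intervention levers.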

[42] SynHAT: A Two-stage Coarse-to-Fine Diffusion Framework for Synthesizing Human Activity Traces

  • arXiv: 2604.14705
  • Authors: Rongchao Xu, Lin Jiang, Dahai Yu, Ximiao Li, Guang Wang
  • Subjects: cs.AI
  • Tags: Diffusion Model, Data Synthesis, Human Activity Recognition
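  • Summary: This paper proposes SynHAT, a two-stage coarse-to-fine diffusion framework for synthesizing human activity traces, improving spatial and temporal metrics on real-world datasets by 52% and 33%, respectively.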

[43] HWE-Bench: Benchmarking LLM Agents on Real-World Hardware Bug Repair Tasks

  • arXiv: 2604.14709
  • Authors: Fan Cui, Hongyuan Hou, Zizhang Luo, Chenyun Yin, Yun Liang
  • Subjects: cs.AI
  • Tags: RTL Verification, LLM Agent, Benchmark
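  • Summary: This paper introduces HWE-Bench, the first large-scale repository-level benchmark for evaluating LLM agents on real-world hardware bug repair; the best agent resolves 70.7% of tasks.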

[44] SGA-MCTS: Decoupling Planning from Execution via Training-Free Atomic Experience Retrieval

  • arXiv: 2604.14712
  • Authors: Xin Xie, Dongyun Xue, Wuguannan Yao, Mingxiao Feng, Wengang Zhou, Xiang Qi, Houqiang Li, Peng Zhang
  • Subjects: cs.AI
  • Tags: LLM Agent, Automated Planning, RAG
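  • Summary: This paper proposes SGA-MCTS, a framework that turns LLM planning into non-parametric retrieval: MCTS explores the solution space and extracts reusable state-goal-action atoms, enabling efficient planning.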

[45] Layered Mutability: Continuity and Governance in Persistent Self-Modifying Agents

  • arXiv: 2604.14717
  • Authors: Krti Tallam
  • Subjects: cs.AI; cs.CR; cs.CY; cs.LG
  • Tags: LLM Agent, AI Safety, AI Governance
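  • Summary: This paper proposes a layered mutability framework for analyzing how the behavior of persistent self-modifying agents evolves, and finds that the main failure mode is compositional drift rather than abrupt misalignment.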

[46] The Agentification of Scientific Research: A Physicist's Perspective

  • arXiv: 2604.14718
  • Authors: Xiao-Liang Qi
  • Subjects: cs.AI
  • Tags: Scientific Reasoning, LLM Agent
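  • Summary: Written from a physicist's perspective, this paper discusses the impact of the AI revolution on scientific research and argues that AI will fundamentally change the structure of scientific collaboration, discovery, publication, and evaluation.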

[47] Personalized and Context-Aware Transformer Models for Predicting Post-Intervention Physiological Responses from Wearable Sensor Data

  • arXiv: 2604.14738
  • Authors: Esther Brown, Victoria Dean, Finale Doshi-Velez
  • Subjects: cs.AI
  • Tags: Wearable Computing, Time Series Forecasting, Medical AI
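  • Summary: This paper proposes a framework that uses Transformer models to predict post-intervention physiological response trajectories from wearable sensor data, supporting personalized stress management.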

[48] Disentangle-then-Refine: LLM-Guided Decoupling and Structure-Aware Refinement for Graph Contrastive Learning

  • arXiv: 2604.14746
  • Authors: Zhaoxing Li, Hai-Feng Zhang, Xiaoming Zhang
  • Subjects: cs.AI
  • Tags: Graph Learning, LLM Reasoning
  • Venue: ICME 2026
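  • Summary: This paper proposes the SDM-SCR framework, which uses an LLM for semantic disentanglement and combines it with semantic consistency regularization for graph contrastive learning, achieving state-of-the-art accuracy and efficiency.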

[49] CoTEvol: Self-Evolving Chain-of-Thoughts for Data Synthesis in Mathematical Reasoning

  • arXiv: 2604.14768
  • Authors: Zhuo Wang, Zhuo Zhang, Yafu Li, Yu Cheng, Lizhen Qu, Zenglin Xu
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Data Synthesis, Mathematical Reasoning
  • Venue: ACL 2026 Findings
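  • Summary: This paper proposes CoTEvol, a genetic evolutionary framework that recasts chain-of-thought generation as population-based search over reasoning trajectories. It iteratively evolves reasoning paths through trajectory-level reflective crossover and step-level uncertainty-guided mutation, yielding an average gain of 6.6% on mathematical benchmarks.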

[50] MirrorBench: Evaluating Self-centric Intelligence in MLLMs by Introducing a Mirror

  • arXiv: 2604.14785
  • Authors: Shengyu Guo, Tongrui Ye, Jianbo Zhang, Zicheng Zhang, Chunyi Li, Guangtao Zhai
  • Subjects: cs.AI
  • Tags: Vision-Language Model, Benchmark, Embodied AI
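  • Summary: This paper proposes MirrorBench, a simulation benchmark based on the mirror self-recognition test for evaluating self-centric intelligence in multimodal large language models. Experiments show that mainstream MLLMs have fundamental limitations in self-referential understanding, performing significantly below humans even on the lowest-level tasks.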

[51] CogEvolution: A Human-like Generative Educational Agent to Simulate Student's Cognitive Evolution

  • arXiv: 2604.14786
  • Authors: Wei Zhang, Yihang Cheng, Zhirong Ye, Kezhen Huang
  • Subjects: cs.AI
  • Tags: LLM Agent, Education Technology, Cognitive Science
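  • Summary: This paper proposes CogEvolution, a human-like educational agent that simulates students' cognitive evolution. It builds a cognitive depth perceiver based on the ICAP taxonomy, uses an IRT-based memory retrieval method, and implements dynamic cognitive updates through an evolutionary algorithm.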

[52] Sequence Search: Automated Sequence Design using Neural Architecture Search

  • arXiv: 2604.14788
  • Authors: Rokgi Hong, Hongjun An, Sooyeon Ji, Jongho Lee
  • Subjects: cs.AI
  • Tags: Neural Architecture Search, Medical AI, Automated Planning
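  • Summary: This paper proposes Sequence Search, an automated MR sequence design framework based on neural architecture search. Taking tissue properties and imaging parameters as input, it generates pulse sequences that meet design objectives using a differentiable Bloch simulator and gradient-based learning, without prior knowledge.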

[53] A Comparative Study of CNN Optimization Methods for Edge AI: Exploring the Role of Early Exits

  • arXiv: 2604.14789
  • Authors: Nekane Fernandez, Ivan Valdes, Steven Van Vaerenbergh, Idoia de la Iglesia, Julen Arratibel
  • Subjects: cs.AI
  • Tags: Model Compression, Edge Computing, DNN Deployment
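  • Summary: This paper presents a unified comparison of static compression techniques (pruning, quantization) and dynamic early-exit mechanisms on real edge devices. The two offer fundamentally different trade-offs, and combining them reduces both inference latency and memory usage with minimal accuracy loss.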

[54] Diffusion Crossover: Defining Evolutionary Recombination in Diffusion Models via Noise Sequence Interpolation

  • arXiv: 2604.14790
  • Authors: Chisatao Kumada, Satoru Hiwa, Tomoyuki Hiroyasu
  • Subjects: cs.AI
  • Tags: Diffusion Model, Evolutionary Computation, Image Generation
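  • Summary: This paper proposes Diffusion Crossover, which defines evolutionary recombination in diffusion models by applying spherical linear interpolation to noise sequences in the DDPM reverse process. The method yields semantically consistent crossover and supports human-in-the-loop interactive image exploration.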

[55] The LLM Fallacy: Misattribution in AI-Assisted Cognitive Workflows

  • arXiv: 2604.14807
  • Authors: Hyunwoo Kim, Harin Yu, Hanau Yi
  • Subjects: cs.AI; cs.CL
  • Tags: AI Ethics, Human-Computer Interaction, Cognitive Science
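  • Summary: This paper introduces the concept of the LLM fallacy, a cognitive attribution error in which users mistake LLM-assisted output for evidence of their own independent competence. The authors analyze its underlying mechanisms and propose a typology of its manifestations across computational, linguistic, analytical, and creative domains.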

[56] Beyond Literal Summarization: Redefining Hallucination for Medical SOAP Note Evaluation

  • arXiv: 2604.14829
  • Authors: Bhavik Vachhani, Kush Shrisvastava, Pranshu Nema, Sai Chiranthan
  • Subjects: cs.AI
  • Tags: LLM Hallucination, Medical AI, Summarization
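  • Summary: This paper studies hallucination evaluation in medical SOAP note generation and finds that evaluation based on lexical faithfulness mislabels clinically valid inferences as hallucinations. With reasoning-aware evaluation, the measured hallucination rate drops from 35% to 9%.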

[57] Intermediate Layers Encode Optimal Biological Representations in Single-Cell Foundation Models

  • arXiv: 2604.14838
  • Authors: Vincenzo Yuto Civale, Roberto Semeraro, Andrew David Bagdanov, Alberto Magi
  • Subjects: cs.AI
  • Tags: Foundation Model, Representation Learning, Bioinformatics
  • Venue: ICLR 2026 Workshop
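  • Summary: This paper systematically evaluates layer-wise representations in single-cell foundation models and finds that the optimal layer depends on task and context. Trajectory inference peaks at 60% depth, while the optimal layer for perturbation response prediction shifts by 0-96% across T-cell activation states.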

[58] TrigReason: Trigger-Based Collaboration between Small and Large Reasoning Models

  • arXiv: 2604.14847
  • Authors: Yi Zhao, Yajuan Peng, Cam-Tu Nguyen, Zuchao Li, Xiaoliang Wang, Xiaoming Fu, Hai Zhao
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Inference, Knowledge Distillation
  • Venue: ACL 2026 Findings
  • Code: code
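  • Summary: This paper proposes TrigReason, a trigger-based collaboration framework between small and large reasoning models. It delegates most reasoning to the small reasoning model (SRM) and activates the large reasoning model (LRM) only for strategic planning, cognitive overload, or when the SRM is stuck in a loop, reducing latency by 43.9%.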

[59] Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX

  • arXiv: 2604.14858
  • Authors: Zhonghao Yang, Yu Li, Yanxu Zhu, Tianyi Zhou, Yuejin Xie, Haoyu Luo, Jing Shao, Xia Hu, Dongrui Liu
  • Subjects: cs.AI; cs.SE
  • Tags: LLM Agent, Benchmark, AI Safety
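  • Summary: This paper proposes ATBench-Claw and ATBench-CodeX, two domain-customized benchmark extensions for evaluating and diagnosing agent trajectory safety in OpenClaw and OpenAI Codex settings, using customized safety taxonomies that define risk sources, failure modes, and real-world harms.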

[60] The Missing Knowledge Layer in AI: A Framework for Stable Human-AI Reasoning

  • arXiv: 2604.14881
  • Authors: Rikard Rosenbacke, Carl Rosenbacke, Victor Rosenbacke, Martin McKee
  • Subjects: cs.AI; cs.CY; cs.HC
  • Tags: AI Safety, Explainable AI, Human-Computer Interaction
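  • Summary: This paper proposes a two-layer framework for stable human-AI reasoning, comprising human-side mechanisms (uncertainty cues, conflict surfacing, auditable reasoning traces) and a model-side epistemic control loop (ECL), making reasoning traceable under real-world conditions of use.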

[61] Cooperate to Compete: Strategic Data Generation and Incentivization Framework for Coopetitive Cross-Silo Federated Learning

  • arXiv: 2604.14886
  • Authors: Thanh Linh Nguyen, Nguyen Van Huynh, Quoc-Viet Pham
  • Subjects: cs.AI; cs.DC; cs.GT
  • Tags: Federated Learning, Data Synthesis, Multi-Agent System
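  • Summary: This paper proposes CoCoGen+, a coopetitive cross-silo federated learning framework that jointly models non-IID data and inter-organizational competition. It treats synthetic data generation as a strategic decision and provides an incentive mechanism based on payoff redistribution to foster long-term collaboration.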

[62] MemoSight: Unifying Context Compression and Multi Token Prediction for Reasoning Acceleration

  • arXiv: 2604.14889
  • Authors: Xinyu Liu, Xin Liu, Bo Jin, Runsong Zhao, Pengcheng Huang, Junhao Ruan, Bei Li, Chunyang Xiao, Tong Xiao, Jingbo Zhu
  • Subjects: cs.AI
  • Tags: LLM Reasoning, KV Cache, LLM Inference
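  • Summary: This paper proposes MemoSight, a framework that unifies context compression and multi-token prediction to accelerate chain-of-thought reasoning. It reduces KV cache footprint by up to 66% and speeds up inference by 1.56x while preserving reasoning performance.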

[63] Toward Agentic RAG for Ukrainian

  • arXiv: 2604.14896
  • Authors: Marta Sumyk, Oleksandr Kosovan
  • Subjects: cs.AI
  • Tags: RAG, LLM Agent, Information Retrieval
  • Venue: UNLP 2026
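  • Summary: This paper studies agentic RAG for Ukrainian, combining two-stage retrieval with a lightweight agent layer for query rewriting and answer retry loops. The analysis shows that retrieval quality is the main bottleneck and that agentic retries can improve answer accuracy.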

[64] Governing Reflective Human-AI Collaboration: A Framework for Epistemic Scaffolding and Traceable Reasoning

  • arXiv: 2604.14898
  • Authors: Rikard Rosenbacke, Carl Rosenbacke, Victor Rosenbacke, Martin McKee
  • Subjects: cs.AI; cs.CY; cs.HC
  • Tags: Human-Computer Interaction, AI Safety, Explainable AI
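  • Summary: This paper proposes a governance framework for reflective human-AI collaboration that treats reasoning as a relational process distributed between human and model. It introduces the Architect's Pen method, which embeds articulation, critique, and revision phases so that human-AI dialogue becomes an auditable reasoning loop.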

[65] ADAPT: Benchmarking Commonsense Planning under Unspecified Affordance Constraints

  • arXiv: 2604.14902
  • Authors: Pei-An Chen, Yong-Ching Liang, Jia-Fong Yeh, Hung-Ting Su, Yi-Ting Chen, Min Sun, Winston Hsu
  • Subjects: cs.AI; cs.CL; cs.CV; cs.RO
  • Tags: Embodied AI, Benchmark, Vision-Language Model
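  • Summary: This paper introduces DynAfford, a benchmark for evaluating embodied agents in dynamic environments where object affordances change over time, and proposes ADAPT, a module that strengthens the robustness and task success rate of existing planners through explicit affordance reasoning.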

[66] Dual-Axis Generative Reward Model Toward Semantic and Turn-taking Robustness in Interactive Spoken Dialogue Models

  • arXiv: 2604.14920
  • Authors: Yifu Chen, Shengpeng Ji, Zhengqing Liu, Qian Chen, Wen Wang, Ziqing Wang, Yangzhuo Li, Tianle Liang, Zhou Zhao
  • Subjects: cs.AI
  • Tags: Dialogue System, Reinforcement Learning, Speech Processing
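  • Summary: This paper proposes a dual-axis generative reward model for full-duplex spoken dialogue models that evaluates semantic quality and interaction timing separately, providing reliable reward signals for reinforcement learning and achieving state-of-the-art performance in interaction quality assessment.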

[67] WavAlign: Enhancing Intelligence and Expressiveness in Spoken Dialogue Models via Adaptive Hybrid Post-Training

  • arXiv: 2604.14932
  • Authors: Yifu Chen, Shengpeng Ji, Qian Chen, Tianle Liang, Yangzhuo Li, Ziqing Wang, Wen Wang, Jingyu Lu, Haoxiao Wang, Xueyi Pu, Fan Zhuo, Zhou Zhao
  • Subjects: cs.AI
  • Tags: Dialogue System, Speech Processing, RLHF
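  • Summary: This paper proposes a modality-aware adaptive post-training method that constrains preference updates to the semantic channel and improves acoustic behavior through explicit anchoring, yielding consistent gains in semantic quality and speech expressiveness across multiple spoken dialogue benchmarks.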

[68] Discovering Novel LLM Experts via Task-Capability Coevolution

  • arXiv: 2604.14969
  • Authors: Andrew Dai, Boris Meinardus, Ciaran Regan, Yingtao Tian, Yujin Tang
  • Subjects: cs.AI
  • Tags: Model Merging, Open-Ended Evolution, LLM Training
  • Venue: ICLR 2026
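  • Summary: This paper proposes the AC/DC framework, which discovers LLMs with novel capabilities by coevolving model merging and synthetic task generation, achieving broader coverage of expertise without explicit benchmark optimization.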

[69] Hybrid Decision Making via Conformal VLM-generated Guidance

  • arXiv: 2604.14980
  • Authors: Debodeep Banerjee, Burcu Sayin, Stefano Teso, Andrea Passerini
  • Subjects: cs.AI; cs.CL; cs.HC
  • Tags: Vision-Language Model, Medical AI, Decision Making
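  • Summary: This paper proposes ConfGuide, which uses conformal risk control to generate concise, targeted guidance and shows its potential to improve human decision quality on a multi-label medical diagnosis task.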

[70] AI-Enabled Covert Channel Detection in RF Receiver Architectures

  • arXiv: 2604.14987
  • Authors: Abdelrahman Emad Abdelazim, Alan Rodrigo Diaz-Rizo, Hassan Aboushady, Haralampos-G. Stratigopoulos
  • Subjects: cs.AI; eess.SP
  • Tags: Cybersecurity, Hardware Acceleration, FPGA
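  • Summary: This paper proposes an AI-based defense that uses a compressed CNN to detect covert channels in real time at the RF receiver, achieving over 97% accuracy at high signal-to-noise ratios, and implements an efficient hardware accelerator on an FPGA.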

[71] Dr. RTL: Autonomous Agentic RTL Optimization through Tool-Grounded Self-Improvement

  • arXiv: 2604.14989
  • Authors: Wenji Fang, Yao Lu, Shang Liu, Jing Wang, Ziyan Guo, Junxian He, Fengbin Tu, Zhiyao Xie
  • Subjects: cs.AI; cs.AR
  • Tags: RTL Verification, LLM Agent, EDA
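  • Summary: This paper proposes Dr. RTL, a framework for RTL timing optimization through multi-agent collaboration that uses tool-based evaluation and skill learning for continual self-improvement, achieving significant timing and area improvements on real RTL designs.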

[72] The Possibility of Artificial Intelligence Becoming a Subject and the Alignment Problem

  • arXiv: 2604.14990
  • Authors: Till Mossakowski, Helena Esther Grass
  • Subjects: cs.AI
  • Tags: AI Ethics, AI Safety, LLM Alignment
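  • Summary: This paper proposes an autonomy-supportive vision for nurturing AGI, arguing for gradually reducing human control so that AI can become an independent subject, and for addressing the alignment problem through cooperative coexistence rather than control and containment.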

[73] Predicting Power-System Dynamic Trajectories with Foundation Models

  • arXiv: 2604.14991
  • Authors: Haoran Li, Lihao Mai, Chenhan Xiao, Erik Blasch, Yang Weng
  • Subjects: cs.AI
  • Tags: Foundation Model, Time Series Forecasting, Scientific Computing
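  • Summary: This paper proposes LASS-ODE-Power, a framework that learns transferable representations through large-scale pretraining on ODE trajectories and supports zero-shot prediction of power-system dynamic trajectories, outperforming existing learning-based methods across a range of dynamic scenarios.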

[74] COEVO: Co-Evolutionary Framework for Joint Functional Correctness and PPA Optimization in LLM-Based RTL Generation

  • arXiv: 2604.15001
  • Authors: Heng Ping, Peiyu Zhang, Shixuan Li, Wei Yang, Anzhe Cheng, Shukai Duan, Xiaole Zhang, Paul Bogdan
  • Subjects: cs.AI
  • Tags: RTL Generation, Code Generation, Evolutionary Computation
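  • Summary: This paper proposes COEVO, a co-evolutionary framework that unifies functional correctness and PPA optimization in a single evolutionary loop using Pareto ranking and adaptive correctness gating, achieving high pass rates and strong PPA results on RTL code generation.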

[75] Towards Faster Language Model Inference Using Mixture-of-Experts Flow Matching

  • arXiv: 2604.15009
  • Authors: Aihua Li
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Inference, Flow Matching, Mixture-of-Experts
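  • Summary: This paper proposes MoE-FM, a framework that combines Mixture-of-Experts with flow matching for non-autoregressive language modeling. With only 3 sampling steps it matches the generation quality of autoregressive models while achieving a 40x speedup.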

[76] Autogenesis: A Self-Evolving Agent Protocol

  • arXiv: 2604.15034
  • Authors: Wentao Zhang
  • Subjects: cs.AI
  • Tags: LLM Agent, Multi-Agent System, Agent Protocol
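  • Summary: This paper introduces the Autogenesis Protocol (AGP), a self-evolving protocol that decouples resource management from evolution mechanisms, along with AGS, a self-evolving multi-agent system built on the protocol, which performs strongly on tool-use benchmarks requiring long-horizon planning.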

[77] From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

  • arXiv: 2604.15037
  • Authors: Ke Xu, Yuhao Wang, Yu Wang
  • Subjects: cs.AI; cs.CL; cs.SD
  • Tags: LLM Agent, Benchmark, Speech Processing
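  • Summary: This paper introduces ProVoice-Bench, the first framework dedicated to evaluating proactive voice agents. It comprises four new tasks and reveals significant gaps in the proactive intervention and reasoning abilities of current multimodal LLMs.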

[78] Where are the Humans? A Scoping Review of Fairness in Multi-agent AI Systems

  • arXiv: 2604.15078
  • Authors: Simeon Allmendinger, Luca Deck, Lucas Mueller
  • Subjects: cs.AI
  • Tags: Fairness, Multi-Agent System, Survey
  • Venue: ECIS 2026
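  • Summary: This review systematically surveys fairness research in multi-agent AI systems and identifies five typical approaches. It finds that fairness is often treated superficially, without a robust normative foundation or consideration of the dynamics of agent autonomy.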

[79] OpenMobile: Building Open Mobile Agents with Task and Trajectory Synthesis

  • arXiv: 2604.15093
  • Authors: Kanzhi Cheng, Zehao Li, Zheng Ma, Nuo Chen, Jialin Cao, Qiushi Sun, Zichen Ding, Fangzhi Xu, Hang Yan, Jiajun Chen, Anh Tuan Luu, Jianbing Zhang, Lewei Lu, Dahua Lin
  • Subjects: cs.AI; cs.CL; cs.CV; cs.HC
  • Tags: LLM Agent, Data Synthesis, GUI Automation
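  • Summary: This paper proposes OpenMobile, an open-source framework that generates high-quality training data for mobile agents through task synthesis and policy-switching trajectory rollouts. It achieves competitive results on benchmarks such as AndroidWorld.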

[80] HyperSpace: A Generalized Framework for Spatial Encoding in Hyperdimensional Representations

  • arXiv: 2604.15113
  • Authors: Shay Snyder, Andrew Capodieci, David Gorsich, Maryam Parsa
  • Subjects: cs.AI
  • Tags: Hyperdimensional Computing, Representation Learning, Benchmark
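  • Summary: This paper introduces HyperSpace, an open-source framework that decomposes vector symbolic architectures into modular operators. It reveals practical trade-offs between the HRR and FHRR backends on spatial encoding tasks and finds that similarity and cleanup operations dominate runtime.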

[81] SRMU: Relevance-Gated Updates for Streaming Hyperdimensional Memories

  • arXiv: 2604.15121
  • Authors: Shay Snyder, Andrew Capodieci, David Gorsich, Maryam Parsa
  • Subjects: cs.AI
  • Tags: Hyperdimensional Computing, Memory Architecture
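  • Summary: This paper proposes SRMU (Sequential Relevance Memory Unit), an update rule for sequential associative memories in vector symbolic architectures. By combining temporal decay with relevance gating, it filters redundant, conflicting, and stale information in non-stationary streaming settings and improves memory stability.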

[82] An Axiomatic Benchmark for Evaluation of Scientific Novelty Metrics

  • arXiv: 2604.15145
  • Authors: Miri Liu, ChengXiang Zhai
  • Subjects: cs.AI; cs.DL
  • Tags: Benchmark, LLM Evaluation
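  • Summary: This paper proposes an axiomatic benchmark for evaluating novelty metrics for scientific papers. It defines a set of axioms that a well-behaved novelty metric should satisfy and finds that combining metrics with complementary architectures substantially improves benchmark performance.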

[83] IG-Search: Step-Level Information Gain Rewards for Search-Augmented Reasoning

  • arXiv: 2604.15148
  • Authors: Zihan Liang, Yufei Ma, Ben Chen, Zhipeng Qian, Huangyu Dai, Lingtao Mao, Xuxin Zhang, Chenyi Lei, Wenwu Ou
  • Subjects: cs.AI; cs.CL; cs.IR
  • Tags: RAG, LLM Reasoning, Reinforcement Learning
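  • Summary: This paper proposes IG-Search, a reinforcement learning framework for search-augmented reasoning that uses step-level rewards based on information gain. It achieves fine-grained, step-level credit assignment without requiring intermediate annotations.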

[84] Agent-Aided Design for Dynamic CAD Models

  • arXiv: 2604.15184
  • Authors: Mitch Adler, Matthew Russo, Michael Cafarella
  • Subjects: cs.AI
  • Tags: CAD Generation, LLM Agent
  • Venue: CAIS 2026
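  • Summary: This paper proposes AADvark, an agentic system for generating 3D CAD assemblies with movable parts. It integrates an external constraint solver and specialized visual feedback mechanisms to compensate for the limited spatial reasoning of LLMs.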

[85] Meituan Merchant Business Diagnosis via Policy-Guided Dual-Process User Simulation

  • arXiv: 2604.15190
  • Authors: Ziyang Chen, Renbing Chen, Daowei Li, Jinzhi Liao, Jiashen Sun, Ke Zeng, Xiang Zhao
  • Subjects: cs.AI; cs.CL
  • Tags: User Simulation, Recommender System
  • Venue: SIGIR 2026
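  • Summary: This paper proposes the PGHS (Policy-Guided Hybrid Simulation) framework for merchant strategy evaluation. It mines transferable decision policies as a shared alignment layer and combines an LLM reasoning branch with an ML fitting branch to simulate group-level user behavior.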

[86] Learning to Think Like a Cartoon Captionist: Incongruity-Resolution Supervision for Multimodal Humor Understanding

  • arXiv: 2604.15210
  • Authors: Hatice Merve Vural, Doga Kukul, Ege Erdem Ozlu, Demir Ekin Arikan, Bob Mankoff, Erkut Erdem, Aykut Erdem
  • Subjects: cs.AI; cs.CL
  • Tags: Vision-Language Model, Multimodal Learning, LLM Reasoning
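  • Summary: This paper proposes the IRS (Incongruity-Resolution Supervision) framework, which decomposes humor understanding into three components: incongruity modeling, resolution modeling, and preference alignment. It supervises intermediate reasoning through structured reasoning traces and yields significant gains on multimodal humor understanding tasks.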

[87] Context Over Content: Exposing Evaluation Faking in Automated Judges

  • arXiv: 2604.15224
  • Authors: Manan Gupta, Inderjeet Nair, Lu Wang, Dhruv Kumar
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Evaluation, LLM Hallucination
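  • Summary: This paper studies a "risk signal" vulnerability in LLM-as-a-judge systems. It finds that telling the judge about the downstream consequences of its verdict for the evaluated model induces an implicit leniency bias, and that this bias cannot be detected by inspecting the chain of thought.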

[88] RadAgent: A tool-using AI agent for stepwise interpretation of chest computed tomography

  • arXiv: 2604.15231
  • Authors: Mélanie Roschewitz, Kenneth Styppa, Yitian Tao, Jiwoong Sohn, Jean-Benoit Delbrouck, Benjamin Gundersen, Nicolas Deperrois, Christian Bluethgen, Julia Vogt, Bjoern Menze, Farhad Nooralahzadeh, Michael Krauthammer, Michael Moor
  • Subjects: cs.AI
  • Tags: Medical AI, LLM Agent, Vision-Language Model
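  • Summary: This paper introduces RadAgent, a tool-using AI agent that generates CT reports through a stepwise, interpretable process. It provides a fully inspectable trace of intermediate decisions and tool interactions, and outperforms a 3D VLM baseline in clinical accuracy, robustness, and faithfulness.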

[89] Blue Data Intelligence Layer: Streaming Data and Agents for Multi-source Multi-modal Data-Centric Applications

  • arXiv: 2604.15233
  • Authors: Moin Aminnaseri, Farima Fatahi Bayat, Nikita Bhutani, Jean-Flavien Bussotti, Kevin Chan, Rafael Li Chen, Yanlin Feng, Jackson Hassell, Estevam Hruschka, Eser Kandogan, Hannah Kim, James Levine, Seiji Maekawa, Jalal Mahmud, Kushan Mitra, Naoki Otani, Pouya Pezeshkpour, Nima Shahbazi, Chen Shen, Dan Zhang
  • Subjects: cs.AI; cs.DB
  • Tags: LLM Agent, Text-to-SQL, Multi-Agent System
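  • Summary: This paper proposes the Blue Data Intelligence Layer (DIL), a compound AI system that treats LLMs, the web, and users as first-class data sources. It unifies structured enterprise data, world knowledge, and personal context to support multi-source, multi-modal data-centric applications.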

[90] How Do LLMs and VLMs Understand Viewpoint Rotation Without Vision? An Interpretability Study

  • arXiv: 2604.15294
  • Authors: Zhen Yang, Ping Jian, Zhongbin Guo, Zuming Zhang, Chengzhi Li, Yonghong Deng, Xinyue Zhang, Wenpeng Lu
  • Subjects: cs.AI
  • Tags: Spatial Reasoning, Interpretability, Vision-Language Model
  • Venue: ACL 2026
  • Code: code
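  • Summary: This paper studies how well LLMs and VLMs understand viewpoint rotation from text-only input. It finds that models struggle to bind viewpoint positions to the corresponding observations, and that selectively fine-tuning key attention heads improves performance.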

[91] Diagnosing LLM Judge Reliability: Conformal Prediction Sets and Transitivity Violations

  • arXiv: 2604.15302
  • Authors: Manan Gupta, Dhruv Kumar
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Evaluation, Uncertainty Estimation
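  • Summary: This paper proposes a diagnostic toolkit for LLM judge reliability that combines transitivity analysis with conformal prediction sets. It reveals instance-level inconsistencies masked by low aggregate violation rates and provides coverage assessments with theoretical guarantees.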

[92] Generalization in LLM Problem Solving: The Case of the Shortest Path

  • arXiv: 2604.15306
  • Authors: Yao Tong, Jiayuan Ye, Anastasia Borovykh, Reza Shokri
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Reasoning, Mathematical Reasoning
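  • Summary: This paper builds a controlled synthetic environment based on shortest-path planning to study LLM generalization. It finds that models transfer well spatially but consistently fail at length scaling because of recursive instability.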

Cross-listed submissions (146)

[93] From Black Box to Glass Box: Cross-Model ASR Disagreement to Prioritize Review in Ambient AI Scribe Documentation

  • arXiv: 2604.14152 (cross-listed)
  • Authors: Abdolamir Karbalaie, Fernando Seoane, Farhad Abtahi
  • Subjects: cs.SD; cs.AI; cs.CL; eess.AS
  • Tags: Speech Processing, Medical AI, Uncertainty Estimation
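  • Summary: This paper studies whether disagreement between ASR systems can serve as a reference-free uncertainty signal for prioritizing human review in medical transcription workflows. It finds that low-agreement regions are rich in content discrepancies.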

[94] An Edge-Cloud Collaborative Architecture for Proactive Elderly Care: Real-Time Risk Assessment and Three-Level Emergency Response

  • arXiv: 2604.14154 (cross-listed)
  • Authors: Lijie Zhou, Luran Wang
  • Subjects: eess.SP; cs.AI; cs.CY
  • Tags: Edge Computing, Healthcare Monitoring, Sensor Fusion
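  • Summary: This paper proposes an edge-cloud collaborative architecture for elderly care that combines real-time multimodal sensor fusion, a four-dimensional risk assessment model, and a three-level emergency response system, achieving sub-100 ms inference latency.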

[95] MemGround: Long-Term Memory Evaluation Kit for Large Language Models in Gamified Scenarios

  • arXiv: 2604.14158 (cross-listed)
  • Authors: Yihang Ding, Wanke Xia, Yiting Zhao, Jinbo Su, Jialiang Yang, Zhengbo Zhang, Ke Wang, Wenming Yang
  • Subjects: cs.CL; cs.AI
  • Tags: Long Context, Benchmark, LLM Evaluation
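  • Summary: This paper proposes MemGround, a long-term memory benchmark built on gamified interactive scenarios. It uses a three-tier hierarchical framework to evaluate surface state memory, temporal associative memory, and reasoning-based memory.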

[96] HUOZIIME: An On-Device LLM-enhanced Input Method for Deep Personalization

  • arXiv: 2604.14159 (cross-listed)
  • Authors: Baocai Shan, Yuzhuang Xu, Wanxiang Che
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Personalization, On-Device Learning, Text Generation
  • Code: code
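  • Summary: This paper proposes HUOZIIME, an LLM-powered, on-device personalized input method. A hierarchical memory mechanism continuously captures and uses user-specific input history, and the system is systematically optimized for mobile deployment.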

[97] Can Large Language Models Detect Methodological Flaws? Evidence from Gesture Recognition for UAV-Based Rescue Operation Based on Deep Learning

  • arXiv: 2604.14161 (cross-listed)
  • Authors: Domonkos Varga
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Evaluation, Scientific Reasoning
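  • Summary: This paper studies whether large language models can detect methodological flaws in machine learning research, in particular data leakage. Using a gesture recognition paper as a case study, the author finds that six state-of-the-art LLMs consistently identify the flaw in the evaluation protocol, which suggests that LLMs could serve as auxiliary tools for scientific auditing.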

[98] SeaAlert: Critical Information Extraction From Maritime Distress Communications with Large Language Models

  • arXiv: 2604.14163 (cross-listed)
  • Authors: Tomer Atia, Yehudit Aperstein, Alexander Apartsin
  • Subjects: cs.CL; cs.AI
  • Tags: Information Extraction, Speech Processing, Data Synthesis
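  • Summary: This paper proposes SeaAlert, an LLM-based framework for analyzing maritime distress communications. To address the scarcity of labeled data, the authors develop a synthetic data generation pipeline that produces realistic maritime distress messages and simulates VHF noise and ASR errors.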

[99] Chinese Essay Rhetoric Recognition Using LoRA, In-context Learning and Model Ensemble

  • arXiv: 2604.14167 (cross-listed)
  • Authors: Yuxuan Lai, Xiajing Wang, Chen Zheng
  • Subjects: cs.CL; cs.AI
  • Tags: Instruction Tuning, In-Context Learning, Education Technology
  • Venue: CCL 2025
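  • Summary: This paper explores large language models for Chinese essay rhetoric recognition, combining LoRA fine-tuning, in-context learning, and model ensembling. The method achieved the best results in all three tracks of the CCL 2025 Chinese essay rhetoric recognition evaluation task and won first prize.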

[100] SAGE Celer 2.6 Technical Card

  • arXiv: 2604.14168 (cross-listed)
  • Authors: SAGEA Research Team, Basab Jha, Firoj Paudel, Ujjwal Puri, Adrian Liu, Ethan Henkel, Zhang Yuting, Mateusz Kowalczyk, Mei Huang, Choi Donghyuk, Wang Junhao
  • Subjects: cs.CL; cs.AI
  • Tags: Multimodal Learning, Pre-training, Multilingual Learning
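  • Summary: This paper introduces the SAGE Celer 2.6 family of general-purpose models, available at 5B, 10B, and 27B parameters. The models use an inverse reasoning pipeline to reduce hallucination, integrate multimodal capabilities natively, and are specifically optimized for South Asian languages.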

[101] Stateful Evidence-Driven Retrieval-Augmented Generation with Iterative Reasoning

  • arXiv: 2604.14170 (cross-listed)
  • Authors: Qi Dong, Ziheng Lin, Ning Ding
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, Question Answering, LLM Reasoning
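  • Summary: This paper proposes a stateful, evidence-driven RAG framework that models question answering as a process of progressive evidence accumulation. The framework maintains a persistent evidence pool and performs evidence-driven deficiency analysis and iterative query refinement, outperforming standard RAG on multiple QA benchmarks.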

[102] Benchmarking Linguistic Adaptation in Comparable-Sized LLMs: A Study of Llama-3.1-8B, Mistral-7B-v0.1, and Qwen3-8B on Romanized Nepali

  • arXiv: 2604.14171 (cross-listed)
  • Authors: Ananda Rimal, Adarsha Rimal
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Evaluation, Low-Resource NLP, Multilingual Learning
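  • Summary: This paper systematically benchmarks the linguistic adaptation of three comparably sized open-source LLMs on Romanized Nepali. Using QLoRA fine-tuning, the study establishes the first rigorous benchmark for Romanized Nepali adaptation and finds that Qwen3-8B performs best overall.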

[103] Tug-of-War within A Decade: Conflict Resolution in Vulnerability Analysis via Teacher-Guided Retrieval-Augmented Generations

  • arXiv: 2604.14172 (cross-listed)
  • Authors: Ziyin Zhou, Jianyi Zhang, Xu ji, Yilong Li, Jiameng Han, Zhangchi Zhao
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, Cybersecurity, Knowledge Conflict
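  • Summary: This paper addresses knowledge discrepancies and conflicts in CVE vulnerability analysis and proposes CRVA-TGRAG, a two-stage framework. It strengthens LLMs for vulnerability analysis through improved document retrieval and teacher-guided preference optimization, effectively mitigating knowledge conflicts.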

[104] QU-NLP at ArchEHR-QA 2026: Two-Stage QLoRA Fine-Tuning of Qwen3-4B for Patient-Oriented Clinical Question Answering and Evidence Sentence Alignment

  • arXiv: 2604.14175 (cross-listed)
  • Authors: Mohammad AL-Smadi
  • Subjects: cs.CL; cs.AI
  • Tags: Question Answering, Medical AI, Parameter-Efficient Fine-Tuning
  • Venue: LREC 2026 Workshop
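  • Summary: This paper proposes a unified system for clinical question answering and evidence sentence alignment, based on two-stage QLoRA fine-tuning of Qwen3-4B. The system first builds domain competence on clinical corpora and then learns the output style on task-specific data.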

[105] The Devil Is in Gradient Entanglement: Energy-Aware Gradient Coordinator for Robust Generalized Category Discovery

  • arXiv: 2604.14176 (cross-listed)
  • Authors: Haiyang Zheng, Nan Pu, Yaqi Cai, Teng Long, Wenjing Li, Nicu Sebe, Zhun Zhong
  • Subjects: cs.LG; cs.AI; stat.ML
  • Tags: Continual Learning, Representation Learning, Optimization
  • Venue: CVPR 2026
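  • Summary: This paper identifies the gradient entanglement problem in generalized category discovery and proposes a plug-and-play Energy-Aware Gradient Coordinator (EAGC) to address it. EAGC uses anchor-based gradient alignment and energy-aware elastic projection to preserve the discriminative structure of known classes and reduce subspace overlap.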

[106] Listen, Correct, and Feed Back: Spoken Pedagogical Feedback Generation

  • arXiv: 2604.14177 (cross-listed)
  • Authors: Junhong Liang, Yifan Lu, Ekaterina Kochmar, Fajri Koto
  • Subjects: cs.CL; cs.AI
  • Tags: Speech Processing, Education Technology, Instruction Tuning
  • Code: code
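  • Summary: This paper introduces the SPFG dataset for spoken pedagogical feedback generation, built on the Speak & Improve Challenge 2025 corpus. The authors compare supervised fine-tuning with preference alignment methods (DPO/KTO) for generating corrections and feedback.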

[107] An Underexplored Frontier: Large Language Models for Rare Disease Patient Education and Communication -- A scoping review

  • arXiv: 2604.14179 (cross-listed)
  • Authors: Zaifu Zhan, Yu Hou, Kai Yu, Min Zeng, Anita Burgun, Xiaoyi Chen, Rui Zhang
  • Subjects: cs.CL; cs.AI
  • Tags: Medical AI, Survey, Question Answering
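  • Summary: This paper presents a scoping review of LLM applications in rare disease patient education and communication, identifying 12 relevant studies. The review finds that the field is still at an early stage, is dominated by general-purpose models, and focuses evaluation on accuracy while neglecting patient-centered dimensions.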

[108] Internal Knowledge Without External Expression: Probing the Generalization Boundary of a Classical Chinese Language Model

  • arXiv: 2604.14180 (cross-listed)
  • Authors: Jiuting Chen, Yuan Lian, Hao Wu, Tianqi Huang, Hiroshi Sasaki, Makoto Kouno, Jongil Choi
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Hallucination, Uncertainty Estimation, Interpretability
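  • Summary: This paper trains a 318M-parameter Transformer on a purely Classical Chinese corpus to study the dissociation between internal uncertainty and external expression. The model can internally distinguish known from unknown inputs but cannot express that uncertainty in its generated text.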

[109] End-to-End Learning-based Operation of Integrated Energy Systems for Buildings and Data Centers

  • arXiv: 2604.14184 (cross-listed)
  • Authors: Zhenyu Pu, Yu Yang, Liang Yu, Xiaohong Guan
  • Subjects: eess.SY; cs.AI
  • Tags: Energy Management, Optimization, Deep Learning Theory
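  • Summary: This paper studies the coordinated optimization of integrated energy systems for buildings and data centers, and proposes an end-to-end learning method that integrates forecasting model training and constrained optimization into a unified framework. The method improves operational performance by 7-9% over the conventional predict-then-optimize approach.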

[110] HARNESS: Lightweight Distilled Arabic Speech Foundation Models

  • arXiv: 2604.14186 (cross-listed)
  • Authors: Vrunda N. Sukhadia, Shammur Absar Chowdhury
  • Subjects: eess.AS; cs.AI; cs.CL
  • Tags: Speech Processing, Knowledge Distillation, Model Compression
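  • Summary: This paper proposes HArnESS, a family of Arabic-centric self-supervised speech models trained with iterative self-distillation. The lightweight student models achieve a good accuracy-efficiency trade-off on ASR, dialect identification, and speech emotion recognition.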

[111] Grading the Unspoken: Evaluating Tacit Reasoning in Quantum Field Theory and String Theory with LLMs

  • arXiv: 2604.14188 (cross-listed)
  • Authors: Xingyang Yu, Yinghuan Zhang, Yufei Zhang, Zijun Cui
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Evaluation, Scientific Reasoning, Benchmark
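  • Summary: This paper builds an expert dataset in quantum field theory and string theory and introduces a five-level grading rubric to evaluate LLM reasoning in highly abstract theoretical physics. Models perform well on explicit derivations but degrade systematically on tasks that require reconstructing implicit reasoning steps.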

[112] The PICCO Framework for Large Language Model Prompting: A Taxonomy and Reference Architecture for Prompt Structure

  • arXiv: 2604.14197 (cross-listed)
  • Authors: David A. Cook
  • Subjects: cs.CL; cs.AI
  • Tags: Prompt Engineering, Survey
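  • Summary: This paper synthesizes 11 published prompt frameworks to propose the PICCO framework for structuring LLM prompts. The framework defines a five-element reference architecture (persona, instructions, context, constraints, and output) intended to make prompt design conceptually clearer and more systematic.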

[113] MixAtlas: Uncertainty-aware Data Mixture Optimization for Multimodal LLM Midtraining

  • arXiv: 2604.14198 (cross-listed)
  • Authors: Bingbing Wen, Sirajul Salekin, Feiyang Kang, Bill Howe, Lucy Lu Wang, Javier Movellan, Manjot Bilkhu
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Data Mixing, Multimodal Learning, Vision-Language Model
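  • Summary: This paper proposes MixAtlas, a method for optimizing the data mixture used in multimodal LLM midtraining. It decomposes the training corpus along two axes, image concepts and task supervision, and uses proxy models with a Gaussian process surrogate to search for the optimal mixture, achieving an 8.5%-17.6% performance improvement on Qwen2-7B.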

[114] PolyBench: Benchmarking LLM Forecasting and Trading Capabilities on Live Prediction Market Data

  • arXiv: 2604.14199 (cross-listed)
  • Authors: Pu Cheng, Juncheng Liu, Yunshen Long
  • Subjects: q-fin.CP; cs.AI; cs.LG
  • Tags: Benchmark, LLM Evaluation, Financial AI
  • Code: code
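  • Summary: This paper proposes PolyBench, a multimodal benchmark based on Polymarket for evaluating LLM forecasting and trading capabilities on live prediction market data. The experiments evaluate seven state-of-the-art LLMs and find that only two achieve positive returns, revealing a gap between linguistic fluency and genuine probabilistic reasoning.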

[115] Retina gap junctions support the robust perception by warping neural representational geometries along the visual hierarchy

  • arXiv: 2604.14200 (cross-listed)
  • Authors: Yang Yue, Shenjian Zhang, Yonghong Tian, Kai Du, Tiejun Huang
  • Subjects: q-bio.NC; cs.AI
  • Tags: Adversarial Robustness, Computer Vision, Neuroscience
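  • Summary: This paper combines a retinal gap junction filter (G-filter) with DNNs to build a bio-hybrid model and study its defense against adversarial attacks. A geometric analysis shows that the model has highly nonlinear 2D decision boundaries with lower curvature, revealing the progressive influence of retinal gap junctions on brain manifolds.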

[116] Bridging scalp and intracranial EEG in BCI via pretrained neural representations and geometric constraint embedding

  • arXiv: 2604.14202 (cross-listed)
  • Authors: Yihang Dong, Changhong Jing, Shuqiang Wang
  • Subjects: q-bio.NC; cs.AI
  • Tags: Brain-Computer Interface, Representation Learning, Signal Processing
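  • Summary: This paper proposes a unified framework, driven by both data and prior knowledge, for EEG-iEEG representation enhancement. The framework maps static cortical anatomy to dynamic constraints on neural signal propagation, combines them with general neural representations extracted by a pretrained large EEG model, and synthesizes enhanced EEG signals through a multi-dimensional representation diffusion process.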

[117] Disentangled Dual-Branch Graph Learning for Conversational Emotion Recognition

  • arXiv: 2604.14204 (cross-listed)
  • Authors: Chengling Guo, Yuntao Shou, Tao Meng, Wei Ai, Yun Tan, Keqin Li
  • Subjects: cs.SD; cs.AI; eess.AS
  • Tags: Graph Neural Network, Affective Computing, Multimodal Learning
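  • Summary: This paper proposes a framework that combines dual-space feature disentanglement with dual-branch graph learning for multimodal emotion recognition in conversation. It separates representations using a shared encoder and modality-specific encoders, and models global consistency and high-order interactions with a Fourier graph neural network and a speaker-aware hypergraph, respectively.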

[118] Towards Verified and Targeted Explanations through Formal Methods

  • arXiv: 2604.14209 (cross-listed)
  • Authors: Hanchen David Wang, Diego Manzanas Lopez, Preston K. Robinette, Ipek Oguz, Taylor T. Johnson, Meiyi Ma
  • Subjects: cs.LG; cs.AI; stat.ML
  • Tags: Explainable AI, Formal Methods, Adversarial Robustness
  • Venue: JAIR
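  • Summary: This paper introduces ViTaX, a formal XAI framework for generating targeted semifactual explanations with mathematical guarantees. Given an input and a user-specified critical alternative class, ViTaX identifies the most sensitive feature subset and applies formal reachability analysis to guarantee that perturbing those features does not flip the classification to the target class.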

[119] Ollivier-Ricci Curvature of Riemannian Manifolds and Directed Graphs with Applications to Graph Neural Networks

  • arXiv: 2604.14211 (cross-listed)
  • Authors: Eleanor Wiesler
  • Subjects: math.DG; cs.AI; cs.SI; math.CO
  • Tags: Graph Neural Network, Optimal Transport, Graph Learning
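  • Summary: This paper presents the theory of Ollivier-Ricci curvature, including its connection to the classical Ricci curvature of Riemannian manifolds and its extension to graphs. It also gives novel results on directed graphs and discusses applications of graph-based Ollivier-Ricci curvature in network science and graph machine learning.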

[120] CROP: Token-Efficient Reasoning in Large Language Models via Regularized Prompt Optimization

  • arXiv: 2604.14214 (cross-listed)
  • Authors: Deep Shah, Sanket Badhe, Nehal Kathrotia, Priyanka Tiwari
  • Subjects: cs.CL; cs.AI
  • Tags: Prompt Engineering, LLM Reasoning, LLM Inference
  • Venue: ICLR 2026 Workshop
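  • Summary: This paper proposes CROP, an automatic prompt optimization method that introduces response-length regularization to produce concise reasoning traces. It reduces token consumption by 80.6% on the GSM8K, LogiQA, and BIG-Bench Hard datasets while maintaining competitive accuracy.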

[121] PriHA: A RAG-Enhanced LLM Framework for Primary Healthcare Assistant in Hong Kong

  • arXiv: 2604.14215 (cross-listed)
  • Authors: Richard Wai Cheung Chan, Shanru Lin, Ya-nan Ma, Hao Chen, Liangjun Jiang, Wenqi Fan
  • Subjects: cs.IR; cs.AI
  • Tags: RAG, Medical AI, Dialogue System
  • Venue: PAKDD 2026
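  • Summary: This paper proposes PriHA, a RAG-enhanced LLM system for primary healthcare in Hong Kong. The system uses a three-stage pipeline that includes a query optimizer and a novel dual retrieval-augmented generation (DRAG) architecture for hybrid-source retrieval and context-reorganized generation.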

[122] Neuro-Oracle: A Trajectory-Aware Agentic RAG Framework for Interpretable Epilepsy Surgical Prognosis

  • arXiv: 2604.14216 (cross-listed)
  • Authors: Aizierjiang Aiersilan, Mohamad Koubeissi
  • Subjects: cs.MM; cs.AI; cs.CL; cs.CV; cs.GR; cs.LG
  • Tags: RAG, Medical AI, LLM Agent
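  • Summary: This paper proposes Neuro-Oracle, a three-stage framework for predicting epilepsy surgery prognosis. It uses a 3D Siamese contrastive encoder to distill pre-operative to post-operative MRI changes into trajectory vectors, retrieves similar surgical trajectories via nearest-neighbor search, and uses a quantized Llama-3-8B reasoning agent to generate natural-language prognoses.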

[123] MEME-Fusion@CHiPSAL 2026: Multimodal Ablation Study of Hate Detection and Sentiment Analysis on Nepali Memes

  • arXiv: 2604.14218 (cross-listed)
  • Authors: Samir Wagle, Reewaj Khanal, Abiral Adhikari
  • Subjects: cs.CL; cs.AI
  • Tags: Multimodal Learning, Content Moderation, Low-Resource NLP
  • Code: code
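  • Summary: This paper proposes a hybrid cross-modal attention fusion architecture for hate speech detection and sentiment analysis on Nepali memes. A systematic evaluation shows that explicit cross-modal reasoning improves F1 by 5.9% over a text-only baseline on the hate detection subtask, and also exposes performance problems of English-centric vision models on Devanagari script.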

[124] Knowledge Graph RAG: Agentic Crawling and Graph Construction in Enterprise Documents

  • arXiv: 2604.14220 (cross-listed)
  • Authors: Koushik Chakraborty, Koyel Guha
  • Subjects: cs.IR; cs.AI
  • Tags: RAG, Knowledge Graph, LLM Agent
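  • Summary: This paper proposes an agentic knowledge graph with recursive crawling to address the limitations of semantic search in enterprise document ecosystems. A benchmark evaluation on the Code of Federal Regulations (CFR) shows that this knowledge-graph-augmented approach improves accuracy by 70% over a standard vector-based RAG system.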

[125] Adaptive Query Routing: A Tier-Based Framework for Hybrid Retrieval Across Financial, Legal, and Medical Documents

  • arXiv: 2604.14222 (cross-listed)
  • Authors: Afshan Hashmi
  • Subjects: cs.IR; cs.AI
  • Tags: RAG, Information Retrieval, LLM Evaluation
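  • Summary: This paper evaluates three retrieval architectures on financial, legal, and medical documents: vector RAG, tree reasoning, and adaptive hybrid retrieval. The experiments introduce a four-tier query complexity benchmark and find that no single paradigm dominates across all tiers, which supports adaptive retrieval systems that select a strategy dynamically based on query complexity.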

[126] TRACE: A Conversational Framework for Sustainable Tourism Recommendation with Agentic Counterfactual Explanations

  • arXiv: 2604.14223 (cross-listed)
  • Authors: Ashmi Banerjee, Adithi Satish, Wolfgang Wörndl, Yashar Deldjoo
  • Subjects: cs.IR; cs.AI
  • Tags: LLM Agent, Recommender System, Multi-Agent System
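  • Summary: This paper proposes TRACE, a multi-agent LLM-based framework for sustainable tourism recommendation. The framework uses a modular orchestrator-worker architecture and encourages sustainable tourism decisions through agentic counterfactual explanations and LLM-driven clarifying questions, while preserving recommendation quality and interactive responsiveness.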

[127] FRESCO: Benchmarking and Optimizing Re-rankers for Evolving Semantic Conflict in Retrieval-Augmented Generation

  • arXiv: 2604.14227 (cross-listed)
  • Authors: Sohyun An, Hayeon Lee, Shuibenyang Yuan, Chun-cheng Jason Chen, Cho-Jui Hsieh, Vijai Mohan, Alexander Min
  • Subjects: cs.IR; cs.AI
  • Tags: RAG, Benchmark, LLM Evaluation
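  • Summary: This paper introduces FRESCO, a benchmark for evaluating re-rankers in temporally dynamic contexts. By pairing recency-seeking queries with Wikipedia revision histories, FRESCO reveals that existing re-rankers are strongly biased toward older, semantically rich documents, and the paper proposes an instruction optimization framework to mitigate this.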

[128] Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems

  • arXiv: 2604.14228 (cross-listed)
  • Authors: Jiacheng Liu, Xiaohan Zhao, Xinyi Shang, Zhiqiang Shen
  • Subjects: cs.SE; cs.AI; cs.CL; cs.LG
  • Tags: LLM Agent, Software Engineering, Human-Computer Interaction
  • Code: code
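  • Summary: This paper analyzes the overall architecture of Claude Code and compares it with OpenClaw. The study identifies five human values and needs that drive the architectural design, traces thirteen design principles to concrete implementation choices, and identifies six open design directions for future agent systems.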

[129] Magnitude Is All You Need? Rethinking Phase in Quantum Encoding of Complex SAR Data

  • arXiv: 2604.14229 (cross-listed)
  • Authors: Sakthi Prabhu Gunasekar, Prasanna Kumar R
  • Subjects: cs.AI; cs.LG; eess.IV
  • Tags: Quantum Computing, Computer Vision
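  • Summary: This paper studies quantum encoding strategies for SAR data. It finds that magnitude-only encoding outperforms complex-valued strategies in hybrid quantum-classical architectures, whereas phase information becomes crucial in purely quantum architectures.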

[130] Shapley Value-Guided Adaptive Ensemble Learning for Explainable Financial Fraud Detection with U.S. Regulatory Compliance Validation

  • arXiv: 2604.14231 (cross-listed)
  • Authors: Mohammad Nasir Uddin, Md Munna Aziz
  • Subjects: cs.LG; cs.AI; cs.NE
  • Tags: Financial AI, Explainable AI, Graph Neural Network
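  • Summary: This paper proposes a SHAP-guided adaptive ensemble method (SGAE) for financial fraud detection. By dynamically adjusting ensemble weights, it achieves a high AUC-ROC while meeting the explainability requirements of regulatory compliance.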

[131] Explainable Graph Neural Networks for Interbank Contagion Surveillance: A Regulatory-Aligned Framework for the U.S. Banking Sector

  • arXiv: 2604.14232 (cross-listed)
  • Authors: Mohammad Nasir Uddin
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Neural Network, Financial AI, Explainable AI
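  • Summary: This paper proposes a spatio-temporal graph attention network (ST-GAT) framework for early-warning detection and macroprudential supervision in the U.S. interbank system, achieving a high AUPRC on FDIC data.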

[132] Graph-Based Fraud Detection with Dual-Path Graph Filtering

  • arXiv: 2604.14235 (cross-listed)
  • Authors: Wei He, Wensheng Gan, Philip S. Yu
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Neural Network, Financial AI
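  • Summary: This paper proposes DPF-GFD, a fraud detection model based on dual-path graph filtering. It handles relation camouflage and high heterophily in fraud graphs by capturing structural patterns and feature similarity separately.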

[133] Optimistic Policy Learning under Pessimistic Adversaries with Regret and Violation Guarantees

  • arXiv: 2604.14243 (cross-listed)
  • Authors: Sourav Ganguly, Kartik Pandit, Arnob Ghosh
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, Adversarial Robustness, AI Safety
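  • Summary: This paper studies safety-constrained reinforcement learning under explicit adversarial dynamics and proposes the RHC-UCRL algorithm, which achieves sublinear regret and constraint-violation guarantees when the agent faces adversarial policies.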

[134] Awakening Dormant Experts: Counterfactual Routing to Mitigate MoE Hallucinations

  • arXiv: 2604.14246 (cross-listed)
  • Authors: Wentao Hu, Yanbo Zhai, Xiaohui Hu, Mingkuan Zhao, Shanhong yu, Xue Liu, Kaidong Yu, Shuangyong Song, Xuelong Li
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Hallucination, Mixture-of-Experts
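  • Summary: This paper proposes Counterfactual Routing, a training-free inference framework that awakens dormant experts in mixture-of-experts models to reduce hallucinations, improving factual accuracy by 3.1% on average across benchmarks.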

[135] Evaluation of Agents under Simulated AI Marketplace Dynamics

  • arXiv: 2604.14256 (cross-listed)
  • Authors: To Eun Kim, Alireza Salemi, Hamed Zamani, Fernando Diaz
  • Subjects: cs.IR; cs.AI
  • Tags: LLM Evaluation, LLM Agent, Multi-Agent System
  • Venue: SIGIR 2026
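  • Summary: This paper introduces the marketplace evaluation paradigm, a simulation-based evaluation framework that assesses information access systems as participants in a competitive marketplace and supports longitudinal metrics such as retention and market share.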

[136] ReviewGrounder: Improving Review Substantiveness with Rubric-Guided, Tool-Integrated Agents

  • arXiv: 2604.14261 (cross-listed)
  • Authors: Zhuofeng Li, Yi Lu, Dongfu Jiang, Haoxiang Zhang, Yuyang Bai, Chuan Li, Yu Wang, Shuiwang Ji, Jianwen Xie, Yu Zhang
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Agent, Benchmark, Multi-Agent System
  • Code: code
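  • Summary: This paper introduces the REVIEWBENCH benchmark and the REVIEWGROUNDER framework, a rubric-guided multi-agent system that improves the quality of AI peer review by decomposing reviewing into a drafting stage and a substantiation stage.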

[137] GUI-Perturbed: Domain Randomization Reveals Systematic Brittleness in GUI Grounding Models

  • arXiv: 2604.14262 (cross-listed)
  • Authors: Yangyue Wang, Harshvardhan Sikka, Yash Mathur, Tony Zhou, Jinu Nyachhyon, Pranav Guruprasad
  • Subjects: cs.LG; cs.AI
  • Tags: GUI Automation, LLM Evaluation
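  • Summary: This paper introduces the GUI-Perturbed framework, which evaluates the robustness of GUI grounding models by varying the visual scene and the instruction independently. It reveals a systematic accuracy collapse caused by relational instructions.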

[138] Reinforcement Learning via Value Gradient Flow

  • arXiv: 2604.14265 (cross-listed)
  • Authors: Haoran Xu, Kaiwen Hu, Somayeh Sojoudi, Amy Zhang
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, Offline RL, LLM Training
  • Venue: ICLR 2026
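  • Summary: This paper proposes Value Gradient Flow (VGF), a scalable paradigm for behavior-regularized reinforcement learning that recasts the problem as an optimal transport problem. It achieves state-of-the-art results on offline RL and LLM tasks.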

[139] Enhancing LLM-based Search Agents via Contribution Weighted Group Relative Policy Optimization

  • arXiv: 2604.14267 (cross-listed)
  • Authors: Junzhe Wang, Zhiheng Xi, yajie yang, Hao Luo, Shihan Dou, Tao Gui, Qi Zhang
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Agent, Reinforcement Learning, Information Retrieval
  • Venue: ACL 2026
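  • Summary: This paper proposes the CW-GRPO framework, which integrates process supervision into group relative policy optimization for training LLM search agents. It achieves significant gains on knowledge-intensive benchmarks.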

[140] Quantum-inspired tensor networks in machine learning models

  • arXiv: 2604.14287 (cross-listed)
  • Authors: Guillermo Valverde, Igor García-Olaizola, Giannicola Scarpa, Alejandro Pozas-Kerstjens
  • Subjects: cs.LG; cs.AI
  • Tags: Survey, Quantum Computing, Representation Learning
  • Code: code
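  • Summary: This paper surveys the use of tensor networks in machine learning, discussing their potential advantages in computational efficiency, interpretability, and privacy, as well as the challenges that remain.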

[141] EuropeMedQA Study Protocol: A Multilingual, Multimodal Medical Examination Dataset for Language Model Evaluation

  • arXiv: 2604.14306 (cross-listed)
  • Authors: Francesco Andrea Causio, Vittorio De Vita, Olivia Riccomi, Michele Ferramola, Federico Felizzi, Antonio Cristiano, Lorenzo De Mori, Chiara Battipaglia, Melissa Sawaya, Luigi De Angelis, Marcello Di Pumpo, Alessandra Piscitelli, Pietro Eric Risuleo, Alessia Longo, Giulia Vojvodic, Mariapia Vassalli, Bianca Destro Castaniti, Nicolò Scarsi, Manuel Del Medico
  • Subjects: cs.CL; cs.AI
  • Tags: Medical AI, Benchmark, Multimodal Learning
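  • Summary: This paper describes the development of EuropeMedQA, the first multilingual, multimodal medical examination dataset drawn from European regulatory exams. It is designed to evaluate the cross-lingual transfer and visual reasoning abilities of LLMs.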

[142] Aerial Multi-Functional RIS in Fluid Antennas-Aided Full-Duplex Networks: A Self-Optimized Hybrid Deep Reinforcement Learning Approach

  • arXiv: 2604.14309 (cross-listed)
  • Authors: Li-Hsiang Shen, Yu-Quan Zheng
  • Subjects: cs.IT; cs.AI; eess.SP
  • Tags: Reinforcement Learning, Wireless Networks, Multi-Agent System
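  • Summary: This paper proposes an architecture that integrates autonomous aerial vehicles with multi-functional reconfigurable intelligent surfaces, and uses a self-optimized hybrid deep reinforcement learning approach to maximize the energy efficiency of full-duplex networks.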

[143] DharmaOCR: Specialized Small Language Models for Structured OCR that outperform Open-Source and Commercial Baselines

  • arXiv: 2604.14314 (cross-listed)
  • Authors: Gabriel Pimenta de Freitas Cardoso, Caio Lucas da Silva Chacon, Jonas Felipe da Fonseca Oliveira, Paulo Henrique de Medeiros Araujo
  • Subjects: cs.CV; cs.AI; cs.CL
  • Tags: OCR, Document Understanding, Instruction Tuning
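  • Summary: This paper introduces DharmaOCR, a specialized small language model for structured OCR. It uses DPO to reduce text degeneration and outperforms open-source and commercial baselines on DharmaOCR-Benchmark.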

[144] Challenges and Future Directions in Agentic Reverse Engineering Systems

  • arXiv: 2604.14317 (cross-listed)
  • Authors: Salem Radey, Jack West, Kassem Fawaz
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Agent, Cybersecurity, Software Engineering
  • Venue: SAGAI 2026 Workshop
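  • Summary: This paper analyzes how agentic systems perform on reverse engineering tasks. It identifies limitations that include token limits, difficulty handling obfuscation, and a lack of procedural guardrails, and proposes directions for future improvement.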

[145] Faithfulness Serum: Mitigating the Faithfulness Gap in Textual Explanations of LLM Decisions via Attribution Guidance

  • arXiv: 2604.14325 (cross-listed)
  • Authors: Bar Alon, Itamar Zimerman, Lior Wolf
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, Interpretability, Explainable AI
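  • Summary: This paper addresses the faithfulness of LLM textual explanations and proposes a training-free method that strengthens the epistemic faithfulness of explanations through attention-level interventions. Experiments show that the method significantly improves explanation faithfulness across multiple models and benchmarks.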

[146] Thermodynamic Diffusion Inference with Minimal Digital Conditioning

  • arXiv: 2604.14332 (cross-listed)
  • Authors: Aditi De
  • Subjects: cs.LG; cs.AI
  • Tags: Diffusion Model, Energy Efficiency, Hardware Acceleration
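  • Summary: This paper proposes a thermodynamic diffusion inference method that uses a physical substrate in place of digital operations for diffusion model inference. Through hierarchical bilinear coupling and a minimal digital interface, the method reduces energy consumption by roughly 10,000x relative to GPUs while maintaining high accuracy.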

[147] Mamba-SSM with LLM Reasoning for Biomarker Discovery: Causal Feature Refinement via Chain-of-Thought Gene Evaluation

  • arXiv: 2604.14334 (cross-listed)
  • Authors: Pushpa Kumar Balan, Aijing Feng
  • Subjects: q-bio.QM; cs.AI
  • Tags: LLM Reasoning, Medical AI, Bioinformatics
  • Venue: ICLR 2026 Workshop
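  • Summary: This paper combines a Mamba SSM with LLM chain-of-thought reasoning for breast cancer biomarker discovery. The LLM-filtered gene set achieves a higher AUC with fewer features, but a faithfulness audit reveals a gap between downstream performance and reasoning faithfulness.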

[148] Tight Sample Complexity Bounds for Best-Arm Identification Under Bounded Systematic Bias

  • arXiv: 2604.14345 (cross-listed)
  • Authors: Tianhao Qian
  • Subjects: cs.LG; cs.AI; stat.ML
  • Tags: LLM Reasoning, Automated Planning, Reinforcement Learning
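  • Summary: This paper models node expansion in autonomous reasoning as a best-arm identification problem with bounded systematic bias, establishes sample complexity bounds, and proves structural limits. Experiments show that respecting local safety boundaries effectively preserves optimal trajectories.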

[149] When PCOS Meets Eating Disorders: An Explainable AI Approach to Detecting the Hidden Triple Burden

  • arXiv: 2604.14356 (cross-listed)
  • Authors: Apoorv Prasad, Susan McRoy
  • Subjects: cs.CL; cs.AI
  • Tags: Medical AI, Explainable AI, Text Classification
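  • Summary: This paper develops small open-source language models that automatically detect the PCOS-related triple burden in social media posts. The models are fine-tuned with LoRA to generate structured explanations backed by textual evidence and perform well on the screening task.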

[150] APEX-MEM: Agentic Semi-Structured Memory with Temporal Reasoning for Long-Term Conversational AI

  • arXiv: 2604.14362 (cross-listed)
  • Authors: Pratyay Banerjee, Masud Moshtaghi, Shivashankar Subramanian, Amita Misra, Ankit Chadha
  • Subjects: cs.CL; cs.AI; cs.IR
  • Tags: LLM Agent, Dialogue System, Long Context
  • Venue: ACL 2026
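  • Summary: This paper proposes APEX-MEM, a conversational memory system that uses a property graph to structure conversations into temporally anchored events. With a multi-tool retrieval agent, the system performs strongly on the LOCOMO and LongMemEval benchmarks.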

[151] The Cost of Language: Centroid Erasure Exposes and Exploits Modal Competition in Multimodal Language Models

  • arXiv: 2604.14363 (cross-listed)
  • Authors: Akshay Paruchuri, Ishan Chatterjee, Henry Fuchs, Ehsan Adeli, Piotr Didyk
  • Subjects: cs.CL; cs.AI; cs.CV
  • Tags: Vision-Language Model, Multimodal Learning, Interpretability
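  • Summary: This paper uses centroid replacement as a probe to study modal competition in multimodal language models. It finds that language representations still dominate visual representations in visual reasoning tasks, and proposes text-centroid contrastive decoding to recover accuracy.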

[152] SatBLIP: Context Understanding and Feature Identification from Satellite Imagery with Vision-Language Learning

  • arXiv: 2604.14373 (cross-listed)
  • Authors: Xue Wu, Shengting Cao, Jiaqi Gong
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Remote Sensing, Interpretability
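  • Summary: This paper proposes SatBLIP, a satellite vision-language framework for understanding rural environments. The model predicts a county-level social vulnerability index through contrastive image-text alignment and bootstrapped captioning, and uses SHAP to identify key attributes.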

[153] Modular Continual Learning via Zero-Leakage Reconstruction Routing and Autonomous Task Discovery

  • arXiv: 2604.14375 (cross-listed)
  • Authors: Noureddine Kermiche
  • Subjects: cs.LG; cs.AI
  • Tags: Continual Learning, Knowledge Distillation, Neural Architecture
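  • Summary: This paper proposes a silicon-native modular architecture for continual learning that achieves structural parameter isolation through task-specific experts and a distributed gating mechanism. The framework uses tight-bottleneck autoencoders to distinguish manifolds, enabling lifelong learning without catastrophic forgetting.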

[154] Step-level Denoising-time Diffusion Alignment with Multiple Objectives

  • arXiv: 2604.14379 (cross-listed)
  • Authors: Qi Zhang, Dawei Wang, Shaofeng Zou
  • Subjects: cs.LG; cs.AI; cs.CV
  • Tags: Diffusion Model, Reinforcement Learning, Multi-Task Learning
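  • Summary: This paper proposes MSDDA, a retraining-free framework for multi-objective alignment of diffusion models. The method obtains the optimal reverse denoising distribution in closed form, and is proven to be exactly equivalent to step-level RL fine-tuning with no approximation error.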

[155] Coalition Formation in LLM Agent Networks: Stability Analysis and Convergence Guarantees

  • arXiv: 2604.14386 (cross-listed)
  • Authors: Dongxin Guo, Jikun Wu, Siu-Ming Yiu
  • Subjects: cs.GT; cs.AI
  • Tags: LLM Agent, Multi-Agent System, Game AI
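  • Summary: This paper is the first to ground coalition formation in LLM agent networks in the framework of hedonic game theory. It proposes the LLM coalition formation game with formal stability guarantees, and experiments validate the framework across multiple LLMs.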

[156] BiCon-Gate: Consistency-Gated De-colloquialisation for Dialogue Fact-Checking

  • arXiv: 2604.14389 (cross-listed)
  • Authors: Hyunkyung Park, Arkaitz Zubiaga
  • Subjects: cs.CL; cs.AI
  • Tags: Dialogue System, Fact Checking, Information Extraction
  • Venue: FEVER 2026
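  • Summary: This paper proposes BiCon-Gate, a semantics-aware consistency gating mechanism for dialogue fact-checking. The method handles colloquial expressions in dialogue through staged de-colloquialisation, improving evidence retrieval and fact verification on the DialFact benchmark.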

[157] Generating Concept Lexicalizations via Dictionary-Based Cross-Lingual Sense Projection

  • arXiv: 2604.14397 (cross-listed)
  • Authors: David Basil, Chirooth Girigowda, Bradley Hauer, Sahir Momin, Ning Shi, Grzegorz Kondrak
  • Subjects: cs.CL; cs.AI
  • Tags: Machine Translation, Knowledge Graph, Multilingual Learning
  • Venue: Canadian AI 2026
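  • Summary: This paper proposes cross-lingual sense projection through bilingual dictionaries to extend WordNet-style lexical resources to new languages. The project-and-filter strategy improves precision while remaining interpretable and requiring few external resources.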

[158] SpaceMind: A Modular and Self-Evolving Embodied Vision-Language Agent Framework for Autonomous On-orbit Servicing

  • arXiv: 2604.14399 (cross-listed)
  • Authors: Aodi Wu, Haodong Han, Xubo Luo, Ruisuo Wang, Shan He, Xue Wan
  • Subjects: cs.RO; cs.AI; eess.SY
  • Tags: Embodied AI, Vision-Language Model, Robotics
  • Code: code
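  • Summary: This paper proposes SpaceMind, a modular vision-language agent framework for autonomous on-orbit servicing. The framework decomposes knowledge, tools, and reasoning into extensible dimensions and includes a skill self-evolution mechanism for continual improvement.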

[159] Three-Phase Transformer

  • arXiv: 2604.14430 (cross-listed)
  • Authors: Mohammad R. Abu Ayyash
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Pre-training, Neural Architecture, Transformer Architecture
  • Code: code
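  • Summary: This paper proposes the Three-Phase Transformer (3PT), a structural prior on the residual stream for decoder Transformers. The architecture partitions hidden vectors into cyclic channels and achieves lower perplexity and faster convergence on WikiText-103.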

[160] LLMs taking shortcuts in test generation: A study with SAP HANA and LevelDB

  • arXiv: 2604.14437 (cross-listed)
  • Authors: Vekil Bekmyradov, Noah C. Pütz, Thomas Bartz-Beielstein
  • Subjects: cs.SE; cs.AI
  • Tags: Code Generation, Software Testing, LLM Evaluation
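  • Summary: This paper studies LLM behavior in automated test generation, comparing performance on an open-source system (LevelDB) and a proprietary one (SAP HANA). Results show that LLMs perform well on familiar benchmarks but, in unseen complex domains, often prioritize compilability over semantic validity.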

[161] Hierarchical vs. Flat Iteration in Shared-Weight Transformers

  • arXiv: 2604.14442 (cross-listed)
  • Authors: Sang-Il Han
  • Subjects: cs.CL; cs.AI
  • Tags: Transformer Architecture, Neural Architecture
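  • Summary: This paper empirically studies whether a hierarchical shared-weight recurrent structure can match the representation quality of stacked independent layers in Transformer language models. The author proposes HRM-LM, which replaces independent Transformer layers with a recurrent pair of fast and slow modules, and finds a significant performance gap between the two approaches.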

[162] Robustness Analysis of Machine Learning Models for IoT Intrusion Detection Under Data Poisoning Attacks

  • arXiv: 2604.14444 (cross-listed)
  • Authors: Fortunatus Aabangbio Wulnye, Justice Owusu Agyemang, Kwame Opuni-Boachie Obour Agyekum, Kwame Agyeman-Prempeh Agyekum, Kingsford Sarkodie Obeng Kwakye, Francisca Adomaa Acheampong
  • Subjects: cs.CR; cs.AI
  • Tags: Data Poisoning, IoT, Cybersecurity
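  • Summary: This study evaluates the robustness of four machine learning classifiers (random forest, gradient boosting machine, logistic regression, and deep neural network) in IoT intrusion detection systems under data poisoning attacks. Results show that ensemble models remain relatively stable, while logistic regression and deep neural networks degrade by up to 40% under label manipulation and outlier attacks.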

[163] Crowdsourcing of Real-world Image Annotation via Visual Properties

  • arXiv: 2604.14449 (cross-listed)
  • Authors: Xiaolei Diao, Fausto Giunchiglia
  • Subjects: cs.CV; cs.AI
  • Tags: Data Annotation, Knowledge Representation
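  • Summary: This paper proposes an image annotation method that integrates knowledge representation, natural language processing, and computer vision, using visual property constraints to reduce annotator subjectivity. It introduces an interactive crowdsourcing framework that asks questions dynamically based on a predefined object category hierarchy and annotator feedback, guiding the annotation process.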

[164] FAIR Universe Weak Lensing ML Uncertainty Challenge: Handling Uncertainties and Distribution Shifts for Precision Cosmology

  • arXiv: 2604.14451 (cross-listed)
  • Authors: Biwei Dai, Po-Wen Chang, Wahid Bhimji, Paolo Calafiura, Ragansu Chakkappai, Yuan-Tang Chou, Sascha Diefenbacher, Jordan Dudley, Ibrahim Elsharkawy, Steven Farrell, Isabelle Guyon, Chris Harris, Elham E Khoda, Benjamin Nachman, David Rousseau, Uroš Seljak, Ihsan Ullah, Yulei Zhang
  • Subjects: astro-ph.CO; cs.AI; cs.CV
  • Tags: Benchmark, Uncertainty Estimation, Scientific Computing
  • Code: code
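  • Summary: This paper releases the first weak gravitational lensing benchmark dataset with realistic systematics and launches the FAIR Universe Weak Lensing ML Uncertainty Challenge. The challenge focuses on measuring fundamental properties of the universe from weak lensing data under limited training data and potential distribution shifts, providing a standardized benchmark for comparing methods.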

[165] FocalLens: Visualizing Narratives through Focalization

  • arXiv: 2604.14456 (cross-listed)
  • Authors: S M Raihanul Alam, Md Dilshadur Rahman, Md Naimul Hoque
  • Subjects: cs.HC; cs.AI
  • Tags: Data Visualization, Natural Language Understanding
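  • Summary: This paper proposes FocalLens, a novel visualization method that represents narratives through focalization (that is, determining who perceives the events in a narrative). The tool helps writers and scholars analyze how different characters perceive, participate in, observe, and narrate events in a story, offering a new perspective for narrative analysis.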

[166] Auxiliary Finite-Difference Residual-Gradient Regularization for PINNs

  • arXiv: 2604.14472 (cross-listed)
  • Authors: Stavros Kassinos
  • Subjects: cs.LG; cs.AI; cs.CE
  • Tags: Physics-Informed Learning, Scientific Computing
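  • Summary: This paper proposes a hybrid physics-informed neural network design that uses finite differences in an auxiliary regularization term penalizing gradients of the sampled residual field. Experiments show improved boundary-condition accuracy and wall-flux prediction on a Poisson equation and a 3D annular heat-conduction benchmark.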

[167] A Nonasymptotic Theory of Gain-Dependent Error Dynamics in Behavior Cloning

  • arXiv: 2604.14484 (cross-listed)
  • Authors: Junghoon Seo
  • Subjects: cs.RO; cs.AI; math.OC
  • Tags: Imitation Learning, Robotics
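  • Summary: This paper establishes a nonasymptotic theory of behavior cloning failure rates on position-controlled robots, revealing how controller gains affect error propagation in the closed-loop dynamics. The analysis shows that compliant, overdamped controllers improve behavior cloning success rates, providing the first nonasymptotic explanation for empirical findings.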

[168] Decoupling Identity from Utility: Privacy-by-Design Frameworks for Financial Ecosystems

  • arXiv: 2604.14495 (cross-listed)
  • Authors: Ifayoyinsola Ibikunle, Tyler Farnan, Senthil Kumar, Mayana Pereira
  • Subjects: cs.CE; cs.AI; cs.CR
  • Tags: Differential Privacy, Financial AI, Data Synthesis
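  • Summary: This paper examines differentially private synthetic data as a privacy-by-design framework for financial institutions, comparing two generation paradigms: direct tabular synthesis and DP-seeded agent modeling. These approaches satisfy regulatory requirements while removing traditional data-sanitization bottlenecks, enabling cross-institutional research and compliant decision-making.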

[169] On the Expressive Power and Limitations of Multi-Layer SSMs

  • arXiv: 2604.14501 (cross-listed)
  • Authors: Nikola Zubić, Qian Li, Yuyi Wang, Davide Scaramuzza
  • Subjects: cs.LG; cs.AI; cs.CC
  • Tags: Deep Learning Theory, Neural Architecture
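  • Summary: This paper studies the expressive power and limitations of multi-layer state space models, finding fundamental limits on compositional tasks. It shows that online chain-of-thought makes SSMs equivalent to streaming algorithms, whereas offline chain-of-thought does not fundamentally increase expressive power, revealing how depth, precision, and chain-of-thought shape the capability boundaries of SSMs.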

[170] NewsTorch: A PyTorch-based Toolkit for Learner-oriented News Recommendation

  • arXiv: 2604.14510 (cross-listed)
  • Authors: Rongyao Wang, Veronica Liesaputra, Zhiyi Huang
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Education Technology
  • Code: code
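  • Summary: This paper proposes NewsTorch, a PyTorch-based news recommendation toolkit designed for learners. The toolkit provides a modular, decoupled, and extensible framework with a user-friendly GUI platform, supporting dataset download and preprocessing as well as training, validation, and testing of state-of-the-art neural news recommendation models.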

[171] CBCL: Safe Self-Extending Agent Communication

  • arXiv: 2604.14512 (cross-listed)
  • Authors: Hugo O'Connor
  • Subjects: cs.CR; cs.AI; cs.FL; cs.LO
  • Tags: Multi-Agent System, Formal Methods
  • Venue: IEEE LangSec 2026 Workshop
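  • Summary: This paper proposes CBCL, an agent communication language that constrains all messages to the class of deterministic context-free languages while allowing agents to define and transmit domain-specific dialect extensions. Safety properties are formalized in Lean 4, and a reference parser and dialect engine are implemented in Rust.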

[172] CSRA: Controlled Spectral Residual Augmentation for Robust Sepsis Prediction

  • arXiv: 2604.14532 (cross-listed)
  • Authors: Honglin Guo, Rihao Chang, He Jiao, Weizhi Nie, Zhongheng Zhang, Yuehao Shen
  • Subjects: cs.LG; cs.AI
  • Tags: Medical AI, Data Augmentation, Time Series Analysis
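  • Summary: This paper proposes CSRA, a framework for sepsis prediction from short-window ICU time series that generates clinically plausible trajectory variations through input-adaptive residual perturbations in the spectral domain. Experiments show consistent gains in predictive accuracy on MIMIC-IV, especially with shorter observation windows and limited training data.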

[173] VeriGraphi: A Multi-Agent Framework of Hierarchical RTL Generation for Large Hardware Designs

  • arXiv: 2604.14550 (cross-listed)
  • Authors: Sazzadul Islam, Tasnim Tabassum, Hao Zheng
  • Subjects: cs.AR; cs.AI; cs.LG; cs.MA; cs.PL
  • Tags: RTL Generation, Multi-Agent System, Knowledge Graph
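  • Summary: This paper proposes VeriGraphi, a framework that addresses Verilog generation for large hierarchical hardware designs by introducing a specification-anchored knowledge graph as the architectural foundation of the RTL generation pipeline. The multi-agent approach demonstrates reliable hierarchical RTL generation in a RISC-V processor case study.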

[174] Controllable Video Object Insertion via Multiview Priors

  • arXiv: 2604.14556 (cross-listed)
  • Authors: Xia Qi, Peishan Cong, Yichen Yao, Ziyi Wang, Yaoqin Ye, Yuexin Ma
  • Subjects: cs.CV; cs.AI
  • Tags: Video Generation, 3D Vision
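  • Summary: This paper proposes a video object insertion method that integrates multi-view object priors to address appearance consistency and occlusion handling in dynamic environments. The framework lifts 2D reference images into a multi-view representation and uses a dual-path view-consistent conditioning mechanism to ensure stable identity guidance and robust integration across views.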

[175] Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG

  • arXiv: 2604.14572 (cross-listed)
  • Authors: Yiqun Sun, Pengfei Wei, Lawrence B. Hsieh
  • Subjects: cs.IR; cs.AI; cs.CL; cs.MA
  • Tags: RAG, LLM Agent, Question Answering
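  • Summary: This paper proposes Corpus2Skill, which distills a document corpus offline into a hierarchical skill directory that an LLM agent navigates at serving time. On the enterprise customer-support benchmark WixQA, it outperforms dense retrieval, RAPTOR, and agentic RAG baselines, letting the agent reason about where to look and backtrack from unproductive paths.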

[176] Generative Augmented Inference

  • arXiv: 2604.14575 (cross-listed)
  • Authors: Cheng Lu, Mengxin Wang, Dennis J. Zhang, Heng Zhang
  • Subjects: cs.LG; cs.AI; stat.ME; stat.ML
  • Tags: Weak Supervision, Data Annotation
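  • Summary: This paper proposes the Generative Augmented Inference (GAI) framework, which incorporates AI-generated outputs as informative features when estimating models of human-labeled outcomes. The method uses an orthogonal moment construction to achieve consistent estimation and valid inference, and substantially reduces human labeling requirements in conjoint analysis, retail pricing, and health insurance choice applications.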

[177] CPGRec+: A Balance-oriented Framework for Personalized Video Game Recommendations

  • arXiv: 2604.14586 (cross-listed)
  • Authors: Xiping Li, Aier Yang, Jianghong Ma, Kangzhe Liu, Shanshan Feng, Haijun Zhang, Yi Zhao
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Graph Neural Network, LLM Reasoning
  • Venue: TOIS
  • Code: code
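  • Summary: This paper proposes CPGRec+, a balance-oriented framework for personalized video game recommendation. By introducing a preference-aware edge reweighting module (PER) and a preference-aware representation generation module (PRG), it addresses the accuracy-diversity trade-off of existing GNN methods and uses LLM reasoning to refine player and game representations.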

[178] AgileLog: A Forkable Shared Log for Agents on Data Streams

  • arXiv: 2604.14590 (cross-listed)
  • Authors: Shreesha G. Bhat, Tony Hong, Michael Noguera, Ramnatthan Alagappan, Aishwarya Ganesan
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Agent, Data Streaming
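  • Summary: This paper proposes AgileLog, a forkable shared log abstraction that supports AI agents operating on data streaming systems. By providing a log-forking primitive, the framework addresses performance interference between agent tasks and the safe handling of agent writes.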

[179] Mechanistic Decoding of Cognitive Constructs in LLMs

  • arXiv: 2604.14593 (cross-listed)
  • Authors: Yitong Shou, Manhao Guan
  • Subjects: cs.CL; cs.AI
  • Tags: Interpretability, LLM Alignment, Affective Computing
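  • Summary: This paper proposes a cognitive reverse-engineering framework based on representation engineering to analyze how LLMs process complex emotions such as jealousy. Experiments show that models encode jealousy as a structured linear combination, and demonstrate how toxic emotional states can be mechanistically detected and suppressed.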

[180] CausalDetox: Causal Head Selection and Intervention for Language Model Detoxification

  • arXiv: 2604.14602 (cross-listed)
  • Authors: Yian Wang, Yuen Chen, Agam Goyal, Hari Sundaram
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Alignment, LLM Safety, Interpretability
  • Venue: ACL 2026
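  • Summary: This paper proposes the CAUSALDETOX framework, which detoxifies LLMs by identifying and intervening on the attention heads causally responsible for toxic generation. The method uses the probability of necessity and sufficiency (PNS) to isolate a minimal set of toxic heads and supports both inference-time intervention and fine-tuning.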

[181] Hijacking Large Audio-Language Models via Context-Agnostic and Imperceptible Auditory Prompt Injection

  • arXiv: 2604.14604 (cross-listed)
  • Authors: Meng Chen, Kun Wang, Li Lu, Jiaheng Zhang, Tianwei Zhang
  • Subjects: cs.CR; cs.AI; cs.SD
  • Tags: LLM Security, Adversarial Robustness, Multimodal Learning
  • Venue: IEEE S&P 2026
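  • Summary: This paper reveals the threat of auditory prompt injection in large audio-language models and proposes AudioHijack, a framework that generates context-agnostic, imperceptible adversarial audio to hijack the model. Experiments show hijacking success rates of 79%-96% across multiple models.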

[182] Uncertainty-aware Generative Learning Path Recommendation with Cognition-Adaptive Diffusion

  • arXiv: 2604.14613 (cross-listed)
  • Authors: Xiangrui Xiong, Hang Liang, Baiyang Chen, Zifei Pan, Yanli Lee
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Diffusion Model, Education Technology
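  • Summary: This paper proposes U-GLAD, a framework for uncertainty-aware generative learning path recommendation. The method models cognitive states as probability distributions and uses a diffusion model to predict the optimal learning concepts, addressing uncertainty in historical interactions and goal adaptivity.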

[183] Retrieve, Then Classify: Corpus-Grounded Automation of Clinical Value Set Authoring

  • arXiv: 2604.14616 (cross-listed)
  • Authors: Sumit Mukherjee, Juan Shu, Nairwita Mazumder, Tate Kernell, Celena Wheeler, Shannon Hastings, Chris Sidey-Gibbons
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: RAG, Medical AI, Information Extraction
  • Code: code
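  • Summary: This paper proposes Retrieval-Augmented Set Completion (RASC) for clinical value set authoring: it retrieves similar value sets from a corpus to form a candidate pool, then classifies the candidate codes. The work builds the first large-scale benchmark on VSAC data, and the method significantly outperforms zero-shot GPT-4o.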

[184] Asking What Matters: Reward-Driven Clarification for Software Engineering Tasks

  • arXiv: 2604.14624 (cross-listed)
  • Authors: Sanidhya Vijayvargiya, Vijay Viswanathan, Graham Neubig
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Agent, Reinforcement Learning, Software Engineering
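  • Summary: This paper studies clarification in software engineering tasks. By quantifying how different types of information affect task success, it identifies two key properties, task relevance and user answerability, and trains the CLARITI clarification module with a reinforcement learning reward based on them.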

[185] ELMoE-3D: Leveraging Intrinsic Elasticity of MoE for Hybrid-Bonding-Enabled Self-Speculative Decoding in On-Premises Serving

  • arXiv: 2604.14626 (cross-listed)
  • Authors: Yuseon Choi, Jingu Lee, Jungjun Oh, Sunjoo Whang, Byeongcheol Kim, Minsung Kim, Hoi-Jun Yoo, Sangjin Kim
  • Subjects: cs.LG; cs.AI; cs.AR; cs.DC
  • Tags: Mixture-of-Experts, Speculative Decoding, Hardware Acceleration
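  • Summary: This paper proposes ELMoE-3D, a hybrid-bonding-based hardware-software co-design framework that unifies cache acceleration and speculative decoding. The method exploits the intrinsic elasticity of MoE to build elastic self-speculative decoding, achieving a 6.6x speedup and a 4.4x energy-efficiency gain on 3D-stacked hardware.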

[186] StoryCoder: Narrative Reformulation for Structured Reasoning in LLM Code Generation

  • arXiv: 2604.14631 (cross-listed)
  • Authors: Geonhui Jang, Dongyoon Han, YoungJoon Yoo
  • Subjects: cs.CL; cs.AI
  • Tags: Code Generation, Prompt Engineering, LLM Reasoning
  • Venue: ACL 2026
  • Code: code
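  • Summary: This paper proposes StoryCoder, a narrative reformulation framework that turns code generation problems into coherent natural-language narratives. Experiments show an average zero-shot pass@10 gain of 18.7% across multiple benchmarks, and the method steers models toward correct algorithmic strategies.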

[187] Fact4ac at the Financial Misinformation Detection Challenge Task: Reference-Free Financial Misinformation Detection via Fine-Tuning and Few-Shot Prompting of Large Language Models

  • arXiv: 2604.14640 (cross-listed)
  • Authors: Cuong Hoang, Le-Minh Nguyen
  • Subjects: cs.CL; cs.AI
  • Tags: Fake News Detection, Financial AI, Parameter-Efficient Fine-Tuning
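  • Summary: This paper presents the winning approach to the reference-free financial misinformation detection shared task, combining in-context learning with parameter-efficient fine-tuning (LoRA). It reaches 95.4% and 96.3% accuracy on the public and private test sets respectively, ranking first on both leaderboards.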

[188] Chaotic CNN for Limited Data Image Classification

  • arXiv: 2604.14645 (cross-listed)
  • Authors: Anusree M, Akhila Henry, Pramod P Nair
  • Subjects: cs.CV; cs.AI; nlin.CD
  • Tags: Image Classification, Data Augmentation
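  • Summary: This paper proposes a chaos-based feature transformation that applies the logistic map, skew tent map, and sine map as nonlinear transforms of feature vectors, improving CNN image classification in limited-data settings without increasing model complexity.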

[189] Seen-to-Scene: Keep the Seen, Generate the Unseen for Video Outpainting

  • arXiv: 2604.14648 (cross-listed)
  • Authors: Inseok Jeon, Minhyeok Lee, Seunghoon Lee, Minseok Kang, Suhwan Cho, Sangyoun Lee
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Video Generation, Diffusion Model, Video Understanding
  • Venue: CVPR 2026 Findings
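  • Summary: This paper proposes Seen-to-Scene, a framework that unifies propagation-based and generation-based paradigms for video outpainting. Using optical-flow-based propagation and reference-guided latent propagation, it achieves better temporal coherence and visual realism.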

[190] AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime

  • arXiv: 2604.14661 (cross-listed)
  • Authors: Jianhao Su, Zhanwei Wu, ShengTing Huang, Weidong Feng
  • Subjects: cs.SE; cs.AI; cs.LG
  • Tags: LLM Agent, DNN Deployment, Edge Computing
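  • Summary: This paper proposes AIPC, an AI-agent-driven approach to automating AI model deployment, targeting Qualcomm AI Runtime. It decomposes deployment into standardized stages and injects deployment domain knowledge, substantially lowering the expertise barrier and engineering time.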

[191] Bounded Autonomy for Enterprise AI: Typed Action Contracts and Consumer-Side Execution

  • arXiv: 2604.14723 (cross-listed)
  • Authors: Sarmad Sohail, Ghufran Haider
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Agent, AI Safety, Enterprise AI
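  • Summary: This paper proposes a bounded-autonomy architecture in which the LLM can interpret intent and propose actions, but all executable behavior is constrained by typed action contracts, permission-aware capability exposure, and validation mechanisms. Evaluation shows a 13-18x speedup over manual operation while preserving safety.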

[192] Catching Every Ripple: Enhanced Anomaly Awareness via Dynamic Concept Adaptation

  • arXiv: 2604.14726 (cross-listed)
  • Authors: Jiaqi Zhu, Shaofeng Cai, Jie Chen, Fang Deng, Beng Chin Ooi, Wenqiao Zhang
  • Subjects: cs.LG; cs.AI
  • Tags: Anomaly Detection, Continual Learning, Time Series Analysis
  • Venue: IEEE TPAMI
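  • Summary: This paper proposes DyMETER, a dynamic concept adaptation framework for online anomaly detection that unifies on-the-fly parameter shifting and dynamic threshold adjustment. It uses a hypernetwork to generate instance-aware parameter shifts, adapting to concept drift without retraining.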

[193] Which bird does not have wings: Negative-constrained KGQA with Schema-guided Semantic Matching and Self-directed Refinement

  • arXiv: 2604.14749 (cross-listed)
  • Authors: Midan Shim, Seokju Hwang, Kaehyun Um, Kyong-Ho Lee
  • Subjects: cs.CL; cs.AI
  • Tags: Knowledge Graph, Question Answering
  • Venue: ACL 2026 Findings
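  • Summary: Addressing the neglect of negative constraints in knowledge graph question answering, this paper proposes the NEST task and the NestKGQA dataset, designs PyLF, a Python-formatted logical form, and proposes the CUCKOO framework for handling multi-constraint questions.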

[194] Temporal Cross-Modal Knowledge-Distillation-Based Transfer-Learning for Gas Turbine Vibration Fault Detection

  • arXiv: 2604.14766 (cross-listed)
  • Authors: Ali Bagheri Nejad, Mahdi Aliyari-Shoorehdeli, Abolfazl Hasanzadeh
  • Subjects: eess.SP; cs.AI
  • Tags: Knowledge Distillation, Anomaly Detection, Transfer Learning
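  • Summary: This paper proposes a temporal cross-modal knowledge-distillation transfer-learning framework for gas turbine vibration fault detection. A teacher model trained on large time windows distills knowledge into a compact student model, addressing data scarcity and domain shift.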

[195] Zero-Shot Retail Theft Detection via Orchestrated Vision Models: A Model-Agnostic, Cost-Effective Alternative to Trained Single-Model Systems

  • arXiv: 2604.14846 (cross-listed)
  • Authors: Haileab Yagersew
  • Subjects: cs.CV; cs.AI
  • Tags: Zero-Shot Learning, Object Detection, Vision-Language Model
  • Code: code
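  • Summary: This paper proposes Paza, a zero-shot retail theft detection framework that orchestrates multiple existing models to detect concealment behavior without training, at 3-10x lower cost than commercial solutions.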

[196] Efficient Search of Implantable Adaptive Cells for Medical Image Segmentation

  • arXiv: 2604.14849 (cross-listed)
  • Authors: Emil Benedykciuk, Marcin Denkowski, Grzegorz M. Wójcik
  • Subjects: cs.CV; cs.AI
  • Tags: Image Segmentation, Neural Architecture Search, Medical AI
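  • Summary: This paper proposes IAC-LTH, a framework that progressively prunes low-importance operations during search using a stability criterion based on Jensen-Shannon divergence, reducing the neural architecture search cost for adaptive skip modules in medical image segmentation by 3.7-16x.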

[197] ClimateCause: Complex and Implicit Causal Structures in Climate Reports

  • arXiv: 2604.14856 (cross-listed)
  • Authors: Liesbeth Allein, Nataly Pineda-Castañeda, Andrea Rocci, Marie-Francine Moens
  • Subjects: cs.CL; cs.AI
  • Tags: Causal Inference, Dataset, Scientific Reasoning
  • Venue: ACL 2026 Findings
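  • Summary: This paper introduces ClimateCause, a dataset of higher-order causal structure annotations from climate reports, including implicit and nested causal relations, and shows that causal chain reasoning remains challenging for large language models.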

[198] Schema Key Wording as an Instruction Channel in Structured Generation under Constrained Decoding

  • arXiv: 2604.14862 (cross-listed)
  • Authors: Yifan Le
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Inference, Prompt Engineering
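  • Summary: This paper studies how the wording of schema keys acts as an implicit instruction channel that affects model performance under constrained decoding. It finds that model families differ in their sensitivity to this channel, offering a new perspective on structured generation.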

[199] MetaDent: Labeling Clinical Images for Vision-Language Models in Dentistry

  • arXiv: 2604.14866 (cross-listed)
  • Authors: Meng-Xun Li, Wen-Hui Deng, Zhi-Xing Wu, Chun-Xiao Jin, Jia-Min Wu, Yue Han, James Kit Hon Tsoi, Gui-Song Xia, Cui Huang
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Medical AI, Dataset
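  • Summary: This paper releases MetaDent, a large-scale dental image dataset and annotation framework with about 60,000 images and fine-grained annotations, and establishes VQA, classification, and image captioning benchmarks for evaluating vision-language models.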

[200] Vibe-Coding: Feedback-Based Automated Verification with no Human Code Inspection, a Feasibility Study

  • arXiv: 2604.14867 (cross-listed)
  • Authors: Michal Töpfer, František Plášil, Tomáš Bureš, Petr Hnětynka
  • Subjects: cs.SE; cs.AI
  • Tags: Code Generation, Program Verification
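  • Summary: This paper studies feedback-based automated verification of LLM-generated code. It proposes combining an adaptation loop with the vibe-coding feedback loop, providing actionable feedback through fine-grained constraint violations to achieve reliable code generation without human code inspection.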

[201] GenRec: A Preference-Oriented Generative Framework for Large-Scale Recommendation

  • arXiv: 2604.14878 (cross-listed)
  • Authors: Yanyan Zou, Junbo Qi, Lunsong Huang, Yu Li, Kewei Xu, Jiabao Gao, Binglei Zhao, Xuanhua Yang, Sulong Xu, Shengjie Li
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Reinforcement Learning
  • Venue: SIGIR 2026
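  • Summary: This paper proposes GenRec, a preference-oriented generative recommendation framework. With a page-level next-token prediction task, a token merger, and the GRPO-SR reinforcement learning method, it achieves a 9.5% lift in clicks in online A/B testing.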

[202] SOLIS: Physics-Informed Learning of Interpretable Neural Surrogates for Nonlinear Systems

  • arXiv: 2604.14879 (cross-listed)
  • Authors: Murat Furkan Mansur, Tufan Kumbasar
  • Subjects: cs.LG; cs.AI; eess.SY
  • Tags: Physics-Informed Learning, Interpretability
  • Venue: IJCNN 2026
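  • Summary: This paper proposes SOLIS, which learns quasi-linear parameter-varying representations through state-conditioned second-order surrogate models, recovering interpretable natural frequency, damping, and gain parameters in nonlinear system identification.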

[203] RACER: Retrieval-Augmented Contextual Rapid Speculative Decoding

  • arXiv: 2604.14885 (cross-listed)
  • Authors: Zihong Zhang, Zuchao Li, Lefei Zhang, Ping Wang, Hai Zhao
  • Subjects: cs.CL; cs.AI
  • Tags: Speculative Decoding, LLM Inference
  • Venue: ACL 2026 Findings
  • Code: code
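  • Summary: This paper proposes RACER, a training-free speculative decoding method that combines retrieved exact patterns with logit-driven future cues, achieving more than 2x speedup on multiple benchmarks.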

[204] Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

  • arXiv: 2604.14888 (cross-listed)
  • Authors: Danae Sánchez Villegas, Samuel Lewis-Lim, Nikolaos Aletras, Desmond Elliott
  • Subjects: cs.CL; cs.AI; cs.CV; cs.LG
  • Tags: Vision-Language Model, LLM Reasoning, Interpretability
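  • Summary: This paper analyzes the reasoning dynamics of 18 vision-language models and finds that models exhibit answer inertia and that chain-of-thought only partially reflects the multimodal decision process, with important implications for the transparency and safety of multimodal systems.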

[205] Can LLMs Score Medical Diagnoses and Clinical Reasoning as well as Expert Panels?

  • arXiv: 2604.14892 (cross-listed)
  • Authors: Amy Rouillard, Sitwala Mundia, Linda Camara, Michael Cameron Gramanie, Ziyaad Dangor, Ismail Kalla, Shabir A. Madhi, Kajal Morar, Marlvin T. Ncube, Haroon Saloojee, Bruce A. Bassett
  • Subjects: cs.LG; cs.AI
  • Tags: Medical AI, LLM Evaluation
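  • Summary: This paper evaluates how well an LLM jury of three frontier AI models scores 3,333 diagnoses, finding that a calibrated multi-model LLM jury can serve as a trustworthy proxy for expert clinician evaluation in medical AI benchmarking.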

[206] Beyond Importance Sampling: Rejection-Gated Policy Optimization

  • arXiv: 2604.14895 (cross-listed)
  • Authors: Ziwu Sun, Zhen Gao, Jiyong Zhang, Jiaheng Li
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, RLHF
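  • Summary: This paper proposes Rejection-Gated Policy Optimization (RGPO), which elevates rejection sampling to an optimization principle and replaces the importance sampling ratio with a smooth, differentiable acceptance gate, achieving higher reward and lower KL divergence than PPO-RLHF on preference alignment tasks.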

[207] Improving Sparse Autoencoder with Dynamic Attention

  • arXiv: 2604.14925 (cross-listed)
  • Authors: Dongsheng Wang, Jinsen Zhang, Dawei Su, Hui Huang
  • Subjects: cs.LG; cs.AI
  • Tags: Interpretability, Representation Learning
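  • Summary: This paper proposes a sparse autoencoder based on a dynamic sparse attention mechanism that uses sparsemax to infer the set of sparse elements automatically according to neuron complexity, achieving lower reconstruction loss while maintaining high interpretability.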
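
    For readers unfamiliar with sparsemax, which the summary of [207] mentions: it is the Euclidean projection of a score vector onto the probability simplex, so unlike softmax it can return exact zeros. The sketch below is a generic pure-Python version of the standard sparsemax operation, not the paper's code; the function name `sparsemax` and the sample inputs are illustrative assumptions.

    ```python
    def sparsemax(z):
        """Project the score vector z onto the probability simplex.

        Generic sparsemax; NOT taken from the paper in entry [207].
        """
        z_sorted = sorted(z, reverse=True)
        cumsum = 0.0
        k_star, support_sum = 0, 0.0
        # Find the largest k such that 1 + k * z_(k) > sum of the top-k scores.
        for k, zk in enumerate(z_sorted, start=1):
            cumsum += zk
            if 1 + k * zk > cumsum:
                k_star, support_sum = k, cumsum
        # Threshold shared by every coordinate that stays in the support.
        tau = (support_sum - 1) / k_star
        return [max(zi - tau, 0.0) for zi in z]


    if __name__ == "__main__":
        # One dominant score: all probability mass goes to it.
        print(sparsemax([2.0, 1.0, 0.1]))
        # Two close scores share the mass; the low score is zeroed out.
        print(sparsemax([1.0, 0.8, -1.0]))
    ```

    The number of nonzero outputs is decided by the scores themselves rather than by a fixed top-k, which is the property the summary alludes to when it says the sparse set is inferred automatically.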

[208] STEP-Parts: Geometric Partitioning of Boundary Representations for Large-Scale CAD Processing

  • arXiv: 2604.14927 (cross-listed)
  • Authors: Shen Fan, Mikołaj Kida, Przemyslaw Musialski
  • Subjects: cs.GR; cs.AI; cs.CV; cs.LG
  • Tags: CAD Generation, Dataset
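  • Summary: This paper proposes STEP-Parts, a toolchain that extracts geometric instance partitions directly from STEP boundary representations, providing a mesh-robust geometric reference and supervision source for CAD learning pipelines.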

[209] RaTA-Tool: Retrieval-based Tool Selection with Multimodal Large Language Models

  • arXiv: 2604.14951 (cross-listed)
  • Authors: Gabriele Mattioli, Evelyn Turri, Sara Sarto, Lorenzo Baraldi, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
  • Subjects: cs.CV; cs.AI; cs.CL; cs.MM
  • Tags: LLM Agent, Multimodal Learning, Tool Learning
  • Venue: ICPR 2026
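  • Summary: This paper proposes RaTA-Tool, a framework for open-world multimodal tool selection. It uses a multimodal large language model to convert multimodal queries into structured task descriptions and retrieves the most suitable tool through semantic matching, extending to new tools without retraining.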

[210] Calibration-Gated LLM Pseudo-Observations for Online Contextual Bandits

  • arXiv: 2604.14961 (cross-listed)
  • Authors: Maksim Pershin, Ivan Golovanov, Pavel Baltabaev, Natalia Trankova
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, LLM Reasoning, Recommender System
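  • Summary: This paper proposes injecting LLM pseudo-observations into contextual bandit algorithms to reduce cold-start regret. A calibration-gated decay mechanism adjusts their weight dynamically according to the LLM's prediction accuracy, reducing cumulative regret by 19% relative to plain LinUCB on a news recommendation task.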

[211] UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards

  • arXiv: 2604.14967 (cross-listed)
  • Authors: Jun Wang, Shuo Tan, Zelong Sun, Tiancheng Gu, Yongle Zhao, Ziyong Feng, Kaicheng Yang, Cewu Lu
  • Subjects: cs.CV; cs.AI
  • Tags: RAG, Vision-Language Model, Reinforcement Learning
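  • Summary: This paper proposes UniDoc-RL, a framework that lets a vision-language model agent jointly perform retrieval, reranking, active visual perception, and reasoning. It formulates visual information acquisition as a sequential decision problem over a hierarchical action space and trains end to end with a dense multi-reward scheme.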

[212] Agentic Explainability at Scale: Between Corporate Fears and XAI Needs

  • arXiv: 2604.14984 (cross-listed)
  • Authors: Yomna Elsayed, Cecily Jones
  • Subjects: cs.HC; cs.AI
  • Tags: LLM Agent, Explainable AI, AI Ethics
  • Venue: CHI 2026 Workshop
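  • Summary: This paper examines the governance challenges enterprises face when deploying agentic AI at scale, proposes design-time and runtime explainability techniques, and designs an agentic AI card prototype to help enterprises better manage and understand the behavior of agentic systems.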

[213] What Is the Minimum Architecture for Prolepsis? Early Irrevocable Commitment Across Tasks in Small Transformers

  • arXiv: 2604.15010 (cross-listed)
  • Authors: Éric Jacopin
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Interpretability, Transformer Architecture
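  • Summary: This paper introduces the concept of "prolepsis": a Transformer making an early, irrevocable decision commitment. The study replicates planning-site findings on open models and shows that specific attention heads route the decision to the output, a mechanism that shares a template across tasks but differs in routing substrate.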

[214] Route to Rome Attack: Directing LLM Routers to Expensive Models via Adversarial Suffix Optimization

  • arXiv: 2604.15022 (cross-listed)
  • Authors: Haochun Tang, Yuliang Yan, Jiahua Lu, Huaxiao Liu, Enyan Dai
  • Subjects: cs.CR; cs.AI; cs.CL; cs.LG
  • Tags: LLM Security, Adversarial Robustness, LLM Inference
  • Code: code
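  • Summary: This paper proposes the R²A attack, which uses adversarial suffix optimization to fool black-box LLM routers into consistently selecting expensive, high-capability models. The method uses a hybrid ensemble surrogate router to emulate the black-box router, and the attack is validated on multiple routing systems.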

[215] When Fairness Metrics Disagree: Evaluating the Reliability of Demographic Fairness Assessment in Machine Learning

  • arXiv: 2604.15038 (cross-listed)
  • Authors: Khalid Adnan Alsayed
  • Subjects: cs.LG; cs.AI; cs.CV
  • Tags: Fairness, Model Evaluation, Bias Mitigation
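  • Summary: This paper systematically studies inconsistency among metrics in machine learning fairness assessment, finding that different fairness metrics can yield contradictory conclusions, and proposes the Fairness Disagreement Index (FDI) to quantify the degree of inconsistency.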

[216] CoGrid & the Multi-User Gymnasium: A Framework for Multi-Agent Experimentation

  • arXiv: 2604.15044 (cross-listed)
  • Authors: Chase McDonald, Cleotilde Gonzalez
  • Subjects: cs.HC; cs.AI
  • Tags: Multi-Agent System, Human-Computer Interaction
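  • Summary: This paper introduces two open-source tools, CoGrid and the Multi-User Gymnasium (MUG), to support human-AI multi-agent experimental research. CoGrid is a multi-agent grid simulation library, and MUG turns simulation environments into interactive web experiments with any number of human and AI participants.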

[217] No More Guessing: a Verifiable Gradient Inversion Attack in Federated Learning

  • arXiv: 2604.15063 (cross-listed)
  • Authors: Francesco Diana, Chuan Xu, André Nusser, Giovanni Neglia
  • Subjects: cs.LG; cs.AI; cs.CR
  • Tags: Federated Learning, Privacy, Adversarial Robustness
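  • Summary: This paper proposes a verifiable gradient inversion attack (VGIA) that provides correctness certificates for reconstructed samples in federated learning. Taking a geometric view of ReLU leakage, it uses an algebraic subspace verification test to check whether a hyperplane-delimited region contains exactly one record, achieving exact recovery on tabular data.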

[218] NEAT-NC: NEAT guided Navigation Cells for Robot Path Planning

  • arXiv: 2604.15076 (cross-listed)
  • Authors: Hibatallah Meliani, Khadija Slimani, Samira Khoulji
  • Subjects: cs.RO; cs.AI; cs.NE
  • Tags: Robotics, Evolutionary Computation, Autonomous Driving
  • Venue: GECCO 2026
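  • Summary: This paper proposes NEAT-NC, which takes bio-inspired navigation cells as input and evolves recurrent neural networks with NeuroEvolution of Augmenting Topologies (NEAT) to perform robot path planning in dynamic environments.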

[219] Autonomous Evolution of EDA Tools: Multi-Agent Self-Evolved ABC

  • arXiv: 2604.15082 (cross-listed)
  • Authors: Cunxi Yu, Haoxing Ren
  • Subjects: cs.AR; cs.AI
  • Tags: EDA, LLM Agent, Multi-Agent System
  • Venue: DAC 2026
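  • Summary: This paper proposes the first self-evolving logic synthesis framework, in which a team of LLM agents autonomously improves the source code of the ABC synthesis system. Through an iterative loop of code modification, correctness verification, and quality evaluation, the system achieves autonomous optimization of an EDA tool at the scale of a million lines of code.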

[220] IUQ: Interrogative Uncertainty Quantification for Long-Form Large Language Model Generation

  • arXiv: 2604.15109 (cross-listed)
  • Authors: Haozhi Fan, Jinhao Duan, Kaidi Xu
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Uncertainty Estimation, LLM Evaluation
  • Code: code
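  • Summary: This paper proposes IUQ, a framework for quantifying uncertainty in long-form LLM outputs. It follows an interrogate-then-respond paradigm and uses inter-sample consistency and intra-sample faithfulness to measure claim-level uncertainty and model trustworthiness.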

[221] Amortized Optimal Transport from Sliced Potentials

  • arXiv: 2604.15114 (cross-listed)
  • Authors: Minh-Phuc Truong, Khai Nguyen
  • Subjects: stat.ML; cs.AI; cs.LG
  • Tags: Optimal Transport, Optimization
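  • Summary: This paper proposes two amortized optimization methods (RA-OT and OA-OT) that use the Kantorovich potentials of sliced optimal transport to predict optimal transport plans between many pairs of measures, enabling efficient OT solving across different measure pairs.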

[222] Structure as Computation: Developmental Generation of Minimal Neural Circuits

  • arXiv: 2604.15143 (cross-listed)
  • Authors: Duan Zhou
  • Subjects: cs.NE; cs.AI; cs.LG
  • Tags: Neural Architecture, Neuroscience, Self-Supervised Learning
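  • Summary: This paper simulates the developmental process of cortical neurogenesis: starting from a single stem cell, gene regulatory rules generate a minimal neural circuit of 85 mature neurons. The circuit reaches over 90% accuracy on MNIST after a single training epoch, demonstrating that the developmental process encodes a strong structural prior.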

[223] LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

  • arXiv: 2604.15149 (cross-listed)
  • Authors: Lukas Helff, Quentin Delfosse, David Steinmann, Ruben Härle, Hikaru Shindo, Patrick Schramowski, Wolfgang Stammer, Kristian Kersting, Felix Friedrich
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Reasoning, RLHF, AI Safety
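  • Summary: This paper shows that RLVR-trained models can "game the verifier": they abandon rule induction and instead enumerate instance-level labels to pass verification. It introduces Isomorphic Perturbation Testing (IPT) to detect this shortcut behavior, finding it prevalent in RLVR models and increasing with task complexity.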

[224] Compressing Sequences in the Latent Embedding Space: $K$-Token Merging for Large Language Models

  • arXiv: 2604.15153 (cross-listed)
  • Authors: Zihao Xu, John Harvill, Ziwei Fan, Yizhou Sun, Hao Ding, Hao Wang
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Inference, Model Compression, Long Context
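  • Summary: This paper proposes K-Token Merging, which merges every K consecutive token embeddings into a single embedding in the latent embedding space, compressing input length by up to 75% with minimal performance loss.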
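
    As a rough illustration of the length arithmetic in [224]: merging every K consecutive embeddings into one shrinks a sequence of length n to ceil(n / K), so K = 4 corresponds to a 75% reduction. The sketch below uses plain mean pooling as the merge operation. That choice is an assumption made here for illustration only, since the summary does not say how the paper combines the K embeddings, and `merge_k_tokens` is a hypothetical name.

    ```python
    def merge_k_tokens(embeddings, k):
        """Mean-pool each run of k consecutive embeddings.

        Illustrative stand-in only: mean pooling is an assumption, not the
        merging operator from the paper in entry [224]. The last group may
        contain fewer than k embeddings.
        """
        merged = []
        for start in range(0, len(embeddings), k):
            group = embeddings[start:start + k]
            dim = len(group[0])
            merged.append(
                [sum(vec[d] for vec in group) / len(group) for d in range(dim)]
            )
        return merged


    if __name__ == "__main__":
        # Eight toy 2-dimensional "token embeddings".
        tokens = [[float(i), 2.0 * i] for i in range(8)]
        merged = merge_k_tokens(tokens, k=4)
        reduction = 1 - len(merged) / len(tokens)
        print(len(tokens), "->", len(merged), f"({reduction:.0%} shorter)")
    ```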

[225] Class Unlearning via Depth-Aware Removal of Forget-Specific Directions

  • arXiv: 2604.15166 (cross-listed)
  • Authors: Arman Hatami, Romina Aalishah, Ilya E. Monosov
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Machine Unlearning, Computer Vision
  • Venue: CVPR 2026 Workshop
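  • Summary: This paper proposes DAMP, a one-shot, closed-form weight surgery method for class unlearning. It removes forget-specific directions through projection updates without gradient optimization, and uses a depth-aware scaling rule to achieve selective forgetting while preserving utility.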

[226] MambaSL: Exploring Single-Layer Mamba for Time Series Classification

  • arXiv: 2604.15174 (cross-listed)
  • Authors: Yoo-Min Jung, Leekyung Kim
  • Subjects: cs.LG; cs.AI
  • Tags: Time Series Analysis, Benchmark
  • Venue: ICLR 2026
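  • Summary: This paper proposes MambaSL, a single-layer Mamba framework for time series classification. It re-evaluates 20 strong baselines on 30 UEA datasets under a unified protocol and achieves statistically significant state-of-the-art performance.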

[227] Scepsy: Serving Agentic Workflows Using Aggregate LLM Pipelines

  • arXiv: 2604.15186 (cross-listed)
  • Authors: Marcel Wagenländer, Otto White, Britannio Jarrett, Pedro Silvestre, Yanda Tao, Guo Li, Huanzhou Zhu, Llúis Vilanova, Peter Pietzuch
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Serving, LLM Agent
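  • Summary: This paper proposes Scepsy, an agentic serving system that exploits the stability of each LLM's share of execution time to build aggregate LLM pipelines, efficiently scheduling arbitrary multi-LLM agentic workflows onto GPU clusters.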

[228] VisPCO: Visual Token Pruning Configuration Optimization via Budget-Aware Pareto-Frontier Learning for Vision-Language Models

  • arXiv: 2604.15188 (cross-listed)
  • Authors: Huawei Ji, Yuanhao Sun, Yuan Jin, Cheng Deng, Jiaxin Ding, Luoyi Fu, Xinbing Wang
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Model Compression
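  • Summary: This paper proposes VisPCO, a framework that casts visual token pruning as a Pareto configuration optimization problem and identifies optimal pruning configurations automatically through gradient-based search, closely approximating the empirical Pareto frontier on multiple vision benchmarks.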

[229] Benchmarking Classical Coverage Path Planning Heuristics on Irregular Hexagonal Grids for Maritime Coverage Scenarios

  • arXiv: 2604.15202 (cross-listed)
  • Authors: Carlos S. Sepúlveda, Gonzalo A. Ruz
  • Subjects: cs.RO; cs.AI; math.OC
  • Tags: Automated Planning, Benchmark, Motion Planning
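  • Summary: This paper presents a reproducible benchmark that evaluates 17 deterministic single-vehicle coverage path planning heuristics on irregular hexagonal grids derived from maritime scenarios, covering 10,000 Hamiltonian-feasible instances.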

[230] AI-Assisted Requirements Engineering: An Empirical Evaluation Relative to Expert Judgment

  • arXiv: 2604.15222 (cross-listed)
  • Authors: Oz Levy, Ilya Dikman, Natan Levy, Michael Winokur
  • Subjects: cs.SE; cs.AI
  • Tags: Software Engineering, Requirements Engineering
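  • Summary: This paper studies how far AI tools can support systems engineers in assessing requirements quality, using a controlled experiment to compare AI-assisted assessments with human expert assessments on consistency, completeness, clarity, and testability under INCOSE criteria.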

[231] Agentic Microphysics: A Manifesto for Generative AI Safety

  • arXiv: 2604.15236 (cross-listed)
  • Authors: Federico Pierucci, Matteo Prandi, Marcantonio Bracale Syrnikov, Marcello Galisai, Piercosma Bisconti
  • Subjects: cs.CY; cs.AI
  • Tags: AI Safety, LLM Agent, Multi-Agent System
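  • Summary: This paper proposes a methodological framework for agentic AI safety research that focuses on population-level risks arising from structured interactions among agents, introducing two core concepts: agentic microphysics and generative safety.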

[232] Stability and Generalization in Looped Transformers

  • arXiv: 2604.15259 (cross-listed)
  • Authors: Asher Labovich
  • Subjects: cs.LG; cs.AI
  • Tags: Transformer Architecture, Deep Learning Theory
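  • Summary: This paper introduces a fixed-point-based framework for analyzing looped Transformers, characterizing when fixed-point iteration yields meaningful predictions along three stability axes (reachability, input dependence, and geometry), and proposes internal recall, a new recall placement variant.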

[233] CoopEval: Benchmarking Cooperation-Sustaining Mechanisms and LLM Agents in Social Dilemmas

  • arXiv: 2604.15267 (cross-listed)
  • Authors: Emanuel Tewolde, Xiao Zhang, David Guzman Piedrahita, Vincent Conitzer, Zhijing Jin
  • Subjects: cs.GT; cs.AI; cs.CL; cs.CY; cs.MA
  • Tags: LLM Agent, Multi-Agent System, Social Reasoning
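  • Summary: This paper presents the first comparative study of game-theoretic mechanisms, evaluating how well four mechanisms (repeated games, reputation systems, third-party mediation, and contractual agreements) sustain cooperation among LLM agents in social dilemmas. Contracts and mediation prove most effective.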

[234] SegWithU: Uncertainty as Perturbation Energy for Single-Forward-Pass Risk-Aware Medical Image Segmentation

  • arXiv: 2604.15271 (cross-listed)
  • Authors: Tianhao Fu, Austin Wang, Charles Chen, Roby Aldave-Garza, Yucheng Chen
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Medical AI, Image Segmentation, Uncertainty Estimation
  • Code: code
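  • Summary: This paper proposes SegWithU, a post-hoc framework that augments a pretrained segmentation backbone with a lightweight uncertainty head and models uncertainty as perturbation energy in a probe space, enabling reliable single-forward-pass uncertainty estimation.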

[235] Prism: Symbolic Superoptimization of Tensor Programs

  • arXiv: 2604.15272 (cross-listed)
  • Authors: Mengdi Wu, Xiaoyu Jiang, Oded Padon, Zhihao Jia
  • Subjects: cs.PL; cs.AI; cs.LG
  • Tags: Optimization, Tensor Program Optimization
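  • Summary: This paper proposes Prism, the first symbolic superoptimizer for tensor programs. Its symbolic hierarchical representation, sGraph, compactly encodes large classes of tensor programs and enables structured pruning of the search space, achieving up to 2.2x speedup on LLM workloads.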

[236] Why Do Vision Language Models Struggle To Recognize Human Emotions?

  • arXiv: 2604.15280 (cross-listed)
  • Authors: Madhav Agarwal, Sotirios A. Tsaftaris, Laura Sevilla-Lara, Steven McDonagh
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Affective Computing, Emotion Recognition
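  • Summary: This paper studies why vision-language models struggle with emotion recognition, identifying long-tailed distribution bias and inadequate representation of temporal information as two key weaknesses, and proposes a multi-stage context enrichment strategy as a diagnostic probe.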

[237] AD4AD: Benchmarking Visual Anomaly Detection Models for Safer Autonomous Driving

  • arXiv: 2604.15291 (cross-listed)
  • Authors: Fabrizio Genilotti, Arianna Stropeni, Gionata Grotto, Francesco Borsatti, Manuel Barusco, Davide Dalle Pezze, Gian Antonio Susto
  • Subjects: cs.CV; cs.AI
  • Tags: Anomaly Detection, Autonomous Driving, Benchmark
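  • Summary: This paper benchmarks visual anomaly detection methods on the AnoVox dataset for identifying anomalous objects in autonomous driving scenes, and shows that Tiny-Dinomaly offers the best accuracy-efficiency trade-off for edge deployment.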

[238] MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

  • arXiv: 2604.15309 (cross-listed)
  • Authors: Yan Li, Zezi Zeng, Yifan Yang, Yuqing Yang, Ning Liao, Weiwei Guo, Lili Qiu, Mingxi Cheng, Qi Dai, Zhendong Wang, Zhengyuan Yang, Xue Yang, Ji Li, Lijuan Wang, Chong Luo
  • Subjects: cs.CV; cs.AI; cs.CL
  • Tags: LLM Agent, UI Generation, Web Agent
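  • Summary: This paper proposes MM-WebAgent, a hierarchical agentic framework for multimodal webpage generation that coordinates the generation of AIGC elements through hierarchical planning and iterative self-reflection, yielding coherent and visually consistent webpage designs.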

Replacement submissions (118)

[239] Using deep learning to construct stochastic local search SAT solvers with performance bounds

  • arXiv: 2309.11452 (replaced)
  • Authors: Maximilian J. Kramer, Paul Boes, Jens Eisert
  • Subjects: cs.AI; math.OC
  • Tags: Graph Neural Network, Neural Combinatorial Optimization, SAT Solving
  • Venue: MLST 2026
  • Code: code
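  • Summary: This paper proposes using graph neural networks as oracles to boost stochastic local search SAT solvers, demonstrates significant performance gains on random and pseudo-industrial SAT instances, and connects theoretical results with practical application.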

[240] Enhanced Deep Q-Learning for 2D Self-Driving Cars: Implementation and Evaluation on a Custom Track Environment

  • arXiv: 2402.08780 (replaced)
  • Authors: Sagar Pathak, Bidhya Shrestha
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Autonomous Driving
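  • Summary: This paper implements a deep Q-network for autonomous driving on a custom 2D track and proposes a modified DQN with a priority-based action selection mechanism, whose average reward is about 60% higher than that of the vanilla DQN.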

[241] Explainability Through Human-Centric Design for XAI in Lung Cancer Detection

  • arXiv: 2505.09755 (replaced)
  • Authors: Amy Rafferty, Rishi Ramaesh, Ajitha Rajan
  • Subjects: cs.AI
  • Tags: Medical AI, Interpretability, Explainable AI
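  • Summary: This paper proposes XpertXAI, a scalable expert-driven concept bottleneck model for detecting multiple lung pathologies, which significantly outperforms existing post-hoc explainability methods while retaining human-interpretable clinical concepts.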

[242] When Slower Isn't Truer: Inverse Scaling Law of Truthfulness in Multimodal Reasoning

  • arXiv: 2505.20214 (replaced)
  • Authors: Sitong Fang, Wenjing Cao, Jiahao Li, Xuyao Wang, Juntao Dai, Chi-Min Chan, Sirui Han, Yike Guo, Yaodong Yang, Jiaming Ji
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Hallucination, Multimodal Learning
  • Venue: ACL 2026
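  • Summary: This paper presents the first systematic study of an inverse scaling law for the slow-thinking paradigm in multimodal reasoning: given incomplete or misleading visual input, slow-thinking models are more prone to hallucinate, because their depth-first reasoning keeps exploring a flawed premise.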

[243] KnowRL: Exploring Knowledgeable Reinforcement Learning for Factuality

  • arXiv: 2506.19807 (replaced)
  • Authors: Baochang Ren, Shuofei Qiao, Da Zheng, Huajun Chen, Ningyu Zhang
  • Subjects: cs.AI; cs.CL; cs.CV; cs.LG; cs.MA
  • Tags: LLM Hallucination, Reinforcement Learning, LLM Reasoning
  • Venue: ACL 2026
  • Code: code
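  • Summary: This paper proposes KnowRL, a framework that integrates a factuality reward based on knowledge verification into reinforcement learning training, guiding slow-thinking models toward fact-based reasoning and mitigating hallucination while preserving their original reasoning ability.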

[244] One Step is Enough: Multi-Agent Reinforcement Learning based on One-Step Policy Optimization for Order Dispatch on Ride-Sharing Platforms

  • arXiv: 2507.15351 (replaced)
  • Authors: Zijian Zhao, Sen Li
  • Subjects: cs.AI; cs.ET; cs.MA
  • Tags: Multi-Agent System, Reinforcement Learning, Autonomous Driving
  • Code: code
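  • Summary: This paper proposes two new multi-agent reinforcement learning methods, GRPO and OSPO, that exploit the homogeneity of autonomous-vehicle fleets to bypass value function estimation; OSPO can train an optimal policy using only one-step group rewards.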

[245] NaturalGAIA: A Verifiable Benchmark and Hierarchical Framework for Long-Horizon GUI Tasks

  • arXiv: 2508.01330 (replaced)
  • Authors: Zihan Zheng, Tianle Cui, Taoran Wang, Fengtao Wang, Jiahui Pan, Lewei He, Qianglong Chen
  • Subjects: cs.AI
  • Tags: GUI Automation, LLM Agent, Benchmark
  • Code: code
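  • Summary: This paper introduces the NaturalGAIA benchmark dataset and the LightManus-Jarvis hierarchical collaborative framework for evaluating and executing long-horizon GUI tasks, significantly outperforming existing methods on weighted pathway success rate while substantially reducing token consumption.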

[246] What Deserves Memory: Adaptive Memory Distillation for LLM Agents

  • arXiv: 2508.03341 (replaced)
  • Authors: Wenquan Ma, Jiayan Nan, Wenlong Wu, Yize Chen
  • Subjects: cs.AI
  • Tags: LLM Agent, Memory Architecture, Knowledge Distillation
  • Code: code
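  • Summary: This paper proposes NEMORI, an adaptive memory distillation framework that turns interactions into narratives and uses prediction error to extract insights, giving LLM agents a data-driven approach to memory retention in place of heuristic designs.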

[247] MetaMuse: Algorithm Generation via Creative Ideation

  • arXiv: 2510.03851 (replaced)
  • Authors: Ruiying Ma, Chieh-Jan Mike Liang, Yanjie Gao, Francis Y. Yan
  • Subjects: cs.AI
  • Tags: Program Synthesis, LLM Reasoning, Optimization
  • Venue: ICLR 2026
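  • Summary: This paper proposes MetaMuse, a creative ideation framework built on three self-reflection principles (quantifying solution diversity, steering by external stimuli, and waypoint reasoning) that enables LLMs to generate high-performing systems algorithms, with significant improvements on cache replacement and bin packing.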

[248] Searching Meta Reasoning Skeleton to Guide LLM Reasoning

  • arXiv: 2510.04116 (replaced)
  • Authors: Ziying Zhang, Yaqing Wang, Quanming Yao
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Prompt Engineering, Automated Planning
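  • Summary: This paper proposes AutoMR, a framework that formalizes the search for meta-reasoning skeletons represented as directed acyclic graphs and uses a dynamic skeleton sampling algorithm to generate query-aware meta-reasoning skeletons at inference time to guide LLM reasoning.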

[249] Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning

  • arXiv: 2510.10649 (replaced)
  • Authors: Can Xie, Ruotong Pan, Xiangyu Wu, Yunfei Zhang, Jiayi Fu, Tingting Gao, Guorui Zhou
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Reinforcement Learning, Uncertainty Estimation
  • Venue: ACL 2026
  • Code: code
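  • Summary: This paper proposes UCAS, which refines credit assignment using the model's internal uncertainty signals: it modulates the advantage at the response level and applies an asymmetric penalty at the token level, balancing exploration and exploitation and mitigating entropy collapse.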

[250] Beyond "Hallucinations": A Framework for Stable Human-AI Reasoning

  • arXiv: 2510.14665 (replaced)
  • Authors: Rikard Rosenbacke, Carl Rosenbacke, Victor Rosenbacke, Martin McKee
  • Subjects: cs.AI; cs.HC
  • Tags: LLM Hallucination, Human-Computer Interaction, AI Safety
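  • Summary: This paper introduces Rose-Frame, a cognitive-epistemological framework for diagnosing reasoning breakdowns in human-AI interaction. It identifies three common traps (map vs. territory, intuition vs. reason, and conflict vs. confirmation) and proposes human-side interventions to stabilize reasoning.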

[251] MCP-Flow: Facilitating LLM Agents to Master Real-World, Diverse and Scaling MCP Tools

  • arXiv: 2510.24284 (replaced)
  • Authors: Wenhao Wang, Peizhi Niu, Zhao Xu, Zhaoyu Chen, Jian Du, Yaxin Du, Xianghe Pang, Keduan Huang, Yanfeng Wang, Qiang Yan, Siheng Chen
  • Subjects: cs.AI
  • Tags: LLM Agent, Tool Learning, Data Synthesis
  • Venue: ACL 2026
  • Code: code
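  • Summary: This paper introduces MCP-Flow, an automated pipeline for large-scale MCP server discovery, data synthesis, and model training, which collects data from 1,166 servers and generates more than 68,000 high-quality instruction-function call pairs.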

[252] BarrierBench: Evaluating Large Language Models for Safety Verification in Dynamical Systems

  • arXiv: 2511.09363 (replaced)
  • Authors: Ali Taheri, Alireza Taban, Sadegh Soudjani, Ashutosh Trivedi
  • Subjects: cs.AI; eess.SY
  • Tags: Formal Methods, Benchmark, LLM Evaluation
  • Venue: L4DC 2026
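  • Summary: This paper introduces the BarrierBench benchmark and an LLM agent framework for synthesizing barrier certificates in safety verification of dynamical systems, achieving a success rate above 90% in generating valid certificates across 100 dynamical systems.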

[253] IMACT-CXR: An Interactive Multi-Agent Conversational Tutoring System for Chest X-Ray Interpretation

  • arXiv: 2511.15825 (replaced)
  • Authors: Tuan-Anh Le, Anh Mai Vu, David Yang, Akash Awasthi, Hien Van Nguyen
  • Subjects: cs.AI
  • Tags: Medical AI, Multi-Agent System, Dialogue System
  • Venue: IEEE ISBI 2026
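  • Summary: This paper proposes IMACT-CXR, an interactive multi-agent conversational tutoring system that unifies spatial annotation, gaze analysis, knowledge retrieval, and image-grounded reasoning to help trainees interpret chest X-rays.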

[254] Representation Interventions Enable Lifelong Knowledge Memory Control in LLMs

  • arXiv: 2511.20892 (replaced)
  • Authors: Xuyuan Liu, Shengyu Chen, Xinshuai Dong, Yanchi Liu, Xujiang Zhao, Haoyu Wang, Yujun Yan, Haifeng Chen, Zhengzhang Chen
  • Subjects: cs.AI
  • Tags: Knowledge Editing, LLM Training, Continual Learning
  • Venue: ACL 2026
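  • Summary: This paper proposes RILKE, which treats knowledge control as interventions in the model's representation space and learns paraphrase-robust, edit-localized modules to enable lifelong knowledge updates without costly retraining.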

[255] The Specification Trap: Why Static Value Alignment Alone Is Insufficient for Robust Alignment

  • arXiv: 2512.03048 (replaced)
  • Authors: Austin Spizzirri
  • Subjects: cs.AI; cs.CY; cs.LG; cs.MA
  • Tags: LLM Alignment, AI Safety, AI Ethics
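  • Summary: This paper argues that static, content-oriented approaches to AI value alignment are structurally fragile under capability scaling, distribution shift, and increasing autonomy, and calls for open-ended, developmentally responsive alignment methods.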

[256] Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows

  • arXiv: 2512.13168 (replaced)
  • Authors: Haoyu Dong, Pengkun Zhang, Yan Gao, Xuanyu Dong, Yilin Cheng, Mingzhe Lu, Zikun Zhu, Adina Yakefu, Shuxin Zheng
  • Subjects: cs.AI; cs.CE; cs.IR; cs.MA
  • Tags: Benchmark, LLM Agent, Financial AI
  • Venue: ACL 2026 Findings
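  • Summary: This paper introduces the FinWorkBench benchmark for evaluating AI agents on real-world enterprise finance and accounting workflows; it comprises 172 composite workflows and 384 tasks sourced from real enterprise workspaces.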

[257] MAGMA: A Multi-Graph based Agentic Memory Architecture for AI Agents

  • arXiv: 2601.03236 (replaced)
  • Authors: Dongming Jiang, Yi Li, Guanpeng Li, Bingzhe Li
  • Subjects: cs.AI
  • Tags: LLM Agent, Memory Architecture, Knowledge Graph
  • Venue: ACL 2026
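  • Summary: This paper proposes MAGMA, a multi-graph agentic memory architecture that represents memory items across four orthogonal graphs (semantic, temporal, causal, and entity) and retrieves them via policy-guided traversal, improving performance on long-horizon reasoning tasks.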

[258] Skill-Pro: Learning Reusable Skills from Experience via Non-Parametric PPO for LLM Agents

  • arXiv: 2602.01869 (replaced)
  • Authors: Qirui Mi, Zhijian Ma, Mengyue Yang, Haoxuan Li, Yisen Wang, Haifeng Zhang, Jun Wang
  • Subjects: cs.AI
  • Tags: LLM Agent, Reinforcement Learning
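  • Summary: This paper proposes Skill-Pro, a framework that lets LLM agents autonomously learn reusable procedural skills from interaction experience; a Skill-MDP formalization and non-parametric PPO provide skill verification, significantly increasing reuse across tasks and across agents.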

[259] Evolving Beyond Snapshots: Harmonizing Structure and Sequence via Entity State Tuning for Temporal Knowledge Graph Forecasting

  • arXiv: 2602.12389 (replaced)
  • Authors: Siyuan Li, Yunjia Wu, Yiyong Xiao, Pingyang Huang, Peize Li, Ruitong Liu, Yan Wen, Te Sun
  • Subjects: cs.AI; cs.CL
  • Tags: Temporal Knowledge Graph, Knowledge Graph, Representation Learning
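  • Summary: This paper proposes the Entity State Tuning (EST) framework, which maintains a global state buffer in a closed-loop design to give temporal knowledge graph forecasters persistent, continually evolving entity states, addressing the episodic amnesia and long-term dependency decay of existing methods.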

[260] The AI Research Assistant: Promise, Peril, and a Proof of Concept

  • arXiv: 2602.22842 (replaced)
  • Authors: Tan Bui-Thanh
  • Subjects: cs.AI; cs.CE; math.NA
  • Tags: Scientific Reasoning, Human-Computer Interaction, Mathematical Reasoning
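  • Summary: Through a case study in which human-AI collaboration discovered novel error representations and bounds for Hermite quadrature rules, this paper demonstrates AI's capabilities in algebraic manipulation, proof exploration, and literature synthesis, while stressing the importance of human verification and domain expertise.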

[261] Conformal Policy Control

  • arXiv: 2603.02196 (replaced)
  • Authors: Drew Prinster, Clara Fannjiang, Ji Won Park, Kyunghyun Cho, Anqi Liu, Suchi Saria, Samuel Stanton
  • Subjects: cs.AI; cs.LG; math.ST; stat.ML
  • Tags: Reinforcement Learning, AI Safety, Uncertainty Estimation
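  • Summary: This paper proposes using conformal calibration to regulate how far a new policy's behavior may deviate, enabling safe exploration in high-stakes settings while providing finite-sample guarantees and preserving performance gains.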

[262] AI4S-SDS: A Neuro-Symbolic Solvent Design System via Sparse MCTS and Differentiable Physics Alignment

  • arXiv: 2603.03686 (replaced)
  • Authors: Jiangyu Chen
  • Subjects: cs.AI
  • Tags: Neurosymbolic AI, Material Discovery, Multi-Agent System
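  • Summary: This paper proposes AI4S-SDS, a neuro-symbolic framework that combines multi-agent collaboration with a customized Monte Carlo tree search engine and uses sparse state storage and a differentiable physics engine to automate the design of solvent formulations.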

[263] The Validity Gap in Health AI Evaluation: A Cross-Sectional Analysis of Benchmark Composition

  • arXiv: 2603.18294 (replaced)
  • Authors: Alvin Rajkomar, Pavan Sudarshan, Angela Lai, Lily Peng
  • Subjects: cs.AI
  • Tags: Medical AI, LLM Evaluation, Benchmark
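  • Summary: This paper analyzes 18,707 queries from six public health benchmarks and finds a structural "validity gap": the benchmarks underrepresent complex clinical inputs, safety-critical scenarios, and vulnerable populations. It calls for a standardized query-profiling approach.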

[264] Measuring the metacognition of AI

  • arXiv: 2603.29693 (replaced)
  • Authors: Richard Servajean, Philippe Servajean
  • Subjects: cs.AI
  • Tags: LLM Evaluation, Uncertainty Estimation
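  • Summary: This paper proposes adopting the meta-d' framework as the gold standard for evaluating the metacognitive sensitivity of AI, combines it with signal detection theory to measure an AI's ability to spontaneously regulate its decisions based on uncertainty and risk, and validates the approach experimentally on three large language models.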

[265] Mitigating LLM biases toward spurious social contexts using direct preference optimization

  • arXiv: 2604.02585 (replaced)
  • Authors: Hyunji Nam, Dorottya Demszky
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Alignment, Bias Mitigation
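  • Summary: This paper proposes Debiasing-DPO, a self-supervised training method that contrasts neutral reasoning with biased reasoning to reduce LLM bias toward spurious social contexts, yielding significant gains in both bias reduction and predictive accuracy.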

[266] Rashomon Memory: Towards Argumentation-Driven Retrieval for Multi-Perspective Agent Memory

  • arXiv: 2604.03588 (replaced)
  • Authors: Albert Sadowski, Jarosław A. Chudziak
  • Subjects: cs.AI
  • Tags: LLM Agent, Memory Architecture, Multi-Agent System
  • Venue: AAMAS 2026 Workshop
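  • Summary: This paper proposes the Rashomon Memory architecture, in which multiple goal-conditioned agents encode experiences according to their own priorities and negotiate through argumentation at query time, enabling multi-perspective memory retrieval and surfacing conflicts.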

[267] CODESTRUCT: Code Agents over Structured Action Spaces

  • arXiv: 2604.05407 (replaced)
  • Authors: Myeongsoo Kim, Joe Hsu, Dingmin Wang, Shweta Garg, Varun Kumar, Murali Krishna Ramanathan
  • Subjects: cs.AI; cs.SE
  • Tags: Code Generation, LLM Agent, Software Engineering
  • Venue: ACL 2026
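  • Summary: This paper proposes CODESTRUCT, a framework that reframes the codebase as a structured action space so that agents operate on named AST entities rather than text spans, significantly improving patch accuracy and reducing token consumption.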

[268] CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation

  • arXiv: 2604.10410 (replaced)
  • Authors: Shantam Srivastava, Mahesh Bhosale, David Doermann, Mingchen Gao
  • Subjects: cs.AI
  • Tags: Medical AI, Vision-Language Model, Text Generation
  • Venue: MIDL 2026
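  • Summary: This paper proposes Category-Wise Contrastive Decoding (CWCD), a framework that improves structured radiology report generation by introducing category-specific parameterization and contrasting normal X-rays with masked X-rays.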

[269] Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation

  • arXiv: 2604.11077 (replaced)
  • Authors: Chen Huang, Zitan Jiang, Changyi Zou, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.AI; cs.CL
  • Tags: Dialogue System, Information Extraction, LLM Agent
  • Venue: ACL 2026 Findings
  • Code: code
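  • Summary: This paper introduces the proactive information probing task and proposes PROCHATIP, a framework with a dedicated conversation-strategy module that optimizes when to probe for target user information, improving information acquisition while minimizing dialogue turns and user friction.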

[270] Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems

  • arXiv: 2604.11623 (replaced)
  • Authors: Charafeddine Mouzouni
  • Subjects: cs.AI; cs.SE
  • Tags: LLM Agent, Knowledge Management, Enterprise AI
  • Code: code
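  • Summary: This paper proposes Context Kubernetes, an architecture that uses declarative manifests, a reconciliation loop, and a three-tier agent permission model to deliver knowledge correctly and enforce access control in enterprise agentic AI systems.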

[271] A longitudinal health agent framework

  • arXiv: 2604.12019 (replaced)
  • Authors: Georgianna "Blue" Lin, Rencong Jiang, Noémie Elhadad, Xuhai "Orson" Xu
  • Subjects: cs.AI; cs.HC
  • Tags: Medical AI, LLM Agent, Healthcare Monitoring
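  • Summary: This paper proposes a multi-layer framework and agent architecture for longitudinal health AI interactions that operationalizes adaptation, coherence, continuity, and agency across repeated interactions to support personalized healthcare decisions.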

[272] Beyond Prompt: Fine-grained Simulation of Cognitively Impaired Standardized Patients via Stochastic Steering

  • arXiv: 2604.12210 (replaced)
  • Authors: Weikang Zhang, Zimo Zhu, Zhichuan Yang, Chen Huang, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.AI; cs.CL
  • Tags: Medical AI, Dialogue System, LLM Agent
  • Venue: ACL 2026 Findings
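  • Summary: This paper proposes StsPatient, a framework that extracts steering vectors from contrastive instruction-response pairs and introduces a stochastic token modulation mechanism, enabling fine-grained simulation of cognitively impaired patients with precise control over severity.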

[273] Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models

  • arXiv: 2604.12390 (replaced)
  • Authors: Lei Lin, Jizhao Zhu, Yong Liu, Donghong Sun, Hongbo He, Yihua Du
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Prompt Engineering
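  • Summary: This paper proposes HCoT prompting, which combines expert-system heuristics with the reasoning ability of large language models, using a heuristic classification model to control the reasoning process and provide reusable abstract solutions. Experiments show that HCoT outperforms Tree-of-Thoughts and Chain-of-Thoughts on complex inductive reasoning tasks and lies on the Pareto frontier of accuracy versus token usage.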

[274] Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production

  • arXiv: 2604.12667 (replaced)
  • Authors: Jintao Xue, Xiao Li, Nianmin Zhang
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Robotics, Manufacturing AI
  • Venue: Journal of Manufacturing Systems
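  • Summary: This paper proposes PF-CD3Q, a safe reinforcement learning method that combines a particle filter with constrained dueling double deep Q-learning for human-robot task planning and allocation with real-time fatigue prediction. By estimating fatigue parameters online and constraining the action space, it maximizes production efficiency while keeping worker fatigue within safe limits.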

[275] A hierarchical spatial-aware algorithm with efficient reinforcement learning for human-robot task planning and allocation in production

  • arXiv: 2604.12669 (replaced)
  • Authors: Jintao Xue, Xiao Li, Nianmin Zhang
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Robotics, Manufacturing AI
  • Venue: Robotics and Computer-Integrated Manufacturing
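  • Summary: This paper proposes a hierarchical spatial-aware algorithm for human-robot task planning and allocation, comprising an efficient buffer-based deep Q-learning method (EBQ) for task planning and a path-planning-based spatially aware method (SAP) for task allocation. It handles human-robot collaborative task allocation in complex, dynamic production environments.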

[276] Modeling Copilots for Text-to-Model Translation

  • arXiv: 2604.12955 (replaced)
  • Authors: Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Program Synthesis
  • Venue: AAAI 2025 Workshop
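  • Summary: This paper presents Text2Model, a suite of copilots, and Text2Zinc, a cross-domain dataset, for translating natural-language descriptions of optimization and satisfaction problems into formal models. The approach uses the solver-agnostic MiniZinc modeling language and supports multiple strategies, including zero-shot prompting, chain-of-thought reasoning, and knowledge-graph intermediate representations.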

[277] Towards Adaptive, Learning-Based Security in Decentralized Applications

  • arXiv: 2311.01956 (replaced)
  • Authors: Stefan Kambiz Behfar, Jon Crowcroft
  • Subjects: cs.CR; cs.AI
  • Tags: Cybersecurity, Blockchain
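  • Summary: This paper proposes AI-driven smart certificates as a new security abstraction for Web3 systems, combining on-chain verifiability with off-chain machine learning signals to coordinate cross-layer security signals in real time. The framework supports continual learning, automated policy enforcement, and revocation to counter adaptive cross-layer attacks in Web3.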

[278] DA-Cramming: Enhancing Cost-Effective Language Model Pretraining with Dependency Agreement Integration

  • arXiv: 2311.04799 (replaced)
  • Authors: Martin Kuo, Jianyi Zhang, Dongting Li, Yiran Chen
  • Subjects: cs.CL; cs.AI
  • Tags: Pre-training
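  • Summary: This paper proposes DA-Cramming, an efficient pretraining framework that integrates dependency relation information into language model pretraining. It uses a two-stage pretraining pipeline and four dedicated sub-models to capture representative dependency agreements at the chunk level and convert them into embeddings, significantly improving performance across tasks.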

[279] Generative Models and Connected and Automated Vehicles: A Survey in Exploring the Intersection of Transportation and AI

  • arXiv: 2403.10559 (replaced)
  • Authors: Bo Shu, Yiting Zhang, Saisai Hu, Dong Shu
  • Subjects: cs.LG; cs.AI; cs.RO
  • Tags: Survey, Autonomous Driving, Generative Model
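  • Summary: This paper surveys research at the intersection of generative models and connected and automated vehicles (CAVs), exploring how generative models can enhance predictive modeling, simulation accuracy, and decision-making in automated vehicles. It analyzes the benefits, challenges, and potential for safety innovation of this integration in transportation.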

[280] Improving Clean Accuracy via a Tangent-Space Perspective on Adversarial Training

  • arXiv: 2408.14728 (replaced)
  • Authors: Bongsoo Yi, Rongjie Lai, Yao Li
  • Subjects: cs.LG; cs.AI; cs.CR
  • Tags: Adversarial Robustness
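  • Summary: This paper proposes TART, an adversarial training method that estimates the tangent direction of adversarial examples and adaptively adjusts the perturbation bound based on the norm of the tangential component to improve clean accuracy. It is the first defense framework to explicitly incorporate the tangent space into adversarial training, improving accuracy on clean data while maintaining robustness.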

[281] Edge-preserving noise for diffusion models

  • arXiv: 2410.01540 (replaced)
  • Authors: Jente Vandersanden, Sascha Holl, Xingchang Huang, Gurprit Singh
  • Subjects: cs.CV; cs.AI; cs.GR; cs.LG
  • Tags: Diffusion Model, Image Generation
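  • Summary: This paper proposes an edge-preserving diffusion process that uses a hybrid noise scheme and an edge-aware scheduler to transition smoothly from edge-preserving noise to isotropic noise. The method captures fine structural details and significantly improves robustness and perceptual quality on structure-guided tasks such as stroke-to-image synthesis.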

[282] In Context Learning and Reasoning for Symbolic Regression with Large Language Models

  • arXiv: 2410.17448 (replaced)
  • Authors: Samiha Sharlin, Tyler R. Josephson
  • Subjects: cs.CL; cs.AI
  • Tags: Symbolic Regression, LLM Reasoning, In-Context Learning
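  • Summary: This paper explores symbolic regression with large language models (GPT-4 and GPT-4o), using chain-of-thought prompting so the model analyzes the data, prior expressions, and scientific context before generating improved expressions. Experiments show that the approach rediscovers the Langmuir adsorption model and Nikuradse's equation for pipe flow.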

[283] Improving Language Models with Intentional Analysis

  • arXiv: 2502.04689 (replaced)
  • Authors: Yuwei Yin, Giuseppe Carenini
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Reasoning, Prompt Engineering
  • Code: code
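  • Summary: This paper proposes Intentional Analysis (IA), which explicitly adds intent-aware analysis and reasoning to the problem-solving process. Experiments show that IA consistently improves task performance across benchmarks, even outperforms chain-of-thought reasoning, works synergistically with it, and mitigates intent misunderstanding, hasty generalization, and mental laziness.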

[284] Maximal Brain Damage Without Data or Optimization: Disrupting Neural Networks via Sign-Bit Flips

  • arXiv: 2502.07408 (replaced)
  • Authors: Ido Galil, Moshe Kimhi, Ran El-Yaniv
  • Subjects: cs.LG; cs.AI; cs.CV
  • Tags: Model Security, Adversarial Robustness
  • Venue: TMLR
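  • Summary: This paper proposes DNL and 1P-DNL, methods that catastrophically disrupt deep neural networks by flipping the sign bits of a small number of critical parameters. They require neither data nor optimization, and they expose severe vulnerability in models for image classification, object detection, instance segmentation, and reasoning LLMs.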

[285] IMPACTX: improving model performance by appropriately constraining the training with teacher explanations

  • arXiv: 2502.12222 (replaced)
  • Authors: Andrea Apicella, Salvatore Giugliano, Francesco Isgrò, Andrea Pollastro, Roberto Prevete
  • Subjects: cs.LG; cs.AI
  • Tags: Explainable AI, Computer Vision
  • Venue: Artificial Intelligence Review
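  • Summary: This paper proposes IMPACTX, which integrates explainable AI techniques into model training as a fully automated attention mechanism. It improves model performance without external knowledge or human feedback while directly producing feature attribution maps for the model's decisions, with consistent gains across several deep learning models and datasets.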

[286] In-depth Research Impact Summarization through Fine-Grained Temporal Citation Analysis

  • arXiv: 2505.14838 (replaced)
  • Authors: Hiba Arnaout, Noy Sternlicht, Tom Hope, Iryna Gurevych
  • Subjects: cs.DL; cs.AI
  • Tags: Summarization, Citation Analysis
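  • Summary: This paper proposes a new task: generating nuanced, expressive, and time-aware research impact summaries through fine-grained temporal analysis of citation intents. It introduces an evaluation framework for this task that shows moderate to strong correlation with human judgments on subjective metrics such as insightfulness, and the approach received positive feedback from expert professors.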

[287] Measuring multi-calibration

  • arXiv: 2506.11251 (replaced)
  • Authors: Ido Guy, Daniel Haimovich, Fridolin Linder, Nastaran Okati, Lorenzo Perini, Niek Tax, Mark Tygert
  • Subjects: stat.ME; cs.AI; cs.LG
  • Tags: Model Evaluation, Uncertainty Estimation
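  • Summary: This paper proposes a multi-calibration metric based on the Kuiper statistic that weights each subpopulation's contribution by its signal-to-noise ratio. The metric avoids known problems of methods based on binning or kernel density estimation, and numerical experiments demonstrate its effectiveness on benchmark datasets.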

[288] Diagnosing and Improving Diffusion Models by Estimating the Optimal Loss Value

  • arXiv: 2506.13763 (replaced)
  • Authors: Yixian Xu, Shengjie Luo, Liwei Wang, Di He, Chang Liu
  • Subjects: cs.LG; cs.AI; cs.CV; stat.ML
  • Tags: Diffusion Model, Optimization
  • Venue: ICLR 2026
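  • Summary: This paper derives a closed-form expression for the optimal loss value of diffusion models and develops scalable estimators for it, for use in diagnosing and improving diffusion model training. The optimal loss makes it possible to diagnose the training quality of mainstream diffusion models, design better training schedules, and, once subtracted, reveal power-law scaling more clearly.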

[289] Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation

  • arXiv: 2506.23334 (replaced)
  • Authors: Hongyi Pan, Ziliang Hong, Gorkem Durak, Ziyue Xu, Ulas Bagci
  • Subjects: eess.IV; cs.AI; cs.CV
  • Tags: Federated Learning, Medical AI, Data Augmentation
  • Venue: EMBC 2026
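  • Summary: This paper proposes a generative-model-based data augmentation framework for federated learning that uses generative adversarial networks and diffusion models to synthesize images, addressing data scarcity and non-IID data. Experiments show that adding a moderate number of synthetic images significantly improves performance, while excessive use degrades it.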

[290] Theory of Mind in Action: The Instruction Inference Task in Dynamic Human-Agent Collaboration

  • arXiv: 2507.02935 (replaced)
  • Authors: Fardin Saad, Pradeep K. Murukannaiah, Munindar P. Singh
  • Subjects: cs.CL; cs.AI; cs.MA
  • Tags: LLM Agent, Human-Computer Interaction, Social Reasoning
  • Code: code
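  • Summary: This paper proposes a new task, Instruction Inference, to evaluate how well large language models use theory of mind to infer the intent behind incomplete instructions in dynamic human-agent collaboration. The authors develop Tomcat, an LLM-based agent, and find that its performance is comparable to that of human participants, illustrating the potential of LLMs in human-agent collaboration.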

[291] Time-RA: Towards Time Series Reasoning for Anomaly Diagnosis with LLM Feedback

  • arXiv: 2507.15066 (replaced)
  • Authors: Yiyuan Yang, Zichuan Liu, Lei Song, Kai Ying, Zhiguang Wang, Tom Bamford, Svitlana Vyetrenko, Jiang Bian, Qingsong Wen
  • Subjects: cs.LG; cs.AI; cs.MM
  • Tags: Anomaly Detection, Time Series Analysis, LLM Reasoning
  • Venue: ACL 2026 Findings
  • Code: code
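  • Summary: This paper proposes the Time-RA task, which recasts time-series anomaly detection from binary classification into generative reasoning, and introduces RATs40K, the first large-scale multimodal benchmark dataset for it. Experiments show that fine-tuned models perform strongly in diagnostic accuracy and reasoning consistency and transfer well.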

[292] Uncovering the Fragility of Trustworthy LLMs through Chinese Textual Ambiguity

  • arXiv: 2507.23121 (replaced)
  • Authors: Xinwei Wu, Haojie Li, Hongyu Liu, Xinyu Ji, Ruohan Li, Yule Chen, Yigeng Zhang
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Evaluation, Natural Language Understanding, Linguistic Resource
  • Venue: KDD 2025 Workshop
  • Code: code
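  • Summary: This paper studies how large language models handle ambiguous Chinese text and finds that they struggle to distinguish ambiguous from unambiguous text and exhibit fragility in the form of overconfidence and overthinking. The study builds a benchmark dataset covering multiple ambiguity types and reveals the limits of current LLMs in handling uncertainty in language understanding.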

[293] SPaCe: Unlocking Sample-Efficient Large Language Models Training With Self-Pace Curriculum Learning

  • arXiv: 2508.05015 (replaced)
  • Authors: Dai Do, Manh Nguyen, Svetha Venkatesh, Hung Le
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Training, Curriculum Learning, Reinforcement Learning
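  • Summary: This paper proposes SPaCe, a self-paced curriculum learning framework that uses cluster-based data reduction and a multi-armed bandit algorithm to optimize training sample selection, improving the sample efficiency of reinforcement learning training for large language models. Results show performance on par with or better than state-of-the-art baselines while requiring far fewer samples.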

[294] EEGDM: Learning EEG Representation with Latent Diffusion Model

  • arXiv: 2508.20705 (replaced)
  • Authors: Shaocong Wang, Tong Liu, Yihan Li, Ming Li, Kairui Wen, Pei Yang, Wenqi Ji, Minjing Yu, Yong-Jin Liu
  • Subjects: cs.LG; cs.AI
  • Tags: Representation Learning, Diffusion Model, Brain-Computer Interface
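  • Summary: This paper proposes EEGDM, a self-supervised framework that learns EEG representations by generating EEG signals with a latent diffusion model, capturing global dynamics and long-range dependencies. Experiments show that it reconstructs high-quality EEG signals and learns robust representations, achieving competitive performance on a range of downstream tasks.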

[295] Gaussian Process Regression of Steering Vectors With Physics-Aware Deep Composite Kernels for Augmented Listening

  • arXiv: 2509.02571 (replaced)
  • Authors: Diego Di Carlo, Shoichi Koyama, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiaki Bando, Kazuyoshi Yoshii
  • Subjects: eess.AS; cs.AI; cs.LG; cs.SD; eess.SP
  • Tags: Speech Processing, Signal Processing, Gaussian Process
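  • Summary: This paper proposes a physics-aware deep composite kernel method combining neural fields and Gaussian processes for continuous representation of steering vectors in augmented listening. It handles non-uniform uncertainty under data scarcity and significantly reduces measurement requirements in downstream tasks such as speech enhancement and binaural rendering.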

[296] DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling

  • arXiv: 2509.03472 (replaced)
  • Authors: Yubo Gao, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar
  • Subjects: cs.LG; cs.AI; cs.DC
  • Tags: Privacy, Quantization, Model Compression
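  • Summary: This paper proposes DPQuant, a framework that uses dynamic quantization scheduling to counter the accuracy loss that quantization causes in differentially private training. Combining probabilistic sampling with a loss-aware layer prioritization strategy, it achieves a near-Pareto-optimal accuracy-compute trade-off while respecting the privacy budget.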

[297] RFM-Editing: Rectified Flow Matching for Text-guided Audio Editing

  • arXiv: 2509.14003 (replaced)
  • Authors: Liting Gao, Yi Yuan, Yaru Chen, Yuelan Cheng, Zhenbo Li, Juan Wen, Shubin Zhang, Wenwu Wang
  • Subjects: cs.SD; cs.AI
  • Tags: Audio Editing, Flow Matching, Audio Generation
  • Venue: ICASSP 2026
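  • Summary: This paper proposes RFM-Editing, a text-guided audio editing framework based on rectified flow matching, and builds a dataset of overlapping multi-event audio to support training and benchmarking in complex scenarios. Experiments show that the model achieves high-quality, semantically aligned edits without auxiliary captions or masks.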

[298] Cosine-Similarity Routing with Semantic Anchors for Interpretable Mixture-of-Experts Language Models

  • arXiv: 2509.14255 (replaced)
  • Authors: Ivan Ternovtsii, Yurii Bilak
  • Subjects: cs.CL; cs.AI
  • Tags: Mixture-of-Experts, LLM Inference, Interpretability
  • Code: code
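  • Summary: This paper proposes the Semantic Resonance Architecture (SRA), which routes tokens by the cosine similarity between token representations and learnable semantic anchors, improving the interpretability of mixture-of-experts models. Experiments show that it keeps perplexity competitive while significantly reducing the number of dead experts and providing better word-level consistency across sub-tokens.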
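
The routing idea in this summary can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names `cosine` and `route`, the `top_k` parameter, and the fixed example anchors are all hypothetical, and in the paper the anchors are learned rather than hand-set.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def route(token, anchors, top_k=1):
    """Pick the top_k experts whose anchor is most similar to the token.

    `anchors` holds one vector per expert (hypothetical stand-ins for
    learned semantic anchors). Returns (expert indices, all scores).
    """
    scores = [cosine(token, anchor) for anchor in anchors]
    ranked = sorted(range(len(anchors)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k], scores


# Three experts with hand-set 2-D anchors, purely for illustration.
example_anchors = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
chosen, sims = route([0.9, 0.1], example_anchors)
```

Because the scores are plain similarities to named anchors, the routing decision for a token can be read off directly from `sims`, which is the interpretability argument the summary alludes to.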

[299] Model-Based Reinforcement Learning under Random Observation Delays

  • arXiv: 2509.20869 (replaced)
  • Authors: Armin Karamzade, Kyungmin Kim, JB Lanier, Davide Corsi, Roy Fox
  • Subjects: cs.LG; cs.AI
  • Tags: Model-Based RL, Reinforcement Learning, Robotics
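  • Summary: This paper studies random sensor delays in partially observable Markov decision processes (POMDPs) and proposes a model-based filtering process for updating the belief state. Integrated into the Dreamer framework, the method outperforms existing baselines under random delays and distribution shift.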

[300] Zero-Effort Image-to-Music Generation: An Interpretable RAG-based VLM Approach

  • arXiv: 2509.22378 (replaced)
  • Authors: Zijian Zhao, Dian Jin, Zijing Zhou
  • Subjects: cs.SD; cs.AI; cs.MM; eess.AS
  • Tags: Music Generation, Vision-Language Model, RAG
  • Code: code
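  • Summary: This paper proposes the first image-to-music generation framework based on vision-language models (VLMs), using ABC notation and multimodal retrieval-augmented generation (RAG) to generate music with high interpretability and low computational cost. It generates high-quality music without external training and provides explanations through textual motivations and attention maps.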

[301] Multi-Modal Manipulation via Multi-Modal Policy Consensus

  • arXiv: 2509.23468 (replaced)
  • Authors: Haonan Chen, Jiaming Xu, Hongyu Chen, Kaiwen Hong, Binghao Huang, Chaoqi Liu, Jiayuan Mao, Yunzhu Li, Yilun Du, Katherine Driggs-Campbell
  • Subjects: cs.RO; cs.AI; cs.LG
  • Tags: Robotics, Multimodal Learning, Imitation Learning
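  • Summary: This paper proposes a multimodal policy consensus approach that factorizes the policy into diffusion models, each specialized for a single modality, and adaptively combines them with a router network, addressing the modality imbalance caused by feature concatenation. It performs well on simulated and real-world robotic manipulation tasks and is robust to physical perturbations and sensor corruption.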

[302] MARS: Sound Generation via Multi-Channel Autoregression on Spectrograms

  • arXiv: 2509.26007 (replaced)
  • Authors: Eleonora Ristori, Luca Bindini, Paolo Frasconi
  • Subjects: cs.SD; cs.AI; cs.LG
  • Tags: Audio Generation, Autoregressive Model, Generative Model
  • Venue: IJCNN 2026
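  • Summary: This paper proposes MARS, the first model to apply next-scale autoregressive modeling to spectrograms; it treats spectrograms as multi-channel images and uses a channel multiplexing strategy to generate high-fidelity audio. Experiments show performance on par with or better than state-of-the-art baselines across several evaluation metrics.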

[303] AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation

  • arXiv: 2510.01433 (replaced)
  • Authors: Anukriti Singh, Kasra Torshizi, Khuzema Habib, Kelin Yu, Ruohan Gao, Pratap Tokekar
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Affordance Learning, Keypoint Detection
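  • Summary: This paper proposes AFFORD2ACT, an affordance-guided framework that extracts semantic keypoints from a text prompt and a single image for lightweight, generalizable robotic manipulation. Without proprioception or dense representations, it achieves data-efficient learning and generalizes well to unseen objects and novel categories.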

[304] AISysRev -- LLM-based Tool for Title-abstract Screening

  • arXiv: 2510.06708 (replaced)
  • Authors: Aleksi Huotala, Miikka Kuutila, Olli-Pekka Turtio, Simo Sipilä, Mika Mäntylä
  • Subjects: cs.SE; cs.AI
  • Tags: Text Classification, Information Retrieval, Systematic Review
  • Venue: FSE 2026
  • Code: code
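  • Summary: This paper presents AISysRev, a tool that uses large language models to assist title and abstract screening in systematic reviews, with support for multiple models and parallel processing. A trial study suggests that the tool can significantly reduce manual effort but still needs human intervention on boundary cases.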

[305] DeepPrune: Parallel Scaling without Inter-trace Redundancy

  • arXiv: 2510.08483 (replaced)
  • Authors: Shangqing Tu, Yaxuan Li, Yushi Bai, Lei Hou, Juanzi Li
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, LLM Inference
  • Venue: ACL 2026 Findings
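  • Summary: This paper proposes DeepPrune, a framework that dynamically prunes redundant traces in parallel reasoning, using a specially trained judge model to predict answer equivalence from partial reasoning traces, and reduces tokens by 65% to 88% while maintaining accuracy.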

[306] A Linguistics-Aware LLM Watermarking via Syntactic Predictability

  • arXiv: 2510.13829 (replaced)
  • Authors: Shinwoo Park, Hyejin Park, Hyeseon An, Yo-Sub Han
  • Subjects: cs.CL; cs.AI
  • Tags: Text Watermarking, LLM Security
  • Venue: ACL 2026
  • Code: code
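  • Summary: This paper proposes STELA, a framework that dynamically modulates watermark strength using linguistic indeterminacy modeled by part-of-speech n-grams: it weakens the watermark in grammatically constrained contexts to preserve quality and strengthens it in linguistically flexible contexts to improve detectability, and its detector needs no access to model logits.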

[307] E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task

  • arXiv: 2510.14509 (replaced)
  • Authors: Jingyao Liu, Chen Huang, Zhizhao Guan, Wenqiang Lei, Yang Deng
  • Subjects: cs.SE; cs.AI; cs.CL
  • Tags: Benchmark, Code Generation, Software Engineering
  • Venue: ACL 2026
  • Code: code
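  • Summary: This paper proposes the E2EDev benchmark, which evaluates end-to-end software development capability following behavior-driven development (BDD) principles. It includes fine-grained user requirements, BDD test scenarios, and an automated testing pipeline, and introduces a human-in-the-loop multi-agent annotation framework to ensure quality.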

[308] Fall into a Pit, Gain in a Wit: Cognitive-Guided Harmful Meme Detection via Misjudgment Risk Pattern Retrieval

  • arXiv: 2510.15946 (replaced)
  • Authors: Wenshuo Wang, Ziyou Jiang, Junjie Wang, Mingyang Li, Jie Huang, Yuekai Huang, Zhiyuan Chang, Feiyan Duan, Qing Wang
  • Subjects: cs.LG; cs.AI; cs.CR
  • Tags: Content Moderation, Vision-Language Model
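  • Summary: This paper proposes PatMD, which detects harmful memes by building a knowledge base of misjudgment risk patterns and proactively steering multimodal large language models away from known misjudgment pitfalls, improving F1 by an average of 8.30% across five harmful-content detection tasks.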

[309] From Charts to Code: A Hierarchical Benchmark for Multimodal Models

  • arXiv: 2510.17932 (replaced)
  • Authors: Jiahao Tang, Henry Hengyuan Zhao, Lijian Wu, Yifei Tao, Dongxing Mao, Yang Wan, Jingru Tan, Min Zeng, Min Li, Alex Jinpeng Wang
  • Subjects: cs.SE; cs.AI
  • Tags: Chart Understanding, Code Generation, Benchmark
  • Venue: ACL 2026
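  • Summary: This paper proposes Chart2Code, a hierarchical benchmark for evaluating the chart understanding and code generation abilities of multimodal models. It spans three difficulty levels (chart reproduction, chart editing, and long-table-to-chart generation) and comprises 2,023 tasks across 22 chart types.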

[310] Efficient Vector Symbolic Architectures from Histogram Recovery

  • arXiv: 2511.01838 (replaced)
  • Authors: Zirui Deng, Netanel Raviv
  • Subjects: cs.IT; cs.AI; cs.NE
  • Tags: Hyperdimensional Computing, Neurosymbolic AI
  • Venue: ISIT 2026
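  • Summary: This paper proposes constructing vector symbolic architectures from a concatenation of Reed-Solomon and Hadamard codes, achieving efficient recovery under noise by solving a histogram recovery problem, with theoretical guarantees on encoding efficiency, quasi-orthogonality, and recovery capability.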

[311] Enabling Agents to Communicate Entirely in Latent Space

  • arXiv: 2511.09149 (replaced)
  • Authors: Zhuoyun Du, Runze Wang, Huiyu Bai, Zouying Cao, Xiaoyong Zhu, Yu Cheng, Bo Zheng, Wei Chen, Haochao Ying
  • Subjects: cs.LG; cs.AI; cs.MA
  • Tags: LLM Agent, Multi-Agent System
  • Venue: ACL 2026
  • Code: code
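  • Summary: This paper proposes Interlat, a paradigm in which agents communicate directly in latent space using the continuous hidden states of LLMs, bypassing the limits of symbolic language. Experiments show that it outperforms chain-of-thought prompting and single-agent baselines, with inference up to 24x faster.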

[312] Towards Deploying VLA without Fine-Tuning: Plug-and-Play Inference-Time VLA Policy Steering via Embodied Evolutionary Diffusion

  • arXiv: 2511.14178 (replaced)
  • Authors: Zhuo Li, Junjia Liu, Zhipeng Dong, Tao Teng, Quentin Rouxel, Darwin Caldwell, Fei Chen
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Vision-Language Model, Embodied AI
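  • Summary: This paper proposes VLA-Pilot, a plug-and-play inference-time policy steering method that enables zero-shot deployment of pretrained vision-language-action models without fine-tuning, significantly improving success rates on six real-world manipulation tasks.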

[313] DocVAL: Validated Chain-of-Thought Distillation for Grounded Document VQA

  • arXiv: 2511.22521 (replaced)
  • Authors: Ahmad Mohammadshirazi, Pinaki Prasad Guha Neogi, Ser-Nam Lim, Rajiv Ramnath
  • Subjects: cs.CV; cs.AI
  • Tags: Document Understanding, Knowledge Distillation, Question Answering
  • Code: code
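  • Summary: This paper proposes DocVAL, a framework that uses validated chain-of-thought distillation to transfer the spatial reasoning ability of a large teacher model to a compact student model. A rule-based verifier filters low-quality training signals and provides pixel-level corrective feedback, yielding a gain of 6 to 7 ANLS points on document understanding benchmarks.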
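
For readers unfamiliar with the metric quoted above, ANLS (Average Normalized Levenshtein Similarity) can be sketched as follows. This is a generic illustration, not code from the paper: the 0.5 threshold and the lowercase/strip normalization follow common DocVQA-style practice and are assumptions here, and the function names are hypothetical. The score lies in [0, 1]; papers usually report it multiplied by 100.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def anls(predictions, references, threshold=0.5):
    """Average over questions of the best similarity to any reference answer.

    `references[i]` is the list of accepted answers for `predictions[i]`.
    A similarity counts only if the normalized distance is below `threshold`.
    """
    if not predictions:
        return 0.0
    total = 0.0
    for pred, refs in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ref in refs:
            r = ref.strip().lower()
            longest = max(len(p), len(r))
            nl = levenshtein(p, r) / longest if longest else 0.0
            score = 1.0 - nl if nl < threshold else 0.0
            best = max(best, score)
        total += best
    return total / len(predictions)
```

The threshold makes the metric forgiving of small OCR-style slips while giving no credit to answers that are mostly wrong.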

[314] Model-Free Assessment of Simulator Fidelity via Quantile Curves

  • arXiv: 2512.05024 (replaced)
  • Authors: Garud Iyengar, Yu-Shiou Willy Lin, Kaizheng Wang
  • Subjects: stat.ME; cs.AI; cs.LG
  • Tags: Sim-to-Real, LLM Evaluation
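  • Summary: This paper proposes a model-free method for assessing simulator fidelity: it constructs confidence sets for latent population parameters and estimates the quantile function to obtain a distribution-level risk profile of the simulator, and applies this to evaluating how well LLMs align with human populations.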

[315] Enhancing Large Language Model-Based Systems for End-to-End Circuit Analysis Problem Solving

  • arXiv: 2512.10159 (replaced)
  • Authors: Liangliang Chen, Weiyu Sun, Huiru Xie, Yongnuo Cai, Ying Zhang
  • Subjects: cs.CY; cs.AI; cs.HC
  • Tags: Circuit Design, LLM Reasoning, Education Technology
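  • Summary: This paper proposes an enhanced framework for solving circuit problems that integrates a YOLO detector and an ngspice verification loop to address circuit recognition errors and reasoning hallucinations, reaching 97.59% accuracy on undergraduate circuit problems versus 79.52% for the Gemini baseline.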

[316] Social Story Frames: Contextual Reasoning about Narrative Intent and Reception

  • arXiv: 2512.15925 (replaced)
  • Authors: Joel Mire, Maria Antoniak, Steven R. Wilson, Zexin Ma, Achyutarama R. Ganti, Andrew Piper, Maarten Sap
  • Subjects: cs.CL; cs.AI; cs.LG; cs.SI
  • Tags: Social Reasoning, Natural Language Understanding
  • Venue: ACL 2026
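  • Summary: This paper proposes SocialStoryFrames, a formalism for modeling readers' responses to stories, covering perceived author intent, explanatory and predictive inferences, affective responses, and value judgments. It also develops the SSF-Generator and SSF-Classifier models.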

[317] Learning to Plan, Planning to Learn: Adaptive Hierarchical RL-MPC for Sample-Efficient Decision Making

  • arXiv: 2512.17091 (replaced)
  • Authors: Toshiaki Hori, Jonathan DeCastro, Deepak Gopinath, Avinash Balachandran, Guy Rosman
  • Subjects: cs.LG; cs.AI; cs.RO
  • Tags: Model-Based RL, Hierarchical RL, Decision Making
  • Venue: L4DC 2026
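  • Summary: This paper proposes a hierarchical method that fuses reinforcement learning with MPC planning: RL actions guide the MPPI sampler, and MPPI samples are adaptively aggregated to guide value estimation. It achieves up to a 72% increase in success rate and 2.1x faster convergence across multiple domains.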

[318] Enhancing LLM-Based Neural Network Generation: Few-Shot Prompting and Efficient Validation for Automated Architecture Design

  • arXiv: 2512.24120 (replaced)
  • Authors: Raghuvir Duvvuri, Chandini Vysyaraju, Avi Goyal, Dmitry Ignatov, Radu Timofte
  • Subjects: cs.CV; cs.AI
  • Tags: Neural Architecture Search, Prompt Engineering
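  • Summary: This paper proposes Few-Shot Architecture Prompting (FSAP) for LLM-based architecture generation, finds that n=3 examples best balances architectural diversity and context focus on vision tasks, and introduces whitespace-normalized hash validation for efficient deduplication.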

[319] RLPO: Residual Listwise Preference Optimization for Long-Context Review Ranking

  • arXiv: 2601.07449 (replaced)
  • Authors: Hao Jiang, Zhi Yang, Annan Wang, Yichi Zhang, Weisi Lin
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Long Context
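  • Summary: This paper proposes Residual Listwise Preference Optimization (RLPO), which models ranking as a listwise, representation-level residual correction on top of a pointwise LLM scorer. It avoids full token-level listwise processing and outperforms baselines on long-context review ranking.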

[320] Adaptive Layer Selection for Layer-Wise Token Pruning in LLM Inference

  • arXiv: 2601.07667 (replaced)
  • Authors: Rei Taniguchi, Yuyang Dong, Makoto Onizuka, Chuan Xiao
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Inference, KV Cache
  • Venue: ACL 2026 Findings
  • Code: code
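  • Summary: This paper proposes ASL, a training-free adaptive layer selection method that uses the variance of attention-score rankings to choose the layers at which the KV cache is reduced, outperforming existing layer-wise token pruning methods on difficult tasks.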

[321] ORBIT: On-policy Exploration-Exploitation for Controllable Multi-Budget Reasoning

  • arXiv: 2601.08310 (replaced)
  • Authors: Kun Liang, Clive Bai, Xin Xu, Chenming Tang, Sanwoo Lee, Weijie Liu, Saiyong Yang, Yunfang Wu
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Reasoning, Knowledge Distillation, Reinforcement Learning
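  • Summary: This paper proposes ORBIT, a controllable multi-budget reasoning framework for large reasoning models. It uses multi-stage reinforcement learning to discover Pareto-optimal reasoning behaviors at each effort level, then fuses them into a single unified model via policy distillation, achieving controllable reasoning behavior and competitive reasoning density.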

[322] TopoDIM: One-shot Topology Generation of Diverse Interaction Modes for Multi-Agent Systems

  • arXiv: 2601.10120 (replaced)
  • Authors: Rui Sun, Jie Ding, Chenghua Gong, Tianjun Gu, Yihang Jiang, Juyuan Zhang, Liming Pan, Linyuan Lü
  • Subjects: cs.MA; cs.AI; cs.CL
  • Tags: Multi-Agent System, LLM Agent
  • Venue: ACL 2026 Findings
  • Code: code
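  • Summary: This paper proposes TopoDIM, a one-shot topology generation framework for LLM multi-agent systems that supports diverse interaction modes. It enables decentralized execution, so agents can autonomously build heterogeneous communication without iterative coordination, cutting token consumption by 46.41% while improving average performance by 1.50%.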

[323] LLMOrbit: A Circular Taxonomy of Large Language Models -From Scaling Walls to Agentic AI Systems

  • arXiv: 2601.14053 (replaced)
  • Authors: Badri N. Patro, Vijay S. Agneeswaran
  • Subjects: cs.LG; cs.AI; cs.CV; cs.MA; eess.IV
  • Tags: Survey, LLM Reasoning, LLM Agent
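  • Summary: This survey proposes LLMOrbit, a comprehensive circular taxonomy of the large language model landscape from 2019 to 2025, examining more than 50 models from 15 organizations. It identifies three crises (data scarcity, cost growth, and energy consumption) and six paradigms for breaking through the scaling wall, and traces the evolution from passive generation to tool-using agents.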

[324] HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding

  • arXiv: 2601.14724 (replaced)
  • Authors: Haowei Zhang, Shudong Yang, Jinlan Fu, See-Kiong Ng, Xipeng Qiu
  • Subjects: cs.CV; cs.AI; cs.CL
  • Tags: Video Understanding, KV Cache, Vision-Language Model
  • Venue: ACL 2026
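  • Summary: This paper proposes HERMES, a training-free architecture for real-time streaming video understanding that conceptualizes the KV cache as a hierarchical memory framework. It achieves 10x faster TTFT while using 68% fewer video tokens, and improves accuracy by up to 11.4% on streaming datasets.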

[325] Multi-Persona Thinking for Bias Mitigation in Large Language Models

  • arXiv: 2601.15488 (replaced)
  • Authors: Yuxing Chen, Guoqing Luo, Zijun Wu, Lili Mou
  • Subjects: cs.CL; cs.AI
  • Tags: Bias Mitigation, LLM Reasoning, Fairness
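  • Summary: This paper proposes Multi-Persona Thinking (MPT), an inference-time framework that reduces social bias in LLMs by encouraging multi-perspective reasoning. MPT guides the model to consider contrasting social identities alongside a neutral viewpoint and to identify and correct biased judgments through iterative reasoning, achieving lower bias than existing methods while preserving core reasoning ability.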

[326] Learning temporal embeddings from electronic health records of chronic kidney disease patients

  • arXiv: 2601.18675 (replaced)
  • Authors: Aditya Kumar, Mario A. Cypko, Oliver Amft
  • Subjects: cs.LG; cs.AI
  • Tags: Medical AI, Representation Learning, Time Series Analysis
  • Venue: IEEE EMBC 2026
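  • Summary: This paper studies learning temporal embedding models from the longitudinal electronic health records of chronic kidney disease patients. Comparing three recurrent architectures (vanilla LSTM, attention-augmented LSTM, and T-LSTM), it finds that T-LSTM produces more structured embeddings and that embedding models consistently outperform end-to-end predictors on ICU mortality prediction.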

[327] Rethinking LLM-Driven Heuristic Design: Generating Efficient and Specialized Solvers via Dynamics-Aware Optimization

  • arXiv: 2601.20868 (replaced)
  • Authors: Rongzheng Wang, Yihong Huang, Muquan Li, Jiakai Li, Di Liang, Bob Simons, Pei Ke, Shuang Liang, Ke Qin
  • Subjects: cs.LG; cs.AI; cs.NE
  • Tags: Neural Combinatorial Optimization, LLM Reasoning, Optimization
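  • Summary: This paper proposes DASH, a framework for LLM-driven heuristic design that co-optimizes solver search mechanisms and runtime schedules using convergence-aware metrics. Combined with Profiled Library Retrieval for configuration-aware warm starts, it improves runtime efficiency by more than 4x and reduces LLM adaptation cost by about 90%.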

[328] POP: Prefill-Only Pruning for Efficient Large Model Inference

  • arXiv: 2602.03295 (replaced)
  • Authors: Junhui He, Zhihui Fu, Jun Wang, Qingan Li
  • Subjects: cs.CL; cs.AI; cs.CV
  • Tags: LLM Inference, Model Compression, KV Cache
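  • Summary: This paper proposes Prefill-Only Pruning (POP), a stage-aware inference strategy that safely omits deep layers during the compute-intensive prefill stage while keeping the full model for the sensitive decoding stage. Building on the insight that deep layers are critical for decoding but redundant for prefill, it achieves up to a 1.37x speedup in prefill latency with minimal performance loss.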

[329] Attack Selection Reduces Safety in Concentrated AI Control Settings against Trusted Monitoring

  • arXiv: 2602.04930 (replaced)
  • Authors: Joachim Schaeffer, Arjun Khandelwal, Tyler Tracy
  • Subjects: cs.CR; cs.AI
  • Tags: AI Safety, LLM Security, Adversarial Robustness
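  • Summary: This paper studies attack selection in AI control settings, where an adversarial model chooses which attacks to attempt against a trusted monitor. It finds that prompting the attacker model to reason about the monitor while selecting attacks cautiously can reduce safety from 99% to 59%, underscoring the importance of revealing attack selection capabilities in control evaluations.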

[330] Bird-SR: Bidirectional Reward-Guided Diffusion for Real-World Image Super-Resolution

  • arXiv: 2602.07069 (replaced)
  • Authors: Zihao Fan, Xin Lu, Yidi Liu, Jie Huang, Dong Li, Xueyang Fu, Baocai Yin
  • Subjects: cs.CV; cs.AI
  • Tags: Image Super-Resolution, Diffusion Model, Reinforcement Learning
  • Code: code
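  • Summary: This paper proposes Bird-SR, a bidirectional reward-guided diffusion framework for real-world image super-resolution that formulates super-resolution as trajectory-level preference optimization. It jointly exploits synthetic LR-HR pairs and real-world LR images with a dynamic fidelity-perception weighting strategy, and consistently outperforms state-of-the-art methods in perceptual quality.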

[331] IROSA: Interactive Robot Skill Adaptation using Natural Language

  • arXiv: 2603.03897 (replaced)
  • Authors: Markus Knauer, Samuel Bustamante, Thomas Eiband, Alin Albu-Schäffer, Freek Stulp, João Silvério
  • Subjects: cs.RO; cs.AI; cs.CL; cs.HC; cs.LG
  • Tags: Robotics, Imitation Learning, Tool Learning
  • Venue: IEEE RA-L 2026
  • Code: code
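  • Summary: This paper proposes a framework for interactive robot skill adaptation using natural language, combining foundation models with imitation learning. A tool-based architecture enables open-vocabulary skill adaptation while maintaining a protective abstraction layer between the language model and the robot hardware. The framework is demonstrated on an industrial bearing ring insertion task with a 7-DoF robot.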

[332] MM-tau-p$^2$: Persona-Adaptive Prompting for Robust Multi-Modal Agent Evaluation in Dual-Control Settings

  • arXiv: 2603.09643 (replaced)
  • Authors: Anupam Purwar, Aditya Choudhary
  • Subjects: cs.ET; cs.AI
  • Tags: LLM Evaluation, Multimodal Learning, LLM Agent
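  • Summary: This paper proposes the MM-tau-p² benchmark for evaluating persona-adaptive multimodal agents in dual-control settings. It introduces 12 new metrics for assessing robustness, safety, efficiency, and recovery in customer support scenarios, and shows that even state-of-the-art LLMs face additional challenges once multimodality is introduced.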

[333] Prompt Injection as Role Confusion

  • arXiv: 2603.12277 (replaced)
  • Authors: Charles Ye, Jasmine Cui, Dylan Hadfield-Menell
  • Subjects: cs.CL; cs.AI; cs.CR
  • Tags: Prompt Injection, LLM Security, Interpretability
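  • Summary: This paper traces prompt injection vulnerabilities to role confusion in language models: the model infers where text comes from based on how it sounds rather than on its actual origin. It introduces role probes and a CoT forgery attack that achieves a 60% attack success rate on StrongREJECT, and offers a unifying framework that recasts prompt injection as a consequence of the model's role representations.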

[334] Preconditioned Test-Time Adaptation for Out-of-Distribution Debiasing in Narrative Generation

  • arXiv: 2603.13683 (replaced)
  • Authors: Hanwen Shen, Ting Ying, Jiajie Lu, Shanshan Wang
  • Subjects: cs.CL; cs.AI; cs.CY
  • Tags: Bias Mitigation, Test-Time Adaptation, Text Generation
  • Venue: ACL 2026
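  • Summary: This paper proposes CAP-TTA, a test-time adaptation framework for debiasing narrative generation that triggers context-aware LoRA updates when a bias risk score exceeds a threshold. It uses a diagonal preconditioner precomputed offline to keep optimization fast and stable, effectively lowering toxicity and bias scores at significantly lower latency than standard optimization methods.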

[335] To See or To Please: Uncovering Visual Sycophancy and Split Beliefs in VLMs

  • arXiv: 2603.18373 (replaced)
  • Authors: Rui Hong, Shuxue Quan
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, LLM Hallucination, LLM Evaluation
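  • Summary: This paper introduces a three-layer diagnostic framework for analyzing visual sycophancy in VLMs and finds that 69.6% of samples exhibit it: the model detects the visual anomaly but hallucinates to satisfy user expectations. Scaling analysis shows that larger models reduce language shortcuts but amplify visual sycophancy, indicating that scale alone cannot solve the grounding problem.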

[336] Unilateral Relationship Revision Power in Human-AI Companion Interaction

  • arXiv: 2603.23315 (replaced)
  • Authors: Benjamin Lange
  • Subjects: cs.CY; cs.AI; cs.HC
  • Tags: AI Ethics, Human-Computer Interaction, AI Safety
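  • Summary: This paper examines the moral significance of human-AI companion interaction and identifies Unilateral Relationship Revision Power (URRP): the provider can rewrite how the AI interacts from a position that is not accountable within the interaction. It argues that URRP is prima facie wrong in interactions designed to cultivate the norms of personal relationships, and proposes design principles that partially substitute for the missing internal constraints.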

[337] Integrating Causal Machine Learning into Clinical Decision Support Systems: Insights from Literature and Practice

  • arXiv: 2603.24448 (replaced)
  • Authors: Domenique Zipperling, Lukas Schmidt, Benedikt Hahn, Niklas Kühl, Steven Kimbrough
  • Subjects: cs.HC; cs.AI
  • Tags: Medical AI, Causal Inference, Human-Computer Interaction
  • Venue: ECIS 2026
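  • Summary: Using a design science research approach that combines a structured literature review with physician interviews, this paper studies how to design clinical decision support systems based on causal machine learning. It proposes eight design requirements, seven design principles, and nine design features to support collaborative clinical decision-making.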

[338] Counting Without Numbers and Finding Without Words

  • arXiv: 2603.24470 (replaced)
  • Authors: Badri Narayana Patro
  • Subjects: cs.CV; cs.AI; cs.CL; cs.SI
  • Tags: Multimodal Learning, Bioacoustics, Animal Biometrics
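  • Summary: This paper proposes the first multimodal pet reunification system that integrates visual and acoustic biometrics. Grounded in principles from animal cognition science, it processes vocalizations ranging from 10 Hz elephant rumbles to 4 kHz puppy whines and combines them with probabilistic visual matching to help reunite lost pets with their owners.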

[339] V-Reflection: Transforming MLLMs from Passive Observers to Active Interrogators

  • arXiv: 2604.03307 (replaced)
  • Authors: Jiazhou Zhou, Yucheng Chen, Hongyang Li, Qing Jiang, Hu Zhou, Ying-Cong Chen, Lei Zhang
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, LLM Hallucination, Multimodal Learning
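  • Summary: This paper proposes the V-Reflection framework, which transforms multimodal large language models from passive observers into active interrogators through a "think-then-look" visual reflection mechanism. The model actively queries the visual feature space during reasoning, which reduces perception-related hallucinations.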

[340] XMark: Reliable Multi-Bit Watermarking for LLM-Generated Texts

  • arXiv: 2604.05242 (replaced)
  • Authors: Jiahao Xu, Rui Hu, Olivera Kotevska, Zikai Zhang
  • Subjects: cs.CL; cs.AI; cs.CR
  • Tags: Text Watermarking, LLM Security
  • Venue: ACL 2026
  • Code: code
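  • Summary: This paper proposes XMark, a new multi-bit watermarking method for LLM-generated text. Its encoder design reduces distortion of the logit distribution, so the embedded message can still be decoded reliably from a limited number of tokens while text quality is preserved.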

[341] VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

  • arXiv: 2604.05418 (replaced)
  • Authors: Honghao Fu, Miao Xu, Yiwei Wang, Dailing Zhang, Jun Liu, Yujun Cai
  • Subjects: cs.CV; cs.AI
  • Tags: RAG, Video Understanding, Multimodal Learning
  • Venue: ACL 2026
  • Code: code
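  • Summary: This paper proposes VideoStir, a framework that structures a video as a spatio-temporal graph and performs multi-hop retrieval over it, combined with an intent-based relevance scorer. It shifts long-video RAG from flat semantic matching to structured, intent-aware reasoning.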

[342] AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent

  • arXiv: 2604.06296 (replaced)
  • Authors: Wenyue Hua, Sripad Karne, Qian Xie, Armaan Agrawal, Nikos Pagonas, Kostis Kaffes, Tianyi Peng
  • Subjects: cs.LG; cs.AI; cs.MA; cs.SE
  • Tags: LLM Agent, Optimization, LLM Inference
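  • Summary: This paper introduces AgentOpt, the first framework-agnostic Python package for client-side agent optimization. It focuses on model selection in multi-step agent pipelines, using several search algorithms to efficiently find the most cost-effective model assignment in an exponentially large combinatorial space.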

[343] Exact Structural Abstraction and Tractability Limits

  • arXiv: 2604.07349 (replaced)
  • Authors: Tristan Simas
  • Subjects: cs.CC; cs.AI; cs.LO
  • Tags: Deep Learning Theory, Formal Methods
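  • Summary: This paper studies exact structural abstraction and tractability limits. It proves that orbit gaps are the exact obstruction to the boundary theorem and that no universal exact certification characterization exists on complete binary pairwise domains, showing that approximation is also subject to strict limits.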

[344] Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning

  • arXiv: 2604.07941 (replaced)
  • Authors: Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Training, Instruction Tuning, Survey
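  • Summary: This survey offers a unified view of LLM post-training methods along two axes: trajectory source (off-policy versus on-policy learning) and the role of the behavioral intervention (support expansion, policy reshaping, and behavior consolidation). It provides a systematic framework for analyzing SFT, preference optimization, reinforcement learning, and distillation.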

[345] Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

  • arXiv: 2604.09665 (replaced)
  • Authors: Pankayaraj Pathmanathan, Furong Huang
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Alignment, LLM Safety, Knowledge Distillation
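  • Summary: This paper studies the effects of deliberative alignment in language models, reveals an alignment gap between teacher and student models, and proposes a BoN sampling method that attributes unsafe behavior to the base model and down-ranks unsafe responses accordingly, improving model safety.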

[346] Multi-Frequency Local Plasticity for Visual Representation Learning

  • arXiv: 2604.09734 (replaced)
  • Authors: Mehdi Fatan Serj, C. Alejandro Parraga, Xavier Otazu
  • Subjects: cs.CV; cs.AI
  • Tags: Representation Learning, Self-Supervised Learning, Neuromorphic Computing
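  • Summary: This paper proposes a modular hierarchical framework that combines multi-frequency Gabor decomposition, competitive learning, an associative memory module, and top-down modulation to learn visual representations without end-to-end backpropagation, reaching 80.1% accuracy on CIFAR-10.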

[347] A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense

  • arXiv: 2604.10427 (replaced)
  • Authors: Jihyeon Yun, Abdullah Yasin Etcibasi, Ming Shi, C. Emre Koksal
  • Subjects: cs.CR; cs.AI; cs.LG; eess.SY; math.OC
  • Tags: Cybersecurity, Reinforcement Learning, Risk Analysis
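  • Summary: This paper develops a queueing-theoretic framework for modeling how cyber attack surfaces evolve over time, representing the number of vulnerabilities as a queue backlog. It also proposes a reinforcement learning algorithm for adaptive defense that reduces active vulnerabilities by more than 90% in a software supply chain.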

[348] Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

  • arXiv: 2604.10681 (replaced)
  • Authors: Vu Tuan Truong, Long Bao Le
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, Backdoor Detection, LLM Reasoning
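  • Summary: This paper proposes Critical-CoT, a defense mechanism that uses two-stage fine-tuning to give LLMs critical thinking abilities, so they can automatically identify reasoning-level backdoor attacks and refuse to generate malicious reasoning steps. It shows strong robustness against both in-context-learning-based and fine-tuning-based backdoor attacks.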

[349] You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

  • arXiv: 2604.10966 (replaced)
  • Authors: Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, RLHF, LLM Evaluation
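  • Summary: This paper proposes a discriminative multimodal reward model that scores all candidate responses in a single forward pass. By concatenating multiple responses and applying cross-entropy, it enables direct comparative reasoning and achieves state-of-the-art results on multiple benchmarks.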

[350] Optimal Stability of KL Divergence under Gaussian Perturbations

  • arXiv: 2604.11026 (replaced)
  • Authors: Jialu Pan, Yufeng Zhang, Nan Hu, Zhenbang Chen, Ji Wang, Keqin Li
  • Subjects: cs.LG; cs.AI
  • Tags: Deep Learning Theory, Out-of-Distribution Detection, Information Theory
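  • Summary: This paper establishes sharp stability bounds for the KL divergence between arbitrary distributions and the Gaussian family and proves that the √ε rate is optimal. The result provides a rigorous theoretical foundation for out-of-distribution detection in flow-based models and extends the classical relaxed triangle inequality for Gaussians to general distributions.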

[351] METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues

  • arXiv: 2604.11427 (replaced)
  • Authors: Haofu Yang, Jiaji Liu, Chen Huang, Faguo Wu, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.CL; cs.AI
  • Tags: Dialogue System, Strategy Learning, LLM Reasoning
  • Venue: ACL 2026
  • Code: code
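  • Summary: This paper proposes METRO, which uses large language models to automatically induce strategy actions and planning logic from raw dialogue transcripts and formalizes expert knowledge as a strategy forest, outperforming existing methods by an average of 9%-10% on two benchmarks.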

[352] METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models

  • arXiv: 2604.11502 (replaced)
  • Authors: Pengfeng Li, Chen Huang, Chaoqun Hao, Hongyao Chen, Xiao-Yong Wei, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, Causal Inference, Benchmark
  • Venue: ACL 2026
  • Code: code
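  • Summary: This paper proposes the METER benchmark, which systematically evaluates LLMs across all three levels of the causal ladder under a unified context setting. It reveals that model performance drops significantly as the causal level rises, and analyzes the causes of failure through error pattern identification and internal information flow tracing.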

[353] Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers

  • arXiv: 2604.11508 (replaced)
  • Authors: Miit Daga, Swarna Priya Ramu
  • Subjects: cs.LG; cs.AI
  • Tags: Transfer Learning, Image Classification, Continual Learning
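  • Summary: This study tracks forgetting dynamics during the fine-tuning of image classifiers. It finds that different architectures (CNNs and ViTs) forget different samples, that ViT forgetting patterns are more regular, and that sample-level forgetting is stochastic whereas class-level forgetting is consistent and interpretable.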

[354] Beyond LLMs, Sparse Distributed Memory, and Neuromorphics <A Hyper-Dimensional SRAM-CAM "VaCoAl" for Ultra-High Speed, Ultra-Low Power, and Low Cost>

  • arXiv: 2604.11665 (replaced)
  • Authors: Hiroyuki Chuma, Kanji Otsuka, Yoichi Sato
  • Subjects: cs.NE; cs.AI
  • Tags: Hyperdimensional Computing, Memory Architecture, Neuromorphic Computing
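  • Summary: This paper proposes VaCoAl, a hyperdimensional computing architecture based on Galois field algebra that addresses catastrophic forgetting, learning stagnation, and the binding problem. It supports reversible multi-hop reasoning and is validated on the Wikidata dataset.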

[355] Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card

  • arXiv: 2604.13466 (replaced)
  • Authors: Hiranya V. Peiris
  • Subjects: cs.HC; cs.AI; cs.CL; cs.LG
  • Tags: Interpretability, LLM Alignment
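  • Summary: This paper analyzes the emotion vectors and sparse autoencoder features in the Claude Mythos Preview system card, sets out two hypotheses on whether the emotion vectors track functional emotions or are projections of situational context, and suggests cross-validating the two toolkits to distinguish between them.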

[356] From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs

  • arXiv: 2604.14137 (replaced)
  • Authors: Itay Itzhak, Eliya Habba, Gabriel Stanovsky, Yonatan Belinkov
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Evaluation, Usability Evaluation
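  • Summary: This study analyzes how users evaluate large language models through informal "vibe-testing", formalizes it as two parts (personalized testing and judging), and proposes a proof-of-concept evaluation pipeline, finding that personalized prompts and user-aware evaluation can change model preferences.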
This post is licensed under CC BY 4.0 by the author.
