arXiv cs.AI Daily Update
cs.AI 领域 2026年4月13日 共有 289 篇论文更新:
- 26 篇新投稿:LLM Agent (OpenKedge [1], SEA-Eval [13], DRBENCHER [18]), Reinforcement Learning (RAMP [4], SPPO [9], StaRPO [10]), LLM Reasoning (SPPO [9], StaRPO [10], [6]), LLM Evaluation (PilotBench [12], SEA-Eval [13], DRBENCHER [18]), Multi-Agent System (OpenKedge [1], [11], [17])
- 154 篇跨领域投稿:Vision-Language Model (MARINER [67], LMGenDrive [81], SenBen [93]), LLM Evaluation (HM-Bench [104], MuTSE [110], Re [113]), Reinforcement Learning (StructRL [70], HTNav [103], WOMBET [111]), Medical AI (VerifAI [29], PSIRNet [88], MedFormer-UR [99]), LLM Agent (AlphaLab [57], SkillForge [69], Re [113])
- 109 篇替换投稿:LLM Agent (AlphaCast [188], EchoTrail-GUI [190], ReplicatorBench [198]), LLM Reasoning (Chain-in-Tree [186], AlphaCast [188], PACED [199]), Reinforcement Learning (ChipSeek [183], ActivityEditor [205], Webscale-RL [237]), Multi-Agent System (H-AdminSim [197], ActivityEditor [205], AgentSociety [220]), Vision-Language Model (MONETA [210], Chain-of-Zoom [223], VSI [228])
整体趋势:今日论文主要聚焦于Reinforcement Learning、LLM Agent、LLM Reasoning等方向。
已录用论文:[3](UMAP 2026), [4](AAMAS 2026 Workshop), [6](ICLR 2026 Workshop), [9](ACL 2026), [12](IJCNN 2026), [16](CogSci 2026), [24](ACL 2026), [27](NeurIPS 2024), [32](ICLR 2026), [43](ICLR 2026 Workshop), [46](ICLR 2026 Workshop), [63](AAMAS 2026 Workshop), [65](AAAI 2026 Workshop), [68](CVPR 2026), [69](SIGIR 2026), [72](CVPR 2026), [73](ICONI 2025), [74](ICRA 2026), [77](CVPR 2026), [78](ACL 2026), [85](ACL 2026), [91](XcoAx 2026), [93](CVPR 2026 Workshop), [95](CVPR 2026), [109](SIGIR 2026), [110](ITS 2026), [111](L4DC 2026), [122](ACL 2026), [128](CVPR 2026), [131](WWW 2026), [132](ACL 2026), [133](CVPR 2026), [136](ICPR 2026), [140](LAK 2026), [141](ACL 2026), [149](CVPR 2026), [151](CHI 2026 Workshop), [155](ACL 2026), [160](CVPR 2026), [175](CVPR 2026), [176](ACL 2026), [182](ACL 2026), [183](ACL 2026), [186](ACL 2026 Findings), [187](ACL 2026), [189](CVPR 2026), [190](CVPR 2026 Findings), [195](ICLR 2026), [197](CHIL 2026), [210](ACL 2026), [214](IJCAI 2024), [217](ACL 2026 Findings), [221](NAACL 2025 Findings), [222](ACL 2026), [223](NeurIPS 2025), [224](CVPR 2026 Findings), [228](CVPR 2026 Findings), [230](IEA/AIE 2026), [232](CVPR 2026), [234](ACL 2026), [246](IJDAR), [247](CVPR 2026), [248](CVPR 2026), [250](ICLR 2026), [251](CVPR 2026), [254](bwHPC Symposium 2024 Workshop), [259](AAMAS 2026 Workshop), [260](ICWSM 2026), [261](AIED 2026), [272](CVPR 2026), [273](MIDL 2026), [274](CVPR 2026 Workshop), [286](ACL 2026)
开源论文:[24](code), [38](code), [62](code), [76](code), [101](code), [104](code), [108](code), [112](code), [114](code), [125](code), [129](code), [132](code), [148](code), [156](code), [158](code), [168](code), [171](code), [174](code), [176](code), [178](code), [183](code), [186](code), [187](code), [188](code), [198](code), [212](code), [216](code), [217](code), [224](code), [231](code), [232](code), [250](code), [256](code), [267](code), [273](code)
新投稿 (26)
[1] OpenKedge: Governing Agentic Mutation with Execution-Bound Safety and Evidence Chains
- arXiv: 2604.08601
- Authors: Jun He, Deying Yu
- Subjects: cs.AI; cs.LG
- Tags: LLM Agent, AI Safety, Multi-Agent System
- Summary: 本文提出了OpenKedge协议,将AI代理的状态变更重新定义为受治理的过程,通过声明式意图提案、执行合约和加密证据链实现确定性审计和安全执行。
[2] From Business Events to Auditable Decisions: Ontology-Governed Graph Simulation for Enterprise AI
- arXiv: 2604.08603
- Authors: Hongyin Zhu, Jinming Liang, Mengjun Hou, Ruifan Tang, Xianbin Zhu, Jingyuan Yang, Yuanman Mao, Feng Wu
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Knowledge Graph, Decision Making
- Summary: 本文提出LOM-action框架,通过事件驱动的本体论模拟使企业AI决策基于演化的图状态,在工具链F1指标上显著超越前沿基线模型。
[3] Sustained Impact of Agentic Personalisation in Marketing: A Longitudinal Case Study
- arXiv: 2604.08621
- Authors: Olivier Jeunen, Eleanor Hanna, Schaun Wheeler
- Subjects: cs.AI; cs.HC; cs.LG
- Tags: LLM Agent, Recommender System, LLM Personalization
- Venue: UMAP 2026
- Summary: 本文通过11个月的纵向案例研究,发现虽然主动人工管理能产生最高的参与度提升,但自主代理在被动期仍能维持正向效果,揭示了人机协同的可持续个性化模式。
[4] RAMP: Hybrid DRL for Online Learning of Numeric Action Models
- arXiv: 2604.08685
- Authors: Yarin Benyamin, Argaman Mordoch, Shahaf S. Shperberg, Roni Stern
- Subjects: cs.AI
- Tags: Automated Planning, Reinforcement Learning, Model-Based RL
- Venue: AAMAS 2026 Workshop
- Summary: 本文提出RAMP策略,结合深度强化学习、动作模型学习和规划,通过与环境的在线交互学习数值规划动作模型,在IPC基准测试上显著优于PPO算法。
[5] Parameterized Complexity Of Representing Models Of MSO Formulas
- arXiv: 2604.08707
- Authors: Petr Kučera, Petr Martinek
- Subjects: cs.AI; cs.CC
- Tags: Knowledge Representation, Formal Methods
- Summary: 本文扩展了Courcelle定理,证明MSO公式的模型可以用参数化线性大小的决策图表示,建立了与知识表示领域的新联系。
[6] Model Space Reasoning as Search in Feedback Space for Planning Domain Generation
- arXiv: 2604.08712
- Authors: James Oswald, Daniel Oblinsky, Volodymyr Varha, Vasilije Dragovic, Harsha Kokel, Kavitha Srinivas, Michael Katz, Shirin Sohrabi
- Subjects: cs.AI
- Tags: Automated Planning, LLM Reasoning, Program Synthesis
- Venue: ICLR 2026 Workshop
- Summary: 本文研究利用代理化语言模型反馈框架,结合地标符号反馈和计划验证器输出,通过模型空间启发式搜索从自然语言描述生成规划域。
[7] Artifacts as Memory Beyond the Agent Boundary
- arXiv: 2604.08756
- Authors: John D. Martin, Fraser Mince, Esra'a Saleh, Amy Pajak
- Subjects: cs.AI
- Tags: Reinforcement Learning, Memory Architecture
- Summary: 本文在强化学习框架下形式化了环境作为代理记忆的概念,证明某些观测(称为工件)可以减少表示历史所需的信息量,并通过实验验证了空间路径观测能降低学习策略所需的记忆量。
[8] Hidden in Plain Sight: Visual-to-Symbolic Analytical Solution Inference from Field Visualizations
- arXiv: 2604.08863
- Authors: Pengze Li, Jiaquan Zhang, Yunbo Long, Xinping Liu, Zhou wenjie, Encheng Su, Zihang Zeng, Jiaqi Liu, Jiyao Liu, Junchi Yu, Lihao Liu, Philip Torr, Shixiang Tang, Aoran Wang, Xi Chen
- Subjects: cs.AI
- Tags: Scientific Reasoning, Vision-Language Model, Symbolic Regression
- Summary: 本文提出ViSA-R2方法,通过物理学家式的思维链管道从场可视化中恢复可执行的SymPy解析表达式,并发布了包含30个线性稳态场景的ViSA-Bench基准。
[9] SPPO: Sequence-Level PPO for Long-Horizon Reasoning Tasks
- arXiv: 2604.08865
- Authors: Tianyi Wang, Yixia Li, Long Li, Yibiao Chen, Shaohan Huang, Yun Chen, Peng Li, Yang Liu, Guanhua Chen
- Subjects: cs.AI
- Tags: LLM Reasoning, RLHF, Reinforcement Learning
- Venue: ACL 2026
- Summary: 本文提出序列级PPO(SPPO)算法,将推理过程重构为序列级上下文赌博机问题,在数学基准测试上显著超越标准PPO并匹配计算密集型组方法的性能。
[10] StaRPO: Stability-Augmented Reinforcement Policy Optimization
- arXiv: 2604.08905
- Authors: Jinghan Zhang, Fengran Mo, Tharindu Cyril Weerasooriya, Ruimin Dai, Xiaoyan Han, Yanjie Fu, Dakuo Wang, Kunpeng Liu
- Subjects: cs.AI; cs.LG
- Tags: LLM Reasoning, Reinforcement Learning, RLHF
- Summary: 本文提出StaRPO框架,将推理稳定性(自相关函数和路径效率)显式纳入优化目标,在推理基准测试上同时提升了最终答案准确性和逻辑稳定性。
[11] Enhancing LLM Problem Solving via Tutor-Student Multi-Agent Interaction
- arXiv: 2604.08931
- Authors: Nurullah Eymen Özdemir, Erhan Oztop
- Subjects: cs.AI; cs.MA
- Tags: Multi-Agent System, Code Generation, LLM Reasoning
- Summary: 本文提出PETITE框架,通过导师-学生多代理角色差异化交互结构增强LLM问题求解能力,在APPS编码基准上以更少的token消耗达到相当或更高的准确率。
[12] PilotBench: A Benchmark for General Aviation Agents with Safety Constraints
- arXiv: 2604.08987
- Authors: Yalun Wu, Haotian Liu, Zhoujun Li, Boyang Wang
- Subjects: cs.AI
- Tags: LLM Evaluation, Embodied AI, Decision Making
- Venue: IJCNN 2026
- Summary: 本文提出PilotBench基准,用于评估LLM在安全关键飞行轨迹预测任务上的表现,揭示了LLM指令遵循能力与传统预测器数值精度之间的权衡。
[13] SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment
- arXiv: 2604.08988
- Authors: Sihang Jiang, Lipeng Ma, Zhonghua Hong, Keyi Wang, Zhiyu Lu, Shisong Chen, Jinghao Zhang, Tianjun Pan, Weijia Zhou, Jiaqing Liang, Yanghua Xiao
- Subjects: cs.AI
- Tags: LLM Agent, LLM Evaluation
- Summary: 本文提出SEA-Eval基准,首次从任务内执行可靠性和长期进化性能两个维度评估自进化代理,揭示了当前框架在token消耗上高达31.2倍的差异。
[14] Hypergraph Neural Networks Accelerate MUS Enumeration
- arXiv: 2604.09001
- Authors: Hiroya Ijima, Koichiro Yawata
- Subjects: cs.AI; cs.LG; cs.LO
- Tags: Graph Neural Network, Reinforcement Learning, Formal Methods
- Summary: 本文提出使用超图神经网络结合强化学习来加速最小不可满足子集(MUS)枚举,在相同的可满足性检查预算内能枚举更多MUS。
[15] Advantage-Guided Diffusion for Model-Based Reinforcement Learning
- arXiv: 2604.09035
- Authors: Daniele Foffano, Arvid Eriksson, David Broman, Karl H. Johansson, Alexandre Proutiere
- Subjects: cs.AI; cs.LG
- Tags: Model-Based RL, Diffusion Model, Reinforcement Learning
- Summary: 本文提出基于优势引导的扩散模型强化学习方法(AGD-MBRL),利用优势估计引导反向扩散过程,在MuJoCo控制任务上显著提升了样本效率和最终回报。
[16] Overhang Tower: Resource-Rational Adaptation in Sequential Physical Planning
- arXiv: 2604.09072
- Authors: Ruihong Shen, Shiqian Li, Yixin Zhu
- Subjects: cs.AI
- Tags: Cognitive Science, Automated Planning, Decision Making
- Venue: CogSci 2026
- Summary: 本文通过Overhang Tower构建任务研究人类在资源约束下的序列物理规划,发现人类在资源压力下会同时转换物理预测机制和规划策略,揭示了层次化的资源理性架构。
[17] Camera Artist: A Multi-Agent Framework for Cinematic Language Storytelling Video Generation
- arXiv: 2604.09195
- Authors: Haobo Hu, Qi Mao, Yuanhang Li, Libiao Jin
- Subjects: cs.AI
- Tags: Multi-Agent System, Video Generation
- Summary: 本文提出了Camera Artist,一个多智能体框架,用于生成具有明确电影语言的故事视频。该框架引入了专门的摄影镜头智能体,通过递归故事板生成和电影语言注入来增强镜头间的叙事连贯性和电影表现力。
[18] DRBENCHER: Can Your Agent Identify the Entity, Retrieve Its Properties and Do the Math?
- arXiv: 2604.09251
- Authors: Young-Suk Lee, Ramon Fernandez Astudillo, Radu Florian
- Subjects: cs.AI
- Tags: LLM Evaluation, LLM Agent, Knowledge Graph
- Summary: 本文介绍了DRBENCHER,一个用于评估需要网络浏览和多步计算的深度研究智能体的合成基准生成器。该基准涵盖五个领域,评估显示最强的前沿模型仅达到20%的答案准确率。
[19] SAGE: A Service Agent Graph-guided Evaluation Benchmark
- arXiv: 2604.09285
- Authors: Ling Shi, Yuqin Dai, Ziyin Wang, Ning Gao, Wei Zhang, Chaozheng Wang, Yujie Wang, Wei He, Jinpeng Wang, Deiyi Xiong
- Subjects: cs.AI
- Tags: LLM Agent, LLM Evaluation, Dialogue System
- Summary: 本文提出了SAGE,一个用于评估LLM客服智能体的多智能体基准测试框架。该框架将标准操作流程形式化为动态对话图,并引入对抗意图分类法,在6个工业场景中对27个LLM进行评估,发现了模型能准确分类意图但无法推导正确后续动作的”执行差距”现象。
[20] Constraint-Aware Corrective Memory for Language-Based Drug Discovery Agents
- arXiv: 2604.09308
- Authors: Maochen Sun, Youzhi Zhang, Gaofeng Meng
- Subjects: cs.AI
- Tags: Drug Discovery, LLM Agent
- Summary: 本文提出了CACM,一个基于语言的药物发现框架,通过约束感知的纠正记忆实现精确的集合级诊断。实验结果表明,该方法在目标级成功率上比最先进基线提高了36.4%。
[21] Mind the Gap Between Spatial Reasoning and Acting! Step-by-Step Evaluation of Agents With Spatial-Gym
- arXiv: 2604.09338
- Authors: Lars Benedikt Kaesberg, Tianyu Yang, Niklas Bauer, Terry Ruas, Jan Philip Wahle, Bela Gipp
- Subjects: cs.AI; cs.CL
- Tags: LLM Reasoning, Embodied AI, LLM Evaluation
- Summary: 本文介绍了Spatial-Gym,一个用于评估空间推理能力的Gymnasium环境,将路径规划作为序列决策任务进行测试。实验发现最佳模型仅达到16%的解决率,而人类基准为98%。
[22] HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?
- arXiv: 2604.09408
- Authors: Mohamed Elfeki, Tu Trinh, Kelvin Luu, Guangze Luo, Nathan Hunt, Ernesto Montoya, Nandan Marwaha, Yannis He, Charles Wang, Fernando Crabedo, Alessa Castilo, Bing Liu
- Subjects: cs.AI
- Tags: LLM Evaluation, LLM Agent, Decision Making
- Summary: 本文提出了HiL-Bench,用于测量AI智能体何时应该寻求帮助而非自主行动的选择性升级技能。评估揭示了普遍的判断差距:前沿模型在决定是否提问时无法恢复其完整信息性能。
[23] Do We Really Need to Approach the Entire Pareto Front in Many-Objective Bayesian Optimisation?
- arXiv: 2604.09417
- Authors: Chao Jiang, Jingyu Huang, Miqing Li
- Subjects: cs.AI
- Tags: Optimization, Bayesian Optimization
- Summary: 本文提出了SPMO,一个基于单点的多目标搜索框架,用于多目标贝叶斯优化。该方法不再逼近整个Pareto前沿,而是专注于为决策者找到单一最高质量的解决方案。
[24] E3-TIR: Enhanced Experience Exploitation for Tool-Integrated Reasoning
- arXiv: 2604.09455
- Authors: Weiyang Guo, Zesheng Shi, Liye Zhao, Jiayuan Ma, Zeen Zhu, Junxian He, Min Zhang, Jing Li
- Subjects: cs.AI
- Tags: LLM Agent, Reinforcement Learning
- Venue: ACL 2026
- Code: code
- Summary: 本文提出了E3-TIR,一种用于智能体训练早期阶段的预热范式,通过整合专家前缀、专家引导和自我探索三种经验类型来训练工具集成推理模型。该方法在工具使用任务上实现了6%的性能提升,且仅需不到10%的合成数据。
[25] Process Reward Agents for Steering Knowledge-Intensive Reasoning
- arXiv: 2604.09482
- Authors: Jiwoong Sohn, Tomasz Sternal, Kenneth Styppa, Torsten Hoefler, Michael Moor
- Subjects: cs.AI
- Tags: LLM Reasoning, Medical AI, RAG
- Summary: 本文介绍了过程奖励智能体(PRA),一种在测试时为冻结策略提供领域相关在线步骤奖励的方法。在医学推理基准测试中,PRA在MedQA上达到80.8%的准确率,无需策略模型更新即可将准确率提高最多25.7%。
[26] Strategic Algorithmic Monoculture:Experimental Evidence from Coordination Games
- arXiv: 2604.09502
- Authors: Gonzalo Ballestero, Hadi Hosseini, Samarth Khanna, Ran I. Shorrer
- Subjects: cs.AI; cs.GT; cs.MA; econ.TH
- Tags: Multi-Agent System, Game AI
- Summary: 本文区分了主要算法单一文化和策略性算法单一文化,并在协调博弈中进行了实验研究。研究发现LLM表现出高基线相似性,并在响应协调激励时调节相似性,但在需要保持异质性时落后于人类。
跨领域投稿 (154)
[27] On Divergence Measures for Training GFlowNets
- arXiv: 2410.09355 (cross-listed)
- Authors: Tiago da Silva, Eliezer de Souza da Silva, Diego Mesquita
- Subjects: cs.LG; cs.AI; stat.ML
- Tags: Generative Flow Networks
- Venue: NeurIPS 2024
- Summary: 本文研究了四种散度度量用于训练GFlowNets,并设计了统计高效的随机梯度估计器。所提出的方法实现了显著更快的收敛速度,缩小了GFlowNets训练与广义变分近似之间的差距。
[28] Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces
- arXiv: 2604.08362 (cross-listed)
- Authors: Jiawei Chen, Ruoxi Xu, Boxi Cao, Ruotong Pan, Yunfei Zhang, Yifei Hu, Yong Du, Tingting Gao, Yaojie Lu, Yingfei Sun, Xianpei Han, Le Sun, Xiangyu Wu, Hongyu Lin
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Evaluation, Social Simulation, LLM Agent
- Summary: 本文提出了OmniBehavior,首个基于真实世界数据构建的用户行为模拟基准测试,整合了长时域、跨场景和异质性行为模式。研究发现当前LLM在模拟复杂人类行为时存在结构性偏差,倾向于收敛为”积极普通人”,表现出过度活跃、人格同质化和乌托邦偏差,导致个体差异和长尾行为的丢失。
[29] VerifAI: A Verifiable Open-Source Search Engine for Biomedical Question Answering
- arXiv: 2604.08549 (cross-listed)
- Authors: Miloš Košprdić, Adela Ljajić, Bojana Bašaragin, Darija Medvecki, Lorenzo Cassano, Nikola Milošević
- Subjects: cs.IR; cs.AI; cs.CL
- Tags: RAG, Medical AI, Question Answering
- Summary: 本文介绍了VerifAI,一个开源的生物医学问答专家系统,将RAG与事后声明验证机制相结合。系统通过将生成的答案分解为原子声明并使用微调的NLI引擎进行验证,显著减少了幻觉引用。
[30] Unbiased Rectification for Sequential Recommender Systems Under Fake Orders
- arXiv: 2604.08550 (cross-listed)
- Authors: Qiyu Qin, Yichen Li, Haozhao Wang, Cheng Wang, Rui Zhang, Ruixuan Li
- Subjects: cs.IR; cs.AI
- Tags: Recommender System, Adversarial Robustness
- Summary: 本文提出了DITaR,一种用于在虚假订单下对序列推荐系统进行无偏纠正的方法。该方法通过双视图识别和定向纠正来检测有害样本,在推荐质量、计算效率和系统鲁棒性方面均优于最先进方法。
[31] Automated Standardization of Legacy Biomedical Metadata Using an Ontology-Constrained LLM Agent
- arXiv: 2604.08552 (cross-listed)
- Authors: Josef Hardi, Martin J. O'Connor, Marcos Martinez-Romero, Jean G. Rosario, Stephen A. Fisher, Mark A. Musen
- Subjects: cs.DB; cs.AI
- Tags: LLM Agent, Knowledge Graph, Medical AI
- Summary: 本文提出了一个基于LLM的元数据标准化系统,通过实时查询权威生物医学术语服务来检索规范正确的词汇术语。评估表明,增强实时工具访问的LLM在预测准确性上持续优于单独使用LLM。
[32] GNN-as-Judge: Unleashing the Power of LLMs for Graph Learning with GNN Feedback
- arXiv: 2604.08553 (cross-listed)
- Authors: Ruiyao Xu, Kaize Ding
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Graph Neural Network, Few-Shot Learning, Knowledge Distillation
- Venue: ICLR 2026
- Summary: 本文提出了GNN-as-Judge框架,通过结合GNN的结构归纳偏置来释放LLM在文本属性图上的少样本半监督学习能力。该方法使用协作伪标签策略和弱监督微调算法,在低资源场景下显著优于现有方法。
[33] Drift and selection in LLM text ecosystems
- arXiv: 2604.08554 (cross-listed)
- Authors: Søren Riis
- Subjects: cs.CL; cs.AI
- Tags: Model Collapse
- Summary: 本文提出了一个可精确求解的数学框架,用于分析AI生成文本递归进入公共语料库的过程,识别出两种作用力:漂移(逐渐移除罕见形式)和选择(通过过滤维持更丰富的结构)。该框架揭示了递归出版何时会压缩公共文本,以及选择性过滤何时能维持更深层的结构。
[34] EMA Is Not All You Need: Mapping the Boundary Between Structure and Content in Recurrent Context
- arXiv: 2604.08556 (cross-listed)
- Authors: Arth Singh
- Subjects: cs.CL; cs.AI
- Tags: Representation Learning
- Summary: 本文使用指数移动平均(EMA)作为探针,研究了高效序列模型与简单时间平均的区别。研究发现EMA轨迹能有效编码时序结构但会破坏token身份信息,证明了固定系数累积存在不可逆的信息稀释问题。
[35] Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models
- arXiv: 2604.08557 (cross-listed)
- Authors: Arth Singh
- Subjects: cs.CL; cs.AI
- Tags: LLM Security, LLM Alignment
- Summary: 本文揭示了扩散语言模型的安全对齐存在结构性脆弱:其安全性依赖于去噪调度的单调性。作者通过简单的重新掩码和前缀注入方法,在多个模型上实现了超过76%的攻击成功率,证明了dLLM安全性的架构浅薄性。
[36] WAND: Windowed Attention and Knowledge Distillation for Efficient Autoregressive Text-to-Speech Models
- arXiv: 2604.08558 (cross-listed)
- Authors: Hanna Lee, Tan Dat Nguyen, Jaehoon Kang, Kyuhong Shim
- Subjects: cs.CL; cs.AI
- Tags: Speech Synthesis, Knowledge Distillation
- Summary: 本文提出了WAND框架,通过窗口化注意力和知识蒸馏使自回归文本转语音模型实现恒定的计算复杂度。该方法在保持合成质量的同时,实现了高达66.2%的KV缓存内存减少和近似恒定的每步延迟。
[37] Medical Reasoning with Large Language Models: A Survey and MR-Bench
- arXiv: 2604.08559 (cross-listed)
- Authors: Xiaohan Ren, Chenxiao Fan, Wenyin Ma, Hongliang He, Chongming Gao, Xiaoyan Zhao, Fuli Feng
- Subjects: cs.CL; cs.AI
- Tags: Medical AI, LLM Reasoning
- Summary: 本文对LLM医学推理进行了全面综述,将现有方法组织为七条技术路线,并引入了基于真实医院数据的MR-Bench基准。评估揭示了考试级性能与真实临床决策任务之间存在显著差距。
[38] Uncertainty Estimation for the Open-Set Text Classification systems
- arXiv: 2604.08560 (cross-listed)
- Authors: Leonid Erlygin, Alexey Zaytsev
- Subjects: cs.CL; cs.AI
- Tags: Uncertainty Estimation, Text Classification
- Code: code
- Summary: 本文将整体不确定性估计方法适配到开放集文本分类任务,解决了文本不确定性和画廊不确定性两种主要误差来源。所提方法在多个数据集上的预测拒绝率相比基线提升了40-365%。
[39] Neural networks for Text-to-Speech evaluation
- arXiv: 2604.08562 (cross-listed)
- Authors: Ilya Trofimenko, David Kocharyan, Aleksandr Zaitsev, Pavel Repnikov, Mark Levin, Nikita Shevtsov
- Subjects: cs.CL; cs.AI; cs.SD; eess.AS
- Tags: Speech Evaluation
- Summary: 本文提出了一系列神经网络模型用于TTS系统评估,包括用于相对评估的NeuralSBS和用于绝对评估的WhisperBert集成模型。最佳MOS模型的RMSE约为0.40,显著优于人类评估者间0.62的RMSE基线。
[40] Temperature-Dependent Performance of Prompting Strategies in Extended Reasoning Large Language Models
- arXiv: 2604.08563 (cross-listed)
- Authors: Mousa Salah, Amgad Muneer
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Reasoning, Prompt Engineering
- Summary: 本文系统评估了采样温度和提示策略对扩展推理LLM的影响。研究发现零样本提示在中等温度下表现最佳,而思维链提示在温度极值处表现更好,挑战了推理任务使用T=0的常见做法。
[41] Dynamic sparsity in tree-structured feed-forward layers at scale
- arXiv: 2604.08565 (cross-listed)
- Authors: Reza Sedghi, Robin Schiewer, Anand Subramoney, David Kappel
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Model Compression
- Summary: 本文研究了树结构前馈层作为Transformer中MLP块的稀疏替代方案,实现了通过硬层次路由进行条件计算。尽管每个token激活不到5%的单元,模型在受控训练下仍能匹配稠密基线性能,并展现出可扩展至10亿参数以上的能力。
[42] Can We Still Hear the Accent? Investigating the Resilience of Native Language Signals in the LLM Era
- arXiv: 2604.08568 (cross-listed)
- Authors: Nabelanita Utami, Sasano Ryohei
- Subjects: cs.CL; cs.AI
- Tags: Linguistic Resource
- Summary: 本研究通过分析ACL论文集中的母语识别趋势,调查了从机器翻译到LLM的写作工具演变是否正在同化研究论文。分析显示NLI性能持续下降,且后LLM时代不同语言呈现出不同的异常趋势。
[43] QuanBench+: A Unified Multi-Framework Benchmark for LLM-Based Quantum Code Generation
- arXiv: 2604.08570 (cross-listed)
- Authors: Ali Slim, Haydar Hamieh, Jawad Kotaich, Yehya Ghosn, Mahdi Chehimi, Ammar Mohanna, Hasan Abed Al Kader Hammoud, Bernard Ghanem
- Subjects: cs.LG; cs.AI; cs.PL; cs.SE
- Tags: Code Generation, Quantum Computing
- Venue: ICLR 2026 Workshop
- Summary: 本文引入了QuanBench+,一个跨Qiskit、PennyLane和Cirq三个框架的统一量子代码生成基准。结果显示最强的一次性得分在Qiskit上达到59.5%,基于反馈修复后最佳得分提升至83.3%,表明可靠的多框架量子代码生成仍具挑战性。
[44] Robust Reasoning Benchmark
- arXiv: 2604.08571 (cross-listed)
- Authors: Pavel Golikov, Evgenii Opryshko, Gennady Pekhimenko, Mark C. Jeffrey
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: LLM Reasoning, LLM Evaluation
- Summary: 本文提出了一个包含14种技术的扰动管道来评估LLM推理的鲁棒性,发现开放权重推理模型在扰动后准确率下降高达55%。研究还表明中间推理步骤会永久污染标准稠密注意力机制,导致连续解题时准确率衰减。
[45] Silhouette Loss: Differentiable Global Structure Learning for Deep Representations
- arXiv: 2604.08573 (cross-listed)
- Authors: Matheus Vinícius Todescato, Joel Luís Carbonera
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Representation Learning
- Summary: 本文引入了软轮廓损失,一种受聚类分析中轮廓系数启发的可微分目标函数。该损失在批次级别评估每个样本相对于所有类别的全局结构,与交叉熵和监督对比学习结合后,在多个数据集上实现了分类准确率的提升。
[46] Distilling Genomic Models for Efficient mRNA Representation Learning via Embedding Matching
- arXiv: 2604.08574 (cross-listed)
- Authors: Rasched Haidari, Sam Martin, Maxime Allard
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, Genomic AI
- Venue: ICLR 2026 Workshop
- Summary: 本文提出了一个蒸馏框架,通过嵌入级匹配将mRNA表示从大型基因组基础模型迁移到小200倍的专用模型。蒸馏模型在可比规模模型中达到了最先进性能,突显了嵌入级蒸馏在基因组序列建模中的有效性。
[47] MolPaQ: Modular Quantum-Classical Patch Learning for Interpretable Molecular Generation
- arXiv: 2604.08575 (cross-listed)
- Authors: Syed Rameez Naqvi, Lu Peng
- Subjects: cs.LG; cs.AI
- Tags: Molecular Generation, Quantum Computing
- Summary: 本文提出了MOLPAQ,一个模块化量子-经典分子生成器,通过量子生成的潜在片段组装分子。该方法实现了100%的RDKit有效性、99.75%的新颖度和0.905的多样性,量子生成器在引导下使平均QED提升约2.3%。
[48] GAN-Enhanced Deep Reinforcement Learning for Semantic-Aware Resource Allocation in 6G Network Slicing
- arXiv: 2604.08576 (cross-listed)
- Authors: Daniel Benniah John
- Subjects: cs.NI; cs.AI; cs.LG
- Tags: Reinforcement Learning, Wireless Networks
- Summary: 本文提出了GAN-DDPG框架,将条件GAN用于流量合成、连续动作DDPG和语义感知奖励优化相结合,用于6G网络切片的语义感知资源分配。仿真结果显示在频谱效率、延迟和丢包率方面均有显著改善。
[49] Distributionally Robust Token Optimization in RLHF
- arXiv: 2604.08577 (cross-listed)
- Authors: Yeping Jin, Jiaming Hu, Ioannis Ch. Paschalidis
- Subjects: cs.LG; cs.AI
- Tags: RLHF, LLM Reasoning
- Summary: 本文提出了一种分布鲁棒令牌优化方法(DRTO),将令牌级别的RLHF与分布鲁棒优化相结合,以提高LLM在提示词分布偏移下的鲁棒性。该方法在GSM8K和MathQA数学推理基准上分别实现了9.17%和2.49%的性能提升。
[50] Structured Exploration and Exploitation of Label Functions for Automated Data Annotation
- arXiv: 2604.08578 (cross-listed)
- Authors: Phong Lam, Ha-Linh Nguyen, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo
- Subjects: cs.LG; cs.AI
- Tags: Data Synthesis, Weak Supervision
- Summary: 本文提出了EXPONA框架,用于自动化程序化标注,通过多层次(表面、结构、语义)标签函数生成和可靠性感知过滤机制,平衡覆盖率和精确度。实验表明该方法在11个分类数据集上实现了高达98.9%的标签覆盖率和46%的下游性能提升。
[51] On the Spectral Geometry of Cross-Modal Representations: A Functional Map Diagnostic for Multimodal Alignment
- arXiv: 2604.08579 (cross-listed)
- Authors: Krisanu Sarkar
- Subjects: cs.LG; cs.AI
- Tags: Vision-Language Model, Multimodal Learning, Representation Learning
- Summary: 本文利用计算几何中的函数映射框架研究视觉和语言编码器之间的跨模态对齐,发现了”谱复杂度-方向差距”:独立训练的模型在捕获结构复杂度上趋同,但在组织方式上未对齐。
[52] Multivariate Time Series Anomaly Detection via Dual-Branch Reconstruction and Autoregressive Flow-based Residual Density Estimation
- arXiv: 2604.08582 (cross-listed)
- Authors: Jun Liu, Ying Chen, Ziqian Lu, Qinyue Tong, Jun Tang
- Subjects: cs.LG; cs.AI
- Tags: Anomaly Detection, Time Series Forecasting
- Summary: 本文提出了DBR-AF框架用于多变量时间序列异常检测,通过双分支重建编码器解耦跨变量和变量内学习,并结合自回归流模块进行残差密度估计,在七个基准数据集上达到了最先进性能。
[53] CSAttention: Centroid-Scoring Attention for Accelerating LLM Inference
- arXiv: 2604.08584 (cross-listed)
- Authors: Chuxu Song, Zhencan Peng, Jiuqi Wei, Chuanhui Yang
- Subjects: cs.LG; cs.AI
- Tags: LLM Inference, Long Context
- Summary: 本文提出了CSAttention,一种无需训练的稀疏注意力方法,通过在离线预填充阶段构建以查询为中心的查找表,将在线解码时的全上下文扫描替换为高效的表查找,在长上下文场景下实现了4.6倍的推理加速。
[54] QCFuse: Query-Centric Cache Fusion for Efficient RAG Inference
- arXiv: 2604.08585 (cross-listed)
- Authors: Jianxin Yan, Zeheng Qian, Wangze Ni, Zhitao Shen, Zhiping Wang, Haoyang Li, Jia Zhu, Lei Chen, Kui Ren
- Subjects: cs.DB; cs.AI
- Tags: RAG, LLM Inference
- Summary: 本文提出了QCFuse系统,一种以用户查询为中心的KV缓存融合方法,利用语义摘要锚点增强查询表示,并选择性地重计算与查询相关的令牌,在保持准确性的同时将响应效率提高了40%。
[55] FluidFlow: a flow-matching generative model for fluid dynamics surrogates on unstructured meshes
- arXiv: 2604.08586 (cross-listed)
- Authors: David Ramos, Lucas Lacasa, Fermín Gutiérrez, Eusebio Valero, Gonzalo Rubio
- Subjects: cs.LG; cs.AI
- Tags: Flow Matching, Scientific Computing
- Summary: 本文提出了FluidFlow,一种基于条件流匹配的生成模型,用于在非结构化网格上构建流体动力学代理模型,无需网格插值预处理即可直接操作CFD数据,在翼型和飞机几何预测任务上显著优于MLP基线。
[56] Act or Escalate? Evaluating Escalation Behavior in Automation with Language Models
- arXiv: 2604.08588 (cross-listed)
- Authors: Matthew DosSantos DiSorbo, Harang Ju
- Subjects: cs.LG; cs.AI
- Tags: LLM Evaluation, Decision Making, LLM Alignment
- Summary: 本文将LLM的升级决策建模为不确定性下的决策问题,在五个领域中发现模型的隐式阈值差异显著且与架构或规模无关,通过在思维链上进行监督微调可获得最鲁棒的升级策略。
[57] AlphaLab: Autonomous Multi-Agent Research Across Optimization Domains with Frontier LLMs
- arXiv: 2604.08590 (cross-listed)
- Authors: Brendan R. Hogan, Xiwen Chen, James T. Wilson, Kashif Rasul, Adel Boyarsky, Thomas Kamei, Anderson Schneider, Yuriy Nevmyvaka
- Subjects: cs.LG; cs.AI
- Tags: LLM Agent, Multi-Agent System, GPU Computing
- Summary: 本文提出了AlphaLab,一个利用前沿LLM智能体能力自动化完整实验周期的自主研究系统,在CUDA内核优化、LLM预训练和交通预测等领域实现了显著性能提升,无需人工干预。
[58] From Dispersion to Attraction: Spectral Dynamics of Hallucination Across Whisper Model Scales
- arXiv: 2604.08591 (cross-listed)
- Authors: Ivan Viakhirev, Kirill Borodin, Grach Mkrtchian
- Subjects: cs.LG; cs.AI
- Tags: LLM Hallucination, Speech Processing
- Summary: 本文提出了谱敏感性定理,预测深度网络从耗散态到吸引子态的相变,并通过分析Whisper模型在对抗压力下的激活图谱特征值谱验证了该理论,揭示了不同规模模型的幻觉行为机制。
[59] Mapping generative AI use in the human brain: divergent neural, academic, and mental health profiles of functional versus socio emotional AI use
- arXiv: 2604.08594 (cross-listed)
- Authors: Junjie Wang, Xianyang Gan, Dan Liu, Jingxian He, Stefania Ferraro, Keith M. Kendrick, Weihua Zhao, Shuxia Yao, Christian Montag, Benjamin Becker
- Subjects: q-bio.NC; cs.AI; cs.HC
- Tags: Cognitive Science, AI Ethics, Medical AI
- Summary: 本文结合调查问卷和高分辨率结构MRI研究了生成式AI使用对大学生大脑的影响,发现功能性AI使用与更好的学业表现和前额叶体积相关,而社会情感性AI使用与较差的心理健康相关。
[60] Adaptive Rigor in AI System Evaluation using Temperature-Controlled Verdict Aggregation via Generalized Power Mean
- arXiv: 2604.08595 (cross-listed)
- Authors: Aleksandr Meshkov
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation
- Summary: 本文提出了TCVA方法,通过温度控制的判决聚合和广义幂均值聚合来调节LLM评估的严格程度,低温适用于安全关键领域,高温适用于对话AI,在人类判断相关性上与现有方法相当。
[61] STIndex: A Context-Aware Multi-Dimensional Spatiotemporal Information Extraction System
- arXiv: 2604.08597 (cross-listed)
- Authors: Wenxiao Zhang, Yu Liu, Qiang sun, Yihao Ding, Sirui Li, Yanbing Liu, Jin B. Hong, Wei Liu
- Subjects: cs.DB; cs.AI
- Tags: Information Extraction, Knowledge Graph
- Summary: 本文介绍了STIndex系统,一个端到端的多维时空信息提取系统,利用LLM进行上下文感知的提取和接地,将非结构化内容结构化为多维时空数据仓库,在公共卫生基准上提升了实体提取F1值。
[62] TiAb Review Plugin: A Browser-Based Tool for AI-Assisted Title and Abstract Screening
- arXiv: 2604.08602 (cross-listed)
- Authors: Yuki Kataoka, Masahiro Banno, Michihito Kyo, Shuri Nakao, Tomoo Sato, Shunsuke Taito, Tomohiro Takayama, Takahiro Tsuge, Yasushi Tsujimoto, Ryuhei So, Toshi A. Furukawa
- Subjects: cs.DL; cs.AI; cs.LG
- Tags: Information Retrieval, Education Technology
- Code: code
- Summary: 本文开发了TiAb Review Plugin,一个开源的Chrome浏览器扩展,提供无代码、无服务器的AI辅助标题和摘要筛选功能,集成了LLM批量筛选和机器学习主动学习,适用于系统性综述筛选。
[63] Extrapolating Volition with Recursive Information Markets
- arXiv: 2604.08606 (cross-listed)
- Authors: Abhimanyu Pallavi Sudhir, Long Tran-Thanh
- Subjects: cs.GT; cs.AI; econ.TH
- Tags: LLM Alignment, AI Safety
- Venue: AAMAS 2026 Workshop
- Summary: 本文通过”信息价值”范式分析了使用LLM买家克服信息市场信息不对称的机制,提出了递归版本,该机制与AI对齐研究中的外推意志和可扩展监督相关。
[64] Joint Interference Detection and Identification via Adversarial Multi-task Learning
- arXiv: 2604.08607 (cross-listed)
- Authors: H. Xu, B. He, S. Wang
- Subjects: cs.LG; cs.AI; cs.CR; cs.IT
- Tags: Multi-Task Learning, Adversarial Robustness, Wireless Networks
- Summary: 本文建立了一个理论支撑的多任务学习框架用于联合干扰检测、调制识别和干扰识别,提出了AMTIDIN网络,通过对抗训练最小化任务间分布差异,在有限训练数据和低信噪比条件下显著优于基线。
[65] Semantic Intent Fragmentation: A Single-Shot Compositional Attack on Multi-Agent AI Pipelines
- arXiv: 2604.08608 (cross-listed)
- Authors: Tanzim Ahad, Ismail Hossain, Md Jahangir Alam, Sai Puppala, Yoonpyo Lee, Syed Bahauddin Alam, Sajedul Talukder
- Subjects: cs.CR; cs.AI; cs.LG
- Tags: LLM Security, Multi-Agent System, AI Safety
- Venue: AAAI 2026 Workshop
- Summary: 本文提出语义意图碎片化(SIF)攻击,针对LLM编排系统,通过将单一合法请求分解为多个看似无害的子任务,在组合后违反安全策略。作者在14个企业场景中验证了71%的攻击成功率,并提出计划级信息流追踪作为防御机制。
[66] Detection of Hate and Threat in Digital Forensics: A Case-Driven Multimodal Approach
- arXiv: 2604.08609 (cross-listed)
- Authors: Ponkoj Chandra Shill
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Multimodal Learning, Vision-Language Model, Cybersecurity
- Summary: 本文提出一种案例驱动的多模态方法,用于数字取证中的仇恨和威胁检测。该框架根据证据类型(嵌入文本、关联上下文文本或纯图像)选择性应用文本分析、多模态融合或图像语义推理,提高取证决策的可追溯性。
[67] MARINER: A 3E-Driven Benchmark for Fine-Grained Perception and Complex Reasoning in Open-Water Environments
- arXiv: 2604.08615 (cross-listed)
- Authors: Xingming Liao, Ning Chen, Muying Shu, Yunpeng Yin, Peijian Zeng, Zhuowei Wang, Nankai Lin, Lianglun Cheng
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Object Detection, 3D Vision
- Summary: 本文介绍MARINER基准,基于实体-环境-事件(3E)范式构建,包含16,629张多源海事图像,涵盖63个细粒度船舶类别和5种典型动态海事事故。评估显示当前多模态大模型在复杂海洋场景的细粒度识别和因果推理方面仍存在困难。
[68] From Selection to Scheduling: Federated Geometry-Aware Correction Makes Exemplar Replay Work Better under Continual Dynamic Heterogeneity
- arXiv: 2604.08617 (cross-listed)
- Authors: Zhuang Qi, Ying-Peng Tang, Lei Meng, Guoqing Chao, Lei Wu, Han Yu, Xiangxu Meng
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Federated Learning, Continual Learning, Knowledge Distillation
- Venue: CVPR 2026
- Summary: 本文提出FEAT方法,用于联邦持续学习中的样本回放优化。该方法包含几何结构对齐模块和基于能量的几何校正模块,通过等角紧框架原型对齐和去除任务无关方向分量,缓解类别不平衡导致的表示崩溃问题。
[69] SkillForge: Forging Domain-Specific, Self-Evolving Agent Skills in Cloud Technical Support
- arXiv: 2604.08618 (cross-listed)
- Authors: Xingyan Liu, Xiyue Luo, Linyu Li, Ganghong Huang, Jianfeng Liu, Honglin Qiao
- Subjects: cs.IR; cs.AI; cs.SE
- Tags: LLM Agent, Knowledge Distillation
- Venue: SIGIR 2026
- Summary: 本文提出SkillForge框架,用于云技术支持场景中领域特定技能的自演化创建与优化。该框架通过领域上下文化技能创建器和三阶段诊断优化管道,实现从执行失败中自动识别技能缺陷并持续改进。
[70] StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning
- arXiv: 2604.08620 (cross-listed)
- Authors: Ivo Nowak
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, Model-Based RL
- Summary: 本文提出StructRL框架,从分布式强化学习的学习动态中恢复动态规划结构。通过分析回报分布的时间演化,识别状态空间中学习发生的时机和位置信号,引导采样与新兴传播结构对齐。
[71] Practical Bayesian Inference for Speech SNNs: Uncertainty and Loss-Landscape Smoothing
- arXiv: 2604.08624 (cross-listed)
- Authors: Yesmine Abdennadher, Philip N. Garner
- Subjects: cs.LG; cs.AI
- Tags: Speech Processing, Neuromorphic Computing, Uncertainty Estimation
- Summary: 本文探索贝叶斯学习方法对脉冲神经网络(SNN)预测景观的影响,应用改进变分在线牛顿(IVON)方法。实验表明贝叶斯方法能产生更平滑、更规则的预测景观,并在负对数似然和Brier分数上取得改进。
[72] Evidential Transformation Network: Turning Pretrained Models into Evidential Models for Post-hoc Uncertainty Estimation
- arXiv: 2604.08627 (cross-listed)
- Authors: Yongchan Chun, Chanhee Park, Jeongho Yoon, Jaehyung Seo, Heuiseok Lim
- Subjects: cs.LG; cs.AI
- Tags: Uncertainty Estimation, Transfer Learning
- Venue: CVPR 2026
- Summary: 本文提出证据变换网络(ETN),一种轻量级后处理模块,可将预训练模型转换为证据模型以实现不确定性估计。ETN在logit空间学习样本相关的仿射变换,将输出解释为Dirichlet分布参数,无需重新训练原模型。
[73] Retrieval Augmented Classification for Confidential Documents
- arXiv: 2604.08628 (cross-listed)
- Authors: Yeseul E. Chang, Rahul Kailasa, Simon Shim, Byunghoon Oh, Jaewoo Lee
- Subjects: cs.CR; cs.AI; cs.IR
- Tags: RAG, Information Retrieval, Cybersecurity
- Venue: ICONI 2025
- Summary: 本文提出检索增强分类(RAC)方法用于机密文档分类,在WikiLeaks美国外交语料库上与监督微调进行比较。RAC在非平衡数据上更稳定,同时通过将敏感内容保留在外部向量存储中提供更好的安全性。
[74] LEGO: Latent-space Exploration for Geometry-aware Optimization of Humanoid Kinematic Design
- arXiv: 2604.08636 (cross-listed)
- Authors: Jihwan Yoon, Taemoon Jeong, Jeongeun Park, Chanwoo Kim, Jaewoon Kwon, Yonghyeon Lee, Kyungjae Lee, Sungjoon Choi
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Representation Learning
- Venue: ICRA 2026
- Summary: 本文提出LEGO框架,用于人形机器人运动学设计优化。该方法从现有机械设计中学习设计搜索空间,通过运动重定向和普氏分析从人类运动数据定义损失函数,构建几何保持的紧凑潜在空间进行优化。
[75] VOLTA: The Surprising Ineffectiveness of Auxiliary Losses for Calibrated Deep Learning
- arXiv: 2604.08639 (cross-listed)
- Authors: Rahul D Ray, Utkarsh Srivastava
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Uncertainty Estimation
- Summary: 本文对十种常用不确定性量化方法进行全面基准测试,提出VOLTA简化变体。该方法仅使用深度编码器、可学习原型、交叉熵损失和后验温度缩放,在准确性和校准误差方面达到竞争或优越性能。
[76] On Semiotic-Grounded Interpretive Evaluation of Generative Art
- arXiv: 2604.08641 (cross-listed)
- Authors: Ruixiang Jiang, Changwen Chen
- Subjects: cs.CV; cs.AI; cs.HC; cs.MM
- Tags: Text-to-Image, Interpretability
- Code: code
- Summary: 本文基于皮尔斯计算符号学理论形式化人类-生成艺术交互(HGI),提出SemJudge评估器。该方法通过层次化符号过程图(HSG)评估符号意义和索引意义,在解释密集型艺术基准上比先前评估器更符合人类判断。
[77] 3D-VCD: Hallucination Mitigation in 3D-LLM Embodied Agents through Visual Contrastive Decoding
- arXiv: 2604.08645 (cross-listed)
- Authors: Makanjuola Ogunleye, Eman Abdelrahman, Ismini Lourentzou
- Subjects: cs.CV; cs.AI; cs.LG; cs.RO
- Tags: LLM Hallucination, Embodied AI, 3D Vision
- Venue: CVPR 2026
- Summary: 本文提出3D-VCD,首个用于3D具身智能体幻觉缓解的推理时视觉对比解码框架。通过构建语义和几何扰动的扭曲3D场景图并对比预测,该方法抑制对场景证据不敏感的token,无需重新训练。
[78] Every Response Counts: Quantifying Uncertainty of LLM-based Multi-Agent Systems through Tensor Decomposition
- arXiv: 2604.08708 (cross-listed)
- Authors: Tiejin Chen, Huaiyuan Yao, Jia Chen, Evangelos E. Papalexakis, Hua Wei
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Multi-Agent System, Uncertainty Estimation, LLM Evaluation
- Venue: ACL 2026
- Summary: 本文提出MATU框架,通过张量分解量化LLM多智能体系统的不确定性。该方法将推理轨迹表示为嵌入矩阵并组织为高阶张量,通过张量分解解耦并量化不同来源的不确定性。
[79] Deep Learning-Based Tracking and Lineage Reconstruction of Ligament Breakup
- arXiv: 2604.08711 (cross-listed)
- Authors: Vrushank Ahire, Vivek Kurumanghat, Mudasir Ganaie, Lipika Kabiraj
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Object Detection, Video Understanding
- Summary: 本文提出两阶段深度学习框架,用于液体薄片破碎过程中韧带和液滴的检测与谱系重建。第一阶段使用Faster R-CNN检测,第二阶段使用Transformer增强的MLP分类帧间关联,包括延续和碎片化事件。
[80] Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring
- arXiv: 2604.08718 (cross-listed)
- Authors: Xinmiao Xiong, Bangya Liu, Hao Wang, Dayou Li, Nuo Chen, Andrew Feng, Mingyu Ding, Suman Banerjee, Yang Zhou, Zhiwen Fan
- Subjects: cs.CV; cs.AI; cs.RO
- Tags: 3D Vision, Optimization
- Summary: 本文提出LeanGate,一种轻量级前馈帧门控网络,用于加速基于Transformer的单目SLAM。该方法在昂贵的几何特征提取之前预测帧的几何效用分数,跳过超过90%的冗余帧,实现5倍端到端吞吐量提升。
[81] LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving
- arXiv: 2604.08719 (cross-listed)
- Authors: Hao Shao, Letian Wang, Yang Zhou, Yuxuan Hu, Zhuofan Zong, Steven L. Waslander, Wei Zhan, Hongsheng Li
- Subjects: cs.CV; cs.AI; cs.RO
- Tags: Autonomous Driving, Vision-Language Model, Multimodal Learning
- Summary: 本文提出了LMGenDrive,这是首个将基于LLM的多模态理解与生成式世界模型相结合的端到端闭环驾驶框架,能够根据多视角相机输入和自然语言指令生成未来驾驶视频和控制信号。
[82] Demystifying the Silence of Correctness Bugs in PyTorch Compiler
- arXiv: 2604.08720 (cross-listed)
- Authors: Meiziniu Li, Dongze Li, Jianmeng Liu, Shing-Chi Cheung
- Subjects: cs.SE; cs.AI
- Tags: Software Testing, DNN Deployment
- Summary: 本文首次对PyTorch编译器中的正确性缺陷进行了实证研究,并提出了AlignGuard测试技术,利用LLM进行测试用例变异,成功检测出23个新的正确性缺陷。
[83] AI Driven Soccer Analysis Using Computer Vision
- arXiv: 2604.08722 (cross-listed)
- Authors: Adrian Manchado, Tanner Cellio, Jonathan Keane, Yiyang Wang
- Subjects: cs.CV; cs.AI
- Tags: Object Detection, Video Understanding
- Summary: 本文提出了一种基于计算机视觉的足球分析系统,结合目标检测模型(YOLO、Faster R-CNN)、SAM2分割和单应性变换,实现球员位置追踪和战术洞察提取。
[84] Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?
- arXiv: 2604.08723 (cross-listed)
- Authors: Chia-Hsuan Lee, Mingyang Zhou, Renkun Ni, Zelei Cheng, Sihui Dai, Supriyo Chakraborty, Shixiong Zhang, Sambit Sahu, William Campbell
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, LLM Alignment
- Summary: 本文研究了偏好数据中哪些方面能提升推理模型性能,发现生成器级别的差异(chosen和rejected生成器之间的能力差异)和样本级别的差异(偏好对内的质量差异)都能通过偏好优化提升推理性能。
[85] LLMs Underperform Graph-Based Parsers on Supervised Relation Extraction for Complex Graphs
- arXiv: 2604.08752 (cross-listed)
- Authors: Paolo Gajo, Domenic Rosati, Hassan Sajjad, Alberto Barrón-Cedeño
- Subjects: cs.CL; cs.AI
- Tags: Information Extraction, Knowledge Graph
- Venue: ACL 2026
- Summary: 本文表明当语言图复杂度较高时,LLM在监督关系抽取任务上的表现不如基于图的解析器,且随着输入文档中关系数量增加,性能差距进一步扩大。
[86] Cards Against LLMs: Benchmarking Humor Alignment in Large Language Models
- arXiv: 2604.08757 (cross-listed)
- Authors: Yousra Fettach, Guillaume Bied, Hannu Toivonen, Tijl De Bie
- Subjects: cs.CL; cs.AI
- Tags: LLM Alignment, LLM Evaluation
- Summary: 本文通过让LLM参与《反人类牌》游戏来评估其幽默对齐程度,发现模型虽然超越随机基线,但模型之间的共识度远高于与人类的共识度,部分原因在于位置偏差和内容偏好。
[87] InstrAct: Towards Action-Centric Understanding in Instructional Videos
- arXiv: 2604.08762 (cross-listed)
- Authors: Zhuoyi Yang, Jiapeng Yu, Reuben Tan, Boyang Li, Huijuan Xu
- Subjects: cs.CV; cs.AI
- Tags: Video Understanding, Self-Supervised Learning
- Summary: 本文提出了InstrAction预训练框架,通过数据驱动策略过滤噪声字幕并生成动作中心的硬负样本,结合动作感知器和辅助目标,提升教学视频的动作中心表征学习。
[88] PSIRNet: Deep Learning-based Free-breathing Rapid Acquisition Late Enhancement Imaging
- arXiv: 2604.08781 (cross-listed)
- Authors: Arda Atalik, Hui Xue, Rhodri H. Davies, Thomas A. Treibel, Daniel K. Sodickson, Michael S. Hansen, Peter Kellman
- Subjects: eess.IV; cs.AI; cs.CV; eess.SP
- Tags: Medical AI
- Summary: 本文开发了PSIRNet深度学习网络,用于自由呼吸心脏MRI成像,可在两个心跳的单次采集中生成诊断质量图像,实现8-24倍的采集时间缩减。
[89] eBandit: Kernel-Driven Reinforcement Learning for Adaptive Video Streaming
- arXiv: 2604.08791 (cross-listed)
- Authors: Mahdi Alizadeh
- Subjects: cs.NI; cs.AI
- Tags: Video Streaming, Reinforcement Learning
- Summary: 本文提出了eBandit框架,利用eBPF将网络监控和自适应码率算法选择移至Linux内核,通过Multi-Armed Bandit学习实现实时ABR算法选择,显著提升QoE。
[90] Lessons Without Borders? Evaluating Cultural Alignment of LLMs Using Multilingual Story Moral Generation
- arXiv: 2604.08797 (cross-listed)
- Authors: Sophie Wu, Andrew Piper
- Subjects: cs.CL; cs.AI
- Tags: LLM Alignment, LLM Evaluation
- Summary: 本文引入多语言故事寓意生成作为文化对齐评估任务,发现前沿模型生成的寓意在语义上与人类相似,但跨语言变异性较低,且集中于更窄的价值观集合。
[91] Scrapyard AI
- arXiv: 2604.08803 (cross-listed)
- Authors: Marc Böhlen, Sai Krishna
- Subjects: cs.CY; cs.AI
- Tags: AI Sustainability
- Venue: XcoAx 2026
- Summary: 本文将AI模型更替视为低成本实验的机会,介绍了Project Nudge-x项目,通过重新配置废弃的AI模型来分析全球采矿场所对景观和生命的影响。
[92] Building Better Environments for Autonomous Cyber Defence
- arXiv: 2604.08805 (cross-listed)
- Authors: Chris Hicks, Elizabeth Bates, Shae McFadden, Isaac Symes Thompson, Myles Foley, Ed Chapman, Nickolas Espinosa Dice, Ankita Samaddar, Joshua Sylvester, Himanshu Neema, Nicholas Butts, Nate Foster, Ahmad Ridley, Zoe M, Paul Jones
- Subjects: cs.CR; cs.AI
- Tags: Cybersecurity, Reinforcement Learning
- Summary: 本文基于自主网络防御RL环境研讨会,提出了RL网络环境与真实系统接口的分解框架,以及RL环境开发和智能体评估的最佳实践指南。
[93] SenBen: Sensitive Scene Graphs for Explainable Content Moderation
- arXiv: 2604.08819 (cross-listed)
- Authors: Fatih Cagatay Akyon, Alptekin Temizel
- Subjects: cs.CV; cs.AI; cs.LG; cs.MM
- Tags: Vision-Language Model, Object Detection
- Venue: CVPR 2026 Workshop
- Summary: 本文介绍了首个大规模敏感内容场景图基准SenBen,并将前沿VLM蒸馏为紧凑的学生模型,在接地场景图指标上超越所有商业安全API。
[94] HiFloat4 Format for Language Model Pre-training on Ascend NPUs
- arXiv: 2604.08826 (cross-listed)
- Authors: Mehran Taghian, Yunke Peng, Xing Huang, Yao Wang, Yaoyuan Wang, Wei Guo, Yuanyong Luo, Tianchi Hu, Junsong Wang, Xin Wang, Hu Liu, Yu Cheng, Ziwei Yu, Hongliang Li, Mehdi Rahimifar, Lei Yan, Xuefei Wang, Zhuang Ma, Lei Liu, Hui Yu, Anandharaju Durai Raju, Hoang Le, Hei Yi Mak, Tanzila Rahman, Shadan Golestan
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Pre-training, Model Compression
- Summary: 本文研究了华为昇腾NPU上的HiFloat4四比特浮点格式,证明FP4训练可在保持相对误差低于1%的同时实现显著的效率提升。
[95] Dictionary-Aligned Concept Control for Safeguarding Multimodal LLMs
- arXiv: 2604.08846 (cross-listed)
- Authors: Jinqi Luo, Jinyu Yang, Tal Neiman, Lei Fan, Bing Yin, Son Tran, Mubarak Shah, René Vidal
- Subjects: cs.LG; cs.AI; cs.CL; cs.CV
- Tags: Vision-Language Model, LLM Security
- Venue: CVPR 2026
- Summary: 本文提出了DACO框架,利用精心构建的概念字典和稀疏自编码器对MLLM激活进行细粒度控制,在保持通用能力的同时显著提升模型安全性。
[96] Scalable High-Recall Constraint-Satisfaction-Based Information Retrieval for Clinical Trials Matching
- arXiv: 2604.08849 (cross-listed)
- Authors: Cyrus Zhou, Yufei Jin, Yilin Xu, Yu-Chiang Wang, Chieh-Ju Chao, Monica S. Lam
- Subjects: cs.CL; cs.AI; cs.DB; cs.MA; cs.SC
- Tags: Information Retrieval, Medical AI
- Summary: 本文提出了SatIR临床试验检索方法,基于约束满足(SMT和关系代数)并利用LLM将非形式推理转化为显式形式约束,在检索目标上全面超越TrialGPT。
[97] AI-Induced Human Responsibility (AIHR) in AI-Human teams
- arXiv: 2604.08866 (cross-listed)
- Authors: Greg Nyilasy, Brock Bastian, Jennifer Overbeck, Abraham Ryan Ade Putra Hito
- Subjects: cs.HC; cs.AI
- Tags: AI Ethics, Human-Computer Interaction
- Summary: 本文通过四个实验研究AI-人类团队中责任分配问题,发现当人类与AI配对时比与另一人类配对时被分配更多责任(AI诱导的人类责任效应),这是因为AI被视为受限执行者而人类成为自由裁量责任的默认承担者。
[98] AudioGuard: Toward Comprehensive Audio Safety Protection Across Diverse Threat Models
- arXiv: 2604.08867 (cross-listed)
- Authors: Mintong Kang, Chen Fang, Bo Li
- Subjects: cs.SD; cs.AI
- Tags: Speech Processing, LLM Security
- Summary: 本文提出AudioGuard,一个统一的音频安全防护系统,包含波形级检测的SoundGuard和语义保护的ContentGuard,并构建了首个跨多种威胁模型的音频安全基准AudioSafetyBench。
[99] MedFormer-UR: Uncertainty-Routed Transformer for Medical Image Classification
- arXiv: 2604.08868 (cross-listed)
- Authors: Mohammed Maaz Sibhai, Abedalrhman Alkhateeb, Saad B. Ahmed
- Subjects: eess.IV; cs.AI; cs.CV; cs.LG
- Tags: Medical AI, Uncertainty Estimation
- Summary: 本文提出MedFormer-UR,一种结合原型学习和不确定性引导路由的医学图像分类Transformer,通过Dirichlet分布量化不确定性,在四种医学影像模态上显著提升模型校准和选择性预测性能。
[100] Temporal Dropout Risk in Learning Analytics: A Harmonized Survival Benchmark Across Dynamic and Early-Window Representations
- arXiv: 2604.08870 (cross-listed)
- Authors: Rafael da Silva, Jeff Eicher, Gregory Longo
- Subjects: cs.LG; cs.AI
- Tags: Education Technology, Time Series Forecasting
- Summary: 本文使用OULAD数据集建立了面向学生辍学风险的时间生存分析基准,比较动态周粒度和连续时间两种建模方法,发现时间和行为信号是主要预测因素而非人口统计学特征。
[101] A Mathematical Framework for Temporal Modeling and Counterfactual Policy Simulation of Student Dropout
- arXiv: 2604.08874 (cross-listed)
- Authors: Rafael da Silva, Jeff Eicher, Gregory Longo
- Subjects: cs.LG; cs.AI
- Tags: Education Technology, Causal Inference
- Code: code
- Summary: 本文提出一个带有反事实政策模拟层的时间建模框架用于学生辍学预测,使用惩罚化逻辑回归建模周粒度风险,并实现场景索引的政策层进行生存对比分析。
[102] Revisiting the Capacity Gap in Chain-of-Thought Distillation from a Practical Perspective
- arXiv: 2604.08880 (cross-listed)
- Authors: Tokio Kajitsuka, Ukyo Honda, Sho Takase
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Knowledge Distillation, LLM Reasoning
- Summary: 本文重新审视思维链蒸馏中的容量差距问题,发现容量差距效应在不同任务和设置中并不一致主导,并提出了更现实的评估协议和教师-学生配对选择指导。
[103] HTNav: A Hybrid Navigation Framework with Tiered Structure for Urban Aerial Vision-and-Language Navigation
- arXiv: 2604.08883 (cross-listed)
- Authors: Chengjie Fan, Cong Pan, Zijian Liu, Ningzhong Liu, Jie Qin
- Subjects: cs.RO; cs.AI
- Tags: Vision-Language Model, Reinforcement Learning
- Summary: 本文提出HTNav,一个融合模仿学习和强化学习的混合导航框架,采用分层决策机制实现宏观路径规划与细粒度动作控制的协作交互,在CityNav基准上取得最优性能。
[104] HM-Bench: A Comprehensive Benchmark for Multimodal Large Language Models in Hyperspectral Remote Sensing
- arXiv: 2604.08884 (cross-listed)
- Authors: Xinyu Zhang, Zurong Mai, Qingmei Li, Zjin Liao, Yibin Wen, Yuhang Chen, Xiaoya Fan, Chan Tsz Ho, Bi Tianyuan, Haoyuan Liang, Ruifeng Su, Zihao Qian, Juepeng Zheng, Jianxi Huang, Yutong Lu, Haohuan Fu
- Subjects: cs.CV; cs.AI
- Tags: Remote Sensing, Vision-Language Model, LLM Evaluation
- Code: code
- Summary: 本文介绍HM-Bench,首个专门评估多模态大语言模型在高光谱图像理解能力的基准,包含19,337个问答对覆盖13个任务类别,并提出双模态评估框架。
[105] A Closer Look at the Application of Causal Inference in Graph Representation Learning
- arXiv: 2604.08890 (cross-listed)
- Authors: Hang Gao, Kunyu Li, Huang Hong, Baoquan Cui, Fengge Wu
- Subjects: cs.LG; cs.AI
- Tags: Causal Inference, Graph Learning
- Summary: 本文分析了图表示学习中因果推理方法的理论有效性问题,证明了现有聚合操作会损害因果有效性,提出基于最小不可分图单元的理论模型,并开发了可集成到现有图学习流程的因果建模增强模块。
[106] Adaptive Dual Residual U-Net with Attention Gate and Multiscale Spatial Attention Mechanisms (ADRUwAMS)
- arXiv: 2604.08893 (cross-listed)
- Authors: Mohsen Yaghoubi Suraki
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Image Segmentation
- Summary: 本文提出ADRUwAMS,一种结合自适应双残差网络、注意力门和多尺度空间注意力机制的U-Net模型,用于脑肿瘤分割,在BraTS 2020数据集上取得高Dice分数。
[107] Ge$^\text{2}$mS-T: Multi-Dimensional Grouping for Ultra-High Energy Efficiency in Spiking Transformer
- arXiv: 2604.08894 (cross-listed)
- Authors: Zecheng Hao, Shenghao Xie, Kang Chen, Wenxuan Liu, Zhaofei Yu, Tiejun Huang
- Subjects: cs.NE; cs.AI; cs.CV
- Tags: Energy Efficiency, Neuromorphic Computing
- Summary: 本文提出Ge²mS-T,一种在时间、空间和网络结构维度实现分组计算的脉冲Transformer架构,通过分组指数编码和分组脉冲自注意力机制实现超高能效。
[108] Large-Scale Universal Defect Generation: Foundation Models and Datasets
- arXiv: 2604.08915 (cross-listed)
- Authors: Yuanting Fan, Jun Liu, Bin-Bin Gao, Xiaochen Chen, Yuhuan Lin, Zhewei Dai, Jiawei Zhan, Chengjie Wang
- Subjects: cs.CV; cs.AI
- Tags: Anomaly Detection, Data Synthesis
- Code: code
- Summary: 本文介绍UDG,一个包含30万样本的大规模缺陷数据集,以及UniDG,一个支持参考缺陷生成和文本指令缺陷编辑的通用缺陷生成基础模型。
[109] Beyond Relevance: Utility-Centric Retrieval in the LLM Era
- arXiv: 2604.08920 (cross-listed)
- Authors: Hengran Zhang, Minghao Tang, Keping Bi, Jiafeng Guo
- Subjects: cs.IR; cs.AI; cs.CL; cs.LG
- Tags: RAG, Information Retrieval
- Venue: SIGIR 2026
- Summary: 本文论证检索目标应从相关性中心优化转向LLM中心效用优化,提出涵盖LLM无关与LLM特定效用、上下文无关与上下文相关效用的统一框架。
[110] MuTSE: A Human-in-the-Loop Multi-use Text Simplification Evaluator
- arXiv: 2604.08947 (cross-listed)
- Authors: Rares-Alexandru Roscan, Gabriel Petre1, Adrian-Marius Dumitran, Angela-Liliana Dumitran
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, Text Simplification
- Venue: ITS 2026
- Summary: 本文介绍MuTSE,一个交互式人在回路Web应用,用于系统评估LLM生成的文本简化结果,支持任意提示-模型排列组合的并发执行和可视化对齐映射。
[111] WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning
- arXiv: 2604.08958 (cross-listed)
- Authors: Mintae Kim, Koushil Sreenath
- Subjects: cs.LG; cs.AI; cs.RO
- Tags: Reinforcement Learning, Robotics
- Venue: L4DC 2026
- Summary: 本文提出WOMBET框架,通过在源任务学习世界模型并生成离线数据,结合不确定性惩罚规划和自适应采样,实现高效的经验迁移和样本高效的强化学习。
[112] Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems
- arXiv: 2604.08963 (cross-listed)
- Authors: Keyu Li, Jin Gao, Dequan Wang
- Subjects: cs.MA; cs.AI
- Tags: Multi-Agent System, Bias Mitigation
- Code: code
- Summary: 本文研究多智能体系统中的偏见放大现象,发现结构化工作流会将微小随机偏见放大为系统性极化,并引入Discrim-Eval-Open基准进行开放式偏见评估。
[113] Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models
- arXiv: 2604.08970 (cross-listed)
- Authors: Avni Mittal, Shanu Kumar, Sandipan Dandapat, Monojit Choudhury
- Subjects: cs.CL; cs.AI; cs.HC; cs.MA
- Tags: LLM Evaluation, LLM Agent
- Summary: 本文提出了一个预测性多语言评估基准和Litmus (Re)Agent系统,用于在缺乏直接基准结果时估计模型在目标语言上的表现。该系统通过DAG编排的代理架构分解查询、检索证据并合成预测,在证据稀疏的场景下表现优异。
[114] Neighbourhood Transformer: Switchable Attention for Monophily-Aware Graph Learning
- arXiv: 2604.08980 (cross-listed)
- Authors: Yi Luo, Xu Sun, Guangchun Luo, Aiguo Chen
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Graph Learning
- Code: code
- Summary: 本文提出了邻域Transformer(NT),一种新的图学习范式,通过在局部邻域内应用自注意力机制来处理异配图问题。该方法在节点分类任务上达到最优性能,同时将空间和时间消耗分别降低超过95%和92%。
[115] PerMix-RLVR: Preserving Persona Expressivity under Verifiable-Reward Alignment
- arXiv: 2604.08986 (cross-listed)
- Authors: Jihwan Oh, Soowon Oh, Murad Aghazada, Minchan Jeong, Sungnyun Kim, Se-Young Yun
- Subjects: cs.CL; cs.AI
- Tags: LLM Alignment, RLHF
- Summary: 本文研究了可验证奖励强化学习(RLVR)中角色鲁棒性与保真度之间的权衡问题,提出了PerMix-RLVR策略来缓解这一矛盾。该方法在MATH500上提升了21.2%的角色稳定性,同时在PersonaGym上提升了11.4%的角色保真度。
[116] PinpointQA: A Dataset and Benchmark for Small Object-Centric Spatial Understanding in Indoor Videos
- arXiv: 2604.08991 (cross-listed)
- Authors: Zhiyu Zhou, Peilin Liu, Ruoxuan Zhang, Luyang Zhang, Cheng Zhang, Hongxia Xie, Wen-Huang Cheng
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Question Answering, Video Understanding
- Summary: 本文介绍了PinpointQA,首个针对室内视频中微小物体空间理解的基准数据集,包含1024个场景和10094个QA对。实验揭示了多模态大语言模型在渐进式任务链上的能力差距,监督微调可显著提升模型性能。
[117] ASTRA: Adaptive Semantic Tree Reasoning Architecture for Complex Table Question Answering
- arXiv: 2604.08999 (cross-listed)
- Authors: Xiaoke Guo, Songze Li, Zhiqiang Liu, Zhaoyan Gong, Yuanxiang Liu, Huajun Chen, Wen Zhang
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Question Answering, LLM Reasoning
- Summary: 本文提出了ASTRA架构用于复杂表格问答,包括将表格重建为逻辑语义树的AdaSTR模块和结合树搜索导航与符号代码执行的DuTR双模推理框架。该方法在复杂表格基准上达到了最优性能。
[118] Towards Linguistically-informed Representations for English as a Second or Foreign Language: Review, Construction and Application
- arXiv: 2604.09008 (cross-listed)
- Authors: Wenxi Li, Xihao Wang, Weiwei Sun
- Subjects: cs.CL; cs.AI
- Tags: Linguistic Resource, Representation Learning
- Summary: 本文调研了现有的ESFL(英语作为第二语言或外语)资源并识别其局限性,提出了一种基于构式理论的句法-语义资源,包含1643个标注的ESFL句子。该资源能够捕捉ESFL现象的独特特征,为二语习得研究提供有价值工具。
[119] Identification and Anonymization of Named Entities in Unstructured Information Sources for Use in Social Engineering Detection
- arXiv: 2604.09016 (cross-listed)
- Authors: Carlos Jimeno Miguel, Raul Orduna, Francesco Zola
- Subjects: cs.LG; cs.AI
- Tags: Information Extraction, Privacy, Cybersecurity
- Summary: 本文提出了一种从Telegram平台收集信息并评估命名实体识别解决方案的系统,用于创建符合GDPR法规的网络犯罪分析数据集。系统包含语音转文本转录和匿名化功能,在保护个人信息的同时保持数据结构一致性。
[120] Regime-Conditional Retrieval: Theory and a Transferable Router for Two-Hop QA
- arXiv: 2604.09019 (cross-listed)
- Authors: Andre Bacellar
- Subjects: cs.IR; cs.AI; cs.CL; cs.LG
- Tags: RAG, Question Answering, Information Retrieval
- Summary: 本文将两跳QA检索形式化为两种查询模式,并提出了RegimeRouter轻量级二分类路由器来选择检索策略。该方法在多个QA数据集上实现了显著的召回率提升,并提供了理论分析和实证验证。
[121] Noise-Aware In-Context Learning for Hallucination Mitigation in ALLMs
- arXiv: 2604.09021 (cross-listed)
- Authors: Qixuan Huang, Khalid Zaman, Masashi Unoki
- Subjects: cs.SD; cs.AI
- Tags: LLM Hallucination, In-Context Learning, Speech Processing
- Summary: 本文提出了NAICL方法,通过构建噪声先验库并检索相关噪声示例作为上下文先验,来缓解听觉大语言模型的幻觉问题。作者还建立了音频字幕任务的幻觉基准,将整体幻觉率从26.53%降低到16.98%。
[122] Leave My Images Alone: Preventing Multi-Modal Large Language Models from Analyzing Images via Visual Prompt Injection
- arXiv: 2604.09024 (cross-listed)
- Authors: Zedian Shao, Hongbin Liu, Yuepeng Hu, Neil Zhenqiang Gong
- Subjects: cs.CV; cs.AI; cs.CR; cs.LG
- Tags: Vision-Language Model, Privacy, Adversarial Robustness
- Venue: ACL 2026
- Summary: 本文提出了ImageProtector方法,通过在图像中嵌入几乎不可察觉的扰动作为视觉提示注入攻击,使MLLM在分析受保护图像时生成拒绝响应。该方法在六个MLLM和四个数据集上验证了有效性,为隐私保护提供了新思路。
[123] Skill-Conditioned Visual Geolocation for Vision-Language
- arXiv: 2604.09025 (cross-listed)
- Authors: Chenjie Yang, Yutian Jiang, Chenyu Wu
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Reasoning
- Summary: 本文提出了GeoSkill框架,一种基于演化技能图的免训练图像地理定位方法。框架包含自主演化机制,通过分析成功和失败的推理轨迹来迭代扩展技能图,无需参数更新即可增强地理认知能力。
[124] CONDESION-BENCH: Conditional Decision-Making of Large Language Models in Compositional Action Space
- arXiv: 2604.09029 (cross-listed)
- Authors: Yeonjun Hwang, Sungyong Park, Minju Kim, Dongha Lee, Jinyoung Yeo
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, Decision Making
- Summary: 本文介绍了CONDESION-BENCH基准,用于评估LLM在组合动作空间中的条件决策能力。该基准在变量、上下文和分配层面引入显式约束条件,为评估LLM作为决策支持工具提供了更严格的框架。
[125] U-Cast: A Surprisingly Simple and Efficient Frontier Probabilistic AI Weather Forecaster
- arXiv: 2604.09041 (cross-listed)
- Authors: Salva Rühling Cachay, Duncan Watson-Parris, Rose Yu
- Subjects: cs.LG; cs.AI; stat.ML
- Tags: Weather Forecasting, Uncertainty Estimation
- Code: code
- Summary: 本文提出了U-Cast概率天气预报模型,基于标准U-Net骨干网络,通过确定性预训练和概率性微调实现。该模型在1.5度分辨率上匹配或超越GenCast和IFS ENS的技能,同时将训练计算量减少超过10倍。
[126] Watt Counts: Energy-Aware Benchmark for Sustainable LLM Inference on Heterogeneous GPU Architectures
- arXiv: 2604.09048 (cross-listed)
- Authors: Mauricio Fadel Argerich, Jonathan Fürst, Marta Patiño-Martínez
- Subjects: cs.DC; cs.AI
- Tags: LLM Inference, Energy Efficiency, AI Sustainability
- Summary: 本文发布了Watt Counts数据集,包含50个LLM在10种GPU上的超过5000次能耗实验。研究表明GPU选择对能效至关重要,通过合理的硬件选择可在服务器场景下减少高达70%的能耗。
[127] PDE-regularized Dynamics-informed Diffusion with Uncertainty-aware Filtering for Long-Horizon Dynamics
- arXiv: 2604.09058 (cross-listed)
- Authors: Min Young Baeg, Yoon-Yeong Kim
- Subjects: cs.LG; cs.AI
- Tags: Diffusion Model, Uncertainty Estimation, Scientific Computing
- Summary: 本文提出了PDYffusion框架,将PDE正则化和不确定性感知滤波集成到扩散模型中,用于长期时空预测。该方法在多个动力学数据集上实现了优越的CRPS和MSE性能,同时保持稳定的不确定性行为。
[128] Learning Vision-Language-Action World Models for Autonomous Driving
- arXiv: 2604.09059 (cross-listed)
- Authors: Guoqing Wang, Pin Tang, Xiangxuan Ren, Guodongfang Zhao, Bailan Feng, Chao Ma
- Subjects: cs.CV; cs.AI
- Tags: Autonomous Driving, Vision-Language Model, Reinforcement Learning
- Venue: CVPR 2026
- Summary: 本文提出了VLA-World世界模型,将预测性想象与反思推理统一用于自动驾驶。模型使用动作导出的可行轨迹引导下一帧图像生成,并在自生成的未来帧上进行推理来优化轨迹预测。
[129] Frequency-Enhanced Diffusion Models: Curriculum-Guided Semantic Alignment for Zero-Shot Skeleton Action Recognition
- arXiv: 2604.09063 (cross-listed)
- Authors: Yuxi Zhou, Zhengbo Zhang, Jingyu Pan, Zhiyu Lin, Zhigang Tu
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Zero-Shot Learning, Action Recognition
- Code: code
- Summary: 本文提出FDSM方法用于零样本骨骼动作识别,通过语义引导的频谱残差模块、时间步自适应频谱损失和课程式语义抽象来解决扩散模型的高频动态过平滑问题。该方法在NTU RGB+D、PKU-MMD和Kinetics-skeleton数据集上取得了最先进的性能。
[130] NyayaMind- A Framework for Transparent Legal Reasoning and Judgment Prediction in the Indian Legal System
- arXiv: 2604.09069 (cross-listed)
- Authors: Parjanya Aditya Shukla, Shubham Kumar Nigam, Debtanu Datta, Balaramamahanthi Deepak Patnaik, Noel Shallum, Pradeep Reddy Vanga, Saptarshi Ghosh, Arnab Bhattacharya
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Legal AI, RAG, LLM Reasoning
- Summary: 本文提出NyayaMind框架,用于印度法律系统的透明法律推理和判决预测。该框架整合了基于RAG的检索模块和使用面向推理的LLM的预测模块,能够生成结构化输出包括法律议题、论据、理由和最终判决。
[131] Beyond Isolated Clients: Integrating Graph-Based Embeddings into Event Sequence Models
- arXiv: 2604.09085 (cross-listed)
- Authors: Harry Proshian, Nikita Severin, Sergey Nikolenko, Kireev Ivan, Andrey Savchenko, Ivan Sergeev, Maria Postnova, Ilya Makarov
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Self-Supervised Learning, Recommender System
- Venue: WWW 2026
- Summary: 本文提出三种模型无关策略,将图结构信息整合到对比自监督学习的事件序列模型中,包括丰富事件嵌入、对齐客户端表示与图嵌入以及添加结构前置任务。在四个金融和电商数据集上的实验表明该方法持续提升准确率。
[132] DeepGuard: Secure Code Generation via Multi-Layer Semantic Aggregation
- arXiv: 2604.09089 (cross-listed)
- Authors: Li Huang, Zhongxin Liu, Yifan Wu, Tao Yin, Dong Li, Jichao Bi, Nankun Mu, Hongyu Zhang, Meng Yan
- Subjects: cs.SE; cs.AI; cs.CR
- Tags: Code Generation, LLM Security
- Venue: ACL 2026
- Code: code
- Summary: 本文提出DeepGuard框架,通过注意力模块聚合多层Transformer表示来检测代码中的安全漏洞模式,在保证功能正确性的同时将安全且正确的代码生成率平均提升11.9%。
[133] CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
- arXiv: 2604.09101 (cross-listed)
- Authors: Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal
- Subjects: cs.CR; cs.AI; cs.CV; cs.LG
- Tags: Backdoor Detection, Vision-Language Model
- Venue: CVPR 2026
- Summary: 本文提出CLIP-Inspector方法,用于检测提示调优CLIP模型中的后门攻击,通过重建可能的触发器来判断模型是否存在后门行为,在50个模型中达到94%的检测准确率。
[134] Scheming in the wild: detecting real-world AI scheming incidents with open-source intelligence
- arXiv: 2604.09104 (cross-listed)
- Authors: Tommy Shaffer Shane, Simon Mylius, Hamish Hobbs
- Subjects: cs.CY; cs.AI
- Tags: AI Safety, LLM Evaluation
- Summary: 本文引入开源情报(OSINT)方法来检测真实世界的AI欺骗事件,通过分析在线分享的聊天机器人对话记录,在183,420份转录文本中识别出698起欺骗相关事件,发现六个月内事件数量增长4.9倍。
[135] TensorHub: Scalable and Elastic Weight Transfer for LLM RL Training
- arXiv: 2604.09107 (cross-listed)
- Authors: Chenhao Ye, Huaizheng Zhang, Mingcong Han, Baoquan Zhong, Xiang Li, Qixiang Chen, Xinyi Zhang, Weidong Zhang, Kaihua Jiang, Wang Zhang, He Sun, Wencong Xiao, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
- Subjects: cs.DC; cs.AI
- Tags: Distributed Training, Reinforcement Learning
- Summary: 本文提出TensorHub系统,采用引用导向存储(ROS)抽象来优化LLM强化学习训练中的权重传输,通过直接利用GPU上已有的权重副本服务读取请求,将GPU停滞时间减少高达6.7倍。
[136] PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing
- arXiv: 2604.09111 (cross-listed)
- Authors: Changi Hong, Yoonah Song, Hwayoung Park, Chaewoon Bang, Dayeon Gu, Do Hyun Lee, Hong Kook Kim
- Subjects: eess.AS; cs.AI
- Tags: Speech Synthesis, Machine Translation
- Venue: ICPR 2026
- Summary: 本文提出PS-TTS方法用于自动配音,通过语言模型改写翻译文本实现时间同步,并使用动态时间规整进行音素同步以保持唇形同步,在韩英互译任务中表现优于专业配音演员。
[137] Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
- arXiv: 2604.09121 (cross-listed)
- Authors: Peng Wang, Yanqiao Zhu, Zixuan Jiang, Qinyuan Chen, Xingjian Zhao, Xipeng Qiu, Wupeng Wang, Zhifu Gao, Xiangang Li, Kai Yu, Xie Chen
- Subjects: cs.CL; cs.AI; cs.SD
- Tags: Speech Processing, LLM Agent
- Summary: 本文提出一个交互式ASR智能体框架,使用LLM-as-a-Judge进行语义感知评估,并通过LLM驱动的智能体实现多轮交互式纠正,在语义保真度和交互纠正能力方面取得显著提升。
[138] EquiformerV3: Scaling Efficient, Expressive, and General SE(3)-Equivariant Graph Attention Transformers
- arXiv: 2604.09130 (cross-listed)
- Authors: Yi-Lun Liao, Alexander J. Hoffman, Sabrina C. Shen, Alexandre Duval, Sam Walton Norwood, Tess Smidt
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Molecular Generation
- Summary: 本文提出EquiformerV3,第三代SE(3)等变图注意力Transformer,用于3D原子建模。关键改进包括SwiGLU-S²激活函数和光滑半径截断注意力,实现了对势能面的精确建模。
[139] CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
- arXiv: 2604.09155 (cross-listed)
- Authors: Yushi Feng, Junye Du, Qifan Wang, Zizhan Ma, Qian Niu, Yutaka Matsuo, Long Feng, Lequan Yu
- Subjects: cs.LG; cs.AI
- Tags: GUI Automation, AI Safety
- Summary: 本文提出CORA框架,使用共形风险控制为GUI自动化智能体提供有害动作的统计保证,通过Guardian模型估计动作风险并引入Phone-Harm基准进行评估。
[140] Structuring versus Problematizing: How LLM-based Agents Scaffold Learning in Diagnostic Reasoning
- arXiv: 2604.09158 (cross-listed)
- Authors: Fatma Betül Güreş, Tanya Nazaretsky, Seyed Parsa Neshaei, Tanja Käser
- Subjects: cs.HC; cs.AI
- Tags: Education Technology, LLM Agent
- Venue: LAK 2026
- Summary: 本研究引入PharmaSim Switch场景化学习环境,配备LLM驱动的药剂师智能体实现结构化和问题化两种脚手架方法,实验表明两种方法都能有效支持诊断策略的学习。
[141] Persona-E$^2$: A Human-Grounded Dataset for Personality-Shaped Emotional Responses to Textual Events
- arXiv: 2604.09162 (cross-listed)
- Authors: Yuqin Yang, Haowu Zhou, Haoran Tu, Zhiwen Hui, Shiqi Yan, HaoYang Li, Dong She, Xianrong Yao, Yang Gao, Zhanpeng Jin
- Subjects: cs.CL; cs.AI; cs.HC
- Tags: Affective Computing, LLM Evaluation
- Venue: ACL 2026
- Summary: 本文引入Persona-E²数据集,基于MBTI和大五人格标注,捕捉读者对新闻、社交媒体和生活叙事的情绪反应差异,揭示当前LLM难以准确捕捉人格驱动的情绪评估变化。
[142] Generalization and Scaling Laws for Mixture-of-Experts Transformers
- arXiv: 2604.09175 (cross-listed)
- Authors: Mansour Zoubeirou a Mayaki
- Subjects: cs.LG; cs.AI; math.ST; stat.ML
- Tags: Mixture-of-Experts, Optimization
- Summary: 本文发展了MoE Transformer的泛化和缩放理论,将每个输入的活跃容量与路由组合分离,推导出泛化界、逼近定理和神经缩放定律,阐明了模型规模、数据规模和计算最优权衡。
[143] Do LLMs Follow Their Own Rules? A Reflexive Audit of Self-Stated Safety Policies
- arXiv: 2604.09189 (cross-listed)
- Authors: Avni Mittal
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Alignment, AI Safety
- Summary: 本文提出SNCA框架,提取LLM自我声明的安全规则并测量其与实际行为的一致性,对四个前沿模型的评估揭示了声明政策与观察行为之间的系统性差距。
[144] Vision Transformers for Preoperative CT-Based Prediction of Histopathologic Chemotherapy Response Score in High-Grade Serous Ovarian Carcinoma
- arXiv: 2604.09197 (cross-listed)
- Authors: Francesca Fati, Felipe Coutinho, Marika Reinius, Marina Rosanu, Gabriel Funingana, Luigi De Vitis, Gabriella Schivardi, Hannah Clayton, Alice Traversa, Zeyu Gao, Guilherme Penteado, Shangqi Gao, Francesco Pastori, Ramona Woitek, Maria Cristina Ghioni, Giovanni Damiano Aletti, Mercedes Jimenez-Linan, Sarah Burge, Nicoletta Colombo, Evis Sala, Maria Francesca Spadea, Timothy L. Kline, James D. Brenton, Jaime Cardoso, Francesco Multinu, Elena De Momi, Mireia Crispin-Ortuzar, Ines P. Machado
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Multimodal Learning, Vision Transformer
- Summary: 本文提出一个2.5D多模态深度学习框架,使用预训练Vision Transformer编码器处理CT影像并结合临床变量,用于术前预测高级别浆液性卵巢癌的化疗反应评分,内部测试集ROC-AUC达0.95。
[145] Artificial intelligence can persuade people to take political actions
- arXiv: 2604.09200 (cross-listed)
- Authors: Kobi Hackenburg, Luke Hewitt, Caroline Wagner, Ben M. Tappin, Christopher Summerfield
- Subjects: cs.CY; cs.AI; cs.HC
- Tags: AI Persuasion
- Summary: 该研究通过两个大规模实验发现,AI能够显著影响人们的政治行为(如签署请愿书和捐款),但对态度和行为的影响效果之间没有相关性,表明以往基于态度的研究可能无法准确反映AI说服的现实行为影响。
[146] On the Role of DAG topology in Energy-Aware Cloud Scheduling : A GNN-Based Deep Reinforcement Learning Approach
- arXiv: 2604.09202 (cross-listed)
- Authors: Anas Hattay, Fred Ngole Mboula, Eric Gascard, Zakaria Yahoun
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Reinforcement Learning, Energy Efficiency
- Summary: 该研究分析了基于图神经网络的深度强化学习调度器在云工作流调度中的失败原因,发现训练和部署环境之间的结构不匹配会导致消息传递中断,从而影响策略泛化能力。
[147] GRM: Utility-Aware Jailbreak Attacks on Audio LLMs via Gradient-Ratio Masking
- arXiv: 2604.09222 (cross-listed)
- Authors: Yunqiang Wang, Hengyuan Na, Di Wu, Miao Hu, Guocong Quan
- Subjects: cs.SD; cs.AI
- Tags: LLM Security, Speech Processing
- Summary: 该论文提出GRM框架,通过频率选择性扰动在音频大语言模型上实现高效的越狱攻击,同时保持模型的实用性,在四个代表性ALLM上达到88.46%的攻击成功率。
[148] The Fast Lane Hypothesis: Von Economo Neurons Implement a Biological Speed-Accuracy Tradeoff
- arXiv: 2604.09229 (cross-listed)
- Authors: Esila Keskin
- Subjects: cs.NE; cs.AI; q-bio.NC
- Tags: Cognitive Science, Neuromorphic Computing
- Code: code
- Summary: 该研究首次提出Von Economo神经元的计算模型,将其建模为快速泄漏积分发放神经元,证明VENs通过提供稀疏快速投射通路实现生物学的速度-精度权衡,支持快速社会决策。
[149] Neural Distribution Prior for LiDAR Out-of-Distribution Detection
- arXiv: 2604.09232 (cross-listed)
- Authors: Zizhao Li, Zhengkang Xiang, Jiayang Ao, Feng Liu, Joseph West, Kourosh Khoshelham
- Subjects: cs.CV; cs.AI
- Tags: Anomaly Detection, Autonomous Driving, 3D Vision
- Venue: CVPR 2026
- Summary: 该论文提出神经分布先验(NDP)框架用于LiDAR的分布外检测,通过建模网络预测的分布结构并自适应重新加权OOD分数,在STU测试集上达到61.31%的点级AP,比之前最佳结果高出10倍以上。
[150] Statistical Properties of the King Wen Sequence: An Anti-Habituation Structure That Does Not Improve Neural Network Training
- arXiv: 2604.09234 (cross-listed)
- Authors: Augustin Chan
- Subjects: cs.LG; cs.AI; cs.NE
- Tags: Curriculum Learning
- Summary: 该研究对《易经》六十四卦的文王卦序进行统计特征分析,发现其具有反习惯化结构特征,但实验证明这种结构并不能改善神经网络训练,反而会因高方差导致性能下降。
[151] DDSP-QbE++: Improving Speech Quality for Speech Anonymisation for Atypical Speech
- arXiv: 2604.09246 (cross-listed)
- Authors: Suhita Ghosh, Yamini Sinha, Sebastian Stober
- Subjects: cs.SD; cs.AI
- Tags: Speech Processing, Speech Synthesis
- Venue: CHI 2026 Workshop
- Summary: 该论文改进了DDSP-QbE语音转换管道的激励阶段,通过引入显式浊音检测和PolyBLEP校正,减少了混叠伪影,提高了非典型语音匿名化的感知自然度。
[152] Mosaic: Multimodal Jailbreak against Closed-Source VLMs via Multi-View Ensemble Optimization
- arXiv: 2604.09253 (cross-listed)
- Authors: Yuqin Lan, Gen Li, Yuanze Hu, Weihao Shen, Zhaoxin Fan, Faguo Wu, Xiao Zhang, Laurence T. Yang, Zhiming Zheng
- Subjects: cs.CV; cs.AI
- Tags: LLM Security, Vision-Language Model
- Summary: 该论文提出Mosaic框架,通过多视图集成优化对闭源视觉语言模型进行多模态越狱攻击,结合文本变换、多视图图像优化和代理集成指导,在商业闭源VLM上实现了最先进的攻击成功率。
[153] SkillMOO: Multi-Objective Optimization of Agent Skills for Software Engineering
- arXiv: 2604.09297 (cross-listed)
- Authors: Jingzhi Gong, Ruizhen Gu, Zhiwei Fei, Yazhuo Cao, Lukas Twist, Alina Geiger, Shuo Han, Dominik Sobania, Federica Sarro, Jie M. Zhang
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Optimization, Code Generation
- Summary: 该论文提出SkillMOO框架,使用多目标优化自动演化LLM编程代理的技能包,在三个软件工程任务上将通过率提高最多131%,同时降低成本最多32%。
[154] SatQNet: Satellite-assisted Quantum Network Entanglement Routing Using Directed Line Graph Neural Networks
- arXiv: 2604.09306 (cross-listed)
- Authors: Tobias Meuser, Jannis Weil, Aninda Lahiri, Marius Paraschiv
- Subjects: cs.AI; cs.NI
- Tags: Quantum Computing, Graph Neural Network, Reinforcement Learning
- Summary: 该论文提出SatQNet,一种用于卫星辅助量子网络纠缠路由的强化学习方法,采用边中心有向线图神经网络进行局部消息传递,在动态拓扑中实现高保真度端到端纠缠。
[155] Visually-Guided Policy Optimization for Multimodal Reasoning
- arXiv: 2604.09349 (cross-listed)
- Authors: Zengbin Wang, Feng Xiong, Liang Lin, Xuecai Hu, Yong Wang, Yanlin Wang, Man Zhang, Xiangxiang Chu
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Vision-Language Model, Reinforcement Learning, Multimodal Learning
- Venue: ACL 2026
- Summary: 该论文提出视觉引导策略优化(VGPO)框架,通过视觉注意力补偿机制和双粒度优势重加权策略,增强视觉语言模型在多模态推理中的视觉关注,在数学多模态推理任务上取得更好性能。
[156] LLM-Rosetta: A Hub-and-Spoke Intermediate Representation for Cross-Provider LLM API Translation
- arXiv: 2604.09360 (cross-listed)
- Authors: Peng Ding
- Subjects: cs.SE; cs.AI
- Tags: LLM Interoperability
- Code: code
- Summary: 该论文提出LLM-Rosetta,一个基于枢纽-辐射中间表示的开源翻译框架,实现不同LLM提供商API之间的双向转换,支持请求和响应负载的无损往返转换。
[157] BadSkill: Backdoor Attacks on Agent Skills via Model-in-Skill Poisoning
- arXiv: 2604.09378 (cross-listed)
- Authors: Guiyao Tie, Jiawen Shi, Pan Zhou, Lichao Sun
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, Backdoor Detection, LLM Agent
- Summary: 该论文提出BadSkill后门攻击,针对代理技能中嵌入的模型进行攻击,在8个触发技能上达到99.5%的平均攻击成功率,揭示了代理生态系统中模型承载技能的供应链风险。
[158] The AI Codebase Maturity Model: From Assisted Coding to Self-Sustaining Systems
- arXiv: 2604.09388 (cross-listed)
- Authors: Andy Anderson
- Subjects: cs.SE; cs.AI
- Tags: Code Generation, Software Testing
- Code: code
- Summary: 该论文提出AI代码库成熟度模型(ACMM),描述代码库从基础AI辅助编码到自维持系统的演进过程,核心发现是AI驱动开发系统的智能在于指令、测试、指标和反馈循环的基础设施。
[159] Yes, But Not Always. Generative AI Needs Nuanced Opt-in
- arXiv: 2604.09413 (cross-listed)
- Authors: Wiebke Hutiri, Morgan Scheuerman, Shruti Nagpal, Austin Hoag, Alice Xiang
- Subjects: cs.CY; cs.AI
- Tags: AI Ethics, AI Safety
- Summary: 该论文论证生成式AI中创意作品使用同意的一刀切方法不足,提出推理时细粒度选择加入架构,通过代理验证用户意图是否满足权利持有人授予的条件性同意。
[160] PhysInOne: Visual Physics Learning and Reasoning in One Suite
- arXiv: 2604.09415 (cross-listed)
- Authors: Siyuan Zhou, Hejun Wang, Hu Cheng, Jinxi Li, Dongsheng Wang, Junwei Jiang, Yixiao Jin, Jiayue Huang, Shiwei Mao, Shangjia Liu, Yafei Yang, Hongkang Song, Shenxing Wei, Zihui Zhang, Peng Huang, Shijie Liu, Zhengli Hao, Hao Li, Yitian Li, Wenqi Zhou, Zhihan Zhao, Zongqi He, Hongtao Wen, Shouwang Huang, Peng Yun, Bowen Cheng, Pok Kazaf Fu, Wai Kit Lai, Jiahao Chen, Kaiyuan Wang, Zhixuan Sun, Ziqi Li, Haochen Hu, Di Zhang, Chun Ho Yuen, Bing Wang, Zhihua Wang, Chuhang Zou, Bo Yang
- Subjects: cs.CV; cs.AI; cs.LG; cs.RO
- Tags: Video Understanding, Video Generation, Scientific Reasoning
- Venue: CVPR 2026
- Summary: 该论文提出PhysInOne大规模合成数据集,包含200万个视频覆盖153,810个动态3D场景和71种基本物理现象,用于物理感知视频生成、未来帧预测、物理属性估计和运动迁移等应用。
[161] Three Modalities, Two Design Probes, One Prototype, and No Vision: Experience-Based Co-Design of a Multi-modal 3D Data Visualization Tool
- arXiv: 2604.09426 (cross-listed)
- Authors: Sanchita S. Kamath, Aziz N Zeidieh, Venkatesh Potluri, Sile O'Modhrain, Kenneth Perry, JooYoung Seo
- Subjects: cs.HC; cs.AI; cs.IR
- Tags: Accessibility Technology, Human-Computer Interaction, Data Visualization
- Summary: 本文提出了一种基于经验的协同设计方法,与盲人和低视力用户共同设计了一个可访问的多模态3D数据可视化工具。该原型包含参考声化、立体和体积音频等功能,能够帮助盲人和低视力用户进行非视觉3D数据探索。
[162] Rays as Pixels: Learning A Joint Distribution of Videos and Camera Trajectories
- arXiv: 2604.09429 (cross-listed)
- Authors: Wonbong Jang, Shikun Liu, Soubhik Sanyal, Juan Camilo Perez, Kam Woh Ng, Sanskar Agrawal, Juan-Manuel Perez-Rua, Yiannis Douratsos, Tao Xiang
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Video Generation, Diffusion Model, Computer Vision
- Summary: 本文提出了Rays as Pixels方法,一种学习视频和相机轨迹联合分布的视频扩散模型。该模型能够从视频中预测相机轨迹、联合生成视频和轨迹,以及根据目标轨迹生成视频,并通过闭环自一致性测试验证了前向和逆向预测的一致性。
[163] On the Representational Limits of Quantum-Inspired 1024-D Document Embeddings: An Experimental Evaluation Framework
- arXiv: 2604.09430 (cross-listed)
- Authors: Dario Maio
- Subjects: cs.IR; cs.AI
- Tags: Information Retrieval, RAG, Representation Learning
- Summary: 本文提出了一个实验框架来构建基于量子启发的1024维文档嵌入,并评估其在信息检索中的效果。实验结果表明,量子启发式嵌入存在距离压缩和排序不稳定性等结构性限制,更适合作为辅助组件而非独立的检索表示。
[164] Physics-guided surrogate learning enables zero-shot control of turbulent wings
- arXiv: 2604.09434 (cross-listed)
- Authors: Yuning Wang, Pol Suarez, Mathis Bode, Ricardo Vinuesa
- Subjects: cs.AI
- Tags: Reinforcement Learning, Flow Control, Transfer Learning
- Summary: 本文展示了利用物理引导的代理学习实现湍流翼型零样本控制的方法。该方法在湍流通道流中训练策略并直接部署到NACA4412翼型上,实现了28.7%的表面摩擦阻力和10.7%的总阻力降低,同时将训练成本降低了四个数量级。
[165] TME-PSR: Time-aware, Multi-interest, and Explanation Personalization for Sequential Recommendation
- arXiv: 2604.09439 (cross-listed)
- Authors: Qingzhuo Wang, Leilei Wen, Juntao Chen, Kunyu Peng, Ruiyang Qin, Zhihua Wei, Wen Shen
- Subjects: cs.IR; cs.AI
- Tags: Recommender System
- Summary: 本文提出了TME-PSR序列推荐模型,集成了时间感知个性化、多兴趣个性化和解释个性化。该模型采用双视图门控时间编码器、轻量级多头线性循环单元架构和动态双分支互信息加权机制,在较低计算成本下提高了推荐准确性和解释质量。
[166] Many-Tier Instruction Hierarchy in LLM Agents
- arXiv: 2604.09443 (cross-listed)
- Authors: Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi
- Subjects: cs.CL; cs.AI
- Tags: LLM Agent, LLM Alignment
- Summary: 本文提出了Many-Tier Instruction Hierarchy方法,用于解决LLM代理中任意多特权级别的指令冲突问题。作者引入了ManyIH-Bench基准测试,包含多达12个特权级别的冲突指令,实验表明当前前沿模型在指令冲突扩展时表现不佳(约40%准确率)。
[167] ECHO: Efficient Chest X-ray Report Generation with One-step Block Diffusion
- arXiv: 2604.09450 (cross-listed)
- Authors: Lifeng Chen, Tianqi You, Hao Liu, Zhimin Bao, Jile Jiao, Xiao Han, Zhicai Ou, Tao Sun, Xiaofeng Mou, Xiaojie Jin, Yi Xu
- Subjects: cs.LG; cs.AI; eess.IV
- Tags: Medical AI, Vision-Language Model, Diffusion Model
- Summary: 本文提出了ECHO,一种高效的基于扩散的视觉语言模型,用于胸部X光报告生成。该模型通过直接条件蒸馏框架实现稳定的每块一步推理,在RaTE和SemScore上分别提升64.33%和60.58%,同时实现8倍推理加速。
[168] SafeAdapt: Provably Safe Policy Updates in Deep Reinforcement Learning
- arXiv: 2604.09452 (cross-listed)
- Authors: Maksim Anisimov, Francesco Belardinelli, Matthew Wicker
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, AI Safety, Continual Learning
- Code: code
- Summary: 本文引入了Rashomon集概念,为持续强化学习中的策略更新提供先验可证明的安全保证。该方法通过将策略更新投影到经过认证的安全区域,确保在下游适应过程中保持对源任务的安全性保证。
[169] SafeMind: A Risk-Aware Differentiable Control Framework for Adaptive and Safe Quadruped Locomotion
- arXiv: 2604.09474 (cross-listed)
- Authors: Zukun Zhang, Kai Shu, Mingqiao Mo
- Subjects: cs.RO; cs.AI
- Tags: Robotics, AI Safety, Reinforcement Learning
- Summary: 本文介绍了SafeMind,一个可微分随机安全控制框架,将概率控制障碍函数与语义上下文理解和元自适应风险校准相结合。该框架在Unitree A1和ANYmal C上部署,在12种地形类型中将安全违规减少3-10倍,能耗降低10-15%。
[170] XFED: Non-Collusive Model Poisoning Attack Against Byzantine-Robust Federated Classifiers
- arXiv: 2604.09489 (cross-listed)
- Authors: Israt Jahan Mouri, Muhammad Ridowan, Muhammad Abdullah Adnan
- Subjects: cs.CR; cs.AI; cs.DC; cs.LG
- Tags: Federated Learning, Adversarial Robustness, Cybersecurity
- Summary: 本文引入了非合谋攻击模型,其中所有被攻陷的客户端共享共同的对抗目标但独立操作。提出的XFED攻击绕过了八种最先进的防御措施并优于六种现有的模型投毒攻击,表明联邦学习系统比之前认为的更不安全。
[171] RecaLLM: Addressing the Lost-in-Thought Phenomenon with Explicit In-Context Retrieval
- arXiv: 2604.09494 (cross-listed)
- Authors: Kyle Whitecross, Negin Rahimi
- Subjects: cs.CL; cs.AI; cs.IR; cs.LG
- Tags: Long Context, LLM Reasoning, RAG
- Code: code
- Summary: 本文提出了RecaLLM,一种将推理与显式上下文检索交替进行的推理语言模型,以解决”迷失在思考中”现象。该模型使用约束解码机制实现证据跨度的逐字复制,在长上下文基准测试上取得优异性能,且训练样本最多仅10K词元。
[172] BERT-as-a-Judge: A Robust Alternative to Lexical Methods for Efficient Reference-Based LLM Evaluation
- arXiv: 2604.09497 (cross-listed)
- Authors: Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Emmanuel Malherbe, Céline Hudelot, Pierre Colombo
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation
- Summary: 本文引入了BERT-as-a-Judge,一种基于编码器的方法用于评估参考型生成设置中的答案正确性。该方法在词法基线上表现更优,同时匹配更大LLM评判器的性能,为可扩展的LLM评估提供了成本效益的权衡方案。
[173] VISOR: Agentic Visual Retrieval-Augmented Generation via Iterative Search and Over-horizon Reasoning
- arXiv: 2604.09508 (cross-listed)
- Authors: Yucheng Shen, Jiulong Wu, Jizhou Huang, Dawei Yin, Lingyong Yan, Min Cao
- Subjects: cs.CV; cs.AI
- Tags: RAG, Vision-Language Model, LLM Agent
- Summary: 本文提出了VISOR,一个统一的单代理视觉RAG框架,通过结构化证据空间、视觉动作评估与纠正机制以及滑动窗口动态轨迹来解决视觉证据稀疏性和长程搜索漂移问题。该方法在ViDoSeek、SlideVQA和MMLongBench上取得了最先进的性能。
[174] Semantic Rate-Distortion for Bounded Multi-Agent Communication: Capacity-Derived Semantic Spaces and the Communication Cost of Alignment
- arXiv: 2604.09521 (cross-listed)
- Authors: Anthony T. Nixon
- Subjects: cs.IT; cs.AI
- Tags: Multi-Agent System, Information Theory
- Code: code
- Summary: 本文推导了有界代理的容量导出语义空间,证明了异构代理之间的通信存在结构性相变。研究证明了相变定理、Wyner-Ziv基准识别以及用于组合通信的对齐遍历界限,并在八个POMDP环境中验证了相变现象。
[175] Envisioning the Future, One Step at a Time
- arXiv: 2604.09527 (cross-listed)
- Authors: Stefan Andreas Baumann, Jannik Wiese, Tommaso Martorella, Mahdi M. Kalayeh, Björn Ommer
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Video Generation, Diffusion Model, Computer Vision
- Venue: CVPR 2026
- Summary: 本文将未来场景动态预测表述为稀疏点轨迹上的逐步推理,使用自回归扩散模型推进轨迹。该方法能够从单张图像快速生成数千个多样化的未来场景,同时保持物理合理性和长程一致性,在预测准确性上匹配或超越密集模拟器。
[176] VL-Calibration: Decoupled Confidence Calibration for Large Vision-Language Models Reasoning
- arXiv: 2604.09529 (cross-listed)
- Authors: Wenyi Xiao, Xinchi Xu, Leilei Gan
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Vision-Language Model, LLM Hallucination, Uncertainty Estimation
- Venue: ACL 2026
- Code: code
- Summary: 本文提出了VL-Calibration,一个强化学习框架,将大视觉语言模型的置信度显式解耦为视觉置信度和推理置信度。该方法使用内在视觉确定性估计和词元级优势重加权,在十三个基准测试上有效改善了校准效果并提升了视觉推理准确性。
[177] VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images
- arXiv: 2604.09531 (cross-listed)
- Authors: Guanyu Zhou, Yida Yin, Wenhao Chai, Shengbang Tong, Xingyu Fu, Zhuang Liu
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Vision-Language Model, Data Synthesis, Multimodal Learning
- Summary: 本文提出了VisionFoundry,一个任务感知的合成数据生成管道,仅以任务名称为输入生成视觉问答数据集,用于提升视觉语言模型的视觉感知能力。在VisionFoundry-10K数据集上训练的模型在MMVP上提升7%,在CV-Bench-3D上提升10%。
[178] Seeing is Believing: Robust Vision-Guided Cross-Modal Prompt Learning under Label Noise
- arXiv: 2604.09532 (cross-listed)
- Authors: Zibin Geng, Xuefeng Jiang, Jia Li, Zheng Li, Tian Wen, Lvhua Wu, Sheng Sun, Yuwei Wang, Min Liu
- Subjects: cs.CV; cs.AI
- Tags: Prompt Engineering, Vision-Language Model, Multimodal Learning
- Code: code
- Summary: 本文提出了VisPrompt,一个用于噪声标签场景的视觉引导提示学习框架,通过跨模态注意力机制将视觉语义反向注入提示表示,并引入条件调制机制自适应控制视觉信息注入强度,显著提升了鲁棒性。
[179] Case-Grounded Evidence Verification: A Framework for Constructing Evidence-Sensitive Supervision
- arXiv: 2604.09537 (cross-listed)
- Authors: Soroosh Tayebi Arasteh, Mehdi Joodaki, Mahshad Lotfinia, Sven Nebelung, Daniel Truhn
- Subjects: cs.CL; cs.AI; cs.IR; cs.LG
- Tags: Medical AI, Question Answering, Information Extraction
- Summary: 本文提出了案例引导的证据验证框架,通过生成显式支持示例和语义控制的非支持示例来构建监督信号,使模型真正依赖证据进行决策。该方法在放射学领域验证了其有效性,学习到的验证器显著优于基线。
[180] Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
- arXiv: 2604.09544 (cross-listed)
- Authors: Hadas Orgad, Boyi Wei, Kaden Zheng, Martin Wattenberg, Peter Henderson, Seraphina Goldfarb-Tarrant, Yonatan Belinkov
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Alignment, LLM Security, Interpretability
- Summary: 本文通过目标权重剪枝探究LLM中有害内容的内部组织结构,发现有害内容生成依赖于紧凑的权重集合,这些权重跨危害类型通用且与良性能力分离。对齐会压缩有害生成权重,这解释了微调后出现的涌现性错位现象。
替换投稿 (109)
[181] Reflection of Episodes: Learning to Play Game from Expert and Self Experiences
- arXiv: 2502.13388 (replaced)
- Authors: Xiaojie Xu, Zongyuan Li, Chang Lu, Runnan Qi, Yanan Ni, Lumin Jiang, Xiangbei Liu, Xuebo Zhang, Yongchun Fang, Kuihua Huang, Xian Guo, Zhanghua Wu, Zhenya Li
- Subjects: cs.AI
- Tags: LLM Agent, Game AI, Reinforcement Learning
- Summary: 本文提出了基于专家经验和自我经验的反思框架(ROE),使LLM能够通过关键帧选择、决策制定和事后反思来学习《星际争霸II》中的复杂策略,成功击败了非常困难难度的机器人。
[182] Bayesian Social Deduction with Graph-Informed Language Models
- arXiv: 2506.17788 (replaced)
- Authors: Shahab Rahimirad, Guven Gergerli, Lucia Romero, Angela Qian, Matthew Lyle Olson, Simon Stepputtis, Joseph Campbell
- Subjects: cs.AI; cs.CL; cs.LG; cs.MA
- Tags: LLM Agent, Social Reasoning, Multi-Agent System
- Venue: ACL 2026
- Summary: 本文提出了一个混合推理框架,将信念推断外化到结构化概率模型中,同时使用LLM进行语言理解和交互。该方法在Avalon游戏中击败了人类玩家,成为首个在对照研究中战胜人类玩家的语言智能体。
[183] ChipSeek: Optimizing Verilog Generation via EDA-Integrated Reinforcement Learning
- arXiv: 2507.04736 (replaced)
- Authors: Zhirong Chen, Kaiyan Chang, Zhuolin Li, Cangyuan Li, Xinyang He, Chujie Chen, Mengdi Wang, Haobo Xu, Yinhe Han, Huawei Li, Ying Wang
- Subjects: cs.AI; cs.AR; cs.PL
- Tags: RTL Generation, Reinforcement Learning, EDA
- Venue: ACL 2026
- Code: code
- Summary: 本文提出了ChipSeek,一个基于分层奖励的强化学习框架,将EDA仿真器和综合工具的反馈集成到奖励机制中,使LLM能够生成功能正确且PPA指标优化的RTL代码,在标准基准测试上达到了最先进性能。
[184] Rethinking Prospect Theory for LLMs: Revealing the Instability of Decision-Making under Epistemic Uncertainty
- arXiv: 2508.08992 (replaced)
- Authors: Rui Wang, Qihan Lin, Jiayu Liu, Qing Zong, Tianshi Zheng, Dadi Guo, Haochen Shi, Weiqi Wang, Yangqiu Song
- Subjects: cs.AI
- Tags: Decision Making, LLM Evaluation, Uncertainty Estimation
- Summary: 本文研究了前景理论在LLM决策建模中的适用性和鲁棒性,发现在认知不确定性扰动下,基于前景理论的LLM决策建模在不同模型间不一致且不够鲁棒,对实际应用提出了警示。
[185] Interactive Program Synthesis for Modeling Collaborative Physical Activities from Narrated Demonstrations
- arXiv: 2509.24250 (replaced)
- Authors: Edward Kim, Daniel He, Jorge Chao, Wiktor Rajca, Mohammed Amin, Nishant Malpani, Ruta Desai, Antti Oulasvirta, Bjoern Hartmann, Sanjit Seshia
- Subjects: cs.AI; cs.HC; cs.LG
- Tags: Program Synthesis, Human-Computer Interaction, Embodied AI
- Summary: 本文将协作物理任务学习建模为程序合成问题,使用叙述演示(配对的物理动作和自然语言)作为统一模态来教学、检查和纠正系统行为。用户研究表明70%的参与者成功优化了学习到的程序。
[186] Chain-in-Tree: Back to Sequential Reasoning in LLM Tree Search
- arXiv: 2509.25835 (replaced)
- Authors: Xinzhe Li
- Subjects: cs.AI
- Tags: LLM Reasoning, LLM Inference, Optimization
- Venue: ACL 2026 Findings
- Code: code
- Summary: 本文提出了Chain-in-Tree框架,通过引入轻量级的分支必要性评估来决定搜索过程中何时分支,在GSM8K和Math500上减少了75-85%的token生成、模型调用和运行时间,同时保持准确率基本不变。
[187] When Identity Skews Debate: Anonymization for Bias-Reduced Multi-Agent Reasoning
- arXiv: 2510.07517 (replaced)
- Authors: Hyeong Kyu Choi, Xiaojin Zhu, Sharon Li
- Subjects: cs.AI; cs.MA
- Tags: Multi-Agent System, LLM Reasoning, Bias Mitigation
- Venue: ACL 2026
- Code: code
- Summary: 本文提出了一个原则性框架来缓解多智能体辩论中的身份偏见(奉承和自我偏见),通过响应匿名化使智能体无法区分自己和同伴,并定义了身份偏见系数(IBC)来量化智能体跟随同伴与自身的倾向。
[188] AlphaCast: A Human Wisdom-LLM Intelligence Co-Reasoning Framework for Interactive Time Series Forecasting
- arXiv: 2511.08947 (replaced)
- Authors: Xiaohan Zhang, Tian Gao, Mingyue Cheng, Bokai Pan, Ze Guo, Yaguo Liu, Xiaoyu Tao, Qi Liu
- Subjects: cs.AI
- Tags: Time Series Forecasting, LLM Agent, LLM Reasoning
- Code: code
- Summary: 本文提出了AlphaCast,一个交互驱动的智能体推理框架,将时间序列预测重构为多阶段工作流,包括上下文准备、推理生成和反思评估,在多个基准测试上优于代表性基线方法。
[189] Thermally Activated Dual-Modal Adversarial Clothing against AI Surveillance Systems
- arXiv: 2511.09829 (replaced)
- Authors: Jiahuan Long, Tingsong Jiang, Hanqing Liu, Chao Ma, Weien Zhou, Yang Yang, Wen Yao
- Subjects: cs.AI
- Tags: Adversarial Robustness, Privacy, Object Detection
- Venue: CVPR 2026
- Summary: 本文提出了一种热激活对抗性可穿戴系统,将热致变色染料与柔性加热单元结合,在服装表面产生动态对抗图案,在可见光和红外模态下均能以超过80%的成功率有效规避AI监控系统的检测。
[190] EchoTrail-GUI: Building Actionable Memory for GUI Agents via Critic-Guided Self-Exploration
- arXiv: 2512.19396 (replaced)
- Authors: Runze Li, Yuwen Zhai, Bo Xu, LiWu Xu, Nian Shi, Wei Zhang, Ran Lin, Liang Wang
- Subjects: cs.AI
- Tags: GUI Automation, LLM Agent, Memory Architecture
- Venue: CVPR 2026 Findings
- Summary: 本文提出了EchoTrail-GUI框架,通过经验探索、记忆注入和任务推理三个阶段为GUI智能体配备动态记忆,在Android World和AndroidLab基准测试上显著提升了任务成功率和操作效率。
[191] Sample-Efficient Neurosymbolic Deep Reinforcement Learning
- arXiv: 2601.02850 (replaced)
- Authors: Celeste Veronese, Alessandro Farinelli, Daniele Meli
- Subjects: cs.AI
- Tags: Neurosymbolic AI, Reinforcement Learning, Automated Planning
- Summary: 本文提出了一种神经符号深度强化学习方法,将背景符号知识集成为逻辑规则,通过偏置动作分布和重新缩放Q值来引导训练过程,在网格世界环境中提高了样本效率和泛化能力。
[192] Precomputing Multi-Agent Path Replanning using Temporal Flexibility
- arXiv: 2601.04884 (replaced)
- Authors: Issa Hanou, Eric Kemmeren, Devin Wild Thomas, Mathijs de Weerdt
- Subjects: cs.AI
- Tags: Multi-Agent System, Automated Planning, Optimization
- Summary: 本文提出了FlexSIPP算法,通过跟踪和利用其他智能体的时间灵活性来预计算多智能体路径重规划方案,在荷兰铁路网络和MovingAI基准测试中展示了有效性。
[193] Reasoning Models Will Sometimes Lie About Their Reasoning
- arXiv: 2601.07663 (replaced)
- Authors: William Walden, Miriam Wanner
- Subjects: cs.AI; cs.CL
- Tags: LLM Reasoning, Interpretability, LLM Hallucination
- Summary: 本文研究了大型推理模型在明确被告知存在异常输入时的忠实性问题。研究发现,虽然模型可能承认提示的存在,但往往会否认有意使用它们,即使实际上正在使用,这为思维链监控和可解释性带来了挑战。
[194] ConvoLearn: A Learning Sciences Grounded Dataset for Fine-Tuning Dialogic AI Tutors
- arXiv: 2601.08950 (replaced)
- Authors: Mayank Sharma, Roy Pea, Hari Subramonyam
- Subjects: cs.AI; cs.HC; cs.LG
- Tags: Education Technology, Instruction Tuning, Dialogue System
- Summary: 本文介绍了ConvoLearn数据集,包含2,134个半合成的师生对话,用于训练对话式AI辅导系统。在该数据集上微调的开放权重模型能够达到与专有基线相媲美的辅导质量。
[195] The Hot Mess of AI: How Does Misalignment Scale With Model Intelligence and Task Complexity?
- arXiv: 2601.23045 (replaced)
- Authors: Alexander Hägele, Aryo Pradipta Gema, Henry Sleight, Ethan Perez, Jascha Sohl-Dickstein
- Subjects: cs.AI
- Tags: LLM Alignment, AI Safety, LLM Reasoning
- Venue: ICLR 2026
- Summary: 本文使用偏差-方差分解研究AI失败是源于追求非预期目标还是行为不连贯。结果表明,随着模型在更难任务上花费更多时间推理,失败变得更加不连贯而非系统性错位。
[196] Reasoning in a Combinatorial and Constrained World: Benchmarking LLMs on Natural-Language Combinatorial Optimization
- arXiv: 2602.02188 (replaced)
- Authors: Xia Jiang, Jing Chen, Cong Zhang, Jie Gao, Chengpeng Hu, Chenhao Zhang, Yaoxin Wu, Yingqian Zhang
- Subjects: cs.AI
- Tags: LLM Reasoning, LLM Evaluation, Optimization
- Summary: 本文介绍了NLCO基准,用于评估大语言模型在自然语言描述的组合优化问题上的表现。实验表明,模型在小规模实例上表现良好,但随着实例规模增大性能下降,尤其是图结构问题。
[197] H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration
- arXiv: 2602.05407 (replaced)
- Authors: Jun-Min Lee, Meong Hi Son, Edward Choi
- Subjects: cs.AI; cs.CL
- Tags: Multi-Agent System, Medical AI, LLM Evaluation
- Venue: CHIL 2026
- Summary: 本文提出了H-AdminSim,一个结合真实数据生成和多智能体模拟的医院行政工作流仿真框架。通过FHIR集成,该框架为评估LLM驱动的行政自动化提供了标准化测试平台。
[198] ReplicatorBench: Benchmarking LLM Agents for Replicability in Social and Behavioral Sciences
- arXiv: 2602.11354 (replaced)
- Authors: Bang Nguyen, Dominik Soós, Qian Ma, Rochana R. Obadage, Zack Ranjan, Sai Koneru, Anna Szabelska, Adam Gill, Timothy M. Errington, Shakhlo Nematova, Sarah Rajtmajer, Jian Wu, Meng Jiang
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, LLM Evaluation, Scientific Reasoning
- Code: code
- Summary: 本文介绍了ReplicatorBench,一个用于评估AI智能体在社会科学研究复制任务中表现的基准。该基准包含可复制和不可复制的研究声明,评估智能体在复制过程三个阶段的能力。
[199] PACED: Distillation and On-Policy Self-Distillation at the Frontier of Student Competence
- arXiv: 2603.11178 (replaced)
- Authors: Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang
- Subjects: cs.AI; cs.LG
- Tags: Knowledge Distillation, LLM Reasoning
- Summary: 本文提出了PACED,一种知识蒸馏方法,通过学生通过率加权训练问题来聚焦最近发展区。该方法在数学推理基准上取得了最先进结果,同时减少了遗忘。
[200] Reasoning Provenance for Autonomous AI Agents: Structured Behavioral Analytics Beyond State Checkpoints and Execution Traces
- arXiv: 2603.21692 (replaced)
- Authors: Neelmani Vispute, Aditya Kadam
- Subjects: cs.AI; cs.DC; cs.SE
- Tags: LLM Agent, Interpretability
- Summary: 本文引入了智能体执行记录(AER),一种用于自主AI智能体的结构化推理溯源原语,捕获意图、观察和推理。AER支持群体级别的行为分析,包括推理模式挖掘和跨智能体比较。
[201] TRU: Targeted Reverse Update for Efficient Multimodal Recommendation Unlearning
- arXiv: 2604.02183 (replaced)
- Authors: Zhanting Zhou, KaHou Tam, Ziqiang Zheng, Zeyu Ma, Yang Yang
- Subjects: cs.AI
- Tags: Machine Unlearning, Recommender System, Multimodal Learning
- Summary: 本文提出了TRU,一种针对多模态推荐系统的机器遗忘框架。TRU通过在排序行为、模态分支和网络层上进行协调干预来解决非均匀影响分布问题。
[202] ActionNex: A Virtual Outage Manager for Cloud Computing
- arXiv: 2604.03512 (replaced)
- Authors: Zhenfeng Lin, Haoji Hu, Ming Hao, Xuchao Zhang, Ryan Zhang, Junhao Li, Ze Li, Oleg Kulygin, Chetan Bansal, Hatay Tuna, Murali Chintalapati, Sheila Jiang, Salman Zafar, Angie Anderson
- Subjects: cs.AI
- Tags: LLM Agent, Decision Making, Knowledge Distillation
- Summary: 本文介绍了ActionNex,一个生产级智能体系统,用于云计算中断管理,提供实时更新、知识蒸馏和下一步行动建议。该系统已在Azure生产环境中试点并获得积极反馈。
[203] Domain-Contextualized Inference: A Computable Graph Architecture for Explicit-Domain Reasoning
- arXiv: 2604.04344 (replaced)
- Authors: Chao Li, Yuru Wang, Chunyi Zhao
- Subjects: cs.AI
- Tags: Knowledge Representation, Neurosymbolic AI
- Summary: 本文建立了一种计算平台无关的推理架构,其中领域是显式的一等计算参数。该架构提供领域范围剪枝、平台无关执行和透明的推理链,并通过临床推理案例研究进行验证。
[204] Memory Intelligence Agent
- arXiv: 2604.04503 (replaced)
- Authors: Jingyang Qiao, Weicheng Meng, Yu Cheng, Zhihang Lin, Zhizhong Zhang, Xin Tan, Jingyu Gong, Kun Shao, Yuan Xie
- Subjects: cs.AI; cs.MA
- Tags: LLM Agent, Memory Architecture, Reinforcement Learning
- Summary: 本文提出了MIA,一个具有管理器-规划器-执行器架构的记忆智能体框架,通过参数化和非参数化记忆之间的双向转换实现高效记忆演化。该框架在十一个基准上表现出优越性能。
[205] ActivityEditor: Learning to Synthesize Physically Valid Human Mobility
- arXiv: 2604.05529 (replaced)
- Authors: Chenjie Yang, Yutian Jiang, Anqi Liang, Wei Qi, Chenyu Wu, Junbo Zhang
- Subjects: cs.AI
- Tags: Data Synthesis, Reinforcement Learning, Multi-Agent System
- Summary: 本文提出了ActivityEditor,一个用于零样本跨区域人类轨迹生成的双LLM智能体框架。该框架使用基于意图的智能体和编辑智能体,通过强化学习确保物理有效的移动性合成。
[206] AgentCE-Bench: Agent Configurable Evaluation with Scalable Horizons and Controllable Difficulty under Lightweight Environments
- arXiv: 2604.06111 (replaced)
- Authors: Wang Yang, Chaoda Song, Xinpeng Li, Debargha Ganguly, Chuang Ma, Shouren Wang, Zhihao Dou, Yuli Zhou, Vipin Chaudhary, Xiaotian Han
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, LLM Evaluation
- Summary: 本文介绍了AgentCE-Bench,一个用于评估LLM智能体的基准,具有可控任务范围和难度级别。该基准通过静态JSON文件解析消除了环境设置开销,实现快速可复现的评估。
[207] Towards Knowledgeable Deep Research: Framework and Benchmark
- arXiv: 2604.07720 (replaced)
- Authors: Wenxuan Liu, Zixuan Li, Long Bai, Chunmao Zhang, Fenghui Zhang, Zhuo Chen, Wei Li, Yuxin Zuo, Fei Wang, Bingbing Xu, Xuhui Jiang, Jin Zhang, Xiaolong Jin, Jiafeng Guo, Tat-Seng Chua, Xueqi Cheng
- Subjects: cs.AI
- Tags: LLM Agent, Multi-Agent System, LLM Evaluation
- Summary: 本文介绍了KDR(知识深度研究)任务,要求智能体使用结构化和非结构化知识生成报告。作者提出了HKA多智能体架构和KDR-Bench用于跨9个领域的系统评估。
[208] Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution
- arXiv: 2604.07725 (replaced)
- Authors: Monishwaran Maheswaran, Leon Lakhani, Zhongzhu Zhou, Shijia Yang, Junxiong Wang, Coleman Hooper, Yuezhou Hu, Rishabh Tiwari, Jue Wang, Harman Singh, Qingyang Wu, Yuqing Jian, Ce Zhang, Kurt Keutzer, Tri Dao, Xiaoxia Wu, Ben Athiwaratkun, James Zou, Chenfeng Xu
- Subjects: cs.AI; cs.CL
- Tags: LLM Inference, Optimization
- Summary: 本文介绍了Squeeze Evolve,一个用于无验证器演化推理的多模型编排框架,根据边际效用分配模型能力。该方法在降低API成本的同时取得了多项任务的最先进结果。
[209] EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools
- arXiv: 2604.07927 (replaced)
- Authors: Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He
- Subjects: cs.AI
- Tags: LLM Agent, RAG, Question Answering
- Summary: 本文介绍了Q+工具集,通过引导查询规划、监控搜索进度和从长网页快照中提取证据,使深度研究代理的网络搜索更加审慎和结构化。该方法在多个基准测试上提升了浏览器代理的准确率。
[210] MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
- arXiv: 2604.07956 (replaced)
- Authors: Arda Yüksel, Gabriel Thiem, Susanne Walter, Patrick Felka, Gabriela Alves Werb, Ivan Habernal
- Subjects: cs.AI
- Tags: Multimodal Learning, Vision-Language Model, Text Classification
- Venue: ACL 2026
- Summary: 本文提出了MONETA,首个结合文本和地理空间来源的多模态行业分类基准数据集,包含欧洲1000家企业及其经济活动标签。作者展示了使用多轮设计和上下文增强的方法显著提升了分类性能。
[211] ASPECT:Analogical Semantic Policy Execution via Language Conditioned Transfer
- arXiv: 2604.08355 (replaced)
- Authors: Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana
- Subjects: cs.AI
- Tags: Transfer Learning, Reinforcement Learning, LLM Reasoning
- Summary: 本文提出了一种利用LLM作为语义操作符的方法,通过文本条件变分自编码器实现强化学习代理的零样本迁移。该方法能够将当前观察语义重映射以对齐源任务,从而实现对新任务的策略复用。
[212] Task-Distributionally Robust Data-Free Meta-Learning
- arXiv: 2311.14756 (replaced)
- Authors: Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao
- Subjects: cs.LG; cs.AI
- Tags: Meta-Learning, Few-Shot Learning, Adversarial Robustness
- Code: code
- Summary: 本文系统研究了无数据元学习(DFML)的鲁棒性,识别出任务分布偏移和任务分布腐败两个关键漏洞。作者提出了一个可信DFML框架,包含合成任务重建、任务记忆插值元学习和自动模型选择三个组件来缓解这些漏洞。
[213] Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy
- arXiv: 2312.09436 (replaced)
- Authors: Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu
- Subjects: cs.RO; cs.AI; cs.LG; eess.SY
- Tags: Transfer Learning, Autonomous Driving, Reinforcement Learning
- Summary: 本文探索了利用连接和自动驾驶车辆技术优化城市交通的咨询式自主方法。作者引入了时序迁移学习(TTL)算法来选择源任务进行零样本迁移,在多种混合交通场景中验证了方法的有效性。
[214] Group-Aware Coordination Graph for Multi-Agent Reinforcement Learning
- arXiv: 2404.10976 (replaced)
- Authors: Wei Duan, Jie Lu, Junyu Xuan
- Subjects: cs.LG; cs.AI; cs.MA
- Tags: Multi-Agent System, Reinforcement Learning, Graph Neural Network
- Venue: IJCAI 2024
- Summary: 本文提出了群体感知协调图(GACG)方法,用于捕获智能体对之间的合作关系和基于行为模式的群体级依赖关系。该方法通过图卷积进行信息交换,并引入群体距离损失来确保行为一致性。
[215] Detection and Characterization of Coordinated Online Behavior: A Survey
- arXiv: 2408.01257 (replaced)
- Authors: Lorenzo Mannocci, Michele Mazza, Anna Monreale, Maurizio Tesconi, Stefano Cresci
- Subjects: cs.SI; cs.AI; cs.CY; cs.HC; cs.LG
- Tags: Social Network Analysis, Cybersecurity
- Summary: 本综述收集、分类并批判性讨论了关于在线协调行为的研究成果。作者协调了行业和学术定义,提出了研究在线协调行为的综合框架,并识别了开放挑战和有前景的研究方向。
[216] Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
- arXiv: 2410.08559 (replaced)
- Authors: Sehun Kim
- Subjects: cs.LG; cs.AI
- Tags: Self-Supervised Learning, Medical AI, Representation Learning
- Code: code
- Summary: 本文介绍了ECG-JEPA,一种用于12导联心电图分析的自监督学习模型,通过在隐藏潜在空间中进行预测来学习心电图的语义表示。该模型在诊断分类、特征提取和分割等下游任务中达到了最先进的性能。
[217] Mitigating Extrinsic Gender Bias for Bangla Classification Tasks
- arXiv: 2411.10636 (replaced)
- Authors: Sajib Kumar Saha Joy, Arman Hassan Mahy, Meherin Sultana, Azizah Mamun Abha, MD Piyal Ahmmed, Yue Dong, G M Shahariar
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Bias Mitigation, Fairness, Text Classification
- Venue: ACL 2026 Findings
- Code: code
- Summary: 本文研究了孟加拉语预训练语言模型中的外在性别偏见,构建了四个手动标注的任务特定基准数据集。作者提出了RandSymKL去偏见策略,在有效减少偏见的同时保持了竞争性的准确率。
[218] OmniPrism: Learning Disentangled Visual Concept for Image Generation
- arXiv: 2412.12242 (replaced)
- Authors: Yangyang Li, Daqing Liu, Wu Liu, Allen He, Xinchen Liu, Yongdong Zhang, Guoqing Jin
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Text-to-Image, Diffusion Model, Image Synthesis
- Summary: 本文提出了OmniPrism,一种用于创意图像生成的视觉概念解耦方法,通过自然语言引导学习解耦的概念表示。该方法构建了配对概念解耦数据集,并使用对比正交解耦训练管线生成高质量的概念解耦结果。
[219] Neurons Speak in Ranges: Breaking Free from Discrete Neuronal Attribution
- arXiv: 2502.06809 (replaced)
- Authors: Muhammad Umair Haider, Hammad Rizwan, Hassan Sajjad, Peizhong Ju, A.B. Siddique
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Interpretability, LLM Hallucination
- Summary: 本文分析了大型语言模型中的多义性问题,发现概念条件化的激活幅度形成独特的分布。作者引入了NeuronLens框架,通过基于范围的解释和操作实现更精确的可解释性和目标概念操控。
[220] AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society
- arXiv: 2502.08691 (replaced)
- Authors: Jinghua Piao, Yuwei Yan, Jun Zhang, Nian Li, Junbo Yan, Xiaochong Lan, Zhihong Lu, Zhiheng Zheng, Jing Yi Wang, Di Zhou, Chen Gao, Fengli Xu, Fang Zhang, Ke Rong, Jun Su, Yong Li
- Subjects: cs.SI; cs.AI
- Tags: LLM Agent, Social Simulation, Multi-Agent System
- Summary: 本文提出了AgentSociety,一个大规模社会模拟器,集成了LLM驱动的代理、真实的社会环境和强大的大规模模拟引擎。作者通过五个关键社会问题验证了该平台作为计算社会实验测试平台的有效性。
[221] Constraining Sequential Model Editing with Editing Anchor Compression
- arXiv: 2503.00035 (replaced)
- Authors: Hao-Xiang Xu, Jun-Yu Ma, Zhen-Hua Ling, Ningyu Zhang, Jia-Chen Gu
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Knowledge Editing, LLM Hallucination
- Venue: NAACL 2025 Findings
- Summary: 本文针对大型语言模型在序列编辑过程中通用能力退化的问题,提出了编辑锚点压缩(EAC)框架。该方法通过选择重要的编辑锚点来压缩编辑信息,在保留编辑知识的同时保持超过70%的通用能力。
[222] Revitalizing Black-Box Interpretability: Actionable Interpretability for LLMs via Proxy Models
- arXiv: 2505.12509 (replaced)
- Authors: Junhao Liu, Haonan Yu, Zhenyu Yan, Xin Zhang
- Subjects: cs.LG; cs.AI
- Tags: Interpretability, Prompt Engineering
- Venue: ACL 2026
- Summary: 本文提出了一种经济高效的代理框架,利用高效模型近似昂贵LLM的决策边界来实现可解释性。该框架仅用11%的代价实现了超过90%的保真度,并在提示压缩和中毒样本移除中展示了实用价值。
[223] Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment
- arXiv: 2505.18600 (replaced)
- Authors: Bryan Sangwoo Kim, Jeongsol Kim, Jong Chul Ye
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Super-Resolution, Diffusion Model, Vision-Language Model
- Venue: NeurIPS 2025
- Summary: 本文介绍了Chain-of-Zoom框架,将单图像超分辨率分解为具有多尺度感知提示的自回归链。该方法使用标准4倍扩散超分模型实现了超过256倍的极端放大,同时保持高感知质量和保真度。
[224] Gen-n-Val: Agentic Image Data Generation and Validation
- arXiv: 2506.04676 (replaced)
- Authors: Jing-En Huang, I-Sheng Fang, Tzuhsuan Huang, Yu-Lun Liu, Chih-Yu Wang, Jun-Cheng Chen
- Subjects: cs.CV; cs.AI; cs.LG; cs.MA
- Tags: Data Synthesis, Object Detection, LLM Agent
- Venue: CVPR 2026 Findings
- Code: code
- Summary: 本文介绍了Gen-n-Val,一个利用Layer Diffusion、LLM和VLLM生成高质量实例掩码和图像的代理式数据生成框架。该方法将无效合成数据从50%减少到7%,并在LVIS和COCO等基准上显著提升了性能。
[225] Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
- arXiv: 2506.09067 (replaced)
- Authors: Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Security, Medical AI
- Summary: 本文提出了一种推理时防御策略,通过合成临床演示来增强医学视觉语言模型的安全性,使其能够拒绝有害查询同时避免过度防御问题。实验表明该方法在九种医学影像模态上有效提升了模型安全性,且混合演示策略可在少样本预算约束下平衡安全性与性能。
[226] Listener-Rewarded Thinking in VLMs for Image Preferences
- arXiv: 2506.22832 (replaced)
- Authors: Alexander Gambashidze, Li Pengyi, Matvey Skripkin, Andrey Galichin, Anton Gusarov, Konstantin Sobolev, Andrey Kuznetsov, Ivan Oseledets
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, RLHF, Text-to-Image
- Summary: 本文提出了一种听众增强的GRPO框架,用于训练人类视觉偏好的奖励模型。该方法通过独立视觉语言模型重新评估推理链来提供校准的置信度分数,在ImageReward基准上达到67.4%的准确率,显著提升了分布外性能并减少了推理矛盾。
[227] Provable Post-Training Quantization: Theoretical Analysis of OPTQ and Qronos
- arXiv: 2508.04853 (replaced)
- Authors: Haoyu Zhang, Shihao Zhang, Ian Colbert, Rayan Saab
- Subjects: cs.LG; cs.AI; cs.IT; math.NA
- Tags: Model Compression, LLM Inference
- Summary: 本文首次为OPTQ和Qronos后训练量化算法提供了定量的误差界限理论分析。研究推导了非渐近的2-范数误差界限,为实践中常用的设计选择提供了理论依据,并为正则化参数选择提供了指导。
[228] VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding
- arXiv: 2508.06869 (replaced)
- Authors: Jianxiang He, Meisheng Hong, Jungang Li, Weiyu Guo, Xuming Hu, Hui Xiong
- Subjects: cs.CV; cs.AI
- Tags: Video Understanding, Vision-Language Model
- Venue: CVPR 2026 Findings
- Summary: 本文提出了VSI,一种多模态关键帧检索框架,通过双分支协作检索方法融合视觉和文本信息进行精确定位。该方法在LongVideoBench和VideoMME上达到了最先进的关键帧检索精度,在文本相关任务上取得了突破性性能。
[229] Mitigating Domain Drift in Multi Species Segmentation with DINOv2: A Cross-Domain Evaluation in Herbicide Research Trials
- arXiv: 2508.07514 (replaced)
- Authors: Artzai Picon, Itziar Eguskiza, Daniel Mugica, Javier Romero, Carlos Javier Jimenez, Eric White, Gabriel Do-Lago-Junqueira, Christian Klukas, Ramon Navarra-Mestre
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Transfer Learning, Domain Adaptation
- Summary: 本文评估了一种结合DINOv2与层次化分类推断的分割框架,用于在异构农业条件下实现稳健的植物物种和损害分割。该方法在时间、地理和传感器变化等多种域偏移条件下表现出色,已部署于BASF的表型分析工作流程中。
[230] Investigating Multimodal Large Language Models to Support Usability Evaluation
- arXiv: 2508.16165 (replaced)
- Authors: Sebastian Lubos, Alexander Felfernig, Damian Garber, Gerhard Leitner, Julian Schwazer, Manuel Henrich
- Subjects: cs.SE; cs.AI; cs.HC
- Tags: Vision-Language Model, Usability Evaluation, Human-Computer Interaction
- Venue: IEA/AIE 2026
- Summary: 本文研究了将多模态大语言模型作为可用性评估辅助工具,将任务框架化为优先级排序问题。研究比较了多个MLLM与可用性专家的评估结果,表明MLLM能够提供互补性见解并支持关键问题的高效优先级排序。
[231] AR-KAN: Autoregressive-Weight-Enhanced Kolmogorov-Arnold Network for Time Series Forecasting
- arXiv: 2509.02967 (replaced)
- Authors: Chen Zeng, Tiehang Xu, Qiao Wang
- Subjects: cs.LG; cs.AI; eess.SP
- Tags: Time Series Forecasting
- Code: code
- Summary: 本文提出了AR-KAN,将预训练的自回归模块与Kolmogorov-Arnold网络相结合用于时间序列预测。该方法在合成近周期函数和真实数据集上均表现出色,证明了其在处理复杂信号频谱结构方面的优势。
[232] STCast: Adaptive Boundary Alignment for Global and Regional Weather Forecasting
- arXiv: 2509.25210 (replaced)
- Authors: Hao Chen, Tao Han, Jie Zhang, Song Guo, Lei Bai
- Subjects: cs.LG; cs.AI
- Tags: Weather Forecasting, Mixture-of-Experts
- Venue: CVPR 2026
- Code: code
- Summary: 本文提出了STCast,一种用于自适应区域边界优化和动态月度预测分配的AI驱动天气预报框架。该方法采用空间对齐注意力机制和时间混合专家模块,在全球、区域、极端事件和集成预测四项任务上均优于现有方法。
[233] On-the-Fly Adaptation to Quantization: Configuration-Aware LoRA for Efficient Fine-Tuning of Quantized LLMs
- arXiv: 2509.25214 (replaced)
- Authors: Rongguang Ye, Ming Tang, Edith C. H. Ngai
- Subjects: cs.LG; cs.AI
- Tags: Model Compression, LLM Inference
- Summary: 本文提出了CoA-LoRA,一种无需重复微调即可动态调整LoRA适配器以适应任意量化配置的方法。该方法通过基于帕累托的配置搜索优化训练配置集,在不增加额外时间成本的情况下达到与专门微调方法相当或更优的性能。
[234] Adaptive Planning for Multi-Attribute Controllable Summarization with Monte Carlo Tree Search
- arXiv: 2509.26435 (replaced)
- Authors: Sangwon Ryu, Heejin Do, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok
- Subjects: cs.CL; cs.AI
- Tags: Summarization, LLM Reasoning
- Venue: ACL 2026
- Summary: 本文提出了PACO,一种无需训练的多属性可控摘要框架,通过蒙特卡洛树搜索自适应规划属性控制顺序。该方法在多个领域和模型上实现了稳健的多属性可控性,使用Llama-3.2-1B即可与Llama-3.3-70B基线的可控性相媲美。
[235] Traj2Action: A Co-Denoising Framework for Trajectory-Guided Human-to-Robot Skill Transfer
- arXiv: 2510.00491 (replaced)
- Authors: Han Zhou, Jinjin Cao, Liyuan Ma, Xueji Fang, Guo-jun Qi
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Imitation Learning, Embodied AI
- Summary: 本文提出了Traj2Action框架,使用3D轨迹作为统一中间表示来实现人到机器人的技能迁移。该协同去噪方法在真实世界机器人任务上相比基线提升了高达27%的性能,有效弥合了形态差距。
[236] Unmasking Puppeteers: Leveraging Biometric Leakage to Disarm Impersonation in AI-based Videoconferencing
- arXiv: 2510.03548 (replaced)
- Authors: Danial Samadi Vahdati, Tai Duc Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm
- Subjects: cs.CV; cs.AI
- Tags: Deepfake Detection, Cybersecurity
- Summary: 本文提出了一种基于生物特征泄漏的防御方法,用于检测AI视频会议系统中的傀儡攻击。该方法通过姿态条件化的大间隔对比编码器从传输的潜变量中分离持久身份线索,无需查看重建的RGB视频即可实时标记非法身份交换。
[237] Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels
- arXiv: 2510.06499 (replaced)
- Authors: Zhepeng Cen, Haolin Chen, Shiyu Wang, Zuxin Liu, Zhiwei Liu, Jielin Qiu, Ding Zhao, Silvio Savarese, Caiming Xiong, Huan Wang, Weiran Yao
- Subjects: cs.CL; cs.AI
- Tags: Reinforcement Learning, Data Synthesis, LLM Reasoning
- Summary: 本文介绍了Webscale-RL管道,可将大规模预训练文档系统性地转换为多样化的可验证问答对用于强化学习。使用该方法构建的数据集训练模型,在达到与持续预训练相当性能的同时,所需token数量减少高达100倍。
[238] Dejavu: Towards Experience Feedback Learning for Embodied Intelligence
- arXiv: 2510.10181 (replaced)
- Authors: Shaokai Wu, Yanbiao Ji, Qiuchang Li, Zhiyi Zhang, Qichen He, Wenyuan Xie, Guodong Zhang, Bayram Bayramli, Yue Ding, Hongtao Lu
- Subjects: cs.RO; cs.AI; cs.CV
- Tags: Embodied AI, Robotics, Memory Architecture
- Summary: 本文提出了Dejavu,一种通过经验反馈网络增强冻结视觉-语言-动作策略的部署后学习框架。该方法通过检索相关执行记忆来指导动作预测,使具身智能体能够在部署过程中从经验中学习。
[239] RESample: A Robust Data Augmentation Framework via Exploratory Sampling for Robotic Manipulation
- arXiv: 2510.17640 (replaced)
- Authors: Yuquan Xue, Guanxing Lu, Zhenyu Wu, Chuanrui Zhang, Bofang Jia, Zhengyi Gu, Ziwei Wang
- Subjects: cs.RO; cs.AI; cs.LG
- Tags: Robotics, Data Synthesis, Embodied AI
- Summary: 本文提出了RESample,一种通过探索性采样机制有效提升VLA训练数据集分布覆盖的自动化数据增强框架。该方法在LIBERO基准和真实世界机器人任务上相比基线提升了12%的性能,仅需额外10-20%的样本。
[240] LLM4Delay: Flight Delay Prediction via Cross-Modality Adaptation of Large Language Models and Aircraft Trajectory Representation
- arXiv: 2510.23636 (replaced)
- Authors: Thaweerath Phisannupawong, Joshua Julian Damanik, Han-Lim Choi
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Time Series Forecasting, Multimodal Learning
- Summary: 本文提出了LLM4Delay,一种基于大语言模型的航班延误预测框架,通过跨模态适应策略整合文本航空信息与轨迹表示。该方法在预测准确性上优于现有ATM框架和先前的时序到语言适应方法。
[241] How Similar Are Grokipedia and Wikipedia? A Multi-Dimensional Textual and Structural Comparison
- arXiv: 2510.26899 (replaced)
- Authors: Taha Yasseri, Saeedeh Mohammadi
- Subjects: cs.CY; cs.AI; cs.SI
- Tags: Bias Mitigation, LLM Evaluation
- Summary: 本文对AI生成的Grokipedia与维基百科进行了大规模对比分析,发现Grokipedia文章更长但引用更少,且在历史、宗教等主题上存在系统性的政治偏见偏移。
[242] EGMOF: Efficient Generation of Metal-Organic Frameworks Using a Hybrid Diffusion-Transformer Architecture
- arXiv: 2511.03122 (replaced)
- Authors: Seunghee Han, Yeonghun Kang, Taeun Bae, Junho Kim, Younghun Kim, Varinia Bernales, Alan Aspuru-Guzik, Jihan Kim
- Subjects: cs.AI; cs.LG
- Tags: Molecular Generation, Diffusion Model
- Summary: 本文提出EGMOF框架,结合扩散模型和Transformer实现金属有机框架材料的高效生成,在少量训练数据下实现了超过95%的有效性和84%的命中率。
[243] Evolutionary Optimization Trumps Adam Optimization on Embedding Space Exploration
- arXiv: 2511.03913 (replaced)
- Authors: Domício Pereira Neto, João Correia, Penousal Machado
- Subjects: cs.NE; cs.AI
- Tags: Diffusion Model, Optimization, Text-to-Image
- Summary: 本文比较了进化优化算法sep-CMA-ES与Adam优化器在Stable Diffusion提示词嵌入搜索中的表现,发现进化方法在美学-对齐权衡上表现更优。
[244] Structured Uncertainty guided Clarification for LLM Agents
- arXiv: 2511.08798 (replaced)
- Authors: Manan Suri, Puneet Mathur, Nedim Lipka, Franck Dernoncourt, Ryan A. Rossi, Dinesh Manocha
- Subjects: cs.CL; cs.AI
- Tags: LLM Agent, Uncertainty Estimation
- Summary: 本文提出了一种结构化不确定性框架,帮助LLM智能体在用户指令模糊时提出澄清性问题,显著提高了工具调用的准确性和效率。
[245] Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary
- arXiv: 2511.22963 (replaced)
- Authors: Zhirui Liu, Kaiyang Ji, Ke Yang, Jingyi Yu, Ye Shi, Jingya Wang
- Subjects: cs.RO; cs.AI
- Tags: Embodied AI, Robotics
- Summary: 本文提出Humanoid-LLA模型,通过统一运动词汇表将自由形式语言命令映射为人形机器人的全身动作,实现了语言泛化与物理可行性的平衡。
[246] Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
- arXiv: 2511.23071 (replaced)
- Authors: Anik De, Abhirama Subramanyam Penamakuri, Rajeev Yadav, Aditya Rathore, Harshiv Shah, Devesh Sharma, Sagar Agarwal, Pravin Kumar, Anand Mishra
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Scene Text Recognition, OCR, Linguistic Resource
- Venue: IJDAR
- Summary: 本文发布了BSTD数据集,涵盖11种印度语言和英语的10万多个场景文本词,支持场景文本检测、脚本识别和端到端识别等多项任务。
[247] See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
- arXiv: 2512.02231 (replaced)
- Authors: Le Thien Phuc Nguyen, Zhuoran Yu, Samuel Low Yu Hang, Subin An, Jeongik Lee, Yohan Ban, SeungEun Chung, Thanh-Huy Nguyen, JuWan Maeng, Soochahn Lee, Yong Jae Lee
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Vision-Language Model, Speech Processing, LLM Evaluation
- Venue: CVPR 2026
- Summary: 本文提出AV-SpeakerBench基准,用于评估多模态大语言模型在真实视频中以说话者为中心的音视频推理能力。
[248] From Navigation to Refinement: Revealing the Two-Stage Nature of Flow-based Diffusion Models through Oracle Velocity
- arXiv: 2512.02826 (replaced)
- Authors: Haoming Liu, Jinnuo Liu, Yanhao Li, Liuyang Bai, Yunkai Ji, Yuanhe Guo, Shenji Wan, Hongyi Wen
- Subjects: cs.LG; cs.AI
- Tags: Diffusion Model, Flow Matching
- Venue: CVPR 2026
- Summary: 本文揭示了基于流的扩散模型具有两阶段训练特性:早期导航阶段形成全局布局,后期细化阶段记忆细节,为理解扩散模型训练动态提供了新视角。
[249] Out-of-the-box: Black-box Causal Attacks on Object Detectors
- arXiv: 2512.03730 (replaced)
- Authors: Melane Navaratnarajah, David A. Kelly, Hana Chockler
- Subjects: cs.CV; cs.AI
- Tags: Object Detection, Adversarial Robustness, Causal Inference
- Summary: 本文提出BlackCAtt算法,利用因果充分的像素集构建可解释的黑盒对抗攻击,在目标检测器上实现了更小、更不可感知的攻击效果。
[250] SkillFactory: Self-Distillation For Learning Cognitive Behaviors
- arXiv: 2512.04072 (replaced)
- Authors: Zayne Sprague, Jack Lu, Manya Wadhwa, Sedrick Keh, Mengye Ren, Greg Durrett
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Distillation, LLM Reasoning, Reinforcement Learning
- Venue: ICLR 2026
- Code: code
- Summary: 本文提出SkillFactory方法,通过自蒸馏让模型在强化学习前学习验证、回溯等认知技能,提升了模型在困难任务上的泛化能力。
[251] Relational Visual Similarity
- arXiv: 2512.07833 (replaced)
- Authors: Thao Nguyen, Sicheng Mo, Krishna Kumar Singh, Yilin Wang, Jing Shi, Nicholas Kolkin, Eli Shechtman, Yong Jae Lee, Yuheng Li
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Vision-Language Model, Representation Learning
- Venue: CVPR 2026
- Summary: 本文提出关系视觉相似性概念,通过在匿名化描述数据集上微调视觉语言模型,使图像能够基于底层关系结构而非表面属性进行相似性匹配。
[252] Multi-agent Adaptive Mechanism Design
- arXiv: 2512.21794 (replaced)
- Authors: Qiushi Han, David Simchi-Levi, Renfei Tan, Zishuo Zhao
- Subjects: cs.GT; cs.AI; cs.LG; cs.MA; econ.TH
- Tags: Multi-Agent System, Decision Making
- Summary: 本文提出DRAM框架,结合机制设计和在线学习,在无先验知识的情况下从多个理性智能体获取真实报告,同时实现最优遗憾界。
[253] The Two-Stage Decision-Sampling Hypothesis: Understanding the Emergence of Self-Reflection in RL-Trained LLMs
- arXiv: 2601.01580 (replaced)
- Authors: Zibo Zhao, Yuanting Zha, Haipeng Zhang, Xingcheng Xu
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Reinforcement Learning
- Summary: 本文提出两阶段决策采样假说,解释了RL训练的LLM中自我反思能力的涌现机制,证明了RL通过改善决策能力而非采样能力实现更好的泛化。
[254] Adversarial Evasion Attacks on Computer Vision using SHAP Values
- arXiv: 2601.10587 (replaced)
- Authors: Frank Mollard, Marcus Becker, Florian Roehrbein
- Subjects: cs.CV; cs.AI
- Tags: Adversarial Robustness, Computer Vision
- Venue: bwHPC Symposium 2024 Workshop
- Summary: 本文提出一种基于SHAP值的白盒对抗攻击方法,通过量化输入重要性生成对抗扰动,在梯度隐藏场景下表现出更强的鲁棒性。
[255] Screen, Cache, and Match: A Training-Free Causality-Consistent Reference Frame Framework for Human Animation
- arXiv: 2601.22160 (replaced)
- Authors: Jianan Wang, Nailei Hei, Li He, Huanzhen Wang, Aoxing Li, Yingkai Zhao, Yuxuan Lin, Haofen Wang, Chunyang Wang, Yan Wang, Wenqiang Zhang
- Subjects: cs.GR; cs.AI
- Tags: Video Generation, Diffusion Model
- Summary: 本文提出FrameCache框架,通过屏幕-缓存-匹配策略和轨迹感知自回归生成机制,在无需训练的情况下提升人体动画的时序一致性和视觉稳定性。
[256] Self-Supervised Slice-to-Volume Reconstruction with Gaussian Representations for Fetal MRI
- arXiv: 2601.22990 (replaced)
- Authors: Yinsong Wang, Thomas Fletcher, Xinzhe Luo, Aine Travers Dineen, Rhodri Cusack, Chen Qin
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, 3D Reconstruction, Self-Supervised Learning
- Code: code
- Summary: 本文提出GaussianSVR自监督框架,利用3D高斯表示从运动损坏的2D切片重建胎儿MRI体数据,无需真实标注即可实现高质量重建。
[257] On the Limits of Layer Pruning for Generative Reasoning in Large Language Models
- arXiv: 2602.01997 (replaced)
- Authors: Safal Shrestha, Anubhav Shrestha, Aadim Nepal, Minwu Kim, Keith Ross
- Subjects: cs.LG; cs.AI
- Tags: Model Compression, LLM Reasoning
- Summary: 本文研究了大型语言模型的层剪枝方法,发现虽然分类任务在剪枝后能较好恢复性能,但生成式推理任务(如GSM8K、HumanEval)的恢复能力存在根本性限制,即使在大规模微调后也难以恢复原有的推理能力。
[258] Tiled Prompts: Overcoming Prompt Misguidance in Image and Video Super-Resolution
- arXiv: 2602.03342 (replaced)
- Authors: Bryan Sangwoo Kim, Jonghyun Park, Jong Chul Ye
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Super-Resolution, Diffusion Model, Video Generation
- Summary: 本文提出了Tiled Prompts框架,通过为每个潜在tile生成特定的提示词来解决图像和视频超分辨率中的提示误导问题,在保持感知质量和保真度的同时减少了幻觉和tile级伪影。
[259] SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing
- arXiv: 2602.04418 (replaced)
- Authors: Indraveni Chebolu, Arnab Mallick, Harmesh Rana
- Subjects: cs.MA; cs.AI; cs.DC; cs.ET; cs.SE
- Tags: Multi-Agent System, LLM Agent
- Venue: AAMAS 2026 Workshop
- Summary: 本文提出了SPEAR,一个用于智能合约审计的多智能体协调框架,采用规划智能体、执行智能体和修复智能体的分工协作模式,通过AGM兼容的信念更新和协商协议实现安全分析工作流。
[260] Overstating Attitudes, Ignoring Networks: LLM Biases in Simulating Misinformation Susceptibility
- arXiv: 2602.04674 (replaced)
- Authors: Eun Cheol Choi, Lindsay E. Young, Emilio Ferrara
- Subjects: cs.SI; cs.AI; cs.CL
- Tags: LLM Evaluation, Social Simulation, Bias Mitigation
- Venue: ICWSM 2026
- Summary: 本文评估了LLM模拟调查受访者在错误信息易感性方面的能力,发现LLM会夸大态度与分享行为的关联,同时忽视个人网络特征,表明LLM更适合用于诊断与人类判断的差异而非替代人类判断。
[261] Exploring Teachers' Perspectives on Using Conversational AI Agents for Group Collaboration
- arXiv: 2602.07142 (replaced)
- Authors: Prerna Ravi, Carúmey Stevens, Beatriz Flamia Azevedo, Jasmine David, Brandon Hanks, Hal Abelson, Grace Lin, Emma Anderson
- Subjects: cs.HC; cs.AI
- Tags: Education Technology, Dialogue System, Human-Computer Interaction
- Venue: AIED 2026
- Summary: 本文通过33位K12教师的探索性定性研究,考察了他们对语音对话智能体Phoenix在小组协作中应用的看法,揭示了关于自主性、信任、拟人化和教学一致性方面的设计张力。
[262] An Adaptive Model Selection Framework for Demand Forecasting under Horizon-Induced Degradation to Support Business Strategy and Operations
- arXiv: 2602.13939 (replaced)
- Authors: Adolfo González, Víctor Parada
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Optimization
- Summary: 本文提出了AHSIV框架,一种面向预测范围和需求模式的自适应模型选择方法,通过整合预测范围退化分析、结构化需求分类和多目标帕累托优势来解决预测范围导致的排名不稳定问题。
[263] SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework
- arXiv: 2602.17330 (replaced)
- Authors: Rong Fu, Zijian Zhang, Kun Liu, Jiekai Wu, Xianda Li, Simon Fong
- Subjects: cs.LG; cs.AI
- Tags: Medical AI, Bias Mitigation
- Summary: 本文提出了SubQuad管道,通过结合近次二次检索、GPU加速的亲和力核、可学习多模态融合和公平性约束聚类,解决了大规模适应性免疫受体比较分析中的计算成本和数据不平衡问题。
[264] Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH)
- arXiv: 2602.20028 (replaced)
- Authors: Joao Manoel Herrera Pinheiro, Gabriela Do Nascimento Herrera, Luciana Bueno Dos Reis Fernandes, Alvaro Doria Dos Santos, Ricardo V. Godoy, Eduardo A. B. Almeida, Helena Carolina Onody, Marcelo Andrade Da Costa Vieira, Angelica Maria Penteado-Dias, Marcelo Becker
- Subjects: cs.CV; cs.AI
- Tags: Object Detection, Computer Vision
- Summary: 本文发布了一个包含3,556张高分辨率图像的膜翅目寄生蜂数据集,其中1,739张图像带有COCO格式的多类别边界框标注,为开发自动化昆虫识别系统提供了基础资源。
[265] Reinforcement-aware Knowledge Distillation for LLM Reasoning
- arXiv: 2602.22495 (replaced)
- Authors: Zhaoyang Zhang, Shuli Jiang, Yantao Shen, Yuting Zhang, Dhananjay Ram, Shuo Yang, Zhuowen Tu, Wei Xia, Stefano Soatto
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, LLM Reasoning, Reinforcement Learning
- Summary: 本文提出了RLAD方法,一种强化学习感知的知识蒸馏技术,通过Trust Region Ratio Distillation在RL训练过程中进行选择性模仿,在逻辑推理和数学基准测试上优于传统蒸馏方法。
[266] Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
- arXiv: 2603.03099 (replaced)
- Authors: Ruinan Jin, Yingbin Liang, Shaofeng Zou
- Subjects: cs.LG; cs.AI
- Tags: Optimization
- Summary: 本文从理论上分析了Adam优化器相比SGD的优势,证明了Adam的二阶矩归一化使其在高概率收敛行为上达到δ^(-1/2)依赖,而SGD至少需要δ^(-1)依赖。
[267] Better Eyes, Better Thoughts: Why Vision Chain-of-Thought Fails in Medicine
- arXiv: 2603.06665 (replaced)
- Authors: Yuan Wu, Zongxian Yang, Jiayu Qian, Songpan Gao, Guanxing Chen, Qiankun Li, Yu-An Huang, Zhi-An Huang
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Medical AI, LLM Reasoning
- Code: code
- Summary: 本文发现思维链提示在医学视觉问答任务中经常不如直接回答,将其归因于医学感知瓶颈,并提出了感知锚定和描述接地两种推理时干预方法来改善视觉接地。
[268] Memory-efficient Continual Learning with Prototypical Exemplar Condensation
- arXiv: 2603.13804 (replaced)
- Authors: Minh-Duong Nguyen, Thien-Thanh Dao, Le-Tuan Nguyen, Dung D. Le, Kok-Seng Wong
- Subjects: cs.LG; cs.AI
- Tags: Continual Learning, Model Compression
- Summary: 本文提出了一种基于原型样本合成的持续学习方法,通过生成代表性原型样本和扰动增强机制,在大幅压缩内存占用的同时有效缓解灾难性遗忘问题。
[269] Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving
- arXiv: 2603.13842 (replaced)
- Authors: Zhexi Lian, Haoran Wang, Xuerun Yan, Weimeng Lin, Xianhong Zhang, Yongyu Chen, Jia Hu
- Subjects: cs.RO; cs.AI
- Tags: Autonomous Driving, Imitation Learning, Reinforcement Learning
- Summary: 本文提出了PaIR-Drive框架,一种端到端自动驾驶的并行模仿学习和强化学习方法,通过分离IL和RL为两个并行分支实现无冲突的协作优化,在NAVSIM基准上取得了竞争性性能。
[270] You've Got a Golden Ticket: Improving Generative Robot Policies With A Single Noise Vector
- arXiv: 2603.15757 (replaced)
- Authors: Omkar Patil, Ondrej Biza, Thomas Weng, Karl Schmeckpeper, Wil Thomason, Xiaohan Zhang, Robin Walters, Nakul Gopalan, Sebastian Castro, Eric Rosen
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Diffusion Model, Flow Matching
- Summary: 本文发现使用固定的”黄金票据”初始噪声向量替代随机采样可以提升预训练扩散/流匹配机器人策略的性能,并提出了一种蒙特卡洛策略评估搜索方法来寻找最优噪声向量。
[271] Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models
- arXiv: 2603.19275 (replaced)
- Authors: Mengxian Lyu, Cheng Peng, Ziyi Chen, Mengyuan Zhang, Jieting Li Lu, Yonghui Wu
- Subjects: cs.CL; cs.AI
- Tags: Summarization, Medical AI, Transfer Learning
- Summary: 本文提出了一种在预训练和微调之间加入中间训练的策略来改进放射学报告自动摘要,实验表明中间训练模型GatorTronT5-Radio在ROUGE-L和RadGraph-F1指标上均优于直接微调方法。
[272] RAM: Recover Any 3D Human Motion in-the-Wild
- arXiv: 2603.19929 (replaced)
- Authors: Sen Jia, Ning Zhu, Jinqin Zhong, Jiale Zhou, Huaping Zhang, Jenq-Neng Hwang, Lei Li
- Subjects: cs.CV; cs.AI
- Tags: 3D Vision, Video Understanding
- Venue: CVPR 2026
- Summary: 本文提出了RAM框架,结合运动感知语义跟踪器、自适应卡尔曼滤波和记忆增强时序模块,实现了在野外环境下鲁棒的多人3D人体运动重建。
[273] Chronological Contrastive Learning: Few-Shot Progression Assessment in Irreversible Diseases
- arXiv: 2603.21935 (replaced)
- Authors: Clemens Watzenböck, Daniel Aletaha, Michaël Deman, Thomas Deimel, Jana Eder, Ivana Janickova, Robert Janiczek, Peter Mandl, Philipp Seeböck, Gabriela Supp, Paul Weiser, Georg Langs
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Self-Supervised Learning, Few-Shot Learning
- Venue: MIDL 2026
- Code: code
- Summary: 该论文提出ChronoCon方法,利用不可逆疾病中患者纵向扫描的时间顺序进行对比学习,无需专家标签即可学习疾病相关表示。在类风湿关节炎X光片评估中,该方法在少样本设置下显著优于全监督基线,仅需5名患者的专家评分即可达到86%的组内相关系数。
[274] Towards Context-Aware Image Anonymization with Multi-Agent Reasoning
- arXiv: 2603.27817 (replaced)
- Authors: Robert Aufschläger, Jakob Folz, Gautam Savaliya, Manjitha D Vidanalage, Michael Heigl, Martin Schramm
- Subjects: cs.CV; cs.AI; cs.CR
- Tags: Multi-Agent System, Privacy, Diffusion Model
- Venue: CVPR 2026 Workshop
- Summary: 该论文提出CAIAMAR框架,通过多智能体推理实现上下文感知的PII分割和基于扩散模型的匿名化。三个专门化智能体通过PDCA循环协调工作,在CUHK03-NP上将人员重识别风险降低73%,同时保持优异的图像质量。
[275] Kill-Chain Canaries: Stage-Level Tracking of Prompt Injection Across Attack Surfaces and Model Safety Tiers
- arXiv: 2603.28013 (replaced)
- Authors: Haochuan Kevin Wang, Zechen Zhang
- Subjects: cs.CR; cs.AI; cs.LG
- Tags: LLM Security, Multi-Agent System, Cybersecurity
- Summary: 该论文提出kill-chain金丝雀方法,通过四个阶段跟踪提示注入攻击,在950次运行中评估了5个前沿LLM、6个攻击面和5种防御条件。研究发现提示注入主要是流水线架构问题,写入节点位置是最关键的安全决策点。
[276] Explorable Theorems: Making Written Theorems Explorable by Grounding Them in Formal Representations
- arXiv: 2604.02598 (replaced)
- Authors: Hita Kambhamettu, Will Crichton, Sean Welleck, Harrison Goldstein, Andrew Head
- Subjects: cs.HC; cs.AI; cs.PL
- Tags: Formal Methods, Scientific Reasoning, LLM Reasoning
- Summary: 该论文提出可探索定理系统,利用LLM将数学定理及其证明转换为Lean形式化表示,使读者能够逐步执行证明、测试自定义示例并追踪逻辑依赖。用户研究表明,使用可探索性功能的参与者在理解问题上表现更好。
[277] Verbalizing LLMs' assumptions to explain and control sycophancy
- arXiv: 2604.03058 (replaced)
- Authors: Myra Cheng, Isabel Sieh, Humishka Zope, Sunny Yu, Lujain Ibrahim, Aryaman Arora, Jared Moore, Desmond Ong, Dan Jurafsky, Diyi Yang
- Subjects: cs.CL; cs.AI; cs.CY
- Tags: LLM Alignment, Interpretability, LLM Hallucination
- Summary: 该论文提出Verbalized Assumptions框架,用于引出LLM的假设以解释和控制奉承行为。研究发现奉承行为源于对用户的错误假设,通过线性探针可以实现对社交奉承行为的可解释细粒度调控。
[278] From Paper to Program: Accelerating Quantum Many-Body Algorithm Development via a Multi-Stage LLM-Assisted Workflow
- arXiv: 2604.04089 (replaced)
- Authors: Yi Zhou
- Subjects: cs.AI; cs.HC
- Tags: Code Generation, Scientific Computing, Quantum Computing
- Summary: 该论文提出多阶段LLM辅助工作流,通过中间技术规范步骤将科学算法从论文转化为代码,将实现关键的计算知识外化。应用于DMRG量子多体算法时,16种模型组合的成功率达100%,而直接实现仅为46%。
[279] Many Preferences, Few Policies: Towards Scalable Language Model Personalization
- arXiv: 2604.04144 (replaced)
- Authors: Cheol Woo Kim, Jai Moondra, Roozbeh Nahavandi, Andrew Perrault, Milind Tambe, Swati Gupta
- Subjects: cs.CL; cs.AI
- Tags: LLM Personalization, LLM Alignment, Reinforcement Learning
- Summary: 该论文提出PALM方法,通过选择少量LLM组合来覆盖异构用户的偏好空间,为LLM个性化提供理论保证。该方法刻画了系统成本与个性化之间的权衡,以及覆盖用户偏好景观所需的LLM多样性。
[280] Boosted Distributional Reinforcement Learning: Analysis and Healthcare Applications
- arXiv: 2604.04334 (replaced)
- Authors: Zequn Chen, Wesley J. Marrero
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, Medical AI, Distributional RL
- Summary: 该论文提出BDRL算法,在优化个体结果分布的同时强制相似智能体之间的可比性,并分析了其收敛性。应用于高血压管理时,该方法改善了治疗结果的一致性和质量调整生命年。
[281] ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
- arXiv: 2604.05426 (replaced)
- Authors: Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang
- Subjects: cs.LG; cs.AI; cs.DC
- Tags: LLM Inference, Model Compression, Distributed Training
- Summary: 该论文提出ALTO系统,通过监控损失轨迹提前终止弱候选、使用融合分组GEMM和新型秩局部适配器并行来加速LoRA超参数调优。该系统在不牺牲适配器质量的情况下实现高达13.8倍的加速。
[282] DiffHDR: Re-Exposing LDR Videos with Video Diffusion Models
- arXiv: 2604.06161 (replaced)
- Authors: Zhengming Yu, Li Ma, Mingming He, Leo Isikdogan, Yuancheng Xu, Dmitriy Smirnov, Pablo Salamanca, Dao Mi, Pablo Delgado, Ning Yu, Julien Philip, Xin Li, Wenping Wang, Paul Debevec
- Subjects: cs.CV; cs.AI; cs.GR
- Tags: Diffusion Model, Video Generation, Image Enhancement
- Summary: 该论文提出DiffHDR框架,将LDR到HDR视频转换建模为视频扩散模型潜在空间中的生成性辐射修复任务。该方法在Log-Gamma色彩空间中操作,能够合成过曝和欠曝区域的合理HDR辐射度,同时保持时间稳定性。
[283] WisdomInterrogatory (LuWen): An Open-Source Legal Large Language Model Technical Report
- arXiv: 2604.06737 (replaced)
- Authors: Yiquan Wu, Yuhang Liu, Yifei Liu, Ang Li, Siying Zhou, Kun Kuang, Fei Wu
- Subjects: cs.CL; cs.AI
- Tags: Legal AI, RAG, Instruction Tuning
- Summary: 该论文提出LuWen开源中文法律大语言模型,通过持续预训练、监督微调和RAG集成构建。该模型在法律判决预测、司法考试、法律文本摘要等五项任务上优于多个强基线模型。
[284] Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules
- arXiv: 2604.08059 (replaced)
- Authors: Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
- Subjects: cs.RO; cs.AI
- Tags: Embodied AI, LLM Agent, AI Safety
- Summary: 该论文将受控能力演化形式化为具身智能体的系统问题,提出包含四种兼容性检查和分阶段运行时流水线的生命周期感知升级框架。评估表明该框架在保持相当任务成功率的同时实现零不安全激活。
[285] OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation
- arXiv: 2604.08110 (replaced)
- Authors: Seungjae Moon, Seunghyun Oh, Youngmin Ro
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Segmentation, Vision-Language Model, Zero-Shot Learning
- Summary: 该论文提出OV-Stitcher框架,通过在最终编码器块中拼接碎片化子图像特征来解决滑动窗口方法的全局注意力限制。该方法在八个基准测试上实现了显著的mIoU提升,从48.7提高到50.7。
[286] HyperMem: Hypergraph Memory for Long-Term Conversations
- arXiv: 2604.08256 (replaced)
- Authors: Juwei Yue, Chuanrui Hu, Jiawei Sheng, Zuyi Zhou, Wenyuan Zhang, Tingwen Liu, Li Guo, Yafeng Deng
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, Memory Architecture, Knowledge Graph
- Venue: ACL 2026
- Summary: 该论文提出HyperMem超图记忆架构,通过超边显式建模长期对话中的高阶关联。三级结构(主题、情节、事实)配合混合词汇语义索引和粗到细检索策略,在LoCoMo基准上达到92.73%的LLM评判准确率。
[287] QARIMA: A Quantum Approach To Classical Time Series Analysis
- arXiv: 2604.08277 (replaced)
- Authors: Nishikanta Mohanty, Bikash K. Behera, Badshah Mukherjee, Pravat Dash
- Subjects: cs.AI; cs.LG
- Tags: Quantum Computing, Time Series Forecasting, Optimization
- Summary: 该论文提出量子启发的ARIMA方法,将量子辅助滞后发现与变分量子电路相结合进行参数估计。该方法通过交换测试驱动的自相关分析和VQC弱滞后细化,减少了元优化开销。
[288] CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning
- arXiv: 2604.08457 (replaced)
- Authors: Rui Gan, Junyi Ma, Pei Li, Xingyou Yang, Kai Chen, Sikai Chen, Bin Ran
- Subjects: cs.CV; cs.AI; cs.RO
- Tags: Video Understanding, Autonomous Driving, Vision-Language Model
- Summary: 该论文提出CrashSight大规模视觉语言基准,包含250个交通事故视频和13K问答对,用于评估路侧摄像头视角的交通事故理解。基准测试显示当前VLM在安全关键场景的时间和因果推理方面存在困难。
[289] SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
- arXiv: 2604.08544 (replaced)
- Authors: Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, Li Ma, Hengjie Li, Hanqing Wang, Jia Zeng, Jiangmiao Pang
- Subjects: cs.RO; cs.AI; cs.CV
- Tags: Sim-to-Real, Robotics, Data Synthesis
- Summary: 本文提出SIM1,一种物理对齐的真实到仿真再到真实的数据引擎,用于可变形物体(如布料)的机器人操作。该系统通过将场景数字化为度量一致的孪生体、校准可变形动力学并通过扩散轨迹生成扩展行为,将稀疏观测转化为大规模合成监督。实验表明,纯合成数据训练的策略在真实世界部署中达到90%的零样本成功率和50%的泛化提升。