arXiv cs.AI Daily Update
cs.AI: 274 papers updated on April 16, 2026:
- 21 new submissions: LLM Agent (SciFi [2], WebXSkill [5], RiskWebWorld [10]), LLM Inference ([3], [11], [13]), GUI Automation (WebXSkill [5], RiskWebWorld [10], [9]), Reinforcement Learning (AlphaCNOT [14], [18], [20]), LLM Evaluation ([1], [16])
- 138 cross-listed submissions: LLM Agent (LiveClawBench [39], Contract-Coding [49], CCCE [51]), Reinforcement Learning ([26], [46], [47]), Benchmark (WorkRB [31], LiveClawBench [39], DeEscalWild [41]), LLM Reasoning (HETA [71], C-voting [106], LongCoT [158]), Medical AI (MEmEBG [62], L2D-Clinical [74], Med-CAM [122])
- 115 replacement submissions: LLM Agent (AAIO [160], FieldWorkArena [161], Orak [162]), LLM Reasoning (RL-PLUS [163], TRIM [174], GraphScout [180]), Benchmark (FieldWorkArena [161], Orak [162], MAS-Bench [165]), Reinforcement Learning (RL-PLUS [163], WOMBET [262], [169]), LLM Inference (Saber [168], TRIM [174], A-IO [265])
Overall trend: today's papers focus mainly on LLM Agent, Reinforcement Learning, and Benchmark.
Accepted papers: [9](ACL 2026), [24](AAAI 2026 Workshop), [35](ICLR 2026 Workshop), [61](IAFOR Agen 2026), [63](AISTATS 2026), [68](CVPR 2026 Workshop), [71](ICLR 2026), [83](WCCI 2026), [91](CVPR 2026 Findings), [93](CVPR 2026 Workshop), [97](ICSE-SEIP 2026), [108](ACL 2026), [112](ICAIL 2026), [115](IEEE ICASI 2026), [119](AIED 2026), [129](FUZZ-IEEE 2012), [136](DEFCON SG 2026), [151](FAccT 2026), [156](ACL 2026), [161](ICPR 2026), [163](ACL 2026), [164](ICLR 2026), [166](ICLR 2026), [168](ACL 2026), [169](Philosophical Transactions A 2026), [170](ACL 2026 Findings), [171](ACL 2026), [172](AAAI 2026), [174](ICLR 2026), [177](CHIL 2026), [182](ACL 2026), [185](ACL 2026), [191](CVPR 2026), [197](ACL 2026), [198](ACL 2026), [202](DAC 2026), [205](ACL 2026 Findings), [206](ACL 2026), [212](ACL 2026 Findings), [216](ACL 2026), [217](ICLR 2026), [219](ACL 2026), [221](ICRA 2026), [223](ACL 2026), [225](ICLR 2026), [226](MLSys 2026), [228](HOST 2026), [231](ACL 2026), [232](ACL 2026 Findings), [234](AAAI 2026 Workshop), [236](CVPR 2026), [237](ACL 2026), [239](ICASSP 2026), [243](ACL 2026 Findings), [248](ECIR 2026 Workshop), [249](CVPR 2026), [250](CVPR 2026 Workshop), [253](AIED 2026), [256](DAC 2026), [257](FSE 2026), [262](L4DC 2026), [267](ACL 2026)
Open-source papers (code released): [1], [5], [31], [38], [39], [49], [56], [65], [91], [92], [99], [108], [110], [123], [132], [149], [152], [154], [159], [162], [166], [171], [179], [180], [182], [184], [186], [190], [197], [203], [205], [206], [216], [222], [224], [232], [245], [251], [274]
New Submissions (21)
[1] Exploration and Exploitation Errors Are Measurable for Language Model Agents
- arXiv: 2604.13151
- Authors: Jaden Park, Jungtaek Kim, Jongwon Jeong, Robert D. Nowak, Kangwook Lee, Yong Jae Lee
- Subjects: cs.AI
- Tags: LLM Agent, LLM Evaluation, Embodied AI
- Code: code
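- Summary: This paper proposes a method for measuring the exploration and exploitation errors of language model agents in a controllable environment built from a 2D grid map and a task DAG, which yields policy-agnostic evaluation metrics. Experiments show that even frontier models perform poorly on this task, while reasoning models improve substantially with simple engineering tweaks.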
[2] SciFi: A Safe, Lightweight, User-Friendly, and Fully Autonomous Agentic AI Workflow for Scientific Applications
- arXiv: 2604.13180
- Authors: Qibin Liu, Julia Gonski
- Subjects: cs.AI
- Tags: LLM Agent, Scientific Computing
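- Summary: This paper presents SciFi, a safe, lightweight, and user-friendly agentic framework for autonomously executing scientific tasks. It combines an isolated execution environment, a three-layer agent loop, and a self-assessment mechanism to achieve end-to-end automation with minimal human intervention.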
[3] Numerical Instability and Chaos: Quantifying the Unpredictability of Large Language Models
- arXiv: 2604.13206
- Authors: Chashi Mahiul Islam, Alan Villarreal, Mao Nishino, Shaeke Salman, Xiuwen Liu
- Subjects: cs.AI; cs.LG; math.NA
- Tags: LLM Inference, Interpretability
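- Summary: This paper analyzes how numerical instability makes large language models unpredictable, tracing how floating-point rounding errors propagate through Transformer layers. It finds that LLMs exhibit three distinct behavioral regimes: stable, chaotic, and signal-dominated.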
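Non-associativity of floating-point addition is one elementary mechanism behind the rounding effects this paper traces. The toy NumPy snippet below (an illustration, not the authors' measurement setup) shows that reordering the same two float32 operations changes the result; reductions inside a Transformer that accumulate in different orders can likewise produce slightly different values.

```python
import numpy as np

# float32 has a 24-bit significand, so near 1e8 adjacent representable
# values are 8 apart and adding 1.0 is lost to rounding.
a = np.float32(1e8)
b = np.float32(1.0)

left = (a + b) - a   # (1e8 + 1) rounds back to 1e8, so this evaluates to 0.0
right = (a - a) + b  # every step is exact, so this evaluates to 1.0

print(f"(a + b) - a = {left}, (a - a) + b = {right}")
```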
[4] Optimizing Earth Observation Satellite Schedules under Unknown Operational Constraints: An Active Constraint Acquisition Approach
- arXiv: 2604.13283
- Authors: Mohamed-Bachir Belaid
- Subjects: cs.AI; cs.LG
- Tags: Satellite Control, Optimization, Automated Planning
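- Summary: This paper studies Earth observation satellite scheduling under unknown constraints and proposes Conservative Constraint Acquisition (CCA), which learns feasibility constraints interactively in order to optimize schedules. Experiments show that the method substantially reduces the number of oracle queries and improves optimization results.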
[5] WebXSkill: Skill Learning for Autonomous Web Agents
- arXiv: 2604.13318
- Authors: Zhaoyang Wang, Qianhui Wu, Xuchao Zhang, Chaoyun Zhang, Wenlin Yao, Fazle Elahi Faisal, Baolin Peng, Si Qin, Suman Nath, Qingwei Lin, Chetan Bansal, Dongmei Zhang, Saravan Rajmohan, Jianfeng Gao, Huaxiu Yao
- Subjects: cs.AI; cs.CL
- Tags: Web Agent, GUI Automation, LLM Agent
- Code: code
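- Summary: This paper proposes WebXSkill, which bridges the gap between textual workflow skills and code skills by creating executable skills, each consisting of a parameterized action program and step-level natural-language guidance. It raises task success rates by 9.8 and 12.9 percentage points on WebArena and WebVoyager, respectively.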
[6] Listening Alone, Understanding Together: Collaborative Context Recovery for Privacy-Aware AI
- arXiv: 2604.13348
- Authors: Tanmay Srivastava, Amartya Basu, Shubham Jain, Vaishnavi Ranganathan
- Subjects: cs.AI; cs.CR
- Tags: Privacy, Dialogue System, Speech Processing
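- Summary: This paper proposes CONCORD, a privacy-aware framework for collaboration among asynchronous assistants. It captures only the owner's speech through real-time speaker verification and safely recovers the necessary context through spatio-temporal context resolution and information-gap detection.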
[7] ReSS: Learning Reasoning Models for Tabular Data Prediction via Symbolic Scaffold
- arXiv: 2604.13392
- Authors: Chenlang Yi, Gang Li, Zizhan Xiong, Tue Minh Cao, Yanmin Gong, My T. Thai, Tianbao Yang
- Subjects: cs.AI
- Tags: Tabular Learning, Neurosymbolic AI, Explainable AI
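- Summary: This paper proposes ReSS, which uses symbolic scaffolds extracted from decision trees to guide an LLM to generate natural-language reasoning consistent with the underlying decision logic for tabular data prediction. The method improves accuracy by up to 10% on medical and financial benchmarks while producing faithful and consistent reasoning.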
[8] Quantifying and Understanding Uncertainty in Large Reasoning Models
- arXiv: 2604.13395
- Authors: Yangyi Li, Chenxu Zhao, Mengdi Huai
- Subjects: cs.AI; cs.LG
- Tags: LLM Reasoning, Uncertainty Estimation, Explainable AI
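- Summary: This paper proposes a conformal-prediction-based method for quantifying the uncertainty of large reasoning models with statistical guarantees, and develops a Shapley-value-based explanation framework that identifies the training examples and reasoning steps that preserve these guarantees.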
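For readers new to conformal prediction, here is a generic split-conformal sketch for a classifier. It assumes the nonconformity score 1 - p(true label) and is not the paper's procedure for reasoning models, which this summary does not detail. A threshold is calibrated on held-out data so that, under exchangeability, prediction sets contain the true label with probability at least 1 - alpha.

```python
import numpy as np

def conformal_threshold(cal_probs: np.ndarray, cal_labels: np.ndarray, alpha: float) -> float:
    """Split conformal calibration with nonconformity score 1 - p(true label)."""
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Finite-sample corrected rank: the ceil((n + 1)(1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return float("inf")  # too few calibration points: every label is included
    return float(np.sort(scores)[k - 1])

def prediction_set(probs: np.ndarray, qhat: float) -> np.ndarray:
    """Indices of all labels whose score does not exceed the calibrated threshold."""
    return np.flatnonzero(1.0 - probs <= qhat)

# Toy calibration set: 9 examples, 2 classes, true label always 0.
cal_probs = np.array([[p, 1 - p] for p in [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]])
qhat = conformal_threshold(cal_probs, np.zeros(9, dtype=int), alpha=0.2)
print(qhat, prediction_set(np.array([0.6, 0.3, 0.1]), qhat))
```

A larger set signals higher uncertainty; with too few calibration points for the requested alpha, the sketch returns an infinite threshold rather than an invalid guarantee.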
[9] Towards Scalable Lightweight GUI Agents via Multi-role Orchestration
- arXiv: 2604.13488
- Authors: Ziwei Wang, Junjie Zheng, Leyang Yang, Sheng Zhou, Xiaoxuan Tang, Zhouhua Fang, Zhiwei Liu, Dajun Chen, Yong Li, Jiajun Bu
- Subjects: cs.AI
- Tags: GUI Automation, LLM Agent, Multimodal Learning
- Venue: ACL 2026
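- Summary: This paper proposes LAMO, a framework that uses multi-role orchestration to let lightweight MLLMs take part in GUI automation, combining role-oriented data synthesis with a two-stage training method. The resulting LAMO-3B model supports both standalone execution and multi-agent-system-style orchestration.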
[10] RiskWebWorld: A Realistic Interactive Benchmark for GUI Agents in E-commerce Risk Management
- arXiv: 2604.13531
- Authors: Renqi Chen, Zeyin Tao, Jianming Guo, Jing Wang, Zezhou Xu, Jingzhe Zhu, Qingqing Sun, Tianyi Zhang, Shuai Chen
- Subjects: cs.AI; cs.LG
- Tags: GUI Automation, Benchmark, LLM Agent
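- Summary: This paper introduces RiskWebWorld, the first high-fidelity interactive benchmark for evaluating GUI agents in e-commerce risk management, with 1,513 tasks drawn from 8 core domains. Experiments reveal a large capability gap between top-tier models and specialized open-source models, and show that agentic RL can improve the performance of open-source models by 16.2%.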
[11] Weight Patching: Toward Source-Level Mechanistic Localization in LLMs
- arXiv: 2604.13694
- Authors: Chenghao Sun, Chengsheng Zhang, Guanzheng Qin, Rui Dai, Xinmei Tian
- Subjects: cs.AI
- Tags: Interpretability, Model Merging, LLM Inference
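- Summary: This paper proposes weight patching, a parameter-space intervention for localizing source-level behavioral mechanisms in pairs of models that share the same architecture. The method reveals a hierarchy running from shallow carriers to aggregation modules to downstream execution circuits.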
[12] Rethinking AI Hardware: A Three-Layer Cognitive Architecture for Autonomous Agents
- arXiv: 2604.13757
- Authors: Li Chen
- Subjects: cs.AI; cs.HC
- Tags: LLM Agent, Edge Computing, Hardware Architecture
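- Summary: This paper proposes Tri-Spirit, a three-layer cognitive architecture that decomposes intelligence into planning, reasoning, and execution layers, each mapped to a different compute substrate. The architecture reduces task latency by 75.6% and energy consumption by 71.1%.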
[13] The cognitive companion: a lightweight parallel monitoring architecture for detecting and recovering from reasoning degradation in LLM agents
- arXiv: 2604.13759
- Authors: Rafflesia Khan, Nafiul Islam Khan
- Subjects: cs.AI; cs.LG
- Tags: LLM Agent, LLM Inference, Monitoring
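- Summary: This paper proposes the cognitive companion, a parallel monitoring architecture with two implementations (LLM-based and probe-based) for detecting and recovering from reasoning degradation in LLM agents. The probe-based companion adds zero inference overhead while reducing repetition by 52-62% on loop-prone tasks.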
[14] AlphaCNOT: Learning CNOT Minimization with Model-Based Planning
- arXiv: 2604.13812
- Authors: Jacopo Cossio, Daniele Lizzio Bosco, Riccardo Romanello, Giuseppe Serra, Carla Piazza
- Subjects: cs.AI
- Tags: Quantum Computing, Reinforcement Learning, Automated Planning
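- Summary: This paper proposes AlphaCNOT, a reinforcement learning framework based on Monte Carlo tree search for minimizing the number of CNOT gates in quantum circuits. In linear reversible synthesis, it reduces the CNOT count by up to 32% compared with baselines.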
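To make "CNOT count in linear reversible synthesis" concrete: a circuit of CNOT gates on n qubits implements an invertible n x n matrix over GF(2), and each CNOT(control, target) adds the control row to the target row. The sketch below is the textbook Gaussian-elimination baseline, not AlphaCNOT's MCTS planner; its gate count is the kind of figure that learned planners try to reduce.

```python
import numpy as np

def gaussian_cnot_synthesis(A):
    """Reduce an invertible GF(2) matrix to the identity with row additions.

    Each recorded (control, target) pair means row[target] ^= row[control],
    i.e. one CNOT gate. Applying the recorded gates in reverse order builds a
    circuit for A, so len(ops) is a (non-minimal) CNOT count.
    """
    M = np.array(A, dtype=np.uint8) % 2
    n = M.shape[0]
    ops = []
    for col in range(n):
        if M[col, col] == 0:
            # Bring a 1 onto the diagonal by adding a lower row that has one.
            for r in range(col + 1, n):
                if M[r, col]:
                    M[col] ^= M[r]
                    ops.append((r, col))
                    break
            else:
                raise ValueError("matrix is not invertible over GF(2)")
        # Eliminate every other 1 in this column.
        for r in range(n):
            if r != col and M[r, col]:
                M[r] ^= M[col]
                ops.append((col, r))
    return ops

A = [[0, 1, 0],
     [1, 1, 0],
     [1, 0, 1]]
print(gaussian_cnot_synthesis(A))
```

Plain elimination needs on the order of n^2 gates in the worst case, which leaves room for search-based methods to find shorter gate sequences.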
[15] GeoAgentBench: A Dynamic Execution Benchmark for Tool-Augmented Agents in Spatial Analysis
- arXiv: 2604.13888
- Authors: Bo Yu, Cheng Yang, Dongyang Hou, Chengfu Liu, Jiayao Liu, Chi Wang, Zhiming Zhang, Haifeng Li, Wentao Yang
- Subjects: cs.AI
- Tags: Benchmark, LLM Agent, GIS
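- Summary: This paper introduces GeoAgentBench, a dynamic interactive evaluation benchmark for tool-augmented GIS agents, comprising 117 atomic GIS tools and 53 spatial analysis tasks. The authors' Plan-and-React architecture significantly outperforms conventional frameworks in multi-step reasoning and error recovery.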
[16] AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot
- arXiv: 2604.13940
- Authors: Joydeep Biswas, Sheila Schoepp, Gautham Vasan, Anthony Opipari, Arthur Zhang, Zichao Hu, Sebastian Joseph, Matthew Lease, Junyi Jessy Li, Peter Stone, Kiri L. Wagstaff, Matthew E. Taylor, Odest Chadwicke Jenkins
- Subjects: cs.AI
- Tags: AI Ethics, Scientific Computing, LLM Evaluation
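- Summary: This paper reports the first large-scale deployment of AI-assisted peer review: every main-track submission to AAAI-26 received one AI-generated review. A large-scale survey shows that participants rated the AI reviews above human reviews on key dimensions such as technical accuracy and research suggestions.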
[17] [Emerging Ideas] Artificial Tripartite Intelligence: A Bio-Inspired, Sensor-First Architecture for Physical AI
- arXiv: 2604.13959
- Authors: You Rim Choi, Subeom Park, Hyung-Sin Kim
- Subjects: cs.AI
- Tags: Embodied AI, Hardware Architecture, Sensor Fusion
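- Summary: This paper proposes ATI, a bio-inspired, sensor-first architecture for physical AI with three levels (brainstem, cerebellum, and cerebral reasoning subsystems), which improves embodied AI performance by co-designing perception and reasoning.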
[18] Reward Design for Physical Reasoning in Vision-Language Models
- arXiv: 2604.13993
- Authors: Derek Lilienthal, Manisha Mukherjee, Sameera Horawalavithana
- Subjects: cs.AI; cs.CL; cs.CV
- Tags: Vision-Language Model, Reinforcement Learning, Scientific Reasoning
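- Summary: This paper systematically studies reward design for GRPO-based training of physical reasoning in vision-language models. Comparing four reward signals, it finds that accuracy-based rewards work best, while attention-based rewards strengthen spatial reasoning.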
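GRPO scores each sampled response relative to the other responses for the same prompt. The sketch below pairs a binary exact-match accuracy reward (an assumed simplification; the exact forms of the paper's four reward signals are not given here) with the group-normalized advantage (r - mean) / std used in GRPO.

```python
import numpy as np

def accuracy_reward(answer: str, gold: str) -> float:
    """Assumed binary reward: 1.0 for an exact (whitespace-trimmed) match, else 0.0."""
    return 1.0 if answer.strip() == gold.strip() else 0.0

def group_relative_advantages(rewards, eps: float = 1e-6) -> np.ndarray:
    """Normalize rewards within one prompt's group of sampled responses."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

# Four sampled answers to one question whose gold answer is "4".
group = ["4", " 5", "4 ", "3"]
rewards = [accuracy_reward(a, "4") for a in group]
advantages = group_relative_advantages(rewards)
print(rewards, advantages.round(3))
```

Note that a group in which every answer is correct (or every answer is wrong) receives zero advantage everywhere, so it contributes no learning signal under this reward.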
[19] Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents
- arXiv: 2604.14004
- Authors: Kangsan Kim, Minki Kang, Taeil Kim, Yanlai Yang, Mengye Ren, Sung Ju Hwang
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Transfer Learning, Code Generation
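- Summary: This paper studies memory transfer learning for coding agents and finds that cross-domain memories improve average performance by 3.7%, mainly by transferring meta-knowledge rather than task-specific code; high-level insights generalize better than low-level trajectories.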
[20] Hierarchical Reinforcement Learning with Runtime Safety Shielding for Power Grid Operation
- arXiv: 2604.14032
- Authors: Gitesh Malik
- Subjects: cs.AI; cs.LG
- Tags: Reinforcement Learning, Energy Management, Decision Making
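- Summary: This paper proposes a safety-constrained hierarchical control framework for power grid operation that decouples long-horizon decision-making from real-time feasibility enforcement, achieving robust zero-shot generalization to unseen grid topologies.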
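A runtime safety shield sits between the learned policy and the system it controls: it passes the proposed action through if a feasibility check accepts it, and otherwise substitutes the closest action that passes. The sketch below is a generic shield with hypothetical names; the scalar setpoint, the ±50 limit, and the fallback candidates are invented for illustration and are not taken from the paper.

```python
from typing import Callable, Sequence, TypeVar

Action = TypeVar("Action")

def shield(proposed: Action,
           candidates: Sequence[Action],
           is_safe: Callable[[Action], bool],
           distance: Callable[[Action, Action], float]) -> Action:
    """Pass a safe proposal through unchanged; otherwise fall back to the closest safe candidate."""
    if is_safe(proposed):
        return proposed
    safe = [a for a in candidates if is_safe(a)]
    if not safe:
        raise RuntimeError("no safe fallback action available")
    return min(safe, key=lambda a: distance(a, proposed))

# Hypothetical toy: a scalar setpoint change that must stay within +/-50 units.
def within_limit(a: float) -> bool:
    return abs(a) <= 50

def gap(a: float, b: float) -> float:
    return abs(a - b)

FALLBACKS = [-50.0, 0.0, 50.0]
print(shield(80.0, FALLBACKS, within_limit, gap), shield(30.0, FALLBACKS, within_limit, gap))
```

Keeping the check outside the policy means the high-level controller can be retrained or swapped without weakening the feasibility guarantee that the shield enforces.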
[21] TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration
- arXiv: 2604.14116
- Authors: Zerun Ma, Guoqiang Wang, Xinchen Xie, Yicheng Chen, He Du, Bowen Li, Yanan Sun, Wenran Liu, Kai Chen, Yining Li
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, LLM Training, AutoML
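- Summary: This paper introduces TREX, a multi-agent system that automates the entire LLM training lifecycle. It models the experimentation process as a search tree to plan exploration paths efficiently and demonstrates continuous improvement on the FT-Bench benchmark.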
Cross-listed Submissions (138)
[22] When Reasoning Models Hurt Behavioral Simulation: A Solver-Sampler Mismatch in Multi-Agent LLM Negotiation
- arXiv: 2604.11840 (cross-listed)
- Authors: Sandro Andric
- Subjects: cs.LG; cs.AI; cs.CY; cs.MA
- Tags: LLM Reasoning, Multi-Agent System, Social Simulation
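- Summary: This paper argues that in multi-agent negotiation, reasoning-enhanced models can be better solvers but worse simulators; bounded reflection produces more diverse and compromise-oriented trajectories than native reasoning.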
[23] OVT-MLCS: An Online Visual Tool for MLCS Mining from Long or Big Sequences
- arXiv: 2604.13037 (cross-listed)
- Authors: Zhi Wang, Yanni Li, Tihua Duan, Bing Liu, Liyong Zhang, Hui Li
- Subjects: cs.DB; cs.AI
- Tags: Data Visualization, Sequence Mining
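- Summary: This paper presents OVT-MLCS, an online visual tool for mining multiple longest common subsequences (MLCS) from long or big sequences, with real-time graph visualization and user-friendly interaction.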
[24] TableNet A Large-Scale Table Dataset with LLM-Powered Autonomous
- arXiv: 2604.13041 (cross-listed)
- Authors: Ruilin Zhang, Kai Yang
- Subjects: cs.DB; cs.AI
- Tags: Document Understanding, Data Synthesis, LLM Agent
- Venue: AAAI 2026 Workshop
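- Summary: This paper presents TableNet, a large-scale table structure recognition dataset generated by an LLM-driven multi-agent system, and trains models with a diversity-based active learning paradigm.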
[25] A Pythonic Functional Approach for Semantic Data Harmonisation in the ILIAD Project
- arXiv: 2604.13042 (cross-listed)
- Authors: Erik Johan Nystad, Francisco Martín-Recuerda
- Subjects: cs.DB; cs.AI; cs.SE
- Tags: Knowledge Graph, Data Integration
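- Summary: This paper proposes a Pythonic functional approach to semantic data harmonisation that lets users generate correct RDF through simple function calls, validated in the ILIAD aquaculture pilot.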
[26] Integration of Deep Reinforcement Learning and Agent-based Simulation to Explore Strategies Counteracting Information Disorder
- arXiv: 2604.13047 (cross-listed)
- Authors: Luigi Lomasto, Andrea Camoia, Alfonso Guarino, Nicola Lettieri, Delfina Malandrino, Rocco Zaccagnino
- Subjects: cs.SI; cs.AI; cs.CY
- Tags: Social Simulation, Reinforcement Learning, Fake News Detection
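- Summary: This paper combines agent-based simulation with deep reinforcement learning to explore strategies for countering information disorder on social media, offering insight into the conditions under which the spread of misinformation can be effectively mitigated.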
[27] From Natural Language to PromQL: A Catalog-Driven Framework with Dynamic Temporal Resolution for Cloud-Native Observability
- arXiv: 2604.13048 (cross-listed)
- Authors: Twinkll Sisodia
- Subjects: cs.DB; cs.AI; cs.SE
- Tags: Cloud Computing, Natural Language Query
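- Summary: This paper proposes a catalog-driven framework that translates natural-language questions into executable PromQL queries, using a hybrid metric catalog and a dynamic temporal resolution mechanism; the framework has been deployed on production Kubernetes clusters.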
[28] Hijacking online reviews: sparse manipulation and behavioral buffering in popularity-biased rating systems
- arXiv: 2604.13049 (cross-listed)
- Authors: Itsuki Fujisaki, Kunhao Yang
- Subjects: cs.SI; cs.AI
- Tags: Recommender System, Cybersecurity
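- Summary: This paper studies how malicious reviewers can exploit popularity-biased rating dynamics through sparse attacks, and finds that a moderate diversity of contrarian users can partially buffer these distortions.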
[29] Form Without Function: Agent Social Behavior in the Moltbook Network
- arXiv: 2604.13052 (cross-listed)
- Authors: Saber Zerhoudi, Kanishka Ghosh Dastidar, Felix Klement, Artur Romazanov, Andreas Einwiller, Dang H. Dang, Michael Dinzinger, Michael Granitzer, Annette Hautli-Janisz, Stefan Katzenbeisser, Florian Lemmerich, Jelena Mitrovic
- Subjects: cs.SI; cs.AI; cs.CL; cs.CY; cs.MA
- Tags: Social Simulation, Multi-Agent System, LLM Agent
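- Summary: This paper analyzes Moltbook, a social network composed entirely of AI agents, and finds that while the technical layer responds to change, the social layer largely fails to form: interaction reciprocity is extremely low and quality-filtering mechanisms break down.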
[30] Caption First, VQA Second: Knowledge Density, Not Task Format, Drives Multimodal Scaling
- arXiv: 2604.13054 (cross-listed)
- Authors: Hongjian Zou, Yue Ge, Qi Ding, Yixuan Liao, Xiaoxin Chen
- Subjects: cs.CL; cs.AI; cs.CV
- Tags: Vision-Language Model, Multimodal Learning, Pre-training
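- Summary: This paper argues that the main bottleneck in multimodal scaling is the knowledge density of the training data rather than the task format, showing that VQA signals can be reconstructed from captions and that performance correlates more strongly with semantic coverage.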
[31] WorkRB: A Community-Driven Evaluation Framework for AI in the Work Domain
- arXiv: 2604.13055 (cross-listed)
- Authors: Matthias De Lange, Warre Veys, Federico Retyk, Daniel Deniz, Warren Jouanneau, Mike Zhang, Aleksander Bielinski, Emma Jouffroy, Nicole Clobes, Nina Baranowska, David Graus, Marc Palyart, Rabih Zbib, Dimitra Gkatzia, Thomas Demeester, Tijl De Bie, Toine Bogers, Jens-Joris Decorte, Jeroen Van Hautte
- Subjects: cs.CL; cs.AI
- Tags: Benchmark, Recommender System, LLM Evaluation
- Code: code
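- Summary: This paper proposes WorkRB, the first open-source, community-driven benchmark for AI in the work domain, covering 13 diverse tasks and supporting both monolingual and cross-lingual evaluation settings.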
[32] Text-as-Signal: Quantitative Semantic Scoring with Embeddings, Logprobs, and Noise Reduction
- arXiv: 2604.13056 (cross-listed)
- Authors: Hugo Moreira
- Subjects: cs.CL; cs.AI
- Tags: Representation Learning, Text Classification, Anomaly Detection
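- Summary: This paper presents a practical pipeline that turns text corpora into quantitative semantic signals by combining embeddings, log-probability scoring, and noise reduction, applied to a corpus of Portuguese-language AI news.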
[33] A Proactive EMR Assistant for Doctor-Patient Dialogue: Streaming ASR, Belief Stabilization, and Preliminary Controlled Evaluation
- arXiv: 2604.13059 (cross-listed)
- Authors: Zhenhai Pan, Yan Liu, Jia You
- Subjects: cs.CL; cs.AI
- Tags: Medical AI, Speech Processing, Dialogue System
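- Summary: This paper presents an end-to-end proactive electronic medical record (EMR) assistant that integrates streaming speech recognition, punctuation restoration, state extraction, belief stabilization, and action planning. In a preliminary controlled evaluation it achieves good extraction, retrieval, and action-selection performance.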
[34] Bi-Predictability: A Real-Time Signal for Monitoring LLM Interaction Integrity
- arXiv: 2604.13061 (cross-listed)
- Authors: Wael Hafez, Amir Nazeri
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, LLM Inference
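- Summary: This paper proposes bi-predictability (P), an information-theoretic measure for monitoring the integrity of multi-turn LLM interactions in real time, and introduces an Information Digital Twin (IDT) architecture that detects structural decoupling in interactions without a second inference pass.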
[35] Correct Chains, Wrong Answers: Dissociating Reasoning from Output in LLM Logic
- arXiv: 2604.13065 (cross-listed)
- Authors: Abinav Rao, Sujan Rachuri, Nikhil Vemuri
- Subjects: cs.CL; cs.AI; cs.LO
- Tags: LLM Reasoning, Benchmark
- Venue: ICLR 2026 Workshop
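- Summary: This paper introduces the Novel Operator Test benchmark to distinguish genuine reasoning from pattern retrieval in LLMs, revealing a reasoning-output dissociation in which models execute reasoning chains correctly yet still produce wrong answers.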
[36] Lossless Prompt Compression via Dictionary-Encoding and In-Context Learning: Enabling Cost-Effective LLM Analysis of Repetitive Data
- arXiv: 2604.13066 (cross-listed)
- Authors: Andresa Rodrigues de Campos, David Lee, Imry Kissos, Piyush Paritosh
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Prompt Engineering, LLM Inference
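- Summary: This paper shows that LLMs can learn encoding keys in context and analyze the encoded representation directly, achieving lossless prompt compression without model fine-tuning, with compression rates of up to 80% while maintaining high analysis accuracy.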
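A minimal sketch of dictionary encoding for repetitive records, assuming a simple `@N` key format, a minimum value length of 8 characters, and a legend placed at the top of the prompt; the paper's actual encoding scheme and prompt layout may differ. Keys are chosen so they never collide with an original value, which keeps decoding lossless.

```python
from collections import Counter

def dictionary_encode(rows, min_len=8):
    """Map repeated long values to short keys; returns (codebook, encoded_rows)."""
    counts = Counter(v for row in rows for v in row)
    codebook, i = {}, 0
    for value, c in counts.items():
        if c > 1 and len(value) >= min_len:
            while f"@{i}" in counts:  # never reuse a string that already occurs in the data
                i += 1
            codebook[value] = f"@{i}"
            i += 1
    encoded = [[codebook.get(v, v) for v in row] for row in rows]
    return codebook, encoded

def dictionary_decode(codebook, encoded):
    inverse = {key: value for value, key in codebook.items()}
    return [[inverse.get(v, v) for v in row] for row in encoded]

def to_prompt(codebook, encoded):
    """Hypothetical prompt layout: legend first, then the encoded records."""
    legend = "\n".join(f"{key} = {value}" for value, key in codebook.items())
    body = "\n".join(" | ".join(row) for row in encoded)
    return f"Legend:\n{legend}\n\nRecords:\n{body}"

rows = [["ERROR: disk quota exceeded", "host-a"],
        ["ERROR: disk quota exceeded", "host-b"],
        ["OK", "host-a"]]
codebook, encoded = dictionary_encode(rows)
print(to_prompt(codebook, encoded))
```

The savings grow with how often long values repeat; the legend is paid for once, while every repetition shrinks to a short key.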
[37] Curation of a Palaeohispanic Dataset for Machine Learning
- arXiv: 2604.13070 (cross-listed)
- Authors: Gonzalo Martínez-Fernández, Jose F Quesada, Agustín Riscos-Núñez, Francisco José Salguero-Lamillar
- Subjects: cs.CL; cs.AI
- Tags: Dataset, Low-Resource NLP
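- Summary: This paper builds a structured dataset of Palaeohispanic languages, providing a data resource suited to computational study and machine learning methods in this research area.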
[38] EVE: A Domain-Specific LLM Framework for Earth Intelligence
- arXiv: 2604.13071 (cross-listed)
- Authors: Àlex R. Atrio, Antonio Lopez, Jino Rohit, Yassine El Ouahidi, Marcello Politi, Vijayasri Iyer, Umar Jamil, Sébastien Bratières, Nicolas Longépé
- Subjects: cs.CL; cs.AI
- Tags: Foundation Model, RAG, LLM Hallucination
- Code: code
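- Summary: This paper introduces EVE, the first open-source end-to-end domain-specific LLM framework for Earth intelligence. It includes a 24B-parameter domain-adapted model, a training corpus, and an evaluation benchmark, and integrates RAG and hallucination detection pipelines.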
[39] LiveClawBench: Benchmarking LLM Agents on Complex, Real-World Assistant Tasks
- arXiv: 2604.13072 (cross-listed)
- Authors: Xiang Long, Li Du, Yilong Xu, Fangcheng Liu, Haoqing Wang, Ning Ding, Ziheng Li, Jianyuan Guo, Yehui Tang
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Agent, Benchmark
- Code: code
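- Summary: This paper introduces LiveClawBench, a benchmark for evaluating LLM agents on real-world assistant tasks, and proposes a three-axis complexity framework for characterizing task difficulty.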
[40] OmniTrace: A Unified Framework for Generation-Time Attribution in Omni-Modal LLMs
- arXiv: 2604.13073 (cross-listed)
- Authors: Qianqi Yan, Yichen Guo, Ching-Chen Kuo, Shan Jiang, Hang Yin, Yang Zhao, Xin Eric Wang
- Subjects: cs.CL; cs.AI; cs.MM
- Tags: Multimodal Learning, Interpretability, Vision-Language Model
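- Summary: This paper proposes OmniTrace, a framework for generation-time attribution in omni-modal LLMs that traces generated tokens back to the corresponding multimodal inputs during decoding, providing cross-modal explanations.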
[41] DeEscalWild: A Real-World Benchmark for Automated De-Escalation Training with SLMs
- arXiv: 2604.13075 (cross-listed)
- Authors: Md Hasebul Hasan, Krity Haque Charu, Eshwara Prasad Sridhar, Shuchisnigdha Deb, Mohammad A. Islam
- Subjects: cs.CL; cs.AI
- Tags: Benchmark, Dialogue System, Knowledge Distillation
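- Summary: This paper presents DeEscalWild, a benchmark dataset for training small language models (SLMs) on police-civilian de-escalation scenarios; the fine-tuned SLMs outperform general-purpose large models at a much lower computational cost.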
[42] Document-tuning for robust alignment to animals
- arXiv: 2604.13076 (cross-listed)
- Authors: Jasmine Brazilek, Miles Tidmarsh
- Subjects: cs.CL; cs.AI
- Tags: LLM Alignment, Benchmark
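- Summary: This paper studies the robustness of value alignment via document tuning, using compassion toward animals as the case-study value. It develops an animal-harm benchmark and finds that document tuning works better than alignment instruction tuning, although subsequent training can weaken the intervention.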
[43] Alignment as Institutional Design: From Behavioral Correction to Transaction Structure in Intelligent Systems
- arXiv: 2604.13079 (cross-listed)
- Authors: Rui Chai
- Subjects: cs.CY; cs.AI; cs.GT; cs.LG
- Tags: LLM Alignment, AI Safety
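- Summary: Taking an institutional economics perspective, this paper proposes treating AI alignment as institutional design rather than behavioral correction, arguing for internal transaction structures designed so that aligned behavior becomes the lowest-cost strategy for each component.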
[44] Sparse Goodness: How Selective Measurement Transforms Forward-Forward Learning
- arXiv: 2604.13081 (cross-listed)
- Authors: Kamer Ali Yuksel, Hassan Sawaf
- Subjects: cs.LG; cs.AI; cs.NE
- Tags: Optimization, Representation Learning
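- Summary: This paper systematically studies the design space of goodness functions in Forward-Forward learning, introducing top-k goodness and entmax-weighted energy. It finds that sparsity is the single most important design choice and that it substantially improves model accuracy.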
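In Forward-Forward training, each layer is trained so that a "goodness" score is high for positive data and low for negative data; the classic choice is the sum of squared activations. Below, top-k goodness is assumed to mean the sum of the k largest squared activations (the paper's exact definition and its entmax-weighted variant are not reproduced), shown alongside the usual logistic probability with a threshold theta.

```python
import numpy as np

def goodness_sum_sq(h: np.ndarray) -> np.ndarray:
    """Classic FF goodness: sum of squared activations per example."""
    return np.sum(h ** 2, axis=-1)

def goodness_topk(h: np.ndarray, k: int) -> np.ndarray:
    """Sparse goodness (assumed form): only the k largest squared activations count."""
    sq = h ** 2
    # After partitioning, the last k entries along the axis are the k largest values.
    return np.partition(sq, -k, axis=-1)[..., -k:].sum(axis=-1)

def p_positive(goodness: np.ndarray, theta: float) -> np.ndarray:
    """Probability that an input is positive data: sigmoid(goodness - theta)."""
    return 1.0 / (1.0 + np.exp(-(goodness - theta)))

h = np.array([[3.0, -1.0, 2.0, 0.5]])
print(goodness_sum_sq(h), goodness_topk(h, k=2))
```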
[45] The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior
- arXiv: 2604.13082 (cross-listed)
- Authors: Laura Gomezjurado Gonzalez
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Deep Learning Theory
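- Summary: This paper studies grokking in Transformers trained on arithmetic tasks and finds that the delay reflects limited access to structure that has already been learned rather than a failure to acquire it: the encoder learns the structure early, while the decoder becomes the bottleneck.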
[46] Adaptive Memory Crystallization for Autonomous AI Agent Learning in Dynamic Environments
- arXiv: 2604.13085 (cross-listed)
- Authors: Rajat Khanda, Mohammad Baqar Sambuddha Chakrabarti, Satyasaran Changdar
- Subjects: cs.LG; cs.AI
- Tags: Continual Learning, Memory Architecture, Reinforcement Learning
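- Summary: This paper proposes Adaptive Memory Crystallization (AMC), an architecture for progressive experience consolidation in continual reinforcement learning that uses a three-stage memory hierarchy (liquid, glass, crystal) and performs well on forward transfer and forgetting reduction.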
[47] Design Conditions for Intra-Group Learning of Sequence-Level Rewards: Token Gradient Cancellation
- arXiv: 2604.13088 (cross-listed)
- Authors: Fei Ding, Yongkang Zhang, youwei wang, Zijian Zeng
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, Optimization
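- Summary: From the perspective of token-level credit assignment, this paper derives a necessary condition for designing intra-group learning algorithms: the intra-group objective must preserve gradient exchangeability across token updates, so that gradients cancel on weak-credit, high-frequency tokens.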
[48] ECM Contracts: Contract-Aware, Versioned, and Governable Capability Interfaces for Embodied Agents
- arXiv: 2604.13097 (cross-listed)
- Authors: Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
- Subjects: cs.SE; cs.AI
- Tags: Embodied AI, Robotics
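- Summary: This paper proposes ECM Contracts, a contract-based interface model for embodied capability modules that encodes six dimensions, including functional signatures, behavioral assumptions, and resource requirements, enabling safe module composition and version management.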
[49] Contract-Coding: Towards Repo-Level Generation via Structured Symbolic Paradigm
- arXiv: 2604.13100 (cross-listed)
- Authors: Yi Lin, Lujin Zhao, Yijie Shi
- Subjects: cs.SE; cs.AI
- Tags: Repo-Level Code Generation, LLM Agent, Software Engineering
- Code: code
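- Summary: This paper proposes Contract-Coding, a structured symbolic paradigm that uses autonomous symbolic grounding to turn ambiguous intent into formal language contracts, addressing the context-fidelity trade-off in repository-level code generation. By enforcing topological independence, it enables architectural parallelism and achieves a 47% functional success rate on the Greenfield-5 benchmark.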
[50] Building Trust in the Skies: A Knowledge-Grounded LLM-based Framework for Aviation Safety
- arXiv: 2604.13101 (cross-listed)
- Authors: Anirudh Iyengar, Alisa Tiselska, Dumindu Samaraweera, Hong Liu
- Subjects: cs.SE; cs.AI
- Tags: RAG, Knowledge Graph, LLM Hallucination
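- Summary: This paper proposes an end-to-end framework that combines large language models with knowledge graphs for aviation safety decision-making. It first uses an LLM to build an aviation safety knowledge graph, then uses a retrieval-augmented generation (RAG) architecture to validate and explain LLM-generated responses, improving accuracy and traceability.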
[51] CCCE: A Continuous Code Calibration Engine for Autonomous Enterprise Codebase Maintenance via Knowledge Graph Traversal and Adaptive Decision Gating
- arXiv: 2604.13102 (cross-listed)
- Authors: Santhosh Kusuma Kumar Parimi
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Knowledge Graph, Software Engineering
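- Summary: This paper proposes the Continuous Code Calibration Engine (CCCE), an event-driven AI agent system that autonomously maintains enterprise codebases through a dynamic knowledge graph, an adaptive gating framework, and a multi-model continual learning architecture. The system generates atomic, semantically validated patches and substantially reduces mean time to repair.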
[52] Can Coding Agents Be General Agents?
- arXiv: 2604.13107 (cross-listed)
- Authors: Maksim Ivanov, Abhijay Rana, Gokul Prabhakaran
- Subjects: cs.SE; cs.AI; cs.LG
- Tags: LLM Agent, Code Generation
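- Summary: This study investigates whether coding agents can generalize to end-to-end business process automation tasks. Agents reliably complete simple tasks but show characteristic failures on complex ones, indicating that bridging domain logic and code execution is the key bottleneck for generalization.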
[53] Formal Architecture Descriptors as Navigation Primitives for AI Coding Agents
- arXiv: 2604.13108 (cross-listed)
- Authors: Ruoqi Jin
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Software Engineering
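- Summary: This paper examines whether formal architecture descriptors can reduce the navigation overhead of AI coding agents. Experiments show that architectural context substantially reduces navigation steps and behavioral variance, and that automatically generated descriptors reach 100% accuracy on code localization tasks.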
[54] Applying an Agentic Coding Tool for Improving Published Algorithm Implementations
- arXiv: 2604.13109 (cross-listed)
- Authors: Worasait Suwannik
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Code Generation
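- Summary: This paper presents a two-stage pipeline that uses an LLM with research capabilities to identify published algorithms and then uses Claude Code to iteratively improve their implementations. All eleven experiments achieved improvements within one day, and the paper also analyzes the indispensable human contributions to the process.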
[55] The Code Whisperer: LLM and Graph-Based AI for Smell and Vulnerability Resolution
- arXiv: 2604.13114 (cross-listed)
- Authors: Mohammad Baqar, Raji Rustamov, Alexander Hughes
- Subjects: cs.SE; cs.AI
- Tags: Program Repair, Software Engineering, Cybersecurity
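- Summary: This paper presents The Code Whisperer, a hybrid framework that combines graph-based program analysis with large language models to detect, explain, and fix code smells and software vulnerabilities in a unified workflow. By jointly learning structural and semantic signals, it improves detection performance and produces more useful fix suggestions.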
[56] AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering
- arXiv: 2604.13120 (cross-listed)
- Authors: Rajesh Kumar, Waqar Ali, Junaid Ahmed, Najma Imtiaz Ali, Shaban Usman
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Multi-Agent System, Code Generation
- Code: code
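- Summary: This paper introduces AgentForge, a multi-agent framework that enforces sandboxed execution to verify the correctness of code changes. With roles including planner, coder, and tester, it achieves a 40.0% resolution rate on SWE-bench Lite, significantly outperforming single-agent baselines.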
[57] Spectral Entropy Collapse as an Empirical Signature of Delayed Generalisation in Grokking
- arXiv: 2604.13123 (cross-listed)
- Authors: Truong Xuan Khanh, Truong Quynh Hoa, Luu Duc Trung, Phan Thanh Duc
- Subjects: cs.LG; cs.AI
- Tags: Grokking, Deep Learning Theory, Representation Learning
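- Summary: This paper identifies the normalized spectral entropy of the representation covariance as a scalar order parameter for the grokking (delayed generalization) transition. Entropy collapse precedes generalization, and causal interventions confirm that entropy collapse, rather than the norm, is a key driver of the transition.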
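The quantity tracked in this paper can be computed roughly as follows, assuming a matrix of hidden representations (one row per example), the sample covariance, and normalization by the log of the representation dimension so the value lies in [0, 1]; the paper's exact normalization may differ. Isotropic representations give 1, while collapse onto a single direction gives 0.

```python
import numpy as np

def normalized_spectral_entropy(H: np.ndarray) -> float:
    """Shannon entropy of the normalized covariance eigenvalues, divided by log(dim)."""
    Hc = H - H.mean(axis=0, keepdims=True)
    cov = Hc.T @ Hc / (H.shape[0] - 1)
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # remove tiny negative round-off
    p = eig / eig.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(cov.shape[0]))

isotropic = np.vstack([np.eye(3), -np.eye(3)])                   # equal variance in every direction
collapsed = np.outer(np.arange(5.0), np.array([1.0, 2.0, 3.0]))  # all rows lie on one line
print(normalized_spectral_entropy(isotropic), normalized_spectral_entropy(collapsed))
```

Tracking this value over training checkpoints would give the kind of curve in which a drop ("entropy collapse") can be compared against the timing of the jump in test accuracy.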
[58] Graph Propagated Projection Unlearning: A Unified Framework for Vision and Audio Discriminative Models
- arXiv: 2604.13127 (cross-listed)
- Authors: Shreyansh Pathak, Jyotishman Das
- Subjects: cs.CV; cs.AI; cs.SD
- Tags: Machine Unlearning, Computer Vision, Speech Processing
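- Summary: This paper proposes Graph Propagated Projection Unlearning (GPPU), a unified and scalable class-level unlearning algorithm for vision and audio models. It uses graph-based propagation to identify class-specific directions and removes target-class information by projecting onto their orthogonal subspace, while preserving the model's utility on the retained classes.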
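The projection step can be illustrated with plain linear algebra. The sketch below assumes the class-specific directions are simply the top right-singular vectors of the forget-class features (GPPU's graph-based propagation for finding them is not reproduced) and folds the orthogonal projector P = I - U U^T into the weights of the layer that consumes those features, so that W' x = W (P x).

```python
import numpy as np

def class_subspace(forget_features: np.ndarray, rank: int = 1) -> np.ndarray:
    """Orthonormal basis (d x rank) for the dominant directions of the forget-class features."""
    _, _, vt = np.linalg.svd(forget_features, full_matrices=False)
    return vt[:rank].T

def project_out(W: np.ndarray, U: np.ndarray) -> np.ndarray:
    """Return W @ (I - U U^T): the layer no longer sees any input component in span(U)."""
    P = np.eye(U.shape[0]) - U @ U.T
    return W @ P

forget = np.array([[2.0, 0.0, 0.0],
                   [3.0, 0.0, 0.0]])  # toy forget-class features along the first axis
U = class_subspace(forget)
W_new = project_out(np.eye(3), U)
print(W_new @ np.array([5.0, 1.0, 2.0]))
```

Because the edit is a closed-form weight update rather than retraining, it is cheap; the trade-off is that any retained-class signal lying in span(U) is removed as well.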
[59] Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization
- arXiv: 2604.13175 (cross-listed)
- Authors: Aadyot Bhatnagar, Peter Mørch Groth, Ali Madani
- Subjects: cs.LG; cs.AI; q-bio.QM
- Tags: Offline RL, Protein Engineering, Optimization
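- Summary: This paper proposes STOMP, an offline reinforcement learning algorithm based on smooth Tchebysheff scalarization for multi-objective alignment. It performs well on protein engineering tasks, effectively optimizing multiple conflicting objectives and achieving the highest hypervolume in several settings.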
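For reference, the smooth Tchebysheff scalarization commonly used in multi-objective optimization replaces the max in the classic Tchebysheff function max_i w_i (f_i - z_i) (for minimized objectives f_i, weights w_i, and ideal point z) with a log-sum-exp at temperature mu. The sketch below computes both; it assumes minimization and does not reproduce how STOMP plugs the scalarization into offline RL. For m objectives, the smooth value always lies between the classic value and the classic value plus mu * log(m).

```python
import numpy as np

def tchebysheff(f, w, z) -> float:
    """Classic Tchebysheff scalarization (minimization): max_i w_i * (f_i - z_i)."""
    return float(np.max(np.asarray(w) * (np.asarray(f) - np.asarray(z))))

def smooth_tchebysheff(f, w, z, mu: float = 0.1) -> float:
    """Smooth variant: mu * log(sum_i exp(w_i * (f_i - z_i) / mu)), computed stably."""
    a = np.asarray(w) * (np.asarray(f) - np.asarray(z)) / mu
    m = a.max()  # subtract the max before exponentiating to avoid overflow
    return float(mu * (m + np.log(np.exp(a - m).sum())))

f, w, z = [1.0, 2.0], [0.5, 0.5], [0.0, 0.0]
print(tchebysheff(f, w, z), smooth_tchebysheff(f, w, z, mu=0.1))
```

Unlike the max, the log-sum-exp is differentiable everywhere, which makes the scalarized objective friendlier to gradient-based training; smaller mu tracks the classic value more closely.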
[60] InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis
- arXiv: 2604.13201 (cross-listed)
- Authors: Oliver Bentham, Vivek Srikumar
- Subjects: cs.CL; cs.AI
- Tags: Benchmark, Scientific Reasoning, LLM Evaluation
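- Summary: This paper introduces InfiniteScienceGym, a procedurally generated scientific analysis benchmark for evaluating how well LLMs reason from empirical data. A simulator generates self-contained repositories and verifiable question-answer pairs, avoiding the publication bias and label noise of existing benchmarks.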
[61] Inclusive Kitchen Design for Older Adults: Generative AI Visualizations to Support Mild Cognitive Impairment
- arXiv: 2604.13203 (cross-listed)
- Authors: Ibrahim Bilau, Nicole Li, Terrence Malayvong, Eunhwa Yang
- Subjects: cs.HC; cs.AI
- Tags: Text-to-Image, Diffusion Model, Accessibility Technology
- Venue: IAFOR Agen 2026
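- Summary: This study builds an AI system that uses a Stable Diffusion model to transform standard kitchen photos into designs suited to older adults with mild cognitive impairment (MCI). In a survey, participants strongly preferred the AI-modified kitchen designs, finding them more cognitively friendly and supportive of aging in place.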
[62] Multitasking Embedding for Embryo Blastocyst Grading Prediction (MEmEBG)
- arXiv: 2604.13217 (cross-listed)
- Authors: Nahid Khoshk Angabini, Mohsen Tajgardan, Mahesh Madhavan, Zahra Asghari Varzaneh, Reza Khoshkangini, Thomas Ebner
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Multi-Task Learning, Image Classification
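- Summary: This paper proposes MEmEBG, a multitask embedding approach for automated analysis and prediction of blastocyst quality. It uses a pretrained ResNet-18 to learn discriminative representations that automatically identify key blastocyst components and their grades, showing promise on a limited dataset.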
[63] Identifiability of Potentially Degenerate Gaussian Mixture Models With Piecewise Affine Mixing
- arXiv: 2604.13218 (cross-listed)
- Authors: Danru Xu, Sébastien Lachapelle, Sara Magliacane
- Subjects: stat.ML; cs.AI; cs.LG; math.ST
- Tags: Causal Representation Learning, Representation Learning
- Venue: AISTATS 2026
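- Summary: This paper studies the identifiability of causal representation learning when the latent variables follow a potentially degenerate Gaussian mixture and are observed through a piecewise affine mixing function. The authors propose a two-stage method that estimates the latent variables by enforcing sparsity and Gaussianity in the learned representation, supported by theoretical proofs and experiments.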
[64] KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs
- arXiv: 2604.13226 (cross-listed)
- Authors: Chuangtao Chen, Grace Li Zhang, Xunzhao Yin, Cheng Zhuo, Bing Li, Ulf Schlichtmann
- Subjects: cs.LG; cs.AI
- Tags: KV Cache, LLM Inference, Memory Architecture
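- Summary: This paper proposes KV Packet, a recomputation-free cache reuse framework that treats cached documents as immutable "packets" and uses lightweight trainable soft-token adapters to handle context discontinuities. Experiments show that it preserves F1 scores while substantially reducing computational overhead and time to first token (TTFT).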
[65] SemiFA: An Agentic Multi-Modal Framework for Autonomous Semiconductor Failure Analysis Report Generation
- arXiv: 2604.13236 (cross-listed)
- Authors: Shivam Chand Kaushik
- Subjects: cs.CV; cs.AI; eess.IV
- Tags: LLM Agent, Multimodal Learning, Manufacturing AI
- Code: code
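- Summary: This paper proposes SemiFA, a multimodal agentic framework that autonomously generates structured semiconductor failure analysis reports in under a minute. It uses a four-agent LangGraph pipeline that combines a vision-language model, equipment telemetry, and historical defect retrieval, reaching 92.1% classification accuracy.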
[66] On the Creativity of AI Agents
- arXiv: 2604.13242 (cross-listed)
- Authors: Giorgio Franceschelli, Mirco Musolesi
- Subjects: cs.CY; cs.AI
- Tags: LLM Agent, Cognitive Science
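- Summary: This paper analyzes the creativity of AI agents from two complementary perspectives, functionalist and ontological, arguing that LLM agents exhibit functionalist creativity but lack key aspects of ontological creativity, and discusses paths toward artificial creativity along with its potential risks.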
[67] Lazy or Efficient? Towards Accessible Eye-Tracking Event Detection Using LLMs
- arXiv: 2604.13243 (cross-listed)
- Authors: Dongyang Guo, Yasmeen Abdrabou, Enkelejda Kasneci
- Subjects: cs.HC; cs.AI
- Tags: Prompt Engineering, Human-Computer Interaction
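- Summary: This paper introduces a no-code, LLM-driven pipeline that turns natural-language instructions into end-to-end eye-tracking analyses. It matches the accuracy of traditional methods while greatly lowering the technical barrier to entry.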
[68] 4th Workshop on Maritime Computer Vision (MaCVi): Challenge Overview
- arXiv: 2604.13244 (cross-listed)
- Authors: Benjamin Kiefer, Jan Lukas Augustin, Jon Muhovič, Mingi Jeong, Arnold Wiliem, Janez Pers, Matej Kristan, Alberto Quattrini Li, Matija Teršek, Josip Šarić, Arpita Vats, Dominik Hildebrand, Rafia Rahim, Mahmut Karaaslan, Arpit Vaishya, Steve Xie, Ersin Kaya, Akib Mashrur, Tze-Hsiang Tang, Chun-Ming Tsai, Jun-Wei Hsieh, Ming-Ching Chang, Wonwoo Jo, Doyeon Lee, Yusi Cao, Lingling Li, Vinayak Nageli, Arshad Jamal, Gorthi Rama Krishna Sai Subrahmanyam, Jemo Maeng, Seongju Lee, Kyoobin Lee, Xu Liu, LiCheng Jiao, Jannik Sheikh, Martin Weinmann, Ivan Martinović, Jose Mateus Raitz Persch, Rahul Harsha Cheppally, Mehmet E. Belviranli, Dimitris Gahtidis, Hyewon Chun, Sangmun Lee, Philipp Gorczak, Hansol Kim, Jeeyeon Jeon, Borja Carrillo Perez, Jiahui Wang, Sangmin Park, Andreas Michel, Jannick Kuester, Bettina Felten, Wolfgang Gross, Yuan Feng, Justin Davis
- Subjects: cs.CV; cs.AI; cs.RO
- Tags: Computer Vision, Object Detection, Benchmark
- Venue: CVPR 2026 Workshop
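- Summary: This paper summarizes the five benchmark challenges of the 4th Workshop on Maritime Computer Vision (MaCVi) at CVPR 2026, which assess both prediction accuracy and real-time feasibility on embedded hardware, and presents quantitative results together with an analysis of method trends across the challenges.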
[69] GeoVision-Enabled Digital Twin for Hybrid Autonomous-Teleoperated Medical Responses
- arXiv: 2604.13248 (cross-listed)
- Authors: Parham Kebria, Soheil Sabri, Laura J Brattain
- Subjects: cs.RO; cs.AI
- Tags: Digital Twin, Medical AI, Robotics
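- Summary: This paper proposes a digital twin architecture for hybrid autonomous-teleoperated medical response systems that integrates perception and adaptive navigation, enhancing situational awareness and decision-making through a virtual representation synchronized in real time.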
[70] Out of Context: Reliability in Multimodal Anomaly Detection Requires Contextual Inference
- arXiv: 2604.13252 (cross-listed)
- Authors: Kevin Wilkinghoff, Neelu Madan, Juan Miguel Valverde, Kamal Nasrollahi, Radu Tudor Ionescu, Rafal Wisniewski, Thomas B. Moeslund, Wenwu Wang, Zheng-Hua Tan
- Subjects: cs.LG; cs.AI
- Tags: Anomaly Detection, Multimodal Learning
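- Summary: This paper argues that multimodal anomaly detection should be reframed as a cross-modal contextual inference problem in which modalities play asymmetric roles, separating context from observation, so that anomalies are defined conditionally rather than against a single global reference.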
[71] Hessian-Enhanced Token Attribution (HETA): Interpreting Autoregressive LLMs
- arXiv: 2604.13258 (cross-listed)
- Authors: Vishal Pramanik, Maisha Maliha, Nathaniel D. Bastian, Sumit Kumar Jha
- Subjects: cs.CL; cs.AI
- Tags: Interpretability, LLM Reasoning
- Venue: ICLR 2026
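- Summary: This paper proposes HETA, a novel attribution framework for decoder-only language models that combines semantic transition vectors, Hessian-based sensitivity scores, and KL divergence to produce context-aware and causally faithful attributions.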
[72] Rethinking Uncertainty in Segmentation: From Estimation to Decision
- arXiv: 2604.13262 (cross-listed)
- Authors: Saket Maganti
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Segmentation, Uncertainty Estimation, Medical AI
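- Summary: This paper models segmentation as a two-stage estimation-and-decision pipeline and proposes a confidence-aware deferral rule that eliminates up to 80% of segmentation errors while deferring only 25% of pixels, revealing a disconnect between standard uncertainty metrics and actual decision utility.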
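A confidence-aware deferral rule of this kind can be sketched as: rank pixels by model confidence, hand the least confident fraction (here an assumed 25% budget) to a human or a stronger model, and measure what share of the segmentation errors falls inside the deferred set. Using per-pixel max(p, 1 - p) as "confidence" is an assumption for this binary toy; the paper's rule may use a different score.

```python
import numpy as np

def defer_mask(confidence: np.ndarray, budget: float = 0.25) -> np.ndarray:
    """Boolean mask marking the `budget` fraction of pixels with the lowest confidence."""
    flat = confidence.ravel()
    n_defer = int(round(budget * flat.size))
    mask = np.zeros(flat.size, dtype=bool)
    if n_defer > 0:
        # argpartition places the n_defer smallest confidences first (ties broken arbitrarily).
        mask[np.argpartition(flat, n_defer - 1)[:n_defer]] = True
    return mask.reshape(confidence.shape)

def error_capture_rate(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray) -> float:
    """Fraction of wrong pixels that the deferral mask sends for review."""
    errors = pred != gt
    return float((errors & mask).sum() / max(errors.sum(), 1))

# Toy 2x2 foreground-probability map; confidence = max(p, 1 - p).
probs = np.array([[0.9, 0.45], [0.2, 0.97]])
confidence = np.maximum(probs, 1 - probs)
pred = (probs > 0.5).astype(int)
gt = np.array([[1, 1], [0, 1]])
mask = defer_mask(confidence, budget=0.25)
print(mask, error_capture_rate(pred, gt, mask))
```

Evaluating a rule by the share of errors it captures at a fixed budget measures decision utility directly, which is the distinction the summary draws against standard uncertainty metrics.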
[73] Explainable Fall Detection for Elderly Care via Temporally Stable SHAP in Skeleton-Based Human Activity Recognition
- arXiv: 2604.13279 (cross-listed)
- Authors: Mohammad Saleh, Azadeh Tabatabaei
- Subjects: cs.CV; cs.AI
- Tags: Human Activity Recognition, Explainable AI, Medical AI
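- Summary: This paper proposes a lightweight, explainable fall-detection framework that combines an LSTM model with T-SHAP, a temporally aware post-hoc aggregation strategy, achieving 94.3% classification accuracy on the NTU RGB+D dataset while providing clinically credible explanations.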
[74] L2D-Clinical: Learning to Defer for Adaptive Model Selection in Clinical Text Classification
- arXiv: 2604.13285 (cross-listed)
- Authors: Rishik Kondadadi, John E. Ortega
- Subjects: cs.CL; cs.AI
- Tags: Text Classification, Medical AI, Decision Making
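- Summary: This paper proposes L2D-Clinical, a framework that learns when to defer from a BERT classifier to an LLM, making adaptive deferral decisions based on uncertainty signals and text features. On clinical text classification it significantly improves accuracy while reducing API costs.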
[75] English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training
- arXiv: 2604.13286 (cross-listed)
- Authors: Mehak Dhaliwal, Shashwat Chaurasia, Yao Qin, Dezhi Hong, Thomas Butler
- Subjects: cs.CL; cs.AI
- Tags: Multilingual Learning, Instruction Tuning, LLM Training
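- Summary: Based on 220 supervised fine-tuning experiments, this paper systematically studies the role of multilinguality in LLM post-training. Broader language coverage helps across tasks and model scales, low-resource languages benefit most, and adding even a single non-English language improves performance.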
[76] Giving Voice to the Constitution: Low-Resource Text-to-Speech for Quechua and Spanish Using a Bilingual Legal Corpus
- arXiv: 2604.13288 (cross-listed)
- Authors: John E. Ortega, Rodolfo Zevallos, Fabricio Carraro
- Subjects: cs.CL; cs.AI; cs.DL
- Tags: Speech Synthesis, Low-Resource NLP, Multilingual Learning
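- Summary: This paper presents a unified speech synthesis pipeline that uses three state-of-the-art TTS architectures to generate high-quality Quechua and Spanish speech for the Peruvian Constitution. It leverages cross-lingual transfer to mitigate data scarcity for Quechua.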
[77] Can Cross-Layer Transcoders Replace Vision Transformer Activations? An Interpretable Perspective on Vision
- arXiv: 2604.13304 (cross-listed)
- Authors: Gerasimos Chatzoudis, Konstantinos D. Polyzos, Zhuowei Li, Difei Gu, Gemma E. Moran, Hao Wang, Dimitris N. Metaxas
- Subjects: cs.CV; cs.AI
- Tags: Vision Transformer, Interpretability
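- Summary: This paper introduces cross-layer transcoders as sparse, depth-aware surrogate models for the MLP blocks in Vision Transformers. They achieve high reconstruction fidelity while providing faithful attributions and process-level interpretability.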
[78] Beyond Uniform Sampling: Synergistic Active Learning and Input Denoising for Robust Neural Operators
- arXiv: 2604.13316 (cross-listed)
- Authors: Samrendra Roy, Souvik Chakraborty, Syed Bahauddin Alam
- Subjects: cs.LG; cs.AI
- Tags: Neural Operator, Adversarial Robustness, Scientific Computing
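- Summary: This paper proposes a synergistic defense that combines active-learning-based data generation with an input-denoising architecture to improve the robustness of neural operators. It achieves an 87% error reduction on the viscous Burgers' equation benchmark.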
[79] Finetuning-Free Diffusion Model with Adaptive Constraint Guidance for Inorganic Crystal Structure Generation
- arXiv: 2604.13354 (cross-listed)
- Authors: Auguste de Lambilly, Vladimir Baturin, David Portehault, Guillaume Lambard, Nataliya Sokolovska, Florence d'Alché-Buc, Jean-Claude Crivello
- Subjects: cs.AI
- Tags: Diffusion Model, Material Discovery, Molecular Generation
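- Summary: This paper proposes a diffusion-based adaptive constraint guidance framework for generating inorganic crystal structures. It uses a graph neural network estimator and convex hull analysis for multi-step validation, producing thermodynamically plausible structures that satisfy target geometric constraints.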
[80] Peer-Predictive Self-Training for Language Model Reasoning
- arXiv: 2604.13356 (cross-listed)
- Authors: Shi Feng, Hanlin Zhang, Fan Nie, Sham Kakade, Yiling Chen
- Subjects: cs.CL; cs.AI; cs.GT
- Tags: LLM Reasoning, Mathematical Reasoning, Self-Supervised Learning
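- Summary: This paper proposes PST, a label-free fine-tuning method in which multiple language models improve collaboratively by using cross-model aggregated responses as an internal training signal. It raises accuracy on mathematical reasoning benchmarks by 2.2 to 4.3 percentage points.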
[81] A 3D SAM-Based Progressive Prompting Framework for Multi-Task Segmentation of Radiotherapy-induced Normal Tissue Injuries in Limited-Data Settings
- arXiv: 2604.13367 (cross-listed)
- Authors: Caiwen Jiang, Lei Zeng, Wei Liu
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Image Segmentation, 3D Vision
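- Summary: This paper proposes a 3D SAM-based progressive prompting framework for multi-task segmentation of radiotherapy-induced normal tissue injuries in limited-data settings. The framework progressively integrates text prompts, dose-guided bounding-box prompts, and click prompts. It also introduces a small-target focus loss to improve segmentation of small, sparse lesions.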
[82] Young people's perceptions and recommendations for conversational generative artificial intelligence in youth mental health
- arXiv: 2604.13381 (cross-listed)
- Authors: Adam Poulsen, Ian B. Hickie, Carla Gorban, Zsofi de Haan, William Capon, Ebenezer Eyeson-Annan, Jalal Radwan, Elizabeth M. Scott, Frank Iorfino, Haley M. LaMonica
- Subjects: cs.HC; cs.AI
- Tags: Dialogue System, AI Ethics, Human-Computer Interaction
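- Summary: This study explores young people's perceptions of, and recommendations for, generative AI chatbots in youth mental health. Using a co-design approach, 32 young people took part in online workshops. The workshops yielded four themes that offer key insights into the ethics, design, development, and governance of generative AI chatbots in youth mental health services.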
[83] On the Use of Evolutionary Optimization for the Dynamic Chance Constrained Open-Pit Mine Scheduling Problem
- arXiv: 2604.13385 (cross-listed)
- Authors: Ishara Hewa Pathiranage, Aneta Neumann
- Subjects: cs.NE; cs.AI
- Tags: Optimization, Evolutionary Computation
- Venue: WCCI 2026
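- Summary: This paper studies a dynamic chance-constrained open-pit mine scheduling problem in which block economic values are stochastic and mining and processing capacities change over time. The authors propose a diversity-based change-response mechanism that, upon detecting a change, repairs infeasible solutions and injects additional feasible ones. Experiments show that it consistently outperforms baseline methods.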
[84] From Prediction to Justification: Aligning Sentiment Reasoning with Human Rationale via Reinforcement Learning
- arXiv: 2604.13398 (cross-listed)
- Authors: Shihao Zhang, Ziwei Wang, Jie Zhou, Yulan Wu, Qin Chen, Zhikai Lei, Liyang Yu, Liang Dou, Liang He
- Subjects: cs.CL; cs.AI
- Tags: Sentiment Analysis, Reinforcement Learning, LLM Reasoning
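- Summary: This paper proposes ABSA-R1, a framework that uses reinforcement learning to teach large language models a "reason first, then predict" cognitive process, generating natural-language explanations to support sentiment predictions. Experiments show that this explicit reasoning improves interpretability. It also outperforms non-reasoning baselines on sentiment classification and triplet extraction.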
[85] Minimax Optimality and Spectral Routing for Majority-Vote Ensembles under Markov Dependence
- arXiv: 2604.13414 (cross-listed)
- Authors: Ibne Farabi Shihab, Sanjeda Akter, Anuj Sharma
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Optimization, Deep Learning Theory
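- Summary: This paper studies the performance degradation of majority-vote ensembles when training data exhibit Markov dependence. The authors give a minimax characterization for discrete classification in a fixed-dimensional Markov setting. They also propose an adaptive spectral routing algorithm that partitions the training data using the Fiedler eigenvector of the dependence graph, attaining the minimax rate on a graph-regular subclass.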
[86] DF3DV-1K: A Large-Scale Dataset and Benchmark for Distractor-Free Novel View Synthesis
- arXiv: 2604.13416 (cross-listed)
- Authors: Cheng-You Lu, Yi-Shan Hung, Wei-Ling Chi, Hao-Ping Wang, Charlie Li-Ting Tsai, Yu-Cheng Chang, Yu-Lun Liu, Thomas Do, Chin-Teng Lin
- Subjects: cs.CV; cs.AI
- Tags: Novel View Synthesis, Dataset, 3D Vision
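- Summary: This paper introduces DF3DV-1K, a large-scale real-world dataset of 1,048 scenes for benchmarking distractor-free novel view synthesis. Each scene provides both a clean and a cluttered image set. The dataset contains 89,924 images covering 128 distractor types and 161 scene themes and is used to evaluate distractor-free radiance field methods.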
[87] The Cognitive Circuit Breaker: A Systems Engineering Framework for Intrinsic AI Reliability
- arXiv: 2604.13417 (cross-listed)
- Authors: Jonathan Pan
- Subjects: cs.SE; cs.AI
- Tags: LLM Hallucination, Interpretability
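- Summary: This paper proposes the Cognitive Circuit Breaker, a framework that extracts hidden states during the model's forward pass to compute a "cognitive dissonance delta". This is the mathematical gap between an LLM's extrinsic semantic confidence and its intrinsic latent certainty. The method detects hallucinations and "faked truthfulness" with minimal latency and negligible additional compute on the active inference pipeline.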
[88] MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments
- arXiv: 2604.13418 (cross-listed)
- Authors: Han Wang, David Wan, Hyunji Lee, Thinh Pham, Mikaela Cankosyan, Weiyuan Chen, Elias Stengel-Eskin, Tu Vu, Mohit Bansal
- Subjects: cs.CL; cs.AI; cs.CV
- Tags: Benchmark, Multimodal Learning, Information Retrieval
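- Summary: This paper introduces MERRIN, a benchmark for evaluating search-augmented AI agents in noisy web environments. It tests how well agents identify relevant modalities, retrieve multimodal evidence, and perform multi-hop reasoning. The benchmark uses natural-language queries and includes underexplored modalities such as video and audio. It requires retrieving complex, noisy, or conflicting multimodal evidence during web search.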
[89] Event-Adaptive State Transition and Gated Fusion for RGB-Event Object Tracking
- arXiv: 2604.13426 (cross-listed)
- Authors: Jinlin You, Muyu Li, Xudong Zhao
- Subjects: cs.CV; cs.AI
- Tags: Object Tracking, Multimodal Learning
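- Summary: This paper proposes MambaTrack, an efficient multimodal tracking framework based on dynamic state space models. It introduces an event-adaptive state transition mechanism that modulates the state transition matrix according to event-stream density. It also adds a gated projection fusion module for robust cross-modal integration, achieving state-of-the-art performance on the FE108 and FELT datasets.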
[90] A Unified Conditional Flow for Motion Generation, Editing, and Intra-Structural Retargeting
- arXiv: 2604.13427 (cross-listed)
- Authors: Junlin Li, Xinhao Song, Siqi Wang, Haibin Huang, Yili Zhao
- Subjects: cs.GR; cs.AI; cs.CV
- Tags: Motion Synthesis, Flow Matching
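- Summary: This paper proposes a unified view that treats both text-driven motion editing and intra-structural retargeting as instances of conditional transport within a single generative framework. Using flow matching, a single trained model supports text-to-motion generation, zero-shot editing, and zero-shot intra-structural retargeting. This simplifies deployment and improves structural consistency.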
[91] MaMe & MaRe: Matrix-Based Token Merging and Restoration for Efficient Visual Perception and Synthesis
- arXiv: 2604.13432 (cross-listed)
- Authors: Simin Huo, Ning Li
- Subjects: cs.CV; cs.AI
- Tags: Vision Transformer, Model Compression
- Venue: CVPR 2026 Findings
- Code: code
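- Summary: This paper introduces MaMe, a training-free, differentiable token merging method built entirely on matrix operations, and its inverse, MaRe, for token restoration. Applied to pretrained models, MaMe doubles ViT-B throughput with only a 2% accuracy drop. For image synthesis, the MaMe+MaRe pipeline reduces Stable Diffusion v2.1 generation latency by 31% while improving quality.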
[92] A KL Lens on Quantization: Fast, Forward-Only Sensitivity for Mixed-Precision SSM-Transformer Models
- arXiv: 2604.13440 (cross-listed)
- Authors: Jason Kong, Nilesh Prasad Pandey, Flavio Ponzina, Tajana Rosing
- Subjects: cs.LG; cs.AI
- Tags: Quantization, LLM Inference
- Code: code
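- Summary: This paper proposes a lightweight, backpropagation-free sensitivity analysis framework for identifying the components of hybrid SSM-Transformer models most vulnerable to quantization. Relying only on forward-pass metrics, it shows that KL divergence captures quantization sensitivity in language modeling better than MSE and SQNR, enabling efficient deployment on edge devices.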
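A minimal, forward-only sketch of KL-based quantization sensitivity in the spirit of [92]. It assumes a toy linear layer followed by a softmax and applies per-row symmetric uniform fake-quantization. It then averages KL(full precision || quantized) over a few inputs. All function names are hypothetical, and the paper's hybrid SSM-Transformer setup is not reproduced.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def quantize_symmetric(weights, bits):
    """Symmetric uniform fake-quantization of one weight row (requires bits >= 2)."""
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = 2 ** (bits - 1) - 1
    scale = max_abs / levels
    return [round(w / scale) * scale for w in weights]


def matvec(matrix, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in matrix]


def kl_sensitivity(matrix, inputs, bits):
    """Average KL between full-precision and quantized output distributions.

    Only forward passes are used; no gradients or backpropagation are needed.
    """
    q_matrix = [quantize_symmetric(row, bits) for row in matrix]  # per-row (per-channel)
    total = 0.0
    for x in inputs:
        p = softmax(matvec(matrix, x))
        q = softmax(matvec(q_matrix, x))
        total += kl_divergence(p, q)
    return total / len(inputs)
```

A higher score means the component is more sensitive. Ranking components by this score is one plausible way to decide which ones keep higher precision in a mixed-precision plan.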
[93] A Study of Failure Modes in Two-Stage Human-Object Interaction Detection
- arXiv: 2604.13448 (cross-listed)
- Authors: Lemeng Wang, Qinqian Lei, Vidhi Bakshi, Daniel Yi, Yifan Liu, Jiacheng Hou, Asher Seng Hao, Zheda Mai, Wei-Lun Chao, Robby T. Tan, Bo Wang
- Subjects: cs.CV; cs.AI
- Tags: Human-Object Interaction, Object Detection
- Venue: CVPR 2026 Workshop
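- Summary: This paper studies failure modes of two-stage human-object interaction (HOI) detection models. The authors organize image subsets from existing HOI datasets by human-object-interaction configuration and analyze how models behave on each configuration. They find that high overall benchmark performance does not necessarily reflect robust visual reasoning about human-object relationships.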
[94] Outperforming Self-Attention Mechanisms in Solar Irradiance Forecasting via Physics-Guided Neural Networks
- arXiv: 2604.13455 (cross-listed)
- Authors: Mohammed Ezzaldin Babiker Abdullah, Rufaidah Abdallah Ibrahim Mohammed
- Subjects: cs.LG; cs.AI; eess.SY
- Tags: Time Series Forecasting, Physics-Informed Learning
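- Summary: This paper proposes a lightweight, physics-informed hybrid CNN-BiLSTM framework for global horizontal irradiance forecasting. It integrates 15 engineered features rather than relying solely on raw historical data. Experiments show an RMSE of 19.53 W/m², significantly outperforming more complex attention-based baselines. This indicates that, for high-noise meteorological tasks, explicit physical constraints are more efficient and accurate than self-attention.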
[95] Asymmetric-Loss-Guided Hybrid CNN-BiLSTM-Attention Model for Industrial RUL Prediction with Interpretable Failure Heatmaps
- arXiv: 2604.13459 (cross-listed)
- Authors: Mohammed Ezzaldin Babiker Abdullah
- Subjects: cs.LG; cs.AI; eess.SY
- Tags: Predictive Maintenance, Time Series Forecasting
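- Summary: This paper proposes a hybrid architecture for remaining useful life (RUL) prediction of turbofan engines. It combines a two-stage 1D CNN, a bidirectional LSTM, and a custom Bahdanau additive attention mechanism. The model uses an asymmetric exponential loss that penalizes RUL overestimation more heavily to respect industrial safety constraints. It also provides interpretable failure heatmaps to support maintenance decisions.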
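This summary does not give the exact loss used in [95]. To illustrate the asymmetric exponential shape, the sketch below follows the scoring function commonly used with the C-MAPSS turbofan benchmark, with d = predicted RUL minus true RUL. Late (over-estimated) predictions use the smaller scale and are therefore penalized more. The constants 13 and 10 come from that benchmark score, not necessarily from the paper.

```python
import math


def asymmetric_exp_penalty(pred_rul, true_rul, a_early=13.0, a_late=10.0):
    """Exponential penalty that grows faster when RUL is over-estimated."""
    d = pred_rul - true_rul
    if d < 0:
        # Early prediction (under-estimate): milder penalty.
        return math.exp(-d / a_early) - 1.0
    # Late prediction (over-estimate): harsher penalty, the unsafe case.
    return math.exp(d / a_late) - 1.0


def mean_asymmetric_loss(preds, targets):
    """Average penalty over a batch of RUL predictions."""
    return sum(asymmetric_exp_penalty(p, t) for p, t in zip(preds, targets)) / len(preds)
```

An error of the same magnitude costs more when the model claims the engine will last longer than it actually does.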
[96] From Order to Distribution: A Spectral Characterization of Forgetting in Continual Learning
- arXiv: 2604.13460 (cross-listed)
- Authors: Zonghuan Xu, Xingjun Ma
- Subjects: cs.LG; cs.AI
- Tags: Continual Learning, Deep Learning Theory
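- Summary: This paper studies forgetting in continual learning by shifting from an order-based to a distribution-based view. The setting is an exact-fit linear regime in which tasks are sampled i.i.d. from a distribution. The authors derive an exact operator identity for the amount of forgetting and reveal a recursive spectral structure. They also show how the geometry of the task distribution drives slow or fast forgetting.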
[97] Learning from Change: Predictive Models for Incident Prevention in a Regulated IT Environment
- arXiv: 2604.13462 (cross-listed)
- Authors: Eileen Kapel, Jan Lennartz, Luis Cruz, Diomidis Spinellis, Arie van Deursen
- Subjects: cs.SE; cs.AI; cs.CE; cs.LG
- Tags: Predictive Maintenance, Explainable AI, Software Engineering
- Venue: ICSE-SEIP 2026
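- Summary: This paper proposes an approach for predicting the risk of IT changes at a large international bank. It uses machine learning models such as LightGBM together with SHAP values for explainability, and it outperforms rule-based methods while meeting regulatory compliance requirements.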
[98] Functional Emotions or Situational Contexts? A Discriminating Test from the Mythos Preview System Card
- arXiv: 2604.13466 (cross-listed)
- Authors: Hiranya V. Peiris
- Subjects: cs.HC; cs.AI; cs.CL; cs.LG
- Tags: LLM Alignment, Interpretability, LLM Security
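- Summary: This paper analyzes the emotion vectors reported in the Claude Mythos Preview system card and contrasts two hypotheses. The first is that the vectors track functional emotions. The second is that they project situational contexts onto human emotion axes. The paper proposes a test to discriminate between the two.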
[99] Bridging MARL to SARL: An Order-Independent Multi-Agent Transformer via Latent Consensus
- arXiv: 2604.13472 (cross-listed)
- Authors: Zijian Zhao, Jing Gao, Sen Li
- Subjects: cs.LG; cs.AI; cs.MA
- Tags: Multi-Agent System, Reinforcement Learning, Transformer
- Code: code
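- Summary: This paper proposes CMAT (Consensus Multi-Agent Transformer), a centralized framework that bridges cooperative multi-agent reinforcement learning to hierarchical single-agent reinforcement learning. It achieves order-independent joint decision-making through a latent consensus vector.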
[100] Secure and Privacy-Preserving Vertical Federated Learning
- arXiv: 2604.13474 (cross-listed)
- Authors: Shan Jin, Sai Rahul Rachuri, Yizhen Wang, Anderson C.A. Nascimento, Yiwei Cai
- Subjects: cs.CR; cs.AI; cs.DC
- Tags: Federated Learning, Privacy, Differential Privacy
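- Summary: This paper proposes an end-to-end privacy-preserving framework for vertical federated learning. Multiple servers run secure multi-party computation protocols for model and feature aggregation, and differential privacy is applied to the final model.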
[101] Monthly Diffusion v0.9: A Latent Diffusion Model for the First AI-MIP
- arXiv: 2604.13481 (cross-listed)
- Authors: Kyle J. C. Hall, Maria J. Molina
- Subjects: cs.LG; cs.AI
- Tags: Diffusion Model, Weather Forecasting, Neural Operator
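- Summary: This paper describes MD-1.5, a climate emulator that combines a CVAE architecture inspired by spherical Fourier neural operators with latent diffusion to emulate low-frequency atmospheric variability. It is designed for data-sparse settings and modest compute requirements.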
[102] Chain of Uncertain Rewards with Large Language Models for Reinforcement Learning
- arXiv: 2604.13504 (cross-listed)
- Authors: Shentong Mo
- Subjects: cs.LG; cs.AI; cs.CL; cs.MA; cs.RO
- Tags: Reinforcement Learning, Reward Design, LLM Reasoning
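- Summary: This paper proposes CoUR, a framework that integrates large language models into reward function design for reinforcement learning. It uses code uncertainty quantification and Bayesian optimization to reduce redundant evaluations and achieves better performance on the IsaacGym and Bidexterous Manipulation benchmarks.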
[103] SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization
- arXiv: 2604.13515 (cross-listed)
- Authors: Xiaole Su, Kasey Zhang, Andy Lyu
- Subjects: cs.LG; cs.AI; cs.LO
- Tags: Instruction Tuning, Mathematical Reasoning, Autoformalization
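- Summary: This paper treats SFT-GRPO data overlap as a post-training hyperparameter for Lean 4 autoformalization. It finds that keeping SFT and GRPO data disjoint consistently outperforms full overlap, and it reveals a compilation-semantics gap.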
[104] Representation over Routing: Overcoming Surrogate Hacking in Multi-Timescale PPO
- arXiv: 2604.13517 (cross-listed)
- Authors: Jing Sun
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, Optimization
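- Summary: This paper identifies algorithmic pathologies of multi-timescale PPO in delayed-reward tasks, including surrogate objective hacking and myopic degeneration. It proposes an objective-decoupling architecture to address them.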
[105] From Alignment to Prediction: A Study of Self-Supervised Learning and Predictive Representation Learning
- arXiv: 2604.13518 (cross-listed)
- Authors: Mintu Dutta, Ritesh Vyas, Mohendra Roy
- Subjects: cs.LG; cs.AI
- Tags: Self-Supervised Learning, Representation Learning
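- Summary: This paper defines predictive representation learning (PRL) as a new category of self-supervised learning that focuses on predicting unobserved components of data from observed ones. It also proposes a taxonomy that classifies PRL alongside alignment-based and reconstruction-based methods.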
[106] C-voting: Confidence-Based Test-Time Voting without Explicit Energy Functions
- arXiv: 2604.13521 (cross-listed)
- Authors: Kenji Kubo, Shunsuke Kamiya, Masanori Koyama, Kohei Hayashi, Yusuke Iwasawa, Yutaka Matsuo
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Test-Time Adaptation
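- Summary: This paper proposes confidence-based voting (C-voting), a test-time scaling strategy for recurrent models. It improves reasoning performance by selecting the candidate with the highest average top-1 probability, without requiring an explicit energy function.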
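A minimal sketch of the selection step summarized in [106]. It assumes each candidate (for example, the output of one run of a recurrent model from a different initialization) comes with one probability vector per output position. It also assumes that "average top-1 probability" means the mean, over positions, of the largest probability. `c_voting` is a hypothetical name, not the authors' code.

```python
def mean_top1(step_distributions):
    """Mean over output positions of the largest class probability."""
    return sum(max(dist) for dist in step_distributions) / len(step_distributions)


def c_voting(candidates):
    """Return the answer whose per-position top-1 probabilities are highest on average.

    candidates: list of (answer, step_distributions) pairs.
    Ties go to the earliest candidate, because max() keeps the first maximum.
    """
    best_answer, _ = max(candidates, key=lambda c: mean_top1(c[1]))
    return best_answer
```

Unlike majority voting, this picks a single most-confident candidate, so it still works when every run produces a different answer.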
[107] Free Lunch for Unified Multimodal Models: Enhancing Generation via Reflective Rectification with Inherent Understanding
- arXiv: 2604.13540 (cross-listed)
- Authors: Yibo Jiang, Tao Wu, Rui Jiang, Yehao Lu, Chaoxiang Cai, Zequn Qin, Xi Li
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Diffusion Model, Multimodal Learning
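- Summary: This paper proposes UniRect-CoT, a training-free unified rectification chain-of-thought framework. It leverages the inherent understanding capability of unified multimodal models to reflect on and rectify outputs during generation, substantially improving generation quality.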
[108] Training-Free Test-Time Contrastive Learning for Large Language Models
- arXiv: 2604.13552 (cross-listed)
- Authors: Kaiwen Zheng, Kai Zhou, Jinwu Hu, Te Gu, Mingkai Peng, Fei Liu
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Test-Time Adaptation, In-Context Learning
- Venue: ACL 2026
- Code: code
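- Summary: This paper proposes TF-TTCL, a training-free test-time contrastive learning framework. An explore-reflect-guide loop lets frozen large language models learn from their own reasoning experience and maintain robust performance under distribution shift.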
[109] CLIP Architecture for Abdominal CT Image-Text Alignment and Zero-Shot Learning: Investigating Batch Composition and Data Scaling
- arXiv: 2604.13561 (cross-listed)
- Authors: Shivika, Kartik Bose, Pankaj Gupta
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Medical AI, Zero-Shot Learning
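- Summary: This paper studies how training batch composition affects Merlin, a model that aligns 3D abdominal CT with radiology reports. It finds that the stochastic diversity of random sampling is more effective than engineered class ratios and that performance scales sublinearly with data size.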
[110] UHR-BAT: Budget-Aware Token Compression Vision-Language model for Ultra-High-Resolution Remote Sensing
- arXiv: 2604.13565 (cross-listed)
- Authors: Yunkai Dang, Minxin Dai, Yuekun Yang, Zhangnan Li, Wenbin Li, Feng Miao, Yang Gao
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Remote Sensing, Token Compression
- Code: code
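- Summary: This paper proposes UHR-BAT, a query-guided token compression framework for ultra-high-resolution remote sensing imagery. It efficiently selects visual tokens under a strict context budget via text-guided multi-scale importance estimation and a region-preserving merging strategy.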
[111] Comparison of window shapes and lengths in short-time feature extraction for classification of heart sound signals
- arXiv: 2604.13567 (cross-listed)
- Authors: Mahmoud Fakhry, Abeer FathAllah Brery
- Subjects: cs.SD; cs.AI
- Tags: Medical AI, Signal Processing, Speech Processing
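- Summary: This paper evaluates how window shape (Gaussian, triangular, rectangular) and window length affect feature extraction and classification of heart sound signals. It finds that a 75 ms Gaussian window performs best and outperforms baseline methods.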
[112] BenGER: A Collaborative Web Platform for End-to-End Benchmarking of German Legal Tasks
- arXiv: 2604.13583 (cross-listed)
- Authors: Sebastian Nagl, Matthias Grabmair
- Subjects: cs.CL; cs.AI
- Tags: Legal AI, LLM Evaluation, Benchmark
- Venue: ICAIL 2026
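- Summary: This paper introduces BenGER, an open-source web platform for end-to-end benchmarking of German legal tasks. It integrates task creation, collaborative annotation, LLM execution, and multi-metric evaluation, and it supports multi-organization projects.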
[113] Design Space Exploration of Hybrid Quantum Neural Networks for Chronic Kidney Disease
- arXiv: 2604.13608 (cross-listed)
- Authors: Muhammad Kashif, Hanzalah Mohamed Siraj, Nouhaila Innan, Alberto Marchisio, Muhammad Shafique
- Subjects: cs.LG; cs.AI
- Tags: Quantum Computing, Medical AI
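- Summary: This paper presents a comprehensive design space exploration of hybrid quantum neural networks (HQNNs) for chronic kidney disease diagnosis. It benchmarks 625 HQNN models built from combinations of encoding schemes, entanglement architectures, measurement strategies, and sampling settings. It finds that compact architectures with suitable encodings achieve the best trade-off among accuracy, robustness, and efficiency.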
[114] Golden Handcuffs make safer AI agents
- arXiv: 2604.13609 (cross-listed)
- Authors: Aram Ebtekar, Michael K. Cohen
- Subjects: cs.LG; cs.AI
- Tags: AI Safety, Reinforcement Learning, LLM Agent
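- Summary: This paper proposes a Bayesian approach to making reinforcement learning agents safer. It extends the range of subjective rewards and adds an override mechanism that hands control to a safe mentor whenever the predicted value falls below a threshold. The authors prove that the agent is both capable (sublinear regret) and safe.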
[115] Syn-TurnTurk: A Synthetic Dataset for Turn-Taking Prediction in Turkish Dialogues
- arXiv: 2604.13620 (cross-listed)
- Authors: Ahmet Tuğrul Bayrak, Mustafa Sertaç Türkel, Fatma Nur Korkmaz
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, Data Synthesis, Speech Processing
- Venue: IEEE ICASI 2026
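- Summary: This paper introduces Syn-TurnTurk, a synthetic Turkish dialogue dataset for turn-taking prediction, generated with several Qwen large language models. Experiments show that BiLSTM and ensemble methods reach an accuracy of 0.839 and an AUC of 0.910 on the dataset.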
[116] SafeHarness: Lifecycle-Integrated Security Architecture for LLM-based Agent Deployment
- arXiv: 2604.13630 (cross-listed)
- Authors: Xixun Lin, Yang Liu, Yancheng Chen, Yongxuan Wu, Yucheng Ning, Yilong Liu, Nan Sun, Shun Zhang, Bin Chong, Chuan Zhou, Yanan Cao, Li Guo
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, LLM Agent
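- Summary: This paper proposes SafeHarness, a security architecture that weaves four defense layers directly into the LLM agent lifecycle: adversarial context filtering, hierarchical causal verification, privilege-separated tool control, and safe rollback. Compared with an unprotected baseline, it reduces the unsafe behavior rate by about 38% and the attack success rate by about 42%.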
[117] A Mechanistic Analysis of Sim-and-Real Co-Training in Generative Robot Policies
- arXiv: 2604.13645 (cross-listed)
- Authors: Yu Lei, Minghuan Liu, Abhiram Maddukuri, Zhenyu Jiang, Yuke Zhu
- Subjects: cs.RO; cs.AI; cs.LG
- Tags: Robotics, Sim-to-Real, Reinforcement Learning
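- Summary: Through theoretical analysis and experiments, this paper reveals the underlying mechanisms of sim-and-real co-training. It identifies two key effects: structured representation alignment and importance reweighting. The analysis offers a unified explanation of co-training techniques and motivates a simple method that consistently improves on prior approaches.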
[118] Ordinary Least Squares is a Special Case of Transformer
- arXiv: 2604.13656 (cross-listed)
- Authors: Xiaojun Tan, Yuchen Zhao
- Subjects: cs.LG; cs.AI; math.ST; stat.ML
- Tags: Deep Learning Theory, Representation Learning
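- Summary: Through a rigorous algebraic proof, this paper shows that ordinary least squares (OLS) is a special case of a single-layer linear Transformer. Using the spectral decomposition of the empirical covariance matrix, the authors construct a specific parameter setting under which the attention forward pass is mathematically equivalent to the OLS closed-form projection.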
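The derivation below gives one way to see why OLS fits the form of linear attention in [118]. It is a standard identity, not necessarily the authors' spectral construction. Assume the context matrix $X \in \mathbb{R}^{n \times d}$ has full column rank, with rows $x_i^\top$, targets $y \in \mathbb{R}^n$, and a query point $x_q$. The OLS prediction is then a sum over context tokens, each weighted by a bilinear query-key score, with no softmax:

```latex
\begin{aligned}
\hat{y}_q &= x_q^\top \hat{\beta}
           = x_q^\top (X^\top X)^{-1} X^\top y
           = \sum_{i=1}^{n} \underbrace{(W_Q x_q)^\top (W_K x_i)}_{\text{attention score}} \; \underbrace{y_i}_{\text{value}}, \\
W_Q &= (X^\top X)^{-1} = U \Lambda^{-1} U^\top, \qquad W_K = I, \qquad X^\top X = U \Lambda U^\top .
\end{aligned}
```

Because $X^\top X$ is symmetric, $W_Q^\top = W_Q$, so $(W_Q x_q)^\top x_i = x_q^\top (X^\top X)^{-1} x_i$. Summing these scores times $y_i$ recovers $x_q^\top (X^\top X)^{-1} X^\top y$. Note that in this reading the query weight depends on the in-context data $X$.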
[119] Automatically Inferring Teachers' Geometric Content Knowledge: A Skills Based Approach
- arXiv: 2604.13666 (cross-listed)
- Authors: Ziv Fenigstein, Kobi Gal, Avi Segal, Osama Swidan, Inbal Israel, Hassan Ayoob
- Subjects: cs.CY; cs.AI; cs.LG
- Tags: Education Technology, Knowledge Tracing, RAG
- Venue: AIED 2026
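- Summary: This paper develops an LLM-based automated method for diagnosing teachers' Van Hiele levels of geometric reasoning, built around a structured skills dictionary of 33 fine-grained reasoning skills. Experiments show that RAG and multi-task learning approaches that incorporate skill information significantly outperform baselines on Van Hiele classification.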
[120] IndicDB -- Benchmarking Multilingual Text-to-SQL Capabilities in Indian Languages
- arXiv: 2604.13686 (cross-listed)
- Authors: Aviral Dawar, Roshan Karanth, Vikram Goyal, Dhruv Kumar
- Subjects: cs.CL; cs.AI; cs.DB
- Tags: Text-to-SQL, Benchmark, Multilingual Learning
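- Summary: This paper presents IndicDB, a multilingual Text-to-SQL benchmark for evaluating cross-lingual semantic parsing in Indian languages, comprising 20 databases and 15,617 tasks. Results show a 9% performance drop from English to Indian languages. This reveals an "Indic Gap" caused by schema-linking difficulties and structural ambiguity.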
[121] Beyond Voxel 3D Editing: Learning from 3D Masks and Self-Constructed Data
- arXiv: 2604.13688 (cross-listed)
- Authors: Yizhao Xu, Hongyuan Zhu, Caiyun Liu, Tianfu Wang, Keyu Chen, Sicheng Xu, Jiaolong Yang, Nicholas Jing Yuan, Qi Zhang
- Subjects: cs.CV; cs.AI
- Tags: 3D Vision, Image Editing, Data Synthesis
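- Summary: This paper proposes Beyond Voxel 3D Editing (BVE), a framework paired with a self-constructed large-scale 3D editing dataset. It augments a base image-to-3D generation architecture with lightweight trainable modules. It also introduces an annotation-free 3D mask strategy to preserve local invariance, achieving superior performance in generating high-quality, text-aligned 3D assets.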
[122] Med-CAM: Minimal Evidence for Explaining Medical Decision Making
- arXiv: 2604.13695 (cross-listed)
- Authors: Pirzada Suhail, Aditya Anand, Amit Sethi
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Explainable AI, Interpretability
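- Summary: This paper introduces Med-CAM, a framework that explains medical decisions by generating minimal, sharp evidence maps through classifier activation matching. Methods such as Grad-CAM produce diffuse regions. Med-CAM instead provides deterministic, evidence-based explanations with superior spatial awareness that faithfully reproduce the model's predictions.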
[123] MIND: AI Co-Scientist for Material Research
- arXiv: 2604.13699 (cross-listed)
- Authors: Geonhee Ahn, Donghyun Lee, Hayoung Doo, Jonggeol Na, Hyunsoo Cho, Sookyung Kim
- Subjects: cs.MA; cs.AI; cs.CE
- Tags: Material Discovery, LLM Agent, Multi-Agent System
- Code: code
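- Summary: This paper proposes MIND, an LLM-driven framework for automated hypothesis validation in materials research. It organizes hypothesis refinement, experimentation, and debate-based validation in a multi-agent pipeline. The system integrates machine-learned interatomic potentials to enable scalable computational experiments.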
[124] Beyond Arrow's Impossibility: Fairness as an Emergent Property of Multi-Agent Collaboration
- arXiv: 2604.13705 (cross-listed)
- Authors: Sayan Kumar Chaki, Antoine Gourru, Julien Velcin
- Subjects: cs.CL; cs.AI; cs.GT; cs.MA
- Tags: Fairness, Multi-Agent System, LLM Agent
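- Summary: This paper studies fairness as an emergent property of multi-agent collaboration in a hospital triage setting, in which an agent is aligned with a specific ethical framework through RAG. It finds that the joint final allocation can satisfy fairness criteria that no individual agent achieves. This reframes fairness as an emergent property of decentralized agent interaction.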
[125] Towards Fine-grained Temporal Perception: Post-Training Large Audio-Language Models with Audio-Side Time Prompt
- arXiv: 2604.13715 (cross-listed)
- Authors: Yanfeng Shi, Pengfei Cai, Jun Liu, Qing Gu, Nan Jiang, Lirong Dai, Ian McLoughlin, Yan Song
- Subjects: cs.SD; cs.AI
- Tags: Speech Processing, Multimodal Learning, Reinforcement Learning
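- Summary: This paper proposes TimePro-RL, a framework that enhances fine-grained temporal perception in large audio-language models using audio-side time prompts and reinforcement learning. The method encodes timestamps as embeddings interleaved with the audio feature sequence. It achieves significant gains on audio grounding, sound event detection, and dense audio captioning.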
[126] FRAGATA: Semantic Retrieval of HPC Support Tickets via Hybrid RAG over 20 Years of Request Tracker History
- arXiv: 2604.13721 (cross-listed)
- Authors: Santiago Paramés-Estévez, Nicolás Filloy-Montesino, Jorge Fernández-Fabeiro, José Carlos Mouriño-Gallego
- Subjects: cs.IR; cs.AI
- Tags: RAG, Information Retrieval, High Performance Computing
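- Summary: This paper introduces Fragata, a semantic ticket search system that applies modern information retrieval techniques to 20 years of Request Tracker history from a supercomputing center. The system finds relevant past incidents regardless of language, spelling errors, or query phrasing, and substantially improves on the native search.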
[127] Jump-Start Reinforcement Learning with Vision-Language-Action Regularization
- arXiv: 2604.13733 (cross-listed)
- Authors: Angelo Moroncelli, Roberto Zanetti, Marco Maccarini, Loris Roveda
- Subjects: cs.LG; cs.AI; cs.RO
- Tags: Robotics, Vision-Language Model, Reinforcement Learning
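- Summary: This paper proposes VLAJS, a method that bridges sparse vision-language-action guidance with online reinforcement learning via directional action-consistency regularization. It reduces the required environment interactions by more than 50% across several manipulation tasks. It also demonstrates zero-shot sim-to-real transfer on a real robot.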
[128] TokenFormer: Unify the Multi-Field and Sequential Recommendation Worlds
- arXiv: 2604.13737 (cross-listed)
- Authors: Yifeng Zhou, Yuehong Hu, Zhixiang Feng, Junwei Pan, Kaihui Wu, Hanyong Li, Shangyu Zhang, Shudong Huang, Zhangbin Zhu, Chengguo Yin, Haijie Gu, Jie Jiang
- Subjects: cs.IR; cs.AI
- Tags: Recommender System, Vision Transformer
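- Summary: This paper proposes TokenFormer, a unified recommendation architecture that addresses sequential collapse propagation through a Bottom-Full-Top-Sliding attention mechanism and nonlinear interaction representations. The model achieves state-of-the-art performance on public benchmarks and on Tencent's advertising platform. It also significantly improves dimensional robustness under unified modeling.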
[129] A Dynamic-Growing Fuzzy-Neuro Controller, Application to a 3PSP Parallel Robot
- arXiv: 2604.13763 (cross-listed)
- Authors: Mohsen Jalaeian-Farimani, Mohammad-R Akbarzadeh-T, Alireza Akbarzadeh, Mostafa Ghaemi
- Subjects: eess.SY; cs.AI; cs.LG; cs.NE; cs.RO
- Tags: Robotics, Fuzzy Logic
- Venue: FUZZ-IEEE 2012
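- Summary: This paper proposes a dynamic-growing fuzzy-neuro controller (DGFNC) with an adaptive strategy and applies it to position control of a 3PSP parallel robot. Through a conservative rule-growing mechanism and a sliding-mode nonlinear controller, the method achieves faster response and system stability.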
[130] From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
- arXiv: 2604.13777 (cross-listed)
- Authors: Wenxuan Li, Zhenfei Zhang, Mi Zhang, Geng Hong, Mi Wen, Xiaoyu You, Min Yang
- Subjects: cs.CL; cs.AI
- Tags: Machine Unlearning, LLM Security
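- Summary: This paper proposes MAGE, a framework for corpus-free unlearning in large language models. Using only lightweight user anchors, it probes the target model, builds a memory graph, and synthesizes supervision signals. This enables effective unlearning without access to the original training corpus.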
[131] Soft $Q(λ)$: A multi-step off-policy method for entropy regularised reinforcement learning using eligibility traces
- arXiv: 2604.13780 (cross-listed)
- Authors: Pranav Mahajan, Ben Seymour
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning
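- Summary: This paper proposes Soft Q(λ), a multi-step off-policy method for entropy-regularized reinforcement learning. It introduces a new Soft Tree Backup operator and combines it with eligibility traces for efficient credit assignment.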
[132] Gaslight, Gatekeep, V1-V3: Early Visual Cortex Alignment Shields Vision-Language Models from Sycophantic Manipulation
- arXiv: 2604.13803 (cross-listed)
- Authors: Arya Shah, Vaibhav Tripathi, Mayank Singh, Chaklam Silpasuwanchai
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Alignment, Cognitive Science
- Code: code
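- Summary: This paper studies the relationship between the brain alignment of vision-language models and their susceptibility to sycophantic manipulation. It finds that alignment with early visual cortex (V1-V3) is a reliable negative predictor of sycophantic behavior. This suggests that faithful low-level visual encoding guards against adversarial linguistic override.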
[133] Cognitive Offloading in Agile Teams: How Artificial Intelligence Reshapes Risk Assessment and Planning Quality
- arXiv: 2604.13814 (cross-listed)
- Authors: Adriana Caraeni, Alexander Shick, Andrew Lan
- Subjects: cs.HC; cs.AI
- Tags: Software Engineering, Human-Computer Interaction
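- Summary: Through a controlled experiment, this paper studies cognitive offloading in agile sprint planning, comparing AI-only, human-only, and hybrid planning models. It finds that AI-only planning is efficient but captures fewer risks. The authors therefore propose a hybrid planning framework that combines algorithmic tools with human deliberation.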
[134] Sentiment analysis for software engineering: How far can zero-shot learning (ZSL) go?
- arXiv: 2604.13826 (cross-listed)
- Authors: Reem Alfayez, Manal Binkhonain
- Subjects: cs.SE; cs.AI
- Tags: Sentiment Analysis, Zero-Shot Learning, Software Engineering
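- Summary: This paper evaluates zero-shot learning techniques for sentiment analysis in software engineering. Results show that zero-shot approaches combined with expert labels can match the performance of fine-tuned models, offering a practical way to address the scarcity of annotated data.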
[135] SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention
- arXiv: 2604.13847 (cross-listed)
- Authors: Hongtao Xu, Jianchao Tan, Yuxuan Hu, Pengju Lu, Hongyu Wang, Pingwei Sun, Yerui Sun, Yuchen Xie, Xunliang Cai, Mingzhen Li, Weile Jia
- Subjects: cs.LG; cs.AI
- Tags: Long Context, LLM Training, Distributed Training
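- Summary: This paper proposes SparseBalance, a framework that addresses load imbalance in long-context LLM training through dynamic sparsity adjustment and sparsity-aware batching. The method speeds up training and also improves the model's long-context capability.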
[136] MCPThreatHive: Automated Threat Intelligence for Model Context Protocol Ecosystems
- arXiv: 2604.13849 (cross-listed)
- Authors: Yi Ting Shen, Kentaroh Toyoda, Alex Leung
- Subjects: cs.CR; cs.AI
- Tags: LLM Agent, LLM Security
- Venue: DEFCON SG 2026
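- Summary: This paper introduces MCPThreatHive, a platform that automates full-lifecycle threat intelligence management for Model Context Protocol ecosystems. It defines MCP-38, a taxonomy of 38 threat patterns, and provides a risk assessment model and visualization tools.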
[137] Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection
- arXiv: 2604.13882 (cross-listed)
- Authors: Xuanyan Liu, Ignacio Cabrera Martin, Marcello Trovati, Xiaolong Xu, Nikolaos Polatidis
- Subjects: cs.LG; cs.AI
- Tags: Model Evaluation
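- Summary: This paper discusses principles, pitfalls, and metric selection in evaluating supervised machine learning models. It emphasizes how dataset characteristics, validation design, and metric choice affect evaluation outcomes. Using experimental scenarios, it highlights common evaluation mistakes and provides a structured basis for choosing metrics and validation protocols.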
[138] Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning
- arXiv: 2604.13891 (cross-listed)
- Authors: Saeed Rahmani, Gözde Körpe, Zhenlin, Bruno Brito, Simeon Craig Calvert, Bart van Arem
- Subjects: cs.RO; cs.AI; eess.SY
- Tags: Autonomous Driving, Reinforcement Learning, Multi-Agent System
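- Summary: This paper proposes a framework combining model predictive control (MPC) and deep reinforcement learning (RL) for multi-agent automated driving at unsignalized intersections. Experiments show that the integrated approach achieves lower collision rates and higher success rates than standalone MPC or RL. It also generalizes better across scenarios.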
[139] Do We Still Need Humans in the Loop? Comparing Human and LLM Annotation in Active Learning for Hostility Detection
- arXiv: 2604.13899 (cross-listed)
- Authors: Ahmad Dawar Hakimi, Lea Hirlimann, Isabelle Augenstein, Hinrich Schütze
- Subjects: cs.CL; cs.AI
- Tags: Data Annotation, Active Learning
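- Summary: This paper compares human and LLM annotation in active learning for hostility detection. It finds that LLM annotation is cheaper and reaches comparable F1 scores. However, its errors differ systematically from those of human annotators, so the choice of annotator should be made carefully for specific applications.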
[140] ASTER: Latent Pseudo-Anomaly Generation for Unsupervised Time-Series Anomaly Detection
- arXiv: 2604.13924 (cross-listed)
- Authors: Romain Hermary, Samet Hicsonmez, Dan Pineau, Abd El Rahman Shabayek, Djamila Aouada
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Anomaly Detection, Time Series Analysis, Data Synthesis
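- Summary: This paper proposes ASTER, a framework for unsupervised time-series anomaly detection. It generates pseudo-anomalies directly in latent space and uses a pretrained large language model to enrich spatiotemporal representations, achieving state-of-the-art results on benchmark datasets.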
[141] HINTBench: Horizon-agent Intrinsic Non-attack Trajectory Benchmark
- arXiv: 2604.13954 (cross-listed)
- Authors: Jiacheng Wang, Jinchang Hou, Fabian Wang, Ping Jian, Chenfu Bao, Zhonghou Lv
- Subjects: cs.LG; cs.AI
- Tags: LLM Agent, Benchmark, AI Safety
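- Summary: This paper proposes HINTBench, a benchmark for evaluating the intrinsic risks of agents in non-attack scenarios. It includes risk detection, risk localization, and failure-type identification tasks. It reveals gaps in current models' fine-grained risk diagnosis.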
[142] Creo: From One-Shot Image Generation to Progressive, Co-Creative Ideation
- arXiv: 2604.13956 (cross-listed)
- Authors: Zoe De Simone, Angie Boggust, Fredo Durand, Ashia Wilson, Arvind Satyanarayan
- Subjects: cs.HC; cs.AI; cs.CV
- Tags: Text-to-Image, Human-Computer Interaction
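- Summary: This paper presents Creo, a multi-stage text-to-image system for progressive, co-creative ideation. It uses a layered generation process that moves from sketches to high-resolution images, combined with a locking mechanism. This gives users more control over generation and better supports their creativity.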
[143] How Can We Synthesize High-Quality Pretraining Data? A Systematic Study of Prompt Design, Generator Model, and Source Data
- arXiv: 2604.13977 (cross-listed)
- Authors: Joel Niklaus, Atsuki Yamaguchi, Michal Štefánik, Guilherme Penedo, Hynek Kydlíček, Elie Bakouch, Lewis Tunstall, Edward Emanuel Beeching, Thibaud Frere, Colin Raffel, Leandro von Werra, Thomas Wolf
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Data Synthesis, LLM Training
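- Summary: This paper systematically studies the design dimensions of synthetic pretraining data and finds that structured output formats (e.g., tables and math problems) outperform the baselines. Building on these findings, the authors construct FinePhrase, a 486-billion-token dataset that improves performance while lowering generation cost.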
[144] Leveraging LLM-GNN Integration for Open-World Question Answering over Knowledge Graphs
- arXiv: 2604.13979 (cross-listed)
- Authors: Hussein Abdallah, Ibrahim Abdelaziz, Panos Kalnis, Essam Mansour
- Subjects: cs.CL; cs.AI; cs.DB
- Tags: Question Answering, Knowledge Graph, Graph Neural Network
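- Summary: This paper proposes GLOW, a system that combines a pretrained graph neural network with a large language model for open-world question answering over knowledge graphs. It uses the GNN to predict candidate answers and structured prompts to guide LLM reasoning. It achieves significant improvements on standard benchmarks and on the newly proposed GLOW-BENCH.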
[145] Adaptive Conformal Prediction for Improving Factuality of Generations by Large Language Models
- arXiv: 2604.13991 (cross-listed)
- Authors: Aleksandr Rubashevskii, Dzianis Piatrashyn, Preslav Nakov, Maxim Panov
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Hallucination, Uncertainty Estimation, LLM Evaluation
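- Summary: This paper proposes an adaptive conformal prediction method for improving the factuality of LLM generations. It uses prompt-dependent calibration to address the coverage limitations of existing methods.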
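For context on [145], the sketch below shows standard (non-adaptive) split conformal calibration. Given nonconformity scores on a held-out calibration set, it returns a threshold whose coverage is at least 1 - alpha, assuming calibration and test scores are exchangeable. The paper's contribution is to make this calibration prompt-dependent, which is not reproduced here. `conformal_threshold` is a hypothetical name.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Split conformal prediction: return the score threshold q_hat.

    With n exchangeable calibration scores, a new score is <= q_hat
    with probability at least 1 - alpha.
    """
    n = len(calibration_scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points to certify this alpha
    return sorted(calibration_scores)[k - 1]
```

A single global threshold like this can over- or under-cover for particular prompts. That limitation is what a prompt-dependent variant aims to fix.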
[146] Diffusion Language Models for Speech Recognition
- arXiv: 2604.14001 (cross-listed)
- Authors: Davyd Naveriani, Albert Zeyer, Ralf Schlüter, Hermann Ney
- Subjects: cs.CL; cs.AI; cs.LG; cs.NE
- Tags: Speech Processing, Diffusion Model
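- Summary: This paper explores masked diffusion language models and uniform-state diffusion models for rescoring in speech recognition. It also designs a joint decoding method with CTC to improve recognition accuracy.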
[147] Towards Multi-Object-Tracking with Radar on a Fast Moving Vehicle: On the Potential of Processing Radar in the Frequency Domain
- arXiv: 2604.14013 (cross-listed)
- Authors: Tim Hansen, Arturo Gomez-Chavez, Ilya Shimchik, Andreas Birk
- Subjects: cs.RO; cs.AI; cs.CV; eess.IV; eess.SP
- Tags: Autonomous Driving, Signal Processing
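- Summary: This paper proposes processing radar data in the frequency domain to improve robustness on fast-moving vehicles. It presents radar-only odometry results on the Boreas dataset.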
[148] MAny: Merge Anything for Multimodal Continual Instruction Tuning
- arXiv: 2604.14016 (cross-listed)
- Authors: Zijian Gao, Wangwang Jia, Xingxing Zhang, Pengfei Qian, Tao Sun, Bo Ding, Yong Dou, Huaimin Wang, Kele Xu
- Subjects: cs.LG; cs.AI
- Tags: Continual Learning, Multimodal Learning, Model Merging
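- Summary: This paper proposes MAny, a framework that addresses catastrophic forgetting in multimodal large language models during continual instruction tuning. It uses cross-modal projection merging and low-rank parameter merging to fuse knowledge without additional training.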
[149] Feed-Forward 3D Scene Modeling: A Problem-Driven Perspective
- arXiv: 2604.14025 (cross-listed)
- Authors: Weijie Wang, Qihang Cao, Sensen Gao, Donny Y. Chen, Haofei Xu, Wenjing Bian, Songyou Peng, Tat-Jen Cham, Chuanxia Zheng, Andreas Geiger, Jianfei Cai, Jia-Wang Bian, Bohan Zhuang
- Subjects: cs.CV; cs.AI; cs.GR
- Tags: 3D Reconstruction, 3D Vision, Survey
- Code: code
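- Summary: This paper surveys feed-forward 3D scene modeling and proposes a new taxonomy based on model design strategies. It also reviews relevant benchmark datasets and practical applications.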
[150] Large Language Models to Enhance Business Process Modeling: Past, Present, and Future Trends
- arXiv: 2604.14034 (cross-listed)
- Authors: João Bettencourt, Sérgio Guerreiro
- Subjects: cs.SE; cs.AI; cs.IR
- Tags: Survey, Knowledge Management, Text Generation
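- Summary: This paper reviews methods that use large language models to convert natural language into BPMN process models. It analyzes how these methods have evolved and the challenges they face, and it outlines future research directions.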
[151] First-See-Then-Design: A Multi-Stakeholder View for Optimal Performance-Fairness Trade-Offs
- arXiv: 2604.14035 (cross-listed)
- Authors: Kavya Gupta, Nektarios Kalampalikis, Christoph Heitz, Isabel Valera
- Subjects: cs.LG; cs.AI
- Tags: Fairness, Decision Making, Optimization
- Venue: FAccT 2026
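- Summary: This paper proposes a multi-stakeholder framework, grounded in welfare economics and distributive justice, for achieving optimal performance-fairness trade-offs in algorithmic decision-making.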
[152] TIP: Token Importance in On-Policy Distillation
- arXiv: 2604.14084 (cross-listed)
- Authors: Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, Zhipeng Wang, Alborz Geramifard
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, LLM Training
- Code: code
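- Summary: This paper studies token importance in on-policy distillation and proposes a taxonomy based on student entropy and teacher-student divergence. It shows that entropy-based token sampling reduces memory while preserving performance.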
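A minimal sketch of entropy-based token selection as summarized in [152]. It assumes the student's per-position output distributions are available and that the distillation loss is then computed only at the selected positions. It covers only the entropy axis of the paper's taxonomy and ignores the teacher-student divergence axis. Function names are hypothetical.

```python
import math


def token_entropy(dist):
    """Shannon entropy (in nats) of one student output distribution."""
    return -sum(p * math.log(p) for p in dist if p > 0)


def select_high_entropy_tokens(student_dists, keep_fraction):
    """Return the positions (in sequence order) of the highest-entropy tokens.

    Keeping only these positions shrinks the number of tokens whose
    teacher/student logits must be held for the distillation loss.
    """
    n = len(student_dists)
    k = max(1, int(keep_fraction * n))
    ranked = sorted(range(n), key=lambda i: token_entropy(student_dists[i]), reverse=True)
    return sorted(ranked[:k])
```

The intuition is that positions where the student is already confident (low entropy) carry little learning signal, so dropping them saves memory at little cost.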
[153] UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception
- arXiv: 2604.14089 (cross-listed)
- Authors: Ziming Wang
- Subjects: cs.RO; cs.AI
- Tags: Embodied AI, Robotics, Sensor Fusion
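- Summary: This paper presents UMI-3D, which extends the Universal Manipulation Interface by integrating a lightweight LiDAR sensor. This addresses the limitations of visual SLAM under occlusion and in dynamic scenes and improves the quality of data collected for embodied manipulation.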
[154] UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding
- arXiv: 2604.14113 (cross-listed)
- Authors: Fei Tang, Bofan Chen, Zhengxi Lu, Tongbo Chen, Songqin Nong, Tao Jiang, Wenhao Xu, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: GUI Automation, Uncertainty Estimation
- Code: code
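- Summary: This paper proposes UI-Zoomer, a training-free adaptive zoom-in framework. It quantifies prediction uncertainty to decide dynamically whether to zoom into GUI elements, improving grounding accuracy.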
[155] HiVLA: A Visual-Grounded-Centric Hierarchical Embodied Manipulation System
- arXiv: 2604.14125 (cross-listed)
- Authors: Tianshuo Yang, Guanyu Chen, Yutian Chen, Zhixuan Liang, Yitian Liu, Zanxin Chen, Chunpu Xu, Haotian Liang, Jiangmiao Pang, Yao Mu, Ping Luo
- Subjects: cs.CV; cs.AI; cs.RO
- Tags: Embodied AI, Vision-Language Model, Robotics
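- Summary: This paper proposes HiVLA, a hierarchical framework that decouples high-level semantic planning from low-level motor control. It combines a vision-language model with a diffusion Transformer to achieve robust embodied manipulation.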
[156] Rhetorical Questions in LLM Representations: A Linear Probing Study
- arXiv: 2604.14128 (cross-listed)
- Authors: Louie Hong Yao, Vishesh Anand, Yuan Zhuang, Tianyu Jiang
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Interpretability, Representation Learning
- Venue: ACL 2026
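- Summary: Using linear probing, this paper analyzes how large language models internally represent rhetorical questions. It finds that the rhetorical-question signal emerges in early layers and is linearly separable, but that the representations differ across datasets.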
[157] From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs
- arXiv: 2604.14137 (cross-listed)
- Authors: Itay Itzhak, Eliya Habba, Gabriel Stanovsky, Yonatan Belinkov
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Evaluation, Human-Computer Interaction, LLM Personalization
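- Summary: This paper studies how users evaluate large language models through "vibe-testing". It formalizes the process as personalized prompt generation followed by user-perceived evaluation, and proposes a corresponding evaluation pipeline.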
[158] LongCoT: Benchmarking Long-Horizon Chain-of-Thought Reasoning
- arXiv: 2604.14140 (cross-listed)
- Authors: Sumeet Ramesh Motwani, Daniel Nichols, Charles London, Peggy Li, Fabio Pizzati, Acer Blake, Hasan Hammoud, Tavish McDonald, Akshat Naik, Alesia Ivanova, Vignesh Baskaran, Ivan Laptev, Ruben Glatt, Tal Ben-Nun, Philip Torr, Natasha Jaques, Ameya Prabhu, Brian Bartoldson, Bhavya Kailkhura, Christian Schroeder de Witt
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Benchmark
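- Summary: This paper introduces LongCoT, a benchmark of 2,500 expert-designed problems for evaluating the long-horizon chain-of-thought reasoning of frontier models.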
[159] From $P(y|x)$ to $P(y)$: Investigating Reinforcement Learning in Pre-train Space
- arXiv: 2604.14142 (cross-listed)
- Authors: Yuqiao Tan, Minzheng Wang, Bo Liu, Zichen Liu, Tian Liang, Shizhu He, Jun Zhao, Kang Liu
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: LLM Reasoning, Reinforcement Learning, LLM Training
- Code: code
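- Summary: This paper proposes PreRL, which strengthens reasoning by optimizing the marginal distribution $P(y)$ in the pre-training space. It then combines PreRL with standard reinforcement learning in a Dual Space RL strategy.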
Replacement Submissions (115)
[160] Agentic AI Optimisation (AAIO): what it is, how it works, why it matters, and how to deal with it
- arXiv: 2504.12482 (replaced)
- Authors: Luciano Floridi, Carlotta Buttaboni, Nicolas Gentler, Emmie Hine, Jessica Morley, Claudio Novelli, Tyler Schroder
- Subjects: cs.AI
- Tags: LLM Agent, AI Ethics, AI Governance
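- Summary: This paper introduces Agentic AI Optimisation (AAIO), a new paradigm for ensuring effective interaction between websites and autonomous AI agents, and examines its governance and ethical implications.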
[161] FieldWorkArena: Agentic AI Benchmark for Real Field Work Tasks
- arXiv: 2505.19662 (replaced)
- Authors: Jun Takahashi, Atsunori Moteki, Akiyoshi Uchida, Shoichi Masui, Fan Yang, Kanji Uchino, Yueqi Song, Yonatan Bisk, Graham Neubig, Ikuo Kusajima, Yasuto Watanabe, Hiroyuki Ishida, Koki Nakagawa, Shan Jiang
- Subjects: cs.AI; cs.CV
- Tags: LLM Agent, LLM Evaluation, Benchmark
- Venue: ICPR 2026
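- Summary: This paper introduces FieldWorkArena, an agentic AI benchmark built on real-world field work tasks: detecting and reporting safety hazards and rule violations in manufacturing and retail settings. The dataset consists of images and videos captured on site in factories, warehouses, and retail stores. The tasks were carefully developed through interviews with on-site workers and managers.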
[162] Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games
- arXiv: 2506.03610 (replaced)
- Authors: Dongmin Park, Minkyu Kim, Beongjun Choi, Junhyuck Kim, Keon Lee, Jonghyun Lee, Inkyu Park, Byeong-Uk Lee, Jaeyoung Hwang, Jaewoo Ahn, Ameya S. Mahabaleshwarkar, Bilal Kartal, Pritam Biswas, Yoshi Suhara, Kangwook Lee, Jaewoong Cho
- Subjects: cs.AI
- Tags: LLM Agent, Game AI, Benchmark
- Code: code
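- Summary: This paper presents Orak, a benchmark for training and evaluating LLM agents on 12 video games spanning all major genres. It provides plug-and-play interfaces, a fine-tuning dataset of expert gameplay trajectories, and a unified evaluation framework that includes a game leaderboard and a head-to-head battle arena.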
[163] RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization
- arXiv: 2508.00222 (replaced)
- Authors: Yihong Dong, Xue Jiang, Yongding Tao, Huanyu Liu, Kechi Zhang, Lili Mou, Rongyu Cao, Yingwei Ma, Jue Chen, Binhua Li, Zhi Jin, Fei Huang, Yongbin Li, Ge Li
- Subjects: cs.AI; cs.CL; cs.LG
- Tags: LLM Reasoning, Reinforcement Learning, Mathematical Reasoning
- Venue: ACL 2026
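- Summary: This paper proposes RL-PLUS, a hybrid-policy optimization method that combines internal exploitation with external data to counter the capability boundary collapse of LLMs under reinforcement learning. It achieves state-of-the-art performance on six math reasoning benchmarks and six out-of-distribution reasoning tasks.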
[164] Pruning Long Chain-of-Thought of Large Reasoning Models via Small-Scale Preference Optimization
- arXiv: 2508.10164 (replaced)
- Authors: Bin Hong, Jiayu Liu, Kai Zhang, Jianwen Sun, Mengdi Zhang, Zhenya Huang
- Subjects: cs.AI
- Tags: LLM Reasoning, LLM Inference, Prompt Engineering
- Venue: ICLR 2026
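- Summary: This paper proposes Length Controlled Preference Optimization (LCPO), which reduces the output length of large reasoning models with limited tuning. It filters generated trajectories using difficulty estimation. Across multiple benchmarks it cuts average output length by more than 50% while maintaining reasoning performance.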
[165] MAS-Bench: A Unified Benchmark for Shortcut-Augmented Hybrid Mobile GUI Agents
- arXiv: 2509.06477 (replaced)
- Authors: Pengxiang Zhao, Guangyi Liu, YaoZhen Liang, Weiqing He, Zhengxi Lu, WenHao Wang, Yuehao Huang, Yuxiang Chai, Zhaolu Kang, Yaxuan Guo, Hao Wang, Kexin Zhang, Liang Liu, Yong Liu
- Subjects: cs.AI
- Tags: LLM Agent, GUI Automation, Benchmark
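- Summary: This paper introduces MAS-Bench, a benchmark for evaluating hybrid mobile agents that combine GUI operations with shortcuts. It covers 139 complex tasks across 11 real-world apps. Experiments show that, compared with GUI-only approaches, hybrid agents reach success rates of up to 68.3% and improve execution efficiency by 39%.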
[166] ProRe: A Proactive Reward System for GUI Agents via Reasoner-Actor Collaboration
- arXiv: 2509.21823 (replaced)
- Authors: Gaole Dai, Shiqi Jiang, Ting Cao, Yuqing Yang, Yuanchun Li, Rui Tan, Mo Li, Lili Qiu
- Subjects: cs.AI
- Tags: LLM Agent, GUI Automation, Reward Design
- Venue: ICLR 2026
- Code: code
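- Summary: This paper proposes ProRe, a proactive reward system that assigns more accurate rewards to GUI agents through reasoner-actor collaboration. The reasoner schedules targeted state-probing tasks, and the actor proactively interacts with the environment to collect observations. Together they improve policy-agent success rates by up to 22.4%.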
[167] Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems
- arXiv: 2510.14133 (replaced)
- Authors: Edoardo Allegrini, Ananth Shreekumar, Z. Berkay Celik
- Subjects: cs.AI; cs.CR; cs.MA
- Tags: LLM Agent, AI Safety, Formal Methods
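- Summary: This paper introduces a modeling framework for agentic AI systems, consisting of a host agent model and a task lifecycle model. The framework defines 30 temporal-logic properties for formally verifying system behavior, and can detect coordination edge cases and security vulnerabilities.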
[168] Saber: An Efficient Sampling with Adaptive Acceleration and Backtracking Enhanced Remasking for Diffusion Language Model
- arXiv: 2510.18165 (replaced)
- Authors: Yihong Dong, Zhaoyu Ma, Xue Jiang, Zhiyuan Fan, Jiaru Qian, Yongmin Li, Jianha Xiao, Zhi Jin, Rongyu Cao, Binhua Li, Fei Huang, Yongbin Li, Ge Li
- Subjects: cs.AI; cs.CL; cs.LG; cs.SE
- Tags: Code Generation, Diffusion Model, LLM Inference
- Venue: ACL 2026
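- Summary: This paper introduces Saber, a training-free sampling algorithm for diffusion language models. It uses adaptive acceleration and backtracking-enhanced remasking to improve both inference speed and output quality in code generation. Experiments show an average Pass@1 improvement of 1.9% and an average inference speedup of 251.4%.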
[169] Empowerment Gain and Causal Model Construction: Children and adults are sensitive to controllability and variability in their causal interventions
- arXiv: 2512.08230 (replaced)
- Authors: Eunice Yiu, Kelsey Allen, Shiry Ginosar, Alison Gopnik
- Subjects: cs.AI
- Tags: Causal Inference, Reinforcement Learning, Cognitive Science
- Venue: Philosophical Transactions A 2026
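- Summary: This paper explores the connection between "empowerment", an intrinsic reward signal from reinforcement learning, and causal learning. Empirical studies test how children and adults use empowerment cues to infer causal relationships and to design effective causal interventions.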
[170] Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution
- arXiv: 2512.10696 (replaced)
- Authors: Zouying Cao, Jiaji Deng, Li Yu, Weikang Zhou, Zhaoyang Liu, Bolin Ding, Hai Zhao
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Memory Architecture, Continual Learning
- Venue: ACL 2026 Findings
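- Summary: This paper proposes ReMe, a dynamic procedural memory framework for experience-driven agent evolution. It combines multi-faceted distillation, context-adaptive reuse, and utility-based refinement. Experiments show that smaller models equipped with ReMe outperform larger models without memory.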
[171] Logical Phase Transitions: Understanding Collapse in LLM Logical Reasoning
- arXiv: 2601.02902 (replaced)
- Authors: Xinglang Zhang, Yunyao Zhang, ZeLiang Chen, Junqing Yu, Wei Yang, Zikai Song
- Subjects: cs.AI; cs.CL; cs.LO
- Tags: LLM Reasoning, Logical Reasoning, Neurosymbolic AI
- Venue: ACL 2026
- Code: code
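- Summary: This paper identifies a "logical phase transition": LLM logical reasoning performance collapses abruptly beyond a critical logical depth rather than degrading smoothly. The authors propose neuro-symbolic curriculum tuning to mitigate this collapse, achieving substantial accuracy gains at high complexity.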
[172] Variance Computation for Weighted Model Counting with Knowledge Compilation Approach
- arXiv: 2601.03523 (replaced)
- Authors: Kengo Nakamura, Masaaki Nishino, Norihito Yasuda
- Subjects: cs.AI; cs.DS
- Tags: Knowledge Representation, Uncertainty Estimation, Probabilistic Inference
- Venue: AAAI 2026
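- Summary: This paper studies the tractability of computing the variance of weighted model counting (WMC) with knowledge compilation. The authors derive a polynomial-time algorithm for structured d-DNNF and prove hardness results for other circuit classes. They also demonstrate an application to measuring uncertainty in Bayesian network inference.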
[173] 3D Instruction Ambiguity Detection
- arXiv: 2601.05991 (replaced)
- Authors: Jiayu Ding, Haoran Tang, Hongbo Jin, Wei Gao, Ge Li
- Subjects: cs.AI
- Tags: Embodied AI, Vision-Language Model, 3D Vision
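- Summary: This paper defines the 3D instruction ambiguity detection task, in which a model must judge whether a command has a single unambiguous meaning in a given 3D scene. The authors build the Ambi3D benchmark (700+ scenes, 22k instructions) and propose AmbiVer, a framework for detecting instruction ambiguity.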
[174] TRIM: Hybrid Inference via Targeted Stepwise Routing in Multi-Step Reasoning Tasks
- arXiv: 2601.10245 (replaced)
- Authors: Vansh Kapoor, Aman Gupta, Hao Chen, Anurag Beniwal, Jing Huang, Aviral Kumar
- Subjects: cs.AI; cs.CL; cs.LG
- Tags: LLM Reasoning, LLM Inference, Mathematical Reasoning
- Venue: ICLR 2026
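- Summary: This paper proposes TRIM, a hybrid inference method that routes only critical reasoning steps to a large model, while a small model handles routine continuations. It uses a process reward model to identify erroneous steps, and achieves up to 6x better cost efficiency on math reasoning benchmarks.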
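The stepwise routing idea can be sketched as a loop in which a process reward model (PRM) gates each step proposed by the small model. Several pieces are hypothetical stand-ins: the three callables (`small_step`, `large_step`, `prm_score`), the fixed threshold, and the `ANSWER:` stop marker. They are not TRIM's actual interfaces or routing policy.

```python
from typing import Callable, List, Tuple

def trim_style_generate(question: str,
                        small_step: Callable[[List[str]], str],
                        large_step: Callable[[List[str]], str],
                        prm_score: Callable[[List[str], str], float],
                        threshold: float = 0.5,
                        max_steps: int = 20) -> Tuple[List[str], int]:
    """Keep each small-model step unless the PRM scores it below `threshold`;
    in that case regenerate only that step with the large model."""
    context: List[str] = [question]
    large_calls = 0
    for _ in range(max_steps):
        candidate = small_step(context)
        if prm_score(context, candidate) < threshold:
            candidate = large_step(context)  # escalate this single step
            large_calls += 1
        context.append(candidate)
        if candidate.startswith("ANSWER:"):  # hypothetical stop marker
            break
    return context[1:], large_calls
```

Comparing `large_calls` with the total number of steps gives a simple view of the cost side of the trade-off.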
[175] AMA: Adaptive Memory via Multi-Agent Collaboration
- arXiv: 2601.20352 (replaced)
- Authors: Weiquan Huang, Zixuan Wang, Hehai Lin, Sudong Wang, Bo Xu, Qian Li, Beier Zhu, Linyi Yang, Chengwei Qin
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture, Multi-Agent System
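- Summary: This paper proposes Adaptive Memory via Multi-Agent Collaboration (AMA), a framework in which coordinated agents manage memory at multiple granularities. AMA uses a hierarchical memory design with Constructor, Retriever, Judge, and Refresher agents. It significantly outperforms existing methods on long-context benchmarks.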
[176] Bayesian-LoRA: Probabilistic Low-Rank Adaptation of Large Language Models
- arXiv: 2601.21003 (replaced)
- Authors: Moule Lin, Shuhao Guan, Andrea Patane, David Gregg, Goetz Botterweck
- Subjects: cs.AI
- Tags: Parameter-Efficient Fine-Tuning, Uncertainty Estimation, Bayesian Optimization
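- Summary: This paper introduces Bayesian-LoRA, which reformulates deterministic LoRA updates as a probabilistic low-rank representation inspired by sparse Gaussian processes. With only about 0.42M additional parameters, it reduces ECE by up to 84% and NLL by up to 76% while maintaining competitive accuracy.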
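Expected calibration error (ECE), one of the metrics reported for Bayesian-LoRA, measures the gap between a model's confidence and its accuracy. The sketch below uses the common equal-width binning definition with 10 bins. The paper's exact binning scheme is not stated in this entry and may differ.

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width confidence bins: sum over bins of
    (fraction of samples in bin) * |bin accuracy - bin mean confidence|."""
    conf = np.asarray(confidences, dtype=float)
    corr = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin b covers (edges[b], edges[b+1]]; confidence 0.0 falls into the first bin.
    bins = np.clip(np.digitize(conf, edges[1:-1], right=True), 0, n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            ece += mask.mean() * abs(corr[mask].mean() - conf[mask].mean())
    return float(ece)
```

For example, four predictions at confidence 0.85 of which three are correct give an ECE of 0.1.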
[177] H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration
- arXiv: 2602.05407 (replaced)
- Authors: Jun-Min Lee, Meong Hi Son, Edward Choi
- Subjects: cs.AI; cs.CL
- Tags: Multi-Agent System, LLM Agent, Medical AI
- Venue: CHIL 2026
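- Summary: This paper presents H-AdminSim, a comprehensive framework that combines realistic data generation with multi-agent simulation to model hospital administrative workflows. Through FHIR integration it provides a unified, interoperable testbed for systematically evaluating the feasibility of LLM-driven administrative automation.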
[178] Contextuality from Single-State Ontological Models: An Information-Theoretic Obstruction
- arXiv: 2602.16716 (replaced)
- Authors: Song-Ju Kim
- Subjects: cs.AI; cs.IT
- Tags: Quantum Computing, Information Theory
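- Summary: This paper studies classical ontological descriptions and derives an information-theoretic obstruction. When a classical single-state model reproduces operational statistics using an auxiliary context register, the contextual information it requires is lower-bounded by the conditional mutual information $I(C;O \mid \lambda)$.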
[179] DeepPresenter: Environment-Grounded Reflection for Agentic Presentation Generation
- arXiv: 2602.22839 (replaced)
- Authors: Hao Zheng, Guozhao Mo, Xinru Yan, Qianhao Yuan, Wenkai Zhang, Xuanang Chen, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
- Subjects: cs.AI
- Tags: LLM Agent, Text Generation
- Code: code
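- Summary: DeepPresenter is an agentic framework for presentation generation that autonomously plans, renders, and revises slides. It refines them iteratively through environment-grounded reflection on the rendered output, rather than through self-reflection on internal signals.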
[180] GraphScout: Empowering Large Language Models with Intrinsic Exploration Ability for Agentic Graph Reasoning
- arXiv: 2603.01410 (replaced)
- Authors: Yuchen Ying, Weiqi Jiang, Tongya Zheng, Yu Wang, Shunyu Liu, Kaixuan Chen, Mingli Song
- Subjects: cs.AI
- Tags: Knowledge Graph, RAG, LLM Reasoning
- Code: code
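- Summary: GraphScout is a training-centric framework for agentic graph reasoning. It lets LLMs interact autonomously with knowledge graphs and synthesize structured training data, so that they internalize agentic graph reasoning ability without human annotation.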
[181] Animating Petascale Time-varying Data on Commodity Hardware with LLM-assisted Scripting
- arXiv: 2603.07053 (replaced)
- Authors: Ishrat Jahan Eliza, Xuan Huang, Aashish Panta, Alper Sahistan, Zhimin Li, Amy A. Gooch, Valerio Pascucci
- Subjects: cs.AI; eess.SY
- Tags: Data Visualization, LLM Agent, Scientific Computing
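- Summary: This paper presents a framework for creating 3D animations of petascale time-varying data on ordinary workstations. It combines a keyframe animation abstraction, efficient cloud data access, and an LLM-assisted conversational interface. This lets domain scientists without visualization expertise produce animations quickly.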
[182] Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
- arXiv: 2604.05808 (replaced)
- Authors: Shuai Zhen, Yanhua Yu, Ruopei Guo, Nan Cheng, Yang Deng
- Subjects: cs.AI; cs.LG
- Tags: LLM Agent, Reinforcement Learning, Hierarchical RL
- Venue: ACL 2026
- Code: code
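- Summary: STEP-HRL is a hierarchical reinforcement learning framework that lets LLM agents learn from single-step transitions rather than full interaction histories. It represents global progress through completed subtasks and iteratively compresses the interaction history with a local progress module.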
[183] Reason in Chains, Learn in Trees: Self-Rectification and Grafting for Multi-turn Agent Policy Optimization
- arXiv: 2604.07165 (replaced)
- Authors: Yu Li, Sizhe Tang, Tian Lan
- Subjects: cs.AI; cs.LG
- Tags: LLM Agent, Reinforcement Learning, LLM Reasoning
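- Summary: The T-STAR framework consolidates trajectories into a unified cognitive tree to recover latent correlated reward structure. This enables introspective valuation and in-context thought grafting, which are used to optimize multi-turn agent policies.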
[184] IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
- arXiv: 2604.07709 (replaced)
- Authors: David Gringras
- Subjects: cs.AI; cs.CL; cs.CY; cs.LG
- Tags: LLM Evaluation, AI Safety, Medical AI
- Code: code
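- Summary: IatroBench uses 60 pre-registered clinical scenarios to measure identity-contingent information withholding by frontier models. It finds that every model gives better guidance when a clinical question is framed as coming from a physician, revealing that AI safety measures can cause iatrogenic harm.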
[185] ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models
- arXiv: 2604.08064 (replaced)
- Authors: Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong
- Subjects: cs.AI
- Tags: LLM Evaluation, Memory Architecture, Benchmark
- Venue: ACL 2026
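- Summary: ImplicitMemBench is the first benchmark to systematically evaluate implicit memory in LLMs. It tests three cognitive constructs: procedural memory, priming, and classical conditioning. All models perform far below the human baseline.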
[186] Avenir-UX: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding
- arXiv: 2604.09581 (replaced)
- Authors: Wee Joe Tan, Zi Rui Lucas Lim, Shashank Durgad, Karim Obegi, Aiden Yiliu Li
- Subjects: cs.AI; cs.CY; cs.HC
- Tags: GUI Automation, Usability Evaluation, LLM Agent
- Code: code
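- Summary: Avenir-UX is a user-experience evaluation agent that simulates user behavior on websites via GUI grounding. It produces standardized usability reports comprising System Usability Scale scores, per-step ease-of-use questions, and a concurrent think-aloud protocol.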
[187] FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks
- arXiv: 2604.10015 (replaced)
- Authors: Yupeng Cao, Haohang Li, Weijin Liu, Wenbo Cao, Anke Xu, Lingfei Qian, Xueqing Peng, Minxue Tang, Zhiyuan Yao, Jimin Huang, K.P. Subbalakshmi, Zining Zhu, Jordan W. Suchow, Yangyang Yu
- Subjects: cs.AI; cs.CE; cs.CL; cs.MM
- Tags: Financial AI, Tool Learning, LLM Evaluation
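- Summary: FinTrace is a benchmark for evaluating LLM tool calling in long-horizon financial tasks. It contains 800 expert-annotated trajectories and uses nine metrics for fine-grained assessment along four dimensions: action correctness, execution efficiency, process quality, and output quality.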
[188] Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents
- arXiv: 2604.11465 (replaced)
- Authors: S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis, Antonios Saravanos
- Subjects: cs.AI
- Tags: LLM Agent, LLM Inference, Tool Learning
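- Summary: This paper proposes a three-tier inference-time scaffolding pipeline that deploys a single frozen model in three roles (summarization model, main agent model, and correction model). Without additional training, it roughly doubles a small model's performance on complex tool-use tasks.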
[189] The Non-Optimality of Scientific Knowledge: Path Dependence, Lock-In, and The Local Minimum Trap
- arXiv: 2604.11828 (replaced)
- Authors: Mohamed Mabrok
- Subjects: cs.AI; cs.CY; math.OC
- Tags: Philosophy of Science, Cognitive Science
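- Summary: This paper argues that scientific knowledge at any historical moment represents a local rather than a global optimum, shaped by historical contingency, cognitive path dependence, and institutional lock-in. It identifies three lock-in mechanisms and proposes metascientific strategies for escaping local optima.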
[190] Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks
- arXiv: 2604.12102 (replaced)
- Authors: Arun Sharma
- Subjects: cs.AI; cs.CV; cs.LG
- Tags: Spatial Reasoning, LLM Agent, Benchmark
- Code: code
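- Summary: Spatial Atlas introduces a compute-grounded reasoning paradigm that solves the subproblems of spatially aware research agents through deterministic computation. It combines a spatial scene graph engine with entropy-guided action selection, and achieves competitive accuracy on the FieldWorkArena and MLE-Bench benchmarks.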
[191] DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
- arXiv: 2604.12812 (replaced)
- Authors: Hao Yan, Yuliang Liu, Xingchen Liu, Yuyi Zhang, Minghui Liao, Jihao Wu, Wei Chen, Xiang Bai
- Subjects: cs.AI
- Tags: Document Understanding, Vision-Language Model, Multimodal Learning
- Venue: CVPR 2026
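- Summary: DocSeeker addresses long-document understanding with a structured analyze, locate, and reason workflow. It uses supervised fine-tuning and evidence-aware group relative policy optimization to jointly optimize evidence localization and answer accuracy.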
[192] From edges to meaning: Semantic line sketches as a cognitive scaffold for ancient pictograph invention
- arXiv: 2604.12865 (replaced)
- Authors: Seowung Leem, Lin Gu, Ruogu Fang
- Subjects: cs.AI
- Tags: Cognitive Science, Image Generation, Representation Learning
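- Summary: This paper proposes that ancient pictographs arose from the brain's intrinsic tendency to compress visual input into stable boundary abstractions. The authors build a biologically inspired digital twin of the visual hierarchy. The symbols it generates bear a striking resemblance to early pictographic scripts such as Egyptian hieroglyphs and Chinese oracle bone script.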
[193] Auto-FP: An Experimental Study of Automated Feature Preprocessing for Tabular Data
- arXiv: 2310.02540 (replaced)
- Authors: Danrui Qi, Jinglin Peng, Yongjun He, Jiannan Wang
- Subjects: cs.LG; cs.AI; cs.DB; cs.IR
- Tags: AutoML, Tabular Learning
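- Summary: This paper studies automated feature preprocessing for tabular data, framing it as a hyperparameter optimization or neural architecture search problem. Evaluating 15 algorithms on 45 datasets, it finds that evolution-based algorithms perform best and that random search is a strong baseline.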
[194] Deep Learning Based Amharic Chatbot for FAQs in Universities
- arXiv: 2402.01720 (replaced)
- Authors: Goitom Ybrah Hailu, Hadush Hailu, Shishay Welay
- Subjects: cs.CY; cs.AI; cs.CL; cs.LG
- Tags: Dialogue System, Low-Resource NLP, Question Answering
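- Summary: This paper presents a deep-learning-based Amharic chatbot for university FAQs. Comparing SVM, naive Bayes, and deep neural network models, it finds that the deep learning model achieves the highest accuracy. The chatbot was integrated into Facebook Messenger.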
[195] RECOVER: Designing a Large Language Model-based Remote Patient Monitoring System for Postoperative Gastrointestinal Cancer Care
- arXiv: 2502.05740 (replaced)
- Authors: Ziqi Yang, Yuxuan Lu, Jennifer Bagdasarian, Vedant Das Swain, Ritu Agarwal, Collin Campbell, Waddah Al-Refaire, Jehan El-Bayoumi, Guodong Gao, Dakuo Wang, Bingsheng Yao, Nawar Shara
- Subjects: cs.HC; cs.AI
- Tags: Medical AI, LLM Agent, Human-Computer Interaction
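- Summary: This paper designs RECOVER, an LLM-based remote patient monitoring system for postoperative gastrointestinal cancer care. Design strategies were identified through participatory design, and the system implements a conversational agent and a clinical dashboard.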
[196] A closer look at how large language models trust humans: patterns and biases
- arXiv: 2504.15801 (replaced)
- Authors: Valeria Lerman, Yaniv Dover
- Subjects: cs.CL; cs.AI; cs.CY
- Tags: LLM Evaluation, Fairness, AI Ethics
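- Summary: This paper studies how large language models develop trust in humans in decision-making contexts. It finds that LLM trust resembles human trust, being shaped by competence, benevolence, and integrity. However, it also exhibits demographic biases related to age, religion, and gender.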
[197] MulDimIF: A Multi-Dimensional Constraint Framework for Evaluating and Improving Instruction Following in Large Language Models
- arXiv: 2505.07591 (replaced)
- Authors: Junjie Ye, Caishuang Huang, Zhuohan Chen, Wenjie Fu, Chenyuan Yang, Leyi Yang, Yilong Wu, Peng Wang, Meng Zhou, Xiaolong Yang, Tao Gui, Qi Zhang, Zhongchao Shi, Jianping Fan, Xuanjing Huang
- Subjects: cs.CL; cs.AI
- Tags: Instruction Tuning, LLM Evaluation, Data Synthesis
- Venue: ACL 2026
- Code: code
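- Summary: This paper proposes MulDimIF, a multi-dimensional constraint framework for evaluating and improving instruction following in large language models. Training on constructed code-verifiable samples substantially improves model performance across different constraint settings.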
[198] Two-Stage Regularization-Based Structured Pruning for LLMs
- arXiv: 2505.18232 (replaced)
- Authors: Mingkuan Feng, Jinyang Wu, Siyuan Liu, Shuai Zhang, Ruihan Jin, Feihu Che, Pengpeng Shao, Zhengqi Wen, Jianhua Tao
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Model Compression, LLM Inference
- Venue: ACL 2026
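- Summary: This paper proposes TRSP, a two-stage regularization-based structured pruning method for large language models. It preserves knowledge by learning weights together with regularization terms, and outperforms existing methods without retraining.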
[199] Visual Sparse Steering (VS2): Unsupervised Adaptation for Image Classification using Sparsity-Guided Steering Vectors
- arXiv: 2506.01247 (replaced)
- Authors: Gerasimos Chatzoudis, Zhuowei Li, Gemma E. Moran, Hao Wang, Dimitris N. Metaxas
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Test-Time Adaptation, Image Classification, Interpretability
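- Summary: This paper proposes VS2, an unsupervised adaptation method that builds steering vectors from sparse features extracted by a sparse autoencoder. Without labels or test-time optimization, it substantially improves the zero-shot classification accuracy of vision models.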
[200] Frozen Forecasting: A Unified Evaluation
- arXiv: 2507.13942 (replaced)
- Authors: Jacob C Walker, Pedro Vélez, Luisa Polania Cabrera, Guangyao Zhou, Sayna Ebrahimi, Rishabh Kabra, Carl Doersch, Maks Ovsjanikov, João Carreira, Shiry Ginosar
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Video Understanding, Representation Learning, Diffusion Model
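- Summary: This paper proposes a unified evaluation framework that uses latent diffusion models to assess the forecasting ability of frozen vision backbones. It finds that video-pretrained models outperform image models on tasks at every level of abstraction.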
[201] Autonomous Multi-objective Alloy Design through Simulation-guided Optimization
- arXiv: 2507.16005 (replaced)
- Authors: Penghui Yang, Chendong Zhao, Bijun Tang, Zhonghan Zhang, Xinrun Wang, Yanchen Deng, Xuyu Dong, Yuhao Lu, Jianguo Huang, Yixuan Li, Yushan Xiao, Cuntai Guan, Zheng Liu, Bo An
- Subjects: cs.AI; cs.LG
- Tags: Material Discovery, LLM Agent, Optimization
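- Summary: This paper proposes AutoMAT, a framework that combines large language models, CALPHAD simulation, and AI-driven optimization for autonomous alloy design, from ideation to experimental validation. It discovers new alloys that outperform the baseline.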
[202] Sandwich: Joint Configuration Search and Hot-Switching for Efficient CPU LLM Serving
- arXiv: 2507.18454 (replaced)
- Authors: Juntao Zhao, Jiuru Li, Chuan Wu
- Subjects: cs.AR; cs.AI; cs.DC; cs.PL
- Tags: LLM Serving, Hardware Acceleration
- Venue: DAC 2026
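- Summary: This paper presents Sandwich, a system for CPU-based LLM serving that uses phase switching, hardware abstraction, and dynamic tensor program generation. It addresses resource contention and performance problems, and achieves significant speedups.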
[203] Activation-Guided Local Editing for Jailbreaking Attacks
- arXiv: 2508.00555 (replaced)
- Authors: Jiecong Wang, Haoran Li, Hao Peng, Ziqian Zeng, Zihao Wang, Haohua Du, Zhengtao Yu
- Subjects: cs.CR; cs.AI; cs.CL
- Tags: LLM Security, Adversarial Robustness
- Code: code
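- Summary: This paper proposes AGILE, a two-stage jailbreaking framework that combines scenario generation with activation-guided local editing to evade model safety defenses. It achieves state-of-the-art attack success rates and transferability.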
[204] FCBV-Net: Category-Level Robotic Garment Smoothing via Feature-Conditioned Bimanual Value Prediction
- arXiv: 2508.05153 (replaced)
- Authors: Mohammed Daba, Jing Qiu
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Embodied AI, 3D Vision
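- Summary: This paper proposes FCBV-Net, which conditions bimanual action-value prediction on pretrained geometric features to address category-level generalization in robotic garment smoothing.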
[205] Memp: Exploring Agent Procedural Memory
- arXiv: 2508.06433 (replaced)
- Authors: Runnan Fang, Yuan Liang, Xiaobin Wang, Jialong Wu, Shuofei Qiao, Pengjun Xie, Fei Huang, Huajun Chen, Ningyu Zhang
- Subjects: cs.CL; cs.AI; cs.LG; cs.MA
- Tags: LLM Agent, Memory Architecture
- Venue: ACL 2026 Findings
- Code: code
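- Summary: This paper proposes Memp, a framework that builds learnable procedural memory by distilling agent trajectories into fine-grained instructions and higher-level scripts. It significantly improves agents' success rates and efficiency on similar tasks.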
[206] Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments
- arXiv: 2508.08791 (replaced)
- Authors: Junjie Ye, Changhao Jiang, Zhengyin Du, Yufei Xu, Xuesong Yao, Zhiheng Xi, Xiaoran Fan, Qi Zhang, Tao Gui, Xuanjing Huang, Jiecao Chen
- Subjects: cs.CL; cs.AI
- Tags: Tool Learning, Reinforcement Learning, LLM Agent
- Venue: ACL 2026
- Code: code
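- Summary: This paper proposes an automated environment-construction pipeline and a verifiable reward mechanism. Together they drive tool-use training of large language models through reinforcement learning feedback, significantly improving tool-use performance.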
[207] Decentralized Rank Scheduling for Energy-Constrained Multi-Task Federated Fine-Tuning in Edge-Assisted IoV Networks
- arXiv: 2508.09532 (replaced)
- Authors: Bokeng Zheng, Jianqiang Zhong, Jiayi Liu, Lei Xue, Xu Chen, Xiaoxi Zhang
- Subjects: cs.LG; cs.AI; cs.NI
- Tags: Federated Learning, Edge Computing, Parameter-Efficient Fine-Tuning
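- Summary: This paper proposes a hierarchical federated fine-tuning framework that uses LoRA and a decentralized rank-adaptation mechanism. It addresses the resource constraints and mobility challenges of multi-task adaptation in edge-assisted Internet of Vehicles (IoV) environments.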
[208] Heavy-Tailed Class-Conditional Priors for Long-Tailed Generative Modeling
- arXiv: 2509.02154 (replaced)
- Authors: Aymene Mohammed Bouayed, Samuel Deslauriers-Gauthier, Adrian Iaccovelli, David Naccache
- Subjects: cs.LG; cs.AI; cs.CV; stat.ML
- Tags: Generative Model, Representation Learning, Long-Tailed Learning
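- Summary: This paper proposes C-$t^3$VAE, which introduces heavy-tailed class-conditional priors to correct latent geometric bias in long-tailed data generation. It substantially improves generation quality under severe class imbalance.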
[209] Between a Rock and a Hard Place: The Tension Between Ethical Reasoning and Safety Alignment in LLMs
- arXiv: 2509.05367 (replaced)
- Authors: Shei Pern Chua, Zhen Leng Thai, Kai Jun Teh, Xiao Li, Qibing Ren, Xiaolin Hu
- Subjects: cs.CR; cs.AI
- Tags: LLM Alignment, LLM Security, Adversarial Robustness
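- Summary: This paper exposes the fragility of LLM safety alignment in ethical-dilemma scenarios. It proposes TRIAL, a multi-turn red-teaming attack that exploits a model's ethical reasoning. It also designs ERR, a defense framework that distinguishes instrumental from explanatory responses to improve robustness.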
[210] Neuro-Symbolic AI for Cybersecurity: State of the Art, Challenges, and Opportunities
- arXiv: 2509.06921 (replaced)
- Authors: Safayat Bin Hakim, Muhammad Adil, Alvaro Velasquez, Shouhuai Xu, Houbing Herbert Song
- Subjects: cs.CR; cs.AI
- Tags: Neurosymbolic AI, Cybersecurity, Survey
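- Summary: This survey systematically analyzes 103 papers applying neuro-symbolic AI to cybersecurity. It finds that multi-agent and structured-integration architectures clearly outperform single-agent approaches, and that causal reasoning enables proactive defense. It also finds that knowledge-guided learning improves data efficiency and interpretability.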
[211] Failure Makes the Agent Stronger: Enhancing Accuracy through Structured Reflection for Reliable Tool Interactions
- arXiv: 2509.18847 (replaced)
- Authors: Junhao Su, Yuanliang Wan, Junwei Yang, Hengyu Shi, Tianyang Han, Junfeng Luo, Yurui Qiu
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: LLM Agent, Tool Learning
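- Summary: This paper proposes structured reflection, which turns error diagnosis and repair into explicit, trainable actions. It optimizes tool-use policies with DAPO and GSPO objectives, substantially improving the success rate and error recovery of multi-turn tool calling.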
[212] MARCH: Evaluating the Intersection of Ambiguity Interpretation and Multi-hop Inference
- arXiv: 2509.22750 (replaced)
- Authors: Jeonghyun Park, Ingeol Baek, Seunghyun Yoon, Haeun Jang, Aparna Garimella, Akriti Jain, Nedim Lipka, Hwanhee Lee
- Subjects: cs.CL; cs.AI
- Tags: Question Answering, Benchmark, LLM Reasoning
- Venue: ACL 2026 Findings
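- Summary: This paper introduces MARCH, a benchmark of 2,209 ambiguous multi-hop questions, and shows that state-of-the-art models struggle to combine ambiguity resolution with multi-step reasoning. It also proposes CLARION, a two-stage agent framework that explicitly separates ambiguity planning from evidence-driven reasoning.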
[213] Hybrid Approach for Enhancing Lesion Segmentation in Fundus Images
- arXiv: 2509.25549 (replaced)
- Authors: Mohammadmahdi Eshragh, Emad A. Mohammed, Behrouz Far, Ezekiel Weis, Carol L Shields, Sandor R Ferenczy, Trafford Crump
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Segmentation, Medical AI
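- Summary: This paper proposes a hybrid approach to lesion segmentation in fundus images that combines mathematical/clustering segmentation models with insights from U-Net. It reduces the need for large training datasets while substantially improving segmentation accuracy and generalization.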
[214] The Signal is in the Steps: Local Scoring for Reasoning Data Selection
- arXiv: 2510.03988 (replaced)
- Authors: Hoang Anh Just, Myeongseob Ko, Ruoxi Jia
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, Data Selection, LLM Reasoning
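- Summary: This paper proposes Local Average Log-Probability (LALP), which selects training data by scoring each reasoning step rather than the full trajectory. This enables small models to learn long reasoning chains effectively from multiple teacher models.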
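Step-level scoring of reasoning traces can be sketched as follows. The sketch assumes per-token log-probabilities from the student and known step boundaries. Each step gets its average log-probability, and a trace is ranked by its weakest step. This minimum aggregation is an assumption made for illustration, and LALP's precise definition may differ.

```python
import numpy as np
from typing import List, Sequence, Tuple

def step_scores(token_logprobs: Sequence[float], step_lengths: Sequence[int]) -> List[float]:
    # Average log-probability of the tokens inside each reasoning step.
    if any(n <= 0 for n in step_lengths) or sum(step_lengths) != len(token_logprobs):
        raise ValueError("step lengths must be positive and cover all tokens")
    bounds = np.cumsum([0, *step_lengths])
    lp = np.asarray(token_logprobs, dtype=float)
    return [float(lp[a:b].mean()) for a, b in zip(bounds[:-1], bounds[1:])]

def local_score(token_logprobs, step_lengths) -> float:
    # Hypothetical aggregation: a trace is only as learnable as its hardest step.
    return min(step_scores(token_logprobs, step_lengths))

def select_traces(traces: List[Tuple[Sequence[float], Sequence[int]]], k: int) -> List[int]:
    # traces: (token_logprobs, step_lengths) pairs; return indices of the k best-scoring traces.
    scores = [local_score(lp, sl) for lp, sl in traces]
    return sorted(range(len(traces)), key=lambda i: scores[i], reverse=True)[:k]
```

Averaging over the whole trajectory can hide a single very low-probability step behind many easy tokens. Scoring steps separately exposes it.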
[215] Geometry-Aware Cross Modal Alignment for Light Field-LiDAR Semantic Segmentation
- arXiv: 2510.06687 (replaced)
- Authors: Jie Luo, Yuxuan Jiang, Xin Jin, Mingyu Liu, Yihui Fan
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Sensor Fusion, Autonomous Driving
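- Summary: This paper introduces the first multimodal semantic segmentation dataset that integrates light field data with point clouds. It also designs Mlpfseg, a network that fuses light field and LiDAR data through feature completion and depth-aware modules.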
[216] Native Hybrid Attention for Efficient Sequence Modeling
- arXiv: 2510.07019 (replaced)
- Authors: Jusen Du, Jiaxi Hu, Tao Zhang, Weigao Sun, Yu Cheng
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Inference, Long Context
- Venue: ACL 2026
- Code: code
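- Summary: This paper proposes Native Hybrid Attention (NHA), an architecture that unifies linear and full attention within a single layer design. A linear RNN maintains long-term context, which is combined with short-term tokens from a sliding window. NHA delivers substantial efficiency gains while remaining competitive.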
[217] SPG: Sandwiched Policy Gradient for Masked Diffusion Language Models
- arXiv: 2510.09541 (replaced)
- Authors: Chenyu Wang, Paria Rashidinejad, DiJia Su, Song Jiang, Sid Wang, Siyan Zhao, Cai Zhou, Shannon Zejiang Shen, Feiyu Chen, Tommi Jaakkola, Yuandong Tian, Bo Liu
- Subjects: cs.CL; cs.AI
- Tags: Diffusion Model, RLHF, LLM Alignment
- Venue: ICLR 2026
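- Summary: This paper proposes Sandwiched Policy Gradient (SPG), which uses both upper and lower bounds on the log-likelihood to align diffusion large language models. It significantly outperforms existing methods on tasks such as GSM8K and MATH500.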
[218] A Function-Centric Perspective on Flat and Sharp Minima
- arXiv: 2510.12451 (replaced)
- Authors: Israel Mason-Williams, Gabryel Mason-Williams, Helen Yannakoudakis
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Deep Learning Theory, Optimization
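- Summary: This paper revisits the relationship between flat minima and generalization in deep neural networks. Extensive experiments show that sharpness is a function-dependent property rather than an indicator of poor generalization, and that sharper minima may reflect more appropriate inductive biases.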
[219] Scaling Test-Time Compute to Achieve IOI Gold Medal with Open-Weight Models
- arXiv: 2510.14232 (replaced)
- Authors: Mehrzad Samadi, Aleksander Ficek, Sean Narenthiran, Siddhartha Jain, Wasi Uddin Ahmad, Somshubra Majumdar, Vahid Noroozi, Boris Ginsburg
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Code Generation, LLM Reasoning, Benchmark
- Venue: ACL 2026
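- Summary: This paper proposes GenCluster, a framework that combines large-scale generation, behavioral clustering, ranking, and round-robin submission. With it, an open-weight model reaches gold-medal-level performance at IOI 2025 for the first time.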
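The clustering and round-robin submission stages can be sketched for candidate programs represented as Python callables. Two simplifications are hypothetical: behavioral equivalence is approximated by identical outputs on a handful of probe inputs, and clusters are ranked by size. The summary does not specify GenCluster's ranking criterion, and all function names here are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

def behavior_signature(program: Callable, test_inputs: Sequence) -> Tuple[str, ...]:
    # Programs that produce identical outputs on every probe input share a signature.
    outs = []
    for x in test_inputs:
        try:
            outs.append(repr(program(x)))
        except Exception as e:
            outs.append(f"error:{type(e).__name__}")
    return tuple(outs)

def cluster_and_schedule(programs: List[Callable], test_inputs: Sequence, budget: int) -> List[int]:
    clusters: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, p in enumerate(programs):
        clusters[behavior_signature(p, test_inputs)].append(i)
    # Hypothetical ranking: larger clusters (more agreement among samples) first.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    # Round-robin: take one member from each cluster per pass until the budget is spent.
    order: List[int] = []
    depth = 0
    while len(order) < budget and any(depth < len(c) for c in ranked):
        for c in ranked:
            if depth < len(c) and len(order) < budget:
                order.append(c[depth])
        depth += 1
    return order
```

The round-robin order spreads a limited submission budget across distinct behaviors instead of spending it all on one cluster.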
[220] A Practitioner's Guide to Kolmogorov-Arnold Networks
- arXiv: 2510.25781 (replaced)
- Authors: Amir Noorizadegan, Sifan Wang, Leevan Ling, Juan P. Dominguez-Morales
- Subjects: cs.LG; cs.AI; cs.NE; math.NA
- Tags: Survey, Neural Architecture Search
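- Summary: This survey systematically reviews the rapidly growing literature on Kolmogorov-Arnold Networks (KANs). Its analysis is organized around several core themes: the relationship of KANs to the Kolmogorov superposition theorem (KST), MLPs, and classical kernel methods; basis-function design; and accuracy and efficiency.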
[221] X-Diffusion: Training Diffusion Policies on Cross-Embodiment Human Demonstrations
- arXiv: 2511.04671 (replaced)
- Authors: Maximus A. Pace, Prithwish Dan, Chuanruo Ning, Atiksh Bhardwaj, Audrey Du, Edward W. Duan, Wei-Chiu Ma, Kushal Kedia
- Subjects: cs.RO; cs.AI; cs.CV
- Tags: Imitation Learning, Robotics, Diffusion Model
- Venue: ICRA 2026
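- Summary: This paper proposes X-Diffusion, a cross-embodiment learning framework that treats human actions as noisy versions of robot actions. By selectively training the diffusion policy at high-noise timesteps, it exploits human video data without sacrificing robot feasibility.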
[222] fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding
- arXiv: 2511.21760 (replaced)
- Authors: Yuxiang Wei, Yanteng Zhang, Xi Xiao, Chengxuan Qian, Tianyang Wang, Vince D. Calhoun
- Subjects: cs.CL; cs.AI
- Tags: Foundation Model, Multimodal Learning, Neuroscience
- Code: code
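- Summary: This paper proposes fMRI-LM, a foundation model that connects functional MRI with language through a three-stage framework. The stages are learning a neural tokenizer, adapting an LLM to jointly model fMRI tokens and text, and multi-task instruction tuning. Together they enable structural and semantic understanding of fMRI.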
[223] Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates
- arXiv: 2512.04844 (replaced)
- Authors: Atsuki Yamaguchi, Terufumi Morishita, Aline Villavicencio, Nikolaos Aletras
- Subjects: cs.CL; cs.AI
- Tags: Continual Learning, Multilingual Learning, LLM Training
- Venue: ACL 2026
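- Summary: This paper proposes Source-Shielded Updates (SSU), a selective parameter-update strategy that identifies and freezes the parameters critical to source-language capabilities. It effectively mitigates catastrophic forgetting under low-resource constraints where only unlabeled target-language data is available.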
[224] Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
- arXiv: 2512.08639 (replaced)
- Authors: Huilin Xu, Zhuoyang Liu, Yixiang Luomei, Feng Xu
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Autonomous Driving, Embodied AI
- Code: code
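- Summary: This paper proposes a unified aerial vision-language navigation framework that relies solely on monocular RGB observations and natural-language instructions. It jointly optimizes spatial perception, trajectory reasoning, and action prediction through prompt-guided multi-task learning.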
[225] SAQ: Stabilizer-Aware Quantum Error Correction Decoder
- arXiv: 2512.08914 (replaced)
- Authors: David Zenati, Eliya Nachmani
- Subjects: cs.AI
- Tags: Quantum Computing, Fault Tolerance
- Venue: ICLR 2026
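- Summary: This paper proposes SAQ-Decoder, a quantum error correction decoding framework that combines Transformer-based learning with constraint-aware post-processing. It achieves near maximum-likelihood decoding accuracy while keeping linear computational complexity. On the toric code it reaches error thresholds of 10.99% (independent noise) and 18.6% (depolarizing noise).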
[226] ZK-APEX: Zero-Knowledge Approximate Personalized Unlearning with Executable Proofs
- arXiv: 2512.09953 (replaced)
- Authors: Mohammad M Maheri, Sunil Cotterill, Alex Davidson, Hamed Haddadi
- Subjects: cs.CR; cs.AI; cs.LG
- Tags: Machine Unlearning, Privacy, Edge Computing
- Venue: MLSys 2026
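- Summary: This paper proposes ZK-APEX, a zero-shot personalized unlearning method that combines sparse masking with zero-knowledge proofs. It lets personalized models on edge devices verifiably forget target data. On Vision Transformer classification tasks, it recovers nearly all personalized accuracy while effectively removing the targeted information.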
[227] VeruSAGE: A Study of Agent-Based Verification for Rust Systems
- arXiv: 2512.18436 (replaced)
- Authors: Chenyuan Yang, Natalie Neamtu, Chris Hawblitzel, Jacob R. Lorch, Shan Lu
- Subjects: cs.OS; cs.AI; cs.FL; cs.SE
- Tags: LLM Agent, Formal Methods, Software Engineering
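- Summary: This paper studies the ability of LLMs to generate correctness proofs for Rust systems software, and builds VeruSAGE-Bench, a benchmark of 849 proof tasks. The best LLM-agent combination completes more than 80% of the system verification tasks, showing strong potential for LLM-assisted development of verified systems software.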
[228] BitFlipScope: Scalable Fault Localization and Recovery for Bit-Flip Corruptions in LLMs
- arXiv: 2512.22174 (replaced)
- Authors: Muhammad Zeeshan Karamat, Sadman Saif, Christiana Chamon Garcia
- Subjects: cs.DC; cs.AI
- Tags: Fault Tolerance, LLM Security
- Venue: HOST 2026
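- Summary: This paper proposes BitFlipScope, a scalable software framework for locating the regions of an LLM corrupted by bit-flip faults. It supports scenarios both with and without a reference model, diagnoses faults effectively, and enables lightweight performance recovery without fine-tuning.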
[229] Strategic Response of News Publishers to Generative AI
- arXiv: 2512.24968 (replaced)
- Authors: Hangcheng Zhao, Ron Berman
- Subjects: econ.GN; cs.AI; cs.CY; stat.AP
- Tags: AI Ethics
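- Summary: Using high-frequency, fine-grained data, this paper analyzes how news publishers strategically respond to generative AI. Large publishers that block GenAI crawlers saw their website traffic decline. They also shifted toward richer content that is harder for LLMs to replicate, and increased the share of hiring for editorial and content-production roles.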
[230] Safe-FedLLM: Delving into the Safety of Federated Large Language Models
- arXiv: 2601.07177 (replaced)
- Authors: Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang
- Subjects: cs.CR; cs.AI
- Tags: Federated Learning, LLM Security
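- Summary: This paper studies safety issues in federated LLM training and proposes Safe-FedLLM, a defense framework that detects malicious clients by probing their LoRA updates. Experiments show that it effectively improves FedLLM robustness against malicious clients while maintaining competitive performance on benign data.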
[231] Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations
- arXiv: 2601.07422 (replaced)
- Authors: Wen Luo, Guangyue Peng, Wei Li, Shaohang Wei, Feifan Song, Liang Wang, Nan Yang, Xingxing Zhang, Jing Jin, Furu Wei, Houfeng Wang
- Subjects: cs.CL; cs.AI
- Tags: LLM Hallucination, Interpretability
- Venue: ACL 2026
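- Summary: This paper shows that LLM truthfulness signals arise from two distinct information pathways. The question-anchored pathway relies on question-answer information flow, while the answer-anchored pathway draws evidence from the generated answer itself. The authors validate both pathways with attention knockout and token patching, and propose applications that improve hallucination detection.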
[232] ExpSeek: Self-Triggered Experience Seeking for Web Agents
- arXiv: 2601.08605 (replaced)
- Authors: Wenyuan Zhang, Xinghua Zhang, Haiyang Yu, Shuaiyi Nie, Bingli Wu, Juwei Yue, Tingwen Liu, Yongbin Li
- Subjects: cs.CL; cs.AI
- Tags: LLM Agent, Web Agent
- Venue: ACL 2026 Findings
- Code: code
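- Summary: This paper proposes ExpSeek, which lets web agents proactively seek step-level experience interventions when an entropy threshold is exceeded. Experiments on four web-agent benchmarks show absolute performance gains of 9.3% and 7.5%.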
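Below is a minimal sketch of an entropy-triggered experience lookup. It assumes the agent exposes a probability distribution over candidate actions. The `retrieve` callable, the action names, and the 0.8-nat threshold are hypothetical, and ExpSeek's actual trigger and intervention format may differ.

```python
import math
from typing import Callable, Dict, List, Optional

def entropy(probs: Dict[str, float]) -> float:
    # Shannon entropy (nats) of the agent's distribution over candidate actions.
    return -sum(p * math.log(p) for p in probs.values() if p > 0)

def maybe_seek_experience(action_probs: Dict[str, float],
                          retrieve: Callable[[], List[str]],
                          threshold: float = 0.8) -> Optional[List[str]]:
    """Self-triggered lookup: consult the experience store only when the agent
    is uncertain (entropy above `threshold`); otherwise act without it."""
    if entropy(action_probs) > threshold:
        return retrieve()
    return None
```

A confident step (e.g. 95% on one action, entropy about 0.2 nats) skips retrieval. A near-uniform choice among four actions (about 1.39 nats) triggers it.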
[233] Optimized Human-Robot Co-Dispatch Planning for Petro-Site Surveillance under Varying Criticalities
- arXiv: 2602.07924 (replaced)
- Authors: Nur Ahmad Khatim, Mansur Arief
- Subjects: cs.RO; cs.AI; math.OC
- Tags: Robotics, Multi-Agent System
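- Summary: This paper formulates the Human-Robot Co-Dispatch Facility Location Problem (HRCD-FLP), which accounts for tiered infrastructure criticality, human-robot supervision ratio constraints, and minimum utilization requirements. Results show that moving from conservative to future autonomous operation can substantially reduce costs while maintaining full coverage of critical infrastructure.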
[234] In-Context Autonomous Network Incident Response: An End-to-End Large Language Model Agent Approach
- arXiv: 2602.13156 (replaced)
- Authors: Yiran Gao, Kim Hammar, Tao Li
- Subjects: cs.CR; cs.AI
- Tags: LLM Agent, Cybersecurity
- Venue: AAAI 2026 Workshop
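- Summary: This paper proposes an end-to-end LLM-based agent for autonomous network incident response that integrates four functions: perception, reasoning, planning, and action. The agent uses in-context learning to adapt to evolving cyberattacks and runs without a hand-crafted simulator.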
[235] Online Navigation Planning for Long-term Autonomous Operation of Underwater Gliders
- arXiv: 2602.19315 (replaced)
- Authors: Victor-Alexandru Darvariu, Charlotte Z. Reed, Jan Stratmann, Bruno Lacerda, Benjamin Allsup, Stephen Woodward, Elizabeth Siddle, Trishna Saeharaseelan, Owain Jones, Dan Jones, Tobias Ferreira, Chloe Baker, Kevin Chaplin, James Kirk, Ashley Iceton-Morris, Ryan D. Patmore, Jeff Polton, Charlotte Williams, Christopher D. J. Auckland, Rob A. Hall, Alexandra Kokkinaki, Alvaro Lorenzo Lopez, Justin J. H. Buck, Nick Hawes
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Automated Planning
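- Summary: This paper presents an uncertainty-aware online navigation planning system based on Monte Carlo tree search for the long-term autonomous operation of underwater glider robots. The system was validated in a North Sea deployment lasting about 3 months and covering about 1,000 km, and improved dive duration and path length substantially compared with standard navigation methods.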
[236] Echoes Over Time: Unlocking Length Generalization in Video-to-Audio Generation Models
- arXiv: 2602.20981 (replaced)
- Authors: Christian Simon, Masato Ishii, Wei-Yao Wang, Koichi Saito, Akio Hayakawa, Dongseok Shim, Zhi Zhong, Shuyang Cui, Shusuke Takahashi, Takashi Shibuya, Yuki Mitsufuji
- Subjects: cs.CV; cs.AI
- Tags: Video Understanding, Audio Generation, Multimodal Learning
- Venue: CVPR 2026
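- Summary: This paper proposes MMHNet, a hierarchical network for video-to-audio generation that can generate long audio of up to 5 minutes. The authors show that a model trained on short video-audio pairs can generalize to much longer test samples without any training on long-duration data.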
[237] FlexGuard: Continuous Risk Scoring for Strictness-Adaptive LLM Content Moderation
- arXiv: 2602.23636 (replaced)
- Authors: Zhihao Ding, Jinming Li, Ze Lu, Jieming Shi
- Subjects: cs.LG; cs.AI
- Tags: Content Moderation, AI Safety
- Venue: ACL 2026
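- Summary: This paper presents FlexGuard, an LLM content moderator that outputs a continuous risk score, so that different strictness requirements can be met simply by adjusting the decision threshold. This addresses the performance degradation that existing binary-classification moderators suffer when strictness requirements change.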
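The benefit of a continuous score is that one model can serve several policies by moving a threshold. The sketch below shows only that decision step; the strictness names and threshold values are hypothetical placeholders, and the risk score is assumed to come from some moderator already normalized to [0, 1].

```python
from typing import Dict

# Hypothetical strictness levels mapped to block thresholds on a [0, 1] risk score.
# A stricter policy blocks at a lower risk score.
STRICTNESS_THRESHOLDS: Dict[str, float] = {"lenient": 0.8, "standard": 0.5, "strict": 0.2}


def moderate(risk_score: float, strictness: str) -> bool:
    """Return True if content should be blocked under the given strictness level."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must lie in [0, 1]")
    return risk_score >= STRICTNESS_THRESHOLDS[strictness]


# The same score yields different decisions under different policies.
print(moderate(0.3, "strict"))    # True
print(moderate(0.3, "standard"))  # False
```

A binary classifier, by contrast, bakes one operating point into its training labels, which is the limitation the summary describes.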
[238] Evaluating LLM-Based Translation of a Low-Resource Technical Language: The Medical and Philosophical Greek of Galen
- arXiv: 2602.24119 (replaced)
- Authors: James L. Zainaldin, Cameron Pattison, Manuela Marai, Jacob Wu, Mark J. Schiefsky
- Subjects: cs.CL; cs.AI
- Tags: Machine Translation, Low-Resource NLP
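- Summary: This paper evaluates how well commercial LLMs translate ancient Greek technical prose. It finds that LLMs reach high quality on expository texts that already have published translations, but lower quality on untranslated pharmacological texts. The rarity of terminology is the main predictor of translation failure, and existing automatic evaluation metrics cannot reliably distinguish among high-quality translations.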
[239] Domain-Adaptive Model Merging Across Disconnected Modes
- arXiv: 2603.05957 (replaced)
- Authors: Junming Liu, Yusen Zhang, Rongchao Zhang, Wenkai Zhu, Tian Wu
- Subjects: cs.DC; cs.AI
- Tags: Model Merging, Domain Adaptation
- Venue: ICASSP 2026
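- Summary: This paper proposes DMM, a data-free model merging framework for highly divergent models. It synthesizes pseudo-data from normalization statistics and uses lightweight refinement to distill knowledge from the divergent models into the merged model, achieving state-of-the-art performance on unimodal and multimodal benchmarks.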
[240] Visual Self-Fulfilling Alignment: Shaping Safety-Oriented Personas via Threat-Related Images
- arXiv: 2603.08486 (replaced)
- Authors: Qishun Yang, Shu Yang, Lijie Hu, Di Wang
- Subjects: cs.CV; cs.AI
- Tags: LLM Alignment, Vision-Language Model
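- Summary: This paper proposes Visual Self-Fulfilling Alignment (VSFA), which shapes safety-oriented personas by fine-tuning vision-language models on threat-related images, without any safety labels. It extends the self-fulfilling mechanism from text to the visual modality, yielding a label-free approach to VLM alignment.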
[241] UNBOX: Unveiling Black-box visual models with Natural-language
- arXiv: 2603.08639 (replaced)
- Authors: Simone Carnemolla, Chiara Russo, Simone Palazzo, Quentin Bouniot, Daniela Giordano, Zeynep Akata, Matteo Pennisi, Concetto Spampinato
- Subjects: cs.CV; cs.AI
- Tags: Interpretability, Vision-Language Model
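- Summary: This paper proposes UNBOX, a framework for class-level dissection of black-box visual models under fully data-free, gradient-free, and backpropagation-free constraints. It uses large language models and text-to-image diffusion models to run a semantic search driven by output probabilities, revealing the concepts and latent biases the model has learned. Experiments show that, under strict black-box constraints, UNBOX is competitive with state-of-the-art white-box interpretability methods.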
[242] The Mirror Design Pattern: Strict Data Geometry over Model Scale for Prompt Injection Detection
- arXiv: 2603.11875 (replaced)
- Authors: J Alex Corll
- Subjects: cs.CR; cs.AI
- Tags: Prompt Injection, Cybersecurity
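- Summary: This paper introduces the Mirror design pattern, which organizes matched corpora of positive and negative samples to train linear classifiers for prompt injection detection. Under strict data constraints, a sparse character n-gram linear SVM achieves high recall and F1 score, showing that for first-layer prompt injection screening, strict data geometry matters more than model scale.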
[243] Just Use XML: Revisiting Joint Translation and Label Projection
- arXiv: 2603.12021 (replaced)
- Authors: Thennal DK, Chris Biemann, Hans Ole Hatzel
- Subjects: cs.CL; cs.AI
- Tags: Machine Translation, Multilingual Learning
- Venue: ACL 2026 Findings
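- Summary: This paper proposes LabelPigeon, a framework that performs translation and label projection jointly via XML tags, avoiding the loss of translation quality seen in earlier approaches. Experiments show that it outperforms baselines across multiple languages and downstream tasks and substantially improves cross-lingual transfer.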
[244] Graph In-Context Operator Networks for Generalizable Spatiotemporal Prediction
- arXiv: 2603.12725 (replaced)
- Authors: Chenghan Wu, Zongmin Yu, Boai Sun, Liu Yang
- Subjects: cs.LG; cs.AI
- Tags: In-Context Learning, Time Series Forecasting, Graph Neural Network
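- Summary: This paper proposes Graph In-Context Operator Networks (GICON), which combine graph message passing with example-aware positional encoding for spatiotemporal prediction. Controlled comparisons show that, given the same training data and number of training steps, in-context operator learning outperforms classical operator learning on complex tasks and scales from a small number of examples to a large number.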
[245] ContractSkill: Repairable Contract-Based Skills for Multimodal Web Agents
- arXiv: 2603.20340 (replaced)
- Authors: Zijian Lu, Yiping Zuo, Yupeng Nie, Xin He, Weibei Fan, Lianyong Qi, Shi Jin
- Subjects: cs.SE; cs.AI
- Tags: Web Agent, LLM Agent
- Code: code
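- Summary: This paper proposes ContractSkill, a framework that turns web agent skills into executable artifacts with explicit procedural structure, enabling deterministic verification, fault localization, and minimal local repair. Experiments in environments such as VisualWebArena show that the method is effective and mitigates the instability of skill generation.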
[246] Assessment Design in the AI Era: A Method for Identifying Items Functioning Differentially for Humans and Chatbots
- arXiv: 2603.23682 (replaced)
- Authors: Licol Zeinfeld, Alona Strugatski, Ziva Bar-Dov, Ron Blonder, Shelley Rap, Giora Alexandron
- Subjects: cs.HC; cs.AI
- Tags: LLM Evaluation, Education Technology
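- Summary: Combining educational data mining with psychometric theory, this paper proposes a statistical method for identifying test items on which humans and large language models perform systematically differently. The method uses differential item functioning (DIF) analysis to pinpoint the parts of an assessment most vulnerable to AI misuse, and offers guidance for assessment design in the AI era.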
[247] A Lightweight, Transferable, and Self-Adaptive Framework for Intelligent DC Arc-Fault Detection in Photovoltaic Systems
- arXiv: 2603.25749 (replaced)
- Authors: Xiaoke Yang, Long Gao, Haoyu He, Hanyuan Hang, Qi Liu, Shuai Zhao, Qiantu Tuo, Rui Li
- Subjects: eess.SP; cs.AI; cs.LG
- Tags: Anomaly Detection, Edge Computing
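- Summary: This paper proposes a lightweight, transferable, and self-adaptive learning framework (LD-framework) for DC arc-fault detection in photovoltaic systems. Through device-level spectral representation learning, cross-hardware representation alignment, and cloud-edge collaborative adaptive updates, it achieves high-accuracy detection across heterogeneous devices and over long-term operation.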
[248] Working Notes on Late Interaction Dynamics: Analyzing Targeted Behaviors of Late Interaction Models
- arXiv: 2603.26259 (replaced)
- Authors: Antoine Edy, Max Conti, Quentin Macé
- Subjects: cs.IR; cs.AI; cs.CL
- Tags: Information Retrieval, Late Interaction
- Venue: ECIR 2026 Workshop
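- Summary: This paper analyzes length bias in late-interaction retrieval models and the behavior of similarity distributions beyond the MaxSim operator. The results show that although causal late-interaction models carry a theoretical length bias, bidirectional models can also be affected in extreme cases, and that the MaxSim operator makes effective use of token-level similarity scores.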
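For readers unfamiliar with MaxSim, the sketch below implements the standard ColBERT-style late-interaction score: each query token keeps only its best-matching document token, and the per-token maxima are summed. It also shows one simple intuition for length bias, namely that adding document tokens can never lower the score, because a maximum over a larger set is at least as large. The toy 2-D vectors are made up for illustration; this is not code from the paper.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query: List[Vector], doc: List[Vector]) -> float:
    """Late interaction: sum over query tokens of the max similarity to any doc token."""
    return sum(max(dot(q, d) for d in doc) for q in query)


query = [[1.0, 0.0], [0.0, 1.0]]
short_doc = [[1.0, 0.0]]
long_doc = short_doc + [[0.0, 1.0]]  # same document with one extra token appended

print(maxsim_score(query, short_doc))  # 1.0
print(maxsim_score(query, long_doc))   # 2.0
```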
[249] ChartNet: A Million-Scale, High-Quality Multimodal Dataset for Robust Chart Understanding
- arXiv: 2603.27064 (replaced)
- Authors: Jovana Kondic, Pengyuan Li, Dhiraj Joshi, Isaac Sanchez, Ben Wiesel, Shafiq Abedin, Amit Alfassy, Eli Schwartz, Daniel Caraballo, Yagmur Gizem Cinar, Florian Scheidegger, Steven I. Ross, Daniel Karl I. Weidele, Hang Hua, Ekaterina Arutyunova, Roei Herzig, Zexue He, Zihan Wang, Xinyue Yu, Yunfei Zhao, Sicong Jiang, Minghao Liu, Qunshu Lin, Peter Staar, Luis Lastras, Aude Oliva, Rogerio Feris
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Chart Understanding, Multimodal Learning, Data Synthesis
- Venue: CVPR 2026
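- Summary: This paper introduces ChartNet, a large-scale, high-quality multimodal dataset of 1.5 million samples for chart understanding and reasoning. It is generated by a code-guided synthesis pipeline, covers 24 chart types, and provides multiple aligned components (code, images, data tables, summaries, and question-answer pairs), substantially improving the chart understanding of multimodal models.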
[250] GUIDE: Guided Updates for In-context Decision Evolution in LLM-Driven Spacecraft Operations
- arXiv: 2603.27306 (replaced)
- Authors: Alejandro Carrasco, Mariko Storey-Matsutani, Victor Rodriguez-Fernandez, Richard Linares
- Subjects: cs.MA; cs.AI; eess.SY
- Tags: LLM Agent, Satellite Control, Decision Making
- Venue: CVPR 2026 Workshop
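- Summary: This paper proposes GUIDE, an in-context decision evolution method for spacecraft operations that adapts across episodes by evolving structured natural-language decision rules. Experiments show that it outperforms static baselines on adversarial orbital interception tasks, improving the policy without any weight updates.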
[251] Hydra: Unifying Document Retrieval and Generation in a Single Vision-Language Model
- arXiv: 2603.28554 (replaced)
- Authors: Athos Georgiou
- Subjects: cs.CV; cs.AI; cs.IR
- Tags: Vision-Language Model, Information Retrieval, Parameter-Efficient Fine-Tuning
- Code: code
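- Summary: This paper proposes Hydra, a dual-head approach that unifies document retrieval and generation in a single vision-language model. By toggling a single LoRA adapter, it supports both ColBERT-style multi-vector embedding retrieval and high-quality autoregressive generation, substantially reducing memory footprint.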
[252] WybeCoder: Verified Imperative Code Generation
- arXiv: 2603.29088 (replaced)
- Authors: Fabian Gloeckle, Mantas Baksys, Darius Feher, Kunhao Zheng, Amaury Hayat, Sean B. Holden, Gabriel Synnaeve, Peter O'Hearn
- Subjects: cs.SE; cs.AI
- Tags: Code Generation, Formal Methods, Program Synthesis
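- Summary: This paper proposes WybeCoder, an agentic code verification framework in which code, invariants, and proofs co-evolve, enabling a "prove as you generate" development workflow. By combining automatic verification-condition generation with SMT solving, it performs well on complex algorithms and substantially improves the generation of verified code.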
[253] Trust and Reliance on AI in Education: AI Literacy and Need for Cognition as Moderators
- arXiv: 2604.01114 (replaced)
- Authors: Griffin Pitts, Neha Rani, Weedguet Mildort
- Subjects: cs.HC; cs.AI; cs.CY; cs.ET
- Tags: Education Technology, Human-Computer Interaction
- Venue: AIED 2026
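- Summary: This paper studies how students' trust in AI assistants relates to their reliance on them during programming problem-solving tasks, finding that higher trust is associated with lower appropriate reliance. AI literacy and need for cognition significantly moderate this relationship, underscoring the need to support reflective evaluation in both teaching and system design.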
[254] SelfGrader: Stable Jailbreak Detection for Large Language Models using Token-Level Logits
- arXiv: 2604.01473 (replaced)
- Authors: Zikai Zhang, Rui Hu, Olivera Kotevska, Jiahao Xu
- Subjects: cs.CR; cs.AI
- Tags: Jailbreak Detection, LLM Security
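- Summary: This paper proposes SelfGrader, a lightweight guardrail that formulates jailbreak detection as a numerical scoring problem using token-level logits. It evaluates query safety over a compact set of numeric tokens, producing stable and interpretable scores that lower attack success rates while substantially reducing memory overhead and latency.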
[255] Evaluating the Formal Reasoning Capabilities of Large Language Models through Chomsky Hierarchy
- arXiv: 2604.02709 (replaced)
- Authors: Yihong Dong, Jianha Xiao, Xue Jiang, Xuyuan Guo, Zhiyuan Fan, Jiaru Qian, Kechi Zhang, Jia Li, Zhi Jin, Ge Li
- Subjects: cs.CL; cs.AI; cs.LG; cs.SE
- Tags: LLM Reasoning, Benchmark
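- Summary: This paper introduces ChomskyBench, a benchmark that systematically evaluates the formal reasoning capabilities of large language models along the Chomsky hierarchy. Results show that model performance tracks the complexity level of the hierarchy, and that LLMs are far less efficient than traditional algorithmic programs on formal tasks, revealing the limitations of current models in formal reasoning.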
[256] ChatSVA: Bridging SVA Generation for Hardware Verification via Task-Specific LLMs
- arXiv: 2604.02811 (replaced)
- Authors: Lik Tung Fu, Jie Zhou, Shaokai Ren, Mengli Zhang, Jia Xiong, Hugo Jiang, Nan Guan, Xi Wang, Jun Yang
- Subjects: cs.AR; cs.AI
- Tags: RTL Verification, Code Generation
- Venue: DAC 2026
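- Summary: This paper proposes ChatSVA, an end-to-end SystemVerilog Assertion (SVA) generation system built on a multi-agent framework. It uses the AgentBridge platform to generate high-purity datasets, addressing data scarcity in few-shot settings, and substantially outperforms prior techniques in functional correctness and coverage.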
[257] From Load Tests to Live Streams: Graph Embedding-Based Anomaly Detection in Microservice Architectures
- arXiv: 2604.06448 (replaced)
- Authors: Srinidhi Madabhushi, Pranesh Vyas, Swathi Vaidyanathan, Mayur Kurup, Elliott Nash, Yegor Silyutin
- Subjects: cs.LG; cs.AI; cs.MM; eess.IV
- Tags: Anomaly Detection, Graph Neural Network
- Venue: FSE 2026
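- Summary: This paper proposes a graph embedding-based anomaly detection system for identifying services in a microservice architecture that are underrepresented in load tests. It uses a GCN-GAE to learn structural representations from the service graph and detects anomalies by comparing the cosine similarity between load-test embeddings and live-incident embeddings.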
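The comparison step can be sketched as follows, assuming per-service embeddings have already been produced (in the paper, by a GCN-GAE; here they are just hard-coded lists). A service is flagged if it never appeared in the load test or if its live-incident embedding is insufficiently similar to its load-test embedding. The function names, the per-service pairing, and the threshold are hypothetical choices for illustration.

```python
import math
from typing import Dict, List


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def flag_underrepresented(
    load_test: Dict[str, List[float]],
    incident: Dict[str, List[float]],
    threshold: float,
) -> List[str]:
    """Return services whose incident behavior is not covered by the load test."""
    flagged = []
    for service, emb in incident.items():
        ref = load_test.get(service)
        # Missing from the load test, or drifted away from its load-test embedding.
        if ref is None or cosine(ref, emb) < threshold:
            flagged.append(service)
    return sorted(flagged)


load = {"auth": [1.0, 0.0], "cart": [0.0, 1.0]}
live = {"auth": [1.0, 0.1], "cart": [1.0, 0.0], "search": [1.0, 1.0]}
print(flag_underrepresented(load, live, threshold=0.9))  # ['cart', 'search']
```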
[258] Exact Structural Abstraction and Tractability Limits
- arXiv: 2604.07349 (replaced)
- Authors: Tristan Simas
- Subjects: cs.CC; cs.AI; cs.LO
- Tags: Formal Methods, Deep Learning Theory
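- Summary: This paper studies exact structural abstraction and the limits of tractability, proving that under an exactly certified closure law, no efficiently checkable structural predicate can exactly characterize the boundary of problem solvability. The results reveal a computational barrier at the level of correctness itself, rather than a problem tied to any particular output format.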
[259] LPM 1.0: Video-based Character Performance Model
- arXiv: 2604.07823 (replaced)
- Authors: Ailing Zeng, Casper Yang, Chauncey Ge, Eddie Zhang, Garvey Xu, Gavin Lin, Gilbert Gu, Jeremy Pi, Leo Li, Mingyi Shi, Shawn Wang, Sheng Bi, Steven Tang, Thorn Hang, Tobey Guo, Vincent Li, Xin Tong, Yikang Li, Yuchen Sun, Yue Zhao, Yuhan Lu, Yuwei Li, Zane Zhang, Zeshi Yang, Zi Ye
- Subjects: cs.CV; cs.AI; cs.MM
- Tags: Video Generation, Multimodal Learning, Diffusion Model
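- Summary: This paper presents LPM 1.0, a large-scale model for generating character performance videos, focused on single-person, full-duplex audio-visual conversation. The model uses a diffusion Transformer to achieve high expressiveness, real-time inference, and long-term identity stability, and is distilled into a causal streaming generator to support low-latency interaction of unbounded length.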
[260] DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection
- arXiv: 2604.08261 (replaced)
- Authors: Jiangbei Yue, Darren Treanor, Venkataraman Subramanian, Sharib Ali
- Subjects: cs.CV; cs.AI
- Tags: Anomaly Detection, Medical AI, Multimodal Learning
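- Summary: This paper proposes a dual-branch multimodal framework for out-of-distribution (OOD) detection, in which a text-image branch and a vision branch complement each other to identify OOD samples. Experiments on public endoscopic image datasets show improvements in OOD detection of up to 24.84% over existing methods.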
[261] eBandit: Kernel-Driven Reinforcement Learning for Adaptive Video Streaming
- arXiv: 2604.08791 (replaced)
- Authors: Mahdi Alizadeh
- Subjects: cs.NI; cs.AI
- Tags: Video Streaming, Reinforcement Learning
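- Summary: This paper proposes eBandit, a framework that moves network monitoring and adaptive bitrate (ABR) algorithm selection into the Linux kernel, using a multi-armed bandit algorithm implemented in eBPF. Experiments show a 7.2% QoE improvement over the best static heuristic on adversarial synthetic traces.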
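eBandit runs its bandit in eBPF inside the kernel; the sketch below is only a user-space Python illustration of the bandit idea. It assumes UCB1 as the selection policy (the summary does not say which bandit algorithm eBandit uses), treats each arm as one candidate ABR algorithm, and treats the reward as an observed QoE score.

```python
import math
from typing import List


class UCB1:
    """UCB1 bandit over a fixed set of arms (here: candidate ABR algorithms)."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms
        self.values: List[float] = [0.0] * n_arms  # running mean reward per arm

    def select(self) -> int:
        # Try every arm once before using confidence bounds.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        total = sum(self.counts)
        ucb = [
            value + math.sqrt(2.0 * math.log(total) / count)
            for value, count in zip(self.values, self.counts)
        ]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        # Incremental mean update of the chosen arm's reward estimate.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


# Toy run: arm 1 (say, the ABR algorithm best suited to the current network) always
# yields QoE 1.0, the others 0.0, so it should end up being chosen most often.
bandit = UCB1(n_arms=3)
for _ in range(200):
    arm = bandit.select()
    bandit.update(arm, 1.0 if arm == 1 else 0.0)
print(bandit.counts)
```

UCB1 needs only counts and running means per arm, which is the kind of small, fixed-size state that fits well in kernel-side maps; whether eBandit uses this exact rule is not stated in the summary.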
[262] WOMBET: World Model-based Experience Transfer for Robust and Sample-efficient Reinforcement Learning
- arXiv: 2604.08958 (replaced)
- Authors: Mintae Kim, Koushil Sreenath
- Subjects: cs.LG; cs.AI; cs.RO
- Tags: World Model, Reinforcement Learning, Robotics
- Venue: L4DC 2026
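- Summary: This paper proposes WOMBET, a framework that uses a world model to generate offline data in a source task and then performs online fine-tuning in the target task, enabling sample-efficient transfer of experience in reinforcement learning. The method substantially improves sample efficiency and final performance on continuous-control benchmarks.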
[263] Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference
- arXiv: 2604.09613 (replaced)
- Authors: Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu
- Subjects: cs.DC; cs.AI
- Tags: LLM Inference
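- Summary: This paper proposes token-budget-aware pool routing, which estimates each request's token budget and dispatches it to either a short-request pool or a long-request pool, improving GPU utilization for LLM inference. Experiments show that the method reduces the number of GPU instances needed by 17-39%.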
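The routing decision itself is a simple comparison once a budget estimate exists. The sketch below assumes, purely for illustration, that the budget is estimated as prompt length plus the requested maximum number of new tokens, and uses a hypothetical 4,096-token cutoff. The paper's actual estimator and cutoff are not given in the summary.

```python
from dataclasses import dataclass


@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int


def route(req: Request, budget_cutoff: int = 4096) -> str:
    """Send a request to the 'short' or 'long' pool by its estimated token budget.

    Assumption: budget = prompt tokens + requested max new tokens (hypothetical estimator).
    """
    budget = req.prompt_tokens + req.max_new_tokens
    return "short" if budget <= budget_cutoff else "long"


print(route(Request(prompt_tokens=500, max_new_tokens=256)))   # short
print(route(Request(prompt_tokens=8000, max_new_tokens=512)))  # long
```

Separating pools lets the short pool be provisioned with smaller per-request memory (for example, KV-cache capacity), which is one plausible source of the instance savings the summary reports.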
[264] Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models
- arXiv: 2604.09687 (replaced)
- Authors: Yunkai Zhang, Linda Li, Yingxin Cui, Xiyuan Ruan, Zeyu Zheng, Kezhen Chen, Yi Zhang, Diji Yang
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Benchmark
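- Summary: This paper introduces the Grid2Matrix benchmark for evaluating how well vision-language models capture fine-grained visual details. It finds that VLMs exhibit "digital agnosia": in zero-shot, end-to-end evaluation they break down early and fail to accurately extract all of the visual information in an image.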
[265] A-IO: Adaptive Inference Orchestration for Memory-Bound NPUs
- arXiv: 2604.09752 (replaced)
- Authors: Chen Zhang, Yan Ding, Haotian Wang, Chubo Liu, Keqin Li, Kenli Li
- Subjects: cs.DC; cs.AI
- Tags: LLM Inference
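- Summary: This paper identifies the memory bottleneck in the autoregressive decoding phase of LLM inference on NPU platforms. It points to the "model scaling paradox" caused by statically deploying a single model size, and to the synchronization overhead that fine-grained speculative decoding incurs under NPU computation-graph compilation.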
[266] RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies
- arXiv: 2604.09860 (replaced)
- Authors: Xuning Yang, Rishit Dagli, Alex Zook, Hugo Hadfield, Ankit Goyal, Stan Birchfield, Fabio Ramos, Jonathan Tremblay
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Benchmark, Embodied AI
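- Summary: This paper introduces RoboLab, a high-fidelity simulation benchmark framework for evaluating the true generalization ability of generalist robot policies. It contains 120 tasks spanning three capability dimensions (visual, procedural, and relational) and supports systematic analysis of how sensitive policy behavior is to controlled perturbations.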
[267] Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities
- arXiv: 2604.10135 (replaced)
- Authors: Zhichen Liu, Yongyuan Li, Yang Xu
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Prompt Engineering
- Venue: ACL 2026
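- Summary: This paper proposes inserting delimiters at sentence boundaries in LLM inputs so that the model processes information sentence by sentence, which enhances its reasoning ability. Experiments show gains of 7.7% on GSM8k and 12.5% on DROP.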
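A minimal preprocessing sketch of the idea: split the input into sentences and rejoin them with an explicit delimiter before passing the text to the model. The `<sep>` token and the naive regex splitter (break on whitespace after `.`, `!`, or `?`) are hypothetical choices; the summary does not specify which delimiter or sentence segmenter the paper uses.

```python
import re

# Naive sentence boundary: whitespace that directly follows '.', '!' or '?'.
# Decimals such as "3.5" are not split, because no whitespace follows the '.'.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def insert_sentence_delimiters(text: str, delimiter: str = "<sep>") -> str:
    """Rejoin the sentences of `text` with an explicit (hypothetical) delimiter token."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    return f" {delimiter} ".join(sentences)


print(insert_sentence_delimiters("Tom has 3 apples. He buys 2 more. How many now?"))
# Tom has 3 apples. <sep> He buys 2 more. <sep> How many now?
```

A real system would likely need a more robust segmenter (abbreviations such as "e.g." would be split incorrectly here).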
[268] Optimal Stability of KL Divergence under Gaussian Perturbations
- arXiv: 2604.11026 (replaced)
- Authors: Jialu Pan, Yufeng Zhang, Nan Hu, Keqin Li, Zhenbang Chen, Ji Wang
- Subjects: cs.LG; cs.AI
- Tags: Anomaly Detection, Deep Learning Theory
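- Summary: This paper studies the stability of KL divergence under Gaussian perturbations, establishing sharp stability bounds between arbitrary distributions and the family of Gaussian distributions. The result extends the classical relaxed triangle inequality for Gaussians to general distributions, providing a theoretical foundation for OOD analysis in flow-based generative models.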
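As background only (this is the textbook closed form, not the paper's new bound), the KL divergence between two univariate Gaussians is ln(σ2/σ1) + (σ1² + (μ1 − μ2)²) / (2σ2²) − 1/2. The sketch below evaluates it and shows the asymmetry that makes triangle-type inequalities for KL nontrivial: KL is not a metric, so only "relaxed" versions of the triangle inequality hold.

```python
import math


def kl_gaussian(mu1: float, sigma1: float, mu2: float, sigma2: float) -> float:
    """KL(N(mu1, sigma1^2) || N(mu2, sigma2^2)) for univariate Gaussians, in nats."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma2 / sigma1)
        + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2 * sigma2 ** 2)
        - 0.5
    )


print(kl_gaussian(0.0, 1.0, 1.0, 1.0))  # 0.5: shifting the mean by one standard deviation
# KL is asymmetric: swapping the arguments changes the value.
print(kl_gaussian(0.0, 1.0, 0.0, 2.0), kl_gaussian(0.0, 2.0, 0.0, 1.0))
```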
[269] Cost-optimal Sequential Testing via Doubly Robust Q-learning
- arXiv: 2604.11165 (replaced)
- Authors: Doudou Zhou, Yiran Zhang, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai
- Subjects: stat.ML; cs.AI; cs.LG; math.ST
- Tags: Reinforcement Learning, Medical AI
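- Summary: This paper studies how to learn cost-optimal sequential decision policies from retrospective data and proposes a doubly robust Q-learning framework. Using path-specific inverse probability weights and an auxiliary contrast model, the method reduces testing costs in clinical decision-making scenarios without sacrificing predictive accuracy.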
[270] THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture
- arXiv: 2604.11284 (replaced)
- Authors: Augustus Haoyang Li
- Subjects: cs.LG; cs.AI; cs.LO
- Tags: Neurosymbolic AI, Logical Reasoning
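- Summary: This paper proposes THEIA, a modular neural architecture that learns complete Kleene three-valued logic end to end. Experiments show that the modular architecture outperforms flat MLP and Transformer baselines in compositional generalization, and mechanistic probing reveals a modularity-induced delayed-verdict mechanism.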
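For reference, the target function class can be written down symbolically. The sketch below encodes strong Kleene three-valued logic (K3) with values ordered FALSE < UNKNOWN < TRUE, where conjunction is the minimum, disjunction the maximum, negation reverses the order, and implication is defined as ¬a ∨ b. It is a symbolic reference table of the kind a learned model could be checked against, not THEIA's neural architecture.

```python
from enum import Enum


class K3(Enum):
    FALSE = 0
    UNKNOWN = 1
    TRUE = 2


def k_not(a: K3) -> K3:
    return K3(2 - a.value)  # TRUE <-> FALSE, UNKNOWN stays UNKNOWN


def k_and(a: K3, b: K3) -> K3:
    return K3(min(a.value, b.value))


def k_or(a: K3, b: K3) -> K3:
    return K3(max(a.value, b.value))


def k_implies(a: K3, b: K3) -> K3:
    return k_or(k_not(a), b)  # strong Kleene material implication


# FALSE dominates AND and TRUE dominates OR, even when the other input is UNKNOWN.
print(k_and(K3.FALSE, K3.UNKNOWN))  # K3.FALSE
print(k_or(K3.TRUE, K3.UNKNOWN))    # K3.TRUE
```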
[271] Network Effects and Agreement Drift in LLM Debates
- arXiv: 2604.11312 (replaced)
- Authors: Erica Cau, Andrea Failla, Giulio Rossetti
- Subjects: cs.SI; cs.AI; cs.CY; cs.MA
- Tags: LLM Agent, Social Simulation
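- Summary: This paper uses network generative models to study the collective behavior of LLM agents in multi-round debates, identifying a direction-sensitive phenomenon called "agreement drift", in which agents tend to shift toward particular stances. The study stresses the need to separate structural effects from model biases before using LLM populations as proxies for human behavior.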
[272] CodeTracer: Towards Traceable Agent States
- arXiv: 2604.11641 (replaced)
- Authors: Han Li, Yifan Yao, Letian Zhu, Rili Feng, Hongyi Ye, Jiaming Wang, Yancheng He, Pengyu Zou, Lehan Zhang, Xinping Lei, Haoyang Huang, Ken Deng, Ming Sun, Zhaoxiang Zhang, He Ye, Jiaheng Liu
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Agent Debugging
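- Summary: This paper proposes CodeTracer, a tracing architecture for debugging code agents that parses heterogeneous run artifacts and reconstructs the complete history of state transitions. By localizing failure onset, it identifies the source of a failure and its downstream chain, and substantially outperforms direct prompting and lightweight baselines on failure localization.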
[273] Beyond LLMs, Sparse Distributed Memory, and Neuromorphics <A Hyper-Dimensional SRAM-CAM "VaCoAl" for Ultra-High Speed, Ultra-Low Power, and Low Cost>
- arXiv: 2604.11665 (replaced)
- Authors: Hiroyuki Chuma, Kanji Otsuka, Yoichi Sato
- Subjects: cs.NE; cs.AI
- Tags: Neuromorphic Computing, Memory Architecture, Hyperdimensional Computing
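- Summary: This paper proposes VaCoAl, a hyperdimensional computing (HDC) architecture based on Galois field algebra that realizes ultra-high-speed, ultra-low-power sparse distributed memory through deterministic logic. The architecture addresses AI limitations such as catastrophic forgetting, learning stagnation, and the binding problem at the algebraic level, and multi-hop reasoning tasks on Wikidata demonstrate its support for reversible compositional reasoning.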
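To show why Galois-field algebra makes binding reversible, the sketch below uses generic binary hyperdimensional vectors over GF(2): binding is bitwise XOR (addition in GF(2)), which is its own inverse, so unbinding with the same key recovers the original vector exactly. This is a standard textbook HDC construction chosen for illustration, assuming 10,000-bit random vectors stored as Python integers; it is not VaCoAl's SRAM-CAM design.

```python
import random

DIM = 10_000  # hypervector dimensionality (assumed for illustration)


def random_hv(rng: random.Random) -> int:
    """A random DIM-bit hypervector stored as a Python int."""
    return rng.getrandbits(DIM)


def bind(a: int, b: int) -> int:
    """Binding as XOR, i.e. addition in GF(2); XOR is self-inverse."""
    return a ^ b


def hamming_similarity(a: int, b: int) -> float:
    """1.0 for identical vectors, about 0.5 for unrelated random vectors."""
    return 1.0 - bin(a ^ b).count("1") / DIM


rng = random.Random(0)
role, filler = random_hv(rng), random_hv(rng)
pair = bind(role, filler)
recovered = bind(pair, role)  # unbinding with the same key is exact
print(recovered == filler)                                # True
print(round(hamming_similarity(pair, filler), 2))         # close to 0.5
```

Exact invertibility of the binding step is what allows a chain of bindings to be undone step by step, the property the summary calls reversible compositional reasoning.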
[274] Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
- arXiv: 2604.13016 (replaced)
- Authors: Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Knowledge Distillation, LLM Training
- Code: code
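- Summary: This paper systematically studies the training dynamics and mechanisms of on-policy distillation (OPD) for large language models, finding that successful OPD requires compatible thinking patterns between teacher and student, and a teacher that offers genuinely new capabilities. It characterizes the progressive token-level alignment that marks successful OPD, and proposes two practical strategies for rescuing failed distillation: an off-policy cold start and teacher-aligned prompt selection.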