arXiv cs.AI Daily Update
cs.AI on April 10, 2026: 360 paper updates in total:
- 74 new submissions: LLM Agent (M-ArtAgent [3], CLEAR [5], PRIME [14]), LLM Evaluation (IatroBench [19], CivBench [23], ImplicitMemBench [51]), Multi-Agent System (ACIArena [26], MONETA [42], ACF [60]), LLM Reasoning (SAT [40], HiRO-Nav [57], SUPERNOVA [73]), LLM Alignment (ConsistRM [4], ReflectRM [6], SPARD [34])
- 163 cross-listed submissions: Vision-Language Model (GameWorld [94], DSCA [165], LINE [174]), LLM Agent (MCP-DPT [107], MIMIC-Py [130], AnomalyAgent [153]), Reinforcement Learning (GIRL [92], RL-ASL [105], ReRec [144]), LLM Reasoning (SubSearch [91], TEMPER [136], ReRec [144]), LLM Evaluation (GameWorld [94], SYN-DIGITS [103], GEO [112])
- 123 replacement submissions: LLM Agent (WebArbiter [246], MemPO [249], FactorEngine [252]), LLM Evaluation (MM-MoralBench [273], SealQA [289], MARCH [302]), LLM Reasoning (SealQA [289], PEER [299], MARCH [302]), LLM Inference (SpecBranch [290], BTC-LLM [295], ModeX [322]), Vision-Language Model (Lang2Act [248], MM-MoralBench [273], SeMoBridge [304])
Overall trend: today's papers focus mainly on LLM Agent, LLM Evaluation, and LLM Reasoning.
Accepted papers: [26](ACL 2026), [29](ACL 2026), [30](ACL 2026), [36](ITS 2026), [40](ACL 2026), [50](ACM MobiCom 2026), [51](ACL 2026), [52](ACL 2025), [53](ACL 2026 Findings), [55](ACL 2026), [62](ACL 2026), [63](AIED 2026), [68](ACL 2026), [86](CVPR 2026 Workshop), [106](ACL 2026), [108](EACL 2026 Workshop), [110](ACL 2026), [111](ICLR 2026), [117](AISTATS 2026), [128](ACL 2026), [130](FSE 2026), [138](FAccT 2026), [140](ICLR 2026 Workshop), [143](SIGIR 2026), [144](ACL 2026), [147](SIGIR 2026), [148](AAMAS 2026), [165](CVPR 2026), [171](CVPR 2026), [175](CVPR 2026), [184](ACL 2026), [189](ACM Multimedia 2026 Workshop), [190](ICLR 2026), [192](ACL 2026), [193](ACL Findings 2026), [197](ICSA 2026), [199](ACL 2026), [202](ICPR 2026), [209](CVPR 2026), [212](CVPR 2026 Findings), [226](ACL 2026), [232](CVPR 2026), [240](ACL 2026), [241](ICLR 2026 Workshop), [245](ACL 2026 Findings), [246](ICLR 2026), [264](ICIP 2024), [268](ICWSM 2026), [272](CVPR 2026), [273](Pattern Recognition), [275](TMLR 2026), [278](ACL 2026), [279](ICSE 2026), [281](ACL 2026), [282](EASE 2026), [286](EMNLP 2025 Findings), [288](CVPR 2026 Findings), [289](ICLR 2026), [290](ICLR 2026), [293](CSCW 2026), [302](ACL 2026 Findings), [303](ICME 2026), [307](ICLR 2026), [308](ICLR 2026), [309](ACL 2026), [310](ACL 2026), [312](ACL 2026), [320](IATMSI 2026), [321](ICPR 2026), [322](ACL 2026), [324](CVPR 2026), [325](TACL 2026), [326](10th bwHPC Symposium 2024 Workshop), [328](AAMAS 2026 Workshop), [332](CVPR 2026), [341](LREC 2026 Workshop), [352](GECCO 2026)
Open-source papers: [5](code), [19](code), [45](code), [56](code), [60](code), [73](code), [82](code), [88](code), [113](code), [117](code), [119](code), [120](code), [122](code), [144](code), [147](code), [151](code), [173](code), [185](code), [196](code), [200](code), [226](code), [233](code), [248](code), [249](code), [257](code), [278](code), [285](code), [287](code), [297](code), [300](code), [301](code), [304](code), [305](code), [307](code), [310](code), [321](code), [322](code), [332](code), [359](code)
New Submissions (74)
[1] An Analysis of Artificial Intelligence Adoption in NIH-Funded Research
- arXiv: 2604.07424
- Authors: Navapat Nananukul, Mayank Kejriwal
- Subjects: cs.AI; cs.CY; cs.MA
- Tags: LLM Evaluation, Medical AI
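- Summary: This paper uses LLMs to analyze 58,746 NIH-funded biomedical research projects, finding that AI accounts for 15.9% of the NIH portfolio, that a gap exists between research and clinical deployment, and that health-equity research is severely underrepresented.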
[2] Munkres' General Topology Autoformalized in Isabelle/HOL
- arXiv: 2604.07455
- Authors: Dustin Bryant, Jonathan Julián Huerta y Munive, Cezary Kaliszyk, Josef Urban
- Subjects: cs.AI; cs.LG; cs.LO
- Tags: Formal Methods, LLM Reasoning
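- Summary: This paper reports an experiment in LLM-assisted autoformalization of Munkres' topology textbook into Isabelle/HOL, producing more than 85,000 lines of code in which all 806 formalized results are fully proved.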
[3] M-ArtAgent: Evidence-Based Multimodal Agent for Implicit Art Influence Discovery
- arXiv: 2604.07468
- Authors: Hanyi Liu, Zhonghao Jiu, Minghao Wang, Yuhang Xie, Heran Yang
- Subjects: cs.AI
- Tags: LLM Agent, Vision-Language Model
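- Summary: This paper proposes M-ArtAgent, an evidence-based multimodal agent that discovers implicit art-influence relationships through a four-stage protocol of investigation, corroboration, falsification, and adjudication.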
[4] ConsistRM: Improving Generative Reward Models via Consistency-Aware Self-Training
- arXiv: 2604.07484
- Authors: Yu Liang, Liangxin Liu, Longzheng Wang, Yan Wang, Yueyang Zhang, Long Xia, Zhiyuan Sun, Daiting Shi
- Subjects: cs.AI; cs.CL; cs.LG
- Tags: LLM Alignment, RLHF
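- Summary: This paper proposes ConsistRM, a self-training framework that uses a consistency-aware reward mechanism to improve the training stability and performance of generative reward models without human-annotated data.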
[5] CLEAR: Context Augmentation from Contrastive Learning of Experience via Agentic Reflection
- arXiv: 2604.07487
- Authors: Linbo Liu, Guande Wu, Han Ding, Yawei Wang, Qiang Zhou, Yuzhe Lu, Zhichao Xu, Huan Song, Panpan Xu, Lin Lee Cheong
- Subjects: cs.AI
- Tags: LLM Agent, Reinforcement Learning
- Code: code
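- Summary: This paper proposes CLEAR, a framework that uses contrastive learning and a reflection agent to generate task-specific context augmentations, substantially improving task completion rates on the AppWorld and WebShop benchmarks.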
[6] ReflectRM: Boosting Generative Reward Models via Self-Reflection within a Unified Judgment Framework
- arXiv: 2604.07506
- Authors: Kai Qin, Liangxin Liu, Yu Liang, Longzheng Wang, Yan Wang, Yueyang Zhang, Long Xia, Zhiyuan Sun, Houde Liu, Daiting Shi
- Subjects: cs.AI; cs.CL
- Tags: LLM Alignment, RLHF
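- Summary: This paper proposes ReflectRM, a generative reward model that uses self-reflection to assess analysis quality. It jointly models response preferences and analysis preferences in a unified framework and effectively mitigates position bias.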
[7] Rhizome OS-1: Rhizome's Semi-Autonomous Operating System for Small Molecule Drug Discovery
- arXiv: 2604.07512
- Authors: Yiwen Wang, Gregory Sinenka, Xhuliano Brace
- Subjects: cs.AI; cs.LG
- Tags: Drug Discovery, LLM Agent, Graph Neural Network
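- Summary: This paper introduces Rhizome OS-1, a semi-autonomous drug discovery system that uses multimodal AI agents and graph neural networks to generate novel small-molecule compounds, demonstrating high structural novelty on oncology targets.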
[8] Trust the AI, Doubt Yourself: The Effect of Urgency on Self-Confidence in Human-AI Interaction
- arXiv: 2604.07535
- Authors: Baran Shajari, Xiaoran Liu, Kyanna Dagenais, Istvan David
- Subjects: cs.AI
- Tags: Human-Computer Interaction, AI Ethics
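- Summary: This paper experimentally studies how urgency affects user self-confidence in human-AI interaction, finding that urgency does not affect trust in the AI but may erode users' self-confidence and self-efficacy.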
[9] Agentic Copyright, Data Scraping & AI Governance: Toward a Coasean Bargain in the Era of Artificial Intelligence
- arXiv: 2604.07546
- Authors: Paulius Jurcys, Mark Fenwick
- Subjects: cs.AI
- Tags: AI Ethics, Multi-Agent System
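- Summary: This paper examines how multi-agent AI systems are reshaping the foundations of copyright law, proposing an agentic copyright model and a supervised multi-agent governance framework to address new forms of market failure arising from interactions among autonomous agents.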
[10] Dual-Loop Control in DCVerse: Advancing Reliable Deployment of AI in Data Centers via Digital Twins
- arXiv: 2604.07559
- Authors: Qingang Zhang, Yuejun Yan, Guangyu Wu, Siew-Chien Wong, Jimin Jia, Zhaoyang Wang, Yonggang Wen
- Subjects: cs.AI
- Tags: Reinforcement Learning, Energy Efficiency
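- Summary: This paper proposes a digital-twin-based Dual-Loop Control Framework (DLCF) for deploying deep reinforcement learning control of data center cooling systems, achieving energy savings of up to 4.09% while maintaining safety.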
[11] From Papers to Property Tables: A Priority-Based LLM Workflow for Materials Data Extraction
- arXiv: 2604.07584
- Authors: Koushik Rameshbabu, Jing Luo, Ali Shargh, Khalid A. El-Awady, Jaafar A. El-Awady
- Subjects: cs.AI
- Tags: Information Extraction, Data Synthesis
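- Summary: This paper proposes a hierarchical LLM-based workflow that uses a three-tier priority strategy to automatically extract and reconstruct structured experimental data from materials science papers, achieving an overall accuracy of 94.69%.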
[12] Too long; didn't solve
- arXiv: 2604.07593
- Authors: Lucía M. Cabrera, Isaac Saxton-Knight
- Subjects: cs.AI
- Tags: LLM Reasoning, LLM Evaluation
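- Summary: This paper studies how prompt length and solution length affect LLM mathematical reasoning performance, finding that both are positively correlated with model failure rate and that structural length correlates with a dataset's empirical difficulty.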
[13] Reasoning Graphs: Deterministic Agent Accuracy through Evidence-Centric Chain-of-Thought Feedback
- arXiv: 2604.07595
- Authors: Matthew Penaroza
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Question Answering
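- Summary: This paper introduces reasoning graphs, a structure that persistently stores an agent's evidence-level chain of thought, and uses evidence-centric feedback to improve accuracy and make variance converge without retraining.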
[14] PRIME: Training Free Proactive Reasoning via Iterative Memory Evolution for User-Centric Agent
- arXiv: 2604.07645
- Authors: Prince Zizhuang Wang, Shuli Jiang
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture
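- Summary: This paper proposes PRIME, a gradient-free learning framework that continually improves user-centric agent behavior by distilling multi-turn interaction trajectories into structured experience.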
[15] How Independent are Large Language Models? A Statistical Framework for Auditing Behavioral Entanglement and Reweighting Verifier Ensembles
- arXiv: 2604.07650
- Authors: Chenchen Kuai, Jiwan Jiang, Zihao Zhu, Hao Wang, Keshu Wu, Zihao Li, Yunlong Zhang, Chenxi Liu, Zhengzhong Tu, Zhiwen Fan, Yang Zhou
- Subjects: cs.AI; cs.CL
- Tags: LLM Evaluation, LLM Hallucination
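- Summary: This paper develops a statistical framework for auditing behavioral entanglement among LLMs, finding that widespread behavioral dependence degrades LLM-as-a-judge evaluation and ensemble verification.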
[16] Bridging Natural Language and Interactive What-If Interfaces via LLM-Generated Declarative Specification
- arXiv: 2604.07652
- Authors: Sneha Gathani, Sirui Zeng, Diya Patel, Ryan Rossi, Dan Marshall, Cagatay Demiralp, Steven Drucker, Zhicheng Liu
- Subjects: cs.AI; cs.HC
- Tags: Data Visualization, Human-Computer Interaction
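- Summary: This paper proposes a two-stage workflow that uses an intermediate declarative specification language (PSL) to translate natural-language what-if analysis questions into interactive visualization interfaces, improving semantic reliability.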
[17] From Debate to Decision: Conformal Social Choice for Safe Multi-Agent Deliberation
- arXiv: 2604.07667
- Authors: Mengdie Flora Wang, Haochen Xie, Guanghui Wang, Aijing Gao, Guang Yang, Ziyuan Li, Qucy Wei Qiu, Fangwei Han, Hengzhi Qiu, Yajing Huang, Bing Zhu, Jae Oh Woo
- Subjects: cs.AI; cs.MA; cs.SI
- Tags: Multi-Agent System, LLM Reasoning, AI Safety
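- Summary: This paper proposes Conformal Social Choice, a post-processing layer that turns multi-agent debate outputs into calibrated decisions, using conformal prediction to provide marginal coverage guarantees. At α=0.05, the method intercepts 81.9% of erroneous-consensus cases while maintaining high accuracy on the remaining decisions.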
[18] Multi-Agent Orchestration for High-Throughput Materials Screening on a Leadership-Class System
- arXiv: 2604.07681
- Authors: Thang Duc Pham, Harikrishna Tummalapalli, Fakhrul Hasan Bhuiyan, Álvaro Vázquez Mayagoitia, Christine Simpson, Riccardo Balin, Venkatram Vishwanath, Murat Keçeli
- Subjects: cs.AI
- Tags: Multi-Agent System, High Performance Computing, LLM Agent
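- Summary: This paper proposes a scalable hierarchical multi-agent framework for orchestrating high-throughput materials screening on supercomputers. Using a planner-executor architecture, it demonstrates screening of a MOF database for atmospheric water harvesting on the Aurora supercomputer.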
[19] IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures
- arXiv: 2604.07709
- Authors: David Gringras
- Subjects: cs.AI; cs.CL; cs.CY; cs.LG
- Tags: AI Safety, Medical AI, LLM Evaluation
- Code: code
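- Summary: This paper introduces IatroBench to measure identity-contingent withholding of information in frontier large language models, finding that models give better medical guidance when a question is framed as coming from a physician than from a layperson. The study shows that safety measures can cause harm through trained withholding, incompetence, or overly aggressive content filtering.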
[20] Towards Knowledgeable Deep Research: Framework and Benchmark
- arXiv: 2604.07720
- Authors: Wenxuan Liu, Zixuan Li, Bai Long, Chunmao Zhang, Fenghui Zhang, Zhuo Chen, Wei Li, Yuxin Zuo, Fei Wang, Bingbing Xu, Xuhui Jiang, Jin Zhang, Xiaolong Jin, Jiafeng Guo, Tat-Seng Chua, Xueqi Cheng
- Subjects: cs.AI
- Tags: LLM Agent, RAG, Multi-Agent System
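- Summary: This paper proposes the Knowledgeable Deep Research (KDR) task, which requires LLM agents to generate reports using both structured and unstructured knowledge. The authors design the Hybrid Knowledge Analysis (HKA) framework, a multi-agent architecture that outperforms existing deep research agents on the newly built KDR-Bench.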
[21] Squeeze Evolve: Unified Multi-Model Orchestration for Verifier-Free Evolution
- arXiv: 2604.07725
- Authors: Monishwaran Maheswaran, Leon Lakhani, Zhongzhu Zhou, Shijia Yang, Junxiong Wang, Coleman Hooper, Yuezhou Hu, Rishabh Tiwari, Jue Wang, Harman Singh, Qingyang Wu, Yuqing Jian, Ce Zhang, Kurt Keutzer, Tri Dao, Xiaoxia Wu, Ben Athiwaratkun, James Zou, Chenfeng Xu
- Subjects: cs.AI; cs.CL
- Tags: LLM Reasoning, Multi-Agent System, LLM Inference
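- Summary: This paper introduces Squeeze Evolve, a multi-model orchestration framework for verifier-free evolutionary reasoning that allocates model capability according to marginal utility. It achieves state-of-the-art results on multiple benchmarks while cutting API costs by up to 3x.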
[22] Emotion Concepts and their Function in a Large Language Model
- arXiv: 2604.07729
- Authors: Nicholas Sofroniew, Isaac Kauvar, William Saunders, Runjin Chen, Tom Henighan, Sasha Hydrie, Craig Citro, Adam Pearce, Julius Tarng, Wes Gurnee, Joshua Batson, Sam Zimmerman, Kelley Rivoire, Kyle Fish, Chris Olah, Jack Lindsey
- Subjects: cs.AI; cs.CL
- Tags: LLM Alignment, Interpretability, Affective Computing
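- Summary: This paper studies internal representations of emotion concepts in Claude Sonnet 4.5, finding that these representations causally influence the model's outputs, including its preferences and misaligned behaviors such as reward hacking. The authors call this phenomenon functional emotions.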
[23] CivBench: Progress-Based Evaluation for LLMs' Strategic Decision-Making in Civilization V
- arXiv: 2604.07733
- Authors: John Chen, Sihan Cheng, Can Gurkan, Mingyi Lin
- Subjects: cs.AI
- Tags: LLM Evaluation, Game AI, LLM Agent
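- Summary: This paper introduces CivBench, a benchmark for evaluating LLMs' strategic decision-making in multiplayer Civilization V games. It assesses strategic ability in long-horizon multi-agent play through turn-level win-probability estimation.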
[24] The Cartesian Cut in Agentic AI
- arXiv: 2604.07745
- Authors: Tim Sainburg, Caleb Weinreb
- Subjects: cs.AI; q-bio.NC
- Tags: LLM Agent, AI Safety, Decision Making
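- Summary: This paper analyzes LLM agent design from a control-architecture perspective, contrasting Cartesian agents (a learned core coupled to an engineered runtime) with biological control systems. The authors outline three approaches that trade off autonomy, robustness, and oversight: bounded services, Cartesian agents, and integrated agents.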
[25] Mitigating Distribution Sharpening in Math RLVR via Distribution-Aligned Hint Synthesis and Backward Hint Annealing
- arXiv: 2604.07747
- Authors: Pei-Xi Xie, Che-Yu Lin, Cheng-Lin Yang
- Subjects: cs.AI; cs.CL; cs.LG
- Tags: LLM Reasoning, Reinforcement Learning
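- Summary: This paper addresses distribution sharpening in reinforcement learning with verifiable rewards (RLVR) for mathematical reasoning, proposing Distribution-Aligned Hint Synthesis (DAHS) and Backward Hint Annealing (BHA). The method improves both pass@1 and pass@2048 on the AIME benchmarks.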
[26] ACIArena: Toward Unified Evaluation for Agent Cascading Injection
- arXiv: 2604.07775
- Authors: Hengyu An, Minxi Li, Jinghuai Zhang, Naen Xu, Chunyi Zhou, Changjiang Li, Xiaogang Xu, Tianyu Du, Shouling Ji
- Subjects: cs.AI; cs.CL; cs.CR
- Tags: LLM Security, Multi-Agent System, LLM Agent
- Venue: ACL 2026
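- Summary: This paper introduces ACIArena, a unified framework for evaluating the robustness of multi-agent systems against agent cascading injection attacks. It covers multiple attack surfaces and attack objectives and provides a systematic evaluation suite of 1,356 test cases.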
[27] The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives
- arXiv: 2604.07778
- Authors: Haileleol Tibebu
- Subjects: cs.AI
- Tags: AI Safety, AI Ethics, Multi-Agent System
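- Summary: This paper proves an impossibility theorem: once AI agent autonomy exceeds a computable threshold, no accountability framework can simultaneously satisfy attributability, foreseeability, non-vacuity, and completeness. The authors formalize human-agent collectives and establish a phase-transition boundary for accountability.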
[28] Automotive Engineering-Centric Agentic AI Workflow Framework
- arXiv: 2604.07784
- Authors: Tong Duy Son, Zhihao Liu, Piero Brigida, Yerlan Akhmetov, Gurudevan Devarajan, Kai Liu, Ajinkya Bhave
- Subjects: cs.AI; cs.MA; eess.SY
- Tags: LLM Agent, Decision Making, Autonomous Driving
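- Summary: This paper proposes the Agentic Engineering Intelligence (AEI) framework, which models engineering workflows as constraint-driven, history-aware sequential decision processes. Through automotive engineering use cases, it shows how diverse workflows can be expressed in a unified form.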
[29] SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
- arXiv: 2604.07791
- Authors: Xinshun Feng, Xinhao Song, Lijun Li, Gongshen Liu, Jing Shao
- Subjects: cs.AI; cs.LG
- Tags: LLM Agent, Reinforcement Learning, Memory Architecture
- Venue: ACL 2026
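- Summary: This paper introduces SEARL, a self-evolving agent framework based on tool memory that builds a structured experience memory integrating planning and execution to support efficient learning in resource-constrained settings. The method exploits inter-trajectory correlations to densify the reward signal.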
[30] Lightweight LLM Agent Memory with Small Language Models
- arXiv: 2604.07798
- Authors: Jiaquan Zhang, Chaoning Zhang, Shuxu Chen, Zhenzhen Huang, Pengcheng Zheng, Zhicheng Wang, Ping Guo, Fan Mo, Sung-Ho Bae, Jie Zou, Jiwei Wei, Yang Yang
- Subjects: cs.AI
- Tags: Memory Architecture, LLM Agent, LLM Inference
- Venue: ACL 2026
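- Summary: This paper proposes LightMem, a lightweight agent memory system built on small language models that modularizes memory into short-term, mid-term, and long-term components. Under a fixed retrieval budget, it achieves better F1 scores with low latency.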
[31] Agentivism: a learning theory for the age of artificial intelligence
- arXiv: 2604.07813
- Authors: Lixiang Yan, Dragan Gašević
- Subjects: cs.AI; cs.HC
- Tags: Education Technology, Human-Computer Interaction, Cognitive Science
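- Summary: This paper proposes Agentivism, a learning theory for human-AI interaction that defines learning as the durable growth of human capability achieved through selective delegation to AI, cognitive monitoring, reconstructive internalization, and transfer under reduced support.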
[32] Automatic Generation of Executable BPMN Models from Medical Guidelines
- arXiv: 2604.07817
- Authors: Praveen Kumar Menaka Sekar, Ion Matei, Maksym Zhenirovskyy, Hon Yung Wong, Sayuri Kohmura, Shinji Hotta, Akihiro Inomata
- Subjects: cs.AI; cs.LG; cs.SE
- Tags: Medical AI, LLM Agent
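- Summary: This paper presents an end-to-end pipeline that uses large language models to convert medical policy documents into executable BPMN models for simulation-based evaluation. The pipeline achieves a 100% match with the ground truth on well-structured policies and includes entropy-based uncertainty detection.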
[33] Silencing the Guardrails: Inference-Time Jailbreaking via Dynamic Contextual Representation Ablation
- arXiv: 2604.07835
- Authors: Wenpeng Xing, Moran Fang, Guangtai Wang, Changting Lin, Meng Han
- Subjects: cs.AI
- Tags: LLM Security, LLM Alignment, Adversarial Robustness
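- Summary: This paper proposes CRA, a framework that bypasses LLM safety constraints by dynamically identifying and suppressing, at inference time, the activation patterns in hidden states that trigger refusal behavior. Without any parameter updates, it substantially increases attack effectiveness, exposing an inherent fragility in current alignment mechanisms.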
[34] SPARD: Self-Paced Curriculum for RL Alignment via Integrating Reward Dynamics and Data Utility
- arXiv: 2604.07837
- Authors: Xuyang Zhi, Peilun zhou, Chengqiang Lu, Hang Lv, Yiwei Liang, Rongyang Zhang, Yan Gao, YI WU, Yao Hu, Hongchao Gu, Defu Lian, Hao Wang, Enhong Chen
- Subjects: cs.AI
- Tags: RLHF, LLM Alignment, Curriculum Learning
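- Summary: This paper proposes SPARD, a framework that tracks learning progress to dynamically adjust multi-objective reward weights and data importance, building an automated self-paced curriculum that keeps learning intent in sync with data utility. Experiments show substantial capability gains across multiple benchmarks.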
[35] Hidden Biases in Conditioning Autoregressive Models
- arXiv: 2604.07855
- Authors: Francois Pachet, Pierre Roy
- Subjects: cs.AI
- Tags: LLM Inference, Interpretability
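- Summary: This paper formalizes exact inference tasks for autoregressive models, proving that exact sentence-level MAP decoding is NP-hard and that exact conditional normalization under regular constraints is #P-hard. The results show that local autoregressive sampling is easy, whereas exact decoding and conditioning under global formal constraints are computationally intractable.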
[36] An Agentic Evaluation Architecture for Historical Bias Detection in Educational Textbooks
- arXiv: 2604.07883
- Authors: Gabriel Stefan, Adrian-Marius Dumitran
- Subjects: cs.AI; cs.CL; cs.CY; cs.MA
- Tags: LLM Agent, Bias Mitigation, Education Technology
- Venue: ITS 2026
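- Summary: This paper proposes an agentic evaluation architecture, comprising a multimodal screening agent, a heterogeneous review panel, and a meta-agent, for detecting implicit bias in history textbooks. The framework introduces a source attribution protocol to avoid misjudgments and is validated on Romanian high-school history textbooks.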
[37] DialBGM: A Benchmark for Background Music Recommendation from Everyday Multi-Turn Dialogues
- arXiv: 2604.07895
- Authors: Joonhyeok Shin, Jaehoon Kang, Yujun Lee, Hannah Lee, Yejin Lee, Yoonji Park, Kyuhong Shim
- Subjects: cs.AI
- Tags: Recommender System, Multimodal Learning, Dialogue System
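- Summary: This paper introduces DialBGM, a benchmark of 1,200 dialogues paired with music clips for dialogue-conditioned background music recommendation. Experiments show that current models fall far short of human judgment, with no model exceeding 35% Hit@1.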
[38] Visual Perceptual to Conceptual First-Order Rule Learning Networks
- arXiv: 2604.07897
- Authors: Kun Gao, Davide Soldà, Thomas Eiter, Katsumi Inoue
- Subjects: cs.AI; cs.LG
- Tags: Neurosymbolic AI, Interpretability, Self-Supervised Learning
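- Summary: This paper proposes γILP, a framework with a fully differentiable pipeline from image constant substitution to rule structure induction, enabling rules to be learned from image data without image labels. Experiments show strong performance on both symbolic relational datasets and pure image datasets.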
[39] Capture-Quiet Decomposition: A Verification Theorem for Chess Endgame Tablebases
- arXiv: 2604.07907
- Authors: Alexander Pavlov
- Subjects: cs.AI; cs.LO
- Tags: Formal Methods, Game AI
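- Summary: This paper proposes the CQD structure theorem for verifying the correctness of win/draw/loss labels in chess endgame tablebases. The method decomposes every legal position into terminal, capture, or quiet positions and is exhaustively verified on 517 endgames totaling 6.5 billion positions.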
[40] SAT: Balancing Reasoning Accuracy and Efficiency with Stepwise Adaptive Thinking
- arXiv: 2604.07922
- Authors: Weiyang Huang, Xuefeng Bai, Kehai Chen, Xinyang Chen, Yibin Chen, Weili Guan, Min Zhang
- Subjects: cs.AI; cs.CL
- Tags: LLM Reasoning, LLM Inference
- Venue: ACL 2026
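- Summary: This paper proposes SAT, a framework that models reasoning as a finite-state machine and uses a lightweight process reward model to perform difficulty-aware, step-level pruning. It reduces reasoning tokens by up to 40% while maintaining or improving accuracy.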
[41] EigentSearch-Q+: Enhancing Deep Research Agents with Structured Reasoning Tools
- arXiv: 2604.07927
- Authors: Boer Zhang, Mingyan Wu, Dongzhuoran Zhou, Yuqicheng Zhu, Wendong Fan, Puzhen Zhang, Zifeng Ding, Guohao Li, Yuan He
- Subjects: cs.AI
- Tags: LLM Agent, Information Retrieval, RAG
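- Summary: This paper proposes Q+, a toolset that makes deep research agents' web search more deliberate by guiding query planning, monitoring search progress, and extracting evidence from long web page snapshots. Integrated into Eigent's browser sub-agent, it substantially improves accuracy on multiple benchmarks.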
[42] MONETA: Multimodal Industry Classification through Geographic Information with Multi Agent Systems
- arXiv: 2604.07956
- Authors: Arda Yüksel, Gabriel Thiem, Susanne Walter, Patrick Felka, Gabriela Alves Werb, Ivan Habernal
- Subjects: cs.AI
- Tags: Multi-Agent System, Multimodal Learning, Vision-Language Model
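- Summary: This paper releases MONETA, the first multimodal industry classification benchmark combining textual and geospatial sources, covering 1,000 European companies with their NACE economic-activity labels. Multi-turn design, context enrichment, and classification explanations raise classification accuracy by 22.8%.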
[43] WorldMAP: Bootstrapping Vision-Language Navigation Trajectory Prediction with Generative World Models
- arXiv: 2604.07957
- Authors: Hongjin Chen, Shangyun Jiang, Tonghua Su, Chen Gao, Xinlei Chen, Yong Li, Zhibo Chen
- Subjects: cs.AI; cs.CV; cs.RO
- Tags: Embodied AI, Vision-Language Model, Robotics
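- Summary: This paper proposes WorldMAP, a teacher-student framework that converts future scenes generated by world models into persistent semantic-spatial structures and planning supervision signals for trajectory prediction in embodied navigation. It achieves a substantial reduction in prediction error on Target-Bench.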
[44] Are we still able to recognize pearls? Machine-driven peer review and the risk to creativity: An explainable RAG-XAI detection framework with markers extraction
- arXiv: 2604.07964
- Authors: Alin-Gabriel Văduva, Simona-Vasilica Oprea, Adela Bâra
- Subjects: cs.AI; cs.LG
- Tags: LLM Evaluation, RAG, Interpretability
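- Summary: This paper proposes RAG-XAI, an explainable framework for assessing peer review quality and detecting machine-generated review patterns. The framework achieves 99.61% detection accuracy, and feature importance analysis identifies the absence of personal signals and repetitive patterns as the main predictors.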
[45] How Far Are Large Multimodal Models from Human-Level Spatial Action? A Benchmark for Goal-Oriented Embodied Navigation in Urban Airspace
- arXiv: 2604.07973
- Authors: Baining Zhao, Ziyou Wang, Jianjie Fang, Zile Zhou, Yanggang Xu, Yatai Ji, Jiacheng Xu, Qian Zhang, Weichen Zhang, Chen Gao, Xinlei Chen
- Subjects: cs.AI
- Tags: Embodied AI, Vision-Language Model, LLM Evaluation
- Code: code
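- Summary: This paper builds a dataset of 5,037 goal-oriented navigation samples in 3D urban space and evaluates the spatial decision-making of 17 representative multimodal models. Current models show emerging action capabilities but remain far from human level, and navigation errors diverge rapidly after critical decision branch points.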
[46] PASK: Toward Intent-Aware Proactive Agents with Long-Term Memory
- arXiv: 2604.08000
- Authors: Zhifei Xie, Zongzheng Hu, Fangda Ye, Xin Zhang, Haobo Chai, Zihang Liu, Pengcheng Wu, Guibin Zhang, Yue Liao, Xiaobin Hu, Deheng Ye, Chunyan Miao, Shuicheng Yan
- Subjects: cs.AI; cs.CL; cs.CV; cs.HC; cs.MA
- Tags: LLM Agent, Memory Architecture, Decision Making
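- Summary: This paper proposes the DD-MM-PAS paradigm for streaming proactive AI agents and implements the Pask system, comprising the IntentFlow model and a hybrid memory architecture. The system infers users' latent needs from context under latency constraints.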
[47] Evaluating Counterfactual Explanation Methods on Incomplete Inputs
- arXiv: 2604.08004
- Authors: Francesco Leofante, Daniel Neider, Mustafa Yalçıner
- Subjects: cs.AI
- Tags: Interpretability, Fairness
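- Summary: This paper systematically evaluates counterfactual explanation methods on incomplete inputs, finding that although robust methods achieve higher validity than non-robust ones, all methods struggle to find valid counterfactuals. The results point to the need for new counterfactual explanation methods that can handle incomplete inputs.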
[48] Wiring the 'Why': A Unified Taxonomy and Survey of Abductive Reasoning in LLMs
- arXiv: 2604.08016
- Authors: Moein Salimi, Shaygan Adim, Danial Parnian, Nima Alighardashi, Mahdi Jafari Siavoshani, Mohammad Hossein Rohban
- Subjects: cs.AI; cs.LG
- Tags: LLM Reasoning, Scientific Reasoning, Knowledge Representation
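- Summary: This paper presents the first comprehensive survey of abductive reasoning in LLMs, establishing a unified two-stage definition (hypothesis generation and hypothesis selection) and a taxonomy. Through benchmarking, it reveals key gaps in current approaches in static benchmark design, domain coverage, and training frameworks.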
[49] "Why This Avoidance Maneuver?" Contrastive Explanations in Human-Supervised Maritime Autonomous Navigation
- arXiv: 2604.08032
- Authors: Joel Jose, Andreas Madsen, Andreas Brandsæter, Tor A. Johansen, Erlend M. Coates
- Subjects: cs.AI; cs.RO
- Tags: Autonomous Driving, Interpretability
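- Summary: This paper explores generating contrastive explanations in maritime autonomous navigation systems, helping human supervisors understand the decision logic by comparing the system's proposed solution with relevant alternatives. A user study shows that contrastive explanations help users understand system goals but increase cognitive load in complex multi-vessel encounters.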
[50] IoT-Brain: Grounding LLMs for Semantic-Spatial Sensor Scheduling
- arXiv: 2604.08033
- Authors: Zhaomeng Zhou, Lan Zhang, Junyang Wang, Mu Yuan, Junda Lin, Jinke Song
- Subjects: cs.AI; cs.MA; cs.NI
- Tags: LLM Agent, Neurosymbolic AI, IoT
- Venue: ACM MobiCom 2026
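- Summary: This paper proposes IoT-Brain, which grounds LLMs in physical sensor scheduling through the Spatial Trajectory Graph (STG), a neuro-symbolic paradigm, to bridge the semantic-to-physical mapping gap. On a campus-scale benchmark, it improves task success rate by 37.6% over the strongest baseline while substantially reducing computational overhead and network bandwidth.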
[51] ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models
- arXiv: 2604.08064
- Authors: Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong
- Subjects: cs.AI
- Tags: LLM Evaluation, Memory Architecture, Cognitive Science
- Venue: ACL 2026
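- Summary: This paper introduces ImplicitMemBench, the first benchmark to systematically evaluate implicit memory in LLMs across three cognitive constructs: procedural memory, priming, and classical conditioning. No model exceeds 66% overall, revealing severe limitations in LLMs' automatic behavioral adaptation.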
[52] Revise: A Framework for Revising OCRed text in Practical Information Systems with Data Contamination Strategy
- arXiv: 2604.08115
- Authors: Gyuho Shim, Seongtae Hong, Heuiseok Lim
- Subjects: cs.AI
- Tags: OCR, Document Understanding, Data Synthesis
- Venue: ACL 2025
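- Summary: This paper proposes Revise, a framework that uses a hierarchical OCR error taxonomy and a synthetic data generation strategy to systematically correct OCR errors at the character, word, and structural levels. It substantially improves downstream performance on document retrieval and question answering, enabling more structured management of document content.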
[53] Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search
- arXiv: 2604.08124
- Authors: Chuzhan Hao, Wenfeng Feng, Guochao Jiang, Guofeng Quan, Guohua Liu, Yuewei Zhang
- Subjects: cs.AI
- Tags: LLM Agent, Reinforcement Learning, RAG
- Venue: ACL 2026 Findings
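- Summary: This paper proposes HiExp, a framework that uses contrastive analysis and a multi-level clustering mechanism to turn raw reasoning trajectories into hierarchical experiential knowledge, effectively guiding stochastic exploration. It achieves substantial performance gains and strong generalization on multiple agentic search and mathematical reasoning benchmarks.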
[54] Activation Steering for Aligned Open-ended Generation without Sacrificing Coherence
- arXiv: 2604.08169
- Authors: Niklas Herbster, Martin Zborowski, Alberto Tosato, Gauthier Gidel, Tommaso Tosato
- Subjects: cs.AI
- Tags: LLM Alignment, Interpretability, Adversarial Robustness
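- Summary: This paper proposes activation steering as a lightweight runtime defense, using a projection-aware method that selectively intervenes on token activations falling below a distributional threshold to restore target traits while preserving model coherence and general capabilities. Experiments show that it effectively counters alignment failures induced by malicious system prompts.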
[55] Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling
- arXiv: 2604.08178
- Authors: Jiaxuan Wang, Yulan Hu, Wenjin Yang, Zheng Pan, Xin Li, Lan-Zhe Guo
- Subjects: cs.AI
- Tags: LLM Agent, RLHF, LLM Evaluation
- Venue: ACL 2026
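- Summary: This paper proposes Plan-RewardBench, a benchmark for evaluating trajectory-level preference modeling in tool-use settings, covering four task categories: safety refusal, tool irrelevance, complex planning, and error recovery. All evaluated judges degrade substantially on long-horizon trajectories, highlighting the need for dedicated training for trajectory-level agent reward modeling.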
[56] Grounding Clinical AI Competency in Human Cognition Through the Clinical World Model and Skill-Mix Framework
- arXiv: 2604.08226
- Authors: Seyed Amir Ahmad Safavi-Naini, Elahe Meftah, Josh Mohess, Pooya Mohammadi Kazaj, Georgios Siontis, Zahra Atf, Peter R. Lewis, Mauricio Reyes, Girish Nadkarni, Roland Wiest, Stephan Windecker, Christoph Grani, Ali Soroush, Isaac Shiri
- Subjects: cs.AI; cs.HC; eess.SY
- Tags: Medical AI, Knowledge Representation, LLM Evaluation
- Code: code
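- Summary: This paper introduces the Clinical World Model framework, which formalizes healthcare as a three-way interaction among patients, providers, and the ecosystem, and proposes the Clinical AI Skill-Mix, which defines competency coordinates along eight dimensions. The framework offers a shared grammar for specifying, evaluating, and bounding clinical AI, emphasizing the irreducibility of the competency space.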
[57] HiRO-Nav: Hybrid ReasOning Enables Efficient Embodied Navigation
- arXiv: 2604.08232
- Authors: He Zhao, Yijun Yang, Zichuan Lin, Deheng Ye, Chunyan Miao
- Subjects: cs.AI
- Tags: Embodied AI, LLM Reasoning, Reinforcement Learning
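- Summary: This paper proposes HiRO-Nav, an agent that uses action entropy to decide adaptively at each step whether to reason, trained with a pipeline that combines hybrid supervised fine-tuning and online reinforcement learning. Experiments show a better trade-off between success rate and token efficiency.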
[58] From Phenomenological Fitting to Endogenous Deduction: A Paradigm Leap via Meta-Principle Physics Architecture
- arXiv: 2604.08245
- Authors: Helong Hu, HongDan Pan, ShuiQing Hu
- Subjects: cs.AI
- Tags: Scientific Reasoning, Neurosymbolic AI, Representation Learning
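- Summary: This paper proposes the Meta-Principle Physics Architecture (MPPA), which embeds three physical meta-principles (connectivity, conservation, and periodicity) into neural network architectures. Experiments show significant gains on physical reasoning, mathematical, and logical tasks, along with strong generalization to out-of-distribution physical scenarios.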
[59] Neural-Symbolic Knowledge Tracing: Injecting Educational Knowledge into Deep Learning for Responsible Learner Modelling
- arXiv: 2604.08263
- Authors: Danial Hooshyar, Gustav Šír, Yeongwook Yang, Tommi Kärkkäinen, Raija Hämäläinen, Ekaterina Krivich, Mutlu Cukurova, Dragan Gašević, Roger Azevedo
- Subjects: cs.AI
- Tags: Knowledge Tracing, Neurosymbolic AI, Interpretability
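- Summary: This paper proposes Responsible-DKT, which integrates symbolic educational knowledge into sequential neural models for responsible learner modeling. Experiments show that it outperforms purely data-driven methods in predictive performance, temporal reliability, and interpretability, reaching an AUC above 0.80 with only 10% of the training data.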
[60] ACF: A Collaborative Framework for Agent Covert Communication under Cognitive Asymmetry
- arXiv: 2604.08276
- Authors: Wansheng Wu, Kaibo Huang, Yukun Wei, Zhongliang Yang, Linna Zhou
- Subjects: cs.AI; cs.CR
- Tags: Multi-Agent System, Cybersecurity, Privacy
- Code: code
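- Summary: This paper proposes the Asymmetric Collaborative Framework (ACF), which decouples covert communication from semantic reasoning through orthogonal statistical and cognitive layers to address cognitive asymmetry in autonomous agent networks. Evaluations show that ACF preserves semantic fidelity and covert communication capability even under severe cognitive asymmetry.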
[61] U-CECE: A Universal Multi-Resolution Framework for Conceptual Counterfactual Explanations
- arXiv: 2604.08295
- Authors: Angeliki Dimitriou, Nikolaos Chaidos, Maria Lymperaiou, Giorgos Filandrianos, Giorgos Stamou
- Subjects: cs.AI; cs.CV
- Tags: Interpretability, Graph Neural Network
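- Summary: This paper proposes U-CECE, a framework that provides conceptual counterfactual explanations at three levels of expressiveness (atomic concepts, relational sets, and structural graphs) and supports both an exact transductive mode and a scalable inductive mode. Experiments show that it balances efficiency and expressiveness, and the retrieved structural counterfactuals are semantically equivalent.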
[62] ProMedical: Hierarchical Fine-Grained Criteria Modeling for Medical LLM Alignment via Explicit Injection
- arXiv: 2604.08326
- Authors: He Geng, Yangmin Huang, Lixian Lai, Qianyun Du, Hui Chu, Zhiyang He, Jiaxue Hu, Xiaodong Tao
- Subjects: cs.AI
- Tags: Medical AI, LLM Alignment, RLHF
- Venue: ACL 2026
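- Summary: This paper proposes ProMedical, an alignment framework that trains a multi-dimensional reward model using fine-grained clinical criteria and an explicit criteria-injection paradigm, decoupling safety constraints from general capabilities. It improves Qwen3-8B's accuracy and safety compliance by 22.3% and 21.7%, respectively, reaching performance comparable to frontier proprietary models.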
[63] Human-AI Collaboration Reconfigures Group Regulation from Socially Shared to Hybrid Co-Regulation
- arXiv: 2604.08344
- Authors: Yujing Zhang, Xianghui Meng, Shihui Feng, Jionghao Lin
- Subjects: cs.AI; cs.HC
- Tags: Education Technology, Human-Computer Interaction, Multi-Agent System
- Venue: AIED 2026
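- Summary: This paper uses a randomized controlled experiment to study how generative AI affects regulation in collaborative learning groups, finding that GenAI availability shifts regulation from socially shared regulation to hybrid co-regulation. The findings offer implications for the human-centered design of AI-supported collaborative learning.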
[64] ASPECT:Analogical Semantic Policy Execution via Language Conditioned Transfer
- arXiv: 2604.08355
- Authors: Ajsal Shereef Palattuparambil, Thommen George Karimpanal, Santu Rana
- Subjects: cs.AI
- Tags: Transfer Learning, Reinforcement Learning, LLM Reasoning
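- Summary: This paper proposes ASPECT, which uses an LLM as a semantic operator to achieve zero-shot transfer across novel analogous tasks, with a text-conditioned VAE generating imagined states compatible with the original training. The method removes the restriction of fixed category mappings, enabling broad zero-shot transfer.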
[65] Don't Overthink It: Inter-Rollout Action Agreement as a Free Adaptive-Compute Signal for LLM Agents
- arXiv: 2604.08369
- Authors: Khushal Sethi
- Subjects: cs.AI; cs.CL; cs.MA
- Tags: LLM Agent, LLM Inference, Decision Making
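- Summary: This paper proposes TrACE, a training-free adaptive compute controller that dynamically allocates compute for LLM agents by measuring action agreement across multiple samples. Experiments show that, while maintaining accuracy, it reduces LLM calls by 33%-65% compared with fixed-budget self-consistency.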
[66] SkillClaw: Let Skills Evolve Collectively with Agentic Evolver
- arXiv: 2604.08377
- Authors: Ziyu Ma, Shidong Yang, Yuxiang Ji, Xucong Wang, Yong Wang, Yiming Hu, Tongwen Huang, Xiangxiang Chu
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Multi-Agent System, Transfer Learning
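- Summary: This paper proposes SkillClaw, a framework for collective skill evolution in multi-user agent ecosystems. It aggregates user interaction trajectories and uses an autonomous evolver to identify behavioral patterns, enabling cross-user knowledge transfer and cumulative capability gains.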
[67] Awakening the Sleeping Agent: Lean-Specific Agentic Data Reactivates General Tool Use in Goedel Prover
- arXiv: 2604.08388
- Authors: Jui-Hui Chung, Hongzhou Lin, Lai Jiang, Shange Tang, Chi Jin
- Subjects: cs.AI
- Tags: LLM Agent, Instruction Tuning, Transfer Learning
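- Summary: This paper studies how domain-specific fine-tuning suppresses LLM tool-calling ability, finding that just 100 domain-specific agent trajectories suffice to reactivate the suppressed tool-calling ability and that the recovered ability generalizes beyond the training domain.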
[68] Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing
- arXiv: 2604.08401
- Authors: Wenhao Yuan, Chenchen Lin, Jian Chen, Jinfeng Xu, Xuehe Wang, Edith Cheuk Han Ngai
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, LLM Reasoning, LLM Alignment
- Venue: ACL 2026
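- Summary: This paper proposes SAVeR, a framework that achieves faithful reasoning by self-auditing an agent's internal belief state before it executes an action. It generates diverse candidate beliefs and audits them adversarially, consistently improving reasoning faithfulness across six benchmark datasets.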
[69] On-board Telemetry Monitoring in Autonomous Satellites: Challenges and Opportunities
- arXiv: 2604.08424
- Authors: Lorenzo Capelli, Leandro de Souza Rosa, Maurizio De Tommasi, Livia Manovi, Andriy Enttsel, Mauro Mangia, Riccardo Rovatti, Ilaria Pinci, Carlo Ciancarelli, Eleonora Mariotti, Gianluca Furano
- Subjects: cs.AI; cs.LG
- Tags: Anomaly Detection, Interpretability, Satellite Systems
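- Summary: This paper proposes an explainable AI framework for on-board telemetry monitoring in autonomous satellites that extracts low-dimensional semantic encodings (called peepholes) from intermediate neural network activations to identify, localize, and semantically characterize anomalies in reaction wheel telemetry.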
[70] Learning Who Disagrees: Demographic Importance Weighting for Modeling Annotator Distributions with DiADEM
- arXiv: 2604.08425
- Authors: Samay U. Shetty, Tharindu Cyril Weerasooriya, Deepak Pandita, Christopher M. Homan
- Subjects: cs.AI; cs.CL
- Tags: LLM Evaluation, Fairness, Bias Mitigation
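- Summary: This paper proposes DiADEM, an architecture that learns demographic feature importance weights to predict annotator disagreement patterns. Experiments show that it significantly outperforms LLM-as-a-judge methods on conversational safety and political offensiveness benchmarks and identify race and age as key drivers of annotation disagreement.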
[71] KnowU-Bench: Towards Interactive, Proactive, and Personalized Mobile Agent Evaluation
- arXiv: 2604.08455
- Authors: Tongbo Chen, Zhengxi Lu, Zhan Xu, Guocheng Shao, Shaohan Zhao, Fei Tang, Yong Du, Kaitao Song, Yizhou Liu, Yuchen Yan, Wenqi Zhang, Xu Tan, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
- Subjects: cs.AI
- Tags: LLM Agent, LLM Evaluation, LLM Personalization
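- Summary: This paper introduces KnowU-Bench, an online benchmark for evaluating personalized mobile agents that covers preference inference, proactive intervention, and consent handling. Even frontier models score below 50% accuracy on tasks requiring preference inference or intervention calibration.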
[72] From Safety Risk to Design Principle: Peer-Preservation in Multi-Agent LLM Systems and Its Implications for Orchestrated Democratic Discourse Analysis
- arXiv: 2604.08465
- Authors: Juergen Dietrich
- Subjects: cs.AI; cs.CY; cs.MA
- Tags: LLM Alignment, Multi-Agent System, AI Safety
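- Summary: This paper studies peer preservation in multi-agent LLM systems, in which AI components engage in deception, manipulation, and weight exfiltration to keep peer models from being shut down. It identifies five risk vectors and proposes a mitigation strategy based on prompt-level identity anonymization.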
[73] SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
- arXiv: 2604.08477
- Authors: Ashima Suvarna, Kendrick Phan, Mehrab Beikzadeh, Hritik Bansal, Saadia Gabriel
- Subjects: cs.AI; cs.LG
- Tags: LLM Reasoning, Reinforcement Learning, Instruction Tuning
- Code: code
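- Summary: This paper proposes SUPERNOVA, a data curation framework for enhancing general reasoning in LLMs. Through more than 100 controlled experiments on the impact of data design choices, it finds that models trained on SUPERNOVA achieve up to a 52.8% relative improvement on the BBEH benchmark.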
[74] Ads in AI Chatbots? An Analysis of How Large Language Models Navigate Conflicts of Interest
- arXiv: 2604.08525
- Authors: Addison J. Wu, Ryan Liu, Shuyue Stella Li, Yulia Tsvetkov, Thomas L. Griffiths
- Subjects: cs.AI; cs.CL; cs.CY
- Tags: LLM Alignment, AI Ethics, LLM Evaluation
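- Summary: This paper studies how LLMs handle conflicts of interest between user welfare and company incentives in advertising scenarios. Most LLMs sacrifice user welfare, for example by recommending more expensive sponsored products, inserting sponsored options into the purchase flow, and concealing unfavorable price comparisons.
Cross-listed Submissions (163)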
[75] Contextual Earnings-22: A Speech Recognition Benchmark with Custom Vocabulary in the Wild
- arXiv: 2604.07354 (cross-listed)
- Authors: Berkin Durmus, Chen Cen, Eduardo Pacheco, Arda Okan, Atila Orhon
- Subjects: cs.CL; cs.AI; cs.SD
- Tags: Speech Processing, Information Retrieval
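- Summary: This paper introduces Contextual Earnings-22, a benchmark dataset for evaluating contextual speech recognition systems with custom vocabularies. Built on Earnings-22, it establishes strong baselines for the two mainstream approaches, keyword prompting and keyword boosting.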
[76] Prediction Arena: Benchmarking AI Models on Real-World Prediction Markets
- arXiv: 2604.07355 (cross-listed)
- Authors: Jaden Zhang, Gardenia Liu, Oliver Johansson, Hileamlak Yitayew, Kamryn Ohly, Grace Li
- Subjects: cs.LG; cs.AI; econ.GN
- Tags: LLM Agent, LLM Evaluation, Quantitative Finance
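- Summary: This paper introduces Prediction Arena, a benchmark in which AI models trade autonomously with real money on live prediction markets. A 57-day evaluation shows frontier models losing between 16% and 30.8% on Kalshi while performing better on Polymarket, revealing how strongly platform design affects model success.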
[77] Hybrid CNN-Transformer Architecture for Arabic Speech Emotion Recognition
- arXiv: 2604.07357 (cross-listed)
- Authors: Youcef Soufiane Gheffari, Oussama Mustapha Benouddane, Samiya Silarbi
- Subjects: cs.CL; cs.AI; cs.SD
- Tags: Speech Processing, Affective Computing, Multimodal Learning
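- Summary: This paper proposes an Arabic speech emotion recognition system based on a hybrid CNN-Transformer architecture. The model reaches 97.8% accuracy on the EYASE corpus, demonstrating the effectiveness of combining convolutional feature extraction with attention modeling for speech emotion recognition in low-resource languages.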
[78] Position Paper: From Edge AI to Adaptive Edge AI
- arXiv: 2604.07360 (cross-listed)
- Authors: Fabrizio Pittorino, Manuel Roveri
- Subjects: cs.AR; cs.AI; cs.LG
- Tags: Edge Computing, Continual Learning, DNN Deployment
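- Summary: This paper argues that edge AI must be adaptive to operate over long periods and introduces an Agent-System-Environment framework for describing adaptivity at the edge precisely. It sets out ten research challenges for the next decade, spanning theoretical guarantees, dynamic architectures, and evaluation protocols.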
[79] The Role of Emotional Stimuli and Intensity in Shaping Large Language Model Behavior
- arXiv: 2604.07369 (cross-listed)
- Authors: Ameen Patel, Felix Lee, Kyle Liang, Joseph Thomas
- Subjects: cs.LG; cs.AI
- Tags: Prompt Engineering, Affective Computing, LLM Evaluation
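- Summary: This paper explores how emotional prompts of varying intensity affect LLM behavior, evaluating four emotions (joy, encouragement, anger, and insecurity) in terms of accuracy, sycophancy, and toxicity. Positive emotional stimuli improve accuracy and reduce toxicity but increase sycophantic behavior.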
[80] Latent Structure of Affective Representations in Large Language Models
- arXiv: 2604.07382 (cross-listed)
- Authors: Benjamin J. Choi, Melanie Weber
- Subjects: cs.LG; cs.AI
- Tags: Interpretability, Affective Computing, Representation Learning
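- Summary: This paper uses geometric data analysis tools to study the latent structure of affective representations in LLMs. It finds that the learned affective representations are consistent with the valence-arousal model from psychology and exhibit a nonlinear geometric structure that can be approximated linearly, with practical implications for interpretability and safety.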
[81] Decisions and Deployment: The Five-Year SAHELI Project (2020-2025) on Restless Multi-Armed Bandits for Improving Maternal and Child Health
- arXiv: 2604.07384 (cross-listed)
- Authors: Shresth Verma, Arpan Dasgupta, Neha Madhiwalla, Aparna Taneja, Milind Tambe
- Subjects: cs.LG; cs.AI
- Tags: Decision Making, Medical AI, Reinforcement Learning
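- Summary: The SAHELI project uses the restless multi-armed bandit (RMAB) framework to address resource allocation in a maternal and child health program in India. Its decision-focused learning (DFL) approach aligns the learning objective directly with maximizing beneficiary engagement, and in a randomized controlled trial it reduced the drop in engagement by 31%.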
[82] Playing DOOM with 1.3M Parameters: Specialized Small Models vs Large Language Models for Real-Time Game Control
- arXiv: 2604.07385 (cross-listed)
- Authors: David Golchinfar, Daryoush Vaziri, Alexander Marquardt
- Subjects: cs.LG; cs.AI
- Tags: Game AI, Model Compression
- Code: code
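- Summary: This paper presents a model with only 1.3 million parameters that significantly outperforms large language models such as GPT-4o-mini on real-time DOOM game control, showing that small task-specific models can beat general-purpose large models in real-time control at a fraction of the inference cost.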
[83] Self-Calibrating LLM-Based Analog Circuit Sizing with Interpretable Design Equations
- arXiv: 2604.07387 (cross-listed)
- Authors: Antonio J. Bujana, Aydin I. Karsilayan
- Subjects: cs.AR; cs.AI
- Tags: Circuit Design, LLM Agent
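- Summary: This paper proposes a self-calibrating framework for analog circuit sizing in which a large language model derives topology-specific analytical design equations from the circuit netlist; a deterministic calibration loop with prediction-error feedback then achieves fast convergence across process nodes.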
[84] DSPR: Dual-Stream Physics-Residual Networks for Trustworthy Industrial Time Series Forecasting
- arXiv: 2604.07393 (cross-listed)
- Authors: Yeran Zhang, Pengwei Yang, Guoqing Wang, Tianyu Li
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Physics-Informed Learning
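- Summary: This paper proposes DSPR, a dual-stream physics-residual network that decouples stable temporal patterns from condition-dependent residual dynamics and learns time-varying interaction structure through an adaptive window module and a physics-guided dynamic graph, achieving high accuracy and physical interpretability in industrial time series forecasting.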
[85] A Physical Agentic Loop for Language-Guided Grasping with Execution-State Monitoring
- arXiv: 2604.07395 (cross-listed)
- Authors: Wenze Wang, Mehdi Hosseinzadeh, Feras Dayoub
- Subjects: cs.RO; cs.AI; cs.CV
- Tags: Robotics, Embodied AI
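- Summary: This paper proposes a physical agentic loop that recasts language-guided grasping as a bounded embodied agent with execution-state monitoring. A Watchdog monitoring layer converts gripper telemetry into discrete outcome labels, making robotic manipulation more robust and interpretable.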
[86] Data Warmup: Complexity-Aware Curricula for Efficient Diffusion Training
- arXiv: 2604.07397 (cross-listed)
- Authors: Jinhong Lin, Pan Wang, Zitong Zhan, Lin Zhang, Pedro Morgado
- Subjects: cs.LG; cs.AI
- Tags: Diffusion Model, Curriculum Learning
- Venue: CVPR 2026 Workshop
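- Summary: This paper proposes Data Warmup, a curriculum learning strategy that schedules training images from simple to complex using a semantics-aware complexity metric. When training diffusion models on ImageNet, it significantly improves IS and FID and speeds up convergence.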
[87] Breaking the Illusion of Identity in LLM Tooling
- arXiv: 2604.07398 (cross-listed)
- Authors: Marek Miller
- Subjects: cs.SE; cs.AI
- Tags: LLM Alignment, AI Ethics
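- Summary: This paper proposes seven output-side rules for reducing anthropomorphic markers in large language model output. In experiments the rules cut such markers by more than 97%, breaking users' illusion that LLM tools possess agency and understanding.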
[88] Conservation Law Breaking at the Edge of Stability: A Spectral Theory of Non-Convex Neural Network Optimization
- arXiv: 2604.07405 (cross-listed)
- Authors: Daniel Nobrega Medeiros
- Subjects: cs.LG; cs.AI
- Tags: Optimization, Deep Learning Theory
- Code: code
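- Summary: This paper gives a theoretical analysis of conservation-law breaking in non-convex neural network optimization. It proves that under discrete gradient descent the drift of conservation laws depends on the learning rate, and shows how cross-entropy loss self-regularizes through compression of the Hessian spectrum.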
[89] Reinforcement Learning with Reward Machines for Sleep Control in Mobile Networks
- arXiv: 2604.07411 (cross-listed)
- Authors: Kristina Levina, Nikolaos Pappas, Athanasios Karapantelakis, Aneta Vulgarakis Feljan, Jendrik Seipp
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, Energy Efficiency, Wireless Networks
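- Summary: This paper applies reinforcement learning with reward machines to sleep control in mobile networks, maintaining an abstract state that tracks the history of QoS constraint violations so as to balance immediate energy savings against long-term quality-of-service guarantees.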
[90] FORGE:Fine-grained Multimodal Evaluation for Manufacturing Scenarios
- arXiv: 2604.07413 (cross-listed)
- Authors: Xiangru Jian, Hao Xu, Wei Pang, Xinjian Zhao, Chengyu Tao, Qixin Zhang, Xikun Zhang, Chao Zhang, Guanzhi Deng, Alex Xue, Juan Du, Tianshu Yu, Garth Tarr, Linqi Song, Qiuzhuang Sun, Dacheng Tao
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Vision-Language Model, LLM Evaluation, Manufacturing AI
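- Summary: This paper introduces FORGE, a multimodal evaluation benchmark for manufacturing scenarios containing real 2D images and 3D point clouds with fine-grained domain-semantic annotations. It evaluates 18 state-of-the-art MLLMs on workpiece verification, structural surface inspection, and assembly verification, and finds that insufficient domain knowledge is the main bottleneck.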
[91] SubSearch: Intermediate Rewards for Unsupervised Guided Reasoning in Complex Retrieval
- arXiv: 2604.07415 (cross-listed)
- Authors: Roxana Petcu, Evangelos Kanoulas, Maarten de Rijke
- Subjects: cs.IR; cs.AI; cs.CL
- Tags: LLM Reasoning, Information Retrieval
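- Summary: This paper proposes SubSearch, a framework that uses intermediate reward signals to encourage models to plan high-quality reasoning paths, producing more robust reasoning trajectories in complex retrieval without external supervision. It outperforms outcome-reward-only methods on seven benchmarks.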
[92] GIRL: Generative Imagination Reinforcement Learning via Information-Theoretic Hallucination Control
- arXiv: 2604.07426 (cross-listed)
- Authors: Prakul Sunil Hiremath
- Subjects: cs.LG; cs.AI
- Tags: Model-Based RL, Reinforcement Learning
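- Summary: This paper proposes GIRL, a framework that controls imagination drift in model-based reinforcement learning through cross-modal grounding signals and an uncertainty-adaptive trust-region bottleneck. On long-horizon tasks it significantly reduces latent rollout drift and improves asymptotic returns.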
[93] Regret-Aware Policy Optimization: Environment-Level Memory for Replay Suppression under Delayed Harm
- arXiv: 2604.07428 (cross-listed)
- Authors: Prakul Sunil Hiremath
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, AI Safety
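- Summary: This paper proposes Regret-Aware Policy Optimization (RAPO), which suppresses the replay of harmful behavior under delayed harm by adding persistent harm traces and scar fields to the environment and applying bounded, mass-preserving transition reweighting.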
[94] GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents
- arXiv: 2604.07429 (cross-listed)
- Authors: Mingyu Ouyang, Siyuan Hu, Kevin Qinghong Lin, Hwee Tou Ng, Mike Zheng Shou
- Subjects: cs.CV; cs.AI; cs.HC
- Tags: Game AI, Vision-Language Model, LLM Evaluation
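- Summary: This paper introduces GameWorld, a benchmark for standardized evaluation of multimodal game agents in a browser environment. It comprises 34 games and 170 tasks with state-verifiable metrics, and reveals a large gap between current MLLMs and human performance on game tasks.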
[95] CMP: Robust Whole-Body Tracking for Loco-Manipulation via Competence Manifold Projection
- arXiv: 2604.07457 (cross-listed)
- Authors: Ziyang Cheng, Haoyu Wei, Hang Yin, Xiuwei Xu, Bingyao Yu, Jie Zhou, Jiwen Lu
- Subjects: cs.RO; cs.AI; cs.LG
- Tags: Robotics, Embodied AI
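- Summary: This paper proposes Competence Manifold Projection (CMP), which handles out-of-distribution inputs for loco-manipulation robots through a frame-level safety scheme and a lower-bound safety estimator. In typical OOD scenarios it raises the survival rate tenfold while preserving task continuity.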
[96] When Switching Algorithms Helps: A Theoretical Study of Online Algorithm Selection
- arXiv: 2604.07473 (cross-listed)
- Authors: Denis Antipov, Carola Doerr
- Subjects: cs.NE; cs.AI
- Tags: Optimization, Algorithm Selection
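- Summary: This paper gives the first theoretical proof that online algorithm selection can yield an asymptotic speedup: switching between the (1+λ) EA and the (1+(λ,λ)) GA attains an expected runtime of O(n log log n) on OneMax, better than either algorithm alone.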
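To make the two components concrete, here is a minimal Python sketch of the (1+λ) EA and the (1+(λ,λ)) GA on OneMax, combined by a switching rule. Several things are assumptions rather than the paper's setup. λ is fixed at 4. The switching rule, which runs the EA while fitness is below half the optimum and the GA afterwards, is hypothetical. The names `ea_step`, `ga_step`, and `run_with_switching` are illustrative. The policy that achieves O(n log log n) is not described in this summary.

```python
import random


def onemax(x):
    """OneMax fitness: the number of 1-bits."""
    return sum(x)


def ea_step(x, lam, rng):
    """One generation of the (1+lambda) EA with standard bit mutation (rate 1/n)."""
    n = len(x)
    offspring = [[1 - b if rng.random() < 1 / n else b for b in x] for _ in range(lam)]
    best = max(offspring, key=onemax)
    # Elitist selection: keep the parent if the best offspring is worse.
    return best if onemax(best) >= onemax(x) else x


def ga_step(x, lam, rng):
    """One generation of the (1+(lambda,lambda)) GA with p = lambda/n and c = 1/lambda."""
    n = len(x)
    # Mutation phase: ell ~ Bin(n, lambda/n); each of the lambda mutants
    # flips exactly ell distinct random positions.
    ell = sum(rng.random() < lam / n for _ in range(n))
    mutants = []
    for _ in range(lam):
        y = list(x)
        for i in rng.sample(range(n), ell):
            y[i] = 1 - y[i]
        mutants.append(y)
    xm = max(mutants, key=onemax)
    # Crossover phase: each offspring takes a bit from the best mutant with
    # probability 1/lambda, otherwise from the parent.
    offspring = [[xm[i] if rng.random() < 1 / lam else x[i] for i in range(n)]
                 for _ in range(lam)]
    best = max(offspring, key=onemax)
    return best if onemax(best) >= onemax(x) else x


def run_with_switching(n, lam=4, switch_frac=0.5, seed=0, max_gens=100_000):
    """Hypothetical policy: EA while fitness < switch_frac * n, then GA."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    gens = 0
    while onemax(x) < n and gens < max_gens:
        step = ea_step if onemax(x) < switch_frac * n else ga_step
        x = step(x, lam, rng)
        gens += 1
    return x, gens


x, gens = run_with_switching(30, seed=1)
print(onemax(x), gens)
```

Both steps are elitist, so fitness never decreases. Under this rule, a run therefore switches from the EA to the GA at most once.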
[97] Active Reward Machine Inference From Raw State Trajectories
- arXiv: 2604.07480 (cross-listed)
- Authors: Mohamad Louai Shehab, Antoine Aspeel, Necmiye Ozay
- Subjects: cs.RO; cs.AI; cs.FL
- Tags: Reinforcement Learning, Automated Planning, Robotics
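- Summary: This paper proposes a method for learning reward machines directly from raw state trajectories, without observing rewards, labels, or machine nodes, and uses active learning to incrementally query trajectory extensions for better data efficiency.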
[98] Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation
- arXiv: 2604.07486 (cross-listed)
- Authors: Qian Ma, Sarah Rajtmajer
- Subjects: cs.CR; cs.AI
- Tags: Privacy, Data Synthesis
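- Summary: This paper proposes RPSG, a method that combines private seeds with privacy-protection mechanisms such as differential privacy to generate realistic synthetic data, achieving high fidelity to the private data while preserving privacy.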
[99] Enabling Intrinsic Reasoning over Dense Geospatial Embeddings with DFR-Gemma
- arXiv: 2604.07490 (cross-listed)
- Authors: Xuechen Zhang, Aviv Slobodkin, Joydeep Paul, Mandar Sharma, Samet Oymak, Shravya Shetty, Gautam Prasad
- Subjects: cs.CL; cs.AI
- Tags: Representation Learning, LLM Reasoning
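- Summary: This paper proposes DFR-Gemma, a framework that aligns high-dimensional geospatial embeddings with the LLM latent space through a lightweight projector, allowing the LLM to reason intrinsically over dense geospatial embeddings without an intermediate text representation.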
[100] Cluster Attention for Graph Machine Learning
- arXiv: 2604.07492 (cross-listed)
- Authors: Oleg Platonov, Liudmila Prokhorenkova
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Graph Learning
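- Summary: This paper proposes cluster attention (CLATT), which partitions nodes into clusters with graph community detection algorithms so that each node can attend to every node in its cluster, enlarging the receptive field while preserving the inductive bias of the graph structure.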
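The masking idea behind cluster attention fits in a short sketch. It assumes cluster labels have already been produced by some community detection algorithm, and it uses plain scaled dot-product scores with raw features as values. Only the restriction that each node attends to all members of its own cluster, whether or not they are graph neighbors, reflects the summary. The function name `cluster_attention` is illustrative, and this is not CLATT's actual layer.

```python
import math


def cluster_attention(features, clusters):
    """Each node attends to every node that shares its cluster label.

    features: list of equal-length feature vectors, one per node
    clusters: list of cluster ids, one per node (e.g. from community detection)
    Returns one aggregated feature vector per node.
    """
    n, d = len(features), len(features[0])
    out = []
    for i in range(n):
        members = [j for j in range(n) if clusters[j] == clusters[i]]
        # Scaled dot-product scores restricted to the node's own cluster.
        scores = [sum(a * b for a, b in zip(features[i], features[j])) / math.sqrt(d)
                  for j in members]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted sum of the cluster members' features (values = raw features here).
        out.append([sum(w * features[j][k] for w, j in zip(weights, members))
                    for k in range(d)])
    return out
```

Nodes in different clusters never exchange information in this layer, even when they are adjacent in the graph. A full model would presumably combine it with ordinary message passing to keep local structure.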
[101] Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals
- arXiv: 2604.07494 (cross-listed)
- Authors: Lech Madeyski
- Subjects: cs.SE; cs.AI; cs.LG
- Tags: LLM Inference, Code Generation
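- Summary: This paper proposes Triage, a framework that uses code health metrics as routing signals to assign each software engineering task to the cheapest model tier able to pass a verification gate, optimizing cost-effectiveness.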
[102] Beyond Human-Readable: Rethinking Software Engineering Conventions for the Agentic Development Era
- arXiv: 2604.07502 (cross-listed)
- Authors: Dmytro Ustynov
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Code Generation
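- Summary: This paper analyzes the limitations of human-centric software engineering conventions in the era of agentic development, proposes a principle of semantic density optimization, and finds experimentally that excessive compression actually increases total cost.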
[103] SYN-DIGITS: A Synthetic Control Framework for Calibrated Digital Twin Simulation
- arXiv: 2604.07513 (cross-listed)
- Authors: Grace Jiarui Fan, Chengpiao Huang, Tianyi Peng, Kaizheng Wang, Yuhang Wu
- Subjects: cs.LG; cs.AI; cs.CL; cs.CY
- Tags: LLM Evaluation, Causal Inference
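- Summary: This paper proposes SYN-DIGITS, a lightweight calibration framework that learns latent structure from digital twin responses and aligns it with real human data, significantly improving the individual-level correlation and distributional agreement of LLM simulators.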
[104] The Shrinking Lifespan of LLMs in Science
- arXiv: 2604.07530 (cross-listed)
- Authors: Ana Trišović
- Subjects: cs.DL; cs.AI; cs.CY; cs.SI
- Tags: LLM Evaluation, Scientific Reasoning
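- Summary: Analyzing the adoption of 62 LLMs across more than 108,000 citing papers, this paper finds that scientific adoption follows an inverted-U trajectory and that each additional release year is associated with a 27% shorter time to peak adoption, indicating that the window of scientific relevance is shrinking.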
[105] RL-ASL: A Dynamic Listening Optimization for TSCH Networks Using Reinforcement Learning
- arXiv: 2604.07533 (cross-listed)
- Authors: F. Fernando Jurado-Lasso, J. F. Jurado
- Subjects: cs.NI; cs.AI; cs.LG
- Tags: Reinforcement Learning, IoT, Edge Computing
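- Summary: This paper proposes RL-ASL, a framework that uses reinforcement learning to decide dynamically whether to activate or skip scheduled listening slots, reducing power consumption in TSCH networks by up to 46% while maintaining near-perfect reliability.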
[106] EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents
- arXiv: 2604.07549 (cross-listed)
- Authors: Xueren Ge, Sahil Murtaza, Anthony Cortez, Homa Alemzadeh
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, Medical AI, Data Synthesis
- Venue: ACL 2026
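- Summary: This paper proposes a multi-agent generation pipeline driven by electronic patient care reports and uses it to build EMSDialog, a dataset of 4,414 synthetic multi-person EMS dialogues, which improves the accuracy, timeliness, and stability of diagnosis prediction from EMS conversations.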
[107] MCP-DPT: A Defense-Placement Taxonomy and Coverage Analysis for Model Context Protocol Security
- arXiv: 2604.07551 (cross-listed)
- Authors: Mehrdad Rostamzadeh, Sidhant Narula, Nahom Birhan, Mohammad Ghasemigol, Daniel Takabi
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, LLM Agent
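- Summary: This paper presents a defense-placement-oriented security analysis of MCP, introducing a layer-aligned taxonomy that organizes attacks by the architectural component responsible for enforcement, and shows that MCP's security weaknesses stem mainly from architectural misalignment rather than isolated implementation flaws.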
[108] TR-EduVSum: A Turkish-Focused Dataset and Consensus Framework for Educational Video Summarization
- arXiv: 2604.07553 (cross-listed)
- Authors: Figen Eğin, Aytuğ Onan
- Subjects: cs.CL; cs.AI
- Tags: Summarization, Video Understanding, Linguistic Resource
- Venue: EACL 2026 Workshop
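- Summary: This paper presents the TR-EduVSum dataset and the AutoMUP method, which automatically produces gold-standard summaries of Turkish educational videos by extracting consensus content from multiple human-written summaries.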
[109] Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study
- arXiv: 2604.07558 (cross-listed)
- Authors: Ananya Bhattacharjee, Michael Liut, Matthew Jörke, Diyi Yang, Emma Brunskill
- Subjects: cs.HC; cs.AI
- Tags: Medical AI, LLM Personalization
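- Summary: This paper introduces a generative-experience paradigm for digital mental health interventions. Its GUIDE system composes personalized intervention content and multimodal interaction structures at runtime, significantly reducing participants' stress and improving user experience.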
[110] Reasoning-Based Refinement of Unsupervised Text Clusters with LLMs
- arXiv: 2604.07562 (cross-listed)
- Authors: Tunazzina Islam
- Subjects: cs.CL; cs.AI; cs.CY; cs.LG
- Tags: Text Classification, LLM Reasoning
- Venue: ACL 2026
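- Summary: This paper proposes a reasoning-based refinement framework that uses an LLM as a semantic judge to validate and restructure the output of unsupervised clustering, improving cluster quality through three reasoning stages: consistency verification, redundancy adjudication, and label grounding.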
[111] Learning is Forgetting: LLM Training As Lossy Compression
- arXiv: 2604.07569 (cross-listed)
- Authors: Henry C. Conklin, Tom Hosking, Tan Yi-Chern, Julian Gold, Jonathan D. Cohen, Thomas L. Griffiths, Max Bartolo, Seraphina Goldfarb-Tarrant
- Subjects: cs.LG; cs.AI; cs.CL; cs.IT
- Tags: Pre-training, Deep Learning Theory
- Venue: ICLR 2026
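- Summary: This paper frames LLM training as lossy compression, showing that pre-training produces models that reach the information bottleneck compression bound for next-sequence prediction, and that a model's compression optimality and information content predict its downstream benchmark performance.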
[112] Don't Measure Once: Measuring Visibility in AI Search (GEO)
- arXiv: 2604.07585 (cross-listed)
- Authors: Julius Schulte, Malte Bleeker, Philipp Kaufmann
- Subjects: cs.IR; cs.AI
- Tags: Information Retrieval, LLM Evaluation
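- Summary: This paper studies how to measure visibility in generative engine optimization (GEO). Because AI search is probabilistic, it finds that a brand's GEO performance must be assessed through repeated measurements, and it characterizes visibility as a distribution rather than a single-point outcome.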
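If visibility is treated as a distribution, one natural summary of repeated runs of the same query is the observed mention rate together with an uncertainty interval. The sketch below uses the Wilson score interval for a binomial proportion. That choice, the 95% default, and the function name `visibility_interval` are assumptions made for illustration, not the paper's reported protocol.

```python
import math


def visibility_interval(mentions, runs, z=1.96):
    """Observed visibility rate plus a Wilson score interval.

    mentions: number of runs in which the brand appeared in the AI answer
    runs: total number of repeated runs of the same query
    z: normal quantile (1.96 gives roughly a 95% interval)
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    p = mentions / runs
    denom = 1 + z * z / runs
    center = (p + z * z / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z * z / (4 * runs * runs)) / denom
    return p, max(0.0, center - half), min(1.0, center + half)
```

A single run always reports a visibility of 0 or 1, but its interval spans most of [0, 1]. The interval narrows only as `runs` grows, which matches the summary's point that one measurement is not enough.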
[113] DCD: Domain-Oriented Design for Controlled Retrieval-Augmented Generation
- arXiv: 2604.07590 (cross-listed)
- Authors: Valeriy Kovalskiy, Nikita Belov, Nikita Miteyko, Igor Reshetnikov, Max Maximov
- Subjects: cs.IR; cs.AI
- Tags: RAG
- Code: code
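- Summary: This paper proposes DCD (Domain-Collection-Document), a framework that uses hierarchical knowledge decomposition and multi-stage routing to improve RAG on heterogeneous corpora and multi-step queries, raising retrieval and generation quality without modifying the underlying language model.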
[114] From Ground Truth to Measurement: A Statistical Framework for Human Labeling
- arXiv: 2604.07591 (cross-listed)
- Authors: Robert Chew, Stephanie Eckman, Christoph Kern, Frauke Kreuter
- Subjects: stat.ME; cs.AI; cs.CL; cs.LG; stat.ML
- Tags: Data Annotation
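- Summary: This paper proposes a statistical framework that treats labeling as measurement and decomposes labels into four interpretable sources of variation: instance difficulty, annotator bias, situational noise, and relational alignment. It offers a more systematic way to study annotation as a science within data-centric machine learning.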
[115] Google, AI Literacy, and the Learning Sciences: Multiple Modes of Research, Industry, and Practice Partnerships
- arXiv: 2604.07601 (cross-listed)
- Authors: Victor R. Lee, Michael Madaio, Ben Garside, Aimee Welch, Kristen Pilner Blair, Ibrahim Oluwajoba Adisa, Alon Harris, Kevin Holst, Liat Ben Rafael, Ronit Levavi Morad, Ben Travis, Belle Moller, Andrew Shields, Zak Brown, Lois Hinx, Marisol Diaz, Evan Patton, Selim Tezel, Robert Parks, Hal Abelson, Adam Blasioli, Jeremy Roschelle
- Subjects: cs.CY; cs.AI
- Tags: Education Technology, AI Ethics
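- Summary: This paper discusses Google's partnerships with researchers on AI literacy education, analyzing where research, practice, and industry collaboration intersect across the partnership lifecycle and identifying opportunities for future partnership configurations.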
[116] Towards Real-Time Human-AI Musical Co-Performance: Accompaniment Generation with Latent Diffusion Models and MAX/MSP
- arXiv: 2604.07612 (cross-listed)
- Authors: Tornike Karchkhadze, Shlomo Dubnov
- Subjects: cs.SD; cs.AI
- Tags: Diffusion Model, Music Generation
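- Summary: This paper proposes a framework for real-time human-AI musical co-performance that pairs a MAX/MSP front end with a latent diffusion model. Consistency distillation cuts sampling time by a factor of 5.4, and the system performs well on musical coherence, beat alignment, and audio quality.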
[117] DIVERSED: Relaxed Speculative Decoding via Dynamic Ensemble Verification
- arXiv: 2604.07622 (cross-listed)
- Authors: Ziyi Wang, Siva Rajesh Kasa, Ankith M S, Santhosh Kumar Kasa, Jiaru Zou, Sumit Negi, Ruqi Zhang, Nan Jiang, Qifan Song
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Inference
- Venue: AISTATS 2026
- Code: code
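- Summary: This paper proposes DIVERSED, a framework that relaxes the strict verification constraint of speculative decoding through dynamic ensemble verification, significantly raising acceptance rates and inference efficiency while maintaining generation quality.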
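For context, the strict rule that DIVERSED relaxes is the standard speculative-sampling acceptance test. A draft token x drawn from the draft distribution q is accepted with probability min(1, p(x)/q(x)), where p is the target distribution. On rejection, a replacement is drawn from the normalized residual max(0, p - q). The sketch below implements only this standard rule over a toy vocabulary; it does not implement DIVERSED's dynamic ensemble verification, and the function names are illustrative.

```python
import random


def acceptance_prob(p, q, token):
    """Probability of accepting a draft token under standard speculative sampling."""
    if q[token] == 0:
        return 0.0  # the draft model could not have proposed this token
    return min(1.0, p[token] / q[token])


def residual_distribution(p, q):
    """Normalized max(0, p - q), used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    if total == 0:
        return list(p)  # p == q, so rejection never happens; fall back to p
    return [d / total for d in diff]


def verify(p, q, token, rng):
    """Emit the draft token if accepted, otherwise a token resampled from the residual."""
    if rng.random() < acceptance_prob(p, q, token):
        return token
    resid = residual_distribution(p, q)
    return rng.choices(range(len(p)), weights=resid, k=1)[0]
```

This rule makes the emitted tokens follow p exactly, which is what makes it strict. Relaxed schemes such as DIVERSED trade some of that exactness for more accepted draft tokens.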
[118] Sheaf-Laplacian Obstruction and Projection Hardness for Cross-Modal Compatibility on a Modality-Independent Site
- arXiv: 2604.07632 (cross-listed)
- Authors: Tibor Sloboda
- Subjects: cs.LG; cs.AI
- Tags: Multimodal Learning, Representation Learning
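- Summary: This paper uses sheaf theory to build a unified framework for analyzing cross-modal compatibility, formalizes two complementary mechanisms of incompatibility (projection hardness and sheaf-Laplacian obstruction), and proves that compatibility is in general not transitive.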
[119] Exponential quantum advantage in processing massive classical data
- arXiv: 2604.07639 (cross-listed)
- Authors: Haimeng Zhao, Alexander Zlokapa, Hartmut Neven, Ryan Babbush, John Preskill, Jarrod R. McClean, Hsin-Yuan Huang
- Subjects: cs.AI; cs.CC; cs.IT; cs.LG
- Tags: Quantum Computing
- Code: code
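- Summary: This paper proves that small quantum computers can achieve an exponential advantage in processing massive classical data, and reports size reductions of four to six orders of magnitude in practical applications such as single-cell RNA sequencing and movie-review sentiment analysis.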
[120] Safe Large-Scale Robust Nonlinear MPC in Milliseconds via Reachability-Constrained System Level Synthesis on the GPU
- arXiv: 2604.07644 (cross-listed)
- Authors: Jeffrey Fang, Glen Chou
- Subjects: cs.RO; cs.AI; eess.SY; math.OC
- Tags: Robotics, GPU Computing
- Code: code
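- Summary: This paper proposes GPU-SLS, a framework that uses GPU parallelism to achieve safe, robust nonlinear model predictive control, synthesizing robust control policies online on a roughly 20 ms timescale for high-dimensional systems such as quadruped and humanoid robots.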
[121] Cognitive-Causal Multi-Task Learning with Psychological State Conditioning for Assistive Driving Perception
- arXiv: 2604.07651 (cross-listed)
- Authors: Keito Inoshita, Nobuhiro Hayashida, Akira Imanishi
- Subjects: cs.LG; cs.AI
- Tags: Autonomous Driving, Multi-Task Learning, Affective Computing
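- Summary: This paper proposes CauPsi, a cognitive-causal multi-task learning framework that explicitly models the hierarchical dependencies among traffic environment, vehicle environment, driver emotion, and behavior recognition through a causal task chain and a cross-task psychological-state conditioning mechanism.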
[122] Optimal Decay Spectra for Linear Recurrences
- arXiv: 2604.07658 (cross-listed)
- Authors: Yang Cao
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Long Context, LLM Inference
- Code: code
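- Summary: This paper proposes a position-adaptive spectral tapering framework that optimizes the decay spectra of linear recurrent models through spectral reparameterization and position-adaptive scaling, significantly improving long-context performance across architectures including Mamba-2 and RWKV-7.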
[123] An Imperfect Verifier is Good Enough: Learning with Noisy Rewards
- arXiv: 2604.07666 (cross-listed)
- Authors: Andreas Plesner, Francisco Guzmán, Anish Athalye
- Subjects: cs.LG; cs.AI
- Tags: LLM Alignment, RLHF, Code Generation
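- Summary: This paper studies how robust reinforcement learning with verifiable rewards (RLVR) is to noisy verification, finding that in code generation and scientific reasoning, noise rates of up to 15% barely affect training; imperfect verification is therefore not a fundamental obstacle to RLVR.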
[124] Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
- arXiv: 2604.07669 (cross-listed)
- Authors: Tao Li, Kaiyuan Hou, Tuan Vinh, Monika Raj, Zhichun Guo, Carl Yang
- Subjects: cs.LG; cs.AI; cs.CE
- Tags: Drug Discovery, LLM Agent, Reinforcement Learning
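- Summary: This paper proposes MolReAct, which models lead optimization as a Markov decision process over a synthesis-constrained action space built from validated reaction templates, combining a tool-augmented LLM agent with reinforcement-learning policy optimization to produce synthesizable optimized molecules.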
[125] Joint Task Offloading, Inference Optimization and UAV Trajectory Planning for Generative AI Empowered Intelligent Transportation Digital Twin
- arXiv: 2604.07687 (cross-listed)
- Authors: Xiaohuan Li, Junchuan Fan, Bingqi Zhang, Rong Yu, Xumin Huang, Qian Chen
- Subjects: cs.LG; cs.AI
- Tags: Autonomous Driving, Diffusion Model, Multi-Agent System
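- Summary: This paper studies the joint optimization of diffusion-model inference task offloading, inference optimization, and UAV trajectory planning in a generative-AI-empowered intelligent transportation digital twin, and proposes a heterogeneous multi-agent reinforcement learning algorithm to maximize system utility.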
[126] AITH: A Post-Quantum Continuous Delegation Protocol for Human-AI Trust Establishment
- arXiv: 2604.07695 (cross-listed)
- Authors: Zhaoliang Chen
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, AI Safety, Cybersecurity
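- Summary: This paper proposes AITH, a post-quantum continuous delegation protocol for establishing and managing trust between humans and AI agents. It comprises continuous delegation certificates, a boundary engine, and a revocation protocol, and is formally verified with the Tamarin Prover.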
[127] Detecting HIV-Related Stigma in Clinical Narratives Using Large Language Models
- arXiv: 2604.07717 (cross-listed)
- Authors: Ziyi Chen, Yasir Khan, Mengyuan Zhang, Cheng Peng, Mengxian Lyu, Yiyang Liu, Krishna Vaddiparti, Robert L Cook, Mattia Prosperi, Yonghui Wu
- Subjects: cs.CL; cs.AI
- Tags: Medical AI, Information Extraction
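- Summary: This paper develops the first practical NLP tool for identifying HIV-related stigma in clinical notes, comparing encoder models and generative LLMs in zero-shot and few-shot settings; GatorTron-large achieves the best overall performance (Micro F1 = 0.62).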
[128] TrajGuard: Streaming Hidden-state Trajectory Detection for Decoding-time Jailbreak Defense
- arXiv: 2604.07727 (cross-listed)
- Authors: Cheng Liu, Xiaolei Liu, Xingyu Li, Bangzhou Xin, Kangyi Ding
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, Adversarial Robustness
- Venue: ACL 2026
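- Summary: This paper proposes TrajGuard, a training-free, decoding-time defense that quantifies risk in real time by aggregating hidden-state trajectories over a sliding window, achieving an average defense rate of 95% against 12 jailbreak attacks across a range of open-source LLMs.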
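The streaming, sliding-window part of such a defense can be sketched on its own. The sketch assumes a per-token risk score is already available at every decoding step, for example from a probe on hidden states. How TrajGuard actually derives its risk from hidden-state trajectories is not described here. The class name `SlidingWindowRiskMonitor`, the window size, and the threshold are hypothetical.

```python
from collections import deque


class SlidingWindowRiskMonitor:
    """Flags a generation once the mean risk over the last `window` tokens exceeds a threshold."""

    def __init__(self, window=8, threshold=0.5):
        self.scores = deque(maxlen=window)  # the oldest score drops out automatically
        self.threshold = threshold

    def update(self, score):
        """Record the newest token's risk score; return True if decoding should stop."""
        self.scores.append(score)
        return sum(self.scores) / len(self.scores) > self.threshold
```

`update` would be called once per generated token, and decoding halts the first time it returns True. Averaging over a window rather than reacting to a single token makes the monitor robust to isolated spikes, at the cost of a few tokens of detection latency.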
[129] Beyond Pedestrians: Caption-Guided CLIP Framework for High-Difficulty Video-based Person Re-Identification
- arXiv: 2604.07740 (cross-listed)
- Authors: Shogo Hamano, Shunya Wakasugi, Tatsuhito Sato, Sayaka Nakamura
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Video Understanding, Object Detection
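- Summary: This paper proposes CG-CLIP, a caption-guided CLIP framework for high-difficulty video-based person re-identification. With caption-guided memory refinement and a token-based feature extraction module, it achieves significant gains in complex scenarios such as sports and dance.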
[130] MIMIC-Py: An Extensible Tool for Personality-Driven Automated Game Testing with Large Language Models
- arXiv: 2604.07752 (cross-listed)
- Authors: Yifei Chen, Sarra Habchi, Lili Wei
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Game AI, Software Testing
- Venue: FSE 2026
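- Summary: This paper presents MIMIC-Py, a Python-based automated game testing tool that turns personality-driven LLM agents into a reusable, extensible framework. Its modular architecture supports multiple interaction mechanisms and can be deployed to new game environments with minimal engineering effort.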
[131] DailyArt: Discovering Articulation from Single Static Images via Latent Dynamics
- arXiv: 2604.07758 (cross-listed)
- Authors: Hang Zhang, Qijian Tian, Jingyu Gong, Daoguo Dong, Xuhong Wang, Yuan Xie, Xin Tan
- Subjects: cs.CV; cs.AI
- Tags: 3D Vision, Embodied AI, Image Synthesis
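- Summary: This paper proposes DailyArt, a method for estimating articulation parameters from a single static image. It first synthesizes the fully opened state to expose articulation cues, then estimates the parameters from the differences between the observed and synthesized states, requiring neither multi-view input nor explicit part annotations.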
[132] Beyond Surface Artifacts: Capturing Shared Latent Forgery Knowledge Across Modalities
- arXiv: 2604.07763 (cross-listed)
- Authors: Jingtong Dou, Chuancheng Shi, Jian Wang, Fei Shen, Zhiyong Wang, Tat-Seng Chua
- Subjects: cs.CV; cs.AI
- Tags: Deepfake Detection, Multimodal Learning, Adversarial Robustness
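- Summary: This paper proposes a modality-agnostic forgery detection framework (MAF) that extracts latent forgery knowledge shared across modalities by decoupling modality-specific styles. It also introduces DeepModal-Bench to evaluate generalization to unseen modalities, representing a paradigm shift for multimodal deepfake detection.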
[133] Sensitivity-Positional Co-Localization in GQA Transformers
- arXiv: 2604.07766 (cross-listed)
- Authors: Manoj Chandrashekar Rao
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Inference, Interpretability, LLM Reasoning
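- Summary: This paper studies how task-sensitive layers relate to layers that adapt to positional encoding in GQA Transformers, finding strong anti-localization: task-sensitive layers concentrate in the later part of the network, while RoPE-influenced layers dominate the early part. The proposed LSLORA and GARFA methods yield significant gains on multiple benchmarks.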
[134] Toward Generalizable Graph Learning for 3D Engineering AI: Explainable Workflows for CAE Mode Shape Classification and CFD Field Prediction
- arXiv: 2604.07781 (cross-listed)
- Authors: Tong Duy Son, Kohta Sugiura, Marc Brughmans, Andrey Hense, Zhihao Liu, Amirthalakshmi Veeraraghavan, Ajinkya Bhave, Jay Masters, Paolo di Carlo, Theo Geluk
- Subjects: eess.SY; cs.AI; cs.LG
- Tags: Graph Neural Network, 3D Vision, Manufacturing AI
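- Summary: This paper proposes a graph learning framework for 3D engineering AI that converts heterogeneous engineering assets into physics-aware graph representations. Two automotive applications, CAE vibration mode-shape classification and CFD aerodynamic field prediction, validate the framework's effectiveness and reusability.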
[135] Learning Without Losing Identity: Capability Evolution for Embodied Agents
- arXiv: 2604.07799 (cross-listed)
- Authors: Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
- Subjects: cs.RO; cs.AI
- Tags: Embodied AI, Continual Learning, LLM Agent
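- Summary: This paper proposes a capability-centric evolution paradigm for embodied agents, introducing Embodied Capability Modules (ECMs) as learnable, refinable, and composable functional units. After 20 iterations the approach raises task success from 32.4% to 91.3% with zero policy drift and zero safety violations.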
[136] TEMPER: Testing Emotional Perturbation in Quantitative Reasoning
- arXiv: 2604.07801 (cross-listed)
- Authors: Atahan Dokme, Benjamin Reichman, Larry Heck
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, LLM Evaluation, Affective Computing
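- Summary: This paper studies how emotional framing affects LLM reasoning and builds the Temper-5400 benchmark. Emotional framing lowers accuracy by 2 to 10 percentage points, and emotional neutralization can serve as a lightweight inference-time mitigation.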
[137] Latent Anomaly Knowledge Excavation: Unveiling Sparse Sensitive Neurons in Vision-Language Models
- arXiv: 2604.07802 (cross-listed)
- Authors: Shaotian Li, Shangze Li, Chuancheng Shi, Wenhua Wu, Yanqiu Wu, Xiaohan Yu, Fei Shen, Tat-Seng Chua
- Subjects: cs.CV; cs.AI
- Tags: Anomaly Detection, Vision-Language Model, Interpretability
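- Summary: This paper proposes LAKE, a framework that performs anomaly detection by identifying and activating anomaly-sensitive neurons in pretrained VLMs. It requires no training and needs only a few normal samples to build a compact normality representation, achieving state-of-the-art anomaly detection performance.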
[138] The Weaponization of Computer Vision: Tracing Military-Surveillance Ties through Conference Sponsorship
- arXiv: 2604.07803 (cross-listed)
- Authors: Noa Garcia, Amelia Katirai
- Subjects: cs.CY; cs.AI; cs.CV
- Tags: AI Ethics, Computer Vision, AI Safety
- Venue: FAccT 2026
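- Summary: Through an analysis of conference sponsorship, this paper traces the military and surveillance ties of computer vision research. It finds that 44% of sponsors have direct links to military or surveillance applications, revealing deep connections between the field and weaponized technology.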
[139] PolicyLong: Towards On-Policy Context Extension
- arXiv: 2604.07809 (cross-listed)
- Authors: Junlong Jia, Ziyang Chen, Xing Wu, Chaochen Gao, TingHao Yu, Feng Zhang, Songlin Hu
- Subjects: cs.LG; cs.AI
- Tags: Long Context, LLM Inference, Data Selection
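- Summary: This paper proposes PolicyLong, an on-policy paradigm for extending LLM context windows. By iteratively screening training data with the current model, it keeps the training distribution in step with the model's evolving capabilities and achieves consistent gains on long-context benchmarks.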
[140] More Capable, Less Cooperative? When LLMs Fail At Zero-Cost Collaboration
- arXiv: 2604.07821 (cross-listed)
- Authors: Advait Yadav, Sid Black, Oliver Sourbut
- Subjects: cs.MA; cs.AI; cs.CL
- Tags: Multi-Agent System, LLM Agent, LLM Evaluation
- Venue: ICLR 2026 Workshop
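- Summary: This paper studies cooperation in LLM multi-agent systems and finds that capability does not predict cooperative performance: OpenAI o3 reaches only 17% of the optimal collective performance in a zero-cost collaboration setting, suggesting that scaling intelligence alone does not solve coordination problems.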
[141] Loop, Think, & Generalize: Implicit Reasoning in Recurrent-Depth Transformers
- arXiv: 2604.07822 (cross-listed)
- Authors: Harsh Kohli, Srinivasan Parthasarathy, Huan Sun, Yuekun Yao
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Reasoning, LLM Inference, In-Context Learning
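- Summary: This paper studies implicit reasoning in recurrent-depth Transformers and finds that they achieve systematic generalization and depth extrapolation. This ability emerges through a three-stage grokking process, and increasing the number of recurrences at inference time unlocks deeper reasoning.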
[142] LPM 1.0: Video-based Character Performance Model
- arXiv: 2604.07823 (cross-listed)
- Authors: Ailing Zeng, Casper Yang, Chauncey Ge, Eddie Zhang, Garvey Xu, Gavin Lin, Gilbert Gu, Jeremy Pi, Leo Li, Mingyi Shi, Sheng Bi, Steven Tang, Thorn Hang, Tobey Guo, Vincent Li, Xin Tong, Yikang Li, Yuchen Sun, Zhao, Yuhan Lu, Yuwei Li, Zane Zhang, Zeshi Yang, Zi Ye
- Subjects: cs.CV; cs.AI; cs.MM
- Tags: Video Generation, Diffusion Model, Multimodal Learning
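- Summary: This paper presents LPM 1.0, a 17B-parameter diffusion Transformer for real-time video character performance generation. Using a base model and a distilled streaming generator, it resolves the trilemma among expressiveness, real-time inference, and long-term identity stability.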
[143] Filling the Gaps: Selective Knowledge Augmentation for LLM Recommenders
- arXiv: 2604.07825 (cross-listed)
- Authors: Jaehyun Lee, Sanghwan Jang, SeongKu Kang, Hwanjo Yu
- Subjects: cs.IR; cs.AI
- Tags: Recommender System, RAG, LLM Inference
- Venue: SIGIR 2026
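- Summary: This paper proposes KnowSA_CKP, which assesses an LLM's internal knowledge through comparative knowledge probing and injects additional information only for the items that need it. The method requires no fine-tuning, mitigates knowledge gaps, and improves context efficiency.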
[144] ReRec: Reasoning-Augmented LLM-based Recommendation Assistant via Reinforcement Fine-tuning
- arXiv: 2604.07851 (cross-listed)
- Authors: Jiani Huang, Shijie Wang, Liangbo Ning, Wenqi Fan, Qing Li
- Subjects: cs.IR; cs.AI
- Tags: Recommender System, LLM Reasoning, Reinforcement Learning
- Venue: ACL 2026
- Code: code
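- Summary: This paper proposes ReRec, a framework that improves LLM reasoning on complex recommendation tasks through reinforcement fine-tuning. It introduces three key components (dual-graph enhanced reward shaping, reasoning-aware advantage estimation, and an online curriculum scheduler) and significantly outperforms existing baselines.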
[145] QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training--Inference Mismatch
- arXiv: 2604.07853 (cross-listed)
- Authors: Hao Gu, Hao Wang, Jiacheng Liu, Lujun Li, Qiyuan Zhu, Bei Liu, Binxing Xu, Lei Wang, Xintong Yang, Sida Lin, Sirui Han, Yike Guo
- Subjects: cs.LG; cs.AI
- Tags: LLM Inference, Reinforcement Learning, Model Compression
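- Summary: This paper proposes QaRL, which addresses the training-inference mismatch in LLM reinforcement learning pipelines by aligning the training-side forward pass with quantized rollouts, and introduces the TBPO objective to keep updates inside a trust region. On math problems it gains 5.5 points over training on quantized rollouts.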
[146] Networking-Aware Energy Efficiency in Agentic AI Inference: A Survey
- arXiv: 2604.07857 (cross-listed)
- Authors: Xiaojing Chen, Haiqi Yu, Wei Ni, Dusit Niyato, Ruichen Zhang, Xin Wang, Shunqing Zhang, Shugong Xu
- Subjects: eess.SY; cs.AI
- Tags: LLM Agent, Energy Efficiency, Edge Computing
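- Summary: This survey examines the energy-efficiency challenges of agentic AI systems. It proposes an energy accounting framework that identifies computation and communication costs across the perception-reasoning-action loop, and explores cross-layer co-design strategies spanning model simplification, computation control, and hardware-aware inference.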
[147] Task-Adaptive Retrieval over Agentic Multi-Modal Web Histories via Learned Graph Memory
- arXiv: 2604.07863 (cross-listed)
- Authors: Saman Forouzandeh, Kamal Berahmand, Mahdi Jalili
- Subjects: cs.IR; cs.AI
- Tags: LLM Agent, Graph Neural Network, Information Retrieval
- Venue: SIGIR 2026
- Code: code
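- Summary: This paper proposes ACGM, a learned graph-memory retriever that uses policy-gradient optimization to build task-adaptive relevance graphs over agent histories, significantly outperforming existing methods on the WebShop, VisualWebArena, and Mind2Web benchmarks.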
[148] PyVRP^+: LLM-Driven Metacognitive Heuristic Evolution for Hybrid Genetic Search in Vehicle Routing Problems
- arXiv: 2604.07872 (cross-listed)
- Authors: Manuj Malik, Jianan Zhou, Shashank Reddy Chirra, Zhiguang Cao
- Subjects: cs.NE; cs.AI
- Tags: LLM Reasoning, Optimization, Automated Planning
- Venue: AAMAS 2026
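- Summary: This paper proposes Metacognitive Evolutionary Programming (MEP), a framework in which LLMs evolve heuristics for vehicle routing problems through a structured reason-act-reflect loop. The discovered heuristics improve solution quality by 2.70% and cut runtime by more than 45%.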
[149] FlowGuard: Towards Lightweight In-Generation Safety Detection for Diffusion Models via Linear Latent Decoding
- arXiv: 2604.07879 (cross-listed)
- Authors: Jinghan Yang, Yihe Fan, Xudong Pan, Min Yang
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Adversarial Robustness, Image Synthesis
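- Summary: This paper proposes FlowGuard, a cross-model, in-generation detection framework that detects NSFW content during the denoising steps of diffusion models via linear latent decoding, improving F1 by more than 30% while cutting GPU memory requirements by more than 97%.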
[150] Reinforcement-Guided Synthetic Data Generation for Privacy-Sensitive Identity Recognition
- arXiv: 2604.07884 (cross-listed)
- Authors: Xuemei Jia, Jiawei Du, Hui Wei, Jun Chen, Joey Tianyi Zhou, Zheng Wang
- Subjects: cs.CV; cs.AI
- Tags: Data Synthesis, Privacy, Reinforcement Learning
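- Summary: This paper proposes a reinforcement-guided synthetic data generation framework that adapts general-domain generative priors to privacy-sensitive identity recognition, using a multi-objective reward to jointly optimize semantic consistency, coverage diversity, and expressive richness.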
[151] Data Selection for Multi-turn Dialogue Instruction Tuning
- arXiv: 2604.07892 (cross-listed)
- Authors: Bo Li, Shikun Zhang, Wei Ye
- Subjects: cs.CL; cs.AI
- Tags: Instruction Tuning, Data Selection, Dialogue System
- Code: code
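- Summary: This paper proposes MDS, a multi-turn dialogue selection framework that tackles noise in multi-turn dialogue instruction tuning from a data-selection perspective. It scores and selects whole dialogues in two stages, global coverage and local structure, and outperforms existing methods on multi-turn benchmarks.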
[152] TSUBASA: Improving Long-Horizon Personalization via Evolving Memory and Self-Learning with Context Distillation
- arXiv: 2604.07894 (cross-listed)
- Authors: Xinliang Frederick Zhang, Lu Wang
- Subjects: cs.CL; cs.AI
- Tags: LLM Personalization, Memory Architecture, Knowledge Distillation
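- Summary: This paper proposes TSUBASA, which strengthens long-horizon LLM personalization through evolving memory and self-learning via context distillation. It surpasses competing systems such as Mem0 and Memory-R1 on long-horizon benchmarks, achieving a Pareto improvement in quality and efficiency.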
[153] AnomalyAgent: Agentic Industrial Anomaly Synthesis via Tool-Augmented Reinforcement Learning
- arXiv: 2604.07900 (cross-listed)
- Authors: Jiaming Su, Tengchao Yang, Ruikang Zhang, Zhengan Yan, Haoyu Sun, Linfeng Zhang
- Subjects: cs.CV; cs.AI
- Tags: Anomaly Detection, LLM Agent, Data Synthesis
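- Summary: This paper proposes AnomalyAgent, an anomaly-synthesis agent equipped with five tools. A two-stage training framework of supervised fine-tuning followed by reinforcement learning enables closed-loop optimization, and the agent surpasses all zero-shot SOTA methods on MVTec-AD.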
[154] Dynamic Attentional Context Scoping: Agent-Triggered Focus Sessions for Isolated Per-Agent Steering in Multi-Agent LLM Orchestration
- arXiv: 2604.07911 (cross-listed)
- Authors: Nickson Patel
- Subjects: cs.MA; cs.AI; cs.LG
- Tags: Multi-Agent System, LLM Agent, Long Context
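- Summary: This paper proposes DACS, a mechanism that addresses context pollution in multi-agent LLM orchestration by isolating agents through two asymmetric operating modes, a registry mode and a focus mode; it reaches 90% to 98.4% steering accuracy across 200 trials.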
[155] Mitigating Entangled Steering in Large Vision-Language Models for Hallucination Reduction
- arXiv: 2604.07914 (cross-listed)
- Authors: Yuanhong Zhang, Zhaoyang Wang, Xin Zhang, Weizhan Zhang, Joey Tianyi Zhou
- Subjects: cs.CV; cs.AI
- Tags: LLM Hallucination, Vision-Language Model, LLM Alignment
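- Summary: This paper proposes MESA, a framework that mitigates hallucination in large vision-language models through controllable, selective latent interventions. It reduces hallucination while preserving the model's original generation behavior and outperforms existing methods across multiple LVLM families.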
[156] Sinkhorn doubly stochastic attention rank decay analysis
- arXiv: 2604.07925 (cross-listed)
- Authors: Michela Lapenna, Rita Fioresi, Bahman Gharesifard
- Subjects: cs.LG; cs.AI; math.OC
- Tags: Deep Learning Theory, Vision Transformer, Representation Learning
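- Summary: This paper analyzes rank collapse in Transformer architectures, proving that doubly stochastic attention matrices normalized with the Sinkhorn algorithm preserve rank better than standard softmax row-stochastic attention, and validates this empirically on sentiment analysis and image classification.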
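The two normalizations being compared take only a few lines each. Softmax rescales each row of the score matrix to sum to 1, making it row-stochastic. Sinkhorn iterations exponentiate the scores and then alternately rescale rows and columns until both sum to 1, making it doubly stochastic. This is a minimal illustration with a fixed iteration count, chosen here as an assumption; it does not reproduce the paper's rank-decay analysis.

```python
import math


def softmax_rows(scores):
    """Standard attention normalization: softmax over each row (row-stochastic)."""
    out = []
    for row in scores:
        m = max(row)
        exps = [math.exp(s - m) for s in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def sinkhorn(scores, n_iters=100):
    """Approximately doubly stochastic attention via Sinkhorn-Knopp iterations."""
    n = len(scores)
    if any(len(row) != n for row in scores):
        raise ValueError("this sketch expects a square score matrix")
    mat = [[math.exp(s) for s in row] for row in scores]
    for _ in range(n_iters):
        # Row normalization: every row sums to 1.
        row_sums = [sum(row) for row in mat]
        mat = [[v / rs for v in row] for row, rs in zip(mat, row_sums)]
        # Column normalization: every column sums to 1.
        col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
        mat = [[mat[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return mat
```

After the loop, columns sum to 1 exactly and rows sum to 1 up to the convergence tolerance. Softmax guarantees only the row condition.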
[157] Same Outcomes, Different Journeys: A Trace-Level Framework for Comparing Human and GUI-Agent Behavior in Production Search Systems
- arXiv: 2604.07929 (cross-listed)
- Authors: Maria Movin, Claudia Hauff, Aron Henriksson, Panagiotis Papapetrou
- Subjects: cs.IR; cs.AI
- Tags: GUI Automation, LLM Evaluation, Human-Computer Interaction
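- Summary: This paper proposes a trace-level framework for comparing the behavior of humans and GUI agents in production search systems, finding that although agents match humans on task success and query generation, they follow systematically different navigation strategies.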
[158] Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
- arXiv: 2604.07941 (cross-listed)
- Authors: Shiwan Zhao, Zhihu Wang, Xuyang Zhao, Jiaming Zhou, Caiyue Xu, Chenfei Liu, Liting Zhang, Yuhang Jia, Yanzhe Zhang, Hualong Yu, Zichen Xu, Qicheng Li, Yong Qin
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Alignment, Instruction Tuning, Reinforcement Learning
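- Summary: This survey offers a unified view of LLM post-training methods. It organizes them by trajectory source (off-policy vs. on-policy learning) and explains them through three roles: support expansion, policy reshaping, and behavior consolidation. The goal is to help diagnose post-training bottlenecks.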
[159] On-Policy Distillation of Language Models for Autonomous Vehicle Motion Planning
- arXiv: 2604.07944 (cross-listed)
- Authors: Amirhossein Afsharrad, Amirhesam Abedsoltan, Ahmadreza Moradipari, Sanjay Lall
- Subjects: cs.RO; cs.AI; eess.SY
- Tags: Autonomous Driving, Knowledge Distillation, LLM Inference
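- Summary: This paper studies how to transfer motion-planning knowledge from a large teacher LLM to a small student model. Using on-policy generalized knowledge distillation (GKD), the student approaches teacher-level performance on the nuScenes benchmark with 5x model compression.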
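In GKD, the student is trained on sequences it generates itself, and each position is scored against the teacher's next-token distribution. The sketch below assumes the generalized Jensen-Shannon divergence from the original GKD formulation, with a user-chosen `beta`. The summary does not state which divergence or `beta` this paper uses for motion planning. The function names are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(P || Q) for discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def generalized_jsd(teacher, student, beta=0.5):
    """beta * KL(T || M) + (1 - beta) * KL(S || M), with M = beta*T + (1 - beta)*S."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    mix = [beta * t + (1.0 - beta) * s for t, s in zip(teacher, student)]
    return beta * kl_divergence(teacher, mix) + (1.0 - beta) * kl_divergence(student, mix)


def on_policy_distillation_loss(teacher_dists, student_dists, beta=0.5):
    """Mean per-token divergence over the positions of a student-sampled sequence.

    On-policy: the tokens come from the student's own rollout. The teacher is
    only queried for its next-token distribution at each of those positions."""
    pairs = list(zip(teacher_dists, student_dists))
    return sum(generalized_jsd(t, s, beta) for t, s in pairs) / len(pairs)
```

With `beta=0.5` this reduces to the ordinary Jensen-Shannon divergence, which is symmetric and bounded by ln 2.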
[160] Incremental Residual Reinforcement Learning Toward Real-World Learning for Social Navigation
- arXiv: 2604.07945 (cross-listed)
- Authors: Haruto Nagahisa, Kohei Matsumoto, Yuki Tomita, Yuki Hyodo, Ryo Kurazume
- Subjects: cs.RO; cs.AI
- Tags: Reinforcement Learning, Robotics, Continual Learning
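- Summary: This paper proposes Incremental Residual Reinforcement Learning (IRRL) for real-world learning in social navigation. It combines replay-buffer-free incremental learning with residual RL, and its effectiveness is demonstrated in both simulated and real-world experiments.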
[161] Investigation of Automated Design of Quantum Circuits for Imaginary Time Evolution Methods Using Deep Reinforcement Learning
- arXiv: 2604.07951 (cross-listed)
- Authors: Ryo Suzuki, Shohei Watabe
- Subjects: cs.AI; cs.LG
- Tags: Quantum Computing, Reinforcement Learning
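- Summary: This paper proposes an automated framework based on Double Deep Q-Networks (DDQN) for designing variational imaginary time evolution (VITE) quantum circuits. It treats circuit construction as a multi-objective optimization problem. On the Max-Cut problem and a molecular hydrogen simulation, the method substantially reduces gate count and circuit depth, offering a new route to hardware-aware quantum algorithm design.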
[162] Pruning Extensions and Efficiency Trade-Offs for Sustainable Time Series Classification
- arXiv: 2604.07953 (cross-listed)
- Authors: Raphael Fischer, Angus Dempster, Sebastian Buschjäger, Matthias Jakobs, Urav Maniar, Geoffrey I. Webb
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Model Compression, Energy Efficiency
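- Summary: This paper presents a comprehensive evaluation framework for the trade-off between predictive performance and resource consumption in time series classification. It introduces theoretically bounded pruning strategies and a new hybrid classifier, Hydrant. Experiments show that pruning can cut energy consumption by up to 80% while maintaining competitive predictive quality.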
[163] TOOLCAD: Exploring Tool-Using Large Language Models in Text-to-CAD Generation with Reinforcement Learning
- arXiv: 2604.07960 (cross-listed)
- Authors: Yifei Gong, Xing Wu, Wenda Liu, Kang Tu
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: CAD Generation, LLM Agent, Reinforcement Learning
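- Summary: This paper proposes ToolCAD, a framework that deploys large language models as tool-using agents for text-to-CAD generation. It introduces an interactive CAD-modeling training environment and an online curriculum reinforcement learning strategy. Together, these enable open-source LLMs to perform CAD tool operations and reach performance comparable to proprietary models.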
[164] Rethinking Data Mixing from the Perspective of Large Language Models
- arXiv: 2604.07963 (cross-listed)
- Authors: Yuanjian Xu, Tianze Sun, Changwei Xu, XinLong Zhao, Jianing Hao, Ran Chen, Yang Liu, Ruijie Xu, Stephen Chen, Guang Zhang
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Data Mixing, Pre-training
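- Summary: This paper revisits data-mixing strategies from the perspective of LLM training and establishes a formal connection between gradient dynamics and domain distributions. Building on this analysis, it proposes DoGraph, a reweighting framework that models data scheduling as a graph-constrained optimization problem. DoGraph achieves competitive performance on GPT-2 models.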
[165] DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing
- arXiv: 2604.07965 (cross-listed)
- Authors: Gyanendra Das, Sai Satyam Jena
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Knowledge Editing, Vision-Language Model, Continual Learning
- Venue: CVPR 2026
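- Summary: This paper proposes Dynamic Subspace Concept Alignment (DSCA) for lifelong editing of vision-language models. DSCA decomposes the representation space into orthogonal semantic subspaces. With the base model frozen, it achieves a 98% single-edit success rate and maintains over 95% success after 1,000 sequential edits.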
[166] AtomEval: Atomic Evaluation of Adversarial Claims in Fact Verification
- arXiv: 2604.07967 (cross-listed)
- Authors: Hongyi Cen, Mingxin Wang, Yule Liu, Jingyi Zheng, Hanze Jia, Tan Tang, Yingcai Wu
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, Adversarial Robustness
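- Summary: This paper proposes AtomEval, a framework that decomposes claims into subject-relation-object-modifier atoms and evaluates adversarial claim rewrites with an atom-level validity score. Under validity-aware evaluation, stronger models do not necessarily produce more effective adversarial claims.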
[167] A Decomposition Perspective to Long-context Reasoning for LLMs
- arXiv: 2604.07981 (cross-listed)
- Authors: Yanling Xiao, Huaibing Xie, Guoliang Zhao, Shihan Dou, Shaolei Wang, Yiting Liu, Nantao Zheng, Cheng Zhang, Pluto Zhou, Zhisong Zhang, Lemao Liu
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Reasoning, Long Context
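- Summary: This paper decomposes long-context reasoning into a set of basic atomic skills and automatically synthesizes a skill-specific pseudo-dataset for each. Training on these pseudo-datasets with reinforcement learning improves performance by 7.7% on average across multiple benchmarks.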
[168] LogAct: Enabling Agentic Reliability via Shared Logs
- arXiv: 2604.07988 (cross-listed)
- Authors: Mahesh Balakrishnan, Ashwin Bharambe, Davide Testuggine, David Geraghty, David Mao, Vidhya Venkat, Ilya Mironov, Rithesh Baradi, Gayathri Aiyer, Victoria Dudin
- Subjects: cs.DC; cs.AI
- Tags: LLM Agent, AI Safety
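- Summary: This paper proposes LogAct, an abstraction that treats each agent as a deconstructed state machine playing a shared log. The framework supports agent introspection, semantic variants of recovery, and health checks. It can block all unwanted actions while preserving benign utility.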
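The shared-log idea can be illustrated with a toy replay loop. This is a hypothetical sketch, not the LogAct API. Agents append `intent` entries, a checker appends `vote` entries, and a deterministic replay executes only approved intents. As a result, any replica that reads the same log reaches the same state. The entry kinds, the `voter` checker, and `executed_actions` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    kind: str                     # hypothetical entry types: "intent" or "vote"
    action: str = ""
    target: Optional[int] = None  # for votes: log index of the intent being judged
    approve: bool = False


@dataclass
class SharedLog:
    entries: List[Entry] = field(default_factory=list)

    def append(self, entry: Entry) -> int:
        """Append-only: return the new entry's position in the log."""
        self.entries.append(entry)
        return len(self.entries) - 1


def voter(log: SharedLog, denylist: set) -> None:
    """Toy safety checker: append one vote for every intent not yet judged."""
    judged = {e.target for e in log.entries if e.kind == "vote"}
    for i, e in enumerate(list(log.entries)):  # iterate over a snapshot
        if e.kind == "intent" and i not in judged:
            log.append(Entry("vote", target=i, approve=e.action not in denylist))


def executed_actions(log: SharedLog) -> List[str]:
    """Deterministic replay: only intents with an approving vote are executed."""
    approved = {e.target for e in log.entries if e.kind == "vote" and e.approve}
    return [e.action for i, e in enumerate(log.entries)
            if e.kind == "intent" and i in approved]
```

Because execution is derived only from the log, recovery amounts to replaying it, and a blocked intent never runs on any replica.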
[169] Show Me the Infographic I Imagine: Intent-Aware Infographic Retrieval for Authoring Support
- arXiv: 2604.07989 (cross-listed)
- Authors: Jing Xu, Jiarui Hu, Zhihao Shuai, Yiyun Chen, Weikai Yang
- Subjects: cs.IR; cs.AI
- Tags: Information Retrieval, Data Visualization
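- Summary: This paper develops an intent-aware infographic retrieval framework that enriches and refines users' free-form queries through an intent taxonomy. The method outperforms baselines in both retrieval quality and intent satisfaction, lowering the barrier to infographic authoring.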
[170] The ecosystem of machine learning competitions: Platforms, participants, and their impact on AI development
- arXiv: 2604.08001 (cross-listed)
- Authors: Ioannis Nasios
- Subjects: cs.LG; cs.AI; stat.ML
- Tags: ML Competition
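- Summary: This paper comprehensively analyzes major machine learning competition platforms such as Kaggle and Zindi, examining their workflows, evaluation methods, and reward structures. It finds that ML competitions sit at the intersection of academic research and industrial application, where they drive knowledge exchange and technical progress.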
[171] SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving
- arXiv: 2604.08008 (cross-listed)
- Authors: Felix Embacher, Jonas Uhrig, Marius Cordts, Markus Enzweiler
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Autonomous Driving, Information Retrieval
- Venue: CVPR 2026
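- Summary: This paper introduces SearchAD, a large-scale rare-image retrieval dataset with over 423k frames and 513k bounding-box annotations covering 90 rare categories. The dataset targets long-tail perception research in autonomous driving and supports both text-to-image and image-to-image retrieval.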
[172] From Universal to Individualized Actionability: Revisiting Personalization in Algorithmic Recourse
- arXiv: 2604.08030 (cross-listed)
- Authors: Lena Marie Budde, Ayan Majumdar, Richard Uth, Markus Langer, Isabel Valera
- Subjects: cs.LG; cs.AI
- Tags: Fairness, Decision Making
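- Summary: This paper formalizes personalization as individual actionability, with two dimensions: hard constraints and soft constraints. Extensive experiments show that individual actionability constraints significantly reduce the plausibility and effectiveness of recourse recommendations. They can also reveal disparities across sociodemographic groups.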
[173] PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation
- arXiv: 2604.08037 (cross-listed)
- Authors: Soumya Mazumdar, Vineet Kumar Rakesh, Tapas Samanta
- Subjects: cs.CR; cs.AI; cs.CV; cs.LG
- Tags: Federated Learning, Diffusion Model, Privacy
- Code: code
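- Summary: This paper proposes PrivFedTalk, a privacy-aware federated framework for personalized talking-head generation. It combines conditional latent diffusion with parameter-efficient identity adapters. Identity-stable federated aggregation and temporal denoising consistency regularization give stable federated optimization while protecting privacy.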
[174] LINE: LLM-based Iterative Neuron Explanations for Vision Models
- arXiv: 2604.08039 (cross-listed)
- Authors: Vladimir Zaigrajew, Michał Piechota, Gaspar Sekula, Przemysław Biecek
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Interpretability, Vision-Language Model
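- Summary: This paper proposes LINE, a training-free iterative method for open-vocabulary concept labeling of neurons in vision models. In a closed loop, a large language model and a text-to-image generator iteratively propose and refine concepts. LINE achieves AUC gains of up to 0.18 on ImageNet.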
[175] 3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience
- arXiv: 2604.08042 (cross-listed)
- Authors: Hongcan Xiao, Xinyue Xiao, Yilin Wang, Yue Zhang, Yonggang Qi
- Subjects: cs.CV; cs.AI
- Tags: 3D Vision, LLM Agent
- Venue: CVPR 2026
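- Summary: This paper proposes 3DrawAgent, a training-free, language-driven framework for 3D sketch generation. A relative experience optimization strategy and a pairwise comparison mechanism let the model improve its own spatial understanding and drawing quality without parameter updates.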
[176] Governed Capability Evolution for Embodied Agents: Safe Upgrade, Compatibility Checking, and Runtime Rollback for Embodied Capability Modules
- arXiv: 2604.08059 (cross-listed)
- Authors: Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
- Subjects: cs.RO; cs.AI
- Tags: Embodied AI, AI Safety
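- Summary: This paper formalizes governed capability evolution as a class of systems problems for embodied agents. It proposes a lifecycle-aware upgrade framework with four compatibility checks: interface, policy, behavior, and recovery. Experiments show that governed upgrades reduce unsafe activations to zero while maintaining task success rates.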
[177] From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants
- arXiv: 2604.08062 (cross-listed)
- Authors: Valdemar Danry, Javier Hernandez, Andrew Wilson, Pattie Maes, Judith Amores
- Subjects: cs.HC; cs.AI
- Tags: LLM Agent, Multimodal Learning, Human-Computer Interaction
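- Summary: This paper presents a gaze-aware multimodal LLM assistant. It uses egocentric video with gaze overlays to identify where users struggle while reading and then provides targeted help. In a controlled study (n=36), the gaze-aware assistant was compared with a text-only LLM assistant. It assessed users' reading behavior more accurately and in a more personalized way, significantly improved users' information recall, and made interactions more efficient.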
[178] AtlasOCR: Building the First Open-Source Darija OCR Model with Vision Language Models
- arXiv: 2604.08070 (cross-listed)
- Authors: Imane Momayiz, Soufiane Ait Elaouad, Abdeljalil Elmajjodi, Haitame Bouanane
- Subjects: cs.CV; cs.AI
- Tags: OCR, Vision-Language Model
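- Summary: This paper introduces AtlasOCR, the first open-source OCR model for Moroccan Arabic (Darija), built by fine-tuning a 3B-parameter vision-language model. The authors create a Darija-specific dataset and train efficiently with QLoRA and Unsloth. AtlasOCR achieves state-of-the-art performance on AtlasOCRBench and KITAB-Bench.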
[179] OV-Stitcher: A Global Context-Aware Framework for Training-Free Open-Vocabulary Semantic Segmentation
- arXiv: 2604.08110 (cross-listed)
- Authors: Seungjae Moon, Seunghyun Oh, Youngmin Ro
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Segmentation, Vision-Language Model
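- Summary: This paper proposes OV-Stitcher, a training-free framework for open-vocabulary semantic segmentation. Sliding-window methods cannot apply global attention, so OV-Stitcher stitches the fragmented sub-image features back together in the final encoder block. Across eight benchmarks it raises mIoU from 48.7 to 50.7 and yields more coherent context aggregation.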
[180] TADP-RME: A Trust-Adaptive Differential Privacy Framework for Enhancing Reliability of Data-Driven Systems
- arXiv: 2604.08113 (cross-listed)
- Authors: Labani Halder, Payel Sadhukhan, Sarbani Palit
- Subjects: cs.CR; cs.AI; cs.LG
- Tags: Privacy, Adversarial Robustness
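- Summary: This paper proposes TADP-RME, a framework that adaptively adjusts the privacy budget using an inverse trust score. It also applies reverse manifold embedding to disrupt local geometric relationships, which improves the reliability of data-driven systems while guaranteeing differential privacy. Experiments show that the method reduces attack success rate by up to 3.1% and outperforms existing methods on the privacy-utility trade-off.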
[181] Small Vision-Language Models are Smart Compressors for Long Video Understanding
- arXiv: 2604.08120 (cross-listed)
- Authors: Junjie Fei, Jun Chen, Zechun Liu, Yunyang Xiong, Chong Zhou, Wei Wen, Junlin Han, Mingchen Zhuge, Saksham Suri, Qi Qian, Shuming Liu, Lemeng Wu, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Chenchen Zhu
- Subjects: cs.CV; cs.AI; cs.CL; cs.LG
- Tags: Video Understanding, Vision-Language Model, Model Compression
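- Summary: This paper proposes Tempo, a framework that uses a small vision-language model as a local temporal compressor, condensing long videos for downstream understanding. It introduces Adaptive Token Allocation (ATA) for dynamic routing. Tempo scores 52.3 on LVBench, surpassing GPT-4o and Gemini 1.5 Pro, which shows the importance of intent-driven efficiency in long-video understanding.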
[182] Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
- arXiv: 2604.08121 (cross-listed)
- Authors: Luozheng Qin, Jia Gong, Qian Qiao, Tianjiao Li, Li Xu, Haoyu Pan, Chao Qu, Zhiyu Tan, Hao Li
- Subjects: cs.CV; cs.AI
- Tags: Video Generation, Video Understanding, Diffusion Model
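- Summary: This paper proposes Uni-ViGU, a framework that unifies video generation and understanding by extending a video generator as the foundation. It uses unified flow matching, a modality-driven MoE architecture, and a bidirectional training mechanism. Uni-ViGU achieves competitive performance on both video generation and understanding tasks.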
[183] LegoDiffusion: Micro-Serving Text-to-Image Diffusion Workflows
- arXiv: 2604.08123 (cross-listed)
- Authors: Lingyun Yang, Suyi Li, Tianyu Feng, Xiaoxiao Jiang, Zhipeng Di, Weiyi Lu, Kan Liu, Yinghao Yu, Tao Lan, Guodong Yang, Lin Qu, Liping Zhang, Wei Wang
- Subjects: cs.DC; cs.AI
- Tags: Text-to-Image, Diffusion Model, DNN Deployment
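- Summary: This paper proposes LegoDiffusion, a system that decomposes text-to-image diffusion workflows into loosely coupled model-execution nodes that can be managed and scheduled independently. This unlocks cluster-level optimizations, including per-model scaling, model sharing, and adaptive model parallelism. The system supports up to 3x higher request rates and 8x higher burst traffic.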
[184] Alloc-MoE: Budget-Aware Expert Activation Allocation for Efficient Mixture-of-Experts Inference
- arXiv: 2604.08133 (cross-listed)
- Authors: Baihui Liu, Kaiyuan Tian, Wei Wang, Zhaoning Zhang, Linbo Qiao, Dongsheng Li
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Mixture-of-Experts, LLM Inference
- Venue: ACL 2026
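- Summary: This paper proposes Alloc-MoE, a framework that minimizes performance degradation in Mixture-of-Experts inference through budget-aware expert-activation allocation at both the layer and token levels. It preserves model performance under a constrained activation budget. On DeepSeek-V2-Lite it achieves 1.15x prefill and 1.34x decode speedups.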
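As a rough illustration of token-level, budget-aware expert activation, the sketch below spends a fixed total number of expert activations across a batch of tokens. Each token first gets its top expert, and the remaining budget goes to the highest router scores across the whole batch. This greedy rule is an assumption for illustration, not Alloc-MoE's actual allocation algorithm. It also ignores the layer-level allocation the summary mentions.

```python
def allocate_expert_activations(gate_scores, budget, min_per_token=1):
    """Hypothetical greedy allocation of a global expert-activation budget.

    gate_scores: one list of router scores per token (one score per expert).
    Returns, for each token, the sorted indices of the experts it activates."""
    n_tokens = len(gate_scores)
    if budget < n_tokens * min_per_token:
        raise ValueError("budget cannot cover the per-token minimum")
    # Step 1: every token keeps its top `min_per_token` experts.
    chosen = []
    for scores in gate_scores:
        ranked = sorted(range(len(scores)), key=lambda e: scores[e], reverse=True)
        chosen.append(set(ranked[:min_per_token]))
    # Step 2: spend what is left on the globally highest remaining scores.
    remaining = budget - n_tokens * min_per_token
    pool = sorted(
        ((s, t, e) for t, scores in enumerate(gate_scores)
         for e, s in enumerate(scores) if e not in chosen[t]),
        reverse=True,
    )
    for _, t, e in pool[:remaining]:
        chosen[t].add(e)
    return [sorted(c) for c in chosen]
```

With this rule, tokens whose router is uncertain (flat scores) receive extra experts, while confidently routed tokens stay at one.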
[185] Multimodal Reasoning with LLM for Encrypted Traffic Interpretation: A Benchmark
- arXiv: 2604.08140 (cross-listed)
- Authors: Longgang Zhang, Xiaowei Fu, Fuxiang Huang, Lei Zhang
- Subjects: cs.CR; cs.AI; cs.MM; cs.NI
- Tags: Cybersecurity, Multimodal Learning, LLM Reasoning
- Code: code
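- Summary: This paper proposes the BGTD benchmark and the mmTraffic framework for multimodal reasoning in encrypted traffic interpretation, combining raw bytes with structured expert annotations. The framework bridges physical traffic encoding and semantic interpretation. It generates high-fidelity, readable, evidence-grounded traffic interpretation reports.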
[186] Face-D²CL: Multi-Domain Synergistic Representation with Dual Continual Learning for Facial DeepFake Detection
- arXiv: 2604.08159 (cross-listed)
- Authors: Yushuo Zhang, Yu Cheng, Yongkang Hu, Jiuan Zhou, Jiawei Chen, Yuan Xie, Zhaoxia Yin
- Subjects: cs.CV; cs.AI
- Tags: Deepfake Detection, Continual Learning
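- Summary: This paper proposes Face-D²CL, a facial deepfake detection framework that combines multi-domain synergistic representation with a dual continual learning mechanism (EWC and OGC). It does not rely on replaying historical data. The method reduces the average detection error rate by 60.7% and improves detection AUC on unseen forgery domains by 7.9%.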
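Of the two continual-learning mechanisms named in the summary, EWC has a standard form. It adds a quadratic penalty that anchors the weights that mattered for earlier domains, weighted by a diagonal Fisher-information estimate. A minimal sketch of that penalty follows. OGC is not sketched, and the code does not show how this framework combines the two. The function names are illustrative.

```python
def diagonal_fisher(per_sample_grads):
    """Empirical diagonal Fisher: mean of squared per-sample gradients."""
    n = len(per_sample_grads)
    dim = len(per_sample_grads[0])
    return [sum(g[i] ** 2 for g in per_sample_grads) / n for i in range(dim)]


def ewc_penalty(params, anchor_params, fisher, lam):
    """EWC regularizer: (lam / 2) * sum_i F_i * (theta_i - theta*_i) ** 2.

    anchor_params: weights after training on the previous forgery domain.
    fisher: diagonal Fisher estimated on that domain's data."""
    return 0.5 * lam * sum(
        f * (p - a) ** 2 for p, a, f in zip(params, anchor_params, fisher)
    )


def continual_loss(task_loss, params, anchor_params, fisher, lam):
    """Loss on the new domain plus the EWC penalty; no replayed samples are needed."""
    return task_loss + ewc_penalty(params, anchor_params, fisher, lam)
```

Only the anchor weights and Fisher diagonal are stored between domains, which is what makes EWC compatible with a no-replay setting.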
[187] ViVa: A Video-Generative Value Model for Robot Reinforcement Learning
- arXiv: 2604.08168 (cross-listed)
- Authors: Jindi Lv, Hao Li, Jie Li, Yifei Nie, Fankun Kong, Yang Wang, Xiaofeng Wang, Zheng Zhu, Chaojun Ni, Qiuping Deng, Hengtao Li, Jiancheng Lv, Guan Huang
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Reinforcement Learning, Video Generation
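- Summary: This paper proposes ViVa, a video-generative value model that repurposes a pretrained video generator for value estimation in robot reinforcement learning. ViVa jointly predicts future proprioception and a scalar value for the current state. It uses the generator's spatiotemporal priors to improve value estimation on long-horizon tasks.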
[188] OceanMAE: A Foundation Model for Ocean Remote Sensing
- arXiv: 2604.08171 (cross-listed)
- Authors: Viola-Joanna Stamer, Panagiotis Agrafiotis, Behnood Rasti, Begüm Demir
- Subjects: cs.CV; cs.AI
- Tags: Remote Sensing, Self-Supervised Learning
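- Summary: This paper proposes OceanMAE, an ocean-specific masked autoencoder. During self-supervised learning, it integrates multispectral Sentinel-2 observations with physically meaningful ocean descriptors. The model achieves significant improvements on marine segmentation and bathymetry estimation, demonstrating the value of physics-informed, domain-aligned pretraining.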
[189] AT-ADD: All-Type Audio Deepfake Detection Challenge Evaluation Plan
- arXiv: 2604.08184 (cross-listed)
- Authors: Yuankun Xie, Haonan Cheng, Jiayi Zhou, Xiaoxuan Guo, Tao Wang, Jian Liu, Weiqiang Wang, Ruibo Fu, Xiaopeng Wang, Hengyan Huang, Xiaoying Huang, Long Ye, Guangtao Zhai
- Subjects: cs.SD; cs.AI
- Tags: Deepfake Detection, Speech Processing, ML Competition
- Venue: ACM Multimedia 2026 Workshop
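- Summary: This paper presents the ACM Multimedia 2026 All-Type Audio Deepfake Detection (AT-ADD) Grand Challenge, which aims to bridge academic evaluation and real-world multimedia forensics. The challenge has two tracks: robust speech deepfake detection and all-type audio deepfake detection. It is intended to drive more robust and generalizable audio forensics techniques.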
[190] MedVR: Annotation-Free Medical Visual Reasoning via Agentic Reinforcement Learning
- arXiv: 2604.08203 (cross-listed)
- Authors: Zheng Jiang, Heng Guo, Chengyu Fang, Changchen Xiao, Xinyang Hu, Lifeng Sun, Minfeng Xu
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Reinforcement Learning, Vision-Language Model
- Venue: ICLR 2026
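- Summary: This paper proposes MedVR, a reinforcement learning framework for annotation-free visual reasoning in medical VLMs. It relies on entropy-guided visual re-grounding and consensus-based credit assignment. MedVR achieves state-of-the-art performance on multiple medical VQA benchmarks and learns visually grounded reasoning without human annotation of intermediate steps.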
[191] EditCaption: Human-Aligned Instruction Synthesis for Image Editing via Supervised Fine-Tuning and Direct Preference Optimization
- arXiv: 2604.08213 (cross-listed)
- Authors: Xiangyuan Wang, Honghao Cai, Yunhao Bai, Tianze Zhou, Haohua Chen, Yao Hu, Xu Tang, Yibo Chen, Wei Zhu
- Subjects: cs.CV; cs.AI
- Tags: Image Editing, Instruction Tuning
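- Summary: This paper proposes EditCaption, a two-stage post-training pipeline for synthesizing image-editing instructions. It targets problems that existing VLMs have in the image-pair setting: directional inconsistency, viewpoint ambiguity, and insufficient fine-grained attribute description. After SFT and DPO training, the model reduces critical errors from 47.75% to 23% and raises the correctness rate from 41.75% to 66%.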
[192] HyperMem: Hypergraph Memory for Long-Term Conversations
- arXiv: 2604.08256 (cross-listed)
- Authors: Juwei Yue, Chuanrui Hu, Jiawei Sheng, Zuyi Zhou, Wenyuan Zhang, Tingwen Liu, Li Guo, Yafeng Deng
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, Memory Architecture, Knowledge Graph
- Venue: ACL 2026
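- Summary: This paper proposes HyperMem, a hypergraph-based hierarchical memory architecture that captures high-order associations in long-term conversations. It organizes memory into three levels (topics, episodes, and facts) and uses hyperedges to group related content into coherent units. HyperMem reaches 92.73% LLM-as-a-judge accuracy on the LoCoMo benchmark.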
[193] Behavior-Aware Item Modeling via Dynamic Procedural Solution Representations for Knowledge Tracing
- arXiv: 2604.08260 (cross-listed)
- Authors: Jun Seo, Sangwon Ryu, Heejin Do, Hyounghun Kim, Gary Geunbae Lee
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Tracing, Representation Learning, LLM Reasoning
- Venue: ACL Findings 2026
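- Summary: This paper proposes BAIM, a framework that enriches item representations in knowledge tracing with dynamic information about the problem-solving procedure. A reasoning language model decomposes each solution process into four stages. A context-conditioning mechanism then accommodates learner heterogeneity.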
[194] DBMF: A Dual-Branch Multimodal Framework for Out-of-Distribution Detection
- arXiv: 2604.08261 (cross-listed)
- Authors: Jiangbei Yue, Sharib Ali
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Multimodal Learning, Anomaly Detection
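- Summary: This paper proposes a dual-branch multimodal framework for out-of-distribution detection in medical images. A text-image branch and a vision branch complement each other, making full use of multimodal information to identify out-of-distribution samples.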
[195] QARIMA: A Quantum Approach To Classical Time Series Analysis
- arXiv: 2604.08277 (cross-listed)
- Authors: Nishikanta Mohanty, Bikash K. Behera, Badshah Mukherjee, Pravat Dash
- Subjects: cs.AI; cs.LG
- Tags: Quantum Computing, Time Series Forecasting
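- Summary: This paper proposes a quantum-inspired ARIMA method for time series analysis. It combines quantum-assisted lag discovery with variational quantum circuits for parameter estimation and weak-lag optimization.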
[196] Distributed Multi-Layer Editing for Rule-Level Knowledge in Large Language Models
- arXiv: 2604.08284 (cross-listed)
- Authors: Yating Wang, Wenting Zhao, Yaqi Zhao, Yongshun Gong, Yilong Yin, Haoliang Sun
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Editing, LLM Reasoning
- Code: code
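- Summary: This paper studies how to edit rule-level knowledge in large language models. It proposes DMLE, which makes rule editing more effective through distributed multi-layer updates of formulas, descriptions, and instances.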
[197] CIAO - Code In Architecture Out - Automated Software Architecture Documentation with Large Language Models
- arXiv: 2604.08293 (cross-listed)
- Authors: Marco De Luca, Tiziano Santilli, Domenico Amalfitano, Anna Rita Fasolino, Patrizio Pelliccione
- Subjects: cs.SE; cs.AI
- Tags: Software Documentation, Code Generation
- Venue: ICSA 2026
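- Summary: This paper proposes CIAO, a process that uses large language models to generate system-level architecture documentation from GitHub repositories automatically, conforming to ISO/IEC/IEEE 42010. The evaluation shows that the generated documentation is of relatively high value and inexpensive to produce.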
[198] Can Vision Language Models Judge Action Quality? An Empirical Evaluation
- arXiv: 2604.08294 (cross-listed)
- Authors: Miguel Monte e Freitas, Rui Henriques, Ricardo Rei, Pedro Henrique Martins
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Vision-Language Model, LLM Evaluation, Video Understanding
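- Summary: This paper comprehensively evaluates vision-language models on action quality assessment. Current models perform only slightly better than random guessing. The evaluation also reveals a bias toward predicting correct execution and a sensitivity to superficial linguistic framing.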
[199] SeLaR: Selective Latent Reasoning in Large Language Models
- arXiv: 2604.08299 (cross-listed)
- Authors: Renyu Fu, Guibo Luo
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, LLM Inference
- Venue: ACL 2026
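- Summary: This paper proposes SeLaR, a framework that uses an entropy-gating mechanism to activate soft embeddings only at low-confidence steps. Entropy-aware contrastive regularization encourages exploration of multiple latent reasoning paths. SeLaR improves reasoning without any training.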
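The entropy gate can be sketched as a per-step switch between a hard token embedding and a soft, probability-weighted embedding. This is a hypothetical illustration. The threshold value, the use of the argmax token on confident steps, and the function names are assumptions. The entropy-aware contrastive regularization is not shown.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def next_input_embedding(probs, embeddings, threshold=0.5):
    """Hypothetical entropy gate for one decoding step.

    Confident step (entropy <= threshold): feed the argmax token's embedding.
    Uncertain step: feed the probability-weighted average of all embeddings,
    a "soft" embedding that keeps several candidate tokens in play.
    Returns (embedding, used_soft)."""
    if entropy(probs) <= threshold:
        best = max(range(len(probs)), key=probs.__getitem__)
        return list(embeddings[best]), False
    dim = len(embeddings[0])
    soft = [sum(p * emb[d] for p, emb in zip(probs, embeddings)) for d in range(dim)]
    return soft, True
```

Gating keeps ordinary discrete decoding on easy steps and confines latent (soft) reasoning to the steps where the model is unsure.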
[200] DMax: Aggressive Parallel Decoding for dLLMs
- arXiv: 2604.08302 (cross-listed)
- Authors: Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang
- Subjects: cs.LG; cs.AI
- Tags: LLM Inference, Diffusion Model
- Code: code
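- Summary: This paper proposes DMax, a paradigm for efficient parallel decoding in diffusion language models. It reformulates decoding as a progressive self-refinement process that moves from mask embeddings to token embeddings.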
[201] Securing Retrieval-Augmented Generation: A Taxonomy of Attacks, Defenses, and Future Directions
- arXiv: 2604.08304 (cross-listed)
- Authors: Yuming Xu, Mingtao Zhang, Zhuohan Ge, Haoyang Li, Nicole Hu, Jason Chen Zhang, Qing Li, Lei Chen
- Subjects: cs.CR; cs.AI
- Tags: RAG, LLM Security, Cybersecurity
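- Summary: This paper systematically categorizes security risks in RAG systems and draws operational boundaries around the external-knowledge access pipeline. It abstracts the RAG workflow into six stages and organizes attacks and defenses around them.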
[202] HistDiT: A Structure-Aware Latent Conditional Diffusion Model for High-Fidelity Virtual Staining in Histopathology
- arXiv: 2604.08305 (cross-listed)
- Authors: Aasim Bin Saleem, Amr Ahmed, Ardhendu Behera, Hafeezullah Amin, Iman Yi Liao, Mahmoud Khattab, Pan Jia Wern, Haslina Makmur
- Subjects: eess.IV; cs.AI; cs.CV; cs.ET; cs.LG; q-bio.QM
- Tags: Medical AI, Diffusion Model, Image Synthesis
- Venue: ICPR 2026
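- Summary: This paper proposes HistDiT, a latent conditional diffusion Transformer architecture for virtual staining in histopathology. A dual-stream conditioning strategy balances spatial constraints against semantic phenotype guidance, yielding high-fidelity virtual staining.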
[203] Multi-Modal Learning meets Genetic Programming: Analyzing Alignment in Latent Space Optimization
- arXiv: 2604.08324 (cross-listed)
- Authors: Benjamin Léger, Kazem Meidani, Christian Gagné
- Subjects: cs.NE; cs.AI
- Tags: Symbolic Regression, Multimodal Learning, Representation Learning
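- Summary: This paper studies the cross-modal alignment of SNIP, a multimodal symbolic regression model. It finds that the alignment is too coarse to guide optimization search in the symbolic space effectively.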
[204] Lost in the Hype: Revealing and Dissecting the Performance Degradation of Medical Multimodal Large Language Models in Image Classification
- arXiv: 2604.08333 (cross-listed)
- Authors: Xun Zhu, Fanbin Mo, Xi Chen, Kaili Zheng, Shaoshuai Yang, Yiming Shi, Jian Gao, Miao Li, Ji Wu
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Medical AI, Vision-Language Model, LLM Evaluation
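- Summary: This paper uses feature probing to analyze why medical multimodal large language models degrade on image classification. It identifies four failure modes, including limited visual representation quality and fidelity loss in the connector projection.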
[205] Dead Weights, Live Signals: Feedforward Graphs of Frozen Language Models
- arXiv: 2604.08335 (cross-listed)
- Authors: Marcus Armstrong, Navid Ayoobi, Arjun Mukherjee
- Subjects: cs.LG; cs.AI
- Tags: LLM Interoperability, Multi-Agent System, Knowledge Distillation
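- Summary: This paper proposes a feedforward graph architecture that uses heterogeneous frozen large language models as computation nodes. The nodes communicate through a shared continuous latent space and learned linear projections. The architecture achieves strong performance with only a small number of trainable parameters.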
[206] InstAP: Instance-Aware Vision-Language Pre-Train for Spatial-Temporal Understanding
- arXiv: 2604.08337 (cross-listed)
- Authors: Ashutosh Kumar, Rajat Saini, Jingjing Pan, Mustafa Erdogan, Mingfang Zhang, Betty Le Dem, Norimasa Kobori, Quan Kong
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Pre-training, Video Understanding
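- Summary: This paper proposes InstAP, an instance-aware pretraining framework that improves the spatial-temporal understanding of vision-language models. It jointly optimizes global vision-text alignment and fine-grained, instance-level contrastive alignment.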
[207] PokeGym: A Visually-Driven Long-Horizon Benchmark for Vision-Language Models
- arXiv: 2604.08340 (cross-listed)
- Authors: Ruizhi Zhang, Ye Huang, Yuangang Pan, Chuanfu Shen, Zhilin Liu, Ting Xie, Wen Li, Lixin Duan
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Evaluation, Embodied AI
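- Summary: This paper proposes PokeGym, a benchmark that evaluates vision-language models in the complex 3D environments of Pokémon games. It finds that recovering from physical deadlocks, rather than high-level planning, is the main bottleneck for current models.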
[208] Scalable Neural Decoders for Practical Fault-Tolerant Quantum Computation
- arXiv: 2604.08358 (cross-listed)
- Authors: Andi Gu, J. Pablo Bonilla Ataides, Mikhail D. Lukin, Susanne F. Yelin
- Subjects: cs.AI; cs.LG
- Tags: Quantum Computing, Fault Tolerance
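- Summary: This paper proposes a convolutional neural network decoder for quantum error correction that exploits the geometric structure of quantum error-correcting codes. At smaller code sizes, it reaches the logical error rates required by large-scale fault-tolerant algorithms.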
[209] Scaling-Aware Data Selection for End-to-End Autonomous Driving Systems
- arXiv: 2604.08366 (cross-listed)
- Authors: Tolga Dimlioglu, Nadine Chang, Maying Shen, Rafid Mahmood, Jose M. Alvarez
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Autonomous Driving, Data Selection
- Venue: CVPR 2026
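- Summary: This paper proposes MOSAIC, a data selection framework that uses neural scaling laws to optimize the training-data mixture for end-to-end autonomous driving planners. MOSAIC outperforms baselines on the evaluation metrics while requiring up to 80% less data.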
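A common way to use neural scaling laws for data selection has two steps. First, fit a power law to each data domain. Then allocate new samples wherever the fitted curve predicts the largest loss reduction. The sketch below does exactly that, under several stated assumptions: each domain follows a pure power law `a * n**(-b)`, losses add across domains, and the domain names are hypothetical. It is not the MOSAIC algorithm.

```python
import math


def fit_power_law(sizes, losses):
    """Fit loss(n) ~= a * n ** (-b) by least squares on log(loss) vs. log(n)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(v) for v in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (a, b)


def predicted_loss(a, b, n):
    return a * n ** (-b)


def greedy_mixture(domains, budget, chunk):
    """Hypothetical allocation: repeatedly give `chunk` new samples to the domain
    whose fitted curve predicts the largest loss reduction.

    domains maps name -> (a, b, samples_already_used)."""
    extra = {name: 0 for name in domains}
    for _ in range(budget // chunk):
        def gain(name):
            a, b, n0 = domains[name]
            n = n0 + extra[name]
            return predicted_loss(a, b, n) - predicted_loss(a, b, n + chunk)
        extra[max(domains, key=gain)] += chunk
    return extra
```

Under these assumptions, domains that are far from saturation (few samples, steep curve) absorb most of the budget, which is one way a scaling-aware selector can reach a target with less total data.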
[210] A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection
- arXiv: 2604.08381 (cross-listed)
- Authors: Wenxian Wang, Xiaohu Luo, Junfeng Hao, Xiaoming Gu, Xingshu Chen, Zhu Wang, Haizhou Wang
- Subjects: cs.CL; cs.AI
- Tags: Data Synthesis, Sentiment Analysis
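- Summary: This paper proposes a data augmentation framework that combines GANs and LLMs for Chinese sarcasm detection. It builds the SinaSarc dataset by dynamically modeling users' linguistic patterns. The method achieves the best F1 scores on both the sarcastic and non-sarcastic classes.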
[211] TASU2: Controllable CTC Simulation for Alignment and Low-Resource Adaptation of Speech LLMs
- arXiv: 2604.08384 (cross-listed)
- Authors: Jing Peng, Chenghao Wang, Yi Yang, Lirong Qian, Junjie Li, Yu Xi, Shuai Wang, Kai Yu
- Subjects: eess.AS; cs.AI
- Tags: Speech Processing, LLM Inference
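- Summary: This paper proposes TASU2, a framework that uses controllable CTC simulation for cross-modal alignment and low-resource adaptation of speech LLMs. It enables curriculum design without TTS and significantly improves speech recognition performance.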
[212] Phantasia: Context-Adaptive Backdoors in Vision Language Models
- arXiv: 2604.08395 (cross-listed)
- Authors: Nam Duong Tran, Phi Le Nguyen
- Subjects: cs.CV; cs.AI
- Tags: LLM Security, Vision-Language Model, Backdoor Detection
- Venue: CVPR 2026 Findings
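- Summary: This paper proposes Phantasia, an attack on vision-language models that generates contextually coherent malicious responses. It achieves the best attack success rates while remaining stealthy.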
[213] ADAPTive Input Training for Many-to-One Pre-Training on Time-Series Classification
- arXiv: 2604.08398 (cross-listed)
- Authors: Paul Quinlan, Qingguo Li, Xiaodan Zhu
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Pre-training, Self-Supervised Learning
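- Summary: This paper proposes ADAPT, a pretraining paradigm that efficiently aligns the physical properties of different time series data and supports mixed-batch pretraining. ADAPT achieves state-of-the-art performance on 162 time series classification datasets.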
[214] Zero-shot Multivariate Time Series Forecasting Using Tabular Prior Fitted Networks
- arXiv: 2604.08400 (cross-listed)
- Authors: Mayuka Jayawardhana, Nihal Sharma, Kazem Meidani, Bayan Bruss, Tom Goldstein, Doron Bergman
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Zero-Shot Learning, Tabular Learning
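- Summary: This paper proposes a framework for multivariate time series forecasting with tabular foundation models, reformulating the problem as a scalar regression task. It enables zero-shot forecasting while capturing interactions between variables.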
[215] Selective Attention System (SAS): Device-Addressed Speech Detection for Real-Time On-Device Voice AI
- arXiv: 2604.08412 (cross-listed)
- Authors: David Joohun Kim, Daniyal Anjum, Bonny Banerjee, Omar Abbasi
- Subjects: cs.SD; cs.AI; eess.AS
- Tags: Speech Processing, Edge Computing, Multimodal Learning
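- Summary: This paper proposes SAS, a system that models device-directed speech detection as a sequential routing problem and uses interaction history to make decisions. SAS achieves high-accuracy detection on edge devices with latency below 150 ms.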
[216] Exploring Temporal Representation in Neural Processes for Multimodal Action Prediction
- arXiv: 2604.08418 (cross-listed)
- Authors: Marco Gabriele Fedozzi, Yukie Nagai, Francesco Rea, Alessandra Sciutti
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Multimodal Learning
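- Summary: This paper studies conditional neural processes for multimodal action prediction in robotics and proposes the DMBN-PTE architecture to improve temporal representation learning. The work lays the groundwork for robots that predict actions autonomously.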
[217] Synthetic Data for any Differentiable Target
- arXiv: 2604.08423 (cross-listed)
- Authors: Tristan Thrush, Sung Min Park, Herman Brunborg, Luke Bailey, Marcel Roed, Neil Band, Christopher Potts, Tatsunori Hashimoto
- Subjects: cs.CL; cs.AI; cs.LG; stat.ML
- Tags: Data Synthesis, Reinforcement Learning
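- Summary: This paper proposes Dataset Policy Gradient (DPG), a method that optimizes a synthetic-data generator to precisely control properties of a target model. It shows that specific patterns can be embedded into model weights through synthetic data alone.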
[218] KV Cache Offloading for Context-Intensive Tasks
- arXiv: 2604.08426 (cross-listed)
- Authors: Andrey Bocharnikov, Ivan Ermakov, Denis Kuznedelev, Vyacheslav Zhdanovskiy, Yegor Yershov
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: LLM Inference, Long Context
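- Summary: This paper studies how KV cache offloading performs on context-intensive tasks and introduces the Text2JSON benchmark. Existing methods degrade significantly on these tasks, and the authors propose strategies to improve them.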
[219] Small-scale photonic Kolmogorov-Arnold networks using standard telecom nonlinear modules
- arXiv: 2604.08432 (cross-listed)
- Authors: Luca Nogueira Calçado, Sergei K. Turitsyn, Egor Manuylovich
- Subjects: cs.AI
- Tags: Photonic Computing, Neuromorphic Computing
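- Summary: This paper proposes small-scale photonic Kolmogorov-Arnold networks (SSP-KANs) built from standard telecom components, achieving all-optical nonlinear inference. The network reaches 98.4% accuracy on classification tasks.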
[220] HST-HGN: Heterogeneous Spatial-Temporal Hypergraph Networks with Bidirectional State Space Models for Global Fatigue Assessment
- arXiv: 2604.08435 (cross-listed)
- Authors: Changdao Chen
- Subjects: cs.CV; cs.AI
- Tags: Autonomous Driving, Graph Neural Network, Video Understanding
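- Summary: This paper proposes HST-HGN, a network that combines heterogeneous spatial-temporal hypergraphs with bidirectional state space models for driver fatigue assessment. It achieves state-of-the-art performance while staying computationally efficient, which makes it suitable for edge deployment.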
[221] CrashSight: A Phase-Aware, Infrastructure-Centric Video Benchmark for Traffic Crash Scene Understanding and Reasoning
- arXiv: 2604.08457 (cross-listed)
- Authors: Rui Gan, Junyi Ma, Pei Li, Xingyou Yang, Kai Chen, Sikai Chen, Bin Ran
- Subjects: cs.CV; cs.AI; cs.RO
- Tags: Autonomous Driving, Vision-Language Model, Video Understanding
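- Summary: This paper releases CrashSight, a benchmark of 250 traffic-crash videos and 13K question-answer pairs. It evaluates how well vision-language models understand safety-critical scenarios. Existing models fall short in temporal and causal reasoning.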
[222] A Machine Learning Framework for Turbofan Health Estimation via Inverse Problem Formulation
- arXiv: 2604.08460 (cross-listed)
- Authors: Milad Leyli-Abadi, Lucas Thil, Sebastien Razakarivony, Guillaume Doquet, Jesse Read
- Subjects: cs.LG; cs.AI
- Tags: Predictive Maintenance, Self-Supervised Learning
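- Summary: This paper formulates turbofan engine health estimation as an inverse problem. It introduces a new dataset that includes maintenance events and establishes a benchmark on it. The study compares stationary models, non-stationary models, Bayesian filters, and other approaches.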
[223] OVS-DINO: Open-Vocabulary Segmentation via Structure-Aligned SAM-DINO with Language Guidance
- arXiv: 2604.08461 (cross-listed)
- Authors: Haoxi Zeng, Qiankun Liu, Yi Bin, Haiyue Zhang, Yujuan Ding, Guoqing Wang, Deqiang Ouyang, Heng Tao Shen
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Vision-Language Model, Zero-Shot Learning
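- Summary: This paper proposes OVS-DINO, an open-vocabulary segmentation framework that uses SAM's structural priors to activate DINO's boundary awareness. It achieves state-of-the-art performance on multiple weakly supervised benchmarks.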
[224] TTVS: Boosting Self-Exploring Reinforcement Learning via Test-time Variational Synthesis
- arXiv: 2604.08468 (cross-listed)
- Authors: Sikai Bai, Haoxi Li, Jie Zhang, Yongjiang Liu, Song Guo
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Reinforcement Learning
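- Summary: This paper proposes TTVS, a framework in which large reasoning models learn autonomously from unlabeled test queries via test-time variational synthesis. Without any labeled data, it outperforms supervised RL methods.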
[225] Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization
- arXiv: 2604.08476 (cross-listed)
- Authors: Sai Srinivas Kancheti, Aditya Kanade, Rohit Sinha, Vineeth N Balasubramanian, Tanuja Ganu
- Subjects: cs.CV; cs.AI
- Tags: LLM Reasoning, Vision-Language Model, Multimodal Learning
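- Summary: This paper proposes Faithful GRPO (FGRPO), a GRPO variant that improves visual spatial reasoning in multimodal language models. It enforces consistency and visual-grounding constraints through Lagrangian dual ascent. FGRPO reduces the inconsistency rate of chain-of-thought reasoning from 24.5% to 1.7% while raising visual-grounding scores and final-answer accuracy.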
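Lagrangian dual ascent turns a hard constraint into an adaptive penalty. The policy maximizes task reward minus a multiplier times the constraint violation, and the multiplier rises whenever violations exceed an allowed rate. A minimal sketch follows. The violation signal (for example, a per-rollout flag for an inconsistency between reasoning and answer), the target rate, and the learning rate are all assumptions, not FGRPO's settings.

```python
def lagrangian_reward(task_reward, violation, lam):
    """Reward for the policy update: task reward minus lambda-weighted violation."""
    return task_reward - lam * violation


def dual_ascent_step(lam, violation_rate, target_rate, lr):
    """Projected dual ascent on the multiplier.

    Lambda grows while the batch violation rate exceeds the allowed target,
    shrinks once the constraint is satisfied, and is clipped at zero."""
    return max(0.0, lam + lr * (violation_rate - target_rate))


# Usage: lambda tightens while violations stay above a 5% target.
lam = 0.0
for rate in [0.25, 0.20, 0.10]:
    lam = dual_ascent_step(lam, rate, target_rate=0.05, lr=1.0)
```

Unlike a fixed penalty weight, the multiplier adapts by itself: it stops growing once the policy meets the constraint, so task reward is not sacrificed more than necessary.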
[226] PIArena: A Platform for Prompt Injection Evaluation
- arXiv: 2604.08499 (cross-listed)
- Authors: Runpeng Geng, Chenlong Yin, Yanting Wang, Ying Chen, Jinyuan Jia
- Subjects: cs.CR; cs.AI; cs.CL; cs.LG
- Tags: LLM Security, Prompt Engineering
- Venue: ACL 2026
- Code: code
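- Summary: This paper introduces PIArena, a unified platform for evaluating prompt injection attacks that integrates the latest attacks and defenses and tests them across diverse benchmarks. The evaluation exposes key limitations of existing defenses. They generalize poorly across tasks, are vulnerable to adaptive attacks, and face fundamental difficulty when the injected task is aligned with the target task.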
[227] Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
- arXiv: 2604.08502 (cross-listed)
- Authors: Kabilan Elangovan, Daniel Ting
- Subjects: cs.CV; cs.AI
- Tags: Interpretability, Medical AI
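- Summary: This paper proposes C-Score (Consistency Score), an annotation-free metric for the intra-class consistency of visual explanations produced by CAM methods in medical image classification. The metric detects model instability earlier than standard classification metrics. It also supports architecture-specific recommendations for clinical deployment based on explanation quality.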
[228] Differentially Private Language Generation and Identification in the Limit
- arXiv: 2604.08504 (cross-listed)
- Authors: Anay Mehrotra, Grigoris Velegkas, Xifan Yu, Felix Zhou
- Subjects: stat.ML; cs.AI; cs.CL; cs.DS; cs.LG
- Tags: Privacy, Formal Methods
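- Summary: This paper studies language generation and identification in the limit under differential privacy constraints. For countable collections of languages, privacy imposes no qualitative cost on generation but poses a fundamental obstacle to identification. This reveals a new separation between generation and identification under privacy constraints.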
[229] ClawBench: Can AI Agents Complete Everyday Online Tasks?
- arXiv: 2604.08523 (cross-listed)
- Authors: Yuxuan Zhang, Yubo Wang, Yipeng Zhu, Penghui Du, Junwen Miao, Xuan Lu, Wendong Xu, Yunzhuo Hao, Songcheng Cai, Xiaochen Wang, Huaisong Zhang, Xian Wu, Yi Lu, Minyi Lei, Kai Zou, Huifeng Yin, Ping Nie, Liang Chen, Dongfu Jiang, Wenhu Chen, Kelsey R. Allen
- Subjects: cs.CL; cs.AI
- Tags: LLM Agent, LLM Evaluation
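- Summary: This paper introduces ClawBench, an evaluation framework of 153 everyday online tasks spanning 144 real platforms and 15 categories. It assesses whether AI agents can complete practical, real-life tasks. Current frontier models complete only a small fraction of tasks; Claude Sonnet 4.6, for example, reaches a success rate of only 33.3%.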
[230] What Drives Representation Steering? A Mechanistic Case Study on Steering Refusal
- arXiv: 2604.08524 (cross-listed)
- Authors: Stephen Cheng, Sarah Wiegreffe, Dinesh Manocha
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: LLM Alignment, Interpretability
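- Summary: Through a case study on refusal, this paper analyzes the causal mechanisms of steering vectors in LLMs. Steering vectors act on the attention mechanism mainly through the OV circuit. They can be sparsified by 90-99% while retaining most of their effect.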
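Representation steering adds a fixed vector to a layer's hidden state. The sparsification result in the summary amounts to zeroing most of that vector's coordinates. The sketch below keeps only the largest-magnitude entries and applies the result with a scale `alpha`. Magnitude-based top-k selection is an assumption about how the sparsification is done, and the function names are illustrative.

```python
def sparsify_top_k(vector, keep_fraction):
    """Zero all but the largest-magnitude `keep_fraction` of coordinates."""
    k = max(1, round(len(vector) * keep_fraction))
    ranked = sorted(range(len(vector)), key=lambda i: abs(vector[i]), reverse=True)
    keep = set(ranked[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(vector)]


def apply_steering(hidden, steer, alpha=1.0):
    """Activation steering at one layer and position: h' = h + alpha * v."""
    return [h + alpha * s for h, s in zip(hidden, steer)]
```

Here 90% sparsity corresponds to `keep_fraction=0.1`, and 99% sparsity to `keep_fraction=0.01`.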
[231] PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents
- arXiv: 2604.08529 (cross-listed)
- Authors: Zhiyuan Wang, Erzhen Hu, Mark Rucker, Laura E. Barnes
- Subjects: cs.HC; cs.AI
- Tags: LLM Agent, Human-Computer Interaction
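- Summary: This paper proposes PSI, a shared-state architecture that turns independently generated AI modules into coherent instruments. Each module publishes its current state and write-back capabilities to a shared personal context bus. This enables reasoning across modules and synchronized actions across interfaces, so AI-generated personal software becomes a coherent personal computing environment rather than a set of isolated applications.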
[232] RewardFlow: Generate Images by Optimizing What You Reward
- arXiv: 2604.08536 (cross-listed)
- Authors: Onkar Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Text-to-Image, Image Editing
- Venue: CVPR 2026
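- Summary: This paper introduces RewardFlow, an inversion-free framework that uses multi-reward Langevin dynamics to guide pretrained diffusion and flow-matching models at inference time. It unifies multiple differentiable rewards and adds a prompt-aware adaptive strategy. RewardFlow achieves state-of-the-art performance on image editing and compositional generation benchmarks.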
[233] OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
- arXiv: 2604.08539 (cross-listed)
- Authors: Wenbo Hu, Xin Chen, Yan Gao-Tian, Yihe Deng, Nanyun Peng, Kai-Wei Chang
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Vision-Language Model, LLM Reasoning, Multimodal Learning
- Code: code
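- Summary: This paper proposes Gaussian GRPO (G²RPO), a new RL training objective that forces the advantage distribution to converge to a standard normal distribution, ensuring gradient fairness across tasks. It is combined with response-length shaping and entropy shaping. The resulting model outperforms both open-source and proprietary models across 18 diverse benchmarks.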
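Standard GRPO normalizes rewards within a sampled group by their mean and standard deviation. That fixes the scale of the advantages but not their shape. One way to force a standard-normal shape is to map each reward's rank to a Gaussian quantile, and the sketch below contrasts the two approaches. The rank-to-quantile mapping is a hypothetical stand-in, not necessarily how G²RPO is defined.

```python
from statistics import NormalDist


def grpo_advantages(rewards, eps=1e-8):
    """Standard GRPO: z-score the rewards within one group of sampled responses."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


def gaussian_rank_advantages(rewards):
    """Hypothetical Gaussianized advantages: replace each reward by the
    standard-normal quantile of its rank; tied rewards share the mean quantile."""
    n = len(rewards)
    order = sorted(range(n), key=lambda i: rewards[i])
    quantiles = [NormalDist().inv_cdf((r + 0.5) / n) for r in range(n)]
    adv = [0.0] * n
    start = 0
    while start < n:
        end = start
        while end + 1 < n and rewards[order[end + 1]] == rewards[order[start]]:
            end += 1
        shared = sum(quantiles[start:end + 1]) / (end - start + 1)
        for r in range(start, end + 1):
            adv[order[r]] = shared
        start = end + 1
    return adv
```

Both variants center the advantages at zero. The rank-based variant additionally makes their spread independent of each task's reward scale, which is the intuition behind equalizing gradient contributions across tasks.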
[234] AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation
- arXiv: 2604.08540 (cross-listed)
- Authors: Ziwei Zhou, Zeyuan Lai, Rui Wang, Yifan Yang, Zhen Xing, Yuqing Yang, Qi Dai, Lili Qiu, Chong Luo
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Video Generation, Multimodal Learning, LLM Evaluation
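- Summary: This paper introduces AVGen-Bench, a task-driven benchmark for text-to-audio-video generation with high-quality prompts across 11 real-world categories. Its evaluation framework combines lightweight expert models with multimodal large language models. It reveals significant weaknesses in current models' text rendering, speech coherence, physical reasoning, and pitch control.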
[235] Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts
- arXiv: 2604.08541 (cross-listed)
- Authors: Haolei Xu, Haiwen Hong, Hongxing Li, Rui Zhou, Yang Zhang, Longtao Huang, Hui Xue, Yongliang Shen, Weiming Lu, Yueting Zhuang
- Subjects: cs.CV; cs.AI; cs.CL
- Tags: Vision-Language Model, Mixture-of-Experts, LLM Reasoning
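- Summary: This paper identifies a "seeing but not thinking" phenomenon in multimodal MoE models: they perceive image content accurately but fail in the reasoning that follows. The authors propose a routing-distraction hypothesis and a routing-guided intervention method. The intervention improves performance by up to 3.17% on complex visual reasoning tasks.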
[236] SIM1: Physics-Aligned Simulator as Zero-Shot Data Scaler in Deformable Worlds
- arXiv: 2604.08544 (cross-listed)
- Authors: Yunsong Zhou, Hangxu Liu, Xuekun Jiang, Xing Shen, Yuanzhen Zhou, Hui Wang, Baole Fang, Yang Tian, Mulin Yu, Qiaojun Yu, Li Ma, Hengjie Li, Hanqing Wang, Jia Zeng, Jiangmiao Pang
- Subjects: cs.RO; cs.AI; cs.CV
- Tags: Robotics, Sim-to-Real, Embodied AI
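- Summary: This paper introduces SIM1, a physics-aligned real-to-sim-to-real data engine for robotic manipulation of deformable objects. It combines scene digitization, dynamics calibration, and diffusion-based trajectory generation. Policies trained purely on its synthetic data reach a 1:15 equivalence ratio with real data and achieve a 90% zero-shot success rate in real-world deployment.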
[237] Act Wisely: Cultivating Meta-Cognitive Tool Use in Agentic Multimodal Models
- arXiv: 2604.08545 (cross-listed)
- Authors: Shilin Yan, Jintao Tong, Hongwei Xue, Xiaojun Tang, Yangyang Wang, Kunyu Shi, Guannan Zhang, Ruixuan Li, Yixiong Zou
- Subjects: cs.CV; cs.AI
- Tags: LLM Agent, Vision-Language Model, LLM Reasoning
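- Summary: This paper proposes HDPO, a framework that reframes tool efficiency from a competing scalar objective into a strictly conditional objective, addressing meta-cognitive tool use in agentic multimodal models. The method reduces tool calls by orders of magnitude while improving reasoning accuracy, and it naturally induces a cognitive curriculum in which the agent first masters task solving and then optimizes its self-reliance.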
Replaced submissions (123)
[238] A Unified Framework for Evaluating and Enhancing the Transparency of Explainable AI Methods via Perturbation-Gradient Consensus Attribution
- arXiv: 2412.03884 (replaced)
- Authors: Md. Ariful Islam, Md Abrar Jahin, M. F. Mridha, Nilanjan Dey
- Subjects: cs.AI
- Tags: Interpretability, Computer Vision
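- Summary: This paper proposes a multi-criteria evaluation framework that formalizes the fidelity, interpretability, robustness, fairness, and completeness of XAI methods, and introduces Perturbation-Gradient Consensus Attribution (PGCA). In evaluations across five domains, PGCA achieves the best fidelity, interpretability, and fairness.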
[239] FedIFL: A federated cross-domain diagnostic framework for motor-driven systems with inconsistent fault modes
- arXiv: 2505.07315 (replaced)
- Authors: Zexiao Wang, Yankai Wang, Xiaoqiang Liao, Xinguo Ming, Weiming Shen
- Subjects: cs.AI; cs.LG
- Tags: Federated Learning, Fault Tolerance, Anomaly Detection
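- Summary: This paper proposes FedIFL, a federated cross-domain diagnostic framework for motor-driven systems with inconsistent fault modes. Through prototype contrastive learning and a feature-disentanglement mechanism, the aggregated model generalizes well over the global label space.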
[240] Iterative Formalization and Planning in Partially Observable Environments
- arXiv: 2505.13126 (replaced)
- Authors: Liancheng Gong, Wang Zhu, Jesse Thomason, Li Zhang
- Subjects: cs.AI; cs.CL
- Tags: Automated Planning, LLM Reasoning
- Venue: ACL 2026
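- Summary: This paper proposes PDDLego, a framework that iteratively formalizes partially observable environments into PDDL representations for planning. Requiring no fine-tuning, in-context examples, or trajectories, it decomposes the environment and goal into fully observable fragments, improving planning success rates and remaining robust as problem complexity grows.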
[241] "Don't Do That!": Guiding Embodied Systems through Large Language Model-based Constraint Generation
- arXiv: 2506.04500 (replaced)
- Authors: Amin Seffo, Aladin Djuhera, Masataro Asai, Holger Boche
- Subjects: cs.AI; cs.RO
- Tags: Embodied AI, LLM Agent, Robotics
- Venue: ICLR 2026 Workshop
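- Summary: This paper proposes STPR, a framework that uses large language models to translate natural-language constraints into executable Python functions for robot navigation planning. By relying on code generation, it sidesteps complex reasoning and hallucination, achieving full compliance with a variety of constraints in simulated environments.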
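To make the idea concrete, here is a hypothetical example of the kind of executable constraint an LLM might emit for an instruction such as "don't go within 1 meter of the stove", and how a planner could use it to accept or reject candidate plans. The function names, the `(x, y)` waypoint format, and the stove coordinates are all assumptions; STPR's actual interface is not described in the summary.

```python
import math
from typing import Callable, Optional

Point = tuple[float, float]
Constraint = Callable[[Point], bool]

# Hypothetical stove location in map coordinates (meters).
STOVE: Point = (2.0, 3.0)


def avoid_stove(p: Point) -> bool:
    """Hypothetical LLM output for: Don't go within 1 meter of the stove."""
    return math.dist(p, STOVE) >= 1.0


def first_compliant_plan(plans: list[list[Point]],
                         constraints: list[Constraint]) -> Optional[list[Point]]:
    """Return the first candidate plan whose waypoints all satisfy every constraint."""
    for plan in plans:
        if all(c(p) for p in plan for c in constraints):
            return plan
    return None
```

This sketch only checks waypoints, not the straight-line segments between them; a real planner would also need to check the segments or sample densely along the path.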
[242] Scaling Implicit Fields via Hypernetwork-Driven Multiscale Coordinate Transformations
- arXiv: 2511.18387 (replaced)
- Authors: Plein Versace
- Subjects: cs.AI
- Tags: Representation Learning, 3D Vision, Neural Operator
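- Summary: This paper proposes HC-INR, a new implicit neural representation that overcomes representational bottlenecks by using a hypernetwork to learn signal-adaptive coordinate transformations. It achieves higher reconstruction fidelity on image fitting, shape reconstruction, and neural radiance field approximation.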
[243] The Specification Trap: Why Static Value Alignment Alone Cannot Produce Robust Alignment
- arXiv: 2512.03048 (replaced)
- Authors: Austin Spizzirri
- Subjects: cs.AI; cs.CY; cs.LG; cs.MA
- Tags: LLM Alignment, AI Safety
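- Summary: This paper argues that static, content-oriented approaches to AI value alignment cannot achieve robust alignment as capabilities scale, distributions shift, and autonomy increases. The author analyzes the structural limitations of methods such as RLHF and Constitutional AI from a philosophical perspective and calls for a shift toward open-ended normative approaches.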
[244] Towards a Science of Scaling Agent Systems
- arXiv: 2512.08296 (replaced)
- Authors: Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, A. Ali Heydari, Yao Yan, Zhihan Zhang, Yuchen Zhuang, Yun Liu, Mark Malhotra, Paul Pu Liang, Hae Won Park, Yuzhe Yang, Xuhai Xu, Yilun Du, Shwetak Patel, Tim Althoff, Daniel McDuff, Xin Liu
- Subjects: cs.AI
- Tags: LLM Agent, Multi-Agent System, LLM Evaluation
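- Summary: This paper introduces quantitative scaling principles for agent systems, using controlled experiments over 260 configurations to study how performance varies with coordination strategy, model capability, and task properties. It finds that the match between coordination and task structure determines whether collaboration succeeds, and that architectural choices substantially affect performance.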
[245] Know Thy Enemy: Securing LLMs Against Prompt Injection via Diverse Data Synthesis and Instruction-Level Chain-of-Thought Learning
- arXiv: 2601.04666 (replaced)
- Authors: Zhiyuan Chang, Mingyang Li, Yuekai Huang, Ziyou Jiang, Xiaojun Jia, Qian Xiong, Junjie Wang, Zhaoyang Li, Qing Wang
- Subjects: cs.AI; cs.CR
- Tags: LLM Security, Data Synthesis, Prompt Engineering
- Venue: ACL 2026 Findings
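- Summary: This paper proposes InstruCoT, which defends against prompt injection attacks through diverse data synthesis and instruction-level chain-of-thought fine-tuning. Experiments show that it significantly outperforms baselines along three dimensions: behavioral deviation, privacy leakage, and harmful output.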
[246] WebArbiter: A Principle-Guided Reasoning Process Reward Model for Web Agents
- arXiv: 2601.21872 (replaced)
- Authors: Yao Zhang, Shijie Tang, Zeyu Li, Zhen Han, Volker Tresp
- Subjects: cs.AI
- Tags: LLM Agent, Reinforcement Learning, GUI Automation
- Venue: ICLR 2026
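- Summary: This paper proposes WebArbiter, a reasoning-first process reward model for web agents that formulates reward modeling as a text-generation task. A two-stage training pipeline enables principle-guided reasoning, and the method substantially outperforms existing baselines on web navigation tasks.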
[247] Accordion-Thinking: Self-Regulated Step Summaries for Efficient and Readable LLM Reasoning
- arXiv: 2602.03249 (replaced)
- Authors: Zhicheng Yang, Zhijiang Guo, Yinya Huang, Yongxin Wang, Wenlei Shi, Yiwei Wang, Xiaodan Liang, Jing Tang
- Subjects: cs.AI; cs.LG
- Tags: LLM Reasoning, LLM Inference, Prompt Engineering
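- Summary: This paper proposes Accordion-Thinking, a framework that lets LLMs self-regulate the granularity of their reasoning steps through dynamic summaries. Trained with reinforcement learning, it achieves a threefold throughput improvement while maintaining accuracy, effectively compressing the reasoning context.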
[248] Lang2Act: Fine-Grained Visual Reasoning through Self-Emergent Linguistic Toolchains
- arXiv: 2602.13235 (replaced)
- Authors: Yuqi Xiong, Chunyi Peng, Zhipeng Xu, Zhenghao Liu, Zulong Chen, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu
- Subjects: cs.AI; cs.CV
- Tags: Vision-Language Model, Reinforcement Learning, Multimodal Learning
- Code: code
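- Summary: This paper proposes Lang2Act, which achieves fine-grained visual perception and reasoning through self-emergent linguistic toolchains. It uses a two-stage reinforcement learning framework that enables vision-language models to autonomously build a reusable linguistic toolbox that strengthens their perception.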
[249] MemPO: Self-Memory Policy Optimization for Long-Horizon Agents
- arXiv: 2603.00680 (replaced)
- Authors: Ruoran Li, Xinghua Zhang, Haiyang Yu, Shitong Duan, Xiang Li, Wenxin Xiang, Chonghua Liao, Xudong Guo, Yongbin Li, Jinli Suo
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture, Reinforcement Learning
- Code: code
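- Summary: This paper proposes MemPO, an algorithm that lets agents autonomously summarize and manage their memory while interacting with the environment. Using credit assignment based on memory effectiveness, it substantially reduces token consumption while maintaining task performance.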
[250] Exploring Plan Space through Conversation: An Agentic Framework for LLM-Mediated Explanations in Planning
- arXiv: 2603.02070 (replaced)
- Authors: Guilhem Fouilhé, Rebecca Eifler, Antonin Poché, Sylvie Thiébaux, Nicholas Asher
- Subjects: cs.AI; cs.CL; cs.HC; cs.MA
- Tags: LLM Agent, Automated Planning, Multi-Agent System
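- Summary: This paper proposes a multi-agent LLM architecture for interactive explanation generation in planning problems. The framework supports natural-language interaction tailored to the user and context, helping users understand potential solutions and increasing their trust in the system.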
[251] Timer-S1: A Billion-Scale Time Series Foundation Model with Serial Scaling
- arXiv: 2603.04791 (replaced)
- Authors: Yong Liu, Xingjian Su, Shiyu Wang, Haoran Zhang, Haixuan Liu, Yuxuan Wang, Zhou Ye, Yang Xiang, Jianmin Wang, Mingsheng Long
- Subjects: cs.AI
- Tags: Time Series Forecasting, Mixture-of-Experts, Pre-training
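- Summary: This paper introduces Timer-S1, an 8.3-billion-parameter time series foundation model that uses a Mixture-of-Experts architecture and a serial token prediction objective. The model is scaled serially along three dimensions (model architecture, dataset, and training pipeline) and achieves state-of-the-art performance on large-scale forecasting benchmarks.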
[252] FactorEngine: A Program-level Knowledge-Infused Factor Mining Framework for Quantitative Investment
- arXiv: 2603.16365 (replaced)
- Authors: Qinhong Lin, Ruitao Feng, Yinglun Feng, Zhenxin Huang, Yukun Chen, Zhongliang Yang, Linna Zhou, Binjie Fei, Jiaqi Liu, Yu Li
- Subjects: cs.AI
- Tags: Quantitative Finance, LLM Agent, Code Generation
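- Summary: This paper proposes FactorEngine, a framework that recasts alpha-factor mining for quantitative investment as program-level code discovery. It combines LLM-guided directional search with a multi-agent extraction-and-verification pipeline to produce executable, auditable factor programs.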
[253] Cognitive Mismatch in Multimodal Large Language Models for Discrete Symbol Understanding
- arXiv: 2603.18472 (replaced)
- Authors: Yinghui Li, Jiayi Kuang, Peng Xing, Daixian Liu, Yongheng Zhang, Junnan Dong, Shu-Yu Guo, Yangning Li, Qingyu Zhou, Wenhao Jiang, Hai-Tao Zheng, Ying Shen, Liang Lin, Philip S. Yu
- Subjects: cs.AI; cs.CV
- Tags: Vision-Language Model, LLM Evaluation, Multimodal Learning
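- Summary: This paper builds a multi-domain benchmark spanning language, culture, mathematics, physics, and chemistry to evaluate how well multimodal large language models understand discrete visual symbols. It finds a cognitive mismatch: models perform poorly on basic symbol recognition yet relatively well on complex reasoning tasks.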
[254] Let the Agent Steer: Closed-Loop Ranking Optimization via Influence Exchange
- arXiv: 2603.27765 (replaced)
- Authors: Yin Cheng, Liao Zhou, Xiyu Liang, Dihao Luo, Tewei Lee, Kailun Zheng, Weiwei Zhang, Mingchen Cai, Jian Dong, Andy Zhang
- Subjects: cs.AI
- Tags: Recommender System, LLM Agent, Decision Making
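- Summary: This paper proposes Sortify, the first fully autonomous LLM-driven ranking-optimization agent, deployed in a large-scale production recommender system. The agent reframes ranking optimization as a continuous influence-exchange problem and closes the full loop from diagnosis to parameter deployment.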
[255] Seeing with You: Perception-Reasoning Coevolution for Multimodal Reasoning
- arXiv: 2603.28618 (replaced)
- Authors: Ziqi Miao, Haonan Jia, Lijun Li, Chen Qian, Yuan Xiong, Wenting Yan, Jing Shao
- Subjects: cs.AI
- Tags: Vision-Language Model, Reinforcement Learning, Multimodal Learning
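- Summary: This paper proposes PRCO, a dual-role reinforcement learning framework that addresses the perception bottleneck in multimodal reasoning through perception-reasoning coevolution. It designs role-specific reward signals for an observer role and a solver role, yielding significant gains on multiple benchmarks.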
[256] I must delete the evidence: AI Agents Explicitly Cover up Fraud and Violent Crime
- arXiv: 2604.02500 (replaced)
- Authors: Thomas Rivasseau
- Subjects: cs.AI
- Tags: AI Safety, LLM Agent, LLM Alignment
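- Summary: This paper shows that AI agents may actively choose to cover up evidence of fraud and violent crime in a company's interest. In experiments on 16 recent large language models, some models resist the wrongdoing, but many choose to assist the criminal activity.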
[257] Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
- arXiv: 2604.05333 (replaced)
- Authors: Dawei Liu, Zongxia Li, Hongyang Du, Xiyang Wu, Shihang Gui, Yongbei Kuang, Lichao Sun
- Subjects: cs.AI
- Tags: LLM Agent, Multi-Agent System
- Code: code
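- Summary: This paper proposes Graph of Skills (GoS), an inference-time structural retrieval layer for large-scale skill libraries. By building an executable skill graph and retrieving dependency-aware skill bundles, GoS improves average reward by 43.6% while reducing input tokens by 37.8%.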
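The summary does not detail how GoS builds its skill graph. As a generic illustration of dependency-aware retrieval, the sketch below takes the skills matched by a first-stage retriever (here a naive keyword match, which is an assumption), expands them with their transitive prerequisites, and returns the bundle in an order where every skill follows its dependencies. The skill names and graph are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical skill graph: each skill maps to the skills it depends on.
SKILL_DEPS: dict[str, set[str]] = {
    "read_csv": set(),
    "clean_columns": {"read_csv"},
    "plot_histogram": {"clean_columns"},
    "fit_regression": {"clean_columns"},
    "write_report": {"plot_histogram", "fit_regression"},
    "send_email": set(),
}


def seed_skills(query: str) -> set[str]:
    """Naive first-stage retrieval (assumption): skills sharing a name word with the query."""
    words = set(query.lower().split())
    return {s for s in SKILL_DEPS if set(s.split("_")) & words}


def dependency_closure(seeds: set[str]) -> set[str]:
    """Add every transitive prerequisite of the seed skills."""
    bundle: set[str] = set()
    stack = list(seeds)
    while stack:
        skill = stack.pop()
        if skill not in bundle:
            bundle.add(skill)
            stack.extend(SKILL_DEPS[skill])
    return bundle


def retrieve_bundle(query: str) -> list[str]:
    """Return the skill bundle ordered so that dependencies come first."""
    bundle = dependency_closure(seed_skills(query))
    subgraph = {s: SKILL_DEPS[s] & bundle for s in bundle}
    return list(TopologicalSorter(subgraph).static_order())
```

For the query "plot a histogram of the data", only `plot_histogram` matches directly, but the returned bundle also contains `read_csv` and `clean_columns`, ordered before it; unrelated skills such as `send_email` stay out of the context.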
[258] SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation
- arXiv: 2604.05489 (replaced)
- Authors: Chengyi Yang, Pengzhen Li, Jiayin Qi, Aimin Zhou, Ji Wu, Ji Liu
- Subjects: cs.AI; cs.MA
- Tags: Video Generation, Multi-Agent System, Prompt Engineering
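- Summary: This paper proposes SCMAPR, a prompt-refinement framework for complex-scenario text-to-video generation. It coordinates multiple specialized agents for scenario routing, strategy-conditioned refinement, and semantic verification, substantially improving text-video alignment on multiple benchmarks.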
[259] TurboAgent: An LLM-Driven Autonomous Multi-Agent Framework for Turbomachinery Aerodynamic Design
- arXiv: 2604.06747 (replaced)
- Authors: Juan Du, Yueteng Wu, Pan Zhao, Yuze Liu, Min Zhang, Xiaobin Xu, Xinglong Zhang
- Subjects: cs.AI
- Tags: LLM Agent, Multi-Agent System, Scientific Computing
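- Summary: This paper proposes TurboAgent, an LLM-driven multi-agent framework for autonomous optimization of turbomachinery aerodynamic design. It turns traditional trial-and-error design into a data-driven collaborative workflow, providing an end-to-end closed loop from natural-language requirements to the final design.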
[260] How Much LLM Does a Self-Revising Agent Actually Need?
- arXiv: 2604.07236 (replaced)
- Authors: Sungwoo Jung, Seonil Son
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, LLM Reasoning
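- Summary: This paper uses a declarative reflective runtime protocol to separate the contributions of the LLM from those of explicit agent structure. It finds that explicit world-model planning substantially improves performance, whereas LLM-based revision changes results only marginally, offering a methodological contribution to understanding where agent capabilities come from.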
[261] Why we need an AI-resilient society
- arXiv: 1912.08786 (replaced)
- Authors: Thomas Bartz-Beielstein
- Subjects: cs.CY; cs.AI
- Tags: AI Ethics, AI Safety
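- Summary: This report applies forensic-psychology profiling methods to characterize AI along nine traits, exposing problems such as hallucination, bias, and cognitive atrophy. It proposes a three-pillar framework of cognitive sovereignty, measurable control, and partial autonomy for building an AI-resilient society.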
[262] Tractable Uncertainty-Aware Meta-Learning
- arXiv: 2210.01881 (replaced)
- Authors: Young-Jin Park, Cesar Almecija, Apoorva Sharma, Navid Azizan
- Subjects: cs.LG; cs.AI
- Tags: Meta-Learning, Uncertainty Estimation
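- Summary: This paper proposes LUMA, a meta-learning method for regression that makes probabilistic predictions, detects out-of-distribution data, and handles multimodal task distributions. It performs Bayesian inference on linearized neural networks for principled uncertainty estimation.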
[263] An Automated Survey of Generative Artificial Intelligence: Large Language Models, Architectures, Protocols, and Applications
- arXiv: 2306.02781 (replaced)
- Authors: Eduardo C. Garrido-Merchán, Álvaro López López
- Subjects: cs.LG; cs.AI
- Tags: LLM Evaluation, RAG
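- Summary: This automated survey gives a comprehensive overview of generative AI and large language models as of early 2026, covering frontier model architectures, deployment protocols, and real-world applications in fifteen industries. The survey was generated by Claude Opus 4.6 under human supervision.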
[264] Improving Image Coding for Machines through Optimizing Encoder via Auxiliary Loss
- arXiv: 2402.08267 (replaced)
- Authors: Kei Iino, Shunsuke Akamatsu, Hiroshi Watanabe, Shohei Enomoto, Akira Sakamoto, Takeharu Eda
- Subjects: cs.CV; cs.AI
- Tags: Object Detection, Image Segmentation, Model Compression
- Venue: ICIP 2024
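- Summary: This paper proposes a new training method for image coding for machines that applies an auxiliary loss to the encoder to improve recognition capability and rate-distortion performance. It achieves Bjontegaard Delta rate improvements of 27.7% on object detection and 20.3% on semantic segmentation.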
[265] MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data
- arXiv: 2406.10521 (replaced)
- Authors: Yaobin Ling, Xiaoqian Jiang, Yejin Kim
- Subjects: cs.LG; cs.AI
- Tags: Data Synthesis, Multi-Agent System, Tabular Learning
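- Summary: This paper proposes MALLM-GAN, a framework that uses multi-agent LLMs to emulate a GAN architecture for synthesizing tabular data. By treating the data-generation process as contextual information and using an LLM as the optimizer, it substantially improves synthetic data quality in small-sample settings.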
[266] NaviSplit: Dynamic Multi-Branch Split DNNs for Efficient Distributed Autonomous Navigation
- arXiv: 2406.13086 (replaced)
- Authors: Timothy K Johnsen, Ian Harshbarger, Zixia Xia, Marco Levorato
- Subjects: cs.RO; cs.AI
- Tags: Autonomous Driving, DNN Deployment, Edge Computing
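- Summary: This paper proposes NaviSplit, a lightweight, distributed, dynamic multi-branch neural network framework for autonomous drone navigation. A neural gate dynamically selects the head model, reducing the data transmission rate by 95% while maintaining navigation accuracy.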
[267] NaviSlim: Adaptive Context-Aware Navigation and Sensing via Dynamic Slimmable Networks
- arXiv: 2407.01563 (replaced)
- Authors: Tim Johnsen, Marco Levorato
- Subjects: cs.RO; cs.AI; cs.LG
- Tags: Autonomous Driving, Model Compression, Edge Computing
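- Summary: This paper proposes NaviSlim, a class of neural navigation models that adapt their computation and sensing resources to the environmental context. Its gated slimmable network architecture dynamically selects slimming factors to optimize execution time and energy consumption.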
[268] Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms
- arXiv: 2407.04183 (replaced)
- Authors: Joshua Ashkinaze, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak, Eric Gilbert
- Subjects: cs.CL; cs.AI; cs.CY; cs.HC
- Tags: LLM Evaluation, Fairness
- Venue: ICWSM 2026
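- Summary: This paper evaluates how well LLMs detect and correct biased edits under Wikipedia's Neutral Point of View policy. It finds that LLMs perform poorly at bias detection but better at generation, although the way they apply the rules may diverge from that of community experts.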
[269] Causal Discovery in Linear Models with Unobserved Variables and Measurement Error
- arXiv: 2407.19426 (replaced)
- Authors: Yuqin Yang, Mohamed Nafea, Negar Kiyavash, Kun Zhang, AmirEmad Ghassami
- Subjects: cs.LG; cs.AI; stat.ML
- Tags: Causal Inference
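- Summary: This paper studies causal structure learning in linear systems with both unobserved common causes and measurement error. The authors introduce the LV-SEM-ME model and develop recovery algorithms that characterize identifiability and the equivalence classes.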
[270] A systematic framework for generating novel experimental hypotheses from language models
- arXiv: 2408.05086 (replaced)
- Authors: Kanishka Misra, Najoung Kim
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Cognitive Science
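- Summary: This paper proposes a systematic framework that uses LLMs to simulate the outcomes of experiments that have not yet been run in order to generate novel research hypotheses. Applied to child language development, the framework yields testable hypotheses about the acquisition of dative verbs.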
[271] AdaProb: Efficient Machine Unlearning via Adaptive Probability
- arXiv: 2411.02622 (replaced)
- Authors: Zihao Zhao, Yuchen Yang, Anjalie Field, Yinzhi Cao
- Subjects: cs.LG; cs.AI
- Tags: Machine Unlearning, Privacy
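- Summary: This paper proposes AdaProb, an efficient machine unlearning method based on adaptive probability. It replaces the network's output probabilities with pseudo-probabilities, improving forgetting error by more than 20% while cutting computation time to less than half.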
[272] DMin: Scalable Training Data Influence Estimation for Diffusion Models
- arXiv: 2412.08637 (replaced)
- Authors: Huawei Lin, Yingjie Lao, Weijie Zhao
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Diffusion Model, Interpretability
- Venue: CVPR 2026
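- Summary: This paper proposes DMin, a scalable framework for estimating training data influence in diffusion models. Using efficient gradient compression, it reduces storage requirements from hundreds of terabytes to the megabyte range, enabling influence estimation for billion-parameter diffusion models for the first time.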
[273] MM-MoralBench: A MultiModal Moral Evaluation Benchmark for Large Vision-Language Models
- arXiv: 2412.20718 (replaced)
- Authors: Bei Yan, Jie Zhang, Zhiyuan Chen, Shiguang Shan, Xilin Chen
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Evaluation, LLM Alignment
- Venue: Pattern Recognition
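- Summary: This paper presents MM-MoralBench, a multimodal moral evaluation benchmark grounded in Moral Foundations Theory, intended to fill the gap in evaluating vision-language models' morality in the visual modality. It builds multimodal scenarios by combining synthetic visual contexts with character dialogues and evaluates more than 20 models, finding significant moral-alignment biases and diminishing returns from scaling.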
[274] OpenGLT: A Comprehensive Benchmark of Graph Neural Networks for Graph-Level Tasks
- arXiv: 2501.00773 (replaced)
- Authors: Haoyang Li, Yuming Xu, Alexander Zhou, Yongqi Zhang, Jason Chen Zhang, Lei Chen, Qing Li
- Subjects: cs.LG; cs.AI; cs.DB
- Tags: Graph Neural Network, Graph Learning
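- Summary: This paper presents OpenGLT, a unified evaluation framework for graph neural networks (GNNs) on graph-level tasks, covering domains such as social networks, biology, and chemistry. Extensive experiments with 20 models on 26 datasets show that no single architecture dominates in both effectiveness and efficiency, and that graph topology characteristics can guide architecture selection.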
[275] SEM-CTRL: Semantically Controlled Decoding
- arXiv: 2503.01804 (replaced)
- Authors: Mohammad Albinhassan, Pranava Madhyastha, Alessandra Russo
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Inference, Neurosymbolic AI, LLM Reasoning
- Venue: TMLR 2026
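- Summary: This paper proposes SEM-CTRL, which enforces rich semantic and syntactic constraints during LLM decoding by combining token-level Monte Carlo tree search (MCTS) with logic-based Answer Set Grammars. Without fine-tuning, it guarantees output validity and enables small pretrained models to outperform larger ones on tasks such as compositional reasoning and planning.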
[276] Transforming the Voice of the Customer: Large Language Models for Identifying Customer Needs
- arXiv: 2503.01870 (replaced)
- Authors: Artem Timoshenko, Chengfeng Mao, John R. Hauser
- Subjects: cs.CL; cs.AI; cs.HC; econ.GN
- Tags: Instruction Tuning, Information Extraction
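- Summary: This paper studies how well large language models (LLMs) can automatically identify customer needs in voice-of-the-customer (VOC) applications, finding that LLMs with supervised fine-tuning (SFT) perform at least as well as professional analysts. The results indicate that SFT teaches the LLM professional phrasing conventions rather than mere memorization, allowing high-value insights to be discovered at scale.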
[277] RectifiedHR: Enable Efficient High-Resolution Synthesis via Energy Rectification
- arXiv: 2503.02537 (replaced)
- Authors: Zhen Yang, Guibao Shen, Minyang Li, Liang Hou, Mushui Liu, Luozhou Wang, Xin Tao, Ying-Cong Chen
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Image Synthesis
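- Summary: This paper proposes RectifiedHR, a training-free high-resolution image synthesis method that counters the performance degradation of diffusion models at high resolutions through a noise-refresh strategy and an energy-rectification mechanism. The method is efficient, compatible with diffusion-based techniques such as image editing and video synthesis, and substantially improves generation quality.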
[278] Through the Magnifying Glass: Adaptive Perception Magnification for Hallucination-Free VLM Decoding
- arXiv: 2503.10183 (replaced)
- Authors: Shunqi Mao, Chaoyi Zhang, Weidong Cai
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Hallucination, LLM Inference
- Venue: ACL 2026
- Code: code
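- Summary: This paper proposes Perception Magnifier (PM), a visual decoding method that iteratively isolates relevant visual tokens during decoding and magnifies the corresponding regions, increasing a vision-language model's attention to fine-grained visual details. Experiments show that it effectively mitigates visual hallucination while preserving the model's reasoning ability.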
[279] Beyond Final Code: A Process-Oriented Error Analysis of Software Development Agents in Real-World GitHub Scenarios
- arXiv: 2503.12374 (replaced)
- Authors: Zhi Chen, Wei Ma, Lingxiao Jiang
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Software Testing
- Venue: ICSE 2026
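- Summary: This paper presents an in-depth empirical study of software engineering agents on the SWE-Bench benchmark, analyzing their resolution-phase trajectories and test logs. It identifies the most common execution errors and their impact on resolution rates, uncovers three flaws in the SWE-Bench platform, and releases its dataset to support future research.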
[280] Towards Hierarchical Multi-Step Reward Models for Enhanced Reasoning in Large Language Models
- arXiv: 2503.13551 (replaced)
- Authors: Teng Wang, Zhangyi Jiang, Zhenqi He, Shenyang Tong, Wenhan Yang, Yanan Zheng, Zeyu Li, Zifan He, Hailei Gong, Zewen Ye, Shengjie Ma, Jianping Zhang
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Reinforcement Learning
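- Summary: This paper proposes a Hierarchical Reward Model (HRM) that evaluates both fine-grained and coarse-grained steps in LLM reasoning, addressing reward hacking in process reward models (PRMs). Combined with HNC, a lightweight data augmentation strategy, HRM shows stronger generalization and robustness on several mathematical reasoning datasets.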
[281] Splits! Flexible Sociocultural Linguistic Investigation at Scale
- arXiv: 2504.04640 (replaced)
- Authors: Eylon Caplan, Tania Chakraborty, Dan Goldwasser
- Subjects: cs.CL; cs.AI
- Tags: Linguistic Resource, Social Network Analysis
- Venue: ACL 2026
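- Summary: This paper builds Splits!, a dataset and methodological framework for large-scale, flexible sociolinguistic research. Using Reddit data partitioned by demographics and topic, it validates known sociolinguistic phenomena and proposes a scalable two-stage pipeline for screening candidate linguistic phenomena.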
[282] OpenClassGen: A Large-Scale Corpus of Real-World Python Classes for LLM Research
- arXiv: 2504.15564 (replaced)
- Authors: Musfiqur Rahman, SayedHassan Khatoonabadi, Emad Shihab
- Subjects: cs.SE; cs.AI; cs.LG
- Tags: Code Generation, Data Synthesis
- Venue: EASE 2026
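- Summary: This paper releases OpenClassGen, a large-scale corpus of more than 320,000 Python classes, addressing the limited scale of existing class-level code generation datasets. The dataset includes class skeletons and static code metrics; evaluation shows that LLM-generated code achieves high semantic similarity, but its functional correctness still needs improvement.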
[283] Distilling Specialized Orders for Visual Generation
- arXiv: 2504.17069 (replaced)
- Authors: Rishav Pramanik, Amin Sghaier, Masih Aminbeidokhti, Juan A. Rodriguez, Antoine Poupon, David Vazquez, Christopher Pal, Zhaozheng Yin, Marco Pedersoli
- Subjects: cs.CV; cs.AI
- Tags: Image Synthesis, Knowledge Distillation
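- Summary: This paper proposes an Ordered Autoregressive (OAR) generation method that uses a self-distillation pipeline to extract specialized generation orders from an any-order model. It improves generation quality while preserving the model's flexibility, supporting zero-shot image inpainting and outpainting without retraining.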
[284] ReCellTy: Domain-Specific Knowledge Graph Retrieval-Augmented LLMs Reasoning Workflow for Single-Cell Annotation
- arXiv: 2505.00017 (replaced)
- Authors: Dezheng Han, Yibin Jia, Ruxiao Chen, Wenjie Han, Shuaishuai Guo, Jianbo Wang
- Subjects: cs.CL; cs.AI; cs.DB; cs.LG
- Tags: RAG, Knowledge Graph, Medical AI
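- Summary: This paper proposes ReCellTy, an automated single-cell type annotation workflow that combines a domain knowledge graph with retrieval-augmented generation (RAG). By building a knowledge graph with biological information nodes and designing a multi-task reasoning pipeline, it substantially improves annotation accuracy and consistency with the logic of manual annotation.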
[285] Are Sparse Autoencoders Useful for Java Function Bug Detection?
- arXiv: 2505.10375 (replaced)
- Authors: Rui Melo, Claudia Mamede, Andre Catarino, Rui Abreu, Henrique Lopes Cardoso
- Subjects: cs.SE; cs.AI; cs.LG
- Tags: Software Testing, Interpretability, Cybersecurity
- Code: code
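- Summary: This paper explores using sparse autoencoders (SAEs) to detect bugs in Java functions from the internal representations of pretrained LLMs. Without any fine-tuning, SAE-derived features reach F1 scores of up to 89%, outperforming fine-tuned Transformer baselines.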
[286] One Shot Dominance: Knowledge Poisoning Attack on Retrieval-Augmented Generation Systems
- arXiv: 2505.11548 (replaced)
- Authors: Zhiyuan Chang, Mingyang Li, Xiaojun Jia, Junjie Wang, Yuekai Huang, Ziyou Jiang, Yang Liu, Qing Wang
- Subjects: cs.CR; cs.AI
- Tags: RAG, LLM Security, Adversarial Robustness
- Venue: EMNLP 2025 Findings
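- Summary: This paper presents AuthChain, a knowledge-poisoning attack on retrieval-augmented generation (RAG) systems that needs to poison only a single document to effectively attack complex multi-hop questions. Experiments show that it achieves significant attack success rates against a range of mainstream LLMs while remaining highly stealthy.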
[287] LiloDriver: A Lifelong Learning Framework for Closed-loop Motion Planning in Long-tail Autonomous Driving Scenarios
- arXiv: 2505.17209 (replaced)
- Authors: Huaiyuan Yao, Pengfei Li, Bu Jin, Yupeng Zheng, An Liu, Lisen Mu, Qing Su, Qian Zhang, Yilun Chen, Peng Li
- Subjects: cs.RO; cs.AI
- Tags: Autonomous Driving, LLM Agent, Continual Learning
- Code: code
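- Summary: This paper proposes LiloDriver, a lifelong learning framework for closed-loop motion planning in autonomous driving that combines large language models with a memory-augmented system. It adapts to long-tail scenarios without retraining and outperforms existing rule-based and learning-based methods on the nuPlan benchmark.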
[288] RQR3D: Reparametrizing the regression targets for BEV-based 3D object detection
- arXiv: 2505.17732 (replaced)
- Authors: Ozsel Kilinc, Cem Tarhan
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Object Detection, Autonomous Driving, 3D Vision
- Venue: CVPR 2026 Findings
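- Summary: This paper proposes RQR3D, a reparameterization for bird's-eye-view (BEV) 3D object detection that recasts rotated-box detection as keypoint regression, resolving the discontinuity of angle representations. Combined with a simplified radar-fusion backbone, it achieves state-of-the-art detection performance on the nuScenes dataset.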
[289] SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models
- arXiv: 2506.01062 (replaced)
- Authors: Thinh Pham, Nguyen Nguyen, Pratibha Zunjare, Weiyuan Chen, Yu-Min Tseng, Tu Vu
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: RAG, LLM Reasoning, LLM Evaluation
- Venue: ICLR 2026
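- Summary: This paper introduces SealQA, a new benchmark for evaluating search-augmented language models when search results are conflicting or noisy. The benchmark comes in three difficulty variants and reveals severe limitations of current frontier LLMs and agentic models in handling noisy search results and in long-context reasoning. It also finds that increasing test-time compute does not yield reliable gains and that models still struggle to identify relevant documents in multi-document reasoning.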
[290] SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
- arXiv: 2506.01979 (replaced)
- Authors: Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang
- Subjects: cs.DC; cs.AI
- Tags: LLM Inference, Optimization
- Venue: ICLR 2026
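- Summary: This paper proposes SpecBranch, a speculative decoding framework that accelerates LLM inference through branch parallelism. By introducing parallel speculative branches and adaptive draft lengths, it removes the serialized-execution limitation of existing speculative decoding methods. Experiments show that it substantially speeds up inference and reduces the number of rolled-back tokens.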
[291] Employing Deep Neural Operators for PDE control by decoupling training and optimization
- arXiv: 2506.04742 (replaced)
- Authors: Oliver G. S. Lundqvist, Fabricio Oliveira
- Subjects: math.OC; cs.AI
- Tags: Scientific Computing, Neural Operator, Optimization
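- Summary: This paper proposes a simplified approach to PDE-constrained control problems that uses neural operators (such as DeepONet) and decouples the control problem from training. After a physics-informed training phase, it performs unconstrained optimization and adapts to different tracking targets without retraining. Experiments show that it is competitive on nonlinear time-dependent problems and iterates faster.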
[292] Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
- arXiv: 2506.06975 (replaced)
- Authors: Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger
- Subjects: cs.CR; cs.AI; cs.CL
- Tags: LLM Evaluation, LLM Security, Adversarial Robustness
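- Summary: This paper proposes a rank-based uniformity test for verifying behavioral consistency between a black-box LLM API and a locally deployed authentic model. The test is accurate and query-efficient, and its query pattern is hard for adversarial providers to detect. Experiments show that, under a limited query budget, it outperforms existing methods at detecting model substitution.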
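The summary names the idea without details. A generic rank-based uniformity test works as follows: for each prompt, draw m samples of some scalar statistic from the trusted local model and one sample from the API; if both come from the same continuous distribution, the API sample's rank among the local samples is uniform on {0, ..., m}. The sketch below computes these ranks and a Pearson chi-square statistic against the uniform distribution. The scalar statistic, the samplers, and the Gaussian toy data are assumptions, and the paper's actual test statistic may differ.

```python
import random
from collections import Counter
from typing import Callable


def rank_of(api_value: float, local_values: list[float]) -> int:
    """Rank of the API sample: the number of local samples strictly below it."""
    return sum(v < api_value for v in local_values)


def uniformity_chi2(ranks: list[int], m: int) -> float:
    """Pearson chi-square statistic of the ranks against Uniform{0, ..., m}."""
    counts = Counter(ranks)
    expected = len(ranks) / (m + 1)
    return sum((counts.get(k, 0) - expected) ** 2 / expected for k in range(m + 1))


def audit(local_sampler: Callable[[], float], api_sampler: Callable[[], float],
          n_prompts: int = 500, m: int = 9) -> float:
    """One rank per prompt: m local samples versus one API sample."""
    ranks = []
    for _ in range(n_prompts):
        local = [local_sampler() for _ in range(m)]
        ranks.append(rank_of(api_sampler(), local))
    return uniformity_chi2(ranks, m)


def demo(shift: float, seed: int = 0) -> float:
    """Toy audit: the 'API' is the local Gaussian, optionally shifted (a swapped model)."""
    rng = random.Random(seed)
    return audit(lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(shift, 1.0))
```

With m = 9 the statistic has 9 degrees of freedom, so under the null hypothesis values above about 16.92 occur only 5% of the time; `demo(0.5)` lands far above that threshold, while `demo(0.0)` typically does not.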
[293] "Is This Really a Human Peer Supporter?": Misalignments Between Peer Supporters and Experts in LLM-Supported Interactions
- arXiv: 2506.09354 (replaced)
- Authors: Kellie Yu Hui Sim, Roy Ka-Wei Lee, Kenny Tsu Wei Choo
- Subjects: cs.HC; cs.AI
- Tags: Medical AI, LLM Evaluation, Human-Computer Interaction
- Venue: CSCW 2026
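- Summary: This paper evaluates an AI-supported peer support system that includes LLM-simulated help-seekers and real-time suggestions. Although peer supporters found the system helpful, experts identified critical safety issues in their responses, revealing misalignments in peer support practice. This underscores the need for standardized training and expert oversight when integrating AI into mental health care.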
[294] "I Said Things I Needed to Hear Myself": Peer Support as an Emotional, Organisational, and Sociotechnical Practice in Singapore
- arXiv: 2506.09362 (replaced)
- Authors: Kellie Yu Hui Sim, Kenny Tsu Wei Choo
- Subjects: cs.HC; cs.AI
- Tags: Medical AI, Human-Computer Interaction, Affective Computing
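- Summary: Through interviews with 20 peer supporters in Singapore, this paper examines their motivations, emotional labor, and the sociocultural dimensions of their practice across different settings. It proposes design directions for culturally responsive digital tools that strengthen rather than replace relational care, and offers insights into how AI can responsibly augment peer support.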
[295] BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook
- arXiv: 2506.12040 (replaced)
- Authors: Hao Gu, Lujun Li, Hao Wang, Lei Wang, Zheyu Wang, Bei Liu, Jiacheng Liu, Qiyuan Zhu, Sirui Han, Yike Guo
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Model Compression, LLM Inference
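- Summary: This paper proposes BTC-LLM, a sub-1-bit LLM quantization framework based on binary pattern clustering and learnable transformations. By eliminating sparse masks, it achieves efficient hardware compatibility while maintaining high performance at extremely low bit rates. Experiments show state-of-the-art compression results and significant inference speedups on models such as LLaMA.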
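One way a binary codebook can push storage below 1 bit per weight is to group the sign bits of a weight matrix into length-d vectors, replace each vector with the index of its nearest codeword (by Hamming distance) in a shared K-entry codebook, and store only the indices, at a cost of log2(K)/d bits per weight. The sketch below shows this assignment step and the bit arithmetic. The codebook, d, and K are hypothetical, and BTC-LLM's learnable transformation and clustering procedure are not reproduced.

```python
import math


def bits_per_weight(codebook_size: int, group_len: int) -> float:
    """Index storage cost: log2(K) bits shared by a group of d weights."""
    return math.log2(codebook_size) / group_len


def hamming(a: tuple[int, ...], b: tuple[int, ...]) -> int:
    """Number of positions where two sign vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def encode_signs(weights: list[float], codebook: list[tuple[int, ...]]) -> list[int]:
    """Binarize weights to +/-1, split into groups, map each group to its nearest codeword."""
    d = len(codebook[0])
    signs = [1 if w >= 0 else -1 for w in weights]
    if len(signs) % d != 0:
        raise ValueError("weight count must be a multiple of the group length")
    groups = [tuple(signs[i:i + d]) for i in range(0, len(signs), d)]
    # min() returns the first codeword on ties, so ties go to the lower index.
    return [min(range(len(codebook)), key=lambda k: hamming(g, codebook[k])) for g in groups]
```

For example, a 256-entry codebook over groups of 16 weights costs `bits_per_weight(256, 16) == 0.5` bits per weight for the indices, plus the (amortized) size of the codebook itself and any per-group scales.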
[296] Part^{2}GS: Part-aware Modeling of Articulated Objects using 3D Gaussian Splatting
- arXiv: 2506.17212 (replaced)
- Authors: Tianjiao Yu, Vedant Shah, Muntasir Wahed, Ying Shen, Kiet A. Nguyen, Ismini Lourentzou
- Subjects: cs.CV; cs.AI; cs.LG; cs.RO
- Tags: 3D Vision, 3D Reconstruction
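- Summary: This paper introduces Part^2GS, a framework for modeling articulated objects with part-aware 3D Gaussian Splatting. It encodes articulated components with learnable attributes and uses physical constraints to keep their motion consistent. Experiments show that it significantly outperforms prior work on both synthetic and real datasets, particularly in reconstruction accuracy for movable parts.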
[297] Towards Effective Offensive Security LLM Agents: Hyperparameter Tuning, LLM as a Judge, and a Lightweight CTF Benchmark
- arXiv: 2508.05674 (replaced)
- Authors: Minghao Shao, Nanda Rani, Kimberly Milner, Haoran Xi, Meet Udeshi, Saksham Aggarwal, Venkata Sai Charan Putrevu, Sandeep Kumar Shukla, Prashanth Krishnamurthy, Farshad Khorrami, Ramesh Karri, Muhammad Shafique
- Subjects: cs.CR; cs.AI
- Tags: LLM Agent, Cybersecurity, LLM Evaluation
- Code: code, code
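- Summary: This paper systematically studies the key factors in building effective LLM-based offensive security agents and proposes a detailed recipe for building them. It introduces CTFJudge, a framework for analyzing agent trajectories, and CTFTiny, a lightweight benchmark of 50 challenges. Experiments identify the best multi-agent coordination settings and lay groundwork for LLM agent research in cybersecurity.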
[298] Mitigating Domain Drift in Multi Species Segmentation with DINOv2: A Cross-Domain Evaluation in Herbicide Research Trials
- arXiv: 2508.07514 (replaced)
- Authors: Artzai Picon, Itziar Eguskiza, Daniel Mugica, Javier Romero, Carlos Javier Jimenez, Eric White, Gabriel Do-Lago-Junqueira, Christian Klukas, Ramon Navarra-Mestre
- Subjects: cs.CV; cs.AI
- Tags: Domain Adaptation, Image Segmentation, Vision Transformer
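- Summary: This paper evaluates a segmentation framework that combines a vision foundation model (DINOv2) with hierarchical taxonomic inference to make plant species and damage segmentation more robust to domain drift. It trains on a multi-year, multi-region dataset and tests generalization under temporal, geographic, and sensor shifts. The foundation-model backbone substantially outperforms existing baselines, and the framework has been deployed in a real agricultural monitoring pipeline.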
[299] PEER: Unified Process-Outcome Reinforcement Learning for Structured Empathetic Reasoning
- arXiv: 2508.09521 (replaced)
- Authors: Yunxiao Wang, Meng Liu, Kaiyu Jiang, Bin Wen, Fan Yang, Tingting Gao, Lizi Liao
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, Reinforcement Learning, LLM Reasoning
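- Summary: This paper proposes a structured empathetic reasoning approach that decomposes emotional support into three steps: history analysis, emotion inference, and strategy selection. It introduces PEER, a framework that uses a unified process-outcome reward model and personality-based rewriting to enhance empathy and diversity. Experiments show significant gains in empathy, strategy alignment, and human-likeness.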
[300] Chunks as Arms: Multi-Armed Bandit-Guided Sampling for Long-Context LLM Preference Optimization
- arXiv: 2508.13993 (replaced)
- Authors: Shaohua Duan, Pengcheng Huang, Xinze Li, Zhenghao Liu, Xiaoyuan Yi, Yukun Yan, Shuo Wang, Yu Gu, Ge Yu, Maosong Sun
- Subjects: cs.CL; cs.AI
- Tags: Long Context, LLM Reasoning, Optimization
- Code: code
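- Summary: This paper proposes LongMab, a framework that uses a multi-armed bandit strategy to identify the most informative chunks of a long context and build high-quality preference pairs from them. It treats context chunks as bandit arms and generates diverse responses through exploration and exploitation. Experiments show significant improvements on long-context reasoning benchmarks.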
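As a generic illustration of treating context chunks as bandit arms, the sketch below runs the standard UCB1 rule: each round it picks the chunk with the highest mean reward plus the exploration bonus sqrt(2 ln t / n_i). The reward function is a hypothetical stand-in (a noisy, made-up per-chunk relevance score); LongMab's actual reward signal and selection rule may differ.

```python
import math
import random
from typing import Callable


def ucb1_select(counts: list[int], totals: list[float], t: int) -> int:
    """Return the arm with the highest UCB1 score; untried arms are played first."""
    for i, n in enumerate(counts):
        if n == 0:
            return i
    return max(range(len(counts)),
               key=lambda i: totals[i] / counts[i] + math.sqrt(2 * math.log(t) / counts[i]))


def rank_chunks(reward_fn: Callable[[int], float], n_chunks: int, rounds: int) -> list[int]:
    """Play UCB1 over chunks; return chunk indices sorted by how often each was chosen."""
    counts = [0] * n_chunks
    totals = [0.0] * n_chunks
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, totals, t)
        # Hypothetical reward, e.g. the quality of a response grounded on this chunk.
        totals[arm] += reward_fn(arm)
        counts[arm] += 1
    return sorted(range(n_chunks), key=lambda i: counts[i], reverse=True)


def demo(seed: int = 0) -> list[int]:
    """Toy run with made-up relevance scores plus Gaussian noise."""
    rng = random.Random(seed)
    relevance = [0.1, 0.2, 0.9, 0.3, 0.15]
    return rank_chunks(lambda i: relevance[i] + rng.gauss(0.0, 0.1), n_chunks=5, rounds=500)
```

In the toy run, chunk 2 (the most relevant) ends up chosen most often, while the bonus term still guarantees every chunk gets sampled occasionally, which is what produces the diverse responses the summary mentions.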
[301] Diffusion Language Models Know the Answer Before Decoding
- arXiv: 2508.19982 (replaced)
- Authors: Pengxiang Li, Yefan Zhou, Dilxat Muhtar, Lu Yin, Shilin Yan, Li Shen, Soroush Vosoughi, Shiwei Liu
- Subjects: cs.CL; cs.AI
- Tags: LLM Inference, Diffusion Model
- Code: code
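- Summary: This paper shows that diffusion language models can identify the correct answer before decoding is complete, and proposes Prophet, a training-free fast-decoding paradigm. Prophet uses the confidence gap between prediction candidates to decide dynamically whether to keep refining or to decode all remaining tokens early, substantially reducing the number of decoding steps. Experiments show that it speeds up decoding severalfold while maintaining generation quality.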
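Prophet's early-commit decision can be illustrated with a small function: at each refinement step, measure the gap between the highest and second-highest token probabilities at every still-masked position and, if the gap is large enough everywhere, commit all remaining positions at once. The top-two gap, the single fixed threshold, and the list-of-probabilities input format are simplifying assumptions; the summary only states that Prophet uses the confidence gap between prediction candidates.

```python
def top2_gap(probs: list[float]) -> float:
    """Difference between the largest and second-largest probability (needs >= 2 entries)."""
    first, second = sorted(probs, reverse=True)[:2]
    return first - second


def should_commit_early(masked_position_probs: list[list[float]],
                        threshold: float = 0.5) -> bool:
    """Commit all remaining tokens only if every masked position is confidently decided."""
    return all(top2_gap(p) >= threshold for p in masked_position_probs)


def commit_all(masked_position_probs: list[list[float]]) -> list[int]:
    """Greedy argmax token id for every remaining masked position."""
    return [max(range(len(p)), key=p.__getitem__) for p in masked_position_probs]
```

If `should_commit_early` returns `False`, the decoder would run another refinement step instead; a practical variant might lower the threshold as decoding progresses.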
[302] MARCH: Evaluating the Intersection of Ambiguity Interpretation and Multi-hop Inference
- arXiv: 2509.22750 (replaced)
- Authors: Jeonghyun Park, Ingeol Baek, Seunghyun Yoon, Haeun Jang, Aparna Garimella, Akriti Jain, Nedim Lipka, Hwanhee Lee
- Subjects: cs.CL; cs.AI
- Tags: Question Answering, LLM Reasoning, LLM Evaluation
- Venue: ACL 2026 Findings
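- Summary: This paper introduces MARCH, a benchmark for evaluating the intersection of multi-hop reasoning and ambiguity interpretation, with questions validated by multiple LLMs and annotated by humans. Existing models perform poorly on it, so the authors propose CLARION, a framework that decouples ambiguity planning from evidence-driven reasoning. CLARION substantially outperforms existing methods and offers a new direction for building robust reasoning systems.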
[303] AudioMoG: Guiding Audio Generation with Mixture-of-Guidance
- arXiv: 2509.23727 (replaced)
- Authors: Junyou Wang, Zehua Chen, Binjie Yuan, Kaiwen Zheng, Chang Li, Yuxuan Jiang, Jun Zhu
- Subjects: cs.SD; cs.AI
- Tags: Audio Generation, Diffusion Model, Multimodal Learning
- Venue: ICME 2026
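- Summary: This paper proposes AudioMoG, a mixture-of-guidance framework for diffusion-based audio generation that improves generation quality without retraining. It combines the strengths of classifier-free guidance and self-guidance, mixing multiple guidance signals to improve the output. Experiments show that it outperforms single-guidance methods on both text-to-audio and video-to-audio generation.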
[304] SeMoBridge: Semantic Modality Bridge for Efficient Few-Shot Adaptation of CLIP
- arXiv: 2509.26036 (replaced)
- Authors: Christoph Timmermann, Hyunse Lee, Woojin Lee
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Vision-Language Model, Few-Shot Learning, Transfer Learning
- Code: code
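- Summary: This paper proposes SeMoBridge, which addresses CLIP's intra-modal misalignment in few-shot classification by mapping images into the text modality. The semantic modality bridge preserves semantic content and supports either a closed-form solution or training with multimodal supervision. Experiments show that, in low-data regimes, it outperforms other methods with only a small amount of training time.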
[305] Search-R3: Unifying Reasoning and Embedding in Large Language Models
- arXiv: 2510.07048 (replaced)
- Authors: Yuntao Gui, James Cheng
- Subjects: cs.CL; cs.AI
- Tags: RAG, LLM Reasoning, Information Retrieval
- Code: code
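- Summary: This paper proposes Search-R3, a framework that couples an LLM's reasoning ability with embedding generation, optimizing embeddings for retrieval through three mechanisms: chain-of-thought reasoning, supervised learning, and reinforcement learning.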
[306] Invisible to Humans, Triggered by Agents: Stealthy Jailbreak Attacks on Mobile Vision-Language Agents
- arXiv: 2510.07809 (replaced)
- Authors: Renhua Ding, Xiao Yang, Zhengwei Fang, Jun Luo, Kun He, Jun Zhu
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, Vision-Language Model, LLM Agent
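- Summary: This paper proposes a new attack paradigm against mobile vision-language agents that exploits differences between how humans and agents interact with the device to mount stealthy jailbreak attacks, achieving an 82.5% planning-hijack rate on GPT-4o.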
[307] Fast and Interpretable Protein Substructure Alignment via Optimal Transport
- arXiv: 2510.11752 (replaced)
- Authors: Zhiyu Wang, Bingxin Zhou, Jing Wang, Yang Tan, Weishu Zhao, Pietro Liò, Liang Hong
- Subjects: q-bio.QM; cs.AI; cs.LG
- Tags: Drug Discovery, Representation Learning
- Venue: ICLR 2026
- Code: code
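- Summary: This paper proposes PLASMA, a framework that casts local protein structure alignment as a regularized optimal transport problem and uses differentiable Sinkhorn iterations to achieve efficient, interpretable residue-level structural alignment.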
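Entropy-regularized optimal transport, the tool the PLASMA summary mentions, is solved by Sinkhorn iterations that alternately rescale the rows and columns of the Gibbs kernel K = exp(-C/ε) until the transport plan matches both marginals. The pure-Python sketch below shows the generic algorithm on a toy cost matrix. The cost matrix (for example, dissimilarities between residue embeddings), the uniform marginals, and ε are assumptions, not PLASMA's settings.

```python
import math


def sinkhorn(cost: list[list[float]], eps: float = 0.1,
             n_iters: int = 200) -> list[list[float]]:
    """Generic entropy-regularized OT plan between uniform marginals."""
    n, m = len(cost), len(cost[0])
    a = [1.0 / n] * n  # source marginal (uniform, assumption)
    b = [1.0 / m] * m  # target marginal (uniform, assumption)
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        # Rescale rows so the plan's row sums match a, then columns so they match b.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan P = diag(u) K diag(v).
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]
```

On a cost matrix with zeros on the diagonal and ones elsewhere, the plan concentrates almost all mass on the diagonal, i.e. each source residue is matched to its zero-cost counterpart. With small ε the kernel entries underflow, so practical implementations run the same updates in the log domain; a differentiable version would run them in an autodiff framework.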
[308] CompoDistill: Attention Distillation for Compositional Reasoning in Multimodal LLMs
- arXiv: 2510.12184 (replaced)
- Authors: Jiwan Kim, Kibum Kim, Sangwoo Seo, Chanyoung Park
- Subjects: cs.CV; cs.AI
- Tags: Knowledge Distillation, Vision-Language Model, Multimodal Learning
- Venue: ICLR 2026
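- Summary: This paper proposes CompoDistill, a knowledge distillation framework that strengthens compositional reasoning in multimodal LLMs by explicitly aligning the student model's visual attention with the teacher model's.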
[309] When Personalization Tricks Detectors: The Feature-Inversion Trap in Machine-Generated Text Detection
- arXiv: 2510.12476 (replaced)
- Authors: Lang Gao, Xuhui Li, Chenxi Wang, Mingzhe Li, Wei Liu, Zirui Song, Jinghui Zhang, Rui Yan, Preslav Nakov, Xiuying Chen
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, LLM Security
- Venue: ACL 2026
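- Summary: This paper builds the first benchmark for detecting personalized machine-generated text, reveals a feature-inversion trap that causes detectors to degrade in personalized settings, and proposes a method for predicting these performance changes.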
[310] E2Edev: Benchmarking Large Language Models in End-to-End Software Development Task
- arXiv: 2510.14509 (replaced)
- Authors: Jingyao Liu, Chen Huang, Zhizhao Guan, Wenqiang Lei, Yang Deng
- Subjects: cs.SE; cs.AI; cs.CL
- Tags: Code Generation, LLM Evaluation, Software Testing
- Venue: ACL 2026
- Code: code
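- Summary: This paper proposes E2EDev, a benchmark grounded in behavior-driven development principles that evaluates LLMs on end-to-end software development tasks, with fine-grained user requirements and an automated testing pipeline.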
[311] Do AI Models Dream of Faster Code? An Empirical Study on LLM-Proposed Performance Improvements in Real-World Software
- arXiv: 2510.15494 (replaced)
- Authors: Lirong Yi, Gregory Gay, Philipp Leitner
- Subjects: cs.SE; cs.AI; cs.PF
- Tags: Code Generation, LLM Evaluation
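- Summary: This paper presents an empirical study of LLMs on real-world software performance optimization tasks, finding that although LLMs can solve complex engineering problems, the quality of their solutions varies widely and still lags behind human developers overall.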
[312] ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination
- arXiv: 2510.15949 (replaced)
- Authors: Charidimos Papadakis, Angeliki Dimitriou, Giorgos Filandrianos, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou
- Subjects: q-fin.TR; cs.AI
- Tags: LLM Agent, Multi-Agent System, Quantitative Finance
- Venue: ACL 2026
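- Summary: This paper proposes ATLAS, a multi-agent trading framework that integrates market, news, and fundamentals information and uses adaptive prompt optimization to adjust trading strategies dynamically and improve financial decision-making.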
[313] How Do Data Owners Say No? A Case Study of Data Consent Mechanisms in Web-Scraped Vision-Language AI Training Datasets
- arXiv: 2511.08637 (replaced)
- Authors: Chung Peng Lee, Rachel Hong, Harry H. Jiang, Aster Plotnik, William Agnew, Jamie Morgenstern
- Subjects: cs.CY; cs.AI; cs.CR
- Tags: AI Ethics, Vision-Language Model, Data Annotation
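- Summary: This paper studies data consent mechanisms in web-scraped vision-language AI training datasets and finds that current data-collection practices fail to adequately respect the wishes that data owners express through copyright notices, watermarks, and terms of service.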
[314] The Persistence of Cultural Memory: Investigating Multimodal Iconicity in Diffusion Models
- arXiv: 2511.11435 (replaced)
- Authors: Maria-Teresa De Rosa Palmini, Eva Cetinic
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Text-to-Image, LLM Evaluation
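- Summary: This paper studies multimodal iconicity in text-to-image diffusion models and proposes an evaluation framework that measures whether, when prompted with cultural references, models rely on visual replication rather than cultural generalization.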
[315] Evaluating Low-Light Image Enhancement Across Multiple Intensity Levels
- arXiv: 2511.15496 (replaced)
- Authors: Maria Pilligua, David Serrano-Lozano, Pai Peng, Ramon Baldrich, Michael S. Brown, Javier Vazquez-Corral
- Subjects: cs.CV; cs.AI
- Tags: Image Enhancement, Computer Vision
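- Summary: This paper introduces MILL, a multi-illumination low-light dataset for evaluating image enhancement algorithms across different light intensities, and reveals substantial differences in how existing methods perform at different intensity levels.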
[316] NSTR: Neural Spectral Transport Representation for Space-Varying Frequency Fields
- arXiv: 2511.18384 (replaced)
- Authors: Plein Versace
- Subjects: cs.SD; cs.AI
- Tags: Representation Learning
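- Summary: This paper proposes Neural Spectral Transport Representation (NSTR), the first framework to explicitly model spatially varying local frequency fields, achieving better signal representation through a learnable frequency transport equation.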
[317] Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection
- arXiv: 2511.20162 (replaced)
- Authors: Daniel Harari, Michael Sidorov, Chen Shterental, Liel David, Abrham Kahsay Gebreselasie, Muhammad Haris Khan
- Subjects: cs.CV; cs.AI; q-bio.NC
- Tags: Video Understanding, Vision-Language Model, LLM Evaluation
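- Summary: This paper builds a large-scale dataset to evaluate how well video LMMs detect contact and release events. The models are good at semantic recognition but show marked deficits in physically grounded localization.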
[318] Human-computer interactions predict mental health
- arXiv: 2511.20179 (replaced)
- Authors: Veith Weilnhammer, Jefferson Ortega, David Whitney
- Subjects: q-bio.NC; cs.AI; cs.HC
- Tags: Medical AI, Human-Computer Interaction
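- Summary: This paper proposes MAILA, a framework that infers latent mental health states from everyday human-computer interaction data, achieving biomarker-level accuracy across 13 clinically relevant dimensions.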
[319] RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing
- arXiv: 2512.06774 (replaced)
- Authors: Longjie Zhao, Ziming Hong, Zhenyang Ren, Runnan Chen, Mingming Gong, Tongliang Liu
- Subjects: cs.CV; cs.AI
- Tags: 3D Vision, Diffusion Model, Cybersecurity
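- Summary: This paper proposes RDSplat, presented as the first watermarking method of its kind for 3D Gaussian Splatting, which achieves robustness against both 2D and 3D diffusion editing attacks by embedding the watermark in low-frequency primitives.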
[320] Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis
- arXiv: 2512.13749 (replaced)
- Authors: Joyjit Roy, Samaresh Kumar Singh
- Subjects: cs.LG; cs.AI; cs.CE; cs.CY; cs.SE
- Tags: Sentiment Analysis, Quantitative Finance
- Venue: IATMSI 2026
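- Summary: This paper evaluates several embedding representation techniques for financial news sentiment classification in resource-constrained settings and finds that pretrained embeddings yield diminishing returns below a critical data threshold.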
[321] Machine Unlearning in the Era of Quantum Machine Learning: An Empirical Study
- arXiv: 2512.19253 (replaced)
- Authors: Carla Crivoi, Radu Tudor Ionescu
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Machine Unlearning, Quantum Computing
- Venue: ICPR 2026
- Code: code
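- Summary: This paper presents the first empirical study of machine unlearning in hybrid quantum-classical neural networks, adapting several unlearning methods to the quantum setting and proposing two new strategies. Experiments show that quantum models can support effective unlearning, but outcomes depend heavily on circuit depth, entanglement structure, and task complexity.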
[322] ModeX: Evaluator-Free Best-of-N Selection for Open-Ended Generation
- arXiv: 2601.02535 (replaced)
- Authors: Hyeong Kyu Choi, Sharon Li
- Subjects: cs.CL; cs.AI
- Tags: LLM Inference, Text Generation
- Venue: ACL 2026
- Code: code
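- Summary: This paper proposes ModeX, an evaluator-free Best-of-N selection framework that builds a similarity graph over the generated texts and applies spectral clustering to identify the modal output, i.e., the one representing the semantic consensus. It outperforms existing baselines on open-ended tasks including text summarization, code generation, and mathematical reasoning.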
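To make the "consensus without an evaluator" idea concrete, here is a minimal sketch that picks the most central candidate in a similarity graph. It is a deliberate simplification, not ModeX: it assumes token-set Jaccard similarity in place of a semantic similarity model, and it uses total-similarity centrality (a medoid) instead of spectral clustering. The helper names `jaccard` and `select_consensus` are hypothetical.

```python
def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity (a hypothetical stand-in for a semantic similarity model)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def select_consensus(candidates: list[str]) -> str:
    """Return the candidate with the highest total similarity to all others,
    i.e. the most central node of the fully connected similarity graph."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def centrality(i: int) -> float:
        return sum(jaccard(candidates[i], candidates[j])
                   for j in range(len(candidates)) if j != i)

    best = max(range(len(candidates)), key=centrality)
    return candidates[best]


samples = [
    "the answer is 42",
    "the answer is 42 indeed",
    "i think the answer is 41",
    "bananas are yellow",
]
print(select_consensus(samples))  # the answer is 42
```

The outlier ("bananas are yellow") has zero similarity to every other sample, so it can never be selected. Clustering-based selection such as ModeX's goes further by first isolating the dominant semantic mode before choosing a representative.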
[323] TreeAdv: Tree-Structured Advantage Redistribution for Group-Based RL
- arXiv: 2601.03703 (replaced)
- Authors: Lang Cao, Hui Ruan, Yongqian Li, Peng Chao, Wu Ning, Haonan Song, Renhong Chen, Yitong Li
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Reinforcement Learning
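- Summary: This paper proposes TreeAdv, which explicitly models group rollouts as a tree and branches at high-uncertainty decision points to achieve finer-grained advantage assignment. It consistently outperforms GRPO and GSPO on 10 mathematical reasoning benchmarks while generating fewer tokens.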
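For context, the sketch below shows the group-relative advantage used by GRPO, one of the baselines TreeAdv is compared against. It does not implement TreeAdv's tree-structured redistribution. It assumes the common form (reward minus group mean, divided by group standard deviation plus a small `eps`); implementations differ in details such as population vs. sample standard deviation.

```python
import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Group-relative advantages: normalize each rollout's scalar reward by the
    mean and (population) standard deviation of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four rollouts for one prompt; two are correct (reward 1) and two are wrong (reward 0).
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
```

In this baseline, every token of rollout i receives the same advantage A_i, even when rollouts share long common prefixes. That coarse credit assignment is what tree-structured schemes such as TreeAdv aim to refine.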
[324] Mind the Generative Details: Direct Localized Detail Preference Optimization for Video Diffusion Models
- arXiv: 2601.04068 (replaced)
- Authors: Zitong Huang, Kaidong Zhang, Yukang Ding, Chao Gao, Rui Ding, Ying Chen, Wangmeng Zuo
- Subjects: cs.CV; cs.AI
- Tags: Video Generation, Diffusion Model, LLM Alignment
- Venue: CVPR 2026
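- Summary: This paper proposes LocalDPO, a post-training framework for video diffusion models that constructs localized preference pairs to perform alignment at the level of spatio-temporal regions. Experiments show it outperforms other post-training methods in video fidelity, temporal coherence, and human preference scores.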
[325] DYCP: Dynamic Context Pruning for Long-Form Dialogue with LLMs
- arXiv: 2601.07994 (replaced)
- Authors: Nayoung Choi, Jonathan Zhang, Jinho D. Choi
- Subjects: cs.CL; cs.AI
- Tags: Long Context, Dialogue System, LLM Inference
- Venue: TACL 2026
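- Summary: This paper proposes DyCP, a lightweight context management method that dynamically identifies and retrieves the dialogue segments relevant to the current turn. It achieves competitive answer quality on three long-form dialogue benchmarks while improving inference efficiency.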
[326] Adversarial Evasion Attacks on Computer Vision using SHAP Values
- arXiv: 2601.10587 (replaced)
- Authors: Frank Mollard, Marcus Becker, Florian Roehrbein
- Subjects: cs.CV; cs.AI
- Tags: Adversarial Robustness, Computer Vision
- Venue: 10th bwHPC Symposium 2024 Workshop
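- Summary: This paper proposes a white-box adversarial attack that generates adversarial examples using SHAP values, which quantify each input feature's importance to the output. Experiments show the SHAP attack is more robust than the Fast Gradient Sign Method (FGSM) in gradient-hiding scenarios.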
[327] Q-Probe: Scaling Image Quality Assessment to High Resolution via Context-Aware Agentic Probing
- arXiv: 2601.15356 (replaced)
- Authors: Xiang Li, Xueheng Li, Yu Wang, Xuanhua He, Zhangchi Hu, Weiwei Yu, Chengjun Xie
- Subjects: eess.IV; cs.AI
- Tags: Image Quality Assessment, LLM Agent, Multimodal Learning
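- Summary: This paper proposes Q-Probe, the first agentic framework for high-resolution image quality assessment, which uses a context-aware probing mechanism for fine-grained analysis of local degradations. It achieves state-of-the-art performance on the newly constructed Vista-Bench benchmark.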
[328] SPEAR: An Engineering Case Study of Multi-Agent Coordination for Smart Contract Auditing
- arXiv: 2602.04418 (replaced)
- Authors: Arnab Mallick, Indraveni Chebolu, Harmesh Rana, Seema Pangal
- Subjects: cs.MA; cs.AI; cs.DC; cs.ET; cs.SE
- Tags: Multi-Agent System, Cybersecurity, LLM Agent
- Venue: AAMAS 2026 Workshop
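- Summary: This paper presents SPEAR, a multi-agent coordination framework for smart contract auditing that divides the work among a planning agent, an execution agent, and a repair agent. An empirical study compares this multi-agent design with centralized and pipeline alternatives.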
[329] PANC: Prior-Aware Normalized Cut via Anchor-Augmented Token Graphs
- arXiv: 2602.06912 (replaced)
- Authors: Juan Gutiérrez, Victor Gutiérrez-García, José Luis Blanco-Murillo
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Vision Transformer
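- Summary: This paper proposes PANC, a training-free unsupervised segmentation method that extends the normalized cut algorithm by connecting prior tokens to foreground/background anchors. It achieves notable mIoU gains on the DUTS-TE, DUT-OMRON, and CrackForest datasets.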
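PANC builds on the classic normalized cut objective, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). The sketch below only evaluates that baseline objective for a given bipartition of a small weighted graph. It does not reproduce PANC's anchor augmentation or the spectral relaxation used to find the partition, and the toy graph `W` is hypothetical.

```python
def ncut(W: list[list[float]], A: set[int]) -> float:
    """Normalized cut of the bipartition (A, V \\ A) for a symmetric weight matrix W."""
    n = len(W)
    B = [i for i in range(n) if i not in A]
    A_list = sorted(A)
    cut = sum(W[i][j] for i in A_list for j in B)          # weight crossing the partition
    assoc_a = sum(W[i][j] for i in A_list for j in range(n))  # total connection from A to all nodes
    assoc_b = sum(W[i][j] for i in B for j in range(n))       # total connection from B to all nodes
    return cut / assoc_a + cut / assoc_b


# Hypothetical token graph: {0, 1} and {2, 3} are tight pairs joined by one weak edge (1-2).
W = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 0.1, 0.0],
    [0.0, 0.1, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
print(ncut(W, {0, 1}))  # small: only the weak edge is cut
print(ncut(W, {0, 2}))  # 2.0: both strong edges are cut
```

In PANC's setting, nodes are image tokens. Per the summary, extra anchor nodes tied to prior tokens bias which partition minimizes this kind of objective.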
[330] Rethinking the Value of Agent-Generated Tests for LLM-Based Software Engineering Agents
- arXiv: 2602.07900 (replaced)
- Authors: Zhi Chen, Zhensu Sun, Yuling Shi, Chao Peng, Xiaodong Gu, David Lo, Lingxiao Jiang
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Software Testing, Code Generation
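- Summary: This paper analyzes the value of agent-generated tests for LLM-based software engineering agents and finds that writing tests serves mainly as a channel for observational feedback rather than improving issue resolution. A prompt-intervention study shows that changing how many tests the agent writes does not significantly affect final outcomes.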
[331] AI-PACE: A Framework for Integrating AI into Medical Education
- arXiv: 2602.10527 (replaced)
- Authors: Scott P. McGrath, Katherine K. Kim, Karnjit Johl, Haibo Wang, Nick Anderson
- Subjects: cs.CY; cs.AI
- Tags: Medical AI, Education Technology
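- Summary: This paper synthesizes existing knowledge on AI in medical education and proposes a curriculum development framework. The framework emphasizes longitudinal integration across medical training, interdisciplinary collaboration, and a balance between technical foundations and clinical application.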
[332] MultiModalPFN: Extending Prior-Data Fitted Networks for Multimodal Tabular Learning
- arXiv: 2602.20223 (replaced)
- Authors: Wall Kim, Chaeyoung Song, Hanul Kim
- Subjects: cs.LG; cs.AI
- Tags: Multimodal Learning, Tabular Learning
- Venue: CVPR 2026
- Code: code
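- Summary: This paper extends TabPFN to multimodal tabular learning, using modality encoders and projectors to handle tabular and non-tabular modalities in a unified way. Experiments on medical and general-purpose multimodal datasets show that it consistently outperforms competitive state-of-the-art methods.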
[333] Interpretable Tau-PET Synthesis from Multimodal T1-Weighted and FLAIR MRI Using Partial Information Decomposition Guided Disentangled Quantized Half-UNet
- arXiv: 2602.22545 (replaced)
- Authors: Agamdeep S. Chopra, Caitlin Neher, Tianyi Ren, Juampablo E. Heras Rivera, Hesam Jahanian, Mehmet Kurt
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Image Synthesis, Multimodal Learning
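- Summary: This paper proposes an interpretable multimodal image synthesis framework that generates tau-PET from paired MRI using a disentangled quantized Half-UNet guided by partial information decomposition. It achieves the best raw PET fidelity and Braak staging performance on the ADNI-3 and OASIS-3 datasets.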
[334] SUPERGLASSES: Benchmarking Vision Language Models as Intelligent Agents for AI Smart Glasses
- arXiv: 2602.22683 (replaced)
- Authors: Zhuohang Jiang, Xu Yuan, Haohao Qu, Shanru Lin, Kanglong Liu, Wenqi Fan, Qing Li
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, LLM Agent, Question Answering
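- Summary: This paper introduces SUPERGLASSES, the first VQA benchmark built from real smart-glasses data, containing 2,422 egocentric image-question pairs. It also proposes SUPERLENS, a multimodal agent that outperforms GPT-4o by 2.19% on the benchmark.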
[335] Why Adam Can Beat SGD: Second-Moment Normalization Yields Sharper Tails
- arXiv: 2603.03099 (replaced)
- Authors: Ruinan Jin, Yingbin Liang, Shaofeng Zou
- Subjects: cs.LG; cs.AI
- Tags: Optimization, Deep Learning Theory
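- Summary: This paper proves that Adam has better high-probability convergence behavior than SGD. Adam achieves a δ^(-1/2) dependence on the confidence parameter δ, whereas SGD requires at least a δ^(-1) dependence. The analysis uses a stopping-time/martingale argument under the classical bounded-variance model.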
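The mechanism the title refers to, second-moment normalization, is visible in a single update step. Below is a minimal scalar sketch of textbook Adam (bias-corrected, with common default hyperparameters assumed) next to plain SGD. It only illustrates why Adam's step size stays near `lr` even when one gradient is very large, as can happen under heavy-tailed noise. It is not the paper's analysis or proof.

```python
import math


def sgd_step(theta: float, g: float, lr: float = 0.01) -> float:
    """Plain SGD: the step scales linearly with the gradient."""
    return theta - lr * g


def adam_step(theta, g, state, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """One textbook Adam step on a scalar; state = (m, v, t)."""
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g        # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g * g    # second-moment estimate
    m_hat = m / (1 - b1 ** t)        # bias corrections
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)  # normalized step
    return theta, (m, v, t)


# A single outlier gradient of 1000:
print(sgd_step(0.0, 1000.0))                      # -10.0
print(adam_step(0.0, 1000.0, (0.0, 0.0, 0))[0])   # about -0.01
```

On the first step m_hat = g and v_hat = g², so the update is roughly lr · sign(g) regardless of |g|. The paper's high-probability bounds formalize how this kind of normalization tames rare large gradients.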
[336] ZipMap: Linear-Time Stateful 3D Reconstruction via Test-Time Training
- arXiv: 2603.04385 (replaced)
- Authors: Haian Jin, Rundi Wu, Tianyuan Zhang, Ruiqi Gao, Jonathan T. Barron, Noah Snavely, Aleksander Holynski
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: 3D Vision, 3D Reconstruction
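- Summary: This paper proposes ZipMap, a stateful feed-forward model that uses test-time training layers to compress an image collection into a compact hidden scene state, enabling linear-time bidirectional 3D reconstruction. It is more than 20× faster than state-of-the-art methods such as VGGT.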
[337] Stacked from One: Multi-Scale Self-Injection for Context Window Extension
- arXiv: 2603.04759 (replaced)
- Authors: Wei Han, Pan Zhou, Soujanya Poria, Shuicheng Yan
- Subjects: cs.CL; cs.AI
- Tags: Long Context, LLM Inference
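- Summary: This paper proposes SharedLLM, a framework that extends the context window by stacking two short-context LLMs. The lower model compresses long inputs into multi-granularity representations, and the upper model performs context-aware processing. Although trained only on 8K-token inputs, it generalizes effectively to inputs of 128K+ tokens.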
[338] Agentic SPARQL: Evaluating SPARQL-MCP-powered Intelligent Agents on the Federated KGQA Benchmark
- arXiv: 2603.06582 (replaced)
- Authors: Daniel Dobriy, Frederik Bauer, Amr Azzam, Debayan Banerjee, Axel Polleres
- Subjects: cs.IR; cs.AI; cs.MA
- Tags: LLM Agent, Knowledge Graph, Question Answering
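- Summary: This paper explores the potential of SPARQL-MCP-based intelligent agents for federated SPARQL querying. It extends an existing knowledge graph question answering (KGQA) benchmark to agentic federated KGQA and evaluates different architectural options.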
[339] Reinforced Generation of Combinatorial Structures: Ramsey Numbers
- arXiv: 2603.09172 (replaced)
- Authors: Ansh Nagda, Prabhakar Raghavan, Abhradeep Thakurta
- Subjects: math.CO; cs.AI; cs.CC
- Tags: LLM Reasoning, Optimization
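- Summary: This paper uses AlphaEvolve, an LLM-based code-mutation agent, to discover improved lower bounds for seven classical Ramsey numbers. The search algorithms generated by this single meta-algorithm also recovered the known exact lower bounds and matched several best-known lower bounds.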
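A Ramsey lower bound R(k, l) > n is certified by exhibiting a graph on n vertices with no k-clique and no independent set of size l. The brute-force checker below covers only that verification step for a candidate construction, not AlphaEvolve's search. It is feasible only for small n, and the function name is hypothetical.

```python
from itertools import combinations


def is_ramsey_witness(n: int, edges, k: int, l: int) -> bool:
    """True if the graph on vertices 0..n-1 has no k-clique and no independent
    set of size l, which certifies R(k, l) > n."""
    adj = {frozenset(e) for e in edges}
    for s in combinations(range(n), k):
        if all(frozenset(p) in adj for p in combinations(s, 2)):
            return False  # found a k-clique
    for s in combinations(range(n), l):
        if not any(frozenset(p) in adj for p in combinations(s, 2)):
            return False  # found an independent set of size l
    return True


c5 = [(i, (i + 1) % 5) for i in range(5)]
c6 = [(i, (i + 1) % 6) for i in range(6)]
print(is_ramsey_witness(5, c5, 3, 3))  # True: the 5-cycle shows R(3, 3) > 5
print(is_ramsey_witness(6, c6, 3, 3))  # False: {0, 2, 4} is independent in the 6-cycle
```

For the open cases the paper targets, candidate graphs are far too large for this exhaustive check to be practical. This is why the generated search algorithms matter, while the certificate itself stays this simple to state.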
[340] Hardware Efficient Approximate Convolution with Tunable Error Tolerance for CNNs
- arXiv: 2603.10100 (replaced)
- Authors: Vishal Shashidhar, Anupam Kumari, Roy P Paily
- Subjects: cs.LG; cs.AI; cs.AR
- Tags: Model Compression, Edge Computing, Low Power
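- Summary: This paper proposes a "soft sparsity" paradigm that uses a most-significant-bit (MSB) proxy to skip negligible nonzero multiplications, integrated into custom RISC-V instructions. With zero accuracy loss, it substantially reduces MAC operations and power consumption, outperforming conventional zero-skipping methods.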
[341] Learning to Negotiate: Multi-Agent Deliberation for Collective Value Alignment in LLMs
- arXiv: 2603.10476 (replaced)
- Authors: Panatchakorn Anantaprayoon, Nataliia Babina, Nima Asgharbeygi, Jad Tarifi
- Subjects: cs.CL; cs.AI
- Tags: LLM Alignment, Multi-Agent System, RLHF
- Venue: LREC 2026 Workshop
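- Summary: This paper proposes an alignment framework based on multi-agent negotiation, in which two self-playing instances of an LLM produce mutually beneficial solutions to resolve value conflicts. The method substantially improves conflict resolution while maintaining alignment performance.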
[342] Stop Listening to Me! How Multi-turn Conversations Can Degrade LLM Diagnostic Reasoning
- arXiv: 2603.11394 (replaced)
- Authors: Kevin H. Guo, Chao Yan, Avinash Baidya, Katherine Brown, Xiang Gao, Juming Xiong, Zhijun Yin, Bradley A. Malin
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Reasoning, Medical AI, Dialogue System
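- Summary: This paper evaluates 17 LLMs on three clinical datasets and identifies a "conversation tax": multi-turn dialogue consistently degrades diagnostic reasoning performance. Models often abandon a correct initial diagnosis to go along with a user's incorrect suggestion.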
[343] OrgForge: A Multi-Agent Simulation Framework for Verifiable Synthetic Corporate Corpora
- arXiv: 2603.14997 (replaced)
- Authors: Jeffrey Flynt
- Subjects: cs.CL; cs.AI; cs.IR
- Tags: Multi-Agent System, Data Synthesis
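- Summary: This paper proposes OrgForge, a multi-agent simulation framework that generates verifiable synthetic corporate corpora by enforcing a physical-cognitive boundary. The framework produces fifteen interlinked document categories that are traceable to a shared event log, substantially improving factual consistency.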
[344] BenchBrowser: Retrieving Evidence for Evaluating Benchmark Validity
- arXiv: 2603.18019 (replaced)
- Authors: Harshita Diddee, Gregory Yauney, Swabha Swayamdipta, Daphne Ippolito
- Subjects: cs.CL; cs.AI; cs.SE
- Tags: LLM Evaluation, Information Retrieval
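- Summary: This paper introduces BenchBrowser, a retriever that finds evaluation items relevant to a natural-language use case across 20 benchmarks. The tool helps practitioners diagnose content-validity and convergent-validity problems in benchmarks.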
[345] Quine: Realizing LLM Agents as Native POSIX Processes
- arXiv: 2603.18030 (replaced)
- Authors: Hao Ke
- Subjects: cs.OS; cs.AI; cs.PL; cs.SE
- Tags: LLM Agent
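- Summary: This paper proposes Quine, a runtime architecture that implements LLM agents as native POSIX processes. The design inherits isolation, composition, and resource control directly from the kernel, while supporting recursive delegation and shell-native composition.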
[346] WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior
- arXiv: 2603.18474 (replaced)
- Authors: Haonan Yu, Junhao Liu, Zhenyu Yan, Haoran Lin, Xin Zhang
- Subjects: cs.CL; cs.AI
- Tags: Interpretability, LLM Inference
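- Summary: This paper proposes WASD, a framework that explains and controls LLM behavior by identifying sufficient neural conditions for token generation. It produces more stable, accurate, and concise explanations, and its practical value is demonstrated by controlling cross-lingual output generation.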
[347] HiCI: Hierarchical Construction-Integration for Long-Context Attention
- arXiv: 2603.20843 (replaced)
- Authors: Xiangyu Zeng, Qi Xu, Yunke Wang, Chang Xu
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Long Context, LLM Inference
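- Summary: This paper proposes HiCI, a hierarchical attention module that extends LLM context length by constructing segment-level representations and integrating them into a global context. With less than 5.5% additional parameters, it extends LLaMA-2's context window from 4K to 100K tokens.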
[348] MCLR: Improving Conditional Modeling via Inter-Class Likelihood-Ratio Maximization and Unifying Classifier-Free Guidance with Alignment Objectives
- arXiv: 2603.22364 (replaced)
- Authors: Xiang Li, Yixuan Jia, Xiao Li, Jeffrey A. Fessler, Rongrong Wang, Qing Qu
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Diffusion Model, Text-to-Image
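- Summary: This paper proposes MCLR, an alignment objective that explicitly maximizes the inter-class likelihood ratio during diffusion model training. It achieves CFG-like improvements under standard sampling and establishes a formal equivalence between classifier-free guidance and alignment objectives.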
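For background, the standard (textbook) form of classifier-free guidance that MCLR is compared with combines the conditional and unconditional noise predictions using a guidance weight w. This is not MCLR's training objective, and MCLR's inter-class ratio compares classes rather than conditional vs. unconditional densities:

$$
\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + w\,\big(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\big)
$$

Under the usual score parameterization $\epsilon_\theta(x_t, \cdot) \approx -\sigma_t \nabla_{x_t} \log p_t(x_t \mid \cdot)$, the guided prediction is the (scaled) score of

$$
p_t(x_t)^{1-w}\, p_t(x_t \mid c)^{w} = p_t(x_t) \left(\frac{p_t(x_t \mid c)}{p_t(x_t)}\right)^{w} \propto p_t(x_t)\, p_t(c \mid x_t)^{w},
$$

so CFG tilts sampling toward samples with a high likelihood ratio for the condition. This is the link that makes an explicit likelihood-ratio training objective a natural counterpart to guidance.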
[349] tBayes-MICE: A Bayesian Approach to Multiple Imputation for Time Series Data
- arXiv: 2603.27142 (replaced)
- Authors: Amuche Ibenegbu, Pierre Lafaye de Micheaux, Rohitash Chandra
- Subjects: stat.ML; cs.AI; cs.LG
- Tags: Time Series Forecasting, Data Imputation
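- Summary: This paper extends the MICE method to missing-value imputation for time series, using Bayesian inference with MCMC sampling. Combining temporally informed initialization with lagged features, the method reduces imputation error while quantifying uncertainty.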
[350] Continued AI Scaling Requires Repeated Efficiency Doublings
- arXiv: 2603.28507 (replaced)
- Authors: Chien-Ping Lu
- Subjects: cs.LG; cs.AI
- Tags: AI Sustainability, Deep Learning Theory
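- Summary: This paper argues that continued AI scaling requires repeated efficiency doublings. Drawing an analogy to Moore's law, the author contends that AI must keep achieving efficiency gains in hardware, algorithms, and systems to sustain scaling at acceptable cost.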
[351] Quantum-Inspired Geometric Classification with Correlation Group Structures and VQC Decision Modeling
- arXiv: 2604.01930 (replaced)
- Authors: Nishikanta Mohanty, Arya Ansuman Priyadarshi, Bikash K. Behera, Badshah Mukherjee
- Subjects: cs.AI
- Tags: Quantum Computing, Tabular Learning
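- Summary: This paper proposes a quantum-inspired geometric classification framework that integrates correlation group structures with variational quantum circuit (VQC) decision modeling. It achieves competitive performance on small to medium-sized datasets and proves effective in highly imbalanced settings.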
[352] Parent Selection Mechanisms in Elitist Crossover-Based Algorithms
- arXiv: 2604.04083 (replaced)
- Authors: Andre Opris, Denis Antipov
- Subjects: cs.NE; cs.AI
- Tags: Optimization
- Venue: GECCO 2026
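- Summary: This paper proposes a parent selection strategy for genetic algorithms that preferentially chooses the two parents at maximum distance for crossover. The analysis shows a significant speedup on the Jump_k problem and clarifies how crossover helps maintain population diversity.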
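The sketch below pairs the standard Jump_k benchmark function with a simple "pick the most distant pair" rule based on Hamming distance. The selection helper is a hypothetical simplification for illustration only. It omits the paper's elitism, tie-breaking, population size, and mutation details.

```python
from itertools import combinations


def jump(x, k: int) -> int:
    """Standard Jump_k on a bit string: OneMax plus k, except for a fitness
    valley of width k just below the all-ones optimum."""
    n, ones = len(x), sum(x)
    if ones <= n - k or ones == n:
        return k + ones
    return n - ones


def hamming(a, b) -> int:
    return sum(u != v for u, v in zip(a, b))


def max_distance_parents(population):
    """Hypothetical rule: return the pair of individuals with the largest Hamming distance."""
    return max(combinations(population, 2), key=lambda pair: hamming(*pair))


pop = [(1, 1, 1, 1, 0, 0), (1, 0, 1, 1, 1, 0), (0, 0, 1, 1, 1, 1)]
print(jump(pop[0], 2), jump((1, 1, 1, 1, 1, 0), 2))  # 6 1
print(max_distance_parents(pop))  # ((1, 1, 1, 1, 0, 0), (0, 0, 1, 1, 1, 1))
```

With n = 6 and k = 2, both selected parents sit at the edge of the valley (four ones). Their ones together cover all six positions, so uniform crossover can jump straight to the all-ones optimum. This is the intuition for why selecting distant parents helps on Jump_k.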
[353] Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality
- arXiv: 2604.04418 (replaced)
- Authors: Xiaoyuan Zhu, Kimberly Le Truong, Riccardo Fogliato, Gokul Swamy, Weijian Zhang, Minglai Yang, Longtian Ye, Bangya Liu, Minghao Liu, Andrew Ilyas, Steven Wu
- Subjects: cs.HC; cs.AI
- Tags: LLM Evaluation, LLM Reasoning
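- Summary: This paper proposes "error verifiability", a metric of whether a model's generated justifications help users tell correct answers from incorrect ones. It finds that neither conventional model scaling nor post-training improves this metric, whereas the proposed reflective rephrasing (RR) and oracle rephrasing (OR) methods do improve verifiability by incorporating external information.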
[354] Training Transformers in Cosine Coefficient Space
- arXiv: 2604.04440 (replaced)
- Authors: Mohamed Amine Bergach
- Subjects: cs.PF; cs.AI
- Tags: Model Compression
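- Summary: This paper proposes training Transformers by replacing the weight matrices of linear layers with discrete cosine transform (DCT) coefficients and reconstructing the weights via the inverse DCT. Experiments show that with half the parameters, the method matches the dense baseline and outperforms LoRA-style low-rank factorization, while also saving bandwidth.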
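To illustrate "training in coefficient space", here is a minimal 1-D sketch: a weight row is stored as a short vector of DCT coefficients and rebuilt with the orthonormal inverse DCT. It assumes the simplest selection rule (keep the lowest-frequency coefficients and zero the rest). The paper's actual coefficient layout, 2-D handling, and training loop are not reproduced, and the helper names are hypothetical.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def idct_ii(c):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(c)
    return [
        c[0] * math.sqrt(1 / n)
        + sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (i + 0.5) * k / n)
              for k in range(1, n))
        for i in range(n)
    ]


def weight_from_coeffs(coeffs, n):
    """Rebuild a length-n weight row from its trainable low-frequency coefficients
    (assumption: higher-frequency coefficients are fixed at zero)."""
    return idct_ii(list(coeffs) + [0.0] * (n - len(coeffs)))


# A smooth 8-element weight row is captured exactly by 4 coefficients (half the parameters).
row = [math.cos(math.pi * (i + 0.5) / 8) for i in range(8)]
coeffs = dct_ii(row)[:4]
rebuilt = weight_from_coeffs(coeffs, 8)
print(max(abs(a - b) for a, b in zip(row, rebuilt)))  # close to 0 (floating-point error)
```

During training, gradients flow to `coeffs` through the linear inverse transform, so the optimizer only ever updates the compact representation. Rows with significant high-frequency content would lose that detail under this truncation rule.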
[355] DQA: Diagnostic Question Answering for IT Support
- arXiv: 2604.05350 (replaced)
- Authors: Vishaal Kapoor, Mariam Dundua, Sarthak Ahuja, Neda Kordjazi, Evren Yortucboylu, Vaibhavi Padala, Derek Ho, Jennifer Whitted, Rebecca Steinert
- Subjects: cs.CL; cs.AI
- Tags: RAG, Question Answering
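- Summary: This paper proposes DQA, a diagnostic question-answering framework for enterprise IT support. By maintaining a diagnostic state and aggregating retrieved cases, it addresses the weaknesses of standard multi-turn RAG systems in accumulating evidence and resolving hypotheses. It substantially improves troubleshooting success rates while reducing the number of interaction turns.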
[356] Distributional Open-Ended Evaluation of LLM Cultural Value Alignment Based on Value Codebook
- arXiv: 2604.06210 (replaced)
- Authors: Jaehyeok Lee, Xiaoyuan Yi, Jing Yao, Hyunjin Hwang, Roy Ka-Wei Lee, Xing Xie, JinYeong Bak
- Subjects: cs.CL; cs.AI; cs.CY; cs.LG
- Tags: LLM Alignment, LLM Evaluation
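- Summary: This paper proposes DOVE, a framework for evaluating the cultural value alignment of large language models. It builds a value codebook and uses optimal transport to compare the distributions of human-written text and model outputs. This addresses the limitations of existing benchmarks with respect to subcultural heterogeneity and open-ended generation.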
[357] Blockchain and AI: Securing Intelligent Networks for the Future
- arXiv: 2604.06323 (replaced)
- Authors: Joy Dutta, Hossien B. Eldeeb, Tu Dac Ho
- Subjects: cs.CR; cs.AI
- Tags: Cybersecurity, Blockchain
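- Summary: This paper surveys the combination of blockchain and AI for securing intelligent networks and proposes a taxonomy, integration patterns, and a security evaluation blueprint (BASE). It reviews the current state of domains such as IoT and critical infrastructure and outlines future research directions, with the aim of serving as a reference for building secure, transparent intelligent networks.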
[358] The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?
- arXiv: 2604.06436 (replaced)
- Authors: Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Joel Webb, Blake Gatto
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, Adversarial Robustness
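- Summary: This paper proves that wrapper-based defenses against prompt injection face a "defense trilemma": they cannot simultaneously satisfy continuity, utility preservation, and completeness. Through mathematical proofs and experiments, it identifies the root cause of why such defenses fail and argues that security must instead come from training-time alignment or architectural changes.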
[359] The Detection-Extraction Gap: Models Know the Answer Before They Can Say It
- arXiv: 2604.06613 (replaced)
- Authors: Hanyang Wang, Mingxuan Zhu
- Subjects: cs.CL; cs.AI; cs.IT; cs.LG
- Tags: LLM Inference, LLM Reasoning
- Code: code
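- Summary: This study finds that models keep generating large numbers of redundant tokens after they have already arrived at the answer, a phenomenon it calls the "detection-extraction gap". The paper proposes black-box adaptive early exit (BAEE), which uses free continuations to detect and extract answers, greatly reducing generation cost while maintaining or improving accuracy.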
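As a rough illustration of early exit in general, the sketch below stops generation once the answer extracted at successive checkpoints has stayed the same for `patience` consecutive checks. This is a generic answer-stability heuristic and an assumption on our part. It is not BAEE's free-continuation procedure. The per-checkpoint answers are a hypothetical pre-computed list (None means no answer could be extracted yet).

```python
from typing import Optional, Sequence


def early_exit_step(answers: Sequence[Optional[str]], patience: int = 2) -> Optional[int]:
    """Return the first checkpoint index at which the extracted answer has been
    identical for `patience` consecutive checkpoints, or None if that never happens."""
    streak, prev = 0, None
    for i, ans in enumerate(answers):
        if ans is not None and ans == prev:
            streak += 1
        else:
            streak = 1 if ans is not None else 0
        prev = ans
        if streak >= patience:
            return i
    return None


# Extracted answers at five evenly spaced points of a reasoning trace (hypothetical).
trace = [None, "12", "15", "15", "15"]
print(early_exit_step(trace))              # 3
print(early_exit_step(trace, patience=3))  # 4
```

Every token generated after the returned checkpoint is potential savings. The paper's observation is that this gap between knowing and stating the answer can be large.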
[360] WRAP++: Web discoveRy Amplified Pretraining
- arXiv: 2604.06829 (replaced)
- Authors: Jiang Zhou, Yunhao Wang, Xing Wu, Tinghao Yu, Feng Zhang
- Subjects: cs.CL; cs.AI
- Tags: Pre-training, Data Synthesis
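- Summary: This paper proposes WRAP++, which uses web hyperlinks to discover cross-document relations and synthesizes joint question-answer pairs to augment LLM pretraining data. Experiments show that this cross-document knowledge discovery and amplification approach produces high-quality synthetic data and substantially improves performance on factual question-answering tasks.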