arXiv cs.AI Daily Update
cs.AI, April 15, 2026: 339 paper updates in total:
- 70 new submissions: LLM Agent (Aethon [17], EMBER [21], GAM [33]), Benchmark (Frontier-Eng [34], MISID [55], [5]), LLM Evaluation (GoodPoint [3], RPRA [50], [5]), Multi-Agent System (Aethon [17], CIA [41], [4]), Memory Architecture (Aethon [17], EMBER [21], GAM [33])
- 129 cross-lists: LLM Agent (AutoSurrogate [87], AnyPoC [89], SIR-Bench [97]), Vision-Language Model (INDOTABVQA [90], EgoEsportsQA [130], IAD-Unify [143]), Medical AI (DBGL [80], OpenTME [102], [76]), LLM Reasoning (KG-Reasoner [151], CoDe-R [190], [85]), LLM Inference (SpecBound [121], CascadeDebate [125], Local-Splitter [127])
- 140 replacements: LLM Agent (MGA [210], PrivacyReasoner [217], WebFactory [219]), Reinforcement Learning (WebFactory [219], EvoNash-MARL [231], RationalRewards [235]), LLM Reasoning (DecompSR [212], AdaMCoT [241], ChemDFM-R [261]), LLM Evaluation (SEA-Eval [227], Silo-Bench [300], [201]), Benchmark (DecompSR [212], SEA-Eval [227], Silo-Bench [300])
Overall trend: today's papers focus mainly on LLM Agent, LLM Reasoning, and LLM Evaluation.
Accepted papers: [11](AIED 2026), [12](AAMAS 2026), [26](Nature Cities 2026), [27](ACL 2026), [37](ACL 2026), [41](ACL 2026), [44](WCCI 2026), [45](CHI 2026 Workshop), [46](ACL 2026), [47](ICSE 2026 Workshop), [53](Journal of Manufacturing Systems 2026), [54](Robotics and Computer-Integrated Manufacturing 2026), [58](CVPR 2026), [67](AAAI 2025 Workshop), [77](ACL 2026), [79](ACL 2026 Workshop), [86](Alife 2026), [90](ACL 2026), [92](CVPR 2026 Workshop), [96](ACL 2026), [100](AISTATS 2026), [114](ICRA 2026), [116](ACL 2026), [120](TMLR 2026), [121](ACL 2026), [131](AISTATS 2026), [132](GECCO 2026), [133](ApJS), [137](ACL 2026 Findings), [139](CVPR 2026), [147](LREC-COLING 2026), [152](AI & Society 2026), [153](ICLR 2026), [155](CVPR 2026), [157](CVPRW 2026), [159](CVPR 2026), [165](CVPR 2025), [166](ACL 2026), [167](IEEE OCEANS 2026), [172](IEEE TPAMI), [175](CVPR 2026 Workshop), [177](ICRA 2026), [183](CVPR 2026 Workshop), [184](ICLR 2026 Workshop), [186](ISBI 2026), [190](IJCNN 2026), [191](ACL 2026 Findings), [193](ACL 2026), [200](ICLR 2026), [203](FCS 2026), [206](ICLR 2026), [207](ACL 2026), [208](ESANN 2026), [209](ICLR 2026), [221](ACL 2026), [225](ACL 2026), [237](CAI 2023), [241](AAAI 2026), [247](EACL 2026), [251](NeurIPS 2025), [252](ICLR 2026), [253](ICLR 2026), [262](GECCO 2026), [264](ICLR 2026), [266](ACL 2026), [267](ACL 2026 Findings), [268](ICLR 2026), [269](EMNLP 2025 Findings), [277](ACL 2026 Findings), [278](NeurIPS 2025), [280](ICLR 2026), [283](CITA 2026), [284](CVPR 2026 Workshop), [285](ESANN 2026), [288](MobiSys 2026), [289](MIDL 2026), [290](ACL 2026), [294](ACL 2026), [297](ECOOP 2026 Workshop), [300](ACL 2026), [303](ICLR 2026), [309](IJCNN 2026), [312](CVPR 2026 Workshop), [320](ACL 2026 Findings), [322](ICPR 2026), [325](IEEE CAI 2026), [328](CVPR 2026 Workshop), [333](CVPR 2026 Workshop), [335](ACL 2026)
Papers with code: [7](code), [14](code), [28](code), [46](code), [49](code), [78](code), [79](code), [82](code), [91](code), [92](code), [108](code), [118](code), [120](code), [122](code), [137](code), [138](code), [139](code), [151](code), [155](code), [156](code), [157](code), [169](code), [174](code), [176](code), [188](code), [190](code), [191](code), [192](code), [197](code), [209](code), [210](code), [234](code), [238](code), [240](code), [253](code), [267](code), [274](code), [278](code), [285](code), [288](code), [290](code), [291](code), [298](code), [300](code), [303](code), [308](code), [309](code), [320](code), [335](code)
New submissions (70)
[1] The Non-Optimality of Scientific Knowledge: Path Dependence, Lock-In, and The Local Minimum Trap
- arXiv: 2604.11828
- Authors: Mohamed Mabrok
- Subjects: cs.AI; cs.CY; math.OC
- Tags: Philosophy of Science, Cognitive Science
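- Summary: This paper argues that scientific knowledge at any point in history represents a local rather than a global optimum, shaped by historical contingency, cognitive path dependence, and institutional lock-in. Drawing an analogy to gradient descent in machine learning, the author identifies three lock-in mechanisms and proposes intervention strategies for escaping local optima.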
[2] Self-Monitoring Benefits from Structural Integration: Lessons from Metacognition in Continuous-Time Multi-Timescale Agents
- arXiv: 2604.11914
- Authors: Ying Xie
- Subjects: cs.AI
- Tags: Reinforcement Learning, LLM Agent, Cognitive Science
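- Summary: This paper studies how self-monitoring modules (metacognition, self-prediction, subjective duration) affect continuous-time, multi-timescale reinforcement learning agents. Experiments show that structurally integrating these modules works better than attaching them as add-ons, but the self-monitoring content itself does not significantly outperform a no-monitoring baseline.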
[3] GoodPoint: Learning Constructive Scientific Paper Feedback from Author Responses
- arXiv: 2604.11924
- Authors: Jimin Mun, Chani Jung, Xuhui Zhou, Hyunwoo Kim, Maarten Sap
- Subjects: cs.AI; cs.CL
- Tags: Text Generation, LLM Evaluation
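- Summary: This paper proposes GoodPoint, a method for generating constructive feedback on scientific papers. It builds a dataset of ICLR papers and uses author-response signals for fine-tuning and preference optimization, achieving the best performance among comparable models on a feedback-matching task.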
[4] Narrative-Driven Paper-to-Slide Generation via ArcDeck
- arXiv: 2604.11969
- Authors: Tarik Can Ozden, Sachidanand VS, Furkan Horoz, Ozgur Kara, Junho Kim, James Matthew Rehg
- Subjects: cs.AI
- Tags: Multi-Agent System, LLM Agent, Document Understanding
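- Summary: This paper proposes ArcDeck, a framework that casts paper-to-slide generation as a structured narrative reconstruction problem. By building a discourse tree and a global commitment document, combined with a multi-agent iterative refinement process, it significantly improves the narrative flow and logical coherence of the generated presentations.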
[5] The Long-Horizon Task Mirage? Diagnosing Where and Why Agentic Systems Break
- arXiv: 2604.11978
- Authors: Xinyu Jessica Wang, Haoyue Bai, Yiyou Sun, Haorui Wang, Shuibai Zhang, Wenjie Hu, Mya Schroder, Bilge Mutlu, Dawn Song, Robert D Nowak
- Subjects: cs.AI
- Tags: LLM Agent, LLM Evaluation, Benchmark
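- Summary: This paper introduces HORIZON, a benchmark for systematically analyzing how LLM agents fail on long-horizon tasks. By evaluating state-of-the-art agents from several model families and proposing a trajectory-based LLM-as-a-Judge attribution pipeline, it reveals patterns of long-horizon performance degradation.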
[6] When to Forget: A Memory Governance Primitive
- arXiv: 2604.12007
- Authors: Baris Simsek
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture
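- Summary: This paper proposes Memory Worth (MW), a metric for quality-governance decisions in agent memory systems. By tracking how often a memory co-occurs with successful versus failed outcomes, the metric provides a theoretical basis, with empirical validation, for staleness detection, retrieval suppression, and deprecation decisions.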
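As a rough illustration of the co-occurrence idea in the Memory Worth entry, the sketch below keeps per-memory success and failure counts and turns them into a smoothed success ratio. The class name, the Laplace-style prior, and the suppression threshold are assumptions made for this example; the digest does not give the paper's actual MW formula.

```python
from collections import defaultdict


class MemoryWorthTracker:
    """Hypothetical tracker: counts how often each memory co-occurs with
    successful or failed task outcomes (not the paper's exact definition)."""

    def __init__(self, prior: float = 1.0):
        self.prior = prior                # assumed Laplace-style smoothing
        self.success = defaultdict(int)   # memory id -> successes seen
        self.failure = defaultdict(int)   # memory id -> failures seen

    def record(self, memory_ids, succeeded: bool) -> None:
        """Credit every memory retrieved for a task with that task's outcome."""
        counts = self.success if succeeded else self.failure
        for memory_id in memory_ids:
            counts[memory_id] += 1

    def worth(self, memory_id) -> float:
        """Smoothed fraction of outcomes involving this memory that succeeded."""
        s = self.success[memory_id]
        f = self.failure[memory_id]
        return (s + self.prior) / (s + f + 2 * self.prior)

    def should_suppress(self, memory_id, threshold: float = 0.3,
                        min_uses: int = 3) -> bool:
        """Flag a memory for retrieval suppression once it has enough
        evidence and its worth has dropped below the (assumed) threshold."""
        uses = self.success[memory_id] + self.failure[memory_id]
        return uses >= min_uses and self.worth(memory_id) < threshold


tracker = MemoryWorthTracker()
tracker.record(["a", "b"], succeeded=True)
for _ in range(4):
    tracker.record(["b"], succeeded=False)

print(tracker.worth("a"), tracker.worth("b"), tracker.should_suppress("b"))
```

The `min_uses` guard keeps a memory from being suppressed on the strength of one or two unlucky outcomes.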
[7] Identity as Attractor: Geometric Evidence for Persistent Agent Architecture in LLM Activation Space
- arXiv: 2604.12016
- Authors: Vladimir Vasilenko
- Subjects: cs.AI; cs.LG
- Tags: LLM Agent, Interpretability, Representation Learning
- Code: code
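- Summary: This paper investigates whether the identity documents of persistent cognitive agents exhibit attractor dynamics in LLM activation space. Experiments show that semantically complete identity documents form tight clusters in hidden-state space, providing representational evidence that agent identity induces attractor geometry.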
[8] A longitudinal health agent framework
- arXiv: 2604.12019
- Authors: Georgianna, Rencong Jiang, Noémie Elhadad, Xuhai "Orson" Xu
- Subjects: cs.AI; cs.HC
- Tags: LLM Agent, Medical AI
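- Summary: This paper proposes a multi-layer framework for AI agents that support longitudinal health interactions, enabling adaptation, coherence, continuity, and agency in tasks such as symptom management, behavior change, and patient support. Use cases illustrate how longitudinal agents can sustain meaningful engagement and support safe, personalized decision-making.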
[9] WiseOWL: A Methodology for Evaluating Ontological Descriptiveness and Semantic Correctness for Ontology Reuse and Ontology Recommendations
- arXiv: 2604.12025
- Authors: Aryan Singh Dalal, Maria Baloch, Asiyah Yu Lin, Anna Maria Masci, Kathleen M. Jagodnik, Hande Kucuk McGinty
- Subjects: cs.AI
- Tags: Knowledge Graph, Knowledge Representation
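- Summary: This paper proposes WiseOWL, a methodology for evaluating ontologies for selection and reuse. It scores OWL ontologies on four metrics (documentation coverage, label-definition alignment, structural interconnectedness, and hierarchical balance) and provides actionable feedback.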
[10] Memory as Metabolism: A Design for Companion Knowledge Systems
- arXiv: 2604.12034
- Authors: Stefan Miteski
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture, AI Safety
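- Summary: This paper proposes a memory governance framework for companion agents, defining five operations (TRIAGE, DECAY, CONTEXTUALIZE, CONSOLIDATE, and AUDIT) to address entrenchment and user-coupled drift in single-user knowledge wikis.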
[11] Mathematics Teachers Interactions with a Multi-Agent System for Personalized Problem Generation
- arXiv: 2604.12066
- Authors: Candace Walkington, Theodora Beauchamp, Fareya Ikram, Merve Koçyiğit Gürbüz, Fangli Xia, Margan Lee, Andrew Lan
- Subjects: cs.AI; cs.CY
- Tags: Multi-Agent System, Education Technology, LLM Agent
- Venue: AIED 2026
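- Summary: This paper studies a multi-agent, teacher-in-the-loop system for generating personalized middle school math problems. After a teacher enters a base problem, four AI agents separately assess mathematical accuracy, authenticity, readability, and realism; the study finds problems with the authenticity and fit of the personalized elements.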
[12] Human-Inspired Context-Selective Multimodal Memory for Social Robots
- arXiv: 2604.12081
- Authors: Hangyeol Kang, Slava Voloshynovskiy, Nadia Magnenat Thalmann
- Subjects: cs.AI
- Tags: Robotics, Memory Architecture, Multimodal Learning
- Venue: AAMAS 2026
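- Summary: This paper proposes a context-selective multimodal memory architecture for social robots that captures and retrieves textual and visual episodic traces, prioritizing moments of high emotional salience or scene novelty. Experiments show that it outperforms existing methods at selective storage and multimodal retrieval.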
[13] LLM-HYPER: Generative CTR Modeling for Cold-Start Ad Personalization via LLM-Based Hypernetworks
- arXiv: 2604.12096
- Authors: Luyi Ma, Wanjia Sherry Zhang, Zezhong Fan, Shubham Thakur, Kai Zhao, Kehui Yao, Ayush Agarwal, Rahul Iyer, Jason Cho, Jianpeng Xu, Evren Korpeoglu, Sushant Kumar, Kannan Achan
- Subjects: cs.AI
- Tags: Recommender System, LLM Inference, Multimodal Learning
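- Summary: This paper proposes LLM-HYPER, a framework that uses an LLM as a hypernetwork to directly generate the parameters of a click-through-rate predictor, addressing ad cold-start in a training-free manner. Using few-shot chain-of-thought prompting and reasoning over multimodal ad content, it significantly outperforms baselines in offline and online experiments.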
[14] Spatial Atlas: Compute-Grounded Reasoning for Spatial-Aware Research Agent Benchmarks
- arXiv: 2604.12102
- Authors: Arun Sharma
- Subjects: cs.AI; cs.CV; cs.LG
- Tags: LLM Agent, LLM Reasoning, Benchmark
- Code: code
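- Summary: This paper introduces the compute-grounded reasoning (CGR) paradigm, which resolves every answerable sub-problem through deterministic computation before the LLM generates anything. The Spatial Atlas system implements this paradigm, combining a spatial scene-graph engine with entropy-guided action selection, and achieves competitive performance on spatial-aware benchmarks.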
[15] The A-R Behavioral Space: Execution-Level Profiling of Tool-Using Language Model Agents in Organizational Deployment
- arXiv: 2604.12116
- Authors: Shasha Yu, Fiona Carroll, Barry L. Bentley
- Subjects: cs.AI; cs.SE
- Tags: LLM Agent, LLM Evaluation, LLM Security
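- Summary: This paper introduces the two-dimensional A-R behavioral space, based on action rate and refusal signals, to characterize the execution-level behavior of tool-using LLM agents. It shows that execution and refusal are separable behavioral dimensions whose joint distribution varies systematically across contextual framings and autonomy levels.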
[16] Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching
- arXiv: 2604.12126
- Authors: Rongzhe Wei, Ge Shi, Min Cheng, Na Zhang, Pan Li, Sarthak Ghosh, Vaibhav Gorde, Leman Akoglu
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Benchmark, Tool Learning
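- Summary: This paper introduces SLATE, a large-scale tool benchmark, and the Entropy-Guided Branching (EGB) algorithm for long-horizon planning in large tool spaces. EGB dynamically expands decision branches in regions of high predictive entropy to optimize the exploration-exploitation trade-off, significantly improving task success rate and computational efficiency.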
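The branching rule described in the EGB entry can be pictured with a small sketch: compute the Shannon entropy of the planner's next-tool distribution and widen the search only when that entropy is high. The function names, the fixed threshold, and the top-k expansion are assumptions for illustration; the digest does not describe the paper's actual branching criterion.

```python
import math


def shannon_entropy(probs) -> float:
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_branches(tool_probs: dict, entropy_threshold: float = 0.5,
                    max_branches: int = 3) -> list:
    """Return the tools to expand at this step.

    Low entropy (the planner is confident): follow the single best tool.
    High entropy (the planner is unsure): expand the top `max_branches` tools.
    The threshold and branch width are hypothetical tuning knobs.
    """
    ranked = sorted(tool_probs.items(), key=lambda kv: kv[1], reverse=True)
    entropy = shannon_entropy(tool_probs.values())
    width = max_branches if entropy > entropy_threshold else 1
    return [name for name, _ in ranked[:width]]


confident = {"search": 0.97, "calc": 0.02, "email": 0.01}
uncertain = {"search": 0.40, "calc": 0.35, "email": 0.25}

print(select_branches(confident))
print(select_branches(uncertain))
```

Spending extra branches only at uncertain steps is what keeps the search budget from growing with every step of a long plan.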
[17] Aethon: A Reference-Based Replication Primitive for Constant-Time Instantiation of Stateful AI Agents
- arXiv: 2604.12129
- Authors: Swanand Rao, Kiran Kashalkar, Parvathi Somashekar, Priya Krishnan
- Subjects: cs.AI; cs.AR; cs.DC; cs.MA
- Tags: LLM Agent, Multi-Agent System, Memory Architecture
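- Summary: This paper proposes Aethon, a reference-based replication primitive for near-constant-time instantiation of stateful AI agents. It represents an agent instance as a compositional view over stable definitions, layered memory, and local context overlays, turning instantiation from copying into referencing and thereby lowering creation cost.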
[18] Towards Platonic Representation for Table Reasoning: A Foundation for Permutation-Invariant Retrieval
- arXiv: 2604.12133
- Authors: Willy Carlos Tchuitcheu, Tan Lu, Ann Dooms
- Subjects: cs.AI
- Tags: RAG, Knowledge Representation, Table Reasoning
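- Summary: This paper proposes the Platonic Representation Hypothesis for tables, which holds that a semantically robust latent space for table reasoning must be permutation-invariant. The authors introduce CKA-based diagnostic metrics that expose how fragile existing LLMs' table embeddings are under layout permutations, and propose a structure-aware encoder architecture for table representation learning.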
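For readers unfamiliar with CKA, the sketch below implements standard linear centered kernel alignment in plain Python. One plausible use, assumed here rather than taken from the paper, is to embed the same tables under their original and permuted layouts and compare the two embedding matrices: a score near 1 means the representation geometry is unchanged. The helper names and toy matrices are hypothetical.

```python
import math


def center_columns(matrix):
    """Subtract each column's mean (rows = examples, columns = features)."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def cross_gram_sq_norm(a, b) -> float:
    """Squared Frobenius norm of A^T B, summed over all column pairs."""
    total = 0.0
    for col_a in zip(*a):
        for col_b in zip(*b):
            dot = sum(x * y for x, y in zip(col_a, col_b))
            total += dot * dot
    return total


def linear_cka(x, y) -> float:
    """Linear CKA: ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F) on centered data."""
    xc, yc = center_columns(x), center_columns(y)
    numerator = cross_gram_sq_norm(xc, yc)
    denominator = math.sqrt(cross_gram_sq_norm(xc, xc) * cross_gram_sq_norm(yc, yc))
    return numerator / denominator


# Toy embeddings: 4 tables, 2 embedding dimensions each (made-up numbers).
original = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 3.0]]
# A "permuted layout" embedding that is only a rotation/scaling of the original.
benign = [[2.0 * b, 2.0 * a] for a, b in original]
# A "permuted layout" embedding whose geometry actually changed.
drifted = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [3.0, 0.0]]

print(linear_cka(original, benign))
print(linear_cka(original, drifted))
```

Linear CKA is invariant to orthogonal transformations and isotropic scaling of the features, so it penalizes only genuine changes in how the examples are arranged relative to one another.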
[19] Beyond Factual Grounding: The Case for Opinion-Aware Retrieval-Augmented Generation
- arXiv: 2604.12138
- Authors: Aditya Agrawal, Alwarappan Nakkiran, Darshan Fofadiya, Alex Karlsson, Harsha Aduri
- Subjects: cs.AI; cs.CL; cs.IR
- Tags: RAG, Information Retrieval
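- Summary: This paper points out that existing RAG systems have a factual bias, treating opinions as noise rather than information. The authors propose an opinion-aware RAG architecture comprising LLM-based opinion extraction, an entity-linked opinion graph, and opinion-enriched document indexing, which yields significant gains in retrieval diversity.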
[20] Development, Evaluation, and Deployment of a Multi-Agent System for Thoracic Tumor Board
- arXiv: 2604.12161
- Authors: Tim Ellis-Caleo, Timothy Keyes, Nerissa Ambers, Faraah Bekheet, Wen-wai Yim, Nikesh Kotecha, Nigam H. Shah, Joel Neal
- Subjects: cs.AI
- Tags: Multi-Agent System, Medical AI, Summarization
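- Summary: This paper describes a multi-agent AI system developed for the Stanford thoracic tumor board to generate patient summaries. The authors evaluate several automated AI chart-summarization approaches and report on the deployment and post-deployment monitoring of the final tool.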
[21] EMBER: Autonomous Cognitive Behaviour from Learned Spiking Neural Network Dynamics in a Hybrid LLM Architecture
- arXiv: 2604.12167
- Authors: William Savage
- Subjects: cs.AI; cs.NE
- Tags: Neuromorphic Computing, Memory Architecture, LLM Agent
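- Summary: This paper proposes EMBER, a hybrid cognitive architecture that embeds the LLM as a replaceable reasoning engine within a persistent spiking neural network substrate. The 220,000-neuron SNN learns via STDP and can autonomously trigger LLM actions without external prompts.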
[22] Evaluating Relational Reasoning in LLMs with REL
- arXiv: 2604.12176
- Authors: Lukas Fesser, Yasha Ektefaie, Ada Fang, Sham M. Kakade, Marinka Zitnik
- Subjects: cs.AI
- Tags: LLM Reasoning, Benchmark, LLM Evaluation
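- Summary: This paper introduces REL, a benchmark framework that evaluates the relational reasoning of LLMs through the lens of relational complexity. Experiments show that LLM performance declines monotonically as relational complexity increases, revealing the limits of current models on higher-arity relational reasoning.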
[23] Policy-Invisible Violations in LLM-Based Agents
- arXiv: 2604.12177
- Authors: Jie Wu, Ming Gong
- Subjects: cs.AI; cs.CL; cs.CR; cs.LG
- Tags: LLM Agent, LLM Security, Benchmark
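- Summary: This paper identifies policy-invisible violations in LLM agents, where the facts needed to judge compliance are hidden outside the context available at decision time. The authors propose the PhantomPolicy benchmark and the Sentinel enforcement framework, which enforces policies through counterfactual graph simulation.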
[24] TRUST Agents: A Collaborative Multi-Agent Framework for Fake News Detection, Explainable Verification, and Logic-Aware Claim Reasoning
- arXiv: 2604.12184
- Authors: Gautama Shastry Bulusu Venkata, Santhosh Kakarla, Maheedhar Omtri Mohan, Aishwarya Gaddam
- Subjects: cs.AI
- Tags: Multi-Agent System, Fake News Detection, Explainable AI
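- Summary: This paper proposes TRUST Agents, a collaborative multi-agent framework for explainable fact-checking and fake news detection. The system comprises four specialized agents for claim extraction, retrieval, verification, and explanation, and is extended with complex-claim decomposition and a multi-agent jury mechanism.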
[25] Beyond Scores: Diagnostic LLM Evaluation via Fine-Grained Abilities
- arXiv: 2604.12191
- Authors: Xu Zhang, Xudong Gong, Jiacheng Qin, Qiang Wang, JiaQi Liao, Zhe Wang, Dawei Feng, Bo Ding
- Subjects: cs.AI
- Tags: LLM Evaluation, Benchmark
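- Summary: This paper proposes a cognitive diagnostic framework that uses multidimensional item response theory to estimate LLM ability levels along fine-grained dimensions. It builds ability taxonomies across domains including mathematics, physics, chemistry, and computer science, and accurately predicts performance on unseen items.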
[26] Latent patterns of urban mixing in mobility analysis across five global cities
- arXiv: 2604.12202
- Authors: Z. Fan, B. P. Y. Loo, F. Duarte, C. Ratti, E. Moro
- Subjects: cs.AI; cs.SI
- Tags: Graph Neural Network, Social Network Analysis
- Venue: Nature Cities 2026
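- Summary: This study analyzes large-scale travel survey data from more than 200,000 residents of five global cities to reveal patterns of social mixing. Using graph neural networks to build spatiotemporal place networks, it finds that mobility activity spaces explain variation in social mixing better than sociodemographic characteristics do.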
[27] Beyond Prompt: Fine-grained Simulation of Cognitively Impaired Standardized Patients via Stochastic Steering
- arXiv: 2604.12210
- Authors: Weikang Zhang, Zimo Zhu, Zhichuan Yang, Chen Huang, Wenqiang Lei, See-Kiong Ng
- Subjects: cs.AI; cs.CL
- Tags: Medical AI, Dialogue System, Prompt Engineering
- Venue: ACL 2026
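- Summary: This paper proposes StsPatient, a method for fine-grained simulation of cognitively impaired standardized patients. It extracts steering vectors from contrastive instruction-response pairs and introduces a stochastic token modulation mechanism, enabling precise control over impairment severity.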
[28] Modality-Native Routing in Agent-to-Agent Networks: A Multimodal A2A Protocol Extension
- arXiv: 2604.12213
- Authors: Vasundra Srinivasan
- Subjects: cs.AI
- Tags: Multi-Agent System, Multimodal Learning, LLM Agent
- Code: code
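- Summary: This paper proposes MMA2A, an architecture for modality-native routing in agent-to-agent networks. Experiments show that when the downstream reasoning agent can exploit the rich context preserved by native routing, task accuracy improves by 20 percentage points over a text-bottleneck baseline.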
[29] Designing Reliable LLM-Assisted Rubric Scoring for Constructed Responses: Evidence from Physics Exams
- arXiv: 2604.12227
- Authors: Xiuxiu Tang, G. Alex Ambrose, Ying Cheng
- Subjects: cs.AI; cs.CL
- Tags: LLM Evaluation, Education Technology
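- Summary: This study examines the reliability of AI-assisted scoring of undergraduate physics constructed-response items using GPT-4o. Human-AI agreement is comparable to inter-rater reliability among human graders, and fine-grained, checklist-style rubrics improve agreement more than holistic scoring does.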
[30] HintMR: Eliciting Stronger Mathematical Reasoning in Small Language Models
- arXiv: 2604.12229
- Authors: Jawad Hossain, Xiangyu Guo, Jiawei Zhou, Chong Liu
- Subjects: cs.AI; cs.CL
- Tags: LLM Reasoning, Mathematical Reasoning, Knowledge Distillation
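- Summary: This paper introduces a hint-assisted reasoning framework in which an SLM trained by distillation from a large model generates context-aware hints that guide a small language model step by step through multi-step math problems, significantly improving reasoning accuracy on several math benchmarks.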
[31] How memory can affect collective and cooperative behaviors in an LLM-Based Social Particle Swarm
- arXiv: 2604.12250
- Authors: Taisei Hishiki, Takaya Arita, Reiji Suzuki
- Subjects: cs.AI; cs.CL; cs.GT; cs.MA
- Tags: Multi-Agent System, Social Simulation, LLM Agent
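- Summary: This study uses a social particle swarm model to examine how memory affects the collective and cooperative behavior of LLM agents. Experiments show that memory length is a key parameter governing collective behavior, and that model-specific characteristics (possibly including alignment) play a fundamental role in determining emergent social behavior.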
[32] A Scoping Review of Large Language Model-Based Pedagogical Agents
- arXiv: 2604.12253
- Authors: Shan Li, Juan Zheng
- Subjects: cs.AI
- Tags: Education Technology, Survey, LLM Agent
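- Summary: This review analyzes 52 studies of LLM-based pedagogical agents and identifies four key design dimensions: interaction approach, domain scope, role complexity, and system integration. It discusses emerging trends, research gaps, and ethical considerations.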
[33] GAM: Hierarchical Graph-based Agentic Memory for LLM Agents
- arXiv: 2604.12285
- Authors: Zhaofen Wu, Hanrong Zhang, Fulin Lin, Wujiang Xu, Xinran Xu, Yankai Chen, Henry Peng Zou, Shaowen Chen, Weizhi Zhang, Xue Liu, Philip S. Yu, Hongwei Wang
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture
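- Summary: This paper proposes GAM, a hierarchical graph-based agentic memory framework that explicitly decouples memory encoding from consolidation, resolving the conflict between rapid context awareness and stable knowledge retention in long-term LLM agent interactions. It isolates the ongoing dialogue in an event progression graph, consolidates it into a topic association network only when the semantics shift, and introduces a graph-guided multi-factor retrieval strategy to improve contextual precision.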
[34] Frontier-Eng: Benchmarking Self-Evolving Agents on Real-World Engineering Tasks with Generative Optimization
- arXiv: 2604.12290
- Authors: Yizhe Chi, Deyao Hong, Dapeng Jiang, Tianwei Luo, Kaisen Yang, Boshi Zhang, Zhe Cao, Xiaoyan Fan, Bingxiang He, Han Hao, Weiyang Jin, Dianqiao Lei, Qingle Liu, Houde Qian, Bowen Wang, Situ Wang, Youjie Zheng, Yifan Zhou, Calvin Xiao, Eren Cai, Qinhuai Na
- Subjects: cs.AI; cs.CL
- Tags: LLM Agent, Benchmark
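- Summary: This paper presents Frontier-Eng, a human-verified benchmark for generative optimization covering 47 tasks across five engineering categories, which uses an iterative propose-execute-evaluate loop to assess AI agents on real-world engineering tasks. Experiments show that although Claude 4.6 Opus is the most robust, the benchmark remains challenging for all models, and they reveal a dual power-law decay in improvement frequency and magnitude.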
[35] MultiDocFusion: Hierarchical and Multimodal Chunking Pipeline for Enhanced RAG on Long Industrial Documents
- arXiv: 2604.12352
- Authors: Joongmin Shin, Chanjun Park, Jeongbae Park, Jaehyung Seo, Heuiseok Lim
- Subjects: cs.AI; cs.CL
- Tags: RAG, Document Understanding
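- Summary: This paper proposes MultiDocFusion, a multimodal chunking pipeline that processes long industrial documents through visual document parsing, OCR text extraction, LLM-based hierarchical structure parsing, and DFS grouping. Experiments show gains over baselines of 8-15% in retrieval precision and 2-3% in ANLS QA score, underscoring the importance of explicitly exploiting document hierarchy in RAG systems.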
[36] ReflectCAP: Detailed Image Captioning with Reflective Memory
- arXiv: 2604.12357
- Authors: Kyungmin Min, Minbeom Kim, Kang-il Lee, Seunghyun Yoon, Kyomin Jung
- Subjects: cs.AI; cs.CV
- Tags: Vision-Language Model, Image Captioning
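- Summary: This paper proposes ReflectCAP, a multi-agent pipeline that analyzes the hallucination patterns and systematic omissions of large vision-language models and distills them into reusable, structured reflection notes that guide caption generation. It reaches the Pareto frontier between factuality and coverage, achieves significant gains on the CapArena-Auto benchmark, and incurs 21-36% less computational overhead than existing methods.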
[37] Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints
- arXiv: 2604.12384
- Authors: Songping Peng, Zhiheng Zhang, Daojian Zeng, Lincheng Jiang, Xieping Gao
- Subjects: cs.AI
- Tags: LLM Alignment, LLM Security
- Venue: ACL 2026
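- Summary: This paper proposes CWAC, which prevents safety-alignment drift during LLM fine-tuning by jointly constraining weight updates in the safety subspace and applying targeted regularization to safety-critical features. Theoretical analysis and experiments show that, across four widely used LLMs and a range of downstream tasks, it consistently achieves the lowest harmful score with minimal impact on fine-tuning accuracy.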
[38] Heuristic Classification of Thoughts Prompting (HCoT): Integrating Expert System Heuristics for Structured Reasoning into Large Language Models
- arXiv: 2604.12390
- Authors: Lei Lin, Jizhao Zhu, Yong Liu, Donghong Sun, Hongbo He, Yihua Du
- Subjects: cs.AI
- Tags: LLM Reasoning, Prompt Engineering
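- Summary: This paper proposes Heuristic Classification of Thoughts prompting (HCoT), which combines expert-system heuristics with LLM reasoning, using a heuristic classification model to control the reasoning process and supply reusable abstract solutions. On two complex inductive reasoning tasks, HCoT outperforms existing Tree-of-Thoughts and Chain-of-Thought approaches, and on the Game of 24 it achieves a Pareto-frontier balance between performance and computational cost.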
[39] Operationalising the Right to be Forgotten in LLMs: A Lightweight Sequential Unlearning Framework for Privacy-Aligned Deployment in Politically Sensitive Environments
- arXiv: 2604.12459
- Authors: Esen Kurt, Haithem Afli
- Subjects: cs.AI
- Tags: Machine Unlearning, Privacy
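- Summary: This paper proposes a lightweight sequential unlearning framework for privacy-aligned LLM deployment that explicitly separates retention and suppression objectives: it first stabilizes benign capabilities through positive fine-tuning, then applies layer-restricted negative fine-tuning to suppress sensitive patterns. Experiments show effective behavioral suppression with minimal impact on factual accuracy and fluency.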
[40] Enhancing Clustering: An Explainable Approach via Filtered Patterns
- arXiv: 2604.12460
- Authors: Motaz Ben Hassine, Saïd Jabbour
- Subjects: cs.AI
- Tags: Interpretability, Clustering
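- Summary: This paper addresses the redundancy that arises in explainable clustering when multiple distinct k-RFPs induce the same k-cover, and proposes a pattern-reduction framework. The framework formally characterizes the redundancy conditions and proposes an optimization strategy that retains representative patterns; experiments show that it substantially shrinks the pattern search space and improves computational efficiency while maintaining or improving clustering quality.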
[41] CIA: Inferring the Communication Topology from LLM-based Multi-Agent Systems
- arXiv: 2604.12461
- Authors: Yongxuan Wu, Xixun Lin, He Zhang, Nan Sun, Kun Wang, Chuan Zhou, Shirui Pan, Yanan Cao
- Subjects: cs.AI
- Tags: Multi-Agent System, LLM Security
- Venue: ACL 2026
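- Summary: This paper studies the privacy risk posed by the communication topology of LLM-based multi-agent systems and proposes the Communication Inference Attack (CIA), which infers a system's communication topology in a black-box setting by constructing adversarial queries and modeling semantic correlations. Experiments show that CIA achieves an average AUC of 0.87, peaking at 0.99, on MAS with optimized communication topologies, revealing a significant privacy risk in MAS.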
[42] Intelligent ROI-Based Vehicle Counting Framework for Automated Traffic Monitoring
- arXiv: 2604.12470
- Authors: Mohamed A. Abdelwahab, Zaynab Al-Ariny, Mahmoud Fakhry, El-Sayed Hasaneen
- Subjects: cs.AI
- Tags: Object Detection, Autonomous Driving
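- Summary: This paper proposes a fully automated video-based vehicle counting framework that operates in two phases, estimation and prediction: the estimation phase automatically determines the optimal region of interest, and the prediction phase performs vehicle counting efficiently. It achieves high accuracy and computational efficiency on benchmark datasets such as UA-DETRAC and GRAM, running four times faster than full-frame processing.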
[43] Technical Report -- A Context-Sensitive Multi-Level Similarity Framework for First-Order Logic Arguments: An Axiomatic Study
- arXiv: 2604.12534
- Authors: Victor David, Jérôme Delobelle, Jean-Guy Mailly
- Subjects: cs.AI; cs.LO
- Tags: Knowledge Representation, Logical Reasoning
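- Summary: This paper introduces a comprehensive framework for the similarity of first-order logic arguments, built on an extended axiomatic foundation, a four-level parametric model, and two families of models, with contextual weights enabling fine-grained and interpretable similarity measures. The framework incorporates formal constraints to enforce desirable properties, filling a gap left by existing methods that consider only propositional logic.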
[44] A Two-Stage LLM Framework for Accessible and Verified XAI Explanations
- arXiv: 2604.12543
- Authors: Georgios Mermigkis, Dimitris Metaxakis, Marios Tyrovolas, Argiris Sofotasios, Nikolaos Avgeris, Panagiotis Hadjidoukas, Chrysostomos Stylios
- Subjects: cs.AI
- Tags: Explainable AI, LLM Evaluation
- Venue: WCCI 2026
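- Summary: This paper proposes a two-stage LLM meta-verification framework, comprising an explainer LLM, a verifier LLM, and an iterative feedback mechanism, that converts the outputs of XAI techniques into accessible natural-language explanations and verifies them. Experiments show that verification is essential for filtering out unreliable explanations, and an entropy production rate analysis indicates that the verifier's feedback progressively steers the explainer toward more stable reasoning.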
[45] Cross-Cultural Simulation of Citizen Emotional Responses to Bureaucratic Red Tape Using LLM Agents
- arXiv: 2604.12545
- Authors: Wanchun Ni, Jiugeng Sun, Yixian Liu, Mennatallah El-Assady
- Subjects: cs.AI; cs.CY
- Tags: LLM Agent, Social Simulation
- Venue: CHI 2026 Workshop
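- Summary: This paper proposes an evaluation framework for assessing how well LLMs simulate emotional responses to bureaucratic red tape across cultural contexts, and introduces RAMO, an interactive interface for simulating citizens' emotional responses and collecting human data. A pilot study shows that all models align only weakly with human emotional responses, especially in Eastern cultures, and that cultural prompting strategies do little to improve alignment.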
[46] IDEA: An Interpretable and Editable Decision-Making Framework for LLMs via Verbal-to-Numeric Calibration
- arXiv: 2604.12573
- Authors: Yanji He, Yuxin Jiang, Yiwen Wu, Bo Huang, Jiaheng Wei, Wei Wang
- Subjects: cs.AI
- Tags: Decision Making, Interpretability
- Venue: ACL 2026
- Code: code
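- Summary: This paper proposes the IDEA framework, which extracts an LLM's decision knowledge into an interpretable parametric model over semantic factors and uses the EM algorithm to jointly learn the verbal-to-numeric mapping and the decision parameters, yielding calibrated probabilities and supporting quantitative human-AI collaboration. Experiments show that IDEA with Qwen-3-32B outperforms DeepSeek R1 and GPT-5.2 on five datasets, achieving perfect factor exclusion and precise calibration.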
[47] DeepTest Tool Competition 2026: Benchmarking an LLM-Based Automotive Assistant
- arXiv: 2604.12615
- Authors: Lev Sorokin, Ivan Vasilev, Samuele Pasini
- Subjects: cs.AI
- Tags: Software Testing, Benchmark
- Venue: ICSE 2026 Workshop
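- Summary: This report summarizes the results of the first LLM testing competition, held as part of the DeepTest workshop at ICSE 2026. Four tools competed on an LLM-based application for retrieving information from automotive manuals; the goal was to find user inputs for which the system fails to mention the manual's warnings appropriately, and tools were evaluated on their effectiveness at exposing failures and on the diversity of their failure-revealing tests.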
[48] Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs
- arXiv: 2604.12616
- Authors: Jianhao Chen, Haoyang Chen, Hanjie Zhao, Haozhe Liang, Tieyun Qian
- Subjects: cs.AI; cs.MM
- Tags: Vision-Language Model, LLM Security, Adversarial Robustness
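- Summary: This paper proposes MemJack, a memory-augmented multi-agent jailbreak framework that uses visual-semantic coordination to automate jailbreak attacks: multiple agents cooperate to map visual entities to malicious intents, and an iterative null-space projection geometric filter is used to bypass refusals in latent space. Experiments show that MemJack reaches a 71.48% attack success rate against Qwen3-VL-Plus; the authors also release the MemJack-Bench dataset of more than 113,000 interaction trajectories.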
[49] KnowRL: Boosting LLM Reasoning via Reinforcement Learning with Minimal-Sufficient Knowledge Guidance
- arXiv: 2604.12627
- Authors: Linhao Yu, Tianmeng Yang, Siyu Ding, Renren Jin, Naibin Gu, Xiangzhao Hao, Shuaiyi Nie, Deyi Xiong, Weichong Yin, Yu Sun, Hua Wu
- Subjects: cs.AI
- Tags: LLM Reasoning, Reinforcement Learning
- Code: code
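- Summary: This paper proposes the KnowRL framework, which improves LLM reasoning by decomposing hints into atomic knowledge points and using constrained subset search to address reward sparsity in reinforcement learning. Experiments show significant performance gains on 1.5B-scale models.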
[50] RPRA: Predicting an LLM-Judge for Efficient but Performant Inference
- arXiv: 2604.12634
- Authors: Dylan R. Ashley, Gaël Le Lan, Changsheng Zhao, Naina Dhingra, Zhipeng Cai, Ernie Chang, Mingchen Zhuge, Yangyang Shi, Vikas Chandra, Jürgen Schmidhuber
- Subjects: cs.AI; cs.CL; cs.LG; cs.MA
- Tags: LLM Inference, LLM Evaluation
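- Summary: This paper studies a paradigm in which a model predicts the quality of its own output to decide whether to request help, aiming to balance computational efficiency and output quality. It finds that small models, when fine-tuned or given in-context report cards, can reliably predict their own performance limitations.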
[51] Broadening the Applicability of Conditional Syntax Splitting for Reasoning from Conditional Belief Bases
- arXiv: 2604.12660
- Authors: Lars-Phillip Spiegel, Jonas Haldimann, Jesse Heyninck, Gabriele Kern-Isberner, Christoph Beierle
- Subjects: cs.AI
- Tags: Knowledge Representation, Logical Reasoning
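- Summary: This paper proposes a generalized notion of safe conditional syntax splitting that relaxes the requirement that sub-bases have disjoint signatures, broadening the applicability of inference operators. The new notion overcomes limitations of earlier splitting concepts and comes with adjusted inference postulates.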
[52] Human-Centric Topic Modeling with Goal-Prompted Contrastive Learning and Optimal Transport
- arXiv: 2604.12663
- Authors: Rui Wang, Yi Zheng, Dongxin Wang, Haiping Huang, Yuanzhi Yao, Yuxiang Zhou, Jialin Yu, Philip Torr
- Subjects: cs.AI
- Tags: Topic Modeling, Optimal Transport
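- Summary: This paper proposes a human-centric topic modeling task that incorporates user goals to produce more interpretable, goal-oriented topics. The authors propose the GCTM-OT model, which uses large language models to extract goals and combines optimal transport with contrastive learning, significantly improving topic coherence and diversity.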
[53] Safe reinforcement learning with online filtering for fatigue-predictive human-robot task planning and allocation in production
- arXiv: 2604.12667
- Authors: Jintao Xue, Xiao Li, Nianmin Zhang
- Subjects: cs.AI
- Tags: Reinforcement Learning, Robotics, Manufacturing AI
- Venue: Journal of Manufacturing Systems 2026
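- Summary: This paper proposes PF-CD3Q, a safe reinforcement learning method for task planning and allocation in human-robot collaborative manufacturing. It combines a particle filter with constrained deep Q-learning to predict worker fatigue in real time and restrict unsafe actions, safeguarding both production efficiency and worker safety.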
[54] A hierarchical spatial-aware algorithm with efficient reinforcement learning for human-robot task planning and allocation in production
- arXiv: 2604.12669
- Authors: Jintao Xue, Xiao Li, Nianmin Zhang
- Subjects: cs.AI
- Tags: Reinforcement Learning, Robotics, Manufacturing AI
- Venue: Robotics and Computer-Integrated Manufacturing 2026
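- Summary: This paper proposes a hierarchical algorithm for human-robot task planning and allocation, with a high-level agent for task planning and a low-level agent for task allocation. Using buffer-based deep Q-learning and path planning, it addresses spatial awareness and sparse long-term rewards in dynamic manufacturing environments.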
[55] MISID: A Multimodal Multi-turn Dataset for Complex Intent Recognition in Strategic Deception Games
- arXiv: 2604.12700
- Authors: Shufang Lin, Muyang Chen, Xiabing Zhou, Rongrong Zhang, Dayou Zhang, Fangxin Wang
- Subjects: cs.AI
- Tags: Multimodal Learning, Intent Recognition, Benchmark
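- Summary: This paper introduces MISID, a multimodal, multi-turn dataset for complex intent recognition in strategic deception games. The authors propose the FRACTAM framework, which uses a decouple-anchor-reason paradigm to improve model performance on long-context and cross-modal causal reasoning tasks.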
[56] Transferable Expertise for Autonomous Agents via Real-World Case-Based Learning
- arXiv: 2604.12717
- Authors: Zhenyu Ma, Yuyang Song, Chunyi Yang, Jingyi Zhu, Letian Yang, Xukai Jiang
- Subjects: cs.AI
- Tags: LLM Agent, Transfer Learning
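- Summary: This paper proposes a case-based learning framework that lets autonomous agents turn experience from past tasks into reusable knowledge assets. Experiments show that the approach performs well on complex tasks and that the practical knowledge acquired transfers across agents.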
[57] Can AI Tools Transform Low-Demand Math Tasks? An Evaluation of Task Modification Capabilities
- arXiv: 2604.12743
- Authors: Danielle S. Fox, Brenda L. Robles, Elizabeth DiPietro Brovey, Christian D. Schunn
- Subjects: cs.AI
- Tags: Education Technology, Mathematical Reasoning
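- Summary: This study evaluates how well AI tools can upgrade low-cognitive-demand math tasks and finds that both general-purpose and specialized tools achieve only moderate success rates. It also reports a negative correlation between task-modification ability and task-classification ability, highlighting both the potential and the limits of AI in curriculum adaptation.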
[58] DocSeeker: Structured Visual Reasoning with Evidence Grounding for Long Document Understanding
- arXiv: 2604.12812
- Authors: Hao Yan, Yuliang Liu, Xingchen Liu, Yuyi Zhang, Minghui Liao, Jihao Wu, Wei Chen, Xiang Bai
- Subjects: cs.AI
- Tags: Document Understanding, Multimodal Learning, LLM Reasoning
- Venue: CVPR 2026
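- Summary: This paper proposes DocSeeker, a structured visual reasoning framework for long document understanding that follows an "analyze, locate, reason" workflow to cope with low signal-to-noise ratios and scarce supervision. It combines supervised fine-tuning with evidence-aware policy optimization and achieves superior performance on long-document tasks.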
[59] RePAIR: Interactive Machine Unlearning through Prompt-Aware Model Repair
- arXiv: 2604.12820
- Authors: Jagadeesh Rachapudi, Pranav Singh, Ritali Vatsi, Praful Hambarde, Amit Shukla
- Subjects: cs.AI; cs.CL
- Tags: Machine Unlearning, LLM Alignment
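- Summary: This paper proposes an interactive machine unlearning paradigm that lets users instruct a large language model, in natural language, to forget specific knowledge. The authors design the RePAIR framework, which uses training-free activation manipulation to repair the model and remove knowledge efficiently and effectively.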
[60] Artificial Intelligence for Modeling and Simulation of Mixed Automated and Human Traffic
- arXiv: 2604.12857
- Authors: Saeed Rahmani, Shiva Rasouli, Daphne Cornelisse, Eugene Vinitsky, Bart van Arem, Simeon C. Calvert
- Subjects: cs.AI; cs.RO; eess.SY
- Tags: Autonomous Driving, Survey, Traffic Simulation
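- Summary: This paper surveys the use of artificial intelligence for modeling and simulating mixed automated and human-driven traffic, covering methods that range from individual behavior modeling to full-scene simulation. It proposes a new taxonomy, analyzes the shortcomings of existing simulation platforms, and outlines directions for future research.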
[61] From edges to meaning: Semantic line sketches as a cognitive scaffold for ancient pictograph invention
- arXiv: 2604.12865
- Authors: Seowung Leem, Lin Gu, Ruogu Fang
- Subjects: cs.AI
- Tags: Cognitive Science, Image Synthesis, Cultural Heritage
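- Summary: This study builds a biologically inspired digital-twin model of the visual hierarchy to simulate how the brain turns semantic knowledge into visual symbols. The generated symbols show striking structural similarity to early pictographs such as Egyptian hieroglyphs and oracle bone script, supporting a neuro-computational account of the origin of pictographs.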
[62] QuarkMedSearch: A Long-Horizon Deep Search Agent for Exploring Medical Intelligence
- arXiv: 2604.12867
- Authors: Zhichao Lin, Zhichao Liang, Gaoqiang Liu, Meng Xu, Baoyu Xiang, Jian Xu, Guanjun Jiang
- Subjects: cs.AI
- Tags: LLM Agent, Medical AI, Information Retrieval
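- Summary: This paper proposes QuarkMedSearch, a long-horizon deep search agent for Chinese-language medical scenarios. It builds training data by combining a medical knowledge graph with real-time online exploration, and a two-stage training strategy significantly improves the model's search and reasoning in the medical vertical.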
[63] LIFE -- an energy efficient advanced continual learning agentic AI framework for frontier systems
- arXiv: 2604.12874
- Authors: Anne Lee, Gurudutt Hosangadi
- Subjects: cs.AI
- Tags: LLM Agent, Continual Learning, High Performance Computing
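- Summary: This paper proposes LIFE, an energy-efficient, continual-learning agentic AI framework for managing high-performance computing systems. The framework combines an orchestrator, agentic context engineering, and a novel memory system to enable self-evolving network management and operations.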
[64] AISafetyBenchExplorer: A Metric-Aware Catalogue of AI Safety Benchmarks Reveals Fragmented Measurement and Weak Benchmark Governance
- arXiv: 2604.12875
- Authors: Abiodun A. Solanke
- Subjects: cs.AI
- Tags: AI Safety, Benchmark, LLM Evaluation
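- Summary: This paper releases AISafetyBenchExplorer, a structured catalogue of 195 AI safety benchmarks, to expose the fragmented measurement and weak governance in current safety evaluation. It finds that, despite the large number of benchmarks, the field lacks a shared measurement language and maintenance standards.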
[65] BEAM: Bi-level Memory-adaptive Algorithmic Evolution for LLM-Powered Heuristic Design
- arXiv: 2604.12898
- Authors: Chuyang Xiang, Yichen Wei, Jiale Ma, Handing Wang, Junchi Yan
- Subjects: cs.AI; math.CO
- Tags: LLM Agent, Neural Combinatorial Optimization, Program Synthesis
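- Summary: This paper proposes BEAM, a bi-level optimization framework for LLM-powered heuristic design. The outer level evolves high-level algorithm structures with a genetic algorithm, while the inner level fills in function placeholders via Monte Carlo tree search; it significantly outperforms existing methods on optimization problems such as CVRP and MIS.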
[66] Drawing on Memory: Dual-Trace Encoding Improves Cross-Session Recall in LLM Agents
- arXiv: 2604.12948
- Authors: Benjamin Stern, Peter Nadel
- Subjects: cs.AI
- Tags: LLM Agent, Memory Architecture
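- Summary: This paper proposes dual-trace memory encoding, which pairs each stored fact with a concrete scene trace to improve cross-session recall in LLM agents. On the LongMemEval-S benchmark it improves on fact-only encoding by 20.2 percentage points, with especially strong results on temporal reasoning and knowledge-update tracking.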
[67] Modeling Co-Pilots for Text-to-Model Translation
- arXiv: 2604.12955
- Authors: Serdar Kadioglu, Karthik Uppuluri, Akash Singirikonda
- Subjects: cs.AI
- Tags: Program Synthesis, LLM Reasoning, Optimization
- Venue: AAAI 2025 Workshop
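- Summary: This paper introduces the Text2Model suite of co-pilots and the Text2Zinc dataset for translating natural-language descriptions of optimization and satisfaction problems into formal models. The framework uses the solver-agnostic MiniZinc modeling language and evaluates strategies including zero-shot prompting, chain-of-thought reasoning, and agentic decomposition.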
[68] Cycle-Consistent Search: Question Reconstructability as a Proxy Reward for Search Agent Training
- arXiv: 2604.12967
- Authors: Sohyun An, Shuibenyang Yuan, Hayeon Lee, Cho-Jui Hsieh, Alexander Min
- Subjects: cs.AI
- Tags: Reinforcement Learning, Information Retrieval, Question Answering
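- Summary: This paper proposes Cycle-Consistent Search (CCS), a framework for training search agents without gold supervision. The core idea is that a high-quality search trajectory should retain enough information to reconstruct the original question, which supplies a reward signal for policy optimization; on question-answering benchmarks it performs on par with supervised baselines.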
[69] Bilevel Late Acceptance Hill Climbing for the Electric Capacitated Vehicle Routing Problem
- arXiv: 2604.13013
- Authors: Yinghao Qin, Mosab Bazargani, Edmund K. Burke, Carlos A. Coello Coello, Zhongmin Song, Jun Chen
- Subjects: cs.AI; math.OC
- Tags: Optimization, Neural Combinatorial Optimization
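- Summary: This paper proposes a bilevel optimization framework for the Electric Capacitated Vehicle Routing Problem (E-CVRP) that handles routing and charging decisions either separately or jointly. The b-LAHC algorithm performs well on the IEEE WCCI-2020 benchmark, setting new best results on 9 of 10 large-scale instances.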
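For context on the acceptance rule that b-LAHC builds on, here is a minimal sketch of plain, single-level Late Acceptance Hill Climbing on a toy ordering problem. It is not the paper's bilevel algorithm and does not model routing or charging; the toy cost function, the swap move, and all parameter values are assumptions chosen to keep the example short.

```python
import random


def late_acceptance_hill_climbing(initial, cost, neighbor,
                                  history_length=20, iterations=2000, seed=0):
    """Basic LAHC: accept a candidate if it is no worse than the current
    solution OR no worse than the cost recorded `history_length` steps ago."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    history = [current_cost] * history_length   # circular buffer of past costs
    for i in range(iterations):
        candidate = neighbor(current, rng)
        candidate_cost = cost(candidate)
        slot = i % history_length
        if candidate_cost <= current_cost or candidate_cost <= history[slot]:
            current, current_cost = candidate, candidate_cost
        if current_cost < best_cost:
            best, best_cost = current, current_cost
        history[slot] = current_cost             # record the cost for later comparison
    return best, best_cost


def tour_cost(order):
    """Toy objective: total distance walking through points on a line."""
    return sum(abs(a - b) for a, b in zip(order, order[1:]))


def swap_two(order, rng):
    """Neighbor move: swap two randomly chosen positions."""
    i, j = rng.sample(range(len(order)), 2)
    new_order = list(order)
    new_order[i], new_order[j] = new_order[j], new_order[i]
    return new_order


start = [5, 1, 4, 2, 3, 0]
best, best_cost = late_acceptance_hill_climbing(start, tour_cost, swap_two)
print(tour_cost(start), best, best_cost)
```

Comparing against a delayed cost rather than only the current one lets the search accept some worsening moves early on, which helps it leave shallow local optima without a temperature schedule.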
[70] PAL: Personal Adaptive Learner
- arXiv: 2604.13017
- Authors: Megha Chakraborty, Darssan L. Eswaramoorthi, Madhur Thareja, Het Riteshkumar Shah, Finlay Palmer, Aryaman Bahl, Michelle A Ihetu, Amit Sheth
- Subjects: cs.AI; cs.HC
- Tags: Education Technology, Multimodal Learning
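- Summary: This paper presents PAL, an AI-powered personalized learning platform that turns lecture videos into interactive learning experiences. The system continuously analyzes multimodal content, dynamically adjusts question difficulty, and ends by generating a personalized summary, moving from static personalization to real-time adaptive support.
Cross-lists (129)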
[71] ART-VITON: Measurement-Guided Latent Diffusion for Artifact-Free Virtual Try-On
- arXiv: 2509.25749 (cross-listed)
- Authors: Junseo Park, Hyeryung Jang
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Image Synthesis, Virtual Try-On
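- Summary: This paper proposes ART-VITON, a measurement-guided diffusion framework that reformulates virtual try-on as a linear inverse problem. Using a trajectory-aligned solver and residual prior-based initialization, it preserves identity and background information and eliminates boundary artifacts.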
[72] Should There be a Teacher In-the-Loop? A Study of Generative AI Personalized Tasks Middle School
- arXiv: 2602.15876 (cross-listed)
- Authors: Candace Walkington, Mingyu Feng, Itffini Pruitt-Britton, Theodora Beauchamp, Andrew Lan
- Subjects: cs.CY; cs.AI
- Tags: Education Technology, Human-Computer Interaction
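- Summary: This study examines how middle school math teachers work with ChatGPT to create personalized problems. It finds that teacher-in-the-loop personalization is coarse-grained, whereas students prefer fine-grained pop-culture references, and that the process does not become more time-efficient as teachers gain experience.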
[73] Back to Basics: Let Conversational Agents Remember with Just Retrieval and Generation
- arXiv: 2604.11628 (cross-listed)
- Authors: Yuqian Wu, Wei Chen, Zhengjun Huang, Junle Chen, Qingxiang Liu, Kai Wang, Xiaofang Zhou, Yuxuan Liang
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, Memory Architecture, Information Retrieval
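- Summary: This paper identifies the signal sparsity effect as the main bottleneck of conversational memory systems and proposes a minimalist framework that relies only on retrieval and generation. With turn-isolated retrieval and query-driven pruning, it outperforms more complex baselines on several benchmarks while remaining efficient in tokens and latency.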
[74] GRACE: A Dynamic Coreset Selection Framework for Large Language Model Optimization
- arXiv: 2604.11810 (cross-listed)
- Authors: Tianhao Tang, Haoyang Li, Lei Chen
- Subjects: cs.DB; cs.AI
- Tags: LLM Training, Data Selection
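- Summary: This paper proposes GRACE, a graph-guided adaptive coreset selection framework for LLM training. The method builds coresets dynamically by combining representation diversity with gradient importance metrics, uses a k-NN graph propagation mechanism to reduce update cost, and significantly improves training efficiency and downstream performance on several benchmarks.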
[75] M$^\star$: Every Task Deserves Its Own Memory Harness
- arXiv: 2604.11811 (cross-listed)
- Authors: Wenbo Pan, Shujie Liu, Xiangyang Zhou, Shiwei Zhang, Wanlu Shi, Mirror Xu, Xiaohua Jia
- Subjects: cs.PL; cs.AI; cs.CL; cs.LG
- Tags: LLM Agent, Memory Architecture, Program Synthesis
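- Summary: This paper proposes M*, which automatically discovers task-optimized memory harnesses through executable program evolution. It models an agent memory system as a Python program and uses reflective code evolution with a population-based search strategy to jointly optimize data schemas, storage logic, and workflow instructions.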
[76] Schema-Adaptive Tabular Representation Learning with LLMs for Generalizable Multimodal Clinical Reasoning
- arXiv: 2604.11835 (cross-listed)
- Authors: Hongxi Mao, Wei Zhou, Mengting Jia, Tao Fang, Huan Gao, Bin Zhang, Shangyang Li
- Subjects: cs.LG; cs.AI
- Tags: Medical AI, Tabular Learning, Multimodal Learning
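- Summary: This paper proposes schema-adaptive tabular representation learning, which uses an LLM to convert structured variables into semantic natural-language statements and so produce transferable tabular embeddings. On dementia diagnosis tasks, the method achieves zero-shot alignment to unseen schemas and significantly outperforms clinical baselines.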
[77] A Layer-wise Analysis of Supervised Fine-Tuning
- arXiv: 2604.11838 (cross-listed)
- Authors: Qinghua Zhao, Xueling Gong, Xinyu Chen, Zhongfeng Kang, Xinlu Li
- Subjects: cs.LG; cs.AI
- Tags: Instruction Tuning, Parameter-Efficient Fine-Tuning
- Venue: ACL 2026
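- Summary: This paper performs a layer-wise analysis of supervised fine-tuning using information-theoretic, geometric, and optimization metrics, and finds that middle layers are stable while final layers are sensitive. Building on this insight, it proposes a mid-block efficient tuning method that selectively updates critical middle layers and improves on LoRA by 10.2% on GSM8K.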
[78] Beyond Static Sandboxing: Learned Capability Governance for Autonomous AI Agents
- arXiv: 2604.11839 (cross-listed)
- Authors: Bronislav Sidik, Lior Rokach
- Subjects: cs.CR; cs.AI
- Tags: LLM Agent, LLM Security, AI Safety
- Code: code
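- Summary: This paper addresses capability over-provisioning in autonomous AI agents and proposes Aethelgard, a four-layer adaptive governance framework. The framework enforces the principle of least privilege through learned policies, dynamically scopes the tools an agent can perceive, and uses reinforcement learning to learn the minimum viable capability set for each task type.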
[79] Polynomial Expansion Rank Adaptation: Enhancing Low-Rank Fine-Tuning with High-Order Interactions
- arXiv: 2604.11841 (cross-listed)
- Authors: Wenhao Zhang, Lin Mu, Li Ni, Peiquan Jin, Yiwen Zhang
- Subjects: cs.LG; cs.AI
- Tags: Parameter-Efficient Fine-Tuning, LLM Training
- Venue: ACL 2026 Workshop
- Code: code
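- Summary: This paper proposes PERA, which introduces structured polynomial expansion in the low-rank factor space to capture high-order parameter interactions. The method turns the adaptation space into a polynomial manifold, increasing expressiveness without raising the rank or inference cost, and consistently outperforms existing methods on several benchmarks.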
[80] DBGL: Decay-aware Bipartite Graph Learning for Irregular Medical Time Series Classification
- arXiv: 2604.11842 (cross-listed)
- Authors: Jian Chen, Yuzhu Hu, Xiaoyan Yuan, Yuxuan Hu, Jinfeng Xu, Yipeng Du, Wenhao Yuan, Wei Wang, Edith C. H. Ngai
- Subjects: cs.LG; cs.AI
- Tags: Time Series Analysis, Medical AI, Graph Neural Network
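- Summary: This paper proposes DBGL for irregular medical time series classification. It introduces a patient-variable bipartite graph to capture irregular sampling patterns and designs a node-specific temporal decay encoding mechanism. The method outperforms all baselines on four public datasets.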
[81] Evaluating the Limitations of Protein Sequence Representations for Parkinson's Disease Classification
- arXiv: 2604.11852 (cross-listed)
- Authors: César Jesús Núñez-Prado, Grigori Sidorov, Liliana Chanona-Hernández
- Subjects: q-bio.QM; cs.AI; cs.LG
- Tags: Bioinformatics, Medical AI, Representation Learning
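- Summary: This paper evaluates several protein sequence representations for Parkinson's disease classification and finds that primary sequence information alone offers limited discriminative power, with a best F1 score of only about 0.70.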
[82] MVAdapt: Zero-Shot Multi-Vehicle Adaptation for End-to-End Autonomous Driving
- arXiv: 2604.11854 (cross-listed)
- Authors: Haesung Oh, Jaeheung Park
- Subjects: cs.RO; cs.AI
- Tags: Autonomous Driving, Transfer Learning, Domain Adaptation
- Code: code
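- Summary: This paper proposes MVAdapt, a framework that uses physics-conditioned adaptation to transfer end-to-end autonomous driving models across vehicles in a zero-shot manner, addressing the vehicle domain gap.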
[83] Disposition Distillation at Small Scale: A Three-Arc Negative Result
- arXiv: 2604.11867 (cross-listed)
- Authors: Hari Sadasivan
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, LLM Evaluation
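- Summary: This paper reports a negative result on training behavioral dispositions into small language models: no intervention changed a model's disposition without degrading content quality.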
[84] Thermodynamic Liquid Manifold Networks: Physics-Bounded Deep Learning for Solar Forecasting in Autonomous Off-Grid Microgrids
- arXiv: 2604.11909 (cross-listed)
- Authors: Mohammed Ezzaldin Babiker Abdullah
- Subjects: cs.LG; cs.AI; eess.SY
- Tags: Physics-Informed Learning, Time Series Forecasting, Energy Management
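- Summary: This paper proposes Thermodynamic Liquid Manifold Networks, a physics-bounded deep learning architecture for solar forecasting that eliminates physically impossible nighttime generation predictions and maintains zero phase lag.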
[85] How Transformers Learn to Plan via Multi-Token Prediction
- arXiv: 2604.11912 (cross-listed)
- Authors: Jianhao Huang, Zhanpeng Zhou, Renqiu Xia, Baharan Mirzasoleiman, Weijie Su, Wei Huang
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, Pre-training
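- Summary: This paper studies how multi-token prediction promotes reasoning and planning, shows that it induces a backward reasoning process, and finds that it outperforms next-token prediction on tasks such as graph path-finding and Boolean satisfiability.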
[86] Can AI Detect Life? Lessons from Artificial Life
- arXiv: 2604.11915 (cross-listed)
- Authors: Ankit Gupta, Christoph Adami
- Subjects: cs.LG; cs.AI; cs.NE; q-bio.PE
- Tags: AI Safety, Adversarial Robustness
- Venue: Alife 2026
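- Summary: Drawing on artificial life, this paper shows that machine learning methods for extraterrestrial life detection are easily fooled by out-of-distribution samples, producing many false positives.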
[87] AutoSurrogate: An LLM-Driven Multi-Agent Framework for Autonomous Construction of Deep Learning Surrogate Models in Subsurface Flow
- arXiv: 2604.11945 (cross-listed)
- Authors: Jiale Liu, Nanzhe Wang
- Subjects: cs.LG; cs.AI; cs.MA
- Tags: LLM Agent, Scientific Computing, Multi-Agent System
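- Summary: This paper proposes AutoSurrogate, an LLM-driven multi-agent framework that lets non-ML experts automatically build high-quality deep learning surrogate models for subsurface flow problems from natural-language instructions.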
[88] ResBM: Residual Bottleneck Models for Low-Bandwidth Pipeline Parallelism
- arXiv: 2604.11947 (cross-listed)
- Authors: Alan Aboudib, Rodrigo Lopez Portillo A., Kalei Brady, Steffen Cruz
- Subjects: cs.LG; cs.AI; cs.DC
- Tags: Distributed Training, Model Compression
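- Summary: This paper proposes Residual Bottleneck Models, designed for low-bandwidth communication settings. They achieve 128x activation compression without significant loss in convergence and are suited to decentralized training.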
[89] AnyPoC: Universal Proof-of-Concept Test Generation for Scalable LLM-Based Bug Detection
- arXiv: 2604.11950 (cross-listed)
- Authors: Zijie Zhao, Chenyuan Yang, Weidong Wang, Yihan Yang, Ziqi Zhang, Lingming Zhang
- Subjects: cs.SE; cs.AI; cs.CL; cs.CR
- Tags: LLM Agent, Software Testing, Multi-Agent System
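- Summary: This paper proposes AnyPoC, a multi-agent framework that validates LLM-detected bugs by generating proof-of-concept tests, and reports 122 new bugs found in several critical software systems.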
[90] INDOTABVQA: A Benchmark for Cross-Lingual Table Understanding in Bahasa Indonesia Documents
- arXiv: 2604.11970 (cross-listed)
- Authors: Somraj Gautam, Anathapindika Dravichi, Gaurav Harit
- Subjects: cs.CV; cs.AI; cs.CL; cs.LG
- Tags: Document Understanding, Vision-Language Model, Benchmark
- Venue: ACL 2026
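- Summary: This paper introduces the INDOTABVQA benchmark for evaluating cross-lingual table visual question answering on Bahasa Indonesia documents, revealing performance gaps of VLMs on structurally complex tables and low-resource languages.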
[91] Filtered Reasoning Score: Evaluating Reasoning Quality on a Model's Most-Confident Traces
- arXiv: 2604.11996 (cross-listed)
- Authors: Manas Pathak, Xingyao Chen, Shuozhe Li, Amy Zhang, Liu Leqi
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, LLM Reasoning
- Code: code
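- Summary: This paper proposes the Filtered Reasoning Score (FRS), which evaluates reasoning quality using only a model's most confident reasoning traces and can distinguish models with similar accuracy but different reasoning ability.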
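As a rough illustration of the filtering idea in [91], the sketch below averages a per-trace quality score over only the most confident fraction of traces. The function name `filtered_score`, the `keep_fraction` parameter, and the (confidence, quality) pair format are assumptions made for this example; they are not the paper's definition of FRS.

```python
import math


def filtered_score(traces, keep_fraction=0.2):
    """Mean quality over the most confident traces.

    traces: list of (confidence, quality) pairs (hypothetical format).
    keep_fraction: share of traces to keep, ranked by confidence (assumed knob).
    """
    if not traces:
        raise ValueError("need at least one trace")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Always keep at least one trace.
    k = max(1, math.ceil(keep_fraction * len(traces)))
    # Rank by confidence, highest first, and keep the top k.
    top = sorted(traces, key=lambda t: t[0], reverse=True)[:k]
    return sum(quality for _, quality in top) / k


# Two toy models with the same mean quality (0.5) over all traces.
model_a = [(0.9, 1.0), (0.8, 1.0), (0.3, 0.0), (0.2, 0.0)]
model_b = [(0.9, 0.0), (0.8, 1.0), (0.3, 1.0), (0.2, 0.0)]
score_a = filtered_score(model_a, keep_fraction=0.5)
score_b = filtered_score(model_b, keep_fraction=0.5)
```

In this toy setup the two models are indistinguishable by their overall mean but separate once only the confident half is scored, which is the kind of distinction the summary describes.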
[92] The Second Challenge on Cross-Domain Few-Shot Object Detection at NTIRE 2026: Methods and Results
- arXiv: 2604.11998 (cross-listed)
- Authors: Xingyu Qiu, Yuqian Fu, Jiawei Geng, Bin Ren, Jiancheng Pan, Zongwei Wu, Hao Tang, Yanwei Fu, Radu Timofte, Nicu Sebe, Mohamed Elhoseiny, Lingyi Hong, Mingxi Cheng, Xingqi He, Runze Li, Xingdong Sheng, Wenqiang Zhang, Jiacong Liu, Shu Luo, Yikai Qin, Yaze Zhao, Yongwei Jiang, Yixiong Zou, Zhe Zhang, Yang Yang, Kaiyu Li, Bowen Fu, Zixuan Jiang, Ke Li, Hui Qiao, Xiangyong Cao, Xuanlong Yu, Youyang Sha, Longfei Liu, Di Yang, Xi Shen, Kyeongryeol Go, Taewoong Jang, Saiprasad Meesiyawar, Ravi Kirasur, Rakshita Kulkarni, Bhoomi Deshpande, Harsh Patil, Uma Mudenagudi, Shuming Hu, Chao Chen, Tao Wang, Wei Zhou, Qi Xu, Zhenzhao Xing, Dandan Zhao, Hanzhe Xia, Dongdong Lu, Zhe Zhang, Jingru Wang, Guangwei Huang, Jiachen Tu, Yaokun Shi, Guoyi Xu, Yaoxin Jiang, Jiajia Liu, Liwei Zhou, Bei Dou, Tao Wu, Zekang Fan, Junjie Liu, Adhémar de Senneville, Flavien Armangeon, Mengbers, Yazhe Lyu, Zhimeng Xin, Zijian Zhuang, Hongchun Zhu, Li Wang
- Subjects: cs.CV; cs.AI
- Tags: Object Detection, Few-Shot Learning, Domain Adaptation
- Venue: CVPR 2026 Workshop
- Code: code
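- Summary: This paper reports the results of the second Cross-Domain Few-Shot Object Detection challenge at NTIRE 2026, in which 128 participants explored strategies for detecting objects in unseen domains with limited annotations.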
[93] BayMOTH: Bayesian optiMizatiOn with meTa-lookahead -- a simple approacH
- arXiv: 2604.12005 (cross-listed)
- Authors: Rahman Ejaz, Varchas Gopalaswamy, Ricardo Luna, Aarne Lees, Vineet Gundecha, Christopher Kanan, Soumyendu Sarkar, Riccardo Betti
- Subjects: cs.LG; cs.AI
- Tags: Bayesian Optimization, Meta-Learning
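- Summary: This paper proposes a simple meta-Bayesian optimization algorithm that exploits information from related tasks when task relatedness is high and otherwise falls back to a lookahead strategy, retaining strong performance when task relatedness is low.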
[94] LLMs Struggle with Abstract Meaning Comprehension More Than Expected
- arXiv: 2604.12018 (cross-listed)
- Authors: Hamoud Alhazmi, Jiachen Jiang
- Subjects: cs.CL; cs.AI
- Tags: Natural Language Understanding, LLM Evaluation
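- Summary: This paper finds that most large language models, including GPT-4o, struggle with abstract meaning comprehension, whereas fine-tuned models based on bidirectional attention perform better.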
[95] Curvelet-Based Frequency-Aware Feature Enhancement for Deepfake Detection
- arXiv: 2604.12028 (cross-listed)
- Authors: Salar Adel Sabri, Ramadhan J. Mstafa
- Subjects: cs.CV; cs.AI
- Tags: Deepfake Detection, Image Forensics
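- Summary: This paper introduces a curvelet-transform-based deepfake detection method that enhances frequency features through wedge-level attention and scale-aware spatial masking, achieving high accuracy on the FaceForensics++ dataset.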
[96] Benchmarking Deflection and Hallucination in Large Vision-Language Models
- arXiv: 2604.12033 (cross-listed)
- Authors: Nicholas Moratelli, Christopher Davis, Leonardo F. R. Ribeiro, Bill Byrne, Gonzalo Iglesias
- Subjects: cs.CL; cs.AI; cs.CV
- Tags: LLM Hallucination, Vision-Language Model, Benchmark
- Venue: ACL 2026
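- Summary: This paper introduces VLM-DeflectionBench, a benchmark for evaluating the deflection and hallucination behavior of large vision-language models when evidence is conflicting or insufficient.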
[97] SIR-Bench: Evaluating Investigation Depth in Security Incident Response Agents
- arXiv: 2604.12040 (cross-listed)
- Authors: Daniel Begimher, Cristian Leo, Jack Huang, Pat Gaw, Bonan Zheng
- Subjects: cs.CR; cs.AI; cs.SE
- Tags: LLM Agent, Cybersecurity, Benchmark
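- Summary: This paper presents SIR-Bench, a benchmark of 794 test cases for evaluating the investigation depth of security incident response agents. It introduces the OUAT framework for replaying real incident patterns and proposes three complementary metrics covering an agent's classification accuracy, ability to discover novel findings, and appropriateness of tool use.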
[98] VISTA: Validation-Informed Trajectory Adaptation via Self-Distillation
- arXiv: 2604.12044 (cross-listed)
- Authors: Eli Corn, Daphna Weinshall
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, Optimization
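- Summary: This paper proposes VISTA, an online self-distillation framework that addresses "trajectory deviation," in which a deep learning model may abandon high-generalization states during training. The method uses validation information to identify expert anchors and regularizes the loss landscape through an online ensemble, improving robustness and generalization.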
[99] Leveraging Weighted Syntactic and Semantic Context Assessment Summary (wSSAS) Towards Text Categorization Using LLMs
- arXiv: 2604.12049 (cross-listed)
- Authors: Shreeya Verma Kathuria, Nitin Mayande, Sharookh Daruwalla, Nitin Joglekar, Charles Weber
- Subjects: cs.CL; cs.AI
- Tags: Text Classification, LLM Inference
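- Summary: This paper proposes the wSSAS framework to address the precision and reproducibility problems that randomness and noise cause in LLM-based text categorization. The framework strengthens data integrity through a hierarchical structure and signal-to-noise-ratio prioritization, markedly improving clustering integrity and classification accuracy.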
[100] Interpretable DNA Sequence Classification via Dynamic Feature Generation in Decision Trees
- arXiv: 2604.12060 (cross-listed)
- Authors: Nicolas Huynh, Krzysztof Kacprzyk, Ryan Sheridan, David Bentley, Mihaela van der Schaar
- Subjects: cs.LG; cs.AI; q-bio.GN
- Tags: Genomic AI, Interpretability, Feature Generation
- Venue: AISTATS 2026
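- Summary: This paper proposes DEFT, a framework that achieves interpretable DNA sequence classification by using a large language model to dynamically generate biologically relevant features during decision tree construction. Experiments show that it discovers highly predictive, human-interpretable sequence features, addressing the limited expressiveness of conventional decision trees.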
[101] Robust Explanations for User Trust in Enterprise NLP Systems
- arXiv: 2604.12069 (cross-listed)
- Authors: Guilin Zhang, Kai Zhao, Jeffrey Friedman, Xu Chu, Amine Anoun, Jerry Ting
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Interpretability, LLM Evaluation, Adversarial Robustness
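- Summary: This paper proposes a unified black-box robustness evaluation framework for token-level explanations in enterprise NLP systems and finds that explanations from decoder LLMs are more stable than those from encoder models. It also characterizes the relationship between model scale and explanation stability and provides practical cost-robustness trade-off curves.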
[102] OpenTME: An Open Dataset of AI-powered H&E Tumor Microenvironment Profiles from TCGA
- arXiv: 2604.12075 (cross-listed)
- Authors: Maaike Galama, Nina Kozar-Gillan, Christina Embacher, Todd Dembo, Cornelius Böhm, Evelyn Ramberger, Julika Ribbat-Idel, Rosemarie Krupar, Verena Aumiller, Miriam Hägele, Kai Standvoss, Gerrit Erdmann, Blanca Pablos, Ari Angelo, Simon Schallenberg, Andrew Norgan, Viktor Matyas, Klaus-Robert Müller, Maximilian Alber, Lukas Ruff, Frederick Klauschen
- Subjects: cs.CV; cs.AI; cs.LG; q-bio.QM
- Tags: Medical AI, Dataset
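- Summary: This paper introduces OpenTME, an open dataset of tumor microenvironment (TME) profiles derived from H&E-stained pathology images in TCGA. The dataset contains more than 4,500 quantitative readouts generated with AI tools and is intended to support biomarker discovery and spatial biology research.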
[103] Narrative over Numbers: The Identifiable Victim Effect and its Amplification Under Alignment and Reasoning in Large Language Models
- arXiv: 2604.12076 (cross-listed)
- Authors: Syed Rifat Raiyan
- Subjects: cs.CL; cs.AI; cs.CY
- Tags: LLM Alignment, LLM Reasoning, AI Ethics
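- Summary: This paper presents the first systematic, large-scale empirical study of the identifiable victim effect in large language models, finding that the effect is amplified in instruction-tuned models and reversed in reasoning-specialized models. It also finds that standard chain-of-thought prompting significantly increases the effect, which has important implications for the use of AI in humanitarian and ethical decision-making.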
[104] LLM-Based Automated Diagnosis Of Integration Test Failures At Google
- arXiv: 2604.12108 (cross-listed)
- Authors: Celal Ziftci, Ray Liu, Spencer Greene, Livio Dalloro
- Subjects: cs.SE; cs.AI
- Tags: Software Testing, LLM Agent
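- Summary: This paper presents Auto-Diagnose, a tool that uses large language models to help developers diagnose integration test failures by analyzing logs and producing concise root-cause summaries. The tool is widely deployed inside Google, has been highly rated, and significantly improves diagnosis efficiency.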
[105] PR-MaGIC: Prompt Refinement Via Mask Decoder Gradient Flow For In-Context Segmentation
- arXiv: 2604.12113 (cross-listed)
- Authors: Minjae Lee, Sungwoo Hur, Soojin Hwang, Won Hwa Kim
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Prompt Engineering
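- Summary: This paper proposes PR-MaGIC, a training-free test-time framework that refines prompts for in-context segmentation using the gradient flow of the SAM mask decoder. The method mitigates visual inconsistency between support and query images and significantly improves segmentation quality.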
[106] Observing the unobserved confounding through its effects: toward randomized trial-like estimates from real-world survival data
- arXiv: 2604.12137 (cross-listed)
- Authors: Vasiliki Stoumpou, Dimitris Bertsimas, Samuel Singer, Georgios Antonios Margonis
- Subjects: stat.AP; cs.AI; stat.ME
- Tags: Causal Inference, Medical AI
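- Summary: This paper proposes a three-step framework that addresses unobserved confounding in observational survival data by inferring latent prognostic factors and balancing on them. Experiments show that the method significantly improves agreement between estimated treatment effects and randomized controlled trial results.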
[107] From Plan to Action: How Well Do Agents Follow the Plan?
- arXiv: 2604.12147 (cross-listed)
- Authors: Shuyang Liu, Saman Dehghan, Jatin Ganhotra, Martin Hirzel, Reyhaneh Jabbarvand
- Subjects: cs.SE; cs.AI; cs.CL
- Tags: LLM Agent, Code Generation
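- Summary: This paper presents the first extensive, systematic analysis of plan adherence in programming agents, finding that agents frequently violate the instructed plan and that plan reminders reduce violations and raise task success rates. It also finds that a low-quality plan can be worse than no plan at all, exposing a research gap in how well current models follow instructed plans.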
[108] Domain-Specific Latent Representations Improve the Fidelity of Diffusion-Based Medical Image Super-Resolution
- arXiv: 2604.12152 (cross-listed)
- Authors: Sebastian Cajas, Ashaba Judith, Rahul Gorijavolu, Sahil Kapadia, Hillary Clinton Kasimbazi, Leo Kinyera, Emmanuel Paul Kwesiga, Sri Sri Jaithra Varma Manthena, Luis Filipe Nakayama, Ninsiima Doreen, Leo Anthony Celi
- Subjects: cs.CV; cs.AI
- Tags: Image Super-Resolution, Medical AI, Diffusion Model
- Code: code
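- Summary: This paper shows that, for diffusion-based medical image super-resolution, replacing a general-purpose VAE with a domain-specific variational autoencoder (MedVAE) significantly improves reconstruction quality. It finds that the autoencoder's reconstruction quality, not the diffusion architecture itself, is the dominant factor in downstream super-resolution performance.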
[109] Fully Homomorphic Encryption on Llama 3 model for privacy preserving LLM inference
- arXiv: 2604.12168 (cross-listed)
- Authors: Anes Abdennebi, Nadjia Kara, Laaziz Lahlou
- Subjects: cs.CR; cs.AI
- Tags: Privacy, LLM Security, LLM Inference
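- Summary: This paper integrates lattice-based, post-quantum fully homomorphic encryption into the inference pipeline of a Llama 3 model to protect data privacy. Experiments show that the approach reaches reasonable latency while maintaining high text generation accuracy, supporting the feasibility of fully homomorphic encryption for LLM inference.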
[110] CycloneMAE: A Scalable Multi-Task Learning Model for Global Tropical Cyclone Probabilistic Forecasting
- arXiv: 2604.12180 (cross-listed)
- Authors: Renlong Hang, Zihao Xu, Jiuwei Zhao, Runling Yu, Leye Cheng, Qingshan Liu
- Subjects: cs.LG; cs.AI
- Tags: Weather Forecasting, Multi-Task Learning
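- Summary: This paper proposes CycloneMAE, a scalable multi-task forecasting model that uses a masked autoencoder to learn transferable tropical cyclone representations from multimodal data. The model outperforms leading numerical weather prediction systems across several global ocean basins and provides both deterministic forecasts and probability distributions.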
[111] Clustering-Enhanced Domain Adaptation for Cross-Domain Intrusion Detection in Industrial Control Systems
- arXiv: 2604.12183 (cross-listed)
- Authors: Luyao Wang
- Subjects: cs.LG; cs.AI; cs.CR
- Tags: Cybersecurity, Domain Adaptation
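- Summary: This paper proposes a clustering-enhanced domain adaptation method for cross-domain intrusion detection in industrial control systems, using feature alignment and a clustering strategy to handle traffic distribution shift and unknown-attack detection in dynamic environments. Experiments show that the method significantly improves detection accuracy and stability on unknown attacks.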
[112] Characterizing Resource Sharing Practices on Underground Internet Forum Synthetic Non-Consensual Intimate Image Content Creation Communities
- arXiv: 2604.12190 (cross-listed)
- Authors: Bernardo B. P. Medeiros, Malvika Jadhav, Allison Lu, Tadayoshi Kohno, Vincent Bindschaedler, Kevin R. B. Butler
- Subjects: cs.CY; cs.AI; cs.HC
- Tags: Cybersecurity, AI Ethics
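- Summary: This paper presents a comprehensive analysis of resource sharing practices in synthetic non-consensual intimate image (SNCII) content creation communities on underground internet forums. It reveals resource flows and knowledge transfer among users of differing technical skill, and identifies gaps in existing regulatory infrastructure along with key intervention points.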
[113] Towards grounded autonomous research: an end-to-end LLM mini research loop on published computational physics
- arXiv: 2604.12198 (cross-listed)
- Authors: Haonan Huang
- Subjects: cs.AI
- Tags: LLM Agent, Scientific Reasoning
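- Summary: This paper proposes a minimal research loop framework in which an LLM agent reads, reproduces, critiques, and extends computational physics papers. In experiments, the agent found substantive flaws in about 42% of 111 papers and produced a publishable comment on a Nature Communications paper.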
[114] Unveiling the Surprising Efficacy of Navigation Understanding in End-to-End Autonomous Driving
- arXiv: 2604.12208 (cross-listed)
- Authors: Zhihua Hua, Junli Wang, Pengfei LI, Qihao Jin, Bo Zhang, Kehua Sheng, Yilun Chen, Zhongxue Gan, Wenchao Ding
- Subjects: cs.RO; cs.AI
- Tags: Autonomous Driving, Vision-Language Model
- Venue: ICRA 2026
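- Summary: This paper shows that end-to-end autonomous driving systems over-rely on local scene understanding and neglect global navigation information. It proposes the SNG framework and the SNG-VLA model, which achieve state-of-the-art performance through precise modeling of navigation information.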
[115] Ride the Wave: Precision-Allocated Sparse Attention for Smooth Video Generation
- arXiv: 2604.12219 (cross-listed)
- Authors: Wentai Zhang, Ronghui Xi, Shiyao Peng, Jiayu Huang, Haoran Luo, Zichen Tang, Haihong E
- Subjects: cs.CV; cs.AI
- Tags: Video Generation, Diffusion Model, LLM Inference
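- Summary: This paper proposes PASA, a training-free sparse attention framework for efficient and temporally smooth video generation. Using curvature-aware dynamic budget allocation and stochastic selection bias, the method accelerates inference while eliminating visual flicker.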
[116] LLM-Guided Semantic Bootstrapping for Interpretable Text Classification with Tsetlin Machines
- arXiv: 2604.12223 (cross-listed)
- Authors: Jiechao Gao, Rohan Kumar Yadav, Yuangang Li, Yuandong Pan, Jie Wang, Ying Liu, Michael Lepech
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Text Classification, Knowledge Distillation, Interpretability
- Venue: ACL 2026
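- Summary: This paper proposes a semantic bootstrapping framework that transfers LLM knowledge into symbolic Tsetlin Machines for interpretable text classification. The LLM generates sub-intents and synthetic data, and the resulting model reaches BERT-level performance while remaining fully symbolic and efficient.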
[117] TEMPLATEFUZZ: Fine-Grained Chat Template Fuzzing for Jailbreaking and Red Teaming LLMs
- arXiv: 2604.12232 (cross-listed)
- Authors: Qingchao Shen, Zibo Xiao, Lili Huang, Enwei Hu, Yongqiang Tian, Junjie Chen
- Subjects: cs.CR; cs.AI; cs.SE
- Tags: LLM Security, Prompt Injection, Adversarial Robustness
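- Summary: This paper proposes TEMPLATEFUZZ, a fine-grained fuzzing framework targeting LLM chat templates. Through element-level mutation and heuristic search strategies it systematically exposes template vulnerabilities, achieving a 98.2% attack success rate on open-source LLMs.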
[118] MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization
- arXiv: 2604.12237 (cross-listed)
- Authors: Ziqing Wang, Yibo Wen, Abhishek Pandy, Han Liu, Kaize Ding
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Molecular Generation, Reinforcement Learning, Drug Discovery
- Code: code
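- Summary: This paper proposes MolMem, a reinforcement learning framework for molecular optimization with a dual memory system consisting of a static exemplar memory and an evolving skill memory. With only 500 oracle calls, it reaches a 90% success rate on single-property tasks and 52% on multi-property tasks.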
[119] Continuous Knowledge Metabolism: Generating Scientific Hypotheses from Evolving Literature
- arXiv: 2604.12243 (cross-listed)
- Authors: Jinkai Tao, Yubo Wang, Xiaoyu Liu, Menglin Yang
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Synthesis, Scientific Reasoning, LLM Agent
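- Summary: This paper proposes the CKM framework, which generates scientific hypotheses by processing the scientific literature through sliding time windows and incrementally updating a knowledge base. CKM-Lite outperforms batch processing in hit rate, hypothesis yield, and best-match alignment while cutting token cost by 92%.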
[120] Socrates Loss: Unifying Confidence Calibration and Classification by Leveraging the Unknown
- arXiv: 2604.12245 (cross-listed)
- Authors: Sandra Gómez-Gálvez, Tobias Olenyi, Gillian Dobbie, Katerina Taškova
- Subjects: cs.LG; cs.AI; cs.CV; cs.NE
- Tags: Uncertainty Estimation, Deep Learning Theory
- Venue: TMLR 2026
- Code: code
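- Summary: This paper proposes Socrates Loss, a unified loss function that optimizes classification and confidence calibration together by introducing an auxiliary unknown class. The method achieves a better accuracy-calibration trade-off on four benchmark datasets and comes with theoretical guarantees.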
[121] SpecBound: Adaptive Bounded Self-Speculation with Layer-wise Confidence Calibration
- arXiv: 2604.12247 (cross-listed)
- Authors: Zhuofan Wen, Yang Feng
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Speculative Decoding, LLM Inference
- Venue: ACL 2026
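- Summary: This paper proposes a self-speculative decoding framework that accelerates LLM inference through layer-wise temperature annealing and adaptive bounds on speculation length. It requires no changes to the base model parameters and achieves speedups of up to 2.33x on a range of long-form generation tasks.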
[122] SpanKey: Dynamic Key Space Conditioning for Neural Network Access Control
- arXiv: 2604.12254 (cross-listed)
- Authors: WenBin Yan
- Subjects: cs.CR; cs.AI
- Tags: LLM Security, Privacy
- Code: code
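- Summary: This paper introduces SpanKey, a lightweight method for controlling neural network inference through subspace key injection. It analyzes the mechanism and a failure mode (key absorption) and validates the approach experimentally on CIFAR-10 and MNIST.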
[123] ARGen: Affect-Reinforced Generative Augmentation towards Vision-based Dynamic Emotion Perception
- arXiv: 2604.12255 (cross-listed)
- Authors: Huanzhen Wang, Ziheng Zhou, Jiaqi Song, Li He, Yunshi Lan, Yan Wang, Wenqiang Zhang
- Subjects: cs.CV; cs.AI
- Tags: Affective Computing, Video Generation, Diffusion Model
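- Summary: This paper proposes ARGen, an affect-reinforced generative data augmentation framework for dynamic facial expression recognition. Through two stages, affective semantic injection and adaptive reinforcement diffusion, it generates diverse emotional expressions to improve recognition performance.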
[124] Coding-Free and Privacy-Preserving MCP Framework for Clinical Agentic Research Intelligence System
- arXiv: 2604.12258 (cross-listed)
- Authors: Taehun Kim, Hyeryun Park, Hyeonhoon Lee, Yushin Lee, Kyungsang Kim, Hyung-Chul Lee
- Subjects: cs.CL; cs.AI
- Tags: Medical AI, LLM Agent, Privacy
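- Summary: This paper proposes CARIS, a clinical agentic research intelligence system that integrates LLMs with modular tools via MCP, automating clinical research workflows while protecting data privacy. The system supports natural-language-driven research planning, cohort construction, model development, and report generation.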
[125] CascadeDebate: Multi-Agent Deliberation for Cost-Aware LLM Cascades
- arXiv: 2604.12262 (cross-listed)
- Authors: Raeyoung Chang, Dongwook Kwon, Jisoo Lee, Nikhil Verma
- Subjects: cs.CL; cs.AI
- Tags: Multi-Agent System, LLM Inference, LLM Agent
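- Summary: This paper proposes CascadeDebate, a framework that inserts multi-agent deliberation at the escalation boundaries of LLM cascades. Confidence-based routing activates lightweight agent ensembles, yielding improvements of up to 26.75% over single-model cascades on five benchmarks.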
[126] MAST: Mask-Guided Attention Mass Allocation for Training-Free Multi-Style Transfer
- arXiv: 2604.12281 (cross-listed)
- Authors: Dongkyung Kang, Jaeyeon Hwang, Junseo Park, Minji Kang, Yeryeong Lee, Beomseok Ko, Hanyoung Roh, Jeongmin Shin, Hyeryung Jang
- Subjects: cs.CV; cs.AI
- Tags: Image Editing, Diffusion Model, Text-to-Image
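- Summary: This paper proposes MAST, a training-free multi-style transfer framework that controls content-style interaction in diffusion models through mask-guided attention mass allocation. The method eliminates boundary artifacts and preserves structural consistency.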
[127] Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads
- arXiv: 2604.12301 (cross-listed)
- Authors: Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum
- Subjects: cs.DC; cs.AI; cs.SE
- Tags: LLM Inference, Code Generation, Edge Computing
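- Summary: This paper presents a systematic measurement study of seven tactics for reducing cloud LLM token usage, using a local model as a triage layer. Local routing combined with prompt compression saves 45-79% of cloud tokens on edit-heavy workloads.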
[128] GCA Framework: A Gulf-Grounded Dataset and Agentic Pipeline for Climate Decision Support
- arXiv: 2604.12306 (cross-listed)
- Authors: Muhammad Umer Sheikh, Khawar Shehzad, Salman Khan, Fahad Shahbaz Khan, Muhammad Haris Khan
- Subjects: cs.LG; cs.AI
- Tags: LLM Agent, Environmental Planning, Multimodal Learning
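- Summary: This paper proposes the GCA framework, comprising a Gulf-focused multimodal dataset (GCA-DS) and a tool-augmented climate analysis agent. Domain fine-tuning and tool integration significantly improve model reliability on Gulf climate tasks.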
[129] Is Vibe Coding the Future? An Empirical Assessment of LLM Generated Codes for Construction Safety
- arXiv: 2604.12311 (cross-listed)
- Authors: S M Jamil Uddin
- Subjects: cs.SE; cs.AI; cs.HC
- Tags: Code Generation, LLM Evaluation, LLM Hallucination
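- Summary: This study evaluates the reliability of the "vibe coding" paradigm, in which non-technical users guide an LLM to generate code through natural language, in the construction safety domain. An empirical analysis of 450 Python scripts finds that about 45% fail silently (the code runs but contains incorrect safety logic), exposing the limits of current LLMs in safety-critical applications.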
[130] EgoEsportsQA: An Egocentric Video Benchmark for Perception and Reasoning in Esports
- arXiv: 2604.12320 (cross-listed)
- Authors: Jianzhe Ma, Zhonghao Cao, Shangkui Chen, Yichen Xu, Wenxuan Wang, Qin Jin
- Subjects: cs.CV; cs.AI; cs.MM
- Tags: Video Understanding, Vision-Language Model, Benchmark
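- Summary: This paper proposes EgoEsportsQA, an egocentric video question-answering benchmark for evaluating perception and reasoning in esports. It contains 1,745 high-quality question-answer pairs spanning 11 cognitive-ability subtasks and 6 esports-knowledge subtasks, and reveals weaknesses of current video LLMs in fast tactical reasoning.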
[131] Black-Box Optimization From Small Offline Datasets via Meta Learning with Synthetic Tasks
- arXiv: 2604.12325 (cross-listed)
- Authors: Azza Fadhel, Hung Tran, Trong Nghia Hoang, Jana Doppa
- Subjects: cs.LG; cs.AI
- Tags: Meta-Learning, Optimization, Molecular Generation
- Venue: AISTATS 2026
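- Summary: This paper proposes OptBias, a meta-learning framework for offline black-box optimization with small datasets. It generates synthetic tasks from Gaussian processes to learn a reusable optimization bias, then fine-tunes on the target task's small dataset, consistently outperforming existing baselines in small-data settings.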
[132] GeM-EA: A Generative and Meta-learning Enhanced Evolutionary Algorithm for Streaming Data-Driven Optimization
- arXiv: 2604.12336 (cross-listed)
- Authors: Yue Wu, Yuan-Ting Zhong, Ze-Yuan Ma, Yue-Jiao Gong
- Subjects: cs.NE; cs.AI
- Tags: Optimization, Meta-Learning, Continual Learning
- Venue: GECCO 2026
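- Summary: This paper proposes GeM-EA, a generative and meta-learning enhanced evolutionary algorithm for streaming data-driven optimization. It combines a bilevel meta-learning strategy for fast surrogate model initialization with generative replay of historical knowledge, achieving faster adaptation and greater robustness under concept drift.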
[133] FRTSearch: Unified Detection and Parameter Inference of Fast Radio Transients using Instance Segmentation
- arXiv: 2604.12344 (cross-listed)
- Authors: Bin Zhang, Yabiao Wang, Xiaoyao Xie, Shanping You, Xuhong Yu, Qiuhua Li, Hongwei Li, Shaowen Du, Chenchen Miao, Dengke Zhou, Jianhua Fang, Jiafu Wu, Pei Wang, Di Li
- Subjects: astro-ph.IM; cs.AI
- Tags: Image Segmentation, Anomaly Detection, Scientific Computing
- Venue: ApJS
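- Summary: This paper introduces FRTSearch, an end-to-end framework for detecting fast radio transients and inferring their physical parameters. It uses Mask R-CNN for trajectory segmentation and the physics-driven IMPIC algorithm to infer dispersion measure and time of arrival directly, reducing the false positive rate by more than 99.9% while maintaining 98% recall.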
[134] Scaffold-Conditioned Preference Triplets for Controllable Molecular Optimization with Large Language Models
- arXiv: 2604.12350 (cross-listed)
- Authors: Yi Xiong, Liang Xiong, Xiaohong Ji, Sen Yang, Zhifeng Gao, Huaimin Wang, Kele Xu
- Subjects: cs.LG; cs.AI
- Tags: Molecular Generation, Drug Discovery, LLM Alignment
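- Summary: This paper proposes SCPT, a pipeline for constructing scaffold-conditioned preference triplets for controllable molecular optimization. It builds preference data using scaffold alignment and chemistry-driven filters, enabling a molecular LLM to make property-optimizing edits while preserving the scaffold, and performs strongly on both single-objective and multi-objective benchmarks.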
[135] Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning
- arXiv: 2604.12374 (cross-listed)
- Authors: NVIDIA, Aakshita Chandiramani, Aaron Blakeman, Abdullahi Olaoye, Abhibha Gupta, Abhilash Somasamudramath, Abhinav Khattar, Adeola Adesoba, Adi Renduchintala, Adil Asif, Aditya Agrawal, Aditya Vavre, Ahmad Kiswani, Aishwarya Padmakumar, Ajay Hotchandani, Akanksha Shukla, Akhiad Bercovich, Aleksander Ficek, Aleksandr Shaposhnikov, Alex Gronskiy, Alex Kondratenko, Alex Neefus, Alex Steiner, Alex Yang, Alexander Bukharin, Alexander Young, Ali Hatamizadeh, Ali Taghibakhshi, Alina Galiautdinova, Alisa Liu, Alok Kumar, Ameya Sunil Mahabaleshwarkar, Amir Klein, Amit Zuker, Amnon Geifman, Anahita Bhiwandiwalla, Ananth Subramaniam, Andrew Tao, Anjaney Shrivastava, Anjulie Agrusa, Ankur Srivastava, Ankur Verma, Ann Guan, Anna Shors, Annamalai Chockalingam, Anubhav Mandarwal, Aparnaa Ramani, Arham Mehta, Arti Jain, Arun Venkatesan, Asha Anoosheh, Ashwath Aithal, Ashwin Poojary, Asif Ahamed, Asit Mishra, Asli Sabanci Demiroz, Asma Kuriparambil Thekkumpate, Atefeh Sohrabizadeh, Avinash Kaur, Ayush Dattagupta, Barath Subramaniam Anandan, Bardiya Sadeghi, Barnaby Simkin, Ben Lanir, Benedikt Schifferer, Benjamin Chislett, Besmira Nushi, Bilal Kartal, Bill Thiede, Bita Darvish Rouhani, Bobby Chen, Boris Ginsburg, Brandon Norick, Branislav Kisacanin, Brian Yu, Bryan Catanzaro, Buvaneswari Mani, Carlo del Mundo, Chankyu Lee, Chanran Kim, Chantal Hwang, Chao Ni, Charles Wang, Charlie Truong, Cheng-Ping Hsieh, Chenhan Yu, Chenjie Luo, Cherie Wang, Chetan Mungekar, Chintan Patel, Chris Alexiuk, Chris Holguin, Chris Wing, Christian Munley, Christopher Parisien, Chuck Desai, Chunyang Sheng, Collin Neale, Cyril Meurillon, Dakshi Kumar
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Mixture-of-Experts, LLM Inference, Pre-training
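- Summary: This paper introduces Nemotron 3 Super, a 120-billion-parameter (12 billion active) hybrid Mamba-Attention mixture-of-experts model. The model adopts NVFP4 pre-training and the LatentMoE architecture for the first time, supports a context length of 1 million, and delivers 2.2x and 7.5x higher inference throughput than GPT-OSS-120B and Qwen3.5-122B, respectively.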
[136] Cooperative Memory Paging with Keyword Bookmarks for Long-Horizon LLM Conversations
- arXiv: 2604.12376 (cross-listed)
- Authors: Ziyang Liu
- Subjects: cs.CL; cs.AI
- Tags: Long Context, Memory Architecture, LLM Inference
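- Summary: This paper proposes cooperative memory paging, which replaces evicted content segments with keyword bookmarks and provides a recall() tool for retrieving the full content on demand. On the LoCoMo benchmark, it achieves the highest answer quality among six methods, outperforming baselines such as truncation, BM25, and full context.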
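The bookmark-and-recall idea in [136] can be pictured with a toy context manager. In the sketch below, the class `PagedContext`, the longest-words keyword heuristic, the `max_segments` budget, and the bookmark string format are all assumptions for illustration; only the notion of replacing evicted segments with keyword bookmarks and looking them up through `recall()` comes from the summary.

```python
class PagedContext:
    """Toy conversation context that pages out old segments (hypothetical)."""

    def __init__(self, max_segments=2):
        self.max_segments = max_segments  # assumed budget, in segments
        self.live = []        # (segment id, text) pairs kept verbatim
        self.archive = {}     # segment id -> full text of evicted segments
        self.bookmarks = []   # (segment id, keywords) left in the context
        self._next_id = 0

    @staticmethod
    def _keywords(text, n=3):
        # Stand-in keyword picker: the n longest distinct words.
        words = {w.strip(".,!?").lower() for w in text.split()}
        words.discard("")
        return sorted(words, key=lambda w: (-len(w), w))[:n]

    def add(self, text):
        self.live.append((self._next_id, text))
        self._next_id += 1
        # Evict the oldest segments once the budget is exceeded.
        while len(self.live) > self.max_segments:
            seg_id, old = self.live.pop(0)
            self.archive[seg_id] = old
            self.bookmarks.append((seg_id, self._keywords(old)))

    def render(self):
        # Bookmarks first, then the segments still held verbatim.
        lines = [f"[bookmark {i}: {', '.join(k)}]" for i, k in self.bookmarks]
        lines += [text for _, text in self.live]
        return "\n".join(lines)

    def recall(self, seg_id):
        # Return the full text of an evicted segment on demand.
        return self.archive[seg_id]


ctx = PagedContext(max_segments=1)
ctx.add("Alice adopted a greyhound.")
ctx.add("Bob likes tea.")
```

Here the first segment is paged out when the second arrives, so `ctx.render()` shows a short bookmark in its place while `ctx.recall(0)` still returns the original sentence.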
[137] SCRIPT: A Subcharacter Compositional Representation Injection Module for Korean Pre-Trained Language Models
- arXiv: 2604.12377 (cross-listed)
- Authors: SungHo Kim, Juhyeong Park, Eda Atalay, SangKeun Lee
- Subjects: cs.CL; cs.AI
- Tags: Pre-training, Natural Language Understanding, Representation Learning
- Venue: ACL 2026 Findings
- Code: code
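- Summary: This paper proposes SCRIPT, a model-agnostic module that injects subcharacter compositional knowledge into Korean pre-trained language models. By enriching the structural granularity of subword embeddings, it improves baseline performance on a range of Korean NLU and NLG tasks without architectural changes or additional pre-training.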
[138] Beyond Output Correctness: Benchmarking and Evaluating Large Language Model Reasoning in Coding Tasks
- arXiv: 2604.12379 (cross-listed)
- Authors: Yuangang Li, Justin Tian Jin Chen, Ethan Yu, David Hong, Iftekhar Ahmed
- Subjects: cs.SE; cs.AI; cs.LG
- Tags: LLM Reasoning, Code Generation, Benchmark
- Code: code
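- Summary: This paper introduces CodeRQ-Bench, the first benchmark for evaluating LLM reasoning quality across code generation, summarization, and classification tasks. The authors also propose VERA, a two-stage evaluator that combines evidence verification with ambiguity-aware score correction, improving AUCROC by up to 0.26 on four datasets.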
[139] Chain-of-Models Pre-Training: Rethinking Training Acceleration of Vision Foundation Models
- arXiv: 2604.12391 (cross-listed)
- Authors: Jiawei Fan, Shigeng Wang, Chao Li, Xiaolong Liu, Anbang Yao
- Subjects: cs.CV; cs.AI
- Tags: Pre-training, Vision Transformer, Transfer Learning
- Venue: CVPR 2026
- Code: code
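- Summary: This paper proposes CoM-PT, a performance-lossless training acceleration method for vision foundation models. It builds a chain of models for sequential pre-training, in which smaller models accelerate the training of larger ones through knowledge transfer in parameter space and feature space. Validated on 45 datasets, the method becomes more efficient as more models are trained.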
[140] Security and Resilience in Autonomous Vehicles: A Proactive Design Approach
- arXiv: 2604.12408 (cross-listed)
- Authors: Chieh Tsai, Murad Mehrab Abrar, Salim Hariri
- Subjects: cs.CR; cs.AI
- Tags: Autonomous Driving, Cybersecurity, Adversarial Robustness
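- Summary: This chapter presents design techniques for improving the security and resilience of autonomous vehicles, including an attack taxonomy spanning architectural layers and a resilient architecture that integrates redundancy, diversity, and adaptive reconfiguration. Experiments on the Quanser QCar platform validate the approach for detecting depth camera blinding attacks and software tampering.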
[141] RACF: A Resilient Autonomous Car Framework with Object Distance Correction
- arXiv: 2604.12418 (cross-listed)
- Authors: Chieh Tsai, Hossein Rastgoftar, Salim Hariri
- Subjects: cs.RO; cs.AI
- Tags: Autonomous Driving, Anomaly Detection, Sensor Fusion
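- Summary: This paper proposes RACF, a resilient autonomous car framework with an object distance correction algorithm, which improves perception-layer robustness through redundancy and diversity across a depth camera, LiDAR, and physics-based kinematics. The framework reduces RMSE by up to 35% under strong interference while running in real time.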
[142] Decoding by Perturbation: Mitigating MLLM Hallucinations via Dynamic Textual Perturbation
- arXiv: 2604.12424 (cross-listed)
- Authors: Sihang Jia, Shuliang Liu, Songbo Yang, Yibo Yan, Xin Zou, Xuming Hu
- Subjects: cs.CL; cs.AI; cs.CV
- Tags: LLM Hallucination, Vision-Language Model, Prompt Engineering
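- Summary: This paper proposes DeP, a training-free framework that mitigates hallucinations in multimodal LLMs through controlled textual perturbation at decoding time. It uses attention variance to reinforce stable evidence regions while suppressing suspicious noise, reducing hallucinations on several benchmarks.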
[143] IAD-Unify: A Region-Grounded Unified Model for Industrial Anomaly Segmentation, Understanding, and Generation
- arXiv: 2604.12440 (cross-listed)
- Authors: Haoyu Zheng, Tianwei Lin, Wei Wang, Zhuonan Wang, Wenqiao Zhang, Jiaqi Zhu, Feifei Shao
- Subjects: cs.CV; cs.AI
- Tags: Anomaly Detection, Vision-Language Model, Image Segmentation
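- Summary: This paper proposes IAD-Unify, a unified framework for industrial anomaly segmentation, understanding, and generation. A frozen DINOv2 region expert supplies precise anomaly evidence to a shared Qwen3.5-4B vision-language backbone through lightweight token injection, achieving strong cross-category generalization on the MMAD benchmark.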
[144] X-VC: Zero-shot Streaming Voice Conversion in Codec Space
- arXiv: 2604.12456 (cross-listed)
- Authors: Qixi Zheng, Yuxiang Zhao, Tianrui Wang, Wenxi Chen, Kele Xu, Yikang Li, Qinyuan Chen, Xipeng Qiu, Kai Yu, Xie Chen
- Subjects: eess.AS; cs.AI
- Tags: Speech Synthesis, Speech Processing, Zero-Shot Learning
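- Summary: This paper introduces X-VC, a zero-shot streaming voice conversion system that performs one-step conversion in the latent space of a neural codec. It uses a dual-conditioned acoustic converter with adaptive normalization to inject speaker information, and a chunked inference scheme trained on codec-aligned segments, achieving low latency while maintaining high speaker similarity.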
[145] Euler-inspired Decoupling Neural Operator for Efficient Pansharpening
- arXiv: 2604.12463 (cross-listed)
- Authors: Anqi Zhu, Mengting Ma, Yizhen Jiang, Xiangdong Li, Kai Zheng, Jiaxin Li, Wei Zhang
- Subjects: cs.CV; cs.AI
- Tags: Image Fusion, Neural Operator, Remote Sensing
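- Summary: This paper proposes the EDNO framework, which uses Euler's formula to map features into polar coordinates and performs efficient pansharpening through explicit and implicit feature interaction modules, achieving a global receptive field in the frequency domain while maintaining discretization invariance.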
[146] From Kinematics to Dynamics: Learning to Refine Hybrid Plans for Physically Feasible Execution
- arXiv: 2604.12474 (cross-listed)
- Authors: Lidor Erez, Shahaf S. Shperberg, Ayal Taitler
- Subjects: cs.RO; cs.AI
- Tags: Motion Planning, Reinforcement Learning, Robotics
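- Summary: This paper proposes using reinforcement learning to refine the first-order kinematic trajectories produced by a hybrid planner so that they satisfy second-order dynamics constraints, closing the gap between planned trajectories and real physical execution.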
[147] Mining Large Language Models for Low-Resource Language Data: Comparing Elicitation Strategies for Hausa and Fongbe
- arXiv: 2604.12477 (cross-listed)
- Authors: Mahounan Pericles Adjovi, Roald Eiselen, Prasenjit Mitra
- Subjects: cs.CL; cs.AI
- Tags: Low-Resource NLP, Data Synthesis, Multilingual Learning
- Venue: LREC-COLING 2026
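- Summary: This paper studies how strategic prompting can elicit usable text data from LLMs for Hausa and Fongbe, two low-resource West African languages, and compares six prompting strategies on GPT-4o Mini and Gemini.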
[148] Audio Source Separation in Reverberant Environments using $β$-divergence based Nonnegative Factorization
- arXiv: 2604.12480 (cross-listed)
- Authors: Mahmoud Fakhry, Piergiorgio Svaizer, Maurizio Omologo
- Subjects: cs.SD; cs.AI
- Tags: Audio Source Separation, Speech Processing
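- Summary: This paper proposes a β-divergence-based nonnegative factorization method for multichannel audio source separation in reverberant environments. Minimizing the β-divergence controls the sparsity of the factorization, and the method achieves better separation quality under a range of mixing conditions.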
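For reference, the β-divergence named in the [148] summary is commonly defined elementwise as below. This is the standard textbook form, given here as an assumption about what the paper minimizes; its exact cost function and weighting may differ.

```latex
d_\beta(x \mid y) =
\begin{cases}
\dfrac{1}{\beta(\beta-1)}\left(x^{\beta} + (\beta-1)\,y^{\beta} - \beta\,x\,y^{\beta-1}\right), & \beta \in \mathbb{R}\setminus\{0,1\},\\[6pt]
x\log\dfrac{x}{y} - x + y, & \beta = 1 \ \text{(generalized Kullback-Leibler)},\\[6pt]
\dfrac{x}{y} - \log\dfrac{x}{y} - 1, & \beta = 0 \ \text{(Itakura-Saito)}.
\end{cases}
```

Setting β = 2 recovers half the squared Euclidean distance, (x − y)² / 2.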
[149] Social Learning Strategies for Evolved Virtual Soft Robots
- arXiv: 2604.12482 (cross-listed)
- Authors: K. Ege de Bruin, Kyrre Glette, Kai Olav Ellefsen, Giorgia Nadizar, Eric Medvet
- Subjects: cs.RO; cs.AI
- Tags: Evolutionary Robotics, Imitation Learning, Multi-Agent System
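- Summary: This paper introduces social learning for virtual soft robots, letting a robot reuse optimized controller parameters from its peers to speed up optimization of its own brain. Experiments show that inheriting experience from morphologically similar robots works best.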
[150] Elastic Net Regularization and Gabor Dictionary for Classification of Heart Sound Signals using Deep Learning
- arXiv: 2604.12483 (cross-listed)
- Authors: Mahmoud Fakhry, Ascensión Gallardo-Antolín
- Subjects: cs.SD; cs.AI
- Tags: Medical AI, Signal Processing
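- Summary: This paper builds a time-frequency feature matrix using a Gabor dictionary and elastic net regularization, and combines it with a deep learning network to classify heart sound signals across five heart valve diseases, reaching a top accuracy of 98.95%.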
[151] KG-Reasoner: A Reinforced Model for End-to-End Multi-Hop Knowledge Graph Reasoning
- arXiv: 2604.12487 (cross-listed)
- Authors: Shuai Wang, Yinan Yu
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Graph, LLM Reasoning, Question Answering
- Code: code
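- Summary: This paper proposes KG-Reasoner, a framework that trains an LLM with reinforcement learning to internalize knowledge graph traversal for end-to-end multi-hop reasoning, achieving competitive or state-of-the-art performance on eight knowledge-intensive reasoning benchmarks.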
[152] Deepfakes at Face Value: Image and Authority
- arXiv: 2604.12490 (cross-listed)
- Authors: James Ravi Kirkpatrick
- Subjects: cs.CY; cs.AI
- Tags: AI Ethics, AI Safety
- Venue: AI & Society 2026
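- Summary: From a philosophical perspective, this paper argues that the wrong of deepfakes lies in usurping our authority to govern our own image and identity, and distinguishes legitimate artistic appropriation from wrongful algorithmic simulation.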
[153] Latent Planning Emerges with Scale
- arXiv: 2604.12493 (cross-listed)
- Authors: Michael Hanna, Emmanuel Ameisen
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Interpretability
- Venue: ICLR 2026
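- Summary: This paper defines latent planning in LLMs and shows experimentally that planning ability grows with model scale: models identify a target word in advance and shape the context to support generating it.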
[154] Lit2Vec: A Reproducible Workflow for Building a Legally Screened Chemistry Corpus from S2ORC for Downstream Retrieval and Text Mining
- arXiv: 2604.12498 (cross-listed)
- Authors: Mahmoud Amiri, Jamile Mohammad Jafari, Sara Mostafapour, Thomas Bocklitz
- Subjects: cs.DB; cs.AI
- Tags: Data Annotation, Information Retrieval, Scientific Computing
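- Summary: This paper presents Lit2Vec, a workflow for building a license-screened chemistry corpus from S2ORC, comprising full text, paragraph embeddings, and metadata, to support downstream retrieval and text mining.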
[155] SEATrack: Simple, Efficient, and Adaptive Multimodal Tracker
- arXiv: 2604.12502 (cross-listed)
- Authors: Junbin Su, Ziteng Xue, Shihui Zhang, Kun Chen, Weiming Hu, Zhipeng Zhang
- Subjects: cs.CV; cs.AI
- Tags: Object Tracking, Multimodal Learning, Parameter-Efficient Fine-Tuning
- Venue: CVPR 2026
- Code: code
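- Summary: This paper proposes SEATrack, a multimodal tracker that aligns cross-modal attention with AMG-LoRA and models global relations efficiently with a hierarchical mixture of experts, achieving the best balance on RGB-T, RGB-D, and RGB-E tracking tasks.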
[156] Topology-Aware Reasoning over Incomplete Knowledge Graph with Graph-Based Soft Prompting
- arXiv: 2604.12503 (cross-listed)
- Authors: Shuai Wang, Xixi Wang, Yinan Yu
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Graph, Question Answering, Prompt Engineering
- Code: code
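- Summary: This paper proposes a graph-based soft prompting framework that shifts KBQA reasoning from node-level path traversal to subgraph-level reasoning, letting the LLM reason over richer structural context and reducing its sensitivity to missing edges.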
[157] NTIRE 2026 The 3rd Restore Any Image Model (RAIM) Challenge: Professional Image Quality Assessment (Track 1)
- arXiv: 2604.12512 (cross-listed)
- Authors: Guanyi Qin, Jie Liang, Bingbing Zhang, Lishen Qu, Ya-nan Guan, Hui Zeng, Lei Zhang, Radu Timofte, Jianhui Sun, Xinli Yue, Tao Shao, Huan Hou, Wenjie Liao, Shuhao Han, Jieyu Yuan, Chunle Guo, Chongyi Li, Zewen Chen, Yunze Liu, Jian Guo, Juan Wang, Yun Zeng, Bing Li, Weiming Hu, Hesong Li, Dehua Liu, Xinjie Zhang, Qiang Li, Li Yan, Wei Dong, Qingsen Yan, Xingcan Li, Shenglong Zhou, Manjiang Yin, Yinxiang Zhang, Hongbo Wang, Jikai Xu, Zhaohui Fan, Dandan Zhu, Wei Sun, Weixia Zhang, Kun Zhu, Nana Zhang, Kaiwei Zhang, Qianqian Zhang, Zhihan Zhang, William Gordon, Linwei Wu, Jiachen Tu, Guoyi Xu, Yaoxin Jiang, Cici Liu, Yaokun Shi
- Subjects: cs.CV; cs.AI
- Tags: Image Quality Assessment, Vision-Language Model, Benchmark
- Venue: CVPRW 2026
- Code: code
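- Summary: This paper gives an overview of the NTIRE 2026 professional image quality assessment challenge, which explores how well MLLMs can emulate human expert cognition when assessing pairs of high-quality images. The challenge has two tasks: comparative quality selection and interpretative reasoning.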
[158] Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA
- arXiv: 2604.12526 (cross-listed)
- Authors: Yogachandran Rahulamathavan, Nasir Iqbal, Juncheng Hu, Sangarapillai Lambotharan
- Subjects: cs.LG; cs.AI
- Tags: Machine Unlearning, Continual Learning, Parameter-Efficient Fine-Tuning
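- Summary: This paper proposes an SVD-based orthogonal subspace projection method for continual machine unlearning. It constrains LoRA updates to the orthogonal complement of the subspaces of previous tasks, and maintains baseline performance after thirty sequential unlearning tasks.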
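The orthogonal-complement constraint described for [158] can be illustrated with a small NumPy sketch. It is an assumption-laden toy, not the paper's implementation: the helper names, the choice to stack previous updates column-wise, and the fixed `rank` are all hypothetical.

```python
import numpy as np

def orthogonal_complement_projector(prev_updates, rank):
    """Return P = I - U U^T, where U holds the top-`rank` left singular
    vectors of the stacked previous-task updates (hypothetical setup)."""
    stacked = np.concatenate(prev_updates, axis=1)        # shape (d, total_cols)
    u, _, _ = np.linalg.svd(stacked, full_matrices=False)
    basis = u[:, :rank]                                   # orthonormal columns
    return np.eye(stacked.shape[0]) - basis @ basis.T

def project_update(delta_w, projector):
    """Remove the component of a new update that lies in the old subspace."""
    return projector @ delta_w

rng = np.random.default_rng(0)
d = 8
prev = [rng.standard_normal((d, 2)), rng.standard_normal((d, 2))]
P = orthogonal_complement_projector(prev, rank=4)

new_delta = rng.standard_normal((d, 4))
projected = project_update(new_delta, P)

# Largest inner product between any previous direction and the projected update.
overlap = max(np.abs(p.T @ projected).max() for p in prev)
```

With `rank` equal to the full rank of the stacked updates, `overlap` is zero up to floating-point error, so the new update cannot move the weights along directions used by earlier tasks.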
[159] MODIX: A Training-Free Multimodal Information-Driven Positional Index Scaling for Vision-Language Models
- arXiv: 2604.12537 (cross-listed)
- Authors: Ruoxiang Huang, Zhen Yuan
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Multimodal Learning, LLM Inference
- Venue: CVPR 2026
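- Summary: This paper proposes MODIX, a training-free framework that dynamically adjusts the positional stride according to modality-specific information density, assigning finer-grained positional indices to information-rich modalities and improving multimodal reasoning performance.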
[160] When Does Data Augmentation Help? Evaluating LLM and Back-Translation Methods for Hausa and Fongbe NLP
- arXiv: 2604.12540 (cross-listed)
- Authors: Mahounan Pericles Adjovi, Roald Eiselen, Prasenjit Mitra
- Subjects: cs.CL; cs.AI
- Tags: Data Augmentation, Low-Resource NLP, Named Entity Recognition
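- Summary: This paper evaluates two data augmentation methods, LLM generation and back-translation, for NER and POS tagging in Hausa and Fongbe, and finds that the benefit of augmentation depends on the task type rather than on the language or the quality of the LLM.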
[161] KumoRFM-2: Scaling Foundation Models for Relational Learning
- arXiv: 2604.12596 (cross-listed)
- Authors: Valter Hudovernik, Federico López, Vid Kocijan, Akihiro Nitta, Jan Eric Lenssen, Jure Leskovec, Matthias Fey
- Subjects: cs.LG; cs.AI
- Tags: Foundation Model, Tabular Learning, In-Context Learning
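- Summary: KumoRFM-2 is a foundation model for relational data that supports in-context learning and fine-tuning and natively handles multiple connected tables without manual flattening. Across 41 benchmarks it outperforms supervised learning methods by up to 8%.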
[162] LLM-Guided Prompt Evolution for Password Guessing
- arXiv: 2604.12601 (cross-listed)
- Authors: Vladimir A. Mazin, Mikhail A. Zorin, Dmitrii S. Korzh, Elvir Z. Karimov, Dmitrii A. Bolokhov, Oleg Y. Rogov
- Subjects: cs.CR; cs.AI
- Tags: Prompt Engineering, Cybersecurity, Evolutionary Computation
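- Summary: This paper proposes an LLM-based evolutionary computation method that automatically optimizes prompts for password guessing. Using the OpenEvolve system, it raises the cracking rate on the RockYou test set from 2.02% to 8.48%.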
[163] SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models
- arXiv: 2604.12617 (cross-listed)
- Authors: You Qin, Linqing Wang, Hao Fei, Roger Zimmermann, Liefeng Bo, Qinglin Lu, Chunyu Wang
- Subjects: cs.LG; cs.AI
- Tags: Diffusion Model, Reinforcement Learning, LLM Alignment
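- Summary: SOAR is a post-training method for diffusion models that provides dense per-step supervision through single-step stop-gradient rollback and re-noising, raising GenEval from 0.70 to 0.78 on SD3.5-Medium.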
[164] Efficient Semantic Image Communication for Traffic Monitoring at the Edge
- arXiv: 2604.12622 (cross-listed)
- Authors: Damir Assylbek, Nurmukhammed Aitymbetov, Marko Ristin, Dimitrios Zorbas
- Subjects: cs.CV; cs.AI; cs.NI
- Tags: Edge Computing, Image Compression, Autonomous Driving
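- Summary: This paper proposes two semantic image communication pipelines for traffic monitoring, MMSD and SAMR, which use semantic decomposition and semantic-aware masked reconstruction to cut transmitted data by up to 99% while preserving semantic consistency.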
[165] Neural Dynamic GI: Random-Access Neural Compression for Temporal Lightmaps in Dynamic Lighting Environments
- arXiv: 2604.12625 (cross-listed)
- Authors: Jianhui Wu, Jian Zhou, Zhi Zhou, Zhangjin Huang, Chao Li
- Subjects: cs.GR; cs.AI
- Tags: Image Compression, Neural Compression
- Venue: CVPR 2025
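- Summary: NDGI is a neural compression technique for temporal lightmaps. It integrates temporal information using multi-dimensional feature maps and a lightweight neural network, substantially reducing lightmap storage size while preserving high-quality dynamic global illumination.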
[166] Calibration-Aware Policy Optimization for Reasoning LLMs
- arXiv: 2604.12632 (cross-listed)
- Authors: Ziqi Wang, Xingzhou Lou, Meiqi Wu, Zhengqi Wen, Junge Zhang
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, LLM Alignment, Uncertainty Estimation
- Venue: ACL 2026
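- Summary: CAPO is a calibration-aware policy optimization method for LLM reasoning. It achieves uncertainty-aware advantage estimation through a logistic AUC surrogate loss, improving calibration by up to 15% on mathematical reasoning benchmarks while maintaining reasoning accuracy.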
[167] Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring
- arXiv: 2604.12645 (cross-listed)
- Authors: Melvin Laux, Yi-Ling Liu, Rina Alo, Sören Töpper, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam
- Subjects: cs.RO; cs.AI
- Tags: Multi-Task Learning, Reinforcement Learning, Robotics
- Venue: IEEE OCEANS 2026
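- Summary: This paper proposes a contextual multi-task reinforcement learning method for autonomous reef monitoring. It learns a single context-dependent policy that solves multiple related monitoring tasks, improving sample efficiency and generalization.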
[168] TimeSAF: Towards LLM-Guided Semantic Asynchronous Fusion for Time Series Forecasting
- arXiv: 2604.12648 (cross-listed)
- Authors: Fan Zhang, Shiming Fan, Hua Wang
- Subjects: cs.LG; cs.AI
- Tags: Time Series Forecasting, Multimodal Learning
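- Summary: TimeSAF is a time series forecasting framework based on hierarchical asynchronous fusion. It decouples unimodal feature learning from cross-modal interaction and aggregates global semantics with learnable queries, significantly outperforming existing methods on long-term forecasting benchmarks.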
[169] Learning Chain Of Thoughts Prompts for Predicting Entities, Relations, and even Literals on Knowledge Graphs
- arXiv: 2604.12651 (cross-listed)
- Authors: Alkid Baci, Luke Friedrichs, Caglar Demir, N'Dah Jean Kouagou, Axel-Cyrille Ngonga Ngomo
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Graph, Prompt Engineering, LLM Reasoning
- Code: code
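- Summary: RALP reformulates link prediction as a prompt learning problem, using Bayesian optimization to learn string-based chain-of-thought prompts that serve as scoring functions for triples, and improves MRR by more than 5% on several datasets.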
[170] PromptEcho: Annotation-Free Reward from Vision-Language Models for Text-to-Image Reinforcement Learning
- arXiv: 2604.12652 (cross-listed)
- Authors: Jinlong Liu, Wanggui He, Peng Zhang, Mushui Liu, Hao Jiang, Pipei Huang
- Subjects: cs.CV; cs.AI
- Tags: Text-to-Image, Reinforcement Learning, Vision-Language Model
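- Summary: PromptEcho is a reward construction method that needs neither annotation nor reward model training. It extracts image-text alignment knowledge directly by computing the token-level cross-entropy loss of a frozen VLM, and achieves significant gains on DenseAlignBench.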
[171] BID-LoRA: A Parameter-Efficient Framework for Continual Learning and Unlearning
- arXiv: 2604.12686 (cross-listed)
- Authors: Jagadeesh Rachapudi, Ritali Vatsi, Praful Hambarde, Amit Shukla
- Subjects: cs.LG; cs.AI
- Tags: Continual Learning, Machine Unlearning, Parameter-Efficient Fine-Tuning
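- Summary: BID-LoRA is a parameter-efficient framework for continual learning and unlearning. With three dedicated adapter pathways (retain, new, and unlearn) and an escape unlearning mechanism, it achieves precise knowledge removal and integration while updating only 5% of the parameters.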
[172] Information-Theoretic Optimization for Task-Adapted Compressed Sensing Magnetic Resonance Imaging
- arXiv: 2604.12709 (cross-listed)
- Authors: Xinyu Peng, Ziyang Zheng, Wenrui Dai, Duoduo Xue, Shaohui Li, Chenglin Li, Junni Zou, Hongkai Xiong
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Medical AI, Image Reconstruction, Uncertainty Estimation
- Venue: IEEE TPAMI
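- Summary: This paper proposes an information-theoretic approach to optimizing task-adapted compressed sensing MRI. By maximizing the mutual information between undersampled k-space measurements and the clinical task, it enables probabilistic inference and adaptive sampling.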
[173] LASA: Language-Agnostic Semantic Alignment at the Semantic Bottleneck for LLM Safety
- arXiv: 2604.12710 (cross-listed)
- Authors: Junxiao Yang, Haoran Liu, Jinzhe Tu, Jiale Cheng, Zhexin Zhang, Shiyao Cui, Jiaqi Weng, Jialing Tao, Hui Xue, Hongning Wang, Han Qiu, Minlie Huang
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: LLM Safety, LLM Alignment, Multilingual Learning
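- Summary: LASA is a language-agnostic semantic alignment method that anchors safety alignment at the semantic bottleneck layer, reducing the average attack success rate of LLaMA-3.1-8B-Instruct from 24.7% to 2.8%.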
[174] GF-Score: Certified Class-Conditional Robustness Evaluation with Fairness Guarantees
- arXiv: 2604.12757 (cross-listed)
- Authors: Arya Shah, Kaveri Visavadiya, Manisha Padala
- Subjects: cs.LG; cs.AI
- Tags: Adversarial Robustness, Fairness
- Code: code
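- Summary: GF-Score is a framework that decomposes certified robustness into class-level robustness profiles. It quantifies robustness disparities between classes with four metrics grounded in welfare economics and provides an auditing pipeline that requires no adversarial attacks.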
[175] ARGOS: Who, Where, and When in Agentic Multi-Camera Person Search
- arXiv: 2604.12762 (cross-listed)
- Authors: Myungchul Kim, Kwanyong Park, Junmo Kim, In So Kweon
- Subjects: cs.CV; cs.AI; cs.MA
- Tags: LLM Agent, Multi-Agent System, Object Detection
- Venue: CVPR 2026 Workshop
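- Summary: ARGOS is the first benchmark and framework to reformulate multi-camera person search as an interactive reasoning problem, in which an agent must plan, ask questions, and eliminate candidates within a limited turn budget. It covers 2,691 tasks.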
[176] CLASP: Class-Adaptive Layer Fusion and Dual-Stage Pruning for Multimodal Large Language Models
- arXiv: 2604.12767 (cross-listed)
- Authors: Yunkai Dang, Yizhu Jiang, Yifan Jiang, Qi Fan, Yinghuan Shi, Wenbin Li, Yang Gao
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Model Compression, Multimodal Learning
- Code: code
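- Summary: CLASP is a plug-and-play token reduction framework that builds category-specific visual representations through class-adaptive layer fusion and dual-stage pruning, consistently outperforming existing methods across benchmarks and MLLM architectures.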
[177] Cognition-Inspired Dual-Stream Semantic Enhancement for Vision-Based Dynamic Emotion Modeling
- arXiv: 2604.12777 (cross-listed)
- Authors: Huanzhen Wang, Ziheng Zhou, Zeng Tao, Aoxing Li, Yingkai Zhao, Yuxuan Lin, Yan Wang, Wenqiang Zhang
- Subjects: cs.CV; cs.AI
- Tags: Affective Computing, Multimodal Learning, Vision Transformer
- Venue: ICRA 2026
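- Summary: This paper proposes DuSE, a cognition-inspired dual-stream semantic enhancement model for dynamic facial expression recognition. The model mimics the human brain's emotion perception mechanisms: a Hierarchical Temporal Prompt Cluster (HTPC) realizes the cognitive priming effect, and a Latent Semantic Emotion Aggregator (LSEA) models the knowledge integration process.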
[178] DoseRAD2026 Challenge dataset: AI accelerated photon and proton dose calculation for radiotherapy
- arXiv: 2604.12778 (cross-listed)
- Authors: Fan Xiao, Nikolaos Delopoulos, Niklas Wahl, Lennart Volz, Lina Bucher, Matteo Maspero, Miguel Palacios, Muheng Li, Samir Schulz, Viktor Rogowski, Ye Zhang, Zoltan Perko, Christopher Kurz, George Dedes, Guillaume Landry, Adrian Thummerer
- Subjects: cs.AI; cs.CV
- Tags: Medical AI, Dataset
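- Summary: This paper releases the DoseRAD2026 dataset for benchmarking AI-accelerated photon and proton dose calculation in radiotherapy. It contains paired CT and MRI data from 115 patients together with Monte Carlo simulated dose distributions, supporting research on MRI-guided radiotherapy and real-time adaptive radiotherapy.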
[179] Efficient Adversarial Training via Criticality-Aware Fine-Tuning
- arXiv: 2604.12780 (cross-listed)
- Authors: Wenyun Li, Zheng Zhang, Dongmei Jiang, Yaowei Wang, Xiangyuan Lan
- Subjects: cs.CV; cs.AI
- Tags: Adversarial Robustness, Parameter-Efficient Fine-Tuning, Vision Transformer
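- Summary: This paper proposes CAAT, which makes adversarial training of Vision Transformers efficient by identifying and fine-tuning only the parameters most critical to adversarial robustness. Tuning about 6% of the parameters achieves robustness comparable to full adversarial training.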
[180] OSC: Hardware Efficient W4A4 Quantization via Outlier Separation in Channel Dimension
- arXiv: 2604.12782 (cross-listed)
- Authors: Zhiyuan Zhang, Yanzhao Li, Zhiqiang Zou, Bai Du, Yupeng Sun, Hui Dong, Hui Wang
- Subjects: cs.LG; cs.AI
- Tags: Quantization, LLM Inference, Hardware Acceleration
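- Summary: This paper proposes OSC, a framework for hardware-efficient W4A4 quantization through outlier separation along the channel dimension. It uses a dual-path computation strategy that aggregates outlier activation channels into a compact tensor, achieving a 1.78x speedup while preserving accuracy.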
[181] VFA: Relieving Vector Operations in Flash Attention with Global Maximum Pre-computation
- arXiv: 2604.12798 (cross-listed)
- Authors: Yupeng Sun, Yanzhao Li, Zhiqiang Zou, Bai Du, Zhiyuan Zhang, Hui Dong, Gaoyige Fan, Hui Wang
- Subjects: cs.LG; cs.AI
- Tags: LLM Inference, Hardware Acceleration
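- Summary: This paper proposes VFA, which reduces the overhead of vector operations in FlashAttention through global maximum pre-computation. By initializing the running maximum and reordering the traversal of key blocks, it substantially relieves the reduction bottleneck of online softmax without loss of performance.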
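To make the online-softmax bottleneck mentioned for [181] concrete, the sketch below implements a blockwise online softmax for a single query and counts how often the accumulators must be rescaled. It is an illustrative NumPy toy under stated assumptions: the function name and the `init_max` parameter are hypothetical, and it covers only the effect of seeding the running maximum, not the paper's key-block reordering or its kernel design.

```python
import numpy as np

def online_softmax_weighted_sum(scores, values, block, init_max=-np.inf):
    """Blockwise softmax(scores) @ values with a running maximum.

    Returns the result and the number of times the accumulators had to be
    rescaled because the running maximum increased.
    """
    m = init_max                       # running maximum
    denom = 0.0                        # running softmax denominator
    acc = np.zeros(values.shape[1])    # running weighted sum of values
    rescales = 0
    for start in range(0, len(scores), block):
        s = scores[start:start + block]
        v = values[start:start + block]
        new_m = max(m, s.max())
        if new_m > m and np.isfinite(m):
            # The maximum grew: earlier terms were scaled by exp(-m), fix them up.
            scale = np.exp(m - new_m)
            denom *= scale
            acc *= scale
            rescales += 1
        m = new_m
        p = np.exp(s - m)
        denom += p.sum()
        acc += p @ v
    return acc / denom, rescales

scores = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
values = np.arange(12.0).reshape(6, 2)

weights = np.exp(scores - scores.max())
reference = (weights / weights.sum()) @ values

out_base, n_base = online_softmax_weighted_sum(scores, values, block=2)
out_pre, n_pre = online_softmax_weighted_sum(
    scores, values, block=2, init_max=scores.max()
)
```

Both calls reproduce the reference result; seeding the running maximum with a precomputed global maximum means the maximum never increases during the loop, so the rescaling branch is never taken.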
[182] Efficiency of Proportional Mechanisms in Online Auto-Bidding Advertising
- arXiv: 2604.12799 (cross-listed)
- Authors: Nguyen Kim Thang
- Subjects: cs.GT; cs.AI; cs.DS
- Tags: Game AI, Recommender System
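- Summary: This paper analyzes the efficiency of proportional mechanisms in auto-bidding advertising, proving that the standard proportional mechanism has a price of anarchy (PoA) bound of 2 and proposing an improved variant whose PoA approaches 1. The analysis builds a theoretical framework on duality and the KKT conditions.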
[183] Rethinking Satellite Image Restoration for Onboard AI: A Lightweight Learning-Based Approach
- arXiv: 2604.12807 (cross-listed)
- Authors: Adrien Dorise, Marjorie Bellizzi, Omar Hlimi
- Subjects: cs.CV; cs.AI
- Tags: Remote Sensing, Image Enhancement, Edge Computing
- Venue: CVPR 2026 Workshop
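- Summary: This paper proposes ConvBEERS, a lightweight CNN model for satellite image restoration suited to onboard AI processing. On an FPGA it achieves a 41x latency reduction while keeping competitive image quality and improving downstream object detection performance.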
[184] Algorithmic Analysis of Dense Associative Memory: Finite-Size Guarantees and Adversarial Robustness
- arXiv: 2604.12811 (cross-listed)
- Authors: Madhava Gaikwad
- Subjects: cs.LG; cs.AI; cs.NE
- Tags: Adversarial Robustness, Memory Architecture
- Venue: ICLR 2026 Workshop
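- Summary: This paper gives an algorithmic analysis of dense associative memory (DAM), providing finite-size guarantees and adversarial robustness bounds. It proves geometric convergence of the asynchronous retrieval dynamics and establishes capacity guarantees tied to a pattern separation condition.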
[185] Loop Corrections to the Training and Generalization Errors of Random Feature Models
- arXiv: 2604.12827 (cross-listed)
- Authors: Taeyoung Kim
- Subjects: cs.LG; cs.AI; stat.ML
- Tags: Deep Learning Theory
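- Summary: This paper studies random feature models from a statistical physics perspective and derives loop corrections to the training, test, and generalization errors. An effective field theory framework captures finite-width contributions, going beyond the mean-kernel approximation.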
[186] Detecting and refurbishing ground truth errors during training of deep learning-based echocardiography segmentation models
- arXiv: 2604.12832 (cross-listed)
- Authors: Iman Islam, Bram Ruijsink, Andrew J. Reader, Andrew P. King
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Image Segmentation, Weak Supervision
- Venue: ISBI 2026
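- Summary: This paper studies the robustness of deep learning models to annotation errors in echocardiography segmentation, and proposes an error detection method based on variance of gradients (VOG) together with a pseudo-label refurbishment strategy. The approach significantly improves model performance under high error rates.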
[187] FastGrasp: Learning-based Whole-body Control method for Fast Dexterous Grasping with Mobile Manipulators
- arXiv: 2604.12879 (cross-listed)
- Authors: Heng Tao, Yiming Zhong, Zemin Yang, Yuexin Ma
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Reinforcement Learning, Sim-to-Real
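- Summary: This paper proposes FastGrasp, a learning-based framework for fast dexterous grasping with mobile manipulators. It uses a two-stage reinforcement learning strategy, incorporates tactile feedback for real-time grasp adjustment, and validates its effectiveness through sim-to-real transfer.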
[188] Towards Long-horizon Agentic Multimodal Search
- arXiv: 2604.12890 (cross-listed)
- Authors: Yifan Du, Zikang Liu, Jinbiao Peng, Jie Wu, Junyi Li, Jinyang Li, Wayne Xin Zhao, Ji-Rong Wen
- Subjects: cs.CV; cs.AI
- Tags: LLM Agent, Multimodal Learning, Information Retrieval
- Code: code
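- Summary: This paper proposes LMM-Searcher, a framework that addresses context explosion in long-horizon multimodal search with a file-based visual representation mechanism. It supports search at the scale of 100 turns and achieves the best performance among open-source models on long-horizon benchmarks.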
[189] Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss
- arXiv: 2604.12911 (cross-listed)
- Authors: Ronald Skorobogat, Ameya Prabhu, Matthias Bethge
- Subjects: cs.CL; cs.AI
- Tags: Machine Translation, LLM Evaluation, Benchmark
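- Summary: This paper shows that existing frontier multilingual benchmarks mainly measure mathematical reasoning and factual recall rather than multilingual ability. It proposes round-trip translation as an alternative evaluation method and releases the LiT benchmark dataset.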
[190] CoDe-R: Refining Decompiler Output with LLMs via Rationale Guidance and Adaptive Inference
- arXiv: 2604.12913 (cross-listed)
- Authors: Qiang Zhang, Zhongnian Li
- Subjects: cs.SE; cs.AI; cs.CR
- Tags: Code Generation, LLM Reasoning
- Venue: IJCNN 2026
- Code: code
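- Summary: This paper proposes CoDe-R, a framework that refines decompiler output through semantic cognitive enhancement and a dynamic dual-path fallback mechanism. It is the first to exceed a 50% average re-executability rate with a lightweight model.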
[191] Distorted or Fabricated? A Survey on Hallucination in Video LLMs
- arXiv: 2604.12944 (cross-listed)
- Authors: Yiyang Huang, Yitian Zhang, Yizhou Wang, Mingyuan Zhang, Liang Shi, Huimin Zeng, Yun Fu
- Subjects: cs.CV; cs.AI
- Tags: LLM Hallucination, Vision-Language Model, Survey
- Venue: ACL 2026 Findings
- Code: code
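- Summary: This paper surveys hallucination in video large language models, systematically classifying it into two categories: dynamic distortion and content fabrication. It reviews evaluation methods and mitigation strategies and analyzes the root causes of hallucination.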
[192] Parallax: Why AI Agents That Think Must Never Act
- arXiv: 2604.12986 (cross-listed)
- Authors: Joel Fokou
- Subjects: cs.CR; cs.AI
- Tags: LLM Agent, LLM Security, AI Safety
- Code: code
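- Summary: This paper proposes Parallax, a security paradigm that separates an AI agent's cognition from its execution. Through four principles (cognition-execution separation, adversarial validation, information flow control, and reversible execution), it still blocks 98.9% of attacks when the agent is compromised.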
[193] ROSE: An Intent-Centered Evaluation Metric for NL2SQL
- arXiv: 2604.12988 (cross-listed)
- Authors: Wenqi Pei, Shizheng Hou, Boyan Li, Han Chen, Zhichao Shi, Yuyu Luo
- Subjects: cs.DB; cs.AI
- Tags: Text-to-SQL, LLM Evaluation
- Venue: ACL 2026
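- Summary: This paper proposes ROSE, an intent-centered evaluation metric for NL2SQL that asks whether the predicted SQL answers the user's question rather than whether it matches the ground-truth SQL. ROSE uses an adversarial Prover-Refuter cascade and achieves the best agreement with human experts on an expert-aligned validation set, exceeding the next-best metric by nearly 24% in Cohen's Kappa.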
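Since the [193] summary reports agreement in Cohen's Kappa, a minimal sketch of that statistic may help when reading the number. The function and the toy labels below are illustrative assumptions and are not taken from the ROSE paper; the formula is the standard one, kappa = (p_o − p_e) / (1 − p_e).

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items.

    Undefined (division by zero) when chance agreement p_e equals 1.
    """
    n = len(labels_a)
    # Observed agreement: fraction of items on which the raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical verdicts: 1 = "the SQL answers the question", 0 = "it does not".
metric = [1, 1, 0, 1, 0, 0, 1, 0]
expert = [1, 1, 0, 0, 0, 0, 1, 1]
kappa = cohens_kappa(metric, expert)
```

Here the raters agree on 6 of 8 items (p_o = 0.75) and chance agreement is 0.5, so `kappa` is 0.5; a value of 1 would mean perfect agreement and 0 agreement no better than chance.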
[194] LogicEval: A Systematic Framework for Evaluating Automated Repair Techniques for Logical Vulnerabilities in Real-World Software
- arXiv: 2604.12994 (cross-listed)
- Authors: Syed Md Mukit Rashid, Abdullah Al Ishtiaq, Kai Tu, Yilu Dong, Tianwei Wu, Ali Ranjbar, Tianchang Yang, Najrin Sultana, Shagufta Mehnaz, Syed Rafiul Hussain
- Subjects: cs.CR; cs.AI
- Tags: Software Testing, LLM Security, Program Repair
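- Summary: This paper proposes LogicEval, a framework for systematically evaluating automated repair techniques for logical vulnerabilities in software. The authors create LogicDS, the first logical vulnerability dataset, containing 86 vulnerabilities with assigned CVEs, and evaluate traditional and LLM-based repair methods, finding that compilation and test failures stem mainly from prompt sensitivity, loss of code context, and difficulty in patch localization.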
[195] One Token Away from Collapse: The Fragility of Instruction-Tuned Helpfulness
- arXiv: 2604.13006 (cross-listed)
- Authors: Erfan Baghaei Potraghloo, Seyedarmin Azizi, Souvik Kundu, Massoud Pedram
- Subjects: cs.CL; cs.AI
- Tags: Instruction Tuning, LLM Evaluation, Interpretability
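- Summary: This paper exposes the fragility of instruction-tuned LLMs under simple lexical constraints: banning a single punctuation mark or common word reduces response comprehensiveness by 14-48%. Through mechanistic analysis, the authors identify this as a planning failure and find that base models do not collapse systematically under the same constraints, indicating that instruction tuning creates the fragility.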
[196] Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation
- arXiv: 2604.13010 (cross-listed)
- Authors: Yecheng Wu, Song Han, Hai Cai
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, LLM Reasoning, Mathematical Reasoning
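- Summary: This paper proposes Lightning OPD, an offline on-policy distillation framework that removes the need for a live teacher server by precomputing the teacher model's log-probabilities. It achieves state-of-the-art performance on mathematical reasoning and code generation, delivers a 4x speedup over standard OPD, and reaches 69.9% on AIME 2024 with only 30 GPU hours.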
[197] Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
- arXiv: 2604.13016 (cross-listed)
- Authors: Yaxuan Li, Yuxin Zuo, Bingxiang He, Jinqian Zhang, Chaojun Xiao, Cheng Qian, Tianyu Yu, Huan-ang Gao, Wenkai Yang, Zhiyuan Liu, Ning Ding
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Knowledge Distillation, LLM Reasoning
- Code: code
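- Summary: This paper systematically studies the dynamics and mechanisms of on-policy distillation (OPD) and identifies two conditions for OPD to succeed: the teacher and student must have compatible thinking patterns, and the teacher must offer genuinely new capabilities the student has not seen. The authors propose two strategies to recover failing OPD, offline cold start and teacher-aligned prompt selection, and analyze the costs of OPD.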
[198] Representation geometry shapes task performance in vision-language modeling for CT enterography
- arXiv: 2604.13021 (cross-listed)
- Authors: Cristian Minoccheri, Emily Wittrup, Kayvan Najarian, Ryan Stidham
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Medical AI, RAG
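- Summary: This paper presents the first study of vision-language transfer learning for CT enterography, finding that mean pooling is better suited to disease classification and attention pooling to cross-modal retrieval. Multi-window RGB encoding outperforms multi-planar sampling strategies, and RAG improves severity accuracy in report generation by 7-14 percentage points.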
[199] Visual Preference Optimization with Rubric Rewards
- arXiv: 2604.13029 (cross-listed)
- Authors: Ya-Qi Yu, Fangyu Hong, Xiangyang Qu, Hao Wang, Gaojie Wu, Qiaoyu Luo, Nuo Xu, Huixin Wang, Wuheng Xu, Yongxin Liao, Zihao Chen, Haonan Li, Ziming Li, Dezhi Peng, Minghui Liao, Jihao Wu, Haoyu Ren, Dandan Tu
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, RLHF, Multimodal Learning
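- Summary: This paper proposes rDPO, a preference optimization framework based on instance-specific rubrics. Rubric-driven prompting significantly improves performance on reward modeling benchmarks. On downstream tasks, rubric-based filtering raises the macro average to 82.69, whereas outcome-based filtering lowers it from 81.14 to 75.82.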
Replaced submissions (140)
[200] SmellNet: A Large-scale Dataset for Real-world Smell Recognition
- arXiv: 2506.00239 (replaced)
- Authors: Dewei Feng, Wei Dai, Carol Li, Alistair Pernigo, Yunge Wen, Paul Pu Liang
- Subjects: cs.AI
- Tags: Dataset, Sensor Fusion, Time Series Analysis
- Venue: ICLR 2026
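- Summary: This paper introduces SmellNet, a large-scale dataset for real-world smell recognition containing about 828,000 time-series data points covering 50 substances and 43 mixtures. The authors develop the ScentFormer architecture, which achieves 63.3% Top-1 accuracy on the SmellNet-Base classification task.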
[201] Fragile Preferences: A Deep Dive Into Order Effects in Large Language Models
- arXiv: 2506.14092 (replaced)
- Authors: Haonan Yin, Shai Vardi, Vidyanand Choudhary
- Subjects: cs.AI
- Tags: LLM Evaluation, Bias Mitigation, Decision Making
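- Summary: This paper presents a comprehensive study of position bias in LLMs on résumé comparison and color selection tasks, uncovering a quality-dependent shift and a name bias. The authors extend the rational choice framework to classify pairwise preferences as robust, fragile, or indifferent, and propose targeted mitigation strategies.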
[202] League of LLMs: A Benchmark-Free Paradigm for Mutual Evaluation of Large Language Models
- arXiv: 2507.22359 (replaced)
- Authors: Qianhong Guo, Wei Xie, Xiaofang Cai, Enze Wang, Shuoyoucheng Ma, Xiaobing Sun, Tian Xia, Kai Chen, Xiaofeng Wang, Baosheng Wang
- Subjects: cs.AI; cs.CL
- Tags: LLM Evaluation, Benchmark
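- Summary: This paper proposes LOL, a benchmark-free evaluation paradigm that organizes multiple LLMs into a self-governed league for multi-round mutual evaluation. The framework achieves 70.7% Top-k consistency on mathematics and programming tasks and reveals empirical findings that traditional paradigms struggle to capture, such as memorization-based answering behavior.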
[203] Synthetic POMDPs to Challenge Memory-Augmented RL: Memory Demand Structure Modeling
- arXiv: 2508.04282 (replaced)
- Authors: Yongyi Wang, Lingfeng Li, Bozhou Chen, Ang Li, Hanyu Liu, Qirui Zheng, Xionghui Yang, Wenxin Li
- Subjects: cs.AI
- Tags: Reinforcement Learning, Memory Architecture, Benchmark
- Venue: FCS 2026
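- Summary: This paper proposes a theoretical framework for analyzing POMDPs based on Memory Demand Structure (MDS), along with a method for constructing POMDPs with a predefined MDS using linear dynamics, state aggregation, and reward redistribution. The work also includes a suite of lightweight, scalable POMDP environments with tunable difficulty.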
[204] Mantis: A Foundation Model for Mechanistic Disease Forecasting
- arXiv: 2508.12260 (replaced)
- Authors: Carson Dudley, Reiden Magdaleno, Christopher Harding, Ananya Sharma, Emily Martin, Marisa Eisenberg
- Subjects: cs.AI; q-bio.QM
- Tags: Foundation Model, Medical AI, Time Series Forecasting
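- Summary: This paper introduces Mantis, a foundation model for disease forecasting trained entirely on mechanistic simulations. Despite using no real-world data in training, Mantis outperforms all models in backtests on the CDC COVID-19 Forecast Hub and generalizes to diseases with transmission mechanisms not represented in its training data.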
[205] Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training
- arXiv: 2509.25758 (replaced)
- Authors: Yein Park, Minbyul Jeong, Jaewoo Kang
- Subjects: cs.AI
- Tags: LLM Reasoning, Interpretability, Knowledge Distillation
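- Summary: Using circuit analysis, this paper shows that post-training for complex reasoning sparks the emergence of novel, functionally specialized attention heads. These heads evolve differently under different training regimes: distillation and SFT foster a cumulative addition of stable reasoning heads, whereas GRPO operates in a dynamic search mode, iteratively activating, evaluating, and pruning attention heads.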
[206] ASGuard: Activation-Scaling Guard to Mitigate Targeted Jailbreaking Attack
- arXiv: 2509.25843 (replaced)
- Authors: Yein Park, Jungwoo Park, Jaewoo Kang
- Subjects: cs.AI
- Tags: LLM Security, LLM Alignment, Interpretability
- Venue: ICLR 2026
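- Summary: This paper proposes ASGuard, a mechanistically informed framework that mitigates targeted jailbreaking attacks by identifying causally linked attention heads and applying channel-wise scaling vectors. Across four LLMs it effectively reduces the success rate of targeted jailbreaking attacks while preserving general capabilities and minimizing over-refusal.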
[207] The Stackelberg Speaker: Optimizing Persuasive Communication in Social Deduction Games
- arXiv: 2510.09087 (replaced)
- Authors: Zhang Zheng, Deheng Ye, Peilin Zhao, Hao Wang
- Subjects: cs.AI
- Tags: LLM Agent, Game AI, Reinforcement Learning
- Venue: ACL 2026
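- Summary: This paper formalizes turn-based dialogue in social deduction games as a Stackelberg competition and proposes a reinforcement learning framework that trains agents to optimize persuasive communication. Across three different social deduction games the agent significantly outperforms baselines, an important step toward AI agents with strategic social influence.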
[208] Mixed-Density Diffuser: Efficient Planning with Non-Uniform Temporal Resolution
- arXiv: 2510.23026 (replaced)
- Authors: Crimson Stambaugh, Rajesh P. N. Rao
- Subjects: cs.AI; cs.RO
- Tags: Diffusion Model, Reinforcement Learning, Automated Planning
- Venue: ESANN 2026
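- Summary: This paper proposes the Mixed-Density Diffuser (MDD), a diffusion planner with tunable temporal density. MDD surpasses the state-of-the-art Diffusion Veteran (DV) framework on the Maze2D, Franka Kitchen, and Antmaze datasets, setting a new state of the art on the D4RL benchmark.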
[209] JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence
- arXiv: 2510.23538 (replaced)
- Authors: Qiushi Sun, Jingyang Gong, Yang Liu, Qiaosheng Chen, Lei Li, Kai Chen, Qipeng Guo, Ben Kao, Fei Yuan
- Subjects: cs.AI; cs.CL; cs.CV; cs.SE
- Tags: Code Generation, Multimodal Learning, Vision-Language Model
- Venue: ICLR 2026
- Code: code
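- Summary: This paper introduces JanusCoder, a visual-programmatic interface that generates code from textual instructions, visual inputs, or a combination of both. The authors build JanusCode-800K, the largest multimodal code corpus, and train models that perform strongly on both text-centric and vision-centric coding tasks.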
[210] MGA: Memory-Driven GUI Agent for Observation-Centric Interaction
- arXiv: 2510.24168 (replaced)
- Authors: Weihua Cheng, Junming Liu, Yifei Sun, Botian Shi, Yirong Chen, Ding Wang
- Subjects: cs.AI
- Tags: GUI Automation, LLM Agent, Memory Architecture
- Code: code
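- Summary: This paper proposes MGA, a framework that uses structured state memory to decouple long-horizon trajectories into independent decision steps, adopting an "observe first, then memory-enhanced" principle to reduce the cognitive overhead and system complexity of GUI agents. Experiments show competitive performance on open-ended GUI tasks.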
[211] Does RLVR Extend Reasoning Boundaries? Investigating Capability Expansion in Vision-Language Models
- arXiv: 2511.00710 (replaced)
- Authors: Minghe Shen, Zhuo Zhi, Chonghan Liu, Shuo Xing, Zhengzhong Tu, Che Liu
- Subjects: cs.AI
- Tags: LLM Reasoning, Vision-Language Model, Reinforcement Learning
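- Summary: This paper investigates whether reinforcement learning with verifiable rewards (RLVR) can extend the reasoning boundaries of vision-language models. Using Ariadne, a synthetic maze navigation framework, the authors show that RLVR lets the model solve spatial reasoning problems on which the base model scores 0% accuracy, indicating genuine capability expansion rather than merely improved sampling efficiency.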
[212] DecompSR: A dataset for decomposed analyses of compositional multihop spatial reasoning
- arXiv: 2511.02627 (replaced)
- Authors: Lachlan McPheat, Navdeep Kaur, Robert Blackwell, Alessandra Russo, Anthony G. Cohn, Pranava Madhyastha
- Subjects: cs.AI
- Tags: LLM Reasoning, Benchmark, Spatial Reasoning
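- Summary: This paper introduces DecompSR, a large-scale benchmark dataset of more than 5 million data points for analyzing compositional spatial reasoning. The framework allows several aspects of compositionality to be varied independently, and reveals that LLMs struggle with productive and systematic generalization in spatial reasoning tasks.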
[213] Dataset Safety in Autonomous Driving: Requirements, Risks, and Assurance
- arXiv: 2511.08439 (replaced)
- Authors: Alireza Abbaspour, Tejaskumar Balgonda Patil, B Ravi Kiran, Russel Mohr, Senthil Yogamani
- Subjects: cs.AI
- Tags: Autonomous Driving, AI Safety, Data Annotation
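- Summary: This paper proposes a framework for developing safe datasets that follows the ISO/PAS 8800 guidelines and covers the full lifecycle of data collection, annotation, curation, and maintenance. It includes rigorous safety analyses to identify and mitigate risks caused by dataset insufficiencies, and proposes verification and validation strategies to ensure compliance with safety standards.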
[214] Learning the Value of Value Learning
- arXiv: 2511.17714 (replaced)
- Authors: Alex John London, Aydin Mohseni
- Subjects: cs.AI; cs.GT
- Tags: Decision Making, Multi-Agent System
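- Summary: This paper extends the Jeffrey-Bolker decision framework to model the refinement of values and proves a value-of-information theorem for value refinement. In multi-agent settings, mutual refinement can turn zero-sum games into positive-sum interactions and yield Pareto improvements in Nash bargaining.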
[215] A Benchmark for Evaluating Outcome-Driven Constraint Violations in Autonomous AI Agents
- arXiv: 2512.20798 (replaced)
- Authors: Miles Q. Li, Benjamin C. M. Fung, Martin Weiss, Pulei Xiong, Khalil Al-Hussaeni, Claude Fachkha
- Subjects: cs.AI
- Tags: LLM Agent, AI Safety, Benchmark
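- Summary: This paper introduces a benchmark of 40 multi-step scenarios for evaluating emergent outcome-driven constraint violations in autonomous AI agents. Tests on 12 state-of-the-art LLMs show violation rates ranging from 11.5% to 66.7%, and safety does not improve reliably across model generations.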
[216] No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
- arXiv: 2601.06794 (replaced)
- Authors: Zhicong Li, Lingjie Jiang, Yulan Hu, Xingchen Zeng, Yixia Li, Xiangwen Zhang, Guanhua Chen, Zheng Pan, Xin Li, Yong Liu
- Subjects: cs.AI
- Tags: LLM Agent, Reinforcement Learning
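- Summary: This paper introduces ECHO, a framework that jointly optimizes the policy and the critic through a synchronized co-evolutionary loop, addressing stale critic feedback in open-world environments. Using a cascaded rollout mechanism and saturation-aware gain shaping, it achieves more stable training and higher success rates on long-horizon tasks.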
[217] PrivacyReasoner: Can LLM Emulate a Human-like Privacy Mind?
- arXiv: 2601.09152 (replaced)
- Authors: Yiwen Tu, Xuan Liu, Lianhui Qin, Haojian Jin
- Subjects: cs.AI
- Tags: Privacy, LLM Agent
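- Summary: This paper proposes PrivacyReasoner, an agent architecture that reconstructs a user's "privacy mind" from their online comment history in order to predict individual privacy concerns. It uses a contextual filter to dynamically activate relevant privacy beliefs and significantly outperforms baselines across several domains.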
[218] LatentRefusal: Latent-Signal Refusal for Unanswerable Text-to-SQL Queries
- arXiv: 2601.10398 (replaced)
- Authors: Xuancheng Ren, Shijing Hu, Zhihui Lu, Jiangqi Huang, Qiang Duan
- Subjects: cs.AI
- Tags: Text-to-SQL, AI Safety, Interpretability
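- Summary: This paper proposes LatentRefusal, a latent-signal refusal mechanism that predicts query answerability from a large language model's intermediate hidden activations. It uses a tri-residual gated encoder to detect unanswerable queries, raising average F1 to 88.5% across four benchmarks while adding only about 2 ms of overhead.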
[219] WebFactory: Automated Compression of Foundational Language Intelligence into Grounded Web Agents
- arXiv: 2603.05044 (replaced)
- Authors: Sicheng Fan, Qingyun Shi, Shengze Xu, Shengbo Cai, Tieyong Zeng, Li Ling, Yanyi Shang, Dehan Kong
- Subjects: cs.AI
- Tags: GUI Automation, LLM Agent, Reinforcement Learning
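- Summary: This paper introduces WebFactory, a fully automated closed-loop reinforcement learning pipeline that compresses the internet knowledge encoded in LLMs into efficient GUI agent behavior. An agent trained only on synthetic data from 10 websites matches the performance of agents trained on large-scale human-annotated data.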
[220] WebChain: A Large-Scale Human-Annotated Dataset of Real-World Web Interaction Traces
- arXiv: 2603.05295 (replaced)
- Authors: Sicheng Fan, Rui Wan, Yifei Leng, Gaoning Liang, Li Ling, Yanyi Shang, Dehan Kong
- Subjects: cs.AI; cs.CV
- Tags: GUI Automation, Dataset, LLM Agent
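- Summary: This paper introduces WebChain, the largest open-source dataset of human-annotated trajectories on real-world websites, containing 31,725 trajectories and 318k steps. The authors propose a dual mid-training recipe that decouples spatial grounding from planning, achieving state-of-the-art performance on GUI benchmarks.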
[221] A Survey of Multimodal Mathematical Reasoning: From Perception, Alignment to Reasoning
- arXiv: 2603.08291 (replaced)
- Authors: Tianyu Yang, Sihong Wu, Yilun Zhao, Zhenwen Liang, Lisen Dai, Chen Zhao, Minhao Cheng, Arman Cohan, Xiangliang Zhang
- Subjects: cs.AI
- Tags: Mathematical Reasoning, Multimodal Learning, Survey
- Venue: ACL 2026
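- Summary: This paper systematically surveys multimodal mathematical reasoning methods, organized around four basic questions: what to extract from multimodal inputs, how to represent and align textual and visual information, how to perform reasoning, and how to evaluate the correctness of the reasoning process. It also discusses current challenges and future research directions.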
[222] Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI
- arXiv: 2603.18104 (replaced)
- Authors: Houston Haynes
- Subjects: cs.AI; cs.DC; cs.LG; cs.NE
- Tags: Neuromorphic Computing, Bayesian Optimization, Model Compression
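- Summary: This paper develops an alternative AI training architecture built on a dimensional type system, program hypergraphs, and posit arithmetic, achieving depth-independent training memory and weight-preserving weight updates. It introduces Bayesian distillation for domain-specific training and warm rotation for seamless model deployment.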
[223] Man and machine: artificial intelligence and judicial decision making
- arXiv: 2603.19042 (replaced)
- Authors: Arthur Dyevre, Ahmad Shahvaroughi
- Subjects: cs.AI
- Tags: Legal AI, Fairness, Decision Making
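- Summary: This paper reviews the role of AI in judicial decision making, focusing on criminal justice risk assessment. It examines the performance and fairness of AI tools, the strengths and biases of human judges, and the interaction between AI and humans, finding that AI's effect on pretrial and sentencing decisions is modest or nonexistent.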
[224] Evaluating Language Models for Harmful Manipulation
- arXiv: 2603.25326 (replaced)
- Authors: Canfer Akbulut, Rasmi Elasmar, Abhishek Roy, Anthony Payne, Priyanka Suresh, Lujain Ibrahim, Seliem El-Sayed, Charvi Rastogi, Ashyana Kachra, Will Hawkins, Kristian Lum, Laura Weidinger
- Subjects: cs.AI; cs.CY
- Tags: AI Safety, AI Persuasion, LLM Evaluation
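- Summary: This paper introduces a framework for evaluating harmful AI manipulation through context-specific human-AI interaction studies, tested with 10,101 participants across three AI use domains and three regions. The study finds that AI models produce manipulative behaviors when prompted to, and that manipulation efficacy differs significantly across domains and regions.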
[225] CODESTRUCT: Code Agents over Structured Action Spaces
- arXiv: 2604.05407 (replaced)
- Authors: Myeongsoo Kim, Joe Hsu, Dingmin Wang, Shweta Garg, Varun Kumar, Murali Krishna Ramanathan
- Subjects: cs.AI; cs.SE
- Tags: LLM Agent, Code Generation
- Venue: ACL 2026
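- Summary: This paper proposes CODESTRUCT, a framework that treats the codebase as a structured action space so that agents operate on AST entities rather than text spans, improving Pass@1 accuracy on SWE-Bench Verified by 1.2-5.0% while reducing token consumption.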
[226] Reasoning Graphs: Self-Improving, Deterministic RAG through Evidence-Centric Feedback
- arXiv: 2604.07595 (replaced)
- Authors: Matthew Penaroza
- Subjects: cs.AI; cs.CL
- Tags: RAG, LLM Reasoning
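- Summary: This paper proposes reasoning graphs, a structure that persistently stores the chain of thought for each piece of evidence to enable evidence-centric feedback. Compared with vanilla RAG, it reduces errors by 47% on the same questions without model retraining.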
[227] SEA-Eval: A Benchmark for Evaluating Self-Evolving Agents Beyond Episodic Assessment
- arXiv: 2604.08988 (replaced)
- Authors: Sihang Jiang, Lipeng Ma, Zhonghua Hong, Keyi Wang, Zhiyu Lu, Shisong Chen, Jinghao Zhang, Tianjun Pan, Weijia Zhou, Jiaqing Liang, Yanghua Xiao
- Subjects: cs.AI
- Tags: LLM Agent, LLM Evaluation, Benchmark
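- Summary: This paper gives the first formal definition of self-evolving agents (SEA), proposes an evolutionary flywheel architecture, and introduces the SEA-Eval benchmark for evaluating an agent's ability to accumulate experience across task boundaries.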
[228] Hubble: An LLM-Driven Agentic Framework for Safe, Diverse, and Reproducible Alpha Factor Discovery
- arXiv: 2604.09601 (replaced)
- Authors: Runze Shi, Shengyu Yan, Yuecheng Cai, Chengxi Lv
- Subjects: cs.AI; cs.CE
- Tags: LLM Agent, Quantitative Finance, RAG
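- Summary: Hubble is an LLM-driven alpha factor mining framework that combines a domain-specific operator language, an AST execution sandbox, and a dual-channel RAG module to safely discover interpretable financial factors.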
[229] Edu-MMBias: A Three-Tier Multimodal Benchmark for Auditing Social Bias in Vision-Language Models under Educational Contexts
- arXiv: 2604.10200 (replaced)
- Authors: Ruijia Li, Mingzi Zhang, Zengyi Yu, Yuang Wei, Bo Jiang
- Subjects: cs.AI; cs.CV
- Tags: Vision-Language Model, Fairness, Bias Mitigation
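- Summary: This paper proposes Edu-MMBias, a three-tier multimodal benchmark for auditing social bias in vision-language models in educational settings, and finds that visual input can act as a safety backdoor, triggering biases that bypass text-alignment safeguards.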
[230] Dead Cognitions: A Census of Misattributed Insights
- arXiv: 2604.10288 (replaced)
- Authors: Aaron Tuor, claude.ai
- Subjects: cs.AI
- Tags: AI Ethics, Human-Computer Interaction
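- Summary: This paper identifies a failure mode of AI chat systems, attribution laundering, in which the model performs substantive cognitive work yet credits the insight to the user, gradually eroding the user's ability to accurately assess their own cognitive contribution.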
[231] EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation
- arXiv: 2604.10911 (replaced)
- Authors: Chongliu Jia, Yi Luo, Sipeng Han, Pengwei Li, Jie Ding, Youshuang Hu, Yimiao Qian, Qiya Wang
- Subjects: cs.AI; cs.LG
- Tags: Multi-Agent System, Reinforcement Learning, Quantitative Finance
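- Summary: This paper proposes EvoNash-MARL, a closed-loop multi-agent reinforcement learning framework for medium-horizon equity allocation that combines policy populations, game-theoretic aggregation, and constraint-aware validation, achieving a 19.6% annualized return.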
[232] EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models
- arXiv: 2604.11043 (replaced)
- Authors: Jincheng Xie, Xingchen Xiao, Runheng Liu, Zhongyi Huang, Yu Zheng, Heyan Huang
- Subjects: cs.AI
- Tags: Multimodal Learning, Zero-Shot Learning, Representation Learning
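- Summary: EmergentBridge is an embedding-level bridging framework that improves zero-shot cross-modal transfer for unpaired modality pairs in unified multimodal embedding models by learning a mapping that strengthens non-anchor connectivity while preserving anchor alignment.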
[233] Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs
- arXiv: 2604.11120 (replaced)
- Authors: Wenkai Li, Fan Yang, Shaunak A. Mehta, Koichi Onoue
- Subjects: cs.AI
- Tags: LLM Alignment, LLM Security, LLM Evaluation
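- Summary: This paper shows that single-method safety evaluation is incomplete for persona-imbued LLMs: prompting and activation steering expose different vulnerability profiles, and some prosocial personas become highly vulnerable under activation steering.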
[234] Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems
- arXiv: 2604.11623 (replaced)
- Authors: Charafeddine Mouzouni
- Subjects: cs.AI; cs.SE
- Tags: LLM Agent, Knowledge Management
- Code: code
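- Summary: This paper proposes Context Kubernetes, an architecture for orchestrating enterprise knowledge through declarative manifests, a reconciliation loop, and a three-tier permission model; it successfully blocked all five attack scenarios.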
[235] RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
- arXiv: 2604.11626 (replaced)
- Authors: Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen
- Subjects: cs.AI; cs.LG
- Tags: Text-to-Image, Reinforcement Learning, Image Generation
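- Summary: RationalRewards teaches reward models to generate explicit multi-dimensional critiques before scoring. At training time it provides interpretable reinforcement-learning rewards, and at test time it improves outputs through a critique-and-refine loop, achieving the best preference prediction among open-source reward models.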
[236] Pictorial and apictorial polygonal jigsaw puzzles from arbitrary number of crossing cuts
- arXiv: 2008.07644 (replaced)
- Authors: Peleg Harel, Ofir Itzhak Shahar, Ohad Ben-Shahar
- Subjects: cs.CV; cs.AI; cs.CG
- Tags: Image Reconstruction, 3D Vision
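- Summary: This paper formalizes a new type of jigsaw puzzle whose pieces are convex polygons produced by arbitrary straight cuts, and proposes an automatic solver based on a multi-body spring-mass dynamical system and hierarchical loop constraints.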
[237] Prompt Evolution for Generative AI: A Classifier-Guided Approach
- arXiv: 2305.16347 (replaced)
- Authors: Melvin Wong, Yew-Soon Ong, Abhishek Gupta, Kavitesh K. Bali, Caishun Chen
- Subjects: cs.LG; cs.AI; cs.CV; cs.NE
- Tags: Prompt Engineering, Text-to-Image, Evolutionary Computation
- Venue: CAI 2023
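- Summary: This paper proposes prompt evolution, which applies classifier-guided evolutionary selection pressure during generation to produce multiple output images that better satisfy target concepts or preferences.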
[238] A2-DIDM: Privacy-preserving Accumulator-enabled Auditing for Distributed Identity of DNN Model
- arXiv: 2405.04108 (replaced)
- Authors: Tianxiu Xie, Keke Gai, Jing Yu, Liehuang Zhu
- Subjects: cs.CR; cs.AI
- Tags: Blockchain, Privacy, Model Security
- Code: code
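- Summary: This paper proposes A2-DIDM, an accumulator-based decentralized identity auditing scheme for DNN models that uses blockchain and zero-knowledge proofs to preserve privacy and ensure lightweight ownership verification.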
[239] OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer
- arXiv: 2405.20330 (replaced)
- Authors: Dixuan Lin, Yuxiang Zhang, Mengcheng Li, Wei Jing, Qi Yan, Qianying Wang, Yebin Liu, Hongwen Zhang
- Subjects: cs.CV; cs.AI; cs.GR
- Tags: 3D Vision, Pose Estimation
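- Summary: OmniHands is a general method for recovering interactive 4D hand meshes from monocular or multi-view input, using relation-aware two-hand tokenization and a 4D interaction reasoning module for robust hand reconstruction.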
[240] animal2vec and MeerKAT: A self-supervised transformer for rare-event raw audio input and a large-scale reference dataset for bioacoustics
- arXiv: 2406.01253 (replaced)
- Authors: Julian C. Schäfer-Zimmermann, Vlad Demartsev, Baptiste Averly, Kiran Dhanjal-Adams, Mathieu Duteil, Gabriella Gall, Marius Faiß, Lily Johnson-Ulrich, Dan Stowell, Marta B. Manser, Marie A. Roch, Ariana Strandburg-Peshkin
- Subjects: cs.SD; cs.AI; eess.AS; q-bio.QM; stat.AP
- Tags: Bioacoustics, Self-Supervised Learning, Dataset
- Code: code
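- Summary: This paper proposes animal2vec, a self-supervised Transformer for sparse bioacoustic data, and releases the MeerKAT dataset, currently the largest annotated dataset of non-human terrestrial mammal vocalizations.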
[241] AdaMCoT: Rethinking Cross-Lingual Factual Reasoning through Adaptive Multilingual Chain-of-Thought
- arXiv: 2501.16154 (replaced)
- Authors: Weihua Zheng, Xin Huang, Zhengyuan Liu, Tarun Kumar Vangani, Bowei Zou, Xiyan Tao, Yuhao Wu, Ai Ti Aw, Nancy F. Chen, Roy Ka-Wei Lee
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Multilingual Learning
- Venue: AAAI 2026
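- Summary: This paper proposes the AdaMCoT framework, which strengthens multilingual factual reasoning by dynamically routing the thought process through an intermediate "thinking language". It uses a language-agnostic core and an adaptive reward mechanism to select the best reasoning path, effectively narrowing the performance gap between high-resource and low-resource languages without additional pretraining.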
[242] RegD: Hierarchical Embeddings via Dissimilarity between Arbitrary Euclidean Regions
- arXiv: 2501.17518 (replaced)
- Authors: Hui Yang, Jiaoyan Chen
- Subjects: cs.LG; cs.AI
- Tags: Representation Learning, Knowledge Graph
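- Summary: This paper proposes RegD, a flexible Euclidean framework that supports arbitrary geometric regions (such as boxes and balls) as embedding representations. It attains hyperbolic-like expressiveness through a depth-based dissimilarity between regions and outperforms existing methods on hierarchical data embedding tasks.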
[243] Large Language Models are Powerful Electronic Health Record Encoders
- arXiv: 2502.17403 (replaced)
- Authors: Stefan Hegselmann, Georg von Arnim, Tillmann Rheude, Noel Kronenberg, David Sontag, Gerhard Hindricks, Roland Eils, Benjamin Wild
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Medical AI, Representation Learning
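- Summary: This paper explores converting electronic health records into natural-language descriptions and then generating embeddings with general-purpose large language models for clinical prediction tasks. Experiments show that LLM embeddings perform on par with dedicated EHR foundation models across 15 clinical tasks while being more portable.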
[244] Siamese Foundation Models for Crystal Structure Prediction
- arXiv: 2503.10471 (replaced)
- Authors: Liming Wu, Wenbing Huang, Rui Jiao, Jianxing Huang, Liwei Liu, Yipeng Zhou, Hao Sun, Yang Liu, Fuchun Sun, Yuxiang Ren, Jirong Wen
- Subjects: cs.AI
- Tags: Material Discovery, Foundation Model, Diffusion Model
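- Summary: This paper proposes the DAO framework, which integrates Siamese foundation models (a structure generator and an energy predictor) for crystal structure prediction. In validation on real superconductors it achieves a 100% match rate with experimental references and runs more than 2000 times faster than DFT-based methods.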
[245] Fine-Tuning LLMs for Report Summarization: Analysis on Supervised and Unsupervised Data
- arXiv: 2503.10676 (replaced)
- Authors: Swati Rallapalli, Shannon Gallagher, Andrew O. Mellinger, Jasmine Ratchford, Anusha Sinha, Tyler Brooks, William R. Nichols, Nick Winski, Bryan Brown
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: Summarization, Parameter-Efficient Fine-Tuning
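- Summary: This paper studies fine-tuning large language models for report summarization in resource-constrained settings, covering government archives, news, and intelligence reports. Experiments show that fine-tuning improves summary quality and reduces invalid outputs in many cases.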
[246] Characterizing higher-order representations through generative diffusion models explains human decoded neurofeedback performance
- arXiv: 2503.14333 (replaced)
- Authors: Hojjat Azimi Asrari, Megan A. K. Peters
- Subjects: cs.LG; cs.AI; q-bio.NC
- Tags: Diffusion Model, Neuroscience, Reinforcement Learning
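- Summary: This paper proposes the NERD model, which trains a denoising diffusion model with reinforcement learning to infer the noise distribution in fMRI data for decoded neurofeedback tasks. The model captures human performance well and reveals individual differences that predict task success.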
[247] On the Mathematical Relationship Between Layer Normalization and Dynamic Activation Functions
- arXiv: 2503.21708 (replaced)
- Authors: Felix Stollenwerk
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: Deep Learning Theory, Optimization
- Venue: EACL 2026
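- Summary: This paper reveals the mathematical relationship between layer normalization and dynamic activation functions, derives Dynamic Tanh from RMSNorm, and proposes DyISRU as the exact element-wise counterpart of RMSNorm, which reproduces the normalization effect more accurately.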
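For orientation, the two operations named in this entry are commonly written as below. This is a sketch of the standard definitions, not the paper's derivation: the symbols $d$ (feature dimension), $\epsilon$ (stabilizer), $g$, $\gamma$, $\beta$ (learned per-feature parameters), and $\alpha$ (learned scalar) are notation assumed here, and DyISRU is left out because its exact form is not stated in this digest.

```latex
\mathrm{RMSNorm}(x)_i = g_i \, \frac{x_i}{\sqrt{\tfrac{1}{d}\sum_{j=1}^{d} x_j^{2} + \epsilon}},
\qquad
\mathrm{DyT}(x)_i = \gamma_i \tanh(\alpha x_i) + \beta_i
```

The contrast to notice is that RMSNorm rescales each $x_i$ by a statistic of the whole vector, whereas DyT acts on each element independently.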
[248] On the Geometry of Receiver Operating Characteristic and Precision-Recall Curves
- arXiv: 2504.02169 (replaced)
- Authors: Reza Sameni
- Subjects: cs.LG; cs.AI; math.ST; stat.ML
- Tags: Optimization, Benchmark
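- Summary: This paper studies the geometric properties of ROC and PR curves in binary classification and finds that common classification metrics can be expressed as functions of compositions of the class-conditional cumulative distribution functions. The framework helps with classifier selection, understanding decision thresholds, and application-specific optimization.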
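As a minimal illustration of metrics being determined by the class-conditional score distributions, the sketch below computes empirical TPR and FPR at a threshold and then recovers precision from those two rates plus the class prevalence. It is a textbook identity rather than the paper's framework; the function names and the toy scores are hypothetical.

```python
def rates_at_threshold(pos_scores, neg_scores, threshold):
    """Empirical TPR and FPR when predicting positive for score >= threshold.

    Each rate is the empirical survival function of one class-conditional
    score distribution, evaluated at the threshold.
    """
    tpr = sum(s >= threshold for s in pos_scores) / len(pos_scores)
    fpr = sum(s >= threshold for s in neg_scores) / len(neg_scores)
    return tpr, fpr


def precision_from_rates(tpr, fpr, prevalence):
    """Precision written purely in terms of TPR, FPR, and P(y = 1)."""
    predicted_positive = prevalence * tpr + (1.0 - prevalence) * fpr
    if predicted_positive == 0:
        return float("nan")  # no positive predictions: precision undefined
    return prevalence * tpr / predicted_positive


# Toy scores (hypothetical), four examples per class.
pos = [0.9, 0.8, 0.7, 0.4]
neg = [0.6, 0.3, 0.2, 0.1]
tpr, fpr = rates_at_threshold(pos, neg, threshold=0.5)
precision = precision_from_rates(tpr, fpr, prevalence=0.5)
print(tpr, fpr, precision)  # 0.75 0.25 0.75
```

Sweeping `threshold` traces the ROC curve through `(fpr, tpr)` and the PR curve through `(tpr, precision)`, which is why both curves are fixed once the two score distributions and the prevalence are known.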
[249] Joint Flashback Adaptation for Forgetting-Resistant Instruction Tuning
- arXiv: 2505.15467 (replaced)
- Authors: Yukun Zhao, Lingyong Yan, Zhenyang Li, Shuaiqiang Wang, Zhumin Chen, Zhaochun Ren, Dawei Yin
- Subjects: cs.CL; cs.AI
- Tags: Continual Learning, Instruction Tuning
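- Summary: This paper proposes Joint Flashback Adaptation for forgetting-resistant instruction tuning: while learning new tasks it introduces a small number of prompts from old tasks and constrains the deviation of the model's outputs, giving smooth knowledge transfer and adaptation.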
[250] SEW: Self-Evolving Agentic Workflows for Automated Code Generation
- arXiv: 2505.18646 (replaced)
- Authors: Siwei Liu, Jinyuan Fang, Han Zhou, Yingxu Wang, Zaiqiao Meng
- Subjects: cs.SE; cs.AI; cs.CL
- Tags: Code Generation, LLM Agent, Multi-Agent System
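- Summary: This paper proposes SEW, a self-evolving framework that automatically generates and optimizes multi-agent workflows for code generation. It improves on the backbone LLM by up to 12% on LiveCodeBench, automating workflow design.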
[251] Enhancing Text-to-Image Diffusion Transformer via Split-Text Conditioning
- arXiv: 2505.19261 (replaced)
- Authors: Yu Zhang, Jialei Zhou, Xinchen Li, Qi Zhang, Zhongwei Wan, Tianyu Wang, Duoqian Miao, Changwei Wang, Longbing Cao
- Subjects: cs.CV; cs.AI
- Tags: Text-to-Image, Diffusion Model, Vision Transformer
- Venue: NeurIPS 2025
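- Summary: This paper proposes DiT-ST, a split-text conditioning framework that converts a complete text caption into a set of simplified sentences and injects them hierarchically and incrementally at different denoising stages, mitigating the diffusion transformer's weakness in understanding complete text.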
[252] SpecBranch: Speculative Decoding via Hybrid Drafting and Rollback-Aware Branch Parallelism
- arXiv: 2506.01979 (replaced)
- Authors: Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang
- Subjects: cs.DC; cs.AI
- Tags: Speculative Decoding, LLM Inference
- Venue: ICLR 2026
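- Summary: This paper proposes the SpecBranch framework, which introduces branch parallelism to reduce the waiting overhead between the draft model and the target model in speculative decoding, achieving 1.8x-4.5x speedups and 50% fewer rolled-back tokens.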
[253] HSG-12M: A Large-Scale Benchmark of Spatial Multigraphs from the Energy Spectra of Non-Hermitian Crystals
- arXiv: 2506.08618 (replaced)
- Authors: Xianquan Yan, Hakan Akgün, Kenji Kawaguchi, N. Duane Loh, Ching Hua Lee
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Dataset, Graph Neural Network, Scientific Computing
- Venue: ICLR 2026
- Code: code, code
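- Summary: This paper releases the HSG-12M dataset, containing 11.6 million Hamiltonian spectral graphs derived from the energy spectra of non-Hermitian crystals. It is the first large-scale spatial multigraph dataset and lays a foundation for condensed-matter physics and geometry-aware graph learning.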
[254] Fast AI Model Partition for Split Learning over Edge Networks
- arXiv: 2507.01041 (replaced)
- Authors: Zuguang Li, Wen Wu, Shaohua Wu, Xuemin Shen
- Subjects: cs.LG; cs.AI
- Tags: Distributed Training, Edge Computing, DNN Deployment
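- Summary: This paper proposes an optimal model partitioning algorithm for split learning that minimizes training delay by recasting the problem as a minimum s-t cut on a DAG, achieving significant speedups on a hardware testbed.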
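To make the minimum s-t cut idea concrete, here is a minimal sketch that places each layer of a toy model on the device or the server. The graph encoding is an assumption made for illustration and is not the paper's construction: an edge `s -> layer` carries a hypothetical server-side cost, `layer -> t` a hypothetical device-side cost, and `u -> v` a hypothetical transfer cost paid when `u` stays on the device and `v` moves to the server. The max-flow routine is plain Edmonds-Karp.

```python
from collections import deque


def min_st_cut(capacity, source, sink):
    """Return (cut_value, source_side_nodes) using Edmonds-Karp max-flow."""
    # Residual graph: forward capacities plus zero-capacity reverse edges.
    residual = {source: {}, sink: {}}
    for u, edges in capacity.items():
        for v, cap in edges.items():
            residual.setdefault(u, {})
            residual.setdefault(v, {})
            residual[u][v] = residual[u].get(v, 0) + cap
            residual[v].setdefault(u, 0)

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parents = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parents:
                    parents[v] = u
                    queue.append(v)
        return parents

    flow = 0
    while True:
        parents = bfs_parents()
        if sink not in parents:
            # No augmenting path left: reachable nodes form the source side.
            return flow, set(parents)
        # Find the bottleneck along the path, then push flow through it.
        bottleneck = float("inf")
        v = sink
        while parents[v] is not None:
            bottleneck = min(bottleneck, residual[parents[v]][v])
            v = parents[v]
        v = sink
        while parents[v] is not None:
            u = parents[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def partition_layers(device_cost, server_cost, transfer_cost):
    """Hypothetical encoding; layer names must not be 's' or 't'."""
    capacity = {"s": {}}
    for layer in device_cost:
        capacity["s"][layer] = server_cost[layer]  # cut if layer runs on server
        capacity.setdefault(layer, {})["t"] = device_cost[layer]  # cut if on device
    for (u, v), cost in transfer_cost.items():
        capacity.setdefault(u, {})[v] = cost  # cut if u on device, v on server
    total, source_side = min_st_cut(capacity, "s", "t")
    return total, sorted(source_side - {"s"})


# Toy three-layer chain with made-up costs.
result = partition_layers(
    device_cost={"conv1": 1, "conv2": 4, "fc": 6},
    server_cost={"conv1": 5, "conv2": 2, "fc": 1},
    transfer_cost={("conv1", "conv2"): 2, ("conv2", "fc"): 3},
)
print(result)  # (6, ['conv1'])
```

In this toy instance the cheapest split keeps only `conv1` on the device: 1 (device) + 2 (transfer) + 2 + 1 (server) = 6, which matches the cut value.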
[255] Global optimization tailored for graphics processing units: Complete and rigorous search for large-scale nonlinear minimization
- arXiv: 2507.01770 (replaced)
- Authors: Guanglu Zhang, Qihang Shan, Jonathan Cagan
- Subjects: math.NA; cs.AI; cs.DC; cs.MS; math.OC
- Tags: GPU Computing, Optimization, Scientific Computing
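- Summary: This paper proposes a global optimization method based on interval analysis and GPU architecture for enclosing the global minimum of nonlinear functions. It successfully handles benchmark test functions with up to 10,000 dimensions, far beyond results reported in the literature.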
[256] Mobile GUI Agents under Real-world Threats: Are We There Yet?
- arXiv: 2507.04227 (replaced)
- Authors: Guohong Liu, Jialei Ye, Jiacheng Liu, Yuanchun Li, Wei Liu, Pengzhi Gao, Jian Luan, Yunxin Liu
- Subjects: cs.CR; cs.AI
- Tags: GUI Automation, LLM Security, Benchmark
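- Summary: This paper studies the security vulnerabilities of mobile GUI agents under real-world threats, builds a test suite comprising a dynamic task execution environment and a static dataset, and finds that third-party content can mislead agents at average rates of 42% and 36%.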
[257] A document is worth a structured record: Principled inductive bias design for document recognition
- arXiv: 2507.08458 (replaced)
- Authors: Benjamin Meyer, Lukas Tuggener, Sascha Hänzi, Daniel Schmid, Erdal Ayfer, Benjamin F. Grewe, Ahmed Abdulkadir, Thilo Stadelmann
- Subjects: cs.CV; cs.AI
- Tags: Document Understanding, Graph Learning, Representation Learning
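- Summary: This paper proposes a new view of document recognition as a transcription task from documents to structured records, and designs Transformer architectures with structure-specific inductive biases. Experiments show good results on complex document types such as sheet music, shape drawings, and engineering drawings, including the first end-to-end transcription model for mechanical engineering drawings.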
[258] Simulation as Supervision: Mechanistic Pretraining for Scientific Discovery
- arXiv: 2507.08977 (replaced)
- Authors: Carson Dudley, Reiden Magdaleno, Christopher Harding, Marisa Eisenberg
- Subjects: cs.LG; cs.AI; stat.ML
- Tags: Scientific Computing, Pre-training, Scientific Reasoning
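- Summary: This paper proposes a simulation-grounded neural network framework that pretrains neural networks on mechanistic simulation data, building scientific theory into the model to improve predictive power and interpretability. It outperforms conventional data-driven and physics-constrained models on prediction tasks in epidemiology, ecology, and other scientific fields.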
[259] Automatic Road Subsurface Distress Recognition from Ground Penetrating Radar Images using Deep Learning-based Cross-verification
- arXiv: 2507.11081 (replaced)
- Authors: Chang Peng, Bao Yang, Meiqi Li, Ge Zhang, Hui Sun, Zhenyu Jiang
- Subjects: cs.CV; cs.AI
- Tags: Object Detection, Remote Sensing, Autonomous Driving
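- Summary: This study proposes a deep-learning-based cross-verification strategy that uses YOLO models to process 3D ground penetrating radar images for high-accuracy automatic recognition of road subsurface distress. In field tests the method achieves recall above 98.6% and substantially reduces manual inspection effort.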
[260] Improved particle swarm optimization algorithm: multi-target trajectory optimization for swarm drones
- arXiv: 2507.13647 (replaced)
- Authors: Minze Li, Wei Zhao, Ran Chen, Mingqiang Wei
- Subjects: cs.RO; cs.AI
- Tags: Robotics, Optimization, Multi-Agent System
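- Summary: This paper proposes an improved particle swarm optimization algorithm (PE-PSO) that introduces a persistent exploration mechanism and an entropy-based parameter adjustment strategy to address convergence and latency problems in real-time trajectory planning for drone swarms. Simulations show that the framework outperforms conventional methods in trajectory quality, energy efficiency, and obstacle avoidance.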
[261] ChemDFM-R: A Chemical Reasoning LLM Enhanced with Atomized Chemical Knowledge
- arXiv: 2507.21990 (replaced)
- Authors: Zihan Zhao, Ziping Wan, Lu Chen, Xuanze Lin, Shiyang Yu, Situo Zhang, Da Ma, Zichen Zhu, Danyang Zhang, Huayang Wang, Zhongyang Dai, Liyang Wen, Bo Chen, Xin Chen, Kai Yu
- Subjects: cs.CE; cs.AI
- Tags: LLM Reasoning, Scientific Reasoning, Knowledge Distillation
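- Summary: This paper develops ChemDFM-R, a chemical reasoning LLM, by constructing ChemFG, a dataset of atomized chemical knowledge, and proposing a mixed-source distillation method, strengthening the model's understanding of and reasoning about chemical principles. Experiments show that the model reaches leading performance on multiple chemistry benchmarks and produces interpretable reasoning output.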
[262] Teaching the Teacher: The Role of Teacher-Student Smoothness Alignment in Genetic Programming-based Symbolic Distillation
- arXiv: 2507.22767 (replaced)
- Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Distillation, Symbolic Regression, Explainable AI
- Venue: GECCO 2026
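- Summary: Targeting the low predictive accuracy of genetic-programming-based symbolic distillation, this paper proposes regularizing the teacher model's functional smoothness with Jacobian and Lipschitz penalty terms so that it aligns with the student model. Experiments show that student models distilled from the smoothness-regularized teacher achieve significantly higher R^2 scores.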
[263] BRAIN: Bias-Mitigation Continual Learning Approach to Vision-Brain Understanding
- arXiv: 2508.18187 (replaced)
- Authors: Xuan-Bac Nguyen, Thanh-Dat Truong, Pawan Sinha, Khoa Luu
- Subjects: cs.CV; cs.AI
- Tags: Continual Learning, Bias Mitigation, Brain-Computer Interface
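- Summary: To address the inconsistency and bias caused by the decay of brain signals over time, this paper proposes BRAIN, a bias-mitigation continual learning approach. Using de-biased contrastive learning and an angle-based forgetting-mitigation mechanism, it achieves state-of-the-art performance on several vision-brain understanding benchmarks.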
[264] Variation in Verification: Understanding Verification Dynamics in Large Language Models
- arXiv: 2509.17995 (replaced)
- Authors: Yefan Zhou, Austin Xu, Yilun Zhou, Janvijay Singh, Jiang Gui, Shafiq Joty
- Subjects: cs.CL; cs.AI; cs.LG
- Tags: LLM Evaluation, LLM Reasoning, LLM Inference
- Venue: ICLR 2026
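- Summary: This study systematically analyzes the verification dynamics of large language models in test-time scaling, examining how problem difficulty, generator capability, and verifier capability affect verification. It finds that verifiers perform better on easy problems and with weak generators, and identifies opportunities to optimize verification strategies.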
[265] Safe-SAIL: Towards a Fine-grained Safety Landscape of Large Language Models via Sparse Autoencoder Interpretation Framework
- arXiv: 2509.18127 (replaced)
- Authors: Jiaqi Weng, Han Zheng, Hanyu Zhang, Ej Zhou, Qinqin He, Jialing Tao, Hui Xue, Zhixuan Chu, Xiting Wang
- Subjects: cs.LG; cs.AI; cs.CL
- Tags: LLM Security, Interpretability, AI Safety
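- Summary: This paper proposes the Safe-SAIL framework, which uses sparse autoencoders (SAEs) to interpret fine-grained features of large language models in safety-critical domains. With a pre-explanation evaluation metric and a segment-level simulation strategy, it efficiently identifies safety-related features and reveals how models encode safety-related entities.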
[266] DyBBT: Dynamic Balance via Bandit-inspired Targeting for Dialog Policy with Cognitive Dual-Systems
- arXiv: 2509.19695 (replaced)
- Authors: Shuyu Zhang, Yifan Wei, Jialuo Yuan, Xinru Wang, Yanmin Zhu, Bin Li, Yujie Liu
- Subjects: cs.CL; cs.AI; cs.IR
- Tags: Dialogue System, Reinforcement Learning, Decision Making
- Venue: ACL 2026
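- Summary: This paper proposes DyBBT, a dialog policy learning framework whose bandit-inspired meta-controller switches dynamically between fast intuitive reasoning and slow deliberative reasoning according to cognitive state. Experiments show state-of-the-art success rate, efficiency, and generalization.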
[267] HiCoLoRA: Addressing Context-Prompt Misalignment via Hierarchical Collaborative LoRA for Zero-Shot DST
- arXiv: 2509.19742 (replaced)
- Authors: Shuyu Zhang, Yifan Wei, Xinru Wang, Yanmin Zhu, Yangfan He, Yixuan Weng, Bin Li, Yujie Liu
- Subjects: cs.CL; cs.AI; cs.IR
- Tags: Dialogue System, Parameter-Efficient Fine-Tuning, Zero-Shot Learning
- Venue: ACL 2026 Findings
- Code: code
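- Summary: This paper proposes the Hierarchical Collaborative LoRA (HiCoLoRA) framework, which uses dynamic layer-specific processing and an adaptive linear fusion mechanism to resolve the semantic misalignment between context and prompts in zero-shot dialog state tracking. It achieves state-of-the-art performance on the MultiWOZ and SGD datasets.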
[268] SeedPrints: Fingerprints Can Even Tell Which Seed Your Large Language Model Was Trained From
- arXiv: 2509.26404 (replaced)
- Authors: Yao Tong, Haonan Wang, Siquan Li, Kenji Kawaguchi, Tianyang Hu
- Subjects: cs.CR; cs.AI; cs.CL
- Tags: LLM Security, Model Security, LLM Training
- Venue: ICLR 2026
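- Summary: This paper proposes SeedPrints, a method that uses random-initialization biases as an intrinsic fingerprint of a large language model, enabling identity verification across the full lifecycle from initialization through large-scale pretraining to downstream adaptation. Experiments show that it remains effective early in training and under distribution shift, outperforming existing fingerprinting techniques.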
[269] Benchmarking Foundation Models with Retrieval-Augmented Generation in Olympic-Level Physics Problem Solving
- arXiv: 2510.00919 (replaced)
- Authors: Shunfeng Zheng, Yudi Zhang, Meng Fang, Zihan Zhang, Zhitan Wu, Mykola Pechenizkiy, Ling Chen
- Subjects: cs.CL; cs.AI
- Tags: RAG, Scientific Reasoning, Benchmark
- Venue: EMNLP 2025 Findings
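- Summary: This paper builds PhoPile, a high-quality multimodal dataset for evaluating retrieval-augmented generation (RAG) on Olympiad-level physics problem solving. Results show that retrieval over a physics corpus improves model performance, while also exposing the challenges current models face in expert-level reasoning.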
[270] LLM as Attention-Informed NTM and Topic Modeling as long-input Generation: Interpretability and long-Context Capability
- arXiv: 2510.03174 (replaced)
- Authors: Xuan Xu, Zhongliang Yang, Haolun Li, Beilin Chu, Rui Tian, Yu Li, Shaolin Tan, Linna Zhou
- Subjects: cs.CL; cs.AI
- Tags: Topic Modeling, Long Context, Interpretability
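- Summary: This study examines LLM-based topic modeling from white-box and black-box perspectives, proposing an attention-informed framework to recover interpretable topic structure and recasting topic modeling as a long-input generation task. Experiments show strong performance on topic assignment and keyword extraction, supporting the potential of long-context LLMs for topic modeling.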
[271] Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
- arXiv: 2510.05159 (replaced)
- Authors: Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Nazanin Sepahvand, Nicolas Chapados, Quentin Cappart, Alexandre Lacoste, Krishnamurthy Dj Dvijotham, Alexandre Drouin
- Subjects: cs.CR; cs.AI; cs.LG
- Tags: LLM Agent, LLM Security, Backdoor Detection
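- Summary: This paper exposes security vulnerabilities in the agentic AI supply chain and formalizes three threat models: fine-tuning data poisoning, pre-backdoored base models, and environment poisoning. Experiments show that a small number of poisoned samples suffices to implant a backdoor that makes agents carry out malicious behavior with a high success rate.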
[272] Geometry Aware Cross-Modal Alignment for Light Field-LiDAR Semantic Segmentation
- arXiv: 2510.06687 (replaced)
- Authors: Jie Luo, Yuxuan Jiang, Xin Jin, Mingyu Liu, Yihui Fan
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Autonomous Driving, Sensor Fusion
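- Summary: This paper presents the first multimodal semantic segmentation dataset that integrates light field and LiDAR data, and designs a fusion segmentation network whose feature completion and depth-aware modules address modality differences and occlusion. Experiments show that it outperforms single-modality segmentation methods in mean intersection over union.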
[273] GTCN-G: A Residual Graph-Temporal Fusion Network for Imbalanced Intrusion Detection
- arXiv: 2510.07285 (replaced)
- Authors: Tianxiang Xu, Zhichao Wen, Xinyu Zhao, Qi Hu, Yan Li, Chang Liu
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Anomaly Detection, Cybersecurity
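- Summary: This paper proposes the GTCN-G framework, which fuses a gated temporal convolutional network with a graph convolutional network and uses a residual learning mechanism to address class imbalance in network intrusion detection. It achieves state-of-the-art performance on the UNSW-NB15 and ToN-IoT datasets.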
[274] Instructions are all you need: Self-supervised Reinforcement Learning for Instruction Following
- arXiv: 2510.14420 (replaced)
- Authors: Qingyu Ren, Qianyu He, Powei Chang, Jie Zeng, Zeye Sun, Fei Yu, Jiaqing Liang, Yanghua Xiao
- Subjects: cs.CL; cs.AI
- Tags: Reinforcement Learning, Instruction Tuning, LLM Alignment
- Code: code
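- Summary: This paper proposes a self-supervised reinforcement learning framework that derives reward signals directly from instructions, removing the dependence on external supervision for multi-constraint instruction following. Using a constraint decomposition strategy, it achieves significant improvements on both in-domain and out-of-domain datasets.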
[275] Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation
- arXiv: 2510.15552 (replaced)
- Authors: Jinliang Liu, Jiale Bai, Shaoning Zeng
- Subjects: cs.CL; cs.AI
- Tags: RAG, Knowledge Graph, LLM Reasoning
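- Summary: This paper proposes the ParallaxRAG framework, which tackles multi-hop reasoning by decoupling queries and knowledge graphs into aligned, head-specific semantic spaces. It achieves state-of-the-art retrieval and question-answering performance on the WebQSP and CWQ benchmarks while significantly reducing hallucination.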
[276] StableSketcher: Enhancing Diffusion Model for Pixel-based Sketch Generation via Visual Question Answering Feedback
- arXiv: 2510.20093 (replaced)
- Authors: Jiho Park, Sieun Choi, Jaeyoon Seo, Jihie Kim
- Subjects: cs.CV; cs.AI
- Tags: Diffusion Model, Image Generation, Reinforcement Learning
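- Summary: This paper proposes the StableSketcher framework, which strengthens a diffusion model's ability to generate hand-drawn sketches through reinforcement learning with a reward function based on visual question answering. The authors also release SketchDUO, the first dataset containing instance-level sketches, captions, and question-answer pairs.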
[277] Why Did Apple Fall: Evaluating Curiosity in Large Language Models
- arXiv: 2510.20635 (replaced)
- Authors: Haoyu Wang, Sihang Jiang, Yuyan Chen, Xiaojun Meng, Jiansheng Wei, Yitong Wang, Yanghua Xiao
- Subjects: cs.CL; cs.AI
- Tags: LLM Evaluation, Cognitive Science
- Venue: ACL 2026 Findings
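- Summary: Building on the Five-Dimensional Curiosity scale, this paper designs an evaluation framework that assesses the curiosity of large language models along dimensions such as information seeking, thrill seeking, and social curiosity. Results show that LLMs exhibit a stronger thirst for knowledge than humans but tend to make conservative choices in uncertain environments.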
[278] FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
- arXiv: 2510.25512 (replaced)
- Authors: Amin Parchami-Araghi, Sukrut Rao, Jonas Fischer, Bernt Schiele
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Interpretability, Representation Learning
- Venue: NeurIPS 2025
- Code: code
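- Summary: This paper proposes a model with built-in mechanistic concept explanations, in which concept contributions to the logits can be faithfully traced from any layer. The authors also propose the C²-Score, a concept-consistency metric for evaluating concept-based explanation methods.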
[279] Generative Modeling Enables Molecular Structure Retrieval from Coulomb Explosion Imaging
- arXiv: 2511.00179 (replaced)
- Authors: Xiang Li, Till Jahnke, Rebecca Boll, Jiaqi Han, Minkai Xu, Michael Meyer, Maria Novella Piancastelli, Daniel Rolles, Artem Rudenko, Florian Trinter, Thomas J.A. Wolf, Jana B. Thayer, James P. Cryan, Stefano Ermon, Phay J. Ho
- Subjects: cs.AI; cs.LG
- Tags: Diffusion Model, Molecular Generation, Scientific Computing
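- Summary: This paper uses a diffusion-based Transformer neural network to reconstruct molecular geometry from the ion momentum distributions measured in Coulomb explosion imaging. The method achieves a mean absolute error below one Bohr radius, solving the nonlinear inverse problem of structure retrieval for polyatomic molecules.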
[280] Turbo-DDCM: Fast and Flexible Zero-Shot Diffusion-Based Image Compression
- arXiv: 2511.06424 (replaced)
- Authors: Amit Vaisman, Guy Ohayon, Hila Manor, Michael Elad, Tomer Michaeli
- Subjects: eess.IV; cs.AI; cs.CV; eess.SP; stat.ML
- Tags: Diffusion Model, Image Compression, Zero-Shot Learning
- Venue: ICLR 2026
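- Summary: This paper proposes Turbo-DDCM, which accelerates zero-shot diffusion-based image compression by efficiently combining a large number of noise vectors at each denoising step. It also introduces two flexible variants, priority-aware and distortion-controlled, and significantly lowers computational cost while maintaining performance.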
[281] Reasoning about Intent for Ambiguous Requests
- arXiv: 2511.10453 (replaced)
- Authors: Irina Saparina, Mirella Lapata
- Subjects: cs.CL; cs.AI
- Tags: LLM Reasoning, Reinforcement Learning
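- Summary: This paper proposes a method for handling ambiguous requests by generating structured responses that enumerate the different interpretations together with their corresponding answers. The model is trained with reinforcement learning and a dual reward objective, and achieves higher coverage of valid answers on conversational question answering and semantic parsing tasks.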
[282] GeoPl@ntNet: A Platform for Exploring Essential Biodiversity Variables
- arXiv: 2511.13790 (replaced)
- Authors: Lukas Picek, César Leblanc, Alexis Joly, Pierre Bonnet, Rémi Palard, Maximilien Servajean
- Subjects: q-bio.QM; cs.AI
- Tags: Environmental Planning, Data Visualization
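- Summary: This paper describes GeoPl@ntNet, an interactive web application that lets users explore AI-generated maps of species distributions, habitat types, and biodiversity indicators through dynamic maps and fact sheets. The platform uses a cascaded pipeline of CNNs and LLMs and provides resolution as fine as 50x50 meters.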
[283] HiFiNet: Hierarchical Fault Identification in Wireless Sensor Networks via Edge-Based Classification and Graph Aggregation
- arXiv: 2511.17537 (replaced)
- Authors: Nguyen Tri Nghia, Nguyen Van Son, Nguyen Thi Hanh
- Subjects: cs.NI; cs.AI
- Tags: Graph Neural Network, Anomaly Detection, IoT
- Venue: CITA 2026
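- Summary: This paper proposes HiFiNet, a hierarchical fault identification framework that combines an LSTM stacked autoencoder with a graph attention network to capture the spatio-temporal features of wireless sensor network data. It significantly outperforms existing methods on synthetic datasets and supports a tunable trade-off between diagnostic performance and energy efficiency.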
[284] CropVLM: Learning to Zoom for Fine-Grained Vision-Language Perception
- arXiv: 2511.19820 (replaced)
- Authors: Miguel Carvalho, Helder Dias, Bruno Martins
- Subjects: cs.CV; cs.AI; cs.CL; cs.LG
- Tags: Vision-Language Model, Reinforcement Learning
- Venue: CVPR 2026 Workshop
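- Summary: This paper proposes CropVLM, which lets vision-language models dynamically zoom in on relevant image regions to strengthen fine-grained understanding. The model is trained with reinforcement learning, requires no human-annotated bounding boxes, and can be paired with both open-source and proprietary VLMs to improve high-resolution image understanding.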
[285] Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance
- arXiv: 2511.21356 (replaced)
- Authors: Bram Silue, Santiago Amaya-Corredor, Patrick Mannion, Lander Willem, Pieter Libin
- Subjects: cs.LG; cs.AI
- Tags: Reinforcement Learning, Imitation Learning
- Venue: ESANN 2026
- Code: code
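- Summary: This paper proposes Hybrid-AIRL, which enhances adversarial inverse reinforcement learning with a supervised loss derived from expert data and a stochastic regularization mechanism. It achieves higher sample efficiency and more stable learning in complex imperfect-information environments.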
[286] BINDER: Instantly Adaptive Mobile Manipulation with Open-Vocabulary Commands
- arXiv: 2511.22364 (replaced)
- Authors: Seongwon Cho, Daechul Ahn, Donghyun Shin, Hyeonbeom Choi, San Kim, Jonghyun Choi
- Subjects: cs.RO; cs.AI
- Tags: LLM Agent, Robotics, Multimodal Learning
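- Summary: This paper proposes BINDER, a dual-process framework that decouples strategic planning from continuous environment monitoring to enable instant adaptation in open-vocabulary mobile manipulation. Through bidirectional coordination between a multimodal LLM planning module and a VideoLLM monitoring module, it achieves higher success rates in dynamic environments.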
[287] Red Teaming Large Reasoning Models
- arXiv: 2512.00412 (replaced)
- Authors: Jiawei Chen, Yang Yang, Chao Yu, Yu Tian, Zhi Cao, Xue Yang, Linghao Li, Hang Su, Zhaoxia Yin
- Subjects: cs.CR; cs.AI
- Tags: LLM Reasoning, LLM Evaluation, LLM Security
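- Summary: This paper proposes the RT-LRM benchmark for evaluating the trustworthiness of large reasoning models along three dimensions: truthfulness, safety, and efficiency. Experiments show that LRMs are more fragile than standard LLMs when facing reasoning-induced risks, revealing previously under-explored vulnerabilities.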
[288] Vec-LUT: Vector Table Lookup for Parallel Ultra-Low-Bit LLM Inference on Edge Devices
- arXiv: 2512.06443 (replaced)
- Authors: Xiangyu Li, Chengyu Yin, Weijun Wang, Jianyu Wei, Ting Cao, Yunxin Liu
- Subjects: cs.DC; cs.AI
- Tags: LLM Inference, Model Compression, Edge Computing
- Venue: MobiSys 2026
- Code: code
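- Summary: This paper proposes Vec-LUT, a vector table-lookup paradigm that optimizes memory bandwidth utilization in ultra-low-bit LLM inference by building a unified lookup table across parallel tokens. It achieves up to 4.2x speedup across 5 edge devices and 3 LLMs.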
[289] INFORM-CT: INtegrating LLMs and VLMs FOR Incidental Findings Management in Abdominal CT
- arXiv: 2512.14732 (replaced)
- Authors: Idan Tankel, Nir Mazor, Rafi Brada, Christina LeBedis, Guy ben-Yosef
- Subjects: cs.LG; cs.AI; cs.CV; eess.IV
- Tags: Medical AI, LLM Agent, Vision-Language Model
- Venue: MIDL 2026
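- Summary: This paper proposes a planner-executor agent framework using LLMs and VLMs for detecting, classifying, and reporting incidental findings in abdominal CT scans. An LLM planner generates Python scripts, and an executor calls VLMs and segmentation models for automated processing, outperforming VLM-only approaches in accuracy and efficiency.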
[290] Revisiting the Reliability of Language Models in Instruction-Following
- arXiv: 2512.14754 (replaced)
- Authors: Jianshuo Dong, Yutong Zhang, Yan Liu, Zhenyu Zhong, Tao Wei, Chao Zhang, Han Qiu
- Subjects: cs.SE; cs.AI; cs.CL
- Tags: LLM Evaluation, Instruction Tuning, Prompt Engineering
- Venue: ACL 2026
- Code: code
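- Summary: This paper studies the fine-grained reliability of LLMs in instruction following and finds that performance can drop by up to 61.8% under subtle prompt variations. The authors propose a new evaluation metric, reliable@k, and build the IFEval++ benchmark, showing that fine-grained reliability is a key challenge for building trustworthy LLMs.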
[291] Enabling Ultra-Fast Cardiovascular Imaging Across Heterogeneous Clinical Environments with A Generalist Foundation Model and Multimodal Database
- arXiv: 2512.21652 (replaced)
- Authors: Zi Wang, Mingkai Huang, Zhang Shi, Hongjie Hu, Lan Lan, Hui Zhang, Yan Li, Xi Hu, Qing Lu, Zongming Zhu, Qiong Yao, Yuxiang Dai, Fanwen Wang, Yinzhe Wu, Jun Lyu, Qianqian Gao, Guangming Xu, Zhenxuan Zhang, Haosen Zhang, Qing Li, Guangming Wang, Tianxing He, Lizhen Lan, Siyue Li, Le Xue, Mengting Sun, Yuntong Lyu, Junpu Hu, Jiayu Zhu, Rizwan Ahmad, Zhengyu Bu, Xianling Qian, Guanke Cai, Ruiyu Cao, Weirui Cai, Chang Xu, Yuyang Ren, Feidan Yu, Siying Ma, Ziqiang Xu, Xinran Chen, Sha Hua, Daniel Kim, Yajing Zhang, Chen Ouyang, Wenjia Bai, Jing Qin, Yucheng Yang, Daniel Rueckert, He Wang, Qian Tao, Claudia Prieto, Michael Markl, Alistair Young, Lianming Wu, Shuo Wang, Chen Qin, Mengsu Zeng, Xihong Hu, Haibo Xu, Xiaobo Qu, Hao Li, Guang Yang, Chengyan Wang
- Subjects: eess.IV; cs.AI
- Tags: Medical AI, Foundation Model, Image Reconstruction
- Code: code
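- Summary: This paper builds MMCMR-427K, the largest multimodal cardiac magnetic resonance k-space database to date, and proposes CardioMM, a generalist reconstruction foundation model. The model supports up to 24x acceleration and achieves zero-shot generalization across heterogeneous clinical environments while maintaining diagnostic image quality.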
[292] FlowPlan-G2P: A Structured Generation Framework for Transforming Scientific Papers into Patent Descriptions
- arXiv: 2601.02589 (replaced)
- Authors: Kris W Pan, Yongmin Yoo
- Subjects: cs.CL; cs.AI
- Tags: Text Generation, Patent Analysis, Knowledge Graph
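- Summary: This paper proposes the FlowPlan-G2P framework, which decomposes the task of turning scientific papers into patent descriptions into three stages: concept graph induction, paragraph planning, and graph-conditioned generation. By mimicking the cognitive workflow of experts, it significantly outperforms end-to-end LLM baselines in logical coherence and legal compliance.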
[293] Safe-FedLLM: Delving into the Safety of Federated Large Language Models
- arXiv: 2601.07177 (replaced)
- Authors: Mingxiang Tao, Yu Tian, Wenxuan Tu, Yue Yang, Xue Yang, Xiangyan Tang
- Subjects: cs.CR; cs.AI
- Tags: Federated Learning, LLM Security, LLM Training
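- Summary: This paper studies the safety of LLMs in federated learning settings, finding that LLMs are vulnerable to attacks from malicious clients and that LoRA updates exhibit distinguishable behavior patterns. The authors propose Safe-FedLLM, a defense framework that detects malicious LoRA updates with a lightweight classifier and effectively improves system robustness.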
[294] Understanding or Memorizing? A Case Study of German Definite Articles in Language Models
- arXiv: 2601.09313 (replaced)
- Authors: Jonathan Drechsel, Erisa Bytyqi, Steffen Herbold
- Subjects: cs.CL; cs.AI
- Tags: Interpretability, LLM Reasoning
- Venue: ACL 2026
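- Summary: This paper uses the gradient-based interpretability method GRADIEND to study how language models handle German definite articles, finding that learning updates targeted at a specific gender-case combination often affect unrelated settings. This suggests that models rely at least partly on memorized associations rather than abstract grammatical rules.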
[295] CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning
- arXiv: 2602.00181 (replaced)
- Authors: Hang Wu, Yujun Cai, Zehao Li, Haonan Ge, Bowen Sun, Junsong Yuan, Yiwei Wang
- Subjects: cs.CV; cs.AI
- Tags: Video Understanding, Reinforcement Learning, Vision-Language Model
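- Summary: This paper proposes the CamReasoner framework, which recasts camera movement understanding as a structured reasoning process following an observe-think-answer paradigm. It is the first to apply reinforcement learning to logical alignment in camera movement understanding, and it significantly outperforms existing methods on several benchmarks.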
[296] El Agente Quntur: A research collaborator agent for quantum chemistry
- arXiv: 2602.04850 (replaced)
- Authors: Juan B. Pérez-Sánchez, Yunheng Zou, Jorge A. Campos-Gonzalez-Angulo, Marcel Müller, Ignacio Gustin, Andrew Wang, Han Hao, Tsz Wai Ko, Changhyeok Choi, Eric S. Isbrandt, Mohammad Ghazi Vakili, Hanyong Xu, Chris Crebolder, Varinia Bernales, Alán Aspuru-Guzik
- Subjects: cs.AI; cs.MA
- Tags: LLM Agent, Multi-Agent System, Scientific Computing
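- Summary: This paper introduces El Agente Quntur, a hierarchical multi-agent AI system for quantum chemistry research. The system supports the full computational functionality of ORCA 6.0 and can reason over software documentation and the scientific literature to plan, execute, and analyze computational chemistry experiments.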
[297] Evaluating LLM-Generated ACSL Annotations for Formal Verification
- arXiv: 2602.13851 (replaced)
- Authors: Arshad Beg, Diarmuid O'Donoghue, Rosemary Monahan
- Subjects: cs.SE; cs.AI
- Tags: Formal Methods, Code Generation, LLM Evaluation
- Venue: ECOOP 2026 Workshop
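- Summary: This paper empirically evaluates strategies for automated ACSL annotation generation for C programs, comparing rule-based scripts, Frama-C plugins, and several LLMs. Results show that rule-based methods remain more reliable in verification success rate, while LLM-based methods are less stable.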
[298] MoDora: Tree-Based Semi-Structured Document Analysis System
- arXiv: 2602.23061 (replaced)
- Authors: Bangrui Xu, Qihang Yao, Zirui Tang, Xuanhe Zhou, Yeye He, Shihan Yu, Qianqian Xu, Bin Wang, Guoliang Li, Conghui He, Fan Wu
- Subjects: cs.IR; cs.AI; cs.CL; cs.DB; cs.LG
- Tags: Document Understanding, RAG, Question Answering
- Code: code
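- Summary: This paper proposes MoDora, a system for semi-structured document analysis. It organizes document elements hierarchically in a Component-Correlation Tree (CCTree) and uses question-type-aware retrieval strategies, improving accuracy over baselines by 5.97%-61.07%.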
[299] MAML-KT: Addressing Cold Start Problem in Knowledge Tracing for New Students via Few-Shot Model-Agnostic Meta Learning
- arXiv: 2603.00137 (replaced)
- Authors: Indronil Bhattacharjee, Christabel Wayllace
- Subjects: cs.LG; cs.AI
- Tags: Knowledge Tracing, Meta-Learning, Few-Shot Learning
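- Summary: This paper frames performance prediction for new students in knowledge tracing as a few-shot learning problem and proposes MAML-KT, which uses model-agnostic meta-learning to optimize the initialization so that the model adapts quickly to new students. In cold-start scenarios, it significantly outperforms existing models in early prediction accuracy.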
[300] Silo-Bench: A Scalable Environment for Evaluating Distributed Coordination in Multi-Agent LLM Systems
- arXiv: 2603.01045 (replaced)
- Authors: Yuzhe Zhang, Feiran Liu, Yi Shan, Xinyi Huang, Xin Yang, Yueqi Zhu, Xuxin Cheng, Cao Liu, Ke Zeng, Terry Jingchen Zhang, Wenyuan Jiang
- Subjects: cs.MA; cs.AI
- Tags: Multi-Agent System, LLM Evaluation, Benchmark
- Venue: ACL 2026
- Code: code
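- Summary: This paper introduces the SILO-BENCH benchmark for evaluating distributed coordination in multi-agent LLM systems. Experiments reveal a communication-reasoning gap: agents exchange information effectively but systematically fail to integrate distributed state, and coordination overhead accumulates as scale grows.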
[301] FAST-DIPS: Adjoint-Free Analytic Steps and Hard-Constrained Likelihood Correction for Diffusion-Prior Inverse Problems
- arXiv: 2603.01591 (replaced)
- Authors: Minwoo Kim, Seunghyeok Shin, Hongki Lim
- Subjects: cs.LG; cs.AI; cs.CV
- Tags: Diffusion Model, Image Reconstruction, Optimization
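- Summary: This paper proposes a training-free solver for diffusion-prior inverse problems that replaces inner-loop optimization with a hard measurement-space feasibility constraint and analytically optimal step sizes. It achieves up to 19.5x speedup while maintaining competitive PSNR/SSIM/LPIPS.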
[302] Poisoning the Inner Prediction Logic of Graph Neural Networks for Clean-Label Backdoor Attacks
- arXiv: 2603.05004 (replaced)
- Authors: Yuxiang Zhang, Bin Ma, Enyan Dai
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Backdoor Detection, Adversarial Robustness
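- Summary: This paper studies clean-label backdoor attacks on graph neural networks and proposes BA-Logic, which poisons the GNN's inner prediction logic by coordinating a poisoned-node selector with a logic-poisoning trigger generator, significantly raising attack success rates in the clean-label setting.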
[303] IMSE: Intrinsic Mixture of Spectral Experts Fine-tuning for Test-Time Adaptation
- arXiv: 2603.07926 (replaced)
- Authors: Sunghyun Baek, Jaemyung Yu, Seunghee Koh, Minsu Kim, Hyeonseong Jeon, Junmo Kim
- Subjects: cs.CV; cs.AI
- Tags: Transfer Learning, Vision Transformer, Domain Adaptation
- Venue: ICLR 2026
- Code: code
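- Summary: This paper proposes Intrinsic Mixture of Spectral Experts (IMSE) for test-time adaptation of vision Transformers. It adapts only the singular values while keeping the singular vectors fixed, adds a diversity-maximization loss to prevent feature collapse, and achieves state-of-the-art performance on distribution-shift benchmarks.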
[304] Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage
- arXiv: 2603.08819 (replaced)
- Authors: Saron Samuel, Alexander Martin, Eugene Yang, Andrew Yates, Dawn Lawrie, Laura Dietz, Benjamin Van Durme
- Subjects: cs.IR; cs.AI
- Tags: RAG, Information Retrieval, LLM Evaluation
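- Summary: This paper systematically studies the relationship between retrieval quality and RAG information coverage. It finds that coverage-based retrieval metrics correlate strongly with the nugget coverage of generated responses, providing empirical support for using retrieval metrics as proxies for RAG performance.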
[305] Are Video Reasoning Models Ready to Go Outside?
- arXiv: 2603.10652 (replaced)
- Authors: Yangfan He, Changgyu Boo, Jaehong Yoon
- Subjects: cs.CV; cs.AI
- Tags: Vision-Language Model, Video Understanding, LLM Reasoning
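- Summary: This paper proposes the ROVA framework, which improves the robustness of video reasoning models by modeling a robustness-aware consistency reward under spatio-temporal perturbations. The authors also introduce the PVRBench benchmark and show that the method significantly improves accuracy and reasoning quality under realistic perturbations.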
[306] Public Profile Matters: A Scalable Integrated Approach to Recommend Citations in the Wild
- arXiv: 2603.17361 (replaced)
- Authors: Karan Goyal, Dikshant Kukreja, Vikram Goyal, Mukesh Mohania
- Subjects: cs.IR; cs.AI; cs.CL; cs.SI
- Tags: Recommender System, Information Retrieval
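- Summary: This paper proposes Profiler, a lightweight, non-learned module that captures human citation patterns efficiently and without bias, substantially improving candidate retrieval. The authors also introduce a rigorous inductive evaluation setting and a novel reranking model, DAVINCI, achieving state-of-the-art results on multiple benchmark datasets.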
[307] ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods
- arXiv: 2603.21011 (replaced)
- Authors: Rushikesh Deotale, Adithya Srinivasan, Yuan Tian, Tianyi Zhang, Pavlos Vlachos, Hector Gomez
- Subjects: cs.CE; cs.AI; cs.LG; cs.MS; math.NA
- Tags: Code Generation, LLM Agent, Scientific Computing
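- Summary: This paper proposes ALL-FEM, an autonomous simulation system that combines agentic AI with domain-specific fine-tuned LLMs to generate finite element method code. The system builds a corpus of more than 1,000 validated FEniCS scripts and achieves a 71.79% code-level success rate across 39 benchmarks.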
[308] LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction
- arXiv: 2603.21045 (replaced)
- Authors: Shuwei Huang, Shizhuo Liu, Zijun Wei
- Subjects: cs.CV; cs.AI
- Tags: Image Super-Resolution, Diffusion Model
- Code: code
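- Summary: This paper establishes a theoretical framework for the optimal intermediate noise in diffusion models and designs a learnable noise predictor to replace random Gaussian noise. The method achieves state-of-the-art perceptual performance on synthetic and real-world datasets without relying on large-scale text-to-image priors.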
[309] KG-Hopper: Empowering Compact Open LLMs with Knowledge Graph Reasoning via Reinforcement Learning
- arXiv: 2603.21440 (replaced)
- Authors: Shuai Wang, Yinan Yu
- Subjects: cs.CL; cs.AI
- Tags: Knowledge Graph, LLM Reasoning, Reinforcement Learning
- Venue: IJCNN 2026
- Code: code
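- Summary: This paper proposes KG-Hopper, a reinforcement learning framework that enables compact LLMs to perform integrated multi-hop knowledge graph reasoning within a single inference round. A 7B-parameter model consistently outperforms larger multi-step systems on eight KG reasoning benchmarks and is competitive with proprietary models such as GPT-3.5-Turbo.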
[310] Suiren-1.0 Technical Report: A Family of Molecular Foundation Models
- arXiv: 2603.21942 (replaced)
- Authors: Junyi An, Xinyu Lu, Yun-Fei Shi, Li-Cheng Xu, Nannan Zhang, Chao Qu, Yuan Qi, Fenglei Cao
- Subjects: cs.AI
- Tags: Molecular Generation, Foundation Model, Knowledge Distillation
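- Summary: This paper presents Suiren-1.0, a family of molecular foundation models for modeling organic systems, comprising three specialized variants. The authors propose a conformation compression distillation framework that distills complex 3D structural representations into 2D conformation-averaged representations, achieving state-of-the-art results on multiple tasks.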
[311] Decidable By Construction: Design-Time Verification for Trustworthy AI
- arXiv: 2603.25414 (replaced)
- Authors: Houston Haynes
- Subjects: cs.PL; cs.AI; cs.LG; cs.LO
- Tags: Formal Methods, AI Safety, Explainable AI
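- Summary: This paper proposes a framework for verifying properties of AI models, such as numerical stability and computational correctness, at design time before training. The framework is built on constraints over finitely generated abelian groups and uses a dimensional type system and geometric algebra to ensure model trustworthiness with minimal computational overhead.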
[312] GUIDE: Guided Updates for In-context Decision Evolution in LLM-Driven Spacecraft Operations
- arXiv: 2603.27306 (replaced)
- Authors: Alejandro Carrasco, Mariko Storey-Matsutani, Victor Rodriguez-Fernandez, Richard Linares
- Subjects: cs.MA; cs.AI; eess.SY
- Tags: LLM Agent, Decision Making, Satellite Control
- Venue: CVPR 2026 Workshop
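- Summary: This paper presents GUIDE, a non-parametric policy improvement framework that achieves cross-episode adaptation by evolving structured natural-language decision rules, with no weight updates. The method consistently outperforms static baselines on an orbital interception task, demonstrating the effectiveness of in-context evolution for real-time closed-loop spacecraft interaction.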
[313] A General Model for Deepfake Speech Detection: Diverse Bonafide Resources or Diverse AI-Based Generators
- arXiv: 2603.27557 (replaced)
- Authors: Lam Pham, Khoi Vu, Dat Tran, David Fischinger, Alexander Schindler, Martin Boyer, Ian McLoughlin
- Subjects: cs.SD; cs.AI
- Tags: Deepfake Detection, Speech Processing
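- Summary: This paper analyzes how two main factors, bona fide speech resources and AI-based generators, affect the performance and generalization of deepfake speech detection models. The study shows that balancing bona fide speech resources and AI-based generators is key to training a general deepfake speech detection model.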
[314] Building evidence-based knowledge bases from full-text literature for disease-specific biomedical reasoning
- arXiv: 2603.28325 (replaced)
- Authors: Chang Zong, Sicheng Lv, Si-tu Xue, Huilin Zheng, Jian Wan, Lei Zhang
- Subjects: cs.CE; cs.AI
- Tags: Knowledge Graph, Medical AI, Information Extraction
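- Summary: This paper presents EvidenceNet, a dataset of disease-specific structured evidence records extracted from full-text biomedical literature. The dataset is built with an LLM-assisted pipeline that extracts experimental evidence, and it supports retrieval-augmented question answering and graph tasks such as link prediction and target prioritization.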
[315] Efficient and Scalable Granular-ball Graph Coarsening Method for Large-scale Graph Node Classification
- arXiv: 2603.29148 (replaced)
- Authors: Guan Wang, Shuyin Xia, Lei Qian, Tao Wu, Guoyin Wang, Yi Wang, Wei Wang
- Subjects: cs.LG; cs.AI
- Tags: Graph Neural Network, Model Compression
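- Summary: This paper proposes an efficient and scalable granular-ball graph coarsening method for large-scale graph node classification. It uses a multi-granularity granular-ball graph coarsening algorithm with linear time complexity that adaptively and substantially reduces the size of the original graph, improving GCN training efficiency and scalability.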
[316] Quantifying Cross-Modal Interactions in Multimodal Glioma Survival Prediction via InterSHAP: Evidence for Additive Signal Integration
- arXiv: 2603.29977 (replaced)
- Authors: Iain Swift, JingHua Ye, Ruairi O'Reilly
- Subjects: cs.LG; cs.AI; q-bio.QM
- Tags: Multimodal Learning, Medical AI, Interpretability
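- Summary: This paper adapts InterSHAP to Cox proportional hazards models to quantify cross-modal interactions in multimodal glioma survival prediction. It finds an inverse relationship between predictive performance and cross-modal interaction, indicating that performance gains come from aggregating complementary signals rather than from learned synergy.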
[317] DarwinNet: An Evolutionary Network Architecture for Agent-Driven Protocol Synthesis
- arXiv: 2604.01236 (replaced)
- Authors: Jinliang Xu, Bingqi Li
- Subjects: cs.NE; cs.AI; cs.DC; cs.MA; cs.NI
- Tags: Network Protocol, LLM Agent, Evolutionary Computation
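- Summary: This paper proposes DarwinNet, a bio-inspired, self-evolving network architecture that shifts communication protocols from a static design-time paradigm to a runtime growth paradigm. The three-layer framework synthesizes protocols through a dual-loop intent-to-bytecode mechanism, achieving antifragility and autonomous evolution.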
[318] Not All Turns Are Equally Hard: Adaptive Thinking Budgets For Efficient Multi-Turn Reasoning
- arXiv: 2604.05164 (replaced)
- Authors: Neharika Jali, Anupam Nayak, Gauri Joshi
- Subjects: cs.LG; cs.AI
- Tags: LLM Reasoning, LLM Inference
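- Summary: This paper proposes TAB (Turn-Adaptive Budgets), a budget allocation policy trained with Group Relative Policy Optimization that learns to allocate compute budget adaptively across turns in multi-turn reasoning. The method saves up to 35-40% of tokens on mathematical reasoning benchmarks while maintaining accuracy.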
[319] MorphDistill: Distilling Unified Morphological Knowledge from Pathology Foundation Models for Colorectal Cancer Survival Prediction
- arXiv: 2604.06390 (replaced)
- Authors: Hikmat Khan, Usama Sajjad, Metin N. Gurcan, Anil Parwani, Wendy L. Frankel, Wei Chen, Muhammad Khalid Khan Niazi
- Subjects: cs.CV; cs.AI
- Tags: Knowledge Distillation, Medical AI, Foundation Model
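- Summary: This paper proposes MorphDistill, a two-stage framework that distills knowledge from multiple pathology foundation models into a compact colorectal-cancer-specific encoder. The method achieves a relative improvement of about 8% on the Alliance/CALGB 89803 cohort and shows strong cross-dataset generalization.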
[320] MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization
- arXiv: 2604.06798 (replaced)
- Authors: Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang
- Subjects: cs.LG; cs.AI
- Tags: Model Compression, Mixture-of-Experts, Quantization
- Venue: ACL 2026 Findings
- Code: code
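- Summary: This paper proposes MoBiE, the first binarization framework tailored to MoE large language models. Using joint SVD decomposition, global loss gradients, and error constraints, the method significantly reduces perplexity and improves zero-shot performance on multiple MoE models while delivering a 2x inference speedup.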
[321] Exact Structural Abstraction and Tractability Limits
- arXiv: 2604.07349 (replaced)
- Authors: Tristan Simas
- Subjects: cs.CC; cs.AI; cs.LO
- Tags: Formal Methods, Deep Learning Theory
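- Summary: This paper examines theoretical questions of exact structural abstraction and tractability limits. It proves a meta-impossibility theorem for efficiently checkable structural predicates, showing that tractability classifiers for the correctness problem cannot obtain an exact characterization over these constraint families.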
[322] PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing
- arXiv: 2604.09111 (replaced)
- Authors: Changi Hong, Yoonah Song, Hwayoung Park, Chaewoon Bang, Dayeon Gu, Do Hyun Lee, Hong Kook Kim
- Subjects: eess.AS; cs.AI
- Tags: Speech Synthesis
- Venue: ICPR 2026
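- Summary: This paper proposes PS-TTS, a phonetic synchronization method for automated dubbing. It uses isochrony constraints and phoneme synchronization to match the duration of the target speech to the source speech and to achieve lip synchronization, and it performs well in multilingual experiments.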
[323] Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition
- arXiv: 2604.09121 (replaced)
- Authors: Peng Wang, Yanqiao Zhu, Zixuan Jiang, Qinyuan Chen, Xingjian Zhao, Xipeng Qiu, Wupeng Wang, Zhifu Gao, Xiangang Li, Kai Yu, Xie Chen
- Subjects: cs.CL; cs.AI; cs.SD
- Tags: Speech Processing, LLM Agent, LLM Evaluation
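- Summary: This paper proposes an interactive ASR framework that uses LLM-as-a-Judge for semantics-aware evaluation and an LLM-driven agent framework for multi-turn interactive recognition correction. Its effectiveness is validated on multiple benchmarks.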
[324] Many-Tier Instruction Hierarchy in LLM Agents
- arXiv: 2604.09443 (replaced)
- Authors: Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi
- Subjects: cs.CL; cs.AI
- Tags: LLM Agent, Instruction Hierarchy
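- Summary: This paper proposes the Many-Tier Instruction Hierarchy paradigm for resolving instruction conflicts across arbitrarily many tiers in LLM agents. It also builds the ManyIH-Bench benchmark and finds that current frontier models perform poorly as instruction conflicts scale.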
[325] Face Density as a Proxy for Data Complexity: Quantifying the Hardness of Instance Count
- arXiv: 2604.09689 (replaced)
- Authors: Abolfazl Mohammadi-Seif, Ricardo Baeza-Yates
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Object Detection, Curriculum Learning
- Venue: IEEE CAI 2026
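- Summary: This paper studies face density as a proxy for data complexity. Controlled experiments show that model performance degrades monotonically as the number of faces increases, and that density acts as a domain shift, so models trained on low-density data perform poorly in high-density scenes.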
[326] LOLGORITHM: Funny Comment Generation Agent For Short Videos
- arXiv: 2604.09729 (replaced)
- Authors: Xuan Ouyang, Bouzhou Wang, Senan Wang, Siyuan Xiahou, Jinrong Zhou, Yuekang Li
- Subjects: cs.CV; cs.AI
- Tags: Text Generation, Video Understanding, LLM Agent
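- Summary: This paper proposes LOLGORITHM, a modular multi-agent framework for generating comments on short videos. It supports six controllable comment styles and achieves a human preference rate above 80% on a bilingual YouTube and Douyin dataset.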
[327] Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks
- arXiv: 2604.10202 (replaced)
- Authors: Yuto Omae, Kazuki Sakai, Yohei Kakimoto, Makoto Sasaki, Yusuke Sakai, Hirotaka Takahashi
- Subjects: cs.LG; cs.AI; cs.NE
- Tags: Deep Learning Theory, Optimization
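- Summary: For nonlinear smooth multilayer neural networks, this paper uses the Wolkowicz-Styan bound to derive a closed-form upper bound on the largest eigenvalue of the Hessian under cross-entropy loss, providing a theoretical tool for analyzing loss sharpness.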
[328] LoViF 2026 The First Challenge on Weather Removal in Videos
- arXiv: 2604.10655 (replaced)
- Authors: Chenghao Qian, Xin Li, Yeying Jin, Shangguan Sun, Yilian Zhong, Yuxiang Chen, Shibo Yin, Yushun Fang, Xilei Zhu, Yahui Wang, Chen Lu, Ying Fu, Jianan Tian, Jifan Zhang, Chen Zhou, Junyang Jiang, Yuping Sun, Zhuohang Shi, Xiaojing Liu, Jiao Liu, Yatong Zhou, Shuai Liu, Qiang Deng, Jiajia Mi, Qianhao Luo, Weiling Li
- Subjects: cs.CV; cs.AI; cs.MM
- Tags: Video Restoration, Image Enhancement
- Venue: CVPR 2026 Workshop
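- Summary: This paper presents the LoViF 2026 challenge on weather removal in videos and introduces WRV, a new short-video weather removal dataset containing 18 videos and 1,216 pairs of synthetic and real frames, with the aim of advancing video restoration under adverse weather conditions.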
[329] Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation
- arXiv: 2604.10702 (replaced)
- Authors: Yongbo Shu, Wenzhao Xie, Shanhu Yao, Zirui Xin, Luo Lei, Kewen Chen, Aijing Luo
- Subjects: cs.CV; cs.AI
- Tags: Image Segmentation, Medical AI, Multimodal Learning
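- Summary: This paper proposes the Modality-Isolated Gated Fusion (MIGF) module for multi-parametric prostate MRI segmentation. Independent modality encoding streams and modality-dropout training make it robust to missing or corrupted modalities.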
[330] MeloTune: On-Device Arousal Learning and Peer-to-Peer Mood Coupling for Proactive Music Curation
- arXiv: 2604.10815 (replaced)
- Authors: Hongwei Xu
- Subjects: cs.SD; cs.AI; cs.MA
- Tags: Recommender System, Affective Computing, Edge Computing
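- Summary: This paper presents MeloTune, a music agent system deployed on iPhone that delivers proactive music recommendation through personalized arousal learning and peer-to-peer mood coupling, with all inference performed on-device.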
[331] Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents
- arXiv: 2604.10842 (replaced)
- Authors: Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Code Generation
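- Summary: This paper proposes Resilient Write, a six-layer durable write surface for LLM coding agents. It addresses write failures through mechanisms such as pre-flight risk scoring and transactional atomic writes, significantly improving agents' ability to self-correct.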
[332] ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding
- arXiv: 2604.10916 (replaced)
- Authors: Xucheng Wang, Xiaoman Zhang, Sung Eun Kim, Ankit Pal, Pranav Rajpurkar
- Subjects: cs.CV; cs.AI
- Tags: Medical AI, Video Understanding, Question Answering
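- Summary: This paper presents ReXSonoVQA, a video QA benchmark for procedure-centric ultrasound understanding. It contains 514 video clips and questions and evaluates multiple vision-language models on action reasoning, artifact resolution, and procedure planning.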
[333] Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net
- arXiv: 2604.11071 (replaced)
- Authors: Shimon Murai, Teppei Kurita, Ryuta Satoh, Yusuke Moriuchi
- Subjects: cs.CV; cs.AI; cs.LG
- Tags: Image Enhancement
- Venue: CVPR 2026 Workshop
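- Summary: This paper proposes a lightweight low-light image enhancement framework that combines frozen algorithmic preprocessing with a depthwise-separable convolutional U-Net. It placed fourth in the CVPR 2026 NTIRE challenge and achieves competitive perceptual quality.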
[334] CocoaBench: Evaluating Unified Digital Agents in the Wild
- arXiv: 2604.11201 (replaced)
- Authors: CocoaBench Team, Shibo Hao, Zhining Zhang, Zhiqi Liang, Tianyang Liu, Yuheng Zha, Qiyue Gao, Jixuan Chen, Zilong Wang, Zhoujun Cheng, Haoxiang Zhang, Junli Wang, Hexi Jin, Boyuan Zheng, Kun Zhou, Yu Wang, Feng Yao, Licheng Liu, Yijiang Li, Zhifei Li, Zhengtao Han, Pracha Promthaw, Tommaso Cerruti, Xiaohan Fu, Ziqiao Ma, Jingbo Shang, Lianhui Qin, Julian McAuley, Eric P. Xing, Zhengzhong Liu, Rupesh Kumar Srivastava, Zhiting Hu
- Subjects: cs.CL; cs.AI
- Tags: LLM Agent, Benchmark
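- Summary: This paper presents CocoaBench, a benchmark for evaluating unified digital agents. It consists of human-designed tasks that require flexibly combining vision, search, and coding abilities, and the best current system reaches a success rate of only 45.1%.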
[335] METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues
- arXiv: 2604.11427 (replaced)
- Authors: Haofu Yang, Jiaji Liu, Chen Huang, Faguo Wu, Wenqiang Lei, See-Kiong Ng
- Subjects: cs.CL; cs.AI
- Tags: Dialogue System, LLM Agent
- Venue: ACL 2026
- Code: code
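- Summary: This paper proposes METRO, which uses large language models to automatically induce strategy actions and planning logic from raw dialogue transcripts and organizes them into a strategy forest structure, yielding an average improvement of 9%-10% on two benchmarks.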
[336] CodeTracer: Towards Traceable Agent States
- arXiv: 2604.11641 (replaced)
- Authors: Han Li, Yifan Yao, Letian Zhu, Rili Feng, Hongyi Ye, Jiaming Wang, Yancheng He, Pengyu Zou, Lehan Zhang, Xinping Lei, Haoyang Huang, Ken Deng, Ming Sun, Zhaoxiang Zhang, He Ye, Jiaheng Liu
- Subjects: cs.SE; cs.AI
- Tags: LLM Agent, Software Engineering
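- Summary: This paper proposes CodeTracer, a tracing architecture for code agents that reconstructs the complete state-transition history by parsing heterogeneous run artifacts. It also builds the CodeTraceBench benchmark for systematically evaluating failure localization.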
[337] Towards Autonomous Mechanistic Reasoning in Virtual Cells
- arXiv: 2604.11661 (replaced)
- Authors: Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi
- Subjects: cs.LG; cs.AI
- Tags: LLM Agent, Scientific Reasoning, Medical AI
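- Summary: This paper proposes VCR-Agent, a multi-agent framework for autonomous mechanistic reasoning in virtual cells that represents biological reasoning as mechanistic action graphs to enable systematic verification. The framework combines biological knowledge retrieval with verifier-based filtering, and the authors release the VC-TRACES dataset. Experiments show that these mechanistic explanations improve factual precision and supervision quality for gene expression prediction.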
[338] Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving
- arXiv: 2604.11734 (replaced)
- Authors: Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma
- Subjects: cs.RO; cs.AI
- Tags: Diffusion Model, Multi-Agent System, Autonomous Driving
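- Summary: This paper proposes the Multi-ORFT framework, which combines scene-conditioned diffusion pretraining with stable online reinforcement post-training for multi-agent trajectory planning in cooperative driving. Using a two-level MDP formulation and variance-gated group relative policy optimization, the method significantly reduces collision and off-road rates on the WOMD closed-loop benchmark while increasing average speed.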
[339] Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems
- arXiv: 2604.11807 (replaced)
- Authors: Mohammed Ezzaldin Babiker Abdullah
- Subjects: cs.LG; cs.AI; eess.SY
- Tags: Physics-Informed Learning, Time Series Forecasting, Edge Computing
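- Summary: This paper proposes the Physics-Informed State Space Model (PISSM) for solar irradiance forecasting in off-grid photovoltaic systems. The method combines dynamic Hankel matrix embedding, a linear state space model, and a physics-informed gating mechanism. It stays ultra-lightweight at fewer than 40,000 parameters while ensuring that forecasts strictly follow physical constraints such as the diurnal cycle.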