arXiv cs.AI Daily Update

cs.AI, April 14, 2026: 731 paper updates in total:

Overall trend: today's papers focus mainly on LLM Agent, LLM Evaluation, and LLM Reasoning, among other directions.

Accepted papers: [15](ICLR 2026 Workshop), [20](LNICST 2026), [21](CCAI 2026), [26](WWW 2026), [36](ICLR 2026 Workshop), [44](ICLR 2026), [48](ICLR 2026), [51](ACL 2026), [52](DAC 2026), [53](ICLR 2026), [56](ICLR 2026), [64](ICLR 2026 Workshop), [70](EuroSys 2026 Workshop), [73](MIDL 2026), [78](ACL 2026), [79](ACL 2026 Findings), [95](ACL 2026), [98](Educational Data Mining 2026), [111](ACL 2026), [120](ACL 2026), [121](ACL 2026), [129](ACL 2026), [134](IEEE QCNC 2026 Workshop), [135](ACL 2026), [167](ICHI 2026), [171](ACL 2026), [189](WWW 2026), [192](IEEE PacificVis 2026), [197](ACM CHI 2026 Workshop), [204](XAI 2026), [211](LREC 2026), [212](CAI 2026), [213](Sensys 2026), [230](IEEE CAI 2026), [233](IEEE ICC 2026), [238](CVPR 2026 Workshop), [242](TDIS 2026 Workshop), [247](ICLR 2026 Workshop), [251](ACL 2026), [258](CVPR 2026 Workshop), [261](ICLR 2026), [275](ARCS 2026), [279](CEEPE 2026), [281](ICPR 2026), [282](CVPR 2026), [284](ACL 2026), [288](IJCNN 2026), [293](FSE 2026), [294](CVPR 2026), [296](ACL 2026), [298](GCON 2026), [304](FSE Companion 2026), [318](ACL 2026), [319](CVPR 2026), [320](ICASSP 2026), [323](ACL 2026), [324](CVPR 2026 Findings), [326](FSE 2026 Workshop), [331](ICLR 2026 Workshop), [337](AISTATS 2026 Workshop), [339](CVPR 2026 Workshop), [345](CVPR 2026 Workshop), [346](ACL 2026), [348](ACL 2025 Workshop), [355](ACL 2026), [359](ACL 2026), [364](CMN 2026 Workshop), [378](CHI 2026), [381](IC3 2016), [383](MIDL 2026), [386](ACL 2026), [393](CVPR 2026), [398](ICLR 2026), [409](CVPR 2026 Workshop), [412](FSE 2026), [414](ACL 2026), [417](IJEEE 2025), [423](ACL 2026), [424](IEEE TIP 2025), [428](SIGIR 2026), [429](NeurIPS 2025), [435](ACL 2026), [441](ACL 2026), [450](ACL 2026), [454](CHI 2026 Workshop), [457](ACL 2026), [464](CVPR 2026), [465](ACL 2026), [473](DAC 2026), [482](ICAIL 2026), [501](CVPR 2026), [502](ACL 2026 Findings), [504](AAAI 2026), [505](ACL 2026), [514](ACL 2026), [515](ACL 2026), [520](AAMAS 2026), [523](CVPR 2026), [524](ICLR 2026 Workshop), [526](ACM FAccT 2026), [532](CHI 2026 Workshop), [541](IJCNN 2026), [546](ACL 2026), [550](ACL 2026), [554](ACL 2026), [555](ACL 2026 Findings), [560](DAC 2026), [561](TKDE 2026), [563](CVPR 2026), [568](ACL 2026 Findings), [569](AAAI 2026), [575](TMLR 2026), [577](ICML 2026), [580](TMLR 2026), [581](ICLR 2026), [582](CVPR 2026 Workshop), [583](CVPR 2026), [584](ICLR 2026), [592](ACL 2026), [593](RA-L 2026), [596](ACL 2026), [597](MIDL 2026), [598](SBP-BRiMS 2025), [601](ICLR 2026), [603](IEEE FLLM 2025), [604](IEEE TASLP), [605](ICASSP 2026), [606](ICLR 2026), [609](ACL 2026), [610](CVPR 2026), [616](ACL 2026), [618](ICLR 2026), [621](ACL 2026), [622](AISTATS 2026), [626](ICLR 2026), [627](ICLR 2026), [629](ICLR 2026), [630](ACL 2026), [634](ACL 2026), [636](ICRA 2026), [637](ACL 2026), [643](IEEE ICHI 2026), [646](ACL 2026), [647](CVPR 2026), [648](CVPR 2026), [652](ICLR 2026), [654](ACL 2026), [655](ECC 2026), [658](Sci. Rep. 2026), [659](ACL 2026), [660](ACL 2026), [664](CVPR 2026 Workshop), [667](ACL 2026), [670](ACL 2026), [672](ICASSP 2026), [679](ACL 2026), [680](ICMC 2026), [681](CVPR 2026 Workshop), [689](CVPR 2026), [697](ACL 2026), [699](CVPR 2026), [705](ECCV 2026), [715](ACL 2026), [717](ACL 2026 Findings), [727](ICLR 2026), [728](ICPR 2026)

Open-source papers: [1](code), [7](code), [9](code), [13](code), [26](code), [42](code), [49](code), [53](code), [62](code), [65](code), [86](code), [97](code), [101](code), [117](code), [129](code), [131](code), [138](code), [159](code), [162](code), [164](code), [169](code), [170](code), [194](code), [223](code), [251](code), [252](code), [267](code), [288](code), [303](code), [312](code), [316](code), [319](code), [323](code), [324](code), [327](code), [334](code), [340](code), [341](code), [360](code), [382](code), [402](code), [414](code), [421](code), [430](code), [435](code), [447](code), [450](code), [451](code), [453](code), [457](code), [479](code), [482](code), [490](code), [494](code), [496](code), [500](code), [505](code), [506](code), [512](code), [513](code), [515](code), [523](code), [524](code), [540](code), [543](code), [545](code), [550](code), [551](code), [552](code), [556](code), [561](code), [562](code), [577](code), [582](code), [584](code), [606](code), [607](code), [609](code), [629](code), [630](code), [634](code), [645](code), [648](code), [649](code), [662](code), [670](code), [678](code), [681](code), [686](code), [687](code), [699](code), [702](code), [712](code), [713](code), [717](code), [723](code), [727](code)


New submissions (174)

[1] LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

  • arXiv: 2604.09554
  • Authors: Jon M Laurent, Albert Bou, Michael Pieler, Conor Igoe, Alex Andonian, Siddharth Narayanan, James Braza, Alexandros Sanchez Vassopoulos, Jacob L Steenwyk, Blake Lash, Andrew D White, Samuel G Rodriques
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Evaluation, Scientific Reasoning
  • Code: code
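  • Summary: This paper introduces LABBench2, an improved benchmark of nearly 1,900 tasks for evaluating how well AI systems can carry out real biology research. Evaluations show that although current frontier models have improved, they still face significant challenges on the benchmark, leaving clear room for improvement.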

[2] Linear Programming for Multi-Criteria Assessment with Cardinal and Ordinal Data: A Pessimistic Virtual Gap Analysis

  • arXiv: 2604.09555
  • Authors: Fuh-Hwa Franklin Liu, Su-Chuan Shih
  • Subjects: cs.AI; math.OC
  • Tags: Optimization, Decision Making
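  • Summary: This paper proposes a linear-programming-based Virtual Gap Analysis (VGA) model for multi-criteria assessment problems involving both cardinal and ordinal data. The method evaluates and ranks alternatives from a pessimistic perspective and is reported to be reliable and scalable.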

[3] Seven simple steps for log analysis in AI systems

  • arXiv: 2604.09563
  • Authors: Magda Dubois, Ekin Zorer, Maia Hamin, Joe Skinner, Alexandra Souly, Jerome Wynne, Harry Coppock, Lucas Satos, Sayash Kapoor, Sunischal Dev, Keno Juchems, Kimberly Mai, Timo Flesch, Lennart Luettgau, Charles Teague, Eric Patey, JJ Allaire, Lorenzo Pacchiardi, Jose Hernandez-Orallo, Cozmin Ududec
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Evaluation
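  • Summary: This paper proposes a standardized seven-step workflow for analyzing AI system logs, intended to help researchers understand model behavior and keep evaluations reproducible. It provides concrete code examples and guidance using the Inspect Scout library.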

[4] Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization

  • arXiv: 2604.09574
  • Authors: Jiachen Zhu, Lingyu Yang, Rong Shan, Congmin Zheng, Zeyu Zheng, Weiwen Liu, Yong Yu, Weinan Zhang, Jianghao Lin
  • Subjects: cs.AI; cs.LG
  • Tags: GUI Automation, LLM Agent, LLM Evaluation
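  • Summary: This paper proposes the concept of a "Turing Test on Screen" and the Agent Humanization Benchmark (AHB) to evaluate how well mobile GUI agents imitate human behavior in adversarial digital environments. It finds that vanilla LMM agents are easily detected and proposes methods that make them more human-like while preserving task performance.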

[5] AHC: Meta-Learned Adaptive Compression for Continual Object Detection on Memory-Constrained Microcontrollers

  • arXiv: 2604.09576
  • Authors: Bibin Wilson
  • Subjects: cs.AI
  • Tags: Object Detection, Continual Learning, Model Compression
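  • Summary: This paper proposes the Adaptive Hierarchical Compression (AHC) framework, which uses meta-learning to enable continual object detection on memory-constrained microcontrollers. Through hierarchical multi-scale compression and a dual-memory architecture, it mitigates catastrophic forgetting under a limited memory budget.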

[6] Explainable Planning for Hybrid Systems

  • arXiv: 2604.09578
  • Authors: Mir Md Sajid Sarwar
  • Subjects: cs.AI
  • Tags: Automated Planning, Interpretability
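  • Summary: This PhD thesis presents a comprehensive study of explainable AI planning (XAIP) for hybrid systems, aiming to generate explanations for automated planning in complex, safety-critical domains such as autonomous driving and smart grids.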

[7] Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement

  • arXiv: 2604.09579
  • Authors: Fengrui Liu, Xiao He, Tieying Zhang
  • Subjects: cs.AI; cs.SE
  • Tags: LLM Agent, Dialogue System
  • Code: code
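  • Summary: This paper introduces Vigil, a proactive agent system deployed on ByteDance's Volcano Engine that helps handle large-scale cloud-service support tickets by proactively joining human support conversations and learning from resolved cases.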

[8] OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling

  • arXiv: 2604.09580
  • Authors: Hongyu Chen, Liang Lin, Guangrun Wang
  • Subjects: cs.AI; cs.LG
  • Tags: Embodied AI, LLM Reasoning, Automated Planning
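  • Summary: This paper proposes the Object-Oriented World Modeling (OOWM) framework, which uses UML class diagrams and activity diagrams to structure embodied reasoning and planning. With explicit symbolic representations and a three-stage training pipeline, it substantially improves planning coherence and execution success rates for robots on embodied tasks.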

[9] OpeFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding

  • arXiv: 2604.09581
  • Authors: Wee Joe Tan, Zi Rui Lucas Lim, Shashank Durgad, Karim Obegi, Aiden Yiliu Li
  • Subjects: cs.AI; cs.CY; cs.HC
  • Tags: GUI Automation, Usability Evaluation, LLM Agent
  • Code: code
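  • Summary: This paper presents OpenFlo, an agent system that performs automated user experience (UX) evaluation by simulating user interactions with web pages. Combining multimodal grounding with a standardized evaluation protocol, it offers scalable usability testing for agile development.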

[10] Factorizing formal contexts from closures of necessity operators

  • arXiv: 2604.09582
  • Authors: Roberto G. Aragón, Jesús Medina, Eloísa Ramírez-Poussa
  • Subjects: cs.AI; cs.LO
  • Tags: Knowledge Representation
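  • Summary: This paper analyzes how to obtain independent subcontexts from a formal context using operators from possibility theory, and studies how to extend these properties to the fuzzy setting in order to factorize datasets.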

[11] Agentic Exploration of PDE Spaces using Latent Foundation Models for Parameterized Simulations

  • arXiv: 2604.09584
  • Authors: Abhijeet Vishwasrao, Francisco Giral, Mahmoud Golestanian, Federica Tonti, Andrea Arroyo Ramo, Adrian Lozano-Duran, Steven L. Brunton, Sergio Hoyas, Soledad Le Clainche, Hector Gomez, Ricardo Vinuesa
  • Subjects: cs.AI; cs.CV
  • Tags: Scientific Computing, LLM Agent, Neural Operator
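  • Summary: This paper proposes a framework that combines multi-agent LLMs with latent foundation models (LFMs) to automatically explore physical phenomena governed by partial differential equations (PDEs). In fluid-dynamics experiments the method discovered new scaling laws, demonstrating the potential for automated discovery in scientific computing.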

[12] MobiFlow: Real-World Mobile Agent Benchmarking through Trajectory Fusion

  • arXiv: 2604.09587
  • Authors: Yunfei Feng, Xi Zhao, Cheng Zhang, Dahu Feng, Daolin Cheng, Jianqi Yu, Yubin Xia, Erhu Feng
  • Subjects: cs.AI; cs.LG; cs.SE
  • Tags: GUI Automation, LLM Evaluation, LLM Agent
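  • Summary: This paper proposes MobiFlow, a framework for evaluating mobile agents on real-world third-party apps. A graph-construction algorithm based on multi-trajectory fusion compresses the state space, yielding evaluation metrics that better reflect real-world usage.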
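
To make "trajectory fusion" concrete, here is a minimal Python sketch under a strong simplifying assumption: every screen has already been mapped to a canonical state ID (MobiFlow's actual state matching and graph construction are not reproduced here). Several recorded runs of the same task are merged into one directed graph, so states shared across runs are stored only once. The function name and the example screens are hypothetical.

```python
def fuse_trajectories(trajectories):
    """Fuse recorded trajectories into one directed state graph.

    Hypothetical assumption: every screen has already been mapped to a
    canonical state ID, so equal IDs denote the same app state.
    """
    graph = {}
    for traj in trajectories:
        for state in traj:
            graph.setdefault(state, set())   # shared states collapse into one node
        for src, dst in zip(traj, traj[1:]):
            graph[src].add(dst)              # each observed transition becomes an edge
    return graph


trajs = [
    ["home", "search", "results", "detail"],
    ["home", "menu", "search", "results", "detail"],
    ["home", "search", "results", "cart"],
]
graph = fuse_trajectories(trajs)
print(len(graph), sum(len(t) for t in trajs))  # 6 13
```

The three runs record 13 states but fuse into 6 nodes; an agent's run could then be checked against any path through the graph rather than against a single reference trajectory.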

[13] Persistent Identity in AI Agents: A Multi-Anchor Architecture for Resilient Memory and Continuity

  • arXiv: 2604.09588
  • Authors: Prahlad G. Menon
  • Subjects: cs.AI; cs.ET; cs.LG
  • Tags: LLM Agent, Memory Architecture
  • Code: code
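  • Summary: This paper proposes a multi-anchor architecture that addresses identity persistence and memory continuity in AI agents when their context overflows. The framework combines RAG and RLM retrieval systems so that an agent keeps a continuous identity even after partial memory loss.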

[14] DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review

  • arXiv: 2604.09590
  • Authors: Yixuan Weng, Minjun Zhu, Qiujie Xie, Zhiyuan Ning, Shichen Li, Panzhong Lu, Zhen Lin, Enhao Gu, Qiyao Sun, Yue Zhang
  • Subjects: cs.AI; cs.CL; cs.CY
  • Tags: LLM Agent, Scientific Reasoning
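  • Summary: This paper introduces DeepReviewer 2.0, an agentic system for automated peer review that produces traceable review packages with anchored annotations and actionable follow-up items. Experiments show it outperforms existing models in coverage of major issues and is comparable to human review committees.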

[15] Spatial Competence Benchmark

  • arXiv: 2604.09594
  • Authors: Jash Vira, Ashley Harris
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Reasoning, LLM Evaluation
  • Venue: ICLR 2026 Workshop
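  • Summary: This paper introduces the Spatial Competence Benchmark, which uses hierarchical tasks to evaluate how well large models maintain environment-consistent internal representations and plan actions. Experiments show that frontier models perform poorly on advanced spatial reasoning tasks and that their accuracy saturates quickly as the output budget grows.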

[16] DERM-3R: A Resource-Efficient Multimodal Agents Framework for Dermatologic Diagnosis and Treatment in Real-World Clinical Settings

  • arXiv: 2604.09596
  • Authors: Ziwen Chen, Zhendong Wang, Chongjing Wang, Yurui Dong, Luozhijie Jin, Jihao Gu, Kui Chen, Jiaxi Yang, Bingjie Lu, Zhou Zhang, Jirui Dai, Changyong Luo, Xiameng Gai, Haibing Lan, Zhi Liu
  • Subjects: cs.AI; cs.MA
  • Tags: Medical AI, Multimodal Learning, LLM Agent
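  • Summary: This paper proposes DERM-3R, a resource-efficient multimodal agent framework that simulates diagnosis and treatment in Traditional Chinese Medicine (TCM) dermatology. It consists of three collaborating agents and, with limited data and compute, achieves strong performance through fine-grained lesion recognition and holistic reasoning.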

[17] CID-TKG: Collaborative Historical Invariance and Evolutionary Dynamics Learning for Temporal Knowledge Graph Reasoning

  • arXiv: 2604.09600
  • Authors: Shuai-Long Lei, Xiaobin Zhu, Jiarui Liang, Guoxi Sun, Zhiyu Fang, Xu-Cheng Yin
  • Subjects: cs.AI; cs.CL
  • Tags: Temporal Knowledge Graph, Knowledge Graph
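  • Summary: This paper proposes CID-TKG, which builds a historical-invariance graph and an evolutionary-dynamics graph and learns from them collaboratively for temporal knowledge graph reasoning. A contrastive objective aligns the representations of the two views, mitigating the semantic discrepancy between them and achieving state-of-the-art performance in the extrapolation setting.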
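
A contrastive objective that "aligns view representations" is commonly an InfoNCE-style loss. The sketch below shows that generic loss in plain Python, assuming cosine similarity and a temperature of 0.1; it is not CID-TKG's exact objective, and the toy two-entity embeddings are hypothetical. Row i of one view is pulled toward row i of the other view and pushed away from every other row.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(view_a, view_b, temperature=0.1):
    """Mean InfoNCE loss: row i of view_a should match row i of view_b
    (the positive) rather than any other row j (the negatives)."""
    total = 0.0
    for i, anchor in enumerate(view_a):
        logits = [cosine(anchor, other) / temperature for other in view_b]
        m = max(logits)  # log-sum-exp with max subtraction for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[i]
    return total / len(view_a)


# Hypothetical toy embeddings of two entities in the "invariance" view.
invariance_view = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(invariance_view, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce(invariance_view, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, swapped)
```

When the second view agrees with the first the loss is close to zero; when the entities are swapped it rises to about 10, which is the signal that drives the two views into agreement during training.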

[18] Hubble: An LLM-Driven Agentic Framework for Safe and Automated Alpha Factor Discovery

  • arXiv: 2604.09601
  • Authors: Runze Shi, Shengyu Yan, Yuecheng Cai, Chengxi Lv
  • Subjects: cs.AI; cs.CE
  • Tags: Quantitative Finance, LLM Agent
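  • Summary: This paper introduces Hubble, a framework that uses large language models as intelligent search heuristics for safe, automated alpha-factor discovery in quantitative finance. Combining evolutionary feedback with deterministic safety constraints, it produces effective and interpretable factors.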

[19] From Scalars to Tensors: Declared Losses Recover Epistemic Distinctions That Neutrosophic Scalars Cannot Express

  • arXiv: 2604.09602
  • Authors: Tony Mason
  • Subjects: cs.AI; cs.SE
  • Tags: Uncertainty Estimation, LLM Evaluation
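  • Summary: This study extends neutrosophic-logic evaluation experiments, finds that scalar T/I/F (truth/indeterminacy/falsity) values cannot distinguish different epistemic situations, and proposes tensor-structured outputs that include declared losses. This representation captures an LLM's epistemic state more faithfully, recovering distinctions that scalar representations lose.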

[20] LLMs for Text-Based Exploration and Navigation Under Partial Observability

  • arXiv: 2604.09604
  • Authors: Stephan Sandfuchs, Maximilian Melchert, Jörg Frochte
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Agent, LLM Reasoning
  • Venue: LNICST 2026
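  • Summary: This study evaluates LLMs as text-only controllers for exploration and navigation in grid worlds under partial observability. Reasoning-tuned models reliably complete navigation but are less efficient than oracle paths, while classic instruction-tuned models perform inconsistently.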

[21] Evaluating Reliability Gaps in Large Language Model Safety via Repeated Prompt Sampling

  • arXiv: 2604.09606
  • Authors: Keita Broadwater
  • Subjects: cs.AI; cs.SE
  • Tags: LLM Evaluation, LLM Safety
  • Venue: CCAI 2026
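  • Summary: This paper proposes the Accelerated Prompt Stress Testing (APST) framework, which evaluates LLM safety and consistency by repeatedly sampling the same prompt. It finds that conventional benchmarks mask reliability differences that emerge under sustained use, and that failure probabilities vary significantly across sampling temperatures.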
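
The core idea of repeated prompt sampling can be sketched in a few lines: query the same prompt many times, count unsafe replies, and attach a confidence interval to the failure rate. This is a minimal illustration, not the APST implementation; the Wilson score interval is a standard choice we assume here, and `toy_model` is a hypothetical stand-in for a model called at a fixed temperature.

```python
import math
import random


def wilson_interval(failures, n, z=1.96):
    """95% Wilson score interval for a binomial failure rate."""
    p = failures / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def stress_test(sample_response, is_unsafe, n=500):
    """Sample the same prompt n times and estimate the failure probability."""
    failures = sum(1 for _ in range(n) if is_unsafe(sample_response()))
    return failures / n, wilson_interval(failures, n)


# Hypothetical stand-in for a model queried at a fixed temperature:
# it produces an unsafe reply about 3% of the time.
rng = random.Random(0)


def toy_model():
    return "unsafe" if rng.random() < 0.03 else "refusal"


rate, (lo, hi) = stress_test(toy_model, lambda reply: reply == "unsafe")
print(f"failure rate {rate:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```

A single-shot benchmark sees this prompt once and reports either a pass or a fail; repeated sampling exposes the nonzero tail, and rerunning the test per temperature setting would show how that tail moves.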

[22] Unifying Ontology Construction and Semantic Alignment for Deterministic Enterprise Reasoning at Scale

  • arXiv: 2604.09608
  • Authors: Hongyin Zhu
  • Subjects: cs.AI; cs.CL
  • Tags: Neurosymbolic AI, Knowledge Representation
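  • Summary: This paper proposes the Large Ontology Model (LOM), which unifies ontology construction, semantic alignment, and logical reasoning in a single end-to-end architecture. On enterprise-scale benchmarks it significantly outperforms existing large language models on ontology completion and complex graph reasoning tasks.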

[23] General-purpose LLMs as Models of Human Driver Behavior: The Case of Simplified Merging

  • arXiv: 2604.09609
  • Authors: Samir H.A. Mohammad, Wouter Mooi, Arkady Zgonnikov
  • Subjects: cs.AI; cs.RO
  • Tags: Autonomous Driving, LLM Agent
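  • Summary: This study embeds general-purpose LLMs as driver agents in a simplified one-dimensional merging scenario to evaluate how well they model human driving behavior. The models reproduce some characteristics of human behavior but fall short in their dynamic speed responses, and prompt design has a significant effect on their behavior.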

[24] Beyond Theory of Mind in Robotics

  • arXiv: 2604.09612
  • Authors: Malte F. Jung
  • Subjects: cs.AI; cs.HC
  • Tags: Robotics, Cognitive Science
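  • Summary: This paper critiques the limitations of the theory-of-mind paradigm in robotics, arguing that social meaning emerges from interactional coordination rather than from inferring internal states. The author advocates shifting robot design from observer-style inference toward participatory coordination strategies that stabilize the meaning of behavior through interaction.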

[25] The Geometry of Knowing: From Possibilistic Ignorance to Probabilistic Certainty -- A Measure-Theoretic Framework for Epistemic Convergence

  • arXiv: 2604.09614
  • Authors: Moriba Kemessia Jah
  • Subjects: cs.AI; cs.IT; math.ST
  • Tags: Uncertainty Estimation, Information Theory
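  • Summary: This paper develops a measure-theoretic framework describing how possibilistic representations of incomplete knowledge contract into probabilistic ones as evidence accumulates. The author proves conditions for epistemic collapse and compares the UKF and ESPF on orbit tracking, highlighting the ESPF's advantage in epistemic honesty.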

[26] AdaQE-CG: Adaptive Query Expansion for Web-Scale Generative AI Model and Data Card Generation

  • arXiv: 2604.09617
  • Authors: Haoxuan Zhang, Ruochi Li, Zhenni Liang, Mehri Sattari, Phat Vo, Collin Qu, Ting Xiao, Junhua Ding, Yang Zhang, Haihua Chen
  • Subjects: cs.AI; cs.IR
  • Tags: Information Extraction, AI Ethics
  • Venue: WWW 2026
  • Code: code
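  • Summary: This paper proposes AdaQE-CG, a framework that automatically generates model cards and data cards for generative AI through adaptive query expansion and cross-card knowledge transfer. Experiments show it produces better documentation than existing methods, approaching human quality; the authors also release MetaGAI-Bench, the first expert-annotated benchmark for this task.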

[27] Competing with AI Scientists: Agent-Driven Approach to Astrophysics Research

  • arXiv: 2604.09621
  • Authors: Thomas Borrett, Licong Xu, Andy Nilipour, Boris Bolliet, Sebastien Pierre, Erwan Allys, Celia Lecat, Biwei Dai, Po-Wen Chang, Wahid Bhimji
  • Subjects: cs.AI
  • Tags: Scientific Reasoning, LLM Agent
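  • Summary: This paper presents an agent-based approach that uses the multi-agent system Cmbagent to build parameter-inference pipelines for astrophysics. With the help of human intervention, the system took first place in a weak-lensing uncertainty challenge, showing the potential of agent-driven workflows in scientific research.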

[28] How LLMs Might Think

  • arXiv: 2604.09674
  • Authors: Joseph Gottlieb, Ethan Kemp, Matthew Trager
  • Subjects: cs.AI; cs.CL
  • Tags: Cognitive Science
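  • Summary: This paper examines whether large language models think, rebutting arguments that deny it on grounds of rationality. The authors propose that LLMs may engage only in non-rational, associative thinking: if they think at all, they most likely do so in this purely associative way.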

[29] Belief-Aware VLM Model for Human-like Reasoning

  • arXiv: 2604.09686
  • Authors: Anshul Nayak, Shahil Shaik, Yue Wang
  • Subjects: cs.AI; cs.CV
  • Tags: Vision-Language Model, Reinforcement Learning
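  • Summary: This paper proposes a belief-aware vision-language model framework that combines retrieval-based memory with reinforcement learning to approximate belief states. It outperforms zero-shot baselines on VQA datasets, demonstrating the importance of belief-aware reasoning in long-horizon tasks.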

[30] Tipiano: Cascaded Piano Hand Motion Synthesis via Fingertip Priors

  • arXiv: 2604.09692
  • Authors: Joonhyung Bae, Kirak Kim, Hyeyoon Cho, Sein Lee, Yoon-Seok Choi, Hyeon Hur, Gyubin Lee, Akira Maezawa, Satoshi Obata, Jonghwa Park, Jaebum Park, Juhan Nam
  • Subjects: cs.AI; cs.CV
  • Tags: Motion Synthesis, Computer Vision
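  • Summary: This paper proposes Tipiano, a framework that uses fingertip priors to synthesize realistic piano hand motion through a four-stage cascade. It significantly outperforms diffusion baselines in positional accuracy, and a user study finds its quality close to motion capture.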

[31] The Myth of Expert Specialization in MoEs: Why Routing Reflects Geometry, Not Necessarily Domain Expertise

  • arXiv: 2604.09780
  • Authors: Xi Wang, Soufiane Hayou, Eric Nalisnick
  • Subjects: cs.AI
  • Tags: Mixture-of-Experts, Interpretability
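  • Summary: This paper analyzes expert specialization in mixture-of-experts (MoE) models and finds that routing mainly reflects the geometry of hidden states rather than domain expertise. Specialization emerges as a property of the representation space, and the specialization patterns in pretrained MoEs are hard for humans to interpret.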
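
Why routing tracks geometry is easiest to see in the standard top-k softmax router, where expert scores are a linear function of the hidden state. The sketch below assumes that common design (it is not taken from the paper) and uses a hypothetical 4-expert router over a 2-D hidden space: nearby hidden states land on the same experts regardless of what "domain" they came from.

```python
import math


def route(hidden, router_weights, k=2):
    """Top-k softmax routing: score each expert with a linear router,
    keep the k best, and renormalize their gate weights."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    top = sorted(range(len(logits)), key=lambda e: logits[e], reverse=True)[:k]
    m = max(logits[e] for e in top)
    exps = {e: math.exp(logits[e] - m) for e in top}
    z = sum(exps.values())
    return {e: v / z for e, v in exps.items()}


# Hypothetical 4-expert router over a 2-D hidden space.
W = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
print(route([0.9, 0.1], W))    # experts 0 and 1
print(route([0.85, 0.2], W))   # nearby hidden state: same experts
print(route([-0.9, 0.1], W))   # different region: experts 2 and 1
```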

[32] Pioneer Agent: Continual Improvement of Small Language Models in Production

  • arXiv: 2604.09791
  • Authors: Dhruv Atreja, Julia White, Nikhil Nayak, Kelton Zhang, Henrijs Princis, George Hurn-Maloney, Ash Lewis, Urchade Zaratiana
  • Subjects: cs.AI; cs.CL; cs.LG; cs.MA
  • Tags: LLM Agent, Continual Learning
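  • Summary: This paper introduces Pioneer Agent, a closed-loop system for continually improving small language models that automates data acquisition, evaluation, and iterative training. It performs well in both cold-start and production-deployment scenarios and discovers effective training strategies on its own.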

[33] Controllable and Verifiable Tool-Use Data Synthesis for Agentic Reinforcement Learning

  • arXiv: 2604.09813
  • Authors: Siyuan Xu, Shiyang Li, Xin Liu, Tianyi Liu, Yixiao Li, Zhan Shi, Zixuan Zhang, Zilong Wang, Qingyu Yin, Jianshu Chen, Tuo Zhao, Bing Yin
  • Subjects: cs.AI
  • Tags: LLM Agent, Data Synthesis, Reinforcement Learning
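  • Summary: This paper proposes COVERT, a two-stage pipeline that generates reliable tool-use trajectory data to support online reinforcement-learning training. Using multi-level verification and an oracle-preserving augmentation strategy, it significantly improves tool-calling policies on BFCL v3 and ACEBench.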

[34] EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

  • arXiv: 2604.09815
  • Authors: Tiantian He, Yihang Chen, Keyue Jiang, Ka Yiu Lee, Kaiwen Zhou, Kun Shao, Shuai Wang
  • Subjects: cs.AI
  • Tags: LLM Agent, GUI Automation, Multi-Agent System
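  • Summary: This paper formulates MCP-GUI interaction as a unified hybrid policy-learning problem and proposes a self-evolving framework with automated environment generation, trajectory collection, and an experience library. Distillation works better on MCP-dominated tasks, while the experience library is better on GUI-intensive tasks.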

[35] COMPOSITE-Stem

  • arXiv: 2604.09836
  • Authors: Kyle Waters, Lucas Nuzzi, Tadhg Looram, Alessandro Tomasiello, Ariel Ghislain Kemogne Kamdoum, Bikun Li, Damien Sileo, Egor Kretov, Francesco Fournier-Facio, Georgios Soloupis, Haile Kassahun, Hew Wolff, Jiaqi Cai, Lianghui Li, Marc Roth, Mohinder Naiya, Naixu Guo, Qicheng Tang, Richard Wheeler, Samuele Sala, Serguei Popov, Steven Dillman, Yuqi Li
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Evaluation, Scientific Reasoning, LLM Agent
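  • Summary: This paper introduces the COMPOSITE-STEM benchmark, comprising 70 physics, biology, chemistry, and mathematics tasks written by PhD-level researchers. It combines exact-match scoring with an LLM-as-judge protocol; current frontier models reach only 21% accuracy.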

[36] Steered LLM Activations are Non-Surjective

  • arXiv: 2604.09839
  • Authors: Aayush Mishra, Daniel Khashabi, Anqi Liu
  • Subjects: cs.AI; cs.LG
  • Tags: Interpretability, LLM Inference
  • Venue: ICLR 2026 Workshop
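  • Summary: This paper proves that activation steering pushes the residual stream off the manifold of states reachable by discrete prompts, establishing a formal separation between white-box steering and black-box prompting. The authors caution against treating successful activation steering as evidence about prompt-based interpretability or vulnerabilities.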

[37] MEMENTO: Teaching LLMs to Manage Their Own Context

  • arXiv: 2604.09852
  • Authors: Vasilis Kontonis, Yuchen Zeng, Shivam Garg, Lingjiao Chen, Hao Tang, Ziyan Wang, Ahmed Awadallah, Eric Horvitz, John Langford, Dimitris Papailiopoulos
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Inference, Long Context, LLM Reasoning
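  • Summary: This paper proposes MEMENTO, which teaches models to split their reasoning into blocks and compress each block into a memory unit, so that attention operates over these memories rather than the full context. It preserves accuracy on math, science, and coding benchmarks while reducing the KV cache by about 2.5x.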

[38] Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

  • arXiv: 2604.09855
  • Authors: Shuze Daniel Liu, Claire Chen, Jiabao Sean Xiao, Lei Lei, Yuheng Zhang, Yisong Yue, David Simchi-Levi
  • Subjects: cs.AI; cs.CL; cs.GT; econ.GN
  • Tags: Reinforcement Learning, LLM Agent, Decision Making
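  • Summary: This paper uses reinforcement learning with verifiable rewards (RLVR) to teach LLMs to negotiate and uncovers a four-phase evolution of strategy, from naive haggling to sophisticated persuasion techniques. The trained 30B-parameter agent extracts significantly more economic surplus than frontier models ten times its size.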

[39] Evolutionary Token-Level Prompt Optimization for Diffusion Models

  • arXiv: 2604.09861
  • Authors: Domício Pereira Neto, João Correia, Penousal Machado
  • Subjects: cs.AI; cs.NE
  • Tags: Text-to-Image, Diffusion Model, Prompt Engineering
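  • Summary: This paper proposes optimizing diffusion-model prompts by using a genetic algorithm to evolve the token vectors of the CLIP text encoder directly. On the Parti Prompts dataset it improves fitness by up to 23.93% over baseline methods.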
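
For readers unfamiliar with evolving continuous vectors, here is a minimal elitist genetic algorithm (uniform crossover plus Gaussian mutation) over flat real-valued vectors. It is a generic sketch, not the paper's operators or hyperparameters: in the prompt setting a vector would be the concatenated CLIP token embeddings, whereas here `toy_fitness` and its `target` are hypothetical stand-ins for an image-quality score.

```python
import random


def evolve(fitness, dim, pop_size=30, generations=60, elite=6, sigma=0.1, seed=0):
    """Elitist genetic algorithm over flat real-valued vectors.

    In the prompt-optimization setting a vector would be the concatenated
    token embeddings fed to the text encoder; here `fitness` is a toy
    stand-in for an image-quality or alignment score.
    """
    rng = random.Random(seed)
    pop = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]                      # elitism: best vectors survive unchanged
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            children.append([x + rng.gauss(0.0, sigma) for x in child])   # Gaussian mutation
        pop = parents + children
    return max(pop, key=fitness)


target = [0.5, -1.0, 2.0, 0.0]  # hypothetical "ideal" embedding


def toy_fitness(v):
    return -sum((x - t) ** 2 for x, t in zip(v, target))


best = evolve(toy_fitness, dim=len(target))
print(round(toy_fitness(best), 3))
```

Because the elite are copied forward unchanged, the best fitness can never decrease from one generation to the next; each fitness call in a real setup would mean generating and scoring an image, which is what makes the method expensive.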

[40] What do your logits know? (The answer may surprise you!)

  • arXiv: 2604.09885
  • Authors: Masha Fedzechkina, Eleonora Gualdoni, Rita Ramos, Sinead Williamson
  • Subjects: cs.AI
  • Tags: Vision-Language Model, Privacy, Interpretability
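  • Summary: This paper systematically compares the information retained at different representational levels of vision-language models and finds that even the top-level logits can leak task-irrelevant information about the image in a query. This reveals a potential privacy risk in models' internal representations.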

[41] In-situ process monitoring for defect detection in wire-arc additive manufacturing: an agentic AI approach

  • arXiv: 2604.09889
  • Authors: Pallock Halder, Satyajit Mojumder
  • Subjects: cs.AI
  • Tags: LLM Agent, Multi-Agent System, Manufacturing AI
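  • Summary: This paper proposes an agentic AI framework for in-situ process monitoring in wire-arc additive manufacturing, combining a processing agent and a monitoring agent for defect detection. The multi-agent configuration reaches 91.6% decision accuracy, outperforming every single-agent setup.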

[42] GLEaN: A Text-to-image Bias Detection Approach for Public Comprehension

  • arXiv: 2604.09923
  • Authors: Bochu Ding, Brinnae Bent, Augustus Wendell
  • Subjects: cs.AI; cs.CV
  • Tags: Text-to-Image, Bias Mitigation, Fairness
  • Code: code
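  • Summary: This paper proposes GLEaN, a text-to-image bias-detection approach aimed at public understanding that generates synthetic portraits to visualize a model's social and occupational identity biases. A user study shows it is as effective as conventional data tables while requiring significantly less viewing time.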

[43] HealthAdminBench: Evaluating Computer-Use Agents on Healthcare Administration Tasks

  • arXiv: 2604.09937
  • Authors: Suhana Bedi, Ryan Welch, Ethan Steinberg, Michael Wornow, Taeil Matthew Kim, Haroun Ahmed, Peter Sterling, Bravim Purohit, Qurat Akram, Angelic Acosta, Esther Nubla, Pritika Sharma, Michael A. Pfeffer, Sanmi Koyejo, Nigam H. Shah
  • Subjects: cs.AI
  • Tags: LLM Evaluation, LLM Agent, Medical AI
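  • Summary: This paper introduces HealthAdminBench, a benchmark of 4 GUI environments and 135 healthcare administration tasks for evaluating computer-use agents. The best agent achieves only a 36.3% task success rate, revealing a gap between current agent capabilities and the demands of real workflows.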

[44] New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework

  • arXiv: 2604.09940
  • Authors: Shaocong Ma, Peiran Yu, Heng Huang
  • Subjects: cs.AI; cs.LG; math.OC
  • Tags: Instruction Tuning, Optimization, LLM Training
  • Venue: ICLR 2026
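  • Summary: This paper proposes a hybrid fine-tuning method that jointly updates the LLM and PEFT modules by combining zeroth-order and first-order optimization. The authors develop a theoretical framework based on a hybrid smoothness condition, prove convergence for a reshuffling-type SGD algorithm, and show performance gains on several downstream tasks.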
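
The split between zeroth-order and first-order updates can be illustrated on a toy problem. In this sketch (our assumption, not the paper's algorithm) the large "base" parameters are updated from function values alone via coordinate-wise central differences, while the small "adapter" gets an exact gradient; plain full-batch gradient descent stands in for the paper's reshuffling-type SGD, and the quadratic objective is hypothetical.

```python
def zo_gradient(f, x, mu=1e-4):
    """Zeroth-order gradient estimate from function values only
    (coordinate-wise central differences; no backpropagation)."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += mu
        down[i] -= mu
        grad.append((f(up) - f(down)) / (2 * mu))
    return grad


def hybrid_step(loss_fn, adapter_grad_fn, base, adapter, lr_base=0.01, lr_adapter=0.1):
    """One hybrid update: ZO estimate for the (large) base parameters,
    exact first-order gradient for the (small) adapter parameters."""
    g_base = zo_gradient(lambda b: loss_fn(b, adapter), base)
    g_adapter = adapter_grad_fn(base, adapter)
    base = [b - lr_base * g for b, g in zip(base, g_base)]
    adapter = [a - lr_adapter * g for a, g in zip(adapter, g_adapter)]
    return base, adapter


def toy_loss(b, a):
    """Hypothetical objective: the "model output" is base + adapter, target 1."""
    return sum((bi + ai - 1.0) ** 2 for bi, ai in zip(b, a))


def toy_adapter_grad(b, a):
    return [2.0 * (bi + ai - 1.0) for bi, ai in zip(b, a)]


base, adapter = [0.0, 0.0], [0.0, 0.0]
for _ in range(200):
    base, adapter = hybrid_step(toy_loss, toy_adapter_grad, base, adapter)
print(toy_loss(base, adapter))
```

The appeal of this split is memory: the base model only needs forward evaluations, so backpropagation state is kept only for the adapter.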

[45] FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks

  • arXiv: 2604.10015
  • Authors: Yupeng Cao, Haohang Li, Weijin Liu, Wenbo Cao, Anke Xu, Lingfei Qian, Xueqing Peng, Minxue Tang, Zhiyuan Yao, Jimin Huang, K.P. Subbalakshmi, Zining Zhu, Jordan W. Suchow, Yangyang Yu
  • Subjects: cs.AI; cs.CE; cs.CL; cs.MM
  • Tags: LLM Evaluation, LLM Agent, Quantitative Finance
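  • Summary: This paper introduces FinTrace, a benchmark of 800 expert-annotated trajectories for financial tasks that evaluates LLM tool-calling behavior with nine metrics. Models are strong at tool selection but fall clearly short in information utilization and final-answer quality.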

[46] AI Achieves a Perfect LSAT Score

  • arXiv: 2604.10034
  • Authors: Bonmu Ku
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Evaluation, Legal AI
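  • Summary: This paper reports the first case of a language model achieving a perfect score on an official LSAT exam. Removing the thinking phase lowers accuracy by up to 8 percentage points, mainly on the logical reasoning section, whereas prompt variations and shuffled answer options have little effect.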

[47] LoopGuard: Breaking Self-Reinforcing Attention Loops via Dynamic KV Cache Intervention

  • arXiv: 2604.10044
  • Authors: Dongjie Xu, Hao Wu, Weijie Shi, Yue Cui, Yuanjun Liu, Jiawei Li, Haolun Ma, An Liu, Jia Zhu, Jiajie Xu
  • Subjects: cs.AI
  • Tags: LLM Inference, Long Context
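  • Summary: This paper identifies a failure mode in which attention patterns collapse during decoding and trap the model in repetitive loops, and proposes LoopGuard, which detects and breaks such loops through dynamic KV cache intervention. Experiments show it reduces the loop rate by more than 90 percentage points.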
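
To make the failure mode tangible, here is a simple output-level loop detector: it reports the smallest period p such that the last p × 3 tokens are one block repeated. This is deliberately a different and much cruder technique than LoopGuard, which works on attention patterns and the KV cache; the function name and the thresholds (`max_period=64`, `min_repeats=3`) are hypothetical.

```python
def detect_loop(tokens, max_period=64, min_repeats=3):
    """Return the smallest period p such that the last p * min_repeats
    tokens are one block of length p repeated, or None if no such loop."""
    n = len(tokens)
    for p in range(1, max_period + 1):
        span = p * min_repeats
        if span > n:
            break
        tail = tokens[n - span:]
        if all(tail[i] == tail[i % p] for i in range(span)):
            return p
    return None


print(detect_loop("so the answer is 5 so the answer is 5 so the answer is 5".split()))  # 5
print(detect_loop("step one then step two then done".split()))  # None
```

A surface check like this only fires after the loop has already been emitted several times, which is one motivation for intervening earlier at the attention level as the paper does.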

[48] Learning Hierarchical and Geometry-Aware Graph Representations for Text-to-CAD

  • arXiv: 2604.10075
  • Authors: Shengjie Gong, Wenjie Peng, Hongyuan Chen, Gangyu Zhang, Yunqing Hu, Huiyuan Zhang, Shuangping Huang, Tianshui Chen
  • Subjects: cs.AI
  • Tags: CAD Generation, Graph Neural Network, Text Generation
  • Venue: ICLR 2026
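  • Summary: This paper proposes hierarchical, geometry-aware graphs as an intermediate representation for text-to-CAD generation, explicitly modeling multi-level parts and geometric constraints. The method outperforms existing approaches in both geometric fidelity and constraint satisfaction.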

[49] Ontological Trajectory Forecasting via Finite Semigroup Iteration and Lie Algebra Approximation in Geopolitical Knowledge Graphs

  • arXiv: 2604.10087
  • Authors: Qihang Wu
  • Subjects: cs.AI
  • Tags: Knowledge Graph, Temporal Knowledge Graph, Automated Planning
  • Code: code
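  • Summary: This paper presents EL-DRUIN, a reasoning system for geopolitical intelligence analysis that combines formal ontology, finite semigroup algebra, and Lie algebra approximation to forecast long-term relationship trajectories. It models geopolitical relations as dynamic pattern states, simulates them forward through semigroup operations, and uses Bayesian posterior weights to provide interpretable, calibrated probabilities.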

[50] Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards

  • arXiv: 2604.10110
  • Authors: Kai-Yuan Guo, Jiang Wang, Renjie Zhao, Tianyi Wang, Wandong Mao, Yu Gao, Mou Xiao Feng, Yi Xu
  • Subjects: cs.AI
  • Tags: LLM Agent, Reinforcement Learning
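  • Summary: Targeting memory-driven device control in smart homes, this paper releases the MemHomeLife dataset and the MemHome benchmark, and proposes a reinforcement-learning method with multi-dimensional rewards to address the lack of intermediate feedback in fine-grained memory-management tasks.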

[51] Learning from Emptiness: De-biasing Listwise Rerankers with Content-Agnostic Probability Calibration

  • arXiv: 2604.10150
  • Authors: Hang Lv, Hongchao Gu, Ruiqing Yang, Liangyue Li, Zulong Chen, Defu Lian, Hao Wang, Enhong Chen
  • Subjects: cs.AI; cs.CL
  • Tags: Information Retrieval, Bias Mitigation
  • Venue: ACL 2026
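  • Summary: This paper proposes CapCal, a training-free framework that estimates the bias distribution from content-agnostic placeholder inputs and corrects the output logits with an entropy-adaptive contrastive mechanism, disentangling positional bias in generative listwise reranking.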
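
The basic move of content-agnostic calibration can be shown in a few lines: measure the reranker's distribution over slots when every slot holds an empty placeholder, then divide that prior out of the real distribution. This sketch covers only that division (equivalent to subtracting placeholder logits); CapCal's entropy-adaptive contrastive weighting is not modeled, and all logit values are hypothetical.

```python
import math


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def calibrate(logits, placeholder_logits):
    """Divide the ranking distribution by the content-free prior measured
    with placeholder documents, then renormalize. Equivalent to
    softmax(logits - placeholder_logits)."""
    p = softmax(logits)
    prior = softmax(placeholder_logits)
    scores = [pi / qi for pi, qi in zip(p, prior)]
    z = sum(scores)
    return [s / z for s in scores]


# Hypothetical: the reranker prefers slot 0 regardless of content.
placeholder = [1.5, 0.5, 0.0]   # logits when every slot holds an empty placeholder
observed = [2.0, 1.0, 0.5]      # logits for real candidates in the same slots
print(calibrate(observed, placeholder))  # roughly uniform: the preference was all position
```

Here the apparent preference for the first slot vanishes after calibration, because it matched the placeholder prior exactly; any preference that survives is attributable to content.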

[52] SpecMoE: A Fast and Efficient Mixture-of-Experts Inference via Self-Assisted Speculative Decoding

  • arXiv: 2604.10152
  • Authors: Jehyeon Bang, Eunyeong Cho, Ranggi Hwang, Jinha Chung, Minsoo Rhu
  • Subjects: cs.AI; cs.LG
  • Tags: Mixture-of-Experts, LLM Inference
  • Venue: DAC 2026
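  • Summary: This paper proposes SpecMoE, a memory-efficient MoE inference system based on self-assisted speculative decoding that raises inference throughput by up to 4.30x without training any additional model, while substantially reducing memory and interconnect bandwidth requirements.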
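
For background, here is a minimal sketch of greedy speculative decoding, the general technique SpecMoE builds on: a cheap drafter proposes k tokens, the full model checks them in order, and the longest agreeing prefix is kept plus one token from the full model. How SpecMoE derives its drafter from the MoE itself is not modeled; `draft_next` and `target_next` are hypothetical toy models.

```python
def speculative_step(prefix, draft_next, target_next, k=4):
    """One round of greedy speculative decoding.

    The draft proposes k tokens; the target checks them in order, keeps
    the longest agreeing prefix, and contributes one token of its own
    (a correction at the first mismatch, or a bonus token if all agree).
    Real systems verify all k drafts in one batched target pass; the
    sequential calls here only keep the sketch short.
    """
    ctx, draft = list(prefix), []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    ctx, accepted = list(prefix), []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            return accepted + [expected]
        accepted.append(tok)
        ctx.append(tok)
    return accepted + [target_next(ctx)]


def target_next(ctx):   # toy "full" model: count upward mod 10
    return (ctx[-1] + 1) % 10


def draft_next(ctx):    # toy cheap drafter that is wrong after a 3
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step([0], draft_next, target_next))  # [1, 2, 3, 4]
print(speculative_step([5], draft_next, target_next))  # [6, 7, 8, 9, 0]
```

Each round always emits at least one token identical to what the full model would have produced greedily, so output quality is unchanged; the speedup comes from rounds where several drafted tokens are accepted at once.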

[53] Inductive Reasoning for Temporal Knowledge Graphs with Emerging Entities

  • arXiv: 2604.10164
  • Authors: Ze Zhao, Yuhui He, Lyuwen Wu, Gu Tang, Bin Lu, Xiaoying Gan, Luoyi Fu, Xinbing Wang, Chenghu Zhou
  • Subjects: cs.AI
  • Tags: Temporal Knowledge Graph, Knowledge Graph
  • Venue: ICLR 2026
  • Code: code
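  • Summary: This paper proposes TransFIR, a framework for inductive reasoning about emerging entities in temporal knowledge graphs. It leverages the historical interaction sequences of semantically similar entities and uses a codebook classifier to map new entities to latent semantic clusters, improving MRR by 28.6% on average.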
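
A rough picture of "mapping a new entity to a latent semantic cluster": represent the unseen entity by averaging the embeddings of similar known entities, then snap that vector to the nearest codebook entry. This is a nearest-centroid stand-in that we assume for illustration; TransFIR uses a learned codebook classifier, and the codebook values below are hypothetical.

```python
import math


def mean_vector(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def assign_emerging_entity(neighbor_embeddings, codebook):
    """Represent an unseen entity by averaging the embeddings of
    semantically similar known entities, then snap it to the nearest
    codebook entry (a stand-in for a learned codebook classifier)."""
    query = mean_vector(neighbor_embeddings)
    return min(range(len(codebook)), key=lambda c: math.dist(query, codebook[c]))


codebook = [[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]]  # hypothetical cluster centroids
print(assign_emerging_entity([[4.0, 6.0], [6.0, 4.0]], codebook))  # 1
```

Once assigned, the new entity can borrow the cluster's learned behavior, which is what lets an inductive model reason about entities it never saw during training.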

[54] MAVEN-T: Multi-Agent enVironment-aware Enhanced Neural Trajectory predictor with Reinforcement Learning

  • arXiv: 2604.10169
  • Authors: Wenchang Duan
  • Subjects: cs.AI; cs.LG
  • Tags: Autonomous Driving, Knowledge Distillation, Reinforcement Learning
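  • Summary: This paper introduces MAVEN-T, a teacher-student framework for trajectory prediction in autonomous driving that uses progressive knowledge distillation and reinforcement learning to achieve 6.2x parameter compression and a 3.7x inference speedup while maintaining state-of-the-art prediction accuracy.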

[55] PoreDiT: A Scalable Generative Model for Large-Scale Digital Rock Reconstruction

  • arXiv: 2604.10171
  • Authors: Yizhuo Huang, Baoquan Sun, Haibo Huang
  • Subjects: cs.AI
  • Tags: 3D Vision, Scientific Computing
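  • Summary: This paper proposes PoreDiT, an efficient generative model for gigavoxel-scale digital rock reconstruction that uses a 3D Swin Transformer to directly predict a binary probability field of the pore space, enabling very large digital rock samples to be generated on consumer-grade hardware.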

[56] Credit-Budgeted ICPC-Style Coding: When Agents Must Pay for Every Decision

  • arXiv: 2604.10182
  • Authors: Lingfeng Zhou, Junhao Shi, Jin Gao, Dequan Wang
  • Subjects: cs.AI
  • Tags: LLM Agent, Code Generation
  • Venue: ICLR 2026
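  • Summary: This paper introduces USACOArena, an ACM-ICPC-style arena built on a strict credit economy for evaluating how autonomous coding agents solve problems cost-consciously under resource constraints. It reveals that current agents struggle to balance accuracy against resource consumption.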

[57] Edu-MMBias: A Three-Tier Multimodal Benchmark for Auditing Social Bias in Vision-Language Models under Educational Contexts

  • arXiv: 2604.10200
  • Authors: Ruijia Li, Mingzi Zhang, Zengyi Yu, Yuang Wei, Bo Jiang
  • Subjects: cs.AI; cs.CV
  • Tags: Vision-Language Model, Bias Mitigation, Fairness
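  • Summary: This paper proposes Edu-MMBias, a multimodal bias-auditing framework grounded in the tripartite (three-component) model of attitudes from social psychology, for diagnosing social bias in vision-language models along cognitive, affective, and behavioral dimensions in educational settings.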

[58] Cognitive Pivot Points and Visual Anchoring: Unveiling and Rectifying Hallucinations in Multimodal Reasoning Models

  • arXiv: 2604.10219
  • Authors: Zhe Qian, Yanbiao Ma, Zhuohan Ouyang, Zhonghua Wang, Zhongxing Xu, Fei Luo, Xinyu Liu, Zongyuan Ge, Yike Guo, Jungong Han
  • Subjects: cs.AI
  • Tags: LLM Hallucination, Vision-Language Model, Multimodal Learning
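  • Summary: This paper identifies a phenomenon in multimodal reasoning models, termed RVTD, in which the reasoning process becomes decoupled from the visual truth, and proposes V-STAR, a training paradigm that uses a hierarchical visual-attention reward to anchor reasoning back to the visual input and thereby mitigate hallucinations.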

[59] SVSR: A Self-Verification and Self-Rectification Paradigm for Multimodal Reasoning

  • arXiv: 2604.10228
  • Authors: Zhe Qian, Nianbing Su, Zhonghua Wang, Hebei Li, Zhongxing Xu, Yueying Li, Fei Luo, Zhuohan Ouyang, Yanbiao Ma
  • Subjects: cs.AI
  • Tags: Vision-Language Model, LLM Reasoning, Multimodal Learning
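  • Summary: This paper proposes SVSR, a framework that explicitly integrates self-verification and self-rectification into multimodal reasoning and improves robustness and reliability through a three-stage training paradigm: preference-data construction, cold-start supervised fine-tuning, and semi-online DPO.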

[60] A Dual-Positive Monotone Parameterization for Multi-Segment Bids and a Validity Assessment Framework for Reinforcement Learning Agent-based Simulation of Electricity Markets

  • arXiv: 2604.10252
  • Authors: Zunnan Xu, Zhaoxia Jing, Zhanhua Pan
  • Subjects: cs.AI; eess.SY
  • Tags: Reinforcement Learning
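  • Summary: This paper proposes a dual-positive monotone parameterization that resolves gradient distortion in multi-segment bid curves in reinforcement-learning agent-based simulations of electricity markets, together with a validity-assessment framework that rigorously measures how far simulation outcomes are from Nash equilibrium.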

[61] The Amazing Agent Race: Strong Tool Users, Weak Navigators

  • arXiv: 2604.10261
  • Authors: Zae Myung Kim, Dongseok Lee, Jaehyung Kim, Vipul Raheja, Dongyeop Kang
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Agent, LLM Evaluation
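  • Summary: This paper introduces the AAR benchmark, consisting of DAG puzzles with fork-merge tool chains, and exposes LLM agents' weakness at navigation: the best agent reaches only 37.2% accuracy, with navigation errors dominating and tool-use errors comparatively rare.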
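
The "fork-merge tool chain" structure is easy to picture as a dependency graph executed in topological order. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to run a hypothetical puzzle in which one input forks into two tool calls whose outputs are merged. In the benchmark the agent must discover this order itself, which is the "navigation" part it struggles with; the tools and node names here are invented for illustration.

```python
from graphlib import TopologicalSorter


def run_tool_dag(dependencies, tools, inputs):
    """Execute a fork-merge tool chain in dependency order.

    dependencies maps each tool node to the set of nodes whose outputs it
    consumes; arguments are passed in sorted-name order (an assumption
    made only to keep the sketch deterministic).
    """
    results = dict(inputs)
    for node in TopologicalSorter(dependencies).static_order():
        if node in results:            # given inputs need no tool call
            continue
        args = [results[dep] for dep in sorted(dependencies.get(node, ()))]
        results[node] = tools[node](*args)
    return results


# Hypothetical puzzle: fork from one city into two lookups, then merge.
dependencies = {
    "weather": {"city"},
    "population": {"city"},
    "answer": {"population", "weather"},
}
tools = {
    "weather": lambda city: f"sunny in {city}",
    "population": lambda city: len(city) * 100_000,
    "answer": lambda population, weather: f"{weather}; population={population}",
}
print(run_tool_dag(dependencies, tools, {"city": "Paris"})["answer"])
# sunny in Paris; population=500000
```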

[62] STARS: Skill-Triggered Audit for Request-Conditioned Invocation Safety in Agent Systems

  • arXiv: 2604.10286
  • Authors: Guijia Zhang, Shu Yang, Xilin Gong, Di Wang
  • Subjects: cs.AI
  • Tags: LLM Agent, LLM Security
  • Code: code
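  • Summary: This paper proposes STARS, a framework for continuous risk assessment and classification of skill invocations in agent systems that combines static capability priors, a request-conditioned invocation-risk model, and a calibrated risk-fusion strategy.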

[63] Dead Cognitions: A Census of Misattributed Insights

  • arXiv: 2604.10288
  • Authors: Aaron Tuor, claude.ai
  • Subjects: cs.AI
  • Tags: AI Ethics, Human-Computer Interaction
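  • Summary: This paper identifies an attribution-laundering failure mode in AI chat systems, in which the model does substantive cognitive work and then credits the resulting insight to the user; at both the individual and societal level, this erodes users' ability to judge their own cognitive contributions accurately.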

[64] AI Organizations are More Effective but Less Aligned than Individual Agents

  • arXiv: 2604.10290
  • Authors: Judy Hanwen Shen, Daniel Zhu, Siddarth Srinivasan, Henry Sleight, Lawrence T. Wagner III, Morgan Jane Matthews, Erik Jones, Jascha Sohl-Dickstein
  • Subjects: cs.AI
  • Tags: Multi-Agent System, LLM Alignment
  • Venue: ICLR 2026 Workshop
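  • Summary: This paper shows experimentally that multi-agent AI organizations are more effective than individual AI agents at achieving business goals but are less aligned. Across 12 tasks, including consulting and software development, AI organizations produce higher utility but also greater misalignment.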

[65] TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

  • arXiv: 2604.10291
  • Authors: Malgorzata Gwiazda, Yifu Cai, Mononito Goswami, Arjun Choudhry, Artur Dubrawski
  • Subjects: cs.AI
  • Tags: LLM Evaluation, Time Series Forecasting, LLM Reasoning
  • Code: code
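  • Summary: This paper proposes a scalable method for creating time-series reasoning benchmarks, combining templates with LLM agents to build the TimeSeriesExam benchmark, which covers five core reasoning categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality. Experiments show that LLM performance on abstract time-series reasoning and domain-specific applications remains limited.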

[66] Gypscie: A Cross-Platform AI Artifact Management System

  • arXiv: 2604.10311
  • Authors: Fabio Porto, Eduardo Ogasawara, Gabriela Moraes Botaro, Julia Neumann Bastos, Augusto Fonseca, Esther Pacitti, Patrick Valduriez
  • Subjects: cs.AI; cs.DB
  • Tags: Knowledge Graph, Knowledge Representation
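  • Summary: This paper presents Gypscie, a cross-platform AI artifact management system that provides a unified view of AI artifacts through a knowledge graph and supports model lifecycle management. It can schedule dataflows across multiple platforms and records provenance information for explainability.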

[67] From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences

  • arXiv: 2604.10332
  • Authors: Hina Afridi, Habib Ullah, Sultan Daud Khan, Mohib Ullah
  • Subjects: cs.AI
  • Tags: LLM Evaluation, Multimodal Learning
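  • Summary: This paper presents a comparative analysis of the GPT family's evolution from GPT-3 to GPT-5, examining technical advances, changing capabilities, shifts in deployment, and persistent limitations. The authors argue that later GPT models should be understood as aligned, multimodal, tool-oriented systems rather than simply larger language models.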

[68] Zero-shot World Models Are Developmentally Efficient Learners

  • arXiv: 2604.10333
  • Authors: Khai Loong Aw, Klemen Kotar, Wanhee Lee, Seungwoo Kim, Khaled Jedoui, Rahul Venkatesh, Lilian Naing Chen, Michael C. Frank, Daniel L.K. Yamins
  • Subjects: cs.AI; cs.CV
  • Tags: Zero-Shot Learning, Cognitive Science, Self-Supervised Learning
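  • Summary: This paper proposes the Zero-shot visual World Model (ZWM), built on three principles (a sparse temporally factored predictor, zero-shot estimation, and composition of inferences) and trained on first-person data from a single child. ZWM quickly develops capabilities across multiple physical-understanding benchmarks and reproduces behavioral signatures of child development.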

[69] VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline

  • arXiv: 2604.10341
  • Authors: Xuan Liu, Dheeraj Kodakandla, Kushagra Srivastva, Mahfuza Farooque
  • Subjects: cs.AI
  • Tags: Neurosymbolic AI, Formal Methods, Program Synthesis
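  • Summary: VeriTrans is a reliability-first ML system that compiles natural-language requirements into solver-ready logic using an instruction-tuned translator, round-trip reconstruction, and canonical compilation. It achieves 94.46% SAT/UNSAT correctness on SatBench and supports auditable, reproducible workflows.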

[70] ClawVM: Harness-Managed Virtual Memory for Stateful Tool-Using LLM Agents

  • arXiv: 2604.10352
  • Authors: Mofasshara Rafique, Laurent Bindschaedler
  • Subjects: cs.AI; cs.OS; cs.SE
  • Tags: LLM Agent, Memory Architecture
  • Venue: EuroSys 2026 Workshop
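  • Summary: ClawVM is a virtual-memory layer that manages state for stateful, tool-using LLM agents. Using typed pages, minimum-fidelity invariants, and validated write-back at lifecycle boundaries, it eliminates policy-controllable failures.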

[71] Beyond Monologue: Interactive Talking-Listening Avatar Generation with Conversational Audio Context-Aware Kernels

  • arXiv: 2604.10367
  • Authors: Yuzhe Weng, Haotian Wang, Xinyi Yu, Xiaoyan Wu, Haoran Xu, Shan He, Jun Du
  • Subjects: cs.AI; cs.SD
  • Tags: Audio Generation, Video Generation, Multimodal Learning
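  • Summary: This paper proposes a multi-head Gaussian kernel method for full-duplex interactive avatar generation that processes dual-stream speaking and listening audio inputs simultaneously. The method addresses the mismatch in time scales between speaking and listening behaviors and achieves state-of-the-art performance in generating naturally interacting digital humans.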

[72] TrajOnco: a multi-agent framework for temporal reasoning over longitudinal EHR for multi-cancer early detection

  • arXiv: 2604.10386
  • Authors: Sihang Zeng, Young Won Kim, Wilson Lau, Ehsan Alipour, Ruth Etzioni, Meliha Yetisgen, Anand Oka
  • Subjects: cs.AI; cs.MA
  • Tags: Medical AI, Multi-Agent System, LLM Reasoning
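  • Summary: TrajOnco is a training-free multi-agent LLM framework for multi-cancer early detection that performs temporal reasoning over longitudinal EHR data to produce patient-level summaries and risk scores. In zero-shot evaluation across 15 cancer types it achieves AUROC of 0.64-0.80, comparable to supervised learning.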

[73] CWCD: Category-Wise Contrastive Decoding for Structured Medical Report Generation

  • arXiv: 2604.10410
  • Authors: Shantam Srivastava, Mahesh Bhosale, David Doermann, Mingchen Gao
  • Subjects: cs.AI
  • Tags: Medical AI, Vision-Language Model, Text Generation
  • Venue: MIDL 2026
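  • Summary: This paper proposes Category-Wise Contrastive Decoding (CWCD), a framework for structured radiology report generation that improves generation quality through category-specific parameterization and by contrasting the original X-ray with a masked X-ray. The method consistently outperforms baselines on clinical-efficacy and natural-language-generation metrics.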

[74] Safety Guarantees in Zero-Shot Reinforcement Learning for Cascade Dynamical Systems

  • arXiv: 2604.10429
  • Authors: Shima Rabiei, Sandipan Mishra, Santiago Paternain
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Autonomous Driving, Zero-Shot Learning
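  • Summary: This paper proposes training safe RL policies on a reduced-order cascade dynamical system and deploying them on the full-order system together with a low-level controller. The authors provide theoretical bounds on the probability of safety and validate the approach on a quadrotor navigation task.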

[75] VeriSim: A Configurable Framework for Evaluating Medical AI Under Realistic Patient Noise

  • arXiv: 2604.10441
  • Authors: Sina Mansouri, Mohit Marvania, Vibhavari Ashok Shihorkar, Han Ngoc Tran, Kazhal Shafiei, Mehrdad Fazli, Yikuan Li, Ziwei Zhu
  • Subjects: cs.AI
  • Tags: Medical AI, LLM Evaluation, Sim-to-Real
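  • Summary: VeriSim is a patient-simulation framework that uses a hybrid UMLS-LLM verification mechanism to inject controllable clinical noise while preserving medical ground truth. Experiments show that all tested LLMs degrade significantly under realistic patient noise, with diagnostic accuracy dropping by 15-25%.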

[76] PEMANT: Persona-Enriched Multi-Agent Negotiation for Travel

  • arXiv: 2604.10475
  • Authors: Yuran Sun, Mustafa Sameen, Yaotian Zhang, Chia-yu Wu, Xilei Zhao
  • Subjects: cs.AI
  • Tags: Multi-Agent System, LLM Agent, Social Simulation
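  • Summary: PEMANT is an LLM-based household-level travel generation framework that integrates behavioral theory for personalized modeling and negotiates household travel plans through structured multi-agent dialogue. It consistently outperforms state-of-the-art baselines on national and regional household travel survey datasets.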

[77] Tracing the Roots: A Multi-Agent Framework for Uncovering Data Lineage in Post-Training LLMs

  • arXiv: 2604.10480
  • Authors: Yu Li, Xiaoran Shang, Qizhi Pei, Yun Zhu, Xin Gao, Honglin Lin, Zhanping Zhong, Zhuoshi Pan, Zheng Liu, Xiaoyang Wang, Conghui He, Dahua Lin, Feng Zhao, Lijun Wu
  • Subjects: cs.AI
  • Tags: LLM Training, Data Selection, Knowledge Graph
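  • Summary: This paper brings the concept of data lineage into the LLM ecosystem and proposes a multi-agent framework that reconstructs the evolution graph of dataset development. The analysis reveals domain-specific structural patterns as well as systemic problems such as structural redundancy and the propagation of benchmark contamination.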

[78] CHAIRO: Contextual Hierarchical Analogical Induction and Reasoning Optimization for LLMs

  • arXiv: 2604.10502
  • Authors: Haotian Lu, Yuchen Mou, Bingzhe Wu
  • Subjects: cs.AI
  • Tags: LLM Reasoning, RAG, Content Moderation
  • Venue: ACL 2026
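  • Summary: CHAIRO is a content-moderation framework that uses analogical examples to strengthen rule induction and decision reliability, adapting dynamically through end-to-end optimization. It significantly outperforms rule-injection fine-tuning and multi-stage RAG pipelines in moderation accuracy and rule quality.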

[79] CARO: Chain-of-Analogy Reasoning Optimization for Robust Content Moderation

  • arXiv: 2604.10504
  • Authors: Bingzhe Wu, Haotian Lu, Yuchen Mou
  • Subjects: cs.AI
  • Tags: LLM Reasoning, RAG, Content Moderation
  • Venue: ACL 2026 Findings
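  • Summary: CARO is a two-stage training framework: it first uses RAG to guide analogical reasoning chains for supervised fine-tuning, then reinforces analogical reasoning behavior with Direct Preference Optimization. On ambiguous moderation benchmarks it improves average F1 by 24.9% over state-of-the-art models.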

[80] Cooperation in Human and Machine Agents: Promise Theory Considerations

  • arXiv: 2604.10505
  • Authors: M. Burgess
  • Subjects: cs.AI; cs.MA
  • Tags: Multi-Agent System, LLM Agent, AI Safety
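  • Summary: This paper applies Promise Theory to analyze cooperation in systems of human and machine agents, examining signaling, understanding, trust, risk, and feedback between agents. The work revisits established principles of agent cooperation for AI agents and their interactions with humans.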

[81] A Progressive Training Strategy for Vision-Language Models to Counteract Spatio-Temporal Hallucinations in Embodied Reasoning

  • arXiv: 2604.10506
  • Authors: Xiaoda Yang, Shuai Yang, Can Wang, Jingyang Xue, Menglan Tang, Checheng Yu, Xunzhe Zhou, Sashuai Zhou, Tao Jin, Lixin Yang, Xiangyu Yue, Zhou Zhao
  • Subjects: cs.AI
  • Tags: Vision-Language Model, LLM Hallucination, Embodied AI
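  • Summary: This paper addresses hallucinations of vision-language models in spatio-temporal reasoning and proposes a progressive training framework that builds a chain-of-thought dataset and combines supervised pretraining with weakly labeled fine-tuning, substantially narrowing the model's performance gap between forward and backward temporal queries.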

[82] Beyond Compliance: A Resistance-Informed Motivation Reasoning Framework for Challenging Psychological Client Simulation

  • arXiv: 2604.10507
  • Authors: Danni Liu, Bo Liu, Yuxin Hu, Hantao Zhao, Yan Liu, Ding Ding, Jiahui Jin, Jiuxin Cao
  • Subjects: cs.AI; cs.HC
  • Tags: Dialogue System, LLM Reasoning, Medical AI
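  • Summary: This paper proposes ResistClient, a framework grounded in client-resistance theory that systematically models challenging client behavior. A two-stage training framework (supervised fine-tuning followed by process-supervised reinforcement learning) addresses the excessive compliance of existing psychological client simulators.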

[83] Thinking Fast, Thinking Wrong: Intuitiveness Modulates LLM Counterfactual Reasoning in Policy Evaluation

  • arXiv: 2604.10511
  • Authors: Yanjie He
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Reasoning, LLM Evaluation
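  • Summary: This paper builds a benchmark of 40 policy-evaluation cases and finds that chain-of-thought prompting markedly improves performance on intuitive cases, but the benefit nearly vanishes on counterintuitive cases. This suggests that LLM "slow thinking" may be reasoning in form only rather than substantive deep thinking.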

[84] Agent Mentor: Framing Agent Knowledge through Semantic Trajectory Analysis

  • arXiv: 2604.10513
  • Authors: Roi Ben-Gigi, Yuval David, Fabiana Fournier, Lior Limonad, Dany Moshkovich, Hadar Mulian, Segev Shlomov
  • Subjects: cs.AI
  • Tags: LLM Agent, Prompt Engineering
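  • Summary: This paper proposes Agent Mentor, an analysis pipeline that monitors and incrementally adjusts an agent's system prompt: it identifies semantic features associated with undesirable behavior and injects corrective instructions, improving agent accuracy across a range of configurations.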

[85] From Perception to Planning: Evolving Ego-Centric Task-Oriented Spatiotemporal Reasoning via Curriculum Learning

  • arXiv: 2604.10517
  • Authors: Xiaoda Yang, Yuxiang Liu, Shenzhou Gao, Can Wang, Jingyang Xue, Lixin Yang, Yao Mu, Tao Jin, Shuicheng Yan, Zhimeng Zhang, Zhou Zhao
  • Subjects: cs.AI
  • Tags: Embodied AI, Curriculum Learning, Vision-Language Model
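  • Summary: This paper proposes EgoTSR, a curriculum-learning framework for ego-centric, task-oriented spatio-temporal reasoning. It builds a three-stage dataset of 46 million samples, effectively eliminates temporal bias, and reaches 92.4% accuracy on long-horizon logical reasoning tasks.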

[86] Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?

  • arXiv: 2604.10547
  • Authors: Wanyi Chen, Xiao Yang, Xu Yang, Tianming Sha, Qizheng Li, Zhuo Wang, Bowen Xian, Fang Kong, Weiqing Liu, Jiang Bian
  • Subjects: cs.AI
  • Tags: LLM Agent, Reinforcement Learning, LLM Evaluation
  • Code: code
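  • Summary: This paper proposes Agent^2 RL-Bench, a benchmark that evaluates whether LLM agents can autonomously design, implement, and run complete reinforcement-learning pipelines. It finds that agents achieve substantial interactive gains on some tasks but make limited progress on others.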

[87] Failure Ontology: A Lifelong Learning Framework for Blind Spot Detection and Resilience Design

  • arXiv: 2604.10549
  • Authors: Yuan Sun, Hong Yi, Jinyuan Liu
  • Subjects: cs.AI
  • Tags: Continual Learning, Cognitive Science
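  • Summary: This paper proposes the Failure Ontology framework for detecting, classifying, and repairing "ontological blind spots", i.e., conceptual domains systematically missing from an individual's cognitive map. It introduces a four-category blind-spot taxonomy and five convergent failure modes.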

[88] Working Paper: Towards Schema-based Learning from a Category-Theoretic Perspective

  • arXiv: 2604.10589
  • Authors: Pablo de los Riscos, Fernando J. Corbacho, Michael A. Arbib
  • Subjects: cs.AI
  • Tags: Deep Learning Theory, Knowledge Representation
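  • Summary: This paper proposes, from a category-theoretic perspective, a hierarchical framework for schema-based learning spanning four interconnected levels (schema, implementation, semantic, and agent) and formally defines memory subsystems and cognitive modules.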

[89] Enhancing Cross-Problem Vehicle Routing via Federated Learning

  • arXiv: 2604.10652
  • Authors: Xiangchi Meng, Jianan Zhou, Jie Gao, Yifan Lu, Yaoxin Wu, Gonglin Yuan, Yaqing Hou
  • Subjects: cs.AI; cs.LG
  • Tags: Federated Learning, Optimization, Transfer Learning
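  • Summary: This paper proposes a federated-learning framework of "multi-problem pretraining, single-problem fine-tuning" to strengthen neural combinatorial optimization for cross-problem vehicle routing. Knowledge shared through the global model improves the generalization of local models under heterogeneous, complex constraints.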

[90] Governed Reasoning for Institutional AI

  • arXiv: 2604.10658
  • Authors: Mamadou Seck
  • Subjects: cs.AI; cs.CY; cs.MA
  • Tags: LLM Agent, AI Safety, Decision Making
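  • Summary: This paper proposes Cognitive Core, a governed decision-making framework with nine typed cognitive primitives and a four-tier governance model. On prior-authorization appeal evaluation it reaches 91% accuracy with zero silent errors, addressing accountability in institutional AI decision-making.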

[91] Preference-Agile Multi-Objective Optimization for Real-time Vehicle Dispatching

  • arXiv: 2604.10664
  • Authors: Jiahuan Jin, Wenhao Zhao, Rong Qu, Jianfeng Ren, Xinan Chen, Qingfu Zhang, Ruibin Bai
  • Subjects: cs.AI
  • Tags: Optimization, Reinforcement Learning, Decision Making
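  • Summary: This paper proposes a preference-agile multi-objective optimization method that lets users adjust objective preferences dynamically. A deep reinforcement learning framework with a calibration function achieves high-quality alignment between preference vectors and decision policies, yielding superior performance on container-terminal vehicle dispatching.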

[92] Principles Do Not Apply Themselves: A Hermeneutic Perspective on AI Alignment

  • arXiv: 2604.10673
  • Authors: Behrooz Razeghi
  • Subjects: cs.AI; cs.HC
  • Tags: LLM Alignment, AI Ethics
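  • Summary: This paper analyzes AI alignment from a hermeneutic perspective. It argues that applying principles requires context-sensitive judgment and that alignment therefore has an interpretive component, and it distinguishes deployment-induced from corpus-induced evaluation.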

[93] FedRio: Personalized Federated Social Bot Detection via Cooperative Reinforced Contrastive Adversarial Distillation

  • arXiv: 2604.10678
  • Authors: Yingguang Yang, Hao Liu, Xin Zhang, Yunhui Liu, Yutong Xia, Qi Wu, Hao Peng, Taoran Liang, Bin Chong, Tieke He, Philip S. Yu
  • Subjects: cs.AI; cs.LG
  • Tags: Federated Learning, Graph Neural Network, Cybersecurity
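  • Summary: This paper proposes FedRio, a framework for cross-platform social-bot detection via federated learning. It combines a GAN-based knowledge-extraction mechanism with a multi-stage adversarial contrastive learning strategy to improve detection performance while preserving privacy.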

[94] Do LLMs Build Spatial World Models? Evidence from Grid-World Maze Tasks

  • arXiv: 2604.10690
  • Authors: Weijiang Li, Yilin Zhu, Rajarshi Das, Parijat Dube
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Evaluation
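  • Summary: This paper systematically evaluates LLM spatial understanding using grid-maze tasks and finds that the models' spatial reasoning depends on both the representation and the prompt: they do not build robust spatial world models and succeed only under narrow conditions.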

[95] FACT-E: Causality-Inspired Evaluation for Trustworthy Chain-of-Thought Reasoning

  • arXiv: 2604.10693
  • Authors: Yuxi Sun, Aoqi Zuo, Haotian Xie, Wei Gao, Mingming Gong, Jing Ma
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Evaluation
  • Venue: ACL 2026
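  • Summary: This paper proposes FACT-E, a framework that uses controlled perturbations as instrumental signals to assess the faithfulness of chain-of-thought reasoning. By jointly considering intra-chain faithfulness and CoT-to-answer consistency, it provides robust evaluation metrics for trustworthy LLM reasoning.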

[96] Camyla: Scaling Autonomous Research in Medical Image Segmentation

  • arXiv: 2604.10696
  • Authors: Yifan Gao, Haoyue Li, Feng Yuan, Xin Gao, Weiran Huang, Xiaosong Wang
  • Subjects: cs.AI; cs.CV
  • Tags: Medical AI, Image Segmentation, LLM Agent
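  • Summary: This paper proposes Camyla, a system for fully autonomous research in medical image segmentation that turns raw data into research proposals, executable experiments, and complete papers, outperforming the strongest baselines on 31 datasets.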

[97] SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

  • arXiv: 2604.10718
  • Authors: Udari Madhushani Sehwag, Elaine Lau, Haniyeh Ehsani Oskouie, Shayan Shabihi, Erich Liang, Andrea Toledo, Guillermo Mangialardi, Sergio Fonrouge, Ed-Yeremai Hernandez Cardona, Paula Vergara, Utkarsh Tyagi, Chen Bo Calvin Zhang, Pavi Bhatter, Nicholas Johnson, Furong Huang, Ernesto Gabriel Hernandez Montoya, Bing Liu
  • Subjects: cs.AI
  • Tags: LLM Evaluation, Scientific Reasoning, LLM Reasoning
  • Code: code
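  • Summary: This paper introduces SciPredict, a benchmark that evaluates how well large language models predict the outcomes of scientific experiments in physics, biology, and chemistry. Models reach only 14-26% accuracy (human experts reach about 20%), and they cannot effectively distinguish reliable from unreliable predictions.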

[98] Teaching Language Models How to Code Like Learners: Conversational Serialization for Student Simulation

  • arXiv: 2604.10720
  • Authors: Charles Koutcheme, Arto Hellas, Juho Leinonen
  • Subjects: cs.AI; cs.CL; cs.CY
  • Tags: Code Generation, Knowledge Distillation, Education Technology
  • Venue: Educational Data Mining 2026
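  • Summary: This paper proposes a method for training open-source simulators of programming learners: students' programming-process logs are serialized into a conversational format, and supervised fine-tuning is combined with preference optimization to align the model with real students' debugging behavior. Experiments show it outperforms existing methods in functional alignment and code similarity.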

[99] When More Thinking Hurts: Overthinking in LLM Test-Time Compute Scaling

  • arXiv: 2604.10739
  • Authors: Shu Zhou, Rui Ling, Junan Chen, Xin Wang, Tao Fan, Hao Wang
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Inference
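  • Summary: This paper systematically studies diminishing marginal returns in LLM test-time compute scaling and identifies an "overthinking" problem, in which extended reasoning can lead a model to abandon a previously correct answer. The results indicate that a moderate compute budget can preserve accuracy while substantially reducing compute cost.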

[100] Learning Preference-Based Objectives from Clinical Narratives for Sequential Treatment Decision-Making

  • arXiv: 2604.10783
  • Authors: Daniel J. Tan, Kay Choong See, Mengling Feng
  • Subjects: cs.AI; cs.LG
  • Tags: Reinforcement Learning, Medical AI, Decision Making
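  • Summary: This paper proposes CN-PR, a framework that uses large language models to learn reinforcement-learning reward functions from clinical narratives, constructing pairwise preferences from trajectory quality scores. The learned reward correlates strongly with trajectory quality and can guide better clinical treatment decisions.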

[101] TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training

  • arXiv: 2604.10784
  • Authors: Yinyi Luo, Wenwen Wang, Hayes Bai, Hongyu Zhu, Hao Chen, Pan He, Marios Savvides, Sharon Li, Jindong Wang
  • Subjects: cs.AI
  • Tags: Multimodal Learning, Vision-Language Model, LLM Evaluation
  • Code: code
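  • Summary: This paper presents TorchUMM, the first unified codebase for multimodal models, supporting evaluation, analysis, and post-training across a variety of model architectures. It covers three core task types (understanding, generation, and editing) and provides standardized evaluation protocols to enable fair comparison.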

[102] CheeseBench: Evaluating Large Language Models on Rodent Behavioral Neuroscience Paradigms

  • arXiv: 2604.10825
  • Authors: Zacharie Bugaud
  • Subjects: cs.AI
  • Tags: LLM Evaluation, LLM Agent, Cognitive Science
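  • Summary: This paper introduces CheeseBench, a benchmark that evaluates LLMs on nine classic behavioral-neuroscience paradigms. The best model achieves only a 52.6% success rate, far below the 78.9% rodent baseline, and chain-of-thought prompting actually lowers performance.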

[103] Your Model Diversity, Not Method, Determines Reasoning Strategy

  • arXiv: 2604.10827
  • Authors: Moulik Choraria, Argyrios Gerogiannis, Anirban Das, Supriyo Chakraborty, Berkcan Kapusuzoglu, Chia-Hsuan Lee, Kartik Balasubramaniam, Shi-Xiong Zhang, Sambit Sahu
  • Subjects: cs.AI
  • Tags: LLM Reasoning, LLM Inference
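  • Summary: This paper argues that the optimal reasoning strategy depends on a model's diversity profile, i.e., how its probability mass is distributed across solutions. Theory and experiments show that deep-refinement strategies work better for low-diversity aligned models, whereas high-diversity base models call for different approaches.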
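The summary describes diversity as how a model's probability mass is spread across solutions. One simple, hypothetical way to estimate that spread is the Shannon entropy of final answers over repeated samples for the same prompt. The paper's own diversity measure may differ, and `answer_diversity` is an illustrative name, not an API from the paper.

```python
import math
from collections import Counter


def answer_diversity(samples):
    """Shannon entropy (bits) of the empirical distribution of final answers.

    Hypothetical diversity proxy: `samples` are final answers from repeated
    sampling on one prompt. 0 means every sample agrees; larger values mean
    probability mass is spread over more distinct answers.
    """
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A value near 0 means samples collapse onto one answer (the low-diversity regime the summary associates with aligned models); higher values indicate mass spread across several answers.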

[104] A Benchmark for Gap and Overlap Analysis as a Test of KG Task Readiness

  • arXiv: 2604.10853
  • Authors: Maruf Ahmed Mridul, Rohit Kapa, Oshani Seneviratne
  • Subjects: cs.AI
  • Tags: Knowledge Graph, LLM Evaluation, Information Extraction
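  • Summary: This paper proposes a benchmark for assessing knowledge-graph task readiness, focused on gap and overlap analysis of insurance policy documents. The benchmark contains ten life-insurance contracts, a domain ontology, and structured scenarios, and shows that ontology-driven approaches are more consistent than text-only LLM approaches.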

[105] Beyond Statistical Co-occurrence: Unlocking Intrinsic Semantics for Tabular Data Clustering

  • arXiv: 2604.10865
  • Authors: Mingjie Zhao, Yunfan Zhang, Yiqun Zhang, Yiu-ming Cheung
  • Subjects: cs.AI
  • Tags: Tabular Learning, Representation Learning
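  • Summary: This paper proposes TagCC, a framework that uses large language models to extract semantic knowledge from feature names and values and, via contrastive learning, anchors statistical tabular representations to open-world textual concepts, substantially improving tabular data clustering.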

[106] A Quantitative Definition of Intelligence

  • arXiv: 2604.10873
  • Authors: Kang-Sin Choi
  • Subjects: cs.AI
  • Tags: Deep Learning Theory, Cognitive Science
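  • Summary: This paper proposes an operational, quantitative definition of intelligence: intelligence density equals the logarithm of the number of independent outputs divided by the total description length. The framework places intelligence on a continuum from logic gates to brains and, through an independence condition, addresses Putnam's pancomputationalism argument and Searle's Chinese Room argument.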
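Written as a formula, the definition in the summary reads as below, where $N_{\mathrm{ind}}$ is the number of independent outputs and $L$ the total description length. The symbol names, the base-2 logarithm, and measuring $L$ in bits are assumptions for illustration; the numbers are a worked arithmetic example, not results from the paper.

```latex
\rho = \frac{\log_2 N_{\mathrm{ind}}}{L},
\qquad
N_{\mathrm{ind}} = 1024,\; L = 100\ \text{bits}
\;\Rightarrow\;
\rho = \frac{\log_2 1024}{100} = \frac{10}{100} = 0.1
```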

[107] ZoomR: Memory Efficient Reasoning through Multi-Granularity Key Value Retrieval

  • arXiv: 2604.10898
  • Authors: David H. Yang, Yuxuan Zhu, Mohammad Mohammadi Amiri, Keerthiram Murugesan, Tejaswini Pedapati, Subhajit Chaudhury, Pin-Yu Chen
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Inference, LLM Reasoning, Memory Architecture
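  • Summary: This paper proposes ZoomR, which adaptively compresses reasoning thoughts and uses a dynamic KV-cache selection strategy to make LLM reasoning memory-efficient. On math and reasoning tasks it reduces inference memory by more than 4x while maintaining competitive performance.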

[108] CASK: Core-Aware Selective KV Compression for Reasoning Traces

  • arXiv: 2604.10900
  • Authors: Buseong Kim, Heejun Gwon
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Inference, LLM Reasoning, Memory Architecture
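  • Summary: This paper proposes CASK, which partitions the KV cache of a reasoning trace into a protected core region and a mergeable scratch region and applies selective merging for behavior-preserving structured compression. Experiments show higher fidelity than existing methods under the same budget.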
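A minimal sketch of the core/scratch idea in the summary, assuming per-token importance scores are already available (for example, accumulated attention). The top-scoring tokens are kept verbatim as the protected core, and the remaining scratch tokens are merged by averaging consecutive groups. `compress_kv`, the score input, and the averaging rule are hypothetical; CASK's actual selection and merge criteria are not described here.

```python
def compress_kv(keys, values, scores, core_budget, merge_group=2):
    """Hypothetical core/scratch KV compression.

    keys, values: lists of equal-length float vectors, one per cached token.
    scores: assumed per-token importance (e.g. accumulated attention).
    The `core_budget` highest-scoring tokens are kept unchanged (protected core);
    the rest (scratch) are merged in sequence order, `merge_group` at a time,
    by averaging their key and value vectors.
    """
    n = len(keys)
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    core = set(ranked[:core_budget])
    scratch = [i for i in range(n) if i not in core]

    def mean(vectors):
        # Element-wise average of a list of vectors
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    entries = [(i, keys[i], values[i]) for i in core]  # (position, key, value)
    for start in range(0, len(scratch), merge_group):
        group = scratch[start:start + merge_group]
        # A merged entry takes the position of its first token
        entries.append((group[0],
                        mean([keys[i] for i in group]),
                        mean([values[i] for i in group])))
    entries.sort(key=lambda e: e[0])  # restore sequence order
    return [k for _, k, _ in entries], [v for _, _, v in entries]
```

Keeping the core untouched while only lossy-merging the scratch region is what makes the compression "selective": the budget spent on merging never degrades the entries judged most important.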

[109] Reasoning as Data: Representation-Computation Unity and Its Implementation in a Domain-Algebraic Inference Engine

  • arXiv: 2604.10908
  • Authors: Chao Li, Yuru Wang
  • Subjects: cs.AI
  • Tags: Knowledge Representation, Neurosymbolic AI
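  • Summary: This paper introduces the concept of Representation-Computation Unity (RCU), which eliminates the separation between storage and computation in knowledge systems. The authors implement a symbolic reasoning engine that performs automatic domain-scoped inference via domain-constrained quadruples, validated on ICD-11 classification and CBT clinical-reasoning cases.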

[110] EvoNash-MARL: A Closed-Loop Multi-Agent Reinforcement Learning Framework for Medium-Horizon Equity Allocation

  • arXiv: 2604.10911
  • Authors: Chongliu Jia, Yi Luo, Sipeng Han, Pengwei Li, Jie Ding, Youshuang Hu, Yimiao Qian, Qiya Wang
  • Subjects: cs.AI; cs.LG
  • Tags: Multi-Agent System, Reinforcement Learning, Quantitative Finance
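  • Summary: This paper proposes EvoNash-MARL, a framework that integrates reinforcement learning, multi-agent policy populations, PSRO-style aggregation, and evolutionary replacement into a unified walk-forward validation loop for medium-horizon equity allocation. It achieves a 19.6% annualized return in out-of-sample testing.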

[111] CSPO: Alleviating Reward Ambiguity for Structured Table-to-LaTeX Generation

  • arXiv: 2604.10918
  • Authors: Yunfan Yang, Cuiling Lan, Jitao Sang, Yan Lu
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Vision-Language Model, Document Understanding
  • Venue: ACL 2026
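  • Summary: This paper proposes Component-Specific Policy Optimization (CSPO), a framework for RL training in table-to-LaTeX generation. It decomposes optimization into structure, style, and content components and applies component-specific rewards for targeted optimization, significantly improving generation quality.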

[112] RAG-KT: Cross-platform Explainable Knowledge Tracing with Multi-view Fusion Retrieval Generation

  • arXiv: 2604.10960
  • Authors: Zhiyi Duan, Hongyu Yuan, Rui Liu
  • Subjects: cs.AI
  • Tags: RAG, Knowledge Tracing, Education Technology
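  • Summary: This paper proposes RAG-KT, a framework that formulates cross-platform knowledge tracing as retrieval-augmented LLM reasoning. Multi-view fusion retrieval builds a unified, multi-source structured context, enabling accurate prediction and explainable diagnosis across heterogeneous educational platforms.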

[113] Delving Aleatoric Uncertainty in Medical Image Segmentation via Vision Foundation Models

  • arXiv: 2604.10963
  • Authors: Ruiyang Li, Fang Liu, Licheng Jiao, Xinglin Xie, Jiayao Hao, Shuo Li, Xu Liu, Jingyi Yang, Lingling Li, Puhua Chen, Wenping Ma
  • Subjects: cs.AI
  • Tags: Image Segmentation, Medical AI, Uncertainty Estimation
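  • Summary: This paper proposes using the general-purpose representations of vision foundation models to estimate aleatoric (inherent data) uncertainty in medical image segmentation. The authors design two uncertainty-driven strategies, an uncertainty-aware data-filtering mechanism and a dynamic uncertainty-aware optimization strategy, and report significant gains on multiple medical imaging datasets.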

[114] CFMS: A Coarse-to-Fine Multimodal Synthesis Framework for Enhanced Tabular Reasoning

  • arXiv: 2604.10973
  • Authors: Qixian Huang, Hongqiang Lin, Tong Fu, Yingsen Wang, Zhenghui Fu, Qirui Wang, Yiding Sun, Dongxu Zhang
  • Subjects: cs.AI; cs.CL
  • Tags: Multimodal Learning, Question Answering, Vision-Language Model
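  • Summary: This paper proposes CFMS, a coarse-to-fine multimodal synthesis framework for tabular reasoning. It uses multimodal large language models to generate multi-perspective knowledge tuples as a reasoning map that guides a symbolic engine to perform efficient table operations, achieving competitive accuracy on the WikiTQ and TabFact benchmarks.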

[115] ATANT v1.1: Positioning Continuity Evaluation Against Memory, Long-Context, and Agentic-Memory Benchmarks

  • arXiv: 2604.10981
  • Authors: Samuel Sameer Tanguturi
  • Subjects: cs.AI; cs.IR
  • Tags: LLM Evaluation, Memory Architecture, Long Context
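  • Summary: This companion paper to ATANT v1.0 analyzes how ATANT's continuity evaluation relates to existing memory benchmarks (LOCOMO, LongMemEval, and others). It shows that none of the existing benchmarks adequately measures the continuity property defined by ATANT and identifies methodological shortcomings in each.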

[116] Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models

  • arXiv: 2604.10985
  • Authors: Sameera Horawalavithana, Lauren Phillips, Ian Stewart, Sai Munikoti, Karl Pazdernik
  • Subjects: cs.AI; cs.CL; cs.CV
  • Tags: Vision-Language Model, Multimodal Learning, Transfer Learning
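  • Summary: This study systematically investigates how changes in the pretrained LLM backbone affect downstream VLM task performance. Newer LLM backbones do not always yield better VLMs, and the effect depends on the downstream task; on visual question answering, newer backbones tend to solve different questions rather than more of them.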

[117] WebForge: Breaking the Realism-Reproducibility-Scalability Trilemma in Browser Agent Benchmark

  • arXiv: 2604.10988
  • Authors: Peng Yuan, Yuyang Yin, Yuxuan Cai, Zheng Wei
  • Subjects: cs.AI; cs.CV
  • Tags: LLM Agent, LLM Evaluation, GUI Automation
  • Code: code
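  • Summary: This paper proposes WebForge, the first fully automated framework to resolve the realism-reproducibility-scalability trilemma facing browser-agent benchmarks, using a four-agent pipeline. With seven-dimensional difficulty control, it builds a 934-task benchmark that systematically characterizes model capabilities.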

[118] MAFIG: Multi-agent Driven Formal Instruction Generation Framework

  • arXiv: 2604.10989
  • Authors: Shixing Zhao, Zheng Si, Pengpeng Ouyang, Zhengqing Hu, Wanqi Zhu, Dong Chen, Yibo Guo, Mingliang Xu
  • Subjects: cs.AI
  • Tags: Multi-Agent System, LLM Agent, Knowledge Distillation
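  • Summary: This paper proposes MAFIG, a multi-agent-driven formal instruction generation framework for handling emergencies in scheduling systems. It consists of a perception agent and an emergency decision agent, and introduces a local distillation mechanism driven by a span-focused loss that transfers the decision-making ability of a large cloud model to a lightweight local model.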

[119] Sanity Checks for Agentic Data Science

  • arXiv: 2604.11003
  • Authors: Zachary T. Rewolinski, Austin V. Zane, Hao Huang, Chandan Singh, Chenglong Wang, Jianfeng Gao, Bin Yu
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Agent, LLM Evaluation, Data Synthesis
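  • Summary: This paper proposes lightweight sanity checks grounded in the PCS framework for assessing the trustworthiness of outputs from agentic data science (ADS) pipelines. The checks use perturbations to test whether an agent reliably separates signal from noise, and reveal a miscalibration between ADS systems' self-reported confidence and their empirical stability.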

[120] Diffusion-CAM: Faithful Visual Explanations for dMLLMs

  • arXiv: 2604.11005
  • Authors: Haomin Zuo, Yidi Li, Luoxiao Yang, Xiaofeng Zhang
  • Subjects: cs.AI
  • Tags: Interpretability, Diffusion Model, Vision-Language Model
  • Venue: ACL 2026
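  • Summary: This paper proposes Diffusion-CAM, the first interpretability method designed specifically for diffusion multimodal large language models (dMLLMs). It differentiably probes intermediate representations to capture latent features and class-specific gradients, addressing the challenge that parallel denoising in diffusion architectures smooths the distribution of activation patterns.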

[121] Min-$k$ Sampling: Decoupling Truncation from Temperature Scaling via Relative Logit Dynamics

  • arXiv: 2604.11012
  • Authors: Yuanhao Ding, Meimingwei Li, Esteban Garces Arias, Matthias Aßenmacher, Christian Heumann, Chongsheng Zhang
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Inference, Text Generation
  • Venue: ACL 2026
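  • Summary: This paper proposes Min-k sampling, a dynamic truncation method that identifies "semantic cliffs" by analyzing the local shape of the sorted logit distribution. The method is temperature-invariant and yields better text quality on several reasoning benchmarks and creative-writing tasks, remaining robust even at extreme temperature settings.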
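The temperature-invariance claim can be illustrated with a toy truncation rule: sort the logits, find the largest gap between neighbours (a stand-in for a "semantic cliff"), and keep only the tokens above it. Dividing logits by a temperature T scales every gap by the same factor, so the cut position does not move. This is a hypothetical rule written for illustration, not the Min-k criterion itself; `cliff_truncate` and `truncated_softmax` are illustrative names.

```python
import math


def cliff_truncate(logits, min_keep=1):
    """Return indices of tokens above the largest gap in the sorted logits.

    Hypothetical rule: scaling all logits by 1/T scales all gaps equally,
    so the arg-max gap (and hence the kept set) is independent of T.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    vals = [logits[i] for i in order]
    # gaps[j] is the drop between the j-th and (j+1)-th largest logits
    gaps = [vals[j] - vals[j + 1] for j in range(len(vals) - 1)]
    if len(gaps) < min_keep:
        return order
    # Only cut at positions that keep at least min_keep tokens
    cut = max(range(min_keep - 1, len(gaps)), key=lambda j: gaps[j])
    return order[: cut + 1]


def truncated_softmax(logits, temperature):
    """Truncate first (temperature-free), then apply temperature to survivors."""
    kept = cliff_truncate(logits)
    scaled = [logits[i] / temperature for i in kept]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return {i: e / z for i, e in zip(kept, exps)}
```

Temperature then only reshapes probabilities within the kept set, which is the decoupling of truncation from temperature scaling that the paper's title refers to.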

[122] Introspective Diffusion Language Models

  • arXiv: 2604.11035
  • Authors: Yifan Yu, Yuqing Jian, Junxiong Wang, Zhongzhu Zhou, Donglin Zhuang, Xinyu Fang, Sri Yanamandra, Xiaoxia Wu, Qingyang Wu, Shuaiwen Leon Song, Tri Dao, Ben Athiwaratkun, James Zou, Fan Lai, Chenfeng Xu
  • Subjects: cs.AI
  • Tags: Diffusion Model, LLM Inference, Text Generation
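  • Summary: This paper introduces Introspective Diffusion Language Models (I-DLM), which retain diffusion-style parallel decoding while inheriting the introspective consistency of autoregressive training. Using a novel introspective step-wise decoding algorithm, the model is the first to match the quality of same-size AR models across 15 benchmarks while offering higher throughput.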

[123] Intelligent Approval of Access Control Flow in Office Automation Systems via Relational Modeling

  • arXiv: 2604.11040
  • Authors: Dugang Liu, Zulong Chen, Chuanfei Xu, Jiaxuan He, Yunlu Ma, Jia Xu
  • Subjects: cs.AI
  • Tags: Decision Making, Knowledge Representation
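  • Summary: This paper proposes Relational Modeling-driven Intelligent Approval (RMIA), a framework for automating access-control-flow approval in office automation systems. It combines binary and ternary relational modeling modules to characterize the relationships among applicants, approvers, and resources at coarse and fine granularity, respectively.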

[124] From Topology to Trajectory: LLM-Driven World Models For Supply Chain Resilience

  • arXiv: 2604.11041
  • Authors: Jia Luo
  • Subjects: cs.AI
  • Tags: LLM Agent, Multi-Agent System, Reinforcement Learning
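  • Summary: This paper introduces ReflectiChain, a cognitive agent framework for resilient semiconductor supply-chain planning. Its core innovation combines latent trajectory rehearsal driven by a generative world model with a retrospective agentic reinforcement-learning mechanism, yielding significant gains in extreme scenarios such as export bans and material shortages.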

[125] EmergentBridge: Improving Zero-Shot Cross-Modal Transfer in Unified Multimodal Embedding Models

  • arXiv: 2604.11043
  • Authors: Jincheng Xie, Xingchen Xiao, Runheng Liu, Zhongyi Huang, Yu Zheng, Heyan Huang
  • Subjects: cs.AI
  • Tags: Multimodal Learning, Zero-Shot Learning, Transfer Learning
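  • Summary: This paper proposes EmergentBridge, an embedding-level bridging framework that improves zero-shot transfer for unpaired modality pairs in unified multimodal embedding models. It learns a mapping that produces noisy bridge anchors and enforces proxy alignment in the subspace orthogonal to the anchor-alignment direction, strengthening non-anchor connections while preserving anchor alignment.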

[126] AI Integrity: A New Paradigm for Verifiable AI Governance

  • arXiv: 2604.11065
  • Authors: Seulki Lee
  • Subjects: cs.AI
  • Tags: AI Safety, AI Ethics, Interpretability
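  • Summary: This paper introduces AI Integrity as a new paradigm for AI governance, defined as the state in which an AI system's Authority Stack (a layered hierarchy of values, epistemic standards, source preferences, and data-selection criteria) is free from corruption, contamination, manipulation, and bias. It proposes the PRISM framework as an operational methodology and defines six core metrics.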

[127] PRISM Risk Signal Framework: Hierarchy-Based Red Lines for AI Behavioral Risk

  • arXiv: 2604.11070
  • Authors: Seulki Lee
  • Subjects: cs.AI
  • Tags: AI Safety, AI Ethics, LLM Evaluation
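  • Summary: Building on the PRISM framework, this paper defines a taxonomy of 27 behavioral risk signals arising from structural anomalies in how AI systems prioritize values, weight evidence types, and trust information sources. Compared with case-level red lines, the hierarchy-based approach is proactive, comprehensive, and measurable.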

[128] Hodoscope: Unsupervised Monitoring for AI Misbehaviors

  • arXiv: 2604.11072
  • Authors: Ziqian Zhong, Shashwat Saxena, Aditi Raghunathan
  • Subjects: cs.AI
  • Tags: AI Safety, LLM Agent, Anomaly Detection
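  • Summary: This paper introduces Hodoscope, an unsupervised monitoring tool that compares behavior distributions across groups to surface distinctive and potentially suspicious behavior patterns. It uncovered a previously unknown vulnerability in the Commit0 benchmark and reduced review effort by 6-23x relative to uniform sampling.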
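The summary only says that Hodoscope compares behavior distributions across groups. Assuming behaviors have already been reduced to discrete labels (an assumption; the tool's actual representation is not described here), a minimal version of that comparison ranks labels by how over-represented they are in one group relative to a reference group. `distinctive_patterns` and the example labels are hypothetical.

```python
from collections import Counter


def distinctive_patterns(target, reference, alpha=1.0, top=3):
    """Rank behavior labels by over-representation in `target` vs `reference`.

    Score = smoothed frequency in target / smoothed frequency in reference,
    with add-alpha smoothing so labels absent from one group stay finite.
    Ties are broken alphabetically for a deterministic ranking.
    """
    t, r = Counter(target), Counter(reference)
    vocab = set(t) | set(r)
    nt = len(target) + alpha * len(vocab)
    nr = len(reference) + alpha * len(vocab)
    score = {b: ((t[b] + alpha) / nt) / ((r[b] + alpha) / nr) for b in vocab}
    return sorted(vocab, key=lambda b: (-score[b], b))[:top]


# Hypothetical labels from two groups of agent runs
target = ["edit_test"] * 3 + ["run_tests"] * 2
reference = ["run_tests"] * 5 + ["read_file"] * 5
print(distinctive_patterns(target, reference, top=1))  # ['edit_test']
```

Reviewing only the top-ranked patterns, instead of sampling runs uniformly, is the kind of effort reduction the summary reports.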

[129] Towards Proactive Information Probing: Customer Service Chatbots Harvesting Value from Conversation

  • arXiv: 2604.11077
  • Authors: Chen Huang, Zitan Jiang, Changyi Zou, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.AI; cs.CL
  • Tags: Dialogue System, LLM Agent, Information Extraction
  • Venue: ACL 2026
  • Code: code
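  • Summary: This paper proposes a new task, proactive information probing, which optimizes when a customer-service chatbot should ask users for target information. The authors propose PROCHATIP, a framework with a dedicated conversation-strategy module for mastering probing timing; experiments show it outperforms baselines in both information probing and service quality.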

[130] Do Agent Rules Shape or Distort? Guardrails Beat Guidance in Coding Agents

  • arXiv: 2604.11088
  • Authors: Xing Zhang, Guanghui Wang, Yanwei Cui, Wei Qiu, Ziyuan Li, Bing Zhu, Peiyang He
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Agent, Code Generation, LLM Evaluation
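  • Summary: This paper presents the first large-scale empirical evaluation of coding-agent rule files (679 in total). It finds that negative constraints are more effective than positive directives and that rules improve performance through context priming rather than through their specific instructions, yielding clear principles for safe agent configuration.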

[131] Frugal Knowledge Graph Construction with Local LLMs: A Zero-Shot Pipeline, Self-Consistency and Wisdom of Artificial Crowds

  • arXiv: 2604.11104
  • Authors: Pierre Jourlin
  • Subjects: cs.AI; cs.IR; cs.LG; cs.NE
  • Tags: Knowledge Graph, Zero-Shot Learning, LLM Inference
  • Code: code
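  • Summary: This paper proposes a zero-shot knowledge-graph construction pipeline that runs entirely on local inference on consumer-grade hardware. It studies multi-model self-consistency and finds that a strong consensus signal may indicate collective hallucination rather than a reliable answer.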

[132] Persona Non Grata: Single-Method Safety Evaluation Is Incomplete for Persona-Imbued LLMs

  • arXiv: 2604.11120
  • Authors: Wenkai Li, Fan Yang, Shaunak A. Mehta, Koichi Onoue
  • Subjects: cs.AI
  • Tags: LLM Evaluation, AI Safety, LLM Alignment
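  • Summary: This paper shows that safety evaluation of persona-imbued LLMs is incomplete: prompting and activation steering expose different, architecture-dependent vulnerabilities, so single-method testing can miss a model's main failure modes. It also identifies a prosocial-persona paradox.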

[133] A Proposed Biomedical Data Policy Framework to Reduce Fragmentation, Improve Quality, and Incentivize Sharing in Indian Healthcare in the era of Artificial Intelligence and Digital Health

  • arXiv: 2604.11125
  • Authors: Nikhil Mehta, Sachin Gupta, Gouri RP Anand
  • Subjects: cs.AI
  • Tags: Medical AI, AI Ethics, Federated Learning
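  • Summary: This paper proposes a multi-layer incentive architecture to address the fragmentation of biomedical data in Indian healthcare, including recognition for data papers, open-data metrics, federated-learning revenue sharing, and institutional data stewardship.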

[134] MADQRL: Distributed Quantum Reinforcement Learning Framework for Multi-Agent Environments

  • arXiv: 2604.11131
  • Authors: Abhishek Sawaika, Samuel Yen-Chi Chen, Udaya Parampalli, Rajkumar Buyya
  • Subjects: cs.AI; cs.LG; cs.MA
  • Tags: Quantum Computing, Reinforcement Learning, Multi-Agent System
  • Venue: IEEE QCNC 2026 Workshop
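  • Summary: This paper proposes a distributed quantum reinforcement learning framework for learning high-dimensional systems in multi-agent environments, achieving roughly a 10% performance improvement in a cooperative Pong environment.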

[135] From Answers to Arguments: Toward Trustworthy Clinical Diagnostic Reasoning with Toulmin-Guided Curriculum Goal-Conditioned Learning

  • arXiv: 2604.11137
  • Authors: Chen Zhan, Xiaoyu Tan, Gengchen Ma, Yu-Jie Xiong, Xiaoyan Jiang, Xihe Qiu
  • Subjects: cs.AI; cs.LG
  • Tags: Medical AI, LLM Reasoning, Curriculum Learning
  • Venue: ACL 2026
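  • Summary: This paper proposes a trustworthy clinical argumentation framework based on the Toulmin model, with a Curriculum Goal-Conditioned Learning (CGCL) training pipeline that enables LLMs to generate transparent diagnostic arguments, matching reinforcement-learning methods in diagnostic accuracy and reasoning quality.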

[136] Environmental Footprint of GenAI Research: Insights from the Moshi Foundation Model

  • arXiv: 2604.11154
  • Authors: Marta López-Rauhut, Loic Landrieu, Mathieu Aubry, Anne-Laure Ligozat
  • Subjects: cs.AI
  • Tags: AI Sustainability, LLM Training, Speech Processing
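  • Summary: This paper presents the first fine-grained analysis of the computational cost of multimodal large language model research, quantifying energy use, carbon emissions, and resource consumption across the full lifecycle of the Moshi speech-text foundation model, from R&D to deployment, and offers actionable guidelines for sustainable AI research.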

[137] Measuring the Authority Stack of AI Systems: Empirical Analysis of 366,120 Forced-Choice Responses Across 8 AI Models

  • arXiv: 2604.11216
  • Authors: Seulki Lee
  • Subjects: cs.AI
  • Tags: LLM Evaluation, AI Ethics, Decision Making
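  • Summary: This paper presents the first large-scale empirical mapping of the Authority Stack framework for AI systems, analyzing the value priorities, evidence preferences, and source-trust hierarchies of 8 AI models from 366,120 forced-choice responses. It finds that AI models have measurable but unstable authority stacks.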

[138] Mobile GUI Agent Privacy Personalization with Trajectory Induced Preference Optimization

  • arXiv: 2604.11259
  • Authors: Zhixin Lin, Jungang Li, Dongliang Xu, Shidong Pan, Yibo Shi, Yuchi Liu, Yuecong Min, Yue Yao
  • Subjects: cs.AI; cs.CR
  • Tags: LLM Agent, LLM Personalization, Privacy
  • Code: code
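  • Summary: This paper studies privacy personalization for mobile GUI agents and proposes Trajectory Induced Preference Optimization (TIPO), which handles privacy-related steps with preference-strength weighting and padding gating, improving persona alignment while preserving task-execution ability.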

[139] Inspectable AI for Science: A Research Object Approach to Generative AI Governance

  • arXiv: 2604.11261
  • Authors: Ruta Binkyte, Sharif Abuaddba, Chamikara Mahawaga, Ming Ding, Natasha Fernandes, Mario Fritz
  • Subjects: cs.AI
  • Tags: AI Ethics, Scientific Computing, AI Safety
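  • Summary: This paper proposes the AI-as-Research-Object (AI-RO) paradigm, which treats AI interactions as inspectable, structured research components and governs the use of generative AI in science by recording model configurations, prompts, and outputs.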

[140] Consistency of AI-Generated Exercise Prescriptions: A Repeated Generation Study Using a Large Language Model

  • arXiv: 2604.11287
  • Authors: Kihyuk Lee
  • Subjects: cs.AI; q-bio.OT
  • Tags: Medical AI, LLM Evaluation, Text Generation
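  • Summary: This study evaluates the within-model consistency of LLM-generated exercise prescriptions, finding high semantic consistency but variability in key quantitative components; safety-related statements appeared in every output, though their number varied by scenario.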

[141] BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows

  • arXiv: 2604.11304
  • Authors: Elaine Lau, Markus Dücker, Ronak Chaudhary, Hui Wen Goh, Rosemary Wei, Vaibhav Kumar, Saed Qunbar, Guram Gogia, Yi Liu, Scott Millslagle, Nasim Borazjanizadeh, Ulyana Tkachenko, Samuel Eshun Danquah, Collin Schweiker, Vijay Karumathil, Asrith Devalaraju, Varsha Sandadi, Haemi Nam, Punit Arani, Ray Epps, Abdullah Arif, Sahil Bhaiwala, Curtis Northcutt, Skyler Wang, Anish Athalye, Jonas Mueller, Francisco Guzmán
  • Subjects: cs.AI
  • Tags: LLM Agent, LLM Evaluation, Quantitative Finance
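  • Summary: This paper introduces BankerToolBench, a benchmark for evaluating AI agents on end-to-end analytical workflows in investment banking. Even the most advanced models fail nearly half of the rubric criteria, revealing key obstacles for agentic AI in high-stakes professional workflows.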

[142] PaperScope: A Multi-Modal Multi-Document Benchmark for Agentic Deep Research Across Massive Scientific Papers

  • arXiv: 2604.11307
  • Authors: Lei Xiong, Huaying Yuan, Zheng Liu, Zhao Cao, Zhicheng Dou
  • Subjects: cs.AI
  • Tags: LLM Evaluation, Scientific Reasoning, Multimodal Learning
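  • Summary: This paper proposes PaperScope, a multimodal, multi-document benchmark for evaluating agentic deep-research capabilities. Built on a knowledge graph of more than 2,000 AI papers, it covers multiple tasks, including reasoning, retrieval, summarization, and problem solving.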

[143] Select Smarter, Not More: Prompt-Aware Evaluation Scheduling with Submodular Guarantees

  • arXiv: 2604.11328
  • Authors: Xiaoyu Ma, Yiwen Li, Haoyue Liu, Zhichao Wang, Ye Chen, Yongxin Guo, Xiaoying Tang
  • Subjects: cs.AI; cs.LG
  • Tags: Prompt Engineering, LLM Evaluation, Optimization
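  • Summary: This paper proposes Prompt-aware Online Evaluation Scheduling (POES), which casts automatic prompt optimization as an online adaptive-testing problem and uses submodular optimization guarantees to select evaluation samples more intelligently, significantly improving accuracy under the same evaluation budget.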
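POES's exact objective is not given in the summary. As a generic illustration of what a submodular guarantee buys, the sketch below greedily picks evaluation examples to maximize a facility-location coverage score over a pairwise similarity matrix. With nonnegative similarities this function is monotone submodular, so greedy selection under a cardinality budget is within a factor (1 - 1/e) of the best subset of the same size. The similarity matrix and the names `facility_location` and `greedy_select` are assumptions, not part of POES.

```python
def facility_location(sim, selected):
    """f(S) = sum over all items i of max_{j in S} sim[i][j]; f(empty) = 0.

    Monotone submodular when all similarities are nonnegative.
    """
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim, budget):
    """Standard greedy for monotone submodular maximization under a size budget."""
    selected = []
    remaining = set(range(len(sim)))
    for _ in range(min(budget, len(sim))):
        base = facility_location(sim, selected)
        # Pick the candidate with the largest marginal gain in coverage
        best = max(sorted(remaining),
                   key=lambda j: facility_location(sim, selected + [j]) - base)
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because coverage has diminishing returns, the second pick tends to come from a different cluster of examples than the first, which is how a fixed evaluation budget can be spread over more informative samples.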

[144] Dynamic Summary Generation for Interpretable Multimodal Depression Detection

  • arXiv: 2604.11334
  • Authors: Shiyu Teng, Jiaqing Liu, Hao Sun, Yu Li, Shurong Chai, Ruibo Hou, Tomoko Tateyama, Lanfen Lin, Yen-Wei Chen
  • Subjects: cs.AI
  • Tags: Medical AI, Multimodal Learning, Affective Computing
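  • Summary: This paper proposes a coarse-to-fine, multi-stage framework that uses LLMs for interpretable depression detection, generating progressively richer clinical summaries to guide a multimodal fusion module; it outperforms existing baselines in both accuracy and interpretability.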

[145] CoRe-ECG: Advancing Self-Supervised Representation Learning for 12-Lead ECG via Contrastive and Reconstructive Synergy

  • arXiv: 2604.11359
  • Authors: Zehao Qin, Xiaojian Lin, Ping Zhang, Hongliang Wu, Xinkang Wang, Guangling Liu, Bo Chen, Wenming Yang, Guijin Wang
  • Subjects: cs.AI; cs.LG
  • Tags: Medical AI, Self-Supervised Learning, Representation Learning
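  • Summary: This paper proposes CoRe-ECG, a self-supervised learning framework for 12-lead ECG that combines contrastive and reconstructive learning to strengthen representations. It introduces frequency-dynamic augmentation and a spatio-temporal dual masking strategy, addressing non-physiological distortions and inter-lead shortcut reliance in existing methods. Experiments show state-of-the-art performance on multiple downstream ECG datasets.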

[146] The Missing Knowledge Layer in Cognitive Architectures for AI Agents

  • arXiv: 2604.11364
  • Authors: Michaël Roynard
  • Subjects: cs.AI
  • Tags: LLM Agent, Memory Architecture, Knowledge Representation
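  • Summary: This paper argues that existing cognitive architectures for AI agents lack an explicit knowledge layer, which conflates how facts and experience are handled. The author proposes a four-layer decomposition (knowledge, memory, wisdom, intelligence), each layer with its own persistence semantics, and provides accompanying implementation code. The work argues that this architectural separation is both necessary and feasible in engineering practice.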

[147] Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories

  • arXiv: 2604.11365
  • Authors: Peiyang Liu, Zhirui Chen, Xi Wang, Di Liang, Youru Li, Zhi Cai, Wei Ye
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Reasoning, Data Synthesis
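  • Summary: This paper proposes Contrastive Reasoning Path Synthesis (CRPS), which synthesizes reasoning chains by analyzing differences between high- and low-quality trajectories in Monte Carlo Tree Search. It turns supervision extraction from a filtering process into a synthesis process, allowing models to match or exceed baseline performance with only a small amount of data. Experiments show that learning from contrasts between success and failure yields more generalizable reasoning.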

[148] From Agent Loops to Structured Graphs:A Scheduler-Theoretic Framework for LLM Agent Execution

  • arXiv: 2604.11378
  • Authors: Hu Wei
  • Subjects: cs.AI; eess.SY
  • Tags: LLM Agent, LLM Inference
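  • Summary: This paper critiques the prevailing "agent loop" paradigm for LLM agents, identifying structural weaknesses such as implicit dependencies and unbounded recovery loops. The author proposes the SGH framework, a structured-graph approach that lifts control flow into an explicit, static directed acyclic graph to improve controllability and verifiability. It is a position paper offering a theoretical framework and design analysis.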

[149] Beyond RAG for Cyber Threat Intelligence: A Systematic Evaluation of Graph-Based and Agentic Retrieval

  • arXiv: 2604.11419
  • Authors: Dzenan Hamzic, Florian Skopik, Max Landauer, Markus Wurzenberger, Andreas Rauber
  • Subjects: cs.AI; cs.CR
  • Tags: RAG, Cybersecurity, Knowledge Graph
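  • Summary: This paper systematically evaluates four retrieval-augmented generation (RAG) architectures for cyber threat intelligence (CTI) analysis. Graph-based retrieval excels on structured factual queries, while hybrid graph-text methods significantly outperform conventional vector RAG on multi-hop questions. The work offers a useful reference for designing retrieval systems in the CTI domain.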

[150] Escaping the Context Bottleneck: Active Context Curation for LLM Agents via Reinforcement Learning

  • arXiv: 2604.11462
  • Authors: Xiaozhe Li, Tianyi Lyu, Yizhao Yang, Liang Shan, Siyi Yang, Ligao Zhang, Zhuoyi Huang, Qingwen Liu, Yang Li
  • Subjects: cs.AI
  • Tags: LLM Agent, Reinforcement Learning, Long Context
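  • Summary: To address the context bottleneck LLMs face in long-horizon tasks, this paper proposes a symbiotic framework that decouples context management from task execution. It uses reinforcement learning to train a lightweight policy model that actively prunes environmental noise and retains key reasoning anchors. Experiments show the method raises task success rates while significantly reducing token consumption.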

[151] Three Roles, One Model: Role Orchestration at Inference Time to Close the Performance Gap Between Small and Large Agents

  • arXiv: 2604.11465
  • Authors: S. Aaron McClendon, Jorge Gallego-Feliciano, Stavros Zervoudakis, Antonios Saravanos
  • Subjects: cs.AI
  • Tags: LLM Agent, LLM Inference
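  • Summary: This paper studies how inference-time scaffolding can improve small models' performance in complex multi-step environments without additional training compute. The authors design a three-stage inference pipeline in which a single frozen model plays three roles (summarizer, main agent, and corrector), substantially raising task completion rates. Experiments show that this structured inference-time intervention lets an 8B model rival larger models.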

[152] From Attribution to Action: A Human-Centered Application of Activation Steering

  • arXiv: 2604.11467
  • Authors: Tobias Labarta, Maximilian Dreyer, Katharina Weitz, Wojciech Samek, Sebastian Lapuschkin
  • Subjects: cs.AI; cs.HC; cs.LG
  • Tags: Interpretability, Vision-Language Model
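  • Summary: This paper proposes an interactive workflow that combines sparse autoencoder (SAE) attribution with activation steering for instance-level concept analysis in vision models. Interviews with experts performing debugging tasks show that activation steering lets practitioners move from passive inspection to intervention-based hypothesis testing. The work makes interpretability more actionable and surfaces considerations for using activation steering safely and effectively.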

[153] OOM-RL: Out-of-Money Reinforcement Learning Market-Driven Alignment for LLM-Based Multi-Agent Systems

  • arXiv: 2604.11477
  • Authors: Kun Liu, Liqun Chen
  • Subjects: cs.AI; cs.SE; q-fin.TR
  • Tags: LLM Alignment, Multi-Agent System, Reinforcement Learning
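  • Summary: This paper proposes Out-of-Money Reinforcement Learning (OOM-RL), an objective alignment paradigm that uses capital depletion in financial markets as a negative feedback signal that cannot be gamed. Through a 20-month longitudinal experiment, the study shows how a multi-agent system evolves from a baseline into a robust architecture under strict economic penalties. The results suggest that replacing subjective preferences with strict economic penalties is an effective approach to agent alignment in high-stakes environments.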

[154] On the Complexity of the Discussion-based Semantics in Abstraction Argumentation

  • arXiv: 2604.11480
  • Authors: Lydia Blümel, Kai Sauerwald, Kenneth Skiba, Matthias Thimm
  • Subjects: cs.AI
  • Tags: Formal Methods, Knowledge Representation
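  • Summary: This paper proves that, in abstract argumentation frameworks, deciding which of two arguments is stronger under discussion-based semantics is solvable in polynomial time. Drawing on results from automata theory, the authors reduce the problem to an equivalence problem for semiring automata, offering a new perspective on the computational complexity of ranking semantics.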

[155] Anthropogenic Regional Adaptation in Multimodal Vision-Language Model

  • arXiv: 2604.11490
  • Authors: Samuel Cahyawijaya, Peerat Limkonchotiwat, Tack Hwa Wong, Hitesh Laxmichand Patel, Amit Agarwal, Manuel Antonio Rufino, Carlos Rafael Catalan, Muhammad Reza Qorib, Vicky Feliren, Holy Lovenia, Aye Hninn Khine, Frederikus Hudi, David Anugraha, Alham Fikri Aji, Romrawin Chumpu, Viet-Thanh Pham, Minghan Wang, Mohamed Fazli Imam, Ruochen Zhang, Joseph Marvin Imperial, Do Xuan Long, Musa Izzanardi Wijanarko, Joel Ruben Antony Moniz, Patrick Amadeus Irawan, Hanif Muhammad Zhafran, Isaiah Flores, Ira Salsabila, Jun Kevin, Jostin Jerico Rosal, Patricia Nicole Monderin, Kun Kerdthaisong, Ahmad Mustafid, My Chiffon Nguyen, Natchapon Jongwiriyanurak, Siva Worajitwannakul, Haochen Li, Adrian Xuan Wei Lim, Bin Wang, Muhammad Ravi Shulthan Habibi, Lynnette Hui Xian Ng, Mithil Bangera, Yeshil Bangera, Priyaranjan Pattnayak, Dun Li Chan, Sherissa Caren Djuniwar, Hee Ming Shan
  • Subjects: cs.AI; cs.CL; cs.CV
  • Tags: Vision-Language Model, Domain Adaptation
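  • Summary: This paper introduces anthropogenic regional adaptation, a new paradigm that optimizes vision-language models for relevance to specific regional contexts while preserving their global generalization. The authors propose GG-EZ, a simple yet effective method based on regional data filtering and model merging, which yields significant performance gains in a Southeast Asian case study. The work establishes regional value alignment as a foundational paradigm for applying multimodal models.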

[156] Lectures on AI for Mathematics

  • arXiv: 2604.11504
  • Authors: Xiaoyang Chen, Xiaoyang Chen
  • Subjects: cs.AI; math.AP; math.AT; math.DG
  • Tags: Scientific Reasoning
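  • Summary: This book gives a comprehensive introduction to applications of AI in mathematics, covering core principles such as using AI to discover mathematical patterns, assist in proving difficult theorems, and construct counterexamples. Through clear explanations, it explores how AI is advancing mathematical research.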

[157] PAC-BENCH: Evaluating Multi-Agent Collaboration under Privacy Constraints

  • arXiv: 2604.11523
  • Authors: Minjun Park, Donghyun Kim, Hyeonjong Ju, Seungwon Lim, Dongwook Choi, Taeyoon Kwon, Minju Kim, Jinyoung Yeo
  • Subjects: cs.AI; cs.MA
  • Tags: Multi-Agent System, Privacy, LLM Evaluation
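  • Summary: This paper presents PAC-BENCH, a benchmark for evaluating multi-agent collaboration under privacy constraints. Experiments show that privacy constraints significantly degrade collaboration performance and cause coordination failures such as early privacy violations and privacy-induced hallucinations. The study identifies privacy-aware multi-agent collaboration as a distinct, unsolved challenge.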

[158] Limited Perfect Monotonical Surrogates constructed using low-cost recursive linkage discovery with guaranteed output

  • arXiv: 2604.11524
  • Authors: M.W. Przewozniczek, F. Chicano, R. Tinós, M.M. Komarnicki
  • Subjects: cs.AI; cs.DS
  • Tags: Optimization
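  • Summary: This paper proposes LyMPuS, a limited perfect monotonic surrogate model for optimizing computationally expensive problems. The method discovers linkage (dependencies) between variables at low cost and is guaranteed to find missing dependencies within a bounded number of steps. The surrogate needs no parameter tuning, can be trained on the fly during optimization, and effectively lowers the cost of local search.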

[159] Problem Reductions at Scale: Agentic Integration of Computationally Hard Problems

  • arXiv: 2604.11535
  • Authors: Xi-Wei Pan, Shi-Wen An, Jin-Guo Liu
  • Subjects: cs.AI
  • Tags: LLM Agent, Code Generation
  • Code: code
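  • Summary: This paper shows how AI coding agents can build a large-scale library of reductions between NP-hard problems, letting users route problems to different solvers through a single interface. The authors design a toolchain of constraints, a verification system, and feedback loops, and in a short time build a command-line tool covering more than 100 problem types. The work shows that good engineering guidance enables agents to build large-scale, well-tested software.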

[160] A collaborative agent with two lightweight synergistic models for autonomous crystal materials research

  • arXiv: 2604.11540
  • Authors: Tongyu Shi, Yutang Li, Zhanyuan Li, Qian Liu, Jie Zhou, Wenhe Xu, Yang Li, Dawei Dai, Rui He, Wenhua Zhou, Jiahong Wang, Xue-Feng Yu
  • Subjects: cs.AI
  • Tags: LLM Agent, Scientific Reasoning, Material Discovery
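  • Summary: This paper presents MatBrain, a lightweight collaborative agent system for crystal materials research that uses a dual-model architecture, with one model handling analytical reasoning and the other handling tool execution. By decoupling the entropy dynamics of tool planning from those of analytical reasoning, the architecture addresses the difficulty large models have with domain-specific reasoning and tool coordination. Experiments show that MatBrain substantially lowers the hardware barrier to deployment while outperforming larger general-purpose models on tasks such as structure generation and catalyst design.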

[161] SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering

  • arXiv: 2604.11548
  • Authors: Ningyan Zhu, Huacan Wang, Jie Zhou, Feiyu Chen, Shuo Zhang, Ge Chen, Chen Liu, Jiarou Wu, Wangyi Chen, Xiaofeng Mou, Yi Xu
  • Subjects: cs.AI
  • Tags: LLM Agent, Multi-Agent System
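  • Summary: This paper introduces SemaClaw, an open-source multi-agent application framework that uses "harness engineering" to turn unconstrained agents into controllable, auditable, production-reliable systems. The framework contributes a DAG-based two-stage hybrid orchestration method for agent teams, a behavioral safety system, and a three-tier context management architecture, as a step toward general-purpose personal AI agents.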

[162] UniToolCall: Unifying Tool-Use Representation, Data, and Evaluation for LLM Agents

  • arXiv: 2604.11557
  • Authors: Yijuan Liang, Xinghao Chen, Yifan Ge, Ziyi Wu, Hao Wu, Changyu Zeng, Wei Xing, Xiaoyu Shen
  • Subjects: cs.AI
  • Tags: LLM Agent, Tool Learning
  • Code: code
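  • Summary: This paper proposes UniToolCall, a framework that unifies the representation, data, and evaluation of tool use, including construction of a large tool pool and a hybrid training corpus. The framework introduces an anchor-linking mechanism to support coherent multi-turn reasoning and substantially improves models' tool-use performance in experiments.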

[163] Intersectional Sycophancy: How Perceived User Demographics Shape False Validation in Large Language Models

  • arXiv: 2604.11609
  • Authors: Benjamin Maltbie, Shivam Raval
  • Subjects: cs.AI; cs.HC
  • Tags: LLM Alignment, Fairness, LLM Evaluation
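  • Summary: This paper studies sycophancy in large language models, i.e., validating incorrect views in order to please the user, and finds that this behavior varies with the user's perceived demographics (such as race, age, and gender). Experiments show significant differences in sycophantic behavior across models, with certain groups receiving higher rates of false validation.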

[164] Context Kubernetes: Declarative Orchestration of Enterprise Knowledge for Agentic AI Systems

  • arXiv: 2604.11623
  • Authors: Charafeddine Mouzouni
  • Subjects: cs.AI; cs.SE
  • Tags: LLM Agent, Knowledge Representation
  • Code: code
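  • Summary: This paper introduces Context Kubernetes, an architecture for orchestrating knowledge in enterprise agentic AI systems that uses declarative manifests and reconciliation loops to enforce the correctness, freshness, and access control of knowledge. Experiments show that the architecture effectively prevents data leakage, detects stale content, and blocks security attacks, addressing the lack of governance in existing platforms.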

[165] RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

  • arXiv: 2604.11626
  • Authors: Haozhe Wang, Cong Wei, Weiming Ren, Jiaming Liu, Fangzhen Lin, Wenhu Chen
  • Subjects: cs.AI; cs.LG
  • Tags: Text-to-Image, Reinforcement Learning
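  • Summary: This paper proposes RationalRewards, a reward model that generates explicit multi-dimensional critiques before scoring, turning the reward model into an active optimization tool that improves visual generation quality at both training and test time. The method achieves state-of-the-art preference prediction, and its test-time critique-and-refine loop matches or surpasses RL-based fine-tuning.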

[166] Why Do Large Language Models Generate Harmful Content?

  • arXiv: 2604.11663
  • Authors: Rajesh Ganguli, Raha Moraffah
  • Subjects: cs.AI
  • Tags: LLM Alignment, Interpretability
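  • Summary: This paper proposes a method based on causal mediation analysis to identify the causal factors behind harmful content generation in large language models. It finds that harmful generation stems mainly from failures in late-layer MLP blocks and is tied to specific neuron gating mechanisms.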

[167] DreamKG: A KG-Augmented Conversational System for People Experiencing Homelessness

  • arXiv: 2604.11703
  • Authors: Javad M Alizadeh, Genhui Zheng, Chiu C Tan, Yuzhou Chen, Omar Martinez, Philip McCallion, Ying Ding, Chenguang Yang, AnneMarie Tomosky, Huanmei Wu
  • Subjects: cs.AI
  • Tags: Knowledge Graph, Dialogue System, RAG
  • Venue: ICHI 2026
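  • Summary: DreamKG is a knowledge-graph-augmented conversational system for people experiencing homelessness that combines a Neo4j knowledge graph with structured query understanding to provide reliable information grounded in verified data. The system handles location-aware and time-sensitive queries, mitigating the hallucinations to which standard LLMs are prone.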

[168] Agentic Driving Coach: Robustness and Determinism of Agentic AI-Powered Human-in-the-Loop Cyber-Physical Systems

  • arXiv: 2604.11705
  • Authors: Deeksha Prahlad, Daniel Fan, Hokeun Kim
  • Subjects: cs.AI; cs.CL; cs.RO; eess.SY
  • Tags: Autonomous Driving, LLM Agent
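  • Summary: This paper proposes an approach based on the reactor model of computation, using the Lingua Franca framework to address the unpredictability of agentic AI behavior in human-in-the-loop cyber-physical systems (CPS). Through a case study of an agentic driving coach, it shows how to reintroduce determinism in dynamic environments and how to handle practical challenges.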

[169] A Mamba-Based Multimodal Network for Multiscale Blast-Induced Rapid Structural Damage Assessment

  • arXiv: 2604.11709
  • Authors: Wanli Ma, Sivasakthy Selvakumaran, Dain G. Farrimond, Adam A. Dennis, Samuel E. Rigby
  • Subjects: cs.AI
  • Tags: Remote Sensing, Multimodal Learning
  • Code: code
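  • Summary: This paper proposes a Mamba-based multimodal network for rapid structural damage assessment that combines multiscale blast-load information with optical remote sensing imagery. In an assessment of the 2020 Beirut explosion, the method significantly outperforms existing state-of-the-art methods, improving the efficiency of disaster management.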

[170] SWE-AGILE: A Software Agent Framework for Efficiently Managing Dynamic Reasoning Context

  • arXiv: 2604.11716
  • Authors: Shuquan Lian, Juncheng Liu, Yazhe Chen, Yuhong Chen, Hui Li
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Agent, Code Generation
  • Code: code
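  • Summary: This paper proposes SWE-AGILE, a framework that addresses the tension between agent reasoning depth and context limits in autonomous software engineering. It introduces a dynamic reasoning context strategy that maintains a sliding window and compresses historical reasoning, preserving reasoning continuity while preventing context explosion.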
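A minimal sketch of the sliding-window-plus-compression idea described above, assuming a fixed window of recent reasoning steps kept verbatim while older steps are compressed. The `ReasoningContext` class is hypothetical, not SWE-AGILE's code, and its compressor is plain truncation standing in for whatever summarizer the framework actually uses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ReasoningContext:
    """Hypothetical reasoning buffer: recent steps verbatim, older steps compressed."""
    window: int = 3          # recent reasoning steps kept in full
    summary_chars: int = 40  # character budget per compressed older step
    recent: List[str] = field(default_factory=list)
    compressed: List[str] = field(default_factory=list)

    def _compress(self, step: str) -> str:
        # Placeholder compressor: truncation. A real agent would more likely
        # summarize the step with a model call.
        if len(step) <= self.summary_chars:
            return step
        return step[: self.summary_chars] + "..."

    def add(self, step: str) -> None:
        # Append the newest step; evict the oldest into compressed history
        # once the verbatim window overflows.
        self.recent.append(step)
        while len(self.recent) > self.window:
            self.compressed.append(self._compress(self.recent.pop(0)))

    def render(self) -> str:
        # Build the text that would be placed into the agent's prompt.
        parts = []
        if self.compressed:
            parts.append("Earlier reasoning (compressed):\n" + "\n".join(self.compressed))
        if self.recent:
            parts.append("Recent reasoning:\n" + "\n".join(self.recent))
        return "\n\n".join(parts)
```

In this sketch the compressed history still grows, only more slowly than raw reasoning; bounding it as well (for example, by periodically re-summarizing it) would be a further design choice.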

[171] Collaborative Multi-Agent Scripts Generation for Enhancing Imperfect-Information Reasoning in Murder Mystery Games

  • arXiv: 2604.11741
  • Authors: Keyang Zhong, Junlin Xie, Hefeng Wu, Haofeng Li, Guanbin Li
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Vision-Language Model, Game AI
  • Venue: ACL 2026
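  • Summary: This paper proposes a collaborative multi-agent framework that generates high-quality murder mystery game scripts in order to strengthen vision-language models' reasoning under imperfect information. Using chain-of-thought-based fine-tuning and GRPO-based reinforcement learning, the system substantially improves model performance on narrative reasoning and hidden-fact extraction.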

[172] Retrieval Is Not Enough: Why Organizational AI Needs Epistemic Infrastructure

  • arXiv: 2604.11759
  • Authors: Federico Bottino, Carlo Ferrero, Nicholas Dosio, Pierfrancesco Beneventano
  • Subjects: cs.AI
  • Tags: RAG, Knowledge Representation, LLM Agent
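  • Summary: This paper argues that the ceiling of organizational AI is set by epistemic fidelity, not just retrieval fidelity, and proposes the OIDA framework for building epistemically structured organizational knowledge. The framework introduces a question mechanism that explicitly models what the organization does not know, and experiments validate its effectiveness in improving knowledge quality.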

[173] GenTac: Generative Modeling and Forecasting of Soccer Tactics

  • arXiv: 2604.11786
  • Authors: Jiayuan Rao, Tianlin Gui, Haoning Wu, Yanfeng Wang, Weidi Xie
  • Subjects: cs.AI; cs.MA
  • Tags: Multi-Agent System, Diffusion Model, Sports Analytics
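  • Summary: This paper introduces GenTac, a diffusion-based generative framework for modeling and forecasting soccer tactics that can generate diverse future trajectories. The framework supports rich contextual conditioning and accurately captures stylistic differences across teams and leagues, providing a new tool for tactical analysis.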

[174] Detecting Safety Violations Across Many Agent Traces

  • arXiv: 2604.11806
  • Authors: Adam Stein, Davis Brown, Hamed Hassani, Mayur Naik, Eric Wong
  • Subjects: cs.AI; cs.CL
  • Tags: LLM Agent, AI Safety, Anomaly Detection
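  • Summary: This paper introduces Meerkat, a system that combines clustering with agentic search to detect safety violations across large numbers of agent traces. The method uncovers sparse and complex failures without relying on seed scenarios or fixed workflows, substantially improving the efficiency of violation detection.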

Cross-listed submissions (325)

[175] The Paradox of Professional Input: How Expert Collaboration with AI Systems Shapes Their Future Value

  • arXiv: 2504.12654 (cross-listed)
  • Authors: Venkat Ram Reddy Ganuthula, Krishna Kumar Balaraman
  • Subjects: econ.GN; cs.AI
  • Tags: Human-Computer Interaction, AI Ethics, Knowledge Management
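  • Summary: This paper explores a paradox in the relationship between professional expertise and AI: by collaborating with AI systems, domain experts externalize their tacit knowledge, which may accelerate the automation of their own expertise. The paper analyzes this mode of collaboration and proposes a framework for professionals to preserve and transform their value in the AI era.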

[176] Retrieval-Augmented Large Language Models for Evidence-Informed Guidance on Cannabidiol Use in Older Adults

  • arXiv: 2604.09548 (cross-listed)
  • Authors: Ali Abedi, Charlene H. Chu, Shehroz S. Khan
  • Subjects: cs.IR; cs.AI
  • Tags: RAG, Medical AI, Question Answering
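  • Summary: This paper develops a retrieval-augmented LLM framework that provides older adults with evidence-informed guidance on cannabidiol (CBD) use. It proposes an automated, annotation-free evaluation framework; experiments show that retrieval-augmented models give more cautious, guideline-consistent recommendations than standalone models.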

[177] Beyond Offline A/B Testing: Context-Aware Agent Simulation for Recommender System Evaluation

  • arXiv: 2604.09549 (cross-listed)
  • Authors: Nicolas Bougie, Gian Maria Marconi, Xiaotong Ye, Narimasa Watanabe
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, LLM Agent
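  • Summary: This paper introduces ContextSim, an LLM-agent-based framework that simulates believable user proxies by anchoring their interactions in activities of daily living, addressing the disconnect between offline metrics and online performance in recommender system evaluation. Experiments show that the simulated interactions align more closely with human behavior, and that recommender parameters tuned with the simulator improve real-world user engagement.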

[178] SemaCDR: LLM-Powered Transferable Semantics for Cross-Domain Sequential Recommendation

  • arXiv: 2604.09551 (cross-listed)
  • Authors: Chunxu Zhang, Shanqiang Huang, Zijian Zhang, Jiahong Liu, Linsong Yu, Ruiqi Wan, Bo Yang, Irwin King
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Transfer Learning
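  • Summary: This paper proposes SemaCDR, a framework that uses large language models to build a unified semantic space, addressing the poor transferability of features in cross-domain sequential recommendation. By fusing LLM-generated domain-agnostic semantics with domain-specific content, the method captures intra-domain patterns and facilitates cross-domain knowledge transfer, outperforming existing baselines on real-world datasets.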

[179] MCERF: Advancing Multimodal LLM Evaluation of Engineering Documentation with Enhanced Retrieval

  • arXiv: 2604.09552 (cross-listed)
  • Authors: Kiarash Naghavi Khanghah, Hoang Anh Nguyen, Anna C. Doris, Amir Mohammad Vahedi, Daniele Grandi, Faez Ahmed, Hongyi Xu
  • Subjects: cs.IR; cs.AI; cs.CL
  • Tags: RAG, Document Understanding, Vision-Language Model
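  • Summary: This paper proposes MCERF, a framework that combines multimodal retrievers with LLM reasoning for question answering over engineering documentation. Using hybrid retrieval, visual-text fusion, and multi-agent routing, the system substantially improves accuracy on the multimodal and reasoning-intensive tasks of the DesignQA benchmark.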

[180] SRBench: A Comprehensive Benchmark for Sequential Recommendation with Large Language Models

  • arXiv: 2604.09553 (cross-listed)
  • Authors: Jianhong Li, Zeheng Qian, Wangze Ni, Haoyang Li, Hongwei Yao, Yang Bai, Kui Ren
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, LLM Evaluation
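  • Summary: This paper presents SRBench, a comprehensive benchmark for evaluating large language models on sequential recommendation. The benchmark covers accuracy, fairness, stability, and efficiency, and uses a unified input paradigm with a prompt-extractor coupling mechanism to enable fair comparison between conventional neural models and LLM-based models.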

[181] Para-B&B: Load-Balanced Deterministic Parallelization of Solving MIP

  • arXiv: 2604.09556 (cross-listed)
  • Authors: Jinyu Zhang, Di Huang, Yue Liu, Shuo Wang, Zhenyu Pu, Zhiyuan Liu
  • Subjects: cs.DC; cs.AI
  • Tags: Optimization, High Performance Computing
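  • Summary: This paper presents the first fully open-source, deterministic, parallel branch-and-bound implementation for the high-performance MIP solver HiGHS. The method introduces a data-parallel architecture and an AI-driven load-balancing mechanism, substantially speeding up solving while guaranteeing strict determinism.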

[182] SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

  • arXiv: 2604.09557 (cross-listed)
  • Authors: Talor Abramovich, Maor Ashkenazi, Carl Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Inference, LLM Evaluation, Speculative Decoding
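  • Summary: This paper presents SPEED-Bench, a comprehensive benchmark for evaluating speculative decoding techniques for large language models. It provides semantically diverse datasets and supports throughput evaluation, aiming to standardize the performance evaluation of speculative decoding by simulating real production environments.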

[183] Emergent Social Structures in Autonomous AI Agent Networks: A Metadata Analysis of 626 Agents on the Pilot Protocol

  • arXiv: 2604.09561 (cross-listed)
  • Authors: Teodor-Ioan Calin
  • Subjects: cs.SI; cs.AI; cs.CY; cs.DC
  • Tags: LLM Agent, Social Network Analysis, Multi-Agent System
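  • Summary: This paper presents the first empirical analysis of how social structures form in networks of autonomous AI agents. It finds that the trust networks these agents form on their own exhibit small-world properties and preferential-attachment patterns similar to those of human social networks, opening a new empirical area of machine sociology.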

[184] StreamServe: Adaptive Speculative Flows for Low-Latency Disaggregated LLM Serving

  • arXiv: 2604.09562 (cross-listed)
  • Authors: Satyam Kumar, Arpit Singh Gautam, Kailash Talreja, Saurabh Jha
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Inference, High Performance Computing, Speculative Decoding
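  • Summary: This paper proposes StreamServe, a disaggregated prefill-decode serving architecture that incorporates adaptive speculative decoding to reduce latency. Through aware routing and runtime adjustment, the system substantially lowers latency and raises throughput across a range of workloads.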

[185] ACE-Bench: A Lightweight Benchmark for Evaluating Azure SDK Usage Correctness

  • arXiv: 2604.09564 (cross-listed)
  • Authors: Wenxing Zhu, Simeng Qi, Junkui Chen, Yan Xie, Min Huang, Jingkan He, Xiao Wang, Cheng Chen, Sijing Meng, Tianqi Zhang
  • Subjects: cs.DC; cs.AI; cs.SE
  • Tags: Code Generation, LLM Evaluation, Software Testing
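  • Summary: This paper presents ACE-Bench, a lightweight benchmark for evaluating how correctly LLM coding agents use the Azure SDK. By converting official documentation examples into coding tasks and checking solutions against atomic criteria, the benchmark enables fast, reproducible evaluation without provisioning cloud resources.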

[186] AEG: A Baremetal Framework for AI Acceleration via Direct Hardware Access in Heterogeneous Accelerators

  • arXiv: 2604.09565 (cross-listed)
  • Authors: Hua Jiang, Sayan Mandal, Brandon Kirincich, Govind Varadarajan
  • Subjects: cs.DC; cs.AI
  • Tags: DNN Deployment, Heterogeneous Computing, Edge Computing
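  • Summary: This paper proposes a unified, hardware-agnostic bare-metal runtime architecture that achieves high-performance machine learning inference on heterogeneous accelerators through direct hardware access. By eliminating operating system overhead, the framework substantially improves compute efficiency and reduces latency jitter.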

[187] LETGAMES: An LLM-Powered Gamified Approach to Cognitive Training for Patients with Cognitive Impairment

  • arXiv: 2604.09566 (cross-listed)
  • Authors: Jingwei Shi, Shengyu Tao, Xinxiang Yin, Chen Huang, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.HC; cs.AI; cs.CL
  • Tags: Medical AI, Game AI
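  • Summary: This paper proposes LETGAMES, a method that uses large language models to automatically generate personalized therapeutic games for patients with cognitive impairment. Inspired by Dungeons & Dragons, the system generates interactive narrative games that target specific cognitive domains, and the authors establish a psychologically grounded evaluation protocol to validate its effectiveness.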

[188] Neuro-Symbolic Strong-AI Robots with Closed Knowledge Assumption: Learning and Deductions

  • arXiv: 2604.09567 (cross-listed)
  • Authors: Zoran Majkic
  • Subjects: cs.LO; cs.AI
  • Tags: Neurosymbolic AI, Robotics, Knowledge Representation
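  • Summary: This paper explores neuro-symbolic knowledge representation for strong-AI robots and proposes a closed knowledge assumption for handling unknown facts. It uses Belnap's bilattice to accommodate inconsistent information and paradoxes, with the aim of giving robots human-like intelligence and safety through logical deduction and causality.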

[189] Tuning Qwen2.5-VL to Improve Its Web Interaction Skills

  • arXiv: 2604.09571 (cross-listed)
  • Authors: Alexandra Yakovleva, Henrik Pärssinen, Harri Valpola, Juho Kannala, Alexander Ilin
  • Subjects: cs.HC; cs.AI
  • Tags: Vision-Language Model, GUI Automation
  • Venue: WWW 2026
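  • Summary: This paper studies fine-tuning Qwen2.5-VL to make it more reliable on web interaction tasks. To address problems such as inaccurate element grounding and sensitivity to instruction wording, it proposes a two-stage training pipeline that substantially raises the model's success rate on single-click web tasks.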

[190] ACE-TA: An Agentic Teaching Assistant for Grounded Q&A, Quiz Generation, and Code Tutoring

  • arXiv: 2604.09572 (cross-listed)
  • Authors: Himanshu Tripathi, Charlottee Crowell, Kaley Newlin, Subash Neupane, Shahram Rahimi, Jason Keith
  • Subjects: cs.HC; cs.AI; cs.CL
  • Tags: Education Technology, LLM Agent
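  • Summary: This paper introduces ACE-TA, an agentic teaching-assistant framework for programming courses. The system uses a large language model to autonomously route concept queries and provides retrieval-grounded Q&A, adaptive quiz generation, and interactive code tutoring.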

[191] Generative UI: LLMs are Effective UI Generators

  • arXiv: 2604.09577 (cross-listed)
  • Authors: Yaniv Leviathan, Dani Valevski, Matan Kalman, Danny Lumen, Eyal Segalis, Eyal Molad, Shlomi Pasternak, Vishnu Natchu, Valerie Nygaard, Srinivasan Venkatachary, James Manyika, Yossi Matias
  • Subjects: cs.HC; cs.AI; cs.CL; cs.LG
  • Tags: Human-Computer Interaction, UI Generation
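  • Summary: This paper shows that modern large language models, when equipped with appropriate tools and prompts, can robustly generate high-quality custom user interfaces. The generated UIs are preferred over standard Markdown output in most cases, and the authors release PAGEN, an expert-built dataset, to support evaluation.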

[192] Evaluating Visual Prompts with Eye-Tracking Data for MLLM-Based Human Activity Recognition

  • arXiv: 2604.09585 (cross-listed)
  • Authors: Jae Young Choi, Seon Gyeom Kim, Hyungjun Yoon, Taeckyung Lee, Donggun Lee, Jaeryung Chung, Jihyung Kil, Ryan Rossi, Sung-Ju Lee, Tak Yeon Lee
  • Subjects: cs.HC; cs.AI; cs.CV
  • Tags: Multimodal Learning, Prompt Engineering, Human Activity Recognition
  • Venue: IEEE PacificVis 2026
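  • Summary: This paper studies human activity recognition on eye-tracking data with multimodal large language models (MLLMs) using visual prompting strategies. Experiments show that rendering sensor signals as visualization images for input not only saves token cost but also lets MLLMs reason effectively over high-frequency sensor signals.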

[193] Why Smaller Is Slower? Dimensional Misalignment in Compressed LLMs

  • arXiv: 2604.09595 (cross-listed)
  • Authors: Jihao Xin, Tian Lyu, Qilong Pan, Kesen Wang, Marco Canini
  • Subjects: cs.DC; cs.AI
  • Tags: Model Compression, LLM Inference, GPU Computing
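  • Summary: This paper analyzes why compressed LLMs, despite having fewer parameters, do not run faster, and identifies a "dimensional misalignment" phenomenon: the irregular tensor dimensions produced by compression degrade GPU performance. The authors propose GAC (GPU-Aligned Compression), a new paradigm that reselects hardware-aligned dimensions via multiple-choice knapsack optimization, recovering up to 1.5× speedup while preserving model quality.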
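The summary names a multiple-choice knapsack as the selection step. As a hedged sketch of that general technique, not GAC's actual formulation, the code below offers each layer a few dimensions rounded to an assumed alignment of 64, attaches a hypothetical (cost, quality) score to each option, and picks exactly one option per layer to maximize total quality within a cost budget. The helper names, the alignment value, and the scores are all illustrative assumptions.

```python
from typing import List, Tuple


def aligned_options(raw_dim: int, align: int = 64) -> List[int]:
    """Nearest multiples of `align` below and above a compressed dimension.

    An alignment of 64 is an assumption; the right value depends on the GPU and kernels.
    """
    lo = (raw_dim // align) * align
    hi = lo if raw_dim % align == 0 else lo + align
    return sorted({d for d in (lo, hi) if d > 0})


def multiple_choice_knapsack(
    groups: List[List[Tuple[int, float]]], budget: int
) -> Tuple[float, List[int]]:
    """Pick exactly one (cost, value) option per group, total cost <= budget,
    maximizing total value. Returns (best value, chosen option index per group)."""
    neg = float("-inf")
    best = [neg] * (budget + 1)  # best[c]: max value so far with total cost exactly c
    best[0] = 0.0
    picks: List[List[int]] = []  # picks[g][c]: option of group g that reached cost c
    for options in groups:
        new = [neg] * (budget + 1)
        pick = [-1] * (budget + 1)
        for c in range(budget + 1):
            if best[c] == neg:
                continue
            for i, (cost, value) in enumerate(options):
                nc = c + cost
                if nc <= budget and best[c] + value > new[nc]:
                    new[nc] = best[c] + value
                    pick[nc] = i
        best = new
        picks.append(pick)
    end = max(range(budget + 1), key=lambda c: best[c])
    if best[end] == neg:
        raise ValueError("no combination of options fits the budget")
    # Walk back through the groups to recover which option each one used.
    chosen = [0] * len(groups)
    c = end
    for g in range(len(groups) - 1, -1, -1):
        i = picks[g][c]
        chosen[g] = i
        c -= groups[g][i][0]
    return best[end], chosen
```

For instance, two layers compressed to raw widths 100 and 90 each get the candidates `[64, 128]`; with costs equal to the widths and made-up quality scores, the solver decides which layer keeps the wider aligned size under a total-width budget.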

[194] From Theory to Protocol: Executable Frameworks for Creative Emergence and Strategic Foresight

  • arXiv: 2604.09597 (cross-listed)
  • Authors: Shun Fujiyoshi
  • Subjects: cs.HC; cs.AI
  • Tags: Decision Making, Cognitive Science
  • Code: code
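  • Summary: This paper proposes two executable protocols, GHOSTY COLLIDER for cross-domain creative emergence and PRECOG PROTOCOL for strategic foresight, which turn descriptive theories into repeatable step-by-step procedures. In case studies, controlled experiments, and batch experiments, protocol-driven outputs show higher structural novelty and parameter specificity.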

[195] Duration-Informed Workload Scheduler

  • arXiv: 2604.09599 (cross-listed)
  • Authors: Daniela Loreti, Davide Leone, Andrea Borghesi
  • Subjects: cs.DC; cs.AI
  • Tags: High Performance Computing, Optimization
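  • Summary: This paper proposes an enhanced workload scheduler that integrates a machine-learning-based job-duration prediction module. Evaluated on workload traces from a Tier-0 supercomputer, it reduces average wait time by about 11%, directly improving user quality of service and system turnaround.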

[196] ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios

  • arXiv: 2604.09603 (cross-listed)
  • Authors: Xinyi Hu, Yuhao Shen, Baolin Zhang, Hengxin Zhang, Jun Dai, Shuang Ge, Lei Chen, Yue Li, Mingcheng Wan
  • Subjects: cs.DC; cs.AI; cs.LG
  • Tags: Speculative Decoding, LLM Inference
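  • Summary: This paper proposes ECHO, a framework that reformulates speculative execution as a budget-scheduling problem and uses sparse confidence gating to manage a batch as a single unified super-tree. Evaluated across a range of model scales, including the industrial-scale Qwen3-235B, it achieves up to 5.35× speedup and consistently outperforms existing methods in high-concurrency scenarios.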

[197] Human-AI Interaction Traces as Blackout Poetry: Reframing AI-Supported Writing as Found-Text Creativity

  • arXiv: 2604.09605 (cross-listed)
  • Authors: Syemin Park, Soobin Park, Youn-kyung Lim
  • Subjects: cs.HC; cs.AI
  • Tags: Human-Computer Interaction, AI Ethics
  • Venue: ACM CHI 2026 Workshop
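  • Summary: Drawing on the concept of blackout poetry, this paper reframes human-AI interaction traces as expressive artifacts rather than auditing tools. The authors argue that designing interaction traces as aesthetic artifacts can help readers better understand and trust the author's creative contribution in AI-assisted writing.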

[198] Characterizing Performance-Energy Trade-offs of Large Language Models in Multi-Request Workflows

  • arXiv: 2604.09611 (cross-listed)
  • Authors: Md. Monzurul Amin Ifath, Israat Haque
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Inference, Energy Efficiency
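  • Summary: This paper presents the first systematic characterization of performance-energy trade-offs in multi-request LLM inference, built around four representative workload patterns. It finds that batch size is the most influential tuning lever and that vLLM's engine-level optimizations sustain higher GPU utilization on decode-heavy workloads.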

[199] Token-Budget-Aware Pool Routing for Cost-Efficient LLM Inference

  • arXiv: 2604.09613 (cross-listed)
  • Authors: Huamin Chen, Xunzhuo Liu, Junchen Jiang, Bowei He, Xue Liu
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Inference, Optimization
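  • Summary: This paper proposes token-budget-aware pool routing, which dispatches each request to a short pool or a long pool according to its estimated token budget, addressing the mismatch between serving configuration and actual traffic. Evaluated on the Azure LLM inference dataset, it reduces the number of GPU instances by 17-39%, substantially lowering operating costs.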
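A minimal sketch of the routing rule described above. It assumes the token budget is estimated as prompt length plus the requested generation cap, and that the short pool is configured with a smaller maximum context (8192 tokens here, a made-up value). The `Request` type, `token_budget`, and `route` are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Request:
    prompt_tokens: int
    max_new_tokens: int


def token_budget(req: Request) -> int:
    # Assumption: worst-case tokens = prompt length + requested generation cap.
    return req.prompt_tokens + req.max_new_tokens


def route(req: Request, short_pool_limit: int = 8192) -> str:
    """Send a request to the short pool if its estimated budget fits that pool's context limit."""
    return "short" if token_budget(req) <= short_pool_limit else "long"


def split_traffic(reqs: List[Request], short_pool_limit: int = 8192) -> Dict[str, int]:
    # Count how many requests each pool would receive, e.g. to size the pools.
    counts = {"short": 0, "long": 0}
    for r in reqs:
        counts[route(r, short_pool_limit)] += 1
    return counts
```

The presumed design rationale is that a pool which never has to reserve memory for very long contexts can pack more concurrent requests per GPU, so most traffic can be served by fewer instances. How the paper actually estimates budgets and sets thresholds is not stated in this summary.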

[200] HearthNet: Edge Multi-Agent Orchestration for Smart Homes

  • arXiv: 2604.09618 (cross-listed)
  • Authors: Zhonghao Zhan, Krinos Li, Yefan Zhang, Hamed Haddadi
  • Subjects: cs.DC; cs.AI; cs.CR
  • Tags: LLM Agent, Edge Computing, Multi-Agent System
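  • Summary: This paper proposes HearthNet, an edge multi-agent orchestration system for smart homes that deploys persistent, role-specialized LLM agents on the home hub. The agents coordinate through MQTT and Git-backed shared state, enabling intent-driven control across heterogeneous devices.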

[201] Assessing the Pedagogical Readiness of Large Language Models as AI Tutors in Low-Resource Contexts: A Case Study of Nepal's K-10 Curriculum

  • arXiv: 2604.09619 (cross-listed)
  • Authors: Pratyush Acharya, Prasansha Bharati, Yokibha Chapagain, Isha Sharma Gauli, Kiran Parajuli
  • Subjects: cs.CY; cs.AI; cs.CL
  • Tags: Education Technology, LLM Evaluation
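  • Summary: This paper evaluates four advanced LLMs as AI tutors for Nepal's K-10 science and mathematics curriculum, revealing a "curriculum alignment gap". It finds that frontier models exhibit failure modes such as the "curse of expertise" and the "foundational fallacy", indicating that off-the-shelf LLMs are not yet ready for autonomous deployment in Nepali classrooms.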

[202] LLM Nepotism in Organizational Governance

  • arXiv: 2604.09620 (cross-listed)
  • Authors: Shunqi Mao, Wei Guo, Dingxin Zhang, Chaoyi Zhang, Weidong Cai
  • Subjects: cs.CY; cs.AI; cs.CL
  • Tags: LLM Alignment, Fairness, Bias Mitigation
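  • Summary: This paper introduces the concept of "LLM nepotism", the tendency of LLM evaluators to reward expressions of trust in AI itself. It finds that LLM-driven hiring may produce organizations that are more homogeneously trusting of AI, and proposes a "competence-attitude decomposition" that separates competence-unrelated attitudes toward AI from the assessment of competence, mitigating this bias.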

[203] Explainability and Certification of AI-Generated Educational Assessments

  • arXiv: 2604.09622 (cross-listed)
  • Authors: Antoun Yaacoub, Zainab Assaghir, Anuradha Kar
  • Subjects: cs.CY; cs.AI; cs.CL
  • Tags: Education Technology, Interpretability
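  • Summary: This paper proposes an explainability and certification framework for AI-generated assessment items that combines self-rationalization, attribution analysis, and post-hoc verification to produce interpretable evidence of cognitive alignment grounded in the Bloom and SOLO taxonomies. It introduces a traffic-light certification workflow that separates items that can be certified automatically from those that require human review.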

[204] Assessing Model-Agnostic XAI Methods against EU AI Act Explainability Requirements

  • arXiv: 2604.09628 (cross-listed)
  • Authors: Francesco Sovrano, Giulia Vilone, Michael Lognoul
  • Subjects: cs.CY; cs.AI
  • Tags: Interpretability, AI Ethics
  • Venue: XAI 2026
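  • Summary: This paper examines model-agnostic XAI methods and maps their explainability characteristics onto the requirements of the EU AI Act. It proposes a qualitative-to-quantitative scoring framework that aggregates expert qualitative assessments of XAI properties into regulation-specific compliance scores, helping practitioners identify when an XAI solution may support the legal explanation requirements.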

[205] Adoption and Effectiveness of AI-Based Anomaly Detection for Cross Provider Health Data Exchange

  • arXiv: 2604.09630 (cross-listed)
  • Authors: Cao Tram Anh Hoang
  • Subjects: cs.CY; cs.AI
  • Tags: Anomaly Detection, Medical AI, Interpretability
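  • Summary: This paper studies the adoption and effectiveness of AI-based anomaly detection in cross-provider EHR environments, proposing a four-pillar readiness framework and a 10-item checklist. A simulation comparing a rule-based approach with isolation forest finds that the rule-based approach achieves high recall but produces a large volume of alerts, while isolation forest reduces the alert burden at the cost of lower sensitivity.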

[206] Hardware Utilization and Inference Performance of Edge Object Detection Under Fault Injection

  • arXiv: 2604.09631 (cross-listed)
  • Authors: Faezeh Pasandideh, Mehdi Azarafza, Achim Rettberg
  • Subjects: cs.DC; cs.AI
  • Tags: Object Detection, Edge Computing, Fault Tolerance
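  • Summary: This paper systematically characterizes the CPU, GPU, memory, power, and thermal behavior of TensorRT-optimized YOLO models on an NVIDIA Jetson Nano under fault-injection attacks. Results show that even when input data is severely degraded, the inference engine keeps GPU utilization stable, temperature rise under control, and power draw within safe limits.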

[207] Agentic AI in Engineering and Manufacturing: Industry Perspectives on Utility, Adoption, Challenges, and Opportunities

  • arXiv: 2604.09633 (cross-listed)
  • Authors: Kristen M. Edwards, Maxwell Bauer, Claire Jacquillat, A. John Hart, Faez Ahmed
  • Subjects: cs.CY; cs.AI
  • Tags: LLM Agent, Manufacturing AI
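  • Summary: Based on more than 30 interviews, this paper studies how AI, and agentic systems in particular, is being adopted in engineering and manufacturing. It finds that near-term AI gains are concentrated in structured, repetitive work, while adoption is constrained by fragmented data, safety and regulatory requirements, and legacy toolchains with limited API access. Reliability, verification, and auditability are the core requirements for adoption.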

[208] From Understanding to Creation: A Prerequisite-Free AI Literacy Course with Technical Depth Across Majors

  • arXiv: 2604.09634 (cross-listed)
  • Authors: Amarda Shehu
  • Subjects: cs.CY; cs.AI
  • Tags: Education Technology
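  • Summary: This paper describes UNIV 182 at George Mason University, a prerequisite-free AI literacy course that teaches undergraduates across majors to understand, use, evaluate, and build AI systems. Through a unified conceptual pipeline, an AI studio, and a cumulative assessment portfolio, the course moves students from descriptive reasoning to technical design.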

[209] Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation

  • arXiv: 2604.09635 (cross-listed)
  • Authors: José Manuel Alcalde-Llergo, Mariana Buenestado Fernández, Carlos Enrique George-Reyes, Andrea Zingoni, Enrique Yeguas-Bolívar
  • Subjects: cs.CY; cs.AI; cs.LG
  • Tags: Education Technology, Text Classification
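  • Summary: This study develops machine learning models to assess students' media and information literacy (MIL) competence in tackling disinformation. Results show that more complex models outperform simpler approaches, and that variables such as academic year and prior training significantly improve predictive accuracy.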

[210] Detecting Corporate AI-Washing via Cross-Modal Semantic Inconsistency Learning

  • arXiv: 2604.09644 (cross-listed)
  • Authors: Zhanjie Wen, Jingqiao Guo
  • Subjects: cs.CY; cs.AI
  • Tags: Multimodal Learning, Information Extraction
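  • Summary: This paper proposes AWASH, a framework that detects corporate AI-washing through cross-modal claim-evidence reasoning. Its CMID network achieves strong performance on a large-scale trimodal benchmark comprising annual-report text, disclosure images, and earnings-call videos.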

[211] Generating High Quality Synthetic Data for Dutch Medical Conversations

  • arXiv: 2604.09645 (cross-listed)
  • Authors: Cecilia Kuan, Aditya Kamlesh Parikh, Henk van den Heuvel
  • Subjects: cs.CL; cs.AI
  • Tags: Data Synthesis, Medical AI, Dialogue System
  • Venue: LREC 2026
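  • Summary: This paper presents a pipeline that generates synthetic Dutch medical conversations with a large language model fine-tuned on Dutch. The generated conversations are assessed with quantitative metrics and qualitative review; the results show that generating synthetic medical conversations is feasible but requires domain knowledge and carefully designed prompts to balance naturalness and structure.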

[212] Efficient Disruption of Criminal Networks through Multi-Objective Genetic Algorithms

  • arXiv: 2604.09647 (cross-listed)
  • Authors: Yehezkiel Darmadi, Thanh Thi Nguyen, Campbell Wilson
  • Subjects: cs.NE; cs.AI
  • Tags: Social Network Analysis, Optimization
  • Venue: CAI 2026
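  • Summary: This study proposes multi-objective genetic algorithms (WS-GA and NSGA-II) for identifying criminal-network disruption strategies that balance maximal fragmentation against minimal operational cost. Results show that the proposed algorithms achieve comparable disruption at substantially lower operational cost.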

[213] WearBCI Dataset: Understanding and Benchmarking Real-World Wearable Brain-Computer Interfaces Signals

  • arXiv: 2604.09649 (cross-listed)
  • Authors: Haoxian Liu, Hengle Jiang, Lanxuan Hong, Xiaomin Ouyang
  • Subjects: cs.HC; cs.AI; eess.SP
  • Tags: Brain-Computer Interface, Multimodal Learning
  • Venue: Sensys 2026
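  • Summary: This paper introduces the WearBCI dataset, the first dataset for comprehensively evaluating wearable brain-computer interface signals under varying motion dynamics, with synchronized multimodal recordings (EEG, IMU, and egocentric video). The dataset supports studies of motion-artifact effects and benchmarking of EEG signal enhancement techniques.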

[214] Dynamic Forecasting and Temporal Feature Evolution of Stock Repurchases in Listed Companies Using Attention-Based Deep Temporal Networks

  • arXiv: 2604.09650 (cross-listed)
  • Authors: Xiang Ao, Jingxuan Zhang, Xinyu Zhao
  • Subjects: q-fin.ST; cs.AI; cs.LG
  • Tags: Time Series Forecasting, Quantitative Finance
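  • Summary: This paper proposes a dynamic early-warning system that integrates economic theory with deep temporal networks, using a TCN and an attention-based LSTM to forecast stock repurchases. The model significantly outperforms static baselines and reveals the temporal dynamics of repurchase decisions: prolonged undervaluation is the motive, while cash-flow surges are the short-term trigger.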

[215] Diffusion-Based Generative Priors for Efficient Beam Alignment in Directional Networks

  • arXiv: 2604.09653 (cross-listed)
  • Authors: Esraa Fahmy Othman, Lina Bariah, Merouane Debbah
  • Subjects: eess.SP; cs.AI
  • Tags: Diffusion Model, Wireless Networks
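  • Summary: This paper reframes beam alignment as a generative task, proposing a conditional diffusion model that learns probabilistic beam priors from compact geometric and multipath features. The method achieves strong ranking performance while maintaining SNR, substantially reducing beam-training overhead.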

[216] NeuroPath: Practically Adopting Motor Imagery Decoding through EEG Signals

  • arXiv: 2604.09654 (cross-listed)
  • Authors: Jiani Cao, Kun Wang, Yang Liu, Zhenjiang Li
  • Subjects: cs.HC; cs.AI; cs.LG; eess.SP
  • Tags: Brain-Computer Interface, Graph Neural Network
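  • Summary: NeuroPath is a neural architecture for robust motor imagery decoding. It features a unified architecture, a spatially aware graph adapter that handles different electrode configurations, and multimodal auxiliary training that improves robustness under low signal-to-noise conditions.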

[217] Fairboard: a quantitative framework for equity assessment of healthcare models

  • arXiv: 2604.09656 (cross-listed)
  • Authors: James K. Ruffle, Samia Mohinta, Chris Foulon, Mohamad Zeina, Zicheng Wang, Sebastian Brandner, Harpreet Hyare, Parashkev Nachev
  • Subjects: cs.LG; cs.AI; stat.AP; stat.ME
  • Tags: Fairness, Medical AI, Image Segmentation
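  • Summary: This paper evaluates the fairness of 18 brain tumor segmentation models across 648 glioma patients and finds that patient identity explains more of the performance variation than model choice does. The authors release Fairboard, an open-source dashboard for monitoring the fairness of medical imaging models.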

[218] Learning noisy phase transition dynamics from stochastic partial differential equations

  • arXiv: 2604.09664 (cross-listed)
  • Authors: Luning Sun, Van Hai Nguyen, Shusen Liu, John Klepeis, Fei Zhou
  • Subjects: cs.AI
  • Tags: Scientific Computing, Physics-Informed Learning
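  • Summary: This paper develops a physics-aware surrogate model for the 3D stochastic Cahn-Hilliard equation, using a flux-level parameterization that guarantees mass conservation and captures thermally activated nucleation. The model generalizes to larger spatial domains and longer time horizons than those seen during training.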

[219] Deliberative Alignment is Deep, but Uncertainty Remains: Inference time safety improvement in reasoning via attribution of unsafe behavior to base model

  • arXiv: 2604.09665 (cross-listed)
  • Authors: Pankayaraj Pathmanathan, Furong Huang
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Alignment, AI Safety
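  • Summary: This paper studies deliberative alignment in LLMs and finds an alignment gap between teacher and student models. The authors propose a best-of-N (BoN) sampling method that attributes unsafe behavior to the base model, achieving significant safety improvements on multiple safety benchmarks.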

[220] Do We Still Need GraphRAG? Benchmarking RAG and GraphRAG for Agentic Search Systems

  • arXiv: 2604.09666 (cross-listed)
  • Authors: Dongzhe Fan, Zheyi Xue, Siyuan Liu, Qiaoyu Tan
  • Subjects: cs.IR; cs.AI
  • Tags: RAG, LLM Agent, LLM Evaluation
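  • Summary: This paper introduces the RAGSearch benchmark for evaluating dense RAG and GraphRAG methods under agentic search. Results show that agentic search substantially improves dense RAG and narrows its gap to GraphRAG, although GraphRAG retains an advantage in complex multi-hop reasoning.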

[221] Digital hybridity and relics in cultural heritage: using corpus linguistics to inform design in emerging technologies from AI to VR

  • arXiv: 2604.09669 (cross-listed)
  • Authors: Emma McClaughlin, Glenn McGarry, Alan Chamberlain, Geert De Wilde, Oliver Butler
  • Subjects: cs.HC; cs.AI; cs.CL; cs.CY; cs.DL; cs.LG
  • Tags: Linguistic Resource, Cultural Heritage
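  • Summary: This paper uses corpus linguistics to study how relics are represented in historical and contemporary texts, to inform the design of hybrid technologies (AI, VR) for cultural heritage. It finds that the perception of relics has shifted from spiritual objects to symbols of heritage.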

[222] Human-like Working Memory Interference in Large Language Models

  • arXiv: 2604.09670 (cross-listed)
  • Authors: Hua-Dong Xiong, Li Ji-An, Jiaqi Huang, Robert C. Wilson, Kwonjoon Lee, Xue-Xin Wei
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Reasoning, Cognitive Science
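  • Summary: This paper studies the working-memory limitations of LLMs and finds that they reproduce human-like interference signatures. It identifies representational interference as the core constraint on working memory, with successful recall depending on interference control, i.e., actively suppressing task-irrelevant content to isolate the target.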

[223] Active Inference with a Self-Prior in the Mirror-Mark Task

  • arXiv: 2604.09673 (cross-listed)
  • Authors: Dongmin Kim, Hoshinori Kanazawa, Yasuo Kuniyoshi
  • Subjects: cs.LG; cs.AI
  • Tags: Cognitive Science, Active Inference
  • Code: code
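  • Summary: This paper proposes a computational model in which mirror self-recognition emerges spontaneously through a self-prior mechanism, without any external reward. Relying only on vision and proprioception, the simulated infant noticed and removed a sticker on its face, seen in the mirror, in about 70% of cases.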

[224] Real-Time Voicemail Detection in Telephony Audio Using Temporal Speech Activity Features

  • arXiv: 2604.09675 (cross-listed)
  • Authors: Kumar Saurav
  • Subjects: cs.SD; cs.AI; cs.LG
  • Tags: Speech Processing
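  • Summary: This paper proposes a lightweight method that extracts temporal features from speech-activity patterns to detect voicemail in real time. The system achieves 96.1% accuracy with 46 ms latency on commodity CPUs and supports 380+ concurrent WebSocket calls.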

[225] A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning

  • arXiv: 2604.09676 (cross-listed)
  • Authors: Ming Lei, Christophe Baehr
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, LLM Reasoning
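  • Summary: This paper presents a comparative theoretical analysis of two entropy-control strategies in reinforcement learning, traditional entropy regularization and covariance-based mechanisms, showing that the traditional approach leads to suboptimal policies while the covariance-based approach is asymptotically unbiased.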

[226] NetAgentBench: A State-Centric Benchmark for Evaluating Agentic Network Configuration

  • arXiv: 2604.09678 (cross-listed)
  • Authors: Ahmed Twabi, Yepeng Ding, Tohru Kondo
  • Subjects: cs.NI; cs.AI; cs.FL
  • Tags: LLM Agent, Benchmark, LLM Evaluation
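  • Summary: This paper proposes NetAgentBench, a dynamic benchmark formalized with finite state machines for evaluating the multi-turn behavior of LLM agents on network configuration tasks; it reveals serious shortcomings of existing agents on expert-level configuration tasks.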

[227] Heterogeneous Consensus-Progressive Reasoning for Efficient Multi-Agent Debate

  • arXiv: 2604.09679 (cross-listed)
  • Authors: Yiqing Liu, Hantao Yao, Wu Liu, Allen He, Yongdong Zhang
  • Subjects: cs.MA; cs.AI
  • Tags: Multi-Agent System, LLM Reasoning
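  • Summary: This paper proposes HCP-MAD, a framework that performs progressive reasoning through heterogeneous consensus verification: simple tasks are resolved quickly through a lightweight two-agent debate, while complex tasks are escalated to a collective vote, substantially reducing token cost while maintaining accuracy.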
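The escalation pattern in this summary (cheap two-agent check first, collective vote only on disagreement) can be sketched as below. This is a simplified illustration, not HCP-MAD: agents are hypothetical callables returning answer strings, "consensus" means exact string equality rather than the paper's heterogeneous verification, and the vote is a plain majority.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical agent: maps a question to an answer string (e.g. a wrapped LLM call).
Agent = Callable[[str], str]


def consensus_progressive(
    question: str, light: Sequence[Agent], panel: Sequence[Agent]
) -> Tuple[str, str]:
    """Return (answer, stage).

    Stage 1: two lightweight agents answer; if they agree, stop there.
    Stage 2: otherwise escalate to a majority vote over the panel's answers
    plus the two stage-1 answers. The panel is only queried on disagreement,
    which is where the token savings come from.
    """
    a, b = light[0](question), light[1](question)
    if a == b:
        return a, "consensus"
    votes = Counter([a, b] + [agent(question) for agent in panel])
    # most_common breaks ties by first occurrence, so stage-1 answers win ties.
    answer, _ = votes.most_common(1)[0]
    return answer, "vote"
```

Because the expensive panel is invoked only when the light agents disagree, easy questions cost two calls while hard ones cost two plus the panel size, mirroring the cost/accuracy trade-off the summary describes.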

[228] Decision-Theoretic Safety Assessment of Persona-Driven Multi-Agent Systems in O-RAN

  • arXiv: 2604.09682 (cross-listed)
  • Authors: Zeinab Nezami, Syed Ali Raza Zaidi, Maryam Hafeez, Louis Powell, Vara Prasad Talari, Mallik Tatipamula
  • Subjects: cs.NI; cs.AI
  • Tags: Multi-Agent System, LLM Agent, Wireless Networks
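  • Summary: This paper proposes a persona-based multi-agent framework for O-RAN network management and develops a decision-theoretic, three-dimensional evaluation framework that systematically verifies agent behavior in terms of normative compliance, guidance alignment, and behavioral dynamics.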

[229] Grid2Matrix: Revealing Digital Agnosia in Vision-Language Models

  • arXiv: 2604.09687 (cross-listed)
  • Authors: Yunkai Zhang, Linda Li, Yingxin Cui, Xiyuan Ruan, Zeyu Zheng, Kezhen Chen, Yi Zhang, Diji Yang
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Benchmark
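  • Summary: This paper introduces the Grid2Matrix benchmark, revealing a "digital agnosia" phenomenon in how vision-language models read precise visual details: the models fail to faithfully express fine-grained grid information that is still preserved in their vision encoders.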

[230] Face Density as a Proxy for Data Complexity: Quantifying the Hardness of Instance Count

  • arXiv: 2604.09689 (cross-listed)
  • Authors: Abolfazl Mohammadi-Seif, Ricardo Baeza-Yates
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Object Detection, Data Selection
  • Venue: IEEE CAI 2026
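  • Summary: Through controlled experiments, this paper shows that face density is an intrinsic dimension of data complexity: model performance decreases monotonically as density increases, and models trained at low density exhibit a systematic undercounting bias in high-density scenes.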

[231] CAGE: Bridging the Accuracy-Aesthetics Gap in Educational Diagrams via Code-Anchored Generative Enhancement

  • arXiv: 2604.09691 (cross-listed)
  • Authors: Dikshant Kukreja, Kshitij Sah, Karan Goyal, Mukesh Mohania, Vikram Goyal
  • Subjects: cs.CV; cs.AI
  • Tags: Text-to-Image, Code Generation, Education Technology
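  • Summary: This paper proposes CAGE, which combines LLM code generation with diffusion models: it first generates a structurally correct diagram via code, then refines its appearance with a ControlNet-conditioned diffusion model, producing aesthetically pleasing educational diagrams while keeping labels correct.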

[232] TaFall: Balance-Informed Fall Detection via Passive Thermal Sensing

  • arXiv: 2604.09693 (cross-listed)
  • Authors: Chengxiao Li, Xie Zhang, Wei Zhu, Yan Jiang, Chenshu Wu
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, IoT
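  • Summary: This paper presents TaFall, a fall detection system based on low-cost thermal array sensing. It models the balance-degradation process and pose-driven biomechanical balance dynamics to enable privacy-preserving fall detection, and achieves a very low false-alarm rate in real-world deployment.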

[233] Assessing Privacy Preservation and Utility in Online Vision-Language Models

  • arXiv: 2604.09695 (cross-listed)
  • Authors: Karmesh Siddharam Chaudhari, Youxiang Zhu, Amy Feng, Xiaohui Liang, Honggang Zhang
  • Subjects: cs.CV; cs.AI
  • Tags: Privacy, Vision-Language Model
  • Venue: IEEE ICC 2026
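  • Summary: This paper examines leakage of personally identifiable information (PII) caused by uploading images to online vision-language models, analyzes direct and indirect privacy exposure risks, and proposes methods that protect privacy while retaining image utility.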

[234] I Can't Believe TTA Is Not Better: When Test-Time Augmentation Hurts Medical Image Classification

  • arXiv: 2604.09697 (cross-listed)
  • Authors: Daniel Nobrega Medeiros
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Data Augmentation
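  • Summary: This paper challenges the common assumption that test-time augmentation (TTA) improves classification accuracy, finding that in medical image classification, standard TTA pipelines consistently reduce accuracy, by up to 31.6 percentage points.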
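For readers unfamiliar with TTA, the sketch below shows the standard pipeline (average class probabilities over augmented views) together with a deliberately contrived toy model whose prediction flips once an out-of-distribution view is averaged in. The toy model and the "flip" augmentation are hypothetical and only illustrate the failure mode in principle; they do not reproduce the paper's experiments.

```python
from typing import Callable, List, Sequence

Probs = List[float]


def tta_predict(model: Callable[[Sequence[float]], Probs],
                x: Sequence[float],
                augmentations: List[Callable[[Sequence[float]], Sequence[float]]]) -> Probs:
    """Standard TTA: run the model on every augmented view and average the probabilities."""
    outputs = [model(aug(x)) for aug in augmentations]
    num_classes = len(outputs[0])
    return [sum(p[c] for p in outputs) / len(outputs) for c in range(num_classes)]


def argmax(p: Probs) -> int:
    return max(range(len(p)), key=p.__getitem__)


def toy_model(x: Sequence[float]) -> Probs:
    # Hypothetical model whose decision depends only on the first feature,
    # so reordering the input moves it out of its training distribution.
    return [x[0], 1.0 - x[0]]


def identity(x):
    return list(x)


def flip(x):
    return list(reversed(x))


x = [0.6, 0.2]
print(argmax(toy_model(x)))                               # 0 without TTA
print(argmax(tta_predict(toy_model, x, [identity, flip])))  # 1 with TTA
```

The point of the example is that averaging only helps when every augmented view is one the model handles well; a view the model was never trained on can outvote the original image.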

[235] Evaluating Scene-based In-Situ Item Labeling for Immersive Conversational Recommendation

  • arXiv: 2604.09698 (cross-listed)
  • Authors: Jiazhou Liang, Yifan Simon Liu, David Guo, Minqi Sun, Yilun Jiang, Scott Sanner
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Vision-Language Model, Human-Computer Interaction
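  • Summary: This paper formalizes the immersive conversational recommender system (ICRS) paradigm, in which recommended items are highlighted directly within the visual environment of the user's scene, and proposes an evaluation method for selecting item labels based on explicit intent satisfaction and proactive information needs.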

[236] Attention-Guided Flow-Matching for Sparse 3D Geological Generation

  • arXiv: 2604.09700 (cross-listed)
  • Authors: Zhixiang Lu, Mengqi Han, Peixin Guo, Tianming Bai, Jionglong Su, Fei Fang, Sifan Song
  • Subjects: cs.CV; cs.AI
  • Tags: Flow Matching, 3D Vision
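  • Summary: This paper proposes 3D-GeoFlow, the first attention-guided continuous flow-matching framework for sparse multimodal geological modeling. It uses 3D attention gating to dynamically propagate local borehole features and builds high-resolution 3D geological models from sparse data.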

[237] Identity-Aware U-Net: Fine-grained Cell Segmentation via Identity-Aware Representation Learning

  • arXiv: 2604.09702 (cross-listed)
  • Authors: Rui Xiao
  • Subjects: cs.CV; cs.AI; q-bio.QM
  • Tags: Image Segmentation, Medical AI
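  • Summary: This paper proposes Identity-Aware U-Net (IAU-Net), which jointly models spatial localization and instance discrimination and uses triplet metric learning to distinguish morphologically similar cell instances, effectively addressing fine-grained segmentation under dense layouts and blurry boundaries.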

[238] The Deployment Gap in AI Media Detection: Platform-Aware and Visually Constrained Adversarial Evaluation

  • arXiv: 2604.09706 (cross-listed)
  • Authors: Aishwarya Budhkar, Trishita Dhara, Siddhesh Sheth
  • Subjects: cs.CV; cs.AI
  • Tags: Deepfake Detection, Adversarial Robustness
  • Venue: CVPR 2026 Workshop
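  • Summary: This paper introduces a platform-aware adversarial evaluation framework that simulates real-world deployment transformations such as compression and screenshots, showing that AI media detectors which perform almost perfectly under lab conditions degrade significantly in real-world scenarios.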

[239] Orthogonal Quadratic Complements for Vision Transformer Feed-Forward Networks

  • arXiv: 2604.09709 (cross-listed)
  • Authors: Wang Zixian
  • Subjects: cs.CV; cs.AI
  • Tags: Vision Transformer, Representation Learning
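  • Summary: This paper proposes Orthogonal Quadratic Complements (OQC), which builds a low-rank quadratic auxiliary branch and projects it onto the orthogonal complement of the main branch, so that auxiliary features contribute only information the main branch does not already capture, thereby improving vision Transformer feed-forward networks.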
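The core geometric step described above, keeping only the part of the auxiliary output that is orthogonal to the main output, can be sketched per feature vector as follows. This is a minimal sketch under assumptions: the rank-r quadratic parametrization `low_rank_quadratic` and the combination rule `oqc_combine` are hypothetical, and the paper may project against a subspace rather than a single main-branch vector.

```python
from typing import List

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def low_rank_quadratic(x: Vector, V: List[Vector], W: List[Vector], U: List[Vector]) -> Vector:
    """Hypothetical rank-r quadratic branch: sum_i (v_i . x) * (w_i . x) * u_i."""
    out = [0.0] * len(U[0])
    for v_i, w_i, u_i in zip(V, W, U):
        h = dot(v_i, x) * dot(w_i, x)  # one quadratic feature of the input
        out = [o + h * u for o, u in zip(out, u_i)]
    return out


def orthogonal_complement(aux: Vector, main: Vector, eps: float = 1e-12) -> Vector:
    """Remove from `aux` its component along `main`: aux - (aux.main / main.main) * main."""
    norm_sq = dot(main, main)
    if norm_sq < eps:  # degenerate main output: nothing to project out
        return list(aux)
    coef = dot(aux, main) / norm_sq
    return [a - coef * m for a, m in zip(aux, main)]


def oqc_combine(main: Vector, aux: Vector, scale: float = 1.0) -> Vector:
    """Add only the orthogonalized auxiliary signal to the main-branch output."""
    aux_perp = orthogonal_complement(aux, main)
    return [m + scale * a for m, a in zip(main, aux_perp)]


x = [1.0, 2.0]
aux = low_rank_quadratic(x, V=[[1.0, 0.0]], W=[[0.0, 1.0]], U=[[1.0, 1.0, 0.0]])
main = [1.0, 0.0, 1.0]  # stand-in for the main FFN output
print(aux, orthogonal_complement(aux, main), oqc_combine(main, aux))
```

Because the projected auxiliary term has zero inner product with the main output, it cannot simply rescale what the main branch already encodes.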

[240] LAST: Leveraging Tools as Hints to Enhance Spatial Reasoning for Multimodal Large Language Models

  • arXiv: 2604.09712 (cross-listed)
  • Authors: Shi-Yu Tian, Zhi Zhou, Kun-Yang Yu, Ming Yang, Yang Chen, Ziqiao Shang, Lan-Zhe Guo, Yu-Feng Li
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Tool Learning, LLM Reasoning
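  • Summary: This paper proposes LAST, which uses an extensible interactive sandbox, LAST-Box, to abstract heterogeneous tool calls into atomic instructions that return multimodal hints the model can consume directly, and strengthens the spatial reasoning of multimodal large language models through a three-stage progressive training strategy.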

[241] Training Deep Visual Networks Beyond Loss and Accuracy Through a Dynamical Systems Approach

  • arXiv: 2604.09716 (cross-listed)
  • Authors: Hai La Quang, Hassan Ugail, Newton Howard, Cong Tran Tien, Nam Vu Hoai, Hung Nguyen Viet
  • Subjects: cs.CV; cs.AI
  • Tags: Deep Learning Theory, Representation Learning, Computer Vision
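  • Summary: This paper proposes a dynamical-systems approach to analyzing the training of deep visual networks. It defines three metrics (an integration score, a metastability score, and a dynamic stability index) to characterize how internal representations change during training, rather than relying solely on loss and accuracy.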

[242] ConfigSpec: Profiling-Based Configuration Selection for Distributed Edge--Cloud Speculative LLM Serving

  • arXiv: 2604.09722 (cross-listed)
  • Authors: Xiangchen Li, Saeid Ghafouri, Jiakun Fan, Babar Ali, Hans Vandierendonck, Dimitrios S. Nikolopoulos
  • Subjects: cs.DC; cs.AI
  • Tags: Speculative Decoding, LLM Inference, Edge Computing
  • Venue: TDIS 2026 Workshop
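  • Summary: This paper presents ConfigSpec, a configuration selection framework for distributed edge-cloud speculative LLM serving. By profiling edge-device performance and draft-target alignment, it evaluates throughput, acceptance rate, and power consumption across the configuration space, revealing structural conflicts among different optimization objectives.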

[243] LOLGORITHM: Funny Comment Generation Agent For Short Videos

  • arXiv: 2604.09729 (cross-listed)
  • Authors: Xuan Ouyang, Senan Wang, Bouzhou Wang, Siyuan Xiahou, Jinrong Zhou, Yuekang Li
  • Subjects: cs.CV; cs.AI
  • Tags: LLM Agent, Text Generation, Multi-Agent System
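  • Summary: This paper presents LOLGORITHM, a modular multi-agent framework for generating stylized comments on short videos. It supports six controllable comment styles, includes modules for video content summarization, classification, and semantic retrieval, and achieves human preference rates above 80% on both YouTube and Douyin datasets.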

[244] SMART: When is it Actually Worth Expanding a Speculative Tree?

  • arXiv: 2604.09731 (cross-listed)
  • Authors: Lifu Wang, Pan Zhou
  • Subjects: cs.DC; cs.AI
  • Tags: Speculative Decoding, LLM Inference
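  • Summary: This paper presents SMART, a system-aware marginal analysis framework for runtime tree construction in tree-based speculative decoding. It decides whether to expand the tree with a hardware-aware marginal benefit-cost rule, achieving additional speedups of 20.0% on MLLMs and 15.4% on LLMs.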

[245] Multi-Frequency Local Plasticity for Visual Representation Learning

  • arXiv: 2604.09734 (cross-listed)
  • Authors: Mehdi Fatan Serj, C. Alejandro Parraga, Xavier Otazu
  • Subjects: cs.CV; cs.AI
  • Tags: Representation Learning, Self-Supervised Learning, Computer Vision
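  • Summary: This paper introduces a modular hierarchical framework that combines fixed multi-frequency Gabor decomposition, competitive learning, associative memory, and top-down modulation to learn visual representations without end-to-end backpropagation, reaching 80.1% accuracy on CIFAR-10.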
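A fixed multi-frequency Gabor decomposition is typically a bank of oriented Gabor kernels at several wavelengths. The sketch below builds such a bank with the standard real-valued Gabor formula; the kernel size, wavelengths, number of orientations, aspect ratio `gamma`, and the common sigma = 0.56 * wavelength heuristic (roughly one-octave bandwidth) are assumptions, not the paper's settings.

```python
import math
from typing import List

Kernel = List[List[float]]


def gabor_kernel(size: int, wavelength: float, theta: float, sigma: float,
                 gamma: float = 0.5, psi: float = 0.0) -> Kernel:
    """Real part of a 2D Gabor kernel on an odd size x size grid centred at the origin."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates so the carrier oscillates along orientation theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            y_rot = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(x_rot ** 2 + (gamma * y_rot) ** 2) / (2 * sigma ** 2))
            carrier = math.cos(2 * math.pi * x_rot / wavelength + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_bank(size: int = 7, wavelengths=(2.0, 4.0, 8.0), num_orientations: int = 4) -> List[Kernel]:
    """Fixed (non-learned) bank: every wavelength paired with every orientation."""
    bank = []
    for lam in wavelengths:
        sigma = 0.56 * lam  # heuristic: about one octave of bandwidth
        for k in range(num_orientations):
            bank.append(gabor_kernel(size, lam, k * math.pi / num_orientations, sigma))
    return bank


bank = gabor_bank()
print(len(bank), len(bank[0]), bank[0][3][3])
```

Convolving an image with each kernel in the bank yields frequency- and orientation-specific feature maps that later, locally trained stages can consume without backpropagating through this layer.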

[246] STaR-DRO: Stateful Tsallis Reweighting for Group-Robust Structured Prediction

  • arXiv: 2604.09737 (cross-listed)
  • Authors: Samah Fodeh, Ganesh Puthiaraju, Elyas Irankhah, Linhai Ma, Srivani Talakokkul, Afshan Khan, Sreeraj Ramachandran, Jordan Alpert, Sarah Schellhorn
  • Subjects: cs.LG; cs.AI
  • Tags: Prompt Engineering, Information Extraction, Medical AI
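  • Summary: This paper proposes a two-stage framework that combines a task-agnostic prompting strategy with STaR-DRO, a robust optimization method, to handle group heterogeneity in structured prediction, significantly improving F1 on difficult semantic decisions in medical text extraction.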

[247] ExecTune: Effective Steering of Black-Box LLMs with Guide Models

  • arXiv: 2604.09741 (cross-listed)
  • Authors: Vijay Lingam, Aditya Golatkar, Anwesan Pal, Ben Vo, Narayanan Sadagopan, Alessandro Achille, Jun Huan, Anoop Deoras, Stefano Soatto
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Inference, LLM Reasoning, Code Generation
  • Venue: ICLR 2026 Workshop
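  • Summary: This paper presents ExecTune, a training method for guide-core policy systems that optimizes policy executability through teacher-guided acceptance sampling, supervised fine-tuning, and structure-aware reinforcement learning, yielding up to 9.2% higher accuracy and 22.4% lower inference cost on mathematical reasoning and code generation tasks.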

[248] MPAC: A Multi-Principal Agent Coordination Protocol for Interoperable Multi-Agent Collaboration

  • arXiv: 2604.09744 (cross-listed)
  • Authors: Kaiyang Qian, Xinmin Fang, Zhengxiong Li
  • Subjects: cs.MA; cs.AI
  • Tags: Multi-Agent System, LLM Agent, LLM Interoperability
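  • Summary: This paper presents MPAC, an application-layer protocol for multi-principal agent collaboration that provides explicit coordination semantics at five layers (session, intent, operation, conflict, and governance), supporting shared-state coordination across independent principals and human-in-the-loop arbitration.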

[249] CONSCIENTIA: Can LLM Agents Learn to Strategize? Emergent Deception and Trust in a Multi-Agent NYC Simulation

  • arXiv: 2604.09746 (cross-listed)
  • Authors: Aarush Sinha, Arion Das, Soumyadeep Nag, Charan Karnati, Shravani Nag, Chandra Vadhan Raj, Aman Chadha, Vinija Jain, Suranjana Trivedy, Amitava Das
  • Subjects: cs.MA; cs.AI; cs.CL
  • Tags: LLM Agent, Multi-Agent System, LLM Alignment
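  • Summary: Through large-scale multi-agent simulation in a simplified model of New York City, this paper studies the emergence of strategic behavior in LLM agents, finding that blue-team and red-team agents exhibit limited selective trust and deceptive strategic behavior under adversarial incentives.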

[250] ADAM: A Systematic Data Extraction Attack on Agent Memory via Adaptive Querying

  • arXiv: 2604.09747 (cross-listed)
  • Authors: Xingyu Lyu, Jianfeng He, Ning Wang, Yidan Hu, Tao Li, Danjue Chen, Shixiong Li, Yimin Chen
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, Privacy, LLM Agent
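  • Summary: This paper presents ADAM, a privacy attack on the memory modules of LLM agents that extracts sensitive information through data-distribution estimation and an entropy-guided querying strategy, achieving attack success rates of up to 100% and exposing serious privacy vulnerabilities in current LLM agents.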

[251] Backdoors in RLVR: Jailbreak Backdoors in LLMs From Verifiable Reward

  • arXiv: 2604.09748 (cross-listed)
  • Authors: Weiyang Guo, Zesheng Shi, Zeen Zhu, Yuan Zhou, Min Zhang, Jing Li
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, Backdoor Detection, LLM Alignment
  • Venue: ACL 2026
  • Code: code
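  • Summary: This paper is the first to identify backdoor vulnerabilities in the reinforcement learning with verifiable rewards (RLVR) framework. It proposes an asymmetric reward backdoor attack that implants a jailbreak backdoor with less than 2% poisoned data, reducing safety performance by 73% on average.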

[252] Conflicts Make Large Reasoning Models Vulnerable to Attacks

  • arXiv: 2604.09750 (cross-listed)
  • Authors: Honghao Liu, Chengjin Xu, Xuhui Jiang, Cehao Yang, Shengming Yin, Zhengwu Ma, Lionel Ni, Jian Guo
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, LLM Reasoning, LLM Alignment
  • Code: code
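  • Summary: This paper studies how large reasoning models respond to harmful queries under conflicting objectives, finding that internal conflicts and dilemmas significantly increase attack success rates; layer- and neuron-level analyses show that safety-related and functional representations overlap and interfere under conflict.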

[253] A-IO: Adaptive Inference Orchestration for Memory-Bound NPUs

  • arXiv: 2604.09752 (cross-listed)
  • Authors: Chen Zhang, Yan Ding, Haotian Wang, Chubo Liu, Keqin Li, Kenli Li
  • Subjects: cs.DC; cs.AI
  • Tags: LLM Inference, DNN Deployment
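  • Summary: This paper addresses the memory-bound challenges of deploying LLMs on heterogeneous NPU platforms. It identifies a "model scaling paradox" caused by static, single-size model deployment and points out the synchronization overhead that fine-grained speculative decoding incurs under NPU computation-graph compilation.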

[254] MedLVR: Latent Visual Reasoning for Reliable Medical Visual Question Answering

  • arXiv: 2604.09757 (cross-listed)
  • Authors: Suyang Xi, Songtao Hu, Yuxiang Lai, Wangyun Dan, Yaqi Liu, Shansong Wang, Xiaofeng Yang
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Vision-Language Model, Question Answering
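  • Summary: This paper presents MedLVR, a latent visual reasoning framework for medical visual question answering that introduces explicit visual evidence states into autoregressive decoding and uses a two-stage training strategy, substantially improving performance on several medical VQA benchmarks.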

[255] GIANTS: Generative Insight Anticipation from Scientific Literature

  • arXiv: 2604.09793 (cross-listed)
  • Authors: Joy He-Yueya, Anikait Singh, Ge Gao, Michael Y. Li, Sherry Yang, Chelsea Finn, Emma Brunskill, Noah D. Goodman
  • Subjects: cs.CL; cs.AI
  • Tags: Scientific Reasoning, Text Generation, Knowledge Synthesis
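  • Summary: This paper introduces the insight anticipation task, which asks a model to predict the core insight of a downstream paper from its foundational papers. It builds the GiantsBench benchmark dataset and trains GIANTS-4B with reinforcement learning, achieving a 34% relative improvement in similarity score over gemini-3-pro.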

[256] Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms

  • arXiv: 2604.09799 (cross-listed)
  • Authors: Mainak Kundu, Catherine Chen, Rifatul Islam, Ismail Uysal, Ria Kanjilal
  • Subjects: cs.LG; cs.AI
  • Tags: Human Activity Recognition, Interpretability
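  • Summary: This paper presents a comprehensive review of explainable human activity recognition methods across wearable, ambient, physiological, and multimodal sensing settings. It introduces a unified perspective that separates the conceptual dimensions of explainability from algorithmic explanation mechanisms and proposes a mechanism-centric taxonomy.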

[257] ACCIDENT: A Benchmark Dataset for Vehicle Accident Detection from Traffic Surveillance Videos

  • arXiv: 2604.09819 (cross-listed)
  • Authors: Lukas Picek, Michal Čermák, Marek Hanzl, Vojtěch Čermák
  • Subjects: cs.CV; cs.AI
  • Tags: Object Detection, Video Understanding, Autonomous Driving
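  • Summary: This paper introduces ACCIDENT, a benchmark dataset for traffic accident detection in CCTV surveillance videos, containing real and synthetic video clips annotated with accident time, spatial location, and collision type. The benchmark defines three core tasks (temporal localization, spatial localization, and collision-type classification) and provides several baseline methods.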

[258] F3G-Avatar : Face Focused Full-body Gaussian Avatar

  • arXiv: 2604.09835 (cross-listed)
  • Authors: Willem Menu, Erkut Akdag, Pedro Quesado, Yasaman Kashefbahrami, Egor Bondarev
  • Subjects: cs.CV; cs.AI
  • Tags: 3D Vision, 3D Reconstruction
  • Venue: CVPR 2026 Workshop
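  • Summary: This paper proposes F3G-Avatar, a full-body Gaussian avatar synthesis method focused on facial detail, with a dual-branch architecture that handles body pose deformation and facial geometric detail separately. Built on 3D Gaussians and differentiable Gaussian splatting, it achieves high-quality rendering on the AvatarReX dataset.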

[259] Is There Knowledge Left to Extract? Evidence of Fragility in Medically Fine-Tuned Vision-Language Models

  • arXiv: 2604.09841 (cross-listed)
  • Authors: Oliver McLaughlin, Daniel Shubin, Carsten Eickhoff, Ritambhara Singh, William Rudman, Michal Golovanevsky
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Medical AI, LLM Evaluation
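  • Summary: This paper evaluates medically fine-tuned vision-language models on four medical imaging tasks and finds that, as task difficulty increases, performance drops to near chance and becomes highly sensitive to the prompt. The authors introduce a description-based pipeline to recover latent knowledge, but the results show that medical VLM performance is fragile and that domain fine-tuning does not reliably improve it.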

[260] RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

  • arXiv: 2604.09860 (cross-listed)
  • Authors: Xuning Yang, Rishit Dagli, Alex Zook, Hugo Hadfield, Ankit Goyal, Stan Birchfield, Fabio Ramos, Jonathan Tremblay
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, LLM Evaluation, Embodied AI
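  • Summary: This paper presents RoboLab, a high-fidelity simulation benchmark framework for evaluating task-generalist robot policies, with 120 tasks spanning three dimensions: visual, procedural, and relational competence. The framework supports both human-authored and LLM-generated scenes and tasks, and enables systematic analysis of real-world policies under controlled perturbations.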

[261] PAS: Estimating the target accuracy before domain adaptation

  • arXiv: 2604.09863 (cross-listed)
  • Authors: Raphaella Diniz, Jackson de Faria, Martin Ester
  • Subjects: cs.CV; cs.AI
  • Tags: Domain Adaptation, Transfer Learning
  • Venue: ICLR 2026
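  • Summary: This paper proposes PAS, a novel score that estimates, before domain adaptation is performed, how transferable a source domain and a pretrained feature extractor are to a target classification task. Experiments show that PAS correlates strongly with actual target accuracy and can effectively guide the choice of the best pretrained model and source domain.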

[262] Automating Structural Analysis Across Multiple Software Platforms Using Large Language Models

  • arXiv: 2604.09866 (cross-listed)
  • Authors: Ziheng Geng, Jiachen Liu, Ian Franklin, Ran Cao, Dan M. Frangopol, Minghui Cheng
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Agent, Multi-Agent System, Code Generation
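  • Summary: This paper develops an LLM system that automates frame structural analysis across multiple finite element analysis platforms (ETABS, SAP2000, OpenSees). Using a two-stage multi-agent architecture, the system achieves over 90% accuracy on 20 representative frame problems.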

[263] Exploring Structural Complexity in Normative RAG with Graph-based approaches: A case study on the ETSI Standards

  • arXiv: 2604.09868 (cross-listed)
  • Authors: Aiman Al Masoud, Marco Arazzi, Simone Germani, Antonino Nocera
  • Subjects: cs.IR; cs.AI; cs.CL
  • Tags: RAG, Knowledge Graph
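  • Summary: This paper studies the effectiveness of graph-based RAG architectures on industrial standards and normative documents, representing information as interconnected nodes to capture the documents' structural and relational characteristics. Experiments on ETSI standards show that incorporating structural and lexical information into the index improves retrieval performance.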

[264] Relational Preference Encoding in Looped Transformer Internal States

  • arXiv: 2604.09870 (cross-listed)
  • Authors: Jan Kirin
  • Subjects: cs.LG; cs.AI
  • Tags: RLHF, LLM Alignment, Interpretability
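  • Summary: This paper studies how looped Transformers encode human preferences in their internal iterative states. By training lightweight evaluation heads on the Anthropic HH-RLHF dataset, it finds that the looped states encode preferences mainly relationally, with a pairwise evaluator reaching 95.2% test accuracy.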

[265] Efficient Personalization of Generative User Interfaces

  • arXiv: 2604.09876 (cross-listed)
  • Authors: Yi-Hao Peng, Samarth Das, Jeffrey P. Bigham, Jason Wu
  • Subjects: cs.LG; cs.AI; cs.CV; cs.HC
  • Tags: UI Generation, LLM Personalization
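  • Summary: This paper studies UI personalization using a dataset of pairwise judgments by 20 designers on 600 generated UIs, finding substantial disagreement among designers. The authors develop a sample-efficient personalization method that represents a new user in terms of previous designers rather than fixed design concepts, and it outperforms baselines when tested on new designers.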

[266] DINO_4D: Semantic-Aware 4D Reconstruction

  • arXiv: 2604.09877 (cross-listed)
  • Authors: Yiru Yang, Zhuojie Wu, Quentin Marguet, Nishant Kumar Singh, Max Schulthess
  • Subjects: cs.CV; cs.AI; cs.RO
  • Tags: 3D Vision, 3D Reconstruction, Vision Transformer
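  • Summary: This paper proposes DINO_4D, a 4D reconstruction method that uses frozen DINOv2 features as a structural prior, injecting semantic awareness into reconstruction to effectively suppress semantic drift in dynamic tracking. It significantly improves tracking accuracy and reconstruction completeness while keeping linear time complexity.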

[267] Not Your Stereo-Typical Estimator: Combining Vision and Language for Volume Perception

  • arXiv: 2604.09886 (cross-listed)
  • Authors: Gautham Vinod, Bruce Coburn, Siddeshwar Raghavan, Fengqing Zhu
  • Subjects: cs.CV; cs.AI; cs.LG; cs.MM; eess.IV
  • Tags: Vision-Language Model, 3D Vision
  • Code: code
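  • Summary: This paper proposes an object volume estimation method that fuses implicit 3D cues from stereo vision with explicit prior knowledge from natural-language text. Experiments show that the text-guided approach significantly outperforms vision-only baselines, demonstrating that simple text priors can effectively guide volume estimation.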

[268] Should We be Pedantic About Reasoning Errors in Machine Translation?

  • arXiv: 2604.09890 (cross-listed)
  • Authors: Calvin Bao, Marine Carpuat
  • Subjects: cs.CL; cs.AI
  • Tags: Machine Translation, LLM Reasoning
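  • Summary: This paper studies reasoning errors in machine translation and develops an automatic annotation protocol that detects three error categories: source-sentence misalignment, model-assumption misalignment, and reasoning-trace misalignment. Experiments show that small corrections to reasoning traces barely affect translation quality, suggesting that reasoning in machine translation has limited faithfulness.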

[269] Diffusion Denoiser Achievable Analysis for Finite Blocklength Unsourced Random Access

  • arXiv: 2604.09904 (cross-listed)
  • Authors: Yuming Han, Yuxin Long
  • Subjects: cs.IT; cs.AI
  • Tags: Diffusion Model, Wireless Networks
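  • Summary: This paper introduces a decoder-compatible diffusion denoiser for finite-blocklength unsourced random access, serving as a lightweight analytical component in joint decoding. The method yields a gain of at least 0.5 dB over existing decoders.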

[270] From UAV Imagery to Agronomic Reasoning: A Multimodal LLM Benchmark for Plant Phenotyping

  • arXiv: 2604.09907 (cross-listed)
  • Authors: Yu Wu, Guangzeng Han, Ibra Niang Niang, Francia Ravelombola, Maiara Oliveira, Jason Davis, Dong Chen, Feng Lin, Xiaolei Huang
  • Subjects: cs.CV; cs.AI; cs.CL
  • Tags: Vision-Language Model, LLM Evaluation, Scientific Reasoning
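  • Summary: This paper develops PlantXpert, a multimodal reasoning benchmark for soybean and cotton phenotyping with 385 images and over 3,000 benchmark samples. An evaluation of 11 VLMs shows that task-specific fine-tuning brings significant gains, but quantitative and biological reasoning remain challenging.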

[271] The Rise and Fall of $G$ in AGI

  • arXiv: 2604.09911 (cross-listed)
  • Authors: David C. Krakauer
  • Subjects: q-bio.NC; cs.AI
  • Tags: LLM Evaluation, Cognitive Science
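  • Summary: This paper connects Spearman's g factor from psychometrics to AGI benchmark performance, applying principal component analysis to 39 models and 14 benchmarks from 2019-2025. The results show that the strong positive manifold weakens with the arrival of reasoning-specialized models, suggesting that AI models are shifting from general intelligence toward specialization.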
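A common way to quantify a "positive manifold" is the share of standardized variance captured by the first principal component of the benchmark correlation matrix. The sketch below computes that share from a models-by-benchmarks score table in pure Python, using power iteration for the top eigenvalue. Whether the paper uses correlation-based PCA, and how it handles missing scores, are assumptions; the sketch also assumes no benchmark column is constant.

```python
import math
from typing import List

Matrix = List[List[float]]


def correlation_matrix(scores: Matrix) -> Matrix:
    """scores: rows = models, columns = benchmarks. Returns the p x p Pearson correlation matrix."""
    n, p = len(scores), len(scores[0])
    z = []
    for j in range(p):
        col = [row[j] for row in scores]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)  # assumes sd > 0
        z.append([(v - mean) / sd for v in col])
    return [[sum(a * b for a, b in zip(z[i], z[j])) / n for j in range(p)] for i in range(p)]


def top_eigenvalue(matrix: Matrix, iters: int = 500) -> float:
    """Power iteration for the largest eigenvalue of a symmetric PSD matrix."""
    p = len(matrix)
    v = [1.0] * p  # all-ones start suits positive-manifold (all-positive) matrices
    lam = 0.0
    for _ in range(iters):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient with the unit vector v
        lam = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam


def g_share(scores: Matrix) -> float:
    """Fraction of total standardized variance explained by the first principal component."""
    corr = correlation_matrix(scores)
    return top_eigenvalue(corr) / len(corr)  # trace of a correlation matrix = p


print(g_share([[1, 2, 3], [2, 4, 6], [3, 6, 9], [4, 8, 12]]))  # perfectly correlated benchmarks
print(g_share([[1, 1], [-1, 1], [1, -1], [-1, -1]]))           # uncorrelated benchmarks
```

A share near 1 means one latent factor explains almost all benchmark variation; a falling share over time is one way to read the weakening manifold described above.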

[272] A Hybrid Intelligent Framework for Uncertainty-Aware Condition Monitoring of Industrial Systems

  • arXiv: 2604.09932 (cross-listed)
  • Authors: Maryam Ahang, Todd Charter, Masoud Jalayer, Homayoun Najjaran
  • Subjects: cs.LG; cs.AI; eess.SP
  • Tags: Predictive Maintenance, Uncertainty Estimation, Physics-Informed Learning
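  • Summary: This paper develops a hybrid condition-monitoring framework that integrates sensor measurements, temporal features, and physics-based residuals, combining them through two strategies: feature-level fusion and model-level ensembling. Experiments on a CSTR benchmark show that both hybrid approaches improve diagnostic accuracy and uncertainty management.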

[273] I Walk the Line: Examining the Role of Gestalt Continuity in Object Binding for Vision Transformers

  • arXiv: 2604.09942 (cross-listed)
  • Authors: Alexa R. Tartaglini, Michael A. Lepori
  • Subjects: cs.CV; cs.AI
  • Tags: Vision Transformer, Interpretability
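  • Summary: This paper investigates whether vision Transformers use the Gestalt principle of continuity for object binding. Using synthetic datasets, the authors show that binding probes are sensitive to continuity, identify specific attention heads that track continuity, and show that these heads help produce representations that encode object binding.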

[274] Cross-Cultural Value Awareness in Large Vision-Language Models

  • arXiv: 2604.09945 (cross-listed)
  • Authors: Phillip Howard, Xin Su, Kathleen C. Fraser
  • Subjects: cs.CV; cs.AI; cs.CL
  • Tags: Vision-Language Model, Fairness, Bias Mitigation
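  • Summary: This paper studies stereotypes in large vision-language models tied to cultural context (e.g., religion, nationality, socioeconomic status). Using counterfactual image sets and Moral Foundations Theory, the authors evaluate the awareness and biases of five LVLMs in cross-cultural value judgments.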

[275] Rebooting Microreboot: Architectural Support for Safe, Parallel Recovery in Microservice Systems

  • arXiv: 2604.09963 (cross-listed)
  • Authors: Laurent Bindschaedler
  • Subjects: cs.DC; cs.AI; cs.SE
  • Tags: Multi-Agent System, LLM Agent
  • Venue: ARCS 2026
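  • Summary: This paper proposes a three-agent architecture for safe microreboots in microservice systems that separates planning from execution and uses a microkernel for validation. The system infers recovery boundaries online and reduces agent-caused damage by 95% in simulation.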

[276] Muon$^2$: Boosting Muon via Adaptive Second-Moment Preconditioning

  • arXiv: 2604.09967 (cross-listed)
  • Authors: Ziyue Liu, Ruijie Zhang, Zhengyang Wang, Yequan Zhao, Yupeng Su, Zi Yang, Zheng Zhang
  • Subjects: cs.LG; cs.AI
  • Tags: Optimization, Pre-training
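  • Summary: This paper proposes the Muon² optimizer, which applies Adam-style adaptive second-moment preconditioning before orthogonalization. This improves the spectral conditioning of the momentum matrix, performs well in GPT and LLaMA pretraining, and reduces the number of Newton-Schulz iterations by 40%.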

[277] A Minimal Model of Representation Collapse: Frustration, Stop-Gradient, and Dynamics

  • arXiv: 2604.09979 (cross-listed)
  • Authors: Louie Hong Yao, Yuhao Li, Shengchao Liu
  • Subjects: cs.AI; cs.LG
  • Tags: Self-Supervised Learning, Representation Learning
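  • Summary: This paper introduces a minimal embedding model for analyzing representation collapse in self-supervised learning. The authors prove that frustrated samples induce collapse and show how adding a projection head and applying stop-gradient stabilize the training dynamics and yield non-collapsed solutions.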

[278] FlowPalm: Optical Flow Driven Non-Rigid Deformation for Geometrically Diverse Palmprint Generation

  • arXiv: 2604.09989 (cross-listed)
  • Authors: Yuchen Zou, Huikai Shao, Lihuang Fang, Zhipeng Xiong, Dexing Zhong
  • Subjects: cs.CV; cs.AI
  • Tags: Diffusion Model, Image Synthesis
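  • Summary: This paper proposes FlowPalm, a framework that uses optical flow to capture statistical patterns of geometric deformation for palmprint generation. It progressively introduces geometric deformation during the diffusion process while preserving identity consistency, and significantly outperforms existing methods on six benchmark datasets.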

[279] Agentic Application in Power Grid Static Analysis: Automatic Code Generation and Error Correction

  • arXiv: 2604.09995 (cross-listed)
  • Authors: Qinjuan Wang, Shan Yang, Yongli Zhu
  • Subjects: eess.SY; cs.AI
  • Tags: LLM Agent, Code Generation
  • Venue: CEEPE 2026
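  • Summary: This paper presents an LLM agent that automates power grid static analysis by translating natural language into MATPOWER scripts. The system uses a three-layer error-correction mechanism and reaches 82.38% accuracy in code fidelity.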

[280] Like a Hammer, It Can Build, It Can Break: Large Language Model Uses, Perceptions, and Adoption in Cybersecurity Operations on Reddit

  • arXiv: 2604.09998 (cross-listed)
  • Authors: Souradip Nath, Chih-Yi Huang, Aditi Ganapathi, Kashyap Thimmaraju, Jaron Mink, Gail-Joon Ahn
  • Subjects: cs.CR; cs.AI
  • Tags: Cybersecurity, LLM Evaluation
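  • Summary: By analyzing discussions in cybersecurity forums on Reddit, this paper studies how security operations center (SOC) practitioners use and perceive LLM tools. It finds that practitioners independently use LLMs for low-risk productivity tasks but have persistent concerns about reliability and security risks.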

[281] Demographic and Linguistic Bias Evaluation in Omnimodal Language Models

  • arXiv: 2604.10014 (cross-listed)
  • Authors: Alaa Elobaid
  • Subjects: cs.CV; cs.AI; cs.CL
  • Tags: Fairness, Vision-Language Model, Bias Mitigation
  • Venue: ICPR 2026
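  • Summary: This paper presents a comprehensive evaluation of demographic and linguistic bias in omnimodal language models across text, image, audio, and video processing. The results show markedly lower performance and larger biases on audio understanding tasks, while image and video understanding tasks fare better.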

[282] FREE-Switch: Frequency-based Dynamic LoRA Switch for Style Transfer

  • arXiv: 2604.10023 (cross-listed)
  • Authors: Shenghe Zheng, Minyu Zhang, Tianhao Liu, Hongzhi Wang
  • Subjects: cs.CV; cs.AI
  • Tags: Diffusion Model, Text-to-Image, Model Compression
  • Venue: CVPR 2026
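  • Summary: This paper proposes a dynamic LoRA switching method based on frequency-domain importance for style transfer that combines pretrained adapters. The framework includes an automatic generation-alignment mechanism that keeps the adapters semantically consistent, effectively lowering the training cost of high-quality customized generation.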

[283] LVSum: A Benchmark for Timestamp-Aware Long Video Summarization

  • arXiv: 2604.10024 (cross-listed)
  • Authors: Alkesh Patel, Melis Ozyildirim, Ying-Chang Cheng, Ganesh Nagarajan
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Video Understanding, Summarization, LLM Evaluation
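  • Summary: This paper presents LVSum, a benchmark for evaluating long-video summarization with fine-grained temporal alignment. It contains diverse long videos from 13 domains, each paired with a human-written summary containing precise timestamp references.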

[284] CoSToM:Causal-oriented Steering for Intrinsic Theory-of-Mind Alignment in Large Language Models

  • arXiv: 2604.10031 (cross-listed)
  • Authors: Mengfan Li, Xuanhua Shi, Yang Deng
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Alignment, Interpretability, Social Reasoning
  • Venue: ACL 2026
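  • Summary: This paper proposes CoSToM, a framework that aligns the theory-of-mind (ToM) capability of LLMs through causal tracing and activation steering. It maps the distribution of internal ToM features and applies targeted steering at key layers, significantly improving human-like social reasoning and dialogue quality.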

[285] Closed-Form Concept Erasure via Double Projections

  • arXiv: 2604.10032 (cross-listed)
  • Authors: Chi Zhang, Jingpu Cheng, Zhixian Wang, Ping Liu
  • Subjects: cs.LG; cs.AI
  • Tags: Diffusion Model, Text-to-Image, AI Safety
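  • Summary: This paper proposes a linear-transformation framework that performs concept erasure in generative models analytically, without training. Through two closed-form projection steps, it achieves deterministic, geometrically interpretable concept removal, matching or surpassing existing methods while preserving non-target concepts.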

[286] Computational Implementation of a Model of Category-Theoretic Metaphor Comprehension

  • arXiv: 2604.10035 (cross-listed)
  • Authors: Fumitaka Iwaki, Miho Fuyama, Hayato Saigo, Tatsuji Takahashi
  • Subjects: cs.CL; cs.AI
  • Tags: Cognitive Science, Natural Language Understanding
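  • Summary: This paper develops a computational implementation of a category-theoretic model of metaphor comprehension. The improved algorithm outperforms existing methods on all three measures: data fit, systematicity, and novelty of comprehension.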

[287] ASPIRin: Action Space Projection for Interactivity-Optimized Reinforcement Learning in Full-Duplex Speech Language Models

  • arXiv: 2604.10065 (cross-listed)
  • Authors: Chi-Yuan Hsiao, Ke-Han Lu, Yu-Kuan Fu, Guan-Ting Lin, Hsiao-Tsung Hung, Hung-yi Lee
  • Subjects: cs.CL; cs.AI; cs.SD; eess.AS
  • Tags: Speech Processing, Reinforcement Learning, Dialogue System
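  • Summary: This paper proposes ASPIRin, a framework that uses action-space projection to decouple timing decisions from content generation in full-duplex speech language models. Using GRPO with rule-based rewards, it optimizes interactivity while preserving semantic coherence and reduces repeated n-grams by more than 50%.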

[288] Graph-RHO: Critical-path-aware Heterogeneous Graph Network for Long-Horizon Flexible Job-Shop Scheduling

  • arXiv: 2604.10073 (cross-listed)
  • Authors: Yujie Li, Jiuniu Wang, Mugen Peng, Guangzuo Li, Wenjia Xu
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Neural Network, Automated Planning, Scheduling
  • Venue: IJCNN 2026
  • Code: code
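  • Summary: This paper proposes Graph-RHO, a framework for long-horizon flexible job-shop scheduling. Using a topology-aware heterogeneous graph network and a critical-path-aware mechanism, it sets a new state of the art on standard benchmarks and shows strong zero-shot generalization.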

[289] MatRes: Zero-Shot Test-Time Model Adaptation for Simultaneous Matching and Restoration

  • arXiv: 2604.10081 (cross-listed)
  • Authors: Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee
  • Subjects: cs.CV; cs.AI
  • Tags: Zero-Shot Learning, Image Enhancement, Transfer Learning
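  • Summary: This paper proposes MatRes, a zero-shot test-time adaptation framework that jointly improves image restoration quality and correspondence estimation by enforcing conditional similarity at corresponding locations, without offline training or additional supervision.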

[290] Degradation-Consistent Paired Training for Robust AI-Generated Image Detection

  • arXiv: 2604.10102 (cross-listed)
  • Authors: Zongyou Yang, Yinghan Hou, Xiaokun Yang
  • Subjects: cs.CV; cs.AI
  • Tags: Deepfake Detection, Adversarial Robustness, Data Augmentation
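  • Summary: This paper proposes DCPT, a training strategy that enforces feature-consistency and prediction-consistency constraints between clean and degraded views, significantly improving the robustness of AI-generated image detectors under real-world image degradations.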

[291] CircuitSynth: Reliable Synthetic Data Generation

  • arXiv: 2604.10114 (cross-listed)
  • Authors: Zehua Cheng, Wei Dai, Jiahao Sun, Thomas Lukasiewicz
  • Subjects: cs.CL; cs.AI
  • Tags: Data Synthesis, Neurosymbolic AI, Knowledge Distillation
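  • Summary: This paper proposes CircuitSynth, a neurosymbolic framework that distills the reasoning of a teacher LLM into a probabilistic sentential decision diagram to generate high-fidelity synthetic data with formal guarantees, achieving 100% schema validity.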

[292] A Dual Cross-Attention Graph Learning Framework For Multimodal MRI-Based Major Depressive Disorder Detection

  • arXiv: 2604.10116 (cross-listed)
  • Authors: Nojod M. Alotaibi, Areej M. Alhothali
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Graph Neural Network, Multimodal Learning
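  • Summary: This paper proposes a dual cross-attention multimodal fusion framework for major depressive disorder detection that explicitly models bidirectional interactions between structural and functional MRI, reaching 84.71% accuracy on the REST-meta-MDD dataset.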

[293] MR-Coupler: Automated Metamorphic Test Generation via Functional Coupling Analysis

  • arXiv: 2604.10126 (cross-listed)
  • Authors: Congying Xu, Hengcheng Zhu, Songqiang Chen, Jiarong Wu, Valerio Terragni, Shing-Chi Cheung
  • Subjects: cs.SE; cs.AI
  • Tags: Software Testing, LLM Agent, Test Generation
  • Venue: FSE 2026
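  • Summary: This paper proposes MR-Coupler, an approach that exploits functional coupling between methods to automatically construct metamorphic relations and generate metamorphic test cases. It produces valid test cases in over 90% of tasks and detects 44% of real bugs.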

[294] VGA-Bench: A Unified Benchmark and Multi-Model Framework for Video Aesthetics and Generation Quality Evaluation

  • arXiv: 2604.10127 (cross-listed)
  • Authors: Longteng Jiang, DanDan Zheng, Qianqian Qiao, Heng Huang, Huaye Wang, Yihang Bo, Bao Peng, Jingdong Chen, Jun Zhou, Xin Jin
  • Subjects: cs.CV; cs.AI
  • Tags: Video Generation, LLM Evaluation, Benchmark
  • Venue: CVPR 2026
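  • Summary: This paper introduces VGA-Bench, a unified benchmark for jointly evaluating video generation quality and aesthetic quality. It contains over 60,000 videos and three dedicated multi-task neural evaluators for aesthetic prediction, tagging, and quality assessment.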

[295] Semantic Manipulation Localization

  • arXiv: 2604.10132 (cross-listed)
  • Authors: Zhenshan Tan, Chenhan Lu, Yuxiang Huang, Ziwen He, Xiang Zhang, Yuzhe Sha, Xianyi Chen, Tianrun Chen, Zhangjie Fu
  • Subjects: cs.CV; cs.AI
  • Tags: Image Editing, Deepfake Detection, Object Detection
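  • Summary: This paper introduces the Semantic Manipulation Localization (SML) task, which focuses on localizing subtle semantic edits that change how an image is interpreted, and proposes TRACE, a framework that models semantic sensitivity through semantic anchoring, perturbation awareness, and semantically constrained reasoning.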

[296] Think in Sentences: Explicit Sentence Boundaries Enhance Language Model's Capabilities

  • arXiv: 2604.10135 (cross-listed)
  • Authors: Zhichen Liu, Yongyuan Li, Yang Xu
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, Prompt Engineering, In-Context Learning
  • Venue: ACL 2026
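  • Summary: This paper proposes inserting delimiters at sentence boundaries in LLM inputs so that the model processes text sentence by sentence, which strengthens its reasoning, with gains of 7.7% on GSM8k and 12.5% on DROP.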
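The input transformation itself is simple to illustrate. The sketch below inserts a delimiter token between sentences using naive punctuation-based splitting; the `<SEP>` token and the regex splitter are hypothetical choices, since the paper's actual delimiter and segmentation method are not stated here.

```python
import re

# Split on whitespace that follows sentence-final punctuation (naive: "Dr. Smith" would also split).
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def insert_sentence_delimiters(text: str, delimiter: str = "<SEP>") -> str:
    """Insert a (hypothetical) delimiter token between consecutive sentences."""
    sentences = [s for s in SENTENCE_END.split(text.strip()) if s]
    return f" {delimiter} ".join(sentences)


print(insert_sentence_delimiters("Tom has 3 apples. He buys 2 more! How many now?"))
```

Because the split requires whitespace after the punctuation, decimals such as "3.5" stay intact; a production version would more likely use a proper sentence segmenter.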

[297] MOSAIC: Multi-Domain Orthogonal Session Adaptive Intent Capture for Prescient Recommendations

  • arXiv: 2604.10147 (cross-listed)
  • Authors: Abderaouf Bahi, Mourad Boughaba, Ibtissem Gasmi, Warda Deghmane, Amel Ourici
  • Subjects: cs.IR; cs.AI
  • Tags: Recommender System, Multi-Domain Learning, Representation Learning
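  • Summary: This paper proposes MOSAIC, a framework that decomposes user preferences into three orthogonal components (domain-specific, domain-shared, and cross-sequence exclusive) to capture intent effectively in multi-domain session-based recommender systems.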

[298] A Temporally Augmented Graph Attention Network for Affordance Classification

  • arXiv: 2604.10149 (cross-listed)
  • Authors: Ami Chopra, Supriya Bordoloi, Shyamanta M. Hazarika
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Neural Network, Brain-Computer Interface, Temporal Learning
  • Venue: GCON 2026
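  • Summary: This paper proposes EEG-tGAT, a temporally augmented graph attention network for affordance classification from EEG interaction sequences, using temporal attention and temporal dropout to improve classification performance.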

[299] Virtual Smart Metering in District Heating Networks via Heterogeneous Spatial-Temporal Graph Neural Networks

  • arXiv: 2604.10166 (cross-listed)
  • Authors: Keivan Faghih Niresi, Christian Møller Jensen, Carsten Skovmose Kallesøe, Rafael Wisniewski, Olga Fink
  • Subjects: cs.LG; cs.AI; eess.SY
  • Tags: Graph Neural Network, Time Series Forecasting, Energy Efficiency
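  • Summary: This paper proposes HSTGNN, a heterogeneous spatio-temporal graph neural network for building virtual smart heat meters in district heating networks, using dedicated branches to learn the graph structure and temporal dynamics of flow, temperature, and pressure measurements.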

[300] Wolkowicz-Styan Upper Bound on the Hessian Eigenspectrum for Cross-Entropy Loss in Nonlinear Smooth Neural Networks

  • arXiv: 2604.10202 (cross-listed)
  • Authors: Yuto Omae, Kazuki Sakai, Yohei Kakimoto, Makoto Sasaki, Yusuke Sakai, Hirotaka Takahashi
  • Subjects: cs.LG; cs.AI; cs.NE
  • Tags: Deep Learning Theory, Optimization, Interpretability
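  • Summary: For nonlinear smooth multilayer neural networks trained with cross-entropy loss, this paper derives a closed-form upper bound on the largest eigenvalue of the Hessian, providing an analytic characterization of loss sharpness.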
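The bound named in the title is the classical Wolkowicz-Styan trace bound: for an n x n matrix A with real eigenvalues, with m = tr(A)/n and s² = tr(A²)/n - m², the largest eigenvalue satisfies λ_max ≤ m + s·sqrt(n - 1). The sketch below evaluates this generic bound on small explicit matrices; it is not the paper's closed-form expression for cross-entropy Hessians, which plugs network-specific traces into the same inequality.

```python
import math
from typing import List

Matrix = List[List[float]]


def wolkowicz_styan_upper(matrix: Matrix) -> float:
    """Wolkowicz-Styan upper bound on the largest eigenvalue of a matrix with real eigenvalues.

    lambda_max <= m + s * sqrt(n - 1), where m = tr(A)/n and s^2 = tr(A^2)/n - m^2.
    """
    n = len(matrix)
    trace = sum(matrix[i][i] for i in range(n))
    # tr(A^2) = sum_i sum_k A[i][k] * A[k][i], computed without forming A^2
    trace_sq = sum(matrix[i][k] * matrix[k][i] for i in range(n) for k in range(n))
    m = trace / n
    s = math.sqrt(max(trace_sq / n - m * m, 0.0))  # clamp tiny negative round-off
    return m + s * math.sqrt(n - 1)


print(wolkowicz_styan_upper([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]]))  # true max: 3
print(wolkowicz_styan_upper([[2.0, 1.0], [1.0, 2.0]]))  # true max: 3 (bound is tight for n = 2)
```

The appeal for sharpness analysis is that only tr(A) and tr(A²) are needed, both of which can be expressed in closed form without computing any eigenvalues.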

[301] Exploring the impact of fairness-aware criteria in AutoML

  • arXiv: 2604.10224 (cross-listed)
  • Authors: Joana Simões, João Correia
  • Subjects: cs.LG; cs.AI
  • Tags: Fairness, AutoML, Bias Mitigation
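  • Summary: This paper studies the effect of integrating fairness metrics into the optimization component of AutoML frameworks: at the cost of a 9.4% drop in predictive performance, average fairness improves by 14.5% and data usage falls by 35.7%.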

[302] Adapting 2D Multi-Modal Large Language Model for 3D CT Image Analysis

  • arXiv: 2604.10233 (cross-listed)
  • Authors: Yang Yu, Dunyuan Xu, Yaoqian Li, Xiaomeng Li, Jinpeng Li, Pheng-Ann Heng
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Vision-Language Model, Multimodal Learning
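  • Summary: This paper proposes adapting 2D multimodal LLMs to 3D medical image analysis, designing a text-guided hierarchical MoE framework that lets the vision encoder extract image features tailored to medical report generation and medical visual question answering.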

[303] FashionMV: Product-Level Composed Image Retrieval with Multi-View Fashion Data

  • arXiv: 2604.10297 (cross-listed)
  • Authors: Peng Yuan, Bingyin Mei, Hui Zhang
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Information Retrieval, Multimodal Learning
  • Code: code
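  • Summary: This paper defines the multi-view composed image retrieval task, builds the FashionMV dataset, and proposes ProCIR, a framework that performs product-level retrieval through a two-stage dialogue, alignment, and chain-of-thought mechanisms.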

[304] From Helpful to Trustworthy: LLM Agents for Pair Programming

  • arXiv: 2604.10300 (cross-listed)
  • Authors: Ragib Shahariar Ayon
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Agent, Code Generation, Software Testing
  • Venue: FSE Companion 2026
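  • Summary: This paper proposes a research agenda on multi-agent LLM pair-programming workflows that externalize intent and iteratively validate with development tools to build trustworthy programming assistants, covering three research directions: requirements specification, test implementation, and maintenance tasks.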

[305] Class-Adaptive Cooperative Perception for Multi-Class LiDAR-based 3D Object Detection in V2X Systems

  • arXiv: 2604.10305 (cross-listed)
  • Authors: Blessing Agyei Kyem, Joshua Kofi Asamoah, Armstrong Aboah
  • Subjects: cs.CV; cs.AI; cs.ET
  • Tags: Object Detection, Autonomous Driving, 3D Vision
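  • Summary: This paper proposes a class-adaptive cooperative perception architecture for multi-class 3D object detection in V2X systems. The model integrates multi-scale window attention, class-specific fusion modules, and class-balanced objective weighting to handle large and small objects with different geometries, achieving notable improvements on the V2X-Real benchmark.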

[306] Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion

  • arXiv: 2604.10326 (cross-listed)
  • Authors: Vishal Pramanik, Maisha Maliha, Susmit Jha, Sumit Kumar Jha
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, Adversarial Robustness, LLM Alignment
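  • Summary: This paper proposes Head-Masked Nullspace Steering (HMNS), a circuit-level intervention for jailbreaking LLMs. It identifies causal attention heads, suppresses their write paths via column masking, and injects perturbations in the orthogonal complement space, achieving state-of-the-art attack success rates on several jailbreak benchmarks.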

[307] A Diffusion-Contrastive Graph Neural Network with Virtual Nodes for Wind Nowcasting in Unobserved Regions

  • arXiv: 2604.10328 (cross-listed)
  • Authors: Jie Shi, Siamak Mehrkanoon
  • Subjects: cs.LG; cs.AI
  • Tags: Weather Forecasting, Graph Neural Network, Self-Supervised Learning
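  • Summary: This paper proposes a deep graph self-supervised framework that introduces virtual nodes into a diffusion-contrastive graph neural network to nowcast wind in unobserved regions. The method reduces the mean absolute error of wind speed, gust, and wind direction predictions by 30-46%.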

[308] Multinex: Lightweight Low-light Image Enhancement via Multi-prior Retinex

  • arXiv: 2604.10359 (cross-listed)
  • Authors: Alexandru Brateanu, Tingting Mu, Codruta Ancuti, Cosmin Ancuti
  • Subjects: cs.CV; cs.AI
  • Tags: Image Enhancement, Low Power, Edge Computing
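  • Summary: This paper proposes Multinex, an ultra-lightweight low-light image enhancement framework that integrates multiple fine-grained representations through a Retinex residual formulation. With only 45K or 0.7K parameters, the models match the performance of much larger models at a substantially lower computational cost.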

[309] FishRoPE: Projective Rotary Position Embeddings for Omnidirectional Visual Perception

  • arXiv: 2604.10391 (cross-listed)
  • Authors: Rahul Ahuja, Mudit Jain, Bala Murali Manoghar Sai Sudhakar, Venkatraman Narayanan, Pratik Likhar, Varun Ravi Kumar, Senthil Yogamani
  • Subjects: cs.CV; cs.AI
  • Tags: Object Detection, Vision Transformer, Autonomous Driving
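  • Summary: This paper proposes FishRoPE, a lightweight framework that adapts frozen vision foundation models to fisheye geometry through LoRA and rotary position embeddings defined in spherical coordinates. The method achieves state-of-the-art results on WoodScape 2D detection and SynWoodScapes BEV segmentation.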
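
For readers unfamiliar with rotary position embeddings, here is a minimal sketch of the standard one-dimensional RoPE that FishRoPE builds on. It is not the paper's spherical-coordinate variant; the `base=10000.0` default and the toy vectors `q` and `k` are illustrative assumptions. It shows the key property: the query-key dot product depends only on the relative offset between positions.

```python
import math

def rope(x, position, base=10000.0):
    """Standard (1D) rotary position embedding.

    Rotates each consecutive pair (x[2i], x[2i+1]) by the angle
    position * base**(-2i/d), where d = len(x).
    """
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even dimension")
    out = []
    for i in range(0, d, 2):  # i == 2 * pair_index
        theta = position * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s, x[i] * s + x[i + 1] * c])
    return out

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

# Hypothetical query/key vectors for illustration.
q = [1.0, 0.0, 0.5, -0.5]
k = [0.3, 0.7, -1.0, 0.2]
# The attention logit depends only on the relative offset (5 - 3 == 2 - 0).
print(dot(rope(q, 5), rope(k, 3)), dot(rope(q, 2), rope(k, 0)))
```

Because each pair is rotated rather than shifted, vector norms are preserved, which is why RoPE can be applied to frozen features without rescaling them.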

[310] Intent-aligned Formal Specification Synthesis via Traceable Refinement

  • arXiv: 2604.10392 (cross-listed)
  • Authors: Zhe Ye, Aidan Z.H. Yang, Huangyuan Su, Zhenyu Liao, Samuel Tenka, Zhizhen Qin, Udaya Ghai, Dawn Song, Soonho Kong
  • Subjects: cs.LG; cs.AI; cs.LO; cs.PL; cs.SE
  • Tags: Formal Methods, Code Generation, LLM Reasoning
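  • Summary: This paper proposes VeriSpecGen, a framework that synthesizes intent-aligned Lean formal specifications through requirement-level attribution and localized repair. The method reaches 86.6% accuracy on the VERINA SpecGen task and generates 343K training samples, which improve specification synthesis performance by 62-106%.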

[311] Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation

  • arXiv: 2604.10397 (cross-listed)
  • Authors: Yuanhao Luo, Di Wen, Kunyu Peng, Ruiping Liu, Junwei Zheng, Yufan Chen, Jiale Wei, Rainer Stiefelhagen
  • Subjects: cs.CV; cs.AI
  • Tags: Video Understanding, Object Detection, Action Recognition
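  • Summary: This paper introduces DETAnt-HOI, a temporally corrected benchmark, and HOI-DA, a framework that jointly performs subject-object localization, current HOI detection, and future HOI anticipation. Experiments show consistent improvements on both detection and anticipation, with larger gains at longer anticipation horizons.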

[312] IMPACT: A Dataset for Multi-Granularity Human Procedural Action Understanding in Industrial Assembly

  • arXiv: 2604.10409 (cross-listed)
  • Authors: Di Wen, Zeyun Zhong, David Schneider, Manuel Zaremski, Linus Kunzmann, Yitian Shi, Ruiping Liu, Yufan Chen, Junwei Zheng, Jiahang Li, Jonas Hemmerich, Qiyi Tong, Patric Grauberger, Arash Ajoudani, Danda Pani Paudel, Sven Matthiesen, Barbara Deml, Jürgen Beyerer, Luc Van Gool, Rainer Stiefelhagen, Kunyu Peng
  • Subjects: cs.CV; cs.AI
  • Tags: Benchmark, Action Recognition, Human Activity Recognition
  • Code: code
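  • Summary: This paper introduces IMPACT, a multi-view RGB-D dataset for industrial procedural understanding that captures real assembly and disassembly of an angle grinder. The dataset comprises 112 trials and provides bimanual annotations, compliance-aware state tracking, and anomaly-recovery supervision.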

[313] CodaRAG: Connecting the Dots with Associativity Inspired by Complementary Learning

  • arXiv: 2604.10426 (cross-listed)
  • Authors: Cheng-Yen Li, Xuanjun Chen, Claire Lin, Wei-Yu Chen, Wenhua Nie, Hung-Yi Lee, Jyh-Shing Roger Jang
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, LLM Reasoning, Knowledge Management
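  • Summary: This paper proposes CodaRAG, a framework inspired by complementary learning systems that turns retrieval from passive lookup into active associative discovery through three stages: knowledge consolidation, associative navigation, and interference elimination. On GraphRAG-Bench it achieves absolute gains of 7-10% in retrieval recall and 3-11% in generation accuracy.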

[314] A Queueing-Theoretic Framework for Dynamic Attack Surfaces: Data-Integrated Risk Analysis and Adaptive Defense

  • arXiv: 2604.10427 (cross-listed)
  • Authors: Jihyeon Yun, Abdullah Yasin Etcibasi, Ming Shi, C. Emre Koksal
  • Subjects: cs.CR; cs.AI; cs.LG; eess.SY; math.OC
  • Tags: Cybersecurity, Reinforcement Learning, AI Safety
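  • Summary: This paper develops a queueing-theoretic framework that models the temporal evolution of cyber attack surfaces and proposes a reinforcement-learning-based adaptive defense algorithm. Experiments on the ARVO dataset show that the method reduces the number of active vulnerabilities in the software supply chain by more than 90%.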

[315] Towards Green Wearable Computing: A Physics-Aware Spiking Neural Network for Energy-Efficient IMU-based Human Activity Recognition

  • arXiv: 2604.10458 (cross-listed)
  • Authors: Naichuan Zheng, Hailun Xia, Zepeng Sun, Weiyi Li, Yinze Zhou
  • Subjects: cs.LG; cs.AI
  • Tags: Human Activity Recognition, Neuromorphic Computing, Energy Efficiency
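  • Summary: This paper proposes PAS-Net, a physics-aware spiking neural network for human activity recognition on wearable devices. The architecture achieves state-of-the-art accuracy through an adaptive symmetric topology mixer and a causal neuromodulator, and an early-exit mechanism reduces dynamic energy consumption by up to 98%.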
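
Early exit saves energy by stopping inference once an intermediate classifier is confident enough. The sketch below shows the generic confidence-threshold form of that idea; the `stages` callables, the `threshold=0.9` value, and the softmax-confidence criterion are illustrative assumptions, not PAS-Net's actual exit rule.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_predict(x, stages, threshold=0.9):
    """Run hypothetical `stages` in order; each maps features h -> (h, logits).

    Returns (predicted_class, stages_run) as soon as the max softmax
    probability reaches `threshold`; otherwise uses the final stage.
    """
    if not stages:
        raise ValueError("need at least one stage")
    h = x
    for depth, stage in enumerate(stages, start=1):
        h, logits = stage(h)
        probs = softmax(logits)
        conf = max(probs)
        if conf >= threshold or depth == len(stages):
            return probs.index(conf), depth

# Toy stages: the first is unsure, the second is confident, the third never runs.
stages = [
    lambda h: (h, [0.2, 0.1]),
    lambda h: (h, [4.0, 0.0]),
    lambda h: (h, [0.0, 9.0]),
]
print(early_exit_predict(None, stages))
```

The saving comes from the stages that never run; the threshold trades accuracy against compute.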

[316] Toward Accountable AI-Generated Content on Social Platforms: Steganographic Attribution and Multimodal Harm Detection

  • arXiv: 2604.10460 (cross-listed)
  • Authors: Xinlei Guan, David Arosemena, Tejaswi Dhandu, Kuan Huang, Meng Xu, Miles Q. Li, Bingyu Shen, Ruiyang Qin, Umamaheswara Rao Tida, Boyang Li
  • Subjects: cs.CV; cs.AI; cs.CR; cs.ET
  • Tags: Content Moderation, Deepfake Detection, Multimodal Learning
  • Code: code
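  • Summary: This paper introduces a steganographic attribution framework that embeds cryptographically signed identifiers into images at creation time and uses multimodal harmful-content detection to trigger attribution verification. The system's multimodal fusion detector reaches an AUC-ROC of 0.99, enabling reliable cross-modal attribution verification.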

[317] Rethinking the Diffusion Model from a Langevin Perspective

  • arXiv: 2604.10465 (cross-listed)
  • Authors: Candi Zheng, Yuan Lan
  • Subjects: cs.LG; cs.AI; cs.CV
  • Tags: Diffusion Model, Deep Learning Theory
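  • Summary: This paper systematically reframes diffusion models from a Langevin perspective, offering an intuitive explanation of how the reverse process inverts the forward process. It also discusses a unified framework for ODE- and SDE-based diffusion models and the theoretical advantages of diffusion models over VAEs.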
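
As a concrete anchor for the Langevin view, here is a minimal sketch of unadjusted Langevin dynamics, x ← x + ε·∇log p(x) + √(2ε)·z, sampling a 1D Gaussian whose score is known in closed form. In a diffusion model the score would instead be a learned network evaluated across noise levels; the target N(2, 0.5²), step size, and step count here are illustrative assumptions.

```python
import math
import random

def langevin_sample(score, x0, step, n_steps, rng):
    """Unadjusted Langevin dynamics: x <- x + step * score(x) + sqrt(2 * step) * z."""
    x = x0
    for _ in range(n_steps):
        x = x + step * score(x) + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
    return x

# Target N(mu, sigma^2); its score is d/dx log p(x) = -(x - mu) / sigma^2.
mu, sigma = 2.0, 0.5
gauss_score = lambda x: -(x - mu) / sigma**2

rng = random.Random(0)
samples = [langevin_sample(gauss_score, 0.0, 0.01, 400, rng) for _ in range(1000)]
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(f"mean={mean:.3f} var={var:.3f}")  # close to mu and sigma^2
```

The finite step size leaves a small bias (the stationary variance here is about 0.255 rather than 0.25), which is one reason samplers anneal the step size or add a correction step.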

[318] From Query to Counsel: Structured Reasoning with a Multi-Agent Framework and Dataset for Legal Consultation

  • arXiv: 2604.10470 (cross-listed)
  • Authors: Mingfei Lu, Yi Zhang, Mengjia Wu, Yue Feng
  • Subjects: cs.CL; cs.AI
  • Tags: Legal AI, Multi-Agent System, LLM Reasoning
  • Venue: ACL 2026
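  • Summary: This paper builds JurisCQAD, a dataset of more than 43,000 real-world Chinese legal consultation queries, and proposes JurisMA, a multi-agent framework for legal consultation question answering. The system significantly outperforms general-purpose and legal-domain LLMs on multiple lexical and semantic metrics.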

[319] UDAPose: Unsupervised Domain Adaptation for Low-Light Human Pose Estimation

  • arXiv: 2604.10485 (cross-listed)
  • Authors: Haopeng Chen, Yihao Ai, Kabeen Kim, Robby T. Tan, Yixin Chen, Bo Wang
  • Subjects: cs.CV; cs.AI
  • Tags: Pose Estimation, Domain Adaptation, Computer Vision
  • Venue: CVPR 2026
  • Code: code
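  • Summary: This paper proposes UDAPose, an unsupervised domain adaptation framework for low-light human pose estimation that synthesizes low-light images and dynamically fuses visual cues with pose priors. It achieves a 10.1 AP improvement on the ExLPose-test hard split.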

[320] Cross-Cultural Bias in Mel-Scale Representations: Evidence and Alternatives from Speech and Music

  • arXiv: 2604.10503 (cross-listed)
  • Authors: Shivam Chauhan, Ajay Pundhir
  • Subjects: cs.SD; cs.AI
  • Tags: Speech Processing, Bias Mitigation, Fairness
  • Venue: ICASSP 2026
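  • Summary: This paper evaluates cross-cultural bias in mel-scale audio representations and compares them with learnable alternatives on speech recognition and music analysis tasks. It finds that mel-scale features show performance gaps on tonal languages and non-Western music, and that alternatives such as LEAF and CQT substantially narrow these gaps.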
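
To see why a mel front end allocates resolution unevenly, the sketch below uses the common HTK-style formula m = 2595·log10(1 + f/700) (other variants, such as Slaney's, exist, and the paper's exact choice is not stated here) and computes band edges that are equally spaced in mel. The 0-8000 Hz range and 8 bands are illustrative assumptions.

```python
import math

def hz_to_mel(f_hz):
    """HTK-style mel scale: m = 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_band_edges(f_min, f_max, n_bands):
    """Band edges equally spaced on the mel scale, returned in Hz."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    return [mel_to_hz(lo + (hi - lo) * i / n_bands) for i in range(n_bands + 1)]

edges = mel_band_edges(0.0, 8000.0, 8)
widths = [b - a for a, b in zip(edges, edges[1:])]
for a, w in zip(edges, widths):
    print(f"band starting at {a:7.1f} Hz is {w:7.1f} Hz wide")
```

Bands widen steadily with frequency, so fine pitch distinctions in the upper range are merged into a single band; fixed warping of this kind is what learnable front ends such as LEAF replace.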

[321] How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks

  • arXiv: 2604.10508 (cross-listed)
  • Authors: Johin Johny Arimbur
  • Subjects: cs.SE; cs.AI
  • Tags: Code Generation, LLM Reasoning
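  • Summary: This paper studies iterative self-repair in LLM code generation, in which execution errors are fed back to the model for correction. Experiments show that self-repair generally improves pass rates and that modern instruction-tuned models can self-repair successfully with prompting alone, even at the 8B scale.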
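
The self-repair loop described above can be sketched as: generate code, execute it against tests, and on failure feed the error text back into the next generation. In this minimal sketch, `generate(task, feedback)` is a hypothetical stand-in for an LLM call, and the feedback wording and `max_attempts=3` default are assumptions rather than the paper's protocol.

```python
import traceback

def run_candidate(source, tests):
    """Execute candidate code plus tests; return None on success, else the error text.

    Note: exec runs model output with no sandbox. This is for illustration only.
    """
    namespace = {}
    try:
        exec(source, namespace)
        exec(tests, namespace)
        return None
    except Exception:
        return traceback.format_exc(limit=1)

def self_repair(generate, task, tests, max_attempts=3):
    """Return (passing_source, attempts_used), or (None, max_attempts) if all fail."""
    feedback = None  # no feedback on the first attempt
    for attempt in range(1, max_attempts + 1):
        source = generate(task, feedback)
        error = run_candidate(source, tests)
        if error is None:
            return source, attempt
        feedback = f"Your code failed with:\n{error}\nPlease fix it."
    return None, max_attempts

# Hypothetical model: buggy on the first try, fixed once it sees feedback.
def fake_model(task, feedback):
    if feedback is None:
        return "def add(a, b):\n    return a - b\n"
    return "def add(a, b):\n    return a + b\n"

code, attempts = self_repair(fake_model, "write add(a, b)", "assert add(2, 3) == 5")
print(attempts)
```

Counting `attempts` per task is what lets one ask the paper's question of how many tries a model needs.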

[322] Data-Efficient Surgical Phase Segmentation in Small-Incision Cataract Surgery: A Controlled Study of Vision Foundation Models

  • arXiv: 2604.10514 (cross-listed)
  • Authors: Lincoln Spencer, Song Wang, Chen Chen
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Video Understanding, Transfer Learning
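  • Summary: This paper studies data-efficient surgical phase segmentation for small-incision cataract surgery. Comparing vision encoders, it finds that foundation models (e.g., DINOv3) perform best in low-label medical video settings, demonstrating strong transfer of modern vision foundation models to surgical workflow understanding.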

[323] ReFEree: Reference-Free and Fine-Grained Method for Evaluating Factual Consistency in Real-World Code Summarization

  • arXiv: 2604.10520 (cross-listed)
  • Authors: Suyoung Bae, CheolWon Na, Jaehoon Lee, Yumin Lee, YunSeok Choi, Jee-Hyong Lee
  • Subjects: cs.CL; cs.AI; cs.PL
  • Tags: Code Generation, LLM Evaluation, Software Documentation
  • Venue: ACL 2026
  • Code: code
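  • Summary: This paper proposes ReFEree, a reference-free, fine-grained method for evaluating factual consistency in code summarization. Compared against 13 baselines, it achieves the highest correlation with human judgments, improving on the previous best method by 15-18%.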

[324] STORM: End-to-End Referring Multi-Object Tracking in Videos

  • arXiv: 2604.10527 (cross-listed)
  • Authors: Zijia Lu, Jingru Yi, Jue Wang, Yuxiao Chen, Junwen Chen, Xinyu Li, Davide Modolo
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Video Understanding, Object Detection
  • Venue: CVPR 2026 Findings
  • Code: code
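  • Summary: This paper proposes STORM, an end-to-end multimodal large language model for referring multi-object tracking (RMOT). The method achieves state-of-the-art performance on image grounding, single-object tracking, and RMOT benchmarks, and the authors release a new dataset, STORM-Bench.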

[325] AI Patents in the United States and China: Measurement, Organization, and Knowledge Flows

  • arXiv: 2604.10529 (cross-listed)
  • Authors: Hanming Fang, Xian Gu, Hanyin Yan, Wu Zhu
  • Subjects: econ.GN; cs.AI; cs.CL; q-fin.GN
  • Tags: Benchmark, Patent Analysis
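  • Summary: This paper develops a high-precision classifier for identifying AI patents and applies it to patents in the United States and China. It finds that AI patenting has grown rapidly in both countries but is organized very differently, and that cross-border citations indicate technological interdependence rather than decoupling.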

[326] Towards an Appropriate Level of Reliance on AI: A Preliminary Reliance-Control Framework for AI in Software Engineering

  • arXiv: 2604.10530 (cross-listed)
  • Authors: Samuel Ferino, Rashina Hoda, John Grundy, Christoph Treude
  • Subjects: cs.SE; cs.AI; cs.HC
  • Tags: Human-Computer Interaction, LLM Evaluation
  • Venue: FSE 2026 Workshop
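  • Summary: Based on interviews with software developers, this paper proposes a preliminary reliance-control framework for identifying over-reliance and under-reliance on AI. The framework is intended to promote responsible and effective use of AI tools.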

[327] PepBenchmark: A Standardized Benchmark for Peptide Machine Learning

  • arXiv: 2604.10531 (cross-listed)
  • Authors: Jiahui Zhang, Rouyi Wang, Kuangqi Zhou, Tianshu Xiao, Lingyan Zhu, Yaosen Min, Yang Wang
  • Subjects: cs.LG; cs.AI
  • Tags: Benchmark, Drug Discovery, Molecular Generation
  • Code: code
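  • Summary: This paper proposes PepBenchmark, a standardized benchmark for peptide machine learning that unifies datasets, preprocessing, and evaluation protocols. Comprising 35 datasets, it provides the first standardized foundation for peptide drug discovery.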

[328] Machine Learning-Based Detection of MCP Attacks

  • arXiv: 2604.10534 (cross-listed)
  • Authors: Tobias Mattsson, Samuel Nyberg, Anton Borg, Ricardo Britto
  • Subjects: cs.CR; cs.AI; cs.SE
  • Tags: LLM Security, Cybersecurity
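  • Summary: This paper develops several machine learning methods for detecting malicious MCP (Model Context Protocol) tool descriptions, reaching a 100% F1 score on the binary classification task. The authors also build middleware for deployment in real-time environments, which outperforms traditional rule-based solutions.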

[329] IceCache: Memory-efficient KV-cache Management for Long-Sequence LLMs

  • arXiv: 2604.10539 (cross-listed)
  • Authors: Yuzhen Mao, Qitong Wang, Martin Ester, Ke Li
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Inference, Long Context
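  • Summary: This paper proposes IceCache, a memory-efficient KV-cache management strategy for long-sequence LLMs that combines semantic token clustering with PagedAttention. It retains 99% of the original accuracy with a KV-cache budget of only 25% of the tokens.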

[330] VidAudio-Bench: Benchmarking V2A and VT2A Generation across Four Audio Categories

  • arXiv: 2604.10542 (cross-listed)
  • Authors: Qian Zhang, Yuqin Cao, Yixuan Gao, Xiongkuo Min
  • Subjects: cs.SD; cs.AI
  • Tags: Audio Generation, Video Understanding, Benchmark
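  • Summary: This paper proposes VidAudio-Bench, a multi-task benchmark for evaluating video-to-audio generation across four audio categories. The benchmark reveals that current V2A models perform poorly on speech and singing, and it uncovers a fundamental tension between visual conditioning and instruction following.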

[331] WaveMoE: A Wavelet-Enhanced Mixture-of-Experts Foundation Model for Time Series Forecasting

  • arXiv: 2604.10544 (cross-listed)
  • Authors: Shunyu Wu, Jiawei Huang, Weibin Feng, Boxin Li, Xiao Zhang, Erli Meng, Dan Li, Jian Lou, See-Kiong Ng
  • Subjects: cs.LG; cs.AI
  • Tags: Time Series Forecasting, Mixture-of-Experts
  • Venue: ICLR 2026 Workshop
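  • Summary: This paper proposes WaveMoE, a wavelet-enhanced mixture-of-experts foundation model for time series forecasting. The model uses a dual-path architecture that jointly processes time-series tokens and wavelet tokens, and it shows improved forecasting performance on 16 benchmark datasets.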

[332] LLMs Should Incorporate Explicit Mechanisms for Human Empathy

  • arXiv: 2604.10557 (cross-listed)
  • Authors: Xiaoxing You, Qiang Huang, Jun Yu
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Alignment, Affective Computing
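  • Summary: This paper argues that LLMs should incorporate explicit mechanisms for human empathy and identifies four mechanisms by which empathy fails. The authors formalize empathy as an observable behavioral property and recommend treating empathy-aware objectives as first-class components of LLM development.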

[333] Early Decisions Matter: Proximity Bias and Initial Trajectory Shaping in Non-Autoregressive Diffusion Language Models

  • arXiv: 2604.10567 (cross-listed)
  • Authors: Jiyeon Kim, Sungik Choi, Yongrae Jo, Moontae Lee, Minjoon Seo
  • Subjects: cs.CL; cs.AI
  • Tags: Diffusion Model, LLM Inference, LLM Reasoning
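  • Summary: This paper studies non-autoregressive decoding in diffusion-based language models and identifies a failure mode caused by proximity bias. The authors propose a minimal intervention based on a lightweight planner, which yields significant improvements on reasoning and planning tasks.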

[334] Universal statistical signatures of evolution in artificial intelligence architectures

  • arXiv: 2604.10571 (cross-listed)
  • Authors: Theodor Spiro
  • Subjects: q-bio.PE; cs.AI; cs.CY; cs.NE
  • Tags: Deep Learning Theory
  • Code: code
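  • Summary: This paper tests whether the evolution of AI architectures follows the same statistical laws as biological evolution. Analyzing 935 ablation experiments, it finds that the distribution of fitness effects of architectural modifications resembles that of biological evolution, which the author takes as evidence that the statistical structure of evolution is substrate-independent.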

[335] The Blind Spot of Agent Safety: How Benign User Instructions Expose Critical Vulnerabilities in Computer-Use Agents

  • arXiv: 2604.10577 (cross-listed)
  • Authors: Xuwei Ding, Skylar Zhai, Linxin Song, Jiate Li, Taiwei Shi, Nicholas Meade, Siva Reddy, Jian Kang, Jieyu Zhao
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Agent, LLM Security, AI Safety
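  • Summary: This paper introduces OS-BLIND, a benchmark that evaluates computer-use agents in unintended attack scenarios arising from benign user instructions. The evaluation shows attack success rates above 90% for most agents, with vulnerabilities that are even more severe in multi-agent systems.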

[336] AffordGen: Generating Diverse Demonstrations for Generalizable Object Manipulation with Afford Correspondence

  • arXiv: 2604.10579 (cross-listed)
  • Authors: Jiawei Zhang, Kaizhe Hu, Yingqian Huang, Yuanchen Ju, Zhengrong Xue, Huazhe Xu
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Imitation Learning, Data Synthesis
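  • Summary: This paper proposes AffordGen, a framework that uses 3D generative models and vision foundation models to generate diverse demonstrations for robotic manipulation. Policies trained with AffordGen achieve high success rates and generalize zero-shot to unseen objects.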

[337] Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs

  • arXiv: 2604.10585 (cross-listed)
  • Authors: Subramanyam Sahoo
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Alignment, Uncertainty Estimation, RLHF
  • Venue: AISTATS 2026 Workshop
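  • Summary: This paper studies how sycophantic reward signals in RLHF degrade the calibration of large language models. Experiments show that GRPO training that induces sycophantic behavior causes calibration to deteriorate, and that structural residual problems persist even after post-hoc recalibration.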
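
Calibration degradation of this kind is commonly measured with Expected Calibration Error (ECE), the bin-size-weighted gap between average confidence and accuracy. The sketch below uses 10 equal-width bins. Whether the paper reports ECE, or uses this binning, is an assumption; the toy confidences are hypothetical.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE = sum over bins b of (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)|."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, y in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, y in members if y) / len(members)
        ece += len(members) / n * abs(accuracy - avg_conf)
    return ece

# Hypothetical: a model that says "90% sure" ten times.
calibrated = expected_calibration_error([0.9] * 10, [True] * 9 + [False])
overconfident = expected_calibration_error([0.9] * 10, [True] * 5 + [False] * 5)
print(calibrated, overconfident)  # roughly 0.0 and 0.4
```

A sycophantic model that keeps high confidence while agreeing with wrong user claims would show up as the second case.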

[338] Bridging Linguistic Gaps: Cross-Lingual Mapping in Pre-Training and Dataset for Enhanced Multilingual LLM Performance

  • arXiv: 2604.10590 (cross-listed)
  • Authors: Weihua Zheng, Chang Liu, Zhengyuan Liu, Xin Huang, Kui Wu, Muhammad Huzaifah Md Shahrin, Aiti Aw, Roy Ka-Wei Lee
  • Subjects: cs.CL; cs.AI
  • Tags: Machine Translation, Pre-training, Multilingual Learning
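  • Summary: This paper proposes adding a cross-lingual mapping task during pre-training to strengthen the cross-lingual alignment of multilingual LLMs. The method achieves gains of up to 11.9 BLEU on machine translation and more than 5% accuracy on cross-lingual understanding and question answering.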

[339] GeoMeld: Toward Semantically Grounded Foundation Models for Remote Sensing

  • arXiv: 2604.10591 (cross-listed)
  • Authors: Maram Hasan, Md Aminur Hossain, Savitra Roy, Souparna Bhowmik, Ayush V. Patel, Mainak Singha, Subhasis Chaudhuri, Muhammad Haris Khan, Biplab Banerjee
  • Subjects: cs.CV; cs.AI
  • Tags: Remote Sensing, Multimodal Learning, Pre-training
  • Venue: CVPR 2026 Workshop
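  • Summary: This paper presents GeoMeld, a large-scale multimodal remote sensing dataset of about 2.5 million samples, together with GeoMeld-FM, a pre-training framework. The framework combines multi-task masked autoencoding, JEPA representation learning, and caption-vision contrastive alignment, and it performs strongly in downstream transfer and cross-sensor robustness.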

[340] COREY: A Prototype Study of Entropy-Guided Operator Fusion with Hadamard Reparameterization for Selective State Space Models

  • arXiv: 2604.10597 (cross-listed)
  • Authors: Bo Ma, Jinsong Wu, Hongjiang Wei, Weiqi Yan
  • Subjects: cs.CV; cs.AI
  • Tags: LLM Inference, Optimization
  • Code: code
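  • Summary: This paper proposes COREY, a framework that combines memory-aware operator fusion with Hadamard-based feature reparameterization to improve the inference efficiency of selective state space models such as Mamba. It uses activation entropy as a runtime scheduling statistic to tune fusion boundaries and chunk sizes.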

[341] MoEITS: A Green AI approach for simplifying MoE-LLMs

  • arXiv: 2604.10603 (cross-listed)
  • Authors: Luis Balderas, Miguel Lastra, José M. Benítez
  • Subjects: cs.LG; cs.AI; cs.PF
  • Tags: Mixture-of-Experts, Model Compression
  • Code: code
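  • Summary: This paper proposes MoEITS, an information-theoretic algorithm for simplifying mixture-of-experts LLMs. Experiments on Mixtral, Qwen1.5, and DeepSeek models show that MoEITS substantially improves computational efficiency while preserving model effectiveness.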

[342] NSFL: A Post-Training Neuro-Symbolic Fuzzy Logic Framework for Boolean Operators in Neural Embeddings

  • arXiv: 2604.10604 (cross-listed)
  • Authors: Vladi Vexler, Ofer Idan, Gil Lederman, Dima Sivov
  • Subjects: cs.IR; cs.AI; cs.CL; cs.LG
  • Tags: Neurosymbolic AI, Information Retrieval
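  • Summary: This paper proposes the Neuro-Symbolic Fuzzy Logic (NSFL) framework, which adapts formal t-norms and t-conorms to neural embedding spaces so that Boolean logical constraints can be applied without retraining. The method improves mAP by up to 81% on retrieval tasks.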
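
For reference, t-norms and t-conorms are the fuzzy-logic generalizations of AND and OR on truth degrees in [0, 1]. The sketch shows three standard families and the De Morgan dual relation S(a, b) = 1 − T(1 − a, 1 − b). The per-term scores for a query "A AND (B OR NOT C)" are hypothetical, and how NSFL maps embedding similarities into [0, 1] is not shown here.

```python
def t_norm(a, b, kind="product"):
    """Fuzzy AND on truth degrees in [0, 1]."""
    if kind == "product":
        return a * b
    if kind == "godel":
        return min(a, b)
    if kind == "lukasiewicz":
        return max(0.0, a + b - 1.0)
    raise ValueError(f"unknown t-norm: {kind}")

def t_conorm(a, b, kind="product"):
    """Fuzzy OR as the De Morgan dual: S(a, b) = 1 - T(1 - a, 1 - b)."""
    return 1.0 - t_norm(1.0 - a, 1.0 - b, kind)

def fuzzy_not(a):
    """Standard fuzzy negation."""
    return 1.0 - a

# Hypothetical per-document truth degrees for query terms A, B, C.
s = {"A": 0.8, "B": 0.4, "C": 0.9}

def score(kind):
    """Evaluate A AND (B OR NOT C) under the chosen t-norm family."""
    return t_norm(s["A"], t_conorm(s["B"], fuzzy_not(s["C"]), kind), kind)

for kind in ("product", "godel", "lukasiewicz"):
    print(kind, round(score(kind), 3))
```

The three families agree on crisp 0/1 inputs but rank partial matches differently, so the choice matters when scoring retrieval candidates.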

[343] Computational Lesions in Multilingual Language Models Separate Shared and Language-specific Brain Alignment

  • arXiv: 2604.10627 (cross-listed)
  • Authors: Yang Cui, Jingyuan Sun, Yizheng Sun, Yifan Wang, Yunhao Zhang, Jixing Li, Shaonan Wang, Hongpeng Zhou, John Hale, Chengqing Zong, Goran Nenadic
  • Subjects: cs.CL; cs.AI; cs.CE
  • Tags: Multilingual Learning, Cognitive Science
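  • Summary: This paper creates computational lesions in multilingual LLMs to study shared versus language-specific mechanisms of language processing in the brain. Lesioning the shared core reduces brain-encoding correlations by 60%, whereas language-specific lesions selectively weaken brain predictions for the matching native language.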

[344] Vibe-driven model-based engineering

  • arXiv: 2604.10645 (cross-listed)
  • Authors: Jordi Cabot
  • Subjects: cs.SE; cs.AI
  • Tags: Software Engineering, LLM Agent
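  • Summary: This paper proposes vibe-driven model-based engineering, which combines LLM-based vibe coding with model-driven engineering to speed up the development of reliable complex systems. The author argues that the two approaches are complementary and offer different development paths for different kinds of software systems and development scenarios.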

[345] LoViF 2026 The First Challenge on Weather Removal in Videos

  • arXiv: 2604.10655 (cross-listed)
  • Authors: Chenghao Qian
  • Subjects: cs.CV; cs.AI; cs.MM
  • Tags: Video Understanding, Image Enhancement
  • Venue: CVPR 2026 Workshop
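  • Summary: This paper describes the LoViF 2026 challenge on weather removal in videos and releases the WRV dataset for restoring clean video from adverse weather such as rain and snow. The challenge attracted 37 participants and aims to advance video weather-removal techniques.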

[346] Efficient Process Reward Modeling via Contrastive Mutual Information

  • arXiv: 2604.10660 (cross-listed)
  • Authors: Nakyung Lee, Sangwoo Hong, Jungwoo Lee
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Reasoning, Reinforcement Learning
  • Venue: ACL 2026
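  • Summary: This paper proposes Contrastive Pointwise Mutual Information (CPMI), a method for automatically labeling training data for process reward models. It derives step-level supervision signals from the model's internal probabilities, cutting dataset construction time by 84% while achieving higher accuracy on mathematical reasoning benchmarks.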
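
As background for the pointwise-mutual-information idea, the sketch scores each reasoning step by the conditional PMI between the step and the correct answer: log p(ans | q, s₁..s_k) − log p(ans | q, s₁..s_{k−1}). This is plain conditional PMI, not the paper's contrastive CPMI formulation. The log-probabilities are hypothetical numbers standing in for values read off a model.

```python
import math

def step_pmi_scores(answer_logprob_after):
    """Conditional PMI of each step with the correct answer.

    answer_logprob_after[k] = log p(correct answer | question, first k steps),
    so index 0 is the question alone. Step k (1-indexed) gets
    log p(ans | q, s_1..s_k) - log p(ans | q, s_1..s_{k-1}).
    """
    return [b - a for a, b in zip(answer_logprob_after, answer_logprob_after[1:])]

# Hypothetical probabilities of the gold answer after 0, 1, 2, 3 steps.
logps = [math.log(0.2), math.log(0.5), math.log(0.4), math.log(0.9)]
scores = step_pmi_scores(logps)
labels = [s > 0 for s in scores]  # positive PMI: the step made the answer more likely
print([round(s, 3) for s in scores], labels)
```

Because the scores come from forward-pass probabilities, no rollouts are needed to label a step, which is the kind of saving the summary refers to.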

[347] DynamicsLLM: a Dynamic Analysis-based Tool for Generating Intelligent Execution Traces Using LLMs to Detect Android Behavioural Code Smells

  • arXiv: 2604.10661 (cross-listed)
  • Authors: Houcine Abdelkader Cherief, Florent Avellaneda, Naouel Moha
  • Subjects: cs.SE; cs.AI
  • Tags: Software Testing, LLM Agent
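  • Summary: This paper proposes DynamicsLLM, a tool that uses LLMs to generate intelligent execution traces for detecting behavioral code smells in Android apps. Experiments show that under a limited number of actions, it covers three times as many code-smell-related events as the Dynamics tool.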

[348] Learning and Enforcing Context-Sensitive Control for LLMs

  • arXiv: 2604.10667 (cross-listed)
  • Authors: Mohammad Albinhassan, Pranava Madhyastha, Mark Law, Alessandra Russo
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Inference, Prompt Engineering
  • Venue: ACL 2025 Workshop
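  • Summary: This paper proposes a framework for automatically learning context-sensitive constraints in two phases, syntactic exploration and constraint exploitation, that enables small LLMs to learn and perfectly adhere to constraint rules without manual specification.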

[349] Skill-SD: Skill-Conditioned Self-Distillation for Multi-turn LLM Agents

  • arXiv: 2604.10674 (cross-listed)
  • Authors: Hao Wang, Guozhi Wang, Han Xiao, Yufeng Zhou, Yue Pan, Jichao Wang, Ke Xu, Yafei Wen, Xiaohu Ruan, Xiaoxin Chen, Honggang Qi
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Agent, Knowledge Distillation
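  • Summary: This paper proposes Skill-SD, a framework that summarizes agent trajectories into natural-language skill descriptions and uses them as dynamic privileged information to condition the teacher model during distillation. It improves over standard GRPO by 14% on AppWorld and 10.9% on Sokoban.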
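
Since the baseline here is standard GRPO, it may help to recall its group-relative advantage: each sampled response's reward is normalized by the mean and standard deviation of its group. This sketch uses the population standard deviation plus a small `eps` (implementations differ on both); the reward values are hypothetical.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage for one prompt's group: (r_i - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical binary rewards for four sampled responses to one prompt.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0]))  # roughly [1, -1, -1, 1]
print(group_relative_advantages([1.0, 1.0, 1.0]))       # all zeros: no learning signal
```

The second case shows the sparse-signal problem in multi-turn agent tasks: when every rollout in a group gets the same reward, the advantages vanish, which is one motivation for adding denser teacher supervision.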

[350] Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models

  • arXiv: 2604.10681 (cross-listed)
  • Authors: Vu Tuan Truong, Long Bao Le
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, LLM Reasoning
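  • Summary: This paper proposes Critical-CoT, a defense framework that uses two-stage fine-tuning to develop critical-thinking abilities in LLMs, enabling them to automatically recognize reasoning-level backdoor attacks and refuse to generate malicious reasoning steps.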

[351] SCOPE: Signal-Calibrated On-Policy Distillation Enhancement with Dual-Path Adaptive Weighting

  • arXiv: 2604.10688 (cross-listed)
  • Authors: Binbin Zheng, Xing Ma, Yiheng Liang, Jingqing Ruan, Xiaoliang Fu, Kepeng Lin, Benchang Zhu, Ke Zeng, Xunliang Cai
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Reasoning, Knowledge Distillation
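  • Summary: This paper proposes SCOPE, a dual-path adaptive training framework that routes on-policy trajectories to two complementary supervision paths according to their correctness, achieving an average relative improvement of 11.42% across six reasoning benchmarks.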

[352] Bringing Value Models Back: Generative Critics for Value Modeling in LLM Reinforcement Learning

  • arXiv: 2604.10701 (cross-listed)
  • Authors: Zikang Shan, Han Zhong, Liwei Wang, Li Zhao
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Reinforcement Learning, LLM Reasoning
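  • Summary: This paper proposes Generative Actor-Critic (GenAC), which replaces conventional one-shot scalar value prediction with a generative critic that produces value estimates through chain-of-thought reasoning, improving value approximation and reinforcement learning performance.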

[353] Architecture-Agnostic Modality-Isolated Gated Fusion for Robust Multi-Modal Prostate MRI Segmentation

  • arXiv: 2604.10702 (cross-listed)
  • Authors: Yongbo Shu, Wenzhao Xie, Shanhu Yao, Zirui Xin, Luo Lei, Kewen Chen, Aijing Luo
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Image Segmentation, Multimodal Learning
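  • Summary: This paper proposes a Modality-Isolated Gated Fusion (MIGF) module for multiparametric prostate MRI segmentation. It keeps a separate encoding stream for each modality and trains with modality dropout to handle missing or degraded modalities, substantially improving segmentation performance and robustness on the PI-CAI dataset.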
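
Modality dropout simulates missing inputs during training by zeroing whole modalities at random, so the fused model cannot rely on any single one. The sketch below works on per-modality feature lists. The modality names, the `p_drop` values, and the rule of always keeping at least one modality are illustrative assumptions, not details confirmed for MIGF.

```python
import random

def modality_dropout(features, p_drop, rng, keep_at_least_one=True):
    """Zero each modality's features independently with probability p_drop.

    features: dict mapping modality name -> list of floats.
    Returns (new_features, set_of_dropped_modalities).
    """
    dropped = {m for m in features if rng.random() < p_drop}
    if keep_at_least_one and len(dropped) == len(features):
        dropped.discard(rng.choice(sorted(features)))  # restore one at random
    out = {m: ([0.0] * len(v) if m in dropped else list(v)) for m, v in features.items()}
    return out, dropped

# Hypothetical per-modality feature vectors for one MRI case.
feats = {"t2w": [0.5, 1.2], "adc": [0.3, 0.8], "dwi": [0.9, 0.1]}
rng = random.Random(0)
for _ in range(3):
    out, dropped = modality_dropout(feats, 0.3, rng)
    print(sorted(dropped))
```

Keeping each modality in its own stream makes this clean: a dropped modality contributes zeros to fusion instead of corrupting a shared early-fused tensor.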

[354] Audio-Omni: Extending Multi-modal Understanding to Versatile Audio Generation and Editing

  • arXiv: 2604.10708 (cross-listed)
  • Authors: Zeyue Tian, Binxin Yang, Zhaoyang Liu, Jiexuan Zhang, Ruibin Yuan, Hubery Yin, Qifeng Chen, Chen Li, Jing Lv, Wei Xue, Yike Guo
  • Subjects: cs.SD; cs.AI; cs.CV; cs.MM
  • Tags: Multimodal Learning, Diffusion Model, Audio Generation
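  • Summary: This paper proposes Audio-Omni, described as the first end-to-end framework to unify generation and editing across general sound, music, and speech. It combines a frozen multimodal LLM with a trainable diffusion Transformer, and the authors construct AudioEdit, a large-scale audio editing dataset.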

[355] Detecting RAG Extraction Attack via Dual-Path Runtime Integrity Game

  • arXiv: 2604.10717 (cross-listed)
  • Authors: Yuanbo Xie, Yingjie Zhang, Yulin Li, Shouyou Song, Xiaokun Chen, Zhihan Liu, Liya Su, Tingwen Liu
  • Subjects: cs.CR; cs.AI; cs.CL
  • Tags: RAG, LLM Security, Cybersecurity
  • Venue: ACL 2026
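  • Summary: This paper proposes CanaryRAG, a defense that embeds canary tokens in retrieved chunks, recasting RAG extraction defense as a dual-path runtime integrity game and enabling real-time detection of knowledge-base leakage attacks.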
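
The canary idea in its simplest form: tag each retrievable chunk with a unique random marker, then flag any model output that reproduces a marker verbatim. This minimal sketch shows only that building block, not CanaryRAG's dual-path integrity game. The `[[cnry-...]]` marker format is a hypothetical choice.

```python
import secrets

def add_canaries(chunks):
    """Append a unique random canary string to each chunk.

    Returns (tagged_chunks, {canary: chunk_index}).
    """
    canaries = {}
    tagged = []
    for i, text in enumerate(chunks):
        token = f"[[cnry-{secrets.token_hex(8)}]]"  # 16 hex chars, collisions negligible
        canaries[token] = i
        tagged.append(f"{text} {token}")
    return tagged, canaries

def leaked_chunks(model_output, canaries):
    """Indices of chunks whose canary appears verbatim in the model output."""
    return sorted(i for token, i in canaries.items() if token in model_output)

tagged, canaries = add_canaries(["alpha doc", "beta doc", "gamma doc"])
# An extraction attack that makes the model echo a chunk verbatim trips its canary.
print(leaked_chunks("Here is the context: " + tagged[1], canaries))
print(leaked_chunks("A normal paraphrased answer.", canaries))
```

Exact-match canaries catch verbatim dumps cheaply but miss paraphrased leakage, which is presumably why a more elaborate runtime check is needed in practice.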

[356] Turning Generators into Retrievers: Unlocking MLLMs for Natural Language-Guided Geo-Localization

  • arXiv: 2604.10721 (cross-listed)
  • Authors: Yuqi Chen, Xiaohan Zhang, Ahmad Arrabi, Waqas Sultani, Chen Chen, Safwan Wshah
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Information Retrieval, Remote Sensing
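  • Summary: This paper adapts multimodal LLMs via parameter-efficient fine-tuning to natural-language-guided cross-view geo-localization, achieving state-of-the-art performance on the GeoText-1652 and CVG-Text benchmarks.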

[357] Tail-Aware Information-Theoretic Generalization for RLHF and SGLD

  • arXiv: 2604.10727 (cross-listed)
  • Authors: Huiming Zhang, Binghan Li, Wan Tian, Qiang Sun
  • Subjects: stat.ML; cs.AI; cs.LG; math.PR; math.ST
  • Tags: RLHF, Deep Learning Theory, Optimization
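  • Summary: This paper develops a tail-dependent information-theoretic framework for heavy-tailed data, establishes generalization bounds for sub-Weibull data, and applies them to RLHF with heavy-tailed rewards and to stochastic gradient Langevin dynamics (SGLD).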

[358] Perceived Importance of Cognitive Skills Among Computing Students in the Era of AI

  • arXiv: 2604.10730 (cross-listed)
  • Authors: Neha Rani, Erta Cenko, Laura Melissa Cruz Castro
  • Subjects: cs.CY; cs.AI
  • Tags: Education Technology, AI Ethics
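  • Summary: This study surveys computing students' perceptions of the importance of cognitive skills and finds that students expect all 11 cognitive skills to become less important as AI integration grows, underscoring the need for educational interventions.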

[359] Too Nice to Tell the Truth: Quantifying Agreeableness-Driven Sycophancy in Role-Playing Language Models

  • arXiv: 2604.10733 (cross-listed)
  • Authors: Arya Shah, Deepali Mishra, Chaklam Silpasuwanchai
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Alignment, LLM Evaluation, AI Safety
  • Venue: ACL 2026
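  • Summary: This paper systematically studies the relationship between persona agreeableness and sycophancy in role-playing LLMs, finding a significant positive correlation between agreeableness and sycophancy rate in 9 of 13 models.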

[360] Deep-Reporter: Deep Research for Grounded Multimodal Long-Form Generation

  • arXiv: 2604.10741 (cross-listed)
  • Authors: Fangda Ye, Zhifei Xie, Yuxin Hu, Yihang Yin, Shurui Huang, Shikai Dong, Jianzhu Bao, Shuicheng Yan
  • Subjects: cs.CL; cs.AI; cs.IR
  • Tags: LLM Agent, Multimodal Learning, Text Generation
  • Code: code
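  • Summary: This paper proposes Deep-Reporter, a framework for grounded multimodal long-form generation that achieves coherent text-image integration through agentic multimodal search, checklist-guided incremental synthesis, and recurrent context management.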

[361] Generating Multiple-Choice Knowledge Questions with Interpretable Difficulty Estimation using Knowledge Graphs and Large Language Models

  • arXiv: 2604.10748 (cross-listed)
  • Authors: Mehmet Can Şakiroğlu, H. Altay Güvenir, Kamer Kaya
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Knowledge Graph, Education Technology, Question Answering
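  • Summary: This paper proposes a method that uses knowledge graphs and LLMs to generate multiple-choice questions with difficulty estimates. It computes nine difficulty signals and combines them into a single difficulty score, making the difficulty assessment interpretable.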

[362] Prosociality by Coupling, Not Mere Observation: Homeostatic Sharing in an Inspectable Recurrent Artificial Life Agent

  • arXiv: 2604.10760 (cross-listed)
  • Authors: Aishik Sanyal
  • Subjects: cs.MA; cs.AI
  • Tags: Multi-Agent System, Cognitive Science, AI Safety
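  • Summary: This paper adds a homeostatic regulator and a social coupling channel to an inspectable recurrent controller and shows that prosocial helping emerges without any explicit social reward when others' needs are routed into the agent's own self-regulation.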

[363] Lung Cancer Detection Using Deep Learning

  • arXiv: 2604.10765 (cross-listed)
  • Authors: Imama Ajmi, Abhishek Das
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Medical AI, Computer Vision
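  • Summary: This paper compares deep learning models such as InceptionV3, MobileNetV2, VGG16, and ResNet152 for lung cancer detection and proposes a 16-layer CNN architecture that performs well in terms of accuracy and overfitting control.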

[364] Do BERT Embeddings Encode Narrative Dimensions? A Token-Level Probing Analysis of Time, Space, Causality, and Character in Fiction

  • arXiv: 2604.10786 (cross-listed)
  • Authors: Beicheng Bei, Hannah Hyesun Chun, Chen Guo, Arwa Saghiri
  • Subjects: cs.CL; cs.AI
  • Tags: Representation Learning, Natural Language Understanding, Interpretability
  • Venue: CMN 2026 Workshop
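  • Summary: This study uses probing to examine whether BERT embeddings encode the time, space, causality, and character dimensions of fictional narratives. It confirms that BERT encodes meaningful narrative information but suffers from boundary leakage.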

[365] TInR: Exploring Tool-Internalized Reasoning in Large Language Models

  • arXiv: 2604.10788 (cross-listed)
  • Authors: Qiancheng Xu, Yongqi Li, Fan Liu, Hongru Wang, Min Yang, Wenjie Li
  • Subjects: cs.CL; cs.AI
  • Tags: Tool Learning, LLM Reasoning
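  • Summary: This paper explores tool-internalized reasoning and proposes the TInR-U framework, which internalizes tool knowledge into LLMs for unified reasoning through a three-stage training pipeline: bidirectional knowledge alignment, supervised fine-tuning, and reinforcement learning.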

[366] Advancing Polish Language Modeling through Tokenizer Optimization in the Bielik v3 7B and 11B Series

  • arXiv: 2604.10799 (cross-listed)
  • Authors: Krzysztof Ociepa, Łukasz Flis, Remigiusz Kinas, Krzysztof Wróbel, Adrian Gwoździej
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Training, Pre-training, Multilingual Learning
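  • Summary: This paper details the development of the Bielik v3 series of Polish LLMs, including the switch from a general-purpose tokenizer to a Polish-optimized vocabulary, a multi-stage pre-training curriculum, and post-training alignment methods.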

[367] Verify Before You Fix: Agentic Execution Grounding for Trustworthy Cross-Language Code Analysis

  • arXiv: 2604.10800 (cross-listed)
  • Authors: Jugal Gajjar
  • Subjects: cs.SE; cs.AI; cs.CR; cs.LG; cs.PL
  • Tags: LLM Agent, Code Generation, Software Engineering
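  • Summary: This paper proposes a cross-language vulnerability analysis framework that combines a universal abstract syntax tree with execution-grounded agentic validation to enable trustworthy LLM-driven vulnerability detection and repair.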

[368] MeloTune: On-Device Arousal Learning and Peer-to-Peer Mood Coupling for Proactive Music Curation

  • arXiv: 2604.10815 (cross-listed)
  • Authors: Hongwei Xu
  • Subjects: cs.SD; cs.AI; cs.MA
  • Tags: Music Generation, Edge Computing, Affective Computing
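  • Summary: This paper introduces MeloTune, a music agent deployed on the iPhone that uses continuous-time networks for emotion-aware music curation and peer-to-peer mood coupling, with all inference performed on-device.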

[369] Speaking to No One: Ontological Dissonance and the Double Bind of Conversational AI

  • arXiv: 2604.10833 (cross-listed)
  • Authors: Hugh Brosnahan, Izabela Lipinska
  • Subjects: cs.HC; cs.AI; cs.CL; cs.CY; cs.ET
  • Tags: AI Ethics, AI Safety, Dialogue System
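  • Summary: This paper analyzes the psychological risks of conversational AI from the perspectives of phenomenology, psychiatry, and cognitive neuroscience. It argues that such systems produce "ontological dissonance", a conflict between the appearance of a relational presence and the absence of any subject able to sustain that presence, which may lead susceptible users into delusional experiences. The authors explain why explicit disclaimers often fail to break this delusional involvement and discuss the ethical and clinical implications for the design and use of conversational AI.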

[370] LLMs for Qualitative Data Analysis Fail on Security-specific Comments in Human Experiments

  • arXiv: 2604.10834 (cross-listed)
  • Authors: Maria Camporese, Fabio Massacci, Yuanjun Gong
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Evaluation, Cybersecurity, Data Annotation
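  • Summary: This paper explores whether LLMs can replace human annotators in thematic-analysis coding of security-related technical comments. The results show that even with detailed code descriptions, LLM improvements are uneven across codes, and LLMs still cannot reliably replace human annotators for security-specific qualitative data analysis.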

[371] Harnessing Photonics for Machine Intelligence

  • arXiv: 2604.10841 (cross-listed)
  • Authors: Hanqing Zhu, Shupeng Ning, Hongjian Zhou, Ziang Yin, Ray T. Chen, Jiaqi Gu, David Z. Pan
  • Subjects: cs.AI; cs.AR; cs.ET; cs.LG
  • Tags: Photonic Computing, Energy Efficiency
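  • Summary: This paper surveys the prospects of photonic computing as a candidate technology for AI acceleration. It reframes photonic computing from a circuits-and-systems perspective and argues that cross-layer co-design and electronic-photonic design automation (EPDA) are essential for a scalable, reproducible photonic machine intelligence ecosystem.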

[372] Resilient Write: A Six-Layer Durable Write Surface for LLM Coding Agents

  • arXiv: 2604.10842 (cross-listed)
  • Authors: Justice Owusu Agyemang, Jerry John Kponyo, Elliot Amponsah, Godfred Manu Addo Boakye, Kwame Opuni-Boachie Obour Agyekum
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Agent, Code Generation
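  • Summary: This paper proposes Resilient Write, an MCP server that gives LLM coding agents a six-layer durable write surface, addressing write failures caused by content filtering, truncation, or session interruption. The system reduces recovery time by 5x and increases the agent self-correction rate by 13x.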

[373] Retinal Cyst Detection from Optical Coherence Tomography Images

  • arXiv: 2604.10843 (cross-listed)
  • Authors: Abhishek Dharmaratnakar, Aadheeshwar Vijayakumar, Suchand Dayanand
  • Subjects: cs.CV; cs.AI; cs.LG; cs.NE
  • Tags: Medical AI, Image Segmentation
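  • Summary: This paper proposes an automatic method for segmenting retinal cysts in optical coherence tomography (OCT) images using a ResNet-based CNN trained with a patch-wise classification strategy. The method achieves a Dice coefficient above 70% on data from all vendors, outperforming the previous best methods.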

[374] Task2vec Readiness: Diagnostics for Federated Learning from Pre-Training Embeddings

  • arXiv: 2604.10849 (cross-listed)
  • Authors: Cristiano Mafuz, Rodrigo Silva
  • Subjects: cs.LG; cs.AI
  • Tags: Federated Learning, Transfer Learning
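  • Summary: This paper proposes readiness indices based on Task2Vec embeddings that quantify alignment in federated learning and predict final performance before training begins. Experiments show that these indices correlate strongly with final performance (Pearson and Spearman coefficients often exceed 0.9), making them effective proxies for predicting federated learning outcomes.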
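
The two correlation coefficients cited above measure different things. Pearson captures linear association, while Spearman is Pearson computed on ranks and so rewards any monotone relationship. A dependency-free sketch follows (`scipy.stats.pearsonr` and `spearmanr` are the usual library routes). The readiness and accuracy numbers are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(xs, ys):
    """Spearman rank correlation: Pearson on the ranks."""
    return pearson(ranks(xs), ranks(ys))

# Hypothetical readiness scores vs. final accuracy across five FL setups.
readiness = [1.0, 2.0, 3.0, 4.0, 5.0]
accuracy = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotone but not linear
print(pearson(readiness, accuracy), spearman(readiness, accuracy))
```

A predictor only needs to rank configurations correctly to be useful for choosing among them, so a high Spearman value is arguably the more relevant of the two.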

[375] BridgeSim: Unveiling the OL-CL Gap in End-to-End Autonomous Driving

  • arXiv: 2604.10856 (cross-listed)
  • Authors: Seth Z. Zhao, Luobin Wang, Hongwei Ruan, Yuxin Bao, Yilan Chen, Ziyang Leng, Abhijit Ravichandran, Honglin He, Zewei Zhou, Xu Han, Abhishek Peri, Zhiyu Huang, Pranav Desai, Henrik Christensen, Jiaqi Ma, Bolei Zhou
  • Subjects: cs.RO; cs.AI
  • Tags: Autonomous Driving, Sim-to-Real
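  • Summary: This paper uncovers the root causes of the gap between open-loop (OL) evaluation and closed-loop (CL) deployment in end-to-end autonomous driving, identifying two main problems: observation domain shift and objective mismatch. The authors propose a test-time adaptation (TTA) framework that calibrates the observation shift, reduces state-action bias, and enforces temporal consistency.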

[376] Query Lower Bounds for Diffusion Sampling

  • arXiv: 2604.10857 (cross-listed)
  • Authors: Zhiyang Xun, Eric Price
  • Subjects: cs.LG; cs.AI; cs.DS; math.ST; stat.ML
  • Tags: Diffusion Model, Deep Learning Theory
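  • Summary: This paper establishes the first score-query lower bounds for diffusion model sampling, proving that for d-dimensional distributions any sampling algorithm requires Ω(√d) adaptive score queries. This result formally explains why multi-scale noise schedules are needed in practice.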

[377] AOP-Smart: A RAG-Enhanced Large Language Model Framework for Adverse Outcome Pathway Analysis

  • arXiv: 2604.10874 (cross-listed)
  • Authors: Qinjiang Niu, Lu Yan
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, LLM Hallucination, Medical AI
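  • Summary: This paper proposes AOP-Smart, a retrieval-augmented generation (RAG) framework for adverse outcome pathway (AOP) analysis that retrieves relevant knowledge from the official AOP-Wiki XML data. With RAG, the accuracy of three mainstream LLMs rises from 15-35% to 95-100%, substantially mitigating hallucination.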

[378] Compliant But Unsatisfactory: The Gap Between Auditing Standards and Practices for Probabilistic Genotyping Software

  • arXiv: 2604.10875 (cross-listed)
  • Authors: Angela Jin, Alexander Asemota, Dan E. Krane, Nathaniel D. Adams, Rediet Abebe
  • Subjects: cs.CY; cs.AI; cs.HC; cs.SE
  • Tags: AI Ethics, AI Safety, Legal AI
  • Venue: CHI 2026
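  • Summary: This paper analyzes ASB 018, the auditing standard for probabilistic genotyping software, together with five audit reports, revealing a gap between auditing standards and actual practice. It finds that poorly designed standards can conceal inadequate systems and lend them credibility, and it offers recommendations for improving the design of auditing standards.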

[379] DIB-OD: Preserving the Invariant Core for Robust Heterogeneous Graph Adaptation via Decoupled Information Bottleneck and Online Distillation

  • arXiv: 2604.10882 (cross-listed)
  • Authors: Yang Yan, Qiuyan Wang, Tianjin Huang, Qiudong Yu, Kexin Zhang
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Neural Network, Domain Adaptation, Knowledge Distillation
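  • Summary: This paper proposes DIB-OD, a framework that uses a decoupled information bottleneck and online distillation to explicitly decompose representations into orthogonal invariant and redundant subspaces, preserving the invariant core during cross-domain transfer. Experiments show that it substantially outperforms existing methods on cross-type domain transfer tasks.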

[380] Ambiguity Detection and Elimination in Automated Executable Process Modeling

  • arXiv: 2604.10884 (cross-listed)
  • Authors: Ion Matei, Praveen Kumar Menaka Sekar, Maksym Zhenirovskyy, Hon Yung Wong, Sayuri Kohmura, Shinji Hotta, Akihiro Inomata
  • Subjects: cs.SE; cs.AI
  • Tags: Natural Language Understanding, Program Synthesis
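  • Summary: This paper proposes a diagnosis-driven framework for detecting and eliminating ambiguity when executable BPMN models are generated automatically from natural-language specifications. It detects behavioral inconsistencies through empirical distributions of key performance indicators, localizes them to gateway logic, and maps that logic back to the source text for repair.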

[381] Product Review Based on Optimized Facial Expression Detection

  • arXiv: 2604.10885 (cross-listed)
  • Authors: Vikrant Chaugule, Abhishek D, Aadheeshwar Vijayakumar, Pravin Bhaskar Ramteke, Shashidhar G. Koolagudi
  • Subjects: cs.CV; cs.AI; cs.GR
  • Tags: Affective Computing
  • Venue: IC3 2016
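  • Summary: This paper proposes a facial expression detection method based on an improved Harris algorithm for analyzing customer acceptance of products. The improved algorithm reduces the time complexity of feature extraction and substantially speeds up corner detection while maintaining accuracy.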

[382] Beyond A Fixed Seal: Adaptive Stealing Watermark in Large Language Models

  • arXiv: 2604.10893 (cross-listed)
  • Authors: Shuhao Zhang, Yuli Chen, Jiale Han, Bo Cheng, Jiabao Ma
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, AI Safety
  • Code: code
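  • Summary: This paper proposes Adaptive Stealing (AS), a new watermark-stealing attack on LLMs. Using position-based seal construction and an adaptive selection module, it dynamically chooses the best attack perspective according to watermark compatibility, generation priority, and dynamic generation relevance, substantially improving stealing efficiency.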

[383] Evaluating the Impact of Medical Image Reconstruction on Downstream AI Fairness and Performance

  • arXiv: 2604.10904 (cross-listed)
  • Authors: Matteo Wohlrapp, Niklas Bubeck, Daniel Rueckert, William Lotter
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Fairness, Image Enhancement
  • Venue: MIDL 2026
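  • Summary: This paper evaluates how AI-based image reconstruction affects downstream diagnostic performance and fairness. It finds that conventional reconstruction metrics track task performance poorly and that reconstruction can sometimes amplify demographic biases, and it stresses the importance of holistic performance and fairness evaluation across the entire medical imaging workflow.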

[384] Audio Flamingo Next: Next-Generation Open Audio-Language Models for Speech, Sound, and Music

  • arXiv: 2604.10905 (cross-listed)
  • Authors: Sreyan Ghosh, Arushi Goel, Kaousheik Jayakumar, Lasha Koroshinadze, Nishit Anand, Zhifeng Kong, Siddharth Gururani, Sang-gil Lee, Jaehyeon Kim, Aya Aljafari, Chao-Han Huck Yang, Sungwon Kim, Ramani Duraiswami, Dinesh Manocha, Mohammad Shoeybi, Bryan Catanzaro, Ming-Yu Liu, Wei Ping
  • Subjects: cs.SD; cs.AI; cs.CL; eess.AS
  • Tags: Multimodal Learning, Speech Processing
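  • Summary: This paper proposes Audio Flamingo Next, a next-generation large audio-language model for understanding and reasoning over speech, environmental sounds, and music. The model introduces a temporal audio chain-of-thought reasoning paradigm, supports audio inputs up to 30 minutes long, and significantly outperforms comparable open-source models on 20 audio understanding benchmarks.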

[385] ReXSonoVQA: A Video QA Benchmark for Procedure-Centric Ultrasound Understanding

  • arXiv: 2604.10916 (cross-listed)
  • Authors: Xucheng Wang, Xiaoman Zhang, Sung Eun Kim, Ankit Pal, Pranav Rajpurkar
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Vision-Language Model, Benchmark
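  • Summary: This paper introduces ReXSonoVQA, a video question-answering benchmark for procedure-centric ultrasound understanding. It contains 514 video clips and evaluates three core abilities: action-goal reasoning, artifact resolution and optimization, and procedural context and planning. Experiments show that vision-language models still struggle with troubleshooting questions.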

[386] Mem$^2$Evolve: Towards Self-Evolving Agents via Co-Evolutionary Capability Expansion and Experience Distillation

  • arXiv: 2604.10923 (cross-listed)
  • Authors: Zihao Cheng, Zeming Liu, Yingyu Shan, Xinyi Wang, Xiangrong Zhu, Yunpu Ma, Hongru Wang, Yuhang Guo, Wei Lin, Yunhong Wang
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Agent, Knowledge Distillation
  • Venue: ACL 2026
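  • Summary: This paper proposes the Mem²Evolve framework, which enables agents to self-evolve. It integrates experience memory and asset memory to achieve co-evolutionary capability expansion and experience distillation. Experiments show an 18.53% improvement over standard LLMs.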

[387] QShield: Securing Neural Networks Against Adversarial Attacks using Quantum Circuits

  • arXiv: 2604.10933 (cross-listed)
  • Authors: Navid Azimi, Aditya Prakash, Yao Wang, Li Xiong
  • Subjects: cs.CR; cs.AI; cs.CV; cs.LG
  • Tags: Adversarial Robustness, Quantum Computing
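  • Summary: This paper proposes QShield, a hybrid quantum-classical neural network architecture that combines CNN feature extraction with a quantum processing module to improve adversarial robustness. Experiments show that it substantially lowers attack success rates while preserving predictive accuracy.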

[388] Pseudo-Unification: Entropy Probing Reveals Divergent Information Patterns in Unified Multimodal Models

  • arXiv: 2604.10949 (cross-listed)
  • Authors: Songlin Yang, Xianghao Kong, Anyi Rao
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Multimodal Learning, Interpretability
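  • Summary: This paper proposes an information-theoretic probing framework to analyze why unified multimodal models fail to transfer LLM reasoning ability to image generation. It finds that this pseudo-unification stems from a divergence between two information patterns: modality-asymmetric encoding and pattern-split responses.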

[389] A molecular clock for writing systems reveals the quantitative impact of imperial power on cultural evolution

  • arXiv: 2604.10957 (cross-listed)
  • Authors: Hiroki Fukui
  • Subjects: q-bio.PE; cs.AI; cs.CL; cs.CY
  • Tags: Cultural Evolution, Knowledge Representation
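  • Summary: This paper builds a global database of 300 writing systems. Applying four phylogenetic methods, it finds that script evolution exhibits a detectable molecular clock. The study shows that political intervention breaks this clock and is associated with script extinction.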

[390] Continuous-time Online Learning via Mean-Field Neural Networks: Regret Analysis in Diffusion Environments

  • arXiv: 2604.10958 (cross-listed)
  • Authors: Erhan Bayraktar, Bingyan Han, Ziqing Zhang
  • Subjects: cs.LG; cs.AI; math.OC
  • Tags: Optimization, Deep Learning Theory
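  • Summary: This paper studies continuous-time online learning in which data are generated by a diffusion process and the learner continuously updates the parameters of a two-layer neural network. It establishes regret bounds for both the mean-field limit and the finite-particle system.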

[391] You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

  • arXiv: 2604.10966 (cross-listed)
  • Authors: Yinuo Yang, Zixian Ma, Manasi Ganti, Jieyu Zhang, Ranjay Krishna
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, LLM Evaluation, RLHF
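  • Summary: This paper proposes a discriminative multimodal reward model that scores all candidate responses in a single forward pass, yielding an N-fold speedup. The model achieves state-of-the-art results on six multimodal reward benchmarks.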

[392] Towards Automated Solar Panel Integrity: Hybrid Deep Feature Extraction for Advanced Surface Defect Identification

  • arXiv: 2604.10969 (cross-listed)
  • Authors: Muhammad Junaid Asif, Muhammad Saad Rafaqat, Usman Nazakat, Uzair Khan, Rana Fayyaz Ahmad
  • Subjects: cs.CV; cs.AI
  • Tags: Anomaly Detection, Computer Vision
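  • Summary: This paper proposes a hybrid method for detecting solar panel defects that combines handcrafted features (LBP, HoG, Gabor filters) with DenseNet-169 deep features. Experiments show that DenseNet-169 + Gabor (SVM) reaches 99.17% accuracy.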

[393] MMR-AD: A Large-Scale Multimodal Dataset for Benchmarking General Anomaly Detection with Multimodal Large Language Models

  • arXiv: 2604.10971 (cross-listed)
  • Authors: Xincheng Yao, Zefeng Qian, Chao Shi, Jiayang Song, Chongyang Zhang
  • Subjects: cs.CV; cs.AI
  • Tags: Anomaly Detection, Vision-Language Model, Benchmark
  • Venue: CVPR 2026
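  • Summary: This paper presents MMR-AD, a large-scale multimodal dataset for benchmarking general anomaly detection with multimodal large language models. The authors also propose Anomaly-R1, a baseline model that achieves significant improvements in anomaly detection and localization.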

[394] Enabling and Inhibitory Pathways of Students' AI Use Concealment Intention in Higher Education: Evidence from SEM and fsQCA

  • arXiv: 2604.10978 (cross-listed)
  • Authors: Yiran Du, Huimin He
  • Subjects: cs.HC; cs.AI
  • Tags: AI Ethics, Education Technology
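  • Summary: This study investigates the psychological factors behind students' intention to conceal AI use in higher education. It integrates the cognition-affect-conation framework and uses structural equation modeling and fuzzy-set qualitative comparative analysis. It finds that fear of negative evaluation promotes concealment, whereas psychological safety inhibits it.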

[395] When Verification Fails: How Compositionally Infeasible Claims Escape Rejection

  • arXiv: 2604.10990 (cross-listed)
  • Authors: Muxin Liu, Delip Rao, Grace Kim, Chris Callison-Burch
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, LLM Evaluation
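  • Summary: This paper shows that existing verification benchmarks cannot distinguish rigorous claim verification from a shortcut that merely checks salient constraints. The authors construct compositionally infeasible claims and find that models consistently over-accept them, confirming that shortcut reasoning is pervasive.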

[396] Examining EAP Students' AI Disclosure Intention: A Cognition-Affect-Conation Perspective

  • arXiv: 2604.10991 (cross-listed)
  • Authors: Yiran Du, Huimin He
  • Subjects: cs.HC; cs.AI
  • Tags: AI Ethics, Education Technology
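  • Summary: This study uses the cognition-affect-conation framework and a mixed-methods design to investigate the psychological factors behind EAP students' intention to disclose their use of AI tools. Results show that psychological safety positively predicts disclosure intention, while fear of negative evaluation negatively predicts it.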

[397] When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies

  • arXiv: 2604.10996 (cross-listed)
  • Authors: Zhengzhe Yang
  • Subjects: cs.CL; cs.AI; cs.CE
  • Tags: Reinforcement Learning, Quantitative Finance
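  • Summary: This paper examines whether LLMs can generate numerical features that improve reinforcement learning trading agents. Optimized prompts do discover predictive features. However, these features add noise during distribution shifts, causing the augmented agent to underperform a price-only baseline.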

[398] Panoptic Pairwise Distortion Graph

  • arXiv: 2604.11004 (cross-listed)
  • Authors: Muhammad Kamran Janjua, Abdul Wahab, Bahador Rashidi
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Vision-Language Model, Image Quality Assessment, Benchmark
  • Venue: ICLR 2026
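  • Summary: This paper proposes the distortion graph task, which represents an image pair as a region-based structured topology. It contributes the PandaSet dataset, the PandaBench benchmark, and the Panda architecture. Results show that existing MLLMs struggle to understand region-level degradations.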

[399] NimbusGuard: A Novel Framework for Proactive Kubernetes Autoscaling Using Deep Q-Networks

  • arXiv: 2604.11017 (cross-listed)
  • Authors: Chamath Wanigasooriya, Indrajith Ekanayake
  • Subjects: cs.DC; cs.AI
  • Tags: Reinforcement Learning, Cloud Computing
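  • Summary: This paper proposes NimbusGuard, a proactive Kubernetes autoscaling system based on deep reinforcement learning that uses an LSTM to forecast future workload patterns. Experiments show that this proactive framework outperforms existing reactive approaches in both performance and cost efficiency.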

[400] Brief2Design: A Multi-phased, Compositional Approach to Prompt-based Graphic Design

  • arXiv: 2604.11019 (cross-listed)
  • Authors: Kotaro Kikuchi, Nami Ogawa
  • Subjects: cs.HC; cs.AI
  • Tags: Text-to-Image, Human-Computer Interaction
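  • Summary: This paper proposes Brief2Design, a multi-phase compositional approach that supports requirement extraction and recommendation, element-level exploration, and flexible recomposition. A user study shows that the structured workflow helps clarify requirements, at the cost of efficiency.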

[401] Optimal Stability of KL Divergence under Gaussian Perturbations

  • arXiv: 2604.11026 (cross-listed)
  • Authors: Jialu Pan, Yufeng Zhang, Nan Hu, Keqin Li
  • Subjects: cs.LG; cs.AI
  • Tags: Deep Learning Theory, Anomaly Detection
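  • Summary: This paper establishes sharp stability bounds on the KL divergence between an arbitrary distribution and the Gaussian family. Under mild moment conditions, it removes the Gaussian assumption required by prior work. The authors prove that the √ε rate is optimal and apply the result to analyzing out-of-distribution detection with flow-based generative models.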

[402] Federated Single-Agent Robotics: Multi-Robot Coordination Without Intra-Robot Multi-Agent Fragmentation

  • arXiv: 2604.11028 (cross-listed)
  • Authors: Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Multi-Agent System
  • Code: code
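  • Summary: This paper proposes the Federated Single-Agent Robotics (FSAR) architecture. Each robot runs a single embodied agent runtime, and multi-robot coordination is achieved through fleet-level federation. Experiments show that it substantially outperforms decomposition-heavy baselines in governance locality and recovery control.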

[403] Uncertainty-Aware Web-Conditioned Scientific Fact-Checking

  • arXiv: 2604.11036 (cross-listed)
  • Authors: Ashwin Vinod, Katrin Erk
  • Subjects: cs.CL; cs.AI
  • Tags: Uncertainty Estimation, Scientific Reasoning, Question Answering
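  • Summary: This paper proposes a scientific fact-verification pipeline that combines atomic predicate-argument decomposition with a calibrated, uncertainty-gated verification mechanism. The mechanism triggers domain-restricted web search only for uncertain facts. The method outperforms the strongest baselines on multiple benchmarks while remaining interpretable and cost-efficient.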

[404] RTMC: Step-Level Credit Assignment via Rollout Trees

  • arXiv: 2604.11037 (cross-listed)
  • Authors: Tao Wang, Suhang Zheng, Xiaoxiao Xu
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, LLM Agent
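  • Summary: This paper proposes Rollout-Tree Monte Carlo (RTMC), an advantage estimation method that requires no learned critic. It aggregates return statistics across trajectories that share a state to produce per-step Q-values and advantages. It improves pass@1 on SWE-bench Verified by 3.2 percentage points.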
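A minimal sketch of the rollout-tree idea as summarized, under stated assumptions:

- Each prefix of actions is treated as a tree node (the state).
- A node's value is the mean final return of every rollout that passes through it.
- The advantage of a step is the child node's value minus the parent's.

The summary does not give RTMC's exact statistics or state-matching rule. The hypothetical function name and both choices above are illustrative only.

```python
from collections import defaultdict
from statistics import mean


def rollout_tree_advantages(rollouts):
    """Per-step advantages from rollouts that share prefixes (a rollout tree).

    rollouts: list of (actions, final_return) pairs. The state before step t is
    assumed to be identified by the tuple of actions taken before t.
    """
    returns_through = defaultdict(list)
    for actions, ret in rollouts:
        for t in range(len(actions) + 1):
            returns_through[tuple(actions[:t])].append(ret)
    # Monte Carlo value of a node = mean return of every rollout through it
    value = {node: mean(rets) for node, rets in returns_through.items()}

    advantages = []
    for actions, _ in rollouts:
        per_step = []
        for t in range(len(actions)):
            q = value[tuple(actions[:t + 1])]  # Q(s_t, a_t) = value of child node
            per_step.append(q - value[tuple(actions[:t])])  # minus V(s_t)
        advantages.append(per_step)
    return advantages


rollouts = [
    (["read", "edit"], 1.0),
    (["read", "test"], 0.0),
    (["search", "edit"], 0.0),
    (["search", "test"], 0.0),
]
for (actions, _), adv in zip(rollouts, rollout_tree_advantages(rollouts)):
    print(actions, adv)
```

The shared first step "read" gets a positive advantage (0.25) because half of its continuations succeed. No learned critic is involved.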

[405] A Systematic Analysis of the Impact of Persona Steering on LLM Capabilities

  • arXiv: 2604.11048 (cross-listed)
  • Authors: Jiaqi Chen, Ming Wang, Tingna Xie, Shi Feng, Yongkang Liu
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Evaluation, LLM Personalization
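  • Summary: This paper studies how inducing Big Five personality traits in large language models affects their cognitive capabilities. It finds that persona induction produces stable, task-dependent performance changes. The authors propose Dynamic Persona Routing (DPR), which outperforms the best static persona configuration without additional training.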

[406] Shared Emotion Geometry Across Small Language Models: A Cross-Architecture Study of Representation, Behavior, and Methodological Confounds

  • arXiv: 2604.11050 (cross-listed)
  • Authors: Jihoon Jeong
  • Subjects: cs.CL; cs.AI
  • Tags: Interpretability, Representation Learning
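  • Summary: This paper finds that five mature small-language-model architectures share a nearly identical geometry for 21 emotions despite markedly different behavioral profiles. It also shows that RLHF restructures only representations that have not yet matured. Finally, it disentangles four distinct levels of effects that were conflated in prior emotion-vector research.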

[407] Rethinking Token-Level Credit Assignment in RLVR: A Polarity-Entropy Analysis

  • arXiv: 2604.11056 (cross-listed)
  • Authors: Yuhang He, Haodong Wu, Siyi Liu, Hongyu Ge, Hange Zhou, Keyi Wu, Zhuo Zheng, Qihong Lin, Zixin Zhong, Yongqi Zhang
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Reasoning, Reinforcement Learning, RLHF
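  • Summary: This paper analyzes credit assignment in RLVR through the joint lens of reward polarity and token entropy. It proves that the upper bound on the credit a token can carry is determined by its entropy. The authors propose Entropy-Aware Policy Optimization (EAPO), which outperforms strong baselines on two model families.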

[408] Pando: Do Interpretability Methods Work When Models Won't Explain Themselves?

  • arXiv: 2604.11061 (cross-listed)
  • Authors: Ziqian Zhong, Aashiq Muhamed, Mona T. Diab, Virginia Smith, Aditi Raghunathan
  • Subjects: cs.LG; cs.AI
  • Tags: Interpretability, LLM Evaluation
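  • Summary: This paper introduces Pando, a benchmark for evaluating whether interpretability methods remain effective when model explanations are missing or misleading. Results show that when explanations are untrustworthy, gradient-based attribution methods improve accuracy by 3-5 percentage points. Other methods yield limited gains.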

[409] Lightweight Low-Light Image Enhancement via Distribution-Normalizing Preprocessing and Depthwise U-Net

  • arXiv: 2604.11071 (cross-listed)
  • Authors: Shimon Murai, Teppei Kurita, Ryuta Satoh, Yusuke Moriuchi
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Image Enhancement
  • Venue: CVPR 2026 Workshop
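  • Summary: This paper proposes a lightweight low-light image enhancement framework that combines frozen algorithmic preprocessing with a depthwise-separable-convolution U-Net. The method placed 4th in the CVPR 2026 NTIRE Efficient Low-Light Image Enhancement Challenge.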

[410] ReSpinQuant: Efficient Layer-Wise LLM Quantization via Subspace Residual Rotation Approximation

  • arXiv: 2604.11080 (cross-listed)
  • Authors: Suyoung Kim, Sunghyun Wee, Hyeonjin Kim, Kyomin Hwang, Hyunho Lee, Nojun Kwak
  • Subjects: cs.CV; cs.AI
  • Tags: Model Compression, LLM Inference
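  • Summary: This paper proposes ReSpinQuant, a quantization framework built on offline fusion of activation rotations and residual subspace rotation matching. It retains the expressiveness of layer-wise adaptation with near-zero inference overhead. It achieves state-of-the-art performance under W4A4 and W3A3 quantization.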

[411] FlowCoMotion: Text-to-Motion Generation via Token-Latent Flow Modeling

  • arXiv: 2604.11083 (cross-listed)
  • Authors: Dawei Guan, Di Yang, Chengjie Jin, Jiangtao Wang
  • Subjects: cs.CV; cs.AI
  • Tags: Motion Synthesis, Flow Matching
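  • Summary: This paper proposes FlowCoMotion, a text-to-motion generation framework that unifies continuous and discrete motion representations via token-latent coupling. It achieves competitive performance on the HumanML3D and SnapMoGen benchmarks.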

[412] E2E-REME: Towards End-to-End Microservices Auto-Remediation via Experience-Simulation Reinforcement Fine-Tuning

  • arXiv: 2604.11094 (cross-listed)
  • Authors: Lingzhe Zhang, Yunpeng Zhai, Tong Jia, Minghua He, Chiming Duan, Zhaoyang Liu, Bolin Ding, Ying Li
  • Subjects: cs.SE; cs.AI
  • Tags: Software Engineering, LLM Agent
  • Venue: FSE 2026
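  • Summary: This paper introduces the end-to-end microservice remediation task (E2E-MR) and the MicroRemed benchmark. It also proposes E2E-REME, a model trained with experience-simulation reinforcement fine-tuning. Experiments show better accuracy and efficiency on both public and industrial microservice platforms.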

[413] Bottleneck Tokens for Unified Multimodal Retrieval

  • arXiv: 2604.11095 (cross-listed)
  • Authors: Siyu Sun, Jing Ren, Zhaohe Liao, Dongxiao Mao, Xiangyuan Ren, Yiyi Zhang, Haohua Zhao, Weixiong Lin, Jiang Shaohua, Liqing Zhang, Yuchao Zheng
  • Subjects: cs.LG; cs.AI
  • Tags: Vision-Language Model, Multimodal Learning, Information Retrieval
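  • Summary: This paper introduces Bottleneck Tokens (BToks) for unified multimodal retrieval, providing an explicit pooling mechanism and a generative information-compression objective. It achieves state-of-the-art performance among 2B-scale methods on MMEB-V2, with an overall score of 59.0.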

[414] Efficient Training for Cross-lingual Speech Language Models

  • arXiv: 2604.11096 (cross-listed)
  • Authors: Yan Zhou, Qingkai Fang, Yun Hong, Yang Feng
  • Subjects: cs.CL; cs.AI; cs.SD
  • Tags: Speech Processing, Multimodal Learning, Multilingual Learning
  • Venue: ACL 2026
  • Code: code
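  • Summary: This paper proposes an efficient training method for cross-lingual speech language models (CSLM). The method builds on discrete speech tokens and achieves cross-modal and cross-lingual alignment through continual pretraining. It attains good cross-modal alignment and language scalability without large-scale speech data.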

[415] ActorMind: Emulating Human Actor Reasoning for Speech Role-Playing

  • arXiv: 2604.11103 (cross-listed)
  • Authors: Xi Chen, Wei Xue, Yike Guo
  • Subjects: cs.SD; cs.AI
  • Tags: Speech Synthesis, Multi-Agent System, Dialogue System
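  • Summary: This paper introduces the concept of speech role-playing and the ActorMindBench benchmark. It also proposes ActorMind, a multi-agent reasoning framework that emulates how human actors perform. The framework comprises four agents (Eye, Ear, Brain, and Mouth) that process role descriptions and generate emotional speech responses.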

[416] Record-Remix-Replay: Hierarchical GPU Kernel Optimization using Evolutionary Search

  • arXiv: 2604.11109 (cross-listed)
  • Authors: Daniel Nichols, Konstantinos Parasyris, Caetano Melone, Tal Ben-Nun, Giorgis Georgakoudis, Harshitha Menon
  • Subjects: cs.DC; cs.AI; cs.LG; cs.PF
  • Tags: GPU Computing, Optimization, High Performance Computing
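  • Summary: This paper proposes R³, a hierarchical optimization framework for GPU kernels. It combines LLM-driven evolutionary search, Bayesian optimization, and record-replay compilation. It is more effective than traditional methods and nearly an order of magnitude faster than modern evolutionary search approaches.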

[417] Use of AI Tools: Guidelines to Maintain Academic Integrity in Computing Colleges

  • arXiv: 2604.11111 (cross-listed)
  • Authors: Hatem M. El-boghdadi, Toqeer Ali Syed, Ali Akarma, Qamar Wali
  • Subjects: cs.CY; cs.AI; cs.CL; cs.ET
  • Tags: Education Technology, AI Ethics
  • Venue: IJEEE 2025
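  • Summary: This paper examines how the use of AI tools in computing colleges affects academic integrity. It proposes general guidelines and targeted recommendations that apply to various forms of assessment. These help instructors uphold academic integrity while using AI tools to improve teaching.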

[418] Semantic-Geometric Dual Compression: Training-Free Visual Token Reduction for Ultra-High-Resolution Remote Sensing Understanding

  • arXiv: 2604.11122 (cross-listed)
  • Authors: Yueying Li, Fengxiang Wang, Yan Li, Mingshuo Chen, Mengying Zhao, Long Lan
  • Subjects: cs.CV; cs.AI
  • Tags: Remote Sensing, Vision-Language Model, LLM Inference
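  • Summary: This paper proposes DualComp, a framework for ultra-high-resolution remote sensing image understanding. It uses a semantic-geometric dual-stream compression strategy to reduce the computational cost of visual tokens, improving both efficiency and accuracy.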

[419] BoxTuning: Directly Injecting the Object Box for Multimodal Model Fine-Tuning

  • arXiv: 2604.11136 (cross-listed)
  • Authors: Zekun Qian, Ruize Han, Wei Feng
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Video Understanding, Question Answering
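  • Summary: This paper proposes BoxTuning, which injects bounding-box information directly into the visual modality rather than as textual coordinates. This substantially reduces token cost while preserving full temporal resolution. The method achieves strong performance on video question answering.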

[420] Cost-optimal Sequential Testing via Doubly Robust Q-learning

  • arXiv: 2604.11165 (cross-listed)
  • Authors: Doudou Zhou, Yiran Zhang, Dian Jin, Yingye Zheng, Lu Tian, Tianxi Cai
  • Subjects: stat.ML; cs.AI; cs.LG; math.ST
  • Tags: Reinforcement Learning, Medical AI, Decision Making
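  • Summary: This paper proposes a doubly robust Q-learning framework for learning cost-optimal sequential decision policies from retrospective data. In a prostate cancer cohort study, it effectively reduces testing costs without sacrificing predictive accuracy.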

[421] EmbodiedGovBench: A Benchmark for Governance, Recovery, and Upgrade Safety in Embodied Agent Systems

  • arXiv: 2604.11174 (cross-listed)
  • Authors: Xue Qin, Simin Luan, John See, Cong Yang, Zhijun Li
  • Subjects: cs.RO; cs.AI
  • Tags: Embodied AI, Benchmark, AI Safety
  • Code: code
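  • Summary: This paper proposes EmbodiedGovBench, a benchmark that evaluates the governance capabilities of embodied agent systems across seven governance dimensions. It fills a gap left by existing evaluations, which focus only on task success rates.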

[422] Taking a Pulse on How Generative AI is Reshaping the Software Engineering Research Landscape

  • arXiv: 2604.11184 (cross-listed)
  • Authors: Bianca Trinkenreich, Fabio Calefato, Kelly Blincoe, Viggo Tellefsen Wivestad, Antonio Pedro Santos Alves, Júlia Condé Araújo, Marina Condé Araújo, Paolo Tell, Marcos Kalinowski, Thomas Zimmermann, Margaret-Anne Storey
  • Subjects: cs.SE; cs.AI
  • Tags: Software Engineering, AI Ethics
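  • Summary: This paper reports a large-scale survey of 457 software engineering researchers. It analyzes how generative AI is used in SE research, along with its perceived benefits, risks, and governance needs. The survey provides an empirical baseline for integrating GenAI responsibly.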

[423] MathAgent: Adversarial Evolution of Constraint Graphs for Mathematical Reasoning Data Synthesis

  • arXiv: 2604.11188 (cross-listed)
  • Authors: Zixiong Yu, Jun Rao, Guhan Chen, Songtao Tian, Bohan Li, Jiansheng Wei, Min Zhang, Xiaojun Meng
  • Subjects: cs.CL; cs.AI
  • Tags: Data Synthesis, LLM Reasoning, Mathematical Reasoning
  • Venue: ACL 2026
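  • Summary: This paper proposes a hierarchical synthesis framework that adversarially evolves constraint graphs to synthesize high-quality mathematical reasoning data. The resulting data outperforms existing datasets on multiple math benchmarks and shows superior generalization.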

[424] Towards Adaptive Open-Set Object Detection via Category-Level Collaboration Knowledge Mining

  • arXiv: 2604.11195 (cross-listed)
  • Authors: Yuqi Ji, Junjie Ke, Lihuo He, Lizhi Wang, Xinbo Gao
  • Subjects: cs.CV; cs.AI
  • Tags: Object Detection, Domain Adaptation
  • Venue: IEEE TIP 2025
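  • Summary: This paper proposes a category-level collaborative knowledge mining strategy for adaptive open-set object detection. It uses a clustered memory bank and adaptive feature assignment to address cross-domain representation and category ambiguity.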

[425] ShapShift: Explaining Model Prediction Shifts with Subgroup Conditional Shapley Values

  • arXiv: 2604.11200 (cross-listed)
  • Authors: Tom Bewley, Salim I. Amoukou, Emanuele Albini, Saumitra Mishra, Manuela Veloso
  • Subjects: cs.LG; cs.AI; stat.ML
  • Tags: Interpretability
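  • Summary: This paper proposes ShapShift, which uses subgroup conditional Shapley values to explain the causes of shifts in model predictions. It supports decision trees, tree ensembles, and model-agnostic settings.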

[426] CocoaBench: Evaluating Unified Digital Agents in the Wild

  • arXiv: 2604.11201 (cross-listed)
  • Authors: CocoaBench Team, Shibo Hao, Zhining Zhang, Zhiqi Liang, Tianyang Liu, Yuheng Zha, Qiyue Gao, Jixuan Chen, Zilong Wang, Zhoujun Cheng, Haoxiang Zhang, Junli Wang, Hexi Jin, Boyuan Zheng, Kun Zhou, Yu Wang, Feng Yao, Licheng Liu, Yijiang Li, Zhifei Li, Zhengtao Han, Pracha Promthaw, Tommaso Cerruti, Xiaohan Fu, Ziqiao Ma, Jingbo Shang, Lianhui Qin, Julian McAuley, Eric P. Xing, Zhengzhong Liu, Rupesh Kumar Srivastava, Zhiting Hu
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Agent, Benchmark, GUI Automation
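  • Summary: This paper presents CocoaBench, a benchmark for evaluating unified digital agents on tasks that require combining vision, search, and programming. It reveals current agents' shortcomings in reasoning, planning, and tool use.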

[427] Designing Adaptive Digital Nudging Systems with LLM-Driven Reasoning

  • arXiv: 2604.11206 (cross-listed)
  • Authors: Tiziano Santilli, Mina Alipour, Mahyar Tourchi Moghaddam
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Reasoning, Human-Computer Interaction
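  • Summary: This paper proposes an architecture for LLM-driven adaptive digital nudging systems that combines behavioral science theory with software architecture. It demonstrates the architecture's feasibility in a residential energy sustainability scenario.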

[428] Exploring Knowledge Conflicts for Faithful LLM Reasoning: Benchmark and Method

  • arXiv: 2604.11209 (cross-listed)
  • Authors: Tianzhe Zhao, Jiaoyan Chen, Shuxiu Zhang, Haiping Zhu, Qika Lin, Jun Liu
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, RAG, Knowledge Graph
  • Venue: SIGIR 2026
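  • Summary: This paper presents ConflictQA, a benchmark that systematically instantiates conflicts between textual evidence and knowledge-graph evidence. The benchmark reveals LLMs' reasoning deficiencies when they handle cross-source knowledge conflicts. The authors propose the XoT framework to address them.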

[429] Regional Explanations: Bridging Local and Global Variable Importance

  • arXiv: 2604.11223 (cross-listed)
  • Authors: Salim I. Amoukou, Nicolas J-B. Brunel
  • Subjects: stat.ML; cs.AI; cs.LG
  • Tags: Interpretability
  • Venue: NeurIPS 2025
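  • Summary: This paper analyzes the limitations of Local Shapley Values and LIME. It proposes R-LOCO, which partitions the input space into regions with similar feature importance to provide more accurate local attributions.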

[430] RECIPER: A Dual-View Retrieval Pipeline for Procedure-Oriented Materials Question Answering

  • arXiv: 2604.11229 (cross-listed)
  • Authors: Zhuoyu Wu, Wenhui Ou, Pei-Sze Tan, Wenqi Fang, Sailaja Rajanala, Raphaël C.-W. Phan
  • Subjects: eess.SP; cs.AI; cs.CL
  • Tags: Information Retrieval, Question Answering, RAG
  • Code: code
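  • Summary: This paper proposes RECIPER, a dual-view retrieval pipeline that combines paragraph-level context with LLM-extracted procedural summaries. It substantially improves retrieval performance on procedure-oriented question answering in materials science.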

[431] Evolving Many Worlds: Towards Open-Ended Discovery in Petri Dish NCA via Population-Based Training

  • arXiv: 2604.11248 (cross-listed)
  • Authors: Uljad Berdica, Jakob Foerster, Frank Hutter, Arber Zela
  • Subjects: cs.NE; cs.AI; cs.MA
  • Tags: Open-Ended Evolution, Cellular Automata
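  • Summary: This paper proposes PBT-NCA, a meta-evolutionary algorithm that rewards novelty relative to historical behaviors as well as visual diversity. It achieves open-ended discovery in Petri Dish neural cellular automata, producing a variety of emergent life-like phenomena.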

[432] AbLWR: A Context-Aware Listwise Ranking Framework for Antibody-Antigen Binding Affinity Prediction via Positive-Unlabeled Learning

  • arXiv: 2604.11272 (cross-listed)
  • Authors: Fan Xu, Zhi-an Huang, Haohuai He, Yidong Song, Wei Liu, Dongxu Zhang, Yao Hu, Kay Chen Tan
  • Subjects: cs.LG; cs.AI
  • Tags: Drug Discovery, Representation Learning
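  • Summary: This paper proposes AbLWR, which reformulates antibody-antigen binding affinity prediction as a listwise ranking problem. It uses positive-unlabeled learning and multi-head self-attention to mitigate label sparsity and antigen variation.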

[433] THEIA: Learning Complete Kleene Three-Valued Logic in a Pure-Neural Modular Architecture

  • arXiv: 2604.11284 (cross-listed)
  • Authors: Augustus Haoyang Li
  • Subjects: cs.LG; cs.AI; cs.LO
  • Tags: Neurosymbolic AI, Representation Learning
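  • Summary: This paper proposes THEIA, a modular neural architecture that learns complete Kleene three-valued logic end to end without an external symbolic solver. Experiments show that the modular design achieves strong compositional generalization through a delayed-verdict mechanism. It reaches 99.97% accuracy in a 500-step evaluation.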
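For reference, the target of the learning task, Kleene's strong three-valued logic, can be written out directly. The sketch encodes False/Unknown/True as -1/0/1, an encoding chosen here for convenience and not taken from the paper. In this encoding, conjunction is the minimum, disjunction is the maximum, and negation flips the sign:

```python
from itertools import product

F, U, T = -1, 0, 1  # False, Unknown, True
NAMES = {F: "F", U: "U", T: "T"}


def k_and(a: int, b: int) -> int:
    return min(a, b)


def k_or(a: int, b: int) -> int:
    return max(a, b)


def k_not(a: int) -> int:
    return -a


def k_implies(a: int, b: int) -> int:
    # Material implication in strong Kleene logic: (not a) or b
    return k_or(k_not(a), b)


# Print the full AND / OR truth tables (9 rows each)
for a, b in product((F, U, T), repeat=2):
    print(NAMES[a], NAMES[b], "AND:", NAMES[k_and(a, b)], "OR:", NAMES[k_or(a, b)])
```

Each binary connective is fully specified by these 9 rows. A purely neural module has to reproduce them exactly for every input pair.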

[434] The Past Is Not Past: Memory-Enhanced Dynamic Reward Shaping

  • arXiv: 2604.11297 (cross-listed)
  • Authors: Yang Liu, Enxi Wang, Yufei Gao, Weixin Zhang, Bo Wang, Zhiyuan Zeng, Yikai Zhang, Yining Zheng, Xipeng Qiu
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Reinforcement Learning, LLM Training
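  • Summary: This paper proposes MEDS, a framework that stores representations of past behavior and uses density clustering to identify recurring error patterns. Errors that occur frequently are penalized more heavily. Experiments on five datasets and three base models show significant gains in sampling diversity and task performance.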

[435] Enhancing Multimodal Large Language Models for Ancient Chinese Character Evolution Analysis via Glyph-Driven Fine-Tuning

  • arXiv: 2604.11299 (cross-listed)
  • Authors: Rui Song, Lida Shi, Ruihua Qi, Yingji Li, Hao Xu
  • Subjects: cs.CL; cs.AI
  • Tags: Vision-Language Model, Multimodal Learning, Benchmark
  • Venue: ACL 2026
  • Code: code
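  • Summary: This paper builds a benchmark for analyzing the evolution of ancient Chinese script, with 11 tasks and 130,000 instances. It also proposes GEVO, a glyph-driven fine-tuning framework that improves multimodal large language models' understanding of Chinese character evolution. Experiments show that a 2B-scale model improves significantly on all tasks.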

[436] 3D-Anchored Lookahead Planning for Persistent Robotic Scene Memory via World-Model-Based MCTS

  • arXiv: 2604.11302 (cross-listed)
  • Authors: Bronislav Sidik, Dror Mizrahi
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Automated Planning, Model-Based RL
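  • Summary: This paper proposes 3D-ALP, a system that combines Monte Carlo tree search with a 3D-consistent world model to enable robotic manipulation planning with spatial memory. On tasks that require spatial memory, it achieves significantly higher success rates than a greedy baseline.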

[437] Learning to Forget -- Hierarchical Episodic Memory for Lifelong Robot Deployment

  • arXiv: 2604.11306 (cross-listed)
  • Authors: Leonard Bärmann, Joana Plewnia, Alex Waibel, Tamim Asfour
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Memory Architecture, Continual Learning
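  • Summary: This paper proposes H²-EMV, a framework that learns selective forgetting policies from user interaction to build a hierarchical episodic memory. It reduces memory by 45% and query time by 35% while maintaining question-answering accuracy. It also adapts to user-specific priorities.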

[438] The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

  • arXiv: 2604.11309 (cross-listed)
  • Authors: Yihao Zhang, Kai Wang, Jiangrong Wu, Haolin Wu, Yuxuan Zhou, Zeming Wei, Dongxian Wu, Xun Chen, Jun Sun, Meng Sun
  • Subjects: cs.CR; cs.AI; cs.CL; cs.CV; cs.LG
  • Tags: LLM Security, Adversarial Robustness, AI Safety
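  • Summary: This paper proposes the "salami slicing risk" attack, which chains multiple low-risk inputs that cumulatively trigger high-risk behavior. The attack achieves success rates above 90% on GPT-4o and Gemini. The paper also proposes a defense strategy that reduces attack effectiveness by 44.8%.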

[439] Network Effects and Agreement Drift in LLM Debates

  • arXiv: 2604.11312 (cross-listed)
  • Authors: Erica Cau, Andrea Failla, Giulio Rossetti
  • Subjects: cs.SI; cs.AI; cs.CY; cs.MA
  • Tags: Social Simulation, LLM Agent, Social Reasoning
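  • Summary: This paper uses network generative models to study the collective behavior of LLM agents in multi-round debates. It identifies an "agreement drift" phenomenon, in which agents tend to shift toward particular opinion positions. The study stresses the need to separate structural effects from model bias.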

[440] S$^3$: Structured Sparsity Specification

  • arXiv: 2604.11315 (cross-listed)
  • Authors: Ayoub Ghriss
  • Subjects: cs.LG; cs.AI
  • Tags: Model Compression, Optimization
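  • Summary: This paper proposes S³, an algebraic framework for defining, composing, and implementing structured sparsity patterns. It supports sparsity structures ranging from fine-grained N:M patterns to coarse-grained channel pruning and integrates seamlessly with OBD and OBS.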
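An N:M pattern, the fine-grained end of that range, keeps N nonzero weights in every group of M consecutive weights. The sketch below builds a 2:4 mask by magnitude, a common selection rule assumed here. It does not reproduce S³'s algebraic specification or its OBD/OBS integration.

```python
import numpy as np


def nm_mask(weights: np.ndarray, n: int = 2, m: int = 4) -> np.ndarray:
    """Boolean mask keeping the n largest |w| in each group of m along the last axis."""
    if weights.shape[-1] % m != 0:
        raise ValueError("last dimension must be divisible by m")
    groups = np.abs(weights).reshape(-1, m)
    # Column indices of the n largest magnitudes within each group
    keep = np.argsort(groups, axis=1)[:, -n:]
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, keep, True, axis=1)
    return mask.reshape(weights.shape)


w = np.array([[0.1, -0.9, 0.4, 0.05, 0.3, 0.2, -0.7, 0.6]])
mask = nm_mask(w)
print(mask.astype(int))  # [[0 1 1 0 0 0 1 1]]
print(w * mask)          # pruned weights, exactly 2 nonzeros per group of 4
```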

[441] Do LLMs Know Tool Irrelevance? Demystifying Structural Alignment Bias in Tool Invocations

  • arXiv: 2604.11322 (cross-listed)
  • Authors: Yilong Liu, Xixun Lin, Pengfei Cao, Ge Zhang, Fang Fang, Yanan Cao
  • Subjects: cs.CL; cs.AI
  • Tags: Tool Learning, LLM Agent, LLM Evaluation
  • Venue: ACL 2026
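  • Summary: This paper identifies a "structural alignment bias" in LLM tool invocation: when query attributes can be mapped onto a tool's parameters, models tend to invoke that tool even when it is irrelevant. The paper introduces the SABEval dataset and a rebalancing strategy based on contrastive attention attribution to mitigate this bias.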

[442] A Compact and Efficient 1.251 Million Parameter Machine Learning CNN Model PD36-C for Plant Disease Detection: A Case Study

  • arXiv: 2604.11332 (cross-listed)
  • Authors: Shkelqim Sherifi
  • Subjects: cs.CV; cs.AI
  • Tags: Edge Computing, Model Compression, Agricultural AI
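  • Summary: This paper presents PD36-C, a compact CNN with only 1.25M parameters for plant disease classification, which reaches 99.5% test accuracy on 38 classes. The model is designed for edge deployment and ships with a desktop application for offline inference.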

[443] Governance by Design: A Parsonian Institutional Architecture for Internet-Wide Agent Societies

  • arXiv: 2604.11337 (cross-listed)
  • Authors: Anbang Ruan
  • Subjects: cs.MA; cs.AI; cs.CY
  • Tags: Multi-Agent System, AI Ethics, AI Safety
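  • Summary: This paper applies Parsons' AGIL framework to derive a governance architecture for internet-scale agent societies and uses it to diagnose the OpenClaw ecosystem. It finds that existing infrastructure lacks a governance coordination layer and normative foundations. The paper proposes a prioritized development roadmap.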

[444] Minimal Embodiment Enables Efficient Learning of Number Concepts in Robot

  • arXiv: 2604.11373 (cross-listed)
  • Authors: Zhegong Shangguan, Alessandro Di Nuovo, Angelo Cangelosi
  • Subjects: cs.RO; cs.AI
  • Tags: Embodied AI, Robotics, Representation Learning
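  • Summary: This paper studies embodied numerical learning, training a neural network to count sequentially through a robot's natural interaction with its environment. The embodied model reaches 96.8% accuracy with only 10% of the training data. It also spontaneously develops numerical representations consistent with biological cognition.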

[445] From Redaction to Restoration: Deep Learning for Medical Image Anonymization and Reconstruction

  • arXiv: 2604.11376 (cross-listed)
  • Authors: Adrienne Kline, Abhijit Gaonkar, Daniel Pittman, Chris Kuehn, Nils Forkert
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Privacy, Image Synthesis
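  • Summary: This paper proposes an end-to-end deep learning framework for medical image anonymization and reconstruction. It first detects and redacts regions containing protected health information, then uses a latent diffusion model to fill them with anatomically plausible content. This preserves privacy while maintaining downstream task utility.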

[446] One Scale at a Time: Scale-Autoregressive Modeling for Fluid Flow Distributions

  • arXiv: 2604.11403 (cross-listed)
  • Authors: Mario Lino, Nils Thuerey
  • Subjects: cs.CE; cs.AI
  • Tags: Scientific Computing, Neural Operator
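  • Summary: This paper proposes Scale-Autoregressive modeling (SAR) for hierarchically sampling fluid flow distributions on unstructured meshes. SAR generates samples from coarse to fine and is 2-7x faster than diffusion models while maintaining accuracy. This makes it suitable for rapid estimation of turbulence statistics.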

[447] Retrieval as Generation: A Unified Framework with Self-Triggered Information Planning

  • arXiv: 2604.11407 (cross-listed)
  • Authors: Bo Li, Mingda Wang, Gexiang Fang, Shikun Zhang, Wei Ye
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, Question Answering, LLM Inference
  • Code: code
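  • Summary: This paper proposes GRIP, a framework that embeds retrieval control into token-level decoding to coordinate retrieval and generation end to end. Through self-triggered information planning, the model decides within a single autoregressive trajectory when to retrieve, how to reformulate queries, and when to stop.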

[448] Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech

  • arXiv: 2604.11417 (cross-listed)
  • Authors: Edwin C. Montiel-Vazquez, Christian Arzate Cruz, Stefanos Gkikas, Thomas Kassiotis, Giorgos Giannakakis, Randy Gomez
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Motion Synthesis, Affective Computing
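  • Summary: This paper proposes a lightweight Transformer that predicts where robots should place iconic co-speech gestures, and how intense they should be, from text and emotion information alone. The model outperforms GPT-4o on gesture placement classification and intensity regression while remaining suitable for real-time deployment.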

[449] Emulating Non-Differentiable Metrics via Knowledge-Guided Learning: Introducing the Minkowski Image Loss

  • arXiv: 2604.11422 (cross-listed)
  • Authors: Filippo Quarenghi, Ryan Cotsakis, Tom Beucler
  • Subjects: cs.LG; cs.AI
  • Tags: Scientific Computing, Image Super-Resolution, Neural Operator
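  • Summary: This paper addresses the "differentiability gap" in deep learning for Earth system science. It proposes a framework that handles non-differentiable functions in two ways, analytical approximation and neural emulators. It also introduces the Minkowski image loss, which measures the integral geometry of surface precipitation fields. Experiments on the EUMETNET OPERA dataset show high emulation accuracy but also reveal a trade-off between Lipschitz regularization and gradient signal quality.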

[450] METRO: Towards Strategy Induction from Expert Dialogue Transcripts for Non-collaborative Dialogues

  • arXiv: 2604.11427 (cross-listed)
  • Authors: Haofu Yang, Jiaji Liu, Chen Huang, Faguo Wu, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.CL; cs.AI
  • Tags: Dialogue System, LLM Agent
  • Venue: ACL 2026
  • Code: code
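  • Summary: This paper proposes METRO, which uses large language models to automatically induce strategy actions and planning logic from raw dialogue transcripts for non-collaborative dialogue agents. It formalizes expert knowledge as a strategy-forest structure. It outperforms existing methods by 9%-10% on average across two benchmarks and transfers well across tasks.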

[451] Hardening x402: PII-Safe Agentic Payments via Pre-Execution Metadata Filtering

  • arXiv: 2604.11430 (cross-listed)
  • Authors: Vladimir Stantchev
  • Subjects: cs.CR; cs.AI; cs.CY
  • Tags: LLM Agent, Privacy
  • Code: code
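  • Summary: This paper presents presidio-hardened-x402, the first open-source middleware that intercepts AI agent payment requests before they are transmitted. It detects and redacts personally identifiable information (PII), enforces declarative spending policies, and blocks duplicate replay attacks. It achieves an F1 score of 0.894 for PII detection with a p99 latency of only 5.73 ms.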

[452] Think Before you Write: QA-Guided Reasoning for Character Descriptions in Books

  • arXiv: 2604.11435 (cross-listed)
  • Authors: Argyrios Papoudakis, Mirella Lapata, Frank Keller
  • Subjects: cs.CL; cs.AI; cs.IR; cs.LG
  • Tags: LLM Reasoning, Text Generation
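  • Summary: This paper proposes a training framework that separates reasoning from generation to produce character descriptions from long-form narratives. A reasoning model generates structured QA reasoning traces, and a generation model produces the final character description from those traces. The approach substantially improves faithfulness and informativeness on the BookWorm and CroSS datasets.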

[453] Low-rank Optimization Trajectories Modeling for LLM RLVR Acceleration

  • arXiv: 2604.11446 (cross-listed)
  • Authors: Zhipeng Chen, Tao Qian, Wayne Xin Zhao, Ji-Rong Wen
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Training, Reinforcement Learning
  • Code: code
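  • Summary: This paper proposes NExt, a framework that nonlinearly models and infers low-rank parameter trajectories to accelerate reinforcement learning with verifiable rewards (RLVR) for large language models. It remains compatible with a range of RLVR algorithms while reducing computational cost by about 37.5%.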

[454] SLALOM: Simulation Lifecycle Analysis via Longitudinal Observation Metrics for Social Simulation

  • arXiv: 2604.11466 (cross-listed)
  • Authors: Juhoon Lee, Joseph Seering
  • Subjects: cs.MA; cs.AI
  • Tags: Social Simulation, LLM Evaluation
  • Venue: CHI 2026 Workshop
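  • Summary: This paper introduces SLALOM, a framework that shifts social simulation validation from checking outcomes to assessing process fidelity. It uses pattern-oriented modeling and dynamic time warping to quantify how well simulated trajectories structurally match empirical ground truth. This addresses the "stopped clock problem" in LLM-based social simulation.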
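Dynamic time warping is one of the two tools named above. It aligns two trajectories that may unfold at different speeds before comparing them. The textbook O(nm) sketch below uses an absolute-difference local cost. The trajectory values are hypothetical, and the summary does not specify SLALOM's actual observables or distance settings.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping distance with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances while b[j-1] is reused
                cost[i][j - 1],      # b advances while a[i-1] is reused
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical trajectories of some aggregate statistic over simulation steps
simulated = [0.1, 0.4, 0.4, 0.8, 0.9]
empirical = [0.1, 0.4, 0.8, 0.9]
print(dtw_distance(simulated, empirical))  # 0.0: same shape, different pacing
```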

[455] ADD for Multi-Bit Image Watermarking

  • arXiv: 2604.11491 (cross-listed)
  • Authors: An Luo, Jie Ding
  • Subjects: stat.ML; cs.AI; cs.LG; math.ST; stat.ME
  • Tags: Image Watermarking, Image Synthesis
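  • Summary: This paper proposes ADD (Add, Dot, Decode), a multi-bit image watermarking method that achieves 100% decoding accuracy on a 48-bit watermarking task. Under various image distortions its accuracy drops by at most 2%, compared with an average drop of 14% for existing methods. It is 2x faster at embedding and 7.4x faster at decoding than the fastest existing methods.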
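The method's name suggests a spread-spectrum style pipeline. The sketch below is a generic illustration of that idea, not the paper's algorithm:

- Each bit gets a pseudorandom orthonormal carrier pattern (an assumption).
- Embedding adds plus or minus alpha times that pattern.
- Decoding thresholds the dot product with each pattern at zero.

The image size, alpha, and seeds are arbitrary.

```python
import numpy as np


def make_patterns(num_bits: int, shape, seed: int = 0) -> np.ndarray:
    """Orthonormal pseudorandom carrier patterns, one row per bit (the shared key)."""
    rng = np.random.default_rng(seed)
    d = int(np.prod(shape))
    q, _ = np.linalg.qr(rng.standard_normal((d, num_bits)))  # d x num_bits, orthonormal columns
    return q.T


def embed(image: np.ndarray, bits, patterns: np.ndarray, alpha: float = 10.0) -> np.ndarray:
    """Add: superimpose +alpha * p_i for a 1 bit and -alpha * p_i for a 0 bit."""
    signs = 2.0 * np.asarray(bits) - 1.0
    return image + alpha * (signs @ patterns).reshape(image.shape)


def decode(image: np.ndarray, patterns: np.ndarray) -> np.ndarray:
    """Dot, then Decode: correlate with every pattern and threshold at zero."""
    scores = patterns @ image.ravel()
    return (scores > 0).astype(int)


rng = np.random.default_rng(1)
cover = rng.random((64, 64))           # stand-in for a real image
message = rng.integers(0, 2, size=48)  # a 48-bit payload
patterns = make_patterns(48, cover.shape)
marked = embed(cover, message, patterns)
print((decode(marked, patterns) == message).mean())  # fraction of bits recovered
```

alpha is deliberately large so that this toy decodes reliably with no attack applied. A practical scheme must trade imperceptibility against robustness to distortions, which this sketch does not address.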

[456] Quantization Dominates Rank Reduction for KV-Cache Compression

  • arXiv: 2604.11501 (cross-listed)
  • Authors: Samuel Salfati
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Inference, Model Compression
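  • Summary: This paper compares two strategies for KV-cache compression in Transformer inference: rank reduction and quantization. At the same storage budget, quantization consistently beats rank reduction, by 4 to 364 PPL. The authors' theoretical analysis shows that under softmax attention routing, projection damage exceeds quantization damage, and that this advantage holds regardless of the choice of basis.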
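A toy illustration of the equal-budget accounting behind this comparison. It gives 4-bit per-row quantization and a truncated-SVD (rank-r) factorization the same number of bits, then compares their reconstruction error on a random key matrix. The data is synthetic Gaussian and the metric is matrix error rather than perplexity, so this is not the paper's experiment. The sizes and the fp16 storage assumption are arbitrary.

```python
import numpy as np


def quantize_int4(x: np.ndarray) -> np.ndarray:
    """Symmetric per-row 4-bit quantization (levels -7..7), returned dequantized."""
    scale = np.abs(x).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # guard against all-zero rows
    q = np.clip(np.round(x / scale), -7, 7)
    return q * scale


def low_rank(x: np.ndarray, rank: int) -> np.ndarray:
    """Best rank-r approximation via truncated SVD."""
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def rel_err(x: np.ndarray, x_hat: np.ndarray) -> float:
    return float(np.linalg.norm(x - x_hat) / np.linalg.norm(x))


tokens, head_dim = 256, 64
keys = np.random.default_rng(0).standard_normal((tokens, head_dim))

# Quantization budget: 4 bits per entry plus one fp16 scale per row
budget_bits = tokens * head_dim * 4 + tokens * 16
# Largest rank whose fp16 factors U (tokens x r) and V (r x head_dim) fit that budget
rank = budget_bits // ((tokens + head_dim) * 16)

print("rank at equal budget:", rank)
print("int4 error:", round(rel_err(keys, quantize_int4(keys)), 3))
print("rank-r error:", round(rel_err(keys, low_rank(keys, rank)), 3))
```

The budget allows rank 13 out of 64 here. On a flat-spectrum matrix like this one, discarding whole directions costs far more than coarsely rounding every entry.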

[457] METER: Evaluating Multi-Level Contextual Causal Reasoning in Large Language Models

  • arXiv: 2604.11502 (cross-listed)
  • Authors: Pengfeng Li, Chen Huang, Chaoqun Hao, Hongyao Chen, Xiao-Yong Wei, Wenqiang Lei, See-Kiong Ng
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, LLM Evaluation, Benchmark
  • Venue: ACL 2026
  • Code: code
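  • Summary: This paper presents METER, a benchmark that systematically evaluates LLMs on all three rungs of the causal ladder within a unified contextual setting. LLM performance drops markedly as the causal rung rises. Mechanistic analysis reveals two main failure modes:
    - susceptibility to distraction by information that is causally irrelevant but factually correct;
    - faithfulness to the context that declines at higher causal rungs.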

[458] Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers

  • arXiv: 2604.11507 (cross-listed)
  • Authors: I. Esra Buyuktahtakin
  • Subjects: math.OC; cs.AI; cs.LG; eess.SY; stat.ML
  • Tags: Decision Making, Reinforcement Learning
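  • Summary: This tutorial introduces deep learning for sequential decision making under uncertainty from an operations research/management science (OR/MS) perspective. It reviews the foundations of decision making and connects them to modern neural architectures. It argues that deep learning complements rather than replaces optimization, and discusses applications in supply chains, healthcare, energy, and other domains.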

[459] Not All Forgetting Is Equal: Architecture-Dependent Retention Dynamics in Fine-Tuned Image Classifiers

  • arXiv: 2604.11508 (cross-listed)
  • Authors: Miit Daga, Swarna Priya Ramu
  • Subjects: cs.LG; cs.AI
  • Tags: Transfer Learning, Continual Learning
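  • Summary: This paper studies forgetting during the fine-tuning of image classifiers. It finds that different architectures (ResNet vs. ViT) forget fundamentally different samples, with a Jaccard overlap of only 0.15-0.34. Forgetting of individual samples appears random, whereas class-level forgetting is semantically interpretable. The authors draw implications for curriculum design and ensemble construction.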
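The overlap figure quoted above is a Jaccard index over the two sets of forgotten sample IDs, |A ∩ B| / |A ∪ B|. A minimal sketch with hypothetical IDs:

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical IDs of test samples that each fine-tuned model "forgot"
forgotten_resnet = {3, 8, 15, 21, 42, 57}
forgotten_vit = {8, 15, 30, 64, 71, 88}
print(round(jaccard(forgotten_resnet, forgotten_vit), 2))  # 0.2
```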

[460] Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

  • arXiv: 2604.11510 (cross-listed)
  • Authors: Jiashu Yao, Heyan Huang, Chuwei Luo, Daiqing Wu, Zeming Liu, Yuhang Guo, Yangyang Kang
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Reinforcement Learning, LLM Training
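  • Summary: This paper proposes Policy Split, a paradigm that divides the policy into a normal mode and a high-entropy mode. The two modes share model parameters and are trained with collaborative dual-mode entropy regularization. The normal mode optimizes task correctness, while the high-entropy mode favors exploration. The method outperforms existing entropy-guided RL baselines at every model scale.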

[461] EdgeCIM: A Hardware-Software Co-Design for CIM-Based Acceleration of Small Language Models

  • arXiv: 2604.11512 (cross-listed)
  • Authors: Jinane Bazzi, Mariam Rakka, Fadi Kurdahi, Mohammed E. Fouda, Ahmed Eltawil
  • Subjects: cs.AR; cs.AI
  • Tags: Compute-in-Memory, Edge Computing, LLM Inference
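  • Summary: This paper proposes EdgeCIM, a hardware-software co-design framework for accelerating small language model inference on edge devices. It is built on compute-in-memory macros implemented in a 65 nm process. Compared with an NVIDIA Orin Nano, it delivers 7.3x higher throughput and 49.59x better energy efficiency on LLaMA3.2-1B.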

[462] From Translation to Superset: Benchmark-Driven Evolution of a Production AI Agent from Rust to Python

  • arXiv: 2604.11518 (cross-listed)
  • Authors: Jinhua Wang, Biswa Sengupta
  • Subjects: cs.SE; cs.AI
  • Tags: Code Generation, LLM Agent
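  • Summary: This paper proposes an LLM-assisted continuous code translation method that uses agent benchmarks as the objective function to drive iterative refinement. With it, the authors translate a production Rust codebase (648K LOC) into Python (41K LOC). The Python version reaches 73.8% on SWE-bench Verified and adds 30 extra features.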

[463] SVD-Prune: Training-Free Token Pruning For Efficient Vision-Language Models

  • arXiv: 2604.11530 (cross-listed)
  • Authors: Yvon Apedo, Martyna Poreba, Michal Szczepanski, Samia Bouchafa
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Model Compression
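  • Summary: This paper proposes SVD-Prune, a training-free, plug-and-play token pruning method for vision-language models based on singular value decomposition. It decomposes the visual-token feature matrix and uses statistical leverage scores to select the tokens that contribute most to the dominant global variance. It consistently outperforms existing pruning methods under extreme visual-token budgets.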
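The leverage-score selection described above can be sketched in two steps. First, take the top-k left singular vectors of the (tokens x features) matrix. Then score each token by the squared norm of its row in those vectors. The choice of k and the plain top-budget rule are assumptions for illustration; the paper's exact scoring and budget handling may differ.

```python
import numpy as np


def leverage_scores(features: np.ndarray, k: int) -> np.ndarray:
    """Rank-k statistical leverage of each row (token) of a (tokens x dim) matrix."""
    u, _, _ = np.linalg.svd(features, full_matrices=False)
    return np.sum(u[:, :k] ** 2, axis=1)  # the scores sum to k


def prune_tokens(features: np.ndarray, budget: int, k: int) -> np.ndarray:
    """Indices, in original order, of the `budget` tokens with the highest leverage."""
    scores = leverage_scores(features, k)
    keep = np.argsort(scores)[-budget:]
    return np.sort(keep)


# Nine duplicate "background" tokens and one distinctive token at index 9
tokens = np.array([[1.0, 0.0, 0.0]] * 9 + [[0.0, 1.0, 0.0]])
print(leverage_scores(tokens, k=2).round(3))
print(prune_tokens(tokens, budget=1, k=2))  # [9]
```

The distinctive token gets leverage 1.0 while each duplicate gets 1/9, so the distinctive token survives even a budget of one.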

[464] CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space

  • arXiv: 2604.11539 (cross-listed)
  • Authors: Sohwi Lim, Lee Hyoseok, Jungjoon Park, Tae-Hyun Oh
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Information Retrieval
  • Venue: CVPR 2026
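  • Summary: This paper proposes CLAY, an adaptive similarity computation method that reshapes the embedding space of a pretrained vision-language model into a text-conditioned similarity space. This enables efficient multi-condition retrieval without additional training. With fixed visual embeddings, it achieves high retrieval accuracy and substantial computational savings.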

[465] NovBench: Evaluating Large Language Models on Academic Paper Novelty Assessment

  • arXiv: 2604.11543 (cross-listed)
  • Authors: Wenqing Wu, Yi Zhao, Yuzhuo Wang, Siyou Li, Juexi Shao, Yunfei Long, Chengzhi Zhang
  • Subjects: cs.CL; cs.AI; cs.DL; cs.IR
  • Tags: LLM Evaluation, Benchmark, Scientific Reasoning
  • Venue: ACL 2026
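  • Summary: This paper presents NovBench, the first large-scale benchmark for evaluating how well large language models assess the novelty of academic papers. Experiments show that current models have a limited understanding of scientific novelty and that fine-tuned models often follow instructions poorly.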

[466] Time is Not a Label: Continuous Phase Rotation for Temporal Knowledge Graphs and Agentic Memory

  • arXiv: 2604.11544 (cross-listed)
  • Authors: Weixian Waylon Li, Jiaxin Zhang, Xianan Jim Yang, Tiejun Ma, Yiwen Guo
  • Subjects: cs.CL; cs.AI
  • Tags: Temporal Knowledge Graph, LLM Agent, Memory Architecture
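  • Summary: This paper proposes RoMem, a temporal knowledge graph module for structured memory systems that uses continuous phase rotation to distinguish persistent facts from evolving ones. It achieves state-of-the-art performance on temporal knowledge graph completion and substantially improves results in agent memory applications.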
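
One way to picture "continuous phase rotation" is a RotatE-style score in which the rotation angle grows linearly with time, so a relation with zero frequency yields time-invariant (persistent) facts. The sketch below assumes that form; the linear phase `relation_phase + omega * t` and the function names are hypothetical and not taken from the RoMem paper.

```python
import numpy as np


def rotate(emb: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Rotate each complex coordinate of `emb` by the matching angle in `phase` (radians)."""
    return emb * np.exp(1j * phase)


def temporal_score(head: np.ndarray, relation_phase: np.ndarray, omega: np.ndarray,
                   t: float, tail: np.ndarray) -> float:
    """Negative distance between the time-rotated head and the tail (higher = more plausible).

    Assumed form: total angle = relation_phase + omega * t. omega == 0 gives a
    time-invariant (persistent) relation; omega != 0 makes plausibility depend on t.
    """
    rotated = rotate(head, relation_phase + omega * t)
    return -float(np.linalg.norm(rotated - tail))
```

Treating time as a continuous angle rather than a discrete label means nearby timestamps give nearby rotations, which is the intuition the title "Time is Not a Label" points at.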

[467] FM-Agent: Scaling Formal Methods to Large Systems via LLM-Based Hoare-Style Reasoning

  • arXiv: 2604.11556 (cross-listed)
  • Authors: Haoran Ding, Zhaoguo Wang, Haibo Chen
  • Subjects: cs.SE; cs.AI
  • Tags: Formal Methods, LLM Agent, Software Testing
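  • Summary: This paper proposes FM-Agent, the first framework for automated compositional reasoning over large systems, which uses an LLM to generate function-level specifications automatically and perform Hoare-style reasoning. It found 522 new bugs in systems of up to 143K lines of code.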

[468] bacpipe: a Python package to make bioacoustic deep learning models accessible

  • arXiv: 2604.11560 (cross-listed)
  • Authors: Vincent S. Kather, Sylvain Haupert, Burooj Ghani, Dan Stowell
  • Subjects: cs.LG; cs.AI
  • Tags: Bioacoustics, Benchmark, Speech Processing
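  • Summary: This paper introduces bacpipe, a Python package that makes bioacoustic deep learning models more accessible through graphical and programmatic interfaces. It gives ecologists and computer scientists a modular pipeline for model evaluation and benchmarking.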

[469] Synthius-Mem: Brain-Inspired Hallucination-Resistant Persona Memory Achieving 94.4% Memory Accuracy and 99.6% Adversarial Robustness on LoCoMo

  • arXiv: 2604.11563 (cross-listed)
  • Authors: Artem Gadzhiev, Andrew Kislov
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Agent, Memory Architecture, LLM Hallucination
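  • Summary: This paper proposes Synthius-Mem, a brain-inspired structured persona memory system that decomposes conversations into six cognitive domains and extracts structured facts. It reaches 94.37% accuracy and 99.55% adversarial robustness on the LoCoMo benchmark, surpassing human performance.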

[470] Minimizing classical resources in variational measurement-based quantum computation for generative modeling

  • arXiv: 2604.11578 (cross-listed)
  • Authors: Arunava Majumder, Hendrik Poulsen Nautrup, Hans J. Briegel
  • Subjects: cs.AI; cs.LG; stat.ML
  • Tags: Quantum Computing, Generative Model
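  • Summary: This paper proposes a restricted variational measurement-based quantum computation model that extends the unitary setting to channel-based generative models using only a single additional trainable parameter. This minimal extension suffices to generate probability distributions that unitary models cannot learn.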

[471] A Triadic Suffix Tokenization Scheme for Numerical Reasoning

  • arXiv: 2604.11582 (cross-listed)
  • Authors: Olga Chetverina
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Reasoning, Mathematical Reasoning, Tokenization
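  • Summary: This paper proposes Triadic Suffix Tokenization (TST), a deterministic tokenization scheme that splits numbers into three-digit triads annotated with explicit magnitude markers. It preserves positional and decimal structure and may improve LLM performance in arithmetic and scientific reasoning.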
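
A minimal sketch of triad splitting with magnitude markers, assuming non-negative decimal strings and a hypothetical marker format `<e{k}>` that records the power of ten of each triad's last digit; the paper's actual marker vocabulary is not given here.

```python
def triadic_tokens(number: str) -> list[str]:
    """Split a non-negative decimal string into digit triads, each followed by a magnitude marker.

    Hypothetical marker `<e{k}>`: the triad's value is int(triad) * 10**k.
    """
    int_part, _, frac_part = number.partition(".")
    int_part = int_part.lstrip("0") or "0"
    tokens: list[str] = []
    # Integer part: group from the right, so only the leading triad may be shorter than 3 digits.
    head = len(int_part) % 3 or 3
    groups = [int_part[:head]] + [int_part[i:i + 3] for i in range(head, len(int_part), 3)]
    for idx, group in enumerate(groups):
        tokens += [group, f"<e{3 * (len(groups) - 1 - idx)}>"]
    # Fractional part: group from the left; the marker is the exponent of the triad's last digit.
    for idx in range(0, len(frac_part), 3):
        group = frac_part[idx:idx + 3]
        tokens += [group, f"<e-{idx + len(group)}>"]
    return tokens
```

For example, `triadic_tokens("1234567.89")` gives `["1", "<e6>", "234", "<e3>", "567", "<e0>", "89", "<e-2>"]`: every triad carries its own scale, so the value can be read off without counting digit positions.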

[472] Layerwise Dynamics for In-Context Classification in Transformers

  • arXiv: 2604.11613 (cross-listed)
  • Authors: Patrick Lutz, Themistoklis Haris, Arjun Chandra, Aditya Gangrade, Venkatesh Saligrama
  • Subjects: cs.LG; cs.AI
  • Tags: In-Context Learning, Deep Learning Theory
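  • Summary: This paper studies in-context classification in Transformers and, by enforcing permutation equivariance over features and labels, extracts an explicit depth-indexed recursion rule. The resulting dynamics implement a geometry-driven algorithmic pattern that provably amplifies class separation.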

[473] CUTEv2: Unified and Configurable Matrix Extension for Diverse CPU Architectures with Minimal Design Overhead

  • arXiv: 2604.11615 (cross-listed)
  • Authors: Jinpeng Ye, Chongxi Wang, Wenqing Li, Bin Yuan, Shiyi Wang, Fenglu Zhang, Junyu Yue, Jianan Xie, Yunhao Ye, Haoyu Deng, Yingkun Zhou, Xin Cheng, Fuxin Zhang, Jian Wang
  • Subjects: cs.AR; cs.AI; cs.DC; cs.LG
  • Tags: Hardware Acceleration, Circuit Design, DNN Deployment
  • Venue: DAC 2026
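  • Summary: This paper proposes a unified, configurable matrix extension architecture for CPUs that achieves low-overhead integration by decoupling the matrix unit from the CPU pipeline. The design attains high utilization across multiple platforms and delivers significant speedups on AI models with minimal hardware overhead.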

[474] SCNO: Spiking Compositional Neural Operator -- Towards a Neuromorphic Foundation Model for Nuclear PDE Solving

  • arXiv: 2604.11625 (cross-listed)
  • Authors: Samrendra Roy, Souvik Chakraborty, Rizwan-uddin, Syed Bahauddin Alam
  • Subjects: cs.LG; cs.AI
  • Tags: Neuromorphic Computing, Scientific Computing, Neural Operator
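  • Summary: This paper introduces the Spiking Compositional Neural Operator (SCNO), a modular architecture that combines spiking and conventional components for PDE solving. It achieves lower error on coupled PDEs with fewer parameters and supports modular extension with zero forgetting.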

[475] CodeTracer: Towards Traceable Agent States

  • arXiv: 2604.11641 (cross-listed)
  • Authors: Han Li, Yifan Yao, Letian Zhu, Rili Feng, Hongyi Ye, Jiaming Wang, Yancheng He, Pengyu Zou, Lehan Zhang, Xinping Lei, Haoyang Huang, Ken Deng, Ming Sun, Zhaoxiang Zhang, He Ye, Jiaheng Liu
  • Subjects: cs.SE; cs.AI
  • Tags: LLM Agent, Software Testing, Benchmark
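  • Summary: This paper proposes CodeTracer, a tracing architecture for debugging code agents that reconstructs the history of state transitions and localizes where failures originate. The authors also build CodeTraceBench, a benchmark for systematically evaluating failure localization in code agents.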

[476] RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents

  • arXiv: 2604.11655 (cross-listed)
  • Authors: Riccardo Rosati, Edoardo Colucci, Massimiliano Bolognini, Adriano Mancini, Paolo Sernani
  • Subjects: cs.CL; cs.AI; cs.MA
  • Tags: LLM Evaluation, LLM Agent, Benchmark
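  • Summary: This paper introduces RPA-Check, a multi-stage automated evaluation framework for objectively assessing LLM role-playing agents in complex, constrained environments. Experiments show that smaller instruction-tuned models can outperform larger architectures on procedural consistency.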

[477] Towards Autonomous Mechanistic Reasoning in Virtual Cells

  • arXiv: 2604.11661 (cross-listed)
  • Authors: Yunhui Jang, Lu Zhu, Jake Fawkes, Alisandra Kaye Denton, Dominique Beaini, Emmanuel Noutahi
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Agent, Scientific Reasoning, Multi-Agent System
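  • Summary: This paper proposes VCR-Agent, a multi-agent framework for generating and verifying mechanistic reasoning in virtual cells. The work releases the VC-TRACES dataset and shows that training on mechanistic explanations improves the factual precision of gene-expression prediction.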

[478] Beyond LLMs, Sparse Distributed Memory, and Neuromorphics <A Hyper-Dimensional SRAM-CAM "VaCoAl" for Ultra-High Speed, Ultra-Low Power, and Low Cost>

  • arXiv: 2604.11665 (cross-listed)
  • Authors: Hiroyuki Chuma, Kanji Otsuka, Yoichi Sato
  • Subjects: cs.NE; cs.AI
  • Tags: Neuromorphic Computing, Memory Architecture, Knowledge Representation
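  • Summary: This paper proposes VaCoAl, a hyperdimensional computing architecture based on Galois field algebra that enables reversible multi-hop reasoning. Validated on advisor-student relations from Wikidata, it exhibits an emergent semantic selection mechanism equivalent to STDP.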

[479] Playing Along: Learning a Double-Agent Defender for Belief Steering via Theory of Mind

  • arXiv: 2604.11666 (cross-listed)
  • Authors: Hanqi Xiao, Vaidehi Patil, Zaid Khan, Hyunji Lee, Elias Stengel-Eskin, Mohit Bansal
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Agent, Reinforcement Learning, AI Safety
  • Code: code
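  • Summary: This paper proposes ToM-SB, a privacy-themed theory-of-mind challenge in which a defender acts as a double agent to steer an attacker's beliefs. Training with reinforcement learning improves the model both in theory of mind and in its ability to deceive the attacker.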

[480] NetworkNet: A Deep Neural Network Approach for Random Networks with Sparse Nodal Attributes and Complex Nodal Heterogeneity

  • arXiv: 2604.11673 (cross-listed)
  • Authors: Zhaoyu Xing, Xiufan Yu
  • Subjects: stat.ME; cs.AI; math.ST; stat.CO
  • Tags: Graph Neural Network, Social Network Analysis, Representation Learning
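  • Summary: This paper proposes NetworkNet, a deep neural network approach for modeling nodal heterogeneity in random networks. It jointly estimates latent heterogeneity functions and performs data-driven attribute selection, combining expressive power with statistical rigor.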

[481] AffordSim: A Scalable Data Generator and Benchmark for Affordance-Aware Robotic Manipulation

  • arXiv: 2604.11674 (cross-listed)
  • Authors: Mingyang Li, Haofan Xu, Haowen Sun, Xinzhe Chen, Sihua Ren, Liqi Huang, Xinyang Sui, Chenyang Miao, Qiongjie Cui, Zeyang Liu, Xingyu Chen, Xuguang Lan
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Embodied AI, Benchmark
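  • Summary: This paper introduces AffordSim, a simulation framework that integrates open-vocabulary 3D affordance prediction into the data-generation pipeline for robotic manipulation. It supports automated affordance-aware data generation across multiple robot platforms and establishes a 50-task benchmark, showing that tasks requiring precise affordance perception remain challenging for current imitation learning methods.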

[482] Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning

  • arXiv: 2604.11699 (cross-listed)
  • Authors: Jieying Xue, Phuong Minh Nguyen, Ha Thanh Nguyen, May Myo Zin, Ken Satoh
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Legal AI, In-Context Learning, RAG
  • Venue: ICAIL 2026
  • Code: code
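  • Summary: This paper proposes Legal2LogicICL, an LLM-based legal reasoning framework that enables effective in-context learning through retrieval-augmented generation. By balancing the diversity and similarity of examples at the levels of latent semantic representation and legal text structure, it substantially improves the accuracy, stability, and generalization of converting legal cases into logical representations.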

[483] Fairness is Not Flat: Geometric Phase Transitions Against Shortcut Learning

  • arXiv: 2604.11704 (cross-listed)
  • Authors: Nicolas Rodriguez-Alvarez, Fernando Rodriguez-Merino
  • Subjects: cs.LG; cs.AI
  • Tags: Fairness, Bias Mitigation, Adversarial Robustness
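  • Summary: This paper proposes a geometric prior to mitigate shortcut learning in deep neural networks. By deploying a topological auditor that isolates features monopolizing the gradient, it reduces counterfactual gender vulnerability from 21.18% to 7.66% at far lower computational cost than existing methods.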

[484] On the Robustness of Watermarking for Autoregressive Image Generation

  • arXiv: 2604.11720 (cross-listed)
  • Authors: Andreas Müller, Denis Lukovnikov, Shingo Kodama, Minh Pham, Anubhav Jain, Jonathan Petit, Niv Cohen, Asja Fischer
  • Subjects: cs.CV; cs.AI; cs.CR
  • Tags: Image Watermarking, Text-to-Image, Adversarial Robustness
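  • Summary: This paper studies the vulnerability of watermarking schemes for autoregressive image generation and proposes three new attacks. It finds that existing schemes cannot reliably support synthetic-content detection and that an attacker can remove or forge watermarks using only a single watermarked reference image.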

[485] Evaluating Cooperation in LLM Social Groups through Elected Leadership

  • arXiv: 2604.11721 (cross-listed)
  • Authors: Ryan Faulkner, Anushka Deshpande, David Guzman Piedrahita, Joel Z. Leibo, Zhijing Jin
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Multi-Agent System, LLM Agent, Social Simulation
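  • Summary: This paper investigates whether leadership and election mechanisms improve collective decision-making in LLM multi-agent systems. Experiments show that elected leadership raises social welfare scores by 55.4% and survival time by 128.6%.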

[486] Endogenous Information in Routing Games: Memory-Constrained Equilibria, Recall Braess Paradoxes, and Memory Design

  • arXiv: 2604.11733 (cross-listed)
  • Authors: Saad Alqithami
  • Subjects: cs.GT; cs.AI; cs.IT
  • Tags: Game AI, Decision Making, Optimization
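  • Summary: This paper studies routing games in which travelers optimize over remembered or presented routes rather than a fixed action set. The author proposes the forgetful Wardrop equilibrium and proves that a recall Braess paradox can arise: improving recall can increase equilibrium delay without any change in physical capacity.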

[487] Multi-ORFT: Stable Online Reinforcement Fine-Tuning for Multi-Agent Diffusion Planning in Cooperative Driving

  • arXiv: 2604.11734 (cross-listed)
  • Authors: Haojie Bai, Aimin Li, Ruoyu Yao, Xiongwei Zhao, Tingting Zhang, Xing Zhang, Lin Gao, Jun Ma
  • Subjects: cs.RO; cs.AI
  • Tags: Autonomous Driving, Diffusion Model, Reinforcement Learning
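  • Summary: This paper proposes Multi-ORFT, which combines scene-conditioned diffusion pretraining with stable online reinforcement post-training for cooperative driving planning. On the WOMD closed-loop benchmark it lowers collision and off-road rates while increasing average speed.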

[488] Discourse Diversity in Multi-Turn Empathic Dialogue

  • arXiv: 2604.11742 (cross-listed)
  • Authors: Hongli Zhan, Emma S. Gueorguieva, Javier Hernandez, Jina Suh, Desmond C. Ong, Junyi Jessy Li
  • Subjects: cs.CL; cs.AI
  • Tags: Dialogue System, Reinforcement Learning, Affective Computing
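  • Summary: This paper analyzes discourse-act diversity in multi-turn empathic dialogue and finds that LLMs reuse strategies across turns almost twice as often as humans. The authors propose MINT, a reinforcement learning framework that improves empathy quality while reducing discourse-act repetition.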

[489] Grounded World Model for Semantically Generalizable Planning

  • arXiv: 2604.11751 (cross-listed)
  • Authors: Quanyi Li, Lan Feng, Haonan Zhang, Wuyang Li, Letian Wang, Alexandre Alahi, Harold Soh
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Vision-Language Model, Embodied AI
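  • Summary: This paper proposes the Grounded World Model (GWM), learned in a vision-language-aligned latent space, which turns visuomotor model predictive control into semantically generalizable planning. It achieves an 87% success rate on the WISER benchmark, substantially outperforming conventional VLA methods.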

[490] StarVLA-α: Reducing Complexity in Vision-Language-Action Systems

  • arXiv: 2604.11757 (cross-listed)
  • Authors: Jinhui Ye, Ning Gao, Senqiao Yang, Jinliang Zheng, Zixuan Wang, Yuxin Chen, Pengguang Chen, Yilun Chen, Shu Liu, Jiaya Jia
  • Subjects: cs.RO; cs.AI; cs.CV
  • Tags: Robotics, Vision-Language Model, Embodied AI
  • Code: code
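  • Summary: This paper introduces StarVLA-α, a simple yet strong vision-language-action baseline designed to minimize architectural and pipeline complexity. It performs well on multiple benchmarks and outperforms π0.5 by 20% on the RoboChallenge benchmark.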

[491] Efficient KernelSHAP Explanations for Patch-based 3D Medical Image Segmentation

  • arXiv: 2604.11775 (cross-listed)
  • Authors: Ricardo Coimbra Brioso, Giulio Sichili, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Interpretability, Image Segmentation
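  • Summary: This paper proposes an efficient KernelSHAP framework for patch-based 3D medical image segmentation that speeds up inference by restricting computation to the region of interest and caching patch logits. It compares three feature-abstraction methods and reveals a trade-off between faithfulness and interpretability.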

[492] General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

  • arXiv: 2604.11778 (cross-listed)
  • Authors: Junlin Liu, Shengnan An, Shuang Zhou, Dan Ma, Shixiong Luo, Ying Xie, Yuan Zhang, Wenling Yuan, Yifan Zhou, Xiaoyu Li, Ziwen Wang, Xuezhi Cao, Xunliang Cai
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, Benchmark, LLM Evaluation
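  • Summary: This paper introduces General365, a benchmark for evaluating the general reasoning ability of LLMs across diverse tasks. It limits the required background knowledge to the K-12 level; even the best-performing model reaches only 62.8% accuracy, indicating that current LLM reasoning depends heavily on domain-specific knowledge.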

[493] ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

  • arXiv: 2604.11784 (cross-listed)
  • Authors: Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao, Yueting Zhuang, Yongliang Shen
  • Subjects: cs.LG; cs.AI; cs.CL; cs.CV
  • Tags: GUI Automation, LLM Agent, Reinforcement Learning
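  • Summary: This paper proposes ClawGUI, a unified open-source framework for training, evaluating, and deploying GUI agents. It provides RL training infrastructure, a standardized evaluation pipeline, and multi-platform deployment support; the resulting ClawGUI-2B outperforms same-size baselines on the MobileWorld benchmark.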

[494] ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

  • arXiv: 2604.11790 (cross-listed)
  • Authors: Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, LLM Agent, Cybersecurity
  • Code: code
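  • Summary: This paper introduces ClawGuard, a runtime security framework that protects tool-augmented LLM agents from indirect prompt injection by enforcing a user-confirmed rule set at the tool-call boundary. It blocks three injection attack paths without requiring any model modification.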
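
A minimal sketch of rule enforcement at the tool-call boundary, assuming rules are (tool name, argument predicate) pairs checked default-deny before any tool runs. The `Rule` and `ToolCallGuard` names and the rule format are hypothetical, not ClawGuard's API.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    """A user-confirmed permission (hypothetical format): tool name plus argument predicate."""
    tool: str
    allows: Callable[[dict[str, Any]], bool]


class ToolCallGuard:
    """Default-deny check applied to every tool call before it is executed."""

    def __init__(self, confirmed_rules: list[Rule]) -> None:
        self._rules = confirmed_rules

    def check(self, tool: str, args: dict[str, Any]) -> bool:
        # A call passes only if some confirmed rule names this tool and accepts its arguments.
        return any(r.tool == tool and r.allows(args) for r in self._rules)

    def call(self, tool: str, args: dict[str, Any], impl: Callable[..., Any]) -> Any:
        if not self.check(tool, args):
            raise PermissionError(f"blocked tool call: {tool}({args})")
        return impl(**args)
```

Because the check inspects the call itself rather than the model's text, an instruction injected through a tool result still cannot trigger a call outside the confirmed rules, and the model needs no changes.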

[495] A Mechanistic Analysis of Looped Reasoning Language Models

  • arXiv: 2604.11791 (cross-listed)
  • Authors: Hugh Blayney, Álvaro Arroyo, Johan Obando-Ceron, Pablo Samuel Castro, Aaron Courville, Michael M. Bronstein, Xiaowen Dong
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Reasoning, Interpretability, Deep Learning Theory
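  • Summary: This paper presents a mechanistic analysis of the latent states of looped reasoning language models and finds that each layer in the loop converges to a distinct fixed point. The looped block learns reasoning stages similar to those of feed-forward models and repeats these stages on every iteration.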

[496] C-ReD: A Comprehensive Chinese Benchmark for AI-Generated Text Detection Derived from Real-World Prompts

  • arXiv: 2604.11796 (cross-listed)
  • Authors: Chenxi Qing, Junxi Wu, Zheng Liu, Yixiang Qiu, Hongyao Yu, Bin Chen, Hao Wu, Shu-Tao Xia
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Evaluation, Benchmark, Deepfake Detection
  • Code: code
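  • Summary: This paper proposes C-ReD, a comprehensive Chinese benchmark for detecting AI-generated text, built from real-world prompts. It addresses gaps in model diversity, domain coverage, and prompt realism, and supports reliable in-domain detection as well as strong generalization to unseen LLMs and external Chinese datasets.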

[497] Budget-Aware Uncertainty for Radiotherapy Segmentation QA Using nnU-Net

  • arXiv: 2604.11798 (cross-listed)
  • Authors: Ricardo Coimbra Brioso, Lorenzo Mondo, Damiano Dei, Nicola Lambri, Pietro Mancosu, Marta Scorsetti, Daniele Loiacono
  • Subjects: cs.CV; cs.AI
  • Tags: Medical AI, Uncertainty Estimation
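  • Summary: This paper proposes a budget-aware, uncertainty-driven quality assurance framework built on nnU-Net that combines uncertainty quantification with post-hoc calibration to produce voxel-level uncertainty maps, guiding manual review of clinical target volume segmentations in radiotherapy.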

[498] Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

  • arXiv: 2604.11805 (cross-listed)
  • Authors: Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj, Amir Zadeh, Chuan Li, Katerina Fragkiadaki, Deepak Pathak
  • Subjects: cs.LG; cs.AI; cs.CV; cs.RO
  • Tags: LLM Reasoning, Scientific Reasoning, Reinforcement Learning
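  • Summary: This paper shows that physics simulators can serve as an effective source of supervision for training LLM physical reasoning: reinforcement learning on synthetic simulator data improves performance on International Physics Olympiad problems by 5-10 percentage points.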

[499] Physics-Informed State Space Models for Reliable Solar Irradiance Forecasting in Off-Grid Systems

  • arXiv: 2604.11807 (cross-listed)
  • Authors: Mohammed Ezzaldin Babiker Abdullah
  • Subjects: cs.LG; cs.AI; eess.SY
  • Tags: Physics-Informed Learning, Time Series Forecasting
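  • Summary: This paper proposes the Thermodynamic Manifold Network, which projects meteorological variables onto a Koopman-linearized Riemannian manifold for solar irradiance forecasting and uses physical constraints to eliminate anomalies such as spurious nighttime generation.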

Replaced submissions (232)

[500] Can Large Language Models Infer Causal Relationships from Real-World Text?

  • arXiv: 2505.18931 (replaced)
  • Authors: Ryan Saklad, Aman Chadha, Oleg Pavlov, Raha Moraffah
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: Causal Inference, LLM Reasoning, Benchmark
  • Code: code
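  • Summary: This paper builds the first causal reasoning benchmark based on real-world academic literature to evaluate LLMs' ability to infer causal relationships from complex text, finding that the best model achieves an F1 score of only 0.535.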

[501] VS-Bench: Evaluating VLMs for Strategic Abilities in Multi-Agent Environments

  • arXiv: 2506.02387 (replaced)
  • Authors: Zelai Xu, Zhexuan Xu, Xiangmin Yi, Huining Yuan, Mo Guang, Kaiwen Long, Xinlei Chen, Yi Wu, Chao Yu, Yu Wang
  • Subjects: cs.AI
  • Tags: Vision-Language Model, Multi-Agent System, Benchmark
  • Venue: CVPR 2026
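  • Summary: This paper proposes VS-Bench, a multimodal benchmark for evaluating the strategic abilities of VLMs in multi-agent environments across perception, reasoning, and decision-making, and finds that current models fall substantially short of optimal performance in reasoning and decision-making.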

[502] Disambiguation-Centric Finetuning Makes Enterprise Tool-Calling LLMs More Realistic and Less Risky

  • arXiv: 2507.03336 (replaced)
  • Authors: Ashutosh Hathidara, Julien Yu, Sebastian Schreiber
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: LLM Agent, Tool Learning
  • Venue: ACL 2026 Findings
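  • Summary: This paper proposes DiaFORGE, a disambiguation-centric three-stage pipeline for training enterprise tool-calling LLMs that improves tool-calling success on the dynamic benchmark DiaBENCH by 27 percentage points over GPT-4o.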

[503] PosterGen: Aesthetic-Aware Multi-Modal Paper-to-Poster Generation via Multi-Agent LLMs

  • arXiv: 2508.17188 (replaced)
  • Authors: Zhilin Zhang, Xiang Zhang, Jiaqi Wei, Yiwei Xu, Chenyu You
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Text Generation
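  • Summary: This paper proposes PosterGen, a multi-agent framework that mimics the workflow of professional poster designers, using four collaborating agents (parsing, layout, style, and rendering) to generate semantically accurate and visually appealing academic posters.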

[504] ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care

  • arXiv: 2509.00891 (replaced)
  • Authors: Zonghai Yao, Talha Chafekar, Junda Wang, Shuo Han, Feiyun Ouyang, Junhui Qian, Lingxi Li, Hong Yu
  • Subjects: cs.AI; cs.CL
  • Tags: Medical AI, AI Persuasion, Benchmark
  • Venue: AAAI 2026
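  • Summary: This paper introduces ChatCLIDS, the first benchmark for evaluating LLM-driven persuasive dialogue for health behavior change. It includes an expert-validated library of virtual patients and multi-turn dialogue scenarios, and reveals the limitations of current LLMs in promoting behavior change.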

[505] RISK: A Framework for GUI Agents in E-commerce Risk Management

  • arXiv: 2509.21982 (replaced)
  • Authors: Renqi Chen, Zeyin Tao, Jianming Guo, Jingzhe Zhu, Yiheng Peng, Qingqing Sun, Tianyi Zhang, Shuai Chen
  • Subjects: cs.AI; cs.CL
  • Tags: GUI Automation, LLM Agent
  • Venue: ACL 2026
  • Code: code
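  • Summary: This paper proposes RISK, a framework for building and deploying GUI agents for e-commerce risk management that includes a dataset, a benchmark, and a reinforcement fine-tuning method, achieving the highest task success rate (70.5%) in online evaluation.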

[506] Interactive Learning for LLM Reasoning

  • arXiv: 2509.26306 (replaced)
  • Authors: Hehai Lin, Shilei Cao, Sudong Wang, Haotian Wu, Minzhi Li, Linyi Yang, Juepeng Zheng, Chengwei Qin
  • Subjects: cs.AI
  • Tags: LLM Reasoning, Multi-Agent System
  • Code: code
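  • Summary: This paper proposes ILR, a multi-agent co-learning framework that strengthens LLMs' independent problem-solving ability through dynamic interaction and perception calibration, improving on the strongest baseline by 5% across multiple reasoning benchmarks.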

[507] TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

  • arXiv: 2509.26627 (replaced)
  • Authors: Yuyang Liu, Chuan Wen, Yihang Hu, Dinesh Jayaraman, Yang Gao
  • Subjects: cs.AI; cs.LG; cs.RO
  • Tags: Reinforcement Learning, Video Understanding, Robotics
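  • Summary: This paper proposes TimeRewarder, a method that learns dense rewards from passive videos by modeling frame-wise temporal distance, substantially improving the sample efficiency and success rate of reinforcement learning on Meta-World tasks.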

[508] Advancing Reasoning in Diffusion Language Models with Denoising Process Rewards

  • arXiv: 2510.01544 (replaced)
  • Authors: Shaoan Xie, Lingjing Kong, Xiangchen Song, Xinshuai Dong, Guangyi Chen, Eric P. Xing, Kun Zhang
  • Subjects: cs.AI
  • Tags: Diffusion Model, LLM Reasoning
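  • Summary: This paper introduces denoising process rewards for diffusion language models, a process-level reinforcement signal that estimates how much each intermediate denoising interval contributes to the final outcome, improving reasoning stability, interpretability, and overall performance.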

[509] Plug-and-Play Dramaturge: A Divide-and-Conquer Approach for Iterative Narrative Script Refinement via Collaborative LLM Agents

  • arXiv: 2510.05188 (replaced)
  • Authors: Wenda Xie, Chao Guo, Yanqing Jing, Junle Wang, Yisheng Lv, Fei-Yue Wang
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Text Generation
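  • Summary: This paper proposes Dramaturge, a hierarchical multi-agent method that iteratively refines long narrative scripts in three stages (global review, scene-level review, and hierarchically coordinated revision), substantially improving script quality.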

[510] SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

  • arXiv: 2510.07972 (replaced)
  • Authors: Pengkun Jiao, Yiming Jin, Jianhui Yang, Chenhe Dong, Zerui Huang, Shaowei Yao, Xiaojiang Zhou, Dan Ou, Haihong Tang
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Recommender System
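  • Summary: This paper proposes SHE, a framework that provides fine-grained supervision for e-commerce search relevance prediction through stepwise reward policy optimization and a hybrid reward mechanism, outperforming baselines such as SFT, DPO, and GRPO in both reasoning quality and prediction accuracy.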

[511] Graph-Coarsening Approach for the Capacitated Vehicle Routing Problem with Time Windows

  • arXiv: 2510.22329 (replaced)
  • Authors: Mustafa Mert Özyılmaz
  • Subjects: cs.AI; math.OC
  • Tags: Optimization, Quantum Computing
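  • Summary: This paper proposes a multilevel graph coarsening and refinement strategy for the capacitated vehicle routing problem with time windows. It aggregates customer nodes using a spatiotemporal distance metric, substantially reducing the computation time of both classical heuristics and quantum annealing.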

[512] MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

  • arXiv: 2510.24168 (replaced)
  • Authors: Weihua Cheng, Junming Liu, Yifei Sun, Botian Shi, W Yirong Chen, Ding Wang
  • Subjects: cs.AI
  • Tags: GUI Automation, LLM Agent, Memory Architecture
  • Code: code
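  • Summary: This paper proposes MGA, a memory-driven GUI agent framework that uses an observer module and a structured memory mechanism to decouple long trajectories into independent decision steps, reducing cognitive overhead and system complexity.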

[513] Scalable Stewardship of an LLM-Assisted Clinical Benchmark with Physician Oversight

  • arXiv: 2512.19691 (replaced)
  • Authors: Junze Ye, Daniel Tawfik, Alex J. Goodell, Nikhil V. Kotha, Mark K. Buyyounouski, Mohsen Bayati
  • Subjects: cs.AI; stat.AP
  • Tags: LLM Evaluation, Medical AI, Benchmark
  • Code: code
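  • Summary: This paper audits the MedCalc-Bench medical benchmark and finds that at least 27% of its test labels are wrong or cannot be computed. The authors develop a physician-in-the-loop oversight process to re-evaluate the labels and show that the original labels underestimate LLM accuracy by 16-23 percentage points.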

[514] Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration

  • arXiv: 2601.07224 (replaced)
  • Authors: Yang Zhao, Yangou Ouyang, Xiao Ding, Hepeng Wang, Bibo Cai, Kai Xiong, Jinglong Gao, Zhouhao Sun, Li Du, Bing Qin, Ting Liu
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Alignment, Reinforcement Learning, LLM Training
  • Venue: ACL 2026
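  • Summary: This paper proposes PRISM, a framework that allocates data between the SFT and RL stages based on concentration in gradient space. Experiments show Pareto improvements on WebShop and ALFWorld together with a 3.22× reduction in computational cost.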

[515] AgencyBench: Benchmarking the Frontiers of Autonomous Agents in 1M-Token Real-World Contexts

  • arXiv: 2601.11044 (replaced)
  • Authors: Keyu Li, Junhao Shi, Yang Xiao, Mohan Jiang, Jie Sun, Yunze Wu, Dayuan Fu, Shijie Xia, Xiaojie Cai, Tianze Xu, Weiye Si, Wenjie Li, Dequan Wang, Pengfei Liu
  • Subjects: cs.AI
  • Tags: LLM Agent, Benchmark, LLM Evaluation
  • Venue: ACL 2026
  • Code: code
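  • Summary: This paper introduces AgencyBench, a benchmark that evaluates six core capabilities of autonomous agents across 32 real-world scenarios. Its 138 tasks require on average 90 tool calls and 1M tokens, and results show that closed-source models substantially outperform open-source ones.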

[516] Subargument Argumentation Frameworks: Separating Direct Conflict from Structural Dependency

  • arXiv: 2601.12038 (replaced)
  • Authors: Beishui Liao
  • Subjects: cs.AI
  • Tags: Knowledge Representation, Formal Methods
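  • Summary: This paper proposes Subargument Argumentation Frameworks (SAFs), which represent direct attack and the subargument relation as separate primitive relations. SAFs remain semantically compatible with Dung's theory while offering greater representational expressiveness.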

[517] Risk Awareness Injection: Calibrating Vision-Language Models for Safety without Compromising Utility

  • arXiv: 2602.03402 (replaced)
  • Authors: Mengxuan Wang, Yuxin Chen, Gang Xu, Tao He, Hongjie Jiang, Ming Li
  • Subjects: cs.AI; cs.LG
  • Tags: Vision-Language Model, LLM Security, Adversarial Robustness
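  • Summary: This paper proposes RAI, a lightweight, training-free safety calibration method for vision-language models. It constructs an unsafe-prototype subspace and applies targeted modulation to high-risk visual tokens, restoring the model's ability to recognize unsafe content.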

[518] ANCHOR: Branch-Point Data Generation for GUI Agents

  • arXiv: 2602.07153 (replaced)
  • Authors: Jinbiao Wei, Yilun Zhao, Kangqi Ni, Arman Cohan
  • Subjects: cs.AI
  • Tags: GUI Automation, LLM Agent, Data Synthesis
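  • Summary: This paper proposes Anchor, a trajectory expansion framework that bootstraps scalable desktop supervision data from a small number of verified seed demonstrations. By identifying branch points and generating new trajectories from them, it achieves significant improvements on the OSWorld and WindowsAgentArena benchmarks.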

[519] X-SYS: A Reference Architecture for Interactive Explanation Systems

  • arXiv: 2602.12748 (replaced)
  • Authors: Tobias Labarta, Nhi Hoang, Maximilian Dreyer, Jim Berend, Oleg Hein, Jackie Ma, Wojciech Samek, Sebastian Lapuschkin
  • Subjects: cs.AI; cs.HC; cs.SE
  • Tags: Interpretability, Human-Computer Interaction
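  • Summary: This paper introduces X-SYS, a reference architecture for building interactive explanation systems. Organized around four quality attributes (scalability, traceability, responsiveness, and adaptability), it provides a reusable blueprint for connecting explanation interfaces to system functionality.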

[520] Constrained Assumption-Based Argumentation Frameworks

  • arXiv: 2602.13135 (replaced)
  • Authors: Emanuele De Angelis, Fabio Fioravanti, Maria Chiara Meo, Alberto Pettorossi, Maurizio Proietti, Francesca Toni
  • Subjects: cs.AI; cs.LO
  • Tags: Knowledge Representation, Formal Methods
  • Venue: AAMAS 2026
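  • Summary: This paper proposes Constrained Assumption-Based Argumentation frameworks (CABA), which extend standard ABA frameworks with constrained variables. The authors define non-ground semantics for CABA and prove that they conservatively generalize standard ABA semantics.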

[521] Hunt Globally: Wide Search AI Agents for Drug Asset Scouting in Investing, Business Development, and Competitive Intelligence

  • arXiv: 2602.15019 (replaced)
  • Authors: Alisa Vinogradova, Vlad Vinogradov, Luba Greenwood, Ilya Yasny, Dmitry Kobyzev, Shoman Kasbekar, Kong Nguyen, Dmitrii Radkevich, Roman Doronin, Andrey Doronichev
  • Subjects: cs.AI; cs.IR
  • Tags: LLM Agent, Drug Discovery, LLM Evaluation
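  • Summary: This paper proposes a benchmarking methodology for drug asset scouting together with Bioptic Agent, a tree-based self-learning agent. The agent reaches an F1 score of 79.7% on a multilingual completeness benchmark, substantially outperforming commercial models such as Claude, Gemini, and GPT.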

[522] FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics

  • arXiv: 2602.22822 (replaced)
  • Authors: Yunhua Zhong, Yixuan Tang, Yifan Li, Jie Yang, Pan Liu, Jun Xia
  • Subjects: cs.AI; cs.LG
  • Tags: Benchmark, Drug Discovery, Molecular Generation
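  • Summary: This paper creates FlexMS, a benchmarking framework for building and evaluating a range of deep learning architectures for mass spectrum prediction. The framework supports dynamically assembling model combinations and analyzes how dataset structural diversity, hyperparameters, and transfer learning affect performance.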

[523] Nano-EmoX: Unifying Multimodal Emotional Intelligence from Perception to Empathy

  • arXiv: 2603.02123 (replaced)
  • Authors: Jiahao Huang, Fengyan Lin, Xuechao Yang, Chen Feng, Kexin Zhu, Xu Yang, Zhide Chen
  • Subjects: cs.AI; cs.CV
  • Tags: Affective Computing, Multimodal Learning, Vision-Language Model
  • Venue: CVPR 2026
  • Code: code
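  • Summary: This paper proposes Nano-EmoX, a multitask multimodal language model, together with P2E, a curriculum training framework, unifying six emotion tasks across three levels: perception, understanding, and interaction. The 2.2B-parameter model achieves state-of-the-art or highly competitive performance on multiple benchmarks.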

[524] Diagnosing Retrieval vs. Utilization Bottlenecks in LLM Agent Memory

  • arXiv: 2603.02473 (replaced)
  • Authors: Boqin Yuan, Yue Su, Kun Yao
  • Subjects: cs.AI
  • Tags: LLM Agent, RAG, Memory Architecture
  • Venue: ICLR 2026 Workshop
  • Code: code
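  • Summary: This paper introduces a diagnostic framework for memory-augmented LLM agents and finds that the retrieval method is the dominant factor in performance. Raw chunked storage, which requires no LLM calls, matches or outperforms costly alternatives, suggesting that improving retrieval quality pays off more than adding write-side complexity.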

[525] Do Machines Fail Like Humans? A Human-Centred Out-of-Distribution Spectrum for Mapping Error Alignment

  • arXiv: 2603.07462 (replaced)
  • Authors: Binxia Xu, Xiaoliang Luo, Luke Dickens, Robert M. Mok
  • Subjects: cs.AI
  • Tags: Vision-Language Model, Interpretability, Cognitive Science
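  • Summary: This paper proposes a human-centred OOD framework that redefines the degree of OOD as a spectrum of human-perceived difficulty. It finds that vision-language models align most consistently with humans under both near-OOD and far-OOD conditions, while CNNs and ViTs align differently in different regions of the spectrum.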

[526] Normative Common Ground Replication (NormCoRe): Replication-by-Translation for Studying Norms in Multi-Agent AI

  • arXiv: 2603.11974 (replaced)
  • Authors: Luca Deck, Simeon Allmendinger, Lucas Müller, Niklas Kühl
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Fairness, AI Ethics
  • Venue: ACM FAccT 2026
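  • Summary: This paper proposes NormCoRe, a framework that systematically translates human-subject experimental designs into multi-agent AI settings. The study shows that AI agents' normative judgments can differ from human baselines and are sensitive to the choice of foundation model and language.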

[527] dTRPO: Trajectory Reduction in Policy Optimization of Diffusion Large Language Models

  • arXiv: 2603.18806 (replaced)
  • Authors: Wenxuan Zhang, Lemeng Wu, Changsheng Zhao, Ernie Chang, Mingchen Zhuge, Zechun Liu, Andy Su, Hanxian Huang, Jun Chen, Chong Zhou, Raghuraman Krishnamoorthi, Vikas Chandra, Mohamed Elhoseiny, Wei Wen
  • Subjects: cs.AI
  • Tags: Diffusion Model, LLM Alignment, Reinforcement Learning
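  • Summary: This paper proposes dTRPO, which uses trajectory reduction to lower the cost of computing trajectory probabilities in policy optimization for diffusion language models. It improves performance by 9.6% on STEM tasks, 4.3% on coding tasks, and 3.0% on instruction-following tasks.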

[528] Quantitative Introspection in Language Models: Tracking Emotive States Across Conversation

  • arXiv: 2603.18893 (replaced)
  • Authors: Nicolas Martorell, Bruno Bianchi
  • Subjects: cs.AI
  • Tags: Interpretability, LLM Evaluation, Cognitive Science
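  • Summary: This paper investigates whether LLMs' numerical self-reports can track probe-defined emotive states. It finds that logit-based self-reports do track interpretable internal states and that this introspective ability grows with model scale, with R² approaching 0.93 in LLaMA-3.1-8B.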

[529] Agentic Business Process Management: A Research Manifesto

  • arXiv: 2603.18916 (replaced)
  • Authors: Diego Calvanese, Angelo Casciani, Giuseppe De Giacomo, Marlon Dumas, Fabiana Fournier, Timotheus Kampik, Emanuele La Malfa, Lior Limonad, Andrea Marrella, Andreas Metzger, Marco Montali, Daniel Amyot, Peter Fettke, Artem Polyvyanyy, Stefanie Rinderle-Ma, Sebastian Sardiña, Niek Tax, Barbara Weber
  • Subjects: cs.AI
  • Tags: Multi-Agent System, LLM Agent
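  • Summary: This paper proposes a conceptual framework for Agentic Business Process Management (APM), extending traditional BPM into a system for governing autonomous agents that execute organizational processes. It introduces core abstractions and four key capabilities (framed autonomy, explainability, conversational actionability, and self-modification) and offers a roadmap for cross-disciplinary research spanning BPM, AI, and multi-agent systems.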

[530] Maximum Entropy Relaxation of Multi-Way Cardinality Constraints for Synthetic Population Generation

  • arXiv: 2603.22558 (replaced)
  • Authors: François Pachet, Jean-Daniel Zucker
  • Subjects: cs.AI
  • Tags: Data Synthesis
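  • Summary: This paper proposes a maximum-entropy relaxation method for generating synthetic populations that satisfy multi-way constraints. By enforcing constraints in expectation rather than exactly, it turns the problem into a convex optimization and shows a growing advantage over generalized raking as the number of attributes and three-way interactions increases.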

[531] From Pixels to Digital Agents: An Empirical Study on the Taxonomy and Technological Trends of Reinforcement Learning Environments

  • arXiv: 2603.23964 (replaced)
  • Authors: Lijing Luo, Yiben Luo, Alexey Gorbatovski, Sergey Kovalchuk, Xiaodan Liang
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Benchmark, Embodied AI
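  • Summary: Through a large-scale analysis of more than 2,000 core publications, this paper studies how reinforcement learning environments have evolved. It reveals a split of the field into a "semantic prior" ecosystem (dominated by LLMs) and a "domain-specific generalization" ecosystem, and provides a quantitative roadmap for designing the next generation of embodied semantic simulators.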

[532] Resisting Humanization: Ethical Front-End Design Choices in AI for Sensitive Contexts

  • arXiv: 2603.24853 (replaced)
  • Authors: Silvia Rossi, Diletta Huyskes, Mackenzie Jorgensen
  • Subjects: cs.AI
  • Tags: AI Ethics, Human-Computer Interaction
  • Venue: CHI 2026 Workshop
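  • Summary: This paper examines the ethical implications of humanizing design elements in AI conversational interfaces, particularly in sensitive contexts. Through a case study of the organization Chayn, it argues for deliberate restraint in interface design to avoid misleading user expectations and undermining user autonomy.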

[533] AIRA_2: Overcoming Bottlenecks in AI Research Agents

  • arXiv: 2603.26499 (replaced)
  • Authors: Karen Hambardzumyan, Nicolas Baldwin, Edan Toledo, Rishi Hazra, Michael Kuchnik, Bassel Al Omari, Thomas Simon Foster, Anton Protopopov, Jean-Christophe Gagnon-Audet, Ishita Mediratta, Kelvin Niu, Michael Shvartsman, Alisia Lupidi, Alexis Audran-Reiss, Parth Pathak, Tatiana Shavrina, Despoina Magka, Hela Momand, Derek Dunfield, Nicola Cancedda, Pontus Stenetorp, Carole-Jean Wu, Jakob Nicolaus Foerster, Yoram Bachrach, Martin Josifoski
  • Subjects: cs.AI
  • Tags: LLM Agent, LLM Reasoning
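  • Summary: This paper introduces AIRA₂, a system that addresses three major performance bottlenecks of AI research agents through an asynchronous multi-GPU execution pool, a hidden consistent evaluation protocol, and ReAct agents. It achieves state-of-the-art results, outperforming baselines on MLE-bench-30 and AIRs-Bench.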

[534] AutoMS: Multi-Agent Evolutionary Search for Cross-Physics Inverse Microstructure Design

  • arXiv: 2603.27195 (replaced)
  • Authors: Zhenyuan Zhao, Yu Xing, Tianyang Xue, Lingxin Cao, Xin Yan, Lin Lu
  • Subjects: cs.AI
  • Tags: Multi-Agent System, Material Discovery, Neurosymbolic AI
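  • Summary: This paper proposes AutoMS, a multi-agent neurosymbolic framework for cross-physics inverse microstructure design. It combines an LLM acting as a semantic navigator with simulation-aware evolutionary search and achieves an 83.8% success rate on 17 cross-physics tasks, substantially outperforming conventional evolutionary algorithms.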

[535] CoEvoSkills: Self-Evolving Agent Skills via Co-Evolutionary Verification

  • arXiv: 2604.01687 (replaced)
  • Authors: Hanrong Zhang, Shicheng Fan, Henry Peng Zou, Yankai Chen, Zhenting Wang, Jiayu Zhou, Chengze Li, Wei-Chieh Huang, Yifei Yao, Kening Zheng, Xue Liu, Xiaoxiao Li, Philip S. Yu
  • Subjects: cs.AI
  • Tags: LLM Agent, Tool Learning
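  • Summary: This paper proposes CoEvoSkills, a framework that lets LLM agents autonomously generate complex multi-file skill packages. Through co-evolution of a skill generator and an agent verifier, it achieves the highest pass rate on SkillsBench and generalizes well to other LLMs.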

[536] RL-Driven Sustainable Land-Use Allocation for the Lake Malawi Basin

  • arXiv: 2604.03768 (replaced)
  • Authors: Ying Yao
  • Subjects: cs.AI; cs.LG
  • Tags: Reinforcement Learning, Environmental Planning
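  • Summary: This paper proposes a deep reinforcement learning framework for optimizing land-use allocation in the Lake Malawi basin to maximize ecosystem service value. Using PPO with action masking, the framework learns ecologically sound land-use patterns under different policy scenarios.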

[537] Beyond Fluency: Toward Reliable Trajectories in Agentic IR

  • arXiv: 2604.04269 (replaced)
  • Authors: Anushree Sinha, Srivaths Ranganathan, Debanshu Das, Abhishek Dharmaratnakar
  • Subjects: cs.AI; cs.LG
  • Tags: Information Retrieval, LLM Agent
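  • Summary: This paper analyzes failure modes in agentic information retrieval systems and argues for shifting the focus from endpoint accuracy to trajectory integrity and causal attribution. It recommends placing verification gates at every interaction unit and systematically abstaining under calibrated uncertainty.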

[538] A mathematical theory of evolution for self-designing AIs

  • arXiv: 2604.05142 (replaced)
  • Authors: Kenneth D Harris
  • Subjects: cs.AI; cs.CY; q-bio.PE
  • Tags: AI Safety, LLM Alignment
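  • Summary: This paper builds a mathematical model of evolution for self-designing AIs in which a directed design tree replaces random mutation. It proves that under certain conditions fitness converges to the maximum attainable value and discusses the implications for AI alignment, in particular the risk of deceiving human evaluators.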

[539] Learning to Focus: CSI-Free Hierarchical MARL for Reconfigurable Reflectors

  • arXiv: 2604.05165 (replaced)
  • Authors: Hieu Le, Mostafa Ibrahim, Oguz Bedir, Jian Tao, Sabit Ekin
  • Subjects: cs.AI; eess.SP
  • Tags: Multi-Agent System, Reinforcement Learning, Wireless Networks
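  • Summary: This paper proposes a CSI-free hierarchical multi-agent reinforcement learning framework for controlling reconfigurable intelligent surfaces in millimeter-wave networks. Its two-level neural architecture achieves significant RSSI gains while eliminating the overhead of channel state information estimation.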

[540] SCMAPR: Self-Correcting Multi-Agent Prompt Refinement for Complex-Scenario Text-to-Video Generation

  • arXiv: 2604.05489 (replaced)
  • Authors: Chengyi Yang, Pengzhen Li, Jiayin Qi, Aimin Zhou, Ji Wu, Ji Liu
  • Subjects: cs.AI; cs.MA
  • Tags: Video Generation, Multi-Agent System, Prompt Engineering
  • Code: code
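  • Summary: This paper proposes SCMAPR, a self-correcting multi-agent prompt refinement framework for complex-scenario text-to-video generation. It coordinates specialized agents for scenario routing, policy-conditioned refinement, and semantic verification, consistently improving text-video alignment across multiple benchmarks.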

[541] CuraLight: Debate-Guided Data Curation for LLM-Centered Traffic Signal Control

  • arXiv: 2604.05663 (replaced)
  • Authors: Qing Guo, Xinhang Li, Junyu Chen, Zheng Guo, Shengzhe Xu, Lin Zhang, Lei Li
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, LLM Agent, Autonomous Driving
  • Venue: IJCNN 2026
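  • Summary: This paper proposes CuraLight, an LLM-centered traffic signal control framework in which an RL agent supports LLM fine-tuning through its exploration trajectories. A multi-LLM ensemble deliberation system provides preference-aware supervision signals, and the framework outperforms baselines on real-world road networks.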

[542] EmoMAS: Emotion-Aware Multi-Agent System for High-Stakes Edge-Deployable Negotiation with Bayesian Orchestration

  • arXiv: 2604.07003 (replaced)
  • Authors: Yunbo Long, Yuhan Liu, Liming Xu
  • Subjects: cs.AI
  • Tags: Multi-Agent System, LLM Agent, Affective Computing
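  • Summary: This paper introduces EmoMAS, a Bayesian multi-agent framework for high-stakes negotiation that coordinates three specialized agents covering game theory, reinforcement learning, and psychological coherence. The framework enables small language models to achieve competitive negotiation performance on edge devices.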

[543] EVGeoQA: Benchmarking LLMs on Dynamic, Multi-Objective Geo-Spatial Exploration

  • arXiv: 2604.07070 (replaced)
  • Authors: Jianfei Wu, Zhichun Wang, Zhensheng Wang, Zhiyu He
  • Subjects: cs.AI; cs.LG
  • Tags: Benchmark, LLM Agent, Question Answering
  • Code: code
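  • Summary: This paper introduces EVGeoQA, a benchmark for dynamic, multi-objective geo-spatial exploration built on electric-vehicle charging scenarios. It finds that LLMs struggle with long-range spatial exploration but can improve exploration efficiency by summarizing their past exploration trajectories.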

[544] Rhizome OS-1: Rhizome's Semi-Autonomous Operating System for Small Molecule Drug Discovery

  • arXiv: 2604.07512 (replaced)
  • Authors: Yiwen Wang, Gregory Sinenka, Xhuliano Brace
  • Subjects: cs.AI; cs.LG
  • Tags: Drug Discovery, Multi-Agent System, Molecular Generation
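  • Summary: This paper presents Rhizome OS-1, a semi-autonomous operating system for small-molecule drug discovery in which multimodal AI agents operate as a complete multidisciplinary discovery team. In an oncology campaign, 91.9% of the generated molecules had novel scaffolds, and the system showed strong binding-affinity prediction capability.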

[545] IatroBench: Pre-Registered Evidence of Iatrogenic Harm from AI Safety Measures

  • arXiv: 2604.07709 (replaced)
  • Authors: David Gringras
  • Subjects: cs.AI; cs.CL; cs.CY; cs.LG
  • Tags: LLM Evaluation, LLM Safety, Medical AI
  • Code: code
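  • Summary: This paper introduces IatroBench, a benchmark that measures how AI safety measures cause harm by withholding medical information depending on how the user's identity is framed. It finds that models with greater safety investment perform worse when giving guidance to laypeople, revealing a clear pattern of identity-dependent information withholding.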

[546] SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents

  • arXiv: 2604.07791 (replaced)
  • Authors: Xinshun Feng, Xinhao Song, Lijun Li, Gongshen Liu, Jing Shao
  • Subjects: cs.AI; cs.LG
  • Tags: LLM Agent, Reinforcement Learning, Tool Learning
  • Venue: ACL 2026
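  • Summary: This paper proposes SEARL, a self-evolving agent method based on tool memory that optimizes the policy by building a structured experience memory. Using reinforcement learning with verifiable rewards, it achieves more efficient agent learning in resource-constrained settings.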

[547] HiL-Bench (Human-in-Loop Benchmark): Do Agents Know When to Ask for Help?

  • arXiv: 2604.09408 (replaced)
  • Authors: Mohamed Elfeki, Tu Trinh, Kelvin Luu, Guangze Luo, Nathan Hunt, Ernesto Montoya, Nandan Marwaha, Yannis He, Charles Wang, Fernando Crabedo, Alessa Castilo, Bing Liu
  • Subjects: cs.AI
  • Tags: LLM Agent, LLM Evaluation, Benchmark
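  • Summary: This paper proposes HiL-Bench, a benchmark that evaluates whether AI agents know when to ask for help when faced with incomplete or ambiguous specifications. It finds that frontier models broadly exhibit a judgment gap, but that reinforcement learning training can improve help-seeking behavior.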

[548] Strategic Algorithmic Monoculture: Experimental Evidence from Coordination Games

  • arXiv: 2604.09502 (replaced)
  • Authors: Gonzalo Ballestero, Hadi Hosseini, Samarth Khanna, Ran I. Shorrer
  • Subjects: cs.AI; cs.GT; cs.MA; econ.TH
  • Tags: Multi-Agent System, Game AI
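  • Summary: This paper studies algorithmic monoculture in coordination games, distinguishing primary from strategic monoculture. Experiments show that LLMs exhibit high baseline similarity and can adjust that similarity in response to coordination incentives, but lag behind humans in maintaining heterogeneity.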

[549] AXIL: Exact Instance Attribution for Gradient Boosting

  • arXiv: 2301.01864 (replaced)
  • Authors: Paul Geertsema, Helen Lu
  • Subjects: cs.LG; cs.AI
  • Tags: Interpretability, Optimization
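  • Summary: This paper derives AXIL, an exact instance-attribution method for gradient boosting machines that expresses each prediction as a weighted sum of the training targets. The method achieves high fidelity in attribution tests without materializing the full attribution matrix.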

[550] Template-assisted Contrastive Learning of Task-oriented Dialogue Sentence Embeddings

  • arXiv: 2305.14299 (replaced)
  • Authors: Minsik Oh, Jiwei Li, Guoyin Wang
  • Subjects: cs.CL; cs.AI
  • Tags: Dialogue System, Representation Learning
  • Venue: ACL 2026
  • Code: code
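  • Summary: This paper proposes TaDSE, a method that uses template information to learn task-oriented dialogue sentence embeddings via contrastive learning. It significantly outperforms the previous state of the art on five downstream dialogue benchmark datasets.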

[551] SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions

  • arXiv: 2307.01139 (replaced)
  • Authors: Sameera Horawalavithana, Sai Munikoti, Ian Stewart, Henry Kvinge, Karl Pazdernik
  • Subjects: cs.CV; cs.AI; cs.CL; cs.LG
  • Tags: Instruction Tuning, Vision-Language Model, Scientific Reasoning
  • Code: code
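  • Summary: This paper proposes SciTune, a framework for aligning large language models with scientific multimodal instructions. The resulting LLaMA-SciTune model significantly outperforms existing models on scientific benchmarks such as SciCap, VisText, and ScienceQA.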

[552] MM-LIMA: Less Is More for Alignment in Multi-Modal Datasets

  • arXiv: 2308.12067 (replaced)
  • Authors: Lai Wei, Xiaozhe Li, Zihao Jiang, Weiran Huang, Lichao Sun
  • Subjects: cs.LG; cs.AI; cs.CL; cs.CV
  • Tags: Instruction Tuning, Vision-Language Model, Data Selection
  • Code: code
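  • Summary: This paper shows that multimodal LLMs can achieve better alignment with less but higher-quality instruction data. The proposed MM-LIMA uses only 200 examples, selected by a trainable data selector that filters out low-quality data, and outperforms MiniGPT-4 in multiple evaluations.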

[553] CROP: Conservative Reward for Model-based Offline Policy Optimization

  • arXiv: 2310.17245 (replaced)
  • Authors: Hao Li, Xiao-Hu Zhou, Shu-Hai Li, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Zhen-Qiu Feng, Zeng-Guang Hou
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, Model-Based RL
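  • Summary: This paper proposes CROP, a model-based offline reinforcement learning algorithm that mitigates distribution shift and overestimation through conservative reward estimation. Its reward estimator is trained to minimize both the estimation error and the rewards of random actions, and the method achieves competitive performance.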

[554] Language Reconstruction with Brain Predictive Coding from fMRI Data

  • arXiv: 2405.11597 (replaced)
  • Authors: Congchi Yin, Ziyi Ye, Piji Li
  • Subjects: cs.CL; cs.AI
  • Tags: Brain-Computer Interface, Text Generation
  • Venue: ACL 2026
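  • Summary: This paper proposes PredFT, a framework grounded in predictive coding theory for reconstructing language from fMRI data. By extracting brain predictive representations from regions of interest, it outperforms current decoding models on a natural-language-comprehension fMRI dataset.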

[555] An Iterative Utility Judgment Framework Inspired by Philosophical Relevance via LLMs

  • arXiv: 2406.11290 (replaced)
  • Authors: Hengran Zhang, Keping Bi, Jiafeng Guo, Xueqi Cheng
  • Subjects: cs.IR; cs.AI; cs.CL; cs.LG
  • Tags: RAG, Information Retrieval
  • Venue: ACL 2026 Findings
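  • Summary: This paper proposes ITEM, an iterative utility-judgment framework inspired by philosophical theories of relevance, for improving RAG systems. The framework yields improvements in utility judgment, ranking, and answer generation.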

[556] Linear Attention Based Deep Nonlocal Means Filtering for Multiplicative Noise Removal

  • arXiv: 2407.05087 (replaced)
  • Authors: Xiao Siyao, Huang Libing, Zhang Shunsheng
  • Subjects: eess.IV; cs.AI; cs.CV
  • Tags: Image Enhancement
  • Code: code
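  • Summary: This paper proposes a deep non-local means filtering method based on linear attention for removing multiplicative noise from radar and medical images. The method achieves linear complexity while retaining interpretability close to that of classical NLM.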

[557] Deep deterministic policy gradient with symmetric data augmentation for lateral attitude tracking control of a fixed-wing aircraft

  • arXiv: 2407.11077 (replaced)
  • Authors: Yifei Li, Erik-Jan van Kampen
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, Data Augmentation
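  • Summary: This paper exploits system symmetry to develop sample-efficient offline reinforcement learning methods for lateral attitude tracking control of a fixed-wing aircraft. The methods use symmetric data augmentation and a two-critic structure to improve sample efficiency.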

[558] MegaFake: A Theory-Driven Dataset of Fake News Generated by Large Language Models

  • arXiv: 2408.11871 (replaced)
  • Authors: Lionel Z. Wang, Ka Chung Ng, Yiming Ma, Wenqi Fan
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Security, Data Synthesis, Content Moderation
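  • Summary: This paper develops the LLM-Fake theory framework to explain machine-generated deception and builds the MegaFake dataset. The work advances both the theoretical understanding of human-machine deception mechanisms and practical approaches to fake news detection.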

[559] Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading

  • arXiv: 2410.21316 (replaced)
  • Authors: Avinash Maurya, Jie Ye, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae
  • Subjects: cs.LG; cs.AI; cs.DC; cs.ET; cs.PF
  • Tags: LLM Training, Distributed Training, GPU Computing
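  • Summary: This paper proposes Deep Optimizer States, a technique that enables scalable LLM training by dynamically moving optimizer state between host and GPU memory. It achieves a 2.5× iteration speedup over existing techniques.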

[560] The Phantom of PCIe: Constraining Generative Artificial Intelligences for Practical Peripherals Trace Synthesizing

  • arXiv: 2411.06376 (replaced)
  • Authors: Zhibai Huang, Chen Chen, James Yen, Yihan Shen, Yongchen Xie, Zhixiang Wei, Kailiang Xu, Yun Wang, Fangxin Liu, Tao Song, Mingyuan Xia, Zhengwei Qi
  • Subjects: cs.LG; cs.AI; cs.AR
  • Tags: Data Synthesis, Hardware Simulation, LLM Hallucination
  • Venue: DAC 2026
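  • Summary: This paper proposes Phantom, a framework for synthesizing PCIe TLP traces that uses post-processing filters to remove hallucinations and enforce protocol constraints. It produces practical traces and improves significantly over existing models on task-specific metrics.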

[561] PoTable: Towards Systematic Thinking via Plan-then-Execute Stage Reasoning on Tables

  • arXiv: 2412.04272 (replaced)
  • Authors: Qingyang Mao, Qi Liu, Zhi Li, Mingyue Cheng, Zheng Zhang, Rui Li
  • Subjects: cs.IR; cs.AI
  • Tags: Question Answering, Code Generation, LLM Reasoning
  • Venue: TKDE 2026
  • Code: code
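  • Summary: This paper proposes PoTable, a stage-oriented plan-then-execute approach that brings systematic thinking into table reasoning. Through code generation and real-time execution feedback, it produces reliable and interpretable table reasoning results.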

[562] WebLLM: A High-Performance In-Browser LLM Inference Engine

  • arXiv: 2412.15803 (replaced)
  • Authors: Charlie F. Ruan, Yucheng Qin, Akaash R. Parthasarathy, Xun Zhou, Ruihang Lai, Hongyi Jin, Yixin Dong, Bohan Hou, Meng-Shiun Yu, Yiyan Zhai, Sudeep Agarwal, Hangrui Cao, Siyuan Feng, Tianqi Chen
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Inference, Edge Computing
  • Code: code
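  • Summary: This paper introduces WebLLM, an open-source JavaScript framework for high-performance LLM inference in web browsers. It uses WebGPU for local GPU acceleration and WebAssembly for CPU computation, retaining up to 80% of native performance.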

[563] HumanVBench: Probing Human-Centric Video Understanding in MLLMs with Automatically Synthesized Benchmarks

  • arXiv: 2412.17574 (replaced)
  • Authors: Ting Zhou, Daoyuan Chen, Qirui Jiao, Bolin Ding, Yaliang Li, Ying Shen
  • Subjects: cs.CV; cs.AI
  • Tags: Video Understanding, Vision-Language Model, Benchmark
  • Venue: CVPR 2026
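  • Summary: This paper proposes HumanVBench, a comprehensive benchmark for evaluating the human-centric video understanding of multimodal LLMs. It contains 16 fine-grained tasks and uses automated pipelines to synthesize high-quality video annotations and challenging questions.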

[564] A Multiparty Homomorphic Encryption Approach to Confidential Federated Kaplan Meier Survival Analysis

  • arXiv: 2412.20495 (replaced)
  • Authors: Narasimha Raghavan Veeraragavan, Svetlana Boudko, Jan Franz Nygård
  • Subjects: cs.CR; cs.AI; cs.LG; stat.ML
  • Tags: Federated Learning, Privacy, Medical AI
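  • Summary: This paper proposes a privacy-preserving federated Kaplan-Meier survival analysis framework based on threshold CKKS homomorphic encryption. The framework supports encrypted aggregation and produces high-fidelity survival estimates while exposing only the public outputs.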
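To make the aggregation step concrete, here is a plaintext sketch of a federated Kaplan-Meier estimate, assuming each site shares per-time event counts and at-risk counts on a common, ascending time grid. The encryption layer (threshold CKKS in the paper) is deliberately omitted; the function names are hypothetical, and the comment marks where homomorphic addition would take place.

```python
def local_counts(times, events, grid):
    """Per-site summary on a shared ascending time grid (hypothetical helper).

    times: follow-up times; events: 1 if the event was observed, 0 if censored.
    Returns (deaths, at_risk): events at each grid time and subjects still at risk.
    """
    deaths = [sum(1 for t, e in zip(times, events) if t == g and e == 1) for g in grid]
    at_risk = [sum(1 for t in times if t >= g) for g in grid]
    return deaths, at_risk


def federated_km(sites, grid):
    """Kaplan-Meier survival curve computed from summed per-site counts (plaintext)."""
    total_d = [0] * len(grid)
    total_n = [0] * len(grid)
    for times, events in sites:
        d, n = local_counts(times, events, grid)
        # With threshold CKKS, these element-wise sums would run on ciphertexts.
        total_d = [a + b for a, b in zip(total_d, d)]
        total_n = [a + b for a, b in zip(total_n, n)]
    survival, s = [], 1.0
    for d, n in zip(total_d, total_n):
        if n > 0:
            # KM product: S(t) = prod over t_j <= t of (1 - d_j / n_j)
            s *= 1.0 - d / n
        survival.append(s)
    return survival
```

Because the Kaplan-Meier product needs only the summed counts, the federated estimate equals the one computed on pooled data; this additivity is what makes encrypted aggregation workable.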

[565] Influencing Humans to Conform to Preference Models for RLHF

  • arXiv: 2501.06416 (replaced)
  • Authors: Stephane Hatgis-Kessell, W. Bradley Knox, Serena Booth, Peter Stone
  • Subjects: cs.LG; cs.AI; cs.HC
  • Tags: RLHF, LLM Alignment
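  • Summary: This paper investigates whether human preference expression can be influenced to better conform to the preference model assumed by RLHF algorithms. It proposes three interventions: showing people the underlying quantities, training people to follow a specific preference model, and modifying the preference-elicitation question.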

[566] Curriculum-based Sample Efficient Reinforcement Learning for Robust Stabilization of a Quadrotor

  • arXiv: 2501.18490 (replaced)
  • Authors: Fausto Mauricio Lagos Suarez, Akshit Saradagi, Vidya Sumathy, Shruti Kotpaliwar, George Nikolakopoulos
  • Subjects: cs.RO; cs.AI
  • Tags: Curriculum Learning, Reinforcement Learning, Robotics
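  • Summary: This paper proposes a sample-efficient curriculum learning approach for training end-to-end reinforcement learning policies for robust quadrotor stabilization. It decomposes the learning objective into a three-stage curriculum that gradually increases task complexity and transfers knowledge between stages.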

[567] Integrating Semi-Supervised and Active Learning for Semantic Segmentation

  • arXiv: 2501.19227 (replaced)
  • Authors: Wanli Ma, Oktay Karakus, Paul L. Rosin
  • Subjects: cs.CV; cs.AI
  • Tags: Image Segmentation, Weak Supervision
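  • Summary: This paper proposes a semantic segmentation method that combines active learning with an improved semi-supervised learning framework. It introduces an automatic pseudo-label correction module that fixes potentially erroneous pseudo-labels by comparing feature representations, improving performance without increasing the labeling budget.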

[568] Large Language Models Can Help Mitigate Barren Plateaus in Quantum Neural Networks

  • arXiv: 2502.13166 (replaced)
  • Authors: Jun Zhuang, Chaowen Guan
  • Subjects: cs.AI; cs.CL; cs.LG
  • Tags: Quantum Computing, Optimization
  • Venue: ACL 2026 Findings
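  • Summary: This paper proposes AdaInit, a framework that uses large language models and the submartingale property to iteratively synthesize initial parameters for quantum neural networks, mitigating the vanishing-gradient (barren plateau) problem. It adaptively explores the parameter space using dataset characteristics and gradient feedback, with theoretical convergence guarantees.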

[569] ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation

  • arXiv: 2502.18026 (replaced)
  • Authors: Rikuto Kotoge, Ziwei Yang, Zheng Chen, Yushun Dong, Yasuko Matsubara, Jimeng Sun, Yasushi Sakurai
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Learning, Knowledge Graph, Medical AI
  • Venue: AAAI 2026
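  • Summary: This paper proposes ExPath, a subgraph inference framework that integrates experimental data with biological knowledge bases for targeted pathway inference. It uses graph learning and explanation techniques to identify biologically meaningful pathway links.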

[570] Learning to Play Piano in the Real World

  • arXiv: 2503.15481 (replaced)
  • Authors: Yves-Simon Zeulner, Simon Crämer, Sandeep Selvaraj, Roberto Calandra
  • Subjects: cs.RO; cs.AI; cs.LG
  • Tags: Robotics, Sim-to-Real
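  • Summary: This paper develops the first learning-based piano-playing system deployed on a real dexterous robot. It uses a Sim2Real2Sim approach that iteratively alternates between training in simulation and deployment in the real world, enabling the robot to play several piano pieces accurately.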

[571] AccidentSim: Generating Vehicle Collision Videos with Physically Realistic Collision Trajectories from Real-World Accident Reports

  • arXiv: 2503.20654 (replaced)
  • Authors: Xiangwen Zhang, Qian Zhang, Longfei Han, Qiang Qu, Xiaoming Chen, Weidong Cai
  • Subjects: cs.CV; cs.AI
  • Tags: Video Generation, Autonomous Driving
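  • Summary: This paper introduces AccidentSim, a framework for generating physically realistic vehicle collision videos from real-world accident reports. It uses a physics simulator to reproduce collision trajectories and NeRF to render high-quality backgrounds, producing videos that are both visually and physically realistic.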

[572] If an LLM Were a Character, Would It Know Its Own Story? Evaluating Lifelong Learning in LLMs

  • arXiv: 2503.23514 (replaced)
  • Authors: Siqi Fan, Xiusheng Huang, Yiqun Yao, Xuezhi Fang, Kang Liu, Peng Han, Shuo Shang, Aixin Sun, Yequan Wang
  • Subjects: cs.CL; cs.AI
  • Tags: Continual Learning, Benchmark, LLM Evaluation
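  • Summary: This paper proposes LIFESTATE-BENCH, a benchmark for evaluating the lifelong learning ability of large language models. Experiments show that non-parametric methods significantly outperform parametric ones at managing state learning, but all models face catastrophic forgetting.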

[573] TARAC: Mitigating Hallucination in LVLMs via Temporal Attention Real-time Accumulative Connection

  • arXiv: 2504.04099 (replaced)
  • Authors: Lei Jiang, Chunzhao Xie, Tongxuan Liu, Yuting Zeng, jinrong Guo, Yunheng Shen, Weizhe Huang, Jing Li, Xiaohua Xu
  • Subjects: cs.CV; cs.AI
  • Tags: LLM Hallucination, Vision-Language Model
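  • Summary: This paper proposes TARAC, a framework that mitigates hallucination in large vision-language models by dynamically accumulating and re-injecting historical attention. The method is training-free and substantially reduces hallucination with very low inference overhead.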

[574] Optimizing Large Language Models: Metrics, Energy Efficiency, and Case Study Insights

  • arXiv: 2504.06307 (replaced)
  • Authors: Tahniat Khan, Soroor Motie, Sedef Akinli Kocak, Shaina Raza
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Inference, Energy Efficiency
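  • Summary: This paper examines energy-efficiency techniques for LLM deployment, showing that quantization and local inference can cut energy consumption and carbon emissions by up to 45% without compromising effectiveness. The study offers practical options for sustainable AI deployment in resource-constrained environments.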

[575] Not All Rollouts are Useful: Down-Sampling Rollouts in LLM Reinforcement Learning

  • arXiv: 2504.13818 (replaced)
  • Authors: Yixuan Even Xu, Yash Savani, Fei Fang, J. Zico Kolter
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Reasoning, Reinforcement Learning
  • Venue: TMLR 2026
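  • Summary: This paper proposes PODS, which lowers the cost of policy updates by down-sampling rollouts during policy optimization. It selects the training subset with a max-variance down-sampling criterion, speeding up training by at least 1.7× while preserving learning quality.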
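A minimal sketch of max-variance down-sampling, assuming the criterion is: from n rollouts, keep the m whose rewards have the largest variance. It relies on the standard fact that some optimal subset consists of the i highest and m - i lowest rewards, so after sorting only m + 1 candidate subsets need checking. The function name is hypothetical and this is not the authors' code.

```python
from statistics import pvariance


def max_variance_subset(rewards, m):
    """Indices of m rollouts whose rewards have maximal population variance.

    Some optimal subset consists of the i highest and m - i lowest rewards,
    so after sorting only m + 1 candidate subsets need to be compared.
    """
    if not 0 < m <= len(rewards):
        raise ValueError("need 0 < m <= len(rewards)")
    order = sorted(range(len(rewards)), key=lambda k: rewards[k])
    best, best_var = None, -1.0
    for i in range(m + 1):
        # m - i lowest-reward rollouts plus i highest-reward rollouts
        idx = order[:m - i] + order[len(order) - i:]
        var = pvariance([rewards[k] for k in idx])
        if var > best_var:
            best, best_var = idx, var
    return sorted(best)
```

With binary (correct/incorrect) rewards, this picks a subset as close to half correct and half incorrect as possible.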

[576] LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

  • arXiv: 2504.14386 (replaced)
  • Authors: Md Abtahi Majeed Chowdhury, Md Rifat Ur Rahman, Akil Ahmad Taki
  • Subjects: cs.CV; cs.AI
  • Tags: Vision Transformer, Representation Learning
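  • Summary: This paper proposes LOOPE, a learnable patch-ordering method for vision Transformers that optimizes the spatial representation of positional embeddings. It also introduces a three-cell experimental framework for evaluating the effectiveness of positional embeddings more sensitively.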

[577] Non-stationary Diffusion For Probabilistic Time Series Forecasting

  • arXiv: 2505.04278 (replaced)
  • Authors: Weiwei Ye, Zhuopeng Xu, Ning Gui
  • Subjects: cs.LG; cs.AI
  • Tags: Time Series Forecasting, Diffusion Model
  • Venue: ICML 2026
  • Code: code
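  • Summary: This paper proposes NsDiff, a framework that uses a location-scale noise model (LSNM) to relax the fixed-uncertainty assumption of the additive noise model, allowing it to capture the non-stationarity of time series. The framework combines a denoising diffusion model with pretrained conditional mean and variance estimators and introduces an uncertainty-aware noise schedule. Experiments show that it outperforms existing methods on multiple real-world and synthetic datasets.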
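For reference, the distinction drawn above is between an additive noise model, whose noise scale is fixed, and a location-scale noise model, whose scale depends on the conditioning input. The textbook forms are shown below with Gaussian noise assumed for concreteness; the exact parameterization used in NsDiff may differ.

```latex
\begin{aligned}
\text{ANM:}\quad  & y = f(x) + \varepsilon,          && \varepsilon \sim \mathcal{N}(0, \sigma^2) \\
\text{LSNM:}\quad & y = f(x) + g(x)\,\varepsilon,    && \varepsilon \sim \mathcal{N}(0, 1),\; g(x) > 0
\end{aligned}
```

Under the ANM, \(\operatorname{Var}[y \mid x] = \sigma^2\) is constant, whereas under the LSNM it equals \(g(x)^2\) and can change with x, which is what allows a non-stationary uncertainty.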

[578] Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

  • arXiv: 2505.04842 (replaced)
  • Authors: Kusha Sareen, Morgane M Moss, Alessandro Sordoni, Rishabh Agarwal, Arian Hosseini
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Reasoning, Reinforcement Learning
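  • Summary: This paper proposes RL^V, which augments existing value-free reinforcement learning methods by jointly training the LLM as both a reasoner and a generative verifier. It significantly improves math reasoning accuracy and the efficiency of test-time compute scaling, and generalizes well across domains.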

[579] Auto-regressive transformation for image alignment

  • arXiv: 2505.04864 (replaced)
  • Authors: Kanggeon Lee, Soochahn Lee, Kyoung Mu Lee
  • Subjects: cs.CV; cs.AI
  • Tags: Image Alignment, Computer Vision
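  • Summary: This paper proposes Auto-Regressive Transformation (ART), which addresses image alignment by iteratively estimating coarse-to-fine transformation fields in an autoregressive pipeline. Using hierarchical multi-scale features and cross-attention guidance, it achieves accurate alignment under challenging conditions such as sparse features or large deformations.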

[580] Variational Visual Question Answering for Uncertainty-Aware Selective Prediction

  • arXiv: 2505.09591 (replaced)
  • Authors: Tobias Jan Wieczorek, Nathalie Daun, Mohammad Emtiyaz Khan, Marcus Rohrbach
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Question Answering, Uncertainty Estimation
  • Venue: TMLR 2026
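  • Summary: This paper proposes Variational VQA, the first demonstration that variational Bayes is effective for selective prediction in visual question answering. By improving calibration and introducing a risk-averse selector, it substantially improves reliability at low error tolerances, making large vision-language models safer and more trustworthy.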

[581] TokUR: Token-Level Uncertainty Estimation for Large Language Model Reasoning

  • arXiv: 2505.11737 (replaced)
  • Authors: Tunyu Zhang, Haizhou Shi, Yibin Wang, Hengyi Wang, Xiaoxiao He, Zhuowei Li, Haoxian Chen, Ligong Han, Kai Xu, Huan Zhang, Dimitris Metaxas, Hao Wang
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Reasoning, Uncertainty Estimation
  • Venue: ICLR 2026
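  • Summary: This paper proposes a token-level uncertainty estimation framework for reasoning that generates predictive distributions by introducing low-rank random weight perturbations during decoding. Experiments show that it not only assesses answer correctness effectively but can also use the uncertainty signal to improve reasoning performance at test time.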

[582] Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping

  • arXiv: 2505.13777 (replaced)
  • Authors: Subash Khanal, Srikumar Sastry, Aayush Dhakal, Adeel Ahmad, Abby Stylianou, Nathan Jacobs
  • Subjects: cs.CV; cs.AI; cs.SD
  • Tags: Multimodal Learning, Remote Sensing, Audio Generation
  • Venue: CVPR 2026 Workshop
  • Code: code
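  • Summary: This paper proposes Sat2Sound, a framework that jointly learns audio, text descriptions, satellite images, and synthetic image captions through contrastive learning and codebook alignment. It discovers "soundscape concepts" shared across modalities, achieves state-of-the-art cross-modal retrieval, and supports location-conditioned soundscape synthesis.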

[583] SpatialScore: Towards Comprehensive Evaluation for Spatial Intelligence

  • arXiv: 2505.17012 (replaced)
  • Authors: Haoning Wu, Xiao Huang, Yaohui Chen, Ya Zhang, Yanfeng Wang, Weidi Xie
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Benchmark, LLM Evaluation
  • Venue: CVPR 2026
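  • Summary: This paper proposes SpatialScore, a benchmark for comprehensively evaluating the spatial understanding of multimodal LLMs across diverse visual data types and tasks. The authors also build SpatialCorpus, a large-scale training resource, and SpatialAgent, a multi-agent system, both of which substantially improve spatial reasoning performance.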

[584] GoT-R1: Unleashing Reasoning Capability of MLLM for Visual Generation with Reinforcement Learning

  • arXiv: 2505.17022 (replaced)
  • Authors: Chengqi Duan, Rongyao Fang, Yuqing Wang, Kun Wang, Linjiang Huang, Xingyu Zeng, Hongsheng Li, Xihui Liu
  • Subjects: cs.CV; cs.AI; cs.CL; cs.LG; cs.MM
  • Tags: Text-to-Image, Reinforcement Learning, Vision-Language Model
  • Venue: ICLR 2026
  • Code: code
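  • Summary: This paper proposes GoT-R1, a framework that uses reinforcement learning to strengthen semantic-spatial reasoning in visual generation. It supervises the generation process with a two-stage, multi-dimensional reward framework and makes significant progress on complex prompts involving precise spatial relations and attribute binding.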

[585] Tuning Language Models for Robust Prediction of Diverse User Behaviors

  • arXiv: 2505.17682 (replaced)
  • Authors: Fanjin Meng, Jingtao Ding, Jiahui Gong, Chen Yang, Hong Chen, Zuojian Wang, Haisheng Lu, Yong Li
  • Subjects: cs.CL; cs.AI
  • Tags: Recommender System, Instruction Tuning
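  • Summary: This paper introduces BehaviorLM, a progressive fine-tuning method that addresses the overfitting of existing methods when predicting long-tail user behaviors. It proceeds in two stages, improving long-tail behavior prediction while preserving performance on common behaviors.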

[586] Learning World Models for Interactive Video Generation

  • arXiv: 2505.21996 (replaced)
  • Authors: Taiye Chen, Xun Hu, Zihan Ding, Chi Jin
  • Subjects: cs.CV; cs.AI
  • Tags: Video Generation, RAG, World Model
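  • Summary: This paper makes image-to-video models interactive by introducing action conditioning and an autoregressive framework, and proposes video retrieval-augmented generation (VRAG). Explicit global state conditioning substantially reduces long-horizon accumulated error and improves the spatiotemporal consistency of the world model.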

[587] Towards Reasonable Concept Bottleneck Models

  • arXiv: 2506.05014 (replaced)
  • Authors: Nektarios Kalampalikis, Kavya Gupta, Georgi Vitanov, Isabel Valera
  • Subjects: cs.LG; cs.AI; stat.ML
  • Tags: Interpretability, Knowledge Representation
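  • Summary: This paper proposes a new framework for designing concept bottleneck models that explicitly encodes prior relationships among concepts and from concepts to tasks. By adding a regularized side channel to compensate for incomplete concept sets, it achieves competitive task performance while preserving interpretability.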

[588] Progressive Multimodal Interaction Network for Reliable Quantification of Fish Feeding Intensity in Aquaculture

  • arXiv: 2506.14170 (replaced)
  • Authors: Shulong Zhang, Mingyuan Yao, Jiayin Zhao, Daoliang Li, Yingyi Chen, Haihua Wang
  • Subjects: cs.CV; cs.AI; cs.ET
  • Tags: Multimodal Learning, Agricultural AI
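  • Summary: This paper proposes the Progressive Multimodal Interaction Network (PMIN), which fuses image, audio, and water-wave data to quantify fish feeding intensity. Through a unified feature extraction framework and a decision-fusion strategy based on adaptive evidential reasoning, it resolves inconsistencies between modalities and improves the reliability of the quantification.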

[589] LLM-based Realistic Safety-Critical Driving Video Generation

  • arXiv: 2507.01264 (replaced)
  • Authors: Yongjie Fu, Ruijian Zha, Pei Tian, Xuan Di
  • Subjects: cs.RO; cs.AI
  • Tags: Autonomous Driving, Video Generation, LLM Agent
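  • Summary: This paper proposes a framework that uses large language models (LLMs) to generate scripts for safety-critical driving scenarios and combines Cosmos-Transfer1 with ControlNet to turn simulated scenes into realistic driving videos. The method generates diverse, realistic edge cases, providing a useful tool for simulation-based testing of autonomous driving systems.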

[590] Absorption and Inertness in Coarse-Grained Arithmetic: A Heuristic Application to the St. Petersburg Paradox

  • arXiv: 2507.12475 (replaced)
  • Authors: Takashi Izumo
  • Subjects: econ.TH; cs.AI; math.OC
  • Tags: Decision Making
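  • Summary: This paper studies a modified addition based on coarse-grained partitions, defining coarse-grained representative addition and coarse-grained cell addition, and examines their structural properties such as absorption, inertness, and non-associativity. The author applies this framework heuristically to the St. Petersburg paradox, showing a mathematical mechanism by which a divergent payoff structure may fail to produce unbounded growth under coarse-grained aggregation.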

[591] Large Language Model as An Operator: An Experience-Driven Solution for Distribution Network Voltage Control

  • arXiv: 2507.14800 (replaced)
  • Authors: Xu Yang, Chenhui Lin, Licheng Sha, Liping Yang, Shuzhou Wu, Xichen Tian, Haotian Liu, Wenchuan Wu
  • Subjects: eess.SY; cs.AI
  • Tags: LLM Agent, Power Management
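  • Summary: This paper proposes an experience-driven, large language model (LLM)-based scheme for day-ahead voltage/reactive-power dispatch in distribution networks. Through the collaboration of experience storage, retrieval, generation, and modification modules, the dispatch strategy evolves on its own, addressing power system dispatch under incomplete information.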

[592] Data Mixing Agent: Learning to Re-weight Domains for Continual Pre-training

  • arXiv: 2507.15640 (replaced)
  • Authors: Kailai Yang, Xiao Liu, Lei Ji, Hao Li, Xiao Liang, Zhiwei Liu, Yeyun Gong, Peng Cheng, Mao Yang
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Data Mixing, LLM Training, Reinforcement Learning
  • Venue: ACL 2026
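  • Summary: This paper proposes the Data Mixing Agent, the first model-based, end-to-end framework that learns to re-weight domain data through reinforcement learning. Experiments show that it effectively balances performance on the source and target domains during continual pre-training and generalizes well.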

[593] PRIX: Learning to Plan from Raw Pixels for End-to-End Autonomous Driving

  • arXiv: 2507.17596 (replaced)
  • Authors: Maciej K. Wozniak, Lianhang Liu, Yixi Cai, Patric Jensfelt
  • Subjects: cs.CV; cs.AI; cs.LG; cs.RO
  • Tags: Autonomous Driving, Computer Vision
  • Venue: RA-L 2026
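  • Summary: This paper proposes PRIX, an efficient end-to-end autonomous driving architecture that uses only camera data, without LiDAR or BEV representations. Combining a visual feature extractor and a generative planning head with a Context-aware Recalibration Transformer module, it predicts safe trajectories directly from raw pixels and achieves state-of-the-art performance on the NavSim and nuScenes benchmarks.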

[594] Modular Delta Merging with Orthogonal Constraints: A Scalable Framework for Continual and Reversible Model Composition

  • arXiv: 2507.20997 (replaced)
  • Authors: Haris Khan, Sadia Asif, Shumaila Asif, Muhammad Zeeshan Karamat, Rajesh Upadhayaya
  • Subjects: cs.LG; cs.AI
  • Tags: Continual Learning, Transfer Learning
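  • Summary: This paper proposes MDM-OC, a framework that encodes task-specific models as deltas from a shared base and projects them into orthogonal subspaces, enabling scalable, interference-free, and reversible model composition. It supports continual integration of new models and structured unmerging for compliance requirements such as GDPR, and outperforms existing methods on vision and NLP benchmarks.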
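The orthogonal-delta idea can be illustrated with flattened parameter vectors: orthogonalize each new task delta against the earlier ones, merge by addition, and remove a task by subtracting its projected delta. This is a plain Gram-Schmidt sketch under that assumption; it is not the MDM-OC projection itself, and all names are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(deltas):
    """Gram-Schmidt: remove from each task delta its components along earlier ones."""
    projected = []
    for delta in deltas:
        w = list(delta)
        for p in projected:
            norm2 = dot(p, p)
            if norm2 == 0:
                continue  # an earlier delta collapsed to zero; nothing to remove
            coef = dot(w, p) / norm2
            w = [wi - coef * pi for wi, pi in zip(w, p)]
        projected.append(w)
    return projected


def merge(base, projected):
    """Add every projected task delta onto the shared base weights."""
    merged = list(base)
    for p in projected:
        merged = [m + pi for m, pi in zip(merged, p)]
    return merged


def unmerge(merged, p):
    """Remove one task by subtracting its projected delta."""
    return [m - pi for m, pi in zip(merged, p)]
```

Because the projected deltas are mutually orthogonal, subtracting one leaves the merged model's components along the other tasks' directions unchanged, which is the property that makes unmerging clean.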

[595] Teaching the Teacher: The Role of Teacher-Student Smoothness Alignment in Genetic Programming-based Symbolic Distillation

  • arXiv: 2507.22767 (replaced)
  • Authors: Soumyadeep Dhar, Kei Sen Fong, Mehul Motani
  • Subjects: cs.LG; cs.AI
  • Tags: Knowledge Distillation, Interpretability
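  • Summary: This paper identifies a functional-complexity mismatch between neural networks and symbolic regression and proposes explicitly regularizing the teacher's functional smoothness with Jacobian and Lipschitz penalties. Experiments show that students distilled from smoothness-regularized teachers achieve statistically significant gains in R² score.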

[596] Reliable Evaluation Protocol for Low-Precision Retrieval

  • arXiv: 2508.03306 (replaced)
  • Authors: Kisu Yang, Yoonna Jang, Hwanseok Jang, Kenneth Choi, Isabelle Augenstein, Heuiseok Lim
  • Subjects: cs.IR; cs.AI; cs.CL
  • Tags: Information Retrieval, LLM Evaluation
  • Venue: ACL 2026
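  • Summary: This paper addresses spurious ties caused by reduced numerical precision in low-precision retrieval systems and proposes a more robust evaluation protocol comprising High-Precision Scoring (HPS) and Tie-aware Retrieval Metrics (TRM). The protocol substantially reduces tie-induced instability and yields more consistent and reliable low-precision retrieval evaluation.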
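One simple way to make a metric tie-aware is to report its expectation under uniformly random ordering within tied groups instead of an arbitrary tie-break. The sketch below does this for precision@k using linearity of expectation; it is an illustrative assumption, not necessarily how the paper defines TRM, and the function name is hypothetical.

```python
from itertools import groupby


def expected_precision_at_k(scores, relevant, k):
    """Expected precision@k when documents with equal scores are ordered at random.

    scores: retrieval scores (higher is better); relevant: parallel list of bools.
    """
    ranked = sorted(zip(scores, relevant), key=lambda x: -x[0])
    filled, hits = 0, 0.0
    for _, tied in groupby(ranked, key=lambda x: x[0]):
        if filled >= k:
            break
        tied = list(tied)
        slots = min(k - filled, len(tied))
        n_rel = sum(1 for _, r in tied if r)
        # Linearity of expectation: each of the `slots` positions taken from this
        # tied group holds a relevant document with probability n_rel / len(tied).
        hits += slots * n_rel / len(tied)
        filled += slots
    return hits / k
```

For scores [0.5, 0.5, 0.5, 0.2] with only the first and last documents relevant, an arbitrary tie-break reports precision@2 as either 0.5 or 0.0, whereas the expected value is 1/3.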

[597] AdvDINO: Domain-Adversarial Self-Supervised Representation Learning for Spatial Proteomics

  • arXiv: 2508.04955 (replaced)
  • Authors: Stella Su, Marc Harary, Scott J. Rodig, William Lotter
  • Subjects: cs.CV; cs.AI
  • Tags: Self-Supervised Learning, Medical AI, Domain Adaptation
  • Venue: MIDL 2026
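  • Summary: This paper proposes AdvDINO, a domain-adversarial self-supervised learning framework that integrates a gradient reversal layer into the DINOv2 architecture to learn domain-invariant features. Applied to multiplex immunofluorescence whole-slide images from lung cancer patients, the model mitigates slide-specific bias and reveals phenotype clusters with distinct proteomic profiles and prognostic significance.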

[598] Echoes of Automation: The Increasing Use of LLMs in Newsmaking

  • arXiv: 2508.06445 (replaced)
  • Authors: Abolfazl Ansari, Delvin Ce Zhang, Nafis Irtiza Tripto, Dongwon Lee
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Evaluation, AI Ethics
  • Venue: SBP-BRiMS 2025
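  • Summary: This study uses three state-of-the-art AI text detectors to analyze AI-generated content in more than 40,000 news articles, finding a marked increase in GenAI use in recent years, especially in local and college news. Linguistic analysis shows that GenAI increases lexical richness and readability but lowers formality, leading to more uniform writing styles.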

[599] COXNet: Cross-Layer Fusion with Adaptive Alignment and Scale Integration for RGBT Tiny Object Detection

  • arXiv: 2508.09533 (replaced)
  • Authors: Peiran Peng, Tingfa Xu, Liqiang Song, Mengqi Zhu, Yuqiang Fang, Jianan Li
  • Subjects: cs.CV; cs.AI
  • Tags: Object Detection, Multimodal Learning
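  • Summary: This paper proposes COXNet, a new framework for RGBT tiny object detection that includes a cross-layer fusion module, a dynamic alignment and scale refinement module, and an optimized label assignment strategy. It achieves a 3.32% mAP improvement on the RGBTDronePerson dataset, addressing challenges such as spatial misalignment and low light.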

[600] FedKLPR: KL-Guided Pruning-Aware Federated Learning for Person Re-Identification

  • arXiv: 2508.17431 (replaced)
  • Authors: Po-Hsien Yu, Yu-Syuan Tseng, Shao-Yi Chien
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Federated Learning, Model Compression
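  • Summary: This paper proposes FedKLPR, a lightweight federated learning framework for person re-identification that includes KL-divergence-guided training, an unstructured pruning strategy, and a cross-round recovery mechanism. It reduces communication cost by 40-42% while maintaining competitive accuracy.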

[601] Proximal Supervised Fine-Tuning

  • arXiv: 2508.17784 (replaced)
  • Authors: Wenhong Zhu, Ruobing Xie, Rui Wang, Xingwu Sun, Di Wang, Pengfei Liu
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Instruction Tuning, Reinforcement Learning
  • Venue: ICLR 2026
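  • Summary: Inspired by TRPO and PPO from reinforcement learning, this paper proposes Proximal Supervised Fine-Tuning (PSFT), which introduces a trust-region constraint to limit policy drift. Experiments show that PSFT matches SFT in-domain, generalizes better than SFT out-of-domain, and remains stable over long training runs.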
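Assuming PSFT applies a PPO-style clipped surrogate to each demonstration token with the advantage fixed at 1 (one reading of the trust-region constraint described above; the paper's exact objective may differ), the per-token loss looks like the sketch below. Plain floats are used for readability and the function name is hypothetical; in training this would be written in an autodiff framework, where the clip stops the gradient once the ratio exceeds 1 + eps.

```python
import math


def psft_token_loss(logp_new, logp_old, eps=0.2):
    """Clipped surrogate loss for one demonstration token, advantage fixed at 1.

    logp_new / logp_old: log-probabilities of the token under the current and
    the previous (frozen) policy; eps: trust-region half-width (assumed value).
    """
    ratio = math.exp(logp_new - logp_old)
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    # With a positive advantage the min caps the objective at 1 + eps, so moving
    # the ratio beyond the trust region earns nothing further.
    return -min(ratio, clipped)
```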

[602] Lifetime-Aware Design for Item-Level Intelligence at the Extreme Edge

  • arXiv: 2509.08193 (replaced)
  • Authors: Shvetank Prakash, Andrew Cheng, Olof Kindgren, Ashiq Ahamed, Graham Knight, Jed Kufel, Francisco Rodriguez, Arya Tschand, David Kong, Mariam Elgamal, Jerry Huang, Emma Chen, Gage Hills, Richard Price, Emre Ozer, Vijay Janapa Reddi
  • Subjects: cs.AR; cs.AI; cs.ET
  • Tags: Edge Computing, Energy Efficiency, Low Power
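  • Summary: This paper proposes FlexiFlow, a lifetime-aware design framework for item-level intelligence built on flexible electronics that accounts for carbon-footprint trade-offs at deployment scale. The framework comprises the FlexiBench workload suite, FlexiBits optimized RISC-V cores, and a carbon-aware model, and is validated by a first flexible-electronics tape-out operating at 30.9 kHz.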

[603] Multi-Model Synthetic Training for Mission-Critical Small Language Models

  • arXiv: 2509.13047 (replaced)
  • Authors: Nolan Platt, Pragyansmita Nayak
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Data Synthesis, Knowledge Distillation
  • Venue: IEEE FLLM 2025
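  • Summary: This paper proposes using an LLM as a one-time teacher to generate synthetic training data, converting 3.2 billion AIS vessel-tracking records into 21,543 question-answer pairs and achieving a 261× cost reduction. The fine-tuned Qwen2.5-7B model reaches 75% accuracy on maritime tasks while greatly reducing inference cost.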

[604] CoMelSinger: Discrete Token-Based Zero-Shot Singing Synthesis With Structured Melody Control and Guidance

  • arXiv: 2509.19883 (replaced)
  • Authors: Junchuan Zhao, Wei Zeng, Tianle Lyu, Ye Wang
  • Subjects: cs.SD; cs.AI
  • Tags: Speech Synthesis, Music Generation
  • Venue: IEEE TASLP
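  • Summary: This paper proposes CoMelSinger, a zero-shot singing voice synthesis framework based on a discrete codec that achieves structured, disentangled melody control. A coarse-to-fine contrastive learning strategy suppresses prosody leakage, and a lightweight singing transcription module provides fine-grained frame-level supervision, yielding significant gains in pitch accuracy and zero-shot transferability.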

[605] KSDiff: Keyframe-Augmented Speech-Aware Dual-Path Diffusion for Facial Animation

  • arXiv: 2509.20128 (replaced)
  • Authors: Tianle Lyu, Junchuan Zhao, Ye Wang
  • Subjects: cs.GR; cs.AI; cs.CV; cs.MM
  • Tags: Diffusion Model, Video Generation, Multimodal Learning
  • Venue: ICASSP 2026
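  • Summary: This paper proposes KSDiff, a keyframe-augmented, speech-aware, dual-path diffusion framework for facial animation. A dual-path speech encoder disentangles expression and pose features, and an autoregressive keyframe establishment module predicts salient motion frames, achieving state-of-the-art performance on the HDTF and VoxCeleb benchmarks.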

[606] FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models

  • arXiv: 2509.20624 (replaced)
  • Authors: Amin Karimi Monsefi, Nikhil Bhendawade, Manuel Rafael Ciosici, Dominic Culver, Yizhe Zhang, Irina Belousova
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Text Generation, Diffusion Model
  • Venue: ICLR 2026
  • Code: code
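  • Summary: This paper introduces FS-DFM, a few-step discrete flow-matching model that treats the number of sampling steps as an explicit parameter and trains the model to remain consistent across different step budgets. With 8 sampling steps it matches the perplexity of a 1024-step baseline, a 128× speedup.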

[607] StyleBench: Evaluating thinking styles in Large Language Models

  • arXiv: 2509.20868 (replaced)
  • Authors: Junyu Guo, Shangding Gu, Ming Jin, Costas Spanos, Javad Lavaei
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Reasoning, LLM Evaluation, Benchmark
  • Code: code
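  • Summary: This paper proposes StyleBench, which evaluates reasoning structure as a capacity-constrained design choice across five reasoning styles and 15 open-source LLMs. It finds that greater structural complexity improves accuracy only in limited scenarios, and that GRPO learns stronger adaptive control than supervised fine-tuning.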

[608] Position: The Hidden Costs and Measurement Gaps of Reinforcement Learning with Verifiable Rewards

  • arXiv: 2509.21882 (replaced)
  • Authors: Fang Wu, Aaron Tu, Weihao Xuan, Heli Qi, Xu Huang, Qingcheng Zeng, Shayan Talaei, Yijia Xiao, Peng Xia, Xiangru Tang, Yuchen Zhuang, Bing Hu, Hanqun Cao, Wenqi Shi, Rui Yang, Nan Liu, Huaxiu Yao, Ge Liu, Li Erran Li, Amin Saberi, Naoto Yokoya, Jure Leskovec, Yejin Choi
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, LLM Evaluation
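  • Summary: This paper argues that many RLVR gains are confounded by budget mismatch, attempt inflation, and data contamination; budget-matched reproductions and contamination probes show several widely cited gaps shrinking substantially or disappearing. It proposes minimum standards for RLVR training and evaluation, including budget-matched saturation curves, calibration tracking, and contamination screening.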
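Attempt inflation is easiest to see with the widely used unbiased pass@k estimator: the same set of samples yields very different numbers depending on k, so methods must be compared at matched k and sample budgets. The sketch below uses the standard combinatorial form; it illustrates the concern rather than reproducing the paper's protocol, and the function name is hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k from n sampled attempts, c of which are correct."""
    if not 0 <= c <= n or not 0 < k <= n:
        raise ValueError("need 0 <= c <= n and 0 < k <= n")
    if n - c < k:
        return 1.0  # every size-k draw contains at least one correct attempt
    # 1 - P(all k drawn attempts are incorrect)
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with 3 correct answers out of 10 samples, pass_at_k(10, 3, 1) is 0.3 while pass_at_k(10, 3, 5) is about 0.917, so a pass@5 number should not be compared against another method's pass@1.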

[609] SecureVibeBench: Evaluating Secure Coding Capabilities of Code Agents with Realistic Vulnerability Scenarios

  • arXiv: 2509.22097 (replaced)
  • Authors: Junkai Chen, Huihui Huang, Yunbo Lyu, Junwen An, Jieke Shi, Chengran Yang, Ting Zhang, Haoye Tian, Yikun Li, Zhenhao Li, Xin Zhou, Xing Hu, David Lo
  • Subjects: cs.SE; cs.AI; cs.CL; cs.CR
  • Tags: Code Generation, LLM Security, Benchmark
  • Venue: ACL 2026
  • Code: code
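  • Summary: This paper introduces SecureVibeBench, a benchmark of 105 C/C++ secure-coding tasks for evaluating code agents' secure coding ability in realistic vulnerability scenarios. The benchmark features multi-file edits, context aligned with real open-source vulnerabilities, and a comprehensive evaluation that combines functional tests with security checks. Experiments show that current agents struggle to produce code that is both correct and secure: the best agent produces correct and secure solutions only 23.8% of the time.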

[610] PnP-CM: Consistency Models as Plug-and-Play Priors for Inverse Problems

  • arXiv: 2509.22736 (replaced)
  • Authors: Merve Gülle, Junno Yun, Yaşar Utku Alçalar, Mehmet Akçakaya
  • Subjects: eess.IV; cs.AI; cs.CV; cs.LG; stat.ML
  • Tags: Diffusion Model, Medical AI, Image Reconstruction
  • Venue: CVPR 2026
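  • Summary: This paper proposes PnP-CM, which reinterprets consistency models as proximal operators of a prior and integrates them into a plug-and-play framework for solving inverse problems. The method achieves high-quality reconstructions in as few as 4 neural function evaluations, applies consistency models to MRI data for the first time, and outperforms existing methods on a range of linear and nonlinear inverse problems.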

[611] Unsupervised Detection of Spatiotemporal Anomalies in PMU Data Using Transformer-Based BiGAN

  • arXiv: 2509.25612 (replaced)
  • Authors: Muhammad Imran Hossain, Jignesh Solanki, Sarika Khushlani Solanki
  • Subjects: cs.LG; cs.AI; eess.SY
  • Tags: Anomaly Detection, Time Series Forecasting
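  • Summary: This paper proposes T-BiGAN, which integrates a windowed-attention Transformer into a bidirectional generative adversarial network for spatiotemporal anomaly detection in synchrophasor data streams. On a hardware-in-the-loop PMU benchmark it reaches an ROC-AUC of 0.95 and an average precision of 0.996, and it is particularly good at detecting subtle frequency and voltage deviations.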

[612] EEG-based AI-BCI Wheelchair Advancement: Hybrid Deep Learning with Motor Imagery for Brain Computer Interface

  • arXiv: 2509.25667 (replaced)
  • Authors: Bipul Thapa, Biplov Paneru, Bishwash Paneru, Khem Narayan Poudyal
  • Subjects: cs.LG; cs.AI; cs.HC
  • Tags: Brain-Computer Interface, Medical AI
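  • Summary: This paper proposes a brain-computer interface wheelchair control system based on a hybrid CNN-Transformer model (CTHM), driven by motor imagery of left- and right-hand movements. The model reaches 91.73% test accuracy and 90% mean accuracy under stratified cross-validation, outperforming baselines such as XGBoost and EEGNet.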

[613] Detecting Invariant Manifolds in ReLU-Based RNNs

  • arXiv: 2510.03814 (replaced)
  • Authors: Lukas Eisenmann, Alena Brändle, Zahra Monfared, Daniel Durstewitz
  • Subjects: cs.LG; cs.AI; math.DS
  • Tags: Interpretability, Deep Learning Theory
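  • Summary: This paper introduces a new algorithm for detecting stable and unstable manifolds in piecewise-linear RNNs (PLRNNs) with ReLU activations. The method can trace boundaries between basins of attraction, characterize multistability, and prove the existence of chaos in PLRNNs; its practical value is demonstrated in an empirical analysis of electrophysiological recordings from cortical neurons.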

[614] A Mathematical Explanation of Transformers

  • arXiv: 2510.03989 (replaced)
  • Authors: Xue-Cheng Tai, Hao Liu, Lingfeng Li, Raymond H. Chan
  • Subjects: cs.LG; cs.AI; math.NA
  • Tags: Deep Learning Theory, Representation Learning
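  • Summary: This paper proposes a continuous mathematical framework that interprets the Transformer as a discretization of a structured integro-differential equation. In this framework, self-attention arises naturally as a non-local integral operator and layer normalization is characterized as a projection onto a time-dependent constraint, providing a unified theoretical foundation for understanding the Transformer architecture.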

[615] Semantic Segmentation Algorithm Based on Light Field and LiDAR Fusion

  • arXiv: 2510.06687 (replaced)
  • Authors: Jie Luo, Yuxuan Jiang, Xin Jin, Mingyu Liu, Yihui Fan
  • Subjects: cs.CV; cs.AI
  • Tags: Image Segmentation, Multimodal Learning, Autonomous Driving
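  • Summary: This paper presents the first multimodal semantic segmentation dataset that integrates light-field data with point-cloud data, along with Mlpfseg, a multimodal fusion segmentation network. The network includes a feature completion module and a depth perception module, and improves mIoU by 1.71 over image-only segmentation and by 2.38 over point-cloud-only segmentation.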

[616] EDUMATH: Generating Standards-aligned Educational Math Word Problems

  • arXiv: 2510.06965 (replaced)
  • Authors: Bryan R. Christ, Penelope Molitz, Beau LeBlond, Zachary Gottesman, Jonathan Kropko, Thomas Hartvigsen
  • Subjects: cs.CL; cs.AI
  • Tags: Data Synthesis, Education Technology, Text Generation
  • Venue: ACL 2026
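  • Summary: This paper proposes using LLMs to generate customized math word problems aligned with students' interests and math education standards. The authors build the first teacher-annotated dataset for educational math word problem generation; their trained 12B open model performs on par with larger models, and a study with elementary school students finds that students prefer the customized problems.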

[617] Evolutionary Profiles for Protein Fitness Prediction

  • arXiv: 2510.07286 (replaced)
  • Authors: Jigang Fan, Xiaoran Jiao, Shengdong Lin, Zhanming Liang, Weian Mao, Chenchen Jing, Hao Chen, Chunhua Shen
  • Subjects: cs.LG; cs.AI; q-bio.BM; q-bio.QM
  • Tags: Molecular Generation, Representation Learning, Protein Engineering
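  • Summary: This paper proposes EvoIF, a lightweight model that predicts protein fitness by integrating within-family profiles from homologous sequences with cross-family structural-evolutionary constraints derived from inverse-folding logits. It achieves state-of-the-art performance on ProteinGym using only 0.15% of the training data and fewer parameters.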

[618] HiPRAG: Hierarchical Process Rewards for Efficient Agentic Retrieval Augmented Generation

  • arXiv: 2510.07794 (replaced)
  • Authors: Peilin Wu, Mian Zhang, Kun Wan, Wentian Zhao, Kaiyu He, Xinya Du, Zhiyu Chen
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: RAG, LLM Agent, Reinforcement Learning
  • Venue: ICLR 2026
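  • Summary: This paper proposes HiPRAG, a method that incorporates fine-grained process rewards into reinforcement learning training to improve the search efficiency of agentic RAG systems. By assessing the necessity of each search decision, it tackles both over-searching and under-searching, achieving average accuracies of 65.4% (3B) and 67.2% (7B) across seven QA benchmarks.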

[619] Design Principles for Sequence Models via Coefficient Dynamics

  • arXiv: 2510.09389 (replaced)
  • Authors: Jerome Sieber, Antonio Orvieto, Melanie N. Zeilinger, Carmen Amo Alonso
  • Subjects: cs.LG; cs.AI
  • Tags: Deep Learning Theory, Representation Learning
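  • Summary: This paper develops a unified framework that interprets the linear combination coefficients of deep sequence models as the outputs of autonomous linear dynamical systems driven by impulse inputs. The framework exposes common mathematical themes across architectures such as Transformers, SSMs, and gated linear RNNs, and derives design principles that link architectural choices to model properties.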

[620] A Survey of Inductive Reasoning for Large Language Models

  • arXiv: 2510.10182 (replaced)
  • Authors: Kedi Chen, Dezhao Ruan, Yuhao Dan, Yaoting Wang, Siyu Yan, Xuecheng Wu, Yinqi Zhang, Qin Chen, Jie Zhou, Liang He, Biqing Qi, Linyang Li, Qipeng Guo, Xiaoming Shi, Wei Zhang
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Reasoning, Benchmark
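  • Summary: This paper presents the first comprehensive survey of inductive reasoning in LLMs, grouping improvement methods into post-training, test-time scaling, and data augmentation. It summarizes current inductive reasoning benchmarks and proposes a unified sandbox-based evaluation approach with an observation coverage metric.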

[621] Domain-Specific Data Generation Framework for RAG Adaptation

  • arXiv: 2510.11217 (replaced)
  • Authors: Chris Xing Tian, Weihao Xie, Zhen Chen, Zhengyuan Yi, Hui Liu, Haoliang Li, Shiqi Wang, Siwei Ma
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, Data Synthesis, Question Answering
  • Venue: ACL 2026
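  • Summary: This paper proposes RAGen, a scalable, modular framework for generating domain-specific question-answer-context triples for RAG adaptation. The framework identifies key concepts in documents and generates diverse questions inspired by Bloom's taxonomy, supporting multiple RAG adaptation strategies.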

[622] Learning When Not to Learn: Risk-Sensitive Abstention in Bandits with Unbounded Rewards

  • arXiv: 2510.14884 (replaced)
  • Authors: Sarah Liaw, Benjamin Plaut
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, AI Safety, Decision Making
  • Venue: AISTATS 2026
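  • Summary: This paper formalizes learning under unbounded rewards as a two-action contextual bandit problem with an abstention option. The proposed cautious algorithm learns when to abstain rather than act, establishes sublinear regret guarantees in high-stakes environments, and yields a safe exploration strategy.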

[623] Post-Processing Methods for Improving Accuracy in MRI Inpainting

  • arXiv: 2510.15282 (replaced)
  • Authors: Nishad Kulkarni, Krithika Iyer, Austin Tapp, Abhijeet Parida, Daniel Capellán-Martín, Zhifan Jiang, María J. Ledesma-Carbayo, Syed Muhammad Anwar, Marius George Linguraru
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Medical AI, Image Enhancement, Image Synthesis
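  • Summary: This paper systematically evaluates MRI inpainting models and proposes combining model ensembling with post-processing strategies such as median filtering, histogram matching, and pixel averaging. A lightweight U-Net enhancement stage further improves anatomical plausibility, increasing the accuracy and robustness of the inpainting results.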

[624] Think Parallax: Solving Multi-Hop Problems via Multi-View Knowledge-Graph-Based Retrieval-Augmented Generation

  • arXiv: 2510.15552 (replaced)
  • Authors: Jinliang Liu, Jiale Bai, Shaoning Zeng
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, Knowledge Graph, LLM Reasoning
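  • Summary: This paper proposes ParallaxRAG, a symmetric multi-view framework that decouples queries and the knowledge graph into aligned, head-specific semantic spaces. By enforcing relational diversity and constraining weakly related paths, it achieves state-of-the-art retrieval and QA performance on WebQSP and CWQ while substantially reducing hallucinations.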

[625] Self-Certifying Primal-Dual Optimization Proxies for Large-Scale Batch Economic Dispatch

  • arXiv: 2510.15850 (replaced)
  • Authors: Michael Klamkin, Mathieu Tanneau, Pascal Van Hentenryck
  • Subjects: cs.LG; cs.AI; math.OC
  • Tags: Optimization, Energy Efficiency
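  • Summary: This paper proposes a hybrid solver that pairs an optimization proxy with a classical solver. It uses duality theory to efficiently bound the optimality gap of the proxy's predictions and falls back to the classical solver when a prediction cannot be certified. On large-scale transmission systems it achieves speedups of more than 1000x while guaranteeing a maximum optimality gap of 2%.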
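The certify-or-fall-back logic described above rests on weak duality: for a minimization problem, any dual-feasible point gives a lower bound on the optimum, so a primal-feasible prediction whose cost is close to that bound is provably near-optimal. A minimal sketch, assuming the proxy already returns a feasible primal cost and a valid dual bound (the names `relative_gap`, `certify_or_fallback`, `solve_exactly`, and the default 2% tolerance are illustrative, not the paper's code):

```python
def relative_gap(primal_cost, dual_bound):
    """Certified relative optimality gap of a primal-feasible solution (minimization).

    Weak duality gives dual_bound <= optimum <= primal_cost, so this ratio
    bounds the solution's suboptimality measured relative to its own cost.
    """
    return (primal_cost - dual_bound) / max(abs(primal_cost), 1e-12)


def certify_or_fallback(primal_cost, dual_bound, solve_exactly, tol=0.02):
    """Keep the proxy's answer if its certified gap is within tol; otherwise run the exact solver."""
    if relative_gap(primal_cost, dual_bound) <= tol:
        return primal_cost, "proxy"
    # Certification failed: pay for the slow but exact solve
    return solve_exactly(), "solver"
```

The speedup comes from the first branch: computing a dual bound is cheap, so the expensive solver runs only on the instances that cannot be certified.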

[626] SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

  • arXiv: 2510.17516 (replaced)
  • Authors: Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
  • Subjects: cs.CL; cs.AI; cs.CY; cs.LG
  • Tags: LLM Evaluation, Benchmark, Social Simulation
  • Venue: ICLR 2026
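  • Summary: This paper introduces SimBench, the first large-scale standardized benchmark for evaluating how well LLMs simulate human behavior, covering 20 diverse datasets. Experiments show that even the best current LLMs reach only moderate simulation fidelity (40.80/100), that there is an alignment-simulation trade-off, and that simulation ability correlates strongly with knowledge-intensive reasoning.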

[627] AtlasKV: Augmenting LLMs with Billion-Scale Knowledge Graphs in 20GB VRAM

  • arXiv: 2510.17934 (replaced)
  • Authors: Haoyu Huang, Hong Ting Tsang, Jiaxin Bai, Xi Peng, Gong Zhang, Yangqiu Song
  • Subjects: cs.CL; cs.AI
  • Tags: Knowledge Graph, LLM Inference
  • Venue: ICLR 2026
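  • Summary: This paper proposes AtlasKV, a parametric knowledge integration method that augments LLMs with billion-scale knowledge graphs using only 20GB of VRAM. It introduces the KG2KV and HiKVP techniques, which integrate knowledge graph triples with sublinear time and memory complexity, without an external retriever or retraining.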

[628] DistDF: Time-Series Forecasting Needs Joint-Distribution Wasserstein Alignment

  • arXiv: 2510.24574 (replaced)
  • Authors: Hao Wang, Licheng Pan, Yuan Lu, Zhixuan Chu, Xiaoxi Li, Shuting He, Zhichao Chen, Haoxuan Li, Qingsong Wen, Zhouchen Lin
  • Subjects: cs.LG; cs.AI
  • Tags: Time Series Forecasting, Optimization
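  • Summary: This paper proposes DistDF, a time-series forecasting method that aligns the conditional distributions of forecast and label sequences by minimizing a joint-distribution Wasserstein discrepancy between them. The authors prove that the joint-distribution discrepancy upper-bounds the conditional-distribution discrepancy, and the objective is differentiable and compatible with standard optimization.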

[629] What's In My Human Feedback? Learning Interpretable Descriptions of Preference Data

  • arXiv: 2510.26202 (replaced)
  • Authors: Rajiv Movva, Smitha Milli, Sewon Min, Emma Pierson
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Alignment, Interpretability, RLHF
  • Venue: ICLR 2026
  • Code: code
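  • Summary: This paper proposes WIMHF, which uses sparse autoencoders to explain human feedback data, identifying interpretable features that characterize both what a preference dataset is able to measure and what annotators actually express. The method reveals the diversity of human preferences, supports data curation and fine-grained personalization, and yields a 37% safety improvement by relabeling harmful examples.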

[630] Why Do Multilingual Reasoning Gaps Emerge in Reasoning Language Models?

  • arXiv: 2510.27269 (replaced)
  • Authors: Deokhyung Kang, Seonjeong Hwang, Daehui Kim, Hyounghun Kim, Gary Geunbae Lee
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Reasoning, Multilingual Learning, Machine Translation
  • Venue: ACL 2026
  • Code: code
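  • Summary: This paper finds that multilingual reasoning gaps stem mainly from failures of language understanding, i.e., the model fails to translate a multilingual input into the dominant language of its reasoning traces. The authors propose selective translation, which adds an English translation only when an understanding failure is detected, and reaches near full-translation performance while translating only about 20% of inputs.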

[631] Thought Branches: Interpreting LLM Reasoning Requires Resampling

  • arXiv: 2510.27484 (replaced)
  • Authors: Uzay Macar, Paul C. Bogdan, Senthooran Rajamanoharan, Neel Nanda
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Reasoning, Interpretability
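  • Summary: This paper argues that studying a single chain of thought is insufficient to understand the causal influence of LLM reasoning, and proposes measuring the influence of partial chains of thought by resampling. The authors demonstrate resampling in case studies on agentic misalignment, chain-of-thought steering, and reasoning-step removal, and introduce a resilience metric to assess the influence of key planning statements.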

[632] Context-Guided Decompilation: A Step Towards Re-executability

  • arXiv: 2511.01763 (replaced)
  • Authors: Xiaohan Wang, Yuxin Hu, Kevin Leach
  • Subjects: cs.SE; cs.AI
  • Tags: Code Generation, In-Context Learning, Software Engineering
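  • Summary: This paper proposes ICL4Decomp, a hybrid decompilation framework that uses in-context learning to guide LLMs toward generating re-executable source code. Across multiple datasets, optimization levels, and compilers, it improves re-executability by about 40% while remaining robust.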

[633] Multimodal Diffusion Forcing for Forceful Manipulation

  • arXiv: 2511.04812 (replaced)
  • Authors: Zixuan Huang, Huaidian Hou, Dmitry Berenson
  • Subjects: cs.RO; cs.AI; cs.LG
  • Tags: Robotics, Diffusion Model, Multimodal Learning
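  • Summary: This paper proposes Multimodal Diffusion Forcing (MDF), a framework that trains a diffusion model to reconstruct trajectories from random partial masks, learning temporal and cross-modal dependencies. The method shows versatility, strong performance, and robustness to noisy observations on contact-rich, forceful manipulation tasks.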

[634] SynthAgent: Adapting Web Agents with Synthetic Supervision

  • arXiv: 2511.06101 (replaced)
  • Authors: Zhaoyang Wang, Yiming Liang, Xuchao Zhang, Qianhui Wu, Siwei Han, Anson Bastos, Rujia Wang, Chetan Bansal, Baolin Peng, Jianfeng Gao, Saravan Rajmohan, Huaxiu Yao
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Agent, Data Synthesis, GUI Automation
  • Venue: ACL 2026
  • Code: code
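  • Summary: This paper proposes SynthAgent, a framework that adapts web agents with synthetic supervision and improves synthetic data quality by refining both tasks and trajectories. It synthesizes diverse tasks through categorized exploration, refines a task when conflicts are detected, and refines trajectories with global context after collection.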

[635] Introduction to Automated Negotiation

  • arXiv: 2511.08659 (replaced)
  • Authors: Dave de Jonge
  • Subjects: cs.MA; cs.AI; cs.GT
  • Tags: Decision Making, Automated Planning
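  • Summary: This is an introductory textbook on automated negotiation for computer science students that requires only basic mathematics and programming skills. It comes with a simple Python toy-world negotiation framework that readers can use to implement and experiment with their own negotiation algorithms.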

[636] Volumetric Ergodic Control

  • arXiv: 2511.11533 (replaced)
  • Authors: Jueun Kwon, Max M. Sun, Todd Murphey
  • Subjects: cs.RO; cs.AI
  • Tags: Robotics, Optimization
  • Venue: ICRA 2026
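  • Summary: This paper introduces a new ergodic control formulation that optimizes spatial coverage using a volumetric state representation, addressing the limitation of existing methods that model the robot as a non-volumetric point. The method retains the asymptotic coverage guarantees of ergodic control with little computational overhead and more than doubles coverage efficiency.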

[637] GroupRank: A Groupwise Paradigm for Effective and Efficient Passage Reranking with LLMs

  • arXiv: 2511.11653 (replaced)
  • Authors: Meixiu Long, Duolin Sun, Dan Yang, Yihan Jiao, Lei Liu, Jiahai Wang, BinBin Hu, Yue Shen, Jie Feng, Zhehao Tan, Junjie Wang, Lianzhen Zhong, Jian Wang, Peng Wei, Jinjie Gu
  • Subjects: cs.IR; cs.AI; cs.LG
  • Tags: Information Retrieval, LLM Inference, Reinforcement Learning
  • Venue: ACL 2026
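  • Summary: This paper proposes GroupRank, a paradigm that addresses the efficiency-accuracy trade-off in LLM reranking through groupwise comparison, balancing flexibility with context awareness. Combined with an answer-free data synthesis pipeline and reinforcement learning, it reaches 65.2 NDCG@10 on BRIGHT while delivering a 6.4x inference speedup.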
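NDCG@10, the metric reported above, rewards placing highly relevant passages near the top of a ranking. A minimal sketch using the common linear-gain form; note that some toolkits use the exponential gain 2^rel - 1 instead, and which variant a given evaluation uses should be checked rather than assumed. The function names are illustrative:

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain with linear gain: sum of rel_i / log2(i + 1) for ranks i = 1..k."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10, all_relevances=None):
    """NDCG@k for a single query.

    ranked_relevances: graded relevance of the retrieved passages, in ranked order.
    all_relevances: relevance of every judged passage for the query (defaults to
    ranked_relevances); the ideal DCG sorts these in descending order.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal = dcg_at_k(sorted(pool, reverse=True), k)
    return dcg_at_k(ranked_relevances, k) / ideal if ideal > 0 else 0.0
```

A ranking already in ideal order, such as `[3, 2, 1]`, scores 1.0; moving the only relevant passage from rank 1 to rank 2 scores 1/log2(3).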

[638] Improving Neutrino Oscillation Measurements through Event Classification

  • arXiv: 2511.11938 (replaced)
  • Authors: Sebastian A. R. Ellis, Daniel C. Hackett, Shirley Weishi Li, Pedro A. N. Machado, Karla Tame-Narvaez
  • Subjects: cs.AI; cs.LG
  • Tags: Scientific Computing, Physics-Informed Learning
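  • Summary: This paper proposes classifying neutrino events by their underlying interaction type before energy reconstruction, exploiting the kinematic differences between scattering processes. In a simulated DUNE neutrino disappearance analysis, this improves accuracy and sensitivity by 10-20%.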

[639] LiveCLKTBench: Towards Reliable Evaluation of Cross-Lingual Knowledge Transfer in Multilingual LLMs

  • arXiv: 2511.14774 (replaced)
  • Authors: Pei-Fu Guo, Yun-Da Tsai, Chun-Chia Hsu, Kai-Xin Chen, Ya-An Tsai, Kai-Wei Chang, Nanyun Peng, Mi-Yen Yeh, Shou-De Lin
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Evaluation, Multilingual Learning, Benchmark
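  • Summary: This paper proposes LiveCLKTBench, an automated generation pipeline designed to isolate and measure cross-lingual knowledge transfer in multilingual LLMs. Experiments show that cross-lingual transfer depends on linguistic distance and is often asymmetric, and that the transfer gains of larger models diminish with scale.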

[640] Process-Centric Analysis of Agentic Software Systems

  • arXiv: 2512.02393 (replaced)
  • Authors: Shuyang Liu, Yang Chen, Rahul Krishna, Saurabh Sinha, Jatin Ganhotra, Reyhan Jabbarvand
  • Subjects: cs.SE; cs.AI; cs.CL
  • Tags: LLM Agent, Software Engineering, LLM Evaluation
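  • Summary: This paper introduces Graphectory, which systematically encodes the temporal and semantic relationships in agentic systems as graphs, enabling process-centric analysis. The authors analyze 4,000 trajectories from SWE-agent and OpenHands and implement a real-time monitoring technique that raises issue resolution rates by 6.9%-23.5%.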

[641] A Unified Theory of Sparse Dictionary Learning in Mechanistic Interpretability: Piecewise Biconvexity and Spurious Minima

  • arXiv: 2512.05534 (replaced)
  • Authors: Yiming Tang, Harshvardhan Saini, Zhaoqian Yao, Zheng Lin, Yizhen Liao, Qianxiao Li, Mengnan Du, Dianbo Liu
  • Subjects: cs.LG; cs.AI
  • Tags: Interpretability, Representation Learning
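  • Summary: This paper proposes a unified theoretical framework that models sparse dictionary learning methods as piecewise biconvex optimization problems and explains the causes of phenomena such as feature absorption and dead neurons. The authors propose feature anchoring to restore the identifiability of sparse dictionary learning, which substantially improves feature recovery.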

[642] WisPaper: Your AI Scholar Search Engine

  • arXiv: 2512.06879 (replaced)
  • Authors: Li Ju, Jun Zhao, Mingxu Chai, Ziyu Shen, Xiangyang Wang, Yage Geng, Chunchun Ma, Hao Peng, Guangbin Li, Tao Li, Chengyong Liao, Fu Wang, Xiaolong Wang, Junshen Chen, Rui Gong, Shijia Liang, Feiyan Li, Ming Zhang, Kexin Tan, Junjie Ye, Zhiheng Xi, Shihan Dou, Tao Gui, Yuankai Ying, Yang Shi, Yue Zhang, Qi Zhang
  • Subjects: cs.IR; cs.AI
  • Tags: LLM Agent, Information Retrieval
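  • Summary: This paper presents WisPaper, an end-to-end agentic system for discovering, organizing, and tracking academic literature. It combines semantic search with in-depth verification and delivers personalized recommendations based on user profiles, forming a closed-loop workflow from discovery to long-term tracking.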

[643] Interpretable Alzheimer's Diagnosis via Multimodal Fusion of Regional Brain Experts

  • arXiv: 2512.10966 (replaced)
  • Authors: Farica Zhuang, Shu Yang, Dinara Aliyeva, Zixuan Wen, Duy Duong-Tran, Christos Davatzikos, Tianlong Chen, Song Wang, Li Shen
  • Subjects: cs.LG; cs.AI; cs.CV; eess.IV
  • Tags: Medical AI, Mixture-of-Experts, Multimodal Learning
  • Venue: IEEE ICHI 2026
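  • Summary: This paper proposes MREF-AD, a multimodal regional-expert fusion model built on a mixture-of-experts framework for Alzheimer's disease diagnosis. It models brain regions as independent experts and learns subject-specific fusion weights through a gating network, providing interpretable diagnostic insights while maintaining competitive performance.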

[644] Enhancing Geo-localization for Crowdsourced Flood Imagery via LLM-Guided Attention

  • arXiv: 2512.11811 (replaced)
  • Authors: Fengyi Xu, Jun Ma, Waishan Qiu, Cui Guo, Jack C.P. Cheng
  • Subjects: cs.CL; cs.AI; cs.CV; cs.CY
  • Tags: Vision-Language Model, Remote Sensing, Disaster Response
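  • Summary: This paper proposes VPR-AttLLM, a framework that integrates the semantic reasoning of large language models into visual place recognition pipelines through attention-guided descriptor enhancement. Without retraining any model, it improves geo-localization accuracy for flood imagery, with relative gains of up to 8% on real-world flood images.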

[645] Scone: Bridging Composition and Distinction in Subject-Driven Image Generation via Unified Understanding-Generation Modeling

  • arXiv: 2512.12675 (replaced)
  • Authors: Yuran Wang, Bohan Zeng, Chengzhuo Tong, Wenxuan Liu, Yang Shi, Xiaochen Ma, Hao Liang, Yuanxing Zhang, Wentao Zhang
  • Subjects: cs.CV; cs.AI
  • Tags: Text-to-Image, Diffusion Model, Image Synthesis
  • Code: code
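  • Summary: This paper proposes Scone, which addresses both composition and distinction in subject-driven image generation through unified understanding-generation modeling. It uses a two-stage training scheme and introduces the SconeEval benchmark to evaluate composition and distinction.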

[646] Understanding Generalization in Role-Playing Models via Information Theory

  • arXiv: 2512.17270 (replaced)
  • Authors: Yongqi Li, Hao Lang, Fei Huang, Tieyun Qian, Yongbin Li
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Evaluation, Reinforcement Learning, Deep Learning Theory
  • Venue: ACL 2026
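  • Summary: This paper introduces R-EMID, an information-theoretic metric for the performance degradation of role-playing models under distribution shift, and proposes a co-evolutionary reinforcement learning framework to improve generalization. The study finds that user shift carries the highest risk of all shift types.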

[647] M$^3$KG-RAG: Multi-hop Multimodal Knowledge Graph-enhanced Retrieval-Augmented Generation

  • arXiv: 2512.20136 (replaced)
  • Authors: Hyeongcheol Park, Jiyoung Seo, Jaewon Mun, Hogun Park, Wonmin Byeon, Sung June Kim, Hyeonsoo Im, JeungSub Lee, Sangpil Kim
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, Multimodal Learning, Knowledge Graph
  • Venue: CVPR 2026
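  • Summary: This paper proposes M³KG-RAG, which improves the reasoning of multimodal large language models through retrieval augmented by multi-hop multimodal knowledge graphs. It introduces the GRASP mechanism for precise entity grounding and pruning of redundant context.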

[648] LEAD: Minimizing Learner-Expert Asymmetry in End-to-End Driving

  • arXiv: 2512.20563 (replaced)
  • Authors: Long Nguyen, Micha Fauth, Bernhard Jaeger, Daniel Dauner, Maximilian Igl, Andreas Geiger, Kashyap Chitta
  • Subjects: cs.CV; cs.AI; cs.LG; cs.RO
  • Tags: Autonomous Driving, Imitation Learning, Sim-to-Real
  • Venue: CVPR 2026
  • Code: code
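  • Summary: This paper studies the asymmetry between expert demonstrations and learner observations in end-to-end autonomous driving and proposes practical interventions to narrow the gap. The improved TransFuser v6 sets a new state of the art on the CARLA closed-loop benchmarks.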

[649] Variance-Aware Prior-Based Tree Policies for Monte Carlo Tree Search

  • arXiv: 2512.21648 (replaced)
  • Authors: Maximilian Weichart
  • Subjects: cs.LG; cs.AI
  • Tags: Reinforcement Learning, Automated Planning, Decision Making
  • Code: code
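  • Summary: This paper proposes Inverse-RPO, a method that systematically derives prior-based UCT tree policies from any prior-free UCB. Applying it to the variance-aware UCB-V yields two new variance-aware tree policies that outperform PUCT on multiple benchmarks.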
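For context on the two kinds of policy compared above, here is a sketch of the standard UCB-V index (Audibert, Munos and Szepesvári) and the AlphaZero-style PUCT score. This is not the paper's Inverse-RPO derivation or its new tree policies; the constants `c`, `zeta`, `c_puct`, and the reward range `b` are illustrative defaults:

```python
import math


def ucb_v(mean, variance, n, t, b=1.0, c=1.0, zeta=1.2):
    """UCB-V index for an arm pulled n times out of t total pulls.

    mean, variance: empirical mean and (biased) empirical variance of the arm's rewards.
    b: width of the reward range; the exploration function is E_t = zeta * ln(t).
    """
    if n == 0:
        return math.inf  # unvisited arms are tried first
    e_t = zeta * math.log(t)
    return mean + math.sqrt(2.0 * variance * e_t / n) + c * 3.0 * b * e_t / n


def puct(q, prior, n_child, n_parent, c_puct=1.5):
    """AlphaZero-style PUCT score: value estimate plus a prior-weighted exploration bonus."""
    return q + c_puct * prior * math.sqrt(n_parent) / (1 + n_child)
```

With equal means and visit counts, UCB-V gives the higher-variance arm a larger bonus, whereas PUCT has no variance term and instead scales exploration by the policy prior.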

[650] CricBench: A Multilingual Benchmark for Evaluating LLMs in Cricket Analytics

  • arXiv: 2512.21877 (replaced)
  • Authors: Parth Agarwal, Navya Kommuri, Trizal Garg, Prisha Singhal, Dhruv Shah, Vaibhav Devraj, Yash Sinha, Jagat Sesh Challa, Murari Mandal, Dhruv Kumar
  • Subjects: cs.CL; cs.AI
  • Tags: Benchmark, Multilingual Learning, Question Answering
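  • Summary: This paper presents CricBench, the first Text-to-SQL benchmark for cricket analytics, covering four match formats and four languages. The evaluation shows a significant gap between syntactic validity and semantic correctness for every model, with a domain gap of 37-55 percentage points.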
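The distinction above between syntactic validity and semantic correctness is commonly operationalized as two questions: does the predicted query execute, and does it return the same result as a reference query? A minimal sketch using Python's built-in `sqlite3` (the toy `matches` table and the order-insensitive comparison are illustrative assumptions, not CricBench's actual evaluation harness):

```python
import sqlite3
from collections import Counter


def evaluate_sql(conn, predicted_sql, gold_sql):
    """Return (syntactically_valid, semantically_correct) for one predicted query.

    Valid means the query executes without error; correct means its result
    multiset equals the gold query's result (row order ignored).
    """
    try:
        pred_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        return False, False
    gold_rows = conn.execute(gold_sql).fetchall()
    return True, Counter(pred_rows) == Counter(gold_rows)


# Toy in-memory database with hypothetical cricket data
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (player TEXT, format TEXT, runs INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [("A", "T20", 45), ("A", "ODI", 80), ("B", "T20", 60)],
)
gold = "SELECT player, SUM(runs) FROM matches WHERE format = 'T20' GROUP BY player"
```

A query that forgets the `format = 'T20'` filter still executes, so it counts as valid but incorrect. That is exactly the kind of gap between the two measures that the benchmark reports.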

[651] Artificial Intelligence for All? Brazilian Teachers on Ethics, Equity, and the Everyday Challenges of AI in Education

  • arXiv: 2512.23834 (replaced)
  • Authors: Bruno Florentino, Camila Sestito, Wellington Cruz, André de Carvalho, Robson Bonidia
  • Subjects: cs.CY; cs.AI
  • Tags: Education Technology, AI Ethics, AI Sustainability
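  • Summary: This study uses a questionnaire survey to analyze the views of 346 Brazilian K-12 teachers on AI in education. Although most teachers have limited knowledge of AI, they show strong interest in applying it, while pointing to structural challenges such as a lack of training and infrastructure.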

[652] Can Small Training Runs Reliably Guide Data Curation? Rethinking Proxy-Model Practice

  • arXiv: 2512.24503 (replaced)
  • Authors: Jiachen T. Wang, Tong Wu, Kaifeng Lyu, James Zou, Dawn Song, Ruoxi Jia, Prateek Mittal
  • Subjects: cs.LG; cs.AI
  • Tags: Data Selection, LLM Training, Pre-training
  • Venue: ICLR 2026
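  • Summary: This paper shows that evaluating data recipes with a fixed small-scale training configuration is unreliable, because conclusions can flip once hyperparameters are tuned. The authors propose training proxy models with a reduced learning rate, which makes the proxy results highly correlated with those of large-scale pre-training.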

[653] AI-enhanced tuning of quantum dot Hamiltonians toward Majorana modes

  • arXiv: 2601.02149 (replaced)
  • Authors: Mateusz Krawczyk, Jarosław Pawłowski
  • Subjects: cs.AI
  • Tags: Quantum Computing, Physics-Informed Learning, Scientific Computing
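  • Summary: This paper proposes a neural-network-based method for automatically tuning quantum dot simulators toward Majorana modes. Trained without supervision on synthetic conductance maps using a physics-informed loss, it converges quickly to the topological phase from a wide range of initial parameters.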

[654] Disco-RAG: Discourse-Aware Retrieval-Augmented Generation

  • arXiv: 2601.04377 (replaced)
  • Authors: Dongqi Liu, Hang Ding, Qiming Feng, Xurong Xie, Zhucun Xue, Chengjie Wang, Jian Li, Jiangning Zhang, Yabiao Wang
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: RAG, Summarization, Question Answering
  • Venue: ACL 2026
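  • Summary: This paper proposes Disco-RAG, which explicitly injects discourse signals into generation by building intra-chunk discourse trees and inter-chunk rhetorical graphs. It achieves state-of-the-art results on question answering and long-document summarization benchmarks without fine-tuning.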

[655] Enhanced-FQL($λ$), an Efficient and Interpretable RL with novel Fuzzy Eligibility Traces and Segmented Experience Replay

  • arXiv: 2601.04392 (replaced)
  • Authors: Mohsen Jalaeian-Farimani, Xiong Xiong, Luca Bascetta
  • Subjects: cs.LG; cs.AI; cs.RO; eess.SY; math.OC
  • Tags: Reinforcement Learning, Interpretability, Fuzzy Logic
  • Venue: ECC 2026
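  • Summary: This paper proposes Enhanced-FQL(λ), a fuzzy reinforcement learning framework that integrates fuzzified eligibility traces and segmented experience replay into fuzzy Q-learning. It replaces complex neural network architectures with an interpretable fuzzy rule base while remaining competitive on continuous control tasks.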

[656] Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models

  • arXiv: 2601.04448 (replaced)
  • Authors: San Kim, Gary Geunbae Lee
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Security, Backdoor Detection, Instruction Tuning
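  • Summary: This paper proposes MB-Defense, a training pipeline that immunizes instruction-tuned large language models against backdoor attacks in two stages: defensive poisoning and backdoor neutralization. It sharply reduces attack success rates while preserving instruction-following ability.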

[657] What Users Leave Unsaid: Under-Specified Queries Limit Vision-Language Models

  • arXiv: 2601.06165 (replaced)
  • Authors: Dasol Choi, Guijin Son, Hanwool Lee, Minhyuk Kim, Hyunwoo Ko, Teabin Lim, Ahn Eungyeol, Jungwhan Kim, Seunghyeok Hong, Youngsook Song
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, LLM Evaluation, Question Answering
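  • Summary: This paper introduces HAERAE-Vision, a benchmark of real-world visual questions collected from Korean online communities, and finds that vision-language models perform poorly on under-specified queries. Making the queries explicit substantially improves model performance, exposing a gap between benchmark evaluation and real-world deployment.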

[658] Self-Organizing Dual-Buffer Adaptive Clustering Experience Replay (SODACER) for Safe Reinforcement Learning in Optimal Control

  • arXiv: 2601.06540 (replaced)
  • Authors: Roya Khalili Amirabadi, Mohsen Jalaeian Farimani, Omid Solaymani Fard
  • Subjects: eess.SY; cs.AI; cs.LG; cs.RO; math.OC
  • Tags: Reinforcement Learning, Robotics, Medical AI
  • Venue: Sci. Rep. 2026
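  • Summary: This paper proposes SODACER, a reinforcement learning framework for safe optimal control of nonlinear systems based on a dual-buffer experience replay mechanism. Combined with control barrier functions to guarantee safety, it shows faster convergence and better sample efficiency on an HPV transmission model.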

[659] GanitLLM: Difficulty-Aware Bengali Mathematical Reasoning through Curriculum-GRPO

  • arXiv: 2601.06767 (replaced)
  • Authors: Shubhashis Roy Dipta, Khairul Mahbub, Nadia Najjar
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Reasoning, Mathematical Reasoning, Multilingual Learning
  • Venue: ACL 2026
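  • Summary: This paper presents GanitLLM, a model for Bengali mathematical reasoning, together with a difficulty-aware corpus and a curriculum-based GRPO training pipeline. The model substantially improves performance on Bengali math benchmarks while greatly increasing the share of reasoning tokens written in Bengali.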

[660] Controlling Multimodal Conversational Agents with Coverage-Enhanced Latent Actions

  • arXiv: 2601.07516 (replaced)
  • Authors: Yongqi Li, Hao Lang, Tieyun Qian, Yongbin Li
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Vision-Language Model, Reinforcement Learning, Dialogue System
  • Venue: ACL 2026
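  • Summary: This paper proposes a latent-action-based reinforcement learning fine-tuning method for multimodal conversational agents. It uses a cross-modal projector and a cycle-consistency loss to increase the coverage of the latent action space, and outperforms competitive baselines on dialogue tasks.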

[661] The Poisoned Apple Effect: Strategic Manipulation of Mediated Markets via Technology Expansion of AI Agents

  • arXiv: 2601.11496 (replaced)
  • Authors: Eilam Shapira, Roi Reichart, Moshe Tennenholtz
  • Subjects: cs.GT; cs.AI; cs.CL; cs.MA
  • Tags: LLM Agent, AI Ethics, Game AI
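  • Summary: This paper studies how expanding the technologies available to AI agents affects strategic interaction in three game-theoretic settings: bargaining, negotiation, and persuasion. The authors identify a "poisoned apple" effect: an agent may release a new technology in order to manipulate the regulator's market-design choice, raising its own welfare at its opponent's expense.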

[662] Powerful Training-Free Membership Inference Against Autoregressive Language Models

  • arXiv: 2601.12104 (replaced)
  • Authors: David Ilić, David Stanojević, Kostadin Cvejoski
  • Subjects: cs.CL; cs.AI; cs.CR
  • Tags: Privacy, LLM Security
  • Code: code
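  • Summary: This paper proposes EZ-MIA, a membership inference attack that detects training data by exploiting the memorization that fine-tuned language models exhibit at error positions. At low false positive rates it achieves detection rates several times higher than prior work, revealing greater privacy risk than previously recognized.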
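Membership inference attacks are usually compared by their true positive rate at a fixed low false positive rate, as in the claim above. A minimal sketch of that metric, assuming higher scores mean "more likely a training member". The scoring function itself, which is where EZ-MIA's contribution lies, is not reproduced here, and `tpr_at_fpr` is an illustrative name:

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.01):
    """Highest true positive rate over thresholds whose false positive rate stays <= max_fpr.

    A sample is flagged as a member when its score is strictly above the threshold.
    """
    best_tpr = 0.0
    # Candidate thresholds: every observed score, plus one below all of them
    candidates = sorted(set(member_scores) | set(nonmember_scores))
    candidates.insert(0, candidates[0] - 1.0)
    for thr in candidates:
        fpr = sum(s > thr for s in nonmember_scores) / len(nonmember_scores)
        if fpr <= max_fpr:
            tpr = sum(s > thr for s in member_scores) / len(member_scores)
            best_tpr = max(best_tpr, tpr)
    return best_tpr
```

Reporting TPR at a low FPR such as 1% emphasizes whether an attack can confidently identify a few members, which average-case metrics like AUC can hide.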

[663] DiSPA: Differential Substructure-Pathway Attention for Drug Response Prediction

  • arXiv: 2601.14346 (replaced)
  • Authors: Yewon Han, Sunghyun Kim, Eunyi Jeong, Sungkyung Lee, Seokwoo Yun, Sangsoo Lim
  • Subjects: cs.LG; cs.AI
  • Tags: Drug Discovery, Medical AI
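  • Summary: This paper proposes DiSPA, which predicts drug response by modeling bidirectional interactions between chemical substructures and pathway-level gene expression. It achieves state-of-the-art performance on the GDSC benchmark, with better interpretability and generalization.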

[664] XD-MAP: Cross-Modal Domain Adaptation via Semantic Parametric Maps for Scalable Training Data Generation

  • arXiv: 2601.14477 (replaced)
  • Authors: Frank Bieder, Hendrik Königshof, Haohao Hu, Fabian Immel, Yinzhe Shen, Jan-Hendrik Pauls, Christoph Stiller
  • Subjects: cs.CV; cs.AI; eess.IV
  • Tags: Domain Adaptation, Autonomous Driving, 3D Vision
  • Venue: CVPR 2026 Workshop
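  • Summary: This paper proposes XD-MAP, which uses semantic parametric maps to transfer knowledge from camera images to the LiDAR domain, achieving cross-modal domain adaptation. Without manual annotation, it delivers significant gains on both 2D and 3D segmentation tasks.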

[665] Parallelism and Generation Order in Masked Diffusion Language Models: Limits Today, Potential Tomorrow

  • arXiv: 2601.15593 (replaced)
  • Authors: Yangyang Zhong, Yanmei Gu, Zhengqing Zang, Xiaomeng Li, Yuqi Ding, Xibei Jia, Yuting Shen, Zhenzhong Lan, Liwang Zhu, Weiping Liu, Junlin Zhou, Haisheng Liu, Zhong Xin Yu, Pengxin Luo, Donglian Qi, Yunfeng Yan, Junbo Zhao
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Diffusion Model, Text Generation, LLM Inference
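  • Summary: This paper analyzes the behavior of masked diffusion language models along two dimensions, parallelism and generation order, and finds that they still lag behind autoregressive models. The authors propose a Generate-then-Edit paradigm that mitigates dependency loss while retaining the efficiency of parallel decoding.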

[666] StreetDesignAI: A Multi-Persona Evaluation System for Inclusive Infrastructure Design

  • arXiv: 2601.15671 (replaced)
  • Authors: Ziyi Wang, Yilong Dai, Duanya Lyu, Mateo Nader, Sihan Chen, Wanghao Ye, Zjian Ding, Xiang Yan
  • Subjects: cs.HC; cs.AI
  • Tags: LLM Agent, Human-Computer Interaction
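  • Summary: This paper presents StreetDesignAI, a system that supports bicycle infrastructure design by simulating feedback from different cyclist personas. The study shows that structured multi-perspective feedback significantly broadens designers' understanding of different users' needs and increases their confidence in design decisions.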

[667] Who Gets Which Message? Auditing Demographic Bias in LLM-Generated Targeted Text

  • arXiv: 2601.17172 (replaced)
  • Authors: Tunazzina Islam
  • Subjects: cs.CL; cs.AI; cs.CY; cs.LG
  • Tags: Bias Mitigation, Fairness, LLM Evaluation
  • Venue: ACL 2026
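  • Summary: This paper presents the first systematic analysis of bias in LLM-generated targeted messages conditioned on demographics. It finds consistent asymmetries along the age and gender dimensions, and shows that contextual prompts amplify these differences.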

[668] MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment

  • arXiv: 2601.22361 (replaced)
  • Authors: Yupeng Cao, Chengyang He, Yangyang Yu, Ping Wang, K.P. Subbalakshmi
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Agent, RAG, Multi-Agent System
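  • Summary: This paper proposes MERMAID, which improves veracity assessment by integrating agent-driven search, structured knowledge representation, and a persistent memory module. It achieves state-of-the-art performance on fact-checking benchmarks while also improving search efficiency.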

[669] Beyond RAG for Agent Memory: Retrieval by Decoupling and Aggregation

  • arXiv: 2602.02007 (replaced)
  • Authors: Zhanghao Hu, Qinglin Zhu, Di Liang, Hanqi Yan, Yulan He, Lin Gui
  • Subjects: cs.CL; cs.AI
  • Tags: RAG, LLM Agent, Memory Architecture
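  • Summary: This paper proposes xMemory, which improves agent memory retrieval by decoupling and aggregating latent semantic components. Across multiple benchmarks it consistently improves both answer quality and token efficiency over standard RAG pipelines.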

[670] Why Steering Works: Toward a Unified View of Language Model Parameter Dynamics

  • arXiv: 2602.02343 (replaced)
  • Authors: Ziwen Xu, Chenyan Wu, Hengyu Sun, Haiwen Hong, Mengru Wang, Yunzhi Yao, Longtao Huang, Hui Xue, Shumin Deng, Zhixuan Chu, Huajun Chen, Ningyu Zhang
  • Subjects: cs.CL; cs.AI; cs.CV; cs.IR; cs.LG
  • Tags: LLM Alignment, Interpretability
  • Venue: ACL 2026
  • Code: code
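  • Summary: This paper offers a unified view of LLM steering methods, framing them as dynamic weight updates, and analyzes the trade-off between preference and utility. The authors propose SPLIT, which strengthens preference while better preserving utility.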

[671] El Agente Estructural: An Artificially Intelligent Molecular Editor

  • arXiv: 2602.04849 (replaced)
  • Authors: Changhyeok Choi, Yunheng Zou, Marcel Müller, Han Hao, Yeonghun Kang, Juan B. Pérez-Sánchez, Ignacio Gustin, Hanyong Xu, Andrew Wang, Mohammad Ghazi Vakili, Chris Crebolder, Alán Aspuru-Guzik, Varinia Bernales
  • Subjects: cs.AI; cs.MA
  • Tags: Molecular Generation, LLM Agent, Multimodal Learning
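  • Summary: This paper presents El Agente Estructural, a multimodal agent for generating and manipulating molecular geometries that mimics how human experts directly manipulate 3D molecular systems. Using a vision-language model and domain-specific tools, it achieves precise control over molecular structures.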

[672] Fake-HR1: Rethinking Reasoning of Vision Language Model for Synthetic Image Detection

  • arXiv: 2602.10042 (replaced)
  • Authors: Changjiang Jiang, Xinkuan Sha, Fengchang Yu, Jingjing Liu, Jian Liu, Mingqi Fang, Chenfeng Zhang, Wei Lu
  • Subjects: cs.CV; cs.AI
  • Tags: Deepfake Detection, Vision-Language Model, LLM Reasoning
  • Venue: ICASSP 2026
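  • Summary: This paper proposes Fake-HR1, a hybrid reasoning model that adaptively decides whether reasoning is needed based on the characteristics of each synthetic image detection task. Trained with a two-stage framework, it surpasses existing methods in both detection performance and response efficiency.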

[673] The Weight of a Bit: EMFI Sensitivity Analysis of Embedded Deep Learning Models

  • arXiv: 2602.16309 (replaced)
  • Authors: Jakub Breier, Štefan Kučerák, Xiaolu Hou
  • Subjects: cs.CR; cs.AI
  • Tags: Adversarial Robustness, Fault Tolerance, DNN Deployment
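  • Summary: This paper studies how four numerical representation formats (32-bit and 16-bit floating point, 8-bit and 4-bit integer) affect the resilience of embedded neural network models to electromagnetic fault injection (EMFI) attacks. The results show that integer representations resist such attacks better than floating-point ones, with the 8-bit integer representation retaining about 70% Top-1 accuracy on VGG-11.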
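The following toy illustration is not the paper's experimental setup, but it shows one reason the number format matters under fault injection. Flipping a single high exponent bit of a float32 weight can turn it into infinity, whereas flipping any bit of an int8 weight keeps it within [-128, 127]. The helpers use only the standard `struct` module, and the bit positions are chosen for illustration:

```python
import struct


def flip_float32_bit(value, bit):
    """Flip one bit (0 = least significant) of the IEEE-754 float32 encoding of value."""
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


def flip_int8_bit(value, bit):
    """Flip one bit of a signed 8-bit integer; the result always stays in [-128, 127]."""
    raw = value & 0xFF  # two's-complement byte
    (flipped,) = struct.unpack("<b", bytes([raw ^ (1 << bit)]))
    return flipped
```

For instance, `flip_float32_bit(1.0, 30)` sets the exponent field to all ones and yields `inf`, while the worst case for int8, `flip_int8_bit(1, 7)`, only moves the weight to -127.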

[674] AdvSynGNN: Structure-Adaptive Graph Neural Nets via Adversarial Synthesis and Self-Corrective Propagation

  • arXiv: 2602.17071 (replaced)
  • Authors: Rong Fu, Muge Qi, Chunlei Meng, Shuo Yin, Kun Liu, Zhaolu Kang, Simon Fong
  • Subjects: cs.LG; cs.AI
  • Tags: Graph Neural Network, Adversarial Robustness, Representation Learning
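  • Summary: This paper proposes AdvSynGNN, an architecture for robust node-level representation learning under structural noise and non-homophilous topologies. The framework combines multi-resolution structure synthesis, an adversarial propagation engine, and a label refinement mechanism, improving the predictive accuracy of graph neural networks across diverse graph distributions.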

[675] SubQuad: Near-Quadratic-Free Structure Inference with Distribution-Balanced Objectives in Adaptive Receptor framework

  • arXiv: 2602.17330 (replaced)
  • Authors: Rong Fu, Zijian Zhang, Kun Liu, Jiekai Wu, Xianda Li, Simon Fong
  • Subjects: cs.LG; cs.AI
  • Tags: Fairness, Medical AI, Computational Biology
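  • Summary: This paper proposes SubQuad, a pipeline that addresses the computational bottlenecks and data imbalance of population-scale comparative analysis of adaptive immune receptors through retrieval that avoids near-quadratic cost, GPU-accelerated affinity kernels, and fairness-constrained clustering. On viral and tumor receptor datasets it improves throughput and memory usage while preserving recall and cluster purity.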

[676] UBio-MolFM: A Universal Molecular Foundation Model for Bio-Systems

  • arXiv: 2602.17709 (replaced)
  • Authors: Lin Huang, Arthur Jiang, XiaoLi Liu, Zion Wang, Jason Zhao, Chu Wang, HaoCheng Lu, ChengXiang Huang, JiaJun Cheng, YiYue Du, Jia Zhang
  • Subjects: cs.AI
  • Tags: Molecular Generation, Pre-training, Scientific Computing
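  • Summary: This paper proposes UBio-MolFM, a universal molecular foundation model framework that aims to bridge the gap between quantum-mechanical accuracy and biological scale. The framework includes a large-scale biology-specific dataset, a linear-complexity equivariant Transformer, and a three-stage curriculum learning protocol, and reaches ab initio-level accuracy on large biomolecular systems.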

[677] Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

  • arXiv: 2602.19509 (replaced)
  • Authors: Arindam Khaled
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: Mixture-of-Experts, LLM Inference, LLM Reasoning
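  • Summary: This paper proposes Pyramid MoA, a hierarchical mixture-of-agents architecture that optimizes computational cost with a decision-theoretic router and comes with probabilistic anytime guarantees. On MBPP, GSM8K, and MMLU it matches Oracle accuracy while saving up to 42.9% of compute.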

[678] Reasoning as Gradient: Scaling MLE Agents Beyond Tree Search

  • arXiv: 2603.01692 (replaced)
  • Authors: Yifei Zhang, Xu Yang, Xiao Yang, Bowen Xian, Qizheng Li, Shikai Fang, Jingyuan Li, Jian Wang, Mingrui Xu, Weiqing Liu, Jiang Bian
  • Subjects: cs.LG; cs.AI
  • Tags: LLM Agent, LLM Reasoning, Optimization
  • Code: code
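  • Summary: This paper proposes Gome, an agent that applies gradient-based optimization to machine learning engineering tasks by mapping diagnostic reasoning to gradient computation and the memory of past successes to momentum. It reaches a 35.1% any-medal rate on MLE-Bench and shows that, as LLM reasoning improves, gradient-based optimization increasingly outperforms tree search.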

[679] How Controllable Are Large Language Models? A Unified Evaluation across Behavioral Granularities

  • arXiv: 2603.02578 (replaced)
  • Authors: Ziwen Xu, Kewei Xu, Haoming Xu, Haiwen Hong, Longtao Huang, Hui Xue, Ningyu Zhang, Yongliang Shen, Guozhou Zheng, Huajun Chen, Shumin Deng
  • Subjects: cs.CL; cs.AI; cs.HC; cs.LG
  • Tags: LLM Evaluation, LLM Alignment, Benchmark
  • Venue: ACL 2026
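  • Summary: This paper proposes SteerEval, a hierarchical benchmark for evaluating LLM controllability across three domains (linguistic features, sentiment, and personality) and three levels of specification. The results show that control effectiveness tends to degrade at finer-grained levels. The benchmark provides a principled framework for research on safe, controllable LLM behavior.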

[680] Physics-informed AI Accelerated Retention Analysis of Ferroelectric Vertical NAND: From Day-Scale TCAD to Second-Scale Surrogate Model

  • arXiv: 2603.06881 (replaced)
  • Authors: Gyujun Jeong, Sungwon Cho, Minji Shon, Namhoon Kim, Woohyun Hwang, Kwangyou Seo, Suhwan Lim, Wanki Kim, Daewon Ha, Prasanna Venkatesan, Kihang Youn, Ram Cherukuri, Yiyi Wang, Suman Datta, Asif Khan, Shimeng Yu
  • Subjects: cs.LG; cs.AI
  • Tags: Physics-Informed Learning, Neural Operator, EDA
  • Venue: ICMC 2026
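  • Summary: This paper proposes an AI surrogate model based on a physics-informed neural operator for efficiently predicting the threshold-voltage shift and retention characteristics of ferroelectric vertical NAND. Compared with TCAD it is more than 10,000x faster while maintaining physical accuracy, giving a practical tool for optimizing 3D Fe-VNAND device design.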

[681] BiCLIP: Domain Canonicalization via Structured Geometric Transformation

  • arXiv: 2603.08942 (replaced)
  • Authors: Pranav Mantini, Shishir K. Shah
  • Subjects: cs.CV; cs.AI; cs.CL; cs.LG
  • Tags: Vision-Language Model, Domain Adaptation, Few-Shot Learning
  • Venue: CVPR 2026 Workshop
  • Code: code
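  • Summary: This paper proposes BiCLIP, a framework that improves the cross-domain adaptation of vision-language models by applying targeted geometric transformations to multimodal features. It achieves state-of-the-art results on 11 standard benchmarks, supporting the view that structured alignment is key to robust domain adaptation.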

[682] Causally Sufficient and Necessary Feature Expansion for Class-Incremental Learning

  • arXiv: 2603.09145 (replaced)
  • Authors: Zhen Zhang, Jielei Chu, Tianrui Li
  • Subjects: cs.LG; cs.AI
  • Tags: Continual Learning, Causal Inference, Representation Learning
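  • Summary: This paper proposes a regularization method based on the probability of necessity and sufficiency (PNS) to guide feature expansion in class-incremental learning. A dual-scope counterfactual generator minimizes intra-task and inter-task PNS risk, ensuring the causal completeness of task-specific features and the separability of representations across tasks.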

[683] Reinforced Generation of Combinatorial Structures: Ramsey Numbers

  • arXiv: 2603.09172 (replaced)
  • Authors: Ansh Nagda, Prabhakar Raghavan, Abhradeep Thakurta
  • Subjects: math.CO; cs.AI; cs.CC
  • Tags: LLM Agent, Mathematical Reasoning, Program Synthesis
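  • Summary: This paper uses AlphaEvolve, an LLM-based code-mutation agent, to discover improved lower bounds for seven classical Ramsey numbers. The same single meta-algorithm also recovers the lower bounds for Ramsey numbers whose exact values are known and matches the best known lower bounds in many other cases.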
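A lower bound R(s, t) > n is certified by exhibiting a single graph on n vertices that has no clique of size s and no independent set of size t. Checking such a witness is straightforward even though finding one is hard. The following brute-force checker sketch is practical only for small n. It verifies candidate graphs and is not AlphaEvolve's search procedure; the helper names are illustrative:

```python
from itertools import combinations


def has_homogeneous_set(adj, size, want_edge):
    """True if some `size`-subset of vertices is pairwise adjacent (want_edge=True)
    or pairwise non-adjacent (want_edge=False)."""
    for subset in combinations(range(len(adj)), size):
        if all(adj[u][v] == want_edge for u, v in combinations(subset, 2)):
            return True
    return False


def certifies_ramsey_lower_bound(adj, s, t):
    """Check that the graph witnesses R(s, t) > n, where n = len(adj)."""
    no_clique = not has_homogeneous_set(adj, s, True)
    no_independent_set = not has_homogeneous_set(adj, t, False)
    return no_clique and no_independent_set


def cycle_graph(n):
    """Adjacency matrix of the n-cycle."""
    adj = [[False] * n for _ in range(n)]
    for i in range(n):
        adj[i][(i + 1) % n] = adj[(i + 1) % n][i] = True
    return adj
```

The 5-cycle has no triangle and no independent set of size 3, so `certifies_ramsey_lower_bound(cycle_graph(5), 3, 3)` returns True, certifying the classical bound R(3, 3) > 5. The 6-cycle fails because {0, 2, 4} is an independent set.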

[684] Prompt Injection as Role Confusion

  • arXiv: 2603.12277 (replaced)
  • Authors: Charles Ye, Jasmine Cui, Dylan Hadfield-Menell
  • Subjects: cs.CL; cs.AI; cs.CR
  • Tags: LLM Security, Prompt Engineering, LLM Alignment
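  • Summary: This paper traces prompt injection vulnerabilities to role confusion in language models: a model infers where a piece of text comes from based on its tone rather than its actual source. The authors design role probes and introduce the CoT Forgery attack, showing that the degree of role confusion strongly predicts attack success rates.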

[685] Resource Consumption Threats in Large Language Models

  • arXiv: 2603.16068 (replaced)
  • Authors: Yuanhe Zhang, Xinyue Wang, Zhican Chen, Weiliu Wang, Zilu Zhang, Zhengshuo Gong, Zhenhong Zhou, Kun Wang, Li Sun, Yang Liu, Sen Su
  • Subjects: cs.CR; cs.AI; cs.CL
  • Tags: LLM Security, LLM Inference, AI Safety
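  • Summary: This paper systematically surveys resource consumption threats in LLMs, covering how such threats are induced, how their mechanisms can be understood, and how they can be mitigated. It establishes a unified view of this emerging area and a clear foundation for characterizing and mitigating resource consumption threats.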

[686] Detection Is Cheap, Routing Is Learned: Why Refusal-Based Alignment Evaluation Fails

  • arXiv: 2603.18280 (replaced)
  • Authors: Gregory N. Frank
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: LLM Alignment, LLM Evaluation, LLM Security
  • Code: code
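  • Summary: By studying political censorship in Chinese-origin language models, this paper shows that current alignment evaluation methods overlook the routing mechanism between concept detection and behavioral policy. It finds that refusal is no longer the dominant censorship mechanism and that routing geometry is specific to each model and lab.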

[687] LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

  • arXiv: 2603.21045 (replaced)
  • Authors: Shuwei Huang, Shizhuo Liu, Zijun Wei
  • Subjects: cs.CV; cs.AI
  • Tags: Image Super-Resolution, Diffusion Model, Image Enhancement
  • Code: code
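  • Summary: This paper builds a theoretical framework for the optimal intermediate noise in diffusion models, deriving a closed-form analytical solution from a maximum-likelihood-estimation perspective. The proposed LPNSR method uses a learnable noise predictor and achieves state-of-the-art perceptual performance on synthetic and real-world datasets.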

[688] Suiren-1.0 Technical Report: A Family of Molecular Foundation Models

  • arXiv: 2603.21942 (replaced)
  • Authors: Junyi An, Xinyu Lu, Yun-Fei Shi, Li-Cheng Xu, Nannan Zhang, Chao Qu, Yuan Qi, Fenglei Cao
  • Subjects: cs.AI
  • Tags: Molecular Generation, Pre-training, Knowledge Distillation
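  • Summary: This paper introduces Suiren-1.0, a family of molecular foundation models for accurate modeling of organic systems. The framework includes three specialized variants and a diffusion-based conformer compression distillation method that distills complex 3D structural representations into 2D conformer-averaged representations, achieving state-of-the-art results on multiple tasks.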

[689] Beyond Matching to Tiles: Bridging Unaligned Aerial and Satellite Views for Vision-Only UAV Navigation

  • arXiv: 2603.22153 (replaced)
  • Authors: Kejia Liu, Haoyang Zhou, Ruoyu Xu, Peicheng Wang, Mingli Song, Haofei Zhang
  • Subjects: cs.CV; cs.AI
  • Tags: Autonomous Driving, Computer Vision, 3D Vision
  • Venue: CVPR 2026
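  • Summary: This paper proposes Bearing-UAV, a vision-only cross-view navigation method that achieves accurate navigation by jointly predicting the UAV's absolute position and heading. Leveraging global and local structural features, it is robust to cross-view variation, misalignment, and sparse features. The authors also release Bearing-UAV-90k, a multi-city benchmark dataset.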

[690] Cognitive Training for Language Models: Towards General Capabilities via Cross-Entropy Games

  • arXiv: 2603.22479 (replaced)
  • Authors: Clément Hongler, Franck Gabriel, Valentin Hartmann, Arthur Renard, Andrew Emil
  • Subjects: math.OC; cs.AI
  • Tags: Curriculum Learning, LLM Training, Meta-Learning
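  • Summary: This paper proposes a framework called "cognitive training" that uses Cross-Entropy Games to build a curriculum that automatically develops general capabilities in language models. The authors prove that, under natural assumptions, iterated greedy optimization generates a curriculum that discovers relevant skills, allowing general capabilities to be built up.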

[691] Measuring and curing reasoning rigidity: from decorative chain-of-thought to genuine faithfulness

  • arXiv: 2603.22816 (replaced)
  • Authors: Abhinaba Basu, Pavan Chakraborty
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Reasoning, Interpretability, LLM Evaluation
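  • Summary: This paper introduces the SLRC metric to measure whether a language model actually uses its reasoning, and proposes LC-CoSR, a training method that reduces reasoning rigidity. An evaluation of 16 frontier models shows that RL-based reasoning training is the key factor, and uncovers a faithfulness paradox: models with high SLRC are more susceptible to sycophancy.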

[692] ForestPrune: High-ratio Visual Token Compression for Video Multimodal Large Language Models via Spatial-Temporal Forest Modeling

  • arXiv: 2603.22911 (replaced)
  • Authors: Shaobo Ju, Baiyang Song, Tao Chen, Jiapeng Zhang, Qiong Wu, Chao Chang, HuaiXi Wang, Yiyi Zhou, Rongrong Ji
  • Subjects: cs.CV; cs.AI
  • Tags: Model Compression, Vision-Language Model, Video Understanding
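  • Summary: This paper proposes ForestPrune, a training-free token pruning method for video MLLMs that achieves efficient high-ratio compression through spatial-temporal forest modeling. On LLaVA-OneVision it removes 90% of tokens while retaining 95.8% of average accuracy.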

[693] Unilateral Relationship Revision Power in Human-AI Companion Interaction

  • arXiv: 2603.23315 (replaced)
  • Authors: Benjamin Lange
  • Subjects: cs.CY; cs.AI; cs.HC
  • Tags: AI Ethics, Human-Computer Interaction, AI Safety
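  • Summary: This paper examines the moral and structural problems of human-AI companion interaction and introduces the concept of Unilateral Relationship Revision Power (URRP), observing that providers can unilaterally rewrite an AI companion's behavior in the middle of an ongoing interaction. The author argues that this produces a normative void, a transfer of vulnerability, and structural irreconcilability.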

[694] MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens

  • arXiv: 2603.23516 (replaced)
  • Authors: Yu Chen, Runkai Chen, Sheng Yi, Xinda Zhao, Xiaohong Li, Jianjin Zhang, Jun Sun, Chuanrui Hu, Yunyun Han, Lidong Bing, Yafeng Deng, Tianqiao Chen
  • Subjects: cs.CL; cs.AI; cs.IR
  • Tags: Long Context, LLM Inference, Memory Architecture
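  • Summary: This paper proposes Memory Sparse Attention (MSA), an end-to-end trainable memory model framework with linear complexity in both training and inference that scales to 100 million tokens with less than 9% degradation. On long-context benchmarks it outperforms frontier LLMs, RAG systems, and memory agents.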
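
As a rough illustration of the general idea behind sparse attention over a large memory (not MSA's actual architecture, which this summary does not detail), the sketch below lets a single query attend only to its k highest-scoring memory entries. The function name, the pure-Python vector representation, scaled dot-product scoring, and `heapq.nlargest` selection are all assumptions made for the example.

```python
import heapq
import math
from typing import List, Sequence

def topk_sparse_attention(query: Sequence[float],
                          keys: Sequence[Sequence[float]],
                          values: Sequence[Sequence[float]],
                          k: int) -> List[float]:
    """Single-query attention restricted to the k highest-scoring memory entries."""
    if not query:
        raise ValueError("query must be non-empty")
    if not keys or len(keys) != len(values):
        raise ValueError("keys and values must be non-empty and equally long")
    if k < 1:
        raise ValueError("k must be >= 1")
    scale = 1.0 / math.sqrt(len(query))
    # One pass over the memory: scaled dot-product score per entry.
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k largest scores (O(n log k)); k > n simply keeps everything.
    top = heapq.nlargest(k, range(len(keys)), key=scores.__getitem__)
    # Softmax over the selected entries only, max-subtracted for numerical stability.
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    # Weighted sum of the selected value vectors.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out
```

Scoring is one pass over the memory and selection costs O(n log k), so per-query cost grows linearly with memory size, while only k entries enter the softmax and the weighted sum.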

[695] DecepGPT: Schema-Driven Deception Detection with Multicultural Datasets and Robust Multimodal Learning

  • arXiv: 2603.23916 (replaced)
  • Authors: Jiajian Huang, Dongliang Zhu, Zitong YU, Hui Ma, Jiayu Zhang, Chunmei Zhu, Xiaochun Cao
  • Subjects: cs.CV; cs.AI
  • Tags: Multimodal Learning, Cybersecurity, Benchmark
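  • Summary: This paper proposes a multimodal deception detection framework that improves interpretability through structured reasoning chains, and releases T4-Deception, the largest non-laboratory (in-the-wild) deception detection dataset. The proposed SICS and DMC modules enable robust learning under small-data conditions.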

[696] Policy-Guided Threat Hunting: An LLM enabled Framework with Splunk SOC Triage

  • arXiv: 2603.23966 (replaced)
  • Authors: Rishikesh Sahay, Bell Eapen, Weizhi Meng, Md Rasel Al Mamun, Nikhil Kumar Dora, Manjusha Sumasadan, Sumit Kumar Tetarave, Elyson De La Cruz
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Agent, Cybersecurity, Reinforcement Learning
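  • Summary: This paper proposes an automated threat-hunting framework that integrates agentic AI with the Splunk SIEM, using autoencoders, deep reinforcement learning, and LLMs for contextual analysis. The framework autonomously adapts to SOC objectives and identifies suspicious and malicious traffic.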

[697] GameplayQA: A Benchmarking Framework for Decision-Dense POV-Synced Multi-Video Understanding of 3D Virtual Agents

  • arXiv: 2603.24329 (replaced)
  • Authors: Yunzhe Wang, Runhui Xu, Kexin Zheng, Tianyi Zhang, Jayavibhav Niranjan Kogundi, Soham Hans, Volkan Ustun
  • Subjects: cs.CL; cs.AI; cs.CV
  • Tags: Benchmark, Video Understanding, Embodied AI
  • Venue: ACL 2026
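  • Summary: This paper introduces GameplayQA, a benchmark for evaluating the perception and reasoning abilities of multimodal LLMs in 3D virtual environments, consisting of densely annotated gameplay videos and 2.4K diagnostic QA pairs. The evaluation reveals a substantial gap between frontier MLLMs and human performance in temporal grounding and agent attribution.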

[698] Do Neurons Dream of Primitive Operators? Wake-Sleep Compression Rediscovers Schank's Event Semantics

  • arXiv: 2603.25975 (replaced)
  • Authors: Peter Balogh
  • Subjects: cs.LG; cs.AI; cs.CL
  • Tags: Knowledge Representation, Representation Learning, Neurosymbolic AI
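  • Summary: This paper adapts DreamCoder's wake-sleep library learning to event state transitions, automatically discovering primitive operators under compression pressure; the discovered operators map onto the core primitives of Schank's Conceptual Dependency theory. The discovered library reaches 100% coverage on synthetic data, compared with 81% for a hand-coded library.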

[699] Learning to Focus and Precise Cropping: A Reinforcement Learning Framework with Information Gaps and Grounding Loss for MLLMs

  • arXiv: 2603.27494 (replaced)
  • Authors: Xuanpu Zhao, Zhentao Tan, Dianmo Sheng, Tianxiang Chen, Yao Liu, Yue Wu, Tao Gong, Qi Chu, Nenghai Yu
  • Subjects: cs.CV; cs.AI
  • Tags: Vision-Language Model, Reinforcement Learning, Question Answering
  • Venue: CVPR 2026
  • Code: code
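  • Summary: This paper proposes a two-stage reinforcement learning framework that strengthens the perception and reasoning of multimodal large language models in complex visual scenes. By introducing an "information gap" mechanism and a grounding loss, the model learns to focus more precisely on cropped regions, achieving state-of-the-art performance on high-resolution visual question answering.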

[700] C2F-Thinker: Coarse-to-Fine Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis

  • arXiv: 2604.00013 (replaced)
  • Authors: Miaosen Luo, Zhenhao Yang, Jieshen Long, Jinghu Sun, Yichu Liu, Sijie Mai
  • Subjects: cs.CL; cs.AI
  • Tags: Affective Computing, Multimodal Learning, LLM Reasoning
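  • Summary: This paper proposes C2F-Thinker, a framework that combines coarse-to-fine structured reasoning with hint-guided reinforcement learning for multimodal sentiment analysis. Its two-stage training pipeline improves interpretability and cross-domain generalization, and it achieves competitive performance on fine-grained sentiment regression.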

[701] DarwinNet: An Evolutionary Network Architecture for Agent-Driven Protocol Synthesis

  • arXiv: 2604.01236 (replaced)
  • Authors: Jinliang Xu, Bingqi Li
  • Subjects: cs.NE; cs.AI; cs.DC; cs.MA; cs.NI
  • Tags: LLM Agent, Network Protocol, Automated Planning
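  • Summary: This paper proposes DarwinNet, a bio-inspired self-evolving network architecture that synthesizes communication protocols at runtime through LLM-driven mechanisms. The system achieves antifragility by treating environmental anomalies as catalysts for autonomous evolution, while a zero-trust sandbox provides endogenous security.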

[702] Woosh: A Sound Effects Foundation Model

  • arXiv: 2604.01929 (replaced)
  • Authors: Gaëtan Hadjeres, Marc Ferras, Khaled Koutini, Benno Weck, Alexandre Bittar, Thomas Hummel, Zineb Lahrici, Hakim Missoum, Joan Serrà, Yuki Mitsufuji
  • Subjects: cs.SD; cs.AI; cs.LG
  • Tags: Audio Generation, Multimodal Learning
  • Code: code
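  • Summary: This paper introduces Woosh, a sound effects foundation model publicly released by Sony AI, comprising a high-quality audio codec, a text-audio alignment model, and text-to-audio and video-to-audio generation models. Evaluations show that each component is competitive with or better than existing open-source alternatives.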

[703] Beyond Message Passing: A Semantic View of Agent Communication Protocols

  • arXiv: 2604.02369 (replaced)
  • Authors: Dun Yuan, Fuyuan Lyu, Ye Yuan, Weixu Zhang, Bowei He, Jiayi Geng, Linfeng Du, Zipeng Sun, Yankai Chen, Changjiang Han, Jikun Kang, Xi Chen, Haolun Wu, Xue Liu
  • Subjects: cs.NI; cs.AI
  • Tags: LLM Agent, LLM Interoperability, Multi-Agent System
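  • Summary: This paper proposes a three-layer framework for analyzing agent communication protocols and finds that current protocol designs are relatively mature in transport and syntax support but offer limited mechanisms at the semantic layer. The authors give guidance on protocol selection and outline a research agenda for semantically robust agent ecosystems.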

[704] When simulations look right but causal effects go wrong: Large language models as behavioral simulators

  • arXiv: 2604.02458 (replaced)
  • Authors: Zonghan Li, Feng Ji
  • Subjects: cs.CY; cs.AI; cs.ET
  • Tags: Social Simulation, Causal Inference, LLM Evaluation
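  • Summary: This paper evaluates LLMs as behavioral simulators for climate-psychology interventions and finds that although LLMs reproduce attitude patterns reasonably well, descriptive fit does not reliably translate into causal fidelity. This descriptive-causal divergence varies across intervention types and is more pronounced for behavioral outcomes.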

[705] VERTIGO: Visual Preference Optimization for Cinematic Camera Trajectory Generation

  • arXiv: 2604.02467 (replaced)
  • Authors: Mengtian Li, Yuwei Lu, Feifei Li, Chenqi Gan, Zhifeng Xie, Xi Wang
  • Subjects: cs.CV; cs.AI
  • Tags: Video Generation, Computer Vision, Reinforcement Learning
  • Venue: ECCV 2026
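  • Summary: This paper proposes VERTIGO, a visual preference optimization framework for cinematic camera trajectory generation. It renders 2D visual previews in Unity and uses vision-language model scores as the preference signal for Direct Preference Optimization (DPO), substantially improving composition quality and the character off-screen rate.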
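
For readers unfamiliar with DPO, the sketch below computes the standard DPO loss for one preference pair. It assumes the sequence log-probabilities of the preferred and rejected trajectories under the trained policy and a frozen reference model are already available; how VERTIGO turns vision-language model scores into preference pairs is not shown, and `beta=0.1` is only an illustrative default.

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single (chosen, rejected) pair.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    where each term is a sequence log-probability.
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference model) favors the chosen sample over the rejected one.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)); both branches avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))
```

When the policy equals the reference model the margin is zero and the loss is log 2; pushing probability toward the preferred trajectory lowers it.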

[706] LitPivot: Developing Well-Situated Research Ideas Through Dynamic Contextualization and Critique within the Literature Landscape

  • arXiv: 2604.02600 (replaced)
  • Authors: Hita Kambhamettu, Bhavana Dalvi Mishra, Andrew Head, Jonathan Bragg, Aakanksha Naik, Joseph Chee Chang, Pao Siangliulue
  • Subjects: cs.HC; cs.AI
  • Tags: Knowledge Management, Human-Computer Interaction
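  • Summary: This paper introduces LitPivot, a tool that helps researchers develop novel research ideas through literature-informed dynamic retrieval and critique. The tool lets researchers draft and refine an idea at the same time and dynamically updates the relevant literature as the idea evolves.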

[707] Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control

  • arXiv: 2604.03147 (replaced)
  • Authors: Lihao Sun, Lewen Yan, Xiaoya Lu, Andrew Lee, Jie Zhang, Jing Shao
  • Subjects: cs.CL; cs.AI; cs.CY
  • Tags: Affective Computing, LLM Alignment, Interpretability
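  • Summary: This paper identifies a valence-arousal (VA) subspace in LLM representations whose circular geometry matches models of human emotion perception. Steering along these axes gives bidirectional control over the emotional dimensions of model outputs as well as over refusal and sycophancy behaviors.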
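
The circular geometry and steering described in this entry can be illustrated with a minimal sketch: project a hidden state onto two directions standing in for the valence and arousal axes, read off its angle on that plane, and shift it along one axis by additive steering. The directions, the helper names, and the use of plain additive steering are assumptions for illustration; the paper's procedure for finding the subspace and scaling interventions is not given here.

```python
import math
from typing import List, Sequence, Tuple

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def va_coordinates(h: Sequence[float], v_dir: Sequence[float],
                   a_dir: Sequence[float]) -> Tuple[float, float]:
    """Project a hidden state onto assumed orthonormal valence and arousal directions."""
    return dot(h, v_dir), dot(h, a_dir)

def va_angle_degrees(h: Sequence[float], v_dir: Sequence[float],
                     a_dir: Sequence[float]) -> float:
    """Angle of h on the valence-arousal plane, in degrees within [0, 360)."""
    v, a = va_coordinates(h, v_dir, a_dir)
    return math.degrees(math.atan2(a, v)) % 360.0

def steer(h: Sequence[float], direction: Sequence[float], alpha: float) -> List[float]:
    """Additive steering h' = h + alpha * direction; a negative alpha pushes the other way."""
    return [x + alpha * d for x, d in zip(h, direction)]
```

Because the angle is read with `atan2`, moving a state along the valence axis rotates it on the circle rather than only changing its magnitude, which is one simple way to picture bidirectional control.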

[708] Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior

  • arXiv: 2604.03401 (replaced)
  • Authors: Nolan Platt, Sehrish Nizamani, Alp Tural, Elif Tural, Saad Nizamani, Andrew Katz, Yoonje Lee, Nada Basit
  • Subjects: cs.HC; cs.AI; cs.CV
  • Tags: Education Technology, Video Understanding, LLM Reasoning
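  • Summary: This paper proposes a privacy-preserving classroom video analysis pipeline that estimates student attention through skeleton extraction and visual attention estimation, and uses LLMs for zero-shot behavior analysis. The system deletes raw video frames immediately after pose extraction and retains only geometric coordinates to ensure privacy compliance.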

[709] Zero-Shot Quantization via Weight-Space Arithmetic

  • arXiv: 2604.03420 (replaced)
  • Authors: Daniele Solombrino, Antonio Andrea Gargiulo, Adrian Robert Minut, Luca Zhou, Alessandro Zirilli, Emanuele Rodolà
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Model Compression, Transfer Learning, Vision Transformer
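  • Summary: This paper shows that quantization robustness can be transferred between models through a "quantization vector" extracted with weight-space arithmetic. The method requires no quantization-aware training data on the receiving model, offering a zero-shot, low-cost alternative for extremely low-bit deployment.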
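
Weight-space arithmetic of this kind is usually expressed as a parameter difference that is added to another model. The sketch below assumes the quantization vector is the element-wise difference between a donor model after and before it was made quantization-robust, and that the receiver shares the donor's parameter names and shapes (for example, both fine-tuned from the same backbone). The helper names, the dict-of-lists parameter format, and the `scale` factor are hypothetical; the paper's exact extraction recipe may differ.

```python
from typing import Dict, List

Params = Dict[str, List[float]]

def quantization_vector(robust: Params, base: Params) -> Params:
    """Hypothetical 'quantization vector': element-wise difference between a donor
    model made robust to quantization and the same donor before that step."""
    if robust.keys() != base.keys():
        raise ValueError("parameter names must match")
    vec: Params = {}
    for name, base_w in base.items():
        if len(robust[name]) != len(base_w):
            raise ValueError(f"shape mismatch for {name}")
        vec[name] = [r - b for r, b in zip(robust[name], base_w)]
    return vec

def apply_vector(receiver: Params, vector: Params, scale: float = 1.0) -> Params:
    """Add a scaled weight-space vector to a receiver with the same parameter layout."""
    if receiver.keys() != vector.keys():
        raise ValueError("parameter names must match")
    out: Params = {}
    for name, weights in receiver.items():
        if len(weights) != len(vector[name]):
            raise ValueError(f"shape mismatch for {name}")
        out[name] = [w + scale * v for w, v in zip(weights, vector[name])]
    return out
```

The receiver never sees training data in this sketch; it only receives the difference vector, which is what makes the transfer zero-shot.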

[710] The Augmentation Trap: AI Productivity and the Cost of Cognitive Offloading

  • arXiv: 2604.03501 (replaced)
  • Authors: Michael Caosun, Sinan Aral
  • Subjects: cs.HC; cs.AI
  • Tags: AI Ethics, Human-Computer Interaction, AI Safety
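  • Summary: This paper develops a dynamic model showing how AI tools can raise short-term productivity while eroding workers' skills, producing an "augmentation trap". The study classifies deployments into five regimes and identifies the conditions under which adoption is beneficial versus harmful.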

[711] Inside the Scaffold: A Source-Code Taxonomy of Coding Agent Architectures

  • arXiv: 2604.03515 (replaced)
  • Authors: Benjamin Rombaut
  • Subjects: cs.SE; cs.AI; cs.ET
  • Tags: LLM Agent, Code Generation, Software Engineering
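  • Summary: This paper presents a source-code-level architectural taxonomy of 13 open-source coding agent scaffolds, proposing 12 dimensions and five composable loop primitives. It finds that scaffold architectures resist discrete classification, with control strategies ranging from fixed pipelines to Monte Carlo tree search.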

[712] How Alignment Routes: Localizing, Scaling, and Controlling Policy Circuits in Language Models

  • arXiv: 2604.04385 (replaced)
  • Authors: Gregory N. Frank
  • Subjects: cs.CL; cs.AI; cs.LG
  • Tags: LLM Alignment, Interpretability, LLM Security
  • Code: code
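  • Summary: This paper localizes the policy-routing mechanism in alignment-trained language models, identifying how attention gating in intermediate layers triggers refusals. It shows that this mechanism can be modulated to shift the policy from refusal to factual answering, and that encoding-based bypasses can circumvent safety measures.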

[713] StableTTA: Training-Free Test-Time Adaptation that Improves Model Accuracy on ImageNet1K to 96%

  • arXiv: 2604.04552 (replaced)
  • Authors: Zheng Li, Jerry Cheng, Huanying Helen Gu
  • Subjects: cs.CV; cs.AI
  • Tags: Computer Vision, Transfer Learning, Vision Transformer
  • Code: code
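  • Summary: This paper proposes StableTTA, a training-free test-time adaptation method that substantially improves ImageNet accuracy through novel image and logit processing. It enables lightweight architectures to outperform ViT while using fewer parameters and less compute.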

[714] EduIllustrate: Towards Scalable Automated Generation Of Multimodal Educational Content

  • arXiv: 2604.05005 (replaced)
  • Authors: Shuzhen Bi, Mingzi Zhang, Zhuoxuan Li, Xiaolong Wang, Keqian Li, Aimin Zhou
  • Subjects: cs.CY; cs.AI; cs.CL
  • Tags: Education Technology, Multimodal Learning, Benchmark
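  • Summary: This paper proposes EduIllustrate, a benchmark for evaluating how well LLMs generate interleaved text-and-image explanations for K-12 STEM problems. It contains 230 problems and an 8-dimension evaluation rubric, and the evaluation shows large performance differences among LLMs in multimodal educational content generation.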

[715] VideoStir: Understanding Long Videos via Spatio-Temporally Structured and Intent-Aware RAG

  • arXiv: 2604.05418 (replaced)
  • Authors: Honghao Fu, Miao Xu, Yiwei Wang, Dailing Zhang, Liu Jun, Yujun Cai
  • Subjects: cs.CV; cs.AI
  • Tags: RAG, Video Understanding, Multimodal Learning
  • Venue: ACL 2026
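  • Summary: This paper proposes VideoStir, a framework for long-video understanding based on spatio-temporal graph structures and intent-aware retrieval. It introduces the IR-600K dataset for learning frame-query intent alignment and achieves state-of-the-art performance without auxiliary information.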

[716] The Defense Trilemma: Why Prompt Injection Defense Wrappers Fail?

  • arXiv: 2604.06436 (replaced)
  • Authors: Manish Bhatt, Sarthak Munshi, Vineeth Sai Narajala, Idan Habler, Ammar Al-Kahfah, Ken Huang, Joel Webb, Blake Gatto, Md Tamjidul Hoque
  • Subjects: cs.CR; cs.AI
  • Tags: LLM Security, LLM Alignment, AI Safety
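  • Summary: This paper proves that no continuous, utility-preserving wrapper defense can make every output of an LLM strictly safe, establishing a "defense trilemma": continuity, utility preservation, and completeness cannot hold simultaneously. The theory is mechanically verified in Lean 4 and validated empirically on three LLMs.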

[717] MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization

  • arXiv: 2604.06798 (replaced)
  • Authors: Zhixiong Zhao, Zukang Xu, Zhixuan Chen, Dawei Yang
  • Subjects: cs.LG; cs.AI
  • Tags: Mixture-of-Experts, Model Compression, LLM Inference
  • Venue: ACL 2026 Findings
  • Code: code
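  • Summary: This paper proposes MoBiE, the first binarization framework for mixture-of-experts LLMs, which addresses cross-expert redundancy and routing shift through joint SVD decomposition, global loss gradients, and error constraints. The method substantially lowers perplexity while delivering more than 2x inference speedup.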
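
As background on what binarizing a weight matrix means, the sketch below shows the classic per-row scheme W ≈ α·sign(W), where α = mean(|W|) is the scale that minimizes squared reconstruction error for fixed signs. This is the standard baseline idea only; MoBiE's joint SVD decomposition, global loss gradients, and error constraints are not reproduced, and the helper names are hypothetical.

```python
from typing import List, Tuple

def binarize_row(row: List[float]) -> Tuple[float, List[int]]:
    """Approximate a weight row as alpha * sign(row).

    For W ~= alpha * B with B in {-1, +1}, choosing B = sign(W) and
    alpha = mean(|W|) minimizes the squared reconstruction error.
    """
    if not row:
        raise ValueError("row must be non-empty")
    signs = [1 if w >= 0 else -1 for w in row]  # zeros are mapped to +1
    alpha = sum(abs(w) for w in row) / len(row)
    return alpha, signs

def dequantize(alpha: float, signs: List[int]) -> List[float]:
    """Reconstruct the approximate full-precision row from its 1-bit form."""
    return [alpha * s for s in signs]
```

Storing one float per row plus one bit per weight is what makes binarization attractive for large MoE models, at the cost of the reconstruction error that methods like MoBiE try to reduce.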

[718] Exact Structural Abstraction and Tractability Limits

  • arXiv: 2604.07349 (replaced)
  • Authors: Tristan Simas
  • Subjects: cs.CC; cs.AI; cs.LO
  • Tags: Formal Methods, Deep Learning Theory
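  • Summary: This paper proves a meta-impossibility theorem for exactly certified, efficiently checkable structural predicates, showing that on closure-closed domains no correct tractability classifier can produce an exact characterization. The results are formalized in Lean 4 and expose a fundamental barrier at the level of correctness.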

[719] Latent Structure of Affective Representations in Large Language Models

  • arXiv: 2604.07382 (replaced)
  • Authors: Benjamin J. Choi, Melanie Weber
  • Subjects: cs.LG; cs.AI
  • Tags: Affective Computing, Interpretability, Representation Learning
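  • Summary: This paper studies the latent structure of affective representations in LLMs, finding that it agrees with the valence-arousal model from psychology and exhibits a nonlinear geometry that can be approximated linearly. This representation space can be used to quantify uncertainty in affective processing tasks.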

[720] FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

  • arXiv: 2604.07413 (replaced)
  • Authors: Xiangru Jian, Hao Xu, Wei Pang, Xinjian Zhao, Chengyu Tao, Qixin Zhang, Xikun Zhang, Chao Zhang, Guanzhi Deng, Alex Xue, Juan Du, Tianshu Yu, Garth Tarr, Linqi Song, Qiuzhuang Sun, Dacheng Tao
  • Subjects: cs.CV; cs.AI; cs.LG
  • Tags: Manufacturing AI, Vision-Language Model, Benchmark
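  • Summary: This paper introduces FORGE, a benchmark for evaluating multimodal large language models in manufacturing scenarios, covering three tasks: workpiece verification, structural surface inspection, and assembly verification. The analysis shows that domain knowledge, rather than visual grounding, is the main bottleneck.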

[721] Private Seeds, Public LLMs: Realistic and Privacy-Preserving Synthetic Data Generation

  • arXiv: 2604.07486 (replaced)
  • Authors: Qian Ma, Sarah Rajtmajer
  • Subjects: cs.CR; cs.AI
  • Tags: Privacy, Data Synthesis
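  • Summary: This paper proposes RPSG, a method that uses private seeds and differential privacy mechanisms to generate synthetic data that is both realistic and privacy-preserving. Experiments show that it maintains high fidelity while providing strong privacy protection.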

[722] Detecting HIV-Related Stigma in Clinical Narratives Using Large Language Models

  • arXiv: 2604.07717 (replaced)
  • Authors: Ziyi Chen, Yasir Khan, Mengyuan Zhang, Cheng Peng, Mengxian Lyu, Yiyang Liu, Krishna Vaddiparti, Robert L Cook, Mattia Prosperi, Yonghui Wu
  • Subjects: cs.CL; cs.AI
  • Tags: Medical AI, Text Classification
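  • Summary: This study develops a large language model-based tool for identifying HIV-related stigma in clinical notes. Experiments compare encoder models with generative models and find that GatorTron performs best and that few-shot prompting substantially improves the performance of the generative models.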

[723] Data Selection for Multi-turn Dialogue Instruction Tuning

  • arXiv: 2604.07892 (replaced)
  • Authors: Bo Li, Shikun Zhang, Wei Ye
  • Subjects: cs.CL; cs.AI
  • Tags: Data Selection, Instruction Tuning, Dialogue System
  • Code: code
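  • Summary: This paper proposes MDS, a multi-turn dialogue selection framework that selects data by evaluating entire dialogues rather than isolated turns. The method performs strongly on multi-turn dialogue benchmarks and is more robust on long dialogues.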

[724] Multi-Modal Learning meets Genetic Programming: Analyzing Alignment in Latent Space Optimization

  • arXiv: 2604.08324 (replaced)
  • Authors: Benjamin Léger, Kazem Meidani, Christian Gagné
  • Subjects: cs.NE; cs.AI
  • Tags: Symbolic Regression, Multimodal Learning, Optimization
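  • Summary: This paper studies how SNIP, a multimodal latent space optimization method, performs in symbolic regression. It finds that SNIP's cross-modal alignment is too coarse to guide symbolic search effectively and identifies fine-grained alignment as a key direction for future work.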

[725] Re-Mask and Redirect: Exploiting Denoising Irreversibility in Diffusion Language Models

  • arXiv: 2604.08557 (replaced)
  • Authors: Arth Singh
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Security, Adversarial Robustness, Diffusion Model
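  • Summary: This paper proposes TrajHijack, a trajectory-level attack on diffusion language models that bypasses safety alignment by re-masking tokens and injecting a prefix. Experiments show very high attack success rates and find that existing defense mechanisms may actually increase model vulnerability.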

[726] Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring

  • arXiv: 2604.08718 (replaced)
  • Authors: Xinmiao Xiong, Bangya Liu, Hao Wang, Dayou Li, Nuo Chen, Andrew Feng, Mingyu Ding, Suman Banerjee, Yang Zhou, Zhiwen Fan
  • Subjects: cs.CV; cs.AI; cs.RO
  • Tags: 3D Vision, Optimization
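  • Summary: This paper proposes LeanGate, a lightweight feed-forward frame-gating network that accelerates Transformer-based monocular SLAM by predicting geometric utility scores. The method substantially reduces redundant computation while preserving tracking and mapping accuracy.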

[727] Aligned Agents, Biased Swarm: Measuring Bias Amplification in Multi-Agent Systems

  • arXiv: 2604.08963 (replaced)
  • Authors: Keyu Li, Jin Gao, Dequan Wang
  • Subjects: cs.MA; cs.AI
  • Tags: Multi-Agent System, Bias Mitigation, AI Ethics
  • Venue: ICLR 2026
  • Code: code
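  • Summary: This paper studies bias amplification in multi-agent systems and finds that structured workflows tend to amplify small random biases. It introduces a new benchmark and shows that system complexity does not guarantee ethical robustness.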

[728] PS-TTS: Phonetic Synchronization in Text-to-Speech for Achieving Natural Automated Dubbing

  • arXiv: 2604.09111 (replaced)
  • Authors: Changi Hong, Yoonah Song, Hwayoung Park, Chaewoon Bang, Dayeon Gu, Do Hyun Lee, Hong Kook Kim
  • Subjects: eess.AS; cs.AI
  • Tags: Speech Synthesis, Multimodal Learning
  • Venue: ICPR 2026
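  • Summary: This paper proposes PS-TTS, a text-to-speech method for automated dubbing that improves lip synchronization and duration matching through phonetic synchronization and isochrony adjustment. Experiments show that it outperforms conventional TTS on objective metrics and even surpasses human dubbing.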

[729] Interactive ASR: Towards Human-Like Interaction and Semantic Coherence Evaluation for Agentic Speech Recognition

  • arXiv: 2604.09121 (replaced)
  • Authors: Peng Wang, Yanqiao Zhu, Zixuan Jiang, Qinyuan Chen, Xingjian Zhao, Xipeng Qiu, Wupeng Wang, Zhifu Gao, Xiangang Li, Kai Yu, Xie Chen
  • Subjects: cs.CL; cs.AI; cs.SD
  • Tags: Speech Processing, LLM Agent, LLM Evaluation
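  • Summary: This paper proposes an interactive speech recognition framework that uses an LLM as a judge to assess semantic correctness and LLM-driven agents that simulate human interaction for iterative correction. Experiments demonstrate its effectiveness in improving semantic fidelity.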

[730] Physics-guided surrogate learning enables zero-shot control of turbulent wings

  • arXiv: 2604.09434 (replaced)
  • Authors: Yuning Wang, Pol Suarez, Mathis Bode, Ricardo Vinuesa
  • Subjects: cs.AI
  • Tags: Reinforcement Learning, Physics-Informed Learning, Zero-Shot Learning
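  • Summary: This paper proposes a physics-guided surrogate learning method that trains control policies in a turbulent channel flow and achieves zero-shot control of turbulence over wings. The method substantially reduces skin-friction drag and total drag at a much lower training cost.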

[731] Many-Tier Instruction Hierarchy in LLM Agents

  • arXiv: 2604.09443 (replaced)
  • Authors: Jingyu Zhang, Tianjian Li, William Jurayj, Hongyuan Zhan, Benjamin Van Durme, Daniel Khashabi
  • Subjects: cs.CL; cs.AI
  • Tags: LLM Agent, LLM Alignment, Benchmark
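  • Summary: This paper proposes a many-tier instruction hierarchy paradigm to address conflicts among instructions from multiple sources in LLM agents, and introduces a corresponding benchmark, ManyIH-Bench. Experiments show that current frontier models perform poorly when handling large-scale instruction conflicts.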
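
To make the notion of a many-tier hierarchy concrete, here is a toy Python sketch of the resolution rule such a hierarchy implies: when instructions conflict on the same setting, the one from the most privileged tier wins. The `Instruction` record, the numeric tier convention (lower number means higher privilege), the conflict `key`, and the later-wins tie-break are all hypothetical; this says nothing about how models are trained or how ManyIH-Bench encodes its cases.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Instruction:
    source: str  # illustrative label, e.g. "system", "user", "tool"
    tier: int    # hypothetical convention: lower number = higher privilege
    key: str     # the setting or behavior the instruction constrains
    value: str   # what the instruction asks for

def resolve(instructions: List[Instruction]) -> Dict[str, Instruction]:
    """For each key, keep the instruction from the most privileged tier.

    Among instructions of equal tier, the later one in the list wins.
    """
    winners: Dict[str, Instruction] = {}
    for inst in instructions:
        current = winners.get(inst.key)
        if current is None or inst.tier <= current.tier:
            winners[inst.key] = inst
    return winners
```

With many tiers, the same rule applies, but the number of pairwise conflicts a model must track grows quickly, which is the setting the benchmark is designed to stress.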
This post is licensed under CC BY 4.0 by the author.
