
2026-03-09 AI News Daily Briefing

Date: 2026-03-09 02:00:04 Source: arXiv CS.AI + cs.LG Coverage: past 48 hours Language: Chinese/English mix 🌐


📰 Today's Headlines

15 latest AI research papers collected


RoboPocket: Improve Robot Policies Instantly with Your Phone

Source: arXiv CS.RO Time: 2026-03-05 18:59 Link: 2603.05504v1

Abstract: Scaling imitation learning is fundamentally constrained by the efficiency of data collection. While handheld interfaces have emerged as a scalable solution for in-the-wild data acquisition, they predominantly operate in an open-loop manner: operators blindly collect demonstrations without knowing the underlying policy's weaknesses, leading to inefficient coverage of critical state distributions. Conversely, interactive methods like DAgger effectively address covariate shift but rely on physical ...

Authors: Junjie Fang, Wendi Chen, Han Xue, Fangyuan Zhou, Tian Le Categories: cs.RO, cs.AI, cs.LG
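
The covariate-shift problem that DAgger addresses can be seen in a few lines: the learner's own rollouts visit states the expert never demonstrated, so DAgger queries the expert on those visited states, aggregates, and refits. A minimal sketch in a toy 1-D control setting (the dynamics, expert, and all names are illustrative, not from the paper):

```python
import numpy as np

def expert(s):
    # Toy expert: drive the state toward zero with gain -0.5.
    return -0.5 * s

def rollout(policy_w, s0, steps=20):
    """Run the linear policy a = w * s and record visited states."""
    s, states = s0, []
    for _ in range(steps):
        states.append(s)
        s = s + policy_w * s  # toy dynamics: s' = s + a
    return states

def fit(states, actions):
    # Least-squares fit of a linear policy a = w * s.
    X, y = np.array(states), np.array(actions)
    return float(X @ y / (X @ X + 1e-8))

# DAgger loop: roll out the current policy, query the expert on the
# states the *policy* visits, aggregate, and refit.
D_states, D_actions = [], []
w = 0.0  # initial (untrained) policy
for it in range(5):
    visited = rollout(w, s0=1.0)
    D_states += visited
    D_actions += [expert(s) for s in visited]
    w = fit(D_states, D_actions)

print(round(w, 2))  # converges toward the expert gain -0.5
```

The key difference from behavior cloning is that the training states come from the learner's own trajectories, so the fitted policy is supervised exactly where it will operate.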


POET-X: Memory-efficient LLM Training by Scaling Orthogonal Transformation

Source: arXiv CS.LG Time: 2026-03-05 18:59 Link: 2603.05500v1

Abstract: Efficient and stable training of large language models (LLMs) remains a core challenge in modern machine learning systems. To address this challenge, Reparameterized Orthogonal Equivalence Training (POET), a spectrum-preserving framework that optimizes each weight matrix through orthogonal equivalence transformation, has been proposed. Although POET provides strong training stability, its original implementation incurs high memory consumption and computational overhead due to intensive matrix mu...

Authors: Zeju Qiu, Lixin Liu, Adrian Weller, Han Shi, Weiyang Liu Categories: cs.LG, cs.AI, cs.CL
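
The "spectrum-preserving" property mentioned in the abstract follows from updating a weight matrix only as W ↦ R W Q with orthogonal R and Q, which leaves the singular values unchanged. A quick numerical check of that invariance (random matrices, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))

# Random orthogonal factors obtained via QR decomposition.
R, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))

W_new = R @ W @ Q  # orthogonal equivalence transformation

# The singular-value spectrum is preserved exactly.
sv_before = np.linalg.svd(W, compute_uv=False)
sv_after = np.linalg.svd(W_new, compute_uv=False)
print(np.allclose(sv_before, sv_after))  # True
```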


The Spike, the Sparse and the Sink: Anatomy of Massive Activations and Attention Sinks

Source: arXiv CS.AI Time: 2026-03-05 18:59 Link: 2603.05498v1

Abstract: We study two recurring phenomena in Transformer language models: massive activations, in which a small number of tokens exhibit extreme outliers in a few channels, and attention sinks, in which certain tokens attract disproportionate attention mass regardless of semantic relevance. Prior work observes that these phenomena frequently co-occur and often involve the same tokens, but their functional roles and causal relationship remain unclear. Through systematic experiments, we show that the co-oc...

Authors: Shangwen Sun, Alfredo Canziani, Yann LeCun, Jiachen Zhu Categories: cs.AI, cs.CL
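
An "attention sink" is a token, often the first one, that absorbs a disproportionate share of softmax attention mass from every query. A toy illustration of how one outsized logit column produces a sink (synthetic numbers, not the paper's measurements):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(1)
T = 8
logits = rng.standard_normal((T, T))
logits[:, 0] += 6.0          # token 0 gets an outsized logit from every query
attn = softmax(logits)

# Average attention mass landing on token 0 across all queries.
sink_mass = attn[:, 0].mean()
print(sink_mass > 0.5)       # True: token 0 dominates regardless of content
```

Because softmax is exponential in the logits, even a modest bias toward one key concentrates most of the attention mass there, which is the signature the paper dissects.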


Safe-SAGE: Social-Semantic Adaptive Guidance for Safe Engagement through Laplace-Modulated Poisson Safety Functions

Source: arXiv CS.RO Time: 2026-03-05 18:59 Link: 2603.05497v1

Abstract: Traditional safety-critical control methods, such as control barrier functions, suffer from semantic blindness, exhibiting the same behavior around obstacles regardless of contextual significance. This limitation leads to the uniform treatment of all obstacles, despite their differing semantic meanings. We present Safe-SAGE (Social-Semantic Adaptive Guidance for Safe Engagement), a unified framework that bridges the gap between high-level semantic understanding and low-level safety-critical cont...

Authors: Lizhi Yang, Ryan M. Bena, Meg Wilkinson, Gilbert Bahati, Andy Navarro Brenes Categories: cs.RO
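
The "semantic blindness" of a plain control barrier function (CBF) and the fix of modulating it by obstacle class can be sketched in one dimension, where the CBF quadratic program has a closed-form clipping solution. Everything below (dynamics, gains, class names) is an illustrative toy, not Safe-SAGE's formulation:

```python
# Class-dependent CBF gain: a smaller alpha makes the filter intervene
# earlier (more conservative), so people get wider berth than boxes.
ALPHA = {"person": 0.5, "box": 3.0}

def safe_input(u_nom, x, x_obs, cls):
    """1-D single integrator x' = u with barrier h(x) = x - x_obs >= 0.
    The CBF condition h' + alpha * h >= 0 reduces to u >= -alpha * h,
    solved in closed form by clipping the nominal input."""
    h = x - x_obs
    alpha = ALPHA[cls]
    return max(u_nom, -alpha * h)

# Same nominal command toward the obstacle, different semantic classes:
u_nom, x, x_obs = -2.0, 2.0, 0.0
print(safe_input(u_nom, x, x_obs, "person"))  # -1.0: clipped, cautious
print(safe_input(u_nom, x, x_obs, "box"))     # -2.0: passed through
```

A semantically blind CBF uses one alpha for everything; letting a perception module choose alpha per obstacle class is the simplest version of the semantic adaptation the abstract describes.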


Cheap Thrills: Effective Amortized Optimization Using Inexpensive Labels

Source: arXiv CS.LG Time: 2026-03-05 18:58 Link: 2603.05495v1

Abstract: To scale the solution of optimization and simulation problems, prior work has explored machine-learning surrogates that inexpensively map problem parameters to corresponding solutions. Commonly used approaches, including supervised and self-supervised learning with either soft or hard feasibility enforcement, face inherent challenges such as reliance on expensive, high-quality labels or difficult optimization landscapes. To address their trade-offs, we propose a novel framework that first collec...

Authors: Khai Nguyen, Petros Ellinas, Anvita Bhagavathula, Priya Donti Categories: cs.LG, math.OC


Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation

Source: arXiv CS.LG Time: 2026-03-05 18:58 Link: 2603.05494v1

Abstract: Large language models sometimes produce false or misleading responses. Two approaches to this problem are honesty elicitation -- modifying prompts or weights so that the model answers truthfully -- and lie detection -- classifying whether a given response is false. Prior work evaluates such methods on models specifically trained to lie or conceal information, but these artificial constructions may not resemble naturally-occurring dishonesty. We instead study open-weights LLMs from Chinese develo...

Authors: Helena Casademunt, Bartosz Cywiński, Khoi Tran, Arya Jakkli, Samuel Marks Categories: cs.LG, cs.AI, cs.CL


cuRoboV2: Dynamics-Aware Motion Generation with Depth-Fused Distance Fields for High-DoF Robots

Source: arXiv CS.RO Time: 2026-03-05 18:58 Link: 2603.05493v1

Abstract: Effective robot autonomy requires motion generation that is safe, feasible, and reactive. Current methods are fragmented: fast planners output physically unexecutable trajectories, reactive controllers struggle with high-fidelity perception, and existing solvers fail on high-DoF systems. We present cuRoboV2, a unified framework with three key innovations: (1) B-spline trajectory optimization that enforces smoothness and torque limits; (2) a GPU-native TSDF/ESDF perception pipeline that generates...

Authors: Balakumar Sundaralingam, Adithyavairavan Murali, Stan Birchfield Categories: cs.RO
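
A Euclidean signed distance field (ESDF), the structure named in innovation (2), maps each grid cell to its distance from the nearest obstacle, so a trajectory optimizer can query clearance cheaply. A brute-force 2-D sketch of the unsigned variant (a GPU pipeline like cuRoboV2's would compute this incrementally; this is only a reference implementation):

```python
import numpy as np

def esdf(occupancy, resolution=1.0):
    """Unsigned distance to the nearest occupied cell, brute force."""
    obs = np.argwhere(occupancy)                  # (k, 2) obstacle cells
    coords = np.indices(occupancy.shape)          # (2, H, W) cell indices
    pts = coords.reshape(2, -1).T                 # (H*W, 2) query points
    # Pairwise distances from every cell to every obstacle cell.
    d = np.linalg.norm(pts[:, None, :] - obs[None, :, :], axis=-1)
    return d.min(axis=1).reshape(occupancy.shape) * resolution

grid = np.zeros((5, 5), dtype=bool)
grid[2, 2] = True                                 # single obstacle at center
field = esdf(grid)
print(field[2, 2], field[0, 0])  # 0.0 at the obstacle, ~2.83 at the corner
```

Real pipelines replace the O(cells x obstacles) distance computation with wavefront or distance-transform updates, but the queried quantity is the same.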


Reasoning Theater: Disentangling Model Beliefs from Chain-of-Thought

Source: arXiv CS.CL Time: 2026-03-05 18:55 Link: 2603.05488v1

Abstract: We provide evidence of performative chain-of-thought (CoT) in reasoning models, where a model becomes strongly confident in its final answer, but continues generating tokens without revealing its internal belief. Our analysis compares activation probing, early forced answering, and a CoT monitor across two large models (DeepSeek-R1 671B & GPT-OSS 120B) and finds task difficulty-specific differences: The model's final answer is decodable from activations far earlier in CoT than a monitor is able t...

Authors: Siddharth Boppana, Annabel Ma, Max Loeffler, Raphael Sarfati, Eric Bigelow Categories: cs.CL, cs.AI, cs.LG
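
"Activation probing" here means fitting a simple classifier on hidden states to test whether the final answer is already linearly decodable mid-CoT. A minimal sketch with synthetic "activations" in which a binary answer is encoded along one direction (logistic regression from scratch; not the authors' probe or data):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 16
direction = rng.standard_normal(d)

# Synthetic hidden states: the binary "answer" shifts each vector
# along a fixed direction, on top of unit Gaussian noise.
y = rng.integers(0, 2, n)
X = rng.standard_normal((n, d)) + np.outer(2 * y - 1, direction)

# Train a logistic-regression probe with plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * X.T @ (p - y) / n

acc = ((X @ w > 0) == (y == 1)).mean()
print(acc)  # well above chance when the answer is linearly encoded
```

High probe accuracy at an early CoT position is exactly the evidence the paper uses to argue the model's belief forms before the visible reasoning finishes.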


Observing and Controlling Features in Vision-Language-Action Models

Source: arXiv CS.RO Time: 2026-03-05 18:53 Link: 2603.05487v1

Abstract: Vision-Language-Action Models (VLAs) have shown remarkable progress towards embodied intelligence. While their architecture partially resembles that of Large Language Models (LLMs), VLAs exhibit higher complexity due to their multi-modal inputs/outputs and the often hybrid nature of their transformer and diffusion heads. This is part of the reason why insights from mechanistic interpretability in LLMs, which explain how the internal model representations relate to their output behavior, do not trivially t...

Authors: Hugo Buurmeijer, Carmen Amo Alonso, Aiden Swann, Marco Pavone Categories: cs.RO


Towards Provably Unbiased LLM Judges via Bias-Bounded Evaluation

Source: arXiv CS.AI Time: 2026-03-05 18:52 Link: 2603.05485v1

Abstract: As AI models progress beyond simple chatbots into more complex workflows, we draw ever closer to the event horizon beyond which AI systems will be utilized in autonomous, self-maintaining feedback loops. Any autonomous AI system will depend on automated, verifiable rewards and feedback; in settings where ground truth is sparse or non-deterministic, one practical source of such rewards is an LLM-as-a-Judge. Although LLM judges continue to improve, the literature has yet to introduce systems capab...

Authors: Benjamin Feuer, Lucas Rosenblatt, Oussama Elachqar Categories: cs.AI


SurvHTE-Bench: A Benchmark for Heterogeneous Treatment Effect Estimation in Survival Analysis

Source: arXiv CS.LG Time: 2026-03-05 18:52 Link: 2603.05483v1

Abstract: Estimating heterogeneous treatment effects (HTEs) from right-censored survival data is critical in high-stakes applications such as precision medicine and individualized policy-making. Yet, the survival analysis setting poses unique challenges for HTE estimation due to censoring, unobserved counterfactuals, and complex identification assumptions. Despite recent advances, from Causal Survival Forests to survival meta-learners and outcome imputation approaches, evaluation practices remain fragment...

Authors: Shahriar Noroozizadeh, Xiaobin Shen, Jeremy C. Weiss, George H. Chen Categories: cs.LG, cs.AI, stat.ML
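
The "meta-learners" the abstract mentions estimate a conditional average treatment effect (CATE) by composing ordinary outcome models. The simplest, the T-learner, fits one model per treatment arm and subtracts their predictions. A toy sketch on uncensored synthetic data (the benchmark's survival setting adds censoring on top of this; all data below is fabricated for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(-1, 1, n)
t = rng.integers(0, 2, n)                 # randomized binary treatment
# Toy outcome: the true treatment effect grows with x (CATE = 1 + x).
y = 2 * x + t * (1 + x) + rng.normal(0, 0.1, n)

def fit_linear(xv, yv):
    """Least-squares fit of y = b0 + b1 * x."""
    X = np.column_stack([np.ones_like(xv), xv])
    return np.linalg.lstsq(X, yv, rcond=None)[0]

# T-learner: separate outcome model per arm, then subtract predictions.
b1 = fit_linear(x[t == 1], y[t == 1])
b0 = fit_linear(x[t == 0], y[t == 0])
x_q = 0.5
cate = (b1[0] + b1[1] * x_q) - (b0[0] + b0[1] * x_q)
print(round(cate, 1))  # recovers the true effect 1 + x = 1.5 at x = 0.5
```

With right-censored outcomes the per-arm models must be survival models (and censoring handled, e.g. by inverse-probability weighting), which is precisely where evaluation practice fragments and a benchmark like SurvHTE-Bench helps.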


Thermodynamic Response Functions in Singular Bayesian Models

Source: arXiv STAT.ML Time: 2026-03-05 18:50 Link: 2603.05480v1

Abstract: Singular statistical models (including mixtures, matrix factorization, and neural networks) violate regular asymptotics due to parameter non-identifiability and degenerate Fisher geometry. Although singular learning theory characterizes marginal likelihood behavior through invariants such as the real log canonical threshold and singular fluctuation, these quantities remain difficult to interpret operationally. At the same time, widely used criteria such as WAIC and WBIC appear disconnected from un...

Authors: Sean Plummer Categories: stat.ML, cs.LG, math.ST


Leveraging LLM Parametric Knowledge for Fact Checking without Retrieval

Source: arXiv CS.CL Time: 2026-03-05 18:42 Link: 2603.05471v1

Abstract: Trustworthiness is a core research challenge for agentic AI systems built on Large Language Models (LLMs). To enhance trust, natural language claims from diverse sources, including human-written text, web content, and model outputs, are commonly checked for factuality by retrieving external knowledge and using an LLM to verify the faithfulness of claims to the retrieved evidence. As a result, such methods are constrained by retrieval errors and external data availability, while leaving the model...

Authors: Artem Vazhentsev, Maria Marina, Daniil Moskovskiy, Sergey Pletenev, Mikhail Seleznyov Categories: cs.CL, cs.AI


Kraus Constrained Sequence Learning For Quantum Trajectories from Continuous Measurement

Source: arXiv CS.LG Time: 2026-03-05 18:37 Link: 2603.05468v1

Abstract: Real-time reconstruction of conditional quantum states from continuous measurement records is a fundamental requirement for quantum feedback control, yet standard stochastic master equation (SME) solvers require exact model specification, known system parameters, and are sensitive to parameter mismatch. While neural sequence models can fit these stochastic dynamics, unconstrained predictors can violate physical constraints such as positivity or trace preservation, leading to unstable rollouts and unph...

Authors: Priyanshi Singh, Krishna Bhatia Categories: cs.LG
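
The physicality constraints in the title can be enforced by parameterizing the channel with Kraus operators {M_k} satisfying the completeness relation Σ M_k† M_k = I, which guarantees trace preservation and positivity of the evolved state. A sketch that projects arbitrary matrices onto that constraint and applies the resulting channel (illustrative construction, not the paper's architecture):

```python
import numpy as np

def normalize_kraus(ops):
    """Rescale raw matrices so that sum_k M_k^dag M_k = I."""
    S = sum(M.conj().T @ M for M in ops)
    # Inverse matrix square root of S via eigendecomposition.
    vals, vecs = np.linalg.eigh(S)
    S_inv_sqrt = vecs @ np.diag(vals ** -0.5) @ vecs.conj().T
    return [M @ S_inv_sqrt for M in ops]

def apply_channel(rho, ops):
    """Kraus form of a quantum channel: rho -> sum_k M_k rho M_k^dag."""
    return sum(M @ rho @ M.conj().T for M in ops)

rng = np.random.default_rng(3)
raw = [rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
       for _ in range(3)]
kraus = normalize_kraus(raw)

rho = np.array([[0.7, 0.2], [0.2, 0.3]], dtype=complex)  # valid density matrix
rho_out = apply_channel(rho, kraus)

print(np.isclose(np.trace(rho_out).real, 1.0))       # trace preserved
print(np.all(np.linalg.eigvalsh(rho_out) >= -1e-12))  # positivity preserved
```

Because the constraint holds by construction, any network whose outputs are pushed through such a normalization produces physical rollouts regardless of what the raw predictor emits.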


NCTB-QA: A Large-Scale Bangla Educational Question Answering Dataset and Benchmarking Performance

Source: arXiv CS.CL Time: 2026-03-05 18:35 Link: 2603.05462v1

Abstract: Reading comprehension systems for low-resource languages face significant challenges in handling unanswerable questions. These systems tend to produce unreliable responses when correct answers are absent from context. To solve this problem, we introduce NCTB-QA, a large-scale Bangla question answering dataset comprising 87,805 question-answer pairs extracted from 50 textbooks published by Bangladesh's National Curriculum and Textbook Board. Unlike existing Bangla datasets, NCTB-QA maintains a ba...

Authors: Abrar Eyasir, Tahsin Ahmed, Muhammad Ibrahim Categories: cs.CL



Updated: 2026-03-09 02:00:04 Data source: arXiv.org Auto-generated by JARVIS (贾维斯) 🤖

Built with VitePress