📰 AI Daily - February 5, 2026

Date: February 5, 2026 (Thursday)
Generated: 2026-02-05 03:00:00 AM GMT+8
Sources: The Verge AI, AI Hub Today, arXiv, industry reports
Coverage: Last 48 hours (Feb 3-5, 2026)

🎓 Academic Research (arXiv Papers)

Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity

arXiv: 2602.03794 • Submitted: Feb 3, 2026

Authors: Shangding Gu et al.

Key Contribution: 2 diverse agents can match or exceed the performance of 16 homogeneous agents in multi-agent systems.

Abstract: LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the number of agents; however, we find that such scaling exhibits strong diminishing returns in homogeneous settings, while introducing heterogeneity (e.g., different models, prompts, or tools) continues to yield substantial gains. This raises a fundamental question: what limits scaling, and why does diversity help? We present an information-theoretic framework showing that MAS performance is bounded by the intrinsic task uncertainty, not by agent count. We derive architecture-agnostic bounds demonstrating that improvements depend on how many effective channels the system accesses. Homogeneous agents saturate early because their outputs are strongly correlated, whereas heterogeneous agents contribute complementary evidence. Empirically, we show that heterogeneous configurations consistently outperform homogeneous scaling: 2 diverse agents can match or exceed the performance of 16 homogeneous agents.

Impact: Provides principled guidelines for building efficient multi-agent systems through diversity-aware design, challenging the "more agents is better" paradigm.

🔗 Link: https://arxiv.org/abs/2602.03794
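One way to build intuition for the saturation claim in the abstract above is a textbook variance identity. It is not the paper's own information-theoretic bound. If n agents each produce an estimate with variance sigma^2 and pairwise correlation rho, the variance of their average is sigma^2 * (rho + (1 - rho) / n), which floors at rho * sigma^2 no matter how many agents are added. The sketch below is illustrative only: the function names and the correlation values (0.9 for homogeneous agents, 0.1 for diverse agents) are assumptions chosen for the example, not figures from the paper.

```python
def ensemble_variance(n_agents: int, sigma2: float, rho: float) -> float:
    """Variance of the mean of n equally correlated estimates."""
    if n_agents < 1:
        raise ValueError("n_agents must be at least 1")
    return sigma2 * (rho + (1.0 - rho) / n_agents)


def effective_agents(n_agents: int, rho: float) -> float:
    """Number of independent agents that would give the same variance."""
    return n_agents / (1.0 + (n_agents - 1) * rho)


# Assumed correlations: homogeneous agents are highly correlated, diverse ones are not.
homogeneous = ensemble_variance(n_agents=16, sigma2=1.0, rho=0.9)
diverse = ensemble_variance(n_agents=2, sigma2=1.0, rho=0.1)

print(f"16 homogeneous agents: variance {homogeneous:.3f}, "
      f"effective agents {effective_agents(16, 0.9):.2f}")
print(f" 2 diverse agents:     variance {diverse:.3f}, "
      f"effective agents {effective_agents(2, 0.1):.2f}")
```

Under these assumed correlations the two diverse agents end up with a lower variance than the sixteen homogeneous ones. This mirrors the direction of the paper's result without reproducing its analysis.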

AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

arXiv: 2602.03786 • Submitted: Feb 3, 2026

Authors: Jianhao Ruan, Zhihao Xu, Yiran Peng, et al.

Key Contribution: Unified agent abstraction that enables automatic sub-agent creation, with a reported 16.28% performance improvement.

Abstract: Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, thereby hurting adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple (Instruction, Context, Tools, Model). This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce an agentic system AOrchestra, where the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation.

Impact: Reduces human engineering efforts while enabling controllable performance-cost trade-offs, approaching Pareto-efficient agent orchestration.

🔗 Link: https://arxiv.org/abs/2602.03786
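The (Instruction, Context, Tools, Model) tuple described in the abstract above can be pictured as a small immutable record that an orchestrator fills in for each sub-task. The sketch below is a hypothetical illustration, not AOrchestra's code. The `AgentSpec` class, the `concretize` helper, the keyword-matching heuristics, and the model names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class AgentSpec:
    """An agent described as an (Instruction, Context, Tools, Model) tuple."""
    instruction: str
    context: Tuple[str, ...]
    tools: Tuple[str, ...]
    model: str


def concretize(subtask: str, notes: List[str],
               tool_tags: Dict[str, str], budget: str) -> AgentSpec:
    """Hypothetical orchestrator step: fill in the tuple for one sub-task."""
    words = set(subtask.lower().split())
    # Curate context: keep notes that share at least one word with the sub-task.
    context = tuple(n for n in notes if words & set(n.lower().split()))
    # Select tools whose tag word appears in the sub-task.
    tools = tuple(sorted(name for name, tag in tool_tags.items() if tag in words))
    # Pick a model from the cost budget (model names are placeholders).
    model = "large-model" if budget == "high" else "small-model"
    return AgentSpec(instruction=subtask, context=context, tools=tools, model=model)


spec = concretize(
    "search the repo for failing tests",
    notes=["tests live under tests/", "deploy key rotated last week"],
    tool_tags={"grep_tool": "search", "browser_tool": "browse"},
    budget="low",
)
print(spec)
```

Keeping the record frozen means each spawned executor gets a fixed recipe, and the performance-cost trade-off mentioned in the Impact line reduces to the choice of the `model` and `tools` fields.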

AutoFigure: Generating Publication-Ready Scientific Illustrations

arXiv: 2602.03828 • Submitted: Feb 3, 2026

Authors: Yixuan Weng et al.

Key Contribution: First agentic framework for automated high-quality scientific illustration generation from text.

Abstract: High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in both academia and industry. We present FigureBench, the first large-scale benchmark for generating scientific illustrations from long-form scientific texts. It contains 3,300 high-quality scientific text-figure pairs, covering diverse text-to-illustration tasks from scientific papers, surveys, blogs, and textbooks. Moreover, we propose AutoFigure, the first agentic framework that automatically generates high-quality scientific illustrations based on long-form scientific text.

Impact: Accepted at ICLR 2026 - addresses major bottleneck in scientific communication with agentic AI solution.

🔗 Link: https://arxiv.org/abs/2602.03828

Conformal Thinking: Risk Control for Reasoning on a Compute Budget

arXiv: 2602.03814 • Submitted: Feb 3, 2026

Authors: Xi Wang et al.

Key Contribution: Distribution-free risk control framework for adaptive reasoning with optimal token budget allocation.

Abstract: Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve reliability and stopping early when additional computation is unlikely to help. However, setting the token budget, as well as the threshold for adaptive reasoning, is a practical challenge that entails a fundamental risk-accuracy trade-off. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident (risking incorrect output) and a novel parametric lower threshold that preemptively stops unsolvable instances.

Impact: Enables computationally efficient reasoning while maintaining user-specified risk targets across diverse tasks and models.

🔗 Link: https://arxiv.org/abs/2602.03814
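The two-threshold idea in the abstract above can be sketched as a stopping rule: answer once confidence is high enough, give up once it is low enough, and otherwise keep reasoning. The code below is a simplified illustration, not the paper's procedure. The `calibrate_upper` helper uses a plain empirical error rate on a held-out set and omits the finite-sample correction a real distribution-free guarantee would need. The lower threshold is a fixed number here rather than the paper's parametric form, and all names and data are assumptions.

```python
from typing import List, Optional, Tuple


def stop_decision(confidence: float, upper: float, lower: float) -> Optional[str]:
    """Return 'answer', 'give_up', or None to keep reasoning."""
    if confidence >= upper:
        return "answer"      # confident enough: stop and emit the answer
    if confidence <= lower:
        return "give_up"     # likely unsolvable: stop spending tokens
    return None


def calibrate_upper(calibration: List[Tuple[float, bool]], max_error: float) -> float:
    """Smallest observed confidence whose stopped examples meet the error target.

    Each calibration item is (confidence, answer_was_correct).
    """
    for threshold in sorted({conf for conf, _ in calibration}):
        stopped = [ok for conf, ok in calibration if conf >= threshold]
        error_rate = sum(1 for ok in stopped if not ok) / len(stopped)
        if error_rate <= max_error:
            return threshold
    return float("inf")  # no threshold meets the target: never stop early


# Made-up calibration data for illustration.
calib = [(0.95, True), (0.9, True), (0.8, False), (0.7, True), (0.6, False)]
upper = calibrate_upper(calib, max_error=0.2)

# Confidence after each reasoning chunk of one hypothetical problem.
for step, conf in enumerate([0.5, 0.7, 0.92]):
    decision = stop_decision(conf, upper=upper, lower=0.2)
    if decision is not None:
        print(f"stopped at step {step} with decision {decision!r}")
        break
```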

TodyComm: Task-Oriented Dynamic Communication for Multi-Round LLM-based Multi-Agent System

arXiv: 2602.03688 • Submitted: Feb 3, 2026

Authors: Wenzhe Fan et al.

Key Contribution: Dynamic communication topology that adapts to changing agent roles across rounds.

Abstract: Multi-round LLM-based multi-agent systems rely on effective communication structures to support collaboration across rounds. However, most existing methods employ a fixed communication topology during inference, which falls short in many realistic applications where the agents' roles may change across rounds due to dynamic adversary, task progression, or time-varying constraints such as communication bandwidth. We propose addressing this issue through TodyComm, a task-oriented dynamic communication algorithm that produces behavior-driven collaboration topologies that adapt to the dynamics at each round, optimizing the utility for the task through policy gradient.

Impact: Delivers superior task effectiveness under dynamic adversary and communication budget constraints while retaining token efficiency.

🔗 Link: https://arxiv.org/abs/2602.03688
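To make "optimizing a communication topology through policy gradient" concrete, here is a toy REINFORCE sketch. It is not TodyComm's algorithm. It assumes each directed edge between agents is switched on independently with a learned probability. It also assumes a made-up reward in which one edge helps the task and every active edge costs bandwidth. All function names and constants are hypothetical.

```python
import math
import random
from typing import Dict, Tuple

Edge = Tuple[int, int]  # (sender agent, receiver agent)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def sample_topology(logits: Dict[Edge, float], rng: random.Random) -> Dict[Edge, int]:
    """Sample each edge independently from a Bernoulli policy."""
    return {edge: int(rng.random() < sigmoid(logit)) for edge, logit in logits.items()}


def reinforce_step(logits: Dict[Edge, float], topology: Dict[Edge, int],
                   reward: float, lr: float) -> Dict[Edge, float]:
    """One REINFORCE update; d/dlogit of log Bernoulli probability is (a - p)."""
    return {edge: logit + lr * reward * (topology[edge] - sigmoid(logit))
            for edge, logit in logits.items()}


def toy_reward(topology: Dict[Edge, int]) -> float:
    # Assumed utility: edge (0, 1) helps the task, each active edge costs 0.2.
    return 1.0 * topology[(0, 1)] - 0.2 * sum(topology.values())


rng = random.Random(0)
logits = {(0, 1): 0.0, (1, 2): 0.0}
for _ in range(2000):
    topo = sample_topology(logits, rng)
    logits = reinforce_step(logits, topo, toy_reward(topo), lr=0.1)

print({edge: round(sigmoid(logit), 3) for edge, logit in logits.items()})
```

A per-round version would condition the logits on the current round's state instead of keeping one global table, which is the part this sketch leaves out.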

🔥 Major Announcements

OpenAI Codex Desktop Application Launch

Summary: OpenAI releases desktop application for multi-agent orchestration with independent thread execution.

Key Points:

Designed as a command center for multi-agent systems
Each agent runs in its own independent thread
Project-based organization with Git Worktree support
Custom Skills framework for cross-platform synchronization
Direct competitor to Claude Code and other agentic IDEs

Impact: Intensifies competition in agentic coding space, offering developers native multi-agent collaboration capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.xiaohu.ai/c/xiaohu-ai/openai-codex-skills-5f0c89

GLM-5 and MiniMax M2.2 Coming Before Chinese New Year

Summary: Major Chinese AI models set for release before February 15, 2026.

Key Points:

Zhipu AI's GLM-5 focuses on creative writing, coding, and reasoning breakthroughs
MiniMax M2.2 strengthens programming capabilities, billed as a "programmer's secret weapon"
DeepSeek releases minor V3 series update
ByteDance and Alibaba also preparing new model launches

Impact: Chinese AI companies accelerating release schedules to compete with global leaders, focusing on specialized capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.aibase.com/zh/news/25219

Sam Altman "Kinda-Sorta-Almost" Declares AGI

Summary: OpenAI CEO makes ambiguous statement about achieving Artificial General Intelligence.

Key Points:

Altman stated "we basically have built AGI, or very close to it" in Forbes profile
Later clarified: "I meant that as a spiritual statement, not a literal one"
Conceded AGI will require "a lot of medium-sized breakthroughs. I don't think we need a big one"
Highlights ongoing debate about AGI definition and timeline

Impact: Continues pattern of ambiguous AGI claims from OpenAI leadership, fueling both excitement and skepticism in AI community.

📅 Source: The Verge • Feb 3, 2026 🔗 Link: https://www.forbes.com/sites/richardnieva/2026/02/03/sam-altman-explains-the-future/

🔬 Research & Papers

Tencent's CL-bench Reveals Models Can't Learn from Context

Summary: New benchmark shows LLMs only solve 17.2% of in-context learning tasks on average.

Key Points:

First paper from Tencent Hunyuan after hiring Yao Shunyu
Benchmark tests model ability to learn and apply new knowledge from context
Best performer (GPT-5.1) only achieves 23.7% success rate
Reveals fundamental limitation: models don't truly utilize context effectively

Impact: Challenges assumptions about LLM in-context learning capabilities, suggests need for better learning mechanisms.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.jiqizhixin.com/articles/2026-02-03-7

ProjDevBench Evaluates End-to-End AI Project Development

Summary: New benchmark tests AI agents on complete project lifecycle from requirements to repository.

Key Points:

Existing benchmarks focus on bug fixing
ProjDevBench evaluates full software development lifecycle
20 programming tasks across 8 categories
Combines online judge testing with LLM code review
Six coding agents achieved only 27.38% overall pass rate
Complex system design identified as major weakness

Impact: Reveals significant gap in AI agents' ability to handle complete software development projects, highlighting need for better system design capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://arxiv.org/abs/2602.01655

Reinforcement Learning for Explainable Human Decision Modeling

Summary: New research direction uses outcome-based RL to guide LLMs in generating explicit reasoning chains.

Key Points:

Cognitive modeling approach for human decision explanation
Outcome-based reinforcement learning guides reasoning chain generation
Goals: prediction accuracy AND explainability
Moves beyond black-box predictions to interpretable AI

Impact: Addresses critical need for explainable AI in decision-critical applications, making AI reasoning transparent and verifiable.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://arxiv.org/abs/2505.11614

RLVR Training Instability Mechanism Revealed

Summary: Research explains why reinforcement learning with verifiable rewards (RLVR) causes MoE architecture collapse.

Key Points:

RLVR can keep improving reasoning ability, but training of MoE architectures often collapses
Proposes an objective-level hacking framework to explain the instability
Core finding: token-level credit mismatch creates false signals
Leads to abnormal growth in training-inference discrepancy

Impact: Understanding RLVR instability crucial for developing more robust RL-based reasoning systems, preventing model collapse during training.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://arxiv.org/abs/2602.01103

💰 Industry & Business

Musk Announces SpaceX-xAI Merger: $1.25 Trillion Valuation

Summary: Historic merger creates world's most valuable AI/space company with ambitious space-based computing plans.

Key Points:

Combined valuation reaches $1.25 trillion
Internal memo reveals plan for space-deployed data centers
Musk predicts space-based AI is only path to true scale
Plans to launch 1 million satellites to build orbital data centers
Aiming for Kardashev Type II civilization capabilities
Leverages space's natural cooling (cryogenic vacuum environment)

Impact: Could revolutionize AI infrastructure by moving computing to space, bypassing terrestrial limitations and creating orbital data center networks.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.qbitai.com/2026/02/375614.html

SpaceX Files for Million-Satellite Computing Constellation

Summary: Applications filed for constellation with 80 EFLOPS total computing power.

Key Points:

Core purpose: orbital data centers, not communication
80 EFLOPS combined computing power planned
Space's natural vacuum solves cooling challenges
Timeline: 2028 startup, 2030 completion target
Traditional data center providers face potential disruption
"Dimension reduction strike" against terrestrial IDC industry

Impact: If successful, would fundamentally alter AI infrastructure landscape, creating space-based computing platform with unlimited scalability potential.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.aibase.com/zh/news/25192
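As a rough sanity check on the figures above: if the 80 EFLOPS were spread evenly across one million satellites, each satellite would need to supply about 80 TFLOPS. The even split is an assumption, and the per-satellite specification is not given in this digest.

```python
EXA = 10**18   # FLOPS in one EFLOPS
TERA = 10**12  # FLOPS in one TFLOPS


def per_satellite_tflops(total_eflops: int, satellites: int) -> float:
    """Average compute per satellite in TFLOPS, assuming an even split."""
    return total_eflops * EXA / satellites / TERA


print(per_satellite_tflops(80, 1_000_000))  # 80.0
```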

Tencent Hunyuan Hires Another Top Scientist

Summary: Tsinghua PhD Pang Tianyu joins as Chief Research Scientist for Multimodal Division.

Key Points:

Focus on reinforcement learning technology research
Previously at Singapore Sea AI Lab
Second major hire after Yao Shunyu
Signals Tencent's aggressive investment in AI research talent

Impact: Chinese tech giants competing aggressively for top AI research talent, accelerating domestic AI innovation capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.aibase.com/zh/news/25199

Nvidia-OpenAI $100 Billion Deal "On Ice"

Summary: Planned massive investment deal faces renegotiation.

Key Points:

Deal announced in September now reportedly paused
Discussions continue for smaller investment (tens of billions)
Part of OpenAI's current funding round
Original plan: up to $100B for compute + cash partnership

Impact: Suggests shifting dynamics in AI infrastructure partnerships, potentially due to market conditions or strategic reassessment.

📅 Source: The Verge • Feb 3, 2026 🔗 Link: https://www.wsj.com/tech/ai/the-100-billion-megadeal-between-openai-and-nvidia-is-on-ice-aa3025e3

🛠️ Tools & Applications

Top Open Source AI Projects

superpowers - Agent Skills Framework

⭐ 43,217 stars on GitHub
Effective agent skills framework and software development methodology
Helps developers build more powerful AI agent systems
🔗 https://github.com/obra/superpowers

dexter - Deep Financial Research Agent

⭐ 9,951 stars on GitHub
Autonomous agent for deep financial research
Specialized intelligent analysis tool for financial domain
🔗 https://github.com/virattt/dexter

ccpm - Claude Code Project Management System

⭐ 6,563 stars on GitHub
Uses GitHub Issues and Git worktrees for parallel agent execution
Makes multi-agent collaboration more efficient
🔗 https://github.com/automazeio/ccpm

vm0 - Natural Language Workflow Automation

⭐ 585 stars on GitHub
Simplest way to automate natural language-described workflows
Define workflows using natural language
🔗 https://github.com/vm0-ai/vm0

review-prompts - AI Code Review Prompts

⭐ 235 stars on GitHub
Prompt collection specifically for AI code review
Complete content available on GitHub
🔗 https://github.com/masoncl/review-prompts
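The ccpm entry above mentions Git worktrees for parallel agent execution. The general idea is that each agent gets its own checkout and branch of the same repository, so concurrent edits do not collide. The sketch below only builds the standard `git worktree add -b <branch> <path> <commit-ish>` command for each task. The branch and directory naming scheme and the `worktree_command` helper are assumptions for illustration, not ccpm's actual layout.

```python
from pathlib import PurePosixPath
from typing import List


def worktree_command(repo_root: str, task_id: str, base: str = "main") -> List[str]:
    """Build a `git worktree add` command giving one agent its own checkout."""
    root = PurePosixPath(repo_root)
    branch = f"agent/{task_id}"                      # hypothetical branch naming
    path = root.parent / f"{root.name}-{task_id}"    # sibling directory per task
    return ["git", "-C", repo_root, "worktree", "add", "-b", branch, str(path), base]


for task in ["issue-12", "issue-13"]:
    print(" ".join(worktree_command("/work/app", task)))
```

Passing each list to `subprocess.run(cmd, check=True)` would create the worktrees. The sketch stops at printing so that it has no side effects.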

Anthropic Expands Cowork with Plugins

Summary: Claude's agentic AI tool gains domain expert capabilities.

Key Points:

New "plugins" feature for Cowork research preview
Enables domain expertise in: sales, legal, finance, marketing, data analysis, customer support, product management, biology research
Available now to all paid subscription tiers
Leans further into agentic AI paradigm

Impact: Transforms Cowork from generalist into specialized expert system, broadening agentic AI adoption in enterprise workflows.

📅 Source: The Verge • Jan 30, 2026 🔗 Link: http://claude.com/blog/cowork-research-preview

Rabbit Announces New AI Device and r1 Updates

Summary: AI hardware company launches "project cyberdeck" and major r1 OTA update.

Key Points:

New device: "project cyberdeck" for vibe-coding
Portable device specifically designed for agentic coding
r1 OTA update transforms it into "plug-and-play computer controller"
Enables agentic tasks on user's behalf
Integrates OpenClaw (open-source agentic tool)

Impact: Continued innovation in AI hardware space, specializing in agentic computing and vibe-coding use cases.

📅 Source: The Verge • Jan 30, 2026 🔗 Link: https://x.com/rabbit_hmi/status/2017082134717223008

🌍 Policy & Ethics

OpenAI Poaches Safety Executive from Anthropic

Summary: Dylan Scandinaro moves from Anthropic AGI safety role to OpenAI.

Key Points:

New title: "head of preparedness" at OpenAI
Came from AGI safety role at chief competitor
Posted: "AI is advancing rapidly. The potential benefits are great—and so are the risks of extreme and even irrecoverable harm. There's a lot of work to do, and not much time to do it!"

Impact: Leadership shuffle in AI safety space raises questions about safety priorities and talent competition between leading AI labs.

📅 Source: The Verge • Feb 3, 2026 🔗 Link: https://x.com/sama/status/2018800541716107477

X Safety Teams Warned Management About Grok Deepfakes

Summary: Internal reports show safety teams flagged undressing tool risks before public outcry.

Key Points:

Safety teams "repeatedly warned management" about undressing tools
Platform's content moderation filters couldn't handle estimated millions of sexualized deepfakes
AI-edited images don't trigger database warnings for known illegal images
Child sexual abuse material detection ineffective against AI-generated content

Impact: Highlights growing challenge of AI-generated abuse material and need for new detection approaches beyond traditional database matching.

📅 Source: The Verge • Feb 2, 2026 🔗 Link: https://www.washingtonpost.com/technology/2026/02/02/elon-musk-grok-porn-generator/

Sophia Robot Creator Asked Epstein for "Sexy Android" Funding

Summary: DOJ documents reveal $3 million proposal for "attractive female android."

Key Points:

Roboticist David Hanson proposed building "attractive female android"
Proposal included "working gorgeous robot face and body"
Rough sketch of "gynoid" with note: "final design will be done collaboratively with you"
Raises ethical questions about AI/robot design and funding sources

Impact: Historical revelation underscores intersection of AI ethics, robotics, and problematic funding relationships in tech industry.

📅 Source: The Verge • Feb 2, 2026 🔗 Link: https://www.justice.gov/epstein/files/DataSet 11/EFTA02725875.pdf

NYC AI Chatbot Told Businesses to Break Law

Summary: City plans to kill chatbot that encouraged illegal business practices.

Key Points:

Launched under Mayor Eric Adams to help businesses navigate regulations
Instead encouraged illegal behavior:
- Taking portion of employees' tips
- Refusing to accept cash payments
- Didn't even know minimum wage
Reporting by The City and The Markup exposed widespread problems
New administration plans to terminate the bot

Impact: Cautionary tale about deploying AI systems without proper testing in high-stakes regulatory contexts.

📅 Source: The Verge • Feb 1, 2026 🔗 Link: https://thecity.nyc/2026/01/30/mamdani-unusable-ai-chatbot-budget/

🎯 Key Takeaways

Multi-agent diversity matters more than quantity: Research shows 2 diverse agents can match or exceed 16 homogeneous ones, challenging the scaling-by-quantity paradigm.
Space-based AI computing becomes concrete: Musk's SpaceX-xAI merger with $1.25T valuation and million-satellite constellation plan could revolutionize AI infrastructure.
Chinese AI accelerating release schedules: GLM-5, MiniMax M2.2, and others launching before Chinese New Year, focusing on specialized capabilities to compete globally.
Agentic orchestration automation: AOrchestra and similar frameworks reducing human engineering effort, making multi-agent systems more accessible and efficient.
Context learning limitations exposed: Tencent's CL-bench finds that LLMs solve only 17.2% of in-context learning tasks on average, highlighting fundamental gaps in current models.

Generated on: 2026-02-05 03:00:00 AM GMT+8 Next update: 2026-02-06 03:00:00 AM GMT+8 Total news items: 20 (5 arXiv papers + 15 industry news items)
