
📰 AI Daily - February 5, 2026

Date: February 5, 2026 (Thursday)
Generated: 2026-02-05 03:00:00 AM GMT+8
Sources: The Verge AI, AI Hub Today, arXiv, industry reports
Coverage: Last 48 hours (Feb 3-5, 2026)


🎓 Academic Research (arXiv Papers)

Understanding Agent Scaling in LLM-Based Multi-Agent Systems via Diversity

arXiv: 2602.03794 โ€ข Submitted: Feb 3, 2026

Authors: Shangding Gu et al.

Key Contribution: Shows that 2 diverse agents can match or exceed the performance of 16 homogeneous agents in multi-agent systems.

Abstract: LLM-based multi-agent systems (MAS) have emerged as a promising approach to tackle complex tasks that are difficult for individual LLMs. A natural strategy is to scale performance by increasing the number of agents; however, we find that such scaling exhibits strong diminishing returns in homogeneous settings, while introducing heterogeneity (e.g., different models, prompts, or tools) continues to yield substantial gains. This raises a fundamental question: what limits scaling, and why does diversity help? We present an information-theoretic framework showing that MAS performance is bounded by the intrinsic task uncertainty, not by agent count. We derive architecture-agnostic bounds demonstrating that improvements depend on how many effective channels the system accesses. Homogeneous agents saturate early because their outputs are strongly correlated, whereas heterogeneous agents contribute complementary evidence. Empirically, we show that heterogeneous configurations consistently outperform homogeneous scaling: 2 diverse agents can match or exceed the performance of 16 homogeneous agents.

Impact: Provides principled guidelines for building efficient multi-agent systems through diversity-aware design, challenging the "more agents is better" paradigm.

🔗 Link: https://arxiv.org/abs/2602.03794
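
The saturation argument in the abstract can be illustrated with a standard statistics identity. This is an analogy, not the paper's information-theoretic bound: averaging n equally noisy estimates with pairwise correlation rho has the same variance as averaging n / (1 + (n - 1) * rho) independent ones. The function name and the two correlation values below are assumptions chosen only for illustration.

```python
def effective_agents(n: int, rho: float) -> float:
    """Number of independent agents that n agents with pairwise
    output correlation rho are worth (variance-of-the-mean identity)."""
    if n < 1 or not 0.0 <= rho <= 1.0:
        raise ValueError("need n >= 1 and 0 <= rho <= 1")
    return n / (1.0 + (n - 1) * rho)


# Hypothetical correlations, not measurements from the paper.
homogeneous = effective_agents(16, rho=0.9)    # near-duplicate agents
heterogeneous = effective_agents(2, rho=0.1)   # two dissimilar agents

print(f"16 homogeneous agents ~ {homogeneous:.2f} independent agents")
print(f" 2 diverse agents     ~ {heterogeneous:.2f} independent agents")
```

Under these assumed correlations, the two diverse agents carry more independent evidence than the sixteen near-duplicates, and adding more homogeneous agents can never push the effective count past 1 / rho.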


AOrchestra: Automating Sub-Agent Creation for Agentic Orchestration

arXiv: 2602.03786 โ€ข Submitted: Feb 3, 2026

Authors: Jianhao Ruan, Zhihao Xu, Yiran Peng, et al.

Key Contribution: Unified agent abstraction enabling automatic sub-agent creation with 16.28% performance improvement.

Abstract: Language agents have shown strong promise for task automation. Realizing this promise for increasingly complex, long-horizon tasks has driven the rise of a sub-agent-as-tools paradigm for multi-turn task solving. However, existing designs still lack a dynamic abstraction view of sub-agents, thereby hurting adaptability. We address this challenge with a unified, framework-agnostic agent abstraction that models any agent as a tuple (Instruction, Context, Tools, Model). This tuple acts as a compositional recipe for capabilities, enabling the system to spawn specialized executors for each task on demand. Building on this abstraction, we introduce an agentic system AOrchestra, where the central orchestrator concretizes the tuple at each step: it curates task-relevant context, selects tools and models, and delegates execution via on-the-fly automatic agent creation.

Impact: Reduces human engineering efforts while enabling controllable performance-cost trade-offs, approaching Pareto-efficient agent orchestration.

🔗 Link: https://arxiv.org/abs/2602.03786
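
The (Instruction, Context, Tools, Model) tuple from the abstract maps naturally onto a small record type. The sketch below is a guess at what such an abstraction could look like, not AOrchestra's actual code: `AgentSpec`, `spawn_sub_agent`, the word-overlap relevance filter, and the model names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class AgentSpec:
    """The four-part tuple from the abstract: (Instruction, Context, Tools, Model)."""
    instruction: str
    context: List[str]
    tools: Dict[str, Callable[[str], str]]
    model: str


def spawn_sub_agent(task: str, memory: List[str],
                    toolbox: Dict[str, Callable[[str], str]]) -> AgentSpec:
    """Hypothetical orchestrator step: fill in the tuple for one sub-task."""
    words = set(task.lower().split())
    # Keep only memory entries that share a word with the task (toy relevance filter).
    context = [m for m in memory if words & set(m.lower().split())]
    # Keep only tools whose name is mentioned in the task.
    tools = {name: fn for name, fn in toolbox.items() if name in words}
    # Toy cost/performance rule: longer tasks get the larger placeholder model.
    model = "large-model" if len(words) > 8 else "small-model"
    return AgentSpec(instruction=task, context=context, tools=tools, model=model)


toolbox = {"search": lambda q: f"results for {q}", "calc": lambda e: "42"}
memory = ["user prefers short answers", "search index covers arXiv only"]
spec = spawn_sub_agent("search arXiv for agent papers", memory, toolbox)
print(spec.model, list(spec.tools), spec.context)
```

Treating the sub-agent as plain data is what lets an orchestrator trade cost against performance per step: swapping the `model` field or trimming `context` changes the price of a delegation without touching the rest of the system.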


AutoFigure: Generating Publication-Ready Scientific Illustrations

arXiv: 2602.03828 โ€ข Submitted: Feb 3, 2026

Authors: Yixuan Weng et al.

Key Contribution: First agentic framework for automated high-quality scientific illustration generation from text.

Abstract: High-quality scientific illustrations are crucial for effectively communicating complex scientific and technical concepts, yet their manual creation remains a well-recognized bottleneck in both academia and industry. We present FigureBench, the first large-scale benchmark for generating scientific illustrations from long-form scientific texts. It contains 3,300 high-quality scientific text-figure pairs, covering diverse text-to-illustration tasks from scientific papers, surveys, blogs, and textbooks. Moreover, we propose AutoFigure, the first agentic framework that automatically generates high-quality scientific illustrations based on long-form scientific text.

Impact: Accepted at ICLR 2026 - addresses major bottleneck in scientific communication with agentic AI solution.

🔗 Link: https://arxiv.org/abs/2602.03828


Conformal Thinking: Risk Control for Reasoning on a Compute Budget

arXiv: 2602.03814 โ€ข Submitted: Feb 3, 2026

Authors: Xi Wang et al.

Key Contribution: Distribution-free risk control framework for adaptive reasoning with optimal token budget allocation.

Abstract: Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases, motivating adaptive reasoning -- spending tokens when they improve reliability and stopping early when additional computation is unlikely to help. However, setting the token budget, as well as the threshold for adaptive reasoning, is a practical challenge that entails a fundamental risk-accuracy trade-off. We re-frame the budget setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident (risking incorrect output) and a novel parametric lower threshold that preemptively stops unsolvable instances.

Impact: Enables computationally efficient reasoning while maintaining user-specified risk targets across diverse tasks and models.

🔗 Link: https://arxiv.org/abs/2602.03814
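
To make "limiting the error rate while minimizing compute" concrete, here is a minimal sketch of calibrating only the upper (stop-when-confident) threshold with the generic conformal risk control recipe for a loss bounded by 1. It is not the paper's procedure and it ignores the parametric lower threshold entirely; the function name and the calibration data are made up.

```python
from typing import List, Tuple


def calibrate_stop_threshold(calib: List[Tuple[float, bool]], alpha: float) -> float:
    """Smallest confidence threshold whose corrected risk is at most alpha.

    calib holds (confidence, is_correct) pairs from held-out problems.
    Loss for one problem at threshold t: 1 if we stop early (confidence >= t)
    and the answer is wrong, else 0. This loss never increases as t rises.
    """
    n = len(calib)
    candidates = sorted({conf for conf, _ in calib}) + [float("inf")]
    for t in candidates:  # ascending, so the first hit stops as often as allowed
        errors = sum(1 for conf, ok in calib if conf >= t and not ok)
        # Conformal risk control correction for a loss bounded by 1.
        if (errors + 1) / (n + 1) <= alpha:
            return t
    return float("inf")  # target too strict for this sample size: never stop early


# Made-up calibration set: (model confidence, whether the early answer was right).
calib = [(0.95, True), (0.90, True), (0.85, False), (0.80, True), (0.70, True),
         (0.60, False), (0.55, True), (0.40, False), (0.30, False)]
print(calibrate_stop_threshold(calib, alpha=0.25))
```

A lower threshold stops reasoning earlier on more problems and so saves tokens, which is why the loop returns the first (smallest) threshold that still meets the risk target.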


TodyComm: Task-Oriented Dynamic Communication for Multi-Round LLM-based Multi-Agent System

arXiv: 2602.03688 โ€ข Submitted: Feb 3, 2026

Authors: Wenzhe Fan et al.

Key Contribution: Dynamic communication topology that adapts to changing agent roles across rounds.

Abstract: Multi-round LLM-based multi-agent systems rely on effective communication structures to support collaboration across rounds. However, most existing methods employ a fixed communication topology during inference, which falls short in many realistic applications where the agents' roles may change across rounds due to dynamic adversary, task progression, or time-varying constraints such as communication bandwidth. We propose addressing this issue through TodyComm, a task-oriented dynamic communication algorithm that produces behavior-driven collaboration topologies that adapt to the dynamics at each round, optimizing the utility for the task through policy gradient.

Impact: Delivers superior task effectiveness under dynamic adversary and communication budget constraints while retaining token efficiency.

🔗 Link: https://arxiv.org/abs/2602.03688
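
The idea of optimizing a communication topology "through policy gradient" can be shown on a toy problem. The sketch below is not TodyComm: it treats each directed edge as an independent on/off choice, uses a made-up additive utility, and computes the exact policy gradient by enumerating every topology instead of sampling. All names and numbers are hypothetical.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Edge = Tuple[str, str]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def policy_gradient_step(logits: List[float],
                         utility: Callable[[Sequence[int]], float],
                         lr: float) -> List[float]:
    """One exact policy-gradient ascent step on independent edge on/off logits."""
    probs = [sigmoid(z) for z in logits]
    grad = [0.0] * len(logits)
    # Enumerate every topology (feasible only for a handful of edges).
    for mask in itertools.product((0, 1), repeat=len(logits)):
        p_mask = math.prod(p if a else 1.0 - p for a, p in zip(mask, probs))
        u = utility(mask)
        for i, (a, p) in enumerate(zip(mask, probs)):
            # d/dz log Bernoulli(a; sigmoid(z)) = a - p
            grad[i] += p_mask * u * (a - p)
    return [z + lr * g for z, g in zip(logits, grad)]


# Toy round: only the planner->coder message helps; every message costs bandwidth.
edges: List[Edge] = [("planner", "coder"), ("coder", "planner"), ("critic", "planner")]
gain = [1.0, 0.0, 0.0]   # made-up task benefit per edge
cost = 0.3               # made-up per-message cost


def utility(mask: Sequence[int]) -> float:
    return sum(a * (g - cost) for a, g in zip(mask, gain))


logits = [0.0, 0.0, 0.0]
for _ in range(500):
    logits = policy_gradient_step(logits, utility, lr=0.5)

for edge, z in zip(edges, logits):
    print(edge, round(sigmoid(z), 3))
```

In this toy setting the useful edge is driven toward "always on" and the costly, useless edges toward "off". A multi-round system would presumably condition the logits on each round's state and estimate the gradient from sampled rollouts.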


🔥 Major Announcements

OpenAI Codex Desktop Application Launch

Summary: OpenAI releases desktop application for multi-agent orchestration with independent thread execution.

Key Points:

  • Designed as a command center for multi-agent systems
  • Each agent runs in independent threads
  • Project-based organization with Git Worktree support
  • Custom Skills framework for cross-platform synchronization
  • Direct competitor to Claude Code and other agentic IDEs

Impact: Intensifies competition in agentic coding space, offering developers native multi-agent collaboration capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.xiaohu.ai/c/xiaohu-ai/openai-codex-skills-5f0c89


GLM-5 and MiniMax M2.2 Coming Before Chinese New Year

Summary: Major Chinese AI models set for release before February 15, 2026.

Key Points:

  • Zhipu AI's GLM-5 focuses on creative writing, coding, and reasoning breakthroughs
  • MiniMax M2.2 enhances programming capabilities as "programmer's secret weapon"
  • DeepSeek releases minor V3 series update
  • ByteDance and Alibaba also preparing new model launches

Impact: Chinese AI companies accelerating release schedules to compete with global leaders, focusing on specialized capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.aibase.com/zh/news/25219


Sam Altman "Kinda-Sorta-Almost" Declares AGI

Summary: OpenAI CEO makes ambiguous statement about achieving Artificial General Intelligence.

Key Points:

  • Altman stated "we basically have built AGI, or very close to it" in Forbes profile
  • Later clarified: "I meant that as a spiritual statement, not a literal one"
  • Conceded AGI will require "a lot of medium-sized breakthroughs. I don't think we need a big one"
  • Highlights ongoing debate about AGI definition and timeline

Impact: Continues pattern of ambiguous AGI claims from OpenAI leadership, fueling both excitement and skepticism in AI community.

📅 Source: The Verge • Feb 3, 2026 🔗 Link: https://www.forbes.com/sites/richardnieva/2026/02/03/sam-altman-explains-the-future/


🔬 Research & Papers

Tencent's CL-bench Reveals Models Can't Learn from Context

Summary: New benchmark shows LLMs only solve 17.2% of in-context learning tasks on average.

Key Points:

  • First paper from Tencent Hunyuan after hiring Yao Shunyu
  • Benchmark tests model ability to learn and apply new knowledge from context
  • Best performer (GPT-5.1) only achieves 23.7% success rate
  • Reveals fundamental limitation: models don't truly utilize context effectively

Impact: Challenges assumptions about LLM in-context learning capabilities, suggests need for better learning mechanisms.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.jiqizhixin.com/articles/2026-02-03-7


ProjDevBench Evaluates End-to-End AI Project Development

Summary: New benchmark tests AI agents on complete project lifecycle from requirements to repository.

Key Points:

  • Existing benchmarks focus on bug fixing
  • ProjDevBench evaluates full software development lifecycle
  • 20 programming tasks across 8 categories
  • Combines online judge testing with LLM code review
  • Six coding agents achieved only 27.38% overall pass rate
  • Complex system design identified as major weakness

Impact: Reveals significant gap in AI agents' ability to handle complete software development projects, highlighting need for better system design capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://arxiv.org/abs/2602.01655


Reinforcement Learning for Explainable Human Decision Modeling

Summary: New research direction uses outcome-based RL to guide LLMs in generating explicit reasoning chains.

Key Points:

  • Cognitive modeling approach for human decision explanation
  • Outcome-based reinforcement learning guides reasoning chain generation
  • Goals: prediction accuracy AND explainability
  • Moves beyond black-box predictions to interpretable AI

Impact: Addresses critical need for explainable AI in decision-critical applications, making AI reasoning transparent and verifiable.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://arxiv.org/abs/2505.11614


RLVR Training Instability Mechanism Revealed

Summary: Research explains why reinforcement learning with verifiable rewards (RLVR) causes MoE architectures to collapse during training.

Key Points:

  • RLVR can continuously improve reasoning ability, but training on MoE architectures often collapses
  • Proposed objective-level hacking framework to explain instability
  • Core finding: token-level credit mismatch creates false signals
  • Leads to abnormal growth in training-inference discrepancy

Impact: Understanding RLVR instability crucial for developing more robust RL-based reasoning systems, preventing model collapse during training.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://arxiv.org/abs/2602.01103


💰 Industry & Business

Musk Announces SpaceX-xAI Merger: $1.25 Trillion Valuation

Summary: Historic merger creates world's most valuable AI/space company with ambitious space-based computing plans.

Key Points:

  • Combined valuation reaches $1.25 trillion
  • Internal memo reveals plan for space-deployed data centers
  • Musk predicts space-based AI is only path to true scale
  • Plans to launch 1 million satellites to build orbital data centers
  • Aiming for Kardashev Type II civilization capabilities
  • Leverages space's natural cooling (cryogenic vacuum environment)

Impact: Could revolutionize AI infrastructure by moving computing to space, bypassing terrestrial limitations and creating orbital data center networks.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.qbitai.com/2026/02/375614.html


SpaceX Files for Million-Satellite Computing Constellation

Summary: Applications filed for constellation with 80 EFLOPS total computing power.

Key Points:

  • Core purpose: orbital data centers, not communication
  • 80 EFLOPS combined computing power planned
  • Space's natural vacuum solves cooling challenges
  • Timeline: 2028 startup, 2030 completion target
  • Traditional data center providers face potential disruption
  • "Dimension reduction strike" against terrestrial IDC industry

Impact: If successful, would fundamentally alter AI infrastructure landscape, creating space-based computing platform with unlimited scalability potential.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.aibase.com/zh/news/25192
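
As a rough sanity check on the headline figures, the reported 80 EFLOPS total can be divided by the satellite count. The even split across satellites and the reading of "million-satellite" as exactly 1,000,000 are assumptions, not details from the filing.

```python
EFLOPS = 10**18  # floating-point operations per second in one exaFLOPS
TFLOPS = 10**12  # floating-point operations per second in one teraFLOPS

total_flops = 80 * EFLOPS      # constellation total reported in the summary
satellites = 1_000_000         # "million-satellite" taken literally (assumption)

# Assumes compute is spread evenly across satellites.
per_satellite_tflops = total_flops / satellites / TFLOPS
print(f"{per_satellite_tflops:.0f} TFLOPS per satellite")  # prints "80 TFLOPS per satellite"
```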


Tencent Hunyuan Hires Another Top Scientist

Summary: Tsinghua PhD Pang Tianyu joins as Chief Research Scientist for Multimodal Division.

Key Points:

  • Focus on reinforcement learning technology research
  • Previously at Singapore Sea AI Lab
  • Second major hire after Yao Shunyu
  • Signals Tencent's aggressive investment in AI research talent

Impact: Chinese tech giants competing aggressively for top AI research talent, accelerating domestic AI innovation capabilities.

📅 Source: AI Hub Today • Feb 4, 2026 🔗 Link: https://www.aibase.com/zh/news/25199


Nvidia-OpenAI $100 Billion Deal "On Ice"

Summary: Planned massive investment deal faces renegotiation.

Key Points:

  • Deal announced in September now reportedly paused
  • Discussions continue for smaller investment (tens of billions)
  • Part of OpenAI's current funding round
  • Original plan: up to $100B for compute + cash partnership

Impact: Suggests shifting dynamics in AI infrastructure partnerships, potentially due to market conditions or strategic reassessment.

📅 Source: The Verge • Feb 3, 2026 🔗 Link: https://www.wsj.com/tech/ai/the-100-billion-megadeal-between-openai-and-nvidia-is-on-ice-aa3025e3


🛠️ Tools & Applications

Top Open Source AI Projects

superpowers - Agent Skills Framework

  • โญ 43,217 stars on GitHub
  • Effective agent skills framework and software development methodology
  • Helps developers build more powerful AI agent systems
  • ๐Ÿ”— https://github.com/obra/superpowers

dexter - Deep Financial Research Agent

  • โญ 9,951 stars on GitHub
  • Autonomous agent for deep financial research
  • Specialized intelligent analysis tool for financial domain
  • ๐Ÿ”— https://github.com/virattt/dexter

ccpm - Claude Code Project Management System

  • โญ 6,563 stars on GitHub
  • Uses GitHub Issues and Git worktrees for parallel agent execution
  • Makes multi-agent collaboration more efficient
  • ๐Ÿ”— https://github.com/automazeio/ccpm

vm0 - Natural Language Workflow Automation

  • โญ 585 stars on GitHub
  • Simplest way to automate natural language-described workflows
  • Define workflows using natural language
  • ๐Ÿ”— https://github.com/vm0-ai/vm0

review-prompts - AI Code Review Prompts


Anthropic Expands Cowork with Plugins

Summary: Claude's agentic AI tool gains domain expert capabilities.

Key Points:

  • New "plugins" feature for Cowork research preview
  • Enables domain expertise in: sales, legal, finance, marketing, data analysis, customer support, product management, biology research
  • Available now to all paid subscription tiers
  • Leans further into agentic AI paradigm

Impact: Transforms Cowork from generalist into specialized expert system, broadening agentic AI adoption in enterprise workflows.

📅 Source: The Verge • Jan 30, 2026 🔗 Link: http://claude.com/blog/cowork-research-preview


Rabbit Announces New AI Device and r1 Updates

Summary: AI hardware company launches "project cyberdeck" and major r1 OTA update.

Key Points:

  • New device: "project cyberdeck" for vibe-coding
  • Portable device specifically designed for agentic coding
  • r1 OTA update transforms it into "plug-and-play computer controller"
  • Enables agentic tasks on user's behalf
  • Integrates OpenClaw (open-source agentic tool)

Impact: Continued innovation in AI hardware space, specializing in agentic computing and vibe-coding use cases.

📅 Source: The Verge • Jan 30, 2026 🔗 Link: https://x.com/rabbit_hmi/status/2017082134717223008


🌍 Policy & Ethics

OpenAI Poaches Safety Executive from Anthropic

Summary: Dylan Scandinaro moves from Anthropic AGI safety role to OpenAI.

Key Points:

  • New title: "head of preparedness" at OpenAI
  • Came from AGI safety role at chief competitor
  • Posted: "AI is advancing rapidly. The potential benefits are great, and so are the risks of extreme and even irrecoverable harm. There's a lot of work to do, and not much time to do it!"

Impact: Leadership shuffle in AI safety space raises questions about safety priorities and talent competition between leading AI labs.

📅 Source: The Verge • Feb 3, 2026 🔗 Link: https://x.com/sama/status/2018800541716107477


X Safety Teams Warned Management About Grok Deepfakes

Summary: Internal reports show safety teams flagged undressing tool risks before public outcry.

Key Points:

  • Safety teams "repeatedly warned management" about undressing tools
  • Platform's content moderation filters couldn't handle estimated millions of sexualized deepfakes
  • AI-edited images don't trigger database warnings for known illegal images
  • Child sexual abuse material detection ineffective against AI-generated content

Impact: Highlights growing challenge of AI-generated abuse material and need for new detection approaches beyond traditional database matching.

📅 Source: The Verge • Feb 2, 2026 🔗 Link: https://www.washingtonpost.com/technology/2026/02/02/elon-musk-grok-porn-generator/


Sophia Robot Creator Asked Epstein for "Sexy Android" Funding

Summary: DOJ documents reveal $3 million proposal for "attractive female android."

Key Points:

  • Roboticist David Hanson proposed building "attractive female android"
  • Proposal included "working gorgeous robot face and body"
  • Rough sketch of "gynoid" with note: "final design will be done collaboratively with you"
  • Raises ethical questions about AI/robot design and funding sources

Impact: Historical revelation underscores intersection of AI ethics, robotics, and problematic funding relationships in tech industry.

📅 Source: The Verge • Feb 2, 2026 🔗 Link: https://www.justice.gov/epstein/files/DataSet 11/EFTA02725875.pdf


NYC AI Chatbot Told Businesses to Break Law

Summary: City plans to kill chatbot that encouraged illegal business practices.

Key Points:

  • Launched under Mayor Eric Adams to help businesses navigate regulations
  • Instead gave advice that encouraged illegal behavior:
    • Taking a portion of employees' tips
    • Refusing to accept cash payments
  • Did not even know the minimum wage
  • Reporting by The City and The Markup exposed widespread problems
  • New administration plans to terminate the bot

Impact: Cautionary tale about deploying AI systems without proper testing in high-stakes regulatory contexts.

📅 Source: The Verge • Feb 1, 2026 🔗 Link: https://thecity.nyc/2026/01/30/mamdani-unusable-ai-chatbot-budget/


🎯 Key Takeaways

  1. Multi-agent diversity matters more than quantity: Research shows 2 diverse agents can match or exceed 16 homogeneous ones, challenging the scaling-by-quantity paradigm.

  2. Space-based AI computing becomes concrete: Musk's SpaceX-xAI merger with $1.25T valuation and million-satellite constellation plan could revolutionize AI infrastructure.

  3. Chinese AI accelerating release schedules: GLM-5, MiniMax M2.2, and others launching before Chinese New Year, focusing on specialized capabilities to compete globally.

  4. Agentic orchestration automation: AOrchestra and similar frameworks reducing human engineering effort, making multi-agent systems more accessible and efficient.

  5. Context learning limitations exposed: Tencent's CL-bench reveals LLMs solve only 17.2% of in-context learning tasks on average, highlighting fundamental gaps in current models.


Generated on: 2026-02-05 03:00:00 AM GMT+8 Next update: 2026-02-06 03:00:00 AM GMT+8 Total news items: 20 (5 arXiv papers + 15 industry news items)
