Long Horizon: A Research Matrix for Robot World Models

June 2026

In robot world models, long horizon is made up of several coupled failure modes: rollout drift, closed-loop planning, trust horizon, event verification, temporal abstraction, and object/state persistence.

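In robotics, long horizon usually starts from a very concrete computation: the current observation is encoded into a state; the world model predicts forward under candidate action sequences; the planner selects actions based on the predicted state, reward, value, or event score; and the robot executes only the next action. When the next observation arrives, the system plans again.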
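The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not any cited system's implementation: `true_step`, `model_step`, `plan`, and `run_closed_loop` are hypothetical names, the "robot" is a 1-D point, the "world model" is a deliberately imperfect copy of the dynamics, and the planner is plain random shooting.

```python
import numpy as np

def true_step(x, u):
    # Hypothetical real environment: a 1-D point moved by the applied action.
    return x + u

def model_step(x, u):
    # Hypothetical learned model: it underestimates the effect of the action.
    return x + 0.9 * u

def plan(x, goal, horizon, n_candidates, rng):
    # Random-shooting planner: sample candidate action sequences, roll each
    # one out in the model, and return only the first action of the best one.
    candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon))
    costs = np.zeros(n_candidates)
    for i, actions in enumerate(candidates):
        z = x
        for u in actions:
            z = model_step(z, u)
            costs[i] += (z - goal) ** 2
    return candidates[np.argmin(costs), 0]

def run_closed_loop(x0=0.0, goal=3.0, steps=20, horizon=3, n_candidates=512, seed=0):
    rng = np.random.default_rng(seed)
    x = x0
    for _ in range(steps):
        u = plan(x, goal, horizon, n_candidates, rng)  # replan from the latest state
        x = true_step(x, u)                            # execute only the next action
    return x

final_x = run_closed_loop()
```

Even with a biased model, replanning at every step lets the real state correct the prediction, which is why the loop can reach the neighborhood of the goal in this toy setting.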

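This closed loop turns an ordinary prediction error into a decision problem. The error of a one-step transition feeds into the next prediction; a plausible-looking video future may not have completed the task event; a high-scoring action sequence may simply be exploiting model errors. The difficulty of long horizon is not the surface fact that tasks take longer. It is that the model, planner, memory, event verifier, and action interface all absorb error together over a long trajectory.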
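A small numerical illustration of how a one-step error feeds into later predictions. The linear dynamics, the 5% coefficient error, and all function names are assumptions chosen for the example; real world models fail in less tidy ways.

```python
import numpy as np

def rollout(step_fn, x0, actions):
    # Apply step_fn repeatedly, feeding each prediction back in as the next input.
    xs, x = [], x0
    for u in actions:
        x = step_fn(x, u)
        xs.append(x)
    return np.array(xs)

def true_step(x, u):
    # Hypothetical ground-truth dynamics.
    return 1.00 * x + u

def model_step(x, u):
    # Hypothetical learned model with a 5% error in the state coefficient.
    return 1.05 * x + u

actions = np.full(10, 0.1)
true_traj = rollout(true_step, 1.0, actions)
model_traj = rollout(model_step, 1.0, actions)

# Absolute state error at horizons 1..10 for an open-loop rollout.
errors = np.abs(model_traj - true_traj)
```

In this setup the error at horizon 10 is more than ten times the error at horizon 1, even though the one-step model error never changes.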

Research object

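This matrix splits long horizon into six researchable problem axes. It is a working map for reading papers, comparing methods, and designing experiments; it does not yet replace per-paper cards or figure-level verification of evidence.

The public version here keeps only robot world model, robot policy, VLA, humanoid control, and embodied memory. Software agents and pure language agents are outside this matrix.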

A compact view of the interface. Long horizon is not only a longer task; it is repeated prediction under planner pressure, followed by verification and closed-loop correction.

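Six problem axes

Once long horizon is split up, each direction maps to a different failure mode, set of solution strategies, and body of representative work. A research topic should first state which axis it addresses; otherwise "doing Long Horizon" becomes too broad a slogan.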

Problem axis	Failure mode	Common strategies	Representative work
Rollout drift	When predicting state, reward, contact, object, or latent, small errors accumulate over a multi-step rollout.	Short rollouts, value bootstrap, uncertainty, task-oriented latent, compositional dynamics, autoregressive stabilization.	MBPO, PlaNet, Dreamer, MuZero, TD-MPC, PRISM-WM, AR Forcing, MIND-V, WEAVER
Closed-loop planning	The planner optimizes over an imperfect model and selects an action sequence that scores high in the imagined future but fails in the real environment.	MPC / receding horizon, MPPI / CEM, gradient planning, terminal value, policy prior, test-time compute.	PETS, PlaNet, TD-MPC, MuZero, GRASP, Cosmos Policy, tau0-WM, WEAVER
Trust horizon	The system does not know how far a rollout can still be trusted under the current state, contact regime, or task phase.	Ensemble disagreement, latent consistency, reward / value consistency, event verifier, adaptive horizon.	PETS, MBPO, MOPO, TD-MPC, PRISM-WM, EV-WM, Variable-Length Latent World Models
Event verification	The future looks plausible, but the drawer did not open, the object did not move, or a precondition does not hold, so no task progress actually occurred.	Predicate verifier, event-aware world model, logical state model, simulator validation, reflection / replanning.	EV-WM, H-WM, LoHo-Manip, PALM, SayCan, Code as Policies, ReKep
Temporal abstraction	Low-level motor-step rollouts are too long, and step-by-step planning makes both optimization and prediction brittle.	Options / skills, subgoals, waypoints, language plans, action chunks, latent actions, task-and-motion planning.	Puppeteer, ULTRA, LoHo-Manip, Long-VLA, A2A, LAPO, Diffusion Policy, ACT
Object / state persistence	During long execution the system loses object identity, location, containment, occluded facts, or task progress.	Object-centric representation, slot dynamics, keyframe memory, event memory, episodic memory, world/ego factorization.	Slot Attention, C-SWM, Slot-MPC, MEM, KEMO, EventVLA, Mem-World, World-Ego Modeling

World model side: predictions must be usable by the planner

The core question here is whether imagined futures can still guide real action selection.

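TD-MPC provides a basic interface: learn task-oriented latent dynamics, run short-horizon MPC in latent space, and use a terminal value to estimate the return beyond the horizon. Its treatment of long horizon is restrained: do fewer long rollouts and hand the long-term return to the value function.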
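In generic form, short-horizon planning with a terminal value scores each candidate action sequence by the discounted predicted rewards over H steps plus a bootstrapped value at the final latent state. The symbols below (encoder enc, latent dynamics d, reward model R, value V, discount γ) are notation chosen for this note and may not match the paper's exact formulation.

```latex
J(a_{0:H-1}) = \sum_{t=0}^{H-1} \gamma^{t}\, R(z_t, a_t) + \gamma^{H}\, V(z_H),
\qquad z_0 = \mathrm{enc}(o_0), \quad z_{t+1} = d(z_t, a_t)
```

The rollout only has to stay accurate for H steps; everything after that is summarized by the single term V(z_H), so the long-horizon burden shifts from the dynamics model to the value estimate.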

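PRISM-WM pushes the problem into the transition function itself. Real-world physics is a hybrid system: contact, sliding, landing, and flight cause the continuous dynamics to switch. A single latent dynamics model tends to average these boundaries into one continuous function, and the error accumulates once the planner calls it over multiple steps. PRISM-WM uses a context-aware MoE and latent orthogonalization to represent different local dynamics.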

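GRASP approaches the problem from the planner side. Under a long horizon, backpropagating directly through a multi-step learned-dynamics chain produces unstable gradients. GRASP uses lifted states, stochastic exploration, and gradient reshaping to improve the optimization conditions for gradient-based planning.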
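The instability can be seen from the chain rule alone. This is the standard argument for any differentiable rollout, not GRASP's own derivation, and the notation (latent dynamics d with z_{t+1} = d(z_t, a_t)) is assumed for illustration.

```latex
\frac{\partial z_H}{\partial a_0}
= \left( \prod_{t=H-1}^{1} \frac{\partial d}{\partial z}(z_t, a_t) \right)
  \frac{\partial d}{\partial a}(z_0, a_0)
```

The sensitivity of the final state to the first action is a product of H-1 state Jacobians. If their norms tend to exceed 1 the product can grow with H, and if they tend to fall below 1 it can shrink toward zero, so the planner's gradient signal becomes harder to use as the horizon lengthens.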

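WEAVER, EV-WM, tau0-WM, PointWorld, DreamZero, Mem-World, and Slot-MPC show how recent approaches have diverged: some improve the rollout consistency of manipulation world models, some add an event verifier, some turn the video model into a video-action model, some represent state as 3D point flow or object slots, and some bind memory to geometric surfaces.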

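Policy side: tasks must be executable across phases

These works study long-task execution; most do not directly address the rollout fidelity of a learned world model.

Psi-Zero, EgoHumanoid, ULTRA, and VIRAL address the data, control, and deployment problems of humanoid loco-manipulation. Their long horizon shows up in a real robot having to keep working across visual input, body balance, contact, action latency, and multi-stage tasks.

A2A, LAPO, MEM, KEMO, EventVLA, LoHo-Manip, and Long-VLA offer another set of interfaces: action history, latent action, multi-scale memory, event keyframes, visual evidence, task manager, and phase-aware masking. They usually do not train a classical world model, but they expose what a long-task system needs: faster action generation, more stable memory, clearer task phases, and a more reliable representation of progress.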

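Representative work matrix

The table below is compiled from public papers and project pages. Organizational signals are used only to situate work within the research ecosystem; they are not a complete attribution of contributions.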

Work	Direction	Year	Insight worth discussing
TD-MPC	latent world model + MPC	2022	Uses short-horizon planning plus a terminal value to avoid long rollouts, while keeping only control-relevant information in the latent.
PRISM-WM	hybrid / contact dynamics	2026	Splits contact-rich dynamics across multiple experts so that a monolithic transition function does not average away regime switches.
Newt	multitask world model	2025 / 2026	Tests whether a TD-MPC2-style world model can be trained across a large number of continuous-control tasks.
Puppeteer	hierarchical humanoid WM	2025	Splits humanoid control into a high-level visual puppeteer and a low-level tracking world model.
Cosmos Policy	video model as policy / WM / value	2026	Encodes actions, future states, and values as latent frames so that a video model can take part in planning.
GRASP	gradient planning	2026	Uses lifted states and gradient reshaping to improve long-horizon learned-dynamics planning.
WEAVER	manipulation world model	2026	Emphasizes fidelity, long-horizon consistency, and efficiency together, so that the world model can serve evaluation, improvement, and planning.
EV-WM	event verification	2026	Converts visual futures into verifiable task events so that the planner does not trust a future merely because it looks plausible.
tau0-WM	video-action WM	2026	Performs test-time action evaluation through candidate actions, consistency filtering, and simulator rectification.
PointWorld	3D point-flow WM	2026	Represents action effects as 3D point flow, bringing state prediction closer to physical geometry.
DreamZero / WAM	world-action model	2026	Models future world state and action jointly, letting a video diffusion prior take part in closed-loop control.
Mem-World	persistent rollout memory	2026	Uses surfel-indexed memory to maintain object consistency in wrist-camera rollouts.
World-Ego Modeling	world / ego factorization	2026	Models persistent world evolution separately from robot-centric interaction dynamics.
Slot-MPC	object-centric MPC	2026	Represents image state as object slots and runs predictive control in slot space.
Psi-Zero	humanoid VLA	2026	Learns task semantics from human egocentric video, then learns embodiment-specific actions from robot data.
EgoHumanoid	robot-free human demos	2026	Uses view / action alignment to turn in-the-wild egocentric human demos into a humanoid training source.
ULTRA	multimodal humanoid control	2026	Unifies dense reference tracking, operator command, and sparse egocentric goal following.
VIRAL	visual sim-to-real	2025 / 2026	Trains an RGB humanoid loco-manipulation policy with large-scale simulation, domain randomization, and teacher-student training.
MEM	embodied memory	2026	Uses short-term visual memory and long-term semantic memory to maintain task state over long tasks.
KEMO	event keyframe memory	2026	Prioritizes event keyframes and reduces dense history so that a VLA knows the task phase under similar observations.
EventVLA	visual evidence memory	2026	Treats memory as task-state evidence, handling key facts that must still be remembered after occlusion.
LoHo-Manip	task manager + VLA	2026	Uses a high-level task manager that repeatedly updates the remaining subtasks and a visual trace, letting a short-horizon VLA execute long tasks.

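A more precise question: state-dependent trust horizon

Among these directions, one research question suited to a small team as a first project is whether the trustworthy rollout length of a learned robot world model depends on state, contact regime, and task phase. Fixed-horizon MPC implicitly assumes that every state can be predicted equally far ahead. Contact, occlusion, changes in object relations, and task-event switches may break this assumption.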

This question can be turned into an experiment. Using a TD-MPC-style or PRISM-WM-style world model, start from real states and roll out different candidate action sequences for 1 to H steps. At each horizon, record latent drift, reward error, value error, event error, object-state consistency, and final task success. Then compare planners such as fixed horizon, ensemble uncertainty, event-verifier gating, and adaptive horizon.
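A minimal sketch of the measurement step, assuming a toy 1-D environment with a wall standing in for a contact regime. All names (`true_step`, `model_step`, `per_horizon_error`, `trust_horizon`) and the tolerance are hypothetical; a real study would record the full set of metrics listed above rather than a single state error.

```python
import numpy as np

def true_step(x, u):
    # Hypothetical environment with a wall at x = 1.0 (a crude contact regime).
    return min(x + u, 1.0)

def model_step(x, u):
    # Hypothetical learned model that ignores the wall.
    return x + u

def per_horizon_error(x0, actions):
    # Roll the model and the environment from the same real state and
    # record the absolute state error at every horizon 1..H.
    x_true, x_model, errors = x0, x0, []
    for u in actions:
        x_true = true_step(x_true, u)
        x_model = model_step(x_model, u)
        errors.append(abs(x_model - x_true))
    return np.array(errors)

def trust_horizon(errors, tol):
    # Number of leading steps whose error stays within tol.
    exceeded = np.nonzero(errors > tol)[0]
    return len(errors) if exceeded.size == 0 else int(exceeded[0])

actions = np.full(8, 0.2)
h_free = trust_horizon(per_horizon_error(-2.0, actions), tol=0.05)    # far from the wall
h_contact = trust_horizon(per_horizon_error(0.5, actions), tol=0.05)  # close to the wall
```

In this toy case the same model and the same action sequence give a trust horizon of 8 steps in free space and 2 steps near the wall, which is the kind of state dependence the experiment is meant to detect.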

Candidate experiment

The first step does not need to solve all of long-horizon planning. First measure one phenomenon: whether different states, contact regimes, and task phases have systematically different trust horizons.

If the answer is yes, then ask whether an adaptive horizon or a verifier-gated planner can improve closed-loop success without retraining the base world model.
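One way such a gate could look, sketched with ensemble disagreement as the trust signal. This is an illustrative assumption rather than a method from any cited paper: the ensemble is hand-built, `make_ensemble` and `adaptive_horizon` are hypothetical names, and the threshold `max_std` would need tuning on real data.

```python
import numpy as np

def make_ensemble():
    # Hypothetical ensemble: members agree in free space and disagree about
    # motion past x = 1.0, standing in for an uncertain contact regime.
    def member(gain):
        def step(x, u):
            nxt = x + u
            return nxt if nxt <= 1.0 else 1.0 + gain * (nxt - 1.0)
        return step
    return [member(g) for g in (0.0, 0.5, 1.0)]

def adaptive_horizon(ensemble, x0, actions, max_std):
    # Roll every member forward under the same actions and stop at the first
    # step where the spread of predicted states exceeds max_std.
    states = np.full(len(ensemble), x0, dtype=float)
    for h, u in enumerate(actions):
        states = np.array([f(x, u) for f, x in zip(ensemble, states)])
        if states.std() > max_std:
            return h  # number of steps that passed the gate
    return len(actions)

ensemble = make_ensemble()
actions = np.full(8, 0.2)
h_free = adaptive_horizon(ensemble, -2.0, actions, max_std=0.02)
h_contact = adaptive_horizon(ensemble, 0.5, actions, max_std=0.02)
```

The base models are used as-is; only the number of rollout steps the planner is allowed to score changes with the state, which matches the "no retraining" condition of the question above.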

Reading order

If the goal is to study long horizon in robot world models, start with the ten papers below. They cover trunk, architecture, planner, event verifier, memory, object representation, and policy-side pressure.

TD-MPC: understand short-horizon MPC, task-oriented latent, and terminal value.
PRISM-WM: understand why contact-rich hybrid dynamics cause rollout drift.
GRASP: understand how the planner itself fails under a long horizon.
WEAVER: understand why a manipulation world model needs fidelity, consistency, and efficiency at the same time.
EV-WM: understand why event verification is closer to task success than visual plausibility.
tau0-WM: understand how a video-action world model performs test-time action evaluation.
Mem-World: understand how geometry-indexed memory maintains object consistency.
Slot-MPC: understand how an object-centric state changes the state space for planning.
LoHo-Manip: understand how a task manager breaks a long task into repeated short-horizon execution.
World Models for Robotic Manipulation survey: understand the open challenges and evaluation standards in manipulation settings.