Automated Reward Design (ARD) aims to replace manual reward engineering in reinforcement learning with language-driven reward function synthesis. However, existing approaches based on large language models (LLMs) remain inherently correlation-driven, relying on iterative environmental feedback to refine reward hypotheses for each specific task. This paradigm not only results in inefficient reasoning but also makes LLMs susceptible to semantically plausible yet causally spurious reward components, leading to ineffective optimization. To address these limitations, we propose the Causal Reward World Model (CRWM), which explicitly models the causal topological relationships between candidate reward components and task-targeted physical variables through offline pre-training on multi-task interaction data. Based on a coarse-to-fine pre-training strategy, we introduce a Joint Optimization Module that integrates Explicit Mechanism Decoupling with Confidence-Aware Soft Fusion to refine coarse structural priors using micro-level trajectories, thereby constructing a robust and interpretable causal skeleton. During inference, LLMs leverage CRWM as a task-invariant causal prior to constrain reward generation, enabling zero-shot reward function design. Our work opens up a new white-box paradigm for the ARD problem. Extensive experiments on complex continuous control benchmarks demonstrate that CRWM generates executable reward functions without feedback-driven reward refinement, significantly reducing the design latency for acquiring new robotic skills while matching or surpassing state-of-the-art performance, and that it generalizes well across unseen tasks and diverse robotic embodiments.
Visualization of the CRWM. The background graph shows the task-invariant causal skeleton, where the red node denotes the score. The four insets present subgraphs corresponding to different tasks, illustrating the relevant structures within the CRWM. Directed edges indicate causal relationships, and edge weights represent the corresponding causal effects.
Overview of the Causal Reward World Model (CRWM) framework. The pipeline consists of three synergistic phases: (1) Structural Prior Extraction: The offline interventional dataset is split into macroscopic and microscopic streams. A pre-trained causal foundation model, LimiX, processes the macroscopic data to extract the initial coarse structural prior. (2) Joint Optimization Module: The Explicit Mechanism Decoupling (EMD) module constructs the instantaneous topology by combining a learnable causal skeleton with state-dependent transient physical interactions. Subsequently, the Joint Optimization Module uses microscopic data and Confidence-Aware Soft Fusion to refine the coarse structural prior, account for transient physical interactions during reconstruction, and distill the final CRWM. (3) Causal-ARD: The final CRWM is combined with unseen task information and used as a causal prior for LLM-based reward generation. Through explicit causal pruning, the LLM synthesizes executable zero-shot reward functions.
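To make the three phases more concrete, the following is a minimal sketch under stated assumptions, not the paper's implementation. The text does not give the functional forms, so everything here is hypothetical: the convex per-edge blend in `soft_fuse`, the additive clipped combination in `instantaneous_topology`, the reachability rule and `threshold` in `prune_components`, and all variable names and numbers in the toy example. Edge `weights[i][j]` is read as the causal effect of variable `i` on variable `j`.

```python
def soft_fuse(prior, learned, confidence):
    """Assumed form of confidence-aware soft fusion: a per-edge convex blend.
    confidence[i][j] in [0, 1] is how much the coarse prior is trusted."""
    n = len(prior)
    return [[confidence[i][j] * prior[i][j]
             + (1.0 - confidence[i][j]) * learned[i][j]
             for j in range(n)] for i in range(n)]


def instantaneous_topology(skeleton, transient):
    """Assumed form of mechanism decoupling: static skeleton plus a
    state-dependent transient term, clipped to [0, 1]."""
    n = len(skeleton)
    return [[min(1.0, max(0.0, skeleton[i][j] + transient[i][j]))
             for j in range(n)] for i in range(n)]


def prune_components(weights, names, target, threshold=0.3):
    """Keep variables that reach `target` through edges with |weight| >= threshold."""
    target_idx = names.index(target)
    reached = {target_idx}
    frontier = [target_idx]
    while frontier:
        j = frontier.pop()
        for i in range(len(names)):
            if i not in reached and abs(weights[i][j]) >= threshold:
                reached.add(i)
                frontier.append(i)
    return [names[i] for i in range(len(names))
            if i in reached and i != target_idx]


# Toy example with hypothetical variables; "score" plays the role of the red node.
names = ["grip_force", "object_height", "arm_color", "score"]
n = len(names)
prior = [[0.0] * n for _ in range(n)]
learned = [[0.0] * n for _ in range(n)]
confidence = [[0.5] * n for _ in range(n)]

prior[0][1], learned[0][1] = 0.8, 0.7    # grip_force -> object_height
prior[1][3], learned[1][3] = 0.9, 0.9    # object_height -> score
prior[2][3], learned[2][3] = 0.5, 0.05   # arm_color -> score (spurious in the prior)
confidence[2][3] = 0.2                   # low trust in the spurious prior edge

fused = soft_fuse(prior, learned, confidence)
kept = prune_components(fused, names, target="score")

# A transient contact effect that exists only in the current state.
transient = [[0.0] * n for _ in range(n)]
transient[0][3] = 0.4
instant = instantaneous_topology(fused, transient)
```

In this toy setup the fused weight on `arm_color -> score` falls below the threshold, so only `grip_force` and `object_height` would be handed to the LLM as candidate reward components. The transient term changes `instant` for the current state without altering the fused skeleton used for pruning.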
[Figure: side-by-side panels comparing ZS-LLM and CRWM (evolutionary-search-free) on ShadowHandCatchAbreast, ShadowHandCatchOver2Underarm, ShadowHandPen, PickCube, TurnFaucet, and OpenCabinetDrawer.]
[Figure: panels for PickCube, ShadowHandLiftUnderarm, and ShadowHandScissors.]
| Method | AUC (Mean) | Reward Stability |
|---|---|---|
| Baseline | 53.9% | Low |
| CRWM | 96.5% | High |
In this work, we present the Causal Reward World Model (CRWM) for zero-shot automated reward design. Instead of relying on iterative reward refinement, CRWM distills a reusable reward-relevant causal skeleton from offline multi-task data and uses it as an explicit structural prior for LLM-based reward generation. By accounting for state-dependent transient interactions during distillation, CRWM guides the LLM to prune spurious reward terms and synthesize executable rewards in a single pass. Experiments across unseen tasks, different embodiments, and real-world settings show that CRWM-guided rewards achieve strong performance without deployment-time evolutionary search. These results suggest that incorporating reward-relevant causal structure is a promising direction for improving the robustness and efficiency of automated reward design.