Mobi-π

Mobilizing Your Robot Learning Policy

TL;DR

We propose novel metrics, tasks, visualization tools, and methods for “policy mobilization,” the problem of taking a non-mobile manipulation policy and finding a suitable initial robot pose from which to execute it on a mobile platform.

Introducing

Policy Mobilization

Given an existing manipulation policy trained on data collected from limited viewpoints, policy mobilization aims to find an optimal robot pose in an unseen environment to successfully execute this policy.


Highlights

What Policy Mobilization Enables

Chaining a sequence of manipulation skills in the wild.

Generalizing zero-shot to unseen scene layouts.

Operating in large spaces that the policies have not explored during training.

Adapting to novel object heights.

How we approach policy mobilization

Our Recipe, Simple Steps

Our method for policy mobilization follows four simple steps. You provide a pre-trained robot manipulation policy with its accompanying training data; the method handles the rest.

1. Capture: Drive the robot around to capture the test-time scene.

2. Splat: Build a 3D Gaussian Splatting model from the capture.

3. Score: Use our novel hybrid score function to rate robot poses.

4. Optimize: Find the best robot pose with sampling-based optimization.
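
To make the recipe concrete, here is a minimal, hypothetical Python sketch of how the score and optimize steps could fit together. The splat, encoder, and collision_checker objects and their methods are illustrative placeholders, not our released code: splat.render stands in for rendering the robot camera view at a candidate base pose from the 3DGS model, encoder.features for extracting dense visual descriptors (e.g., DINO), and the loop for the sampling-based pose optimization.

import numpy as np

def feature_similarity(a, b):
    # Cosine similarity between two flattened dense feature maps.
    a, b = np.ravel(a), np.ravel(b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def hybrid_score(pose, splat, encoder, train_features, collision_checker):
    # Score a candidate base pose: spatial validity plus visual match to training views.
    if collision_checker.in_collision(pose):   # spatial term: the pose must be valid
        return -np.inf
    rendered = splat.render(pose)              # novel view synthesized by the 3DGS model
    feats = encoder.features(rendered)         # dense descriptors of the rendered view
    return max(feature_similarity(feats, f) for f in train_features)

def optimize_base_pose(splat, encoder, train_features, collision_checker,
                       num_iters=10, num_samples=256, elite_frac=0.1):
    # Cross-entropy-style sampling search over SE(2) base poses (x, y, yaw).
    mean, std = np.zeros(3), np.array([1.0, 1.0, np.pi])
    for _ in range(num_iters):
        samples = mean + std * np.random.randn(num_samples, 3)
        scores = np.array([hybrid_score(p, splat, encoder, train_features, collision_checker)
                           for p in samples])
        elites = samples[np.argsort(scores)[-max(1, int(elite_frac * num_samples)):]]
        mean, std = elites.mean(axis=0), elites.std(axis=0) + 1e-3
    return mean                                # base pose from which to run the policy

In this sketch the visual term simply keeps the best match against any training view; the exact form of our hybrid score function is described in the paper.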

Comparisons

Real Robot Results

In real-world experiments, we compare our method with two baselines: BC w/ Nav and a Human baseline. For BC w/ Nav, we train an end-to-end imitation learning policy from combined navigation and manipulation data. For the Human baseline, we verbally ask users to manually drive the robot to what they believe is the optimal starting base pose for executing each manipulation task successfully, without telling them how the policy was trained. All videos below are played at 4x speed.

Videos: Human, BC w/ Nav, Ours.

Introducing

The Mobi-π Framework

To motivate further research on policy mobilization, we propose the Mobi-π framework:

5 simulated tasks based on RoboCasa

3 navigation and imitation learning baselines

2 mobilization feasibility metrics

Simulation Task Suite

To effectively study the policy mobilization problem, we develop a suite of simulation environments based on RoboCasa as a benchmarking setup. We pick five single-stage manipulation tasks: Close Door, Close Drawer, Turn on Faucet, Turn on Microwave, and Turn on Stove.

Baselines

We study three baselines for policy mobilization, divided into two categories. The first category navigates to the object of interest without considering the manipulation policy's capabilities; these methods are not policy-aware and include LeLaN and VLFM. The second category is policy-aware and leverages large-scale data to connect navigation with manipulation. As a representative method in this category, we use BC w/ Nav, a Behavior Transformer trained to jointly perform navigation and manipulation from combined demonstrations.

Metrics

We quantify, from spatial and visual perspectives, how feasible it is to mobilize a given policy. We evaluate these feasibility metrics on the simulated tasks and discuss how they correlate with experimental results.
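
As a purely illustrative sketch (these are not the definitions used in the paper), a spatial metric could measure how much free space around the target admits valid base poses, and a visual metric could measure how well the policy tolerates viewpoint shifts:

import numpy as np

def spatial_feasibility(valid_pose_mask):
    # Hypothetical spatial metric: fraction of sampled base poses near the target
    # that are collision-free and keep the object inside the policy's workspace.
    return float(np.mean(valid_pose_mask))

def visual_feasibility(success_under_pose_noise):
    # Hypothetical visual metric: average policy success rate when the starting
    # base pose (and hence the camera viewpoint) is perturbed around a good pose.
    return float(np.mean(success_under_pose_noise))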

Learn More In Our Paper

Team

Meet Our Team

This work would not be possible without the awesome team.
Jingyun completed this work during an internship with Toyota Research Institute.
* Isabella and Brandon contributed equally.

Jingyun Yang, Stanford University

Isabella Huang*, Toyota Research Institute

Brandon Vu*, Stanford University

Max Bajracharya, Toyota Research Institute

Rika Antonova, University of Cambridge

Jeannette Bohg, Stanford University

FAQ

Questions? Answers.

Why should I care about policy mobilization? Why not simply train a mobile manipulation policy from a large dataset?

Indeed, one approach to learning mobile manipulation is to train a policy on a dataset that includes both navigation and manipulation data. However, training such a policy typically requires large amounts of data, since the policy must not only generalize to different navigation and manipulation scenarios but also seamlessly coordinate the two. In our simulation experiments, an imitation learning baseline that learns an end-to-end policy for navigation and manipulation fails to perform well in unseen room layouts despite being trained on 5x more data. Compared to training an end-to-end mobile manipulation policy, our policy mobilization framework therefore offers a more data-efficient approach to mobile manipulation. Meanwhile, our problem formulation complements existing efforts to improve the robustness of manipulation policies and remains compatible with them.

Why not train a mobile manipulation policy in simulation and transfer to the real world?

Training a mobile manipulation policy in simulation from a 3DGS reconstruction requires creating an interactive and physically accurate simulation from the 3DGS model, which remains an unsolved problem.

How does the approach handle distractors?

Our simulation setup includes unseen objects at test time. We find that our method, which uses DINO dense descriptors to score robot poses, is able to ignore these irrelevant objects.
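
Here is a rough sketch of why dense descriptors help (hypothetical, not our exact scoring code): if each training-view patch is matched only to its most similar patch in the rendered test view, patches introduced by distractor objects in the test view are simply never selected, so they barely affect the score.

import torch
import torch.nn.functional as F

def patch_match_score(train_patches, test_patches):
    # Distractor-tolerant similarity between two sets of dense patch descriptors
    # (e.g., DINO features), each of shape (num_patches, dim).
    train = F.normalize(train_patches, dim=-1)
    test = F.normalize(test_patches, dim=-1)
    sim = train @ test.T                 # (n_train, n_test) cosine similarities
    best = sim.max(dim=1).values         # best test-view match for each training patch
    return best.mean().item()            # unmatched (distractor) test patches are ignored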

Can the method handle imperfect scene reconstruction?

Yes. Our learned 3DGS models have artifacts and surface color inconsistencies, yet the method still performs well.

I have other questions. How can I get an answer?

Please try to first find answers in our paper and supplementary materials. If you still cannot find an answer, you are welcome to contact us via email.

Found our work useful?

Cite Us

@article{yang2025mobipi,
  title={Mobi-$\pi$: Mobilizing Your Robot Learning Policy},
  author={Yang, Jingyun and Huang, Isabella and Vu, Brandon and Bajracharya, Max and Antonova, Rika and Bohg, Jeannette},
  journal={arXiv preprint arXiv:2505.23692},
  year={2025}
}