We propose novel metrics, tasks, visualization tools and methods for “policy mobilization,” the problem of taking a non-mobile manipulation policy and finding a proper initial robot pose from which to execute it on a mobile platform.
Introducing
Given an existing manipulation policy trained on data collected from limited viewpoints, policy mobilization aims to find an optimal robot pose in an unseen environment to successfully execute this policy.
Highlights
Chaining a sequence of manipulation skills in-the-wild.
Generalizing zero-shot to unseen scene layouts.
Operating in large spaces that the policies have not explored during training.
Adapting to novel object heights.
How we approach policy mobilization
Our method for policy mobilization follows four simple steps: you provide a pre-trained robot manipulation policy together with its training data, and the method handles the rest (a minimal code sketch of the pose search follows the list below).
Drive the robot around to capture the test-time scene.
Build a 3D Gaussian Splatting model from the capture.
Use our novel hybrid score function to rate robot poses.
Find the best robot pose with sampling-based optimization.
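To make steps 3 and 4 concrete, here is a minimal sampling-based search in the style of the cross-entropy method. It assumes a score_pose(pose) callable that wraps steps 2 and 3, i.e., renders the 3D Gaussian Splatting model from a candidate base pose (x, y, yaw) and returns the hybrid score; the function names, hyperparameters, and the specific CEM-style update are illustrative, not our exact implementation.

import numpy as np

def find_best_base_pose(score_pose, n_iters=10, n_samples=256, n_elite=32):
    # Sampling-based search over SE(2) base poses (x, y, yaw) that maximizes
    # the pose score. The yaw averaging below is naive; a real implementation
    # should handle angle wrap-around.
    mean = np.zeros(3)                       # initial pose guess
    std = np.array([1.0, 1.0, np.pi / 2])    # broad initial search region
    for _ in range(n_iters):
        poses = np.random.normal(mean, std, size=(n_samples, 3))
        scores = np.array([score_pose(p) for p in poses])
        elite = poses[np.argsort(scores)[-n_elite:]]   # top-scoring candidates
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-3
    return mean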
Comparisons
In real-world experiments, we compare our method with two baselines: BC w/ Nav and a Human baseline. For BC w/ Nav, we train an end-to-end imitation learning policy on combined navigation and manipulation data. For the Human baseline, we verbally instruct users to drive the robot to the starting base pose they believe is best for successfully executing each manipulation task, without telling them anything about how the policy was trained. All videos below are played at 4x speed.
Introducing
To motivate further research on policy mobilization, we propose the Mobi-π framework:
5 simulated tasks based on RoboCasa
3 navigation and imitation learning baselines
2 mobilization feasibility metrics
To effectively study the policy mobilization problem, we develop a suite of simulation environments based on RoboCasa as a benchmarking setup. We pick five single-stage manipulation tasks: Close Door, Close Drawer, Turn on Faucet, Turn on Microwave, and Turn on Stove.
We study three baselines for policy mobilization and divide them into two categories. Baselines in the first category navigate to the object of interest without considering the manipulation policy's capabilities; they are not policy-aware. This category includes LeLaN and VLFM. Baselines in the second category are policy-aware and leverage large-scale data to connect navigation with manipulation. As a representative method in this category, we use BC w/ Nav, a Behavior Transformer trained to jointly perform navigation and manipulation using combined demonstrations.
We quantify how feasible it is to mobilize a policy from both spatial and visual perspectives. We evaluate these feasibility metrics on the simulated tasks and discuss how they correlate with experimental results.
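As a rough illustration of what a spatial feasibility metric can look like (a simplified proxy, not the exact definition used in our paper): measure how often the frozen manipulation policy still succeeds when its base pose is perturbed around a known-good pose, so tasks that tolerate larger perturbations are easier to mobilize. The function and argument names below are hypothetical.

import numpy as np

def spatial_tolerance(rollout_success, nominal_pose, radius=0.3, n=50, seed=0):
    # Illustrative spatial proxy (not the paper's exact metric): success rate
    # of the frozen policy when the base pose (x, y, yaw) is perturbed
    # uniformly within `radius` (meters for x/y, radians for yaw).
    rng = np.random.default_rng(seed)
    offsets = rng.uniform(-radius, radius, size=(n, 3))
    return float(np.mean([rollout_success(nominal_pose + o) for o in offsets]))

A visual counterpart could, for instance, compare how similar test-time observations are to the views the policy saw during training.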
Team
This work would not have been possible without this awesome team.
Jingyun completed this work during an internship with Toyota Research Institute.
* Isabella and Brandon contributed equally.
FAQ
Indeed, one approach to learning mobile manipulation is to train a policy from a dataset that includes both navigation and manipulation data. However, training such a policy typically requires large amounts of training data, since the policy needs to not only generalize to different navigation and manipulation scenarios but also seamlessly coordinate navigation and manipulation. We showed in our simulation experiments that an imitation learning baseline that learns an end-to-end policy for navigation and manipulation fails to perform well in unseen room layouts despite learning from 5x more training data. Compared to training an end-to-end mobile manipulation policy, our policy mobilization framework provides a more data-efficient approach to learning mobile manipulation. Meanwhile, our problem formulation complements existing efforts to improve the robustness of manipulation policies and remains compatible with them.
Training a mobile manipulation policy in simulation from 3DGS requires creating an interactive and physically accurate simulation from the 3DGS model, which is an unsolved problem.
Our sim setup includes unseen test objects. We find that our method, which utilizes DINO dense descriptors to score robot poses, is capable of ignoring these irrelevant objects.
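To give a feel for why dense descriptors help here, below is a rough sketch of a patch-level similarity score between a policy training view and a rendered candidate view (assuming a DINOv2 backbone from torch.hub; the exact backbone, image preprocessing, and matching scheme in our method may differ). Because each training-view patch only keeps its best match in the rendered view, patches from irrelevant test-time objects tend not to affect the score.

import torch
import torch.nn.functional as F

# Assumption: DINOv2 ViT-S/14 from torch.hub as a stand-in dense-feature extractor.
model = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").eval()

@torch.no_grad()
def dense_descriptors(img):
    # img: (1, 3, H, W) normalized image with H and W multiples of 14.
    feats = model.forward_features(img)["x_norm_patchtokens"]  # (1, N, C)
    return F.normalize(feats[0], dim=-1)                        # (N, C)

@torch.no_grad()
def view_similarity(train_img, rendered_img):
    # For each training-view patch, take its best cosine match among the
    # rendered-view patches, then average over patches.
    f_train = dense_descriptors(train_img)
    f_render = dense_descriptors(rendered_img)
    sim = f_train @ f_render.T          # (N_train, N_render) cosine similarities
    return sim.max(dim=1).values.mean().item()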
Yes. Our learned 3DGS models have artifacts and surface color inconsistencies, yet the method still performs well.
Please try to first find answers in our paper and supplementary materials. If you still cannot find an answer, you are welcome to contact us via email.
Found our work useful?
@article{yang2025mobipi,
  title={Mobi-$\pi$: Mobilizing Your Robot Learning Policy},
  author={Yang, Jingyun and Huang, Isabella and Vu, Brandon and Bajracharya, Max and Antonova, Rika and Bohg, Jeannette},
  journal={arXiv preprint arXiv:2505.23692},
  year={2025}
}