SimToolReal: An Object-Centric Policy for Zero-Shot Dexterous Tool Manipulation

Stanford University

Abstract

The ability to manipulate tools significantly expands the set of tasks a robot can perform. Yet tool manipulation is a challenging class of dexterity, requiring grasps of thin objects, in-hand rotation, and forceful interaction. Since collecting teleoperation data for these behaviors is difficult, sim-to-real reinforcement learning (RL) is a promising alternative. However, prior approaches typically require substantial engineering effort to model objects and tune reward functions for each task. In this work, we propose SimToolReal, a step towards general sim-to-real RL policies for tool manipulation. Instead of focusing on a single object and task, we procedurally generate a large variety of tool-like object primitives in simulation and train a single RL policy with the universal goal of manipulating each object to random goal poses. This enables SimToolReal to perform general dexterous tool manipulation at test time without any object- or task-specific training. We demonstrate that SimToolReal outperforms prior retargeting and fixed-grasp methods by 37% while matching the performance of specialist RL policies trained on specific target objects and tasks. Finally, we show that SimToolReal generalizes across a diverse set of everyday tools, achieving strong zero-shot performance over 120 real-world rollouts spanning 24 tasks, 12 object instances, and 6 tool categories.
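As a rough illustration of this training recipe (not the authors' released code; all names, shape ranges, and workspace bounds below are our own assumptions), procedurally generating a tool-like primitive and sampling a random goal pose might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_tool_primitive():
    """Hypothetical procedural tool: a thin handle with an attached head.

    Returns box half-extents (meters) for a handle and a head, plus the
    head's offset along the handle axis. All ranges are illustrative.
    """
    handle = np.array([rng.uniform(0.05, 0.15),   # half-length
                       rng.uniform(0.005, 0.02),  # thin cross-section
                       rng.uniform(0.005, 0.02)])
    head = rng.uniform(0.01, 0.05, size=3)
    head_offset = np.array([handle[0] + head[0], 0.0, 0.0])
    return handle, head, head_offset

def sample_goal_pose():
    """Random 6-DoF goal: position in a workspace box, uniform orientation."""
    pos = rng.uniform([-0.1, -0.1, 0.05], [0.1, 0.1, 0.25])
    quat = rng.normal(size=4)
    quat /= np.linalg.norm(quat)  # normalized Gaussian -> uniform unit quaternion
    return pos, quat
```

Each training episode would pair one sampled primitive with a stream of such goal poses, so the single policy never specializes to any one object or task.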

Zero-Shot Dexterous Manipulation

Dynamic spinning, in-hand rotation, and stable grasps on unseen tools and tasks


Real-World Rollouts

Real-world manipulation rollouts across diverse objects and tasks, organized by tool category

Universal Training Objective: Any Goal-Pose Reaching
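To make this objective concrete, here is a minimal sketch of a dense any-goal-pose reward; the weights and exponential shaping are illustrative assumptions, not the paper's actual reward function:

```python
import numpy as np

def pose_reward(obj_pos, obj_quat, goal_pos, goal_quat,
                w_pos=1.0, w_rot=0.5):
    """Dense reward for driving an object toward a goal pose.

    Illustrative shaping only: exponentiated position error plus a term
    based on the geodesic rotation distance between unit quaternions.
    """
    pos_err = np.linalg.norm(obj_pos - goal_pos)
    # |<q1, q2>| is in [0, 1]; its arccos is half the rotation angle.
    dot = np.clip(abs(np.dot(obj_quat, goal_quat)), -1.0, 1.0)
    rot_err = 2.0 * np.arccos(dot)  # radians in [0, pi]
    return w_pos * np.exp(-5.0 * pos_err) + w_rot * np.exp(-rot_err)
```

Because the goal pose is an input to both the reward and the policy, a single network can be trained to reach any pose rather than one task-specific target.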

Track Goal Trajectory on Unseen Tool at Test-Time
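Trajectory tracking at test time can reuse the same goal-conditioned policy by chaining waypoints. The sketch below assumes hypothetical `env` and `policy` interfaces (including `obj_pos`/`obj_quat` observation keys) that are not from the paper:

```python
import numpy as np

def track_trajectory(env, policy, goal_poses, pos_tol=0.02, rot_tol=0.2):
    """Follow a goal-pose trajectory with a goal-conditioned policy.

    Sketch: feed each waypoint to the policy as its goal and advance to
    the next waypoint once the object is within tolerance.
    """
    obs = env.reset()
    for goal_pos, goal_quat in goal_poses:
        reached = False
        while not reached:
            action = policy(obs, goal_pos, goal_quat)
            obs = env.step(action)
            pos_err = np.linalg.norm(obs["obj_pos"] - goal_pos)
            dot = np.clip(abs(np.dot(obs["obj_quat"], goal_quat)), -1.0, 1.0)
            rot_err = 2.0 * np.arccos(dot)
            reached = pos_err < pos_tol and rot_err < rot_tol
```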

Common Failure Modes

We break down the different reasons for failure. The policy shows strong recovery behavior from many of these failures.

Acknowledgements

This work is supported by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), an ONR Young Investigator Award, the National Science Foundation (NSF) under Grant Numbers 2153854, 2327974, 2312956, 2327973, and 2342246, and the Natural Sciences and Engineering Research Council of Canada (NSERC) under Award Number 526541680. We thank Sharpa for the donation of the Sharpa hand and for the technical support provided by their team, specifically Kaifeng Zhang, Wenjie Mei, Yi Zhou, Yunfang Yang, Jie Yin, and Jason Lee.