What Makes Tool-Use Hard?

Tool use is a challenging class of dexterity: it requires grasping thin objects from flat surfaces, rotating objects in-hand to functional configurations, and maintaining stable grasps during interactions.

Abstract

The ability to manipulate tools significantly expands the set of tasks a robot can perform. Yet tool manipulation represents a challenging class of dexterity, requiring robots to grasp thin objects, rotate objects in-hand, and apply forceful interactions. Since collecting teleoperation data for these behaviors is difficult, sim-to-real reinforcement learning (RL) is a promising alternative. However, prior approaches typically require substantial engineering effort to model objects and tune reward functions for each task. In this work, we propose SimToolReal, taking a step towards generalizing sim-to-real RL policies for tool manipulation. Instead of focusing on a single object and task, we procedurally generate a large variety of tool-like object primitives in simulation and train a single RL policy with the universal goal of manipulating each object to random goal poses. This approach enables SimToolReal to perform general dexterous tool manipulation at test time without any object- or task-specific training. We demonstrate that SimToolReal outperforms prior retargeting and fixed-grasp methods by 37% while matching the performance of specialist RL policies trained on specific target objects and tasks. Finally, we show that SimToolReal generalizes across a diverse set of everyday tools, achieving strong zero-shot performance over 120 real-world rollouts spanning 24 tasks, 12 object instances, and 6 tool categories.

SimToolReal in Action (Sound On 🔊)

Zero-shot dexterous manipulation across unseen tools and tasks


Results by Tool Category

Select a tool category to watch manipulation across diverse objects and tasks

Universal Training Objective: Any Goal-Pose Reaching

Instead of training on specific objects and tasks, we procedurally generate a large variety of tool-like primitives in simulation and train a single RL policy to manipulate each object to random goal poses.
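To make this objective concrete, below is a minimal sketch of what procedural primitive generation, goal sampling, and a goal-reaching reward could look like. The shape parameters, pose ranges, and reward weights are illustrative assumptions, not the exact values used in training.

```python
import numpy as np

def sample_tool_primitive(rng: np.random.Generator) -> dict:
    """Sample a tool-like primitive: a thin handle with an optional head
    (hammer- or spatula-like shapes). All ranges are illustrative assumptions."""
    handle = {
        "type": "box",
        "size": rng.uniform([0.10, 0.01, 0.01], [0.25, 0.03, 0.03]),  # long and thin
    }
    head = None
    if rng.random() < 0.7:  # most sampled tools get a head attached to one end
        head = {
            "type": rng.choice(["box", "cylinder"]),
            "size": rng.uniform([0.03, 0.02, 0.02], [0.08, 0.05, 0.05]),
        }
    return {"handle": handle, "head": head, "mass": rng.uniform(0.05, 0.5)}

def sample_goal_pose(rng: np.random.Generator) -> np.ndarray:
    """Sample a random 6-DoF goal pose (position + axis-angle rotation)."""
    position = rng.uniform([-0.1, -0.1, 0.0], [0.1, 0.1, 0.3])
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(0.0, np.pi)
    return np.concatenate([position, axis * angle])

def goal_reaching_reward(obj_pose, goal_pose, pos_w=1.0, rot_w=0.5):
    """Dense reward: negative weighted pose error between object and goal."""
    pos_err = np.linalg.norm(obj_pose[:3] - goal_pose[:3])
    rot_err = np.linalg.norm(obj_pose[3:] - goal_pose[3:])  # crude rotation distance
    return -(pos_w * pos_err + rot_w * rot_err)
```

Because every episode pairs a freshly sampled primitive with a freshly sampled goal pose, a single policy is pushed to cover the space of tool geometries and reorientations rather than memorizing one object or task.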

Inference: Track Any Goal Trajectory on Unseen Tool

At test time, the policy generalizes zero-shot to unseen real-world tools without any object- or task-specific training, tracking goal trajectories provided by a human video demonstration.
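A minimal sketch of this test-time loop is shown below. The `policy`, `env`, observation fields, and step counts are hypothetical placeholders; the actual interface is defined by the released code.

```python
import numpy as np

def track_goal_trajectory(policy, env, goal_trajectory, steps_per_goal=20):
    """Track a sequence of goal poses (e.g. object poses extracted from a
    human video demonstration) with a goal-conditioned policy."""
    obs = env.reset()
    for goal_pose in goal_trajectory:
        for _ in range(steps_per_goal):
            # Condition the policy on proprioception, the current object pose,
            # and the next goal pose along the demonstrated trajectory.
            policy_input = np.concatenate([obs["proprio"], obs["object_pose"], goal_pose])
            action = policy(policy_input)  # e.g. joint targets for the hand and arm
            obs = env.step(action)
    return obs
```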

Introducing DexToolBench:
A Benchmark for Dexterous Tool-Use

We release a benchmark for evaluating policies on diverse tool-use tasks spanning 6 tool categories, 12 object instances, and 24 task trajectories. All evaluations are zero-shot: we never train on the target object or task in simulation. We open-source all simulation assets, training code, and evaluation code on GitHub.
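For reference, aggregating zero-shot results over such a benchmark could be organized as in the sketch below. The `run_rollout` callable and the task dictionary layout are hypothetical placeholders for the released evaluation code.

```python
from collections import defaultdict

def evaluate(run_rollout, benchmark_tasks, rollouts_per_task=5):
    """Aggregate zero-shot success rates per tool category.

    `run_rollout(task) -> bool` executes one rollout and reports success;
    5 rollouts over 24 tasks is consistent with the 120 reported rollouts.
    """
    successes = defaultdict(list)
    for task in benchmark_tasks:
        for _ in range(rollouts_per_task):
            successes[task["category"]].append(run_rollout(task))
    return {cat: sum(hits) / len(hits) for cat, hits in successes.items()}
```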

Common Failure Modes

We break down the different reasons for failure. Notably, the policy shows strong recovery behavior, often continuing the task after a failed attempt.

Acknowledgements

This work is supported by Stanford Human-Centered Artificial Intelligence (HAI), ONR Young Investigator Award, the National Science Foundation (NSF) under Grant Numbers 2153854, 2327974, 2312956, 2327973, and 2342246, and the Natural Sciences and Engineering Research Council of Canada (NSERC) under Award Number 526541680. We thank Sharpa for the donation of the Sharpa hand and for the technical support provided by their team, specifically Kaifeng Zhang, Wenjie Mei, Yi Zhou, Yunfang Yang, Jie Yin, and Jason Lee.