Grasping moving objects is a challenging task that combines multiple submodules, such as an object pose predictor and an arm motion planner. Each submodule operates under its own set of meta-parameters, for example, how far the pose predictor should look into the future (the look-ahead time) and the maximum amount of time the motion planner can spend planning a motion (the time budget). Many previous works assign fixed values to these parameters, either heuristically or through grid search; however, the optimal values vary from moment to moment within a single episode of dynamic grasping, depending on the current scene. In this work, we learn a meta-controller through reinforcement learning to control the look-ahead time and time budget dynamically. Our extensive experiments show that, compared to the strongest baseline, the meta-controller improves the grasping success rate (by up to 12% in the most cluttered environment) and reduces grasping time. Our meta-controller learns to reason about the reachable workspace and to maintain the predicted pose within the reachable region. In addition, it assigns a small but sufficient time budget to the motion planner. Our method can handle different target objects, trajectories, and obstacles. Despite being trained with only 3-6 randomly generated cuboidal obstacles, our meta-controller generalizes well to 7-9 obstacles and to more realistic out-of-domain household setups with unseen obstacle shapes.


Latest Paper Version: ArXiv. Code and instructions to download data coming soon.



* Equal Contribution
1 Columbia University
2 University of Pennsylvania

Supplementary Video


We evaluate our meta-controller in a comprehensive range of environments with different obstacles, trajectories, and targets. Our meta-controller is trained in a general setup with randomly generated cuboidal obstacles. We show that our meta-controller, despite being trained with only 3-6 obstacles, can successfully generalize to 7-9 obstacles. Note that with this many obstacles, the environment becomes extremely cluttered, as shown in the figure below. We also show that our meta-controller, trained in such a general setup, can work directly in specific environments with unseen obstacle shapes that mimic warehouse, household, and retail scenarios. Below are some dynamic grasping demos that use our meta-controller.

3-6 Random Blocks

In this setup, the obstacle poses are randomly sampled from a cuboidal volume that incorporates both the trajectory and robot arm. These random blocks are guaranteed not to block the conveyor trajectory, by setting up a protected area with a height of 30cm and a width of 20cm surrounding the trajectory. The number of obstacles is between 3 and 6. The x, y, and z dimensions of these rectangular obstacles are all uniformly sampled from [5cm, 15cm]. In this setup, the conveyor trajectory is sampled from all 4 trajectories.
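The sampling procedure described above can be sketched with simple rejection sampling. This is a minimal illustration, not the authors' implementation: the sampling volume, the assumption that the trajectory runs along the x axis at y = 0 and z = 0, the use of axis-aligned boxes (orientation is ignored), and the names `sample_obstacles` and `overlaps_protected` are all hypothetical. Only the obstacle count, the size range, and the protected-area dimensions come from the text.

```python
import random

PROTECTED_HALF_WIDTH = 0.10  # half of the 20 cm protected width, in metres
PROTECTED_HEIGHT = 0.30      # 30 cm protected height, in metres
# Hypothetical sampling volume (x, y, z ranges in metres); not from the source.
DEFAULT_VOLUME = ((-0.5, 0.5), (-0.5, 0.5), (0.0, 0.6))


def overlaps_protected(center, size):
    """Return True if an axis-aligned box intersects the protected corridor.

    Assumption: the trajectory runs along the x axis at y = 0, z = 0, so the
    corridor covers every x, |y| <= 0.10 and 0 <= z <= 0.30.
    """
    y_lo, y_hi = center[1] - size[1] / 2, center[1] + size[1] / 2
    z_lo, z_hi = center[2] - size[2] / 2, center[2] + size[2] / 2
    in_y = y_lo < PROTECTED_HALF_WIDTH and y_hi > -PROTECTED_HALF_WIDTH
    in_z = z_lo < PROTECTED_HEIGHT and z_hi > 0.0
    return in_y and in_z


def sample_obstacles(rng, volume=DEFAULT_VOLUME, max_tries=10_000):
    """Rejection-sample 3-6 boxes that stay clear of the protected corridor."""
    n_obstacles = rng.randint(3, 6)  # inclusive on both ends
    boxes = []
    for _ in range(max_tries):
        if len(boxes) == n_obstacles:
            break
        # Each dimension is drawn uniformly from [5 cm, 15 cm].
        size = tuple(rng.uniform(0.05, 0.15) for _ in range(3))
        center = tuple(rng.uniform(lo, hi) for lo, hi in volume)
        if not overlaps_protected(center, size):
            boxes.append((center, size))
    if len(boxes) < n_obstacles:
        raise RuntimeError("could not place all obstacles")
    return boxes


if __name__ == "__main__":
    for center, size in sample_obstacles(random.Random(0)):
        print(center, size)
```

Rejection sampling keeps the size and position distributions simple, at the cost of discarding candidates that touch the protected corridor.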

7-9 Random Blocks

This setup is the same as 3-6 Random Blocks, except that we increase the number of obstacles to 7-9. This setup is used only for evaluation.

Household Setup

In this setup, there are three shelves (one top shelf and two side shelves) between the robot arm and the object trajectory. For each episode, the conveyor motion is sampled from linear, sinusoidal, and rectangular trajectories; circular trajectories are excluded. The top shelf height is randomized between 40 and 60 cm, and the side shelf locations are randomized so that an empty middle space of length 45-85 cm is available. The setup is designed so that the middle of the trajectory is reachable while the start and end are blocked. Even though this setup is not as cluttered as 7-9 Random Blocks, it evaluates how well our meta-controller generalizes to completely new obstacle shapes and locations not seen during training. This setup is used only for evaluation.
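The per-episode randomization described above could be expressed as a small configuration sampler. This is only a sketch: the text says the values are "randomized" without naming a distribution, so uniform sampling is an assumption, and `HouseholdConfig` and `sample_household_config` are hypothetical names.

```python
import random
from dataclasses import dataclass

# Trajectory types used in the Household Setup (circular is excluded).
HOUSEHOLD_TRAJECTORIES = ("linear", "sinusoidal", "rectangular")


@dataclass
class HouseholdConfig:
    trajectory: str           # conveyor motion type
    top_shelf_height: float   # metres
    gap_length: float         # metres of free space between the side shelves


def sample_household_config(rng):
    """Draw one episode configuration (uniform sampling is an assumption)."""
    return HouseholdConfig(
        trajectory=rng.choice(HOUSEHOLD_TRAJECTORIES),
        top_shelf_height=rng.uniform(0.40, 0.60),
        gap_length=rng.uniform(0.45, 0.85),
    )


if __name__ == "__main__":
    print(sample_household_config(random.Random(0)))
```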

Cluttered Household Setup

In this setup, the conveyor follows a circular trajectory. There are 5 cylindrical obstacles surrounding the trajectory and a circular top shelf covering it. The top shelf consists of 15 identical convex trapezoidal parts. The positions of the cylindrical obstacles are randomly sampled. This setup is harder and more cluttered than the Household Setup. It is motivated by operation in extremely cluttered household and warehouse environments. This setup is used only for evaluation.


We observe three things that our meta-controller learns. (1) It reasons about the reachable workspace and, by dynamically controlling the look-ahead time and time budget, keeps the predicted pose and the planned motion within the most reachable region. (2) It learns to generate a small look-ahead time when the predicted trajectory is not accurate. (3) It learns to produce a small but sufficient time budget for motion planning.
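To make observation (1) concrete, the toy example below shows how the look-ahead time determines where the predicted pose lands relative to the reachable region. It is not the learned policy and not the paper's pose predictor: it assumes a constant-velocity target, a spherical reachable region centred on a robot base at the origin, and the hypothetical helpers `predict_position` and `reachable_look_aheads`.

```python
import math


def predict_position(position, velocity, look_ahead):
    """Constant-velocity prediction (an assumption, not the paper's predictor)."""
    return tuple(p + v * look_ahead for p, v in zip(position, velocity))


def reachable_look_aheads(position, velocity, candidates, reach_radius):
    """Keep the look-ahead times whose predicted position is within reach.

    The reachable region is modelled as a sphere of radius reach_radius
    around a robot base at the origin, which is a deliberate simplification.
    """
    base = (0.0, 0.0, 0.0)
    return [
        t for t in candidates
        if math.dist(predict_position(position, velocity, t), base) <= reach_radius
    ]


if __name__ == "__main__":
    # Target starts out of reach and moves along +x past the robot.
    print(reachable_look_aheads((-1.0, 0.5, 0.0), (0.5, 0.0, 0.0),
                                [0, 1, 2, 3, 4], reach_radius=0.8))
    # → [1, 2, 3]
```

In this toy case, look-ahead times that are too short or too long both place the predicted pose outside the reachable sphere, which is one way to see why a fixed look-ahead time can be suboptimal over the course of an episode.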

Household Setup

Cluttered Household Setup


            title = {Learning a Meta-Controller for Dynamic Grasping},
            author = {Jia, Yinsen and Xu, Jingxi and Jayaraman, Dinesh and Song, Shuran},
            publisher = {arXiv},
            year = {2023},