Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

Policy Explorer

Below, you can select any combination of our active robotic tasks and visual features to see how well the resulting policy performs. Compare the policies that use mid-level visual features against two baselines: Scratch, which uses no mid-level visual features, and Blind, which receives no visual information from the environment while everything else (reward, action space, etc.) is kept identical. The Blind policy calibrates how much solving the task actually requires visual information. The videos show randomly sampled episodes of the trained policies running in an unseen test building. The map is for visualization only and is not seen by the agent. The results offer insight into the qualitative behavior induced by different visual features. You can explore these results quantitatively here.

Downstream Active Task

Mid-Level Visual Feature