Real tasks are messy by design
Robots need to learn from real friction: clutter, occlusion, changing lighting, deformable materials, liquids, and inconsistent human motion. Overly staged data can look clean while teaching a distribution that does not match the conditions robots will actually face.
The goal is not perfect cinematography. The goal is task coverage with enough visual clarity to learn from.
Use multiple camera perspectives
Egocentric footage captures what the human demonstrator sees and how their hands interact with objects. Exocentric footage captures the full task context and the spatial relationships between the demonstrator, objects, and environment.
Together, these views help teams evaluate affordances, object state changes, bimanual actions, and environment layout.
QA before volume
Before scaling to hundreds of hours, collect a small pilot sample and inspect it for camera stability, task completeness, annotation clarity, and consent documentation.
A strong pilot prevents teams from paying for volume that later needs to be discarded.
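The pilot review above can be tracked as a simple per-clip checklist. The following is a minimal Python sketch, assuming each reviewed clip gets a yes/no judgment on the four checks named above. `ClipReview`, `pilot_report`, the field names, and the 0.9 threshold are hypothetical illustrations, not part of any specific tool. As an additional assumed design choice, missing consent documentation blocks scaling regardless of the overall pass rate.

```python
from dataclasses import dataclass


# Hypothetical per-clip review record; field names are illustrative only.
@dataclass
class ClipReview:
    clip_id: str
    camera_stable: bool
    task_complete: bool
    annotations_clear: bool
    consent_documented: bool


# The four pilot checks, matching the ClipReview boolean fields.
CHECKS = ("camera_stable", "task_complete", "annotations_clear", "consent_documented")


def pilot_report(reviews, min_pass_rate=0.9):
    """Summarize a pilot review and decide whether to scale up.

    min_pass_rate is an assumed threshold, not a recommended value.
    """
    if not reviews:
        raise ValueError("pilot sample is empty")
    # For each check, list the clips that failed it.
    failures = {
        check: [r.clip_id for r in reviews if not getattr(r, check)]
        for check in CHECKS
    }
    # A clip passes only if every check holds.
    passed = [r for r in reviews if all(getattr(r, c) for c in CHECKS)]
    pass_rate = len(passed) / len(reviews)
    # Assumed policy: any missing consent blocks scaling outright.
    ready = pass_rate >= min_pass_rate and not failures["consent_documented"]
    return {"pass_rate": pass_rate, "failures": failures, "ready_to_scale": ready}


if __name__ == "__main__":
    sample = [
        ClipReview("clip-001", True, True, True, True),
        ClipReview("clip-002", False, True, True, True),
        ClipReview("clip-003", True, True, True, True),
        ClipReview("clip-004", True, False, True, True),
    ]
    report = pilot_report(sample)
    print(report["pass_rate"])                 # 0.5
    print(report["failures"]["camera_stable"])  # ['clip-002']
    print(report["ready_to_scale"])             # False
```

Reporting failures per check, rather than a single pass/fail count, shows whether a pilot needs better camera mounting, clearer task instructions, or a documentation fix before more volume is added.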
