Data Specs / 2026-06-06

Metadata schemas for robot learning datasets

A practical schema turns raw video into inspectable training data. These are the fields, conventions, and versioning rules robotics teams should define first.

Embodied AI Data Labs

Start with stable clip and capture fields

Every clip needs a stable identifier plus environment, task, participant reference, camera view, frame rate, duration, and capture timestamp. These fields make datasets searchable and auditable before annotation begins.

Consent identifiers, anonymization status, and license scope should be stored on the same clip-level record as the capture fields, so compliance can be checked without opening separate systems.
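
As a minimal sketch of the clip-level record described above, the following Python dataclass keeps capture and compliance fields together. The field names, units, and example values are assumptions chosen for illustration, not a published standard.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import json


@dataclass(frozen=True)
class ClipRecord:
    # Capture fields listed in the text; the exact names are assumptions.
    clip_id: str
    environment: str
    task: str
    participant_ref: str
    camera_view: str
    frame_rate_fps: float
    duration_s: float
    captured_at: str  # ISO 8601 timestamp
    # Compliance fields kept on the same record as the capture fields.
    consent_id: str
    anonymization_status: str
    license_scope: str

    def __post_init__(self):
        if self.frame_rate_fps <= 0 or self.duration_s <= 0:
            raise ValueError("frame_rate_fps and duration_s must be positive")
        # fromisoformat raises ValueError if the timestamp cannot be parsed.
        datetime.fromisoformat(self.captured_at)


# Hypothetical example values.
record = ClipRecord(
    clip_id="clip-000001",
    environment="kitchen",
    task="pour",
    participant_ref="participant-017",
    camera_view="egocentric",
    frame_rate_fps=30.0,
    duration_s=12.5,
    captured_at=datetime(2026, 1, 15, 9, 30, tzinfo=timezone.utc).isoformat(),
    consent_id="consent-0042",
    anonymization_status="faces_blurred",
    license_scope="internal_research",
)

record_json = json.dumps(asdict(record), indent=2)
```

Because consent, anonymization, and license values sit next to the capture fields, a compliance filter becomes a query over one record type rather than a join across systems.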

Model actions and objects with controlled vocabularies

Labels such as pick up, pour, wipe, fold, and place should each have one written definition that every collector and annotator applies the same way. Object names, states, hand usage, contact events, and task phases should follow the same convention.

Controlled terms reduce ambiguity across collectors and annotators while making downstream filtering and benchmark construction easier.
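
One way to enforce a controlled vocabulary is to reject any label that does not map to a known term. The sketch below uses the action labels named above. The snake_case spelling and the normalization rules are assumptions, and a real vocabulary would also cover objects, states, and task phases.

```python
from enum import Enum


class Action(str, Enum):
    # Action terms taken from the examples in the text.
    PICK_UP = "pick_up"
    POUR = "pour"
    WIPE = "wipe"
    FOLD = "fold"
    PLACE = "place"


def normalize_label(raw: str) -> str:
    """Lowercase the label and join its words with underscores."""
    return "_".join(raw.strip().lower().replace("-", " ").split())


def parse_action(raw: str) -> Action:
    """Return the controlled term for a raw label, or raise ValueError."""
    try:
        return Action(normalize_label(raw))
    except ValueError:
        allowed = ", ".join(a.value for a in Action)
        raise ValueError(f"unknown action {raw!r}; allowed: {allowed}") from None


action = parse_action(" Pick-Up ")
```

Rejecting unknown labels at ingestion keeps spelling variants from turning into separate classes when the data is later filtered or split into benchmarks.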

Design for versioning and buyer-defined extensions

Annotation requirements change as a robotics program matures. A schema should allow new fields without breaking older deliveries, and every export should identify the schema version used.
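
A minimal sketch of this idea: each export carries a schema version, and a reader separates the core fields it knows from extension fields it does not. The envelope layout, the `major.minor.patch` version string, and the field names are all assumptions made for illustration.

```python
import json

SCHEMA_VERSION = "1.2.0"  # hypothetical version string
CORE_FIELDS = {"clip_id", "task", "camera_view"}  # assumed core field set


def build_export(clips, extensions=None):
    """Wrap clip records in an envelope that states the schema version."""
    return {
        "schema_version": SCHEMA_VERSION,
        "extensions": sorted(extensions or []),
        "clips": clips,
    }


def read_export(payload):
    """Read an export, keeping unknown fields instead of failing on them."""
    major = int(payload["schema_version"].split(".")[0])
    if major != int(SCHEMA_VERSION.split(".")[0]):
        raise ValueError(f"unsupported schema version {payload['schema_version']}")
    parsed = []
    for clip in payload["clips"]:
        missing = CORE_FIELDS - set(clip)
        if missing:
            raise ValueError(f"clip is missing core fields: {sorted(missing)}")
        core = {key: clip[key] for key in CORE_FIELDS}
        extra = {key: value for key, value in clip.items() if key not in CORE_FIELDS}
        parsed.append({"core": core, "extra": extra})
    return parsed


# "grasp_type" stands in for a buyer-defined extension field.
payload = build_export(
    [{"clip_id": "clip-000001", "task": "pour", "camera_view": "egocentric",
      "grasp_type": "pinch"}],
    extensions=["grasp_type"],
)
parsed = read_export(json.loads(json.dumps(payload)))
```

Under this convention, adding a field does not break older readers, while a change to the major version signals that core fields were removed or redefined.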

MP4 plus JSON or CSV works well for many evaluation programs. HDF5 or internal formats can be generated when teams need tighter integration with an existing training stack.
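
For the MP4 plus JSON or CSV case, the same clip records can be serialized to either sidecar format with the Python standard library. This is a sketch under assumed field and file names. HDF5 output is not shown because it depends on the target training stack.

```python
import csv
import io
import json

# Hypothetical clip records; each row points at its MP4 file.
clips = [
    {"clip_id": "clip-000001", "video_file": "clip-000001.mp4",
     "task": "pour", "schema_version": "1.2.0"},
    {"clip_id": "clip-000002", "video_file": "clip-000002.mp4",
     "task": "wipe", "schema_version": "1.2.0"},
]


def to_json(rows):
    """Serialize clip records as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize clip records as CSV, one row per clip."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(clips)
csv_text = to_csv(clips)
```

CSV suits flat clip-level fields. Nested annotations such as per-frame contact events fit more naturally in JSON.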

