Start with stable clip and capture fields
Every clip needs a stable identifier plus environment, task, participant reference, camera view, frame rate, duration, and capture timestamp. These fields make datasets searchable and auditable before annotation begins.
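A minimal Python sketch of such a clip record follows. The field names, the validation rules, and the find_clips helper are assumptions chosen for illustration rather than a prescribed schema, and the two sample clips are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List


@dataclass(frozen=True)
class ClipRecord:
    """Capture-level metadata for one clip. Field names are hypothetical."""
    clip_id: str           # stable identifier, assigned once and never reused
    environment: str       # e.g. "home_kitchen"
    task: str              # e.g. "make_coffee"
    participant_ref: str   # pseudonymous participant reference, not a name
    camera_view: str       # e.g. "head_mounted"
    frame_rate: float      # frames per second
    duration_s: float      # clip length in seconds
    captured_at: datetime  # timezone-aware capture timestamp

    def __post_init__(self) -> None:
        # Reject records that would undermine search or audit later on.
        if not self.clip_id.strip():
            raise ValueError("clip_id must be non-empty")
        if self.frame_rate <= 0 or self.duration_s <= 0:
            raise ValueError("frame_rate and duration_s must be positive")
        if self.captured_at.tzinfo is None:
            raise ValueError("captured_at must be timezone-aware")

    @property
    def expected_frames(self) -> int:
        # Useful for sanity-checking decoded video against the metadata.
        return round(self.frame_rate * self.duration_s)


def find_clips(clips: Iterable[ClipRecord], **criteria: object) -> List[ClipRecord]:
    """Return clips whose fields equal every keyword criterion."""
    return [c for c in clips if all(getattr(c, k) == v for k, v in criteria.items())]


# Illustrative (made-up) records.
clips = [
    ClipRecord("clip-0001", "home_kitchen", "make_coffee", "P017", "head_mounted",
               30.0, 12.5, datetime(2024, 5, 2, 9, 30, tzinfo=timezone.utc)),
    ClipRecord("clip-0002", "lab_bench", "fold_towel", "P004", "third_person",
               60.0, 8.0, datetime(2024, 5, 3, 14, 0, tzinfo=timezone.utc)),
]
kitchen = find_clips(clips, environment="home_kitchen")
```

Validating at construction time means a malformed record fails when it is created, not when someone later tries to search or audit the dataset.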
Consent identifiers, anonymization status, and license scope should be stored on the same clip record as the capture fields, so compliance can be checked without opening separate systems.
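One way to make that check possible in a single place is to keep the compliance fields on the clip record and evaluate them against an intended use. In the sketch below, ClipCompliance, the Anonymization states, and purpose strings such as "evaluation" are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import FrozenSet, List, Optional


class Anonymization(Enum):
    PENDING = "pending"              # not yet processed; blocks every use
    FACES_BLURRED = "faces_blurred"
    NOT_REQUIRED = "not_required"


@dataclass(frozen=True)
class ClipCompliance:
    clip_id: str
    consent_id: Optional[str]        # reference into the consent records
    anonymization: Anonymization
    license_scope: FrozenSet[str]    # uses the license permits, e.g. {"evaluation"}


def compliance_issues(rec: ClipCompliance, purpose: str) -> List[str]:
    """Return every reason the clip cannot be used for `purpose` (empty list = usable)."""
    issues = []
    if not rec.consent_id:
        issues.append("missing consent_id")
    if rec.anonymization is Anonymization.PENDING:
        issues.append("anonymization pending")
    if purpose not in rec.license_scope:
        issues.append(f"license does not cover {purpose!r}")
    return issues
```

Because the check reads only the clip record, a delivery can be filtered to usable clips in one pass, for example [r for r in records if not compliance_issues(r, "evaluation")].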
Model actions and objects with controlled vocabularies
Labels such as pick up, pour, wipe, fold, and place should have consistent definitions. Object names, states, hand usage, contact events, and task phases should follow the same convention.
Controlled terms reduce ambiguity across collectors and annotators while making downstream filtering and benchmark construction easier.
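A controlled vocabulary can be enforced in code with enumerations plus an explicit alias table that maps common variants to one canonical term. This is a sketch under assumptions: the Action and Hand enums, the snake_case spellings, the alias entries, and the Segment record are illustrative, and a real program would take its terms and definitions from its own annotation guide.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Action(str, Enum):
    PICK_UP = "pick_up"
    POUR = "pour"
    WIPE = "wipe"
    FOLD = "fold"
    PLACE = "place"


class Hand(str, Enum):
    LEFT = "left"
    RIGHT = "right"
    BOTH = "both"


# Hypothetical alias table: variants an annotator might enter -> canonical term.
ACTION_ALIASES: Dict[str, Action] = {
    "pick up": Action.PICK_UP,
    "pickup": Action.PICK_UP,
    "put down": Action.PLACE,
}


def normalize_action(label: str) -> Action:
    """Map a free-text label to a canonical Action or raise ValueError."""
    key = label.strip().lower()
    if key in ACTION_ALIASES:
        return ACTION_ALIASES[key]
    try:
        return Action(key.replace(" ", "_"))
    except ValueError:
        raise ValueError(f"unknown action label: {label!r}") from None


@dataclass(frozen=True)
class Segment:
    clip_id: str
    action: Action
    hand: Hand
    object_name: str  # would be checked against its own object vocabulary


segments = [
    Segment("clip-0001", normalize_action("Pick up"), Hand("right"), "mug"),
    Segment("clip-0001", normalize_action("POUR"), Hand("right"), "kettle"),
    Segment("clip-0002", normalize_action("fold"), Hand("both"), "towel"),
]
pour_segments: List[Segment] = [s for s in segments if s.action is Action.POUR]
```

Unknown labels raise an error at ingest instead of being stored as new free-text values, so a filter on Action.POUR cannot silently miss variant spellings.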
Design for versioning and buyer-defined extensions
Annotation requirements change as a robotics program matures. A schema should allow new fields without breaking older deliveries, and every export should identify the schema version used.
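A minimal sketch of version-tolerant exports, assuming JSON records, a semantic-versioning string in schema_version, and a compatibility rule (our assumption) that only a change in the major version breaks older readers. Buyer-defined fields go under a separate extensions key; CORE_FIELDS, LATER_FIELDS, and the version number are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional

SCHEMA_VERSION = "1.2.0"  # hypothetical current schema version
CORE_FIELDS = {"clip_id", "action", "start_s", "end_s"}  # hypothetical required fields
# Optional fields added after 1.0.0; factories supply defaults so older exports still load.
LATER_FIELDS: Dict[str, Callable[[], Any]] = {
    "task_phase": lambda: None,
    "contact_events": list,
}


def export_annotation(core: Dict[str, Any],
                      extensions: Optional[Dict[str, Any]] = None) -> str:
    """Serialize one annotation, stamping it with the schema version."""
    missing = CORE_FIELDS - set(core)
    if missing:
        raise ValueError(f"missing core fields: {sorted(missing)}")
    doc = {**core, "schema_version": SCHEMA_VERSION}
    # Buyer-defined fields live in their own namespace, so they never shadow core fields.
    doc["extensions"] = dict(extensions or {})
    return json.dumps(doc, sort_keys=True)


def load_annotation(text: str) -> Dict[str, Any]:
    """Parse an export, accepting any version with the same major number."""
    doc = json.loads(text)
    version = doc.get("schema_version")
    if not isinstance(version, str):
        raise ValueError("export does not declare schema_version")
    if version.split(".")[0] != SCHEMA_VERSION.split(".")[0]:
        raise ValueError(f"incompatible schema_version {version}")
    # Fill fields that older exports predate, without overwriting present values.
    for name, make_default in LATER_FIELDS.items():
        doc.setdefault(name, make_default())
    doc.setdefault("extensions", {})
    return doc
```

Keeping buyer fields under extensions means a future core field cannot collide with a buyer's field name, and a reader that does not understand a buyer's fields can still pass them through unchanged.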
MP4 video with JSON or CSV metadata works well for many evaluation programs. HDF5 or other internal formats can be generated when teams need tighter integration with an existing training stack.
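For the MP4 plus JSON or CSV case, one common layout is a CSV manifest with one row per clip alongside a JSON sidecar file per clip. The sketch assumes that layout and uses hypothetical file and column names; it writes only the metadata files and assumes the MP4s are copied into the same directory separately.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List

# Hypothetical manifest layout: one row per clip, pointing at its MP4 and JSON sidecar.
MANIFEST_COLUMNS = ["clip_id", "video_file", "annotation_file", "task", "frame_rate", "duration_s"]


def write_delivery(out_dir: Path, clips: List[Dict[str, Any]],
                   annotations: Dict[str, List[Dict[str, Any]]],
                   schema_version: str) -> Path:
    """Write manifest.csv plus one <clip_id>.json sidecar per clip."""
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest_path = out_dir / "manifest.csv"
    with manifest_path.open("w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops clip fields that are not manifest columns.
        writer = csv.DictWriter(f, fieldnames=MANIFEST_COLUMNS, extrasaction="ignore")
        writer.writeheader()
        for clip in clips:
            clip_id = clip["clip_id"]
            sidecar = {
                "schema_version": schema_version,
                "clip_id": clip_id,
                "annotations": annotations.get(clip_id, []),
            }
            (out_dir / f"{clip_id}.json").write_text(json.dumps(sidecar, indent=2), encoding="utf-8")
            writer.writerow({**clip, "video_file": f"{clip_id}.mp4",
                             "annotation_file": f"{clip_id}.json"})
    return manifest_path
```

The flat CSV is easy to filter in a spreadsheet or dataframe, while the JSON sidecar holds nested, variable-length annotations; generating HDF5 or an internal format would be a separate conversion step not shown here.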
