Haptic Suit
100-motor haptic array. Whole-body pose and per-zone contact force on a single shared clock.
HALO is building the data layer and the model that learns from it — a self-improving world model of physical movement.
A synchronized, multi-device rig captures human movement across four instruments on a shared clock. The hard problem isn’t any single sensor — it’s tight synchronization across suit, gloves, shoes, and vision, so every force signal lands on the exact visual moment that caused it.
Fingertip force and contact pressure at the hand — the modality cameras cannot see.
IMU + sole-pressure sensor fusion. Ground-reaction force and gait at sub-millisecond cadence.
Unbounded locomotion in a fixed footprint. Continuous capture without floor-space cost.
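To make the shared-clock idea concrete, here is a minimal sketch of pairing each video frame with the nearest force sample by timestamp. It is an illustration only, not HALO's pipeline: the function names, the 30 fps and 1 kHz rates, and the `max_skew` tolerance are all assumptions chosen for the example, and it presumes every device already stamps samples against the same clock.

```python
from bisect import bisect_left

def nearest_sample_index(sample_times, t):
    """Return the index of the sample whose timestamp is closest to t.

    sample_times must be sorted ascending and non-empty.
    """
    i = bisect_left(sample_times, t)
    if i == 0:
        return 0
    if i == len(sample_times):
        return len(sample_times) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if sample_times[i] - t < t - sample_times[i - 1] else i - 1

def align_force_to_frames(frame_times, force_times, force_values, max_skew):
    """Pair each video frame with the nearest force sample on the shared clock.

    Frames with no force sample within max_skew seconds are paired with None,
    so a gap in one stream is never silently filled with stale data.
    """
    aligned = []
    for t in frame_times:
        j = nearest_sample_index(force_times, t)
        if abs(force_times[j] - t) <= max_skew:
            aligned.append(force_values[j])
        else:
            aligned.append(None)
    return aligned

# Hypothetical streams: 30 fps video frames and 1 kHz force samples.
frame_times = [0.0, 1 / 30, 2 / 30]
force_times = [k / 1000 for k in range(100)]
force_values = [float(k) for k in range(100)]  # arbitrary force units

paired = align_force_to_frames(frame_times, force_times, force_values, max_skew=0.001)
```

Nearest-neighbour matching is the simplest choice here; interpolating between the two surrounding force samples would be a reasonable alternative when the force stream is much slower than in this example.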
Vision tells you what happened. Force tells you why it worked. A grip that looks identical can succeed or fail on force alone — and force is exactly what video cannot see. HALO’s data carries both, paired and time-aligned.
We are building a self-supervised, JEPA-style action-conditioned world model in which prediction error itself is the reward — no human labeling required. The model learns to predict the latent consequences of motion; where it is wrong, it improves.
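The core signal can be sketched in a few lines. The toy below is an assumption-laden illustration, not HALO's or Meta's model: it uses tiny hand-written linear maps in place of learned networks, and every name in it is hypothetical. What it shows is the shape of the idea: encode the current observation, predict the next latent from the latent plus the action, and measure the error against the encoded next observation in latent space rather than in raw sensor space.

```python
def encode(observation, weights):
    """Toy linear encoder: latent = weights @ observation."""
    return [sum(w * x for w, x in zip(row, observation)) for row in weights]

def predict_next_latent(latent, action, predictor):
    """Toy linear predictor over the concatenated [latent, action] vector."""
    features = latent + action
    return [sum(w * x for w, x in zip(row, features)) for row in predictor]

def latent_prediction_error(predicted, target):
    """Mean squared error between predicted and target latents."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Hypothetical 2-D observations and a 1-D action.
encoder = [[1.0, 0.0], [0.0, 1.0]]        # identity encoder, for readability
obs_now, action, obs_next = [1.0, 0.0], [0.5], [1.0, 0.5]

# One predictor accounts for the action's effect, the other ignores it.
uses_action = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
ignores_action = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

target = encode(obs_next, encoder)
latent = encode(obs_now, encoder)

error_good = latent_prediction_error(predict_next_latent(latent, action, uses_action), target)
error_bad = latent_prediction_error(predict_next_latent(latent, action, ignores_action), target)
```

Here `error_good` is 0.0 and `error_bad` is 0.125: the predictor that ignores the action is penalized, and that error is the only training signal involved. No label is supplied by a person at any point.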
Meta FAIR’s V-JEPA 2-AC (June 2025) has shown that this approach is feasible: its V-JEPA 2 encoder was pre-trained on over 1 million hours of video, the action-conditioned model was post-trained on under 62 hours of unlabeled robot data, and the result controlled real robots zero-shot with no task-specific reward. HALO’s contribution is the data this approach has never had: whole-body, contact-rich, force-labeled human movement.
The same model that learns from the data also scores that data’s quality. Every wearer-hour feeds both loops at once.
Because the same person is captured in vision and force simultaneously — picking up and opening a real bottle, for example — the model can learn the mapping between how an action looks and the invisible force that made it work. That bridge is what pure-vision models structurally cannot learn.