Zakour* · Nath* | arXiv 2402.17758
Activity modelled as a chain / graph of actions
Activities: Making Dinner / Coffee / Breakfast, Cleaning up
Actions with hand-object pairs: Boil / Heat · Pour / Place / Pick · Handover / Receive · Stir / Scoop · Push / Open / Close
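To make the chain / graph view concrete, here is a minimal sketch of an activity stored as a directed acyclic graph of hand-level actions. An edge means one action must finish before the next starts, and a chain is simply a graph with a single path. The `Action` and `ActivityGraph` classes and their fields are hypothetical illustrations, not the ADL4D annotation format; the example actions are taken from the list above.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """One hand-level action; field names are hypothetical, not the ADL4D schema."""
    name: str      # e.g. "Pour"
    subject: str   # e.g. "S1"
    hand: str      # "left" or "right"
    obj: str       # manipulated object, e.g. "kettle"


@dataclass
class ActivityGraph:
    """Activity as a DAG: an edge u -> v means action u must finish before v starts."""
    actions: dict = field(default_factory=dict)
    edges: dict = field(default_factory=lambda: defaultdict(list))

    def add(self, key, action, after=()):
        self.actions[key] = action
        for prev in after:
            self.edges[prev].append(key)

    def order(self):
        """Return one valid execution order using Kahn's topological sort."""
        indeg = {k: 0 for k in self.actions}
        for succs in self.edges.values():
            for s in succs:
                indeg[s] += 1
        queue = deque(k for k, d in indeg.items() if d == 0)
        out = []
        while queue:
            k = queue.popleft()
            out.append(k)
            for s in self.edges[k]:
                indeg[s] -= 1
                if indeg[s] == 0:
                    queue.append(s)
        if len(out) != len(self.actions):
            raise ValueError("activity graph contains a cycle")
        return out


# Two subjects work in parallel, then their branches merge into a shared action.
g = ActivityGraph()
g.add("boil", Action("Boil", "S1", "right", "kettle"))
g.add("pick_cup", Action("Pick", "S2", "left", "cup"))
g.add("pour", Action("Pour", "S1", "right", "kettle"), after=["boil", "pick_cup"])
g.add("handover", Action("Handover", "S2", "left", "cup"), after=["pour"])
print(g.order())  # ['boil', 'pick_cup', 'pour', 'handover']
```

Kahn's algorithm is used because it both yields an execution order and detects cycles, which should not occur in a valid activity.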
| Dataset | Complex Activities | Action Labels | 3D Hands | 6D Object Pose | Inter-Subject | #Hands/Seq | Avg Frames/Seq | #Images | #Cameras |
|---|---|---|---|---|---|---|---|---|---|
| FPHA | ✗ | Frame | ✓ | ✓ | ✗ | 1 | 87 | 105K | 1 |
| DexYCB | ✗ | — | ✓ | ✓ | ✗ | 1 | 73 | 582K | 8 |
| H2O | ✗ | Frame | ✓ | ✓ | ✗ | 2 | 620 | 571K | 5 |
| H2O3D | ✗ | — | ✓ | ✓ | ✗ | 2 | 234 | 76K | 5 |
| MECCANO | ✓ | Frame | ✗ | ✗ | ✗ | 2 | 15K | 300K | 1 |
| OakInk | ✗ | — | ✓ | ✓ | ✓ | 2 | 44 | 230K | 4 |
| HOI4D | ✗ | Frame | ✓ | ✓ | ✗ | 1 | 480 | 2.4M | 1 |
| AssemblyHands | ✓ | Frame | ✓ | ✗ | ✗ | 2 | — | 3M | 12 |
| ARCTIC | ✗ | Frame | ✓ | ✓ | ✗ | 2 | 688 | 2.1M | 9 |
| ADL4D | ✓ | Hand | ✓ | ✓ | ✓ | 2–4 | 1833 | 1.1M | 8 |
Re-identification (ReID) of landmarks across views is core to triangulation, and multi-view ReID in turn relies on epipolar matching followed by filtering:
| Dataset | Without Method | With Method | Reduction |
|---|---|---|---|
| ADL4D | 1 088 | 22 | 98% |
| H2O | 6 302 | 213 | 97% |
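The epipolar matching + filtering step can be sketched for a single camera pair as below. This is an illustrative stand-in, not the authors' implementation. It assumes a known fundamental matrix `F` between the two views, scores each candidate pair of hand detections by the mean point-to-epipolar-line distance over corresponding landmarks, matches greedily by cost, and filters out pairs above a hypothetical `max_cost` pixel threshold. All function names and the threshold value are assumptions.

```python
from itertools import product


def epipolar_distance(F, p1, p2):
    """Pixel distance from p2 (view 2) to the epipolar line of p1 (view 1), l2 = F @ [p1, 1]."""
    x1 = (p1[0], p1[1], 1.0)
    a, b, c = (sum(F[r][k] * x1[k] for k in range(3)) for r in range(3))
    norm = (a * a + b * b) ** 0.5
    return float("inf") if norm == 0 else abs(a * p2[0] + b * p2[1] + c) / norm


def hand_cost(F, hand_a, hand_b):
    """Mean epipolar distance over corresponding landmarks of two hand detections."""
    dists = [epipolar_distance(F, pa, pb) for pa, pb in zip(hand_a, hand_b)]
    return sum(dists) / len(dists)


def match_hands(F, hands_a, hands_b, max_cost=5.0):
    """Greedy one-to-one matching of hands across two views, dropping pairs above max_cost."""
    candidates = sorted(
        (hand_cost(F, ha, hb), i, j)
        for (i, ha), (j, hb) in product(enumerate(hands_a), enumerate(hands_b))
    )
    used_a, used_b, matches = set(), set(), {}
    for cost, i, j in candidates:
        if cost > max_cost:
            break  # filtering: every remaining candidate is at least this inconsistent
        if i in used_a or j in used_b:
            continue  # each detection may be matched only once
        matches[i] = j
        used_a.add(i)
        used_b.add(j)
    return matches


# Rectified toy pair (identical intrinsics, pure x-translation): epipolar lines are image rows.
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
hands_a = [[(100, 10), (120, 12)], [(300, 50), (310, 52)]]
hands_b = [[(80, 50.5), (90, 52.4)], [(60, 10.2), (70, 11.8)], [(50, 200), (55, 205)]]
print(match_hands(F, hands_a, hands_b))  # {0: 1, 1: 0}; view-B hand 2 stays unmatched
```

With eight cameras, as in ADL4D, such pairwise consistency would have to be aggregated across views before triangulation; that step is left out of this sketch.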
| Dataset | abs MPJPE (mm) ↓ | AUC ↑ |
|---|---|---|
| H2O | 5.36 | 0.8930 |
| DexYCB | 8.56 | 0.8651 |
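The two metrics in the table above can be sketched as follows. Absolute MPJPE averages Euclidean joint errors with no root alignment, and AUC is the normalised area under the PCK curve (the fraction of joints within a distance threshold, swept over a range). The 0 to 50 mm threshold range and 100 steps are assumptions, since the range used is not stated here, and the function names are illustrative.

```python
import math


def joint_errors(pred, gt):
    """Per-joint Euclidean errors (same units as input, e.g. mm) between matched 3D joints."""
    return [math.dist(p, g) for p, g in zip(pred, gt)]


def mpjpe(pred, gt):
    """Absolute MPJPE: mean per-joint position error without any root alignment."""
    errs = joint_errors(pred, gt)
    return sum(errs) / len(errs)


def pck_auc(errors, max_thresh=50.0, steps=100):
    """Normalised area under the PCK curve for thresholds in [0, max_thresh] (trapezoidal rule)."""
    thresholds = [max_thresh * i / steps for i in range(steps + 1)]
    pck = [sum(e <= t for e in errors) / len(errors) for t in thresholds]
    area = sum(
        (pck[i] + pck[i + 1]) / 2 * (thresholds[i + 1] - thresholds[i])
        for i in range(steps)
    )
    return area / max_thresh


pred = [(0, 0, 0), (10, 0, 0)]
gt = [(3, 4, 0), (10, 0, 0)]
print(mpjpe(pred, gt))  # 2.5
```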
1.1 million annotated RGB-D frames
ADL4D (red) covers a broader pose space than H2O (blue) and DexYCB (green)
Includes "in-between" transitions, handovers, and idle adjustments
| Train Set | Test Set | MPJPE (mm) ↓ |
|---|---|---|
| DexYCB (baseline) | H2O | 44.96 |
| ADL4D | H2O | 32.76 |
Training on ADL4D instead of DexYCB lowers root-relative (rr) MPJPE on H2O by 27%, indicating better cross-dataset generalisation, and yields significantly better mesh quality.
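The root-relative variant and the percentage reductions quoted on this poster can be checked with a short sketch. Treating joint index 0 as the root (the wrist in the common 21-joint hand layout) is an assumption, and `rr_mpjpe` / `relative_reduction` are illustrative helpers, not code from the paper.

```python
import math


def rr_mpjpe(pred, gt, root=0):
    """Root-relative MPJPE: shift both skeletons so joint `root` sits at the origin, then average errors."""
    pr, gr = pred[root], gt[root]
    errs = [
        math.dist([a - b for a, b in zip(p, pr)], [a - b for a, b in zip(g, gr)])
        for p, g in zip(pred, gt)
    ]
    return sum(errs) / len(errs)


def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return 1.0 - new / baseline


print(round(relative_reduction(44.96, 32.76) * 100, 1))  # 27.1 (DexYCB -> ADL4D training, H2O test)
print(round(relative_reduction(1088, 22) * 100), round(relative_reduction(6302, 213) * 100))  # 98 97
```

Subtracting the root removes global translation error, so rr MPJPE isolates articulation accuracy, which is the quantity compared across training sets in the table above.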
| Features | Acc. ↑ | Edit ↑ | F1@10 ↑ | F1@25 ↑ | F1@50 ↑ |
|---|---|---|---|---|---|
| I3D | 32.77 | 41.66 | 24.59 | 18.21 | 7.12 |
| X3D | 28.99 | 34.15 | 28.50 | 19.27 | 6.85 |
| SlowFast | 45.02 | 40.78 | 36.73 | 28.86 | 16.94 |
| Pose (ADL4D) | 57.15 | 53.19 | 56.77 | 50.89 | 35.81 |
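The columns above are the standard temporal action segmentation metrics: frame-wise accuracy, the segmental edit score (normalised Levenshtein distance between segment label sequences), and segmental F1 at IoU thresholds of 10%, 25% and 50%. The sketch below follows the widely used MS-TCN-style evaluation logic as an assumption; details such as background-class handling in the ADL4D evaluation may differ, and the function names are illustrative.

```python
def segments(labels):
    """Collapse per-frame labels into (label, start, end) segments; `end` is exclusive."""
    segs, start = [], 0
    for t in range(1, len(labels) + 1):
        if t == len(labels) or labels[t] != labels[start]:
            segs.append((labels[start], start, t))
            start = t
    return segs


def frame_accuracy(pred, gt):
    """Percentage of frames whose predicted label equals the ground truth."""
    return 100.0 * sum(p == g for p, g in zip(pred, gt)) / len(gt)


def edit_score(pred, gt):
    """100 * (1 - Levenshtein distance between segment label sequences / longer length)."""
    p = [s[0] for s in segments(pred)]
    g = [s[0] for s in segments(gt)]
    d = [[i + j if i == 0 or j == 0 else 0 for j in range(len(g) + 1)] for i in range(len(p) + 1)]
    for i in range(1, len(p) + 1):
        for j in range(1, len(g) + 1):
            cost = 0 if p[i - 1] == g[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return 100.0 * (1 - d[-1][-1] / max(len(p), len(g)))


def f1_at(pred, gt, overlap):
    """Segmental F1@overlap (IoU threshold in [0, 1]); each GT segment can be matched once."""
    p_segs, g_segs = segments(pred), segments(gt)
    hit = [False] * len(g_segs)
    tp = 0
    for label, s, e in p_segs:
        best_iou, best_j = 0.0, -1
        for j, (g_label, gs, ge) in enumerate(g_segs):
            if g_label != label:
                continue
            inter = max(0, min(e, ge) - max(s, gs))
            iou = inter / (max(e, ge) - min(s, gs))
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= overlap and not hit[best_j]:
            tp += 1
            hit[best_j] = True
    precision = tp / len(p_segs) if p_segs else 0.0
    recall = tp / len(g_segs) if g_segs else 0.0
    if precision + recall == 0:
        return 0.0
    return 100.0 * 2 * precision * recall / (precision + recall)


gt = ["pour"] * 4 + ["stir"] * 4
pred = ["pour", "pour", "stir", "pour", "stir", "stir", "stir", "stir"]  # over-segmented
print(frame_accuracy(pred, gt), edit_score(pred, gt), round(f1_at(pred, gt, 0.5), 1))  # 87.5 50.0 66.7
```

The example shows why the segmental scores matter: a single spurious flip costs only one frame of accuracy but halves the edit score and lowers F1@50, penalising over-segmentation.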
| Model | ADD ↑ | ADD-S ↑ |
|---|---|---|
| FoundationPose | 0.47 | 0.64 |
| ICG+ | 0.53 | 0.74 |
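ADD and ADD-S are the standard 6D object pose metrics. ADD averages distances between corresponding model points under the ground-truth and predicted poses, while ADD-S uses the closest predicted point instead, so symmetric objects are not penalised for equivalent poses. Reporting them as an area under the accuracy-vs-threshold curve up to 10 cm follows the YCB-Video convention and is an assumption here, as is every function name in the sketch.

```python
import math


def transform(R, t, p):
    """Apply the rigid transform x -> R x + t (R: 3x3 nested list, t: length-3 sequence)."""
    return tuple(sum(R[i][k] * p[k] for k in range(3)) + t[i] for i in range(3))


def add(points, R_gt, t_gt, R_pr, t_pr):
    """ADD: mean distance between corresponding model points under GT and predicted poses."""
    return sum(
        math.dist(transform(R_gt, t_gt, p), transform(R_pr, t_pr, p)) for p in points
    ) / len(points)


def add_s(points, R_gt, t_gt, R_pr, t_pr):
    """ADD-S: mean distance from each GT-posed point to the closest predicted-posed point."""
    pred = [transform(R_pr, t_pr, p) for p in points]
    return sum(
        min(math.dist(transform(R_gt, t_gt, p), q) for q in pred) for p in points
    ) / len(points)


def auc(errors, max_thresh=0.10, steps=1000):
    """Area under the accuracy-vs-threshold curve on [0, max_thresh] metres, normalised to [0, 1]."""
    thresholds = [max_thresh * i / steps for i in range(steps + 1)]
    acc = [sum(e <= t for e in errors) / len(errors) for t in thresholds]
    area = sum((acc[i] + acc[i + 1]) / 2 * (thresholds[i + 1] - thresholds[i]) for i in range(steps))
    return area / max_thresh


# Toy square (metres) that looks identical after a 90-degree turn about z.
square = [(0.05, 0.0, 0.0), (0.0, 0.05, 0.0), (-0.05, 0.0, 0.0), (0.0, -0.05, 0.0)]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
Rz = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t0 = (0.0, 0.0, 0.0)
print(round(add(square, I, t0, Rz, t0), 4), add_s(square, I, t0, Rz, t0))  # 0.0707 0.0
```

Because the closest point is never farther than the corresponding one, per-frame ADD-S error never exceeds ADD error, so ADD-S scores cannot fall below ADD scores for the same predictions.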
Pose features are invariant to camera motion; the results validate ADL4D as a challenging multi-subject benchmark for both action understanding and object pose estimation methods.