WiLoR-MLX: Hand Pose Estimation on Apple Silicon
MLX port of WiLoR-mini for native Apple Silicon inference. Complete pipeline: ViT-H/16 backbone + MANO hand model + RefineNet refinement.
Code: github.com/lyonsno/wilor-mlx
Available Weights
| Variant | File | Size | Precision | Notes |
|---|---|---|---|---|
| float32 | wilor-mlx.safetensors | 2.4 GB | Full | Reference quality, recommended |
| int4 | wilor-mlx-int4.safetensors | 490 MB | 4-bit quantized | 5x smaller download, same speed |
Both variants produce near-identical inference speed on Apple Silicon (see benchmarks below). Choose based on download size and precision needs.
These weights contain only ViT backbone, RefineNet, and learned embedding parameters; no MANO data is bundled or rehosted. WiLoR.from_pretrained() handles MANO automatically by fetching upstream WiLoR-mini assets and converting them locally on your machine. The MANO hand model is separately licensed by the Max Planck Institute.
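If you want to fetch one specific weight file by hand rather than rely on WiLoR.from_pretrained(), a minimal sketch using the huggingface_hub client could look like the following. The helper names `weight_filename` and `download_weights` are hypothetical and not part of wilor-mlx; the repository ID and file names are taken from the table above.

```python
# Hypothetical helpers, not part of wilor-mlx.
# File names and repo ID come from the "Available Weights" table.
REPO_ID = "BasinShapers/wilor-mlx"
WEIGHT_FILES = {
    "float32": "wilor-mlx.safetensors",
    "int4": "wilor-mlx-int4.safetensors",
}


def weight_filename(variant):
    """Map a variant name from the table to its safetensors file name."""
    try:
        return WEIGHT_FILES[variant]
    except KeyError:
        known = ", ".join(sorted(WEIGHT_FILES))
        raise ValueError(f"unknown variant {variant!r}; expected one of: {known}") from None


def download_weights(variant="float32"):
    """Download one weight file into the local Hub cache and return its path."""
    # Imported lazily so the mapping above works without the package installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=weight_filename(variant))
```

This only downloads the file. Whether WiLoR.from_pretrained() accepts a variant name or a local path is not stated here, so check the repository documentation before wiring the two together.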
Performance
Apple M4 Max, single-image (1×256×256×3), float32:
Same-harness gesture UI saved-frame route
| Stage | Backend | p50 | p90 | p95 | p99 |
|---|---|---|---|---|---|
| model | MLX (wilor-mlx) | 37.0 ms | 37.3 ms | 37.4 ms | 37.7 ms |
| model | PyTorch MPS | 48.8 ms | 49.6 ms | 49.8 ms | 54.2 ms |
| full route | MLX (wilor-mlx) | 48.8 ms | 49.5 ms | 49.5 ms | 49.6 ms |
| full route | PyTorch MPS | 60.1 ms | 60.8 ms | 61.1 ms | 62.1 ms |
Same M4 Max machine, same saved-frame harness, 40 recent active 160×120 frames from a gesture UI prototype. MLX is about 1.3x faster on the pose/reconstruction model stage and about 1.2x faster on the full saved-frame route. These figures replace the older app-tail telemetry as the baseline for the launch comparison.
Larger derived-frame stress tests widen both backends; MLX remained faster in those runs, but we treat those numbers as route/runtime stress evidence rather than the headline model benchmark.
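The speedup figures quoted for the saved-frame route follow directly from the p50 columns of the table. The short sketch below reproduces that arithmetic; the dictionary layout and function names are illustrative only.

```python
# p50 latencies in milliseconds, copied from the saved-frame route table.
P50_MS = {
    "model": {"mlx": 37.0, "pytorch_mps": 48.8},
    "full route": {"mlx": 48.8, "pytorch_mps": 60.1},
}


def speedup(stage):
    """Ratio of PyTorch MPS latency to MLX latency for one stage."""
    return P50_MS[stage]["pytorch_mps"] / P50_MS[stage]["mlx"]


def non_model_overhead_ms(backend):
    """Time spent outside the model stage: full route minus model."""
    return P50_MS["full route"][backend] - P50_MS["model"][backend]


for stage in P50_MS:
    print(f"{stage}: {speedup(stage):.2f}x")
for backend in ("mlx", "pytorch_mps"):
    print(f"{backend} non-model overhead: {non_model_overhead_ms(backend):.1f} ms")
```

The non-model overhead comes out at roughly 11 to 12 ms for both backends, which is why the full-route ratio is smaller than the model-stage ratio.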
Isolated model benchmark
| Backend | p50 | p90 | min | FPS |
|---|---|---|---|---|
| MLX (wilor-mlx) | 36 ms | 36 ms | 36 ms | 28 |
| PyTorch MPS (2.5.0) | 50 ms | 51 ms | 49 ms | 20 |
MLX is about 1.4x faster in pure model compute (36 ms vs. 50 ms at p50). Same deterministic input, 100 iterations after 30 warmup, batched timing.
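A warmup-then-measure loop of the kind described above can be sketched as follows. This is not the harness used to produce the table: the function names are hypothetical, each call is timed individually (the exact meaning of "batched timing" is not specified here), and the percentile is a simple nearest-index estimate.

```python
import time


def percentile(sorted_vals, q):
    """Nearest-index percentile of an ascending list, with q in [0, 100]."""
    idx = round(q / 100 * (len(sorted_vals) - 1))
    return sorted_vals[max(0, min(len(sorted_vals) - 1, idx))]


def benchmark(fn, warmup=30, iters=100):
    """Run fn() `warmup` times untimed, then time `iters` calls in milliseconds."""
    for _ in range(warmup):
        fn()
    samples_ms = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # fn must block until the work is done
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    samples_ms.sort()
    p50 = percentile(samples_ms, 50)
    return {
        "p50_ms": p50,
        "p90_ms": percentile(samples_ms, 90),
        "min_ms": samples_ms[0],
        "fps": 1000.0 / p50 if p50 > 0 else float("inf"),
    }
```

Because MLX evaluates lazily, a callable passed to `benchmark` would need to force the computation, for example by calling mx.eval on the model output as in the Quick Start.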
Quantization impact on speed
| Variant | p50 | FPS | Notes |
|---|---|---|---|
| float32 | 36 ms | 28 | Reference |
| float16 | 36 ms | 28 | Equal ALU throughput on M4 Max |
| int4 | 37 ms | 27 | Dequant overhead ≈ bandwidth savings |
On Apple Silicon, float16 and int4 do not improve latency for this model size (210 tokens × 1280 dim). The GPU is compute-overhead-bound, not bandwidth-bound. Int4's value is purely download size reduction (2.4 GB → 490 MB).
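The FPS column and the download-size ratio can be checked from the quoted numbers. The sketch below assumes decimal units (2.4 GB treated as 2400 MB); the function name is illustrative.

```python
def fps_from_latency_ms(p50_ms):
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / p50_ms


# p50 latencies from the quantization table, in milliseconds.
for variant, p50_ms in [("float32", 36), ("float16", 36), ("int4", 37)]:
    print(f"{variant}: {fps_from_latency_ms(p50_ms):.1f} FPS")

# Download sizes from the weights table; 2.4 GB is assumed to mean 2400 MB.
size_ratio = 2400 / 490
print(f"int4 download is about {size_ratio:.1f}x smaller")
```

Rounded to whole frames, 36 ms gives 28 FPS and 37 ms gives 27 FPS, matching the table, and the size ratio is consistent with the "5x smaller" note.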
Numerical Accuracy
Compared against PyTorch WiLoR-mini on identical float32 inputs:
| Variant | pred_vertices max diff | pred_keypoints_3d max diff |
|---|---|---|
| float32 | 0.006 (sub-mm) | 0.006 (sub-mm) |
| int4 | 0.044 (< 2mm) | 0.044 (< 2mm) |
Both are within visual tolerance for real-time hand tracking.
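A "max diff" figure like the ones in the table is typically the largest absolute element-wise difference between two outputs of the same shape. The following is a minimal NumPy sketch of that metric, assuming both outputs have already been converted to NumPy arrays; it is not the comparison script used for the table, and `max_abs_diff` is a hypothetical name.

```python
import numpy as np


def max_abs_diff(reference, candidate):
    """Largest absolute element-wise difference between two same-shape arrays."""
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    return float(np.max(np.abs(reference - candidate)))
```

It would be applied separately to each output, for example to the two `pred_vertices` arrays and to the two `pred_keypoints_3d` arrays.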
Quick Start
from wilor_mlx import WiLoR
import mlx.core as mx
# Everything downloads and caches automatically
# First run requires torch for one-time MANO conversion; after that, torch is not used
model = WiLoR.from_pretrained()
# Inference
image = mx.array(your_256x256_hand_crop) # (1, 256, 256, 3) uint8
result = model(image)
mx.eval(result)
keypoints = result['pred_keypoints_3d'] # (1, 21, 3)
vertices = result['pred_vertices'] # (1, 778, 3)
See github.com/lyonsno/wilor-mlx for full documentation.
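The Quick Start expects a (1, 256, 256, 3) uint8 array. A small validation step before the mx.array call can catch shape and dtype mistakes early. This is a sketch under the assumption that you already hold a single 256×256 three-channel crop as a NumPy array; `to_model_batch` is a hypothetical helper, and the expected channel order is not stated here.

```python
import numpy as np


def to_model_batch(crop):
    """Validate one 256x256 three-channel uint8 crop and add a batch axis."""
    crop = np.asarray(crop)
    if crop.shape != (256, 256, 3):
        raise ValueError(f"expected a (256, 256, 3) crop, got {crop.shape}")
    if crop.dtype != np.uint8:
        raise TypeError(f"expected uint8 pixels, got {crop.dtype}")
    return crop[np.newaxis, ...]  # shape (1, 256, 256, 3)
```

The returned array can then be wrapped with mx.array(...) and passed to the model as shown above.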
Architecture
- ViT-H/16 backbone: 1280 embed dim, 32 layers, 16 heads, 210 tokens (192 patches + 18 learnable)
- MANO hand model: 778 vertices, 16 joints, Linear Blend Skinning with kinematic chain
- RefineNet: Multi-scale deconvolution + bilinear grid sampling + MANO parameter refinement
- Total parameters: ~610M
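To illustrate the "Linear Blend Skinning with kinematic chain" step named in the list above, here is a toy NumPy sketch. It is not the wilor-mlx or MANO implementation: it assumes per-joint 4×4 transforms that are already expressed relative to the rest pose, uses made-up function names, and is meant for a handful of joints rather than MANO's 778 vertices and 16 joints.

```python
import numpy as np


def global_transforms(local, parents):
    """Compose local 4x4 joint transforms along a kinematic chain.

    parents[j] is the parent of joint j, or -1 for the root.
    Parents must appear before their children.
    """
    world = np.zeros_like(local)
    for j, p in enumerate(parents):
        world[j] = local[j] if p < 0 else world[p] @ local[j]
    return world


def linear_blend_skinning(vertices, weights, transforms):
    """Pose vertices as a weighted blend of per-joint transforms.

    vertices: (V, 3), weights: (V, J) with rows summing to 1,
    transforms: (J, 4, 4).
    """
    ones = np.ones((len(vertices), 1))
    homogeneous = np.concatenate([vertices, ones], axis=1)       # (V, 4)
    blended = np.einsum("vj,jab->vab", weights, transforms)      # (V, 4, 4)
    posed = np.einsum("vab,vb->va", blended, homogeneous)        # (V, 4)
    return posed[:, :3]
```

With a two-joint chain where the child is translated by one unit along x, a vertex weighted equally between both joints moves by half a unit, while a vertex bound only to the root stays in place.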
Citation
@article{zhan2024wilor,
title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
year={2024}
}
License
The wilor-mlx code and these weight files are MIT licensed. The weights contain only ViT backbone, RefineNet, and learned embedding parameters; no MANO data is bundled or rehosted.
The MANO hand model is separately licensed by the Max Planck Institute. WiLoR.from_pretrained() fetches upstream WiLoR-mini assets and converts MANO data locally on your machine. You can also supply your own MANO data via mano_path=....
Base model: warmshao/WiLoR-mini (relation: quantized)