WiLoR-MLX: Hand Pose Estimation on Apple Silicon

MLX port of WiLoR-mini for native Apple Silicon inference. Complete pipeline: ViT-H/16 backbone + MANO hand model + RefineNet refinement.

Code: github.com/lyonsno/wilor-mlx

Available Weights

Variant	File	Size	Precision	Notes
float32	`wilor-mlx.safetensors`	2.4 GB	Full	Reference quality, recommended
int4	`wilor-mlx-int4.safetensors`	490 MB	4-bit quantized	5x smaller download, same speed

Both variants run at near-identical inference speed on Apple Silicon (see benchmarks below). Choose based on download size and precision needs.

These weights contain only ViT backbone, RefineNet, and learned embedding parameters — no MANO data is bundled or rehosted. WiLoR.from_pretrained() handles MANO automatically by fetching upstream WiLoR-mini assets and converting locally on your machine. The MANO hand model is separately licensed by the Max Planck Institute.

Performance

Apple M4 Max, single-image (1×256×256×3), float32:

Same-harness gesture UI saved-frame route

Stage	Backend	p50	p90	p95	p99
model	MLX (wilor-mlx)	37.0 ms	37.3 ms	37.4 ms	37.7 ms
model	PyTorch MPS	48.8 ms	49.6 ms	49.8 ms	54.2 ms
full route	MLX (wilor-mlx)	48.8 ms	49.5 ms	49.5 ms	49.6 ms
full route	PyTorch MPS	60.1 ms	60.8 ms	61.1 ms	62.1 ms

Both backends were measured on the same M4 Max machine with the same saved-frame harness, replaying 40 recent active 160x120 frames from a gesture UI prototype. At p50, MLX is about 1.3x faster on the pose/reconstruction model stage (37.0 ms vs 48.8 ms) and about 1.2x faster on the full saved-frame route (48.8 ms vs 60.1 ms). These figures replace the older app-tail telemetry as the baseline for the launch comparison.
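The speedup figures can be re-derived from the p50 column of the table above. The following is a small sketch of that arithmetic only; the dictionary layout and the `speedup` helper are illustrative names, not part of wilor-mlx.

```python
# p50 latencies in milliseconds, copied from the saved-frame table.
P50_MS = {
    "model": {"mlx": 37.0, "mps": 48.8},
    "full route": {"mlx": 48.8, "mps": 60.1},
}


def speedup(stage: str) -> float:
    """Return how many times faster MLX is than PyTorch MPS at p50."""
    row = P50_MS[stage]
    return row["mps"] / row["mlx"]


for stage in P50_MS:
    print(f"{stage}: {speedup(stage):.2f}x")
# prints "model: 1.32x" then "full route: 1.23x"
```

Rounded to one decimal place, these are the 1.3x and 1.2x figures quoted for the saved-frame route.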

Larger derived-frame stress tests widen the latency distributions of both backends; MLX remained faster in those runs, but we treat those numbers as evidence of route/runtime behavior under stress rather than as the headline model benchmark.

Isolated model benchmark

Backend	p50	p90	min	FPS
MLX (wilor-mlx)	36 ms	36 ms	36 ms	28
PyTorch MPS (2.5.0)	50 ms	51 ms	49 ms	20

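MLX is about 1.4x faster in pure model compute (36 ms vs 50 ms at p50). Both backends received the same deterministic input and were timed over 100 iterations after 30 warmup iterations, using batched timing.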
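For readers who want to run a comparable measurement, here is a minimal timing sketch using only the Python standard library. It is not the harness used for the numbers above: it times each call individually rather than in batches, and the `benchmark` function and its return keys are hypothetical names chosen for illustration.

```python
import statistics
import time
from typing import Callable


def benchmark(fn: Callable[[], object], warmup: int = 30, iters: int = 100) -> dict:
    """Call fn() repeatedly and summarise per-call latency in milliseconds."""
    # Warmup calls are discarded so one-time compilation and caching
    # costs do not skew the statistics.
    for _ in range(warmup):
        fn()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    p50 = statistics.median(samples)
    # Nearest-rank style p90 on the sorted samples.
    p90 = samples[min(len(samples) - 1, int(0.9 * len(samples)))]
    fps = 1000.0 / p50 if p50 > 0 else float("inf")
    return {"p50": p50, "p90": p90, "min": samples[0], "fps": fps}


# Stand-in workload so the sketch runs anywhere.
stats = benchmark(lambda: sum(range(1000)), warmup=5, iters=50)
print(stats)
```

MLX evaluates lazily, so when timing the model the callable should force evaluation, for example `benchmark(lambda: mx.eval(model(image)))` with `model` and `image` as in the Quick Start. The FPS column is 1000 divided by the p50 latency, so 36 ms corresponds to roughly 28 FPS.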

Quantization impact on speed

Variant	p50	FPS	Notes
float32	36 ms	28	Reference
float16	36 ms	28	Equal ALU throughput on M4 Max
int4	37 ms	27	Dequant overhead ≈ bandwidth savings

On Apple Silicon, float16 and int4 do not improve latency for this model size (210 tokens × 1280 dim). The GPU is compute-overhead-bound, not bandwidth-bound. Int4's value is purely download size reduction (2.4 GB → 490 MB).
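The download-size claim can be sanity-checked against the approximate parameter count given in the Architecture section. The sketch below assumes 1 GB = 10^9 bytes and uses the rounded "~610M" figure, so the results are estimates only.

```python
PARAMS = 610e6         # "~610M" total parameters (approximate)
FLOAT32_BYTES = 2.4e9  # wilor-mlx.safetensors
INT4_BYTES = 490e6     # wilor-mlx-int4.safetensors


def bits_per_param(file_bytes: float, params: float = PARAMS) -> float:
    """Average number of stored bits per model parameter."""
    return file_bytes * 8 / params


ratio = FLOAT32_BYTES / INT4_BYTES
print(f"download ratio: {ratio:.1f}x")
print(f"float32: {bits_per_param(FLOAT32_BYTES):.1f} bits/param")
print(f"int4:    {bits_per_param(INT4_BYTES):.1f} bits/param")
# prints "download ratio: 4.9x", then about 31.5 and 6.4 bits/param
```

The float32 file comes out close to the expected 32 bits per parameter. The int4 file averages about 6.4 bits rather than 4, which presumably reflects quantization metadata and tensors kept at higher precision; the model card does not break this down.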

Numerical Accuracy

Compared against PyTorch WiLoR-mini on identical float32 inputs:

Variant	pred_vertices max diff	pred_keypoints_3d max diff
float32	0.006 (sub-mm)	0.006 (sub-mm)
int4	0.044 (< 2mm)	0.044 (< 2mm)

Both are within visual tolerance for real-time hand tracking.
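A "max diff" of this kind is typically the largest absolute element-wise difference between two outputs. The sketch below shows that metric with NumPy on synthetic data; `max_abs_diff` is a hypothetical helper, and the arrays are random stand-ins rather than real model outputs.

```python
import numpy as np


def max_abs_diff(reference: np.ndarray, candidate: np.ndarray) -> float:
    """Largest absolute element-wise difference between two same-shaped arrays."""
    if reference.shape != candidate.shape:
        raise ValueError(f"shape mismatch: {reference.shape} vs {candidate.shape}")
    # Compare in float64 so the subtraction itself adds no extra rounding error.
    delta = reference.astype(np.float64) - candidate.astype(np.float64)
    return float(np.max(np.abs(delta)))


# Synthetic stand-ins shaped like pred_vertices: (1, 778, 3).
rng = np.random.default_rng(0)
ref_vertices = rng.normal(size=(1, 778, 3)).astype(np.float32)
noise = rng.uniform(-0.005, 0.005, size=ref_vertices.shape).astype(np.float32)
mlx_vertices = ref_vertices + noise

print(max_abs_diff(ref_vertices, mlx_vertices) < 0.006)
```

To apply this to real outputs, both results would first need to be converted to NumPy arrays, for example with `np.array(result['pred_vertices'])` on the MLX side.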

Quick Start

from wilor_mlx import WiLoR
import mlx.core as mx

# Everything downloads and caches automatically
# First run requires torch for one-time MANO conversion; after that, torch is not used
model = WiLoR.from_pretrained()

# Inference
image = mx.array(your_256x256_hand_crop)  # (1, 256, 256, 3) uint8
result = model(image)
mx.eval(result)

keypoints = result['pred_keypoints_3d']  # (1, 21, 3)
vertices = result['pred_vertices']        # (1, 778, 3)

See github.com/lyonsno/wilor-mlx for full documentation.
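The Quick Start expects a uint8 input of shape (1, 256, 256, 3). As a minimal sketch, a helper like the one below could validate a single crop and add the batch axis before it is passed to `mx.array`. `prepare_crop` is a hypothetical name and is not part of the wilor-mlx API; producing the 256×256 crop itself (detection, resizing) is out of scope here.

```python
import numpy as np


def prepare_crop(crop: np.ndarray) -> np.ndarray:
    """Validate a 256x256x3 uint8 hand crop and add a leading batch axis."""
    if crop.dtype != np.uint8:
        raise TypeError(f"expected uint8, got {crop.dtype}")
    if crop.shape != (256, 256, 3):
        raise ValueError(f"expected shape (256, 256, 3), got {crop.shape}")
    return crop[np.newaxis, ...]  # -> (1, 256, 256, 3)


# Placeholder image so the sketch runs without any input file.
dummy_crop = np.zeros((256, 256, 3), dtype=np.uint8)
batch = prepare_crop(dummy_crop)
print(batch.shape)  # (1, 256, 256, 3)
```

The result can then be wrapped as in the Quick Start: `image = mx.array(prepare_crop(crop))`.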

Architecture

ViT-H/16 backbone: 1280 embed dim, 32 layers, 16 heads, 210 tokens (192 patches + 18 learnable)
MANO hand model: 778 vertices, 16 joints, Linear Blend Skinning with kinematic chain
RefineNet: Multi-scale deconvolution + bilinear grid sampling + MANO parameter refinement
Total parameters: ~610M
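To illustrate what "Linear Blend Skinning with kinematic chain" means, here is a toy NumPy sketch with two joints and two vertices. It is a generic textbook formulation, not the wilor-mlx implementation, and it omits the shape and pose blend shapes that MANO applies before skinning. All function names are illustrative.

```python
import numpy as np


def translation(x: float, y: float, z: float) -> np.ndarray:
    t = np.eye(4)
    t[:3, 3] = (x, y, z)
    return t


def rotation_z(angle_rad: float) -> np.ndarray:
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    r = np.eye(4)
    r[:2, :2] = [[c, -s], [s, c]]
    return r


def forward_kinematics(local: np.ndarray, parents: list) -> np.ndarray:
    """Compose local 4x4 joint transforms along the chain.

    parents[j] is the parent index of joint j (-1 for the root);
    parents must be listed before their children.
    """
    world = np.empty_like(local)
    for j, p in enumerate(parents):
        world[j] = local[j] if p < 0 else world[p] @ local[j]
    return world


def linear_blend_skinning(vertices, weights, skin_transforms):
    """Blend per-joint transforms by skinning weights and apply them.

    vertices: (V, 3), weights: (V, J) with rows summing to 1,
    skin_transforms: (J, 4, 4) mapping rest pose to posed space.
    """
    homo = np.concatenate([vertices, np.ones((len(vertices), 1))], axis=1)
    blended = np.einsum("vj,jab->vab", weights, skin_transforms)  # (V, 4, 4)
    return np.einsum("vab,vb->va", blended, homo)[:, :3]


# Toy chain: root at the origin, child joint at x = 1, bent 90 degrees about z.
parents = [-1, 0]
rest_local = np.stack([np.eye(4), translation(1, 0, 0)])
posed_local = np.stack([np.eye(4), translation(1, 0, 0) @ rotation_z(np.pi / 2)])

rest_world = forward_kinematics(rest_local, parents)
posed_world = forward_kinematics(posed_local, parents)
# Skinning transform: undo the rest pose, then apply the posed transform.
skin = posed_world @ np.linalg.inv(rest_world)

vertices = np.array([[0.5, 0.0, 0.0], [2.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.0, 1.0]])  # each vertex bound to one joint
posed = linear_blend_skinning(vertices, weights, skin)
print(np.round(posed, 6))
```

The vertex bound to the root stays at (0.5, 0, 0), while the vertex bound to the child rotates about the joint at x = 1 and lands at (1, 1, 0). In MANO the same idea applies with 778 vertices and 16 joints.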

Citation

@article{zhan2024wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
  year={2024}
}

License

The wilor-mlx code and these weight files are MIT licensed. The weights contain only ViT backbone, RefineNet, and learned embedding parameters — no MANO data is bundled or rehosted.

The MANO hand model is separately licensed by the Max Planck Institute. WiLoR.from_pretrained() fetches upstream WiLoR-mini assets and converts MANO data locally on your machine. You can also supply your own MANO data via mano_path=....
