WiLoR-MLX: Hand Pose Estimation on Apple Silicon

MLX port of WiLoR-mini for native Apple Silicon inference. Complete pipeline: ViT-H/16 backbone + MANO hand model + RefineNet refinement.

Code: github.com/lyonsno/wilor-mlx

Available Weights

| Variant | File | Size | Precision | Notes |
|---|---|---|---|---|
| float32 | wilor-mlx.safetensors | 2.4 GB | Full | Reference quality, recommended |
| int4 | wilor-mlx-int4.safetensors | 490 MB | 4-bit quantized | 5x smaller download, same speed |

Both variants produce near-identical inference speed on Apple Silicon (see benchmarks below). Choose based on download size and precision needs.

These weights contain only ViT backbone, RefineNet, and learned embedding parameters; no MANO data is bundled or rehosted. WiLoR.from_pretrained() handles MANO automatically by fetching upstream WiLoR-mini assets and converting locally on your machine. The MANO hand model is separately licensed by the Max Planck Institute.

Performance

Apple M4 Max, single-image (1×256×256×3), float32:

Same-harness gesture UI saved-frame route

| Stage | Backend | p50 | p90 | p95 | p99 |
|---|---|---|---|---|---|
| model | MLX (wilor-mlx) | 37.0 ms | 37.3 ms | 37.4 ms | 37.7 ms |
| model | PyTorch MPS | 48.8 ms | 49.6 ms | 49.8 ms | 54.2 ms |
| full route | MLX (wilor-mlx) | 48.8 ms | 49.5 ms | 49.5 ms | 49.6 ms |
| full route | PyTorch MPS | 60.1 ms | 60.8 ms | 61.1 ms | 62.1 ms |

Both backends ran on the same M4 Max machine with the same saved-frame harness, over 40 recent active 160x120 frames from a gesture UI prototype. At p50, MLX is about 1.3x faster on the pose/reconstruction model stage (37.0 ms vs 48.8 ms) and about 1.2x faster on the full saved-frame route (48.8 ms vs 60.1 ms). These figures replace the older app-tail telemetry as the baseline for the launch comparison.
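The quoted speedups follow directly from the p50 column of the saved-frame table. A minimal sketch of that arithmetic is shown here; the dictionary layout and the `speedup` helper are illustrative names, not part of wilor-mlx.

```python
# p50 latencies in milliseconds, copied from the saved-frame table above.
P50_MS = {
    ("model", "mlx"): 37.0,
    ("model", "mps"): 48.8,
    ("full route", "mlx"): 48.8,
    ("full route", "mps"): 60.1,
}


def speedup(stage: str) -> float:
    """Return how many times faster MLX is than PyTorch MPS at p50."""
    return P50_MS[(stage, "mps")] / P50_MS[(stage, "mlx")]


for stage in ("model", "full route"):
    print(f"{stage}: {speedup(stage):.2f}x")
# model: 1.32x
# full route: 1.23x
```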

Larger derived-frame stress tests widen the latency distributions of both backends; MLX remained faster in those runs, but we treat those numbers as route/runtime stress evidence rather than as the headline model benchmark.

Isolated model benchmark

| Backend | p50 | p90 | min | FPS |
|---|---|---|---|---|
| MLX (wilor-mlx) | 36 ms | 36 ms | 36 ms | 28 |
| PyTorch MPS (2.5.0) | 50 ms | 51 ms | 49 ms | 20 |

MLX is about 1.4x faster in pure model compute (36 ms vs 50 ms at p50). Both backends ran the same deterministic input for 100 timed iterations after 30 warmup iterations, with batched timing.
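For readers who want to reproduce this kind of measurement, here is a minimal sketch of a warmup-then-measure loop reporting the same columns as the table (p50, p90, min, FPS). It is not the harness used for the numbers above: `benchmark` and `percentile` are hypothetical helpers, the percentile uses the nearest-rank method, and each call is timed individually, which may differ from the "batched timing" the benchmark refers to.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_ms: List[float], q: float) -> float:
    """Nearest-rank percentile of an ascending list (q in 0..100)."""
    rank = max(1, math.ceil(q / 100 * len(sorted_ms)))
    return sorted_ms[rank - 1]


def benchmark(step: Callable[[], None], warmup: int = 30, iters: int = 100) -> Dict[str, float]:
    """Run `step` with warmup, then summarize per-call latency in milliseconds."""
    for _ in range(warmup):
        step()  # discarded runs: let caches and kernel compilation settle
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p50 = percentile(samples, 50)
    return {
        "p50": p50,
        "p90": percentile(samples, 90),
        "min": samples[0],
        "fps": 1000.0 / p50 if p50 > 0 else float("inf"),
    }
```

Because MLX evaluates lazily, a `step` that wraps the model should force the computation inside the timed region, for example by calling `mx.eval` on the result as in the Quick Start; otherwise the loop measures graph construction only.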

Quantization impact on speed

| Variant | p50 | FPS | Notes |
|---|---|---|---|
| float32 | 36 ms | 28 | Reference |
| float16 | 36 ms | 28 | Equal ALU throughput on M4 Max |
| int4 | 37 ms | 27 | Dequant overhead ≈ bandwidth savings |

On Apple Silicon, float16 and int4 do not improve latency for this model size (210 tokens × 1280 dim). The GPU is compute-overhead-bound, not bandwidth-bound. Int4's value is purely download size reduction (2.4 GB → 490 MB).
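The file sizes can be sanity-checked against the parameter count. The sketch below is a rough estimate only: it assumes the ~610M total listed under Architecture and counts raw weight bits, ignoring file metadata. `raw_weight_mb` is a hypothetical helper, not part of wilor-mlx.

```python
PARAMS = 610e6  # approximate total parameter count from the model card


def raw_weight_mb(bits_per_param: float, params: float = PARAMS) -> float:
    """Raw weight storage in decimal megabytes, ignoring file metadata."""
    return params * bits_per_param / 8 / 1e6


for name, bits in (("float32", 32), ("float16", 16), ("int4", 4)):
    print(f"{name}: {raw_weight_mb(bits):.0f} MB")
# float32: 2440 MB
# float16: 1220 MB
# int4: 305 MB
```

The float32 estimate (about 2.44 GB) agrees with the 2.4 GB file. The raw 4-bit figure (305 MB) is below the published 490 MB; a plausible reason is that quantized files also store per-group scales and may keep some tensors at higher precision, but that explanation is an assumption rather than something stated here.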

Numerical Accuracy

Compared against PyTorch WiLoR-mini on identical float32 inputs:

| Variant | pred_vertices max diff | pred_keypoints_3d max diff |
|---|---|---|
| float32 | 0.006 (sub-mm) | 0.006 (sub-mm) |
| int4 | 0.044 (< 2 mm) | 0.044 (< 2 mm) |

Both are within visual tolerance for real-time hand tracking.
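A "max diff" of this kind is usually the largest element-wise absolute difference between the reference output and the ported output. The following is a minimal sketch of that metric, assuming both outputs have first been converted to nested Python lists; `flatten` and `max_abs_diff` are illustrative names, and the exact comparison script behind the table is not shown in this card.

```python
from typing import Iterator, Sequence, Union

Nested = Union[float, Sequence["Nested"]]


def flatten(values: Nested) -> Iterator[float]:
    """Yield every scalar in an arbitrarily nested list or tuple."""
    if isinstance(values, (list, tuple)):
        for item in values:
            yield from flatten(item)
    else:
        yield float(values)


def max_abs_diff(reference: Nested, candidate: Nested) -> float:
    """Largest element-wise absolute difference between two same-shaped outputs."""
    ref, cand = list(flatten(reference)), list(flatten(candidate))
    if len(ref) != len(cand):
        raise ValueError("outputs must contain the same number of elements")
    return max((abs(r - c) for r, c in zip(ref, cand)), default=0.0)
```

With MLX or NumPy arrays, calling `.tolist()` on each output produces the nested lists this helper expects, for example `max_abs_diff(torch_vertices.tolist(), result['pred_vertices'].tolist())`, where `torch_vertices` stands for the PyTorch reference output.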

Quick Start

from wilor_mlx import WiLoR
import mlx.core as mx

# Everything downloads and caches automatically
# First run requires torch for one-time MANO conversion; after that, torch is not used
model = WiLoR.from_pretrained()

# Inference
image = mx.array(your_256x256_hand_crop)  # (1, 256, 256, 3) uint8
result = model(image)
mx.eval(result)

keypoints = result['pred_keypoints_3d']  # (1, 21, 3)
vertices = result['pred_vertices']        # (1, 778, 3)

See github.com/lyonsno/wilor-mlx for full documentation.

Architecture

  • ViT-H/16 backbone: 1280 embed dim, 32 layers, 16 heads, 210 tokens (192 patches + 18 learnable)
  • MANO hand model: 778 vertices, 16 joints, Linear Blend Skinning with kinematic chain
  • RefineNet: Multi-scale deconvolution + bilinear grid sampling + MANO parameter refinement
  • Total parameters: ~610M
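The token count in the backbone bullet can be reproduced with a little arithmetic. The sketch below assumes the patch embedding sees a 256×192 region (16×12 patches of size 16), which is one reading that yields 192 patches; the card itself only states a 256×256 crop as input, so treat the 256×192 figure as an assumption. `token_count` is a hypothetical helper.

```python
PATCH = 16      # patch size, from the "ViT-H/16" name
LEARNABLE = 18  # learnable tokens, from the architecture list


def token_count(height: int, width: int, patch: int = PATCH, extra: int = LEARNABLE) -> int:
    """Number of ViT tokens: one per image patch plus the extra learnable tokens."""
    if height % patch or width % patch:
        raise ValueError("image sides must be multiples of the patch size")
    return (height // patch) * (width // patch) + extra


# ASSUMPTION: a 256x192 region reaches the patch embedding (16 x 12 = 192 patches).
print(token_count(256, 192))  # 210
```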

Citation

@article{potamias2024wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
  year={2024}
}

License

The wilor-mlx code and these weight files are MIT licensed. The weights contain only ViT backbone, RefineNet, and learned embedding parameters; no MANO data is bundled or rehosted.

The MANO hand model is separately licensed by the Max Planck Institute. WiLoR.from_pretrained() fetches upstream WiLoR-mini assets and converts MANO data locally on your machine. You can also supply your own MANO data via mano_path=....
