mvp-dataset: deterministic data loading for multimodal training
mvp-dataset is a minimal, high-performance data loading library for multimodal training pipelines. It reads local tar and JSONL sources, shards them deterministically, and integrates with PyTorch loaders.
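
To make "deterministic sharding" concrete, here is a minimal sketch of the general technique in plain Python. It is not mvp-dataset's API: the function `assign_shards`, its parameters (`rank`, `world_size`, `seed`, `epoch`), and the shard file names are all hypothetical and chosen only for illustration. The idea is that every worker derives the same shuffled shard order from a shared seed and then takes a disjoint strided slice of it.

```python
import random


def assign_shards(shards, rank, world_size, seed=0, epoch=0):
    """Return the shards that `rank` reads in `epoch` (illustrative only)."""
    # Sort first so the result does not depend on the order the caller
    # happened to list the shards in (e.g. directory listing order).
    ordered = sorted(shards)
    # A private RNG seeded from (seed, epoch): every rank computes the
    # same permutation without communicating, and each epoch reshuffles.
    rng = random.Random(seed * 1_000_003 + epoch)
    rng.shuffle(ordered)
    # Strided slice: rank r takes positions r, r + world_size, ...
    return ordered[rank::world_size]


# Hypothetical shard names, split across 4 workers.
shards = [f"shard-{i:05d}.tar" for i in range(8)]
parts = [assign_shards(shards, r, 4, seed=42, epoch=0) for r in range(4)]
```

Because the permutation depends only on the sorted shard list, the seed, and the epoch, rerunning a job with the same settings yields the same per-worker assignment, and the slices for different ranks never overlap.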