
👋 Hi, I'm pppop7

I'm working on Vision-Language Models and Video Research.


📚 My LLaVA Training Datasets

A complete collection of datasets for training LLaVA (Large Language and Vision Assistant) and Video-LLM models.

🎯 Quick Overview

| Dataset | Type | Size | Format | Purpose |
| LLaVA-Pretrain | Image-Text | 558K pairs | JSON + ZIP | Pretraining |
| LLaVA-Instruct-150K | Instruction | 150K conversations | JSON | Instruction Tuning |
| textvqa | VQA | ~35K | Parquet | Text Reading in Images |
| GQA | VQA | ~22M Q&A | Parquet | Visual Reasoning |
| coco_train_2017 | Images | 118K images | ZIP (~18GB) | General Images |
| VisualGenome_VG_100K_1_and_2 | Images | 216K images | ZIP (~15GB) | Scene Understanding |
| OCR-VQA | VQA | ~200K | Parquet | OCR Question Answering |

📥 Quick Download Guide

For ZIP datasets (COCO, Visual Genome)

from huggingface_hub import hf_hub_download

# Download COCO train2017
hf_hub_download(
    repo_id="pppop7/coco_train_2017",
    repo_type="dataset",
    filename="train2017.zip",
    local_dir="./data/coco"
)

# Download Visual Genome (both archives: images.zip and images2.zip)
for filename in ("images.zip", "images2.zip"):
    hf_hub_download(
        repo_id="pppop7/VisualGenome_VG_100K_1_and_2",
        repo_type="dataset",
        filename=filename,
        local_dir="./data/vg"
    )
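The downloaded archives still need to be unpacked before training. A minimal sketch using only the standard library follows; the helper name `extract_archive` is hypothetical, and the internal folder layout of each archive (for example a top-level `train2017/` directory) is an assumption that should be checked against the actual files.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the number of files written.

    Hypothetical helper: directory entries inside the archive are not counted.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Count regular files only, so the return value is a file count.
        members = [m for m in archive.infolist() if not m.is_dir()]
        archive.extractall(dest)
    return len(members)


# Example call (paths taken from the download snippet above):
# extract_archive("./data/coco/train2017.zip", "./data/coco")
```

Extracting into the same directory as the archive keeps each dataset self-contained; the ZIP can be deleted afterwards to reclaim disk space.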

For Parquet datasets (TextVQA, GQA, OCR-VQA)

from datasets import load_dataset

# Load TextVQA
textvqa = load_dataset("pppop7/textvqa")

# Load GQA
gqa = load_dataset("pppop7/GQA")

# Load OCR-VQA
ocr_vqa = load_dataset("pppop7/OCR-VQA")

Download All at Once

from huggingface_hub import snapshot_download

datasets = [
    "pppop7/LLaVA-Pretrain",
    "pppop7/LLaVA-Instruct-150K",
    "pppop7/textvqa",
    "pppop7/GQA",
    "pppop7/coco_train_2017",
    "pppop7/VisualGenome_VG_100K_1_and_2",
    "pppop7/OCR-VQA",
]

for ds in datasets:
    snapshot_download(repo_id=ds, repo_type="dataset", local_dir=f"./data/{ds.split('/')[-1]}")

🗂️ Recommended Directory Structure

After downloading, organize your data like this:

data/
├── llava/
│   ├── LLaVA-Pretrain/
│   │   ├── blip_laion_cc_sbu_558k.json
│   │   └── images/
│   └── LLaVA-Instruct-150K/
│       └── llava_v1_5_mix665k.json
├── coco/
│   └── train2017/          # Extracted from ZIP
├── vg/
│   ├── VG_100K/            # Extracted from images.zip
│   └── VG_100K_2/          # Extracted from images2.zip
├── textvqa/                # Parquet files
├── gqa/                    # Parquet files
└── ocr_vqa/                # Parquet files
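The directory names in this layout differ from the repository names (for example `GQA` is stored under `gqa/`), so a small lookup can place each download directly where the layout expects it. The sketch below is one way to do that; `LAYOUT` and `local_dir_for` are hypothetical names introduced here for illustration.

```python
from pathlib import Path

# Repository name -> directory in the recommended layout (relative to the data root).
LAYOUT = {
    "LLaVA-Pretrain": "llava/LLaVA-Pretrain",
    "LLaVA-Instruct-150K": "llava/LLaVA-Instruct-150K",
    "coco_train_2017": "coco",
    "VisualGenome_VG_100K_1_and_2": "vg",
    "textvqa": "textvqa",
    "GQA": "gqa",
    "OCR-VQA": "ocr_vqa",
}


def local_dir_for(repo_id, root="./data"):
    """Return the recommended local directory for a repo ID such as 'pppop7/GQA'."""
    name = repo_id.split("/")[-1]
    return Path(root) / LAYOUT[name]
```

The returned path can be passed as `local_dir` to `snapshot_download` or `hf_hub_download`, so no files need to be moved after the download finishes.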

📊 Dataset Details

1. LLaVA-Pretrain

  • Purpose: Stage 1 pretraining for vision-language alignment
  • Content: 558K image-caption pairs from a LAION/CC/SBU subset, captioned with BLIP
  • Files: blip_laion_cc_sbu_558k.json + images.zip
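After extracting the images, it can be worth checking that every record in the annotation file points at an image that actually exists on disk. The sketch below assumes the JSON file is a list of records and that each record stores a relative image path under an `image` key; that field name is an assumption about the file format, not something stated here, and `find_missing_images` is a hypothetical helper.

```python
import json
from pathlib import Path


def find_missing_images(annotation_file, image_root, image_key="image"):
    """Return the image paths listed in annotation_file that are absent under image_root.

    Assumes the file holds a JSON list of dicts; records without image_key are skipped.
    """
    with open(annotation_file, encoding="utf-8") as f:
        records = json.load(f)
    root = Path(image_root)
    return [
        record[image_key]
        for record in records
        if image_key in record and not (root / record[image_key]).is_file()
    ]


# Example call (paths follow the recommended directory structure):
# find_missing_images(
#     "./data/llava/LLaVA-Pretrain/blip_laion_cc_sbu_558k.json",
#     "./data/llava/LLaVA-Pretrain/images",
# )
```

An empty list means every referenced image was found; a non-empty list usually points to an incomplete extraction or a wrong `image_root`.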

2. LLaVA-Instruct-150K

  • Purpose: Stage 2 instruction tuning
  • Content: 150K visual instruction conversations
  • Includes: Complex reasoning, detailed descriptions, conversations

3. TextVQA

  • Purpose: Text reading in images
  • Content: Images containing text + questions about the text
  • Format: Parquet with embedded images

4. GQA

  • Purpose: Visual reasoning and compositional questions
  • Content: ~22M question-answer pairs
  • Format: Parquet with embedded images

5. COCO train2017

  • Purpose: General image understanding
  • Content: 118K diverse images
  • Format: ZIP archive

6. Visual Genome

  • Purpose: Dense scene understanding
  • Content: 216K images with rich annotations
  • Format: Two ZIP archives (VG_100K + VG_100K_2)

7. OCR-VQA

  • Purpose: Reading and understanding text on book covers
  • Content: ~200K VQA pairs
  • Format: Parquet with embedded images

🎬 Coming Soon: Video Datasets

Stay tuned for video-related datasets for Video-LLM research!


📬 Contact

Feel free to reach out if you have questions about these datasets!


📜 License

Each dataset follows its original license. Please check individual dataset pages for details.

  • COCO: CC BY 4.0
  • Visual Genome: CC BY 4.0
  • GQA: CC BY 4.0
  • TextVQA: CC BY 4.0
  • OCR-VQA: Please check original source