tasksource: 600+ dataset harmonization preprocessings with structured annotations for frictionless extreme multi-task learning and evaluation

Hugging Face Datasets is a great library, but it lacks standardization, and datasets require preprocessing work before they can be used interchangeably. tasksource automates this preprocessing and facilitates reproducible scaling of multi-task learning.

Each dataset is standardized to a MultipleChoice, Classification, or TokenClassification dataset with identical fields. We do not support generation tasks, as they are addressed by promptsource. All implemented preprocessings are in tasks.py or tasks.md. A preprocessing is a function that accepts a dataset and returns the standardized dataset. Preprocessing code is concise and human-readable.
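
To make the idea of a preprocessing concrete, here is a minimal sketch in plain Python. It does not use tasksource's actual classes: the function name `to_classification`, the raw column names (`premise`, `hypothesis`, `gold`), and the standardized field names (`sentence1`, `sentence2`, `labels`) are assumptions chosen for illustration, and a dataset is modeled as a simple list of dicts.

```python
from typing import Any

Record = dict[str, Any]


def to_classification(
    dataset: list[Record], sentence1: str, sentence2: str, labels: str
) -> list[Record]:
    """Hypothetical preprocessing: map raw columns to standardized fields.

    Accepts a dataset and returns a new, standardized dataset. Columns that
    are not mapped (for example annotator ids) are dropped.
    """
    mapping = {sentence1: "sentence1", sentence2: "sentence2", labels: "labels"}
    return [{new: row[old] for old, new in mapping.items()} for row in dataset]


# Illustrative raw dataset with its own column naming (made-up example row).
raw_nli = [
    {
        "premise": "A man sleeps.",
        "hypothesis": "A person rests.",
        "gold": 0,
        "annotator": "a1",
    }
]

standardized = to_classification(raw_nli, "premise", "hypothesis", "gold")
print(standardized)
```

Once two datasets with different raw columns have both passed through such a function, downstream code only needs to know the standardized field names.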

GitHub: https://github.com/sileod/tasksource

Installation and usage:

pip install tasksource

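from tasksource import list_tasks, load_task

# Table of all supported tasks, one row per task
df = list_tasks()

# Load every MultipleChoice task by its id
for task_id in df[df.task_type == "MultipleChoice"].id:
    dataset = load_task(task_id)
    # all yielded datasets can be used interchangeably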

See the 600+ supported tasks in tasks.md (200+ MultipleChoice tasks, 200+ Classification tasks), and feel free to request a new task. Datasets are downloaded to $HF_DATASETS_CACHE (like any Hugging Face dataset), so make sure you have more than 100 GB of free space there.
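
Before loading many tasks, it can help to check the free space in the cache location. The following is a minimal sketch using only the Python standard library. The helper names are hypothetical, the fallback path `~/.cache/huggingface/datasets` is assumed to be the default used when `$HF_DATASETS_CACHE` is unset, and the 100 GB threshold is read as decimal gigabytes.

```python
import os
import shutil
from pathlib import Path

# The ">100GB" figure from the text, interpreted as decimal gigabytes (assumption).
REQUIRED_BYTES = 100 * 10**9


def datasets_cache_dir() -> Path:
    """Return $HF_DATASETS_CACHE if set, else the assumed default location."""
    default = Path.home() / ".cache" / "huggingface" / "datasets"
    return Path(os.environ.get("HF_DATASETS_CACHE", default))


def has_enough_space(path: Path, required: int = REQUIRED_BYTES) -> bool:
    """Check free space on the filesystem that holds (or will hold) `path`."""
    # The cache directory may not exist yet, so walk up to an existing ancestor.
    probe = path
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    return shutil.disk_usage(probe).free >= required


cache = datasets_cache_dir()
print(f"cache: {cache}, enough space: {has_enough_space(cache)}")
```

If the check fails, pointing `HF_DATASETS_CACHE` at a larger disk before importing the library is one way to relocate the downloads.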

Pretrained model:

A text encoder pretrained on tasksource reached state-of-the-art results: 🤗/deberta-v3-base-tasksource-nli

Contact and citation

I can help you integrate tasksource into your experiments: damien.sileo@inria.fr

More details in this article:

@inproceedings{sileo-2024-tasksource-large,
    title = "tasksource: A Large Collection of {NLP} tasks with a Structured Dataset Preprocessing Framework",
    author = "Sileo, Damien",
    booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
    month = may,
    year = "2024",
    address = "Torino, Italia",
    publisher = "ELRA and ICCL",
    url = "https://aclanthology.org/2024.lrec-main.1361",
    pages = "15655--15684",
}