# MJ-Bench Team

MJ-Bench-Team is co-founded by Stanford University, UNC-Chapel Hill, and the University of Chicago. We aim to align modern foundation models with multimodal judges to enhance reliability, safety, and performance.
---

## Recent News

- 🔥 We have released [**MJ-Video**](https://aiming-lab.github.io/MJ-VIDEO.github.io/). All datasets and model checkpoints are available [here](https://huggingface.co/MJ-Bench)!
- 🎉 **MJ-PreferGen** has been **accepted at ICLR 2025**! Check out the paper: [*MJ-PreferGen: An Automatic Framework for Preference Data Synthesis*](https://openreview.net/forum?id=WpZyPk79Fu).

---

## 😎 [**MJ-Video**: Fine-Grained Benchmarking and Rewarding Video Preferences in Video Generation](https://aiming-lab.github.io/MJ-VIDEO.github.io/)

- **Project page**: [https://aiming-lab.github.io/MJ-VIDEO.github.io/](https://aiming-lab.github.io/MJ-VIDEO.github.io/)
- **Code repository**: [https://github.com/aiming-lab/MJ-Video](https://github.com/aiming-lab/MJ-Video)

We release **MJ-Bench-Video**, a comprehensive fine-grained video preference benchmark, and **MJ-Video**, a powerful MoE-based multi-dimensional video reward model!

\n \"MJ-Video\n

\n\n---\n\n## πŸ‘©β€βš–οΈ [**MJ-Bench**: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?](https://mj-bench.github.io/)\n\n- **Project page**: [https://mj-bench.github.io/](https://mj-bench.github.io/)\n- **Code repository**: [https://github.com/MJ-Bench/MJ-Bench](https://github.com/MJ-Bench/MJ-Bench)\n\nText-to-image models like DALLE-3 and Stable Diffusion are proliferating rapidly, but they often encounter challenges such as hallucination, bias, and unsafe or low-quality output. To effectively address these issues, it’s crucial to align these models with desired behaviors based on feedback from a **multimodal judge**.\n\n

\n \"MJ-Bench\n

However, current multimodal judges are often **under-evaluated**, leading to possible misalignment and safety concerns during fine-tuning. To tackle this, we introduce **MJ-Bench**, a new benchmark featuring a comprehensive preference dataset to evaluate multimodal judges on four critical dimensions:

1. **Alignment**
2. **Safety**
3. **Image Quality**
4. **Bias**

We evaluate a wide range of multimodal judges, including:

- 6 smaller-sized CLIP-based scoring models
- 11 open-source VLMs (e.g., the LLaVA family)
- 4 closed-source VLMs (e.g., GPT-4, Claude 3)

🔥 **We are actively updating the [leaderboard](https://mj-bench.github.io/)!**
You are welcome to submit your multimodal judge’s evaluation results on [our dataset](https://huggingface.co/datasets/MJ-Bench/MJ-Bench) to the [Hugging Face leaderboard](https://huggingface.co/spaces/MJ-Bench/MJ-Bench-Leaderboard).
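The pairwise evaluation behind such leaderboard numbers can be sketched as follows: a judge is counted as correct on a preference pair when it scores the human-preferred image higher than the rejected one. The half-credit handling of ties below is an assumption for illustration, not necessarily MJ-Bench's exact metric.

```python
# Minimal sketch of a pairwise preference-accuracy metric for a judge.
# Tie handling (half credit) is an assumed convention.

def pairwise_accuracy(judge_scores):
    """judge_scores: list of (score_chosen, score_rejected) tuples,
    where 'chosen' is the human-preferred image of each pair."""
    if not judge_scores:
        return 0.0
    credit = 0.0
    for chosen, rejected in judge_scores:
        if chosen > rejected:
            credit += 1.0          # judge agrees with the human preference
        elif chosen == rejected:
            credit += 0.5          # assumed tie-breaking convention
    return credit / len(judge_scores)


scores = [(0.8, 0.3), (0.4, 0.6), (0.5, 0.5), (0.9, 0.1)]
acc = pairwise_accuracy(scores)    # 2 wins + 1 tie over 4 pairs -> 0.625
```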

