arXiv:2511.03001

LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation

Published on Nov 4, 2025 · Submitted by taesiri on Nov 6, 2025
#2 Paper of the day
Project page: https://gyeomh.github.io/LEGO-Eval/ · GitHub: https://github.com/gyeomh/LEGO-EVAL
Authors: Gyeom Hwangbo, Hyungjoo Chae, Minseok Kang, Hyeonjong Ju, Soohyun Oh, Jinyoung Yeo

Abstract

LEGO-Eval and LEGO-Bench improve the evaluation and generation of realistic 3D scenes by aligning detailed instructions with scene components, outperforming existing methods.

AI-generated summary

Despite recent progress in using Large Language Models (LLMs) for automatically generating 3D scenes, generated scenes often lack realistic spatial layouts and object attributes found in real-world environments. As this problem stems from insufficiently detailed, coarse-grained instructions, advancing 3D scene synthesis guided by more detailed, fine-grained instructions that reflect real-world environments becomes crucial. Without such realistic scenes, training embodied agents in unrealistic environments can lead them to learn priors that diverge significantly from real-world physics and semantics, degrading their performance when deployed. Thus, verifying the alignment between the fine-grained instruction and the generated scene is essential for effective learning. However, current evaluation methods, such as CLIPScore and vision-language models (VLMs), often fail to reliably assess such alignment. This shortcoming arises primarily from their shallow understanding of 3D scenes, which often leads to improperly grounded scene components. To address this, we introduce LEGO-Eval, an evaluation framework equipped with diverse tools designed to explicitly ground scene components, enabling more accurate alignment assessments. We also present LEGO-Bench, a benchmark of detailed instructions that specify complex layouts and attributes of real-world environments. Experiments demonstrate that LEGO-Eval outperforms VLM-as-a-judge by 0.41 F1 score in assessing scene-instruction alignment. Benchmarking with LEGO-Bench reveals significant limitations in current generation methods. Across all evaluated approaches, success rates reached at most 10% in generating scenes that fully align with fine-grained instructions.
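For context on the baseline the paper critiques, here is a minimal sketch of how a CLIPScore-style scene-instruction alignment check is typically computed from a rendered scene image, using the open-source openai/CLIP package. The image path, model variant, and helper name are illustrative assumptions, not taken from the paper:

```python
# Illustrative sketch (not the paper's code): CLIPScore-style alignment
# between a rendered 3D scene image and a fine-grained instruction.
import torch
import clip  # openai/CLIP: pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def clip_score(image_path: str, instruction: str) -> float:
    """CLIPScore (Hessel et al., 2021): 2.5 * max(cos(image, text), 0)."""
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize([instruction], truncate=True).to(device)
    with torch.no_grad():
        img_emb = model.encode_image(image)
        txt_emb = model.encode_text(text)
    # Normalize embeddings so the dot product is cosine similarity.
    img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
    txt_emb = txt_emb / txt_emb.norm(dim=-1, keepdim=True)
    cos = (img_emb * txt_emb).sum().item()
    return 2.5 * max(cos, 0.0)
```

The reported 0.41 gain refers to standard binary-classification F1 over alignment judgments, which could be scored as below (labels are hypothetical, for illustration only):

```python
from sklearn.metrics import f1_score

gold = [1, 0, 1, 1, 0, 1]  # hypothetical human judgments per instruction clause
pred = [1, 0, 0, 1, 1, 1]  # hypothetical evaluator judgments
print(f"F1 = {f1_score(gold, pred):.2f}")  # F1 = 0.75 for these toy labels
```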

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper: 0
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 0