arxiv:2604.07296

OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence

Published on Apr 8 · Submitted by Vincent Lee on Apr 10

Abstract

OpenSpatial presents an open-source data engine for spatial reasoning tasks using 3D bounding boxes, creating a large-scale dataset and achieving state-of-the-art performance in spatial perception benchmarks.

AI-generated summary

Spatial understanding is a cornerstone of human-level intelligence. Nonetheless, current research predominantly focuses on domain-specific data production, leaving a critical void: the absence of a principled, open-source engine capable of fully unleashing the potential of high-quality spatial data. To bridge this gap, we elucidate the design principles of a robust data generation system and introduce OpenSpatial, an open-source data engine engineered for high quality, extensive scalability, broad task diversity, and optimized efficiency. OpenSpatial adopts 3D bounding boxes as the fundamental primitive to construct a comprehensive data hierarchy across five foundational tasks: Spatial Measurement (SM), Spatial Relationship (SR), Camera Perception (CP), Multi-view Consistency (MC), and Scene-Aware Reasoning (SAR). Leveraging this scalable infrastructure, we curate OpenSpatial-3M, a large-scale dataset comprising 3 million high-fidelity samples. Extensive evaluations demonstrate that versatile models trained on our dataset achieve state-of-the-art performance across a wide spectrum of spatial reasoning benchmarks. Notably, the best-performing model exhibits a substantial average relative improvement of 19 percent. Furthermore, we provide a systematic analysis of how data attributes influence spatial perception. By open-sourcing both the engine and the 3M-scale dataset, we provide a robust foundation to accelerate future research in spatial intelligence.
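
To make the idea of a 3D bounding box as the base primitive concrete, here is a minimal sketch of how question-answer samples for two of the listed tasks (SM and SR) could be derived from a pair of boxes. It is an illustration only, not the OpenSpatial engine's code or API: the `OrientedBox` class, the function names, the question templates, and the example scene values are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class OrientedBox:
    """Hypothetical oriented 3D bounding box (not the OpenSpatial schema)."""
    label: str
    center: Tuple[float, float, float]  # box center in meters (x, y, z)
    size: Tuple[float, float, float]    # full extents along the box's own axes
    yaw: float = 0.0                    # rotation about the vertical axis, radians

    def volume(self) -> float:
        # Volume depends only on the extents, not on the orientation.
        return self.size[0] * self.size[1] * self.size[2]


def measurement_qa(a: OrientedBox, b: OrientedBox) -> Dict[str, str]:
    """Spatial Measurement (SM) style sample: center-to-center distance."""
    distance = math.dist(a.center, b.center)
    return {
        "task": "SM",
        "question": f"How far is the {a.label} from the {b.label}, center to center?",
        "answer": f"{distance:.2f} m",
    }


def relationship_qa(a: OrientedBox, b: OrientedBox) -> Dict[str, str]:
    """Spatial Relationship (SR) style sample: which object is larger by volume."""
    larger = a if a.volume() >= b.volume() else b
    return {
        "task": "SR",
        "question": f"Which is larger, the {a.label} or the {b.label}?",
        "answer": larger.label,
    }


# Made-up example scene, for illustration only.
chair = OrientedBox("chair", center=(0.0, 0.0, 0.5), size=(0.5, 0.5, 1.0))
table = OrientedBox("table", center=(3.0, 4.0, 0.5), size=(1.6, 0.8, 1.0), yaw=math.pi / 2)

print(measurement_qa(chair, table))
print(relationship_qa(chair, table))
```

Because both answers are computed from the box geometry rather than annotated by hand, the same templates can be applied to every object pair in a scene. The abstract's scalability claim presumably rests on a property of this kind, though the sketch does not show the engine's actual design.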

Community

Hi HF Community! 👋

We are excited to share OpenSpatial, a principled data engine designed to empower the spatial intelligence of Large Multimodal Models.

Key Highlights:

  • 📊 OpenSpatial-3M Dataset: We are open-sourcing a large-scale, high-fidelity dataset with 3 million samples across 100k+ diverse 3D scenes.
  • 🛠️ Open-Source Data Engine: We release our full data production and 3D lifting framework, enabling the community to generate high-quality spatial data from 3D primitives (OBB) at scale.
  • 📈 Significant Performance Gains: Our engine consistently boosts the spatial reasoning capabilities of state-of-the-art LMMs (e.g., Qwen2-VL, InternVL2) by a large margin across 5 foundational tasks.

Resources:

Check out our repo and feel free to join the discussion! 🚀
