Abstract

This paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. In the latter task, for instance, the proposed system takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving the jigsaw puzzle of room layouts. A surprising discovery of the paper is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements.
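
To make the pose parameterization in the abstract concrete, the following minimal sketch applies a 2D rotation and translation to one polygonal room piece. It only illustrates what "estimating 2D translations and rotations" means geometrically and is not the paper's model or code; the function name `apply_pose`, the vertex-list representation, and the example rectangle are all assumptions made for illustration.

```python
import math


def apply_pose(polygon, theta, tx, ty):
    """Rotate each (x, y) vertex by theta radians about the origin,
    then translate the result by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in polygon]


# Hypothetical room piece: a 2 x 1 rectangle in its own local frame.
room = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]

# Place the piece with a 90-degree rotation and a shift of 3 units along x.
placed = apply_pose(room, math.pi / 2, 3.0, 0.0)
```

Under this view, an arrangement of N pieces is described by N such (theta, tx, ty) triples, which is the kind of quantity a solver for this task would need to output for each room.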


Data

We provide the PuzzleFusion dataset, a large-scale (98,000 samples) real-world dataset with ground-truth labels for house reconstruction and generation tasks. The dataset includes both Manhattan and non-Manhattan samples and was gathered using the MagicPlan app. It is available in both unprocessed and processed versions; the unprocessed version includes more houses, including multi-floor structures. The data reader code is provided here. Please also check the ReadMe in the folder before using it.

Citation


        @misc{hosseini2023puzzlefusion,
          title={PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving},
          author={Sepidehsadat Hosseini and Mohammad Amin Shabani and Saghar Irandoust and Yasutaka Furukawa},
          year={2023},
          eprint={2211.13785},
          archivePrefix={arXiv},
          primaryClass={cs.AI}
        }