Abstract
This paper presents an end-to-end neural architecture based on Diffusion Models for spatial puzzle solving, particularly jigsaw puzzle and room arrangement tasks. In the latter task, for instance, the proposed system takes a set of room layouts as polygonal curves in the top-down view and aligns the room layout pieces by estimating their 2D translations and rotations, akin to solving the jigsaw puzzle of room layouts. A surprising discovery of the paper is that the simple use of a Diffusion Model effectively solves these challenging spatial puzzle tasks as a conditional generation process. To enable learning of an end-to-end neural system, the paper introduces new datasets with ground-truth arrangements:
- 2D Voronoi jigsaw dataset, a synthetic one where pieces are generated from the Voronoi diagram of a 2D point set;
- PuzzleFusion dataset, a real one offered by MagicPlan from its production pipeline, where pieces are room layouts constructed by real-estate consumers using an augmented-reality app.

The qualitative and quantitative evaluations demonstrate that our approach outperforms the competing methods by significant margins in all the tasks.
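
To make the synthetic setup concrete, the following is a minimal sketch of how a Voronoi jigsaw sample with a ground-truth arrangement could be built. It is an illustration under stated assumptions, not the authors' generation pipeline: the function names (`voronoi_cells`, `make_puzzle`, and so on), the unit-square canvas, and the pose sampling ranges are all hypothetical, and the cells are computed by plain half-plane clipping rather than by a library Voronoi routine. Each piece is scrambled by a random 2D rotation and translation, and the inverse pose is stored as the label a solver would need to estimate.

```python
import math
import random


def clip_halfplane(poly, a, b, c):
    """Clip a convex polygon to the half-plane a*x + b*y <= c."""
    out = []
    for i, p in enumerate(poly):
        q = poly[(i + 1) % len(poly)]
        fp = a * p[0] + b * p[1] - c
        fq = a * q[0] + b * q[1] - c
        if fp <= 0:                      # p lies inside the half-plane
            out.append(p)
        if fp * fq < 0:                  # edge p->q crosses the boundary
            t = fp / (fp - fq)
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out


def voronoi_cells(sites, size=1.0):
    """Voronoi cell of each site, clipped to the square [0, size]^2."""
    box = [(0.0, 0.0), (size, 0.0), (size, size), (0.0, size)]
    cells = []
    for i, (xi, yi) in enumerate(sites):
        cell = box
        for j, (xj, yj) in enumerate(sites):
            if i != j:
                # Keep the points that are closer to site i than to site j.
                cell = clip_halfplane(
                    cell, 2 * (xj - xi), 2 * (yj - yi),
                    xj * xj + yj * yj - xi * xi - yi * yi)
        cells.append(cell)
    return cells


def polygon_area(poly):
    """Unsigned polygon area via the shoelace formula."""
    n = len(poly)
    s = sum(poly[i][0] * poly[(i + 1) % n][1]
            - poly[(i + 1) % n][0] * poly[i][1] for i in range(n))
    return abs(s) / 2.0


def rigid_transform(poly, theta, tx, ty):
    """Rotate a polygon by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in poly]


def inverse_pose(theta, tx, ty):
    """Pose that undoes rigid_transform(., theta, tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return -theta, -(c * tx + s * ty), s * tx - c * ty


def make_puzzle(num_pieces=8, seed=0):
    """Return a list of (scrambled_piece, ground_truth_pose) pairs.

    The sampling ranges below are illustrative assumptions.
    """
    rng = random.Random(seed)
    sites = [(rng.random(), rng.random()) for _ in range(num_pieces)]
    puzzle = []
    for cell in voronoi_cells(sites):
        pose = (rng.uniform(-math.pi, math.pi),
                rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
        puzzle.append((rigid_transform(cell, *pose), inverse_pose(*pose)))
    return puzzle
```

Applying each stored pose with `rigid_transform(piece, *gt_pose)` reassembles the unit square, which is a convenient sanity check for any reader or solver working with such data.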
Data
We provide the PuzzleFusion dataset, a large-scale (98,000 samples) real-world dataset with ground-truth labels for house reconstruction and generation tasks. The dataset includes both Manhattan and non-Manhattan samples and was gathered using the MagicPlan app. It is available in both unprocessed and processed versions; the unprocessed version includes more houses, including multi-floor structures. The data reader code is also provided. Please check the ReadMe in the folder before using it.
Citation
@misc{hosseini2023puzzlefusion,
title={PuzzleFusion: Unleashing the Power of Diffusion Models for Spatial Puzzle Solving},
author={Sepidehsadat Hosseini and Mohammad Amin Shabani and Saghar Irandoust and Yasutaka Furukawa},
year={2023},
eprint={2211.13785},
archivePrefix={arXiv},
primaryClass={cs.AI}}