Fedor Rodionov, Aleksandar Cvejic, Michael Birsak, John Femiani, Peter Wonka
Furnished floor plans are fundamental to real estate visualization, interior design, and architectural workflows. However, progress in automatic furniture arrangement has been limited by the lack of real, professionally designed floor-plan datasets with object-level furniture annotations. To address this gap, we introduce AntPlan-270, a curated dataset of 270 architectural floor plans with per-room furniture bounding box annotations across ten residential room categories. Building on this dataset, we present Architect-Ant, an editable automatic furnishing framework powered by a fine-tuned vision-language model. Furniture layouts are represented using a compact, coordinate-based domain-specific language (DSL) that encodes object categories and placements relative to the room geometry. To improve spatial reasoning, we generate procedural reasoning traces that capture architectural constraints such as wall alignment, door and window clearance, circulation, fixture compatibility, and room-specific furniture inventories, and use them to supervise fine-tuning of the model. We then apply preference optimization over candidate object placements to further refine layout quality. The generated DSL can be rasterized into semantic masks and used to condition a Flux-based LoRA renderer, producing realistic blueprint-style furnished floor-plan images while preserving the editable symbolic layout. Experiments on layout furnishing show that Architect-Ant produces geometrically valid and functionally plausible layouts, and suggest a scalable path for furnishing larger structure-only floor-plan datasets.
Architect-Ant addresses the problem of automatic furniture placement in 2D architectural floor plans—a practically important but under-explored niche compared to 3D indoor scene synthesis. The paper makes three interlinked contributions: (1) AntPlan-270, a dataset of 270 real architectural floor plans with per-room furniture bounding-box pseudo-labels across ten residential room categories; (2) a structured generation framework that represents furniture layouts as a coordinate-based DSL and uses a fine-tuned vision-language model (Qwen3.5-9B) with LoRA adapters to generate placements; and (3) a training pipeline combining supervised fine-tuning with procedural reasoning traces (including fail-and-fix recovery augmentation) and direct preference optimization (DPO) driven by a deterministic rule-based scorer encoding architectural constraints.
The key conceptual insight is moving constraint enforcement *into* the generative model's training distribution via preference optimization, rather than relying on post-hoc solvers or repair steps—a meaningful distinction from prior constraint-based and LLM-driven layout methods.
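For reference, the standard DPO objective is reproduced below. This is the textbook form, not a formula taken from the paper, which may use a variant. The mapping of symbols to this task is an assumption: $x$ stands for the room context (geometry, fixtures, room type), $y_w$ for the placement the rule-based scorer prefers, $y_l$ for the placement it rejects, $\pi_{\mathrm{ref}}$ for the frozen SFT model, and $\beta$ for the usual temperature-like coefficient.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        \;-\;
        \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Under this reading, the scorer never runs at inference time. It only decides which of two candidates plays the role of $y_w$, and the constraint knowledge reaches the model through the gradient of this loss.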
The methodology is well-structured with clear ablations isolating each pipeline component (Table 5): zero-shot → SFT without recovery → SFT with recovery → SFT with image input → synthetic-pair DPO → model-pair DPO. The progression demonstrates incremental value at each stage. The distinction between synthetic-pair and model-pair DPO is particularly insightful—the authors honestly report that model-pair DPO achieves higher rule scores but produces qualitatively worse layouts (reward hacking), leading them to adopt the more conservative synthetic-pair approach.
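To make the synthetic-pair idea concrete, here is a minimal sketch under stated assumptions. The review does not describe how the paper builds its pairs or what its scorer checks, so everything here is hypothetical: the `Box` type, the three rules in `rule_score` (stay inside the room, keep the door zone clear, no furniture overlap), and the perturbation strategy in `make_synthetic_pair`. It only illustrates how a deterministic scorer can turn one valid layout into a (chosen, rejected) pair for DPO.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in room coordinates (hypothetical representation)."""
    label: str
    x0: float
    y0: float
    x1: float
    y1: float


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two boxes, 0.0 if they do not intersect."""
    w = min(a.x1, b.x1) - max(a.x0, b.x0)
    h = min(a.y1, b.y1) - max(a.y0, b.y0)
    return max(w, 0.0) * max(h, 0.0)


def rule_score(room: Box, boxes: List[Box], door_zone: Box) -> float:
    """Deterministic score: 0.0 for a clean layout, minus 1.0 per violation.

    The three rules are illustrative assumptions, not the paper's rule set.
    """
    score = 0.0
    for b in boxes:
        inside = (room.x0 <= b.x0 and room.y0 <= b.y0
                  and b.x1 <= room.x1 and b.y1 <= room.y1)
        if not inside:
            score -= 1.0  # furniture leaves the room
        if overlap_area(b, door_zone) > 0.0:
            score -= 1.0  # furniture blocks the door clearance zone
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if overlap_area(boxes[i], boxes[j]) > 0.0:
                score -= 1.0  # two pieces of furniture collide
    return score


def make_synthetic_pair(
    room: Box,
    boxes: List[Box],
    door_zone: Box,
    rng: random.Random,
    max_shift: float = 1.5,
    tries: int = 50,
) -> Optional[Tuple[List[Box], List[Box]]]:
    """Shift one random box until the score drops; return (chosen, rejected)."""
    base = rule_score(room, boxes, door_zone)
    for _ in range(tries):
        i = rng.randrange(len(boxes))
        dx = rng.uniform(-max_shift, max_shift)
        dy = rng.uniform(-max_shift, max_shift)
        b = boxes[i]
        moved = replace(b, x0=b.x0 + dx, y0=b.y0 + dy, x1=b.x1 + dx, y1=b.y1 + dy)
        candidate = boxes[:i] + [moved] + boxes[i + 1:]
        if rule_score(room, candidate, door_zone) < base:
            return boxes, candidate
    return None  # no degrading perturbation found within the budget


# Toy bedroom: 4 x 3 room, door clearance on the top wall.
room = Box("room", 0.0, 0.0, 4.0, 3.0)
door_zone = Box("door_clearance", 1.5, 2.5, 2.5, 3.0)
layout = [Box("bed", 0.0, 0.0, 2.0, 1.5), Box("wardrobe", 3.0, 0.0, 4.0, 1.0)]
pair = make_synthetic_pair(room, layout, door_zone, random.Random(0))
```

The contrast with model-pair DPO is that here the chosen side is always a reference layout, so the policy is never rewarded for its own high-scoring but odd outputs. That is one plausible reading of why the synthetic variant is less prone to the reward hacking the authors report.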
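However, several methodological concerns emerge: the dataset is small, the layout representation is simplified (no rotation), and evaluation relies heavily on the same rule-based signal used for training.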
Practical applications in real estate visualization, interior design, and architectural CAD workflows are clear and commercially relevant. The editable DSL representation is a strength—unlike pixel-based approaches, it supports downstream modifications, a genuine requirement for professional workflows.
Research impact is more moderate. The paper occupies a narrow intersection, 2D floor-plan furnishing with bounding boxes, whereas most active research in indoor scene synthesis operates in 3D (the 3D-FRONT and ScanNet ecosystems). This may limit uptake of the contribution.
The broader principle—using rule-derived preferences to train structured generators when clean demonstrations are scarce—has wider applicability to constrained layout problems (chip design, UI layout, warehouse planning), though the paper doesn't empirically demonstrate transfer.
The paper is timely in two respects. First, applying VLMs/LLMs to structured spatial reasoning is an active frontier, and prior work (FloorplanQA, LayoutGPT) has exposed brittleness that motivates task-specific adaptation. Second, preference optimization (DPO) from programmatic verifiers is a growing paradigm (code, math), and extending it to geometric layout is natural but not yet well-explored.
The gap the paper identifies—lack of real 2D furnished floor-plan data—is genuine. However, whether 270 plans constitute a sufficient solution is debatable.
Architect-Ant is a competent systems paper that assembles existing techniques (VLM fine-tuning, DPO, rule-based scoring) into a practical pipeline for a well-defined problem. The work is methodologically careful with good ablations and honest limitations. However, the dataset is small, the representation is simplified (no rotation), evaluation relies heavily on the training signal itself, and the impact may be limited by the narrow 2D bounding-box setting when the field is moving toward richer 3D representations. The paper contributes a useful proof-of-concept for rule-guided preference optimization in spatial layout but falls short of a transformative advance.
Generated Jun 10, 2026
Paper 1 addresses a fundamental bottleneck in AI agents—recognizing uncertainty and seeking clarification—by integrating help-seeking directly into the action space. This approach has broad, cross-disciplinary implications for improving the reliability and safety of autonomous LLM agents. In contrast, Paper 2 presents a valuable but highly domain-specific dataset and pipeline for architectural floor plans, which has practical industry applications but narrower foundational scientific impact.
Paper 2 has higher likely scientific impact due to a clearer technical contribution (new annotated dataset + editable furnishing framework), stronger methodological components (DSL representation, constraint-based reasoning traces, preference optimization, and rendering pipeline), and direct real-world applicability in architecture/real estate/interior design. Its dataset addresses a key bottleneck and can enable follow-on work, broadening impact across vision-language modeling, structured generation, and design automation. Paper 1 is conceptually novel and timely for HCI, but its smaller-scale gamified study (74 participants) suggests more limited generalizability and downstream reuse compared with a reusable dataset + model framework.
Paper 1 offers a highly practical and scalable solution to a major bottleneck in architecture and real estate design. By combining a novel dataset, a domain-specific language, and vision-language models for procedural reasoning, it presents a comprehensive neuro-symbolic framework. While Paper 2 introduces an innovative multi-agent world model approach for sports analytics, Paper 1 has broader immediate real-world applications and commercial potential across multiple large-scale industries.
Paper 2 addresses a broader and more timely challenge—understanding evolving multimodal memes through open-world knowledge retrieval—with wider applicability across NLP, social media analysis, misinformation detection, and content moderation. Its zero-shot framework (Query-Retrieve-Conclude) is more generalizable and tackles a fundamental limitation of pretrained models (outdated knowledge), relevant across many AI tasks. Paper 1, while technically solid, addresses a narrower domain (architectural floor plan furnishing) with a small dataset (270 plans), limiting its breadth of impact.
Paper 2 introduces a novel dataset and a comprehensive framework using cutting-edge AI techniques (vision-language models, preference optimization) for an underexplored domain. Its integration of a domain-specific language and rendering pipeline offers broad applications in real estate, interior design, and architecture. In contrast, Paper 1 presents an incremental methodological improvement to fault diagnosis using belief rule bases, which, while useful, is more narrowly focused and less likely to drive broad, cross-disciplinary innovation compared to Paper 2's generative AI approach.
Paper 2 likely has higher impact due to a clearer technical contribution and broader applicability: it releases a new annotated dataset (AntPlan-270), proposes an editable DSL-based furnishing pipeline, and introduces methodological innovations (reasoning-trace supervision + preference optimization) that can transfer to layout generation, VLM spatial reasoning, and human-in-the-loop design tools. It has tangible real-world applications in architecture/real estate/graphics and could seed follow-on benchmarks. Paper 1 is timely and useful but is narrower (single specialty, limited question set) and more evaluative than methodologically novel.
Paper 1 addresses a critical, globally relevant problem (supply chain resilience) by introducing a highly novel theoretical framework that bridges LLMs and reinforcement learning through epistemic grounding and world models. Its methodological depth, tackling both epistemic and aleatoric uncertainty, offers significant contributions to AI and operations research. In contrast, Paper 2 presents a valuable but narrower application in architectural furnishing relying on a relatively small dataset, making its potential impact more localized to specific design workflows.
Paper 1 tackles a foundational theoretical issue in educational psychology and AI design by analyzing a massive corpus (14,000+ publications) to resolve conceptual ambiguities. Its findings broadly impact measurement, conceptualization, and the future development of generative AI in education. In contrast, Paper 2 presents a valuable but narrower technical contribution and a relatively small dataset (270 plans) for the specific applied task of floor plan furnishing, making Paper 1's scientific impact broader and more significant.
Moonshine presents a fundamentally novel paradigm—an autonomous AI agent for mathematical conjecture generation—demonstrated through formulating and partially proving the Neural Jacobian Conjecture, bridging classical mathematics with neural network theory. This has broad implications for AI-assisted mathematics, a rapidly growing frontier. While Architect-Ant makes a solid applied contribution to automatic floor-plan furnishing with a new dataset and framework, its impact is narrower, confined to architectural visualization. Moonshine's cross-disciplinary novelty (AI + pure mathematics), timeliness given the surge in AI-for-math research, and methodological ambition give it substantially higher potential impact.
Paper 2 addresses a fundamental and ubiquitous problem in machine learning—class imbalance and gradient interference. By introducing a diagnostic framework and an architectural modification (CSBA) that improves minority-class performance, its findings are broadly applicable to numerous computer vision and deep learning tasks. In contrast, Paper 1 offers a highly domain-specific solution for architectural floor plan furnishing. While practically useful, Paper 2's methodological contributions to optimization dynamics offer deeper theoretical insights and a significantly broader potential impact across diverse scientific and applied fields.