Architect-Ant: Editable Automatic Furnishing of Architectural Floor Plans

Fedor Rodionov, Aleksandar Cvejic, Michael Birsak, John Femiani, Peter Wonka

cs.AI, cs.CV
#2968 of 3489 · Artificial Intelligence
Tournament Score: 1290 ± 44
Win Rate: 30% (6 wins, 14 losses, 20 matches)
Rating: 5.5 / 10
Significance: 5 · Rigor: 6 · Novelty: 5 · Clarity: 7.5

Abstract

Furnished floor plans are fundamental to real estate visualization, interior design, and architectural workflows. However, progress in automatic furniture arrangement has been limited by the lack of real, professionally designed floor-plan datasets with object-level furniture annotations. To address this gap, we introduce AntPlan-270, a curated dataset of 270 architectural floor plans with per-room furniture bounding box annotations across ten residential room categories. Building on this dataset, we present Architect-Ant, an editable automatic furnishing framework powered by a fine-tuned vision-language model. Furniture layouts are represented using a compact, coordinate-based domain-specific language (DSL) that encodes object categories and placements relative to the room geometry. To improve spatial reasoning, we generate procedural reasoning traces that capture architectural constraints such as wall alignment, door and window clearance, circulation, fixture compatibility, and room-specific furniture inventories, and use them to supervise fine-tuning of the model. We then apply preference optimization over candidate object placements to further refine layout quality. The generated DSL can be rasterized into semantic masks and used to condition a Flux-based LoRA renderer, producing realistic blueprint-style furnished floor-plan images while preserving the editable symbolic layout. Experiments on layout furnishing show that Architect-Ant produces geometrically valid and functionally plausible layouts, and suggest a scalable path for furnishing larger structure-only floor-plan datasets.

AI Impact Assessments

(1 model)

Scientific Impact Assessment: Architect-Ant

1. Core Contribution

Architect-Ant addresses the problem of automatic furniture placement in 2D architectural floor plans—a practically important but under-explored niche compared to 3D indoor scene synthesis. The paper makes three interlinked contributions: (1) AntPlan-270, a dataset of 270 real architectural floor plans with per-room furniture bounding-box pseudo-labels across ten residential room categories; (2) a structured generation framework that represents furniture layouts as a coordinate-based DSL and uses a fine-tuned vision-language model (Qwen3.5-9B) with LoRA adapters to generate placements; and (3) a training pipeline combining supervised fine-tuning with procedural reasoning traces (including fail-and-fix recovery augmentation) and direct preference optimization (DPO) driven by a deterministic rule-based scorer encoding architectural constraints.

The key conceptual insight is moving constraint enforcement *into* the generative model's training distribution via preference optimization, rather than relying on post-hoc solvers or repair steps—a meaningful distinction from prior constraint-based and LLM-driven layout methods.

2. Methodological Rigor

The methodology is well-structured with clear ablations isolating each pipeline component (Table 5): zero-shot → SFT without recovery → SFT with recovery → SFT with image input → synthetic-pair DPO → model-pair DPO. The progression demonstrates incremental value at each stage. The distinction between synthetic-pair and model-pair DPO is particularly insightful—the authors honestly report that model-pair DPO achieves higher rule scores but produces qualitatively worse layouts (reward hacking), leading them to adopt the more conservative synthetic-pair approach.
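
For readers unfamiliar with the objective behind both DPO variants, the standard per-pair DPO loss can be sketched as follows. This is the generic formulation, not the authors' training code; the function name, the example log-probabilities, and `beta=0.1` are illustrative assumptions.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard per-pair DPO loss from summed sequence log-probabilities.

    beta=0.1 is an illustrative default, not a value taken from the paper.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # layout over the rejected one, relative to the frozen reference model.
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(margin)), written in a numerically stable form.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical log-probabilities: a policy identical to the reference,
# and one that has shifted probability mass toward the chosen layout.
neutral = dpo_loss(-12.0, -12.0, -12.0, -12.0)
improved = dpo_loss(-10.0, -14.0, -12.0, -12.0)
```

The two variants compared in the ablation differ only in where the (chosen, rejected) pairs come from, not in this loss.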

However, several methodological concerns emerge:

  • Dataset scale: 270 floor plans yielding ~1,351 room samples (before augmentation) is quite small. The pseudo-label pipeline (detector bootstrapped from hand-labeled subset, then manually reviewed) introduces noise that the authors acknowledge but cannot fully quantify.
  • Evaluation limitations: The rule-based scorer serves double duty as both the DPO training signal and a primary evaluation metric, creating circularity. The VLM-as-judge (Gemini 3 Flash) provides independence but has acknowledged calibration issues, especially on kitchens. No human evaluation is conducted.
  • Axis-aligned bounding boxes only: The DSL lacks rotation, which is a significant simplification for furniture placement (e.g., angled sofas, rotated desks).
  • Out-of-distribution evaluation: While testing on CubiCasa5K rooms is appropriate, the comparison with frontier models (Kimi K2.5, GLM-5V-Turbo) uses different candidate budgets (K=10 vs K=2), making direct comparison difficult despite the authors' acknowledgment.
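
To illustrate why unequal candidate budgets complicate the comparison, here is a small worked example. It assumes that K means sampling K candidate layouts and keeping the best-scoring one, and that each candidate's score is an independent Uniform(0, 1) draw; both are idealizations for illustration, not details confirmed by the paper.

```python
from fractions import Fraction

def expected_best_of_k(k: int) -> Fraction:
    """Expected maximum of k independent Uniform(0, 1) scores, which is k / (k + 1)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Fraction(k, k + 1)

# Same underlying (idealized) score distribution, different budgets.
small_budget = expected_best_of_k(2)    # Fraction(2, 3)
large_budget = expected_best_of_k(10)   # Fraction(10, 11)
gap = large_budget - small_budget       # Fraction(8, 33)
```

Under these assumptions, the larger budget raises the expected best score from about 0.67 to about 0.91 without any change in the underlying generator, so a gap of this kind cannot be attributed to model quality alone.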

3. Potential Impact

Practical applications in real estate visualization, interior design, and architectural CAD workflows are clear and commercially relevant. The editable DSL representation is a strength: unlike pixel-based approaches, it supports downstream modifications, a genuine requirement for professional workflows.
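
A minimal sketch of what editing a symbolic layout could look like is given below. The DSL syntax here (one object per line, `category x0 y0 x1 y1` in room-normalized coordinates) is a hypothetical stand-in, since the paper's actual grammar is not reproduced in this assessment; `Box`, `parse_layout`, `serialize_layout`, and `move` are likewise illustrative names.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Box:
    category: str
    x0: float
    y0: float
    x1: float
    y1: float

def parse_layout(text):
    """Parse the hypothetical DSL: one 'category x0 y0 x1 y1' entry per line."""
    boxes = []
    for line in text.strip().splitlines():
        category, *coords = line.split()
        x0, y0, x1, y1 = map(float, coords)
        boxes.append(Box(category, x0, y0, x1, y1))
    return boxes

def serialize_layout(boxes):
    """Write the layout back out in the same hypothetical format."""
    return "\n".join(
        f"{b.category} {b.x0:.2f} {b.y0:.2f} {b.x1:.2f} {b.y1:.2f}" for b in boxes
    )

def move(box, dx, dy):
    """Translate one object without touching the rest of the layout."""
    return replace(box, x0=box.x0 + dx, y0=box.y0 + dy, x1=box.x1 + dx, y1=box.y1 + dy)

layout = parse_layout("bed 0.10 0.10 0.60 0.50\nwardrobe 0.70 0.10 0.95 0.30")
layout[1] = move(layout[1], 0.0, 0.5)   # a designer relocates the wardrobe
edited = serialize_layout(layout)
```

The point of the sketch is that an edit touches one record and leaves the rest of the layout byte-identical, which a pixel-based output cannot offer.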

Research impact is more moderate. The paper occupies a narrow intersection: 2D floor-plan furnishing with bounding boxes. Most active research in indoor scene synthesis operates in 3D (3D-FRONT, ScanNet ecosystems). The contribution may have limited uptake because:

  • The dataset is small and based on pseudo-labels rather than clean annotations.
  • The method is specific to axis-aligned 2D boxes in residential rooms.
  • The rendering pipeline (FLUX LoRA) is a visualization convenience rather than a technical contribution.

The broader principle of using rule-derived preferences to train structured generators when clean demonstrations are scarce has wider applicability to constrained layout problems (chip design, UI layout, warehouse planning), though the paper doesn't empirically demonstrate transfer.

4. Timeliness & Relevance

The paper is timely in two respects. First, applying VLMs/LLMs to structured spatial reasoning is an active frontier, and prior work (FloorplanQA, LayoutGPT) has exposed brittleness that motivates task-specific adaptation. Second, preference optimization (DPO) from programmatic verifiers is a growing paradigm (code, math), and extending it to geometric layout is natural but not yet well-explored.

The gap the paper identifies, the lack of real 2D furnished floor-plan data, is genuine. However, whether 270 plans constitute a sufficient solution is debatable.

5. Strengths & Limitations

Strengths:

  • Clean problem formulation: separating structure generation from furnishing, and keeping layouts as editable symbolic objects rather than pixels.
  • Honest reporting of failure modes: reward hacking with model-pair DPO, kitchen difficulties, VLM judge calibration issues.
  • The synthetic-pair DPO construction (single bounding-box perturbation) is clever and well-motivated, preventing the model from exploiting surface-form shortcuts.
  • Comprehensive rule scorer with interpretable, decomposable penalties.
  • Thorough ablation study.
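
The synthetic-pair idea noted above can be sketched as follows: start from a reference layout, perturb exactly one box, and keep the result as the rejected sample only if a rule scorer rates it worse. The toy penalty below (overlap area plus out-of-room area in a unit-square room) and all names are illustrative assumptions, far simpler than the scorer described in the paper.

```python
import random

def area(b):
    """Area of an axis-aligned box (x0, y0, x1, y1); empty boxes count as 0."""
    return max(b[2] - b[0], 0.0) * max(b[3] - b[1], 0.0)

def overlap(a, b):
    """Intersection area of two axis-aligned boxes."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return area(inter)

def rule_penalty(boxes):
    """Toy rule scorer (lower is better): out-of-room area plus pairwise overlap."""
    room = (0.0, 0.0, 1.0, 1.0)
    penalty = 0.0
    for i, a in enumerate(boxes):
        penalty += area(a) - overlap(a, room)   # part of the box outside the room
        for b in boxes[i + 1:]:
            penalty += overlap(a, b)            # furniture colliding with furniture
    return penalty

def make_synthetic_pair(chosen, rng, max_shift=0.5, tries=100):
    """Return (chosen, rejected) where rejected differs in exactly one box."""
    base = rule_penalty(chosen)
    for _ in range(tries):
        i = rng.randrange(len(chosen))
        dx = rng.uniform(-max_shift, max_shift)
        dy = rng.uniform(-max_shift, max_shift)
        x0, y0, x1, y1 = chosen[i]
        rejected = list(chosen)
        rejected[i] = (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
        if rule_penalty(rejected) > base:       # keep only strictly worse layouts
            return list(chosen), rejected
    return None

# Hypothetical bedroom: bed, wardrobe, desk in room-normalized coordinates.
reference = [(0.1, 0.1, 0.6, 0.5), (0.7, 0.1, 0.95, 0.3), (0.1, 0.7, 0.4, 0.9)]
pair = make_synthetic_pair(reference, random.Random(0))
```

Because the two layouts share length, ordering, and formatting, the only signal separating them is the geometry of a single placement, which is the property the strength above credits with blocking surface-form shortcuts.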

Limitations:

  • Small dataset with pseudo-labels limits generalization claims.
  • No orientation/rotation modeling—axis-aligned boxes are a significant constraint.
  • No human evaluation study, which is critical for a design-oriented task.
  • The four-room-type scope (bedroom, bathroom, kitchen, living room) is narrow; commercial and mixed-use spaces are absent.
  • Scalability is claimed but not demonstrated beyond 270 plans.
  • The FLUX LoRA rendering, while visually appealing, is not evaluated rigorously and serves primarily as visualization.
  • Comparison fairness issues with frontier baselines due to different candidate budgets.
  • Limited novelty in individual components—the contribution is more in their combination for this specific domain.

Overall Assessment

Architect-Ant is a competent systems paper that assembles existing techniques (VLM fine-tuning, DPO, rule-based scoring) into a practical pipeline for a well-defined problem. The work is methodologically careful with good ablations and honest limitations. However, the dataset is small, the representation is simplified (no rotation), evaluation relies heavily on the training signal itself, and the impact may be limited by the narrow 2D bounding-box setting when the field is moving toward richer 3D representations. The paper contributes a useful proof-of-concept for rule-guided preference optimization in spatial layout but falls short of a transformative advance.

Rating: 5.5 / 10
Significance: 5 · Rigor: 6 · Novelty: 5 · Clarity: 7.5

Generated Jun 10, 2026

    Comparison History (20)

    Lost vs. Knowing When to Ask: Self-Gated Clarification for Hierarchical Language Agents

    Paper 1 addresses a fundamental bottleneck in AI agents—recognizing uncertainty and seeking clarification—by integrating help-seeking directly into the action space. This approach has broad, cross-disciplinary implications for improving the reliability and safety of autonomous LLM agents. In contrast, Paper 2 presents a valuable but highly domain-specific dataset and pipeline for architectural floor plans, which has practical industry applications but narrower foundational scientific impact.

    gemini-3.1-pro-preview · Jun 11, 2026

    Won vs. Nonslop: A Gamified Experiment in Human-AI Collaborative Writing

    Paper 2 has higher likely scientific impact due to a clearer technical contribution (new annotated dataset + editable furnishing framework), stronger methodological components (DSL representation, constraint-based reasoning traces, preference optimization, and rendering pipeline), and direct real-world applicability in architecture/real estate/interior design. Its dataset addresses a key bottleneck and can enable follow-on work, broadening impact across vision-language modeling, structured generation, and design automation. Paper 1 is conceptually novel and timely for HCI, but its smaller-scale gamified study (74 participants) suggests more limited generalizability and downstream reuse compared with a reusable dataset + model framework.

    gpt-5.2 · Jun 11, 2026

    Won vs. Monte Carlo Pass Search: Using Trajectory Generation for 3D Counterfactual Pass Evaluation in Football

    Paper 1 offers a highly practical and scalable solution to a major bottleneck in architecture and real estate design. By combining a novel dataset, a domain-specific language, and vision-language models for procedural reasoning, it presents a comprehensive neuro-symbolic framework. While Paper 2 introduces an innovative multi-agent world model approach for sports analytics, Paper 1 has broader immediate real-world applications and commercial potential across multiple large-scale industries.

    gemini-3.1-pro-preview · Jun 10, 2026

    Lost vs. I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition

    Paper 2 addresses a broader and more timely challenge—understanding evolving multimodal memes through open-world knowledge retrieval—with wider applicability across NLP, social media analysis, misinformation detection, and content moderation. Its zero-shot framework (Query-Retrieve-Conclude) is more generalizable and tackles a fundamental limitation of pretrained models (outdated knowledge), relevant across many AI tasks. Paper 1, while technically solid, addresses a narrower domain (architectural floor plan furnishing) with a small dataset (270 plans), limiting its breadth of impact.

    claude-opus-4-6 · Jun 10, 2026

    Won vs. A Reliable Fault Diagnosis Method Based on Belief Rule Base Consider Robustness Analysis

    Paper 2 introduces a novel dataset and a comprehensive framework using cutting-edge AI techniques (vision-language models, preference optimization) for an underexplored domain. Its integration of a domain-specific language and rendering pipeline offers broad applications in real estate, interior design, and architecture. In contrast, Paper 1 presents an incremental methodological improvement to fault diagnosis using belief rule bases, which, while useful, is more narrowly focused and less likely to drive broad, cross-disciplinary innovation compared to Paper 2's generative AI approach.

    gemini-3.1-pro-preview · Jun 10, 2026

    Won vs. Ten Headache Specialists versus Artificial Intelligence for Clinical Literature Summarization: A Critical Evaluation and Comparison

    Paper 2 likely has higher impact due to a clearer technical contribution and broader applicability: it releases a new annotated dataset (AntPlan-270), proposes an editable DSL-based furnishing pipeline, and introduces methodological innovations (reasoning-trace supervision + preference optimization) that can transfer to layout generation, VLM spatial reasoning, and human-in-the-loop design tools. It has tangible real-world applications in architecture/real estate/graphics and could seed follow-on benchmarks. Paper 1 is timely and useful but is narrower (single specialty, limited question set) and more evaluative than methodologically novel.

    gpt-5.2 · Jun 10, 2026

    Lost vs. ReflectiChain: Epistemic Grounding in LLM-Driven World Models for Supply Chain Resilience

    Paper 1 addresses a critical, globally relevant problem (supply chain resilience) by introducing a highly novel theoretical framework that bridges LLMs and reinforcement learning through epistemic grounding and world models. Its methodological depth, tackling both epistemic and aleatoric uncertainty, offers significant contributions to AI and operations research. In contrast, Paper 2 presents a valuable but narrower application in architectural furnishing relying on a relatively small dataset, making its potential impact more localized to specific design workflows.

    gemini-3.1-pro-preview · Jun 10, 2026

    Lost vs. Large-scale semantic mapping of learner agency and autonomy reveals what measurement and generative AI research overlook

    Paper 1 tackles a foundational theoretical issue in educational psychology and AI design by analyzing a massive corpus (14,000+ publications) to resolve conceptual ambiguities. Its findings broadly impact measurement, conceptualization, and the future development of generative AI in education. In contrast, Paper 2 presents a valuable but narrower technical contribution and a relatively small dataset (270 plans) for the specific applied task of floor plan furnishing, making Paper 1's scientific impact broader and more significant.

    gemini-3.1-pro-preview · Jun 10, 2026

    Lost vs. Moonshine: An Autonomous Mathematical Research Agent Centered on Conjecture Generation

    Moonshine presents a fundamentally novel paradigm—an autonomous AI agent for mathematical conjecture generation—demonstrated through formulating and partially proving the Neural Jacobian Conjecture, bridging classical mathematics with neural network theory. This has broad implications for AI-assisted mathematics, a rapidly growing frontier. While Architect-Ant makes a solid applied contribution to automatic floor-plan furnishing with a new dataset and framework, its impact is narrower, confined to architectural visualization. Moonshine's cross-disciplinary novelty (AI + pure mathematics), timeliness given the surge in AI-for-math research, and methodological ambition give it substantially higher potential impact.

    claude-opus-4-6 · Jun 10, 2026

    Lost vs. Class-Specific Branch Attention for Mitigating Gradient Interference under Class Imbalance

    Paper 2 addresses a fundamental and ubiquitous problem in machine learning—class imbalance and gradient interference. By introducing a diagnostic framework and an architectural modification (CSBA) that improves minority-class performance, its findings are broadly applicable to numerous computer vision and deep learning tasks. In contrast, Paper 1 offers a highly domain-specific solution for architectural floor plan furnishing. While practically useful, Paper 2's methodological contributions to optimization dynamics offer deeper theoretical insights and a significantly broader potential impact across diverse scientific and applied fields.

    gemini-3.1-pro-preview · Jun 10, 2026