StructRL Engineer:
Deep RL for efficient exploration of
combinatorial structural design spaces

Massachusetts Institute of Technology
IASS 2025

Abstract

Structural design is an art and engineering process that involves composing elements geometrically to achieve structural performance, with effective solutions varying vastly under different boundary conditions. Existing computational approaches are limited to local optimization of member sizing on a prescribed structural topology. This is largely due to the prohibitive computational cost of exploring the exponentially large, combinatorial space of possible topologies. To address these challenges, we introduce StructRL Engineer, a reinforcement learning framework capable of bottom-up generation of high-performing structural designs fit to individual boundary conditions. The framework formulates design as a sequential decision-making process and employs a two-phase training algorithm, inspired by human learning, tailored for performance-driven design tasks. Trained policies consistently generate high-performing designs with the required stiffness and material efficiency, responding to the specific boundary condition with geometric and compositional strategies that align with known engineering principles. Analysis of the learned policies reveals that the agents efficiently search promising regions of the large combinatorial design space, demonstrating emergent structural reasoning and the potential of reinforcement learning as a viable paradigm for performance-driven design grounded in real-world constraints.

Performance Driven Design as a Markov Decision Process

A performance-driven design task can be formulated as a sequential decision-making problem with: 1) a sequential designing scheme where each state is a (partial) design, 2) elemental constraints, and 3) structural performance feedback.


TrussFrameEnv We demonstrate the formulation with an environment in which the agent learns to design structures additively from an inventory of modular truss frames, receiving structural performance feedback via finite element analysis.
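The three ingredients above can be sketched as a minimal gym-style environment. All names, grid sizes, frame types, and the terminal reward are illustrative assumptions; in particular, the FEA feedback is stubbed out rather than implemented.

```python
from dataclasses import dataclass, field

@dataclass
class TrussDesignEnv:
    """Toy sequential-design environment on a grid (illustrative sketch)."""
    grid: tuple = (6, 4)  # hypothetical design domain (cols, rows)
    inventory: dict = field(default_factory=lambda: {"light": 8, "strong": 4})

    def reset(self, support, target):
        self.placed = {}                  # cell -> frame type: the partial design
        self.support, self.target = support, target
        self.budget = dict(self.inventory)
        self.placed[support] = "light"    # start from the support
        self.budget["light"] -= 1
        return self._obs()

    def _obs(self):
        # State = current partial design plus the boundary condition
        return (frozenset(self.placed.items()), self.support, self.target)

    def step(self, action):
        cell, frame = action              # action: place one frame from inventory
        assert self.budget[frame] > 0 and cell not in self.placed
        self.placed[cell] = frame
        self.budget[frame] -= 1
        done = self.target in self.placed or all(v == 0 for v in self.budget.values())
        # The terminal reward would come from FEA (stiffness vs. material); stubbed here.
        reward = (1.0 if self.target in self.placed else -1.0) if done else 0.0
        return self._obs(), reward, done

env = TrussDesignEnv()
env.reset(support=(0, 0), target=(2, 0))
_, r1, d1 = env.step(((1, 0), "light"))   # extend toward the target
_, r2, d2 = env.step(((2, 0), "strong"))  # reach the target
```

Each episode grows a design frame by frame until the target is connected or the inventory runs out, at which point a performance-based reward is issued.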

Two Phase Training

We introduce a two-phase training process with a novel task-division scheme inspired by human learning: the reasoning task is split so that strategies learned in the first phase are reused in the second, enabling efficient search for high-performing designs. By dividing the complex reasoning task into distinct phases, the agent's search is effectively guided toward promising regions of the design space without being confined to a single local optimum, allowing efficient optimization across a large combinatorial space. Compared to the alternative of training a separate policy from scratch for each boundary condition, this phased approach produces performant designs for generalized conditions more reliably and efficiently.


In the first phase, a base policy is trained on a simplified yet adaptive task: generating a feasible design that connects the support and a target load placed at random locations within the grid, under self-load conditions and using a single frame type. In the second phase, this policy is fine-tuned on the more constrained task of connecting a fixed target position under external loading, with the option of employing varying frame types.

Task : Designing Cantilever Structures

We train the agent on the task of designing a cantilevering structure that achieves the required stiffness for various load cases while minimizing material use. Significantly, while solutions vary in form across scenarios, those from the proposed method exhibit common material-allocation strategies: placing stronger frames toward the base, where shear and moment demands are greatest, and creating a tapering shape on either the top or the bottom side leading to the cantilever. The fact that the generated designs vary vastly between scenarios illustrates that even when conditions change slightly, an effective design can be drastically different, and that the proposed method is capable of finding a tailored, high-performing design for each specific boundary condition.

Efficiency in Design Space Exploration

High-performing designs are few and far between, yet our method discovers them frequently. We show that during phase 2 training, the agent effectively covers the entire design space while quickly narrowing its search to high-performing designs.


Good Design Hit Rate The trained policy is 116× more likely to sample high-performing designs than the random baseline, demonstrating the agent's ability to efficiently explore the large combinatorial design space and discover high-performing solutions.
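A hit-rate comparison of this kind can be computed as below. The scores and threshold are made-up illustrative values, not the paper's data; the reported 116× factor comes from the actual experiments.

```python
def hit_rate(scores, threshold):
    """Fraction of sampled designs whose performance clears the threshold."""
    return sum(1 for s in scores if s >= threshold) / len(scores)

# Hypothetical performance scores for policy and random samples
policy_scores = [0.92, 0.85, 0.20, 0.95, 0.88]
random_scores = [0.10, 0.05, 0.90, 0.02, 0.03]

# Relative likelihood of sampling a "good" design (threshold is arbitrary here)
ratio = hit_rate(policy_scores, 0.8) / hit_rate(random_scores, 0.8)
```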


Narrowing Down Search Our method initially searches the entire scope of the design space but quickly narrows to high-performing instances, shown by the comparative reduction in search radius and the average performance of sampled instances.
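One way to quantify such narrowing is a search-radius metric over sampled designs, sketched below under assumed 2D design embeddings; the paper's exact measure may differ.

```python
import math

def search_radius(points):
    """Mean distance of sampled designs from their centroid. A shrinking
    radius over training indicates the search is concentrating."""
    n = len(points)
    centroid = tuple(sum(coord) / n for coord in zip(*points))
    return sum(math.dist(p, centroid) for p in points) / n

# Hypothetical embeddings of designs sampled early vs. late in training
early = [(0, 0), (4, 0), (0, 4), (4, 4)]                   # wide exploration
late = [(2.0, 2.0), (2.2, 2.1), (1.9, 2.0), (2.1, 1.9)]    # concentrated search
```

Tracking this radius alongside the mean performance of sampled designs distinguishes a policy that is genuinely converging on good regions from one that has merely collapsed to a single design.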

BibTeX

@article{hong2025deep,
  title={Deep reinforcement learning for efficient exploration of combinatorial structural design spaces},
  author={Hong, Chloe S.H. and Lee, Keith J. and Mueller, Caitlin T.},
  journal={International Association for Shell and Spatial Structures},
  year={2025},
  url={https://cshhong.github.io/TrussFrameRL/}
}