
Procedural Puzzle Generation: A Survey


Authors: Barbara De Kegel, Mads Haahr

Tags: procedural generation, game design, AI, computer science, puzzles

Publication Year: 2020

Overview

In our work, we set out to provide a detailed map of the research landscape for [[Procedural Content Generation (PCG)]] as it applies specifically to puzzles in digital games. While PCG has been widely adopted for creating vast game worlds, characters, and backstories, its application to the core gameplay mechanic of puzzles has been comparatively limited. We believe this is a missed opportunity. Puzzles are a fantastic way to improve gameplay by offering players interesting, solvable problems, but their manual creation is resource-intensive and their replay value is inherently low. Our survey addresses this gap by reviewing 32 distinct methods across 11 categories of puzzles, from Sokoban-type puzzles and mazes to more open-ended narrative and physics puzzles. Our goal is to provide a structured understanding of the field for fellow researchers and game developers. We introduce a framework of seven salient characteristics (such as online vs. offline generation, the degree of designer control, and evaluation methods) to analyze and compare the different approaches. We argue that the primary challenge in puzzle generation is not just creating content, but creating content that is guaranteed to be [[solvable]] while also meeting quality criteria for difficulty, aesthetics, and player engagement. Our survey concludes by identifying several ‘gaps on the map’: promising areas for future research, including the development of online and adaptive puzzle generators, the application of powerful constraint-based techniques like [[Answer Set Programming (ASP)]] to more puzzle types, and the crucial challenge of defining and evaluating puzzle quality beyond mere solvability. For AI product engineers, this work offers a taxonomy of generative techniques and a clear-eyed view of the practical challenges in creating systems that produce valid, high-quality, and engaging content within a constrained design space.

Book Distillation

1. Introduction

Procedural Content Generation (PCG) is a powerful technique for creating game elements, but its use for puzzles is less explored than for worlds or characters. Puzzles enhance gameplay but have low replay value when handcrafted. Procedurally generating puzzles is challenging, primarily due to the strict [[solvability constraint]]. To analyze the field, we classify puzzles into 11 genres and assess generation methods against seven key characteristics, such as the generation approach (e.g., constructive vs. search-based), the evaluation method, and the degree of designer control. This framework allows us to systematically map existing research and identify promising areas for future work.

Key Quote/Concept:

Seven Salient Characteristics of PCG Methods:

  1. Constructive vs. Generate-and-Test
  2. Direct vs. Simulation-Based Evaluation
  3. Online vs. Offline
  4. Degree and Dimension of Control
  5. Automatic Generation vs. Mixed Authorship
  6. Generic vs. Adaptive
  7. Quality Considerations

This framework provides a multi-dimensional lens for analyzing and comparing different procedural puzzle generation techniques, moving beyond a simple description of algorithms.

2. Sokoban-Type Puzzles

This puzzle category involves pushing crates to goal locations within a constrained grid, where no items are ever added or removed. The main generation challenge is navigating a vast state space while guaranteeing solvability and avoiding dead ends. A common and effective technique is to work backward from a solved state. Other approaches use search algorithms like [[Monte Carlo Tree Search (MCTS)]] or evolutionary methods, often guided by metrics that evaluate puzzle complexity based on features like solution length or board layout.

Key Quote/Concept:

Reverse Generation: A core technique for generating Sokoban-style puzzles. The process starts with a completed goal configuration and applies legal moves in reverse to generate a starting position. This inherently guarantees that at least one solution path exists, elegantly solving the fundamental solvability problem.
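
To make the idea concrete, here is a minimal Python sketch of reverse generation. It is not the method of any particular paper in the survey: the level layout, step count, and move model are arbitrary illustrative choices. Starting from the solved state (every crate on a goal), it applies random 'pulls' (the inverse of pushes) and player walks, so whatever start state it ends on is solvable by construction.

```python
import random

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def parse(level):
    """Parse an ASCII level: '#' is a wall, '.' is floor, 'G' is a goal square."""
    goals, floor = set(), set()
    for r, row in enumerate(level):
        for c, ch in enumerate(row):
            if ch != '#':
                floor.add((r, c))
                if ch == 'G':
                    goals.add((r, c))
    return goals, floor

def reverse_generate(level, steps=100, seed=0):
    """Start from the solved state (every crate on a goal) and apply random
    reverse moves: the player either walks to a free neighbouring cell or
    'pulls' an adjacent crate one cell towards itself.  Because each reverse
    move can be undone by a forward push or walk, the returned start state is
    solvable by construction."""
    rng = random.Random(seed)
    goals, floor = parse(level)
    crates = set(goals)                                    # solved configuration
    player = rng.choice(sorted(floor - crates))            # any free floor cell
    for _ in range(steps):
        options = []
        for d in DIRS:
            ahead = (player[0] + d[0], player[1] + d[1])   # cell in front of the player
            behind = (player[0] - d[0], player[1] - d[1])  # cell the player backs into
            if ahead in floor and ahead not in crates:
                options.append(('walk', d))
            if ahead in crates and behind in floor and behind not in crates:
                options.append(('pull', d))
        if not options:
            break
        kind, d = rng.choice(options)
        if kind == 'pull':
            crates.remove((player[0] + d[0], player[1] + d[1]))
            crates.add(player)
            player = (player[0] - d[0], player[1] - d[1])
        else:
            player = (player[0] + d[0], player[1] + d[1])
    return player, crates, goals

if __name__ == "__main__":
    level = ["#######",
             "#..G..#",
             "#.....#",
             "#..G..#",
             "#######"]
    player, crates, goals = reverse_generate(level)
    print("player:", player)
    print("crates:", sorted(crates), "goals:", sorted(goals))
```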

3. Sliding Puzzles

Sliding puzzles, like the 15-puzzle or Rush Hour, involve rearranging tiles on a grid. Unlike Sokoban, moves are typically reversible, so there are no dead-end states. The generation challenge shifts to creating initial configurations that are difficult to solve, meaning they require a high number of moves. Advanced methods use techniques like [[Binary Decision Diagrams (BDDs)]] to symbolically represent and analyze the entire state graph of the puzzle, allowing them to identify the provably hardest possible starting positions.

Key Quote/Concept:

Symbolic Methods for Hardness: Instead of random generation and testing, symbolic methods can represent all possible puzzle configurations compactly. This allows for a complete analysis of the state space to find initial layouts that maximize the shortest solution path, thereby generating puzzles of a desired (often maximal) difficulty.
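
The BDD machinery itself is beyond a short sketch, so the following Python stand-in brute-forces the same question on a small instance: a breadth-first search from the solved state assigns every reachable configuration its distance to the goal, and the states at maximum distance are the hardest possible starting positions. For the 8-puzzle this visits roughly 181,440 states in a second or two; symbolic methods exist precisely because this exhaustive approach does not scale to larger puzzles.

```python
from collections import deque

def bfs_hardest(rows=3, cols=3):
    """Exhaustively explore the sliding-puzzle state graph outwards from the
    solved state and return the configurations whose shortest solution is
    longest.  A brute-force stand-in for the symbolic (BDD) analysis: both
    compute, for every reachable state, its distance to the goal."""
    goal = tuple(range(1, rows * cols)) + (0,)   # 0 marks the blank cell
    dist = {goal: 0}
    frontier = deque([goal])
    while frontier:
        state = frontier.popleft()
        blank = state.index(0)
        r, c = divmod(blank, cols)
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                swap = nr * cols + nc
                nxt = list(state)
                nxt[blank], nxt[swap] = nxt[swap], nxt[blank]
                nxt = tuple(nxt)
                if nxt not in dist:
                    dist[nxt] = dist[state] + 1
                    frontier.append(nxt)
    worst = max(dist.values())
    hardest = [s for s, d in dist.items() if d == worst]
    return worst, hardest

if __name__ == "__main__":
    moves, states = bfs_hardest(3, 3)   # 8-puzzle: 181,440 reachable states
    print(f"hardest 8-puzzle instances need {moves} moves, e.g. {states[0]}")
```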

4. Tile-Matching Puzzles

In these puzzles, players manipulate tiles to create matches, which are then removed from the board. For simple match-three games, generation is often trivial, relying on random placement of tiles. More interesting generation problems arise in [[hybrid games]] that blend tile-matching with other mechanics, such as movement constraints. For these, generation is a multi-step process: first creating a level structure, then placing items, and finally using a solver (like a Breadth-First Search) to validate that the level is solvable.

Key Quote/Concept:

Generation for Hybrid Puzzles: The complexity in this category comes from games that mix tile-matching with other puzzle types (e.g., Sokoban or sliding puzzles). Generation for these requires more than randomness; it often involves a constructive phase for the level layout followed by a validation phase to ensure solvability, sometimes within a mixed-authorship tool.
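
Below is a hedged sketch of that construct-then-validate pipeline, built around an invented toy game rather than any title from the survey: a row of coloured tiles is generated randomly (the constructive phase), and a breadth-first search over board states checks that the row can be fully cleared by removing runs of three or more matching tiles (the validation phase).

```python
import random
from collections import deque
from itertools import groupby

def moves(state):
    """Yield every state reachable by removing one maximal run of 3+ equal tiles."""
    runs, i = [], 0
    for colour, group in groupby(state):
        n = len(list(group))
        if n >= 3:
            runs.append((i, i + n))
        i += n
    for a, b in runs:
        yield state[:a] + state[b:]

def is_clearable(state):
    """Validation phase: breadth-first search over board states to decide
    whether the row can be cleared completely."""
    seen, frontier = {state}, deque([state])
    while frontier:
        current = frontier.popleft()
        if not current:
            return True
        for nxt in moves(current):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

def generate(length=9, colours="RG", seed=0):
    """Constructive phase (random tile placement) followed by validation
    (the BFS solver); retry until the level is provably clearable."""
    rng = random.Random(seed)
    while True:
        level = tuple(rng.choice(colours) for _ in range(length))
        if is_clearable(level):
            return level

if __name__ == "__main__":
    print("solvable level:", "".join(generate()))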

5. Assembly Puzzles

Assembly puzzles require fitting a number of smaller shapes into a larger target shape without any gaps or overlap. For 2D puzzles like jigsaws, generation can be achieved by simply partitioning a target image or shape. For 3D [[interlocking puzzles]], the process is more complex. A successful generation technique involves starting with a complete 3D voxelized shape and algorithmically extracting pieces one by one in such a way that guarantees they can be reassembled in a specific, often non-obvious, sequence.

Key Quote/Concept:

Recursive Interlocking Puzzles: A sophisticated puzzle type where pieces are locked not just in the final configuration but also in intermediate sub-assemblies. A constructive generation model for these works by ensuring that any three consecutive pieces in the assembly sequence can interlock, which guarantees the stability of the entire structure.
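
The 3D interlocking construction is too involved for a short example, but the simpler 2D case mentioned above (partitioning a target shape into pieces) can be sketched directly. The Python sketch below grows connected, jigsaw-style regions from random seed cells; the dimensions and piece count are arbitrary, and nothing here guarantees interlocking.

```python
import random

def partition_shape(width=6, height=6, pieces=4, seed=0):
    """Partition a rectangular 'target shape' into connected pieces by growing
    regions from random seed cells.  This covers only the simple 2D assembly
    case; constructing 3D interlocking pieces is far more involved."""
    rng = random.Random(seed)
    cells = [(r, c) for r in range(height) for c in range(width)]
    label, frontiers = {}, []
    for piece, seed_cell in enumerate(rng.sample(cells, pieces)):
        label[seed_cell] = piece
        frontiers.append([seed_cell])
    while len(label) < len(cells):
        piece = rng.randrange(pieces)            # try to grow a random piece by one cell
        frontier = frontiers[piece]
        rng.shuffle(frontier)
        grown = False
        for r, c in frontier:
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if 0 <= nr < height and 0 <= nc < width and (nr, nc) not in label:
                    label[(nr, nc)] = piece
                    frontier.append((nr, nc))
                    grown = True
                    break
            if grown:
                break
    return label

if __name__ == "__main__":
    label = partition_shape()
    for r in range(6):
        print("".join(str(label[(r, c)]) for c in range(6)))
```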

6. Mazes

Mazes are puzzles that require finding a path from an entry to an exit point. The barriers can be explicit walls or implicit rules (e.g., a chess maze where movement is restricted to a knight’s moves). While classic generation algorithms are well-known, more advanced approaches use [[evolutionary algorithms]] to generate mazes with specific qualities (like a certain difficulty) or use declarative languages like [[Answer Set Programming (ASP)]] to define the desired properties of the maze directly.

Key Quote/Concept:

Declarative Maze Generation (ASP): Using Answer Set Programming provides a ‘white box’ approach. Instead of evolving solutions towards a goal with a fitness function, the designer can directly state constraints like ‘the maze must have a unique solution,’ ‘the shortest path must be exactly 30 steps,’ or ‘the path must not visit more than two red squares.’ The solver then finds a maze that satisfies all these rules.
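
Answer Set Programming itself (for example via a solver such as clingo) is not shown here. Instead, the hedged Python sketch below illustrates the declarative mindset with a generate-and-test stand-in: the designer's constraints ('a solution exists', 'the shortest path is exactly 14 steps', 'that path is unique') are written as explicit acceptance checks over randomly sampled layouts; the grid size and path length are scaled-down illustrative numbers. A real ASP encoding would let the solver search within those constraints rather than filter random candidates, which is far more efficient.

```python
import random
from collections import deque

DIRS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

def shortest_path_stats(open_cells, start, goal):
    """Return (length, count) of shortest start->goal paths, or (None, 0)."""
    dist, count = {start: 0}, {start: 1}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        for d in DIRS:
            nxt = (cell[0] + d[0], cell[1] + d[1])
            if nxt not in open_cells:
                continue
            if nxt not in dist:
                dist[nxt] = dist[cell] + 1
                count[nxt] = count[cell]
                frontier.append(nxt)
            elif dist[nxt] == dist[cell] + 1:
                count[nxt] += count[cell]
    return dist.get(goal), count.get(goal, 0)

def generate_maze(size=8, wall_ratio=0.5, path_len=14, seed=0, attempts=100_000):
    """Generate-and-test: sample random wall layouts and keep the first one
    that satisfies the declared constraints -- a solution exists, its length
    is exactly `path_len`, and that shortest path is unique."""
    rng = random.Random(seed)
    start, goal = (0, 0), (size - 1, size - 1)
    cells = [(r, c) for r in range(size) for c in range(size)]
    for _ in range(attempts):
        open_cells = {p for p in cells
                      if p in (start, goal) or rng.random() > wall_ratio}
        length, n_paths = shortest_path_stats(open_cells, start, goal)
        if length == path_len and n_paths == 1:   # every stated constraint holds
            return open_cells
    return None

if __name__ == "__main__":
    maze = generate_maze()
    print("found a maze with a unique 14-step solution" if maze else "no maze found")
```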

7. Path-Building Puzzles

In path-building puzzles, the player does not find a path but must create one using a given set of items or tools. The primary generation challenges are ensuring a solution exists while also preventing unintended ‘shortcut’ solutions that bypass the intended challenge. [[Constraint-based generation]] is particularly powerful here, as it allows a designer to specify rules that explicitly forbid these undesirable solutions, ensuring both solvability and design integrity.

Key Quote/Concept:

Constraining Undesirable Solutions: A key problem in path-building puzzle generation is preventing players from finding trivial ‘shortcuts.’ Advanced generators using ASP can formalize the problem as generating a puzzle that has a solution but no undesirable solutions. This is done by adding constraints that explicitly rule out solutions that are too short or use unintended mechanics.
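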

8. Narrative Puzzles

Narrative puzzles are integrated into a game’s story and are solved through exploration, logic, and combining items. Their generation is highly dependent on designer input to define the world’s logic. [[Grammar-based approaches]] are common, where a high-level ‘puzzle map’ dictates the structure of dependencies, and a database of items and their properties is used to populate the puzzle. This ensures the generated puzzle is logically sound and solvable within the game’s narrative context.

Key Quote/Concept:

Smart Terrain Causality Chains (STCCs): A generative technique where a puzzle solution is represented as a directed graph of cause and effect. The system starts with a goal (e.g., ‘unlock door’) and works backward, chaining together the necessary items and actions (e.g., ‘use key,’ which requires ‘get key from guard,’ which requires ‘distract guard with item’). This constructs a logical and solvable puzzle sequence.
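
The sketch below shows backward chaining over a small cause-and-effect database; the goals, methods, and item names are invented for illustration and are not taken from the surveyed systems. Starting from the goal, one method is chosen for each sub-goal and its prerequisites are expanded first, so the returned list reads as a causally ordered, solvable sequence of player actions.

```python
import random

# Toy cause-and-effect database (all names invented for illustration): each
# goal can be achieved by one of several methods, and each method lists the
# prerequisite goals it depends on.
METHODS = {
    "unlock door":         [["get key"], ["break lock"]],
    "get key":             [["pickpocket guard"], ["trade with merchant"]],
    "pickpocket guard":    [["distract guard"]],
    "distract guard":      [["find shiny object"]],
    "trade with merchant": [["find shiny object"]],
    "break lock":          [["find crowbar"]],
    # leaf goals: satisfiable directly by exploring the world
    "find shiny object":   [[]],
    "find crowbar":        [[]],
}

def generate_puzzle(goal, rng, steps=None):
    """Backward-chain from the goal: pick one method for achieving it, then
    recursively expand that method's prerequisites first, so the returned list
    is a causally ordered, solvable sequence of actions."""
    if steps is None:
        steps = []
    method = rng.choice(METHODS[goal])      # choose one way to achieve this goal
    for prerequisite in method:
        generate_puzzle(prerequisite, rng, steps)
    if goal not in steps:
        steps.append(goal)
    return steps

if __name__ == "__main__":
    rng = random.Random(3)
    for i, action in enumerate(generate_puzzle("unlock door", rng), 1):
        print(f"{i}. {action}")
```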

9. Physics Puzzles

These puzzles, like Angry Birds, require players to use in-game physics to find a solution. Generation is difficult because of the dynamic, real-time nature of physics and the need to account for [[dexterity difficulty]] (i.e., the player’s physical skill) in addition to strategic difficulty. Consequently, generation methods almost always rely on [[simulation-based evaluation]], where an AI agent plays through generated levels to test if they are possible to complete.

Key Quote/Concept:

Simulation-Based Evaluation: Because the outcome of a physics puzzle is emergent and depends on precise interactions, it’s nearly impossible to guarantee solvability with a purely static analysis. Therefore, generators for these puzzles typically include a component where an AI agent simulates playing the level, often multiple times, to verify that a solution is physically achievable.
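
Here is a hedged sketch of simulation-based evaluation on a toy projectile game (not Angry Birds or any surveyed system): candidate levels are random wall and target placements, and an 'agent' samples many launch angles and speeds through a stepped physics simulation; a level is accepted only if at least one simulated shot succeeds.

```python
import math
import random

GRAVITY, DT = 9.81, 0.01

def simulate_shot(angle, speed, wall_x, wall_h, target_x):
    """Step a projectile through a simple simulation; return True if it lands
    within 0.5 units of the target without hitting the wall.  The wall is 0.4
    units thick so a single time step cannot tunnel through it."""
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(angle), speed * math.sin(angle)
    while True:
        x, y = x + vx * DT, y + vy * DT
        vy -= GRAVITY * DT
        if wall_x - 0.2 <= x <= wall_x + 0.2 and y <= wall_h:
            return False                        # the shot smashed into the wall
        if y <= 0 and vy < 0:
            return abs(x - target_x) <= 0.5     # landed: close enough to the target?

def agent_can_solve(level, tries=500, seed=0):
    """The AI 'playtester': sample many launch parameters and report whether
    any simulated shot completes the level."""
    rng = random.Random(seed)
    for _ in range(tries):
        angle = rng.uniform(0.1, math.pi / 2 - 0.1)
        speed = rng.uniform(5.0, 30.0)
        if simulate_shot(angle, speed, **level):
            return True
    return False

def generate_level(seed=0):
    """Generate-and-test: propose random levels and keep the first one the
    simulated agent can actually complete."""
    rng = random.Random(seed)
    while True:
        level = {"wall_x": rng.uniform(5.0, 25.0),
                 "wall_h": rng.uniform(1.0, 12.0),
                 "target_x": rng.uniform(10.0, 60.0)}
        if agent_can_solve(level):
            return level

if __name__ == "__main__":
    print("playable level:", generate_level())
```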

10. Logic Puzzles

Logic puzzles, such as Sudoku, are solved using deductive reasoning. Generation is intrinsically linked to solving. A common approach is to start with a fully solved grid and then remove clues (‘digging holes’). The critical step is to ensure that the puzzle still has a unique solution after clues are removed. Difficulty is best controlled not by the number of clues, but by the complexity of the logical deductions required for the solution.

Key Quote/Concept:

Difficulty via Required Techniques: A sophisticated way to generate logic puzzles of a specific difficulty is to base the difficulty rating on the actual solving techniques a human would need to use. A generator can be designed to create a puzzle that is only solvable if the player knows a specific advanced technique (e.g., an ‘X-Wing’ in Sudoku), thus providing fine-grained control over the challenge.
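
The 'digging holes' loop with a uniqueness check can be sketched compactly on a 4x4 mini-Sudoku (2x2 boxes); a full 9x9 generator works the same way but needs a faster solver. Rating difficulty by the solving techniques required, as described above, is not implemented here: the sketch only preserves the unique-solution invariant.

```python
import random

def candidates(grid, n, box, r, c):
    """Digits legal at (r, c) under the row, column, and box constraints."""
    used = set(grid[r]) | {grid[i][c] for i in range(n)}
    br, bc = (r // box) * box, (c // box) * box
    used |= {grid[i][j] for i in range(br, br + box) for j in range(bc, bc + box)}
    return [d for d in range(1, n + 1) if d not in used]

def count_solutions(grid, n, box, limit=2):
    """Backtracking solver that stops as soon as `limit` solutions are found."""
    for r in range(n):
        for c in range(n):
            if grid[r][c] == 0:
                total = 0
                for d in candidates(grid, n, box, r, c):
                    grid[r][c] = d
                    total += count_solutions(grid, n, box, limit - total)
                    grid[r][c] = 0
                    if total >= limit:
                        break
                return total
    return 1                                    # no empty cell left: one full solution

def full_grid(n, box, rng):
    """Randomised backtracking fill to obtain a complete, valid grid."""
    grid = [[0] * n for _ in range(n)]
    def fill(idx=0):
        if idx == n * n:
            return True
        r, c = divmod(idx, n)
        options = candidates(grid, n, box, r, c)
        rng.shuffle(options)
        for d in options:
            grid[r][c] = d
            if fill(idx + 1):
                return True
            grid[r][c] = 0
        return False
    fill()
    return grid

def dig_holes(n=4, box=2, seed=0):
    """'Digging holes': start from a solved grid and blank cells one at a
    time, keeping each removal only if the puzzle still has a unique solution."""
    rng = random.Random(seed)
    grid = full_grid(n, box, rng)
    cells = [(r, c) for r in range(n) for c in range(n)]
    rng.shuffle(cells)
    for r, c in cells:
        saved, grid[r][c] = grid[r][c], 0
        if count_solutions([row[:] for row in grid], n, box) != 1:
            grid[r][c] = saved                  # removal broke uniqueness: undo it
    return grid

if __name__ == "__main__":
    for row in dig_holes():
        print(row)                              # 0 marks a blank cell
```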

11. Word Puzzles and Riddles

This category includes any puzzle based on language, such as crosswords, analogies, or riddles. Generation requires large, structured knowledge bases like corpora or semantic databases (e.g., Wikipedia, Thesaurus Rex). The core challenges are managing ambiguity to ensure a single plausible solution and capturing the nuance and [[lateral thinking]] that make these puzzles compelling. Techniques often involve finding or creating relationships between concepts within the knowledge base.

Key Quote/Concept:

Generation via Semantic Databases: The foundation of word puzzle generation is a rich source of semantic information. A generator might query a database like Thesaurus Rex or a topic model built from Wikipedia to find words that are related in a specific way (e.g., ‘is a type of,’ ‘is used for’). It then constructs a puzzle around this relationship, such as an odd-one-out or an analogy question.
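
As a toy illustration, the sketch below uses a hand-written stand-in for a semantic database (the entries and categories are invented, and no real resource such as Thesaurus Rex is queried): words sharing a category are grouped with one distractor to form an odd-one-out question.

```python
import random

# A toy stand-in for a semantic database: each word is tagged with the
# categories it belongs to.  All entries are invented for illustration.
SEMANTIC_DB = {
    "salmon":  {"fish", "food"},
    "trout":   {"fish", "food"},
    "herring": {"fish", "food"},
    "sparrow": {"bird", "animal"},
    "violin":  {"instrument", "string"},
    "cello":   {"instrument", "string"},
    "oak":     {"tree", "plant"},
}

def odd_one_out(category, n_related=3, seed=0):
    """Build an odd-one-out question: pick words sharing `category` plus one
    distractor that does not, then shuffle the options."""
    rng = random.Random(seed)
    related = [w for w, tags in SEMANTIC_DB.items() if category in tags]
    others = [w for w, tags in SEMANTIC_DB.items() if category not in tags]
    if len(related) < n_related or not others:
        raise ValueError("database too small for this category")
    words = rng.sample(related, n_related) + [rng.choice(others)]
    answer = words[-1]
    rng.shuffle(words)
    return words, answer

if __name__ == "__main__":
    options, answer = odd_one_out("fish")
    print("Which is the odd one out?", options, "->", answer)
```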

12. Analysis and Conclusion

Our cross-category analysis shows that search-based and generate-and-test are the dominant methods, with simulation being essential for dynamic puzzles. Most current research focuses on offline generation, and open-ended puzzles like narrative ones still require heavy designer input. We identified several key [[gaps on the map]]: there is a clear need for more work on online and adaptive generation that can tailor puzzles to players in real-time. Furthermore, developing better metrics for [[puzzle quality]] beyond solvability and applying powerful, declarative techniques like ASP more broadly are critical directions for advancing the field.

Key Quote/Concept:

Gaps on the Map: This term refers to the promising, yet underexplored, areas in procedural puzzle generation. The most significant gaps are: 1) Online Generation for creating dynamic, surprising player experiences; 2) Adaptive Generation to tailor puzzle difficulty to individual player skill; and 3) Quality Evaluation Metrics to move beyond simple solvability and begin to measure aesthetic appeal, novelty, and player engagement.



Essential Questions

1. What is the fundamental challenge that distinguishes procedural puzzle generation from other forms of Procedural Content Generation (PCG), and how do different puzzle genres address it?

The fundamental challenge that sets puzzle generation apart is the [[solvability constraint]]. Unlike generating a landscape or a character, a puzzle must have at least one valid solution. A beautiful but unsolvable maze is a failure. The survey demonstrates that different puzzle genres have evolved unique strategies to tackle this. For instance, in Sokoban-type puzzles, where irreversible moves can create dead ends, a common and effective technique is [[Reverse Generation]]: starting from a solved state and working backward to create the initial setup. This inherently guarantees a solution path. Conversely, for sliding puzzles where moves are reversible, solvability is trivial; the challenge shifts to generating puzzles of a specific difficulty, often by using symbolic methods like [[Binary Decision Diagrams (BDDs)]] to find starting configurations with the longest possible solution paths. For dynamic puzzles like physics puzzles, where outcomes are emergent, [[simulation-based evaluation]] using an AI player becomes necessary to confirm that a solution is physically achievable. This illustrates that the core problem of solvability forces a deep, genre-specific approach to generation.

2. How does the survey’s framework of ‘seven salient characteristics’ help in analyzing and mapping the landscape of puzzle generation research?

The framework of seven salient characteristics provides a multi-dimensional lens to systematically compare and contrast disparate generation methods, moving beyond simple algorithmic descriptions. These characteristics—such as ‘Constructive vs. Generate-and-Test,’ ‘Online vs. Offline,’ and ‘Degree of Designer Control’—create a common language for analysis. By applying this framework across 11 puzzle categories, the authors reveal macro-level trends. For example, they observe that most current research focuses on [[offline generation]], and that search-based and generate-and-test are the dominant paradigms. It also highlights how certain puzzle types necessitate specific characteristics; for instance, narrative puzzles inherently require a high degree of designer control and mixed authorship, while physics puzzles almost universally depend on [[simulation-based evaluation]]. This structured analysis is what allows the authors to methodically identify the ‘[[gaps on the map]]’—underexplored areas like online, adaptive generation. For an AI product engineer, this framework is invaluable for understanding the trade-offs of different generative approaches and for classifying new or existing systems.

3. What are the primary ‘gaps on the map’ identified in the survey, and what are their implications for the future of AI in game development?

The survey concludes by identifying three critical ‘gaps on the map,’ or promising areas for future research. The first is [[Online Generation]], creating puzzles on-the-fly as a player progresses. This is a significant technical hurdle but promises more dynamic, surprising, and endlessly replayable experiences. The second is [[Adaptive Generation]], which involves tailoring puzzles to an individual player’s skill level in real-time. This could solve the longstanding problem of games being too hard for some players and too easy for others, maximizing engagement. The third, and perhaps most crucial, is the development of robust metrics for [[puzzle quality]] that go beyond mere solvability to include factors like aesthetics, novelty, and cognitive engagement. The implication for AI in game development is a shift from simply creating content to creating personalized, high-quality experiences. Successfully filling these gaps would mean moving towards AI systems that act not just as content factories, but as dynamic co-creators of the player’s experience, adapting the game world and its challenges in a responsive and intelligent way.

Key Takeaways

1. The Solvability Constraint Is Paramount

The survey repeatedly emphasizes that the single greatest barrier and defining characteristic of procedural puzzle generation is the [[solvability constraint]]. Unlike generating aesthetic content like terrain, where ‘correctness’ is subjective, a puzzle is fundamentally broken if it cannot be solved. This constraint dictates the entire generation strategy. The paper shows how techniques like working backward from a solved state (Sokoban), using solvers to validate clue removal (Sudoku), or employing AI agents for playtesting (Physics Puzzles) are all direct responses to this core requirement. This is a critical insight because it frames the problem not as one of creativity alone, but as a constrained optimization problem where validity is a non-negotiable binary outcome. The difficulty then becomes generating content that is not only solvable but also interesting, challenging, and elegant within that constraint.

Practical Application: An AI product engineer building a generative system for educational math problems must first and foremost ensure the generated problems are solvable with the concepts taught. Before focusing on varying the difficulty or phrasing, the core engine must have a robust validation or constructive generation method (e.g., building the problem from a known solution) to guarantee correctness. This takeaway prioritizes building a ‘correctness-by-design’ foundation before adding layers of qualitative features.
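
As a toy illustration of that correctness-by-design approach (the scenario, phrasing, and number ranges are invented), the sketch below fixes the answer first and then phrases a question around it, so every emitted problem is solvable by construction.

```python
import random

def generate_problem(max_value=20, seed=0):
    """Constructive, correctness-by-design generation: decide the answer up
    front, then build a question that provably has that answer (a toy
    two-step arithmetic word problem)."""
    rng = random.Random(seed)
    answer = rng.randint(1, max_value)          # the solution is fixed first
    taken = rng.randint(1, max_value)
    start = answer + taken                      # work backwards from the answer
    question = (f"A basket holds {start} apples. "
                f"{taken} are eaten. How many apples are left?")
    return question, answer

if __name__ == "__main__":
    question, answer = generate_problem()
    print(question, "Answer:", answer)
```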

2. Declarative Methods Offer Superior Control and Quality Assurance

The survey highlights the power of declarative, constraint-based methods like [[Answer Set Programming (ASP)]] as a ‘white box’ approach to generation. Unlike evolutionary algorithms that use a ‘black box’ fitness function to blindly search for good solutions, ASP allows a designer to explicitly define the properties of a valid and high-quality puzzle as a set of logical rules. For example, a designer can state: ‘The maze must have a unique solution,’ ‘the solution path must be exactly 30 steps long,’ and ‘the path must not contain any U-turns.’ The ASP solver then finds a puzzle that satisfies all these constraints simultaneously. This approach is powerful because it integrates quality and aesthetic considerations directly into the generation process, rather than treating them as an afterthought for an evaluation function. It allows for the creation of puzzles that are not just solvable, but solvable in a very specific, intended way.

Practical Application: An AI product engineer designing a tool for generating marketing copy could use a declarative approach. Instead of just evolving text towards a vague ‘engagement’ score, they could define hard constraints: ‘The text must be under 280 characters,’ ‘it must contain the keyword [[product design]],’ ‘it must not use these three specific competitor names,’ and ‘it must end with a question.’ This provides fine-grained control and guarantees the output meets specific brand and platform requirements.
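
A minimal sketch of such a declarative acceptance check follows; the keyword, banned names, and length limit are placeholder values, and in practice this validator would sit behind whatever system proposes the candidate text.

```python
def satisfies_constraints(text, keyword="product design",
                          banned=("RivalCo", "CompetitorX", "OtherBrand"),
                          max_len=280):
    """Check a candidate piece of copy against explicit, declarative
    constraints (illustrative placeholder values, not real requirements)."""
    lowered = text.lower()
    return (len(text) <= max_len
            and keyword.lower() in lowered
            and not any(name.lower() in lowered for name in banned)
            and text.rstrip().endswith("?"))

if __name__ == "__main__":
    draft = "Curious how better product design could halve your onboarding time?"
    print(satisfies_constraints(draft))   # True: all four constraints hold
```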

3. Puzzle Quality is More Than Just Solvability or Difficulty

While ensuring solvability is the first step, the survey makes it clear that high-quality puzzle design is a far more nuanced challenge. The authors point out that difficulty itself is complex; for example, in logic puzzles, the number of starting clues is a poor proxy for difficulty compared to the complexity of the logical deductions required to solve it. The paper identifies the need for better [[puzzle quality]] metrics as a key area for future research. This includes measuring aesthetic appeal (e.g., symmetry in a Sudoku grid), the ‘interestingness’ of a solution, the potential for emergent gameplay, and the avoidance of trivial or ‘shortcut’ solutions that bypass the intended challenge. This takeaway pushes beyond a purely functional view of generation towards a more holistic, human-centered one, acknowledging that what makes a puzzle good is often subjective and hard to quantify.

Practical Application: When developing an AI system that suggests code refactoring, ‘solvability’ might mean the code compiles and passes tests. However, a higher-quality suggestion would also consider readability, maintainability, and performance. The AI product engineer should plan to incorporate metrics and user feedback mechanisms that capture these ‘aesthetic’ qualities of good code, rather than just focusing on functional correctness. This could involve analyzing for code complexity, adherence to style guides, or even predicting future development friction.

Suggested Deep Dive

Chapter: Path-Building Puzzles (Section F) and the discussion on Answer Set Programming

Reason: This section provides the clearest case study on the power of declarative methods like [[Answer Set Programming (ASP)]]. It details how Smith et al. used ASP not only to generate solvable puzzles for the game Refraction but also to explicitly constrain and eliminate undesirable ‘shortcut’ solutions. This is a masterclass in moving beyond simple generation to fine-grained design control, which is a critical concept for any AI product engineer building generative systems that must adhere to complex rules and quality standards.

Key Vignette

Generating Sokoban by Reversing Time

To overcome the immense challenge of generating a solvable Sokoban puzzle from scratch, the work by Taylor and Parberry employs an elegant ‘reverse generation’ technique. Instead of starting with an empty board and placing items, their system begins with a completed puzzle, where all crates are already on their goal squares. It then applies legal moves in reverse—effectively ‘un-pushing’ the crates—to walk the puzzle back to a starting configuration. This process inherently guarantees that the generated puzzle has at least one valid solution path, neatly sidestepping the difficult problem of navigating a state space full of potential dead ends.

Memorable Quotes

A barrier for mainstream adoption of generated puzzles, compared to other types of content, may be the strict solvability constraint.

— Page 1, I. Introduction

Compared to Ashlock’s method, where the fitness function is essentially a black box, the ASP approach for maze generation is a ‘white box model’. While the space of possible outputs is smaller than for a black box model, a designer can control it directly through constraints.

— Page 9, E. Mazes

Because the outcome of a physics puzzle is emergent and depends on precise interactions, it’s nearly impossible to guarantee solvability with a purely static analysis. Therefore, generators for these puzzles typically include a component where an AI agent simulates playing the level… to verify that a solution is physically achievable.

— Page 9, H. Physics Puzzles (from summary)

A sophisticated way to generate logic puzzles of a specific difficulty is to base the difficulty rating on the actual solving techniques a human would need to use. A generator can be designed to create a puzzle that is only solvable if the player knows a specific advanced technique…

— Page 10, I. Logic Puzzles (from summary)

To facilitate this, we pinpoint ‘gaps in the map,’ i.e., promising areas for future research.

— Page 17, III. Analysis

Comparative Analysis

This survey by De Kegel and Haahr distinguishes itself from broader works on [[Procedural Content Generation (PCG)]], such as the foundational surveys by Togelius et al. and the textbook by Shaker, Togelius, and Nelson, by its specific and deep focus on puzzles. While general PCG surveys cover everything from game worlds to narratives, they often treat puzzles as just one of many content types. This work’s unique contribution is its argument that puzzles are a fundamentally different class of problem due to the non-negotiable [[solvability constraint]]. This focus allows the authors to build a detailed taxonomy of puzzle-specific generation techniques, like the ‘reverse generation’ method for Sokoban or symbolic methods for sliding puzzles, which are not major topics in general PCG literature. Furthermore, the survey places a strong emphasis on the potential of declarative, constraint-based methods like [[Answer Set Programming (ASP)]]. While other works certainly discuss search-based and evolutionary PCG, this paper champions ASP as a ‘white box’ model particularly suited for the well-defined rule systems of puzzles, offering a level of design control that is often elusive in more open-ended content generation. It agrees with the broader field on the importance of evaluation but refines the problem to ‘defining puzzle quality beyond solvability,’ a more specific and arguably harder challenge.

Reflection

De Kegel and Haahr’s survey is a significant contribution because it carves out and structures a complex sub-field of PCG. Its primary strength lies in its methodical classification of both puzzles and generation techniques, creating a clear ‘map of the field’ that is invaluable for researchers and practitioners alike. The identification of ‘[[gaps on the map]]’ (online generation, adaptivity, and quality metrics) provides a clear and actionable research agenda. The authors’ perspective is that of academic cartographers, meticulously charting existing territory to guide future exploration. A skeptical angle might question the real-world readiness of many of these academic techniques. While generating a Sokoban level in a lab is one thing, integrating a fast, robust, and consistently high-quality puzzle generator into a commercial game engine’s production pipeline is another challenge entirely. The survey focuses heavily on academic papers, with less analysis of techniques used in shipped commercial games (which are often proprietary and unpublished). The survey’s main weakness, therefore, is this inherent gap between academic research and industry practice. However, its overall significance is undeniable. By treating puzzle generation as a first-class problem, it provides the foundational language and frameworks necessary to bridge that very gap, inspiring the next generation of AI tools for creating more intelligent and engaging game experiences.

Flashcards

Card 1

Front: What is the primary barrier for mainstream adoption of procedural puzzle generation compared to other content types?

Back: The strict [[solvability constraint]]. A puzzle must have a guaranteed solution to be valid, which is a harder constraint than for content like terrain or characters.

Card 2

Front: What is ‘Reverse Generation’?

Back: A generation technique, often used for puzzles like Sokoban, that starts from a completed goal state and applies legal moves in reverse to create a starting position. This inherently guarantees solvability.

Card 3

Front: What is [[Answer Set Programming (ASP)]] and why is it considered a ‘white box model’?

Back: ASP is a declarative logic programming paradigm. It’s a ‘white box’ because designers can directly specify desired puzzle properties (e.g., solution length, path constraints) as explicit rules, giving them fine-grained control over the output.

Card 4

Front: When is [[Simulation-Based Evaluation]] necessary for puzzle generation?

Back: It is essential for puzzles with dynamic, emergent systems, such as [[physics puzzles]]. Because the outcome is not statically predictable, an AI agent must ‘play’ the puzzle to verify that a solution is physically achievable.

Card 5

Front: What are the three main ‘Gaps on the Map’ for procedural puzzle generation research?

Back:

  1. [[Online Generation]]: Creating puzzles on-the-fly during gameplay.
  2. [[Adaptive Generation]]: Tailoring puzzles to individual player skill in real-time.
  3. [[Puzzle Quality Metrics]]: Defining and evaluating quality beyond mere solvability (e.g., aesthetics, engagement).

Card 6

Front: Contrast the generation challenges for Sokoban-type puzzles vs. Sliding puzzles.

Back: Sokoban has irreversible moves that can create dead ends, so the main challenge is guaranteeing [[solvability]]. Sliding puzzles have reversible moves, so solvability is trivial; the challenge shifts to generating difficult puzzles (i.e., those with long solution paths).

Card 7

Front: What is a more sophisticated way to control the difficulty of a logic puzzle like Sudoku?

Back: Instead of just counting the number of initial clues, a better method is to control for the most advanced logical technique (e.g., ‘X-Wing’, ‘Swordfish’) required to solve the puzzle. This ties difficulty to the player’s cognitive skill.


