To Engineer Is Human: The Role of Failure in Successful Design

Authors: Henry Petroski

Tags: engineering, design, innovation, history, failure analysis

Publication Year: 1985

Overview

In this book, I set out to answer the fundamental questions of ‘What is engineering?’ and ‘What do engineers do?’ My answer is perhaps counterintuitive: to understand engineering, one must first understand failure. I wrote this not just for my fellow engineers, but for anyone who has ever wondered how a bridge can stand for a century or why a seemingly simple structure might suddenly collapse. My central argument is that engineering is a profoundly human endeavor, and because it is human, it is fallible. However, this fallibility is not something to be lamented; it is the very engine of progress.

We learn precious little from designs that succeed perfectly. A successful bridge confirms our existing knowledge but doesn’t expand it. A failure, on the other hand, is a powerful, if sometimes tragic, lesson. It reveals a flaw in our understanding, a miscalculation in our assumptions, or an unanticipated force of nature. By studying these failures—from the Tacoma Narrows Bridge to the Hyatt Regency skywalks to the humble paper clip—we advance the state of the art. Each new design is a hypothesis, a prediction that a new configuration of materials will withstand the forces of the world. Success offers confirmation, but only failure provides the incontrovertible proof that our hypothesis was wrong, forcing us to revise our theories and build better the next time.

This book is an exploration of that iterative process. It is for the professional in technology or [[AI development]] who understands that complex systems have unforeseen failure modes, and for the curious citizen who wants to appreciate the hidden drama of the made world. It is my belief that by embracing the lessons of failure, we can design a more successful and safer future.

Book Distillation

1. Being Human

Engineering is not a perfect, abstract science but a creative, human act. Because we are human, we make mistakes, and because engineering constantly pushes into the unknown, failures are an inevitable part of the process. The rarity of catastrophic failures is actually a testament to the success of engineering, but to understand what engineers do is to understand how things can go wrong and how we learn from those instances to prevent them from happening again.

Key Quote/Concept:

The Paradox of Progress: Long periods of success can breed complacency and encourage engineers to design with smaller margins of safety. This very success can paradoxically create the conditions for a future failure.

2. Falling Down Is Part of Growing Up

We are all engineers from birth. The process of learning to walk is a series of structural experiments in balance and load-bearing, where falling down teaches us what not to do. Our childhood nursery rhymes and fairy tales—’London Bridge is Falling Down,’ ‘Humpty Dumpty’—are our first, foundational lessons in structural engineering, embedding in us the understanding that the made world is not infallible.

Key Quote/Concept:

Innate Engineering Intuition: The idea that our earliest experiences with the physical world, from balancing our own bodies to stacking blocks, provide us with a fundamental, intuitive understanding of structural principles like stability, load, and failure.

3. Lessons from Play; Lessons from Life

Everyday objects are powerful teachers of complex engineering concepts. The phenomenon of [[metal fatigue]], for instance, is perfectly demonstrated by bending a paper clip back and forth until it snaps. Likewise, the keys of a child’s electronic toy fail in the order of how frequently they are pressed, revealing the cumulative effect of repeated stress. These simple examples show that failure is often not a single event but a process of degradation over time.
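The book keeps this discussion qualitative, but the idea of failure as accumulated damage has a standard engineering formalization that Petroski does not spell out: the Palmgren-Miner rule, which predicts fatigue failure when the summed damage fractions n_i/N_i reach 1. A minimal sketch, with invented S-N curve constants:

```python
# A minimal sketch of the Palmgren-Miner linear damage rule. The S-N
# curve constants A and m are invented for illustration, not real data.

def cycles_to_failure(stress_amplitude: float, A: float = 1e12, m: float = 3.0) -> float:
    """Basquin-style S-N curve: N = A * S^(-m)."""
    return A * stress_amplitude ** (-m)

def miner_damage(load_history: list[tuple[float, int]]) -> float:
    """Sum n_i / N_i over (stress_amplitude, cycles_applied) pairs.
    A damage index of 1.0 or more predicts fatigue failure."""
    return sum(n / cycles_to_failure(s) for s, n in load_history)

# A paper clip bent hard a dozen times accumulates more damage than a
# toy key pressed gently fifty thousand times.
for name, history in [("paper clip", [(5000.0, 12)]),
                      ("toy key", [(150.0, 50_000)])]:
    d = miner_damage(history)
    print(f"{name}: damage index = {d:.2f} -> {'fails' if d >= 1 else 'survives'}")
```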

Key Quote/Concept:

The One-Hoss Shay: A reference to Oliver Wendell Holmes’s poem about a carriage built so perfectly that every part was as strong as every other, so that it ran for exactly one hundred years before disintegrating into dust all at once. This is a useful metaphor for the impossibility of perfect design; real-world objects fail at their single weakest link, not all at once.

4. Engineering as Hypothesis

Every engineering design is a hypothesis. It is a proposition that a particular arrangement of materials will perform a desired function without failing under expected conditions. When a structure stands and functions as intended, the hypothesis is confirmed, but it is never definitively proven. A failure, however, serves as a clear and final refutation of the hypothesis, forcing a revision of the underlying theory.

Key Quote/Concept:

[[Design as Hypothesis]]: This framework treats the design process not as the application of known formulas but as a form of the scientific method. The engineer posits a theory (the design) and then subjects it to rigorous testing (analysis) to find flaws before it is built.

5. Success Is Foreseeing Failure

The essence of successful design is not merely to create something that works, but to anticipate all the ways in which it might fail and to build in safeguards against those failures. The history of large-scale construction, from ancient pyramids to modern bridges, is a story of learning from error. Success is achieved by foreseeing and obviating failure.

Key Quote/Concept:

Obviation of Failure: The primary objective of the design engineer is not simply to make something stand up, but to imagine every possible mode of failure—material weakness, unexpected loads, environmental effects, human error—and design to prevent them.

6. Design Is Getting From Here to There

The process of design is analogous to planning a journey. There are countless routes, each with its own set of trade-offs involving cost, time, safety, and aesthetics. There is no single ‘best’ design, only a design that represents a chosen compromise among these competing constraints. The history of bridge-building shows this evolution clearly, as new materials and new demands forced engineers to take new paths, learning from both successes and failures along the way.

Key Quote/Concept:

Design as Compromise: Engineering design is never about finding a perfect solution, but about navigating a complex landscape of conflicting requirements. A lighter bridge may be cheaper but more flexible; a stronger bridge may be safer but more expensive. The final design is the chosen balance of these trade-offs.

7. Design as Revision

Engineering design is an iterative process, much like writing. An initial concept is like a first draft, which is then analyzed, critiqued, and revised to eliminate flaws. Just as a writer learns from awkward phrasing, an engineer learns from a calculation that reveals a weak point. The final, built structure is like a published work—the result of countless revisions aimed at removing error.

Key Quote/Concept:

Learning from Predecessors’ Failures: The most valuable lessons for an innovative designer come not from studying past successes, which can hide their secrets, but from understanding past failures, which clearly reveal what does not work.

8. Accidents Waiting to Happen

Catastrophic failures often originate in the smallest of details. The collapse of the Hyatt Regency skywalks was caused by a seemingly minor change to a connection detail, which doubled the load on a critical component. Such cases demonstrate the concept of the ‘weak link’ and the danger of [[progressive collapse]], where the failure of one small part triggers a chain reaction that brings down the entire structure.

Key Quote/Concept:

Alternate Load Paths: A crucial design principle for creating robust structures. If one structural member fails, are there other members (alternate paths) that can redistribute the load and prevent a total collapse? A lack of such redundancy is a common feature in many catastrophic failures.

9. Safety in Numbers

Engineers deal with uncertainty through the use of a ‘factor of safety.’ This is a multiplier applied during design to ensure a structure is built to be much stronger than the maximum load it is ever expected to see. This ‘factor of ignorance’ accounts for unknown variations in material strength, unexpected loads, and imperfections in our analytical models. The value of this factor is not static; it tends to rise after a major failure and fall during long periods of success.

Key Quote/Concept:

[[Factor of Safety]]: Calculated by dividing the load that would cause failure by the maximum expected service load. A factor of safety of 1 means the structure can only support its expected load and has no margin for error, a highly dangerous condition.
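As a worked version of that definition (the loads here are invented for illustration):

$$\text{FS} = \frac{F_{\text{failure}}}{F_{\text{service}}}, \qquad \text{e.g.} \quad \text{FS} = \frac{600\ \text{kN}}{200\ \text{kN}} = 3.$$

A member that would fail at 600 kN but never sees more than 200 kN in service carries three times its worst expected load; at FS = 1 the two forces coincide and the margin vanishes.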

10. When Cracks Become Breakthroughs

The vast majority of structural failures are caused by the slow growth of cracks, a process known as fatigue. No material is perfectly flawless, so modern design assumes the existence of microscopic cracks and focuses on preventing them from growing to a critical size during the structure’s intended lifetime. This involves design philosophies like ‘fail-safe’ and inspection protocols to monitor for damage.

Key Quote/Concept:

Leak-Before-Break: A design criterion, especially in pressurized systems like nuclear plant piping, where a material is chosen so that a growing crack will penetrate the wall and cause a detectable leak long before it can grow large enough to cause a catastrophic rupture. This provides an inherent warning system.
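The chapter’s argument is qualitative, but the damage-tolerance logic it describes is commonly formalized with the Paris crack-growth law, which Petroski does not state. A minimal numerical sketch, with invented material constants:

```python
import math

# Paris law: da/dN = C * (dK)^m, with stress-intensity range
# dK = Y * dsigma * sqrt(pi * a). C, m, Y, and the flaw sizes below
# are illustrative assumptions, not real material data.

C, m, Y = 1e-12, 3.0, 1.12     # Paris constants and geometry factor (invented)
dsigma = 100.0                  # cyclic stress range, MPa
a = 0.001                       # assumed initial crack size, metres
a_critical = 0.020              # crack size at which fracture is assumed

cycles, block = 0, 1000         # integrate growth in blocks of 1000 cycles
while a < a_critical:
    dK = Y * dsigma * math.sqrt(math.pi * a)
    a += C * dK**m * block
    cycles += block

print(f"Crack reaches critical size after ~{cycles:,} cycles")
# Inspection intervals (or leak-detection checks) are then set well inside
# this predicted life, so damage is found long before rupture.
```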

11. Of Bus Frames and Knife Blades

The case of the Grumman Flxible buses, which developed severe cracks in their frames, illustrates the risks of radical design changes. In an effort to meet new demands for lightness and efficiency, designers departed from time-tested chassis designs, introducing new and unforeseen failure modes. The same analytical process used to understand a failing bus frame can be applied to a cracked dinner knife, tracing the failure to its roots in material, manufacturing, or use.

Key Quote/Concept:

The Peril of Radical Change: When a design departs significantly from established practice, the accumulated, often implicit, knowledge of generations is lost. This increases the likelihood of overlooking a critical failure mode that past designs had unconsciously evolved to prevent.

12. Interlude: The Success Story of the Crystal Palace

Innovation does not always lead to failure. The Crystal Palace of 1851 was a revolutionary structure, using prefabricated iron and glass components on an unprecedented scale. Its success was not an accident; it was the result of meticulous planning, modular design, and rigorous testing of components before and during construction. It stands as a powerful counterexample to the idea that daring design is doomed to fail.

Key Quote/Concept:

Success Through Testing: The builders of the Crystal Palace did not simply assume their innovative design would work. They proof-tested components, like the gallery floors, with massive loads (including marching soldiers) to demonstrate their safety and silence critics, proving that confidence must be earned through verification.

13. The Ups and Downs of Bridges

The history of the suspension bridge is the quintessential story of engineering progress through failure. Early bridges were prone to collapse under wind or rhythmic loads. John Roebling’s success with the Brooklyn Bridge came from his deep understanding of these failures. However, his success led to a new generation of ever-more slender and ‘efficient’ bridges, culminating in the dramatic collapse of the Tacoma Narrows Bridge, which taught a new, hard-won lesson about [[aerodynamic instability]].

Key Quote/Concept:

The Paradox of Engineering Design: Successful structures can inadvertently lead to future failures. As designers refine and optimize a successful concept, pushing it to be lighter, longer, or cheaper, they may trim away the very features, sometimes unknowingly, that made it safe in the first place.

14. Forensic Engineering and Engineering Fiction

[[Forensic engineering]] is the detective work of sifting through the wreckage of a failure to uncover its root cause. The investigation into the collapse of the Alexander L. Kielland oil rig or the de Havilland Comet jetliner reveals how a chain of events, often beginning with a hidden flaw, leads to disaster. Engineering fiction, like Nevil Shute’s novel ‘No Highway,’ can be remarkably prescient, exploring the potential failures of new technologies before they happen.

Key Quote/Concept:

Failure Analysis: The post-mortem investigation of a structural or mechanical failure. It is a critical process that turns a disaster into a lesson, providing the knowledge needed to prevent similar failures in other existing or future designs.

15. From Slide Rule to Computer: Forgetting How It Used to Be Done

The shift from the slide rule to the computer represents a fundamental change in engineering practice. While computers grant the power to perform immensely complex calculations, they also introduce the danger of ‘black box’ thinking. Without the need to perform intermediate calculations by hand, engineers can lose their intuitive feel for the scale and correctness of a result. An oversimplified model or a software bug can produce a result that is precisely calculated but dangerously wrong.

Key Quote/Concept:

The Illusion of Precision: Computers provide answers with many significant digits, creating an aura of accuracy. However, the accuracy of the output is entirely dependent on the accuracy of the input and the validity of the underlying analytical model, both of which are products of fallible human judgment.

16. Connoisseurs of Chaos

While we can create elaborate lists of failure causes—material defects, overloads, corrosion—they can almost all be traced back to a single source: human error. This is not to place blame, but to recognize a fundamental truth. Since the purpose of design is to create something that does not fail, any failure is, by definition, a design failure. The designer did not anticipate a specific mode of collapse. The key to preventing future failures is the open dissemination of information about past ones.

Key Quote/Concept:

All Failure is Design Failure: This concept posits that even failures attributed to construction error or improper maintenance are ultimately design failures. A robust design should anticipate a certain level of imperfection in its construction and use.

17. The Limits of Design

A perfect, truly fail-proof design is a myth. All design is a compromise among conflicting requirements—safety, cost, weight, aesthetics. The history of technology is a cyclic process: success leads to confidence and pushing boundaries; pushing boundaries leads to failure; failure leads to new knowledge and more conservative designs; this new success begins the cycle anew. This is not a flaw in the process; it is the process. It is how we learn and how progress is made.

Key Quote/Concept:

The Success-Failure Cycle: Engineering does not advance in a straight line. It moves in a cycle where periods of success encourage innovation that pushes designs to their limits, eventually resulting in a failure. The lessons from that failure are incorporated, leading to a new period of success.



Essential Questions

1. Why do I argue that understanding failure is fundamental to understanding engineering?

My central thesis is that engineering is a human endeavor, and therefore inherently fallible. We learn very little from unmitigated success; a bridge that stands simply confirms our existing knowledge. A failure, however, is a powerful teacher. It exposes a flaw in our understanding, an error in our calculations, or an unanticipated force. The collapse of the Tacoma Narrows Bridge, for instance, taught us a profound lesson about [[aerodynamic instability]] that decades of successful suspension bridges had obscured. I argue that engineering knowledge advances not by celebrating triumphs, but by dissecting disasters. Each failure serves as an incontrovertible refutation of a design hypothesis, forcing us to revise our theories and build better. This iterative process of learning from error is the true engine of progress. For an AI product engineer, this means recognizing that unexpected system failures or edge-case errors are not just bugs to be fixed, but invaluable data points that reveal the true boundaries of a model’s capabilities and the hidden assumptions in its design.

2. How does the concept of ‘Engineering as Hypothesis’ reframe the design process?

I propose that every engineering design should be viewed as a scientific hypothesis. It is a proposition stating that a specific arrangement of materials and logic will perform a desired function without failing under expected conditions. When a structure stands or a system operates as intended, the hypothesis is confirmed, but it is never definitively proven true—it has simply not yet been falsified. A failure, on the other hand, is a definitive refutation. This framework shifts the designer’s role from one of simply applying established formulas to one of actively trying to disprove their own creation. The goal of analysis, then, is not just to confirm that a design works, but to exhaustively search for the conditions under which it might fail. This is the essence of [[forensic engineering]] applied proactively. For professionals in [[AI development]], this means treating every new model or product not as a final solution, but as a testable hypothesis about user needs and system behavior, and then designing rigorous experiments (A/B tests, red teaming) to find its breaking points before it is deployed at scale.

3. What is the ‘Success-Failure Cycle’ and what are its implications for managing innovation and risk?

I observe a cyclical pattern in the history of technology that I call the success-failure cycle. A period of prolonged success with a particular design—be it a bridge, an airplane, or a software architecture—breeds confidence and even complacency. This confidence encourages engineers to optimize the design, making it lighter, cheaper, or more efficient by trimming away perceived ‘over-design.’ In doing so, they may inadvertently remove the very margins of safety that made the original design robust, pushing the new design closer to an unknown limit. Eventually, a failure occurs, which reveals this new limit of knowledge. This failure prompts a return to more conservative designs and a higher [[Factor of Safety]], beginning a new period of success. The Tacoma Narrows Bridge is the classic example: decades of refinement away from Roebling’s robust, deliberately ‘overbuilt’ designs produced ever more slender spans, culminating in a deck that proved fatally unstable in the wind. For an AI product engineer, this is a crucial warning: the successful scaling of a model from one domain to another can create a false sense of security, leading teams to underestimate the risks of a new context or a seemingly minor architectural tweak, thus setting the stage for a significant and unexpected failure.

Key Takeaways

1. Failure is the Greatest Teacher in Engineering

My book’s primary message is that progress in engineering is driven by learning from what doesn’t work. Successes confirm what we already know, but failures reveal the limits of our knowledge and force us to innovate. I trace this principle through numerous historical examples, from the evolution of Gothic cathedrals, which corrected cracking through the addition of buttresses, to the development of suspension bridges, where each collapse taught a new lesson about stability and aerodynamics. The analysis of failure is not an admission of defeat but the most critical phase of the design cycle. It provides the hard-won data necessary to create safer, more reliable designs in the future. This concept of [[failure analysis]] is the cornerstone of engineering advancement, turning tragic and costly mistakes into invaluable lessons that prevent their recurrence.

Practical Application: An AI product team should implement a structured and blameless post-mortem process for every significant failure, whether it’s a model generating harmful content, a system outage, or a feature with poor user adoption. The goal is not to assign blame but to deeply understand the root causes—flawed assumptions, data gaps, unforeseen user behaviors, or architectural weaknesses. The documented findings should become required reading and directly inform the design principles, testing protocols, and risk assessments for all future projects, creating an institutional memory that learns from its mistakes.
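One way to make that institutional memory concrete is a fixed, blameless record schema. A minimal sketch; every field name here is a hypothetical choice for illustration, not a standard:

```python
from dataclasses import dataclass, field

# A sketch of a structured, blameless post-mortem record. The schema
# and the example values are invented for illustration.

@dataclass
class PostMortem:
    incident: str                  # what failed, stated neutrally
    impact: str                    # who or what was affected, and how badly
    timeline: list[str]            # ordered facts only, no blame assigned
    root_causes: list[str]         # flawed assumptions, data gaps, weak links
    detection_gap: str             # why it was not caught sooner
    action_items: list[str] = field(default_factory=list)

report = PostMortem(
    incident="Model generated harmful content on rare multilingual queries",
    impact="Small fraction of sessions over several hours (illustrative)",
    timeline=["14:02 deploy", "14:40 first user report", "20:15 rollback"],
    root_causes=["Eval set lacked adversarial low-resource-language queries"],
    detection_gap="No content-safety canary on the deploy path",
    action_items=["Add adversarial eval suite", "Gate deploys on safety canary"],
)
```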

2. Every Design is a Hypothesis Awaiting Refutation

I argue that we must treat every new design as a testable hypothesis, not as a final solution. A design posits that a certain configuration will meet its requirements without failing. The analytical process, therefore, is an attempt to refute this hypothesis by finding its flaws before it is built. This is why I detail cases like the Hyatt Regency skywalks; the seemingly minor change in the hanger rod design was a new, untested hypothesis that tragically failed. A successful design is simply one that has withstood all attempts at refutation so far. This mindset encourages intellectual humility and a healthy skepticism toward our own creations, fostering a culture of rigorous testing and critical examination rather than one of overconfidence. It frames the engineer’s job as one of [[obviation of failure]]—of foreseeing and preventing every possible way a design could go wrong.

Practical Application: When developing a new AI feature, the product manager should frame the [[product design]] as a series of hypotheses (e.g., ‘We hypothesize that this recommendation algorithm will increase user engagement by 15% without increasing exposure to misinformation’). Then, the team’s primary job is to design experiments to try and disprove this hypothesis. This could involve ‘red teaming’ the model with adversarial inputs, stress-testing it with unusual data, and running controlled A/B tests to look for negative side effects, ensuring the system is robust before a full launch.
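A sketch of what ‘trying to disprove the hypothesis’ can look like statistically. The user counts are invented, and the method is a standard two-proportion z-test rather than anything from the book:

```python
import math

def two_proportion_ztest(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both engagement rates are equal."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return z, math.erfc(abs(z) / math.sqrt(2))   # two-sided normal tail

# Invented counts: control 1,200/10,000 engaged; treatment 1,290/10,000.
z, p = two_proportion_ztest(1200, 10_000, 1290, 10_000)
print(f"z = {z:.2f}, p = {p:.3f}")   # here ~1.93 and 0.054: 'no effect' survives at the 5% level
# Even a clear engagement win leaves the guardrail hypothesis (no added
# misinformation exposure) untested; it needs its own dedicated experiment.
```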

3. Technological Tools Can Create a Dangerous Illusion of Precision

In my chapter ‘From Slide Rule to Computer,’ I caution that powerful new tools can obscure an engineer’s intuitive understanding of a problem. The slide rule forced an engineer to have a ‘feel’ for the magnitude of the answer, as it did not place the decimal point. Modern computers, however, can produce highly precise—but dangerously wrong—answers if the underlying model or input data is flawed. The collapse of the Hartford Civic Center roof was partly attributed to an oversimplified computer model that designers trusted implicitly. The tool’s apparent sophistication masked a fundamental error in the assumptions. This creates a risk of ‘black box’ thinking, where the engineer trusts the tool’s output without a deep, intuitive grasp of the system’s physical reality. The more powerful the tool, the greater the potential for a precisely calculated disaster.

Practical Application: An AI engineer using a complex pre-trained model or a new MLOps platform should be deeply skeptical of its outputs. They should not just accept the high accuracy score. Instead, they must perform extensive error analysis, probe the model’s failure modes, and use interpretability tools to understand why it makes certain decisions. The product team must insist on building ‘sanity checks’ and simple heuristics to validate the model’s outputs against common-sense expectations, preventing the team from blindly trusting a complex system that may be failing in a subtle but critical way.
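A minimal sketch of such a sanity check; the predict function, the domain bound, and the tolerance threshold are all assumptions for illustration:

```python
def checked_prediction(predict, features: dict, history_mean: float,
                       tolerance: float = 5.0) -> float:
    """Wrap a model call in common-sense bounds before trusting its output."""
    y = predict(features)
    if y < 0.0:                        # domain check: this quantity cannot be negative
        raise ValueError(f"prediction {y} violates physical bounds")
    if y > tolerance * history_mean:   # magnitude check against a simple heuristic
        raise ValueError(f"prediction {y} exceeds {tolerance}x the historical mean")
    return y

# Usage with a stand-in model:
stub_model = lambda f: 42.0
print(checked_prediction(stub_model, {"day_of_week": 3}, history_mean=40.0))  # 42.0
```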

Suggested Deep Dive

Chapter: Chapter 8: Accidents Waiting to Happen

Reason: This chapter provides the most compelling and detailed case study in the book: the collapse of the Hyatt Regency skywalks. It is a masterclass in how a seemingly minor change—a detail in a connection—can have catastrophic, non-linear consequences in a complex system. For an AI product engineer, this is a powerful allegory for the dangers of small changes in code, training data, or model architecture that can lead to [[progressive collapse]] in system performance or safety. It perfectly illustrates the concept of the ‘weak link’ and the critical importance of understanding load paths, both in physical structures and in data-driven systems.

Key Vignette

The Hyatt Regency Skywalk’s Fatal Design Change

The original design for the suspended walkways in the Hyatt Regency hotel called for a single, continuous hanger rod to run from the ceiling down through the fourth-floor walkway and then to the second-floor walkway below. However, during construction, this design was altered to a two-rod system for ease of assembly. In the new configuration, one rod hung from the ceiling to the fourth-floor walkway, and a separate, offset rod hung from the fourth-floor walkway’s support beam to hold the second-floor walkway. This seemingly innocuous change, made to simplify construction, had a fatal consequence: it doubled the load on the nut supporting the fourth-floor walkway’s box beam, a connection that was already under-designed. This single, seemingly minor detail created the weak link that led to the catastrophic [[progressive collapse]] of the structures, killing 114 people.
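The arithmetic behind ‘doubled the load’ is worth making explicit. Idealize the weight each hanger must carry per walkway as P. In the original continuous-rod design, the nut under the fourth-floor box beam supported only that walkway; the second-floor load passed down the same rod to its own nut. In the as-built scheme, the fourth-floor connection also had to react the pull of the offset lower rod:

$$F_{\text{nut, original}} = P, \qquad F_{\text{nut, as built}} = P_{\text{4th floor}} + P_{\text{2nd floor}} = 2P.$$

The change looked like a substitution of equivalent parts, but it silently rerouted the entire second-floor load through a connection sized for half as much.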

Memorable Quotes

I believe that the concept of failure—mechanical and structural failure in the context of this discussion—is central to understanding engineering, for engineering design has as its first and foremost objective the obviation of failure.

— Page 6, Preface

Success may be grand, but disappointment can often teach us more. It is for this reason that hardly a history can be written that does not include the classic blunders, which more often than not signal new beginnings and new triumphs.

— Page 16, Chapter 1: Being Human

The paradox of engineering design is that successful structural concepts devolve into failures, while the colossal failures contribute to the evolution of innovative and inspiring structures.

— Page 150, Chapter 13: The Ups and Downs of Bridges

What is commonly overlooked in using the computer is the fact that the central goal of design is still to obviate failure, and thus it is critical to identify exactly how a structure may fail. The computer cannot do this by itself.

— Page 178, Chapter 15: From Slide Rule to Computer

No one wants to learn by mistakes, but we cannot learn enough from successes to go beyond the state of the art.

— Page 62, Chapter 5: Success Is Foreseeing Failure

Comparative Analysis

My work, ‘To Engineer Is Human,’ occupies a unique space when compared to other notable books on design and technology. Unlike purely technical engineering textbooks that focus on mathematical methods and successful applications, my approach is historical and narrative, using stories of failure as the primary lens for understanding the engineering process. It shares a focus on human-centered issues with Don Norman’s ‘The Design of Everyday Things,’ but where Norman emphasizes usability and cognitive psychology in [[product design]], I concentrate on the structural and mechanical integrity of the design itself and the epistemological role of failure in advancing the field. My book also resonates with Charles Perrow’s ‘Normal Accidents,’ which analyzes how complexity and tight coupling in systems inevitably lead to failures. However, Perrow’s focus is on systemic, organizational failures in high-risk technologies like nuclear power. My contribution is to argue that failure is not just an inevitable outcome but the fundamental mechanism of learning and progress in all design, from the simplest paper clip to the most complex bridge. I aim to make the very thought process of the engineer—the constant anticipation of what could go wrong—accessible to a general audience, a perspective often missing from both purely technical and purely sociological analyses of technology.

Reflection

In writing this book, my goal was to demystify engineering by revealing its profoundly human core. The central argument—that engineering advances through failure—is, I believe, its greatest strength. It provides an accessible and compelling narrative that counters the popular image of engineering as a precise, infallible science. Instead, I present it as a creative and iterative process of trial and error, where every structure is a hypothesis and every collapse is a lesson. However, a skeptical reader might argue that I romanticize failure. In fields like [[AI safety]] or medical device engineering, the cost of failure is so catastrophic that waiting for one to happen is not an acceptable learning strategy. While my historical examples are powerful, they are largely from a pre-digital age. The nature of failure in complex, adaptive systems like large language models introduces new challenges that my focus on mechanical structures does not fully address. The ‘black box’ problem I discussed in relation to early computers is magnified a thousandfold with today’s AI. My core thesis remains relevant, but its application requires a new level of proactive [[failure analysis]]—simulating failures, red-teaming systems, and designing for safe failure modes—because we simply cannot afford to learn from a real-world, large-scale AI disaster.

Flashcards

Card 1

Front: What is the central thesis of ‘To Engineer Is Human’?

Back: Engineering is a human and therefore fallible process that advances primarily by learning from failures, not by replicating successes. The objective of design is the [[obviation of failure]].

Card 2

Front: Define the concept of [[Design as Hypothesis]].

Back: Every engineering design is a hypothesis that a specific arrangement of parts will function without failing. Success confirms the hypothesis but doesn’t prove it; failure definitively refutes it.

Card 3

Front: What is the ‘Success-Failure Cycle’ in engineering?

Back: A cycle where prolonged success leads to confidence and riskier, more ‘optimized’ designs. This pushes boundaries and eventually leads to a failure, which in turn leads to more conservative designs, starting the cycle anew.

Card 4

Front: What is a [[Factor of Safety]]?

Back: A multiplier used in design to ensure a structure is stronger than the maximum expected load. It is a ‘factor of ignorance’ that accounts for uncertainties in materials, loads, and analysis.

Card 5

Front: What was the critical design change that caused the Hyatt Regency skywalk collapse?

Back: Changing the support for two walkways from a single continuous hanger rod to two separate, offset rods. This doubled the load on the upper walkway’s connection, causing a [[progressive collapse]].

Card 6

Front: What is the danger of computational tools that Petroski warns about?

Back: Computers can create an ‘illusion of precision,’ delivering highly precise but fundamentally wrong answers if the underlying model or assumptions are flawed, a risk that grows as engineers lose their intuitive ‘feel’ for the problem.

Card 7

Front: What is [[metal fatigue]]?

Back: The weakening of a material caused by repeatedly applied loads. It is the progressive and localized structural damage that occurs when a material is subjected to cyclic loading, often leading to failure after a certain number of cycles.

