Words and Rules: The Ingredients of Language
Author: Steven Pinker
Tags: linguistics, cognitive science, psychology, language, AI
Publication Year: 1999
Overview
In this book, I explore a simple but profound idea about the nature of language and the human mind: that our linguistic ability is powered by two distinct mental systems. The first is a finite, memorized list of words—a mental dictionary where we store arbitrary pairings of sound and meaning. The second is a set of combinatorial rules—a mental grammar—that allows us to assemble these words into a virtually infinite number of new phrases and sentences. I call this the [[words-and-rules theory]].

To bring this idea to life, I focus on a single, seemingly minor phenomenon: the distinction between regular and irregular verbs, like walk-walked versus bring-brought. This distinction, I believe, is not a messy accident of history but a clear window into our cognitive architecture. Irregular verbs, with their idiosyncratic forms, are the quintessential products of the word-memory system. Regular verbs, with their predictable ‘-ed’ suffix, are the products of a symbolic rule. By examining this one contrast from every angle—how children learn verbs, how languages change over time, how different languages handle regularity, and how the brain processes words—we can see the two systems in action.

This investigation serves as a case study in a grander debate in cognitive science, pitting theories of mind-as-computer (which manipulates symbols and rules) against theories of mind-as-neural-network (which forms associations). I wrote this book for anyone curious about the intricate machinery behind our effortless ability to speak, but it should be particularly relevant to those in fields like AI and computer science who grapple with the same fundamental problems of knowledge representation: how to balance the storage of specific facts with the application of general rules.
Book Distillation
1. The Infinite Library
The boundless expressive power of language stems from two fundamental components: words and rules. Words are items stored in a mental dictionary, an arbitrary, memorized link between a sound and a concept. Rules are the engine of a generative grammar, a combinatorial system that assembles words into novel phrases and sentences. This dual-mechanism design is a compromise; a language built only on rules would be computationally cumbersome for common concepts, while one built only on memorized phrases would lack creative power. The distinction between regular verbs (like walk-walked) and irregular verbs (sing-sang) provides a perfect natural experiment to explore these two systems. Regular forms are the product of a rule (‘add -ed’), while irregular forms are idiosyncratic words retrieved from memory.
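The dual-mechanism design described above can be sketched as a lookup-then-compute procedure. The following is a minimal illustration in Python; the irregular entries are a tiny assumed sample, not the book's inventory, and spelling niceties like consonant doubling are ignored.

```python
# A minimal sketch of the words-and-rules architecture for the English
# past tense: consult the memorized lexicon first; if nothing is stored,
# fall back to the combinatorial rule "add -ed".

# The "words" system: a small, finite associative memory (sample entries).
IRREGULARS = {
    "sing": "sang", "bring": "brought", "go": "went",
    "take": "took", "dig": "dug", "sleep": "slept",
}

def past_tense(verb: str) -> str:
    """Lexical lookup wins; otherwise the default rule applies."""
    if verb in IRREGULARS:          # retrieval from memory
        return IRREGULARS[verb]
    return verb + "ed"              # the symbolic rule, open-ended

print(past_tense("sing"))   # sang   (memory)
print(past_tense("walk"))   # walked (rule)
print(past_tense("ping"))   # pinged (rule applies even to novel verbs)
```

The asymmetry is the point: the dictionary is finite and must be learned item by item, while the rule line handles any stem it is given, including verbs never seen before.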
Key Quote/Concept:
[[The Words and Rules Theory]] is the central thesis that language is a hybrid system composed of two distinct cognitive mechanisms. The first is a finite, associative memory (the lexicon) that stores arbitrary sound-meaning pairings, including irregular forms. The second is a symbolic, computational system (the grammar) that generates open-ended combinations of words, including regular forms.
2. Dissection by Linguistics
Language is not a monolithic entity but has a complex anatomy with distinct modules for the lexicon, morphology (word-building), syntax (phrase-building), and phonology (sound patterns). The difference between morphology and syntax is revealed by how we pluralize compounds versus phrases. In the compound mother-in-law, the head is mother, but because it’s a word, the plural goes on the end (mother-in-laws is common). In the phrase mother of the groom, the head is mother, and it correctly takes the plural (mothers of the groom). Regular inflection itself is elegantly simple: a single rule attaches a suffix like ‘-ed’, and its three different pronunciations (/t/, /d/, /ɪd/) are determined not by the inflection rule itself but by universal phonological rules that govern sound combinations across the language.
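The three pronunciations of ‘-ed’ follow a simple conditional pattern, which can be sketched as below. This is a simplified illustration operating on single-letter stand-ins for final phonemes, not real IPA transcription or a full phonological analysis.

```python
# A sketch of how one suffix "-ed" surfaces as three pronunciations.
# The inflection rule attaches a single morpheme; independent phonological
# rules then select the allomorph based on the stem's final sound.

VOICELESS = set("ptkfs")  # simplified set of voiceless final consonants

def ed_allomorph(final_phoneme: str) -> str:
    if final_phoneme in ("t", "d"):
        return "ɪd"   # insert a vowel to break up t/d clusters: "patted"
    if final_phoneme in VOICELESS:
        return "t"    # devoiced to match a voiceless stem: "walked"
    return "d"        # voiced elsewhere: "jogged"

print(ed_allomorph("k"))  # t
print(ed_allomorph("g"))  # d
print(ed_allomorph("t"))  # ɪd
```

The design choice mirrors the book's claim: the grammar contributes one abstract suffix, and general sound-combination rules, which apply across the whole language, do the rest.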
Key Quote/Concept:
[[Head of a Word vs. Head of a Phrase]]. This structural distinction reveals the separation between morphology and syntax. In English words, the head is the rightmost element (a steamboat is a type of boat). In phrases, the head is typically the leftmost noun (boats on the water). The head determines the properties of the whole construction, including where inflections go.
3. Broken Telephone
Irregular verbs are not random; they are fossils of grammatical rules that existed in earlier stages of English and its ancestors, like Proto-Indo-European. Languages change over centuries through a process like the children’s game of ‘Broken Telephone’: each generation of learners imperfectly acquires the language of the previous one, reanalyzing what they hear. Old rules, like vowel alternation (ablaut), die out, but their products (like sing-sang-sung) survive as memorized lists. These lists form families of similar-sounding words, which can occasionally attract new members by analogy (sneak-snuck) but generally tend to shrink over time as less common verbs are regularized by children.
Key Quote/Concept:
[[Historical Linguistics]]. The patterns in today’s irregular verbs are not arbitrary but are echoes of grammatical systems from thousands of years ago. By tracing their history, we can see how processes like sound change, analogy, and reanalysis by learners have shaped the language we speak today.
4. In Single Combat
The fact that irregular verbs show patterns presents a challenge, leading to a ‘single combat’ between two major theories of the mind. On one side is [[generative phonology]], a rule-based theory which posits that both regular and irregular forms are generated by rules, with irregulars being subject to a small number of highly specific rules. On the other is [[connectionism]], an association-based theory which posits that all past tense forms are generated by a single neural network that learns associations between the sounds of stems and the sounds of their past forms. The rule-only theory struggles to explain the fuzzy, family-resemblance nature of irregular classes, while the association-only theory struggles with the crisp, all-or-nothing nature of the regular rule.
Key Quote/Concept:
[[Connectionism vs. Generative Phonology]]. This is the central scientific debate framing the book. Connectionist models (neural networks) try to explain all of language with a single mechanism of associative memory. Generative phonology (the Chomskyan tradition) tries to explain it all with rules. The evidence from verbs suggests both are half-right and that a hybrid words-and-rules model is needed.
5. Word Nerds
Because irregular verbs are stored in memory, they should be sensitive to frequency effects; regular verbs, generated by a rule, should not. This prediction holds true. The ten most common verbs in English are all irregular. Conversely, the rarest verbs are almost all regular. Psycholinguistic experiments confirm this: people are faster and more confident producing the past tense of common irregulars (like took) than rare ones (strove), but show no such difference between common regulars (walked) and rare ones (mauled). This suggests regulars are computed on the fly, while irregulars are retrieved from memory, where frequency strengthens the trace.
Key Quote/Concept:
[[Frequency Effects as a Diagnostic Tool]]. The frequency with which a word appears in the language affects the strength of its representation in memory. This allows us to test the words-and-rules theory: if irregulars are memorized, their processing should be affected by frequency. If regulars are rule-generated, their processing should not. The data strongly support this dissociation.
6. Of Mice and Men
The regular rule is a default operation that applies not just when memory is weak, but when it is structurally inaccessible. This occurs in words that lack a proper root (onomatopoeia like pinged, names like the Childs) or are ‘headless’—compounds where the rightmost element doesn’t define the whole, like lowlife (a person, not a type of life). Because the irregular plural lives is linked to the root life, the headless structure of lowlife blocks access to it, so the default regular rule applies, yielding lowlifes. Similarly, a baseball player flied out, not flew out, because the verb is derived from the noun a fly ball, making it headless.
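The blocking logic can be made concrete with a short sketch: an irregular plural is stored with its root, a headed compound inherits it by percolation, but a headless word cannot reach the root's entry, so the default rule fires. The lexicon entries and function signature here are my own illustrative assumptions.

```python
# Headedness determines whether irregular information percolates up.
IRREGULAR_PLURALS = {"life": "lives", "mouse": "mice"}

def pluralize(word, root=None, headed=True):
    """root: the word's rightmost element, if it has internal structure."""
    if headed and root in IRREGULAR_PLURALS:
        # Information percolates from the head: 'fieldmouse' -> 'fieldmice'
        return word[: -len(root)] + IRREGULAR_PLURALS[root]
    if word in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[word]
    return word + "s"  # default rule: memory is inaccessible or empty

print(pluralize("fieldmouse", root="mouse", headed=True))    # fieldmice
print(pluralize("lowlife", root="life", headed=False))       # lowlifes
print(pluralize("Mickey Mouse", root="mouse", headed=False)) # Mickey Mouses
```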
Key Quote/Concept:
[[Headless Words and Default Inflection]]. A word’s internal structure determines whether it can access an irregular form stored with its root. In a ‘headless’ word, the information pathway from the root is blocked. This prevents the retrieval of the irregular form, forcing the application of the default regular rule. This explains why we say Mickey Mouses and flied out.
7. Kids Say the Darnedest Things
Children’s famous errors like goed and breaked are not random mistakes but evidence of a powerful cognitive process: the acquisition of a rule. Children exhibit a [[U-shaped learning curve]]: first they correctly use irregulars like went (learned by rote), then they discover the ‘-ed’ rule and overapply it (goed), and finally they learn to block the rule for irregular verbs and revert to the correct form. This happens because a child’s memory for the irregular form is initially weak and fails to block the newly acquired, productive rule. Children are not just memorizing patterns; they are acquiring an abstract, symbolic rule and are sensitive to the same structural constraints as adults.
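The interaction of a strengthening memory trace with a newly acquired rule can be shown in a toy simulation. This is my own illustrative assumption about the mechanics, not a model from the book; the exposure-to-retrieval mapping is arbitrary.

```python
# Toy simulation of overregularization: the memory trace for 'went'
# strengthens with exposure, while the '-ed' rule, once acquired,
# fires whenever retrieval of the stored form fails.
import random

def simulate_goed_errors(exposures, rule_known, trials=10_000, seed=0):
    """Return the rate of 'go' -> 'goed' errors at a given stage."""
    rng = random.Random(seed)
    retrieval_prob = min(1.0, exposures / 100)  # trace strength grows
    errors = 0
    for _ in range(trials):
        if rng.random() < retrieval_prob:
            continue              # 'went' retrieved from memory: correct
        if rule_known:
            errors += 1           # retrieval failed: rule yields 'goed'
        # before the rule is acquired, a failed retrieval yields no form
    return errors / trials

early  = simulate_goed_errors(exposures=20,  rule_known=False)  # stage 1
middle = simulate_goed_errors(exposures=40,  rule_known=True)   # stage 2
late   = simulate_goed_errors(exposures=100, rule_known=True)   # stage 3
print(early, middle, late)  # error rate rises, then falls back toward zero
```

The U-shape falls out of the interaction: with no rule there is nothing to overapply, and with a strong trace there is nothing for the rule to fill in.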
Key Quote/Concept:
[[U-Shaped Learning Curve]]. This developmental pattern—correct, then incorrect, then correct again—is a hallmark of rule acquisition. It shows that children do not simply imitate adults but actively construct a grammar. The initial correct performance is based on rote memory, the dip is caused by the overapplication of a newly discovered rule, and the final recovery shows the integration of the rule with memorized exceptions.
8. The Horrors of the German Language
The words-and-rules system is a universal feature of human language, not a quirk of English. German provides a crucial test case. Its regular past participle suffix applies to a minority of common verbs, yet it behaves as the default rule, applying to novel, rare, and headless words. The German plural is even more striking: the regular, default suffix ‘-s’ is used in only about 4% of nouns, yet it is the one applied to names, acronyms, and foreign borrowings. This demonstrates that a rule’s ‘regularity’ is a psychological status—being the default computation—not a statistical one based on the number of items it applies to.
Key Quote/Concept:
[[The Default Rule as a Psychological Kind]]. A rule is ‘regular’ or ‘default’ not because it applies to the most words, but because of how the mind uses it: as the go-to operation for any word of the right category that doesn’t have a competing irregular form stored in memory. The German plural system, where the default ‘-s’ is a tiny minority, proves that psychological status can be divorced from statistical frequency.
9. The Black Box
The words-and-rules theory predicts that the two systems should be physically distinct in the brain. Evidence from cognitive neuroscience confirms this. A [[double dissociation]] is seen in brain-damaged patients: those with damage to frontal/basal-ganglia circuits (implicated in grammar and procedures) struggle with regular verbs, while those with damage to temporal/parietal lobe areas (implicated in lexical memory) struggle with irregulars. Genetic disorders show the same split: Specific Language Impairment selectively impairs the rule system, while Williams Syndrome selectively impairs the lexical/associative system. Brain imaging techniques also reveal different patterns of neural activity for regular and irregular verb processing.
Key Quote/Concept:
[[Double Dissociation]]. This is a powerful form of evidence in neuroscience. Finding patients who can process regular but not irregular verbs, and other patients who can process irregular but not regular verbs, strongly implies that the two tasks are handled by separate and independent brain systems, confirming the core claim of the words-and-rules theory.
10. A Digital Mind in an Analog World
The distinction between words and rules in language mirrors a deeper distinction in human cognition between two ways of knowing. Irregular verbs are like [[family resemblance categories]] (e.g., ‘game’, ‘furniture’), which are fuzzy, graded, and organized around prototypes; they reflect the messy, contingent history of the world and are handled by an associative memory system. Regular verbs are like [[classical categories]] (e.g., ‘odd number’, ‘grandmother’), which are defined by crisp, all-or-none rules; they are by-products of computational systems like grammar, logic, and science that allow for precise deduction. Our minds are hybrids, equipped with both an analog, associative system for dealing with the historical world and a digital, rule-based system for exploiting its lawful structure.
Key Quote/Concept:
[[Classical vs. Family Resemblance Categories]]. This distinction from the psychology of concepts maps perfectly onto the regular-irregular distinction in language. Irregular verbs form family resemblance categories, learned by association. Regular verbs form a classical category, generated by a rule. This suggests that the words-and-rules architecture is not just for language, but may be a fundamental organizing principle of the human mind.
Generated using Google GenAI
Essential Questions
1. What is my ‘words-and-rules theory,’ and how does the distinction between regular and irregular verbs serve as its primary evidence?
My central argument is that human language is powered by two distinct mental systems. The first is a finite, memorized lexicon—a mental dictionary—where we store arbitrary sound-meaning pairings, including idiosyncratic words like irregular verbs (bring-brought). The second is a combinatorial grammar of symbolic rules that can generate a virtually infinite number of novel utterances, including regular verb forms (walk-walked). The regular/irregular distinction provides a perfect ‘natural experiment’ to see these two systems in action. Irregular verbs, being unpredictable, must be retrieved from the associative memory system, much like any other word. Regular verbs, being perfectly predictable, are not retrieved but are generated on the fly by a simple, powerful rule: ‘add -ed to the verb stem.’ This [[words-and-rules theory]] posits that our minds are hybrids, balancing the efficiency of storing common, arbitrary facts with the immense generative power of computation for everything else.
2. How does my analysis of verbs challenge both purely rule-based (generative) and purely association-based (connectionist) models of language?
I frame my investigation as a ‘single combat’ between two dominant theories in cognitive science. On one side, traditional [[generative phonology]] attempts to explain all verb forms, regular and irregular, with a complex system of rules, which struggles to account for the fuzzy, family-resemblance nature of irregular classes. On the other side, [[connectionism]] (neural networks) tries to explain all verb forms with a single associative mechanism, which struggles to explain the crisp, all-or-nothing, open-ended productivity of the regular rule. My evidence suggests both are half-right. Irregular verbs do show associative properties (family resemblance, frequency effects), fitting the connectionist view of memory. However, regular verbs behave like a true symbolic default rule, applying to any verb regardless of its sound or familiarity, which pure associationism cannot handle. Therefore, I argue for a hybrid model that incorporates both an associative memory for ‘words’ (irregulars) and a computational grammar for ‘rules’ (regulars).
3. What does the behavior of the ‘default rule’ in English and other languages reveal about the psychological nature of grammar?
A key insight is that a rule’s ‘regularity’ is a psychological status, not a statistical one. A rule is ‘regular’ or ‘default’ because it is the mind’s go-to operation for any word that lacks a specific, competing form in memory. This is proven most strikingly by the German plural system. The default plural suffix in German, ‘-s’, is the one applied to novel words, names, and foreign borrowings. Yet, it is used on only a tiny minority—about 4%—of German nouns. This demonstrates that the mind’s default computation can be divorced from statistical frequency. This principle also explains why the English ‘-ed’ rule applies to rare verbs, newly coined verbs (Borked), and structurally ‘headless’ words (flied out). In all these cases, memory fails to provide an alternative, so the [[default rule]] is automatically engaged. This reveals grammar as a system of abstract computation, not just a summary of statistical patterns in the input.
Key Takeaways
1. Language is a Hybrid System of Memorized Words and Computational Rules
The core of my argument is that the human mind is not a monolithic processor. For language, it employs a dual-system architecture. One system is an associative memory, our mental lexicon, which stores thousands of arbitrary pairings of sound and meaning. This is where we keep simple words like dog and also the quirky, unpredictable forms we call irregular verbs, like slept or went. The other system is a generative grammar, a computational engine that manipulates symbols according to rules. This is what allows us to combine words into novel sentences and to apply regular patterns, like adding ‘-ed’ to form a past tense, to any verb we encounter, even one we’ve never heard before. This [[words-and-rules theory]] explains why irregulars are limited in number and sensitive to frequency, while regulars are open-ended and effortlessly productive.
Practical Application: For an AI product engineer, this suggests the power of hybrid architectures. Instead of relying on a single, massive neural network to handle all tasks, consider a dual approach. A system could use a large, efficient lookup database (like a key-value store) for common, idiosyncratic cases (e.g., frequent user queries, irregular data points) while employing a separate, generative algorithm to handle novel or rare inputs. This can be more efficient and robust than a one-size-fits-all model.
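As a sketch of that dual approach, the following shows a lookup store of memorized answers backed by a generative fallback. The names, cached query, and fallback function are illustrative assumptions, not a specific product design.

```python
# Hybrid responder: a key-value store for frequent, idiosyncratic cases,
# with a general computation for everything else -- words and rules.

def make_hybrid_responder(memorized, fallback):
    """memorized: dict of exact answers; fallback: handles novel queries."""
    def respond(query):
        if query in memorized:    # the 'words' system: cheap, exact lookup
            return memorized[query]
        return fallback(query)    # the 'rules' system: open-ended computation
    return respond

# Example: cached answers for hot queries, a generic procedure otherwise.
cache = {"reset password": "Use the 'Forgot password' link on the login page."}
respond = make_hybrid_responder(cache, lambda q: f"Searching docs for: {q!r}")

print(respond("reset password"))  # exact stored answer
print(respond("change avatar"))   # generated on the fly
```

As with irregular verbs, the lookup table stays small and curated, while the fallback guarantees the system is never left without an answer.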
2. A ‘Default Rule’ is Defined by its Psychological Role, Not its Statistical Frequency
I show that what makes a grammatical pattern ‘regular’ is not that it applies to the most words, but that it functions as the mind’s default operation. It’s the procedure that kicks in automatically when memory fails to supply a specific, stored alternative. The most compelling evidence comes from German, where the default plural suffix (‘-s’) is applied to names, acronyms, and novel nouns, yet accounts for less than 5% of existing noun types. The vast majority of German nouns use one of several irregular patterns. This proves that the mind’s computational architecture is not a slave to statistics. The [[default rule]] is a qualitative part of our mental software, designed to ensure that we are never left speechless; it can always generate a form for a word, even if that word is new, rare, or structurally unusual.
Practical Application: In designing user interfaces or machine learning systems, don’t assume that the most frequent pattern is the one that should be generalized. The most robust ‘default’ behavior for a system is the one that handles the widest variety of unforeseen circumstances, even if those circumstances are individually rare. For example, an error-handling system’s default state should be a general, safe procedure, not one based on the most common type of error seen in the past.
3. The Brain’s Physical Structure Reflects the Words-and-Rules Division
The distinction between words and rules is not just an abstract theory; it appears to be etched into the very neuroanatomy of the brain. I present evidence from patients with brain damage that shows a [[double dissociation]]. Patients with damage to anterior brain regions, particularly involving frontal/basal-ganglia circuits, often have trouble with grammar and rules, selectively impairing their ability to produce regular past tenses (walked). Conversely, patients with damage to posterior regions, in the temporal and parietal lobes, often have trouble with word memory, selectively impairing their ability to retrieve irregulars (dug). This neurological evidence strongly supports the idea that two distinct brain systems handle the two different components of language: a memory-based system for irregulars and a rule-based procedural system for regulars.
Practical Application: This provides a biological inspiration for building robust, fault-tolerant AI. A system with a modular, hybrid architecture, where distinct components handle different kinds of information (e.g., stored vs. computed), can exhibit graceful degradation. If one module fails, the others can continue to function, providing a baseline of performance. This is more resilient than a monolithic system where any damage can lead to catastrophic failure across all tasks.
Suggested Deep Dive
Chapter: Chapter 9: The Black Box
Reason: This chapter is essential for an AI product engineer because it grounds the abstract linguistic theory in the physical hardware of the brain. I explore how evidence from cognitive neuroscience—including studies of aphasia, neurodegenerative diseases like Alzheimer’s and Parkinson’s, genetic disorders, and brain imaging techniques like fMRI and ERP—converges to support the words-and-rules model. It provides a fascinating look at how we can map computational functions onto neural circuits, revealing a [[double dissociation]] between the brain systems for memory (words/irregulars) and computation (rules/regulars). This is directly relevant to anyone interested in brain-computer interfaces, neuromorphic computing, and building AI systems inspired by the brain’s actual architecture.
Key Vignette
The Curious Case of the ‘Headless’ Word
A key piece of evidence for my theory comes from so-called ‘headless’ words. For instance, the plural of lowlife is lowlifes, not the expected lowlives. This is because a lowlife is a type of person, not a type of life; the word’s meaning isn’t determined by its ‘head’ (life). This structural quirk blocks the percolation of information from the root word life, imprisoning its irregular plural lives in the lexicon. With no irregular form available, the mind’s [[default rule]] applies, yielding the regular plural. The same logic explains why a baseball player flied out—the verb is derived from the noun a fly ball, making it headless and blocking access to the irregular past tense flew.
Memorable Quotes
The premise of this book is that there are two tricks, words and rules. They work by different principles, are learned and used in different ways, and may even reside in different parts of the brain.
— Page 13, Chapter 1: The Infinite Library
Since the dawn of the modern study of the mind in the late 1950s, children’s language errors such as breaked and holded, which could not have been parroted from their parents’ speech, have served as a vivid reminder that the mind of the child is not a sponge, but actively assembles words and concepts into new combinations guided by rules and regularities.
— Page 8, Preface
Irregular and regular forms therefore would be the inevitable outcome of two mental subsystems, words and rules, trying to do the same thing, namely, express an event or state that took place in the past.
— Page 31, Chapter 1: The Infinite Library
The past-tense debate is the latest battle in a centuries-old disagreement over two very different ways of understanding the mind.
— Page 107, Chapter 4: In Single Combat
We have digital minds in an analog world. More accurately, a part of our minds is digital. We remember familiar entities and their graded, crisscrossing traits, but we also generate novel mental products by reckoning with rules.
— Page 326, Chapter 10: A Digital Mind in an Analog World
Comparative Analysis
In ‘Words and Rules,’ I aimed to make a central debate in cognitive science accessible by focusing on a single, elegant case study. This approach contrasts with the highly technical, comprehensive nature of works like Noam Chomsky and Morris Halle’s ‘The Sound Pattern of English,’ from which my rule-based analyses descend. While their work lays out a formal system for all of English phonology, my book tests the psychological reality of one component of that system against its chief rival. That rival, [[connectionism]], is detailed in the Parallel Distributed Processing volumes by David Rumelhart and James McClelland. Unlike their work, which champions a single, association-based mechanism for all of cognition, I argue that such models fail to capture the crisp, open-ended nature of regular grammar. My book thus serves as a bridge, agreeing with connectionists on the associative nature of memory (for irregulars) but siding with the Chomskyan tradition on the need for symbolic rules (for regulars). Compared to my earlier book, ‘The Language Instinct,’ which surveyed the entire field of language, ‘Words and Rules’ is a deep dive, providing the detailed evidence for the hybrid cognitive architecture I had previously sketched.
Reflection
My goal in this book was to use the humble English verb to illuminate the architecture of the mind. Its greatest strength, I believe, is this very focus: by examining one phenomenon from every possible angle—linguistics, child development, history, neuroscience, and even other languages—we can see a profound principle emerge. The mind is a hybrid, a composite of an associative, pattern-matching memory and a symbolic, rule-applying computer. This [[words-and-rules theory]] offers a satisfying resolution to the centuries-long debate between empiricism and rationalism, suggesting both were right about different parts of the mind. A skeptical reader might argue that the dichotomy is too neat. Perhaps the distinction is more of a continuum, and the associative memory I posit for irregulars, which can generalize by analogy (as in sneak-snuck), is itself a kind of fuzzy rule system, blurring the clean line I have drawn. This is a fair critique, but the weight of evidence, especially the [[double dissociation]] seen in the brain, points to two genuinely distinct systems. For an AI product engineer, the book’s significance lies in its demonstration of a fundamental design choice in intelligence: the trade-off between storing specific solutions and computing general ones. Understanding this balance is critical to building systems that are both efficient and flexible.
Flashcards
Card 1
Front: What is the central thesis of the [[words-and-rules theory]]?
Back: Language is a hybrid system with two distinct components: 1) A finite, memorized lexicon for storing arbitrary words, including irregular forms like brought. 2) A combinatorial grammar of symbolic rules for generating sentences and regular forms like walked.
Card 2
Front: What is a [[double dissociation]] and how does it provide neurological evidence for the words-and-rules theory?
Back: It’s when two patient groups show opposite patterns of impairment. For verbs, some patients with frontal/basal-ganglia damage struggle with regular verbs but not irregulars, while others with temporal/parietal damage show the reverse. This suggests two separate neural systems for rules and words, respectively.
Card 3
Front: What is a ‘headless’ word and why does it take a regular inflection?
Back: A word where the rightmost element does not define the category of the whole (e.g., a lowlife is a person, not a type of life). This structure blocks access to any irregular form stored with the root (like lives), forcing the application of the default regular rule (yielding lowlifes).
Card 4
Front: What is the [[U-shaped learning curve]] in children’s acquisition of verbs?
Back: A three-stage pattern: 1) Correct use of irregulars (e.g., went), learned by rote. 2) Overregularization errors (e.g., goed), after discovering the ‘-ed’ rule. 3) Return to correct use (went), after learning to block the rule for memorized exceptions.
Card 5
Front: How does the German plural system demonstrate that a ‘regular’ rule is a psychological default, not a statistical majority?
Back: In German, the default plural suffix is ‘-s’, which applies to novel words, names, and borrowings. However, it is used on only a small minority (~4%) of German nouns. This shows that a rule’s ‘regularity’ is its role as the go-to computation when memory fails, not its frequency.
Card 6
Front: What is the key difference between [[classical categories]] and [[family resemblance categories]]?
Back: Classical categories (e.g., ‘odd number’) are defined by crisp, all-or-none rules. Family resemblance categories (e.g., ‘game’) are fuzzy, graded, and organized around prototypes with no single defining feature. Regular and irregular verbs map onto this distinction.
Card 7
Front: Why do children make overregularization errors like goed?
Back: Not because they forget the correct form (went), but because their memory for it is not yet strong enough to be retrieved reliably every time. When retrieval fails, their newly acquired ‘-ed’ rule is applied as the default, producing the error.