PART III ][ Chapter 21
Spaces of Meaning: Conceptual History, Vector Semantics, and Close Reading
Michael Gavin, Collin Jennings, Lauren Kersey, and Brad Pasanek
In the digital humanities, much research in text analysis has concerned techniques of observing large-scale lexical patterns. This chapter argues for a method of computationally assisted close reading that draws from two distinct intellectual traditions: conceptual history and vector semantics. The history of concepts is a subfield of political history, spearheaded by Reinhart Koselleck and devoted to the study of sociopolitical ideas like nation, democracy, rights, and progress.[1] The primary challenge for concept theory is to articulate how ideas function as historical objects and to explain how they emerge and transform over time. Vector semantics is a subfield of computational linguistics devoted to the quantitative study of word meaning. Its intellectual tradition stretches back to the postwar period, when linguists like Zellig Harris and J. R. Firth first articulated the “distributional hypothesis,” suggesting that semantics could be modeled by analyzing how words co-occur throughout a corpus. We argue that conceptual history and vector semantics play well together: concepts are precisely the kinds of historical object that semantic analyses are good at measuring, and concept theory provides an apt vocabulary for interpreting quantitative results of text analysis. Much like the “topics” produced by topic models, concepts are best understood as structural patterns that recur among words. Roughly analogous to communities like those detected in social-network analysis, concepts are clusters of terms that co-occur throughout a corpus. The task of historical vector semantics is to identify such concepts, to see where and when they appear and how they change over time, and to study how they are deployed in individual documents, by individual authors, and among historical communities.
Vector-space models represent a corpus in much the same way topic models do: both kinds of modeling produce lists of words meant to display an underlying thematic or conceptual coherence, and both identify latent semantic features that stretch meaningfully across large corpora.[2] According to Matthew Jockers, “cultural memes and literary themes are not expressed in single words or even in single bigrams or trigrams. Themes are formed of bigger units and operate on a higher plane” (122). Topics represent patterns of word collocation that transcend individual sentences, paragraphs, and even documents.[3] Similarly, vector-space models of word meaning trace patterns over an entire corpus. However, rather than identify patterns over a preselected number of topics (whether fifty or five hundred), vector-space models create statistical profiles for every word, and each of those profiles can be broken up into any number of clusters. For this reason, vector semantics is not as good at describing the broad themes that predominate in a corpus, but is very good at detailing word uses at the microlevel, which includes exploring all the ways a word has been used, identifying how it is invoked in a single text or phrase, or showing how the meanings of words combine, at a deep conceptual level, in the human statements that deploy them.
Discussion of textual analysis has tended to emphasize the ability of computers to describe language at a large scale, as in the oppositions between “distant reading” and “close reading” or between “micro-” and “macroanalysis.”[4] Privileging large-scale studies that stretch over centuries has an unfortunate consequence: it forecloses attention to how those large linguistic patterns inform the meanings of particular words and phrases. Recent developments in conceptual history have shown how concepts, visible at the macroscale, inform the individual statements recorded in texts. The work of conceptual historian Peter de Bolla, in particular, has shown how texts mobilize and actualize concepts drawn from a “conceptual substrate” of language. To illustrate how texts do this and to demonstrate how semantic models might support the study of concepts, we offer a short case study that uses a large-scale model of historical semantics—drawn from the Early English Books Online (EEBO) corpus—to describe the concepts deployed in a single text. John Dryden’s MacFlecknoe (1678) is known to historians of British literature as a classic in the genre of parody that masterfully juxtaposes low topics (in this case, the petty rivalries of poets) with high ones (the successions of kings). Such burlesques depend for their satire on finding deep conceptual similarities that cross social domains. For Dryden, one such concept was wit, a mental ability and a form of verbal performance that engendered new social hierarchies. MacFlecknoe therefore represents an innovative political application of the period’s most important literary concept.[5] Moreover, wit serves as an analog to or metaphor for semantic modeling in general. Wit describes the faculty for finding resemblance in difference, for seeing that seemingly disparate concepts are surprisingly allied when considered in new lights. Philosopher John Locke described wit as the “assemblage of Ideas,” and there is likely no better phrase to describe computational semantics. A vector-space model is a vast assemblage of ideas from which we hope to assemble new histories of concepts.
Conceptual History
Discussions of concept theory often begin by emphasizing a basic point: concepts are not words. The distinction can be glimpsed by considering any collection of synonyms, such as rubbish, trash, waste, junk. Each of these words has a distinct use, and they imply different connotations and meanings, but there remains a principle of synonymy among them, an underlying concept that ties them together. Yet, that conceptual je ne sais quoi does not perfectly fit any of them. Just as none of these words exactly matches any other, so too none is identical to the idea of “stuff that gets thrown away.” The technical term in linguistics is onomasiology, which names the study of how multiple words express a single concept. An onomasiological approach to modeling language makes it possible to trace ideas comparatively across national boundaries or diachronically across time. American, Chinese, or Iranian concepts of democratic governance might have surprising similarities even if they use different words for “freedom.” Concepts are verbal forms that are more complex than words or, if not more complex, at least different from them.
The phrase “history of concepts” is most directly associated with Reinhart Koselleck, a twentieth-century German historian and social theorist whose greatest work was a massive encyclopedia of political terms: the Geschichtliche Grundbegriffe, or, as the title has been translated, Basic Concepts in History: A Historical Dictionary of Political and Social Language in Germany.[6] Koselleck’s lexical histories trace the evolution of concepts across the Enlightenment, following the transition from the feudal system to a modern, capitalist moral order. Conceptual history critiques narratives that project modern meanings of terms onto the past or those that presume that ideas are inexorably determined by changes in technology, industry, or class relations. Koselleck’s analysis aims to show that ideas do real work forming a culture’s horizons of expectation and intelligibility; they are not merely along for the ride. The history of concepts is meant to narrate change diachronically: to show how concepts emerge, evolve, and fall away along multiple temporalities (which only sometimes correspond to the course of human events).[7] Koselleck also examines concepts synchronically to show how they subsist at any moment in time within a semantic field, where concepts gain meaning in relation to each other. A concept like “freedom” means nothing by itself; it acquires significance only against counterconcepts like “slavery.”[8]
Critics of Koselleck’s theory (Quentin Skinner and J. G. A. Pocock, most famously) have doubted that concepts really can be abstracted from their contexts, and so they prefer to narrate the histories of social movements and political conflicts.[9] For Koselleck and his followers, however, taking the concept as the primary unit of analysis retains much appeal. Most importantly, the history of concepts aims to uncover something more fundamental, abstract, and potentially valuable than social history can reach on its own. Koselleck posits, “Without common concepts, there is no society, and above all, no political field of action” (Futures Past, 76). At its most ambitious, the history of conceptuality hopes to show how critical ideation reacts to and spurs social change, with the utopian goal of laying a new conceptual foundation for a more just world order. Koselleck explains, “Concepts no longer serve merely to define given states of affairs, but reach into the future . . . positions that were to be secured had first to be formulated linguistically before it was possible to enter or permanently occupy them” (80).
The history of concepts has taken a computational turn in the work of Peter de Bolla. His 2013 book, Architecture of Concepts: The Historical Formation of Human Rights, is a signal example of studying the past by means of its linguistic collocations. His primary case study is the history of rights. Relying on keyword searches in Eighteenth Century Collections Online, de Bolla shows how the words that appeared near rights changed over the hundred-year period. Early on, rights were associated with the liberties and privileges of institutions like parliament and the church. By the end of the century, however, rights had inhered in the individual and were more likely to collocate with words like man and sacred. De Bolla’s book thus offers a rationale and an exemplary method for the study of “conceptual forms”; that is, concepts concretized in usage. Locating himself in the wake of the twentieth-century linguistic turn and in a new, twenty-first-century digital moment, de Bolla furthers the work of intellectual history and discourse analysis by reconfiguring its conceptual turn through computational means.
Although the deep, structuring concepts investigated are “ineradicably linguistic,” de Bolla claims to be doing more than just counting words and describing linguistic patterns (3). Indeed, he continually figures his inquiry in terms of geometry, subway maps, architecture, and networks. These metaphors provide foundational, architectonic images for historical possibility:
Concepts operate according to a specific grammar and syntax that situates them in a network of linked concepts. That grammar and syntax becomes legible by tracking the use of words over time, as do the networks within which they were historically suspended. In order to be linked together, concepts have to present themselves as variously open or closed; this “fit” depends upon the shape or format of a concept’s edges, which permit more or less compatible interconnections. That shape is determined by the internal configuration of each concept. Taken together, the network connections and the internal structure comprise the architecture of a concept. (40)
Though it aims to quantify the dispersion of conceptual forms in printed matter, de Bolla’s study of human rights is finally a history of that which is in excess of the word. Concepts do not inhere in words, but emerge among their interactions. His main effort is to exhume a conceptual “substrate” or “network” by attending to the keyword rights and its changing collocations.
Whereas de Bolla traces the history of a single term, vector-space models construct large matrices of words that summarize a more complete set of interconnections. These connections reflect common habits of language that permeate a corpus, but are largely invisible to writers and readers, resulting in linguistic patterns that transcend the subjective experience of communication. Jonathan Hope and Michael Witmore argue that textual models provocatively generate “retroactive statistical facts—independent of subjects who could never have hoped to recognize them but [that are] already real for us, once belatedly discovered” (148).[10] Text analysis finds collocates, bags of words, and keywords-in-context at the scale of the corpus. Considered in the light of concept theory, we might say that collocation analysis uncovers layers of meaning that form the basic conceptual structure of language without being experienced as such by readers or writers. In his study of rights, de Bolla shows that such structures can be identified using quantitative methods, but he leaves open the question of which mathematics is most appropriate for this purpose.
Vector Semantics
Many digital humanists first encountered computational methods of modeling linguistic structures in topic modeling, which makes it possible to track large-scale patterns of language. This trend, however, has run counter to research in computational linguistics—most notably using vector semantics—that has focused on word-sense disambiguation. Vector semantics is known by a variety of terms, such as “vector-based representation,” “distributional semantics,” and “word-embedding models.” Whatever one calls it, the theory that informs vector semantics can be traced to the postwar period of the late 1940s and 1950s, when military success with cryptography led scholars to believe that language as a whole might be computationally tractable. Edwin Reifler, Victor Yngve, and Anthony Oettinger, among others, sought to train computers to automatically translate scientific papers in German, English, Chinese, and Russian (see Nirenburg, Somers, and Wilks). Before accomplishing this feat, however, they faced an obvious and seemingly insurmountable challenge: How could computers differentiate among the possible meanings of a word in order to identify the appropriate synonym in another language? In a 1949 memorandum (first published in 1955), Warren Weaver, then director of the Rockefeller Foundation Natural Sciences Division, proposed a technique for solving this problem. Rather than map each word directly onto a dictionary, documents would first be subject to an additional stage of processing in which each word would be compared to the words in its immediate context:
If one examines the words in a book, one at a time as through an opaque mask with a hole in it one word wide, then it is obviously impossible to determine, one at a time, the meaning of the words. . . . But if one lengthens the slit in the opaque mask, until one can see not only the central word in question but also say N words on either side, then if N is large enough one can unambiguously decide the meaning of the central word. (Weaver, 15–16)
Of course a computer cannot tell the difference between “plane” and “plane,” but if the context window is expanded just a bit, computers might be able to differentiate “pilot landed the plane at the airport” from “line intersects the plane at an angle.” Air travel and geometry contribute two senses of the word “plane,” and those differences could be measured (or, at least, approximated) using collocation data.
What began as a specific practical solution to a challenge in scientific communication was immediately recognized for having the potential to transform the study of semantics.[11] The latent idea here is that different words will tend to appear in different contexts, and therefore one can guess at a word’s meaning by simply counting the words that appear near it. Zellig Harris and J. R. Firth wrote pioneering essays in this vein, advancing what has since come to be known as the distributional hypothesis, which posits that words that tend to co-occur have similar meanings: as Harris proposed in his seminal 1954 essay, “Difference of meaning correlates with difference of distribution” (43). In the sixty years since, Harris’s hypothesis has been subject to rigorous study and, indeed, has been confirmed as something like a natural law of language processing. It is often invoked using Firth’s famous 1957 dictum: “You shall know a word by the company it keeps” (11).[12] Vector semantics takes the distributions of terms as proxies for their meaning, and so meaning itself becomes something at once more general and more nuanced: semantic similarity.[13] Vector models embed words in a field of lexical relations distributed in high-dimensional space, where every word is measured across every dimension for its similarity to every other word. According to Hinrich Schütze, “Vector similarity is the only information present in Word Space: semantically related words are close, unrelated words are distant” (896).[14] However, this binary of “close” and “distant” masks what is, in truth, a system of many dimensions, each of which is measured in minutely fine gradations. These variations can be conceived as a large field—a semantic space—in which words are embedded among each other, near in one dimension, distant in another.[15]
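In such models, “close” and “distant” are typically operationalized as the cosine of the angle between two word vectors, a measure invoked later in this chapter. Stated explicitly (our formulation, following standard treatments such as Turney and Pantel’s), for $n$-dimensional vectors $\mathbf{u}$ and $\mathbf{v}$:

$$\operatorname{sim}(\mathbf{u},\mathbf{v}) = \cos\theta = \frac{\mathbf{u}\cdot\mathbf{v}}{\lVert\mathbf{u}\rVert\,\lVert\mathbf{v}\rVert} = \frac{\sum_{i=1}^{n} u_i v_i}{\sqrt{\sum_{i=1}^{n} u_i^{2}}\,\sqrt{\sum_{i=1}^{n} v_i^{2}}}$$

Two words used in identical proportions across all contexts score 1; words whose context profiles share nothing score 0.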
How is semantic space modeled? One common strategy is to build a word-context matrix. Imagine a human reader who skims through each text from a large corpus and finds the one thousand most frequent words in it. Now imagine that this reader scrolls through the entire corpus again, counting all the words that appear near each keyword, and then tabulates the results in a giant spreadsheet. For example, consider the following context windows, taken from Wikipedia, that surround the keywords “art” and “nature”:
- production of works of art the criticism of art
- purpose. In this sense Art as creativity, is something humans
- creativity humans by their nature no other species creates art
- science. Although humans are part of nature human activity is often understood
The distribution of terms that surround each keyword can be placed together into a matrix. The collocations are tallied so that each keyword (art and nature) heads a column, while the context terms conventionally occupy the rows. The numbers in each cell reflect the number of times each context word appears near each keyword. In this simplified example, there are only two dimensions (art and nature), and each of the six words in the corpus (art, creativity, criticism, humans, nature, and science) is represented by a vector of two numbers (see Table 21.1). The word creativity is represented as the vector [4, 1], while science is the sequence [2, 4]. These values can in turn be plotted in a simple Cartesian plane. As shown in Figure 21.1, the words form a semantic field that divides into two smaller “conceptual spaces,” with science and creativity as their most central respective terms and humans as the word that mediates between them.[16]
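The arithmetic behind such a plot can be made concrete in a few lines of code. The following sketch is our own illustration rather than part of the original example; only the counts come from Table 21.1:

```python
import numpy as np

# Toy word-context matrix from Table 21.1: one row per vocabulary word,
# one column per keyword dimension (art, nature).
vocab = ["art", "creativity", "criticism", "humans", "nature", "science"]
M = np.array([
    [5, 1],  # art
    [4, 1],  # creativity
    [2, 0],  # criticism
    [3, 3],  # humans
    [1, 5],  # nature
    [2, 4],  # science
])

def cosine(u, v):
    """Cosine similarity: how small the angle is between two word vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# creativity [4, 1] points in nearly the same direction as art [5, 1] ...
print(cosine(M[vocab.index("creativity")], M[vocab.index("art")]))      # ~0.999
# ... and diverges noticeably from science [2, 4].
print(cosine(M[vocab.index("creativity")], M[vocab.index("science")]))  # ~0.65
```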
In vector semantics, words are presumed to have similar meanings insofar as they appear near each other in spaces like these. Reading such data through the lens of concept theory suggests a different interpretation, however. We might say Figure 21.1 identifies two concepts—one centered among nature, science, humans; and the other among humans, creativity, art—neither of which can be conveniently reified into a single label nor equated to the meaning of a word. Instead, concepts are structures of relationships that pertain among words. Much like communities in a social-network graph, concepts are clusters of nodes that can be taken together in semantic space.
The art–nature example just shown is artificially simple. Real corpora present researchers with a large mass of data. Even a single book-length document stretches concepts across a space of thousands of dimensions, and a true corpus densely compacts that space with so many connections that all concepts dissolve into a giant alphabet soup. The challenge is to slice, condense, or summarize the data in ways that will expose its most important underlying conceptual features. A number of strategies are possible. Among digital humanists, the most familiar of these is topic modeling, which reduces the corpus to an artificially small number of dimensions (“topics”) and then estimates the vocabulary’s distribution over each. Latent Dirichlet allocation (LDA) modeling is designed primarily to survey and compare documents; it treats all texts as “bags of words” and measures document-level associations among them. Machine-learning applications like word2vec, proposed by Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean, use continuous bags of words (CBOW) and neural networks to estimate a word-context matrix within a smaller, denser semantic space. By reducing the number of dimensions, models generated by word2vec often outperform traditional collocation matrices in controlled experiments: in a full word-context matrix, subtle differences in usage will separate words like boat and ship, while probabilistic models tend to dismiss rare co-occurrences, allowing them to focus on the clearest lines of semantic similarity.[17]
Table 21.1. A Simplified Word-Context Matrix

|            | art | nature |
|------------|-----|--------|
| art        | 5   | 1      |
| creativity | 4   | 1      |
| criticism  | 2   | 0      |
| humans     | 3   | 3      |
| nature     | 1   | 5      |
| science    | 2   | 4      |
Figure 21.1. Distribution of terms surrounding nature and art in a toy corpus. In this graph, the words are represented as points in space with a contour overlay that impressionistically highlights clusters among the points. The contour lines ease visualization, but are not meant to be more than suggestive.
In creating our own model of semantic space, we ran word2vec over the EEBO-Text Creation Partnership (EEBO-TCP) corpus, limiting our selection to all documents dated 1640–1699 (for a total of 18,752 documents) and creating a model of the language with a vocabulary of 102,164 words.[18] Each word in the model is represented by a vector with 300 variables designed to estimate the word’s appearance across the corpus.[19] Once completed, the model supports a wide range of statistical tests. One of the simplest is to see which words are most similar to a given keyword. For example, the words most similar to wit in the EEBO-TCP corpus are invention, wits, eloquence, witty, fancy, argues, beside, indeed, talks, shews, and ingenuity. This list of most similar terms provides an initial glimpse of the conceptual architecture that surrounds wit, which in the seventeenth century was understood broadly as a mental alacrity socially realized through verbal (especially oral) performance; hence, its association with rhetorical terms like invention and eloquence. As this example suggests, reading the results of vector similarity measurements is much like reading the results of a topic model.[20] The difference is that topic models return words that tend to appear in the same documents; vector-space models return words that tend to appear in similar context windows.
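For readers who wish to replicate this kind of model, the following is a minimal sketch of the workflow using the gensim library’s implementation of word2vec (version 4.x); the chapter does not specify its toolchain, and the corpus path and preprocessing here are hypothetical placeholders:

```python
from pathlib import Path
from gensim.models import Word2Vec

# Hypothetical layout: one plain-text EEBO-TCP transcription per file.
corpus_dir = Path("eebo_tcp_1640_1699")  # hypothetical path
sentences = [
    path.read_text(encoding="utf-8").lower().split()
    for path in sorted(corpus_dir.glob("*.txt"))
]

model = Word2Vec(
    sentences,
    vector_size=300,  # 300 variables per word, as in the chapter's model
    window=5,         # context words counted on either side of the target
    min_count=5,      # drop very rare forms, many of them transcription errors
    workers=4,
)

# The simplest statistical test: which words are most similar to "wit"?
print(model.wv.most_similar("wit", topn=11))
```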
Vector-based representation does more than just provide a list of collocates or approximate synonyms, however. It also measures these words’ relationships to each other and shows how they are distributed within semantic space. Much like topics, words in semantic space cluster into groups that often cohere around different senses of a word or different contexts of use. The dimension-reduction techniques of vector-space modeling and principal-component analysis represent axes of difference that reveal latent organizing principles of the semantic field projected by a corpus. The forty words most similar to wit cluster fairly neatly into several main groups (see Figure 21.2).[21] (This graph and the graphs that follow use hierarchical clustering to identify groups within the data, which are highlighted using a density overlay to suggest their boundaries. Readers should keep in mind, however, that this clustering involves setting an arbitrary number of divisions, just as topic modeling does, so the boundaries that seem to separate words in semantic space are neither hard nor fast.) On the left-hand side are two clusters of synonymous words, like eloquence and invention, but also sophistry and fancy.[22] On the right-hand side sit clusters of words that seem to imply argumentation—seems, pretends, shews, consequently, beside, consists, indeed, and argument. This might seem strange. After all, why is consequently semantically similar to wit? The reason is that wit was an intensely normative concept and was rarely invoked without explicit judgment: writers performed their own ingenuity while critiquing the sophistry of others.[23] Words that surround wit in the EEBO-TCP corpus also commonly appear near argument words in general; therefore, they have similar vector profiles and sit near each other in semantic space.
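A sketch of how such a plot can be produced, assuming the hypothetical gensim model from the previous sketch (scipy and scikit-learn supply the clustering and projection; the number of clusters, like the number of topics in a topic model, is an arbitrary choice):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.decomposition import PCA

# The forty nearest neighbors of "wit," plus the keyword itself.
words = [w for w, _ in model.wv.most_similar("wit", topn=40)] + ["wit"]
vectors = np.array([model.wv[word] for word in words])

# Project the 300-dimensional vectors onto their first two principal
# components so they can be drawn on a flat page, as in Figure 21.2.
coords = PCA(n_components=2).fit_transform(vectors)

# Hierarchical (Ward) clustering cut into four groups; the cut point is
# arbitrary, so cluster boundaries are suggestive rather than hard.
groups = fcluster(linkage(vectors, method="ward"), t=4, criterion="maxclust")

for word, (x, y), g in zip(words, coords, groups):
    print(f"{word:15s} cluster {g}  ({x:+.2f}, {y:+.2f})")
```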
None of this adds up to the meaning of wit in a conventional sense. Instead, Figure 21.2 shows something like a snapshot or slice of the discourse taken from the perspective of wit, one that delivers a compact representation that still preserves the multiple temporalities prized by conceptual history. Like eloquence and invention, wit names a conceptual formation that straddles psychological and social models of discourse, pointing at once through language into the minds of authors while, more narrowly, highlighting the connective logical tissues of argument. If this graph shows a concept, that concept cannot really be reified as “wit.” As noted earlier, John Locke, for instance, defined “wit” in purely cognitive terms as “the assemblage of Ideas,” but the model returns a concept more axiomatic and abstract, which might be paraphrased as “mind as a principle of order in discourse” (2:156).
Figure 21.2. The semantic neighborhood of wit. Graph shows the forty words most similar to wit in a word2vec model built from the EEBO-TCP corpus (1640–1699). Terms are clustered using hierarchical clustering and projected onto a two-dimensional space using principal-component analysis. On the left side are various synonyms (invention, ingenuity, skill, fancy, sophistry), while on the right sit terms of argumentation, such as indeed, argues, and consequently.
Vector-space models can be validated in a number of ways. One common test involves simulating verbal exams taken by humans. Many vector-space models have been shown to outperform high school students on the SATs, for example.[24] The designers of word2vec developed a novel test to validate their system: forming analogies using what they call the “vector offset method.” Because words are all represented with numerical profiles, they can be added or subtracted from each other. Imagine one begins with the vector for king, subtracts the vector for man, then adds the vector for woman. If the semantic space has been modeled and normalized in a trustworthy way, the result should be approximately equal to queen. The operation can be restated in mathematical form as king−man+woman=queen.[25] This and similar conceptual relationships can be traced through the EEBO-TCP model, which captures category–example relationships such as king−charles+paul=apostle; part–whole relationships like inches−foot+pound=ounce; singular–plural forms like artery−arteries+banks=bank; or verb conjugations like please−pleases+talks=talk. In each case, the “right answer” appears among the most similar words with varying levels of specificity. Because of spelling variation in early modern print, capturing verb conjugations and pluralizations may be the wrong test for EEBO-TCP data, however. Far better is the model’s ability to capture alternate spellings of the same word. As shown in Table 21.2, the words most similar to africa, baptism, and publick are, respectively, affrica, baptisme, and publique. The most important test, though, is the model’s ability to return lists of words that represent a concept’s range of application, providing something like a historical thesaurus.
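In gensim’s interface, such analogy tests reduce to a single call; a sketch, again assuming the hypothetical model above (actual results will vary with corpus and training run):

```python
# Vector offset method: most_similar adds the "positive" vectors, subtracts
# the "negative" ones, and ranks the vocabulary by cosine similarity.
print(model.wv.most_similar(positive=["king", "woman"], negative=["man"], topn=5))

# One of the chapter's EEBO-TCP analogies: king - charles + paul = apostle.
print(model.wv.most_similar(positive=["king", "paul"], negative=["charles"], topn=5))

# Spelling variation: the nearest neighbors of "africa" should include "affrica".
print(model.wv.most_similar("africa", topn=5))
```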
That said, the parlor trick of king−man+woman=queen highlights a key difference between vector-space models and, say, topic models: vector-space models are not designed to create passive representations that simply delineate themes; they are meant to be used. In machine translation, the words that appear in a context window are added together, such that the system can provide a best guess for the appropriate analog term. Just as king−man+woman will point toward female monarchs, so too plane+pilot+landed+airport will point toward l’avion, in French, rather than le plan. Search engines work according to the same principle.[26] When multiple words are entered into a query, those words are added together and the results are sorted by their cosine similarity (their “relevance”) to the aggregate vector. Any group of words can be treated as a composite entity. When wit is combined with poetry, for example, argument words like consequently are stripped away, while poetry is associated with terms commonly deployed, along with wit, in criticism (see Figure 21.3). Just as search engines might highlight the literary uses of wit and the critical uses of poetry in response to the query “wit poetry,” so too the model returns a representation of the conceptual substrate that connects those terms. Vector composition elevates semantic modeling above the level of the word to describe the structures that contain words.[27]
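Vector composition works the same way in code; a sketch under the same assumptions as the earlier examples:

```python
# Composite query, as in Figure 21.3: normalize the vectors for "wit" and
# "poetry", add them, and rank the vocabulary by cosine similarity to the sum.
composite = model.wv.get_vector("wit", norm=True) + model.wv.get_vector("poetry", norm=True)
print(model.wv.similar_by_vector(composite, topn=20))

# gensim's query interface performs equivalent arithmetic internally:
print(model.wv.most_similar(positive=["wit", "poetry"], topn=20))
```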
As the examples of translation and search suggest, although vector-space models are built using large corpora, they are actually designed to perform microanalyses of texts, phrases, and snippets of words. This might seem strange. The digital humanities stumbled into computational semantics through topic modeling, assuming that the most important applications involve “distant reading” over large swaths of time. However, decades of work in computational linguistics have pushed in precisely the opposite direction by using big data to answer small questions: What book was this search query looking for? Was that email spam? What might a user have meant by this misspelled word? Distant reading projects have not explored this direction of inquiry. Humanists are just beginning the work of studying how semantic models might inform qualitative research, and we choose in this chapter to begin “closer to home,” so to speak, not only to the humanities but also to vector semantics, by approaching conceptual history through computationally assisted close reading.[28] In what follows, we use vector semantics to analyze a canonical text, John Dryden’s MacFlecknoe (1678), to see how wit organizes the poem’s conceptual structure. Learning how to generate, visualize, and interpret semantic data will be a major challenge facing humanities computing over the next decade. Vector-space models are remarkably flexible and can be constructed in many different ways, supporting a wide and growing range of intellectual purposes. We offer just one here.
Figure 21.3. The semantic neighborhood of the composite vector of wit and poetry, drawn from a word2vec model of the EEBO-TCP corpus, 1640–1699. When added together into a single entity, the composite vector of wit + poetry bisects its components, exposing the semantic region that sits between them.
Computational Close Reading
Wit is widely regarded as a concept central to English culture of the later seventeenth century. From a medieval term denoting reason and intelligence to a modern term of criticism, art, and politics, wit has long borne the burden of reconciling minds with the social and political hierarchies policed through words.[29] John Dryden’s MacFlecknoe sits at the center of this transformation, exemplifying how neoclassical poetry stages mutually informing contradictions between wit’s psychological and political connotations.[30] Dryden was an English playwright who rose to prominence in the 1660s and early 1670s. Theatrical successes like The Indian Emperor (1665) and The Conquest of Granada (1672) earned him financial security, a literary reputation, and many detractors. In critical prefaces, prologues, and epilogues, Dryden defended his work against critics while often picking fights with other poets. His poem MacFlecknoe is a satirical takedown of his main rival, Thomas Shadwell, depicting a dystopian alternate universe where Shadwell is a callow prince newly crowned as the King of Nonsense. MacFlecknoe has been widely read as a meditation on authorship and kingship and on the personal qualifications needed for legitimacy in literary and political arenas.[31] Therefore, it offers a perfect case study in tracing the conceptual fields that structure a text. We conclude by showing how Dryden draws on wit’s conceptual association with sophistry and fancy to produce a new ligature between poetry and kingship.
MacFlecknoe is written in heroic couplets, and like semantic models, couplets produce meaning by juxtaposing words in space. In addition to proximity, heroic couplets use rhyme and rhythm distributed across paired lines, each of which is divided into two halves by a caesura, to effect relationships among words. Literary historian J. Paul Hunter has argued that the heroic couplet functions as an instrument for redefining binary oppositions, like wit and judgment or nature and art. He observes that poets like Dryden and Alexander Pope used the heroic couplet to develop a “rhetoric of complex redefinition” that “challenges the transparency of the apparent rhetoric and blurs and bleeds images of plain opposites into one another” (119).[32] He continues,
The effect, though, is not to fog or muddy or obscure—much less to deconstruct meanings to nothing stable at all—but to use the easy opposition as a way of clarifying the process of deepening qualification and refinement. It is a demonstration of how to read as an exercise in how to think. The process is rather like that described by information theorists and cognitive scientists trying to explain how computers can work and reason complexly—not by facing a great variety of complicated choices simultaneously but by sorting things one by one into little yeses and nos, ones and zeroes. Refinement occurs progressively, step by step. (119)
In comparing the binary opposition of terms in couplets to the binary code of digital media, Hunter incidentally gestures toward our claim regarding the interpretive potential of computational models. Vector semantics highlights the relational, continuous structure of the lexical field, wherein meaning is negotiated across innumerable lines of similarity and difference. Our visualizations of the semantic space of wit exchange the quadrants of the couplet for the quadrants of the graph. The heroic couplet and the vector model use proximity and association to convey linguistic meaning according to radically different rubrics, but reading one in relation to the other provides an opportunity to uncover latent premises of both.
Consider the correspondence between the opening couplet and its graphical representation relative to the larger Restoration model. In the couplet, the structural alignment between “subject” and “monarchs” introduces the tension between the contemporary divine conception of kingship and the inevitability of human decline:
All humane things are subject to decay,
And, when Fate summons, Monarchs must obey (ll. 1–2)
Just as the words of a search query are combined to return a list of the most relevant results, so too the words in a couplet can be combined to visualize the semantic fields juxtaposed in the verses. Figure 21.4 represents the words most similar to the composite vector of the terms in the couplet: humane+things+subject+decay+fate+summons+monarchs+obey. Each word in the graph attracts words with which it has high similarity scores in the model and repels words with which it has low scores. The graph disaggregates the semantic fields that Dryden yokes together in the couplet. While the arrangement of the couplet produces a mirroring relationship between contrasting words (humane and fate, subject and monarchs, decay and obey) that occupy similar positions in the two lines, the graph represents the semantic divide by grouping similar terms and separating opposing ones. The monarchs cluster is located at one point of a triangle, far from the other opposing points centered on obey and decay/fate. The intervening subject cluster occupies a mediating position, providing the semantic ligature between monarchs and obey. The graph thus foregrounds the questions of subjection, power, and authority—both poetic and political—that the opening couplet presents as the poem’s central concern.
Figure 21.4. Semantic neighborhood of MacFlecknoe’s opening couplet. Graph displays the forty terms most similar to the composite vector, humane + things + subject + decay + fate + summons + monarchs + obey. The terms in bold appear in the couplet. The model exposes a rich language of subjection as central to the couplet’s conceit.
For Dryden, wit operates as a concept for adjudicating questions of authority that cut across poetic and political domains. The retiring king of Dulness positions Shadwell (Sh——) as his son and heir, while also designating his species of dullness as distinctive:
Sh——alone, of all my Sons, is he
Who stands confirm’d in full stupidity.
The rest to some faint meaning make pretence,
But Sh——never deviates into sense.
Some Beams of Wit on other souls may fall,
Strike through and make a lucid interval;
But Sh——’s genuine night admits no ray,
His rising Fogs prevail upon the Day. (ll. 17–24)
Both the positive and the negative terms drawn from nature serve to isolate Shadwell. A lack of wit holds poetic and social consequences, impeding the conjunction of proper images as well as of proper rulers.
Figure 21.5 depicts how MacFlecknoe engages and draws from the concept of wit. It combines the semantic spaces of words from two sources: half the words are the ones most semantically similar to wit in the model, and the other half are the words Dryden actually uses near wit in the poem. Like Figure 21.2, this visualization represents the semantic space of wit, but now that space has collided with Dryden’s diction. Words with high similarity scores, like invention and reasoning, are plotted near wit, while the actual words of the poem gather separately. Words from MacFlecknoe have little overlap with the upper-right cluster because, in the poem, the key contrasts that wit invokes are not about rhetoric, eloquence, pedantry, and sophistry, but are located in the realm of poetry.[33] Wit is taken out of the realm of learned argumentation (where it is more common) and redeployed in the specialized field of dramatic poetry, which is marked by writers and kinds of writing, the institution of the stage, and epitomizing examples like Ben Jonson, found in terms from the upper-left section of the graph.[34] However, in the 1670s, unlike today, the discourse of literary criticism was not well established, and so Dryden uses a mock-heroic conceit (kingly succession) to structure his invocations of literary authority and subjection. In the bottom-left corner, terms including reign, king, truce, realms, and war correspond to the political context of the poem. Another smaller cluster (featuring father, name, born, right) to the right of the political terms indexes the patrilineal process of transmitting the kingship from Flecknoe to Shadwell. These smaller political and social clusters make the performance and evaluation of wit intelligible as a criterion for organizing the emergent literary field according to alternative social and political methods of ordering.
Figure 21.5. Dryden’s MacFlecknoe and the conceptual structure of wit. The terms most similar to wit in the model are represented in gray and a sans serif font, while the words Dryden actually uses near wit in MacFlecknoe are black and in a serif font. Four of the Dryden terms (wit, learning, sure, and shew) are also among the most similar terms found in the model.
What remain are the more varied words that occupy the positions between the legible, stable clusters, and the capacity to visualize the relationship between such words and coherent clusters suggests the new kind of historical knowledge that this comparative reading method can produce. Taken together, these words (including toyl, numbers, bulk, immortal, beams, lard, flowers, hungry, copies) return a much lower average similarity score than that of either the wit clusters or the political and dramatic clusters in the poem. These words index unexpected tropes and figures that combine to produce a new view of wit. The poet depicts “beams of wit,” a “scourge of wit,” the “toyl of wit,” and a “kilderkin of wit,” an early modern unit of measurement (ll. 21, 89, 150, 196). He accuses Shadwell of “tortur[ing] one poor word ten thousand ways,” but in iteratively reconceiving wit, Dryden suggests a wide range of sources that constitute the faculty. In the process, he explores a conceptual space between the classical sources of poetic invention: tradition (portrayed as patrilineal succession) on the one hand, and inspiration (personified as the Muses) on the other. Shadwell succeeds to a “province” in which he invents “new humours” for “each new play” (ll. 187–88). While it is a famous literary-historical claim that the modern concept of originality emerges in Romantic poetry of the late eighteenth century, Dryden’s competing images suggest a nascent predecessor. In MacFlecknoe, originality looks a lot like dullness. All witty writers may be alike, but each dull writer is dull in his own way. Considered in relation to the broader history of wit, Dryden’s aberrant terms and figures present a multivalent concept that complicates critical genealogies of the poetic imagination in early modern England.
Huddled Atoms
In another poem, written a few years later, Dryden accuses hack poets of assembling their works “the Lucretian way” from “So many Huddled Atoms.” The result is invariably a mass or heap: the precise opposite of Dryden’s finely crafted couplets. The heaps of words jumbled by bad writers serve as a perverse seventeenth-century analog for the bags of words used by computational semantics. Yet, it is precisely this grammatical formlessness that enables semantic models to reach below the meanings of statements and to glimpse the conceptual substrate of culture. Concepts act in history not by inhering in words nor by finding expression in sentences, but by forming semantic fields that underlie the very conditions for thought. This is the basic premise of concept theory, and it shares much in common with vector semantics, which delineates the field of linguistic possibility underlying every statement in a corpus.
However, readers of this chapter need not take on board all of the assumptions and commitments of concept theory, nor need they have been persuaded by our close reading of MacFlecknoe to appreciate our larger ambition, which is to push the conversation toward areas of theoretical overlap that cross the disciplines. Rather than ask how computational “tools” or “methods” or “practices” can be used for humanistic purposes, we invite readers to examine the theoretical assumptions and intellectual investments that motivated those methods’ invention, as well as to look for commensurabilities across the disciplines that might inform future work. There is sometimes a tendency in digital humanities to skip to the end by speculating about how computational methods will transform the humanities without digging into the intellectual histories of those methods or explaining how the theoretical assumptions inherited from other disciplines align with the assumptions that otherwise inform one’s thinking.
We have tried to bridge this gap by emphasizing points of contact at a theoretical level between computational and humanistic scholarship. In doing so, we have offered just a few partial answers to a handful of big questions we believe should guide future work: How should scholars talk about meanings that exist beyond the confines of statements or texts? What theories of history and language support the study of broadly shared discursive features? What empirical, data-based techniques exist for measuring meaning? How do those techniques work, and what do they assume?
Notes
1. Koselleck describes concept history as a “specialized” field that directs itself “to the analysis of central expressions [i.e. keywords] having social or political content” (“Begriffsgeschichte and Social History,” 81).
2. Many ways of studying collocation patterns exist, but when text analysis is discussed in digital humanities, topic modeling has nearly monopolized attention. Commentaries by David Blei, Ted Underwood and Andrew Goldstone, Ben Schmidt, Matthew Jockers, and Lisa Rhody have explored promising applications in literary studies and history, while John W. Mohr and Petko Bogdanov have highlighted potential uses in the cultural sciences. In Mohr and Bogdanov’s words, modeling provides “an automated procedure for coding the content of a corpus of texts (including very large corpora) into a set of substantively meaningful coding categories” (546).
3. Latent Dirichlet allocation (LDA), the most popular method of topic modeling among digital humanists, has well-known quirks that skeptics find off-putting; it oddly combines an arbitrary human-controlled element (users must preselect the number of topics, which powerfully affects a model’s output) with a machine-learning engine that estimates word collocation probabilistically, rather than measuring it directly. For a description of LDA as a probabilistic model, see Blei, “Introduction to Probabilistic Topic Models.”
4. For the contrast between close and distant reading, see Franco Moretti, Distant Reading. For the contrast between micro- and macroanalysis, see Jockers, Macroanalysis.
5. Indeed, wit has long been recognized as a key concept in seventeenth- and eighteenth-century British literature. C. S. Lewis observed that if a person had time “to study the history of one word only, wit would be perhaps the best word he could choose” (95–96, 101–103, 105–106).
6. The full dictionary has not been translated into English, but the introduction has been. See Reinhart Koselleck, “Introduction and Prefaces to the Geschichtliche Grundbegriffe.” English translations of Koselleck’s monographs on historical theory include Critique and Crisis, The Practice of Conceptual History, and Futures Past. In the context of English studies, the history of concepts also may be identified with Raymond Williams. See in particular Culture and Society and Keywords.
7. One interesting example of the disjunction between conceptual and social temporalities involves the political concept “revolution,” which Koselleck argues first reflected a generally static perception of time, grounded in the slow and predictable movements of celestial objects, but eventually evolved into the modern sense of the word, a dramatic rupture (Futures Past, 23). For a useful overview of Koselleck’s theory of concepts, see Niels Åkerstrøm Andersen, “Reinhart Koselleck’s History of Concepts.”
8. In his overview of Koselleck’s theory of concepts, Andersen emphasizes the “semantic field” as a key component.
9. An overview of this debate can be traced in J. G. A. Pocock, “Concepts and Discourses” and Koselleck’s response in “A Response to Comments on the Geschichtliche Grundbegriffe.”
10. Witmore elsewhere argues that “a text may be thought of as a vector through a metatable of all possible words” (“Text: A Massively Addressable Object,” 325).
11. Weaver emphasizes this very point: “And it is one of the chief purposes of this memorandum to emphasize that statistical semantic studies should be undertaken, as a necessary preliminary step” (16).
12. For an introduction to Neo-Firthian work, see Tony McEnery and Andrew Hardie, Corpus Linguistics; for a statement on the relation between the fields of corpus and computational linguistics, see 227–30. Daniel Jurafsky and James H. Martin, for example, restate the hypothesis as a basic premise of computational linguistics: “Words that occur in similar contexts tend to have similar meanings. . . . The meaning of a word is thus related to the distribution of words around it” (270). In this context, the word “distribution” refers simply to word counts in a corpus. Blei uses the term similarly in his description of topic models: “We formally define a topic to be a distribution over a fixed vocabulary” (77).
13. An introduction and overview of vector-based approaches to semantic similarity can be found in Peter D. Turney’s and Patrick Pantel’s widely cited 2010 survey of the relevant literature, “From Frequency to Meaning.” For an updated and more detailed discussion, see Sébastien Harispe et al., Semantic Similarity from Natural Language and Ontology Analysis.
14. See also Will Lowe, “Towards a Theory of Semantic Space”; and, more recently, Turney and Pantel, “From Frequency to Meaning”; Katrin Erk, “Vector Space Models of Word Meaning and Phrase Meaning”; and Stephen Clark, “Vector Space Models of Lexical Meaning.”
15. A semantic space model can be understood as a generalization of the notion, in concept theory, of the semantic field. In Andersen’s words, “According to Koselleck, the analysis of the shaping, endurance and sociohistorical effects of individual concepts appear in relation to other concepts—what he calls semantic fields” (38). Such fields are understood to be structured in limited, qualitatively meaningful ways, as in, for example, the distinction between a “concept” and a “counterconcept,” its dialectical opposite. Computational semantics, by contrast, depends on statistical comparisons for each word across an entire corpus. Individual relationships between terms are identified, generally without differentiating among relationship types. Only some collocation patterns will correspond to semantic fields in the narrower qualitative sense meant by concept theorists. We might say that a semantic space contains all existing word associations within a corpus and that a semantic field selects and elevates associations that are historically relevant and interesting (relevance and interest, of course, will depend entirely and always on the interpretive judgment of the historian).
16. The term “conceptual spaces” was coined by Peter Gärdenfors to describe geometric models of word meaning (although he does not rely on word collocation as the basis for his vector representation).
17. For a breakdown of the similarities between neural-network models like word2vec and traditional distributional models, see Levy and Goldberg, “Linguistic Regularities.”
18. The results described are a subset of the full vocabulary, selected to avoid cluttering the visualizations with low-frequency terms, many of which simply represent typographical or transcription errors. Words are included in the visualizations if they meet one of two criteria: either they were used in every single year, 1640 to 1699, or they were among the 20,000 most frequent terms in any given year. Although the entire vocabulary is included for all calculations, the total vocabulary eligible for inclusion in the graphs is 26,199 words.
19. For a gentle introduction to word2vec and a practical, hands-on tutorial, Ben Schmidt has written a series of blog posts describing the application. See especially “Vector Space Models for the Digital Humanities” and “Rejecting the Gender Binary: A Vector Space Operation.” For more detailed specifications, there is already a large body of technical and theoretical commentary. The technique was originally described in Mikolov et al. A valuable summary can be found in Goldberg and Levy, “word2vec Explained.”
20. See Goldstone and Underwood, “What Can Topic Models of PMLA Teach Us” for a discussion of methods and challenges of interpreting topic models.
21. The x and y axes were set using a common dimension-reduction technique called principal component analysis. The position on the graph shows each word’s place in relation to the other words. The scale on each axis has been removed; as in a social-network graph, the axes of this graph are algorithmically generated and do not directly refer to any single measurement.
22. The graph visualizes additional associations between wit and rhetorical training that stretch back to Cicero. De Oratore suggests eloquence (like wit) requires an expansive background of learning. Neither eloquence nor wit is a skill that one can learn by studying the rules of a single discipline. To qualify as witty or eloquent, a speaker must have ingenuity, from the Latin word “ingenium,” which Cicero defines as “natural talent” for forging connections between disparate fields of inquiry. See Thomas Conley, Rhetoric in the European Tradition (35).
23. This cluster may allude to a distinction between wit and more common forms of humor presented in Cicero’s dialogue De Oratore through the character Marcus Antonius: wit is humor delivered for an argumentative purpose. In the words of Antonius, “a laugh is the very poorest return for cleverness” (Cicero, 381).
24. Turney and Pantel describe several relevant studies in “From Frequency to Meaning.”
25. This example features in Mikolov et al.
26. Indeed, information retrieval was the research topic for which vector-space models were first theorized, in the 1950s, by Hans Peter Luhn. See in particular his essay, “A New Method of Recording and Searching Information.” Writing in 1987, Gerard Salton describes the significance of Luhn’s early work: “It was suggested, in particular, that instead of assigning complex subject indicators extracted from controlled vocabulary schedules, single term descriptors, or keywords, could be used that would be assigned to the documents without context or role specification. These single terms could then be combined in the search formulations to produce complex phrase specifications and chains of synonyms . . . the coordinate keyword indexing methods eventually became the standard used in all automatic retrieval environments” (376).
27. Dominic Widdows explains, “Continuous methods enable us to model not only atoms of meaning such as words, but the space or void in between these words. Whole passages of text are mapped to points of their own in this void, without changing the underlying shape of the space around them” (164).
28. Underwood in “Distant Reading and Recent Intellectual History” makes a similar point in describing advances in semantic modeling that enable researchers to “treat writing as a field of relations to be modeled, using equations that connect linguistic variables to social ones.”
29. These uses are broader, too, than the specifically psychological application of the term as used by John Locke, who defined wit in contrast to judgment, mapping their difference onto a distinction between synthesis and analysis. Judgment, in his view, involved the careful discrimination of apparently like objects. Wit, on the other hand, involved synthetic reasoning that found surprising connections among ideas: wit discovers similarity between the most distant and dissimilar objects. These ideas are captured in words. Locke explains, “Though therefore it be the Mind that makes the Collection, ’tis the Name which is, as it were, the Knot, that ties them fast together” (III.v.10).
30. There has been significant scholarship on Dryden’s usage of wit as well as the term’s significance in Augustan poetry more generally. See, for instance, Empson and Lund. H. James Jensen indexes the term by hand in A Glossary of John Dryden’s Critical Terms; his entry for “WIT” stretches to six pages.
31. Paul Hammond surveys this scholarship and provides an overview of MacFlecknoe’s textual and political history (168–79).
32. Ralph Cohen has previously made a similar point about the function of the heroic couplet in the poetry of Pope and Jonathan Swift.
33. In addition to interpreting the semantic coherence of clusters, we can evaluate the overall similarity between words in the different clusters by taking the average of the cosine similarity score of each word with every other word in the cluster. The cluster of theoretical wit terms returns an average score of 0.415, the argument terms cluster has a score of 0.278, the dramatic criticism cluster scores 0.297, and the political cluster scores 0.384. For comparison, the score for a very similar set of words (leaves, tree, branch, fruit) is 0.552, and the score for ten to twelve randomly selected words returns an average score of 0.0765.
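For readers replicating this measure, a minimal sketch of the calculation, assuming the gensim model described in the chapter:

```python
from itertools import combinations

def average_pairwise_similarity(words, wv):
    """Mean cosine similarity over every pair of words in a cluster."""
    pairs = list(combinations(words, 2))
    return sum(wv.similarity(a, b) for a, b in pairs) / len(pairs)

# e.g., average_pairwise_similarity(["leaves", "tree", "branch", "fruit"], model.wv)
```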
34. The terms johnson, george, epsom each refer to authors and texts caught up in the contemporary critical debate regarding how to distinguish good from bad poetry. Johnson refers to Ben Jonson, george to George Etherege, and epsom to Shadwell’s play, Epsom Wells (1672).
Bibliography
Andersen, Niels Åkerstrøm. “Reinhart Koselleck’s History of Concepts.” In Discursive Analytical Strategies: Understanding Foucault, Koselleck, Laclau, Luhmann, 33–48. Bristol, UK: Policy Press, 2003.
Blei, David. “Introduction to Probabilistic Topic Models.” Communications of the ACM 55, no. 4 (2012): 77–84.
Cicero. De Oratore: Books 1–2. Translated by E. W. Sutton and H. Rackham. Cambridge, Mass.: Harvard University Press, 1942.
Clark, Stephen. “Vector Space Models of Lexical Meaning.” In The Handbook of Contemporary Semantic Theory, edited by Shalom Lappin and Chris Fox, 493–522. Oxford: John Wiley & Sons, 2015.
Cohen, Ralph. “The Augustan Mode in English Poetry.” In Studies in the Eighteenth Century, edited by R. F. Brissenden, 171–92. Toronto: University of Toronto Press, 1968.
Conley, Thomas. Rhetoric in the European Tradition. Chicago: University of Chicago Press, 1990.
de Bolla, Peter. The Architecture of Concepts: The Historical Formation of Human Rights. New York: Fordham University Press, 2013.
Dryden, John. “PROLOGUE, To the University of Oxon. Spoken by Mr. Hart, at the Acting of the Silent Woman.” In Miscellany poems containing a new translation of Virgills eclogues, Ovid’s love elegies, odes of Horace, and other authors (1684). Text Creation Partnership. https://github.com/textcreationpartnership/A36650.
Empson, William. “Wit in the Essay on Criticism.” Hudson Review 2, no. 4 (Winter 1950): 559–77.
Erk, Katrin. “Vector Space Models of Word Meaning and Phrase Meaning: A Survey.” Language and Linguistics Compass 6, no. 10 (2012): 635–53.
Firth, J. R. Studies in Linguistic Analysis. London: Oxford University Press, 1962.
Gärdenfors, Peter. Conceptual Spaces: The Geometry of Thought. Cambridge, Mass.: MIT Press, 2000.
Goldberg, Yoav, and Omer Levy. “word2vec Explained: Deriving Mikolov et al.’s Negative-Sampling Word-Embedding Method.” 2014, http://arxiv.org/abs/1402.3722.
Goldstone, Andrew, and Ted Underwood. “What Can Topic Models of PMLA Teach Us about the History of Literary Scholarship?” Journal of Digital Humanities 2, no. 1 (Winter 2012), http://journalofdigitalhumanities.org/2-1/what-can-topic-models-of-pmla-teach-us-by-ted-underwood-and-andrew-goldstone/.
Hammond, Paul. The Making of Restoration Poetry. Cambridge: D. S. Brewer, 2006.
Harispe, Sébastien, Sylvie Ranwez, Stefan Janaqi, and Jacky Montmain. “Semantic Similarity from Natural Language and Ontology Analysis.” Synthesis Lectures on Human Language Technologies 8, no. 1 (May 2015): 1–254.
Harris, Zellig. “Distributional Structure.” In The Structure of Language: Readings in the Philosophy of Language, edited by Jerry A. Fodor and Jerrold Katz, 33–49. Englewood Cliffs, N.J.: Prentice-Hall, 1964.
Hope, Jonathan, and Michael Witmore. “‘Après le déluge, More Criticism’: Philology, Literary History, and Ancestral Reading in the Coming Posttranscription World.” Renaissance Drama 40 (2012): 135–50.
Hunter, J. Paul. “Formalism and History: Binarism and the Anglophone Couplet.” Modern Language Quarterly 61, no. 1 (March 2000): 109–29.
Jensen, H. James. A Glossary of John Dryden’s Critical Terms. Minneapolis: University of Minnesota Press, 1969.
Jockers, Matthew L. Macroanalysis: Digital Methods and Literary History. Champaign: University of Illinois Press, 2013.
Jurafsky, Daniel, and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 3rd ed. draft, 2017. https://web.stanford.edu/~jurafsky/slp3/.
Koselleck, Reinhart. “Begriffsgeschichte and Social History.” In Futures Past: On the Semantics of Historical Time. New York: Columbia University Press, 2004.
Koselleck, Reinhart. Critique and Crisis: Enlightenment and the Pathogenesis of Modern Society. Cambridge, Mass.: MIT Press, 1988.
Koselleck, Reinhart. “Introduction and Prefaces to the Geschichtliche Grundbegriffe.” Translated by Michaela Richter. Contributions to the History of Concepts 6, no. 1 (Summer 2011): 1–37.
Koselleck, Reinhart. The Practice of Conceptual History. Translated by Todd Samuel Presner and others. Stanford, Calif.: Stanford University Press, 2002.
Koselleck, Reinhart. “A Response to Comments on the Geschichtliche Grundbegriffe.” In The Meaning of Historical Concepts, edited by Hartmut Lehmann and Melvin Richter, translated by Melvin Richter and Sally E. Robertson, 59–70. Washington, D.C.: German Historical Institute, 1996.
Levy, Omer, and Yoav Goldberg. “Linguistic Regularities in Sparse and Explicit Word Representations.” In Proceedings of the Eighteenth Conference on Computational Natural Language Learning (CoNLL), 171–80, 2014.
Lewis, C. S. Studies in Words. 2nd ed. Cambridge: Cambridge University Press, 1990.
Locke, John. An Essay Concerning Human Understanding. Edited by Peter H. Nidditch. Oxford: Clarendon Press, 1975.
Lowe, Will. “Towards a Theory of Semantic Space.” Proceedings of the 23rd Conference of the Cognitive Science Society (2001): 576–81.
Luhn, Hans Peter. “A New Method of Recording and Searching Information.” American Documentation 4, no. 1 (1953): 14–16.
Lund, Roger D. “Wit, Judgment, and the Misprisions of Similitude.” Journal of the History of Ideas 65, no. 1 (January 2004): 53–74.
McEnery, Tony, and Andrew Hardie. Corpus Linguistics. Cambridge: Cambridge University Press, 2012.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation of Word Representations in Vector Space.” 2013, http://arxiv.org/abs/1301.3781.
Mohr, John W., and Petko Bogdanov. “Topic Models: What They Are and Why They Matter.” Poetics 41, no. 6 (December 2013): 545–69.
Moretti, Franco. Distant Reading. London: Verso, 2013.
Nirenburg, Sergei, Harold Somers, and Yorick Wilks, eds. Readings in Machine Translation. Cambridge, Mass.: MIT Press, 2003.
Pocock, J. G. A. “Concepts and Discourses: A Difference in Culture?” In The Meaning of Historical Concepts, edited by Hartmut Lehmann and Melvin Richter, 47–58. Washington, D.C.: German Historical Institute, 1996.
Rhody, Lisa M. “Topic Modeling and Figurative Language.” Journal of Digital Humanities 2, no. 1 (Winter 2012).
Salton, Gerard. “Historical Note: The Past Thirty Years in Information Retrieval.” Journal of the American Society for Information Science 38, no. 5 (1987): 375–80.
Schmidt, Ben. “Words Alone: Dismantling Topic Models in the Humanities.” Journal of Digital Humanities 2, no. 1 (Winter 2012).
Schütze, Hinrich. “Word Space.” Advances in Neural Information Processing Systems 5 (1993): 895–902.
Turney, Peter D., and Patrick Pantel. “From Frequency to Meaning: Vector Space Models of Semantics.” Journal of Artificial Intelligence Research 37 (2010): 141–88.
Underwood, Ted. “Distant Reading and Recent Intellectual History.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2016. http://dhdebates.gc.cuny.edu/debates/text/95.
Underwood, Ted, and Andrew Goldstone. “The Quiet Transformations of Literary Studies: What Thirteen Thousand Scholars Could Tell Us.” New Literary History 45, no. 3 (Summer 2014): 359–84.
Weaver, Warren. “Machine Translation.” In Readings in Machine Translation, edited by Sergei Nirenburg, Harold Somers, and Yorick Wilks, 13–18. Cambridge, Mass.: MIT Press, 2003.
Widdows, Dominic. Geometry and Meaning. Stanford, Calif.: Center for the Study of Language and Information, 2004.
Williams, Raymond. Culture and Society, 1780–1950. New York: Columbia University Press, 1958.
Williams, Raymond. Keywords: A Vocabulary of Culture and Society. New York: Oxford University Press, 1976.
Witmore, Michael. “Text: A Massively Addressable Object.” In Debates in the Digital Humanities, edited by Matthew K. Gold, 324–27. Minneapolis: University of Minnesota Press, 2012.