Chapter 3

Born Literary Natural Language Processing

David Bamman

In many ways, literary texts push the limits of natural language processing (NLP). The long, complex sentences in novels strain the limits of syntactic parsers, their use of figurative language challenges representations of meaning based on neo-Davidsonian semantics, and the sheer number of words in a book rules out existing solutions for problems like coreference resolution that expect short documents with a small set of candidate antecedents. This complexity of literature motivates the core argument of this article: if methods in NLP are to be used for analyses to help drive humanistic insight, we must develop resources and models in NLP that are literary-centric—models trained specifically on literary data that attests to the phenomena that we might want to measure, that encodes representations of the world we deem more appropriate than those encoded in other datasets designed for other purposes, and that specifically considers the needs of researchers working with fictional material—for example, by historicizing categories of gender (Mandell, “Gender and Cultural Analytics”).

This stance comes from an increasing use of NLP as a measuring instrument, where the individual low-level linguistic representations of morphology, syntax, and discourse are employed in analyses that depend on them for argumentation. In these cases, the fundamental research question is not in solving an NLP problem but in treating NLP as an algorithmic measuring device—representing text in a way that allows a comparison of measures to be made, whether for the purpose of explicit hypothesis testing, exploratory analysis, or an iterative, grounded interpretive process (Nelson). While measurement modeling has a long research history in the social sciences (Fariss, Kenwick, and Reuning; Jacobs and Wallach), work in the computational humanities—which focuses on applying empirical and computational methods for humanistic inquiry—likewise has drawn on NLP as an algorithmic measuring device for a wide range of work. Automatic part-of-speech taggers have been used to explore poetic enjambment (Houston, “Enjambment and the Poetic Line”) and characterize the patterns that distinguish the literary canon from the archive (Algee-Hewitt et al., “Canon/Archive”). Syntactic parsers have been used to attribute events to the characters who participate in them (Jockers and Kirilloff; Underwood, Bamman, and Lee) and characterize the complexity of sentences in Henry James (Reeve). Coreference resolution has been used to explore the prominence of major and minor characters as a function of their gender (Kraicer and Piper). Named entity recognizers have been used to explore the relationship between places in British fiction and cultural identity (Evans and Wilkens); geographic markers extracted from named entity recognition (NER) have been used to create visualizations of the places mentioned in texts, both for toponyms in Joyce’s Ulysses (Derven, Teehan, and Keating) and short fiction by Edward P. Jones (Rambsy and Ossom-Williamson). Topics help organize a range of work in the humanities, from identifying the characteristics of colonial fiction in Australian newspapers (Bode) to unveiling editorial labor in nineteenth-century U.S. publications (Klein) and organizing reader responses to “classics” (Walsh and Antoniak). And moving beyond text to sound studies, work has also explored using NLP to extract prosodic features from texts (Clement et al.) and directly model audio data to investigate questions revolving around applause (Clement and McLaughlin) and poet voice (MacArthur, Zellou, and Miller). In each of these cases, NLP is used to extract some aspect of linguistic structure from text and take measurements on those aspects in order to advance an argument.
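
To make the measuring-instrument framing concrete, the short sketch below (not drawn from any of the studies cited above) uses an off-the-shelf part-of-speech tagger to take one simple measurement, the rate of adjectives per 1,000 tokens, of the kind that could then be compared across texts; the choice of spaCy and of the passage is purely illustrative.

```python
# A minimal sketch of NLP as a measuring device: tag a passage with an
# off-the-shelf model and report one measurement (adjective rate).
# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")

passage = (
    "It is a truth universally acknowledged, that a single man in "
    "possession of a good fortune, must be in want of a wife."
)

doc = nlp(passage)
tokens = [t for t in doc if not t.is_punct]
adjectives = [t for t in tokens if t.pos_ == "ADJ"]

# The "measurement" here is a rate that could be compared across texts.
rate = 1000 * len(adjectives) / len(tokens)
print(f"{len(adjectives)} adjectives in {len(tokens)} tokens "
      f"({rate:.1f} per 1,000 tokens)")
```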

As NLP is increasingly used in this measurement capacity for inquiry within the humanities, it is important to establish its instrument validity—the degree to which a measuring device (whether mechanical or algorithmic) is measuring what it is supposed to. While there are, of course, many applications of computational methods to literary texts that will resist comparison to a preexisting gold standard (and that subsequently require other methods for establishing validity), consensus is often possible for the low-level linguistic phenomena on which those methods depend, such as classifying a word as a noun or a verb or identifying the mentions of people in a sentence. One source of potential trouble in this case, however, is the mismatch in domains between literary texts and the relatively small set of domains that mainstream NLP has tended to focus on. A prominent source is news, which forms the overwhelming basis for benchmark corpora. Examples include MUC (Sundheim), the Penn Treebank (Marcus, Santorini, and Marcinkiewicz), ACE (Walker et al.), the New York Times Annotated Corpus (Sandhaus), and OntoNotes (Hovy et al.). Another prominent source is Wikipedia, which provides the benchmark datasets for question answering (Rajpurkar et al.; Rajpurkar, Jia, and Liang) and named entity linking (Cucerzan). It also provides the training material for many language models in multiple languages (Mikolov et al.; Pennington, Socher, and Manning; Devlin et al.).

As methods developed within the context of NLP are increasingly used for humanistic inquiry, much work has explored adapting unsupervised models (such as topic models) to issues commonly encountered in datasets used by humanists. This includes understanding the impact of text duplication in large digital libraries (Schofield, Thompson, and Mimno), moderating the influence of authorial voice on topic inference (Thompson and Mimno), and exploring the impact of applying such models to figurative texts (Rhody). This chapter makes an argument for extending this scrutiny to supervised models as well, charting the drop in performance that comes when training supervised models on one domain and applying them to another. I introduce LitBank, a dataset of literary texts annotated with a range of linguistic phenomena that can provide the basis for training literary-centric NLP. As we think about the ways in which researchers in the digital humanities interact with NLP (McGillivray, Poibeau, and Fabo), a focus on literary-centric NLP not only seeks to improve the state of the art for NLP in the domain of literature but also examines the specific research questions that are only afforded within literature. While other work in NLP touches on aspects of narrative that are common with news, such as inferring narrative event chains (Chambers), literature presents a number of unique opportunities, including modeling suspense, the passage of time, and focalization. Importantly, literary-centric NLP also entails a widening of opportunity for researchers in the computational humanities: those with training in both computational methods and literary theory, or those working as part of a team with expertise in both. While much work in NLP has been dominated by researchers in the fields of computer science, information science, and linguistics, the researchers poised to make the biggest advances in this area are those with training and collaborators in the humanities. These researchers and teams can not only leverage their expertise in the specific subject matter to define the appropriate boundaries of datasets but also use their own disciplinary expertise to define the literary phenomena that are worth modeling in the first place. In so doing, they are uniquely positioned to use their hermeneutic expertise to both interpret the ways in which measurements of those phenomena can be turned into knowledge (as Hannah Ringler points out in this volume) and challenge the initial theoretical objects such work often begins with (as Mark Algee-Hewitt highlights in this volume as well).

Performance across Domains

Progress in natural language processing is primarily driven by comparative performance on benchmark datasets—progress in phrase-structure syntactic parsing, for example, has been defined for thirty years by performance on the Penn Treebank (Marcus, Santorini, and Marcinkiewicz). Benchmark datasets provide an external control: given a fixed dataset, researchers can be more confident that an increase in performance by their model for the task relative to another model can be attributed to their work alone and not simply be the result of incomparable performance on different datasets. At the same time, benchmark datasets tend to myopically focus attention on the domains they represent, and the ability of models to perform well on domains outside the one they have been trained on (their “generalization” performance) can be quite poor. Table 3.1 presents a metareview illustrating this performance degradation across a range of training/test scenarios. A model trained on one domain may yield high performance when evaluated on data from that same domain, but it often suffers a steep drop in performance when evaluated on data from another domain. In the few cases that involve training on news and evaluating on literature, these drops in performance can amount to twenty absolute points or more, effectively rendering a tool unusable.

Table 3.1. In-Domain and Out-of-Domain Performance for Several NLP Tasks

Includes POS tagging, phrase structure (PS) parsing, dependency (dep.) parsing, named entity recognition (NER), coreference resolution, and event trigger identification. Accuracies are reported in percentages; phrase structure parsing, NER, coreference resolution, and event identification are reported in F1 measure.

Citation | Task | In domain | Accuracy | Out of domain | Accuracy
Rayson et al. (2007) | POS | English news | 97.0% | Shakespeare | 81.9%
Scheible et al. (2011) | POS | German news | 97.0% | Early Modern German | 69.6%
Moon and Baldridge (2007) | POS | WSJ | 97.3% | Middle English | 56.2%
Pennacchiotti and Zanzotto (2008) | POS | Italian news | 97.0% | Dante | 75.0%
Derczynski, Ritter, et al. (2013) | POS | WSJ | 97.3% | Twitter | 73.7%
Gildea (2001) | PS parsing | WSJ | 86.3 F | Brown corpus | 80.6 F
Lease and Charniak (2005) | PS parsing | WSJ | 89.5 F | GENIA medical texts | 76.3 F
Burga et al. (2013) | Dep. parsing | WSJ | 88.2% | Patent data | 79.6%
Pekar et al. (2014) | Dep. parsing | WSJ | 86.9% | Broadcast news | 79.4%
Pekar et al. (2014) | Dep. parsing | WSJ | 86.9% | Magazines | 77.1%
Pekar et al. (2014) | Dep. parsing | WSJ | 86.9% | Broadcast conversation | 73.4%
Derczynski, Maynard, et al. (2013) | NER | CoNLL 2003 | 89.0 F | Twitter | 41.0 F
Bamman, Popat, and Shen (2019) | Nested NER | News | 68.8 F | English literature | 45.7 F
Bamman, Lewke, and Mansoor (2020) | Coreference | News | 83.2 F | English literature | 72.9 F
Naik and Rose (2020) | Events | News | 82.6 F | English literature | 44.1 F

Perhaps more pernicious than a simple drop in performance, however, are the forms of representational bias that are present in any dataset. For literary texts, an entity recognition model trained on news (the ACE 2005 dataset) is heavily biased toward recognizing men, simply given the frequency with which men are present in that news data (Bamman, Popat, and Shen). When tested on literature, where men and women are mentioned with greater parity, the recall at recognizing women is much worse, recognizing only 38.0 percent of mentions, compared to 49.6 percent for men (a difference of –11.6 points). A model trained natively on literature, however, corrects this disparity, recognizing 69.3 percent of mentions who are women and 68.2 percent of those who are men (a difference of +1.1 points). Literature naturally presents its own representational bias when considered from the viewpoint of other datasets (e.g., the depiction of gender roles in a nineteenth-century novel of manners is certainly different from that of a twenty-first-century biography), and this is part of the point: the models we use should reflect the characteristics of the data we are applying them to—the words that are being used and the worlds they are depicting.
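
The disparity reported above is, at bottom, a recall computation broken out by a gold-standard attribute. The sketch below illustrates that computation under the assumption that gold mentions carry a gender label and that predictions are exact spans; the data format is a hypothetical simplification, not the ACE or LitBank format.

```python
# A sketch of measuring recall separately by a gold attribute such as gender.
# The data format here is a hypothetical simplification: gold mentions are
# (start, end, gender) tuples and predictions are (start, end) spans.
from collections import defaultdict

gold = [(0, 2, "woman"), (10, 11, "man"), (15, 17, "woman"), (20, 21, "man")]
predicted = {(0, 2), (10, 11), (20, 21)}          # exact-span matches only

found, total = defaultdict(int), defaultdict(int)
for start, end, gender in gold:
    total[gender] += 1
    if (start, end) in predicted:
        found[gender] += 1

for gender in total:
    recall = found[gender] / total[gender]
    print(f"recall ({gender}): {recall:.1%}")
```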

One motivation for literary-centric NLP is to simply improve this dismal performance—if a model is able to reach an F-score of 68.8 for entity recognition in English news, then we should not have to settle for an F-score of 45.7 for English literature. But beyond that overall goal is a concern that strikes at the heart of methods in the computational humanities—if we are using empirical methods as algorithmic measuring devices, then absolute accuracy is less important than the source of any measurement error: if error is nonrandom, such that measurement accuracy is dependent on a variable that is at the core of subsequent analysis (such as gender), then we need to account for it. While methods from the social sciences that deal with survey data—like multilevel regression and poststratification (Gelman and Little)—may provide one means of correcting the disparity between a biased sample and the true population they are meant to reflect (if error rates are known and we only care about aggregate statistics), there are many situations where such methods fail.

This is not to say that generalizing from a nonliterary domain to a literary one is not possible; methods in NLP already have several strategies for narrowing this gap with domain adaptation (adapting the parameters of a model toward a domain for which it has little, or perhaps no, training data) and distributed representations of words so that a model has some information that a word like “rockaway” (a nineteenth-century four-wheeled carriage) has some lexical similarity to “car,” even if it never appears in its labeled data. But if our goal is to use these methods for work in literature, then we should be centering metrics of validity on literature as a domain, including both overall accuracy and the biases that reflect those that appear in the texts under study. In the absence of any other method to optimize both of those concerns, an alternative is much simpler: we can train models natively on literary data and encode the biases present in representatives of the data we will later analyze.
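
To illustrate the role distributed representations play here, the sketch below looks up the similarity between a period-specific word and a modern one in a set of pretrained vectors; the choice of gensim and the GloVe vectors (and whether “rockaway” survives in their vocabulary) are assumptions for the sake of example.

```python
# A sketch of how pretrained word vectors supply lexical similarity for words
# absent from labeled training data. Assumes gensim is installed and that the
# chosen vocabulary happens to contain both words; if not, the lookup is skipped.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")   # roughly a 128 MB download

for word in ("rockaway", "carriage"):
    if word in vectors:
        print(word, "->", vectors.similarity(word, "car"))
    else:
        print(word, "is out of vocabulary for these vectors")
```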

LitBank

By training a model on data that resembles what it will see in the future, we can expect our performance on that future data to be similar to the performance on data we have seen during training. Several efforts have done just this for a range of linguistic phenomena, including part-of-speech tagging (Mueller) and syntactic parsing in a variety of historical and literary registers—including Portuguese (Galves and Faria), Greek and Latin (Bamman and Crane; Passarotti; Haug and Jøhndal), and English (Taylor and Kroch; Kroch, Santorini, and Delfs; Taylor et al.). By annotating data within the domain we care to analyze, we can train better methods to analyze data that looks similar to it in the future.

LitBank is one such resource: an open-source, literary-centric dataset to support a variety of contemporary work in the computational humanities working with English texts. To date, it contains 210,532 tokens drawn from 100 different English-language novels, annotated for four primary phenomena: entities, coreference, quotations, and events (each described in more detail below). By layering multiple phenomena on the same fixed set of texts, the annotations in LitBank are able to support interdependence between the layers—coreference, for example, groups mentions of entities (Tom, the boy) into the unique characters they refer to (TOM SAWYER), and quotation attribution assigns each act of dialogue to the unique character (i.e., coreference chain) who speaks it.
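
One way to picture this layering is as a set of annotation layers that all point into the same token sequence, with quotation attribution pointing at coreference chains rather than at surface strings. The sketch below is a hypothetical in-memory representation for illustration only, not LitBank’s released file format.

```python
# A hypothetical in-memory view of layered annotations over one token sequence.
# Illustrative only; LitBank's released files use their own formats.
from dataclasses import dataclass, field

@dataclass
class Mention:
    start: int          # token offsets into the shared token list
    end: int
    category: str       # e.g., PER, FAC, GPE

@dataclass
class Entity:
    entity_id: str      # e.g., "TOM_SAWYER"
    mentions: list = field(default_factory=list)   # the coreference chain

@dataclass
class Quotation:
    start: int
    end: int
    speaker: Entity     # attribution points at a chain, not a string

tokens = ["“", "Come", "along", "!", "”", "said", "Tom", ",",
          "and", "the", "boy", "set", "off", "."]
tom = Entity("TOM_SAWYER", [Mention(6, 7, "PER"), Mention(9, 11, "PER")])
quote = Quotation(0, 5, speaker=tom)

print(quote.speaker.entity_id,
      [" ".join(tokens[m.start:m.end]) for m in tom.mentions])
```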

Sources

The texts in LitBank are all drawn from public domain texts in Project Gutenberg and include a mix of high literary style (e.g., Edith Wharton’s Age of Innocence, James Joyce’s Ulysses) and popular pulp fiction (e.g., H. Rider Haggard’s King Solomon’s Mines, Horatio Alger’s Ragged Dick). All of the texts in LitBank were originally published before 1923 and, as Figure 3.1 illustrates, predominantly fall at the turn of the twentieth century.

Phenomena

Entities

Entities define one of the core objects of interest in the computational humanities. Entities capture the characters that are involved in stories, the places where they operate, and the things they interact with. Much work in the computational humanities reasons about these entities, including character (Underwood, Bamman, and Lee), places (Evans and Wilkens), and objects (Tenen), and has focused on both evaluating and improving NER for specifically the literary domain (Brooke, Hammond, and Baldwin; Dekker, Kuhn, and van Erp).

Figure 3.1. Distribution of texts in LitBank over time. While books in the collection appear as early as 1719 (Robinson Crusoe), the vast majority were published after 1850, with increasing representation over time.

Figure 3.2. A hierarchical entity structure from a passage in Jane Austen’s Emma: the phrase “the elder brother of Isabella’s husband” contains nested mentions.

Traditional NER systems for other domains like news typically disallow hierarchical structure within names—flat structure is easier to reason about computationally where it can be treated as a single-layer sequence labeling problem and largely fits the structure of the phenomenon, where common geopolitical entities (such as Russia) and people (such as Bill Clinton) lack hierarchical structure. But literature abounds with hierarchical entities, many of which are not named at all (Figure 3.2). In this passage from Jane Austen’s Emma, we see multiple entities expressed: Isabella, Isabella’s husband, and the elder brother of Isabella’s husband. Even though they are not named, all are potentially significant as mentions of characters within this story.
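
The nesting shown in Figure 3.2 can be represented as overlapping spans over the same tokens, each carrying its own type. The sketch below encodes the phrase from Emma that way and reports how deeply each mention is nested; the span format is illustrative rather than LitBank’s annotation format.

```python
# A sketch of nested entity mentions as overlapping token spans.
# Token indices refer to the phrase from Emma; the format is illustrative.
tokens = ["the", "elder", "brother", "of", "Isabella", "'s", "husband"]

# (start, end, type): end is exclusive; spans may nest inside one another.
mentions = [
    (0, 7, "PER"),   # the elder brother of Isabella's husband
    (4, 7, "PER"),   # Isabella's husband
    (4, 5, "PER"),   # Isabella
]

for start, end, etype in mentions:
    parents = [m for m in mentions
               if m[0] <= start and end <= m[1] and (m[0], m[1]) != (start, end)]
    label = " ".join(tokens[start:end])
    print(f"{etype:4} {label!r} nested inside {len(parents)} other mention(s)")
```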

Table 3.2. Counts of Entity Type

Category | n | Frequency
PER | 24,180 | 83.1%
FAC | 2,330 | 8.0%
LOC | 1,289 | 4.4%
GPE | 948 | 3.3%
VEH | 207 | 0.7%
ORG | 149 | 0.5%

Table 3.3. Counts of Entity Category

Category | n | Frequency
PRON | 15,816 | 54.3%
NOM | 9,737 | 33.5%
PROP | 3,550 | 12.2%

To capture this distinctive characteristic of literary texts, the first annotation layer of LitBank identifies all entities of six types—people (PER), facilities (FAC), geopolitical entities (GPE), locations (LOC), organizations (ORG), and vehicles (VEH), as illustrated in Table 3.2—and classifies their status as a proper name (PROP), common noun phrase (NOM), or pronoun (PRON).1 Table 3.3 shows that the proportion of entities that traditional NER would capture (PROP) is quite small—common entities (her sister) are mentioned nearly three times as often as proper names (Jane) and both far less frequently than pronouns.

What can we do with books labeled with these entity categories? At their simplest, entities provide an organizing system for the collection, as Erin Wolfe (“Natural Language Processing in the Humanities”) has demonstrated by applying models trained on LitBank to texts in the Black Books Interactive Project (https://bbip.ku.edu). This work extracts a ranked list of the most frequent people and places mentioned in a text, which provides a high-level overview of the content of a work.
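
That overview is, computationally, a frequency count over tagged mentions. The sketch below assumes entity predictions have already been reduced to (mention text, type) pairs, a simplification of what any real tagger returns.

```python
# A sketch of turning tagged entity mentions into a ranked overview of a text.
# Assumes entity predictions are already available as (mention text, type) pairs.
from collections import Counter

predicted_mentions = [
    ("Jim", "PER"), ("Huck", "PER"), ("Jim", "PER"),
    ("the river", "LOC"), ("Jim", "PER"), ("the island", "LOC"),
]

people = Counter(text for text, etype in predicted_mentions if etype == "PER")
places = Counter(text for text, etype in predicted_mentions
                 if etype in {"LOC", "GPE", "FAC"})

print("most frequent people:", people.most_common(3))
print("most frequent places:", places.most_common(3))
```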

At the same time, entity types abstract away common patterns that provide insight into narrative structure. As David McClure (“Distributions of Words across Narrative Time in 27,266 Novels”) points out at the scale of individual words, many terms exhibit strong temporal associations with the narrative time of a book: significant plot elements like death show up near the end of a novel, while many terms that introduce people show up earlier. By tagging the entities that exist within a text, we can move beyond measuring individual words like “John” or “she” to capture the relative frequency of people as a whole. As Figure 3.3 illustrates, we also find different temporal dynamics when considering all people mentioned by proper name (Sherlock, Jane Eyre), common noun phrases (the boy, her sister), and pronouns (he, she, they). While proper names and pronouns increase in frequency as a book progresses from its beginning to end, common noun phrases show a marked decline in frequency. Identifying the entities present in a text gives us the ability to study behavior at these different levels of granularity.
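
The measurement behind Figure 3.3 amounts to binning each mention by its relative position in a book and counting per category. The sketch below shows that binning with hypothetical mention positions.

```python
# A sketch of charting mention categories over narrative time by binning each
# mention's relative position (token offset / book length) into deciles.
# The mention data here is hypothetical.
from collections import Counter

book_length = 100_000   # tokens
mentions = [(500, "NOM"), (12_000, "PROP"), (55_000, "PRON"),
            (90_000, "PRON"), (95_000, "PROP")]   # (token offset, category)

counts = Counter()
for offset, category in mentions:
    decile = min(int(10 * offset / book_length), 9)
    counts[(category, decile)] += 1

for category in ("PROP", "NOM", "PRON"):
    row = [counts[(category, d)] for d in range(10)]
    print(f"{category:4}", row)
```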

Figure 3.3. Distribution over narrative time of automatically predicted PROP, NOM, and PRON person entities in 100 English-language books from Project Gutenberg, excluding all paratextual material from Project Gutenberg (legal boilerplate) and the print original (tables of contents, advertisements, etc.). Although pronouns are used more frequently for reference than the other categories, each category shows substantial variation over the linear time of the story: proper name and pronoun mentions increase toward the end, while common noun phrases decline.

Coreference

Coreference resolution is the challenge of clustering together all mentions that refer to the same distinct entity. While some mentions have names that are determinative of their identity—for example, just about every mention of New York City in a text will refer to the unique city known by that name—there is often ambiguity in mapping mentions to entities. The same term (e.g., “she”) can often refer to many different people, and the same person (e.g., Tom Sawyer) can be known by many different expressions (“Tom,” “that boy,” “him”).

The benchmark dataset for coreference in English is OntoNotes 5 (Weischedel et al.), which includes the domains of news (broadcast, magazine, and newswire), conversation, the web, and even some literature (though restricted to include only the Bible). There are many ways, however, in which coreference in literature differs from that in factual textual sources, including the delayed revelation of identity (common in detective stories and mysteries, for example), in which two characters portrayed as separate entities are revealed to be the same person. Narratives in literary texts also tend to span longer time frames than news articles—perhaps years, decades, or even centuries—which raises difficult questions on the metaphysical nature of identity (e.g., is Shakespeare’s LONDON of 1599 the same entity as Zadie Smith’s LONDON in the year 2000?).

To address these issues, the second layer of annotations in LitBank covers coreference for the entities annotated above.2 We specifically consider the ways in which literary coreference differs from coreference in news and other short, factual texts and manually assign the 29,103 mentions annotated above into 7,235 unique entities. As a result, coreference models trained on this native literary data perform much better on literary text (an average F1 score of 79.3) than those trained on OntoNotes (an average of 72.9). This joins existing datasets for literature, including the work of Hardik Vala and colleagues (“Mr. Bennet, His Coachman, and the Archbishop Walk into a Bar but Only One of Them Gets Recognized”), which annotates character aliases in Sherlock Holmes, Pride and Prejudice, and The Moonstone, and annotated datasets of coreference in German novels (Krug et al.) and plays (Pagel and Reiter).

What does coreference make possible for cultural analysis? Coreference is critical for aggregating information about distinct entities like characters. For example, in “The Transformation of Gender in English-Language Fiction” (Underwood, Bamman, and Lee), we measured the amount of “attention” that characters receive in novels over 200 years of literary history by counting up the number of actions each character participates in. This is only possible by having a stable entity for each character that such counts can apply to, and since over half of all entity mentions are pronominal in nature, including pronouns in coreference is critical for that characterization. Limited attention has been devoted to the potential of coreference for other entity categories beyond people, but coreference for such classes as place (forests and marshes, houses and rooms, cities and countries and villages) is in many ways a precondition for the analysis of setting and its relationship to plot. In order to chart that characters in Lord of the Rings begin in the Shire, venture to Mount Doom, and return home in the end, we need to understand that “the Shire” and “home” refer to the same physical location. Coreference resolution would help make this kind of spatial analysis possible.
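
The aggregation described above only becomes possible once mentions are resolved to characters. The sketch below shows that counting step under the assumption that coreference output maps mention spans to character identifiers and that actions have already been attached to the mention that performs them; both formats are hypothetical simplifications.

```python
# A sketch of aggregating "attention" per character once coreference has
# resolved each mention span to a character ID. Data formats are hypothetical.
from collections import Counter

# coreference output: mention span -> character the chain refers to
span_to_character = {(10, 11): "JANE_EYRE", (40, 41): "JANE_EYRE",
                     (52, 53): "ROCHESTER"}

# actions attributed to the mention that performs them: (verb, agent span)
actions = [("walked", (10, 11)), ("spoke", (40, 41)), ("answered", (52, 53)),
           ("left", (10, 11))]

attention = Counter(span_to_character[span] for _, span in actions
                    if span in span_to_character)
print(attention.most_common())
```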

Quotation Attribution

Much work in the computational humanities has explored the affordances of speaker attribution—identifying the speaker of a given piece of dialogue. Such attributed dialogue has been used in the past to create character networks by defining characters to be nodes and forming an edge between two characters who speak to one another (Elson, Dames, and McKeown). Literary-centric quotation data exists for both English and German: for English, this data encompasses Austen’s Pride and Prejudice and Emma as well as Chekhov’s The Steppe (He, Barbosa, and Kondrak; Muzny et al.), while the Columbia Quoted Speech Corpus (Elson and McKeown) includes six texts by Austen, Dickens, Flaubert, Doyle, and Chekhov. For German, 489,459 tokens were annotated for speech, thought, and writing, including direct, indirect, free indirect, and reported speech (Brunner et al.).

To provide a more diverse set of data for English, the third layer of LitBank includes dialogue attribution for the 100 texts in its collection.3 This includes 1,765 dialogue acts across 279 characters in the 100 novels present in LitBank, which allows us to measure the accuracy of both quotation identification and attribution across a much broader range of texts than previously studied. Table 3.4 provides a summary of the characters with the most dialogue in this annotated dataset.

What can we do with such attribution data? Expanding on the use of quotations to define character interaction networks, in “Measuring Information Propagation in Literary Social Networks” (Sims and Bamman), we use quotations to extract atomic units of information and measure how those units propagate through the network defined by people speaking to each other. We specifically find both that information propagates through weak ties and that women are often depicted as being the linchpins of information flow.

Table 3.4. Characters with Most Annotated Dialogue from LitBank Database of 278 Characters across 100 Novels

Character | Text | n
Buck Mulligan | Ulysses | 43
Convict | Great Expectations | 33
Mrs. Bennet | Pride and Prejudice | 29
Ragged Dick | Ragged Dick | 28
Mr. Bennet | Pride and Prejudice | 28

Quotation also enables direct analysis of character idiolects, allowing us to ask what linguistic properties differentiate dialogue from narrative (Muzny, Algee-Hewitt, and Jurafsky) and the speech of characters from each other (Vishnubhotla, Hammond, and Hirst)—including the degree to which speech is predictive of other traits like personality (Flekova and Gurevych). While our work in “Measuring Information Propagation” (Sims and Bamman) exploits the notion of “listeners” of dialogue in order to track propagation, there is a range of work to be done in analyzing what differentiates the speech of a single character as they address different listeners, following the fundamental principle of audience design (Bell).

Events

While entity recognition captures the important characters and objects in literature, recognizing events is important for grounding actions in plot. Event-annotated datasets in NLP have historically focused on the domain of news, including MUC (Sundheim), ACE (Walker et al., ACE 2005), and DEFT (Aguilar et al.), with some exceptions—in particular, an annotated dataset for historical texts that captures important classes of events in consultation with historians (Sprugnoli and Tonelli). But the depiction of events in literary texts tends to be very different from events in news. Literary texts include long and complex structures of narrative and multiple diegetic frames in which some events properly belong to the space of the plot, while others exist only in the commentary by the author. To address the specificity of the problem for literature, the fourth layer of annotation in LitBank focuses on realis events—events that are depicted as actually taking place within the narrative—which excludes hypotheticals, conditionals, and extra-diegetic events.4 The criteria for what constitutes a realis event fall into four distinct categories (in each of the examples below, only “walked” is a realis event):

  • Polarity: events must be asserted as actually occurring and not marked as having not taken place (John walked by Frank and didn’t say hello).
  • Tense: events must be in past or present tense; they must not be future events that have not yet occurred (John walked to the store and will buy some groceries).
  • Specificity: events must involve specific entities and take place at a specific place and time (John walked to work Friday morning); they must not be unqualified statements about classes (Doctors walk to work).
  • Modality: events must be asserted as actually occurring, as distinct from events that are the targets of other modalities, including beliefs, hypotheticals, and desires (John walked to the store to buy some groceries).

We annotate a total of 7,849 events in the 100 novels of LitBank. As Aakanksha Naik and Carolyn Rose (“Towards Open Domain Event Trigger Identification Using Adversarial Domain Adaptation”) have shown, models trained natively on news (TimeBank) tend to perform quite poorly on LitBank (leading to a cross-domain drop in performance of 38.5 points), attesting to the benefit of annotated data within the domain we care about.

What can we do with events? In “Literary Event Detection” (Sims, Park, and Bamman), we show that examining realis events reveals a meaningful difference between popular texts and texts with high prestige, measured as the number of times an author’s works were reviewed by elite literary journals, following Ted Underwood (Distant Horizons). Authors with high prestige not only present a lower intensity of realis events in their work than authors of popular texts but also exhibit much more variability in their rates of eventfulness; popular texts, in contrast, have much less freedom in this respect, exhibiting a much narrower range of variation. Additionally, Maarten Sap and colleagues (“Recollection versus Imagination”) build on this work by leveraging models trained on LitBank events to measure the difference between imagined and recalled events, showing that stories that are recalled contain more realis events than those that are entirely fictional. While existing work to date has focused on measurements of events on their own, there is much space for exploring the interaction between events and other narrative components—such as which characters participate in the most realis events and which works have the highest ratio of realis events per page (to capture a measure of pacing).
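
The pacing measure suggested here is simple to compute once realis triggers have been predicted: a rate of events per 1,000 words and its spread across texts. The sketch below uses made-up counts for illustration.

```python
# A sketch of an eventfulness/pacing measure: realis event triggers per
# 1,000 words, plus the spread of that rate across texts. Counts are made up.
from statistics import mean, stdev

texts = {                     # title -> (number of realis triggers, word count)
    "Text A": (420, 98_000),
    "Text B": (610, 105_000),
    "Text C": (280, 87_000),
}

rates = {title: 1000 * n_events / n_words
         for title, (n_events, n_words) in texts.items()}

for title, rate in sorted(rates.items(), key=lambda kv: -kv[1]):
    print(f"{title}: {rate:.1f} realis events per 1,000 words")
print(f"mean {mean(rates.values()):.1f}, sd {stdev(rates.values()):.1f}")
```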

Coverage

One critique that we might level at this work is that “literature” is not a monolithic domain—and, in fact, the differences between individual texts that fall into what we call literature can be much greater than the cross-domain difference between a random novel and a news article. One of the biggest differences on this front is due to time—methods trained on texts published before 1923 will help us little in recognizing the entities that populate contemporary novels, like Facebook, jets, and Teslas, along with events like googling and texting.

LitBank contains texts published before 1923 in order to work exclusively with public domain texts, so that the original text can be published along with the annotations we layer. While texts published before 1923 capture a wide range of literature, this decision is restrictive, missing nearly a century of more contemporary texts, along with the more diverse voices represented in novels published today. Our current efforts are focused on expanding LitBank to include samples from 500 books published between 1924 and 2020, including 100 works written by Black authors drawn from the Black Books Interactive Project, 100 works by global Anglophone writers, 100 bestsellers, 100 prizewinning books, and 100 works of genre fiction. While these texts are in copyright, we will publish small samples of the texts along with linguistic annotations in order to enable reproducibility. We do so under the belief that such work is a transformative use of the original text that adds new value and does not affect the market for the original work and hence falls under the protections of fair use (Samberg and Hennesy).

At the same time, LitBank is also focused on works of fiction in the English language, further exacerbating what Roopika Risam notes is “the Anglophone focus of the field” of digital humanities (Risam, “Other Worlds, Other DHs”). NLP pipelines developed for other languages likewise perform quite poorly on literary texts, as with NER for Spanish literature (Isasi), exposing the need for literary datasets across a wide range of languages. Current work is also focused on expanding the languages represented in LitBank to include Chinese and Russian, with more to follow.

Literary-Centric Questions

There is a rich research space building methods and datasets to adapt existing components of the NLP pipeline to work better on literary texts. But at the same time, an emphasis on literary-centric NLP requires attending to specifically literary questions that current NLP systems cannot directly address. As Lauren Klein notes, we should not let our questions be guided by the performance of our algorithms (Klein, “Distant Reading after Moretti”). What are these questions that are uniquely literary?

One set of questions models the relationship between readers and the texts they read, including the state of knowledge that we might surmise a reader has at a given point in the text. This is a problem that uniquely pertains to narrative text, where a reader builds up a model of the represented world over the course of reading and can make predictions about future events that might transpire within it. While some work in NLP addresses the question of the temporal order with which stories unfold (Mostafazadeh et al.), one phenomenon that appears frequently within some genres of literature is suspense—the potential anxious uncertainty about what is yet to come. Mark Algee-Hewitt (“The Machinery of Suspense”) models this phenomenon by soliciting judgments of suspense from readers and building a model to predict that rating from an input passage that is 2 percent the length of a book, and David Wilmot and Frank Keller (“Modelling Suspense in Short Stories as Uncertainty Reduction over Neural Representation”) model suspense in short stories by measuring the reduction in future uncertainty. While most document-level classification tasks presume simultaneous access to the entirety of a text, suspense is one phenomenon where the sequential ordering of narrative is critical for understanding—we are essentially modeling a reader’s state of mind at time t having read the text through time t but not after it. Work in the computational humanities has begun to explore this phenomenon from the perspective of intratextual novelty and repetition (McGrath, Higgins, and Hintze; Long, Detwyler, and Zhu)—modeling the degree to which authors repeat information within a book—but there are many other related phenomena such as foreshadowing that remain to be explored.
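
The sequential constraint described here can be enforced by restricting whatever model scores suspense at time t to the prefix of the text read so far. In the sketch below, the scoring function is a hypothetical placeholder, not any of the models cited above.

```python
# A sketch of the incremental setup for modeling a reader's state at time t:
# the scorer only ever sees the prefix of the tokens read so far.
# score_suspense is a hypothetical placeholder, not a real model.

def score_suspense(prefix_tokens):
    """Placeholder: stand-in for a trained model scoring the prefix."""
    return len(prefix_tokens) % 7 / 7.0          # arbitrary toy value

tokens = "The door creaked open and a cold draft swept the candle out".split()

window = 4                                        # score every few tokens
for t in range(window, len(tokens) + 1, window):
    prefix = tokens[:t]                           # never look past time t
    print(f"t={t:2d}  suspense={score_suspense(prefix):.2f}")
```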

A second set of questions arises from the formal nature of novels and longer literary texts. Unlike news, Wikipedia articles, and tweets, novels are long; they are roughly 100,000 words, on average. This length presents challenges for NLP that was designed for other domains—in particular, interesting questions that we might ask of the nature of objects and things more generally (Brown, “Thing Theory”) are resisted by the quality of coreference resolution for common entities like cars, guns, and houses over long distances of narrative time. Tenen’s “Toward a Computational Archaeology of Fictional Space” is one example of the kind of work that can be done when reasoning about the nature of objecthood—in that case, considering the density of objects mentioned. What we often want is not only a measure of how objects in the abstract behave, but how specific objects are depicted—such as the eponymous houses in E. M. Forster’s Howards End, Nathaniel Hawthorne’s House of Seven Gables, or Mark Danielewski’s House of Leaves. Characterizing those distinct houses requires us to identify when any individual mention of the word house refers to the named house in question—a task challenging even for short documents but far more difficult at the moment for hundreds of mentions of such a common phrase potentially describing dozens of unique entities. Even though this is more of a computational challenge than a literary one, it is driven exclusively by the characteristics of literary texts and is unlikely to be solved by anyone not working in the computational humanities.

Finally, a third set of questions is narratological—how do we recognize the individual components of narrative and assemble them together into a representation of plot? A wealth of work has explored this question from different angles (Piper, So, and Bamman), including inferring sentiment arcs (Jockers; Reagan et al.), identifying “turning points” in movies (Papalampidi, Keller, and Lapata), disentangling storylines in Infinite Jest (Wallace, “Multiple Narrative Disentanglement”), and separating segments in The Waste Land (Brooke, Hammond, and Hirst). Other work has focused on identifying Proppian narrative functions in fairy tales (Finlayson), recognizing free indirect speech (Brunner et al.), modeling stream of consciousness (Long and So), and measuring the passage of time (Underwood). Much of the difficulty for modeling complex narratological phenomena is embedded in the difficulty of simply operationalizing what a concept like “plot” means as a computational form. Recent work attempts to tackle this theoretical question head on by comparing different narratological annotation schemes as a first step toward computational modeling (Reiter, Willand, and Gius). But in many ways, modeling narratological questions is uniquely positioned at the intersection of computation and the humanities—requiring not only expertise in models of linguistic structure but also a deep foundation in literary and narrative theory (Genette; Bal). The breadth of areas in this space—ranging from identifying characters and settings to inferring storylines and hierarchical narrative levels—makes modeling narratological phenomena one of the most vibrant areas poised for transformative work going forward.

Future

There is a range of work in the computational humanities that relies on linguistic structure—established phenomena like named entity recognition, uniquely literary tasks like predicting the passage of time, and a variety of opportunities on the horizon—that raises the potential to generate insight by considering the inherent structure present within text. While the field of natural language processing has focused for years on developing the core computational infrastructure to infer linguistic structure, much work remains both to adapt those methods to the domain(s) of literature and to explore the unique affordances that literature provides for computational inquiry. For existing tasks—entity recognition, coreference resolution, event identification, quotation attribution—one straightforward solution exists: we need to create more annotated data composed of the literary texts that form the basis of our analyses, for both training (to improve the models on this domain) and evaluation (so that we know they work). LitBank provides one such resource; while this dataset is expanding to encompass a greater variety of texts, it will always hold gaps—both in its representation and in the phenomena it contains. More annotated data is always needed.

Annotated data from literary texts provides a solution to one issue in literary-centric NLP: How do we go about tackling new literary-centric questions, including those research areas outlined above? For several components of these problems, we can fall back on time-tested strategies. If we can operationalize a concept and annotate its presence in text to a reliable degree, we can annotate texts and train models to predict those human judgments for new texts we have not labeled yet. The complexity of modeling can range from straightforward sentence-level classification problems of suspense to complex hierarchical models of narrative levels.
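
That annotate-then-predict strategy can be sketched with standard tooling: a handful of (hypothetical) labeled sentences and a bag-of-words classifier stand in for a real annotation effort and model.

```python
# A sketch of the annotate-then-predict strategy: given human labels for an
# operationalized concept, train a model to reproduce those judgments.
# The labeled examples are hypothetical; a real effort needs far more data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

sentences = [
    "She froze as the footsteps stopped outside her door.",
    "The letter had still not arrived, and no one would say why.",
    "They ate breakfast and discussed the weather at length.",
    "The garden was described in careful, unhurried detail.",
]
labels = ["suspenseful", "suspenseful", "not", "not"]   # annotator judgments

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(sentences, labels)

print(model.predict(["A hand closed slowly around the doorknob."]))
```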

While the design of some models will require training in NLP, the most important parts of this work are often outside the realm of computation and draw on at least three different skills: first, insight into the aspects of narrative and critical theory that can provide a strong foundation for an empirical method; second, the ability to circumscribe the boundaries of a problem in a way that both simplifies it enough for computational methods to be able to address it while also still preserving enough richness to sustain its relevance for humanistic inquiry; and third—and perhaps most important—the creativity needed to identify the questions worth asking in the first place. Like its broader field of the computational humanities, literary-centric NLP necessarily draws on expertise in both disciplines that comprise its community of practice, and this need to co-construct both the questions worth working on and the methods used to address them offers a vibrant path forward in this interdisciplinary space.

Notes

Many thanks to the reviewers for helpful comments in improving this work. The research reported in this article was supported by an Amazon Research Award and NSF CAREER grant IIS-1942591, along with resources provided by NVIDIA and Berkeley Research Computing.

  1. This work is described in more detail in Bamman, Popat, and Shen, “An Annotated Dataset of Literary Entities,” and Bamman, Lewke, and Mansoor, “An Annotated Dataset of Coreference in English Literature.”

  2. More information on the coreference layer can be found in Bamman, Lewke, and Mansoor, “An Annotated Dataset of Coreference in English Literature.”

  3. This work is described in more detail in Sims and Bamman, “Measuring Information Propagation in Literary Social Networks.”

  4. This layer is described in Sims, Park, and Bamman, “Literary Event Detection.”

Bibliography

  1. Aguilar, Jacqueline, Charley Beller, Paul McNamee, Benjamin Van Durme, Stephanie Strassel, Zhiyi Song, and Joe Ellis. “A Comparison of the Events and Relations across ACE, ERE, TAC-KBP, and FrameNet Annotation Standards.” In Proceedings of the Second Workshop on Events: Definition, Detection, Coreference, and Representation, 45–53. Baltimore: Association for Computational Linguistics, 2014. https://doi.org/10.3115/v1/W14-2907.
  2. Algee-Hewitt, Mark. “The Machinery of Suspense,” 2016. http://markalgeehewitt.org/index.php/main-page/projects/the-machinery-of-suspense/.
  3. Algee-Hewitt, Mark, Sarah Allison, Marissa Gemma, Ryan Heuser, Franco Moretti, and Hannah Walser. “Canon/Archive: Large-Scale Dynamics in the Literary Field.” Literary Lab Pamphlet 11 (2016).
  4. Bal, Mieke. Narratology: Introduction to the Theory of Narrative. Toronto: University of Toronto Press, 2017.
  5. Bamman, David, and Gregory Crane. “The Ancient Greek and Latin Dependency Treebanks.” In Language Technology for Cultural Heritage, 79–98. Berlin: Springer, 2011.
  6. Bamman, David, Olivia Lewke, and Anya Mansoor. “An Annotated Dataset of Coreference in English Literature.” In Proceedings of the 12th Language Resources and Evaluation Conference, 44–54. Marseille: European Language Resources Association, 2020. https://www.aclweb.org/anthology/2020.lrec-1.6.
  7. Bamman, David, Sejal Popat, and Sheng Shen. “An Annotated Dataset of Literary Entities.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2138–44. Minneapolis: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/N19-1220.
  8. Bell, Allan. “Language Style as Audience Design.” Language in Society 13, no. 2 (1984): 145–204.
  9. Bode, Katherine. “‘Man People Woman Life’/‘Creek Sheep Cattle Horses’: Influence, Distinction, and Literary Traditions.” In A World of Fiction: Digital Collections and the Future of Literary History, 157–98. Ann Arbor: University of Michigan Press, 2018.
  10. Brooke, Julian, Adam Hammond, and Timothy Baldwin. “Bootstrapped Text-Level Named Entity Recognition for Literature.” In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. Volume 2: Short Papers, 344–50. Berlin: Association for Computational Linguistics, 2016. https://doi.org/10.18653/v1/P16-2056.
  11. Brooke, Julian, Adam Hammond, and Graeme Hirst. “Unsupervised Stylistic Segmentation of Poetry with Change Curves and Extrinsic Features.” In Proceedings of the NAACL-HLT 2012 Workshop on Computational Linguistics for Literature, 26–35. Montreal: Association for Computational Linguistics, 2012. https://www.aclweb.org/anthology/W12-2504.
  12. Brown, Bill. “Thing Theory.” Critical Inquiry 28, no. 1 (2001): 1–22. https://www.jstor.org/stable/1344258.
  13. Brunner, Annelen, Stefan Engelberg, Fotis Jannidis, Ngoc Duyen Tanja Tu, and Lukas Weimer. “Corpus REDEWIEDERGABE.” In Proceedings of the 12th Language Resources and Evaluation Conference, 803–12. Marseille: European Language Resources Association, 2020. https://www.aclweb.org/anthology/2020.lrec-1.100.
  14. Brunner, Annelen, Ngoc Duyen Tanja Tu, Lukas Weimer, and Fotis Jannidis. “Deep Learning for Free Indirect Representation.” Proceedings of the 15th Conference on Natural Language Processing: KONVENS 2019. Erlangen, Germany: German Society for Computational Linguistics & Language Technology, 2019.
  15. Burga, Alicia, Joan Codina, Gabriella Ferraro, Horacio Saggion, and Leo Wanner. “The Challenge of Syntactic Dependency Parsing Adaptation for the Patent Domain.” In ESSLLI-13 Workshop on Extrinsic Parse Improvement. 2013.
  16. Chambers, Nathanael. “Inducing Event Schemas and Their Participants from Unlabeled Text.” PhD thesis, Stanford University, 2011.
  17. Clement, Tanya, and Stephen McLaughlin. “Measured Applause: Toward a Cultural Analysis of Audio Collections.” Journal of Cultural Analytics 1, no. 1 (2018).
  18. Clement, Tanya, David Tcheng, Loretta Auvil, Boris Capitanu, and Megan Monroe. “Sounding for Meaning: Using Theories of Knowledge Representation to Analyze Aural Patterns in Texts.” DHQ: Digital Humanities Quarterly 7, no. 1 (2013).
  19. Cucerzan, Silviu. “Large-Scale Named Entity Disambiguation Based on Wikipedia Data.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 708–16. Prague: Association for Computational Linguistics, 2007.
  20. Dekker, Niels, Tobias Kuhn, and Marieke van Erp. “Evaluating Named Entity Recognition Tools for Extracting Social Networks from Novels.” PeerJ Computer Science 5 (2019).
  21. Derczynski, Leon, Diana Maynard, Niraj Aswani, and Kalina Bontcheva. “Microblog-Genre Noise and Impact on Semantic Annotation Accuracy.” In Proceedings of the 24th ACM Conference on Hypertext and Social Media, 21–30. New York: Association for Computing Machinery, 2013.
  22. Derczynski, Leon, Alan Ritter, Sam Clark, and Kalina Bontcheva. “Twitter Part-of-Speech Tagging for All: Overcoming Sparse and Noisy Data.” In RANLP, 198–206. Shoumen, Bulgaria: Incoma, 2013.
  23. Derven, Caleb, Aja Teehan, and John Keating. “Mapping and Unmapping Joyce: Geoparsing Wandering Rocks.” In Digital Humanities 2014. 2014.
  24. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Volume 1: Long and Short Papers, 4171–86. Minneapolis: Association for Computational Linguistics, 2019.
  25. Elson, David K., and Kathleen R. McKeown. “Automatic Attribution of Quoted Speech in Literary Narrative.” In Proceedings of the 24th AAAI Conference on Artificial Intelligence, 1013–9. AAAI Press, 2010.
  26. Elson, David K., Nicholas Dames, and Kathleen R. McKeown. “Extracting Social Networks from Literary Fiction.” In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, 138–47. Stroudsburg, Pa.: Association for Computational Linguistics, 2010.
  27. Evans, Elizabeth F., and Matthew Wilkens. “Nation, Ethnicity, and the Geography of British Fiction, 1880–1940.” Journal of Cultural Analytics 3, no. 2 (2018).
  28. Fariss, Christopher J., Michael R. Kenwick, and Kevin Reuning. “Measurement Models.” In SAGE Handbook of Research Methods in Political Science and International Relations, edited by Luigi Curini and Robert Franzese. London: Sage, 2020.
  29. Finlayson, Mark A. “ProppLearner: Deeply Annotating a Corpus of Russian Folktales to Enable the Machine Learning of a Russian Formalist Theory.” Digital Scholarship in the Humanities 32, no. 2 (2015): 284–300. https://doi.org/10.1093/llc/fqv067.
  30. Finlayson, Mark Alan. “Inferring Propp’s Functions from Semantically Annotated Text.” Journal of American Folklore 129, no. 511 (2016): 55–77. https://www.jstor.org/stable/10.5406/jamerfolk.129.511.0055.
  31. Flekova, Lucie, and Iryna Gurevych. “Personality Profiling of Fictional Characters Using Sense-Level Links between Lexical Resources.” In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 1805–16. Lisbon: Association for Computational Linguistics, 2015. https://aclweb.org/anthology/D15-1208.
  32. Galves, Charlotte, and Pablo Faria. “Tycho Brahe Parsed Corpus of Historical Portuguese.” 2010. www.tycho.iel.unicamp.br/~tycho/corpus/en/index.html.
  33. Gelman, Andrew, and Thomas C. Little. “Poststratification into Many Categories Using Hierarchical Logistic Regression.” Survey Methodology 23, no. 2 (1997): 127–35.
  34. Genette, Gérard. Figures of Literary Discourse. New York: Columbia University Press, 1982.
  35. Genette, Gérard. Narrative Discourse: An Essay in Method. Ithaca, N.Y.: Cornell University Press, 1983.
  36. Gildea, Daniel. “Corpus Variation and Parser Performance.” In Proceedings of the 2001 Conference on Empirical Methods in Natural Language Processing, 167–202. Stroudsburg, Pa.: Association for Computational Linguistics, 2001.
  37. Haug, Dag T. T., and Marius Jøhndal. “Creating a Parallel Treebank of the Old Indo-European Bible Translations.” In Proceedings of the Language Technology for Cultural Heritage Data Workshop (LaTeCH 2008), 27–34. Marrakech, Morocco, 2008.
  38. He, Hua, Denilson Barbosa, and Grzegorz Kondrak. “Identification of Speakers in Novels.” In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics. Volume 1: Long Papers, 1312–20. Sofia, Bulgaria: Association for Computational Linguistics, 2013.
  39. Houston, Natalie. “Enjambment and the Poetic Line: Towards a Computational Poetics.” In Digital Humanities 2014. 2014.
  40. Hovy, Eduard, Mitchell Marcus, Martha Palmer, Lance Ramshaw, and Ralph Weischedel. “OntoNotes: The 90% Solution.” In Proceedings of the Human Language Technology Conference of the NAACL. Companion Volume: Short Papers, 57–60. Stroudsburg, Pa.: Association for Computational Linguistics, 2006.
  41. Isasi, Jennifer. “Posibilidades de La Minería de Datos Digital Para El Análisis Del Personaje Literario En La Novela Española: El Caso de Galdós Y Los ‘Episodios Nacionales.’” PhD thesis, University of Nebraska, 2017.
  42. Jacobs, Abigail Z., and Hanna Wallach. “Measurement and Fairness.” In Conference on Fairness, Accountability, and Transparency (FAccT ’21). 2021.
  43. Jockers, Matthew. “Revealing Sentiment and Plot Arcs with the Syuzhet Package,” February 2015. http://www.matthewjockers.net/2015/02/02/syuzhet/.
  44. Jockers, Matthew, and Gabi Kirilloff. “Understanding Gender and Character Agency in the 19th Century Novel.” Journal of Cultural Analytics 2, no. 2 (2017).
  45. Klein, Lauren. “Distant Reading after Moretti.” Arcade, January 29, 2018. https://arcade.stanford.edu/blogs/distant-reading-after-moretti.
  46. Klein, Lauren F. “Dimensions of Scale: Invisible Labor, Editorial Work, and the Future of Quantitative Literary Studies.” PMLA 135, no. 1 (2020): 23–39. https://doi.org/10.1632/pmla.2020.135.1.23.
  47. Klein, Sheldon, John F. Aeschlimann, David F. Balsiger, Steven L. Converse, Claudine Court, Mark Foster, Robin Lao, John D. Oakley, and Joel Smith. “Automatic Novel Writing.” University of Wisconsin-Madison, 1973.
  48. Kraicer, Eve, and Andrew Piper. “Social Characters: The Hierarchy of Gender in Contemporary English-Language Fiction.” Journal of Cultural Analytics 3, no. 2 (2018).
  49. Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. “Penn-Helsinki Parsed Corpus of Early Modern English.” Department of Linguistics, University of Pennsylvania, 2004.
  50. Krug, Markus, Lukas Weimer, Isabella Reger, Luisa Macharowsky, Stephan Feldhaus, Frank Puppe, and Fotis Jannidis. “Description of a Corpus of Character References in German Novels—DROC [Deutsches ROman Corpus].” DARIAH-DE Working Papers No. 27, 2017.
  51. Lease, Matthew, and Eugene Charniak. “Parsing Biomedical Literature.” In Natural Language Processing-IJCNLP 2005, 58–69. Berlin: Springer, 2005.
  52. Long, Hoyt, and Richard Jean So. “Turbulent Flow: A Computational Model of World Literature.” Modern Language Quarterly 77, no. 3 (2016): 345–67. https://doi.org/10.1215/00267929-3570656.
  53. Long, Hoyt, Anatoly Detwyler, and Yuancheng Zhu. “Self-Repetition and East Asian Literary Modernity, 1900–1930.” Journal of Cultural Analytics 2, no. 1 (May 2018).
  54. MacArthur, Marit J., Georgia Zellou, and Lee M. Miller. “Beyond Poet Voice: Sampling the (Non-) Performance Styles of 100 American Poets.” Journal of Cultural Analytics 3, no. 1 (2018).
  55. Mandell, Laura. “Gender and Cultural Analytics: Finding or Making Stereotypes?” In Debates in Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2019.
  56. Marcus, Mitchell P., Beatrice Santorini, and Mary Ann Marcinkiewicz. “Building a Large Annotated Corpus of English: The Penn Treebank.” Computational Linguistics 19, no. 2 (1993): 313–30.
  57. McClure, David. “Distributions of Words across Narrative Time in 27,266 Novels.” Stanford Literary Lab, July 10, 2017. https://litlab.stanford.edu/distributions-of-words-27k-novels/.
  58. McGillivray, Barbara, Thierry Poibeau, and Pablo Ruiz Fabo. “Digital Humanities and Natural Language Processing: ‘Je t’aime . . . Moi non plus.’” DHQ: Digital Humanities Quarterly 14, no. 2 (2020).
  59. McGrath, Laura B., Devin Higgins, and Arend Hintze. “Measuring Modernist Novelty.” Journal of Cultural Analytics 3, no. 1 (2018).
  60. Meehan, James R. “TALE-SPIN, an Interactive Program That Writes Stories.” IJCAI 77 (1977): 91–98.
  61. Mendenhall, T. C. “The Characteristic Curves of Composition.” Science (1887).
  62. Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. “Efficient Estimation of Word Representations in Vector Space.” ICLR (2013).
  63. Moon, Taesun, and Jason Baldridge. “Part-of-Speech Tagging for Middle English through Alignment and Projection of Parallel Diachronic Texts.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), 390–99. Prague: Association for Computational Linguistics, 2007.
  64. Mostafazadeh, Nasrin, Nathanael Chambers, Xiaodong He, Devi Parikh, Dhruv Batra, Lucy Vanderwende, Pushmeet Kohli, and James Allen. “A Corpus and Evaluation Framework for Deeper Understanding of Commonsense Stories.” NAACL (2016).
  65. Mosteller, Frederick, and David L. Wallace. Inference and Disputed Authorship: The Federalist. Boston: Addison-Wesley, 1964.
  66. Mueller, Martin. “WordHoard,” 2015. https://wordhoard.northwestern.edu/userman/martin-data.html.
  67. Muzny, Grace, Mark Algee-Hewitt, and Dan Jurafsky. “Dialogism in the Novel: A Computational Model of the Dialogic Nature of Narration and Quotations.” Digital Scholarship in the Humanities 32 (July 2017).
  68. Muzny, Grace, Michael Fang, Angel Chang, and Dan Jurafsky. “A Two-Stage Sieve Approach for Quote Attribution.” In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics. Volume 1: Long Papers, 460–70. Valencia, Spain: Association for Computational Linguistics, 2017.
  69. Naik, Aakanksha, and Carolyn Rose. “Towards Open Domain Event Trigger Identification Using Adversarial Domain Adaptation.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 7618–24. Online: Association for Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.acl-main.681.
  70. Nelson, Laura K. “Computational Grounded Theory: A Methodological Framework.” Sociological Methods & Research 49, no. 1 (2020): 3–42.
  71. Pagel, Janis, and Nils Reiter. “GerDraCor-Coref: A Coreference Corpus for Dramatic Texts in German.” In Proceedings of the 12th Language Resources and Evaluation Conference, 55–64. Marseille: European Language Resources Association, 2020. https://www.aclweb.org/anthology/2020.lrec-1.7.
  72. Papalampidi, Pinelopi, Frank Keller, and Mirella Lapata. “Movie Plot Analysis via Turning Point Identification.” In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 1707–17. Hong Kong: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/D19-1180.
  73. Passarotti, Marco. “Verso Il Lessico Tomistico Biculturale. La Treebank Dell’Index Thomisticus.” In Il Filo Del Discorso. Intrecci Testuali, Articolazioni Linguistiche, Composizioni Logiche. Atti Del XIII Congresso Nazionale Della Società Di Filosofia Del Linguaggio, Viterbo, Settembre 2006, edited by Raffaella Petrilli and Diego Femia, 187–205. Rome: Aracne Editrice, Pubblicazioni della Società di Filosofia del Linguaggio, 2007.
  74. Pekar, Viktor, Juntao Yu, Mohab El-karef, and Bernd Bohnet. “Exploring Options for Fast Domain Adaptation of Dependency Parsers.” In Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages (SPMRL-SANCL 2014), 54–65. 2014.
  75. Pennacchiotti, Marco, and Fabio Massimo Zanzotto. “Natural Language Processing across Time: An Empirical Investigation on Italian.” In Advances in Natural Language Processing, 371–82. Springer, 2008.
  76. Pennington, Jeffrey, Richard Socher, and Christopher Manning. “GloVe: Global Vectors for Word Representation.” In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), 1532–43. 2014.
  77. Piper, Andrew, Richard Jean So, and David Bamman. “Narrative Theory for Computational Narrative Understanding.” In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2021.
  78. Rajpurkar, Pranav, Robin Jia, and Percy Liang. “Know What You Don’t Know: Unanswerable Questions for SQuAD.” In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), 784–89. Melbourne: Association for Computational Linguistics, 2018.
  79. Rajpurkar, Pranav, Jian Zhang, Konstantin Lopyrev, and Percy Liang. “SQuAD: 100,000+ Questions for Machine Comprehension of Text.” In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, 2383–92. Austin, Tex.: Association for Computational Linguistics, 2016.
  80. Rambsy, Kenton, and Peace Ossom-Williamson. Lost in the City: An Exploration of Edward P. Jones’s Short Fiction. Urbana, Ill.: Publishing Without Walls, 2019. https://iopn.library.illinois.edu/scalar/lost-in-the-city-a-exploration-of-edward-p-joness-short-fiction-/index.
  81. Rayson, Paul, Dawn Archer, Alistair Baron, Jonathan Culpeper, and Nicholas Smith. “Tagging the Bard: Evaluating the Accuracy of a Modern POS Tagger on Early Modern English Corpora.” In Proceedings of Corpus Linguistics (CL2007). 2007.
  82. Reagan, Andrew J., Lewis Mitchell, Dilan Kiley, Christopher M. Danforth, and Peter Sheridan Dodds. “The Emotional Arcs of Stories Are Dominated by Six Basic Shapes.” EPJ Data Science 5, no. 1 (2016): 31.
  83. Reeve, Jonathan. “The Henry James Sentence: New Quantitative Approaches,” June 7, 2017. https://jonreeve.com/2017/06/henry-james-sentence/.
  84. Reiter, Nils, Marcus Willand, and Evelyn Gius. “A Shared Task for the Digital Humanities Chapter 1: Introduction to Annotation, Narrative Levels and Shared Tasks.” Journal of Cultural Analytics 4, no. 3 (December 2019).
  85. Rhody, Lisa M. “Topic Modeling and Figurative Language.” CUNY Academic Works, 2012. https://academicworks.cuny.edu/cgi/viewcontent.cgi?article=1557&context=gc_pubs.
  86. Risam, Roopika. “Other Worlds, Other DHs: Notes towards a DH Accent.” Digital Scholarship in the Humanities 32, no. 2 (2016): 377–84. https://doi.org/10.1093/llc/fqv063.
  87. Samberg, Rachael G., and Cody Hennesy. “Law and Literacy in Non-Consumptive Text Mining: Guiding Researchers through the Landscape of Computational Text Analysis.” In Copyright Conversations: Rights Literacy in a Digital World. 2019. https://escholarship.org/uc/item/55j0h74g.
  88. Sandhaus, Evan. “The New York Times Annotated Corpus.” LDC. 2008.
  89. Sap, Maarten, Eric Horvitz, Yejin Choi, Noah A. Smith, and James Pennebaker. “Recollection versus Imagination: Exploring Human Memory and Cognition via Neural Language Models.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1970–78. Online: Association for Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.acl-main.178.
  90. Scheible, Silke, Richard J. Whitt, Martin Durrell, and Paul Bennett. “Evaluating an ‘Off-the-Shelf’ POS-Tagger on Early Modern German Text.” In Proceedings of the 5th ACL-HLT Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities, 19–23. Portland, Ore.: Association for Computational Linguistics, 2011.
  91. Schofield, Alexandra, Laure Thompson, and David Mimno. “Quantifying the Effects of Text Duplication on Semantic Models.” Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers. 2017.
  92. Sims, Matthew, and David Bamman. “Measuring Information Propagation in Literary Social Networks.” In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 642–52. Association for Computational Linguistics, 2020.
  93. Sims, Matthew, Jong Ho Park, and David Bamman. “Literary Event Detection.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 3623–34. Florence, Italy: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/P19-1353.
  94. Sprugnoli, R., and S. Tonelli. “One, No One and One Hundred Thousand Events: Defining and Processing Events in an Inter-Disciplinary Perspective.” Natural Language Engineering 23, no. 4 (2017): 485–506. https://doi.org/10.1017/S1351324916000292.
  95. Sundheim, Beth M. “Overview of the Third Message Understanding Conference.” In Proceedings of the Third Message Understanding Conference. 1991.
  96. Taylor, Ann, and Anthony S. Kroch. “The Penn-Helsinki Parsed Corpus of Middle English.” University of Pennsylvania, 2000.
  97. Taylor, Ann, Arja Nurmi, Anthony Warner, Susan Pintzuk, and Terttu Nevalainen. “Parsed Corpus of Early English Correspondence.” Oxford Text Archive, 2006.
  98. Tenen, Dennis Yi. “Toward a Computational Archaeology of Fictional Space.” New Literary History (2018).
  99. Thompson, Laure, and David Mimno. “Authorless Topic Models: Biasing Models Away from Known Structure.” In Proceedings of the 27th International Conference on Computational Linguistics. 2018.
  100. Underwood, Ted. “Why Literary Time Is Measured in Minutes.” University of Illinois, 2016.
  101. Underwood, Ted. Distant Horizons: Digital Evidence and Literary Change. Chicago: University of Chicago Press, 2019.
  102. Underwood, Ted, David Bamman, and Sabrina Lee. “The Transformation of Gender in English-Language Fiction.” Journal of Cultural Analytics 3, no. 2 (2018).
  103. Vala, Hardik, David Jurgens, Andrew Piper, and Derek Ruths. “Mr. Bennet, His Coachman, and the Archbishop Walk into a Bar but Only One of Them Gets Recognized: On the Difficulty of Detecting Characters in Literary Texts.” In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, 769–74. Lisbon: Association for Computational Linguistics, 2015.
  104. Vishnubhotla, Krishnapriya, Adam Hammond, and Graeme Hirst. “Are Fictional Voices Distinguishable? Classifying Character Voices in Modern Drama.” In Proceedings of the 3rd Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, 29–34. Minneapolis: Association for Computational Linguistics, 2019. https://doi.org/10.18653/v1/W19-2504.
  105. Walker, Christopher, Stephanie Strassel, Julie Medero, and Kazuaki Maeda. “ACE 2005 Multilingual Training Corpus.” LDC. 2006.
  106. Wallace, Byron. “Multiple Narrative Disentanglement: Unraveling Infinite Jest.” In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 1–10. Montreal: Association for Computational Linguistics, 2012. https://www.aclweb.org/anthology/N12-1001.
  107. Walsh, Melanie, and Maria Antoniak. “The Goodreads ‘Classics’: A Computational Study of Readers, Amazon, and Crowdsourced Amateur Criticism.” Journal of Cultural Analytics 6, no. 2 (2021).
  108. Weischedel, Ralph, Sameer Pradhan, Lance Ramshaw, Jeff Kaufman, Michelle Franchini, Mohammed El-Bachouti, Nianwen Xue, et al. “OntoNotes Release 5.0.” LDC. 2012.
  109. Wilmot, David, and Frank Keller. “Modelling Suspense in Short Stories as Uncertainty Reduction over Neural Representation.” In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 1763–88. Online: Association for Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.acl-main.161.
  110. Wolfe, Erin. “Natural Language Processing in the Humanities: A Case Study in Automated Metadata Enhancement.” Code4lib 46 (2019).
