Computational Humanities: Computation and Hermeneutics

Table of Contents
  1. Cover
  2. Title Page
  3. Copyright Page
  4. Contents
  5. Introduction. What Gets Counted: Computational Humanities under Revision | Lauren Tilton, David Mimno, and Jessica Marie Johnson
  6. Part I. Asking With
    1. 1. Computation and Hermeneutics: Why We Still Need Interpretation to Be by (Computational) Humanists | Hannah Ringler
    2. 2. Computing Criticism: Humanities Concepts and Digital Methods | Mark Algee-Hewitt
    3. 3. Born Literary Natural Language Processing | David Bamman
    4. 4. Computational Parallax as Humanistic Inquiry | Crystal Hall
    5. 5. Manufacturing Visual Continuity: Generative Methods in the Digital Humanities | Fabian Offert and Peter Bell
    6. 6. Maps as Data | Katherine McDonough
    7. 7. Fugitivities and Futures: Black Studies in the Digital Era | Crystal Nicole Eddins
  7. Part II. Asking About
    1. 8. Double and Triple Binds: The Barriers to Computational Ethnic Studies | Roopika Risam
    2. 9. Two Volumes: The Lessons of Time on the Cross | Benjamin M. Schmidt
    3. 10. Why Does Digital History Need Diachronic Semantic Search? | Barbara McGillivray, Federico Nanni, and Kaspar Beelen
    4. 11. Freedom on the Move and Ethical Challenges in the Digital History of Slavery | Vanessa M. Holden and Joshua D. Rothman
    5. 12. Of Coding and Quality: A Tale about Computational Humanities | Julia Damerow, Abraham Gibson, and Manfred D. Laubichler
    6. 13. The Future of Digital Humanities Research: Alone You May Go Faster, but Together You’ll Get Further | Marieke van Erp, Barbara McGillivray, and Tobias Blanke
    7. 14. Voices from the Server Room: Humanists in High-Performance Computing | Quinn Dombrowski, Tassie Gniady, David Kloster, Megan Meredith-Lobay, Jeffrey Tharsen, and Lee Zickel
    8. 15. A Technology of the Vernacular: Re-centering Innovation within the Humanities | Lisa Tagliaferri
  8. Acknowledgments
  9. Contributors

Chapter 1

Computation and Hermeneutics

Why We Still Need Interpretation to Be by (Computational) Humanists

Hannah Ringler

In Graphesis, Johanna Drucker uses train timetables as an example of a visual form that produces knowledge as well as represents it. A train timetable, on the one hand, simply represents the departure times of various trains, but it also allows its reader to work out various itineraries for how and when to get from point A to point B. These are not itineraries that the table itself provides; rather, they are knowledge produced by the reader from interpreting the table for their own purposes (perhaps to determine if their train is late or to evaluate the timeliness of a certain route). This example separates out the how of interpretation, which usually becomes salient when we do not know how to do it: many of us have likely stood confused in front of a train schedule when traveling, knowing what a train schedule is but lacking the skills of those around us to extract the information we need.

In this chapter, I argue for a focus on the how of interpretation, or for developing hermeneutics. In computational humanities work, we frequently work with new tools and forms of data for which a method of interpretation is not merely implicit but has yet to be developed at all. For example, if we can distinguish two corpora using a logistic regression model or stylometric analysis, how do we move from those tables of numbers to insightful humanistic claims? How do you go about interpreting a table of weighted words or “distances” between pairs of texts in order to learn about the texts themselves? While new tools may construct new reads on artifacts like texts, the data they produce does not speak for itself or tell us what it means. To be computational humanists, whose work fits within the broader scope of the humanities, our computation needs a hermeneutics to match.
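The interpretive gap described above can be made concrete with a small sketch. The toy Python example below (corpora and all names invented for illustration) ranks words by how strongly they distinguish two corpora, using a smoothed log-ratio of relative frequencies as a simple stand-in for the per-word coefficients a logistic regression model would produce. The output is precisely the kind of "table of weighted words" that still demands interpretation:

```python
import math
from collections import Counter

def word_weights(corpus_a, corpus_b):
    """Rank words by how strongly they distinguish corpus A from corpus B,
    using a smoothed log-ratio of relative frequencies: a simple stand-in
    for the per-word coefficients a logistic regression would produce."""
    counts_a = Counter(w for doc in corpus_a for w in doc.lower().split())
    counts_b = Counter(w for doc in corpus_b for w in doc.lower().split())
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    vocab = set(counts_a) | set(counts_b)
    weights = {}
    for w in vocab:
        # Add-one smoothing so words absent from one corpus still get a weight.
        p_a = (counts_a[w] + 1) / (total_a + len(vocab))
        p_b = (counts_b[w] + 1) / (total_b + len(vocab))
        weights[w] = math.log(p_a / p_b)
    # Positive weights lean toward corpus A, negative toward corpus B.
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Invented toy corpora: the result is a ranked "table of weighted words."
ranked = word_weights(
    ["the whale the sea the ship", "the whale and the harpoon"],
    ["the parlor and the estate", "the estate and the letter"],
)
```

The ranked list says which words separate the corpora, but not why they do, and not what that separation means for the texts; that remaining step is the hermeneutic work at issue in this chapter.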

Toward developing hermeneutics for computation, my goals with this piece are twofold. First, I hope to clarify the need for and the challenge of developing theories of interpretation by distinguishing two separate phases of interpretation: interpreting artifacts through tools, and interpreting the output of tools toward humanistic claims. Second, I want to focus on this second phase of interpretation as a next major area of growth for computational humanities work and offer a framework for the types of questions that might be asked when interpreting the output of tools. In particular, I outline how we can think about hermeneutics in computational humanities as asking questions with and about digital tools. Ultimately, developing tools in tandem with careful thinking about how to interpret them not only opens up those methods to a broader humanities audience but is also a necessary part of computational method building, making our processes transparent and our claims insightful to the broader humanities.

Interpretation, Knowledge Production, and Disciplinarity

Interpretation, hermeneutics, and methods of knowledge production more generally have a long and varied history that dates back at least to the ancient Greek Sophists. The history of hermeneutics even just in computational humanities work is too long and complex to treat thoroughly here (see van Zundert for an excellent review), but thinking through a few key concepts is useful for clarifying some of what developing hermeneutics for computational work might entail. 

In particular, a brief example drawn from social scientists Bruno Latour and Steve Woolgar is useful for thinking theoretically about method, tools, and the various forms and complexities of interpretation. In Laboratory Life, their book-length study on knowledge production, Latour and Woolgar are interested in how a biology lab goes about constructing facts, by which they do not mean to construe knowledge as completely relativistic but rather to highlight that facts have a process by which they come to be. They provide a whole narrative of the lengthy process by which biologists in one particular lab went about determining the structure of a certain hormone: first by using one tool and coming to a conclusion based on those results, then learning of a problem with that method that threw its accuracy into question and invalidated their conclusion, and finally devising a new tool altogether. The biologists eventually were satisfied with the results of the new method, finding it conclusive as to the hormone’s structure because it “eliminate[d] all but a few possibilities” as to the structure, while other methods only eliminated a few possibilities at a time and thus were not conclusive enough (146). At this point, the new conclusion about the hormone’s structure was taken as fact in the wider scientific community. 

This example, especially coming from a biology lab where method is so apparent, is helpful for distinguishing three key terms: tool, interpretation (which appears in two forms), and method. Tools are identifiable in this example as that which produces some data about the artifact (note that “data” is not a neutral term and has been used to refer to a wide variety of things, e.g., the text-as-data movement; in this discussion, I use “data” in its more common contemporary sense to refer to a tool’s output). The use of a tool is itself one form of interpretation, though: the biologists here imposed an interpretive screen in the form of a tool that could produce some data about the hormone, thus reimagining and interpreting the hormone as data. The use of this tool was a choice (indeed, they developed the second one specifically for investigating this hormone), shaped by their evolving beliefs about what kind of data would be helpful for answering their questions. After imposing an interpretive screen, though, the biologists moved to a second step of interpretation: they interpreted that data toward a claim. The movement from tool usage to claim might normally go unnoticed, but it is especially salient in this example because it was at first misled by a mistaken belief about how accurate and conclusive the method was. When the biologists decided to use a new tool to determine the structure, they did so believing that it eliminated enough other possibilities that its results could be taken as conclusive.
Their conclusions were shaped by their beliefs at the given time, their trust in tools, and their weighing of possible other alternatives: “It is thus important to realise that when a deduction is said not to be logical, or when we say that a logical possibility was deflected by belief, or that other deductions later became possible, this is done with the benefit of hindsight, and this hindsight provides another context within which we pronounce on the logic or illogic of a deduction” (Latour and Woolgar, 136). The entirety of these decisions—from choosing artifacts to using tools and interpreting toward claims—and all of the socially and culturally dependent decisions around them comprise the method.

Knowledge production methods as a whole, including these two forms of interpretation, are processes guided by social norms and beliefs. The biology example demonstrates how methodological decisions were made communally by those in the discipline based on prevailing beliefs; more generally, though, the design of tools and hermeneutical decisions about how to make sense of a tool’s output are shaped by social beliefs rather than neutrally objective or intuitive (Rockwell and Sinclair). Knowledge production itself is heavily informed and shaped by the social and historical rather than by disinterested and objective observation, especially in decisions around how to move from tool usage to claim or interpretation. Theoretically speaking, thinking about knowledge as socially constructed in this way became more accepted in the mid-twentieth century, post-Kant, and especially in the aftermath of World War II with key thinkers like Thomas Kuhn, Michel Foucault, Karl Popper, Tara McPherson, and others.1

The collective decisions about what types of tools to use, how to design them, and how to interpret their outputs to come up with claims are made by social communities known as disciplines. Focusing on the social and historical forces of knowledge production allows us to conceptualize disciplines as communities with accepted “active ways of knowing” rather than “repositories and delivery systems for relatively static content knowledge” (Carter, 387). In effect, this reconceptualization of disciplines as ways of constructing knowledge makes the ways we interpret data an integral part of methods, and of how we define a discipline and its work, rather than conceiving of method narrowly as data gathering. While the state of digital humanities as a discipline is often shifting, it is nonetheless a social community engaging in similar types of work, often oriented around methods and the digital, and thus it is in a position to theorize about and make these types of decisions.

Computation and Hermeneutics

As an area of study situated within the broader digital humanities, computational humanities is a space to think carefully about what “ways of knowing,” hermeneutics, or interpretive strategies might define the humanities’ use of computational tools. Exciting work in computational humanities thus far has focused on the first phase of interpretation, or interpreting artifacts through tools. That is, somewhere in the messy, iterative, and sometimes playful process of computational analyses, we run some form(s) of a computational tool that produces data in the form of numbers or visualizations—perhaps we create a ranked feature list, a network visualization, a clustering of texts, or a set of topics from a topic model. Geoffrey Rockwell and Stéfan Sinclair refer to these tools as “analytical tools” and explain that they are “instantiations of interpretive methods” in that they focus our attention on certain elements of texts or other artifacts to think about them and through them in new ways (4). As an example, Jacques Savoy visualized the stylistic differences between nine 2016 presidential candidates’ speeches, allowing for a view of not just different groups of candidates but a representation of their relationships and magnitudes of differences at a glance. This work imposed an interpretive screen on those texts, made possible through the tool by focusing attention on particular features to see them in a new way. As analysts, we might try on several of these tools, reading and rereading our artifacts with different tools and views until we figure out interesting questions to ask and how to answer them (Rockwell and Sinclair, 169–87).
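To make the notion of stylometric "distances" concrete, here is a minimal Python sketch in the spirit of Burrows's Delta, a standard stylometric measure (not necessarily the one Savoy used; texts and parameters are invented). It z-scores the relative frequencies of the collection's most frequent words and averages the absolute differences for each pair of texts:

```python
import math
from collections import Counter

def delta_distances(texts, n_words=30):
    """Pairwise stylistic distances in the spirit of Burrows's Delta:
    z-score the relative frequencies of the most frequent words across
    all texts, then average the absolute differences for each pair."""
    freqs = []
    for t in texts:
        words = t.lower().split()
        c = Counter(words)
        freqs.append({w: c[w] / len(words) for w in c})
    # The most frequent words across the whole collection serve as features.
    overall = Counter()
    for f in freqs:
        overall.update(f)
    features = [w for w, _ in overall.most_common(n_words)]

    def zscores(feature):
        # Standardize one feature's frequencies across all texts.
        vals = [f.get(feature, 0.0) for f in freqs]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals)) or 1.0
        return [(v - mean) / sd for v in vals]

    z = {w: zscores(w) for w in features}
    dist = [[0.0] * len(texts) for _ in texts]
    for i in range(len(texts)):
        for j in range(len(texts)):
            dist[i][j] = sum(abs(z[w][i] - z[w][j]) for w in features) / len(features)
    return dist
```

The matrix this returns is exactly the kind of output discussed below: it can show that two texts are stylistically close, but saying why they are close, and what that closeness means, requires a further interpretive step.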

While these tools can construct new reads on texts, they leave open what those new pieces of data mean. A study like Savoy’s is useful for its new visualization of candidates’ possible relations to each other, but it also introduces a whole new realm of questions that are hard to answer: Why did candidates group together in the ways they did? Does it relate to the topics they focused on? The values they believed their bases to have? And if so, how? The clustering itself can be used as a jumping-off point for thinking more critically about what makes up verbal style and how it might be connected to topic, genre, beliefs about audience, values, and so forth. These questions are also interpretive questions, but of a different kind than the interpretive screen the tool provides. Answering them requires developing different hermeneutical processes to interpret the data. In addition, the use of computational tools not only changes the kinds of interpretive questions we might ask but also complicates how we answer them: Who does this kind of interpretation, and how is it wrapped up in issues of power and identity (D’Ignazio and Klein)? As such, we must think critically about how interpretation happens and avoid treating it as a neutral process wherein ideas simply emerge from the tools and their output themselves (Bode).

When we use computational tools as part of methods in the humanities, then, they may help us to explore and impose new kinds of interpretive screens on our artifacts, but they ultimately stop short of interpreting the data they produce. To undertake “algorithmic criticism” wherein tools enable critical engagement (Ramsay), it is important to recognize that doing so requires careful theorizing not just about how the tool is built and used but also about how we engage interpretively with its output. As computational humanists, if we want to contribute to humanistic inquiry, computational analysis is only useful for us insofar as it is paired with a careful interpretation of data that aligns with the humanistic tradition. In other words, our computation needs a hermeneutics. To stop short of engaging with hermeneutical questions, or to engage only with questions about how tools act as interpretive screens rather than interpreting their outputs toward claims, is to occupy a different disciplinary space of tool-building rather than holistic method-building. For while methods involve the whole range of activities from artifact-gathering to claim-making (all of them filled with interpretive choices), methods in the humanities, even if digital, must move us toward humanistic claims, and that requires engaging with these difficult interpretive challenges as part of our broader methodological engagement with tools.

A large amount of scholarship in computational humanities thus far has engaged well with the first phase of interpretation, interpreting artifacts through tools. Taylor Arnold and Lauren Tilton explain how computational tools allow for exploring data in new ways, which is evident in their work on Photogrammar as well as in other technical approaches like Maciej Eder’s influential work thinking through the complexities of visual techniques in stylometry. While this is a necessary and frequently challenging theoretical space, interpreting the output of those tools is a separate phase to theorize that still needs work. Ryan Heuser and Long Le-Khac, for example, narrate how challenging it is to find yourself face-to-face with a table of data to make meaning of, and Sculley and Pasanek point to the difficulties of making sense of outputs given the assumptions built into computational tools. Traditional humanities work has such a long history of engaging with hermeneutics and theories of interpretation that analyses may not even mention them explicitly (e.g., Clement notes that close reading often goes unquestioned as a methodological approach in textual scholarship). Given the relative newness of many computational tools, though, their hermeneutics could benefit from being made clearer. Computational humanities has an opportunity to develop these theories in partnership with tool-builders such that the analyses produced can speak fruitfully to humanistic inquiry.

As a step toward developing these hermeneutics more fully as a field, I offer in the next section a framework for the types of questions that might be asked when interpreting the output of tools in computational humanities work. Some initial scholarship has begun theorizing, at a higher level, a hermeneutics for interpreting the output of tools. For example, Andrew Piper conceptualizes the movement back and forth from close to distant reading, Jo Guldi outlines a process called “critical search,” and Heuser and Le-Khac explore how we can think of interpretation as hypothesis-testing. In earlier work, I discussed the process of building up evidence toward an understanding as a way of constructing arguments in the face of uncertainty (Ringler). In considering more broadly the types of interpretive questions we might encounter in analyses, I join David Bamman and Mark Algee-Hewitt in making space, in this volume and in the computational humanities, for more critical thought about the hermeneutical processes we need to continue developing and the vocabulary and structure needed to define what this development might look like for future methodological work. More broadly, focusing on this second step of interpretation, and on how it contributes to what Lincoln Mullen calls a “braided narrative” between method and interpretation, speaks to the distinction that Dennis Tenen makes between methods and tools. It also responds to concerns raised by scholars like Alan Liu and Tanya Clement about connecting method and theory. Developing tools in tandem with careful thinking about how to interpret them makes these methods more accessible to a broader humanities audience. It is also a necessary part of computational method exploration, making our processes transparent and our claims insightful to the broader humanities community.

A Framework for Humanistic Inquiry in Computational Humanities

As we further develop our hermeneutics for computation and focus on the second step of interpretation, what might that look like? What does interpreting the outputs of tools look like? To a large extent, this varies based on specific disciplines: the types of research questions and trajectories, methods, and orientation to the humanities broadly will affect how the results of a tool are interpreted and what counts as productive interpretation and claims for a disciplinary community (Robertson). Despite this level of variation, though, we can think generally about what interpretation of computational data in the humanities looks like based on the work being produced in computational humanities spaces now. In this section, I sketch a framework for the types of hermeneutical questions that might be asked of computational methods in the humanities. While definitions of digital humanities abound, Kathleen Fitzpatrick’s is useful for structuring this particular framework because it clarifies the broad types of work that digital humanities does: DH both “use[s] computing technologies to investigate the kinds of questions that are traditional to the humanities” and “ask[s] traditional kinds of humanities-oriented questions about computing technologies.” This definition poses two categories of questions—asking questions with and about computing technologies—which is a productive frame for thinking about the types of interpretation we do with computation in the humanities.

Asking With

To ask questions with computing technologies is akin to what Rockwell and Sinclair describe when they explain hermeneutics in digital humanities as an iterative process between exploring and hypothesis testing. In this process, an analyst starts with artifacts, asks humanistic questions of them, and uses tools to help them answer those questions. Text analysis is thus the practice of “re-reading a text in different ways with the assistance of computers that make it practicable to ask formalized questions and get back artificial views” (Rockwell and Sinclair, 189). While various tools might help with broad exploration of the artifacts, as with an exploratory data analysis (EDA) framework (Tukey; Arnold and Tilton), the driving focus is the analytical humanistic question, and the tool helps answer it by pointing to specific features of interest. As we explore new tools, though, and how they interact with messy, humanistic artifacts, the key challenge for computationalists is to figure out how to interpret the tool’s output to make an interpretive claim. The humanities are fundamentally about complicating, interpreting, and understanding the messy human experience (MacDonald), which means the challenge for working with computational methods is to interpret numerical models to create well-supported insights into humanities areas. This challenge is not trivial but is fundamental to using computation in humanistic spaces. 

A few examples are helpful for demonstrating what it looks like to ask questions with computing technologies, and especially what it looks like to think carefully about how to interpret the results of tools toward research questions. As one useful illustration, a piece by David Kaufer and Shawn Parry-Giles uses a corpus text analysis tool called DocuScope (Kaufer and Ishizaki) to identify different identities that Hillary Clinton displays in her two memoirs, Living History and Hard Choices. They ultimately find seven distinct identities that Clinton enacts (e.g., litigator, political visionary, storyteller) and contextualize these identities within larger discussions of political memoirs as a genre and how these identities were strategic choices in her bid for the presidency and her role as a woman politician. They use these multiple identities to explain that women candidates in particular are “chronically ‘double-binded’ into ‘personality problems’ that impact voters’ attachment to them” as they balance creating emotional connections with voters and staying socially “correct” and “appropriate” (Kaufer and Parry-Giles, 22). Ultimately, the authors can make claims about the rhetorical strategies that Hillary Clinton used in her memoirs and how those identities speak to her complicated position as a female politician. 

DocuScope, the tool used in this analysis, is a freely available dictionary-based corpus tool.2 It classifies words and phrases into rhetorical patterns that can be counted. Factor analysis can then reveal which rhetorical patterns statistically occur together in texts, and these factors can be interpreted as rhetorical strategies. This piece was published in the Quarterly Journal of Speech, which does not regularly publish heavily computational work, so the authors take some time in the piece to explain how they go about interpreting the factors and the results that their tool produces:

The computer harvesting of latent factors (e.g., litigator versus storytelling prose) in texts identifies patterns that the human reading brain can evaluate post hoc but cannot reliably isolate and harvest systematically across hundreds and thousands of pages of text. . . . This harvesting supplies readers with a “third eye” able to discern recurrent textual patterns that remain invisible even to the most observant critic. Such systematic linkages between macrorhetorical strategies and microrhetorical patterns guards against selection bias and ensures that the passages selected for evidence are not cherry-picked but fill in defining components systematically dispersed across both memoirs. (Kaufer and Parry-Giles, 22)

In their analysis, then, the authors use the tool to identify recurring patterns that may go unnoticed by a normal serial reader. By having attention drawn to these recurring patterns, the authors are better poised to look at the text-level patterns as a whole and interpret them as “macrorhetorical strategies.” These macro strategies are the ones that they ultimately name as different identities. In this example, the tool itself does some level of interpretation in its categorization of words and phrases into particular rhetorical categories and the statistical representation of those categories in factors. But we then see an explicit second step of interpretation, wherein the authors, conceptualizing the tool’s harvesting as a “third eye” on the text, make their own interpretations of the factors based on their readings of the texts and understandings of their contexts. The second step of interpretation is somewhat closer to a traditional, non-corpus rhetorical analysis in its close focus on context, identity, and rhetorical situation but involves a different interpretive process due to the nature of the data produced by the method. In particular, the authors return to serially reading representative pieces of the text with an intense focus on certain types of vocabulary (highlighted visually by the tool) and think critically about how the vocabulary types in one factor might collectively support a particular identity. The various identities of Clinton, then, are a result of the authors’ interpretations of the tool’s output, rather than something created through just the tool itself.
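The mechanics of a dictionary-based tool can be sketched briefly. The Python toy below uses an invented three-category dictionary (far smaller than DocuScope's actual categories) to count rhetorical-category hits per text, then measures which categories co-vary across texts with a Pearson correlation, a simplified stand-in for the factor analysis the authors describe. Interpreting why categories cluster remains the analyst's job:

```python
import math

# A toy rhetorical dictionary: invented categories standing in for the
# much richer category system of a tool like DocuScope.
DICTIONARY = {
    "first_person": {"i", "me", "my", "we", "our"},
    "legal": {"court", "law", "testimony", "case"},
    "narrative": {"then", "story", "remember", "once"},
}

def category_counts(text):
    """Length-normalized hit rate for each rhetorical category: the kind
    of per-text data a dictionary-based tool produces."""
    words = text.lower().split()
    return {cat: sum(w in vocab for w in words) / len(words)
            for cat, vocab in DICTIONARY.items()}

def correlation(xs, ys):
    """Pearson correlation between two category-frequency series; a full
    analysis would use factor analysis to group co-occurring patterns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0
```

A positive correlation between, say, "legal" and "first_person" frequencies says only that the patterns travel together; naming that cluster an identity such as "litigator" is the second, humanistic step of interpretation described above.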

Another piece, by Marissa Gemma, Frédéric Glorieux, and Jean-Gabriel Ganascia, uses slightly more traditional computational methods on texts but also highlights this second step of interpretation. The authors explain that literary critics have agreed that a distinctly more colloquial American literary style emerged in the postbellum period, but that we do not know much about what linguistic features make up that style. They frame their study as a test of the claim about the emergence of a more colloquial style in American literature and as an investigation of the features that make up that style, focusing specifically on the prevalence of repetition. In the first part of the study, they find that American fiction does increase in its rate of repetition overall, but the results are inconclusive: British fiction shows an increase in repetition by one measure but no increase by another. The authors therefore look for tests that will provide “more interpretable historical results” (321). Based on the first results, they use different measures and find that repetition actually increases as more speech is represented in writing, and that there is a clear increase in this feature over time, though it remains ambiguous whether it is unique to American fiction.

This study does not necessarily confirm the hypothesis that a colloquial style is unique to American fiction, but it offers novel insight into trends in American and possibly British fiction in the long nineteenth century. In particular, because the repetition results seemed inconclusive at first, the authors were prompted to explore new metrics, which allowed them to discover that represented speech (and therefore repetition) increased in American and British fiction in this period. Ultimately, the authors are able to explain what a colloquial style means in fiction by demonstrating how represented speech and dialogue increased. The claim here is ultimately one about colloquial style in American and British fiction, discovered through interpretation by the tool (i.e., counting repetition in novels over time) and interpretation of the tool’s output (i.e., using the results of which words were repeated frequently to map this trend onto represented speech and dialogue between characters, and connecting this to broader literary criticism about colloquial style in the long nineteenth century).
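For readers curious what "measuring repetition" can mean operationally, here is one minimal Python sketch: the fraction of words that echo an earlier word within a short preceding window. This is an illustrative measure, not necessarily the metric Gemma, Glorieux, and Ganascia used, and, as their study shows, any such number still requires interpretation before it says anything about colloquial style:

```python
def repetition_rate(text, window=10):
    """Fraction of words that repeat a word seen within the preceding
    `window` words: one simple, illustrative measure of local repetition
    (not necessarily the exact metric used in the study discussed here)."""
    words = text.lower().split()
    if not words:
        return 0.0
    repeats = sum(1 for i, w in enumerate(words)
                  if w in words[max(0, i - window):i])
    return repeats / len(words)
```

A dialogic passage full of echoes will score higher than flowing narration, which is consistent with the finding that repetition tracks represented speech; but deciding that such scores capture "colloquial style" is a literary-historical argument, not a computation.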

Both pieces (Kaufer and Parry-Giles; Gemma, Glorieux, and Ganascia) explicitly highlight the unique challenges of interpretation in computational work and the development of a new hermeneutics for making sense of these data. With DocuScope, the authors had to grapple with a question of how to interpret factors as rhetorical strategies. How do you look at a list of rhetorical patterns or types of words that cluster into a factor and decide that it suggests a “litigator” identity? Doing so requires a familiarity with how identity instantiates itself rhetorically (especially in regard to public figures and feminist theory) as well as a deep cultural knowledge of how political memoirs function to create public identities. With measuring repetition, how can the authors know how much of the repetition is made up by increased speech? And even if they can, how do we know that this is what literary critics mean when they say fiction seems more colloquial in America by the early twentieth century? Answering these questions requires not only a thorough understanding of American literary history but also of what it means for a novel to be seen as colloquial and how colloquial language takes its form in speech and literature more broadly. 

These kinds of hermeneutical questions that we must grapple with when using computational methods speak deeply to our training as humanists in that they are somewhat messy and not clearly verifiable in statistical terms but rather connect to contextual understandings of the artifacts being worked with. Both of these interpretive challenges appear to have been addressed at least in part by some degree of serial reading of each corpus, with a focus on the features highlighted by the tools and a deep understanding by the authors of not only the tools but also the texts and contexts they were working with. This, then, allowed them to draw on their more traditional humanistic training in their disciplines to interpret the results toward an argumentative claim that fits within the paradigms of rhetorical and literary studies, respectively. These interpretive decisions and processes may be more or less explicit in the papers themselves, but they are the kinds of challenging methodological processes that we are well poised to think about critically in computational humanities as they draw on both the expertise of humanists (in subject matter) and data scientists (in tools) in order to figure out how to create insight from reams of data. 

Asking About

While asking questions with computing technologies puts the analytic questions front and center, asking questions about computing technologies instead highlights the hermeneutic questions more explicitly. Regarding computational tools and methods specifically, this type of question prompts us to think about what particular tools or methods offer us and how they might change or add to our thinking about a concept. For example, the Six Degrees of Francis Bacon project (Warren et al.) reconstructs and visualizes a large social network in early modern Britain. This visualization allows early modernist researchers to think about relationships in entirely new ways by seeing connections between people at scale. Indeed, another social network visualization project, focusing on medieval Scotland, revealed a previously undiscovered role played by Duncan II, Earl of Fife (Jackson). On a more abstract level, visualizations like these ask us to question and redefine our concepts of community and what it means to be part of a social network with others in a specific context. By visualizing these connections, we have to make decisions: How do we decide someone is connected enough to warrant a link in a social network? How connected must someone be? For how long? What kind of interactions count as being connected? The tools themselves prompt a whole variety of reflective questions and ask us to make interpretive choices while also challenging messy, humanistic concepts like community. In this sense, the tool and how we design it raises hermeneutic questions about how to interpret its design and output, which connect explicitly to broader questions that a discipline is already invested in and, as others have demonstrated (e.g., Cox and Tilton), can also prompt new interdisciplinary lines of argument and inquiry.
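The interpretive weight of these design decisions is easy to see in code. In the hypothetical Python sketch below (names and documents invented), two people are linked only if they are mentioned together in at least `min_cooccurrences` documents, so changing that one parameter literally changes who counts as connected:

```python
from collections import Counter
from itertools import combinations

def build_network(document_mentions, min_cooccurrences=2):
    """Build a social network from lists of people mentioned together in
    documents, linking two people only if they co-occur in at least
    `min_cooccurrences` documents. The threshold encodes an interpretive
    choice about how much contact counts as a 'connection.'"""
    pair_counts = Counter()
    for people in document_mentions:
        # Count each unordered pair once per document.
        for a, b in combinations(sorted(set(people)), 2):
            pair_counts[(a, b)] += 1
    return {pair for pair, n in pair_counts.items() if n >= min_cooccurrences}
```

Raising or lowering the threshold grows or prunes the network wholesale; there is no purely technical answer to which setting is "right," only the humanistic judgment about what a connection means in a given historical context.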

Asking questions about computational methods also opens up space for being puzzled by the outputs of tools and investigating deeply why they produce the results that they do. In the last several years, there has been growing interest in interpretable machine learning (Belinkov and Glass; Lipton). This interest points to a class of questions that arise when a computational tool produces a result that seems in some way accurate or matches our understanding of the world, but we are puzzled as to why. For example, a tool may classify texts by author gender very well (thus matching a preexisting category we place on authors), but it may be unclear why this is possible, given the features the tool uses. Many of the more complex computational methods can be described as something of a black box, making it difficult to determine why they produce the results that they do. Hugh Craig raised a version of this kind of question about stylometry methods in a paper’s subtitle: “If you can tell authors apart, have you learned anything about them?” A few explorations of why particular textual features connect to authors or speakers have been taken up inside and outside the context of stylometry (McKenna and Antonia; Argamon, Dodick, and Chase; Pennebaker), but the question is still frequently a puzzling one. 

These kinds of difficult questions about computational methods are interesting, though, because they point to places where we have clear humanistic insight to gain: In stylometry, for example, why is it that certain function words map onto the gender of fiction authors in particular time periods? What would we learn about function words, gender, and language more broadly by being able to answer this question? In his piece on “computing criticism” in chapter 2 in this volume, Algee-Hewitt challenges digital humanists to engage with puzzling models by seeing unexplainable models as an opportunity to reevaluate the artifacts in question rather than assuming the model is an inaccurate or useless representation. Much as the “asking with” examples used patterns to draw the analysts’ attention to certain features in their reading, these puzzling results can be seen as lenses that draw our attention to certain features of texts, images, or other artifacts that warrant further investigation and thought about how they connect to other social aspects of the data. Developing the interpretive processes to answer these kinds of puzzling questions is an active new area of study, and one that requires collaboration and crossover: with computer and data scientists to deeply understand the tools themselves, and with humanists to deeply understand the artifacts and their contexts.
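The stylometric puzzle can be illustrated with a toy sketch: a nearest-centroid classifier over function-word frequencies that separates two invented author groups, followed by the interpretive step of asking which features actually carry the separation. All frequencies, groups, and word lists here are fabricated for illustration and stand in for no real study.

```python
FUNCTION_WORDS = ["the", "of", "and", "she", "he"]

# Per-text relative frequencies, in words per hundred (invented data).
group_a = [
    {"the": 6.1, "of": 3.0, "and": 2.5, "she": 0.9, "he": 0.3},
    {"the": 5.9, "of": 3.1, "and": 2.4, "she": 0.8, "he": 0.4},
]
group_b = [
    {"the": 6.0, "of": 3.0, "and": 2.5, "she": 0.2, "he": 1.0},
    {"the": 6.2, "of": 2.9, "and": 2.6, "she": 0.3, "he": 0.9},
]

def centroid(texts):
    """Average frequency of each function word across a group."""
    return {w: sum(t[w] for t in texts) / len(texts) for w in FUNCTION_WORDS}

ca, cb = centroid(group_a), centroid(group_b)

def classify(text):
    """Assign a text to whichever group centroid it sits closer to."""
    dist_a = sum((text[w] - ca[w]) ** 2 for w in FUNCTION_WORDS)
    dist_b = sum((text[w] - cb[w]) ** 2 for w in FUNCTION_WORDS)
    return "A" if dist_a < dist_b else "B"

# The interpretive move: rank the words by how far the groups diverge.
gaps = sorted(FUNCTION_WORDS, key=lambda w: abs(ca[w] - cb[w]), reverse=True)
```

Here the classifier "works," and inspecting `gaps` shows that the pronouns, not the articles, do the separating. That inspection is where the humanistic question begins: the code can report *which* features differ, but not *why* pronoun use patterns with the category in the first place.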


Using computational tools in the humanities opens up a wide space for thinking about how to interpret data toward humanistic inquiry. Tools themselves can be thought of as interpretive screens on artifacts, but they ultimately stop short of answering questions about what their outputs mean, which is a crucial though often implicit part of knowledge production as a theoretical process. Computational tools offer a new challenge for moving from lens to claim because of the entirely different nature of the lenses themselves. With computational tools in the humanities, then, we need both to keep learning how to apply tools and to develop hermeneutics for different tools and artifacts. 

Asking questions with and about these computational tools offers two ways of thinking about the types of hermeneutical theories we might develop for different tools. Figuring out what the numbers mean, and how to go about doing that, is not a trivial process and should be interdisciplinary. While computational fields are well equipped to construct these tools and understand the assumptions and processes behind them, humanists are poised to think critically about what these tools tell us about artifacts and how interpretive practices fit within a humanistic tradition that acknowledges how feminist, intersectional, postcolonial, and other types of perspectives shape meaning-making. In this sense, computationalists and humanists have a lot to offer each other in considering a broader challenge of how to interpret and make meaning in the age of big data and machine learning. As we build computational humanities, a key component will be further thinking through the entire methodological process by considering tools as interpretation and interpretation of tools as processes that need hermeneutics. Collaboration will be key. Communally developing those hermeneutics explicitly as an integral part of method-building will help us to be more transparent in our research and able to use those tools in ways that contribute productively to the broader humanities and our computational world.

Notes

  1. Considering knowledge as socially constructed rather than objectively emerging has grown out of a huge range of scholarship from different perspectives. A few notable examples include Kuhn, who revealed the social aspects of scientific knowledge developing with his theorization of paradigms; Foucault, who offered a lens on knowledge construction through language; and McPherson, who pointed to the role of social justice and civil rights movements in shaping knowledge.

  2. DocuScope is developed and hosted by Carnegie Mellon University: https://www.cmu.edu/dietrich/english/research-and-publications/docuscope.html.

Bibliography

  1. Argamon, Shlomo, Jeff Dodick, and Paul Chase. “Language Use Reflects Scientific Methodology: A Corpus-Based Study of Peer-Reviewed Journal Articles.” Scientometrics 75, no. 2 (2008): 203–38.
  2. Arnold, Taylor, and Lauren Tilton. “New Data? The Role of Statistics in DH.” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein, 293–99. Minneapolis: University of Minnesota Press, 2019.
  3. Belinkov, Yonatan, and James Glass. “Analysis Methods in Neural Language Processing: A Survey.” Transactions of the Association for Computational Linguistics 7 (2019): 49–72. 
  4. Bode, Katherine. “The Equivalence of ‘Close’ and ‘Distant’ Reading; or, Toward a New Object for Data-Rich Literary History.” Modern Language Quarterly 78, no. 1 (2017): 77–106.
  5. Carter, Michael. “Ways of Knowing, Doing, and Writing in the Disciplines.” College Composition and Communication 58, no. 3 (2007): 385–418. 
  6. Clement, Tanya E. “Where Is Methodology in Digital Humanities?” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein, 153–75. Minneapolis: University of Minnesota Press, 2016.
  7. Cox, Jordana, and Lauren Tilton. “The Digital Public Humanities: Giving New Arguments and New Ways to Argue.” Review of Communication 19, no. 2 (2019): 127–46. 
  8. Craig, Hugh. “Authorial Attribution and Computational Stylistics: If You Can Tell Authors Apart, Have You Learned Anything about Them?” Literary and Linguistic Computing 14, no. 1 (1999): 103–13. 
  9. D’Ignazio, Catherine, and Lauren F. Klein. Data Feminism. Cambridge, Mass.: MIT Press, 2020.
  10. Drucker, Johanna. Graphesis: Visual Forms of Knowledge Production. Cambridge, Mass.: Harvard University Press, 2014.
  11. Eder, Maciej. “Visualization in Stylometry: Cluster Analysis Using Networks.” Digital Scholarship in the Humanities 32, no. 1 (2017): 50–64.
  12. Fitzpatrick, Kathleen. “Reporting from the Digital Humanities 2010 Conference.” ProfHacker, July 13, 2010. http://chronicle.com/blogs/profhacker/reporting-from-the-digital-humanities-2010-conference/25473.
  13. Foucault, Michel. The Archaeology of Knowledge. New York: Pantheon Books, 1972.
  14. Foucault, Michel. The Order of Things: An Archaeology of the Human Sciences. New York: Pantheon Books, 1970. 
  15. Gemma, Marissa, Frédéric Glorieux, and Jean-Gabriel Ganascia. “Operationalizing the Colloquial Style: Repetition in 19th-Century American Fiction.” Digital Scholarship in the Humanities 32, no. 2 (2017): 312–35. 
  16. Guldi, Jo. “Critical Search: A Procedure for Reading in Large-Scale Textual Corpora.” Journal of Cultural Analytics 3, no. 1 (2018).
  17. Heuser, Ryan, and Long Le-Khac. “Learning to Read Data: Bringing Out the Humanistic in the Digital Humanities.” Victorian Studies 54, no. 1 (2011): 79–86.
  18. Jackson, Cornell. “Using Social Network Analysis to Reveal Unseen Relationships in Medieval Scotland.” Digital Scholarship in the Humanities 32, no. 2 (2017): 336–43. 
  19. Kaufer, David, and Suguru Ishizaki. “Computer-Aided Rhetorical Analysis.” In Applied Natural Language Processing and Content Analysis: Advances in Identification, Investigation, and Resolution, edited by Philip M. McCarthy and Chutima Boonthum-Denecke, 276–96. Hershey, Pa.: IGI Global, 2012.
  20. Kaufer, David S., and Shawn J. Parry-Giles. “Hillary Clinton’s Presidential Campaign Memoirs: A Study in Contrasting Identities.” Quarterly Journal of Speech 103, no. 1–2 (2017): 7–32. 
  21. Kuhn, Thomas S. The Structure of Scientific Revolutions. Chicago: University of Chicago Press, 1962. 
  22. Latour, Bruno, and Steve Woolgar. Laboratory Life: The Construction of Scientific Facts. Princeton, N.J.: Princeton University Press, 1986. 
  23. Lipton, Zachary C. “The Mythos of Model Interpretability.” Queue 16, no. 3 (June 2018): 31–57. 
  24. Liu, Alan. “Where Is Cultural Criticism in the Digital Humanities?” In Debates in the Digital Humanities, edited by Matthew K. Gold, 490–510. Minneapolis: University of Minnesota Press, 2012. 
  25. MacDonald, Susan P. Professional Academic Writing in the Humanities and Social Sciences. Carbondale: Southern Illinois University Press, 2010.
  26. McKenna, C. W. F., and Alexia Antonia. “The Statistical Analysis of Style: Reflections on Form, Meaning, and Ideology in the ‘Nausicaa’ Episode of Ulysses.” Literary and Linguistic Computing 16, no. 4 (2001): 353–73. 
  27. McPherson, Tara. “U.S. Operating Systems at Mid-Century: The Intertwining of Race and Unix.” In Race after the Internet, edited by Lisa Nakamura and Peter A. Chow-White, 21–37. New York: Routledge, 2013.
  28. Mullen, Lincoln. “A Braided Narrative for Digital History.” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein, 382–88. Minneapolis: University of Minnesota Press, 2019.
  29. Pennebaker, James W. The Secret Life of Pronouns: What Our Words Say about Us. New York: Bloomsbury Press, 2011. 
  30. Piper, Andrew. “Novel Devotions: Conversional Reading, Computational Modeling, and the Modern Novel.” New Literary History 46 (2015): 63–98.
  31. Popper, Karl R. The Logic of Scientific Discovery. London: Hutchinson, 1959.
  32. Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Champaign: University of Illinois Press, 2011.
  33. Ringler, Hannah. “‘We Can’t Read It All’: Theorizing a Hermeneutics for Large-Scale Data in the Humanities with a Case Study in Stylometry.” Digital Scholarship in the Humanities (2021).
  34. Robertson, Stephen. “The Differences between Digital Humanities and Digital History.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2016.
  35. Rockwell, Geoffrey, and Stéfan Sinclair. Hermeneutica: Computer-Assisted Interpretation in the Humanities. Cambridge, Mass.: MIT Press, 2016. 
  36. Savoy, Jacques. “Analysis of the Style and the Rhetoric of the 2016 US Presidential Primaries.” Digital Scholarship in the Humanities 33, no. 1 (2018): 143–59.
  37. Sculley, David, and Bradley M. Pasanek. “Meaning and Mining: The Impact of Implicit Assumptions in Data Mining for the Humanities.” Literary and Linguistic Computing 23, no. 4 (2008): 409–24.
  38. Tenen, Dennis. “Blunt Instrumentalism: On Tools and Methods.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein, 83–91. Minneapolis: University of Minnesota Press, 2016. 
  39. Tukey, John Wilder. Exploratory Data Analysis. Reading, Mass.: Addison-Wesley, 1977.
  40. van Zundert, Joris J. “Screwmeneutics and Hermenumericals: The Computationality of Hermeneutics.” In A New Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth, 331–47. New York: John Wiley, 2016.
  41. Warren, Christopher N., Daniel Shore, Jessica Otis, Lawrence Wang, Mike Finegold, and Cosma Shalizi. “Six Degrees of Francis Bacon: A Statistical Method for Reconstructing Large Historical Social Networks.” Digital Humanities Quarterly 10, no. 3 (2016).

Copyright 2024 by the Regents of the University of Minnesota