Anyone who does text analysis in the humanities—or indeed, any kind of computational work in the humanities—has committed themselves to two kinds of conversations, two kinds of ethical frameworks, and two kinds of pedagogies.
The first kind of conversation is technical, and it involves something like the process of “getting it right.” There are good ways to write software and bad ways. There are algorithms that work in a given situation and those that do not. There are best practices, design patterns, measures of statistical significance, and concerns about extensibility, reproducibility, fault-tolerance, and design that anyone working with digital materials needs to know and heed. It requires conversation, because the best way to be heedless and unknowing is to ignore the advice of others. It is an ethical commitment, because software needs to do what it says on the tin and behave as transparently as possible. It is a pedagogical commitment, because we have to pass these skills and habits of mind on to our students.
The second kind of conversation arises—or should arise—from the fact that we undertake these activities not at a corporation, or in a scientific laboratory, or as part of the (entirely distinct) research agendas of computer science and engineering, but within the context of humanistic inquiry. Here, the conversation is about the nature of the human condition and its artistic and historical artifacts. The ethics of humanistic inquiry demands that we treat our questions as always being fundamentally rhetorical in nature, if only as a way to respect the complexity of human culture. The pedagogy of the humanities follows on the heels of this ethic. We try to teach our students to think deeply, to think of things as being this and this, to be comfortable with the open-ended and the unresolved. Where the technologist speaks of progress, the humanist might speak (with like enthusiasm) of modulation and change.
There are many who think scatter plots filled with data points drawn from, say, English novels, are already a crime against the humanities: the death of all that is good and pure about humanistic study. For them, the problem is positivism in its properly technical sense. They fear an epistemology that does not merely value empirical data, but which (in its extreme philosophical forms) considers empirical data to be the only valid form of evidence. They imagine a computationally driven history or French literature curriculum that forsakes the ancient circle of the seminar for the modern angles of the server room. They imagine humanistic conversation debasing itself in the form of technical cavils, humanistic ethics becoming nothing more than “practical business ethics,” and teaching degenerating into mere training.¹
I believe these fears are not so much unwarranted as they are grossly overblown. They succumb to a shrill rhetoric that presupposes a forking path between one activity and another—as if the use of computers automatically entailed the foreclosure of humanistic discussion and summary commitment to a set of exaggerated epistemologies (chiefly scientism, and again, positivism).
Certainly, it is possible to regard text analysis as a way to settle humanistic questions once and for all. It is likewise possible to replace those ethical frameworks that have traditionally been regarded as more congenial to humanistic discourse with a barren form of utilitarianism. And while training is simply necessary for computational work, one can easily imagine that training proceeding without any consideration of the humanistic concerns for which that training is intended. What is more, we all operate under the weight of computation as a powerful cultural force. It is not merely Google that is trying to convince us that an understanding of data is “more valid” than other kinds of knowing; the cultural processes that led us to such thinking began with the Enlightenment and, despite numerous countermovements, continue to elicit a kind of Pavlovian response from us when data confronts speculative inquiry.
Digital humanities is at its strongest, though, when it is oriented toward resisting this response. We are aided in that project by our training in the humanities, which serves, above all, to reinforce those conversations, ethical commitments, and pedagogies proper to the nature of humanistic discourse. Our students may find it easy to think of software as somehow “culturally neutral” or scatter plots as inherently “more valid,” but we, simply by virtue of being members of humanistic disciplinary communities, find it almost impossible to do so. The choice we face is therefore not between scientism and humanism, but between a willingness to allow digital objects—including those that deal with empirical data—to participate fully in humanistic discussions according to the terms of those discussions, and a dismissal of digital work as inherently incompatible with those discussions.
I have always lobbied for a “ludic” approach to text analysis: one that, following the work of Jerome McGann and others, seeks to twist and deform data in order to provoke discussion.² But at heart, this is not so much a methodology as a way for us to frame the nature of less extreme provocations (including traditional literary study). When Matthew Jockers tells us that there are “six, or possibly seven, archetypal plot shapes” in the English novel,³ he is doing something that from a technical standpoint may be right or wrong. From a humanistic standpoint, he can only be doing something right by asking us to consider a new object, a new provocation, a new arrangement, a new thing to interrogate. The real failure would not be a result that is deemed incorrect, or not interesting, or theoretically flawed. The real failure would be the decision to banish this kind of work from all consideration as humanistic scholarship. To do so would be to succumb to what David Golumbia has called, with opprobrium, the “cultural logic of computation” (Cultural Logic). It would amount to an admission that data trumps story, that logic is more useful than the human experience, and that the server room can exist without the seminar room.
1. For a useful précis of various objections within the academy (as well as a pointed counterargument), see Kirschenbaum. For objections from without, see, for example, Adam Kirsch, “Technology Is Taking over English Departments: The False Promise of the Digital Humanities.” The New Republic, May 2, 2014, http://www.newrepublic.com/article/117428/limits-digital-humanities-adam-kirsch/.
2. See especially McGann, Radiant Textuality. My own work on the subject appears mainly in Reading Machines (Ramsay).
3. Matthew L. Jockers, “The Rest of the Story,” author’s blog, February 25, 2015. http://www.matthewjockers.net/2015/02/25/the-rest-of-the-story/. Jockers’s blog contains numerous posts about Syuzhet—the R package he developed for conducting sentiment analysis.
Golumbia, David. The Cultural Logic of Computation. Cambridge, Mass.: Harvard University Press, 2009.
Kirschenbaum, Matthew. “What Is ‘Digital Humanities,’ and Why Are They Saying Such Terrible Things about It?” Difference 24, no. 1 (2014): 46–63. https://mkirschenbaum.files.wordpress.com/2014/04/dhterriblethingskirschenbaum.pdf.
McGann, Jerome. Radiant Textuality: Literature after the World Wide Web. New York: Palgrave, 2004.
Ramsay, Stephen. Reading Machines: Toward an Algorithmic Criticism. Urbana–Champaign: University of Illinois Press, 2011.