Debates in the Digital Humanities: Blog Post: Text: A Massively Addressable Object | Michael Witmore

PART IV ][ Blog Posts

Text: A Massively Addressable Object

MICHAEL WITMORE

At the Working Group for Digital Inquiry at Wisconsin, we’ve just begun our first experiment with a new order of magnitude of texts. Jonathan Hope and I started working with thirty-six items about six years ago when we began to study Shakespeare’s First Folio plays (Witmore and Hope). Last year, we expanded to three hundred and twenty items with the help of Martin Mueller at Northwestern, exploring the field of early modern drama. Now that the University of Wisconsin has negotiated a license with the University of Michigan to begin working with the files from the Text Creation Partnership (TCP), which contains over twenty-seven thousand items from early modern print, we can up the number again. By January, we will have begun our first one-thousand-item experiment, spanning items printed in Britain and North America from 1530 through 1809. Robin Valenza and I, along with our colleagues in computer sciences and the library, will begin working up the data in the spring. Stay tuned for results.

New experiments provide opportunities for thought that precede the results. What does it mean to collect, tag, and store an array of texts at this level of generality? What does it mean to be an “item” or “computational object” within this collection? What is such a collection? In this post, I want to think further about the nature of the text objects and populations of texts we are working with.

What is the distinguishing feature of the digitized text—that ideal object of analysis considered in all its hypothetical relations with other ideal objects? The question itself goes against the grain of recent materialist criticism, which focuses on the physical existence of books and practices involved in making and circulating them. Unlike someone buying an early modern book in the bookstalls around St. Paul’s four hundred years ago, we encounter our TCP texts as computational objects. That doesn’t mean that they are immaterial, however. Human labor has transformed them from microfilm facsimiles of real pages into diplomatic quality digital transcripts, marked up in TEI so that different formatting features can be distinguished. That labor is as real as any other.

What distinguishes this text object from others? I would argue that a text is a text because it is massively addressable at different levels of scale. Addressable here means that one can query a position within the text at a certain level of abstraction. In an earlier post, for example, I argued that a text might be thought of as a vector through a metatable of all possible words (Witmore). Why is it possible to think of a text in this fashion? Because a text can be queried at the level of single words and then related to other texts at the same level of abstraction: the table of all possible words could be defined as the aggregate of points of address at a given level of abstraction (the word, as in Google’s new Ngram corpus). Now, we are discussing ideal objects here; addressability implies different levels of abstraction (character, word, phrase, line, etc.), which are stipulative or nominal: such levels are not material properties of texts or Pythagorean ideals; they are, rather, conventions.
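
One way to make the "vector through a metatable" idea concrete is the minimal sketch below. It assumes whitespace-separated words as the level of address, and it uses two invented toy sentences; the "metatable" here is only the union of words in those two sentences, a small stand-in for the table of all possible words. The names `texts`, `word_vector`, and `vocabulary` are illustrative, not drawn from any project described in this post.

```python
from collections import Counter

# Two toy "texts"; the sentences are invented for illustration only.
texts = {
    "a": "the king is dead long live the king",
    "b": "the queen is alive",
}

def word_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.split())
    return [counts[word] for word in vocabulary]

# Stand-in for the metatable: the sorted union of words in the toy texts.
vocabulary = sorted({word for text in texts.values() for word in text.split()})

# Each text becomes one row of counts over the same columns, so any two
# texts can be compared position by position at the level of the word.
vectors = {name: word_vector(text, vocabulary) for name, text in texts.items()}

for name, vector in vectors.items():
    print(name, dict(zip(vocabulary, vector)))
```

Choosing the word as the unit is itself a convention in the sense described above: splitting on characters or lines instead would yield a different table and a different vector for the same text.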

Here’s the twist. We have physical manifestations of ideal objects (the ideal 1 Henry VI, for example), but these manifestations are only provisional realizations of that ideal. (I am using the word manifestation in the sense advanced in the Online Computer Library Center’s Functional Requirements for Bibliographic Records [FRBR] hierarchy.1) The book or physical instance, then, is one of many levels of address. Backing out into a larger population, we might take a genre of works to be the relevant level of address. Or we could talk about individual lines of print, all the nouns in every line, every third character in every third line. All this variation implies massive flexibility in levels of address. And more provocatively, when we create a digitized population of texts, our modes of address become more and more abstract: all concrete nouns in all the items in the collection, for example, or every item identified as a “History” by Heminges and Condell in the First Folio. Every level is a provisional unity: stable for the purposes of address but also stable because it is the object of address. Books are such provisional unities. So are all the proper names in the phone book.
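
The flexibility of address described here can be sketched in a few lines. The example below is only an illustration under simple assumptions: the four "lines of print" are invented, a line is whatever a newline separates, and a word is whatever whitespace separates. It shows the same stored text being addressed by line, by word within a line, and by "every third character in every third line."

```python
# An invented toy text with four lines of print.
text = "\n".join([
    "first line of print",
    "second line of print",
    "third line of print",
    "fourth line of print",
])

# Level of address: the line.
lines = text.split("\n")

# Level of address: the word within each line.
words = [line.split() for line in lines]
second_word_of_each_line = [line_words[1] for line_words in words]

# Level of address: every third character in every third line, counting
# from one, so lines 3, 6, ... and characters 3, 6, ... within them.
every_third = [line[2::3] for line in lines[2::3]]

print(second_word_of_each_line)
print(every_third)
```

Each result is a provisional unity in the sense used above: it exists only because a query picked it out, yet once picked out it can be counted and compared like any other object.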

The ontological status of the individual text is the same as that of the population of texts: both are massively addressable, and when they are stored electronically we are able to act on this flexibility in more immediate ways through iterative searches and comparisons. At first glance, this might seem like a Galilean insight, similar to his discipline-collapsing claim that the laws that apply to the heavens (astronomy) are identical with the ones that apply to the sublunar realm (physics). But it is not.

Physical texts were already massively addressable before they were ever digitized, and this variation in address was and is registered at the level of the page, chapter, the binding of quires, and the like. When we encounter an index or marginal note in a printed text—for example, a marginal inscription linking a given passage of a text to some other in a different text—we are seeing an act of address. Indeed, the very existence of such notes and indexes implies just this flexibility of address.

What makes a text a text—its susceptibility to varying levels of address—is a feature of book culture and the flexibility of the textual imagination. We address ourselves to this level, in this work, and think about its relation to some other. “Oh, this passage in Hamlet points to a verse in the Geneva bible,” we say. To have this thought is to dispose relevant elements in the data set in much the same way a spreadsheet aggregates a text in ways that allow for layered access. A reader is a maker of such a momentary dispositif or device, and reading might be described as the continual redisposition of levels of address in this manner. We need a phenomenology of these acts, one that would allow us to link quantitative work on a culture’s “built environment” of words to the kinesthetic and imaginative dimensions of life at a given moment.

A physical text or manifestation is a provisional unity. There exists a potentially infinite array of such unities, some of which are already lost to us in history: what was a relevant level of address for a thirteenth-century monk reading a manuscript? Other provisional unities can be operationalized now, as we are doing in our experiment at Wisconsin, gathering one thousand texts and then counting them in different ways. Grammar, as we understand it now, affords us a level of abstraction at which texts can be stabilized: we lemmatize texts algorithmically before modernizing them, and this lemmatization implies provisional unities in the form of grammatical objects of address.
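
As a rough illustration of how lemmatization creates a new level of address, the sketch below counts the same tokens twice, once as written and once by lemma. The lookup table `LEMMAS` is hand-made and invented for this example; it is an assumption standing in for a real lemmatizer, which for early modern English would be far more elaborate, and it is not the procedure used in the Wisconsin experiment.

```python
from collections import Counter

# A hand-made lookup table, invented for this sketch only.
LEMMAS = {"loves": "love", "loved": "love", "loveth": "love", "kings": "king"}

def lemmatize(tokens):
    """Map each token to its lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]

tokens = "the king loveth and the kings loved".split()

by_word = Counter(tokens)              # address: the word as printed
by_lemma = Counter(lemmatize(tokens))  # address: the grammatical lemma

print(by_word)
print(by_lemma)
```

The lemma "love" never appears as a printed word in the toy sentence, yet it becomes a countable object once the grammatical level of address is adopted.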

One hundred years from now, the available computational objects may be related to one another in new ways. I can only imagine what these are: every fourth word in every fourth document, assuming one could stabilize something like “word length” in any real sense. (The idea of a word is itself an artifact of manuscript culture, one that could be perpetuated in print through the affordances of moveable type.) What makes such thought experiments possible is, once again, the addressability of texts as such. Like a phone book, they aggregate elements and make these elements available in multiple ways. You could even think of such an aggregation as the substance of another aggregation, for example, “all the phone numbers belonging to people whose last name begins with A.” But unlike a phone book, the digitized text can be reconfigured almost instantly into various layers of arbitrarily defined abstraction (characters, words, lines, works, genres). The mode of storage or virtualization is precisely what allows the object to be addressed in multiple ways.

Textuality is massive addressability. This condition of texts is realized in various manifestations, supported by different historical practices of reading and printing. The material affordances of a given medium put constraints on such practices: the practice of “discontinuous reading” described by Peter Stallybrass, for example, develops alongside the fingerable discrete leaves of a codex. But addressability as such: this is a condition rather than a technology, action, or event. And its limits cannot be exhausted at a given moment. We cannot, in a Borgesian mood, query all the possible data sets that will appear in the fullness of time. And we cannot import future query types into the present. But we can and do approximate such future searches when we automate our modes of address in unsupervised multivariate statistical analysis, such as factor analysis or principal component analysis (PCA). We want all the phone books. And we can simulate some of them now.
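
To give a sense of what an unsupervised method does with a table of counts, here is a minimal sketch of the first step of PCA. It is written in plain Python and uses power iteration to find the first principal component, a textbook stand-in chosen for brevity rather than the routine a statistics package would use. The frequency table `rows` is invented, and every function name is hypothetical.

```python
import math

# Invented frequencies: four toy texts (rows) by three word features (columns).
rows = [
    [5.0, 1.0, 2.0],
    [6.0, 0.0, 2.0],
    [1.0, 5.0, 2.0],
    [0.0, 6.0, 2.0],
]

def center(matrix):
    """Subtract each column's mean so every feature varies around zero."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]

def covariance(centered):
    """Sample covariance matrix of the (already centered) columns."""
    n = len(centered)
    cols = list(zip(*centered))
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in cols]
            for ci in cols]

def first_component(cov, iterations=200):
    """Power iteration: repeatedly multiply by cov and renormalize."""
    # Arbitrary nonzero starting vector with unequal entries.
    vec = [1.0 / (i + 1) for i in range(len(cov))]
    for _ in range(iterations):
        vec = [sum(c * v for c, v in zip(row, vec)) for row in cov]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec

centered = center(rows)
component = first_component(covariance(centered))

# Project each text onto the component: one score per text.
scores = [sum(x * w for x, w in zip(row, component)) for row in centered]
print(component)
print(scores)
```

No one told the procedure which features to contrast; it found the axis along which the toy texts differ most and placed the first two texts on one side of it and the last two on the other. In that limited sense the query was composed by the method rather than by the reader.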

NOTES

This chapter originally appeared as “Text: A Massively Addressable Object” (http://winedarksea.org/?p=926).

1. http://www.oclc.org/research/publications/library/2003/lavoie_frbr.pdf.

BIBLIOGRAPHY

Hope, Jonathan, and Michael Witmore. “The Hundredth Psalm to the Tune of ‘Green Sleeves’: Digital Approaches to Shakespeare’s Language of Genre.” Shakespeare Quarterly 61, no. 3 (2010): 357–90.

Witmore, Michael. “Texts as Objects II: Object Oriented Philosophy. And Criticism?” Wine Dark Sea. September 17, 2009. http://winedarksea.org/?p=381.

Copyright 2012 by the Regents of the University of Minnesota

Chapter 1 was previously published as “What Is Digital Humanities and What’s It Doing in English Departments?” ADE Bulletin, no. 150 (2010): 55–61. Chapter 2 was previously published as “The Humanities, Done Digitally,” The Chronicle of Higher Education, May 8, 2011. Chapter 17 was previously published as “You Work at Brown. What Do You Teach?” in #alt-academy, Bethany Nowviskie, ed. (New York: MediaCommons, 2011). Chapter 28 was previously published as “Humanities 2.0: Promises, Perils, Predictions,” PMLA 123, no. 3 (May 2008): 707–17.