Debates in the Digital Humanities 2016: 45. The Ground Truth of DH Text Mining | Tanya E. Clement

45

The Ground Truth of DH Text Mining

Tanya E. Clement

In the digital humanities, text mining is a logocentric practice. That is, text mining in digital humanities usually begins with The Word. We extract The Word; we count The Word; we stem The Word to its root; we parse The Word; we name The Word; we disambiguate The Word; we collocate The Word; we count The Word again; we apply an algorithm that allows us to reconstruct the world of The Word as one we can visualize as a list, as a line graph, as a histogram in small multiples, or on big screens. We use the view this new world provides us to interpret The Word.
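The sequence of operations named above can be sketched in a few lines. This is a rough illustration only, using the Python standard library: the sample sentence, the function names, and the crude suffix-stripping rule are all assumptions made for the example, and a real project would likely rely on an established tokenizer and stemmer.

```python
import re
from collections import Counter

def extract_words(text):
    """Lowercase the text and pull out runs of letters (a naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(word):
    """Strip a few common English suffixes. A toy rule, not a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def collocations(words):
    """Count adjacent word pairs (bigrams)."""
    return Counter(zip(words, words[1:]))

# Hypothetical sample text, invented for this sketch.
sample = "The whale sounded. The whalers watched the whale sounding."

words = extract_words(sample)            # extract The Word
stems = [crude_stem(w) for w in words]   # stem The Word to its root
word_counts = Counter(stems)             # count The Word
pair_counts = collocations(stems)        # collocate The Word
```

Even in this small sketch, the stemming rule decides that "whalers" and "whale" are different words, while "sounded" and "sounding" are the same one. A different rule would count them differently, so the counts already carry an interpretation.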

This practice of text mining presupposes a binary logic; there is meaning in the results or there is not. It begins with a “ground truth,” or labels that signify the presence of meaning. Sometimes we determine ground truth through annotations for machine classification: “Here, machine, are the love letters that Susan Dickinson wrote to Emily Dickinson. Please, find more like these.” Sometimes we determine ground truth after we receive clustering results: “Ah, machine, I see you have done your stemming and your parsing and your counting and you have given me a pile of words. I read them and will label them ‘whaling’” (though someone else might have said “indigenous economy”). “Ah, machine, I see you have clustered novels written by ‘women’ here and novels written by ‘men’ there. You are very clever. You must understand gender, just as I do.”

When engaged in this kind of text mining, we are reinscribing the simplest meaning of The Word. The authors of a text-mining textbook write that the results of text mining are easier to understand than numerical results because analysts “all have some expertise. The document is text” (Weiss et al., 51–52). Likewise, even when we are humanists and feminists and should know better, we think we understand the machine’s results when they are words or when they cluster books according to an author of an “always already” gender. We see a pattern we think we can interpret, because we think we know what The Word means, and gender, which we have worked so hard to complicate, is suddenly reduced to “female author” or “male author.” The Word has been proved to serve as ground truth. The Word is apodictic.

Sound, by comparison, is aporetic. Mining audio spoken word collections means extracting acoustic features for classification, clustering, and visualization. Choosing features is complicated. The Word seems to be interpretable at a determined length. What length of a sound is meaningful? The Word seems to have typical patterns of characters, seems to perform regularly as a part of speech even in the context of complex sentences, seems to have a root that grows, more usually than not, in prescribed ways. Hearing sound as digital audio means listening through filter banks, sampling rates, and compression scenarios that are meant to mimic the human ear (Salthouse and Sarpeshkar). To mine these acoustic features is to understand that ground truth must always be indeterminate. Which features you choose and how you label that cluster of acoustic features that is sound will often be different from the features and labels I might choose. We must ask: Whose ear are we mimicking? What is audible, and to whom? Playback means choosing the damping ratios and frequency ranges that include overlapping and audible signals. We must ask: What signal is noise? What signal is meaningful, and to whom? Extracting meaningful features for mining sound always means interpreting not only what sound means, but how sound creates meaning. Mining sound reminds us that we have constructed an analysis according to our own experiences with how sound is meaningful.
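The question of what length of sound is meaningful can be made concrete with a small sketch. It synthesizes two tones rather than loading a recording, and it uses zero-crossing rate as a stand-in acoustic feature. The sampling rate, the tone frequencies, the frame lengths, and the function names are all assumptions chosen for illustration, not a method drawn from the sources cited here.

```python
import math

SAMPLE_RATE = 8000  # samples per second (an assumed value)

def tone(freq_hz, seconds):
    """Synthesize a sine tone as a list of samples."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def frame_features(signal, frame_len):
    """Cut the signal into non-overlapping frames; compute one feature per frame."""
    return [
        zero_crossing_rate(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]

# A low tone followed by a higher one, half a second each.
signal = tone(200, 0.5) + tone(1000, 0.5)

short_frames = frame_features(signal, 400)   # twenty 50 ms frames
long_frames = frame_features(signal, 8000)   # a single 1 s frame
```

With 50 ms frames the two tones appear as distinct feature values. With a single one-second frame they blur into one average. Neither framing is the correct one: the frame length is itself an interpretive choice about what counts as a unit of sound.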

As humanists, we seek questions, not solutions. Practicing sound mining alerts us to the fact that in text mining, The Word should also be aporetic. Instead of The Word, we are working with a word that is always indeterminate—meaning is both present and absent at once. We construct text-mining analyses according to our own experiences with how words make meaning. We must use text mining as a hermeneutic of The Word or as a hermeneutic of text mining or as a hermeneutic of hermeneutics. The Word does not provide evidence of meaning, of identity. A word in text mining is a foil to ground truth, not its proof.

Bibliography

Salthouse, Christopher D., and Rahul Sarpeshkar. “A Practical Micropower Programmable Bandpass Filter for Use in Bionic Ears.” IEEE Journal of Solid-State Circuits 38, no. 1 (January 2003): 63–70.

Weiss, Sholom M., Nitin Indurkhya, Tong Zhang, and Fred J. Damerau. Text Mining: Predictive Methods for Analyzing Unstructured Information. New York: Springer, 2005.

Copyright 2016 by the Regents of the University of Minnesota