The Future of Digital Humanities Research
Alone You May Go Faster, but Together You’ll Get Further
Marieke van Erp, Barbara McGillivray, and Tobias Blanke
Breakthroughs in artificial intelligence and the collection of large cultural datasets have led to renewed excitement about computational literary studies, digital history, and other advanced computational analysis in the humanities. Applied work with artificial intelligence and machine learning has mainly become relevant in the context of a new discipline called “data science” that is broadly concerned with extracting meaning from digital data. Because digital humanities shares this interest, we are beginning to see more and more joint work between the two areas. The emergence of data science has fundamentally changed the relationship between computer science and humanities. This chapter provides insights into this model of collaboration based on our experiences in European projects around machine learning and data engineering that we have conducted over the past fifteen years. These projects have strengthened our conviction that the best way to take on digital humanities research is to collaborate across disciplines. Funded through transnational frameworks of the European Union as well as national funding schemes, these projects should not be read as the initiative of one (usually Western) European country; rather, they incorporate different perspectives from Eastern or Southern Europe. They are generally multilingual in nature and focused on different (cultural) histories. While we cannot cover all relevant projects and perspectives, we aim to present a diverse set of examples that generalize to other initiatives.
The new breakthroughs in scientific data analysis, which we group under the labels of machine learning, data science, and large-scale language models, also came with a broadening of the availability of advanced computational analysis tools that can now be easily used by new groups of users who do not necessarily have PhDs in statistical modeling. New toolkits, often built around open-source languages such as R and Python, have allowed users to focus on their data work rather than having to implement machine learning algorithms from scratch. Python’s sklearn or R’s caret do not just embed computational analysis into well-defined processing pipelines; they also provide a range of advanced data processing tools. They have contributed significantly to the wider adoption of digital methodologies as a whole, in the humanities as elsewhere. This has radically shifted the relationship between computer science and humanities and allowed for a new dialogue to emerge with much more computational input from the humanities. Ted Underwood has summarized that digital humanities is unified “by reflection on digital technology.” In his 2012 contribution to Debates in the Digital Humanities, Dave Parry wrote: “Digital humanities did not invent collaborative scholarship, but it does make such work more acceptable and transparent.” So, what has changed?
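To illustrate the kind of ready-made workflow such toolkits offer, the short Python sketch below composes a text classifier from off-the-shelf scikit-learn components. The toy documents and labels are invented for illustration and do not come from any of the projects discussed in this chapter.

```python
# A minimal sketch of a ready-made analysis pipeline in scikit-learn:
# the researcher composes vectorization and classification steps without
# implementing either algorithm from scratch. Documents and labels are toy data.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

documents = [
    "report on the coal miners' strike of 1903",
    "catalogue entry for a seventeenth-century landscape painting",
    "newspaper account of a dock workers' walkout",
    "museum record describing a ceramic vase",
]
labels = ["labour_history", "art_history", "labour_history", "art_history"]

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),          # turn raw text into weighted term counts
    ("classifier", LogisticRegression()),  # standard, off-the-shelf classifier
])

pipeline.fit(documents, labels)
print(pipeline.predict(["pamphlet about a textile workers' strike"]))
```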
Not too long ago, collaboration between humanities and computer science in projects often entailed a division of work in which computer science had to create the “services” that humanities defined in requirement studies. Not only did this reduce the computer science work to software engineering, but the humanities, too, were often not an integral part of the digital production process. Project applications generally contained standard lines stating that the humanities lagged behind other disciplines in their computational knowledge and would therefore take on the role of consumers of technologies. Advances in computational modeling have not eliminated this traditional division of labor, with requirements coming from the humanities and engineering from computer science, but we also see new projects with more direct interaction. These are often smaller projects that work directly with a given dataset.
Humanities and computer science have both changed very fast, have become more complex along the way, and have driven new critical research. This chapter begins with a history and current account of the sometimes difficult collaboration between computer science and the (digital) humanities and then offers insights into our current work bringing the two together. We demonstrate in particular how the humanities can contribute to advanced computational research based on new collaborations in data science and data publication.
Doing a Better Job: Balancing the Computational and the Humanities
Computer science and humanities have found further connections due to the growing prominence of data science as well as machine learning techniques becoming more easily available. Nevertheless, they still divide work in very large projects in Europe, such as the CLARIAH collaborations or efforts to create Europeana. These large-scale projects often entail the complicated cocreation of data between content experts and digital professionals. While both academic disciplines share similarities, namely in academic career ladders, their need to operate in a global network, and their hunt for external funding, there are marked differences. These differences are most apparent in the publication cultures. For computer science, the focus is on competitive eight-page conference publications, whereas humanities scholars are generally evaluated on books and journal articles. These different publication cultures affect the research pace and the unit of publication, which tends to be smaller for the computational sciences. In this section, we showcase two projects in which we were and are involved in order to detail how disciplinary differences influenced the work in the projects and how the teams balanced these issues. These projects were selected because they show how the collaboration between computing sciences and humanities can lead to a successful cross-fertilization of perspectives. Both initiatives are large enough for all involved disciplines to have their own specific work areas, but they would not have been successful had the disciplines not cooperated successfully on data integration and analysis. Such collaboration requires spending time to develop a common understanding of the issues to be addressed, and both projects managed to argue successfully for this time with their funders.
The European Holocaust Research Infrastructure (EHRI) has worked on the integration of Holocaust materials in Europe, the United States, and Israel for over ten years (Blanke and Kristel, “Integrating Holocaust Research”). EHRI employs historians, archivists, and computer experts in almost equal measure and has developed a sophisticated data integration framework in which content experts concentrate on data that is not yet well described or needs to be newly created (Blanke et al., “The European Holocaust Research Infrastructure Portal”). The computer experts, on the one hand, have set up a semiautomated data integration system covering the many already existing large-scale archives in the field. Archivists and computer scientists, on the other hand, have worked together on translating archival principles of information access such as provenance and fonds into a novel digital framework using graph databases. Given the size of the challenge, with over 2,000 collection-holding institutions and datasets at the terabyte scale, a clear division of labor has emerged and remains necessary.
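As a purely illustrative sketch of what it means to express archival structure as a graph rather than as flat records, the following Python snippet models a small fonds-series-item hierarchy with a provenance relation. The node names and properties are invented for the example, and EHRI’s actual graph data model is considerably richer.

```python
# A hedged illustration of archival description as a graph: hierarchy (fonds ->
# series -> item) and provenance become traversable relations. Node names are
# placeholders and do not reproduce EHRI's data model.
import networkx as nx

g = nx.DiGraph()

# Archival hierarchy
g.add_node("fonds:refugee_committee", level="fonds")
g.add_node("series:correspondence_1939", level="series")
g.add_node("item:letter_0042", level="item")
g.add_edge("fonds:refugee_committee", "series:correspondence_1939", relation="contains")
g.add_edge("series:correspondence_1939", "item:letter_0042", relation="contains")

# Provenance as a relation to the creating body rather than a flat metadata field
g.add_node("agent:refugee_committee_office", type="corporate_body")
g.add_edge("fonds:refugee_committee", "agent:refugee_committee_office", relation="created_by")

# Traversing the hierarchy answers questions a flat record cannot,
# e.g. which fonds and series an individual item ultimately belongs to.
for ancestor in nx.ancestors(g, "item:letter_0042"):
    print(ancestor)
```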
The story of EHRI’s disciplinary collaboration is repeated across several other initiatives in Europe that bring together a diverse set of skills in order to develop and integrate humanities resources. EHRI focuses on reaching and integrating resources from Eastern and Southeastern Europe that are difficult to access. It is very much transnational, but similar large-scale collaborations are also repeated in individual member states of the European Union. Between 2004 and 2014, the Dutch Research Council funded eighteen projects under the program CATCH: Continuous Access to Cultural Heritage. The goal was for each project team to consist of a PhD student (four years), a postdoctoral researcher, and a scientific programmer (each three years).1 This team would spend a considerable amount of time at a cultural heritage institution.2 CATCH had a formative influence on the digital humanities ecosystem in the Netherlands, leading to one of the most advanced research communities in the world and one that is unique in advocating for the involvement of computer science research in the humanities.
In the first round of CATCH projects, starting in 2005, partners from both computer science and cultural heritage were involved; this was later followed by a strong focus on a humanities component, so that teams needed to include researchers from both computer science and the humanities.3 Besides embedding the teams in cultural heritage institutions and often having them share an office, the program coordinators regularly organized events for all ongoing projects to establish the division of work between computer science and humanities and to keep it close to the institutional interests. These events would typically take place at one of the participating cultural heritage institutions, and the project team based at that institute would be responsible for the scientific program of the meeting. Putting researchers from different disciplines in direct contact with heritage professionals resulted in a close-knit community of over 100 people (including project leaders and coordinators) who still work together in various follow-up projects such as CLARIAH4 and networks such as the Dutch Digital Heritage Network5 and who can call on each other for guest lectures or student excursions.
The Netherlands easily lends itself to such a network because it takes about four hours to drive from the country’s northernmost tip to the farthest corner in the south. The Netherlands is also rich enough for its research councils to spend ten million euros on a ten-year research program. Therefore, replicating this setup is not easy. However, the pandemic years taught us that digital collaborations can facilitate such partnerships, too, albeit at a somewhat slower pace and lower resolution. The most important requirement for making research truly interdisciplinary in such larger projects is to spend time on equality in research questions and on mutual understanding. This means that the research is driven by interests from both sides. Neither the humanities nor computer science should define the project alone with the other in a supporting role. Formulating research projects with central research questions from the different parties is an excellent starting point.
Furthermore, mutual understanding is key. Naturally, most humanities scholars will not immediately grasp the finer details of computer science methods within a three-year project or even in a long-standing collaboration, while computer scientists struggle with the humanities’ focus on detail and with its research interests. However, to a certain extent, one does need to understand the other’s research field, even when the division of labor is well defined and collaboration is built into the work. This includes research questions, methodologies, workflows, and research culture. It is incredibly difficult to achieve such an in-depth understanding without working together often and closely. Discussing a shared project in a meeting room for an hour a week and then spending the rest of the week in one’s own office is not enough. In our experience, it is necessary to talk often, also about nitty-gritty details, to recommend core publications from your own field, and to read the core publications your colleague recommends from hers. It is unrealistic to expect the first six months of the project to yield scientific breakthroughs, but getting to the bottom of the research problem and building an understanding of each other’s disciplines will be beneficial in the long run.
CATCH entailed several detailed collaborations between computer science and humanities. Its Agora project, for example, attempted to connect different cultural heritage collections through a historical dimension, that is, through the events depicted by or related to collection objects instead of standard metadata regarding an object’s type, dimensions, and maker. Furthermore, the project’s research questions were formulated from both humanities and computer science perspectives. Fairly early on, the project team started investigating the use of events as a link between different heritage collections, such that you could relate a weapon from the Rijksmuseum collection used in a conflict to a documentary from the Netherlands Institute for Sound and Vision on that conflict, thus providing additional context to those collection objects. One of the main difficulties, which is as yet unresolved, was to define what an event is and how it can be modeled. Indeed, there are several event models (cf. Scherp et al.; Shaw, Troncy, and Hardman; and Van Hage et al.) and datasets that describe events in a structured format, but these are fairly simple and lack the complexity that events carry for historians. The project team thus took a step back and started spending afternoons talking and whiteboarding to try to define and model workable prototypical object-event and event-event relationships to support the interpretation of objects in cultural heritage collections. This led to a paper that was copresented by one of the historians and one of the computer scientists on the team and that was nominated for the best paper award at the conference where it was presented (Van den Akker et al., “Digital Hermeneutics”). More importantly, the process required bridging a huge gap between the two project angles, both in terms of content and in terms of how the work is approached and what the expected results of a “unit of research” are. The team realized that they could only come to a successful collaboration if they understood each other’s process and learned to ask what the other meant when they talked about “modeling,” “vocabulary,” or “event.”
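By way of illustration only, the following Python sketch shows a drastically simplified object-event and event-event model. The entities and years are placeholders, and the model is far less expressive than the ones the Agora team debated or than published event models such as SEM.

```python
# A hypothetical, minimal model of object-event and event-event relations.
# The conflict, battle, and objects are invented placeholders.
from dataclasses import dataclass, field

@dataclass
class Event:
    label: str
    begin: int   # year, as a crude temporal placeholder
    end: int
    sub_events: list["Event"] = field(default_factory=list)      # event-event relation

@dataclass
class CollectionObject:
    label: str
    institution: str
    relates_to: list[Event] = field(default_factory=list)        # object-event relation

conflict = Event("example conflict", 1870, 1880)
battle = Event("example battle", 1873, 1873)
conflict.sub_events.append(battle)

weapon = CollectionObject("weapon", "Rijksmuseum", [conflict])
documentary = CollectionObject("documentary", "Netherlands Institute for Sound and Vision", [conflict])

# Objects from different collections become related through the shared event.
shared_events = {e.label for e in weapon.relates_to} & {e.label for e in documentary.relates_to}
print(shared_events)
```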
In EHRI and CATCH, humanities scholars and computer scientists have found innovative ways to work together over long, sustained periods of time. In this process, they developed their own joint vocabulary along a distinct division of work. As the computational humanities grow, so will our work with colleagues from fields such as computer and data science, and doing a better job of clarifying and establishing a shared language will become even more important. EHRI and CATCH are in many ways typical of a large number of digital humanities projects in Europe in the last decade, where vocabularies are integrated but the work remains divided. Next, we present new styles of collaboration that we are seeing, in which both work and vocabularies are brought together. They are made possible by a broadening of the availability of data science tools as well as new data environments that allow for the joint publication of data. One of these new data environments is the work on humanities-specific data publication journals.
Sharing Is Caring
The widespread availability of large digital datasets for humanities research and the growing interest in computational and data-intensive methods in digital humanities go hand in hand with questions about how to ensure that the work of creating and sharing datasets is correctly credited in the humanities as well. In this section, we argue that publishing papers in data journals should be encouraged among humanities scholars and that this should become part of the standard publication strategy for humanities research projects. In this way, data journals become symbols of new, data-driven research work specific to the humanities.
In the humanities, digital datasets are now more readily available and do not always have to be created from scratch, a task that often required large-scale collaborations across disciplines in Europe. In a context where novel research within particular fields of the humanities is encouraged and called for and where traditional schemes in research funding acquisition, higher-education program structures, and academic career paths are challenged (McGillivray, “Computational Methods for Semantic Analysis of Historical Texts”), the topic of data publication venues and data sharing is of particular relevance. It has been shown that research articles that link to freely available and openly licensed datasets are more likely to be cited (e.g., Colavizza et al., “The Citation Advantage”), and openly shared data also supports reproducibility and public confidence in the research. Next to ensuring the long-term preservation of and access to the datasets and resources, it is important to enhance the discoverability and intellectual context of datasets by positioning them within a broader area of research. Like their colleagues in many other disciplines, humanities scholars across the language and historical disciplines have started to investigate how to ensure that the work of creating and sharing datasets is correctly credited and recognized as a valuable contribution to the scholarly community.
Important initiatives such as DataCite have contributed to setting standards in the area of data sharing, recognizing the importance of all types of research outputs, and providing a persistent digital object identifier (DOI) for all research outputs.6 However, given the dominant role played by journal publications in academic careers in the humanities, it is important that incentives are in place for the scholars themselves to create and share datasets. Data journals provide scholars with this incentive and are, in Europe at least, a sign that humanities scholars themselves have become more data-active. Data journals are academic journals that publish articles describing and analyzing datasets rather than presenting new findings, theories, or interpretations. They are a relatively new concept in academic publishing, first appearing only a decade ago. The first data journals originated in scientific publishing to ensure that the creators and curators of datasets are rewarded for their work and that data sharing is facilitated and encouraged. Examples of such journals dedicated broadly to a range of scientific disciplines include Scientific Data,7 GigaScience,8 and F1000Research.9
Following the example of scientific disciplines, two data journals specifically dedicated to the humanities have appeared in the past few years in Europe: the Journal of Open Humanities Data10 and Research Data for the Humanities and Social Sciences.11 Although they are a growing niche within the academic publishing market,12 data journals constitute a great opportunity to support and enhance research in the data-driven humanities (Marongiu et al., “Le Journal of Open Humanities Data”). Thanks to their nontraditional format and scope, they have the ability to adapt to the evolving needs of this research. For example, the Journal of Open Humanities Data publishes both short data papers, dedicated to the description of specific research objects, and full-length research papers, longer narratives devoted to the discussion of “methods, challenges, and limitations in the creation, collection, management, access, processing, or analysis of data in humanities research, including standards and formats.”13 The flexibility of the latter format allows authors to report on challenges, methodological aspects, and techniques, which is particularly important for computational research in the humanities. To further stress the tight connection between data objects and computational research, the Journal of Open Humanities Data has, moreover, so far published nine articles in the special collection dedicated to Computational Humanities Research Data.14 We believe that a focus on data sharing and data publication can help develop more data-driven humanities work across the disciplines. Joint efforts aimed at enabling reproducible research, crediting resource creation, ensuring digital preservation, and, crucially, being part of the same data-centric innovative process in academic publishing can contribute to bridging gaps and empowering more data-driven research in the humanities.
In EHRI and CATCH, data issues required an organized division of labor between computer science and humanities. To facilitate the reuse of datasets and the future integration of computing and humanities interests, humanities-specific data journals offer an opportunity to publish and receive recognition for research data outputs. Data journals, just like their traditional counterparts, provide researchers with the means to gain recognition for their research outputs and should thus encourage the publication of datasets in all disciplines, which will, in turn, allow for more directly humanities-driven research. They have contributed significantly to the wider availability of data in the humanities, which has, in our experience, brought about new collaborations toward data-driven humanities. Consider, for example, a dataset of the annual ethnic fractionalization index for 162 countries across the world from 1945 to 2013 (Drazanova, “Introducing the Historical Index of Ethnic Fractionalization [HIEF] Dataset”). This dataset was reused in a number of further quantitative and computational studies on rebel rivalry (Tokdemir et al., “Rebel Rivalry and the Strategic Nature of Rebel Group Ideology and Demands”), the political polarization of nations (Davis and Vila-Henninger, “Charismatic Authority and Fractured Polities”), and racism and economic inequality across countries (Caller and Gorodzeisky, “Racist Views in Contemporary European Societies”).
Advancing Data-Driven Humanities
The growing level of interaction and exchange between computational sciences and humanistic disciplines has had a strong effect on data-sharing practices, as we described in the previous section. It has also changed the relationship between computer science and humanities. Research work is not simply divided anymore but shared. On the one hand, this has led to a general recognition that new and better methodological frameworks for quantitative research in the humanities are needed (e.g., Clifford et al.; Jenset and McGillivray; Bode; McGillivray, Colavizza, and Blanke, to name a few) and that this has implications for the relationship between the tools being developed and the research questions that can be answered with them (as argued in Scheinfeldt, “Where’s the Beef?”). Some have taken a critical view of this development, arguing that computational and digital methods in the humanities, and particularly in history, have led the field away from argument-driven scholarship toward tool building and resource accessibility (Blevins, “Digital History’s Perpetual Future Tense”). On the other hand, scholars have expressed views on which infrastructural considerations best enable high-quality research at scale (cf., e.g., Smithies, “Software Intensive Humanities”) and have voiced a call for more reproducible and open research practices (Liu, “Assessing Data Workflows for Common Data ‘Moves’ across Disciplines”), which are particularly topical in the case of research at the interface between computer science and humanities. This section reports on some of the projects we have been involved with in which this new type of quantitative, open research has succeeded in enhancing humanities understanding.
The HiTime project (Van de Camp, “A Link to the Past”) took a data-centric approach to create social networks between people, professions, locations, events, and schools of thought in the labor movement from 1850 to 1940. By analyzing a database containing information about strikes and newspaper articles, the project team uncovered links between the strikes mentioned in the database and those mentioned in the newspapers, as well as a host of “almost” strikes that were mentioned in newspapers but were canceled and thus never appeared in the database (Van den Hoven, Van den Bosch, and Zervanou, “Beyond Reported History”). The potential of this dataset became apparent only by discussing the text analytics results early and often with the historians and by repeating the experiments frequently; otherwise, the results could easily have been dismissed as irrelevant because the strikes did not take place. Small changes make a big difference in data-driven humanities.
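A much-simplified sketch of this kind of linking is given below: strike records are matched to newspaper articles by year and place name, and unmatched articles become candidates for “almost” strikes. The field names, records, and matching rule are invented for illustration and do not reproduce the HiTime project’s actual pipeline.

```python
# Hedged, toy record linkage between a strike database and newspaper mentions.
# All records, field names, and the matching rule are invented placeholders.
strike_db = [
    {"id": 1, "year": 1903, "place": "Amsterdam", "sector": "railways"},
    {"id": 2, "year": 1911, "place": "Rotterdam", "sector": "docks"},
]

newspaper_articles = [
    {"date": "1903-04-06", "text": "Railway workers in Amsterdam lay down work..."},
    {"date": "1911-08-02", "text": "Dock workers in Rotterdam walk out at the harbour..."},
    {"date": "1920-05-01", "text": "A planned strike in Utrecht was called off at the last minute..."},
]

def mentions_strike(article, record):
    """Naive match: same year and the place name occurs in the article text."""
    return article["date"].startswith(str(record["year"])) and record["place"] in article["text"]

linked, unmatched = [], []
for article in newspaper_articles:
    matches = [r["id"] for r in strike_db if mentions_strike(article, r)]
    (linked if matches else unmatched).append((article["date"], matches))

print("linked to database:", linked)
# Articles with no database match are candidates for 'almost' strikes that never took place.
print("possible 'almost' strikes:", unmatched)
```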
In another example, our interdisciplinary team developed a “materialist sociology of political texts” of post-1945 U.K. government white papers (Blanke and Wilson, “Identifying Epochs in Text Archives”). We focused on U.K. government white papers to map connections and similarities in political communications from 1945 to 2010. The corpus comprises 888 documents and 19.3 million words in total. Rather than discovering large-scale trends since 1945 in the political documents, we were interested in how we could deconstruct standard perceptions of political epochs that are related to, for instance, government changes or major historical events such as wars. The team relied on machine learning to classify historical time periods by means of the ambiguity, fairness, morality, and political sentiments in the documents. Three longer-term epochs of political communications emerged from correlating these sentiments: from 1945 to 1965, from 1965 to 1990, and from 1990 onward. In this way, the team was then able to trace changes in the meaning of key political concepts across these three epochs using topic models and word2vec word embeddings. Each approach reinforced the strong differences in political communication in the three epochs, underlining that such projects can use computational methods to alter typical perceptions of historical epochs.
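The sketch below illustrates, in a deliberately toy form, how a key concept can be traced across epochs by training a separate word2vec model per epoch and comparing its nearest neighbors. The epoch boundaries match those named above, but the sentences and the target word are placeholders rather than the project’s actual data or code.

```python
# Hedged sketch: one word2vec model per epoch, then compare the neighbourhood of
# a target word across epochs. The tiny tokenized "corpora" below are invented.
from gensim.models import Word2Vec

epochs = {
    "1945-1965": [["welfare", "state", "housing", "reconstruction"], ["national", "insurance", "benefit"]],
    "1965-1990": [["welfare", "reform", "spending", "review"], ["market", "efficiency", "benefit"]],
    "1990-":     [["welfare", "to", "work", "responsibility"], ["market", "choice", "provider"]],
}

for epoch, sentences in epochs.items():
    model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, seed=42)
    if "welfare" in model.wv:
        neighbours = [w for w, _ in model.wv.most_similar("welfare", topn=3)]
        print(epoch, "->", neighbours)   # how the concept's context shifts per epoch
```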
In another project, a “computational genealogy” of the inaugural addresses of U.S. presidents was developed (Blanke and Aradau, “Computational Genealogy”). The goal of the project was to find out, together with researchers from critical studies, how computational methods can help articulate the discontinuities and moments of dissent that have been central to critical historical work. In our experience, this runs counter to the computational science tendency to look for patterns and trends. The team had to learn how to integrate new vocabularies of discontinuity as they appear in computational analysis: anomalies, spikes, outliers, influence, detrending, and so on. Even in a fairly small and closed corpus such as the inaugural addresses, there are no clear trends or differences that seem to dominate. The analysis focused on machine learning techniques that surface differences, such as anomaly detection and detrending. For example, Donald Trump’s inaugural rhetoric was found to be distinct not so much from his immediate Democratic predecessors but from other Republicans. Trump’s speech highlights a struggle within the Republican party and the disappearance of Dwight Eisenhower’s internationalist ideas. The trend toward Trump is most influenced by Ronald Reagan’s conservatism and Republican ultranationalism. It is, however, also interrupted by Abraham Lincoln’s national consolidation and Eisenhower’s internationalism. This example shows that contributions to data-driven humanities can be made with relatively small existing datasets and readily available tools.
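The following sketch illustrates two items from that vocabulary of discontinuity, detrending and outlier detection, on an invented per-address feature series. It is not the study’s pipeline, which worked with much richer representations of the speeches; the years and distance values here are placeholders.

```python
# Hedged sketch of detrending and outlier detection on a per-address series.
# The 'distance' values are invented; only the general technique is illustrated.
import numpy as np
from scipy.signal import detrend

# Hypothetical distance of each inaugural address from its predecessor (1933-2017).
years = np.arange(1933, 2021, 4)
distance = np.array([0.20, 0.22, 0.21, 0.25, 0.24, 0.26, 0.27, 0.26,
                     0.28, 0.30, 0.29, 0.31, 0.30, 0.33, 0.32, 0.34,
                     0.33, 0.35, 0.34, 0.36, 0.35, 0.52])  # a spike at the end

residual = detrend(distance)                 # remove the long-term drift
z_scores = (residual - residual.mean()) / residual.std()

for year, z in zip(years, z_scores):
    if abs(z) > 2:                           # crude outlier rule
        print(f"{year}: stands out from the overall trend (z = {z:.1f})")
```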
Another project further illustrates a fruitful interdisciplinary collaboration in which the humanistic perspective of classics scholars met the interests of statisticians and machine learning scientists, with the ambitious aim of producing original research in both fields. This project’s goal was to develop new computational models for semantic change in Ancient Greek.15 The task was to identify changes in the meaning of words over time (McGillivray, “Computational Methods for Semantic Analysis of Historical Texts”). The starting point of this research was work done in computational linguistics, which relies on distributional representations of word meaning based on word co-occurrence statistics in large corpora (cf. Tahmasebi, Borin, and Jatowt, “Survey of Computational Approaches,” for an overview). These models use time as the only metadata field and thus risk missing subtle nuances in word meaning, which are particularly important in the case of ancient languages.
Computational linguists tend to be interested in developing state-of-the-art models that can automatically identify words as they change meaning in a certain time span. This has a range of applications, including lexicography, information retrieval, and opinion and sentiment mining. From the point of view of classicists, studying semantic shifts can help explore questions about the semantics of Ancient Greek words in relation to historical, stylistic, cultural, and geographical factors. Bridging the gap between the two sets of interests required careful consideration. One very positive aspect of this interdisciplinary exchange was that it led to an innovative contribution to the design of the computational model itself. The insight that polysemy and semantic change are particularly closely related in ancient languages, and that genre plays a very important role in the semantics of Ancient Greek words, led to the development of a new computational model that incorporates genre as a key factor in the distributional contexts of words (Perrone et al., “GASC”).
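To make the underlying intuition concrete, the sketch below counts the contexts of a single target word separately in two genres and compares the resulting vectors. This simple counting scheme is not GASC, which is a Bayesian generative model, and the toy sentences are invented; only the idea that genre changes a word’s distributional profile is illustrated.

```python
# Hedged illustration of genre-conditioned distributional profiles: count the
# co-occurrences of a target word per genre and compare the vectors. The toy
# "corpora" are invented and vastly simpler than the Ancient Greek data.
from collections import Counter
from math import sqrt

def context_vector(sentences, target, window=2):
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok == target:
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        counts[tokens[j]] += 1
    return counts

def cosine(a, b):
    shared = set(a) & set(b)
    num = sum(a[w] * b[w] for w in shared)
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

# Toy sentences for one target word in two genres of the same period.
religious = [["kosmos", "order", "divine", "harmony"], ["divine", "kosmos", "order"]]
technical = [["kosmos", "world", "sphere", "motion"], ["motion", "kosmos", "sphere"]]

v_rel = context_vector(religious, "kosmos")
v_tec = context_vector(technical, "kosmos")
print(f"similarity across genres: {cosine(v_rel, v_tec):.2f}")
```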
These examples show that new methods allow scholars to answer different (types of) research questions and gain new insights: in our cases, finding unexpected patterns indicating events that did not happen or rhetorical differences not between politicians of opposing parties but between politicians within the same party over time. Big data and computational methods allow us to present a different lens on a domain, complementing traditional close-reading methods.
Going Forward
Combining the digital with the humanities not only poses research challenges but also challenges our research cultures. In our experience, bridging these cultures is where many of the current challenges for collaboration lie. This focus on collaboration is by no means new to the discourse on digital humanities; Spiro (“‘This Is Why We Fight’”), for example, highlights collaboration among the core values for the digital humanities community, and Davidson (“Humanities 2.0”) discusses collaboration and customization in the context of the cocreation of collective projects (particularly archives) involving the public. In this chapter, we stress the importance of collaboration specifically with the community of researchers in computer science and discuss what shape this collaboration can take. We also point to new developments that come with the emergence of data science as a separate discipline and the wider availability of advanced computational methods.
Where the computer science publication culture puts a lot of emphasis on conferences with low acceptance rates, the humanities publication culture is focused on books and journal articles.16 These different publication formats (and, by extension, each field’s research evaluation measures) demand different research cycles that currently bar optimal cross-pollination between the fields. It is unlikely that any of the disciplines will radically change its research and publication culture. However, interdisciplinary projects can treat this as a feature rather than a bug and present their results to the different communities represented in the project. With multiple project members, each can take the lead in tailoring the project’s results to their research community’s preferred format.
We also need to rethink our research evaluation strategies, which currently function as silos between the different research fields. More and more early-career researchers are becoming digital humanities scholars and may not fit one or the other publication culture paradigm, so they are at risk of falling behind when it comes to having their CVs evaluated for project proposals and next career steps. The trend toward sharing datasets and code also calls for expanding research evaluation metrics to include types of work that are extremely valuable to the community but are often not included in standard evaluations.17
We observe progress in the growing acceptance of computational cultures in the appointment panels in which we have participated. There is also clearly a fairly well-funded attempt by research councils in Europe to bring together computer science and humanities. However, we expect the most sustained impact to come from changes within the disciplines themselves. The emergence of data science has widened the scope of computing research to work with more diverse data. In the humanities as well, we see concerted efforts to bridge the gaps presented in this chapter. Through data publications, humanities scholars can participate in the creation of datasets that can be shared and reused and still achieve recognition comparable to other research outputs like journal articles. Inspired by similar efforts in other disciplines and in active collaboration with them, numerous data-driven humanities projects are now underway that are based on small teams working around specific research questions and interests in the humanities.
Notes
1. In rare cases, the Dutch Research Council allowed two postdoctoral researchers or another configuration of the team.
3. Disclaimer: one of the authors was involved in two CATCH projects, one as a PhD student and one as a postdoctoral researcher.
11. https://brill.com/view/journals/rdj/rdj-overview.xml?lang=en.
12. Although it is not a dedicated data journal, the Journal of Cultural Analytics also has a section on datasets (https://culturalanalytics.org/section/1579-data-sets). Datasets are defined as offering “lengthy discussions of curatorial choices associated with new data sets relevant to cultural study” (https://culturalanalytics.org/about).
14. https://openhumanitiesdata.metajnl.com/collections/computational-humanities-research.
15. See “Computational Models of Meaning Change in Ancient Greek,” a project from the Alan Turing Institute, at https://www.turing.ac.uk/research/research-projects/computational-models-meaning-change-ancient-greek.
16. Cf. “ACL 2019 Acceptance Rates,” June 18, 2019, http://acl2019pcblog.fileli.unipi.it/?p=161, and “NeurIPS 2019 Stats,” September 9, 2019, Medium, https://medium.com/@dcharrezt/neurips-2019-stats-c91346d31c8f.
17. In 2010, the Altmetrics manifesto was published (https://altmetrics.org/manifesto/), calling for a broader approach to measuring research impact. Since 2019, all the universities of the Netherlands, the Royal Netherlands Academy of Arts and Sciences (KNAW), the Dutch Research Council (NWO), the Netherlands Organisation for Health Research and Development (ZonMw), and the university hospitals have been working on Recognition & Rewards, a national program to shape a different and broader approach to recognizing academic staff for the work they do. For more information, see https://www.knaw.nl/en/publications/recognition-and-rewards-agenda-2022-2025.
Bibliography
- Blanke, T., and C. Aradau. “Computational Genealogy: Continuities and Discontinuities in the Political Rhetoric of US Presidents.” Historical Methods: A Journal of Quantitative and Interdisciplinary History (2019): 1–15.
- Blanke, T., and C. Kristel. “Integrating Holocaust Research.” International Journal of Humanities and Arts Computing 7, no. 1–2 (2013): 41–57.
- Blanke, T., M. Bryant, M. Frankl, C. Kristel, R. Speck, V. V. Daelen, and R. V. Horik. “The European Holocaust Research Infrastructure Portal.” Journal on Computing and Cultural Heritage (JOCCH) 10, no. 1 (2017): 1–18.
- Blanke, T., and J. Wilson. “Identifying Epochs in Text Archives.” In IEEE International Conference on Big Data (Big Data), Boston, MA, 2219–24. 2017. https://ieeexplore.ieee.org/document/8258172.
- Blevins, C. “Digital History’s Perpetual Future Tense.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press. 2016. https://dhdebates.gc.cuny.edu/read/untitled/section/4555da10-0561-42c1-9e34-112f0695f523.
- Bode, K. A World of Fiction: Digital Collections and the Future of Literary History. Ann Arbor: University of Michigan Press, 2018. https://doi.org/10.3998/mpub.8784777.
- Caller, S., and A. Gorodzeisky. “Racist Views in Contemporary European Societies.” Ethnic and Racial Studies (2021). https://doi.org/10.1080/01419870.2021.1952289.
- Clifford, J., B. Alex, C. Coates, A. Watson, and E. Klein. “Geoparsing History: Locating Commodities in Ten Million Pages of Nineteenth-Century Sources.” Historical Methods 49, no. 3 (2016): 115–31. https://doi.org/10.1080/01615440.2015.1116419.
- Colavizza, G., I. Hrynaszkiewicz, I. Staden, K. Whitaker, and B. McGillivray. “The Citation Advantage of Linking Publications to Research Data.” PLoS ONE 15, no. 4 (2020): e0230416. https://doi.org/10.1371/journal.pone.0230416.
- Davidson, C. N. “Humanities 2.0: Promise, Perils, Predictions.” In Debates in the Digital Humanities, edited by Matthew K. Gold, 476–89. Minneapolis: University of Minnesota Press, 2012.
- Davis, A. P., and L. Vila-Henninger. “Charismatic Authority and Fractured Polities: A Cross-National Analysis.” British Journal of Sociology 72 (2021): 594–608. https://doi.org/10.1111/1468-4446.12841.
- Drazanova, L. “Introducing the Historical Index of Ethnic Fractionalization (HIEF) Dataset: Accounting for Longitudinal Changes in Ethnic Diversity.” Journal of Open Humanities Data 6, no. 1 (2020): 1–8. https://doi.org/10.5334/johd.16.
- Jenset, G. B., and B. McGillivray. Quantitative Historical Linguistics: A Corpus Framework. Oxford: Oxford University Press, 2017.
- Liu, A. “Assessing Data Workflows for Common Data ‘Moves’ across Disciplines.” Alan Liu, May 6, 2017. https://doi.org/10.21972/G21593.
- Marongiu, P., N. Pedrazzini, M. Ribary, and B. McGillivray. “Le Journal of Open Humanities Data: enjeux et défis dans la publication de data papers pour les sciences humaines.” In Humanités numériques et science ouverte. Lille, France: Presses Universitaires du Septentrion, 2022.
- McGillivray, B. “Computational Methods for Semantic Analysis of Historical Texts.” In Routledge International Handbook of Research Methods in Digital Humanities, edited by Kristen Schuster and Stuart Dunn, 261–74. Abingdon-on-Thames: Routledge, 2020.
- McGillivray, B., G. Colavizza, and T. Blanke. “Towards a Quantitative Research Framework for Historical Disciplines.” In COMHUM 2018: Book of Abstracts for the Workshop on Computational Methods in the Humanities 2018, edited by M. Piotrowski, 53–58. Lausanne: Université de Lausanne, 2018. https://zenodo.org/record/1312779#.W2B4I62ZNTY.
- McGillivray, B., T. Poibeau, and P. Ruiz Fabo. “Digital Humanities and Natural Language Processing: ‘Je t’aime . . . Moi non plus.’” DHQ: Digital Humanities Quarterly 14, no. 2 (2020).
- Parry, D. “The Digital Humanities or a Digital Humanism.” In Debates in the Digital Humanities, edited by Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012.
- Perrone, V., M. Palma, S. Hengchen, A. Vatri, J. Smith, and B. McGillivray. “GASC: Genre-Aware Semantic Change for Ancient Greek.” In Proceedings of the 1st International Workshop on Computational Approaches to Historical Language Change 2019, edited by Nina Tahmasebi, Lars Borin, Adam Jatowt, and Yang Xu, 56–66. Florence, Italy: Association for Computational Linguistics, 2019.
- Scheinfeldt, T. “Where’s the Beef? Does Digital Humanities Have to Answer Questions?” In Debates in the Digital Humanities, edited by Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012.
- Scherp, A., T. Franz, C. Saathoff, and S. Staab. “F—A Model of Events Based on the Foundational Ontology DOLCE+DnS Ultralite.” In K-CAP’09: Proceedings of the Fifth International Conference on Knowledge Capture, 137–44. New York: Association for Computing Machinery, 2009.
- Shaw, R., R. Troncy, and L. Hardman. “LODE: Linking Open Descriptions of Events.” In Asian Semantic Web Conference, 153–67. Berlin: Springer, 2009.
- Smithies, J. “Software Intensive Humanities.” In The Digital Humanities and the Digital Modern, 153–202. Basingstoke: Palgrave Macmillan, 2017.
- Smithies, J. “Towards a Systems Analysis of the Humanities.” In The Digital Humanities and the Digital Modern, 113–51. Basingstoke: Palgrave Macmillan, 2017.
- Spiro, L. “‘This Is Why We Fight’: Defining the Values of the Digital Humanities.” In Debates in the Digital Humanities, edited by Matthew K. Gold. Minneapolis: University of Minnesota Press, 2012.
- Tahmasebi, N., L. Borin, and A. Jatowt. “Survey of Computational Approaches to Diachronic Conceptual Change.” ArXiv:1811.06278 [cs.CL] (2018).
- Tokdemir, E., E. Sedashov, S. H. Ogutcu-Fu, C. E. M. Leon, J. Berkowitz, and S. Akcinaroglu. “Rebel Rivalry and the Strategic Nature of Rebel Group Ideology and Demands.” Journal of Conflict Resolution 65, no. 4 (2021): 729–58. https://doi.org/10.1177/0022002720967411.
- Underwood, T. “A Genealogy of Distant Reading.” Digital Humanities Quarterly 11, no. 2 (2017).
- Van de Camp, M. “A Link to the Past: Constructing Historical Social Networks from Unstructured Data.” PhD thesis, Tilburg University, 2016.
- Van den Akker, C., S. Legêne, M. van Erp, L. Aroyo, R. Segers, L. van der Meij, J. van Ossenbruggen, et al. “Digital Hermeneutics: Agora and the Online Understanding of Cultural Heritage.” In WebSci’11: Proceedings of the 3rd International Web Science Conference, 1–7. New York: Association for Computing Machinery, 2011.
- Van Hage, W. R., V. Malaisé, R. Segers, L. Hollink, and G. Schreiber. “Design and Use of the Simple Event Model (SEM).” Journal of Web Semantics 9, no. 2 (2011): 128–36.
- Van den Hoven, M., A. van den Bosch, and K. Zervanou. “Beyond Reported History: Strikes That Never Happened.” In Proceedings of the First International AMICUS Workshop on Automated Motif Discovery in Cultural Heritage and Scientific Communication Texts, Vienna, Austria, 20–28. 2010.