Computational Humanities: Maps as Data


Chapter 6

Maps as Data

Katherine McDonough

What if historians could change the way we interact with maps? Though it became common to mechanically reproduce maps in the late eighteenth century, it has remained uncommon to examine them in large numbers. As with a large set of books, it is rare to work with thousands of maps since it can be difficult to examine them simultaneously. In fact, maps are particularly challenging because they can be so large. It is hard to fit more than a dozen small maps on the largest tables in map libraries. Beyond table size, other material- and preservation-related constraints come into play. Like books, maps bound in a volume can only be consulted page by page, and very delicate maps sometimes cannot be requested at all. Using maps in small numbers therefore became standard practice over the last 200 years. This is relevant here because the way we access historical documents is related to the kinds of questions we ask and the ways we answer them.

Because of those constraints, historians and other humanists tend to ask questions that require treating maps as if they were books: sequentially, rather than in sets. Many historians continue to think of maps as individual illustrations of past landscapes or, at best, as “texts” to be “read” one by one or in manageably sized sets (Edney, Cartography). Historians of maps and mapping practices, who often conduct studies of hundreds of maps, tend to be dedicated to understanding maps as cultural objects (for example, Withers, “On Trial”).1 Such narratives, which string maps together to show how mapping practices and products changed over time, have contributed to arguments about empire, urban development, mobility, the consolidation of state power, and scientific research.2 Yet these claims are based on examining maps one by one, mimicking both the way we encounter them in the reading room and the way we read other texts. So, if maps have had relatively limited roles in humanistic research as large groups of primary sources because of this path dependency, we could ask ourselves how our questions and claims might change if we could access them differently.

Maps are not akin to written documents: they consist primarily of visual features—with text, numbers, or other symbols added for their explanatory power. Interpreting their rhetorical messages requires accounting for new kinds of agents involved in their creation and reproduction. More fundamentally, it can be hard to interact with maps because it can be difficult to find ones that are relevant to a given investigation. They often are not cataloged in as much detail as books, pamphlets, or archival documents. Scanning maps, or digitizing at the document level, gained momentum in libraries and archives in the 1990s and continues. This was an opportunity for researchers like me to rethink the status quo.3 Digitization encouraged people to develop new strategies for looking at maps: moving from the reading-room table to a set of screens opened a creative space for imagining new ways of interacting with multiple maps at once, for example, as superimposed layers in geographic information systems (GIS) or visualized in other ways.

Nevertheless, even with digitized collections—celebrated by curators for improving access to rare items and by researchers for reducing the costs of research trips—the habit of thinking with one map at a time has held fast. Humanities projects that engage with scanned maps using GIS still do so in small batches. But there are other possibilities. My own research, for example, was transformed by being able to review files scanned by libraries thousands of miles away from those institutions and quickly change the size of an image. I used a high-definition screen at the David Rumsey Map Center to visualize much larger versions of small paper maps from eighteenth-century France. These maps documenting road construction took on new meaning as I deciphered faint pencil marks reappearing across the set. Digitization gave me access to high-quality images from multiple collections in France that would otherwise have been impossible to see simultaneously. Piecing together these maps on the screen brought me closer to the way that civil engineers saw them as they organized forced-labor service in rural Brittany. But more importantly, it opened up possibilities beyond simply scanning and viewing the images in finer detail and sharper focus. Next, I digitized the content of the maps by developing a dataset of the names and locations of the villages and roads drawn on the maps. More than simply an Old Regime engineers’ tool for managing construction labor, the maps reflected how provincial leaders were beginning to organize rural places for political purposes. I examine, for example, which communities were not indicated on the maps and why.

My set of maps was small enough to work with manually, but now there are hundreds of thousands of maps scanned around the world. Imagine what looking closely at a much larger set might offer! But it is impossible for a single researcher to examine so many maps with an analytical eye. How can we move beyond simply digitizing images and uniting disparate collections to asking questions about the content of very large corpora of maps? What do we need to do to turn digital images of historical maps into interpretable data for humanities research?

Soon, looking at maps on screens shifted to extracting data from maps. Map content became fodder for creating digital data. This second wave of digitization—creating discrete, structured data from maps rather than treating a sheet as one digital object—raised a new set of obstacles and opportunities. Data from maps needs to be critically created and curated; it can be linked to other data. At this point, the question became whether to spend time on manual data creation (for a historian, this often boiled down to whether to do spatial history at all).

Now, propelled into the third wave of digitization by machine learning–enabled computational image analysis and its application in the visual digital humanities (DH) since the late 2010s, we have arrived at the point where it is possible to digitize the content of maps automatically rather than manually. Given this shift, two key questions are open for debate. The first question explores the tipping point between manual and automatic methods while the other seeks to bring a humanistic eye to automated methods. In other words, what are the merits of digitizing data from maps manually versus automatically? And if we embrace automatic data creation at scale, how do we approach historical maps critically as we investigate them?

Answering these questions is fundamental to the future of research with scanned map collections. And they are timely because the number of digital map collections around the world is growing quickly. It has taken a long time for the demand to work differently with maps to surface because of the kind of deep-rooted scholarly traditions that come almost naturally to historians. More than just single images, map content is one of the next frontiers in working computationally with collections as data.4 Because of digitization, historians—and other researchers—are now able to reimagine what we ask of maps.5 In this chapter, I trace these three “waves” of digitization, foregrounding their chronology. The first three sections each address one wave: scanning, early uses of GIS by historians based on manual data creation, and automatic data creation from very large map collections.

The last wave introduces MapReader, a computer vision machine learning pipeline that I have developed with colleagues on Living with Machines.6 Specifically, I use an experiment in which we develop a new dataset for capturing the footprint of railway infrastructure (“railspace”) in nineteenth-century Britain as shown on early Ordnance Survey (OS) maps scanned by the National Library of Scotland. As an example of the third wave of digitization, MapReader is a method informed by deep attention to source criticism, but one that nonetheless challenges historians to work with data predicted by a machine. Like a visual census (Hosseini et al.), OS maps provide national coverage of the landscape at a very granular level but on far too many sheets for even our large team to manually digitize. MapReader combines the efficiency of speed with the flexibility of an iterative workflow designed to encourage historians to think outside the box when it comes to labeling parts of a map.
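The patch-based idea behind a pipeline like MapReader can be sketched in a few lines: each scanned sheet is sliced into small, fixed-size square patches, and the patch, rather than the whole sheet, becomes the unit of labeling and prediction. The sketch below is illustrative only, not the actual MapReader API; the patch size, function name, and dimensions are assumptions for the example.

```python
# Illustrative sketch of patch-based map processing: cover a scanned
# sheet with fixed-size square patches so each patch (not the whole
# sheet) can be labeled, e.g., "railspace" / "no railspace".

def patch_grid(sheet_width, sheet_height, patch_size):
    """Yield (left, top, right, bottom) pixel boxes covering the sheet."""
    for top in range(0, sheet_height, patch_size):
        for left in range(0, sheet_width, patch_size):
            yield (left,
                   top,
                   min(left + patch_size, sheet_width),
                   min(top + patch_size, sheet_height))

# A hypothetical 1000 x 600 px scan cut into 500 px patches -> 4 patches.
patches = list(patch_grid(1000, 600, 500))
print(len(patches))   # 4
print(patches[0])     # (0, 0, 500, 500)
```

Because every patch carries its own pixel box, predictions can later be mapped back onto the georeferenced sheet, which is what allows patch labels to aggregate into a national picture of something like "railspace."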

Of course, such a neat distinction between “waves,” or phases, is an oversimplification: in practice, they have been and will continue to be interwoven. Nevertheless, the narrative of digitization shines a light on the emergence over time of different opportunities for studying maps. In addition to this three-part narrative, this chapter also grapples with just what “digitization” means. In the humanities, to say that a map has been digitized usually means that it has been scanned and can be accessed as an image file. In geographic information science (GIScience), “digitizing a map” means turning its content into machine-readable data. This substantial difference in disciplinary expectations of digitization highlights that scanning maps is really only the first of many potential steps. An entire set of practices for working with maps has grown in the space between these two definitions: I return to this in the final section of this chapter. Automatic methods will never replace the manual work of digitizing map content because each approach suits different research questions and sources. Will historians learn to ask new questions and develop interpretations of very large sets of maps when these are dependent on machine-generated data? In other words, can we trust data predicted by machines?

Digitization Wave 1: Historical Maps as Digital Objects

Maps as digitized (scanned) sources were hard to come by twenty years ago. But today, big (well, at least medium) map data could be a reality. Thanks to government and institutional commitments to digitization of cultural heritage materials beginning in the 1990s, today hundreds of thousands of maps have been scanned.7 For a cross-institution overview, OldMapsOnline enables searching by location across major digitized collections: it has about 500,000 maps as of 2022.8 With so many maps being digitized, the next crucial step is making them openly available. Because public funds underpin this work, the results are usually, but not always, freely accessible for noncommercial purposes.9 Libraries and archives often share individual images from their digitized collections online using their own catalogs. For instance, the Library of Congress exposes each sheet of its newly scanned Sanborn Fire Insurance Maps collection this way. Other times, institutions make material available through APIs (OldMapsOnline, for example), or simply through data repositories or “lab” sites such as the National Library of Scotland’s Data Foundry.10 The British Library (BL) is, in this vein, releasing thousands of nineteenth-century Ordnance Survey maps funded by the Living with Machines project through BL Labs.11 Prioritizing access in these ways reflects the principles of “collections as data.” It demonstrates that institutions are seriously engaging with the call to build their digital collections with computational uses in mind (Padilla et al., “Santa Barbara Statement”).

Creating collections as data, however, requires ongoing effort to lower barriers of access to now very large sets of digital maps held around the world. Scanning and sharing maps online are only the most basic elements in a digitization project that make it possible to use these materials as historical data. Enriching maps with sheet-level metadata (e.g., cataloging them) and georeferencing the scanned images are two key tasks that make maps discoverable and allow them to be used with computational methods. However, many maps—especially the largest serial map collections—have not been cataloged, and institutions rarely have the resources to commit to this particular task. Similarly, georeferencing has been a postprocessing task that many libraries are not equipped to complete because they often lack the resources to train or hire staff and then commit significant staff time to this intensive work. Crowdsourcing has filled this gap for some institutions that can at least oversee georeferencing done by the public. The David Rumsey Map Center, the British Library, and the National Library of Scotland have used the Georeferencer platform (Fleet, Kowal, and Přidal).12 In other contexts, libraries have experimented with designing ways to georeference maps in batches; Ordnance Survey maps at the National Library of Scotland (Fleet, “Creating”) are an example. More recently, the AllMaps project is making it easy to georeference maps shared via IIIF (International Image Interoperability Framework).13 The power of these tasks cannot be overstated: metadata about print dates, for example, is what allows us to analyze maps in time, while georeferencing is what plots them in space. Together, these two elements transform scanned maps from single images unrelated to each other into a group of items that have a chronological and spatial relationship. Authoring item-level metadata and georeferencing maps are two steps that allow us to work with maps as data.
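Concretely, what georeferencing adds to a scan is a transform from pixel coordinates to geographic coordinates. Platforms like Georeferencer and AllMaps fit that transform from ground control points; the minimal sketch below simply states the six affine coefficients directly, in the style of a GIS "world file." The coefficients and coordinates are invented for illustration.

```python
# Minimal sketch of the result of georeferencing: an affine transform
# mapping pixel (col, row) positions on a scan to (lon, lat) on the
# earth. The six coefficients are made up for this example.

def make_pixel_to_geo(a, b, c, d, e, f):
    """Return a function mapping (col, row) pixels to (lon, lat)."""
    def transform(col, row):
        lon = a * col + b * row + c
        lat = d * col + e * row + f
        return lon, lat
    return transform

to_geo = make_pixel_to_geo(0.001, 0.0, -3.2,    # lon = 0.001*col - 3.2
                           0.0, -0.001, 55.9)   # lat = -0.001*row + 55.9

print(to_geo(0, 0))      # (-3.2, 55.9): the scan's top-left corner
print(to_geo(500, 250))  # approximately (-2.7, 55.65)
```

Once every sheet carries such a transform, scans from different collections can be layered in the same coordinate space, which is exactly what turns "single images unrelated to each other" into a spatially related group.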

Before the collections as data and related open data movements emerged in the mid-2010s, historians had no expectation that they might find digital maps online. Exceptions to this rule—such as the online catalogs at the David Rumsey Map Collection and the French National Library’s Digital Library (Gallica)—were so stunning that it took a few years to realize that just looking at maps on these websites was not the pot of gold at the end of the rainbow. In the next section, I walk through the early experiences of using maps as sources from which data could be generated using GIS. Often done one map at a time, this was nonetheless an exciting time when researchers began to realize the opportunities and limitations of GIS.

Digitization Wave 2: Make Your Own Data from Maps

Historians and archaeologists (among others) have learned to work with GIS as a way of managing historical and prehistorical information with a spatial context.14 Before there were large collections of scanned maps online, we might scan a map or two with the help of map librarians. Those scanned maps could be used as raster data to provide a historical base map in GIS. Many introductions to GIS begin with learning to georeference a scanned map and trace that map’s features “by hand” (e.g., using a mouse or trackpad to draw features so that they appear superimposed on the scanned image). In their irreplaceable guide to using GIS in historical research, Ian Gregory and Paul Ell walk readers through scanning a map, the basics of digitizing vector data, and georeferencing (Historical GIS, 43–51). The latter two steps transform map content into points, lines, and polygons that are located in the real world and that can be stored in a geospatial database.15 It is then possible to layer information from different sources and to examine those layers at different geographical scales. But as Gregory and Ell warn, GIS has a “tendency to exclude data that cannot be represented as points, lines, polygons, or pixels” (Historical GIS, 40). There are therefore two issues with the method of digitizing map content using GIS that are worth unpacking here in the context of this exploration of how we might work with maps differently as historians. First, there is a practical problem. Using GIS to create data manually can require immense resources. Second is an epistemological issue. GIS data has a very specific form that lends itself to specific representations of the built and natural environments.
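The points, lines, and polygons that Gregory and Ell describe have a standard open representation: GeoJSON, which most GIS tools can read and write. The sketch below shows one feature of each type as a historian might digitize them from a scanned map; all coordinates and property names are invented for illustration.

```python
import json

# One example of each vector type, as GeoJSON Features: a village as a
# point, a traced road as a line, a parish boundary as a polygon.
features = [
    {"type": "Feature",
     "geometry": {"type": "Point", "coordinates": [-2.9, 48.2]},
     "properties": {"name": "Example village"}},
    {"type": "Feature",
     "geometry": {"type": "LineString",
                  "coordinates": [[-2.9, 48.2], [-2.8, 48.25], [-2.7, 48.3]]},
     "properties": {"feature": "road"}},
    {"type": "Feature",
     "geometry": {"type": "Polygon",   # ring closes on its first point
                  "coordinates": [[[-3.0, 48.1], [-2.6, 48.1],
                                   [-2.6, 48.4], [-3.0, 48.4],
                                   [-3.0, 48.1]]]},
     "properties": {"feature": "parish"}},
]

collection = {"type": "FeatureCollection", "features": features}
print(json.dumps(collection)[:40])  # serializable, loadable in any GIS
```

The format makes Gregory and Ell's warning tangible: anything on the map that cannot be coerced into one of these geometry types, or into the flat `properties` table attached to each feature, simply has no place in the data.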

Time and Money

This kind of digitization—making vector data from maps—has been a foundational task in spatial history and historical geography, despite warnings about the expense of creating such datasets. Gregory and Ell suggest to their readers that they “should be asking ‘what are the geographical aspects of my research question’ rather than ‘what can I do with my dataset using this software?’” (Historical GIS, 1; see also Anne Kelly Knowles, “Introduction,” 462). The extensive labor involved in transforming maps and other analogue documents into data has played a significant role in shaping spatial history.16 Manually digitizing map content is a useful part of the research process, like note-taking or linguistic annotation. But the time (or, put another way, money) required to perform this labor can be a barrier for scholars who need to create their own data. Colleagues working in underresourced settings lack funding that supports data creation, whether that is for personal research or for assistants. In a growing number of large, funded projects supporting spatial history research, the manual methods of digitizing map content continue to play an important role: the large-scale, Europe-based Time Machine supports groups who scan, transcribe, and organize maps along with other records in openly available geohistorical databases.17

Tracing line or polygon data (like roads or building footprints) is a classic, time-consuming digitization task that is well suited to robust GIS software packages. But many projects increasingly want a lighter-weight solution, especially for annotating point-based or simple polygon data (like place names and the symbols depicting the location of the place on a map). For work like this, the open-source platform Recogito is a popular tool for annotating features on maps and linking them to other datasets (Vitale, “Pelagios”). But in order to annotate a place name to assign it a location (on a map that has not been georeferenced), the user needs access to a knowledge base that indexes place names with metadata about those places. These knowledge bases are known as gazetteers. They serve many purposes, including disambiguating between place names and providing location data. They therefore allow linking across datasets. However, nonscholarly gazetteers (e.g., Geonames) cause more problems than they solve for many historical places. Before the surge of community interest in gazetteer development, large parts of the world as well as premodern periods simply had no reliable digital resources to point to for linking place names to locations (McDonough, Moncla, and van de Camp, “Named Entity Recognition”).18 Working with small datasets, locating places by hand is possible: the expert researcher identifies the coordinates of each location one by one. The World Historical Gazetteer is an important contribution for helping researchers identify gazetteers built by others, reconcile records in one resource against another (such as born-digital knowledge bases like Wikidata), and download data to reuse in platforms like Recogito. Access to carefully curated, linked gazetteers that are formatted using open data is a timesaver and best practice that promotes open, reproducible research in the humanities.
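A gazetteer's two jobs described above, disambiguation and location, can be made concrete with a toy example. The entries below are invented; real resources of this kind include Wikidata and the datasets aggregated by the World Historical Gazetteer.

```python
# Toy gazetteer: a knowledge base mapping place names to candidate
# records with identifiers and coordinates. Entries are invented.
GAZETTEER = {
    "Richmond": [
        {"id": "toy/1", "lat": 54.40, "lon": -1.74, "country": "GB"},
        {"id": "toy/2", "lat": 37.54, "lon": -77.44, "country": "US"},
    ],
    "Rennes": [
        {"id": "toy/3", "lat": 48.11, "lon": -1.68, "country": "FR"},
    ],
}

def resolve(name, country_hint=None):
    """Return candidate records for a place name, optionally filtered."""
    candidates = GAZETTEER.get(name, [])
    if country_hint:
        candidates = [c for c in candidates if c["country"] == country_hint]
    return candidates

print(len(resolve("Richmond")))            # 2: ambiguous without context
print(resolve("Richmond", "GB")[0]["id"])  # toy/1
print(resolve("Atlantis"))                 # []: a gap in the knowledge base
```

The empty result for the last lookup is the situation the chapter describes for premodern and non-Western places: before community gazetteer-building, there was often simply no record to link to, no matter how carefully the researcher annotated.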

Even with advances and semiautomatic tools like Recogito, building a GIS database by hand requires a large investment of personal time and institutional resources. One justification for this work is now the promise of open, reusable data. Early historical GIS projects rarely made their complete databases publicly accessible, perhaps because they reflected so much labor over the years. Even today, in the age of data papers, institutional and national research repositories, and digital object identifiers (DOIs) for nontraditional outputs, it can feel like data creation work gets short shrift. It is likely one of the reasons that many people using GIS would prefer to embark on projects small enough to complete alone or with only a couple of collaborators in order to get past the data creation stage as quickly as possible. Historians using GIS want to prioritize making claims based on their spatial evidence rather than eking out an existence tracing railway tracks across the screen. But like the constraint imposed by the size of reading-room tables, time is a barrier to working with maps at scale, even when using tools like GIS.

So the challenge is to cultivate a humanistic approach to working with maps as data that allows individuals or small groups to use large map collections as input and yet still have valuable time to explore, analyze, interpret, and write about the results. Moving away from the resource-heavy requirements of GIS data preparation can allow researchers to ask specific questions, whose concepts can be represented creatively by data structures, and to craft an argument based on that evidence.

GIS Data Models

When used to translate map content into data, GIS software has made it very easy to strip maps of their richness and encourage digitization of information that lends itself to representation as points and other geometries. For example, much effort has gone into drawing boundary lines for parishes, cities, nations, and other jurisdictions. But everything we know about pre-nineteenth-century boundaries suggests that it is not appropriate to draw such well-defined lines across the digital landscape (Scholz, “Deceptive Contiguity”). Continuous information on a map becomes discrete vector data isolated from its context when it is digitized. Once you work with the vector data alone, you literally lose sight of that context in which map information was presented and begin to treat information digitized from maps as verified, real objects on the ground. Time and again, we have learned that maps can be misleading, fanciful, or even just unintentionally error-prone: reifying this information in vector data is not only time-consuming for the person doing the digitization, but it is also potentially hazardous in terms of presenting data that could be reused without an understanding of its limits.19

In contrast, humanities researchers might want to work in a computational environment where they can flexibly and contextually interrogate map scans. Prolonging the time spent with the map as a source of contested information and enabling change of direction after initial exploration are valuable actions that currently have no formal place in GIS approaches to working with scanned maps. Miriam Posner warned us about the dilemmas of talking about “humanities data.” Researchers in the humanities struggle with thinking about our sources, maps among them, as something that can be turned into discrete pieces of information that continue to have the same value independent of their context. “You’re not extracting features in order to analyze them,” she writes, “you’re trying to dive into it, like a pool, and understand it from within” (“Humanities Data”). This call for working with maps as data is founded on the imperative of retaining contextual meaning. We need a way to “go swimming” in a large corpus of maps as well as to be able to analyze the outputs of spatial analysis during data exploration and analysis.

Working with maps as data should allow researchers to use them as primary sources that are constituted through map-making and map-reading practices. A humanistic approach to working with maps at scale can recalibrate the danger of reading maps as truth statements. How can humanities scholars contribute to computational map processing to embrace another view of the world (shown on maps)? These questions point to places where humanists can join the conversation, changing the future of scientific and humanistic uses of maps. Humanistic inquiry around spatial data has already provided a number of important critiques of a decontextualized, scientific approach. The work of getting from map to data requires “knowing what is lost” and how this “is critical to understanding what can and cannot be learned from the extracted and chosen data” (Nicole Coleman, “Everything Is Data”). As we explore the potential of working with automatic methods, at scale, using different data formats, we must not lose sight of the politics of making data from maps.

GIS tools created to organize, explore, and analyze contemporary spatial information are not always well suited to understanding change over time, race, and inequality. In the last ten years, there have been challenges to the status quo that historical data should have the same shape as twenty-first-century data. If GIS has helped scholars uncover hidden histories in data, it was still designed to manage people and communities (in addition to more nefarious military applications) (Schuurman, GIS: A Short Introduction; Kurgan, Close Up at a Distance). Kim Gallon, in “Making a Case for the Black Digital Humanities,” reminds us that “computational processes might reinforce the notion of a humanity developed out of racializing systems,” which brings to mind long-standing debates about the power of maps and the emergence of critical GIS studies. As we move toward working at scale with digitized maps, this is yet another reason to develop alternative modes of creating spatial data (Jefferson, “Predictable Policing”).

Finally, why are we constrained by the rules set by GIS data models? Miriam Posner (“What’s Next”) has asked: “What would maps and data visualizations look like if they were built to show us categories like race as they have been experienced, not as they have been captured and advanced by businesses and governments?” Rather than depending on practices engrained in GIS software features, we can, as Posner suggests, “build something entirely different and weirder and more ambitious.” Giving the example of David Kim’s work on Edward S. Curtis’s photographs of Native Americans, she calls for looking at other ways of engaging with maps, ways that offer scholars a chance to reframe what we are looking for in maps and how we translate that into machine-readable content.

Coleman, Gallon, and Posner ask us to reconsider existing ways of making and processing humanities data. I believe there is a timely intervention that begins at the level of language and carries over into method development and research workflows. Across the disciplines now using computational methods to analyze media, a rhetoric of exploitation has been used when talking about getting data from maps, texts, and other sources—they are documents to be “mined.” For maps in particular, this language reverberates with the knowledge that GIS is used to document natural resource extraction around the world. Maps, like the earth itself, can be tunneled into, and data, like minerals, can be removed. Is this the way we want to interact with the past? Nuance, complexity, uncertainty: these are the historian’s constant companions, and they are usually out of place in the fairly inflexible world of GIS data. So, if we are not going to mine maps, what are we doing? In my work, I am testing out alternative language: instead of drilling down, what if we generate, create, or classify? The language we use to talk about how we work with maps as data has power.

Digitizing map content using GIS has made it possible to incorporate maps into new spatial historical scholarship, but still on a small scale. DH projects dependent on manual data creation are at a crossroads. If they have the resources, and if the nature of the research lends itself to representing space with vector data, then what follows might be of little interest. But I imagine that many people will have two questions about using automatic methods dependent on machine learning: Will the results be trustworthy? And how can I think differently about representations of space? For instance, will spatial analysis reproduce historical inequalities? Will it divorce my analysis from the visual context of my objects of inquiry?

The current opportunity to ask questions of very large numbers of maps is an unprecedented chance to think both about maps as repositories of information and, with careful planning, how mapped landscapes can be read to understand social, cultural, or environmental history. Sometimes, because of the nature of the question, there is no replacement for manual data creation. When the not unreasonable error rate of 10 to 20 percent in an algorithm’s predictions introduces too much doubt in a dataset, human effort is the solution. But it is likely that the way forward will embrace a hybrid approach with methods that save time by automating what machines do well and allowing experts to validate or complement machines where necessary.

Wave 3 Catalysts

I now turn to two developments that helped us move toward this third wave: automatic creation of vector data from scanned maps. Natural language processing, “text as data,” machine learning, computer vision, and visual DH all contribute to emerging work analyzing map content at scale.

Text as Data

In the last five years, using text as data introduced historians to the advantages and disadvantages of working at scale. Like the growth of distant reading in literature departments, text as data is shaping research in political science and economics (see Grimmer and Stewart; Gentzkow, Kelly, and Taddy). In history, take-up has been slower than in, for example, English, but this is now beginning to change with work that brings together historians, information theorists, computational linguists, and others (see Barron et al., “Individuals”; Hitchcock and Turkel, “The Old Bailey Proceedings”). Linking text analysis and spatial analysis was the next step. Using methods from natural language processing, tasks like named entity recognition (NER), where a named entity is a unique reference to a thing, such as a person or place, and more broadly, the suite of practices now known as geographic information retrieval (GIR), it is possible to explore the spatial dynamics of texts and generate datasets from them. This has been a core activity of researchers in the spatial humanities, a community that crosses multiple disciplines and is united by a shared concern with working computationally with usually historical primary sources that can be analyzed spatially. It is also a concern among geographers, computer scientists, and linguists: GIScientist May Yuan wrote a key chapter on “Mapping Text” in the 2010 volume The Spatial Humanities: GIS and the Future of Humanities Scholarship. At the same time, the linguistics team behind the Edinburgh Geoparser was publishing their first papers on software, which many humanities researchers still encounter as an introduction to semiautomatic identification (with NER) and georesolution of place names in text (Grover et al., “Use of the Edinburgh Geoparser”).
Finally, a series of projects led by Ian Gregory, including Mapping Lake District Literature and Spatial Humanities, contributed to the emergence of historical, spatial text analysis at scale.20 This work taught humanities researchers how to bridge the gap between qualitative and quantitative ways of thinking about space and place, how to assess bias in a corpus, and how to grapple with statistical measures as part of the evidentiary basis for an argument. This opened the door to translating such skills from texts to visual documents like maps.

Visual DH

Historians, art historians, archaeologists, curators, and others have begun to work in creative ways with computer vision and visual sources, launching a visual turn that already shows promise for combining state-of-the-art methods in machine learning, statistics, computer vision, and the humanities. This visual turn hinges on developing methods and theories for working computationally with visual and audiovisual sources. Lauren Tilton, Taylor Arnold, Thomas Smits, and Melvin Wevers have advocated for “distant viewing,” extending to visual collections the computational methods already embraced by the DH community working with texts. Tilton and Arnold offer the term “distant viewing” as a framework for “making explicit the interpretive nature of extracting semantic metadata from images” (Arnold and Tilton, “Distant Viewing,” i3–i4).21 As they suggest, creating semantic metadata, or information about the content of images, in a critical and well-documented manner is at the core of a humanistic computer vision research agenda. The visual turn proposed by these authors highlights machine learning–driven computer vision using convolutional neural networks (CNNs) as a method for searching. Computer vision with CNNs—one type of deep learning algorithm—learns visual features from relationships between pixels and then uses those features to predict patterns in unseen data. The visual turn in DH is simultaneously a shift in medium and a call for working at scale by embracing machine learning (the predictive part) and statistics (the quantitative part).

For example, Smits and Wevers argue in their analysis of images in newspapers that applications of computer vision are within reach of humanities scholars or at least interdisciplinary teams (Wevers and Smits, “The Visual Digital Turn”).22 The projects that I worked on at the Alan Turing Institute (Living with Machines and Machines Reading Maps) brought together expertise in multiple domains to make new contributions to computational image analysis with historical documents. But one of the goals of Living with Machines was to lower the cost of entry to this method. And we are not alone. Tools already exist that make visual DH accessible to individuals with no previous experience in this kind of research. PixPlot from the Yale DH Lab allows users to explore visual patterns in large, static image collections as a data exploration tool.23

Thinking about maps as data in new ways is one of the exciting opportunities the visual turn in DH is forging for the computational humanities. Like the photographs, newspapers, TV programs, paintings, and born-digital images that are now being studied, maps require careful attention to transform their visual, qualitative features into quantitative data. In applying insights from the visual DH turn to maps, of particular interest is the opportunity to replace now standard GIS “codes” (e.g., interpreting a building on a map as a polygon) with a new set of codes that are appropriate to humanities research questions. New codes, in machine learning jargon, are labels or metadata about an image or a region of an image. Just like GIS data structures, these codes play a role in determining what kinds of interpretative outcomes are possible. However, we lack methods for large-scale analysis that enable this rethinking of how to capture spatial phenomena as shown on maps. In the following section on the latest digitization wave, I introduce MapReader, a computer vision pipeline that specifically focuses on what kinds of tasks and labels are suited to working with historical maps. MapReader prompts users to ask questions of maps that might challenge existing paradigms of spatial data while offering the chance to work at the unprecedented scale of tens of thousands of maps.

Digitization Wave 3: Automatic Data

In this section, I discuss the MapReader pipeline.24 This computer vision pipeline offers practical answers to the questions driving this chapter: how to work critically with digital maps and how and why to automate data creation and curation. MapReader is a key output from Living with Machines, a digital history project colocated at the Alan Turing Institute and the British Library in London.25 Two key features of this project were at the heart of MapReader’s creation: first, a desire to work in new ways as historians with colleagues from many disciplines in order to ask and answer historical questions of large, digitized collections; and second, a critical awareness that automating data creation requires new theories and practices for accommodating machine-generated error. I want to contextualize MapReader in relation to the new field of historical map processing that has largely developed within computer science.

The high costs of historical geospatial data creation paired with the massive explosion in online map collections have created a body of scholarship in GIScience called historical map processing (also referred to as the automatic extraction and geolocation of map content, or raster-to-vector conversion).26 In Figure 6.1, we see what this research aspires to encompass. It includes a number of methods used frequently in the spatial humanities (georeferencing, querying gazetteers) but positions map processing as an end-to-end workflow that produces carefully created “analysis-ready historical geospatial data” mimicking the manual vectorization process.

If automatic map processing was an afterthought in 2007 (Gregory and Ell, Historical GIS, 45), by 2020 it had become a well-funded challenge for many teams around the world. Even Google Research joined the fray (Tavakkol et al., “Kartta Labs”). Scientists and businesses see immense value in historical maps—to create fine-grained, longitudinal datasets about, for example, housing stock, forest cover, or mining resources.

Historical map processing workflow diagram.

Figure 6.1. Example of a historical map processing workflow, including methodological approaches for each step (in gray boxes). Source: Uhl and Duan, “Automating Information Extraction from Large Historical Topographic Map Archives,” 511.

Figure Description

An example of a “historical map processing workflow” from the geographic information science field, illustrating possible distinctive steps (darker boxes) for using computational methods (in lighter boxes) for creating machine-readable, analysis-ready data from scanned maps. Most of these steps and methods represent areas of active research. (Source: Uhl and Duan, “Automating Information Extraction from Large Historical Topographic Map Archives.”)

This exciting new work will benefit from greater communication and collaboration with humanities researchers and curators, who can refine these methods and attend to the implications of turning qualitative features in maps into quantitative, precisely located data stored in a database and visually abstracted from their context. One risk of this drive to vectorize, and to do so quickly, is that it may reinforce the (very) outdated view of maps as containers of objective information. Layering different datasets has the potential to reveal dissonance between mapping cultures, but too often layers calcify around each other and turn abstracted features into factual statements about what is or was on the ground.

MapReader

MapReader makes it possible to ask historical questions of large collections of maps (Hosseini et al., “Maps of a Nation?”).27 When it comes to working with scanned maps at scale, we need to simplify massive collections so that visual patterns across space and time, like streets of terrace housing or industrial areas, are actually findable. Making vector data from specific features on maps is an important way to explore spatial information over time and space, but it is not the only way. Raster data assumes that the graphic primitive, or basic form, is the pixel. With MapReader, we propose a flexible, modifiable area created by systematically dividing a map sheet into equally sized squares—patches—as the unit of analysis (Hosseini et al., “MapReader”).

For Living with Machines, the goal was not just to speed up the process of making vector data but rather to reframe how historians can engage with maps. We have created a method for iteratively asking questions and creating simple outputs that can then be analyzed as geolocated, structured data. The inputs to our computer vision pipeline, MapReader, are scanned maps (downloaded as web map tiles from the National Library of Scotland servers). The outputs are predictions of a researcher-defined label that correspond to a certain area of the map. These areas are “patches” of the map (see Figure 6.2), the size of which is determined by the researcher as a preprocessing step after the map tiles have been acquired and linked to collection metadata. This metadata contains bounding-box coordinates that allow us to reconstruct the limits of the physical map sheets among the downloaded tile layers.28 Each patch represents a section of the map, which one can then ask questions of by assigning labels.
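The preprocessing described above, slicing a georeferenced sheet into equally sized square patches, can be sketched in plain Python. This is a minimal illustration only, not MapReader's actual implementation: the function name, the 100-pixel patch size, and the simple linear pixel-to-coordinate mapping are all assumptions for the sake of the example.

```python
def make_patches(width, height, bounds, patch_px=100):
    """Divide a scanned map sheet into equally sized square patches.

    width, height: pixel dimensions of the sheet image.
    bounds: (min_lon, min_lat, max_lon, max_lat) of the georeferenced sheet.
    Returns a list of patch records, each with its pixel box, its geographic
    bounding box, and its centroid (usable later as simple point data).
    """
    min_lon, min_lat, max_lon, max_lat = bounds
    lon_per_px = (max_lon - min_lon) / width
    lat_per_px = (max_lat - min_lat) / height
    patches = []
    for top in range(0, height, patch_px):
        for left in range(0, width, patch_px):
            right = min(left + patch_px, width)
            bottom = min(top + patch_px, height)
            # Image rows count down from the top; latitude counts up.
            geo = (min_lon + left * lon_per_px,
                   max_lat - bottom * lat_per_px,
                   min_lon + right * lon_per_px,
                   max_lat - top * lat_per_px)
            patches.append({
                "pixel_box": (left, top, right, bottom),
                "geo_bbox": geo,
                "centroid": ((geo[0] + geo[2]) / 2, (geo[1] + geo[3]) / 2),
            })
    return patches
```

A real pipeline would read pixel dimensions from the image itself and respect the map projection; the point of the sketch is only that a patch is a regular, geolocatable unit derived mechanically from the sheet and its collection metadata.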

Labeling data during annotation exercises to collect gold standard or training data becomes an active process of considering what a map is and what you can ask of it, given the “skills” as well as the limitations of state-of-the-art computer vision models. Once we have enough training data, we fine-tune and then use a CNN model to predict those same labels for millions of patches across thousands of maps. Because each patch prediction is geolocated (when input maps are georeferenced), we can perform further spatial analysis on the patches and link them to other patches with different labels or external datasets. The patches are effectively raster (image) data: the label is an attribute of the patch segment of the map, but we can work with the patch as simple point data (based on the centroid of the patch) or as a square representing the bounding box of the patch (with or without the image data included).
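Working with a geolocated patch as centroid point data or as its bounding square, as described above, might look like the following sketch. The function name is hypothetical; only the GeoJSON-style geometry structure it emits is a standard format.

```python
def patch_geometry(geo_bbox, as_point=True):
    """Represent a patch as a centroid point or as its bounding square.

    geo_bbox: (min_lon, min_lat, max_lon, max_lat) of one patch.
    Returns a GeoJSON-style geometry dict, ready for spatial analysis or
    linking to external datasets.
    """
    min_lon, min_lat, max_lon, max_lat = geo_bbox
    if as_point:
        # Simple point data based on the centroid of the patch.
        return {"type": "Point",
                "coordinates": [(min_lon + max_lon) / 2,
                                (min_lat + max_lat) / 2]}
    # A closed ring tracing the bounding box of the patch.
    ring = [[min_lon, min_lat], [max_lon, min_lat], [max_lon, max_lat],
            [min_lon, max_lat], [min_lon, min_lat]]
    return {"type": "Polygon", "coordinates": [ring]}
```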

Moving back and forth between the visual representation of patch predictions and csv files of those predictions allows us to remain in touch with the original source. This “patchwork” method used in MapReader is adaptable to any question that seeks to find patterns and distributions of visual information on a large number of maps. It is well suited to questions investigating phenomena that are visual but for which very precise locational data is not necessary. Once the patches for all labels have been predicted, we can perform standard spatial analysis to understand the results. For example, for label X, we could analyze the “density” of X in particular areas: How many X patches are surrounded entirely by other X patches? Where are they located? We can inspect the results visually (qualitatively), overlaid on the maps being investigated, and quantitatively, using software libraries like geopandas. The patch is a useful category of analysis to think about because it represents an action of dividing up the map for closer examination, rather than removing a unique feature to analyze in isolation from the rest of the map content.
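A density question of the kind posed above, how many X patches are surrounded entirely by other X patches, reduces to a neighborhood check on the patch grid. The following is a minimal sketch, assuming patches are indexed by (row, column) grid positions and carry a predicted label; the function name and label value are hypothetical.

```python
def fully_surrounded(labels, target="railspace"):
    """Count patches with the target label whose eight grid neighbors
    all share that label.

    labels: dict mapping (row, col) grid positions to predicted labels.
    Patches on the sheet edge are never counted, since they lack neighbors.
    """
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    count = 0
    for (r, c), lab in labels.items():
        if lab != target:
            continue
        if all(labels.get((r + dr, c + dc)) == target for dr, dc in offsets):
            count += 1
    return count
```

In practice one would run such checks with a spatial library like geopandas over the geolocated patches, but the logic is this simple precisely because patches form a regular grid.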

One research objective of Living with Machines is to explore the impact of the arrival of rail in British communities. Therefore, understanding where railway infrastructure takes up space on British land was a key objective. Given access to the collection of scanned, georeferenced Ordnance Survey map sheets from the National Library of Scotland, we wanted to identify areas where there was a presence of any kind of rail development in the nineteenth century.29 Existing datasets about British railways (tracks and stations) have a few limitations. First, they are incomplete, missing either stations or tracks (especially single-track lines). Second, they have no metadata pinning the track or station lines and points down in time, documenting only what was present in three independent “snapshots” from 1851, 1861, and 1881 (for example, Henneberg et al., “1881”). We could not depend on these earlier open-access datasets because of their coverage gaps and the lack of metadata. But above all, we knew that we had other questions to ask of maps—not about rail. Given this, we wanted a method that could work for our rail questions but equally well for other topics in the project (and eventually for issues outside Living with Machines). Finally, while digitized images of maps sometimes have restrictions on sharing/reuse, the csv files can usually be shared as derived data with simple permissions. The patchwork method supports these goals: it is adaptable to researcher-defined labels (e.g., rail, building, coast, road), captures information on maps about the built or natural environment that might be missing from other primary sources, can be easily checked by the researcher against other sources (e.g., is a feature a real thing or a cartographical mirage?), and can be re-released as open data about the historical landscape.

Our first experiments predict what we define as railspace (any visible rail development in a patch, including stations, depots, warehouses, sidings, and tunnels) and buildings (any size building, anywhere in a patch) using Ordnance Survey (OS) maps for England, Wales, and Scotland from the latter half of the nineteenth century and early years of the twentieth century (e.g., the six-inch-to-one-mile second edition sheets in the National Library of Scotland collections). We ask questions about railspace instead of railways to move away from the idea that we are capturing only the network of tracks. Railspace intentionally captures more than just the immediate pixels for tracks and warehouses, for example. As a concept applied to a patch, its expansiveness acknowledges the way that rail infrastructure affected the land surrounding it. From direct impacts like increased noise or air pollution to indirect ones, such as higher or lower land values (depending on the type of infrastructure), rail often imposed a buffer between it and adjacent development. Railspace brings this buffer zone into the field of vision and reminds researchers of the broader impact of the arrival of rail across the nation. Railspace is not a carefully verified version of the network of freight or passenger rail. It should not be used to calculate transport costs without further data curation and would require transforming the raster patch data into polylines.

In addition to this conceptual stance, what we are really labeling is the way that mapmakers show railspace as it sits among other elements in the landscape. We understand the map as not just a simplification of the environment at a moment in time; it also reflects a set of mapping practices (Kitchin and Dodge, “Rethinking Maps”). In deciding whether and how to label a visual feature on a map, the researcher must ask how the cartographer approached this information at the time and whether or not it is a good candidate for annotation using MapReader. So, as with any method, not everything lends itself to this approach. Often, this is because the label selected by a researcher does not coincide with the content: in dense urban areas on OS maps, for example, it is impossible to distinguish between residential, industrial, and commercial buildings. In this sense, if the researcher is interested in identifying housing, these are simply not the best sources. Thus, “residential” is not a useful label. Labels do not need to perfectly match a single visual signal (railway tracks, for example, as opposed to railspace), but the patch content needs to be visually specific enough to not overlap with other labels you wish to predict. The visual signal for a label might fill the patch entirely or be found in only part of it. The intentional coarseness of the patch (as opposed to a pixel) is part of our general goal of not reifying map features in the digital world as abstracted truth statements. Embracing patches as an alternative data format creates some critical distance between the historical map and the historical concept a researcher is interested in.

MapReader operates in a series of Jupyter notebooks, including ones for training data annotation and review. One experiment includes the steps of creating a corpus of georeferenced maps and their metadata records, labeling a subset of this corpus, testing and fine-tuning the available models, visualizing the initial results on a map to see if there are systematic biases, adding more annotations as necessary, rerunning the inference, and finally, exporting the output as a csv file. There are multiple stages at which the user can return to annotation, either to add a new label, remove a label, or completely start from scratch with new guidelines for labeling. The great peril of GIS digitization of map features is that once you begin digitizing a feature, it is difficult to change course partway through: in contrast, MapReader is fast and flexible and in fact encourages using the annotation part of the workflow as an active way of refining research questions in light of spending time looking at patches in their context. The manual labor of labeling training data for MapReader is dwarfed by the labor that would be needed to create vector data for the entire set of about 15,000 maps.

In seeking a humanistic approach to historical map processing, we strive to acknowledge conventional GIS approaches while also shifting our focus, quite literally, to an alternative way of looking. The railspace and building experiments are two examples of labels that a historian could use after reflecting on the relationship between what is visually documented on the map and what is of interest historically. During the training data annotation step, looking at the map through its patches focuses the eye on 100-square-meter areas. In the practice of extracting vector data, the researcher often only really looks at the tiny parts of the map where the feature of interest appears. In this sense, during the actions to select and digitize those specific features, the researcher stops looking at the whole and stops thinking about broader context and uncommon spatial relationships between features, instead homing in on the presence or absence of one thing. The patchwork method lends itself to research where the presence or absence of a phenomenon matters more than, for example, the precise reconstruction of an actual rail network. For an economic historian measuring the cost of freight travel, such a network is crucial. For the social historian who seeks to investigate the impact that the arrival (and non-arrival) of rail had on the towns and villages it passed through, the connectedness and perfection of the network are secondary to a rougher representation of that information. Patches are the friend of the researcher who seeks out context, while vector data supports the analysis of discrete entities. Both are valuable, but each serves different research communities.

Example of MapReader annotation interface.

Figure 6.2. Detail of the Jupyter notebook for annotating patches in MapReader. Source: Hosseini et al., “MapReader.”

For the railspace experiment, MapReader generated 30.5 million patches from the maps of England, Wales, and Scotland and predicted the label for each patch. Each prediction is accompanied by a confidence score measuring how well a patch conforms to the algorithm’s idea of the label: for example, a 99 percent confidence score for the rail label is very high, while a lower confidence score of 60 percent means it is less likely that this patch is actually railspace. Among these millions of patches with predicted labels, there will be errors. Riverbeds may be misclassified as railways and rocky outcroppings mistaken for terrace houses: one of the consequences of automatically creating this data is living with this error. In our future work, we will focus on describing why historians can trust the claims we make using this data and might even consider reusing it to make their own arguments.
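Filtering predictions by their confidence scores is one simple way of managing this machine-generated error. The following is a sketch, assuming each prediction is a record carrying a label and a confidence value; the function and field names are hypothetical, not MapReader's own output schema.

```python
def filter_by_confidence(predictions, label, threshold=0.9):
    """Keep only patches predicted as `label` at or above a confidence threshold.

    predictions: iterable of dicts such as
        {"patch_id": 17, "label": "railspace", "conf": 0.99}
    Lowering the threshold admits more patches but also more error (riverbeds
    misread as railways); raising it trades recall for precision.
    """
    return [p for p in predictions
            if p["label"] == label and p["conf"] >= threshold]
```

The choice of threshold is itself an interpretive act: a historian mapping the rough extent of railspace might accept lower-confidence patches that an economic historian building precise measures would discard.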

MapReader exemplifies the potential for humanistic computer vision using maps as data. It brings the energies of text as data and visual DH to bear on collections that are now becoming openly available. Using patches as a way to simplify the data creation process, it shines a light on the choices historians make when deciding what on the map can be analyzed. Patches are flexible in size and therefore adaptable to many kinds of maps and questions: rather than representing an arbitrary shape, their regularity across the gridded map sheet puts the focus instead on how a label reflects a meaningful historical concept.

Of course, MapReader is just one of many methods that will emerge in the coming years. There are already other projects engaging with maps computationally. The Unlocking the Colonial Archive project is applying machine learning methods to analyze texts and maps in Spanish imperial collections.30 And Machines Reading Maps, a project that I co-led with Yao-Yi Chiang and Deborah Holmes-Wong, improved methods for generating datasets of the text found on historical maps.31 All of these projects share common objectives of reproducibility, ethical engagement with historical materials, and the ultimate goal of extending successes on large collections of European and North American maps to the rich cartographic cultures in other parts of the world.

Cautions and Next Steps

As we move toward the large-scale analysis of maps through computational methods, keeping at the fore the limits of both maps and computational methods will be key. Outside the humanities, researchers in other disciplines are beginning to use growing collections of maps to generate data about human and environmental activity in the past. They are using scanned map collections to “mine” information from them—road networks, city footprints, orchards, and more (Uhl et al., “Towards the Automated Large-Scale Reconstruction of Past Road Networks”; Uhl et al., “Combining Remote-Sensing-Derived Data and Historical Maps”; National Trust). This is an attractive proposition because it opens up map collections as sources of information about the physical environment of the past. It adds data points for measures, such as the location of certain landscape types, or proxies for others like population, which only began to be collected according to scientific methods in the late nineteenth century (Higgs, Making Sense of the Census Revisited).

Mimicking approaches used to process and analyze aerial and remote sensing imagery, it is tempting to think of maps as just another, older snapshot of the ground. Using extracted features from maps in quantitative spatial analysis enables research in an array of fields, such as migration studies or climate change. But so far, such approaches tend to engage with maps themselves as if they were remote sensing imagery. Knowledge of historical mapping practices and digitization challenges is rarely embedded in the design of these methods (for an exception, see the approaches developed in Uhl et al., “Map Archive Mining”). Remote sensing data can be verified on the ground by humans, who visit locations to check what might be distorted or missing from imagery because of cloud cover, recent natural disasters, or war, for instance. But you cannot ground truth or verify the contents of historical maps. You can get close through painstaking archival research, but only partially, and there is no replacement for walking the land at the time the map was surveyed. In creating a dataset of historical railroads in the United States, the economic historian Jeremy Atack concluded that many miles of track were a “figment of the cartographer’s imagination” (“On the Use of Geographic Information Systems,” 319). The surveyor projected rail into the landscape, and unpicking those lines on the map took significant historical research. Maps, we have learned, require careful assessment. They model the surface of the earth according to the preferences of a network of humans and their mapping practices.

So, how do we treat map data as capta, recognizing that “knowledge is constructed, taken, not simply given as a natural representation of pre-existing fact” (Drucker, “Humanities Approaches to Graphical Display”)?32 When the object of inquiry is not primarily the source itself—we are not studying map content to understand maps—we are adding another layer of interpretation. Even seen through the prism of the surveyor’s motivations, data on maps can point to how people lived in the past. In this way, data from maps is abstracted three times: first, the map itself is a simplification of the landscape; second, it is a landscape of the past; third, digitizing its content means representing qualitative information as quantitative. Using map data as capta acknowledges these three features of the content we create from digital maps.

In practice, this means thinking about alternate methods of data creation and structures. Rather than simply using maps to gather data using predetermined categories of knowledge, we have a chance to go back to the drawing board. Analyzing regions of images (as we do with patches in MapReader), for example, is an old trick in the computer vision playbook and draws on an older tradition of embedding attributes about landscapes within the pixels of GIS raster layers. But applying it to predict the content of historical maps at scale reflects the inherent uncertainty of translating a map into a precisely located landscape feature.

Using MapReader as an example of the latest wave of digitization, this chapter shows that automatic data creation is an alternative and a complement to manual GIS work. It is a methodological offering that represents the fruits of what we have learned from working with complex historical text data and from engaging with computer scientists working on historical map processing. In the specific context of Living with Machines, automatic data creation is also the product of an ethos of showing our work from end to end and making it possible for others to recreate and repurpose our code and data. Humanistic computer vision using maps as data is all the more important because the treatment of maps as accurate representations of the built and natural environments might unintentionally reproduce the inequalities and violences embedded in these sources. If humanities scholars have been slow to take up large-scale analysis of digitized maps, the scientific and commercial community has not.

This is necessary because objectifying map content can have consequences (Crampton, Mapping). In her examination of the relationship between archives, data, and the past and present of commodifying black bodies, Jessica Marie Johnson articulates just what is at stake: “The brutality of black codes, the rise of Atlantic slaving, and everyday violence in the lives of the enslaved created a devastating archive. Left unattended, these devastations reproduce themselves in digital architecture, even when and where digital humanists believe they advocate for social justice” (“Markup Bodies,” 58).

How one creates, organizes, and locates data about the past has social and political implications. By attempting to work with maps as “insecure” documents (Kitchin), humanities researchers can challenge existing paradigms that vacuum up map content according to the standards set by GIS decades ago. Furthermore, pinning down map content as vector data and sharing that in a database that is fixed for all time disincentivizes others from working iteratively with map content and encourages reuse of vector data. Most of the time, data reuse is great; but it has the potential to reify knowledge and stifle other views of map content. MapReader aims to allow researchers to create their own datasets based on their own questions: it does not assume that there are a finite number of features to extract and that these are known by us, the tool’s creators. Yes, we will release our railspace data openly: we hope it might be reused, and we hope that it reflects the reproducibility of our method. But ultimately, the real treasure is not in the datasets: it will be in showing others how to use MapReader for their own purposes.

John Corrigan points to the humanities scholar’s pursuit of complex or “dynamic” data as “data that is characterized by interaction between its various parts” (“Qualitative GIS,” 80). In the first decade of the twenty-first century, GIS appeared dynamic enough. It allowed historians to toggle back and forth between different views of scanned georeferenced maps, government datasets, or painstakingly curated data from archival sources, like railway freight tables. But in 2024 there are new opportunities for introducing flexibility in how we work with historical sources. When it comes to maps, computer vision and machine learning allow us to work with maps not as resources to be mined but as sources to be queried.

Moving away from the language of mining and toward the language of questioning celebrates the potential for iteration and reinterpretation that is valued in the humanities. It is more in tune with the ongoing connection that a historian or other humanities researcher has with a source. It evokes the idea of a conversation and acknowledges that sources are containers of multidimensional facts. Like distant viewing of photographs or film, querying scanned maps using computer vision allows scholars to interact with the data curation process, change their minds about what features are interesting, or test different ways of naming and organizing information. This malleability is not possible in a GIS workflow. Rethinking the place of maps in history and neighboring fields is a chance to democratize spatial analysis—in terms of removing both the high costs of preparing data for a GIS and the constraints around data structures and workflows that discourage iterative research.

Now is the time for DH to embrace historical maps as searchable data. Visual features and text on maps can be quickly transformed into machine-readable data. This can be used to answer historical or other research questions, to improve discoverability of maps in libraries and archives, and for other creative or scientific purposes. Data creation depends on methods that are quick and reproducible so that historians can test, for example, different approaches to labeling and then use these results to refine their research questions. Working at a national, continental, or global level with historical maps opens opportunities for research at scales where it can be time-consuming to identify appropriate sources or difficult to find continuous coverage over large areas using other primary sources.

We are used to selecting case studies based on lucky archival survivals. Using tools like MapReader offers a new way to scope patterns in the built or natural environment. Indeed, it allows historians to reconsider the shapes constituting that very environment and thus understand how common or unique a place might be based on its cartographic representation. Computational approaches to digitized maps open the door to working iteratively across multiple scales of historical experience. Side by side with archival research, we can begin to make new claims about historical places and ultimately about the lives of the people in those places.

Notes

  1. The History of Cartography project (https://geography.wisc.edu/histcart/) is a major contribution to the field and exemplifies the best of writing about maps and mapping since 1987. The most recent volume is Matthew Edney and Mary Pedley, eds., Cartography in the European Enlightenment (Chicago: University of Chicago Press, 2019).

  2. E.g., Akerman, Decolonizing the Map; Guldi, “The Tangible Shape of the Nation”; Wigen, A Malleable Map; Heffernan, “A Paper City”; Verdier, “Plans et Cartes.”

  3. It also raised the thorny question of how people find maps through library catalogs or archival finding aids. Cataloging (i.e., creating metadata for collections) has long been a challenge for maps: many map collections globally are not cataloged at the item level.

  4. See the “Always Already Computational: Collections as Data” website at https://collectionsasdata.github.io/.

  5. This chapter was written for a broad community of humanities researchers, understood to include historians but also historical geographers and archaeologists. Many times I refer to historians alone, but this is more for brevity than as a statement about disciplinary uniqueness.

  6. Living with Machines (https://livingwithmachines.ac.uk/), funded by the UK Research and Innovation (UKRI) Strategic Priority Fund, is a multidisciplinary collaboration delivered by the Arts and Humanities Research Council (AHRC), with the Alan Turing Institute, the British Library, and the Universities of Cambridge, East Anglia, Exeter, and Queen Mary University of London.

  7. For example, the American Memory initiative of the U.S. Library of Congress was an early digitization project that first provided digital images on CD-ROMs and videodiscs. Later, materials were made available online. See http://memory.loc.gov/ammem/about/techIn.html.

  8. OldMapsOnline (https://www.oldmapsonline.org/).

  9. Unfortunately, access to digital reproductions is not guaranteed. Whether because of copyright restrictions (which vary from country to country), licensing agreements with third-party partners, or institutional decisions to withhold access because of materials’ commercial value, physical collections in the public domain cannot reliably be accessed openly as digital objects.

  10. Sanborn Maps (https://www.loc.gov/collections/sanborn-maps/), National Library of Scotland Data Foundry (https://data.nls.uk/).

  11. https://livingwithmachines.ac.uk/georeferencing-ordnance-survey-maps/.

  12. https://georeferencer.com/.

  13. https://allmaps.org/, https://iiif.io/.

  14. This section is not meant to be a complete review of spatial humanities work. Please see the excellent overviews by Todd Presner and David Shepard, “Mapping the Geospatial Turn”; Jo Guldi, “What Is the Spatial Turn?”; and Gregory and Ell’s “GIS in Historical Research” in Historical GIS, 15–18.

  15. The role that GIS plays in allowing users simply to view overlapping, semitransparent layers of maps covering the same space has been emphasized by David Rumsey. Such a method mimics and enhances the techniques historians use in reading rooms to compare maps, but it stops short of any quantitative representation of that comparison (Rumsey and Williams, Historical Maps in GIS, 8–11). For tutorials that progress through these steps, see https://spatial.scholarslab.org/stepbystep/; for the first lesson in a set of early Programming Historian lessons, see https://programminghistorian.org/en/lessons/googlemaps-googleearth.

  16. Building historical datasets—data capture—has led to new fields of inquiry in gazetteer creation, qualitative spatial relationships, and the representation of time and movement. These rich areas link spatial history to library, archive, and information science in important ways, making historical spatial data a shared interest across these fields. See, for example, Giordano and Cole, “Places of the Holocaust.”

  17. See, for example, the Amsterdam Time Machine’s description of its geographical infrastructure (https://amsterdamtimemachine.nl/hisgis-clariah/) and the University of Antwerp Time Machine’s GIStorical project (https://www.uantwerpen.be/en/projects/antwerp-time-machine/about-the-project/rapid-developments/).

  18. Interest in gazetteers, in large part, was spurred by the Pelagios Commons project: see the Pelagios Network (https://pelagios.org/).

  19. Furthermore, for the great majority of maps published before the nineteenth century, it is extremely difficult to create vector data that is not significantly warped by the effects of georeferencing the scanned sheet on which that content appears.

  20. Among the many outputs of these projects, see Murrieta-Flores et al., “Automatically Analyzing Large Texts in a GIS Environment,” and Taylor, Gregory, and Donaldson, “Combining Close and Distant Reading.”

  21. Even before considering research applications, exploratory search across digitized and born-digital image collections is a big challenge. This is why there is a growing body of literature focused on using machine learning methods to automatically generate metadata from visual collections. See, for example, Arnold et al., “Uncovering Latent Metadata,” and Hu et al., “Enriching the Metadata of Map Images.”

  22. See also the DH2018 Computer Vision workshop at https://dh2018.adho.org/computer-vision-in-dh/.

  23. PixPlot project at Yale DH Lab (https://dhlab.yale.edu/projects/pixplot/).

  24. For an overview of the MapReader pipeline, see https://living-with-machines.github.io/MapReader/; see also Kasra Hosseini et al., “MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale.”

  25. AHRC award AH/S01179X/1, https://gtr.ukri.org/projects?ref=AH%2FS01179X%2F1.

  26. For important examples of this work, see Chiang, Using Historical Maps; Uhl et al., “Map Archive Mining”; and Uhl and Duan, “Automatic Extraction.”

  27. https://github.com/Living-with-machines/MapReader.

  28. See the National Library of Scotland for the historic maps API layers (https://maps.nls.uk/projects/api/).

  29. See the National Library of Scotland at https://maps.nls.uk/os/.

  30. Unlocking the Colonial Archive project (https://unlockingarchives.com/).

  31. Machine Reading Maps project (https://www.turing.ac.uk/research/research-projects/machines-reading-maps).

  32. I use the word “data” when describing what is created when we make a machine-readable version of map content (Arnold and Tilton, “New Data?”), and I find Christof Schöch’s definition of humanities data useful: “a digital, selectively constructed, machine-actionable abstraction representing some aspects of a given object of humanistic inquiry” (“Big? Smart? Clean? Messy?”).

Bibliography

  1. Akerman, James R., ed. Decolonizing the Map: Cartography from Colony to Nation. Kenneth Nebenzahl, Jr., Lectures in the History of Cartography. Chicago: University of Chicago Press, 2017.
  2. Arnold, Taylor, Stacey Maples, Lauren Tilton, and Laura Wexler. “Uncovering Latent Metadata in the FSA-OWI Photographic Archive.” Digital Humanities Quarterly 11, no. 2 (March 2017).
  3. Arnold, Taylor, and Lauren Tilton. “Distant Viewing: Analyzing Large Visual Corpora.” Digital Scholarship in the Humanities 34, supp. 1 (December 2019): 3–16. https://doi.org/10.1093/llc/fqz013.
  4. Arnold, Taylor, and Lauren Tilton. “New Data? The Role of Statistics in DH.” In Debates in the Digital Humanities 2019, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2019. https://dhdebates.gc.cuny.edu/read/untitled-f2acf72c-a469-49d8-be35-67f9ac1e3a60/section/a2a6a192-f04a-4082-afaa-97c76a75b21c#ch24.
  5. Atack, Jeremy. “On the Use of Geographic Information Systems in Economic History: The American Transportation Revolution Revisited.” Journal of Economic History 73, no. 2 (2013): 313–38.
  6. Barron, Alexander T. J., Jenny Huang, Rebecca L. Spang, and Simon DeDeo. “Individuals, Institutions, and Innovation in the Debates of the French Revolution.” Proceedings of the National Academy of Sciences 115, no. 18 (May 2018): 4607–12. https://doi.org/10.1073/pnas.1717729115.
  7. Blevins, Cameron. “Digital History’s Perpetual Future Tense.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2016. https://dhdebates.gc.cuny.edu/read/untitled/section/4555da10-0561-42c1-9e34-112f0695f523.
  8. Bodenhamer, David J., John Corrigan, and Trevor M. Harris. The Spatial Humanities: GIS and the Future of Humanities Scholarship. Bloomington: Indiana University Press, 2010.
  9. Brown, Vincent. “Mapping a Slave Revolt: Visualizing Spatial History through the Archives of Slavery.” Social Text 33, no. 4 (125) (December 2015): 134–41. https://doi.org/10.1215/01642472-3315826.
  10. Chiang, Yao-Yi, Weiwei Duan, Stefan Leyk, Johannes H. Uhl, and Craig A. Knoblock. Using Historical Maps in Scientific Studies: Applications, Challenges, and Best Practices. SpringerBriefs in Geography. Cham: Springer International, 2020. https://doi.org/10.1007/978-3-319-66908-3_3.
  11. Coleman, Catherine Nicole. “Everything Is Data, except When It Isn’t.” Stanford Libraries Blog, May 20, 2021. https://web.archive.org/web/20230705124529/https://library.stanford.edu/blogs/stanford-libraries-blog/2021/05/everything-data-except-when-it-isnt.
  12. Corrigan, John. “Qualitative GIS and Emergent Semantics.” In The Spatial Humanities: GIS and the Future of Humanities Scholarship, edited by David J. Bodenhamer, John Corrigan, and Trevor M. Harris, 76–88. Bloomington: Indiana University Press, 2010.
  13. Crampton, Jeremy W. Mapping: A Critical Introduction to Cartography and GIS. New York: John Wiley, 2011.
  14. Drucker, Johanna. “Humanities Approaches to Graphical Display.” DHQ: Digital Humanities Quarterly 5, no. 1 (March 2011).
  15. Edney, Matthew. Cartography: The Ideal and Its History. Chicago: University of Chicago Press, 2019.
  16. Fleet, Christopher. “Creating, Managing, and Maximising the Potential of Large Online Georeferenced Map Layers.” E-Perimetron 14, no. 3 (2019): 140–49.
  17. Fleet, Christopher, Kimberly C. Kowal, and Petr Přidal. “Georeferencer: Crowdsourced Georeferencing for Map Library Collections.” D-Lib Magazine 18, no. 11/12 (November 2012). https://doi.org/10.1045/november2012-fleet.
  18. Gallon, Kim. “Making a Case for the Black Digital Humanities.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2016. https://dhdebates.gc.cuny.edu/read/untitled/section/fa10e2e1-0c3d-4519-a958-d823aac989eb.
  19. Gentzkow, Matthew, Bryan Kelly, and Matt Taddy. “Text as Data.” Journal of Economic Literature 57, no. 3 (September 2019): 535–74. https://doi.org/10.1257/jel.20181020.
  20. Giordano, Alberto, and Tim Cole. “Places of the Holocaust: Towards a Model of GIS of Place.” Transactions in GIS. Accessed January 14, 2020. https://doi.org/10.1111/tgis.12583.
  21. Gregory, Ian N., and Paul S. Ell. Historical GIS: Technologies, Methodologies and Scholarship. Cambridge: Cambridge University Press, 2007.
  22. Grimmer, Justin, and Brandon M. Stewart. “Text as Data: The Promise and Pitfalls of Automatic Content Analysis Methods for Political Texts.” Political Analysis 21, no. 3 (2013): 267–97. https://doi.org/10.1093/pan/mps028.
  23. Grover, Claire, Richard Tobin, Kate Byrne, Matthew Woollard, James Reid, Stuart Dunn, and Julian Ball. “Use of the Edinburgh Geoparser for Georeferencing Digitized Historical Collections.” Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 368, no. 1925 (August 2010): 3875–89. https://doi.org/10.1098/rsta.2010.0149.
  24. Guldi, Jo. “The Tangible Shape of the Nation: The State, the Cheap Printed Map, and the Manufacture of British Identity, 1784–1855.” In The Objects and Textures of Everyday Life in Imperial Britain, edited by Janet C. Myers and Deirdre H. McMahon. London: Routledge, 2016. https://doi.org/10.4324/9781315562964-2.
  25. Guldi, Jo. “What Is the Spatial Turn?” Spatial Humanities. Accessed January 14, 2020. https://spatial.scholarslab.org/spatial-turn/.
  26. Heffernan, Michael. “A Paper City: On History, Maps, and Map Collections in 18th and 19th Century Paris.” Imago Mundi 66, supp. 1 (September 2014): 5–20. https://doi.org/10.1080/03085694.2014.947847.
  27. Henneberg, J., M. Satchell, X. You, L. Shaw-Taylor, and E. A. Wrigley. 1881 England, Wales and Scotland Rail Lines. [Data Collection]. Colchester, Essex: UK Data Archive, 2017. https://reshare.ukdataservice.ac.uk/852993/.
  28. Higgs, Edward. Making Sense of the Census Revisited: Census Records for England and Wales 1801–1901: A Handbook for Historical Researchers. London: Institute of Historical Research, 2005.
  29. Hitchcock, Tim, and William J. Turkel. “The Old Bailey Proceedings, 1674–1913: Text Mining for Evidence of Court Behavior.” Law and History Review 34, no. 4 (November 2016): 929–55. https://doi.org/10.1017/S0738248016000304.
  30. Hosseini, Kasra, Katherine McDonough, Daniel van Strien, Olivia Vane, and Daniel C. S. Wilson. “Maps of a Nation? The Digitized Ordnance Survey for New Historical Research.” Journal of Victorian Culture 26, no. 2 (April 2021): 284–99. https://doi.org/10.1093/jvcult/vcab009.
  31. Hosseini, Kasra, Daniel C. S. Wilson, Kaspar Beelen, and Katherine McDonough. “MapReader: A Computer Vision Pipeline for the Semantic Exploration of Maps at Scale.” ArXiv:2111.15592 [Cs] (November 2021). http://arxiv.org/abs/2111.15592.
  32. Hu, Yingjie, Zhipeng Gui, Jimin Wang, and Muxian Li. “Enriching the Metadata of Map Images: A Deep Learning Approach with GIS-Based Data Augmentation.” International Journal of Geographical Information Science 36, no. 4 (April 2022): 799–821. https://doi.org/10.1080/13658816.2021.1968407.
  33. Jefferson, Brian Jordan. “Predictable Policing: Predictive Crime Mapping and Geographies of Policing and Race.” Annals of the American Association of Geographers 108, no. 1 (2018): 1–16. https://doi.org/10.1080/24694452.2017.1293500.
  34. Johnson, Jessica Marie. “Markup Bodies: Black [Life] Studies and Slavery [Death] Studies at the Digital Crossroads.” Social Text 36, no. 4 (137) (December 2018): 57–79. https://doi.org/10.1215/01642472-7145658.
  35. Kitchin, Rob, and Martin Dodge. “Rethinking Maps.” Progress in Human Geography 31, no. 3 (June 2007): 331–44. https://doi.org/10.1177/0309132507077082.
  36. Knowles, Anne Kelly. “Historical Geographic Information Systems and Social Science History.” Social Science History 40, no. 4 (2016): 741–50. https://doi.org/10.1017/ssh.2016.29.
  37. Knowles, Anne Kelly. “Introduction.” Social Science History 24, no. 3 (August 2000): 451–70.
  38. Kurgan, Laura. Close Up at a Distance: Mapping, Technology, and Politics. Cambridge, Mass.: MIT Press, 2013.
  39. McDonough, Katherine, Ludovic Moncla, and Matje van de Camp. “Named Entity Recognition Goes to Old Regime France: Geographic Text Analysis for Early Modern French Corpora.” International Journal of Geographical Information Science 33, no. 12 (2019): 2498–522.
  40. Murrieta-Flores, Patricia, Alistair Baron, Ian Gregory, Andrew Hardie, and Paul Rayson. “Automatically Analyzing Large Texts in a GIS Environment: The Registrar General’s Reports and Cholera in the 19th Century.” Transactions in GIS 19, no. 2 (2015): 296–320. https://doi.org/10.1111/tgis.12106.
  41. Murrieta-Flores, Patricia, and Bruno Martins. “The Geospatial Humanities: Past, Present and Future.” International Journal of Geographical Information Science 33, no. 12 (December 2019): 2424–29. https://doi.org/10.1080/13658816.2019.1645336.
  42. National Trust. “National Trust Vows to ‘Bring Back the Blossom’ as New Research Reveals Massive Drop in Orchards since 1900s.” Accessed March 28, 2022. https://www.nationaltrust.org.uk/press-release/national-trust-vows-to-bring-back-the-blossom-as-new-research-reveals-massive-drop-in-orchards-since-1900s.
  43. Padilla, Thomas, Laurie Allen, Stewart Varner, Sarah Potvin, Elizabeth Russey Roke, and Hannah Frost. “Santa Barbara Statement on Collections as Data.” Always Already Computational: Collections as Data, 2018. Accessed March 28, 2022. https://collectionsasdata.github.io/statement/.
  44. Posner, Miriam. “Humanities Data: A Necessary Contradiction.” June 25, 2015. https://miriamposner.com/blog/humanities-data-a-necessary-contradiction/.
  45. Posner, Miriam. “What’s Next: The Radical, Unrealized Potential of Digital Humanities.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein. Minneapolis: University of Minnesota Press, 2016. https://dhdebates.gc.cuny.edu/read/untitled/section/a22aca14-0eb0-4cc6-a622-6fee9428a357.
  46. Presner, Todd, and David Shepard. “Mapping the Geospatial Turn.” In A New Companion to Digital Humanities, edited by Susan Schreibman, Ray Siemens, and John Unsworth, 199–212. Malden, Mass.: Wiley-Blackwell, 2015. https://doi.org/10.1002/9781118680605.ch14.
  47. Robertson, Stephen, and Lincoln Mullen. “Arguing with Digital History: Patterns of Historical Interpretation.” Journal of Social History 54, no. 4 (July 2021): 1005–22. https://doi.org/10.1093/jsh/shab015.
  48. Rumsey, David, and Meredith Williams. “Historical Maps in GIS.” In Past Time, Past Place: GIS for History, edited by Anne Kelly Knowles, 1–18. Redlands, Calif.: ESRI Press, 2002.
  49. Ryan, Lyndall, Jennifer Debenham, Mark Brown, and William Pascoe. “Introduction: Colonial Frontier Massacres in Eastern Australia 1788–1872.” Centre for 21st Century Humanities. 2017. https://c21ch.newcastle.edu.au/colonialmassacres/introduction.php.
  50. Schöch, Christof. “Big? Smart? Clean? Messy? Data in the Humanities.” Journal of Digital Humanities 2, no. 3 (Summer 2013). http://journalofdigitalhumanities.org/2-3/big-smart-clean-messy-data-in-the-humanities/.
  51. Scholz, Luca. “Deceptive Contiguity: The Polygon in Spatial History.” Cartographica: The International Journal for Geographic Information and Geovisualization 54, no. 3 (Fall 2019): 206–16. https://doi.org/10.3138/cart.54.3.2018-0018.
  52. Schuurman, Nadine. GIS: A Short Introduction. Malden, Mass.: Blackwell, 2004.
  53. Tavakkol, Sasan, Yao-Yi Chiang, Tim Waters, Feng Han, Kisalaya Prasad, and Raimondas Kiveris. “Kartta Labs: Unrendering Historical Maps.” In Proceedings of the 3rd ACM SIGSPATIAL International Workshop on AI for Geographic Knowledge Discovery (GeoAI 2019), 48–51. Chicago: Association for Computing Machinery, 2019. https://doi.org/10.1145/3356471.3365236.
  54. Taylor, Joanna E., Ian N. Gregory, and Christopher Donaldson. “Combining Close and Distant Reading: A Multiscalar Analysis of the English Lake District’s Historical Soundscape.” International Journal of Humanities and Arts Computing 12, no. 2 (October 2018): 163–82. https://doi.org/10.3366/ijhac.2018.0220.
  55. Uhl, Johannes H., and Weiwei Duan. “Automating Information Extraction from Large Historical Topographic Map Archives: New Opportunities and Challenges.” In Handbook of Big Geospatial Data, edited by Martin Werner and Yao-Yi Chiang, 509–22. Cham: Springer International, 2021. https://doi.org/10.1007/978-3-030-55462-0_20.
  56. Uhl, Johannes H., Stefan Leyk, Yao-Yi Chiang, Weiwei Duan, and Craig A. Knoblock. “Map Archive Mining: Visual-Analytical Approaches to Explore Large Historical Map Collections.” ISPRS International Journal of Geo-Information 7, no. 4 (2018): 148.
  57. Uhl, Johannes H., Stefan Leyk, Yao-Yi Chiang, and Craig A. Knoblock. “Towards the Automated Large-Scale Reconstruction of Past Road Networks from Historical Maps.” Computers, Environment and Urban Systems 94 (June 2022): 101794. https://doi.org/10.1016/j.compenvurbsys.2022.101794.
  58. Uhl, Johannes H., Stefan Leyk, Zekun Li, Weiwei Duan, Basel Shbita, Yao-Yi Chiang, and Craig A. Knoblock. “Combining Remote-Sensing-Derived Data and Historical Maps for Long-Term Back-Casting of Urban Extents.” Remote Sensing 13, no. 18 (January 2021): 3672. https://doi.org/10.3390/rs13183672.
  59. Verdier, Nicolas. “Plans et Cartes (France, XVIIIe Siècle).” In Les Projets: Une Histoire Politique (XVIe-XXIe Siècles), edited by Frédéric Graber and Martin Giraudeau, 149–61. Paris: Presses des Mines, 2018.
  60. Vitale, Valeria, Pau de Soto, Rainer Simon, Elton Barker, Leif Isaksen, and Rebecca Kahn. “Pelagios—Connecting Histories of Place. Part I: Methods and Tools.” International Journal of Humanities and Arts Computing 15, no. 1–2 (2021): 5–32.
  61. Wevers, Melvin, and Thomas Smits. “The Visual Digital Turn: Using Neural Networks to Study Historical Images.” Digital Scholarship in the Humanities 35, no. 1 (April 2020): 194–207. https://doi.org/10.1093/llc/fqy085.
  62. Wigen, Kären. A Malleable Map: Geographies of Restoration in Central Japan, 1600–1912. Berkeley: University of California Press, 2010. https://public.eblib.com/EBLPublic/PublicView.do?ptiID=566761.
  63. Withers, Charles W. J. “On Trial—Social Relations of Map Production in Mid-Nineteenth-Century Britain.” Imago Mundi 71, no. 2 (July 2019): 173–95. https://doi.org/10.1080/03085694.2019.1607044.
  64. Yuan, May. “Mapping Text.” In The Spatial Humanities: GIS and the Future of Humanities Scholarship, edited by David J. Bodenhamer et al. Bloomington: Indiana University Press, 2010. ProQuest Ebook Central. https://ebookcentral.proquest.com/lib/lancaster/detail.action?docID=1402899.

Copyright 2024 by the Regents of the University of Minnesota