Chapter 3 Right-to-Left (RTL) Text: Digital Humanists Plus Half a Billion Users

Masoud Ghorbaninejad, Nathan P. Gibson, and David Joseph Wrisley

In early 2020, digital humanist Zoe LeBlanc posted an “extremely niche tweet” reporting her surprise that the Altair visualization library correctly displayed the Arabic characters that she analyzed in her scholarship (Figure 3.1).1 To be sure, the set of digital humanities (DH) practitioners using both Python visualizations and right-to-left (RTL) scripts like Arabic might be small. But the subject is not, in fact, niche at all: Many thousands of people in the world are interested in designing data visualizations that involve Arabic characters. Furthermore, projects supporting RTL scripts have potential user bases numbering in the hundreds of millions, made up of users who read and write not only in Arabic but also in Lahnda, Urdu, Persian, Hebrew, and numerous other languages. And yet most visualization libraries, and most software in general, are designed with only one direction in mind: left-to-right (LTR). The persistent frustrations of those who read and write RTL languages in digital environments are reflected in the enormous number of GitHub bug reports, feature requests, and software patches related to “RTL” and “bidi” (bidirectional text, which mixes left-to-right and right-to-left scripts). As of January 2023, a search for “RTL” on GitHub turned up more than 200,000 issues and 6 million code commits, with more than 38,000 of these issues labeled as unresolved.2

In this chapter, we shed light on the challenges and inequities that arise when doing digital humanities work with RTL languages.3 We argue that these challenges are not unique to DH; rather, they reflect the experience of myriad other RTL developers, content creators, and users. Digital humanists working with RTL languages must acknowledge that we share many concerns with these RTL users and that ignoring their digital habitus (a term we use, following Bourdieu, to denote formative habits, attitudes, and skills in digital environments) and cultural perspectives has kept us from recognizing that common ground. While the DH community should give more attention to RTL voices within DH, we make the case that sustainable solutions to the obstacles faced by RTL DHers lie not in building custom DH tools or “bootstrapping” workarounds but in joining forces with a larger set of developers and creators outside academia to advocate for multidirectional, multiscript support in the tools we all use.4

Figure 3.1. A Twitter thread discussing a novel solution for labels in the Arabic language within a statistical data visualization package, Altair.

So-called technical solutions for RTL languages should not be carried out in isolation from the lived, and often multilingual, realities of the societies in which these languages are used. In this chapter, we therefore think about RTL DH scholarship in the context of both its historical subject matter and its contemporary expression. That is, the context of RTL scholarship includes not only the study of ancient or historical languages in the centuries-long tradition of Orientalist scholarship but also the modern, often multilingual societies that themselves require multiscript and multidirectional digital environments (Figure 3.2). RTL DH research, in other words, needs not only to contribute to DH research but also to participate in digital life more broadly. Here we identify a potentially synergistic relationship between the habitus of living RTL languages, on the one hand, and digital stewardship of their heritage, on the other. To realize this relationship, we must pay attention to the way people live with RTL languages and move toward a DH practice that exists in dialogue with the societies in which those languages are used. We must include RTL DHers as experts about best practices and as strong voices in documenting the barriers they face in environments that have been, first and foremost, designed for LTR accessibility.

Figure 3.2. “A Map of Views of RTL text articles in Wikipedia,” illustrating views of *Wikipedia* pages by script type and by country. In each pie chart, the dark gray portion indicates the proportion of article views in right-to-left (RTL) text, and the light gray portion the proportion in left-to-right (LTR) text. Data source: https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerCountryBreakdown.htm. Visualization by Wrisley in QGIS with Natural Earth physical land polygons. License: CC-BY-SA.

The argument we voice here as authors is grounded, in part, in our individual experiences. Masoud “Kasra” Ghorbaninejad was born and raised in West Asia (the “Middle East”), moved to North America for graduate studies, and has since worked in West Asia and North America in K–12, higher education, and digital humanities positions. He can understand, speak, and write in Arabic, Azeri, English, German, and Persian to varying degrees. Nathan Gibson grew up in Africa, North America, and Europe; lived in the Middle East for two years; received classical philological training in Arabic, Syriac, and Hebrew; and for the last several years has been conducting DH research in Europe on historical RTL texts. David Joseph Wrisley is a comparativist working across six European languages, Latin, and Arabic. He grew up in North America and has had extended work and research stays in Algeria, Belgium, France, Germany, and Tunisia. Since 2002, he has been residing in Arab countries, as a faculty member first in Beirut and now in Abu Dhabi, working to build digital humanities communities of practice and infrastructure. Together, we have worked in different areas of the world with research involving several languages from both contemporary and historical perspectives. While this experience informs our argument about LTR biases, it does not make our voices representative of any particular context.

The Anglocentrism of DH research is only part of a broader bias toward LTR, or LTR-centrism, stemming from the fact that English and many other languages of the postindustrial world are written from left to right (Fiormonte; Galina; Mahony; Meza). This bias is characterized by a persistent orientation toward tools and platforms that function well for LTR languages but poorly with RTL or bidirectional text.5 While there have been politically provocative suggestions that LTR programming languages be replaced altogether with new programming languages written in RTL, resulting in projects like Qalb or Noor, which use Arabic keywords and right-to-left layout for code, here we are not speaking about such fundamental revisions to the practices of computing culture.6 Rather, our focus is the more common endeavor of creating content using RTL natural languages for screen-based display. This endeavor is made difficult by a lack of support in the necessary tools and, more fundamentally, by a lack of attention to the problem.

In other words, RTL DH faces a compounded marginalization.7 Not only does it fall outside the Anglo-American tradition of DH (whose Anglocentrism has been criticized in recent years by many continental scholars, such as Domenico Fiormonte), but it is also, and more importantly, poorly integrated into Western academic efforts to diversify DH practices beyond English, which have tended to focus on LTR languages. Furthermore, in the countries in which RTL languages are spoken, a deep infrastructure gap impedes developing and sustaining DH practices (Wrisley). It is precisely this power of Anglocentric, LTR DH practice situated in the Global North that leads us to write this critique of LTR-centrism in English.8

This issue of inclusion is all the more pressing as RTL digital cultures have begun to enter into the larger global community of DH. An increasing number of DH institutes are being held in locations where RTL languages are used, such as the Digital Humanities Institute Beirut, held at the American University of Beirut, and the Winter Institute in Digital Humanities at New York University (NYU) Abu Dhabi.9 This community is also growing through the #Right2Left workshops held at the Digital Humanities Summer Institute in Victoria, British Columbia, Canada, and transnational groups of scholars, such as the Islamicate Digital Humanities Network, which works in and between RTL societies and the West.10 The time is ripe for forging an agenda that advocates for improving RTL access to tools through partnerships among DH scholars researching the past and present, commercial developers, and content creators in both majority RTL and majority LTR societies.

After all, solutions for RTL exist within the Unicode and W3C standards (see the “Technical” section). Is there a way to harness the callout culture of social media and guide it directly to the software development world, encouraging solutions to the issues that many of us face, such as unreadable text display, laborious text editing, confusing page sequencing, poorly conceived web application layouts, and ungraceful translations? What are the best tactics for framing RTL (and other non-Latin script) language accessibility as a question of not only diversity, in which multiple voices are valued, but also inclusion and equity, in which there is an ethical obligation to improve the status quo, giving global colleagues a seat at the table and empowering them to do the work? We propose a three-pronged approach that examines the habitual, cultural, and technical angles of these questions in order to begin the work of change.
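To give a sense of what the Unicode side of these solutions looks like: Unicode assigns every character a bidirectional class, and the Unicode Bidirectional Algorithm (UAX #9) uses those classes to decide display order. The minimal Python sketch below relies only on the standard library's `unicodedata.bidirectional`; the helper name `bidi_classes` and the mixed-script sample string are our own illustrative assumptions. It shows the inputs to the algorithm, not the reordering itself.

```python
import unicodedata


def bidi_classes(text):
    """Pair each character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]


# Hypothetical sample mixing Latin letters, Arabic-Indic digits, Arabic
# letters, European digits, and Hebrew letters, separated by spaces.
sample = "Altair ١٢ عربي 12 עברית"

for ch, cls in bidi_classes(sample):
    if not ch.isspace():  # skip spaces, whose class is WS
        print(ch, cls)
```

Latin letters come out as `L`, Arabic letters as `AL`, Hebrew letters as `R`, European digits as `EN`, and Arabic-Indic digits as `AN`. The distinction between `AL` and `R` is not cosmetic: the algorithm treats European digits that follow Arabic letters differently from those that follow Hebrew ones.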

Habitual

RTL DH often overlooks the contemporary digital habitus of RTL cultures in favor of an emphasis on digitally archiving or computationally analyzing a textual past. But a focus on the textual past often ignores the lived expressions of RTL languages and must instead invent its own digital habitus.11 Our idea of a digital habitus is informed by Pierre Bourdieu’s definition of “habitus” as “structures constitutive of a particular type of environment” (Bourdieu, 72–95). The environments in this case are ones in which RTL languages are primary, and the constituting structures are the sets of formative habits, attitudes, and skills associated with those who use digital tools. Such a digital habitus emerges gradually, formed more through scholarly habits and institutional routines than by deliberate design.

Bourdieu takes this habitus, which he elaborates as “systems of durable, transposable dispositions,” to be self-reinforcing and aligned with particular objectives. But this habitus does not necessarily pursue these objectives consciously or require specific expertise to achieve them. Instead, he writes, they are “collectively orchestrated” without a conductor (Bourdieu, 72). Applied to the world of DH practice, this view makes clear that the present dispositions in RTL DH emerge simply from the way digital humanists practice their craft rather than from any predefined rules or objectives (Antonijević, 36–72). But these dispositions are nevertheless misaligned with the broader RTL habitus in at least two major respects: space and time.

In regard to space, the geographic foci of RTL expression, be it in print publications, social media posts, app development, or film production, do not align, for historical reasons, with where RTL DH is practiced. This results in what has been called “the postcolonial digital cultural record,” characterized by a lack of parity between the digital cultural records of the Global North and South, which Roopika Risam (3–21) sees as the end product of neo/colonial “disruptions within the digital cultural record.”12 During 2013 and 2014, for example, when two of the authors of this chapter were contributors to the Around DH in 80 Days project, which was designed to raise awareness of global digital humanities practice, RTL projects from the Middle East North Africa South Asia (MENASA) region were scarce.13 The first Iranian-founded academic DH center in Iran (a country with a population of more than 81 million and the most speakers of Persian, as well as many speakers of Azeri and Kurdish, all three of which are written there in RTL scripts) was launched only in early 2018 at the University of Shiraz.14

This lack of alignment between the geographic foci of RTL expression and RTL DH practice causes an additional problem. Domenico Fiormonte evokes Lev Vygotsky’s “cultural law of the artifact,” which states that “both material and cognitive artifacts produced by humans are subject to the influence of the environment, culture, and social habits of the individual and groups that devise and make use of them.” But the problem is not simply one of physical distance between those who produce DH tools and those who use them; rather, the physical distance is indicative of remote “environment, culture, and social habits” (Fiormonte, 438). Because RTL DH is mostly practiced far away from where RTL languages find social and cultural expression, a special, collective, and conscious effort must be made to close this gap. To do so, RTL projects must expand their focus beyond the historical textual archives that they typically explore and pay increased attention to contemporary practice in RTL cultures. The projects and those who participate in them, in turn, should have a voice in global DH conversations.

It is worth pointing out that the geographic-cultural disparity under discussion may have something to do with financial priorities, since universities in the Global North generally have little incentive to invest in RTL infrastructure for the limited number of projects that require it. Why should institutions factor RTL support into their decisions about buying software licenses or building web applications when it will help only a small group of researchers working with contemporary materials? An arguable exception to this line of thinking may be RTL ephemera projects, for example, the International Digital Ephemera Project (IDEP) at UCLA Library, which aims to preserve content that is “ephemeral in nature and likely to be lost without proactive curation,” such as newspapers, postcards, and cellphone videos. In this case, everyday RTL materials are deemed broadly relevant. Yet even something like IDEP is not truly a case of contemporary RTL habitus being supported in the Global North, because such a project only helps preserve the content for archival purposes while its producers continue to live and work in their own linguistic cultures. As a consequence, the infrastructure developed for these projects does not ameliorate the underlying disparity in support for RTL knowledge production practices.

The difference between archiving materials and supporting knowledge production highlights a further, temporal divergence between the dispositions of RTL DH and those of other RTL users. Archiving, editing, and analyzing past content, which is typical of DH projects, produces a digital habitus different from that of the millions of RTL users producing and consuming content in the present. Here contemporary DH projects could play an important role in counterbalancing the weight of historical projects, but they face several obstacles that lead to vicious cycles of underinvestment.15

One of the most significant of these obstacles derives from our impression, difficult though it may be to support through objective measures, that research on the premodern “golden ages” of RTL cultures and languages (such as Arabic, Hebrew, Persian, and Syriac) is considered more prestigious than research on contemporary periods, which is often perceived as less remarkable or less relevant. To the extent that the disciplines studying these cultures and languages seem bent only on retrieving, restoring, and perhaps even romanticizing an “exotic” RTL heritage, RTL DH practice, with its propensity for digital editions, computational linguistics, distant reading, and databases, can sometimes act as the quantitative arm of these disciplinary agendas. Informed by such agendas, an investment of (faculty) researcher time and institutional energy in a historical project may seem to have more scholarly merit, whereas a contemporary ephemera project (such as IDEP, mentioned above) would seem to have less. Put plainly: research that does not fit into the box of “golden ages quantified” is often seen as contributing not a different kind of value but simply less value.

Regardless of whether they are perceived as a better return on scholarly investment, historical RTL projects tend to be better funded than contemporary ones. This has the effect of supporting archival activities over RTL cultural production. Over time and across institutions, this leads to the vicious circle mentioned above, in which historical and archival projects are privileged. As funding agencies support historical projects (“safer,” more prestigious, and well established) rather than contemporary ones, they reinforce the view that other types of projects are unworthy of financial backing. Continually ignoring certain periods and certain types of research activities has the effect of further marginalizing voices outside the Global North: present voices that could contribute to present discussions.16

Even given the will to learn lived RTL expressions, overcoming logistical hurdles like the ones we detail below (see the “Technical” section) requires more than a change of attitude. As technological impediments to RTL research inhibit growth in the field, and therefore funding, another vicious circle ensues, as the lack of funding impedes the development of RTL-oriented technology. To break this cycle, new funding should go toward developing RTL infrastructure before results that rely on well-developed infrastructure can be expected.

The technological hurdles involved in RTL DH do not affect projects equally, of course.17 In some cases, historical projects can benefit significantly from the kind of receptive or “read-only” infrastructure that is less dependent on the directionality of the script, even if it was originally designed to handle LTR corpora.18 In addition, advances in optical character recognition (OCR) of non-Latin scripts (and of documents typeset with pre-1800 printing equipment) and emergent handwritten text recognition (HTR) methods have the potential to improve the searchability of digitized Arabic or Syriac manuscripts as much as that of, say, manuscripts in the Armenian alphabet.19 Conversely, a development that would contribute significantly to redirecting the DH world toward an RTL-inclusive program, even if it might benefit LTR projects less, would be the creation of a “read-write” infrastructure, such as bidirectional support in XML editors, which is key to producing XML-encoded RTL text.20
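One way to see why read-write support is harder than read-only support: in XML-encoded RTL text, Latin-script markup and RTL content share every line, so a single line switches direction many times. The sketch below is a deliberate simplification (it looks only at strongly directional characters and is not the full Unicode Bidirectional Algorithm), and the function name `direction_runs` and the TEI-style sample line are our own hypothetical examples. An editor without proper bidi handling has to lay out each of these runs, which is one source of confusingly displayed tags and cursor movement.

```python
import unicodedata


def direction_runs(text):
    """Group the strongly directional characters of `text` into runs.

    Returns (direction, characters) pairs: "L" for left-to-right (bidi class
    L) and "R" for right-to-left (bidi classes R and AL). Neutral and weak
    characters (spaces, punctuation, digits) are skipped, which is a
    simplification of what the Unicode Bidirectional Algorithm does.
    """
    runs = []
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            direction = "L"
        elif cls in ("R", "AL"):
            direction = "R"
        else:
            continue  # neutral or weak character: no direction of its own
        if runs and runs[-1][0] == direction:
            runs[-1][1] += ch  # extend the current run
        else:
            runs.append([direction, ch])  # direction changed: start a new run
    return [(direction, chars) for direction, chars in runs]


# Hypothetical TEI-style line: Latin tag names wrapped around Arabic content.
line = "<persName>محمد</persName> بن <persName>علي</persName>"
runs = direction_runs(line)
print(len(runs))  # 7
print([direction for direction, _ in runs])  # ['L', 'R', 'L', 'R', 'L', 'R', 'L']
```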

To circle back to Bourdieu’s concept of “habitus,” we have identified certain ways of readjusting otherwise self-reinforcing systemic dispositions in order to reconstitute the RTL DH research field. They include becoming familiar with RTL practice outside the Global North, supporting RTL projects relating to contemporary as well as historical materials, and furthering infrastructure that contributes to RTL content creation. Readjustment may also mean creating a climate among DH practitioners in which RTL scripts are embraced as a contribution rather than an edge case or problem to be solved. By adjusting our posture toward RTL DH, we can encourage projects that require RTL-LTR bidirectionality and support RTL text production in a read-write infrastructure. This posture may be less difficult to achieve than it might initially seem, since many of the obstacles we have described are largely related to funding. What’s more, with the ever-evolving relationship between the “digital” and the “humanities” sides of DH, “the focus has moved away from technology as the servant of the humanities to one where our projects and other activities are of interest to and advance the research agendas of both disciplines” (Mahony, 372). Understood in this way, changing the field of DH to include living RTL cultures would undoubtedly repay the field with both a broader humanistic scope and more robust technological advances.

Cultural

The language situation in Arabic-speaking countries is particularly complex, illustrating the deficiencies of common assumptions about the straightforward nature of cultural translation and adaptation. Arabic is especially complicated because of its widespread usage as well as its great variety. It is spoken by more than 300 million people in more than twenty countries and reportedly claims the fifth-highest number of speakers in the world. But how we actually define Arabic is a matter of debate. It is a diglossic language with at least two distinct forms: standard written Arabic and the spoken language, which itself contains multiple national and regional varieties. On their own, these varieties pose a number of challenges for standardization in research and educational systems (Bani-Khaled). In addition, in Arabic-speaking countries, as in many places in the postcolonial world, Western languages, particularly English and French, are present in higher education as well as in everyday life and coexist with the varieties of Arabic. The resulting multilingualism manifests itself both in people’s attitudes and in the choices they make regarding language in everyday life.

The digital habitus of Arabic speakers is not conditioned by language preference or by choice alone but also by the past and present availability of software and technological infrastructures (or the lack of availability, as the case may be). For example, before Arabic versions of software became available in the mid-1990s to early 2000s (and for some time even after their arrival), it was considered quite normal for universities, offices, and homes to possess software in a language other than Arabic.21 This lack of localized software meant that users necessarily developed minimal literacies in a technical idiom in another language such as English or French. Moving forward to the 2020s, we find that information literacy practices have become considerably more complex. Although many interfaces have since been translated, how users are able to manipulate content in different languages remains a thorny problem: a native speaker of Arabic might read the news in standard Arabic on a smartphone, compose written English in Google Docs, and post on social media in a combination of English, French, and a contemporary style of Arabic written in Latin letters known as “Arabizi” (Yaghan).

There are large communities that might want their apps to reflect their multilingual digital habitus, as may be the case, for example, with Lebanese users who want to work in an English word-processor interface to create Arabic content full of English glosses, or with a Tunisian blogger who might use a French-language content management system to create bilingual, bidirectional French-Arabic content. In the realm of social media, where there is significant funding for content creation platforms, this kind of multilingual and multiscript approach is sometimes accommodated; but in the smaller-budget interface creation of the DH landscape, it is too often left out. As a result, DH is limited by what we term the “monolingual fallacy”: the tacit assumption that everywhere in the world people use one language, one script, and one direction. This assumption ends up conditioning the development of software, locking in specific possibilities for handling and displaying language.
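As a small illustration of what avoiding the monolingual fallacy can mean in code, a system can decide direction per paragraph instead of once per document. The Python sketch below approximates the "first strong character" heuristic used by the Unicode Bidirectional Algorithm (rules P2 and P3) and by HTML's `dir="auto"`; it is simplified (it does not skip text inside directional isolates, as the full rules do), and the function name and the bilingual blog paragraphs are hypothetical.

```python
import unicodedata


def first_strong_direction(text):
    """Guess a paragraph's base direction from its first strong character.

    Simplified version of the first-strong heuristic: returns "ltr" for the
    first character of bidi class L, "rtl" for class R or AL, and None if
    the text has no strongly directional character at all.
    """
    for ch in text:
        cls = unicodedata.bidirectional(ch)
        if cls == "L":
            return "ltr"
        if cls in ("R", "AL"):
            return "rtl"
    return None


# Hypothetical bilingual blog post: each paragraph gets its own direction
# rather than one direction imposed on the whole document.
paragraphs = [
    "Bonjour à tous, voici mon nouveau billet.",
    "مرحبا بكم في مدونتي الجديدة",
    "2023",
]
for paragraph in paragraphs:
    print(first_strong_direction(paragraph), paragraph)
```

A paragraph with no strong characters, such as a bare year, returns `None`; a real system would then fall back to a surrounding or default direction.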

To our knowledge, RTL-native knowledge infrastructures for DH research do not yet exist. As a consequence, digital platforms developed elsewhere either must be retrofitted to accommodate RTL content or they must be “translated” into the language of the host RTL culture, or both. However, thinking of this translation as merely passing between two languages, as is often the case, is a grave simplification; instead, proper localization, as outlined by the World Wide Web Consortium (W3C), involves an adaptation of a system that works in one place to one that will work fully in another. Examples of DH research platforms and apps that have been (or are being) adapted to new environments include Voyant Tools, Recogito, FromThePage, and Lingscape.22 The Free and Open Source Software (FOSS) movement is also known to use localization as a strategy for developing a global user base (Souphavanh and Karoonboonyanan). But there is a tension between this desire to make tools accessible for emergent humanities practices and the risk of cultural mistranslation that the process of localization can entail. Furthermore, the localization process can also carry with it the baggage of global knowledge inequity (Osborn, 5–16).

The creation of an Arabic-language interface to the text analysis and visualization platform Voyant Tools serves as a good case study demonstrating the insufficiency of thinking of localization as merely a transfer between two languages, as well as some of the problems associated with the lack of an internationalization strategy by design. Internationalization is a process related to localization, also defined by the W3C, in which software is first designed to be locale-indifferent before it is localized to meet the regional, linguistic/cultural, and technical requirements of each locale.
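A minimal sketch of what "locale-indifferent" design can mean in practice, assuming a hypothetical catalog and function names of our own (this is not Voyant's code): internationalized code never hard-codes English strings or a left-to-right layout, but asks a per-locale catalog for the text and derives its direction from the language that actually supplied it, so that adding a locale later is a matter of supplying data. The Arabic sample label is the one the Voyant team chose for the Bubbles tool, discussed below.

```python
import html
from dataclasses import dataclass

# Hypothetical message catalog. In practice these strings would live in
# external resource files maintained by translators, not in the code.
CATALOG = {
    "en": {"bubbles": "Bubbles"},
    "ar": {"bubbles": "فقاعات"},
}
RTL_LANGUAGES = {"ar", "fa", "he", "ur"}  # illustrative, not exhaustive
DEFAULT_LANGUAGE = "en"


@dataclass(frozen=True)
class Label:
    text: str
    direction: str  # "ltr" or "rtl", derived from the catalog actually used


def localize(key, locale):
    """Look up a UI string for `locale`, falling back to the default language.

    The direction comes from whichever catalog supplied the text, so an
    English fallback shown to an Arabic-locale user is still marked LTR.
    """
    language = locale.split("-")[0].lower()
    source = language if key in CATALOG.get(language, {}) else DEFAULT_LANGUAGE
    direction = "rtl" if source in RTL_LANGUAGES else "ltr"
    return Label(CATALOG[source][key], direction)


def to_html(label):
    """Render a label as an HTML span carrying its own dir attribute."""
    return f'<span dir="{label.direction}">{html.escape(label.text)}</span>'


print(to_html(localize("bubbles", "ar-AE")))  # Arabic text, dir="rtl"
print(to_html(localize("bubbles", "fr-FR")))  # no French catalog: English, dir="ltr"
```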

A few years ago, at the request of the main developers, a dozen or so language teams went to work on the terminology of the Voyant Tools interface. Voyant Tools is structured in modular fashion so that a number of widgets, each carrying out a different type of computational analysis, can be reused in different environments. With this structure in mind, the Arabic-language team, composed of Najla Jarkas and one of the authors of this chapter (Wrisley), decided to be as ecumenical as possible in naming conventions, bridging two ways of rendering foreign words. For the tool Bubbles, for example, the English was transliterated (بوبلز), but an expression more faithful to Arabic (فقاعات) was also given. This approach in introducing a foreign platform to an Arabic-speaking audience proposed a compromise perspective between those that adopt equivalences faithful to the structure and meanings of the Arabic language and those that employ more calque-like expressions taken from English.

The Arabic-language team followed this strategy throughout the platform, providing parallel equivalents, a translation of the tool’s function as well as a transliteration: Workset Builder (ورك سيت بيلدر/ إنشاء المكنز الجزئي), TermsRadio (ترمز راديو /عرض زمني), even the name Voyant Tools itself (فواينت تولز / ادوات فواينت). Whereas the intention was to provide diverse Voyant users with both styles—translation and transliteration—positioned prominently throughout the Arabic interface, user feedback based on the localization revealed mixed results: in the user testing phase, some found the presence of transliterated English jarring.

In the process of translating Voyant Tools, the Arabic-language team gained several insights into the challenges involved in localizing DH software. One revealing conclusion was that no amount of review and postediting of the interface could compensate for the fact that the back-end language processing of the platform is Anglophone and, as a consequence, unable to handle basic linguistic features of Arabic, most notably the agglutinated definite article “ال” or prepositions such as “ب.” In some of the visualizations that use only a handful of tokens at a time, such as Mandala, Links, or Bubbles (as shown in Figure 3.3), words do not display correctly inside the bubbles; instead, the first letter of the Arabic word is aligned with the left edge of the bubble, causing the word to drift leftward outside of it. In other cases, in which the frequency of words and phrases is more explicitly marked, the issues are more significant. Either a design rethink or a deep reimplementation of the tokenization system would be required for Voyant to function in Arabic as well as it does for English.
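To illustrate why the tokenizer, rather than the translated labels, is the bottleneck: a whitespace tokenizer of the kind that works tolerably for English treats كتاب ("book"), الكتاب ("the book"), and بالكتاب ("with the book") as three unrelated terms and splits their frequencies. The toy Python sketch below contrasts that with naive stripping of two proclitics. It is a hypothetical illustration, not Voyant's implementation; the prefix list and the `MIN_STEM` guard are our own assumptions, and real Arabic segmentation needs morphological analysis, since prefix stripping alone will mangle words that merely begin with the same letters.

```python
from collections import Counter

# Toy list of proclitic prefixes: "بال" (the preposition bi- fused with the
# definite article) and "ال" (the definite article). Longest first, so that
# "بال" is tried before "ال".
PROCLITICS = ("بال", "ال")
MIN_STEM = 3  # hypothetical guard: never strip if fewer than 3 letters remain


def strip_proclitics(token):
    """Remove the first matching proclitic, if enough of the word remains."""
    for prefix in PROCLITICS:
        if token.startswith(prefix) and len(token) - len(prefix) >= MIN_STEM:
            return token[len(prefix):]
    return token


def term_counts(text, segment=False):
    """Count whitespace-delimited tokens, optionally stripping proclitics."""
    tokens = text.split()
    if segment:
        tokens = [strip_proclitics(token) for token in tokens]
    return Counter(tokens)


text = "كتاب الكتاب بالكتاب القلم بالقلم"
print(term_counts(text))                # five distinct types, each counted once
print(term_counts(text, segment=True))  # كتاب counted 3 times, قلم twice
```

Even this toy shows where the change has to happen: in the tokenization layer that feeds every visualization, not in the interface strings.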

Another example of the growing pains of localization can be found in the Programming Historian, which in recent years has launched a project to expand its global user base by translating its tutorials into three Romance languages (Spanish, French, and Portuguese). In 2018, questions of translation and global legibility informed a prolonged debate among the journal’s editorial board. Antonio Rojas-Castro outlined what he called the “American outlook” of the “Introduction to Stylometry with Python” tutorial written by François-Dominic Laramée, pointing out that some content is not easy to understand across cultures—in this case, not only between the English and Spanish languages but also between European and North American academic cultures.23 Furthermore, the Programming Historian’s tutorials often include companion datasets that may reflect linguistic and cultural biases. It follows, then, that localization must go beyond translation if the full functionality of the tutorial is to be adapted into a new cultural and linguistic environment. This approach is reflected in a new Multilingualism and Internationalization Policy that the contributors to the Programming Historian have now agreed to follow.

Figure 3.3. A screenshot of two widgets in the Voyant Tools Arabic-language interface demonstrating the Bubbles (left) and Cirrus (right) tools. Featured is the Arabic translation of Alice Munro’s 1982 short story collection *The Moons of Jupiter*.

In order to avoid the issue of cultural mistranslation in future DH projects, the most obvious approach would be for the processes of internationalization and localization to follow each other in the correct order (internationalization first), because, as hinted earlier, internationalization is recommended as a practice that anticipates future localization (Tanev). Grounded in enterprise models of software development, internationalization can be understood as a kind of forethought in design (Yacob; Abdelali). In this way, internationalization policies might resemble other “intentional design” moves in global academic communities, for example, the adoption of collectively created and agreed-on standards for how we engage with members of our diverse communities.24 Such standards could provide a reference point for helping us to assess the needs of our communities, since they would bring a conscious focus to the ways that global digital humanists practice their craft, allowing us to connect those practices intentionally and with care to infrastructure rather than defaulting to LTR-centrism. If we were to commit to collectively agreed-on internationalization policies, we would also be able to give a more objective assessment of equity with respect to access in global DH.

Even as we design future platforms and tools with a view toward internationalization and eventual localization, we must be careful not to bury the assumptions of a monolingual, LTR-centrist DH culture in them. That localization appears to require only superficial changes to labels and layouts can mask deeper problems in the transmission of tools and platforms to target RTL cultures, problems that have to do with linguistic or cultural assumptions or with global gaps in information literacy. Conceptually and culturally complex problems may arise, which cannot be adequately addressed by Anglocentric DH thinking. DH tools and platforms often embody complex arguments and forge new critical terminology; for these reasons as well, internationalizing and localizing DH platforms and tools cannot be viewed as a mere transfer between two languages. DH practitioners working in the Global North, even as they imagine new global audiences, will repeatedly confront these issues precisely because of the tension between the theoretical and discursive complexity of humanities work and the fundamental design rethink required by appropriate internationalization plans.

Technical

Support for RTL tools and environments is an issue of inclusivity, but it can also be seen as an issue of accessibility. To be clear, the way different users prefer to work with RTL, LTR, or bidirectional text ought not be considered a disability, but framing the issue in this way may prompt an engagement with multilingualism as complex as the engagement with multimodality that scholarship on accessibility has brought about. The accessibility community has developed production tools, conventions, and feedback mechanisms that allow creators (including digital humanists) to produce accessible content more effectively and with a more deeply informed understanding of how people will engage with that content in a range of ways.25 Similar approaches could aid RTL-friendly development if DH practitioners, software developers, social media influencers, and other tech creators would open lines of communication with one another.26 By involving users with RTL or bidirectional preferences, they would also, ideally, create stronger and more representative communities and extend the reach of their products just as they do when they include users with visual, auditory, or other disabilities in their design process (Henry and McGee; “Internationalization”).

As we are also reminded by accessibility advocates, the size of any particular user community should not be the only, or even the main, factor in decisions about inclusivity and accessibility. Yet the numbers in this case do provide one compelling reason to support RTL. As mentioned at the outset of this chapter, the number of people whose first language uses primarily a right-to-left writing system can be estimated as more than half a billion—almost 10 percent of the world’s population (Gibson, 9; Eberhard, Simons, and Fennig). A lack of awareness of, or disregard for, RTL concerns in tool development jeopardizes the growth potential of those tools from the start, both for DH-specific tools and DH-relevant tools with a broader user base. Attending to those concerns, however, will help to ensure that these tools reach more users.

The issue of RTL support in DH relates not only to the size of its potential user communities but also to the significant presence of RTL languages in world cultural heritage. Some RTL languages, such as Arabic and Persian, have a millennia-long heritage and remain in wide use in the present day; other RTL language communities, such as the Syriac-speaking community, used to be much larger than they are today; and yet other languages, such as Turkish, have undergone significant linguistic and directional change to LTR (Kirmizialtin and Wrisley). The huge volume and diverse breadth of cultural heritage material written in RTL means that a global, historically minded field of DH must necessarily involve RTL scripts in large proportion. Previously we mentioned the dilemmas that researchers, funders, and institutions face in regard to the types of RTL projects they pursue. But when it comes to technical solutions for implementing RTL support within tools, it is not usually necessary to choose between historical and contemporary considerations. What is important to recognize is that heritage communities, as both active users of these languages and guardians of their past, lie in the overlapping region between contemporary digital habitus and history-oriented DH. Herein lies a potential and straightforward alignment between scholars of RTL and other RTL communities: both want practicable and flexible tools they can implement in a variety of workflows.

Removing accessibility-type barriers requires both the development of standards and their implementation for features relevant to RTL languages. Much as the W3C has developed standards that make it possible for assistive screen readers to describe a page to users (e.g., the “alt” attribute for images, introduced in 1995), standards developed over the last several decades have greatly augmented support for RTL and bidirectional text. The most significant ones include Unicode bidirectional controls (introduced in 1991 and supplemented in 2013) and W3C standards for HTML and CSS.27 To a large extent, then, the necessary technical standards already exist to create DH tools with RTL functionality. This is in contrast to the not-too-distant past, when users typing Hebrew had to accommodate the order in which the computer stored text by typing each line of Hebrew backward (Ishida, “Visual vs. Logical Ordering”). Today, with CSS3, it is even possible to represent the historical phenomenon of boustrophedon (lines with alternating directions) as text content on HTML pages.28
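The shift from visual to logical ordering, and the role of the isolate controls added in 2013, can be illustrated in a few lines of Python using only the standard library's `unicodedata` module. This is a minimal sketch, not a bidi implementation: `first_strong_direction` is our simplified reading of the Unicode Bidirectional Algorithm's paragraph-level rules P2 and P3 (the first-strong-character heuristic that the FSI control also relies on), and `isolate` is a hypothetical helper name. Text is stored in the order it is typed, so the first Hebrew letter sits at index 0 even though it is displayed rightmost.

```python
import unicodedata
from typing import Optional

# Isolate controls (Unicode 6.3, 2013); each opener must be closed by PDI.
LRI, RLI, FSI, PDI = "\u2066", "\u2067", "\u2068", "\u2069"
_OPENERS = {LRI, RLI, FSI}


def first_strong_direction(text: str) -> Optional[str]:
    """Return "ltr" or "rtl" from the first strong character, else None.

    Simplified take on UBA rules P2-P3: characters inside nested isolates
    are skipped; digits, spaces, and punctuation are not strong.
    """
    depth = 0
    for ch in text:
        if ch in _OPENERS:
            depth += 1
        elif ch == PDI and depth:
            depth -= 1
        elif depth == 0:
            bidi_class = unicodedata.bidirectional(ch)
            if bidi_class == "L":
                return "ltr"
            if bidi_class in ("R", "AL"):  # Hebrew is R, Arabic script is AL
                return "rtl"
    return None


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap text so that it cannot reorder the text around it."""
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return f"{opener}{text}{PDI}"
```

For instance, `f"{isolate(title)} (3 vols.)"` should keep the parenthetical count from being pulled into an Arabic or Hebrew title when the string is later shown inside LTR text; in HTML, the `<bdi>` element and `dir="auto"` provide comparable isolation.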

While more work undoubtedly remains to be done on these and other technical standards, the main hurdle is the continued development of tools that fail to implement them. Many examples could be cited, ranging from the annoying to the insurmountable. For instance, in text editors, most of which support Unicode, it is often possible to enter RTL text that displays in a readable way and even to set the “base direction.” But it is sometimes the case that punctuation is out of place or that selecting a desired portion of the entered text with a mouse, or trying to navigate it with the cursor, is nearly impossible.29 Other issues have to do with user experience and user interface (UX/UI) design. For example, in RTL text, does a right-arrow button mean “forward,” “backward,” or simply “rightward”? This problem affects many document viewers on the web; for some scanned RTL books, the Internet Archive correctly “turns” pages to the left or right when the left- or right-arrow button or key is pressed. But as Figure 3.4 illustrates, turning the page forward to the left still moves the progress slider in the opposite direction.30

Figure 3.4. The Internet Archive BookReader interface correctly displays the beginning of a Syriac book on the right, but the progress slider is on the left. Moving the slider to the right “turns” the pages to the left. Source: “Liber scholiorum; textus,” Internet Archive, June 16, 2011, https://archive.org/details/liberscholiorumt00theo/page/n10/mode/2up.
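The mismatch in Figure 3.4 comes down to a slider that is computed as if later pages always lay to the right. A minimal sketch of a direction-aware mapping follows; the function names and the `"ltr"`/`"rtl"` progression flag are our own assumptions, not the BookReader's actual code.

```python
def slider_position(page_index: int, page_count: int, progression: str) -> float:
    """Map a page index to a handle position from 0.0 (left edge) to 1.0 (right edge).

    progression is "ltr" if later pages lie to the right, "rtl" if they lie to the left.
    """
    if progression not in ("ltr", "rtl"):
        raise ValueError("progression must be 'ltr' or 'rtl'")
    if not 0 <= page_index < page_count:
        raise ValueError("page_index out of range")
    fraction = page_index / (page_count - 1) if page_count > 1 else 0.0
    # In an RTL book the handle starts at the right edge and moves left.
    return fraction if progression == "ltr" else 1.0 - fraction


def page_for_position(position: float, page_count: int, progression: str) -> int:
    """Inverse mapping: a dragged handle position back to the nearest page index."""
    if progression not in ("ltr", "rtl"):
        raise ValueError("progression must be 'ltr' or 'rtl'")
    position = min(max(position, 0.0), 1.0)  # clamp stray drag values
    fraction = position if progression == "ltr" else 1.0 - position
    return round(fraction * (page_count - 1))
```

With this mapping, turning a page forward in an RTL book moves the handle leftward, in the same direction as the page turn.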

Additionally, metadata problems afflict many RTL and bidirectional PDFs. The responsibility for these problems may rest with the person who originally created the PDF, but rectifying them requires detailed knowledge of metadata fields that are nearly impossible to find or adjust. PDF documents have a metadata field for “left” binding or “right” binding, but in order to set this when exporting a PDF from Adobe InDesign (a common publishing workflow), users must change their Adobe CC language settings and then install a Middle East and North Africa edition of InDesign.31 But a single binding direction for a document may be inadequate, since many books contain both right-bound and left-bound sections. These shortcomings mean that users have to read portions of many digitized books backward, scrolling upward rather than downward to get to the next page.32 Such settings do not reflect the reality of multilingual cultures that are increasingly digital and screen-based.
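At the file level, the field in question appears to be the `/Direction` entry of a PDF's `/ViewerPreferences` dictionary, which the PDF specification allows to be `/L2R` (the default) or `/R2L`; viewers may use it to place pages side by side. The sketch below writes a tiny blank PDF by hand with Python's standard library. The function name and page size are our own choices, and we assume, without having inspected InDesign's output, that InDesign's binding option maps onto this entry. The sketch also makes the limitation noted above concrete: there is one `/Direction` per document, so a book with both right-bound and left-bound sections cannot declare both.

```python
def minimal_pdf(page_count: int = 2, direction: str = "R2L") -> bytes:
    """Return a small blank PDF whose catalog declares a reading direction."""
    if direction not in ("L2R", "R2L"):
        raise ValueError("direction must be 'L2R' or 'R2L'")
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    kids = " ".join(f"{3 + i} 0 R" for i in range(page_count))
    objects = [
        # Object 1: document catalog, carrying the viewer preferences.
        f"<< /Type /Catalog /Pages 2 0 R /ViewerPreferences << /Direction /{direction} >> >>",
        # Object 2: page tree.
        f"<< /Type /Pages /Kids [{kids}] /Count {page_count} >>",
    ]
    # Objects 3..n: blank A5 pages (420 x 595 points) with empty resources.
    objects += ["<< /Type /Page /Parent 2 0 R /MediaBox [0 0 420 595] /Resources << >> >>"] * page_count

    out = bytearray(b"%PDF-1.7\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded for the xref table
        out += f"{number} 0 obj\n{body}\nendobj\n".encode("ascii")

    xref_start = len(out)
    out += f"xref\n0 {len(objects) + 1}\n".encode("ascii")
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += f"{offset:010d} 00000 n \n".encode("ascii")
    out += (
        f"trailer\n<< /Size {len(objects) + 1} /Root 1 0 R >>\n"
        f"startxref\n{xref_start}\n%%EOF\n"
    ).encode("ascii")
    return bytes(out)
```

Writing the bytes to a file (for example with `open("rtl.pdf", "wb").write(minimal_pdf())`) should produce a document that a viewer honoring `/Direction` lays out right to left in two-page view.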

Beyond UX/UI design confusion and opaque metadata settings, a further difficulty in implementing multidirectional standards is that the task often involves many-layered dependencies, so much so that one might speak of the “deep implementation” that is required throughout the entire codebase of libraries and extensions. An example of this is a recent update to the text-to-image (T2I) feature integrated into the popular DH transcription software Transkribus. In this feature, a user supplies the transcribed text of a page and T2I aligns it to the page image, drawing on Transkribus’s HTR service. But when T2I segmented RTL text into lines, it attempted to match the last (leftmost) portion of the text with the first line of the page image. The result was that the “aligned” transcription read from bottom to top and had no meaningful relationship to the page image. Once the bug had been reported, the Transkribus team needed to contact the T2I developer, who in turn implemented an RTL “extension” that is included in the Transkribus software.33 The case of Transkribus is particularly salient, since HTR technologies have great potential for text creation with cursive Arabic-script languages that have been under-supported by more established OCR technologies. But if these technologies are to deliver on the promise of accessibility in major world languages, developers will still face the unwieldy task of pushing for RTL support in all of their dependencies and replacing those that do not implement it.
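The T2I failure belongs to a common class of bugs in which “first” is silently equated with “leftmost.” The following hypothetical sketch is not T2I's or Transkribus's code; `WordBox`, `reading_order`, and `align_words` are invented names. It pairs transcription tokens, stored in logical order, with word boxes detected on a page image. The only direction-dependent step is the sort: in an RTL line, the first word is the rightmost box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WordBox:
    box_id: str    # hypothetical identifier from a layout-analysis step
    x_left: float  # x coordinate of the box's left edge, in pixels


def reading_order(boxes: list[WordBox], direction: str) -> list[WordBox]:
    """Sort the word boxes of one text line into reading order."""
    if direction not in ("ltr", "rtl"):
        raise ValueError("direction must be 'ltr' or 'rtl'")
    # The first word of an RTL line is the rightmost box: sort by descending x.
    return sorted(boxes, key=lambda box: box.x_left, reverse=(direction == "rtl"))


def align_words(tokens: list[str], boxes: list[WordBox], direction: str) -> list[tuple[str, str]]:
    """Pair transcription tokens (in logical order) with word boxes."""
    if len(tokens) != len(boxes):
        raise ValueError("token and box counts differ")
    return [(token, box.box_id) for token, box in zip(tokens, reading_order(boxes, direction))]
```

Hard-coding `direction="ltr"` for an Arabic line would pair the first token with the leftmost box, which is the mirror image of the correct alignment.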

The tweet about visualization in Python that begins this chapter (see Figure 3.1) illustrates even greater challenges for deep implementation when it displays misrendered Arabic text, in which the letters are disconnected and read left-to-right instead of right-to-left. The text is produced by seaborn, a popular code library for data visualization, which relies on matplotlib, the RTL deficiencies of which affect a multitude of tools. According to GitHub’s dependency graph for matplotlib, around 668,000 repositories depend on matplotlib’s code.34 While a workaround for correctly displaying RTL labels in matplotlib has been reported—replacing the back end with mplcairo, which has its own chain of dependencies, Raqm and Fribidi—how realistic is it for the hundreds of thousands of matplotlib-dependent repositories to implement the alternative mplcairo back end?35 As made clear by this example, the decision to support or not to support RTL or bidirectional features in a tool can have tremendous cascading effects. And depending on this single decision, myriad end-users—who may be unaware of the tools used to produce an app, website, text corpus, or set of graphs—will either have a frictionless experience or be excluded from using the tools altogether.
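Correct display of Arabic requires two steps that an LTR-only renderer skips: contextual shaping (choosing each letter's initial, medial, final, or isolated form so that letters connect) and bidirectional reordering. In the mplcairo route, these are delegated to Raqm (which builds on HarfBuzz) and FriBiDi. A widely used matplotlib workaround instead combines the third-party packages `arabic_reshaper` and `python-bidi` to bake presentation-form glyphs, already reversed, into the label string. The toy sketch below reproduces that idea with the standard library only, to show what is involved. It is an assumption-laden illustration, not production code: it covers only the basic letters in the Arabic Presentation Forms-B block, infers joining behavior from which forms exist, and ignores lam-alef ligatures, diacritics, Persian and Urdu letters, and mixed-direction text. Its output is a display hack that should not be stored or searched.

```python
import unicodedata


def _presentation_forms() -> dict[str, dict[str, str]]:
    """Map basic Arabic letters to their contextual forms in U+FE70..U+FEFF.

    Compatibility decompositions in this block look like "<initial> 0633".
    """
    table: dict[str, dict[str, str]] = {}
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Skip unassigned code points, ligatures, and vowel-sign forms.
        if len(parts) != 2 or not parts[0].startswith("<"):
            continue
        form, base = parts[0].strip("<>"), chr(int(parts[1], 16))
        table.setdefault(base, {}).setdefault(form, chr(cp))
    return table


FORMS = _presentation_forms()


def _joins_to_next(ch: str) -> bool:
    # Heuristic: only letters that have an initial form connect to what follows.
    return "initial" in FORMS.get(ch, {})


def _joins_to_previous(ch: str) -> bool:
    return "final" in FORMS.get(ch, {})


def toy_visual_order(logical: str) -> str:
    """Shape a pure-Arabic string and reverse it for an LTR-only renderer."""
    shaped = []
    for i, ch in enumerate(logical):
        prev = logical[i - 1] if i > 0 else None
        nxt = logical[i + 1] if i + 1 < len(logical) else None
        join_prev = prev is not None and _joins_to_next(prev) and _joins_to_previous(ch)
        join_next = nxt is not None and _joins_to_next(ch) and _joins_to_previous(nxt)
        if join_prev and join_next:
            form = "medial"
        elif join_prev:
            form = "final"
        elif join_next:
            form = "initial"
        else:
            form = "isolated"
        glyphs = FORMS.get(ch, {})
        shaped.append(glyphs.get(form, glyphs.get("isolated", ch)))
    # A renderer that draws left to right must receive the glyphs reversed.
    return "".join(reversed(shaped))
```

Passing `toy_visual_order("سلام")` as a plot title may look right in a renderer that neither shapes nor reorders, but the same string would display backward in any renderer that does, which is one reason such pre-baked workarounds do not scale across the hundreds of thousands of dependent repositories.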

A final point regarding implementation—perhaps the most fundamental one—is the surprisingly poor implementation of RTL and mixed-direction text in many popular code editors.36 This is a sore point for digital humanists, who frequently discuss how to work around these issues, for example, when creating a digital edition of an RTL text using the guidelines of the Text Encoding Initiative XML (TEI-XML). Yet when one considers that code editors are a major part of the workflow for producing nearly every custom-built app or website, the problem comes into focus: it is difficult for any coder, not just a digital humanist, to create tools that will serve the half billion people whose first language uses primarily an RTL script.
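Until editors improve, one modest aid is to flag the lines of a source file that an editor is likely to display confusingly: lines that mix strong LTR and strong RTL characters (as in TEI-XML with Arabic content), and lines containing invisible directional controls, which are also the vector for the source-code attacks described by Boucher and Anderson. The sketch below uses only `unicodedata`; the function names and report format are our own, and it detects these lines without fixing how they are displayed.

```python
import unicodedata
from typing import Iterator

STRONG_RTL = {"R", "AL"}
# Embedding, override, and isolate controls: invisible in most editors.
EXPLICIT_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}
# Zero-width marks that are also invisible but carry a strong direction.
DIRECTION_MARKS = {"\u200e", "\u200f", "\u061c"}  # LRM, RLM, ALM


def scan_line(line: str) -> dict:
    classes = [unicodedata.bidirectional(ch) for ch in line]
    hidden = [
        f"U+{ord(ch):04X}"
        for ch, cls in zip(line, classes)
        if cls in EXPLICIT_CONTROLS or ch in DIRECTION_MARKS
    ]
    return {
        "mixed": "L" in classes and any(cls in STRONG_RTL for cls in classes),
        "hidden_controls": hidden,
    }


def scan_text(text: str) -> Iterator[tuple[int, dict]]:
    """Yield (line_number, report) for lines worth double-checking."""
    for number, line in enumerate(text.splitlines(), start=1):
        report = scan_line(line)
        if report["mixed"] or report["hidden_controls"]:
            yield number, report
```

Run over a TEI file, a line such as `<persName xml:lang="ar">محمد</persName>` is reported as mixed, while a line containing U+202E is reported for its hidden control.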

A case in point is the GitHub-backed code editor Atom, which had a million active users as of March 2016.37 Currently Atom has at least six open issues relating to right-to-left text handling, the oldest of which was opened nearly five years ago.38 In one of the most commented-on issues, Atom developer @lee-dohm (Lee Dohm), drawing on the nature of open source and in light of limitations faced by developers like himself, invites the RTL community to contribute to solutions to “help speed up the process” and “work on the bits of functionality that are important to them.”39 The original poster, @salar90 (Salar Gholizadeh), concurs and encourages fellow “RTLers . . . to contribute.” A few months later, @lee-dohm acknowledges the importance of the still-unresolved issue but cannot say “when we’ll be able to get to it.” Such exchanges are common across community issue trackers: the issues are identified without a path or a timeline for a solution.

It may be that “RTLers” have not contributed enough to solving the problem, as @salar90 suggests, but it is also unfortunate that a software initiative with such major backers, which lists sixty-four people on its GitHub organization page and to which nearly a hundred people have at some point contributed code, does not “have an ETA for when we’ll be able to get to it.”40 The frustrations that emerge when developers or managers redirect potential resources away from what the community clearly desires are reflected in another RTLer’s comment on a different Atom issue: “As always—we need to develop everything for ourselves.”41

Compounding the problem of poor implementation of standards for RTL and multidirectional text is the fact that it is difficult for users to discover which applications support RTL/bidi and what workarounds exist. Typically, they must rely on listservs, forums, and social media, as well as their own trial and error. There is no inventory of RTL-supporting applications, nor does it seem to be standard practice for software to indicate its RTL support status. As an October 2019 Twitter conversation between DH practitioners illustrates, the status of RTL support even for widely used tools like Gephi and R is often unknown, with DH scholars resorting to trial and error or word of mouth to find the tools and workarounds that can support their research.42

What is the way forward to overcome these implementation hurdles? The first step is to acknowledge that RTL support for DH tools is linked to RTL support in general. On occasion, digital humanists have the opportunity to develop their own boutique tools and may choose to prioritize RTL support in them. But, to a large extent, DH workflows consist of tools backed by broader commercial and community interests and involve teams that draw from several communities: data analysts, developers, writers, and publishers, to name only a few. As long as these workflows depend mainly on tools developed outside of DH and supported by a spectrum of differently motivated stakeholders, DH practitioners can expect RTL support to be driven by what profits these stakeholders rather than by concerns about inclusivity, equity, and accessibility.

The temptation is for digital humanists to address their own needs by bootstrapping RTL support, either by building their own niche tools or by patching and hacking existing tools. But if the authors’ experience is any guide, most RTL DH practitioners have workflows held together by temporary fixes that allow them only to bracket the real scale of the RTL implementation problem. Such fixes, furthermore, do little to address the needs of the wider RTL community and perpetuate the problem of poor documentation. They also sidestep the issue of deep implementation; DH tools are usually not written from the ground up but rely on other software frameworks or libraries. Unless RTL issues are addressed in this ground-level code, RTL support in DH tools can only be a kind of Band-Aid.

Acknowledging that RTL support is bound up with sectors that are relatively unfamiliar for DHers leads to a second obligation in order to move forward: Digital humanists must become advocates for broad-level RTL integration. Indeed, DH practitioners are in a strategic position to communicate the needs of RTL communities for a multitude of reasons. They constitute a global, multilingual guild. As humanists, they are both specialists in issues of complex linguistic expression and advocates for representation of diverse linguistic communities, past and present. They understand something of the relevant underlying technologies, which they themselves use, teach, document, and further develop. They already have lines of communication to funders and developers. And they have an accredited voice through publications, teaching, and public engagement.

Nevertheless, exercising this kind of advocacy on behalf of RTL communities will require reorienting our own community to be more attuned to the diversity of RTL usage both within and beyond the DH community, as well as improving communication channels with commercial software developers.43 Digital humanists ground their work in knowledge discovery and exchange in the digital sphere. But knowledge is not just cultural content embedded in language; it is also infrastructure that allows that content to be represented, circulated, and preserved for the concerned communities. In this case, the knowledge to be discovered is that of the access barriers that RTLers face and the inadequate state of RTL support, both of which are obstacles to equitable knowledge production in our age. These knowledge sets in turn need to be exchanged between RTL users and developers.

Moving Forward with RTL DH

We believe that it is possible to channel user concerns to developers more effectively than is being done with existing listservs, forums, and social media, which probably reach DH and RTL communities more than developers. To that end, we have set up a pilot GitHub site where issues (tickets) regarding RTL support can be filed.44 The goal of soliciting these issues is twofold: (1) to pass on users’ reports to developers and (2) to provide a clearinghouse where users can view and share the status of RTL support for various applications and websites. GitHub is the world’s largest code host, with over 40 million users and 100 million repositories as of August 2019.45 It is possible to “ping” the relevant development team on GitHub from an issue filed in another repository by linking to open issues in their repository or by @mentioning the relevant GitHub organization or user. This approach does not guarantee, of course, that solutions will be expedited, but it does attempt to consolidate the discussion in one place and make it more visible. This consolidation effort should be used not only to point out problems but also to raise awareness of RTL user needs, to stimulate debate and build consensus among RTL developers, and even to recognize well-implemented RTL support.
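Filing such a cross-referencing issue can also be scripted, which may help if reports are collected in bulk. The sketch below builds, but does not send, a request to GitHub's REST API endpoint for creating an issue; references written as `owner/repo#number` are auto-linked by GitHub, and the referencing issue then appears in the referenced issue's timeline. The owner, repository, and issue numbers in the usage note are hypothetical placeholders, not the actual right2leftdh repository or any real upstream issue.

```python
import json
import urllib.request


def build_issue_request(owner: str, repo: str, title: str, summary: str,
                        upstream_refs: list[str], token: str) -> urllib.request.Request:
    """Build (without sending) a POST request that opens an issue via GitHub's REST API."""
    body_lines = [summary, "", "Related upstream issues:"]
    # Cross-references like "some-org/some-editor#123" are auto-linked by GitHub.
    body_lines += [f"- {ref}" for ref in upstream_refs]
    payload = {"title": title, "body": "\n".join(body_lines)}
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For example, `build_issue_request("example-org", "rtl-reports", "Cursor jumps in Arabic lines", "Steps to reproduce: ...", ["some-org/some-editor#123"], token)` returns a request that `urllib.request.urlopen` could send, given a token with permission to open issues in the target repository.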

As mentioned above, technical standards for RTL support largely exist while implementation is largely lacking. But there is something else missing: best practice guidelines. There are, no doubt, developers who would like their products to reach RTL audiences, and yet it is difficult for them to know how users will expect their software to behave in RTL environments. In the same way that the W3C offers “principles” and “easy checks” for accessibility, developers should have access to high-level principles that can guide their RTL implementations—principles that are humanistic and user-oriented rather than technical standards per se.46 One example might be that arrow keys or buttons should be linked to movement in a certain direction (right, left, up, down) rather than to sequence (forward, back, next, previous).
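The arrow-key principle can be stated precisely in a few lines. In this sketch, key names are borrowed from the web's `KeyboardEvent.key` values, while the function name and the `"ltr"`/`"rtl"` progression flag are our own assumptions: each key moves toward the page lying in its direction, so whether that page is later or earlier depends on the book.

```python
def page_step(key: str, progression: str) -> int:
    """Return +1 (toward later pages) or -1 (toward earlier pages).

    Keys are interpreted spatially: ArrowLeft always moves toward the page
    lying to the left. In an RTL book, later pages lie to the left.
    """
    if progression not in ("ltr", "rtl"):
        raise ValueError("progression must be 'ltr' or 'rtl'")
    if key not in ("ArrowLeft", "ArrowRight"):
        raise ValueError("expected 'ArrowLeft' or 'ArrowRight'")
    key_toward_later_pages = "ArrowRight" if progression == "ltr" else "ArrowLeft"
    return 1 if key == key_toward_later_pages else -1
```

Under this design, a button labeled with a sequence word such as “next” would, in an RTL book, sit on the left, so that its label and its spatial position agree.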

Such best practice guidelines must develop out of community conversations and on the basis of multiple examples. The authors hope that issues filed in the Right2Left Digital Humanities (right2leftdh) GitHub repository can begin to provide such a basis. It is possible that the repository could develop into a place where the RTL community collaborates on best practice guidelines and provides a sandbox space for trying out implementations. It might also be conceivable to award “badges” as a way of recognizing products that follow these best practice guidelines. Indeed, while it might at first appear that RTL DH practitioners are a niche community within a niche community, the reality is that RTL DHers are part of a much larger RTL community, and they are well positioned to become advocates for RTL support within the broader software development space, especially in contemporary RTL linguistic zones where development is robust.

With a few exceptions, digital humanists have yet to effectively engage the broader RTL world, just as RTL communities and their knowledge practices have yet to become fully and equitably integrated into the digital humanities. In our view, these two problems are linked. In examining three complementary perspectives on the issue—habitual, cultural, and technical—we have argued for three corresponding extensions that would help reorient the vision of RTL DH practice: (1) DH objectives for RTL projects that extend beyond historical projects in order to encompass contemporary RTL creative practices and RTL content creation in digital environments; (2) DH tools that extend beyond “localization” and instead are conceived as born-global, culturally aware resources in the broad sense that the term “internationalization” entails; and (3) DH practitioners who extend their efforts beyond bespoke solutions toward discussions that encourage developers to implement existing standards and prioritize accessibility for a broad RTL user base. To address the compounded marginalization that RTL DH currently faces, we must work to further articulate these reorientations along both theoretical and practical axes, as they would help to align the broader DH community with the broader RTL community. Continuing to raise awareness of the digital-cultural habitus of RTL and other non-English communities and linking them to pragmatic solutions will foster a more inclusive and equitable environment for DH around the globe.

Notes

This contribution was conceived by the authors together with Najla Jarkas (American University of Beirut), who coauthored the presentation at the Right2Left Workshop Digital Humanities Summer Institute (Victoria, British Columbia, June 8, 2019) that formed the basis for the Cultural section of this chapter. Support for Nathan Gibson’s contribution to this chapter was generously provided by the German Federal Ministry of Education and Research through the “Kleine Fächer—Große Potenziale” program in the framework of the “Communities of Knowledge” project (grant number 01UL1826X).

Contributors to this article are listed in alphabetical order; they all contributed equally to its authorship.

The thread begins with this Twitter post, January 7, 2020, https://twitter.com/Zoe_LeBlanc/status/1214592683739668483. In a follow-up tweet, LeBlanc shares a screenshot of what seaborn, another Python visualization library, does to Arabic characters: separating the connected letters and producing an unfortunately all-too-common mangling of Arabic found everywhere from public signage to tattoos. Regarding LeBlanc’s research, see chapter 22 in this volume on digital history dissertations. For a host of examples of Arabic text mangling, see Ramsey Nasser’s long-standing blog “Nope, not Arabic” (https://www.notarabic.com/). For practical guidance on how to avoid such text rendering errors, see Nasser (“Unplain Text”).
See the commits and issues at https://github.com/search?q=%22RTL%22&type=commits; open issues are listed at https://github.com/search?q=%22RTL%22+state%3Aopen&type=Issues&ref=advsearch&l=&l=. Of course, not all of these have to do with support for RTL scripts or layout, but a quick perusal shows that a great many of them do.
In the following, we sometimes refer to “RTL languages” as shorthand for languages written using RTL writing systems. We do not mean to imply that languages themselves are inherently RTL or LTR, since the same language can be written using RTL or LTR systems (e.g., Ottoman vs. post-Ottoman Turkish or Hebrew in native script vs. romanized Hebrew).
This cooperation is all the more urgent, given the recent explanation by Boucher and Anderson of how bidirectional control characters can hide attacks in source code.
Ishida (“Unicode”) observes, “Most applications treat text by default as left-to-right, and a specific effort is required to say that the base direction should be right-to-left.”
See Nasser (“قلب”) as well as Ahmed Abdalla, Nick Doiron, and Jake Worth’s SimplyAhmazing/noor (code repository), November 5, 2018, https://github.com/SimplyAhmazing/noor; we also do not address here the issues involved in writing top-to-bottom with lines proceeding right-to-left (TTB-RTL), as is sometimes practiced with Japanese, Chinese, and Korean.
Rockwell discusses certain disciplinary barriers to full integration into the DH community. Along with issues such as “jobs,” “theory,” and “disciplinary violence,” the LTR default can present a subdisciplinary challenge that RTL DH practitioners face in having their research conducted, assessed, and valorized on an equal footing. See this chapter’s section “Habitual” for one particular example.
This point does not escape Rockwell in his reference to Fiormonte (251).
See the web content for these initiatives: DHIB: Digital Humanities Institute—Beirut (event website), https://dhibeirut.wordpress.com/; the NYU Abu Dhabi Winter Institute in Digital Humanities (event website), https://wp.nyu.edu/widh/.
For the latter, see Islamicate Digital Humanities Network: The Next Generation (society website), https://idhn.org/.
As Quinn Dombrowski pointed out in a comment on a draft of this chapter, historical materials are more widely researched in general than contemporary culture, not just in regard to RTL cultures. Our point is that this tendency produces a discrepancy between the ways DH practitioners as compared with other RTL users navigate obstacles relating to RTL accessibility.
In Risam’s view, “postcolonial digital humanities is an approach to uncovering and intervening in [such] disruptions” (3).
Around DH in 80 Days was “a multi-institutional, interdisciplinary Digital Humanities collaboration that seeks to introduce new and veteran audiences to the global field of DH scholarly practice by bringing together current DH projects from around the world.” For a project introduction, see https://web.archive.org/web/20190125065559/http://www.arounddh.org/about/.
The South Azer(baijan)i language as spoken in Iranian Azerbaijan and other Turkic-speaking regions of the country has an RTL script based on the Perso-Arabic alphabet, whereas the North Azer(baijan)i language of the Republic of Azerbaijan and the Caucasus region is written in a variety of alphabets, including Latin and Cyrillic. Melissa Terras’s blog post and “Infographic: Quantifying Digital Humanities” shows only one physical DH center in the MENASA region by 2012, which she locates in Iran; see http://melissaterras.blogspot.com/2012/01/infographic-quanitifying-digital.html. This must be IFRI (Institut Français de Recherche en Iran), an offshoot of the Embassy of France in Tehran, which is also recorded in the centerNet’s “Centres” map at http://dhcenternet.org/centers. Elsewhere in “Toward a Cultural Critique of Digital Humanities,” Fiormonte draws a connection—and possible correlation—between income and the number of physical DH centers on a global scale. For the University of Shiraz DH center, see its Instagram user page (https://www.instagram.com/dhc_shirazu/).
By “historical” projects, we are especially referring to those dealing with materials before the twentieth century. The scale of the disparity is difficult to assess without a proper survey. Nevertheless, of the projects funded by the European Research Council (ERC), the European Union’s body for funding individual research projects, only about 25 percent (around 18 of 71 projects) with the word “Arabic” in the description appear to include anything from the twentieth century or later (“ERC Funded Projects,” European Research Council, November 16, 2020, see https://erc.easme-web.eu?mode=7&fullText=Arabic).
Thanks to Hilary Green for pointing out this marginalization in her reading of an earlier draft of this chapter.
Many contemporary projects have a preservationist mission similar to archival historical ones, aiming to gather, preserve, and exhibit multimedia material that may equally include rare or ephemeral artifacts.
In comparing LTR to RTL historical corpora, Mahony relevantly observes “the Western-European and US focus on” the “production of digital editions of texts.” In one study alone, “65% of the projects are Anglo-American; that is 123 out of the total of 187 editions recorded” (376). Obviously, LTR editions comprise a percentage greater than or equal to the exclusively Anglocentric ones.
For an overview of “the state(s) of the OCR problem” with “texts printed before 1800” and in “languages other than modern English,” see https://digital.library.unt.edu/ark:/67531/metadc1010762/m1/2/. For the state of the field in Arabic HTR, see Keinan-Schoonbaert. Regarding OCR of printed Syriac texts, see Chesley, Marcantonio, and Pearson.
On bidirectional support in code editors, see the third section of this chapter, “Technical.” According to Mahony (374), “the dominance of such pervasive systems as . . . HTML and the ubiquitous XML, the latter particularly having a pronounced linguistic bias (difficulties with accented characters and right-to-left scripts) as well as the English-based TEI guidelines” is as much to blame for Anglocentrism and its “geopolitics which [Fiormonte] claims is to be found endemic in our field” as “ASCII code (American Standard [Code] for Information Exchange) and the domain name system (administered by ICANN),” which Fiormonte had singled out in his earlier critique.
Zarnegar, one of the earliest word processors to handle Persian and Arabic, was released initially for DOS in 1991 and later could be used with Windows (see “Zarnegar,” SinaSoft corporate website, accessed December 23, 2020, http://sinasoft.com/zarnegar.html). See also “Zarnegar (word processor),” Wikipedia, last modified May 13, 2022, https://en.wikipedia.org/wiki/Zarnegar_(word_processor). According to the WinWorld online museum, an Arabic version of Microsoft Windows seems to have been available first for Windows 3.1, released in 1992; see “Windows 3.0 / 3.1,” WinWorld, accessed December 23, 2020, https://winworldpc.com/product/windows-3/31; until the release of Windows 98, non-Arabic Windows versions could be arabicized only by installing additional software (Madhany). Even after this point, as Madhany explains, fully enabling RTL features in the Microsoft Windows and Office product suites required adjusting a host of settings. The original interface with these settings can be seen in the screenshots at https://www.lib.uchicago.edu/e/collections/mideast/encyclopedia/multilingual_computing_arabic.ppt.
See the relevant documentation for the following tools: Voyant Tools (https://voyant-tools.org/docs/#!/guide/languages), Recogito (https://github.com/pelagios/recogito2/wiki/User-Interface-Translation:-Contributors’-Guide), FromThePage (https://content.fromthepage.com/neh-to-fund-better-internationalization-and-integration-in-fromthepage/), and Lingscape (https://lingscape.uni.lu/). The localization of the latter has also been completed by Wrisley and Jarkas.
The thread of this debate can be found here: https://github.com/programminghistorian/ph-submissions/issues/147.
For example, contributors of lessons to the Programming Historian are encouraged to refrain from culturally specific language that would exclude global readers and to shape their tutorials around datasets that could be exchanged easily by others for other language communities (Sichani). In a DH training context, at the NYU Abu Dhabi Winter Institute in Digital Humanities (WIDH), participants are encouraged to “communicate with each other in ways that respect difference, while showing compassion, empathy and understanding, instead of assuming that we are all ‘on the same page.’” For more information, see the WIDH Code of Conduct inspired by codes from other cognate DH training events: https://wp.nyu.edu/widh/code-of-conduct/.
See chapter 21 in this volume on “Reframing the Conversation: Digital Humanists, Disabilities, and Accessibility.”
See the discussion of “professional capacity networks” in chapter 21 on “Reframing the Conversation,” as well as El Khatib et al.
The Unicode bidirectional algorithm (Ishida, “Unicode”) attempts to correctly display mixed-direction text on the basis of directionality attributes embedded in Unicode character information. However, this is not adequate to correctly display ambiguous characters like numbers or punctuation, and in rare cases the direction of a text segment needs to be reversed. Therefore Unicode, HTML, and CSS each independently make it possible to (1) set the direction of a particular text segment and mark opposite-direction text so that ambiguous characters display in the correct position, (2) mark off text with an unknown direction so that it can preserve its direction when inserted into surrounding text, and (3) override the default character direction. Examples can be seen at https://right2leftdh.github.io/examples/bidi.html. Details can be found in Gibson (11–28). For helpful guides to implementing RTL and mixed-direction text in HTML, see Ishida and Lanin; see also Ishida, “Structural Markup.” For a suggested TEI-XML implementation, see the TEI Guidelines.
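As a minimal illustration of the mechanisms this note describes, the Python sketch below uses the standard-library `unicodedata` module to read the directionality attribute (Bidi_Class) that Unicode assigns to each character, and then wraps text in the Unicode formatting characters, and the analogous HTML markup, that correspond to (1) marking direction, (2) isolating text of unknown direction, and (3) overriding direction. The helper function names are our own hypothetical choices; the code only builds strings, and how they finally display still depends on the rendering software's implementation of the bidirectional algorithm.

```python
import html
import unicodedata

# Directional formatting characters defined by the Unicode Standard.
RLM = "\u200F"  # RIGHT-TO-LEFT MARK: an invisible strong right-to-left character
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the first strong character
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes an isolate
RLO = "\u202E"  # RIGHT-TO-LEFT OVERRIDE: forces RTL order on every enclosed character
PDF = "\u202C"  # POP DIRECTIONAL FORMATTING: closes an override


def bidi_classes(text: str) -> list[str]:
    """Return the Unicode Bidi_Class of each character, e.g. 'L', 'AL', 'EN', 'ON'."""
    return [unicodedata.bidirectional(ch) for ch in text]


def mark_rtl_end(text: str) -> str:
    """(1) Append an RLM so trailing neutrals after RTL text (e.g. '!') resolve as RTL."""
    return text + RLM


def isolate_unknown(text: str) -> str:
    """(2) Isolate text of unknown direction; HTML counterpart: <bdi>."""
    return FSI + text + PDI


def override_rtl(text: str) -> str:
    """(3) Override the characters' own directions; HTML counterpart: <bdo dir="rtl">."""
    return RLO + text + PDF


def html_rtl(text: str) -> str:
    """(1) in HTML: declare a right-to-left base direction with the dir attribute."""
    return f'<p dir="rtl">{html.escape(text)}</p>'


def html_isolate(text: str) -> str:
    """(2) in HTML: the <bdi> element isolates embedded text."""
    return f"<bdi>{html.escape(text)}</bdi>"


# Latin letter, Arabic letter, Hebrew letter, digit, punctuation mark.
print(bidi_classes("a\u0628\u05D01!"))  # ['L', 'AL', 'R', 'EN', 'ON']
```

The digit and the exclamation mark have weak or neutral classes ("EN", "ON"), which is why their position in mixed-direction text is ambiguous. In CSS, the corresponding controls are the `direction` property together with `unicode-bidi: isolate` or `unicode-bidi: bidi-override`.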
See the TEI Guidelines or try out this jsFiddle: https://jsfiddle.net/gh/get/library/pure/right2leftdh/right2leftdh.github.io/tree/master/examples/boustrophedon-demo.
For a particularly pronounced case, see the example of the Atom code editor, discussed below.
On February 1, 2021, Vallari Agrawal (@VallariAg) created a pull request intended to resolve this issue (https://github.com/internetarchive/bookreader/pull/615). As of October 24, 2022, the new code had not yet been merged into the Internet Archive’s BookReader production codebase.
“Arabic and Hebrew Features in InDesign,” Adobe Support, July 12, 2019, https://helpx.adobe.com/indesign/using/arabic-hebrew.html. See also the discussions in the Adobe Support Community on the topic “binding direction disappeared,” July 20, 2017, https://community.adobe.com/t5/indesign/binding-direction-disappeared/td-p/9256082, and on “change binding direction,” September 6, 2017, https://community.adobe.com/t5/indesign/change-binding-direction/td-p/9316069. It should be acknowledged that Adobe has implemented a number of advanced RTL features in InDesign, although they are available only in the localized edition of the software. In 2001, there was a parallel situation in which only Arabic editions of Windows supported certain features (Habash).
An example is Samuel Landauer’s Arabic edition of Saadia Gaon’s Book of Beliefs and Opinions on the Internet Archive, the Arabic portion of which must be read from bottom to top if viewed in single-page mode or downloaded as a PDF; see https://archive.org/details/kitbalamnt00saaduoft/page/n352/mode/1up.
Personal correspondence with Johanna Walcher, August 27, 2019, and Günter Hackl, August 28, 2019.
“Network Dependents,” matplotlib/matplotlib (code repository), January 2, 2023, https://github.com/matplotlib/matplotlib/network/dependents.
“matplotlib/mplcairo: A (New) Cairo Backend for Matplotlib” (code repository), https://github.com/matplotlib/mplcairo.
A thorough survey is needed. However, it may be indicative that “Comparison of Text Editors” (Wikipedia, last modified February 1, 2020, https://en.wikipedia.org/w/index.php?title=Comparison_of_text_editors#Right-to-left_and_bidirectional_text) lists 58 text (and source code) editors; of these, 14 fully or partially support RTL and bidi, 17 do not support either, and 27 are unknown.
“Atom Reaches One Million Active Users,” Atom (blog), March 28, 2016, https://web.archive.org/web/20221129082040/https://blog.atom.io/2016/03/28/atom-reaches-1m-users.html. During the editing of our article, GitHub announced it would sunset Atom on December 15, 2022, in favor of furthering the development of Visual Studio Code, which has better (but still problematic) RTL and bidi handling (“Sunsetting Atom,” GitHub [blog], June 8, 2022, https://github.blog/2022-06-08-sunsetting-atom/).
The open issues related to RTL text in November 2021 can be reviewed at https://github.com/atom/atom/issues/10132, https://github.com/atom/atom/issues/13612, https://github.com/atom/atom/issues/13348, https://github.com/atom/atom/issues/10294, https://github.com/atom/atom/issues/9397, and https://github.com/atom/atom/issues/5990. Contributors Mohamed Taher Alrefaie (@mohataher) and @mohamedalmograby attempted to resolve several of these issues with a pull request (https://github.com/atom/atom/pull/21018), but an Atom maintainer rejected the pull request on September 3, 2021, asserting, “The maintenance cost of this new feature is huge and Atom currently doesn’t have the capacity to deal with issues and pull requests that come as a result of this addition.”
“RTL Text Selection,” https://github.com/atom/atom/issues/10132.
“People · Atom · GitHub,” https://web.archive.org/web/20220610160902/https://github.com/orgs/atom/people; “Contributors to atom/atom · GitHub,” https://github.com/atom/atom/graphs/contributors. It would be simplistic, however, to focus strictly on directional issues in implementing support for RTL. Developers also need to be aware that many RTL writing systems require features like connecting letters and combining diacritics; if these are poorly implemented, the text is nearly unreadable. The critical point is that when the text editors most developers work in are themselves RTL-unfriendly, it is easy to see why the software they produce fails to meet expectations for RTL use cases.
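To make concrete what this note means by connecting letters and combining diacritics, here is a short Python sketch using only the standard-library `unicodedata` module. The example word and the helper `split_marks` are our own illustrative choices, not drawn from the projects discussed. It shows that Arabic short-vowel marks are stored as separate combining code points, so software that equates one code point with one visible character will miscount or mishandle them, and that the isolated, final, initial, and medial shapes of a connecting letter correspond to a single stored character whose form the renderer has to choose.

```python
import unicodedata

# A fully vowelled Arabic word, kataba ("he wrote"): kaf, fatha, teh, fatha, beh, fatha.
word = "\u0643\u064E\u062A\u064E\u0628\u064E"


def split_marks(text: str) -> tuple[list[str], list[str]]:
    """Separate base letters from combining marks (canonical combining class > 0)."""
    bases = [ch for ch in text if unicodedata.combining(ch) == 0]
    marks = [ch for ch in text if unicodedata.combining(ch) != 0]
    return bases, marks


bases, marks = split_marks(word)
print(len(word), len(bases), len(marks))  # 6 3 3
print(unicodedata.name(marks[0]))  # ARABIC FATHA

# Connecting letters: the text stores one BEH (U+0628) and the renderer picks its shape.
# The legacy presentation-form code points for BEH's isolated, final, initial,
# and medial shapes all normalize (NFKC) back to that single letter.
beh_forms = ["\uFE8F", "\uFE90", "\uFE91", "\uFE92"]
print(all(unicodedata.normalize("NFKC", f) == "\u0628" for f in beh_forms))  # True
```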
GitHub issue comment related to “Editor Behaves Confusing while Editing RTL Text,” February 10, 2018, https://github.com/atom/atom/issues/4682#issuecomment-364611470.
The thread begins with this Twitter post by Till Grallert (@tillgrallert), “I just updated to a new computer,” October 2, 2019, https://twitter.com/tillgrallert/status/1179522880306061313.
See the previous “Habitual” section of this chapter.
Our pilot site can be found at Right2Left Digital Humanities, https://right2leftdh.github.io.
See Gousios et al.; see also the “About” page at https://github.com/about.
For accessibility guidelines, see Henry and McGee.

Bibliography

Abdelali, Ahmed. “Localization in Modern Standard Arabic.” Journal of the American Society for Information Science and Technology 55, no. 1 (2004): 23–28, https://doi.org/10.1002/asi.10340.
Antonijević, Smiljana. Amongst Digital Humanists: An Ethnographic Study of Digital Knowledge Production. New York: Palgrave Macmillan, 2015, https://doi.org/10.1057/9781137484185.
Bani-Khaled, Turki Ahmad Ali. “Standard Arabic and Diglossia: A Problem for Language Education in the Arab World.” American International Journal of Contemporary Research 4, no. 8 (August 2014): 180–89.
Boucher, Nicholas, and Ross Anderson. “Trojan Source: Invisible Vulnerabilities.” arXiv (preprint). October 30, 2021, https://arxiv.org/abs/2111.00169.
Bourdieu, Pierre. Outline of a Theory of Practice. Translated by Richard Nice. Cambridge: Cambridge University Press, 1977.
Chesley, Emily, Jillian Marcantonio, and Abigail Pearson. “Towards Syriac Digital Corpora: Evaluation of Tesseract 4.0 for Syriac OCR.” Hugoye: Journal of Syriac Studies 22 (2019): 109–92, https://hugoye.bethmardutho.org/article/hv22n1chesley.
Eberhard, David M., Gary F. Simons, and Charles D. Fennig, eds. “Summary by Language Size.” In Ethnologue: Languages of the World. 22nd ed. Dallas: SIL International, 2019, https://www.ethnologue.com/statistics/size (login required).
El Khatib, Randa, David Joseph Wrisley, Shady Elbassuoni, Mohamad Jaber, and Julia El Zini. “Prototyping across the Disciplines.” Digital Studies/Le Champ Numérique 8, no. 1 (2019): 1–20, https://doi.org/10.16995/dscn.282.
Fiormonte, Domenico. “Toward a Cultural Critique of Digital Humanities.” In Debates in the Digital Humanities 2016, edited by Matthew K. Gold and Lauren F. Klein, 438–58. Minneapolis: University of Minnesota Press, 2016, https://doi.org/10.5749/9781452963761.
Galina, Isabel. “Is There Anybody Out There? Building a Global Digital Humanities Community.” Red de Humanidades Digitales (blog). July 19, 2013, https://web.archive.org/web/20201205073959/http://humanidadesdigitales.net/blog/2013/07/19/is-there-anybody-out-there-building-a-global-digital-humanities-community/.
Gibson, Nathan P. “Thinking in ⅃TЯ: Reorienting the Directional Assumptions of Global Digital Scholarship.” Presentation at the Right2Left Workshop, Digital Humanities Summer Institute, Victoria, British Columbia, June 8, 2019, https://doi.org/10.17613/3vws-5s29.
Gousios, Georgios, Bogdan Vasilescu, Alexander Serebrenik, and Andy Zaidman. “Lean GHTorrent: GitHub Data on Demand.” In Proceedings of the 11th Working Conference on Mining Software Repositories—MSR 2014, 384–87. Hyderabad, India: ACM Press, 2014, https://doi.org/10.1145/2597073.2597126.
Habash, Nizar. “Nuun: A System for Developing Platform and Browser Independent Arabic Web Applications.” In Proceedings of the Arabic Translation and Localization Conference (ATLAS, 1999). Tunis, Tunisia, 2001, https://www.researchgate.net/publication/2414218.
Henry, Shawn Lawton, and Liam McGee. “Accessibility.” W3C. Accessed August 9, 2022, https://www.w3.org/standards/webdesign/accessibility.
“Internationalization.” W3C. Accessed August 9, 2022, https://www.w3.org/standards/webdesign/i18n.html.
Ishida, Richard. “Structural Markup and Right-to-Left Text in HTML.” W3C. Last updated June 25, 2021, https://www.w3.org/International/questions/qa-html-dir.
Ishida, Richard. “Unicode Bidirectional Algorithm Basics.” W3C. Last updated August 9, 2016, https://www.w3.org/International/articles/inline-bidi-markup/uba-basics.
Ishida, Richard. “Visual vs. Logical Ordering of Text.” W3C. Last updated June 10, 2016, https://www.w3.org/International/questions/qa-visual-vs-logical.
Ishida, Richard, and Aharon Lanin. “Inline Markup and Bidirectional Text in HTML.” W3C. June 25, 2021, https://www.w3.org/International/articles/inline-bidi-markup/.
Keinan-Schoonbaert, Adi. “Results of the RASM2019 Competition on Recognition of Historical Arabic Scientific Manuscripts.” The British Library—Digital Scholarship (blog). September 13, 2019, https://blogs.bl.uk/digital-scholarship/2019/09/rasm2019-results.html.
Kirmizialtin, Suphan, and David Joseph Wrisley. “Automatic Transcription of Non-Latin Script Periodicals: A Case Study in Ottoman Turkish Print Archive.” DHQ: Digital Humanities Quarterly 16, no. 2 (2022), http://digitalhumanities.org:8081/dhq/vol/16/2/000577/000577.html.
Madhany, al-Husein N. “Multilingual Computing with Arabic and Arabic Transliteration: Arabicizing Windows Applications to Read and Write Arabic & Solutions for the Transliteration Quagmire Faced by Arabic-Script Languages.” February 2006, https://www.lib.uchicago.edu/e/collections/mideast/encyclopedia/Multilingual_Computing_with_Arabic_and_Arabic_Transliteration.pdf.
Mahony, Simon. “Cultural Diversity and the Digital Humanities.” Fudan Journal of the Humanities and Social Sciences 11, no. 3 (2018): 371–88, https://doi.org/10.1007/s40647-018-0216-0.
Meza, Aurelio. “Decolonizing International Research Groups: Prototyping a Digital Audio Repository from South to North.” Digital Studies/Le Champ Numérique 9, no. 1 (2019): 7, https://doi.org/10.16995/dscn.303.
Nasser, Ramsey. “قلب.” Ramsey Nasser (blog). January 14, 2020, https://nas.sr/%D9%82%D9%84%D8%A8/.
Nasser, Ramsey. “Unplain Text: How to Shape and Render Non-Latin Text.” Increment. April 2018, https://increment.com/programming-languages/unplain-text-primer-on-non-latin/.
Osborn, Don. African Languages in a Digital Age: Challenges and Opportunities for Indigenous Language Computing. Ottawa: International Development Research Centre, HSRC Press, 2010, https://www.hsrcpress.ac.za/books/african-languages-in-a-digital-age.
Risam, Roopika. New Digital Worlds: Postcolonial Digital Humanities in Theory, Praxis and Pedagogy. Evanston, Ill.: Northwestern University Press, 2019.
Rockwell, Geoffrey. “Inclusion in the Digital Humanities.” In Defining Digital Humanities: A Reader, edited by Edward Vanhoutte, Julianne Nyhan, and Melissa M. Terras, 247–53. Surrey: Ashgate, 2013.
Sichani, Anna-Maria. “Linguistic Diversity and Ad-Hoc Translation of the Programming Historian’s Lessons.” Programming Historian. November 30, 2018, https://programminghistorian.org/posts/ad-hoc-translation.
Souphavanh, Anousak, and Theppitak Karoonboonyanan. FOSS Localization. New Delhi: Elsevier, 2005, https://en.wikibooks.org/wiki/FOSS_Localization.
Tanev, Stoyan. “Global from the Start: The Characteristics of Born-Global Firms in the Technology Sector.” Technology Innovation Management Review 2 (March 2012): 5–8, https://doi.org/10.22215/timreview/532.
TEI Guidelines. “Characters, Glyphs, and Writing Modes.” July 16, 2019, https://tei-c.org/release/doc/tei-p5-doc/en/html/WD.html.
W3C. “Localization vs. Internationalization.” Last updated December 5, 2005, https://www.w3.org/International/questions/qa-i18n.
Wrisley, David Joseph. “Enacting Open Scholarship in Transnational Contexts.” POP! Public. Open. Participatory. October 31, 2019, https://doi.org/10.21810/pop.2019.002.
Wrisley, David Joseph, and Najla Jarkas. “On Translating Voyant Tools into Arabic.” DJWrisley (blog). September 6, 2016, https://web.archive.org/web/20210226104135/https://djwrisley.com/on-translating-voyant-tools-into-arabic/.
Yacob, Daniel. “Localize or Be Localized: An Assessment of Localization Frameworks.” In International Symposium on ICT Education and Application in Developing Countries, 1–9. Addis Ababa, Ethiopia, 2004, http://yacob.org/papers%2FDanielYacob-ICTES2004.pdf.
Yaghan, Mohammad Ali. “‘Arabizi’: A Contemporary Style of Arabic Slang.” Design Issues 24, no. 2 (Spring 2008): 39–52, https://doi.org/10.1162/desi.2008.24.2.39.