History of the Digital Syriac Corpus
The Digital Syriac Corpus began in 2004 as a collaboration between David G.K. Taylor, Associate Professor of Aramaic and Syriac at the Oriental Institute, Oxford University, and Kristian S. Heal, director of the Center for the Preservation of Ancient Religious Texts (CPART) at Brigham Young University. An earlier project at BYU, The Dead Sea Scrolls Electronic Library (published by Brill), had shown the value and viability of creating an annotated electronic corpus of Semitic texts. In the same period a collaboration between BYU and the Vatican Library to digitize Syriac manuscript collections 1 had turned the attention of CPART towards the relevance of digital tools for Syriac studies. Preliminary transcriptions of some of these Vatican manuscripts were made as early as 2001. Both of these successful projects suggested that an electronic corpus of Syriac texts would be a valuable scholarly endeavor.
As a proof of concept, Taylor and Heal set about developing a digital corpus of early Syriac texts based on the best available editions. They selected key texts and worked with transcribers in Oxford, Rome, the Middle East, and the United States, such as Dr. Michael Oez, Fr. Dr. William Toma, and Fr. Dr. Roger Akhrass. These transcription efforts grew the electronic corpus to almost five million words, and included the complete works of Ephrem, Narsai, and Philoxenus, together with numerous other major works from a millennium of important classical Syriac authors, spanning from Bardaisan to Barhebraeus. Taylor and Heal also oversaw the production of a complete electronic edition of Jessie Payne Smith’s Compendious Syriac Dictionary for the purposes of adding lexical annotations and dictionary links to the corpus. This early progress of the Syriac Corpus project was presented in various conference papers. 2
Eventually, a small part of the corpus was made publicly available as a download package for use with BYU's WordCruncher software. Through this method, individual scholars were provided access to relevant portions of the corpus for their research and publications. During this phase of the project, the major work on the Syriac Corpus was carried out in collaboration with BYU’s Natural Language Processing Lab under the direction of Professors Eric Ringger and Kevin Seppi (Computer Science), together with Professor Deryle Lonsdale (Linguistics). The focus of this research was machine-assisted annotation of the corpus, both morphological and lexical. It was carried out with graduate students at BYU and generated a number of computer science research papers. 3
The early stages of development of the Digital Syriac Corpus were driven by research interests in morphological tagging, corpus linguistics, text mining, and machine learning. Taylor and Heal had, however, also desired from the start to make the corpus directly available online in a format which did not require any particular expertise in computational analysis or text markup. From 2010 onward, Heal began to explore how methods in the Digital Humanities were newly intersecting with Syriac Studies to open up inexpensive possibilities for such an accessible format. 4 These issues were a particular focus at the fourth Hugoye Symposium, “Syriac and the Digital Humanities,” sponsored in 2015 by the Beth Mardutho Research Library, Rutgers University, and Syriaca.org. 5 Among the papers, James Walters (at the time a Ph.D. candidate at Princeton Theological Seminary) presented an evaluation of various formats for the publication of electronic critical editions of Syriac texts, including the XML guidelines of the Text Encoding Initiative. At the same symposium, David Michelson (Vanderbilt University) and Winona Salesky presented on the Srophé app, a native XML database customized by Syriaca.org for Syriac data encoded in TEI. From this confluence, Heal crafted a collaborative plan to publish an online reader-oriented interface for the corpus. The execution of this plan was taken up in 2016 by James Walters (now of Rochester College), who spearheaded further development of the Digital Syriac Corpus. An editorial board was formed at this time for scholarly governance and peer review and to assure that future development would remain relevant to the needs of the scholarly community.
One of the most pressing tasks for this phase of the corpus was to propose standards for electronic editions of Syriac texts. This work was undertaken in 2016 by Walters who established how to apply the TEI XML standards to the unique features of Syriac digital texts. Through funding provided by Brigham Young University, the Digital Syriac Corpus hired Winona Salesky to deploy and customize the Srophé app to enable browsing, search, and display of Syriac texts encoded in TEI XML. Through collaboration with Michelson, Vanderbilt University's Jean and Alexander Heard Library provided web hosting for the development of the Srophé app into a Syriac reader platform and then permanent online hosting for the Digital Syriac Corpus. Walters then converted over three hundred texts from word processor files into well-formed and semantically encoded TEI XML according to the new digital edition guidelines of the Digital Syriac Corpus. Technical advisors Michelson and Daniel Schwartz (Center of Digital Humanities Research, Texas A&M University) worked with Walters to test and create a digital corpus database robust enough to model diverse genres of Syriac literature. Once the process was complete, Walters prepared extensive documentation so that the standards for electronic editions of Syriac texts could be widely used and interpreted.
In addition to making Syriac texts available online in a reader-friendly format, the development of the Digital Syriac Corpus also began to connect these digital texts to the growing resources of the Syriac Linked Open Data community. To assist readers, the Digital Syriac Corpus collaborated with James Bennett and George Kiraz of Beth Mardutho: The Syriac Institute to facilitate automated lexical queries from the Digital Syriac Corpus to the SEDRA lexical database. The result was an API template for overlaying lexical lookups in HTML over any Syriac text. These code templates (implemented first in the Digital Syriac Corpus) are made available for free reuse by any project or website which has Syriac text. In a second collaboration, the Digital Syriac Corpus adopted the authority files for authors, works, and citations developed by Syriaca.org. Nathan Gibson (Ludwig-Maximilians-Universität München) and Michelson provided support for the use of Syriaca.org URIs (uniform resource identifiers) as cataloguing standards for the corpus. The use of common URIs with other projects not only allows the Digital Syriac Corpus to be searched according to the current standards of digital classification in Syriac studies but also allows linking between the Digital Syriac Corpus and other digital projects in Syriac studies. In fact, projects wishing to embed texts from the Digital Syriac Corpus in their own webpages can place API calls using CTS URNs (Canonical Text Services identifiers).
Walters presented a demo of the Digital Syriac Corpus to a group of Syriac scholars in June 2017 at a workshop dedicated to Narsai, held at Brigham Young University. Further presentations followed in the spring of 2018: at the LinkSyr Workshop at Vrije Universiteit Amsterdam, Netherlands; in a conference presentation at the International Congress on Medieval Studies in Kalamazoo, MI; and in a training workshop at the North American Patristics Society in Chicago, IL.
In May 2018, the editorial board determined that, with the digital infrastructure and TEI/XML editorial standards in place, the corpus was ready for its initial public release. In recognition of his work, Walters was named General Editor and commissioned to undertake the next phase of the project. This phase will entail the encoding of the remaining digital files inherited from the earlier work of Heal and Taylor and will solicit new editions and contributions from scholars in the field. One such born-digital text edition has already been published in the Digital Syriac Corpus as a model for new editions: The Exhortation of Peter.
For the future of the corpus, we hope that scholars will come to recognize this resource not only as a repository of texts that can be searched, but also as a publishing venue for new editions of Syriac texts.
Kristian S. Heal
James E. Walters
- 1 View the report on BYU's collaboration with the Vatican Library.
- 2 Kristian Heal, "The BYU-Oxford Syriac Electronic Corpus and the Study of Syriac Literature," Society of Biblical Literature Annual Meeting, 2011; Kristian Heal, "The BYU-Oxford Syriac Electronic Corpus and the Future of Syriac Lexicography," International Syriac Language Project, Society of Biblical Literature, 2011.
- 3 Paul Felt, Eric Ringger, Kevin Seppi, Kristian Heal, Robbie Haertel, Deryle Lonsdale, "First Results in a Study Evaluating Pre-annotation and Correction Propagation for Machine-Assisted Syriac Morphological Analysis," Proceedings of the Eighth International Conference on Language Resources and Evaluation (2012): 878-885; Peter McClanahan, George Busby, Robbie Haertel, Kristian Heal, Deryle Lonsdale, Kevin Seppi, Eric Ringger, "A Probabilistic Morphological Analyzer for Syriac," Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing (2010): 810-820; Kevin Black, Eric K. Ringger, Paul Felt, Kevin Seppi, Kristian Heal, Deryle Lonsdale, "Evaluating Lemmatization Models for Machine-Assisted Corpus-Dictionary Linkage," Proceedings of the Ninth International Conference on Language Resources and Evaluation (2014): 3798-3805; Paul Felt, Eric K. Ringger, Kevin D. Seppi, Kristian Heal, "Using Transfer Learning to Assist Exploratory Corpus Annotation," Proceedings of the Ninth International Conference on Language Resources and Evaluation (2014): 140-145; Paul Felt, Eric K. Ringger, Kevin Seppi, Kristian S. Heal, Robbie A. Haertel, Deryle Lonsdale, "Evaluating machine-assisted annotation in under-resourced settings," Language Resources and Evaluation 47, no. 3 (2013): 1-36.
- 4 See Kristian Heal, "Corpora, Elibraries, and Databases: Locating Syriac Studies in the 21st Century," Hugoye: Journal of Syriac Studies 15.1 (2012): 65-78, available online here; and Heal, "Digital Humanities and the Study of Christian Apocrypha," available online here.
- 5 See the conference report in Hugoye: Journal of Syriac Studies, available online here.