This repo contains the notebooks used for sourcing data for "A Corpus of Biblical Names in the Greek New Testament to Study the Additions, Omissions, and Variations across Different Manuscripts", which was submitted to SemDH 2024: First International Workshop of Semantic Digital Humanities.
data/ General directory for downloaded and generated data
|-- publish/ Directory of cleaned up lists (will be generated by 05_pub_prep.ipynb)
|-- tables/ Directory containing manually curated lists
| `-- names.csv List of manually curated names
|-- transcriptions/ Directory of transcripts (will be created during download)
|-- manuscripts/ Directory of manuscript metadata (will be created during download)
|-- manuscripts.csv Processed list of manuscripts (will be generated by 03_*.ipynb)
|-- names.csv Processed list of names (will be generated by 02_get_words.ipynb)
|-- occurrences.csv Processed list of occurrences of names (will be generated by 04_search.ipynb)
`-- verses.csv Processed list of verses in manuscripts (will be generated by 03_*.ipynb)
notebooks/ Directory of notebooks used
|-- 01_download.ipynb Download files from the IGNTP and NTVMR (TEI files and JSON files)
|-- 02_get_words.ipynb Preprocess manually curated list of names for later search
|-- 03_1_teiparse.ipynb Parsing TEI files for manuscript metadata and verses
|-- 03_2_jsonparse.ipynb Parsing JSON files for manuscript metadata
|-- 03_3_sparql.ipynb Enriching manuscript metadata with data from DBpedia
|-- 04_search.ipynb Search for occurrences and omissions of names in verses
|-- 05_pub_prep.ipynb Clean up processed lists
|-- constants.py Constants
|-- convertes.py Converter functions
|-- TEIFile.py Class file for TEIFile
|-- utils.py Helper functions
`-- tests.py Testing functions
.python-version Python version indicator
README This README
requirements.txt Requirements for Python environment
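Since several files in the listing above only exist after a particular notebook has run, a small check can show which steps are still outstanding. The following is a minimal sketch, not part of the repo: the function name `missing_outputs` and the mapping `GENERATED_BY` are hypothetical, and the paths are taken directly from the listing.

```python
from pathlib import Path

# Generated outputs and the notebook that the directory listing names as
# their producer. This mapping is an illustration, not a file in the repo.
GENERATED_BY = {
    "data/names.csv": "02_get_words.ipynb",
    "data/manuscripts.csv": "03_*.ipynb",
    "data/verses.csv": "03_*.ipynb",
    "data/occurrences.csv": "04_search.ipynb",
    "data/publish": "05_pub_prep.ipynb",
}


def missing_outputs(root="."):
    """Return (path, notebook) pairs for outputs not yet present under root."""
    root = Path(root)
    return [
        (rel, notebook)
        for rel, notebook in GENERATED_BY.items()
        if not (root / rel).exists()
    ]
```

Called from the project's root directory, `missing_outputs()` returns an empty list once all notebooks have been run in order.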
The recommended Python version for this repo is 3.9.18 (see .python-version). Docker images with Python preinstalled can be found on Docker Hub. Alternatively, you can set up and run a virtual Python environment.
In your Python environment, run pip install -r requirements.txt from the project's root directory to install Jupyter. This will enable you to run the notebooks.
The notebooks will automatically download and install the required packages and modules at runtime in their respective kernel.
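Before starting Jupyter, it can help to confirm that the active interpreter matches the recommended version. The sketch below is an illustration only: it assumes that .python-version holds a plain version string such as 3.9.18, and the helper names `parse_version` and `interpreter_matches` are hypothetical.

```python
import sys
from pathlib import Path


def parse_version(text):
    """Turn a version string such as '3.9.18' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def interpreter_matches(version_file=".python-version", running=None):
    """Compare major.minor of the running interpreter with the pinned version.

    Assumes the file contains only a dotted version number.
    """
    pinned = parse_version(Path(version_file).read_text())
    if running is None:
        running = sys.version_info[:3]
    return pinned[:2] == tuple(running)[:2]
```

Only the major and minor numbers are compared here, on the assumption that a differing patch release is unlikely to matter for running the notebooks.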
We used a SPARQL query to retrieve an initial list of biblical names in the New Testament.
Endpoint: https://database.factgrid.de/query
SELECT ?Person ?PersonLabel ?noted ?notedLabel ?GenderLabel ?link ?book
WHERE {
SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
?Person wdt:P2 wd:Q8811.
?Person wdt:P143 ?noted.
?noted wdt:P8 ?book.
FILTER (?book IN (wd:Q74942, wd:Q74943, wd:Q74944, wd:Q74945, wd:Q74946, wd:Q74947, wd:Q74948, wd:Q74949, wd:Q74950, wd:Q74951, wd:Q74952, wd:Q74953, wd:Q74954, wd:Q74955, wd:Q74956, wd:Q74957, wd:Q74958, wd:Q74959, wd:Q74960, wd:Q74961, wd:Q74962, wd:Q74963, wd:Q74964, wd:Q74965, wd:Q74966, wd:Q74967, wd:Q74968))
OPTIONAL { ?Person wdt:P154 ?Gender. }
OPTIONAL { ?link schema:about ?Person ; schema:isPartOf <https://www.wikidata.org/> . }
}
ORDER BY (?PersonLabel)
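The query above can also be sent programmatically. The following is a minimal sketch using only the Python standard library, not necessarily how the list was originally retrieved. The URL in `SPARQL_ENDPOINT` is an assumption about where the machine-readable endpoint behind the query page lives, and the function names and the User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the SPARQL endpoint behind the query page listed above.
SPARQL_ENDPOINT = "https://database.factgrid.de/sparql"


def run_query(query, endpoint=SPARQL_ENDPOINT):
    """Send a SPARQL query via HTTP GET and return the decoded JSON result."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/sparql-results+json",
            "User-Agent": "biblical-names-example/0.1",  # placeholder value
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def bindings_to_rows(result):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in result["results"]["bindings"]
    ]
```

With the query text stored in a string `query_text`, `bindings_to_rows(run_query(query_text))` would yield one dictionary per result row, with keys such as `PersonLabel` and `book`. Variables bound through OPTIONAL clauses (`Gender`, `link`) are simply absent from rows where no value exists.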
This repo has been updated since its first release and may be updated again. Please have a look at the release tags for previous versions.
If you use this code or data in your research, please cite:
@inproceedings{Werner2024,
title = {A Corpus of Biblical Names in the Greek New Testament to Study the Additions, Omissions, and Variations across Different Manuscripts},
author = {Christoph Werner and Zacharias Shoukry and Soham Al-Suadi and Frank Krüger},
url = {https://ceur-ws.org/Vol-3724/paper6.pdf},
crossref = {SemDH2024},
year = {2024},
abstract = {The analysis of textual variants of verses in the New Testament across different manuscripts has mainly been done by close reading with manual effort. With the increasing number of transcriptions of the different manuscripts, quantitative analyses (so-called distant reading) can be used to search for patterns of omission, addition, or other variations, to formulate novel hypotheses to be investigated by close reading. In this work, we present a corpus of biblical names including spelling variation and inflections and their mentions in the transcriptions of the New Testament. By integrating and semantically enriching the data collected from different sources, we established a corpus that can be used for the quantitative study of omission, addition, and variation of such biblical names. To illustrate the corpus, we implement some use cases and show that well-known cases can be quantitatively reproduced. The corpus and all code are published under open licenses to enable reproduction, update, and maintenance.},
keywords = {New Testament,Biblical Names,Textual Variation Units},
}
@proceedings{SemDH2024,
booktitle = {Semantic Digital Humanities 2024},
year = {2024},
editor = {Oleksandra Bruns and Andrea Poltronieri and Lise Stork and Tabea Tietz},
series = {CEUR Workshop Proceedings},
address = {Aachen},
issn = {1613-0073},
url = {https://ceur-ws.org/Vol-3724/},
venue = {Hersonissos, Greece},
eventdate = {2024-05-27},
title = {Proceedings of the First International Workshop of Semantic Digital Humanities (SemDH 2024)}
}
- Version v1 from Mar 15, 2024
- Version v2 from May 17, 2024
- Version v3 from Jul 10, 2024