Digital Media Technology, Week 3: Introduction to TEI
Peter Verhaar

eXtensible Markup Language

<title>La Biblioteca de Babel</title> is a short story written by <persName>Jorge Luis Borges</persName>.

□ General rules, which determine well-formedness (e.g. proper nesting, a single root element, case sensitivity)
□ Rules for a particular language, which determine validity

Validation rules: a DTD or an XML Schema. In the example below, the DOCTYPE declaration points to the DTD; the elements from the root element onwards form the document instance.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE TEI SYSTEM "tei.dtd">
<TEI>
  <text>
    <salute>Gentlemen,</salute>
    <body>
      I reply to your letter of the <date>29th Ulto</date>, offering 30 £ for an early copy of the novel (…)
    </body>
  </text>
</TEI>

Deconstruction
Textual aspects:
□ Lexical codes
□ Logical structure
□ Typography
□ Literary devices
□ Grammar and syntax
□ Semantic content
□ Physical structure

Ontologies
□ Models are based on an ontology
□ The ontology defines which properties of the original are represented in the model
□ Models “inevitably lie, by omission at least”
□ A DTD can be viewed as an ontology

John Unsworth, ‘What is Humanities Computing and What is Not?’, in: Melissa Terras, Julianne Nyhan & Edward Vanhoutte (eds.), Defining Digital Humanities: A Reader, 2013, pp. 36–37.
R. Davis, H. Shrobe & P. Szolovits, ‘What is a Knowledge Representation?’, AI Magazine, 14:1 (1993).

Transcription of a letter by Ouida:

Dear Sirs,
I will accept £10 for the rights to make a translation into Dutch of my novel entitled Wanda
Printers will send you entire proofs from London instantly. Please to send money on receipt of this / Address Madame Ouida. ~c. 2 words illegible~ ~c. 1 word illegible~
Ouida L. de la Ramée

Elements: letter, salute, body, closer, p, persName, title

<?xml version="1.0" encoding="UTF-8"?>
<letter>
  <salute>Dear Sirs,</salute>
  <body>
    <p>I will accept £10 for the rights to make a translation into Dutch of my novel entitled <title>Wanda</title></p>
    <p>Printers will send you entire proofs from London instantly. Please to send money on receipt of this / Address Madame Ouida. ~c. 2 words illegible~ ~c. 1 word illegible~</p>
  </body>
  <closer>Ouida L. de la Ramée</closer>
</letter>

Text Encoding Initiative
□ More than 500 elements
□ Developed by a consortium of scholars
□ First established in 1987
□ Covers text in general: “texts in any natural language, of any date, in any literary genre”

<choice> groups alternative encodings of the same passage, such as an original and a regularised form, or an abbreviation and its expansion:

<choice>
  <orig>Impressions</orig>
  <reg>Impressions of Theophrastus Such</reg>
</choice>

<choice>
  <abbr>Yrs.</abbr>
  <expan>Yours</expan>
</choice>

<unclear> marks text that cannot be transcribed with certainty; <gap> marks material left out of the transcription, for instance because it is illegible:

<unclear reason="illegible">London</unclear>

Madame Ouida <gap reason="illegible" extent="2 words"/>

Unicode

<p>En réponse à votre lettre du 30 Janvier nous avons <lb/>
l'honneur de vous informer que nous avons payé Mon-<lb/>
sieur Midderigh déjà depuis longtemps et presque toujours <lb/>
d'avance.</p>

(English: “In reply to your letter of 30 January, we have the honour of informing you that we paid Monsieur Midderigh long ago, and almost always in advance.”)

Digital information
□ Digital information is numerical information.
  Cf. the Latin word ‘digitus’ (‘finger’).
  E.g. words for ‘digital’ in Romance languages: Digital Studies / ‘Le champ numérique’ (French, literally ‘the numerical field’); ‘Studium Librorum et Instrumentorum Communicationis Numericorum’ (Latin, literally ‘the study of books and of numerical instruments of communication’)
□ Digital information is information represented as combinations of 1s and 0s (bits)

A “byte” (mnemonic: “by eight”) is a sequence of eight bits.

Counting from 0 to 8 in binary: 0 1 10 11 100 101 110 111 1000

ASCII
□ A character encoding scheme
□ e.g. A = 01000001 (decimal 65)
□ Uses 7 bits (128 characters)

Unicode
□ Originally designed as a 16-bit code; code points now run from U+0000 to U+10FFFF
□ Encoding forms such as UTF-8, which uses 1 to 4 bytes per character
□ Room for 1,112,064 characters

Character reference: &#945; (hexadecimal: &#x3B1;) is displayed as α

Text containing characters that XML reserves for markup:

I <3 Digital Media Technology
Calvin & Hobbes
If the premises “A > B” and “A” are true, we can conclude B

Entities

<p>This sentence is in the &lt;p&gt; element.</p>

Entity    Character   Meaning
&gt;      >           Greater than
&lt;      <           Less than
&quot;    "           Quotation mark
&amp;     &           Ampersand

Comments
Used to improve the readability of the XML document:

<!-- The next section contains the transcription -->
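Rules such as proper nesting, a single root element and case sensitivity are enforced by any XML parser, with or without a DTD. As a minimal sketch, assuming Python 3 and its standard-library `xml.etree.ElementTree` module, the following checks a few hypothetical snippets for well-formedness. The helper `is_well_formed` and the test strings are illustrative only; this parser does not validate against a DTD or XML Schema, which would need a validating tool.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML.

    This checks only the general XML syntax rules; it does not
    validate the document against a DTD or an XML Schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical test snippets, each respecting or breaking one general rule.
examples = {
    "properly nested": "<letter><salute>Dear Sirs,</salute></letter>",
    "improper nesting": "<p><title>Wanda</p></title>",
    "two root elements": "<salute>Dear Sirs,</salute><closer>Ouida</closer>",
    "case mismatch": "<Letter>Dear Sirs,</letter>",
}

results = {label: is_well_formed(text) for label, text in examples.items()}
for label, ok in results.items():
    print(f"{label}: {ok}")
```

Each failing snippet breaks one general rule; none of the checks depends on the vocabulary of a particular language such as TEI.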
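Once the letter is encoded, software can retrieve its parts by element name. The sketch below, again assuming Python's `xml.etree.ElementTree` and using hypothetical variable names, parses a shortened copy of the `<letter>` encoding (without the XML declaration) and pulls out the salutation, the paragraphs, the titles and the closer.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the <letter> encoding; the XML declaration is omitted.
letter_xml = """<letter>
  <salute>Dear Sirs,</salute>
  <body>
    <p>I will accept £10 for the rights to make a translation into Dutch
       of my novel entitled <title>Wanda</title></p>
    <p>Printers will send you entire proofs from London instantly.</p>
  </body>
  <closer>Ouida L. de la Ramée</closer>
</letter>"""

letter = ET.fromstring(letter_xml)

salute = letter.findtext("salute")               # direct child of <letter>
paragraphs = letter.findall("body/p")            # <p> elements inside <body>
titles = [t.text for t in letter.iter("title")]  # <title> at any depth
closer = letter.findtext("closer")

print(salute)           # Dear Sirs,
print(len(paragraphs))  # 2
print(titles)           # ['Wanda']
print(closer)           # Ouida L. de la Ramée
```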
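The `<choice>`, `<unclear>` and `<gap>` examples record editorial decisions that software can act on. As an illustrative sketch (not a TEI tool), assuming Python's `xml.etree.ElementTree`, the code below wraps the slide snippets in a hypothetical `<p>` root, leaves out the TEI namespace as the slides do, and lists the alternative readings and the reported difficulties. The `PAIRS` table and the `choice_readings` helper are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# The TEI snippets from the slides, wrapped in a hypothetical <p> root so the
# fragment is well-formed. The TEI namespace is left out, as on the slides.
fragment = """<p>
  <choice><orig>Impressions</orig><reg>Impressions of Theophrastus Such</reg></choice>
  <choice><abbr>Yrs.</abbr><expan>Yours</expan></choice>
  <unclear reason="illegible">London</unclear>
  Madame Ouida <gap reason="illegible" extent="2 words"/>
</p>"""

# Pairs of elements that <choice> can combine: (source form, editorial form).
PAIRS = [("orig", "reg"), ("abbr", "expan")]


def choice_readings(choice):
    """Return (source_text, editorial_text) for one <choice>, or None."""
    for source_tag, editorial_tag in PAIRS:
        source = choice.find(source_tag)
        editorial = choice.find(editorial_tag)
        if source is not None and editorial is not None:
            return source.text, editorial.text
    return None


root = ET.fromstring(fragment)
readings = [choice_readings(c) for c in root.iter("choice")]
unclear = [(u.get("reason"), u.text) for u in root.iter("unclear")]
gaps = [(g.get("reason"), g.get("extent")) for g in root.iter("gap")]

print(readings)  # [('Impressions', 'Impressions of Theophrastus Such'), ('Yrs.', 'Yours')]
print(unclear)   # [('illegible', 'London')]
print(gaps)      # [('illegible', '2 words')]
```

A display tool could, for instance, show the first reading of each pair in a diplomatic view and the second in a normalised view; that is one possible design, not something the TEI prescribes.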
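Counting in binary and the size of a byte can be checked with a few lines of Python. This is a minimal sketch using only built-ins; `format(n, "b")` gives the binary digits of `n`, and `int(s, 2)` converts a bit string back to a number.

```python
# Counting from 0 to 8 in binary: each extra digit doubles the range of values.
binary_counts = [format(n, "b") for n in range(9)]
print(" ".join(binary_counts))  # 0 1 10 11 100 101 110 111 1000

# A byte is a sequence of 8 bits, so it can take 2**8 = 256 different values.
BITS_PER_BYTE = 8
values_per_byte = 2 ** BITS_PER_BYTE
print(values_per_byte)                     # 256
print(format(values_per_byte - 1, "08b"))  # 11111111 (255, the largest byte value)

# Converting back from a bit string to a number.
print(int("1000", 2))  # 8
```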
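ASCII codes can be inspected with Python's built-in `ord`. The helper `ascii_code` below is a hypothetical name introduced for this sketch; it pads the 7-bit code to a full 8-bit byte and rejects characters outside the 128 ASCII codes, such as the pound sign in Ouida's letter.

```python
def ascii_code(ch: str) -> str:
    """Return the 7-bit ASCII code of ch, padded to a full 8-bit byte."""
    code = ord(ch)
    if code >= 128:
        raise ValueError(f"{ch!r} (code point {code}) is not an ASCII character")
    return format(code, "08b")


print(ascii_code("A"))  # 01000001 (65)
print(ascii_code("a"))  # 01100001 (97)

# The pound sign lies outside ASCII's 128 characters.
try:
    ascii_code("£")
except ValueError as err:
    print(err)  # '£' (code point 163) is not an ASCII character
```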
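The Unicode figures can be reproduced in Python. This sketch assumes Python 3.8 or later (for `bytes.hex` with a separator). It shows where the figure of 1,112,064 comes from, how many bytes UTF-8 needs for a few sample characters chosen for illustration, and how an XML parser resolves numeric character references such as `&#945;`.

```python
import xml.etree.ElementTree as ET

# All Unicode code points (U+0000 to U+10FFFF) minus the 2,048 surrogate code
# points (U+D800 to U+DFFF), which are reserved for UTF-16 and never encode
# characters on their own.
total_code_points = 0x10FFFF + 1
surrogates = 0xDFFF - 0xD800 + 1
encodable = total_code_points - surrogates
print(f"{encodable:,}")  # 1,112,064

# UTF-8 uses between 1 and 4 bytes per character.
samples = ["A", "\u00e9", "\u03b1", "\u20ac", "\U0001F600"]  # A, é, α, €, an emoji
utf8_lengths = {}
for ch in samples:
    data = ch.encode("utf-8")
    utf8_lengths[ch] = len(data)
    print(f"U+{ord(ch):04X}  {len(data)} byte(s)  {data.hex(' ')}")

# A numeric character reference (decimal or hexadecimal) stands for a Unicode
# character; the parser replaces it with the character itself.
p = ET.fromstring("<p>&#945; &#x3B1;</p>")
print(p.text == "\u03b1 \u03b1")  # True
```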
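Characters such as `<` and `&` cannot appear literally in XML text, because the parser reads them as the start of markup or of an entity reference. The sketch below, assuming Python's standard `xml.etree.ElementTree` and `xml.sax.saxutils.escape`, shows that two of the example sentences break well-formedness when inserted as-is, escapes them with entity references, and parses the result back. The `parses` helper is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def parses(xml_text: str) -> bool:
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


samples = [
    "I <3 Digital Media Technology",
    "Calvin & Hobbes",
    'If the premises "A > B" and "A" are true, we can conclude B',
]

# Inserted as-is, "<" and "&" make the element ill-formed.
print([parses(f"<p>{s}</p>") for s in samples])  # [False, False, True]

# escape() replaces &, < and > with entity references; the extra mapping
# also turns double quotes into &quot; (needed inside quoted attribute values).
escaped = [escape(s, {'"': "&quot;"}) for s in samples]
for text in escaped:
    print(text)

# The parser turns the entity references back into the original characters.
round_trip = [ET.fromstring(f"<p>{e}</p>").text for e in escaped]
print(round_trip == samples)  # True

p = ET.fromstring("<p>This sentence is in the &lt;p&gt; element.</p>")
print(p.text)  # This sentence is in the <p> element.
```

`>` and `"` are allowed in ordinary text content, so the third sentence is well-formed even without escaping; escaping them is still a common precaution.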
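Comments are ignored by applications that process the document's content. As a small sketch with Python's `xml.etree.ElementTree`, whose default parser discards comments, the snippet below parses a hypothetical fragment containing a comment. It also shows that a comment must open with two ASCII hyphens: `<!-` followed by an en dash is rejected as a syntax error.

```python
import xml.etree.ElementTree as ET

doc = """<text>
  <!-- The next section contains the transcription -->
  <body><p>Dear Sirs,</p></body>
</text>"""

root = ET.fromstring(doc)
# The default parser skips comments, so <body> is the only child of <text>.
child_tags = [child.tag for child in root]
print(child_tags)  # ['body']

# "<!-" followed by an en dash (U+2013) is not a valid comment opener.
try:
    ET.fromstring("<text><!-\u2013 note --></text>")
    malformed_ok = True
except ET.ParseError:
    malformed_ok = False
print(malformed_ok)  # False
```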