UNIVERZITA KARLOVA V PRAZE, FILOZOFICKÁ FAKULTA
ÚSTAV ANGLISTIKY A AMERIKANISTIKY
FIELD OF STUDY: ANGLISTIKA-AMERIKANISTIKA (ENGLISH AND AMERICAN STUDIES)
ONDŘEJ TICHÝ
"DIGITIZATION OF OLD AND MIDDLE ENGLISH DICTIONARIES"
SUPERVISOR: Doc. PhDr. JAN ČERMÁK, CSc.
2007
I declare that I have written this diploma thesis independently, using only the sources and
literature cited.
Rád bych poděkoval doc. Janu Čermákovi nejen za jeho přínos této práci a projektu
digitalizace staroanglického slovníku, ale i za vlídnost a ochotu, které provázely celou naši
spolupráci.
I would like to thank doc. Jan Čermák not only for his contribution to this paper and to the
digitization project of the Anglo-Saxon Dictionary, but also for the kindness and helpfulness he
has shown throughout our collaboration.
ABSTRACT
The aim of the paper is both to outline the methodology of digitizing Old and Middle English
dictionaries and to describe its successful implementation. It is argued that the digitization of
old dictionaries is generally desirable, because it increases accessibility of valuable resources, which
may be the only way of presenting their data to a wider audience. The paper first briefly and
comprehensively surveys the field of Old & Middle English lexicographical resources, comparing in
greater detail the most promising candidates for digitization. Possible and desirable features of a
digitized dictionary are then explored and on that basis An Anglo-Saxon Dictionary of J. Bosworth &
T. N. Toller is chosen for the digitization project itself. All the phases of the digitization are then
described: scanning, character recognition, hand-corrections, data preparation and application
development. The current state of the Bosworth-Toller digitization project is explained and presented,
while two major suggestions are made for its future development: the re-tagging of its data and the
development of a morphological analyser of Old English.
ABSTRAKT
Tato diplomová práce si klade za cíl jednak navrhnout postup při elektronizaci slovníků staré a
střední angličtiny, jednak popsat realizaci konkrétního projektu elektronizace. Autor práce se
domnívá, že elektronizace starých slovníků je přínosná, jelikož může být jediným způsobem, jak
zpřístupnit informační bohatství těchto zdrojů širšímu publiku. Práce se nejprve pokouší o
vyčerpávající soupis lexikografických zdrojů staré a střední angličtiny, z nichž je poté detailněji
porovnáno několik nejvhodnějších kandidátů k elektronizaci. Dále pak práce navrhuje, jaké vlastnosti
by elektronický slovník měl mít. Na tomto základě je jako nejzpůsobilejší slovník k elektronizaci
vybrán Staroanglický slovník (An Anglo-Saxon Dictionary) autorů J. Bosworthe a T. N. Tollera. Následně
jsou popsány všechny kroky vedoucí k převodu tohoto slovníku do elektronické podoby: skenování,
rozpoznání znaků, ruční opravy, příprava dat a vývoj slovníkové aplikace. Práce také představuje
současný stav projektu elektronizace staroanglického slovníku a navrhuje dva hlavní směry budoucího
rozvoje: přeznačkování dat a vytvoření morfologického analyzátoru staré angličtiny.
CONTENTS
1. Introduction
2. A Survey of the Field
  2.1. Existing Paper Dictionaries of OE&ME
    2.1.1. Old English
    2.1.2. Middle English
  2.2. Existing Electronic Dictionaries of OE&ME
    2.2.1. Digitized versions of paper dictionaries
    2.2.2. Original electronic dictionaries
    2.2.3. Other electronic projects
3. Comparison of selected dictionaries
  3.1. Macrostructure
    3.1.1. Wordlist
    3.1.2. Headwords
    3.1.3. Ordering
  3.2. Microstructure
    3.2.1. Structure
    3.2.2. Quality & Content
  3.3. Other features
    3.3.1. Typographical Features and Additional Materials
    3.3.2. Electronic Dictionaries
4. An Electronic Dictionary of Old and Middle English
  4.1. Users & Needs
    4.1.1. Students and Beginners
    4.1.2. Advanced Users and Professional Scholars
    4.1.3. Readers and Translators
    4.1.4. Linguists
  4.2. Type & Character of the Data
  4.3. Features of the Application
  4.4. User Customization
5. Sources for Digitization
  5.1. Existing Paper Dictionaries
  5.2. Supplementing from Other Sources
6. Process of Digitization of An Anglo-Saxon Dictionary by J. Bosworth & T. N. Toller
  6.1. Scanning
  6.2. Automatic Character Recognition
    6.2.1. OCR - Learning & Encoding
    6.2.2. Data Format
    6.2.3. Subsequent Automatic Corrections
  6.3. Hand Corrections
  6.4. Application Development
7. Further Development
  7.1. Tagging the Data
    7.1.1. Microstructural Analysis & Definition of Tags
    7.1.2. Tagging Process
  7.2. Morphological Analyzer
    7.2.1. Stemmer
    7.2.2. Lemmatiser
    7.2.3. Morphological Generator
8. Conclusion
9. Bibliography
10. Appendices
  10.1. Samples of dictionary entries
  10.2. Samples of correctors' materials
  10.3. Bosworth-Toller on a CD-ROM
11. Czech Summary / Shrnutí v českém jazyce
1. INTRODUCTION
"In going from library to library I took with me the Bosworth-Toller Anglo-Saxon Dictionary
and Supplement, an act which for sheer physical inconvenience one is not likely soon to forget."
(Meritt 1954, vii) In his Fact and Lore About Old English Words H. D. Meritt pertinently
summarized one of the most obvious and still quite relevant reasons for digitizing large
dictionaries. In 1954, when the book was published, most of the other reasons we shall now list
were largely irrelevant because they were impossible to implement. However, although possible
and probably desirable today, they do not seem to be widely acknowledged: anyone wishing
to carry out work similar to Meritt's (searching through manuscript glosses for new Old English
words) would still have hardly a better choice than to endure the physical
inconvenience of carrying around the voluminous Bosworth-Toller dictionary.
This may seem surprising to researchers of present-day English, as the computerisation of
the lexical study of Modern English has been fast, and electronic dictionaries
of English, let alone corpora, are now a standard tool of linguists, literary scholars and the general
public. Obviously, the demand for Old or Middle English dictionaries is much smaller than for
their Modern English counterparts, and the supply is accordingly small. To be precise, there is no
finished general electronic dictionary of Old English, and the first general electronic Middle
English dictionary was finished and digitized, after decades of work, only a few years ago, in 2001.
Despite its great extent and the high quality of its treatment, its online format may well be
unsuitable for many users.
A great deal of effort has been, and will be, spent on the Dictionary of Old English, which is
currently being developed directly in an electronic format at the University of Toronto. This,
however, much like the Michigan Middle English Dictionary, is a long-term project, with "only"
letters A-F published so far.
Both these dictionaries are original works of lexicography and as such, they have required a
tremendous amount of work and effort. Let us now turn to printed lexicographical resources that
have so far proven adequate in their extent and quality. Not only are most of these dictionaries
heavy and difficult to access (Bosworth-Toller costs about £200), they also have all the limitations of
traditional dictionaries: it is difficult to browse their headwords and follow their references &
links; it is very time-consuming to create customized word-lists or glossaries from them and quite
impossible to search all the elements of their entries (i.e. in full-text). Thus all their functions -
from those of translating dictionaries to those of research tools - are very limited, rendering them
unfit both for the general public or students interested in medieval texts (because of their limited
accessibility) and for researchers (due to their limited functionality). Their functionality and
accessibility may be increased with much less effort than would be necessary to develop brand new
dictionaries.
Therefore, it is the aim of this paper (1) to propose what and how to digitize; and (2) to describe
one such completed digitization project through all its phases. We will thus first try to survey the
existing resources (Chapter 2), compare the more prominent of the existing paper dictionaries
and the existing electronic resources (Chapter 3), discuss the possibilities of electronic
dictionaries (Chapter 4) and propose some candidates for digitization (Chapter 5). In Chapter 6,
we will try to describe one particular project of digitization and then, in Chapter 7, we will
propose some possibilities for its extension and further development.
2. A SURVEY OF THE FIELD
The Old and Middle English dictionaries are essentially translating tools developed to
facilitate parsing and translation of old texts. The study of old texts, however, has always been
regarded as beneficial not only because of the literary value of the texts themselves, but also
because of the inherent value of the old languages for philological inquiry and historical
linguistics. The field can thus be divided into several basic types that will be used further on to
classify the particular dictionaries, assuming, nevertheless, that their character is nearly always to a
smaller or greater degree translating and philological.
• Comprehensive dictionaries try to give as much detail about all lexical items of the
respective period as technically possible. Such works are demanding both to compile
and to publish; there are therefore only a few such dictionaries in existence and the latest
work is usually regarded as standard.
• Concise and student dictionaries are also general in nature, but they strive to cover
neither the lexical depth nor the breadth of the period completely, but rather aim for
clarity of representation, ease of use and accessibility.
• Glossaries usually cover only the lexical material of a certain number of texts, while
they often provide information specific to the forms occurring in the texts rather than
more general information about lexical items of the period. Their applicability outside
their selected texts depends on their size and quality.
• Etymological dictionaries are concerned with tracing origins of words.
• Specialized dictionaries cover particular areas of vocabulary based on topics (such as
Medical or Botanical dictionaries) or authorship (e.g. Chaucerian glossaries). They are
often published as glossaries.
• Thesauri are dictionaries with thematic rather than alphabetical structure aiming at
semantically paradigmatic connections between lexical units.
• Reverse or "production" dictionaries have Old or Middle English as their target
language to help their users produce Old or Middle English texts.
• Wordlists are either lists of dictionary headwords, or compilations of lexical units
serving usually as a basis for future dictionaries.
The difficulty with compiling a "complete" list of Old and Middle English dictionaries lies
mainly in the selection. As the main concern of the present paper is digitization, the following
criteria have been established:
The work has to be sufficiently general - glossaries of single works, authors or narrow
semantic fields are not included. Although some of the works included are in their title and
subject partly etymological, only those restricted to Old or Middle English etymologies are
included. Modern dictionaries of etymology or general dictionaries listing etymologies as part of
their entries are not included. Finally, the work has to be of a certain minimal extent and depth -
works with fewer than ca. three thousand entries¹ or mere lists of equivalents are not included, lest
the list be "crammed" with a large number of glossaries of the numerous Anglo-Saxon and
Middle English readers.²
An attempt at a short characterization of each item has been made with prospective users in
mind, so that the list may be of some use beyond the scope of this work. Therefore, features like
various alphabetizations are not systematically covered, while the type of information included
and usefulness of their structure generally are.
2.1. Existing Paper Dictionaries of OE&ME
The list of the paper dictionaries is divided into those of Old and Middle English,
respectively. It should be noted that Mätzner's Altenglische Sprachproben is classified under Middle
English (Altenglische should perhaps be understood as Early rather than Old English) and Wright's
Anglo-Saxon and Old English Vocabularies appears in both sections, as the term "Old English" was used
for Middle English in that particular book.
¹ The total number of entries was estimated from the number of entries on two randomly selected pages,
divided by two and multiplied by the total number of pages in the dictionary.
² Most of the Anglo-Saxon readers appeared and reappeared in three consecutive waves following the
publication of Sweet's Reader: the first in the late nineteenth century, the second approximately one
hundred years later, and the third just now culminating thanks to the massive digitization of
19th-century texts by Google's and Microsoft's book-search services.
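The sampling estimate described in footnote 1 amounts to multiplying the mean number of entries per
sampled page by the total number of pages. A minimal Python sketch of that arithmetic follows; the
function name and the sample figures are illustrative assumptions, not values taken from any of the
dictionaries surveyed here.

```python
def estimate_total_entries(sampled_page_counts, total_pages):
    """Estimate a dictionary's entry count from a few sampled pages.

    sampled_page_counts: entries counted on each randomly selected page
    total_pages: number of pages in the dictionary
    """
    if not sampled_page_counts:
        raise ValueError("at least one sampled page is required")
    # Mean entries per page (with two pages: their sum divided by two)
    mean_per_page = sum(sampled_page_counts) / len(sampled_page_counts)
    return round(mean_per_page * total_pages)


# Hypothetical figures: two sampled pages with 48 and 52 entries, 106 pages in total
print(estimate_total_entries([48, 52], 106))
```

With only two sampled pages the result is a rough order-of-magnitude figure, which is all the
three-thousand-entry threshold requires.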
Some titles that have enjoyed high popularity have gone through numerous re-editions and
reprints, involving new editors and sometimes even a change of title. An attempt has been made
to list all the titles, important reprints & re-editions with the names of all participating authors
and editors. The main entry of the list is usually based upon the best known title and the first
author, while the subsequent titles and editors are noted in the commentary. In some cases,
however, this solution was deemed confusing and the entry was split.
It was impossible to get hold of all the titles, so several of the descriptions are
based on secondary sources only. Details that may prove helpful, like ISBN, years of reprints or
re-editions, and numbers of pages (either of the whole work or its respective part), were also
included where available. Links to electronic versions of the paper dictionaries consisting of
scanned images or OCRed texts only are provided in this section; if a "full" electronic version
with a proofed text and search capabilities exists, it is referred to here, but treated fully in chapter
2.2.1.
2.1.1. Old English
A list compiled by Reinhard F. Hahn et al. on the Lowlands-L web site has been of particular use
for this section.
• Baker, Peter Stuart. "Glossary" Introduction to Old English. Blackwell Publishing,
2003 (ISBN 0631234535; 89 pages)
o This modern-looking glossary is created with ease of use in mind, focusing on students
with no previous linguistic training. It offers extensive cross-references and a list of all
forms occurring in the texts covered by the book, while all the forms are provided with
grammatical information and back references. The typography is excellent and the
electronic version of the book including the glossary, named Old English Aerobics, is both
helpful and freely available. See chapter 2.2.2 for more information about the electronic
version. Other readers with glossaries include works by Bright, Corson, Diamond, Leo,
Mitchell, Sweet and Wyatt.
• Barney, S. A. et al. Word-Hoard: An introduction to Old English vocabulary. New
Haven (USA): Yale University Press, 1985 (ISBN 0-300-03506-3; 2nd edition 1985; 86
pages)
o Inspired by Madden & Magoun (see below), the author lists related lexical groups by
frequency, trying to cover 90% of the Old English poetic vocabulary in ca 2000 words
and provide their basic etymologies. Though aimed at beginners, the level varies with
entries including cognates and relevant phonological changes. Unfortunately, typos are
frequent in the 1st ed. (Ward 1978, 329-330) and problematic items are presented
without any further comment (Hill 1978, 786), probably because the work is aimed at
beginners.
• Bessinger, Jess B. A Short Dictionary of Anglo-Saxon Poetry in a Normalized Early
West-Saxon Orthography. Toronto: University of Toronto Press, 1960 (ISBN
0802011217; reprint 1961; 106 pages)
o A dictionary of Old English verse in ca 5000 entries with definitions, basic grammatical
information and frequency data based on Madden & Magoun (see below). There,
however, the frequencies are based on word-groups, whereas here they are
indiscriminately and thus misleadingly applied to all words belonging to a particular
group. The "presumed" words that in fact do not occur anywhere in the corpus of Old
English verse are not distinguished from those that do (Campbell 1962, 436-437).
• Borden, Arthur R. A Comprehensive Old-English Dictionary. Washington, D.C.:
University Press of America, 1982. (ISBN 0819122548; published also by Rowman &
Littlefield Publishers; 1606 pages)
o A work uncharacteristic for its period, Borden's dictionary is rather reminiscent of the
early Anglo-Saxonists' efforts. Borden's goal was to prepare a dictionary fit for students,
but more extensive in its scope than Hall's and not as outdated as Bosworth's
dictionary. His approach, however, was to compile all the words found in these
well-known dictionaries or in the more extensive glossaries with the addition of some "new"
words from both prose and poetry. Although the author admits that his goal was to a
large extent fulfilled by the publication of Meritt's edition of Hall in 1960, he held onto
his project and after ca. 30 years of work published a work resembling a general glossary
of Old English. It is a glossary rather than a dictionary, because it provides only basic
grammatical information and brief, but numerous equivalents. This seems to suggest
that Borden had translators rather than philologists in mind, but as the definitions are
not analytical or explanatory and at the same time no references or citations are given, it
is very difficult for a translator to choose from the list of possible equivalents, the only
guide being the context of the translated text. This might have been sufficient in a
shorter text, where the glossary can be specific, but in a general or even comprehensive
(as the title claims) work, it is very unsatisfactory. The glossarial impression is also
strengthened by the author's approach to prefixes and compounds, which he always lists
under their first elements without giving any hint of their status. Only the prefix status
of "a-" is hinted at by the dash, but the real reason is that the typography of the book does
not distinguish between "ae" and "ash" (æ), and this was felt as problematic in "a-"
prefixed words beginning with "e" (and surprisingly not elsewhere). This strictly
alphabetical ordering is very convenient for students, but it is marred by only sparse
cross-references between variant spellings or dialectal forms. These forms are also hardly
ever noted in the entries. The selection of headwords is at the same time very
unpredictable - the author professes both a preference for early West-Saxon forms at
the beginning of the project and his later abandonment of the principle. The
typography generally decreases the usefulness of the project. Only one typeface is used
and as the indentation of the entries is rather illogical, the effect is not of a well-arranged
space. The dictionary is a product of immense effort and a well-meant resolve that
resulted in a great amount of material assembled into a form of questionable
usefulness.³
• Bosworth, Joseph. An Anglo-Saxon Dictionary, based on the manuscript collections
of the late Joseph Bosworth. Eds. T. Northcote Toller and Alistair Campbell. Oxford:
Oxford University Press, 1838-1972 (ISBN 0198631014; main volume as "A Dictionary
of the Anglo-Saxon Language", London: Longman, 1838; edited by Toller, 1882-1898;
Toller's Supplement, 1921; reprinted 1966; Campbell's Enlarged Addenda and
Corrigenda to the Supplement, 1972; 2069 pages)
o <http://books.google.com/books?vid=OCLC01044234&id=[illegible]> (1st ed.)
<http://lexicon.ff.cuni.cz/texts/oe_bosworthtoller_about.html> (2nd ed. & supplement)
o The sheer span of its re-publication testifies to the longstanding primacy of the
dictionary, whose first appearance nearly 180 years ago was heralded as destined "to form an era
in this study" ("Anglo-Saxon Literature" 92). Indeed, it was the first Old English
dictionary to render its definitions mainly in English (rather than in Latin, though
Somner's dictionary used both languages in many entries) and it has remained, through
its re-edition and supplement by T. N. Toller and subsequent addenda and corrigenda by
Alistair Campbell, the most complete work of Old English lexicography until today.
However, the original Dictionary of the Anglo-Saxon Language and the first two parts of its
re-edition that were completed by Bosworth himself did not reflect most of the important
developments made in Anglo-Saxon studies during the 19th century, especially in
phonology (Garnett 1884, 359-361). Some of these deficiencies were subsequently
remedied by Toller: the structure planned by Bosworth was altered and many words
from prose were added by Toller himself, while he relied mostly on Grein to supply
poetical expressions and the respective citations. Some outdated features were retained,
however, for the sake of consistency with the first two parts (Garnett 1898, 323-6). The
main volume as such has not changed much since then, but citations from modern
editions were added by Campbell in 1972 together with all words appearing in Clark
Hall's 3rd edition and about 750 completely new words (Samuels 1974, 111). In its
current state, after the 1972 revision, this philological and translating dictionary offers
detailed definitions, plentiful citations, basic grammar information, cognates and
secondary references, but some of its assumptions may be outdated. Its structure is far
from consistent and searching through its main volume and two supplements may
prove rather strenuous. See Chapter 3 for more detailed information.
³ The lengthy description of Borden's dictionary, a comparatively unimportant work, may be justified
by the fact that the book is very rare and no previous descriptions or reviews are known to exist.
• Bosworth, Joseph. A Compendious Anglo-Saxon and English Dictionary. John
Russell Smith, 1848 (reprinted 1852 and 1860; 280 pages)
o An abridgement of the first edition of An Anglo-Saxon Dictionary, it suffers from the
same deficiencies as the original unabridged dictionary from which it inherited its
structure. It has been superseded in its function of a concise dictionary by later works
(such as Hall's).
• Bright, James W. "Glossary" in Bright's Anglo-Saxon Reader. Ed. Frederic G.
Cassidy and Richard N. Ringler. 3rd edition. Harcourt, 1891-1972 (ISBN 0030847133;
reprinted 1894, 1912, 1917, 1935, 1961, 1972 as "Old English Grammar and Reader"; ca
150 pages)
o <http://www.archive.org/details/anglosaxonreader00briguoft>
o Probably the second best-known of the Old English readers, Bright's Reader came out
second chronologically and has thus profited somewhat from the experience of Sweet's
Reader. Bright's Glossary strives especially for ease of use. The various forms all appear
under their respective lemmata and the cross-referencing is unfortunately not very
extensive. Each entry lists the variant spellings and all the forms occurring in the Reader
with references and related grammatical information. Sometimes a specific meaning is
provided for a particular form on top of the general definition (Gummere 1892, 149-151).
In the latest edition, the inflectional variants & spellings together with the
grammatical information for each form have been omitted and replaced by more
thorough cross-referencing. Including both cross-references and variant forms might
perhaps have increased the volume of the book, but could have proved quite helpful,
especially to beginners. Other readers with glossaries include works by Baker,
Corson, Diamond, Leo, Mitchell, Sweet and Wyatt. More information about an
electronic version of the glossary may be found in chapter 2.2.1.
• Bouterwek, Karl Wilhelm. Ein Angelsächsisches Glossar: Caedmon's des
Angelsachsen biblische Dichtungen. 2 vols. Julius Bädeker, 1850, 1854 (ISBN
3253019985; reprinted by Sändig Reprint Verlag 1968; 393 pages)
o <http://www.archive.org/details/caedmonsdesangel01caeduoft>
o An Old English-Latin glossary based on Caedmon's poetry, it is, thanks to its
thoroughness and size, of a more universal use - especially in Old English poetry. The
second volume containing the glossary gives Latin equivalents with occasional
explanations in German, basic grammatical information, variant spellings and short
quotations with references to all forms. The vowel length is marked by circumflex and
the "length" of diphthongs is marked over the second element. A special feature of the
glossary is a reverse Latin-Old English list and an index.
• Corson, Hiram. "Glossary" in Hand-book of Anglo-Saxon and Early English. New
York: Holt & Williams, 1871, pp. 329-493
o <http://www.archive.org/details/handbookofanglos00corsuoft>
o The glossary of the first reader to cover both the Old (Anglo-Saxon) and Middle English
(Early English) periods brings the vocabulary of both periods under one list, but marks
Old English as "pure Anglo-Saxon words" (330) to distinguish them from later
innovations. Irregular or hard-to-derive forms are only occasionally cross-referenced,
but certain forms of verbs are given under the main entry even when they do not occur
in the covered texts; references to the texts seem somewhat random. "Eth" (ð), "thorn" (þ)
and "yogh" (ȝ) come surprisingly at the end of the alphabet, while "ash" comes
(similarly to Bosworth) between "ad-" and "af-". Forms prefixed by "ge-" appear under
their stems only. Compounds are occasionally explained. Equivalents and basic grammar
are given with sporadic explanation (Fisher et al. 1878, 552). Other readers with
glossaries include works by Baker, Bright, Diamond, Leo, Mitchell, Sweet and Wyatt.
•
Diamond, Robert E. "Glossary" Old English Grammar and Reader. Wayne State
University Press, 1970, (ISBN: 0814315100; pp. 207-300)
o An approachable glossary with normalized spelling and plentiful cross-referencing that
expects almost no previous knowledge of Old English grammar or phonology.
Modern English equivalents and basic grammatical information are provided. The fact
that all texts in the reader are provided with mirror-translations may, however, slightly reduce the
usefulness of the glossary (Wilson 1970, 62-3). Other readers with glossaries
include works by Baker, Bright, Corson, Leo, Mitchell, Sweet and Wyatt.
•
Ettmüller, Ludwig. Vorda Vealhstôd Engla and Seaxna. Lexicon Anglosaxonicum Ex
Poetarum Scriptorumque Prosaicorum Operibus Nec Non Lexicis Anglosaxonicis
Collectum, Cum Synopsi Grammatica. Quedlinburg and Leipzig: Gottfried Basse /
London: Williams & Norgate, 1851 (reprinted Rodopi, 1968; 767 pages)
o Old English poetic and prosaic Lexicon with equivalents and comments in Latin.
•
Grein, C. W. M. Sprachschatz der angelsächsischen Dichter. Cassel & Göttingen:
Georg H. Wigand, 1861-4 (ISBN 3-8253-2324-2; reprint by Carl Winter Verlag 1974;
1342 pages)
o < http://books.google.com/books?vid=OCLCLW78151 >
o The most detailed description of Old English poetic vocabulary, Grein's work may be
antiquated today (similarly to the 1st edition of Bosworth), but in its day it was
plundered by many Old English lexicographers. Not only did Grein's work supply
many words and citations to Toller's edition of Bosworth and to Hall's dictionary; its
abridgement was published by Groschopp, and this abridgement was in turn enlarged,
translated into English and published by Harrison & Baskervill. Its value for
other lexicographers principally stems (apart from its volume) from its extensive
citations and references. To these Grein added references to other modern works, Latin
equivalents, extensive comments in German, basic grammar, spelling variants and,
occasionally, old or modern cognates. The main aspect of the work being philological,
the compounds are marked, and words are listed according to their real occurrence, so
that, for example, words prefixed by "ge-" are listed with it as well as without - if they
so occur. Interestingly, "yoghs" are transcribed with "v" and ashes by "ä" (similarly to
Leo).
•
Groschopp, Friedrich. Kleines Angelsächsisches Wörterbuch von C. W. M. Grein.
Nach Grein's Sprachschatz der Angelsächsischen Dichter. Cassel & Göttingen: Georg
H. Wigand, 1883
o This abridgement of Grein's work leaves out all citations and comments, the result being
little more than a wordlist with equivalents in Latin or German.
•
Griffiths, Bill. A User-friendly Dictionary of Old English. Loughborough: Heart of
Albion Press, 1989 (ISBN 978-1872883854; 3rd ed. 1993, repr. 1995, 97, 4th ed. 2002,
repr. 2004, 5th ed. 2005; 116 pages)
o This thin volume is hardly more than a wordlist giving mostly one-word Modern
English equivalents with a very simplified one-page grammar introduction. The
grammatical information provided in its entries is also reduced to a minimum.
Paradoxically, two classes of weak verbs are distinguished, but strong classes are not - those strong verbs that appear in the wordlist are listed as inflected forms in a specific
tense (thus "wesan" is not listed, but "wæs" is). An interesting feature is the ordering - the 3,500 most common forms constituting the dictionary are listed according to their
consonants, disregarding the vowels. The list is divided by headings giving the particular
consonants, with the positions of vowels indicated by asterisks. "wæs" can thus be found
under the "W*S" heading, "beon" under "B*N". The author defends his system on the
grounds of the unpredictability of Old English vowels, but the real practicality of this
system may seem doubtful. Beginners, for whom the book is intended, may feel rather
daunted by this unusual system.
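The consonant-based ordering described for Griffiths's dictionary can be illustrated with a minimal sketch. The vowel inventory, the collapsing of adjacent vowels into a single asterisk and the function names are assumptions made for illustration only; they are inferred from the two examples "W*S" and "B*N" and are not taken from the dictionary itself.

```python
import re
from collections import defaultdict

# Assumed vowel inventory (not taken from Griffiths): the plain Old English
# vowel letters, without length marks.
VOWELS = "aæeiouy"


def skeleton(word: str) -> str:
    """Return the consonant skeleton of a form, e.g. 'wæs' -> 'W*S'.

    Each run of vowels becomes one asterisk; this is an assumption that is
    consistent with 'beon' being filed under 'B*N'.
    """
    return re.sub(f"[{VOWELS}]+", "*", word.lower()).upper()


def group_by_skeleton(words):
    """Group word forms under their skeleton headings, sorted by heading."""
    groups = defaultdict(list)
    for word in words:
        groups[skeleton(word)].append(word)
    return dict(sorted(groups.items()))


if __name__ == "__main__":
    # Sample forms chosen only to show forms sharing one heading.
    for heading, forms in group_by_skeleton(["wæs", "beon", "wis", "ban"]).items():
        print(heading, forms)
```

A reader who knows only the consonants of a form can then look up the heading directly, which is the practical argument the author makes for the system.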
•
Hall, John Richard Clark. A Concise Anglo-Saxon Dictionary. Ed. Herbert D. Meritt.
Toronto: University of Toronto Press, 1894-1984 (ISBN 0802065481; 2nd ed. revised
1916, 3rd ed. revised & enlarged 1931, 4th ed. supplement by Meritt 1960, reprinted
1984; 432 pages)
o < http://lexicon.ff.cuni.cz/texts/oe_clarkhall_about.html >
o Probably the most popular dictionary with students of Old English, the Concise
Dictionary tries to balance practical usability and accessibility with exhaustiveness,
standing thus halfway between the exhaustive but cumbersome Bosworth-Toller and the
smaller Sweet's Student's Dictionary or the antiquated Bosworth's Compendious Dictionary.
All the effort has been concentrated on making a translating dictionary for students - all
forms of strong verbs, irregular weak verbs and some variant spellings are cross-referenced,
variant spellings are listed with the main form, and basic grammatical
information, a short definition and references to the sources (not quotations) are
provided. The division of compounds is unmarked (though prefixes are); neither grammar nor
phonology is explained in any depth, though useful references are made to secondary
works. Etymology is not given except for an occasional etymological equivalent or cognate
(Garnett 1898, 326-7), but the second edition introduced references to NED⁴ entries
(Knott 1917, 64). Hall was the first to introduce macrons over the first element to
mark "long" diphthongs. Also, the dictionary follows a strict alphabetical order (though
words prefixed by "ge-" appear under their respective stems, if so attested), placing
"ash" as "ae" and recognizing "eth" (used indiscriminately for both original eths and
thorns) as an individual letter after "t". Problematic from the user's point of view is
Hall's spelling normalization - all forms are normalized, but only if the resulting form is
attested (Chase 1895, 50-2). The third and fourth editions revised the definitions and
added new words, notably the 12th century words in the 3rd edition (Magoun 1932, 287).
Unfortunately, changes & additions in the fourth edition are not incorporated into the
main body of the dictionary (Campbell 1962, 436). See Chapter 3 for more information.
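The alphabetical order described for Hall's dictionary ("ash" filed as "ae", a single "eth" letter after "t" covering both eths and thorns) can be approximated by a sort key. This is only a sketch: the letter order string, the stripping of length marks and the function name are assumptions, and the rule filing "ge-" forms under attested stems is deliberately left out because it needs the dictionary's own data.

```python
import unicodedata

# Assumed letter order: the modern alphabet with "ð" inserted after "t".
ORDER = "abcdefghijklmnopqrstðuvwxyz"
RANK = {letter: i for i, letter in enumerate(ORDER)}


def hall_sort_key(word: str):
    """Sort key approximating the ordering described for Hall's dictionary."""
    # Decompose accented letters and drop the combining marks, so that
    # macrons (length marks) do not affect the ordering.
    decomposed = unicodedata.normalize("NFD", word)
    plain = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Thorn is treated as eth; ash is filed as the sequence "ae".
    plain = plain.lower().replace("þ", "ð").replace("æ", "ae")
    # Characters outside the assumed alphabet are ignored.
    return [RANK[c] for c in plain if c in RANK]


if __name__ == "__main__":
    # Sample forms, used only to exercise the ordering rules.
    sample = ["ufan", "ðing", "tūn", "æfter", "āfor", "ād", "þæt"]
    print(sorted(sample, key=hall_sort_key))
```

Under these assumptions "æfter" sorts between "ād" and "āfor", and forms in "þ" or "ð" follow all forms in "t" and precede those in "u".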
•
Harrison, James Albert and William Malone Baskervill. A Handy Anglo-Saxon
Dictionary: Based on Groschopp's Grein. New York & Chicago: A. S. Barnes & Co.,
1885 (317 pages)
o This English translation of Groschopp's Grein added cognates in a slightly random
fashion to supply some etymological information, because Grein's original cognates
were deleted by Groschopp. Modern English derivatives were marked by special type to
facilitate the use of the dictionary for students. Unfortunately, the wordlist itself was not
seriously revised, so that words were preserved that had by then been proved not to
exist (like fictitious infinitives) or to exist with a different spelling; the marking of long
diphthongs was corrected only inconsistently (Bright 1885, 493-5).
•
Holthausen, Ferdinand. Altenglisches etymologisches Wörterbuch. Heidelberg: Carl
Winter Universitätsbuchhandlung, 1932-4 (ISBN 3825305082; 2nd ed. 1968; reprinted
1974; 428 pages)
o Wordlist giving Old Germanic cognates (esp. High German), etymological cognates in
Modern English and German translations & comments of normalised Old English
words collected mainly from Hall's dictionary. No further etymologies are given, but
references are provided esp. to Walde-Pokorny⁵. Proper names are usually included, but
no detailed explanations are given; occasionally, modern equivalents are offered
(Magoun 1933, 94-6). The lexicon is mainly philological and features no extensive
cross-references, compounds are listed under the second element only (C.L.W. 1934, 242-4)
and only sparse grammatical information is provided. This approach saves space but
may make locating specific items rather difficult.
⁴ The New English Dictionary.
⁵ Walde, A. and Pokorny, J. Vergleichendes Wörterbuch der indogermanischen Sprachen. Berlin et Leipzig, De Gruyter,
1927, 1930
•
Jember, Gregory K., John C. Carrell, Robert P. Lundquist, Barbara M. Olds,
Raymond P. Tripp, Jr. English-Old English Old English-English Dictionary. Boulder
Col.: Westview Press, 1975-1984 (ISBN 0891580069; 4 editions; 178 pages)
o Jember et al. aimed to provide a production dictionary for students of Old English.
Therefore, they furnished their dictionary with both English and Old English wordlists
(each with its equivalents). The Old English words come with simplified spelling,
compounds translated by their parts and lists of affixes with their functions. In this way,
the authors try to provoke students' creativity in Old English. Also, they provide
examples of words not appearing in the existing corpus, which, together with the fact
that they decided to ignore vowel length and explain the basic grammar in a dangerously
generalised way (Mitchell 1982, 48), urges cautious usage of the dictionary. Other
dictionaries with Old English as their target language are those of Skeat and Pollington.
•
Leo, Heinrich. Angelsächsisches Glossar. Ed. Walther Biszegger. Halle: Verlag der
Buchhandlung des Waisenhauses, 1872-7 (732 pages)
o An enlarged wordlist built upon a system similar to that of the earlier Sprachproben's Erklärenden
(see below) with added references and supplemented by an alphabetical index (by
Biszegger) to facilitate searching.
•
Leo, Heinrich. Altsächsische und Angelsächsische Sprachproben Herausgegeben und
mit einem erklärenden Verzeichniss der Angelsächsischen Wörter versehen. Halle:
Eduard Anton, 1838, pp. 95-274
o < hrtp:! /books.googlc.col1l!books~vid=OCLCO)9959% >
o Leo's Erklärenden is a glossary to his Old English and Old Saxon reader, but lists only
Old English vocabulary with German translations, comments and basic grammar. The
wordlist is arranged according to Grimm's vowel system, which makes it an interesting
philological resource, but a very difficult dictionary to search. The diacritic system
used is similar to Grein's. Other readers with glossaries include works by Baker, Bright, Corson,
Diamond, Mitchell, Sweet and Wyatt.
•
Lehnert, Martin. Poetry and Prose of the Anglo-Saxons: Dictionary. Berlin: VEB
Deutscher Verlag der Wissenschaften; London: Bailey & Swinfen, 1956 (2nd ed. 1960,
repr. 1969; 250 pages)
o Though a separate title, the Dictionary is in fact a glossary to the first volume of Lehnert's
reader. It provides detailed grammatical information and cross-references, but both
inconsistently. Definitions are by equivalent or analytical description, sometimes the
usage is exemplified, but neither quotations nor references to the source texts of the
first volume are provided. Occasionally, notes are incorporated to give additional
comments on the concepts described and an unusual number of etymons is supplied
with most entries (Woolf 1956, 766-9).
•
Lye, Edward. Dictionarium Saxonico et Gothico-Latinum. Ed. Owen Manning. 2
vols. London: Ed. Allen, 1772 (478 & 741 pages)
o available through ECCO⁶ < http://www.gale.com/EighteenthCentury/ >
o Lye's Dictionarium, published posthumously by Manning (ironically drawing on Lye's
posthumous edition of Junius's Etymologicum⁷), is an exceptionally detailed collection of
Old English and Gothic vocabulary with exhaustive quotations, Latin translations &
comments and occasional Modern English cognates. In its detail, it is the first work of
its kind fully replacing Somner's work and superseded only by Bosworth's dictionary
whose base it forms (Birrell 1966, 111). A notable feature is its use of many
typographical features in several delicate and cleverly employed fonts. Obviously, the
date of its publication renders the work outdated for most uses.
•
Madden, J. F., and F. P. Magoun. A Grouped Frequency Word-List of Anglo-Saxon
Poetry. Harvard University Press, 1954-1967 (ISBN 0674364007; 63 pages)
o A short work that has, however, inspired several other authors including Bessinger,
Barney and possibly Jember⁸. The words are glossed in Modern English and arranged
into related groups; the groups are listed by frequency (the aggregate frequency of the words
belonging to the group) to facilitate the learning of new vocabulary by beginners (Orrick
1955, 438).
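The grouped-frequency arrangement can be sketched as a simple ranking by aggregate frequency. The group labels, word labels and counts below are invented placeholders, and the ordering of words inside a group by their own frequency is an assumption; neither is taken from Madden and Magoun's list.

```python
def rank_groups(groups):
    """Order word groups by aggregate frequency, highest first.

    `groups` maps a group label to a dict of {word: frequency}.
    Returns a list of (label, aggregate_frequency, words) tuples; the words
    of each group are listed by descending frequency (an assumption).
    """
    ranked = []
    for label, members in groups.items():
        aggregate = sum(members.values())
        words = sorted(members, key=lambda w: (-members[w], w))
        ranked.append((label, aggregate, words))
    # Highest aggregate first; ties are broken by label for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Invented counts, for illustration only.
SAMPLE_GROUPS = {
    "group A": {"word1": 40, "word2": 5},
    "group B": {"word3": 30, "word4": 30},
    "group C": {"word5": 12},
}

if __name__ == "__main__":
    for label, aggregate, words in rank_groups(SAMPLE_GROUPS):
        print(label, aggregate, words)
```

Here "group B" is listed first because its members together occur more often than those of "group A", even though "word1" is the single most frequent word.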
•
Mitchell, Bruce and Fred Colson Robinson. "Glossary" A Guide to Old English.
Blackwell Publishing, 1964-2001, pp. 317-391 (ISBN 0631226362; 7th revised ed.)
⁶ Eighteenth Century Collections Online is a Thomson Gale paid service bringing nearly all books in English
published between 1701 and 1800 online.
⁷ Junius, Franciscus (aka François du Jon). Etymologicum Anglicanum. Oxford, 1743 (with Lye's Old English
grammar)
⁸ Here the reference is to a shorter work by Jember that has not been included in the list and that adopts some
of Madden's & Magoun's principles in the field of Old English prose: Jember, G. K., and F. Kemmler. A Basic
Vocabulary of Old English Prose / Grundwortschatz altenglische Prosa. Tübingen: F. Max Niemeyer, 1981 (ISBN
3484400870; 48 pages)
o The simple index of the older editions of the Guide has been replaced by a detailed glossary by
Robinson since the 5th edition of 1982 (Stanley 1985, 141). It is intended for beginners
and has thus striven from the beginning for ease of use with "heavy parsing of words
recorded" (vii), great detail in translation & grammar and intensive cross-referencing. In
earlier editions it was troubled by unsystematic references and inclusion of proper nouns
as well as typos (Calder 1984, 418-9) - these were mostly corrected in later editions and
the glossary is now one of the most user-friendly to be met in Old English readers.
Other readers with glossaries include works by Baker, Bright, Corson, Diamond, Leo,
Sweet and Wyatt.
•
Pollington, S. Wordcraft: Wordhoard and Wordlists. Concise dictionary and thesaurus
Modern English-Old English. Norfolk: Anglo-Saxon Books, 1993-9 (ISBN
1898281025; 240 pages)
o Divided into two sections, Pollington's Wordcraft aims at production in Old English and
learning basic vocabulary. Its first part is a very concise Modern-Old English dictionary
cross-referenced with the second section, which forms a thesaurus of sorts - the words
there are grouped into very general thematic units (arts, religion, society, emotions, etc.).
It may be helpful to beginners or people writing in Old English. Similar works are those
by Jember and Skeat.
•
Roberts, Jane, Christian Kay, and Lynne Grundy. A Thesaurus of Old English. 2 vols.
King's College London Medieval Studies X, 1995 (ISBN 9042015632; 2nd ed. by
Rodopi, 2000; 1555 pages)
o The first comprehensive thesaurus of Old English, the TOE was designed as a part of a
larger Historical Thesaurus of English (whose development is still under way) and is based
on the dictionaries of Hall and Bosworth-Toller. It can serve as a simple tool for finding
synonyms and connotations, but its use for research may be much broader, as the
thesaurus indicates meronyms & hyponyms and marks infrequent words or words
occurring only in poetry or glossaries (Conner 1998, 889). Its classification is not
particular to Old English, thus it will be possible to view movements in semantic fields
once the thesauri of other periods are finished. Unfortunately, no translation or gloss is
provided so that beginners in Old English may need to use a separate dictionary, but
searching for Old English forms is facilitated by an alphabetical index. See chapter 2.2.1
for information about the electronic version.
•
Skeat, Walter W. English - Anglo-Saxon Vocabulary. Cambridge: University Press,
1879 (ISBN 0948565683 / 0948565685; centenary ed. 1935, reprint Cyhoeddwr:
Joseph Biddulph, 1990; 40 pages)
o A simple reverse wordlist based on Sweet's Glossary in his Anglo-Saxon Reader and on
his History of English Sounds. It is probably the first reverse wordlist of Old
English, and a very brief one. Other reverse wordlists include Pollington's and Jember's dictionaries.
•
Somner, William. Dictionarium Saxonico-Latino-Anglicum. Oxford: William Hall,
1659 (abridg. by Thomas Benson as Vocabularium Anglo-Saxonicum, S. Smith & B.
Walford, 1701; 228 pages)
o Available through EEBO⁹ < http://eebo.chadwyck.com/ >
Benson's Abridgement < http://books.googlc.com/books;:'yid=()C1.C1)14::>lS::>S >
o Though often called the first Old English dictionary, Somner's Dictionarium is more
specifically the first published dictionary, drawing heavily on works of many of the early
Anglo-Saxonists like Nowell, Parker, Joscelyn, Dugdale, Junius or D'Ewes. Because of
the medley of its sources, its entries give a rather inconsistent impression and, as the
understanding of Old English poetry was not yet very advanced at the time, the
dictionary does not include many poetic terms. It is quite characteristic of the further
development of Old English lexicography that its first product is a students' rather than
a philological dictionary. A more detailed description of the early unpublished
dictionaries may be found in M. S. Hetherington's paper.
•
Sweet, Henry. "Glossary" in An Anglo-Saxon Reader in Prose and Verse: With
Grammar, Metre, Notes and Glossary. Eds. C. T. Onions (9.-14. eds.), Dorothy
Whitelock (15. ed.). Oxford: Clarendon Press, 1876-2005 (ISBN 019811169X; revised
15th ed. 1975; 418 pages)
o The first and the best-known Old English reader has gone through 15 editions since its
publication. So good was Sweet's initial work or so great is the reverence for his person
that the work has not changed dramatically since his early editions. In the reader itself,
only a few texts have been replaced and only the few corresponding entries in the glossary
followed. Some spelling conventions changed, definitions were perfected and additional
cross-references supplied (Brook 1949, 283), but Sweet's intention (after he published
his Primer) that the reader should not be aimed at beginners (Garnett 1883, 332) has been
respected by the later editors (though not necessarily by teachers). Thus
cross-referencing is limited to forms unpredictable from sound knowledge of grammar, but
the grammatical introduction itself has disappeared. Headwords are not normalized; in
fact, the normalized variants that do not occur in the reader, but were included in the
Glossary by Sweet, were left out of the Glossary by later editors. Grammatical
information is provided, but user-friendliness does not seem to be much of an aim,
because only irregular or strong verbs are marked while adjectives and pronouns are not
indicated at all. The Reader with its glossary remains a standard work that does through its
numerous editions reflect some development in Old English scholarship, but it does not
much reflect the change in users' needs (Mitchell 1968, 415-6). Other readers with
glossaries include works by Baker, Bright, Corson, Diamond, Leo, Mitchell and Wyatt.
⁹ Early English Books Online is a ProQuest Information and Learning Company paid service bringing virtually
all books in English published between 1473 and 1700 online.
•
Sweet, Henry. "Glossary" in The Oldest English Texts. London: Early English Text
Society os 83, 1885, pp. 461-652 (ISBN: 978-0197220832; repr. 1938, 1957, 1966,
1978, 1985; corrected by Collins 1963?)
o Because the texts included in the OET are mostly early Old English glossaries and
charters, the edition is from the beginning aimed at advanced readers. Therefore, the main
purpose of the glossary is to be an aid in linguistic research rather than a translator's
tool. The arrangement is thus similar to the one in Sweet's History of English Sounds (see
below) - the words are listed according to their root vowels, possibly under the oldest
one (if there are variants). Compounds are listed under the second element, but for
variants of the first, one has to look in their independent entries. Cross-references are
generally not provided. Each entry comprises grammatical information (though not
for all forms), a Modern English equivalent and a Latin equivalent if the source is a Latin
gloss. The source references are usually exhaustive. The glossary is followed by a proper
name index and an alphabetical index "of the roughest character" (vii) to facilitate
searching to some degree.
•
Sweet, Henry. The Student's Dictionary of Anglo-Saxon. London: The Macmillan
Company, 1897 (ISBN 0198631073 / 1904799094; repr. Oxford University Press,
1920-1978 and Tiger of the Stripe, 2006, 217 pages)
o A general student dictionary of Old English, Sweet is more concise than Hall and strives
for clarity rather than depth - he condemns etymological translation and the use of
modern cognates if their meaning has shifted or they are non-standard (dialectal,
archaic, etc.), giving usually only a single equivalent. There are no references in Sweet's
dictionary, but example sentences (of unknown origin, some possibly fabricated) are
provided (Garnett 1898, 327). Grammatical information is given, but not altogether
consistently (e.g. gender) and prefixed words are usually to be found under the stem,
even when the word without the prefix is unattested. The number of words is ca % of
those in Hall's dictionary.
•
Sweet, Henry. "Wordlists" in A History of English Sounds from the Earliest Period,
with full word-lists. Oxford: Clarendon Press, 1888, pp. 279-400
o < http://www.archive.org/details/historyofenglish00sweeuoft >
o Two wordlists and an index to the first one were added by Sweet to his History as a
comprehensive overview of the described sound changes. In both lists the words are
arranged by the pronunciation of the vowels: in the first list by the original Old English
pronunciation with the old, middle, modern and occasional variant spellings, in the
second list by the Modern pronunciation. The alphabetical index should facilitate
finding a Modern English word in the first list.
•
Wright, Thomas. Anglo-Saxon and Old English Vocabularies. 2 vols. Ed. Richard
Paul Wülker. London: Trübner, 1884 (first volume as A Volume of Vocabularies, 1857;
pages 452 & 500)
o < http://www.archive.org/search.php?query=wright%20vocabularies >
o A collection of Old and Middle English (here called Anglo-Saxon and Old English)
vocabularies and glossaries with Old English, Middle English and Latin indexes. Wright
assembled the vocabularies, added his commentaries and references, compiled an Old
English subject index and published the volume as A Volume of Vocabularies in 1857.
Wülker changed Wright's selection and compiled new alphabetical indexes while adding
his own comments and additional references.
•
Wyatt, Alfred John. "Glossary" in An Anglo-Saxon Reader With Notes and Glossary.
Cambridge: The Cambridge University Press, 1919, pp. 284-360
o < http://www.archive.org/details/anglosaxonreader00wyatuoft >
o A glossary to Wyatt's Reader, listing only variants occurring in the reader itself.
Headwords are the "best" variants, but the others are listed and cross-referenced. The
cross-references are not used, though, where knowledge of grammar should suffice to locate the
word. Modern English glosses are given, but not to particular occurrences.
Phrases deemed to convey "oneness of notion" (viii) are compounded and glossed as
single words. Grammatical information is provided and each part of speech (of a given
word) has a separate entry (Wardale 1919, 34). Other readers with glossaries include
works by Baker, Bright, Corson, Diamond, Leo and Mitchell.
2.1.2. Middle English
•
Brandl, Alois and O. Zippel. Mittelenglische Sprach- und Literaturproben. Ersatz für
Mätzners Altenglische Sprachproben. Mit etymologischem Wörterbuch zugleich für
Chaucer. Berlin: Weidmann, 1917, pp. 256-420 (2nd ed. 1927)
o A new version of Mätzner's reader and glossary aimed at beginners, especially at readers
of Chaucer. Equivalents are given in German and Modern English together with
etymons. Other readers with glossaries include works by Burrow, Corson, Davis,
Dickins, Emerson, Morris, Mossé and Tolkien.
•
Burrow, John Anthony and Thorlac Turville-Petre. "Glossary" A Book of Middle
English. Blackwell Publishing, 1992-2004, pp. 322-373 (ISBN 9781405117081, 2nd ed.
2001, 3rd ed. 2004)
o Though Burrow & Turville-Petre follow the design of Mitchell's Guide, they do not
presume any knowledge of Old English. The glossary provides only very basic
etymological information in the form of an etymon, but otherwise no knowledge of
etymology and almost no knowledge of grammar is required to locate entries. Not only
are variant spellings well cross-referenced, but when the reference would send the user
searching too far, the variant is glossed separately. All forms occurring in the texts
covered are fully back-referenced together with grammatical information. Separate
glosses are also provided for possibly confusing forms. The overall arrangement together
with lucid typography makes for a remarkably easy-to-use glossary (Nicholson 1994,
115-7 & Jacobs 545-7). Other readers with glossaries include works by Brandl, Corson,
Davis, Emerson, Dickins, Morris, Mossé and Tolkien.
•
Corson, Hiram. "Glossary" in Hand-book of Anglo-Saxon and Early English.
o see under Old English
•
Coleridge, Herbert. A glossarial index to the printed English literature of the
thirteenth century. London: Trübner, 1859 (200 pages)
o < http://www.archive.org/details/glossarialindext00coleuoft >
o The Glossarial Index seems to be a book of double purpose. Firstly, it is a preparatory
step for the NED (Coleridge was to be its first editor) serving both as a manual for
other collectors and as a database of 13th century lexical material; secondly, it also served
as a basis for Coleridge's Dictionary (see below). While the Index gives all the words found
in the literature of the 13th century by their Modern English equivalents, the Dictionary is
its opposite. Thus they complement each other as a reverse dictionary. Their entries are
brief, with only the equivalent, part of speech tag, reference to the source text and
sometimes etymological cognates included.
•
Coleridge, Herbert. A dictionary of the first or oldest words in the English language :
from the semi-Saxon period of A.D.1250 to 1300 : consisting of an alphabetical
inventory of every word found in the printed English literature of the 13th century.
London: J. C. Hotten, 1862 (repr. 1975)
o see the above title
•
Davis, Norman. "Glossary" in Early Middle English Verse and Prose. Ed. J. A. W.
Bennett and G. V. Smithers. Oxford: Clarendon Press, 1966-8, pp. 431-614 (ISBN
0198711018; 2nd ed. 1968, repr. 1974, 1982, 1985, 1987, 1989, 1991)
o An exhaustive glossary with rich grammatical information, extensive cross-references,
Modern English equivalents for each form of a different meaning and with etymons.
The most common forms are used as headwords followed by all variants occurring in
the text, while "theoretical" forms are avoided. Compounds are usually treated under
the first element, where both elements are explained (d'Ardenne 1968, 183). Other
readers with glossaries include works by Brandl, Burrow, Corson, Dickins, Emerson,
Morris, Mossé and Tolkien.
•
Dickins, Bruce and R. M. Wilson. "Glossary" in Early Middle English Texts. New
York: W. W. Norton & Co., 1951, pp. 243-330 (ISBN 978-0370001487; repr. 1952-61)
o The Glossary covers all the forms appearing in the texts. Each form is provided with
grammatical information (part of speech is indicated by superscript numbers), Modern
English equivalent and a back-reference with occasional exemplifying quotations. The
forms are grouped under their head-forms on an etymological basis, so that similar
forms with differing etymologies are treated separately. Where possible, the etymon is
given in the most representative form and in problematic entries, readers are referred to
the NED. The order is alphabetical with initial "ash" following "ad", yogh following "g"
and "thorn" & "eth" following "t". For some reason, "thorn" & "eth" in the medial
position follow "t+yogh". Compounds are usually treated under the first element and
cross-references abound (Macdonald 1953, 404 & Einarsson 1953, 575-6).
•
Emerson, Oliver Farrar. "Glossary" in A Middle English Reader. 1905, pp. 319-478
(ISBN 978-0404147846; 2nd ed 1915, repr. 1916-1978)
o A glossary with an incorporated list of proper nouns. "Normal" forms are used as
headwords, prefixed forms and variant spellings are usually cross-referenced. A Modern
English gloss, references and grammatical information are provided with each headword
and most of the forms. An interesting feature is placing an initial "ash" & "eth" after
"t", but a medial one after "tg", while an initial "yogh" before "i", but a medial one after
"f" together with "g". Diacritics used are also notable for the use of "hacek" to denote
palatalization of "c" and "g", but only in ambiguous cases, which may prove confusing.
Other readers with glossaries include works by Brandl, Burrow, Corson, Davis, Dickins,
Morris, Mossé and Tolkien.
•
Lewis, Robert E. (editor-in-chief), Middle English Dictionary. 118 vols. Michigan:
The University of Michigan Press, 1951-2001 (ca. 15000 pages)
o A comprehensive dictionary of Middle English was continually published from 1951 to
2001, but its preparations started decades earlier and through the existence of the
project, several editors-in-chief directed it, namely C. S. Northup (at Cornell University),
S. Moore (from then on at the University of Michigan), T. A. Knott, H. Kurath and finally
R. E. Lewis. The principle of the dictionary is similar to that of NED or OED¹⁰ in
that it builds upon a great number of quotations gathered from many
volunteering scholars around the world. Its many entries have a clear and uniform
structure. The normalized headword with a part of speech tag is followed by a few
selected variants (the selection has apparently been made upon other grounds than
frequency), etymology and groups of definitions followed by quotations. Definitions are
analytical or by equivalent and make use of usage labels. A quotation is provided for every
meaning together with the name of the source manuscript, an approximate dating of the
source (of manuscript, composition, or both) and a unique number (Malone 1953, 204-8
& McSparran 2006, "MEC Help Page"). See chapter 3 for a more detailed description
and chapter 2.2.1 for the description of the related Middle English Compendium.
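The uniform entry structure described for the Middle English Dictionary lends itself to a simple record model. The sketch below is only an illustration of that description: all class and field names are assumptions, and the sample values are placeholders rather than real dictionary content.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Quotation:
    text: str
    source: str   # name of the source manuscript
    date: str     # approximate dating (of manuscript, composition, or both)
    number: str   # unique number of the quotation


@dataclass
class Sense:
    definition: str  # analytical definition or equivalent
    usage_labels: List[str] = field(default_factory=list)
    quotations: List[Quotation] = field(default_factory=list)


@dataclass
class Entry:
    headword: str        # normalized headword
    part_of_speech: str  # part of speech tag
    variants: List[str] = field(default_factory=list)
    etymology: str = ""
    senses: List[Sense] = field(default_factory=list)

    def senses_without_quotation(self) -> List[Sense]:
        """Return senses lacking a quotation (every meaning should have one)."""
        return [s for s in self.senses if not s.quotations]


# Placeholder values only; not taken from the dictionary.
EXAMPLE = Entry(
    headword="headword",
    part_of_speech="n.",
    variants=["variant1", "variant2"],
    etymology="etymon",
    senses=[
        Sense("first definition", ["label"],
              [Quotation("quoted text", "MS name", "approximate date", "0001")]),
        Sense("second definition"),
    ],
)

if __name__ == "__main__":
    print(len(EXAMPLE.senses_without_quotation()))
```

A check such as `senses_without_quotation` is the kind of consistency test that becomes possible once entries are stored in a structured form rather than as running text.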
•
Mätzner, Eduard Adolf Ferdinand. "Wörterbuch" in Altenglische Sprachproben. Ed.
Karl Goldbeck. Berlin: Weidmann, 1867
o See Brandl & Zippel above
•
Mayhew, Anthony Lawson and Walter W. Skeat. A Concise Dictionary of Middle
English from 1150 - 1580. Oxford: Clarendon Press, 1888 (ISBN 978-1419100703;
repr. 1977, 2004; 812 pages)
o < http://library.case.edu/ksl/ecoll/books/maycon00/maycon00.html >
o The dictionary lists all words contained in the Clarendon Press Middle English series
up to 1888, to which series the references are usually given. Modern English equivalents,
basic grammatical information, variants, etymons and references to other works (esp.
NED) are provided. The small number of source texts seems to be the main limitation
of this work, making it more of a glossary to the Clarendon series than a general
dictionary (G. M. G. 1889, 90). See chapter 3 for a more detailed description.
¹⁰ The Oxford English Dictionary.
•
Morris, Richard and Walter W. Skeat. "Glossarial indexes" in Specimens of Early
English. 2 vols. Oxford: Clarendon Press, 1882-1894, vol. I pp. 365-554, vol. II pp. 355-489
(1st ed. 1872, 2nd ed. 1887, 3rd ed. 1894, 4th ed. 1897)
o < http://www.archive.org/details/specimensofearly01morruoft > vol. I.
< http://www.archive.org/details/specimensofearly00morruoft > vol. II.
o Replacing the previous work by Morris,11 the two volumes cover a selection of texts
from the periods of 1150-1300 (vol. I) and 1298-1393 (vol. II), each with a separate
glossary. Both glossaries give headwords alphabetically; for every entry a part of speech tag
with concise grammatical information is provided, and a Modern English gloss
accompanied by a back reference is given for each form occurring in the text. Most
entries are also provided with etymological cognates (Egge 1886, 65-8 and Browne
1892, 133-4). Other readers with glossaries include works by Brandl, Burrow, Corson,
Davis, Dickins, Emerson, Mosse and Tolkien.
•
Mosse, Fernand. "Glossary" in A Handbook of Middle English. Transl. James A.
Walker. Baltimore: The Johns Hopkins Press, 1952, pp. 423-495
o Only the second volume of the French Manuel has been translated into English and the
translation, though satisfactory, is easily detectable (Wilson 1954, 107). The glossary,
following an index of proper names, is quite simple, giving all forms occurring in the
texts with generous cross-referencing, part of speech tags, back references and Modern
English equivalents. Occasionally, etymological cognates are also provided, but this is
mostly unnecessary as the author developed an intricate system of referencing to OED
(422). The translator introduced a system of diacritics into the glossary, though the texts
are not marked this way. This step may have been unnecessary (Eliason 1954, 135-8),
especially when the translator himself warns about the uncertainty of Middle English
pronunciation (422). Other readers with glossaries include works by Brandl, Burrow,
Corson, Davis, Dickins, Emerson, Morris and Tolkien.
11 Specimens of Early English, selected from the chief English Authors, 1250-1400, Oxford, Clarendon Press,
1867
•
Stratmann, Francis Henry and Rev. Henry Bradley. A Middle English Dictionary
Containing Words Used by English Writers from the Twelfth to the Fifteenth Century.
Oxford: Clarendon Press, 1891 (ISBN 978-0198631064; repr. 1940, '51, '54, '58, '63,
'67, '74, '78, 2003; 732 pages)
o The structure of the dictionary can be well shown by comparing Stratmann's original
A Dictionary of the Old English Language12 with the new revised version by Bradley. The
original had an alphabetical structure but the headwords were often not the Middle
English forms, but rather their etymons; moreover, compounds (even opaque ones)
were to be found under their first element. Thus the user had to know the structure and
etymology of the word before looking it up. On the microstructural level, another
approach was chosen - the compounds listed as subentries followed an etymological
order rather than an alphabetical one. Furthermore, etymologically related words that
were separated in Middle English were still treated under the same entry. The definitions
were not always in Modern English, but sometimes only in Latin, or they might not have
been given at all. Even if Modern English was used, the definitions mostly consisted of
cognates that were often obscure or semantically misleading. No information on parts
of speech was provided and vowel quantity was marked only for Old English cognates.
Bradley added many words, especially those of Romance origin, provided more
sources, corrected the entries, restructured the dictionary under a strictly alphabetical
order (putting all transparent compounds under their second element), and provided
some cross-references. He also added part of speech tags and Modern English definitions
where they were missing (revising the problematic ones), and introduced diacritic
marking. The character of the dictionary thus changed from a hard-to-use
etymological dictionary into a more user-friendly comprehensive one (Garnett 1891, 902).
See chapter 3 for a more detailed description.
•
Tolkien, J. R. R. A Middle English Vocabulary, designed for use with Sisam's
Fourteenth Century Verse & Prose. Oxford: Clarendon Press, 1922
o < http://www.archive.org/details/middleenglishvoc00tolkuoft >
o In later editions a part of Sisam's reader, Tolkien's glossary is a well-prepared aid for
student readers - its aim being not so much to record obscure words and meanings or
to trace their etymology, but to help with the most common vocabulary of the texts
covered. Special attention is thus paid to parts of speech that are usually neglected, such
as prepositions or conjunctions, whose exact meaning the reader may often pass over,
either because they do not seem necessary for understanding the text at first, or
because of their superficial likeness to familiar modern forms. The entries are listed in a
strictly alphabetical order with extensive cross-references to variant spellings or unusual
forms. Entries provide full grammatical information, detailed translation and back
references to each form with occasional quotations to support the definition. The
etymological cognates are listed and references to NED are given where etymology is
more complex. The level of user-friendliness is at least comparable to modern glossaries
(Wardale 1920, 42-43). Other readers with glossaries include works by Brandl, Burrow,
Corson, Davis, Dickins, Emerson, Morris and Mosse.
12 Published in 1867, 2nd ed. 1873, 3rd ed. 1878 and supplemented 1881.
•
Wright, Thomas. Anglo-Saxon and Old English Vocabularies. - see under Old
English
2.2. Existing Electronic Dictionaries of OE&ME
2.2.1. Digitized versions of paper dictionaries
A description of the dictionary structure may be found above; only the features characteristic
of the electronic versions are noted here.
•
Bosworth, Joseph. An Anglo-Saxon Dictionary, based on the manuscript collections
of the late Joseph Bosworth. Eds. T. Northcote Toller and Alistair Campbell. Oxford:
Oxford University Press, 1838-1972
o < http://lexicon.ff.cuni.cz/texts/oe_bosworthtoller_about.html >
o Digitized by Sean Crist et al. for the Germanic Lexicon Project under a Joel Dean grant
together with a team led by Jan Cermak under a Jan Hus Educational Foundation
(JHEF) grant between 2001 and 2007. Currently the project is in its final stages: the text of
the dictionary has been scanned, OCRed, hand-corrected and published either to be
viewed online with a simple search facility or downloaded with a special application
designed to browse and search the dictionary offline. More information about this
project may be found in subsequent chapters.
•
Bright, James W., "Glossary", in Bright's Anglo-Saxon Reader. Ed. Frederic G.
Cassidy and Richard N. Ringler. 3rd edition, Harcourt, 1891-1972
o < http://lexicon.ff.cuni.cz/texts/oe_bright_about.html >
o Digitized by Sean Crist et al. as a second text of the Germanic Lexicon Project (GLP)
between 1999 and 2003. The text of the glossary was manually typed in by volunteers; the
rest of the Reader was scanned and OCRed. The resulting text of the glossary was
automatically parsed, producing an XML output followed by an easy-to-use HTML
version (see chapter 6.2.2 for explanation of XML and HTML standards), which can now
be read & searched in any ordinary web browser. The text of the
glossary is not included in the GLP search facility, so users have to rely on their own search
tools.13
•
McSparran, Frances. (editor-in-chief), The Middle English Compendium. Michigan:
University of Michigan Digital Library Production Service, 2001-6
o < http://ets.umdl.umich.edu/m/mec/index.html >
o The Middle English Compendium (MEC) is a project incorporating a digitized version of the
Middle English Dictionary, but it improves its usability by interconnecting it with the Middle
English HyperBibliography and the Corpus of Middle English Prose and Verse. The contents of
the dictionary itself are exactly those of its paper counterpart, but its digitization made
all kinds of searches possible. Currently, the electronic version of MED supports simple
lookups of headwords and variants with a possibility to use regular expressions;
wildcards14 are available to represent any or a limited number of characters, the beginning or
the end of a word, etc. More complex searches allow users to specify which part of the
entry is searched, while Boolean15 and proximity16 searches are also allowed. As the user
is allowed to search quotations, it is possible to restrict the search to a certain period or
text(s). The interconnection with the HyperBibliography gives users an opportunity to
display full bibliographical information on any text quoted in the dictionary, including a
list of its editions/facsimiles and a reference to A Linguistic Atlas of Late Mediaeval
English.17 The interconnection with the Corpus makes it possible to display the whole text
from which the quotation is taken if the text has already been added to the Corpus.18
The handling of special characters has been designed with user-friendliness in mind:
substitution characters are used for special character entry19 (e.g. capital "T" for thorn),
which is very convenient but renders the characters used for the substitution
unsearchable. Images are used to display the non-standard characters,20 which is again
very convenient because there are no special requirements for the user, but it limits
further handling of the displayed text. The MEC is an invaluable resource and a model
for future projects. Its value was further increased in 2006 by a generous decision of the
publishers to offer all of the MEC for free. For more information about MEC and
MED see chapter 3.
13 There were two notable lessons learned from this project that were important for the future of the GLP: (1) All
information from the original should be retained even if it seems unimportant at the time of the digitization. (2)
When working with volunteers, the most basic standard of encoding and formatting should be chosen and rigorously
verified during the submission process.
14 Wildcards are characters that substitute for other characters, usually in a one-to-many relationship, e.g. "*" can
stand for any number of any characters so that "a*" can stand for any word starting with the letter "a".
15 Boolean search combines several searches using logical operators like "AND", "OR" & "NOT".
16 A proximity of search terms can be specified in a number of characters.
17 McIntosh, A., M.L. Samuels and M. Benskin, with the assistance of Margaret Laing and Keith Williamson. A
Linguistic Atlas of Late Mediaeval English. 4 vols. Aberdeen, 1986
18 It is planned to add all the quoted texts into the Corpus.
19 The problem with typing characters that are not part of the Modern English character set has plagued
computer users for a long time. Obviously, it is not limited to special characters of dead languages, but these are
more problematic, because they are often not included in any living language character set and there have been very
few incentives so far for the computer industry to create character sets for long dead languages.
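The wildcard and Boolean lookups described for the electronic MED can be illustrated by a minimal sketch. It is not the MEC implementation: the function names, the "?" wildcard and the sample headwords are assumptions made for illustration only; only "*" and the AND/OR/NOT operators are taken from the description above.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a wildcard pattern into a case-insensitive regular expression.

    '*' stands for any number of characters (including none);
    '?' (an assumption, not mentioned in the text) stands for exactly one.
    Every other character is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def lookup(headwords, pattern):
    """Return the headwords that match the whole wildcard pattern."""
    rx = wildcard_to_regex(pattern)
    return [w for w in headwords if rx.fullmatch(w)]

# Invented sample headwords, not taken from MED.
HEADWORDS = ["abbot", "abiden", "bok", "boke", "knight"]

starts_with_a = set(lookup(HEADWORDS, "a*"))
ends_with_n = set(lookup(HEADWORDS, "*n"))

# Boolean operators map onto set operations over the individual results.
both = starts_with_a & ends_with_n      # AND
either = starts_with_a | ends_with_n    # OR
a_not_n = starts_with_a - ends_with_n   # NOT
```

Treating each simple search as a set of hits keeps the Boolean layer independent of how the individual patterns are matched.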
•
Mayhew, Anthony Lawson, Skeat, Walter W. A Concise Dictionary of Middle
English from 1150 - 1580. Oxford: Clarendon Press, 1888
o < http://www.gutenberg.org/etext/10625 >
o The dictionary had been scanned by Case Western Reserve University and given to
Project Gutenberg volunteers who hand-corrected the text in 2004. The text is now
freely available to download, but it has been stripped of most formatting, while special
characters have been substituted (e.g. "yogh" for "3"). The text is also offered
commercially in PDF21 for handheld devices, but this format does not provide many
interesting features. For more information see chapter 3.
•
Roberts, Jane, Flora Edmonds, Christian Kay, Irene Wotherspoon, Thesaurus of Old
English Online, University of Glasgow, 2005
o < http://libra.englang.arts.gla.ac.uk/oethesaurus >
o An electronic version of the 2nd ed. of the Thesaurus digitized by its authors is in fact a
searchable database with a thesaurus structure. It is possible to search for an Old
English word and view all the thesaurus categories that include this word and then
display a full listing of any of these categories (so that related words are displayed). Searches
can be carried out with or without regard to length marking, and wildcards can be used
for searches of the beginning/middle/end of the word. Modern English words
appearing in the category descriptions can be searched, as well as the frequency flags or
Old English phrases (entries consisting of more than one word). Users that do not wish
to search for a specific form may use the browsing facilities that list categories in
alphabetical or semantic order. Special characters are substituted by standard capital
letters and length can be marked using the underscore.
20 Throughout this paper the non-standard characters stand for letters outside the standard English alphabet.
21 Portable Document Format
2.2.2. Original electronic dictionaries
•
Baker, Peter Stuart. "Glossary" Old English Aerobics. University of Virginia, 2003
o < http://www.engl.virginia.edu/OE/glossary/ >
o An electronic version of Baker's Introduction to Old English is an innovative project in
several ways. So far, it is probably the only traditionally published Old English textbook
to have originated online. Peter Baker has seamlessly integrated reading texts with
glossary, notes, grammar and voice recordings so that the users need only to click on
words or clauses of the text they are reading and the desired information is readily
displayed. Selecting individual words brings up their Modern English translation and
grammatical information from the glossary, together with a link to an appropriate
chapter in the grammatical introduction. Selecting highlighted items displays additional
notes or explanations of idiomatic expressions. Selecting clauses displays information
about the clause type, and finally clicking paragraphs plays a recording
of the author reading them aloud to the user. The texts, the introduction or the glossary
can also be printed for offline use. The introduction corresponds to its printed version,
but the text and the glossary add some new material. The project is under development,
though hardly anything can be added to the user-friendliness of the application.
•
Healey, Antonette diPaolo, Angus Cameron and Ashley Crandell Amos. The
Dictionary of Old English, Toronto: Pontifical Institute of Medieval Studies, 2003
(ISBN 0-88844-928-3)
o < http://www.doe.utoronto.ca/ >
o DOE aims to be an exhaustive dictionary of Old English, replacing the Bosworth-Toller
dictionary and covering the period prior to that covered by MED. It has been in
development since the 1970s, with most of its first two decades spent in preparing and
digitizing at least one copy of every extant text in Old English. This way, The Electronic
Corpus of the DOE was created and from this a wordlist (concordances) was generated
with a number of citations for each of the future entries. On this groundwork, the
entries are now being written, with letters A-F already published online and on
CD-ROM, with its own application for searching & browsing. The structure of the
dictionary is strictly alphabetical with "ash" following "a" and the "ge-" prefix
disregarded; headwords are the extant West-Saxon forms if such occur in the corpus, or
the most common forms. Headwords are followed by part of speech information with
occasional reference to cognate words. After this, all attested forms of the word follow
with basic grammatical information provided. Then the frequency is stated in number of
occurrences, usually accompanied by usage notes. If the entry is a complex one, a
schema of its sense division follows, succeeded by the definitions themselves. These can
be analytical or simple equivalents, depending on the complexity of the particular word.
Definitions are followed by supporting citations, their references and notes to the
sources (with editorial changes in square brackets). Then Latin equivalents from the
manuscripts are provided, if there are any, and a "See Also" section referring to other
dictionary entries. The last two sections are references to secondary literature and
additional material, usually in the form of general notes to the particular entry. The
search tools are similar to those of MED and they include the possibility to search
separate parts of the entries and the use of regular expressions (wildcards), but the
Boolean search is not provided. The special characters can be entered either through
character codes22 or by substitution for capital letters ("T" for thorn, etc.). However, the
characters are displayed in Unicode23 so that the displayed text can be further processed
(copied, edited, etc.) (Jenkyns 1991, 380-416 and DOE Help). See chapter 3 for more
information.
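Entering special characters by substitution can be sketched as follows. This is a hypothetical illustration, not the DOE code: only the pair "T" for thorn comes from the description above, while the other pairs, the table name and the function name are assumptions.

```python
# Substitution table for typing special characters on a plain keyboard.
# Only "T" -> thorn is taken from the text; "D" and "A" are assumed pairs.
SUBSTITUTIONS = {
    "T": "\u00fe",  # thorn
    "D": "\u00f0",  # eth (assumption)
    "A": "\u00e6",  # ash (assumption)
}

def expand_query(query):
    """Replace each substitution capital in a query with its Unicode letter."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in query)

thing = expand_query("Ting")
```

Because the capitals are always replaced, a query can no longer contain a literal capital "T", which is the trade-off noted above for the MEC. Storing and displaying the result as Unicode text, as DOE does, keeps it copyable and editable.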
2.2.3. Other electronic projects
•
Johnson, J. R., "Dictionaries" in Old English Made Easy, 2006
o < http://home.comcast.net/~modean52/oeme_dictionaries.htm >
o An interesting project using unknown lexicographical data, but apparently mostly the
electronic version of Bosworth to create a hyperlinked dictionary with a reverse
wordlist. The citations and references were left out, but the reverse wordlist can prove
useful.
•
Slade, Benjamin. Old English Glossary for Beowulf and the Finnesburh Fragment
o < http://www.heorot.dk/glossary.html >
o An extensive glossary to Beowulf based upon Klaeber's glossary, with numerous
additions and corrections. The glossary can be browsed or searched via browser's own
search engine.
22 Any character used on a PC can be expressed by its code according to the particular encoding.
23 An advanced encoding system trying to encompass all the world's writing systems.
3. COMPARISON OF SELECTED DICTIONARIES
To summarize the above list, we could characterize the history of Old and Middle English
lexicography as developing in four successive stages:
•
15th - 18th century - the stage of the first antiquarian efforts, with individuals from
Nowell to Lye who compiled wordlists from any sources at hand and without much of
a scientific method to facilitate the reading of old manuscripts.
•
19th century - the stage of classical Anglo-Saxon scholarship when the standard
comprehensive dictionaries such as Bosworth's or Stratmann's were compiled.
•
First half of the 20th century - the period when the standard dictionaries were revised
and supplemented to comply with the progress of the scholarship, but large projects
were no longer undertaken by individuals. Instead, inspired by the team
collaboration on NED, the development of MED started.
•
Second half of the 20th century - the period of computerization, when large electronic
corpora could be used as a basis for new lexicographical projects. The MED was
digitized and connected with its corpus and the development of DOE as a corpus-based
dictionary began.
The interesting fact this summary entails is that whatever progress Old and Middle
English scholarship has made since the 19th century, the dictionaries still in use today are mainly those
of the 19th century.
Therefore, with digitization and the future usability of ensuing projects in mind, the following
comparison will concentrate primarily on the dictionaries that are still in use today, contrasting
the standard dictionaries of the 19th century with the newest electronic resources. For the Old
English period the Bosworth-Toller's Anglo-Saxon Dictionary, Clark Hall's Concise Anglo-Saxon
Dictionary and the A-F on CD-ROM version of the DOE will be compared, while for the Middle
English period a comparison will be made between Stratmann-Bradley's Middle English Dictionary,
Mayhew-Skeat's Concise Dictionary of Middle English and the online version of the MED.
3.1. Macrostructure
The macrostructure of a dictionary determines several of its basic characteristics. The size of
a wordlist is proportional to the exhaustiveness of a dictionary, while its structure and
organisation of items predetermines whether the dictionary be semasiological, onomasiological or
based upon frequency.
3.1.1. Wordlist
All the dictionaries under consideration tried to compile as exhaustive a wordlist as possible,
with the exception of Bradley, who recognizes his failure to fully achieve the goal in his foreword
to Stratmann's dictionary, and Mayhew-Skeat, whose wordlist was intentionally limited to the
Clarendon Press series. The proper names, specialist usage and vocabulary that might be avoided
in modern general dictionaries were mostly included. It is thus not the aims, but the methods and
results of the wordlist compilation that are of interest here. Bosworth, Hall and Stratmann all
used a similar method: the authors plundered previous dictionaries and wordlists and added some
of their own material that they could discover, mostly in newer manuscript editions, collecting
their materials on slips of paper with citations and references. Toller, Campbell, Meritt and
Bradley - the subsequent editors of their dictionaries - partly followed their lead, but Campbell
and Meritt also went directly to the manuscripts themselves, both to collect additional materials
and to check the accuracy of their predecessors.
This method can, obviously, be attacked on scientific grounds. Even though most of their
sources are acknowledged, it is difficult to ascertain what areas of the language were covered and
to what depth. The method of MED and DOE is more scientific, although the difference in their
development is characteristic both of the period of their genesis and of the periods of language
they cover - it is, for example, quite impossible to compile all the surviving texts in Middle
English as a basis of a dictionary, while we will see that it is possible for Old English. Accordingly,
the editors of MED followed the example of NED in selecting representative texts and asking
independent volunteers to report any interesting items they could find. These were then checked
against manuscripts and citations with references were collected on slips. The DOE editors, on
the other hand, decided to cover all the extant texts of Old English (one version of each text),
which would be impossible for later stages of English. The editors decided to select the best
edition for every Old English text, check their reliability, re-edit those that would prove unreliable
and create new editions of yet unedited texts. It is not quite obvious to what extent the qualitative
criteria were kept, but the expected extent remained the same and a wordlist was created from
the concordances of the electronic DOE Corpus.24
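How a wordlist can be derived from the concordances of an electronic corpus may be shown by a minimal sketch. It does not reproduce the DOE procedure: the function name, the tokenization by word characters, the context width and the two sample "texts" are all assumptions made for illustration.

```python
import re
from collections import defaultdict

def build_concordance(texts, width=2):
    """Map every word form to its occurrences in the corpus.

    texts: dict of text identifier -> text.
    Each occurrence is (text identifier, token position, context), where the
    context is the token with up to `width` neighbouring tokens on each side.
    """
    concordance = defaultdict(list)
    for text_id, text in texts.items():
        tokens = re.findall(r"\w+", text.lower())
        for i, token in enumerate(tokens):
            context = " ".join(tokens[max(0, i - width): i + width + 1])
            concordance[token].append((text_id, i, context))
    return dict(concordance)

# Invented miniature corpus, for illustration only.
TEXTS = {
    "A": "se cyning rad",
    "B": "se biscop and se cyning",
}

concordance = build_concordance(TEXTS)
wordlist = sorted(concordance)  # alphabetical list of all attested forms
```

Every form in the resulting wordlist already carries its citations and references, which is what the earlier lexicographers collected by hand on slips.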
24 It should be noted here what Jenkyns quite rightly pointed out in his article about DOE that "[i]t is a
remarkable tribute to the earlier scholars and to the card-index/shoe-box technology that so few entirely new and
incontrovertible OE words have been discovered." (Jenkyns 1991, 390)
3.1.2. Headwords
The compilation of a wordlist, however, does not consist only in collecting all the words of a
future dictionary, but also in deciding what forms will represent particular words as headwords.
This may be a straightforward matter with a synchronic dictionary of a standard variety of a
language, but it presents much greater difficulties to editors of a diachronic dictionary of a
language without any dialectal or orthographic standard. The headword should be representative
of the whole paradigm and it should be easy to locate given any of its forms as a starting point.
Also, it should not mislead the user about the real occurrence of the form chosen for the
headword itself - as we will see below, some lexicographers chose headword forms that in fact
do not occur in the corpus of the language covered. To deal with these requirements, some
degree of idealization of the headword seems to be desirable and the dictionaries in question
display different levels of such idealization. An extreme approach would be that of the original
1838 edition of Bosworth's dictionary (and to some extent that of the 1st edition of Hall's) which
aims "to present their words in their own dress" (Bosworth 1838, vii). This approach requires no
idealization, but in consequence leads to a double or triple entry for the same word and to
extensive cross-referencing lest the user be unable to locate the particular entry. This strategy was
criticised by scholars and users so that both Toller and Hall in their 2nd editions adopted a milder
version of Sweet's system. Sweet's approach represents another extreme on the scale - he himself
kept changing his system from publication to publication, but generally speaking he adopted
Early West Saxon25 spelling for his headwords. The problem with this system lies mainly in the
number of extant texts in its preferred dialect, which is relatively small. Many words are therefore
unattested in Early West Saxon and Sweet thus deduced the particular headword, introducing
forms that in fact did not exist in the Old English corpus. Both Hall and Toller accepted Sweet's
choice of Early West Saxon, but they did not use unattested forms, rendering the headwords in
other dialects if necessary (Ellis 1993, 6). A similar approach seems to have been taken by
Stratmann and retained by Bradley - the Early Middle English spelling is preferred where
attested, probably because it best exhibits its connection to Old English cognates, which would
accord with the originally etymological aim of the dictionary. The remaining dictionaries (that is
Mayhew-Skeat, MED and DOE) all recognize the problem of this approach in offering users
headwords that actually represent a minority of the real occurrence, making most of their users
either look in the wrong place in the dictionary most of the time, or learn to deduce earlier forms of
words they search for. Therefore the decision of the editors was to prefer the later, best recorded
periods of the language, which is Late West Saxon for Old English and the Southeast Midland dialect
of ca 1400 for Middle English (which is quite close to Chaucer's dialect), but again only if the
forms are so attested. If they are not, the editors usually go for the most frequent form in another
dialect. This strategy, if well explained to the users26 and complemented by frequent
cross-references, seems to be the most user-friendly approach.27
25 It seems he believed it to be the "purest" or the most "genuine" form of Old English (Ellis 1993, 7).
26 Most of the dictionaries explain their method of idealization and normalization with some notes on
phonology that should help users find the form desired, especially by explaining which letters are interchangeable in
the particular system of orthography.
27 It is perhaps understandable that none of the selected dictionaries chose frequency as a sole principle of
selecting the headword form. First, it is laborious to assemble plausible frequency data, and, second, the frequency is
quite unpredictable for users.
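The headword strategy of Mayhew-Skeat, MED and DOE (prefer a form attested in the best recorded dialect, otherwise take the most frequent form from another dialect) can be sketched as a small selection rule. The sketch is hypothetical: the data structure, the dialect labels, the sample forms and their counts are invented, and choosing the most frequent form within the preferred dialect is an added assumption.

```python
from typing import NamedTuple

class Attestation(NamedTuple):
    form: str     # attested spelling
    dialect: str  # dialect label (labels here are illustrative)
    count: int    # number of occurrences (invented figures)

def choose_headword(attestations, preferred_dialect):
    """Pick a headword form that is actually attested.

    Forms from the preferred dialect win if there are any; otherwise the
    most frequent form from any other dialect is used. No unattested form
    is ever constructed.
    """
    preferred = [a for a in attestations if a.dialect == preferred_dialect]
    pool = preferred if preferred else attestations
    return max(pool, key=lambda a: a.count).form

# Invented attestation data, for illustration only.
with_lws = [
    Attestation("ieldra", "EWS", 2),
    Attestation("yldra", "LWS", 5),
    Attestation("eldra", "Anglian", 9),
]
without_lws = [
    Attestation("eldra", "Anglian", 9),
    Attestation("ældra", "Kentish", 1),
]

first = choose_headword(with_lws, "LWS")
second = choose_headword(without_lws, "LWS")
```

The rule differs from Sweet's practice precisely in the fallback branch: where the preferred dialect has no attestation, it selects an existing form instead of deducing one.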
3.1.3. Ordering
With a wordlist compiled and headwords selected, the wordlist has to be ordered. All the
dictionaries under consideration are semasiological in type and the progressive alphabetical order
is thus only natural to all of them. The alphabetical ordering of Old English headwords may not,
however, be straightforward either. Firstly, there are several characters alien to the Modern
English alphabet that have to be placed somewhere in the order, and, secondly, the usual
difficulties with prefixes, compounds and nesting have to be dealt with.
The position of the special graphemes is usually based to some degree upon the proximity of
their pronunciation to the pronunciation of some standard grapheme.28 This seems to be a
natural approach, but it is problematic in several ways. First, the pronunciation depends very
much on the relative position of the grapheme in the word - should its place in the alphabetical
order also change accordingly? Second, some pronunciations may place the non-standard
grapheme directly into the sequence of a standard grapheme rather than between two standard
graphemes - the usual placement of "ash" in the sequence of "a" along with "ae" may serve as an
example. This placement does not recognize the "ash" grapheme as a separate letter and it may
graphically disturb the sequence of "a-" prefixed words.29 This is a strategy employed by all
Middle English dictionaries, which not only place "ash" along with "ae", but also "thorn/eth"
along with "th" and "yogh" mostly along with "g", but also along with "i/y" and elsewhere
depending on the phonetic quality of the grapheme.30 These graphemes were not used widely in
Late Middle English, so that Mayhew-Skeat and MED need only to use them occasionally for
words unattested in their preferred dialect. Stratmann-Bradley, which prefers the early forms, lists
"thorn" as a separate letter, because the number of forms with an initial "thorn" is large in its
wordlist. Old English dictionaries - with the exception of DOE - also adopt the practice of
incorporating "ash" in the sequence of "a", but both Bosworth-Toller and Hall give "eth/thorn" the
status of a separate letter after "t", like Stratmann-Bradley, while refraining from the use of "yogh"
in headwords initially. DOE has not come beyond "f", but its plan suggests it will adopt a similar
practice.
28 It should be noted here that Old and Middle English used a phonemic spelling, but like Modern English, there
were exceptions in their orthographies. The phonology itself has gone through many changes that show noticeably in
the dictionaries. Conspicuous for the user is the absence of "v" and "z" in most Old English dictionaries, which
reflects the fact that these voiced fricatives were allophones of "f" and "s" respectively. The same reason is behind
conflating "eth" and "thorn" under one grapheme.
29 Thus for example in Bosworth-Toller the "a-" prefixed entries "a-dylfan" and "a-fæged" that might be expected
to follow each other are divided by 17 pages of words with initial "ash".
30 The practice of placing "eth/thorn" along with "th" or "yogh" along with "gh" in Middle English dictionaries
is also strengthened by the Late Middle English orthography which introduced these digraphs in place of the
non-standard graphemes.
The omnipresent Old and Middle English prefix "ge-" is treated similarly by all the
dictionaries except for Bosworth-Toller31 and the first editions of Hall; that is - the prefixed words
are to be found under their stem, but are treated separately from their unprefixed stems.32 Other
prefixed words are usually to be found alphabetically by their prefix.
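The ordering conventions discussed above can be combined into one sort key. The sketch below is an illustration under stated assumptions, not the scheme of any single dictionary: it files "ash" in the sequence of "a" as "ae", treats "eth/thorn" as one separate letter after "t", and disregards the "ge-" prefix only when it is written with a hyphen. The function names and the sample headwords are hypothetical.

```python
# Alphabet with thorn as a separate letter after "t" (eth is conflated with it).
ALPHABET = "abcdefghijklmnopqrstþuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

def collation_key(headword):
    """Return a sort key for an Old English headword.

    Assumptions of this sketch: a hyphenated "ge-" prefix is disregarded,
    other hyphens are ignored, "æ" sorts as "ae", "ð" sorts as "þ", and any
    character outside the alphabet sorts last.
    """
    word = headword.lower()
    if word.startswith("ge-"):
        word = word[3:]
    word = word.replace("-", "").replace("æ", "ae").replace("ð", "þ")
    return tuple(RANK.get(ch, len(ALPHABET)) for ch in word)

def sort_headwords(headwords):
    """Sort by collation key; equal keys fall back to plain string order."""
    return sorted(headwords, key=lambda w: (collation_key(w), w))

# Sample headwords chosen only to exercise each rule.
SAMPLE = ["tun", "þing", "ge-beorgan", "æcer", "adl", "ufan", "beorgan", "afen"]
ordered = sort_headwords(SAMPLE)
```

Stripping only a hyphenated "ge-" is a deliberate simplification: deciding whether an initial "ge" in an unhyphenated form is a prefix would require morphological knowledge that a sort key alone cannot supply.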
Similarly the compounds are mostly treated under their first element, although Stratmann-
Bradley and MED list the transparent syntactic compounds under their second element. DOE
solves this by cross-referencing all the compound forms from their elements, but treats them
separately, which seems to be the most user-friendly solution.33
Nesting is generally not used for derived forms with the exception of Stratmann-Bradley, but
grammatical forms are always treated in one entry, except for some anomalous forms.34
3.2.
Microstructure
3.2.1.
Structure
The entry structures of the selected dictionaries are more varied than their macrostructure
and it is thus more demanding to present them fully, clearly and in contrast with each other. We
31 Bosworth's second edition was taken over by Toiler right after the letter "g". He decided to leave the edited
part be and conform to most of its principles in his supplement.
32 In subsequent editions Hall only identifies whether the item appears prefixed, unprefixed or in both forms.
33 For example DOE lists the compound "folc-egesa" - "fear, terror (among the people)" in a separate entry
under "f", but it is also cross-referenced from the entries for "folc" and "egesa", while at the same time referring to
both these entries itself.
34 An example of nesting is Stratmann-Bradley's entry for "belwen" - "to bellow" with an incorporated
treatment of "bellwinge" - "the bellowing". An example of separate treatment of irregular forms is Bosworth-Toller's
entry for "dyde" - "p. of don", which defines the form separately, although it also refers to its standard lemma.
will first go through each section as they appear in DOE entries, because DOE seems to make
use of all the sections used to some degree in all the other dictionaries. Then we will attempt to
present a tabular overview of the sections in all the dictionaries for easier comparison.
•
Headword - First element in all the dictionaries, see 3.1.2
•
Grammatical information - All dictionaries give the part of speech information,
though Bosworth-Toller and Hall provide this type of information indirectly, by stating
gender for nouns and strong/weak type of verbs, while Hall does not mark adjectives
and adverbs at all. Gender is given only by Old English dictionaries, as it is irrelevant
for Middle English. Hall and DOE also provide verb class membership and DOE
noun class membership. Bosworth and DOE supply additional information for variant
forms, if relevant.
•
Etymology - All the dictionaries provide some etymological information in the form
of cognates and references. In the Middle English dictionaries, the cognates are usually
Old English, Old French and Latin. Stratmann-Bradley often goes beyond that and
provides other Germanic cognates, which is a practice more common in the Old
English dictionaries. These also list Latin cognates; Bosworth-Toller & Mayhew-Skeat add
Greek etymons, while Hebrew etymons are supplied by Bosworth-Toller only.
•
Variant forms/spellings - In DOE, MED and Bosworth-Toller the variants, listed with
appropriate grammatical information, precede the definitions. In MED, they are also
connected with subsequent citations by numbers. Bosworth-Toller tries to list all variants,
but without apparent connection to their citations. DOE lists all attested forms,
variants and spellings (which may go into hundreds) in this section, but it also provides
the most characteristic variants together with the headword. It is not quite clear on
what basis these have been chosen, as they are obviously not the most frequent ones,
either in the corpus or in the citations. Such a list is not only interesting for
comparison of forms, but it also guarantees that the user will locate the
appropriate entries by searching for any attested form. DOE also mentions relevant
limitations of a particular form to a historical period, author or manuscript. Stratmann-
Bradley and Mayhew-Skeat use variants together with citations and definitions to capture
different meanings, while Hall in its conciseness gives mostly the varying vowels only.
•
Occurrence (frequency based on the DOE Corpus) - Only DOE specifies
occurrence 35, but the number of citations in MED and, to some extent, in Bosworth-
Toller may provide a rough indication of the overall occurrence.
•
Usage (labels characterizing the manner, style or context of the particular usage) -
DOE and MED are the only dictionaries that provide usage labels systematically; other
dictionaries furnish similar information only occasionally, in the form of comments at
different places in their entries (usually as a part of their definitions).
•
Schema - Entries in DOE can be complex, especially their sense divisions. With the more
complicated entries, an outline of the division is provided to facilitate user orientation.
MED offers summaries of variants linked to citations and definitions, which may be compared
to this device (see Table 1).

A.       part
A.1.     used with an ordinal number
A.2.a.   region, area
A.2.b.   portion of land
A.3.     portion of time; period of four minutes (ByrM)
A.4.     part of speech, word, also used of figures of speech; nimende dæl 'participle'
A.5.     in special senses
A.5.a.   in numeric expressions: times
A.5.b.   for (one's) part, for (one's) sake
A.5.c.   se eorðan dæl 'body'
A.5.d.   limes dæl glossing comma
B.       share, allotted portion
B.1.     portion, inheritance
B.2.     lot, fate
B.3.     ne habban (nanne) dæl mid / on 'have no connection with, take no part in'
C.       amount, quantity
C.1.     god dæl (with gen.) 'a good deal (of), a great quantity (of)'
C.2.     (largely poetic) in understatement, with genitive: a portion (of), hence, a great deal (of), a lot (of)
D.       in adverbial phrases
D.1.     sume dæle 'partly, in part'
D.2.     be / of sumum dæle 'in part, to some extent'
D.3.     be dæle 'in part, to some extent'
D.4.     be þæm dæle 'to that extent'
D.5.     be ænigum dæle 'at all, to any extent'
D.6.     of miclum / mæstan dæle 'in large part, to a great extent'
E.       in poetic crux
Table 1 Schema for the entry of "dæl" in DOE

35 Up to about 45 the number is not rounded, then it is rounded to the nearest five up to a hundred, and over a
thousand the number is rounded to hundreds.
•
Definition - As the dictionaries in question mainly try to assist users in translation, the
definition by modern equivalent is used where possible by all of them. The smaller
dictionaries like Hall, Mayhew-Skeat and, to some extent, Stratmann-Bradley occasionally add
contextual hints, but generally refrain from analytical definitions to save space.
Bosworth-Toller, MED and DOE use all these strategies, while MED and DOE also divide many
of the definitions hierarchically by numbered senses. Bosworth-Toller, which is more
restricted by space, uses the sense divisions only occasionally. These are then flatly
divided into paragraphs marked by roman numerals, Greek characters and Arabic numerals.
This dictionary also makes frequent use of Latin equivalents and translations. Although
proper names were mostly allowed into the wordlists, encyclopaedic definitions are
generally kept to a minimum. Bosworth-Toller is an exception in this - its long description
of Beowulf with a dozen lines from the poem is just one example of a surprising excess in a
work otherwise understandably "obsessed" with conciseness and brevity.
•
Primary references and Citations - All the selected dictionaries give references to
primary texts, but Hall and Mayhew-Skeat do not provide citations. Stratmann-Bradley
usually gives only a few words to establish the nearest context, while Bosworth-Toller
usually provides complete sentences. Unfortunately, the sentences are often shortened
or otherwise changed for conciseness' sake without acknowledging the
editor's interference (Jenkyns 1991, 411-2). MED and DOE have enough space to
provide proper citations that have been checked against manuscripts. MED dates the
manuscript and the time of composition (if they differ significantly). Both MED and
DOE mark editors' interventions (if such occurred), DOE even adding editor's
comments. The number of citations in these dictionaries is also much greater,
especially with more frequent words. Probably the greatest number of citations is used
in DOE's entry for "beon" (to be) - it goes into hundreds, which would obviously be
impossible in any printed dictionary. The electronic dictionaries also let users
interactively expand the abbreviations (or stencils) of the sources, and MED even
offers access to some of these sources. In addition DOE and Bosworth-Toller occasionally
provide complete translations of the more problematic citations, and if the source
had a Latin gloss, or was a gloss of a Latin text, the Latin version is usually provided.
•
Internal references - All the dictionaries have links to other entries. Most just point
to cognate words (MED does this often instead of providing fuller etymological
information), but Bosworth-Toller and DOE list compounds containing the particular
form as one of the elements.
•
Secondary references - DOE, Bosworth-Toller and Mayhew-Skeat give frequent
references to other dictionaries (esp. NED or OED) and occasionally to grammars or
encyclopaedic sources. Bosworth-Toller and DOE also note surviving forms in Middle
English; Bosworth-Toller occasionally adduces particular authors like Layamon or
Chaucer. Each work gives a full list of secondary sources (and it seems futile to repeat
it here).
•
Comments - DOE, Bosworth-Toller and Mayhew-Skeat give occasional general
comments, like DOE's cautionary notes against poorly documented meanings,
problematic manuscript readings, etc.
The following figure (Fig. 1) tries to capture an overview of the above-listed elements of the
dictionary entries. It shows whether the particular element is used in the dictionary, in what order
the elements follow (from 1-11) and whether they are grouped in a paragraph or regularly repeated
together as a group (a-e). The headword column, which obviously represents the first element in
all the dictionaries, is left out to save space. See Appendix 10.1 for samples of dictionary entries.
[Figure 1: table of microstructural elements by dictionary; the cell values are not recoverable. Legible column
headings: Dictionary, Gram., Occur., Usage, Schema, Definition (Def., Lat. eq., Ref.), Sources (Cit., Transl.,
Lat. gloss, Ed. comm.), Int. ref., Com.]
Figure 1 Microstructural elements (in addition to the usual abbreviations: BT - Bosworth-Toller, MS - Mayhew-Skeat and SB - Stratmann-Bradley)
3.2.2.
Quality & Content
We have so far touched mainly upon the structural and design traits of the dictionaries; to
assess the quality of their content is much more difficult. It is, however, in a way, outside the
scope of the present work.
It has already been pointed out that both DOE and MED surpass the older dictionaries in
quantity of material they encompass for their respective periods. It has also been noted that the
main difference is in the size of their text base. It is, therefore, quite understandable that the
number of words and their citations or spellings will be appropriately larger. The only other
dictionary citing its sources in a manner comparable to DOE or MED is Bosworth-Toller but its
qualitative shortcomings in this respect were commented upon in the preceding section. It is,
however, much more problematic to establish how the resulting differences were reflected in the
quality of definitions. This is also complicated by the fact that we can only compare letters A-F, as
only the respective fascicles have so far been published by DOE.
As there is no space to carry out a proper qualitative analysis, a few entries that might have
been considered problematic by Anglo-Saxon scholarship have been sampled to show how DOE
has capitalized upon the decades of development of the discipline that separate it from Bosworth-Toller
and Hall.
•
acan (verb "to ache") - Not problematic in itself, this word is well attested. Hall gives
only a reference to Ælfric's Grammar & Glossary. Bosworth provides six different sources
and two citations, while Toller adds two more citations from other sources. One of the
two original citations is the gloss by Ælfric "Acaþ mine eagan", which Bosworth-Toller
translates in a straightforward manner as "my eyes ake". DOE lists ten citations, but the
interesting difference is its addition of the glossed Latin word to Ælfric's gloss, "caligo acaþ
mine eagan", which it translates as "'my eyes ache, ? are weak / failing' rendering caligo 'I see
indistinctly / am blind'", thus establishing a second sense of the verb "acan" (to see
indistinctly / to be blind). This difference clearly has no grounds in quantitative superiority
over the two older dictionaries, but rather testifies to careful treatment of well-known
citations by DOE.
•
fag, fah (1,2) (adjectives "hostile" / "coloured") - All the three Old English
dictionaries distinguish these two forms as two different words with two different
spellings each. It is not quite obvious why the words are treated separately. All the
dictionaries recognize the etymological principle as defining for distinguishing homonymy
from polysemy and the etymology of these two words seems to be similar. Perhaps the
reason is the contrast in their meaning and the different surviving forms in Middle English
(foe, faw). Such at least seems to be the situation in Hall and Bosworth-Toller. Both
dictionaries seem to suggest that the form "fah" is primary to the meaning of "hostile"
and "fag" to "coloured", while the reverse relation is secondary. Bosworth-
Toller does not seem to suggest any connection between the words and in fact links only
one variant with its main entry. Hall creates a rather confusing cyclical reference between
the forms, but does not suggest any special kind of relation between the words. DOE
treats both words as having the same headword: "fah", with "fag" as their main variant,
but also points out their delicate relationship by commenting in both entries that "Some of
the citations taken here have elsewhere been taken s.v. fah2 'particoloured' / as fah1
'hostile'; in some instances a deliberate ambiguity may have been intended". Interestingly,
there are no similar citations used in the two entries, though several similar sources are
cited. No matter how the comment is interpreted, it at least notifies the user of a possible
relation between the two words. If the user accepts the note and explores the senses
offered by DOE to some depth, he or she may see that the definition "e" (for the general
meaning of "hostile") is given as "of Satan, someone damned or accursed: at enmity (with
God / society); in poetry" and the definition under "2.b.i" (for the general meaning of
"coloured") as "of people / Grendel / Satan: stained (with sins / crimes / deeds)". This
clearly suggests that the "deliberate ambiguity" may not be achieved merely by
homonymy, but that the semantic fields are close enough to make a valid point for
polysemy as well. Here again we may conclude that although the number of citations in
DOE is far greater than in Bosworth-Toller or Hall, it is especially its careful treatment of
semantics and its awareness of possible relations between the words that makes its
information value superior to the older dictionaries. 36
•
fleotan (verb "to float") - This verb is defined quite straightforwardly by Bosworth-
Toller as "to float, swim" with a number of citations to support this unambiguous
definition, while Hall provides a much broader field of meanings "to float, drift, flow, swim,
sail, skim", with three references to support the whole range. One of the citations in
Bosworth-Toller is of particular interest: Wanderer (l. 50): "Fleotendra ferþ no ðær fela
bringeþ cuþra cwidegiedda", which Bosworth-Toller renders again quite straightforwardly
as "the spirit of seafarers brings there not many known songs". Both the syntactic and
semantic difficulty of "fleotendra" has been noted by many careful readers of the passage
and the published translations differ markedly. Owen analyses the passage in his article
"Wanderer, Lines 50-57" and criticises the more "airy" translations such as Wyatt's ("the
floaters in the air") or Kershaw's ("melt away ... vanish, their spirits ... "), which he feels
are not supported by the meanings of "fleotendra". He argues on the basis of Bosworth-
Toller that "these interpretations are suspicious: the words seem to be confined in Old
English to contexts dealing with water" (Owen 1950, 161-5). This seems to be a circular
definition, because Bosworth-Toller's definitions are based on the same passage that Owen
tries to explain through it. The evidence in Hall is too scarce to suggest whether this
passage has been considered, but it seems that DOE noted the difficulty of this passage
36 Although Klaeber's Glossary to Beowulf has not been included in the previous list of dictionaries and Slade's
electronic compilation of the same Glossary has not been selected for a closer comparison in this chapter, Klaeber's
treatment of this problem should be noted, because the words are closely connected to Beowulf. Klaeber's approach
is similar to Bosworth-Toller and Hall in its treatment of headwords and variants, but it suggests that both forms are
polysemous: "fāg, fāh" with the meanings: (1) "patterned, coloured; variegated, decorated; shining" and (2)
"bloodstained", while "fāh, fāg" with the meanings: (1) "hostile, (foe); in a state of feud with" and (2) "outlawed, guilty".
This, similarly to DOE, may suggest some semantic relations between senses (2) of both words.
and builds on it a second meaning for the verb while summarizing the problem in its
definition as "present participle used as noun: fleotendra ferhþ ? 'spirit of the floating /
fleeting ones' where the sense of the complement is uncertain, probably referring to
imagined companions, or perhaps to seabirds". This indicates that the editors of DOE are
well aware of the discussion and development in Anglo-Saxon scholarship and furnish
their dictionary accordingly.
A similar cursory comparison of the three Middle English dictionaries is difficult, because
Middle English vocabulary is generally better understood and there is a smaller probability of
finding any fundamental differences in lexical semantics. Brand new readings based on a new
interpretation of the lexical material are also less likely and it is therefore more problematic to
assess to what degree MED has reflected the development of Middle English scholarship. We
may note that, for example, Tolkien's new and probably well-founded reading of "eaueres" in
Hali Meiðhad (Tolkien 1925, 331-6) as "horses" apparently escaped MED's editors despite
Tolkien's straightforward conclusion to the effect that "e(o)ver 'boar' should not appear in future
Middle English dictionaries, unless some further occurrence can be adduced". Unfortunately,
MED's entry is nearly identical to that of Stratmann-Bradley (criticised by Tolkien); moreover, it
seems to confirm Tolkien's concerns that the form is a hapax. In fact it gives only those two
occurrences from Hali Meiðhad that Tolkien originally analysed.
By randomly picking a few other entries from MED and by comparing them to the entries in
the two older dictionaries, some minor extensions or slight qualifications in definitions can be
revealed:
•
Thus "dagoun", translated as "a jag of blanket" by Mayhew-Skeat and as "piece of
cloth" by Stratmann-Bradley, which are identically supported by two references
(without citations), is extended in MED and an additional meaning of "a pledget" is
provided with two more citations & Latin glosses "plagelle" & "lichnis". This,
considering the new citations are both from a medical text, may suggest the positive
impact of a text base extended to comprise more technical texts.
•
Similarly "daffe", translated as "fool, idiot" in Stratmann-Bradley and missing in Mayhew-
Skeat (which lists only its probable cognate "daft"), is extended by MED as a surname
and supported by citations from the Rolls.
•
"Bent", surprisingly treated by both Stratmann-Bradley and Mayhew-Skeat as two words,
one meaning "field, hillside" / "a moor, open grassy place", the other "sort of grass" /
"coarse grass, small rushes", is recognized as one word by MED with the primary
meaning of "A type of grass; prob., any of several coarse wild grasses growing in open
fields" extending to personal and place names (supported by citations from the lists of
place names), "field covered with the grass called bent", and finally "battlefield". The
word is thus rightly reanalysed on a broader basis and new senses are added.
•
Sometimes, new etymological connections are suggested as in the case of "akimed",
which Stratmann-Bradley lists under the reconstructed headword of "a-kimen",
translated as "? grow faint", and gives no clue as to its formation except for its
compound character and the MHG 37 cognate "erkumen". MED also suggests the
MHG cognate, spelling it "irqueman", but also ventures on to point out a possible
formation on Old English "ofer-cuman" and accordingly translates the word as
"overcome, dumbfounded".
To conclude, we may say that although the time separating MED from the two older
dictionaries is half the time separating the Old English dictionaries from DOE, MED has also
capitalized on its much broader text base and its analyses are deeper and more detailed.
37 Middle High German.
3.3.
Other features
3.3.1.
Typographical Features and Additional Materials
The typography has an important supportive function in dictionaries - it helps users to find
their way around the entry and, even more importantly, to locate it. These are more difficult tasks
in paper dictionaries and the typography thus has a more important role there, because the paper
dictionaries are, unlike their electronic counterparts, limited by space - both by the total number
of pages and by the constraints of the page itself. Locating the entry is facilitated by two
devices. First the page has to be found and for this purpose all the four paper dictionaries we
have so far considered print running heads at the top of their pages, though none of them uses a
thumb index or a similar device. On the page itself, the headwords are usually highlighted. This is
achieved in all of the dictionaries by printing the headword in bold type; Mayhew-Skeat also uses a
slightly raised font. To both mark the headword and delimit the entry, the text of the entry and
the headword are slightly indented from the neighbouring entries, but more importantly they use
a different indentation at the left margin. Clark-Hall and Stratmann-Bradley both indent the text of
the entry more than the headword so that the headwords stand out slightly to the left, which
makes them more conspicuous to the user. Bosworth-Toller and Mayhew-Skeat use the opposite
approach of indenting the headword more than the text, which serves the same function. The
highlighting effect might be a bit smaller, but the space saved, especially in the long entries of
Bosworth-Toller, is significant. The text of the entries is nearly always printed in one aligned
paragraph, except for a few special entries in Bosworth-Toller that are divided into several
paragraphs and for the subentries in Stratmann-Bradley dealing with derived forms. In the
paragraph itself several techniques are used to visually divide the elements of the entry. The
major division into subentries, which are used in Bosworth-Toller, is achieved by bold roman
numerals preceded by a large space of about five ems 38. The individual senses are divided by colons
38 A typographical unit of measurement equal to the size of the particular font in points.
or semicolons. Bosworth-Toller also divides the general part of the entry from individual citations
by a colon and a dash, while Mayhew-Skeat divides the etymological section from the rest of the
entry by a dash. The rest of the divisions are marked by alternating italics and regular type, while
Bosworth-Toller also makes occasional use of small caps to highlight either the most important
headword (usually the one with the most detailed entry) if there is a sequence of identical
headwords or to mark the most general meaning in a complex entry. There is no common
standard as to which elements should be printed in italics and which in regular type, but Bosworth-
Toller and Stratmann-Bradley adopt a similar approach in printing definitions or translations in italics
and citations and their sources in regular type. Apart from this practice, the choice seems to be
arbitrary. For more information on the marking of individual elements in Bosworth-Toller see chapter
7.1.1. For samples of dictionary entries see Appendix 10.1.
3.3.2.
Electronic Dictionaries
Electronic dictionaries have a very different visual organisation of their entries, because they
display one entry at a time and the entry is only partially constrained by the size of the screen - not only is the displaying space resizable and scrollable, it is also, quite logically, unpredictable.
Finding of the appropriate entries is achieved through the searching or browsing facilities and
is thus independent from typographical devices. Both dictionaries have powerful search engines
that allow users to search in a particular element of their entries separately (for example in
headwords, etymologies or citations) and MED also supports combined search in multiple
elements thanks to its Boolean search feature. Wildcards are supported in both dictionaries, so
that searching for variant spellings, common stems or affixes is possible. Even though the search
features will probably cover most of the users' needs in locating entries, browsing the alphabetical
list can be helpful in some situations. This is unfortunately only supported by DOE.
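The element-restricted wildcard search described above can be illustrated by a short sketch. It assumes a toy in-memory entry format with the fields `headword` and `definition`; these field names, the handful of sample entries (taken from words discussed earlier in this chapter) and the `search` function are hypothetical and do not reflect the actual implementation of DOE or MED. Shell-style wildcards are matched with the standard Python module `fnmatch`.

```python
from fnmatch import fnmatchcase

# Toy entries; the structure and field names are illustrative assumptions.
ENTRIES = [
    {"headword": "acan", "definition": "to ache"},
    {"headword": "fah", "definition": "hostile"},
    {"headword": "fleotan", "definition": "to float, swim"},
    {"headword": "folc-egesa", "definition": "fear, terror (among the people)"},
]


def search(pattern, field="headword"):
    """Return the headwords of entries whose chosen field matches the pattern.

    '*' stands for any run of characters and '?' for a single character.
    """
    return [
        entry["headword"]
        for entry in ENTRIES
        if fnmatchcase(entry[field], pattern)
    ]


print(search("f*"))                      # headwords beginning with "f"
print(search("*an"))                     # headwords ending in "-an"
print(search("*terror*", "definition"))  # search restricted to definitions
```

A Boolean search of the kind offered by MED could be built on top of such a function by intersecting or uniting the result lists of several single-field queries.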
The differences between DOE and MED in their treatment of the entries could be
summarised by saying that MED's approach is much closer to that of the paper dictionaries. Its
authors did not venture very far from the printed format - although they spaced individual
elements of their entries sufficiently, the elements themselves are hardly more transparent
than their printed counterparts. DOE is more successful in this respect - it provides each
definition and citation with a separate line and it also clearly distinguishes citation sources from
their contents by font colour and type. The citations themselves are also more transparent thanks
to differentiating Latin and vernacular by type and by highlighting the occurrence of the form in
whose entry the citations are located. Individual elements in DOE are also easier to distinguish, as
DOE varies font size and background colours for each element. Both dictionaries provide the
user with a quick access to information about the sources through hyperlinked abbreviations
(stencils), but only DOE uses hyperlinks for internal references. MED allows the citations to be
hidden, which may help the users who are looking for the definitions/translations only.
MED is an online application provided through a web browser interface and because of that
its authors have opted for the most universally supported way of displaying the non-standard
characters: replacing them with images. Although the approach is failsafe in delivering a correct
representation of the data, it does not provide for its further efficient processing or
customization - the images cannot be easily copied or resized. DOE on the other hand uses a
proprietary application for search and Internet Explorer for display, but as it is a client-side
application (running on the user's PC, rather than on a remote server), it can safely use its own fonts
and a Unicode encoding that is able to correctly display all the non-standard characters while
giving the user more possibilities in further processing of the data.
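The practical difference between characters encoded as text and characters replaced by images can be shown in a few lines. The sketch below uses only the standard Python module `unicodedata`; the sample word is arbitrary and the selection of characters is merely an illustration of letters commonly needed for Old and Middle English.

```python
import unicodedata

# Letters commonly needed for Old and Middle English texts.
for ch in "æðþƿȝ":
    print(f"U+{ord(ch):04X}  {ch}  {unicodedata.name(ch)}")

# Encoded as text, the characters can be searched, copied and case-folded.
word = "Þæt"
print(word.lower())               # case conversion works on the text itself
print("þ" in word.lower())        # and so does searching
print(len(word.encode("utf-8")))  # three characters, five bytes in UTF-8
```

None of these operations is available when the characters are delivered as images, which is the limitation of the image-based approach noted above.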
4. AN ELECTRONIC DICTIONARY OF OLD AND MIDDLE ENGLISH
In this chapter we will try to go through the specific characteristics of Old and Middle
English dictionaries and perhaps discern the most desirable practices, especially with regard to
electronic versions and the possibility of digitization.
4.1.
Users & Needs
The needs of the users are one of the most important aspects in planning any dictionary and
the authors/publishers of a dictionary should always determine the target group of their product
and the specific needs of this group, before starting their work. The users of the Old and Middle
English dictionaries are differentiated mainly by their level of proficiency in the language of the
respective period and their specialisation or interests (native language being largely irrelevant).
The division in proficiency runs basically between students and scholars, but is, obviously, far
from clear-cut. The division of specialisation lies mainly between literary (reading & translating)
and linguistic interests. Again the division is not clear-cut, or more precisely, the needs of these
interest groups largely overlap.
We may briefly characterise these groups as follows:
4.1.1.
Students and Beginners
The less advanced users are often unable to precisely analyse the morphology of a language they
are not yet fully familiar with. Therefore they need assistance in locating morphologically or
morphonologically complex entries, especially where the form (usually inflected) about which
they are trying to find some information differs from the headword of a corresponding entry,
rendering thus the correct entry unreachable for a beginner.
•
This is especially true about prefixed or compound forms - these either have to be
treated separately and listed alphabetically under their first element, or they have to be
cross-referenced from that location.
•
Also forms unpredictable or difficult to predict without an advanced knowledge of the
language like variant spellings, dialectal forms or irregular grammatical forms should
be widely cross-referenced.
•
The regular forms different from the lemma have to be either analysed by the user, or
in case of an electronic application an automatic lemmatiser could be employed.
•
A clear policy should be applied and explained concerning the cross-referencing as
well as concerning the choice of the headwords, alphabetization, treatment of non-standard characters, abbreviations, etc.
•
In the entry itself, great care should be given to a detailed grammatical description.
Not only should the lemma be clearly classified,39 but more forms of the paradigm should
be noted, especially with irregular items.
•
Good usage notes and a representative selection of citations are also important for
students, who have to rely on the dictionary in choosing the correct translation rather than
on their previous experience or context (which is often being translated with the same
dictionary).
•
The clarity of presentation is also very desirable, as beginners are more easily
daunted by a hard-to-use dictionary.
•
Generally, more attention should be paid to the words of higher frequency, rather
than to the obscure words and the most common meanings should be stressed.
•
Some basic encyclopaedic information may also be appreciated and the most
notable proper names should be included.
•
An electronic application aimed at beginners should feature a headword and a full-text
search, ideally with a built-in lemmatiser, browsing functions and possibly a short
39 For example the stem class for nouns is hardly ever provided, while it is quite difficult to infer the correct
declension without knowing the particular class affiliation. The category of gender, on the other hand, is hardly ever
omitted; yet it is impossible to infer the declension on the basis of the gender alone.
grammar survey. If the organisation of the entries allows, abbreviations could be expanded,
at least optionally.
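The cross-referencing and lemmatisation mentioned in the points above can be sketched as a simple lookup: a form is first checked against the headwords and against an explicit table of irregular forms, and only then stripped of a prefix or an ending. Everything in this sketch is an assumption made for illustration: the tiny headword list, the cross-reference table (only "dyde" as a past form of "don" is taken from the discussion in chapter 3), the handling of the prefix "ge-" and the list of endings. A usable lemmatiser of Old English would require a far more complete morphological description.

```python
# Illustrative data only; a real application would read these from the dictionary.
HEADWORDS = {"acan", "don", "dæl", "fleotan"}

# Irregular or unpredictable forms are cross-referenced explicitly.
CROSS_REFERENCES = {"dyde": "don"}

# A few inflectional endings (hypothetical, far from complete).
ENDINGS = ["um", "as", "es", "e", "a"]


def lemmatise(form):
    """Return the headword for a form, or None if no rule applies."""
    form = form.lower()
    candidates = [form]
    if form.startswith("ge"):
        # Assumption: prefixed words are looked up under their stem.
        candidates.append(form[2:])
    for candidate in candidates:
        if candidate in HEADWORDS:
            return candidate
        if candidate in CROSS_REFERENCES:
            return CROSS_REFERENCES[candidate]
        for ending in ENDINGS:
            stem = candidate[: -len(ending)]
            if candidate.endswith(ending) and stem in HEADWORDS:
                return stem
    return None


for form in ["dyde", "dæle", "gedon", "beon"]:
    print(form, "->", lemmatise(form))
```

In this sketch a form that cannot be resolved yields `None`; in an application for beginners such a case could fall back on a wildcard or full-text search rather than return nothing.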
4.1.2.
Advanced Users and Professional Scholars
Advanced users may undoubtedly profit from the same features as their less experienced
colleagues, but their main interests are usually slightly different - they may have less trouble
finding the appropriate entry, but they require a greater wealth of information. Dictionaries for
advanced users should aim at comprehensiveness and their entries should strive to capture all
the shades of the meaning. They should also provide as much background information as
possible: etymology, morphological analysis, spelling & dialectal variants, references to primary &
secondary sources, internal references, detailed citations, contemporary Latin or other glosses &
equivalents. Advanced users may welcome some more powerful search functions, especially the
wildcard search to facilitate variant search or search delimited to individual entry elements.
4.1.3.
Readers and Translators
Users of this type are mostly interested in the decoding functions of the dictionary. Quality
definitions and a wealth of equivalents are particularly helpful for both understanding and
interpreting, but the basic grammatical information, usage notes and references to the sources are
also important. It may even help to provide paradigmatic information, because the synonyms or
meronyms may suggest new solutions to the translator. Secondary references to modern
dictionaries like OED could also be useful, not least to the non-native speakers of English. In this
respect an electronic application could provide direct links to available sources.
4.1.4.
Linguists
Linguists can benefit from the same features as readers or translators, but they require additional
material, mainly detailed etymological information with cognates in related languages or at least
references to sources dealing with etymology in greater detail. Precise citations with as much
information about the primary sources as possible (manuscript dating, provenance, etc.) are no less
important. The actual occurrence of a word should be summarised by frequency data and a list of variants.
Complex search functions allowing a search for affixed forms or enabling the user to limit the
search by the date and provenance of occurrences are necessary if the best use is to be made of the data
in linguistic research.
4.2.
Type & Character of the Data
Sources of the dictionary data are varied, as we have seen in chapter 2. Other dictionaries or
glossaries are a ready source, but the quality of the data has to be ascertained and legal issues may
arise if data from more recent works are used. The texts are the most authoritative source of the
data, but their processing is not easy - either existing editions have to be used or new
ones have to be created and digitised so that concordances and corresponding citations may be
generated (semi-)automatically. The entries have to be devised mostly manually, but some types
of information may be acquired by combining different sources (such as etymology from an
etymological dictionary, or equivalents in a modern language other than English). That,
however, requires pairing the headwords of the sources, the difficulty of which depends on the
complexity and compatibility of the sources. Similarly, the data may be enriched by including data
from corpora: listing more citations, introducing frequency data, etc. Tagged corpora and a
morphological analyser would be necessary to automate such tasks, though.
With electronic dictionaries, it is necessary to consider how to store the data. In the most
general terms, the decision lies between usability and versatility - an open and "standard" format
that is universally readable and easy to edit paradoxically often requires more processing before it
can be presented to the user than a proprietary format designed for the specific application. This
concerns such issues as the encoding of non-standard characters (it is more difficult to work with
non-standard characters at the development level, but if the non-standard characters are encoded
by standard ones, they have to be decoded at runtime), tagging the entry elements, searchability
(pre-indexing vs. runtime indexing)40 and compactness (compressed data are compact, but more
difficult to work with).
However advantageous it may seem from the above to devise a dedicated format for
each electronic dictionary or application, it should be noted that (a) unlike some other databases,
dictionary data have a long life expectancy (especially in the field of dead languages), while at
the same time (b) the value of the data is increased if they can be easily updated (especially by a large
number of users).
4.3.
Features of the Application
The application represents the interface between the user and the dictionary data. In printed
dictionaries its functions are carried out by the book design and typography, but it also takes over
part of the work that is normally done by the users themselves (like searching). The design of an
electronic dictionary application depends not only on the type of the data presented and the
target group of its users, but also on the general publishing plan. Basically, the application can be
distributed either in an online or an offline version (or both). The rapid development of online
browsing technologies seems to blur the line between these two versions, but some differences remain - the
online version is not tied to the user's copy of the data, but it is limited by the availability of the
internet connection; it also has to be more universal, because the target environment is largely
unknown; and finally it may not be fine-tuned for performance on a specific client device (such as
a PC), but on the other hand it can run partly on the host device.
Whatever the version of the application, most of the following features can and should be
included in both. We have already noted several types of search, which is one of the most
important functions of an electronic dictionary: headword search, full-text search, search by
elements, Boolean search, combined search (search in the previous search results, which allows the
users to filter or limit the results in steps) and the possibility to use wildcards in all these types of
search. This can be further enhanced by the inclusion of a lemmatiser, which may not only
locate the right entries when forms other than a lemma are given on the input, but can also help in
creating internal references between entries, or between the dictionary and other sources. As
there is a large quantity of freely accessible material for Old and Middle English on the internet - such as editions of primary texts, grammars, other lexicographical databases (thesaurus,
etymological dictionary, etc.) - the dictionary should try to refer to it as much as possible, provided the
external material may be deemed trustworthy. This also applies to corpora that may either be
incorporated to serve as the basis for the wordlist and citations, or merely referenced so that
more examples may be found if needed.
40 Generally, there are three possibilities. The first is to pre-index the data, in which case no indexing needs to be
done at run-time and the application is thus fast to start and fast to search. However, it is not possible to change the
indexed data. The second possibility is to index the data each time the application starts, which results in a fast search
and flexibility of the data, but the application may be slow to start. The third option is not to index at all, but that
results in slow search and is thus only advisable where small amounts of data are concerned.
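The search types listed above can be illustrated by a minimal sketch. It assumes a tiny, purely hypothetical set of entries (the names ENTRIES, build_index, headword_search and full_text_search are illustrative only) and shows a headword search with wildcards, a full-text search over an index built at start-up, and a combined search that filters earlier results.

```python
import fnmatch
from collections import defaultdict

# Hypothetical miniature data set: headword -> entry text (illustrative only).
ENTRIES = {
    "cyning": "a king, ruler",
    "cwen": "a woman, wife, queen",
    "cynedom": "royal power, a kingdom",
}

def build_index(entries):
    """Map every word of the entry texts to the headwords it occurs under."""
    index = defaultdict(set)
    for headword, text in entries.items():
        for word in text.replace(",", " ").split():
            index[word.lower()].add(headword)
    return index

def headword_search(entries, pattern):
    """Headword search with shell-style wildcards (* and ?)."""
    return sorted(h for h in entries if fnmatch.fnmatchcase(h, pattern))

def full_text_search(index, word, within=None):
    """Full-text search; `within` restricts it to earlier results (combined search)."""
    hits = index.get(word.lower(), set())
    return sorted(hits if within is None else hits & set(within))

# Built once when the application starts (the run-time indexing option).
INDEX = build_index(ENTRIES)
```

Storing the finished index on disk instead of rebuilding it at every start would correspond to the pre-indexing option; the trade-off is that the stored index has to be regenerated whenever the data change.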
The dictionary itself, if it is extensive and includes a large number of citations, may become a
reliable corpus in its own right; the inclusion of some basic corpus tools such as a concordancer may thus
be of interest to linguists.
If the data are well tagged (tagging systems will be discussed in chapter 7.1.2) so that both
headwords and equivalents (preferably one-word equivalents) are easy for the application to distinguish and handle,
it may be feasible to create a reverse wordlist. It should be noted, however,
that a reverse wordlist is not a fully fledged reverse dictionary; it can only serve as an unreliable
encoding aid in an otherwise decoding dictionary.
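A reverse wordlist of this kind amounts to inverting the relation between headwords and their equivalents. The following sketch assumes that the tagged data have already been parsed into records with a headword and a list of one-word equivalents; the record layout and the sample entries are hypothetical.

```python
from collections import defaultdict

# Hypothetical tagged entries: headword plus its one-word equivalents.
ENTRIES = [
    {"headword": "cyning", "equivalents": ["king", "ruler"]},
    {"headword": "dryhten", "equivalents": ["lord", "ruler"]},
]

def reverse_wordlist(entries):
    """Invert headword -> equivalents into equivalent -> sorted headwords."""
    reverse = defaultdict(list)
    for entry in entries:
        for equivalent in entry["equivalents"]:
            reverse[equivalent].append(entry["headword"])
    return {word: sorted(heads) for word, heads in sorted(reverse.items())}
```

The result only lists where a modern word happens to be used as an equivalent, which is why it cannot replace a genuine reverse dictionary.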
4.4.
User Customization
The user customization may be a powerful option that can turn a general dictionary into a
specialised one or a philological one into a student's dictionary, but it has not been used to its
utmost by publishers, because it is not easy to implement. Customization usually affects
what data are presented to the user and how. It starts with a simple but useful option to change
the size, colour or type of the typeface (as is possible in the DOE) and ends with a selection of the
entry elements to be displayed. The MED allows the user to hide or expand the citations, but it could
also be possible to display only parts of the wordlist based on the provenance of entries or on
their contents (none of the reviewed dictionaries supported such a feature, but it could be very
useful to have the possibility of working only with common words or poetic ones, etc.). Were
such customizations possible, big scholarly dictionaries such as the DOE could easily serve as simple
translating aids for beginners without discouraging them by their complexity. At the same time,
customization is an important element of accessibility, helping users with specific needs (large
typeface for people with weak eyesight, customizable colours for the colour-blind, etc.).
In general, the more customization the application allows, the better, but it should not be
forgotten that overly complicated options may also be discouraging - for that reason it may be
helpful to create groups of sensible customizations that would provide for the most common
needs of the more prominent groups of users (e.g. all data visible in full-screen view for
philologists; or only usage, grammatical information and definitions visible in small font for
translators; etc.).
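Such groups of customizations could be implemented as named presets that select which entry elements are displayed. The sketch below is only an illustration: the element names, the two presets and the sample entry are assumptions, not features of any of the reviewed dictionaries.

```python
# Hypothetical element names and presets; a real application would define its own.
PRESETS = {
    "philologist": {"headword", "usage", "grammar", "definition",
                    "etymology", "citations"},
    "translator": {"headword", "usage", "grammar", "definition"},
}

def render(entry, preset):
    """Return only those elements of an entry that the chosen preset displays."""
    visible = PRESETS[preset]
    return {name: value for name, value in entry.items() if name in visible}

# Illustrative entry with placeholder content.
entry = {
    "headword": "cyning",
    "grammar": "m.",
    "definition": "a king",
    "citations": ["(citation text)"],
}
```

A preset can then serve as the starting point from which individual options are switched on or off, so that users are not confronted with the full set of options at once.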
5. SOURCES FOR DIGITIZATION
We have already discussed sources of lexical data in the previous chapter, but there are more
issues to consider when choosing the source specifically for a digitization project. The target
audience, their needs and the way of delivering the content have to be considered as with any new
dictionary project. However, digitisation essentially means improving the accessibility of existing
data. Therefore a specific set of needs has to be considered:
1. The computer skills of the users - how sophisticated can the application be?
2. Their access to information technologies - is online or offline deployment more
desirable?
3. Their financial means - students may not be able to afford to buy the dictionary or
pay for access to commercial data, and the inclusion of expensive data would thus
be out of the question.
4. Can feedback be expected from the target users & will they expect any feedback
from the digitiser/publisher?
If the target group is very general, it is desirable to make the application as accessible as
possible. Therefore the ideal source of data will be an extensive, accurate and freely available
dictionary.
5.1.
Existing Paper Dictionaries
In this light we can now consider the paper dictionaries listed above - the most extensive and
detailed general dictionaries are those compared in chapter 3: Bosworth-Toller, Stratmann-Bradley,
Hall and Mayhew-Skeat. Borden's dictionary may have a more extensive wordlist, but the detail
covered is poor. The availability of these dictionaries for digitisation and ensuing distribution
depends primarily on their copyright status. It may not be easy to determine whether a work
has entered the public domain or not, but usually it is safe to assume that either the Berne
Convention applies, which sets the length of copyright to 50 years from the death of the author,
or a national law applies, which ordinarily extends the period (the law of the country of
distribution is binding). Most European countries, the USA and Canada have now extended this
period to 70 years after the author's death. This, however, is only true of authorial works. In
the case of an edition, the period is mostly 50 years from publication. Obviously, the problem lies in
ascertaining where editorship ends and authorship begins - this is especially true of
glossaries. For example, Tolkien's Glossary to Sisam from 1922 was made publicly available by
Microsoft and The Internet Archive, which seems to suggest that it is considered an editorial
work. On the other hand, the 4th edition of Hall, supplemented by Meritt in 1960, is still
considered to be copyrighted.
For our purposes the dictionaries in the public domain are Bosworth-Toller's edition from 1921 (or
earlier), Hall's edition from 1916 (and possibly that from 1931 outside the USA; the 4th edition will enter
the public domain in 2010) and any impression of Stratmann-Bradley & Mayhew-Skeat (subsequent
impressions, unlike re-editions, do not renew the copyright). As far as recency goes, Bosworth-Toller
from 1921 seems to be the most up-to-date Old or Middle English dictionary in the public
domain.
It is also important to ponder the contribution that the prospective digitisation may make.
Clearly, if there is a current dictionary covering the same area of language that is of comparable or even
higher quality and at the same time easily accessible, the digitisation may be superfluous. This
may be the situation in the area of Middle English, where the MED is indisputably the most
extensive and detailed general dictionary and is now freely accessible to anybody connected to
the internet, which leaves only little room for a contribution through the digitisation of older
dictionaries.
Although there is also the DOE project for the Old English period, the situation is different: first, the project is not yet close to its end and second, its wide accessibility in the future is not
quite certain. It is quite probable that it will remain a commercial product.
Thus the Bosworth-Toller dictionary from 1921 seems to be the best candidate for digitisation.
5.2.
Supplementing from Other Sources
One dictionary application may work with more than one digitized source - it is possible either
to merge several sources into one, as has already been suggested, or to leave the sources separate.
Pairing the equivalent entries to merge the sources may be laborious, thus leaving the sources
separate seems to be the better approach. However, all the data processed by one application should
be in a compatible format so that they can be searched together and perhaps automatically cross-referenced using an automatic lemmatiser. It would be fruitless to integrate data of too similar
coverage (like Bosworth-Toller and Hall), but having one general and several specialised sources of
data may be worthwhile. Bosworth-Toller, for example, may be amply supplemented by corpora
(DOE's Corpus may be freely accessible), or quality glossaries of particular important texts (like
Klaeber's Glossary to Beowulf) or specialist areas (like medical or botanical glossaries). Last but not
least, the paper dictionary may have its unincorporated paper supplements digitised, which may be
done at a later time (e.g. for copyright reasons). It is important that the user should always be aware
of the provenance of the particular entry, and it should be a part of the customization options to add
or remove sources from the set the user works with.
In online versions, it may be advantageous simply to link to other online sources rather than
incorporate or add them to the application itself. First, other sources may develop independently
and second, it may be possible to link to commercial data, thus giving the users with paid access
the possibility to follow the links, but not denying the rest any freely accessible content. The degree
of usefulness would obviously depend on the willingness of the administrators of the other data to
cooperate. They may be willing to provide their wordlists if it means more visitors to their
website, and the wordlists could be incorporated in the browse & search engine of the
application. If it is impossible to cooperate in this way, but the source is deemed important, it is
possible to include a link that would automatically start a search (e.g. for the currently displayed
entry) in that particular resource, but only for those who have the access rights (this may be a
possible way of linking the free digitised dictionary with commercial projects such as the DOE).
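A link that starts a search in an external resource can be generated from the currently displayed headword. The sketch below is a minimal illustration under stated assumptions: the base address and the name of the query parameter are placeholders, since each external resource defines its own search interface.

```python
from urllib.parse import urlencode

def search_link(base_url, headword, param="q"):
    """Build a URL that starts a search for `headword` in an external resource.

    `base_url` and `param` are placeholders; the real values depend on the
    search interface of the resource being linked.
    """
    return f"{base_url}?{urlencode({param: headword})}"
```

Non-ASCII characters such as "æ" or "þ" are percent-encoded by urlencode, so that headwords containing them can be passed on safely.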
6. PROCESS OF DIGITIZATION OF AN ANGLO-SAXON DICTIONARY BY J. BOSWORTH & T. N. TOLLER
The aim of digitising the Bosworth-Toller dictionary is to produce an easily accessible electronic
version that would retain all the merits of the original work while offering all the ease of use that electronic
data processing can provide. Therefore the dictionary data need to be transformed into an editable
text, the text has to be processed so as to be easily manageable by a dictionary application, which
in turn has to mediate the data to the user.
As has already been mentioned, it is impossible to predict precisely (at the time of the
digitisation) which of the original data in the paper dictionary may prove useful in the future. Thus
the main guideline in the digitisation project is to retain all the information of the original and
leave the decisions about its possible usefulness to the interface or the end user.
The first step in a digitisation project is to select the method of transferring the printed data
into an electronic format. There are two basic possibilities if the result is to be an editable &
searchable (or generally processable) text and not just a digital image: (a) the text can be manually
typed in, which is usually the more precise method and requires virtually no equipment, or (b) the
text can be scanned and software for Optical Character Recognition (OCR) can then be used to
transform the resulting images into a text (semi-)automatically.
6.1.
Scanning
Currently, there are several possible methods of scanning a book and it is upon the factor of
cost-effectiveness that an appropriate approach should be selected, rather than upon the quality
of results, which are comparable (for the purposes of a digitisation of this type).
1. The book(s) can be scanned using cheap equipment such as a flat-bed scanner of a
negligible price, which is, however, a very time-consuming approach. One page can be
scanned in ca. 3-5 min.
2. The sheets of the book(s) can be separated and a scanner with an automatic document
feeder can be used to scan the text automatically. The price of the equipment is ca. $100-300.
The price depends on the speed of the scanner and the number of pages it can be fed
in one batch. The speed of scanning is ca. 3-10 pages per min., with some amount of
manual work involved.
3. The book(s) can also be scanned manually on a digital copy machine that costs about
$1,000-3,000 at a speed of ca. 2-3 pages per min.
4. With the sheets separated and an automatic feeder used, a speed of ca. 10-30 pages per
min. can be achieved with the digital copy machine.
5. There are also dedicated machines for book scanning that combine automatic page turning
devices with fast digital cameras. The price ranges from about $3,500 to $35,000
depending on the format and condition of the fascicules. Using these machines, books
may be scanned completely automatically at a great speed of ca. 20-60 pages per min.
without damaging them.
It should not be difficult to calculate which approach is the most appropriate for the given
project if the number of the pages to be scanned and the cost of the manual labour can be
estimated.
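The calculation can be sketched as follows. The equipment prices and speeds are rough midpoints of the ranges quoted above (the price of the copy machine with a feeder is assumed to equal that of the copy machine alone); the number of pages and the hourly wage passed to the functions are assumptions of the user, and the model simplifies matters by assuming that the equipment is bought and that an operator attends the whole scan.

```python
# Equipment price (USD) and speed (pages per minute), taken as rough midpoints of
# the ranges quoted above; the flat-bed price of 0 stands for "negligible".
SCENARIOS = {
    "flat-bed scanner": (0, 0.25),
    "scanner with document feeder": (200, 6.5),
    "digital copy machine, manual": (2000, 2.5),
    "digital copy machine with feeder": (2000, 20),
    "dedicated book scanner": (19250, 40),
}

def total_cost(pages, equipment_price, pages_per_min, hourly_wage):
    """Equipment price plus labour, assuming an operator attends the whole scan."""
    hours = pages / pages_per_min / 60
    return equipment_price + hours * hourly_wage

def cheapest(pages, hourly_wage):
    """Name of the scenario with the lowest total cost for the given job."""
    return min(
        SCENARIOS,
        key=lambda name: total_cost(pages, *SCENARIOS[name], hourly_wage),
    )
```

Under these assumptions and an assumed wage of $10 per hour, a job of 100 pages favours the flat-bed scanner, whereas a job of 2,000 pages already favours the scanner with a document feeder.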
It is important that the text is scanned in sufficient quality for the OCR to work as
expected. The quality required depends on the size of the letters, but with the most prevalent height of
printed characters (10-12 pts.) 300 DPI (dots per inch) should be enough, which should be
possible in any of the outlined scenarios. If the text is black and white, the images can be scanned
in shades of grey (the resulting files will be smaller and easier to process), but if colours are used in
the original, they should be retained (which is usually not possible with the standard office copy machines). At this step the images should be saved uncompressed or with loss-less compression
only.41 A systematic approach to naming and organising the images can only be recommended: it is the best way to make sure that the book was in fact scanned completely and to keep the
correspondence between image and page numbers clear & consistent.
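With systematic file names, the completeness of the scan can be verified mechanically. The sketch below assumes a hypothetical naming scheme in which each image is called by a batch prefix followed by a four-digit page number (e.g. "b0001.png"); the actual naming used in the project may have differed.

```python
import re

def missing_pages(filenames, prefix, last_page):
    """Return page numbers 1..last_page that have no image named <prefix>NNNN.<ext>."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d{{4}})\.\w+$")
    found = {int(m.group(1)) for name in filenames if (m := pattern.match(name))}
    return [n for n in range(1, last_page + 1) if n not in found]
```

An empty result means that every page of the batch has an image; any number returned identifies a page that still has to be scanned or was misnamed.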
The project of digitisation of the Anglo-Saxon Dictionary was initiated in 2001 by Sean Crist at
Swarthmore College (USA) as a part of a larger Germanic Lexicon Project. His team chose the
third scenario and scanned the whole dictionary between 2001 and 2002 under a Joel Dean grant,
though several pages had to be subsequently re-scanned during the later stages of the project.
In the post-processing stage several steps were taken to improve the subsequent OCR results.
First, as each image usually represents two pages of the original book, depending on the scanning
technique, the images were separated into individual files, one per page. The pages were also straightened because
some were scanned manually and even automatic feeders tend to introduce some rotation.
The OCR program can usually do this during the recognition process, but it may be helpful to do
it at this stage so that the images can be cropped and any unnecessary information removed: stains, hand-written marginal notes, black margins, etc. can cause problems in the OCR software.
The image files were then organised into four distinct batches: (a) front matter, (b) body of the
main volume, (c) front matter of the supplement, and (d) body of the supplement.
6.2.
Automatic Character Recognition
6.2.1.
OCR - Learning & Encoding
There are many different software solutions for OCR processing; unfortunately, there
does not seem to be any free software that would give satisfactory results. Currently the most
popular and accurate products are probably OmniPage (by Nuance), FineReader (by ABBYY; the program used for Bosworth-Toller) and Readiris (by I.R.I.S.). Though each product features its
proprietary methods of recognition, all do essentially the same thing:
41 Uncompressed formats are, for example, MS Windows Bitmap or uncompressed TIFF; loss-less
compression is available in formats like ZIP-compressed TIFF, PNG or GIF (the latter only suitable for greyscale images
due to its limitation to max. 256 colours). Formats like JPG are not suitable, because their compression methods may
hinder the OCR processing.
•
They start by separating individual characters in the image - poor contrast of the
background & letters or blended/merged characters may cause difficulties at this
point.
•
Then they compare each character with the ideal shapes of the alphabet and select the
best match - the typeface is usually recognised automatically.
o
It should be noted at this point that the smaller the number of ideal shapes
considered, the higher is the accuracy. The software usually allows users to
select a language(s) of the document so that the number of shapes can be
limited to the characters of the alphabet(s) of the particular language(s), or it
allows a custom set of characters to be made. The obvious difficulty with Old
and Middle English dictionaries is that the programs naturally do not offer
Old or Middle English as one of the languages and it may be very difficult to
guess all the characters included in a book like Bosworth-Toller. Because many
unthought-of characters usually appear in subsequent phases of the OCR, it
may be worthwhile to partially repeat the OCR process with a supplemented
list of characters.
•
If the program cannot match the character with an ideal shape with some minimal
accuracy/probability, the user is asked to resolve the situation. Any such conflict that
is resolved by the user is used as a training example and other similar characters are
then recognised more easily. Therefore it pays off to spend some time on training the
program. Sufficient training may substitute for the missing characters in the character
set.
•
If the language of the document is recognised by the software or a custom dictionary
of the word forms contained in the document is available, a spellchecker may try to
resolve conflicting situations - those selections are preferred whose result is an
existing word form.
o
As has been noted, the software does not recognise Old or Middle English as
valid language selections, but good results could be gained in this phase by
using a concordance of an Old or Middle English corpus as a custom
spellchecking dictionary. Unfortunately, no such concordance was widely available
at the time of Bosworth-Toller's digitisation.
•
In the final stage, the user is notified about any problematic places in the document.
Again, any manual corrections at this stage may be used as training data by the
software in case of re-processing the document or its parts.
•
Because Bosworth-Toller includes Latin, Greek, Hebrew and runic characters as well as
a wide range of diacritical marks, which would make the list of possible characters
extremely large and the OCR more prone to errors, it was decided not to process the
non-Latin characters through OCR and to leave their digitisation to the stage described in
chapter 6.3.
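The spellchecking step described above can be illustrated by a small sketch. It does not reproduce the method of any particular OCR product; it merely assumes that the recogniser offers several candidate shapes for an uncertain character and that a list of attested word forms (here a hypothetical set called lexicon) is available.

```python
from itertools import product

def resolve(candidates, lexicon):
    """Return the readings of a word that exist in the lexicon.

    `candidates` lists, for each character position, the shapes considered
    possible by the recogniser, the most likely first.
    """
    readings = ("".join(chars) for chars in product(*candidates))
    return [word for word in readings if word in lexicon]

# Illustrative input: the first character may be "p" or "þ".
candidates = [["p", "þ"], ["æ"], ["t"]]
lexicon = {"þæt", "se", "cyning"}
```

If exactly one reading is found, the conflict can be resolved automatically; if none or several are found, the decision has to be left to the user.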
6.2.2.
Data Format
After the OCR is finished, it is necessary to choose a format into which the data can be
exported by the OCR software. In the case of Bosworth-Toller, XML - the Extensible Markup
Language - was aimed at, because it is universally compatible (it can be edited on almost any PC)
and flexible. Unlike HTML42 it allows customization of the markup, and also unlike HTML,
which is effectively a visual markup, XML captures the semantics of the data (this will be
discussed in more detail in chapter 7.1), yet it is simpler and more legible than SGML
(Standard Generalized Markup Language). It should also be noted that the use of XML to store
textual data conforms to the Guidelines of the TEI (the Text Encoding Initiative) (Burnard & Sperberg-McQueen 2004, Introduction).
42 Hypertext Markup Language - the prevalent markup language of World Wide Web - defines how to format a
text and other elements.
However, it is impossible to produce fully semantically tagged XML data just by scanning
the dictionary and applying the OCR. Some of the OCR programs may allow the substitution of
some special characters by HTML or non-standard entities during the export of the data, but
most do not.
6.2.3.
Subsequent Automatic Corrections
Owing to the decision that the data would be stored in a format allowing only standard ASCII43
characters, all non-standard characters were replaced by entities. An entity is a string of
standard characters substituting for one non-standard character. This string has to be delimited by
characters that are not used for any other purpose throughout the text so that the entity may be
recognized automatically and unequivocally by the dictionary application or other software. For
example, the "ash" character (æ) is replaced by the entity "&aelig;" ("&" and ";" are used to
mark the start and the end of an entity respectively and "aelig" - ae-ligature - specifies that this
entity stands for the "ash" character). Some of the entities used had already been standardized by
HTML, but some were established for the purposes of the project alone. It is notable that
new unforeseen characters were still being found and new entities established for them nearly six years
into the project.
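The principle of entities can be shown in a short sketch. The table below contains only three entities known from HTML and is meant as an illustration; it is not the project's actual entity list, and the function names are hypothetical.

```python
import re

# Illustrative subset; the project's full entity list is not reproduced here.
ENTITIES = {"aelig": "æ", "thorn": "þ", "eth": "ð"}
CHARS = {char: name for name, char in ENTITIES.items()}

def decode(text):
    """Replace known &name; entities with characters; leave unknown ones untouched."""
    return re.sub(
        r"&([A-Za-z]+);",
        lambda m: ENTITIES.get(m.group(1), m.group(0)),
        text,
    )

def encode(text):
    """Replace characters that have a declared entity with &name;."""
    return "".join(f"&{CHARS[c]};" if c in CHARS else c for c in text)
```

Decoding is what the dictionary application has to do at runtime before the text is displayed, while encoding is applied when new text is added to the raw data. Leaving unknown entities untouched makes undeclared entities easy to spot.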
Similarly, because the format of the data is plain text (only ASCII characters without any
proprietary formatting), the differences in typefaces in the printed dictionary were retained by
introducing markup tags. The tags are strings delimited, like entities, by dedicated characters. The
tags may form pairs - a start tag and an end tag - and in such cases they define the properties of
the enclosed text: e.g. <b>this text is bold</b> defines the enclosed text as being bold ("<"
and ">" are used to mark the beginning and the end of a tag respectively, "b" specifies that the tag
stands for bold formatting of the enclosed text and "/" denotes the end tag of a pair). Some tags
do not form a pair and thus mark a point of special significance in the text: e.g. <PAGE
NUM="b0001" /> signifies the page break of the first page of the main body of the dictionary.
The tags and entities may look confusing in the raw data format, but if they are properly
interpreted by the dictionary application, they are invisible to the end user.
43 American Standard Code for Information Interchange - a set of 95 printing characters including all letters of
the English alphabet (both upper and lower case) and some other alphanumerical characters. In the early stages of
the project an extended set of characters, the LATIN1 encoding, was also considered, but ASCII was adopted as the
more practical: it was developed in 1963 and became a standard in 1967. It has been the most universal encoding in
the IT industry ever since. This guarantees that the raw dictionary data may be edited on any PC in basically any text
editor without difficulty and that this will remain so in the foreseeable future.
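Whether the tags in a file are declared and properly paired can be checked automatically. The following sketch is an illustration only; the set of declared tags is passed in by the caller, and the regular expression assumes the simple tag syntax shown in the examples above.

```python
import re

# <name ...>, </name> or <name ... /> with a purely alphabetic tag name.
TAG = re.compile(r"<(/?)([A-Za-z]+)[^<>]*?(/?)>")

def check_tags(text, declared):
    """Return a list of problems: undeclared tags and wrongly paired tags."""
    problems, stack = [], []
    for match in TAG.finditer(text):
        closing, name, standalone = match.groups()
        if name not in declared:
            problems.append(f"undeclared tag: {name}")
        elif standalone:
            continue  # an unpaired tag such as <PAGE NUM="b0001" />
        elif not closing:
            stack.append(name)  # start tag: remember it until it is closed
        elif stack and stack[-1] == name:
            stack.pop()  # end tag matching the most recent start tag
        else:
            problems.append(f"unexpected </{name}>")
    problems.extend(f"unclosed <{name}>" for name in stack)
    return problems
```

An empty list means that no problem was found; a check of this kind can be run on each file before it is accepted.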
Some major automatic corrections of errors resulting from the OCR process may also take
place at this point. It was pointed out by Sean Crist that in the OCR process
he carried out on the Bosworth-Toller data, the "thorn" character (þ) was often wrongly recognized as
"P", "p" or "b". It is possible to correct part of these errors automatically, thus saving an
enormous amount of time in the subsequent hand-corrections, by replacing all "P" characters
with "thorns" if the "P" characters are found elsewhere than at the beginning of a word. It should
be noted that this is a dangerous step, and as it is very difficult to predict character occurrence in a
work as diverse and inconsistent as Bosworth-Toller, each general substitution like this should be
well judged. Numerous general substitutions like this were applied (e.g. inserting a space before any
letter directly preceded by a comma), but it seems unnecessary to list them all here. Suffice it to
say that Sean Crist developed scripts carrying out various complex substitutions that corrected over
6,000 characters, admittedly introducing a few random errors but in general saving the
hand-correctors a considerable amount of time.
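Two of the substitutions mentioned above can be sketched with regular expressions. This is not a reconstruction of Sean Crist's scripts, which are not reproduced in this paper; it assumes that the thorn is encoded by the HTML entity name "thorn", and it shows one possible reading of the comma rule.

```python
import re

def fix_thorn(text):
    """Replace a capital P that follows a lower-case letter or an entity by &thorn;."""
    # The lookbehind keeps word-initial "P" (as in abbreviations) untouched.
    return re.sub(r"(?<=[a-z;])P", "&thorn;", text)

def space_after_comma(text):
    """Insert a space between a comma and a letter that directly follows it."""
    return re.sub(r",(?=[A-Za-z])", ", ", text)
```

Only the capital "P" inside a word is safe enough to replace in this way; a thorn misread as "p" or "b" cannot be told apart from genuine letters by such a rule and has to be left to the hand-correctors.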
6.3.
Hand Corrections
The longest and most tedious, but altogether unavoidable part of a digitisation project is the
hand-correcting of the text. Hand-corrections, however, by no means imply corrections in the
original text at this stage. The premise that all information from the original should be digitised is
still valid. The corrections are only made to bring the naturally inaccurate OCRed version of the
text closer to its original.
Simple as the task may sound, it gets more complicated when it is distributed among many
people. The digitised version of Bosworth-Toller consists of more than 30 million characters and it
was obvious from the beginning that correcting such a huge amount of material would be either
very time-consuming or very expensive (or both). Initially, the decision was to try to hand-correct the text without any extra funding and thus the only option was to organise a large
number of volunteer hand-correctors. The system developed to organise the corrections at this
stage was to be very useful in subsequent phases of the project. The volunteer program ran from
2002 till 2005, but even with about half of the correction data supplied by an independent project
of Bekie Marett's44, only about 40% of the whole text was covered after the first three years. In
2005 the project of hand-correcting the remaining part of the text and developing the dictionary
application was supported by a grant from the John Hus Educational Foundation and headed by Jan
Čermák (Faculty of Arts, Charles University in Prague), so that most of the volunteer work could
be replaced by paid participants in the project. Thanks to this grant, the main part of the
hand-corrections was finished in early 2007, although the project had to be moved to a new host
and the system had to be significantly adjusted between 2005 and 2006.
From this experience, it is possible to generalise and conclude that any system organising a
large number of people editing the same complex text has to ensure that:
1.
The skills and equipment required from the individual correctors are kept at
minimum. 45
2.
The correctors produce minimal number of errors or oversights by making their work
as simple as possible.
3.
No conflicts arise over the different versions of the same text.
44 http://clntlt(!ohcrc.nu/ DC! as-bt/
45 Surprisingly, the correctors who had little or no knowledge of Old English and lexicography, and whose work
was thus completely mechanical, seemed less prone to make errors, probably because they did not unconsciously
analyse the text, which might sometimes lead to substituting what the corrector expects in the text for what
the text actually contains. An interesting parallel with the number of errors introduced by (un)skilled copyists into
early manuscripts of Beowulf has been pointed out by Jan Čermák. The difference between the knowledgeable and
unknowledgeable assistants in the digitization of lexicographical material seems to have wider currency, because
it also corresponds to a large extent to Sidney Landau's distinction between Type A and Type B companies (Landau
1989, 274-5).
4. Tags and entities are used correctly (only those declared are used and pair tags are
paired) & basic consistency of the data (like file headers) is kept.
5. The whole system requires minimal administration.
6. As much use as possible is made of the correctors' work, because this is probably the
only point at which the whole of the text can be fully and carefully examined.
The web distribution system developed for the Bosworth-Toller digitisation by Sean Crist at
Swarthmore, and later adjusted for the servers of the Faculty of Arts (Charles University in Prague), tries
to fulfil all these requirements:
1. The correctors need an internet connection only when they reserve, download and
later submit the corrections. The corrections may be carried out offline. The text for
correction is in a plain text format and, as has been already noted, requires no special
software to be edited. The paper dictionary that should serve as the main reference for
the corrections is available as images in two different widely accessible formats or as a
part of a PDF46 file, so that users of any platform should be able to procure free
software for its viewing and printing. The correctors are thus required to compare the
XML code produced in earlier stages with the scanned images and correct any
differences. The correctors are not asked to interpret the meaning of any tags or entities;
they merely correct their occurrence. The tags resulting from this step are thus still just
a way of transcribing the visual formatting of the paper dictionary, rather than
semantic tags.
2. Obviously it is difficult to spot the differences between a text with non-standard
characters printed in a special font and a text where these characters are encoded by
entities: the corrector is for example asked to change the entity "&aelig;" to "&aeligacute;" when the corresponding character in the scanned image is
"ǽ" and not "æ". To
46 The Portable Document Format, created by Adobe Systems in 1993, ensures that documents saved in it look
the same whatever platform or machine is used for viewing.
be able to do this without a significant number of oversights requires a great deal of
concentration and practice. To make the process easier for the correctors, the OCRed
version of the text with the entities replaced by the non-standard characters and with
the formatting tags replaced by the formatting itself (the correctors have to check for
correct formatting as well) is supplied in a PDF file. Thus the corrector is presented
with an easily discernible difference (such as between "ǽ" and "æ") that can be marked
in the "middle" version in the PDF and subsequently located in the raw plain text and
corrected there. See Appendix 10.2 for a few samples of the correctors' materials.
3. Because working with the whole volume of the text would be complicated for the
individual correctors, and to enable them to reserve portions of the text so that only
one corrector may work on one portion of the text at a time, the text was divided
into separate files by pages of the paper dictionary.
4. After a page is corrected, it can be uploaded for automatic validation. The
validation system checks that only the predefined tags and entities were used and that the file
headers are intact (necessary for correctly composing the pages back into one piece),
but it also runs several tests that were used right after the OCR (like checking for
capital letters in the middle of words, etc.) to warn the corrector about possibly missed
errors or about errors introduced during the corrections.
5. This is an extremely important step, especially if a large number of correctors are
working on the project, and it has to be completely automatic so that individual
correctors can go through the whole correction cycle of reservation-downloading-correcting-validation-submission alone, without any help from the project's
administrator.
6. It should also be noted that (a) some differences between the original and the
electronic version still remaining after this stage are intentional: the non-Latin
characters that have not been OCRed are not dealt with in this stage, because it would
require that all of the correctors have a sound knowledge of Greek, Hebrew, etc.
Instead, the correctors just need to spot the non-Latin characters in the original and
place a tag at the appropriate place in the electronic version so that these instances are
easy to find later on. (b) A similar procedure is required if correctors spot an obvious
error in the original (misprint, etc.), except that a special error tag is used, or (c) when a
problem is identified that the correctors cannot resolve, for which case a special
problem tag exists.
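The automatic validation described in point 4 above can be illustrated by a minimal sketch in Python. It is not the project's actual validator: the sets of declared tags and entities below are hypothetical stand-ins, and only three of the checks mentioned in the text are shown (declared tags and entities only, pair tags paired, no capital letters in the middle of words).

```python
import re

# Hypothetical declarations: the project's real tag and entity lists are not
# given in the text, so these sets are illustrative assumptions only.
DECLARED_TAGS = {"p", "b", "i"}
DECLARED_ENTITIES = {"aelig", "thorn", "eth", "amp", "lt", "gt"}

TAG_RE = re.compile(r"<(/?)([A-Za-z][\w-]*)[^<>]*>")
ENTITY_RE = re.compile(r"&([^;\s&]*);")
# A lowercase letter directly followed by an uppercase one, e.g. "berStan".
MIDWORD_CAPITAL_RE = re.compile(r"[a-z][A-Z]")


def validate_page(text):
    """Return a list of warnings for one corrected page (empty if none)."""
    warnings = []
    open_tags = []
    for match in TAG_RE.finditer(text):
        closing, name = match.group(1), match.group(2)
        if name not in DECLARED_TAGS:
            warnings.append(f"undeclared tag: <{closing}{name}>")
        elif not closing:
            open_tags.append(name)
        elif open_tags and open_tags[-1] == name:
            open_tags.pop()
        else:
            warnings.append(f"unpaired closing tag: </{name}>")
    warnings.extend(f"unclosed tag: <{name}>" for name in open_tags)

    for match in ENTITY_RE.finditer(text):
        if match.group(1) not in DECLARED_ENTITIES:
            warnings.append(f"undeclared entity: &{match.group(1)};")

    # Remove tags and entities first so that their names do not trigger the test.
    plain = ENTITY_RE.sub("", TAG_RE.sub(" ", text))
    for match in MIDWORD_CAPITAL_RE.finditer(plain):
        warnings.append(f"capital letter inside a word near: {match.group(0)}")
    return warnings
```

A page that passes returns an empty list, so the corrector can submit it without any help from the administrator; any warning is only a hint and has to be checked against the scanned image.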
After the main hand-corrections are finished, the final round of smaller corrections can start
where higher demands are placed on a smaller number of more skilled correctors. This is the
stage at which the non-Latin characters are transcribed into entities by people versed in the
appropriate languages and errors or problems are checked & solved by those who can deal with
them. Thanks to the correctors' work in the previous stage, the portions of the texts requiring
additional attention are easy to find. This stage of the digitisation of Bosworth-Toller is still in progress
and is expected to be finished in 2007.
6.4. Application Development
Although Sean Crist built an online search function for the continually updated text of the
Bosworth-Toller dictionary straight into the web application that handles the hand-correction
process, it was decided to create a full-fledged off-line version of a dictionary application under
the JHEF grant.
The application is currently being developed by Ondřej Tichý (Faculty of Arts, Charles
University in Prague), and its test version can be found on the CD-ROM attached as
Appendix 10.3 to this paper.
A detailed description of the application's objectives can be found in chapter 4. It is not the
aim of this paper to give a detailed description of the programming process or its documentation.
However, several interesting aspects of the development may be noted here. Among these were:
• Choice of the programming language and platform (operating system) for which the
application would be developed. Several possibilities were considered, mainly the Java
programming language (by Sun Microsystems) - a language notable for its platform
universality, C++ - probably the most popular programming language for desktop
applications, and Delphi (by Borland) - a development tool based on an Object Pascal
language, which is notable for its clear structure and effectiveness. Java's multiplatform
support is usually a trade-off for performance and the resulting application may have
problems running efficiently on slower machines. It seemed more important to aim at
users with cheaper hardware and provide them with a reliable program, rather than to
support a large number of minority platforms at this stage, especially because C++ can
be compiled for various platforms and Delphi offers a possibility to compile for both
Microsoft Windows (natively) and Linux47 (using Kylix and other tools). Of the two
languages, Delphi was chosen, mainly for subjective reasons, but the two languages are
closely related and a switch of the language may yet happen in later stages of
development. The Microsoft Windows operating system was chosen as the native
environment for the application, because of its wide user base.
• Standardizing the data: after almost six years of hand-corrections, many
inconsistencies still infest the dictionary data. Some arise from the
inconsistencies of the original dictionary, some from uncorrected OCR errors and
some were introduced by the correctors themselves. They are often insignificant when
viewed from the perspective of a corrector, lexicographer, or an end-user, but they
cause great trouble to the programmer. Unfortunately, most of the errors may be
eradicated only after the last hand-corrections are finished, and it might seem that it would
have been a better idea to start the development of the application only after the text was
finished and fixed. Our approach may seem less logical, but it transpired that many of
47 A free operating system of the UNIX class.
the errors in the text become apparent only during the application development, when
the inconsistencies start causing serious troubles.
• Displaying all the non-standard characters (though only the non-Latin ones at this
stage) is problematic: there are over 400 different characters in the dictionary. The only
standard encoding that supports such a large number of characters and that is also able
to treat the non-Latin characters fully in the next stage is Unicode, but its spread and
support are still hardly universal. Moreover, there is no font available that would contain
all the ligatures occurring in the dictionary. The application currently uses Junicode,
a free Unicode font designed for medievalists, but many ligatures will have to be
added specifically for the Bosworth-Toller project.
• Another problem posed by the non-standard characters is searching. There are
basically two possibilities. Either the text can be loaded by the application in its plain
text XML format, transformed into a formatted text with non-standard characters in
Unicode and then worked with (e.g. searched), or it can be worked with in its XML
form and transformed only when portions of it are to be displayed to the user. As it
has been a successful policy throughout the project to retain the data in plain text
as long as possible, the application tries to respect this approach and the search runs
over the XML data, while the user's input can be either in non-standard Unicode
characters or in XML entities. This decision was also based on the fact that the search
routines are not yet well developed for Unicode. However, a wildcard search over
XML data of questionable consistency is difficult. It is for example difficult to predict
the length of words when a large number of entities are involved, some of them possibly
erroneous (and thus unpredictable).
• Alphabetical sorting also causes problems in the application's support of browsing
functions. The sorting is applied to a wordlist that is automatically generated when the
data is loaded at the application's start, and the wordlist is first transformed into
Unicode, because it would be very difficult to sort the entities and the standard
characters alphabetically in the same list. However, the sorting routines usually do not
expect non-standard characters in their alphabetical order. Moreover, it has not yet been
clearly decided whether to keep the original alphabetical ordering or to use a different
one when incorporating the supplemental data, because the main body and the
supplement use different alphabetical orders. Currently, the user may choose whether
to stick with the original ordering or re-sort the wordlist, but re-sorting is still
experimental.
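The sorting problem mentioned in the last point can be illustrated by a short sketch in Python (not the language of the application itself). It assumes that the wordlist is already in Unicode and that the desired alphabetical order is given explicitly. The alphabet used here, with "æ" after "a" and "þ" and "ð" after "t", is a hypothetical choice made only for the illustration; it is not claimed to be the order of either the main body or the supplement.

```python
import unicodedata

# Hypothetical alphabet: the ordering below is an illustrative assumption,
# not the ordering actually used by either volume of the dictionary.
ALPHABET = "aæbcdefghijklmnopqrstþðuvwxyz"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def sort_key(headword):
    """Map a headword to a list of ranks; hyphens and accents are ignored."""
    key = []
    for ch in headword.lower():
        # Decompose accented letters such as "ú" into base letter + accent
        # and keep only the base letter.
        base = unicodedata.normalize("NFD", ch)[0]
        if base in RANK:
            key.append(RANK[base])
    return key


words = ["þing", "tun", "æfter", "abbod", "bana", "úp"]
print(sorted(words, key=sort_key))
```

A default sort by character code would place "æfter", "úp" and "þing" after all the plain Latin headwords; with an explicit key, switching between two orderings amounts to switching between two alphabet strings.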
Notable implemented functions at this stage include: the browsing function with browse-as-you-type support (when the user types in characters, the wordlist scrolls accordingly), a full-text
search with partial wildcard support ("*" stands for any number of any characters and "?" stands
for any one character), the display of the page number on which the current entry is located in
the printed dictionary and the possibility to display the scan of the original page of the dictionary,
so that the user can easily check the accuracy of the transcribed version. The non-standard
characters can be either entered using the predefined input buttons in the application itself or the
entities can be used instead. Obviously, there are also alternative methods, such as customized
keyboard definitions or copying and pasting, that can be used to the same effect.
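How a wildcard search can run directly over entity-encoded data may be sketched as follows. This is a Python illustration under stated assumptions, not the application's Delphi code: every entity of the form "&name;" is treated as one character, the user's pattern is assumed to be already converted to entities, and the function names are invented for the example.

```python
import re

# One "character" of the data is either a whole entity such as "&thorn;" or a
# single ordinary character that is not part of the markup.
UNIT = r"(?:&[^;\s&]+;|[^\s<>&])"


def wildcard_to_regex(pattern):
    """Translate a "*" / "?" wildcard pattern into a compiled regular expression."""
    parts = []
    for token in re.findall(r"&[^;\s&]+;|.", pattern):
        if token == "*":
            parts.append(UNIT + "*")   # any number of any characters
        elif token == "?":
            parts.append(UNIT)         # exactly one character or entity
        else:
            parts.append(re.escape(token))
    return re.compile("".join(parts))


def search(pattern, words):
    """Return the words that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [w for w in words if regex.fullmatch(w)]


words = ["fare&thorn;", "faran", "far", "f&aelig;r"]
print(search("f?r", words))
```

Because "?" stands for one unit rather than one byte of text, the length of a word no longer depends on how many entities it contains. An erroneous entity that lacks its semicolon would still break the match, which is the kind of inconsistency referred to above.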
The application does not currently support any kind of searches based on the individual
elements of the entry or Boolean searches. These require the semantically tagged data, which is
one of the future tasks of the project. For screen-shots of the application, short instructions and
the application itself, see Appendix 10.3.
7. FURTHER DEVELOPMENT
There are still two major tasks to be dealt with in this project: the tagging of the data and the
development of a morphological analyser, but it should be noted that the application is already
usable as a replacement of the paper dictionary in its current state.
7.1. Tagging the Data
As has already been mentioned, the dictionary data are currently tagged according to the
visual formatting of the printed dictionary. To increase the value of the data for future use, this
tagging has to be transformed and enhanced into a semantic tagging. The following table tries to
illustrate how the two tagging systems differ and at the same time it presents Sean Crist's
cursory proposal of a possible semantic tagging for Bosworth-Toller.
type of tagging: (original)
example of an entry:
a-berstan; p. -bearst, pl. -burston; pp. -borsten; [a, berstan] To burst, break, to
be broken; perfringi. v. for-berstan.

type of tagging: visual
example of an entry:
<p>
<b>a-berstan;</b> <i>p.</i> -bearst, <i>pl.</i> -burston; <i>pp.</i>
-borsten; [a, berstan] <i>To burst, break, to be broken;</i> perfringi. v.
for-berstan.
</p>

type of tagging: semantic
example of an entry:
<entry>
<headword>a-berstan;</headword>
<pos>p.</pos> <inflected-form>-bearst,</inflected-form>
<pos>pl.</pos> <inflected-form>-burston;</inflected-form>
<pos>pp.</pos> <inflected-form>-borsten;</inflected-form>
<etym>[a, berstan]</etym>
<gloss lang="en">To burst, break, to be broken;</gloss>
<gloss lang="latin">perfringi.</gloss>
<xref>v. for-berstan.</xref>
</entry>

Table 2 Sean Crist's proposed tagging system
What seems to be evident from this example is that the visual tagging can serve well to
reproduce the original typography of the printed dictionary, but not for any advanced automated
processing. The problem is that the visual tagging is ambiguous. In the example entry, the
portions of the text tagged by "<i></i>" (italics) are parts of the grammatical information as
well as the modern English equivalent. The untagged portions of the text are the inflected
forms, the Latin equivalent and a reference. The headword seems to be unambiguous here, as the
only bold part of the entry, but that is not necessarily so in other entries, where bold can signify
subentry divisions. Similarly, the paragraph divisions ("<p></p>") usually divide the entries,
but sometimes they help to format a complex entry.
The current version of the application disambiguates some of these tags by considering their
relative occurrence so as to be able to distinguish individual entries and their headwords. Thus,
for example, the bold text following a paragraph division can be more or less safely expected to
be a headword. This is, however, more complicated with other entry elements and quite
impossible to fully automate with all of them.
7.1.1. Microstructural Analysis & Definition of Tags
A detailed analysis of Bosworth-Toller's microstructure has already been carried out in chapter
3.2.1. The system of sense subdivisions was, however, not mentioned at that point and should be
described for the present purpose now. The senses may be subdivided on at least three levels: by
Roman numerals in bold (e.g. "I."), Arabic numerals in brackets (e.g. "(1)") and Greek characters
in brackets (e.g. "(α)"). The Roman numeral usually follows the general variants, grammatical
information and definition (if these elements are defined as general, and not with particular
senses only), but the occurrence of the Arabic numerals and Greek characters is less predictable.
They only appear if the Roman numerals are used, but then the Arabic numerals or the Greek
characters may be used directly or one after another. If we combine all the elements depicted in
Fig. 1 with these subdivisions, we may try to produce a complete set of semantic tags and
exemplify it on a fake (shortened, with a few elements added) entry for "ofer-faran".
The line breaks, indentations and bold typeface are irrelevant: they are used here to make the
entry more transparent, as well as to distinguish the tags. Some of the tags proposed by Sean
Crist were retained, but the system was extended and slightly changed.
<entry>
  <headword>ofer-faran</headword>
  <variant>ofer-færan</variant>
  <pos type="verb">verb. </pos>
  <definition>A word expressing all kinds of crossing</definition>
  <infl_form pers="1" num="sg" tense="pres">ic oferfare</infl_form>
  <sense level="1" num="1">I.
    <pos type="v_intrans">intrans. </pos>
    <equivalent lang="eng">To pass, go off; </equivalent>
    <equivalent lang="lat">transeo :-- </equivalent>
    <citation>Ælþeódiglice ic oferfare</citation>
    <gloss lang="lat">peregre transeo, </gloss>
    <source name="AElfc. Gr." place="38">AElfc. Gr. 38 ; </source>
    <source name="Som." place="41, 28">Som. 41, 28.</source>
    <citation>Oferfare on munt swa swa spearwa</citation>
    <gloss lang="lat">transmigra in montem, </gloss>
    <source name="Ps. Spl." place="10. 1.">Ps. Spl. 10. 1.</source>
  </sense>
  <sense level="1" num="2">II.
    <pos type="v_trans">trans. </pos>
    <sense level="2" num="1">(α)
      <gloss lang="en">to pass, cross (a river, boundary, etc.) :-- </gloss>
      <citation>Ic Iordane eft ongean oferfare mid twam floccon, </citation>
      (...)
    </sense>
    <sense level="2" num="2">(β)
      (...)
    </sense>
  </sense>
  <etym src="Piers P.">[Piers P. </etym>
  <etymon>faren, fare: </etymon>
  <etym src="OSax">O. Sax. </etym>
  <etymon>faran ]</etymon>
  (...)
  <xref target="faran">DER. faran</xref>
</entry>
This example tried to illustrate how to retain the appearance of the original dictionary, but at
the same time enrich it semantically. With entries tagged like this one, it would not only be
possible to search through individual elements, but because the tags have meaningful parameters,
the user could for example list all nouns, or all Old Saxon etymons. Such searches or filters could
be combined with full-text search so that for example all verbs ending in "-ian" could be
displayed. It would also be much easier to link the internal references directly and either display
the full bibliographical information about the sources or link them as well (if available). The
original appearance of the dictionary does not need to be retained either, and the semantic tags can
form the basis of a more transparent and customizable format, as has been suggested in chapters
4.3 and 4.4.
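The kind of filtering that semantic tags would make possible can be sketched with the Python standard library. The three entries below are invented miniatures that merely follow the tag names proposed above (headword, pos, etym, etymon); they are not real dictionary data, and the helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented sample data that only imitates the proposed tagging.
SAMPLE = """
<dictionary>
  <entry>
    <headword>lufian</headword>
    <pos type="verb">verb.</pos>
  </entry>
  <entry>
    <headword>faran</headword>
    <pos type="verb">verb.</pos>
    <etym src="OSax">O. Sax.</etym>
    <etymon>faran</etymon>
  </entry>
  <entry>
    <headword>stan</headword>
    <pos type="noun">noun.</pos>
    <etym src="OSax">O. Sax.</etym>
    <etymon>sten</etymon>
  </entry>
</dictionary>
"""


def headwords(root, pos_type, ending=""):
    """List headwords of a given part of speech, optionally filtered by ending."""
    result = []
    for entry in root.iter("entry"):
        pos = entry.find("pos")  # the entry-level <pos>, not those inside senses
        head = entry.findtext("headword", default="").strip()
        if pos is not None and pos.get("type") == pos_type and head.endswith(ending):
            result.append(head)
    return result


def etymons(root, source):
    """List etymons that directly follow an <etym> with the given src value."""
    found = []
    for entry in root.iter("entry"):
        children = list(entry)
        for current, following in zip(children, children[1:]):
            if (current.tag == "etym" and current.get("src") == source
                    and following.tag == "etymon"):
                found.append((following.text or "").strip())
    return found


root = ET.fromstring(SAMPLE)
print(headwords(root, "verb", "ian"))
print(etymons(root, "OSax"))
```

The second function relies solely on the fact that an etymon follows its source, so it would fail silently wherever that order is not kept in the data.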
Several questions still remain open. They will require a more detailed analysis and may be
resolved only during the process of tagging. Should for example the inflected forms be
grammatically interpreted inside the tags as their properties - that is, should the verb or noun
categories be specified as the parameters of the tag? This might be useful, but it would require a
lot of additional work. Or should the citations, sources and glosses be numbered, so that they can
be connected together easily? However, there are often more sources for one citation. And
similarly, should etymons and the source of the etymons be somehow connected or is the fact
that they follow each other enough?
7.1.2. Tagging Process
Due to the ambiguous nature of the current tagging and the inconsistency of the data, fully
automatic re-tagging is impossible. However, it may be feasible to devise some rules by which a
large amount of current tags could be disambiguated. These may include:
• "<p><b>" stands for "<entry><headword>"
• "<i>" is either "<pos>", "<gloss>" or "<etym>"
    o the text enclosed by "<pos>" is predictable (a part-of-speech tag)
    o "<etym>" is always between "[" and "]"
    o the rest has to be "<gloss>"
• untagged text is either "<infl_form>", "<definition>", "<equivalent lang="lat">",
"<citation>", "<source>" or "<etymon>"
    o "<infl_form>" is preceded by "</b>", ";" or "</pos>"
    o "<etymon>" is between "[" and "]"
    o the text enclosed by "<source>" always contains Arabic numerals
    o "<equivalent lang="lat">" is always followed by ":--" or "<source>"
    o "<citation>" is always followed by "<gloss>" or "<source>"
    o the rest has to be "<definition>"
• "</b>" if not followed by "<pos>", "<gloss>" or "<infl_form>" is "</b><variant>"
• the text directly following "<sense>" is predictable: Roman numerals enclosed by
"<b>", Arabic numerals or individual Greek characters, both enclosed by "(" and ")"
Although this is just a rough draft and most of the end tags are not treated, it amply
demonstrates how many semantic tags can replace the current ones through an automated
procedure. There are, however, several problems involved: (a) general inconsistency, (b) author's
comments that tend to spring up at unexpected places, though usually enclosed by square
brackets, (c) properties of etym, pos, xref, etc. that can be derived only partially from the
enclosed text so that manual editing would be unavoidable.
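A few of the rules drafted above can be tried out in a small Python sketch. It covers only the headword rule and the three-way split of italics, and it closes the entry and headword tags, which the draft leaves untreated. The list of part-of-speech abbreviations is a hypothetical stand-in for a list that would have to be compiled from the dictionary itself.

```python
import re

# Hypothetical list of part-of-speech abbreviations, for illustration only.
POS_ABBREVIATIONS = {"p.", "pl.", "pp.", "v.", "adj.", "adv."}

ITALIC_RE = re.compile(r"<i>(.*?)</i>", re.DOTALL)
BRACKET_RE = re.compile(r"\[[^\]]*\]")
HEAD_RE = re.compile(r"<p>\s*<b>(.*?)</b>", re.DOTALL)


def retag_italics(entry):
    """Replace each <i>...</i> by <pos>, <etym> or <gloss>."""
    bracket_spans = [m.span() for m in BRACKET_RE.finditer(entry)]

    def replace(match):
        text = match.group(1)
        inside_brackets = any(start <= match.start() and match.end() <= end
                              for start, end in bracket_spans)
        if text.strip() in POS_ABBREVIATIONS:
            tag = "pos"       # predictable part-of-speech abbreviation
        elif inside_brackets:
            tag = "etym"      # italics between "[" and "]"
        else:
            tag = "gloss"     # the rest
        return f"<{tag}>{text}</{tag}>"

    return ITALIC_RE.sub(replace, entry)


def retag_entry(paragraph):
    """Apply the headword rule and the italics rules to one paragraph."""
    entry, replaced = HEAD_RE.subn(r"<entry><headword>\1</headword>",
                                   retag_italics(paragraph), count=1)
    if not replaced:
        return paragraph      # no "<p><b>": leave the paragraph for manual editing
    return re.sub(r"</p>\s*$", "</entry>", entry)


visual = ('<p><b>a-berstan;</b> <i>p.</i> -bearst, <i>pl.</i> -burston; '
          '<i>pp.</i> -borsten; [a, berstan] <i>To burst, break, to be broken;</i> '
          'perfringi. v. for-berstan.</p>')
print(retag_entry(visual))
```

Paragraphs that do not begin with bold text are returned unchanged, which reflects problem (a) above: inconsistent data have to be left for manual editing rather than guessed at.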
7.2. Morphological Analyzer
Another task connected with the project which would certainly enhance its value, but which
could also stand independently, is the development of a general morphological analyzer of Old
English. The function of such a tool is to analyze an Old English word form on the input and
provide (a) a corresponding lemma, (b) grammatical categories of that particular word form and
(c) all inflected word forms of the lemma. This is in fact what a dictionary user normally does
(and has to do to use the dictionary effectively), but it requires some knowledge of the language
and especially its grammar. Not only would many beginners appreciate help of such a tool in
searching for a translation of a word form they found in their seminar texts, it could also become
a basis for new features of the dictionary. First, not all dictionary entries contain inflected forms
of the particular word and it may be difficult to determine the grammatical categories of a
searched item without the inflected forms to guide the user, thus it may be helpful to see all the
inflected word forms. Second, if a lemma can be easily derived from any word form, it is possible
(a) to connect external (lemmatised) data with dictionary entries and thus, for example, to create
an automatic glossary and (b) to connect any specimen of Old English in the dictionary with
external resources like other dictionaries (DOE), editions of Old English texts or corpora.
Usefulness of such connections has already been discussed in chapter 4.
However, development of such a tool is an ambitious enterprise. Because we do agree with
Sidney Landau's comment that "[d]esigners of computer programs invariably say that they can do
whatever one needs to be done" and especially that "[t]his should be taken as a philosophical and
rhetorical comment rather than as a practical statement of intention" (Landau 1989, 275), we will
first try to identify the major factors that may complicate the development of this tool:
• Old English is an inflectional language with a rich introflectional subsystem and it is
thus impossible to pair any word form with its lemma on the basis of a cursory
comparison, as is possible in the case of isolating languages. That it can be done, however, has
been demonstrated by such tools as the Czech lemmatiser AJKA (Sedláček and Smrž,
2001).
• There were many Old English dialects and many orthographical standards, hence also
many variant spellings and dialectal forms that further complicate the affiliation of
forms to paradigms and lemmata.
• Our knowledge of Old English grammar is necessarily fragmentary and some word
forms can only be inferred.
7.2.1. Stemmer
The first step in developing an analyser is the production of a stemmer tool. The stemmer
first needs to identify all grammatical morphemes (both inner and outer flection) in the form and
remove them or replace them with wildcard characters. A simple example may be the form
"fareþ", where the stemmer needs to identify the "-eþ" ending and produce the form "far*". To
identify an ending or a case of inner flection, the stemmer needs a list of all possible grammatical
morphemes of Old English and a list of all acceptable stems. If a list of stems is not available, but
a list of all acceptable lemmata is, a lemmatiser may be built instead of, or as an extension of, the
stemmer.
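The "fareþ" example can be reproduced by a very small Python sketch. It handles outer flection only (inner flection and the check against a list of acceptable stems are left out), and its list of endings is a hypothetical fragment chosen for the illustration, not a list of Old English grammatical morphemes.

```python
# Hypothetical and far from complete list of endings, for illustration only.
ENDINGS = ["eþ", "aþ", "est", "an", "e"]


def stem(form, endings=ENDINGS):
    """Replace the longest matching ending with the wildcard "*"."""
    for ending in sorted(endings, key=len, reverse=True):
        # Never strip the whole form: at least one character must remain.
        if form.endswith(ending) and len(form) > len(ending):
            return form[: -len(ending)] + "*"
    return form  # no known ending found


print(stem("fareþ"))
```

Trying the longest endings first prevents a short ending such as "-e" from being stripped where a longer one such as "-est" applies.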
7.2.2. Lemmatiser
A lemmatiser differs from the stemmer in that its final product is a lemma, not a stem.
Because we have a wordlist of the dictionary and possibly a list of the variants, which could be
combined into a list of lemmata, it seems more sensible to build a lemmatiser directly. We do not
have a list of all the possible grammatical morphemes readily available and there seem to be at
least two ways of producing it. We could either try to build it upon our knowledge of Old
English grammar and word-formation (deductively) or we could (inductively) take a
grammatically tagged corpus, have a concordance made in which the forms of the same word are
grouped together into paradigms, and then by contrasting all the forms with each other within the paradigms we
could get a tentative list of grammatical morphemes. Such a list would, however, also include a
number of differences that have nothing to do with grammar. It seems that it would be profitable
to combine the two approaches, but it is difficult to predict the results of this process before it is
tried out. A third list, of the "lemma forming" morphemes (such as "-an" for some verbs), also needs
to be prepared.
With the three lists, the lemmatiser's run may be expressed by a simplified flow chart as
depicted on the following page (Fig. 2):
[Flow chart; the box texts recoverable from the figure, in reading order. The decision boxes branch into "Yes" and "No".]
• Does the form [rest of the box text not recoverable]
• Does any morpheme from the list match an appropriate string(s) in the form?
• Output form(s).
• Strip the form of any matching morphemes and add corresponding lemma forming morphemes.
• Does the form(s) match any form(s) in the lemma list? Yes: Output form(s).
• Vary the root vowel of the form(s) (spellings).
• Does the form(s) match any form(s) in the lemma list? Yes: Output form(s).
Figure 2 Lemmatiser Flow Chart
This chart is obviously very simplified; it does not, for example, specify how to strip the
morphemes or vary the root vowels. Such particulars need to be devised after the lists are
completed, because such decisions depend on the character of the lists. For example, if the
lemma list does not contain prefixed items, prefixes should be stripped, but not if the prefixed
items are in the list. Similarly it does not explore the possibility of stripping word-formation
affixes, which may be a necessary step in lemmatising input forms.
More detailed analysis into the possible results of such procedure will also be necessary. If the
spelling variation on the input is too wide, the lemmatiser may be unable to generate any
satisfactory output (none at all, too many forms, etc.).
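One possible reading of the flow chart can be written down as a short Python sketch. The order of the steps (direct lookup, stripping endings and adding lemma forming morphemes, varying the root vowel) is an interpretation of the figure, and all three lists as well as the vowel alternations are hypothetical miniatures; the real lists do not exist yet.

```python
# All lists below are hypothetical miniatures for illustration only.
LEMMATA = {"faran", "lufian", "mann"}
ENDINGS = ["eþ", "aþ", "est", "e", "a", ""]   # "" stands for "no ending"
LEMMA_FORMING = ["an", "ian", ""]
# Root-vowel alternations to try (inner flection and spelling variants).
VOWEL_VARIANTS = {"æ": ["a"], "e": ["a"], "o": ["a"]}


def candidates(form):
    """Strip each matching ending and add each lemma forming morpheme."""
    found = []
    for ending in ENDINGS:
        if form.endswith(ending) and len(form) > len(ending):
            base = form[: len(form) - len(ending)]
            for suffix in LEMMA_FORMING:
                if base + suffix not in found:
                    found.append(base + suffix)
    return found


def vary_root_vowel(form):
    """Replace the first vowel that has a listed variant."""
    for i, ch in enumerate(form):
        if ch in VOWEL_VARIANTS:
            return [form[:i] + v + form[i + 1:] for v in VOWEL_VARIANTS[ch]]
    return []


def lemmatise(form):
    """Return the lemmata that the form may belong to (possibly none)."""
    if form in LEMMATA:                      # the form is already a lemma
        return [form]
    forms = candidates(form)
    hits = [f for f in forms if f in LEMMATA]
    if hits:
        return hits
    varied = [v for f in forms for v in vary_root_vowel(f)]
    return [v for v in dict.fromkeys(varied) if v in LEMMATA]


print(lemmatise("fareþ"), lemmatise("menn"))
```

Even this toy version shows the risk mentioned above: with longer lists every step multiplies the number of candidate forms, so the output for a single input may range from nothing to many competing lemmata.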
7.2.3. Morphological Generator
Suffice it to add that if the list of endings and affixes can be enhanced to provide the possible
grammatical categories that each of the grammatical affixes (endings) can express, it could be
used as a basis for a tool that would generate all possible forms of a paradigm. Clearly only some
endings may be combined with some bases. This information may also be derived from a
grammatically tagged corpus, but the practical output of such a tool is difficult to predict without
any further analysis.
It is not an aim of this paper to give detailed solutions of this sort, but rather to provide ideas
for future development of the Bosworth-Toller dictionary.
8. CONCLUSION
This paper aimed to survey the field of Old and Middle English dictionaries with a view to
digitising some of them. It attempted to choose a suitable candidate, discussed what
the resulting electronic dictionary should look like and tried to go through all the necessary steps
of implementing this. As these steps have actually been followed, a functional electronic version
of the Bosworth-Toller dictionary has been created. Hopefully, the extensive survey of the field and
the theoretical description of the digitisation may prove useful to other researchers, but it may be
that the merit of this paper should be judged mainly by the usefulness of the dictionary produced.
In its two final chapters, the paper tried to suggest some future directions of development
that the digitisation project may take, and some cursory proposals have been made about some of
the future phases of the project. The two proposed phases that were described in a greater detail
- the semantic re-tagging and the construction of the morphological analyser - were also
submitted for a grant support to the Grant Agency of Charles University (GAUK).
The dictionary itself is attached to this paper and can be readily tested on any PC equipped
with MS Windows 2000/XP/Vista or compatible. If a newer version has been produced
meanwhile, it should be available at the webpage of the dictionary application,48 where all future
versions will be freely available to anybody in need of an Old English dictionary.
9. BIBLIOGRAPHY:
"Anglo-Saxon Literature" The North American Review, Vol. 47 (1838), pp. 90-138
Bosworth, Joseph. An Anglo-Saxon dictionary, based on the manuscript collections of the late
Joseph Bosworth. Ed. T. Northcote Toller. Oxford: Oxford University Press, 1838-1972
Birrell, T. A. "The society of antiquaries and the taste for old English 1705-1840"
Neophilologus. Vol. 50, No. 1. (Jan., 1966), pp. 107-117
Bright, James W., "Glossary" An Anglo-Saxon Reader, edited, with notes, a complete glossary, a
chapter on versification, and an outline of Anglo-Saxon Grammar. New York: Henry and
Holt, 1912, pp. 241-385
Bright, James W., "Review of A Handy Anglo-Saxon Dictionary: Based on Groschopp's Grein by James
A. Harrison; W. M. Baskervill" The American Journal of Philology. Vol. 6, No. 4. (1885),
pp. 493-495.
Brook, G. L. "Review of Anglo-Saxon Reader in Prose and Verse by H. Sweet; C. T. Onions" The
Review of English Studies, Vol. 25, No. 99. (Jul., 1949), pp. 282-283
Browne, Wm. Hand. "Notes on Morris and Skeat's 'Specimens of Early English'" Modern
Language Notes, Vol. 7, No. 5. (May, 1892), pp. 133-134
Burnard, Lou and C. M. Sperberg-McQueen. TEI Lite: An Introduction to Text Encoding for
Interchange. TEI, 2004 <http://www.tei-c.org/Lite/teiu5_split_en.html>
Campbell, A. "Review of A Short Dictionary of Anglo-Saxon Poetry in a Normalized Early West-Saxon
Orthography by J. B. Bessinger" The Review of English Studies. New Series, Vol. 13, No.
52. (Nov., 1962), pp. 436-437
Calder, Daniel G. "Review of A Guide to Old English by Bruce Mitchell; Fred C. Robinson"
Speculum, Vol. 59, No. 2. (Apr., 1984), pp. 416-419
Chase, Frank H. "Review of A Concise Anglo-Saxon Dictionary for the Use of Students by John R. Clark
Hall" Modern Language Notes. Vol. 10, No. 2. (Feb., 1895), pp. 50-52
C. L. W. "Review of Altenglisches Etymologisches Wörterbuch by F. Holthausen" The Review of English
Studies. Vol. 10, No. 38. (Apr., 1934), pp. 242-244
Conner, Patrick W. "Review of A Thesaurus of Old English, 1: Introduction and Thesaurus; 2: Index. by
Jane Roberts; Christian Kay; Lynne Grundy" Speculum, Vol. 73, No. 3. (Jul., 1998), pp.
887-889
d'Ardenne, S. R. T. O. "Review of Early Middle English Verse and Prose by J. A. W. Bennett; G. V.
Smithers" The Review of English Studies, New Series, Vol. 19, No. 74. (May, 1968), pp.
183-186
Egge, Albert E. "Review of Specimens of Early English. by Richard Morris" Modern Language
Notes, Vol. 1, No. 5. (May, 1886), pp. 65-68
Einarsson, Stefan "Review of Early Middle English Texts. by Bruce Dickins; R. M. Wilson" Modern
Language Notes, Vol. 68, No. 8. (Dec., 1953), pp. 575-576
Eliason, Norman E. "Review of A Handbook of Middle English. by Fernand Mossé; James A.
Walker" Modern Language Notes, Vol. 69, No. 2. (Feb., 1954), pp. 135-138
Ellis, Michael. "Old English Lexicography and the Problem of Headword Spelling." ANQ. Vol. 6
Issue 1. (Jan., 1993), pp. 1-9
Fisher, George P., Timothy Dwight and William L. Kingsley (eds.). "Review of Hand-book of
Anglo-Saxon and Early English by Hiram Corson." The New Englander. Vol. 30, No.
114. (Jan., 1871), pp. 552-3
86
Garnett, M. James. "Review of An Anglo-Saxon Dictionary by Joseph Bosworth; T. Northcote Toller and
A New English Dictionary on Historical Principles by James A. H. Murray" The American
Journal of Philology, Vol. 5, No. 3. (1884), pp. 359-366
Garnett, M. James. "Review of An Anglo-Saxon Dictionary, Based on the Manuscript Collections of the
Late Joseph Bosworth, D.D., F.R.S. by T. Northcote Toller, A Concise Anglo-Saxon Dictionary
for the Use of Students by John R. Clark Hall and The Student's Dictionary of Anglo-Saxon by
Henry Sweet" The American Journal of Philology. Vol. 19, No. 3. (1898), pp. 323-328
Garnett, M. James. "Review of An Anglo-Saxon Reader in Prose and Verse by Henry Sweet" The
American Journal of Philology, Vol. 4, No. 3. (1883), pp. 332-338
Gummere, Francis B. "Review of An Anglo-Saxon Reader. by James W. Bright" Modern Language
Notes, Vol. 7, No. 5. (May, 1892), pp. 149-151
Hahn, Reinhard F. "Dictionaries, Lexicons & Thesauri" Beginners' Guides to Online Language
Materials. Lowlands-L, 2007
<http://www.lowlands-l.net/offline_eng.php#thesauri>
Hill, Thomas D. "Review of Word-Hoard: An Introduction to Old English Vocabulary" Speculum, Vol.
53, No. 4. (Oct., 1978), p. 786
Hetherington, M. Sue. "The Recovery of the Anglo-Saxon Lexicon" Anglo-Saxon Scholarship.
Eds. Carl T. Berkhout and Milton McCormick Gatch. Boston: G. K. Hall & Co., 1982,
pp. 79-90
Jacobs, Nicolas. "Review of A Book of Middle English by J. A. Burrow and Thorlac Turville-Petre"
The Review of English Studies, New Series, Vol. 45, No. 180. (Nov., 1994), pp. 545-547
Jenkyns, Joy. "Review: The Toronto Dictionary of Old English Resources: A User's View" The Review
of English Studies, New Series, Vol. 42, No. 167. (Aug., 1991), pp. 380-416
J. M. G. "Review of A Concise Dictionary of Middle English from A. D. 1150 to 1580 by A. L.
Mayhew; Walter W. Skeat" The American Journal of Philology, Vol. 10, No. 1. (1889), p.
99
Knott, Thomas A. "Review of A Concise Anglo-Saxon Dictionary by John R. Clark Hall" Modern
Philology, Vol. 15, No. 1. (May, 1917), p. 64
Landau, Sidney. Dictionaries. The Art and Craft of Lexicography. Cambridge: Cambridge
University Press, 1989.
Macdonald, A. "Review of Early Middle English Texts by Bruce Dickins; R. M. Wilson." The
Review of English Studies, New Series, Vol. 4, No. 16. (Oct., 1953), p. 404
Magoun, F. P. Jr. "Review of A Concise Anglo-Saxon Dictionary by John R. Clark Hall" Speculum.
Vol. 7, No. 2. (Apr., 1932), pp. 286-289
Magoun, F. P. Jr. "Review of Altenglisches Etymologisches Wörterbuch by Ferdinand Holthausen"
Speculum. Vol. 8, No. 1. (Jan., 1933), pp. 94-96
Malone, Kemp. "Review of Middle English Dictionary by Hans Kurath; Sherman M. Kuhn"
Language, Vol. 29, No. 2. (Apr. - Jun., 1953), pp. 204-208
McSparran, Frances (editor-in-chief). The Middle English Compendium. Michigan: University of
Michigan Digital Library Production Service, 2001-6
<http://ets.umdl.umich.edu/m/mec/hdp.html>
Meritt, H. D. Fact and Lore About Old English Words. Stanford: Stanford University Press,
1954, p. vii
Mitchell, Bruce and Fred Colson Robinson. A Guide to Old English. Blackwell Publishing, 1982
Mitchell, Bruce. "Review of Sweet's Anglo-Saxon Reader in Prose and Verse by Dorothy Whitelock"
The Review of English Studies, New Series, Vol. 19, No. 76. (Nov., 1968), pp. 415-416
Nicholson, Peter. "Review of A Book of Middle English. by J. A. Burrow; Thorlac Turville-Petre"
Speculum, Vol. 69, No. 1. (Jan., 1994), pp. 115-117
Orrick, Allan H. "Review of A Grouped Frequency Word-List of Anglo-Saxon Poetry" Modern
Language Notes, Vol. 70, No. 6. (Jun., 1955), p. 438
Owen, W. J. B. "Wanderer, Lines 50-57" Modern Language Notes, Vol. 65, No. 3. (Mar., 1950),
pp. 161-165
Samuels, M. L. "Review of An Anglo-Saxon Dictionary by Joseph Bosworth; T. Northcote Toller
Enlarged Addenda and Corrigenda to the Supplement by Alistair Campbell" The Review of
English Studies, New Series, Vol. 25, No. 97. (Feb., 1974), p. 111
Sedláček, Radek and Pavel Smrž. "Automatic Processing of Czech Inflectional and Derivative
Morphology." FI MU Report Series. June 2001. Brno: Faculty of Informatics, Masaryk
University.
Stanley, E. G. "Review of A Guide to Old English, Revised with Texts and Glossary by Bruce
Mitchell; Fred C. Robinson" The Review of English Studies, New Series, Vol. 36, No.
141. (Feb., 1985), p. 141
Tolkien, J. R. R. "The Devil's Coach-Horses: Eaueres" The Review of English Studies, Vol. 1,
No. 3. (Jul., 1925), pp. 331-336
Ward, Allan. "Review of Word-Hoard. An Introduction to Old English Vocabulary by S. A.
Barney; E. Wertheimer; D. Stevens" The Review of English Studies, New Series, Vol. 29,
No. 115. (Aug., 1978), pp. 329-330
Wardale, Edith E. "Anglo-Saxon Studies" Year's Work in English Studies. Vol. 1, No. 1. (1919), pp. 32-38
Wardale, Edith E. "Anglo-Saxon Studies" Year's Work in English Studies. Vol. 2, No. 1. (1920), pp. 41-53
Wilson, R. M. "Old English Literature" Year's Work in English Studies. Vol. 51, No. 1. (1970),
pp. 59-77
Wilson, R. M. "Review of A Handbook of Middle English by Fernand Mossé; James A. Walker" The
Review of English Studies, New Series, Vol. 5, No. 17. (Jan., 1954), p. 107
Woolf, H. B. "Review of Poetry and Prose of the Anglo-Saxons: Dictionary by Martin Lehnert"
Language, Vol. 32, No. 4. (Oct. - Dec., 1956), pp. 766-769
10. APPENDICES
10.1. Samples of dictionary entries
All the samples show the entry corresponding to the Modern English verb "to ache".
ACAN; ic ace, ðú æcest, æcst, he æceþ, æcþ, pl. acaþ; p. óc, pl.
ócon; subj. ic, ðú, he ace; pp. acen; v. n. To AKE, pain; dolere :-- Gif
mannes midrif [MS. midrife] ace if a man's midriff ake, Herb. 6; Lchdm.
i. 88, 11: Herb. Cont. 3, 6; Lchdm. i. 6; 3, 6. Acaþ míne eágan
my eyes ake, Ælfc. Gr. 36, MS. D; [mistiaþ .... acaþ, Som. 38, 48]; dolent
mei oculi, Mann. [Laym. p. oc: R. Glouc. p. ok: Chauc. ake: N. L. Ger.
aken, æke?]
Figure 3 "ACAN" - an entry from An Anglo-Saxon Dictionary of J. Bosworth and T. N. Toller
acan⁶ to 'ache,' suffer pain. Æ.
Figure 4 "acan" - an entry from A Concise Anglo-Saxon Dictionary of J. R. C. Hall
acan
1. of parts of the body: to ache
1.a. in impersonal construction: it aches
2. acað mine eagan 'my eyes ache, ? are weak / failing' rendering caligo 'I see indistinctly / am blind'
[The attested spellings, the illustrative citation (TrinHom) and the cross-references of this entry are illegible in this copy.]
Figure 5 "acan" - an entry from A Dictionary of Old English Project on CD-ROM, A-F
Aken, v. to ake, to throb with pain, C2,
S2, Prompt., NED; eken, MD; ȝaik,
NED; oc, pt. s., MD; ok, MD; oke, MD,
NED; akide, NED; oken, pt. pl., PP.-AS. acan, pt. óc, pp. acen; cp. Icel. aka, to
drive, Lat. agere. Cf. Ache.
Figure 6 "Aken" - an entry from A Concise Dictionary of Middle English by Mayhew & Skeat
aken, v., O.E. acan [? allied to O.N. aka, drive,
Lat. agere]; ache: aken & smerten HOM. II.
207; akin 'doleo' PR. P. 8; ake HOM. I. 149;
MISC. 95; FER. 1831; LUD. COV. 232; akeþ
(pres.) REL. I. 111; LANGL. A vii. 243; þet
heaved me akþ AYENB. 51; his bones akeþ
SHOR. 2; þine banes akeð þe H. M. 31; aken
CH. C. T. B 2113; æke, eke (pres. subj.) A. R.
360, 368; ekinde (pple.) A. R. 360; ōc (pret.)
LAȜ. 6707; HOM. II. 21; ok ROB. 68; ook
S. A. L. 42; oken, oke LANGL. B xvii. 194; ōke
(pret. subj.) P. L. S. xx. 66; deriv. ache.
Figure 7 "aken" - an entry from A Middle English Dictionary by Stratmann & Bradley
aken (v.) Also eken. Forms: ppl. aking, akinde, ekinde; p. ok, hock, eok, akide.
[OE acan, ōc.]
1. To ache, be painful. (a) of parts of the body; (b) of a wound or sore; ~ also with personal obj., esp. in early ME;
(c) in proverbs.
2. (a) Without subj.: hire ok, she felt pain; (b) in Lat. constr.
[The dated quotations illustrating each sense are illegible in this copy and are not reproduced.]
Figure 8 "Acan" - an entry from an online version of the Middle English Dictionary by McSparran et al.
10.2. Samples of correctors' materials
The following are samples of the materials that the correctors worked with in the Bosworth-Toller
project: first the scan of the original, second the "middle" version (interpreted OCR) and third
the raw data that had to be corrected. There are 13 errors in the randomly selected 10-line
sample. There are 180 such lines per dictionary page, and over 2000 dictionary pages were
hand-corrected. The sample is from page 695 of the main volume (part of the entry for "molde").
bæd ðæt hyre man sumne dǽl ðære hálwendan moldan (pulveris) sealde,
Bd. 3, 11; S. 536, 5-8: 3, 10; S. 534, 23, 29. Ða ðe for hund wintrum
mid eorþan moldan (pulvere terrae) bewrogene wǽron, L. Ecg.
P. iv. 66; Th. ii. 226, 23. Ðonne hit (cadaver) biþ on ða byrgenne
set, ðonne wyrpeþ man moldan ofer hit, L. Ecg. C. 36; Th. ii. 162, 3.
His þegnas mid moldan hit (a cross) gefæstnedon adgesto a militibus
pulvere, terrae figeretur, Bd. 3, 2; S. 524, 19. Be moldan ða ðe on
ðære stowe genumene wǽron, 3, 9; S. 533, 27.
II. ground, earth,
land :-- Molde vel land humus, rus, arvum, Wrt. Voc. i. 41, 61: humus,
70, 12: Ælfc. Gr. 8; Som. 7, 53. Of ðære moldan tyrf from the grass
[Second sample: the "middle" version (interpreted OCR) of the same lines. It is reproduced as an image in the original and is not legible in this copy.]
bæd ðæt hyre man sumne dæ-acute;l ðære hálwendan
moldan «I>pulveri</I> s) sealde, Bd. 3, 11 ; S. 536, 5-8 : 3, 10; S. 534,23,29. Da ðe for hund
wintrum mid eorban moldan <I>(pulvere terras)</I> bewrogene wæ-acute;ron, L. Ecg. P. iv.
66; Th. ii. 226, 23. Donne hit «I>cadaver</I» bi&thom; on da byrgenne set, ðonne
wyrpe&thom; man moldan ofer hit, L. Ecg. C. 36; Th. ii. 162,3 .. His &thom;egnas mid moldan hit
«I>a cr</I> oss) gefsestnedon <I>adgesto a militibus pulvere, terrae figeretur,</I> Bd. 3, 2 ; S.
524, 19. Be moldan ða ðe on dære stowe genumene wæ-acute;ron, 3, 9 ; S. 533, -.
7. n. <I>ground, earth, land</I> :-- Molde <I>vel</I> land <I>humus, rtts, arvum,</I> Wrt.
Voc. i. 41, 61 : <I>humus,</I> 70, 12 : Ælfc. Gr. 8; Som. 7, 53. Of ðære moldan
tyrf <I>from the grass
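The figures quoted in the introduction to this appendix allow a rough estimate of the scale of the hand-correction. The following sketch simply extrapolates the error rate of the single 10-line sample to the whole dictionary; it assumes that the sample is representative, which a sample this small cannot guarantee, and it takes "over 2000 pages" as exactly 2000, so the result is only an order-of-magnitude lower bound.

```python
# Figures taken from the text of this appendix.
errors_in_sample = 13      # errors found in the sample
lines_in_sample = 10       # randomly selected lines
lines_per_page = 180       # lines per dictionary page
pages_corrected = 2000     # "over 2000" in the text, so a lower bound

# Extrapolation (assumption: the sample is representative of the whole text).
errors_per_line = errors_in_sample / lines_in_sample
errors_per_page = errors_per_line * lines_per_page
total_errors = errors_per_page * pages_corrected

print(f"{errors_per_line:.1f} errors per line")
print(f"about {errors_per_page:.0f} errors per page")
print(f"at least about {total_errors:,.0f} errors in the corrected pages")
```

Under these assumptions the correctors would have faced a few hundred errors per page, which is consistent with the need for several teams and a dedicated system for collecting the corrections.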
10.3. Bosworth-Toller on a CD-ROM
Here are the basic user instructions for the electronic application as they were made available
on the application website:49
Installation - Simply extract the contents of the .zip files anywhere on your disk,
preferably in an empty folder. Any common extraction utility can be used, e.g. the free
ZipGenius.50 Optionally install the supplied font by copying the "Cardo98s.ttf" file from
the "font" folder into the Windows fonts folder (usually C:\windows\fonts).
Start the dictionary by running the "BT_01.exe" file.
Use - The application is designed to be easy to use. Note that the window of the current
version can be freely resized. Here are the basic things you can do with the application:
• Scroll the wordlist using the scrollbars and display a particular entry by
clicking its headword in the wordlist. Note that the list is not ordered
alphabetically, but rather as it was in the original dictionary, so don't forget to
check the supplemental entries at the end of the wordlist. The automatic
alphabetical sorting has been temporarily disabled (it was buggy and needs to be
reconsidered).
• Browse as you type - the application automatically scrolls the wordlist as
you type in the search box and displays the first matching word.
• Visually the entry looks almost like the printed original, but it allows you to
select portions of the text and copy and paste them anywhere you like.
[Screenshot of the application window: the wordlist and a displayed entry]
49 http://lexicon.ff.cuni.cz/app/instructions.htm
50 http://www.zipgenius.it/ (the rest of the address is illegible in this copy)
• Full-text search - any string you enter is looked up anywhere in the text of
the dictionary when you click the "Fulltext" button or hit "ENTER". If the
"browse as you type" feature slows your typing, click the "Search" tab to have it turned
off.
The text you enter may match any text in the dictionary, including parts of
words. Thus if you enter "ing" you will get all the words including this string
(e.g. "searching", "findings", or "ingenious"). If you are looking for whole words
only, include spaces before and after the searched text. An individual character
can be substituted with "?" so that searching for "p?ay" will match both "play"
and "pray". HTML entities can also be used to search for special characters, so
that "d&aelig;l" will match "dæl".
The results are displayed under the "Search" tab as a kind of filtered wordlist.
The matched strings are highlighted in the entries, though the current version
does not highlight text found using the entities or wildcards.
[Screenshot of the application window: full-text search results]
• Special characters - can be entered using the buttons and a drop-down
menu in the upper right corner. Standard HTML entities (such as &aelig;) can be
used in full-text search.
[Screenshot of the application window]
• Accessibility - has been on our mind; font, size and colours can be easily
changed at any time, using the "Font" dialog under the "View" menu. Note that
some characters may not be displayed correctly in some fonts. Our default font
is Cardo .98.51
[Screenshot of the application window: the "Font" dialog]
51 http://scholarsfonts.net/cardofnt.html
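The full-text search behaviour described in the instructions above (substring matching, a "?" wildcard for a single character, and HTML entities for special characters) can be illustrated with a short sketch. This is not the code of the application itself, whose implementation is not shown here; the function names and the tiny sample wordlist are hypothetical, and case-sensitive matching is an assumption.

```python
import html
import re


def build_pattern(query: str) -> re.Pattern:
    """Compile a user query: entities are decoded, '?' matches any one character."""
    decoded = html.unescape(query)  # e.g. "d&aelig;l" becomes "dæl"
    parts = ["." if ch == "?" else re.escape(ch) for ch in decoded]
    return re.compile("".join(parts))


def full_text_search(entries: dict[str, str], query: str) -> list[str]:
    """Return the headwords of all entries whose text contains the query."""
    pattern = build_pattern(query)
    return [head for head, text in entries.items() if pattern.search(text)]


# Hypothetical miniature "dictionary": headword -> entry text.
SAMPLE = {
    "play": "to play a game",
    "pray": "to pray for rain",
    "dæl": "dæl, a part",
    "king": "a king rules",
}

print(full_text_search(SAMPLE, "p?ay"))       # wildcard search
print(full_text_search(SAMPLE, "d&aelig;l"))  # entity search
print(full_text_search(SAMPLE, " ing "))      # whole-word search via spaces
```

Because the query is matched as a plain substring, surrounding it with spaces restricts the hits to whole words, exactly as the instructions suggest; the price is that a word at the very beginning or end of an entry, or one followed by punctuation, would be missed.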
11. CZECH SUMMARY
Introduction
The aim of this paper is to outline the possibilities of digitizing dictionaries of Old and Middle
English and to describe the procedure of such a digitization on a concrete project whose output
will be a digitized dictionary.
The introduction presents some basic arguments for the digitization of dictionaries in general,
such as easier work with the dictionary or new ways of using dictionary data, and then arguments
for digitizing dictionaries of the older periods of the English language in particular, such as better
accessibility or the possibility of involving the dictionaries in the teaching of and research into the
history of English.
Survey of materials
First the division of dictionaries and the classification used in the paper are outlined. The range
of lexicographical resources worth considering for digitization is then delimited.
The paper states that it will not take into account works that are thematically too narrowly
defined, nor works of very small scope and depth. Specialized dictionaries, glossaries of
individual works or authors, and general glossaries with fewer than roughly three thousand
entries are therefore not listed.
Within this range the author attempts to find all the available resources, to obtain detailed
bibliographical data and, where applicable, links to electronic versions, and if possible also to
characterize these works briefly (as a rule on the basis of an examination of their contents and of
reviews in the scholarly press).
In this way a complete list of the lexicographical resources of Old and Middle English is compiled,
divided into printed and electronic resources and according to the periods the individual resources
deal with. In total, over fifty different sources of data were gathered and described, primarily
printed dictionaries and glossaries.
Comparison of selected dictionaries
From the list described above the author selected four dictionaries which, on the basis of their
scope and the quality of their treatment, appeared to be the most promising for subsequent
digitization and which at least partly represented the breadth of the lexicographical material of
Old and Middle English. To these two Old English and two Middle English printed dictionaries the
newest electronic resources for each of the two periods were added, although one of them, The
Dictionary of Old English (University of Toronto), is not yet finished. The author then compared
these six resources in terms of both macrostructure and microstructure:
• The comparison of macrostructure focused mainly on the organization of the word list, which
in the description of the older periods of English is complicated by a number of characters that
do not occur in Modern English, by the choice of the headword (one has to choose from many
dialects and orthographic traditions), and by the treatment of compound and prefixed words
(a typical problem is where in the word list to place words with frequent prefixes).
• At the microstructural level the organization of the entries was considered (the order of the
individual elements, their differentiation, etc.), and the quality of the content was compared as
well. A detailed comparison of the content was beyond the scope of this paper, so the author
focused instead on how the individual lexicographers reflected developments in the research
on the relevant period of the language and how they used their sources.
• The paper further dealt with typography and formal features, which can play an important role
in digitization, and it also compared the features specific to electronic dictionaries, such as
searching, manner of presentation, work with the data, etc.
On the basis of the comparison the paper states that the new electronic dictionaries surpass their
predecessors both in the volume and in the treatment of the data, but that the difference from the
standard printed dictionaries is not as large as might be expected given the different possibilities
available at the time of their creation.
Given the demands of preparing a new dictionary, the number of potential users, and the fact that
the Toronto team has so far published only the part A-F of its dictionary, it is not surprising that
the more than one-hundred-year-old An Anglo-Saxon Dictionary of Bosworth and Toller is still
considered the standard dictionary of Old English today.
Possibilities of an electronic dictionary of Old and Middle English
Since the goal of digitization is a dictionary that serves its users better than the original, or at
least enables a wider use of the lexicographical material, the paper also focused on the features
that an electronic dictionary can offer. First the target groups of users were defined, since the
future dictionary should be designed according to their needs. In the case of dictionaries of Old
and Middle English the users differ mainly in how advanced they are in the study of the given
period of the language and in the purpose for which they use the dictionary (translation,
philological research, etc.). The paper states that in the case of a general dictionary intended for
all groups of users the possibilities of an electronic application should be used above all for a
clear presentation of the data, which should not, however, be achieved at the expense of the
quality of the information. It is therefore important to give the users as many ways of working
with the data as possible and at the same time to let them change the manner of presentation
according to their specific and current needs.
The possibility of combining data from several sources is also emphasized, as are the ways of
searching or browsing the data that are specific to different types of users.
Sources for digitization
In the selection of sources the paper recommends using both the existing printed resources and
electronic ones (for example corpora). At the same time it points out that a number of limitations
have to be kept in mind, both technical (linking a dictionary with a corpus need not be entirely
trivial, depending on the character of the data) and financial and legal (licensing matters have to
be dealt with from the very beginning, since the status of the individual sources can influence both
the digitization process and the future users). A further limitation is the technical ability of the
future users, to which the electronic application has to be adapted.
The digitization of An Anglo-Saxon Dictionary of Bosworth and Toller
Considering that the new Middle English dictionary, recently made available free of charge,
surpasses the older dictionaries of the given period in many respects, and on the basis of the
preceding analysis, the author believes that the best candidate for digitization is the Old English
dictionary of Bosworth and Toller. The description of its digitization consists of parts dealing with:
• scanning (the conversion of the printed text into digital images) - this part of the digitization
was carried out by Sean Crist and his team at Swarthmore College in the United States;
• automatic recognition of characters from the digital images (a process in which the image
information is automatically converted into text) - this part was also carried out by Sean Crist's
team and was of great significance for the future character of the dictionary, because it
involved, among other things, deciding how the dictionary data would be stored and formatted;
• hand-corrections of the recognized text - this part was divided among several teams, and
several dozen correctors took part in it. Apart from the corrections themselves, a sophisticated
system for collecting and processing the data had to be created; it was first hosted and
administered on the servers of Swarthmore College and later on the servers of the Faculty of
Arts of Charles University in Prague;
• development of the electronic dictionary application - the application is the interface between
the user and the data, and many of the features of the future dictionary depend on this very
interface.
At the end of the description of the digitization, whose results are now freely accessible to anyone
on the internet,52 the paper states that although the original goal was achieved, namely both the
theoretical and the practical treatment of the digitization process, the resulting data should be
processed further so that they can be put to the best possible use in teaching and research.
52 http://lexicon.ff.cuni.cz/app
Further development
The last chapter deals with two main directions of development that would be worth pursuing in
the further processing of the data obtained.
The first is the re-tagging of the data. The tagging used in the current version of the data
corresponds to the printed dictionary and thus makes it possible to present the electronic
dictionary in a form close to the original, but it is less suitable for further electronic processing.
Above all, the individual parts of the dictionary entries would have to be distinguished more
unambiguously than the original typography does. To this end the paper proposes a
comprehensive system of tags and recommends a basic procedure for at least a partial
automation of the possible re-tagging.
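A partially automated re-tagging step of the kind mentioned above might look like the following sketch. It is only an illustration under stated assumptions: the tag names `<hw>` and `<def>` are hypothetical and are not the tag system proposed in the paper, the sample entry is a simplified made-up string, and the rule that the first italic run after the headword is the definition is a deliberately naive heuristic.

```python
import re


def retag_entry(raw: str) -> str:
    """Wrap the headword and the first italic run of an entry in semantic tags.

    Assumptions (hypothetical): the headword is everything before the first
    semicolon, and the first <I>...</I> run after it is the definition.
    """
    head, sep, rest = raw.partition(";")
    if not sep:
        return raw  # no headword separator found: leave the entry untouched
    # Replace only the first typographic italic run with a semantic tag.
    rest = re.sub(r"<I>(.*?)</I>", r"<def>\1</def>", rest, count=1)
    return f"<hw>{head.strip()}</hw>;{rest}"


# Simplified, made-up entry in the typographic markup of the raw data.
print(retag_entry("molde; f. <I>ground, earth</I> :-- Molde <I>vel</I> land"))
```

Entries that do not fit the heuristic are returned unchanged, so that they can be collected and tagged by hand; this is one way of keeping the automation partial rather than risking silent mis-tagging.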
The second is the development of a tool for the morphological analysis of Old English, which
should enable, for example, the automatic lemmatization of user input; this would to a large
extent resolve the dilemmas connected with compiling and ordering the word list and would
considerably ease the user's work. The analyser could further be used to link the dictionary more
easily with other sources of data or to partially automate the creation of glossaries.
The paper describes the possible functioning of the analyser and proposes a general solution
similar to the already existing analysers of inflectional languages. At the same time it draws
attention to the probable problems connected with the low degree of dialectal and orthographic
standardization of Old English.
It is recommended to create a list of Old English lemmata (e.g. from the word list of the
dictionary), a list of grammatical morphemes, and a list of lemma-forming morphemes (partly
from a corpus, partly by hand on the basis of a detailed analysis of Old English morphology). On
the basis of these lists the process of obtaining the stem (or stems) from the word entered by the
user should be turned into an algorithm, followed by the conversion of the stem into possible
lemmata. The last step could then be the generation of all the items of the given paradigm, which
would supplement the incomplete morphological information of the original dictionary.
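The procedure recommended above (a list of lemmata, a list of grammatical morphemes, a list of lemma-forming morphemes, stem extraction, and conversion of stems into candidate lemmata) can be sketched in a few lines. The three lists below are tiny hypothetical samples chosen for illustration, not data from the project, and the sketch handles only suffixation: stem-vowel alternation (e.g. the preterite of strong verbs), prefixes and spelling variation are not covered.

```python
# Hypothetical miniature resources; a real analyser would derive them from
# the dictionary word list and from a corpus.
LEMMATA = {"acan", "stān", "molde"}
GRAMMATICAL_ENDINGS = ["um", "as", "es", "an", "e", "a", ""]
LEMMA_FORMING_ENDINGS = ["an", "e", ""]


def candidate_stems(word: str) -> set[str]:
    """Strip every matching grammatical ending to obtain possible stems."""
    return {
        word[: len(word) - len(ending)]
        for ending in GRAMMATICAL_ENDINGS
        if word.endswith(ending) and len(word) > len(ending)
    }


def lemmatize(word: str) -> list[str]:
    """Return all known lemmata that can be built from the stems of `word`."""
    found = set()
    for stem in candidate_stems(word):
        for ending in LEMMA_FORMING_ENDINGS:
            if stem + ending in LEMMATA:
                found.add(stem + ending)
    return sorted(found)


for form in ["stānas", "moldan", "ace", "xyz"]:
    print(form, "->", lemmatize(form))
```

Returning a list rather than a single lemma reflects the point made in the text that the analyser yields possible lemmata; with realistic lists, ambiguous forms would produce several candidates for the user to choose from.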
Conclusion
In conclusion the paper states that an electronic dictionary which can fully replace the printed
original has been created and made available free of charge to the general public. The author
hopes that this will lead to a better use of the lexicographical material contained in the dictionary
and that, thanks to this project and the freely offered application, interest in the issues of
digitization, lexicography and historical linguistics will spread. Last but not least, the electronic
dictionary should be a suitable aid for medievalists, and not only for those who have so far had no
access to the original dictionary.