Current Issues in Language Documentation

Tokyo University of Foreign Studies, February 2010
Prof Peter K. Austin
Endangered Languages Academic Programme
Department of Linguistics, SOAS
Outline
• What is language documentation?
• How does it differ from language description (and
from linguistic theory)?
• Reshaping 'the science of language'
• Some challenges
• Conclusions
Language documentation
• “concerned with the methods, tools, and theoretical
underpinnings for compiling a representative and
lasting multipurpose record of a natural language or
one of its varieties” (Himmelmann 1998)
• has developed over the last decade in large part in
response to the urgent need to make an enduring
record of the world’s many endangered languages
and to support speakers of these languages in their
desire to maintain them, fuelled also by
developments in information and communication
technologies
• essentially concerned with the roles of language
speakers and with their rights and needs
What documentary linguistics is not
• it's not about collecting stuff to preserve it without analysing it
• it's not = description + technology
• it's not necessarily about endangered languages per se
• it's not a fad
The level of interest is very high
Graduate student interest
• 62 students graduated from the SOAS MA in Language Documentation and Description, 2004–09 – currently 17 are enrolled
• 7 graduates from the PhD in Field Linguistics – 12 currently enrolled
• other documentation programmes, e.g. UT Austin, have had similar experience
Interest in training
• 3L Summer School 2009 – 100 attendees
• 3L Summer School 2008 – 80 attendees
• SOAS fieldwork seminars – 70 attendees
• InField 2008 – 75 attendees
And more good news ...
Research funding £££ $$$ ¥¥¥
• ELDP has so far funded 195 documentation research projects on endangered languages worth GBP 7.25 million (¥ 1,049,157,855)
• Volkswagen DoBeS has funded 60 projects worth EUR 30 million
• NSF-NEH DEL has funded 60 projects worth US$ 10 million
• ESF EuroBABEL project: EUR 8 million
• and ELF, FEL, GfBS, UNESCO ...
DoBeS projects
ELDP funding: projects 2003–2007
Books and journals
• Gippert et al. 2006 Essentials of Language Documentation. Mouton
• Tsunoda 2006 Language endangerment and language revitalization: an introduction
• Language Documentation and Description – 6 issues (1,500 copies sold), 2 in prep
• Language Documentation and Conservation – 6 issues (on-line only)
• Cambridge Handbook of Endangered Languages
• Routledge Essential Readings
Back to Language Documentation
Main features (Himmelmann 2006:15)
• Focus on primary data – collection and analysis of
an array of primary language data to be made
available for a wide range of users;
• Explicit concern for accountability – access to
primary data and representations of it makes
evaluation of linguistic analyses possible and
expected;
• Concern for long-term storage and preservation of
primary data – includes a focus on archiving in
order to ensure that documentary materials are
made available to potential users now and into the
distant future;
Main features (cont.)
• Diversity – of contexts, languages, cultures,
communities, individuals, projects
• Work in interdisciplinary teams – documentation
requires input and expertise from a range of
disciplines and is not restricted to mainstream
(“core”) linguistics alone
• Close cooperation with and direct involvement of
the speech community – active and collaborative
work with community members both as producers
of language materials and as co-researchers
The documentation record
• the core of a language documentation is a corpus of audio and/or video materials with transcription, annotation, translation into a language of wider communication, and relevant metadata on the context and use of the materials
• the corpus will ideally be large, cover a diverse range
of genres and contexts, be expandable, opportunistic, portable, transparent, ethical and preservable
• lexico-grammatical analysis (description) and theory construction are contingent on and emergent from the documentation corpus (Woodbury 2003, 2010)
Components of documentation
• Recording – of media and text (including metadata)
in context
• Transfer – to data management environment
• Adding value – transcription, translation, annotation,
notation and linking of metadata
• Archiving – creating archival objects, assigning
access and usage rights
• Mobilisation – creation, publication and distribution of
outputs
An example – Stuart McGill
• 4-year PhD project at SOAS
• documentation of Cicipu (Niger-Congo, northwest Nigeria) in collaboration with native speaker researchers
• outcomes:
– a corpus of texts (video, ELAN, Toolbox)
– 2,000-item lexicon
– archive (956 files, 50 Gbytes)
– overview grammar (134 pages)
– analysis of agreement (158 pages)
– website, cassette tapes, books, orthography proposal and workshop
Documentation and description
• language documentation: systematic
recording, transcription, translation and
analysis of the broadest possible variety of
spoken (and written) language samples
collected within their appropriate social and
cultural context
• language description: grammar, dictionary,
text collection, typically written for linguists
Ref: Himmelmann 1998, Woodbury 2003, 2010
• documentation projects must rely on the application of theoretical and descriptive linguistic techniques, to ensure that they are usable (i.e. have accessible entry points via transcription, translation and annotation) as well as to ensure that they are comprehensive
• only through linguistic analysis can we discover that some crucial speech genre, lexical form, grammatical paradigm or sentence construction is missing or underrepresented in the documentary record
• without good analysis, recorded audio and video materials do not serve as data for any community of potential users. Similarly, linguistic description without documentary support risks being sterile, opaque and untestable (not to mention non-preservable for future generations and useless for language support)
Workflow
Description: something happened → applied knowledge, made decisions → something inscribed (NOT OF INTEREST) → cleaned up, selected, analysed → representations, lists, summaries, analyses → presented, published (FOCUS OF INTEREST)
Documentation: something happened → applied knowledge, techniques → recording → made decisions, applied linguistic knowledge (FOCUS OF INTEREST) → representations, e.g. transcription, annotation → archived, mobilised (FOCUS OF INTEREST)
Language documentation gives
linguistics an opportunity to reassert itself
as 'the science of human language'
Linguistics – the science of language
• documentation requires a scientific approach to information capture, paying proper attention to environmental factors including spatial layouts, equipment choice etc., requiring knowledge and skills more often found in music or film than in descriptive/theoretical linguistics
• documentation requires a scientific approach to data structuring, processing and analysis, paying attention to, e.g., data modelling and knowledge representation, requiring skills more often found in computer science than in descriptive/theoretical linguistics
• documentation requires a scientific approach to data archiving and preservation, with proper attention to metadata, data formats, corpus structure, workflows, and protocols (access and usage rights), requiring knowledge and skills more often found in archiving theory and practice than in descriptive/theoretical linguistics
• documentation demands a scientific approach to mobilisation, with proper attention to pedagogy, applied linguistics, and human-computer interaction (interface design etc.)
Challenges
• Interdisciplinarity
• Legacy data
• Meta-documentation
• Recruitment, training and sustainability
Interdisciplinarity
• a multidisciplinary perspective in language
documentation could potentially draw in
researchers, theories and methods from a
wide range of areas, including anthropology,
musicology, (oral) history, psychology,
ecology, pedagogy, applied linguistics,
computer science etc
• true interdisciplinary research is difficult to achieve, both because of different theoretical orientations and because of practical differences in approach that can make communication and understanding complex and difficult
• mainstream linguistics has tended to turn
away from other disciplines and to
emphasise its ‘independence’ by
concentrating on theoretical concerns that
are of internal interest primarily to linguists
(Libermann 2007)
• language documentation opens new doors to
interdisciplinary collaboration but we need to
work out how to achieve it
Legacy data
• Language documentation theory has
assumed that data is collected now and has
not paid attention to already existing
materials (digital or analogue)
• Legacy data raises many issues if we wish to
include it in our corpus or to treat it like other
modern digital data
• Legacy data raise practical, technical, ethical, and political issues, and many questions which may be difficult to answer
Some problems
• Should legacy materials (media, text) be ‘cleaned up’ for modern use? It is important to document goals and processes
• Modern documentation assumes ‘informed consent’ and rights to control access, respecting individual and community wishes and sensitivities, but consent₁₉₈₀ ≠ consent₂₀₁₀
• And access₁₉₈₀ ≠ access₂₀₁₀
• And sensitivities₁₉₈₀ ≠ sensitivities₂₀₁₀
• And community₁₉₈₀ ≠ community₂₀₁₀
Meta-documentation
• We can’t easily anticipate technological
changes and ethical norms of the future
• We can expect that they will change
• Creating information about the research
context (documenting the documentation)
could facilitate reassessments of rights
and responsibilities in the future
• Critical information may be impossible to
obtain after the death of the researcher
What to metadocument?
• The stakeholders that were involved and
how (their roles in the project)
• The attitudes of language consultants
• The methodology of the researcher
(contact, consent, compensation, culture)
• The biography (including background
knowledge and experience) of the researcher
and main consultants
• Any agreements entered into, whether
formal or informal (MOU, payment, promises
and expectations)
• …
A (not too serious) metaphor
• Researchers and projects have different
working arrangements, but perhaps we can
develop a typology of these
• Using a metaphor from human land use
practices
– Hunting & gathering
– Slash & burn swidden
– Pastoral nomadism
– Sedentary intensive agriculture
– Plantation
– Sustainable land use
Hunting and gathering
Slash and burn swidden
remove major obstacles, farm intensively for 2-3 years and then move to
another site
Sedentary intensive agriculture
ranges from feudal to communal, with employment of local serfs and
artisans in temporary or specialist roles (and necessary application of
fertiliser and pesticides, or crop rotation)
Plantation
train 3rd world locals to grow consumable products in the correct form and extract them to refine and add expensive value in the 1st world
Sustainable land use
ecology-driven holistic approach, including reforestation and recuperation of damaged land
Sustainability
• we need to work out how to recruit new contributors to the discipline, how to train them, and how to sustain them through fulfilling career paths
• we understand the sustainability of archived data, but how do we sustain projects and relationships beyond the typical 3-5 year academic life cycle?
• how can documentation contribute to sustaining endangered languages and the communities who want to maintain and develop them?
More meta-documentation
• Meta-documentation of granting projects
• Meta-documentation of archiving practices
• Meta-documentation of meta-data (cf. ‘best
practices model’ vs. Nathan 2009 “types of
metadata vary by project, language,
community, consultants,...”)
• Meta-documentation of project outcomes
against project (application) goals
• …
Example: Meta-documentation of grants
[Chart: ELDP grants by continent – number of projects per continent (Asia, AusPac, Africa, CSAmerica, NAmerica, Europe, MidEast), target vs host]
Where does the money go?
[Chart: value of ELDP grants (£) per continent (Asia, AustPac, Africa, CSAmerica, NAmerica, Europe, Middle East), host value vs target value]
Conclusion
language documentation is an exciting
development in terms of research goals,
methods and outcomes that offers the
potential to reshape the scientific and
humanistic basis of linguistics now and into
the future, but many significant issues
remain to be worked out and discussed
Thank you