Pragmatics
l:L71,-106
InternationalPragmaticsAssociation
TRANSCRIPTION DESIGN PRINCIPLES
FOR SPOKEN DISCOURSE RESEARCH
John W. Du Bois
I Introduction
Transcriptionis theory.l This inevitable conclusionhas been brought
homeevermore forcefully in the years since Ochs' seminal article (7979),as our
awareness
of the tangible shapeof discoursehas sharpened,and as the push to
languagein relation to use has whetted appetitesfor more and more
comprehend
newdiscourse
materiaF-transcribedwith a penpicacity and insight that can
nourishnew theorieswith the vital information they need to grow. How we
doesn'tjust reflect our theoriesof language,it also shapesthem,
transcribe
drawingour eyesto some phenomenawhile leaving others in shadow. We know
this,andyet surprisinglylittle has been written that might help discoursetheorists
thinkabouthow to get a discoursetranscriptionto do what they want it to do (but
seeOchs1979,Edwards 1989and forthcoming,Edwardsand l-ampert
forthcoming).Partly this involvesunsessing
what a discoursetranscriptionneeds
to do,andpartly it involvesfiguring out how to frame a systemthat can do it.
Theseare the problems I wish to addresshere. First, I will considerhow
peopleusediscoursetranscriptions,in order to determinewhat featuresneed to
bebuilt into them. Second,I will offer a framework for designingdiscourse
transcription
systemsto meet thesedemands,includingdemandsengenderedby
modernresearchtools like the computer,and by the nature of their human users.
Because
transcriptionis shapedby theory, and discoursetheories ytr!, it is not
possible
to detail here every viable transcriptionalcategory. Rather, it is my
purpose
to presentgeneralprinciples for systemdesign. My remarks may be of
interestto three kinds of users:those who wish to designa new transcription
to meet their own unique researchneedsand theoretical orientation; those
system
whowishto adapt an existingsystemthat almost,but not quite, meets their needs;
andthosewho would simply like to know more about the foundationsof
lohn W Du Bois
transcription systemdesign,in order to deepentheir understandingof the
transcribingprocessand the data recordswhich result from it. Discourse
transcriptionsare becomingthe key data for many researchers,not only in
linguisticsbut in neighboringdisciplinesas well. Given the propulsion that new
conceptionsof data can bring to ongoingtheoretical developments,a close
examinationof discoursetranscriptionfrom the theoretical groundwork up has
value for all participantsin the expandingspectrumof spokendiscoursestudies.
Though the focus will remain on generalprinciples,these must be
exemplified through specificillustrations,for which I will draw mainly on a
transcription systemwhich I and my colleagueshave developed(Du Bois et al.
forthcoming a, b). Many of the points could of coursebe made as well with other
good transcription systems(see Edwards and l-ampert, forthcoming and the
referencestherein).
2 What should a discoursetranscription be?
The first thing to do in designinga discoursetranscriptionsystemis to
assessone's goals and plans for using the transcriptions. What is discourse
transcription? Who will use the transcription? What for? What featureswill
best servethe users'goals? In this sectionI set out general assumptions,goals,
and desideratafor discoursetranscription,in order to lay the groundwork for the
more specificdesignprinciples treated in the following section.
What is discoursetranscription? Discoursetranscriptioncan be defined as
the processof creating a representationin writing of a speechevent so ztsto make
it accessibleto discourseresearch(Du Bois et al. forthcoming a). What it means
to make the event "accessibleto discourseresearch"will of coursedepend on
what kinds of researchquestionsone seeksto answer. Although speechevents
are alwaysviewed through the lens of some theory, one can try to ensurethat the
theorl=the framework for explanationand understanding-doesjustice to the
spokenreality, or rather, to selectedaspectsof it. The processof discourse
transcription is never mechanical,but crucially relies on interpretation within a
theoretical frame of referenceto arrive at functionally significantcategories,
rather than raw acousticfacts (cf. Ochs 1979,I-adefoged1990,Du Bois et al.
forthcoming a).
The nature of discoursetranscriptionis necessarilyshapedby its end. Why
transcribe? Transcription documentslanguageuse, but languageuse is attested
equally in written discourse,which has the advantageof being easyto obtain
without transcribing. What makesspeakingworth the extra effort? Spoken
Tianscription designprinciples
73
language
differs in structurefrom written language,in ways that remain
surprisingly
little studied:many aspectsof spokengrarnmar,meaning,and even
lexiconremainto be documented. Moreover, it is in spokendiscoursethat the
process
of the production of languageis most accessibleto the observer.
Hesitations,
pauses,glottal constrictions,false starts,and numerousother subtle
evidences
observablein speechbut not in writing provide clues to how
participants
mobilize resourcesto plan and produce their utterances,and to how
theynegotiatewith each other the ongoingsocial interaction. Prosodicfeatures
like accentand intonation contour provide important indicatorsof the flow of new
andold informationthrough the discourse(Chafe t987, etc.). And the momentby-moment
flux of speechdisplaysa rich index of the shifting social interactional
meanings
that participantsgenerateand attend to, as well as of the larger
dimensions
of culture embodiedin social interactions(Goodwin 1981,etc.). A
transcription
of spokendiscoursecan provide a broad array of information about
theseand other aspectsof language,with powerful implicationsfor grammar,
pragmatics,cognition,social interaction,culture, and other domains
semantics,
thatmeetat the crossroadsof discourse.
But discoursetranscriptioncannot be equatedwith simply writing down
speech,
becausethere is not, nor ever can be, a single standardway of putting
spokenword to paper. An oral historian,a phonetician,a journalist, and a
dialectologist
will all produce very different renderingsof the same recording,and
researcher'stranscriptionwill differ yet from all of these. Indeed,
a discourse
because
theseother methodologiesfor writing speechare available,the discourse
transcriber
can defer to them when necessary,relying on the relevant specialist,
equippedwith the appropriate analyticalframework and notational conventions,
to dealwith thosefeatureswhich are specificto his or her domain of inquiry.
Thisis not to saythat the domain of discourgeis insulatedfrom, say,phonetics,2
sinceclearlya small detail of pronunciationcan ironically reversea conveyed
meaning,
or evensignal a speaker'salignmentwith one conversationalparticipant
againsta third. But at least the predictableregularitiesof phonetics,phonology,
grarnmarand lexicon can generallybe assumedto have been describedelsewhere
(perhaps
evenin a descriptivegrammar written by the same researcherwearing
hat),
another
so that thesefacts need not be recapitulatedin the discourse
transcription.The discoursetranscriptionis designedfor answeringsome but not
all questionsabout speaking,and will necessarilycontain both more and less
informationthan another discipline'srepresentationof the same event.
Pursuingthe nature of discoursetranscriptionfurther leads to more specific
questions:What kinds of eventswill be transcribed? Who will use the
74
John W Du Bois
transcriptions? How will they use them? What should go into a transcription? I
take up each question in turn.
What kinds of speecheventswill be transcribed? A conversation,
classroomlecture, committee meeting,political speech,serviceencounter,or even
just a few words exchangedhastily in the hallway might form the object of
scrutiny. Each of these speechevent typespresentssomewhatdifferent demands
but a good transcription systemshould be able to accommodate,with adaptation
if need be, the full range of eventsthat are likely to be of interest. In many
respectsthe most challengingcaseis the free-wheelingmulti-party conversation,
and any systemthat can meet its vast demandswill have passedthe severesttest,
and positioned itself well to handle other speecheventsthat may be encountered.
Who will use the transcriptions? Discourseresearchers,of course,in all
their variety. But these daystheir interest in discourseis sharedby an everwidening circle. Grammariansand general linguistsuse transcriptionsas sources
of linguistic data on a range of topics, and to follow the action in theories
grounded in discourse;computationallinguistsuse them to test speechrecognition
protocols againstactual languageuse; languageteachersuse them to illustrate
realistic usesof spokenlanguage;social scientistsuse them for understandingthe
nature of social interaction; curious folks find it intriguing to look closelyat how
people really talk; and the studentsof any of thesemay use transcriptionsto learn
more about their field of study. And, as we shall see,one of the most important
groups of usersis the transcribersthemselves.A good transcriptionsystemshould
be flexible enough to accommodatethe needsof all of thesekinds of users.
How will people use the transcriptions? The most fundamentalthing they
do is to read them, perhapsbrowsingthrough a transcription(or a stack of them)
to look for a particular phenomenonor pattern, or to formulate a hypothesis.
This requires the transcriptionnot only to present the neededinformation but to
present it in a way that is easily assimilated. Second,many userswill want to
searchthe transcriptionusing a computer in conjunctionwith various kinds of
data managementsoftware,which may include a word processor,a database
manager,and a concordancemaker, among other things. For this, the
transcription should make all the necessarydistinctionsin waysthat ensurethat
searcheswill be exhaustiveand economical(see $3 below and Edwards 1989,
forthcoming). It would be hard to overestimatethe impact of the microcomputer
on discoursetranscription design;many of the possibilities,and many of the
constraints,that are spoken about in this article would not exist if the computer
had not in recent years made itself such an indispensabletool for many, though
certainly not all, types of discourseresearch. This is not to say that the needsof
Tianscription designpinciples
75
researchershould be bent to the requirementsof the machine;as
the discourse
Edwardshaspersuasivelyargued (1989),and as I will reaffirm below, an aware
andpurposefulpursuit of certain basic designprinciples can insure from the
outsetthat it is the computer that adjuststo the needsof the researcher. Finally,
onekey functionthat is often overlookedis embeddedin the transcribingprocess
itself. Through the experienceof transcribing the transcriber is constantly
learningabout discourse,not only gaining skill in discriminating the categories
implicitin the transcription systembut also acquiring a vivid image of the
reality that he or she is seekingto represent. To the extent that
conversational
thereis more going on than the transcriptionsystemcan capture,it is the
immersedin the recorded speechevent and groundedin discourse
transcriber,
theory,who is in a position to rectify this, to advancethe potential of the
transcription
systemand its theoretical framework. Although transcribingis
sometimes
thoughtof as a kind of manual labor, merely a necessarymeansof
producingcertainvaluable end products,in reality the processitself has
potential for enlighteningits practitioners,and for generatingthe
tremendous
levelof keenperceptionand intimate knowledgethat can translateinto
theoreticalinsightand new researchdirections. With this in mind, the
transcription
systemshould contribute to making the transcribingprocessa
valuableexperiencein itself. The systemshould be convenientand comfortable
to use,reasonablyeasyto learn, and through its implicit categoriesit should
promoteinsightfulperception and classificationof discoursephenomena,which in
the end may feed back into advancesin the systemitself.
The fact that a transcriptionis likely to be exploited in severaldifferent
waysusingdiversetools meansthat it must be transportable($:.2;, that is, it
shouldbe straightforwardto move data from one context of use to another. As
noted,the computationaltools of choice may include word processors,
concordance-makers,
databasemanagers,and other programs,used on systems
rangingfrom IBM PC-compatiblesto Macintoshesto mainframes. And the
outputfrom thesetooFsearch results,key-word-in-contextconcordances,the
basictranscriptionsthemselves,and so orFneed to be presentedin formats as
variedas computertext files, databasefiles, screendisplays,printouts, handwritten
notes,blackboardinscriptions,and printed articles. These transcriptionsmay
cometo be read by a wide range of people interestedin discourse,from linguists
to studentsof foreign languages,in addition to the discourse
to sociologists
who may form their initial audience. What this all boils down to is that
specialists
data need to be representedin a form that is sufficientlyrobust to
discourse
maintaindata integrity acrosscontextsof use. Practicallyspeakingthis means
usingthe most widely standardizedsymbolsavailable ($3.2,$3.1;.
76
Iohn W Du Bois
What should go into a transcription? Allowing for theoretical diversity,we
can still observethat many transcriptionsystemstake pains to represent(or at
least make notations availablefor) many or all of the following features:the
words spoken,written so as to allow each lexical item to be recognized;an
identification of the speakerof each turn; the temporal sequencingof utterances,
whether these follow each other in successionor are simultaneous(as when
speakersoverlap); basic units in which the utteranceswere articulated,such as
turns and intonation units; intonation contour,whether functionally or
phonetically classified;accent;fluctuationsin timing such as tempo, pauseand
lengthening;nonverbal noisesmade by speechevent participants,such as laughter,
throat-clearing,inhalation; specialqualities of voice that extend over a stretch of
speech;non-utteranceeventsthat becomerelevant to the interaction, such as a
participant's servingfood, or a suddenthunderclapin the background;
metatranscriptionaland "evidential"commentson the transcriptionitself,
indicating where the transcriberis uncertain of the words spoken,and so on; and
other featuresas appropriate. (Depending on the language,these may include
interlinear glossesand free translations,among other things.) Most of this
information is locally sequenced-linkedto specificmomentsin the stream of
speectr-but some of it pertains rather to the whole of the speechevent or
transcription,and can be presentedseparatelyin a header of "global",nonsequencedinformation containing,for example,general contextualdata on the
speechevent participants,the situation, the event'srecording,and its transcription
(Du Bois forthcoming).
It will be useful at this point to look at a brief exampleof a discourse
transcriptionthat takes thesefactors into consideration. The following exampleis
based on the systempresented in DiscourseTransciption (Du Bois et al.
forthcoming &, referred to hereafter as DT). It should be borne in mind that DT
is only one of many possiblesystemsthat could meet the kinds of needsdiscussed
in this article. In the examplebelow,3speakeridentification is indicated by a
colon following the name or label in capital letters; overlap between speakersis
indicated by squarebrackets,with the left bracketsvertically aligned; two separate
cuxesof overlap in close successionare distinguishedfrom each other by single
versusdouble brackets;pausesof various lengthsare written as clustersof three
dots, with the duration of longer pausesgiven in parentheses;truncation of words
is shown by a hyphen at the end of the word; intonation units are marked by
carriagereturn, that is, each line containsa single intonation unit; and intonation
contours are broadly classifiedas to their "transitionalcontinuity" (Du Bois et al.
forthcomin1 a, b) by the comma,period, or questionmark at the end of the line.
A summaryof these and other symbolsused in the DT systemis given in
Tianscription designprinciples
77
Appendix
1. (For a full discussion
of eachof thesetranscriptional
categories,
see
DuBoiset al. forthcoming4 b.)
(1)
((Cnnsarrs))
D: we have the used Dodge,
but no new ones.
G: ... No new ones?
IHuh],
D: [Dodge] is separate,
from Chrysler Plymouth.
G: .o. You're [kiddingl me].
D:
[I mean] they're all affiliated,
but th- th- [ [theyrre not] I on our lot.
G:
ttRightll.
...(.8) Theyrre not on your lot.
[wow].
D: IYeah].
A "narrower"transcriptionwithin the samesystemwould add more detail, using
additionalconventions.In the following illustration, a very short pause(less than
200milliseconds)
is indicated by a cluster of two dots; primary and secondary
accentare markedwith caret and grave accent,respectively;and laughter is
writtenwith one @ symbol for each "syllable"of laughter.
(2)
L:
R:
M:
((LuNcH))
But rthey never rfigured ^out what he had?
He had ^pneunonia.
[The ^second rweek] he had rpneu€monia,
[^Eventually].
Thesebrief illustrationsare enoughto provide some common ground for the
discussion
ahead;additional transcriptionconventionsand examplesfrom this
will be introducedas needed.
system
3 T[anscriptiondesignprinciples
With the goalsof discoursetranscriptionin mind, we are now in a position
to specifythe designprincipleswhich they motivate. While transcriptiondesign
principles
are commonlyleft implicit, it is to the benefit of all concernedwith the
qualityof discoursedata and theory that they should be made explicit.
We can group our transcriptiondesignprinciplesunder five broad maxims:
78
John W Du Bois
1.
2.
3.
4.
5.
Categorydefinition: Define good categories.
Accessibility:Make the systemaccessible.
Robustness:
Make representationsrobust.
Economy: Make representationseconomical.
Adaptability: Make the systemadaptable.
I will now take up each of thesemaxims and the designprinciples they motivate.
3.1 Categorydefinition
DenNIE GooD cATEGoRIES.The most basic issuein designinga discourse
transcription systemis not choosingsymbols,but defining the analyticalcategories
for which the symbolswill stand.
1. Define trarccriprtonal categoieswhich mal<cthe necessatydistinctioru
among discoursephenomena Deciding on what discoursephenomenashould be
representedby distinct transcriptional categoriesis logically the crucial
foundational task. It is also the task which is most sensitiveto the theoretical
assumptionsand researchgoals of the transcriptionsystemdesigner(see $1, $2).
For a discussionof categorydefinitions for one transcriptionsystem,see Du Bois
et al. (forthcoming a, b).
2. Define sufficienttyexplicit categoies. Categoriesshould be differentiated
at a sufficientlyfine delicacyto ensurethat searchesare economical,and not
underminedby false selection. Categoriesshould also be orthogonal (Edwards
1989). The idea is to avoid false finds, for example,coming up with examplesof
onty when attempting to find all instancesof on (Edwards 1989). This reflects the
need for a kind of "splitting":making sure that the notation allows sufficient
discrimination and doesn't lump categoriesthat ought to kept distinct. For
example,some transcriptionsysterx use a single symbol,the colon (:), to indicate
both prosodic lengtheningand speakeridentification; but this is likely to produce
searchesfor one categorythat turn up unwantedinstancesof the other. In DT,
speakeridentification is marked by colon (cf. principle 4) while prosodic
lengtheningis marked by equal sign,which avoidsthis kind of problem.
(3)
B:
A:
((Arnrca))
Did y o u e v e r
No=,
hear
th a t?
Of course,if two different usesof a characteralwaysoccur in distinct
contexts(i.e. are in complementarydistribution), then as long as the contextscan
Tianscriptiondesignprinciples
79
be madeexplicit it should be possibleto ensurethat searcheswill select only the
relevantitems. Thus DT exploits the slashcharacter/ both in the marking of
phonetictranscription(/ /) and rising pitch /, sincein the first casethe paired
slashes
are alwaysimmediately adjacentto a single parenthesiswhile in the
secondcasethis configurationnever occurs. There is a distinction here between
anambiguoussymbol and a characterthat happensto form part of two different
symbols.
3. Define sufficienttygeneralcategoies. Categoriesshould encompassa
sufficientrange to ensurethat the setsthey define will be meaningful,and not
partialor distorted. One can think of this as a "lumping" tendency,in a positive
sense.This is done to ensureexhaustiveselectionof all the instancesof the
category
one is seeking(Edwards 1989).
Obviouslythere is a tension betweenthe two principlesjust articulated
whichneedsbalancing. Edwards (1989) arguesthat splitting actually presentsa
gleaterdangerthan lumping, sincewhen extraneousinstancesturn up in a
computersearchthey can be weeded out by hand, but when relevant instancesare
justnot there the researchersmay never know what they are missing.
4. Contrastdua types. As I have noted elsewhere(Du Bois forthcoming),
discourse
transcriptionstypically include a number of distinct types of data, which
areof necessityintermingled in a fairly complexfashion,given the intrinsically
multi-layeredcomplexityof speecheventsthemselves.It is very useful to visibly
contrastthesedata types,so that readersof a transcriptionwill know at every
momentwhat kind of information they are taking in. Some of the relevant
distinctionsare: utterancesversustheir attribution to a speaker;speakingversus
non-utteranceactions (such as eating); eventsversusthe temporal unfolding of
thoseevents;words spokenby speechevent participantsversuswords interjected
ascommentaryby the transcriber;commentaryabout a specificmoment in the
speechevent versuscommentaryabout the whole event (e.g. about the event
setting);and others. While no practical transcriptionsystemis likely to draw a
rigidvisual distinction between all of thesetypesof information, it is important to
markclearly at least the most crucial distinctions.
Perhapsthe most familiar illustration of a highly effective differentiation of
datatypesis to be found in the practice of Westernplaywrightsand their
publishers.Their conventionsfor capturingthe unfolding of the spoken drama,
evolvedover centuriesof theatrical tradition, neatly mark differencesbetween
spokenwords, speakeridentification labels (or "speechassignments"),
actor
directions,stagedirections,and other data typesin a way that can be readily
80
rohn w Du Bois
absorbedat a glance (cf. Catron 1984:83-89).While script formatting conventions
vary somewhat from publisher to publisher, as well as acrosstime and across
publishing genres,there is one popular format, commonlyused in "collected
works" editions becauseof its economy of space,which showsmany affinities to
popular discoursetranscriptionformats. Considerthe following example:
(4)
(Wilde t982:334)
Jecrc Gwendolen!
GwnNooLEN: Yes, Mr. Worthing, what have you got to say to me?
Jacr: You know what I have got to say to you.
GwENpoLEN: Yes, but you don't say it.
Jacr: Gwendolen, wiil you marry me? (Goes on his lorces.)
GwexpoLEN: Of courseI will, darling. How long you have been about it!
Here the speakeridentification labels are written in small capitals("JacK'), while
the words actually spoken are written in mixed-case,i.e. a mixture of upper- and
lower-caseletters ("Gwendolen!"). This contrastis reinforced by the placementof
the colon: words before the colon generallyrepresentspeakerlabels,while words
after the colon generallyrepresentwords actually spoken. These two data Upes
are in turn contrastedwith a third, the playwright'sinterjected descriptionsof
actions,marked by italics enclosedwithin parentheses("(Goeson his lorces.)").
While it would be attractive to simply incorporate all thesefamiliar (cf. principle
5) and effective devicesinto one's discoursetranscription system,for various
reasonsnotationslike small capitalsand italics are problematic for discourse
transcriptionsystems(cf. principle 2, $3.2,and $3.3). But with only slight
adaptation, equivalent contrastsbetween data types can be marked using more
suitable symbolicresources. Considerthe following discoursetranscription
example:
(5)
ALICE:
and I mean,
( (SNAPSFINGERS)) you can get up,
just that quick.
if [you just] stumble and fall.
BETTY:
[Uhhuh] ,
((Amtca))
Tiansciption designprinciples
81
Here,the distinctionbetweenspeakeridentification labels and actual utterancesis
markedby usingcapitals("ALICE:") versusmixed-case("and I mean"). This
traditionalcontrastis reinforced by the use of colon"with essentiallythe same
meaningin DT as in the playwright'spractice (cf. example4). Similarly, DT
distinguishes
interjectedtranscribercommentaryfrom actual speakerutterances
via a versionof the playwright'suse of parentheses-plus-distinctive-font
(e.g.
italics):DT makesthe parenthesesdouble (so as to unambiguouslydistinguish
interjectedcommentaryfrom nonverbalvocal noises,marked by single
parentheses;
cf. example L2), and the distinctive"font" employedis simply uppercaseletters. This visually reinforcesthe contrastbetweentranscribercommentary
("((SNAPS FINGERS))'') and speakerutterances,written
or actiondescriptionsa
with mixed-case
letters. Such data type discriminationshould be pursued
whereverfeasible,within the scopeallowed by the somewhatlimited typographical
resources
available,and recognizingthe rigorous demandfor ambiguity avoidance
thatthe transcriptionsystemdesignermust face.
32 Accessibilitv
MAKE,"" ,tatEM ACCESSIBLE.
The task of symbolicallyrepresentingthe
chosentranscriptionalcategoriesshould be guided by the need to make the
notationalsystemma-riimallyaccessible,in terms of both learning and actual use,
to usersat all levels,including new learners.
5. Usefamiliar notatioru. Transcription systemshave a lot to gain from
drawingon existingtraditions for representingspeechin writing, wheneverviable
conventions
can be found. Oldest and most familiar are literary traditions like
thoseof playwrightsand novelists,who have grappledwith the problem of
representing
speechin writing for many centuries. Most discoursetranscription
haveborrowedfreely from these systems,whether consciouslyor not.
systems
For example,when writers of fiction in the Western tradition seek to
capturethe rhythm of spokeninteraction, one of the notational devicesthey often
exploitis a sequenceof three dots to mark a pause:
Iohn W Du Bois
(6)
"I mean there's somethingI
didn't trust you."
"What is it?"
"It's ... hard to say."
mightsaytoyouwhich
mightfilff:X84:272)
If this conventionis incorporatedinto a discoursetranscriptionsystem,
anyone
who has read a few novelsin the Western literary tradition wilt effortlessly
acquire and apprehendthe intended meaning:
(7)
((CanSarns))
D: f want to rnake fifty
thou a year,
G: --- e But what about arr thoie phone numbers.
Another feature of conversationalinteraction that novelistsare attuned
to
is truncation (or aposiopesis),often representedby an em dash:
(8)
(Hillerman1990:37)
, ^
"For example,
from the Zuni sorcerytradition. Or the Hopi ,two-heart,
legends,61-" Dr. Bourebonettestopped,midphrase. she looked
embarrassed.
The essenceof truncatiorrstopping in midphrase-is even spelled out
in words
here. This conventionhas been around foi a long time, * iuggrsted
by the
following example,which is far from the oldest thit could be Ei"ted:
(e)
'You
speak of-' said Egremont, hesitatingly.
(Disraeli 79M:93)
Becauseof its.familiarity, DT incorporatesthis literary conventionfor
truncation into the discoursetranscriptionlystem (approximatingthe
em dash
with a sequenceof two hyphens):
(10)
J:
And he
and he kicks
((J&J))
ny
feet
apart,
The playwright'sneed to displayspoken
interaction in a way that can be
.
instantly absorbedby actors and reiders (example4) has led to devices
and
conventionstested.bymany years of use,whoseilarity and simplicity make
them
well worth borrowing:
Tiansciption designpinciples
83
(Williams 1971,:1,62)
(11a)
AMANDA: Come back here, Tom Wingfield! I'm not through talking to
you!
TOM: Oh, gr
I-AURA [desperatef] : -Tom!
AMANDA: You're going to listen, and no more insolencefrom you! I'm at
the end of my patience!
lHe comesback toward her.l
(11b)
JIM: Hello there, [-aura.
I-AURA Vainttyl: Hello.
lSheclearsher throat.l
JIM: How are you feeling now? Better?
IAURA: Yes. Yes, thank you.
(Williams 197I:210-11)
Theplayrright'sconventionsmay well be the richest sourceof time-testedand
familiarnotationaldevices,many of which go back to Shakespeare's
time and
beyond(e.g.Shakespeare1623). They contribute powerfully not only to the
of data types (principle 4) but also to the overall display of
discrimination
informationon the page (principle 18).
Cartoonistsgo farther than novelistsand perhapseven playwrightsin their
attempts
to capturethe expressivecharacterof speech. To be sure, they often
drawon specialrepresentationalresourcesthat neither novelistsnor discourse
transcribers
can count on (see $3.3below), such as variable size,shape,alignment
andevencolor of characters,as well as balloons and other speechattribution
(Du Bois 1986). But some of their devicescan be carried over, like that
devices
of writingnonverbalvocal soundrsuch as a sigh or a bur5within parentheses,
whentheseare included in a speechballoon:
(12)
(SIGH) IF GOD HAD WANTED US TO GROW UP, HE WOULDN'T
HAVE GIVEN US TEDDY BEARS. (Momma, by Mell Lazarus,[-os
AngelesTimes 5/27/80)
here serveto reinforce the distinction betweenverbal and nonSingleparentheses
verbal,helpingthe reader to immediatelyrecognizethat (SIGH) indicatesnot an
utterance
of the word "sigh"but an actual instanceof the action of sighing. In
transcriptionit is equally useful to draw this distinction,for which DT
discourse
andseveralother svstemsborrow the cartoonists'convention.
John W Du Bois
(13)
V:
((Bnmce))
( E ) and th e o n l y th i n g g o i n g
( s wA I J I o w ) (T s K ) (E ) We l l ,
I mean,
I can really
kick
through
my mind was,
and fight,
Even existingdiscoursetranscriptionsystems,despitetheir relatively brief
history, can sometimesoffer notational conventionsboastinga degreeof
familiarity, at least among those already initiated into spoken discourseresearch.
For example,the use of squarebrackets(or similar symbols)for overlapping
speechis familiar to many through exposureto transcriptionsin the Conversation
Analysis tradition (Atkinson and Heritage 1984);and many transcriptionsystems,
including DT, have adopted some sort of bracket-like symbol for this function.
(14)
A: I'm
B:
C:
B:
S:
B:
((Dnnmn))
[thinking inl
[She was]
t e r m s o f the
No.
[She was]
[She wasJ radioactive.
She was nuts.
[ [geneti c]
I
[ [She was] I
factor,
--
(While DT adopts the bracket-like symbol,it takes a slightly different approachto
its placement,as discussedbelow under principle 18.)
Transcription systemsbenefit greatly from drawing on readers'years of
exposureto notations like these for pause,truncation,nonverbalvocal noises,data
type discriminations,and so on, becausethey invoke conventionsthat cost users
nothing to learn, since they already know them. Naturally the meansemployedby
writers to representspeechare not without limitations, which include a degreeof
ambiguity and inconsistencyin usage(Du Bois et al. forthcoming a), as well as a
certain lack of expressivepower. For example,in a novel three dots do not
indicatehow long the pauselasts,and sometimesthey are usedwith other
meaningssuch as ellipsis. Similarly, em dash indicatesnot only truncationsbut
hesitationsand parentheticalcommentsas well. But this merely underscoresthe
task of the transcription system,which is to make explicit, consistent,and
unambiguousthe meaning attachedto every notation-in accordancewith
transcription designprinciples like those presentedin this article-and to increase
expressivepower by offering additional conventions.
Tiansciption destgnpincip les
6. Usemotivatednotatioru. In addition to the authority accruedhistorically
viatradition(as in the previoussection),a notation may be motivated in other
ways,suchas iconicity and internal consistency.For example,writing a stroke
upwardto the right (/) tor rising pitch and a stroke sloping downward (\)
sloping
for fallingpitch is partly iconic (though also partly groundedin conventionsof
left+o-rightreadingorder).s And atigningspeechoverlapsvertically, so that
wordswhich trvo speakersbegin uttering at the same time are written beginning at
thesamecolumnon the pa1e,countsas a kind of diagrammaticiconicity. In a
similarvein,onceit is pointed out that the @ sign for laughter in DT somewhat
the widespread"smiley face" symbol (o), this strikes some people as
resembles
iconic.6
((Dtmmn))
(1s)
just
afterwards,
spoon
off
sort of rinse the
A: and if you
you donrt really [have to wash dishes],
B:
tC C@Cl
The equal sign 1=) for lengtheningin DT (cf. example3 above) comesto
looklike stretchmarks on an elastic object being stretchedouFat least once you
knowwhat it is supposedto mean. (This use of the equal sign derivesfrom a
literary convention,the similarly iconic use of dashesto indicate
long-standing
sounds:"No-o-o-o, I-" (Williams l97l.222).)
elongated
Iconicitycan be diagrammaticas well, as when the iteration of a symbol
fourtimes(e.g.four @ signs)indicatesfour iterations of an event (e.g. four pulses
of laughter).Diagrammaticiconiciry facilitatesthe meaningfuluse of space(cf.
principle18),making it easyto refer to specificmomentswithin a string of
iteratedor extendedevents. For example,when an extendedsequenceof laughter
is partiallyoverlapped,the left overlap bracket can be located preciselyat the
pointin the middle of the sequenceof iterated eventswhere the overlap actually
begins(and similarly for the right bracket that marks overlap ending).
((Arntca))
(16)
B: WeIl,
I hear d o f a e l e p h a n t,
that sat down on a [Vw,
one timel.
t e lI
t e l e e e e et e
A:
B:
A:
[ [There
Did y ou e v e r
e No,
hear
th a t.
was al ]
girl-
86
John w Du Bois
7. Useeasilylearnednotations. In a sensethis is simply the cognitive
counterpart of motivation. To the extent that new notationsmust be introduced
which will require learning, it is motivated notations (iconic, rnnemonic,or
internally consistent)that should be learned more quickly. And a transcription
systemthat is easyto learn is more likely to be used,and used without error, than
one that is not. For example,the letter H for audible breathing is, if not directly
iconic, mnemonic for anyonewho reads a languagein which orthographicH spells
a similar sound.
Ease of learning is also favored if all membersof a notational paradigm
have the same symbolic synta.n,as it were, making the systeminternally motivated
(Du Bois 1985). For example,in DT the bracketsthat indicate specialqualities of
voice extendingover a stretch of speechalways carrytheir distinctivelabeling in
the sameplace. Thus in the symbolsfor laughingwhile speaking(<@ @>), the
tags are attachedto both left and right angle bracketson the inside edge.
Becausethe same pattern is used for the DT notationsfor whispering(<WH
(<L2L2>), indexedoverlap
WH>), quotationquality(<Q Q>), codeswitching
brackets([3 3]), and so on, once the pattern is learned for one symbol it will
carry over to the rest of the symbolsin the system.
(r7)
A:
B:
((Arnrca))
theyrd even got out of the car,
He really knew better.
than to <G get out of the car G>.
Well,
Similarly, pattern congruity arguesfor marking long pauseswith a sequence
of dots like the medium and short pauses,despitethe fact that simple
distinctivenesscould be achievedjust by the use of numbersin parentheses.
(18)
R: o.o This .. is the type of person,
...(.9)
that ...(.7)
is like ...(1.0)
((RaNcu))
a hernit.
Tiansciption designpinciples
87
8. Segregate
unfamiliar notations. Traditional literary conventions,however
useful,cannotby themselvesprovide enough familiar resourcesto construct an
entirediscoursetranscriptionsystem,so a certain number of unfamiliar and even
arbitraryconventionswill be called for. But their impact can be lessenedif they
arevisuallysegregatedfrom the (familiar) representationsof words spoken. This
canbe donein severalways. Restrictingcategorieslike laughter and primary
^
accentto non-alphabeticsymbolslike @ and (as in DT) allows the nonspecialist
struggling
to follow a narrow transcriptionto get by by ignoring all the
nonalphabetic
symbolsand readingjust the words uttered, which come through
loudand clear in their traditional mixed-casealphabeticnotation. Over time, the
beginnercan learn to attend also to the other streamsof information carried by
thenonalphabetic
symbolsembeddedamong familiar words. Even veteran users
find it useful to be able to attend to these different data types in
of transcriptions
visuallysegregatedchannels(principle 4), as in the following narrow transcription
sample:
(1e)
((LuNcr))
R: ... (1.0) Whenthey .quit going to ^Littleton,
\
. . every rweek to see his ^graGndrnoGtheGr GGG, /
(E)
L: .. Oh rthatrs [when he ^outgrew it]? /
R:
t @ C @l C
Considering
the amount of prosodic information conveyedhere, this display is
more readable than it would be if all information were presented
significantly
withinjust the mixed-case"channel",rather than having the prosody (mostly)
in a separatenon-alphabetic"channel".
segregated
9. Usenotatioru which maximizedata access.The need for diversetools
andformatsto servethe various usesof discoursetranscriptionshas been treated
at somelength in $2. Clearly some notations are flexible enoughto allow access
throughall of thesemeans,while others are too dependenton one particular tool
(for examplea word processorwith specialfont capabilities). Notations which are
readilytransportable(see $3.3 below) not only make the transition easier,they
helpensuredata integrity.
10. Maintain coruistentappeeranceacrossmodesof access. This in some
followsfrom the last principle; but when it comestime to publish
sense
in a book or article it is tempting to throw off the irksome
transcriptions
limitationsof the ASCII characterset in order to draw on the wealth of special
fonts, and formatting possibilitiesthat typesettingcan offer. But this has
symbols,
John W. Du Bois
certain disadvantages,
perhapsthe most important being that time spent reading
the book version and gaining familiarity with its notational conventionswitl not
necessarilycarry over to the transcriptionsas displayedin other formats. In
contrast,maintaining the same conventionsand visual displaywhether the
medium is a blackboard,notebook, computer screen,printout, or book will mean
that experiencedoes carry over from one medium to the next. Another
considerationis that maintaining the same conventionslessensthe tikelihood that
errors will be introduced in the processof convertingfrom one formatto another.
The argument for using the same conventionsacrossformats appliesnot only to
different media but also to different tools within the samemedium, such as word
processingversusdatabasesoffware,where the latter may be unable to handle the
sophisticatedsymbolsand formats producedby the former. Of course,there is no
harm in occasionallydrawing on the additional resourcesof a more flexible
medium like print for functions like highlightingspeechevent featuresof special
interest; for example,DT suggestsusing boldface,an otherwisenonviable
notational resource(cf. principle 11), for highlightingin examplescited in
publications(cf. principle 21). It's just that both the integrity of the data and the
investmentof the learner will be better servedif such embellishmentsare
restricted to redundantlyenhancedpresentationof a stable set of conventions.
3.3 Robustness
MAKE REPRESENTATIoNs
RoBUST. Once the goals of maximizingd,ata
accessand transportabilityacrossplatforms are adopted,this immediately invokes
a need for data representationswhich are robust enoughto survivethe moves
with all their integrity intact. The maxim of robustness,and hence the principles
outlined in this section,can thus be consideredas corollariesto principtes9 ind
10.
11. Usewidelyavailablechqracters.Characterswhich are widely available
do not have to be translated,which removesthe danger of mistranslation. The
most widely available standardis the lower ASCII (7-bit) characterset. This puts
a severecramp on the number of symbolsthat can be drawn on, demandinga
certain amount of ingenuity from the transcriptiondesigner. On the bright side, it
instills discipline and discouragesa frivolous proliferation of notations,*hich
would make things more complicatedfor the learner.
- Nothing said here should be taken to imply that people who are already
comfortably using non-ASCII charactersfor writing a given languageshould stop
using them in their discoursetranscriptions;it is only that a transciiption system
designedfor general (including international) use should not impose additionat
Tiansciption designprinciples
89
demands
of this type on its users. For example,the specialistin Swedishor
Finnishdiscoursefinds nothing intimidating about venturing beyond the limited
horizonsof lower ASCII, since such symbolsare already in daily use for writing
thestandardliterary languageitself. But the transcriptionsystemshould not
presume
to add more demandsin this direction, especiallyconsideringthat
speakers
of different languagesaround the world are not likely to have the same
setof extra-ASCIIsymbolsat their disposal.T
As a corollary of this principle, one should avoid using notational resources
whichare not standardlyrepresentedacrossplatforms, such as boldface, italics,
underlining,
specialfonts (especiallyproportional fonts), margin shifts, and so on,
asthesole marker of crucial contrastsbetweentranscriptionalcategories.s
Althoughmanyword processorsallow boldface,for example,they don't represent
it in the sameway, so that distinctionscan be obliterated or garbled when moving
dataacrossplatforms. Also, many databasesand concordanceprogramsmay not
be ableto effectivelyhandle attributes like italics and boldface.
t2. Avoi^dinvisible contrasts. If trvo categoriesare distinguishedby
characters
that cannot be visually discriminatedin all formats, this makesit hard
for usersto monitor therrpfor example,in proofreadinga paper printout of a
computerfile-and hence may make it difficult to maintain data integrity acrossall
modesof access.For example,the contrastbetweena tab and a space(or
betweena tab and five spaces)should not be used even if the computer screen
candisplayit, becauseit won't show on paper. In the sameway, contrastsbased
on nonprinting"control" characterscreate problemsfor proofing of printouts, and
hencefor data integrity.
t3. Avoid fragile contrasts.By "fragile" contrastsI mean those which are
easilydisruptedor modified in the normal courseof working with a transcription.
For example,some transcriptionsystemsindicate speechoverlap by vertically
aligningthe two speakers'overlappingwords, and then aligning a pair of brackets
on a third line between them to mark the beginningand ending of the overlap.
But if a user changesthe margins,tab settings,or fonts, the vertical alignment of
thesethree lines is likely to shift. At best this kind of accidentalchangewill
producea garbled display;but in the worst case,it will result in a configuration
thatlookslike a plausible speechoverlap, allowing the spuriousalignment to go
undetected
and uncorrected. Also, moving the data from a word processingfile
into,say,a databasefile is likely again to causevertical alignment information to
be modifiedor lost. In general,contrastswhich depend entirely on vertical
alignmentbetweenseparatelines, or on the insertion of sequencesof multiple
spaces,
are prone to unnoticed modificationsand should be avoided.e
90
John ll4 Du Bois
But there is no harm in supplementinga securerobust notation with fragile
resourceslike vertical alignment,as long as theseare used redundantly. For
example,in DT bracketsare first placed within the line of overlappedspeech,
where they are not subjectto accidentalshifts;but once the overlap is securely
marked in this way, temporal sequencingand simultaneitycan be displayedmore
iconically by adding vertical alignment. The embeddedbracketsensurethat any
reformatting will not result in undetectablechanges.
(20)
A:
C:
A:
B:
S:
B:
((Dtmmn))
What were you doing before.
q
r l , au n d .
W e w e re rn
I r s - -e
r rs
r Ys i n g
a! vro
nessing]
[But we ainft
[[around]I
[HeyI .
[]thml ?
...
lAlI rishtl
lHmJI -
no more,
I .
Note that the use of singleversusdouble bracketshere to distinguishbetween two
separatecasesof overlap in close successionprovides the crucial robustnessthat
mere vertical alignment cannot. Even if the words "All right" (in line 6) were
accidentallyshifted to the left margin, the fact that they overlap with "around" and
not with "Hey" could be reliably recovered.
3.4 Economy
MAKE REPRESENTATIoNS
ECoNoMICAL. The most important reasonsto
build economyinto the designof a discoursetranscriptionsystemare efficiencyin
transcribingand easeof reading,with decreaseddata storagerequirementsbeing
a lesserside benefit.
14. Avoid verbosenotatioru. A short notation is generally preferred over a
long one. This minimizes the number of keystrokesthe transcribermust use,
decreasesstoragedemands,and improvesreadability by keeping the words from
getting lost in the clutter.
Of course,this principle must be balancedagainstthe need for mnemonic
aids to learning (principle 7). For example,in DT whisperedspeechis written
with trvo letters (<WH WH>) where an arbitraryone-characternotationwould
do as well in theory;l0but <WH WH> is easierto remember as whispering
than, say,an arbitrary < 8 8 >.
Tiansciptiondestgnprinciples
(21)
91,
((Amrca))
A : Leopar d s w i l t
b e h a rd to g e t.
Leopards and cheetah.
T hey r r e < W E fa s t W II> .
an d th e y h i d e .
You don I t see thern very often.
On the other hand, <WHISPER WHISPER>, which would obviouslybe still
easierto learn and remember,is rejectedas too verbose:the numerouscharacters
takelongerto write and, by cluttering the transcriptionunnecessarily,make it
harderto focuson the words spoken.
15. Useshort notationsfor high frequencyphenomena. Since the nonalphabetic
symbolsthat have the most obviouspotential for brief, non-verbose
notationsare a limited good, the designershould exploit them wisely. Overall
economy
of charactersin a body of transcriptionsis promoted if the briefest
notationsare reservedfor representingthings that happen most often in discourse'
infrequentlyoccurringphenomenacan be *ritten using longer notations. This cari
be thoughtof as turning Zip?s law on its head-making it a prescriptiveinjunction,
in orderto ensurethat transcriptiondata resemblesnatural languagedata in
frequency
distribution. For example,becauselaughter is generallya high
phenomenonin conversation,the use of the one-character@ sign for
frequency
laughterin DT instead of the word (I-AUGH) is not only economicalin itself, it
alsomakesfor a greater total economy than assigningthe @ to something that
happens
more rarely, such as a cough (which DT is content to write out as
(COUGD).II Similarly, it happensrather often that somewords on a recording
cannotbe heard clearly; rather than write out somethinglike "7 SYLIABLES
INDECIPHERABLE",or evenjust ((2 sylls)),12
an economyis gained in the DT
practiceof simply writing one capital letter X for each indecipherablesyllable.l3
Theadvantageis not merely that fewer keystrokesare neededduring the
process,labut also that the reader is not boggeddown by large
transcribing
amountsof clutter.
In addition, one of the most important functionsof short notations is their
potentialfor diagrammaticiconicity, sincethey can easilybe iterated when the
eventsthey representare iterated (e.g. laughter syllablesor indecipherable
syllables;
seeprinciple 6).
The short notationsprinciple applieseven for smaller scaledifferencesin
formalmarkedness.For example,audible inhalation (written as (H) in DT) is
John W Du Bois
substantiallymore frequent in discoursethan audible exhalationGIx); hence it is
the exhalation that should be assignedthe longer, "marked"notation.ls
((Amrce))
(22)
D:
A:
Do y ou th i n k
(E) No=,
s h e fe l l
out
of
fear?
((DernrssloN))
(23)
B: . . . (4.3)
(Ex) . . . Kids in the city
miss so much.
16. Usedisciminable notatioru for word-internalphenomena. The reason
for this is obvious:if users(and their computationaltools) are to be able to
recognizethe lexical words in a transcription,they need to be able to filter out
any symbolsthat appear in the middle of them. I-engtheningof sounds,laughing
while speaking,and overlap beginningand ending are some of the phenomena
that may need to be written wherever they happen,even if this is in the middle of
a word. As long as the notationsfor suchphenomenause only non-alphabetic
symbols,or employ bracketsto encloseany alphabeticcharactersinsertedwithin
the word, they will allow unambiguousdifferentiation betweenlexical word and
embeddedfeatures. For example,DT usesthe @ sign for laughter (cf. example
19) and the equal sign for prosodic lengthening(example24). Both phenomena
frequently occur in the middle of words,but their notations allow them to be
easily discriminatedfrom the word itself.
(24)
((RaNcu))
R: t he hor s e i s a l w a y s w e = t,
always moi=st,
and itts
it t s alw a y s o n s o rn e th i n g m o i= st,
going to be softer.
Sure itts
17. Minimize word-tnternalnotatioru. Becausereaders tend to recogrize
whole words at one go by taking in their overall gestaltshape,it is important to
disrupt this familiar shapeas little as possible. For phenomenathat must be
written word-internally (such as prosodiclengtheningof individual sounds),the
notation should be not merely discriminable,but short. l-onger notationswill
separatethe beginning from the ending portion of a word, and make it more
difficult to perceive its gestaltunity at a glance. At normal reading speedit is
more difficult to assimilatesomethingwritten as "both(LENGTH)er" (or even
"both= = = - =er") than as "both=er". Fortunately,the non-alphabeticcharacters
that recommendthemselveson the ground of discriminabilityalso make good
short notations,requiring just a single character.
Tiansciption designpinciples
93
The minimum disruption of a word's sequenceof letters is actually zero
if a notation can be moved from inside a word to its margins,this
characters:
lessens
interferencewith gestaltword recognition. For example,in DT tone is
generally
written before the whole word that bears the accent ("/\reduce"), rather
thanbeforethe particular vowel on which this prominenceis realized ("red/\uce")
(asin the computerizedversion of the l-ondon-Lund Corpus; Svartvik and Quirk
1980:17).This is possiblebecausein English as in many languagesthe specific
locationof the prominencewithin the word is generallypredictable on lexical
grounds,
i.e. from information availablein the dictionary.
Although it could be argued that simply enclosingalphabeticword-internal
notations
within parenthesesachievesthe end of disciminability (see principle 16
above)about as well as using nonalphabeticcharacters,the practice fares poorly
on the criterion of minimizing disruption of gestaltword recognition,since it
introduces
at least two additional characters(the parentheses)beyond the number
requiredfor the distinctivenotation itself. Thus a laugh in the middle of a word
writtenwith alphabeticsymbolsas "heh",if provided with the parenthesesneeded
to unambiguously
differentiate it from the surroundingletters of the word it is
in, will interposefive charactersbetweenthe start and the finish of the
embedded
word,as againstone for @.
18. Usespacemeaningfulty.Meaning is carried not just by charactersbut
alsoby their spatial arrangementwithin the frame provided by line and page
(OchsL979:a5ff).If exploited effectivelyas a designfeature, this can both
economize
charactersand promote iconicity. The most penrasiveuse of spacein
discourse
transcriptionis for space-timeiconicity in the portrayal of temporal
sequencing
and simultaneity. Basedon conventionsof reading order traditional in
mostWesternlanguages,notationswritten farther to the left within the line
generally
representearlier eventsthan those written farther to the right, and lines
whichappearhigher on the page representearlier eventsthan lines which appear
lower. (SeeEhlich (forthcoming) for a different but effective approachto spacetimeiconicitybasedon musical notation.) But the complexityof multi-party
interactionsometimesoverwhelmsthis simple iconicity, as when overlapping
speech
by two speakersmust be written on two separatelines: for practical
reasons
one line has to be written above the other despitethere being no basisfor
thisin the temporal meaning carried by the vertical dimension(cf. examples1, 2,
14,20). Even here, the horizontal dimensioncan still be used iconically to
represent
simultaneiryof the overlap, as in DT and other systems.For instance,
in example16 above the bracketsare placed within the sequenceof several@
signs(representingextendedlaughter) in order to displayiconicallyjust where
withinthe laughter sequencethe overlapsbegin and end.
94
John W Du Bois
Note that spacemust be used with caution, sinceit can be a problematic
symbolic resourcedue to potential fragility (cf. principle 13). The safest,most
robust spatial notations revolve around the line of text (i.e. either horizontal
position within a line or vertical position in a sequenceof lines), becauselines are
anchoredby a stable,standardcharacter-the carriagereturn-which, though
normally invisible in itself, is readily visible in its consequences.
Spacecan also be exploited to indicate unit structure, as when line breaks
representintonation unit boundaries(seeprinciple 22 be\ow).
Another key use of space,often overlooked, is for the assignmentof
significance to positions within the line or page as the stable locus for a particular
information category (cf. Ochs 1979). For example,in DT the far left margin
consistentlycarries information on speakeridentity;16the next column over is
where intonation units begin (exceptoverlappingones); and the end of the line is
always the place to look for certain intonational information. Attaching this kind
of significanceto a syntagmaticposition,while perhapsnot robust enoughby itself
to serve as the sole marker of a contrast,can be a big help to the reader who
wishesto scan a transcriptionfor, say,recurrent patterns in terminal pitch
movement,or intonation unit truncation,or accentuationof intonation unit or
turn beginnings,and so on. The common practice of simply running words on
until they fill up the line wastesa valuable symbolicresource,since neither the
right margin nor the line itself carriesany consistentmeaning. Transcriptionusers
should not have to miss out on the kind of intuitive visual organrzationof
transcription information that a stable information locus can provide.
3.5 Adaptability
MAKE THE sysrEM ADArTABLE. Becausespoken discourseis complex
enough in its many layers that it virtually demandsto be approachedfrom a
variety of viewpoints,general discoursetranscriptionsystemsmust be adaptable.
19. Allow for searnlesstraruition betweendegreesof delicaq. For practical
reunonsa discourseresearchermay begin by making a fairly broad transcription,
and later decide to add additional detail. Also, it is frequently desirableto be
able to present a given speechevent in greater or lesserdetail dependingon the
occasion,which may reflect the level of the audience,the kind of phenomenon
being examined,and so on. For thesereasonsthe transcriptionsystemshould
facilitate moving from one level of delicacyto the next, in either direction"within
the same transcriptionalframework and even amongversionsof the same text.
The most important designissuehere is to insure in advancethat if "narrow"
Tianscription desrgnpinciples
95
notationsare later added alongsidethe notationsof an initially broad
transcription,
they neither introduce ambiguitiesnor require changesto the
originaltranscription,which could easilyintroduce errors. Notational paradigms
mustbe set up in advanceto include distinctive symbolsfor narrow features, even
if theseare not as yet being used. (trss crucially,it is also useful if broad and
narrowfeaturesare sufficientlywell distinguishedthat a narrow transcriptioncan
be automaticallyconvertedto a broad one by stripping out the narrow notations.)
For example,a transcribermight begin by indicating only medium and long
pauses,
consideringshort pauses(200 millisecondsor less) as being too detailed
for currentpurposes. In the following example,a broad transcription(using DT)
showsone medium pausein the fourth line, representedby three dots:
(2s)
R: a reining pattern is,
a pattern where you do sliding
stops,
spins,
... Iead changes,
I know you probably don I t know what that
((ReNcu))
is.
But if the researcherlater decidesto focus more attention on speechrhythms,and
henceincreasesdelicacyto the point where even short pausesare shown,all that
will be required is to listen again and time the relevant (short) pauses,and then
markthem where they occur. Given proper transcriptionsystemdesigrr,the
increased
delicacywill be accommodatedsmoothly:newly introduced short-pause
notations(e.g.two-dot clustersin DT) will not be confusedwith, nor require
in, the original representationsof longer pauses.
changes
(26)
R: ..
..
..
...
..
((ReNcH))
a reining pattern is,
a pattern where you .. do sliding stops,
spins,
Iead changes,
I know you probably don I t know what that
is.
20. Allow for seamlessintegration of user-definedtraruciption categoies. It
is impossiblefor the designerof a general-usetranscriptionsystemto anticipate
everythingthat its userswill want to do with it. The systemshould be designedin
to let usersintroduce their own transcriptionalcategories,without
advance
compromising
the integrity of data written using the original transcription
conventions.The simplestway to do this is to leave some symbolsundefined so
thatuserscan assigntheir own meaningsto them, and also to provide for the
96
John W Du Bois
constructionof open-endedsetsof complexsymbols. For example,in DT the
charactersi - ; are reservedfor user-definedcategories. Also, singleparentheses
can be used with any brief descriptionto indicate new categoriesof nonverbal
vocal sounds(e.g. (SNORT), (SNIFFLE)), while angle bracketscan be combined
with a brief annotation to produce as many new categoriesuNare needed for
representingphenomenawhich apply over a stretch of speech(e.g. <WH WH>).
2t. Allow for seatnlessintegrationof presentationfeatures. Advance
planning is required to ensurethat notational resourceswill be available to people
who want to highlight particular speechevent featurerfor example,to call
attention to the backchannelsin a passagecited in an article on this topic. This is
a fairly simple matter, since highlightingis primarily a matter of displaywithin a
limited, controlled context such as a printed article. Becausethe displayformat
chosenfor a publication will not ordinarily affect the integrity of the database
from which its illustrative examplesare drawrl resourceswhich are too
problematic for discoursetranscriptionsand databasesper se (principles 11 and
13)-boldface, italics, underlining, and so orFcan be freely used for presentation
(as exemplifiedin the use of boldface throughout this article).
22. Allow for searnlessintegration of indexing information. Working with
discourseoften requires making referenceto a particular place in the text,
whether to attach analytical coding to a specific unit or to informally describe a
discoursefeature of interest. One commonway to add indexingto a transcription
is to attach a unique number (or complexof numbers)to each unit in the
transcription"such as a line.
(27)
275
277
27A
((CenSer-es))
D: we h a v e th e u s e d D o d g e ,
but no new ones.
G : . . . N o n e rd o n e s ?
While the unit in question can be arbitrarily delineated(for example,as
the set of words that happen to fit on one line), there are important advantagesif
the unit is meaningful,and its boundariessignificant(see principle 18 above).
This allows the unit of indexing to be a significantunit of coding,which can make
managementof integrated transcriptionand coding data much easier. This is one
reason a number of discourseresearchersfind it useful to place each intonation
unit on a separate[ine, allowing each to be individually indexed and subsequently
coded and manipulated consistentlyin all databaseoperations(Du Bois and
Schuetze-Coburn,forthcoming). The one-unit-per-lineconventionis so valuable
that even when one must deal with long intonation units (or long representations
Tiansciption designpinciples
97
of them,as is often the casewhen the original text line is accompaniedby a
gloss),it is worth taking extra measuressuch as using a
morpheme-by-molpheme
smallerfont or printing along the long (landscape)axis of the paper in order to
geteachunit onto a single line.
23. Allow for seamlessintegrationof user-definedcoding information.
Because
the kinds of analyticalcoding that discourseresearchersdo vary widely,
transcription
systemsmust allow for diverseuser-definedcoding schemes.There
aretwo main waysto integrate coding information with transcriptioninformation.
Thefirst is to embed codesdirectly into the transcription;for this purpose some
symbols
shouldbe reservedfor user-definedcoding categories(parallel to
reserving
symbolsfor user-definedtranscriptioncategories,principle 20). The
is simply to provide an indexing schemethat can accessvarious kinds of
second
unitsthat may be of interest to the researcher(principle 22). With a good
indexingschemein place, the researchercan attach an unlimited amount of
codingto eachunit within the comfortableframework of a discoursedatabase,
withoutclutteringup the transcription,and without underminingits integrity with
coding-rather than transcription-levelanalytical decisions. (For a descriptionof
a granmatical-and prosodic-unitindexingand coding schemespecificallydesigned
for discourseresearchdatabases,see Du Bois and Schuetze-Coburn,
forthcoming.)
4 Conclusions
What I have soughtto do in this article is to outline some of the principles
whichin my view should govern the designof an effectivediscoursetranscription
system.I beganby consideringhow discoursetranscriptionsare used,in order to
determinewhat the goals of transcriptiondesignshould be; and then went on to
offersuggestions
on the principles and conventionsthat ensurethat transcriptions
will do what is expectedof them. Throughout I have been guided by an
philosophyof the ideal of discoursetranscription:to presentthe
underlying
multifaceted
flux of discoursein a way that is zrsaccessibleto the analystas it is
to the participant. Though this ideal may representmore than any discourse
transcription
can ultimately hope to achieve,still a transcriptioncan and should
goa longway toward making the speechevent accessibleto the discourse
researcher.The raw reality of discoursemust be interpreted and analyzedinto
categories
which reveal its living structure,and which are displayedin a way that
is both evidentto the reader'seye and accessibleto any computationaltools that
maybe called into play. The transcriptiondesignprocessthus takes into
consideration
both the nature of the speechevent and the nature of discourse
research,
forging a structuredlanguagewhich articulatesthe apprehensionof the
formerby the latter. This processrequires effectivedesignprinciples,if
98
John W Du Bois
transcriptionsare to live up to the multiplex demandsthey face. Making design
principles explicit provides a framework for evaluatingtranscriptionsystems. It
contributesto the designof new systemsand the improvement of old ones,as well
as to a general understanding of the role transcriptions play in shaping analysesof
discourseand the theories to which they give rise.
Along with presentingsome general designprinciples,I have made
frequent reference to one particular transcription system. The Drscourse
Trarucription systemhas grown out of the work of many individuals representing a
variety of approachesto the study of discourse,as codified and further developed
by myself and my colleagues(Du Bois et al. forthcoming a, b). This system,or
variants of it, is now being used by a growing number of researchersat various
universitiesaround the world to pursue a broad range of questionsabout spoken
discourse-questionsabout how speakersuse featureslike discourseparticles,
sentencestructure,and intonation to organizethe flow of information and
interaction in conversation;how speakersachievetheir interactionalgoals through
speech;how social groups differ in their conversationalstyles;how spokengenres,
such as conversations,political speeches,serrnons,and so on, differ from each
other; how spoken languagediffers from written language;how languageswith
sharyly distinct typologicalfeaturesmanagethe samediscoursefunctions; and
many other questions. Becausethis systemis designedto be used for such diverse
purposes,we have worked hard to make sure it adequatelyrepresentsa broad
range of discoursefeatures,within an overall framework that is coherentyet
adaptable.
In fact, the DT transcriptionsystemis best seensimply as a framework
within which representationsof speecheventscan be created-a framework
designedfrom the outset to allow for creativeadaptationand growth. There are
severaldimensionsalong which further developmentcan be hoped for in coming
yearrfor example,toward a precisebut readableinterweavingwithin the
transcription of representationsof speechrhythms and timing; detailed pitch and
amplitude information; and nonverbalcues like eye gaze,body orientatiorqand so
on. Inevitably, ?s new researchdirectionsarise in coming years there will be new
and unforeseentranscriptionalfunctions to be incorporated,and old conventions
to be adapted. To the extent that this or any other systemof discourse
transcription must develop and adapt during the whole of its active lifetime, it is
to be hoped that the designprinciples articulatedhere will contribute to the next
round of advances.
Tianscription designpinciples
99
NOTES
1. Thispaperis basedon a seriesof presentationsmade during September1989at
Uppsala
and Lund Universitiesand at the StockholmConferenceon the Use of
in I-anguageTeachingand Research;and, at UC SantaBarbara, at the
Computers
Conference
on Current Issuesin Corpus Linguistics,June 1990. I am especially
indebted
to StephanSchuetze-Coburn,
Jane Edwardsand Ken Whistler for helpful
discussions
of severalof the issuestreated in this paper, and to Robert Potter for
information
on the history of playwrights'notational conventions. I would also like
my appreciationfor helpful commentson earlier incarnationsof this work
to express
to WallaceChafe, Alan Cruttenden, SusannaCumming, Alessandro Duranti,
Irech, Marianne Mithun, Elinor Ochs,Danae Paolino, Stig Johanssen,
Geoffrey
Emanuel
Schegloff,Jan Svartvik,Gunnel Tottie, SandraThompson,and many others
whohavemadeinvaluablesuggestionsduring the long evolution of our transcription
system.They are of coursenot accountablefor my decisionsin the end.
2. Nor is this by any meansto endorsethe doctrine once known iN separationof
- now seekingrehabilitation under the label of modularity, i.e. internal mutual
levels
(non-integration)of compartmentalizedsystemcomponents.
autonomy
3. Exceptas otherwisenoted, examplesthroughout this article will be presentedin
a fairlybroadtranscription(Du Bois et al. forthcoming a). They will introduce more
detailed"narrow"featuresonly as necessaryfor illustrating specifictranscriptional
categories.
4. All-capitalsare also used for other kinds of non-speech,such as speaker
identification
labels.
5. Thisiconicnotation, incorporatedinto DT, also gains support from modern
scholarly
tradition, given the use of similar notations (e.g. arrows or lines sloping
versusdownward-right)by many others including Crystal (1969),
upward-right
Svarwikand Quirk (1980:17,22), and the International PhoneticAssociation(1989).
6. Thefact that the iconicity doesn'tgain strengthuntil the conventionalmeaning
is revealed
may convincesome that it is not "real" iconicity. But as Whorf, Ullmann
andotherslong ago observed,such after-the-facticonicity may indeed be the norm,
for how speakersunderstandand use
andin anycasehas important consequences
thesymbolin question.
7. Thereis hope in the not-too-distantfuture for an international standardfor
(includingmicrocomputers)that would encompassmore than 64,000
computers
includingvirtually all of the world's writing systems(Ken Whistler,
characters,
i00
John w Du Bois
personal communication). If this standardcomesinto universaluse, the constraints
on the selectionof symbolsfor discoursetranscriptionsystemswill easeconsiderably.
However, the fundamentaldesignprinciples outlined here will remain valid.
8. But see principle 2L for legitimate usesof such notational resources.
9. Another contrastwhich, though lessproblematic,is not entirely reliable, is that
between lower-caseand upper-caseletters,when the notation is confrontedwith
words which are obligatorily written entirely in capitals,such as the English first
person 'I' or commonly pronouncedacronymslike 'USA, 'ACLIJ', and so on.
10. Since DT uses <W W> as a mnemonic (and non-verbose)notation for
"widened"pitch movement,the letter W is not availablefor use here.
11. Of course,soundsdon't come already analyzedinto categories:the transcriber
must actively classifythe continuousdiversityof human vocal expressionif such
discrete transcriptionalcategoriesas "laugh"and "cough"are to be employed. This
is true whether the categoriesare representedin single characterslike "@" or in
stringslike "COUGH", whose apparent orthographicsuccessin capturing the reality
of a "cough"is of courseonly illusory. It remains true whether the categoriesare few
and broad (laugh, cough,etc.) or many and narrow (distinguishingseveraltypes of
laughter,for example). Spelling out categorylabels is a mnemonic aid, not a
depiction of natural fact.
12. See SvarMk and Quirk (1980:2a).
13. The letter X has in the past been used in child languagetranscriptionto indicate
exact repetition of an utterance (Ochs 1979:65),but this practice seemslesscommon
nowadays.
14. Assumingthat the indecipherablestretcheswill normally be relatively short, of
course.
15. But see Jefferson'ssystem(Atkinson and Heritage 1984),which reversesthis
markednessrelation with .h and h, respectively.
16. The meaningful use of spaceis a key characteristicof play- and film-scripts,
though specificconventionsvary (cf. examples4 and 11 above,and Catron (198a)).
Tianscription designpinciples
101
REFERENCES
Atkinso4J. Maxwell and John Heritage (eds.) (1984)Transcript notation.
Structuresof social action: Studiesin conversationanalysis. Cambridge:
CambridgeUniversity Press,ix-xvi.
Catron,l,ouis E. (1984) Witing producing,and sellingyour play. Englewood Cliffs,
New Jersey:Prentice-Hall.
Chafe,WallaceL. (1987) Cognitive constraintson information flow. In Russell S.
Tomlin (ed.), Coherenceand groun"dtngin discource. Amsterdam:
Benjamins,
2L-52.
Crystal,David (1969) Prosodicsystemsand intonation in English. Cambridge:
CambridgeUniversity Press.
Disraeli,Benjamin(1904) Sybil;or, The two natioru. [-ondon: M. Walter Dunne.
Du Bois,John W. (1985) Competingmotivations. In John Haiman (ed.), Iconicity
in syntax. Typologicalstudiesin language,vol. 6. Amsterdam: Benjamins,
343-65.
Du Bois,John W. (1986) The pragmaticsof marginality:Illocutionlesslanguagein
cartoonsemiotics. Paper deliveredto the Conferenceon Sound
Symbolism,UC Berkeley.
Du Bois,John W. (forthcoming) Typology of discourseinformation. MS, UC
SantaBarbara.
Du Bois,John W. and StephanSchuetze-Coburn(forthcoming) Representing
hierarchy:Constituentstructure for discoursedatabases.In Edwards and
lampert (forthcoming).
Du Bois,John W., StephanSchuetze-Coburn,
Danae Paolino, and Susanna
Cumming(forthcoming a) Discoursetransciption. MS, UC Santa Barbara.
Du Bois,John W., StephanSchuetze-Coburn,
Danae Paolino, and Susanna
Cumming(forthcoming b) Outline of discoursetranscription. In Edwards
and lampert (forthcoming).
I
L02
John w. Du Bois
Edwards, Jane A (1989) Transciption and the new funcrtonalism:A
counterproposalto CHILDES' CHAT conventioru. Berkeley Cognitive
ScienceReport No. 58, March 1989. Institute of Cognitive Studies,
University of Californi4 Berkeley.
Edwards, Jane A. (forthcoming) Transcription in discourse. In William Bright
(ed.), Oxford International Enqclopedia of Linguistics. Oxford: Odord
University Press.
Edwards, Jane A. and Martin D. l-ampert (eds.) (forthcoming) Talking dua:
Trarcciption and coding of spokendiscourse. Hillsdale, New Jersey:
lawrence Erlbaum Associates.
Ehlich, Konrad (forthcoming) HIAF-A transcription systemfor discoursedata.
In Edwards and l-ampert (forthcoming).
Hillerman, Tony (1990) Coyotewaits. Harper and Row.
International Phonetic Association(1989) Report on the 1989Kiel Convention.
Joumal of the International PhoneticAssociuion 19: 67-80.
l^adefoged,Peter (1990) Some reflectionson the new IPA. UCL/I WorkingPapers
in Linguistics 74: 6l-76.
[.odge, David (1984) Small World. New York: Warner.
Ochs, Elinor (1979) Transcription as theory. In Elinor Ochs and Bambi B.
Schieffelin (eds.),Developmentalpragmatics. New York: Academic Press,
43-72.
Shakespeare,William (1,623[1928]) The tempest.A facsimileof the first folio text.
I-ondon: Faber and Faber.
Wilde, Oscar (1982) The importanceof being earnest. The annotatedOscar Wilde.
New York: Clarkson Potter.
Williams, Tennessee(1971) The glassmenagerie. The theatreof Tennessee
Wlliarns, vol. 1. New York: New Directions.
Tiansciption designprinciptes
103
Svartvik,Jan and Randolph Quirk (eds.) (1980)A corpusof English conversation.
Lund Studiesin English 56. Lund: C.W.K. Gleerup.
104
John w Du Bois
APPENDIX 1. Symbolsfor DiscourseTranscription
Umrs
Intonation unit
Truncated intonation unit
Word
Truncated word
SpEexnns
Speakeridentity/turn start
Speechoverlap
TnaNsmoNAL CoNnNUITY
Final
Continuing
Appeal
TERNanTALPtrcg DnecrroN
Fall
Rise
I-evel
Accexr AND LeNcrrreNING
Primary accent
Secondaryaccent
Booster
kngthening
Toue
Fall
Rise
Fall-rise
Rise-fall
lrvel
Peuse
[nng
Medium
Short
I-atching
Vocar Norses
Vocal noises
Inhalation
Exhalation
Glottal stop
Iaughter
!:^*,
{-sWce}
il
t
?
\
I
I
:
\
/
\/
/\
:::(*)
(0)
o
(H)
(H")
Vo
@
retum]
Tianscription destgnpinciptes
r
,
'
i
,
I
QueI-rrY
Quality
laugh quality
Quotationquality
Multiple quality features
PHounncs
Phonetic/phonemictranscription
Tnq,NscnlBER's
PeRspEctve,
Researcher'scomment
Uncertainhearing
Indecipherablesyllable
NoTATIoNS
SPEcTaUZND
Duration
Intonation unit continued
Intonation subunit boundary
Embeddedintonationunit
Reset
False start
Codeswitching
<Y Y>
<@ @>
<Q Q>
<Y <Z Z>Y >
(/ /)
(( ))
<X X>
X
(N)
&
I
.l l>
{Capital Initial}
<12 12>
NON.TR,aNSCRIPTIoN LINES
Non-transcriptionline
Interlinear glossline
Rnsenvep SvNasors
Phonemic/orthographic
Morphosyntacticcoding
User-definable
$
$G
'
+'#
" - ;
{)
105
106
John w Du Bois
^APPENDIX2. Synopsisof Tlanscription DesignPrinciples
CarecoRY DEFIMTIoN: Define good categories.
1. Define transcriptionalcategorieswhich make the necessarydistinctions
among discoursephenomena.
2. Define sufficiently explicit categories.
3. Define sufficientlygeneral categories.
4. Contrast data types.
AccessmILITY: Mal@ the systemaccessible.
5. Use familiar notations.
6. Use motivated notations.
7. Use easily learned notations.
8. Segregateunfamiliar notations.
9. Use notations which ma,rimizedata access.
10. Maintain consistentappearanceacrossmodesof access.
robust.
Ronusmvrs s: M ake representations
11. Use widely available characters.
12. Avoid invisible contrasts.
13. Avoid fragile contrasts.
EcoNouv:
14.
15.
16.
17.
18.
Make representatiorueconomical.
Avoid verbosenotations.
Use short notationsfor high frequencyphenomena.
Use discriminablenotationsfor word-internal phenomena.
Minimize word-internal notations.
Use spacemeaningfully.
AoamenILITY: Make the systemadaptable.
19. Allow for seamlesstransition betweendegreesof delicacy.
20. Allow for seamlessintegration of user-definedtranscriptioncategories.
21. Allow for seamlessintegration of presentationfeatures.
22. Allow for seamlessintegration of indexinginformation.
23. Allow for seamlessintegration of user-definedcoding information.
© Copyright 2026 Paperzz