
A CORPUS-BASED INVESTIGATION
OF LEXICAL COHESION IN EN & IT NON-TRANSLATED TEXTS
AND IN IT TRANSLATED TEXTS
A thesis submitted
to Kent State University in partial
fulfillment of the requirements for the
degree of Doctor of Philosophy
by
Leonardo Giannossa
August, 2012
© Copyright by Leonardo Giannossa 2012
All Rights Reserved
Dissertation written by
Leonardo Giannossa
M.A., University of Bari, Italy, 2007
B.A., University of Bari, Italy, 2005
Approved by
______________________________, Chair, Doctoral Dissertation Committee
Brian Baer
______________________________, Members, Doctoral Dissertation Committee
Richard K. Washbourne
______________________________,
Erik Angelone
______________________________,
Patricia Dunmire
______________________________,
Sarah Rilling
Accepted by
______________________________, Interim Chair, Modern and Classical Language
Studies
Keiran J. Dunne
______________________________, Dean, College of Arts and Sciences
Timothy Moerland
Table of Contents
LIST OF FIGURES
LIST OF TABLES
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
INTRODUCTION
Why Study Lexical Cohesion?
Research Hypotheses
Research Method
Significance of my Research Hypotheses
Summary of Chapters
CHAPTER I
1.1 Coherence vs. Cohesion
1.2 Lexical Cohesion
1.3 Lexical Cohesion Studies in Discourse Analysis and Linguistics
1.4 Lexical Chaining Sources
1.4.1 The WordNet Project
1.5 Cohesion and Lexical Cohesion in Translation Studies
CHAPTER II
2.1 Methodological Approaches: Text Analysis and Corpus Linguistics
2.1.1 Text Analysis
2.2 Tools
2.2.1 WordSmith Tools
2.2.2 WordNet
2.3 Preliminary Analysis
2.4 Semantic Relation Analysis
2.5 Statistical Analysis
CHAPTER III
3.1 Parallel and Comparable Corpora
3.2 Textual Analysis
3.2.1 Standardized Type-Token Ratio
3.2.2 Sentence Number
3.2.3 Lexical Density
3.2.4 Readability
3.2.5 Average Sentence Length
3.3 Semantic Analysis
3.3.1 Repetition and Modified Repetition
3.3.2 Synonyms
3.3.3 Antonyms, Meronyms and Holonyms
3.3.4 Hypernyms
3.3.5 Hyponyms
3.4 SPSS Statistical Analysis
3.4.1 Textual Features
3.4.2 SPSS Statistical Analysis: Semantic Features
CHAPTER IV
4.1 Introduction
4.2 Textual Features
Average Sentence Length
Sentence Number
STTR
Lexical Density
4.3 Semantic Features
4.3.1 Repetition
4.3.2 Synonyms
4.3.3 Meronyms, Holonyms, Hypernyms, Hyponyms, Antonyms
4.3.4 Semantic Categories Other than Repetitions as a Whole
CHAPTER V
5.1 Introduction
5.2 Pedagogical Implications
5.3 Limitations and Future Directions
GLOSSARY OF ACRONYMS
References
Webography
LIST OF FIGURES
Figure 1 – WordList Frequency List
Figure 2 – WordList Statistics
Figure 3 – MultiWordNet Interface
LIST OF TABLES
Table 1 – Preliminary Textual Analysis Screenshot
Table 2 – Semantic Relation Analysis
Table 3.1 – Parallel Corpus STTRs
Table 3.2 – Parallel Corpus Types
Table 3.3 – Comparable Corpus STTRs
Table 3.4 – Comparable Corpus STTR Means
Table 3.5 – Parallel Corpus Sentence Numbers
Table 3.6 – Average Sentence Numbers
Table 3.7 – Parallel and Comparable Corpus Average Sentence Length
Table 3.8 – Average Sentence Number in an Average Text of 3,339 Tokens
Table 3.9 – Comparable Corpus Average Sentence Number
Table 3.10 – Parallel Corpus Lexical Density
Table 3.11 – Lexical Density in English and Italian Originals
Table 3.12 – Parallel and Comparable Corpus Average Lexical Density
Table 3.13 – Parallel Corpus Readability Indices
Table 3.14 – Parallel and Comparable Corpus Readability Indices
Table 3.15 – Parallel Corpus Average Sentence Length
Table 3.16 – Parallel Corpus Mean ASL
Table 3.17 – Comparable Corpus ASL
Table 3.18 – Comparable Corpus Mean ASL
Table 3.19 – Mean ASL in English and Italian Originals
Table 3.20 – Parallel Corpus Repetitions
Table 3.21 – Parallel Corpus Average Repetition
Table 3.22 – Comparable Corpus Average Repetition
Table 3.23 – Percentage of Use of Repetition Compared to Text Size
Table 3.24 – Parallel Corpus Synonyms
Table 3.25 – Parallel Corpus Synonym Means
Table 3.26 – Parallel Corpus Antonyms
Table 3.27 – Parallel Corpus Antonym Means
Table 3.28 – Italian Originals Antonyms
Table 3.29 – Parallel Corpus Meronyms
Table 3.30 – Parallel Corpus Meronym Means
Table 3.31 – Italian Originals Meronyms
Table 3.32 – Parallel Corpus Holonyms
Table 3.33 – Parallel Corpus Holonym Means
Table 3.34 – Italian Originals Holonyms
Table 3.35 – Parallel Corpus Hypernyms
Table 3.36 – Parallel Corpus Hypernym Means
Table 3.37 – Italian Originals Hypernyms
Table 3.38 – Parallel Corpus Hyponyms
Table 3.39 – Parallel Corpus Hyponym Means
Table 3.40 – Italian Originals Hyponyms
Table 3.41 – Parallel & Comparable Corpus Statistical Data
Table 3.42 – Parallel & Comparable Corpus Statistical Means
DEDICATION
To my parents, Antonio Giannossa and Rita Tocci; without them and their good teachings I would never have come as far as I have.
ACKNOWLEDGEMENTS
Firstly, I would like to express deep gratitude to my advisor, Dr. Brian Baer, who helped me throughout the at times hard and stressful writing stages of this dissertation with his insightful advice and suggestions. It is sometimes hard to find the motivation to write one's dissertation because of the many obstacles and hurdles that may arise along the way, but his constant presence and aid enabled me to keep moving forward. I would also like to thank the remaining members of the committee, namely Dr. Richard Kelly Washbourne, Dr. Erik Angelone, Dr. Patricia Dunmire, and Dr. Sarah Rilling, who contributed to this dissertation through their valuable and thought-provoking comments and feedback, which helped me enhance this work in both style and content.
Secondly, I would like to thank all the professors from the Institute for Applied Linguistics at Kent State University whose courses I took during my coursework years. Thanks to them, I have come to learn far more than I previously knew about the field of translation studies. Each class helped me fine-tune my critical thinking and pedagogical skills, thus promoting the scholarly skills that I will need once out of the program and in the academic environment. Their constructive feedback on papers and their hands-on approach to translation studies helped me grow both as a translator and as a scholar. In particular, I would like to express my gratitude to Dr. Isabel Lacruz for her help in coming up with the experimental design for my study, for the time she devoted to my questions, for the interest she showed in my topic, and for her constant encouragement. I am also grateful to Dr. Wakabayashi for helping me present and publish my first paper on the CATS website and for always being available whenever I needed her help.
Thirdly, I am deeply indebted to my mother and my father who, over the last four years, though thousands of miles away, could not have been closer morally and spiritually. They have always been there for me and supported me in all of my decisions, and for their unconditional, unselfish love and support I am deeply grateful to them.
Fourthly, I would like to thank all the people I have come to know at Kent State University over the past four years and all of my colleagues from the translation department, with whom I shared some unforgettable moments that I will cherish in the years to come. In particular, I would like to thank Loubna Bilali, Sohomjit Ray, Ajisa Fukudenji, Monica Rodriguez and long-time friend Adriana Di Biase, whom I first met in 2004 during the first year of my master's program at the University of Bari (Italy). For their true friendship, I will always be grateful to them.
Finally, I express my sincere and deepest gratitude to all the people who, in crossing my life's path, left an indelible mark, because they all contributed, to varying degrees, to making me the person I am today. In this respect, I would like to thank Mariangela Monteleone, and Italian instructors Rosa Commisso and Donatella Salvatori, who warmly welcomed me into their lives at the beginning of this long journey called the PhD.
ABSTRACT
The present study sets out to investigate lexical cohesion and the network of lexical chains it creates from the point of view of translation. This topic has been largely understudied in the translation field, though many scholars acknowledge its importance and the major role it plays in shaping the quality of a translation (Hoey [1991] demonstrates that approximately 50% of a text's cohesive markers are lexical) and in affecting the target readership's response to translations. This study employs Morris and Hirst's (1991) categorization of lexical cohesion into 1) reiteration with identity of reference; 2) reiteration without identity of reference; 3) reiteration by means of superordinates; 4) systematic semantic relations; and 5) non-systematic semantic relations. The study tests two hypotheses. Hypothesis one claims that Italian translations of English scientific articles taken from Le Scienze, the Italian edition of Scientific American, tend to reproduce the lexical cohesive markers of the source texts, failing to make them conform to TL norms and text-type conventions.
Hypothesis two claims that the lexical cohesive markers used in articles
originally written in Italian and published in Le Scienze differ from the ones used in the
Italian translations. In both experiments, WordNet, which is a lexical database for the
English language designed by Professor George A. Miller at Princeton University, will be
used to identify the word senses and the semantic relations connecting the different
lexical chains in the texts. As for the Italian texts, MultiWordNet, a multilingual lexical
database which allows the Italian version to be strictly aligned with the Princeton
WordNet, will be used in analyzing word senses and semantic relations. Hypotheses one and two are closely interrelated insofar as they both set out to demonstrate that not
much emphasis has been placed yet on cohesion in general and lexical cohesion in
particular in translator training programs and that a greater awareness of it could benefit
both professionals and novices.
INTRODUCTION
Why Study Lexical Cohesion?
This dissertation sets out to investigate the topic of lexical cohesion, which has long been understudied in translation studies, and thereby to raise awareness of the major role that this cohesive device plays in translator training programs. The corpus-based approach to the study of this cohesive device, combined with the manual analysis of intra-textual semantic relations and the statistical processing of the results, is designed to provide evidence pointing to the importance of lexical cohesion for translation quality.
Lexical cohesion is one of the five cohesive devices that were first identified by Halliday and Hasan in their pioneering work Cohesion in English (1976), wherein lexical cohesion is defined as "the cohesive effect achieved by the selection of vocabulary" (1976: 274). However, at the time, they both seemed unaware of the dominant role that this cohesive device played, and still plays, in creating texture and making texts coherent.
This shortcoming is raised by Hoey (1991) in his book Patterns of Lexis in Text, in which
he argues that Halliday and Hasan fail to acknowledge the primary role of lexical
cohesion in building texture. He draws this conclusion by looking at the sample analyses
of seven different types of texts included at the end of Cohesion in English. He focuses
on the frequency data of the five types of cohesive markers and notices that lexical
cohesion accounts for more than forty percent, compared to thirty-two percent for
reference, twelve percent for conjunction, ten percent for ellipsis and four percent for
substitution (1991: 9). Yet, Hoey states, the two authors cover lexical cohesion in less
than twenty pages while dedicating over fifty pages to conjunction (1991: 9). But Hoey is
not the only one who holds this view; Elke Teich and Peter Fankhauser are of the same
opinion and in their article "WordNet for Lexical Cohesion Analysis" state that lexical
cohesion makes the most substantive contribution to the semantic coherence of a text
(2004: 327). To be more precise, nearly fifty percent of a text’s cohesive ties are said to
be lexical (2004: 327).
Lexical cohesion, however, is not only important because it affects the texture and
coherence of a text; it also plays a major role in the interpreting process individuals
engage in when reading texts or listening to dialogues. In this respect, Morris and Hirst,
who investigated lexical cohesion as an indicator of text-structure/coherence, claim that
this cohesive device helps readers solve ambiguity issues and narrow down the meaning
of words by providing clues to determine the coherence of a text (1991: 23). In translator
training, the above-mentioned statement implies that cohesion could be used to help
students disambiguate the content of a text to be translated or narrow down the choice of
translation equivalents to a few candidates. This disambiguation process could be
fostered by having students work on pre-translation tasks such as the building of
referential networks of texts to be translated (see Chapter 5 for further information).
Referential networks are lists of lexical items that are semantically related to one another.
The identification of semantic relations using, for example, electronic thesauri, can help
students narrow down the semantic field of a lexical item and, within this semantic field,
single out its conceptual meaning. However, one might wonder whether disambiguation
applies to all types of translation. A case in point is literary texts, especially the ones in
which coherence is disrupted on purpose and thus disambiguation may be hard to
achieve. In these cases, disambiguation entails making sure that the target readership has
the same reaction or response to a piece of poetry or literary work that the source text
readership had when reading that same piece of work. Indeed, the author of a literary text
chooses his or her lexis having a specific purpose and readership in mind. Through word
choice, s/he aims at evoking a particular emotion or reaction on the reader’s part. Hence,
coherence, in this case, refers to everything in a text that leads the reader to respond/react
to the text in the way the writer originally intended.
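To make the earlier point about disambiguation concrete, the sketch below implements a crude version of it: it picks the WordNet sense of a word whose dictionary gloss shares the most vocabulary with the surrounding text, in the spirit of a simplified Lesk overlap. It is an illustration under stated assumptions, not a procedure used in this dissertation, and the example sentence is invented.

```python
# Toy illustration of disambiguation through semantic neighborhoods:
# choose the WordNet sense whose gloss overlaps most with the context.
# A simplified Lesk-style heuristic, not this dissertation's own method.
from nltk.corpus import wordnet as wn  # requires nltk.download('wordnet')

def best_sense(word, context_words):
    """Return the synset whose definition shares the most words with the context."""
    senses = wn.synsets(word)
    if not senses:
        return None

    def overlap(sense):
        gloss = set(sense.definition().lower().split())
        return len(gloss & set(context_words))

    return max(senses, key=overlap)

context = "she deposited the money at the branch of the bank".split()
sense = best_sense("bank", context)
print(sense.name(), "-", sense.definition())
```

A referential network built by a student with an electronic thesaurus does by hand what this overlap count does mechanically: it narrows a lexical item's semantic field until one conceptual meaning stands out.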
The problem posed by the study of lexical cohesion in translation is due to the fact
that cohesive devices are language-, culture- and text-type-specific, as pointed out by
such scholars as Elke Teich and Peter Fankhauser (2004), Beata B. Klebanov, Daniel
Diermeier and Eyal Beigman (2008), Hatim and Mason (1990), Mona Baker (1992) and
Shoshana Blum-Kulka (2000). In this respect, Blum-Kulka argues that differences in
grammar and style preferences between any two languages will bring about different
ways to express cohesion in the target language (299-300). It follows that when
translating from any L1 into any L2 and from any L1 text-type into any L2 text-type,
there will inevitably be a change in the kinds of cohesive devices the translator will
adopt, as well as their distribution in the text. This means that, depending on the language
into which one is translating, the types of cohesive ties and their distribution will have to
change accordingly. From what has been said so far, it is possible to assert that in order
for a translation to be coherent, it needs to reproduce the network of lexical chains that
characterize the source text, but at the same time, the types and distribution of the lexical
items creating lexical chains need to be adapted to the target language (TL) norms and
TL text-type. To this end, one needs first to get acquainted with the cohesive devices
peculiar to the target language and sub-language, and only then will it be possible to
make the appropriate changes in cohesive terms when translating into L2.
Though there are many studies approaching and investigating lexical cohesion
from both text-based (linguistic) and reader-based (cognitive) points of view, few of them
are concerned with translation and pedagogical issues; and even when they do deal with
such issues, they mostly tend to approach the problem from a corpus-based standpoint,
treating lexical cohesion in theoretical terms, making generalizations about its use in
translation and suggestions about prospective fruitful pathways of research. For instance,
Mona Baker (1992) and Steiner (2005) argue that a product-based analysis of lexical
cohesion can be achieved by adopting a corpus-based approach that automates the
computation of lexical density and type-token ratio. Lexical density tells us if two or
more texts are similar in terms of number of content words, whereas type-token ratio
gives information about variation in vocabulary. One of the drawbacks of a purely
corpus-based approach to the study of lexical cohesion is that it would only take into
consideration words that tend to co-occur and are therefore semantically related while
ignoring those words that, while semantically related, do not co-occur. Indeed, there are
also semantic relationships and connections among words that require even greater
background and culture knowledge on the reader’s part. This is true of culture-bound
terms, namely terms that developed historically and culturally in a specific area and are
peculiar to the people located therein, as well as technical terms. In the latter respect, one
of the articles to be analyzed in this study “White Matter Matters” contains two terms
which are used as synonyms, namely white matter, which occurs twenty-seven times, and
white cabling, which occurs just once. When electronic corpora are used, attention is
placed on words or sets of words co-occurring with a certain frequency, which is not the
case with white cabling. Hypothetically, during an electronic search white cabling would
come up in the search results only if the adjective white were looked up; but if the search
is done using the noun matter, then the expression white cabling would not appear.
Besides, even when the two expressions do appear in the search results as in the first of
the two above-mentioned scenarios, it still lies with the reader to decide whether or not
they are semantically related by resorting not only to textual clues but also to his/her
world and culture knowledge which machines do not have.
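For illustration, the two corpus measures defined above can be sketched in a few lines of code. The snippet below is a minimal stand-in, not the WordSmith Tools implementation used in this study, and its short function-word list is a toy inventory rather than a full list of grammatical words.

```python
# Illustrative sketch of type-token ratio and lexical density.
# Not the WordSmith Tools implementation used in this study; the
# function-word list is a toy stand-in for a proper inventory.
import re

FUNCTION_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or", "but",
                  "is", "are", "was", "to", "that", "it", "as", "with"}

def tokenize(text):
    """Lowercase alphabetic tokens; punctuation is discarded."""
    return re.findall(r"[a-z]+", text.lower())

def type_token_ratio(tokens):
    """Distinct word forms (types) divided by running words (tokens)."""
    return len(set(tokens)) / len(tokens)

def lexical_density(tokens):
    """Share of content words, i.e. tokens outside the function-word list."""
    return sum(t not in FUNCTION_WORDS for t in tokens) / len(tokens)

tokens = tokenize("The white matter of the brain is the brain's white cabling.")
print(f"TTR = {type_token_ratio(tokens):.2f}")
print(f"lexical density = {lexical_density(tokens):.2f}")
```

Note that both measures are computed over surface tokens only, which is exactly the limitation discussed above: a pair such as white matter and white cabling contributes nothing to either figure unless the two expressions happen to share word forms.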
This is exactly the issue on which pure linguists and schema theoreticians
disagree. In this regard, Glenn Fulcher in his article "Cohesion and Coherence in Theory
and Reading Research” argues that for schema theoreticians, coherence comes first. In
other words, coherence precedes cohesion, which means that readers first look for
coherence based on their world and culture knowledge and then recognize cohesion
(1989: 146). By contrast, linguists give cohesion a primary role in making a text
coherent. In Halliday and Hasan (1976), the interpretation of the cohesive devices
populating a text is text-bound; in other words, one does not need to go beyond the text to
find the clues leading to the disambiguation of word meanings. This idea of cohesion as
an index of textual coherence has mostly been criticized by schema theory scholars
because the latter view text processing as an interactive process between a reader and a
text. Thus, the belief that coherence is to be found in the text is viewed as questionable
because schema theoreticians are of the opinion that a reader’s background and culture
knowledge affects his or her interpretation of a text as highly or poorly coherent
depending on whether "there is a mismatch in cultural background knowledge between the reader and that assumed by the text" (Carrell 1982: 485). Empirical findings suggest that whenever such a mismatch occurs there is likely to be a loss of textual cohesion (Carrell 1982: 485). Simply put, this means that if the reader does not have or fails to
access the appropriate background schema underlying the text he or she is reading, the
presence of cohesive ties in it will not be of any help in making sense of the text.
It is important to note here that many of these studies that investigate cohesion
from a reader-based standpoint are concerned with EFL (English as a Foreign Language)
reading and writing. As far as lexical cohesion is concerned, there is an empirical study
by Keiko Muto (2006) that is worth mentioning here. The experiment sets out to
investigate how lexical cohesion could help EFL students in reading and writing. Forty
first-year students from two Extensive Reading classes were first introduced to the notion
of lexical cohesion and then they were instructed to read three short stories and find clues
relating to the temporal and geographical settings and character features of the stories.
The purpose was to study the degree to which students were able to use knowledge of
cohesion to interpret the stories. It was found that lack of cultural/background knowledge
on the part of the readers prevented them from determining lexical cohesion. In the
experiment, the students failed to recognize “Madison Square” as a clue word linked to
the place where one of the three stories took place, namely New York (Muto 2006: 114-115), because they were not familiar with where "Madison Square" was located.
When it comes to translation, these findings imply that in order for the
translator to make sense of the text, s/he needs to be familiar not only with the TL culture
but also with TL text-type norms and TL cohesion norms. Lexical cohesion, being
expressed through content words and more generally speaking through vocabulary, is
inevitably language-specific. As Mona Baker notes (1992), the lack of equivalents in a
target language pushes translators to resort to superordinates, paraphrases or loan words.
The use of such devices results in the generation of different lexical chains in the target
text (207). As a consequence, the source and target texts wind up differing in terms of
networks of lexical cohesion and so will trigger different associations in the mind of the
readers (210). Insofar as lexical cohesion has a major impact on the style, quality and
interpretation of a translation on the part of the target readers, the topic deserves more
attention and investigation than it has so far received. It is necessary to shed light on what strategies translators can adopt to reproduce the network of lexical chains of the source text as closely as possible in the target text, thus trying to keep readers' associations intact.
Translation, as a transcoding activity involving any two verbal codes, calls for
both reading and transferring strategies. In both cases, cohesive markers, in general, and
lexical cohesion, in particular, are of paramount importance because they help translators,
who, ideally, are proficient in the two languages and cultures involved and any other
facets that the latter encompass, to interpret the text when reading it; but cohesive
markers also have an impact, when projected into the target text/culture, on the way the
target readership, based on their world, culture, and text-type knowledge, will perceive
the text as coherent. In this case, coherence can be thought of in terms of mental
associations. In other words, the target text can be deemed coherent when the TT
readership experiences the same reaction to the text as the one intended by the original
writer for his/her readership. This idea of cohesion as involving extra-textual factors was
first introduced by Singh in “Contrastive Textual Cohesion”, in which he draws a
distinction between semantic (linguistic) and pragmatic cohesion (1979). According to
this author, in semantic cohesion, the information can be extracted from the text, whereas
in pragmatic cohesion, the information is inferred from outside the linguistic context. As
mentioned above, Halliday and Hasan focus on the former whereas Brown and Baker and
other scholars hold the view that both types of cohesion need to be taken into account
because they both contribute to the texture and coherence of a text.
As much as reading in translation is of paramount importance and deserves further
investigation, it is not the focus of this research. Instead, I will
concentrate on the stylistic preferences and/or differences in terms of use and amount of
lexical cohesive devices that exist between Original English and Original Italian as well
as Original Italian and Translated Italian.
Research Hypotheses
The research hypotheses to be studied in this dissertation approach lexical cohesion from
a corpus-based point of view and provide evidence in support of the thesis maintained by
many discourse analysis and translation scholars according to whom cohesion is
language- and text-type-specific. In particular, hypothesis one claims that Italian
translations of English scientific articles taken from Le Scienze, the Italian version of the
American-English magazine Scientific American, tend to reproduce the lexical cohesive
markers of the source text. Hypothesis two, on the other hand, claims that articles taken
from Le Scienze which were originally written in Italian by Italian authors tend to differ
in the use and amount of lexical cohesive markers from Italian translated texts published
in the same magazine.
Research Method
One parallel corpus will be used to compare lexical cohesion in English texts and Italian
translations and test hypothesis number one. In this respect, the magazine that will be
used is Scientific American, which publishes news and articles about science, technology
information and policy for a general educated audience (the official website states that
one third of its readership has a postgraduate degree). The online subscription to the
magazine allows users to get access to current issues as well as to the magazine’s archive.
This is important to this study because articles have been randomly selected over a span
of ten years (more precisely from 1991 to 2000). Scientific American has also been
chosen because it has an Italian edition called Le Scienze. It started off as the Italian
translation of the American magazine in 1968 but now features both American and Italian
contributions to scientific research. The fact that the Italian edition features both
translations and articles written by Italian scientists makes it possible not only to compare
the English texts and their translations into Italian to see if the translators reproduced the
English lexical cohesion network in the target texts, but also to find out whether there are
any differences in lexical cohesion between articles originally written in Italian and the
ones translated into Italian from English. The latter will constitute a comparable or
reference corpus made up of Italian articles extracted from the same magazine that will
be used to test the second hypothesis.
It is worth mentioning here that the same magazine was used by Maria Teresa Musacchio (2007) in "The Distribution of Information in LSP Translation. A Corpus
Study of Italian.” In her study, the author investigates features of information structures
in translation such as the principle of given-new information and that of end-focus (2007:
94). In other words, she aims at finding out whether or not translations use the given-new
information flow of the original texts. To this end, a translation corpus consisting of nine
English articles on popular physics and their translations was collected. The English
articles were all published in the American monthly Scientific American over a span of
ten years (1993 through 2003). The Italian translations were published in the Italian
edition Le Scienze. In addition to this, a reference corpus of Italian articles on the same
topic was used to compare the results from the translation corpus to see whether or not
the information structure used in the Italian translations conformed to TL norms. The
author, for example, focuses her attention on verbs of happening such as accadere and venire which, in Italian, usually cause subject-verb inversion, thus promoting the verb to marked theme. However, Musacchio notes that in the Italian translations, this inversion does not take place; instead, the information flow of the source text is recreated, thus impairing the natural information flow of Italian (2007: 94). Besides analyzing the
information structure of these texts, the author also discusses cohesion, though this is not
the main focus of the article. Only three pages or so are concerned with cohesion, and in
these three pages the cohesive markers that are dealt with are conjunctions and repetition.
As far as repetition is concerned, it is said that English science cohesion is often created
through reiteration by means of repetition, especially noun repetition, which in Italian is
avoided for stylistic reasons unless non-repetition is a source of ambiguity (2007: 99). In
the translation corpus, the author identifies many examples of repetition in the Italian
translation that are clearly a calque of the English cohesive sentence structure (2007: 99).
My study fits in with the research carried out by Musacchio; however, her study is just a starting point, since she does not focus on lexical cohesion or on the different categories of reiteration and collocation identified by Morris and Hirst, which in my research are the main focus of analysis. The approach to the analysis of cohesive devices is also different: in my research, it is the network of lexical chains created through the use of synonyms, hyponyms, meronyms, superordinates and other lexical cohesion markers that is under investigation. In this respect, fifteen articles have been selected from the American magazine Scientific American. The articles cover a span of ten years (from 1999 to 2009). The past issues of the magazine are available through the online subscription. As for the Italian edition, Le Scienze issued a double DVD box set early in September 2010, containing all the issues of the Italian edition from 1968 to the present. This has made it possible to have a wide range of topics in electronic format at my disposal from which to choose.
The textual analysis of the corpora has been conducted using WordSmith Tools to calculate lexical density, type-token ratio, sentence number and average sentence length.
As with the first hypothesis, WordNet has been used to identify word senses and
semantic relations between lexical items in texts. It is expected that Italian translations
will reproduce the lexical cohesion markers of the English articles, thus failing to comply
with TL and text-type norms and conventions. The target language norms and text-type
conventions have been extrapolated from the reference corpus made up of articles
originally written in Italian by Italian researchers and published in the same magazine (Le
Scienze) as the articles making up the parallel corpus. Based on the information that the
analysis of the articles contained in the reference corpus provides, pedagogical
suggestions have been made about how to make translation students aware of lexical
cohesion-related language norms and text-type conventions. In investigating the lexical
chains of the Italian documents and translation, MultiWordNet, which is a multilingual
lexical database wherein the Italian version is strictly aligned with the Princeton
WordNet, has been used. Finally, the statistical analysis tool SPSS (Statistical Package
for the Social Sciences) has been employed to verify the significance of the findings of
both the textual and semantic analysis.
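As an illustration of that last statistical step, the sketch below runs the kind of comparison SPSS was used for here: an independent-samples t-test on a per-text feature across two corpora. The scipy library stands in for SPSS purely for the example, and the per-text counts are invented, not data from this study.

```python
# Illustrative only: the study used SPSS; scipy stands in here, and the
# per-text counts below are invented for the example, not real data.
from scipy import stats

translated_italian = [41, 37, 52, 45, 39, 48, 44, 50, 36, 43]  # hypothetical
original_italian = [29, 33, 25, 31, 28, 35, 27, 30, 32, 26]    # hypothetical

# Independent-samples t-test: do the two corpora differ in mean
# frequency of a feature (say, repetition) beyond chance variation?
t, p = stats.ttest_ind(translated_italian, original_italian)
print(f"t = {t:.2f}, p = {p:.4f}")  # p < 0.05 suggests a reliable difference
```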
Significance of my Research Hypotheses
The significance of this study lies in the role that cohesion plays, both at the textual and
extra-textual level, in the interpretation and translation of texts. Cohesion in general, and
lexical cohesion in particular, matter because they have an impact on the quality of
translations. Aiwei Shi proved empirically that when students are not aware of the role
cohesive markers play in a text, they make common errors at the sentential and infrasentential level where lexical collocates abound (1988: 145-146). Likewise, Barbara
Folkart in her article “Cohesion and the Teaching of Translation” argues that because of
students' tendency to focus on form, that is, on lower sentential levels, rather than intrasentential and textual levels, they fail to give cohesion the priority it deserves (1988: 143). Students fail to see the text as a whole because cohesive markers come into play only at the suprasentential rank, and this ultimately affects the quality of a
translation (1988: 151).
Teaching cohesion in translator-training programs can actually help students
become aware that a text is more than just the sum of its lexical units. By learning what
cohesion is and how it works, students will hopefully start viewing the text to be
translated as a unit in itself. There are several studies on the role that cohesion plays in
interpreting texts but not all of them are concerned with translation. Failing to
appropriately convey the cohesive markers of the source text according to the norms and
conventions of the target language and text-type may have an impact first on the quality
of a translation and second on the response of the target readership, thus leading to either
the failure or success of the translated text.
Defining translation quality in terms of cohesion is not new to the translation
field. Balkrishan Kachroo in “Textual Cohesion and Translation” states that an
“authentic” or rather good quality translation needs to consider factors that go beyond the
sentence level (1984). One of these factors is textual cohesion. In his view, an authentic
translation always strives to match the distribution of cohesive devices in the target language text to that in the source language text but, most importantly, to that in the target sub-language.
This dissertation sets out to further investigate the topic of cohesion in translation
by focusing on just one of Halliday and Hasan’s cohesive markers, which is deemed to be
the most important. By so doing, I hope to bring more awareness to the topic and to offer
pedagogical suggestions. To this end, one of the chapters of this study will be concerned
with the pedagogical implications of teaching lexical cohesion in translator-training
programs. In particular, the stress will be on how teaching lexical cohesion and cohesion
in general can improve the quality of translations and the means through which such
teaching can be carried out. For example, having translation students build reference
corpora in their L2 and focus on the stylistic preferences in terms of cohesive markers
that a certain language and text type adopt will foster not only their awareness of the text
as a global unit but also their self-learning skills which are crucial to the life-long
learning objective that modern pedagogy aims at. According to Silvia Bernardini (2002),
corpora can be used as pedagogical tools for making students interested and for having
them develop "autonomous learning strategies" and for "raising their language consciousness," etc. (166). Through their use, students gain valuable insights not only
into the target readership’s expectations and their own native language but also into the
way the latter may be used to achieve different communicative purposes in texts. It is
therefore about time we made students aware that texts consist of lexical items that have
textual, intra-textual and extra-textual associations, the interpretation of which requires a
global or macro-textual approach to text analysis when translating.
This study aims at raising awareness of the oft-neglected role that cohesion plays
in translation. The empirical and corpus-based approach to the study of lexical cohesion
in translation is designed to provide evidence pointing to the importance of lexical
cohesion. To this end, a future step involves the empirical investigation of the effects of
explicit training in cohesion. However, to accomplish this and actually teach cohesion in
a translation setting, which involves at least two verbal codes, it is necessary to study the
cohesive features that are language specific and, within each language, it is also
necessary to inquire into how language-specific text-types vary in terms of lexical
cohesion markers and cohesion in general.
Summary of Chapters
This dissertation consists of five chapters. Chapter I presents a detailed overview of the
main cohesion and coherence theories as well as hands-on studies on lexical cohesion in
both second language acquisition and translation studies. Chapter II describes the
methodological approach as well as corpus, semantic and statistical tools used to conduct
the corpus-based investigation of lexical cohesion. Chapter III reports the findings of both the textual and semantic analyses of the parallel and comparable corpora. Chapter IV provides a detailed discussion of the results presented in Chapter III in light of similar studies. Finally, Chapter V draws some conclusions, puts forward a framework for a pedagogy of translational cohesion, and suggests some future directions to further investigate this topic from a process-based point of view.
CHAPTER I
LITERATURE REVIEW
1.1 Coherence vs. Cohesion
Since Halliday and Hasan’s seminal work Cohesion in English, much has been written
about what a text should look like and what differentiates a text from a non-text. In this
respect, Halliday and Hasan claim that a text is characterized by texture, which refers to
the property of meaning unity shared by parts of any discourse (1976: 2). This concept of
texture which defines a text was further elaborated and developed by Widdowson (1978),
and De Beaugrande and Dressler (1981). Widdowson introduced the idea that texts could
also be coherent without there being explicit or overt cohesive devices in an instance of
either a spoken or written discourse (1978: 24-26). In such cases, the reader or listener
makes sense of the text by inferring the links between the sentences based on his or her
interpretation of the illocutionary acts performed by the sentences themselves (1978: 27-29). In line with Widdowson's cohesion theory, De Beaugrande and Dressler, in their work Introduction to Text Linguistics, draw a clear distinction between cohesion and coherence, each having its own role in building textuality. In this respect, they define a
text as a communicative occurrence that meets seven standards of textuality:
1) Cohesion
2) Coherence
3) Intentionality
4) Acceptability
5) Informativity
6) Situationality
7) Intertextuality
Of these seven standards, cohesion and coherence are claimed to play the major role in
creating texture (1981: 3). In De Beaugrande and Dressler’s words, cohesion relates to
“the ways in which components of the SURFACE TEXT […] are mutually connected
within a sequence (1981: 3);” whereas coherence is defined as a “continuity of senses” by
which the authors refer to the logical organization of arguments in a text that allows
readers to make sense of the text and perceive it as a coherent whole (1981: 84).
Although their notion of coherence is largely text-based, they also acknowledge that texts
do not make sense by themselves but rather require “the interaction of text-presented
knowledge with people’s stored knowledge of the world (1981: 6).” By contrast, Halliday
and Hasan’s definition of coherence is exclusively text-based. They claim that cohesion,
which “refers to relations of meaning that exist within the text, and that define it as a
text,” is an index of texture or text coherence (1976: 4). Hasan (1984) clearly states that
coherence is a feature of language; in other words, cohesion is the foundation on which
coherence is built (1984: 181). She actually rejects findings of research into the
relationship between coherence and cohesion measured in terms of reader response
because in her view coherence is recognized by readers as a result of the cohesive
harmony or rather interaction among cohesive ties.
This idea of viewing cohesion as the major contributor to the coherence of a text
is mainly sustained and supported by scholars working in linguistics. Schema
theoreticians, by contrast, see text processing as an interactive process involving the text
and the reader (Carrell, 1982: 480). Fulcher, for instance, argues that according to schema
theory coherence comes first or plays a primary role. It follows that coherence precedes
cohesion, which means that readers first look for coherence based on their world/culture
knowledge and then recognize cohesion and make sense of the text (Fulcher, 1989: 146).
Comprehending a text involves an interactive process between the listener’s or reader’s
background knowledge of content and structure (the so-called content schematic
knowledge) and the text itself (Carrell, 1983: 82). In this respect, Campbell (1995), drawing
on Gestalt Theory, argues that coherence is more than the mere sum of local or global
cohesive relations (1995: 80). To put it in simpler terms, cohesive ties are only one of the
factors affecting the understanding of a text as a coherent whole. Textual coherence also
depends on a recipient’s knowledge and on the following four principles:
1) Relevance
2) Clarity
3) Adequacy and
4) Accuracy
In order for a text to be coherent, the information contained in the text needs to be
considered relevant, clear, adequate and accurate (1995: 96). Nevertheless, this does not
exclude the possibility that the recipient will regard some of this information as
irrelevant, unclear, inadequate or inaccurate.
In line with this subjective and reader-centered view of coherence is Stoddard's
notion of textual synergism, which refers to that holistic phenomenon whereby the
readers of a text grasp greater meaning than that conveyed by the sum of the words
contained in it (1991: XIII-XIV). This synergism, according to Stoddard, is a mental
process initiated by what linguists refer to as “texture,” which consists of various kinds of
text patterns, one of them being cohesive ties (1991: XIV). The author rejects any
static/linguistics-founded definitions of text in that they do not take into account the
synergism thereof. One of the positions held in traditional linguistics is that a text can be
analyzed in isolation from its environment (1991: 9). However, Stoddard maintains that
when people re-read texts, they rarely interpret them in the same way as the first time,
which implies that our perceptions of the meanings contained in a text are not static but
mutable (1991: 9). The only way to account for this variability of interpretation by
readers, which both Halliday and Hasan’s and De Beaugrande’s definitions of text fail to
grasp, is to consider a text as a state of mind or mental model (1991: 10-11). Viewed from
this perspective, the written text is conceived of as the result of a writer’s thinking or
intentions which are in turn interpreted by readers who create their own mental model of
the text. Therefore, the author concludes that a synergetic definition of text must be
reader-based. This definition sees texts as a reader’s mental reconstruction of a writer’s
text (1991: 11). As far as cohesive patterns are concerned, Stoddard identifies six writer-based factors or properties affecting their interpretation (1991: 20-21):
1) Number of cohesive ties: which refers to the number of potential ties per node
(the greater the number of these ties, the more unified the text is perceived);
2) Distance: which refers to the number of intervening words between the ends
of ties (between a node and each of its ties);
3) Directionality: which refers to word order (i.e., cataphoric vs. anaphoric directions);
4) Reentry: which refers to the repetition of cohesion networks by the writer (all
the more so if identical nodes are used);
5) Intersection: which refers to cases in which networks overlap;
6) Type: which refers to types of cohesion (like the typology of cohesive ties set
out in Halliday and Hasan [1976]).
In addition to these six writer-determined properties of cohesion, Stoddard also identifies
the following reader-dependent factors affecting the interpretation of texts:
1) Readers’ stored knowledge: when people are engaged in a reading act,
they select previously-stored information from their long-term memory to
fit the new reality or situation (1991: 24);
2) Specificity or definiteness of anaphoric expressions: in other words, there
exist among anaphoric expressions varying degrees of specificity which
affect the ease with which these expressions are identified and understood
(1991: 25-26);
3) Connectivity between concepts: the greater the connectivity between two
concepts, the likelier it is that a reader will retrieve one concept given the
other (1991: 27).
This reader-based conception of a text is supported by transactional theory which views
the meaning-building process as a constant interplay between the reader and the text
(Rosenblatt 1989). This theory draws on Peirce's triadic model, in which the linguistic sign is related to its object through the interpretant's mental association (Peirce 1933: para. 347). It follows that, for transactional theorists, the meaning of a text, or rather the words and marks on every single page, resides in the interaction between the reader, who
during the reading process is conditioned by the social, cultural, educational
circumstances in which s/he finds himself/herself, and the text itself (Rosenblatt 1989).
For the purpose of this study, a linguistics-based definition of text will be
adopted; in other words, cohesion is assumed to be a property of texture, and coherence is
considered to be a facet of a reader's evaluation of a text (Hoey 1991: 12). Thus, lexical
cohesion is considered to play an important supporting role for coherence which,
although cognitive in nature, is still constructed based on explicit linguistic signals.
1.2 Lexical Cohesion
The term lexical cohesion was first introduced by Halliday and Hasan (1976) in Cohesion
in English, in which the authors define lexical cohesion as “the cohesive effect achieved
by the selection of vocabulary” (1976: 274). In other words, it is through the choice of
words that lexical cohesion is realized. In particular, they identify five main types of
lexical cohesion:
1) Same Item, also known as repetition, which occurs when the same word, be it a
noun, verb, adjective or adverb, is reiterated throughout a text;
2) Synonym or near synonym: words sharing similar senses. In this subcategory, Halliday and Hasan also include hyponyms, which are words having narrowed-down meanings;
3) Superordinates, also known as hypernyms, which could be defined as general-meaning words or words with a broad meaning;
4) General nouns: words that are very general in meaning and whose interpretation
requires the reader to go back to previously-mentioned items. This subcategory
might sound very similar to superordinates, but it differs from the latter in that in
order for general nouns to be cohesive they need to be preceded by the reference
item the or a demonstrative (1976: 275);
5) Collocation which is defined as the “tendency of any two items to occur in similar
contexts” (1976: 286).
These five lexical cohesion types are grouped into two major categories: Reiteration (n° 1
to 4) and Collocation (n° 5).
Based on Halliday and Hasan’s cohesion categories, Morris and Hirst (1991)
propose their own classification of cohesive devices, which resembles Halliday and
Hasan’s but with a different nomenclature (1991: 21-22):
1) Reiteration with identity of reference (Halliday and Hasan’s category of same
items);
2) Reiteration without identity of reference (Halliday and Hasan’s category of
synonyms or near synonyms);
3) Reiteration by means of superordinates (Halliday and Hasan’s category of hypernyms
and general nouns); and
4) Collocation, which is defined as a semantic relationship between words that co-occur
and which includes two types of relationships: a) systematic semantic
relations, as in the case of pairs of words drawn from the same ordered series
(dollar-cent; north-south); and b) nonsystematic relations (when words are
related in terms of such relations as synecdoche, members of the same class,
hyponyms of the same superordinate, etc.).
As far as collocation is concerned, there is a major difference in meaning between the
way Halliday and Hasan and other authors working with lexical cohesion conceive it and
the way corpus linguists define it. In corpus linguistics, collocation is basically described
as the actual occurrences of words in texts. Sinclair originally defined it as “the
co-occurrence of words with no more than four intervening words” (2004: 141). Generally
speaking, in corpus linguistics, in order for a set of lexical items to be identified as
collocates, they need to appear in the text with a certain frequency, and the number of
intervening words between them should not exceed four. For Halliday and Hasan, by
contrast, textual evidence is not fundamental; what matters is the meaning associations
between words (Flowerdew & Mahlberg 2009: 112). This means that the very fact that
two items imply one another makes it possible to consider them as collocates even if their
frequency of co-occurrence is not high or the distance between them is very large.
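To make the corpus-linguistics criterion concrete, the following Python sketch counts
pairs of words separated by no more than four intervening words. It illustrates the
windowing idea only; the tokenization, sample text, and frequency threshold are my own
assumptions and are not drawn from Sinclair:

    # Count word pairs that co-occur with no more than four intervening
    # words; pairs below the frequency threshold are discarded.
    from collections import Counter

    def collocates(tokens, window=4, min_freq=2):
        pairs = Counter()
        for i, w1 in enumerate(tokens):
            # tokens at distances 1..window+1, i.e., 0..window intervening words
            for w2 in tokens[i + 1 : i + window + 2]:
                if w1 != w2:
                    pairs[tuple(sorted((w1, w2)))] += 1
        return {pair: n for pair, n in pairs.items() if n >= min_freq}

    text = ("search engines index web pages so that search queries can match "
            "web pages quickly and search results can rank web pages").split()
    print(collocates(text))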
In this study, Morris and Hirst’s classification of cohesive devices will be adopted
in the corpus-based investigation of lexical chains but the categories will be renamed as
follows:
1) Simple and modified repetition (which for the purpose of this study will be
considered as just one category)
2) Synonyms
3) Antonyms
4) Holonyms
5) Meronyms
6) Hypernyms
7) Hyponyms
The choice to use a different nomenclature has been made to avoid any
terminology-related confusion when the two web-based lexical databases, WordNet and
MultiWordNet, which classify semantic relations using the above-mentioned
denominations, are employed to identify lexical chains.
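By way of illustration, the sketch below shows how these relation types can be looked up
through NLTK's interface to WordNet. This is one possible access route only; the analysis
in this study consulted WordNet and MultiWordNet directly, and the word chosen here is
arbitrary:

    # Look up the semantic relations used in this study for a given noun.
    # Requires the WordNet data: nltk.download('wordnet').
    from nltk.corpus import wordnet as wn

    def relations(word, pos=wn.NOUN):
        out = {key: set() for key in
               ("synonyms", "antonyms", "hypernyms", "hyponyms",
                "holonyms", "meronyms")}
        for syn in wn.synsets(word, pos=pos):
            for lemma in syn.lemmas():
                out["synonyms"].add(lemma.name())
                out["antonyms"].update(a.name() for a in lemma.antonyms())
            out["hypernyms"].update(l.name() for s in syn.hypernyms() for l in s.lemmas())
            out["hyponyms"].update(l.name() for s in syn.hyponyms() for l in s.lemmas())
            # part-whole relations; WordNet also has member/substance variants
            out["holonyms"].update(l.name() for s in syn.part_holonyms() for l in s.lemmas())
            out["meronyms"].update(l.name() for s in syn.part_meronyms() for l in s.lemmas())
        return out

    for relation, words in relations("engine").items():
        print(relation, sorted(words)[:5])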
As far as semantic relations are concerned, another important notion which needs
to be briefly discussed herein is that of lexical chain, which was first introduced by
Halliday and Hasan (1976). They laid the foundation for this concept, but it was further
developed by Hasan (1984), who defines it as a relationship formed between two cohesive
elements when one refers back to the other. Hasan also classifies chains into identity and
similarity groups. Identity chains consist of ties that all share the same referent (1984:
15), whereas similarity chains are made up of ties for which issues of identity cannot
arise. The author claims that, although these chains by themselves contribute to making a
text coherent, they are not sufficient for the creation of coherence. In her own words,
coherence is mainly due to cohesive harmony, by which Hasan means the interaction of
connections among cohesive chains present in a text. In this respect, a chain interaction is
a relation associating elements from one chain with those of another chain (1984: 212).
This view is also held by Hoey (1991) who in Patterns of Lexis in Text, points out that
the presence of lexical chains in a text does not necessarily guarantee coherence. Rather,
it is the interaction among such lexical chains that appears to be the crucial factor (1991:
15). Two word groups or chains interact when at least two lexical items in one chain are,
grammatically speaking, related to two other lexical items in the other chain through such
relations as Actor – Material Process – Goal or Senser – Mental Process – Phenomenon
(Butler 2003: 341). The concept of lexical chains was further elaborated by Morris and
Hirst (1991), who refer to them as sequences of “nearby related words spanning a topical
unit of the text” (1991: 22). Lexical chains are important because they help solve
ambiguity issues and contribute to the determination of coherence and discourse structure
(1991: 23). In this study, lexical chains and their attendant inner semantic relations will
be the focus of my analysis.
1.3 Lexical Cohesion Studies in Discourse Analysis and Linguistics
Over the past twenty years or so, much research has focused on the analysis of lexical
cohesion. The attention this cohesion category has received is due to the fact that it is,
in Hoey’s words, “the dominant mode of creating texture,” in that it can form multiple
relationships (1991: 10). Indeed, it has been shown that approximately fifty percent of a
text’s cohesion ties are lexical (Teich & Fankhauser 2004: 327). Scholars in the field of
discourse analysis and linguistics have used lexical chains for the following research
purposes:
1) Text Summarization (Barzilay & Elhadad [1997], Maheedhar [2002], and Silber &
McCoy [2003])
2) Malapropism Detection (Hirst & St-Onge [1998])
3) Information Retrieval (Stairmand [1996] and Al-Halimi & Kazman [1998])
4) Topic Segmentation (Morris & Hirst [1991], Hearst [1997])
5) Word Sense Disambiguation (Okumura & Honda [1994], Galley & McKeown
[2003])
6) Hypertext Construction (Green [1998])
The recourse to lexical cohesion, in general, and lexical chains in particular, in the study
and analysis of discourse can be attributed to the fact that, as Anderson points out, this
cohesive subcategory combines the study of meaning (also called Semantics) and the
study of intersentential relations (also known as Discourse Analysis) (1977: 1). Through
the analysis of lexical cohesion, one is concerned not only with how discourse is
structured but also with the meaning thereof, thus dealing with both structural and
semantic relations. In this respect, Hoey himself states that unlike reference, conjunction,
ellipsis and substitution, which are markers of textual relations (their interpretation
requires textual analysis), reiteration and collocation are primarily markers of lexical
relation and only secondarily markers of textual relation (1991: 7).
Morris and Hirst (1991) were the first to use lexical chains to identify change of
topics in discourse. They went so far as to propose an algorithm for the computation of
lexical chains. Nevertheless, due to the lack of a machine-readable copy of Roget’s
Thesaurus, which was used as a lexical database for the extraction and detection of
semantic relations among words, the algorithm had to be worked out by hand. Their
algorithm was later taken up and improved by other discourse analysts to adjust it to their
own research purposes.
Regina Barzilay and Michael Elhadad, in 1997, present a new algorithm for
computing lexical chains, with a view to producing indicative summaries of texts without
having to semantically interpret them in their entirety. Indicative summaries are
summaries that are used by readers to decide whether or not a text is worth reading. They
use lexical chains because the latter are indicators of discourse structure (1997: 11). In
their article, they present a four-step summarization process which involves the
following steps (a sketch of step 3 follows the list):
1) Segmenting a text through the use of a parser
2) Computing lexical chains by means of the WordNet Thesaurus
3) Identifying strong lexical chains
4) Extracting significant sentences which will make up the summary of texts
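For step 3, Barzilay and Elhadad score chains and retain only the strongest ones. The
sketch below follows the strength criterion usually attributed to them, namely length
multiplied by homogeneity with a two-standard-deviation cutoff; the details, and the toy
chains, should be read as my own illustrative reconstruction rather than a quotation of
their algorithm:

    # Score each chain by length x homogeneity and keep chains whose score
    # exceeds the mean by two standard deviations (illustrative cutoff).
    from statistics import mean, stdev

    def chain_score(chain):
        homogeneity = 1 - len(set(chain)) / len(chain)  # share of repeated members
        return len(chain) * homogeneity

    def strong_chains(chains):
        scores = [chain_score(c) for c in chains]
        cutoff = mean(scores) + 2 * stdev(scores)
        return [c for c, s in zip(chains, scores) if s > cutoff]

    chains = [["web", "web", "web", "site", "web", "web", "page", "web"],
              ["engine", "motor"], ["query"], ["rank"], ["index"], ["user"]]
    print(strong_chains(chains))  # only the long, repetitive chain survives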
Unlike Morris and Hirst, Barzilay and Elhadad assessed the validity and quality of their
process and lexical chaining algorithms by running statistical tests, thus providing
empirical results (1997: 10). Furthermore, the authors argue that computing lexical chains
improves the text summarization process because the focus is on the concept not on the
linguistic representation. This argument is made against the use of word frequency in
early summarization systems (1997: 10). Their rationale is that some concepts may be
reiterated throughout a text by means of several words, which, as a consequence, may
have a low frequency of occurrence (1997: 14). Conversely, the lexical chain approach
disregards word frequency and manages to capture the conceptual spheres the discourse
of a text revolves around.
This idea of using lexical chains as an intermediate step for text summarization
was later taken up by Silber and McCoy (2003). They specifically focus on the concept
extraction phase. To this end, they propose a linear time algorithm for lexical chain
computation which, unlike previously proposed algorithms (i.e., Barzilay and Elhadad’s),
is capable of analyzing large documents. For each word candidate, their algorithm
extracts all word senses from WordNet and assigns them to that word, identifying all
possible chains to which each word sense may belong. The second step involves finding
the best interpretation for each word candidate and the chain to which it belongs (2003:
3). In this respect, the algorithm makes it possible to analyze each chain to which a word
belongs and, based on distance factors and the type of relation, it chooses the one having
the strongest semantic relation (2003: 3).
In carrying out their experiment, Silber and McCoy set out to determine whether
the concepts represented by strong lexical chains are the main concepts in texts. To do so,
they used textbook chapters and their attendant chapter summaries (2003: 8). They
computed lexical chains in both the original documents and their summaries and then
compared the concepts represented by the lexical chains in both text types and found that
the concepts represented by the strong lexical chains in the original documents were the
same as the ones appearing in the lexical chain analysis of the summaries (2003: 7).
A similar lexical chaining process was put forward by Michael Galley and
Kathleen McKeown (2003), who suggest a new linear algorithm for computing lexical
chains with a view to disambiguating word senses in discourse. In their study, they
actually compared their algorithm to Silber and McCoy’s and Barzilay and Elhadad’s and
found that theirs had an accuracy of 62.09% compared to 56.56% for Barzilay and
Elhadad’s and 54.48% for Silber and McCoy’s. Similar to Silber and McCoy’s study,
they separated the word sense disambiguation process from the actual lexical chaining of
words, with the main difference being that Silber and McCoy’s algorithm creates all
possible meta-chains from stage one of the disambiguation process (when all the possible
interpretations for each word candidate are identified) whereas Galley and McKeown’s
builds lexical chains only after disambiguating all words. In particular, their algorithm
first analyzes each noun instance individually rather than processing the whole text. This
means that each noun may be assigned more than one word sense in the same discourse.
Then each noun instance is checked to see whether the latter is semantically related to
other nouns by looking at such semantic relations as hyponyms, synonyms, etc. The next
step is to build a disambiguation graph, which represents all of the word-sense
combinations, that is, all of the interpretations that each word in a discourse sample
might have.
Another major application of lexical chains is in detecting and correcting
malapropisms, which Hirst and St-Onge (1998) define as words that are either misspelt or
mistyped due to their similarity in sound to other words or due to ignorance on the part of
the person who typed them. To this end, they proposed an algorithm which sets out to
detect and correct malapropisms based on a lexical chain construction. The rationale
behind this study is that if a coherent and cohesive text is made up of intertwined
concepts and sentences, the semantic relations of which contribute to building cohesive
chains, it follows that malapropisms can be detected by computing lexical chains insofar
as they provide sufficient context for lexical ambiguities (1998: 307). Like other studies
on lexical chains, Hirst and St-Onge’s too makes use of WordNet as a lexical database
from which word senses and semantic relations are extracted. However, Hirst and St-Onge
notice some flaws in their algorithm, which are attributable to some of the
limitations of WordNet. For example, some words were not included in chains where
they clearly belonged and other words were instead included in chains where they did not
belong. According to the two authors, these problems result from limitations in the set of
relations contained in WordNet, inconsistencies in the semantic proximity implicit in
WordNet links, as well as incorrect or incomplete disambiguation (1998: 318-319).
Most of the above-mentioned studies limit the scope of research to just one lexical
item category, namely nouns, because they are considered to be the main contributors to
the “aboutness” or main topic(s) of a text; but they are also used because noun synsets
dominate in WordNet (Barzilay and Elhadad, 1997: 13). By contrast, the present study
sets out to analyze the semantic relations of nouns, verbs, adjectives and adverbs.
Initially, LexChainer, a lexical chaining tool created by Galley at Columbia
University, was supposed to be used because of its more accurate algorithm. However,
several factors made an automated analysis of semantic relations impossible: LexChainer
only runs on a Linux platform; it relies on WordNet, which classifies semantic relations
according to word class and, as previously mentioned, is not representative of the whole
vocabulary of a language; and the tool cannot disambiguate polysemic words. In its place,
a manual analysis of semantic relations with the aid of WordNet will be conducted so as to make
sure no overriding chain or network is overlooked. In this respect, in analyzing such
semantic networks, emphasis will be put on those built through lexical items which
contribute the most to the aboutness of a text. In other words, only lexical items which
are related to the major theme(s) of the documents being analyzed will be taken into
consideration.
1.4 Lexical Chaining Sources
When it comes to computing lexical chains, two main options are available: 1) manual;
and 2) automatic. For a manual computation of lexical chains, one can resort to thesauri
in which words are grouped by meaning and semantic distance. An example is Roget’s
Thesaurus which classifies words into one thousand categories, and each of these
categories is divided into smaller groups containing closely related words. However,
thesauri only group related words but do not specify the kind of relationship they have in
common. Morris and Hirst, for instance, used Roget’s Thesaurus in their computation of
lexical chains. Since at the time there was no machine-readable copy of the thesaurus,
they had to compute the semantic relations among words manually (1991: 29). Conversely,
most of the studies that tried to automate the computation of semantic relations make use
of WordNet.
1.4.1 The WordNet Project
Word meaning started to be formulated and represented in terms of networks or diagrams
with nodes in 1985. This new way of looking at meanings was labeled Relational Lexical
Semantics. It came to be an alternative to Componential Lexical Semantics, which
approached the meaning of a word in the same way as the meaning of a sentence. In other
words, just as the meaning of a sentence needs to be broken down into the meanings of its
constituents, the meaning of a word, too, needs to be decomposed into its “conceptual
atoms” (Fellbaum 1998: XVI).
This new way of conceiving word meanings underlies the WordNet project. In
particular, at the core of this project was the idea of using synonym sets to represent
lexical concepts. Originally, WordNet started off as a dictionary browser and later
became a lexical database after the WordNet team was asked to develop a tool which
would read and process texts and report information about the lexical items used in them
(Fellbaum 1998: XIX-XX).
WordNet does not contain any information about the syntagmatic properties of
words, which is why it is divided into four semantic nets, each concerning a specific
word class, namely noun, verb, adjective and adverb (Fellbaum 1998: 4-5). WordNet can
be thought of as a semantic dictionary wherein words and concepts are represented as an
interrelated system or network which more closely resembles the way speakers organize
the lexis in their mind (1998: 7). WordNet is halfway between a traditional dictionary and
a thesaurus in that it combines features of both. However, as Fellbaum points out, unlike
in a thesaurus, “the relations between concepts and words in WordNet are made explicit
and labeled” (1998: 8). As in a dictionary, WordNet provides definitions and sample
sentences for most of its synsets, which are sets of synonyms belonging to the same word
class and sharing the same concept. In WordNet, words and concepts are linked through a
variety of semantic relations which are based on similarity and contrast (1998: 10).
WordNet by itself, however, is not able to process a text and extract semantic
relations from it. In computing lexical chains, it is best used in combination with a lexical
chainer which draws on WordNet as a lexical database and extracts semantic relations
from it. Generally speaking, when computing lexical chains, the following steps should
be taken (a minimal sketch in code follows the list):
1) Candidate words are selected
2) For each candidate word sense, a suitable chain is to be found
3) If such a chain is found, the word is to be inserted into that chain; otherwise a new
one is created
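A minimal sketch of these three steps is given below, assuming NLTK's WordNet interface
and a deliberately crude relatedness test (shared synset, or a direct hypernym/hyponym
link). Real chainers also weigh textual distance and disambiguate senses, and every name
in the sketch is illustrative:

    # Greedy chaining: attach each candidate word to the first chain that
    # contains a related member; otherwise open a new chain.
    from nltk.corpus import wordnet as wn

    def related(word1, word2):
        # True if the two nouns share a synset or sit one hypernym step apart.
        s1 = set(wn.synsets(word1, wn.NOUN))
        s2 = set(wn.synsets(word2, wn.NOUN))
        if s1 & s2:
            return True
        neighbours = {h for s in s1 for h in s.hypernyms() + s.hyponyms()}
        return bool(neighbours & s2)

    def build_chains(candidates):
        chains = []                        # step 1: candidates are given nouns
        for word in candidates:
            for chain in chains:           # step 2: look for a suitable chain
                if any(related(word, member) for member in chain):
                    chain.append(word)     # step 3: insert into the chain found
                    break
            else:
                chains.append([word])      # ... otherwise create a new one
        return chains

    print(build_chains(["car", "automobile", "vehicle", "glacier", "ice"]))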
Aligned with the English version of WordNet is MultiWordNet, in which the Italian
synsets are created in correspondence with the English synsets (MultiWordNet will be
used for the analysis of the lexical chains in the Italian corpus).
However, as was pointed out earlier, WordNet is divided into four semantic nets
according to word class and these nets are not connected with one another. It follows that
in the case of synonyms with word class change¹, that is to say, words that are similar in
meanings but belong to two different part-of-speech categories, the tool is not capable of
detecting their semantic connection. This is why human intervention is still needed when
analyzing WordNet-computed lexical chains.

¹ See Salkie, R. Text and Discourse Analysis. London: Routledge, 1995.
1.5 Cohesion and Lexical Cohesion in Translation Studies
When it comes to translation, many studies focus on cohesion but few deal with lexical
cohesion, lexical chain computation, and the analysis of semantic relations; and even
fewer adopt empirical methods to investigate this phenomenon.
The importance of cohesion in translation is partly related to the considerable
impact that it can have on quality. In this respect, Balkrishan Kachroo in “Textual
Cohesion and Translation” states that an “authentic,” or rather good quality translation,
needs to consider factors that go beyond the sentence level (1984: 128). One of these
factors is textual cohesion. The author’s hypothesis is that the use of cohesive devices
plays a major role in determining the quality and accuracy of translations. Stated in more
technical terms, the hypothesis claims that an “authentic translation” always strives to
match the distribution of cohesive devices in the target language text not only to those in
the source language text but also, and more importantly, to those in the target sublanguage (or text-type, which in this experiment is children’s literature) (1984: 131). The
methodology involves the use of one Hindi original text and one English original text
containing the same number of paired sentences (fifty in total) and sharing the same
sublanguage (children’s literature). These texts were analyzed in terms of their
distribution of cohesive devices. Then five native speakers of Hindi were asked to
translate the Hindi text into English. These texts were later analyzed for the distribution
of cohesive devices, which as in the previous case were counted (1984: 131). In the
analysis of the results, it was found that the best or most authentic translation resembled
the English original text in terms of the distribution of cohesive devices (1984: 132). In
other words, both the distribution of cohesive devices and the cohesive patterns used
conformed to the TL norms and above all to the norms governing the sublanguage or
text-type.
This idea of text-type as an important element affecting our linguistic choices is
also discussed by Hatim and Mason (1990) and Mona Baker (1992). Hatim and Mason,
in particular, maintain that text-type, discourse and genre are motivating factors affecting
our lexical choices. The importance of cohesion in translation is also discussed by Berber
Sardinha (1997), who states that changes in cohesive devices affect the way readers
interpret the text. The author is clearly making reference to the theory of Robert-Alain de
Beaugrande (1980), according to whom cohesion is bound up with coherence. De
Beaugrande states that coherence results from the interaction of text-presented knowledge
with the reader’s prior knowledge of the world (1980: 19). It follows that coherence is
enabled by the reader’s inferencing, which is in turn affected by the reader’s prior
knowledge. Consequently, if translators do not respect the norms of the target language in conveying the
cohesive devices of the source text “properly” into the target text, more specifically, if
they do not respect those of the TL text-type, the text’s readability for the target audience
may decrease. When such misunderstanding occurs in legal texts or, generally speaking,
in texts whose interpretation may lead to serious consequences, it becomes especially
evident that cohesion plays an important role in translation. That role has generally been
neglected in translator training, since more often than not the focus is on translation
strategies and on translation theories as applied to translation examples, rather than on
the analysis of the source text in terms of its global cohesive features, and therefore as
a translation unit in itself.
Another study focusing on the quality of translation but in terms of translation
equivalence is by Lotfipour-Saedi (1997), who defines translation equivalence in terms of
lexical cohesion and texture. The author’s hypothesis is that lexical cohesion affects the
essence or texture of a text and therefore needs to be preserved when translating in order
to achieve TL “equivalence.” In examining lexical cohesion, the author suggests reading
the text to isolate lexical chains central to the theme of the text. The approach taken by
the author also emphasizes, in my opinion, the importance lexical cohesion has in the
structuring of information in the text and how it can affect the theme-rheme organization
when cohesive markers are moved. This is all the more true of conjunctive devices, such
as however, instead, and nevertheless, which writers may purposefully place in the rheme
or theme position. When translating such devices, we need to be aware of their function;
in other words, we need to be aware of the reason why the original writer used that
particular cohesive device and the position thereof in the text. As a matter of fact,
rhetorical styles play a major role in making a particular text-type acceptable to the
members of the discourse community for which the text is meant. In this respect, Reza
Khany and Khalil Tazik statistically proved that the publication and acceptance in
international journals of research articles written by local authors depends on the authors’
compliance not only with text-type-related rhetorical moves but also with move-related
lexical cohesive patterns (LCPs) through which the connectedness within and across
rhetorical moves is realized (2011: 83). What they found in their study was
that failure to comply with lexical cohesive patterns resulted in articles being rejected
(2011: 91). So in addition to the cohesive device itself, we need to consider the role that it
plays in the whole text. As Barbara Folkart puts it, translation students and in general
translators need to learn that the translation unit is not a word, sentence or paragraph but
the text (1988: 153). By so doing, the translation student will learn to focus his/her
attention beyond the word or intrasentential level, a narrow focus that very often results
in banal translation errors.
It has already been pointed out that lexical cohesion is of paramount importance
because it helps the reader solve ambiguity issues by narrowing down the potential
meaning of words and providing clues for establishing the coherence of a text (Morris &
Hirst 1991: 23). Applied to translation, this statement implies that cohesion can be used
to help students disambiguate the content of a text and narrow down the choice of
translation solutions to a few candidates. However, it is sometimes difficult to draw a line
between what constitutes cohesion and what does not. For example, Hatim and Mason
(1990) expand on Halliday and Hasan’s categorization of cohesive devices and add other
variables of texture that contribute to making a text coherent and cohesive. Among them
are theme-rheme and given-new information; for them, punctuation may also be added.
It follows that when studying cohesive devices we cannot focus on all these
elements; rather, we need to narrow down the choice and focus on one particular type of
cohesive device if we want our study to be thorough and generalizable. In this respect,
Campbell is of the opinion that it is not necessary to perform an exhaustive analysis of all
the cohesive devices present in a text to carry out quality research. Quality does
not equal quantity, and it is the research goals that dictate the scope of cohesive elements
that need analysis (1995: 84-85). In order to limit the scope of research, Stoddard
suggests narrowing down not only the number and types of texts to be included in the
research analysis but also the number and types of cohesive markers (1991: 32).
Following this rationale, the present study will focus on a limited number of texts,
namely fifteen, taken from Scientific American and its Italian version Le Scienze, as the
basis of a corpus-based study of lexical cohesive devices. For the purpose of my research,
analyzing a large number of texts is not essential because the documents will be
randomly selected. If the same results apply to all the randomly-selected documents and
they turn out to be statistically significant, then it will be possible to generalize them to
all of the documents printed in the above-mentioned magazines.
It is worth pointing out that most research studies on cohesion are either product-oriented
or process-oriented; only a few combine the two approaches. Silvia Hansen-Schirra,
Stella Neumann and Erich Steiner investigate explicitness and implicitness in
translation at the level of cohesion by adopting an empirical, corpus-based approach. The
cohesive devices that were analyzed in the corpus are the ones listed by Halliday and
Hasan in Cohesion in English. Their corpus comprised multilingual comparable texts
(English originals and German originals), monolingually comparable texts (English
originals and English translations/German originals and German translations) and
parallel texts (English originals and German translations/German originals and English
translations)
(2000: 247). In analyzing the corpora, it was noticed that shifts in cohesive devices were
due to a normalization process whereby source text cohesive devices adjusted to TL
preferences. For example, as far as the use of pronominal referents is concerned, it was
discovered that in the translation corpora, the use of relative pronouns in the translation
conformed to the norms of the target language. In particular, there was an increase in the
number of pronouns when translating into German, whereas there was an increase in the
number of nouns and a decrease in the number of pronouns when translating into English
(2000: 256). Indeed, through an analysis of the monolingual or reference corpora in both
English and German, it was found that relative pronouns were more characteristic of
German than English. This product-oriented investigation of cohesion confirms
Blum-Kulka’s (1986), Newman’s (1988), Hatim and Mason’s (1990) and Mona Baker’s (1992)
postulate that cohesive patterns vary both within and across languages. Within language,
they vary according to text-types, whereas across languages, each language has its own
stylistic preferences for certain patterns. It follows, Baker (1992) states, that when
translating, transferring all the ST cohesive devices into the TT will not guarantee the
creation of texture in the TT. The choice of which cohesive device to use must be
dictated by TL norms as well as the textual norms of each text-type. Another study that
focuses on one of Halliday and Hasan’s cohesive devices is Monika Krein-Kühle’s
(2002). In her paper, she also emphasizes the need, when translating, to convey the
function and semantic meaning of the cohesive element through the use of devices that
are common in the target language. In particular, she focuses on the translation of the
demonstrative this from English into German because in English the demonstrative is
semantically strong, and so must be conveyed through other linguistic means in German,
such as pronouns, adjectives, adverbs, etc. Through her product-based analysis of this, she
manages to demonstrate that shifts that occur in translation may be due to semantic and
pragmatic aspects such as domain and register considerations that call for greater
referential clarity (2002: 50).
Most of the above-mentioned studies deal with cohesive devices other than lexical
cohesion; and the very few that do deal with it do not have any empirical foundations.
One of the few empirical studies that exist in the literature is Marta Karina’s Master’s
thesis, Equivalence and Cohesion in Translation: A Study of a Polish Text and its English
Translation. The author tests the validity of two main hypotheses: 1) Hypothesis one
claims that the translation of a word or rather a key term by a number of words into the
target language affects the lexical cohesion of the target text negatively and results in a
less cohesive text with a less clearly articulated theme; 2) Hypothesis two claims that the
target text being analyzed, which is the English translation of a forty-page Polish travel
brochure, uses less lexical variety of words to express the same theme as the source text
(2000: 19).
To test the validity of hypothesis one, the author focused on twelve key terms and
found that the translation of the key words by a variety of words actually has the opposite
effect to what she had predicted; in other words, this translation strategy actually
increases the textual cohesiveness of the target text (2000: III). The twelve words chosen
for the analysis are central to the theme and lexical coherence of the text (2000: 12). All
English translations of the terms fell within two main lexical categories: synonyms and
hypernyms. It was also found that the same key term was translated differently
throughout the text depending on the context (2000: 19). The author hypothesizes that the
reason behind this might be stylistic (2000: 19).
The strength of this study is its empirical grounds. Statistical tests were run to
check the validity of both hypotheses. These tests rejected her first hypothesis, indicating
that both English and Polish texts use more or less the same number and variety of lexical
equivalents (2000: 20). Likewise, statistical analysis rejected her second hypothesis,
suggesting that a text is not made less cohesive when ST words are translated by a variety
of TL equivalents (2000: 22). As the author herself points out, this finding is in
keeping with those of Halliday and Hasan (1976: 278-9) and Hoey (1991: 6), who state
that synonymy actually contributes to text cohesiveness. The author argues that the use
of synonyms in a translation adds to the cohesiveness of the target text. When translating,
the choice of one term over the other should be guided by the target readers’ word
preferences, of which the translator can only be aware if he or she is familiar with
text-type conventions and the topic lexis as well as the cultural differences existing in this
respect between source and target languages (2000: 32).
In my search for previous studies and experiments on lexical cohesion, I found
not only that there are few statistically grounded studies dealing with this
issue but also that none focuses on the difference between novices and experts. Studying
lexical cohesion in terms of novice vs. expert differences may help make the case for
teaching lexical cohesion in translation classes in that, as previously mentioned, this
topic is usually disregarded or neglected by the translation trainee,
who generally approaches the translation as a set of segments or paragraphs for which
individual translation strategies and errors are discussed. This approach does not help the
student see the text as a unit. What needs to be emphasized is a global, or macro-textual
approach, and the analysis and discussion of lexical chains is one way to do it. However,
investigating lexical cohesion in terms of novice vs. expert differences is beyond the
scope of the present study.
The primary goal of this study is to show that cohesion, and in particular, lexical
cohesion does indeed affect the quality of translation. To this end, a product-based
approach to the study of lexical cohesion will be undertaken, and the results will be
statistically tested in order to provide the necessary empirical findings and data to support
or disprove my hypotheses.
CHAPTER II
METHODOLOGY
The purpose of this study is to investigate the differences between English and Italian in
terms of lexical cohesive markers and semantic relationships; in particular, my hypothesis
is twofold and claims that 1) Italian translated texts tend to reproduce the lexical semantic
relationships of the source texts and this in turn affects their readability; they will be
perceived as less coherent by the target readership (who has different expectations when
reading articles as a result of different stylistic, language and text-type conventions in the
target system); and 2) Italian originals tend to differ in the use of lexical semantic
relationships from Italian translated texts published in the same journal and belonging to
the same text-type, which points to the influence of the ST lexical devices/markers during
the translating process. The first hypothesis will be tested on a bilingual parallel corpus of
English and Italian semi-technical texts taken from Scientific American and Le Scienze,
respectively; the validity of the second hypothesis will be tested on a comparable corpus
of Italian originals and Italian translated texts both taken from Le Scienze. The decision to
use the same magazine for both corpora makes it possible to assume that the target
readership and the target readers’ expectations are the same, which in turn makes the
findings comparable.
2.1 Methodological Approaches: Text Analysis and Corpus Linguistics
This study combines two different approaches to the analysis and linguistic investigation
of written language, namely text analysis and corpus linguistics. Below is an overview of
what these two methodologies are mainly concerned with and how they apply to the
present study.
2.1.1 Text Analysis
Text analysis as applied to translation studies was first theorized by Christiane Nord in
the early 1990s. According to Nord, text analysis in translation needs to explain the
linguistic and textual structures of texts as well as the relationship the latter have with the
system and conventions of the source texts and of the source language in general (2005:
1). In this respect, she states that the semantic and stylistic features of lexical choices may
yield information about extratextual factors (the situation in which a text is produced) and
intratextual factors (such as subject matter, content and presuppositions) (2005: 122). In
the present study, the only extratextual factor that was foregrounded in analyzing the
documents and in drawing conclusions was the target readership’s expectations. The
latter, together with language and text-type conventions, are in turn deemed to play a
major role in the choice and use of lexis and sentence structure (namely lexical cohesive
devices and the distribution and length of sentences), which are the intratextual factors
that were investigated herein.
This type of analysis of written language involves “the deconstruction² of
information within a text” (Tsai 2010: 61). Deconstructing the information contained in a
text makes it possible to focus on lexical features such as content words and their senses,
word frequencies (tokens and types), type-token ratio, and lexical density, as well as
syntactic features such as number of sentences, average sentence length and readability
index. A brief discussion of the above-mentioned lexical and syntactic features will be
provided below followed by an overview of the research methodology and tools used in
this study.

² Deconstructing a text means breaking down the information contained therein into its
textual, syntactic, and linguistic features.
2.1.1.1 Lexical Analysis
Lexical analysis is of great importance to this study because of its focus on lexical
cohesive markers, which unlike other cohesive devices are actual content words, each
with a specific, subject-field-bound sense or meaning. Lexical analysis allows researchers
to identify the number and types of tokens occurring in any sample of spoken or written
language. The term “token” refers to any set of characters delimited or separated by a
whitespace character whereas the term “type” refers to the number of different tokens
present in a text. For example, in the following sentence “The book is on the table,” there
are six tokens and five types in that the word “the” occurs twice and is counted only once
when computing types. The ratio of types to tokens tells us about the lexical variations of
a text (Laviosa 1998, 2002; Olohan 2004: 80-81). The higher the type/token ratio, the
more varied the vocabulary of a text; conversely, the lower the type/token ratio, the lower
the vocabulary variation in a text. However, it is worth pointing out here that type/token
ratio is affected by text length (Tsai 2010: 74), which means that researchers must either
compare texts of about the same length or compute the standardized type/token ratio to
get valid results. Bowker and Pearson argue that “the standardized type/token ratio is
obtained by calculating the type/token ratio for the first 1000 words in a text, and then for
the next 1000 words and so on. Then a running average is calculated, which provides a
standardized type/token ratio based on consecutive 1000-word chunks of text” (2002:
110). The standardized type/token ratio is therefore obtained by calculating the type/token
ratio every one thousand words and then by averaging the results. This way, data from
different texts of different length can be compared without compromising the validity of
the study.
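For illustration, both ratios can be computed with a few lines of Python. The whitespace
tokenization mirrors the definition of token given above; the file name is a placeholder,
and the text is assumed to be at least 1,000 words long:

    # Type/token ratio, plus the standardized variant computed over
    # consecutive 1000-word chunks and then averaged.
    def ttr(tokens):
        return len(set(tokens)) / len(tokens)

    def standardized_ttr(tokens, chunk=1000):
        chunks = [tokens[i:i + chunk] for i in range(0, len(tokens), chunk)]
        chunks = [c for c in chunks if len(c) == chunk]  # drop a short tail
        return sum(ttr(c) for c in chunks) / len(chunks)

    tokens = open("article.txt", encoding="utf-8").read().lower().split()
    print(f"TTR: {ttr(tokens):.3f}  STTR: {standardized_ttr(tokens):.3f}")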
Another important lexical factor is lexical density, which according to Mona
Baker is “the percentage of lexical as opposed to grammatical items in a given text or
corpus” (1995: 237). It may be computed by dividing the number of content words by the
total number of tokens in a text and multiplying the result by 100 to get the percentage
(Baker 1995: 237; Stubbs 1996: 71-73). However, there are three other techniques or
formulae that are usually used to calculate lexical density (Baker, Hardie & McEnery
2006: 106). Technique number one involves dividing the number of unique lexical words
by the total number of words; technique number two involves dividing the number of
unique words by the number of clauses; technique number three involves dividing the
number of unique words (both lexical and grammatical) by the total number of words. In
the third case, there is no difference between type/token ratio and lexical density. For the
purposes of this study, technique number two was employed.
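For illustration, the percentage definition can be sketched as follows, with a stopword
list standing in for the inventory of grammatical words, which is the same idea as the
WordSmith stoplist procedure described under Tools below. Note that this is not the
clause-based technique adopted in this study, and both file names are placeholders:

    # Lexical density as content words over total tokens, times 100.
    def lexical_density(tokens, stoplist):
        content = [t for t in tokens if t not in stoplist]
        return 100 * len(content) / len(tokens)

    stoplist = set(open("stoplist_en.txt", encoding="utf-8").read().split())
    tokens = open("article.txt", encoding="utf-8").read().lower().split()
    print(f"Lexical density: {lexical_density(tokens, stoplist):.1f}%")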
Unlike type/token ratio, lexical density is an indicator of information load in a
text. A text with a high information load is difficult to understand as a result of the
amount of detail and technical vocabulary it contains. In my search for a lexical density analyzer
for Italian and English, several web-based lexical density analyzers (Textalyser or Text
Content Analysis Tool to mention just a few) were found. However, with these analyzers,
lexical density is often mistaken for type/token ratio; therefore they were not considered
during the data collection procedure. WordSmith Tools was instead used to compute this
lexical feature by adopting a technique tailored to the purpose of this study, given that the
above-mentioned tool does not automatically calculate lexical density. More about this
topic will be discussed in the section dealing with tools.
As for syntactic features, the definition of sentence adopted in this study is any set
of tokens delimited by a capital letter, number or currency symbol on the left and by a
full stop, exclamation or question mark on the right. This definition is specific to the two
languages under investigation in this study and, therefore, does not take into
consideration directionality issues which can be found when dealing with non-Western
languages or, in the case of Western languages themselves, punctuation issues, as in
Spanish where exclamation and question marks are found both at the beginning and the
end of a sentence. The above-mentioned definition was also tailored to the definition
provided by the online guide to WordSmith Tools, which is available at the following
address: http://www.lexically.net/downloads/version5/WordSmith.pdf. The decision to
take this guide into account in providing a definition of the syntactic feature “sentence” is
due to the fact that this suite of tools has been used in calculating the number of
sentences. It has also been used to compute average sentence length, which is obtained by
averaging the length of all the sentences present in a given text. Lastly, readability index,
which is an indicator of the level of difficulty in reading a text, has been calculated by
resorting to an online tool which works for both Italian and English. Other readability
index calculators were not considered because of their language-related constraints.
Indeed, most of them did not support Italian. The readability analyzer used in the present
study is available at: http://labs.translated.net/text-readability/. This analyzer makes use
of the so-called Gulpease Index for calculating the readability level of texts. This index
was originally developed for the Italian language but the website implemented it for
English and French as well. The Gulpease index is computed using the following
formula:
Gulpease Index = 89 – (Lp/10) + (3 * Fr)
where:
Lp = 100 * number of letters/number of words; and
Fr = 100 * number of sentences/number of words.
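A direct transcription of the formula into Python might look as follows. The sentence
count relies on the working definition of sentence given earlier, approximated here with
a simple regular expression; the sample phrase is the website's own "easy" example:

    # Gulpease readability index: 89 - (Lp / 10) + (3 * Fr).
    import re

    def gulpease(text):
        letters = sum(ch.isalpha() for ch in text)
        words = len(text.split())
        # sentences end in a full stop, exclamation or question mark
        sentences = len(re.findall(r"[.!?]+", text)) or 1
        lp = 100 * letters / words
        fr = 100 * sentences / words
        return 89 - lp / 10 + 3 * fr

    print(gulpease("This phrase is easy. It contains common words, and simple concepts."))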
The scores of this index range from 0 to 100, with 0 indicating low readability
(harder to read) and 100 indicating high readability (easier to read). As can be seen from
the values in the formula above, this index takes into consideration word and sentence
length in computing the readability index, as supported by previous studies (Flesch 1951;
Gunning 1973; Lucisano & Piemontese 1988). The online readability index analyzer does
not provide numbers but classifies readability as easy, average or hard based on the
results obtained through the Gulpease formula. For each of the three levels of readability,
an example is provided by the website:
Easy: This phrase is easy. It contains common words, and simple concepts.
Average: Although this phrase is slightly harder on average, and despite its
complexity, the reader will have no problem understanding it.
Hard: The very phrase contained herein, having rare complexity, may
potentially be, without prior preparation, implausibly difficult to parse in as
much as it carries, at the level of the text itself, an unnecessarily, albeit
notably low readability level.
Based on the examples mentioned above, an easy-to-read text refers to a set of short,
independent sentences containing common, simple words; an average-to-read text refers
to a set of independent sentences with very few dependent clauses; finally, a hard-to-read
text refers to a set of long complex sentences consisting of many dependent clauses.
Focusing on data gleaned from the above-mentioned textual and statistical
features would make it possible to assess the quality of the translations and their
coherence in the target system. It has been shown that in translations of technical texts
from English to Italian, quality is associated with an overall increase in the number of
tokens (text length), fewer and longer sentences than the source text, higher type-token
ratio (lexical variety) but a lower lexical density (lexical/grammatical word ratio) (Scarpa
2006: 165-166). In other words, in order for a translation to be of optimal quality, it needs
to comply with:
1) the target language syntactic conventions (hence, the use of fewer,
longer sentences in Italian as a result of its preference for hypotaxis
which is supposed to be achieved through the use of such cohesive
devices as conjunctions or proforms); and
2) with stylistic conventions through the avoidance of simple
repetitions, which are more acceptable in English than Italian, by
means of synonyms (hence, the higher type-token ratio) (Scarpa
2006: 166).
This compliance with TL norms was demonstrated by comparing the grades that
translation trainees received on their translations of English domain specific source texts
in a corpus-based study conducted by Federica Scarpa. The main hypothesis behind this
study was that these grades reflect stylistic issues dealing with the use of
lexicogrammatical cohesive devices such as conjunctions and proforms (focus on
demonstratives) (Scarpa 2006: 157). This study was conducted on a bilingual parallel
corpus of English source texts belonging to several text-types and their attendant
translations carried out by translator trainees completing their four-year program at the
Advanced School of Modern Languages for Interpreters and Translators (SSLMIT)
(University of Trieste) (2006: 156). In it, higher grades were associated with the findings
mentioned above. However, there seems to be one finding which runs counter to the one
concerning the higher type-token ratio. Scarpa’s argument is that a higher type-token
ratio, which indicates that better quality translations have more vocabulary variation than
their originals, does not necessarily mean that target texts are more lexically dense than
source texts because of structural differences between the two languages (Scarpa 2006:
164-165). What this implies is that when type/token ratio is computed, there are more
types (distinct words) in Italian because of all the inflection in number and gender of
grammatical words. The example she provides is that of the English definite article the
which in Italian can take several forms such as il, lo, la, le, gli, l’, i. By contrast, when
lexical density is computed, Italian texts have a lower ratio because they have more
grammatical words. So her hypothesis is that a lower number of lexical or content words
in Italian compared to English is due not only to structural differences but also to the
transformation of simple repetitions into proforms when joining sentences through
subordination. However, she points out that this lower lexical density index was found in
36 out of 39 texts analyzed, which means that three of the texts that were analyzed did
not show this pattern; in this respect, Scarpa does not mention the grade associated with
these three translations; in other words, we do not know where these three translations
stand on the quality scale. Another important observation to be made here is that lexical
density was calculated manually in Scarpa’s study, and only the first 100 most frequently
occurring words in the text were considered, so this sample cannot be taken as
representative of the whole text. Last but not least, Scarpa seems not to take into account
that text-type also plays a role in the stylistic preferences of a language. The
findings of my study take into consideration all these factors, confirm some of the
findings put forward by Scarpa, and disprove others.
2.1.2 Corpus linguistics
The definition of corpus linguistics remains vague, as Charlotte Taylor
(2008) points out in her article What is corpus linguistics? What the data says. Over the
past twenty years, several conceptualizations of the expression corpus linguistics have
been put forward by a number of leading scholars in the field, such as Sinclair (1991),
Stubbs (1993), Leech (1992) and Tognini-Bonelli (2006). The crux of the matter is that it
is still not clear whether corpus linguistics is a discipline, a methodology, a theory, a tool,
a methodological approach, a theoretical approach or a combination of the above (Taylor
2008: 180). In the present study, I will adopt Tognini-Bonelli’s definition of corpus
linguistics as a “pre-application methodology” which has “theoretical status” (2001: 1).
Indeed, as Thompson and Hunston (2006) point out, corpus linguistics helped generate
two theories, one concerning meaning and the other communicative discourse. In other
words, thanks to corpus linguistics studies, meaning is no longer located in single words
but in sets of words that tend to co-occur (collocations) and communicative discourse is
conceived as a series of pre-fixed expressions (2006: 11-12).
Corpus linguistics facilitates the description and analysis of language through
corpora. A corpus is nowadays considered to be mainly a collection of texts (written
discourse like novels or articles) or transcripts (spoken or written-to-be-spoken discourse
like talks or speeches) held in electronic form. Mona Baker defines a corpus as “any
collection of running texts (as opposed to examples/sentences), held in electronic form
and analyzable automatically or semi-automatically (rather than manually)” (1995: 225).
By running texts, she means that a corpus may consist not only of whole texts but also
fragments of texts, the length of which should be approximately 2000 words (225). These
fragments are taken from the initial, middle, or final parts of longer texts on a random
basis (225). However, not all collections of texts constitute a corpus. In order for a set of
text samples or whole texts to be referred to as such, the texts making up the corpus must
be chosen for a particular purpose and according to specific and well-defined selection
criteria. This ensures that the chosen texts are representative of the language variety that
is under investigation (Baker 1995: 225). Some of the most important criteria to bear in
mind when choosing texts concern language variety (British English vs. American
English), language domain (general vs. technical), genres (novels vs. journal articles),
language synchronicity, and diachronicity (Baker 1995: 229). In the present study, the language
variety under analysis is American English, the language domain can be referred to as
technical in that the texts were taken from a scientific journal, the genre may be identified
as the magazine article, and the language is investigated diachronically over a span of ten
(10) years, from 1999 to 2009.
Depending on the purpose(s) of one’s study, corpora may also be monolingual,
comparable, multilingual and parallel. Monolingual corpora may be used to investigate
the lexical, syntactic, textual patterns of a specific language variety or text-type. They are
called monolingual because they include texts written in the same language. Comparable
corpora consist of two sets of texts written in different languages but which are
comparable in terms of subject-matters or text-types. Multilingual corpora are similar to
comparable corpora in terms of text selection criteria but include more than two sets of
comparable texts written in different languages. Last but not least, there exist parallel
corpora which consist of two sets of texts in which one set is the translation of the other.
In the present study, two different types of corpora were chosen, namely parallel and
comparable corpora. Indeed, as Baker points out, parallel corpora can tell us a lot about
translation strategies whereas comparable corpora can help find out the natural patterns of
a language (1995: 232). Comparing texts written under normal conditions (in a
non-translation situation) with texts produced under translation constraints makes it
possible to investigate language-pair patterns, isolate those characteristic of
translationese, and then use the findings to improve the training of translators.
My study was carried out over two sets of corpora: 1) one parallel corpus
consisting of fifteen (15) English source texts taken from the American magazine
Scientific American and fifteen (15) Italian target texts taken from its Italian edition Le
Scienze; and 2) one comparable corpus consisting of the fifteen Italian translated texts
included in the parallel corpus and fifteen Italian texts written by Italian scientists for Le
Scienze.
The parallel corpus is used to investigate the differences, if any, in the use of
lexical cohesive devices such as repetitions, synonyms, antonyms, hypernyms,
hyponyms, meronyms and holonyms between English source texts and their attendant
Italian translations. It has been hypothesized that the difference between source and target
texts is minimal. Likewise, the comparable corpus is used to identify the most common
lexical devices used to make a text coherent and cohesive and to see whether there are
any differences in terms of frequency of occurrence in the translated texts. In other
words, through the comparable corpus, this study sets out to investigate the lexical
cohesive patterns of the translated and non-translated language to see whether there are
any differences between the two. In this second case, it has been hypothesized that major
differences are found between translated and non-translated texts, thus helping make the
case for a pedagogical approach to the teaching of lexical cohesion in translator-training
programs.
The choice to carry out the analysis of lexical chains manually was made for
several reasons. The main one was that the lexical chaining tool that was originally going
to be used turned out to support only the Linux platform, which was not available to me
at the time of the analysis. Another main reason which discouraged me from using a
machine to analyze the texts was knowing that the findings would not be one hundred
percent correct and that some of the lexical chains would be overlooked by the tool.
Because of the polysemic nature of content words, it would be impossible to carry out an
automated analysis of lexical chains. In order to better understand my argument, consider
the following example from the Italian article Roma e la storia delle glaciazioni which is
part of the collection of texts originally written in Italian. In this article, the proper noun
Roma is semantically related to the word area which occurs several times throughout the
text but which has different referents. In this respect, the computer would never be able to
differentiate between area del Tevere (the area of the Tiber) and area dove sorge Roma (the
area where Rome stands). Only the latter is semantically related to the main content word
and only an intensive and thorough reading by a human being would be able to discern
these two senses and identify the one which bears a semantic relationship with the word
Rome. Last but not least, a lexical analyzer is not able to identify collocates which
constitute single semantic items per se. For example, the term search engines, which occurs a large number of times in the English source text Seeking Better Web Searches, would be treated as two different words, thus ignoring the corpus-based principle that meaning can be found beyond single words. For all of the above-mentioned reasons, a
manual analysis of the texts was carried out.
2.2 Tools
2.2.1 WordSmith Tools
WordSmith Tools is a software suite created by Mike Scott at the University of Liverpool
and consisting of three tools or applications:
1) WordList
2) KeyWords
3) Concord
Of the three, the first one will be used for the purposes of this study. WordList
allows users to view the list of all the words, or technically speaking, all the tokens
occurring in a text. This list of words can be sorted in alphabetical or frequency order as
in the figure below:
Figure 1 – WordList Frequency List
There are two tabs at the bottom of the WordList window which say “Frequency”
and “Alphabetical,” respectively. By clicking on either of these two tabs, the list is sorted
either by frequency or by alphabetical order. The number on the bottom left hand corner
of the WordList window indicates the types or rather distinct words that occur in the text
being analyzed. The tool also shows statistical data including number of types and
tokens, type/token ratio, standardized type-token ratio, average sentence length and total
number of sentences, to mention just a few, as shown in the figure below:
Figure 2 – WordList Statistics
However, as previously mentioned, this tool does not automatically compute
lexical density, which is one of the lexical factors or features this study focuses on. To
compute lexical density through WordSmith Tools, a list of grammatical words
representative of a particular language is needed. In this study, the stoplist for English,
which was originally built for the Smart Information Retrieval System experiment at
Cornell University, and the Italian stoplist were both taken from the official website of
the Computer Science Department of the University of Neuchâtel in Switzerland (these stoplists are available at the following address: www.unine.ch/info/clef). This website
provides several stoplists for different languages, such as French, German, Russian,
Spanish and many more. These stoplists were obtained by creating monolingual corpora
in several languages and then extracting the 200 most frequently recurring words. After
this word extraction, the nouns and adjectives related to the corpus subject field
appearing within the first 200 words were removed from the list and several personal or
possessive pronouns, prepositions and conjunctions were added even though they did not
appear in the first 200 words. The stop list for English is 571 words whereas the Italian
one consists of 399. However, these two lists do not include all function words but just
the ones that have a high frequency of occurrence in a particular language, which is why
they both had to be customized by adding more prepositions, conjunctions, adverbs and
auxiliary and modal verb conjugations. After these additions, the total word count for the
English stoplist was 596 whereas the one for the Italian stoplist was 506. After the
creation of these two stoplists, they were uploaded into WordSmith Tools in order to
produce an approximate calculation of the total number of content words present in each
text. WordList allows the user to stop unwanted words from appearing in the frequency
list of the types contained in a text. For each text, the number of content words was calculated; the ratio of content words to the total number of words was then computed, and the result was multiplied by 100 to arrive at a percentage (Bosseaux 2007; Roos 2009).
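To make this computation concrete, the following minimal Python sketch (an illustration only, not the WordSmith Tools procedure itself; the file names are hypothetical placeholders) computes lexical density as the percentage of content words remaining after the stoplist is applied:

def load_stoplist(path):
    # One stopword per line, lowercased (as in the customized Neuchâtel lists).
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def lexical_density(tokens, stoplist):
    # Content words divided by total running words, expressed as a percentage.
    content = [t for t in tokens if t.lower() not in stoplist]
    return len(content) / len(tokens) * 100

stoplist = load_stoplist("english_stoplist.txt")                 # hypothetical file
tokens = open("text1_en.txt", encoding="utf-8").read().split()   # naive tokenization
print(f"Lexical density: {lexical_density(tokens, stoplist):.2f} %")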
2.2.2 WordNet
In analyzing the semantic relationships of English texts, the Princeton-developed lexical
database WordNet was used. In this study, its use was not combined with a lexical chainer because, as explained above, no such tool was available. The online version of WordNet was accessed
through The Princeton University website at http://WordNetweb.princeton.edu. All the
lexical items which were previously manually identified in each text were then processed
through WordNet to identify their semantic relationships. However, given the highly technical register of the texts under investigation, some of these terms were at times not available in WordNet. In these cases, online encyclopedias and thesauri were consulted to disambiguate the meaning of these terms, and potential semantically related words were then found. The same process was followed for the Italian texts, but, this
time, MultiWordNet was accessed online at http://multiWordNet.fbk.eu. Below is a
screenshot of the online interface of MultiWordNet:
Figure 3 – MultiWordNet Interface
Like WordNet, MultiWordNet consists of an online interface which makes it
possible to view all the semantically-related words for each term that is typed in. It has a
drop-down list located right below the search tab in the blue bar, from which it is possible to select the different semantic relationships a particular word might have, such as
synonyms, antonyms or hypernyms, as in the figure above. Once the type of semantic
relationship one wants to analyze is selected, all the words falling within that category are
displayed on screen, and the subject field appears in brackets in blue next to each one.
This is important because sometimes a word might have several senses, each belonging to
a different subject field. If one knows exactly which area of study the term under analysis
belongs to, it is possible to easily identify the most suitable semantically related word.
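As an illustration of the kind of look-up performed through these interfaces, the following Python sketch queries the same semantic relationships offline via NLTK's WordNet module (an assumption for illustration; the study itself used the online WordNet and MultiWordNet interfaces):

import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

for synset in wn.synsets("protein"):
    print(synset.name(), "-", synset.definition())
    print("  synonyms: ", synset.lemma_names())
    print("  hypernyms:", [s.name() for s in synset.hypernyms()])
    print("  hyponyms: ", [s.name() for s in synset.hyponyms()][:5])
    # Part-whole relations in both directions:
    print("  meronyms: ", [s.name() for s in synset.part_meronyms() + synset.substance_meronyms()])
    print("  holonyms: ", [s.name() for s in synset.part_holonyms() + synset.substance_holonyms()])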
2.3 Preliminary Analysis
Before the word sense disambiguation process, intensive reading and manual analysis of
the texts were carried out. The first step in the semantic analysis of each text was the
creation of a word frequency list through the Wordlist tool, which is part of the
WordSmith Tools suite. Out of each frequency list, only content words that were directly
related to the main topic of the article and with a frequency of occurrence above the cut-off point of ten were chosen as relevant lexical items or key terms for the semantic analysis.
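A minimal sketch of this selection step (assuming naive tokenization and a hypothetical stoplist file; the topical filtering was, of course, done by hand):

import re
from collections import Counter

stoplist = {line.strip().lower()
            for line in open("english_stoplist.txt", encoding="utf-8") if line.strip()}

text = open("text1_en.txt", encoding="utf-8").read().lower()
tokens = re.findall(r"[a-z]+", text)

freq = Counter(t for t in tokens if t not in stoplist)
# Candidate key terms: content words at or above the cut-off of ten occurrences.
key_terms = {word: n for word, n in freq.items() if n >= 10}
print(sorted(key_terms.items(), key=lambda kv: -kv[1]))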
After the creation of these lists of most frequently occurring content words
including nouns, adjectives, verbs and topic-related adverbs (with the exception of
adverbs of space, time and manner), I started reading each text sentence by sentence,
comparing both source and target segments when dealing with translations. For each
word taken from the frequency list, the sentence number in which it occurred, its meaning
and translation(s) were recorded in a separate Word document table (see Table 1). Besides the
different translations a source word could have in the target text, any additions and
omissions that occurred in the translations were also recorded.
Table 1 – Preliminary Textual Analysis Screenshot
In the table above, which is a screenshot of the preliminary text analysis of one
of the English texts entitled “What Birds See” and its attendant translation, the four
vertical columns from left to right represent lexical chains, source text terms, target text
terms and notes, respectively. Under the Lexical Chain column were included all the key
terms extracted from the WordSmith Tools frequency word list, as well as all the
potentially semantically related terms, be they nouns, adjectives, verbs or adverbs.
Indeed, the words taken from the high frequency word list were just a starting point in the
analysis of the semantic relationships within each text. Under the Source Text Term
Column were listed all the sentence numbers in which each key term occurred whereas
under the Target Text Term Column were listed all the available translations of ST terms.
For example, the verb see from the table above is rendered as vedere most of the time
except for four cases in which it is translated differently (visione [change of word class
from verb to noun], distinguire [distinguish], percepire [perceive] and cogliere [grasp]).
Any omissions were signaled by means of a sequence of dashes (---------). The last
column was used to include any comments about translation strategies or incongruences
in translating the same expression. In this respect, one example may be taken from the
English text “Pandora’s baby” in which the terms cloning technology and technology of
cloning are both translated as either tecniche or tecnologia. The incongruence here is due
to the fact that tecniche and tecnologia refer to two different concepts, one indicating
methods and the other indicating the practical application of scientific knowledge to problem solving.
Both simple and modified repetitions were included in the analysis. This means
that if a verb occurred later as a noun, that noun was considered semantically connected
to the verb that occurred earlier, and so classified as modified repetition. This is one of
the reasons why no lexical chainer was used in the analysis of the lexical cohesive
markers in the text. Indeed, WordNet, which is usually used to identify semantically
related words in a text, identifies semantic relationships only within each word class, which means that only semantic relationships existing among nouns, among adjectives, among verbs or among adverbs can be recognized. But as happens in any living language, adverbs are
formed out of adjectives and verbs are coined after nouns so it would be erroneous to
ignore these inter-word-class semantic relationships.
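A rough automatic approximation of this notion of modified repetition can nevertheless be sketched by grouping word forms under a shared stem (a hypothetical illustration; derivations with different roots still require manual judgment):

from collections import defaultdict
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")
words = ["influence", "influenced", "influencing", "influences", "vision", "see"]

groups = defaultdict(list)
for w in words:
    groups[stemmer.stem(w)].append(w)

# Forms sharing a stem are treated as (modified) repetitions of one another;
# note that "see" and "vision" correctly do NOT group, which is precisely the
# kind of cross-class link only a human analysis can establish.
for stem, forms in groups.items():
    print(stem, "->", forms)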
After listing and grouping all the semantically related words with information
about the sentence numbers where they occurred and the different translations or
omissions and additions in the attendant target texts, a percentage of all the types of
lexical cohesion for every single text was calculated. This part of my methodology is
modeled on a study by Abdel-Hafiz (2003) who compared lexical cohesive devices in the
English Translation of Naguib Mahfouz’s novel The Thief and The Dogs. In particular,
the author sets out to statistically verify Aziz's (1998) claim that explicitness and implicitness are associated with stylistic preferences. Specifically, Aziz argues that
Arabic prefers explicit reference achieved through repetition of common and proper
nouns whereas English prefers implicit reference achieved through pronominalization. To
verify the above-mentioned claim, Abdel-Hafiz first identifies all the types of lexical
cohesion used in the original novel and then analyzes the translated text to see how the
translator handled the interlingual transcodification of these lexical cohesive devices.
Then, he calculates the frequency of occurrence of the types of lexical cohesion in both
the source and target texts by focusing on instances of recurrence, partial recurrence and
hyponymy.3 His findings disprove Aziz’s claim in that both source and target texts are
characterized by a high frequency of recurrence, which turns out to be the most common
type of lexical cohesive device; this shows that the translator did not perform the shift
from common nouns to pronouns as was expected and hypothesized by Aziz (Abdel-Hafiz 2004). Similar findings are expected to be found in my English/Italian parallel
corpus.

3 In Abdel-Hafiz's study, recurrence and partial recurrence correspond to my two categories of repetition and partial repetition.
2.4 Semantic Relation Analysis
After collecting information about key terms, the sentence numbers in which they occurred and, in the case of translations, their attendant target-text renderings, omissions or occasional additions, an analysis of their semantic relations was carried out. Each key term (meaning the words which had a frequency of occurrence above ten) was
entered in either WordNet or MultiWordNet depending on whether the analysis was
concerned with English or Italian texts, and all the semantic relationships were inspected
and compared with the ones that were identified in a particular text. If any match was
found, then the word semantically related to the key term was listed in the term table, as
in the figure below:
Table 2 – Semantic Relation Analysis

KEY TERM: Proteins

Semantic Category   English Text Frequency   Italian Text Frequency   English Term   Italian Term
Repetition          76                       70                       Protein        1) Proteina (70 – 1 addition); 2) 1 synonym; 3) 2 meronyms; 4) 2 omissions; 5) 4 pronouns
Synonym             0                        1                                       Componente proteica (repetition)
Antonym             0                        4
Meronym             2                        4                        Amino acids    Molecola, aminoacidi
Holonym             15                       15                       Proteome       Proteoma (repetition)
Hypernym
Hyponym
In this table, the first vertical column on the left features the possible semantic relationships that the key term might have with other words in the text, namely:

1) Repetitions and modified repetitions, that is to say, words that have the same lemma but belong to different word classes (nouns, verbs, adjectives, adverbs: e.g., influence (n), influence (v), influenced, influencing, etc.);

2) Synonyms: words that are similar in meaning to the key term;

3) Antonyms: words that have a meaning opposite to that of the key term;

4) Meronyms: words that have a part-whole relationship with the key term. For example, in the table above, the term amino acids is listed as a meronym of the key term proteins because amino acids are actually parts of proteins;

5) Holonyms: words that identify a whole-part relationship with the key term. Unlike with meronyms, a holonym indicates the whole, and the key term is, as a result, the meronym. In the example above, the word proteome is listed as a holonym of proteins because the proteome includes proteins;

6) Hypernyms or superordinates: words which have a more general meaning than the key term. In the example above there are no hypernyms, but WordNet indicates that supermolecule can be a hypernym of proteins;

7) Hyponyms: words which have a more specific meaning than the key word. The table above does not contain any, but WordNet lists several under proteins, such as gluten, opsin, enzyme, etc.
This table was created for each key term and the same procedure was followed to
identify the semantic relationships existing between a key term and its semantically
related words in the text. The table above refers to the bilingual, parallel corpus given
that it contains both source and target text cohesive markers. In this respect, in very few
cases, it was not possible to classify some of the lexical choices that the translator made
in rendering some of the source text words. In other words, no semantic links could be
found in either WordNet or MultiWordNet. Take, for example, the key term mixture from the text "What Birds See". In it, this term is translated into Italian as combinazione and, in just two cases, as proporzioni, which has a totally different meaning from mixture: one refers to dimensions, the other to combinations of substances. In a few other cases, the
term used in the translations did not fall within any of the semantic relationships under
investigation in this study, such as coordinates (words sharing the same hypernym) or
attributes (properties). So in this case, the translation was recorded but no semantic
relationship was specified for those terms.
After this step, these qualitative data needed to be transformed into quantitative
data in order to document any statistical significance in terms of differences in the use of
semantic relationships between English source texts and Italian translations on the one
hand, and between Italian translations and Italian originals on the other. To do so, for
each document, the percentage of repetitions, synonyms, antonyms, meronyms,
holonyms, hypernyms and hyponyms was calculated as follows:

percentage of repetitions = (total number of repetitions per text / total number of semantic relations per text) × 100
This way, all the numbers are expressed in percentages and are comparable. The
formula above refers to repetitions but applies equally to the other semantic relations. After the
percentage of usage of semantic relations in all the texts was calculated, the quantitative
data was compared to test the validity of the two main hypotheses of this study. In the
results section, the quantitative data will be combined with findings taken from the
qualitative analysis in order to support my argument and hypotheses. In the following
chapters, the results of the present study will be presented and then discussed.
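A minimal sketch of this quantification step, with hypothetical counts for one text:

counts = {  # hypothetical raw counts of semantic relations in one text
    "repetition": 76, "synonym": 1, "antonym": 0, "meronym": 2,
    "holonym": 0, "hypernym": 4, "hyponym": 15,
}
total = sum(counts.values())
for category, n in counts.items():
    # Each category is expressed as a share of all relations in the text.
    print(f"{category}: {n / total * 100:.2f} %")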
2.5 Statistical Analysis
The results of the textual and semantic analyses were processed in SPSS (Statistical Package for the Social Sciences), a software suite which was used to conduct data analysis and
test the validity of the hypotheses put forward in this study. In this respect, a One-Way
Between Subjects ANOVA was conducted to compare the effect of language on the
amount of use of semantic categories such as repetitions, synonyms, antonyms, etc. in
three different conditions: English Originals, Italian Translations, and Italian Originals.
The independent variable of this design was the effect of language whereas the dependent variable was each semantic category which was investigated. The decision to
use a One-Way Between Subjects ANOVA is due to the fact that the Independent
variable of this study, namely, the effect of language, contains three conditions. The
experiment is called between subjects because the independent variable was tested using
independent samples (that is to say three different text corpora) (Heiman 2001: 457). The
one-way Between Subjects ANOVA aims at determining whether or not there is a
statistically significant difference between any two of the three means (English Originals,
Italian Translations and Italian Originals) for each semantic category. This test for
significance is the F value in ANOVA. If the F value turns out to be significant, the analysis is not complete: a significant F value points to the existence of a statistically significant difference among the conditions but does not indicate between which of them. To find
out which conditions or groups differ significantly, a post-hoc Tukey HSD test needs to
be conducted which compares each condition with all the others and determines among
which conditions or groups the significance lies (Kirkpatrick & Feeney 2009:39). Post
hoc comparisons were conducted for each semantic category whenever the F value was
significant.
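For readers who prefer open-source tools to SPSS, the same design can be sketched in Python with SciPy and statsmodels (illustrative values only; they are not the study's actual data points):

from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical percentages of one semantic category under the three conditions.
eo = [52.1, 53.9, 69.6, 70.8, 33.9]   # English Originals
it = [65.6, 55.7, 70.1, 73.9, 16.6]   # Italian Translations
io = [43.9, 47.4, 50.2, 58.6, 38.4]   # Italian Originals

f_value, p_value = f_oneway(eo, it, io)
print(f"F = {f_value:.3f}, p = {p_value:.3f}")

# Only when F is significant are the pairwise contrasts inspected.
if p_value < 0.05:
    scores = eo + it + io
    groups = ["EO"] * len(eo) + ["IT"] * len(it) + ["IO"] * len(io)
    print(pairwise_tukeyhsd(scores, groups, alpha=0.05))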
CHAPTER III
RESULTS
The present chapter has been organized into two major parts which mirror the
methodological approach adopted in carrying out the analysis of the texts making up the
parallel and comparable corpora. The first part is concerned with presenting the findings
gleaned from the textual analysis of the documents, whereas the second part presents the
findings relating to the semantic analysis of the documents. Quantitative data will be
accompanied by qualitative information whenever the latter is deemed relevant to
understanding the translator’s choices at the lexical and syntactic levels. Statistical
evidence will also be provided. In this respect, several one-way between-subjects
ANOVA tests were run in SPSS to find out whether or not the quantitative data had
statistical validity. Before presenting the findings, I would like to provide some
information about the size of the two corpora analyzed.
3.1 Parallel and Comparable Corpora
The text length of the English source texts ranges from 2,714 to 4,167 tokens (or running
words) with an average text length of 3,396 running words and a total corpus size of
50,944 tokens. The text length of the Italian translations ranges from 2,886 to 4,640
tokens with an average text length of 3,552 running words and a total TT corpus size of
53,291 tokens. The text length of the Italian originals ranges from 2,014 to 5,275 tokens
with an average text length of 3,259 running words and a total corpus size of 48,892
tokens.
3.2 Textual Analysis
3.2.1 Standardized Type-Token Ratio
As mentioned in the previous chapter, the type-token ratio is affected by text length, being the result of the ratio between the number of types, that is to say the number of distinct words in a text, and the total number of running words (also known as tokens); this is why it was not used in the present analysis. The value that actually makes the texts comparable regardless of their word length is instead the standardized type-token ratio (STTR), which was used herein. The latter results from averaging the type-token ratios of a text calculated at intervals of every one thousand words.
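A minimal sketch of this computation (assuming naive whitespace tokenization, texts of at least one thousand tokens and a hypothetical file name; WordSmith Tools performs the calculation internally):

def sttr(tokens, chunk_size=1000):
    # Average the type-token ratio over successive full chunks of 1,000 tokens;
    # any final partial chunk is discarded.
    ratios = []
    for i in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[i:i + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size * 100)
    return sum(ratios) / len(ratios)

tokens = open("text1_en.txt", encoding="utf-8").read().lower().split()
print(f"STTR: {sttr(tokens):.2f}")

Below is a table containing the standardized type-token ratios for each text in the parallel corpus: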
Table 3.1 – Parallel Corpus STTRs

Doc.      English STs STTR   Italian TTs STTR
Text 1    43.03              48.17
Text 2    46.47              51.80
Text 3    49.87              52.15
Text 4    45.50              49.07
Text 5    47.00              54.90
Text 6    40.25              44.83
Text 7    46.10              48.20
Text 8    46.27              47.93
Text 9    48.70              53.50
Text 10   43.70              45.93
Text 11   42.80              44.77
Text 12   44.20              48.77
Text 13   45.90              48.53
Text 14   47.53              49.13
Text 15   41.17              43.70
As a general rule, it is possible to say that the type-token ratio in the translations is
higher, which in turn points to greater vocabulary variation. This is also confirmed by a
higher number of types in the translations. The higher the number of types, the higher the
vocabulary variation. Below is a table comparing the source and target text types for each
of the fifteen documents under analysis:
Table 3.2 – Parallel Corpus Types

Doc.      English STs Types   Italian TTs Types
Text 1    1,163               1,453
Text 2    1,083               1,306
Text 3    1,134               1,226
Text 4      946               1,102
Text 5    1,116               1,397
Text 6    1,075               1,232
Text 7    1,160               1,217
Text 8    1,148               1,282
Text 9    1,119               1,280
Text 10     984               1,211
Text 11     940               1,006
Text 12   1,058               1,223
Text 13   1,069               1,237
Text 14   1,105               1,247
Text 15     989               1,107
The higher type-token ratios and number of types point to greater variation overall in the
use of vocabulary in translations. However, as pointed out in the methodology chapter,
this statistic might also be due to grammatical and syntactic differences in the two
languages. Indeed, for WordSmith Tools a type is any distinct word occurring in a text,
which means every declension of verbs, adjectives, nouns and grammatical particles is
counted as a distinct type. It follows that Italian inevitably has more types than English
by virtue of its linguistic features. One solution to this problem might be to lemmatize
those words which belong to the same word class and differ only in gender, number or
tense in the case of verbs, by reducing their inflectional forms to a common base form,
but this practice is not widespread and it was not carried out in this study.
In six (6) out of fifteen (15) translated texts, the number of types is higher despite
the fact that a few source text sentences were omitted and not translated. The texts in
question are:
• Text 2: Darwin's Influence on Modern Thought, in which four sentences, namely S 156, S 167, S 168 and S 169, were omitted in the Italian version.
• Text 7: Next Stretch for Plastic Electronics, in which four sentences were omitted. These are S 67, S 153, S 154, and S 155.
• Text 8: Seeking Better Web Searches, in which two sentences were omitted. These are S 127 and S 164.
• Text 9: Shaping the Future, in which, as in Text 8, two sentences were omitted. These are S 163 and S 164.
• Text 14: White Matter Matters, in which one sentence, namely S 145, was omitted.
• Text 15: The Colors of Plants on Other Worlds, in which two sentences were omitted, namely S 113 and S 142.
As far as the comparable corpus is concerned, a comparison of the texts of the translated
and non-translated Italian sub-corpora is possible in that the standardized type-token ratio
of each text is computed by averaging the type-token ratios of 1,000-word text stretches;
therefore, the results are not affected by text length or, rather, they are not directly
affected by it. Below are two tables, one featuring the STTR of each text and one featuring the average STTR of each sub-corpus:
Table 3.3 – Comparable Corpus STTRs

TT Sub-Corpus STTR   IO Sub-Corpus STTR
48.17                50.20
51.80                51.65
52.15                46.12
49.07                44.40
54.90                51.05
44.83                47.85
48.20                48.10
47.93                48.38
53.50                49.40
45.93                46.80
44.77                48.23
48.77                55.00
48.53                42.50
43.70                47.70
49.13                46.45
Table 3.4 – Comparable Corpus STTR Means

TT Sub-Corpus STTR   IO Sub-Corpus STTR
48.76                48.26
Generally speaking, the STTR in the translated Italian sub-corpus ranges from 43.70 to
54.90 whereas the one in the non-translated Italian sub-corpus ranges from 42.50 to 55.
These intervals are almost identical as are the average STTRs of the two sub-corpora
taken as a whole.
3.2.2 Sentence Number
Another important textual feature which needs to be discussed herein relates to the
number of sentences present in the source and target texts. This textual feature documents
whether or not there are any changes in syntactic structure when the source text content is conveyed into the target text through the target language.
Table 3.5 – Parallel Corpus Sentence Numbers

Doc.      English STs Sentence Number   Italian TTs Sentence Number
Text 1    166                           160
Text 2    172                           164
Text 3    152                           143
Text 4    114                           107
Text 5    154                           137
Text 6    188                           177
Text 7    162                           128
Text 8    156                           138
Text 9    172                           142
Text 10   119                           111
Text 11   116                           110
Text 12   123                           118
Text 13   137                           123
Text 14   156                           133
Text 15   184                           177
If we compute the difference in number of sentences for each pair of texts contained in the parallel corpus and subtract, only from those texts in which omissions occurred, the number of sentences which were not translated from the source text, the resulting data show that only in six out of fifteen pairs of texts is the difference in sentence number above ten. This points to the use in most TTs of a syntactic structure similar to that of the STs.
Since the texts contained in the Italian comparable corpus cannot be compared
when considered individually, an average sentence number was computed for both the
parallel and comparable corpus:
Table 3.6 – Average Sentence Numbers

                          English STs   Italian TTs   Italian Originals
Average Sentence Number   151.4         137           111.73
It is worth reminding the reader that the average text length between the English source
texts and the Italian originals is almost the same, namely 3,396 and 3,259, respectively.
The difference in number of tokens between the two is 137. In order to make the average
sentence number between these two corpora comparable, it is necessary to look at the
average sentence length for each corpus – which indicates how many tokens are in an
average sentence – and then divide 137 by the average sentence length:
Table 3.7 – Parallel and Comparable Corpus Average Sentence Length

                          English STs   Italian TTs   Italian Originals
Average Sentence Length   22.53         26            30.12
Two operations are possible: either 137 is divided by the average sentence length of the English Source Text sub-corpus and the result is then subtracted from the average sentence number of the English Originals sub-corpus, or 137 is divided by the average sentence length of the Italian Originals sub-corpus and the result is then added to the average sentence number of the Italian Originals sub-corpus. By so doing, the two sub-corpora are made comparable in terms of sentence numbers.

EO Corpus Sentences = 137 / 22.53 = 6.1
IO Corpus Sentences = 137 / 30.12 = 4.55
The result of the first operation needs to be subtracted from the average number of
corpus sentences for the English Source Text sub-corpus, whereas the result of the second
operation is to be added to the average number of corpus sentences for the Italian sub-corpus consisting of original texts.
EO Corpus Comparable Sentence Number = 151.4 – 6.1 = 145.3
IO Corpus Comparable Sentence Number = 111.73 + 4.55 = 116.28
Only one of the above-mentioned results is to be taken into account. I have decided to
make the Italian Originals sub-corpus sentence number comparable to the English
Originals sub-corpus sentence number. Hence, the average sentence numbers of the two
sub-corpora approximately reflect the average sentence number that an average text of 3,339 words in English and Italian might have.
Table 3.8 – Average Sentence Number in an Average Text of 3,339 Tokens

                          English STs   Italian Originals
Average Sentence Number   151.4         116.28
As is evident from the table above, the average sentence number in texts originally written in Italian for the same readership is lower than the one in the corpus consisting of English original texts.
The same procedure was followed to document whether there was any significant
difference between translated and non-translated Italian texts. Since the two corpora in
question are of different size, a few operations had to be performed in order for the
results relating to the average sentence length of the two corpora to be comparable. First,
the difference in tokens between the two corpora was computed:
Token Difference = 3,552 – 3,259 = 293
Second, the approximate number of sentences that a chunk of text, consisting of 293
tokens originally written in Italian, might contain was calculated by dividing the Token
Difference by the average sentence length of the Italian Originals sub-corpus:
Approximate Sentence Number = 293/30.12 = 9.73
Third, the approximate number of sentences was added to the average sentence number
of the Italian Originals sub-corpus in order to obtain the approximate average number of
sentences that an average text of 3,552 words might contain:
Approximate Average Sentence Number = 111.73 + 9.73 = 121.46.
Finally, the average sentence numbers of the translated and non-translated Italian texts
were compared:
Table 3.9 – Comparable Corpus Average Sentence Number

                          Italian Translations   Italian Originals
Average Sentence Number   137                    121.46
A comparison of the two results shows that the difference in the average number of
sentences between the two sub-corpora is lower than the difference in the average
number of sentences between English and Italian originals.
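The normalization performed in this and the preceding comparison can be summarized in a short Python sketch (using the values reported above; the function name is a hypothetical label for illustration):

def comparable_sentence_number(avg_sentences, avg_tokens_self, avg_tokens_other, asl):
    # Convert the token difference between two sub-corpora into an estimated
    # number of sentences via the average sentence length, and add it to the
    # average sentence number of the shorter sub-corpus.
    token_difference = avg_tokens_other - avg_tokens_self
    return avg_sentences + token_difference / asl

# Italian Originals (3,259 tokens on average, 111.73 sentences, ASL 30.12)
# scaled up to the average length of the Italian Translations (3,552 tokens):
print(round(comparable_sentence_number(111.73, 3259, 3552, 30.12), 2))  # 121.46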
3.2.3 Lexical Density
As pointed out in the methodology chapter, lexical density represents the information
load of a document, and in this study it was computed by taking into account all the
content words occurring in each document analyzed instead of just focusing on the first
one hundred most frequently-used words. Below is a table featuring the lexical density
values for the parallel corpus:
Table 3.10 – Parallel Corpus Lexical Density

Doc.      English Originals LD   Italian Translations LD
Text 1    23.12                  26.57
Text 2    26.08                  30.59
Text 3    29.46                  35.96
Text 4    27.08                  30.26
Text 5    26.19                  33.77
Text 6    20.49                  25.47
Text 7    27.69                  30.14
Text 8    26.31                  30.42
Text 9    28.15                  32.63
Text 10   24.93                  26.79
Text 11   24.94                  26.53
Text 12   24.43                  28.98
Text 13   26.13                  28.91
Text 14   27.21                  30.73
Text 15   21.62                  24.56
As a general rule, the pattern that stands out from the table above is that in the Italian
translations, the value for lexical density is higher than the value in the source texts. This
finding contradicts what Scarpa states in her article “Corpus-based Quality-Assessment
of Specialist Translation: A Study Using Parallel and Comparable Corpora in English and
Italian,” in which she uses lexical density as one of the benchmarks for assessing the
quality of student translations and finds that this feature is generally lower in translations.
However, in computing lexical density, Scarpa only focuses on the first one hundred high
frequency words.
As far as average lexical density is concerned, its values for the entire corpus
confirm an overall higher lexical density in the translated Italian sub-corpus as shown in
the table below:
Table 3.11 – Average Lexical Density in English Originals and Italian Translations

     English Originals   Italian Translations
LD   25.60               29.49
As far as the comparable corpus is concerned, a one-by-one comparison of lexical density values between English Originals, Italian Translations and Italian Originals is not possible, since the values for the Italian Originals would not be comparable because of their different text lengths, which inevitably affect lexical density. It is nonetheless possible to compare the average lexical density for each sub-corpus, as in the following table:
Table 3.12 – Parallel and Comparable Corpus Average Lexical Density

     EO      IT      IO
LD   25.60   29.49   29.76
The rationale behind comparing these results is that the overall corpus size for the Italian
originals is lower than the corpus size of the English source and Italian target texts,
respectively. Hence, the higher average lexical density in the Italian Originals corpus
cannot be due to a higher number of tokens or running words in the latter. Indeed, the
overall size of the Italian Originals sub-corpus is 48,892 tokens as opposed to 50,944 and
53,291 tokens for the English source text and Italian target text sub-corpora, respectively.
The pattern that emerges from the analysis of these values is that the Italian target texts and Italian originals resemble each other in terms of information load, which is on the whole higher than in the English source texts.
3.2.4 Readability
The data on readability is qualitative rather than quantitative which means that statistical
evidence will not be provided for this textual feature at the end of the chapter. Readability
is classified as hard, average or easy. Below is a table featuring the readability indices
relating to the parallel corpus texts:
Table 3.13 – Parallel Corpus Readability Indices

Doc.      ST Readability   TT Readability
Text 1    Average          Hard
Text 2    Hard             Hard
Text 3    Hard             Hard
Text 4    Average          Hard
Text 5    Average          Hard
Text 6    Average          Hard
Text 7    Hard             Hard
Text 8    Hard             Hard
Text 9    Hard             Hard
Text 10   Hard             Hard
Text 11   Hard             Hard
Text 12   Hard             Hard
Text 13   Hard             Hard
Text 14   Hard             Hard
Text 15   Average          Hard
Since these texts are highly technical, their readability index is primarily hard, with the exception of five cases in the source text sub-corpus in which the readability is average. On the whole, the pattern that emerges from the comparison of the results is in alignment with the patterns identified so far in relation to other textual features. In other words, a shift toward lower readability is evident in the Italian translations. As expected, the Italian translations are all hard to read because the texts, besides being highly technical, also have a higher word and sentence length, the two factors used in calculating the Gulpease index, the readability formula adopted here.
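For reference, the Gulpease index is commonly defined as 89 + (300 × sentences − 10 × letters) / words, with higher scores indicating easier texts. The sketch below uses that standard formulation with hypothetical counts; the exact hard/average/easy cut-offs applied in this study are an assumption for illustration:

def gulpease(n_letters, n_words, n_sentences):
    # Standard Gulpease formulation for Italian readability.
    return 89 + (300 * n_sentences - 10 * n_letters) / n_words

score = gulpease(n_letters=17000, n_words=3400, n_sentences=140)  # hypothetical counts
label = "easy" if score >= 60 else "average" if score >= 50 else "hard"  # assumed cut-offs
print(f"Gulpease = {score:.1f} ({label})")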
A contrastive analysis between the above-mentioned values and the ones from the
Italian originals sub-corpus reveals that three out of fifteen texts have an average
readability index as shown in the table below:
Table 3.14 – Parallel and Comparable Corpus Readability Indices

Doc.      EO Readability   IT Readability   IO Readability
Text 1    Average          Hard             Hard
Text 2    Hard             Hard             Hard
Text 3    Hard             Hard             Hard
Text 4    Average          Hard             Hard
Text 5    Average          Hard             Average
Text 6    Average          Hard             Hard
Text 7    Hard             Hard             Hard
Text 8    Hard             Hard             Average
Text 9    Hard             Hard             Hard
Text 10   Hard             Hard             Hard
Text 11   Hard             Hard             Hard
Text 12   Hard             Hard             Hard
Text 13   Hard             Hard             Average
Text 14   Hard             Hard             Hard
Text 15   Average          Hard             Hard
Generally speaking, the readability of these texts is expected to be hard because of a
greater use of technical vocabulary. It is not possible to make a one-by-one comparison
between the texts from the parallel and comparable corpora because they have different text lengths and are about different topics. However, considering that the target readership
is the same for both Italian Translations and Italian Originals (given that these texts were
taken from the same magazine) it is possible to state that as a general rule the overall
readability index in both corpora is hard.
3.2.5 Average Sentence Length
The average sentence length in the Source Text sub-corpus is generally lower than the
one found in translated texts, as shown in the table below:
Table 3.15 – Parallel Corpus Average Sentence Length

Doc.      STs ASL   TTs ASL
Text 1    24.56     28.75
Text 2    19.13     21.72
Text 3    19.89     19.99
Text 4    23.50     27.97
Text 5    22.14     25.64
Text 6    21.86     22.51
Text 7    21.11     26.38
Text 8    22.68     26.14
Text 9    18.60     22.92
Text 10   26.20     34.16
Text 11   25.45     27.83
Text 12   28.06     29.75
Text 13   24.19     29.50
Text 14   21.06     25.55
Text 15   19.59     20.99
Overall, it is possible to argue that the Italian translated sentences tend to be longer than their English counterparts. Only in four cases (Texts 3, 6, 12, and 15) is the difference between the source and target texts in terms of sentence length less than two (2) tokens.
The average sentence length for each sub-corpus is as follows:
Table 3.16 – Parallel Corpus Mean ASL

      ST Sub-Corpus   TT Sub-Corpus
ASL   22.53           26
A comparison of the average sentence length of each translation with the mean average sentence length of the Italian Translations sub-corpus shows that eight out of fifteen texts, namely Texts 1, 4, 7, 8, 10, 11, 12, and 13, have an average sentence length higher than the sub-corpus mean.
As far as the comparable corpus is concerned, the average sentence length of each
text being part thereof is as follows:
Table 3.17 – Comparable Corpus ASL

IT Sub-Corpus ASL   IO Sub-Corpus ASL
28.75               37.26
21.72               32.57
19.99               27.28
27.97               35.09
25.64               28.24
22.51               31.05
26.38               28.20
26.14               20.71
22.92               25.87
34.16               31.03
27.83               28.66
29.75               24.13
29.50               27.85
25.55               37.97
20.99               35.84
Since a one-by-one comparison of each text between the translated and non-translated
Italian sub-corpus is not possible, it is necessary to compute the average sentence length
for each sub-corpus and then compare the results, which are the following:
Table 3.18 – Comparable Corpus Mean ASL

      IT Sub-Corpus   IO Sub-Corpus
ASL   26              30.12
The data contained in the table above show that there is a major difference in terms of
average sentence length between translated texts and texts that are originally written in
Italian by Italian scientists or, more generally speaking, authors. This difference is even
more evident when comparing the average sentence length of English and Italian
originals as shown below:
Table 3.19 – Mean ASL in English and Italian Originals

      EO Sub-Corpus   IO Sub-Corpus
ASL   22.53           30.12
It is evident that English sentences are generally shorter than Italian sentences, which is
partly due to the fact that English prefers coordination (parataxis) whereas Italian prefers subordination (hypotaxis).
3.3 Semantic Analysis
Semantic analysis herein refers to the investigation of such semantic relationships as
repetitions, modified repetitions, synonyms, antonyms, meronyms, holonyms, hypernyms
and hyponyms. The following is a list of the findings for each above-mentioned semantic
category in the parallel and comparable corpora. The findings will be presented in the
form of percentages rather than raw numbers because in this way it will be easier to
document the difference in the frequency and use of these semantic relationships in each
and every text as well as the whole corpus.
3.3.1 Repetition and Modified Repetition
As mentioned in the methodology chapter, repetition and modified repetition, that is to
say the use of derived forms, were considered as a single semantic category while
computing their percentage of occurrence. Therefore, the results presented below include
both types of relationships:
Table 3.20 – Parallel Corpus Repetitions

Doc.      ST Repetition   TT Repetition
Text 1    52.13 %         65.56 %
Text 2    53.90 %         55.68 %
Text 3    69.60 %         70.13 %
Text 4    70.83 %         73.85 %
Text 5    33.85 %         16.57 %
Text 6    54.67 %         54.13 %
Text 7    48.55 %         49.38 %
Text 8    49.86 %         54.62 %
Text 9    88.48 %         82.76 %
Text 10   77.86 %         77.32 %
Text 11   61.69 %         59.26 %
Text 12   62.35 %         58.71 %
Text 13   59.59 %         50.86 %
Text 14   60.68 %         62.74 %
Text 15   48.78 %         47.56 %
From the table above, it is evident that the repetition patterns in the target texts for the
most part resemble those of the source texts. However, in five out of fifteen texts, there is
a major difference between source and target texts. Of these five texts, two have a greater
use of repetitions in the target text whereas three have a lower use of them. If we look at
each text individually, it is not possible to see the general picture, which is why an
average of the use of repetition in both sub-corpora is provided below:
Table 3.21 – Parallel Corpus Average Repetition

EO Average Repetition   IT Average Repetition
59.52 %                 58.61 %
By comparing the average use of repetitions in the English source text and the Italian
target text sub-corpora, it is possible to argue that there is no major difference in the use
of this semantic category between English Originals and Italian Translations.
A different trend is evident when non-translated and translated Italian texts are
compared. Since a one-by-one comparison between texts in the two sub-corpora is not
possible, an average of the percentage of use of this semantic category in the two sub-corpora taken as a whole was computed and the results are the following:
Table 3.22 – Comparable Corpus Average Repetition

IT Average Repetition   IO Average Repetition
58.61 %                 45.58 %
The pattern that emerges concerning the use of repetition in Italian translations and
Italian originals is that there is a far lower use of repetition devices in Italian originals.
The repetition average relating to the Italian Target Text sub-corpus resembles that of the
English Source Text sub-corpus, which is only slightly higher (59.52 %). However, one
might argue that these two figures are not totally comparable because of the different
sizes of the two sub-corpora: one (the Italian Target Text Sub-corpus) being 53,291
tokens long and the other (the Italian Originals Sub-corpus) being 48,892 tokens long.
This is not entirely true given that the percentage is not computed based on the number of
tokens of a text but rather based on the amount of times the same term was repeated
throughout a text. It is true that the longer a text, the higher the chances a term can be
repeated but the percentage reflects the frequency of use of a semantic category
compared to the other semantic relationships. Therefore, regardless of the length of a text,
the percentage can be either higher or lower depending on whether the author resorted to
other semantic categories such as synonyms, antonyms, meronyms, etc. In support of this
argument is the finding that the percentage of use of any semantic category is not
proportional to the size of the texts in which they occur:
Table 3.23 – Percentage of Use of Repetition Compared to Text Size

Doc.      EO Tokens   EO Repetition   IT Tokens   IT Repetition   IO Tokens   IO Repetition
Text 1    4,121       52.13 %         4,640       65.56 %         2,878       43.85 %
Text 2    3,308       53.90 %         3,576       55.68 %         2,952       47.41 %
Text 3    3,058       69.60 %         2,886       70.13 %         5,232       50.23 %
Text 4    2,714       70.83 %         3,033       73.85 %         3,034       58.58 %
Text 5    3,447       33.85 %         3,541       16.57 %         2,611       38.37 %
Text 6    4,167       54.67 %         4,032       54.13 %         2,402       42.67 %
Text 7    3,473       48.55 %         3,430       49.38 %         2,134       61.54 %
Text 8    3,557       49.86 %         3,632       54.62 %         4,625       51.66 %
Text 9    3,221       88.48 %         3,273       82.76 %         2,014       43.33 %
Text 10   3,132       77.86 %         3,806       77.32 %         5,275       36.69 %
Text 11   2,967       61.69 %         3,075       59.26 %         3,782       47.66 %
Text 12   3,478       62.35 %         3,533       58.71 %         3,268       42.79 %
Text 13   3,356       59.59 %         3,669       50.86 %         3,706       42.95 %
Text 14   3,311       60.68 %         3,420       47.56 %         2,877       29.38 %
Text 15   3,634       48.78 %         3,745       62.74 %         2,102       46.55 %
As is evident from the table above, within the same sub-corpus, longer texts may have a lower repetition frequency, as in Text 6, which consists of 4,167 running words but has a
repetition frequency of 54.67 %, which is lower than that of Text 4, which consists of
only 2,714 running words but has a repetition frequency of 70.83 %. This is due to the
fact that, as previously mentioned, the percentage was calculated out of the total number
of semantic categories used in each text.
3.3.2 Synonyms
As far as synonyms are concerned, a contrastive analysis between each source and target text points to an overall higher use of synonyms in the translated texts, as shown in the table below:
Table 3.24 – Parallel Corpus Synonyms

Doc.      ST Synonyms   TT Synonyms
Text 1    4.25 %        1.11 %
Text 2    8.81 %        4.76 %
Text 3    18.71 %       16.88 %
Text 4    0             0.46 %
Text 5    40.10 %       48 %
Text 6    16.80 %       18.80 %
Text 7    3.26 %        0.41 %
Text 8    11.44 %       18.54 %
Text 9    7.37 %        11.33 %
Text 10   0.24 %        3.90 %
Text 11   6.47 %        7.12 %
Text 12   2.71 %        4.84 %
Text 13   16.33 %       22.84 %
Text 14   16.1 %        13.67 %
Text 15   17.48 %       17.70 %
In ten out of fifteen target texts, the frequency of use of synonyms is higher than in the source texts. The most evident differences can be found between source and target texts 2, 5, 8, 9, 12, and 13. If the means of the two sub-corpora are compared, it becomes
evident that, overall, the Target Text sub-corpus has a slightly higher use of synonyms, as
featured in the table below:
Table 3.25 – Parallel Corpus Synonym Means

EO Synonym Mean   IT Synonym Mean
11.34 %           12.69 %
The difference between the two means is 1.35 %. These mean values are not very different from the mean value of the Italian Originals sub-corpus, which is 10.83 % and which in turn implies a slightly lower use of synonyms in Italian originals than in both English originals and Italian translations.
3.3.3 Antonyms, Meronyms and Holonyms
These three semantic categories are presented under the same heading because their
frequency of use was very low compared to repetitions, synonyms, hypernyms and
hyponyms, which can be regarded as the most common lexical cohesive devices used to
create coherence and cohesion in a text.
As far as antonyms are concerned, their use in the texts under analysis was
sporadic. They were identified only in a few texts with a very low frequency of
occurrence, as shown in the table below:
Table 3.26 – Parallel Corpus Antonyms

Doc.      EO Antonym   IT Antonym
Text 1    0            0
Text 2    0.68 %       0.73 %
Text 3    0            0
Text 4    0            0
Text 5    0            0
Text 6    0            0
Text 7    1.81 %       2.06 %
Text 8    0            0
Text 9    3.23 %       3.45 %
Text 10   0            0
Text 11   0            0
Text 12   0            0
Text 13   7.35 %       7.33 %
Text 14   2.17 %       2.17 %
Text 15   0.2 %        0.22 %
By looking at the data above, one cannot draw any firm conclusions as to the use of antonyms in the source and target texts, because in some cases the target texts have a higher use of antonyms and in others they have the same frequency of use as the source texts. Moreover, in nine out of fifteen texts, the frequency of use of this semantic category equals zero. If the mean of these values for each sub-corpus is taken into consideration, then it is possible to see a general trend in its use:
Table 3.27 – Parallel Corpus Antonym Means

EO Antonym Mean   IT Antonym Mean
1.03 %            1.06 %
The difference between the two mean values is almost equal to zero. These findings are
not so different from the ones found in the Italian Originals sub-corpus as shown in the
table below:
Table 3.28 – Italian Originals Antonyms

Doc.      IO Antonym
Text 1    0.41 %
Text 2    0
Text 3    0
Text 4    0
Text 5    0
Text 6    4.19 %
Text 7    0
Text 8    2.32 %
Text 9    0
Text 10   0
Text 11   0
Text 12   0
Text 13   0
Text 14   0.57 %
Text 15   0
Only in four out of fifteen texts were antonyms present; however, since a one-by-one comparison between source and target texts and Italian Originals is not possible, the sub-corpus mean was computed. Its value is 0.50 %, roughly half the mean of both the source and target text sub-corpora.
As regards meronyms, there is, overall, a higher use of the latter compared to
antonyms, but their frequency of occurrence is still low when compared to repetitions,
synonyms, hypernyms or hyponyms. Below is a table featuring the frequency of
occurrence of this semantic category in the parallel corpus:
Table 3.29 – Parallel Corpus Meronyms

Doc.      EO Meronyms   IT Meronyms
Text 1    10.64 %       10 %
Text 2    0             0
Text 3    0             0
Text 4    8.33 %        9.63 %
Text 5    0             0
Text 6    0             0
Text 7    1.81 %        1.23 %
Text 8    7.62 %        7.28 %
Text 9    0             0
Text 10   4.05 %        4.39 %
Text 11   0.99 %        1.14 %
Text 12   9.64 %        10.32 %
Text 13   0             0
Text 14   5.26 %        4.97 %
Text 15   3.86 %        4.2 %
The values in the columns above are generally very close to each other. Though there is a
slightly higher use of meronyms in the translated texts (in five out of nine texts where
meronyms appear), it is not possible to argue that there is a major difference between the
two sub-corpora, and this is proved by the mean values:
Table 3.30 – Parallel Corpus Meronym Means

EO Meronym Mean   IT Meronym Mean
3.48 %            3.54 %
The means of the two sub-corpora are almost the same, which implies that source and target texts make use of a comparable proportion of meronyms. Different findings are evident when
these mean values are compared to the mean value of the Italian Originals sub-corpus, in
which the meronym mean amounts to 8.89 %. This higher mean is due to the fact that a
greater number of texts in the Italian Originals sub-corpus make use of this semantic
category as in the table below:
Table 3.31 – Italian Originals Meronyms

Doc.      IO Meronym
Text 1    0
Text 2    6.67 %
Text 3    5.94 %
Text 4    0
Text 5    5.03 %
Text 6    17.99 %
Text 7    5.33 %
Text 8    18.87 %
Text 9    10.84 %
Text 10   1.62 %
Text 11   0
Text 12   7.96 %
Text 13   34.83 %
Text 14   6.22 %
Text 15   12.07 %
As is evident from the table above, not only do most of the texts make use of meronyms
but the frequency of occurrence of the latter is also higher than that of the source and
target texts.
Last but not least, there is the semantic category of holonyms. Like antonyms, this semantic category does not occur in over half of the texts that are part of the parallel corpus, as shown in the table below:
Table 3.32 – Parallel Corpus Holonyms

Doc.      EO Holonyms   IT Holonyms
Text 1    3.19 %        3.33 %
Text 2    0             0
Text 3    0             0
Text 4    10.42 %       11.47 %
Text 5    1.04 %        1.14 %
Text 6    0             0
Text 7    7.25 %        7.42 %
Text 8    0             0
Text 9    0             0
Text 10   4.52 %        4.15 %
Text 11   0             0
Text 12   0             0
Text 13   0             0
Text 14   8.05 %        9.31 %
Text 15   0.2 %         0.22 %
The general trend that is evident in the table above is that the target texts have a slightly
higher but overall similar frequency of occurrence of holonyms. This is confirmed by the
mean values of the two sub-corpora, which are as follows:
Table 3.33 – Parallel Corpus Holonym Means

EO Holonym Mean   IT Holonym Mean
2.30 %            2.47 %
Though the TT holonym mean is slightly higher, it is not possible to state that there is a
difference in the use of this semantic category in the two sub-corpora, which instead can
be argued in the case of the Italian Originals sub-corpus, in which the presence of this
semantic category is very limited, as demonstrated in the table below:
Table 3.34 – Italian Originals Holonyms

Doc.      IO Holonym
Text 1    0
Text 2    0
Text 3    5.94 %
Text 4    0.32 %
Text 5    0
Text 6    0.84 %
Text 7    0.59 %
Text 8    0
Text 9    0
Text 10   0
Text 11   0
Text 12   0
Text 13   0
Text 14   0
Text 15   0.58 %
Not only is the frequency of occurrence in most of the texts equal to zero, but it is also
very low in the few texts in which this semantic category occurs. To see if there are any
differences in the frequency of use between the source and target texts and the Italian
original texts, it is necessary to compare the mean values of the three sub-corpora. The
mean of the Italian Originals sub-corpus amounts to 0.55 %, which is lower than that of
the Source and Target Text sub-corpora, which is 2.30 % and 2.47 %, respectively.
3.3.4 Hypernyms
As far as the parallel corpus is concerned, a comparison of the frequencies of use of
hypernyms in the source and target texts reveals some interesting details. Below is a table
featuring the frequency values in the parallel texts:
Table 3.35 – Parallel Corpus Hypernyms

Doc.      EO Hypernym   IT Hypernym
Text 1    11.70 %       11.11 %
Text 2    20.68 %       21.61 %
Text 3    8.19 %        11.69 %
Text 4    5 %           2.75 %
Text 5    14.60 %       23.43 %
Text 6    26.40 %       24.79 %
Text 7    5.07 %        6.17 %
Text 8    20.82 %       10.93 %
Text 9    0             0.49 %
Text 10   3.81 %        2.92 %
Text 11   3.98 %        6.55 %
Text 12   7.53 %        6.45 %
Text 13   13.47 %       16.38 %
Text 14   0.93 %        1.24 %
Text 15   4.67 %        5.10 %
The table above shows that in nine out of fifteen texts, there is an increase in the use of hypernyms in the Italian translations (see Texts 2, 3, 5, 7, 9, 11, 13, 14, and 15). One of these, namely Text 9, in which the frequency of use of this semantic category is equal to zero in the source text, makes use of hypernyms in the Italian version. In the remaining six cases, the Italian translation makes less use of hypernyms. In their place, the Italian texts resort to other semantic categories such as synonyms, hyponyms and meronyms, but mostly repetitions and omissions. Overall, however, the hypernym frequency mean
values of the two sub-corpora are not very different from each other as shown in the table
below:
Table 3.36 – Parallel Corpus Hypernym Means

EO Hypernym Mean   IT Hypernym Mean
9.79 %             10.11 %
The data in the table above show that the average use of hypernyms in the source text
sub-corpus is almost identical to the one in the Target Text sub-corpus. An interesting
finding emerges when these means are compared to the hypernym frequency mean
value of the Italian Originals sub-corpus. The individual values in each text are overall
higher than the ones in the parallel corpus as shown in the table below:
Table 3.37 – Italian Originals Hypernyms

Doc.      IO Hypernym
Text 1    33.61 %
Text 2    14.82 %
Text 3    9.36 %
Text 4    15.53 %
Text 5    17.61 %
Text 6    15.90 %
Text 7    4.14 %
Text 8    12.91 %
Text 9    1.66 %
Text 10   15.26 %
Text 11   15.58 %
Text 12   25.37 %
Text 13   7.51 %
Text 14   12.99 %
Text 15   23.56 %
This overall higher use of hypernyms in the Italian Original texts that were analyzed is reflected in the higher mean value of the sub-corpus, which is equal to 15.05 %. This implies that the Italian Originals sub-corpus makes a greater use of hypernyms compared to both the English texts and their Italian translations.
3.3.5 Hyponyms
Besides repetitions, synonyms and hypernyms, the fourth most widely used semantic
category in the corpus of texts selected for this study was hyponyms, namely the use of
words with a more specific meaning. Comparing the frequency of occurrence of
hyponyms in each pair of texts making up the parallel corpus yields the following results:
Table 3.38 – Parallel Corpus Hyponyms

Doc.      EO Hyponym   IT Hyponym
Text 1    18.70 %      8.89 %
Text 2    15.93 %      17.22 %
Text 3    3.50 %       1.3 %
Text 4    5.42 %       1.84 %
Text 5    10.41 %      10.86 %
Text 6    2.13 %       2.28 %
Text 7    32.25 %      33.33 %
Text 8    10.26 %      8.61 %
Text 9    0.92 %       1.97 %
Text 10   9.52 %       7.32 %
Text 11   26.87 %      25.93 %
Text 12   17.77 %      19.68 %
Text 13   3.26 %       2.59 %
Text 14   6.81 %       5.90 %
Text 15   24.81 %      25 %
None of the texts has the same exact percentage of use of this semantic category in both the source and target language. Seven out of fifteen target texts make a greater use of hyponyms, whereas the remaining eight make a lower use of them; no clear pattern emerges from this comparison. The slightly higher use of hyponyms present in some target texts is not reflected in the mean value of the latter, as shown in the table below:
Table 3.39 – Parallel Corpus Hyponym Means

EO Hyponym Mean   IT Hyponym Mean
12.57 %           11.51 %
The data in the table show that the source text sub-corpus overall makes a greater use of hyponyms compared to the target text sub-corpus. This is in contrast to the findings for the Italian Originals sub-corpus, in which most of the texts (thirteen) have a frequency of use of hyponyms higher than ten percent, as opposed to the Italian Translations sub-corpus, in which only six out of fifteen do, as shown in the table below:
Table 3.40 – Italian Originals Hyponyms

Doc.      IO Hyponym
Text 1    17.21 %
Text 2    15.55 %
Text 3    26.25 %
Text 4    13.92 %
Text 5    33.33 %
Text 6    10.88 %
Text 7    16.57 %
Text 8    3.31 %
Text 9    20.84 %
Text 10   33.12 %
Text 11   16.51 %
Text 12   11.94 %
Text 13   12.01 %
Text 14   41.24 %
Text 15   6.32 %
The texts with a frequency of use higher than ten percent are Texts 1, 2, 3, 4, 5, 6, 7, 9, 10, 11, 12, 13, and 14. This finding is also supported by the mean value of the whole sub-corpus, which amounts to 18.6 %, as opposed to just 12.57 % and 11.51 % for the Source and Target Text sub-corpora, respectively.
3.4 SPSS Statistical Analysis
3.4.1 Textual Features
3.4.1.1 STTRs
A one-way between subjects ANOVA was conducted to compare the effect of language
(IV) on standardized type-token ratios (DV) under three conditions: English Originals, Italian Translations, and Italian Originals. There was a significant effect of language on STTR at the p < 0.05 level for the three conditions [F (2, 42) = 6.075; p = 0.005]. Post
hoc comparisons using the Tukey HSD test indicated that the mean score for the English
Originals condition (M = 45.23; SD = 2.69) was significantly different from the mean
score for the Italian Translations condition (M = 48.76; SD = 3.25) and from the mean
score for the Italian Originals condition (M = 48.26; SD = 3.03). However, there was no
significant difference between the mean score for the Italian Translations condition (M =
48.76; SD = 3.25) and the mean score for the Italian Originals condition (M = 48.26; SD
= 3.03).
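For readers who wish to reproduce tests of this kind outside SPSS, the following Python
sketch runs the same design, a one-way ANOVA followed by a Tukey HSD post hoc test,
using scipy and statsmodels; the three lists are placeholders standing in for the fifteen
per-text STTR values of each sub-corpus, not the study's actual data:

    import numpy as np
    from scipy.stats import f_oneway
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Placeholder STTR scores; in the study each group has fifteen values.
    english_originals    = [44.9, 45.6, 43.8, 47.1, 44.2]
    italian_translations = [48.3, 49.5, 47.9, 50.2, 47.8]
    italian_originals    = [47.6, 48.9, 48.1, 49.3, 47.4]

    # Omnibus one-way ANOVA: does language condition affect STTR?
    f_stat, p_value = f_oneway(english_originals, italian_translations, italian_originals)
    print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

    # Tukey HSD post hoc comparisons on the pooled scores
    scores = np.array(english_originals + italian_translations + italian_originals)
    groups = ["EO"] * 5 + ["IT"] * 5 + ["IO"] * 5
    print(pairwise_tukeyhsd(scores, groups, alpha=0.05))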
3.4.1.2 Sentence Number
A one-way between subjects ANOVA was conducted to compare the effect of language
(IV) on sentence number (DV) under three conditions: English Originals, Italian
Translations, and Italian Originals. There was a significant effect of language on
sentence number at the p < 0.05 level for the three conditions [F (2, 42) = 5.1666; p =
0.010]. Post hoc comparisons using the Tukey HSD test indicated that the mean score for
the English Originals condition (M = 151.40; SD = 24.41) was significantly different
from the mean score for the Italian Originals Condition (M = 111.73; SD = 49.12).
However, there was no significant difference either between the mean score for the
English Originals condition (M = 151.40; SD = 24.41) and the mean score for the Italian
Translations condition (M = 137.87; SD = 23.10) or between the mean score for the
Italian Translations condition (M = 137.87; SD = 23.10) and the mean score for the
Italian Originals condition (M = 111.73; SD = 49.12).
3.4.1.3 Lexical Density
A one-way between subjects ANOVA was conducted to compare the effect of language
(IV) on lexical density (DV) under three conditions: English Originals, Italian
Translations, and Italian Originals. There was a significant effect of language on lexical
density at the p < 0.05 level for the three conditions [F (2, 42) = 7.696; p = 0.001]. Post
hoc comparisons using the Tukey HSD test indicated that the mean score for the English
Originals condition (M = 25.59; SD = 2.42) was significantly different from the mean
score for the Italian Translations condition (M = 29.49; SD = 3.17) and from the mean
score for the Italian Originals condition (M = 29.76; SD = 3.99). However, there was no
significant difference between the mean score for the Italian Translations condition (M =
29.49; SD = 3.17) and the mean score for the Italian Originals condition (M = 29.76; SD
= 3.99).
3.4.1.4 Average Sentence Length
A one-way between subjects ANOVA was conducted to compare the effect of language
(IV) on average sentence length (DV) in three conditions: English Originals, Italian
Translations, and Italian Originals. There was a significant effect of language on ASL at
the p < 0.05 level for the three conditions [F (2, 42) = 13.755; p < 0.001]. Post hoc
comparisons using the Tukey HSD test indicated that the mean score for the English
Originals condition (M = 22.53; SD = 2.78) was significantly different from the mean
score for the Italian Originals condition (M = 30.12; SD = 4.95). However, there was no
significant difference between the mean score for the English Originals condition (M =
22.53; SD = 2.78) and the mean score for the Italian Translations condition (M = 25.99;
SD = 3.87). The Tukey HSD test also indicated that the mean score for the Italian
Translations condition (M = 25.99; SD = 3.87) was significantly different from the mean
score for the Italian Originals condition (M = 30.12; SD = 4.95).
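The two sentence-level measures analyzed above can be computed from raw text as in the
sketch below, which assumes NLTK's punkt sentence tokenizer and a simple
alphabetic-token definition of a word; the study's own segmentation may differ in detail:

    import nltk
    nltk.download("punkt", quiet=True)  # sentence tokenizer models
    from nltk.tokenize import sent_tokenize, word_tokenize

    def sentence_stats(text, language="english"):
        # Sentence number and average sentence length (in words) for one text.
        sentences = sent_tokenize(text, language=language)
        words = [t for t in word_tokenize(text) if t.isalpha()]
        asl = len(words) / len(sentences) if sentences else 0.0
        return len(sentences), asl

    n_sent, asl = sentence_stats("This is one sentence. Here is a second, slightly longer one.")
    print(n_sent, round(asl, 2))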
3.4.2 SPSS Statistical Analysis: Semantic Features
A one-way between subjects ANOVA was conducted to compare the effect of language
(independent variable) on the amount of use of semantic categories (dependent variables),
also known as lexical cohesive devices, in three conditions: English Originals, Italian
Translations, and Italian Originals.
Table 3.41 – Parallel & Comparable Corpus Statistical Data

Semantic     English Source Text      Italian Target Text      Italian Originals
Category     Sub-Corpus               Sub-Corpus               Sub-Corpus
Repetition   M = 59.52; SD = 13.41    M = 58.61; SD = 15.68    M = 45.58; SD = 8.09
Synonym      M = 11.34; SD = 10.30    M = 12.69; SD = 12.35    M = 10.83; SD = 5.91
Antonym      M = 1.03; SD = 2.02      M = 1.06; SD = 2.04      M = 0.50; SD = 1.19
Meronym      M = 3.48; SD = 3.92      M = 3.54; SD = 4.04      M = 8.89; SD = 9.29
Holonym      M = 2.31; SD = 3.56      M = 2.47; SD = 3.89      M = 0.55; SD = 1.52
Hypernym     M = 9.79; SD = 7.93      M = 10.11; SD = 8.07     M = 15.06; SD = 8.14
Hyponym      M = 12.57; SD = 9.76     M = 11.51; SD = 10.29    M = 18.60; SD = 10.62
3.4.2.1 Repetition
There was a significant effect of language (IV) on repetition (DV) at the p < 0.05 level
for the three conditions [ F (2, 42) = 5.576; p = 0.007].
Post hoc comparisons using the Tukey HSD test indicated that the mean score for
the English Originals condition (M = 59.52, SD = 13.41) was significantly different from
the Italian Originals condition (M = 45.58, SD = 8.09). The Tukey HSD test also
revealed that the mean score for the Italian Translations condition (M = 58.61, SD =
15.68) was significantly different from the Italian Originals Condition (M = 45.58, SD =
8.09). There was no significant difference between the mean score for the English
Originals condition and the mean score for the Italian Translations condition (p > 0.05).
3.4.2.2 Synonym
There was not a significant effect of language (IV) on synonym (DV) for the three
conditions [F (2, 42) = 0.142; p = 0.87].
3.4.2.3 Antonym
There was not a significant effect of language (IV) on antonym (DV) for the three
conditions [F (2, 42) = 0.47; p = 0.630].
3.4.2.4 Meronym
There was a significant effect of language (IV) on meronym (DV) for the three
conditions [F (2, 42) = 3.679; p = 0.034]. Post hoc comparisons using the Tukey HSD
test revealed that the mean score for the English Originals condition (M = 3.48; SD =
3.92) was significantly different from the Italian Originals Condition (M = 8.89; SD =
9.29). However, there was no significant difference between the mean score of the Italian
Translations condition and the mean score of the Italian Originals condition nor between
the mean score of the English Originals condition and the mean score of the Italian
Translations condition (p > 0.05).
3.4.2.5 Holonym
There was not a significant effect of language (IV) on holonym (DV) for the three
conditions [F (2, 42) = 1.7; p = 0.196].
3.4.2.6 Hypernym
There was not a significant effect of language (IV) on hypernym (DV) for the three
conditions [F (2, 42) = 2.02; p = 0.149].
3.4.2.7 Hyponym
There was not a significant effect of language (IV) on hyponym (DV) for the three
conditions [F (2, 42) = 2.09; p = 0.139].
3.4.2.8 Repetition vs. Rest of Semantic Categories
Since the mean scores for the rest of the semantic categories did not turn out to be
statistically significant, with the exception of meronym, it was decided to group the other
semantic categories together and to determine whether there is a statistically significant
difference in mean values among the three conditions when synonyms, antonyms,
meronyms, holonyms, hypernyms and hyponyms are considered as one semantic category
instead of separate ones.
A one-way between subjects ANOVA was conducted to compare the effect of language
(IV) on the use of the other semantic categories (DV) as a whole in three conditions:
English Originals, Italian Translations and Italian Originals. The table below shows the
mean values and standard deviations for each condition:
Table 3.42 – Parallel & Comparable Corpus Statistical Means

Semantic          English Source Text      Italian Target Text      Italian Originals
Category          Sub-Corpus               Sub-Corpus               Sub-Corpus
Repetition        M = 59.52; SD = 13.41    M = 58.61; SD = 15.68    M = 45.58; SD = 8.09
Other Semantic
Categories        M = 40.52; SD = 13.43    M = 41.39; SD = 15.68    M = 54.42; SD = 8.09
3.4.2.9 Other Semantic Categories as a Whole
There was a significant effect of language (IV) on the amount of use of the other
semantic categories as a whole for the three conditions [F (2, 42) = 5.55; p = 0.007]. Post
hoc comparisons using the Tukey HSD test indicated that the mean score for the English
Originals condition (M = 40.52; SD = 13.43) was significantly different from the mean
score for the Italian Originals condition (M = 54.42; SD = 8.09). It also revealed that
there was a statistically significant difference between the mean score of Italian
Translations (M= 41.39; SD = 15.68) and the mean score of the Italian Originals (M =
54.42; SD = 8.09). However, there was no statistically significant difference between the
mean score for the English Originals condition and the mean score for the Italian
Translations condition.
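A minimal sketch of this grouping step is given below; the per-text dictionaries are
placeholders (three per group instead of the study's fifteen), and the omnibus ANOVA is
again run with scipy:

    from scipy.stats import f_oneway

    KEYS = ("synonym", "antonym", "meronym", "holonym", "hypernym", "hyponym")

    def other_total(scores):
        # Collapse the six non-repetition categories into one percentage per text.
        return sum(scores[k] for k in KEYS)

    # Placeholder per-text percentages (the study has fifteen texts per group).
    eo_texts = [dict(zip(KEYS, v)) for v in ([11, 1, 3, 2, 10, 12],
                                             [12, 1, 4, 3, 9, 13],
                                             [10, 1, 3, 2, 11, 11])]
    it_texts = [dict(zip(KEYS, v)) for v in ([13, 1, 4, 2, 10, 11],
                                             [12, 1, 3, 3, 10, 12],
                                             [13, 1, 4, 2, 11, 10])]
    io_texts = [dict(zip(KEYS, v)) for v in ([11, 0, 9, 1, 15, 19],
                                             [10, 1, 8, 0, 16, 18],
                                             [11, 0, 9, 1, 14, 19])]

    print(f_oneway([other_total(t) for t in eo_texts],
                   [other_total(t) for t in it_texts],
                   [other_total(t) for t in io_texts]))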
CHAPTER IV
DISCUSSIONS
4.1 Introduction
The findings from both the textual and semantic analysis tend to confirm the hypotheses
stated in the introduction and in chapter one of this work. The findings presented in the
results chapter suggest that overall there is a significant difference in the use of lexical
cohesive devices in English and Italian. In particular, the findings confirm hypothesis 1,
which claims that Italian translations tend to adopt the lexical cohesive devices of their
attendant English source texts, and hypothesis 2, which claims that articles originally
written in Italian and published in Le Scienze differ in the use of lexical cohesive devices
from Italian translations published in the same magazine. Both the textual and semantic
features under analysis point to an overall stylistic, syntactic and lexical difference
between the two languages in question. The fact that, at the sentential level, there is a
statistical difference in average sentence length between Italian Translations and Italian
originals supports the semantic analysis findings in that longer sentences in the
translations would have an impact on the use of lexical devices. Indeed, by merging two
or more sentences into a bigger, more syntactically complex one, the use of repetition,
which is not employed as often in Italian as it is in English, could be reduced by resorting
to other lexical or non-lexical cohesive devices such as references, substitutions, ellipsis,
etc.
The following results will be discussed in the light of the findings by Scarpa and
of the universals of translation by Mona Baker. In the latter respect, Baker defines
universals of translation as linguistic features that occur in translated texts rather than
original texts regardless of the source or target language involved in the translation
process (1993: 243). She identifies four major universals in translation which are said to
be valid across languages; they are explicitation, simplification, normalization and
leveling out.
As far as simplification is concerned, three types exist: lexical, syntactic and
stylistic. Blum-Kulka and Levenston (1983) define lexical simplification as “the process
and/or result of making do with less words (1983: 119).” Lexical simplification is
achieved by means of:
1) Use of superordinates, when no equivalent hyponyms are available in the target
language (this translation strategy was investigated by Baker [1992]);
2) Concept approximation, which is the case with culture-bound items;
3) Use of circumlocutions instead of equivalent high-level words (this translation
strategy was investigated by Vanderauwera [1985: 102-3] who notices a use of
colloquial/modern synonyms when translating old, formal and high-level source
language words);
4) Paraphrasing, to make up for cultural gaps existing between any two cultures.
In terms of simplification features, Laviosa (1998) identifies four core patterns of lexical
use in the English Comparable Corpus (1998: 565), one of which is illustrated
computationally after the list:
a) In translated texts, lexical density is generally lower because the percentage of
content words is relatively lower than that of grammatical words;
b) The ratio of high frequency words versus lower frequency words is relatively
higher in translated texts;
c) The most frequent words are repeated more often;
d) Translations contain fewer lemmas.
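By way of illustration, pattern (b) can be operationalized roughly as the share of running
words accounted for by the most frequent word forms; the sketch below does so in
Python, with the cut-off n and the whitespace tokenization being assumptions of this
sketch rather than Laviosa's procedure:

    from collections import Counter

    def head_coverage(tokens, n=10):
        # Share of all tokens accounted for by the n most frequent word forms.
        freq = Counter(tokens)
        top = sum(count for _, count in freq.most_common(n))
        return top / len(tokens) * 100

    tokens = ("the gene helps the farmer and the gene spreads " * 50).split()
    print(round(head_coverage(tokens, n=3), 2))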
At the syntactic level, Vanderauwera (1985) finds several cases in which complex
syntactic structures are simplified by changing non-finite clauses into finite ones. She
also provides evidence for stylistic simplification which is achieved by breaking down
long sentences, reducing or omitting repetitions or redundant information.
As for explicitation, Vinay and Darbelnet (1958) carried out a comparative study between
French and English and defined explicitation as “the process of introducing information
into the target language which is present only implicitly in the source language, but
which can be derived from the context or the situation (23).” Likewise, Baker (1996)
describes explicitation as the tendency “to spell things out rather than leave them implicit
(180).” An example of explicitation is provided by Blum-Kulka (1986), who speaks of
cohesive explicitness, whereby she refers to shifts in the type of cohesive markers that are
used in a text. These shifts are achieved through the replacement of substitution or
ellipsis with repetitions or use of synonyms, which increases the level of cohesion in the
target text. Factors that might explain this phenomenon are stylistic preferences,
systematic differences, or culture-bound translation norms.
Normalization involves the unconscious or conscious use of textual features that
make a target text comply with the typical textual characteristics of the target
language/culture. Baker defines normalization as the “tendency to exaggerate features of
the target language to conform to its typical patterns (1996: 183).” An example of
normalization is when creative lexis is normalized in translations or when typical
collocations are preferred over unusual ones.
Last but not least, leveling out, in Baker’s words, or convergence as Laviosa
(2002) calls it, refers to “the tendency of translated text to gravitate towards the centre of
a continuum (1996: 184).” This tendency is also known as convergence because it
reflects “the relatively higher level of homogeneity of translated texts with regard to their
own scores on given measures of universal features (Laviosa 2002: 72).”
These four universals of translation were empirically studied by Laviosa-Braithwaite
(1996) using corpus linguistics tools. In this respect, Laviosa argues that corpus-based
techniques have great potential for meeting the need for a rigorous descriptive
methodology (156). In her work, however, the empirical study of translation phenomena
is carried out solely by means of corpus linguistic tools, which, though they provide some
statistical data, do not tell the translation scholar whether or not there is statistical
significance in his or her findings. The present study, by contrast, sets out to put forward
a new methodology which combines the quantitative data provided by such corpus tools
with statistical analysis software such as SPSS, which can objectively establish whether
or not the findings identified in the analysis, and hence the hypotheses put forward at an
early stage of the study, are statistically significant.
Lastly, the choice of using parallel and comparable corpora is, like this whole
study, grounded in theory. In this respect, it is Baker who states that by shifting the focus
of translation studies research from comparing source and target texts or languages to
comparing text production per se with translation, translation scholars are able to
“explore how text produced in relative freedom from an individual script in another
language differs from text produced under the normal conditions that pertain in
translation (1995).” This is why in the present study the results gleaned from the textual
and lexical analysis of the English into Italian parallel corpus were compared with those
from the textual and lexical analysis of the Italian comparable corpus. By so doing, it was
possible to identify and compare text production patterns pertaining to Italian
translationese and text production patterns pertaining to Italian in a non-translation
context.
4.2 Textual Features
Average Sentence Length
The SPSS analysis of the textual features shows that Italian translations reproduce the
syntactic features of the English source texts as far as sentence length is concerned.
Indeed, post hoc comparisons using the Tukey test indicate that the p value between the
mean score for the English source text sub-corpus and the Italian target text sub-corpus is
greater than 0.05. It follows that there is no statistically significant difference in terms of
average sentence length between the two sub-corpora. This finding does not support the
syntactic Simplification hypothesis which is one of the universals of translation identified
by Mona Baker, according to which translations may have shorter sentences; this finding
also contradicts the Normalization hypothesis according to which translations should
comply with the textual characteristics of the target language. However, this finding
confirms what Scarpa argues in her article “Corpus-based Quality-Assessment of
Specialist Translation: A Study Using Parallel and Comparable Corpora in English and
Italian” in which she finds that sentences were actually longer in translations (2006: 169).
However, she does not provide any statistical evidence for that finding. As demonstrated
herein, though the sentences tend to be longer in the translated texts, statistically
speaking, the difference in mean score between the English source text sub-corpus and
the Italian target text sub-corpus is not significant. On the other hand, the SPSS analysis
shows that there is a statistically significant difference in average sentence length
between the Italian Translations sub-corpus and the Italian Originals sub-corpus. The p
value between the two means is 0.018, which points to a statistically significant difference. The
mean value of the translations is closer to the mean value of the English source texts than
it is to the mean value of the Italian originals. This implies that, syntactically speaking,
translations mirror the source-language syntax, since English prefers coordination
(parataxis), unlike Italian, which prefers subordination (hypotaxis).
Sentence Number
The SPSS analysis of this textual feature shows that Italian translations as a whole have
a number of sentences similar to that of their attendant English source texts. Indeed, the
p value between the mean score for the English Originals sub-corpus and the Italian
translations sub-corpus is greater than 0.05, which points to a non-significant difference in
mean score. This finding contradicts what Scarpa’s translation quality-assessment study
reports. Indeed, she states that the average number of sentences in the Italian translated
texts was found to have been reduced “somewhat more drastically” (2006: 170), but no
statistical evidence was provided to prove her statement. This finding also, statistically
speaking, rejects the Simplification hypothesis mentioned above, in that the latter predicts
a lower number of sentences in translation. Though this is somewhat true in that most of
the target texts generally have a lower number of sentences than their attendant source
texts, this difference is not such as to make it statistically significant.
Another interesting finding relates to the difference in sentence number between
the Italian Translations and the Italian Originals. My second hypothesis claims that there
should be a difference between the latter two sub-corpora. Though the number of
sentences is overall lower in the Italian Originals sub-corpus than in the English Originals
and Italian translations sub-corpora, statistically speaking, the SPSS analysis indicates
that there is no significant difference between Italian translated texts and texts originally
written in Italian. By contrast, there is a statistically significant difference between
English and Italian originals (p = 0.008). This result would seem to be in contrast with
the finding about average sentence length. Indeed, since Italian Originals have a
significantly longer average sentence length than the Italian Translations, one would
expect a higher sentence number for the Translations. This would have been the case if
twenty-three sentences had not been omitted in some of the translations. Indeed, were the
twenty-three sentences to
be added to the Italian Translations sub-corpus then there would be a statistically
significant difference in sentence numbers between Translations and Italian Originals, in
that the sentence number would be almost identical to that of the English Originals subcorpus, and this would hence support my second hypothesis. The omissions of these
sentences in the translation process can be classified as examples of syntactic and stylistic
simplification, as put forward by Vanderauwera (1985), who argues that translations are
usually syntactically simplified through the omission of long circumlocutions or
irrelevant details and redundant information. Indeed, the sentences which were omitted in
the translations most of the time provided further information either between brackets or
as part of the main text. The following are some examples:
Sentence 67 from “Next Stretch for Plastic Electronics” reads as follows: “(the channel is
where an electric current flows through a transistor – or not – and where the switching
action takes place);”
Sentence 142 from “The Color of Plants on Other Worlds” is in brackets and reads as
follows “(this is one of the models enlisted to calculate how much light reaches the solar
panels of the Mars rovers);” or
Sentence 113 from the same text which reads “The photons work together like stages of a
rocket to provide the necessary energy to an electron as it performs the chemical
reactions.”
In all the above-mentioned examples, the information omitted in the translation process is
additional and gives details which can be left out without compromising the overall
understanding and/or coherence of the text.
STTR
The SPSS analysis of this textual feature shows that Italian translations differ from their
English source texts in terms of vocabulary variation. This means that in translations
there is a greater variation in the use of vocabulary because Italian prefers lexical variety.
This finding supports the Normalization hypothesis, according to which target texts tend
to adapt the style and sentence structure of the source texts to the stylistic and syntactic
conventions of the target language. This finding also supports Scarpa’s study. She notices
that higher quality translations are associated with a higher type-token ratio (170).
However, she does not provide any statistical evidence in this respect. The finding of my
study statistically proves that there is a significant difference (p value < 0.05) between the
mean score for the English Originals and the mean score for the Italian translations. The
SPSS analysis also shows that there is no statistically significant difference between
Italian Translations and Italian Originals (p > 0.05). Indeed, the mean values for the
comparable corpus are almost identical, which means that Italian Translations are closer,
stylistically speaking, to the target language, namely Italian.
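As a reminder of how this measure is obtained, the sketch below computes a standardized
type-token ratio by averaging the TTR over successive fixed-size chunks; the 1,000-word
chunk size follows WordSmith's usual convention, though the exact setting used in the
study is an assumption here:

    def sttr(tokens, chunk=1000):
        # Average type-token ratio over successive non-overlapping chunks,
        # expressed as a percentage; trailing partial chunks are ignored.
        ratios = []
        for i in range(0, len(tokens) - chunk + 1, chunk):
            window = tokens[i:i + chunk]
            ratios.append(len(set(window)) / chunk * 100)
        return sum(ratios) / len(ratios) if ratios else 0.0

    tokens = ("the cat sat on the mat and the dog slept " * 300).split()
    print(round(sttr(tokens), 2))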
Lexical Density
The SPSS analysis shows that there is a statistically significant difference in terms of
lexical density, which, as a reminder, is the ratio between content words and the total
number of words in a text, between the English originals and their Italian translations.
Indeed, the p value between the mean score for the English Originals sub-corpus and the
Italian Translations sub-corpus was less than 0.05 (p = 0.006). It follows that translations
were found to be more lexically dense (in terms of content words) than their source
texts. This finding contradicts Scarpa’s study in which translations were found to have a
lower lexical density (2006: 170). However, as noted above, in calculating lexical
density, she only focused on the first one-hundred high frequency words, whereas in the
present study lexical density was calculated out of the total number of content words after
taking out grammar words through a stop list.
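A minimal sketch of that calculation is given below; the stop list shown is a tiny
illustrative sample, not the list actually loaded into WordSmith for the study:

    # Tiny illustrative stop list of grammatical words (the study used a
    # much fuller list loaded into WordSmith Tools).
    STOP = {"the", "a", "an", "of", "in", "on", "and", "or", "to", "is", "are", "it"}

    def lexical_density(tokens):
        # Content words (tokens not in the stop list) over total tokens, as a percentage.
        content = [t for t in tokens if t.lower() not in STOP]
        return len(content) / len(tokens) * 100 if tokens else 0.0

    sample = "the ratio of content words to the total number of words in a text".split()
    print(round(lexical_density(sample), 2))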
As far as universals of translation are concerned, this finding contradicts the
Simplification hypothesis according to which translations should have a lower
information load as a result of a higher use of lexico-grammatical relations and repetition.
The SPSS analysis also shows that there is no statistically significant difference in the
amount of information load between Italian Translations and Italian Originals. Indeed,
the mean values for both sub-corpora are almost identical: 29.49 for the Italian
translations sub-corpus and 29.76 for the Italian Originals sub-corpus, which significantly
differ from the 25.59 mean score of the English Originals sub-corpus. This implies that,
generally speaking, the Italian language has a higher lexical density, which means that
there is a greater use of different content words.
4.3 Semantic Features
4.3.1 Repetition
The SPSS analysis shows that there is no statistically significant difference in mean score
between the English Originals and the Italian Translations. In other words, translators
tend to reproduce this lexical cohesive device in the target text instead of replacing it with
other lexical or non-lexical cohesive devices. Indeed, stylistic conventions in Italian
recommend the avoidance of repetition in favor of a greater use of lexical variety
(Musacchio, 2007: 179). This stylistic feature of the Italian language is statistically
proved by the comparable corpus consisting of Italian Translations and Italian Originals,
in which the difference in mean scores between these two sub-corpora was statistically
significant (p = 0.021). On the whole, the Italian Originals corpus had a lower use of
repetitions than Italian translations and this finding is similar to Scarpa’s (171). In
particular, this finding contradicts the Explicitation hypothesis, according to which
translations tend to have a higher number of repetitions (155), and the Normalization
hypothesis, according to which, in the transfer from the ST to the TT, the ST style and
sentence structure are adapted to the textual features of the target language (Scarpa, 2006:
156). Though the repetition mean score for the translations is slightly lower than the
mean score for English Originals, the difference is not statistically significant. A
comparison of each single pair of source and target texts shows that there is a major
difference in the use of repetition only in three cases:
1) “The Iceman Reconsidered;”
2) “Shaping the Future” and;
3) “Sowing a Gene Revolution.”
The rest of the target texts tend, overall, to reproduce this lexical cohesive device. In the
first pair of source and target texts, namely “The Iceman Reconsidered,” the percentage
of use of repetition in the source text is 33.85 % as opposed to just 16.57 % of the target
text. Here it is possible to state that normalization was performed at the stylistic level by
replacing repetition with other lexical cohesive devices. The following are some
examples:
The term evidence is translated only once as reperti; in the other sentences it is
rendered by means of several hypernyms such as elementi, dati, testimonianze, and
segni. The verb find is translated as rinvenire three times; in the rest of the text the same
verb is rendered by synonyms in seven instances (ritrovare, trovare, scoprire, scoperta)
and by hypernyms in two instances (individuare, ricavare), and in two cases it was omitted.
In the second pair of source and target texts, namely “Shaping the Future”, the
percentage of use of repetition devices in the English source text is 88.48 % as opposed
to 82.76 % of the Italian translation. Lexically speaking, repetition was replaced with
synonyms or hypernyms. In a few cases, there were omissions or use of pronouns. For
example, the English term cost, which occurs fourteen times in the source text, was
translated ten times as costi, three times as spese (synonym), and in one case it was
omitted.
Lastly, in “Sowing a Gene Revolution”, the percentage of use of repetitions in the
source text amounts to 59.59 % compared to 50.86 % of its Italian translation. Some of
the lexical cohesive devices which were employed to limit redundancy in the repetition of
the same term were hypernyms and synonyms. In other cases, repetition was avoided
through omissions or use of demonstratives or pronouns. For example, the English term
farmers, which occurs thirty-three times in the source text, is translated fourteen times as
contadino, eighteen times as agricoltore or coltivatore (synonym), and in just one case it
is omitted.
As far as the comparable corpus is concerned, the SPSS analysis indicates that
there is a statistically significant difference between the mean score for the Italian
Translations sub-corpus and the Italian Originals sub-corpus. Indeed, the p value between
these two groups is less than 0.05, more precisely 0.021. By looking at the mean values
for both sub-corpora which are 58.61 % for the Italian translations and 45.58 % for the
Italian Originals, it is possible to state that texts which are originally written in Italian
tend to use a lower amount of repetitions as opposed to Italian translationese. Thus the
presence of lexical redundancy in Italian translations published in the same magazine as
the Italian originals fails to meet the target readership’s expectations.
4.3.2 Synonyms
The SPSS analysis indicates that statistically speaking there is no difference in mean
score either between the texts making up the parallel corpus or between the texts making
up the comparable corpus. Indeed, the p value between the respective groups is greater
than 0.05 which points to a lack of statistical significance. Indeed, the mean values for the
three groups of texts are very close to each other. The English originals sub-corpus has a
mean value of 11.34 %, the Italian translations sub-corpus has a mean value of 12.69 %,
whereas the Italian originals sub-corpus has a mean value of 10.83 %.
Statistically speaking, the general trend found in the whole corpus does not
support the lexical simplification hypothesis put forward by Blum-Kulka and Levenston
(1983), according to which translations make a greater use of superordinates and
synonyms. However, if each pair of texts from the parallel corpus is taken into
consideration, one can notice that in four out of fifteen cases there is a greater use of
synonyms in the translated texts. The texts in question are:
1) “The Iceman reconsidered;”
2) “Sowing a Gene Revolution;”
3) “Shaping the Future” and;
4) “Intrigue at the Immune System.”
In the above-mentioned texts, repetition is replaced by synonyms in fifteen, twenty-one,
nine and twenty-one cases respectively. For example, in “Sowing a Gene Revolution”, the
English term farmers is first translated as contadini (fourteen times), but in the rest of
the text the same term is translated by means of a synonym, namely agricoltori or
coltivatori. In these texts, there was an attempt to limit redundancy and adapt the style of
the source language to that of the target language by resorting to synonyms. However, as
can be deduced from the analysis of the Italian Originals documents, synonyms are not
the main lexical cohesive device that can be used to achieve that goal.
4.3.3 Meronyms, Holonyms, Hypernyms, Hyponyms, Antonyms
The SPSS analysis shows that there is no statistically significant difference in terms of
use of meronyms either between the source and target texts or between the Italian
Translations and Italian Originals as a whole. By contrast, there was a statistically
significant difference between English Originals and Italian Originals. However, since the percentage of use of
this semantic category is very low compared to the use of synonyms or repetitions in both
the parallel and comparable corpora due to the tendency to resort to other cohesive
devices, it is not possible to state that the use of meronyms in translations is different
from the use of meronyms in texts that are originally written in Italian because this would
mean overgeneralizing the results, which, considering the size of the corpus and its
restriction to a specific text-type, would not be warranted. The same comment applies to the
other less commonly used semantic categories which were analyzed in this study such as
holonyms, antonyms, hypernyms and hyponyms. For all of these semantic categories,
when each is considered individually, the SPSS analysis showed that there was no
statistically significant difference in their use in either corpus. However, if all of them are
grouped together, including synonyms, and are considered as a whole in opposition to
repetitions, the findings, as shown in the section below, turn out to be significant.
4.3.4 Semantic Categories Other than Repetitions as a Whole
Interestingly enough, when the percentage of use of synonyms, antonyms, meronyms,
holonyms, hypernyms, and hyponyms is summed up and considered as one semantic
category apart from repetitions, important conclusions can be drawn. In this respect, the
SPSS analysis shows that there is no statistically significant difference (p = 0.981)
between the source and target text sub-corpora. This means that overall target texts resort
to semantic categories other than repetition in the same way and in the same amount as
their source texts. If this finding is considered together with the one concerning
repetition, then it is evident that translations tend to resort to the same semantic
categories in so far as there is no statistically significant difference in the use of
repetitions or the remainder of the semantic categories.
What needs to be pointed out here is that, statistically speaking, English Originals and
Italian Translations differ neither in the amount of use of repetitions nor in the use of
semantic categories other than repetitions when the latter are considered
as a whole. This implies that translations tend to reproduce the semantic categories of
their source texts, disregarding the stylistic preferences of the Italian language, which
makes a lower use of repetitions.
Another important finding concerns the difference in the use of repetitions and the
rest of the semantic categories between Italian Translations and Italian Originals as well
as between English Originals and Italian Originals. In this respect, the SPSS analysis
shows that there is a statistically significant difference in mean score between texts
originally written in English and the ones originally written in Italian. The statistical
significance of this finding points to different stylistic preferences in the two languages
under analysis when it comes to using repetitions. Generally speaking, English Originals
have a far higher percentage of use of repetitions compared to Italian Originals. The p
value between the two mean scores is 0.013, which points to a significant difference in
the use of repetitions. The same is true of the p value between the mean scores for the
other semantic categories when considered all together: the p value is likewise 0.013,
which implies that, unlike English, Italian prefers to avoid an overuse of repetition by resorting
to other semantic categories thus making texts more lexically varied. This finding is
confirmed by a significant difference in lexical density and standardized type-token ratio
between the mean score for the English Originals sub-corpus and the Italian Originals
sub-corpus with a p value of 0.003 and 0.023 respectively.
Though the SPSS analysis did not show any statistically significant difference in
terms of mean scores for either the parallel or the comparable corpus, with the only
exception of meronyms, overall Italian Originals make a larger use of meronyms (which
was statistically proven) and, above all, of hyponyms and hypernyms. The same is not
true of the Italian Translations, where the mean scores for hyponyms and hypernyms are
almost identical to those of their source texts. As noted above, this finding contradicts the
lexical simplification hypothesis whereby translations make a larger use of
superordinates. This result is closely connected with the repetition mean score of the
Italian Translations sub-corpus, which is very close to that of the English Originals
sub-corpus. Out of a very large
number of repetition devices that were computed and analyzed in the English Originals
sub-corpus, only a very small number of them were replaced by hyponyms or hypernyms.
To be more precise, the total number of repetition devices that were computed in the
English Originals sub-corpus amounts to 2,627, twenty-eight of which were rendered as
hyponyms and fifty-eight as hypernyms. The following are some examples:
Cases in which repetition was replaced by a hyponym:
Ex. 1. Gene (ST) > Allele (gene sequence variation) (TT)
Ex. 2. Company (ST) > Celera (name of the company) (TT)
Ex. 3. Response (ST) > Reaction (TT)
Cases in which repetition was replaced by a hypernym:
Ex. 1. Customers (ST) > Consumatori (TT)
Ex. 2. Microsatellites (ST) > Strutture satellitari (TT)
Ex. 3. Corpse (ST) > Corpo (TT), meaning body, instead of cadavere.
Interestingly enough, there were also cases (21 in total) in which hyponyms were
translated by means of hypernyms:
Ex. 1. Blue (ST) > Banda (TT)
Ex. 2. Human cloning (ST) > Tecnica (TT)
Ex. 3. Query (ST) > Ricerca (TT)
Apart from these very few instances in which repetition was replaced by hyponyms and
hypernyms, other translation techniques for avoiding repetition which were identified
during the contrastive textual/lexical analysis of the English source texts and their Italian
translations were omissions (258 instances), synonyms (104 instances), substitution (7
instances), and reference, mainly pronouns (38 instances). Overall, the general translation
approach to repetition found in the translated texts was that of maintaining the
same cohesive device unchanged. Investigating the reasons behind these choices is not
within the scope of this study; what needs to be pointed out is that because of these
choices, the translations tend to reflect the stylistic, lexical and syntactic preferences of
the source language. By so doing, the target readership’s expectations are not met and
this ultimately compromises the readability of these texts. At the initial stage of this
dissertation, one of the objectives was to carry out an experiment whereby readers’
expectations would be tested. However, because of several factors (mainly time
constraints and availability of subjects), this second part of the analysis was not carried
out. The subjects needed to be Italian native speakers living in Italy. Their task was to
assess the fluency of texts (half Italian translations and half Italian originals) in terms of
style, lexis and syntax and then classify them either as translations or as non-translated
texts. Distance from the subject recruitment location, along with the travel- and subjectrelated expenses that such experiment would entail made me abandon the project, which I
hope could still be carried out as a follow-up study.
CHAPTER V
CONCLUSIONS
5.1 Introduction
The analysis and discussion of the results presented in chapters three and four support the
two hypotheses put forward in this study. Hypothesis 1 claims that English source texts
and their attendant Italian translations are similar in terms of use and amount of lexical
cohesive devices, whereas hypothesis 2 claims that Italian translations and Italian
originals differ in the use and amount of lexical cohesive devices.
As for hypothesis 1, the statistical analysis of the lexical cohesive relations in the
parallel corpus (consisting of English Originals and Italian Translations), which was
carried out by means of SPSS, showed that there was no significant difference in mean
scores between the two sub-corpora for each of the following semantic categories when
taken individually:
1) Repetition
2) Synonymy
3) Meronymy
4) Antonymy
5) Holonymy
6) Hypernymy and
7) Hyponymy.
The SPSS analysis also showed that there was no significant difference in mean scores
between the two sub-corpora when all of the above-mentioned semantic categories, with
the only exception of repetition, were put together and considered as one single semantic
category. Further evidence in support of this hypothesis was the lack of statistically
significant difference in mean scores for both number of sentences and average sentence
length between the English Originals and the Italian Translations.
By contrast, the statistical analysis of the semantic categories in the comparable
corpus, consisting of Italian Translations and Italian Originals, showed that there was a
significant difference only for repetition when each semantic category was taken
individually. However, when all of the remaining semantic categories, namely synonyms,
antonyms, meronyms, holonyms, hypernyms and hyponyms, were put together and
considered as one single semantic category, the SPSS analysis showed that there was a
significant difference in mean scores between Italian Translations and Italian Originals. Further
evidence in support of hypothesis 2 was obtained from the statistical analysis of the
average sentence length which turned out to be significantly different between Italian
Translations and Italian Originals.
These findings statistically prove what other discourse analysis and translation
scholars have, over the past thirty years, theorized in their studies. In this respect, James
argues that “while every language has at its disposal a set of devices for maintaining
textual cohesion, different languages have preferences for certain of these devices and
neglect certain others (1980: 109).” Likewise, Hatim and Mason (1990) state that when
one translates from a source language 1 into a target language 2 , the underlying
coherence (that is to say the semantic relations set up by the cohesive devices) should be
kept invariant in the translated texts. What might need to change are the surface linguistic
elements or devices used to reproduce the source language semantic relations because
these surface elements might be language- or text-type specific.
My analysis focused on scientific texts taken from an American-English
magazine, namely Scientific American, and demonstrated that the articles originally
written in Italian make a lower use of lexical repetition and a greater use of other lexical
cohesive devices, especially hyponyms and hypernyms.
As mentioned above, during the translating process only the underlying semantic
relations should be kept invariant in the translation; what needs to change is instead the
surface structure which is used to establish those relations. Transferring the same surface
linguistic elements into the target text might cause the target readership to find the latter
less coherent. It follows that cohesion, and more specifically, lexical cohesion, is, out of
the seven standards of textuality identified by De Beaugrande and Dressler (1981), very
important to text comprehension. Indeed, research has shown that lexis is one of the main
reasons for comprehension problems (Alderson, 1984; Cassell, 1982). Given that
translating or the process of translation requires successful comprehension of a text, it
follows that being able to identify and transfer lexical cohesive devices is a necessary
translator’s skill if textual equivalence is to be achieved in the target text. In this respect,
Newmark (1988) states that the cohesive level is “a regulator, it secures coherence, it
adjusts emphasis” (1988: 24). At this level, the translator is forced to deal with the values
that are intrinsic to lexis. S/he needs to identify the differences between positive and
negative words, positive and neutral words, and negative and neutral words and then
transfer the same value of those words in the target text (1988: 24). Cohesion, which is
one of the four levels of translating in Newmark’s approach (the other three being the
textual, referential and naturalness levels), is mainly concerned with the structure of texts
and the moods of texts.
By structure, Newmark means the links among sentences or information items,
whereas by mood he means a dialectical factor that helps determine the negative, positive
or neutral meanings of words which need to be kept invariant in the target language
(1988: 23-24). It follows that when choosing a synonym in a target language, translators
have to take into account shades of meaning or what Inkpen and Hirst (2006) call types of
differences in synonyms (224-225). They identify three of them:
1) Denotational differences (synonyms differing in meaning);
2) Attitudinal differences (synonyms differing in connotation); and
3) Stylistic differences (differing in their level of formality or register).
Making students aware of the role that cohesion in general, and lexical cohesion in
particular, play in translation can make the translation process smoother. In
other words, lexical cohesion plays a major role in reading comprehension in that it is
realized through lexis. But reading comprehension is only one of the activated cognitive
processes, which a translator goes through while translating, that involve lexis. Indeed,
the translator then turns from reader into writer and attempts to convey, or rather evoke,
an experience (which is his/her response to and interpretation of the text)
through the meticulous choice of lexical items in another language (Rosenblatt 1989).
Remaining on the subject of comprehension, it is worth pointing out that cohesion
is not the only source of texture or coherence in a text, contrary to what Halliday and
Hasan (1976) claim. In this respect, studies carried out by scholars such as Widdowson (1978),
Carrell (1982), Brown & Yule (1983) show that cohesive devices need not be present in a
text in order for the latter to be coherent. In such cases, the source of coherence is to be
found outside the text, more specifically, in the reader’s prior or background knowledge
and schemata. Whenever cohesive devices are missing between sentences, readers tend to
infer the links based on their interpretation of the illocutionary acts accompanying the
propositions being uttered (Widdowson 1978: 28-30).
Though it does not fall within the scope of this study to report on the cognitive
processes activated by translators when faced with lack of explicit cohesive links, this
phenomenon in translation bears further investigation in order to see to what extent
readers’ schemata can affect their understanding of the semantic relations existing among
sentences in a text, how these mental models can help readers explicitate textual
conceptual gaps and how this whole cognitive process affects the readers’ understanding
of the global meaning of the text to be translated.
As Kostopoulou points out, translation, being an act of speech and
communication, is performed at the level of text and discourse by the translator’s
resorting not only to linguistic but also extra-linguistic devices (2007: 146), therefore
both linguistic and extra-linguistic knowledge needs to be enhanced in translation
trainees or future translators. However, this study focused only on the analysis of explicit
lexical cohesive devices. Therefore, the pedagogical suggestions offered in the section
below will deal primarily with how to recognize explicit lexical cohesive devices and
their semantic relations, and with how to transfer them into the target text thus making
sure to achieve textual equivalence.
5.2 Pedagogical Implications
Several authors have published a number of works on how to teach cohesion. Most of
these studies belong to second language teaching, particularly English as a second
language. In this respect, Lubelska, in her article “An Approach to Teaching Cohesion to
Improve Reading,” deals mainly with how to develop students’ ability to interpret
cohesive devices more effectively. Indeed, as hinted in the previous section, recent
studies on reading comprehension have shown that some learners find it difficult to make
sense of a text because of their failure to interpret the writer’s cohesive signals as
intended (1991: 569). The author focuses only on inter-sentential cohesion, though
cohesion can also occur at the intra-sentential level. The pedagogical activities suggested by
the author are concerned with the teaching of reference and lexical cohesion. As far as
lexical cohesion is concerned, some of the activities for developing an understanding of
what this cohesive device is and how it works may involve giving students short
paragraphs in which semantically related words have been underlined in advance and
then asking them a number of questions such as the following (1991: 592-594):
1) Do the underlined words mean the same thing?
2) What’s the name of words that have similar meanings?
3) Can you identify as many synonyms of word A as possible?
4) Write down the first ten words you think of when reading word A.
These questions are aimed at developing a student’s cognitive thinking through discovery
procedures. Since synonymy is a slippery concept, not all of the words that students
circle or identify are likely to belong to this semantic category. Some of the words
they might think are synonyms may very well belong to other semantic categories such as
hyponymy or hypernymy. Therefore, this activity gives the foreign language instructor
the opportunity to have students reflect upon the differences existing between the
above-mentioned categories from a semantic point of view.
Likewise, Ian McGee argues in his article “Traversing the Lexical Cohesion
Minefield” that repetition is a common way of achieving lexical cohesion in scientific
texts. However, students sometimes tend to overuse this cohesive device, thus
compromising the reader’s understanding of the text itself (2009: 213). He offers a few
suggestions on how to teach students to avoid overusing this cohesive category.
However, his approach to teaching cohesion falls within the framework of second
language acquisition. Indeed, one of the reasons for this overuse or abuse that he
mentions is L1 interference. In other words, foreign language learners may transfer the
text structure patterns and style preferences which are typical in their L1 into their L2
writing. Though this comment does not necessarily apply to all translators, in that some
of them may only translate into their L1, the article still offers pedagogical suggestions to
language teachers who would like to improve their students’ writing skills in a foreign
language. In this respect, the author argues that an excessive use of lexical repetition can
be avoided by resorting to synonyms. However, choosing and using synonyms
successfully is not an easy task because the semantic, attitudinal and connotative nuances
put forward by Inkpen and Hirst (2006) must be taken into consideration. In this respect,
the author suggests using WordNet which, as described in chapter 2, is a lexical database
wherein nouns, verbs, adjectives and adverbs are grouped into synsets, that is to say, sets
of synonyms. Students may be given altered texts asking them to identify inappropriate
uses of synonyms (McGee 2009: 219). An altered text could be a text in which some of
the synonyms related to key terms have been replaced with other synonyms having, say,
different connotations. For example, there are adjectives that, though semantically related,
have positive, neutral or negative values or “attitudes.” Having students reflect on these
differences helps them become aware of the fact that a wrong lexical choice may
compromise the cohesiveness of the text by changing the writer’s/reader’s stance on the
actor, object or action being described. For instance, a person can be referred to as astute
or sagacious. Though these two adjectives are synonyms, they have different
connotations. Astute has a negative connotation in that the focus is on the person’s use of
his/her cleverness to gain some advantage; sagacious, on the other hand, has a positive
connotation in that it places the focus on the person’s wisdom without implying any
hidden purpose. The above-mentioned activity can help them reflect on the slippery
nature of synonymy.
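Where WordNet is available, candidate synonyms, hypernyms and hyponyms for such
exercises can be pulled programmatically; the sketch below uses NLTK's English
WordNet interface (Italian equivalents, such as MultiWordNet, are separate resources
and beyond this sketch):

    import nltk
    nltk.download("wordnet", quiet=True)
    from nltk.corpus import wordnet as wn

    # List synonym sets for the verb "find", with hypernym and hyponym candidates.
    for synset in wn.synsets("find", pos=wn.VERB)[:3]:
        print(synset.name(), "-", synset.definition())
        print("  synonyms: ", synset.lemma_names())
        print("  hypernyms:", [h.lemma_names() for h in synset.hypernyms()])
        print("  hyponyms: ", [h.lemma_names() for h in synset.hyponyms()[:3]])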
Applied to the translation field, similar pedagogical activities can be carried out in
a translation class. The first objective to be attained in such a class is to make sure
students understand the difference between cohesion and coherence, since there is some
confusion surrounding these two concepts.
Some scholars (Halliday & Hasan [1976]; Schiffrin [1987]) think of cohesion as a
semantic concept. In their view, cohesion refers to the meaning relations to be found
among the surface linguistic elements present in a text. Others (Baker [1992]; Thompson
[1996]) refer to it as surface relations, that is to say lexical and grammatical dependencies
linking words and sentences together (Baker 1992: 218). Halliday and Hasan’s notion of
cohesion is text-bound in that as mentioned above they do not take into consideration the
role played by the reader’s schemata and background knowledge in making sense of the
semantic relations among the sentences of a text.
Baker’s definition, on the other hand, takes into consideration both textual and
extra-textual factors. In her view, which is similar to that held by other scholars such as
De Beaugrande & Dressler (1981), Brown & Yule (1983), and Hatim & Mason (1990),
cohesion relates to the surface relations within a text whereas coherence relates to the
semantic relations, as perceived by the reader, underlying the very same surface linguistic
elements (or cohesive devices).
One way to show the difference between cohesion and coherence is to have
students read two short paragraphs: one in which the interpretation of the propositions,
and hence the illocutionary acts expressed by the latter, requires activating one’s
background knowledge or schemata, and another one in which the relations between the
sentences are made explicit through cohesive devices. This way the translation teacher
can also have students reflect on the helpful but not indispensable role of cohesive devices
in making sense of a text or sets of semantically-related sentences. To exemplify this
concept, Widdowson (1978: 29) gives the following example:
A: That’s the telephone.
B: I’m in the bath.
A: O.K.
Though the above-mentioned three utterances do not have any cohesive devices which
may textually establish a semantic relationship among them, any person who is familiar
with a phone call scenario, can make sense of it by inferring the communicative value
behind these three sentences. Taken together, these utterances are part of a
communicative exchange in which A’s utterance is interpreted as a request, B’s utterance
as a negative response and finally A’s reply as an acceptance of what B says. This stretch
of text is interpreted as coherent because the person who reads it recognizes the
illocutionary act performed by each sentence and can therefore fill in the propositional
gaps which help produce a cohesive conversational exchange which reads as follows
(1978: 29):
A: That’s the telephone. (Can you answer it, please?)
B: (No, I can’t answer it because) I’m in the bath.
A: O.K. (I’ll answer it).
It is because of the illocutionary value we give to the propositions we hear or read that we
are able to recover the propositional link(s) that are missing and that enable us to make
sense of the written or spoken discourse (1978: 31).
By having students work on short extracts taken from texts or dialogues, the
teacher helps them recognize and understand not only the difference between cohesion
and coherence but also the role that a reader’s schemata or background knowledge play in
making sense of texts. At this level, students are encouraged to recall what the difference
between cohesion and coherence is, recognize it when presented with hands-on activities
or tasks, and explain where the difference lies. These activities will help students start
developing two of the six intellectual skills or behaviors that the cognitive domain
involves and that are classified in Bloom’s revised taxonomy as follows (Krathwohl
2002: 215):
1) Remembering (being able to recall data/information);
2) Understanding (being able to determine the meaning of instructional messages);
3) Applying (being able to apply what one learns in similar but new situations);
4) Analyzing (being able to compare and contrast);
5) Evaluating (being able to assess decisions or a course of action);
6) Creating (being able to draw on what one has learned and generate new ideas or
products).
Once this first pedagogical milestone is attained, the next step is to make them aware of
the fact that, as Hatim and Mason (1990) claim, when translating from a Language A into
a Language B, what needs to be kept invariant is the underlying coherence (semantic
relations) which is textually conveyed by surface linguistic elements, namely cohesive
devices, which may, on the other hand, need to shift insofar as they are language- and
text-type specific. In this respect, pertinent insights into the teaching of cohesion are
given in the book Teaching Translation from Spanish to English: Worlds beyond Words
by Allison Beeby Lonsdale, who discusses cohesion differences between English and
Spanish. In particular, the author states that English prefers lexical repetition and
pronominalization because, unlike Romance languages, it makes very few distinctions in
terms of gender, verb agreement and number (1996: 219). Therefore, it is more difficult
for the reader to keep track of the right reference. Lexical repetition and
pronominalization make it easier for the English reader to establish reference and
cohesion in a text (1996: 225). As far as lexical cohesion is concerned, one of the activities that the author suggests is to have students read short texts and then make a list of the referential networks that can be identified in each text. In other words, a text is made up of several paragraphs, each dealing with the same or a different concept. The main referential network is the one established by the title, but within the text other semantic fields can be established and identified. Once these referential networks are identified, it is then possible to make a list of all the content words that belong to each referential network. This task can be sped up and made easier with WordList, one of the three applications in the WordSmith Tools suite, which generates word lists automatically.
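To make the mechanics concrete, the following minimal Python sketch approximates what WordList computes, namely a frequency count of every word form in a text. It is an illustration only, not the tool's actual implementation, and its simple tokenization rule is an assumption.

from collections import Counter
import re

def word_list(text):
    """Return (word, frequency) pairs, most frequent first,
    mimicking the output of a WordList run."""
    tokens = re.findall(r"[a-zA-Z']+", text.lower())
    return Counter(tokens).most_common()

sample = ("Einstein's theory of relativity changed physics. "
          "Before the theory, physics treated time and space as absolute.")
for word, freq in word_list(sample):
    print(word, freq)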
In a translation class, the focus should be on the different ways these referential
networks are represented in the text. In this respect, Allison Beeby Lonsdale suggests
providing translation trainees with parallel texts. The term parallel employed by the
author has a different definition from the one adopted in the present study. By parallel
texts, she means texts which are not translations of each other but which are comparable
in terms of topic, genre, register, language sub-field etc. The texts chosen by the author of
the book deal with Einstein’s theory of relativity. Three main topics were singled out,
namely (225):
1) physics before Einstein;
2) the theory of relativity; and
3) reality and time/space.
Students were asked to list all the referential networks or references related to these three
topics. Once they were done with this activity, students noticed the absence of repetition
and a variety of references in the Spanish text (225). This pedagogical activity is an
effective one because it helps translation trainees reflect on the difference in the use of
lexical cohesive devices in any two languages. The author does not suggest the use of any corpus tools because the texts provided to the students were short. However, using corpora can help students identify referential networks easily, because programs such as WordSmith Tools create word lists indicating the frequency of each word. Since, when working with referential networks, the focus is on content
words, WordSmith Tools allows the user to exclude all grammatical words by loading a stop list into the application. This way, only content words will be included in the word list. Word lists can help students identify all types of repetition, including the use of derivational forms. In this respect, to identify derivational forms of certain content words, the user can simply arrange his or her word list in alphabetical order and then lemmatize the words sharing the same stem. Using WordSmith Tools to identify referential networks can be a follow-up activity to their manual identification. This way, students are prompted to apply what they have already learned in a novel situation, thus fostering the development of the third level of Bloom's intellectual behaviors, namely Applying.
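The stop-list filtering and stem-based grouping just described can likewise be sketched in a few lines of Python. The stop list below is a tiny hypothetical sample (real ones are loaded from a file), and the fixed-length stem grouping is a crude stand-in for the manual lemmatization of an alphabetical word list.

from collections import Counter, defaultdict
import re

# A tiny hypothetical stop list; in practice a full list is loaded from a file.
STOP_LIST = {"the", "a", "an", "of", "and", "to", "in", "is", "are",
             "it", "for", "on", "as", "that", "this", "with", "by"}

def content_word_list(text):
    """Frequency count of content words only (grammatical words excluded)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_LIST)

def group_by_stem(counts, stem_length=6):
    """Sort words alphabetically and group those sharing an initial
    letter sequence: a crude aid for spotting derivational forms."""
    groups = defaultdict(list)
    for word in sorted(counts):
        groups[word[:stem_length]].append(word)
    return {stem: words for stem, words in groups.items() if len(words) > 1}

text = ("Relativity relates time and space. The relative motion of "
        "observers is central to relativistic physics.")
print(group_by_stem(content_word_list(text)))
# {'relati': ['relative', 'relativistic', 'relativity']}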
Another possible in-class activity is to have students identify the referential networks of both parallel (in my sense of the term) and comparable texts dealing with the same topic(s) and compare them to see how the references to those topics change depending on whether the text being analyzed is a translation or an original text. All these
activities are aimed at making students aware of stylistic, syntactic, cultural or semantic
differences when it comes to using lexical cohesive devices in another language. In terms
of learning objectives, students are encouraged to develop their analytical skills through
compare-and-contrast activities (Bloom’s fourth level).
Let us consider lexical differences and the use of these databases in task design.
As mentioned above, choosing synonyms when translating is not always an easy task
because words have semantic, attitudinal and connotative differences. Students need to be
aware of these nuances if textual equivalence is to be attained in the target text. In other
words, students need to be able to choose TL lexical items that do not betray the semantic
value conveyed by their ST counterparts. This is all the more true if they have to pick a
synonym to avoid repetition. In this respect, a very useful tool is WordNet for the English
language and MultiWordNet for other languages such as Italian, Spanish, Portuguese,
Romanian, Hebrew and Latin. This multilingual lexical database allows the user to type
in a term and see its synsets, or sets of synonyms, not only in the language of the search but also in the other languages supported by the database. Like WordNet,
MultiWordNet also allows the user, through a drop-down menu, to check the other semantic relationships, such as hypernymy, hyponymy, meronymy or antonymy, that the word being looked up has with other words. In this respect, an interesting pedagogical task
aimed at having students become familiar with the above-mentioned semantic relations,
could be to provide them with a short text, in which a limited number of key words are
underlined, and then ask students to identify all the other lexical words which are
semantically related to each word and establish their type of relationship by using
MultiWordNet. By so doing, students will understand not only the importance that words
have in a text but also the network(s) of semantic relationships they create. This part of
the task will also help students pay attention to inter-sentential bonds and links and how
sentences are semantically related to one another thus developing their awareness of a
text as a translation unit in itself. Then students could be grouped into pairs or threes,
as the case may be, and asked to compare their referential networks and assess one
another’s choices thus promoting their moving up Bloom’s revised taxonomy (fifth level:
Evaluating). Lastly, students could be asked to translate the text in question and the
referential networks thereof, by making all the necessary shifts in terms of lexical
cohesive devices, and then justify their choices.
This last part of the task aims at
promoting the development of their creativity (Bloom’s sixth level) by having them
suggest an actual solution to the translation of such referential networks. However, when using WordNet or MultiWordNet, students need to be warned of one major shortcoming of these lexical databases: they classify semantic relationships according to the word class (noun, verb, adverb or adjective) that lexical items belong to. This means that the verb open and the adjective open are not considered semantically related in these lexical databases; it follows that students need to use their best judgment when analyzing a text and identifying semantic relationships, because
machines cannot do all the work for us.
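For self-study outside the classroom, the same look-ups can be scripted. The sketch below uses NLTK's WordNet interface, a stand-in here since MultiWordNet itself is consulted through its web interface, and also demonstrates the word-class limitation just mentioned: the verb open and the adjective open occupy disjoint synsets.

import nltk
nltk.download("wordnet", quiet=True)   # one-time corpus download
from nltk.corpus import wordnet as wn

# The word-class limitation: verb and adjective senses of "open"
# form separate synset inventories with no link between them.
print(len(wn.synsets("open", pos=wn.VERB)), len(wn.synsets("open", pos=wn.ADJ)))

# The semantic relations discussed above, queried on a noun synset.
door = wn.synsets("door", pos=wn.NOUN)[0]
print(door.hypernyms())       # superordinates
print(door.hyponyms())        # subordinates
print(door.part_meronyms())   # part-whole relations

# Antonymy is recorded on lemmas rather than on whole synsets.
good = wn.synsets("good", pos=wn.ADJ)[0]
print(good.lemmas()[0].antonyms())

# Italian look-ups are possible via the Open Multilingual Wordnet.
nltk.download("omw-1.4", quiet=True)
print(wn.synsets("porta", lang="ita"))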
As previously mentioned, Italian does not make great use of repetition because it makes many distinctions in terms of gender, verb agreement and number. When it comes to scientific texts, such as the ones analyzed in this study, though, repetition is used more, because the aim is to avoid ambiguity, especially in the case of co-reference. However, as the analysis of the data showed, the use of the repetition device, though higher compared to the other lexical cohesive devices, was still lower compared to the English Originals. What students need to understand is not that they have to avoid
repetition all the time but rather that they can make use of synonyms or superordinates or
other semantic categories whenever repetition can be avoided without making the
referent obscure or ambiguous.
All the above-mentioned activities or tasks are aimed at developing not only the
students’ micro-analysis skills, by having them focus on intra- and inter-sentential
semantic relations, but also their instrumental, interpersonal and attitudinal competences
(Kelly 2005). However, translation trainees also need to learn to look beyond the textual
boundaries; they need to look at the bigger picture which allows them to grasp the global
meaning of the text they need to understand and translate. They need to be taught how to
identify textual characteristics and conventions associated with TL genres, register and
speech acts. Lexical cohesion is closely related to textual features and conventions
because, as mentioned above, depending on the level of formality of a text or the type of
text, the use of content words may differ. Cultures differ in levels of formality and
conventions depending on the context(s).
Students need to understand that words that are equivalent in a given target language
at the micro-textual level may not be so at the macro-textual level, that is to say, in terms
of text-type or register. The choice of lexis and its level of formality change from culture to culture, and consequently from language to language. To make students aware of these differences at the macro-textual level, an in-class activity is to have students collect a comparable corpus of texts belonging to the same text-type and language sub-field in the language pair they are working with and then analyze the text-type features and use of lexis through corpus tools such as WordSmith Tools.
By means of this tool, students can be made aware of textual features which are often
neglected in a translation class. By looking at the statistical data that this corpus tool
provides, students can become familiar with differences concerning syntax (hypotaxis vs.
parataxis) and paragraph arrangement (rhetorical purposes). To focus on syntax, students
can just look at the statistical data concerning number of sentences and average sentence
length to reflect upon differences in how messages are conveyed in writing in a particular
text-type and language sub-field. The analysis of paragraphs in terms of rhetorical
purposes (Bhatia [1993]; Swales [1990]) of a particular text-type can also help students
become aware of differences as to its “cognitive structure”, which refers to the
conventionalized and standardized arrangement of rhetorical moves or functions used by
a particular professional discourse community (Bhatia 1993: 32). In other words, any
given text-type has a particular organization of rhetorical functions which is realized
through a series of paragraphs, each fulfilling a specific function. These functions or
moves are usually associated with communicative intentions and realized by specific
lexical-grammatical choices, including lexical cohesive patterns themselves (see chapter
1), which ensure the connectedness of rhetorical moves and are accepted and recognized
by the members of a particular discourse community (Swales 1990: 58). Discourse communities have common communicative goals and a high level of expertise, use highly specialized terminology, and possess specific text-types through which their members further their aims (Swales 1990: 24-27). In this study, the discourse community in
question is a highly educated one. However, despite the magazines belonging to the same
text-type and language sub-field, the discourse communities involved belong to two
different cultures, namely American-English and Italian. Since each culture has its own
set of lexical-grammatical features to fulfill specific rhetorical purposes, through the
comparison of comparable corpora, translation students can become familiar with text-type differences and stylistic preferences in any two languages and learn how to convey the source text's rhetorical purposes into the target text effectively through appropriate TL lexico-grammatical and stylistic choices. In this respect, in order to test
the students’ textual competence, borrowing Neubert and Shreve’s terminology, different
projects, with different learning objectives, could be devised depending on the linguistic
background of the class. In a language-specific class, in which students work with the
same language pair, the focus of the project could be on the differences in rhetorical
moves and move-related lexical cohesive patterns (LCPs) across text-types. Students
could be grouped in pairs and each group could be asked to collect a comparable corpus
of texts of a to-be-agreed-upon length and belonging to a specific text-type (assigned by
the instructor to make sure each group has a different one). Their task would be to identify the text-type-specific rhetorical moves and the move-related LCPs and then present their findings in class at the end of a module, course, or semester. Having each group present its project lets students reflect on text-type differences, since each group focuses on a different text-type. By contrast, in a non-language-specific class, in which
students have different cultural/linguistic backgrounds, the focus should be
on language-specific differences in rhetorical moves and LCPs. In this case, students
could be grouped according to their language pair, if possible, and asked to collect a
comparable corpus of texts, the text-type of which should be the same for each group. As
in the previous project, their task would be to identify the rhetorical moves and move-related LCPs. At the end of the module, course or semester, they would be asked to
present their findings so students can compare them and see whether or not the rhetorical
moves and move-related LCPs for the same text-type change across languages.
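A minimal sketch of the sentence-count and average-sentence-length statistics invoked in the activities above is given below. The naive sentence splitting is an assumption (WordSmith Tools' segmentation is more robust), and the two example sentences are invented purely for illustration.

import re
import statistics

def sentence_stats(text):
    """Sentence count and average sentence length in words,
    the two statistics used above to compare hypotaxis and parataxis."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(re.findall(r"\w+", s)) for s in sentences]
    return {"sentences": len(sentences),
            "average_sentence_length": statistics.mean(lengths)}

english = "Light bends near massive bodies. Clocks slow in strong gravity."
italian = ("La luce si curva in prossimità dei corpi di grande massa, "
           "e gli orologi rallentano nei campi gravitazionali intensi.")
print(sentence_stats(english))   # two short sentences (more paratactic)
print(sentence_stats(italian))   # one long sentence (more hypotactic)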
5.3 Limitations and Future Directions
Some of the limitations of this study concern text-type, language pair, text-bound
cohesion, and subjectivity in establishing semantic relations.
As for text-type, this study focused only on articles taken from one scientific
magazine, namely Scientific American and its Italian version Le Scienze. The results are therefore representative only of the specialized language used by the discourse community working for and reading this magazine. It would be an overgeneralization to say that the findings for the Italian Originals apply to Italian scientific language as a whole. Therefore,
further investigation into the differences, or rather stylistic preferences, in the use of lexical cohesive devices needs to be conducted across several specialized sectors of both Italian and English. Needless to say, in order for the findings to be generalizable,
texts should belong to different text-types and be taken from a number of magazines or
sources.
Another limitation of this study is the language pair. The findings apply to
stylistic and text-type-related differences between English and Italian. However, it is not
possible to claim that the same differences or findings apply, for example, to other
language pairs such as German and Italian or Spanish and Italian. There is still a lot of
research to be done as to the stylistic preferences and differences between languages.
The analysis of the lexical cohesive devices carried out on the selected texts is
product-based and is mainly concerned with lexical cohesion. This cohesive category has
been studied because of its major impact (as Hoey [1991] asserts) on the coherence of a
text. However, since coherence is subjective in that it depends on the reader’s ability to
interpret the semantic relationships within a text, this study fails to assess to what extent
the reproduction of the lexical cohesive patterns of the source text into the Italian target
text might affect the reader’s understanding of the latter as a coherent whole. Indeed, in
Sanchez Escobar’s view, the coherence of a text is created by the writer through his or
her lexical, semantic and syntactic choices and by the reader through his or her
interpretation of the text (1999: 558). Therefore, a direction for future research would be
to study the target readership’s response to Italian translated and original texts to see if
translated texts, wherein the surface structure of the underlying source text's semantic relations has been kept invariant, are overall rated as less coherent than texts which are
originally written in Italian. This experiment would also help provide evidence for the importance that lexical cohesion has for translation quality, since in the end it is the target audience that decides on the success or failure of the end product, that is to say, the translated text.
Another important limitation is the subjectivity in establishing semantic relations.
Since the analysis conducted in this study was manual, it was the author himself who read
through the source and target texts and identified the semantic relationships existing
among the different key words which were extracted with WordSmith Tools. Though the
semantic categories to which each lexical item being investigated belonged were
identified by means of WordNet for the English texts and MultiWordNet for the Italian
translations and originals, in a few cases the lexical database only provided a set of
synsets but no hyponyms, hypernyms, meronyms and so on. In such cases, other sources, such as Italian thesauri or dictionaries, were consulted to identify the semantic relationship(s) between any two words, and in establishing these semantic relations the
author’s subjectivity might have played a small part. However, this only happened for a
very small number of terms; the bulk of the semantic relations were established through
the above-mentioned lexical databases.
Lastly, the biggest limitation of this study is its being product-based. At its initial stage, both a product-based and a process-based investigation of lexical cohesion were supposed
to take place. However, the process-based project could not be carried out, mainly because of time constraints and a shortage of available subjects in the desired target language. Therefore, a future direction that this study should take is mainly empirical, namely an investigation into cohesion and coherence from an expertise point of view. My hypothesis rested on Folkart's claim that the quality of a translation is affected by students' focus on lower sentential ranks, which prevents them from seeing the text as a whole. Indeed, cohesive markers come into play only at the supra-sentential rank (1988: 151).
Based on these findings, the hypothesis formulated at the very beginning, which still needs to be tested, claims that translations carried out by novice translators are less cohesive and coherent than those done by expert translators, in view of the micro-textual approach to text analysis adopted by novices, which leads them to disregard, or pay less attention to, the network of lexical chains created by lexical cohesion. By contrast, translations done by expert translators are expected to be more cohesive and coherent, given their awareness of text-level forms, but nevertheless not to be completely compliant with TL norms and TL text-type conventions. The latter assumption was partly borne out by the product-based investigation of lexical cohesion carried out in this study, in that the translated texts tend overall to comply with the lexical cohesion preferences of their source text.
The present study set out to shed more light on the stylistic differences or
preferences when it comes to lexical cohesion between English and Italian in scientific
texts. However, much is yet to be done on this topic across languages and text-types. An important innovation of this study was its adoption of both corpus and statistical tools for analyzing lexical cohesion. Indeed, most corpus-based studies of cohesion in general, or of lexical cohesion in particular, discuss the statistical data computed by corpus tools such as WordSmith Tools only in general terms, without demonstrating whether or not the differences discovered are statistically significant. It is hoped that this study will help future translation scholars adopt this
combined methodology in investigating lexical cohesion, thus providing more valuable
data in support of the teaching of this standard of textuality in translator training
programs.
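As an indication of how such significance testing can be done outside SPSS (the package used in this study), the sketch below runs a one-way ANOVA with scipy on invented, purely illustrative per-text repetition frequencies for three subcorpora.

from scipy import stats

# Hypothetical normalized repetition frequencies per text in each
# subcorpus; these numbers are invented for illustration only.
english_originals    = [48, 52, 47, 55, 50, 49]
italian_translations = [41, 44, 39, 46, 42, 40]
italian_originals    = [35, 38, 33, 40, 36, 34]

f_value, p_value = stats.f_oneway(english_originals,
                                  italian_translations,
                                  italian_originals)
print(f"F = {f_value:.2f}, p = {p_value:.4f}")
# A p-value below the conventional alpha of .05 would indicate that the
# differences between subcorpora are statistically significant rather
# than an artifact of sampling.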
GLOSSARY OF ACRONYMS
ANOVA = Analysis of Variance
ASL = Average Sentence Length
DV = Dependent Variable
EN = English
EO = English Original (Text)
EFL = English as a Foreign Language
F = F value
IO = Italian Original (Text)
IT = Italian Translation
IV = Independent Variable
L1 = First Language (native)
L2 = Second Language (non-native)
LCP = Lexical Cohesive Pattern
LD = Lexical Density
LSP = Language for Special Purposes
M = Mean
S = Sentence
SD = Standard Deviation
SL = Source Language
ST = Source Text
SPSS = Statistical Package for the Social Sciences
SSLMIT = Scuola Superiore di Lingue Moderne per Interpreti e Traduttori (Advanced
School of Modern Languages for Interpreters and Translators)
STTR = Standardized Type-Token Ratio
TL = Target Language
TT = Target Text
TTR = Type-Token Ratio
References
Abdel-Hafiz, A.S. “Lexical Cohesion in the Translated Novels of Naguib Mahfouz: the
Evidence from the Thief and the Dogs.” Occasional Papers in the Development of
English Language Education, Vol. 37, Oct. 2003 – Mar. 2004, pp. 63 – 88.
Aiwei, S. “The Importance of Teaching Cohesion in Translation on a Textual Level: A
Comparison of Test Scores before and after Teaching.” In Translation Journal,
2004, Vol. 8, n° 2.
Alderson, J.C. “Reading in a Foreign Language: A Reading Problem or a Language
Problem?.” In Alderson, J.C., & Urquhart, A.H. (eds.), Reading in a Foreign
Language. London: Longman, 1984.
Anderson, M. L. Lexical Cohesion in Emma: An Approach to the Semantic Analysis of
Style. Ann Arbor, Michigan: Bell & Howell Company, 1997.
Aziz. “Translation and Pragmatic Meaning.” In Shunnaq et al. (eds.), Issues in Translation. Irbid: Irbid National University & Jordanian Translators' Association,
1998, pp. 119 – 141.
Baker, M. “Textual Equivalence: Cohesion.” In Baker, M. In Other Words, London:
Routledge, 1992, pp. 180-215.
Baker, M. In Other Words: A Coursebook on Translation. London: Routledge, 1992.
Baker, M. “Corpus Linguistics and Translation Studies: Implications and Applications.”
In Baker, M., Gill, F., & Tognini-Bonelli, E. (eds.), Text and Technology: In
Honour of John Sinclair. Amsterdam: John Benjamins, pp. 233-250, 1993.
Baker, M. “Corpora in Translation Studies: An Overview and some Suggestions for
Future Research.” In Target, Vol. 7, n. 2, 1995, pp. 223 – 243.
Baker, M. “Corpus-based Translation Studies: The Challenges That Lie Ahead.” In
Somers, H. (ed.), Terminology, LSP and Translation. Studies in Language
Engineering in Honour of John C. Sager. Amsterdam: John Benjamins, pp. 175-186, 1996.
Baker, M. “A Corpus-based View of Similarity and Difference in Translation.” In
International Journal of Corpus Linguistics, Vol. 9, n. 2, 2004, pp. 167 – 193.
Baker, P., Hardie, A., & McEnery, T. A Glossary of Corpus Linguistics. Edinburgh: Edinburgh
University Press, 2006.
Bhatia, V.K. Analysing Genre: Language Use in Professional Settings. London:
Longman, 1993.
Bosseaux, C. How Does it Feel? Point of View in Translation: The Case of Virginia
Woolf into French. Amsterdam/New York: Rodopi, 2007.
Bowker, L., & Pearson, J. Working with Specialized Language: A Practical Guide to
Using Corpora. London: Routledge, 2002.
de Beaugrande, R. Text, Discourse, and Process: Toward a Multidisciplinary Science of
Texts. Norwood, NJ: ABLEX Publishing Corporation, 1980.
de Beaugrande, R., & Dressler, W. Introduction to Text Linguistics. London & New
York: Longman, 1981.
Beigman Klebanov, B., & Shamir, E. “Lexical Cohesion: Some Implications of an Empirical
Study.” In Bernadette Sharp (Ed.), Natural Language Understanding and
Cognitive Science. Miami, FL: INSTICC Press, 2005, pp. 13-21.
Beigman Klebanov, B., & Shamir, E. “Reader-based Exploration of Lexical Cohesion.” In
Language Resources and Evaluation, 2006, Vol. 40, n° 2, pp. 109-126.
Berber Sardinha, Antonio Paulo. “Patterns of Lexis in Original and Translated Business
Reports: Textual Differences and Similarities.” In Karl Simms, Translating
Sensitive Texts: Linguistic Aspects, 1997, pp. 147-154.
Bernardini, S. “Exploring New Directions for Discovery Learning.” In Kettemann, B, &
Marko, G. (Eds), Language and Computers, Teaching and Learning by Doing
Corpus Analysis. Proceedings of the Fourth International Conference on
Teaching and Language Corpora, Graz 19-24 July, 2000, pp. 165-182.
Bloom, B. Developing Talent in Young People. New York: Ballantine, 1985.
Blum-Kulka, S., & Levenston, E.A. “Universals of Lexical Simplification.” In Faerch,
C., & Kasper, G. (eds.)., Strategies in Interlanguage Communication.
London/New York: Longman, pp. 119-139, 1983.
Blum-Kulka, S. “Shifts of Cohesion and Coherence in Translation.” In House J., Blum-
Kulka S. Interlingual and Intercultural Communication: Discourse and Cognition in Translation and Second Language Acquisition. Tübingen: Narr, 1986, pp. 17-35; reprinted in Lawrence Venuti, The Translation Studies Reader. London/New
York: Routledge, 2004, 298-313.
Brown, G., & Yule, G. “The Nature of Reference in Text and in Discourse.” In Brown,
G., & Yule, G., Discourse Analysis. Cambridge, UK: Cambridge University Press,
1983, pp. 190-222.
Butler, C. Structure and Function: From Clause to Discourse and Beyond. Amsterdam:
John Benjamins, 2003.
Campbell, K. S. Coherence, Continuity, and Cohesion: Theoretical Foundations for
Document Design. Hillsdale, NJ: Lawrence Erlbaum Associates, 1995.
Carrell, P.L. “Cohesion Is Not Coherence.” In TESOL Quarterly, 1982, Vol. 16, n° 4, pp.
479-488.
Carrell, P.L. “Some Issues in Studying the Role of Schemata, or Background Knowledge,
in Second Language Comprehension.” In Reading in a Foreign Language, 1983,
Vol. 1, pp. 81-92.
Chueca Moncayo, F.J. “The Textual Function of Terminology in Business and Finance Discourse.” In JoSTrans, 3/2005, pp. 40-63.
Cradler, James F. & Michael K. Launer. “Problems of Referential Cohesion in Russian-to-English Translation.” In Karl Kummer, Building Bridges, 1986, pp. 293-300.
Ericsson, K. A., Krampe, R. T., & Tesch-Roemer, C. “The Role of Deliberate Practice in
the Acquisition of Expert Performance.” In Psychological Review, 1993, Vol.
100, pp. 363 – 406.
Eriksson, A. “Tense, Cohesion and Coherence.” In Karin Aijmer, Hilde Hasselgård,
Translation and Corpora, 2004, pp. 19-31.
Fellbaum, C. WordNet: An Electronic Lexical Database. Cambridge/London: The MIT
Press, 1998.
Flesch, R. How to Test Readability. New York: Harper & Brothers, 1951.
Flowerdew, J., & Mahlberg, M. Lexical Cohesion and Corpus Linguistics.
Amsterdam/Philadelphia: John Benjamins Publishing Company, 2009.
Folkart, B. “Cohesion and the Teaching of Translation.” In Meta, 1988, Vol. 33, n° 2, pp.
142-155.
Fulcher, G. “Cohesion and Coherence in Theory and Reading Research.” In Journal of
Research in Reading, 1989, Vol. 12, n° 2, pp. 146-163.
Gunning, R. The Technique of Clear Writing. New York: McGraw-Hill, 1973.
Halliday, M. A. K. & Hasan, R. Cohesion in English. London: Longman Group Limited,
1976.
Halliday, M. A. K. An Introduction to Functional Grammar. London: Edward Arnold
Limited, 1985.
Hansen-Schirra, S., Neumann, S. & Steiner, E. “Cohesion and Explicitness and
Explicitation in an English-German Translation Corpus.” In Languages in
Contrast, 2007, Vol. 7, No. 2, pp. 241-265.
Hasan, R. “Coherence and Cohesive Harmony.” In J. Flood (Ed.), Understanding
Reading Comprehension. Newark, Del.: ITA, 1984, pp. 181-219.
Hatim, B., & Mason, I. “Discourse Texture.” In Hatim, B., & Mason, I., Discourse and
the Translator. London: Longman, 1990, pp. 192-222.
Heiman,G.W. Understanding Research Methods and Statistics: An Integrated
Introduction for Psychology. Boston/New York: Houghton Mifflin Company,
2001.
Hirst, G. & St-Onge, D. “Lexical Chains as Representations of Context for the Detection
and Correction of Malapropisms.” In Fellbaum, C. (Ed), WordNet: An Electronic
Lexical Database. Cambridge, Massachusetts/London, England: The MIT Press,
1998, pp.305-332.
Hoey, M. Patterns of Lexis in Text. New York/Oxford/Toronto: Oxford University Press,
1991.
Inkpen, D., & Hirst, G. “Building and Using a Lexical Knowledge Base of Near-Synonym Differences.” In Computational Linguistics, 2006, 32 (2), pp. 223-262.
James, C. Contrastive Analysis. London: Longman, 1980.
Jobbins, A.C., & Evett, L.J. “Automatic Identification of Cohesion in Texts: Exploiting the Lexical Organization of Roget's Thesaurus.” In Proceedings of ROCLING
VIII, Taipei, Taiwan, 1995.
Kachroo, B. “Textual Cohesion and Translation.” In Meta, 1984, Vol. 29, n° 2, pp. 128-134.
Kalina, M. Equivalence and Cohesion in Translation: A Study of a Polish Text and its
English Translation. Master Thesis published at the University of Toledo,
December 2000.
Kelly, D. A Handbook for Translator Trainers. Manchester: St. Jerome, 2005.
Khany, R., & Tazik, K. “The Relationship between Rhetorical Moves and Lexical
Cohesion Patterns; the case of Introduction and Discussion sections of Local and
International Research Articles.” In Journal of English Language Teaching and
Learning, 2011, pp. 71-95.
Kirkpatrick, L.A., & Feeney, B.C. A Simple Guide to SPSS for Version 16.0. Wadsworth:
Cengage Learning, 2009.
Klaudy, Kinga & Kristina Károly. “The Text-Organizing Function of Lexical Repetition
in Translation.” In Maeve Olahan, Intercultural Faultlines: Research Models in
Translation Studies 1. Textual and Cognitive Aspects, 2000, pp. 143-160.
Klebanov, B.B., Diermeier, D, & Beigman, E. “Lexical Cohesion Analysis of Political
Speech: Web Appendix.” In Political Analysis, 2008, Vol. 16, pp. 447-463.
Kostopoulou, G. “The Role of Coherence in Text Approaching and Comprehension:
Applications in Translation Didactics.” In Meta, 2007, LII (1), pp. 146-155.
Krathwohl, D.R. “A Revision of Bloom's Taxonomy: An Overview.” In Theory into Practice, 2002, Vol. 41 (4), pp. 212-218.
Krein-Kühle, Monika. “Cohesion and Coherence in Technical Translation: The Case of
Demonstrative Reference.” In Leona Van Vaerenbergh, Linguistics and
Translation Studies, Translation Studies and Linguistics, 2002, pp. 41-53.
Laviosa-Braithwaite, S. “Comparable Corpora: Towards a Corpus Linguistic
Methodology for the Empirical Study of Translation.” In Thelen, M., &
Lewandoska-Tomaszczyk, B. (eds.), Translation and Meaning Part 3. Maastricht:
UPM, pp. 153-163, 1996.
Laviosa, S. “The English Comparable Corpus: A Resource and a Methodology.” In
Bowker, L., Cronin, M., Kenny, D., Pearson, J. (eds.), Unity in Diversity: Current
Trends in Translation Studies. Manchester: St. Jerome, 1998, pp. 101 – 112.
Laviosa, S. Corpus-based Translation Studies. Theory, Findings, Applications.
Amsterdam: Rodopi, 2002.
Leech, G. “Corpora and Theories of Linguistic Performance.” In Svartvik, J. (ed.),
Directions in Corpus Linguistics. Berlin: Mouton de Gruyter, 1992, pp. 105-122.
Lonsdale, A.B. Teaching Translation from Spanish to English: Worlds beyond Words.
Ottawa: University of Ottawa Press, 1996.
Lotfipour-Saedi, Kazem. “Lexical Cohesion and Translation Equivalence.” In Meta,
1997, 42:1. pp. 185-192.
Lubelska, D. “An Approach to Teaching Cohesion to Improve Reading.” In Reading in a
Foreign Language, 1991, 7 (2), pp. 569-596.
Lucisano, P., & Piemontese, M.E. “GULPEASE: Una Formula per la Predizione della
Difficoltà dei Testi in Lingua Italiana.” In Scuola e Città, 3, 1988, pp. 110 – 124.
Mason, I. “Discourse, Ideology and Translation.” In Robert de Beaugrande, Abdulla
Shunnaq, and Mohamed H. Heliel (eds.), Language, Discourse and Translation in
the West and Middle East (John Benjamins, 1994), pp. 23–34.
McGee, I. “Traversing the Lexical Cohesion Minefield.” In ELT Journal, 2009, Vol. 63, n° 3, July, pp. 212-220.
Morris, J., & Hirst, G. “Lexical Cohesion Computed by Thesaural Relations as an
Indicator of the Structure of Text.” In Computational Linguistics, 1991, Vol. 17,
n° 1, pp. 21-48.
Musacchio, M. T. "The distribution of Information in LSP Translation: A Corpus Study
of Italian". In Ahmad K. & Rogers M. (Eds.). Evidence-based LSP: Translation,
Text and Terminology. Frankfurt am Main: Peter Lang, 2007, pp. 97-117.
Muto, K. “The Use of Lexical Cohesion in Reading and Writing.” In Journal of School of
Foreign Languages, 2006, Vol. 30, pp: 107-129.
Neubert, A., & Shreve, G.M. Translation as Text. Kent, OH: Kent State University Press,
1992.
Newmark, P. A Textbook of Translation. Hemel Hempstead: Prentice-Hall, 1988.
Nord, C. Text Analysis in Translation: Theory, Methodology, Didactic Application of a
Model for a Translation-oriented Text Analysis. Amsterdam & New York:
Rodopi, 2005.
Okumura, M., & Honda, T. “Word Sense Disambiguation and Text Segmentation Based
on Lexical Cohesion.” In Proceedings of COLING-94, 1994, pp. 755-761.
Olohan, M. Introducing Corpora in Translation Studies. London: Routledge, 2004.
Peirce, C.S. Collected Papers. Cambridge, MA: Harvard University Press, 1933.
Roos, D. “Translation Features in a Comparable Corpus of Afrikaans Newspaper
Articles.” In Stellenbosch Papers in Linguistics PLUS, Vol. 38, 2009, pp. 73 – 83.
Rosenblatt, L.M. “Writing and Reading: The Transactional Theory.” In Mason, J.
Reading and Writing Connections. Newton, MA: Allyn & Bacon, 1989.
Salkie, R. Text and Discourse Analysis. London/New York: Routledge, 1995.
Sanchez Escobar, A.F. “Teaching Textual Cohesion Through Analyses of Defoe's Moll
Flanders and Swift’s Gulliver’s Travels.” In Cauce, Revista de Filologia y su
Didactica, 1999, 22-23, pp. 557-570.
Scarpa, F. “Corpus-based Quality-Assessment of Specialist Translation: A Study Using
Parallel and Comparable Corpora in English and Italian.” In Gotti, M., &
Šarčević, S. (eds.), Insights into Specialized Translation. New York/Oxford: Peter Lang, 2006, pp. 155-172.
Schiffrin, D. Approaches to Discourse. Oxford: Blackwell, 1994.
Séguinot, C. “Pragmatics and the Explicitation Hypothesis”. In TTR, 1988, Vol. 1,
Number 2, pp. 106-113.
Shreve, G. “Knowing Translation: Cognitive and Experiential Aspects of Translation
Expertise from the Perspective of Expertise Studies.” In Alessandra R. (Ed.),
Translation Studies: Perspectives on an Emerging Discipline. Cambridge:
Cambridge University Press, 2002, pp. 150 – 171.
Sinclair, J. Corpus, Concordance, Collocation. Oxford: Oxford University Press, 1991.
Sinclair, J.McH. Trust the Text. Language, Corpus and Discourse. London: Routledge,
2004.
Singh, R. “Contrastive Textual Cohesion.” Montreal, Université de Montréal.
Unpublished.
Steiner, E. “Explicitation, its Lexicogrammatical Realization, and its Determining
Variables - Towards an Empirical and Corpus-based Methodology.” In
SPRIKreport, 2005, no. 36, Dec. 2005.
Stubbs, M. “British Traditions in Text Analysis: From Firth to Sinclair.” In Baker, M.,
Francis, F. & Tognini-Bonelli, E. (eds.). Text and Technology: In Honour of John
Sinclair. Amsterdam: John Benjamins, 1993, pp. 1 – 36.
Stubbs, M. Text and Corpus Analysis. Computer-assisted Studies of Language and
Culture. Oxford and Cambridge, Mass.: Blackwell, 1996.
Stubbs, M. “Computer-assisted Text and Corpus Analysis: Lexical Cohesion and
Communicative Competence.” In Schiffrin, D., Tannen, D. & Hamilton, E. (Eds),
The Handbook of Discourse Analysis. Malden Massachusetts, Oxford: Blackwell,
2001, pp. 304-320.
Stoddard, S. Text and Texture: Patterns of Cohesion. Norwood, NJ: Ablex Publishing
Corporation, 1991.
Swales, John M. Genre Analysis: English in Academic and Research Settings.
Cambridge: Cambridge University Press, 1990.
Taylor, C. “What is Corpus Linguistics: What the Data Says.” In ICAME Journal,
32/2008, pp. 179 – 200.
Teich, E., & Fankhauser, P. “WordNet for Lexical Cohesion Analysis.” In Sojka, P.,
Pala, K., Smrz, P., Fellbaum, C., & Vossen, P. (Eds.), Proceedings of the 2nd
Global WordNet Conference, Masaryk University, Brno, Czech Republic, Jan,
2004, pp. 326-331.
Thompson, G. Introducing Functional Grammar. London: Edward Arnold, 1996.
Thompson, G., & Hunston, S. System and Corpus: Exploring Connections. London:
Equinox, 2006.
Tirkkonen-Condit, Sonja & Jukka Mäkisalo. “Cohesion in Subtitles: A Corpus-based
Study.” In Across Languages and Cultures, 2007, 8:2. pp. 221-230.
Tognini-Bonelli, E. Corpus Linguistics at Work. Amsterdam: John Benjamins, 2001.
Tsai, Y. “Text Analysis of Patent Abstracts.” In The Journal of Specialized Translation,
13/2010, pp. 61 – 80.
Valdes, C. & Fuentes, L.,A. “Coherence in Translated Television Commercials.” In
European Journal of English Studies, 2008, Vol. 12, N. 2, pp. 133-148.
Vanderauwera, R. Dutch Novels Translated into English: The Transformation of a
“Military” Literature. Amsterdam: Rodopi, 1985.
Vinay, J.P., & Darbelnet, J. Stylistique Comparée du Français et de l'Anglais. Méthode de
Traduction. Paris: Didier, 1958.
Widdowson, H. G. Teaching Language as Communication. Oxford: Oxford University
Press, 1978.
Xuanmin, L. “A Textual-Cognitive Model for Translation” In Perspectives, 2003, 11:1.
pp. 73-79.
Webography:
Scott, M. “WordSmith Tools.” Lexically.net. 2010, Version 5.0. July 15th, 2011 <http://www.lexically.net/downloads/version5/WordSmith.pdf>.
Translatednet Labs. September 5th 2011 <http://labs.translated.net/text-readability/>
Université de Neuchâtel. September 20th, 2011 <www.unine.ch/info/clef>
WordNet: A Lexical Database for English. Princeton University. September 20th, 2011 <http://wordnetweb.princeton.edu>.
MultiWordNet. Fondazione Bruno Kessler. September 20th, 2011 <http://multiwordnet.fbk.eu>.