Do Chinese and English speakers think about time differently

Cognition 104 (2007) 427–436
www.elsevier.com/locate/COGNIT
Brief article
Do Chinese and English speakers think about time differently? Failure of replicating Boroditsky (2001)
Jenn-Yeu Chen
Institute of Cognitive Science, National Cheng Kung University, 1 University Road, Tainan 701, Taiwan
Received 10 May 2006; revised 19 July 2006; accepted 18 September 2006
Abstract
English uses the horizontal spatial metaphors to express time (e.g., the good days
ahead of us). Chinese also uses the vertical metaphors (e.g., ‘the month above’ to mean
last month). Do Chinese speakers, then, think about time in a different way than English speakers? Boroditsky [Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of time. Cognitive Psychology, 43(1), 1–22]
claimed that they do, and went on to conclude that ‘language is a powerful tool in
shaping habitual thought about abstract domains’ (such as time). By estimating the frequency of usage, we found that Chinese speakers actually use the horizontal spatial
metaphors more often than the vertical metaphors. This offered no logical ground
for Boroditsky’s claim. We were also unable to replicate her experiments in four different attempts. We conclude that Chinese speakers do not think about time in a different
way than English speakers just because Chinese also uses the vertical spatial metaphors
to express time.
© 2006 Elsevier B.V. All rights reserved.
This manuscript was accepted under the editorship of Jacques Mehler.
This work was sponsored by the NSC-93-2752-H-006-001-PAE grant awarded to the author by the
National Council of Taiwan, ROC. It was carried out by Yi-Tien Tsai as part of the requirement for her
master’s thesis.
E-mail address: [email protected]
0010-0277/$ - see front matter © 2006 Elsevier B.V. All rights reserved.
doi:10.1016/j.cognition.2006.09.012
J.-Y. Chen / Cognition 104 (2007) 427–436
Keywords: Linguistic relativity hypothesis; Time; Chinese; English
English uses the horizontal spatial metaphors primarily to express time (e.g.,
‘to look back 30 years’), whereas Chinese also uses the vertical metaphors (e.g.,
‘the month above’ means last month; ‘the day ahead’ means the day before yesterday). Will the use of different spatial metaphors affect the way time is conceptualized in the language users? Boroditsky (2001) addressed this question with a
spatial priming task. The task first engaged the participants in spatial processing,
followed immediately by sentence judgment that involved the processing of time.
In a typical trial, the participants saw two pictures in a row, each depicting two
objects aligned horizontally or vertically. A sentence appeared at the bottom
describing the spatial relationship of the two objects. The participants had to
determine if the sentence was a correct statement. The pictures were followed by
a sentence that described the temporal relationship of two time units,
e.g., June comes before May. The participants again had to decide if the sentence
was a correct statement. The sentences used two kinds of words to describe the
temporal relationship, one a spatial metaphor such as before and after, and the
other a time word such as earlier and later. Using the spatial metaphor was
meant to serve as a methodological check of the effectiveness of the priming
procedure. Indeed, both the English and the Chinese participants responded to
the time sentence faster when they had just processed the horizontal relationship
of the objects than the vertical relationship of the objects. The critical test of
the hypothesis came from the participants’ responses to the time sentences which
used the time words. The English participants responded to sentences of this kind faster when they had just processed the horizontal relationship of the objects,
but the Chinese participants were faster when they had just processed the vertical relationship of the objects. Because the Chinese participants were Chinese–
English bilinguals, and they processed the English sentences in the task, the
findings argued strongly for the point that the use of spatial metaphors could
change the way speakers think about time. The Chinese speakers displayed a
tendency to think about time vertically because they talked about time vertically. Such a tendency persisted even when the Chinese speakers processed time
in English.
Boroditsky’s findings were very persuasive, except that an important assumption she made runs against our intuition as a native speaker of Chinese. She
assumed implicitly that Chinese speakers used the vertical metaphors far more
frequently than the horizontal metaphors when expressing time. The assumption
was evident in the way she analyzed and described the data: ‘‘English speakers
answered purely temporal questions faster after horizontal primes than after vertical primes. . . Mandarin speakers were faster after vertical primes than after horizontal primes’’ (p. 10). Unfortunately, she never tested that assumption. Here,
we report a study which tested the assumption (Part 1) and repeated her experiment (Part 2). To anticipate the results, the data of the frequency of usage did
not support Boroditsky’s assumption. We also could not replicate her experimental findings.
1. Part 1
1.1. Method
We searched the web news in Taiwan for the time expressions. In the first attempt,
we downloaded 100 pieces of news from the Yahoo News Taiwan over four days.
We then extracted all the expressions that referred to time. The frequencies of the
horizontal spatial terms and the vertical spatial terms were tallied. In the second
attempt, we searched the Google News Taiwan, using the Chinese time words
(day, week, month, season, and year) and the spatial terms (above, below, before,
and after) as the combined keywords. Each search yielded many pieces of news,
so we kept only the first 30.
1.2. Results
The results from the Yahoo search (Table 1) showed that the number of time
expressions using the horizontal spatial terms exceeded the number using the vertical
spatial terms (250 vs. 122). We assumed that the news had been drawn from a single
sample and so performed a χ² analysis of the frequency distribution. The result was
significant: χ²(1) = 44, p < .0001. We have also performed a matched-sample t-test as
well as a sign test, treating each news piece as the unit of analysis. The results of both
tests were consistent with that of the χ² analysis: for the t-test, t(99) = 4.38,
p < .0001; for the sign test, 57 out of 100 news pieces used more horizontal metaphors than vertical metaphors, 29 showed the opposite, and 14 had a tie, z = 2.91,
p < .005.
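The frequency tests on the Yahoo counts can be checked by hand. The sketch below is only an illustration, not the authors' analysis script: it assumes the chi-square test compares the two counts against an equal split, and that the sign test uses the normal approximation with a continuity correction after dropping ties (neither detail is stated in the text). The function names are hypothetical.

```python
import math


def chi_square_equal_split(counts):
    """Goodness-of-fit chi-square against equal expected frequencies (assumed null)."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def sign_test_z(wins, losses):
    """Normal approximation to the sign test, ties dropped, continuity-corrected (assumed)."""
    n = wins + losses
    return (abs(wins - n / 2) - 0.5) / math.sqrt(n * 0.25)


# Counts taken from the text: 250 horizontal vs. 122 vertical expressions,
# and 57 vs. 29 news pieces favoring horizontal vs. vertical (14 ties dropped).
yahoo_chi2 = chi_square_equal_split([250, 122])
yahoo_z = sign_test_z(57, 29)
print(f"chi-square(1) = {yahoo_chi2:.1f}, sign-test z = {yahoo_z:.2f}")
```

Under these assumptions the two statistics agree, after rounding, with the reported χ²(1) = 44 and z = 2.91. The matched-sample t-test cannot be rechecked this way because it needs the per-article counts.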
The Google search yielded similar results (Table 2). Here, we applied the χ² tests
only. Overall, the time units were expressed with a horizontal metaphor more often
than with a vertical metaphor: χ²(1) = 33, p < .0001. Time of event (e.g., before the
moon disappeared) was the major contributor to the horizontal bias. However, when
time of event was removed, the horizontal bias remained significant: 58% vs.
42%, χ²(1) = 11, p < .0008. We also contrasted the usage patterns for week and other
time units (excluding time of event). The patterns were opposite: χ²(1) = 24.8,
p < .0001, with week showing a vertical bias (56% over 44%) and the other time units
showing a horizontal bias (68% over 32%).
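The three Google statistics can be recomputed from the counts in Table 2. This is a minimal sketch under stated assumptions: the first two tests are taken to be goodness-of-fit tests against an equal split, and the week-versus-other contrast is taken to be a 2 × 2 test of independence without a continuity correction. The helper names are hypothetical.

```python
def chi_square_equal_split(counts):
    """Goodness-of-fit chi-square against equal expected frequencies (assumed null)."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Totals from Table 2: horizontal 360, vertical 221; time of event 107 and 38.
horizontal_all, vertical_all = 360, 221
horizontal_no_event = horizontal_all - 107   # 253
vertical_no_event = vertical_all - 38        # 183
week_h, week_v = 82, 103

overall = chi_square_equal_split([horizontal_all, vertical_all])
no_event = chi_square_equal_split([horizontal_no_event, vertical_no_event])
week_vs_other = chi_square_2x2(
    week_h, week_v,
    horizontal_no_event - week_h,  # other units, horizontal: 171
    vertical_no_event - week_v,    # other units, vertical: 80
)
print(f"{overall:.1f} {no_event:.1f} {week_vs_other:.1f}")
```

With these assumptions the values come out near 33.3, 11.2 and 24.8, in line with the reported χ²(1) = 33, 11 and 24.8.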
Table 1
Number of time expressions using the horizontal and the vertical spatial terms from searching the Yahoo News Taiwan

Before   After   Sum of horizontal   Above   Below   Sum of vertical
123      127     250                 70      52      122
Table 2
Number of time expressions using the horizontal and the vertical spatial terms from searching the Google News Taiwan

                     Day   Week   Month   Year   Season   Event   Total
Before                38     38      25     11       19      54     185
After                 22     44      30      7       19      53     175
Sum of horizontal     60     82      55     18       38     107     360
Above                  3     63      21      7        6      25     125
Below                  1     40      23     10        9      13      96
Sum of vertical        4    103      44     17       15      38     221
1.3. Discussion
We searched the Yahoo and the Google News Taiwan to estimate the frequency
of use of the horizontal and the vertical spatial metaphors when Chinese people
expressed time. The results from both rounds of search showed clearly that the horizontal spatial metaphors were used more frequently than the vertical spatial metaphors (except for the time unit ‘week’). Thus, Boroditsky’s assumption cannot be
justified. We then went on to repeat her experiment. A total of four experiments
were conducted, all following the basic design and procedure of her study. The
experiments were programmed in DMDX (developed at the University of Arizona
by K. I. Forster and J. C. Forster) and conducted on a personal computer with a
Pentium-III 667 MHz microprocessor and a 15-inch LCD monitor.
2. Part 2
2.1. Experiment 1
2.1.1. Method
Twenty-five Chinese–English bilinguals from the Department of Foreign Languages, National Cheng Kung University participated. They were graduate students
or at least in their undergraduate sophomore year, with an English major. Fourteen
native speakers of English who taught English in Tainan City also participated. All
of them were paid for participation.
There were 128 pictures serving as the primes. Half depicted a horizontal relation of two objects, while the other half depicted a vertical relation. At the bottom of
each picture was a sentence, which described the spatial relation of the objects
using the horizontal (ahead of or behind) or the vertical (above or below) spatial
metaphors.
There were 32 target sentences, each depicting a time relation. Half were true,
while the other half were false. Within each half, eight sentences used before/after
to describe the time relation (e.g., Monday is before Wednesday), and eight used earlier/later. The time units used in the sentences included week days, months, and
seasons.
A trial consisted of two prime pictures followed by a target sentence. The two
prime pictures depicted similar spatial relations (both horizontal or both vertical).
The first one always required a FALSE answer, while the second one always required a TRUE
answer. The target sentence was sometimes TRUE and sometimes FALSE, in a
randomly arranged order. The participants responded by pressing one of two keys
to indicate TRUE or FALSE. Response times were measured with millisecond accuracy by the computer. All the sentences and the instructions were presented in
English. The trial started with the presentation of four pound signs (####) serving
as a fixation mark. The participants pressed the space bar to initiate the trial. The
two prime pictures and the one target sentence followed one after another in that
sequence upon the participants’ keypress responses. The experiment had a within-subject 2 (prime type) × 2 (target type) × 3 (time unit) design.
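The trial structure and the factorial design described above can be laid out schematically. The sketch below is purely illustrative: the experiments were run in DMDX, not Python, and every name here (the constants, `build_trial`, the dictionary keys) is hypothetical. It only shows how the 2 × 2 × 3 design yields 12 cells and how a single trial chains two primes and one target.

```python
from itertools import product

# Factor levels as described in the Method section.
PRIME_TYPES = ("horizontal", "vertical")
TARGET_TYPES = ("before/after", "earlier/later")
TIME_UNITS = ("day", "month", "season")


def build_trial(prime_type, target_type, time_unit, target_is_true):
    """Return one trial: two primes of the same orientation, then one target.

    The first prime always calls for FALSE and the second for TRUE,
    as stated in the text; the target's truth value varies.
    """
    return [
        {"kind": "prime", "orientation": prime_type, "answer": False},
        {"kind": "prime", "orientation": prime_type, "answer": True},
        {"kind": "target", "wording": target_type, "unit": time_unit,
         "answer": target_is_true},
    ]


# All cells of the within-subject design.
design_cells = list(product(PRIME_TYPES, TARGET_TYPES, TIME_UNITS))
example_trial = build_trial("vertical", "earlier/later", "month", True)
print(len(design_cells), [step["kind"] for step in example_trial])
```

Because the first prime is always FALSE and the second always TRUE, the prime answers are predictable, which is the concern the General discussion takes up when it asks whether participants actually processed the primes.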
2.1.2. Results and discussion
The statistical analysis focused on the TRUE target sentences. Outliers
(RTs > 6000 ms) and errors constituted 3% and 6% of the trials. Table 3 shows that
the participants were slower (but not significantly) when the target sentence followed
a horizontal prime than a vertical prime. This was the case regardless of the type of
target, the time unit, and the native language. Detailed results of the analysis of variance (for all experiments) are available in the supplementary items. Table 3 (as well
as Tables 4–6) also presents the results of the individual t-tests contrasting the horizontal RTs with the vertical RTs for different languages, target types and time units.

Table 3
Mean response time (in ms) and its standard deviation (SD) as a function of prime type (H, horizontal; V, vertical), target type (before/after, earlier/later), and time unit (day, month, and season) from Experiment 1

                          Mean RT (SD)
                          H             V             Difference of V–H   p of t-test
Chinese participants
  Before/after
    Day                   3053 (799)    2939 (711)    −114                .442
    Month                 3219 (827)    2996 (727)    −223                .108
    Season                2780 (980)    2677 (964)    −103                .669
    All                   3017 (879)    2870 (810)    −147                .238
  Earlier/later
    Day                   3353 (822)    3368 (820)      15                .922
    Month                 3194 (818)    3103 (802)     −91                .516
    Season                3017 (782)    2987 (826)     −30                .857
    All                   3188 (808)    3152 (821)     −36                .759
English participants
  Before/after
    Day                   2461 (905)    2162 (544)    −299                .177
    Month                 2326 (652)    2574 (517)     248                .112
    Season                2196 (381)    2444 (498)     248                .083
    All                   2328 (672)    2393 (536)      65                .527
  Earlier/later
    Day                   2461 (886)    2369 (830)     −92                .607
    Month                 2464 (789)    2654 (664)     190                .164
    Season                2937 (768)    2365 (583)    −572                .006
    All                   2621 (828)    2462 (696)    −159                .130

The p values of the matched-sample t-tests comparing the horizontal and vertical RTs are also shown.
The results, which were inconsistent with Boroditsky’s (2001) findings, suggested
that the paradigm of spatial priming did not work in our experiment. This could
mean some problems in the way we followed the design and the procedure of Boroditsky’s experiment. But, it could also be a true failure of replication. Before drawing
any conclusion, we conducted Experiment 2 with the same method, but using the
Chinese–English bilinguals and changing the language of the experiment (instructions and the sentences) into Chinese. The reason for the modification was to maximize the condition for observing a vertical bias in the Chinese speakers. If the
Chinese speakers think about time vertically when they process an English sentence,
they must display an even stronger vertical bias when they process a Chinese sentence. If Experiment 2 also failed to replicate Boroditsky’s findings, perhaps we
had not done the experiments grossly differently from the way she did hers.
2.2. Experiment 2
2.2.1. Method
Twenty Chinese–English bilinguals were recruited from the student population of
the National Cheng Kung University, with no special requirement on English proficiency. The design and the procedure were similar to those of Experiment 1, except
that all the sentences were presented in Chinese, and so were the instructions.
2.2.2. Results and discussion
Outliers (RTs > 5000 ms) and errors constituted 2% and 7% of the trials. Table 4
shows the participants responded more slowly (again insignificantly) when the target
sentence followed a horizontal prime than a vertical prime regardless of the type of
target, the time unit, and the native language. Thus, the replication failed again. We
conducted Experiment 3 to rule out a potential methodological flaw in the previous
experiments.

Table 4
Mean response time (in ms) and its standard deviation (SD) as a function of prime type (H, horizontal; V, vertical), target type (before/after, earlier/later), and time unit (day, month, and season) from Experiment 2

                  Mean RT (SD)
                  H             V             Difference of V–H   p of t-test
Before/after
  Day             2198 (847)    1908 (593)    −290                .042
  Month           2060 (866)    1763 (503)    −297                .101
  Season          2158 (706)    2138 (738)     −20                .918
  All             2139 (798)    1937 (628)    −202                .120
Earlier/later
  Day             2010 (625)    1824 (634)    −186                .186
  Month           1735 (511)    1743 (662)       8                .955
  Season          2099 (623)    1881 (686)    −218                .039
  All             1948 (599)    1816 (653)    −132                .213

The p values of the matched-sample t-tests comparing the horizontal and vertical RTs are also shown.

Table 5
Mean response time (in ms) and its standard deviation (SD) as a function of prime type (H, horizontal; V, vertical), target type (before/after, earlier/later), and time unit (day, month, and season) from Experiment 3

                  Mean RT (SD)
                  H             V             Difference of V–H   p of t-test
Before/after
  Day             1579 (415)    1559 (441)     −20                .312
  Month           1426 (371)    1326 (217)    −100                .631
  Season          1534 (242)    1462 (555)     −72                .535
  All             1513 (344)    1449 (424)     −64                .292
Earlier/later
  Day             1577 (277)    1488 (364)     −89                .357
  Month           1347 (302)    1474 (369)     127                .844
  Season          1812 (526)    1765 (537)     −47                .692
  All             1579 (419)    1575 (438)      −4                .978

The p values of the matched-sample t-tests comparing the horizontal and vertical RTs are also shown.

Table 6
Mean response time (in ms) and its standard deviation (SD) as a function of prime type (H, horizontal; V, vertical), target type (before/after, earlier/later), and time unit (day, month, and season) from Experiment 4

                  Mean RT (SD)
                  H             V             Difference of V–H   p of t-test
Before/after
  Day             1947 (634)    2085 (841)     138                .506
  Month           1907 (618)    1937 (976)      30                .880
  Season          2018 (750)    1970 (659)     −48                .756
  All             1957 (659)    1997 (822)      40                .755
Earlier/later
  Day             1904 (707)    1773 (693)    −131                .256
  Month           1721 (490)    1622 (530)     −99                .492
  Season          1982 (576)    2018 (684)      36                .791
  All             1869 (597)    1804 (649)     −65                .531

The p values of the matched-sample t-tests comparing the horizontal and vertical RTs are also shown.
2.3. Experiment 3
2.3.1. Method
One possible explanation of why the response times tended to be slightly longer
when the target sentences followed the horizontal primes than the vertical primes
is that there was no delay between the response to the second prime picture and
the presentation of the target sentence. Because the horizontal pictures tended to
be harder to process than the vertical pictures (as verified by a separate analysis of
their corresponding response times), the participants might still be pondering about
their response to the horizontal picture when the target sentence appeared. This
would have caused a delay in their response to the target sentence.
To avoid this, we inserted the pound sign before every prime picture and every
target sentence. The participants needed to press the space bar to receive the picture
or the sentence. The experiment continued to use Chinese participants (N = 10, similarly recruited as in Experiment 2) and Chinese sentences and instructions.
2.3.2. Results and discussion
The average proportions of outliers (RTs > 4000 ms) and errors were 0.5% and
4%. The results shown in Table 5 display a similar pattern to those of Experiments
1 and 2. Thus, having ruled out the potential methodological problem, Experiment 3
still failed to replicate Boroditsky’s (2001) results.
2.4. Experiment 4
2.4.1. Method
In the last experiment, we rearranged the two objects in the horizontal pictures so
that they were aligned vertically. Two approaching lines extending from bottom to
top bordered the two objects so that the one at the top appeared in the front and
the one at the bottom appeared in the back (the linear perspective). This arrangement made the horizontal pictures more comparable to the vertical pictures in terms
of the visual angles. The rest of the method was the same as that of Experiments 2 and
3. Eighteen Chinese participants were recruited from the same subject pool.
2.4.2. Results
The average proportions of outliers (RTs > 5000 ms) and errors were 1.5% and
8%. As Table 6 shows, although there were some conditions in which the response
times were faster when the target sentences followed the horizontal pictures, overall
they were not. The only significant effects were the main effect of target type and the
main effect of time unit. The main effect of prime type was not significant; neither
were the interactions involving this factor.
3. General discussion
Boroditsky (2001) observed that whereas English monolinguals tended to think
about time horizontally, Chinese–English bilinguals tended to think about time vertically even when they did it in English. She attributed this vertical bias in the Chinese–English bilinguals to the fact that the Chinese language uses the vertical spatial
metaphors (in addition to the horizontal metaphors) to express time, while the English language uses only the horizontal metaphors. Boroditsky concluded that the language one uses can have a profound effect on one’s habitual thinking.
We found that the use of the horizontal spatial metaphors in Chinese to express
time was actually more frequent than the use of the vertical spatial metaphors. Moreover, we were unable, in four attempts, to replicate the results of Boroditsky’s experiment. The effect sizes of the vertical bias with respect to the earlier/later questions
for the Chinese participants were generally small so that extremely large sample sizes
would have been required to detect them with a reasonable power (see Table 7). The
English participants displayed a much larger effect size, but in the direction opposite
to Boroditsky’s prediction.
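To see why such small effect sizes imply very large samples, the required sample size for a matched-sample comparison can be approximated from the effect size alone. The sketch below is not the procedure used to produce Table 7 (the text does not name the software or the exact method). It assumes a two-sided test at alpha = .05 and uses the normal approximation n ≈ ((z_{1−α/2} + z_{power}) / d)², with the hypothetical function name `required_sample_size`.

```python
import math
from statistics import NormalDist


def required_sample_size(effect_size, alpha=0.05, power=0.80):
    """Approximate n for a paired comparison via the normal approximation.

    Assumes a two-sided test; a t-based calculation gives slightly larger n.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / effect_size) ** 2)


# Effect sizes for the earlier/later questions, as listed in Table 7.
effect_sizes = {
    "Experiment 1, Chinese": 0.062,
    "Experiment 1, English": 0.432,
    "Experiment 2": 0.288,
    "Experiment 3": 0.009,
    "Experiment 4": 0.151,
}
approx_n = {label: required_sample_size(d) for label, d in effect_sizes.items()}
for label, n in approx_n.items():
    print(f"{label}: about {n} participants")
```

Under this approximation the estimates fall close to, though slightly below, the required sample sizes in Table 7, which is the expected direction when a normal rather than a t distribution is used. The observed power values in Table 7 are not reproduced here, since they depend on details of the test that the text does not specify.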
It may appear to the readers that despite the null results of the statistical analyses,
our Chinese data seem to present a trend that is consistent with Boroditsky’s prediction. This impression is not supported by a careful examination of the relevant data.
The trend is there only when different time units are lumped together. But, since time
unit is clearly a critical variable here, it is inappropriate to combine the data across
time units. If we focus on ‘month,’ the time unit that was used in Boroditsky (2001),
we find that Experiments 1 and 4 displayed a trend of vertical bias, while Experiments 2 and 3 displayed a trend of horizontal bias. Thus, our data, when examined
with care, cannot be taken as consistent with Boroditsky’s prediction.
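The claim about ‘month’ can be verified directly from the mean RTs reported for the Chinese participants in Tables 3–6 (earlier/later targets, month row). The sketch below simply subtracts the horizontal mean from the vertical mean for each experiment; a negative difference means faster responses after vertical primes (a vertical bias). The function and variable names are hypothetical.

```python
# (H mean RT, V mean RT) in ms for 'month', earlier/later targets,
# Chinese participants, copied from Tables 3-6.
MONTH_EARLIER_LATER = {
    "Experiment 1": (3194, 3103),
    "Experiment 2": (1735, 1743),
    "Experiment 3": (1347, 1474),
    "Experiment 4": (1721, 1622),
}


def prime_bias(h_rt, v_rt):
    """Return the V-H difference and the direction of the (descriptive) bias."""
    diff = v_rt - h_rt
    if diff < 0:
        return diff, "vertical"
    if diff > 0:
        return diff, "horizontal"
    return diff, "none"


month_trends = {exp: prime_bias(h, v) for exp, (h, v) in MONTH_EARLIER_LATER.items()}
for exp, (diff, direction) in month_trends.items():
    print(f"{exp}: V-H = {diff} ms, {direction} bias")
```

The differences are −91, 8, 127 and −99 ms, so Experiments 1 and 4 lean vertical and Experiments 2 and 3 lean horizontal, as stated above. These are descriptive trends only; none of the corresponding t-tests in the tables is significant.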
There was a concern that our participants might not have processed the primes
because the answers fell in a fixed pattern. This could explain our null results of spatial priming. We ran an analysis of the error rates and the RTs of the participants’
responses to the prime questions. The error rates were, on the average, comparable
or higher for the prime questions than for the target questions. The RTs were also on
the average longer for the primes than for the targets (see Table 8). If the fixed pattern of the answers for the prime questions had encouraged the participants to skip
processing the primes, the error rates would have been close to zero, and the reaction
times would have been much shorter. Thus, there is no indication that the participants in our study skipped processing the prime questions.

Table 7
The results of the power analysis testing the vertical bias for the earlier/later questions

                        Initial sample size   Effect size   Power (%)   Required sample size to achieve 80% power
Experiment 1, Chinese   25                    0.062          4.6        2050
Experiment 1, English   14                    0.432         29.8        45
Experiment 2            20                    0.288         21.6        97
Experiment 3            10                    0.009          2.6        100000
Experiment 4            18                    0.151          8.0        350

The effect size is the standardized mean difference of the vertical RT minus the horizontal RT, which
represents the vertical bias. The alpha level of the significance test was set at .05.

Table 8
Error rates and the mean RTs for the primes and the targets

                        Error rate                    Mean RT
                        Prime 1   Prime 2   Target    Prime 1   Prime 2   Target
Experiment 1, Chinese   0.37      0.32      0.06      3278      2929      3114
Experiment 1, English   0.05      0.03      0.06      2998      2364      2639
Experiment 2            0.14      0.12      0.07      2456      2204      1990
Experiment 3            0.04      0.05      0.04      2075      1901      1553
Experiment 4            0.15      0.15      0.06      2802      2464      1953
In sum, the two parts of the study led us to conclude that Chinese speakers do not
conceptualize time differently than English speakers. This conclusion, however, must
be limited to the way time is expressed spatially. Whether Chinese and English
speakers might differ in other ways of conceptualizing time remains an open question; so does the linguistic relativity claim.
One lesson that must be learned from this investigation and Boroditsky’s is that
researchers can reach erroneous conclusions when they examine a cross-language
issue but do not have competent knowledge about the languages they examine.
The controversy over the issue of counterfactual reasoning serves as a case in point
(Au, 1983; Bloom, 1981). Collaborations involving native speakers in this type of
research are strongly advised.
Appendix A. Supplementary data
Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.cognition.2006.09.012.
References
Au, T. K-F. (1983). Chinese and English counterfactuals: the Sapir–Whorf hypothesis revised. Cognition,
15, 155–187.
Bloom, A. H. (1981). The linguistic shaping of thought: A study in the impact of language on thinking in
China and the west. Hillsdale, NJ: Lawrence Erlbaum Associates.
Boroditsky, L. (2001). Does language shape thought? Mandarin and English speakers’ conceptions of
time. Cognitive Psychology, 43(1), 1–22.