What is Corpus Linguistics?

NUT Workshop 2016
Corpus Linguistics in the Classroom
Neill Wylie & Denise McAllister
Maastricht University Language Centre
Who we are…
Neill Wylie
-MA TESOL
-Sociolinguistics
-Male v female communication
-Blended Learning in Academic English
Denise McAllister
-MA TESOL
-Academic Writing course coordination
-Intercultural Studies
-International Classroom, IntlUni
Language Centre
Format of workshop
• Activate your knowledge!
• Definition of Corpus Linguistics
– Types of corpora
– Why use corpus-informed methods?
– Student benefits
– Application of corpus to classroom
• Constructing a specialised corpus
• Summary: How can using a corpus benefit your institution?
On your smartphone / laptop / tablet:
Go to
app.gosoapbox.com
Enter access code:
nut
What is Corpus Linguistics?
• The study of language as expressed in samples of
‘real world’ text
What is Corpus Linguistics?
“…a method for finding out about language use which
involves the interrogation of large, electronically-stored
and rapidly-searchable collections of texts”
What is Corpus Linguistics?
Types of corpora:
1. General Corpora – 100 million to a billion words.
Big and diverse, to be representative of language as a whole
A typical book (academic text or novel) contains around 100,000 words
2. Specialised Corpora
Purpose built to focus on a particular type of text,
writer or speaker
What is Corpus Linguistics?
Additional types of corpora:
3. Multilingual Corpora
English & Spanish; American English & Indian English
4. Parallel Corpus
English & Spanish translated (CRATER)
5. Learner Corpus
International Corpus of Learner English
6. Historical Corpus
Helsinki Corpus: 1.5 million words of text from AD 700 to AD 1700
7. Monitor Corpus
Continually updated, e.g. Bank of English
Important English Corpora Websites
• British National Corpus (BNC) 100 million words
Important English Corpora Websites
• Corpus of Contemporary American English
(COCA) 520 million words
Example of Foreign Language Corpora
• Corpus del Español 100 million words
Additional Corpora of Interest
• Hansard corpus:
– British parliament 1803 – 2005, 1.6 billion words
• Wikipedia corpus:
– -2014, 1.9 billion words
• Global Web-Based English (GloWbE):
– 2012-2013, 1.9 billion words across 20 countries
• Corpus of American Soap Operas:
– 2001 – 2012, 100 million words
Why use a corpus?
• Discover tendencies
– what’s normal/typical in real-life language?
• Rare or exceptional cases
– Reveals what we wouldn’t know from looking at single texts
or from introspection
• Speed and accuracy
– Human researchers make mistakes and are slow
Application to the classroom
• Suitable for both EAP and general English contexts
• Syllabus design
• Materials development
• Classroom activities
The benefit of such student-centred
discovery learning:
• Access to facts of authentic language use
– comes from real contexts not constructed for pedagogical
purposes
• Challenges students
– to construct generalisations
– to note patterns of language behaviour
• Makes students more aware of language use
The benefit of such student-centred
discovery learning:
Students may be able to determine:
– different meanings and uses of common words
– useful phrases and typical collocations
– the structure and nature of written and spoken discourse
– where certain language features are more typical
Authentic Examples of the use of
Corpus Linguistics in the Classroom
Examples of classroom application (1)
Student Example:
Use of ‘Aftermath’ by PhD Student
“In the aftermath of the Vatican Council, the Catholic
Church began to restructure its hierarchy of bishops…”
The sentence seems correct in terms of language,
structure and grammar…
Examples of classroom application
“In the aftermath of the Vatican Council, the Catholic
Church began to restructure its hierarchy of bishops…”
However, a native speaker might find the inclusion of
aftermath somewhat out of place…
…but may not know exactly why.
Using COCA to Explain
• Search for aftermath
• Choose Academic context
‘Aftermath’
• Instances in ACADEMIC in COCA
Using COCA to Explain
• Nearly every concordance line describes a negative
situation, suggesting that aftermath is used
in close proximity to:
– catastrophe
– disaster
– misfortune
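The kind of concordance view used above can be approximated offline. The following is a minimal sketch of a keyword-in-context (KWIC) search in Python; the function name `kwic`, the `width` parameter and the sample sentences are invented for illustration and are not COCA data or part of any COCA interface.

```python
import re

def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per match of `keyword` in `text`."""
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on either side of the keyword.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines

# Invented sample text (an assumption for this sketch, not corpus data).
sample = (
    "Aid agencies struggled in the aftermath of the earthquake. "
    "In the aftermath of the war, the economy collapsed. "
    "The town was rebuilt after the flood."
)

for line in kwic(sample, "aftermath"):
    print(line)
```

Reading down the aligned keyword column is what lets students spot the negative contexts for themselves.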
Examples of classroom application (2)
“James was not looking forward to his
impending birthday”
Is there something wrong here?
Search for ‘Impending’ in COCA
• Frequency: 2584
• Most common collocates?
• Collocates: words which occur most frequently together
(e.g. commit a crime, back – front)
Search for ‘Impending’ in COCA
• Frequency: 2584
• Most common collocates?
• danger
• doom
• anniversary
• famine
• suicide mission
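A collocate list like the one above can be approximated with a simple count of the words that follow the search word. This is a minimal sketch only: the function name `right_collocates`, the `window` parameter and the sample text are assumptions made for illustration, and raw counts are a simplification, since corpus tools typically also offer statistical association measures.

```python
import re
from collections import Counter

def right_collocates(text, node, window=2):
    """Count the words that appear up to `window` tokens after `node`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            # Collect the next `window` tokens to the right of the node word.
            counts.update(tokens[i + 1:i + 1 + window])
    return counts

# Invented sample text (an assumption for this sketch, not corpus data).
sample = (
    "They sensed impending doom. Villagers fled the impending danger. "
    "Nobody spoke of the impending doom."
)

collocates = right_collocates(sample, "impending", window=1)
print(collocates.most_common(3))
```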
What can Corpus Linguistics tell us
about the use of ‘impending’?
Use of ‘Impending’
Impending
• carries a negative semantic prosody
• is normally followed by a noun
• looks out of place next to an otherwise neutral /
positive noun such as birthday.
• can be used in the grammatical context of the
example
Examples of classroom application (3)
‘Barrier to learning’ vs. ‘Barrier for
learning’
“…which plays a deactivating role and might
be a barrier for learning”
What does COCA say?
Examples of classroom application
• Enter the search word
• Choose prepositions
• Choose academic
Examples of classroom application
Barrier + to
Examples of classroom application
Barrier + for
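The same word + preposition comparison can be run on any collection of texts. The following is a minimal sketch under stated assumptions: the function name `following_prepositions`, the small `PREPOSITIONS` set and the sample sentences are all invented for illustration, and the counts say nothing about the COCA results.

```python
import re
from collections import Counter

# A small, hand-picked preposition list (an assumption; not exhaustive).
PREPOSITIONS = {"to", "for", "of", "in", "on", "at", "with", "against", "between"}

def following_prepositions(text, word):
    """Count which prepositions directly follow `word` in `text`."""
    tokens = re.findall(r"[a-z]+", text.lower())
    pairs = zip(tokens, tokens[1:])
    return Counter(nxt for cur, nxt in pairs if cur == word and nxt in PREPOSITIONS)

# Invented sample text (an assumption for this sketch, not corpus data).
sample = (
    "Cost is a barrier to learning. Anxiety can be a barrier to participation. "
    "This might be a barrier for learning."
)

counts = following_prepositions(sample, "barrier")
print(counts.most_common())
```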
Example of classroom application (4)
‘Aims at’ vs ‘aims to’
Both have very similar functions in similar
contexts…
…so they are interchangeable, right?
‘Aims at’ vs ‘aims to’
What does this tell us?
• 10 times more likely to use/come across ‘aims + at’
• Doesn’t make ‘aims + to’ wrong
• Descriptive method of investigation – not prescriptive
‘Aims at’ vs ‘aims to’
Aims + at + -ing form
‘Aims at’ vs ‘aims to’
Aims + to + verb/adverb + verb/noun/adjective + noun
Example of classroom application (5)
“Stress is a risk factor of several psychiatric disorders
such as post-traumatic stress disorder and
depression.”
Anything amiss here?
Search query:
risk + prep ALL Academic settings
Factor + of = 802
Factor + for = 621
Factor + prep
What do you notice about what comes
after factor + of?
Factor + prep
What do you notice about what comes
before and after factor + for?
Register free @
http://corpus.byu.edu/coca/
Now it’s your turn…
Search for prepositions used with ‘welcome’
Now it’s your turn…
Search for the most common lexical verb in spoken English
Constructing a specialised corpus
Corpus Building Tools
• Wordsmith Tools
– http://www.lexically.net/wordsmith/
• MonoConc Pro
– http://monoconc.com/
• AntConc (free)
– http://www.laurenceanthony.net/software.html
AntFileConverter
(Converts texts from PDF and Word to text file)
In AntFileConverter
• Open AntFileConverter and click File
• Select the folder where the files are stored
• Change the file type (e.g. to PDF, Word, All files)
• Select the files you want to convert
• Click Start
• The files convert to .txt
• A new txt sub-folder is created in the folder, containing the converted .txt files
Constructing the corpus
• Run AntConc and click File
• Select Open File
• Open the txt sub-folder
• Choose the .txt files that you want to include in the corpus and click Open
• Your corpus is complete!
• Now start your search!
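For those who prefer scripting, a folder of converted .txt files can also be loaded and searched without AntConc. The following is a minimal sketch only: the function names `load_corpus` and `word_frequencies` and the folder name `txt` are assumptions for illustration, and this is not how AntConc itself works internally.

```python
import re
from collections import Counter
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in `folder` and return {file name: text}."""
    return {
        path.name: path.read_text(encoding="utf-8", errors="ignore")
        for path in sorted(Path(folder).glob("*.txt"))
    }

def word_frequencies(corpus):
    """Count lower-cased word forms across all texts in the corpus."""
    counts = Counter()
    for text in corpus.values():
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

# Example use, assuming the converted files sit in a sub-folder named "txt":
# corpus = load_corpus("txt")
# print(len(corpus), "files loaded")
# print(word_frequencies(corpus).most_common(20))
```

A frequency list like this is a rough equivalent of a word list in a concordancer and is a reasonable first check that the corpus loaded as expected.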
More uses..?
Teacher’s role as researcher facilitator
How can Corpus Linguistics benefit
your institution?
• Tailored approach to language learning
• Create authentic learning materials
• Aids in syllabus design
• Student-centred approach to learning
• Promotes autonomous learning
• User-friendly
• Cost-effective