
Natural Languages
© 2007
Language
• Definition of Language
– In math and computer science:
• A lexicon & rules for combining terms from the
lexicon
– In common use:
• Structured verbal interaction between people
• Any structured interaction such as “The
Language of Film”
• Are computer languages a model for
human natural language?
Wide Variability among
Natural Languages
• Sentence Structure
– SVO (Subject-Verb-Object) (English, Chinese)
– VSO (Verb-Subject-Object) (Gaelic/Celtic)
– SOV (Subject-Object-Verb) (Hindi, Japanese, Hopi)
• Written
– Ideographic (Chinese)
– Syllabic (Japanese kana) or alphasyllabic (Thai)
– Alphabetic (English)
• Spoken
– Tonal (Chinese)
– Non-tonal (English)
Layers of Natural Language
• Words
– Morphology, Orthography, Phonetics, Phonology
– Words are categorized into parts of speech
• Syntax
– Phrase and sentence structure based on parts of speech
• Semantics
– Literal meaning
• Pragmatics/Discourse
– Uses beyond the literal meaning
Grammars
• Grammars are most often associated with modeling
syntax, though semantic grammars are also possible.
In the broadest sense, grammars are rules for
languages.
• The most widely used grammars are “context-free”. That
is, each rule rewrites a single symbol regardless of the
context in which that symbol appears.
• The grammars used for syntax are usually “constituent
grammars”. That is, they identify the relationships among
the components (constituents) of a phrase.
• Grammars taught in grade school are “prescriptive”
grammars. Grammars in the formal analysis of
language are “descriptive” and usually “generative”.
• Grammars are usually defined by rules, but statistical
transition networks are also used to model the
structure of language.
Modeling Natural Language
Syntax with Grammars
• Rewrite (or production) rules (phrase-structure grammar)
• A very simple example of rewrite rules
– S → NP + VP
– NP → N, Adj + N
– VP → V, V + NP
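To make the rules concrete, here is a minimal Python sketch, offered as an illustration rather than a standard tool, that stores the three rules as a dictionary and enumerates every part-of-speech sequence they generate. The names `GRAMMAR` and `expand` are hypothetical. Because these particular rules are not recursive, the language is finite: two noun-phrase shapes times three verb-phrase shapes gives six sentence patterns.

```python
from itertools import product

# The slide's rewrite rules. Each alternative right-hand side is a tuple;
# symbols with no rule of their own (N, Adj, V) are parts of speech.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("N",), ("Adj", "N")],
    "VP": [("V",), ("V", "NP")],
}

def expand(symbol):
    """Return every part-of-speech sequence that `symbol` can be rewritten into."""
    if symbol not in GRAMMAR:  # a part of speech rewrites only to itself
        return [(symbol,)]
    results = []
    for rhs in GRAMMAR[symbol]:
        # Expand each right-hand-side symbol, then combine the choices in order.
        for parts in product(*(expand(s) for s in rhs)):
            results.append(tuple(tag for part in parts for tag in part))
    return results

for sequence in expand("S"):
    print(" ".join(sequence))
# N V
# N V N
# N V Adj N
# Adj N V
# Adj N V N
# Adj N V Adj N
```

A recursive rule such as NP → Adj + NP would make `expand` call itself forever, so a generator for a realistic grammar would bound the depth or sample sentences instead of enumerating them all.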
Parsing
• Can we identify the grammatical structure of a given statement?
• Parsing is the basis of syntax checking for computer program
compilers.
• A parse tree shows the structure of a given statement, given
– a lexicon with parts-of-speech
– a grammar
• A very simple sample parse tree (bracketed form below) has
a Verb Phrase with a Direct Object. This Direct Object is itself
a Noun Phrase.
– [S [NP Adj N] [VP V [NP Adj N]]]
• Difficulties: Garden path sentences
– “The man who hunts ducks out on weekends”
– A reader first takes “ducks” as the object of “hunts”, then
must reparse it as part of the verb “ducks out”.
• Many algorithms have been developed for parsing.
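One of the simplest parsing algorithms is recursive descent with backtracking. The sketch below applies it to the toy grammar from the rewrite-rule slide; all names, including the small `LEXICON` of example words, are hypothetical. The lexicon maps each word to a set of parts of speech, so an ambiguous word such as "fish" can serve as either a noun or a verb.

```python
# Toy grammar from the rewrite-rule slide: S -> NP VP, NP -> N | Adj N, VP -> V | V NP.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["N"], ["Adj", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

# Hypothetical lexicon: each word maps to the set of parts of speech it can take.
LEXICON = {
    "young": {"Adj"}, "red": {"Adj"},
    "dogs": {"N"}, "balls": {"N"},
    "chase": {"V"}, "fish": {"N", "V"},
}

def parse(symbol, words, i):
    """Yield (tree, next_index) for every way `symbol` can start at words[i]."""
    if symbol not in GRAMMAR:  # a part of speech: match exactly one word
        if i < len(words) and symbol in LEXICON[words[i]]:
            yield (symbol, words[i]), i + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from parse_sequence(rhs, words, i, [symbol])

def parse_sequence(rhs, words, i, tree):
    """Match the symbols in `rhs` left to right, backtracking on failure."""
    if not rhs:
        yield tuple(tree), i
        return
    for child, j in parse(rhs[0], words, i):
        yield from parse_sequence(rhs[1:], words, j, tree + [child])

def parse_sentence(sentence):
    """Return every parse tree that covers the whole sentence."""
    words = sentence.split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]

def bracket(tree):
    """Render a tree as [Label child ...]; leaves look like [N dogs]."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return f"[{label} {children[0]}]"
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"

for tree in parse_sentence("young dogs chase red balls"):
    print(bracket(tree))
# [S [NP [Adj young] [N dogs]] [VP [V chase] [NP [Adj red] [N balls]]]]
```

Unknown words raise a `KeyError` here. Backtracking recursive descent is easy to read, but it can take exponential time on highly ambiguous input and never terminates on left-recursive rules such as NP → NP + PP. Chart-based algorithms such as CKY and Earley reuse partial results and avoid both problems.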
Psycholinguistics
• What do we know about how people
process and learn language?
• Are all languages context-free?
• Language learning
– Children sometimes seem to over-apply
rules, as in “I goed to the store.”
• Competence vs. performance
• Transformational grammars are a model
that allows re-arrangement of structure.
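As a toy illustration only, and not a model of how children actually learn, the sketch below applies a general "add -ed" rule unless a verb appears in a memorized list of irregular forms. Leaving "go" out of that list reproduces the over-applied form from the slide. The function name and both word lists are hypothetical.

```python
def past_tense(verb, irregulars):
    """Use a memorized irregular form if known; otherwise apply the regular -ed rule."""
    if verb in irregulars:
        return irregulars[verb]
    if verb.endswith("e"):  # bake -> baked
        return verb + "d"
    return verb + "ed"      # walk -> walked, and (over-applied) go -> goed

ADULT_IRREGULARS = {"go": "went", "eat": "ate", "run": "ran"}
CHILD_IRREGULARS = {"eat": "ate"}  # "go" -> "went" not yet memorized

print(past_tense("go", ADULT_IRREGULARS))  # went
print(past_tense("go", CHILD_IRREGULARS))  # goed
```

The regular rule here is deliberately simplified; it ignores spelling changes such as "stop" → "stopped" or "cry" → "cried".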
Modeling Syntax with
Statistical Models
• While most grammars are a rule-based
representation, a statistical
representation of language may
capture structure more flexibly.
• In particular, Markov models can
describe the transitions between different
parts of speech. For instance, nouns
are often followed by verbs, but
adjectives are rarely followed by verbs.
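A first-order Markov model over parts of speech can be estimated simply by counting transitions. The sketch below assumes a tiny, invented corpus of tag sequences (`CORPUS` and the `<s>`/`</s>` boundary markers are hypothetical) and computes P(next tag | previous tag). In this data, nouns are followed by verbs two times out of three, and adjectives are never followed by verbs.

```python
from collections import Counter, defaultdict

# Hypothetical training data: part-of-speech sequences, one per sentence.
CORPUS = [
    ["N", "V"],
    ["N", "V", "N"],
    ["Adj", "N", "V"],
    ["Adj", "N", "V", "Adj", "N"],
]

def train(corpus):
    """Estimate transition probabilities P(next | prev) by relative frequency."""
    counts = defaultdict(Counter)
    for tags in corpus:
        seq = ["<s>"] + tags + ["</s>"]  # mark sentence start and end
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for prev, nexts in counts.items()
    }

def sequence_probability(model, tags):
    """Multiply the transition probabilities; unseen transitions count as 0."""
    seq = ["<s>"] + tags + ["</s>"]
    p = 1.0
    for prev, nxt in zip(seq, seq[1:]):
        p *= model.get(prev, {}).get(nxt, 0.0)
    return p

MODEL = train(CORPUS)
print(round(MODEL["N"]["V"], 3))                                   # 0.667
print(round(sequence_probability(MODEL, ["Adj", "N", "V"]), 4))    # 0.1667
print(sequence_probability(MODEL, ["Adj", "V"]))                   # 0.0
```

Assigning zero probability to every unseen transition is too harsh for real text; practical taggers smooth these estimates so that rare but valid sequences are not ruled out.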
Words
• What exactly is a word?
– Sail-boat, Pennsylvania, 555-1212, F-16
• Definitions of words
– Why aren’t the definitions of words in dictionaries
all the same?
– Are exact definitions of words possible?
• Across time, across groups
– Words evolve in meaning
• Sometimes by radial categories (that is, often by
metaphor)
• What is the relationship between concepts
and words?
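Even deciding where words begin and end requires a choice of rule. The sketch below tokenizes the slide's own examples under two plausible rules, both hypothetical, and shows that they disagree on how many words there are.

```python
import re

# The slide's examples of strings whose status as "words" is debatable.
TEXT = "Sail-boat, Pennsylvania, 555-1212, F-16"

def tokenize_alnum(text):
    """Rule A: a word is a maximal run of letters or digits, so hyphens split words."""
    return re.findall(r"[A-Za-z0-9]+", text)

def tokenize_hyphenated(text):
    """Rule B: runs of letters or digits joined by single hyphens form one word."""
    return re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

print(tokenize_alnum(TEXT))
# ['Sail', 'boat', 'Pennsylvania', '555', '1212', 'F', '16']
print(tokenize_hyphenated(TEXT))
# ['Sail-boat', 'Pennsylvania', '555-1212', 'F-16']
```

Neither rule is "correct". Rule B keeps "F-16" intact, but it would also merge a hyphenated range such as "1990-1995" into a single token.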
Tools beyond Traditional Dictionaries:
WordNet and FrameNet
• WordNet http://wordnet.princeton.edu/
– Shows hierarchical relationships for dictionary
terms. Very loosely, this can be thought of as an
ontology.
• FrameNet http://framenet.icsi.berkeley.edu/
– Describes semantic frames: a word, often a verb, evokes a
situation whose roles relate the concepts involved. For
instance, “to give” implies that there is a gift, a gifter,
and a giftee.
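The two ideas can be sketched with toy data structures. This is an illustration under stated assumptions: the hypernym ("is-a") table below is hand-written and is not taken from WordNet, and the frame is a plain tuple of the role names used on the slide, not FrameNet's own frame definition. All names are hypothetical.

```python
# Toy, hand-written hypernym ("is-a") table in the spirit of WordNet.
# These entries are illustrative assumptions, not WordNet data.
HYPERNYM = {
    "sailboat": "boat",
    "boat": "vessel",
    "vessel": "vehicle",
    "vehicle": "artifact",
}

def hypernym_chain(word):
    """Follow is-a links upward until reaching a word with no recorded parent."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain

# Toy frame in the spirit of FrameNet: "to give" evokes a situation with three
# roles. The role names follow the slide, not FrameNet's own labels.
GIVE_FRAME = ("gifter", "gift", "giftee")

def fill_frame(roles, **fillers):
    """Bind each role to a phrase; raise if a role is unknown or missing."""
    unknown = set(fillers) - set(roles)
    missing = set(roles) - set(fillers)
    if unknown or missing:
        raise ValueError(f"unknown roles {unknown}, missing roles {missing}")
    return {role: fillers[role] for role in roles}

print(" -> ".join(hypernym_chain("sailboat")))
# sailboat -> boat -> vessel -> vehicle -> artifact
print(fill_frame(GIVE_FRAME, gifter="Bill", gift="a book", giftee="Mary"))
# {'gifter': 'Bill', 'gift': 'a book', 'giftee': 'Mary'}
```

To work with the real WordNet data instead, NLTK's `nltk.corpus.wordnet` module provides synsets along with their `hypernyms()`.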
Semantics
• Very different statements can have similar
semantics.
• The semantics of statements in a computer
programming language (i.e., a program) can
be determined from its behavior.
• The semantics of natural language is often
judged by the meaning and relationship of the
components. Subjective and contextualized
meaning is treated as pragmatics, which
we will discuss later.
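The point about program semantics can be illustrated with two hypothetical functions that look very different (a loop and a closed-form formula) but behave identically. The helper `same_behavior` judges them only by their outputs. Agreement on a finite set of test inputs is evidence of equivalence, not a proof of it.

```python
def sum_loop(n):
    """Add 1 + 2 + ... + n one term at a time."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_formula(n):
    """Compute the same sum with the closed form n(n + 1) / 2."""
    return n * (n + 1) // 2

def same_behavior(f, g, inputs):
    """Judge two programs by behavior: compare their outputs input by input."""
    return all(f(x) == g(x) for x in inputs)

print(same_behavior(sum_loop, sum_formula, range(1000)))  # True
```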
Representing Semantics
• Semantic grammar
– Even with different surface structure, can
we develop a standard representation for
the meaning?
• Interlingua
– A common mediator for meaning across
languages. This could be useful for
translation.
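A very small sketch of the idea maps sentences with different surface structures (active, dative-shifted, passive) onto one standard predicate-argument form, `("give", giver, gift, recipient)`. The three regular-expression patterns and the tuple layout are hypothetical and handle only these exact shapes. They are not a real semantic grammar, but they show why a shared representation could serve as a mediator, for example in translation.

```python
import re

# Hypothetical surface patterns, each mapped to the same canonical frame.
PATTERNS = [
    re.compile(r"(?P<giver>\w+) gave (?P<recipient>\w+) (?P<gift>the \w+)"),
    re.compile(r"(?P<giver>\w+) gave (?P<gift>the \w+) to (?P<recipient>\w+)"),
    re.compile(r"(?P<gift>the \w+) was given to (?P<recipient>\w+) by (?P<giver>\w+)"),
]

def to_meaning(sentence):
    """Return ("give", giver, gift, recipient) for a recognized sentence, else None."""
    s = sentence.rstrip(".").lower()
    for pattern in PATTERNS:
        m = pattern.fullmatch(s)
        if m:
            return ("give", m["giver"], m["gift"], m["recipient"])
    return None

for s in ["Bill gave Mary the book.",
          "Bill gave the book to Mary.",
          "The book was given to Mary by Bill."]:
    print(to_meaning(s))
# ('give', 'bill', 'the book', 'mary')  (printed three times)
```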
Pragmatics:
Social Uses of Language
Referential
• Conveys information about some real phenomenon
• This is what we think about as normal language use
Expressive
• Describes feelings of the speaker
Conative
• Attempts to elicit some behavior from the addressee
Phatic
• Builds a relationship between both parties in a conversation
Meta-lingual
• Refers to language itself (e.g., explaining what a word means)
Poetic
• Focuses on the text itself, independent of reference
from R. Jakobson
Discourse
• Sentences form macro-structures or superstructures of meaning. This includes
structured language such as argumentation,
negotiation, news, narrative, and
explanations.
• What are the components (elements) and
structure of discourse? For instance, how can
messages be structured to make them clear for
listeners?
• Given-New
– Bill (given: a person you know) went to the store (new: he is now in a new location)
• Theme-Rheme
– When in Rome (theme), do as the Romans do (rheme)
Argumentation
• Toulmin has proposed a general structure for
arguments
– Elements shown in the diagram: Grounds, Claim, Evidence, Rebuttal
• There are a lot of complex structured verbal
interactions
– Legal arguments
– Design rationale
– Negotiations
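One way to treat such structured interactions computationally is to store each argument as a record. The sketch below is a hypothetical data structure whose field names follow the elements in the slide's diagram (claim, grounds, evidence, rebuttal). The linking words used by `render` ("so", "unless") are an illustrative choice, not part of Toulmin's model, and the example argument is invented.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Argument:
    """An argument built from the element names in the slide's diagram."""
    claim: str
    grounds: str
    evidence: List[str] = field(default_factory=list)
    rebuttal: Optional[str] = None

    def render(self) -> str:
        """Spell the argument out; the linking words are an illustrative choice."""
        text = f"{self.grounds}, so {self.claim}"
        if self.evidence:
            text += " (evidence: " + "; ".join(self.evidence) + ")"
        if self.rebuttal:
            text += f", unless {self.rebuttal}"
        return text + "."

arg = Argument(
    claim="the bridge needs repair",
    grounds="The inspection found cracks",
    evidence=["photos of the main span"],
    rebuttal="the cracks are only cosmetic",
)
print(arg.render())
# The inspection found cracks, so the bridge needs repair (evidence: photos of the main span), unless the cracks are only cosmetic.
```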
Explanations and Causation
• What an explanation consists of
– Two types of phenomena being explained
• Causal antecedents
– How do we explain the American Civil War?
• Sub-processes
– How does a gasoline engine work?
– Background for the person receiving the
explanation needs to be considered.
Stories and Narrative
• (Goals + Events + Resolution) + Characters
• Many stories seem highly structured
– Some stories seem so structured that they have
been described with “story grammars”. This is most
notably true of Russian fairy tales.
• Many stories also reflect familiar human
quandaries
– “Romeo and Juliet”
• Interactive and dynamic narrative (useful in
games)
– Could we become a player in an interactive
“Romeo and Juliet”?
Conversation
• Conversation adds a social and interactive
component to language
• Conversational norms (Maxims)
• Truthful, informative, relevant, clear
• But these are routinely violated
• Managing conversations
– Opening / Closing
– Turn taking
• In Native American councils,
the person holding the talking stick
controlled the floor