Presentation

ComAComA 2014, 24 August 2014, Dublin, Ireland

A Taxonomy for Afrikaans and Dutch Compounds

Gerhard B van Huyssteen
Centre for Text Technology, North-West University, Potchefstroom, South Africa
[email protected]

Ben Verhoeven
CLiPS - Computational Linguistics, University of Antwerp, Antwerp, Belgium
[email protected]

1.  CompoNet
2.  Existing taxonomies
3.  Our taxonomy for Afrikaans and Dutch
4.  Future work

CompoNet
http://componet.sslmit.unibo.it/
Existing (amongst others)
•  English = 163
•  Dutch = 188; 56 revisions
New
•  Afrikaans = 144

Problems with 2005 annotation guidelines
Often vague
•  Only one article: Bisetto & Scalise (2005)
•  Some language-specific guidelines for some languages on website
Outdated
•  Revised taxonomy: Scalise & Bisetto (2009)
•  Applications and critiques:
   •  Lieber (2009a, 2009b)
   •  Arcodia et al. (2009)
   •  Vercellotti and Mortensen (2012)
   •  Arnaud and Renner (2014)
Insufficient for language-specific phenomena
•  Problematic for annotators

Bisetto & Scalise (2005)
[Diagram: compounds classified at the morphosemantic level (subordinate, attributive, coordinate) and at the semosyntactic level (endo, exo); a further node is labelled "other"]

subordinate (endo / exo)
•  Head-complement (argumental) relation
•  NN compounds: often of-relation
•  Also synthetic and (neo)classical compounds
•  Example: mushroom soup

attributive (endo / exo)
•  Head-modifier (non-argumental) relation
•  AN compounds
•  Example: mushroom cloud

coordinate (endo / exo)
•  Two semantic heads, one categorial head
•  and-relation
•  Example: poet-painter

endo / exo
•  Mostly semantic head
•  Example: doormat
Scalise & Bisetto (2009)
[Diagram with a morphosyntactic level added. Labels shown: SUB (ground, verbal-nexus), ATAP (attributive, appositive), COORD; each class is endo or exo]

verbal-nexus
•  Secondary compounds
•  Deverbal components
•  Example: taxi driver

ground
•  Primary compounds
•  Uninflected components
•  Example: mushroom soup

appositive
•  Non-head expresses property of head by means of noun acting as attribute
•  Often metaphorical
•  Example: atomic bomb (ATT) vs mushroom cloud
Scalise & Bisetto (2009) and Fábregas and Scalise (2012)
[Same diagram and notes as on the Scalise & Bisetto (2009) slide, with the additional labels ATTR and "real"]
Vercellotti & Mortensen (2012)
[Diagram: compounds classified by word-formation process and at the morphosemantic, morphosyntactic and semosyntactic levels. Labels shown: subordinate, attributive, coordinate, appositive, expressed predicate, unexpressed predicate, endo, exo, hierarchical, equal, "???", future research]

Quoted objections:
1.  “is unclear how many languages would need this category, given the difficulty distinguishing the category”
2.  “‘appositive’ is already in the literature as a type of coordinate compound”
Arnaud & Renner (2014)
NN composites
subordinative
relational
generic-specific
attributive
analogical
coordinative
multifunctional
equative
hybrid
additional
Van Huyssteen & Verhoeven (2014)
[Diagram: taxonomy tree with one tier per level]
•  Word-formation process: Wordformation, Compounding
•  Morphosemantic level: Subordinate compound, Attributive compound, Appositive compound, Coordinate compound
•  Morphosyntactic level: Verbal-nexus compound, Ground compound, Phrasal compound, (Neo-)classical compound
•  Semosyntactic level: Endocentric, Exocentric
•  Labels near the note "language specific/marginal": Reduplication, Derivation, Compounding, Parasynthetic compound, Separable complex verb
•  Additional label: hierarchical
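For annotation tooling, the three levels below the word-formation process can be treated as three label slots per compound. The Python sketch below is illustrative only: the class name `CompoundLabel`, the lowercase label spellings, and the simplification that any label may combine with any label of another level are assumptions, not part of the taxonomy as presented.

```python
from dataclasses import dataclass

# Labels copied from the diagram; lowercase spellings are an assumption.
LEVELS = {
    "morphosemantic": ("subordinate", "attributive", "appositive", "coordinate"),
    "morphosyntactic": ("verbal-nexus", "ground", "phrasal", "(neo-)classical"),
    "semosyntactic": ("endocentric", "exocentric"),
}


@dataclass(frozen=True)
class CompoundLabel:
    """Hypothetical container holding one label per taxonomy level."""
    morphosemantic: str
    morphosyntactic: str
    semosyntactic: str

    def __post_init__(self):
        # Reject labels that do not occur at the given level.
        for level, allowed in LEVELS.items():
            value = getattr(self, level)
            if value not in allowed:
                raise ValueError(f"{value!r} is not a {level} label")

    def path(self) -> str:
        """Return the labels as a path from the root of the tree."""
        return " > ".join(
            ["compounding", self.morphosemantic, self.morphosyntactic, self.semosyntactic]
        )


# Label values chosen for illustration, not an annotation from the source.
label = CompoundLabel("subordinate", "ground", "endocentric")
print(label.path())
```

A real annotation tool would probably restrict which combinations are allowed, since the tree does not necessarily offer every morphosyntactic type under every morphosemantic class.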
[Diagram repeated, highlighting the morphosemantic level: Subordinate compound, Attributive compound, Appositive compound, Coordinate compound]
•  Examples shown: mushroom soup, atomic bomb, mushroom cloud, chicken mayonnaise
•  Appositives differ sufficiently from coordinates and attributives
[Diagram repeated, highlighting the morphosyntactic level]
•  Kind of component
•  Category of component
Component labels shown with the compound types: deverbal (Verbal-nexus compound), uninflected (Ground compound), phrases (Phrasal compound), semi-words ((Neo-)classical compound)
[Diagram repeated, highlighting the semosyntactic level]
•  (Morpho)semantic head
•  Only two kinds of exocentric compounds
Examples:
•  knip+oog snip+eye ‘to wink’
•  rooi+kop red+head ‘ginger (derogatory)’
[Diagram repeated, highlighting phrasal compounds]
•  [[[a]Adj [b]N]NP [c]N]N   volcanic-mountain climber
•  [[[a]Num [b]N]NP [c]N]N   three-tooth giant
•  [[[a]P [b]N]PP [c]N]N   in-house expertise
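The three bracketings share one shape: a labelled phrase as non-head, followed by a nominal head. As a minimal sketch, assuming a hypothetical helper named `phrasal_compound` and the default categories shown (neither comes from the slides), the patterns can be generated from their parts:

```python
def phrasal_compound(phrase, phrase_cat, head, head_cat="N", compound_cat="N"):
    """Render a phrasal compound as a labelled bracketing.

    phrase: list of (form, category) pairs making up the phrasal non-head.
    phrase_cat: label of that phrase, e.g. "NP" or "PP".
    """
    inner = " ".join(f"[{form}]{cat}" for form, cat in phrase)
    return f"[[{inner}]{phrase_cat} [{head}]{head_cat}]{compound_cat}"


# The three examples from the slide, filled into their patterns.
examples = [
    phrasal_compound([("volcanic", "Adj"), ("mountain", "N")], "NP", "climber"),
    phrasal_compound([("three", "Num"), ("tooth", "N")], "NP", "giant"),
    phrasal_compound([("in", "P"), ("house", "N")], "PP", "expertise"),
]
for bracketing in examples:
    print(bracketing)
```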
[Diagram repeated, highlighting the parasynthetic compound]
•  Compounding through derivation
•  Example: groot+skaal-s ‘large-scale’
[Diagram repeated, highlighting the separable complex verb]
•  Well-known phenomenon in Germanic languages
•  Example: op+zoeken up+look ‘look up/search for’
[Diagram repeated, with a schema attached to the subordinate class]
subordinate
•  Form: [[x]i [y]j]k
•  Meaning: [SEMj of SEMi]k
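The schema pairs a form with a meaning through shared indices: the meaning of the second component (index j) is related by "of" to the meaning of the first (index i). A small sketch of filling it in follows; the function name `subordinate_schema` and the use of capitals to stand in for SEM are assumptions made for illustration, and mushroom soup is borrowed from the earlier slides as the example.

```python
def subordinate_schema(x, y):
    """Fill the subordinate schema with two components.

    x is the first component (index i), y the second (index j).
    Returns the form side and the meaning side as strings.
    """
    form = f"[[{x}]i [{y}]j]k"
    # Capitals stand in for SEM, the meaning of a component (an assumption).
    meaning = f"[{y.upper()} of {x.upper()}]k"
    return form, meaning


form, meaning = subordinate_schema("mushroom", "soup")
print(form)
print(meaning)
```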
Hence…
•  Rigorous/explicit annotation guidelines
•  Corrected anomalies and discrepancies
•  Formalised (Construction Morphology)
•  Terminology disambiguated and translated
•  Comprehensive for all compounds in Dutch and Afrikaans
•  All patterns in De Haas & Trommelen (1993) and Afrikaans literature
•  Illustrated what/how to adapt for specific languages
•  Next:
   •  Re-annotate data in CompoNet
   •  Future research

http://tinyurl.com/aucopro

Acknowledgements
•  Funding: Nederlandse Taalunie (Dutch Language Union), South African Department of Arts and Culture (DAC), South African National Research Foundation (NRF), European Network on Word Structure (NetWordS) (European Science Foundation)
•  Benito Trollip, who populated the first version of the Afrikaans section in the CompoNet database
•  Anonymous reviewers for comments and suggestions