Natural Language Processing
as Philology
"Philology is the art of reading slowly." (Roman Jakobson)

David Smith
Department of Computer Science
UMass Amherst
DFG‐Perseus Workshop
Franco Moretti
NLP Highlights
Morphological Disambiguation
• Efficient algorithms for linguistic inference
– Joint inference across many layers of language
• Adaptation to new languages and domains
• Inferring structure in large, noisy collections
– Detecting text reuse and linkage
– Inferring temporal sequence of events
Bare‐Bones Dependency Structure
Syntax in Translation
Er wird in den Strassen wandern
(gloss: He will in the streets walk)
He will walk in the streets

The computer knows you like your mother

Er wird in den kleinen Strassen wandern
(gloss: He will in the small streets walk)
He is in the small streets hike
Who Did What To Whom?
Pierre Vinken, 61 years old, will join the board as a nonexecutive director

PropBank: predicate join, with ARG0 = Vinken, ARG1 = board, ARG-PRD = director

SYNTAX AND PARSING
Grammars and Trees

• Context-free grammars, e.g. VP → V NP PP: parsing in ≥O(n³)
• Tree (substitution | insertion | adjoining) grammars: ≥O(n³), O(n⁶)
• Synchronous grammars (e.g., pairing English likes with Spanish gusta): ≥O(n⁶)
• Higher complexity for CCG, LFG, HPSG, Minimalist grammars

Dependency Parsing as Graph Inference

Raw sentence: He reckons the current account deficit will narrow to only 1.8 billion in September.
↓ Part-of-speech tagging
POS-tagged sentence: He/PRP reckons/VBZ the/DT current/JJ account/NN deficit/NN will/MD narrow/VB to/TO only/RB 1.8/CD billion/CD in/IN September/NNP ./.
↓ Word dependency parsing
Word dependency parsed sentence, with arc labels such as SUBJ, MOD, SPEC, COMP, S-COMP, ROOT

slide adapted from Yuji Matsumoto
How about structured outputs?

• Log-linear models are great for n-way classification
• Also good for predicting sequences ("find preferred tags")
– but to allow fast dynamic programming, only use n-gram features
• Also good for dependency parsing ("…find preferred links…")
– but to allow fast dynamic programming or MST parsing, only use single-edge features
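The sequence-prediction bullet can be made concrete with a toy Viterbi decoder: because the model uses only n-gram (here, bigram) features, dynamic programming finds the best tag sequence exactly. Everything below (the words, tags, and scores) is a hypothetical illustration, not material from the talk.

```python
# A minimal sketch of log-linear sequence prediction with bigram features
# only. All words, tags, and scores here are toy values for illustration.

TAGS = ["D", "N", "V"]

# Feature scores: emission score for (word, tag) and transition score for
# (previous tag, tag). In a trained log-linear model these would be dot
# products of a weight vector with feature vectors.
EMIT = {("the", "D"): 5.0, ("dog", "N"): 5.0, ("barks", "V"): 5.0}
TRANS = {("D", "N"): 1.0, ("N", "V"): 1.0}

def viterbi(words):
    """Return the highest-scoring tag sequence under the bigram model."""
    # chart[i][t] = (best score of a tag sequence for words[:i+1] ending
    #                in tag t, backpointer to the previous tag)
    chart = [{t: (EMIT.get((words[0], t), 0.0), None) for t in TAGS}]
    for i in range(1, len(words)):
        row = {}
        for t in TAGS:
            best_prev = max(
                TAGS,
                key=lambda p: chart[i - 1][p][0] + TRANS.get((p, t), 0.0),
            )
            score = (chart[i - 1][best_prev][0]
                     + TRANS.get((best_prev, t), 0.0)
                     + EMIT.get((words[i], t), 0.0))
            row[t] = (score, best_prev)
        chart.append(row)
    # Trace back from the best final tag.
    tag = max(TAGS, key=lambda t: chart[-1][t][0])
    tags = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = chart[i][tag][1]
        tags.append(tag)
    return list(reversed(tags))

print(viterbi(["the", "dog", "barks"]))  # -> ['D', 'N', 'V']
```

The restriction the slide mentions is visible in the chart recurrence: each cell looks back only one tag, which is exactly what keeps decoding fast.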
Edge-Factored Parsers (McDonald et al. 2005)

• Is this a good edge?

Byl jasný studený dubnový den a hodiny odbíjely třináctou
"It was a bright cold day in April and the clocks were striking thirteen"
(tags: Byl/V jasný/A studený/A dubnový/A den/N a/J hodiny/N odbíjely/V třináctou/C)

The candidate edge jasný → den fires features such as:
– the word pair jasný den ("bright day")
– the word-tag pair jasný N ("bright NOUN")
– the tag pair A N
yes, lots of green ... (these features carry strongly positive weights)
Edge-Factored Parsers (McDonald et al. 2005)

• How about this competing edge?

The candidate edge jasný → hodiny fires features such as:
– the word pair jasný hodiny ("bright clocks")
– the word-tag pair jasný N ("bright NOUN")
– the tag pair A N, here with a preceding conjunction
not as good, lots of red ... (these features carry negative weights)
Edge-Factored Parsers (McDonald et al. 2005)

• How about this competing edge?

The word pair jasný hodiny ("bright clocks") is rare, so that feature is undertrained. Back off to coarser features:
– the stem pair jasn hodi ("bright clock," stems only) ... still undertrained
– the tag pair A N where the N follows a conjunction
– the number-agreement pattern Aplural Nsingular
(stemmed sentence: byl jasn stud dubn den a hodi odbí třin)
Edge-Factored Parsers (McDonald et al. 2005)

• Which edge is better?
– "bright day" or "bright clocks"?
• Score of an edge e = θ ⋅ features(e), where θ is our current weight vector
• Standard algorithms then find the valid parse with maximum total score
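The edge score as a dot product of weights with sparse binary features can be sketched as follows. The feature templates mirror the slides (word pair, word+tag, tag pair, stem backoff), but the feature names and weights are hypothetical stand-ins for the learned values the figures color green and red.

```python
# A sketch of edge-factored scoring: an edge's score is the dot product of
# a weight vector with its sparse, binary features. Templates mirror the
# slides; the weights are hypothetical stand-ins for learned values.

def edge_features(head_word, head_tag, dep_word, dep_tag):
    """Binary features that fire for a candidate head -> dependent edge."""
    return {
        f"wordpair={head_word}_{dep_word}",         # e.g. "jasný den"
        f"word+tag={head_word}_{dep_tag}",          # e.g. "jasný N"
        f"tagpair={head_tag}_{dep_tag}",            # e.g. "A N"
        f"stempair={head_word[:4]}_{dep_word[:4]}", # crude stemming backoff
    }

# Hypothetical weight vector: "lots of green" for bright+day features,
# "lots of red" for bright+clocks features.
WEIGHTS = {
    "wordpair=jasný_den": 2.0,
    "word+tag=jasný_N": 1.0,
    "tagpair=A_N": 0.5,
    "wordpair=jasný_hodiny": -1.5,
    "stempair=jasn_hodi": -0.5,
}

def score(features):
    """Dot product of the weight vector with a sparse binary feature set."""
    return sum(WEIGHTS.get(f, 0.0) for f in features)

bright_day = score(edge_features("jasný", "A", "den", "N"))
bright_clocks = score(edge_features("jasný", "A", "hodiny", "N"))
print(bright_day, bright_clocks)  # the "bright day" edge scores higher
```

Note how the backoff features share weight mass across rare word pairs, which is exactly why the stem and tag templates help when the word-pair feature is undertrained.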
Finding Highest-Scoring Parse

• Convert to context-free grammar (CFG)
– each subtree is a linguistic constituent (here, a noun phrase)
– example: The cat in the hat wore a stovepipe. (ROOT)
• Then use dynamic programming
– CKY algorithm for CFG parsing is O(n³)
– Unfortunately, O(n⁵) in this case:
  • to score the "cat wore" link, it is not enough to know the subtree is an NP
  • we must know it is rooted at "cat"
  • so the nonterminal set expands by O(n): {NPthe, NPcat, NPhat, ...}
  • CKY's "grammar constant" is no longer constant
– Solution: use a different decomposition (Eisner 1996)
  • Back to O(n³)
• Can play the usual tricks for dynamic programming parsing
– Further refining the constituents or spans
  • allows the probability model to keep track of even more internal information
– A*, best-first, coarse-to-fine
– Training by EM etc. requires "outside" probabilities of constituents, spans, or links

Hard Constraints on Valid Trees

– can't have both: one parent per word
– can't have both: no crossing links
– can't have all three: no cycles
Thus, an edge may lose (or win) because of a consensus of other edges.
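The O(n³) decomposition of Eisner (1996) can be sketched as a score-only dynamic program over "complete" and "incomplete" spans. This is a first-order version without backpointers, and the edge scores below are toy values, not anything from the talk.

```python
# A score-only sketch of Eisner's O(n^3) decomposition for projective
# dependency parsing. Position 0 is ROOT; s[h][m] is the edge-factored
# score of attaching word m to head h. Toy scores are hypothetical.

NEG = float("-inf")

def eisner_max(s):
    """Max total score of a projective dependency tree rooted at position 0."""
    n = len(s)
    # comp/inc[i][j][d]: best score of a complete/incomplete span from i to j;
    # d = 1 means the head is the left end of the span, d = 0 the right end.
    comp = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    inc = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        comp[i][i][0] = comp[i][i][1] = 0.0
    for k in range(1, n):          # span length
        for i in range(n - k):
            j = i + k
            # incomplete spans: add the edge j -> i (left) or i -> j (right)
            best = max(comp[i][r][1] + comp[r + 1][j][0] for r in range(i, j))
            inc[i][j][0] = best + s[j][i]
            inc[i][j][1] = best + s[i][j]
            # complete spans: an incomplete span plus a finished half
            comp[i][j][0] = max(comp[i][r][0] + inc[r][j][0] for r in range(i, j))
            comp[i][j][1] = max(inc[i][r][1] + comp[r][j][1]
                                for r in range(i + 1, j + 1))
    return comp[0][n - 1][1]

# ROOT John saw Mary, with toy edge scores echoing the talk's example
S = [[NEG] * 4 for _ in range(4)]
for (h, m), v in {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (2, 1): 30,
                  (2, 3): 30, (3, 2): 0, (1, 3): 3, (3, 1): 11}.items():
    S[h][m] = float(v)
print(eisner_max(S))  # -> 70.0 (ROOT->saw, saw->John, saw->Mary)
```

The three nested loops (span length, left endpoint, split point) are where the O(n³) comes from; the head of every span sits at one of its ends, which is what avoids the O(n⁵) blowup of naive lexicalized CKY.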
Non-Projective Parses

ROOT I 'll give a talk tomorrow on bootstrapping
The subtree rooted at "talk" is a discontiguous noun phrase: occasional non-projectivity in English.
The "projectivity" restriction (no crossing links). Do we really want it?

ista meam norit gloria canitiem
thatNOM myACC may-know gloryNOM going-grayACC
"That glory may-know my going-gray" (i.e., it shall last till I go gray)
Frequent non-projectivity in Latin, etc.
Finding Highest-Scoring Non-Projective Tree

Consider the sentence "John saw Mary". The Chu-Liu-Edmonds algorithm finds the maximum-weight spanning tree, which may be non-projective, in time O(n²): every node selects its best parent; if there are cycles, contract them and repeat.
Example edge weights from the figure: root→John 9, root→saw 10, root→Mary 9, John→saw 20, saw→John 30, saw→Mary 30, Mary→saw 0, John→Mary 3, Mary→John 11. The maximum-weight tree is root→saw, saw→John, saw→Mary.
(slide thanks to Dragomir Radev)

Summing Over All Non-Projective Trees

How about the total weight Z of all trees? How about outside probabilities or gradients? These can be found in time O(n³) by matrix determinants and inverses (Smith & Smith, 2007).
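Both claims can be checked by brute force on the tiny "John saw Mary" graph: the best spanning arborescence (what Chu-Liu-Edmonds returns, here found by exhaustive search rather than the O(n²) algorithm) and the matrix-tree identity behind the determinant trick, with edge weights treated multiplicatively. The weights follow the slide's figure.

```python
# Brute-force check of (1) the maximum-weight spanning arborescence and
# (2) the matrix-tree identity: the determinant of a weighted Laplacian
# over the non-root nodes equals the total weight Z of all arborescences.
import itertools
import numpy as np

N = 4  # node 0 is ROOT; nodes 1, 2, 3 are John, saw, Mary
W = np.zeros((N, N))  # W[h, m] = weight of candidate edge h -> m
for (h, m), v in {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (2, 1): 30,
                  (2, 3): 30, (3, 2): 0, (1, 3): 3, (3, 1): 11}.items():
    W[h, m] = v

def is_tree(par):
    """Does the parent map {word: head} form a tree rooted at node 0?"""
    for m in par:
        seen, cur = set(), m
        while cur != 0:
            if cur in seen:
                return False
            seen.add(cur)
            cur = par[cur]
    return True

def arborescences():
    """Enumerate all spanning arborescences by brute force (tiny N only)."""
    for heads in itertools.product(range(N), repeat=N - 1):
        par = {m: heads[m - 1] for m in range(1, N)}
        if is_tree(par):
            yield par

# (1) the maximum-weight arborescence that Chu-Liu-Edmonds would return
best = max(arborescences(), key=lambda p: sum(W[h, m] for m, h in p.items()))
best_score = sum(W[h, m] for m, h in best.items())

# (2) total weight Z of all arborescences, two ways: exhaustively, and by
# the determinant of the weighted Laplacian restricted to nodes 1..N-1
Z_brute = sum(np.prod([W[h, m] for m, h in p.items()]) for p in arborescences())
L = np.zeros((N - 1, N - 1))
for m in range(1, N):
    L[m - 1, m - 1] = sum(W[h, m] for h in range(N) if h != m)
    for h in range(1, N):
        if h != m:
            L[h - 1, m - 1] -= W[h, m]
Z_det = np.linalg.det(L)

print(best, best_score)  # {1: 2, 2: 0, 3: 2}, i.e. ROOT->saw->{John, Mary}; 70
print(Z_brute, Z_det)    # the two Z computations agree
```

In practice one exponentiates log-linear edge scores to get the multiplicative weights; the determinant then gives the partition function, and matrix inverses give the edge marginals and gradients, as in Smith & Smith (2007).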
Local Factors for Parsing

– So what factors shall we multiply to define parse probability?
• Unary factors to evaluate each link in isolation
• Global TREE factor to require that the links form a legal tree
– this is a "hard constraint": the factor is either 0 or 1
• Second-order effects: factors on 2 variables
– grandparent, no-cross, siblings, hidden POS tags, subcategorization, ...

Future Opportunities

• Efficiently modeling more hidden structure
– POS tags, link roles, secondary links (DAG-shaped parses)
• Beyond dependencies
– Constituency parsing, traces, lattice parsing
• Beyond parsing
– Alignment, translation
– Bipartite matching and network flow
– Joint decoding of parsing and other tasks (IE, MT, reasoning ...)
• Modeling sentence processing
– BP is a parallel, anytime process
ADAPTATION

Projecting Hidden Structure

Projection (Yarowsky & Ngai '01)

Im Anfang war das Wort
In the beginning was the word

• Train with bitext
• Parse one side
• Align words
• Project dependencies
• Many-to-one links?
• Invalid trees?
• Hwa et al.: fix-up rules
• Ganchev et al.: trust only some links

Divergent Projection

Auf diese Frage habe ich leider keine Antwort bekommen
I did not unfortunately receive an answer to this question (NULL)
Alignment configurations in the figure: head-swapping, monotonic, null, siblings
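The recipe (parse one side, align words, project dependencies) can be sketched in a few lines. The sentence pair is the slide's "Im Anfang war das Wort" example, but the English parse below is a hypothetical illustration; note how the many-to-one link (both "In" and "the" align to German "Im") is simply dropped, one of the problems the slides raise.

```python
# A sketch of dependency projection through a word alignment: copy each
# source edge whose endpoints are both 1-to-1 aligned. The source parse
# here is a hypothetical illustration.
from collections import Counter

src = ["In", "the", "beginning", "was", "the", "word"]
tgt = ["Im", "Anfang", "war", "das", "Wort"]

# source edges as (head_index, dependent_index); -1 = ROOT (hypothetical parse)
src_edges = [(-1, 3), (3, 0), (0, 2), (2, 1), (3, 5), (5, 4)]

# word alignment as (source_index, target_index) pairs
align = [(0, 0), (1, 0), (2, 1), (3, 2), (4, 3), (5, 4)]

def project(edges, alignment):
    """Project edges through the 1-to-1 portion of an alignment."""
    s_count = Counter(s for s, _ in alignment)
    t_count = Counter(t for _, t in alignment)
    # keep only links where both words participate in exactly one link
    one2one = {s: t for s, t in alignment if s_count[s] == 1 and t_count[t] == 1}
    projected = []
    for h, m in edges:
        if m in one2one and (h == -1 or h in one2one):
            projected.append((one2one.get(h, -1), one2one[m]))
    return projected

print(project(src_edges, align))  # only ROOT->war, war->Wort, Wort->das survive
```

The projected structure is not even a full tree over the target sentence ("Im" and "Anfang" are left unattached), which is exactly why Hwa et al. need fix-up rules and Ganchev et al. trust only some links.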
What's Wrong with Projection?

• Hwa et al. Chinese data:
– 38.1% F1 after projection
– Only 26.3% with automatic English parses
– Cf. 35.9% for attach-right!
– 52.4% after fix-up rules
• Only 1-to-1 alignments:
– 68% precision
– 11% recall
• Bad dependencies; parent-ancestors?

Free Translation

Tschernobyl könnte dann etwas später an die Reihe kommen
Then we could deal with Chernobyl some time later (NULL)
• Different languages
• Similar meaning
• Divergent syntax

Projection

Im Anfang war das Wort
In the beginning was the word
• Same sentence
• Divergent syntax: different treebank conventions annotate the same sentence differently

A Lack of Coordination

now or never: Prague-style and other treebanks attach coordinations differently

Prepositions and Auxiliaries

in the end: analyses differ (e.g., Mel'čuk)
I have decided: CoNLL and MALT conventions attach the auxiliary differently
Adaptation Recipe

• Acquire (a few) trees in the target domain
• Run the source-domain parser on the training set
• Train a parser with features for:
– Target tree alone
– Source and target trees together
• Parse the test set with:
– Source-domain parser
– Target-domain parser

Why?

• Why not just modify the source treebank?
• The source parser could be a black box
– Or rule-based
• Vastly shorter training times with a small target treebank
– Linguists can quickly explore alternatives
– Don't need dozens of rules
• Other benefits of stacking
• And sometimes, divergence is very large
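Training with features for the target tree alone and for source and target trees together amounts to stacking: when the target-domain parser scores a candidate edge, some of its features look at what the black-box source-domain parser predicted. A sketch, with hypothetical feature names and example parses:

```python
# A sketch of stacked feature extraction for domain adaptation: the
# target parser's edge features include whether the source-domain parser
# predicted the same edge. Feature names and parses are hypothetical.

def stacked_edge_features(head, dep, words, source_parse):
    """Features for a candidate head -> dep edge, given the source parser's
    output (a dict mapping each dependent index to its predicted head)."""
    feats = {
        # target-tree-alone features
        f"wordpair={words[head]}_{words[dep]}",
        f"dist={head - dep}",
    }
    # source-and-target-together features: does the source parser agree?
    if source_parse.get(dep) == head:
        feats.add("source_parser_agrees")
    else:
        feats.add("source_parser_disagrees")
        feats.add(f"source_head_word={words[source_parse[dep]]}")
    return feats

words = ["ROOT", "I", "have", "decided"]
source_parse = {1: 2, 2: 0, 3: 2}  # a CoNLL-style analysis: "have" heads all
# Scoring the competing MALT-style edge decided -> have:
print(stacked_edge_features(3, 2, words, source_parse))
```

Training then learns when to trust the source parser: if the domains agree on auxiliaries, the "agrees" feature gets high weight; where conventions diverge (as in the "I have decided" example), the disagreement features let the target model overrule it.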
MODEL STRUCTURE

What We're Modeling

Variables: target tree t, target words w, alignment a, source tree t', source words w' (e.g., w' = "im Anfang ...", w = "in the beginning ...")

• Generative: p(t, a, w | t', w')
• Conditional (this paper): p(t | t', a, w, w')
• Ongoing work: p(t, t', a | w, w'), with score
s(t, t', a, w, w') = Σᵢ λᵢ fᵢ(t, w) + Σⱼ μⱼ gⱼ(t, t', a, w, w')

Stacking

Input → Model 1 → Model 2
Model 2 has features for when to trust Model 1

Quasi-Synchronous Grammar

• Generative or conditional monolingual model of the target language or tree
• Condition target trees on source structure
• Applications to
– Alignment (D. Smith & Eisner '06)
– Question Answering (Wang, N. Smith, Mitamura '07)
– Paraphrase (Das & N. Smith '09)
– Translation (Gimpel & N. Smith '09)
Dependency Relations

+ "none of the above"

Adaptation Results

Different PTB dependency conversions; see the paper for more results

Unsupervised Projection

Supervised Projection
MINING A MILLION BOOKS

Mining Information from Books

• Modern OCR: several errors per page
• Names are worse: in one study, 35% of names were incorrectly transcribed
• Errors propagate to later steps
• Train language models and name extractors on the noisy corpus
Learning by Reading

Reuse & Quotation

The correct form Orozco occurs 149x in the document; Tlachquiauhco occurs only once. OCR output, errors and all:
"TlaGhqQlanhco.— Tl£U!h‐quiauh‐co.~77acA^ta«co. El Sr. OrozcQ y Berra no di<5 la significación de esta palabra en la primera parte del Códice Mendocino. El juego de pelota, tlachfU, y la lluvia, quiahuitl, dan literalmente con"
(Spanish: "Mr. Orozco y Berra did not give the meaning of this word in the first part of the Códice Mendocino. The ball game, tlachtli, and the rain, quiahuitl, literally give ...")
Goal: inducing language- and corpus-specific features

UMass Book Search