Informatica — Universiteit van Amsterdam
Bachelor Informatica
Contradiction detection between
news articles
Kasper van Veen
June 8, 2016
Supervisor(s): Christof Monz (UvA)
Signed:
Abstract
This thesis attempts to detect contradictions between two news articles in four phases. First, dependency graphs are obtained for the sentences; these graphs are then aligned with each other, and non co-referent sentence pairs are filtered out. The last phase applies logistic regression using five features. The experiment focuses on contradictions involving antonyms and negations. The experiments at the end report results on contradictions found in the RTE datasets and between two news articles. From the results we conclude that the chosen features work well, but do not cover the whole RTE dataset.
Contents

1 Introduction 7
1.1 Structure 7
2 Background 9
2.1 Methodology 9
2.2 What are contradictions 9
3 How do we detect contradictions? 11
3.1 Find semantics and syntax of sentences 11
3.1.1 Using spaCy for syntax analysis 11
3.1.2 Dependency graphs 14
3.2 Alignment between dependency graphs 14
3.3 Filter non co-referent sentences 15
3.4 Logistic Regression 15
3.5 Features 16
3.5.1 Antonyms feature 16
3.5.2 Switching of object and subject feature 18
3.5.3 Alignment feature 18
3.5.4 Negation feature 18
4 Experiments 19
5 Conclusion and discussion 23
CHAPTER 1
Introduction
When the MH17 plane disaster occurred two years ago, the Western media immediately pointed their fingers at the Russians. The Russians, however, blamed the Ukrainian government. Instead of simply accepting what the Western news said, I decided to do my own research to find out what both parties were saying. I found an article published by the BBC on the 14th of October 2015, which stated:
‘Mr Joustra said pro-Russian rebels were in charge of the area from where the missile that hit MH17 had been fired.’
while Pravda (one of the biggest Russian news sources) published an article on the 15th of October 2015 which stated:
‘Group representatives confirmed that the plane was shot down from the territory controlled by the official Kiev.’
It can easily be seen that two of the biggest news sources in the world were making totally different statements about the same subject. This is an interesting observation because it can be seen as propaganda by both parties. Although it is not possible to find out the truth, it is possible to show the statements in which the parties contradict each other. This is where the idea for this thesis came from: to build a program that is able to detect contradictions and thereby show the differences in opinion.
Stanford University has done research on contradiction detection [3]. Their system uses four steps to determine whether sentence pairs from the RTE3 dataset are contradictions or not. This thesis uses a similar approach, but with different tools and features, which will be discussed later in this thesis.
The research question of this thesis is: what kind of contradictions can I detect in two
related news articles?
1.1 Structure
This thesis starts with a short introduction to the methodology, recent research and the definition of a contradiction. Chapter three goes further into the subject and describes the four steps by which a computer can detect contradictions. Chapter four shows the results of the experiment using the RTE datasets and a few contradictions found in news articles. Chapter five presents the conclusion and a discussion of future work.
CHAPTER 2
Background
2.1 Methodology
Instead of using the Stanford parser as a dependency parser, as Stanford University did, this experiment uses spaCy for faster and more accurate results. The WordNet database is used to acquire a large number of antonyms and negations. This thesis does not only focus on the contradictions in the RTE (Recognizing Textual Entailment) datasets, it also tries to detect the contradictions found between different news articles about the MH17 disaster. The features that are used for the logistic regression are based on contradictions found in the news articles; these contradictions are mainly antonyms and negations.
We used the RTE1 dataset to verify whether the features and the logistic regression classifier are sufficient to detect contradictions.
2.2 What are contradictions
Before the experiment could start, the definition of a contradiction should be clear: “Contradictions occur when sentence A and sentence B are unlikely to be true at the same time.” [3] In terms of logical expressions this means ¬(A ∧ B). An important requirement for contradictions is that both sentences are about the same event (co-referent sentences). Contradictions occur in many different forms and levels of difficulty. Antonyms and negations are the easiest to recognize, followed by numerical differences. Next come factive and modal words, and the hardest are sentences which require world knowledge to be understood.
Examples are a good way to show what the differences are and how they are recognized. Antonyms are words that are opposites of each other, like big/small, rich/poor and young/old:
‘The people of Kenya are rich’ vs ‘The people of Kenya are poor’
Negations are words that negate each other, like did/didn’t, have/haven’t and could/couldn’t:
‘Frank committed the crime’ vs ‘Frank didn’t commit the crime’
Numerical differences occur when there is a difference between numbers:
‘Apple’s annual revenue was 50 million in 2016’ vs ‘In 2016, Apple’s annual revenue was 40 million.’
A difference between the dates on which an event occurred can also be seen as a contradiction:
‘Willem-Alexander became king of the Netherlands in 2013’ vs ‘Willem-Alexander became king of the Netherlands in 2010.’
To detect other numerical differences the computer should be able to recognize words like ‘no’, ‘some’, ‘many’, ‘most’, and ‘all’ [6]. These words add extra meaning to the number next to them. An example:
‘More than 500 people attended the ceremony’ vs ‘700 people attended the ceremony’
The program would detect this as a contradiction, since 500 and 700 are different numbers. However, ‘more than 500’ is technically compatible with ‘700’ [1]. This does not hold in all cases: it depends on the range between the two numbers. ‘At least 200’ and ‘5000’ are so far apart that, although technically compatible, the pair is unlikely to describe the same fact. It is up to the end user to determine this boundary.
Factive words presuppose the truth of the verb they modify:
‘The burglar managed to open the door’ vs ‘The burglar opened the door’
Modal words add modality (necessity or possibility) to a verb:
‘He will go to work’ vs ‘He would go to work’
The last and hardest type of contradiction needs world knowledge to be understood. For the human eye it might be easy to recognize them as contradictions, but for a computer it is difficult. An example:
‘Albert Einstein was in Austria’ vs ‘Albert Einstein was in Germany’
This is not a contradiction because both sentences can be true: Albert Einstein could have been in both places, just not at the same time.
‘Albert Einstein died in Austria’ vs ‘Albert Einstein died in Germany’
This is obviously a contradiction, because both sentences can’t be true: someone can only pass away in one place. ‘Died in’ should be seen as a function of a person’s unique place of death [7]. For a program that detects contradictions it is hard to know all of these functions, so an idea for further research is to build a dataset of such functions. Research has shown that only few contradictions can be detected using syntactic matching; the rest depends on world knowledge and an understanding of the semantic structure of the sentences [2]. A relatively easy part of world knowledge is location relations. It is common knowledge that Amsterdam is the capital of the Netherlands. A computer, however, often does not have that knowledge.
‘Today, the mayor returned to the capital of the Netherlands, Amsterdam.’ vs ‘Today, the mayor returned to Rotterdam, the capital of the Netherlands.’
The program would detect a contradiction here, because the mayor returned to two different cities on the same day. Still, this is not a contradiction, because the second sentence is false: Rotterdam is not the capital of the Netherlands. Holonyms could be used to construct a dataset containing world knowledge [1]. Holonymy is a semantic relationship between terms: ‘house’ is a holonym of ‘door’ and ‘window’. In the case above, ‘capital of the Netherlands’ is a holonym of ‘Amsterdam’. If implemented correctly in a dataset, the program would recognize the second sentence as false and thereby not report a contradiction.
There are many different sorts of contradictions, which makes them hard to detect. Often the sentence pairs are not as similar as the sentences above and thus require knowledge of their syntactic structure. This thesis will mainly focus on antonyms and negations to test whether the contradiction detection program works at a specific level.
CHAPTER 3
How do we detect contradictions?
Two sentences are needed to determine whether they contradict each other, so we name them sentence A and sentence B. Contradiction detection requires several steps. The first step is to find the syntactic structure of each sentence. The syntactic structure of a sentence shows the words that form the sentence and their properties, such as verbs and adjuncts. spaCy is used as a dependency parser to achieve this first step. Next, the two graphs obtained from spaCy are aligned with each other to acquire a score which determines whether sentence A and B can possibly form a contradiction. This score is based on the occurrence of antonyms, negations and other words that might lead to a contradiction. The third step is to filter out non co-referent sentences, which are sentences that are not about the same event. The final step is to apply logistic regression, which determines whether the sentences are true contradictions of each other. For this experiment, a couple of existing datasets are used. The RTE datasets, obtained from Stanford University, contain up to 800 sentence pairs each, some of which are contradictions. WordNet is a lexical database that contains many of the English antonyms and negations. These datasets are used as tools to complete this experiment.
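The four phases above can be sketched as a small pipeline. Everything in this sketch is a toy stand-in: token overlap replaces the spaCy graphs and cosine-based alignment, and an any-nonzero-feature rule replaces the trained logistic regression classifier. The function names are illustrative, not taken from the actual program.

```python
# Hypothetical skeleton of the four-phase pipeline described above.

def parse(sentence):
    # Phase 1: obtain a dependency graph (spaCy in the real program).
    # Here a "graph" is simply the list of lowercased tokens.
    return sentence.lower().split()

def align(graph_a, graph_b):
    # Phase 2: score the overlap between the two graphs.
    # Crude stand-in: count tokens that occur in both sentences.
    return float(len(set(graph_a) & set(graph_b)))

def are_coreferent(graph_a, graph_b, threshold=3.0):
    # Phase 3: filter non co-referent pairs; pairs with too little
    # overlap are assumed to describe different events.
    return align(graph_a, graph_b) >= threshold

def classify(features):
    # Phase 4: the real program applies logistic regression; here any
    # non-zero feature flags a possible contradiction.
    return any(f != 0 for f in features)
```

In the real program each phase is considerably more involved, but the data flow (parse, align, filter, classify) is the same.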
3.1 Find semantics and syntax of sentences
Computers and humans differ in many ways. When a human hears a sentence, prior knowledge is used to understand it. The person does not only use knowledge of grammar, but also understands the words and their meanings. To detect contradictions, a computer should likewise be able to understand the context of the sentence and the meaning of the words. To get a better understanding of the content of a sentence, its semantics should be studied.
Semantics is the study of the meaning of linguistic expressions. It shows how words and phrases are related to their denotation. The meaning of a word often depends on the whole sentence. A good example to show this:
‘Spears have a very sharp point’ vs ‘You should not point at people’
Words like ‘point’ are called homonyms: they are written the same but can have different meanings. In the first sentence, ‘point’ is a noun, while in the second sentence ‘point’ is a verb. Syntax is the set of grammar rules and principles that give a language its structure. The grammar and the meaning of a sentence are closely related. Sometimes the grammar can be right but the sentence is meaningless, and vice versa.
‘The helpless apple saw anger’ vs ‘The young man took in the shop some cigarettes’
The first sentence doesn’t make any sense, although its syntactic structure is correct. The second sentence is syntactically incorrect, but the reader can still understand it.
3.1.1 Using spaCy for syntax analysis
spaCy is used to get the syntactic structure of a sentence. spaCy is a dependency parser which reads every word and links the words together based on their syntactic roles. It uses tokenization to split the sentence into separate words and numbers, using white space characters to separate the tokens from each other. Although it uses 1.5 GB of RAM, the choice for spaCy over other dependency parsers, such as the Stanford parser, was quickly made: spaCy is a fast and very accurate parser written in Python, which made it easy to combine with the other components of the contradiction detection program. It can also recognize homonyms based on the rest of the sentence; a demonstration can be seen in figures 3.1 and 3.2, where the sentence pair ‘Spears have a very sharp point’ and ‘You should not point at people’ is used. The figures are images taken from the CSS version of spaCy’s dependency parser [8].
Figure 3.1: point as a noun
Figure 3.2: point as a verb
It is clear that ‘point’ in the first sentence is seen as a noun, while ‘point’ in the second sentence is a verb. Names of persons, companies, countries and other named entities are also recognized, which made spaCy the perfect tool for this experiment. The meaning of the VERB and NOUN tags is obvious, but the other tags are important as well, so table 1 gives more information about the most common tags [10].
VERB   verbs (all tenses and modes)
NOUN   nouns (common and proper)
PRON   pronouns
PROPN  proper noun
AUX    auxiliary verb
ADJ    adjectives
ADV    adverbs
ADP    adpositions (prepositions and postpositions)
CONJ   conjunctions
DET    determiners
INTJ   interjection
NUM    cardinal numbers
PRT    particles or other function words
PUNCT  punctuation
SCONJ  subordinating conjunction
SYM    symbol
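The coarse-grained tags above are what a fine-to-coarse mapping table reduces the parser's detailed tags to. A toy sketch of such a mapping, using a small hand-picked subset of Penn Treebank tags rather than spaCy's actual mapping table:

```python
# Toy fine-grained -> coarse-grained tag mapping, as described in the
# text. The pairs below are a small illustrative subset of Penn
# Treebank tags, not spaCy's actual mapping table.
FINE_TO_COARSE = {
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "MD": "VERB",  # verbs, modals
    "NN": "NOUN", "NNS": "NOUN",        # common nouns
    "NNP": "PROPN", "NNPS": "PROPN",    # proper nouns
    "JJ": "ADJ", "RB": "ADV", "IN": "ADP",
    "PRP": "PRON", "DT": "DET", "CD": "NUM",
    ".": "PUNCT",
}

def to_coarse(fine_tag):
    # Anything outside the table falls back to a catch-all class.
    return FINE_TO_COARSE.get(fine_tag, "X")
```

The real table is much larger and treebank dependent, but the lookup itself is this simple.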
For this experiment it is not enough to use only the CSS version of spaCy, so the API is used to obtain the graphs. The next pressing issue that came up during this experiment is syntactic ambiguity. It occurs when the word order alone is not enough to fully understand the sentence. A famous example comes from a quote by Groucho Marx:
‘One morning I shot an elephant in my pajamas. How he got into my pajamas I’ll never know.’
The first sentence can be interpreted in two different ways: either I shot the elephant while I was wearing my pajamas, or I shot the elephant, who was wearing my pajamas. When this sentence is fed to spaCy’s dependency parser we get the result seen in figure 3.3.
Figure 3.3: Syntactic ambiguity
This means that spaCy interprets the sentence as: I shot the elephant while I was wearing my pajamas. ‘Shot’ links to ‘in’, which links to ‘pajamas’; in the other interpretation, ‘elephant’ would have linked to ‘pajamas’. Although syntactic ambiguities don’t occur very often, they are something to take into consideration for this experiment. So far this experiment can only detect one interpretation and therefore it might miss some contradictions. When we transform the sentence above into a more realistic version we get:
‘The burglar shot someone in his pajamas’
which translates to:
‘The burglar shot someone, while the burglar was wearing his pajamas’
Now consider a sentence that might look like a contradiction:
‘Someone was not shot wearing his pajamas’
This might look like a contradiction, but it is not about the same event: nobody was shot while wearing pajamas, instead somebody shot while wearing pajamas.
So after both sentences are parsed, they should get their corresponding dependency
graphs.
3.1.2 Dependency graphs
A dependency graph is a graph which represents the dependencies of various objects (in this case, words) on each other. Each graph contains as much information as possible about the semantic structure of the sentence. Each sentence is split up into words (nodes) and each edge shows the grammatical relationship between two words. In the program, three different annotations are shown to the user: POS, tag and NER. From the official spaCy documentation: POS represents the word class of a token; it is a coarse-grained, less detailed tag. Tag is different: it is fine-grained and more detailed than POS. It represents not only the word class but also some standard morphological information about the token. Tags are used by the syntactic parser because they are language and treebank dependent. The tagger predicts these fine-grained tags and a mapping table is used to reduce them to the coarse-grained .pos tags [9]. Lastly, NER stands for named-entity recognition, which covers names of persons, places, companies and other known entities.
The program allocates each word to a single node. However, some words are auxiliary verbs, which are verbs used to form the tenses of other verbs. Examples are: ‘were lost’ and ‘must go’. An auxiliary verb is attached to a main verb, which carries the semantic content of the sentence. Another example: in ‘I did not complete my homework’, the main verb is ‘complete’, while ‘did not’ supports it. The program treats these words as one unit and allocates them to a single node. The output of this graph is:
I did not complete my homework
[’PRP’, ’VB’, ’PRP$’, ’NN’]
[’PRON’, ’VERB’, ’ADJ’, ’NOUN’]
[’’, ’’, ’’, ’’]
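The merging of auxiliaries into the main verb's node can be sketched on a simplified parse, where each token is a (word, dependency label, head index) tuple. In the real program these come from the spaCy graph; `merge_aux` is a hypothetical helper name.

```python
# Sketch of collapsing auxiliary verbs (and their negation) into the
# node of the main verb they support. Input is a simplified parse:
# (token, dependency label, head index) tuples, as spaCy would provide.
def merge_aux(parse):
    nodes = {}
    for i, (token, dep, head) in enumerate(parse):
        # Auxiliaries and negations join their head's node;
        # every other token starts its own node.
        target = head if dep in ("aux", "neg") else i
        nodes.setdefault(target, []).append((i, token))
    # Emit each node's tokens in sentence order.
    return [" ".join(tok for _, tok in sorted(group))
            for _, group in sorted(nodes.items())]

# The example from the text: 'did' (aux) and 'not' (neg) both attach
# to 'complete', so the three words end up in one node.
sentence = [("I", "nsubj", 3), ("did", "aux", 3), ("not", "neg", 3),
            ("complete", "ROOT", 3), ("my", "poss", 5), ("homework", "dobj", 3)]
```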
The next step is to align the two graphs to each other to find the similarities and
differences.
3.2 Alignment between dependency graphs
After each sentence is transformed into a dependency graph, the graphs are aligned with each other. Alignment between graphs is the concept of mapping two graphs onto each other to make them as similar as possible. For contradiction detection it is used to map words (nodes) from sentence A to similar words in sentence B. If a word does not have any similar words, it is ignored.
The idea is to obtain a score based on the alignment. Identical words, synonyms and antonyms get the highest scores, while words that have no similarity (irrelevant words) get the lowest score.
The similarity score is based on the cosine metric. This is a similarity measurement between two vectors that evaluates the cosine of the angle between them. The spaCy toolkit uses the word2vec model vectors produced by Levy et al. [5], and those vectors are used for the cosine metric.
If two nodes are identical, they get a score of 1.0000 (100%). Nodes also get a score of 1.0000 if one node is a substring of the other. To avoid matching a substring within a word, a space is prepended and appended to each node before the substring test. This makes sure that nodes are not matched partially; for example, the words ‘automobile’ and ‘mobile’ should not match.
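The scoring rules just described (exact and space-padded substring matches first, cosine metric otherwise) can be sketched as follows, with toy vectors standing in for the word2vec vectors spaCy provides:

```python
import math

def cosine(u, v):
    # cos(theta) = (u . v) / (|u| * |v|)
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def node_similarity(word_a, vec_a, word_b, vec_b):
    # Identical nodes, or nodes where one is a whole-word substring of
    # the other, score 1.0. Padding with spaces prevents partial
    # matches such as 'mobile' inside 'automobile'.
    padded_a, padded_b = f" {word_a} ", f" {word_b} "
    if word_a == word_b or padded_a in padded_b or padded_b in padded_a:
        return 1.0
    # Otherwise fall back to the cosine metric over word vectors.
    return cosine(vec_a, vec_b)
```

With real word2vec vectors, related words like ‘sales’ and ‘percent’ would receive the intermediate scores shown in the output below.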
Experiments showed the importance of merging named entities. In the sentence pair
‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose 46 percent’
‘Mitsubishi Motors Corp.’ and ‘Mitsubishi’ should be seen as the same corporation. Originally, the program parsed ‘Mitsubishi Motors Corp.’ into three different tokens and compared each of those to ‘Mitsubishi’. Part of the output of this similarity computation can be seen below.
similarity of Mitsubishi (mitsubishi) and Mitsubishi (mitsubishi) = 1.0000
similarity of Mitsubishi (mitsubishi) and Motors (motors) = 0.5534
similarity of Mitsubishi (mitsubishi) and Corp. (corp.) = 0.0000
similarity of Mitsubishi (mitsubishi) and sales (sale) = 0.2112
similarity of Mitsubishi (mitsubishi) and percent (percent) = 0.2821
similarity of Mitsubishi (mitsubishi) is the highest (1.0000)
Merging named entities only applies to PROPN (proper noun) tokens, i.e. names of persons, companies and places. Merging is limited to PROPN to prevent NUM-PERCENT pairs from being merged. Consider the example below:
‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose more than 40 percent’
If merging were not limited to PROPN, the NUM-PERCENT pairs would be merged. This would result in a low entailment score, because the program would see ‘40 percent’ and ‘46 percent’ as totally different words with no similarity:
similarity of 40 percent (40 percent) and 46 percent (46 percent) = 0.0000
With merging turned off for NUM (cardinal numbers), the total alignment score is much higher:
similarity of 40 (40) and 46 (46) = 0.7749
similarity of 46 (46) is the highest (0.7749)
--- percent ---
similarity of percent (percent) and Mitsubishi Motors Corp. (Mitsubishi Motors Corp.) = 0.0000
similarity of percent (percent) and sales (sale) = 0.3118
similarity of percent (percent) and percent (percent) = 1.0000
similarity of percent (percent) is the highest (1.0000)
Mathematically speaking: f(‘40 percent’, ‘46 percent’) < f(‘40’, ‘46’) + f(‘percent’, ‘percent’), where f is the similarity function. Computing this results in 0 < 0.7749 + 1.0000. For each word, the candidate with the best similarity, and thus the highest score, is used to determine the total alignment score.
3.3 Filter non co-referent sentences
Some sentence pairs may obtain a high alignment score but not be contradictions at all. An important requirement for contradictions is that both sentences are about the same event; this is called co-reference. A good example to illustrate this is:
‘The palace of the Dutch royal family is in Amsterdam’ vs ‘The palace of the British royal family is in London’
This pair gets a very high alignment score because most words are the same. Amsterdam and London are different places, so the program would mark this pair as a contradiction, reasoning that the palace of the royal family can only be in one place.
similarity            alignment score
palace → palace       1.0000
family → family       1.0000
is → is               1.0000
Amsterdam → London    0.6464
However, this pair is about two different royal families, so it is not a contradiction at all. This shows the importance of filtering non co-referent sentences. This thesis only tries to detect contradictions based on antonyms and negations in a given text. Contradictions based on world knowledge are very hard to detect without a reliable database containing world knowledge, so this part is not pursued further.
3.4 Logistic Regression
When the alignment between two graphs has resulted in a high score, logistic regression is the final step in this experiment. It determines whether the two sentences entail each other. Entailment occurs when the truth of sentence A guarantees the truth of sentence B: if A is true, then B is true, written A |= B. If there is no entailment at this stage, sentence A and B have a high possibility of being a contradiction. An example of entailment:
‘The child broke the glass’ vs ‘The glass is broken’
In this example it can be seen that if the first sentence is true, the second is true as well: if the child broke the glass, the glass is broken.
Logistic regression is a mathematical model which determines the probability that a specific event will occur. It uses previously given data to predict the outcome of new data. The goal is to let a computer make the decisions without being programmed for the task specifically. The model learns from a training set, which consists of a matrix X and a vector y. X contains all the features, while vector y holds the decisions. In matrix X, each column corresponds to a single feature and each row contains the feature values of one example.
A mathematical approach to logistic regression starts with the following formula:

log( p / (1 − p) ) = β0 + β1·x1 + β2·x2 + … + βn·xn

where x1, x2, …, xn are the elements of the feature vector. The bias term β0 vertically translates the decision boundary so that it does not have to pass through the origin. It can be seen as the weight of an implicit constant feature x0 = 1, so it enters the sum unchanged (1 · β0 = β0). The remaining β1, β2, …, βn are the weights that correspond to the feature vector elements.
The probability p denotes the chance that an event happens and always lies between 0 and 1. The odds are p / (1 − p), and their logarithm is the log-odds used above. For the dependent variable Y this gives:

log( P(Y = 1) / P(Y = 0) ) = log( P(Y = 1) / (1 − P(Y = 1)) )
In this experiment, the data consists of features which, used together, determine whether two sentences entail each other. Stanford University performed a similar experiment, using 28 features to recognize entailment from specific patterns [6]. The features used in this thesis mainly focus on antonyms, negations, switching of subjects and objects, alignment, and possibly more in the future. If a pair of sentences contains antonyms or many negations, the probability of a contradiction is high and there is no entailment.
This experiment uses a three-class classifier to determine entailment, with the classes ‘yes’, ‘no’ and ‘unknown’.
A famous three-class classification problem is the iris dataset. It contains measurements of four attributes of 150 iris flowers of three different types: setosa, virginica and versicolor [4]. The four attributes are sepal length, sepal width, petal length and petal width, all in cm.
Some data points in figure 3.5 are incorrectly classified because the decision boundary is
in the center of a cluster. The two features are not distinctive enough to separate the two
classes. The conclusion is that it is very important to have distinctive features in order to
predict the right class.
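Once the weights have been fitted, classifying a new sentence pair with the log-odds formula from this section reduces to a sigmoid over a weighted feature sum. A minimal sketch with made-up weights (the real program learns its weights from training data):

```python
import math

def predict_proba(features, weights, bias):
    # log(p / (1 - p)) = bias + sum(w_i * x_i)
    # => p = 1 / (1 + exp(-log_odds))   (the logistic/sigmoid function)
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Illustrative weights only, not values from the actual experiment:
# feature 1 = number of antonyms, feature 2 = number of negations.
weights = [1.2, 0.8]
bias = -2.0

# A pair with two antonyms and one negation gets a high probability
# of non-entailment; a pair with neither gets a low one.
p_contradiction = predict_proba([2, 1], weights, bias)
p_neutral = predict_proba([0, 0], weights, bias)
```

The decision boundary in figure 3.4 is exactly the set of feature vectors where this probability equals 0.5, i.e. where the log-odds are zero.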
3.5 Features
For this experiment we chose the following features to detect contradictions:
• Number of antonyms
• Switching of object and subject
• Alignment
• Number of negations
3.5.1 Antonyms feature
This feature searches for antonyms in a sentence pair. When antonyms occur in two sentences, there is often no entailment and thus a high chance of contradiction. An example that contains an antonym is:
‘Mitsubishi Motors Corp. sales fell 46 percent’ vs ‘Mitsubishi sales rose 46 percent’
Figure 3.4: An example of a 2D plot for contradiction features. The red dots mean a low probability of contradiction, while the blue squares show a high probability of contradiction. Feature one could be the number of negations and feature two the number of antonyms. A line separates non-entailment from entailment. All sentence pairs above the line have no entailment, and thus a high chance of being a contradiction. Everything below the line has high entailment and therefore a low chance of being a contradiction. This line, also called the decision boundary, is obtained after fitting the model on training data.
‘Fell’ and ‘rose’ are not direct antonyms like ‘good’ and ‘bad’. However, synonyms of ‘fell’ and ‘rose’ are antonyms. The program first checks whether words A and B are antonyms; if not, it checks whether word A is an antonym of a synonym of B, or vice versa. To improve the detection of antonyms, the lemma of each word is used. A lemma is the canonical form of a word: the lemma of ‘fell’ is ‘fall’ and the lemma of ‘rose’ is ‘rise’.
The word ‘fall’ can have different meanings: it can be a synonym for ‘autumn’ or for ‘descend’. In this example the program compares ‘rise’ (meaning: to go up) with a synonym of ‘fall’: ‘descend’ (meaning: to go down). The output of the program shows which synonyms are used:
### are_antonyms(fell, rose):
lemma1: Lemma(’fall.v.01.fall’)
lemma2: Lemma(’rise.v.01.rise’)
===========
synonyms: {Lemma(’decrease.v.01.decrease’),
Lemma(’descend.v.01.come_down’), Lemma(’precipitate.v.03.precipitate’),
Lemma(’fall.v.32.settle’), Lemma(’decrease.v.01.lessen’),
Lemma(’hang.v.05.flow’), Lemma(’fall.v.21.return’),
Lemma(’fall.v.20.light’), Lemma(’fall.v.04.come’),
Lemma(’decrease.v.01.diminish’), Lemma(’fall.v.23.fall_down’),
Lemma(’fall.v.01.fall’), Lemma(’hang.v.05.hang’),
Lemma(’fall.v.08.shine’), Lemma(’fall.v.21.pass’),
Lemma(’descend.v.01.go_down’), Lemma(’fall.v.21.devolve’),
Lemma(’accrue.v.02.accrue’), Lemma(’fall.v.08.strike’),
Lemma(’descend.v.01.descend’)}
Figure 3.5: Two features of the iris dataset. This dataset is used to classify each type of iris
based on the given measurements. There is a cluster in the blue area and a second cluster in
the brown/red area. The blue one contains Iris setosa, while the other two types of flowers are
grouped together in the second cluster.
antonyms: [Lemma(’descend.v.01.fall’)]
fell rose => True
The string of each lemma consists of four parts separated by dots. The first part is the synset’s head word, followed by a letter which indicates whether it is a verb (v), noun (n), adjective (a), adjective satellite (s) or adverb (r). Next is a two-digit integer identifying the sense of the word, and the last part is the lemma itself. The function are_antonyms returns either ‘True’ or ‘False’ to indicate whether the two words are antonyms of each other.
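The two-step check (direct antonyms first, then antonyms of synonyms) can be sketched with a tiny hand-made lexicon standing in for WordNet. The real program queries WordNet synsets and also lemmatizes first, so ‘fell’/‘rose’ become ‘fall’/‘rise’ before this check runs.

```python
# Toy stand-in for WordNet: tiny hand-made synonym and antonym lists,
# illustrative only and far from complete.
SYNONYMS = {
    "fall": {"descend", "decrease", "drop"},
    "rise": {"ascend", "increase", "climb"},
}
ANTONYMS = {
    "descend": {"ascend", "rise"},
    "rise": {"descend"},
}

def direct_antonyms(word):
    return ANTONYMS.get(word, set())

def are_antonyms(a, b):
    # Step 1: are the words themselves listed as antonyms?
    if b in direct_antonyms(a) or a in direct_antonyms(b):
        return True
    # Step 2: is one word an antonym of a synonym of the other
    # (checked in both directions)?
    for syn in SYNONYMS.get(a, set()):
        if b in direct_antonyms(syn):
            return True
    for syn in SYNONYMS.get(b, set()):
        if a in direct_antonyms(syn):
            return True
    return False
```

‘fall’ and ‘rise’ match here via step 2: ‘descend’ is a synonym of ‘fall’ and an antonym of ‘rise’, mirroring the WordNet output shown above.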
3.5.2 Switching of object and subject feature
In some cases the subject and object are switched between the sentences. This feature detects when an object becomes a subject and vice versa. If it detects a switch, there is no entailment and thus a higher chance of a contradiction. Since the filtering of non co-referent sentences is not implemented, it cannot be said with certainty that the sentences contradict each other, because they might be about different events.
‘CD Technologies announced that it has closed the acquisition of Datel, Inc.’ vs
‘Datel acquired CD Technologies’
In this example, ‘CD Technologies’ was the subject in the first sentence, but the object in
the second sentence. So in this case the sentences contradict each other.
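On extracted (subject, verb, object) triples, the check reduces to the sketch below. The triples and the helper name are illustrative; the real program reads the nsubj and dobj arcs of the spaCy dependency graphs.

```python
def subject_object_switched(triple_a, triple_b):
    # Each triple is (subject, verb, object), e.g. taken from the
    # nsubj and dobj arcs of a dependency graph.
    subj_a, _, obj_a = triple_a
    subj_b, _, obj_b = triple_b
    # A switch: A's subject appears as B's object and vice versa.
    return subj_a == obj_b and obj_a == subj_b

# Simplified triples for the example pair above (hypothetical values).
a = ("CD Technologies", "acquired", "Datel")
b = ("Datel", "acquired", "CD Technologies")
```

A fuller version would also compare the verbs (and their synonyms), since a switch only signals contradiction when both sentences describe the same kind of event.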
3.5.3 Alignment feature
This feature uses the alignment score to determine whether a pair of sentences contradict each other. When the alignment score is high, it is possible to predict whether the sentences entail each other. If the alignment score is low, there is a high chance that entailment is unknown. Sentence pairs need an alignment score higher than 4; this threshold was chosen empirically.
3.5.4 Negation feature
Negations are words like ‘not’ (contractions such as ‘didn’t’ and ‘haven’t’ are parsed as two nodes). The first feature counts the number of negations in the first graph, while the second counts the negations in the second graph. In this way the classifier can recognize patterns involving negations.
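A token-level sketch of the two negation-count features, assuming contractions have already been split into two tokens (as described above); the token set is an illustrative subset, not the program's actual list:

```python
# Illustrative subset of negation tokens; the real program derives
# negations from WordNet and the dependency graph.
NEGATION_TOKENS = {"not", "n't", "no", "never"}

def count_negations(tokens):
    # One count per sentence: how many negation tokens it contains.
    return sum(1 for t in tokens if t.lower() in NEGATION_TOKENS)

def negation_features(tokens_a, tokens_b):
    # Two separate features, one per graph, so the classifier can learn
    # patterns such as "negated in exactly one of the two sentences".
    return [count_negations(tokens_a), count_negations(tokens_b)]
```

Keeping the two counts separate (rather than, say, their difference) lets the logistic regression weight each sentence's negations independently.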
CHAPTER 4
Experiments
In this experiment, various RTE datasets are used to detect entailment. They contain the following numbers [3]:

Dataset     # of contradictions   # of total pairs
RTE1 dev1   48                    287
RTE1 dev2   55                    280
RTE1 test   149                   800
RTE2 dev    111                   800
RTE3 dev    80                    800
RTE3 test   72                    800
These datasets contain many sentence pairs divided into three classes based on entailment: ‘yes’, ‘no’ and ‘unknown’. Unknown entailment is often the result of non co-referent sentences. An example of a non co-referent sentence pair found in the RTE1 dev2 3ways dataset is:
‘The Irish Sales, Services and Marketing Operation of Microsoft was established in 1991’ vs ‘Microsoft was established in 1991’
The first sentence is about a specific department of Microsoft, the second about Microsoft itself. Although they are similar, they are not about the same event and therefore entailment is unknown. To test entailment detection, 10 pairs containing antonyms and negations were chosen from the RTE1 dev1 3ways and RTE1 dev2 3ways datasets, and 16 more from RTE1 test 3ways.
Dataset     # of pairs   # cont. antonyms   # cont. negations   % accurate
RTE1 dev        10               5                  2              90%
RTE1 test       16              11                  5              62.5%
The table above shows the overall results. For the RTE1 dev datasets, 10 pairs were chosen,
of which 5 contained antonyms, 2 contained negations and 2 contained neither. With these
features, the program achieved an accuracy of 90%. For the RTE1 test dataset, 16 pairs were
chosen, of which 11 contained antonyms and 5 contained negations; here the accuracy was
62.5%. Some results showing the output for individual sentence pairs are shown below:
=== id=13 entails=1 length=None task=IR ===
T: iTunes software has seen strong sales in Europe.
H: Strong sales for iTunes in Europe.
alignment: 3.0
features: [ 0. 0. 0. 0. 0.]
This output shows an alignment score of ‘3.0’: three words (‘sales’, ‘iTunes’ and ‘Europe’)
match exactly. Exact matches are 100% identical and therefore each receive a score of ‘1.0000’;
the sum of these values gives the final alignment score. The word ‘strong’ is not a PROPN,
NOUN or VERB and is therefore not considered for matching.
The next number is the antonym feature; since neither sentence contains an antonym of a
word in the other, its value is 0. The last number is the switch between object and subject feature.
Since the second sentence contains no verb, there are no subjects and objects to switch, so
the object and subject switch feature also has a value of 0.
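The exact-match part of the alignment score can be sketched as follows, assuming tokens arrive as (word, POS tag) pairs; the real score also includes partial similarities, which is why other pairs have non-integer scores:

```python
CONTENT_POS = {"PROPN", "NOUN", "VERB"}

def content_words(tagged):
    """Lowercased words whose POS tag is PROPN, NOUN or VERB."""
    return {word.lower() for word, pos in tagged if pos in CONTENT_POS}

def exact_match_alignment(tagged_t, tagged_h):
    """Sum 1.0 for every content word occurring in both sentences;
    other parts of speech are ignored for matching."""
    return float(len(content_words(tagged_t) & content_words(tagged_h)))

# The id=13 pair: 'sales', 'iTunes' and 'Europe' match exactly.
t = [("iTunes", "PROPN"), ("software", "NOUN"), ("has", "VERB"),
     ("seen", "VERB"), ("strong", "ADJ"), ("sales", "NOUN"),
     ("in", "ADP"), ("Europe", "PROPN")]
h = [("Strong", "ADJ"), ("sales", "NOUN"), ("for", "ADP"),
     ("iTunes", "PROPN"), ("in", "ADP"), ("Europe", "PROPN")]
print(exact_match_alignment(t, h))  # 3.0
```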
=== id=148 entails=0 length=None task=RC ===
T: The Philippine Stock Exchange Composite Index rose 0.1 percent to 1573.65.
H: The Philippine Stock Exchange Composite Index dropped.
alignment: 5.50424838962
features: [ 1. 1. 0. 0. 0.]
In this example the alignment score is high, although it should be lower: ‘The Philippine
Stock Exchange Composite Index’ should be treated as a single organization and therefore
contribute only ‘1.0000’ to the alignment score. The words ‘rose’ and ‘dropped’ are antonyms
and therefore also receive a high alignment score. Since there is an antonym, the second value
is now 1. This means there is no entailment and thus a high probability of a contradiction.
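As an illustration of the antonym feature, a sketch with a tiny hand-built antonym table (a real system would query a lexical resource instead of hard-coding pairs):

```python
# Tiny illustrative antonym table; hand-built for this sketch only.
ANTONYMS = {("rose", "dropped"), ("bought", "sold"), ("fell", "rose")}

def antonym_feature(words_t, words_h):
    """Return 1.0 if any word pair across the two sentences is
    listed as an antonym pair (in either order)."""
    for a in words_t:
        for b in words_h:
            if (a, b) in ANTONYMS or (b, a) in ANTONYMS:
                return 1.0
    return 0.0

print(antonym_feature(["index", "rose"], ["index", "dropped"]))  # 1.0
```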
=== id=177 entails=0 length=None task=RC ===
T: Increased storage isn’t all Microsoft will be offering its Hotmail users
--they can also look forward to free anti-virus protection.
H: Microsoft won’t offer increased storage to its users.
alignment: 4.99999963253
n’t
n’t
features: [ 1. 0. 1. 1. 1.]
This case shows a high alignment score and a detected switch of subject and object. In the
first sentence ‘increased storage’ is the object and ‘Microsoft’ is the subject. The switch is a
false positive: ‘Microsoft’ does not become an object, only its position in the sentence changes.
Negations are detected in both sentences. Although there is no antonym, there is no
entailment, so the probability of a contradiction is high.
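The switch feature can be sketched by assuming each dependency graph has been reduced to a hypothetical (subject, verb, object) triple:

```python
def switch_feature(triple_t, triple_h):
    """Fire when the subject of one sentence appears as the object
    of the other and vice versa, i.e. the roles are reversed."""
    subj_t, _, obj_t = triple_t
    subj_h, _, obj_h = triple_h
    return 1.0 if subj_t == obj_h and obj_t == subj_h else 0.0

# The CD Technologies / Datel example from Section 3.5.2:
print(switch_feature(("cd technologies", "closed", "datel"),
                     ("datel", "acquired", "cd technologies")))  # 1.0
```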
=== id=969 entails=0 length=None task=PP ===
T: Doug Lawrence bought the impressionist oil landscape by J. Ottis
Adams in the mid-1970s at a Fort Wayne antiques dealer.
H: Doug Lawrence sold the impressionist oil landscape by J. Ottis Adams
alignment: 4.78047287886
features: [ 1. 1. 0. 0. 0.]
This is a clear example of an antonym (‘bought’ vs ‘sold’).
=== id=971 entails=0 length=None task=PP ===
T: Mitsubishi Motors Corp.’s new vehicle sales in the US fell 46 percent in June
H: Mitsubishi sales rose 46 percent
alignment: 4.48539948256
features: [ 1. 1. 0. 0. 0.]
This sentence pair, used earlier in this thesis, is also a clear example of an antonym (‘fell’ vs ‘rose’).
=== DEV DATASET =============================
number of pairs in dev: 10
logreg score on dev: 0.9
pair ids: [13, 46, 52, 148, 177, 227, 956, 969, 971, 1950]
answers: [ 1. 2. 1. 0. 0. 0. 0. 0. 0. 0.]
predicted: [ 1. 1. 1. 0. 0. 0. 0. 0. 0. 0.]
This output shows that 10 sentence pairs were used for training. When predicting the same
sentence pairs with the trained model, a mean accuracy of 90% is achieved; this is done to
verify that the examples are learned correctly. The numbers after ‘pair ids’ are the IDs in the
RTE1 dataset. The numbers after ‘answers’ give the entailment values from the dataset:
0 is no, 1 is yes and 2 is unknown.
In this case, the second predicted value is wrong because pairs 13 and 46 share the same
feature vector [0, 0, 0, 0, 0], which causes a mismatch between the predicted value and the
answer: pair 13 has value 1 while pair 46 has value 2.
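The training step itself can be sketched with scikit-learn (an assumption; the implementation used in this thesis is not named here), using feature vectors like the ones printed in the outputs above:

```python
from sklearn.linear_model import LogisticRegression

# Five-dimensional feature vectors as printed in the outputs above
# (assumed ordering: alignment, antonym, subject/object switch,
# and the negation counts for T and H).
X = [
    [0., 0., 0., 0., 0.],  # id=13,   entailment
    [1., 1., 0., 0., 0.],  # id=148,  no entailment
    [1., 0., 1., 1., 1.],  # id=177,  no entailment
    [1., 1., 0., 0., 0.],  # id=969,  no entailment
    [0., 0., 1., 1., 0.],  # id=1984, no entailment
]
y = [1, 0, 0, 0, 0]  # 1 = yes, 0 = no

clf = LogisticRegression().fit(X, y)
# A fresh pair with an antonym and a high alignment score:
print(clf.predict([[1., 1., 0., 0., 0.]]))
```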
=== id=1984 entails=0 length=None task=PP ===
T: Those accounts were not officially confirmed by the Uzbek or American governments.
H: The Uzbek or American governments confirmed those accounts.
alignment: 4.0
not
features: [ 0. 0. 1. 1. 0.]
=== id=1981 entails=0 length=None task=PP ===
T: The bombers had not managed to enter the embassy compounds.
H: The bombers entered the embassy compounds.
alignment: 4.00000032203
not
features: [ 1. 0. 0. 1. 0.]
These two sentence pairs both contain negations. The first pair does not get a value of 1 for
the alignment feature because its score is not larger than 4.
=== TEST DATASET ============================
number of pairs in test: 16
logreg score on test: 0.625
pair ids: [1370, 2167, 2019, 934, 1847, 1990, 1984, 1421, 1445, 1981, 1960, 2088,
1044, 986, 1078, 1077]
answers: [0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0]
predicted: [ 0. 1. 1. 0. 1. 0. 0. 1. 0. 0. 1. 1. 1. 0. 0. 0.]
It can be seen that not all pairs are predicted correctly; the accuracy is 62.5%. All of the
incorrect pairs have the feature vector [0, 0, 0, 0, 0] and are therefore automatically assigned
entailment 1. More features need to be implemented to achieve a higher accuracy.
When running the program on the sentence pairs found in the BBC and Pravda articles, we
get:
Mr joulstra said Pro-Russian rebels were in charge of the area from where
the missile that hit MH17 had been fired
Group representatives confirmed that the plane was shot down from the territory
controlled by the official Kiev
alignment: 2.31372259356
features: [ 0. 0. 0. 0. 0.]
predicted: [ 1.]
Unfortunately, the program detects entailment in this sentence pair while there should be
none. The alignment score is low because ‘MH17’ and ‘plane’ are not recognized as synonyms;
this is world knowledge and would therefore have to be entered manually into a database.
The synonyms that were found are ‘fired’ - ‘shot down’ and ‘area’ - ‘territory’.
CHAPTER 5
Conclusion and discussion
Contradiction detection is hard due to the many different kinds of contradictions and the
difficulty of language in general. This thesis focused on recognizing antonyms and negations
using four phases. The first phase uses spaCy as a dependency parser. The second phase
aligns the two dependency graphs obtained from spaCy. The third phase, filtering non
co-referent sentences, was not implemented because this experiment does not focus on
contradictions requiring world knowledge. In the last phase, logistic regression is applied
using five features, based on antonyms, negations, the alignment score and the switching of
objects and subjects.
In the experiments, we achieved an accuracy of 90% on the dev set and 62.5% on the test
set. Antonyms and negations are detected, but when all of the features have value 0, the
predictions are wrong. When using the program to detect entailment in the sentence pairs
found regarding MH17, the result is unfortunately not accurate; a database containing world
knowledge would be needed, for example to treat ‘MH17’ and ‘plane’ as synonyms.
In the future, this program could be extended with more features to detect a wider range
of contradictions. Numerical differences could be detected by adding words like ‘no’, ‘some’,
‘many’, ‘most’ and ‘all’ to a database of quantities. Implemented correctly, it should recognize
that ‘more than 500’ is compatible with ‘700’.
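Such a numerical check could be sketched as interval compatibility; the parsing below handles only two illustrative patterns and is not part of the implemented system:

```python
import re

def parse_quantity(text):
    """Map a quantity phrase to a (low, high) interval; only two
    illustrative patterns are handled here."""
    match = re.search(r"more than (\d+)", text)
    if match:
        return (int(match.group(1)) + 1, float("inf"))
    match = re.search(r"\d+", text)
    if match:
        value = int(match.group())
        return (value, value)
    raise ValueError("no quantity found in: " + text)

def compatible(phrase_a, phrase_b):
    """Two quantities are compatible when their intervals overlap."""
    (lo_a, hi_a) = parse_quantity(phrase_a)
    (lo_b, hi_b) = parse_quantity(phrase_b)
    return lo_a <= hi_b and lo_b <= hi_a

print(compatible("more than 500", "700"))  # True
print(compatible("more than 500", "300"))  # False
```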
To detect contradictions involving world knowledge, another database should be incorporated.
This database should answer queries related to specific functional relations; for example,
‘born in’ has to map to one specific place, since a person can only be born in one place.
Sentence pairs containing geographical places should make use of holonyms: words that stand
in a whole-part relationship to other words. For example, ‘house’ is a holonym of ‘door’ and
‘window’, and ‘the Netherlands’ is a holonym of ‘Amsterdam’. This database should therefore
be expanded with places and their relationships to other places.
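A sketch of such a holonym lookup with a tiny hand-built table (illustrative only; a real database would be far larger):

```python
# Tiny illustrative holonym table: part -> whole.
HOLONYM_OF = {
    "door": "house",
    "window": "house",
    "amsterdam": "the netherlands",
}

def places_conflict(place_a, place_b):
    """Two place mentions conflict only when they differ and neither
    contains the other according to the (hand-built) holonym table."""
    if place_a == place_b:
        return False
    related = (HOLONYM_OF.get(place_a) == place_b
               or HOLONYM_OF.get(place_b) == place_a)
    return not related

print(places_conflict("amsterdam", "the netherlands"))  # False
print(places_conflict("amsterdam", "paris"))            # True
```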
With these kinds of databases, containing information about numbers, specific functional
relations and holonyms, one should be able to detect a larger number of contradictions.
Bibliography
[1] Daniel Cer. Aligning semantic graphs for textual inference and machine reading.
[2] Ido Dagan, Bill Dolan, Bernardo Magnini, and Dan Roth. Recognizing textual entailment: Rational, evaluation and approaches–erratum. Natural Language Engineering,
16(01):105–105, 2010.
[3] Marie-Catherine De Marneffe, Anna N Rafferty, and Christopher D Manning. Finding
contradictions in text. In ACL, volume 8, pages 1039–1047, 2008.
[4] Ravindra Koggalage and Saman Halgamuge. Reducing the number of training samples
for fast support vector machine classification. Neural Information Processing-Letters
and Reviews, 2(3):57–65, 2004.
[5] Omer Levy and Yoav Goldberg. Dependency-based word embeddings. In ACL (2),
pages 302–308, 2014.
[6] Bill MacCartney, Trond Grenager, Marie-Catherine de Marneffe, Daniel Cer, and
Christopher D Manning. Learning to recognize features of valid textual entailments. In
Proceedings of the main conference on Human Language Technology Conference of the
North American Chapter of the Association of Computational Linguistics, pages 41–48.
Association for Computational Linguistics, 2006.
[7] Alan Ritter, Doug Downey, Stephen Soderland, and Oren Etzioni. It’s a contradiction—
no, it’s not: a case study using functional relations. In Proceedings of the Conference
on Empirical Methods in Natural Language Processing, pages 11–20. Association for
Computational Linguistics, 2008.
[8] spaCy. spacy css demo, 2015. [Online; accessed 15-May-2016].
[9] spaCy. spacy documentation, 2015. [Online; accessed 15-May-2016].
[10] Universal Dependencies. Universal POS tags, 2014. [Online; accessed 26-May-2016].