Compositional vector models of natural language
Dmitrijs Milajevs
Richard Ethan Jean-Marie
Matthew Purver
Mehrnoosh Sadrzadeh

Word similarity
What am I looking for?
[Figure: screenshot of web search results for the query "idea".]

Distributional semantics
You shall know a word by the company it keeps [1].

Co-occurrence
The co-occurrence frequencies are extracted from The British National Corpus [4].
[Table: co-occurrence counts of the words idea, notion, boy and girl with the context words philosophy, book and school. Counts in extraction order: 10, 7, 0, 0, 47, 3, 12, 19, 39, 15, 146, 93.]

Word vectors
The rows in the co-occurrence table can be interpreted as vectors whose dimensions are labeled by the columns.

Vector space
[Figure: the word vectors plotted in a space with the axes philosophy, book and school.]
Vectors of similar words point in the same direction.

Similarity
The angle between two word vectors correlates with the semantic similarity of the words. For example, girl is much closer to boy than to notion.
          idea    notion  boy     girl
idea      1.      .788    .560    .576
notion    .788    1.      .499    .505
boy       .560    .499    1.      .938
girl      .576    .505    .938    1.
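The similarity computation described above can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the angle-based measure; the co-occurrence counts are hypothetical values chosen for the example, not the poster's data.

```python
import math

# Hypothetical co-occurrence counts (illustrative values only).
# Each row is a word vector; the columns are the context words.
CONTEXTS = ["philosophy", "book", "school"]
VECTORS = {
    "idea": [8.0, 20.0, 12.0],
    "notion": [6.0, 9.0, 7.0],
    "boy": [0.0, 10.0, 60.0],
    "girl": [0.0, 12.0, 50.0],
}


def cosine(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


for word in ("boy", "notion"):
    print(f"girl vs {word}: {cosine(VECTORS['girl'], VECTORS[word]):.3f}")
```

With these toy counts, girl scores higher against boy than against notion, which mirrors the pattern in the similarity table.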
[1] John R. Firth. 1957. A Synopsis of Linguistic Theory, 1930-1955. Studies in Linguistic Analysis, pages 1–32.
[4] The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Oxford University Computing Services on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/
http://www.eecs.qmul.ac.uk/~dm303/eecs_open14.html
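Dialogue act tagging
utterance categorization for dialogue act tagging

Symmetric composition
Utterance vectors: word vectors are summed up.

Grammatical analysis
[Figure: grammatical analysis of "pretty good idea" from the utterance "Well I I think it's a pretty good idea."; tag labels shown: Action-directive, Reject. The type annotations are garbled in extraction.]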
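The symmetric composition, where the word vectors of an utterance are summed up, can be sketched as below. The three-dimensional word vectors are hypothetical, and skipping out-of-vocabulary tokens is an assumption of this sketch.

```python
# Hypothetical 3-dimensional word vectors (illustrative values only).
WORD_VECTORS = {
    "pretty": [1.0, 0.0, 2.0],
    "good": [0.5, 1.0, 0.0],
    "idea": [2.0, 1.5, 0.5],
}


def utterance_vector(tokens, word_vectors):
    """Sum the vectors of the known tokens; word order does not matter."""
    dim = len(next(iter(word_vectors.values())))
    total = [0.0] * dim
    for token in tokens:
        vec = word_vectors.get(token.lower())
        if vec is None:
            continue  # assumption: unknown tokens contribute nothing
        total = [t + x for t, x in zip(total, vec)]
    return total


print(utterance_vector(["pretty", "good", "idea"], WORD_VECTORS))
```

Because addition is commutative, "pretty good idea" and "idea good pretty" receive the same vector, which is why this composition is called symmetric.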
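Approximate M, the matrix of the utterance
[Equation garbled in extraction; surviving symbols: U, M, V, T.]

$$\mathit{pretty} = \sum_{ij} P_{ij}\, \vec{n}_i \otimes \vec{n}_j$$
$$\mathit{good} = \sum_{kl} G_{kl}\, \vec{n}_k \otimes \vec{n}_l$$
$$\mathit{idea} = \sum_{w} D_{w}\, \vec{n}_w$$

Compositional phrase vector
$$\mathit{pretty\ good\ idea} = (1_N \otimes \epsilon_N)(1_{N \otimes N \otimes N} \otimes \epsilon_N)(\mathit{pretty} \otimes \mathit{good} \otimes \mathit{idea}) = \sum_{ijklw} P_{ij} G_{kl} D_{w} \langle \vec{n}_j | \vec{n}_k \rangle \langle \vec{n}_l | \vec{n}_w \rangle\, \vec{n}_i$$

Switchboard corpus [2]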
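The compositional phrase vector can be checked numerically with a small sketch. It assumes an orthonormal noun basis, so each inner product of basis vectors reduces to a Kronecker delta and the five-index sum collapses to two matrix-vector products. The matrices P and G and the vector D are hypothetical values.

```python
from itertools import product


def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def compose(P, G, D):
    """Adjective-adjective-noun composition as P applied to (G applied to D)."""
    return mat_vec(P, mat_vec(G, D))


def compose_explicit(P, G, D):
    """The same result from the full sum over i, j, k, l, w with Kronecker deltas."""
    n = len(D)
    out = [0.0] * n
    for i, j, k, l, w in product(range(n), repeat=5):
        delta_jk = 1.0 if j == k else 0.0  # inner product of n_j and n_k
        delta_lw = 1.0 if l == w else 0.0  # inner product of n_l and n_w
        out[i] += P[i][j] * G[k][l] * D[w] * delta_jk * delta_lw
    return out


# Hypothetical 2-dimensional example.
P = [[1.0, 0.0], [0.0, 2.0]]  # "pretty"
G = [[0.0, 1.0], [1.0, 0.0]]  # "good"
D = [3.0, 4.0]                # "idea"

print(compose(P, G, D))
print(compose_explicit(P, G, D))
```

Both functions return the same vector; the explicit version follows the summation term by term, while the matrix form is the one a practical implementation would more likely use.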
Look for the most dominant category among the k nearest neighbors.
[Figure: scatter of labeled points around an unlabeled query point marked "?".]
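The k-nearest-neighbors step can be sketched as a majority vote. The choice of cosine similarity as the distance measure, the two-dimensional training vectors and the function name knn_tag are assumptions of this example; the tags sv, sd and b are taken from the Switchboard excerpt on the poster.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn_tag(query, labelled, k=3):
    """Return the most frequent tag among the k most similar labelled vectors."""
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(tag for _, tag in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labelled utterance vectors (illustrative values only).
TRAIN = [
    ([1.0, 0.1], "sv"),
    ([0.9, 0.2], "sv"),
    ([0.8, 0.3], "sd"),
    ([0.1, 1.0], "b"),
    ([0.2, 0.9], "b"),
]

print(knn_tag([1.0, 0.15], TRAIN, k=3))
print(knn_tag([0.0, 1.0], TRAIN, k=3))
```

Ties are broken by the order in which tags are first counted, which is a simplification; a real tagger might weight votes by similarity instead.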
#Dubai
<+nomic> apps are the future
<+D_Polizei> They are?
<+nomic> yes.
+nomic has just coded 7 hours str8
<+nomic> back on track now
<+D_Polizei> Good thing I have a Mac then! and the APP STORE
<+nomic> yus
<+nomic> how many apps hve you bought
<+Baldilocks> hey TeChNoSouL
<TeChNoSouL> hi again
<+kelp> :-Dgood to see u
<TeChNoSouL> not in mood but here
<TeChNoSouL> dont know
<+kelp> Wooohooooooooooo
<TeChNoSouL> it will be another shorty
<+Baldilocks> !voice TeChNoSouL
<+TeChNoSouL> trip
#Bookz
#Teens
<numbertu> @search horus heresy
<clipp>
<Trjegul> @Find The Taming of the Samurai
<becky> @Jung Chang - Wild Swans Three Daughters of China (epub).rar
<becky> Three Daughters of China (epub).rar
<Melinaaa> good morning
<Melinaaa> ooo noo dose anyone know how to open disk thingie when u mess it up?>
<sc-zZz-9> what disk thingie
<scorp9> im gonna go drop off some pitosporum tothe shop
<scorp9> Antares topic none
<KornNut> yikea
http://i-world-tech.blogspot.co.uk/2013/01/internet-relay-chat-irc.html
[Figure: accuracy bar chart; values in extraction order: 0.602, 0.592, 0.639, 0.572.]
Bag of words
Chatlog similarity
General
Acknowledgment
Material things
Animals and people
[Table: channel vectors, giving for each channel the conditional probability of each of the 13 word groups; the cell values are not recoverable from the extraction.]
Doc1 = (Term11 + Term12 + Term13)
Doc2 = (Term21 + Term22 + Term23)
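[Figure: term frequency clouds for the channels #Dubai, #Bookz and #Teens. The channel labels are displaced in extraction; terms are listed in extraction order.]
think, title, young, thanks, white, world, warning, voted, years, time, wrong, people, lol, solar
white, :), years, hi, like, wb, welcome, say, :p, good, thanks, yummy, Dubai, money, lol, :(, know, wow, work, vodka
novel, fantasy, dark, world, mystery, war, books, wolf, wrong, table, star, love, private, wheel, papillon, sleep, ;), hi, time, syco, this, wb, love, like, :), pretty, ty, woman, lmao, good, hugs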
[Figure: document vectors plotted against the axes term1, term2 and term3.]
Doc3 = (Term31 + Term32 + Term33)
Chatlog similarity
Term frequency cloud
#USA
In the bag-of-words model, documents are represented as the sum of weighted term frequencies.
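A bag-of-words document vector of this kind can be sketched as follows. The sketch assumes TF-IDF as the weighting scheme (relative term frequency times the logarithm of the inverse document frequency); the token lists are hypothetical and do not come from the collected chatlogs.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs maps a document name to its token list.

    Returns the sorted vocabulary and one TF-IDF vector per document.
    """
    vocab = sorted({t for tokens in docs.values() for t in tokens})
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    vectors = {}
    for name, tokens in docs.items():
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors[name] = [
            (counts[t] / total) * math.log(n_docs / df[t]) for t in vocab
        ]
    return vocab, vectors


# Hypothetical token lists (illustrative only).
DOCS = {
    "doc1": ["novel", "fantasy", "books", "hi"],
    "doc2": ["hi", "lol", "love", "hi"],
    "doc3": ["hi", "money", "lol", "work"],
}

vocab, vectors = tfidf_vectors(DOCS)
for name, vec in vectors.items():
    print(name, [round(x, 3) for x in vec])
```

A term that occurs in every document, such as "hi" here, receives weight zero, so only the terms that distinguish a document shape its vector.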
Books (values in extraction order): 0, .00754, 0, .00511, 0, .00255, 0, .00220, 0, .00467, 0, .00592, 0.00095, .01047, 0, 0, 0, 0
Music
Dubai    .01438
India    .00563
UK       .00975
USA      0
Canada   .00783
50+      .01080
Teens    .01211
Bookz    0
mp3      0
Expressive
Words with the highest TF-IDF scores were manually categorized into
13 groups. Channel vector components are the conditional probabilities
of a group given a channel.
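The channel vectors can be sketched as below. The estimate used here, the share of a channel's tokens that fall into each group, is an assumption about how the conditional probability is computed. The word-to-group mapping is hypothetical; only the group names Books, Humor and Salutations appear on the poster.

```python
from collections import Counter

# Hypothetical manual categorization of high-scoring words into groups.
WORD_TO_GROUP = {
    "lol": "Humor",
    "haha": "Humor",
    "hi": "Salutations",
    "hey": "Salutations",
    "novel": "Books",
}
GROUP_NAMES = ["Books", "Humor", "Salutations"]


def channel_vector(tokens, word_to_group, group_names):
    """Estimate P(group | channel) as the fraction of tokens in each group."""
    total = len(tokens)
    if total == 0:
        return [0.0] * len(group_names)
    counts = Counter(word_to_group[t] for t in tokens if t in word_to_group)
    return [counts[g] / total for g in group_names]


# Hypothetical channel tokens (illustrative only).
print(channel_vector(["hi", "lol", "lol", "pizza"], WORD_TO_GROUP, GROUP_NAMES))
```

Channels can then be compared by the similarity of these vectors, in the same way as the word vectors.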
Chatlogs
#USA
Bag of words
Addition non SVD
Addition
Multiplication
http://www.eecs.qmul.ac.uk/~dm303/cvsc14.html
Derogatory
Ethical concerns were addressed through automated bot announcements
about the experiment and by safeguarding users' anonymity.
Accuracy
Virtues
Language use
Chatlogs from English speakers
from different countries and age
groups are compared to reveal
different language patterns.
Data collection
Conversations on 5 country-related, 2 age-related and 2
interest-based channels were
collected.
This composition procedure was inspired by the tensor models
of Quantum Mechanics, whereby tensors represent entangled
photons. In vector models of linguistics, entanglement represents
[3] Dmitrijs Milajevs and Matthew Purver. 2014. Investigating the Contribution of Distributional Semantic Information
Economics
Internet Relay Chat
IRC is a system that provides a text-based interface to communicate
with people around the world.
Method
•
Chatlog analysis
conversation similarity
Result
Temporal
[2] John J. Godfrey, Edward C. Holliman, and Jane McDaniel. 1992. Switchboard: Telephone speech corpus for research and development. In Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on, volume 1, pages 517–520. IEEE.
http://xkcd.com/149/
k-nearest neighbors
Humor
** (b) : Okay.
b^m (b) : Okay.
qw (qy): Well what do you think about the idea of uh
kids having to do public service work for a
year?
qy (sd): Do you think it’s a <breathing>
sv (sv): Well I I think it’s a pretty good idea.
sv (sd): I think they should either do that or or
afford some time to the military or or
helping elderly people.
Salutations
[Figure: the utterance "Well I I think it's a pretty good idea." annotated with speaker (A, B), predicted tag and actual tag; labels shown include Response and Action-directive.]
Word vectors and tensors
Singular value decomposition
Categorical composition
[Figure: term frequency cloud; terms in extraction order: hey, ..., =), ya, :p, shit, bad, lol, haha, heh, wrong, wtf, old.]