Answering 8th Grade Science Questions

Steven Hewitt, An Ju, Katherine Stasaski

Problem Statement
● Given science questions from a grade school multiple-choice exam, find the best possible answer.
  ○ Example: Which of the following does not allow sound to travel through?
    A. Solid
    B. Liquid
    C. Gas
    D. Vacuum
● Measure of success: percent of questions correctly answered
Data Source
● Set of AI2 questions
  ○ 8th Grade: Total of 3710 Questions
  ○ 4th Grade: Total of 1437 Questions
● Entailment data
  ○ SICK: 10,000 sentence pairs
  ○ SNLI: 570,000 sentence pairs
● Open source science textbooks from c12k.org
● GloVe Stanford vectors
  ○ Word embeddings trained on large corpus of Wikipedia articles
Pipeline
[Figure: pipeline diagram]
● Inputs
  ○ Question (Ex: Which does not allow sound to travel through?)
  ○ Answer choices (Ex: Vacuum)
● Hypothesis gathering (hand-crafted rules)
  ○ Output: Hypotheses (Ex: A vacuum does not allow sound to travel through.)
● Relevant sentence selection (PyLucene) over the Knowledge Base (Textbooks)
  ○ Output: Evidence (Ex: Sound wave cannot travel in an airless vacuum.)
● Model
  ○ Final output: Answer Probabilities (Ex: Vacuum 0.8)
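The pipeline stages can be sketched end to end in plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the single rewrite rule in `make_hypothesis`, the word-overlap `retrieve` standing in for PyLucene, and the overlap-based `score` standing in for the entailment model are all hypothetical.

```python
import math
import re

def tokens(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def make_hypothesis(question, answer):
    """Hypothetical hand-crafted rule: replace a leading 'Which' with the answer."""
    body = re.sub(r"^Which( of the following)?", answer, question.strip())
    return body.rstrip("?") + "."

def retrieve(hypothesis, knowledge_base):
    """Stand-in for PyLucene: pick the sentence with the largest word overlap."""
    hyp = set(tokens(hypothesis))
    return max(knowledge_base, key=lambda s: len(hyp & set(tokens(s))))

def score(hypothesis, evidence):
    """Stand-in for the entailment model: fraction of hypothesis words found in the evidence."""
    hyp = set(tokens(hypothesis))
    return len(hyp & set(tokens(evidence))) / len(hyp)

def answer_probabilities(question, answers, knowledge_base):
    """Softmax over per-answer scores, so the outputs sum to 1."""
    scores = []
    for a in answers:
        h = make_hypothesis(question, a)
        scores.append(score(h, retrieve(h, knowledge_base)))
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return {a: e / total for a, e in zip(answers, exps)}

# Toy knowledge base (assumed for the example, not taken from the textbooks).
kb = [
    "Sound waves cannot travel in an airless vacuum.",
    "Plants use sunlight to make food.",
]
question = "Which does not allow sound to travel through?"
probs = answer_probabilities(question, ["Solid", "Liquid", "Gas", "Vacuum"], kb)
best = max(probs, key=probs.get)
```

In this toy setup the highest-probability answer is the one whose hypothesis shares the most words with the retrieved evidence; a trained entailment model would replace `score`.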
Baseline Model
● Entailment bi-directional model with dropout on input words
● Fully connected network with dropout on node
[Figure: baseline model diagram. Labels: input sentences "Two dogs run ..." and "Dogs are playing ...", Dropout on word (each sentence), Forward pass, Backward pass, Output, Dropout, FC Layer, Output: confidence score for each category (Entails, Neutral, Contradicts).]
Note: Loosely based on Baudis et al (2016)
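"Dropout on input words" can be read as masking whole word vectors at training time. A minimal sketch under that reading follows; the function name `word_dropout`, the choice to zero a word rather than delete it, and the rate of 0.5 are assumptions, since the slides do not specify them.

```python
import random

def word_dropout(embedded_sentence, rate, rng, training=True):
    """Zero out whole word vectors with probability `rate` during training.

    embedded_sentence: list of word vectors (lists of floats).
    With training=False the sentence is returned unchanged (as a copy).
    """
    if not training:
        return [list(vec) for vec in embedded_sentence]
    dropped = []
    for vec in embedded_sentence:
        if rng.random() < rate:
            dropped.append([0.0] * len(vec))  # the whole word is masked
        else:
            dropped.append(list(vec))
    return dropped

rng = random.Random(0)
sentence = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy embeddings for "Two dogs run"
noisy = word_dropout(sentence, rate=0.5, rng=rng)
clean = word_dropout(sentence, rate=0.5, rng=rng, training=False)
```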
Siamese Entailment Model
[Figure: Siamese model diagram. Labels: input sentences "Two dogs run ..." and "Dogs are playing ...", Bidirectional RNN with Dropout, Output (one per sentence), Diff, Fully connected layer (M×3), Output: Entailment, Neutral, Contradiction.]
Note: Siamese model introduced by Mueller et al. (2016) uses a fixed function instead of FC layer.
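The Siamese head (shared encoder, difference of the two outputs, fully connected layer over three classes) can be sketched as follows. This is a rough illustration only: the mean-of-embeddings `encode` stands in for the bidirectional RNN, the plain element-wise difference is one reading of "Diff", and all weights are random and untrained.

```python
import math
import random

LABELS = ["Entailment", "Neutral", "Contradiction"]

def encode(sentence, embeddings):
    """Shared encoder stand-in: mean of word vectors (the slides use a bidirectional RNN)."""
    vectors = [embeddings[w] for w in sentence]
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def siamese_predict(premise, hypothesis, embeddings, weights, bias):
    """Encode both sentences with the same encoder, take the difference, apply the FC layer."""
    a = encode(premise, embeddings)
    b = encode(hypothesis, embeddings)
    diff = [x - y for x, y in zip(a, b)]
    logits = [sum(w * d for w, d in zip(row, diff)) + c for row, c in zip(weights, bias)]
    return dict(zip(LABELS, softmax(logits)))

rng = random.Random(0)
DIM = 4  # M, the encoder output size (assumed small for the example)
vocab = ["two", "dogs", "run", "are", "playing"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}
# The M x 3 layer, stored as 3 rows of length M.
weights = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in LABELS]
bias = [0.0, 0.0, 0.0]

probs = siamese_predict(["two", "dogs", "run"], ["dogs", "are", "playing"],
                        embeddings, weights, bias)
```

Because both sentences pass through the same encoder, two identical sentences yield a zero difference vector, so the prediction then depends only on the bias.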
MFF (Multi Feed-Forward Network)
[Figure: MFF diagram over sentences a = (a1, a2, a3, ..., aL) and b = (b1, b2, b3, ..., bL), in three stages: Attend, Compare, Aggregate. Labels: F(ai, bj) = F(ai)F(bj); weights wi and wj; sum over i and sum over j; v1, v2; result.]
Note: Originally introduced by Parikh et al (2016)
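The Attend, Compare and Aggregate stages can be sketched in plain Python, following the decomposable attention model of Parikh et al. (2016), where the attention score F(ai)F(bj) is a dot product of the transformed word vectors. The single-layer networks, dimensions, and random untrained weights below are assumptions made to keep the example short.

```python
import math
import random

def dense(x, W, relu=True):
    """One feed-forward layer: W is a list of rows, one per output unit."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return [max(0.0, o) for o in out] if relu else out

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def mff_forward(a, b, WF, WG, WH):
    # Attend: e[i][j] = F(a_i) . F(b_j)
    Fa = [dense(x, WF) for x in a]
    Fb = [dense(x, WF) for x in b]
    e = [[sum(p * q for p, q in zip(fa, fb)) for fb in Fb] for fa in Fa]
    # beta[i]: soft-aligned summary of b for a_i; alpha[j]: summary of a for b_j
    beta = [weighted_sum(softmax(row), b) for row in e]
    alpha = [weighted_sum(softmax([e[i][j] for i in range(len(a))]), a)
             for j in range(len(b))]
    # Compare: each word is compared with its aligned summary
    v1_i = [dense(x + bt, WG) for x, bt in zip(a, beta)]
    v2_j = [dense(y + al, WG) for y, al in zip(b, alpha)]
    # Aggregate: sum over words, then classify into three categories
    v1 = [sum(col) for col in zip(*v1_i)]
    v2 = [sum(col) for col in zip(*v2_j)]
    return softmax(dense(v1 + v2, WH, relu=False))

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
D, H = 4, 5                      # embedding size and hidden size (assumed)
WF = rand_matrix(H, D, rng)      # attend network F
WG = rand_matrix(H, 2 * D, rng)  # compare network
WH = rand_matrix(3, 2 * H, rng)  # aggregate classifier
a = rand_matrix(3, D, rng)       # toy embeddings for a 3-word sentence
b = rand_matrix(4, D, rng)       # toy embeddings for a 4-word sentence
probs = mff_forward(a, b, WF, WG, WH)
```

Since every stage works on sets of word vectors and sums, this sketch ignores word order, which is part of why the model needs few parameters.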
CNN
[Figure: CNN model diagram over sentences a = (a1, a2, a3, ..., aL) and b = (b1, b2, b3, ..., bL). Labels: Attend, with F(ai, bj) = F(ai)F(bj) and Fk(ai, bj) = Fk(ai)Fk(bj); CNN; Aggregate; FC; Result.]
Note: Loosely based on Baudis et al (2016)
Results: Entailment Data
Model      Test Accuracy   Parameters
Baseline   67.00%          64,131
Siamese    59.77%          64,131
CNN        65.71%          50,869
MFF        68.01%          31,043
Results: AI2 Science Question Data
Model      Test Accuracy   Parameters
Baseline   28.00%          64,131
Siamese    In-Progress     64,131
CNN        29.44%          50,869
MFF        32.19%          31,043
Next Steps:
● More Parameter Tuning
● Augmenting with Entailment and 4th Grade Data
  ○ Curriculum learning
● Transfer Learning (Pre-train model first on entailment data and then fine-tune on AI2 data)
Thank you! Questions?
References
Parikh, A. P., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. arXiv preprint arXiv:1606.01933.
Baudis, P., Stanko, S., & Sedivy, J. (2016). Joint Learning of Sentence Embeddings for Relevance and Entailment, 8-17. Retrieved from http://arxiv.org/abs/1605.04655
Mueller, J., & Thyagarajan, A. (2016). Siamese Recurrent Architecture for Learning Sentence Similarity. AAAI.