Answering 8th Grade Science Questions
Steven Hewitt, An Ju, Katherine Stasaski

Problem Statement
● Given science questions from a grade school multiple-choice exam, find the best possible answer.
  ○ Measure of success: percent of questions correctly answered
● Example: Which of the following does not allow sound to travel through?
  A. Solid
  B. Liquid
  C. Gas
  D. Vacuum

Data Source
● Set of AI2 questions
  ○ 8th Grade: Total of 3710 Questions
  ○ 4th Grade: Total of 1437 Questions
● Entailment data
  ○ SICK: 10,000 sentence pairs
  ○ SNLI: 570,000 sentence pairs
● Open source science textbooks from c12k.org
● GloVe Stanford vectors
  ○ Word embeddings trained on large corpus of Wikipedia articles

Pipeline
● Inputs
  ○ Question (Ex: Which does not allow sound to travel through?)
  ○ Answer choices (Ex: Vacuum)
● Hypothesis gathering (Hand-crafted rules)
  ○ Output: Hypotheses (Ex: A vacuum does not allow sound to travel through.)
● Relevant sentence selection (PyLucene) over the Knowledge Base (Textbooks)
  ○ Output: Evidence (Ex: Sound wave cannot travel in an airless vacuum.)
● Model
  ○ Final output: Answer Probabilities (Ex: Vacuum 0.8)

Baseline Model
● Entailment bi-directional model with dropout on input words
● Fully connected network with dropout on node
● Output: confidence score for each category: Entails, Neutral, Contradicts
[Figure: forward and backward passes over the example sentences "Two dogs run ..." and "Dogs are playing", with dropout on words, feeding an FC layer]
Note: Loosely based on Baudis et al. (2016)

Siamese Entailment Model
● Bidirectional RNN with Dropout
● Fully connected layer (M×3): Entailment, Neutral, Contradiction
[Figure: the example sentences "Two dogs run ..." and "Dogs are playing ..." each produce an output; the Diff of the two outputs feeds the FC layer]
Note: Siamese model introduced by Mueller et al. (2016) uses a fixed function instead of FC layer.

MFF (Multi Feed-Forward Network)
● Attend: $F(a_i, b_j) = F(a_i) F(b_j)$ over the words $a_1, a_2, a_3, \ldots, a_L$ and $b_1, b_2, b_3, \ldots, b_L$
● Compare: $w_i$ for each $a_i$, $w_j$ for each $b_j$
● Aggregate: $v_1 = \sum_i w_i$, $v_2 = \sum_j w_j$, combined into the result
Note: Originally introduced by Parikh et al. (2016)

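
The Attend, Compare, and Aggregate steps of the MFF model can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It assumes the product $F(a_i) F(b_j)$ is a dot product and borrows the softmax normalisation and the concatenation steps from Parikh et al. (2016). The helper names (make_layer, mff_forward), the layer sizes, and the random untrained weights are all hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(dim)]


def make_layer(n_in, n_out, rng, relu=True):
    """Hypothetical single feed-forward layer with random (untrained) weights."""
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]

    def layer(x):
        out = [dot(row, x) for row in W]
        return [max(0.0, o) for o in out] if relu else out

    return layer


def mff_forward(a, b, F, G, H):
    """a, b: lists of word vectors. Returns probabilities over three classes."""
    # Attend: score every word pair, assuming e[i][j] = F(a_i) . F(b_j).
    Fa = [F(x) for x in a]
    Fb = [F(x) for x in b]
    e = [[dot(fa, fb) for fb in Fb] for fa in Fa]
    # Soft-align: the part of b relevant to each a_i, and vice versa.
    beta = [weighted_sum(softmax(row), b) for row in e]
    alpha = [
        weighted_sum(softmax([e[i][j] for i in range(len(a))]), a)
        for j in range(len(b))
    ]
    # Compare: each word together with its aligned counterpart.
    w_a = [G(a_i + beta_i) for a_i, beta_i in zip(a, beta)]
    w_b = [G(b_j + alpha_j) for b_j, alpha_j in zip(b, alpha)]
    # Aggregate: sum the comparison vectors, then classify.
    v1 = [sum(col) for col in zip(*w_a)]
    v2 = [sum(col) for col in zip(*w_b)]
    return softmax(H(v1 + v2))


rng = random.Random(0)
d = 4  # toy embedding size; real inputs would be GloVe vectors
F = make_layer(d, 5, rng)
G = make_layer(2 * d, 6, rng)
H = make_layer(12, 3, rng, relu=False)

a = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]  # e.g. "Two dogs run"
b = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(2)]  # e.g. "Dogs play"
probs = mff_forward(a, b, F, G, H)
labels = ["Entailment", "Neutral", "Contradiction"]
print(dict(zip(labels, probs)))
```

Because the weights are random, the printed probabilities carry no meaning; the sketch only shows how the word vectors flow through the three steps. The two sentences need not have the same length.
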
CNN
● Inputs: the words $a_1, a_2, a_3, \ldots, a_L$ and $b_1, b_2, b_3, \ldots, b_L$
● Attend: $F(a_i, b_j) = F(a_i) F(b_j)$, $F_k(a_i, b_j) = F_k(a_i) F_k(b_j)$
[Figure: CNN, Attend, CNN, Aggregate, FC, Result]
Note: Loosely based on Baudis et al. (2016)

Results: Entailment Data
Model      Test Accuracy   Parameters
Baseline   67.00%          64,131
Siamese    59.77%          64,131
CNN        65.71%          50,869
MFF        68.01%          31,043

Results: AI2 Science Question Data
Model      Test Accuracy   Parameters
Baseline   28.00%          64,131
Siamese    In-Progress     64,131
CNN        29.44%          50,869
MFF        32.19%          31,043

Next Steps:
● More Parameter Tuning
● Augmenting with Entailment and 4th Grade Data
  ○ Curriculum learning
● Transfer Learning (Pre-train model first on entailment data and then fine-tune on AI2 data)

Thank you! Questions?

References
Parikh, A. P., Täckström, O., Das, D., & Uszkoreit, J. (2016). A decomposable attention model for natural language inference. arXiv preprint arXiv:1606.01933.
Baudis, P., Stanko, S., & Sedivy, J. (2016). Joint Learning of Sentence Embeddings for Relevance and Entailment, 8-17. Retrieved from http://arxiv.org/abs/1605.04655
Mueller, J., & Thyagarajan, A. (2016). Siamese Recurrent Architecture for Learning Sentence Similarity. AAAI.