Natural Language to Machine Readable Format

By: Damian Tamayo
Presentation 2 – Nov. 13, 2009
CIS 895 – MSE Project
Outline
• Flow Diagram
• Action Items
• Architectural Design
• Test Plan
• Formal Inspection Checklist
• Project Plan
• Prototype Demonstration
• Questions / Comments
NLP Flow Diagram
[Flow diagram. Example data from the "Data Transparency" part of the diagram is shown under each step.]
• Start the Program
• User Input: Enter Data
   Data: Tic Tac Toe is a game. There are two players.
• Split Sentences
   Data: "Tic Tac Toe is a game." / "There are two players."
• Return set of sentences
• Select Sentence
• getParse(sentence)
   Data: (S(NP(NNS(VP))))
• Output to tabs
   • DisplayParses: (S(NP(NNS(VP))))
   • Display Parse Objects: Tree of Parse Objects (e.g. NounObject, Predicate, Condition)
   • Display Logic: Tree of Logic (e.g. OntologyAnd, OntologyBinary, etc.)
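The first steps of the flow can be illustrated with a minimal sketch. It assumes sentences end in ".", "!" or "?" and that parses use the compact bracket notation shown on the slide; `split_sentences` and `parse_brackets` are hypothetical helper names, not the project's actual methods, and the real program obtains its parses from the OpenNLP and StanfordNLP servers rather than from a hand-written string.

```python
import re


def split_sentences(text):
    """Split raw input into sentences after '.', '!' or '?' (assumed rule)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def parse_brackets(s):
    """Turn a compact bracketed parse into nested (label, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        if pos >= len(s) or s[pos] != "(":
            raise ValueError(f"expected '(' at position {pos}")
        pos += 1
        start = pos
        # The label runs until the next bracket.
        while pos < len(s) and s[pos] not in "()":
            pos += 1
        label = s[start:pos].strip()
        children = []
        # Each '(' directly after the label or a child opens another child.
        while pos < len(s) and s[pos] == "(":
            children.append(node())
        if pos >= len(s) or s[pos] != ")":
            raise ValueError(f"expected ')' at position {pos}")
        pos += 1
        return (label, children)

    tree = node()
    if pos != len(s):
        raise ValueError("unexpected text after the parse")
    return tree


sentences = split_sentences("Tic Tac Toe is a game. There are two players.")
tree = parse_brackets("(S(NP(NNS(VP))))")
print(sentences)
print(tree)
```

An unbalanced string raises `ValueError`, which is one way a display step could reject a malformed parse before building its tree.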
Program Parts
• NLP Program
   • User input into the system
• OpenNLP Server
   • POS tagger
• StanfordNLP Server
   • POS tagger
• PPOS Server
   • Source of user-defined POS sentences
• PPOS Client
   • Program to manually define the POS for sentences
PPOS Flow Diagram
[Flow diagram]
• Enter in tags
   NP, NNS, VBP, etc.
• Display Tree
   NP, NNS, VBP, etc.
• Continue?
• Save Parse
   (S(NP(NNS(VBP))))
   The sentence
PPOS Parts
• PPOS Client
   • Create manually tagged sentences
   • Save manually tagged sentences
• PPOS Maintenance
   • View parse list
   • Delete parses
   • Resave parses
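The save and maintenance operations listed here can be pictured with a small in-memory sketch. The class name `ParseStore` and its methods are hypothetical and only illustrate the operations; the actual PPOS server's storage and interface are not described on the slides.

```python
class ParseStore:
    """In-memory stand-in for saved PPOS parses (hypothetical, for illustration)."""

    def __init__(self):
        self._parses = {}  # sentence -> bracketed parse string

    def save(self, sentence, parse):
        # Saving under an existing sentence overwrites it, which models "resave".
        self._parses[sentence] = parse

    def get(self, sentence):
        # Returns None when no user-defined parse exists for the sentence.
        return self._parses.get(sentence)

    def list_parses(self):
        # "View parse list": all (sentence, parse) pairs, sorted by sentence.
        return sorted(self._parses.items())

    def delete(self, sentence):
        # Returns True if a parse was removed, False if none was stored.
        return self._parses.pop(sentence, None) is not None


store = ParseStore()
store.save("There are two players.", "(S(NP(NNS(VBP))))")
store.save("There are two players.", "(S(NP(NNS(VP))))")  # resave
print(store.list_parses())
print(store.delete("There are two players."))
```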
Action Items
• Formal Specification
   • USE program
Architectural Design/Class Diagram
• http://cis.ksu.edu/~dtamayo/Class_Diagram_Expanded.html
• http://cis.ksu.edu/~dtamayo/Class_Diagram_Collapsed.html
Overview
[Component diagram: NLP Program, PPOS_Server, OpenNLP_Server, StanfordNLP_Server, and the «subsystem» PPOS, connected by «uses» relationships.]
Sequence Diagram
Participants: User, NLP_Program, PPOS, OpenNLP, StanfordNLP
Start
• Enters input sentences to be processed
• Sends raw input to be split up for the user to select
Split Sentences
• Splits the sentences
• Selects sentence to be processed
Process Sentence
• Checks for a user parse first
• Returns the user parse if present
• Retrieves the OpenNLP parse trees if nothing was returned from PPOS
• Returns all parse trees from OpenNLP
• Retrieves the StanfordNLP parse trees if nothing was returned from PPOS
• Returns all parse trees from StanfordNLP
• Concatenates the parse trees and displays them
• Processes the displayed parse tree and displays the internal representation on the POViz tab
• Processes the displayed parse into an FOL-like language and displays it on the Viz tab
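The lookup order in this sequence (user-defined parse first, then both parsers as a fallback) can be sketched as follows. The function `get_parses` and the three callables passed to it are hypothetical stand-ins for the PPOS, OpenNLP and StanfordNLP servers; the stub return values are made up for illustration only.

```python
def get_parses(sentence, ppos_lookup, opennlp_parse, stanford_parse):
    """Return the user parse if PPOS has one, else all parser results combined."""
    user_parse = ppos_lookup(sentence)
    if user_parse is not None:
        return [user_parse]
    # Nothing returned from PPOS: ask both parsers and concatenate their trees.
    return list(opennlp_parse(sentence)) + list(stanford_parse(sentence))


# Stub servers (assumptions for this sketch, not real client calls).
saved = {"There are two players.": "(S(NP(NNS(VBP))))"}
ppos = saved.get
opennlp = lambda sentence: ["(S(NP(NNS(VP))))"]
stanford = lambda sentence: ["(S(NP(VP)))"]

print(get_parses("There are two players.", ppos, opennlp, stanford))
print(get_parses("Tic Tac Toe is a game.", ppos, opennlp, stanford))
```

Passing the servers in as callables keeps the fallback logic testable without any running server.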
TagSet
• http://bulba.sdsu.edu/jeanette/thesis/PennTags.html
Examples
Tic-Tac-Toe is a game.
Three squares by Three squares composes the game board in which you play Tic-Tac-Toe.
There are two players in this game.
One Player chooses the X token while the other player gets the O token.
Player X draws a grid of 9 empty squares in the formation of 3 rows and 3 columns.
The first move is made by Player X.
Player X and Player O take turns until the game is over.
A turn is placing an X or O on an empty square.
The game is over when all nine squares are filled or one player has three of their mark in a horizontal, vertical, or diagonal row.
Player X puts his token in the top left corner.
Player O puts his token in the middle left.
Player X puts his token in the center.
Player O puts his token in the bottom left corner.
Player X puts his token in the bottom right corner.
Formal Specification
• USE Version 2.4.0
   • http://www.db.informatik.uni-bremen.de/projects/USE/#download
• Models all classes
• Models all necessary methods and calls
• The specification is appended to the end of the Architectural Design document
Test Plan
The program will be tested by the developer and two fellow graduate students:
• Michael Marlen
• Jack Hart
Inspection Checklist
Action Item | Inspection Item | P/F/Partial | Comments
1 | Class diagrams use UML standard symbols | |
2 | Class diagram is clear | |
3 | Sequence diagram uses UML standard symbols | |
4 | Sequence and class diagrams match | |
5 | OCL model represents class diagram | |
6 | Architecture Design Document meets requirements | |
Milestones
• Presentation 1
• October 12, 2009
• PPOS Client/Server
• Internal Rep
• Presentation 2
• November 13, 2009
• Presentation 3
• Bug fixes
• Logical Support
• December 7, 2009
• Complete by December 10, 2009
Gantt Chart
Prototype
• NLP_Program
   • Tic Tac Toe Sentences
   • POViz tab
      • Shows structure of sentence
• PPOS_Client
   • Manually tag a sentence
   • Save
   • Maintenance
Phase 3 Deliverables
• Action Items
• User Manual
• Component Design
• Source Code
• Assessment Evaluations
• Project Evaluation
• References
• Formal Technical Inspection Letters
To Do
• Work on Deliverables
• Revise Documents as needed