Introduction to JAPE

Mark A. Greenwood
University of Sheffield NLP
Recap
• Installed and run GATE
• Understand the idea of
  LR – Language Resources
  PR – Processing Resources
• ANNIE
  Understand the goals of information extraction
  Loaded ANNIE into GATE
  Constructed one or more gazetteer lists
Overview
• Limitations of Gazetteer Lists
• High Level Overview of Pattern Matching
• What is JAPE?
• Learn JAPE by Example
  Input Specifications
  Left Hand Side
  Macros
  Right Hand Side
  Phases
  Loading JAPE into GATE
• Hands On – Extending the IE example
Limitations of Gazetteer Lists
• Gazetteer lists are designed for annotating simple, regular features
• Some flexibility is provided by matching
  • Word roots
  • Whole/part words
• For example, recognising e-mail addresses using just a gazetteer would be impossible, since the set of valid addresses cannot be enumerated in a list
High Level Overview of Pattern Matching
• The early components in the ANNIE pipeline produce simple annotations
  Token, Sentence, Lookup
• These annotations have features
  Token kind, part of speech, major type...
• Patterns in these annotations and features can suggest more complex information
What is JAPE?
• JAPE provides pattern matching in GATE
• Each JAPE rule consists of
  LHS, which contains the patterns to match
  RHS, which details the annotations (and optionally features) to be created
• JAPE rules combine to create a phase
• Phases combine to create a grammar
Learn JAPE By Example
Phase: EMail
Input: Token SpaceToken
Options: control = appelt
Macro: WORD_OR_NUMBER
(
({Token.kind == word}|{Token.kind == number})
)
Rule: emailaddress
Priority: 50
(
(WORD_OR_NUMBER)+
({Token.string == "."}(WORD_OR_NUMBER)+)*
{Token.string == "@"}
(WORD_OR_NUMBER)+
({Token.string == "."}(WORD_OR_NUMBER)+)*
)
:email -->
:email.Email = {rule = "emailaddress"}
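For readers who know regular expressions better than JAPE, the shape of the rule's pattern can be approximated at character level. The following is a rough Python sketch, not JAPE: it assumes that a run of ASCII letters and digits stands in for one or more word/number Tokens, which simplifies what the ANNIE tokeniser really produces, and the names `EMAIL_PATTERN` and `find_emails` are hypothetical.

```python
import re

# Assumption: a run of letters/digits approximates (WORD_OR_NUMBER)+.
WORD_OR_NUMBER = r"[A-Za-z0-9]+"

# Same structure as the LHS of the emailaddress rule:
# words separated by dots, an "@", then words separated by dots.
EMAIL_PATTERN = re.compile(
    rf"{WORD_OR_NUMBER}(?:\.{WORD_OR_NUMBER})*"
    r"@"
    rf"{WORD_OR_NUMBER}(?:\.{WORD_OR_NUMBER})*"
)


def find_emails(text):
    """Return every substring matched by the approximate pattern."""
    return [m.group(0) for m in EMAIL_PATTERN.finditer(text)]


print(find_emails("Contact john.smith@example.org or admin@example.com today."))
# ['john.smith@example.org', 'admin@example.com']
```

Unlike this sketch, the JAPE rule works on Token annotations rather than characters, so its behaviour depends on how the tokeniser splits the text.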
Learn JAPE By Example:
Input Specifications
• Each JAPE file defines a phase of the grammar
• The header specifies how the rules within the phase will be applied to the documents
• The input to the rules within this phase is the subset of annotations specified in the header
• The rules within a single phase compete based on the control option
Learn JAPE By Example:
Input Specifications
• 5 different control styles:
  Appelt (use of priorities)
  Once (as soon as a rule fires, matching stops)
  First (a rule fires on the first, i.e. shortest, match)
  Brill (fire every rule that applies)
  All (all possible matches)
• Appelt priority is applied in the following order
  Longest match
  Highest explicit priority (default = -1)
  First defined rule
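The Appelt ordering can be read as a three-part sort key. The sketch below is a Python illustration of that ordering only, not GATE's implementation; it assumes all candidate matches start at the same position, and the `Candidate` class and `appelt_choose` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    rule: str           # name of the rule that matched
    length: int         # length of the matched region
    priority: int = -1  # explicit Priority value; -1 when none is given
    order: int = 0      # position of the rule in the file (0 = defined first)


def appelt_choose(candidates):
    """Pick one match: longest, then highest priority, then earliest rule."""
    return min(candidates, key=lambda c: (-c.length, -c.priority, c.order))


candidates = [
    Candidate("short_high", length=2, priority=100, order=0),
    Candidate("long_default", length=5, order=1),
    Candidate("long_explicit", length=5, priority=50, order=2),
]
print(appelt_choose(candidates).rule)  # long_explicit
```

Note that `short_high` loses despite its priority of 100: length is compared before priority.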
Learn JAPE By Example:
Input Specifications
[Figure: how the pattern {A}+ matches a sequence of A annotations under each control style: Appelt, Once, First, Brill, All]
Learn JAPE By Example:
Left Hand Side Patterns
• LHS is expressed in terms of existing annotations, and optionally features and their values
• Any annotation to be used must be included in the input header
• Any annotation not included in the input header will be ignored (e.g. whitespace)
• Each annotation is enclosed in curly braces
• Annotations may be combined using traditional Kleene operators: | * + ?
• Each pattern to be matched is enclosed in round brackets and can have a label attached
Learn JAPE By Example:
Left Hand Side Patterns
• As well as matching against the presence of an annotation, JAPE rules can access annotation features
  {Token.kind == "number"}
• Features can be compared with ==, !=, >, <, =~, !~, ==~ and !=~
• Ranges can be specified
  ({Token})[1,3] (one to three Tokens) or ({Token})[3] (exactly three Tokens)
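The difference between the regular expression operators is easy to miss: =~ succeeds if the expression matches anywhere inside the feature value, while ==~ requires the whole value to match. The Python sketch below mirrors the string operators as an analogy; JAPE uses Java regular expressions, so Python's `re` module is only a stand-in, the numeric operators > and < are left out, and `OPERATORS` and `compare_feature` are hypothetical names.

```python
import re

# Hypothetical mapping of JAPE's string comparison operators onto Python.
OPERATORS = {
    "==":  lambda value, arg: value == arg,
    "!=":  lambda value, arg: value != arg,
    "=~":  lambda value, arg: re.search(arg, value) is not None,     # partial match
    "!~":  lambda value, arg: re.search(arg, value) is None,
    "==~": lambda value, arg: re.fullmatch(arg, value) is not None,  # whole value
    "!=~": lambda value, arg: re.fullmatch(arg, value) is None,
}


def compare_feature(value, op, arg):
    """Apply one of the operators above to a feature value."""
    return OPERATORS[op](value, arg)


print(compare_feature("Sheffield", "=~", "field"))          # True
print(compare_feature("Sheffield", "==~", "field"))         # False
print(compare_feature("Sheffield", "==~", "[A-Z][a-z]+"))   # True
```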
Learn JAPE By Example:
Left Hand Side Patterns
• Contextual information can be specified in the same way, but has no label
• Contextual information will be consumed by the rule
  ({Annotation1})
  ({Annotation2}):match
  ({Annotation3})
• There are other constructs that can be used. For details see the user guide.
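The distinction between what a rule consumes and what it labels has a close analogue in regular expression groups. The Python sketch below is an analogy at character level, not JAPE: the whole pattern plays the role of the rule, and only the named group `match` corresponds to the labelled part. The pattern and the function name `match_with_context` are illustrative assumptions.

```python
import re

# Context on both sides is matched (consumed) but not labelled;
# only the named group in the middle carries the label.
PATTERN = re.compile(r"(?:rose )(?P<match>\d+)(?: per cent)")


def match_with_context(text):
    """Return (consumed text, labelled text), or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(0), m.group("match")


print(match_with_context("Shares in Scoot rose 9 per cent on the announcement"))
# ('rose 9 per cent', '9')
```

In JAPE terms, a new annotation would cover only the labelled part, even though the surrounding context had to be present for the rule to fire.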
Learn JAPE By Example:
Macros
• Macros look like the LHS of a rule but they never have a label
• They are used in rules by enclosing the macro name in round brackets
• It is conventional to name macros in uppercase letters
• Macros hold across an entire set of grammar phases
Learn JAPE By Example:
Right Hand Side Annotations
• LHS and RHS are separated by -->
• Label matches that on the LHS
• Annotation to be created follows the label
  ({Annotation1}):match -->
  :match.NewAnnotName =
  {feature1 = value1, feature2 = value2}
Learn JAPE By Example
Phase: EMail
Input: Token SpaceToken
Options: control = appelt
Macro: WORD_OR_NUMBER
(
({Token.kind == word}|{Token.kind == number})
)
Rule: emailaddress
Priority: 50
(
(WORD_OR_NUMBER)+
({Token.string == "."}(WORD_OR_NUMBER)+)*
{Token.string == "@"}
(WORD_OR_NUMBER)+
({Token.string == "."}(WORD_OR_NUMBER)+)*
)
:email -->
:email.Email = {rule = "emailaddress"}
Learn JAPE By Example:
Multiple Phases
• Grammars usually consist of several phases which are run sequentially
• A definition phase (conventionally called main.jape) lists the phases to be used, in order
• Only the definition phase needs to be loaded
• Temporary annotations may be created in early phases and used as input for later phases
• Annotations from earlier phases may need to be combined or modified
Learn JAPE By Example:
Loading Grammars into GATE
• Load a JAPE Transducer, setting its grammarURL parameter to the .jape file you have created
• Add it to the application and run
• Inspect the results
Hands On:
Extending the IE Example
• The best way to learn JAPE is to try writing rules yourself
• In the previous session you should have added a new gazetteer to look for words that might signify a change in share price
Hands On:
Extending the IE Example
• Use the Lookup annotations from your gazetteer along with named entities annotated by ANNIE
  Organization
  Money
  Percent
  ...
• Annotate the documents to associate a company with a change in share price:
  Shares in Scoot rose 9 per cent on the announcement...
  Whitbread shares closed up 2p at 645p.
  ...
Your Turn!
Feel Free To Refer To The User Guide
And To Ask For Help
Hands On:
Extending the IE Example
Phase: Shares
Input: Token Organization Lookup Money Percent
Options: control = appelt
Rule: ShareChange
(
{Organization}
({Token})[0,3]
{Lookup.majorType=="change"}
({Token})[0,3]
({Money}|{Percent})
):change -->
:change.ShareChange = {rule = "ShareChange"}
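To see why the rule matches the example sentences, it can help to simulate it over a flat sequence of annotation types. The Python sketch below is a simplified simulation, not how GATE evaluates JAPE: it assumes the annotations are already available as a non-overlapping list in document order, uses the made-up type `Change` for a Lookup whose majorType is "change", and encodes each type as one letter so that an ordinary regular expression with the same shape as the rule can be applied. All names are hypothetical.

```python
import re

# One letter per annotation type (hypothetical encoding for this sketch).
CODES = {"Organization": "O", "Token": "T", "Change": "C",
         "Money": "M", "Percent": "P"}

# Same shape as the ShareChange rule: Organization, up to three Tokens,
# a change word, up to three Tokens, then Money or Percent.
SHARE_CHANGE = re.compile(r"OT{0,3}CT{0,3}[MP]")


def find_share_changes(annotations):
    """annotations: list of (type, text) pairs in document order."""
    encoded = "".join(CODES[kind] for kind, _ in annotations)
    return [" ".join(text for _, text in annotations[m.start():m.end()])
            for m in SHARE_CHANGE.finditer(encoded)]


sentence = [("Token", "Shares"), ("Token", "in"), ("Organization", "Scoot"),
            ("Change", "rose"), ("Percent", "9 per cent"),
            ("Token", "on"), ("Token", "the"), ("Token", "announcement")]
print(find_share_changes(sentence))  # ['Scoot rose 9 per cent']
```

The hand-built `sentence` list stands in for the output of ANNIE and the gazetteer; in GATE the annotations can overlap, which this flat encoding ignores.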