Architecture of NLG-systems

Hercules Dalianis
E-mail: [email protected]
Tel: +46 70 568 13 59
http://www.dsv.su.se/~hercules

[Figure: architecture of an NLG system. Boxes shown: User model; Style, Pragmatics, Control; Knowledge Representation, Semantics, Ontology; Deep generator, with Text Planner, Text Plan Library and Sentence Planner (Aggregation, Reference, Sentence Delimitation, Lexical Choice, Theme Control); Surface generator, with Surface Grammar, Base Lexicon and Domain Lexicon; output: Natural Language text.]

Deep generation
• A knowledge representation is the input data to the text generator.
• A user model describes what type of user is going to read the text (query analysis).
• The user model assists the content selection in the deep generation.
• The deep generator plans the content and organizes what to say.
• A text planner plans the discourse so that the text becomes coherent.
– The sentences in the text will become coherent
• A sentence planner plans each sentence:
– Sentence length, aggregation, pronouns and tense.
– Schemas can also be used

Surface generation
• The surface generator realizes the message.
• Generation grammar (language)
– Syntactically correct text
– In the correct selected language
• Lexicon
– Base lexicon
– Domain lexicon

Generation grammars
• DCG, Definite Clause Grammar (Prolog based)
• Systemic grammars (Halliday)
– Can interact between syntactic, semantic and pragmatic parts of the system.
– Each part of the grammar delimits the choices until a correct sentence structure is reached.
• Functional Unification Grammar (Martin Kay)
– Each part of the grammar is modular and can be changed
• Full control of generation

Query analysis and user modelling
• Has to analyze the query
• Find out what the user already knows
• Use some sort of user model that the system can relate to.

Deep generation: What should be said?
• Content selection
– Selects from its abundant knowledge base what should be said
• Text planning (Discourse planning)
– Make a text plan that contains a number of sentence plans. The text plan has to be coherent.
• Sentence planning
– Decides the form of sentences: active or passive form, pronominal use, aggregation, sentence length.

Sentence generation: How should it be said?
• Selects language - grammar
• Lexical choices (words)
– Base dictionary
– Domain dictionary
• Post processing

Deep generation
• Aggregation, coordination, ellipsis
• Removes redundant information without changing the content
• ASTROGEN - Aggregated deep and Surface naTuRal language GENerator
• http://www.dsv.su.se/~hercules/ASTROGEN/ASTROGEN.html

Götaland gets drizzle on Monday
Götaland gets rain on Tuesday
Götaland gets hail on Wednesday
Götaland gets snow on Thursday
Götaland gets gale on Friday
Götaland gets storm on Saturday
Götaland gets hurricane on Sunday
Svealand gets drizzle on Monday
Svealand gets rain on Tuesday
Svealand gets hail on Wednesday
Svealand gets snow on Thursday
Svealand gets gale on Friday
Svealand gets storm on Saturday
Svealand gets hurricane on Sunday

Subject and predicate aggregation:
Götaland gets drizzle on Monday,
rain on Tuesday,
hail on Wednesday,
snow on Thursday,
gale on Friday,
storm on Saturday,
hurricane on Sunday
Svealand gets drizzle on Monday,
rain on Tuesday,
hail on Wednesday,
snow on Thursday,
gale on Friday,
storm on Saturday,
hurricane on Sunday

Predicate and accusative object aggregation:
Götaland and Svealand get drizzle on Monday
Götaland and Svealand get rain on Tuesday
Götaland and Svealand get hail on Wednesday
Götaland and Svealand get snow on Thursday
Götaland and Svealand get gale on Friday
Götaland and Svealand get storm on Saturday
Götaland and Svealand get hurricane on Sunday

Applying the other aggregation to either result (predicate and accusative object aggregation after subject and predicate aggregation, or the reverse) gives:
Götaland and Svealand get drizzle on Monday
get rain on Tuesday
get hail on Wednesday
get snow on Thursday
get gale on Friday
get storm on Saturday
get hurricane on Sunday

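The two syntactic aggregation steps in the weather example can be sketched in a few lines of Python. This is only an illustration, not ASTROGEN's implementation: the clause tuples, the helper names (`join_and`, `subject_predicate_aggregation`, `predicate_object_aggregation`) and the naive verb agreement are assumptions made here, and the data is an English gloss of the weather sentences cut down to three days.

```python
# Each clause is (subject, predicate, object, adverbial).
# English glosses of the weather sentences, shortened to three days.
DAYS = [("drizzle", "on Monday"), ("rain", "on Tuesday"), ("hail", "on Wednesday")]
CLAUSES = [(region, "gets", weather, day)
           for region in ("Götaland", "Svealand")
           for weather, day in DAYS]


def join_and(items):
    """Join phrases as 'a, b and c'."""
    items = list(items)
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def subject_predicate_aggregation(clauses):
    """Merge clauses that share subject and predicate."""
    groups = {}
    for subj, pred, obj, adv in clauses:
        groups.setdefault((subj, pred), []).append(f"{obj} {adv}")
    return [f"{subj} {pred} {', '.join(rest)}"
            for (subj, pred), rest in groups.items()]


def predicate_object_aggregation(clauses):
    """Merge clauses that share predicate, object and adverbial."""
    groups = {}
    for subj, pred, obj, adv in clauses:
        groups.setdefault((pred, obj, adv), []).append(subj)
    result = []
    for (pred, obj, adv), subjects in groups.items():
        # Naive agreement (assumption): a coordinated subject takes
        # "get" instead of "gets".
        verb = pred[:-1] if len(subjects) > 1 and pred.endswith("s") else pred
        result.append(f"{join_and(subjects)} {verb} {obj} {adv}")
    return result


for line in subject_predicate_aggregation(CLAUSES):
    print(line)
for line in predicate_object_aggregation(CLAUSES):
    print(line)
```

Both functions group clauses by the parts they share and coordinate the parts that differ. Which parts form the grouping key is what distinguishes the two aggregation types.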
What can one use text generation for?
• Paraphrase parsed questions
– Parsing in a query interface can give one or more interpretations of a query.
– Paraphrase these interpretations to NL
– Usually only one sentence is generated
– One can use a bi-directional grammar for both parsing and generation (for example DCG, Definite Clause Grammar, in Prolog)

Unbounded lexical aggregation:
Götaland gets unsettled weather on Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday.
Svealand gets unsettled weather on Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday.
Bounded lexical aggregation:
Götaland gets unsettled weather the whole week
Svealand gets unsettled weather the whole week
Predicate and accusative object aggregation:
Götaland and Svealand get unsettled weather the whole week
Introduction of cue words:
Both Götaland and Svealand get unsettled weather the whole week

The same steps applied to the already aggregated sentence:
Unbounded lexical aggregation:
Götaland and Svealand get unsettled weather on Monday, Tuesday, Wednesday, Thursday, Friday, Saturday and Sunday
Bounded lexical aggregation:
Götaland and Svealand get unsettled weather the whole week
Introduction of cue words:
Both Götaland and Svealand get unsettled weather the whole week

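The lexical aggregation steps of the weather example can be sketched as follows. This is a minimal illustration under stated assumptions, not the ASTROGEN code: the table `GENERAL_TERM`, the fact format `(region, weather, day)` and all function names are hypothetical, and the sentences are English glosses of the example.

```python
WEEK = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]
WEATHER = ["drizzle", "rain", "hail", "snow", "gale", "storm", "hurricane"]

# Hypothetical table mapping each specific weather term to a general term.
GENERAL_TERM = {term: "unsettled weather" for term in WEATHER}


def join_and(items):
    """Join phrases as 'a, b and c'."""
    items = list(items)
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def unbounded_lexical_aggregation(facts):
    """Replace each weather term by its general term and collect the
    days per (region, general term)."""
    merged = {}
    for region, weather, day in facts:
        key = (region, GENERAL_TERM.get(weather, weather))
        merged.setdefault(key, []).append(day)
    return merged


def bounded_lexical_aggregation(days):
    """If the days cover the closed set of weekdays, name the whole set."""
    if set(days) == set(WEEK):
        return "the whole week"
    return "on " + join_and(days)


def realize(facts):
    return [f"{region} gets {term} {bounded_lexical_aggregation(days)}"
            for (region, term), days in unbounded_lexical_aggregation(facts).items()]


FACTS = [("Götaland", weather, day) for weather, day in zip(WEATHER, WEEK)]
print(realize(FACTS))      # all seven days
print(realize(FACTS[:3]))  # only Monday to Wednesday
```

The second call shows the difference between the two steps in this sketch: the general term is used even for a partial list, while "the whole week" is only chosen when every member of the closed set of weekdays is present.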
What can one use text
generation for?
• Generation of the target text in machine translation
• Generation of abstracts
• Generation of weather or stock reports directly from raw data.
• In several languages
• Generation of handbooks in several languages at the same time.

Paraphrasing
• Validation of formal specifications by paraphrasing them to NL
• Explanation of complex medical, legal or technical systems, for example MYCIN

Query: Who treats Adler?
SQL:
SELECT DISTINCT T3.name, T1.name, T2.empl_no, T3.reg_no
FROM DOCTOR T1, ATTDOCTOR T2, PATIENT T3
WHERE (T1.name = 'Adler J.') AND
(T2.empl_no = T1.empl_no) AND
(T3.reg_no = T2.reg_no)
PARAPHRASE OF SQL IN NATURAL LANGUAGE:
Which patients are treated by doctor Adler J.?

Query: What diagnosis does Amster have?
SQL:
SELECT DISTINCT T2.other_info, T1.name, T2.reg_no
FROM PATIENT T1, DIAGNOSIS T2
WHERE (T1.name = 'Amster K.') AND
(T2.reg_no = T1.reg_no)
PARAPHRASE OF SQL IN NATURAL LANGUAGE:
Which diagnoses does a patient named Amster K. have?

Query: Who treats Hansson?
SQL:
SELECT DISTINCT T1.name, T2.empl_no, T3.reg_no, T3.name
FROM DOCTOR T1, ATTDOCTOR T2, PATIENT T3
WHERE (T2.empl_no = T1.empl_no) AND
(T3.reg_no = T2.reg_no) AND
(T3.name = 'Hansson A.')
PARAPHRASE OF SQL IN NATURAL LANGUAGE:
Which doctors treat the patient Hansson A.?

HSQL (Help system for Structured Query Language): the prototype is
implemented on a database that contains information about different
hospitals and their operation.
Example of generation of SQL from natural language (NL) and
paraphrasing from SQL back to NL.

Generation of one or more languages
• Weather reports: COGENTEX, Canada
• Stock exchange
• Car manual: TechDoc, Honda Germany
• Support system
– Scarrie - Scania (via “Scania Swedish” and directly)

Generation from raw data
Input to GOSSIP:
ee(martin,ttyp0,login,[],8:20:03,_,_,0).
ee(martin,ttyp0,editor,[f1],8:30:00,9:10:32,0:40:32,240).
ee(martin,ttyp1,editor,[f2],8:42:21,9:13:14,0:30:53,183).
ee(martin,ttyp0,logout,[],9:21:05,_,1:01:02,0).
ee(martin,ttyp0,login,[],10:17:32,_,_,0).
ee(martin,ttyp0,editor,[f1],10:20:58,12:15:27,1:54:29,1200).
ee(martin,ttyp1,editor,[f2],11:00:39,11:32:48,0:32:09,185).
ee(jessie,ttyd0,login,[],11:03:46,_,_,0).
ee(jessie,ttyd0,editor,[f5],11:12:45,12:48:22,1:35:37,573).
ee(jessie,ttyd1,compiler,[f4],11:23:32,11:31:01,0:07:29,300).
ee(jessie,ttyd1,editor,[f3],11:32:25,11:45:56,0:13:31,70).
ee(jessie,ttyd1,editor,[f4],11:47:34,11:59:09,0:11:35,65).
ee(jessie,ttyd1,compiler,[f4],12:04:47,12:08:32,0:03:45,186).
ee(jessie,ttyd1,editor,[f3],12:09:57,12:16:34,0:06:37,15).
ee(jessie,ttyd1,editor,[f4],12:18:43,12:39:24,0:20:41,154).
ee(martin,ttyp0,logout,[],12:20:21,_,2:02:49,0).
ee(jessie,ttyd1,editor,[f7],12:56:01,13:15:02,0:19:01,143).
ee(jessie,ttyd0,editor,[f6],12:59:56,13:20:43,0:20:47,187).

Output from GOSSIP:
The system was used for 7 hours 32 minutes and 12 seconds.
The users of the system ran editors and compilers during this
time. Compilers were run six times (the cpu-time equal to 46%
of the total cpu-time). Editors were run twelve times (the
cpu-time equal to 53% of the total cpu-time). Two users, Martin
and Jessie, logged on to the system. Jessie used the system for
63 % of the time in use.

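How a report of this kind can be derived from raw event records is sketched below. This is not GOSSIP's method, only a counting-and-template illustration. The reduced record format `(user, program, cpu)`, the reading of the last `ee` argument as cpu-time, and all function names are assumptions made here. Only four of the listed events are used, so the figures differ from the report above.

```python
from collections import Counter, defaultdict

# (user, program, cpu) taken from four of the ee facts listed above.
# Reading the last ee argument as cpu-time is an assumption.
EVENTS = [
    ("martin", "editor", 240),
    ("jessie", "compiler", 300),
    ("jessie", "editor", 70),
    ("jessie", "compiler", 186),
]

NUMBER_WORDS = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six"}


def number_word(n):
    return NUMBER_WORDS.get(n, str(n))


def join_and(items):
    """Join phrases as 'a, b and c'."""
    items = list(items)
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def report(events):
    """Summarise program runs and users as English sentences."""
    runs = Counter(program for _, program, _ in events)
    cpu = defaultdict(int)
    for _, program, units in events:
        cpu[program] += units
    total = sum(cpu.values())
    sentences = []
    for program in sorted(runs):
        share = round(100 * cpu[program] / total) if total else 0
        times = "once" if runs[program] == 1 else f"{number_word(runs[program])} times"
        sentences.append(
            f"{program.capitalize()}s were run {times} "
            f"(the cpu-time equal to {share}% of the total cpu-time).")
    # Users in order of first appearance.
    users = list(dict.fromkeys(user for user, _, _ in events))
    noun = "user" if len(users) == 1 else "users"
    names = join_and(u.capitalize() for u in users)
    sentences.append(
        f"{number_word(len(users)).capitalize()} {noun}, {names}, "
        "logged on to the system.")
    return sentences


print(" ".join(report(EVENTS)))
```

The content selection here is fixed (run counts, cpu share, users); a real generator would decide what is worth reporting before realizing it.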
Validation of formal specifications by paraphrasing them to NL
• Ellemtel-Ericsson
• LOXY is a predicate logical language to describe telephone services
• Clare = LOXY + Conceptual model
• Validates LOXY and Clare by translating them to NL

ASTROGEN-Validates
• CLARE-Ericsson
[Figure: conceptual model. Labels shown: network[1]; subscriber[0..1000] with #phonenumber; XOR over idle and busy; XOR over ringtone, ringsignal, busytone and dialtone; calling with cardinalities [1] and [1]; mobile_subscriber isA.]
Subscribers are part of a network.
Mobile subscribers are subscribers.
Subscribers can either be in the state idle or busy.
The state busy can either be in the substates ringtone, ringsignal, busytone or dialtone.
When one subscriber is calling another subscriber then the first subscriber has ringtone and the other subscriber has ringsignal.

INSTANTIATION example
OF SPECIFICATION network
ENTITIES
John:subscriber
phonenumber = 100;phonenumber = 101
IS STATES
idle;
Mary:subscriber
phonenumber = 200
IS STATES
idle;
Tom:subscriber
phonenumber = 300
IS STATES
idle;
END;

(text plan)
paraphrase(f(pres,isa,john,subscriber) &
f(pres,state,john,idle) &
f(pres,poss,john,f(pres,attr,phonenumber,100)) &
f(pres,poss,john,f(pres,attr,phonenumber,101)) &
f(pres,isa,mary,subscriber) &
f(pres,state,mary,idle) &
f(pres,poss,mary,f(pres,attr,phonenumber,200)) &
f(pres,isa,tom,subscriber) &
f(pres,state,tom,idle) &
f(pres,poss,tom,f(pres,attr,phonenumber,300)) ).

john, mary and tom are subscribers
they are idle
john has phonenumbers 100 and 101
mary has a phonenumber 200
tom has a phonenumber 300.
yes

STEP/EXPRESS VOLVEX
• The STEP/EXPRESS standard
• The manufacturing industry: car, boat, airplane, process industry, power, oil, etc.
• AP: Application Protocol
• AP 214, 500 concepts
• ASTROGEN: hybrid system with both text generation and canned text.

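The step from the subscriber text plan (the `f(pres,isa,john,subscriber) & ...` structures) to the aggregated paraphrase "john, mary and tom are subscribers / they are idle / ..." can be sketched in Python. This is a simplified illustration, not ASTROGEN's Prolog implementation: the flat fact triples, the pronoun rule and every function name below are assumptions made for the example.

```python
# Flattened stand-ins for the f-structures: (relation, subject, value).
FACTS = [
    ("isa", "john", "subscriber"), ("state", "john", "idle"),
    ("phonenumber", "john", 100), ("phonenumber", "john", 101),
    ("isa", "mary", "subscriber"), ("state", "mary", "idle"),
    ("phonenumber", "mary", 200),
    ("isa", "tom", "subscriber"), ("state", "tom", "idle"),
    ("phonenumber", "tom", 300),
]


def join_and(items):
    """Join phrases as 'a, b and c'."""
    items = list(items)
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def group(facts, relation, key_index, value_index):
    """Collect the values of one relation under a shared key."""
    groups = {}
    for fact in facts:
        if fact[0] == relation:
            groups.setdefault(fact[key_index], []).append(fact[value_index])
    return groups


def paraphrase(facts):
    lines = []
    previous = None  # subjects of the preceding sentence
    # Aggregate subjects that share the same class.
    for cls, subjects in group(facts, "isa", 2, 1).items():
        plural = len(subjects) > 1
        verb = "are" if plural else "is a"
        lines.append(f"{join_and(subjects)} {verb} {cls}{'s' if plural else ''}")
        previous = subjects
    # Aggregate subjects that share the same state; use a pronoun when
    # the subjects repeat those of the preceding sentence.
    for state, subjects in group(facts, "state", 2, 1).items():
        plural = len(subjects) > 1
        subject = "they" if plural and subjects == previous else join_and(subjects)
        lines.append(f"{subject} {'are' if plural else 'is'} {state}")
        previous = subjects
    # Aggregate the phone numbers of each subject.
    for subject, numbers in group(facts, "phonenumber", 1, 2).items():
        if len(numbers) > 1:
            lines.append(f"{subject} has phonenumbers {join_and(map(str, numbers))}")
        else:
            lines.append(f"{subject} has a phonenumber {numbers[0]}")
    return lines


print("\n".join(paraphrase(FACTS)))
```

For these facts the sketch yields the same five lines as the paraphrase shown for the subscriber example, apart from the final full stop.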
ASTROGEN
• ASTROGEN - Aggregated deep and Surface naTuRal language GENerator
• http://www.dsv.su.se/~hercules/ASTROGEN/ASTROGEN.html

STEP-VOLVEX-ASTROGEN
[Figure: labels shown: STEP AP; EXPRESS to Prolog f-structures; lexical/lexicon building tool; ASTROGEN; Natural Language; Validation.]

ENTITY fillet
SUBTYPE OF (transition_feature);
END_ENTITY;
5.2.3.1.75 Fillet
A Fillet is a concave circular arc transition between two intersecting Face (see 4.2.167)
objects without any constraints concerning changes of the radius along the Fillet.
ENTITY constant_radius_fillet
SUBTYPE OF (fillet);
radius : feature_parameter;
first_offset : OPTIONAL feature_parameter;
second_offset : OPTIONAL feature_parameter;
END_ENTITY;
?- question(fillet).
A constant_radius_fillet is a subtype of a fillet.
A fillet is an entity.
It is a subtype of a transition_feature. (Pron.)
A Fillet is a concave circular arc transition between two intersecting Face (see 4.2.167)
objects without any constraints concerning changes of the radius along the Fillet. (Canned text)

ENTITY project_relationship;
(*UOF:S5*)
related : project;
relating : project;
relation_type : undefined_object;
description : OPTIONAL string_select;
END_ENTITY;
4.2.357 Project_relationship
A Project_relationship is a relationship between two Project (see 4.2.356)
objects (Example text)
EXAMPLE 174 -- For the development of a new car, a project is set up that
is responsible for the development decisions as well as for the
accounting of the costs.
yes
ENTITY project;
(*UOF:S5*)
id : undefined_object;
name : string_select;
description : OPTIONAL string_select;
actual_start_date : OPTIONAL date_time;
actual_end_date : OPTIONAL date_time;
planned_start_date : OPTIONAL event_or_date_select;
affected_product_class : SET[0:?] OF product_class;
work_program : SET[0:?] OF activity;
planned_end_date : OPTIONAL period_or_date_select;
END_ENTITY;
4.2.359 Project
A Project is a unique process with a time limit, with a defined goal, with a
defined budget, and with defined resources.
A Project is a type of organization representing a work programme that
consists of a set of assigned actions. See ARM definition for Project in
paragraph 4.2.356 for more information.
?- question(project & project_relationship).
a project is an entity and
a project has an undefined_object id and
a project has a string_select name and
a project has a string_select description and
a project has a date_time actual_start_date and
a project has a date_time actual_end_date and
a project has an event_or_date_select planned_start_date and
a project has a product_class affected_product_class
and a project has an activity work_program and
a project has a period_or_date_select planned_end_date
and a project_relationship is an entity and
a project_relationship has a project related and
a project_relationship has a project relating and
a project_relationship has an undefined_object relation_type and
a project_relationship has a string_select description.
yes
?- sort(1,2,3),canned_text, canned_example,clause_comma, pronoun,
predicate_do.
?- question(project & project_relationship).
A project and a project_relationship are entities. (Agg.)
They have a string_select description. (Pron.)
A project has a date_time actual_end_date.
It has a date_time actual_start_date. (Pron.)
It has a product_class affected_product_class. (Pron.)
It has an undefined_object id.
It has a string_select name.
It has a period_or_date_select planned_end_date. (Pron.)
It has an event_or_date_select planned_start_date. (Pron.)
It has an activity work_program.
A project_relationship has a project related.
It has a project relating. (Pron.)
It has an undefined_object relation_type. (Pron.)
A Project is a unique process with a time limit, with a defined goal, with a
defined budget, and with defined resources. (Canned text)
A Project_relationship is a relationship between two Project (see 4.2.356)
objects. (Example text)
EXAMPLE 174 -- For the development of a new car, a project is set up that is
responsible for the development decisions as well as for the accounting of the
costs.

?- sort(1,2,3),subject_pred,predicate_do,subject,predicate,sym_rel,canned_text,
canned_example, clause_comma.
?- question(project & project_relationship).
A project and a project_relationship are entities.
A project and a project_relationship have a string_select description.
A project has a date_time actual_end_date, a date_time actual_start_date, a
product_class affected_product_class, an undefined_object id, a string_select
name, a period_or_date_select planned_end_date, an event_or_date_select
planned_start_date and an
activity work_program.
A project_relationship has a project related, a project relating and an
undefined_object relation_type.
A Project is a unique process with a time limit, with a defined goal, with a defined
budget, and with defined resources.
A Project_relationship is a relationship between two Project (see 4.2.356) objects.
EXAMPLE 174 -- For the development of a new car, a project is set up that is
responsible for the development decisions as well as for the accounting of the
costs.
yes
?-
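The aggregated attribute sentences, such as "A project_relationship has a project related, a project relating and an undefined_object relation_type.", combine article choice with coordination of the attribute phrases. A minimal Python sketch of that realization step follows. It is not the ASTROGEN code; the function names are hypothetical, and sorting by attribute name is an assumption based on the order seen in the example output.

```python
def article(word):
    """Choose 'a' or 'an' from the first letter (a simplification)."""
    return "an" if word[0].lower() in "aeiou" else "a"


def join_and(items):
    """Join phrases as 'a, b and c'."""
    items = list(items)
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


def describe(entity, attributes):
    """Realize one aggregated sentence for an EXPRESS-like entity.

    attributes: list of (type_name, attribute_name) pairs.
    """
    # Assumption: attributes are listed alphabetically by attribute name,
    # as they appear in the example output.
    ordered = sorted(attributes, key=lambda pair: pair[1])
    phrases = [f"{article(t)} {t} {name}" for t, name in ordered]
    return f"{article(entity).capitalize()} {entity} has {join_and(phrases)}."


print(describe("project_relationship", [
    ("project", "related"),
    ("project", "relating"),
    ("undefined_object", "relation_type"),
]))
```

Choosing the article from the first letter is enough for the type names in this example, but it would fail for words such as "hour" or "unit".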
?- sort(1,2,3),subject_pred,predicate_do,subject,predicate,sym_rel,canned_text,
canned_example.
?- question(project & project_relationship).
A project, a project_relationship are entities and
A project, a project_relationship have a string_select description and
A project has a date_time actual_end_date, a date_time actual_start_date, a
product_class affected_product_class, an undefined_object id, a
string_select name, a period_or_date_select planned_end_date, an
event_or_date_select planned_start_date, an activity work_program and
A project_relationship has a project related, a project relating and an
undefined_object relation_type.
A Project is a unique process with a time limit, with a defined goal, with a defined
budget, and with defined resources.
A Project_relationship is a relationship between two Project (see 4.2.356)
objects.
EXAMPLE 174 -- For the development of a new car, a project is set up that is
responsible for the development decisions as well as for the accounting of
the costs.
yes