Taxonomy of XML Schema Languages using Formal Language Theory

Makoto Murata, Dongwon Lee, Murali Mani, Kohsuke Kawaguchi

Norman May, 3 December 2004

Outline

- Motivation
- Definitions of Tree Grammars
- Properties of Tree Grammars
- Properties of XML Schema Languages
- Document Validation Algorithms
- Summary

Motivation

[Figure: diagram relating Grammar, Validator, and Language]

Regular Tree Grammar

Definition 1. A regular tree grammar (RTG) is a 4-tuple G = (N, T, S, P), where:
- N is a finite set of non-terminals,
- T is a finite set of terminals,
- S is a set of start symbols, where S ⊆ N, and
- P is the set of production rules of the form X → a r, where X ∈ N, a ∈ T, and r is a regular expression over N; r is called the content model of this production rule.
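To make Definition 1 concrete, here is a minimal Python sketch of the 4-tuple together with a well-formedness check. The class names `Rule` and `Grammar`, the function `problems`, and the convention of writing a content model as a string of space-separated non-terminal names (with "" for ε) are assumptions made for illustration; they do not come from the slides. The sample grammar is the Doc/Para grammar used in the deck's example slide.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    lhs: str       # X, a non-terminal
    terminal: str  # a, the element label
    content: str   # r, a regular expression over non-terminal names ("" is ε)


@dataclass(frozen=True)
class Grammar:
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    start: frozenset         # S
    rules: tuple             # P


def problems(g: Grammar) -> list:
    """Return the violations of Definition 1 (empty list if g is well formed)."""
    found = []
    if not g.start <= g.nonterminals:
        found.append("S is not a subset of N")
    for rule in g.rules:
        if rule.lhs not in g.nonterminals:
            found.append(f"{rule.lhs} is not in N")
        if rule.terminal not in g.terminals:
            found.append(f"{rule.terminal} is not in T")
        # Every name used in the content model must be a non-terminal.
        for name in re.findall(r"[A-Za-z_]\w*", rule.content):
            if name not in g.nonterminals:
                found.append(
                    f"{name} in the content model of {rule.lhs} is not in N"
                )
    return found


doc_grammar = Grammar(
    nonterminals=frozenset({"Doc", "Para1", "Para2", "Pcdata"}),
    terminals=frozenset({"doc", "para", "pcdata"}),
    start=frozenset({"Doc"}),
    rules=(
        Rule("Doc", "doc", "Para1 Para2"),
        Rule("Para1", "para", ""),
        Rule("Para2", "para", "Pcdata"),
        Rule("Pcdata", "pcdata", ""),
    ),
)

print(problems(doc_grammar))  # prints []
```

Keeping the content model as a string keeps the sketch short; a real implementation would more likely parse it into an expression tree or an automaton.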
Interpretation of Grammar

Definition 2. An interpretation I of a tree t against a regular tree grammar G is a mapping from each node e in t to a non-terminal, denoted I(e), such that:
- I(e) is a start symbol when e is the root of t, and
- for each node e and its child nodes e0, e1, ..., em, there exists a production rule X → a r in G such that
  - I(e) is X,
  - the terminal symbol (label) of e is a, and
  - I(e0) I(e1) ... I(em) matches r.

Validation, Regular Tree Language

Definition 3. A tree t is valid against a regular tree grammar G if there is an interpretation of t against G. A set of trees is a (regular) tree language if, for some (regular) tree grammar, all trees in this set are valid and no other trees are valid.
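The conditions of Definition 2 can be checked mechanically for a candidate mapping. The sketch below is one possible way to do so, not the authors' procedure. It assumes trees are nested `(label, children)` tuples, a candidate interpretation is a parallel `(non_terminal, children)` structure, and content models are strings of space-separated non-terminal names. The names `RULES`, `START`, `CODE`, `matches`, and `is_interpretation` are hypothetical. Each non-terminal is encoded as one private-use character so that Python's `re` module can match sequences of non-terminals.

```python
import re

# Production rules X -> a r as (X, a, r); "" stands for ε.
RULES = [
    ("Doc", "doc", "Para1 Para2"),
    ("Para1", "para", ""),
    ("Para2", "para", "Pcdata"),
    ("Pcdata", "pcdata", ""),
]
START = {"Doc"}

# One character per non-terminal (illustrative encoding trick).
CODE = {
    x: chr(0xE000 + i)
    for i, x in enumerate(sorted({lhs for lhs, _, _ in RULES}))
}


def matches(content, nonterminals):
    """True if the sequence of non-terminals matches content model r."""
    if any(x not in CODE for x in nonterminals):
        return False
    pattern = re.sub(r"[A-Za-z_]\w*", lambda m: CODE[m.group()], content)
    pattern = pattern.replace(" ", "")
    word = "".join(CODE[x] for x in nonterminals)
    return re.fullmatch(pattern, word) is not None


def is_interpretation(tree, interp, is_root=True):
    """Check the conditions of Definition 2 for a candidate interpretation."""
    label, children = tree
    x, child_interps = interp
    if len(children) != len(child_interps):
        return False
    if is_root and x not in START:
        return False
    child_nts = [ci[0] for ci in child_interps]
    # Some rule X -> a r must fit this node and its children.
    rule_ok = any(
        lhs == x and terminal == label and matches(content, child_nts)
        for lhs, terminal, content in RULES
    )
    return rule_ok and all(
        is_interpretation(c, ci, is_root=False)
        for c, ci in zip(children, child_interps)
    )


tree = ("doc", [("para", []), ("para", [("pcdata", [])])])
good = ("Doc", [("Para1", []), ("Para2", [("Pcdata", [])])])
bad = ("Doc", [("Para2", []), ("Para1", [("Pcdata", [])])])

print(is_interpretation(tree, good))  # prints True
print(is_interpretation(tree, bad))   # prints False
```

By Definition 3, the tree is valid as soon as one such mapping passes the check.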
Example: Regular Tree Grammar

N = {Doc, Para1, Para2, Pcdata}
T = {doc, para, pcdata}
S = {Doc}
P = {Doc → doc(Para1, Para2), Para1 → para(ε), Para2 → para(Pcdata), Pcdata → pcdata(ε)}

Tree:           doc(para, para(pcdata))
Interpretation: Doc(Para1, Para2(Pcdata))

Local Tree Grammar

Definition 4. Two different non-terminals A and B are said to be competing with each other if
- one production rule has A in the left-hand side,
- another production rule has B in the left-hand side, and
- these two production rules share the same terminal in the right-hand side.

Definition 5. A local tree grammar is a regular tree grammar without competing non-terminals.

Single-Type Tree Grammar

Definition 6. A single-type tree grammar is a regular tree grammar such that
- for each production rule, non-terminals in its content model do not compete with each other, and
- start symbols do not compete with each other.
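Definitions 4 to 6 translate almost directly into code. The following sketch is an illustration under assumptions: rules are `(X, a, r)` triples with the content model r written as a string of non-terminal names, and the function names `competing_pairs`, `is_local`, and `is_single_type` are invented here. It classifies the Doc/Para grammar above and the Book/Article grammar of the deck's Local/Single-Type example.

```python
import re
from itertools import combinations


def competing_pairs(rules):
    """Pairs {A, B} of different non-terminals whose production rules
    share the same terminal (Definition 4)."""
    pairs = set()
    for (a, term_a, _), (b, term_b, _) in combinations(rules, 2):
        if a != b and term_a == term_b:
            pairs.add(frozenset((a, b)))
    return pairs


def is_local(rules):
    """Definition 5: no competing non-terminals at all."""
    return not competing_pairs(rules)


def is_single_type(rules, start):
    """Definition 6: no competition inside a content model or among
    the start symbols."""
    pairs = competing_pairs(rules)

    def clash(names):
        distinct = sorted(set(names))
        return any(frozenset(p) in pairs for p in combinations(distinct, 2))

    models_ok = not any(
        clash(re.findall(r"[A-Za-z_]\w*", content)) for _, _, content in rules
    )
    return models_ok and not clash(start)


doc_rules = [
    ("Doc", "doc", "Para1 Para2"),
    ("Para1", "para", ""),
    ("Para2", "para", "Pcdata"),
    ("Pcdata", "pcdata", ""),
]
book_rules = [
    ("Book", "book", "Author1"),
    ("Article", "article", "Author2"),
    ("Author1", "author", "Son"),
    ("Author2", "author", "Daughter"),
    ("Son", "son", ""),
    ("Daughter", "daughter", ""),
]

print(is_local(doc_rules), is_single_type(doc_rules, {"Doc"}))
# prints False False
print(is_local(book_rules), is_single_type(book_rules, {"Book", "Article"}))
# prints False True
```

The Doc/Para grammar is neither local nor single-type because Para1 and Para2 compete and both occur in the content model of Doc.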
Example: Local/Single-Type Tree Grammar

N = {Book, Author1, Author2, Son, Article, Daughter}
T = {book, author, son, article, daughter}
S = {Book, Article}
P = {Book → book(Author1), Article → article(Author2), Author1 → author(Son), Author2 → author(Daughter), Son → son(ε), Daughter → daughter(ε)}

- This is not a local tree grammar because Author1 and Author2 are competing.
- This is a single-type tree grammar because Author1 and Author2 do not appear in the same content model, and the start symbols Book and Article do not compete.

Expressive Power

[Figure: nested sets showing that local tree grammars are contained in single-type tree grammars, which are contained in regular tree grammars]

Uniqueness of Interpretation

An interpretation is unique if validation of a tree against a grammar yields exactly one interpretation.

Grammar                     Unique interpretation
Local Tree Grammar          yes
Single-Type Tree Grammar    yes
Regular Tree Grammar        not always
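The "not always" entry for regular tree grammars can be illustrated by enumerating every interpretation of a tree. The sketch below uses brute force, which is exponential and meant only for tiny examples. The grammar `ambiguous_rules` is a hypothetical grammar invented for this illustration and does not appear in the slides. The function names and the string encoding of content models are likewise assumptions.

```python
import re
from itertools import product


def make_matcher(rules):
    """Build a content-model matcher; each non-terminal becomes one character."""
    names = sorted({lhs for lhs, _, _ in rules})
    code = {x: chr(0xE000 + i) for i, x in enumerate(names)}

    def matches(content, seq):
        pattern = re.sub(r"[A-Za-z_]\w*", lambda m: code[m.group()], content)
        word = "".join(code[x] for x in seq)
        return re.fullmatch(pattern.replace(" ", ""), word) is not None

    return matches


def interpretations(tree, rules, matches):
    """All interpretations of a subtree as nested (non_terminal, children)."""
    label, children = tree
    child_options = [interpretations(c, rules, matches) for c in children]
    found = []
    # Try every combination of child interpretations (brute force).
    for combo in product(*child_options):
        seq = [child[0] for child in combo]
        for lhs, terminal, content in rules:
            item = (lhs, combo)
            if terminal == label and matches(content, seq) and item not in found:
                found.append(item)
    return found


def root_interpretations(tree, rules, start):
    matches = make_matcher(rules)
    return [i for i in interpretations(tree, rules, matches) if i[0] in start]


doc_rules = [
    ("Doc", "doc", "Para1 Para2"),
    ("Para1", "para", ""),
    ("Para2", "para", "Pcdata"),
    ("Pcdata", "pcdata", ""),
]
# Hypothetical grammar: an empty para fits both Para1 and Para2.
ambiguous_rules = [
    ("Doc", "doc", "Para1 | Para2"),
    ("Para1", "para", ""),
    ("Para2", "para", "Pcdata*"),
    ("Pcdata", "pcdata", ""),
]

doc_tree = ("doc", [("para", []), ("para", [("pcdata", [])])])
small_tree = ("doc", [("para", [])])

print(len(root_interpretations(doc_tree, doc_rules, {"Doc"})))          # prints 1
print(len(root_interpretations(small_tree, ambiguous_rules, {"Doc"})))  # prints 2
```

In the second grammar Para1 and Para2 compete and both may occur as the child of doc, so it is not single-type, which is consistent with the table above.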
Boolean Closure

Language            Union         Difference    Intersection
Local Tree          Not closed    Not closed    Closed
Single-Type Tree    Not closed    Not closed    Closed
Regular Tree        Closed        Closed        Closed

Approach

- Map structural features of XML schema languages to production rules
- Treat all simple types as Pcdata
- Ignore integrity constraints etc.
- Classify by the structure of production rules

Results

XML Schema Language    Grammar
DTD                    Local Tree Grammar
W3C XML Schema         Single-Type Tree Grammar
Relax NG               Regular Tree Grammar

Restrictions of the Framework

- Wildcards in W3C XML Schema make it more expressive (Restrained Competition Grammar)
- Attribute-element constraints and interleaving in Relax NG cannot be captured in this framework
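As a rough illustration of the mapping step in the Approach slide, the sketch below turns a simplified set of DTD element declarations into production rules. Everything about the input format is an assumption: declarations are given as a plain dictionary from element name to content model rather than parsed from real DTD syntax, the non-terminal for an element is its capitalised name, and attributes are ignored. Content models keep DTD notation (`,` for sequence, `|` for choice) with element names replaced by non-terminals. The function name `dtd_to_rules` is hypothetical.

```python
import re


def dtd_to_rules(elements):
    """Map {element name: content model} to production rules (X, a, r).

    Assumes element names stay distinct after capitalisation and that no
    element is itself called "pcdata".
    """

    def nonterminal(name):
        return name[0].upper() + name[1:]

    def rename(match):
        token = match.group()
        return "Pcdata" if token == "#PCDATA" else nonterminal(token)

    rules = []
    uses_text = False
    for name, model in elements.items():
        if model == "EMPTY":
            content = ""
        else:
            content = re.sub(r"#PCDATA|[A-Za-z_][\w.-]*", rename, model)
            uses_text = uses_text or "#PCDATA" in model
        rules.append((nonterminal(name), name, content))
    if uses_text:
        # All text content is represented by the single type Pcdata.
        rules.append(("Pcdata", "pcdata", ""))
    return rules


elements = {"doc": "(para*)", "para": "(#PCDATA)"}
rules = dtd_to_rules(elements)
for rule in rules:
    print(rule)

# Each terminal has exactly one rule, so no two non-terminals compete.
terminals = [terminal for _, terminal, _ in rules]
print(len(terminals) == len(set(terminals)))  # prints True
```

Because each element name yields exactly one non-terminal, the result has no competing non-terminals, which matches the DTD row of the Results table.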
Approach

- Depth-first traversal of the tree
- Find the production rule(s) for the current element
- Check the sequence of non-terminals obtained from validating the child nodes against the content model(s) of the production rule(s)
- Use an automaton for checking content models

Validation of Local & Single-Type Tree Grammars

Validation is easy because
- the interpretation of a tree is unique, and
- the production rules are restricted:
  - for local tree grammars, the production rule is determined by the terminal;
  - for single-type tree grammars, there are no competing non-terminals among the start symbols or within content models.

Validation of Regular Tree Grammars

Validation is more complicated because no unique interpretation exists.
⇒ Must keep track of multiple interpretations of the input tree.
⇒ Extend the automaton to check against a sequence of sets of non-terminals (instead of sequences of non-terminals).
⇒ Accept when the last set in the sequence contains a final state of the automaton.
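The set-based idea for regular tree grammars can be sketched as a bottom-up pass that computes, for every node, the set of non-terminals it could be assigned. Note the substitution: instead of an explicit automaton, this sketch matches content models with Brzozowski derivatives, extended so that one step consumes a set of non-terminals. The tuple encoding of regular expressions and all function names are assumptions made for illustration.

```python
# Content models as small expression trees.
EMPTY, EPS = ("empty",), ("eps",)


def sym(x):
    return ("sym", x)


def seq(a, b):
    return ("seq", a, b)


def alt(a, b):
    return ("alt", a, b)


def star(a):
    return ("star", a)


def nullable(r):
    """True if r matches the empty sequence."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag in ("empty", "sym"):
        return False
    if tag == "seq":
        return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])  # alt


def deriv(r, symbols):
    """What remains of r after consuming any one non-terminal from symbols."""
    tag = r[0]
    if tag in ("empty", "eps"):
        return EMPTY
    if tag == "sym":
        return EPS if r[1] in symbols else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], symbols), deriv(r[2], symbols))
    if tag == "star":
        return seq(deriv(r[1], symbols), r)
    left = seq(deriv(r[1], symbols), r[2])  # seq
    return alt(left, deriv(r[2], symbols)) if nullable(r[1]) else left


def possible_nonterminals(tree, rules):
    """Non-terminals X such that the subtree has an interpretation with X."""
    label, children = tree
    child_sets = [possible_nonterminals(c, rules) for c in children]
    result = set()
    for lhs, terminal, model in rules:
        if terminal != label:
            continue
        rest = model
        for candidates in child_sets:  # a sequence of sets of non-terminals
            rest = deriv(rest, candidates)
        if nullable(rest):
            result.add(lhs)
    return result


def is_valid(tree, rules, start):
    return bool(possible_nonterminals(tree, rules) & start)


RULES = [
    ("Doc", "doc", seq(sym("Para1"), sym("Para2"))),
    ("Para1", "para", EPS),
    ("Para2", "para", sym("Pcdata")),
    ("Pcdata", "pcdata", EPS),
]

valid_tree = ("doc", [("para", []), ("para", [("pcdata", [])])])
swapped_tree = ("doc", [("para", [("pcdata", [])]), ("para", [])])

print(is_valid(valid_tree, RULES, {"Doc"}))    # prints True
print(is_valid(swapped_tree, RULES, {"Doc"}))  # prints False
```

The derivative terms are not simplified here, so they can grow on larger inputs; an automaton-based implementation as named on the slide would avoid that.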
Summary

- We have a framework for XML schema languages based on tree grammars
- Local tree grammars (DTD) are too restricted in their expressiveness
- Regular tree grammars (Relax NG) are the most expressive and have nice closure properties, but allow ambiguous interpretations (is this needed? ...)