XPipe - An XML Processing Methodology
XML SIG, NY USA
Feb 12, 2002
Sean McGrath, CTO, Propylon
XML SIG NY, Sean McGrath http://www.propylon.com

What is XPipe?
• It is an architecture / methodology / framework for developing robust, scalable, manageable XML processing systems.
• Based on proven mechanical manufacturing techniques. Specifically:
  – The Assembly Line Principle
  – Component assembly and component re-use

What is XPipe?
• An open source project hosted on Sourceforge
  – http://xpipe.sourceforge.net
• A contribution to the blossoming meme of using pipeline based processing to tame the burgeoning complexity of XML transformations
  – (If you do not find XML transformation complicated, you are not sufficiently well informed.)
  – (And no, XSLT does not solve all your problems)

What is XPipe?
• A way of thinking about systems that focuses on structured dataflows rather than Object APIs
• It is also:
  – A Scandinavian sewage treatment technology
  – An exhaust pipe system for high performance engines
  – A VT100 based strategy game for DEC’s VAX/VMS Operating System

Contents of this talk
• The XPipe philosophy
• Major functional elements
• Some examples
• The XGrid and Commoditized XML Processing
• Some anticipated objections (and answers)
• Relationship to other technologies

Contents of this talk
• Current status
• Current problems
• Future plans
• Some (contentious) musings
• Something cold to drink

XPipe Philosophy
• XML is all about (potentially) complex, hierarchical data structures

XPipe Philosophy
Cars are complex, hierarchical structures
Henry Ford’s Model T Ford Assembly Line – 1914

XPipe Philosophy
Lunch is a complex,
hierarchical structure
Lunch Assembly Line. NY, 2002

XPipe Philosophy
We are complex, hierarchical structures

XPipe philosophy
• What have these scenes got in common?
  – Complex construction of cars, tuna melts and tendons made possible and efficient through
    • assembly line manufacturing
    • re-usable component processes and component materials
• Why not apply this approach to XML “manufacturing”?

XPipe philosophy
• Why does the assembly line approach work?
  – Transformation task decomposition
  – Re-usable transformation components
• Transformation decomposition is the key to complexity management. Just ask:
  – Henry Ford
  – Herbert Simon (The Two Watchmakers – “The Architecture of Complexity”)
  – George Miller (7+/-2)
  – Adam Smith (An Inquiry into the Nature and Causes of the Wealth of Nations, 1776)
  – Any electrical or chemical engineer.

XPipe philosophy
• Component re-use is the key to productivity
  – Ask any form of engineer (electrical, chemical etc.)
apart from software engineers…
  – Component re-use remains a holy grail in software engineering
  – XPipe is yet another attempt…

XPipe philosophy
• A lot of data processing for the foreseeable future will consist of XML to XML transformation
• A lot of non-XML data processing can consist of XML to XML transformations with the addition of top and tail transformations
• Mantra
  – Get data into XML as quickly as possible
  – Keep it in XML until the last possible minute
  – Bring all your XML tools to bear on solving the data processing problem

XPipe philosophy
[Diagram: Non-XML Input → Top Transformation → Input XML → Output XML → Tail Transformation → Non-XML Output]

XPipe philosophy
• The philosophy hinges on the fact that every complex XML transformation can be broken down into a series of smaller ones that can be chained together

XPipe philosophy
• Only so many ways to re-arrange an XML tree structure
• A finite number of fundamental transformations, from which all higher order transformations can be derived

XPipe philosophy
  – Transformation Decomposition leads to
    • a series of small, manageable, “stand alone” problems with an XML input “spec” and an XML output “spec”.
    • Can build, test, use and then re-use these transformation components
    • Very team development friendly
    • High cohesion, loose coupling – just like the professor advised

XPipe philosophy
• Pipeline approach means you can mix ‘n’ match black-box components that internally use whatever paradigm best suits the problem
  • Lexical
  • SAX
  • DOM
  • XSLT
  • XDuce, Pyxie, Haskell, AF-NG…

Sample XPipe
[Diagram: sample pipe fed from a DB/CMS. Stage labels: Character Set Mods; Add Doctype + validate + strip doctype; Re-arrange Elements; Validation; Stats + FTP; SQL Replace; XHTML Generate. Technology labels: Lexical; Lexical; DOM; Schematron/RelaxNG/Rhino; Jython; Java; XSLT.]

XPipe philosophy
• Assertion: developers would use a component based approach to XML processing if they did not have to write the plumbing (orchestration, exception handling) themselves
  – “Gee, this problem is complex. Maybe I’ll do it in multiple stages! Gee, now I have to orchestrate the stages somehow. Batch files/shell scripts/driver program – all ugly and error prone. Maybe I’ll just write a single program after all…”

XPipe philosophy
• “Professional developers spend 50 percent of their time writing plumbing” – Adam Bosworth
• XPipe aims to look after the plumbing, letting developers concentrate on the interesting stuff

Philosophy Summary
• Preambles
  – Make things as complex as necessary but not more complex than necessary
  – Solve all the world’s problems – but only one at a time
  – Don’t even think about performance until it is too late – then it will look after itself
  – Only increase complexity linearly w.r.t. functionality and only in “elevator pitch sized” functionality quanta

Philosophy Summary – 1#2
• Data processing == data transformation w.r.t. time.
• XML is the current runaway winner in the self-descriptive data stakes and a very good QDDL (Quiescent Data Description Language)

Philosophy summary – 2#2
• Inside every complex XML transformation is a sequence of simpler XML transformations trying to get out – a Pipe
• Decomposed transformation = new transformations + already componentized transformations -> Component Reuse
• Inside every graph transformation (read “workflow” or “business process model”) is a combination of simple Pipes trying to get out

XPipe Philosophy
Leveled architecture – levels build on one another but any level is usable independently of higher levels
[Diagram: stacked levels, each with In and Out: Level 2 - XRigs; Level 1 - XPipes; Level 0 - XComponents]

Major Functional Elements – XComponents
[Diagram: In → XComponent → Out]
• Developed in any language that runs on the Java Virtual Machine (Jython, Java, XSLT, Rhino (JavaScript) etc.)
• All XComponents are standalone programs of the form
  – [Name] [InputXML] [OutputXML] [ErrorXML] [Optional Args]

Major Functional Elements – XComponents
• XComponents described in XML form. An XComponent consists of:
  – Metadata (keywords etc.)
  – Documentation
  – Pre and Post Conditions
  – Unit Tests (input, output XML stream pairs + Pre/Post Conditions)
  – Code (Java / Jython / XSLT / Exec)

Major Functional Elements – XPipes
[Diagram: In → XPipe → Out]
• A linear assembly of XComponents that together achieve some useful transformation function
• Described in XML
  – Documentation
  – Metadata (keywords etc.)
  – Pre/Post conditions
  – Unit Tests (input, output XML stream pairs + Pre/Post Conditions)
  – References to XComponents (URIs) which are resolved when the XPipe is installed/executed

Major Functional Elements – XRigs
[Diagram: XRig with In and Out connections]
• An assembly of XPipes that together achieve some useful transformation function
• Described in XML
  – Documentation
  – Metadata (keywords etc.)
  – Pre/Post conditions
  – Unit Tests (input, output XML stream pairs + Pre/Post Conditions)
  – References to XPipes (URIs) which are resolved when the XRig is installed/executed

Major Functional Elements
• Unit Testers
  – XComponent, XPipe and XRig level Test Harnesses
• Executives
  – XComponent, XPipe and XRig level Execution Environments (on-the-fly, disk install, compiled, web service…)
  – (Executing an XComponent is identical to executing an XPipe of arity 1, is identical to executing an XRig of arity 1…)

Major Functional Elements
• Executives
  – Uniprocessor Execution
    • Executed on 1 CPU, possibly with separate threads for each instantiated X*
  – Multiprocessor Execution (Vapor)
    • XML based protocol to implement “Job Shop” work distribution over a P2P network (XJCL)

Major Functional Elements – XPipe Monitor (Vapor)

Major Functional Elements – Miscellany (Vapor)
• Whizzy GUI Component and Pipe Editors
• XComponent Creators
  – “Wrap” Java, XSLT etc.
into XComponent compliant XML, Ant build target
• XComponent Proxies
  – “pretend” to be a simple XComponent but invoke some external functionality – from Windows DLL to SOAP endpoint
• XPipe masquerading as XComponent – this could be a very powerful paradigm

Major Functional Elements – Miscellany (Vapor)
• Compilers / Packers
  – Pack XPipes/XRigs into standalone XPipes/XRigs for distribution (with or without an executive)
  – Compile pure XSLT XPipe into a self contained translet (self contained or as an XComponent)
• “Compile away”/optimize intermediate files via a variety of tricks (Jackson Inversion, Java IO hook, shadow marshalling etc.)

Simple XComponent examples
• Fundamental Operation – Rename Element – Rename
  • Input: <foo>baz</foo>
  • Output: <bar>baz</bar>
[Diagram: input and output trees; node labels: foo, bar, baz, baz]

Simple XComponent examples
• Fundamental Operation - Peel
  • Input: <foo><bar>baz</bar></foo>
  • Output: <foo>baz</foo>
[Diagram: input and output trees; node labels: foo, foo, bar, baz, baz]

Simple XComponent examples
• Compound Operation - Matryoshka
  • Input:
    – <foo><bar>baz</bar></foo>
  • Output:
    – <foo></foo><bar></bar>baz
[Diagram: input and output trees; node labels: foo, bar, foo, bar, baz, baz]

Simple XComponent examples
• KlingonCloak
  – Input:
    • <foo><bar>baz</bar></foo>
  – Output:
    • <tag name="foo"><tag name="bar">baz</tag></tag>
[Diagram: input and output trees; node labels: foo, bar, tag type="foo", tag type="bar", baz, baz]

Sample XComponents
• Once you start thinking in terms of Pipes – components appear everywhere:
  – Regular fragmentations
  – Doctype changer
  – Namespace normalizer
  – Character set transcoder
  – Hash generator
  – Architectural Forms
  – RelaxNG/Schematron etc
• A validator can be thought of as a component in an XPipe that mirrors its input on its output

Sample XComponents
• Reading a
file is an XML to XML transformation
  – <file>lewisscarrol.xml</file>
  – <poem><line>Twas brillig, and the slithy tomes, did gyre and gimbal in the wave</line>…</poem>

Sample XComponents
• Arithmetic is an XML to XML transformation
  – <expr>1 + 2</expr>
  – <res>3</res>

Sample XComponents
• Unix pipe utilities e.g. tr
  – hello world
  – HELLO WORLD

Sample XComponents
• Conditionals are XML to XML transformation “tee junctions” triggered by XPaths
[Diagram: In → if XPath → TRUE branch / FALSE branch]

Validation as an XComponent
[Diagram labels: XML A (Input); XComponent (RelaxNG, Schematron, Jython/Java/JACL); XML A’ (Output); Validation Log (Error)]

Some related open technologies
• | - Unix Pipes
• SAX Filters
• TRAX
• XBeans
• Cocoon
• AxKit
• Ant
• JXTA
• Translets
• TupleSpaces

The XGrid
• Grid Technologies – computational power “on tap” (http://www.gridforum.org)
• The XGrid – computational power “on tap” to execute XPipes/XRigs

The XGrid
[Diagram labels: In, Out, Out, DMZ]

Some objections (with some answers)
• It will be slow
  – No it won’t. Premature optimization is the root of all evil!
  – Speed is a three headed monster. I’m old enough to have left the X axis and currently heading for Y through Z
[Figure: The 3 Axes to Speed. Axes: Speed of Execution, Speed of Development, Speed of modification. Points: Me at age 26, Me at age 36, Me at age 46 (Projected).]

Some objections (with some answers)
• It will be slow (cont.)
  – Massive Parallelism will kill all von Neumann throughput arguments
    • Documents per second, not seconds per document – throughput is the true measure of XML processing speed
    • Document fulcra
  – Locality of reference (Denning) applies to XML processing (more on this later)
  – A myriad of “compile time” optimizations on XPipes possible
  – Keep the architecture simple – and speed will sort itself out

Some objections (with some answers)
• Component based software? Harumph! We have heard that one before…
  – XPipe is data flow based not API based (COM, VBX, CORBA). The payload is what is important – not the plumbing
  – Information integration (needed on the server side) – not application integration (needed on the client side)

Document fulcra and the scatter/gather pattern
• For any given task t to be performed on documents conforming to schema s, there is a fragment expression that can be used to chop any document into n pieces on which t can be performed independently
• These points are called fulcra and are a function of (t,s)

Document fulcra and scatter/gather pattern
• Having identified the fulcra:
  – Chop the input document into fragments – scatter phase
  – Perform t
  – Join all the processed fragments together to constitute the output document – gather phase
• Three stage XPipe – scatter & gather are (or more accurately soon will be) standard XPipe components

Document Fulcra
[Diagram, along a TIME axis: Input Doc → Scatter → n fragments → Invoke t (once per fragment) → n fragments → Gather → Output Doc]

Document Fulcra
• For data-oriented XML, the fulcra often coincide with the “record” iteration in the XML schema and may be independent of t.
• For document-oriented XML, the fulcra are much more dependent on t.
• <Colloquial>A good fulcra based scatter/gather will make performance head north faster, cheaper and with a higher upper limit than any amount of hand-crafted, genius level XML coding of your transformations.</Colloquial>

The XSLT/DOM -> SAX non sequitur
• XSLT and DOM are memory bound – trade off between ease of use and resource usage – ease of use favoured
• SAX is not memory bound – trade off between ease of use and resource usage – low resource usage favoured
• On xml-dev, users are often advised to rewrite their apps using SAX! Ugh!

XSLT/DOM -> XPipe
• XPipe and scatter/gather allow you to keep the ease of use of XSLT/DOM with the finite resource utilization of SAX
• As long as you can identify a good fulcrum function
  – They exist more often than not
  – If they exist, they are very easily found

Current status
• The philosophy is known to work
  – Seven years a-growing in consulting company (IDM 1995, Digitome)
  – Uniprocessor XPipe used to develop
    • 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec.
    • 20-C pipe for semantic validation of legislation documents

Current Status
• Version 0.6
• Schemas for XPipes and XComponents on xpipe.sourceforge.net.
  – feedback required
• Sample components (Java/XSLT/Jython) and some documentation
• Simple, illustrative XComponent and XPipe uniprocessor executive

Current Status
• Object model for XComponents in Jython + Java (David Starr)
• Object model for XPipes in Jython
• Execution, testing utilities in Jython
• Start of a NetBeans based XComponent editor

Current Status
• Uniprocessor XPipe used to develop
  – 80-C pipe from Hub notation for a complex document type to a legacy mainframe display notation. 120 page spec.
  – 20-C pipe for semantic validation of legislation documents
  – XPipe and XComponent validators

Current Status
• Some aspects of the XComponent model need testing
  – Parameters
  – Exec XComponents
  – Pre/Post condition checking
• This will be a point release in late Feb. Then focus on developing the XComponent repository in parallel with core dev.
• Scatter/Gather raises some interesting scheduling issues currently being grappled with
• Balance between developer-hit and ease of execution currently in favour of low developer-hit

Current Problems
• No GUI stuff and not enough documentation
• Everybody agrees that an XML document is a tree but:
  – The content and structure of the tree depends on the parser
  – The content and structure of re-generated XML (The round-tripping problem)
  – Roll on XML-SW!

Current Problems
• Naming things
  – Taxonomy of XTLs (XML Transformation Languages)
  – Taxonomy of re-usable XComponents and XPipes

Current Problems
• Flexible transformation scheduling is hard
• Optimal transformation scheduling is very hard
• Calling all process engineers – help!
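
To make the scheduling question concrete, here is a minimal Python sketch of the three-stage scatter/gather pipe described earlier: chop at the fulcra, perform t on each fragment, join the results. It is an illustration under stated assumptions, not XPipe's actual scatter and gather components. The function names are hypothetical, the fulcra are assumed to be the root's direct children with a given tag, and the task t is an arbitrary upper-casing step. Root attributes and non-matching children are dropped for brevity.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def scatter(doc_xml, fulcrum_tag):
    """Scatter phase: one fragment per direct child of the root named fulcrum_tag."""
    root = ET.fromstring(doc_xml)
    fragments = [ET.tostring(child, encoding="unicode")
                 for child in root.findall(fulcrum_tag)]
    return root.tag, fragments


def uppercase_text(fragment_xml):
    """Hypothetical task t: upper-case the character data inside one fragment."""
    elem = ET.fromstring(fragment_xml)
    for node in elem.iter():
        if node.text:
            node.text = node.text.upper()
    return ET.tostring(elem, encoding="unicode")


def gather(root_tag, fragments):
    """Gather phase: re-attach the processed fragments under a fresh root."""
    root = ET.Element(root_tag)
    for fragment in fragments:
        root.append(ET.fromstring(fragment))
    return ET.tostring(root, encoding="unicode")


def scatter_gather(doc_xml, fulcrum_tag, task, workers=4):
    """Run task on every fragment independently, then reassemble in document order."""
    root_tag, fragments = scatter(doc_xml, fulcrum_tag)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, so gather needs no re-sorting.
        processed = list(pool.map(task, fragments))
    return gather(root_tag, processed)


if __name__ == "__main__":
    doc = "<poem><line>hello</line><line>world</line></poem>"
    print(scatter_gather(doc, "line", uppercase_text))
```

The only scheduling decision here is the fixed `workers` count. Choosing how many fragments to run at once, and where, is the part the slides call hard.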

Future Plans
• Evangelize the idea that DTD validated XML 1.0 is just Well Formed XML that has been through a pipe consisting of:
  – A transclusion component (entity expansion)
  – A macro pre-processor (conditional marked sections)
  – An attribute decorator (implied/fixed attributes)
  – A grammar checker
  – …

Valid XML
[Diagram: Well Formed XML → Parameter Entity Expansion → Conditional Sections → General Entity Expansion → Attribute Decoration → Grammar Validation → Valid XML]

Future plans
• When DOCTYPE goes away (which it will), provide all DTD functionality as a set of XComponents

Future Plans
• Getting to the point where we can grow the XComponent repository is priority #1
• XRigs, XPipes, and XComponents as web services (SOAP/XML-RPC, WSDL, UDDI etc.)
• Getting the P2P and Grid Technology communities’ input into XGrid/XJCL
• See if a P2P execution environment for XRigs/XPipes can be short-circuited e.g. JXTA
• Getting help to develop the XPipe reference implementation on Sourceforge

Future Plans
• Development of commercial implementations of XPipe integrated with leading EAI systems (Ongoing)
• Use of SCADA tools to develop XPipe process control and monitoring systems
• Use of UML tools to create XPipes and XRigs using state transition diagrams

Future Plans
• Use of Animation Engineering techniques for CAXTE tools (Computer Aided XML Transformation Engineering)
• Digging around swarm intelligence, hierarchy theory, complexity theory, self-assembly, bio-informatics and nanofabrication for concepts and tools applicable to XML transformations

In conclusion
• XPipe is simple
• Simplicity works!
• Plenty of evidence outside of XML engineering that this approach will work
• Plenty of lore and tools from other fields of science can be brought to bear to build systems using the XPipe approach

Musing #1 - Debugging
• XPipe is very debugging friendly
  – log2(N) time required for fault diagnosis
  – “Probes” in the form of loggers, RelaxNG validators, easily plug-inable to a pipe to watch what is going on.
  – Pre/Post condition on/off switch is a useful “design by contract” debugger
  – Unit testing at Rig, Pipe and Component level allows layer at a time re-assembly after a fault has been fixed.

Musing #2 – Inbetweening and XComponent development
• Transformation analysts spec the transformation
• Only need to code new components
• Spec == XComponent or XPipe with doc, pre/post etc. but no code
• Built in JIT-style acceptance test
• Outsource friendly and third-party market friendly

Musing #3 - Web Services
• First generation will be a total blind alley – RPC
• Document Oriented Messaging – not Object Oriented Messaging – the next stage in encapsulation and loose coupling – something like XPipe will be a prerequisite.
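
Returning to the log2(N) fault-diagnosis claim in the debugging musing: one plausible reading is that, because every stage of a pipe leaves an inspectable intermediate document, the first faulty stage can be found by bisection. The Python sketch below illustrates that reading under stated assumptions. The function names are hypothetical, the probe is a plain well-formedness check standing in for a RelaxNG validator or post-condition, and it assumes that once an intermediate document is bad, every later one is bad too.

```python
import xml.etree.ElementTree as ET


def well_formed(xml_text):
    """Probe: True if the intermediate document parses as XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


def run_pipe(stages, xml_text):
    """Run the stages in order, keeping every intermediate output for later probing."""
    outputs = []
    for stage in stages:
        xml_text = stage(xml_text)
        outputs.append(xml_text)
    return outputs


def first_faulty_stage(intermediates, is_valid):
    """Binary search for the first stage whose output fails the probe.

    Returns (stage_index or None, number_of_probes). Assumes validity is
    monotone: valid outputs up to some stage, invalid outputs afterwards.
    """
    lo, hi = 0, len(intermediates)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if is_valid(intermediates[mid]):
            lo = mid + 1   # fault, if any, lies after mid
        else:
            hi = mid       # fault is at mid or earlier
    return (lo if lo < len(intermediates) else None), probes


if __name__ == "__main__":
    ok = lambda x: x
    broken = lambda x: x.replace("</foo>", "")  # hypothetical buggy stage
    stages = [ok] * 5 + [broken] + [ok] * 2      # 8 stages, fault at index 5
    outputs = run_pipe(stages, "<foo>baz</foo>")
    print(first_faulty_stage(outputs, well_formed))
```

With 8 stages the search needs 3 probes rather than up to 8, which is where the log2(N) figure would come from.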

Musing #4 – Parametric Typing of XComponents
• Numerous XComponents that do the same thing, not necessarily duplication
  – Space
  – Time
  – Infoset considerations

Musing #5 – Pre-validation Transformation
• Killing ourselves seeking one-shot expressivity in schema validation languages
• Many complex validations become a lot simpler if you do some transformation(s) first
  – Co-occurrence constraints
  – Contextual constraints
• Clear analog with formatting (pre-flow transformation(s) + flow)

Musing #6 – location, location, location
• Abstraction 1: keep code and data on the same high-speed bus – monolithic systems
• Abstraction 2: allow code to be downloaded from the Web – sandbox required owing to security issues
• Abstraction 3: leave the code ‘out there’ and move the data – bandwidth issues and data >> code

Musing #6 – location, location, location
• Monolithic – bad (have to “install” stuff which is very 20th century)
• Sandbox – bad (the better the sandbox the less useful the code running in it.)
• XGrid – Design as if data pulled by the code (easy model) but DMZ the code + data – the only thing that flows over the firewall is the transformed data…

Thank you – http://xpipe.sourceforge.net