HarperCollins

Agenda
• Content Creation Process
• What is DITA?
• What is DITA Open Toolkit?
• What does RSuite do?
• Demo
  – Manuscript to ICML: Word -> DITA -> ICML
  – Workflow Engine
  – InDesign
• Code
  – Java, XSLT, XQuery, Java APIs
• “Groundbreaking” Topic

Current Book Composition Process
Step 1: Editorial: Manuscript (docx)
Step 2: Composition: InDesign (indd)

New Book Composition Process
Step 1: Editorial: Manuscript (docx)
Step 2: Transform 1: Generate DITA XML (xml)
Step 3: Transform 2: Generate ICML, Download ICML (icml)
Step 4: Composition: InDesign (icml -> indd)

What is DITA?
• Darwin Information Typing Architecture
• An XML data model for authoring and publishing
• Topic oriented
• Each topic is a separate XML file
• DocBook is book oriented, more complex, one big XML file
• DITA initial spec in 2001
• DocBook initial spec in 1991
• Core DITA topic types:
  – Concept
  – Task
  – Reference
• Specialization: subtyping
  – New topics derived from existing ones

What is DITA?
• A topic must have at least an id attribute on the root element and a title; most topics also have a body.
• A DITA map stitches topics together.

What is DITA?
Eliot Kimber
• http://www.ditausers.org/tutorials/basics/kimber/
• http://www.xiruss.org/tutorials/dita-specialization/
Norm Walsh post from October 2005:
• http://norman.walsh.name/2005/10/21/dita
Four key technical differences where DITA may be “better” than DocBook:
1. A topic-oriented authoring paradigm.
2. A cross-referencing scheme that's more practical than XML's flat ID space.
3. SGML's conref, reinvented.
4. An extensibility model based on "specialization".

What is DITA Open Toolkit?
• Open-source publishing system for DITA
• Provides multi-channel output
  – https://github.com/dita-ot/dita-ot/
  – https://dita-ot.github.io/
• Uses a pipeline processing approach built on:
  – Java
  – XSLT
  – Rendering engine (FOP, RTF, etc.)
• DITA 4 Publishers

What does RSuite do?
• Centralized repository for “all” artifacts
• Provides:
  – Workflow
  – DITA Transforms
    1. Manuscript to DITA
    2. DITA to ICML
    3. Multi-channel Output: PDF, ePub3, InDesign
  – Role Based Security
  – Distribution:
    1. FTP to Commercial Printer
    2. E-Commerce Sites

[Architecture diagram]
• RSuite Tomcat Server with SAN Drives: 500 GB, 100 GB per disk
  – Non XML Disks 1 to 5
  – Labels on the disks: Temp Directories; 1. XSLT Transforms, 2. File Uploads; MySQL Disk
• MarkLogic Nodes 1 to 3: 4 CPU, 2 cores per CPU each
  – Node 1: Disk 1 - Forest 1 (600 GB), Disk 2 - Forest 2 (600 GB), Disk 3 - Backup (300 GB)
  – Node 2: Disk 1 - Forest 3 (600 GB), Disk 2 - Forest 4 (600 GB), Disk 3 - Backup (300 GB)
  – Node 3: Disk 1 - Forest 5 (600 GB), Disk 2 - Forest 6 (600 GB), Disk 3 - Backup (300 GB)

Feature Request: Use XA Transaction
1. File Copy
2. MySQL Update
3. Metadata Update
[Same architecture diagram, annotated with the three steps: 1 at the RSuite Tomcat Server, 2 at the MySQL Disk, 3 at the MarkLogic nodes.]

RSuite Demo?
• Upload
• Transforms
• PDF, ePub
• ICML to InDesign
• MarkLogic Config

Code?
• Java
• jBPM: business process management framework
• Ivy: manages plugin dependencies
• Ember.js
• XQuery
• Groovy
• DITA-OT XSLT
• Plugins
• RSuite API Docs

Groundbreaking Opportunity
• Unleash the Tombstones!
• All content can be reused for product development

DITA to RDF Transform!
• Semantically linked DITA
• Link to internal and external content
  – DBPedia: http://wiki.dbpedia.org/Downloads39
  – NY Times
  – Dublin Core
  – US Census
  – http://dbpedia.org/page/Mark_Twain
• Semantic links create a network of knowledge
• Enables inferencing (ML8)
• Uses MarkLogic Triple Index

Why RDF?
• RDF complements DITA
• Contains facts about DITA topics
• Facts are stored in the Triple Index
• Facts are used to:
  – Link internal and external documents
  – Derive other facts (inferencing)
  – Provide higher quality search results
• RDF is an efficient storage and linking mechanism
• MarkLogic turns RDF into Triples

Why Triples?
A triple is a Subject-Predicate-Object (SPO) structure used to represent a fact. Triples let computers derive facts from other facts without human involvement.
Example:
• Ted lives in Chicago, Illinois
• Ted lives near Wrigley Field
• Ted has a roommate called Sam
• Ted and Sam go to Wrigley Field to watch games
From these facts, a computer can derive:
• Sam lives in Chicago
• Wrigley Field is in Chicago, Illinois
• Chicago is in Illinois
• Sam and Ted both live in the US
• Etc…

How to add Triples?
• Facts need to be curated.
• Data provenance
• Editors can add facts to DITA Topic Docs.
• New world of Semantic Publishing
• Eroni Kumana

Profiles in Courage Example
• Add facts to Chapter 4 DITA XML:
  – “Profiles in Courage” Primary ISBN value is 0060854936
  – John F. Kennedy is the Author Of “Profiles in Courage”
  – John F. Kennedy is a Person
  – John F. Kennedy was at the Solomon Islands in August 1943
  – Eroni Kumana is a Person
  – Eroni Kumana was at the Solomon Islands in August 1943
  – Eroni Kumana rescued John F. Kennedy
  – Eroni Kumana is mentioned in Chapter 4, Profiles in Courage
• Semantic Event
  – NY Times News Feed
  – Eroni Kumana died on August 2, 2014
• Event Triggers Automatic Pub:
  – CMS automatically publishes a “Profiles in Courage” web page with a snippet of the specific chapter referencing Eroni Kumana.
  – New web page also has a link to like and/or purchase the book.
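
The Profiles in Courage example can be sketched in a few lines of Python. This is only an illustration of the idea, not how RSuite or MarkLogic implement it: the `Triple` type, the predicate names (`primaryISBN`, `mentionedIn`, `partOf`, and so on), and the `match` and `pages_to_publish` helpers are all hypothetical, and the `partOf` fact is an added assumption that ties Chapter 4 back to the book.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One fact in Subject-Predicate-Object form."""
    subject: str
    predicate: str
    obj: str


CHAPTER_4 = "Profiles in Courage, Chapter 4"

# The eight curated facts from the slide. Predicate names are invented for this sketch.
FACTS = [
    Triple("Profiles in Courage", "primaryISBN", "0060854936"),
    Triple("John F. Kennedy", "authorOf", "Profiles in Courage"),
    Triple("John F. Kennedy", "isA", "Person"),
    Triple("John F. Kennedy", "wasAt", "Solomon Islands, August 1943"),
    Triple("Eroni Kumana", "isA", "Person"),
    Triple("Eroni Kumana", "wasAt", "Solomon Islands, August 1943"),
    Triple("Eroni Kumana", "rescued", "John F. Kennedy"),
    Triple("Eroni Kumana", "mentionedIn", CHAPTER_4),
    # Assumption (not on the slide): an explicit link from the chapter to its book.
    Triple(CHAPTER_4, "partOf", "Profiles in Courage"),
]


def match(facts, s=None, p=None, o=None):
    """Return the facts that fit the pattern; None acts as a wildcard."""
    return [
        t for t in facts
        if (s is None or t.subject == s)
        and (p is None or t.predicate == p)
        and (o is None or t.obj == o)
    ]


def pages_to_publish(facts, person):
    """List the book pages a CMS could publish when an event names `person`."""
    pages = []
    for mention in match(facts, s=person, p="mentionedIn"):
        chapter = mention.obj
        for part in match(facts, s=chapter, p="partOf"):
            book = part.obj
            isbns = [t.obj for t in match(facts, s=book, p="primaryISBN")]
            pages.append({
                "book": book,
                "chapter": chapter,
                "isbn": isbns[0] if isbns else None,
            })
    return pages


# Semantic event from the slide: the news feed reports that Eroni Kumana died.
event = Triple("Eroni Kumana", "diedOn", "2014-08-02")
pages = pages_to_publish(FACTS, event.subject)
print(pages)
```

In a real deployment the pattern matching would be a query against the triple index rather than a list scan, and the returned page descriptions would feed the publishing workflow.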

Book Process Steps
Step 1: Editorial: Manuscript (docx)
Step 2: Transform 1, Word 2 DITA: Generate DITA XML (xml)
Step 3: Transform 2, DITA 2 ICML: Generate ICML, Download ICML (icml)
Step 3: Transform 3, DITA 2 RDF: Generate Transient RDF (rdf) -> ML Triple Index
Step 4: Composition: InDesign (indd)

[Same architecture diagram: RSuite Tomcat Server with SAN Drives (500 GB, 100 GB per disk, Non XML Disks 1 to 5) and MarkLogic Nodes 1 to 3 (4 CPU, 2 cores per CPU; two 600 GB forest disks and one 300 GB backup disk per node). The MarkLogic nodes are now labeled Index and Triples.]

De-Silo-ize
Custom APIs are used to communicate between silos.
[Diagram: Web Host Provider, DAM, ISBN DB, ebook Store, Published Docs, CMS]

Hub Spoke – No Silos
Uses standardized RDF “connectors” to communicate.
[Diagram: DAM, Web Host Provider, ISBN DB, ebook Store, Published Docs, CMS]

Call To Action
• Contribute to DITA RDF Project
  https://github.com/ColinMaudry/dita-rdf/blob/master/README.md
• Build a Knowledge Engine