meetup-2014-08-12

HarperCollins
Agenda
• Content Creation Process
• What is DITA?
• What is DITA Open Toolkit?
• What does RSuite do?
• Demo
– Manuscript to ICML: Word -> DITA -> ICML
– Workflow Engine
– InDesign
• Code – Java, XSLT, XQuery, Java APIs
• “Groundbreaking” Topic
Current Book Composition Process
• Step 1: Editorial – Manuscript (docx)
• Step 2: Composition – InDesign (indd)
New Book Composition Process
• Step 1: Editorial – Manuscript (docx)
• Step 2: Transform 1 – Generate DITA XML (xml)
• Step 3: Transform 2 – Generate ICML, then download the ICML (icml)
• Step 4: Composition – InDesign (indd)
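The new process works only if each step consumes exactly what the previous step produced. The following is a minimal Java sketch that models the four steps as data and checks those handoffs. The enum names and the "manuscript" input label are hypothetical, and this is not how RSuite's workflow engine represents the process.

```java
import java.util.List;

class CompositionPipeline {
    // Each step from the slide, with the file type it consumes and the one it produces.
    // The input types are a reading of the diagram, not a published spec.
    enum Stage {
        EDITORIAL("manuscript", "docx"),
        TRANSFORM_1_DOCX_TO_DITA("docx", "xml"),
        TRANSFORM_2_DITA_TO_ICML("xml", "icml"),
        COMPOSITION_INDESIGN("icml", "indd");

        final String in, out;
        Stage(String in, String out) { this.in = in; this.out = out; }
    }

    // True if every stage consumes exactly what the previous stage produced.
    static boolean isConnected(List<Stage> stages) {
        for (int i = 1; i < stages.size(); i++) {
            if (!stages.get(i).in.equals(stages.get(i - 1).out)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<Stage> plan = List.of(Stage.values());
        System.out.println(String.join(" -> ", plan.stream().map(s -> s.out).toList()));
        // prints: docx -> xml -> icml -> indd
        System.out.println(isConnected(plan)); // prints: true
    }
}
```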
What is DITA?
• Darwin Information Typing Architecture
• An XML Data Model for Authoring and Publishing
• Topic Oriented
• Each Topic is a separate XML file
• DocBook is Book Oriented, more Complex, usually One Big XML file
• DITA Initial Spec in 2001
• DocBook Initial Spec in 1991
• Core DITA Topic Types:
– Concept
– Task
– Reference
• Specialization: Subtyping – new topic types derived from existing ones
• A topic must have at least an id attribute on its root element and a title; a body is optional but normally present.
• A DITA map stitches topics together.
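To make the minimal topic and map concrete, here is a Java sketch that builds both with the JDK's DOM and serializer APIs. It omits the DOCTYPE declarations real DITA files carry, and the ids and file names (ch04, ch01.dita) are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class MinimalDita {
    // A topic: id on the root element, a title, and a body with one paragraph.
    static String topic(String id, String title, String paragraph) {
        Document doc = newDocument();
        Element topic = doc.createElement("topic");
        topic.setAttribute("id", id);
        doc.appendChild(topic);
        topic.appendChild(textElement(doc, "title", title));
        Element body = doc.createElement("body");
        body.appendChild(textElement(doc, "p", paragraph));
        topic.appendChild(body);
        return serialize(doc);
    }

    // A map: a title plus one topicref per topic file, in reading order.
    static String map(String title, String... topicFiles) {
        Document doc = newDocument();
        Element map = doc.createElement("map");
        doc.appendChild(map);
        map.appendChild(textElement(doc, "title", title));
        for (String href : topicFiles) {
            Element ref = doc.createElement("topicref");
            ref.setAttribute("href", href);
            map.appendChild(ref);
        }
        return serialize(doc);
    }

    private static Element textElement(Document doc, String name, String text) {
        Element e = doc.createElement(name);
        e.setTextContent(text);
        return e;
    }

    private static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes without an XML declaration so the output is easy to read.
    private static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(topic("ch04", "Chapter 4", "Chapter text goes here."));
        System.out.println(map("Profiles in Courage", "ch01.dita", "ch04.dita"));
    }
}
```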
Eliot Kimber
• http://www.ditausers.org/tutorials/basics/kimber/
• http://www.xiruss.org/tutorials/dita-specialization/
Norm Walsh's post from October 2005:
• http://norman.walsh.name/2005/10/21/dita
Four key technical differences where DITA may be “better” than DocBook:
1. A topic-oriented authoring paradigm.
2. A cross-referencing scheme that's more practical than XML's flat ID space.
3. SGML's conref, reinvented.
4. An extensibility model based on "specialization".
What is DITA Open Toolkit?
• Open-source publishing system for DITA
• Provides multi-channel output
– https://github.com/dita-ot/dita-ot/
– https://dita-ot.github.io/
• Uses a Pipeline Processing Approach built on:
– Java
– XSLT
– Rendering Engines and Output Formats (FOP, RTF, etc.)
• DITA 4 Publishers
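The pipeline idea (Java driving a chain of XSLT stages, each feeding the next) can be sketched with the JDK's javax.xml.transform API. The two stylesheets below are hypothetical stand-ins, not DITA-OT's real ones: stage 1 loosely mimics DITA-OT's filtering by dropping elements marked audience="internal" (DITA-OT itself drives filtering with DITAVAL files), and stage 2 renders a topic to bare HTML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltPipeline {
    // Stage 1 (filtering): identity copy that drops anything marked audience="internal".
    static final String FILTER = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:template match="@*|node()">
            <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
          </xsl:template>
          <xsl:template match="*[@audience='internal']"/>
        </xsl:stylesheet>
        """;

    // Stage 2 (rendering): turn a topic into minimal HTML.
    static final String TO_HTML = """
        <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:output method="xml" omit-xml-declaration="yes"/>
          <xsl:template match="/topic">
            <html><body>
              <h1><xsl:value-of select="title"/></h1>
              <xsl:for-each select="body/p"><p><xsl:value-of select="."/></p></xsl:for-each>
            </body></html>
          </xsl:template>
        </xsl:stylesheet>
        """;

    // Runs the document through each stylesheet in order; each stage's output feeds the next.
    static String run(String xml, String... stylesheets) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            String current = xml;
            for (String xsl : stylesheets) {
                Transformer t = factory.newTransformer(new StreamSource(new StringReader(xsl)));
                StringWriter out = new StringWriter();
                t.transform(new StreamSource(new StringReader(current)), new StreamResult(out));
                current = out.toString();
            }
            return current;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String topic = "<topic id=\"t1\"><title>Chapter 4</title><body>"
                + "<p>Public</p><p audience=\"internal\">Draft note</p></body></topic>";
        System.out.println(run(topic, FILTER, TO_HTML));
    }
}
```

Keeping each stage a separate stylesheet lets a stage be swapped (for example, an ICML renderer in place of the HTML one) without touching the others.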
What does RSuite do?
• Centralized Repository for “all” artifacts
• Provides:
– Workflow
– DITA Transforms
1. Manuscript to DITA
2. DITA to ICML
3. Multi-channel Output – PDF, ePub3, InDesign
– Role Based Security
– Distribution:
1. FTP to Commercial Printer
2. E-Commerce Sites
Deployment Architecture (diagram)
• RSuite on a Tomcat Server
• SAN Drives: 500 GB total, 100 GB / Disk (Non XML Disks 1–5)
– Temp Directories for: 1. XSLT Transforms, 2. File Uploads
• MySQL Disk
• MarkLogic cluster: 3 nodes, each with 4 CPUs, 2 Cores / CPU
– Node 1: Disk 1 – Forest 1 (600 GB), Disk 2 – Forest 2 (600 GB), Disk 3 – Backup (300 GB)
– Node 2: Disk 1 – Forest 3 (600 GB), Disk 2 – Forest 4 (600 GB), Disk 3 – Backup (300 GB)
– Node 3: Disk 1 – Forest 5 (600 GB), Disk 2 – Forest 6 (600 GB), Disk 3 – Backup (300 GB)
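The diagram's capacity figures can be sanity-checked with a little arithmetic. This is a minimal Java sketch, assuming node n holds Forest 2n-1 on Disk 1 and Forest 2n on Disk 2 (a reading of the diagram's layout). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ClusterSizing {
    record Forest(int number, int node, int disk, int sizeGb) {}

    // Assumed layout: node n holds forest 2n-1 on disk 1 and forest 2n on disk 2.
    static List<Forest> layout(int nodes, int forestDiskGb) {
        List<Forest> forests = new ArrayList<>();
        for (int n = 1; n <= nodes; n++) {
            forests.add(new Forest(2 * n - 1, n, 1, forestDiskGb));
            forests.add(new Forest(2 * n, n, 2, forestDiskGb));
        }
        return forests;
    }

    static int totalGb(List<Forest> forests) {
        return forests.stream().mapToInt(Forest::sizeGb).sum();
    }

    public static void main(String[] args) {
        int nodes = 3, cpusPerNode = 4, coresPerCpu = 2;
        List<Forest> forests = layout(nodes, 600);
        System.out.println("Forest storage: " + totalGb(forests) + " GB");             // 3600 GB
        System.out.println("Backup storage: " + nodes * 300 + " GB");                   // 900 GB
        System.out.println("MarkLogic cores: " + nodes * cpusPerNode * coresPerCpu);    // 24
        System.out.println("SAN disks: " + 500 / 100);                                  // 5, matching Non XML Disks 1-5
    }
}
```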
Feature Request:
Use an XA Transaction so these three updates succeed or fail together:
1. File Copy
2. MySQL Update
3. Metadata Update
(Diagram: the deployment architecture repeated, with callout 1 on RSuite and the SAN drives, 2 on MySQL, and 3 on the MarkLogic cluster, matching the three updates listed above.)
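A true XA transaction needs a JTA transaction manager and an XA-capable resource for every participant, and a plain file copy is not an XA resource. Until that exists, one common fallback is a different, weaker technique: compensating actions. Each step registers an undo, and if a later step fails, the completed steps are undone in reverse order. The following is a minimal Java sketch of that fallback. The step names are hypothetical and this is not RSuite's API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class CompensatingUpdate {
    interface Step {
        String name();
        void apply() throws Exception;
        void undo();
    }

    // Runs steps in order; if one fails, undoes the completed ones in reverse order.
    // Unlike XA two-phase commit, other readers may briefly see partial results.
    static boolean runAll(List<Step> steps, List<String> log) {
        Deque<Step> done = new ArrayDeque<>();
        for (Step s : steps) {
            try {
                s.apply();
                done.push(s);
                log.add("applied " + s.name());
            } catch (Exception e) {
                log.add("failed " + s.name());
                while (!done.isEmpty()) {
                    Step d = done.pop();
                    d.undo();
                    log.add("undid " + d.name());
                }
                return false;
            }
        }
        return true;
    }

    // Demo helper: a step that either succeeds or throws; its undo is a no-op here.
    static Step step(String name, boolean fail) {
        return new Step() {
            public String name() { return name; }
            public void apply() throws Exception { if (fail) throw new Exception(name + " failed"); }
            public void undo() { }
        };
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        runAll(List.of(step("file copy", false), step("MySQL update", false),
                step("metadata update", true)), log);
        log.forEach(System.out::println);
    }
}
```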
RSuite Demo?
• Upload
• Transforms
• PDF, ePub
• ICML to InDesign
• MarkLogic Config
Code?
• Java
• jBPM – Business Process Management Framework
• Ivy – to manage plugin dependencies
• Ember.js
• XQuery
• Groovy
• DITA-OT XSLT
• Plugins
• RSuite API Docs
Groundbreaking Opportunity
• Unleash the Tombstones!
• All Content can be reused for product development
DITA to RDF Transform!
• Semantically Linked DITA
• Link to Internal and External Content
– DBpedia: http://wiki.dbpedia.org/Downloads39
– NY Times
– Dublin Core
– US Census
– http://dbpedia.org/page/Mark_Twain
• Semantic Links create a network of Knowledge
• Enables Inferencing (MarkLogic 8)
• Uses MarkLogic Triple Index
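As an illustration of what a DITA to RDF transform emits, here is a Java sketch that reads one topic and writes N-Triples: a Dublin Core title for the topic, plus a dcterms:references link for each xref. Relative hrefs point at internal content and absolute ones at external sources such as DBpedia. The base URI http://example.org/topics/ and the choice of predicates are assumptions. This is not the dita-rdf project's implementation, which is XSLT-based.

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DitaToRdf {
    static final String DCTERMS = "http://purl.org/dc/terms/";

    // Emits N-Triples lines for one DITA topic: its title and every xref target.
    static List<String> toNTriples(String ditaXml, String baseUri) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(ditaXml)));
            Element topic = doc.getDocumentElement();
            String subject = "<" + baseUri + topic.getAttribute("id") + ">";
            List<String> triples = new ArrayList<>();

            // The first <title> in document order is the topic's own title.
            NodeList titles = topic.getElementsByTagName("title");
            if (titles.getLength() > 0) {
                String title = escape(titles.item(0).getTextContent());
                triples.add(subject + " <" + DCTERMS + "title> \"" + title + "\" .");
            }

            // Each xref becomes a link; relative (internal) hrefs resolve against baseUri.
            NodeList xrefs = topic.getElementsByTagName("xref");
            for (int i = 0; i < xrefs.getLength(); i++) {
                String href = ((Element) xrefs.item(i)).getAttribute("href");
                String target = URI.create(baseUri).resolve(href).toString();
                triples.add(subject + " <" + DCTERMS + "references> <" + target + "> .");
            }
            return triples;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Minimal N-Triples literal escaping for backslash, quote, and newline.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }
}
```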
Why RDF?
• RDF complements DITA
• Contains facts about DITA topics
• Facts are stored in the Triple Index
• Facts are used to:
– Link internal and external documents
– Derive other facts (inferencing)
– Provide higher quality search results
• RDF is an efficient storage and linking mechanism
• MarkLogic parses RDF into triples and stores them in its triple index
Why Triples?
A triple is a Subject-Predicate-Object (SPO) structure used to represent a fact.
Triples let computers derive new facts from existing ones without human involvement.
Example:
• Ted lives in Chicago, Illinois
• Ted lives near Wrigley Field
• Ted has a roommate called Sam
• Ted and Sam go to Wrigley Field to watch games
From these facts:
• Sam lives in Chicago
• Wrigley Field is in Chicago, Illinois
• Chicago is in Illinois
• Sam and Ted both live in the US
• Etc…
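The Ted and Sam derivations can be reproduced with a tiny forward-chaining reasoner: apply a few rules repeatedly until no new triples appear. This is a hand-rolled Java sketch, not MarkLogic's inference engine. The predicate names and the three rules are hypothetical, and "Illinois is in the US" is added as a background fact, which the last conclusion on the slide needs.

```java
import java.util.HashSet;
import java.util.Set;

class TripleInference {
    record Triple(String s, String p, String o) {}

    // Applies the rules repeatedly until no new triples appear (forward chaining to a fixpoint).
    static Set<Triple> infer(Set<Triple> facts) {
        Set<Triple> all = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            Set<Triple> derived = new HashSet<>();
            for (Triple a : all) {
                for (Triple b : all) {
                    // Rule 1: if X is Y's roommate and X lives in C, then Y lives in C.
                    if (a.p().equals("roommateOf") && b.p().equals("livesIn") && b.s().equals(a.s()))
                        derived.add(new Triple(a.o(), "livesIn", b.o()));
                    // Rule 2: living in a place implies living in whatever contains it.
                    if (a.p().equals("livesIn") && b.p().equals("locatedIn") && b.s().equals(a.o()))
                        derived.add(new Triple(a.s(), "livesIn", b.o()));
                    // Rule 3: locatedIn is transitive.
                    if (a.p().equals("locatedIn") && b.p().equals("locatedIn") && b.s().equals(a.o()))
                        derived.add(new Triple(a.s(), "locatedIn", b.o()));
                }
            }
            changed = all.addAll(derived); // stop once a pass adds nothing new
        }
        return all;
    }

    public static void main(String[] args) {
        Set<Triple> facts = Set.of(
                new Triple("Ted", "livesIn", "Chicago"),
                new Triple("Chicago", "locatedIn", "Illinois"),
                new Triple("Illinois", "locatedIn", "US"), // background fact
                new Triple("Ted", "roommateOf", "Sam"));
        infer(facts).stream().filter(t -> !facts.contains(t)).forEach(System.out::println);
    }
}
```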
How to add Triples?
• Facts need to be curated.
• Data provenance
• Editors can add facts to DITA Topic Docs.
• New world of Semantic Publishing
• Eroni Kumana
Profiles in Courage Example
• Add Facts to Chapter 4 DITA XML:
– “Profiles in Courage” Primary ISBN value is 0060854936
– John F. Kennedy is the Author Of “Profiles in Courage”
– John F. Kennedy is a Person
– John F. Kennedy was at the Solomon Islands in August 1943
– Eroni Kumana is a Person
– Eroni Kumana was at the Solomon Islands in August 1943
– Eroni Kumana rescued John F. Kennedy
– Eroni Kumana is mentioned in Chapter 4, Profiles in Courage
• Semantic Event – NY Times News Feed
– Eroni Kumana died on August 2, 2014
• Event Triggers Automatic Publishing:
– CMS automatically publishes a “Profiles in Courage” web page with a snippet from the specific chapter that references Eroni Kumana.
– The new web page also links to like and/or purchase the book.
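The event-driven flow can be sketched in Java: store the chapter facts as triples, and when a news event names a person, find every chapter that mentions that person and plan a publish action for the book, looking up its ISBN along the way. The predicate names and the "book#chapter" encoding are hypothetical, and the CMS call itself is left out.

```java
import java.util.ArrayList;
import java.util.List;

class SemanticEventTrigger {
    record Triple(String s, String p, String o) {}
    record PublishAction(String book, String isbn, String chapter, String person) {}

    // Facts from the Chapter 4 slide, encoded with hypothetical predicate names.
    static final List<Triple> FACTS = List.of(
            new Triple("Profiles in Courage", "primaryIsbn", "0060854936"),
            new Triple("John F. Kennedy", "authorOf", "Profiles in Courage"),
            new Triple("John F. Kennedy", "type", "Person"),
            new Triple("Eroni Kumana", "type", "Person"),
            new Triple("Eroni Kumana", "rescued", "John F. Kennedy"),
            new Triple("Eroni Kumana", "mentionedIn", "Profiles in Courage#Chapter 4"));

    // For a person named in an event, plan one web page per chapter that mentions them.
    static List<PublishAction> onEvent(String person, List<Triple> facts) {
        List<PublishAction> actions = new ArrayList<>();
        for (Triple t : facts) {
            if (t.s().equals(person) && t.p().equals("mentionedIn")) {
                String[] parts = t.o().split("#", 2);
                String book = parts[0];
                String chapter = parts.length > 1 ? parts[1] : "";
                String isbn = facts.stream()
                        .filter(f -> f.s().equals(book) && f.p().equals("primaryIsbn"))
                        .map(Triple::o)
                        .findFirst()
                        .orElse("unknown");
                actions.add(new PublishAction(book, isbn, chapter, person));
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        // Event from the news feed: Eroni Kumana died on August 2, 2014.
        onEvent("Eroni Kumana", FACTS).forEach(System.out::println);
    }
}
```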
Book Process Steps
• Step 1: Editorial – Manuscript (docx)
• Step 2: Transform 1, Word 2 DITA – Generate DITA XML (xml)
• Step 3: Transform 2, DITA 2 ICML – Generate ICML, then download the ICML (icml)
• Step 3: Transform 3, DITA 2 RDF – Generate transient RDF (rdf) for the MarkLogic (ML) Triple Index
• Step 4: Composition – InDesign (indd)
(Diagram: the deployment architecture repeated, with Index and Triples labels on the MarkLogic forests, showing the triple index stored alongside the regular indexes.)
De-Silo-ize
Custom APIs are used to communicate between silos.
(Diagram: separate silos: Web Host Provider, DAM, ISBN DB, ebook Store, Published Docs, CMS.)
Hub and Spoke – No Silos
Uses standardized RDF “connectors” to communicate.
(Diagram: DAM, Web Host Provider, ISBN DB, ebook Store, Published Docs, and CMS connected through a central hub.)
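The appeal of the hub shows up in a simple count. If every pair of the six systems needs its own custom API, that is n(n-1)/2 links in the worst case, while a hub needs only one connector per system. The following is a small Java sketch of that arithmetic. It assumes the worst case, since the slide does not say which silos actually talk to each other.

```java
import java.util.List;

class IntegrationCount {
    // Worst case: every pair of silos needs its own custom API, n(n-1)/2 links.
    static int pointToPoint(int n) { return n * (n - 1) / 2; }

    // Hub and spoke: each system needs one connector to the hub.
    static int hubAndSpoke(int n) { return n; }

    public static void main(String[] args) {
        List<String> systems = List.of("Web Host Provider", "DAM", "ISBN DB",
                "ebook Store", "Published Docs", "CMS");
        int n = systems.size();
        System.out.println(pointToPoint(n) + " custom APIs vs " + hubAndSpoke(n) + " connectors");
        // prints: 15 custom APIs vs 6 connectors
    }
}
```

Adding a seventh system would raise the worst-case point-to-point count to 21, but add only one more connector to the hub.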
Call To Action
• Contribute to DITA RDF Project
https://github.com/ColinMaudry/dita-rdf/blob/master/README.md
• Build a Knowledge Engine