Migrating from FAST to EMC Documentum xPlore: What To Do and Why You'll Love It
Ed Bueché, EMC Distinguished Engineer and xPlore Architect

Agenda
• Introduction to xPlore
• xPlore 1.2 new capabilities
• FAST-to-xPlore migration best practices

Documentum xPlore at-a-glance
• Documentum xPlore is the next-generation search for Documentum
• Replaces FAST: support for the FAST-based version of search within Documentum ends at the end of 2011
• Technology foundation: EMC xDB (native XML database) and Lucene
• Indexing system compatible with D6.5 SP2 and later
  – D6.5 SP2/SP3, D6.6, D6.7
  – Clients supported by D6.5 SP2 work without change (with small corner-case exceptions)
  – Dual-mode migration: xPlore and FAST both active (index and query) on the same repository

Migrate to xPlore
• Support for search within Documentum based on FAST ends Dec 31, 2011
• The feature is replaced by Documentum xPlore
  – xPlore 1.0 available since Oct 2010
  – xPlore 1.1 available since April 2011
  – xPlore 1.2 expected around Nov 2011
• No additional license cost for xPlore!
• Hundreds of deployments have already converted, and hundreds more are in progress

Other Sessions on xPlore at Momentum 2011
• Optimizing EMC Documentum: Performance and Scalability
  – Wednesday, 2 November 2011, 2:00 pm to 2:45 pm
  – Covers xPlore scalability and tuning best practices
• Optimizing EMC Documentum: Best Practices for Deployment
  – Wednesday, 2 November 2011, 3:00 pm to 3:45 pm
  – Covers additional Documentum scalability improvements as well as HA/DR best practices for xPlore

What is new in xPlore 1.2 (Nov 2011)
• Thesaurus support
  o Synonyms, alternate spellings, acronyms
  o Based on the SKOS structure
• Customizable Natural Language Processing pipeline
  o UIMA support and custom text extraction
• Query-based subscriptions
  o Ability to subscribe to a query executed on an interval and be notified of the results
• Indexing and query performance
  o Multiple CPS processes per xPlore instance
  o Wildcard performance improvements
  o Automatic query warmup
• New languages supported
  o Russian, Arabic, Hebrew, and Brazilian Portuguese
• Administration
  o Improved deployment and automation (CLI)
  o Silent installer (local and remote)

[Figure: xPlore Features At-a-Glance]

xPlore 1.2: Thesaurus
• Improves the "findability" (recall) of content
• Allows a customer-defined business thesaurus
• Lets you query for one name and get hits in documents that contain the related names
• Example: in the pharmaceutical industry, a drug is known by its:
  – Scientific name
  – Internal code name
  – Marketing name

xPlore 1.2 Thesaurus Feature Notes
• Simple Knowledge Organization System (SKOS)
  – Standard representation for thesauri, categorizations, etc.
  – XML format (RDF)
  – Able to represent synonyms and concepts
• Case- and space-insensitive format
• Multiple thesauri can be stored per docbase
• A default thesaurus can be set
• The thesaurus can be overridden per query
• Multiple thesauri can be specified per query (clause-specific)
• Supported in the DFC Search Service and DQL from D6.5 SP2 and later (in the latest patches)

Thesaurus: Cross-Language Term Normalization and Other Use-Cases
• A SKOS-formatted thesaurus allows cross-language terminology mapping
• Use-case: search for content in one language and get hits in others
  – Not a full translation mechanism, but useful for domain-specific cross-language terms
  – Only one language is lemmatized, so this is most useful for names
• Relationships other than synonyms can also be created
[Figure: the term 'Commission' mapped to equivalents in other languages, such as Επιτροπή and комисия]

Example: Searching for acetaminofén (in Spanish)
• With no thesaurus, only Spanish documents are found
• With the thesaurus, documents containing synonyms in multiple languages are also found

xPlore Thesaurus Administration
• Easy import mechanism
• Multiple thesauri can be defined per docbase

xPlore 1.2: Customizable Natural Language Processing Pipeline
• xPlore 1.2 opens the Natural Language Processing (NLP) pipeline to customization
  – Allows customers to go beyond the base linguistic analysis
  – Able to inject standard UIMA-compliant customizations
• Use cases include:
  – Entity extraction
  – Classification
  – ID normalization
  – Custom text extraction

Functional View of NLP in xPlore
[Diagram: Incoming document event → Content Fetch (CF) → Text Extraction (TE) → Language Identification (LI) → Linguistic Analysis (LA) → Store in index]

Custom Text Extraction for xPlore NLP in 1.2
[Diagram: Incoming document → CF → TE (text extraction customization) → LI → LA → post-linguistic-analysis UIMA extensions → Store in index]
• Text extraction customization is based on MIME type
• Plugin customization code can be defined for:
  o Pre-text extraction
  o The text extraction phase
  o Post-text extraction
• Plugins can be written in Java or C/C++

UIMA Customizations Supported After Linguistic Analysis for xPlore NLP in 1.2
[Diagram: Incoming document → CF → TE → LI → LA → post-linguistic-analysis UIMA extensions → Store in index]
• Unstructured Information Management Architecture (UIMA)
• Standard architecture for Natural Language Processing customizations (Apache UIMA)
• xPlore 1.2 allows UIMA components to process and annotate document elements
• Enables annotation of DFTXML without adding relational columns in the RDBMS

Potential UIMA Customization Use-Cases
• ID normalization
  – Official company IDs: ABC-1234-D567-EFG
  – Users want to query on D567EFG, because ABC-1234 is not selective
  – A pipeline step can create the alternative ID formulation
• Classification
  – Examine the text of a document and automatically tag it with additional metadata based on a taxonomy
  – CIS-based classification is documented as an xCellerator
• Entity extraction
  – Extract entities with a third-party entity extractor

Advanced Customizations: Remote UIMA & Custom Thesaurus
[Diagram: Incoming document → CF → TE → LI → LA → Store in index, with the indexing pipeline able to call a remote UIMA component; at query time, the Query Processor accesses a custom thesaurus (a third-party external component) to get additional terms and then queries with them]

Query-Based Subscriptions (QBS)
• Users subscribe to the periodic execution of a stored query
• They are notified of any new results since the last execution
• Queries execute on defined intervals
  – Hour, day, week, or month
• Result notification
  – Email, or
  – The initiation of an xCP-defined business process

Query-Based Subscriptions: Overview
[Diagram: a query is defined and stored; the user subscribes to it; the stored query is executed automatically; the user is notified, or the results are fed to an xCP business process]

[Screenshot: Subscribing to a saved query]

Users, Subscriptions, Queries, Results: Some Relationships
[Figure: user 'A' subscribes to query #1 and query #2; user 'B' subscribes to query #2; results are kept per user and query (separate results for user 'A' and user 'B' on query #2)]
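The QBS slides describe behavior rather than an API: a stored query runs on an hourly, daily, weekly, or monthly interval, and each subscriber is notified only of results that are new since the previous execution. The following minimal Python sketch models that loop under stated assumptions: `Subscription`, `add_interval`, `run_due_subscriptions`, and the `execute` callback are hypothetical names, not part of DFC or xPlore, and "new" is assumed to mean result IDs not yet reported to that particular subscription.

```python
import calendar
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical model of query-based subscriptions (QBS). None of these names
# come from DFC or xPlore; they only illustrate the behavior on the slides.


def add_interval(start: datetime, interval: str) -> datetime:
    """Return the next scheduled run after `start` for an hour/day/week/month interval."""
    if interval == "hour":
        return start + timedelta(hours=1)
    if interval == "day":
        return start + timedelta(days=1)
    if interval == "week":
        return start + timedelta(weeks=1)
    if interval == "month":
        # Advance one calendar month, clamping the day (Jan 31 -> Feb 28/29).
        year = start.year + start.month // 12
        month = start.month % 12 + 1
        day = min(start.day, calendar.monthrange(year, month)[1])
        return start.replace(year=year, month=month, day=day)
    raise ValueError(f"unsupported interval: {interval!r}")


@dataclass
class Subscription:
    user: str
    query_id: str
    interval: str  # "hour", "day", "week" or "month"
    last_run: Optional[datetime] = None
    seen: Set[str] = field(default_factory=set)  # result IDs already reported

    def is_due(self, now: datetime) -> bool:
        return self.last_run is None or now >= add_interval(self.last_run, self.interval)


def run_due_subscriptions(
    subscriptions: List[Subscription],
    execute: Callable[[str, str], Set[str]],
    now: datetime,
) -> Dict[Tuple[str, str], List[str]]:
    """Execute every due subscription and return only results new since its last run.

    `execute(query_id, user)` stands in for running the stored query as that user.
    """
    notifications: Dict[Tuple[str, str], List[str]] = {}
    for sub in subscriptions:
        if not sub.is_due(now):
            continue
        results = execute(sub.query_id, sub.user)
        new_results = sorted(results - sub.seen)
        # Track results per subscription, i.e. per (user, query) pair.
        sub.seen |= results
        sub.last_run = now
        if new_results:
            notifications[(sub.user, sub.query_id)] = new_results
    return notifications
```

With the figure's setup (user 'A' subscribed to queries #1 and #2, user 'B' to query #2), the first run reports every current hit to each subscriber, and later runs report only documents missing from that subscriber's earlier results. Keeping `seen` per subscription rather than per query mirrors the separate result sets shown for users 'A' and 'B'; in the product, notification goes by email or to an xCP business process rather than into a returned dictionary.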
[Screenshot: QBS user activity report, which provides information on each user's subscription activity]
[Screenshot: QBS activity report by subscription ID]

Query-Based Subscriptions: Delivery Notes
• Supported only with D6.7 SP1 and later DFC and Content Servers
• TaskSpace components delivered as an xCellerator
  – To be posted when D6.7 ships in Nov
• API available for custom UIs, including:
  – Stored query definition (dm_smartlist)
  – Subscription definition and management

Additional xPlore 1.2 Enhancements
• Multiple CPS processes on a single xPlore instance
  – Significantly simplifies content processing scaling
• Improved wildcard query performance
• New language certifications
  – Russian, Arabic, Hebrew, and Brazilian Portuguese

FAST-to-xPlore Migration Best Practices at-a-Glance
• Stay current with software
• RTO: backup / restore
• Plan and test scale with larger environments
• Convert legacy DQL apps to the DFC Search Service
• SANs provide the best performance

Stay Current with Software
• xPlore 1.1 shipped with DCTM 6.7 in April 2011
  – Why would you start your deployment with xPlore 1.0?
• Patch roll-up releases available each month
  – Available Sept 30: xPlore 1.1 P03 and xPlore 1.0 P13
  – Available Oct 30: xPlore 1.1 P04
• Some important items covered
  – Snapshot-too-old consistency fixes
  – Improved diagnostics and repair for index inconsistencies
  – Fix for result inconsistency due to updates

RTO: Backup / Restore
• Recovery Time Objective
  – The target time to restore the system to service after a failure
  – Usually a target set by business users
• Example characterization of hardware failure in Google's data centers:
  – In a cluster of 1,800 machines, 1,000 will fail somehow in the first year of service
  – Thousands of hard drives will fail
  – 50% chance that a rack will overheat
• xPlore migrations typically involve new hardware in new operating environments
  – Human and environmental failures will be higher than normal
• Time to recovery varies
  – Dual xPlore systems provide the fastest (but most expensive) RTO
  – Sometimes (not always) data failures can be rectified with xPlore repair tools
  – Restore from backup is the next fastest
  – Re-feed from Documentum is the slowest
To be discussed in more detail on Wed, Nov 2, at 3 pm in "Optimizing EMC Documentum: Best Practices for Deployment"

Convert Legacy DQL Apps to DFC Search Service
• API options for Documentum search applications
  – IDfQuery and DQL
    o Legacy compatibility
  – DFC Search Service and automatically generated XQuery
    o Foundation for Advanced Search since D6.6
  – IDfXQuery and custom-defined XQuery
    o Used primarily for zone search of XML
• For most uses, the DFC Search Service is the best choice
  – Best performance: pulls the least amount of data per bounded result
  – Native facets supported
    o Not part of DQL
    o Avoids ingesting huge result sets
  – More efficient date-range query processing

Plan and Test Scale with Larger Environments
• xPlore provides great out-of-box support for Documentum
• However, some aspects might require index tuning
  – If index tuning is not done, a re-feed or index rebuild might be required
  – It is important to find these in larger test environments rather than in production
• Items to watch for
  – Special character tuning
  – Date and integer values for custom DCTM object types
  – Metadata wildcard optimization options
  – Native facet values
• Leverage free xPlore tools on EDN!
  – https://community.emc.com/docs/DOC-8922

SANs Provide the Best Performance
• xPlore supports SAN, NAS, and local disk
• Services and functionality vary across disk hardware
• All else being equal, SAN volumes offer the best performance and are recommended
• However, some capabilities are not available:
  – Basic xPlore host sparing
  – Simple inter-host data movement
• If leveraging NAS, please review the latest configuration guidelines

EMC Symmetrix: Nondisruptive Mobility
[Diagram: Virtual LUN VP Mobility across virtual pools: Flash 400 GB RAID 5, Fibre Channel 600 GB 15K RAID 1, SATA 2 TB RAID 6]
• Fast, efficient mobility
• Maintains replication and quality of service during relocations
• Supports up to thousands of concurrent VP LUN migrations
• Recommendation: work with storage technicians to ensure backend storage has sufficient I/O

Questions? Comments?
• See xPlore on EMC Developer Network:
  – https://community.emc.com/docs/DOC-8945

THANK YOU
This presentation is also available at www.momentumeurope.com (password: spree)