Informatica Data Quality (Version 9.5.1) User Guide
December 2012

Copyright (c) 2009-2012 Informatica. All rights reserved.

This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form, by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international Patents and other Patents Pending.

Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.

The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in writing.

Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange, PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved. Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright © Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights reserved. Copyright © Information Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-technologies GmbH. All rights reserved. Copyright © Jaspersoft Corporation. All rights reserved. Copyright © International Business Machines Corporation. All rights reserved. Copyright © yWorks GmbH. All rights reserved. Copyright © Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc. Copyright IBM Corp. All rights reserved. Copyright © MicroQuill Software Publishing, Inc.
All rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved. Copyright © LogiXML, Inc. All rights reserved. Copyright © 2003-2010 Lorenzi Davide. All rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright © EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved.

This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright © 1999-2006 by Bruno Lowagie and Paulo Soares; and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not limited to the implied warranties of merchantability and fitness for a particular purpose.

The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine, and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.

This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project.
All Rights Reserved) and redistribution of this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.

This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.

The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.dom4j.org/license.html.

The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://dojotoolkit.org/license.

This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.

This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at http://www.gnu.org/software/kawa/Software-License.html.

This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.

This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license.
Permissions and limitations regarding this software are subject to terms available at http://www.boost.org/LICENSE_1_0.txt.

This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at http://www.pcre.org/license.txt.

This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available at http://www.eclipse.org/org/documents/epl-v10.php.

This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License, http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html, http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html, http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html, http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/; http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt; http://jotm.objectweb.org/bsd_license.html;
http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html; http://developer.apple.com/library/mac/#samplecode/HelpHook/Listings/HelpHook_java.html; http://nanoxml.sourceforge.net/orig/copyright.html; http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html, http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html; http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html; http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt; http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; and http://benalman.com/about/license/.

This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the MIT License (http://www.opensource.org/licenses/mit-license.php), and the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0).

This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software are subject to terms available at http://xstream.codehaus.org/license.html.

This product includes software developed by the Indiana University Extreme! Lab.
For further information please visit http://www.extreme.indiana.edu/.

This product includes software developed by Andrew Kachites McCallum. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu (2002).

This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775; 6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,243,110; 7,254,590; 7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422; 7,676,516; 7,720,842; 7,721,270; and 7,774,791, international Patents and other Patents Pending.

DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied warranties of noninfringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is subject to change at any time without notice.

NOTICES

This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software Corporation ("DataDirect") which are subject to the following terms and conditions:

1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.

2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF THE POSSIBILITIES OF DAMAGES IN ADVANCE.
THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.

Part Number: DQ-UG-95100-0001

Table of Contents

Preface
    Informatica Resources
        Informatica Customer Portal
        Informatica Documentation
        Informatica Web Site
        Informatica How-To Library
        Informatica Knowledge Base
        Informatica Multimedia Knowledge Base
        Informatica Global Customer Support

Part I: Informatica Data Quality Concepts

Chapter 1: Introduction to Data Quality
    Data Quality Overview

Chapter 2: Reference Data
    Reference Data Overview
    User-Defined Reference Data
    Informatica Reference Data
    Reference Data and Transformations
    Reference Tables
        Reference Table Structure
        Managed and Unmanaged Reference Tables
    Content Sets
        Character Sets
        Classifier Models
        Pattern Sets
        Probabilistic Models
        Regular Expressions
        Token Sets
        Creating a Content Set
        Creating a Reusable Content Expression

Part II: Data Quality Features in Informatica Developer

Chapter 3: Column Profiles in Informatica Developer
    Column Profile Concepts Overview
    Column Profile Options
    Rules
    Scorecards
    Column Profiles in Informatica Developer
        Filtering Options
        Sampling Properties
    Creating a Single Data Object Profile

Chapter 4: Column Profile Results in Informatica Developer
    Column Profile Results in Informatica Developer
    Column Value Properties
    Column Pattern Properties
    Column Statistics Properties
    Exporting Profile Results from Informatica Developer

Chapter 5: Rules in Informatica Developer
    Rules in Informatica Developer Overview
    Creating a Rule in Informatica Developer
    Applying a Rule in Informatica Developer

Chapter 6: Scorecards in Informatica Developer
    Scorecards in Informatica Developer Overview
    Creating a Scorecard

Chapter 7: Mapplet and Mapping Profiling
    Mapplet and Mapping Profiling Overview
    Running a Profile on a Mapplet or Mapping Object
    Comparing Profiles for Mapping or Mapplet Objects
    Generating a Mapping from a Profile

Chapter 8: Reference Data
    Reference Tables Overview
    Reference Table Data Properties
    Creating a Reference Table Object
    Creating a Reference Table from a Flat File
    Creating a Reference Table from a Relational Source
    Copying a Reference Table in the Model Repository

Part III: Data Quality Features in Informatica Analyst

Chapter 9: Column Profiles in Informatica Analyst
    Column Profiles in Informatica Analyst Overview
    Column Profiling Process
    Profile Options
        Profile Results Option
        Sampling Options
        Drilldown Options
    Creating a Column Profile in the Analyst Tool
    Editing a Column Profile
    Running a Profile
    Creating a Filter
    Managing Filters
    Synchronizing a Flat File Data Object
    Synchronizing a Relational Data Object

Chapter 10: Column Profile Results in Informatica Analyst
    Column Profile Results in Informatica Analyst Overview
    Profile Summary
    Column Values
    Column Patterns
    Column Statistics
    Column Profile Drilldown
        Drilling Down on Row Data
        Applying Filters to Drilldown Data
    Column Profile Export Files in Informatica Analyst
        Profile Export Results in a CSV File
        Profile Export Results in Microsoft Excel
        Exporting Profile Results from Informatica Analyst

Chapter 11: Rules in Informatica Analyst
    Rules in Informatica Analyst Overview
    Predefined Rules
        Predefined Rules Process
        Applying a Predefined Rule
    Expression Rules
        Expression Rules Process
        Creating an Expression Rule

Chapter 12: Scorecards in Informatica Analyst
    Scorecards in Informatica Analyst Overview
    Informatica Analyst Scorecard Process
    Metrics
        Metric Weights
        Adding Columns to a Scorecard
        Running a Scorecard
        Viewing a Scorecard
        Editing a Scorecard
        Defining Thresholds
        Metric Groups
        Drilling Down on Columns
        Viewing Trend Charts
    Scorecard Notifications
        Notification Email Message Template
        Setting Up Scorecard Notifications
        Configuring Global Settings for Scorecard Notifications
    Scorecard Integration with External Applications
        Viewing a Scorecard in External Applications

Chapter 13: Exception Record Management
    Exception Record Management Overview
        Exception Management Process Flow
        Reserved Column Names
    Exception Management Tasks
        Viewing and Editing Bad Records
        Updating Bad Record Status
        Viewing and Filtering Duplicate Record Clusters
        Editing Duplicate Record Clusters
        Consolidating Duplicate Record Clusters
        Viewing the Audit Trail

Chapter 14: Reference Tables
    Reference Tables Overview
    Reference Table Properties
        General Reference Table Properties
        Reference Table Column Properties
    Create Reference Tables
        Creating a Reference Table in the Reference Table Editor
    Create a Reference Table from Profile Data
        Creating a Reference Table from Profile Columns
        Creating a Reference Table from Column Values
        Creating a Reference Table from Column Patterns
    Create a Reference Table From a Flat File
        Analyst Tool Flat File Properties
        Creating a Reference Table from a Flat File
    Create a Reference Table from a Database Table
        Creating a Database Connection
        Creating a Reference Table from a Database Table
    Copying a Reference Table in the Model Repository
    Reference Table Management
        Managing Columns
        Managing Rows
        Finding and Replacing Values
        Exporting a Reference Table
    Audit Trail Events
        Viewing Audit Trail Events
    Rules and Guidelines for Reference Tables

Index

Preface

The Informatica Data Quality User Guide is written for Informatica users who create and run data quality processes in the Informatica Developer and Informatica Analyst client applications.
The Informatica Data Quality User Guide contains information about profiles and other objects that you can use to analyze the content and structure of data and to find and fix data quality issues.

Informatica Resources

Informatica Customer Portal

As an Informatica customer, you can access the Informatica Customer Portal site at http://mysupport.informatica.com. The site contains product information, user group information, newsletters, access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library, the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product Documentation, and access to the Informatica user community.

Informatica Documentation

The Informatica Documentation team makes every effort to create accurate, usable documentation. If you have questions, comments, or ideas about this documentation, contact the Informatica Documentation team through email at [email protected]. We will use your feedback to improve our documentation. Let us know if we can contact you regarding your comments. The Documentation team updates documentation as needed. To get the latest documentation for your product, navigate to Product Documentation from http://mysupport.informatica.com.

Informatica Web Site

You can access the Informatica corporate web site at http://www.informatica.com. The site contains information about Informatica, its background, upcoming events, and sales offices. You will also find product and partner information. The services area of the site includes important information about technical support, training and education, and implementation services.

Informatica How-To Library

As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com. The How-To Library is a collection of resources to help you learn more about Informatica products and features.
It includes articles and interactive demonstrations that provide solutions to common problems, compare features and behaviors, and guide you through performing specific real-world tasks.

Informatica Knowledge Base

As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com. Use the Knowledge Base to search for documented solutions to known technical issues about Informatica products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Multimedia Knowledge Base

As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files that help you learn about common concepts and guide you through performing specific tasks. If you have questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base team through email at [email protected].

Informatica Global Customer Support

You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires a user name and password. You can request a user name and password at http://mysupport.informatica.com.
Use the following telephone numbers to contact Informatica Global Customer Support:

North America / South America
Toll Free: Brazil: 0800 891 0202, Mexico: 001 888 209 8853, North America: +1 877 463 2435

Europe / Middle East / Africa
Toll Free: France: 0805 804632, Germany: 0800 5891281, Italy: 800 915 985, Netherlands: 0800 2300001, Portugal: 800 208 360, Spain: 900 813 166, Switzerland: 0800 463 200, United Kingdom: 0800 023 4632
Standard Rate: Belgium: +31 30 6022 797, France: +33 1 4138 9226, Germany: +49 1805 702 702, Netherlands: +31 306 022 797, United Kingdom: +44 1628 511445

Asia / Australia
Toll Free: Australia: 1 800 151 830, New Zealand: 09 9 128 901
Standard Rate: India: +91 80 4112 5738

Part I: Informatica Data Quality Concepts

This part contains the following chapters:
¨ Introduction to Data Quality, 2
¨ Reference Data, 4

CHAPTER 1 Introduction to Data Quality

This chapter includes the following topic:
¨ Data Quality Overview, 2

Data Quality Overview

Use Informatica Data Quality to analyze the content and structure of your data and enhance the data in ways that meet your business needs. You use Informatica applications to design and run processes to complete the following tasks:
¨ Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it can identify strengths and weaknesses in data and help you define a project plan.
¨ Create scorecards to review data quality. A scorecard is a graphical representation of the quality measurements in a profile.
¨ Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure that the city, state, and ZIP code values are consistent.
¨ Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to the type of information it contains.
Parsing can also add information to records. For example, you can define a parsing operation to add units of measurement to product data.
¨ Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal address data. Address validation corrects errors in addresses and completes partial addresses by comparing address records against address reference data from national postal carriers. Address validation can also add postal information that speeds mail delivery and reduces mail costs.
¨ Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing data from one or more fields in each record. You select the fields to be analyzed, and you select the comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field matching, which identifies similar or duplicate records, and identity matching, which identifies similar or duplicate identities in record data.
¨ Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You can run a mapping to capture any exception record that remains in a data set after you run other data quality processes. You review and edit exception records in the Analyst tool or in Informatica Data Director for Data Quality.
¨ Create reference data tables. Informatica provides reference data that can enhance several types of data quality processes, including standardization and parsing. You can create reference tables using data from profile results.
¨ Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project objectives. You can create mapplets and validate them as rules in the Developer tool.
¨ Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is available to users of the Developer tool and Analyst tool.
Users can collaborate on projects, and different users can take ownership of objects at different stages of a project.
¨ Export mappings to PowerCenter. You can export and run mappings in PowerCenter. You can export mappings to PowerCenter to reuse the metadata for physical data integration or to create web services.

CHAPTER 2 Reference Data

This chapter includes the following topics:
¨ Reference Data Overview, 4
¨ User-Defined Reference Data, 5
¨ Informatica Reference Data, 6
¨ Reference Data and Transformations, 6
¨ Reference Tables, 7
¨ Content Sets, 8

Reference Data Overview

A reference data object contains a set of data values that you use to perform search operations in source data. You can create reference data objects in the Developer tool and Analyst tool, and you can import reference data objects to the Model repository. The Data Quality Content installer includes reference data objects that you can import. You can create and edit the following types of reference data:

Reference tables

A reference table contains standard and alternative versions of a set of data values. You add a reference table to a transformation in the Developer tool to verify that source data values are accurate and correctly formatted. A reference table contains at least two columns. One column contains the standard or preferred version of a string, and other columns contain alternative versions. When you add a reference table to a transformation, the transformation searches the input port data for values that also appear in the table. You can create tables with any data that is useful to the data project you work on.

Content Sets

Content sets are repository and file objects that contain reference data values. Content sets are similar in structure to reference tables, but they are more commonly used for lower-level data operations. There are different types of content sets.
When you add a content set to a transformation, the transformation searches the input port data for values that appear in the content set or for strings that match the data patterns defined in the content set.
The Data Quality Content installer includes reference data objects that you can import. You download the Data Quality Content Installer from Informatica. The Data Quality Content installer includes the following types of reference data:

Informatica reference tables

Database tables created by Informatica. You import Informatica reference tables when you import accelerator objects from the Content Installer. The reference tables contain standard and alternative versions of common business terms from several countries. The types of reference information include telephone area codes, postcode formats, first names, Social Security number formats, occupations, and acronyms. You can edit Informatica reference tables.

Informatica content sets

Content sets created by Informatica. You import content sets when you import accelerator objects from the Content Installer. A content set contains different types of reference data that you can use to perform search operations in data quality transformations.

Address reference data files

Reference data files that identify all valid addresses in a country. The Address Validator transformation reads this data. You cannot create or edit address reference data files. The Content Installer installs files for the countries that you have purchased. Address reference data is current for a defined period and you must refresh your data regularly, for example every quarter. You cannot view or edit address reference data.

Identity population files

Contain information on types of personal, household, and corporate identities. The Match transformation and the Comparison transformation use this data to parse potential identities from input fields. You cannot create or edit identity population files.
The Content Installer writes population files to the file system.

User-Defined Reference Data

You can use the values in a data object to create a reference data object. For example, you can select a data object or profile column that contains values that are specific to a project or organization. The column values let you create custom reference data objects for a project.
You can build a reference data object from a data column in the following cases:
¨ The data rows in the column contain the same type of information.
¨ The column contains a set of data values that are either correct or incorrect for the project.
Note: Create a reference object with incorrect values when you want to search a data set for incorrect values.
The following list describes common examples of project data columns that can contain reference data:
¨ Stock Keeping Unit (SKU) codes. Use an SKU column to create a reference table of valid SKU codes for an organization. Use the reference table to find correct or incorrect SKU codes in a data set.
¨ Employee codes. Use an employee code or employee ID column to create a reference table of valid employee codes. Use the reference table to find errors in employee data.
¨ Customer account numbers. Run a profile on a customer account column to identify account number patterns. Use the profile to create a token set of incorrect data patterns. Use the token set to find account numbers that do not conform to the correct account number structure.
¨ Customer names. When a customer name column contains first, middle, and last names, you can create a probabilistic model that defines the expected structure of the strings in the column. Use the probabilistic model to find data strings that do not belong in the column.

Informatica Reference Data

You purchase and download address reference data and identity population data from Informatica.
You purchase an annual subscription to address data for a country, and you can download the latest address data from Informatica at any time during the subscription period. The Content Installer user downloads and installs reference data separately from the applications. Contact an Administrator tool user for information about the reference data installed on your system.

Reference Data and Transformations

Several transformations read reference data to perform data quality tasks. The following transformations can read reference data:
¨ Address Validator. Reads address reference data to verify the accuracy of addresses.
¨ Case Converter. Reads reference data tables to identify strings that must change case.
¨ Classifier. Reads content set data to identify the type of information in a string.
¨ Comparison. Reads identity population data during duplicate analysis.
¨ Labeler. Reads content set data to identify and label strings.
¨ Match. Reads identity population data during duplicate analysis.
¨ Parser. Reads content set data to parse strings based on the information they contain.
¨ Standardizer. Reads reference data tables to standardize strings to a common format.
You can create reference data objects in the Developer tool and Analyst tool. For example, you can create a reference table from column profile data. You can export reference tables to the file system. The Data Quality Content Installer file set includes Informatica reference data objects that you can import.

Reference Tables

A reference table contains the standard versions of a set of data values and any alternative versions of the values that you may want to find. You add reference tables to transformations in the Developer tool.
You create reference tables in the following ways:
¨ Create a reference table object and enter data values.
¨ Create a reference table from column profile results.
¨ Create a reference table from data in a flat file.
¨ Create a reference table from data in another database table.
When you create a reference table, the Model repository stores the table metadata. The staging database or another database stores the column data values. After you create a reference table, you can add and edit columns, rows, and data values. You can also search and replace values in reference table rows.

Reference Table Structure

Most reference tables contain at least two columns. One column contains the correct or required versions of the data values. Other columns contain different versions of the values, including alternative versions that may appear in the source data. The column that contains the correct or required values is called the valid column. When a transformation reads a reference table in a mapping, the transformation looks for values in the non-valid columns. When the transformation finds a non-valid value, it returns the corresponding value from the valid column. You can also configure a transformation to return a single common value instead of the valid values.
The valid column can contain data that is formally correct, such as ZIP codes. It can contain data that is relevant to a project, such as stock keeping unit (SKU) numbers that are unique to an organization. You can also create a valid column from bad data, such as values that contain known data errors that you want to search for.
For example, a Developer tool user creates a reference table that contains a list of valid SKU numbers in a retail organization. The user adds the reference table to a Labeler transformation and creates a mapping with the transformation. The user runs the mapping on a product database table. When the mapping runs, the Labeler creates a column that identifies the product records that do not contain valid SKU numbers.

Reference Tables and the Parser Transformation

You create a reference table with a single column when you want to use the table data in a pattern-based parsing operation.
You configure the Parser transformation to perform pattern-based parsing, and you import the data to the transformation configuration.

Managed and Unmanaged Reference Tables

Reference tables store metadata in the Model repository. Reference tables can store column data in the reference data database or in another database. The Content Management Service stores the database connection for the reference data database.
A managed reference table stores column data in the reference data database. You can edit the values of a managed table in the Analyst tool and Developer tool.
An unmanaged reference table stores column data in a database other than the reference data database. You cannot edit the values of an unmanaged table in the Analyst tool or Developer tool.

Content Sets

A content set is a Model repository object that you use to store reusable content expressions. A content expression is an expression that you can use in Labeler and Parser transformations to identify data. You can create content sets to organize content expressions into logical groups. For example, if you create a number of content expressions that identify Portuguese strings, you can create a content set that groups these content expressions. Create content sets in the Developer tool.
Content expressions include character sets, pattern sets, regular expressions, and token sets. Content expressions can be system-defined or user-defined. System-defined content expressions cannot be added to content sets. User-defined content expressions can be reusable or non-reusable.

Character Sets

A character set contains expressions that identify specific characters and character ranges. You can use character sets in Labeler transformations that use character labeling mode. Character ranges specify a sequential range of character codes. For example, the character range "[A-C]" matches the uppercase characters "A," "B," and "C."
This character range does not match the lowercase characters "a," "b," or "c."
Use character sets to identify a specific character or range of characters as part of labeling operations. For example, you can label all numerals in a column that contains telephone numbers. After labeling the numbers, you can identify patterns with a Parser transformation and write problematic patterns to separate output ports.

Character Set Properties

Configure properties that determine character labeling operations for a character set. The following list describes the properties for a user-defined character set:
¨ Label. Defines the label that a Labeler transformation applies to data that matches the character set.
¨ Standard Mode. Enables a simple editing view that includes fields for the start range and end range.
¨ Start Range. Specifies the first character in a character range.
¨ End Range. Specifies the last character in a character range. For a range with a single character, leave this field blank.
¨ Advanced Mode. Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.
¨ Range Character. Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.
¨ Delimiter Character. Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.

Classifier Models

A classifier model analyzes input strings and determines the types of information they contain. You use a classifier model in a Classifier transformation. You can use a classifier model when input strings contain significant amounts of data. For example, you can use a classifier model and Classifier transformation to identify the types of information in a set of documents.
You export the text from each document, and you store the text of each document as a separate field in a single data column. The Classifier transformation reads the data and classifies the information in each field according to the labels defined in the model.
The classifier model contains the following columns:
¨ A column that contains the words and phrases that may exist in the input data. The transformation compares the input data with the data in this column.
¨ A column that contains descriptive labels that may define the information in the data. The transformation returns a label from this column as output.
The classifier model also contains logic that the Classifier transformation uses to calculate the correct information type for the input data. The Model repository stores the metadata for the classifier model object. The column data and logic are stored in a file in the Informatica installation directory structure.
Note: You cannot create or edit a classifier model in the Developer tool.

Classifier Models and the Core Accelerator

Informatica includes a classifier model in the set of prebuilt mappings and reference data objects called the Core Accelerator. The Core Accelerator is part of the Informatica Data Quality product. You download the Core Accelerator from Informatica with the Data Quality Content Installer. When you download the Data Quality Content Installer, find the Core Accelerator XML file in the Content Installer file set. Use the Developer tool to import the accelerator objects. The import operation writes the model object to the Model repository and the model data file to the Informatica file system.

Pattern Sets

A pattern set contains expressions that identify data patterns in the output of a token labeling operation. You can use pattern sets to analyze the Tokenized Data output port and write matching strings to one or more output ports. Use pattern sets in Parser transformations that use pattern parsing mode.
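The way a pattern set routes tokenized rows to outputs can be sketched as a lookup over label sequences. The following Python sketch is illustrative only, not Informatica functionality; the pattern set contents, label names, and output names are hypothetical.

```python
# Illustrative sketch, not Informatica functionality: a pattern set lists
# label sequences, and pattern parsing routes each tokenized row to an
# output when its label sequence matches a pattern. Rows that match no
# pattern go to an overflow output. All names here are hypothetical.
PATTERN_SET = {
    "full_name": [["Word", "Word"], ["Word", "Init", "Word"]],
    "initials_only": [["Init", "Init"]],
}

def route_row(labeled_tokens):
    """Return the output name whose pattern matches the row, else 'overflow'."""
    for output, patterns in PATTERN_SET.items():
        if labeled_tokens in patterns:
            return output
    return "overflow"

print(route_row(["Word", "Init", "Word"]))  # -> full_name
print(route_row(["Word", "Word", "Word"]))  # -> overflow
```

In the real product, the label sequences come from the tokenized output of a labeling operation, and each matching pattern writes its strings to the output ports you configure.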
For example, you can configure a Parser transformation to use pattern sets that identify names and initials. This transformation uses the pattern sets to analyze the output of a Labeler transformation in token labeling mode. You can configure the Parser transformation to write names and initials in the output to separate ports.

Pattern Set Properties

Configure properties that determine the patterns in a pattern set. A user-defined pattern set has the following property:
¨ Pattern. Defines the patterns that the pattern parser searches for. You can enter multiple patterns for one pattern set. You can enter patterns constructed from a combination of wildcards, characters, and strings.

Probabilistic Models

A probabilistic model identifies tokens by the types of information they contain and by their positions in an input string. You use probabilistic models with the Labeler and Parser transformations. Select a probabilistic model when you want to label or parse values on an input port into separate output ports.
A probabilistic model uses a structured set of tokens as a reference data set. A labeling or parsing operation can use a probabilistic model to answer the following questions about the data that it reads on a port:
¨ Does the port data contain a token that matches the reference data in the model?
¨ What type of information does the token contain?
A probabilistic model contains the following columns:
¨ An input column that represents the data on the input port. You populate the column with sample data from the input port. The model uses the sample data as reference data in parsing and labeling operations.
¨ One or more label columns that identify the types of information in each input string. You add the columns to the model, and you assign labels to the tokens in each string. Use the label columns to indicate the correct position of the tokens in the string.
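The relationship between the input column and the label columns can be sketched as a lookup: each token in an input string takes the name of the label column where it appears. This toy Python example is not Informatica functionality; the sample values are hypothetical, and it covers only exact matches, omitting the fuzzy inference that a compiled model performs.

```python
# Illustrative sketch, not Informatica functionality: each token of an input
# string is looked up in the model's sample data, and the name of the label
# column that contains the token becomes its label. Unknown tokens receive
# the overflow label "O". The fuzzy matching of a compiled model is omitted.
MODEL = {
    "FIRSTNAME": {"Franklin", "John"},
    "MIDDLENAME": {"Delano", "Fitzgerald"},
    "LASTNAME": {"Roosevelt", "Kennedy"},
}

def label_string(value, overflow="O"):
    """Label each token in a string, using the overflow label for unknowns."""
    labels = []
    for token in value.split():
        label = next((name for name, samples in MODEL.items()
                      if token in samples), overflow)
        labels.append(label)
    return " ".join(labels)

print(label_string("Franklin Delano Roosevelt"))  # -> FIRSTNAME MIDDLENAME LASTNAME
```

In a token parsing operation, each label column would instead become an output port, and each token would be written to the port that matches its label.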
The following figure shows a probabilistic model in the Developer tool:
When you configure a token labeling operation with a probabilistic model, the Labeler transformation writes the column name from the probabilistic model to an output port on the transformation. For example, the Labeler can use a probabilistic model to label the string "Franklin Delano Roosevelt" as "FIRSTNAME MIDDLENAME LASTNAME."
When you configure a token parsing operation with a probabilistic model, each column you add to the model becomes an output port on the Parser transformation. The transformation writes each token to an output port based on its position in the model.

Probabilistic Logic

Probabilistic models behave differently from other types of content set. Data Quality can infer a match between the input port data values and the model data values even if the port data is not listed in the model. This means that a probabilistic model does not need to list every token in a data set to correctly label or parse the tokens in the data set.
Data Quality uses probabilistic or fuzzy logic to identify tokens on the transformation input port that match tokens in the probabilistic model. The engine updates the fuzzy logic rules when you compile the probabilistic model.

Probabilistic Model Advanced Properties

The Advanced Properties dialog box exposes the computational properties that are built into a probabilistic model when you compile the model. The basic element in the compilation of probabilistic models is the n-gram. An n-gram is a series of letters that can be followed or preceded by one or more letters to complete a word. Probabilistic analysis creates n-grams for each value in the Input column of the probabilistic model. The analysis adds one or more letters to each n-gram to create different words.
If the probabilistic analysis can create a word that matches a value on a Labeler or Parser transformation input port, then the analysis determines that the Input value in the probabilistic model matches the input value on the transformation port. The advanced properties on a probabilistic model determine how the probabilistic model handles n-grams and other model features.
Note: The default property values represent the preferred settings for probabilistic analysis and probabilistic model compilation in Informatica. If you edit an advanced property, you may adversely affect the accuracy of the probabilistic analysis. Do not edit the advanced properties unless you understand the effects of the changes you make.

Steps to Create a Probabilistic Model

You create a probabilistic model in multiple stages. Complete the tasks associated with each stage to create and configure a model that you can use in a transformation. Complete the following tasks:
¨ Create the probabilistic model object in the repository. You can use a data object to create the model, or you can create an empty model.
¨ Assign labels to the input data. If the probabilistic model does not contain labels for the input data values, you must assign the labels.
¨ Compile the probabilistic model. When you have entered the input data and configured the labels, you compile the model. You compile the model every time you edit it.

Creating an Empty Probabilistic Model

You can use a data object as the source for the data in a probabilistic model, or you can create an empty model. Create an empty probabilistic model when you want to enter the reference data at a later time.
Complete the following steps to create an empty probabilistic model:
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Probabilistic Models, and click Add. The Probabilistic Model wizard opens.
4. Select the Probabilistic Model option. Click Next.
5. Enter a name for the model. Click Finish and save the model.
The probabilistic model opens in the Developer tool.
After you create the empty model, you must add input data.

Creating a Probabilistic Model from a Data Object

You can use a data object as the source for the data in a probabilistic model. For example, use the source data object from the mapping that will read the probabilistic model. You can also profile an object in the mapping and create a data object from the profile results. Probabilistic model logic works best when you use data from the input port on the transformation to populate the input and label columns in the model.
Complete the following steps to create a probabilistic model from a data object:
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Probabilistic Models, and click Add. The Probabilistic Model wizard opens.
4. Select the Probabilistic Model from Data Objects option. Click Next.
5. Enter a name for the model, and browse to the data object you want to use. Click Next.
6. Review the available data columns on the data object, and select a column to add as input data or label data to the model.
¨ To add a data source column to the Input column in the model, select the column name and click Data >.
¨ To use a data source column as a label source for the model, select the column name and click Label >.
Click Next.
7. Select the number of rows to copy from the data source. Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the start of the data set.
8. Set the delimiters to use for the Input column and Data columns. The delimiters apply when the columns contain multiple tokens. The default delimiter is \s, which represents a character space.
9. Enter a name for a column to contain any token that the labeling or parsing operation cannot recognize. The default name is O, which stands for Overflow.
10. Click Finish and save the model.
The probabilistic model opens in the Developer tool.
11. Click Compile to build the probabilistic logic rules for the model.

Assigning Labels to Probabilistic Model Data

If the data object you use to create the probabilistic model does not contain columns for label data, you must add the data.
A label is a column name in the probabilistic model. The model uses the column name to identify different types of information in the input data. You create the label columns, and you assign a label to each token in each input row. When you assign a label to a token, the model adds the token to the label column.
Follow these guidelines when you assign labels to input data:
¨ A label identifies the type of information that the token represents. A token may represent multiple types of information if it appears in multiple locations in the input string. For example, you can assign the labels FIRSTNAME LASTNAME to the names "John Blake" and "Blake Smith."
¨ You must assign a label to every token in every row, even if the tokens repeat in multiple rows.
Complete the following steps to assign labels to input data:
1. Open the probabilistic model in the Developer tool canvas.
2. Verify that the model contains the input data and label columns that you need.
a. To add a row of input data, click New. The cursor moves to the first available row in the input data column. Enter the input data values.
b. To add a label column, right-click an input data row and select New Label. Enter a column name in the New Label dialog box. The label appears in the model.
3. Right-click an input data row and select View tokens and labels as rows. The Labels panel displays under the input data column.
Note: A label is a structural element in a probabilistic model. If you add or remove a label in a probabilistic model after you add the model to a Parser transformation, you invalidate the parsing operation that uses the model.
You must delete and recreate the operation that uses the probabilistic model if you add or remove a label in the model.

Compiling the Probabilistic Model

Each time you add data to a probabilistic model, you must compile the model. Compiling the model updates the matching logic in the Data Quality engine.
- To update the fuzzy logic that the engine uses for a probabilistic model, open the model and click Compile.

Generating Probabilistic Model Data from a Midstream Profile

You can run a profile on mapping data to create a data source for a probabilistic model. For example, run a profile on the transformation that you connect to the Labeler or Parser transformation, and populate the model with the profile data. This ensures that the model data is as close as possible to the data on the input port you select in the Labeler or Parser transformation. Complete the following steps to run a midstream mapping profile and generate input data for a probabilistic model:
1. Open the mapping that contains the transformation you will connect to the Labeler or Parser transformation.
2. Select a data object and click Profile Now. Select the Results tab in the profile, and review the profile results.
3. Under Column Profiling, select the column you want to add to the probabilistic model.
4. Under Details, select the option to Show Values. The editor displays the data values in the column you selected.
Note: You can select all values in the column or a subset of values.
5. If you want to add a subset of column values to a probabilistic model, follow these steps:
a. Use the Shift or Ctrl keys to select one or more values from the editor.
b. Right-click the values and select Send to > Export Results to File.
6. If you want to add all column values to a probabilistic model, click the option to Export Value Frequencies to File.
7. In the Export dialog box, enter a file name. You can save the file on the Informatica services machine or on the Developer client machine.
If you save the file on the client machine, enter a path to the file. You can use the file as a data source for the Label or Data column in the probabilistic model.

Regular Expressions

In the context of content sets, a regular expression is an expression that you can use in parsing and labeling operations. Use regular expressions to identify one or more strings in input data. You can use regular expressions in Parser transformations that use token parsing mode. You can also use regular expressions in Labeler transformations that use token labeling mode. Parser transformations use regular expressions to match patterns in input data and parse all matching strings to one or more outputs. For example, you can use a regular expression to identify all email addresses in input data and parse each email address component to a different output. Labeler transformations use regular expressions to match an input pattern and create a single label. Regular expressions that have multiple outputs do not generate multiple labels.

Regular Expression Properties

Configure properties that determine how a regular expression identifies and writes output strings. The properties for a user-defined regular expression are as follows:
- Number of Outputs: Defines the number of output ports that the regular expression writes.
- Regular Expression: Defines a pattern that the Parser transformation uses to match strings.
- Test Expression: Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
- Next Expression: Moves to the next string that matches the regular expression and changes the font of that string to bold.
- Previous Expression: Moves to the previous string that matches the regular expression and changes the font of that string to bold.

Token Sets

A token set contains expressions that identify specific tokens.
You can use token sets in Labeler transformations that use token labeling mode. You can also use token sets in Parser transformations that use token parsing mode. Use token sets to identify specific tokens as part of labeling and parsing operations. For example, you can use a token set to label all email addresses that use an "AccountName@DomainName" format. After labeling the tokens, you can use the Parser transformation to write email addresses to output ports that you specify.

Token Set Properties

Configure properties that determine the labeling operations for a token set. The following properties apply to a user-defined token set. Each entry lists the property, the token set mode in which it applies, and a description:
- Name (all modes): Defines the name of the token set.
- Description (all modes): Describes the token set.
- Token Set Options (all modes): Defines whether the token set uses regular expression mode or character mode.
- Label (regular expression mode): Defines the label that a Labeler transformation applies to data that matches the token set.
- Regular Expression (regular expression mode): Defines a pattern that the Labeler transformation uses to match strings.
- Test Expression (regular expression mode): Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
- Next Expression (regular expression mode): Moves to the next string that matches the regular expression and changes the font of that string to bold.
- Previous Expression (regular expression mode): Moves to the previous string that matches the regular expression and changes the font of that string to bold.
- Label (character mode): Defines the label that a Labeler transformation applies to data that matches the character set.
- Standard Mode (character mode): Enables a simple editing view that includes fields for the start range and end range.
- Start Range (character mode): Specifies the first character in a character range.
- End Range (character mode): Specifies the last character in a character range.
For single-character ranges, leave this field blank.
- Advanced Mode (character mode): Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.
- Range Character (character mode): Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.
- Delimiter Character (character mode): Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.

Creating a Content Set

Create content sets to group content expressions according to business requirements. You create content sets in the Developer tool.
1. In the Object Explorer view, select the project or folder where you want to store the content set.
2. Click File > New > Content Set.
3. Enter a name for the content set.
4. Optionally, select Browse to change the Model repository location for the content set.
5. Click Finish.

Creating a Reusable Content Expression

Create reusable content expressions from within a content set. You can use these content expressions in Labeler transformations and Parser transformations.
1. Open a content set in the editor and select the Content view.
2. Select a content expression view.
3. Click Add.
4. Enter a name for the content expression.
5. Optionally, enter a text description of the content expression.
6. If you selected the Token Set expression view, select a token set mode.
7. Click Next.
8. Configure the content expression properties.
9. Click Finish.
Tip: You can create content expressions by copying them from another content set. Use the Copy To and Paste From options to create copies of existing content expressions. You can use the CTRL key to select multiple content expressions when using these options.
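To make the regular expression behavior described in this chapter concrete, the following is a hedged Python sketch (Python `re` syntax, not the Informatica expression editor; the pattern and sample data are illustrative only). A pattern with two capture groups splits each matched email address into an account-name part and a domain-name part, much as a Parser transformation in token parsing mode writes the components of a matched string to separate outputs:

```python
import re

# Illustrative pattern with two capture groups, one per "output".
# A simplified email pattern for demonstration, not a full address validator.
EMAIL = re.compile(r"([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})")

def parse_emails(text):
    """Return (account_name, domain_name) pairs for each email found in text."""
    return EMAIL.findall(text)

print(parse_emails("Contact jsmith@example.com or admin@mail.example.org"))
# [('jsmith', 'example.com'), ('admin', 'mail.example.org')]
```

A Labeler transformation, by contrast, would apply a single label to each full match; the capture groups would not produce multiple labels.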
Part II: Data Quality Features in Informatica Developer

This part contains the following chapters:
- Column Profiles in Informatica Developer
- Column Profile Results in Informatica Developer
- Rules in Informatica Developer
- Scorecards in Informatica Developer
- Mapplet and Mapping Profiling
- Reference Data

CHAPTER 3
Column Profiles in Informatica Developer

This chapter includes the following topics:
- Column Profile Concepts Overview
- Column Profile Options
- Rules
- Scorecards
- Column Profiles in Informatica Developer
- Creating a Single Data Object Profile

Column Profile Concepts Overview

A column profile determines the characteristics of columns in a data source, such as value frequencies, percentages, and patterns. Column profiling discovers the following facts about data:
- The number of unique and null values in each column, expressed as a number and a percentage.
- The patterns of data in each column and the frequencies with which these values occur.
- Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
Use column profile options to select the columns on which you want to run a profile, set data sampling options, and set drilldown options when you create a profile. A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule to the profile to cleanse, change, or validate data. Create scorecards to periodically review data quality. You create scorecards before and after you apply rules to profiles so that you can view a graphical representation of the valid values for columns.

Column Profile Options

When you create a profile with the Column Profiling option, you can use the profile wizard to define filter and sampling options. These options determine how the profile reads rows from the data set.
After you complete the steps in the profile wizard, you can add a rule to the profile. The rule can contain the business logic to perform data transformation operations on the data before column profiling.

Rules

Create and apply rules within profiles. A rule is business logic that defines conditions applied to data when you run a profile. Use rules to further validate the data in a profile and to measure data quality progress. You can add a rule after you create a profile. Rules that you create in the Analyst tool or the Developer tool are reusable in both tools. Add rules to a profile by selecting a reusable rule or by creating an expression rule. An expression rule uses both expression functions and columns to define rule logic. After you create an expression rule, you can make the rule reusable. Create expression rules in the Analyst tool. In the Developer tool, you can create a mapplet and validate the mapplet as a rule. You can run rules from both the Analyst tool and the Developer tool.

Scorecards

A scorecard is a graphical representation of the valid values for a column or for the output of a rule in profile results. Use scorecards to measure data quality progress. You can create a scorecard from a profile and monitor the progress of data quality over time. A scorecard has multiple components, such as metrics, metric groups, and thresholds. After you run a profile, you can add source columns as metrics to a scorecard and configure the valid values for the metrics. Use a metric group to categorize related metrics in a scorecard into a set. A threshold identifies the range, in percentage, of bad data that is acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of data. When you run a scorecard, you can configure whether you want to drill down on the metrics for a score on the live data or staged data.
After you run a scorecard and view the scores, you can drill down on each metric to identify valid data records and records that are not valid. To track data quality effectively, you can use trend charts and monitor how the scores change over a period of time. The profiling warehouse stores the scorecard statistics and configuration information. You can configure a third-party application to get the scorecard results and run reports. You can also display the scorecard results in a web application, portal, or report such as a business intelligence report.

Column Profiles in Informatica Developer

Use a column profile to analyze the characteristics of columns in a data set, such as value percentages and value patterns. You can add filters to determine the rows that the profile reads at runtime. The profile does not process rows that do not meet the filter criteria. You can discover the following types of information about the columns you profile:
- The number of times a value appears in a column.
- The frequency of occurrence of each value in a column, expressed as a percentage.
- The character patterns of the values in a column.
- The maximum and minimum lengths of the values in a column, and the first and last values.
You can define a column profile for a data object in a mapping or mapplet or for an object in the Model repository. The object in the repository can be in a single data object profile, a multiple data object profile, or a profile model. You can add rules to a column profile. Use rules to select a subset of source data for profiling. You can also change the drilldown options for column profiles to determine whether the drilldown reads from staged data or live data.

Filtering Options

You can add filters to determine the rows that a column profile uses when performing profiling operations. The profile does not process rows that do not meet the filter criteria.
1. Create or open a column profile.
2.
Select the Filter view.
3. Click Add.
4. Select a filter type and click Next.
5. Enter a name for the filter. Optionally, enter a text description of the filter.
6. Select Set as Active to apply the filter to the profile. Click Next.
7. Define the filter criteria.
8. Click Finish.

Sampling Properties

Configure the sampling properties to determine the number of rows that the profile reads during a profiling operation. The sampling properties are as follows:
- All Rows: Reads all rows from the source. Default is enabled.
- First: Reads from the first row up to the row that you specify.
- Random Sample of: Reads a random sample of the number of rows that you specify.
- Random Sample (Auto): Reads from a random sample of rows.

Creating a Single Data Object Profile

You can create a single data object profile for one or more columns in a data object and store the profile object in the Model repository.
1. In the Object Explorer view, select the data object you want to profile.
2. Click File > New > Profile to open the profile wizard.
3. Select Profile and click Next.
4. Enter a name for the profile and verify the project location. If required, browse to a new location.
5. Optionally, enter a text description of the profile.
6. Verify that the name of the data object you selected appears within the Data Objects section.
7. Click Next.
8. Configure the profile operations that you want to perform. You can configure the following operations:
- Column profiling
- Primary key discovery
- Functional dependency discovery
- Data domain discovery
Note: To enable a profile operation, select Enabled as part of the "Run Profile" action for that operation. Column profiling is enabled by default.
9. Review the options for your profile. You can edit the column selection for all profile types. Review the filter and sampling options for column profiles.
You can review the inference options for primary key, functional dependency, and data domain discovery. You can also review the data domain selection for data domain discovery.
10. Review the drilldown options, and edit them if necessary. By default, the Enable Row Drilldown option is selected. You can edit drilldown options for column profiles. The options also determine whether drilldown operations read from the data source or from staged data, and whether the profile stores result data from previous profile runs.
11. Click Finish.
The profile is ready to run.

CHAPTER 4
Column Profile Results in Informatica Developer

This chapter includes the following topics:
- Column Profile Results in Informatica Developer
- Column Value Properties
- Column Pattern Properties
- Column Statistics Properties
- Exporting Profile Results from Informatica Developer

Column Profile Results in Informatica Developer

Column profile analysis provides information about data quality by highlighting patterns and instances of nonconformance in data. The profile results for each type of analysis are as follows:
- Column profile: Percentage and count statistics for unique and null values. Inferred datatypes and the datatype that the data source declares for the data. The maximum and minimum values. The date and time of the most recent profile run. Percentage and count statistics for each unique data element in a column. Percentage and count statistics for each unique character pattern in a column.
- Primary key profile: Inferred primary keys and key violations.
- Functional dependency profile: Inferred functional dependencies and functional dependency violations.

Column Value Properties

Column value properties show the values in the profiled columns and the frequency with which each value appears in each column. The frequencies are shown as a number, a percentage, and a bar chart.
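As an illustration of the frequency and percent statistics that the profile reports, the following is a hedged Python sketch (the sample column data and function names are invented for the example, not product internals). It counts each distinct value in a column and expresses the count as a percentage of all values:

```python
from collections import Counter

def value_frequencies(column):
    """Return (value, frequency, percent) rows, most frequent first."""
    total = len(column)
    return [(value, count, round(100.0 * count / total, 2))
            for value, count in Counter(column).most_common()]

city = ["Boston", "Chicago", "Boston", "Denver", "Boston"]
for row in value_frequencies(city):
    print(row)
# ('Boston', 3, 60.0)
# ('Chicago', 1, 20.0)
# ('Denver', 1, 20.0)
```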
To view column value properties, select Values from the Show menu. Double-click a column value to drill down to the rows that contain the value. The properties for column values are as follows:
- Values: List of all values for the column in the profile.
- Frequency: Number of times a value appears in a column.
- Percent: Number of times a value appears in a column, expressed as a percentage of all values in the column.
- Chart: Bar chart for the percentage.

Column Pattern Properties

Column pattern properties show the patterns of data in the profiled columns and the frequency with which the patterns appear in each column. The patterns are shown as a number, a percentage, and a bar chart. To view pattern information, select Patterns from the Show menu. Double-click a pattern to drill down to the rows that contain the pattern. The properties for column value patterns are as follows:
- Patterns: Pattern for the selected column.
- Frequency: Number of times a pattern appears in a column.
- Percent: Number of times a pattern appears in a column, expressed as a percentage of all values in the column.
- Chart: Bar chart for the percentage.

Column Statistics Properties

Column statistics properties provide the maximum and minimum lengths of values and the first and last values. To view statistical information, select Statistics from the Show menu. The column statistics properties are as follows:
- Maximum Length: Length of the longest value in the column.
- Minimum Length: Length of the shortest value in the column.
- Bottom: Last five values in the column.
- Top: First five values in the column.
Note: The profile also displays average and standard deviation statistics for columns of type Integer.

Exporting Profile Results from Informatica Developer

You can export column values and column pattern data from profile results.
Export column values in Distinct Value Count format. Export pattern values in Domain Inference format.
1. In the Object Explorer view, select and open a profile.
2. Optionally, run the profile to update the profile results.
3. Select the Results view.
4. Select the column that contains the data for export.
5. Under Details, select Values or Patterns and click the Export button. The Export data to a file dialog box opens.
6. Accept or change the file name. The default name is [Profile_name]_[column_name]_DVC for column value data and [Profile_name]_[column_name]_DI for pattern data.
7. Select the type of data to export. You can select either Values for the selected column or Patterns for the selected column.
8. Under Save, choose Save on Client and click Browse to select a location and save the file locally on your computer. By default, Informatica Developer writes the file to a location set in the Data Integration Service properties of Informatica Administrator.
9. If you do not want to export field names as the first row, clear the Export field names as first row check box.
10. Click OK.

CHAPTER 5
Rules in Informatica Developer

This chapter includes the following topics:
- Rules in Informatica Developer Overview
- Creating a Rule in Informatica Developer
- Applying a Rule in Informatica Developer

Rules in Informatica Developer Overview

A rule is business logic that defines conditions applied to source data when you run a profile. You can create reusable rules from mapplets in the Developer tool. You can reuse these rules in Analyst tool profiles to change or validate source data. Create a mapplet and validate it as a rule. The rule appears as a reusable rule in the Analyst tool. You can apply the rule to a column profile in the Developer tool or in the Analyst tool. A rule must meet the following requirements:
- It must contain an Input transformation and an Output transformation.
You cannot use data sources in a rule.
- It can contain Expression transformations, Lookup transformations, and passive data quality transformations. It cannot contain any other type of transformation. For example, a rule cannot contain a Match transformation, because the Match transformation is an active transformation.
- It does not specify cardinality between input groups.

Creating a Rule in Informatica Developer

To create a rule in the Developer tool, you validate a mapplet as a rule. First, create a mapplet in the Developer tool.
1. Right-click the mapplet editor.
2. Select Validate As > Rule.

Applying a Rule in Informatica Developer

You can add a rule to a saved column profile. You cannot add a rule to a profile configured for join analysis.
1. Browse the Object Explorer view and find the profile you need.
2. Right-click the profile and select Open. The profile opens in the editor.
3. Click the Definition tab, and select Rules.
4. Click Add. The Apply Rule dialog box opens.
5. Click Browse to find the rule you want to apply. Select a rule from a repository project, and click OK.
6. Click the Value column under Input Values to select an input port for the rule.
7. Optionally, click the Value column under Output Values to edit the name of the rule output port.
The rule appears in the Definition tab.

CHAPTER 6
Scorecards in Informatica Developer

This chapter includes the following topics:
- Scorecards in Informatica Developer Overview
- Creating a Scorecard

Scorecards in Informatica Developer Overview

A scorecard is a graphical representation of the quality measurements in a profile. You can view scorecards in the Developer tool. After you create a scorecard in the Developer tool, you can connect to the Analyst tool to open the scorecard. You can run and edit the scorecard in the Analyst tool. You can run the scorecard on current data in the data object or on data stored in the staging database.
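To illustrate how scorecard thresholds classify a metric score, the following is a hedged Python sketch (the threshold boundaries, counts, and function names are invented for the example and are not product defaults). It computes a score as the percentage of rows with valid values and assigns the score to a good, acceptable, or unacceptable range:

```python
def score(valid_count, total_count):
    """Score is the percentage of rows that contain valid values."""
    return 100.0 * valid_count / total_count

def classify(score_pct, good=90.0, acceptable=70.0):
    """Assign a score to a threshold range. Boundaries are illustrative."""
    if score_pct >= good:
        return "Good"
    if score_pct >= acceptable:
        return "Acceptable"
    return "Unacceptable"

s = score(valid_count=834, total_count=1000)
print(s, classify(s))
# 83.4 Acceptable
```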
Creating a Scorecard

Create a scorecard and add columns from a profile to the scorecard. You must run a profile before you add columns to the scorecard.
1. In the Object Explorer view, select the project or folder where you want to create the scorecard.
2. Click File > New > Scorecard. The New Scorecard dialog box appears.
3. Click Add. The Select Profile dialog box appears. Select the profile that contains the columns you want to add.
4. Click OK, then click Next.
5. Select the columns that you want to add to the scorecard. By default, the scorecard wizard selects the columns and rules defined in the profile. You cannot add columns that are not included in the profile.
6. Click Finish. The Developer tool creates the scorecard.
7. Optionally, click Open with Informatica Analyst to connect to the Analyst tool and open the scorecard in the Analyst tool.

CHAPTER 7
Mapplet and Mapping Profiling

This chapter includes the following topics:
- Mapplet and Mapping Profiling Overview
- Running a Profile on a Mapplet or Mapping Object
- Comparing Profiles for Mapping or Mapplet Objects
- Generating a Mapping from a Profile

Mapplet and Mapping Profiling Overview

You can define a column profile for an object in a mapplet or mapping. Run a profile on a mapplet or a mapping object when you want to verify the design of the mapping or mapplet without saving the profile results. You can also generate a mapping from a profile.

Running a Profile on a Mapplet or Mapping Object

When you run a profile on a mapplet or mapping object, the profile runs on all data columns and enables drilldown operations on the data that is staged for the data object. You can run a profile on a mapplet or mapping object with multiple output ports. The profile traces the source data through the mapping to the output ports of the object you selected. The profile analyzes the data that would appear on those ports if you ran the mapping.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Right-click a data object or transformation and select Profile Now. If the transformation has multiple output groups, the Select Output Group dialog box appears. If the transformation has a single output group, the profile results appear on the Results tab of the profile.
4. If the transformation has multiple output groups, select the output groups as necessary.
5. Click OK.
The profile results appear in the Results tab of the profile.

Comparing Profiles for Mapping or Mapplet Objects

You can create a profile that analyzes two objects in a mapplet or mapping and compares the results of the column profiles for those objects. Like profiles of single mapping or mapplet objects, profile comparisons run on all data columns and enable drilldown operations on the data that is staged for the data objects.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Press the CTRL key and click two objects in the editor.
4. Right-click one of the objects and select Compare Profiles.
5. Optionally, configure the profile comparison to match columns from one object to the other object.
6. Optionally, match columns by clicking a column in one object and dragging it onto a column in the other object.
7. Optionally, choose whether the profile analyzes all columns or matched columns only.
8. Click OK.

Generating a Mapping from a Profile

You can create a mapping object from a profile. Use the mapping object you create to develop a valid mapping. The mapping you create has a data source based on the profiled object and can contain transformations based on profile rule logic. After you create the mapping, add objects to complete it.
1. In the Object Explorer view, find the profile on which to create the mapping.
2. Right-click the profile name and select Generate Mapping. The Generate Mapping dialog box appears.
3. Enter a mapping name. Optionally, enter a description for the mapping.
4.
Confirm the folder location for the mapping. By default, the Developer tool creates the mapping in the Mappings folder in the same project as the profile. Click Browse to select a different location for the mapping.
5. Confirm the profile definition that the Developer tool uses to create the mapping. To use another profile, click Select Profile.
6. Click Finish.
The mapping appears in the Object Explorer. Add objects to the mapping to complete it.

CHAPTER 8
Reference Data

This chapter includes the following topics:
- Reference Tables Overview
- Reference Table Data Properties
- Creating a Reference Table Object
- Creating a Reference Table from a Flat File
- Creating a Reference Table from a Relational Source
- Copying a Reference Table in the Model Repository

Reference Tables Overview

Informatica provides reference tables that you can import to the Model repository. You can also create reference tables and connect to database tables that contain reference data. Use the Developer tool to create and update reference tables and to add reference data objects to transformations.

Reference Table Data Properties

You can view properties for reference table data and metadata in the Developer tool. The Developer tool displays the properties when you open the reference table from the Model repository. A reference table displays general properties and column properties. You can view reference table properties in the Developer tool. You can view and edit reference table properties in the Analyst tool. The general properties of a reference table are as follows:
- Name: Name of the reference table.
- Description: Optional description of the reference table.
The column properties of a reference table are as follows:
- Valid: Identifies the column that contains the valid reference data.
- Name: Name of each column.
- Data Type: Data type of the data in each column.
- Precision: Precision of each column.
- Scale: Scale of each column.
- Description: Description of the contents of the column. You can optionally add a description when you create the reference table.
- Include a column for row-level descriptions: Indicates that the reference table contains a column for descriptions of column data.
- Default value: Default value for the fields in the column. You can optionally add a default value when you create the reference table.
- Connection Name: Name of the connection to the database that contains the reference table data values.

Creating a Reference Table Object

Choose this option when you want to create an empty reference table and add values by hand.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table as Empty.
3. Enter a name for the table.
4. Select a project to store the table metadata. At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project you need. Click Next.
5. Add two or more columns to the table. Click the New option to create a column. Set the following properties for each column (default values in parentheses):
- Name (column)
- Data Type (string)
- Precision (10)
- Scale (0)
- Description (empty; optional property)
6. Select the column that contains the valid values. You can change the order of the columns that you create.
7. Optionally, edit the following properties (default values in parentheses):
- Include a column for row-level descriptions (cleared)
- Audit note (empty)
- Default value (empty)
- Maximum rows to preview (500)
Click Finish.
The reference table opens in the Developer tool workspace.

Creating a Reference Table from a Flat File

You can create a reference table from data stored in a flat file.
1. Select File > New > Reference Table from the Developer tool menu.
2.
In the new table wizard, select Reference Table from a Flat File.
3. Browse to the file you want to use as the data source for the table.
4. Enter a name for the table.
5. Select a project to store the table metadata. At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project you need. Click Next.
6. Set UTF-8 as the code page.
7. Specify the delimiter that the flat file uses.
8. If the flat file contains column names, select the option to import column names from the first line of the file.
9. Optionally, edit the following properties (default values in parentheses):
- Text qualifier (no quotation marks)
- Start import at line (line 1)
- Row Delimiter (\012 LF (\n))
- Treat consecutive delimiters as one (cleared)
- Escape character (empty)
- Retain escape character in data (cleared)
- Maximum rows to preview (500)
Click Next.
10. Select the column that contains the valid values. You can change the order of the columns.
11. Optionally, edit the following properties (default values in parentheses):
- Include a column for row-level descriptions (cleared)
- Audit note (empty)
- Default value (empty)
- Maximum rows to preview (500)
Click Finish.
The reference table opens in the Developer tool workspace.

Creating a Reference Table from a Relational Source

You can use a database source to create a managed or unmanaged reference table. To create a managed reference table, connect to the staging database that the Model repository uses. To create an unmanaged reference table, connect to another database.
Note: You can configure a database connection in the Connection Explorer. If the Developer tool does not show the Connection Explorer, select Window > Show View > Connection Explorer from the Developer tool menu.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Relational Source. Click Next.
3. Select a database connection.
The Developer tool uses this connection to identify a set of resources for the new reference table. At the Connection field, click Browse. The Choose Connection dialog box opens and displays the available database connections. Click More in the Choose Connection dialog box to browse other connections in the Informatica domain.
4. If the database connection you select does not specify the staging database, select Unmanaged table.
5. Select a database resource. At the Resource field, click Browse. The Choose Connection dialog box opens and displays the resources on the database connection. Explore the database and select the resource you need.
6. Enter a name for the table.
7. Select a project to store the reference table object. At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project. Click Next.
8. Select the column that contains the valid values. You can change the order of the columns.
9. Optionally, edit the following properties:
   Property / Default Value:
   - Include a column for row-level descriptions: Cleared
   - Audit note: Empty
   - Default value: Empty
   - Maximum rows to preview: 500
   Click Finish.

Copying a Reference Table in the Model Repository

You can copy a reference table between projects and folders in the Model repository. The reference table and the copy you create are not linked in the Model repository or in the database. When you create a copy, you create a new database table.
1. Browse the Model repository, and find the reference table you want to copy.
2. Right-click the reference table, and select Copy from the context menu.
3. In the Model repository, find the project or folder where you want to store the copy of the table.
4. Click Paste.
Part III: Data Quality Features in Informatica Analyst

This part contains the following chapters:
- Column Profiles in Informatica Analyst
- Column Profile Results in Informatica Analyst
- Rules in Informatica Analyst
- Scorecards in Informatica Analyst
- Exception Record Management
- Reference Tables

CHAPTER 9
Column Profiles in Informatica Analyst

This chapter includes the following topics:
- Column Profiles in Informatica Analyst Overview
- Column Profiling Process
- Profile Options
- Creating a Column Profile in the Analyst Tool
- Editing a Column Profile
- Running a Profile
- Creating a Filter
- Managing Filters
- Synchronizing a Flat File Data Object
- Synchronizing a Relational Data Object

Column Profiles in Informatica Analyst Overview

When you create a profile, you select the columns in the data object for which you want to profile data. You can configure sampling and drilldown options for faster profiling. After you run the profile, you can examine the profiling statistics to understand the data.

You can profile wide tables and flat files that have a large number of columns, including tables with more than 30 columns and flat files with more than 100 columns. When you create or run a profile, you can select all the columns or select each column you want to include for profiling. The Analyst tool displays the first 30 columns in the data preview. You can select all columns for drilldown and view value frequencies for these columns. You can use rules that have more than 50 output fields and include the rule columns for profiling when you run the profile again.

Column Profiling Process

As part of the column profiling process, you can create a quick profile or a custom profile for a data object. Use a quick profile to include all columns for a data object and use the default profile options.
Use a custom profile to select the columns for a data object and to configure the profile results, sampling, and drilldown options.

The following steps describe the column profiling process:
1. Select the data object you want to profile.
2. Determine whether you want to create a quick profile or a custom profile.
3. Choose where you want to save the profile.
4. Select the columns you want to profile.
5. Select the profile results option.
6. Choose the sampling options.
7. Choose the drilldown options.
8. Define a filter to determine the rows that the profile reads at run time.
9. Run the profile.

Note: Consider the following rules and guidelines for column names and for profiling multilingual and Unicode data:
- You cannot add a column to a profile if the column name and the profile name match. You cannot add the same column twice to a profile even if you change the column name.
- You can profile multilingual data from different sources and view profile results based on the locale settings in the browser. The Analyst tool changes the Datetime, Numeric, and Decimal datatypes based on the browser locale.
- You can sort on multilingual data. The Analyst tool displays the sort order based on the browser locale.
- To profile Unicode data in a DB2 database, set the DB2CODEPAGE database environment variable in the database and restart the Data Integration Service.

Profile Options

Profile options include the profile results option, data sampling options, and data drilldown options. You can configure these options when you create a column profile for a data object.

You use the New Profile wizard to configure the profile options. You can create a profile with the default options for columns, sampling, and drilldown. When you create a profile for multiple data sources, the Analyst tool uses the default column profiling options.
Profile Results Option

You can choose to discard previous profile results or to display results for previous profile runs. The following table describes the profile results option for a profile:

Option / Description:
- Show results only for columns, rules selected in current run: Discards the profile results for previously profiled columns and displays results for the columns and rules selected for the latest profile run. Do not select this option if you want the Analyst tool to display profile results for previously profiled columns.

Sampling Options

Sampling options determine the number of rows that the Analyst tool chooses to profile. You can configure sampling options when you go through the wizard or when you run a profile. The following table describes the sampling options for a profile:

Option / Description:
- All Rows: Chooses all rows in the data object.
- First <number> Rows: The number of rows that you want to run the profile against. The Analyst tool chooses the rows from the first rows in the source.
- Random Sample <number> Rows: The number of rows for a random sample to run the profile against. Random sampling forces the Analyst tool to perform drilldown on staged data, which can affect drilldown performance.
- Random sample: Random sample size based on the number of rows in the data object. Random sampling forces the Analyst tool to perform drilldown on staged data, which can affect drilldown performance.

Drilldown Options

You can configure drilldown options when you go through the wizard or when you run a profile. The following table describes the drilldown options for a profile:

Option / Description:
- Enable Row Drilldown: Drills down to row data in the profile results. By default, this option is selected.
- Select Columns: Identifies columns for drilldown that you did not select for profiling.
- Drilldown on live or staged data: Drills down on live data to read current data in the data source.
Drill down on staged data to read profile data that is staged in the profiling warehouse.

Creating a Column Profile in the Analyst Tool

Select a data object and create a custom profile or a default profile. When you create a custom profile, you can configure the columns, the rows to sample, and the drilldown options. The Analyst tool creates the profile in the same project and folder as the data object.
1. In the Navigator, select the project that contains the data object that you want to create a custom profile for.
2. In the Contents panel, right-click the data object and select New > Profile. The New Profile wizard appears. The Column profiling option is selected by default.
3. Click Next.
4. In the Sources panel, select a data object.
5. Choose to create a default profile or a custom profile.
   - To create a default profile, click Save or Save & Run.
   - To create a custom profile, click Next.
6. Enter a name and an optional description for the profile.
7. In the Folders panel, select the project or folder where you want to create the profile. The Analyst tool displays the project that you selected and shared projects that contain folders where you can create the profile. The profile objects in the folder appear in the Profiles panel.
8. Click Next.
9. In the Columns panel, select the columns that you want to profile. The columns include any rules you applied to the profile. The Analyst tool lists the name, datatype, precision, and scale for each column. Optionally, select Name to select all columns.
10. Accept the default option in the Profile Results Option panel. The first time you run the profile, the Analyst tool displays profile results for all columns selected for profiling.
11. In the Sampling Options panel, configure the sampling options.
12. In the Drilldown Options panel, configure the drilldown options. Optionally, click Select Columns to select columns to drill down on.
In the Drilldown columns window, select the columns for drilldown and click OK.
13. Click Next.
14. Optionally, define a filter for the profile.
15. Click Next to verify the row drilldown settings, including the preview columns for drilldown.
16. Click Save to create the profile, or click Save & Run to create the profile and then run it.

Editing a Column Profile

You can make changes to a column profile after running it.
1. In the Navigator, select the project or folder that contains the profile that you want to edit.
2. Click the profile to open it. The profile opens in a tab.
3. Click Actions > Edit. A shortcut menu appears.
4. Based on the changes you want to make, choose one of the following menu options:
   - General. Change basic properties such as name, description, and profile type.
   - Data Source. Choose another matching data source.
   - Column Profiling. Select the columns you want to run the profile on and configure the sampling and drilldown options.
   - Column Profiling Filter. Create, edit, and delete filters.
   - Column Profiling Rules. Create rules or change current ones.
   - Data Domain Discovery. Set up data domain discovery options.
5. Click Save to save the changes, or click Save & Run to save the changes and then run the profile.

Running a Profile

Run a profile to analyze a data source for content and structure and select columns and rules for drilldown. You can drill down on live or staged data for columns and rules. You can run a profile on a column or rule without profiling all the source columns again.
1. In the Navigator, select the project or folder that contains the profile you want to run.
2. Click the profile to open it. The profile appears in a tab. Verify the profile options before you run the profile.
3. Click Actions > Run Profile. The Analyst tool displays the profile results.
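The sampling options described earlier determine which rows a profile run reads. The guide does not document the product's sampling algorithm, but the idea of drawing a fixed-size random sample in a single pass over a source of unknown length can be sketched with reservoir sampling. This is an illustration only, not the Analyst tool's implementation.

```python
import random

def reservoir_sample(rows, k, seed=None):
    """Return k rows chosen uniformly at random from an iterable,
    reading the input only once (the row count need not be known
    up front). If the source has fewer than k rows, all are kept."""
    rng = random.Random(seed)
    sample = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)
        else:
            # Keep the new row with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = row
    return sample
```

A "First <number> Rows" sample, by contrast, would simply stop reading after the first k rows.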
Creating a Filter

You can create a filter to define a subset of the original data source that meets the filter criteria. You can then run a profile on this sample data.
1. Open a profile.
2. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box. The current filters appear in the Filters panel.
3. Click New.
4. Enter a filter name and an optional description.
5. Select a simple, advanced, or SQL filter type.
   - Simple. Use conditional operators, such as <, >, =, BETWEEN, and ISNULL, for each column that you want to filter.
   - Advanced. Use function categories, such as Character, Consolidation, Conversion, Financial, Numerical, and Data Cleansing. Click the function name on the Functions panel to view its return type, description, and parameters. To include the function in the filter, click the right arrow (>) button, and specify the parameters in the Function dialog box.
   Note: For a simple or an advanced filter on a date column, provide the condition in the YYYY/MM/DD HH:MM:SS format.
   - SQL. Creates SQL queries. You can create an SQL filter for relational data sources. Enter the WHERE clause expression to generate the SQL filter. For example, to filter company records in the European region from a Company table with a Region column, enter Region = 'Europe' in the editor.
6. Click Validate to verify the SQL expression.

Managing Filters

You can create, edit, and delete filters.
1. In the Navigator, select the project or folder that contains the profile you want to filter.
2. Open the profile.
3. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box. The current filters appear in the Filters panel.
4. Choose to create, edit, or delete a filter.
   - Click New to create a filter.
   - Select a filter, and click Edit to change the filter settings.
   - Select a filter, and click Delete to remove the filter.
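The semantics of the simple-filter operators listed above (<, >, =, BETWEEN, ISNULL) can be illustrated with a small row predicate. This sketch models rows as dictionaries and treats a missing or None field as null; it shows what each operator selects, not how the Analyst tool evaluates filters.

```python
def simple_filter(rows, column, op, value=None, value2=None):
    """Apply one simple-filter condition to a list of dict rows and
    return the rows that satisfy it. BETWEEN uses value and value2
    as an inclusive range; ISNULL ignores both values."""
    tests = {
        '=':       lambda v: v == value,
        '<':       lambda v: v is not None and v < value,
        '>':       lambda v: v is not None and v > value,
        'BETWEEN': lambda v: v is not None and value <= v <= value2,
        'ISNULL':  lambda v: v is None,
    }
    test = tests[op]
    return [row for row in rows if test(row.get(column))]
```

For example, `simple_filter(rows, "Region", "=", "Europe")` corresponds to the SQL filter Region = 'Europe' from the example above.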
Synchronizing a Flat File Data Object

You can synchronize the changes to an external flat file data source with its data object in Informatica Analyst. Use the Synchronize Flat File wizard to synchronize the data objects.
1. In the Contents panel, select a flat file data object.
2. Click Actions > Synchronize. The Synchronize Flat File dialog box appears in a new tab.
3. Verify the flat file path in the Browse and Upload field.
4. Click Next. A synchronization status message appears.
5. When you see a Synchronization complete message, click OK. The message displays a summary of the metadata changes made to the data object. To view the details of the metadata changes, use the Properties view.

Synchronizing a Relational Data Object

You can synchronize the changes to an external relational data source with its data object in Informatica Analyst. External data source changes include adding, changing, and removing columns and changes to rules.
1. In the Contents panel, select a relational data object.
2. Click Actions > Synchronize. A message prompts you to confirm the action.
3. To complete the synchronization process, click OK. Click Cancel to cancel the process. If you click OK, a synchronization status message appears.
4. When you see a Synchronization complete message, click OK. The message displays a summary of the metadata changes made to the data object. To view the details of the metadata changes, use the Properties view.

CHAPTER 10
Column Profile Results in Informatica Analyst

This chapter includes the following topics:
- Column Profile Results in Informatica Analyst Overview
- Profile Summary
- Column Values
- Column Patterns
- Column Statistics
- Column Profile Drilldown
- Column Profile Export Files in Informatica Analyst

Column Profile Results in Informatica Analyst Overview

View profile results to understand the structure of data and analyze its quality.
You can view the profile results after you run a profile. You can view a summary of the columns and rules in the profile and the values, patterns, and statistics for columns and rules.

After you run a profile, you can view the profile results in the Column Profiling, Properties, and Data Preview views. You can export value frequencies, pattern frequencies, or drilldown data to a CSV file. You can export the complete profile summary to a Microsoft Excel file so that you can view all of the data in one file for further analysis.

In the Column Profiling view, you can view the summary information for columns for a profile run. You can view values, patterns, and statistics for each column in the Values, Patterns, and Statistics views. The Analyst tool displays rules as columns in profile results, so the profile results for a rule appear as a profiled column. The profile results that appear depend on the profile configuration and sampling options.

The following profiling results appear in the Column Profiling view:
- The summary information for the profile run, including the number of unique and null values, the inferred datatype, and the last run date and time.
- Values for columns and the frequency in which each value appears for the column. The frequency appears as a number, a percentage, and a chart.
- Value patterns for the profiled columns and the frequency in which each pattern appears. The frequency appears as a number and a percentage.
- Statistics about the column values, such as average, length, and top and bottom values.

Note: You can select a value or pattern and view profiled rows that match the value or pattern on the Details panel.

In the Properties view, you can view profile properties on the Properties panel. You can view properties for columns and rules on the Columns and Rules panel. In the Data Preview view, you can preview the profile data. The Analyst tool includes all columns in the profile and displays the first 100 rows of data.
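The per-column summary statistics (unique and null counts, with percentages) amount to simple aggregations over the profiled rows. The following sketch shows one plausible computation, with None standing in for a null value; it is an illustration, not the Analyst tool's implementation.

```python
def column_summary(values):
    """Compute profile-summary style statistics for one column:
    the count and percentage of unique (distinct non-null) values
    and of null values. Percentages are rounded to two decimals."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    unique = len({v for v in values if v is not None})
    pct = lambda n: round(100.0 * n / total, 2) if total else 0.0
    return {
        "Unique Values": unique,
        "% Unique": pct(unique),
        "Null": nulls,
        "% Null": pct(nulls),
    }
```

Whether the product counts distinct values against all rows or only non-null rows is an assumption of this sketch.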
Profile Summary

The summary for a profile run includes the number of unique and null values expressed as a number and a percentage, the inferred datatypes, and the last run date and time. You can click each profile summary property to sort on the values of the property.

The following table describes the profile summary properties:

Property / Description:
- Name: Name of the column in the profile.
- Unique Values: Number of unique values for the column.
- % Unique: Percentage of unique values for the column.
- Null: Number of null values for the column.
- % Null: Percentage of null values for the column.
- Datatype: Datatype derived from the values for the column. The Analyst tool can derive the following datatypes from the values in columns: String, Varchar, Decimal, Integer, and "-" for nulls. Note: The Analyst tool cannot derive the datatype from the values of a numeric column that has a precision greater than 38, or of a string column that has a precision greater than 255. If you create a column profile on a date column with a year value earlier than 1800, the inferred datatype may appear as a fixed-length string. Change the default value for the year-minimum parameter in InferDateTimeConfig.xml, as necessary.
- % Inferred: Percentage of values that match the datatype inferred by the Analyst tool.
- Documented Datatype: Datatype declared for the column in the profiled object.
- Maximum Value: Maximum value in the column.
- Minimum Value: Minimum value in the column.
- Last Profile Run: Date and time you last ran the profile.
- Drilldown: If selected, drills down on live data for the column.

Column Values

The column values include values for columns and the frequency in which each value appears for the column.
The following table describes the properties for the column values:

Property / Description:
- Value: List of all values for the column in the profile. Note: The Analyst tool excludes the CLOB, BLOB, Raw, and Binary datatypes from column values in a profile.
- Frequency: Number of times a value appears for a column, expressed as a number, a percentage, and a chart.
- Percent: Percentage that a value appears for a column.
- Chart: Chart for the percentage.
- Drill down: Drills down to specific source rows based on a column value.

Note: To sort the Value and Frequency columns, select the columns. When you sort the results of the Frequency column, the Analyst tool sorts the results based on the datatype of the column.

Column Patterns

The column patterns include the value patterns for the columns and the frequency in which each pattern appears. By default, the profiling warehouse stores the 16,000 highest-frequency unique values, including NULL values, for profile results. If there is at least one NULL value in the profile results, the Analyst tool can display NULL values as patterns.

Note: The Analyst tool cannot derive the pattern for a numeric column that has a precision greater than 38 or for a string column that has a precision greater than 255.

The following table describes the properties for the column patterns:

Property / Description:
- Pattern: Pattern for the column in the profile.
- Frequency: Number of times a pattern appears for a column, expressed as a number.
- Percent: Percentage that a pattern appears for a column.
- Chart: Chart for the percentage.
- Drill down: Drills down to specific source rows based on a column pattern.

The following table describes the pattern characters and what they represent:

Character / Description:
- 9: Represents any numeric character. Informatica Analyst displays up to three characters separately in the "9" format. The tool displays more than three characters as a count within parentheses.
For example, the format "9(8)" represents a numeric value with 8 digits.
- X: Represents any alphabetic character. Informatica Analyst displays up to three characters separately in the "X" format. The tool displays more than three characters as a count within parentheses. For example, the format "X(6)" may represent the value "Boston." Note: The pattern character X is not case sensitive and may represent uppercase or lowercase characters from the source data.
- p: Represents "(", the left parenthesis.
- q: Represents ")", the right parenthesis.
- b: Represents a blank space.

Column Statistics

The column statistics include statistics about the column values, such as average, length, and top and bottom values. The statistics that appear depend on the column type. The following table describes the types of column statistics for each column type:

Statistic (Column Type) / Description:
- Average (Integer): Average of the values for the column.
- Standard Deviation (Integer): The standard deviation, or variability between column values, for all values of the column.
- Maximum Length (Integer, String): Length of the longest value for the column.
- Minimum Length (Integer, String): Length of the shortest value for the column.
- Bottom (Integer, String): Lowest values for the column.
- Top (Integer, String): Highest values for the column.

Column Profile Drilldown

Drilldown options for a column profile enable you to drill down to specific rows in the data source based on a column value. You can read the current data in a data source for drilldown, or read profile data staged in the profiling warehouse. When you drill down to a specific row on staged profile data, the Analyst tool creates a drilldown filter for the matching column value. After you drill down, you can edit, recall, reset, and save the drilldown filter. You can select columns for drilldown even if you did not choose those columns for profiling.
After you perform a drilldown on a column value, you can export drilldown data for the selected values or patterns to a CSV file at a location you choose. Though Informatica Analyst displays the first 200 values for drilldown data, the tool exports all values to the CSV file.

Drilling Down on Row Data

After you run a profile, you can drill down to specific rows that match a column value or pattern.
1. Run a profile. The profile appears in a tab.
2. In the Summary view, select a column name to view the profile results for the column.
3. Select a column value on the Values tab or select a column pattern on the Patterns tab.
4. Click Actions > Drilldown to view the rows of data. The Drilldown panel displays the rows that contain the values or patterns. The column value or pattern appears at the top of the panel.
Note: You can choose to drill down on live data or staged data.

Applying Filters to Drilldown Data

You can filter the drilldown data iteratively so that you can analyze data irregularities on subsets of the profile results.
1. Drill down to row data in the profile results.
2. Select a column value on the Values tab.
3. Right-click and select Drilldown Filter > Edit to open the Drilldown Filter dialog box.
4. Add filter conditions, and click Run.
5. To manage current drilldown filters, you can save, recall, or reset filters.
   - To save a filter, select Drilldown Filter > Save.
   - To go back to the last saved drilldown filter results, select Drilldown Filter > Recall.
   - To reset the drilldown filter results, select Drilldown Filter > Reset.

Column Profile Export Files in Informatica Analyst

You can export column profile results to a CSV file or a Microsoft Excel file, based on whether you choose a part of the profile results or the complete results summary.
You can export value frequencies, pattern frequencies, or drilldown data for selected values and patterns to a CSV file. You can export the profiling results summary for all columns to a Microsoft Excel file. Use the Data Integration Service privilege Drilldown and Export Results to determine, by user or group, who can export profile results.

Profile Export Results in a CSV File

You can export value frequencies, pattern frequencies, or drilldown data to view the data in a file. The Analyst tool saves the information in a CSV file. When you export inferred column patterns, the Analyst tool exports a different format of the column pattern. For example, when you export the inferred column pattern X(5), the Analyst tool displays the following format of the column pattern in the CSV file: XXXXX.

Profile Export Results in Microsoft Excel

When you export the complete profile results summary, the Analyst tool saves the information to multiple worksheets in a Microsoft Excel file. The Analyst tool saves the file in the .xlsx format. The following table describes the information that appears on each worksheet in the export file:

Tab / Description:
- Column Profile: Summary information exported from the Column Profiling view after the profile runs. Examples are column names, rule names, number of unique values, number of null values, inferred datatypes, and the date and time of the last profile run.
- Values: Values for the columns and rules and the frequency in which the values appear for each column.
- Patterns: Value patterns for the columns and rules you ran the profile on and the frequency in which the patterns appear.
- Statistics: Statistics about each column and rule. Examples are average, length, top values, bottom values, and standard deviation.
- Properties: Properties view information, including profile name, type, sampling policy, and row count.
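The pattern notation described in the Column Patterns section, and the expanded form the CSV export uses, can both be sketched in a few lines. This is an illustration of the documented notation (9 for a digit, X for a letter, p/q for parentheses, b for a blank, runs over three collapsed to C(n)), not the product's inference engine; how characters outside these classes are handled is an assumption here.

```python
import re

def value_pattern(value):
    """Derive the compressed pattern for a value. Runs of more than
    three identical codes are written as C(n), e.g. "9(8)" for eight
    digits. Other characters pass through unchanged (an assumption)."""
    code_map = {'(': 'p', ')': 'q', ' ': 'b'}
    codes = ''.join(
        '9' if ch.isdigit() else 'X' if ch.isalpha() else code_map.get(ch, ch)
        for ch in value)
    parts = []
    for match in re.finditer(r'(.)\1*', codes):
        run, ch = match.group(0), match.group(1)
        parts.append(ch * len(run) if len(run) <= 3 else f"{ch}({len(run)})")
    return ''.join(parts)

def expand_pattern(pattern):
    """Expand the compressed notation into the per-character form
    written to the CSV export, e.g. "X(5)" becomes "XXXXX"."""
    return re.sub(r'(.)\((\d+)\)',
                  lambda m: m.group(1) * int(m.group(2)), pattern)
```

For example, `value_pattern("Boston")` yields "X(6)", and `expand_pattern("X(6)")` yields the exported form "XXXXXX".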
Exporting Profile Results from Informatica Analyst

You can export the results of a profile to a .csv or .xlsx file to view the data in a file.
1. In the Navigator, select the project or folder that contains the profile.
2. Click the profile to open it. The profile opens in a tab.
3. In the Column Profiling view, select the column that you want to export.
4. Click Actions > Export Data. The Export Data to a file window appears.
5. Enter the file name. Optionally, use the default file name.
6. Select the type of data to export:
   - All (Summary, Values, Patterns, Statistics, Properties)
   - Value frequencies for the selected column
   - Pattern frequencies for the selected column
   - Drilldown data for the selected values or patterns
7. Enter a file format. The format is Excel for the All option and CSV for the other options.
8. Select the code page of the file.
9. Click OK.

CHAPTER 11
Rules in Informatica Analyst

This chapter includes the following topics:
- Rules in Informatica Analyst Overview
- Predefined Rules
- Expression Rules

Rules in Informatica Analyst Overview

A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule to a profile to cleanse, change, or validate data.

You may want to use a rule in different circumstances. You can add a rule to cleanse one or more data columns. You can add a lookup rule that provides information that the source data does not provide. You can add a rule to validate a cleansing rule for a data quality or data integration project. You can add a rule before or after you run a profile.

When you add a rule to a profile, you can create a rule or apply an existing rule. You can create or apply the following rule types for a profile:
- Expression rules. Use expression functions and columns to define rule logic.
Create expression rules in the Analyst tool. An analyst can create an expression rule and promote it to a reusable rule that other analysts can use in multiple profiles.
- Predefined rules. Reusable rules that a developer creates in the Developer tool. Rules that a developer creates in the Developer tool as mapplets can appear in the Analyst tool as reusable rules.

After you add a rule to a profile, you can run the profile again for the rule column. The Analyst tool displays profile results for the rule column. You can modify the rule and run the profile again to view changes to the profile results.

The output of a rule can be one or more virtual columns. The virtual columns exist in the profile results, and the Analyst tool profiles them. For example, you use a predefined rule that splits a column that contains first and last names into FIRST_NAME and LAST_NAME virtual columns. The Analyst tool profiles the FIRST_NAME and LAST_NAME columns.

Note: If you delete a rule object that other object types reference, the Analyst tool displays a message that lists those object types. Determine the impact of deleting the rule before you delete it.

Predefined Rules

Predefined rules are rules created in the Developer tool or provided with the Developer tool and Analyst tool. Apply predefined rules to Analyst tool profiles to modify or validate source data.

Predefined rules use transformations to define rule logic. You can use predefined rules with multiple profiles. In the Model repository, a predefined rule is a mapplet with an input group, an output group, and transformations that define the rule logic.

Predefined Rules Process

Use the New Rule Wizard to apply a predefined rule to a profile. You can perform the following steps to apply a predefined rule:
1. Open a profile.
2. Select a predefined rule.
3. Review the rule parameters.
4. Select the input column.
5. Configure the profiling options.
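The name-splitting example above shows the essential shape of a rule: one input column in, one or more virtual columns out. A real predefined rule is a mapplet built from transformations; the following sketch only illustrates that input/output shape, and the naive split-on-first-space logic is an assumption.

```python
def split_name_rule(full_name):
    """Illustrative stand-in for a name-splitting rule: the input
    column value produces the virtual columns FIRST_NAME and
    LAST_NAME, which a profile can then analyze like any column."""
    first, _, last = full_name.strip().partition(' ')
    return {"FIRST_NAME": first, "LAST_NAME": last.strip()}
```

Applying the rule to every row of the input column yields the two virtual columns that appear in the profile results.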
Applying a Predefined Rule

Use the New Rule Wizard to apply a predefined rule to a profile. When you apply a predefined rule, you select the rule and configure the input and output columns for the rule. Apply a predefined rule to use a rule promoted as a reusable rule or a rule created by a developer.
1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
2. Click the profile to open it. The profile appears in a tab.
3. Click Actions > Add Rule. The New Rule window appears.
4. Select the option to Apply a Rule.
5. Click Next.
6. In the Rules panel, select the rule that you want to apply. The name, datatype, description, and precision columns appear for the Inputs and Outputs columns in the Rules Parameters panel.
7. Click Next.
8. In the Inputs section, select an input column. The input column is a column name in the profile.
9. Optionally, in the Outputs section, configure the labels of the output columns.
10. Click Next.
11. In the Columns panel, select the columns you want to profile. The columns include any rules you applied to the profile. Optionally, select Name to include all columns. The Analyst tool lists the name, datatype, precision, and scale for each column.
12. In the Sampling Options panel, configure the sampling options.
13. In the Drilldown Options panel, configure the drilldown options.
14. Click Save to apply the rule, or click Save & Run to apply the rule and then run the profile.

Expression Rules

Expression rules use expression functions and columns to define rule logic. Create expression rules and add them to a profile in the Analyst tool. Use expression rules to change or validate values for columns in a profile. You can create one or more expression rules to use in a profile. Expression functions are SQL-like functions used to transform source data.
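As a concrete illustration of what an expression rule computes, consider a hypothetical rule that validates a US ZIP code: it returns 1 for a valid value and 0 otherwise, and the profile then analyzes that 1/0 virtual column. The validity test below (5 digits, optionally ZIP+4) is an assumption of this sketch; a real rule would be written with the tool's expression functions rather than Python.

```python
import re

def zip_is_valid(value):
    """Return 1 if the value looks like a US ZIP code (5 digits,
    optionally followed by -NNNN), and 0 otherwise. A null value
    (None) is treated as invalid."""
    return 1 if re.fullmatch(r'\d{5}(-\d{4})?', value or '') else 0
```

Profiling the rule output then shows the frequency of 1 versus 0, which is the percentage of valid ZIP codes in the column.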
You can create expression rule logic with the following types of functions:
- Character
- Conversion
- Data Cleansing
- Date
- Encoding
- Financial
- Numeric
- Scientific
- Special
- Test

Expression Rules Process

Use the New Rule Wizard to create an expression rule and add it to a profile. The New Rule Wizard includes an expression editor. Use the expression editor to add expression functions, configure columns as input to the functions, validate the expression, and configure the return type, precision, and scale.

The output of an expression rule is a virtual column that uses the name of the rule as the column name. The Analyst tool profiles the virtual column. For example, you use an expression rule to validate a ZIP code. The rule returns 1 if the ZIP code is valid and 0 if the ZIP code is not valid. The Analyst tool profiles the 1 and 0 output values of the rule.

You perform the following steps to create an expression rule:
1. Open a profile.
2. Configure the rule logic with expression functions and columns as parameters.
3. Configure the profiling options.

Creating an Expression Rule

Use the New Rule Wizard to create an expression rule and add it to a profile. Create an expression rule to modify or validate values for columns in a profile.
1. In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
2. In the Contents panel, click the profile to open it. The profile appears in a tab.
3. Click Actions > Edit > Column Profiling Rules. The Edit Profile dialog box appears.
4. Click New.
5. Select Create a rule.
6. Click Next.
7. Enter a name and an optional description for the rule.
8. Optionally, choose to promote the rule to a reusable rule and configure the project and folder location. If you promote a rule to a reusable rule, you or other users can use the rule in another profile as a predefined rule.
9. In the Functions tab, select a function and click the right arrow to enter the parameters for the function.
10. In the Columns tab, select an input column and click the right arrow to add the column to the expression in the Expression editor. You can also add logical operators to the expression.
11. Click Validate. You can proceed to the next step if the expression is valid.
12. Optionally, click Edit to configure the return type, precision, and scale.
13. Click Next.
14. In the Columns panel, select the columns that you want to profile. The columns include any rules that you applied to the profile. Optionally, select Name to select all columns. The Analyst tool lists the name, datatype, precision, and scale for each column.
15. In the Sampling Options panel, configure the sampling options.
16. In the Drilldown Options panel, configure the drilldown options.
17. Click Save to create the rule, or click Save & Run to create the rule and then run the profile.

CHAPTER 12
Scorecards in Informatica Analyst

This chapter includes the following topics:
- Scorecards in Informatica Analyst Overview
- Informatica Analyst Scorecard Process
- Metrics
- Scorecard Notifications
- Scorecard Integration with External Applications

Scorecards in Informatica Analyst Overview

A scorecard is the graphical representation of valid values for a column in a profile. You can create scorecards and drill down on live data or staged data. Use scorecards to measure data quality progress.

For example, you can create a scorecard to measure data quality before you apply data quality rules. After you apply data quality rules, you can create another scorecard to compare the effect of the rules on data quality.

Scorecards display the value frequency for columns as scores. The scores reflect the percentage of valid values in the columns. After you run a profile, you can add columns from the profile as metrics to a scorecard.
You can create metric groups to group related metrics into a single entity. You can define thresholds that specify the acceptable range of bad data for columns in a record, and you can assign a weight to each metric. When you run a scorecard, the Analyst tool generates weighted average values for each metric group. To identify valid data records and records that are not valid, you can drill down on each column. You can use trend charts in the Analyst tool to track how scores change over a period of time.

Informatica Analyst Scorecard Process

You can run and edit a scorecard in the Analyst tool. You can create and view a scorecard in the Developer tool. You can run the scorecard on current data in the data object or on data stored in the staging database. When you open a scorecard from the Contents view of the Analyst tool, the scorecard opens in another tab. After you run the scorecard, you can view the scores in the Scorecard view. You can select a data object and navigate to it from a score within a scorecard. The Analyst tool opens the data object in another tab.

You can perform the following tasks when you work with scorecards:
1. Create a scorecard in the Developer tool and add columns from a profile.
2. Optionally, connect to the Analyst tool and open the scorecard in the Analyst tool.
3. After you run a profile, add profile columns as metrics to the scorecard.
4. Run the scorecard to generate the scores for columns.
5. View the scorecard to see the scores for each column in a record.
6. Drill down on the columns for a score.
7. Edit a scorecard.
8. Set thresholds for each metric in a scorecard.
9. Create a group to add or move related metrics in the scorecard.
10. Edit or delete a group, as required.
11. View trend charts for each score to monitor how the score changes over time.

Metrics

A metric is a column of a data source or an output of a rule that is part of a scorecard.
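The weighted average values that the Analyst tool generates for each metric group follow a standard weighted mean. A minimal sketch, with hypothetical scores and weights:

```python
def weighted_average(scores, weights):
    """Weighted mean of metric scores: sum(Mi x Wi) / sum(Wi)."""
    if len(scores) != len(weights) or not weights:
        raise ValueError("one weight per metric score is required")
    return sum(m * w for m, w in zip(scores, weights)) / sum(weights)

# Two metrics scored 90% and 50%, weighted 3 and 1 (hypothetical values):
group_score = weighted_average([90.0, 50.0], [3, 1])  # 80.0
```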
When you create a scorecard, you can assign a weight to each metric. Create a metric group to categorize related metrics in a scorecard into a set.

Metric Weights

When you create a scorecard, you can assign a weight to each metric. The default weight is 1. When you run a scorecard, the Analyst tool calculates the weighted average for each metric group based on the metric score and the weight that you assign to each metric.

For example, you assign a weight of W1 to metric M1, and you assign a weight of W2 to metric M2. The Analyst tool uses the following formula to calculate the weighted average:

(M1 x W1 + M2 x W2) / (W1 + W2)

Adding Columns to a Scorecard

After you run a profile, you can add profile columns to a scorecard. Use the Add to Scorecard Wizard to add columns from a profile to a scorecard and to configure the valid values for the columns. If you add a profile column to a scorecard from a source profile that has a filter or a sampling option other than All Rows, the profile results may not reflect the scorecard results.
1. In the Navigator, select the project or folder that contains the profile.
2. Click the profile to open it. The profile appears in a tab.
3. Click Actions > Run Profile to run the profile.
4. Click Actions > Add to Scorecard. The Add to Scorecard Wizard appears.
Note: Use the following rules and guidelines before you add columns to a scorecard:
- You cannot add a column to a scorecard if both the column name and the scorecard name match.
- You cannot add a column to a scorecard twice, even if you change the column name.
5. Select Existing Scorecard to add the columns to an existing scorecard. The New Scorecard option is selected by default.
6. Click Next.
7. Select the scorecard that you want to add the columns to, and click Next.
8. Select the columns and rules that you want to add to the scorecard as metrics. Optionally, click the check box in the left column header to select all columns.
Optionally, select Column Name to sort the column names.
9. Select each metric in the Metrics panel and configure the valid values from the list of all values in the Score using: Values panel. You can select multiple values in the Available Values panel and click the right arrow button to move them to the Selected Values panel.
10. Select each metric in the Metrics panel and configure the metric thresholds in the Metric Thresholds panel. You can set thresholds for Good, Acceptable, and Unacceptable scores.
11. Click Next.
12. In the Score using: Values panel, set the metric weight for each metric. You can double-click the default metric weight of 1 to change the value.
13. In the Metric Group Thresholds panel, set the metric group thresholds.
14. Click Save to save the scorecard, or click Save & Run to save and run the scorecard.

Running a Scorecard

Run a scorecard to generate scores for columns.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard appears in a tab.
3. Click Actions > Run Scorecard.
4. Select a score from the Metrics panel, and select the columns to drill down on from the Columns panel.
5. In the Drilldown option, choose whether to drill down on live data or staged data. For optimal performance, drill down on live data.
6. Click Run.

Viewing a Scorecard

Run a scorecard to see the scores for each metric. A scorecard displays each score as a percentage and as a bar. You can view data that is valid or not valid. You can also view scorecard information such as the metric weight, the metric group score, the score trend, and the name of the data object.
1. Run a scorecard to view the scores.
2. Select a metric that contains the score that you want to view.
3. Click Actions > Drilldown to view the rows of valid data or rows of data that is not valid for the column. The Analyst tool displays the rows of valid data by default in the Drilldown panel.
Editing a Scorecard

Edit the valid values for metrics in a scorecard. You must run a scorecard before you can edit it.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard appears in a tab.
3. Click Actions > Edit. The Edit Scorecard dialog box appears.
4. On the Metrics tab, select each score in the Metrics panel and configure the valid values from the list of all values in the Score using: Values panel.
5. Make changes to the score thresholds in the Metric Thresholds panel as necessary.
6. Click the Metric Groups tab.
7. Create, edit, or remove metric groups. You can also edit the metric weights and metric thresholds on the Metric Groups tab.
8. Click the Notifications tab.
9. Make changes to the scorecard notification settings as necessary. You can set up global and custom settings for metrics and metric groups.
10. Click Save to save changes to the scorecard, or click Save & Run to save the changes and run the scorecard.

Defining Thresholds

You can set thresholds for each score in a scorecard. A threshold specifies the range, in percentage, of bad data that is acceptable for columns in a record. You can set thresholds for the good, acceptable, and unacceptable ranges of data. You can define thresholds for each column when you add columns to a scorecard or when you edit a scorecard.

Complete the following prerequisite tasks before you define thresholds for columns in a scorecard:
- In the Navigator, select the project or folder that contains the profile and add columns from the profile to the scorecard in the Add to Scorecard window.
- Optionally, in the Navigator, select the project or folder that contains the scorecard and click the scorecard to edit it in the Edit Scorecard window.
1. In the Add to Scorecard window or the Edit Scorecard window, select each metric in the Metrics panel.
2.
In the Metric Thresholds panel, enter the thresholds that represent the upper bound of the unacceptable range and the lower bound of the good range.
3. Click Next or Save.

Metric Groups

Create a metric group to categorize related scores in a scorecard into a set. By default, the Analyst tool places all the scores in a default metric group. After you create a metric group, you can move scores out of the default metric group into another metric group. You can edit a metric group, including the default metric group, to change its name and description. You can delete metric groups that you no longer use. You cannot delete the default metric group.

Creating a Metric Group

Create a metric group to add related scores in the scorecard to the group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard appears in a tab.
3. Click Actions > Edit. The Edit Scorecard window appears.
4. Click the Metric Groups tab. The default group appears in the Metric Groups panel, and the scores in the default group appear in the Metrics panel.
5. Click the New Group icon to create a metric group. The Metric Groups dialog box appears.
6. Enter a name and an optional description.
7. Click OK.
8. Click Save to save the changes to the scorecard.

Moving Scores to a Metric Group

After you create a metric group, you can move related scores to the metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard appears in a tab.
3. Click Actions > Edit. The Edit Scorecard window appears.
4. Click the Metric Groups tab. The default group appears in the Metric Groups panel, and the scores in the default group appear in the Metrics panel.
5. Select a metric from the Metrics panel and click the Move Metrics icon. The Move Metrics dialog box appears.
Note: To select multiple scores, hold the Shift key.
6.
Select the metric group to move the scores to.
7. Click OK.

Editing a Metric Group

Edit a metric group to change its name and description. You can also change the name of the default metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard opens in a tab.
3. Click Actions > Edit. The Edit Scorecard window appears.
4. Click the Metric Groups tab. The default metric group appears in the Metric Groups panel, and the metrics in the default metric group appear in the Metrics panel.
5. In the Metric Groups panel, click the Edit Group icon. The Edit dialog box appears.
6. Enter a name and an optional description.
7. Click OK.

Deleting a Metric Group

You can delete a metric group that is no longer valid. When you delete a metric group, you can choose to move the scores in the metric group to the default metric group. You cannot delete the default metric group.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard opens in a tab.
3. Click Actions > Edit. The Edit Scorecard window appears.
4. Click the Metric Groups tab. The default metric group appears in the Metric Groups panel, and the metrics in the default metric group appear in the Metrics panel.
5. Select a metric group in the Metric Groups panel, and click the Delete Group icon. The Delete Groups dialog box appears.
6. Choose whether to delete the metrics in the metric group or to move the metrics to the default metric group before the Analyst tool deletes the metric group.
7. Click OK.

Drilling Down on Columns

Drill down on the columns for a score to select the columns that appear when you view the valid data rows or the data rows that are not valid. The columns that you select to drill down on appear in the Drilldown panel.
1. Run a scorecard to view the scores.
2. Select a column that contains the score that you want to view.
3.
Click Actions > Drilldown to view the rows of valid or invalid data for the column.
4. Click Actions > Drilldown Columns. The columns appear in the Drilldown panel for the selected score. The Analyst tool displays the rows of valid data for the columns by default. Optionally, click Invalid to view the rows of data that are not valid.

Viewing Trend Charts

You can view trend charts for each score to monitor how the score changes over time.
1. In the Navigator, select the project or folder that contains the scorecard.
2. Click the scorecard to open it. The scorecard appears in a tab.
3. In the Scorecard view, select a score.
4. Click Actions > Show Trend Chart. The Trend Chart Detail window appears. You can view score values that have changed over time. The Analyst tool uses the historical scorecard run data for each date and the latest valid score values to calculate the score. The Analyst tool uses the latest threshold settings to determine the color of the score points in the chart.

Scorecard Notifications

You can configure scorecard notification settings so that the Analyst tool sends emails when specific metric scores or metric group scores move across thresholds or remain in specific score ranges, such as Unacceptable, Acceptable, and Good. You can configure email notifications for individual metric scores and metric groups.

If you use the global settings, the Analyst tool sends notification emails when the scores of selected metrics cross the threshold from the Good range to the Acceptable range or from the Acceptable range to the Unacceptable range. You also get a notification email for each scorecard run if the score remains in the Unacceptable range across consecutive scorecard runs. You can customize the notification settings so that scorecard users get email notifications when scores move from the Unacceptable range to the Acceptable range or from the Acceptable range to the Good range. You can also choose to send email notifications if a score remains within specific score ranges for every scorecard run.
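The range logic behind these notifications can be sketched as follows. The threshold values are hypothetical; the range names match this section, and the notification conditions mirror the default global behavior as described.

```python
GOOD = 90.0        # lower bound of the Good range (hypothetical threshold)
ACCEPTABLE = 70.0  # lower bound of the Acceptable range (hypothetical)

def classify(score):
    """Map a score percentage to its range name."""
    if score >= GOOD:
        return "Good"
    if score >= ACCEPTABLE:
        return "Acceptable"
    return "Unacceptable"

def global_notification(previous_score, current_score):
    """Notify when a score drops into a lower range, or remains
    Unacceptable across consecutive scorecard runs (global behavior)."""
    order = {"Unacceptable": 0, "Acceptable": 1, "Good": 2}
    prev, curr = classify(previous_score), classify(current_score)
    return order[curr] < order[prev] or prev == curr == "Unacceptable"
```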
Notification Email Message Template

You can set up the message text and structure of the email messages that the Analyst tool sends to recipients as part of scorecard notifications. The email template has an optional introductory text section, a read-only message body section, and an optional closing text section.

The email template can include the following tags:
- ScorecardName. Name of the scorecard.
- ObjectURL. A hyperlink to the scorecard. You need to provide the user name and password.
- MetricGroupName. Name of the metric group that the metric belongs to.
- CurrentWeightedAverage. Weighted average value for the metric group in the current scorecard run.
- CurrentRange. The score range, such as Unacceptable, Acceptable, or Good, for the metric group in the current scorecard run.
- PreviousWeightedAverage. Weighted average value for the metric group in the previous scorecard run.
- PreviousRange. The score range, such as Unacceptable, Acceptable, or Good, for the metric group in the previous scorecard run.
- ColumnName. Name of the source column that the metric is assigned to.
- ColumnType. Type of the source column.
- RuleName. Name of the rule.
- RuleType. Type of the rule.
- DataObjectName. Name of the source data object.

Setting Up Scorecard Notifications

You can set up scorecard notifications at both the metric and the metric group level. Global notification settings apply to metrics and metric groups that do not have individual notification settings.
1. Run a scorecard in the Analyst tool.
2. Click Actions > Edit.
3. Click the Notifications tab.
4. Select Enable notifications to start configuring scorecard notifications.
5. Select a metric or metric group.
6. Click the Notifications check box to enable the global settings for the metric or metric group.
7. Select Use custom settings to change the settings for the metric or metric group.
You can choose to send a notification email when the score is in the Unacceptable, Acceptable, or Good range, or when it moves across thresholds.
8. To edit the global settings for scorecard notifications, click the Edit Global Settings icon. The Edit Global Settings dialog box appears, where you can edit the settings, including the email template.

Configuring Global Settings for Scorecard Notifications

If you choose the global scorecard notification settings, the Analyst tool sends emails to target users when a score is in the Unacceptable range or moves down across thresholds. As part of the global settings, you can configure the email template, including the email addresses and message text, for a scorecard.
1. Run a scorecard in the Analyst tool.
2. Click Actions > Edit to open the Edit Scorecard dialog box.
3. Click the Notifications tab.
4. Select Enable notifications to start configuring scorecard notifications.
5. Click the Edit Global Settings icon. The Edit Global Settings dialog box appears, where you can edit the settings, including the email template.
6. Choose when you want to send email notifications with the Score in and Score moves check boxes.
7. In the Email from field, change the sender email ID as necessary. By default, the Analyst tool uses the Sender Email Address property of the Data Integration Service as the sender email ID.
8. In the Email to field, enter the email ID of the recipient. Use a semicolon to separate multiple email IDs.
9. Enter the text for the email subject.
10. In the Body field, add the introductory and closing text of the email message.
11. To apply the global settings to all metrics and metric groups, select Apply settings to all metrics and metric groups.
12. Click OK.

Scorecard Integration with External Applications

You can create a scorecard in the Analyst tool and view its results in external applications or web portals.
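Assembling a results URL for such an integration can be sketched as follows. The host, port, and ID values are hypothetical placeholders; in practice, copy the URL of a scorecard you ran from the browser rather than constructing the IDs by hand.

```python
from urllib.parse import urlencode

# Hypothetical values; substitute your Analyst Service host, port, and IDs.
base = "http://analyst-host:6008/AnalystTool/com.informatica.at.AnalystTool/index.jsp"
params = {
    "mode": "scorecard",
    "project": "MRS_PROJECT_ID",
    "id": "SCORECARD_ID",
    "parentpath": "/project1/folder1",
    "view": "objectonly",      # read-only view of the results
    "pcsfcred": "CREDENTIAL",  # single sign-on credential suffix
}
scorecard_url = base + "?" + urlencode(params)
```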
Specify the scorecard results URL in a format that includes the host name, port number, project ID, and scorecard ID to view the results in external applications. Open a scorecard after you run it, and copy its URL from the browser.

The scorecard URL must be in the following format:

http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL}

The scorecard URL contains the following attributes:
- HOST_NAME. Host name of the Analyst Service.
- PORT. Port number of the Analyst Service.
- MRS_PROJECT_ID. Project ID in the Model repository.
- SCORECARD_ID. ID of the scorecard.
- MRS_PARENT_PATH. Location of the scorecard in the Analyst tool. For example, /project1/folder1/sub_folder1.
- VIEW_MODE. Determines whether a read-only or an editable view of the scorecard gets integrated with the external application.
- CREDENTIAL. Last part of the URL generated by the single sign-on feature that represents the object type, such as scorecard.

The VIEW_MODE attribute in the scorecard URL determines whether you can integrate a read-only or an editable view of the scorecard with the external application:
- view=objectonly. Displays a read-only view of the scorecard results.
- view=objectrunonly. Displays the scorecard results in a view where you can run the scorecard and drill down on results.
- view=full. Opens the scorecard results in the Analyst tool with full access.

Viewing a Scorecard in External Applications

You view a scorecard in external applications or web portals through the scorecard URL. Copy the scorecard URL from the Analyst tool and add it to the source code of the external application or web portal.
1. Run a scorecard in the Analyst tool.
2. Copy the scorecard URL from the browser.
3. Verify that the URL matches the format described in this section.
4. Add the URL to the source code of the external application or web portal.

CHAPTER 13
Exception Record Management

This chapter includes the following topics:
- Exception Record Management Overview
- Exception Management Tasks

Exception Record Management Overview

An exception is a record that contains unresolved data quality issues. The record may contain errors, or it may be an unintended duplicate of another record. You can use the Analyst tool to review and edit exception records that are identified by a mapping that contains an Exception transformation.

You can review and edit the output of an Exception transformation in the Analyst tool or in the Informatica Data Director for Data Quality web application. You use Informatica Data Director for Data Quality when you are assigned a task as part of a workflow.

You can use the Analyst tool to review the following exception types:

Bad records
You can edit records, delete records, tag them for reprocessing by a mapping, or profile them to analyze the quality of the changes made to the records.

Duplicate records
You can consolidate clusters of similar records into a single master record. You can consolidate or remove duplicate records, extract records to form new clusters, and profile duplicate records.

The Exception transformation creates a database table to store the bad or duplicate records. The Model repository stores the data object associated with the table. The transformation also creates one or more tables for the metadata associated with the bad or duplicate records. To review and update the bad or duplicate records, import the database table to the staging database in the Analyst tool. The Analyst tool uses the metadata tables in the database to identify the data quality issues in each record.
You do not use the data object in the Model repository to update the record data.

Exception Management Process Flow

The Exception transformation analyzes the output of other data quality transformations and creates tables that contain records with different levels of data quality. After the Exception transformation creates an exception table, you can use the Analyst tool or Informatica Data Director for Data Quality to review and update the records in the table. You can configure the data quality transformations in a single mapping, or you can create mappings for different stages in the process.

Use the Developer tool to perform the following tasks:

Create a mapping that generates score values for data quality issues
Use a Match transformation in cluster mode to generate score values for duplicate record exceptions. Use a transformation that applies a business rule to generate score values for records that contain errors. For example, you can define an IF/THEN rule in a Decision transformation and use the rule to evaluate the output of other data quality transformations.

Use an Exception transformation to analyze the record scores
Configure the Exception transformation to read the output of other transformations or to read a data object from another mapping. Configure the transformation to write records to database tables based on the score values in the records.

Configure target data objects for good records and automatic consolidation records
Connect the Exception transformation output ports to the target data objects in the mapping.

Create the target data object for bad or duplicate records
Use the Generate bad records table or Generate duplicate record table option to create the database object and add it to the mapping canvas. The Developer tool automatically connects the bad or duplicate record ports to the data object.

Run the mapping
Run the mapping to process the exceptions.
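Conceptually, the score-based routing that the Exception transformation performs resembles the following sketch. The thresholds, table names, and record shape are all hypothetical; in the product, you configure this routing in the Developer tool rather than writing code.

```python
# Hypothetical thresholds: scores at or above UPPER identify good records,
# scores below LOWER identify records rejected outright, and scores in
# between identify exceptions that need human review.
UPPER, LOWER = 0.9, 0.4

def route(record):
    """Return the hypothetical target table for a record by quality score."""
    score = record["score"]
    if score >= UPPER:
        return "good_records"
    if score >= LOWER:
        return "bad_records"  # exceptions written out for human review
    return "rejected_records"
```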
Use the Analyst tool or Informatica Data Director for Data Quality to perform the following tasks:

Review the exception table data
You can use the Analyst tool or Informatica Data Director for Data Quality to review the bad or duplicate record tables.
- Use the Analyst tool to import the exception records into a bad or duplicate record table. Open the imported table from the Model repository and work on the exception data.
- Use Informatica Data Director for Data Quality if you are assigned a task to review or correct exceptions as part of a Human task.

Note: The exception tables that you create in the Exception transformation include columns that provide metadata to Informatica Data Director for Data Quality. The Analyst tool does not use these columns. When you import the tables to the Analyst tool for exception data management, the Analyst tool hides the columns.

Reserved Column Names

When you create a bad record or consolidation table, the Analyst tool generates columns for use in its internal tables. Do not import tables that use these names. If an imported table contains a column with the same name as one of the generated columns, the Analyst tool does not process the table.

The following column names are reserved for bad record or consolidation tables:
- checkStatus
- rowIdentifier
- acceptChanges
- recordGroup
- masterRecord
- matchScore
- any name beginning with DQA_

Exception Management Tasks

You can perform the following exception management tasks in the Analyst tool:

Manage bad records
Identify problem records and fix data quality issues.

Consolidate duplicate records
Merge groups of duplicate records into a single record.

View the audit trail
Review the changes made in the bad or duplicate record tables before you write the changes to the source database.

Viewing and Editing Bad Records

Complete the following steps to view and edit bad records:
1. Log in to the Analyst tool.
2. Select a project.
3. Select a bad records table.
4.
Optionally, use the menus to filter the table records. You can filter records by value in the following columns: Priority, Quality Issue, Column, and Status.
5. Click Show to view the records that match the filter criteria.
6. Double-click a cell to edit the cell value.
7. Click Save to save the rows that you updated.

Saving changes to a record is the first step in processing the record in the Analyst tool. After you save changes to a record, you can update the record status to accept, reprocess, or reject the record.

Updating Bad Record Status

For each record that does not require further editing, select one or more records by clicking the check box next to each record, or select all the records in the table by clicking the check box at the top of the first column. Then perform one of the following actions:
- Click Accept. Indicates that the record is acceptable for use.
- Click Reject. Indicates that the record is not acceptable for use.
- Click Reprocess. Selects the record for reprocessing by a data quality mapping. Select this option when you are unsure whether the record is valid. Rerun the mapping with an updated business rule to recheck the record.

Note: The Analyst tool does not display records that you have taken action on.

Viewing and Filtering Duplicate Record Clusters

Complete the following steps to view and filter duplicate record clusters:
1. Log in to the Analyst tool.
2. Select a project.
3. Select a duplicate record table.
4. The first cluster in the table opens. The Analyst tool also displays the number of clusters in the table. Click a number to move to a cluster.
5. Optionally, use the Filter option to filter the cluster list. In the Filter Clusters dialog box, select a column and enter a filter string. The Analyst tool returns all clusters with one or more records that contain the string in the column that you select.
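The cluster filter described above keeps any cluster in which at least one record contains the filter string in the selected column. A minimal sketch with hypothetical records:

```python
def filter_clusters(clusters, column, needle):
    """Keep clusters where at least one record contains the string in the
    selected column (conceptual sketch of the cluster filter)."""
    return [
        cluster
        for cluster in clusters
        if any(needle in (record.get(column) or "") for record in cluster)
    ]

# Hypothetical clusters of potential duplicate records:
clusters = [
    [{"name": "ACME Corp"}, {"name": "ACME Corporation"}],
    [{"name": "Globex"}],
]
matches = filter_clusters(clusters, "name", "ACME")
```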
Editing Duplicate Record Clusters

Edit clusters to change how the Analyst tool consolidates potential duplicate records. You can edit clusters in the following ways:
- To remove a record from a cluster, clear the selection in the Cluster column. When you remove a record from a cluster, the record assumes a unique cluster ID.
- To create a new cluster from records in the current cluster, select a subset of records and click the Extract Cluster button. This action creates a new cluster ID for the selected records.
- To edit a record, select a record field and edit the data in that field.
- To select the fields that populate the master record, click the selection arrow in a field to add its value to the corresponding field in the Final Record row. An arrow indicates that the field provides data for the master record.
- To specify a master record, click a cell in the Master column for a row to select that row as the master record.

Consolidating Duplicate Record Clusters

When you have processed a cluster, consolidate the cluster records to a single record in the staging database. In the cluster that you processed, click the Consolidate Cluster button. The Analyst tool performs the following updates on the cluster records:
- In the staging database, the Analyst tool updates the master record with the contents of the Final record and sets its status to Updated.
- The Analyst tool sets the status of the other selected records to Consolidated.
- The Analyst tool sets the status of any cleared record to Reprocess.

Viewing the Audit Trail

The Analyst tool tracks changes to the exception record database in an audit trail. Complete the following steps to view audit trail records:
1. Select the Audit Trail tab.
2. Set the filter options.
3. Click Show.

The following record statuses appear in the audit trail:
- Updated: Edited during bad record processing, or selected as the master record during consolidation.
- Consolidated: Consolidated to a master record during consolidation.
- Rejected: Rejected during bad record processing.
- Accepted: Accepted during bad record processing.
- Reprocess: Marked for reprocessing during bad record processing.
- Rematch: Removed from a cluster during consolidation.
- Extracted: Extracted from a cluster into a new cluster during consolidation.

CHAPTER 14
Reference Tables

This chapter includes the following topics:
- Reference Tables Overview
- Reference Table Properties
- Create Reference Tables
- Create a Reference Table from Profile Data
- Create a Reference Table From a Flat File
- Create a Reference Table from a Database Table
- Copying a Reference Table in the Model Repository
- Reference Table Management
- Audit Trail Events
- Rules and Guidelines for Reference Tables

Reference Tables Overview
Informatica provides reference tables that you can import to the Model repository. You can also create reference tables and connect to database tables that contain reference data. Use the Analyst tool to create and update reference tables.

Reference Table Properties
You can view and edit the properties of a reference table in the Analyst tool. To view the properties, open the reference table and select the Properties view. To edit the properties, open the reference table and select the Edit Table option. A reference table displays general properties that describe the repository object and column properties that describe the column data.

General Reference Table Properties
The general properties include information about the users who created and updated the reference table. The general properties also identify the current valid column in the table.
The following table describes the general properties:
- Name: Name of the reference table.
- Description: Optional description of the reference table.
- Location: Project that contains the reference table in the Model repository.
- Precision: Precision for the column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate.
- Valid Column: Column that contains the valid reference data.
- Created On: Creation date for the reference table.
- Created By: User who created the reference table.
- Last Modified: Date of the most recent update to the reference table.
- Last Modified By: User who most recently edited the reference table.
- Connection ID: Connection name of the database that stores the reference table data.

Reference Table Column Properties
The column properties include information about the column metadata. The following table describes the column properties:
- Name: Name of each column.
- Data Type: The datatype for the data in each column. You can select one of the following datatypes: bigint, date/time, decimal, double, integer, or string. You cannot select a double data type when you create an empty reference table or create a reference table from a flat file.
- Precision: Precision for each column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate. The precision values you configure depend on the data type.
- Scale: Scale for each column. Scale is the maximum number of digits that a column can accommodate to the right of the decimal point. Applies to decimal columns. The scale values you configure depend on the data type.
- Description: Optional description for each column.

Create Reference Tables
Use the reference table editor, profile results, or a flat file to create reference tables. Create reference tables to share reference data with developers in the Developer tool.
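For a decimal column, the precision and scale properties described above bound the values the column can hold. The following Python sketch is purely illustrative, not part of the product, and the function name is hypothetical; it checks whether a value fits a column with a given precision and scale.

```python
# Hypothetical helper illustrating precision and scale for a decimal column.
# Precision bounds the total number of digits; scale bounds the digits to
# the right of the decimal point.
from decimal import Decimal

def fits_column(value, precision, scale):
    """Return True if a Decimal fits a column with the given precision and scale."""
    sign, digits, exponent = value.as_tuple()
    fractional_digits = max(-exponent, 0)
    integer_digits = max(len(digits) - fractional_digits, 0)
    # Digits left of the point may use at most (precision - scale) positions,
    # and fractional digits may not exceed the scale.
    return integer_digits <= precision - scale and fractional_digits <= scale

print(fits_column(Decimal("12345.67"), 10, 2))  # True
print(fits_column(Decimal("1.234"), 10, 2))     # False: three digits after the point
```

For example, a column with precision 10 and scale 2 accepts 12345.67 but rejects 1.234, because the third fractional digit exceeds the scale.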
Use the following methods to create a reference table:
- Create a reference table in the reference table editor.
- Create a reference table from profile column data or profile pattern data.
- Create a reference table from flat file data.
- Create a reference table from data in another database table.

Creating a Reference Table in the Reference Table Editor
Use the New Reference Table Wizard and the reference table editor view to create a reference table. You use the reference table editor to define the table structure and add data to the table.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table. The New Reference Table Wizard appears.
3. Select the option to Use the reference table editor.
4. Click Next.
5. Enter the table name, and optionally enter a description and default value. The Analyst tool uses the default value for any table record that does not contain a value.
6. For each column you want to include in the reference table, click the Add New Column icon and configure the properties for each column. Note: You can reorder or delete columns.
7. Optionally, enter an audit note for the table. The audit note appears in the audit trail log.
8. Click Finish.

Create a Reference Table from Profile Data
You can use profile data to create reference tables that relate to the source data in the profile. Use the reference tables to find different types of information in the source data. You can use a profile to create or update a reference table in the following ways:
- Select a column in the profile and add it to a reference table.
- Browse a profile column and add a subset of the column data to a reference table.
- Select a column in the profile and add the pattern values for that column to a reference table.

Creating a Reference Table from Profile Columns
You can create a reference table from a profile column.
You can add a profile column to an existing reference table. The New Reference Table Wizard adds the column to the reference table.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. Click Actions > Add to Reference Table. The New Reference Table Wizard appears.
5. Select the option to Create a new reference table. Alternatively, select Add to existing reference table and click Next, navigate to the reference table in the project or folder, preview the reference table data, click Next, select the column to add, and click Finish.
6. Click Next.
7. The column name appears by default as the table name. Optionally, enter another table name, a description, and a default value. The Analyst tool uses the default value for any table record that does not contain a value.
8. Click Next.
9. In the Column Attributes panel, configure the column properties for the column.
10. Optionally, choose to create a description column for rows in the reference table. Enter the name and precision for the column.
11. Preview the column values in the Preview panel.
12. Click Next.
13. The column name appears as the table name by default. Optionally, enter another table name and a description.
14. In the Save in panel, select the location where you want to create the reference table. The Reference Tables panel lists the reference tables in the location you select.
15. Optionally, enter an audit note.
16. Click Finish.

Creating a Reference Table from Column Values
You can create a reference table from the column values in a profile column. Select a column in a profile and select the column values to add to a reference table or create a reference table to add the column values.
1.
In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Values view, select the column values you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5. Click Actions > Add to Reference Table. The New Reference Table Wizard appears.
6. Select the option to Create a new reference table. Alternatively, select Add to existing reference table and click Next, navigate to the reference table in the project or folder, preview the reference table data, click Next, select the column to add, and click Finish.
7. Click Next.
8. The column name appears by default as the table name. Optionally, enter another table name, a description, and a default value. The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table. Enter the name and precision for the column.
12. Preview the column values in the Preview panel.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a description.
15. In the Save in panel, select the location where you want to create the reference table. The Reference Tables panel lists the reference tables in the location you select.
16. Optionally, enter an audit note.
17. Click Finish.

Creating a Reference Table from Column Patterns
You can create a reference table from the column patterns in a profile column. Select a column in the profile and select the pattern values to add to a reference table or create a reference table to add the pattern values.
1.
In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Patterns view, select the column patterns you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5. Click Actions > Add to Reference Table. The New Reference Table Wizard appears.
6. Select the option to Create a new reference table. Alternatively, select Add to existing reference table and click Next, navigate to the reference table in the project or folder, preview the reference table data, click Next, select the column to add, and click Finish.
7. Click Next.
8. The column name appears by default as the table name. Optionally, enter another table name, a description, and a default value. The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table. Enter the name and precision for the column.
12. Preview the column values in the Preview panel.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a description.
15. In the Save in panel, select the location where you want to create the reference table. The Reference Tables panel lists the reference tables in the location you select.
16. Optionally, enter an audit note.
17. Click Finish.

Create a Reference Table From a Flat File
You can import reference data from a CSV file. Use the New Reference Table wizard to import the file data. You must configure the properties for each flat file that you use to create a reference table.
Analyst Tool Flat File Properties
When you import a flat file as a reference table, you must configure the properties for each column in the file. The options that you configure determine how the Analyst tool reads the data from the file.
The following table describes the properties you can configure when you import file data for a reference table:
- Delimiters: Character used to separate columns of data. Use the Other field to enter a different delimiter. Delimiters must be printable characters and must be different from the escape character and the quote character, if selected. You cannot select non-printing multibyte characters as delimiters.
- Text Qualifier: Quote character that defines the boundaries of text strings. Choose No Quote, Single Quote, or Double Quotes. If you select a quote character, the wizard ignores delimiters within pairs of quotes.
- Column Names: Imports column names from the first line. Select this option if column names appear in the first row. The wizard uses data in the first row of the preview for column names. Default is not enabled.
- Values: Row at which the wizard starts to import values. Indicates the row number in the preview at which the wizard starts reading when it imports the file.

Creating a Reference Table from a Flat File
When you create a reference table from a flat file, the table uses the column structure of the file and imports the file data.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table. The New Reference Table Wizard appears.
3. Select the option to Import a flat file.
4. Click Next.
5. Click Browse to select the flat file.
6. Click Upload to upload the file to a directory in the Informatica services installation directory that the Analyst tool can access.
7. Enter the table name. Optionally, enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Select a code page that matches the data in the flat file.
9. Preview the data in the Preview of file panel.
10. Click Next.
11. Configure the flat file properties.
12. In the Preview panel, click Show to update the preview.
13. Click Next.
14. On the Column Attributes panel, verify or edit the column properties for each column.
15. Optionally, create a description column for rows in the reference table. Enter the name and precision for the column.
16. Optionally, enter an audit note for the table.
17. Click Finish.

Create a Reference Table from a Database Table
When you create a reference table from a database table, you connect to the database and import the table data. Use the New Reference Table wizard to enter the database connection properties for the table. Then import the tables into a folder in the Model repository.

Creating a Database Connection
Before you import reference tables from a database, create a database connection in the Analyst tool.
1. Select a project or folder in the Navigator.
2. Click Actions > New > Reference Table. The New Reference Table Wizard appears.
3. Select the option to Connect to a relational table. Optionally, select the option to create an unmanaged reference table. If you select this option, the Analyst tool does not store the reference table data in the reference data database.
4. Click Next.
5. Click New Connection. The New Connection window appears.
6. Enter the properties for the database you want to connect to.
7. Select Grant everyone execute permission on this connection.
8. Click OK. The Analyst tool tests the database connection. The database connection appears in the list of established connections.

Creating a Reference Table from a Database Table
To create the reference table, connect to a database and import the column data you need.
1.
In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table. The New Reference Table Wizard appears.
3. Select the option to Connect to a relational table.
4. Select Unmanaged Table if you want to create a table that does not store data in the reference data database. You cannot edit the values in an unmanaged reference table.
5. Click Next.
6. Select the database connection from the list of established connections.
7. Click Next.
8. On the Tables panel, select a table. The table properties appear on the Properties panel.
9. Optionally, click Data Preview.
10. Click Next.
11. On the Column Attributes panel, configure the column properties for each column.
12. Optionally, include a column for row-level descriptions.
13. Optionally, add an audit note in the Audit Note field.
14. Click Next.
15. Enter a name and optionally a description for the reference table.
16. On the Folders panel, select the project or folder where you want to create the reference table. The Reference Tables panel lists the reference tables in the folder you select.
17. Click Finish.

Copying a Reference Table in the Model Repository
You can copy a reference table between folders in a Model repository project. The reference table and the copy you create are not linked in the Model repository or in the database. When you create a copy, you create a new database table.
1. Browse the Model repository, and find the reference table you want to copy.
2. Right-click the reference table, and select Duplicate from the context menu.
3. In the Duplicate dialog box, select a folder to store the copy of the reference table.
4. Optionally, enter a new name for the copy of the reference table.
5. Click OK.

Reference Table Management
You can perform tasks to manage reference tables.
You can find and replace column values, add or remove columns and rows, edit column values, and export a reference table to a file. You can perform the following tasks to manage reference tables:
- Manage columns. Use the Edit column properties window to add, edit, or delete columns in a reference table.
- Manage rows. Use the Add Rows window to add rows and the Edit Row window to edit rows in a reference table. Use the Delete icon to delete rows in a reference table.
- Find and replace values. You can find a value in an individual reference table column and replace it with another value, or replace all values in a column with another value.
- Export a reference table. Export a reference table to a comma-separated values (CSV) file, dictionary file, or Microsoft Excel file.

Managing Columns
Use the Edit column properties window to add, edit, or delete columns in a reference table.
1. In the Navigator, select the project or folder that contains the reference table that you want to edit.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Edit Table or click the Edit Table icon. The Edit column properties window appears.
4. To add a column, click the Add New Column icon in the Column Attributes panel and edit the column properties. To edit an existing column, click the property you want to edit. You cannot edit the datatype, precision, or scale of a column. You can rename the column and change the column description.
5. To delete a column, click the column and click the Delete icon.
6. Optionally, enter an audit note on the Audit Note panel. The audit note appears in the audit log for any action you perform in the Edit column properties window.
7. Click OK.

Managing Rows
You can add, edit, or delete rows in a reference table.
1.
In the Navigator, select the project or folder that contains the reference table that you want to edit.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. To add a row, click Actions > Add Row or click the Add Row icon. In the Add Row window, enter the value for each column and enter an optional audit note. Click OK.
4. To edit rows, select the rows and click Actions > Edit or click the Edit icon. In the Edit Rows window, enter the value for each column, select the columns to apply the changes to, and enter an optional audit note. Optionally, click Previous to edit the previous row or click Next to edit the next row. Click Apply to apply the changes. The new column values appear in the tab.
5. To delete rows, select the rows you want to delete and click Actions > Delete or click the Delete icon. In the Delete Rows window, enter an optional audit note and click OK.
Note: Use the Developer tool to edit larger reference tables. For example, if the reference table contains more than 500 rows or five columns, edit the reference table in the Developer tool.

Finding and Replacing Values
You can find and replace values in individual reference table columns.
1. In the Navigator, select the project or folder that contains the reference table in which you want to find and replace values.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Find and Replace or click the Find and Replace icon. The Find and Replace toolbar appears.
4. Enter the search criteria in the Find box. In the list, select all columns or the column that you want to search. Enter the replacement value, and click one of the following buttons:
- Next/Previous: Scroll through the column values that match the search criteria.
- Highlight All: Highlight all the column values that match the search criteria.
- Replace: Replace the currently highlighted column value.
- Replace All: Replace all occurrences of the search criteria in column values.

Exporting a Reference Table
Export a reference table to a comma-separated values (CSV) file, dictionary file, or Microsoft Excel file.
1. In the Navigator, select the project or folder that contains the reference table that you want to export.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Export Data. The Export data to a file window appears.
4. Configure the following options:
- File Name: File name for the exported data.
- File Format: Format of the exported file. You can select csv (comma-separated values file), xls (Microsoft Excel file), or dic (dictionary file). Optionally, select Export field names as first row to export the column names as a header row in the exported file.
- Code Page: Code page of the reference data.
5. Click OK. The options to save or open the file depend on your browser.

Audit Trail Events
Use the Audit Trail view for a reference table to view audit trail log events. The Analyst tool creates audit trail log events when you make a change to a reference table and enter an audit trail note. Audit trail log events provide information about the reference tables that you manage. You can configure query options on the Audit Trail tab to filter the log events that you view. You can specify filters on the date range, type, user name, and status.
The following table describes the options you configure when you view audit trail log events:
- Date: Start and end dates for the log events to search for. Use the calendar to choose dates.
- Type: Type of audit trail events. You can filter and view the following event types:
  - Data. Events related to data in the reference table. Events include creating, editing, deleting, and replacing all rows.
  - Metadata.
Events related to reference table metadata. Events include creating reference tables; adding, deleting, and editing columns; and updating valid columns.
- User: User who edited the reference table and entered the audit trail comment. The Analyst tool generates the list of users from the Analyst tool users configured in the Administrator tool.
- Status: Status of the audit trail log events. Status corresponds to the action performed in the reference table editor.
Audit trail log events also include the audit trail comments and the column values that were inserted, updated, or deleted.

Viewing Audit Trail Events
View audit trail log events to get more information about changes made to a reference table.
1. In the Navigator, select the project or folder that contains the reference table that you want to view the audit trail for.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click the Audit Trail view.
4. Configure the filter options.
5. Click Show. The log events for the specified query options appear.

Rules and Guidelines for Reference Tables
Use the following rules and guidelines while working with reference tables in the Analyst tool:
- When you import a reference table from an Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, or Microsoft SQL Server database, the Analyst tool cannot display the preview if the table, view, schema, synonym, or column names contain mixed-case or lowercase characters. To preview data in tables that reside in case-sensitive databases, set the Support Mixed Case Identifiers attribute to true in the connections for Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, and Microsoft SQL Server databases in the Developer tool or Administrator tool.
- When you create a reference table from inferred column patterns in one format, the Analyst tool populates the reference table with column patterns in a different format.
For example, when you create a reference table for the column pattern X(5), the Analyst tool displays the column pattern in the reference table as XXXXX.
- When you import an Oracle database table, verify the length of any VARCHAR2 column in the table. The Analyst tool cannot import an Oracle database table that contains a VARCHAR2 column with a length greater than 1000.
- To read a reference table, you need execute permissions on the connection to the database that stores the table data values. For example, if the reference data database stores the data values, you need execute permissions on the connection to the reference data database. This applies whether you access the reference table in read or write mode. The database connection permissions apply to all reference data in the database.