
Informatica Data Quality (Version 9.5.1)
User Guide
December 2012
Copyright (c) 2009-2012 Informatica. All rights reserved.
This software and documentation contain proprietary information of Informatica Corporation and are provided under a license agreement containing restrictions on use and
disclosure and are also protected by copyright law. Reverse engineering of the software is prohibited. No part of this document may be reproduced or transmitted in any form,
by any means (electronic, photocopying, recording or otherwise) without prior consent of Informatica Corporation. This Software may be protected by U.S. and/or international
Patents and other Patents Pending.
Use, duplication, or disclosure of the Software by the U.S. Government is subject to the restrictions set forth in the applicable software license agreement and as provided in
DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (OCT 1988), FAR 12.212(a) (1995), FAR 52.227-19, or FAR 52.227-14 (ALT III), as applicable.
The information in this product or documentation is subject to change without notice. If you find any problems in this product or documentation, please report them to us in
writing.
Informatica, Informatica Platform, Informatica Data Services, PowerCenter, PowerCenterRT, PowerCenter Connect, PowerCenter Data Analyzer, PowerExchange,
PowerMart, Metadata Manager, Informatica Data Quality, Informatica Data Explorer, Informatica B2B Data Transformation, Informatica B2B Data Exchange, Informatica On
Demand, Informatica Identity Resolution, Informatica Application Information Lifecycle Management, Informatica Complex Event Processing, Ultra Messaging and Informatica
Master Data Management are trademarks or registered trademarks of Informatica Corporation in the United States and in jurisdictions throughout the world. All other company
and product names may be trade names or trademarks of their respective owners.
Portions of this software and/or documentation are subject to copyright held by third parties, including without limitation: Copyright DataDirect Technologies. All rights
reserved. Copyright © Sun Microsystems. All rights reserved. Copyright © RSA Security Inc. All Rights Reserved. Copyright © Ordinal Technology Corp. All rights
reserved. Copyright © Aandacht c.v. All rights reserved. Copyright Genivia, Inc. All rights reserved. Copyright Isomorphic Software. All rights reserved. Copyright © Meta
Integration Technology, Inc. All rights reserved. Copyright © Intalio. All rights reserved. Copyright © Oracle. All rights reserved. Copyright © Adobe Systems Incorporated. All
rights reserved. Copyright © DataArt, Inc. All rights reserved. Copyright © ComponentSource. All rights reserved. Copyright © Microsoft Corporation. All rights reserved.
Copyright © Rogue Wave Software, Inc. All rights reserved. Copyright © Teradata Corporation. All rights reserved. Copyright © Yahoo! Inc. All rights reserved. Copyright ©
Glyph & Cog, LLC. All rights reserved. Copyright © Thinkmap, Inc. All rights reserved. Copyright © Clearpace Software Limited. All rights reserved. Copyright © Information
Builders, Inc. All rights reserved. Copyright © OSS Nokalva, Inc. All rights reserved. Copyright Edifecs, Inc. All rights reserved. Copyright Cleo Communications, Inc. All rights
reserved. Copyright © International Organization for Standardization 1986. All rights reserved. Copyright © ej-technologies GmbH. All rights reserved. Copyright © Jaspersoft
Corporation. All rights reserved. Copyright © International Business Machines Corporation. All rights reserved. Copyright © yWorks GmbH. All rights reserved. Copyright ©
Lucent Technologies. All rights reserved. Copyright (c) University of Toronto. All rights reserved. Copyright © Daniel Veillard. All rights reserved. Copyright © Unicode, Inc.
Copyright IBM Corp. All rights reserved. Copyright © MicroQuill Software Publishing, Inc. All rights reserved. Copyright © PassMark Software Pty Ltd. All rights reserved.
Copyright © LogiXML, Inc. All rights reserved. Copyright © 2003-2010 Lorenzi Davide, All rights reserved. Copyright © Red Hat, Inc. All rights reserved. Copyright © The Board
of Trustees of the Leland Stanford Junior University. All rights reserved. Copyright © EMC Corporation. All rights reserved. Copyright © Flexera Software. All rights reserved.
This product includes software developed by the Apache Software Foundation (http://www.apache.org/), and other software which is licensed under the Apache License,
Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0. Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under the License.
This product includes software which was developed by Mozilla (http://www.mozilla.org/), software copyright The JBoss Group, LLC, all rights reserved; software copyright ©
1999-2006 by Bruno Lowagie and Paulo Soares and other software which is licensed under the GNU Lesser General Public License Agreement, which may be found at
http://www.gnu.org/licenses/lgpl.html. The materials are provided free of charge by Informatica, "as-is", without warranty of any kind, either express or implied, including but not
limited to the implied warranties of merchantability and fitness for a particular purpose.
The product includes ACE(TM) and TAO(TM) software copyrighted by Douglas C. Schmidt and his research group at Washington University, University of California, Irvine,
and Vanderbilt University, Copyright (©) 1993-2006, all rights reserved.
This product includes software developed by the OpenSSL Project for use in the OpenSSL Toolkit (copyright The OpenSSL Project. All Rights Reserved) and redistribution of
this software is subject to terms available at http://www.openssl.org and http://www.openssl.org/source/license.html.
This product includes Curl software which is Copyright 1996-2007, Daniel Stenberg, <[email protected]>. All Rights Reserved. Permissions and limitations regarding this
software are subject to terms available at http://curl.haxx.se/docs/copyright.html. Permission to use, copy, modify, and distribute this software for any purpose with or without
fee is hereby granted, provided that the above copyright notice and this permission notice appear in all copies.
The product includes software copyright 2001-2005 (©) MetaStuff, Ltd. All Rights Reserved. Permissions and limitations regarding this software are subject to terms available
at http://www.dom4j.org/license.html.
The product includes software copyright © 2004-2007, The Dojo Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms
available at http://dojotoolkit.org/license.
This product includes ICU software which is copyright International Business Machines Corporation and others. All rights reserved. Permissions and limitations regarding this
software are subject to terms available at http://source.icu-project.org/repos/icu/icu/trunk/license.html.
This product includes software copyright © 1996-2006 Per Bothner. All rights reserved. Your right to use such materials is set forth in the license which may be found at
http://www.gnu.org/software/kawa/Software-License.html.
This product includes OSSP UUID software which is Copyright © 2002 Ralf S. Engelschall, Copyright © 2002 The OSSP Project, Copyright © 2002 Cable & Wireless
Deutschland. Permissions and limitations regarding this software are subject to terms available at http://www.opensource.org/licenses/mit-license.php.
This product includes software developed by Boost (http://www.boost.org/) or under the Boost software license. Permissions and limitations regarding this software are subject
to terms available at http://www.boost.org/LICENSE_1_0.txt.
This product includes software copyright © 1997-2007 University of Cambridge. Permissions and limitations regarding this software are subject to terms available at
http://www.pcre.org/license.txt.
This product includes software copyright © 2007 The Eclipse Foundation. All Rights Reserved. Permissions and limitations regarding this software are subject to terms
available at http://www.eclipse.org/org/documents/epl-v10.php.
This product includes software licensed under the terms at http://www.tcl.tk/software/tcltk/license.html, http://www.bosrup.com/web/overlib/?License,
http://www.stlport.org/doc/license.html, http://www.asm.ow2.org/license.html, http://www.cryptix.org/LICENSE.TXT, http://hsqldb.org/web/hsqlLicense.html,
http://httpunit.sourceforge.net/doc/license.html, http://jung.sourceforge.net/license.txt, http://www.gzip.org/zlib/zlib_license.html,
http://www.openldap.org/software/release/license.html, http://www.libssh2.org, http://slf4j.org/license.html, http://www.sente.ch/software/OpenSourceLicense.html,
http://fusesource.com/downloads/license-agreements/fuse-message-broker-v-5-3-license-agreement; http://antlr.org/license.html; http://aopalliance.sourceforge.net/;
http://www.bouncycastle.org/licence.html; http://www.jgraph.com/jgraphdownload.html; http://www.jcraft.com/jsch/LICENSE.txt;
http://jotm.objectweb.org/bsd_license.html; http://www.w3.org/Consortium/Legal/2002/copyright-software-20021231; http://www.slf4j.org/license.html;
http://developer.apple.com/library/mac/#samplecode/HelpHook/Listings/HelpHook_java.html; http://nanoxml.sourceforge.net/orig/copyright.html;
http://www.json.org/license.html; http://forge.ow2.org/projects/javaservice/, http://www.postgresql.org/about/licence.html, http://www.sqlite.org/copyright.html,
http://www.tcl.tk/software/tcltk/license.html, http://www.jaxen.org/faq.html, http://www.jdom.org/docs/faq.html, http://www.slf4j.org/license.html;
http://www.iodbc.org/dataspace/iodbc/wiki/iODBC/License; http://www.keplerproject.org/md5/license.html; http://www.toedter.com/en/jcalendar/license.html;
http://www.edankert.com/bounce/index.html; http://www.net-snmp.org/about/license.html; http://www.openmdx.org/#FAQ; http://www.php.net/license/3_01.txt;
http://srp.stanford.edu/license.txt; http://www.schneier.com/blowfish.html; http://www.jmock.org/license.html; http://xsom.java.net; and http://benalman.com/about/license/.
This product includes software licensed under the Academic Free License (http://www.opensource.org/licenses/afl-3.0.php), the Common Development and Distribution
License (http://www.opensource.org/licenses/cddl1.php), the Common Public License (http://www.opensource.org/licenses/cpl1.0.php), the Sun Binary Code License
Agreement Supplemental License Terms, the BSD License (http://www.opensource.org/licenses/bsd-license.php), the MIT License
(http://www.opensource.org/licenses/mit-license.php), and the Artistic License (http://www.opensource.org/licenses/artistic-license-1.0).
This product includes software copyright © 2003-2006 Joe Walnes, 2006-2007 XStream Committers. All rights reserved. Permissions and limitations regarding this software
are subject to terms available at http://xstream.codehaus.org/license.html. This product includes software developed by the Indiana University Extreme! Lab. For further
information please visit http://www.extreme.indiana.edu/.
This product includes software developed by Andrew Kachites McCallum. "MALLET: A Machine Learning for Language Toolkit." http://mallet.cs.umass.edu (2002).
This Software is protected by U.S. Patent Numbers 5,794,246; 6,014,670; 6,016,501; 6,029,178; 6,032,158; 6,035,307; 6,044,374; 6,092,086; 6,208,990; 6,339,775;
6,640,226; 6,789,096; 6,820,077; 6,823,373; 6,850,947; 6,895,471; 7,117,215; 7,162,643; 7,243,110; 7,254,590; 7,281,001; 7,421,458; 7,496,588; 7,523,121; 7,584,422;
7,676,516; 7,720,842; 7,721,270; and 7,774,791, international Patents and other Patents Pending.
DISCLAIMER: Informatica Corporation provides this documentation "as is" without warranty of any kind, either express or implied, including, but not limited to, the implied
warranties of noninfringement, merchantability, or use for a particular purpose. Informatica Corporation does not warrant that this software or documentation is error free. The
information provided in this software or documentation may include technical inaccuracies or typographical errors. The information in this software and documentation is
subject to change at any time without notice.
NOTICES
This Informatica product (the "Software") includes certain drivers (the "DataDirect Drivers") from DataDirect Technologies, an operating company of Progress Software
Corporation ("DataDirect") which are subject to the following terms and conditions:
1. THE DATADIRECT DRIVERS ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
2. IN NO EVENT WILL DATADIRECT OR ITS THIRD PARTY SUPPLIERS BE LIABLE TO THE END-USER CUSTOMER FOR ANY DIRECT, INDIRECT,
INCIDENTAL, SPECIAL, CONSEQUENTIAL OR OTHER DAMAGES ARISING OUT OF THE USE OF THE ODBC DRIVERS, WHETHER OR NOT INFORMED OF
THE POSSIBILITIES OF DAMAGES IN ADVANCE. THESE LIMITATIONS APPLY TO ALL CAUSES OF ACTION, INCLUDING, WITHOUT LIMITATION, BREACH
OF CONTRACT, BREACH OF WARRANTY, NEGLIGENCE, STRICT LIABILITY, MISREPRESENTATION AND OTHER TORTS.
Part Number: DQ-UG-95100-0001
Table of Contents
Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Informatica Resources. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Informatica Customer Portal. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Informatica Documentation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Informatica Web Site. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Informatica How-To Library. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vi
Informatica Knowledge Base. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica Multimedia Knowledge Base. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Informatica Global Customer Support. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
Part I: Informatica Data Quality Concepts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Chapter 1: Introduction to Data Quality. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Data Quality Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Chapter 2: Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
Reference Data Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
User-Defined Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Informatica Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Reference Data and Transformations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Reference Table Structure. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Managed and Unmanaged Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Content Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Character Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Classifier Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Pattern Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
Probabilistic Models. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
Regular Expressions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Token Sets. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
Creating a Content Set. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Creating a Reusable Content Expression. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
Part II: Data Quality Features in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Chapter 3: Column Profiles in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . 19
Column Profile Concepts Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
Column Profile Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Scorecards. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
Column Profiles in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Filtering Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Sampling Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Creating a Single Data Object Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
Chapter 4: Column Profile Results in Informatica Developer. . . . . . . . . . . . . . . . . . . 23
Column Profile Results in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
Column Value Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Column Pattern Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Column Statistics Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
Exporting Profile Results from Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Chapter 5: Rules in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Rules in Informatica Developer Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Creating a Rule in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
Applying a Rule in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
Chapter 6: Scorecards in Informatica Developer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Scorecards in Informatica Developer Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Creating a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
Chapter 7: Mapplet and Mapping Profiling. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Mapplet and Mapping Profiling Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Running a Profile on a Mapplet or Mapping Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Comparing Profiles for Mapping or Mapplet Objects. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Generating a Mapping from a Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Chapter 8: Reference Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Reference Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Reference Table Data Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
Creating a Reference Table Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Creating a Reference Table from a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
Creating a Reference Table from a Relational Source . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
Copying a Reference Table in the Model Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
Part III: Data Quality Features in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
Chapter 9: Column Profiles in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Column Profiles in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
Column Profiling Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Profile Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Profile Results Option. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Sampling Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Drilldown Options. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
Creating a Column Profile in the Analyst Tool. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
Editing a Column Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Running a Profile. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Creating a Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
Managing Filters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Synchronizing a Flat File Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
Synchronizing a Relational Data Object. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Chapter 10: Column Profile Results in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . 45
Column Profile Results in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Profile Summary. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
Column Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Column Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Column Statistics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Column Profile Drilldown. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Drilling Down on Row Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Applying Filters to Drilldown Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Column Profile Export Files in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
Profile Export Results in a CSV File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Profile Export Results in Microsoft Excel. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Exporting Profile Results from Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
Chapter 11: Rules in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Rules in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Predefined Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Predefined Rules Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Applying a Predefined Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Expression Rules. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Expression Rules Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Creating an Expression Rule. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
Chapter 12: Scorecards in Informatica Analyst. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Scorecards in Informatica Analyst Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Informatica Analyst Scorecard Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Metrics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Metric Weights. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Adding Columns to a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
Running a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Viewing a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Editing a Scorecard. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Defining Thresholds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Metric Groups. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Drilling Down on Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Viewing Trend Charts. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Notification Email Message Template. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Setting Up Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Configuring Global Settings for Scorecard Notifications. . . . . . . . . . . . . . . . . . . . . . . . . . . 63
Scorecard Integration with External Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
Viewing a Scorecard in External Applications. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
Chapter 13: Exception Record Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Exception Record Management Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Exception Management Process Flow. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Reserved Column Names . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Exception Management Tasks. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Viewing and Editing Bad Records. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Updating Bad Record Status. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
Viewing and Filtering Duplicate Record Clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Editing Duplicate Record Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Consolidating Duplicate Record Clusters. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
Viewing the Audit Trail. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
Chapter 14: Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Reference Tables Overview. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
Reference Table Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
General Reference Table Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Reference Table Column Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
Create Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
Creating a Reference Table in the Reference Table Editor. . . . . . . . . . . . . . . . . . . . . . . . . 73
Create a Reference Table from Profile Data. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Creating a Reference Table from Profile Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Creating a Reference Table from Column Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
Creating a Reference Table from Column Patterns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
Create a Reference Table From a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Analyst Tool Flat File Properties. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
Creating a Reference Table from a Flat File. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
Create a Reference Table from a Database Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Creating a Database Connection. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Creating a Reference Table from a Database Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
Copying a Reference Table in the Model Repository. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Reference Table Management. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Managing Columns. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Managing Rows. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
iv
Table of Contents
Finding and Replacing Values. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
Exporting a Reference Table. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Audit Trail Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
Viewing Audit Trail Events. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Rules and Guidelines for Reference Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Preface
The Informatica Data Quality User Guide is written for Informatica users who create and run data quality
processes in the Informatica Developer and Informatica Analyst client applications. The Informatica Data Quality
User Guide contains information about profiles and other objects that you can use to analyze the content and
structure of data and to find and fix data quality issues.
Informatica Resources
Informatica Customer Portal
As an Informatica customer, you can access the Informatica Customer Portal site at
http://mysupport.informatica.com. The site contains product information, user group information, newsletters,
access to the Informatica customer support case management system (ATLAS), the Informatica How-To Library,
the Informatica Knowledge Base, the Informatica Multimedia Knowledge Base, Informatica Product
Documentation, and access to the Informatica user community.
Informatica Documentation
The Informatica Documentation team takes every effort to create accurate, usable documentation. If you have
questions, comments, or ideas about this documentation, contact the Informatica Documentation team through
email at [email protected]. We will use your feedback to improve our documentation. Let us
know if we can contact you regarding your comments.
The Documentation team updates documentation as needed. To get the latest documentation for your product,
navigate to Product Documentation from http://mysupport.informatica.com.
Informatica Web Site
You can access the Informatica corporate web site at http://www.informatica.com. The site contains information
about Informatica, its background, upcoming events, and sales offices. You will also find product and partner
information. The services area of the site includes important information about technical support, training and
education, and implementation services.
Informatica How-To Library
As an Informatica customer, you can access the Informatica How-To Library at http://mysupport.informatica.com.
The How-To Library is a collection of resources to help you learn more about Informatica products and features. It
includes articles and interactive demonstrations that provide solutions to common problems, compare features and
behaviors, and guide you through performing specific real-world tasks.
Informatica Knowledge Base
As an Informatica customer, you can access the Informatica Knowledge Base at http://mysupport.informatica.com.
Use the Knowledge Base to search for documented solutions to known technical issues about Informatica
products. You can also find answers to frequently asked questions, technical white papers, and technical tips. If
you have questions, comments, or ideas about the Knowledge Base, contact the Informatica Knowledge Base
team through email at [email protected].
Informatica Multimedia Knowledge Base
As an Informatica customer, you can access the Informatica Multimedia Knowledge Base at
http://mysupport.informatica.com. The Multimedia Knowledge Base is a collection of instructional multimedia files
that help you learn about common concepts and guide you through performing specific tasks. If you have
questions, comments, or ideas about the Multimedia Knowledge Base, contact the Informatica Knowledge Base
team through email at [email protected].
Informatica Global Customer Support
You can contact a Customer Support Center by telephone or through the Online Support. Online Support requires
a user name and password. You can request a user name and password at http://mysupport.informatica.com.
Use the following telephone numbers to contact Informatica Global Customer Support:
North America / South America
Toll Free: Brazil: 0800 891 0202; Mexico: 001 888 209 8853; North America: +1 877 463 2435

Europe / Middle East / Africa
Toll Free: France: 0805 804632; Germany: 0800 5891281; Italy: 800 915 985; Netherlands: 0800 2300001; Portugal: 800 208 360; Spain: 900 813 166; Switzerland: 0800 463 200; United Kingdom: 0800 023 4632
Standard Rate: Belgium: +31 30 6022 797; France: +33 1 4138 9226; Germany: +49 1805 702 702; Netherlands: +31 306 022 797; United Kingdom: +44 1628 511445

Asia / Australia
Toll Free: Australia: 1 800 151 830; New Zealand: 09 9 128 901
Standard Rate: India: +91 80 4112 5738
Part I: Informatica Data Quality
Concepts
This part contains the following chapters:
¨ Introduction to Data Quality, 2
¨ Reference Data, 4
CHAPTER 1
Introduction to Data Quality
This chapter includes the following topic:
¨ Data Quality Overview, 2
Data Quality Overview
Use Informatica Data Quality to analyze the content and structure of your data and enhance the data in ways that
meet your business needs.
You use Informatica applications to design and run processes to complete the following tasks:
¨ Profile data. Profiling reveals the content and structure of data. Profiling is a key step in any data project, as it
can identify strengths and weaknesses in data and help you define a project plan.
¨ Create scorecards to review data quality. A scorecard is a graphical representation of the quality
measurements in a profile.
¨ Standardize data values. Standardize data to remove errors and inconsistencies that you find when you run a
profile. You can standardize variations in punctuation, formatting, and spelling. For example, you can ensure
that the city, state, and ZIP code values are consistent.
¨ Parse data. Parsing reads a field composed of multiple values and creates a field for each value according to
the type of information it contains. Parsing can also add information to records. For example, you can define a
parsing operation to add units of measurement to product data.
¨ Validate postal addresses. Address validation evaluates and enhances the accuracy and deliverability of postal
address data. Address validation corrects errors in addresses and completes partial addresses by comparing
address records against address reference data from national postal carriers. Address validation can also add
postal information that speeds mail delivery and reduces mail costs.
¨ Find duplicate records. Duplicate analysis calculates the degrees of similarity between records by comparing
data from one or more fields in each record. You select the fields to be analyzed, and you select the
comparison strategies to apply to the data. The Developer tool enables two types of duplicate analysis: field
matching, which identifies similar or duplicate records, and identity matching, which identifies similar or
duplicate identities in record data.
¨ Manage exceptions. An exception is a record that contains data quality issues that you correct by hand. You
can run a mapping to capture any exception record that remains in a data set after you run other data quality
processes. You review and edit exception records in the Analyst tool or in Informatica Data Director for Data
Quality.
¨ Create reference data tables. Informatica provides reference data that can enhance several types of data
quality process, including standardization and parsing. You can create reference tables using data from profile
results.
¨ Create and run data quality rules. Informatica provides rules that you can run or edit to meet your project
objectives. You can create mapplets and validate them as rules in the Developer tool.
¨ Collaborate with Informatica users. The Model repository stores reference data and rules, and this repository is
available to users of the Developer tool and Analyst tool. Users can collaborate on projects, and different users
can take ownership of objects at different stages of a project.
¨ Export mappings to PowerCenter. You can export and run mappings in PowerCenter. You can export mappings
to PowerCenter to reuse the metadata for physical data integration or to create web services.
CHAPTER 2
Reference Data
This chapter includes the following topics:
¨ Reference Data Overview, 4
¨ User-Defined Reference Data, 5
¨ Informatica Reference Data, 6
¨ Reference Data and Transformations, 6
¨ Reference Tables, 7
¨ Content Sets, 8
Reference Data Overview
A reference data object contains a set of data values that you can use to perform search operations in source data. You can
create reference data objects in the Developer tool and Analyst tool, and you can import reference data objects to
the Model repository. The Data Quality Content installer includes reference data objects that you can import.
You can create and edit the following types of reference data:
Reference tables
A reference table contains standard and alternative versions of a set of data values. You add a reference
table to a transformation in the Developer tool to verify that source data values are accurate and correctly
formatted.
A database table contains at least two columns. One column contains the standard or preferred version of a
string, and other columns contain alternative versions. When you add a reference table to a transformation,
the transformation searches the input port data for values that also appear in the table. You can create tables
with any data that is useful to the data project you work on.
Content Sets
Content sets are repository and file objects that contain reference data values. Content sets are similar in
structure to reference tables, but they are more commonly used for lower-level data operations. There are different types of
content sets. When you add a content set to a transformation, the transformation searches the input port data
for values that appear in the content set or for strings that match the data patterns defined in the content set.
You download the Data Quality Content Installer from Informatica.
The Data Quality Content installer includes the following types of reference data:
Informatica reference tables
Database tables created by Informatica. You import Informatica reference tables when you import accelerator
objects from the Content Installer. The reference tables contain standard and alternative versions of common
business terms from several countries. The types of reference information include telephone area codes,
postcode formats, first names, Social Security number formats, occupations, and acronyms. You can edit
Informatica reference tables.
Informatica content sets
Content sets created by Informatica. You import content sets when you import accelerator objects from the
Content Installer. A content set contains different types of reference data that you can use to perform search
operations in data quality transformations.
Address reference data files
Reference data files that identify all valid addresses in a country. The Address Validator transformation reads
this data. You cannot create or edit address reference data files.
The Content Installer installs files for the countries that you have purchased. Address reference data is
current for a defined period and you must refresh your data regularly, for example every quarter. You cannot
view or edit address reference data.
Identity population files
Contain information on types of personal, household, and corporate identities. The Match transformation and
the Comparison transformation use this data to parse potential identities from input fields. You cannot create
or edit identity population files.
The Content Installer writes population files to the file system.
User-Defined Reference Data
You can use the values in a data object to create a reference data object.
For example, you can select a data object or profile column that contains values that are specific to a project or
organization. The column values let you create custom reference data objects for a project.
You can build a reference data object from a data column in the following cases:
¨ The data rows in the column contain the same type of information.
¨ The column contains a set of data values that are either correct or incorrect for the project.
Note: Create a reference object with incorrect values when you want to search a data set for incorrect values.
The following table lists common examples of project data columns that can contain reference data:

Stock Keeping Unit (SKU) codes
Use an SKU column to create a reference table of valid SKU codes for an organization. Use the reference table to find correct or incorrect SKU codes in a data set.

Employee codes
Use an employee code or employee ID column to create a reference table of valid employee codes. Use the reference table to find errors in employee data.

Customer account numbers
Run a profile on a customer account column to identify account number patterns. Use the profile to create a token set of incorrect data patterns. Use the token set to find account numbers that do not conform to the correct account number structure.

Customer names
When a customer name column contains first, middle, and last names, you can create a probabilistic model that defines the expected structure of the strings in the column. Use the probabilistic model to find data strings that do not belong in the column.
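The account-number example can be sketched in a few lines of Python. The three-letters-then-six-digits structure is a hypothetical account format chosen for illustration; a real token set lists the incorrect patterns to search for.

```python
import re

# Hypothetical account-number structure: three uppercase letters
# followed by six digits. Anything else is nonconforming.
valid_pattern = re.compile(r"^[A-Z]{3}[0-9]{6}$")

accounts = ["ABC123456", "AB12345", "XYZ000001", "123ABC456"]

# Collect the account numbers that do not match the expected structure.
nonconforming = [a for a in accounts if not valid_pattern.match(a)]
print(nonconforming)  # ['AB12345', '123ABC456']
```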
Informatica Reference Data
You purchase and download address reference data and identity population data from Informatica. You purchase
an annual subscription to address data for a country, and you can download the latest address data from
Informatica at any time during the subscription period.
The Content Installer user downloads and installs reference data separately from the applications. Contact an
Administrator tool user for information about the reference data installed on your system.
Reference Data and Transformations
Several transformations read reference data to perform data quality tasks.
The following transformations can read reference data:
¨ Address Validator. Reads address reference data to verify the accuracy of addresses.
¨ Case Converter. Reads reference data tables to identify strings that must change case.
¨ Classifier. Reads content set data to identify the type of information in a string.
¨ Comparison. Reads identity population data during duplicate analysis.
¨ Labeler. Reads content set data to identify and label strings.
¨ Match. Reads identity population data during duplicate analysis.
¨ Parser. Reads content set data to parse strings based on the information they contain.
¨ Standardizer. Reads reference data tables to standardize strings to a common format.
You can create reference data objects in the Developer tool and Analyst tool. For example, you can create a
reference table from column profile data. You can export reference tables to the file system.
The Data Quality Content Installer file set includes Informatica reference data objects that you can import.
Reference Tables
A reference table contains the standard versions of a set of data values and any alternative version of the values
that you may want to find. You add reference tables to transformations in the Developer tool.
You create reference tables in the following ways:
¨ Create a reference table object and enter data values.
¨ Create a reference table from column profile results.
¨ Create a reference table from data in a flat file.
¨ Create a reference table from data in another database table.
When you create a reference table, the Model repository stores the table metadata. The staging database or
another database stores the column data values. After you create a reference table, you can add and edit
columns, rows, and data values. You can also search and replace values in reference table rows.
Reference Table Structure
Most reference tables contain at least two columns. One column contains the correct or required versions of the
data values. Other columns contain different versions of the values, including alternative versions that may appear
in the source data.
The column that contains the correct or required values is called the valid column. When a transformation reads a
reference table in a mapping, the transformation looks for values in the non-valid columns. When the
transformation finds a non-valid value, it returns the corresponding value from the valid column. You can also
configure a transformation to return a single common value instead of the valid values.
The valid column can contain data that is formally correct, such as ZIP codes. It can contain data that is relevant
to a project, such as stock keeping unit (SKU) numbers that are unique to an organization. You can also create a
valid column from bad data, such as values that contain known data errors that you want to search for.
For example, a Developer tool user creates a reference table that contains a list of valid SKU numbers in a retail
organization. The user adds the reference table to a Labeler transformation and creates a mapping with the
transformation. The user runs the mapping on a product database table. When the mapping runs, the Labeler
creates a column that identifies the product records that do not contain valid SKU numbers.
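The lookup behavior can be sketched in a few lines of Python. This is a hypothetical illustration of the concept, not Informatica's implementation; the table rows and values are invented.

```python
# Hypothetical sketch of reference-table lookup: each row holds one
# valid value plus the alternative versions that may appear in source data.
reference_rows = [
    {"valid": "Street", "alternatives": ["St", "St.", "Str"]},
    {"valid": "Avenue", "alternatives": ["Ave", "Ave.", "Av"]},
]

# Build a lookup from every non-valid value to its valid counterpart.
lookup = {
    alt: row["valid"]
    for row in reference_rows
    for alt in row["alternatives"]
}

def standardize(value):
    """Return the valid version of a value, or the value unchanged."""
    return lookup.get(value, value)

print(standardize("St."))   # Street
print(standardize("Main"))  # Main (not in the table, passed through)
```

A transformation configured to return a single common value would return one fixed string instead of `row["valid"]`.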
Reference Tables and the Parser Transformation
You create a reference table with a single column when you want to use the table data in a pattern-based parsing
operation. You configure the Parser transformation to perform pattern-based parsing, and you import the data to
the transformation configuration.
Managed and Unmanaged Reference Tables
Reference tables store metadata in the Model repository. Reference tables can store column data in the reference
data database or in another database. The Content Management Service stores the database connection for the
reference data database.
A managed reference table stores column data in the reference data database. You can edit the values of a
managed table in the Analyst tool and Developer tool.
An unmanaged reference table stores column data in a database other than the reference data database. You
cannot edit the values of an unmanaged table in the Analyst tool or Developer tool.
Content Sets
A content set is a Model repository object that you use to store reusable content expressions. A content
expression is an expression that you can use in Labeler and Parser transformations to identify data.
You can create content sets to organize content expressions into logical groups. For example, if you create a
number of content expressions that identify Portuguese strings, you can create a content set that groups these
content expressions. Create content sets in the Developer tool.
Content expressions include character sets, pattern sets, regular expressions, and token sets. Content
expressions can be system-defined or user-defined. System-defined content expressions cannot be added to
content sets. User-defined content expressions can be reusable or non-reusable.
Character Sets
A character set contains expressions that identify specific characters and character ranges. You can use character
sets in Labeler transformations that use character labeling mode.
Character ranges specify a sequential range of character codes. For example, the character range "[A-C]"
matches the uppercase characters "A," "B," and "C." This character range does not match the lowercase
characters "a," "b," or "c."
Use character sets to identify a specific character or range of characters as part of labeling operations. For
example, you can label all numerals in a column that contains telephone numbers. After labeling the numbers, you
can identify patterns with a Parser transformation and write problematic patterns to separate output ports.
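As a rough analogy, a character range behaves like a regular-expression character class. The following Python sketch is an illustration of the concept only, using the standard `re` module rather than the product itself; it shows the "[A-C]" example and a digit-labeling operation like the one described above.

```python
import re

# A character range such as [A-C] matches only the uppercase
# characters A, B, and C, not their lowercase counterparts.
assert re.findall(r"[A-C]", "Abc Cab") == ["A", "C"]

# Labeling every numeral in a telephone-number string: replacing each
# digit with a label character ("N") makes the underlying pattern visible.
def label_digits(value):
    return re.sub(r"[0-9]", "N", value)

print(label_digits("+1 877 463 2435"))  # +N NNN NNN NNNN
```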
Character Set Properties
Configure properties that determine character labeling operations for a character set.
The following table describes the properties for a user-defined character set:
Label
Defines the label that a Labeler transformation applies to data that matches the character set.

Standard Mode
Enables a simple editing view that includes fields for the start range and end range.

Start Range
Specifies the first character in a character range.

End Range
Specifies the last character in a character range. For a range with a single character, leave this field blank.

Advanced Mode
Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.

Range Character
Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.

Delimiter Character
Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.
Classifier Models
A classifier model analyzes input strings and determines the types of information they contain. You use a classifier
model in a Classifier transformation.
You can use a classifier model when input strings contain significant amounts of data. For example, you can use a
classifier model and Classifier transformation to identify the types of information in a set of documents. You export
the text from each document, and you store the text of each document as a separate field in a single data column.
The Classifier transformation reads the data and classifies the information in each field according to the labels
defined in the model.
The classifier model contains the following columns:
¨ A column that contains the words and phrases that may exist in the input data. The transformation compares
the input data with the data in this column.
¨ A column that contains descriptive labels that may define the information in the data. The transformation
returns a label from this column as output.
The classifier model also contains logic that the Classifier transformation uses to calculate the correct information
type for the input data.
The Model repository stores the metadata for the classifier model object. The column data and logic are stored in a
file in the Informatica installation directory structure.
Note: You cannot create or edit a classifier model in the Developer tool.
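A heavily simplified sketch of the classification idea follows. The labels, terms, and scoring rule here are hypothetical; the real classifier model uses compiled classification logic rather than simple term counting.

```python
# Hypothetical sketch of classification: the model pairs words and
# phrases with descriptive labels, and the transformation returns the
# label whose terms best cover the input field.
model = {
    "contract": ["agreement", "party", "terms", "liability"],
    "invoice": ["invoice", "amount due", "payment", "total"],
}

def classify(text):
    """Return the label whose associated terms appear most often."""
    text = text.lower()
    scores = {
        label: sum(term in text for term in terms)
        for label, terms in model.items()
    }
    return max(scores, key=scores.get)

print(classify("Total amount due on this invoice: 400.00"))  # invoice
```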
Classifier Models and the Core Accelerator
Informatica includes a classifier model in the set of prebuilt mappings and reference data objects called the Core
Accelerator. The Core Accelerator is part of the Informatica Data Quality product. You download the Core
Accelerator from Informatica with the Data Quality Content Installer.
When you download the Data Quality Content Installer, find the Core Accelerator xml file in the Content Installer
file set. Use the Developer tool to import the accelerator objects. The import operation writes the model object to
the Model repository and the model data file to the Informatica file system.
Pattern Sets
A pattern set contains expressions that identify data patterns in the output of a token labeling operation. You can
use pattern sets to analyze the Tokenized Data output port and write matching strings to one or more output ports.
Use pattern sets in Parser transformations that use pattern parsing mode.
For example, you can configure a Parser transformation to use pattern sets that identify names and initials. This
transformation uses the pattern sets to analyze the output of a Labeler transformation in token labeling mode. You
can configure the Parser transformation to write names and initials in the output to separate ports.
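The name-and-initial example can be sketched as follows. The labels WORD and INIT and the routing logic are hypothetical stand-ins for token labeling output and pattern sets, not Informatica syntax.

```python
# Hypothetical sketch: route strings by the pattern of their token labels.
patterns = {
    "WORD INIT WORD": "name_with_initial",
    "WORD WORD": "name",
}

def label_tokens(value):
    """Assign a simple label to each token: INIT for single letters,
    WORD otherwise (a stand-in for token labeling output)."""
    labels = ["INIT" if len(tok) == 1 else "WORD" for tok in value.split()]
    return " ".join(labels)

def route(value):
    """Return the output port a string would be written to."""
    return patterns.get(label_tokens(value), "overflow")

print(route("John B Smith"))  # name_with_initial
print(route("John Smith"))    # name
```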
Pattern Set Properties
Configure properties that determine the patterns in a pattern set.
The following table describes the property for a user-defined pattern set:

Pattern
Defines the patterns that the pattern parser searches for. You can enter multiple patterns for one pattern set. You can enter patterns constructed from a combination of wildcards, characters, and strings.
Probabilistic Models
A probabilistic model identifies tokens by the types of information they contain and by their positions in an input
string.
You use probabilistic models with the Labeler and Parser transformations. Select a probabilistic model when you
want to label or parse values on an input port into separate output ports.
A probabilistic model uses a structured set of tokens as a reference data set. A labeling or parsing operation can
use a probabilistic model to answer the following questions about the data that it reads on a port:
¨ Does the port data contain a token that matches the reference data in the model?
¨ What type of information does the token contain?
A probabilistic model contains the following columns:
¨ An input column that represents the data on the input port. You populate the column with sample data from the
input port. The model uses the sample data as reference data in parsing and labeling operations.
¨ One or more label columns that identify the types of information in each input string.
You add the columns to the model, and you assign labels to the tokens in each string. Use the label columns to
indicate the correct position of the tokens in the string.
The following figure shows a probabilistic model in the Developer tool:
When you configure a token labeling operation with a probabilistic model, the Labeler transformation writes the
column name from the probabilistic model to an output port on the transformation. For example, the Labeler can
use a probabilistic model to label the string "Franklin Delano Roosevelt" as "FIRSTNAME MIDDLENAME
LASTNAME."
When you configure a token parsing operation with a probabilistic model, each column you add to the model
becomes an output port on the Parser transformation. The transformation writes each token to an output port
based on its position in the model.
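The column-to-port behavior can be sketched in Python. A real probabilistic model infers labels with compiled fuzzy logic; this hypothetical sketch substitutes an exact token-to-label dictionary to show how tokens reach output ports.

```python
# Hypothetical sketch: parse tokens into output "ports" by label.
# A real probabilistic model infers labels with fuzzy logic; this
# sketch uses an exact token-to-label dictionary instead.
token_labels = {
    "Franklin": "FIRSTNAME",
    "Delano": "MIDDLENAME",
    "Roosevelt": "LASTNAME",
}

def parse(value, overflow="O"):
    """Write each token to the port that matches its label.
    Unrecognized tokens go to the overflow port (default name O)."""
    ports = {"FIRSTNAME": [], "MIDDLENAME": [], "LASTNAME": [], overflow: []}
    for token in value.split():
        ports[token_labels.get(token, overflow)].append(token)
    return {port: " ".join(tokens) for port, tokens in ports.items()}

print(parse("Franklin Delano Roosevelt"))
# {'FIRSTNAME': 'Franklin', 'MIDDLENAME': 'Delano', 'LASTNAME': 'Roosevelt', 'O': ''}
```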
Probabilistic Logic
Probabilistic models behave differently from other types of content sets.
Data Quality can infer a match between the input port data values and the model data values even if the port data
is not listed in the model. This means that a probabilistic model does not need to list every token in a data set to
correctly label or parse the tokens in the data set.
Data Quality uses probabilistic or fuzzy logic to identify tokens on the transformation input port that match tokens
in the probabilistic model. The engine updates the fuzzy logic rules when you compile the probabilistic model.
Probabilistic Model Advanced Properties
The Advanced Properties dialog box exposes the computational properties that are built into a probabilistic model
when you compile the model.
The basic element in the compilation of probabilistic models is the n-gram. An n-gram is a series of letters that can
be followed or preceded by one or more letters to complete a word. Probabilistic analysis creates n-grams for each
value in the Input column of the probabilistic model. The analysis adds one or more letters to each n-gram to
create different words. If the probabilistic analysis can create a word that matches a value on a Labeler or Parser
transformation input port, then the analysis determines that the Input value in the probabilistic model matches the
input value on the transformation port.
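The n-gram concept can be illustrated with a short sketch. This is a general illustration of how n-grams support near-matching, not Informatica's internal compilation algorithm; the example values are invented.

```python
def ngrams(value, n):
    """Return every contiguous series of n letters in a value."""
    return [value[i:i + n] for i in range(len(value) - n + 1)]

# Each trigram of "Smith" can be extended with letters on either side
# to rebuild the word.
print(ngrams("Smith", 3))  # ['Smi', 'mit', 'ith']

# A variant spelling still shares some n-grams with the original value,
# which is the basis for inferring that the two values match.
shared = set(ngrams("Smith", 2)) & set(ngrams("Smyth", 2))
print(sorted(shared))  # ['Sm', 'th']
```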
The advanced properties on a probabilistic model determine how the probabilistic model handles n-grams and
other model features.
Note: The default property values represent the preferred settings for probabilistic analysis and probabilistic model
compilation in Informatica. If you edit an advanced property, you may adversely affect the accuracy of the
probabilistic analysis. Do not edit the advanced properties unless you understand the effects of the changes you
make.
Steps to Create a Probabilistic Model
You create a probabilistic model in multiple stages. Complete the tasks associated with each stage to create and
configure a model that you can use in a transformation.
Complete the following tasks:
Create the probabilistic model object in the repository
You can use a data object to create the model, or you can create an empty model.
Assign labels to the input data
If the probabilistic model does not contain labels for the input data values, you must assign the labels.
Compile the probabilistic model
When you have entered the input data and configured the labels, you compile the model. You compile every
time you edit the model.
Creating an Empty Probabilistic Model
You can use a data object as the source for the data in a probabilistic model, or you can create an empty model.
Create an empty probabilistic model when you want to enter the reference data at a later time.
Complete the following steps to create an empty probabilistic model:
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Probabilistic Models, and click Add.
The Probabilistic Model wizard opens.
4. Select the Probabilistic Model option. Click Next.
5. Enter a name for the model. Click Finish and save the model.
The probabilistic model opens in the Developer tool.
After you create the empty model, you must add input data.
Creating a Probabilistic Model from a Data Object
You can use a data object as the source for the data in a probabilistic model. For example, use the source data
object from the mapping that will read the probabilistic model. You can also profile an object in the mapping and
create a data object from the profile results.
Probabilistic model logic works best when you use data from the input port on the transformation to populate the
input and label columns in the model.
Complete the following steps to create a probabilistic model from a data object:
1. In Object Explorer, open or create a content set.
2. Select the Content view.
3. Select Probabilistic Models, and click Add.
The Probabilistic Model wizard opens.
4. Select the Probabilistic Model from Data Objects option. Click Next.
5. Enter a name for the model, and browse to the data object you want to use. Click Next.
6. Review the available data columns on the data object, and select a column to add as input data or label data to the model.
¨ To add a data source column to the Input column in the model, select the column name and click Data >.
¨ To use a data source column as a label source for the model, select the column name and click Label >.
Click Next.
7. Select the number of rows to copy from the data source. Select all rows, or enter the number of rows to copy. If you enter a number, the model counts the rows from the start of the data set.
8. Set the delimiters to use for the Input column and Data columns. The delimiters apply when the columns contain multiple tokens. The default delimiter is \s, which represents a character space.
9. Enter a name for a column to contain any token that the labeling or parsing operation cannot recognize. The default name is O, which stands for Overflow.
10. Click Finish and save the model.
The probabilistic model opens in the Developer tool.
11. Click Compile to build the probabilistic logic rules for the model.
Assigning Labels to Probabilistic Model Data
If the data object you use to create the probabilistic model does not contain columns for label data, you must add
the data.
A label is a column name in the probabilistic model. The model uses the column name to identify different types of
information in the input data. You create the label columns, and you assign a label to each token in each input
row. When you assign a label to a token, the model adds the token to the label column.
Follow these guidelines when you assign labels to input data:
- A label identifies the type of information that the token represents. A token may represent multiple types of information if it appears in multiple locations in the input string. For example, you can assign the labels FIRSTNAME LASTNAME to the names "John Blake" and "Blake Smith."
- You must assign a label to every token in every row, even if the tokens repeat in multiple rows.
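The guidelines above can be illustrated with a short sketch (Python used for illustration; the rows and label names follow the FIRSTNAME/LASTNAME example):

```python
# Every token in every row receives a label, and the same token can take
# different labels in different positions: "Blake" is a last name in row 1
# and a first name in row 2.
rows = ["John Blake", "Blake Smith"]
labels = [["FIRSTNAME", "LASTNAME"], ["FIRSTNAME", "LASTNAME"]]

# Pair each token with its label; the model adds the token to the label column.
label_columns = {}
for row, row_labels in zip(rows, labels):
    for token, label in zip(row.split(), row_labels):
        label_columns.setdefault(label, []).append(token)

print(label_columns)
# "Blake" appears under both FIRSTNAME and LASTNAME
```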
Complete the following steps to assign labels to input data:
1. Open the probabilistic model in the Developer tool canvas.
2. Verify that the model contains the input data and label columns that you need.
   a. To add a row of input data, click New. The cursor moves to the first available row in the input data column. Enter the input data values.
   b. To add a label column, right-click an input data row and select New Label. Enter a column name in the New Label dialog box. The label appears in the model.
3. Right-click an input data row and select View tokens and labels as rows.
   The Labels panel displays under the input data column.
Note: A label is a structural element in a probabilistic model. If you add or remove a label in a probabilistic model
after you add the model to a Parser transformation, you invalidate the parsing operation that uses the model. You
must delete and recreate the operation that uses the probabilistic model if you add or remove a label in the model.
Compiling the Probabilistic Model
Each time you add data to a probabilistic model, you must compile the model. This enhances the matching logic in
the Data Quality engine.
To update the fuzzy logic that the engine uses for a probabilistic model, open the model and click Compile.
Generating Probabilistic Model Data from a Midstream Profile
You can run a profile on mapping data to create a data source for a probabilistic model. For example, run a profile on the transformation that you connect to the Labeler or Parser transformation, and populate the model with the profile data. This ensures that the model data is as close as possible to the data on the input port you select in the Labeler or Parser transformation.
Complete the following steps to run a midstream mapping profile and generate input data for a probabilistic model:
1. Open the mapping that contains the transformation you will connect to the Labeler or Parser.
2. Select a data object and click Profile Now. Select the Results tab in the profile, and review the profile results.
3. Under Column Profiling, select the column you want to add to the probabilistic model.
4. Under Details, select the option to Show Values.
   The editor displays the data values in the column you selected.
   Note: You can select all values in the column or a subset of values.
5. If you want to add a subset of column values to a probabilistic model, follow these steps:
   a. Use the Shift or Ctrl keys to select one or multiple values from the editor.
   b. Right-click the values and select Send to > Export Results to File.
6. If you want to add all column values to a probabilistic model, click the option to Export Value Frequencies to File.
7. In the Export dialog box, enter a file name. You can save the file on the Informatica services machine or on the Developer client machine. If you save the file on the client machine, enter a path to the file.
You can use the file as a data source for the Label or Data column in the probabilistic model.
Regular Expressions
In the context of content sets, a regular expression is an expression that you can use in parsing and labeling
operations. Use regular expressions to identify one or more strings in input data. You can use regular expressions
in Parser transformations that use token parsing mode. You can also use regular expressions in Labeler
transformations that use token labeling mode.
Parser transformations use regular expressions to match patterns in input data and parse all matching strings to
one or more outputs. For example, you can use a regular expression to identify all email addresses in input data
and parse each email address component to a different output.
Labeler transformations use regular expressions to match an input pattern and create a single label. Regular
expressions that have multiple outputs do not generate multiple labels.
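As a sketch of the difference, the following Python fragment uses a simplified (not fully RFC-compliant) email pattern with two capture groups. A parser-style pass routes each component to its own output, while a labeler-style pass produces a single label per match regardless of the number of groups.

```python
import re

# Simplified email pattern with two capture groups (account, domain), so a
# parser-style operation can route each component to its own output.
EMAIL = re.compile(r"([\w.+-]+)@([\w-]+\.[\w.]+)")

text = "Contact jsmith@example.com or sales@example.org for details."

# Parser-style use: each match yields one string per output.
for account, domain in EMAIL.findall(text):
    print(account, domain)

# Labeler-style use: the whole match gets one label; the groups are ignored.
labeled = [(m.group(0), "EMAIL") for m in EMAIL.finditer(text)]
print(labeled)
```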
Regular Expression Properties
Configure properties that determine how a regular expression identifies and writes output strings.
The following list describes the properties for a user-defined regular expression:
- Number of Outputs: Defines the number of output ports that the regular expression writes.
- Regular Expression: Defines a pattern that the Parser transformation uses to match strings.
- Test Expression: Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
- Next Expression: Moves to the next string that matches the regular expression and changes the font of that string to bold.
- Previous Expression: Moves to the previous string that matches the regular expression and changes the font of that string to bold.
Token Sets
A token set contains expressions that identify specific tokens. You can use token sets in Labeler transformations
that use token labeling mode. You can also use token sets in Parser transformations that use token parsing mode.
Use token sets to identify specific tokens as part of labeling and parsing operations. For example, you can use a token set to label all email addresses that use an "AccountName@DomainName" format. After labeling the tokens, you can use the Parser transformation to write email addresses to output ports that you specify.
Token Set Properties
Configure properties that determine the labeling operations for a token set.
The following list describes the properties for a user-defined token set. The token set mode to which each property applies appears in parentheses:
- Name (N/A): Defines the name of the token set.
- Description (N/A): Describes the token set.
- Token Set Options (N/A): Defines whether the token set uses regular expression mode or character mode.
- Label (Regular Expression): Defines the label that a Labeler transformation applies to data that matches the token set.
- Regular Expression (Regular Expression): Defines a pattern that the Labeler transformation uses to match strings.
- Test Expression (Regular Expression): Contains data that you enter to test the regular expression. As you type data in this field, the field highlights strings that match the regular expression.
- Next Expression (Regular Expression): Moves to the next string that matches the regular expression and changes the font of that string to bold.
- Previous Expression (Regular Expression): Moves to the previous string that matches the regular expression and changes the font of that string to bold.
- Label (Character): Defines the label that a Labeler transformation applies to data that matches the character set.
- Standard Mode (Character): Enables a simple editing view that includes fields for the start range and end range.
- Start Range (Character): Specifies the first character in a character range.
- End Range (Character): Specifies the last character in a character range. For single-character ranges, leave this field blank.
- Advanced Mode (Character): Enables an advanced editing view where you can manually enter character ranges using range characters and delimiter characters.
- Range Character (Character): Temporarily changes the symbol that signifies a character range. The range character reverts to the default character when you close the character set.
- Delimiter Character (Character): Temporarily changes the symbol that separates character ranges. The delimiter character reverts to the default character when you close the character set.
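A character-mode token set with a start range and end range can be approximated as follows. This is an illustrative sketch under the assumption that a token matches when all of its characters fall inside the range; it is not the engine's implementation.

```python
# A start range and an end range define the characters that the set matches,
# here the range A through Z.
start_range, end_range = "A", "Z"
charset = {chr(c) for c in range(ord(start_range), ord(end_range) + 1)}

def matches(token, charset):
    """A token matches the character set if every character is in the range."""
    return all(ch in charset for ch in token)

print(matches("ABC", charset))   # True
print(matches("AbC", charset))   # False: lowercase b is outside A-Z
```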
Creating a Content Set
Create content sets to group content expressions according to business requirements. You create content sets in
the Developer tool.
1. In the Object Explorer view, select the project or folder where you want to store the content set.
2. Click File > New > Content Set.
3. Enter a name for the content set.
4. Optionally, select Browse to change the Model repository location for the content set.
5. Click Finish.
Creating a Reusable Content Expression
Create reusable content expressions from within a content set. You can use these content expressions in Labeler
transformations and Parser transformations.
1. Open a content set in the editor and select the Content view.
2. Select a content expression view.
3. Click Add.
4. Enter a name for the content expression.
5. Optionally, enter a text description of the content expression.
6. If you selected the Token Set expression view, select a token set mode.
7. Click Next.
8. Configure the content expression properties.
9. Click Finish.
Tip: You can create content expressions by copying them from another content set. Use the Copy To and Paste
From options to create copies of existing content expressions. You can use the CTRL key to select multiple
content expressions when using these options.
Part II: Data Quality Features in
Informatica Developer
This part contains the following chapters:
- Column Profiles in Informatica Developer
- Column Profile Results in Informatica Developer
- Rules in Informatica Developer
- Scorecards in Informatica Developer
- Mapplet and Mapping Profiling
- Reference Data
CHAPTER 3
Column Profiles in Informatica
Developer
This chapter includes the following topics:
- Column Profile Concepts Overview
- Column Profile Options
- Rules
- Scorecards
- Column Profiles in Informatica Developer
- Creating a Single Data Object Profile
Column Profile Concepts Overview
A column profile determines the characteristics of columns in a data source, such as value frequency,
percentages, and patterns.
Column profiling discovers the following facts about data:
- The number of unique and null values in each column, expressed as a number and a percentage.
- The patterns of data in each column and the frequencies with which these values occur.
- Statistics about the column values, such as the maximum and minimum lengths of values and the first and last values in each column.
Use column profile options to select the columns on which you want to run a profile, set data sampling options,
and set drilldown options when you create a profile.
A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule
to the profile to cleanse, change, or validate data.
Create scorecards to periodically review data quality. You create scorecards before and after you apply rules to
profiles so that you can view a graphical representation of the valid values for columns.
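A minimal sketch of the column profile facts described above, computed for one sample column (Python for illustration; None stands in for a null value):

```python
# One column of sample data; None represents a null value.
values = ["NY", "CA", "NY", None, "TX", "NY"]

non_null = [v for v in values if v is not None]
facts = {
    "null_count": values.count(None),
    "null_pct": 100 * values.count(None) / len(values),
    "unique_count": len(set(non_null)),
    "max_length": max(len(v) for v in non_null),
    "min_length": min(len(v) for v in non_null),
    # Value frequencies: how often each distinct value occurs.
    "frequencies": {v: non_null.count(v) for v in set(non_null)},
}
print(facts)
```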
Column Profile Options
When you create a profile with the Column Profiling option, you can use the profile wizard to define filter and
sampling options. These options determine how the profile reads rows from the data set.
After you complete the steps in the profile wizard, you can add a rule to the profile. The rule can have the business
logic to perform data transformation operations on the data before column profiling.
Rules
Create and apply rules within profiles. A rule is business logic that defines conditions applied to data when you run
a profile. Use rules to further validate the data in a profile and to measure data quality progress.
You can add a rule after you create a profile. You can reuse rules created in either the Analyst tool or the Developer tool in both tools. Add rules to a profile by selecting a reusable rule or by creating an expression rule. An expression rule uses both expression functions and columns to define rule logic. After you create an expression rule, you can make the rule reusable.
Create expression rules in the Analyst tool. In the Developer tool, you can create a mapplet and validate the
mapplet as a rule. You can run rules from both the Analyst tool and Developer tool.
Scorecards
A scorecard is the graphical representation of the valid values for a column or output of a rule in profile results.
Use scorecards to measure data quality progress. You can create a scorecard from a profile and monitor the
progress of data quality over time.
A scorecard has multiple components, such as metrics, metric groups, and thresholds. After you run a profile, you
can add source columns as metrics to a scorecard and configure the valid values for the metrics. Use a metric
group to categorize related metrics in a scorecard into a set. A threshold identifies the range, in percentage, of bad
data that is acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable
ranges of data.
When you run a scorecard, you can configure whether you want to drill down on the metrics for a score on the live data or staged data. After you run a scorecard and view the scores, you can drill down on each metric to identify valid data records and records that are not valid. To track data quality effectively, you can use trend charts and monitor how the scores change over a period of time.
The profiling warehouse stores the scorecard statistics and configuration information. You can configure a third-party application to get the scorecard results and run reports. You can also display the scorecard results in a web application, portal, or report such as a business intelligence report.
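The metric and threshold logic described above can be sketched as follows. The threshold values 90 and 70 are hypothetical, not product defaults, and the valid-value check is simplified to an equality test.

```python
# A scorecard metric scores a column as the percentage of valid values and
# classifies the score against good/acceptable/unacceptable thresholds.
def classify(score, good=90.0, acceptable=70.0):
    if score >= good:
        return "Good"
    if score >= acceptable:
        return "Acceptable"
    return "Unacceptable"

values = ["valid", "valid", "valid", "bad", "valid"]
score = 100 * sum(v == "valid" for v in values) / len(values)
print(score, classify(score))  # 80.0 Acceptable
```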
Column Profiles in Informatica Developer
Use a column profile to analyze the characteristics of columns in a data set, such as value percentages and value
patterns. You can add filters to determine the rows that the profile reads at runtime. The profile does not process
rows that do not meet the filter criteria.
You can discover the following types of information about the columns you profile:
- The number of times a value appears in a column.
- The frequency of occurrence of each value in a column, expressed as a percentage.
- The character patterns of the values in a column.
- The maximum and minimum lengths of the values in a column, and the first and last values.
You can define a column profile for a data object in a mapping or mapplet or an object in the Model repository. The
object in the repository can be in a single data object profile, multiple data object profile, or profile model.
You can add rules to a column profile. Use rules to select a subset of source data for profiling. You can also
change the drilldown options for column profiles to determine whether the drilldown reads from staged data or live
data.
Filtering Options
You can add filters to determine the rows that a column profile uses when performing profiling operations. The
profile does not process rows that do not meet the filter criteria.
1. Create or open a column profile.
2. Select the Filter view.
3. Click Add.
4. Select a filter type and click Next.
5. Enter a name for the filter. Optionally, enter a text description of the filter.
6. Select Set as Active to apply the filter to the profile. Click Next.
7. Define the filter criteria.
8. Click Finish.
Sampling Properties
Configure the sampling properties to determine the number of rows that the profile reads during a profiling
operation.
The following list describes the sampling properties:
- All Rows: Reads all rows from the source. Default is enabled.
- First: Reads from the first row up to the row you specify.
- Random Sample of: Reads a random sample from the number of rows that you specify.
- Random Sample (Auto): Reads from a random sample of rows.
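The row-limiting behavior of these sampling modes can be approximated as follows. This is a sketch of the described behavior, not the Data Integration Service's sampling algorithm, and the mode names are shortened for the example.

```python
import random

# Approximation of the sampling modes: read all rows, the first n rows, or a
# random sample of n rows.
def sample_rows(rows, mode="all", n=None, seed=0):
    if mode == "all":
        return list(rows)
    if mode == "first":
        return list(rows)[:n]             # first row up to the row you specify
    if mode == "random":
        rng = random.Random(seed)
        return rng.sample(list(rows), n)  # random sample of n rows
    raise ValueError(mode)

rows = list(range(1, 101))
print(len(sample_rows(rows, "all")))         # 100
print(sample_rows(rows, "first", 5))         # [1, 2, 3, 4, 5]
print(len(sample_rows(rows, "random", 10)))  # 10
```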
Creating a Single Data Object Profile
You can create a single data object profile for one or more columns in a data object and store the profile object in
the Model repository.
1. In the Object Explorer view, select the data object you want to profile.
2. Click File > New > Profile to open the profile wizard.
3. Select Profile and click Next.
4. Enter a name for the profile and verify the project location. If required, browse to a new location.
5. Optionally, enter a text description of the profile.
6. Verify that the name of the data object you selected appears within the Data Objects section.
7. Click Next.
8. Configure the profile operations that you want to perform. You can configure the following operations:
   - Column profiling
   - Primary key discovery
   - Functional dependency discovery
   - Data domain discovery
   Note: To enable a profile operation, select Enabled as part of the "Run Profile" action for that operation. Column profiling is enabled by default.
9. Review the options for your profile.
   You can edit the column selection for all profile types. Review the filter and sampling options for column profiles. You can review the inference options for primary key, functional dependency, and data domain discovery. You can also review data domain selection for data domain discovery.
10. Review the drilldown options, and edit them if necessary. By default, the Enable Row Drilldown option is selected. You can edit drilldown options for column profiles. The options also determine whether drilldown operations read from the data source or from staged data, and whether the profile stores result data from previous profile runs.
11. Click Finish.
The profile is ready to run.
CHAPTER 4
Column Profile Results in
Informatica Developer
This chapter includes the following topics:
- Column Profile Results in Informatica Developer
- Column Value Properties
- Column Pattern Properties
- Column Statistics Properties
- Exporting Profile Results from Informatica Developer
Column Profile Results in Informatica Developer
Column profile analysis provides information about data quality by highlighting patterns and instances of nonconformance in data.
The following list describes the profile results for each type of analysis:
- Column profile: Percentage and count statistics for unique and null values; inferred datatypes; the datatype that the data source declares for the data; the maximum and minimum values; the date and time of the most recent profile run; percentage and count statistics for each unique data element in a column; percentage and count statistics for each unique character pattern in a column.
- Primary key profile: Inferred primary keys; key violations.
- Functional dependency profile: Inferred functional dependencies; functional dependency violations.
Column Value Properties
Column value properties show the values in the profiled columns and the frequency with which each value
appears in each column. The frequencies are shown as a number, a percentage, and a bar chart.
To view column value properties, select Values from the Show menu. Double-click a column value to drill down to the rows that contain the value.
The following list describes the properties for column values:
- Values: List of all values for the column in the profile.
- Frequency: Number of times a value appears in a column.
- Percent: Number of times a value appears in a column, expressed as a percentage of all values in the column.
- Chart: Bar chart for the percentage.
Column Pattern Properties
Column pattern properties show the patterns of data in the profiled columns and the frequency with which the
patterns appear in each column. The patterns are shown as a number, a percentage, and a bar chart.
To view pattern information, select Patterns from the Show menu. Double-click a pattern to drill down to the rows that contain the pattern.
The following list describes the properties for column value patterns:
- Patterns: Pattern for the selected column.
- Frequency: Number of times a pattern appears in a column.
- Percent: Number of times a pattern appears in a column, expressed as a percentage of all values in the column.
- Chart: Bar chart for the percentage.
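Character-pattern profiling of this kind can be sketched as follows. The X/9 pattern alphabet is illustrative and may differ from the tool's exact notation.

```python
from collections import Counter

# Reduce each value to a character pattern: X for a letter, 9 for a digit,
# and any other character kept as-is. Then count pattern frequencies.
def pattern(value):
    return "".join("X" if c.isalpha() else "9" if c.isdigit() else c
                   for c in value)

values = ["60601", "60614", "1A2B3", "60601"]
counts = Counter(pattern(v) for v in values)
for pat, freq in counts.most_common():
    print(pat, freq, f"{100 * freq / len(values):.0f}%")
```

For this sample, the pattern 99999 covers three of the four values (75%) and 9X9X9 covers one (25%).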
Column Statistics Properties
Column statistics properties provide maximum and minimum lengths of values and first and last values.
To view statistical information, select Statistics from the Show menu.
The following list describes the column statistics properties:
- Maximum Length: Length of the longest value in the column.
- Minimum Length: Length of the shortest value in the column.
- Bottom: Last five values in the column.
- Top: First five values in the column.
Note: The profile also displays average and standard deviation statistics for columns of type Integer.
Exporting Profile Results from Informatica Developer
You can export column values and column pattern data from profile results.
Export column values in Distinct Value Count format. Export pattern values in Domain Inference format.
1. In the Object Explorer view, select and open a profile.
2. Optionally, run the profile to update the profile results.
3. Select the Results view.
4. Select the column that contains the data for export.
5. Under Details, select Values or select Patterns and click the Export button.
   The Export data to a file dialog box opens.
6. Accept or change the file name. The default name is [Profile_name]_[column_name]_DVC for column value data and [Profile_name]_[column_name]_DI for pattern data.
7. Select the type of data to export. You can select either Values for the selected column or Patterns for the selected column.
8. Under Save, choose Save on Client and click Browse to select a location and save the file locally on your computer. By default, Informatica Developer writes the file to a location set in the Data Integration Service properties of Informatica Administrator.
9. If you do not want to export field names as the first row, clear the Export field names as first row check box.
10. Click OK.
CHAPTER 5
Rules in Informatica Developer
This chapter includes the following topics:
- Rules in Informatica Developer Overview
- Creating a Rule in Informatica Developer
- Applying a Rule in Informatica Developer
Rules in Informatica Developer Overview
A rule is business logic that defines conditions applied to source data when you run a profile. You can create
reusable rules from mapplets in the Developer tool. You can reuse these rules in Analyst tool profiles to change or
validate source data.
Create a mapplet and validate it as a rule. This rule appears as a reusable rule in the Analyst tool. You can apply the rule to a column profile in the Developer tool or in the Analyst tool.
A rule must meet the following requirements:
- It must contain an Input and Output transformation. You cannot use data sources in a rule.
- It can contain Expression transformations, Lookup transformations, and passive data quality transformations. It cannot contain any other type of transformation. For example, a rule cannot contain a Match transformation as it is an active transformation.
- It does not specify cardinality between input groups.
Creating a Rule in Informatica Developer
To create a rule in the Developer tool, you validate a mapplet as a rule.
Create a mapplet in the Developer tool. Then complete the following steps:
1. Right-click the mapplet editor.
2. Select Validate As > Rule.
Applying a Rule in Informatica Developer
You can add a rule to a saved column profile. You cannot add a rule to a profile configured for join analysis.
1. Browse the Object Explorer view and find the profile you need.
2. Right-click the profile and select Open.
   The profile opens in the editor.
3. Click the Definition tab, and select Rules.
4. Click Add.
   The Apply Rule dialog box opens.
5. Click Browse to find the rule you want to apply. Select a rule from a repository project, and click OK.
6. Click the Value column under Input Values to select an input port for the rule.
7. Optionally, click the Value column under Output Values to edit the name of the rule output port.
The rule appears in the Definition tab.
CHAPTER 6
Scorecards in Informatica
Developer
This chapter includes the following topics:
- Scorecards in Informatica Developer Overview
- Creating a Scorecard
Scorecards in Informatica Developer Overview
A scorecard is a graphical representation of the quality measurements in a profile. You can view scorecards in the
Developer tool. After you create a scorecard in the Developer tool, you can connect to the Analyst tool to open the
scorecard. You can run and edit the scorecard in the Analyst tool. You can run the scorecard on current data in
the data object or on data stored in the staging database.
Creating a Scorecard
Create a scorecard and add columns from a profile to the scorecard. You must run a profile before you add
columns to the scorecard.
1. In the Object Explorer view, select the project or folder where you want to create the scorecard.
2. Click File > New > Scorecard.
   The New Scorecard dialog box appears.
3. Click Add.
   The Select Profile dialog box appears. Select the profile that contains the columns you want to add.
4. Click OK, then click Next.
5. Select the columns that you want to add to the scorecard.
   By default, the scorecard wizard selects the columns and rules defined in the profile. You cannot add columns that are not included in the profile.
6. Click Finish.
   The Developer tool creates the scorecard.
7. Optionally, click Open with Informatica Analyst to connect to the Analyst tool and open the scorecard in the Analyst tool.
CHAPTER 7
Mapplet and Mapping Profiling
This chapter includes the following topics:
- Mapplet and Mapping Profiling Overview
- Running a Profile on a Mapplet or Mapping Object
- Comparing Profiles for Mapping or Mapplet Objects
- Generating a Mapping from a Profile
Mapplet and Mapping Profiling Overview
You can define a column profile for an object in a mapplet or mapping. Run a profile on a mapplet or a mapping
object when you want to verify the design of the mapping or mapplet without saving the profile results. You can
also generate a mapping from a profile.
Running a Profile on a Mapplet or Mapping Object
When you run a profile on a mapplet or mapping object, the profile runs on all data columns and enables drilldown operations on the data that is staged for the data object. You can run a profile on a mapplet or mapping
object with multiple output ports.
The profile traces the source data through the mapping to the output ports of the object you selected. The profile
analyzes the data that would appear on those ports if you ran the mapping.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Right-click a data object or transformation and select Profile Now.
   If the transformation has multiple output groups, the Select Output Group dialog box appears. If the transformation has a single output group, the profile results appear on the Results tab of the profile.
4. If the transformation has multiple output groups, select the output groups as necessary.
5. Click OK.
The profile results appear on the Results tab of the profile.
Comparing Profiles for Mapping or Mapplet Objects
You can create a profile that analyzes two objects in a mapplet or mapping and compares the results of the
column profiles for those objects.
Like profiles of single mapping or mapplet objects, profile comparisons run on all data columns and enable drilldown operations on the data that is staged for the data objects.
1. Open a mapplet or mapping.
2. Verify that the mapplet or mapping is valid.
3. Press the CTRL key and click two objects in the editor.
4. Right-click one of the objects and select Compare Profiles.
5. Optionally, configure the profile comparison to match columns from one object to the other object.
6. Optionally, match columns by clicking a column in one object and dragging it onto a column in the other object.
7. Optionally, choose whether the profile analyzes all columns or matched columns only.
8. Click OK.
Generating a Mapping from a Profile
You can create a mapping object from a profile. Use the mapping object you create to develop a valid mapping.
The mapping you create has a data source based on the profiled object and can contain transformations based on
profile rule logic. After you create the mapping, add objects to complete it.
1. In the Object Explorer view, find the profile on which to create the mapping.
2. Right-click the profile name and select Generate Mapping.
   The Generate Mapping dialog box displays.
3. Enter a mapping name. Optionally, enter a description for the mapping.
4. Confirm the folder location for the mapping.
   By default, the Developer tool creates the mapping in the Mappings folder in the same project as the profile. Click Browse to select a different location for the mapping.
5. Confirm the profile definition that the Developer tool uses to create the mapping. To use another profile, click Select Profile.
6. Click Finish.
The mapping appears in the Object Explorer. Add objects to the mapping to complete it.
CHAPTER 8
Reference Data
This chapter includes the following topics:
- Reference Tables Overview
- Reference Table Data Properties
- Creating a Reference Table Object
- Creating a Reference Table from a Flat File
- Creating a Reference Table from a Relational Source
- Copying a Reference Table in the Model Repository
Reference Tables Overview
Informatica provides reference tables that you can import to the Model repository. You can also create reference
tables and connect to database tables that contain reference data.
Use the Developer tool to create and update reference tables and to add reference data objects to transformations.
Reference Table Data Properties
You can view properties for reference table data and metadata in the Developer tool. The Developer tool displays
the properties when you open the reference table from the Model repository.
A reference table displays general properties and column properties. You can view reference table properties in
the Developer tool. You can view and edit reference table properties in the Analyst tool.
The following list describes the general properties of a reference table:
- Name: Name of the reference table.
- Description: Optional description of the reference table.
The following list describes the column properties of a reference table:
- Valid: Identifies the column that contains the valid reference data.
- Name: Name of each column.
- Data Type: Data type of the data in each column.
- Precision: Precision of each column.
- Scale: Scale of each column.
- Description: Description of the contents of the column. You can optionally add a description when you create the reference table.
- Include a column for row-level descriptions: Indicates that the reference table contains a column for descriptions of column data.
- Default value: Default value for the fields in the column. You can optionally add a default value when you create the reference table.
- Connection Name: Name of the connection to the database that contains the reference table data values.
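As an illustrative sketch of how data organized around a valid column is typically used, the following fragment maps variant values to the valid value. The column layout and sample values are hypothetical, not the product's storage format.

```python
# Each row holds one valid reference value and the variants that map to it.
reference_rows = [
    {"valid": "California", "variants": ["CA", "Calif.", "CALIFORNIA"]},
    {"valid": "New York",   "variants": ["NY", "N.Y.", "NEW YORK"]},
]

# Build a case-insensitive lookup from any variant to the valid value.
lookup = {v.upper(): row["valid"]
          for row in reference_rows for v in row["variants"]}

print(lookup.get("Calif.".upper()))  # California
```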
Creating a Reference Table Object
Choose this option when you want to create an empty reference table and add values by hand.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table as Empty.
3. Enter a name for the table.
4. Select a project to store the table metadata.
   At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project you need. Click Next.
5. Add two or more columns to the table. Click the New option to create a column.
   Set the following properties for each column (default values shown):
   - Name: column
   - Data Type: string
   - Precision: 10
   - Scale: 0
   - Description: empty. Optional property.
6. Select the column that contains the valid values. You can change the order of the columns that you create.
7. Optionally, edit the following properties (default values shown):
   - Include a column for row-level descriptions: Cleared
   - Audit note: Empty
   - Default value: Empty
   - Maximum rows to preview: 500
   Click Finish.
The reference table opens in the Developer tool workspace.
Creating a Reference Table from a Flat File
You can create a reference table from data stored in a flat file.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Flat File.
3. Browse to the file you want to use as the data source for the table.
4. Enter a name for the table.
5. Select a project to store the table metadata.
   At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project you need.
   Click Next.
6. Set UTF-8 as the code page.
7. Specify the delimiter that the flat file uses.
8. If the flat file contains column names, select the option to import column names from the first line of the file.
9. Optionally, edit the following properties (defaults shown):
   - Text qualifier: No quotation marks
   - Start import at line: Line 1
   - Row Delimiter: \012 LF (\n)
   - Treat consecutive delimiters as one: Cleared
   - Escape character: Empty
   - Retain escape character in data: Cleared
   - Maximum rows to preview: 500
   Click Next.
10. Select the column that contains the valid values. You can change the order of the columns.
11. Optionally, edit the following properties (defaults shown):
    - Include a column for row-level descriptions: Cleared
    - Audit note: Empty
    - Default value: Empty
    - Maximum rows to preview: 500
    Click Finish.
The reference table opens in the Developer tool workspace.
Creating a Reference Table from a Relational Source
You can use a database source to create a managed or unmanaged reference table. To create a managed
reference table, connect to the staging database that the Model repository uses. To create an unmanaged
reference table, connect to another database.
Note: You can configure a database connection in the Connection Explorer. If the Developer tool does not show
the Connection Explorer, select Window > Show View > Connection Explorer from the Developer tool menu.
1. Select File > New > Reference Table from the Developer tool menu.
2. In the new table wizard, select Reference Table from a Relational Source. Click Next.
3. Select a database connection. The Developer tool uses this connection to identify a set of resources for the new reference table.
   At the Connection field, click Browse. The Choose Connection dialog box opens and displays the available database connections. Click More in the Choose Connection dialog box to browse other connections in the Informatica domain.
4. If the database connection you select does not specify the staging database, select Unmanaged table.
5. Select a database resource.
   At the Resource field, click Browse. The Choose Connection dialog box opens and displays the resources on the database connection. Explore the database and select the resource you need.
6. Enter a name for the table.
7. Select a project to store the reference table object.
   At the Location field, click Browse. The Select Location dialog box opens and displays the projects in the repository. Select the project.
   Click Next.
8. Select the column that contains the valid values. You can change the order of the columns.
9. Optionally, edit the following properties (defaults shown):
   - Include a column for row-level descriptions: Cleared
   - Audit note: Empty
   - Default value: Empty
   - Maximum rows to preview: 500
   Click Finish.
Copying a Reference Table in the Model Repository
You can copy a reference table between projects and folders in the Model repository.
The reference table and the copy you create are not linked in the Model repository or in the database. When you
create a copy, you create a new database table.
1. Browse the Model repository, and find the reference table you want to copy.
2. Right-click the reference table, and select Copy from the context menu.
3. In the Model repository, find the project or folder where you want to store the copy of the table.
4. Click Paste.
Part III: Data Quality Features in
Informatica Analyst
This part contains the following chapters:
- Column Profiles in Informatica Analyst
- Column Profile Results in Informatica Analyst
- Rules in Informatica Analyst
- Scorecards in Informatica Analyst
- Exception Record Management
- Reference Tables
CHAPTER 9
Column Profiles in Informatica Analyst
This chapter includes the following topics:
- Column Profiles in Informatica Analyst Overview
- Column Profiling Process
- Profile Options
- Creating a Column Profile in the Analyst Tool
- Editing a Column Profile
- Running a Profile
- Creating a Filter
- Managing Filters
- Synchronizing a Flat File Data Object
- Synchronizing a Relational Data Object
Column Profiles in Informatica Analyst Overview
When you create a profile, you select the columns in the data object for which you want to profile data. You can configure sampling and drilldown options for faster profiling. After you run the profile, you can examine the profiling statistics to understand the data.
You can profile wide tables and flat files that have a large number of columns. You can profile tables with more
than 30 columns and flat files with more than 100 columns. When you create or run a profile, you can choose to
select all the columns or select each column you want to include for profiling. The Analyst tool displays the first 30
columns in the data preview. You can select all columns for drilldown and view value frequencies for these
columns. You can use rules that have more than 50 output fields and include the rule columns for profiling when
you run the profile again.
Column Profiling Process
As part of the column profiling process, you can choose to create a quick profile or a custom profile for a data object.
Use a quick profile to include all columns for a data object and use the default profile options. Use a custom profile
to select the columns for a data object and to configure the profile results, sampling, and drilldown options.
The following steps describe the column profiling process:
1. Select the data object you want to profile.
2. Determine whether you want to create a quick profile or a custom profile.
3. Choose where you want to save the profile.
4. Select the columns you want to profile.
5. Select the profile results option.
6. Choose the sampling options.
7. Choose the drilldown options.
8. Define a filter to determine the rows that the profile reads at run time.
9. Run the profile.
Note: Consider the following rules and guidelines for column names and profiling multilingual and Unicode data:
- You cannot add a column to a profile if both the column name and profile name match. You cannot add the same column twice to a profile even if you change the column name.
- You can profile multilingual data from different sources and view profile results based on the locale settings in the browser. The Analyst tool changes the Datetime, Numeric, and Decimal datatypes based on the browser locale.
- You can sort on multilingual data. The Analyst tool displays the sort order based on the browser locale.
- To profile Unicode data in a DB2 database, set the DB2CODEPAGE database environment variable in the database and restart the Data Integration Service.
Profile Options
Profile options include the profile results option, data sampling options, and data drilldown options. You can configure these options when you create a column profile for a data object.
You use the New Profile wizard to configure the profile options. You can choose to create a profile with the default column, sampling, and drilldown options. When you create a profile for multiple data sources, the Analyst tool uses the default column profiling options.
Profile Results Option
You can choose to discard previous profile results or to display results for previous profile runs.
The following list describes the profile results option for a profile:
- Show results only for columns, rules selected in current run: Discards the profile results for previously profiled columns and displays results for the columns and rules selected for the latest profile run. Do not select this option if you want the Analyst tool to display profile results for previously profiled columns.
Sampling Options
Sampling options determine the number of rows that the Analyst tool chooses to profile. You can configure
sampling options when you go through the wizard or when you run a profile.
The following list describes the sampling options for a profile:
- All Rows: Chooses all rows in the data object.
- First <number> Rows: The number of rows that you want to run the profile against. The Analyst tool chooses the rows from the first rows in the source.
- Random Sample <number> Rows: The number of rows for a random sample to run the profile against. Random sampling forces the Analyst tool to perform drilldown on staged data. Note that this can impact drilldown performance.
- Random sample: Random sample size based on the number of rows in the data object. Random sampling forces the Analyst tool to perform drilldown on staged data. Note that this can impact drilldown performance.
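The three sampling modes can be sketched in Python. This is an illustrative sketch only, not Informatica code; the function name, signature, and seed parameter are invented for the example.

```python
import random

def sample_rows(rows, mode="all", n=None, seed=None):
    """Select the rows to profile: all rows, the first n rows,
    or a random sample of n rows (illustrative sketch only)."""
    rows = list(rows)
    if mode == "all":
        return rows
    if mode == "first":
        return rows[:n]
    if mode == "random":
        # A fixed seed makes the sketch reproducible; the Analyst tool
        # does not expose such a control.
        return random.Random(seed).sample(rows, min(n, len(rows)))
    raise ValueError(f"unknown sampling mode: {mode}")
```

For example, `sample_rows(rows, "first", n=1000)` profiles only the first 1,000 source rows, which is why first-N sampling can be much faster than profiling all rows.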
Drilldown Options
You can configure drilldown options when you go through the wizard or when you run a profile.
The following list describes the drilldown options for a profile:
- Enable Row Drilldown: Drills down to row data in the profile results. By default, this option is selected.
- Select Columns: Identifies columns for drilldown that you did not select for profiling.
- Drilldown on live or staged data: Drills down on live data to read current data in the data source. Drill down on staged data to read profile data that is staged in the profiling warehouse.
Creating a Column Profile in the Analyst Tool
Select a data object and create a custom profile or a default profile. When you create a custom profile, you can
configure the columns, the rows to sample, and the drilldown options. The Analyst tool creates the profile in the
same project and folder as the data object.
1. In the Navigator, select the project that contains the data object that you want to create a custom profile for.
2. In the Contents panel, right-click the data object and select New > Profile.
   The New Profile wizard appears. The Column profiling option is selected by default.
3. Click Next.
4. In the Sources panel, select a data object.
5. Choose to create a default profile or a custom profile.
   - To create a default profile, click Save or Save & Run.
   - To create a custom profile, click Next.
6. Enter a name and an optional description for the profile.
7. In the Folders panel, select the project or folder where you want to create the profile.
   The Analyst tool displays the project that you selected and shared projects that contain folders where you can create the profile. The profile objects in the folder appear in the Profiles panel.
8. Click Next.
9. In the Columns panel, select the columns that you want to profile. The columns include any rules you applied to the profile. The Analyst tool lists the name, datatype, precision, and scale for each column.
   Optionally, select Name to select all columns.
10. Accept the default option in the Profile Results Option panel.
    The first time you run the profile, the Analyst tool displays profile results for all columns selected for profiling.
11. In the Sampling Options panel, configure the sampling options.
12. In the Drilldown Options panel, configure the drilldown options.
    Optionally, click Select Columns to select columns to drill down on. In the Drilldown columns window, select the columns for drill down and click OK.
13. Click Next.
14. Optionally, define a filter for the profile.
15. Click Next to verify the row drilldown settings, including the preview columns for drilldown.
16. Click Save to create the profile, or click Save & Run to create the profile and then run the profile.
Editing a Column Profile
You can make changes to a column profile after running it.
1. In the Navigator, select the project or folder that contains the profile that you want to edit.
2. Click the profile to open it.
   The profile opens in a tab.
3. Click Actions > Edit.
   A shortcut menu appears.
4. Based on the changes you want to make, choose one of the following menu options:
   - General. Change the basic properties such as name, description, and profile type.
   - Data Source. Choose another matching data source.
   - Column Profiling. Select the columns you want to run the profile on and configure the necessary sampling and drill down options.
   - Column Profiling Filter. Create, edit, and delete filters.
   - Column Profiling Rules. Create rules or change current ones.
   - Data Domain Discovery. Set up data domain discovery options.
5. Click Save to save the changes or click Save & Run to save the changes and then run the profile.
Running a Profile
Run a profile to analyze a data source for content and structure and select columns and rules for drill down. You
can drill down on live or staged data for columns and rules. You can run a profile on a column or rule without
profiling all the source columns again after you run the profile.
1. In the Navigator, select the project or folder that contains the profile you want to run.
2. Click the profile to open it.
   The profile appears in a tab. Verify the profile options before you run the profile.
3. Click Actions > Run Profile.
   The Analyst tool displays the profile results.
Creating a Filter
Create a filter to define a subset of the original data source that meets the filter criteria. You can then run a profile on this sample data.
1. Open a profile.
2. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.
   The current filters appear in the Filters panel.
3. Click New.
4. Enter a filter name and an optional description.
5. Select a simple, advanced, or SQL filter type.
   - Simple. Use conditional operators, such as <, >, =, BETWEEN, and ISNULL, for each column that you want to filter.
   - Advanced. Use function categories, such as Character, Consolidation, Conversion, Financial, Numerical, and Data cleansing. Click the function name on the Functions panel to view its return type, description, and parameters. To include the function in the filter, click the right arrow (>) button, and specify the parameters in the Function dialog box.
   Note: For a simple or an advanced filter on a date column, provide the condition in the YYYY/MM/DD HH:MM:SS format.
   - SQL. Creates SQL queries. You can create an SQL filter for relational data sources. Enter the WHERE clause expression to generate the SQL filter. For example, to filter company records in the European region from a Company table with a Region column, enter Region = 'Europe' in the editor.
6. Click Validate to verify the SQL expression.
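A simple filter evaluates a conditional operator against each row. The following Python sketch is purely illustrative (the function name, the dict-per-row format, and the sample data are invented); it shows the effect of the Region = 'Europe' example above:

```python
def simple_filter(rows, column, op, value=None):
    """Keep the rows whose column satisfies the condition (illustrative sketch)."""
    ops = {
        "=": lambda a, b: a == b,
        "<": lambda a, b: a < b,
        ">": lambda a, b: a > b,
        "ISNULL": lambda a, b: a is None,
    }
    return [row for row in rows if ops[op](row.get(column), value)]

companies = [
    {"Company": "A", "Region": "Europe"},
    {"Company": "B", "Region": "APAC"},
]
european = simple_filter(companies, "Region", "=", "Europe")
# european contains only the Company "A" row
```

Running a profile on the filtered rows then produces statistics for the European subset only.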
Managing Filters
You can create, edit, and delete filters.
1. In the Navigator, select the project or folder that contains the profile you want to filter.
2. Open the profile.
3. Click Actions > Edit > Column Profiling Filters to open the Edit Profile dialog box.
   The current filters appear in the Filters panel.
4. Choose to create, edit, or delete a filter.
   - Click New to create a filter.
   - Select a filter, and click Edit to change the filter settings.
   - Select a filter, and click Delete to remove the filter.
Synchronizing a Flat File Data Object
You can synchronize the changes to an external flat file data source with its data object in Informatica Analyst.
Use the Synchronize Flat File wizard to synchronize the data objects.
1. In the Contents panel, select a flat file data object.
2. Click Actions > Synchronize.
   The Synchronize Flat File dialog box appears in a new tab.
3. Verify the flat file path in the Browse and Upload field.
4. Click Next.
   A synchronization status message appears.
5. When you see a Synchronization complete message, click OK.
   The message displays a summary of the metadata changes made to the data object. To view the details of the metadata changes, use the Properties view.
Synchronizing a Relational Data Object
You can synchronize the changes to an external relational data source with its data object in Informatica Analyst.
External data source changes include adding, changing, and removing columns and changes to rules.
1. In the Contents panel, select a relational data object.
2. Click Actions > Synchronize.
   A message prompts you to confirm the action.
3. To complete the synchronization process, click OK. Click Cancel to cancel the process.
   If you click OK, a synchronization status message appears.
4. When you see a Synchronization complete message, click OK.
   The message displays a summary of the metadata changes made to the data object. To view the details of the metadata changes, use the Properties view.
CHAPTER 10
Column Profile Results in Informatica Analyst
This chapter includes the following topics:
- Column Profile Results in Informatica Analyst Overview
- Profile Summary
- Column Values
- Column Patterns
- Column Statistics
- Column Profile Drilldown
- Column Profile Export Files in Informatica Analyst
Column Profile Results in Informatica Analyst Overview
View profile results to understand the structure of data and analyze its quality. You can view the profile results
after you run a profile. You can view a summary of the columns and rules in the profile and the values, patterns,
and statistics for columns and rules.
After you run a profile, you can view the profile results in the Column Profiling, Properties, and Data Preview
views. You can export value frequencies, pattern frequencies, or drilldown data to a CSV file. You can export the
complete profile summary information to a Microsoft Excel file so that you can view all data in a file for further
analysis.
In the Column Profiling view, you can view the summary information for columns for a profile run. You can view
values, patterns, and statistics for each column in the Values, Patterns, and Statistics views.
The Analyst tool displays rules as columns in profile results. The profile results for a rule appear as a profiled
column. The profile results that appear depend on the profile configuration and sampling options.
The following profiling results appear in the Column Profiling view:
- The summary information for the profile run, including the number of unique and null values, inferred datatype, and last run date and time.
- Values for columns and the frequency in which the value appears for the column. The frequency appears as a number, a percentage, and a chart.
- Value patterns for the profiled columns and the frequency in which the pattern appears. The frequency appears as a number and a percentage.
- Statistics about the column values, such as average, length, and top and bottom values.
Note: You can select a value or pattern and view profiled rows that match the value or pattern on the Details panel.
In the Properties view, you can view profile properties on the Properties panel. You can view properties for
columns and rules on the Columns and Rules panel.
In the Data Preview view, you can preview the profile data. The Analyst tool includes all columns in the profile and
displays the first 100 rows of data.
Profile Summary
The summary for a profile run includes the number of unique and null values expressed as a number and a
percentage, inferred datatypes, and last run date and time. You can click each profile summary property to sort on
values of the property.
The following list describes the profile summary properties:
- Name: Name of the column in the profile.
- Unique Values: Number of unique values for the column.
- % Unique: Percentage of unique values for the column.
- Null: Number of null values for the column.
- % Null: Percentage of null values for the column.
- Datatype: Datatype derived from the values for the column. The Analyst tool can derive the following datatypes from the datatypes of values in columns: String, Varchar, Decimal, Integer, and "-" for nulls.
  Note: The Analyst tool cannot derive the datatype from the values of a numeric column that has a precision greater than 38, or from the values of a string column that has a precision greater than 255. If you create a column profile on a date column with year values earlier than 1800, the inferred datatype may appear as a fixed-length string. Change the default value for the year-minimum parameter in InferDateTimeConfig.xml, as necessary.
- % Inferred: Percentage of values that match the datatype inferred by the Analyst tool.
- Documented Datatype: Datatype declared for the column in the profiled object.
- Maximum Value: Maximum value in the column.
- Minimum Value: Minimum value in the column.
- Last Profile Run: Date and time you last ran the profile.
- Drilldown: If selected, drills down on live data for the column.
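The unique and null counts and percentages in the summary can be computed as in the following Python sketch. The function name is invented, and reading "unique" as the count of distinct non-null values is an assumption made for the example:

```python
def summarize_column(values):
    """Compute unique/null counts and percentages for one column.
    Illustrative sketch; 'unique' is read here as distinct non-null values."""
    total = len(values)
    nulls = sum(1 for v in values if v is None)
    distinct = len({v for v in values if v is not None})
    return {
        "Unique Values": distinct,
        "% Unique": round(100.0 * distinct / total, 2),
        "Null": nulls,
        "% Null": round(100.0 * nulls / total, 2),
    }

summarize_column(["NY", "CA", "NY", None])
# {'Unique Values': 2, '% Unique': 50.0, 'Null': 1, '% Null': 25.0}
```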
Column Values
The column values include values for columns and the frequency in which the value appears for the column.
The following list describes the properties for the column values:
- Value: List of all values for the column in the profile.
  Note: The Analyst tool excludes the CLOB, BLOB, Raw, and Binary datatypes in column values in a profile.
- Frequency: Number of times a value appears for a column, expressed as a number, a percentage, and a chart.
- Percent: Percentage that a value appears for a column.
- Chart: Chart for the percentage.
- Drill down: Drills down to specific source rows based on a column value.
Note: To sort the Value and Frequency columns, select the columns. When you sort the results of the Frequency
column, the Analyst tool sorts the results based on the datatype of the column.
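A value-frequency view like the one described above can be sketched with Python's `collections.Counter`. The function name and the (value, frequency, percent) output shape are invented for illustration:

```python
from collections import Counter

def value_frequencies(values):
    """Return (value, frequency, percent) rows, most frequent first (sketch).
    Values with equal counts keep their first-seen order."""
    total = len(values)
    return [
        (value, count, round(100.0 * count / total, 2))
        for value, count in Counter(values).most_common()
    ]

value_frequencies(["NY", "CA", "NY", "TX", "NY"])
# [('NY', 3, 60.0), ('CA', 1, 20.0), ('TX', 1, 20.0)]
```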
Column Patterns
The column patterns include the value patterns for the columns and the frequency in which the pattern appears.
The profiling warehouse stores 16,000 unique highest frequency values including NULL values for profile results
by default. If there is at least one NULL value in the profile results, the Analyst tool can display NULL values as
patterns.
Note: The Analyst tool cannot derive the pattern for a numeric column that has a precision greater than 38. The
Analyst tool cannot derive the pattern for a string column that has a precision greater than 255.
The following list describes the properties for the column patterns:
- Pattern: Pattern for the column in the profile.
- Frequency: Number of times a pattern appears for a column, expressed as a number.
- Percent: Percentage that a pattern appears for a column.
- Chart: Chart for the percentage.
- Drill down: Drills down to specific source rows based on a column pattern.
The following list describes the pattern characters and what they represent:
- 9: Represents any numeric character. Informatica Analyst displays up to three characters separately in the "9" format. The tool displays more than three characters as a value within parentheses. For example, the format "9(8)" represents a numeric value with 8 digits.
- X: Represents any alphabetic character. Informatica Analyst displays up to three characters separately in the "X" format. The tool displays more than three characters as a value within parentheses. For example, the format "X(6)" may represent the value "Boston."
  Note: The pattern character X is not case sensitive and may represent uppercase or lowercase characters from the source data.
- p: Represents "(", the left parenthesis.
- q: Represents ")", the right parenthesis.
- b: Represents a blank space.
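The character mapping above can be sketched as a small Python function. The run-length threshold of three follows the descriptions of "9" and "X"; passing characters outside the table through unchanged is an assumption made for this sketch:

```python
from itertools import groupby

def infer_pattern(value):
    """Map characters to pattern symbols (9, X, p, q, b) and compress
    runs longer than three into the symbol(count) form (sketch)."""
    def symbol(ch):
        if ch.isdigit():
            return "9"
        if ch.isalpha():
            return "X"  # not case sensitive, per the note above
        return {"(": "p", ")": "q", " ": "b"}.get(ch, ch)  # pass-through is an assumption
    parts = []
    for sym, run in groupby(symbol(ch) for ch in value):
        n = len(list(run))
        parts.append(sym * n if n <= 3 else f"{sym}({n})")
    return "".join(parts)

infer_pattern("Boston")    # 'X(6)'
infer_pattern("12345678")  # '9(8)'
```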
Column Statistics
The column statistics include statistics about the column values, such as average, length, and top and bottom
values. The statistics that appear depend on the column type.
The following list describes the types of column statistics, with the column types they apply to in parentheses:
- Average (Integer): Average of the values for the column.
- Standard Deviation (Integer): The standard deviation, or variability between column values, for all values of the column.
- Maximum Length (Integer, String): Length of the longest value for the column.
- Minimum Length (Integer, String): Length of the shortest value for the column.
- Bottom (Integer, String): Lowest values for the column.
- Top (Integer, String): Highest values for the column.
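These statistics can be sketched with Python's `statistics` module. The function is illustrative only; the guide does not say whether the tool uses population or sample standard deviation, so population standard deviation is an assumption here:

```python
import statistics

def column_statistics(values, top_n=5):
    """Compute the statistics listed above for one column (sketch).
    Assumes a homogeneous column of integers or strings."""
    result = {
        "Maximum Length": max(len(str(v)) for v in values),
        "Minimum Length": min(len(str(v)) for v in values),
        "Top": sorted(values, reverse=True)[:top_n],
        "Bottom": sorted(values)[:top_n],
    }
    if all(isinstance(v, int) for v in values):
        result["Average"] = statistics.mean(values)
        # Population standard deviation; an assumption for this sketch.
        result["Standard Deviation"] = statistics.pstdev(values)
    return result

column_statistics([10, 200, 3])
```

As in the table, Average and Standard Deviation are produced only for the numeric column type, while the length and top/bottom statistics apply to strings as well.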
Column Profile Drilldown
Drilldown options for a column profile enable you to drill down to specific rows in the data source based on a column value. You can choose to read the current data in the data source for drilldown or read profile data staged in the profiling warehouse. When you drill down to a specific row on staged profile data, the Analyst tool creates a drilldown filter for the matching column value. After you drill down, you can edit, recall, reset, and save the drilldown filter.
You can select columns for drilldown even if you did not choose those columns for profiling. After you perform a drilldown on a column value, you can export drilldown data for the selected values or patterns to a CSV file at a location you choose. Though Informatica Analyst displays the first 200 values for drilldown data, the tool exports all values to the CSV file.
Drilling Down on Row Data
After you run a profile, you can drill down to specific rows that match the column value or pattern.
1. Run a profile.
   The profile appears in a tab.
2. In the Summary view, select a column name to view the profile results for the column.
3. Select a column value on the Values tab or select a column pattern on the Patterns tab.
4. Click Actions > Drilldown to view the rows of data.
   The Drilldown panel displays the rows that contain the values or patterns. The column value or pattern appears at the top of the panel.
Note: You can choose to drill down on live data or staged data.
Applying Filters to Drilldown Data
You can filter the drilldown data iteratively so that you can analyze data irregularities on the subsets of profile
results.
1. Drill down to row data in the profile results.
2. Select a column value on the Values tab.
3. Right-click and select Drilldown Filter > Edit to open the DrillDown Filter dialog box.
4. Add filter conditions, and click Run.
5. To manage current drilldown filters, you can save, recall, or reset filters.
   - To save a filter, select Drilldown Filter > Save.
   - To go back to the last saved drilldown filter results, select Drilldown Filter > Recall.
   - To reset the drilldown filter results, select Drilldown Filter > Reset.
Column Profile Export Files in Informatica Analyst
You can export column profile results to a CSV file or a Microsoft Excel file based on whether you choose a part of
the profile results or the complete results summary.
You can export value frequencies, pattern frequencies, or drilldown data to a CSV file for selected values and patterns. You can export the profiling results summary for all columns to a Microsoft Excel file. Use the Data Integration Service privilege Drilldown and Export Results to determine, by user or group, who can export profile results.
Profile Export Results in a CSV File
You can export value frequencies, pattern frequencies, or drilldown data to view the data in a file. The Analyst tool
saves the information in a CSV file.
When you export inferred column patterns, the Analyst tool exports a different format of the column pattern. For
example, when you export the inferred column pattern X(5), the Analyst tool displays the following format of the
column pattern in the CSV file: XXXXX.
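The conversion from the compact pattern X(5) to the exported form XXXXX is a simple run-length expansion, sketched here in Python (the function name is invented for the example):

```python
import re

def expand_pattern(pattern):
    """Expand compressed pattern runs such as 'X(5)' to the exported
    form 'XXXXX' (illustrative sketch)."""
    return re.sub(
        r"(.)\((\d+)\)",
        lambda m: m.group(1) * int(m.group(2)),
        pattern,
    )

expand_pattern("X(5)")     # 'XXXXX'
expand_pattern("X(3)9(4)") # 'XXX9999'
```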
Profile Export Results in Microsoft Excel
When you export the complete profile results summary, the Analyst tool saves the information to multiple
worksheets in a Microsoft Excel file. The Analyst tool saves the file in the "xlsx" format.
The following list describes the information that appears on each worksheet in the export file:
- Column Profile: Summary information exported from the Column Profiling view after the profile runs. Examples are column names, rule names, number of unique values, number of null values, inferred datatypes, and date and time of the last profile run.
- Values: Values for the columns and rules and the frequency in which the values appear for each column.
- Patterns: Value patterns for the columns and rules you ran the profile on and the frequency in which the patterns appear.
- Statistics: Statistics about each column and rule. Examples are average, length, top values, bottom values, and standard deviation.
- Properties: Properties view information, including profile name, type, sampling policy, and row count.
Exporting Profile Results from Informatica Analyst
You can export the results of a profile to a ".csv" or ".xlsx" file to view the data in a file.
1. In the Navigator, select the project or folder that contains the profile.
2. Click the profile to open it.
   The profile opens in a tab.
3. In the Column Profiling view, select the column that you want to export.
4. Click Actions > Export Data.
   The Export Data to a file window appears.
5. Enter the file name. Optionally, use the default file name.
6. Select the type of data to export:
   - All (Summary, Values, Patterns, Statistics, Properties)
   - Value frequencies for the selected column
   - Pattern frequencies for the selected column
   - Drilldown data for the selected values or patterns
7. Enter a file format. The format is Excel for the All option and CSV for the rest of the options.
8. Select the code page of the file.
9. Click OK.
CHAPTER 11
Rules in Informatica Analyst
This chapter includes the following topics:
- Rules in Informatica Analyst Overview
- Predefined Rules
- Expression Rules
Rules in Informatica Analyst Overview
A rule is business logic that defines conditions applied to source data when you run a profile. You can add a rule
to the profile to cleanse, change, or validate data.
You may want to use a rule in different circumstances. You can add a rule to cleanse one or more data columns.
You can add a lookup rule that provides information that the source data does not provide. You can add a rule to
validate a cleansing rule for a data quality or data integration project.
You can add a rule before or after you run a profile. When you add a rule to a profile, you can create a rule or you
can apply a rule. You can create or apply the following rule types for a profile:
- Expression rules. Use expression functions and columns to define rule logic. Create expression rules in the Analyst tool. An analyst can create an expression rule and promote it to a reusable rule that other analysts can use in multiple profiles.
- Predefined rules. Reusable rules that a developer creates in the Developer tool. Rules that a developer creates as mapplets in the Developer tool can appear in the Analyst tool as reusable rules.
After you add a rule to a profile, you can run the profile again for the rule column. The Analyst tool displays profile
results for the rule column. You can modify the rule and run the profile again to view changes to the profile results.
The output of a rule can be one or more virtual columns. The virtual columns exist in the profile results. The
Analyst tool profiles the virtual columns. For example, you use a predefined rule that splits a column that contains
first and last names into FIRST_NAME and LAST_NAME virtual columns. The Analyst tool profiles the
FIRST_NAME and LAST_NAME columns.
Note: If you delete a rule object that other object types reference, the Analyst tool displays a message that lists
those object types. Determine the impact of deleting the rule before you delete it.
Predefined Rules
Predefined rules are rules that a developer creates in the Developer tool or that are provided with the Developer tool and the Analyst tool. Apply predefined rules to profiles in the Analyst tool to modify or validate source data.
Predefined rules use transformations to define rule logic. You can use predefined rules with multiple profiles. In
the Model repository, a predefined rule is a mapplet with an input group, an output group, and transformations that
define the rule logic.
Predefined Rules Process
Use the New Rule Wizard to apply a predefined rule to a profile.
You can perform the following steps to apply a predefined rule:
1.
Open a profile.
2.
Select a predefined rule.
3.
Review the rule parameters.
4.
Select the input column.
5.
Configure the profiling options.
Applying a Predefined Rule
Use the New Rule Wizard to apply a predefined rule to a profile. When you apply a predefined rule, you select the
rule and configure the input and output columns for the rule. Apply a predefined rule to use a rule promoted as a
reusable rule or use a rule created by a developer.
1.
In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
2.
Click the profile to open it.
The profile appears in a tab.
3.
Click Actions > Add Rule.
The New Rule window appears.
4.
Select the option to Apply a Rule.
5.
Click Next.
6.
In the Rules panel, select the rule that you want to apply.
The name, datatype, description, and precision columns appear for the Inputs and Outputs columns in the
Rules Parameters panel.
7.
Click Next.
8.
In the Inputs section, select an input column. The input column is a column name in the profile.
9.
Optionally, in the Outputs section, configure the label of the output columns.
10.
Click Next.
11.
In the Columns panel, select the columns you want to profile. The columns include any rules you applied to
the profile. Optionally, select Name to include all columns.
The Analyst tool lists the name, datatype, precision, and scale for each column.
12.
In the Sampling Options panel, configure the sampling options.
13.
In the Drilldown Options panel, configure the drilldown options.
14.
Click Save to apply the rule or click Save & Run to apply the rule and then run the profile.
Expression Rules
Expression rules use expression functions and columns to define rule logic. Create expression rules and add them
to a profile in the Analyst tool.
Use expression rules to change or validate values for columns in a profile. You can create one or more expression
rules to use in a profile. Expression functions are SQL-like functions used to transform source data. You can
create expression rule logic with the following types of functions:
¨ Character
¨ Conversion
¨ Data Cleansing
¨ Date
¨ Encoding
¨ Financial
¨ Numeric
¨ Scientific
¨ Special
¨ Test
Expression Rules Process
Use the New Rule Wizard to create an expression rule and add it to a profile.
The New Rule Wizard includes an expression editor. Use the expression editor to add expression functions,
configure columns as input to the functions, validate the expression, and configure the return type, precision, and
scale.
The output of an expression rule is a virtual column that uses the name of the rule as the column name. The
Analyst tool profiles the virtual column. For example, you use an expression rule to validate a ZIP code. The rule
returns 1 if the ZIP Code is valid and 0 if the ZIP code is not valid. Informatica Analyst profiles the 1 and 0 output
values of the rule.
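The 1/0 output described above can be sketched in Python. The regular expression shown is an illustrative assumption, not a rule shipped with the product:

```python
import re

def zip_is_valid(zip_code):
    """Return 1 if the value looks like a 5-digit ZIP or ZIP+4 code, 0 otherwise.

    Mirrors the kind of 1/0 output an expression rule produces so that
    the profile summarizes validity rather than the raw values.
    The pattern is an assumption for illustration.
    """
    return 1 if re.fullmatch(r"\d{5}(-\d{4})?", zip_code or "") else 0

print(zip_is_valid("10036"))       # → 1 (valid 5-digit ZIP)
print(zip_is_valid("1003"))        # → 0 (too short)
print(zip_is_valid("10036-1234"))  # → 1 (ZIP+4)
```

The Analyst tool would then profile the resulting column of 1 and 0 values.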
You can perform the following steps to create an expression rule:
1.
Open a profile.
2.
Configure the rule logic using expression functions and columns as parameters.
3.
Configure the profiling options.
Creating an Expression Rule
Use the New Rule Wizard to create an expression rule and add it to a profile. Create an expression rule to modify
or validate values for columns in a profile.
1.
In the Navigator, select the project or folder that contains the profile that you want to add the rule to.
2.
In the Contents panel, click the profile to open it.
The profile appears in a tab.
3.
Click Actions > Edit > Column Profiling Rules.
The Edit Profile dialog box appears.
4.
Click New.
5.
Select Create a rule.
6.
Click Next.
7.
Enter a name and optional description for the rule.
8.
Optionally, choose to promote the rule as a reusable rule and configure the project and folder location.
If you promote a rule to a reusable rule, you or other users can use the rule in another profile as a predefined
rule.
9.
In the Functions tab, select a function and click the right arrow to enter the parameters for the function.
10.
In the Columns tab, select an input column and click the right arrow to add the expression in the Expression
editor. You can also add logical operators to the expression.
11.
Click Validate. You can proceed to the next step if the expression is valid.
12.
Optionally, click Edit to configure the return type, precision, and scale.
13.
Click Next.
14.
In the Columns panel, select the columns you want to profile. The columns include any rules you applied to
the profile. Optionally, select Name to select all columns.
The Analyst tool lists the name, datatype, precision, and scale for each column.
15.
In the Sampling Options panel, configure the sampling options.
16.
In the Drilldown Options panel, configure the drilldown options.
17.
Click Save to create the rule or click Save & Run to create the rule and then run the profile.
CHAPTER 12
Scorecards in Informatica Analyst
This chapter includes the following topics:
¨ Scorecards in Informatica Analyst Overview, 56
¨ Informatica Analyst Scorecard Process, 56
¨ Metrics, 57
¨ Scorecard Notifications, 62
¨ Scorecard Integration with External Applications, 64
Scorecards in Informatica Analyst Overview
A scorecard is the graphical representation of valid values for a column in a profile. You can create scorecards
and drill down on live data or staged data.
Use scorecards to measure data quality progress. For example, you can create a scorecard to measure data
quality before you apply data quality rules. After you apply data quality rules, you can create another scorecard to
compare the effect of the rules on data quality.
Scorecards display the value frequency for columns as scores. The scores reflect the percentage of valid values in
the columns. After you run a profile, you can add columns from the profile as metrics to a scorecard. You can
create metric groups so that you can group related metrics to a single entity. You can define thresholds that
specify the range of bad data acceptable for columns in a record and assign metric weights for each metric. When
you run a scorecard, the Analyst tool generates weighted average values for each metric group. To identify valid
data records and records that are not valid, you can drill down on each column. You can use trend charts in the
Analyst tool to track how scores change over a period of time.
Informatica Analyst Scorecard Process
You can run and edit the scorecard in the Analyst tool. You can create and view a scorecard in the Developer tool.
You can run the scorecard on current data in the data object or on data stored in the staging database.
When you open a scorecard from the Contents view of the Analyst tool, the scorecard opens in another tab. After
you run the scorecard, you can view the scores on the Scorecard view. You can select the data object and
navigate to the data object from a score within a scorecard. The Analyst tool opens the data object in another tab.
You can perform the following tasks when you work with scorecards:
1.
Create a scorecard in the Developer tool and add columns from a profile.
2.
Optionally, connect to the Analyst tool and open the scorecard in the Analyst tool.
3.
After you run a profile, add profile columns as metrics to the scorecard.
4.
Run the scorecard to generate the scores for columns.
5.
View the scorecard to see the scores for each column in a record.
6.
Drill down on the columns for a score.
7.
Edit a scorecard.
8.
Set thresholds for each metric in a scorecard.
9.
Create a group to add or move related metrics in the scorecard.
10.
Edit or delete a group, as required.
11.
View trend charts for each score to monitor how the score changes over time.
Metrics
A metric is a column of a data source or output of a rule that is part of a scorecard. When you create a scorecard,
you can assign a weight to each metric. Create a metric group to categorize related metrics in a scorecard into a
set.
Metric Weights
When you create a scorecard, you can assign a weight to each metric. The default value for a weight is 1.
When you run a scorecard, the Analyst tool calculates the weighted average for each metric group based on the
metric score and weight you assign to each metric.
For example, you assign a weight of W1 to metric M1, and you assign a weight of W2 to metric M2. The Analyst
tool uses the following formula to calculate the weighted average:
(M1 X W1 + M2 X W2) / (W1 + W2)
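The calculation can be checked with a short sketch. The metric scores and weights below are hypothetical:

```python
def weighted_average(metrics):
    """Compute the weighted average score for a metric group.

    metrics: list of (score, weight) pairs, one pair per metric.
    Implements (M1 x W1 + M2 x W2 + ...) / (W1 + W2 + ...).
    """
    total_weight = sum(w for _, w in metrics)
    return sum(s * w for s, w in metrics) / total_weight

# Two hypothetical metrics: M1 scores 90% with weight 3, M2 scores 50% with weight 1.
print(weighted_average([(90.0, 3), (50.0, 1)]))  # → 80.0
```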
Adding Columns to a Scorecard
After you run a profile, you can add profile columns to a scorecard. Use the Add to Scorecard Wizard to add
columns from a profile to a scorecard and configure the valid values for the columns. If you add a profile column to
a scorecard from a source profile that has a filter or a sampling option other than All Rows, profile results may not
reflect the scorecard results.
1.
In the Navigator, select the project or folder that contains the profile.
2.
Click the profile to open it.
The profile appears in a tab.
3.
Click Actions > Run Profile to run the profile.
4.
Click Actions > Add to Scorecard.
The Add to Scorecard Wizard appears.
Note: Use the following rules and guidelines before you add columns to a scorecard:
¨ You cannot add a column to a scorecard if both the column name and scorecard name match.
¨ You cannot add a column twice to a scorecard even if you change the column name.
5.
Select Existing Scorecard to add the columns to an existing scorecard.
The New Scorecard option is selected by default.
6.
Click Next.
7.
Select the scorecard that you want to add the columns to, and click Next.
8.
Select the columns and rules that you want to add to the scorecard as metrics. Optionally, click the check box
in the left column header to select all columns. Optionally, select Column Name to sort column names.
9.
Select each metric in the Metrics panel and configure the valid values from the list of all values in the Score
using: Values panel.
You can select multiple values in the Available Values panel and click the right arrow button to move them to
the Selected Values panel.
10.
Select each metric in the Metrics panel and configure metric thresholds in the Metric Thresholds panel.
You can set thresholds for Good, Acceptable, and Unacceptable scores.
11.
Click Next.
12.
In the Score using: Values panel, set up the metric weight for each metric. You can double-click the default
metric weight of 1 to change the value.
13.
In the Metric Group Thresholds panel, set up metric group thresholds.
14.
Click Save to save the scorecard or click Save & Run to save and run the scorecard.
Running a Scorecard
Run a scorecard to generate scores for columns.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard appears in a tab.
3.
Click Actions > Run Scorecard.
4.
Select a score from the Metrics panel and select the columns from the Columns panel to drill down on.
5.
In the Drilldown option, choose to drill down on live data or staged data.
For optimal performance, drill down on live data.
6.
Click Run.
Viewing a Scorecard
Run a scorecard to see the scores for each metric. A scorecard displays the score as a percentage and bar. View
data that is valid or not valid. You can also view scorecard information, such as the metric weight, metric group
score, score trend, and name of the data object.
1.
Run a scorecard to view the scores.
2.
Select a metric that contains the score you want to view.
3.
Click Actions > Drilldown to view the rows of valid data or rows of data that is not valid for the column.
The Analyst tool displays the rows of valid data by default in the Drilldown panel.
Editing a Scorecard
Edit valid values for metrics in a scorecard. You must run a scorecard before you can edit it.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard appears in a tab.
3.
Click Actions > Edit.
The Edit Scorecard dialog box appears.
4.
On the Metrics tab, select each score in the Metrics panel and configure the valid values from the list of all
values in the Score using: Values panel.
5.
Make changes to the score thresholds in the Metric Thresholds panel as necessary.
6.
Click the Metric Groups tab.
7.
Create, edit, or remove metric groups.
You can also edit the metric weights and metric thresholds on the Metric Groups tab.
8.
Click the Notifications tab.
9.
Make changes to the scorecard notification settings as necessary.
You can set up global and custom settings for metrics and metric groups.
10.
Click Save to save changes to the scorecard, or click Save & Run to save the changes and run the
scorecard.
Defining Thresholds
You can set thresholds for each score in a scorecard. A threshold specifies the range in percentage of bad data
that is acceptable for columns in a record. You can set thresholds for good, acceptable, or unacceptable ranges of
data. You can define thresholds for each column when you add columns to a scorecard, or when you edit a
scorecard.
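The three ranges can be sketched as a simple classification. The threshold values and the boundary handling shown here are assumptions for illustration, not the Analyst tool's exact semantics:

```python
def score_range(score, unacceptable_upper, good_lower):
    """Classify a score (percentage of valid values) into a threshold range.

    unacceptable_upper: upper bound of the Unacceptable range.
    good_lower: lower bound of the Good range.
    Scores between the two bounds fall in the Acceptable range.
    """
    if score < unacceptable_upper:
        return "Unacceptable"
    if score >= good_lower:
        return "Good"
    return "Acceptable"

print(score_range(45.0, 50.0, 90.0))  # → Unacceptable
print(score_range(75.0, 50.0, 90.0))  # → Acceptable
print(score_range(95.0, 50.0, 90.0))  # → Good
```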
Complete the following prerequisite tasks before you define thresholds for columns in a scorecard:
¨ In the Navigator, select the project or folder that contains the profile and add columns from the profile to the
scorecard in the Add to Scorecard window.
¨ Optionally, in the Navigator, select the project or folder that contains the scorecard and click the scorecard to
edit it in the Edit Scorecard window.
1.
In the Add to Scorecard window, or the Edit Scorecard window, select each metric in the Metrics panel.
2.
In the Metric Thresholds panel, enter the thresholds that represent the upper bound of the unacceptable
range and the lower bound of the good range.
3.
Click Next or Save.
Metric Groups
Create a metric group to categorize related scores in a scorecard into a set. By default, the Analyst tool
categorizes all the scores in a default metric group.
After you create a metric group, you can move scores out of the default metric group to another metric group. You
can edit a metric group to change its name and description, including the default metric group. You can delete
metric groups that you no longer use. You cannot delete the default metric group.
Creating a Metric Group
Create a metric group to add related scores in the scorecard to the group.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard appears in a tab.
3.
Click Actions > Edit.
The Edit Scorecard window appears.
4.
Click the Metric Groups tab.
The default group appears in the Metric Groups panel and the scores in the default group appear in the
Metrics panel.
5.
Click the New Group icon to create a metric group.
The Metric Groups dialog box appears.
6.
Enter a name and optional description.
7.
Click OK.
8.
Click Save to save the changes to the scorecard.
Moving Scores to a Metric Group
After you create a metric group, you can move related scores to the metric group.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard appears in a tab.
3.
Click Actions > Edit.
The Edit Scorecard window appears.
4.
Click the Metric Groups tab.
The default group appears in the Metric Groups panel and the scores in the default group appear in the
Metrics panel.
5.
Select a metric from the Metrics panel and click the Move Metrics icon.
The Move Metrics dialog box appears.
Note: To select multiple scores, hold the Shift key.
6.
Select the metric group to move the scores to.
7.
Click OK.
Editing a Metric Group
Edit a metric group to change the name and description. You can change the name of the default metric group.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard opens in a tab.
3.
Click Actions > Edit.
The Edit Scorecard window appears.
4.
Click the Metric Groups tab.
The default metric group appears in the Metric Groups panel and the metrics in the default metric group
appear in the Metrics panel.
5.
On the Metric Groups panel, click the Edit Group icon.
The Edit dialog box appears.
6.
Enter a name and an optional description.
7.
Click OK.
Deleting a Metric Group
You can delete a metric group that is no longer valid. When you delete a metric group, you can choose to move
the scores in the metric group to the default metric group. You cannot delete the default metric group.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard opens in a tab.
3.
Click Actions > Edit.
The Edit Scorecard window appears.
4.
Click the Metric Groups tab.
The default metric group appears in the Metric Groups panel and the metrics in the default metric group
appear in the Metrics panel.
5.
Select a metric group in the Metric Groups panel, and click the Delete Group icon.
The Delete Groups dialog box appears.
6.
Choose the option to delete the metrics in the metric group or the option to move the metrics to the default
metric group before deleting the metric group.
7.
Click OK.
Drilling Down on Columns
Drill down on the columns for a score to select columns that appear when you view the valid data rows or data
rows that are not valid. The columns you select to drill down on appear in the Drilldown panel.
1.
Run a scorecard to view the scores.
2.
Select a column that contains the score you want to view.
3.
Click Actions > Drilldown to view the rows of valid or invalid data for the column.
4.
Click Actions > Drilldown Columns.
The columns appear in the Drilldown panel for the selected score. The Analyst tool displays the rows of valid
data for the columns by default. Optionally, click Invalid to view the rows of data that are not valid.
Viewing Trend Charts
You can view trend charts for each score to monitor how the score changes over time.
1.
In the Navigator, select the project or folder that contains the scorecard.
2.
Click the scorecard to open it.
The scorecard appears in a tab.
3.
In the Scorecard view, select a score.
4.
Click Actions > Show Trend Chart.
The Trend Chart Detail window appears. You can view score values that have changed over time. The
Analyst tool uses historical scorecard run data for each date and the latest valid score values to calculate the
score. The Analyst tool uses the latest threshold settings in the chart to depict the color of the score points.
Scorecard Notifications
You can configure scorecard notification settings so that the Analyst tool sends emails when specific metric scores
or metric group scores move across thresholds or remain in specific score ranges, such as Unacceptable,
Acceptable, and Good.
You can configure email notifications for individual metric scores and metric groups. If you use the global settings, the Analyst tool sends notification emails when the scores of selected metrics cross the threshold from the Good to the Acceptable score range or from the Acceptable to the Unacceptable score range. You also receive notification emails for each scorecard run if the score remains in the Unacceptable score range across consecutive scorecard runs.
You can customize the notification settings so that scorecard users get email notifications when the scores move
from the Unacceptable to Acceptable and Acceptable to Good score ranges. You can also choose to send email
notifications if a score remains within specific score ranges for every scorecard run.
Notification Email Message Template
You can set up the message text and structure of email messages that the Analyst tool sends to recipients as part
of scorecard notifications. The email template has an optional introductory text section, read-only message body
section, and optional closing text section.
The email template supports the following tags:
¨ ScorecardName. Name of the scorecard.
¨ ObjectURL. A hyperlink to the scorecard. You need to provide the username and password.
¨ MetricGroupName. Name of the metric group that the metric belongs to.
¨ CurrentWeightedAverage. Weighted average value for the metric group in the current scorecard run.
¨ CurrentRange. The score range, such as Unacceptable, Acceptable, and Good, for the metric group in the current scorecard run.
¨ PreviousWeightedAverage. Weighted average value for the metric group in the previous scorecard run.
¨ PreviousRange. The score range, such as Unacceptable, Acceptable, and Good, for the metric group in the previous scorecard run.
¨ ColumnName. Name of the source column that the metric is assigned to.
¨ ColumnType. Type of the source column.
¨ RuleName. Name of the rule.
¨ RuleType. Type of the rule.
¨ DataObjectName. Name of the source data object.
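As an illustration of how these tags behave as placeholders, the sketch below substitutes hypothetical values with Python's string.Template. It is not the Analyst tool's actual rendering engine, and the template text and values are invented for the example:

```python
from string import Template

# Hypothetical template text; the placeholder names match the documented tags.
body = Template(
    "Scorecard $ScorecardName: group $MetricGroupName moved from "
    "$PreviousRange ($PreviousWeightedAverage) to "
    "$CurrentRange ($CurrentWeightedAverage)."
)

message = body.substitute(
    ScorecardName="Customer_Address_SC",
    MetricGroupName="Address",
    PreviousRange="Good",
    PreviousWeightedAverage="92.5",
    CurrentRange="Acceptable",
    CurrentWeightedAverage="84.0",
)
print(message)
# → Scorecard Customer_Address_SC: group Address moved from Good (92.5) to Acceptable (84.0).
```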
Setting Up Scorecard Notifications
You can set up scorecard notifications at both metric and metric group levels. Global notification settings apply to
those metrics and metric groups that do not have individual notification settings.
1.
Run a scorecard in the Analyst tool.
2.
Click Actions > Edit.
3.
Click the Notifications tab.
4.
Select Enable notifications to start configuring scorecard notifications.
5.
Select a metric or metric group.
6.
Click the Notifications check box to enable the global settings for the metric or metric group.
7.
Select Use custom settings to change the settings for the metric or metric group.
You can choose to send a notification email when the score is in Unacceptable, Acceptable, and Good
ranges and moves across thresholds.
8.
To edit the global settings for scorecard notifications, click the Edit Global Settings icon.
The Edit Global Settings dialog box appears where you can edit the settings including the email template.
Configuring Global Settings for Scorecard Notifications
If you choose the global scorecard notification settings, the Analyst tool sends emails to target users when the
score is in the Unacceptable range or moves down across thresholds. As part of the global settings, you can
configure the email template including the email addresses and message text for a scorecard.
1.
Run a scorecard in the Analyst tool.
2.
Click Actions > Edit to open the Edit Scorecard dialog box.
3.
Click the Notifications tab.
4.
Select Enable notifications to start configuring scorecard notifications.
5.
Click the Edit Global Settings icon.
The Edit Global Settings dialog box appears where you can edit the settings, including the email template.
6.
Choose when you want to send email notifications using the Score in and Score moves check boxes.
7.
In the Email from field, change the email ID as necessary.
By default, the Analyst tool uses the Sender Email Address property of the Data Integration Service as the
sender email ID.
8.
In the Email to field, enter the email ID of the recipient.
Use a semicolon to separate multiple email IDs.
9.
Enter the text for the email subject.
10.
In the Body field, add the introductory and closing text of the email message.
11.
To apply the global settings, select Apply settings to all metrics and metric groups.
12.
Click OK.
Scorecard Integration with External Applications
You can create a scorecard in the Analyst tool and view its results in external applications or web portals. Specify
the scorecard results URL in a format that includes the host name, port number, project ID, and scorecard ID to
view the results in external applications.
Open a scorecard after you run it and copy its URL from the browser. The scorecard URL must be in the following
format:
http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL}
The scorecard URL uses the following attributes:
¨ HOST_NAME. Host name of the Analyst Service.
¨ PORT. Port number for the Analyst Service.
¨ MRS_PROJECT_ID. Project ID in the Model repository.
¨ SCORECARD_ID. ID of the scorecard.
¨ MRS_PARENT_PATH. Location of the scorecard in the Analyst tool. For example, /project1/folder1/sub_folder1.
¨ VIEW_MODE. Determines whether a read-only or editable view of the scorecard gets integrated with the external application.
¨ CREDENTIAL. Last part of the URL generated by the single sign-on feature that represents the object type, such as scorecard.
The VIEW_MODE attribute in the scorecard URL determines whether you can integrate a read-only or editable
view of the scorecard with the external application:
view=objectonly
Displays a read-only view of the scorecard results.
view=objectrunonly
Displays scorecard results where you can run the scorecard and drill down on results.
view=full
Opens the scorecard results in the Analyst tool with full access.
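Given these attributes, an external application could assemble the URL as in the following sketch. The host, port, IDs, and credential values are hypothetical examples:

```python
from urllib.parse import urlencode

def scorecard_url(host, port, project_id, scorecard_id, parent_path,
                  view_mode, credential):
    """Assemble a scorecard results URL in the documented format.

    urlencode percent-encodes the parent path; all values passed in
    below are invented for illustration.
    """
    query = urlencode({
        "mode": "scorecard",
        "project": project_id,
        "id": scorecard_id,
        "parentpath": parent_path,
        "view": view_mode,
        "pcsfcred": credential,
    })
    return (f"http://{host}:{port}/AnalystTool/"
            f"com.informatica.at.AnalystTool/index.jsp?{query}")

url = scorecard_url("analyst.example.com", 6008, "Prj01", "SC42",
                    "/project1/folder1", "objectonly", "abc123")
print(url)
```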
Viewing a Scorecard in External Applications
You can view a scorecard in external applications or web portals by using the scorecard URL. Copy the scorecard URL from the Analyst tool and add it to the source code of the external application or web portal.
1.
Run a scorecard in the Analyst tool.
2.
Copy the scorecard URL from the browser.
3.
Verify that the URL matches the following format: http://{HOST_NAME}:{PORT}/AnalystTool/com.informatica.at.AnalystTool/index.jsp?mode=scorecard&project={MRS_PROJECT_ID}&id={SCORECARD_ID}&parentpath={MRS_PARENT_PATH}&view={VIEW_MODE}&pcsfcred={CREDENTIAL}
4.
Add the URL to the source code of the external application or web portal.
CHAPTER 13
Exception Record Management
This chapter includes the following topics:
¨ Exception Record Management Overview, 66
¨ Exception Management Tasks, 68
Exception Record Management Overview
An exception is a record that contains unresolved data quality issues. The record may contain errors, or it may be
an unintended duplicate of another record. You can use the Analyst tool to review and edit exception records that
are identified by a mapping that contains an Exception transformation.
You can review and edit the output from an Exception transformation in the Analyst tool or in the Informatica Data
Director for Data Quality web application. You use Informatica Data Director for Data Quality when you are
assigned a task as part of a workflow.
You can use the Analyst tool to review the following exception types:
Bad records
You can edit records, delete records, tag them to be reprocessed by a mapping, or profile them to analyze the
quality of changes made to the records.
Duplicate records
You can consolidate clusters of similar records to a single master record. You can consolidate or remove
duplicate records, extract records to form new clusters, and profile duplicate records.
The Exception transformation creates a database table to store the bad or duplicate records. The Model repository
stores the data object associated with the table. The transformation also creates one or more tables for the
metadata associated with the bad or duplicate records.
To review and update the bad or duplicate records, import the database table to the staging database in the
Analyst tool. The Analyst tool uses the metadata tables in the database to identify the data quality issues in each
record. You do not use the data object in the Model repository to update the record data.
Exception Management Process Flow
The Exception transformation analyzes the output of other data quality transformations and creates tables that
contain records with different levels of data quality.
After the Exception transformation creates an exception table, you can use the Analyst tool or Informatica Data
Director for Data Quality to review and update the records in the table.
You can configure data quality transformations in a single mapping, or you can create mappings for different
stages in the process.
Use the Developer tool to perform the following tasks:
Create a mapping that generates score values for data quality issues
Use a Match transformation in cluster mode to generate score values for duplicate record exceptions.
Use a transformation that writes a business rule to generate score values for records that contain errors. For
example, you can define an IF/THEN rule in a Decision transformation. Use the rule to evaluate the output of
other data quality transformations.
Use an Exception transformation to analyze the record scores
Configure the Exception transformation to read the output of other transformations or to read a data object
from another mapping. Configure the transformation to write records to database tables based on score
values in the records.
Configure target data objects for good records or automatic consolidation records
Connect the Exception transformation output ports to the target data objects in the mapping.
Create the target data object for bad or duplicate records
Use the Generate bad records table or Generate duplicate record table option to create the database
object and add it to the mapping canvas. The Developer tool auto-connects the bad or duplicate record ports
to the data object.
Run the mapping
Run the mapping to process exceptions.
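The score-based routing that the tasks above configure can be sketched as follows. The threshold values, boundary handling, and target names are assumptions for illustration, not the Exception transformation's actual defaults:

```python
def route_record(record, upper=90.0, lower=40.0):
    """Route a record to a target table by its data quality score.

    Scores at or above `upper` pass through as good records, scores
    below `lower` go to the bad record table, and scores in between
    become exceptions for manual review. All thresholds here are
    illustrative assumptions.
    """
    score = record["score"]
    if score >= upper:
        return "good"
    if score < lower:
        return "bad"
    return "exception"

print(route_record({"id": 1, "score": 95.0}))  # → good
print(route_record({"id": 2, "score": 60.0}))  # → exception
print(route_record({"id": 3, "score": 10.0}))  # → bad
```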
Use the Analyst tool or Informatica Data Director for Data Quality to perform the following tasks:
Review the exception table data
You can use the Analyst tool or Informatica Data Director for Data Quality to review the bad or duplicate
record tables.
¨ Use the Analyst tool to import the exception records into a bad or duplicate record table. Open the
imported table from the Model repository and work on the exception data.
¨ Use Informatica Data Director for Data Quality if you are assigned a task to review or correct exceptions
as part of a Human task.
Note: The exception tables you create in the Exception transformation include columns that provide metadata
to Informatica Data Director for Data Quality. The columns are not used in the Analyst tool. When you import
the tables to the Analyst tool for exception data management, the Analyst tool hides the columns.
Reserved Column Names
When you create a bad record or consolidation table, the Analyst tool generates columns for use in its internal
tables. Do not import tables that use these names. If an imported table contains a column with the same name as
one of the generated columns, the Analyst tool will not process it.
Reserve the following column names for bad record or consolidation tables:
¨ checkStatus
¨ rowIdentifier
¨ acceptChanges
¨ recordGroup
¨ masterRecord
¨ matchScore
¨ any name beginning with DQA_
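Before you import a table, you can screen its column names against the reserved list. The following Python sketch is an illustration only (the function and sample column names are not part of the product):

```python
# Column names the Analyst tool reserves for bad record and
# consolidation tables; imported tables must not reuse them.
RESERVED_NAMES = {"checkStatus", "rowIdentifier", "acceptChanges",
                  "recordGroup", "masterRecord", "matchScore"}

def reserved_conflicts(column_names):
    """Return the columns that clash with a reserved name or that
    begin with the reserved DQA_ prefix."""
    return [name for name in column_names
            if name in RESERVED_NAMES or name.startswith("DQA_")]

print(reserved_conflicts(["customer_id", "matchScore", "DQA_SOURCE"]))
# prints ['matchScore', 'DQA_SOURCE']
```

Rename any flagged column in the source table before you import it.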
Exception Management Tasks
You can perform the following exception management tasks in the Analyst tool:
Manage bad records
Identify problem records and fix data quality issues.
Consolidate duplicate records
Merge groups of duplicate records into a single record.
View the audit trail
Review the changes made in the bad or duplicate record tables before writing the changes to the source
database.
Viewing and Editing Bad Records
Complete these steps to view and edit bad records:
1. Log in to the Analyst tool.
2. Select a project.
3. Select a bad records table.
4. Optionally, use the menus to filter the table records. You can filter records by value in the following columns: Priority, Quality Issue, Column, and Status.
5. Click Show to view the records that match the filter criteria.
6. Double-click a cell to edit the cell value.
7. Click Save to save the rows you updated.
Saving changes to a record is the first step in processing the record in the Analyst tool. After you save changes to
a record, you can update the record status to accept, reprocess, or reject the record.
Updating Bad Record Status
For each record that does not require further editing, perform one of the following actions:
Select one or more records by clicking the check box next to each record. Select all the records in the table by
clicking the check box at the top of the first column.
Note: The Analyst tool does not display records that you have taken action on.
¨ Click Accept. Indicates that the record is acceptable for use.
¨ Click Reject. Indicates that the record is not acceptable for use.
¨ Click Reprocess. Selects the record for reprocessing by a data quality mapping. Select this option when you are unsure if the record is valid. Rerun the mapping with an updated business rule to recheck the record.
Chapter 13: Exception Record Management
Viewing and Filtering Duplicate Record Clusters
Complete these steps to view and filter duplicate clusters:
1. Log in to the Analyst tool.
2. Select a project.
3. Select a duplicate record table.
4. The first cluster in the table opens.
The Analyst tool also displays the number of clusters in the table. Click a number to move to a cluster.
5. Optionally, use the Filter option to filter the cluster list.
In the Filter Clusters dialog box, select a column and enter a filter string. The Analyst tool returns all clusters with one or more records that contain the string in the column you select.
Editing Duplicate Record Clusters
Edit clusters to change how the Analyst tool consolidates potential duplicate records.
You can edit clusters in the following ways:
To remove a record from a cluster:
Clear the selection in the Cluster column to remove the record from the cluster. When you remove a record
from a cluster, the record assumes a unique cluster ID.
To create a new cluster from records in the current cluster:
Select a subset of records and click the Extract Cluster button. This action creates a new cluster ID for the
selected records.
To edit the record:
Select a record field to edit the data in that field.
To select the fields that populate the master record:
Click the selection arrow in a field to add its value to the corresponding field in the Final Record row. An
arrow indicates that the field provides data for the master record.
To specify a master record:
Click a cell in the Master column for a row to select that row as the master record.
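The cluster ID behavior behind these edits can be modeled in a few lines. This Python sketch is a hypothetical illustration, not product code: a removed record assumes a unique cluster ID, and extracted records move together under one new ID.

```python
import itertools

# Fresh cluster IDs that no existing cluster uses (illustrative choice).
_new_ids = itertools.count(1000)

def remove_from_cluster(record):
    """A record removed from its cluster assumes a unique cluster ID."""
    record["cluster_id"] = next(_new_ids)

def extract_cluster(records):
    """Extracted records share a single new cluster ID."""
    new_id = next(_new_ids)
    for record in records:
        record["cluster_id"] = new_id

a, b, c = {"cluster_id": 7}, {"cluster_id": 7}, {"cluster_id": 7}
remove_from_cluster(a)    # a now stands alone in its own cluster
extract_cluster([b, c])   # b and c form a new two-record cluster
```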
Consolidating Duplicate Record Clusters
When you have processed a cluster, complete this step to consolidate the cluster records to a single record in the
staging database.
¨ In the cluster you processed, click the Consolidate Cluster button.
The Analyst tool performs the following updates on cluster records:
¨ In the staging database, the Analyst tool updates the master record with the contents of the Final record and
sets the status to Updated.
¨ The Analyst tool sets the status of the other selected records to Consolidated.
¨ The Analyst tool sets the status of any cleared record to Reprocess.
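The three status updates can be summarized in a short sketch. This hypothetical Python function mirrors the rules above; the record field names are assumptions for illustration.

```python
def consolidate_cluster(records, master, final_values):
    """Apply consolidation status updates to the records in a cluster:
    the master record takes the Final record values and becomes Updated,
    other selected records become Consolidated, and cleared records
    are marked Reprocess."""
    for record in records:
        if record is master:
            record.update(final_values)   # contents of the Final record
            record["status"] = "Updated"
        elif record["selected"]:
            record["status"] = "Consolidated"
        else:
            record["status"] = "Reprocess"

cluster = [{"name": "J Smith", "selected": True},
           {"name": "John Smith", "selected": True},
           {"name": "Jon Smyth", "selected": False}]
consolidate_cluster(cluster, cluster[0], {"name": "John Smith"})
# cluster[0] is Updated, cluster[1] is Consolidated, cluster[2] is Reprocess
```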
Viewing the Audit Trail
The Analyst tool tracks changes to the exception record database in an audit trail.
Complete the following steps to view audit trail records:
1. Select the Audit Trail tab.
2. Set the filter options.
3. Click Show.
The following table describes record statuses for the audit trail.
- Updated. Edited during bad record processing, or selected as the master record during consolidation.
- Consolidated. Consolidated to a master record during consolidation.
- Rejected. Rejected during bad record processing.
- Accepted. Accepted during bad record processing.
- Reprocess. Marked for reprocessing during bad record processing.
- Rematch. Removed from a cluster during consolidation.
- Extracted. Extracted from a cluster into a new cluster during consolidation.
CHAPTER 14
Reference Tables
This chapter includes the following topics:
¨ Reference Tables Overview, 71
¨ Reference Table Properties, 71
¨ Create Reference Tables, 73
¨ Create a Reference Table from Profile Data, 74
¨ Create a Reference Table From a Flat File, 76
¨ Create a Reference Table from a Database Table, 78
¨ Copying a Reference Table in the Model Repository, 79
¨ Reference Table Management, 79
¨ Audit Trail Events, 81
¨ Rules and Guidelines for Reference Tables, 82
Reference Tables Overview
Informatica provides reference tables that you can import to the Model repository. You can also create reference
tables and connect to database tables that contain reference data.
Use the Analyst tool to create and update reference tables.
Reference Table Properties
You can view and edit the properties of a reference table in the Analyst tool.
To view the properties, open the reference table and select the Properties view.
To edit the properties, open the reference table and select the Edit Table option.
A reference table displays general properties that describe the repository object and column properties that
describe the column data.
General Reference Table Properties
The general properties include information about the users who created and updated the reference table. The
general properties also identify the current valid column in the table.
The following table describes the general properties:
- Name. Name of the reference table.
- Description. Optional description of the reference table.
- Location. Project that contains the reference table in the Model repository.
- Precision. Precision for the column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate.
- Valid Column. Column that contains the valid reference data.
- Created on. Creation date for the reference table.
- Created By. User who created the reference table.
- Last Modified. Date of the most recent update to the reference table.
- Last Modified By. User who most recently edited the reference table.
- Connection ID. Connection name of the database that stores the reference table data.
Reference Table Column Properties
The column properties include information about the column metadata.
The following table describes the column properties:
- Name. Name of each column.
- Data Type. The datatype for the data in each column. You can select one of the following datatypes: bigint, date/time, decimal, double, integer, and string. You cannot select the double datatype when you create an empty reference table or create a reference table from a flat file.
- Precision. Precision for each column. Precision is the maximum number of digits or the maximum number of characters that the column can accommodate. The precision values you configure depend on the data type.
- Scale. Scale for each column. Scale is the maximum number of digits that a column can accommodate to the right of the decimal point. Scale applies to decimal columns. The scale values you configure depend on the data type.
- Description. Optional description for each column.
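A few of these column rules can be expressed as simple checks. The sketch below is illustrative only and covers just the constraints described above (the supported datatypes, the double restriction for empty tables and flat file imports, and scale applying to decimal columns); it is not the product's full validation, and the source labels are assumptions.

```python
DATATYPES = {"bigint", "date/time", "decimal", "double", "integer", "string"}

def check_column(data_type, scale=0, source="table"):
    """Return a list of problems with a proposed reference table column.
    source is one of "table", "empty", or "flat_file" (assumed labels)."""
    problems = []
    if data_type not in DATATYPES:
        problems.append("unsupported datatype: %s" % data_type)
    if data_type == "double" and source in ("empty", "flat_file"):
        problems.append("double is not available for this creation method")
    if scale and data_type != "decimal":
        problems.append("scale applies to decimal columns only")
    return problems

print(check_column("double", source="flat_file"))
# prints ['double is not available for this creation method']
```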
Create Reference Tables
Use the reference table editor, profile results, or a flat file to create reference tables. Create reference tables to
share reference data with developers in the Developer tool.
Use the following methods to create a reference table:
¨ Create a reference table in the reference table editor.
¨ Create a reference table from profile column data or profile pattern data.
¨ Create a reference table from flat file data.
¨ Create a reference table from data in another database table.
Creating a Reference Table in the Reference Table Editor
Use the New Reference Table Wizard and the reference table editor view to create a reference table. You use the
reference table editor to define the table structure and add data to the table.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Use the reference table editor.
4. Click Next.
5. Enter the table name, and optionally enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
6. For each column you want to include in the reference table, click the Add New Column icon and configure the properties for each column.
Note: You can reorder or delete columns.
7. Optionally, enter an audit note for the table.
The audit note appears in the audit trail log.
8. Click Finish.
Create a Reference Table from Profile Data
You can use profile data to create reference tables that relate to the source data in the profile. Use the reference
tables to find different types of information in the source data.
You can use a profile to create or update a reference table in the following ways:
¨ Select a column in the profile and add it to a reference table.
¨ Browse a profile column and add a subset of the column data to a reference table.
¨ Select a column in the profile and add the pattern values for that column to a reference table.
Creating a Reference Table from Profile Columns
You can create a reference table from a profile column. You can add a profile column to an existing reference
table. The New Reference Table Wizard adds the column to the reference table.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. Click Actions > Add to Reference Table.
The New Reference Table Wizard appears.
5. Select the option to Create a new reference table.
Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project or folder, preview the reference table data, and click Next. Select the column to add and click Finish.
6. Click Next.
7. The column name appears by default as the table name. Optionally, enter another table name, a description, and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Click Next.
9. In the Column Attributes panel, configure the column properties for the column.
10. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
11. Preview the column values in the Preview panel.
12. Click Next.
13. The column name appears as the table name by default. Optionally, enter another table name and a description.
14. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables panel lists the reference tables in the location you select.
15. Optionally, enter an audit note.
16. Click Finish.
Creating a Reference Table from Column Values
You can create a reference table from the column values in a profile column. Select a column in a profile and
select the column values to add to a reference table or create a reference table to add the column values.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Values view, select the column values you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5. Click Actions > Add to Reference Table.
The New Reference Table Wizard appears.
6. Select the option to Create a new reference table.
Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project or folder, preview the reference table data, and click Next. Select the column to add and click Finish.
7. Click Next.
8. The column name appears by default as the table name. Optionally, enter another table name, a description, and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
12. Preview the column values in the Preview panel.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a description.
15. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables panel lists the reference tables in the location you select.
16. Optionally, enter an audit note.
17. Click Finish.
Creating a Reference Table from Column Patterns
You can create a reference table from the column patterns in a profile column. Select a column in the profile and
select the pattern values to add to a reference table or create a reference table to add the pattern values.
1. In the Navigator, select the project or folder that contains the profile with the column that you want to add to a reference table.
2. Click the profile name to open it in another tab.
3. In the Column Profiling view, select the column that you want to add to a reference table.
4. In the Patterns view, select the column patterns you want to add. Use the CONTROL or SHIFT keys to select multiple values.
5. Click Actions > Add to Reference Table.
The New Reference Table Wizard appears.
6. Select the option to Create a new reference table.
Optionally, select Add to existing reference table, and click Next. Navigate to the reference table in the project or folder, preview the reference table data, and click Next. Select the column to add and click Finish.
7. Click Next.
8. The column name appears by default as the table name. Optionally, enter another table name, a description, and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
9. Click Next.
10. In the Column Attributes panel, configure the column properties for the column.
11. Optionally, choose to create a description column for rows in the reference table.
Enter the name and precision for the column.
12. Preview the column values in the Preview panel.
13. Click Next.
14. The column name appears as the table name by default. Optionally, enter another table name and a description.
15. In the Save in panel, select the location where you want to create the reference table.
The Reference Tables panel lists the reference tables in the location you select.
16. Optionally, enter an audit note.
17. Click Finish.
Create a Reference Table From a Flat File
You can import reference data from a CSV file. Use the New Reference Table wizard to import the file data.
You must configure the properties for each flat file that you use to create a reference table.
Analyst Tool Flat File Properties
When you import a flat file as a reference table, you must configure the properties for each column in the file. The
options that you configure determine how the Analyst tool reads the data from the file.
The following table describes the properties you can configure when you import file data for a reference table:
- Delimiters. Character used to separate columns of data. Use the Other field to enter a different delimiter. Delimiters must be printable characters and must be different from the escape character and the quote character if selected. You cannot select non-printing multibyte characters as delimiters.
- Text Qualifier. Quote character that defines the boundaries of text strings. Choose No Quote, Single Quote, or Double Quotes. If you select a quote character, the wizard ignores delimiters within pairs of quotes.
- Column Names. Imports column names from the first line. Select this option if column names appear in the first row. The wizard uses data in the first row in the preview for column names. Default is not enabled.
- Values. Option to start value import from a line. Indicates the row number in the preview at which the wizard starts reading when it imports the file.
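These options correspond closely to standard delimited file parsing. As a rough illustration of the concepts (not a description of the wizard's internals), Python's csv module applies the same ideas: a delimiter character, a text qualifier that hides delimiters inside quotes, and a first row that can supply column names.

```python
import csv
import io

# Sample file text: comma delimiter, double-quote text qualifier,
# column names in the first row (illustrative data).
sample = io.StringIO('name,state\n"Smith, John",NY\n"Doe, Jane",CA\n')

reader = csv.reader(sample, delimiter=",", quotechar='"')
header = next(reader)   # Column Names option: read names from line 1
rows = list(reader)     # commas inside the qualifier are not delimiters

print(header)           # prints ['name', 'state']
print(rows[0])          # prints ['Smith, John', 'NY']
```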
Creating a Reference Table from a Flat File
When you create a reference table from a flat file, the table uses the column structure of the file and imports
the file data.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Import a flat file.
4. Click Next.
5. Click Browse to select the flat file.
6. Click Upload to upload the file to a directory in the Informatica services installation directory that the Analyst tool can access.
7. Enter the table name. Optionally, enter a description and default value.
The Analyst tool uses the default value for any table record that does not contain a value.
8. Select a code page that matches the data in the flat file.
9. Preview the data in the Preview of file panel.
10. Click Next.
11. Configure the flat file properties.
12. In the Preview panel, click Show to update the preview.
13. Click Next.
14. On the Column Attributes panel, verify or edit the column properties for each column.
15. Optionally, create a description column for rows in the reference table. Enter the name and precision for the column.
16. Optionally, enter an audit note for the table.
17. Click Finish.
Create a Reference Table from a Database Table
When you create a reference table from a database table, you connect to the database and import the table data.
Use the New Reference Table wizard to enter the database connection properties for the table. Then import the
table into a folder in the Model repository.
Creating a Database Connection
Before you import reference tables from a database, you create a database connection in the Analyst tool.
1. Select a project or folder in the Navigator.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Connect to a relational table.
Optionally, select the option to create an unmanaged reference table. If you select this option, the Analyst tool does not store the reference table data in the reference data database.
4. Click Next.
5. Click New Connection.
The New Connection window appears.
6. Enter the properties for the database you want to connect to.
7. Select Grant everyone execute permission on this connection.
8. Click OK.
The Analyst tool tests the database connection. The database connection appears in the list of established connections.
Creating a Reference Table from a Database Table
To create the reference table, connect to a database and import the column data you need.
1. In the Navigator, select the project or folder where you want to create the reference table.
2. Click Actions > New > Reference Table.
The New Reference Table Wizard appears.
3. Select the option to Connect to a relational table.
4. Select Unmanaged Table if you want to create a table that does not store data in the reference data database. You cannot edit the values in an unmanaged reference table.
5. Click Next.
6. Select the database connection from the list of established connections.
7. Click Next.
8. On the Tables panel, select a table.
The table properties appear on the Properties panel.
9. Optionally, click Data Preview.
10. Click Next.
11. On the Column Attributes panel, configure the column properties for each column.
12. Optionally, include a column for row-level descriptions.
13. Optionally, add an audit note in the Audit Note field.
14. Click Next.
15. Enter a name and optionally a description for the reference table.
16. On the Folders panel, select the project or folder where you want to create the reference table.
The Reference Tables panel lists the reference tables in the folder you select.
17. Click Finish.
Copying a Reference Table in the Model Repository
You can copy a reference table between folders in a Model repository project.
The reference table and the copy you create are not linked in the Model repository or in the database. When you
create a copy, you create a new database table.
1. Browse the Model repository, and find the reference table you want to copy.
2. Right-click the reference table, and select Duplicate from the context menu.
3. In the Duplicate dialog box, select a folder to store the copy of the reference table.
4. Optionally, enter a new name for the copy of the reference table.
5. Click OK.
Reference Table Management
You can perform tasks to manage reference tables. You can find and replace column values, add or remove
columns and rows, edit column values, and export a reference table to a file.
You can perform the following tasks to manage reference tables:
¨ Manage columns. Use the Edit column properties window to add, edit, or delete columns in a reference
table.
¨ Manage rows. Use the Add Rows window to add rows and the Edit Row window to edit rows in a reference
table. Use the Delete icon to delete rows in a reference table.
¨ Find and replace values. Find a value in an individual reference table column and replace it with another
value, or replace all values in a column with another value.
¨ Export a reference table. Export a reference table to a comma-separated values (CSV) file, dictionary file, or
Excel file.
Managing Columns
Use the Edit column properties window to add, edit, or delete columns in a reference table.
1. In the Navigator, select the project or folder that contains the reference table that you want to edit.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Edit Table or click the Edit Table icon.
The Edit column properties window appears.
4. To add a column, click the Add New Column icon in the Column Attributes panel and edit the column properties. Or, to edit an existing column, click the property you want to edit.
You cannot edit the datatype, precision, or scale of a column. You can rename the column and change the column description.
5. To delete a column, click the column and click the Delete icon.
6. Optionally, enter an audit note on the Audit Note panel. The audit note appears in the audit log for any action you perform in the Edit column properties window.
7. Click OK.
Managing Rows
You can add, edit, or delete rows in a reference table.
1. In the Navigator, select the project or folder containing the reference table that you want to edit.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. To add a row, click Actions > Add Row or click the Add Row icon. In the Add Row window, enter the value for each column and enter an optional audit note. Click OK.
4. To edit rows, select the rows and click Actions > Edit or click the Edit icon. In the Edit Rows window, enter the value for each column, select the columns to apply the changes to, and enter an optional audit note. Optionally, click Previous to edit the previous row and click Next to edit the next row. Click Apply to apply the changes.
The new column values appear in the tab.
5. To delete rows, select the rows you want to delete and click Actions > Delete or click the Delete icon. In the Delete Rows window, enter an optional audit note and click OK.
Note: Use the Developer tool to edit larger reference tables. For example, if the reference table contains more than
500 rows or five columns, edit the reference table in the Developer tool.
Finding and Replacing Values
You can find and replace values in individual reference table columns.
1. In the Navigator, select the project or folder containing the reference table that you want to find and replace values in.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Find and Replace or click the Find and Replace icon.
The Find and Replace toolbar appears.
4. Enter the search criteria in the Find box. In the list, select all columns or the column that you want to search. Enter the replacement value, and click one of the following buttons:
- Next/Previous. Scroll through the column values that match the search criteria.
- Highlight All. Highlight all the column values that match the search criteria.
- Replace. Replace the currently highlighted column value.
- Replace All. Replace all occurrences of the search criteria in column values.
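The Replace and Replace All behavior can be sketched as follows. The function is a hypothetical illustration over rows held as dictionaries, not an Analyst tool API.

```python
def find_and_replace(rows, column, find, replacement, replace_all=True):
    """Replace matching values in one column. With replace_all=False,
    only the first match changes (Replace versus Replace All)."""
    count = 0
    for row in rows:
        if row[column] == find and (replace_all or count == 0):
            row[column] = replacement
            count += 1
    return count

rows = [{"state": "N.Y."}, {"state": "N.Y."}, {"state": "CA"}]
print(find_and_replace(rows, "state", "N.Y.", "NY"))   # prints 2
```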
Exporting a Reference Table
Export a reference table to a comma-separated values (CSV) file, dictionary file, or Microsoft Excel file.
1. In the Navigator, select the project or folder containing the reference table that you want to export.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click Actions > Export Data.
The Export data to a file window appears.
4. Configure the following options:
- File Name. File name for the exported data.
- File Format. Format of the exported file. You can select the following formats:
¨ csv. Comma-separated values file.
¨ xls. Microsoft Excel file.
¨ dic. Dictionary file.
Optionally, select Export field names as first row to export the column names as a header row in the exported file.
- Code Page. Code page of the reference data.
5. Click OK.
The options to save or open the file depend on your browser.
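For the CSV format, the export with an optional header row works like the following sketch (illustrative Python, not the Analyst tool's implementation):

```python
import csv
import io

def export_csv(column_names, rows, field_names_as_first_row=True):
    """Write reference table rows as CSV text; the flag mirrors the
    'Export field names as first row' option."""
    out = io.StringIO()
    writer = csv.writer(out)
    if field_names_as_first_row:
        writer.writerow(column_names)
    writer.writerows(rows)
    return out.getvalue()

text = export_csv(["code", "state"],
                  [["NY", "New York"], ["CA", "California"]])
print(text.splitlines()[0])   # prints code,state
```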
Audit Trail Events
Use the Audit Trail view for a reference table to view audit trail log events.
The Analyst tool creates audit trail log events when you make a change to a reference table and enter an audit
trail note. Audit trail log events provide information about the reference tables that you manage.
You can configure query options on the Audit Trail tab to filter the log events that you view. You can specify filters
on the date range, type, user name, and status. The following table describes the options you configure when you
view audit trail log events:
- Date. Start and end dates for the log events to search for. Use the calendar to choose dates.
- Type. Type of audit trail events. You can filter and view the following event types:
- Data. Events related to data in the reference table. Events include creating, editing, deleting, and replacing all rows.
- Metadata. Events related to reference table metadata. Events include creating reference tables, adding, deleting, and editing columns, and updating valid columns.
- User. User who edited the reference table and entered the audit trail comment. The Analyst tool generates the list of users from the Analyst tool users configured in the Administrator tool.
- Status. Status of the audit trail log events. Status corresponds to the action performed in the reference table editor.
Audit trail log events also include the audit trail comments and the column values that were inserted, updated, or
deleted.
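The query options behave like a simple filter over the log. This Python sketch is illustrative only; the event field names are assumptions, not the product's schema.

```python
from datetime import date

def filter_events(events, start, end, event_type=None, user=None):
    """Return log events within the date range, optionally narrowed
    by type ("Data" or "Metadata") and by user name."""
    return [e for e in events
            if start <= e["date"] <= end
            and (event_type is None or e["type"] == event_type)
            and (user is None or e["user"] == user)]

log = [{"date": date(2012, 12, 3), "type": "Data", "user": "analyst1"},
       {"date": date(2012, 12, 9), "type": "Metadata", "user": "analyst2"}]
hits = filter_events(log, date(2012, 12, 1), date(2012, 12, 31),
                     event_type="Data")
# hits contains only the first event
```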
Viewing Audit Trail Events
View audit trail log events to get more information about changes made to a reference table.
1. In the Navigator, select the project or folder that contains the reference table that you want to view the audit trail for.
2. Click the reference table name to open it in a tab. The Reference Table tab appears.
3. Click the Audit Trail view.
4. Configure the filter options.
5. Click Show.
The log events for the specified query options appear.
Rules and Guidelines for Reference Tables
Use the following rules and guidelines while working with reference tables in the Analyst tool:
¨ When you import a reference table from an Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, or Microsoft SQL
Server database, the Analyst tool cannot display the preview if the table, view, schema, synonym, or column
names contain mixed case or lower case characters.
To preview data in tables that reside in case-sensitive databases, set the Support Mixed Case Identifiers
attribute to true in the connections for Oracle, IBM DB2, IBM DB2/zOS, IBM DB2/iOS, and Microsoft SQL
Server databases in the Developer tool or Administrator tool.
¨ When you create a reference table from inferred column patterns in one format, the Analyst tool populates the
reference table with column patterns in a different format. For example, when you create a reference table for
the column pattern X(5), the Analyst tool displays the following format for the column pattern in the reference
table: XXXXX.
¨ When you import an Oracle database table, verify the length of any VARCHAR2 column in the table. The
Analyst tool cannot import an Oracle database table that contains a VARCHAR2 column with a length greater
than 1000.
¨ To read a reference table, you need execute permissions on the connection to the database that stores the
table data values. For example, if the reference data database stores the data values, you need execute
permissions on the connection to the reference data database. This applies whether you access the reference
table in read or write mode. The database connection permissions apply to all reference data in the database.
INDEX
C
column profile
drilldown 49
Informatica Developer 21
options 20
overview 19
process 39
column profile results
Informatica Developer 23
column properties
reference tables in Analyst tool 71
reference tables in Developer tool 32
creating a custom profile
profiles 41
creating a reference table from column patterns
reference tables 75
creating a reference table from column values
reference tables 75
creating a reference table from profile columns
reference tables 74
creating a reference table manually
reference tables 73
creating an expression rule
rules 54
D
data object profiles
creating a single profile 22
E
exporting a reference table
reference tables 81
expression rules
process 54
F
finding and replacing values
reference tables 80
flat file properties
reference tables in Analyst tool 71
reference tables in Developer tool 32
flat files
synchronizing a flat file data object 43
I
importing a reference table
reference tables 77
Informatica Analyst
column profile results 45
column profiles overview 38
rules 52
Informatica Data Quality
overview 2
Informatica Developer
rules 26
M
managing columns
reference tables 79
managing rows
reference tables 80
mapping object
running a profile 30
Mapplet and Mapping Profiling
Overview 30
P
predefined rules
process 53
profile results
column patterns 47
column statistics 48
column values 47
drilling down 49
Excel 50
exporting 50
exporting from Informatica Analyst 50
exporting in Informatica Developer 25
summary 46
profiles
creating a custom profile 41
running 42
R
reference tables
column properties in Analyst tool 71
column properties in Developer tool 32
creating a reference table from column patterns 75
creating a reference table from column values 75
creating a reference table from profile columns 74
creating a reference table manually 73
exporting a reference table 81
finding and replacing values 80
flat file properties in Analyst tool 71
flat file properties in Developer tool 32
importing a reference table 77
managed and unmanaged 7
managing columns 79
managing rows 80
viewing audit trail events 82
rules
applying a predefined rule 53
applying in Informatica Developer 27
creating an expression rule 54
creating in Informatica Developer 26
expression 54
overview 20
predefined 53
S
scorecard
configuring global notification settings 63
configuring notifications 63
viewing in external applications 65
scorecard integration
Informatica Analyst 64
scorecards
adding columns to a scorecard 57
creating a metric group 60
defining thresholds 59
deleting a metric group 61
drilling down 61
editing 59
editing a metric group 60
Informatica Analyst 56
Informatica Analyst process 56
Informatica Developer 28
metric groups 59
metric weights 57
metrics 57
moving scores 60
notifications 62
overview 20
running 58
viewing 58
T
tables
synchronizing a relational data object 44
trend charts
viewing 61
V
viewing audit trail events
reference tables 82