Release Notes
Cisco Data Prep
Version 1.0
First Published: August, 2015
Last Updated: July 10, 2015
Cisco Systems, Inc.
www.cisco.com
THE SPECIFICATIONS AND INFORMATION REGARDING THE PRODUCTS IN THIS MANUAL ARE SUBJECT TO CHANGE
WITHOUT NOTICE. ALL STATEMENTS, INFORMATION, AND RECOMMENDATIONS IN THIS MANUAL ARE BELIEVED TO BE
ACCURATE BUT ARE PRESENTED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED. USERS MUST TAKE FULL
RESPONSIBILITY FOR THEIR APPLICATION OF ANY PRODUCTS.
THE SOFTWARE LICENSE AND LIMITED WARRANTY FOR THE ACCOMPANYING PRODUCT ARE SET FORTH IN THE
INFORMATION PACKET THAT SHIPPED WITH THE PRODUCT AND ARE INCORPORATED HEREIN BY THIS REFERENCE. IF YOU
ARE UNABLE TO LOCATE THE SOFTWARE LICENSE OR LIMITED WARRANTY, CONTACT YOUR CISCO REPRESENTATIVE FOR
A COPY.
The Cisco implementation of TCP header compression is an adaptation of a program developed by the University of California,
Berkeley (UCB) as part of UCB’s public domain version of the UNIX operating system. All rights reserved. Copyright © 1981,
Regents of the University of California.
NOTWITHSTANDING ANY OTHER WARRANTY HEREIN, ALL DOCUMENT FILES AND SOFTWARE OF THESE SUPPLIERS ARE
PROVIDED “AS IS” WITH ALL FAULTS. CISCO AND THE ABOVE-NAMED SUPPLIERS DISCLAIM ALL WARRANTIES, EXPRESSED
OR IMPLIED, INCLUDING, WITHOUT LIMITATION, THOSE OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT OR ARISING FROM A COURSE OF DEALING, USAGE, OR TRADE PRACTICE.
IN NO EVENT SHALL CISCO OR ITS SUPPLIERS BE LIABLE FOR ANY INDIRECT, SPECIAL, CONSEQUENTIAL, OR INCIDENTAL
DAMAGES, INCLUDING, WITHOUT LIMITATION, LOST PROFITS OR LOSS OR DAMAGE TO DATA ARISING OUT OF THE USE OR
INABILITY TO USE THIS MANUAL, EVEN IF CISCO OR ITS SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
Any Internet Protocol (IP) addresses and phone numbers used in this document are not intended to be actual addresses and
phone numbers. Any examples, command display output, network topology diagrams, and other figures included in the
document are shown for illustrative purposes only. Any use of actual IP addresses or phone numbers in illustrative content is
unintentional and coincidental.
All printed copies and duplicate soft copies are considered uncontrolled copies, and the original online version should be
referred to for the latest version.
Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco website at
www.cisco.com/go/offices.
Red Hat® is a registered trademark of Red Hat, Inc.
CentOS™ is a trademark of the CentOS project.
Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries. SUSE® is a registered trademark of
SUSE LLC in the United States and other countries. Java™ and JDK™ are trademarks of Oracle Corporation.
MongoDB™ is a trademark of MongoDB, Inc. Chrome™ is a trademark of Google Inc.
Firefox™ is a trademark of the Mozilla Software Foundation.
Excel® and Windows® are registered trademarks of Microsoft Corporation in the United States and/or other countries.
Mac OS X® is a trademark of Apple Inc., registered in the U.S. and other countries.
Cisco and the Cisco logo are trademarks or registered trademarks of Cisco and/or its affiliates in the U.S. and other countries.
To view a list of Cisco trademarks, go to this URL: www.cisco.com/go/trademarks. Third-party trademarks mentioned are the
property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and
any other company. (1110R)
© 2015 Cisco Systems, Inc. All rights reserved.
Preface
Conventions
This document uses the following conventions.
Convention     Indication
bold font      Commands, keywords, and user-entered text appear in bold font.
italic font    Document titles, new or emphasized terms, and arguments for which you supply values are in italic font.
[ ]            Elements in square brackets are optional.
{x | y | z}    Required alternative keywords are grouped in braces and separated by vertical bars.
[x | y | z]    Optional alternative keywords are grouped in brackets and separated by vertical bars.
string         A nonquoted set of characters. Do not use quotation marks around the string or the string will include the quotation marks.
courier font   Terminal sessions and information the system displays appear in courier font.
< >            Nonprinting characters such as passwords are in angle brackets.
[ ]            Default responses to system prompts are in square brackets.
!, #           An exclamation point (!) or a pound sign (#) at the beginning of a line of code indicates a comment line.
Note: Means reader take note. Notes contain helpful suggestions or references to material not covered in the manual.
Caution: Means reader be careful. In this situation, you might perform an action that could result in equipment
damage or loss of data.
Obtaining Documentation and Submitting a Service Request
For information on obtaining documentation, using the Cisco Bug Search Tool (BST), submitting a service request, and
gathering additional information, see What’s New in Cisco Product Documentation at:
http://www.cisco.com/c/en/us/td/docs/general/whatsnew/whatsnew.html.
Subscribe to What’s New in Cisco Product Documentation, which lists all new and revised Cisco technical
documentation, as an RSS feed and deliver content directly to your desktop using a reader application. The RSS feeds
are a free service.
Document Change History
This section provides the revision history for this guide.
Version Number
Issue Date
1.0
August 2015
Status
Reason for Change
New product offering.
New in This Version
Introduction of New Admin Tenant
A new tenant has been created to maximize data security and privacy. Prior to this release, a Cisco Data Prep installation
started with a single, public, Production tenant for data preparation. Starting with this release, a Cisco Data Prep
installation creates two tenants: Admin and Production. By default, the Production tenant is set to create new projects
and data sets with private visibility. The Admin tenant supports administrative activities such as testing and
troubleshooting. Having a separate tenant eliminates the need for an administrator to directly work in the Production
tenant.
The following default users are automatically created for each tenant type:
Admin tenant – superuser and admin
Production tenant – prodadmin
Recommendations:
All administrative activities (out-of-band testing and troubleshooting) should be performed from the Admin tenant and not
the Production tenant.
The Production tenant must only be used for data preparation.
Use of the superuser credentials should be restricted to resetting passwords for the admin and prodadmin users.
Introduction of Resource-Level Permissions
In addition to the existing role-based user permissions, permissions are now further enforced with additional
resource-level permissions. The new resource-level permissions provide more granular control of your library data sets
and your projects. For example, you can now restrict specific users from being able to delete a specific data set Foo.
Overview of how the feature works:
Based on a tenant's resource visibility setting of public or private, a data set or project is set accordingly when
imported into or created in Cisco Data Prep. The default setting is private.
public tenant defined: The data set or project allows full access (for example, read, write, delete) to all tenant users
with appropriate user-level permissions. Public visibility preserves prior product behavior for backward compatibility.
private tenant defined: All data sets and projects remain private to the user who created them until that user gives
explicit access to other users. The resource-level permissions can be configured through the user interface or
programmatically using the REST API. See product help for more information on setting the resource-level
permissions.
Important: Changing a tenant's default resource visibility setting does not affect the public or private setting for data
sets that already exist in the tenant. Data sets imported into a public tenant remain public even if the tenant is later
set to private.
User actions are permitted only when a user has both the required user-level and resource-level permissions. In
this release, user-level and resource-level permissions are independent of one another. The resource-level
permissions apply to all versions of a data set; they are not version specific. Resource-level permissions are
enforced as soon as they are configured.
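The rules in this overview can be sketched as a small model. This is an illustration only, not the product's implementation: the class names, the function names, the action names, and the assumption that the creator of a private resource always passes the resource-level check are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Tenant:
    # Default visibility applied to resources when they are created or imported.
    default_visibility: str = "private"


@dataclass
class Resource:
    name: str
    owner: str
    visibility: str  # fixed at creation; later tenant changes do not alter it
    grants: dict = field(default_factory=dict)  # user -> set of granted actions


def create_resource(tenant, name, owner):
    """Create a resource that takes the tenant's current default visibility."""
    return Resource(name=name, owner=owner, visibility=tenant.default_visibility)


def is_allowed(user, user_level_actions, resource, action):
    """Permit an action only if the user-level AND resource-level checks pass."""
    if action not in user_level_actions:
        return False  # user-level (role-based) permission is missing
    if resource.visibility == "public" or user == resource.owner:
        return True  # assumption: public resources and owners pass the resource check
    return action in resource.grants.get(user, set())


# A data set imported while the tenant is public stays public afterwards.
tenant = Tenant(default_visibility="public")
legacy = create_resource(tenant, "legacy", owner="alice")
tenant.default_visibility = "private"
foo = create_resource(tenant, "Foo", owner="alice")
foo.grants["bob"] = {"read"}  # alice gives bob explicit read access only

print(legacy.visibility, foo.visibility)
print(is_allowed("bob", {"read", "delete"}, foo, "delete"))
```

In this sketch bob holds the user-level delete permission but still cannot delete Foo, because the resource-level grant covers only read.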
Enabled Kerberos Authentication to Hive
Cisco Data Prep supports the publishing of data sets to Hive in order to provide data access to third parties using SQL over
JDBC. This requires Cisco Data Prep to connect to Hive via JDBC. Prior to this release, the Cisco Data Prep application
servers could not authenticate to Hive. With this release, the Cisco Data Prep application server can use Kerberos to
authenticate to Hive.
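For orientation, a Kerberos-secured HiveServer2 endpoint is conventionally addressed with a JDBC URL that carries the Hive service principal. The sketch below only builds such a URL string. The host, port, database, and realm are hypothetical placeholders, and these notes do not describe how the Cisco Data Prep application server itself is configured.

```python
def hive_kerberos_jdbc_url(host, port, database, service_principal):
    """Build a HiveServer2 JDBC URL that requests Kerberos authentication.

    service_principal is the Kerberos principal of the Hive service,
    not the principal of the connecting user.
    """
    return f"jdbc:hive2://{host}:{port}/{database};principal={service_principal}"


# Hypothetical values for illustration only.
url = hive_kerberos_jdbc_url(
    "hive.example.com", 10000, "default", "hive/hive.example.com@EXAMPLE.COM"
)
print(url)
```

A client using a URL of this form typically also needs a valid Kerberos ticket for its own principal before connecting.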
Enabled Kerberos Authentication to Mongo
The Cisco Data Prep application server stores metadata in a Mongo database. Prior to this release, the Cisco Data Prep
application server could be configured to connect to Mongo with any of the following authentication mechanisms:
No authentication
Authentication using challenge/response (username, password)
X.509 (certificate-based) authentication
With this release, Kerberos is another authentication option.
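As an illustration only, the four authentication options correspond roughly to standard MongoDB connection-URI settings, sketched below. The helper mongo_uri, the host name, and the user names are hypothetical. SCRAM-SHA-1 stands in for the challenge/response mechanism, and the TLS options that X.509 authentication also needs are omitted. How the Cisco Data Prep application server is actually configured is not described in these notes.

```python
from urllib.parse import quote_plus


def mongo_uri(host, mechanism=None, user=None, password=None):
    """Return a MongoDB connection URI for the given authentication mechanism."""
    if mechanism is None:
        return f"mongodb://{host}/"  # no authentication
    # User names and passwords must be percent-encoded inside a URI.
    creds = quote_plus(user)
    if password is not None:
        creds += ":" + quote_plus(password)
    # X.509 and Kerberos (GSSAPI) users are defined in the $external database.
    external = mechanism in ("MONGODB-X509", "GSSAPI")
    source = "&authSource=$external" if external else ""
    return f"mongodb://{creds}@{host}/?authMechanism={mechanism}{source}"


host = "db.example.com:27017"  # hypothetical host
print(mongo_uri(host))
print(mongo_uri(host, "SCRAM-SHA-1", user="app", password="p@ss"))
print(mongo_uri(host, "MONGODB-X509", user="CN=dataprep,OU=apps,O=Example"))
print(mongo_uri(host, "GSSAPI", user="dataprep@EXAMPLE.COM"))
```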
Cloudera Hadoop (CDH 5.4) and Spark 1.3
Cisco Data Prep uses the Hadoop Distributed File System (HDFS) for storing and retrieving data. Prior to this release,
Cisco Data Prep supported version 4.7 of Cloudera's Hadoop distribution (CDH). This release adds support for CDH 5.4.
This is the last release with CDH 4.7 support.
Important: On-premise customers planning to update to CDH 5.4 should refer to the Upgrade document for required update steps.
Cisco Data Prep uses Spark for in-memory computations. Prior to this release, Cisco Data Prep used the Spark
distribution from Databricks (original Hadoop code base). Cloudera's Spark version that matches CDH 5.4 is Spark 1.3.
Therefore, Cisco Data Prep now supports Spark 1.3.
UI Enhancements
Previously, the Library and Projects Dashboard navigation and the help were located in the left margin of the page. The
navigation and help are now located in the top-right corner of the page. This UI enhancement increases the amount of
the browser window available for data preview.
Highlight current row: clicking and dragging the cursor across any row in a data set highlights that row.
Cell Level Histogram
A new visualization displays the relative, to-scale magnitude of a numeric value compared with the other numeric values in
that column.
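The idea of a to-scale, per-cell bar can be sketched as follows. This is a minimal illustration that assumes each bar is scaled against the largest absolute value in the column; the function name, the bar width, and the scaling rule are assumptions, not a description of how the product draws the histogram.

```python
def cell_bars(values, width=20):
    """Return one text bar per value, scaled to the largest magnitude in the column."""
    largest = max((abs(v) for v in values), default=0)
    if largest == 0:
        return ["" for _ in values]  # empty or all-zero column: no bars to draw
    return ["#" * round(abs(v) / largest * width) for v in values]


column = [5, 10, 20, 0]
for value, bar in zip(column, cell_bars(column)):
    print(f"{value:>4} {bar}")
```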
Resolved Issues
Pattern highlighting now works in Google Chrome version 43.
Negative numbers used in a formula to compute a column are now correctly handled by keeping the negative values
intact.
Changed in This Version
Lookup match options in UI
Renamed the Fuzzy lookup option to Automatic.
Introduced a third lookup option: Custom. This new matching option allows users to selectively choose how to handle
word order, case, whitespace, and specific punctuation values. See product help for more information on using
this option.
Updated UI to accommodate the new Custom matching option: radio buttons for selecting a match option have been
replaced with a drop-down.
All three matching options are now collectively referred to as Intellifusion™ matching methods. Automatic remains the
default matching method.
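To make the Custom option concrete, the sketch below shows one way lookup keys could be normalized before comparison when word order, case, whitespace, and chosen punctuation are ignored. The function names and option names are hypothetical, and this is not the product's matching algorithm.

```python
def normalize_key(text, ignore_case=True, ignore_whitespace=True,
                  ignore_word_order=False, ignore_punctuation=""):
    """Normalize a lookup key according to the selected matching options."""
    if ignore_punctuation:
        # Remove only the punctuation characters the user chose to ignore.
        text = text.translate(str.maketrans("", "", ignore_punctuation))
    if ignore_case:
        text = text.casefold()
    # split() with no argument collapses runs of whitespace and trims the ends.
    words = text.split() if ignore_whitespace else text.split(" ")
    if ignore_word_order:
        words = sorted(words)
    return " ".join(words)


def keys_match(a, b, **options):
    """Return True if two keys are equal after normalization."""
    return normalize_key(a, **options) == normalize_key(b, **options)


print(keys_match("Smith,  John", "john smith"))
print(keys_match("Smith,  John", "john smith",
                 ignore_word_order=True, ignore_punctuation=","))
```

With the default options the two keys differ because of the comma and the word order; ignoring both makes them match.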
Preemptive Security Enhancement
Added a preemptive security enhancement to prevent phishing attacks.
Known Issues
Issue: Data set version numbers in wrong order
Description: If the search permissions are revoked and re-granted, the data set versions are displayed in the wrong order.
Issue: Very large numbers rounded when imported as numeric
Description: The platform treats all numeric values as double precision (64-bit) IEEE 754 values on import. Integer values that exceed this supported range must be imported to the data library as text fields.
Issue: Connectivity failures not logged
Description: Data Library database connectivity failures are not logged.
Issue: Regular expression syntax for compute column
Description: Regular expression syntax for compute column is not aligned with Java RegEx for \d.
Issue: Buttons appear in wrong location in Chrome for small browser window sizes
Description: Cisco Data Prep has a minimum resolution and width that is required for the browser buttons to remain in place.
Issue: Resource Level Permissions not enforced after export to Hive
Description: When exporting result sets to Hive, the Resource Permissions are no longer enforced at the object level. The permissions revert to global within the tenant. The following role controls which users can access Hive data in the tenant: AccessJDBC.
Issue: Salesforce connectivity without authentication
Description: When logging into SalesForce through Cisco Data Prep, a SalesForce login token determines how long to keep the user logged into SalesForce. The token expires after a specific amount of time. This means it's possible for a user to logout of Cisco Data Prep, but remain automatically logged into SalesForce. The amount of time before the login token expires is a setting that can be adjusted from the SalesForce
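The rounding described under "Very large numbers rounded when imported as numeric" follows from the IEEE 754 double format, which represents every integer exactly only up to 2^53. The sketch below checks whether an integer survives a round trip through a 64-bit float and suggests an import type accordingly. The helper names are hypothetical and are not part of the product.

```python
MAX_EXACT_INT = 2 ** 53  # every integer up to this magnitude fits exactly in a double


def survives_double(n):
    """Return True if integer n is unchanged by a round trip through a 64-bit float."""
    try:
        return int(float(n)) == n
    except OverflowError:
        return False  # magnitude is beyond the range of a double altogether


def import_type(raw):
    """Suggest 'numeric' or 'text' for a string holding an integer (hypothetical helper)."""
    return "numeric" if survives_double(int(raw)) else "text"


for raw in ("12345", str(MAX_EXACT_INT), str(MAX_EXACT_INT + 1)):
    print(raw, import_type(raw))
```

An identifier such as 9007199254740993 (2^53 + 1) would be rounded if imported as numeric, so the sketch suggests importing it as text.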