Enterprise Security for Big Data Environments

A Multi-Layered Architecture for Defense-in-Depth Protection
THOUGHT LEADERSHIP FROM ORACLE AND INTEL® | JULY 2016
Introduction: More Data, More Risks
IT professionals have always been tasked with ensuring the safety and provenance of corporate information. Today
that responsibility is magnified for two primary reasons: there is more data and there is more risk. At home and at
work, our appetite for data is insatiable. From book recommendations to climate research, manufacturing to
healthcare, tiny sensors to giant mainframes, the sources and amounts of data continue to expand, along with our
ability to store, exchange, and analyze that data for personal and professional use. According to researchers at IDC,
the digital universe will be 44 times bigger in 2020 than it was in 2009, a byproduct of the digitization of nearly every
activity in personal, public, and commercial life.
Even as data grows and becomes more essential to our livelihood, the risk of its misuse escalates as well. In every
industry, organizations are embarking on big data initiatives. More data and more information systems mean more
opportunities for intrusion. Practically every day we hear news stories about data breaches involving high-profile
banks, vendors, and retailers—not to mention countless other cases of personal attacks on our individual systems
and identities. Even big data that has been anonymized can be correlated with other data sets to discern personally
identifiable information.
Business and IT executives are learning through harsh experience that big data brings big security
headaches. – MIT Technology Review
Unfortunately, most security technologies aren’t foolproof, leaving organizations exposed to malicious code and
criminal intentions. Traditional IT security takes a haphazard approach to quelling threats, even as destructive
malware is set loose in plain view on the Internet. The majority of IT security budgets are used to protect the
network, with less than a third used to directly protect the data and intellectual property that reside inside the
organization, according to CSO Market Pulse. Network firewalls and antivirus software packages do little to prevent
these security breaches, most of which involve tricking end-users into running malicious programs on their desktops,
thus invalidating firewall protection.
This white paper describes a comprehensive big data security strategy from Oracle and Intel that protects big data
environments at multiple levels. This three-pillared strategy focuses on essential controls that secure data at the
source. The strategy includes:
• Preventive controls to mitigate unauthorized access to sensitive systems and data
• Detective controls that reveal unauthorized system and data changes through auditing, monitoring, and reporting
• Administrative measures that help keep track of sensitive data, so you always know where all your big data resides and who is authorized to access it
The Many Uses of Big Data
Architecturally, big data consists of highly distributed systems, linked by inter-node communication technologies. In
most cases, that data is online and is shared across many different functional components. It is accessible to
authorized users on internal networks. IDC identifies three primary big data use cases:
1 |
ENTERPRISE SECURITY FOR BIG DATA ENVIRONMENTS: A WHITE PAPER FROM ORACLE AND INTEL
Operational intelligence focuses on high-velocity data streaming and event processing that facilitates up-to-the-moment decision-making. It is often tied to sense-and-respond processes that entail monitoring a stream for specific
events and then queuing up an appropriate response. These systems may involve a feedback loop in which a real-time data stream is monitored for events, and then the raw data from the stream is loaded into a database for
additional analysis. For example, sensors on an assembly line can detect when a machine is out of tolerance. After-the-fact analytics can determine what is causing a recurring problem.
Exploration and discovery is geared towards discovering signals, relationships, and patterns in the data. The goal
is to uncover insights that impact decision making as well as to monitor organizational performance to establish best
practices, make informed predictions, and deliver actionable insights from a steady stream of information.
Performance management involves strategic decisions about past performance. By supplementing traditional data
warehouse analytics with big data analytics, you can increase the timeliness of business reporting as well as
incorporate new types of data sources, from IoT sensors to cellular call data to social media streams.
Security Threats and Limitations
Generally speaking, outsiders are prevented from accessing big data environments by traditional perimeter security
at the boundaries of a private network. However, with today’s sophisticated break-in strategies, perimeter security is
no longer adequate. Hacking has evolved from “crime for kicks,” carried out by mischievous youth, to global
espionage, hacktivism, and black-market crime carried out by sophisticated crime syndicates and money
launderers. These organized criminals exist solely to rob individuals and organizations of their money and
intellectual property. They often try to steal health information, credit card numbers, and other vital information in
order to sell it on the black market.
No company wants its data to be compromised or its systems to be breached. However, most traditional IT security
practices aren’t strong enough to resist the new types of malware, phishing schemes, botnets, and SQL injection
attacks unleashed by cybercriminal organizations. When it comes to detrimental security breaches, it is no longer a
question of if, but when.
Perimeter-based approaches to security are no longer sufficient. A CSO Market Pulse survey found that
two-thirds of security budgets are used to protect the network, with less than a third used to directly protect
the data and intellectual property that reside inside the organization.
Today’s big data environments often include both sensitive and nonsensitive data (including anonymous data).
Attackers can correlate anonymized data sets with other sources to re-identify people and their preferences. For
example, one high-profile test case involved an anonymized data set released by Netflix. Security researchers
correlated this data with Internet Movie Database (IMDb) data to identify members of both services, and then
compared the two data sets to show how they could discover political leanings, sexual preferences, and other
personal information, all based on the movies people watched. In another case, analysts looked at a data set about
New York City taxicab services (pickups, drop-offs, fare amounts) and then correlated it with information about
people in the area to figure out where certain people tended to go, including celebrities. The same method could
potentially be extended to tracking political figures as well.
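The correlation technique described above can be sketched in a few lines. This is an illustrative toy only: the records, tokens, names, and the `link_identities` helper are all hypothetical, and real re-identification studies use far larger data sets and fuzzier matching.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical "anonymized" ratings: user IDs replaced with opaque tokens.
anonymized_ratings: List[Dict[str, str]] = [
    {"token": "u-381", "title": "Film A", "date": "2005-03-01"},
    {"token": "u-381", "title": "Film B", "date": "2005-03-04"},
    {"token": "u-902", "title": "Film A", "date": "2005-06-11"},
]

# Hypothetical public reviews posted under real names.
public_reviews: List[Dict[str, str]] = [
    {"name": "Alice Example", "title": "Film A", "date": "2005-03-01"},
    {"name": "Alice Example", "title": "Film B", "date": "2005-03-04"},
    {"name": "Bob Example", "title": "Film C", "date": "2005-07-20"},
]


def link_identities(anon, public, min_matches: int = 2) -> Dict[str, str]:
    """Link a token to a name when both share at least min_matches (title, date) pairs."""
    by_token: Dict[str, Set[Tuple[str, str]]] = {}
    for row in anon:
        by_token.setdefault(row["token"], set()).add((row["title"], row["date"]))

    by_name: Dict[str, Set[Tuple[str, str]]] = {}
    for row in public:
        by_name.setdefault(row["name"], set()).add((row["title"], row["date"]))

    links: Dict[str, str] = {}
    for token, events in by_token.items():
        for name, reviews in by_name.items():
            # The overlap of quasi-identifiers is what re-identifies the token.
            if len(events & reviews) >= min_matches:
                links[token] = name
    return links
```

Calling `link_identities(anonymized_ratings, public_reviews)` links token `u-381` to the one name whose public reviews match both of its rating events, even though neither data set contains a shared identifier.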
Security Issues With Hadoop
Many of today’s big data projects incorporate Apache Hadoop, an open-source framework for storing and
processing big data in a distributed fashion. Business analysts load data into Hadoop to detect patterns and extract
insights from structured, semi-structured, and unstructured data. Unfortunately, not all organizations have strong
data security in place for these activities. There may be personally identifiable information and intellectual property
loaded into these data sets.
Initially developed as a way to distribute big data processing jobs among many clustered servers, the Hadoop
architecture wasn’t built with security in mind. Namely, it lacks access controls on the data, including password
controls, file and database authorization, and auditing. As such, it doesn’t on its own satisfy important regulations
and industry standards such as the Health Insurance Portability and Accountability Act (HIPAA) and the Payment
Card Industry Data Security Standard (PCI DSS). In the European Union, the General Data Protection Regulation
(GDPR) introduces many additional obligations for companies, with fines of up to 4 percent of annual worldwide
turnover or €20 million, whichever is higher, for companies that don’t comply. According to the regulation, both data
controllers and data processors may be subject to court proceedings and have to pay compensation to victims for
infringements.
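As a rough worked example of the fine ceiling mentioned above, the sketch below assumes the GDPR rule that the upper limit is the greater of €20 million and 4 percent of annual worldwide turnover. The function name is hypothetical, and actual fines are set case by case by regulators.

```python
def max_gdpr_fine_eur(annual_turnover_eur: int) -> int:
    """Upper bound on a top-tier GDPR fine, in whole euros (illustrative only)."""
    four_percent = annual_turnover_eur * 4 // 100
    # The ceiling is whichever of the two amounts is higher.
    return max(20_000_000, four_percent)
```

For a company with €100 million in turnover, 4 percent is only €4 million, so the €20 million figure is the ceiling. At €1 billion in turnover, the 4 percent figure (€40 million) takes over.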
A Hybrid Approach to Big Data Security from Oracle and Intel
Many big data projects begin with a small test group in an isolated sandbox environment and then steadily grow into
large-scale production implementations. At some point they go online—often before proper security controls have
been implemented. This progression endangers not only the data environment but other production systems as well.
Whether moving data to the cloud or storing data on premises, customers want to know how to secure all of their
structured and unstructured data. Oracle and Intel offer a hybrid approach that preserves investments in existing
databases while allowing you to leverage data coming in from other sources.
The Oracle big data environment consists of several different technologies including
Oracle Big Data Appliance with Hadoop Distributed File System along with Oracle
Database and Oracle NoSQL Database. Intel adds industry-leading encryption
technology within its Xeon® processor family. Oracle also offers a multi-layered
defense-in-depth security architecture, which will be covered in more detail below.
Cloudera Enterprise software enables real-time analytics on massive data sets with
enterprise-class data protection. These innovative capabilities enhance open-source
Apache Hadoop solutions. Oracle and Intel enhance these implementations as follows:
• Cloudera Enterprise can be run on Oracle Big Data Appliance to achieve industry-leading performance
• Oracle supplies integrated security, with access and data protection at each layer via accelerated encryption capabilities
• Cloudera Enterprise software includes built-in support for enterprise-class access controls. It is also optimized for Intel® Advanced Encryption Standard New Instructions (Intel® AES-NI), a technology that is built into Intel® Xeon® processors.
In the remainder of this paper, we will explain how these unique and complementary technologies enable a
complete security strategy for big data implementations that addresses all the crucial aspects of infrastructure
security, data privacy, data management, data integrity, and reactive security.
Introduction to Oracle Defense-in-Depth Security
What are the main causes of most security breaches? Inadequate security controls, excessive privileges granted to
internal users, and an over-reliance on network and perimeter security. Defense-in-depth is an information
assurance strategy in which multiple layers of security are established throughout the IT infrastructure. Oracle and
Intel use this proven approach to extend security and encryption technology all the way down to the silicon layer.
Having redundant controls provides exceptional resiliency in the event of a security breach. If a vulnerability is
discovered and exploited in one layer, the attacker will invariably be stopped in another layer—much like a medieval
castle with a moat, iron doors, heavily guarded ramparts, and so forth. When properly implemented, a defense-in-depth security strategy not only prevents breaches, but also buys an organization time to detect and respond to
attacks, reducing or mitigating the consequences of the breach.
Oracle’s multilayered, defense-in-depth security strategy utilizes three sets of controls: preventive, detective, and
administrative.
In a layered, defense-in-depth security architecture, everything on top inherits the security from below—from the
silicon to the firmware to the operating system to the applications to the middleware to the data. This is the most
secure and efficient way to set up a big data environment since it maximizes the security controls at each layer. For
example, database security is more efficient than application security since the database underlies the applications.
You might have hundreds or even thousands of applications. Rather than coding encryption instructions into each
application, it is much easier and more effective to handle encryption in the database. As each application
calls the database, the data is encrypted, both at rest and in transit.
Another reason to “push security down” as low as possible in the stack is that doing so has less of an impact on
performance. The ultimate goal is to push security down to the silicon layer, so that data is encrypted within the
processor, or “chip.” This allows for safe, high-performance, in-memory processing.
Oracle’s in-memory processing technology allows you to process many gigabytes of data in memory, at the silicon
layer, rather than retrieving data from disk drives. Thanks to Intel AES-NI technology, data in memory as well as in
transit can be encrypted with minimal impact on performance.
Preventive Controls
Preventive controls stop intruders from gaining unauthorized access to systems and data. They also help to govern
administrative access by putting realms around the database. Administrators can only access data from the realms
for which they are authorized.
Along with encryption, Oracle Advanced Security controls include data redaction, which redacts sensitive data out of
the application layer. Users looking at an application may see asterisks instead of actual information. For example,
social security numbers might reveal only the last four digits for reference purposes. The data in the database is
encrypted, but redacted when viewed.
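The display-time behavior described above can be illustrated with a small sketch. This is not Oracle Advanced Security's implementation or API. The `redact_ssn` helper and the `NNN-NN-NNNN` input format are assumptions made for the example.

```python
import re

# Assumed input format for this sketch: NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-(\d{4})$")


def redact_ssn(ssn: str) -> str:
    """Return a display form that reveals only the last four digits."""
    match = SSN_PATTERN.match(ssn)
    if match is None:
        # Unknown format: fail closed and hide every character.
        return "*" * len(ssn)
    return "***-**-" + match.group(1)
```

The stored value is never changed. Only the value shown to the user is rewritten, which is why redaction suits production applications where the underlying data must remain intact.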
Data masking is similar to redaction, but for nonproduction environments. With Oracle Data Masking and
Subsetting, sensitive information such as credit card numbers and social security numbers can be replaced with
non-factual values, allowing production data to be safely used for development, testing, or sharing with partners.
This comes into play in situations such as when a third party is testing an organization’s code. During testing,
information such as credit card numbers is substituted with appropriate data, rather than actual numbers.
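A minimal sketch of the masking idea follows, assuming a keyed hash (HMAC-SHA256) is an acceptable way to derive substitute digits. It is not how Oracle Data Masking and Subsetting works internally, and `mask_digits` and its `secret` parameter are hypothetical names. The mapping from hash bytes to digits is slightly biased, which is tolerable for test data but not for cryptographic use.

```python
import hashlib
import hmac


def mask_digits(value: str, secret: bytes) -> str:
    """Replace each ASCII digit with a pseudorandom digit, keeping the layout."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
    masked = []
    used = 0
    for ch in value:
        if ch in "0123456789":
            # Derive the substitute digit from the keyed hash of the original value.
            masked.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            # Separators and other characters are preserved so formats still validate.
            masked.append(ch)
    return "".join(masked)
```

Because the output depends only on the input and the secret, the same card number masks to the same substitute everywhere in a test data set, which keeps joins between tables consistent.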
Oracle Database Vault increases the security of the Oracle database by preventing unlimited, ad-hoc access to
application data from administrative accounts as well as by governing legitimate administrative activity.
Oracle Label Security protects sensitive data by assigning a data label or data classification to each row in an
application table. It mediates access by comparing the data label against the label of the user requesting access.
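The label comparison can be sketched as follows. The three-level ordering, the `Row` type, and `visible_rows` are hypothetical and chosen only to show the mediation step. Oracle Label Security supports richer labels than a single ordered level.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical classification levels, lowest to highest.
LEVELS: Dict[str, int] = {"PUBLIC": 0, "CONFIDENTIAL": 1, "SECRET": 2}


@dataclass(frozen=True)
class Row:
    label: str
    payload: str


def visible_rows(rows: List[Row], user_label: str) -> List[Row]:
    """Return only the rows whose label does not exceed the user's label."""
    clearance = LEVELS[user_label]
    return [row for row in rows if LEVELS[row.label] <= clearance]
```

A user labeled `CONFIDENTIAL` querying a table with one row at each level would see the `PUBLIC` and `CONFIDENTIAL` rows, while the `SECRET` row is filtered out before results are returned.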
The Encryption Paradox
Data encryption is at the heart of a good prevention strategy. It helps address privacy and regulatory requirements
by encrypting personally identifiable information such as social security and credit card numbers. Unfortunately,
many businesses don't use encryption because of a perceived performance hit associated with encrypting and
decrypting the data. Traditional encryption software requires compute-intensive processes that can slow down
querying, reporting, and analytics, putting a thorn in the side of big data security.
Thanks to a close engineering relationship between Oracle and Intel, customers no longer need to choose between
performance and data protection. Intel® Data Protection Technology with Advanced Encryption Standard New
Instructions (AES-NI) reduces performance latency for encryption and decryption operations at the silicon level, for
all big data operations.
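Before relying on hardware-accelerated encryption, it can be useful to confirm that a node's processor actually advertises the instructions. The sketch below assumes an x86 Linux host, where `/proc/cpuinfo` lists CPU features on a `flags` line and AES-NI appears as the `aes` flag. The helper names are hypothetical.

```python
from typing import Set


def cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Parse the first 'flags' line from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_aes_ni(cpuinfo_text: str) -> bool:
    """True when the CPU advertises the AES instruction set extension."""
    return "aes" in cpu_flags(cpuinfo_text)
```

On a cluster node, the text would typically come from `open("/proc/cpuinfo").read()`. Other operating systems and architectures expose CPU features differently, so this check does not apply to them.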
Details on the Intel Encryption Solution
Intel tests have shown that Intel AES-NI can accelerate encryption and decryption performance in an Apache
Hadoop cluster by up to 17x, as measured by in-memory data processing with AES in CTR mode. The process is
transparent to users. It can be applied on a file-by-file basis, and it works in combination with a broad range of
standards-based key management solutions. When an encrypted file enters the Apache Hadoop environment, it
remains encrypted in HDFS. It is decrypted as needed for processing and re-encrypted before it is moved back into
storage. The results of all analysis activities are also encrypted, including intermediate results. Data and results are
never stored or transmitted in unencrypted form.
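The structure of counter (CTR) mode, and the decrypt, process, re-encrypt cycle described above, can be sketched with the standard library alone. Important assumption: SHA-256 stands in for the AES block cipher here purely so the example is self-contained. This is a teaching toy, not AES, not hardware accelerated, and not suitable for protecting real data.

```python
import hashlib

BLOCK = 32  # bytes of keystream produced per counter value in this toy


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    """Stand-in for encrypting (nonce || counter) with a block cipher."""
    return hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with the keystream; the same call encrypts and decrypts."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        block = keystream_block(key, nonce, offset // BLOCK)
        chunk = data[offset:offset + BLOCK]
        out.extend(b ^ k for b, k in zip(chunk, block))
    return bytes(out)


def process_encrypted(key: bytes, nonce: bytes, stored: bytes) -> bytes:
    """Decrypt, transform (upper-case here), and re-encrypt before storing."""
    plaintext = ctr_xor(key, nonce, stored)
    result = plaintext.upper()
    # Simplification: a real system would use a fresh nonce for new ciphertext.
    return ctr_xor(key, nonce, result)
```

Each keystream block depends only on the key, the nonce, and the counter, so blocks can be computed independently and in parallel. That independence is one reason CTR mode suits distributed, high-throughput processing.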
These advanced encryption capabilities allow you to take full advantage of Apache Hadoop while protecting
sensitive data and complying with industry regulations including the Payment Card Industry (PCI) security standard
and the Health Insurance Portability and Accountability Act (HIPAA). You can enable HDFS Transparent Encryption
for an entire Cloudera Enterprise cluster with no significant performance penalty.
Detective Controls
Detective controls reveal enterprise-wide changes through auditing and reporting. While encryption and access
control are key components of protecting data, a comprehensive monitoring system must also be in place. In the
same way that video surveillance cameras supplement alarm systems inside and outside business buildings,
monitoring inbound requests inside file servers, operating systems and databases is core to data protection.
Detective controls centralize auditing and reporting across your organization so you can detect if a security breach
has occurred, or your system has been compromised.
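One common way to make an audit trail reveal unauthorized changes is to chain each record to the hash of the previous one. The sketch below illustrates that general technique. It is an assumption chosen for this example, not a description of how Oracle's auditing products store records, and the function names are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder hash that precedes the first entry


def _entry_hash(prev_hash: str, event: Dict[str, Any]) -> str:
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: List[Dict[str, Any]], event: Dict[str, Any]) -> None:
    """Append an event whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash,
                "hash": _entry_hash(prev_hash, event)})


def verify(log: List[Dict[str, Any]]) -> bool:
    """Recompute the chain; any edited or removed entry breaks it."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _entry_hash(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True
```

If an intruder edits an earlier record to hide their tracks, `verify` returns `False` because the stored hash no longer matches the recomputed one, which is exactly the kind of signal a detective control is meant to surface.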
No question about it: In the age of big data, organizations need to adopt a data-centric approach to security.
Specifically, they need to employ three key types of security controls: Preventive, Detective, and
Administrative. – MIT Technology Review
Administrative Controls
Oracle’s administrative controls include security processes and procedures that help you keep track of sensitive
data. Knowing precisely where all your big data resides enables you to systematically administer the environment
while ensuring that there are no unauthorized changes in the database environment.
One of the identity management challenges enterprises face is the lack of a single source for identity data and the
proliferation of identity stores, including directories and databases. Oracle solves this problem with Oracle Internet
Directory, a general-purpose, LDAPv3-compliant directory store that serves as a central user repository for
defining access to big data applications, simplifying user administration and providing a standards-based
application directory for the entire enterprise.
Oracle Internet Directory works in conjunction with Oracle Access Manager, a comprehensive solution for web
access management and user identity administration that includes an Access System and an Identity System. The
Access System secures Big Data applications by providing centralized authentication, authorization and auditing to
enable single sign-on and secure access control across enterprise resources. The Identity System manages
information about individuals, groups and organizations. It also enables delegated administration of users, as well as
self-registration interfaces with approval workflows.
Oracle Big Data Management
Oracle offers a hybrid approach to big data processing that accommodates relational, NoSQL, and unstructured
data. Hadoop, Oracle Database, and Oracle NoSQL Database become key components of the big data ecosystem,
thanks to Oracle’s industry-leading big data technologies. Many technologies come into play, but big data
management is anchored by two key products:
• Oracle Big Data Cloud Service running Cloudera Enterprise and Oracle NoSQL Database
• Oracle Big Data SQL for unifying queries across Oracle Database and Hadoop
Many organizations gravitate to the cloud or to pre-built clusters such as the Oracle Big Data Appliance so they
won't have to spend the time and effort to create a commodity cluster, which requires specialized engineering skills
to deploy, optimize, and tune for real-time data analysis. Powered by fast, efficient Intel® Xeon® processors, Oracle
Big Data Cloud Service and Oracle Big Data Appliance are optimized for Apache Hadoop and other types of large-scale data analytics. This multi-purpose environment is ideal for Hadoop-only workloads such as MapReduce,
Spark, and Hive as well as for interactive SQL workloads that use Oracle Big Data SQL. These capabilities are
available for on-premises deployment using Oracle Big Data Appliance as well as in the cloud via Oracle Big Data
Cloud Service.
Both cloud and on-premises offerings include a complete Hadoop security solution based on Apache Sentry and
LDAP-based authorization, pre-configured Kerberos authentication, and centralized auditing with Cloudera
Navigator. You can extend security and access policies from Oracle Database to data in Hadoop and NoSQL when
querying through Oracle Big Data SQL. Intel Xeon processors ensure fast, secure, high performance encryption for
big data analytics.
Both offerings support the latest innovations in encryption of data-at-rest by supporting HDFS Transparent
Encryption with a key management facility, along with Intel AES-NI encryption. This implementation enables the
tightest security on all data in HDFS. Combining Oracle Big Data Cloud Service or Oracle Big Data Appliance with
Oracle Big Data SQL delivers the most comprehensive security of any big data system.
With in-memory processing for big data analytics, and chip-layer encryption, the Oracle/Intel solution is much faster
and more secure than Hadoop solutions that are based on commodity hardware, which typically face bottlenecks of
100 MB/second.
Intel Secure Key with True Random Number Generation
In addition, Intel Secure Key offers true random number generation. The resulting keys are extremely difficult to
predict or attack due to Intel’s unique Digital Random Number Generator (DRNG) hardware implementation.
Cryptographic protocols rely on this technology for generating keys and refreshing session values to prevent replay
attacks. This centralized key management platform accelerates the deployment of encryption across the
enterprise.
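The role that unpredictable session values play in preventing replay attacks can be sketched as follows. The example uses Python's `secrets` module, which draws from the operating system's cryptographic random source. Whether that source is fed by a hardware generator such as Intel's DRNG depends on the platform and is not assumed here. `NonceRegistry` is a hypothetical name.

```python
import secrets
from typing import Set


class NonceRegistry:
    """Issue single-use random values and reject any value presented twice."""

    def __init__(self) -> None:
        self._outstanding: Set[str] = set()

    def issue(self) -> str:
        # 16 random bytes, hex encoded: infeasible for an attacker to guess.
        nonce = secrets.token_hex(16)
        self._outstanding.add(nonce)
        return nonce

    def redeem(self, nonce: str) -> bool:
        """Accept a nonce once; replays and forged values are refused."""
        if nonce in self._outstanding:
            self._outstanding.remove(nonce)
            return True
        return False
```

A captured request carrying an already redeemed value is refused on the second attempt. The scheme only holds up if the values cannot be predicted, which is why the quality of the random number source matters.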
Securing Big Data in the Cloud
Oracle’s big data security strategy encompasses IaaS, PaaS, and SaaS environments. Thus the same data
encryption technology that is built into Oracle Database and powered by Intel Xeon processors is transparently
available when data and applications are deployed in Oracle Cloud. Multitenant capabilities ensure that customer
data is sequestered and maintained separately from other customer data. Integrated security policies protect every
aspect of on-premises, private cloud, and public cloud environments.
Conclusion – Mitigating the Risks, Reaping the Rewards of Big Data
With big data comes big responsibility. Given the prevalence of security breaches in nearly every industry, you can’t
risk leaving big data unprotected. The traditional approach of securing the IT infrastructure is no longer enough.
Today’s threats are multifaceted and often persistent, and traditional network perimeter security controls cannot
effectively mitigate them. A holistic approach to big data security begins with protecting sensitive applications and
data—both from external and internal threats.
Oracle and Intel are applying technologies, policies, and procedures developed over several decades to secure the
big data landscape. Their complementary portfolio of layered, defense-in-depth solutions ensures data privacy,
protects against insider threats, and simplifies regulatory compliance. This comprehensive security architecture
protects the entire big data environment—on-premises and in the cloud. It includes preventive controls, detective
controls, administrative controls, and physical access controls. Multiple security zones restrict access on a “need to
know” basis for all IT staff. In addition, logical access controls include encryption of data on staff computers,
personal firewalls, two-factor authentication, and role-based accounts.
From the chip level to the application level, Oracle has created a tightly interwoven set of layered defenses for big
data initiatives. Built into the cloud, this big data security architecture provides multiple layers of protection, including
IaaS, PaaS, and SaaS. Thanks to a tight engineering partnership with Intel, IT organizations no longer have to
choose between performance and security when they wish to deploy industry-leading encryption technology.
Oracle’s defense-in-depth security architecture includes security controls clear down to the chip level, with high
performance data encryption from Intel embedded in the silicon. When run on Oracle Big Data Appliance, which is
powered by Intel Xeon processors, big data analytics workloads are optimized for extreme performance, stability,
manageability, and security. Thanks to these integrated capabilities, businesses can achieve the competitive
advantages of big data analytics, with the confidence that their most sensitive data is protected. From the silicon to
the firmware to the operating system to the applications to the middleware to the data—these layered defenses
protect big data environments.
Oracle Corporation, World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065, USA
Worldwide Inquiries
Phone: +1.650.506.7000
Fax: +1.650.506.7200
CONNECT WITH US
blogs.oracle.com/oracle
facebook.com/oracle
twitter.com/oracle
oracle.com
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the
contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other
warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or
fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are
formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any
means, electronic or mechanical, for any purpose, without our prior written permission.
Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.
Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation.
Enterprise Security for Big Data Environments: a White Paper from Oracle and Intel, July 2016