A Modern Approach for Data Management

Copyright, International Journal of Advance Computing Technique and
Applications (IJACTA), ISSN : 2321-4546, Vol 4, Issue 1, June
2016
¹G. Shravani Priya, ²P. K. Sahoo
¹Postgraduate Scholar, ²Professor,
Department of Computer Science and Engineering,
Sreenidhi Institute of Science and Technology (Autonomous).
[email protected], [email protected]
Abstract
An efficient and intelligent solution is needed for
organizations to access data that is growing
exponentially. A traditional approach such as ETL
(Extract, Transform and Load) integrates and physically
moves data so that it can be queried by end users and
used to generate reports. Data virtualization is a modern
approach to data integration that reduces cost and
time while providing efficient and accurate access to
agile, real-time data. Virtualization is applied
to the traditional approach of data integration in order
to obtain results efficiently. Organizations are seeking
better solutions that let them make accurate business
decisions in the face of ever-changing information and
the exponential growth of dynamic data. Data
virtualization is one such solution, allowing decisions
to be made on real-time data.
Keywords: Data, Data Virtualization, Real-time data,
Agile, Access, Transform, Traditional approach.
I. INTRODUCTION
Business organizations are seeking ways to
access information assets in order to make better
decisions, to reduce risk and cost, and to improve
overall enterprise profitability. Nowadays, both
structured and unstructured data are
growing exponentially. Significant volumes of
complex and diverse data spread across various
applications and locations make it difficult for
enterprises to meet their objectives.
Data virtualization is an agile data integration
approach that has emerged to simplify access to
data. The term agile derives from agility, which
means moving easily and swiftly. Data virtualization
delivers simplified, unified and integrated views of
data as needed by applications and users. In simple
words, it is an alternative approach to data
management that allows data to be accessed and
manipulated without requiring technical details such
as where the data is located and in which form it
exists; that is, the technical information about the
data is hidden. Multiple views of virtual data can be
created on demand from widespread, heterogeneous
underlying sources, both internal and external, as if
the data were already integrated when queried by
applications and tools.
II. DATA VIRTUALIZATION VS ETL APPROACH
In data virtualization, data cleaning and
de-normalization, data transformation and data
correlation (which involves filtering, joining and
aggregation of data) are defined in a logical layer,
which is then applied to the data as they are fetched
from the origin data source while generating reports.
In contrast, traditional approaches such as ETL
(Extract, Transform and Load) or ELT (Extract,
Load and Transform) integrate the data and move it
physically from the base or origin data stores to a
target data store, where it is analysed and queried
by users or applications. As part of data
integration, actions such as data cleansing, data
correlation and data transformation are performed on
the data before reports are generated.
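The contrast between the two approaches can be illustrated with a minimal sketch. It assumes a single SQLite in-memory database standing in for an origin data store; the table and view names (orders, sales_report_etl, sales_report_virtual) are hypothetical and are not taken from any data virtualization product. The ETL-style report is a physical copy taken at load time, whereas the virtual report is a logical definition applied when the data is fetched.

```python
import sqlite3

# One in-memory database stands in for the origin data store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "north", 100.0), (2, "south", 50.0)],
)

# ETL style: the aggregated data is physically copied into a target table.
conn.execute(
    "CREATE TABLE sales_report_etl AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# Virtualization style: only the logic is stored; no data is copied.
conn.execute(
    "CREATE VIEW sales_report_virtual AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)

# A new order arrives at the origin after the ETL load has run.
conn.execute("INSERT INTO orders VALUES (3, 'north', 25.0)")

etl_rows = dict(conn.execute("SELECT region, total FROM sales_report_etl"))
virtual_rows = dict(conn.execute("SELECT region, total FROM sales_report_virtual"))

# The physical copy still holds the old total for 'north';
# the view is evaluated at query time and includes the new order.
print("ETL copy:    ", etl_rows)
print("Virtual view:", virtual_rows)
```

In this sketch the ETL copy would need another load to pick up the new order, while the view reflects it immediately because it is computed from the origin table on each query.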
Data inaccuracies and data inconsistencies
are handled in a separate data quality and profiling
stage. This stage requires less effort when the data
has been dumped into a separate staging area.
In data virtualization, by contrast, the data is
federated and is not all held in a staging area.
Data virtualization offers different
capabilities from ETL for accessing, integrating and
delivering data. However, they offer competing
approaches only for some specific scenarios; hence
both approaches are useful in the data
integration toolset of any enterprise.
Data virtualization is preferred between the two
because it is more agile, flexible, versatile and
cost-efficient than ETL.
III. NEED FOR DATA VIRTUALIZATION
The exponential increase of data from
distributed sources makes it difficult to extract the
data and generate reports. Data virtualization is used
in scenarios such as:
A. When structured, semi-structured and
unstructured data from multiple data sources need
to be fetched and combined in order to generate
reports. The data virtualization platform presents the
data gathered from the different sources via standard
interfaces such as SQL and Web Services (REST and
SOAP/XML).
Data federation is an alternative way to share data
through a shared user interface and shared metadata.
It provides access to data via a standard data
virtualization platform with a single execution engine
across the heterogeneous data sources.
B. Virtualized Data Access
Heterogeneous data sources are connected
to a common logical access point through which their
data is accessed. Data from disparate sources is
combined through the data virtualization layer, or
virtual database, so that reporting applications can
access real-time data without the data being physically
moved from the origin sources.
C. Data Transformation
Data transformation is required for the
following reasons:
Fig. 1: Data Virtualization
B. For decision-making applications, data needs
to be accessed and delivered in real time or near
real time. The data virtualization approach accesses
the data that applications require from the underlying
internal and external data stores at the right time.
C. No physical transformation of the data is
performed, because the data virtualization layer
gathers only the results, while the origin data stays
where it was initially.
IV. CAPABILITIES PROVIDED BY DATA VIRTUALIZATION
Data virtualization provides data
abstraction, meaning that it hides the technical
details about the data, such as the location and
storage technology, format and access language, and
storage structure.
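Data abstraction of this kind can be sketched as a small catalog that maps logical names to sources. The sketch below is illustrative only: the catalog, the logical names (customers_branch_a, customers_branch_b) and the inline CSV and JSON payloads are assumptions made for the example, not part of any real platform. The consumer asks for a logical name and never sees where or in which format the data is stored.

```python
import csv
import io
import json

# Two hypothetical origin sources with different storage formats.
_CSV_SOURCE = "id,name\n1,Asha\n2,Ravi\n"
_JSON_SOURCE = '[{"id": 3, "name": "Meera"}]'


def _read_csv():
    """Read the CSV source and convert ids to integers."""
    reader = csv.DictReader(io.StringIO(_CSV_SOURCE))
    return [{"id": int(row["id"]), "name": row["name"]} for row in reader]


def _read_json():
    """Read the JSON source, which already has typed values."""
    return json.loads(_JSON_SOURCE)


# The catalog hides location and format behind a logical name.
CATALOG = {
    "customers_branch_a": _read_csv,
    "customers_branch_b": _read_json,
}


def read(logical_name):
    """Return rows for a logical dataset without exposing technical details."""
    return CATALOG[logical_name]()


# The consumer uses the same call for both sources.
all_customers = read("customers_branch_a") + read("customers_branch_b")
print(all_customers)
```

If a source later moves or changes format, only its reader function in the catalog would change; callers of read() would be unaffected, which is the point of the abstraction.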
A. Data Federation
Data federation is the process of gathering
data that is spread over multiple sources into a single
and meaningful entity. Data federation eliminates
a large amount of data integration work and the need
to build staging databases, because large data sources
are combined so as to appear as a single data source.
In data virtualization, a single call is made to
multiple sources, and the related data is integrated
and managed in the data virtualization layer.
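A minimal sketch of data federation follows, assuming two hypothetical sources: a relational store (SQLite in memory) holding customers and a CSV export holding invoices. The function name federated_customer_totals and all sample data are invented for illustration. A single call fetches from both sources and joins the related data in the layer, without building a staging database.

```python
import csv
import io
import sqlite3
from collections import defaultdict


def make_crm_source():
    """Hypothetical relational source holding customer names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
    )
    return conn


# Hypothetical file-based source holding invoices.
INVOICE_CSV = "customer_id,amount\n1,120.50\n2,80.00\n1,30.00\n"


def federated_customer_totals(crm_conn, invoice_csv):
    """Single entry point that combines both sources on customer id."""
    names = dict(crm_conn.execute("SELECT id, name FROM customers"))
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(invoice_csv)):
        totals[int(row["customer_id"])] += float(row["amount"])
    # Join in the layer: keep only invoices whose customer is known.
    return {
        names[cid]: round(total, 2)
        for cid, total in totals.items()
        if cid in names
    }


result = federated_customer_totals(make_crm_source(), INVOICE_CSV)
print(result)
```

The join is performed in application code here for clarity; a real data virtualization layer would typically push as much of the work as possible down to the sources.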
1) Compatibility of Data type
Generally, when data from different sources
is combined, it is filtered and joined into a single
data store as part of integration. It is important
that data types are equivalent when different
pieces of data are compared; if different data type
definitions are present, data may be lost.
Compatibility differences arise because:
• Different data types are used to
represent the same piece of data.
• Data from different sources
supports different data types.
• The same data type has different
definitions in different data sources.
2) Format of Data type
The format of the data plays a key role when
integration is performed. Data needs to be
normalized before integration because different
formats are used to represent data that is
semantically the same.
3) Unit differences
The units of data may differ across
departments. The notation of currency in
different countries is an example of unit
differences. Before joining, data from different
sources needs to be normalized to a common unit.
4) Data Delivery
Data virtualization delivers simplified,
unified and integrated data when queried by users
and applications. It provides the same
functionality as traditional approaches such as ETL,
but it delivers real-time data at a faster rate and
with lower operational cost.
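The type, format and unit differences described in this section can be illustrated with a short normalization sketch. Everything in it is an assumption made for the example: the record layout, the two date formats, and in particular the exchange rates, which are placeholder values rather than real rates. Two records that describe the same kind of order arrive from different sources and are brought to a common type, format and unit before they are compared or joined.

```python
from datetime import date, datetime
from decimal import Decimal

# Placeholder conversion rates to USD (hypothetical values, not real rates).
RATES_TO_USD = {"USD": Decimal("1"), "INR": Decimal("0.015")}

# Date formats assumed to be used by the two sources.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalize_date(value):
    """Format difference: parse either supported layout into a date."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date format: {value!r}")


def normalize_record(raw):
    """Bring one source record to a common type, format and unit."""
    return {
        # Type difference: ids may arrive as text or as integers.
        "customer_id": int(raw["customer_id"]),
        "order_date": normalize_date(raw["order_date"]),
        # Unit difference: convert every amount to a single currency.
        "amount_usd": Decimal(str(raw["amount"])) * RATES_TO_USD[raw["currency"]],
    }


source_a = {"customer_id": "42", "order_date": "2016-06-01",
            "amount": "10.00", "currency": "USD"}
source_b = {"customer_id": 42, "order_date": "01/06/2016",
            "amount": 1000, "currency": "INR"}

a = normalize_record(source_a)
b = normalize_record(source_b)

# After normalization the two records can be compared or joined safely.
print(a["customer_id"] == b["customer_id"], a["order_date"] == b["order_date"])
```

Decimal is used for the amounts so that currency values are not subject to binary floating-point rounding during conversion.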
V. WORKING OF DATA VIRTUALIZATION APPROACH
Data virtualization needs three components for its
implementation:
A. User interface
An IDE (Integrated Development
Environment) is needed to provide access and for
development.
B. Data Virtualization Working Environment,
which includes:
1) Data virtualization layer: provides applications
with access to the data sources. A connection is
established to provide a single version of the result.
2) Query: triggered by the applications;
reports are provided through the data virtualization
layer.
3) Cache Database: once a result is
generated, it is stored in a cache database or file for
future reuse. When the same query is fired again, the
cached result is used to avoid unnecessary operational
load on the entire server. The cache can be
stored in standard databases such as Oracle, DB2 or
SQL Server, or in in-memory databases.
4) Query Execution Engine: the query is optimised
using both rule-based and cost-based optimization
techniques to obtain the materialized view of the
data (integrated and federated from multiple
sources).
5) Objects or Instances: views and data services
are the objects created and used for queries in data
virtualization. The logic necessary to federate or
integrate data, abstract technical details, transform
and access data, and report the results to end users
is encapsulated in these objects. The object
definitions change according to the scope and needs
of the consumer.
6) Metadata Details: details and structure of the
data, and information related to the data stores, are
kept in a metadata repository. To achieve better cache
performance, data virtualization servers can be
clustered.
C. Management of tasks such as monitoring,
administration and error handling.
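The role of the cache database in the working environment described above can be sketched as follows. The class CachedVirtualLayer, the fake_fetch function and the query text are hypothetical names introduced for illustration, and an in-process dictionary stands in for the cache database. The first time a query is fired the sources are contacted; when the same query is fired again the stored result is reused.

```python
class CachedVirtualLayer:
    """Toy virtualization layer that reuses results of repeated queries."""

    def __init__(self, fetch_from_sources):
        self._fetch = fetch_from_sources  # callable that contacts the sources
        self._cache = {}                  # stands in for the cache database
        self.source_calls = 0             # counts real trips to the sources

    def query(self, sql):
        # Naive key normalization: collapse whitespace and ignore case.
        # This would also lowercase string literals, so it is only a sketch.
        key = " ".join(sql.split()).lower()
        if key not in self._cache:
            self.source_calls += 1
            self._cache[key] = self._fetch(sql)
        return self._cache[key]

    def invalidate(self):
        """Drop cached results, for example after the origin data changes."""
        self._cache.clear()


def fake_fetch(sql):
    """Hypothetical stand-in for a federated query against the sources."""
    return [("north", 125.0)]


layer = CachedVirtualLayer(fake_fetch)
first = layer.query("SELECT region, total FROM sales")
second = layer.query("select region,  total from sales")  # same query, cached

print(first == second, layer.source_calls)
```

A cached result can go stale when the origin data is updated, so some invalidation or expiry policy is needed; the invalidate method above is the simplest possible form of one.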
VI. ADVANTAGES OF DATA VIRTUALIZATION
A. Data virtualization is an advanced form of data
integration that combines the benefits of agility and
real-time data integration in order to provide accurate
and efficient access to data as it stands at the
present instant. Access to agile data plays a key role
in the decision-making process of reporting
applications.
B. The major advantage of data
virtualization is that it is a faster way to obtain
the necessary data access. It requires less effort
because the data is not actually taken out of the
sources, so physical extraction and persisting of data
are not needed. This can be regarded as a solution to
the cloud migration problem.
C. Data duplication is largely avoided owing to
the presence of the data virtualization layer.
Synchronization and redundancy issues are reduced
with this approach.
D. Real-time data delivery is achieved through
data virtualization.
E. The data virtualization approach is easy to
install within the existing infrastructure and takes
little time.
VII. DISADVANTAGES OF DATA VIRTUALIZATION
The issues involved in the data virtualization concept
are:
A) Operational or execution strain is placed on the
entire organization, as the whole data warehouse of
the organization is considered.
B) When the real data is updated, the enterprise
might face update problems, which in turn depend on
the real-time data supported by the enterprise.
VIII. CONCLUSION
Making accurate, efficient and instant decisions in
real time, amid dynamic economic conditions and
opportunities, is a difficult task for enterprises.
An efficient and intelligent solution is needed for
organizations to access data that is growing
exponentially. Data virtualization can be implemented
at minimal cost compared with traditional data
access approaches, and it provides efficient and
accurate access to information from disparate data
sources in less time. Accessing data with greater
agility is another feature offered by the data
virtualization approach.
A competitive advantage can be obtained
only by adding business-oriented personalization, so
that the information provided can fulfil the particular
needs of end users and empower dynamic analysis
and decisions.
The combination of physical and virtual features is the
best integration solution: virtualization is applied to
physical data stores in order to access data at
reduced cost and with quicker response times.
BIOGRAPHIES:
Gadasu Shravani Priya is a P.G. Scholar in
Technology at Sreenidhi Institute of Science and
Technology in the branch of Computer Science and
Engineering. She has developed projects in Java,
Perl, Python and Web designing. Her research
interests include language translations and emerging
technologies. She can be reached at
[email protected].
Prasanta Kumar Sahoo is a Professor in the
Department of Computer Science and Engineering,
Sreenidhi Institute of Science and
Technology (Autonomous). He can be reached at
[email protected].
www.ijacta.com
Page 162