International Journal of Advance Computing Technique and Applications (IJACTA), ISSN: 2321-4546, Vol. 4, Issue 1, June 2016

A Modern Approach for Data Management

1G. Shravani Priya, 2P. K. Sahoo
Post Graduate Scholar, Professor, Department of Computer Science and Engineering, Sreenidhi Institute of Science and Technology (Autonomous).
[email protected], [email protected]

Abstract
Organizations need an efficient and intelligent solution to access data that is growing exponentially. Traditional approaches such as ETL (Extract, Transform and Load) physically integrate and move data before end users can query it and generate reports. Data virtualization is a modern approach to data integration that reduces cost and time while providing efficient and accurate access to agile, real-time data. Applying virtualization to the traditional approach to data integration yields more efficient results. Organizations are seeking better solutions for making accurate business decisions in the face of ever-changing information and the exponential growth of dynamic data; data virtualization is one such solution because it allows decisions to be based on real-time data.

Keywords: Data, Data Virtualization, Real-time Data, Agile, Access, Transform, Traditional Approach.

I. INTRODUCTION

Business organizations are seeking ways to access their information assets in order to make better decisions, reduce risk and cost, and improve overall enterprise profitability. Nowadays, both structured and unstructured data are growing exponentially, and significant volumes of complex and diverse data spread across various applications and locations make it difficult for enterprises to meet their objectives. Data virtualization is an agile data integration approach that has emerged to simplify access to data; the term agile derives from agility, which means moving easily and swiftly. It delivers simplified, unified and integrated views of data as needed by applications and users. In simple words, data virtualization is an alternative approach to data management that allows data to be accessed and manipulated without requiring technical details such as where the data is located or in which form it exists; those technical details are hidden. Multiple views of virtual data can be created on demand from widespread, heterogeneous underlying sources, both internal and external, as if the data were already integrated when queried by applications and tools.

II. DATA VIRTUALIZATION vs. THE ETL APPROACH

In data virtualization, data cleansing and de-normalization, data transformation and data correlation (which involves filtering, joining and aggregating data) are defined in a logical layer and applied to the data as it is fetched from the origin data sources while reports are generated. In contrast, traditional approaches such as ETL (Extract, Transform and Load) or ELT (Extract, Load and Transform) integrate and physically move data from the origin data stores to a target data store for analysis and for the queries issued by users or applications. As part of this integration, data cleansing, correlation and transformation are performed before reports are generated, and data inaccuracies and inconsistencies are handled in a separate data quality and profiling stage; working in this stage takes less effort once the data has been dumped into a separate staging area. In data virtualization, by contrast, federation means the data never has to sit in a staging area. Data virtualization offers different capabilities from ETL for accessing, integrating and delivering data; the two can be competing approaches in some specific scenarios, so both remain useful in the data integration toolset of any enterprise. The reason data virtualization is often preferred is that it is more agile, flexible, versatile and cost-efficient than ETL.
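The contrast can be made concrete with a short sketch. The Python fragment below is only an illustration of the idea described above, not any vendor's implementation; the table name orders, the fixed EUR-to-USD rate and the helper functions are all hypothetical. The ETL-style function transforms the data once and persists the result in a target store, while the virtualization-style function keeps the transformation rule in a logical layer and applies it only when the data is queried, leaving the origin data where it is.

```python
# A minimal sketch (not any vendor's API) contrasting ETL-style physical
# integration with virtualization-style on-demand integration.
# The table and field names (orders, amount_eur) are hypothetical.
import sqlite3

src = sqlite3.connect(":memory:")          # stands in for an origin data store
src.execute("CREATE TABLE orders (id INTEGER, amount_eur REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

EUR_TO_USD = 1.1                           # assumed fixed rate, for illustration only

def etl_load(target):
    """ETL: transform once and physically persist the result in a target store."""
    rows = src.execute("SELECT id, amount_eur * ? FROM orders", (EUR_TO_USD,))
    target.execute("CREATE TABLE IF NOT EXISTS orders_usd (id INTEGER, amount_usd REAL)")
    target.executemany("INSERT INTO orders_usd VALUES (?, ?)", rows)

def virtual_orders_usd():
    """Virtualization: the same transformation is applied only when queried;
    nothing is copied out of the origin source."""
    return src.execute("SELECT id, amount_eur * ? AS amount_usd FROM orders",
                       (EUR_TO_USD,)).fetchall()

tgt = sqlite3.connect(":memory:")
etl_load(tgt)                              # the data now lives in two places
print(virtual_orders_usd())                # the data stays in the origin store
```

Because the rule lives in one place, a change such as an updated conversion rate takes effect on the next query, with no reload of a target store.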
III. NEED FOR DATA VIRTUALIZATION

The exponential increase of data from distributed sources makes it difficult to extract data and generate reports. Data virtualization is used in scenarios such as:

A. When structured, semi-structured and unstructured data from multiple data sources need to be fetched and combined in order to generate reports. The data virtualization platform presents the data gathered from the different sources via standard interfaces such as SQL and Web Services (REST and SOAP/XML).

B. When, for decision-making applications, data needs to be accessed and delivered in real time or near real time. The data virtualization approach accesses the data required by the applications from the underlying internal and external data stores at the right time.

C. When data should not be physically moved or transformed: the data virtualization layer gathers only the results, while the origin data stays where it initially resides.

Fig. 1: Data Virtualization

IV. CAPABILITIES PROVIDED BY DATA VIRTUALIZATION

Data virtualization provides data abstraction, meaning that it hides the technical details of the data, such as its location, storage technology, format, access language and storage structure.

A. Data Federation
Data federation is the process of gathering data that is spread over multiple sources into a single, meaningful entity. It eliminates a large amount of data integration work and the staging databases in which large data sources would otherwise be combined to appear as a single source. In data virtualization, a single call is made to multiple sources, and the related data is integrated and managed in the data virtualization layer (a small federation sketch follows subsection B below).

B. Virtualized Data Access
Heterogeneous data sources are connected to a common logical access point through which their data is accessed. Data from disparate sources is combined through the data virtualization layer, or virtual database, so that reporting applications can access real-time data without the origin sources being physically duplicated. This is an alternative way to share data through a shared user interface and shared metadata: the data virtualization platform provides access via standard interfaces with a single execution engine across the heterogeneous data sources.
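As a rough illustration of federation, the sketch below combines a CSV feed and a relational table behind a single function call. The source names, fields and values are hypothetical, and the example stands in for the single execution engine of a real data virtualization platform rather than reproducing one; no staging database is created.

```python
# A minimal federation sketch, assuming two hypothetical sources: a CSV feed of
# customers and a SQLite table of orders. A single call returns a joined view;
# neither source is copied into a staging database.
import csv, io, sqlite3

CSV_FEED = "customer_id,name\n1,Asha\n2,Ravi\n"        # stands in for an external source

db = sqlite3.connect(":memory:")                       # stands in for an internal source
db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
db.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 120.0), (1, 30.0), (2, 75.0)])

def customer_totals():
    """Federated 'view': join the CSV source with the database source on demand."""
    names = {int(r["customer_id"]): r["name"]
             for r in csv.DictReader(io.StringIO(CSV_FEED))}
    totals = db.execute(
        "SELECT customer_id, SUM(amount) FROM orders "
        "GROUP BY customer_id ORDER BY customer_id")
    return [(names.get(cid, "unknown"), total) for cid, total in totals]

print(customer_totals())    # [('Asha', 150.0), ('Ravi', 75.0)]
```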
C. Data Transformation
Data transformation is required for the following reasons:

1) Compatibility of data types. When data from different sources is combined, the data is filtered and joined together into a single result as part of the integration. Pieces of data that are compared must have equivalent types; if different data type definitions are present, data may be lost. Compatibility differences arise because different data types are used to represent the same piece of data, because different sources support different data types, or because the same data type is defined differently in different data sources.

2) Format of data. The format of the data plays a key role when integration is performed. Data needs to be normalized before integration, because different formats may be used to represent the same, semantically identical data.

3) Unit differences. The units of data may differ across departments; the notation of currency in different countries is one example. Before such data from different sources is joined, it needs to be normalized to a common unit.

D. Data Delivery
Data virtualization delivers simplified, unified and integrated data when it is queried by users and applications. It provides the same functionality as traditional approaches such as ETL, but it delivers real-time data faster and at lower operational cost.

V. WORKING OF THE DATA VIRTUALIZATION APPROACH

Data virtualization needs three components for its implementation:

A. User Interface
An IDE (Integrated Development Environment) is needed to provide access and to support development.

B. Data Virtualization Working Environment
This environment includes:
1) Data virtualization layer: provides the applications with access to the data sources; connections are established so that a single version of the result is delivered.
2) Query: triggered by the applications; reports are provided through the data virtualization layer.
3) Cache database: once a result is generated, it is stored in a cache database or file for future reuse. When the same query is fired again, the cached result is used, avoiding unnecessary load on the server (a minimal caching sketch follows this section). The cache can be kept in a standard database such as Oracle, DB2 or SQL Server, or in an in-memory database, and data virtualization servers can be clustered to achieve better cache performance.
4) Query execution engine: using both rule-based and cost-based optimization techniques, the query is optimized to produce a materialized view of the data integrated and federated from multiple sources.
5) Objects or instances: views and data services are the objects created and used for queries in data virtualization. The logic necessary to federate or integrate data, abstract technical details, transform and access data, and report the results to end users is encapsulated in these objects, and their definitions change according to the scope and needs of the consumer.
6) Metadata details: the structure of the data and information about the data stores are kept in a metadata repository.

C. Management
Management covers tasks such as monitoring, administration and error handling.
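The interplay of the query, the cache database and the execution engine described above can be sketched as follows. This is a minimal illustration under the assumption of an in-process dictionary cache; run_federated_query() is a hypothetical stand-in for the real execution engine, and a production platform would cache in a database such as Oracle, DB2 or SQL Server instead.

```python
# A minimal sketch of the caching behaviour described in Section V.
# run_federated_query() is a hypothetical placeholder for the engine that
# federates and integrates the origin sources; a dict keeps the example
# self-contained in place of a real cache database.
_cache = {}

def run_federated_query(sql):
    # Placeholder: pretend this federates the origin sources and returns rows.
    return [("result-row-for", sql)]

def execute(sql):
    """Return a cached result when the same query is fired again; otherwise
    delegate to the execution engine and remember the answer."""
    if sql not in _cache:
        _cache[sql] = run_federated_query(sql)
    return _cache[sql]

print(execute("SELECT * FROM sales_view"))   # computed by the engine
print(execute("SELECT * FROM sales_view"))   # served from the cache
```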
VI. ADVANTAGES OF DATA VIRTUALIZATION

A. Data virtualization is an advanced form of data integration that combines the benefits of agility and real-time integration in order to provide accurate and efficient data access at the present instant. Access to agile data plays a key role in the decision-making process carried out by reporting applications.

B. The major advantage of data virtualization is that it is a faster way to obtain the necessary data access, and it requires less effort, because data is not actually taken out of the sources: no physical extraction and persistence of data is needed. This can also be viewed as a solution to the cloud migration problem.

C. Data duplication is largely avoided thanks to the data virtualization layer, and synchronization and redundancy issues are reduced with this approach.

D. Real-time data delivery is achieved through data virtualization.

E. The data virtualization approach can be installed within the existing infrastructure easily and quickly.

VII. DISADVANTAGES OF DATA VIRTUALIZATION

The issues involved in the data virtualization concept are:

A) Operational or execution strain is placed on the entire organization, because the whole data warehouse of the organization is involved.

B) Update problems may be faced by the enterprise when the real data is updated, depending on the degree of real-time data that the enterprise supports.

VIII. CONCLUSION

Making accurate, efficient and instant decisions under real-time, dynamic economic conditions and opportunities is a difficult task for enterprises, which need an efficient and intelligent solution to access data that is growing exponentially. Data virtualization can be implemented at minimum cost compared with traditional data access approaches and in less time, while efficient and accurate information is accessed from disparate data sources. Access to data with greater agility is another feature offered by the data virtualization approach. A competitive advantage can be obtained only by adding business-oriented personalization, so that the information provided fulfils the particular needs of end users and empowers dynamic analysis and decisions. The best integration solution combines physical and virtual features, applying virtualization on top of physical data stores in order to access data at reduced cost and with quicker response times.

BIOGRAPHIES

Gadasu Shravani Priya is a post-graduate scholar at Sreenidhi Institute of Science and Technology in the branch of Computer Science and Engineering. She has developed projects in Java, Perl, Python and web design. Her research interests include language translation and emerging technologies. She can be reached at [email protected].

Prasanta Kumar Sahoo is a Professor in the Department of Computer Science and Engineering, Sreenidhi Institute of Science and Technology (Autonomous). He can be reached at [email protected].