MCA 204, Data Warehousing & Data Mining
UNIT-1 Compelling Need for Data Warehousing
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania, U1.1

Learning Objective
• Escalating need for strategic information
• Building blocks of data warehouse
• Data warehouse components
• Defining the business requirements

Introduction

DBMS and Data Warehouse
• Databases and data warehouses are methods for organizing and managing information and business intelligence.
• Database management systems and data mining tools are IT tools used to work with information and business intelligence.

Business Intelligence
Business intelligence is knowledge about:
• Customers
• Competitors
• Partners
• Competitive environment
• Internal operations

Data Warehousing
What Is a Data Warehouse?
• Data warehousing is a rapidly expanding area of technology and one that still has a number of different definitions.
• Stephen R. Gardner claims that it is “a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single, detailed view of part or all of a business.”
• A data warehouse is a place to store detailed data and a way to combine data to get a detailed picture of the business.

Cont....
• Another definition, by Lawrence Fischer, states: “A data warehouse is just another database. What sets it apart is that the information it contains is not used for operational purposes, but rather for analytical tasks.”

Data Warehousing (Definition)
“A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of management’s decision-making process” [Inmon, 1993].
• SUBJECT-ORIENTED: The warehouse is organized around the major subjects of an enterprise (e.g. customers, products, and sales) rather than the major application areas (e.g. customer invoicing, stock control, and order processing).
• INTEGRATED DATA:
  • The data warehouse integrates corporate application-oriented data from different source systems, which often includes data that is inconsistent.
  • Such data must be made consistent to present a unified view of the data to the users.

Cont....
• TIME VARIANT:
  • Data in the warehouse is only accurate and valid at some point in time or over some time interval.
  • Time-variance is also shown in the extended time that the data is held, the association of time with all data, and the fact that data represents a series of historical snapshots.
• NON-VOLATILE:
  • Data in the warehouse is not updated in real time but is refreshed from operational systems on a regular basis.
  • New data is always added as a supplement to the database, rather than a replacement.
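
The time-variant and non-volatile properties can be made concrete with a small Python sketch. It assumes a toy table of customer balances; `Snapshot`, `SnapshotStore`, `refresh`, `history` and `view_as_of` are hypothetical names, not part of any warehouse product. Each refresh appends a dated snapshot instead of overwriting the previous value, so the history stays queryable.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One warehouse row: a value together with the date it was valid for."""
    as_of: date
    customer_id: str
    balance: float


class SnapshotStore:
    """Append-only store (hypothetical): refreshes add snapshots, never overwrite."""

    def __init__(self) -> None:
        self._rows: list[Snapshot] = []

    def refresh(self, as_of: date, source_rows: dict[str, float]) -> None:
        # Periodic refresh from an operational system: supplement, do not replace.
        for customer_id, balance in source_rows.items():
            self._rows.append(Snapshot(as_of, customer_id, balance))

    def history(self, customer_id: str) -> list[tuple[date, float]]:
        # The series of historical snapshots for one customer, oldest first.
        return sorted((r.as_of, r.balance) for r in self._rows
                      if r.customer_id == customer_id)

    def view_as_of(self, when: date) -> dict[str, float]:
        # Picture of the business at a point in time:
        # the latest snapshot on or before `when` for each customer.
        latest: dict[str, Snapshot] = {}
        for r in self._rows:
            if r.as_of <= when:
                prev = latest.get(r.customer_id)
                if prev is None or r.as_of > prev.as_of:
                    latest[r.customer_id] = r
        return {cid: r.balance for cid, r in latest.items()}
```

For example, after a January refresh of customers C1 and C2 and a February refresh of C1 only, `view_as_of` for early February still returns the January values, while `history("C1")` returns both snapshots.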

Cont…
• Data warehouses are not transaction-oriented.
• Data warehouses support online analytical processing (OLAP).

Data Warehouses
What Is a Data Warehouse?

Why Data Warehousing?
• Collect information centrally
• Organize information consistently
• Deliver information conveniently
• Hence, there are significant cost benefits, time savings and productivity gains associated with using a data warehouse for information processing.
• Conclusion:
  • A data warehouse enables information processing to be done in a credible, efficient manner.

Scope
• The ability to use information to make insightful decisions depends on having appropriate tools to extract specific data, convert it into business information and monitor changes.
• A data warehouse delivers not only summary information but also the ability to drill down, develop forecasts and export the information to other decision-support tools.

Practical Implications
Methodologies and design principles are needed for building a production data warehouse. The data warehouse positions the enterprise to satisfy four interrelated demands on corporations to:
• Prepare their systems and their users for constant evolution.
• Improve the productivity and revenue contribution of every employee.
• Maximize profits by performing core business processes better than their competitors and by eliminating as many resource-draining practices as possible.
• Apply science to information.

Creating a Data Warehouse
Data Collection
There need to be extraction routines to gather data from the various operational data sources that interface with the Data Warehouse.
Data Cleaning & Transformation
Data must be checked for validity and accuracy, and differences in syntax and semantics must be resolved.
Data Loading
Data must be loaded into the Data Warehouse after carrying out appropriate summarisation and aggregation. Often this will be done using parallelism (as it could take weeks to serially load a terabyte of data!).

Creating a Data Warehouse
Data Refresh
Updates to base data (operational data) must periodically be propagated to the Data Warehouse.
Data Storage
Appropriate storage structures must exist to allow the Data Warehouse to support fast access for search and analysis of differing data types (text, graphic, picture, …).

Structures of Data Warehouse
The data warehouse is described by different levels of summarization detail:
• Current data
• Older data
• Summarized data
• Metadata
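
The creation steps listed under Creating a Data Warehouse (data collection, cleaning and transformation, loading with summarisation) can be sketched as a tiny Python pipeline. This is a minimal illustration, assuming two toy sources held as lists of dicts; the `REGION_CODES` table and the function names are hypothetical. A production loader would use the database's bulk-load tools and parallelism rather than an in-memory loop.

```python
from collections import defaultdict

# Assumed code table: the sources spell regions differently
# (the syntax/semantic differences that cleaning must resolve).
REGION_CODES = {"N": "North", "NORTH": "North", "S": "South", "SOUTH": "South"}


def extract(sources):
    """Data collection: gather rows from every operational source."""
    for source in sources:
        yield from source


def clean_transform(row):
    """Validate one row and resolve differences; return None to reject it."""
    region = REGION_CODES.get(str(row.get("region", "")).strip().upper())
    try:
        amount = float(row.get("amount"))
    except (TypeError, ValueError):
        return None  # invalid or missing amount
    if region is None or amount < 0:
        return None  # unknown region or implausible value
    return {"region": region, "amount": amount}


def load(rows):
    """Data loading with summarisation: total amount per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_etl(sources):
    cleaned = (clean_transform(r) for r in extract(sources))
    return load(r for r in cleaned if r is not None)
```

Rejected rows are silently dropped here for brevity; a real pipeline would normally log them for the data quality team.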

Current Data
Such data:
• Reflects the most recent happenings, which are always of great interest.
• Is voluminous, because it is stored at the lowest level of granularity.
• Is almost always stored on disk storage, which is fast to access but expensive and complex to manage.

Older Data
• Older data is data that is infrequently accessed and stored at a level of detail consistent with current detailed data.
• Summarized data: summarized data falls into two categories, according to processing need and storage:
  • Lightly summarized data
  • Highly summarized data (compact and easily accessible)

Data Marts
• A feature of data warehouses that sets them apart from databases is the data mart, where data is divided into a subset of the information in the data warehouse.
• The size of a data warehouse typically ranges from 1-10 GB.
• The data mart is typically populated using the data warehouse, but occasionally the information will come directly from the source; it is safer to populate the data mart using data directly from the warehouse, because it is already cleaned and checked for consistency.

Applications of Data Warehouses
• Some of the many ways in which a data warehouse gets used by businesses include:
  • Create reports for analysis.
  • Build information about important customers in order to strengthen customer relations.
  • Maintain information about inventory and supply.
  • Measure success of promotions.
  • Predict the effects of price changes.
  • Improve the effectiveness of the business by implementing new market strategies.

Considerations and Issues
Cost to Business
• A typical warehouse costs, overall, more than $1 million.
• This is a big risk to take on a project that has an initial failure rate as high as 50%.
• The high cost can be attributed to the amount of time and money it takes to collect, clean and integrate the data from different sources.

Ease of Use
• When designing the data warehouse, ease of use should be at the top of the list: “a data warehouse by itself does not create value; value comes from the use of data in the warehouse.”
• The most successful data warehouses are ones that provide users with the information they need without a lot of training.

How Warehouse Works
Data warehouses are based largely on four main processes:
• Extracting and loading the data
• Cleaning and transforming the data
• Query management
• Backup and archiving of the data

Aggregations
• Aggregations are a way of dividing the information so queries can be run on the aggregated part and not the whole set of data.
• The warehouse manager is responsible for creating aggregations.
• Most aggregations can be created in a single complex query, which saves time.

Access: Operational and External Data
• The access mechanism is required to retrieve data from heterogeneous operational databases, i.e. DB2, SYBASE, ORACLE etc.
Transform
• Cleanse
• Reconcile
• Enhance
• Summarize
• Aggregate
Distribute
• Stage
• Join multiple sources
• Populate on demand
Store
• Relational data
• Specialized caches
• Multiple platforms & H/W

Data Warehouse Functions
• It depicts the flow of data from the original source to the user, and includes management and implementation capabilities.
• Access mechanisms are required to retrieve data from heterogeneous operational databases.
• Data is then transformed and delivered to the “data warehouse store” based on a selected model.
• The data transformation and movement processes are executed whenever an update to the warehouse data is desired.
• The information that describes the model and definition of the source data elements is called “metadata”.

Data Flow Within the Data Warehouse
• There is a normal and predictable flow of data within the data warehouse.
• Most data enters the data warehouse from the operational environment.
• As data enters the data warehouse from the operational environment, it is transformed.
• Upon entering the data warehouse, data goes into the current level of detail.
• It resides there and is used there until one of three events occurs:
  • It is purged,
  • It is summarized, and/or
  • It is archived.

Usage
• The different levels of data within the data warehouse receive different levels of usage.
• The more summarized the data, the quicker and more efficient the response time.
• Good from a security point of view.

Building Considerations
Building & administration of a DW require the following:
Indexing
• Data at the higher levels of summarization can be indexed and constructed relatively more easily than data at the lower levels.
• The data model and formal design activities do not apply to the levels of summarization in almost every case.
Partitioning
Partitioning can be done at either of the following two levels:
• DBMS level: The DBMS is aware of the partitions and manages them accordingly.
• Application level: Only the application programmer is aware of the partitions and is responsible for managing them.

Other Considerations
• Public summary data is stored and managed in the data warehouse, even though its calculation is well outside the data warehouse scope.
• Here, the data is stored for ethical and legal reasons, as required by the corporation.
• In summary, a data warehouse is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management’s decision needs. Each of the salient aspects of a data warehouse carries its own implications.
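
Application-level partitioning, described under Building Considerations, can be sketched as follows. This minimal Python illustration assumes sales rows partitioned by calendar year; the class and method names are hypothetical. The application itself routes each row to a partition and decides which partitions a query needs to touch. Under DBMS-level partitioning, the database engine would do the same routing and pruning transparently.

```python
from collections import defaultdict
from datetime import date


class YearPartitionedTable:
    """Hypothetical application-level partitioning: the application, not the
    DBMS, routes rows to per-year partitions and prunes them at query time."""

    def __init__(self) -> None:
        self._partitions: dict[int, list[tuple[date, str, float]]] = defaultdict(list)

    def insert(self, sale_date: date, product: str, amount: float) -> None:
        # Routing rule (an assumption for this sketch): partition key = calendar year.
        self._partitions[sale_date.year].append((sale_date, product, amount))

    def total_sales(self, start: date, end: date) -> float:
        # Partition pruning: scan only the partitions whose year overlaps [start, end].
        total = 0.0
        for year in range(start.year, end.year + 1):
            for sale_date, _product, amount in self._partitions.get(year, []):
                if start <= sale_date <= end:
                    total += amount
        return total

    def drop_partition(self, year: int) -> None:
        # Purging or archiving old data becomes a cheap per-partition operation.
        self._partitions.pop(year, None)

    def years(self) -> list[int]:
        return sorted(self._partitions)
```

Dropping a whole year in one step mirrors the purge/archive events in the data flow described earlier, which is one practical reason warehouses are often partitioned by time.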
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.31 Differences • Classic SDLC (requirement driven) • Requirement gathering • Analysis • Design • Programming • Testing • Integrating • Implementation • Data warehouse SDLC (data driven) • Implement warehouse • Integrate data • Test for bias • Program against data • Design DSS • Analyze result • Understand requirements © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.32 Data Mining • Data mining is the process of extracting previously unknown but significant information from large database and using it to make crucial business decision. • Data mining has major implications across the enterprise – for productivity, profitability, customer satisfaction, and overallll competitiveness. titi • Data mining is about discovering facts. © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.33 © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.11 MCA 204, Data Warehousing & Data Mining Data Mining Process • There are two stages in the process of data mining to used when searching for information. • Initial searches should be carried out on summary information . information. • Focus on the detailed data in order to provide a clearer view. • The concept of data mining provides organizations with the ability to analyze and monitor trends and variations within their business that provide information to aid the decision-making process. • Data mining process requires following steps: Data warehouse Extracted data Data mined Extracted information Select Transform Mine Assimilate © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. 

Enabling Components
Middleware
• The emergence of middleware is the single most significant development that enables data mining.
• Without this software connecting heterogeneous data sources, the resulting information would not provide a complete picture and could not reap the same reward.
Network
• Advances in networking are a key factor in providing increased bandwidth across heterogeneous protocols, and therefore the performance necessary for train-of-thought processing.

Cont...
Data Source
• Many DBMS vendors now provide parallel support to enable rapid queries against large volumes of data.
• This enables gigabytes of data to be queried in seconds where previously it would have taken minutes.
Operating System
• Multiple-processor architectures enable high-performance computers to provide the train-of-thought response times required for successful data mining analysis.

Related Technologies and Rules
• To make data mining feasible, the appropriate data has to be collected and stored in a data warehouse, and adequate system resources have to be available.
• Many statistical analysis systems, such as SAS, have been used to detect unusual patterns and explain patterns using linear statistical models.
• Ad-hoc querying and report generation are commonly used by many businesses to provide input to their decision making.
• Multidimensional spreadsheets and databases are becoming popular for data analyses that require summary views of the data along multiple dimensions.
© Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.37 Cont.... • Neural networks have been applied successfully in a few applications that involve classification. • Data mining, when complemented by the techniques descried above adds significant volume beyond the use of the above, traditional techniques. © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.38 Data Mining Platform • Data mining technologies are characterized by : • intensive computations on large volumes of data. • Significant processing power and parallelism is a key to enabling significant data mining. • The system can be upgraded to provide the necessary analysis l i in i a timely ti l and d cost-effective t ff ti fashion. f hi • A balanced system architecture that supports I/O, computation, and sealing in a cost effective fashion is desirable. • Hence, the highest capacity and performance systems are of interest in this area. © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.39 © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. Deepali Kamthania U1.13 MCA 204, Data Warehousing & Data Mining Data Mining Tools • Data mining has been around for some years but has only recently come of age because of the following: • Variety of tools and technological trends. • Improved hardware cost/performance ratio. • Improved performance in parallel technology. technology • More flexible and intuitive query software. • Greatly advanced middleware connectively. • Data mining tools provide access to the data warehouse (which is the logical view of the organization’s data ) and enable query, analysis, and presentation of data. © Bharati Vidyapeeth’s Institute of Computer Applications and Management, New Delhi-63, by Dr. 

Example
To visualize where data mining techniques can be used most effectively:
• Fashions change frequently in the retail trade, and timely analysis of information can be used to predict the latest trends on a store-by-store basis.
• This analysis can be used to reduce stock levels, reduce capital outlay, and ensure stock is placed where it provides competitive advantage.
• An increase of 1% in profit margin can make the difference between success and failure in the highly competitive retail trade.

Data Mining Tool Characteristics
• Many tools are an integrated part of a total data warehouse solution.
• The mining tools require creative analysis to detect trends, although some have an element of intelligence to detect patterns.
• Providing coherent information from unstructured data requires sophisticated tools.
• Getting the desired results from the data requires manipulating and synchronizing it into a format usable by the tool.

Operational Warehouse
There are two fundamental types of data within any enterprise:
• Operational data is the data that directly supports the business functions and for which the majority of applications have been written.
• Informational data is the data that supports the decision-making process of an organization.

Cont....
The data warehouse overcomes the problems of the operational environment for decision-support analysis, such as the following:
• Lack of Integration: Operational systems are built on diverse types of databases and run on heterogeneous mainframes, so they are difficult to integrate for decision support.
• Lack of History: The operational environment provides no historical perspective, due to space limitations and the need to maintain performance levels.
• Lack of Credibility: It is difficult to assess the accuracy or timeliness of the data.
• Performance Considerations: Data is stored in a format designed to optimize transaction performance rather than to support business analysis.
• Difficulty in Gaining an Enterprise-Wide Perspective: Cross-functional analysis of information contained in separate databases is difficult.
The data warehouse addresses these problems by providing the architecture to model, map, filter, integrate, condense and transform operational data into a separate database of meaningful information that can be accessed and analyzed.

Types of Data Warehouses
• A majority of enterprises prefer to build and implement a single centralized data warehouse environment, for the following reasons:
  • A single repository makes sense if the volume of data can be managed easily.
  • The data is integrated across the enterprise, and only that view is used at headquarters.
• However, it may be impractical to integrate and access the data at a single site if it is dispersed over multiple locations.

Cont....
Hence the type of DW depends on a number of business factors, such as the following:
• Business objectives
• Location of the current data
• Need to move the data

Cont....
• Business objectives:
  • The enterprise should know its need for a data warehouse and its priorities, such as DW size, location, frequency of use and maintenance.
  • A properly scoped and executed DW project can prove extremely cost-effective.
• Location of current data:
  • It is extremely important to know where the data is and what its characteristics and attributes are, in order to select the proper tools.

Cont...
Need to move data: Whether to move the data can only be decided by considering a combination of:
• Quality of existing data
• Size of usable data
• Data design
• Performance impact of a direct query
• Performance impact on the current production systems
• Availability and ease of use of the tool

Different Configuration of Data
Different configurations of data are implemented to satisfy the DW:
• Real-time data (operational data): Operational data used by operational applications contains all individual detailed data records, where each update overlays the previous entry.
• Reconciled data: Contains detailed records from the real-time level that have been cleaned, adjusted or enhanced so that the data can be used for informational applications.
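
The contrast between real-time and reconciled data can be sketched in Python. This assumes a toy customer record; `apply_update`, `reconcile` and the `GENDER_CODES` table are hypothetical. The first function shows overlay semantics, where an update replaces the old value in place and the previous city is lost. The second shows the kind of cleaning and standardization that turns a real-time record into reconciled data without touching the operational copy.

```python
# Assumed code table for reconciling inconsistent source encodings of gender.
GENDER_CODES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}


def apply_update(store, key, fields):
    """Real-time (operational) data: each update overlays the previous entry in place."""
    store.setdefault(key, {}).update(fields)
    return store[key]


def reconcile(record):
    """Reconciled data: a cleaned, adjusted copy of a real-time record."""
    cleaned = {k: str(v).strip() for k, v in record.items()}  # remove stray whitespace
    # Standardize the code; anything unrecognized is flagged rather than guessed.
    cleaned["gender"] = GENDER_CODES.get(cleaned.get("gender", "").lower(), "Unknown")
    cleaned["name"] = cleaned.get("name", "").title()  # consistent capitalization
    return cleaned
```

Keeping the history that overlay discards is the job of changed data or warehouse snapshots, not of reconciliation itself.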

Different Configuration of Data
• Derived data: Data that is summarized or averaged from multiple sources of the real-time data or reconciled data, for improved processing capability.
• Changed data: Contains a record of all the changes to the selected real-time data.
• Metadata: The information that describes the model and definition of the source data elements.

Why Do Enterprises Really Need Data Warehouses?
• Operational computer systems
  • Provide information to run the day-to-day business
  • Are event driven
  • Are not directly suitable for review from different points of view
• Executives
  • Need different kinds of information for strategic decisions, e.g. which product line to expand, which market should be strengthened
  • Need trends over time
  • Review sales quantities by product, salesperson, region etc.

Organizations’ Use of Data Warehousing
• Retail: customer loyalty, market planning
• Financial: risk management, fraud detection
• Airlines: route profitability, yield management
• Manufacturing: cost reduction, logistics management
• Utilities: asset management, resource management
• Government: manpower planning, cost control

Escalating Need for Strategic Information
• Failures of past decision-support systems
• Operational versus decision-support systems
• Data warehousing: the only viable solution

Need for Strategic Information
• After the 1990s, business grew more complex.
• Corporations spread globally.
• Competition increased.
• Operational systems did provide information to run day-to-day operations, but managers and executives needed different kinds of information that could be used to make strategic decisions.
• The DW is a new paradigm specifically intended to provide vital strategic information.

Need for Strategic Information
• Why do enterprises really need data warehouses?
• Escalating need for strategic information: the executives and managers who are responsible for keeping the enterprise competitive need information to make proper decisions. They need information to formulate business strategies, establish goals, set objectives and monitor results.

Escalating Need for Strategic Information
• Who needs strategic information in an enterprise?
  • Executives and managers, in order to:
    • Make proper decisions
    • Keep the enterprise competitive
    • Formulate and execute business strategies
    • Establish goals and set objectives
    • Monitor results
• What exactly do we mean by strategic information?

Some Business Objectives
• Retain the present customer base
• Increase the customer base by 15% over the next 5 years
• Bring out a new product in 2 years
• Improve product quality levels in the top 5 product groups
• Gain market share by 10% in the next 3 years
• Increase sales by 10% in the East division
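
Monitoring results against an objective such as increasing sales by 10% in the East division is, at its simplest, a comparison of summarized figures across time: growth % = (current − base) × 100 / base. A minimal Python sketch, assuming sales arrive as hypothetical (division, year, amount) records; the function names are illustrative only.

```python
def total_sales(records, division, year):
    """Summarize (division, year, amount) records into one total."""
    return sum(amount for div, yr, amount in records if div == division and yr == year)


def growth_pct(records, division, base_year, current_year):
    """Percentage change: (current - base) * 100 / base."""
    base = total_sales(records, division, base_year)
    if base == 0:
        raise ValueError(f"no {base_year} sales recorded for {division}")
    current = total_sales(records, division, current_year)
    return (current - base) * 100.0 / base


def objective_met(records, division, base_year, current_year, target_pct):
    """Compare actual growth with a target such as 'increase sales by 10%'."""
    growth = growth_pct(records, division, base_year, current_year)
    return growth, growth >= target_pct
```

With East sales of 1,000 in the base year and 1,150 in the current year, growth is 15%, so a 10% target is met. The same pattern scales to any objective expressed as a change in a summarized measure.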

Cont...
• To meet business objectives, managers need information for the following purposes:
  • Gain in-depth knowledge of the company’s operations.
  • Monitor how the business factors change over time.
  • Compare the company’s performance relative to the competition and industry benchmarks.

Strategic Information
• Executives and managers need to focus their attention on customers’ needs and preferences, emerging technologies, sales and marketing results, and quality levels of products and services.
• This type of information is needed to make decisions in the formulation and execution of business strategies and objectives. All this essential information, taken as one group, is called strategic information.
• Strategic information is not for running the day-to-day operations of the business. It is important for the continued growth and survival of the corporation.

Characteristics of Strategic Information
Integrated
• Must have a single, enterprise-wide view.
Data Integrity
• Information must be accurate and must conform to business rules.
Accessible
• Easily accessible, with intuitive access paths, and responsive for analysis.
Credible
• Every business factor must have one and only one value.
Timely
• Information must be available within the stipulated time frame.
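
The "Credible" characteristic (every business factor has one and only one value) suggests a simple automated check when data from several systems is integrated. A minimal Python sketch, assuming facts arrive as hypothetical (source_system, factor, value) tuples; the function name and sample factors are illustrative.

```python
from collections import defaultdict


def find_conflicts(facts):
    """Credibility check: report every business factor that has more than one value.

    `facts` is an iterable of (source_system, factor, value) tuples.
    Returns {factor: sorted [(source_system, value), ...]} for conflicting factors only.
    """
    values = defaultdict(set)
    sources = defaultdict(list)
    for source, factor, value in facts:
        values[factor].add(value)
        sources[factor].append((source, value))
    return {factor: sorted(sources[factor])
            for factor, vals in values.items() if len(vals) > 1}
```

A factor reported by two systems with different values (for example, quarterly revenue from billing versus the sales database) would be flagged for reconciliation, while factors on which all sources agree pass silently.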
Escalating Need for Strategic Information
• Information crisis
• Technology trends
• Opportunities and risks
• Failure of past decision support systems

Information Crisis
• Think of the IT department of any organization, big or small: the various computer applications in the company, and the databases and quantities of data that support the company's operations.
• How many years' worth of customer data is saved and available?
• How many years' worth of financial data is kept in storage? 10 years? 15 years?
• Where is all this data? On one platform? In legacy systems? In client/server applications?

Cont...
• Facts faced by organizations:
  - Organizations have lots of data.
  - IT systems are NOT effective at turning all that data into useful strategic information.
• If organizations have so much data, why can't executives and managers use it for making strategic decisions?
• The information crisis:
  - Data is available but not accessible.
  - It sits on old technology and on different platforms.
• For proper decision making on overall corporate strategies and objectives:
  - Information must be integrated from all systems.
  - Data needed for strategic decision making must be in a format suitable for analyzing trends.
Technology Trends
[Figure: Growth of Information Technology, 1950 to 2000. Computing technology: mainframe, mini, PC/networking, client/server. Human/machine interface: punch card, video display, GUI, voice. Processing options: batch, online, networked.]

Opportunities and Risks
• Examples of the opportunities made available to companies through the use of strategic information:
• A community-based pharmacy that competes on a national scale with more than 800 franchised pharmacies coast to coast gained an in-depth understanding of what customers buy, reduced inventory levels, improved the effectiveness of promotions and marketing campaigns, and improved profitability for the company.

Cont...
• Consider cases where risks and threats of failure existed before strategic information was made available for analysis and decision making. Example:
• A world-leading supplier of systems and components to automobile and light-truck equipment manufacturers, operating across nearly 100 plants, was unable to benchmark quality metrics and relied on time-consuming manual collection of data. Reports needed to support decision making took weeks. It was not easy for the company to get company-wide integrated information.

Failures of Past Decision Support Systems
• A marketing department is concerned about the performance of the West Coast region.
The marketing Vice President wants some reports from the IT department to analyze the performance over the past two years, product by product, compared to monthly targets. The CEO wants the reports delivered as soon as possible, so the manager immediately asks a subordinate to produce the marketing report. No such report is available, so the data has to be gathered from multiple applications (on different platforms), starting from scratch. Reports built this way lack a common plan, which causes inconsistencies among the data obtained from the different applications. It is also possible that the person in the IT department builds the report from a single application for convenience, so such information may not be helpful in strategic decision making. The scenario shows that when information is scattered in different places and in different forms, it is difficult to use the available information for strategic decisions.

Operational vs. Decision Support Systems
• The fundamental reason for the inability to provide strategic information is that strategic information has been sought from the operational systems.
• Operational systems such as order processing, inventory control, claims processing, outpatient billing and so on are not designed or intended to provide strategic information.

Cont...
• Making the wheels of business turn: get data in
  - Take an order, process a claim, make a shipment, generate an invoice, receive cash, reserve an airline seat.
  - Operational systems support the basic business processes of the company: the day-to-day business.
• Watching the wheels of business turn: get information out
  - Show the top-selling products, the problem regions and the highest margins.
  - Alert whenever a district sells below target.
• Operational systems run the core business processes; decision support systems (DSS) have no such immediate payout.
• DSS are developed to get strategic information out of the database, whereas OLTP systems are designed to put data into the database.

Differences
• Primitive data / operational data:
  - Application oriented
  - Detailed
  - Accurate as of the moment of processing
  - Serves the clerical community
  - Can be updated
  - Run repetitively
  - Compatible with the SDLC
  - Accessed a unit at a time
  - Transaction driven
  - Control of updates is a major concern in terms of ownership
  - Small amount of data used in a process
  - Supports day-to-day operations
  - High probability of access
• Derived data / DSS data:
  - Subject oriented
  - Summarized or otherwise refined
  - Represents values over time, as snapshots
  - Serves the managerial community
  - Is not updated
  - Run heuristically
  - Completely different life cycle
  - Accessed a set at a time
  - Analysis driven
  - Control of updates is no issue
  - Managed by subsets
  - Large amount of data used for managerial support
  - Supports managerial needs
  - Low or modest probability of access

History of Decision Support Systems
Ad Hoc Reports
• This was the earliest stage.
• Users would send requests to the IT department for special reports.
• IT would write special programs, typically one for each request, and produce the ad hoc reports.

History of Decision Support Systems
Special Extract Programs
• This stage was an attempt by IT to anticipate the reports that would be requested from time to time.
• IT would write a suite of programs and run them periodically to extract the data from various applications.
• IT would create and keep the extract files to fulfill any request for special reports.

Cont...
Small Applications
• In this stage, IT formalized the extract process.
• IT created simple applications based on the extracted files.
• Users could specify the parameters for each special report.
• The report-printing programs would print the reports based on the user-specified parameters.

Cont...
Information Centers
• In the early 1970s, major corporations created information centers.
• An information center was a place where users could go to request ad hoc reports or view special reports on screen.
• These were predetermined reports or screens.
• IT personnel were there to help users obtain the desired information.

Cont...
Decision Support Systems
• In this stage, companies began to build more sophisticated systems to provide strategic information.
• Systems were menu driven and provided online information.
• Systems were supported by extracted files.
• Users could specify the parameters for each special report.
• Reports could be printed.

Cont...
Executive Information Systems
• This was the first attempt to bring strategic information to the executive desktop.
• Systems were designed to display key information every day.
• They offered straightforward reports.
• Only preprogrammed screens and reports were available.
• It was not possible to see analysis by region, by product or by any other dimension unless such breakdowns were already programmed.
• This limitation caused frustration, and executive information systems did not last long in many companies.

Failure Reasons
• What is the basic reason for the failure of all previous attempts by IT to provide strategic information?
• The fundamental reason for the inability to provide strategic information is that operational systems were used to provide it.
• Information systems such as order processing, inventory control, claims processing, etc. are not designed to provide strategic information.
• Only specially designed decision support systems can provide strategic information.

Typical OLAP Operations
• Roll-up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.
• Drill-down (roll-down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or to detailed data, or introduce new dimensions.
• Slice and dice: project and select.
• Pivot (rotate): reorient the cube for visualization, e.g., turn a 3-D cube into a series of 2-D planes.
• Other operations:
  - Drill-across: involves (queries across) more than one fact table.
  - Drill-through: goes through the bottom level of the cube to its back-end relational tables (using SQL).
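The OLAP operations listed above can be illustrated on a tiny in-memory cube. This is a minimal Python sketch, not an OLAP engine: the dimensions (year, quarter, region, product), the sales figures and the helper names are all hypothetical, and slice and dice are implemented as simple row selections over the fact list.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, units_sold).
FACTS = [
    ("2023", "Q1", "East", "Widget", 10),
    ("2023", "Q1", "West", "Widget", 7),
    ("2023", "Q2", "East", "Gadget", 4),
    ("2023", "Q2", "West", "Widget", 5),
    ("2024", "Q1", "East", "Widget", 8),
    ("2024", "Q1", "East", "Gadget", 6),
]
DIMS = ("year", "quarter", "region", "product")

def aggregate(facts, keep):
    """Sum the measure over every dimension NOT listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for row in facts:
        totals[tuple(row[i] for i in idx)] += row[-1]
    return dict(totals)

def roll_up_to_year(facts):
    # Climb the time hierarchy (quarter -> year) and reduce away region/product.
    return aggregate(facts, ("year",))

def drill_down_to_quarter(facts):
    # Reverse of the roll-up: go from yearly totals back down to quarters.
    return aggregate(facts, ("year", "quarter"))

def slice_(facts, dim, value):
    # Slice: select on ONE dimension (trailing underscore avoids the builtin name).
    i = DIMS.index(dim)
    return [row for row in facts if row[i] == value]

def dice(facts, criteria):
    # Dice: select on SEVERAL dimensions; criteria maps dim -> allowed values.
    return [row for row in facts
            if all(row[DIMS.index(d)] in vals for d, vals in criteria.items())]

def pivot(facts, row_dim, col_dim):
    # Pivot: present a 2-D grid with row_dim down the side and col_dim across.
    grid = defaultdict(dict)
    for (r, c), v in aggregate(facts, (row_dim, col_dim)).items():
        grid[r][c] = v
    return dict(grid)
```

For example, `pivot(FACTS, "region", "year")` gives one row per region with a column per year, while `dice(FACTS, {"region": {"East"}, "product": {"Widget"}})` keeps only the East/Widget sub-cube. Drill-across and drill-through are not shown, since they need more than one fact table or a back-end relational store.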
Decision Support Systems
• A decision support system (DSS) is a set of expandable, interactive IT techniques and tools designed for processing and analyzing data and for supporting managers in decision making.
[Figure: pyramid showing, from top to bottom, strategic information, reports, selected information and primary data sources; value increases toward the top and quantity toward the bottom.]

Classification of Decision Support Systems
• Passive DSS: supports the decision-making process but does not offer explicit suggestions on decisions or solutions.
• Active DSS: offers suggestions and solutions.
• Collaborative DSS: operates interactively and allows decision makers to modify, integrate or refine the suggestions given by the system.
• Model-driven DSS: enhances the management of statistical, financial, optimization and simulation models.
• Communication-driven DSS: supports a group of people working on a common task.
• Data-driven DSS: enhances the access and management of time series of corporate and external data.
• Document-driven DSS: manages and processes nonstructured data in many formats.
• Knowledge-driven DSS: provides problem-solving features in the form of facts, rules and procedures.

Data Warehousing: The Only Viable Solution
• Different types of DSS are needed to provide strategic information for analysis, discerning trends and monitoring performance.
• Given the escalating need for strategic information, data warehousing is the only viable solution for providing it.
• Data warehousing is a collection of methods, techniques and tools used to support knowledge workers (senior managers, directors, managers and analysts) in conducting data analyses that help in the decision-making process and in improving information resources.

New System Environment
• Desirable features and processing requirements of the new type of system environment:
  - Database designed for analytical tasks.
  - Data from multiple applications.
  - Easy to use and conducive to long interactive sessions by users.
  - Content updated periodically and otherwise stable.
  - Content includes current and historical data.
  - Ability for users to run queries and get results online.
  - Ability for users to initiate reports.

Processing Requirements in the New Environment
• The processing requirements of the new environment for strategic information are analytical.
• Four levels of analytical processing requirements:
  - Running simple queries and reports against current and historical data.
  - Ability to perform "what if" analysis in many different ways.
  - Ability to query, step back, analyze, and then continue the process to any desired length.
  - Spotting historical trends and applying them to future results.

Business Intelligence at the Data Warehouse
[Figure: operational systems (basic business processes) feed the data warehouse through data transformation (extraction, cleansing, aggregation); the warehouse holds key measurements and business dimensions.]
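The flow summarized above, where operational data is extracted, cleansed and aggregated into key measurements organized by business dimensions, can be sketched in a few lines of Python. This is a minimal illustration under invented assumptions: the two source systems, their field names, the region codes and the choice to store money as integer cents are all hypothetical. It shows why integration needs a cleansing step: each source encodes the same facts differently.

```python
from collections import defaultdict

# Two hypothetical operational sources with inconsistent conventions.
ORDER_SYSTEM = [  # amounts in dollars, region as a one-letter code
    {"date": "2024-01-15", "region": "E", "amount": 120.0},
    {"date": "2024-01-20", "region": "W", "amount": 80.0},
]
BILLING_SYSTEM = [  # amounts in cents, region spelled out in lower case
    {"txn_date": "2024-01-31", "area": "east", "cents": 4550},
    {"txn_date": "2024-02-02", "area": "west", "cents": 1000},
]

# One agreed encoding for the region dimension (assumed for illustration).
REGION_CODES = {"E": "East", "W": "West", "east": "East", "west": "West"}

def extract():
    """Extraction: pull raw rows from each source, tagged with their origin."""
    for row in ORDER_SYSTEM:
        yield "orders", row
    for row in BILLING_SYSTEM:
        yield "billing", row

def cleanse(source, row):
    """Cleansing: map each source onto (date, region, amount_in_cents)."""
    if source == "orders":
        return row["date"], REGION_CODES[row["region"]], round(row["amount"] * 100)
    return row["txn_date"], REGION_CODES[row["area"]], row["cents"]

def aggregate(rows):
    """Aggregation: key measurement (sales) by business dimensions (month, region)."""
    totals = defaultdict(int)
    for date, region, cents in rows:
        totals[(date[:7], region)] += cents
    return dict(totals)

warehouse = aggregate(cleanse(src, row) for src, row in extract())
```

After the run, January East sales combine the order-system and billing-system rows into a single consistent figure, which neither source could provide on its own.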
Definition
• A data warehouse is an information environment that:
  - Provides an integrated and total view of the enterprise.
  - Makes the enterprise's current and historical information easily available for decision making.
  - Makes decision-support transactions possible without hindering operational systems.
  - Renders the organization's information consistent.
  - Presents a flexible and interactive source of strategic information.

Conclusion
• Operational systems are not meant for strategic information.
• A data warehouse is a computing environment, not a product, for providing strategic information:
  - Data analysis and decision support
  - Flexible and interactive
  - User driven

Let's Discuss
1. How can readily available strategic information improve quality and help realize opportunities in an insurance company and in an airline company? Write a proposal explaining the problems, with reasons, and why a data warehouse is viable.
2. A senior analyst (IT department) of a company manufacturing automobile parts: the marketing VP complains about poor IT response in providing strategic information.
Data Warehouse: Building Blocks
• Defining features
• Data warehouses and data marts
• Overview of the components
• Metadata in the data warehouse

Defining Features
• Key defining features of the data warehouse, based on these definitions:
• What is the nature of the data in the data warehouse?
• How is this data different from the data in any operational system?
• Why does it have to be different?
• How is the data content in the data warehouse used?

What is a Data Warehouse?
• Defined in many different ways, but not rigorously.
• A decision support database that is maintained separately from the organization's operational database.
• Supports information processing by providing a solid platform of consolidated, historical data for analysis.
• "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
• Data warehousing: the process of constructing and using data warehouses.

Data Warehouse: Subject-Oriented
• Organized around major subjects, such as customer, product and sales.
• Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
• Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Data Warehouse: Subject-Oriented (cont.)
• Operational systems:
  - Data is stored by individual applications.
  - For example, the data sets for an order-processing application provide the data for all the functions of entering orders, checking stock, verifying customers' credit, and assigning orders for shipment.
• Subject-oriented data:
  - In the data warehouse, data is stored by subjects.
  - Business subjects differ from organization to organization.

Data Warehouse: Integrated
• Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, online transaction records.
• Data cleaning and data integration techniques are applied:
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources, e.g., hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted.

Data Warehouse: Time-Variant
• The time horizon for the data warehouse is significantly longer than that of operational systems:
  - Operational database: current-value data.
  - Data warehouse data: provides information from a historical perspective (e.g., the past 5 to 10 years).
• Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a "time element".
• The time-variant nature of the data in a data warehouse:
  - Allows for analysis of the past.
  - Relates information to the present.
  - Enables forecasts for the future.

Data Warehouse: Non-Volatile
• A physically separate store of data transformed from the operational environment.
• Operational update of data does not occur in the data warehouse environment:
  - It does not require transaction processing, recovery and concurrency control mechanisms.
  - It requires only two operations in data accessing: initial loading of data and access of data.

Data Warehouse: Non-Volatile (cont.)
• Operational databases hold volatile data:
  - Data is added and deleted as each transaction happens.
  - Data updates are commonplace in the operational database.
• The data warehouse holds non-volatile data:
  - Once the data is captured in the data warehouse, it is not updated.
  - Individual transactions are not run to change the data there.

Data Granularity
• Operational system: lowest level of detail; lots of data; daily details.
• Data warehouse: data granularity refers to the level of detail; data is summarized at different levels, e.g., monthly or quarterly summaries.

Data Warehouse vs. Heterogeneous DBMS
• Traditional heterogeneous DB integration:
  - Build wrappers/mediators on top of heterogeneous databases.
  - Query-driven approach: when a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
  - Complex information filtering; queries compete for resources.
• Data warehouse: update-driven, high performance.
  - Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis.

Data Warehouse vs. Operational DBMS
• OLTP (online transaction processing):
  - Major task of traditional relational DBMS.
  - Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
• OLAP (online analytical processing):
  - Major task of a data warehouse system.
  - Data analysis and decision making.
• Distinct features (OLTP vs. OLAP):
  - User and system orientation: customer vs. market.
  - Data contents: current, detailed vs. historical, consolidated.
  - Database design: ER + application vs. star + subject.
  - View: current, local vs. evolutionary, integrated.
  - Access patterns: update vs. read-only but complex queries.

OLTP vs. OLAP
• Users: clerk, IT professional (OLTP) vs. knowledge worker (OLAP)
• Function: day-to-day operations vs. decision support
• DB design: application-oriented vs. subject-oriented
• Data: current, up-to-date, detailed, flat relational, isolated vs. historical, summarized, multidimensional, integrated, consolidated
• Usage: repetitive vs. ad hoc
• Access: read/write, index/hash on primary key vs. lots of scans
• Unit of work: short, simple transaction vs. complex query
• Records accessed: tens vs. millions
• Users: thousands vs. hundreds
• DB size: 100 MB to GB vs. 100 GB to TB
• Metric: transaction throughput vs. query throughput and response

Why Separate Data Warehouse?
• High performance for both systems:
  - The operational DBMS is tuned for OLTP (access methods, indexing, concurrency control, recovery).
  - The warehouse is tuned for OLAP (complex OLAP queries, multidimensional view, consolidation).
• Different functions and different data:
  - Missing data: decision support requires historical data, which operational DBs do not typically maintain.
  - Data consolidation: decision support requires consolidation (aggregation, summarization) of data from heterogeneous sources.
  - Data quality: different sources typically use inconsistent data representations, codes and formats, which have to be reconciled.

Data Warehouses and Data Marts
• Data warehouse: enterprise-wide; union of all data marts; data received from the staging area; structure for a corporate view of data; organized on the E-R model.
• Data mart: departmental; a single business process; facts and dimensions; technology optimal for data access and analysis; structure suited to the departmental view of data.
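Several properties described in this part (a volatile operational table versus a time-variant, non-volatile warehouse table, coarser granularity through summarization, and a data mart as a departmental subset) can be shown together with Python's built-in sqlite3 module. This is a minimal sketch: the table names, columns and figures are hypothetical, and using a SQL view to stand in for a data mart is a simplification; in practice a data mart is often stored as a separate database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Operational (OLTP) table: holds only the CURRENT value and is updated in place.
cur.execute("CREATE TABLE stock (product TEXT PRIMARY KEY, on_hand INTEGER)")
cur.execute("INSERT INTO stock VALUES ('Widget', 50)")
cur.execute("UPDATE stock SET on_hand = 42 WHERE product = 'Widget'")  # old value is lost

# Warehouse fact table: time is part of every key, and rows are only appended.
cur.execute("""CREATE TABLE sales_fact (
                   sale_date TEXT, region TEXT, product TEXT, amount INTEGER,
                   PRIMARY KEY (sale_date, region, product))""")

def load(rows):
    """Periodic refresh: append new rows; existing rows are never updated."""
    cur.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    conn.commit()

load([("2024-01-05", "East", "Widget", 100),
      ("2024-01-19", "West", "Widget", 70)])
load([("2024-02-03", "East", "Widget", 90),
      ("2024-02-11", "East", "Gadget", 40)])

# Coarser granularity: summarize daily detail into monthly totals (an OLAP-style read).
monthly = cur.execute("""SELECT substr(sale_date, 1, 7) AS month, SUM(amount)
                         FROM sales_fact GROUP BY month ORDER BY month""").fetchall()

# A departmental data mart as a subset of the warehouse: the East region only.
cur.execute("CREATE VIEW east_mart AS SELECT * FROM sales_fact WHERE region = 'East'")
east_total = cur.execute("SELECT SUM(amount) FROM east_mart").fetchone()[0]
```

The `stock` table can answer only "how many now?", while `sales_fact` keeps every dated row, so monthly or regional views can be derived later without changing the stored history.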
Data Warehouses and Data Marts (cont.)
Data warehouse
• Is a collection of data that supports the decision-making process.
• It provides the following features: it is subject-oriented, integrated and consistent, shows evolution over time, and is not volatile.
Data marts
• A data mart is a subset of the data stored in a primary data warehouse.
• It includes the set of information pieces relevant to a specific business area, corporate department or category of users.

Data Warehousing and OLAP Technology for Data Mining
• What is a data warehouse?
• A multidimensional data model
• Data warehouse building blocks

Data Warehouse Components

Overview of Components
[Figure: the data warehouse components: source data, data staging, data storage, information delivery, metadata, and management and control.]

Data Warehouse Components (cont.)
1. Source Data Component: source data is grouped into four broad categories.
• Production data: this category of data comes from the various operational systems of the enterprise.
• Internal data: in every organization, users keep their "private" spreadsheets, documents, customer profiles and sometimes even departmental databases.
Cont...
• Archived data: operational systems periodically take old data and store it in archived files. The data in these archived files is referred to as archived data.
• External data: this category includes data from external sources, for example market share data of competitors.

Cont...
2) Data Staging Component:
• Data is extracted from the various operational systems and external sources.
• The data is prepared for storing in the data warehouse.
• The data extracted from several disparate sources needs to be changed and converted, and made ready to be stored in a format suitable for querying and analysis.

Cont...
• Three major functions need to be performed to get the data ready:
• Data extraction: for the data warehouse, extract the data using appropriate techniques from the large amount of data received from the operational systems.
• Data transformation: involves many forms of combining pieces of data from the different sources, with merging and sorting on a large scale in the staging area. When the data transformation function ends, the collection of integrated data is cleaned, standardized and summarized, and the data is ready to be loaded into the data warehouse.

Cont...
• Data loading: in this phase, the initial load moves large volumes of data, using up a substantial amount of time.
• Once the data warehouse is in operation, changes to the source data are continuously extracted, transformed and fed in as incremental revisions.

Data Movement in Data Warehouse
[Figure: data moves from the data sources into the data warehouse through a base (initial) data load, followed by daily, monthly, quarterly or yearly refreshes.]
• The initial load is time consuming and moves a large volume of data.
• Business conditions determine the refresh cycle.

Cont...
3) Data Storage Component:
• The data storage for the data warehouse is a separate repository.
• The operational systems of the enterprise support the day-to-day operations.
• The data repositories of the operational systems typically contain only the current data, while the data repository for a data warehouse needs to keep large volumes of historical data for analysis.
• So the data in the data warehouse needs to be kept in structures suitable for analysis, and not for quick retrieval of individual pieces of information.

Cont...
4) Information Delivery Component:
• Who are the users who need information from the data warehouse?
• This component provides information to the wide community of data warehouse users:
  - Novice users: no training; need prefabricated reports and preset queries.
  - Casual users: need information once in a while; need prepackaged information.
  - Other users navigate through the data warehouse, creating custom reports and ad hoc queries.
• The information delivery component includes a variety of information delivery methods.
For example, several information delivery mechanisms may be provided, such as online queries and reports.

Information Delivery Component
(figure: the data warehouse and data marts deliver online queries, ad hoc reports, complex queries, multidimensional (MD) analysis, statistical analysis, executive information system (EIS) feeds and data mining over the intranet, the internet and e-mail, serving novice and casual users, business analysts, senior managers and high-level managers)

5) Metadata Component:
• Metadata in a data warehouse is similar to the data dictionary or data catalog in a database management system.
• In a DBMS, the data dictionary keeps information about the logical data structures, about files and addresses, and about indexes.

6) Management and Control Component:
• This component of the data warehouse architecture sits on top of all the other components.
• It coordinates the services and activities within the data warehouse.
• It moderates the information delivery to the users.
• It works with the database management systems and enables data to be properly stored in the repositories.
• It monitors the movement of data into the staging area and from there into the data warehouse storage.
• The management and control component interacts with the metadata component to perform the management and control functions.
• Metadata is the source of information for the management module.

Meta Data in the Data Warehouse
• The metadata component serves as a directory of the contents of the data warehouse.
• Metadata in a data warehouse falls into three major categories:
1) Operational Metadata:
• Operational metadata contains information about the operational data sources.
• These sources contain different data structures for storing data from the various operational systems.
2) Extraction and Transformation Metadata:
• Contains data about the extraction of data from the source systems, such as extraction frequencies and extraction methods.
• Also contains information about all the data transformations that take place in the data staging area.
3) End-User Metadata:
• The end-user metadata is the navigational map of the data warehouse. It enables end users to find information in the data warehouse.

Data Warehouse Architecture
• Architectural properties essential for a data warehouse system (Kelly, 1997):
• Separation: analytical and transactional processing should be kept apart.
• Scalability: hardware and software architectures should be easy to upgrade as the volume of data grows.
• Extensibility: the architecture should be able to host new applications and technologies without redesigning the whole system.
• Security: monitoring access is essential because of the strategic data stored in the data warehouse.
• Administrability: data warehouse management should not be overly difficult.

Classification of Data Warehouse Architectures
Two different classifications are commonly adopted for data warehouse architectures:
• Structure-oriented: single-layer, two-layer and three-layer architectures.
• Based on how the different layers are employed to create enterprise-wide or department-oriented views of data: independent data marts, bus, hub-and-spoke, centralized and federated architectures.

Single Layer Architecture
(figure: operational data in the source layer → middleware → virtual data warehouse → analysis with reporting and OLAP tools)
• Only one layer is physically available: the source layer.
• Goal: reduce the amount of data stored by removing redundancies.
• Not frequently used in practice.
• The data warehouse is virtual: it is implemented as a multidimensional view of operational data created by specific middleware, or an internal processing layer (Devlin, 1997).
• The weakness of this architecture lies in its failure to meet the requirement for separation of analytical and transactional processing.
• Analytical queries are submitted to the operational data after the middleware interprets them, so they affect the regular transactional workload.
• Although this architecture can meet the requirements for integration and correctness of data, it cannot log more data than the sources do.
• For these reasons, a virtual approach to data warehouses can be successful only if analysis needs are particularly restricted and the volume of data to analyze is not large.

Two Layer Architecture
(figure: source layer (operational and external data) → ETL tools → data staging → data warehouse layer (data warehouse, data marts, metadata) → analysis with reporting, OLAP, data mining and what-if analysis tools)

The two-layer architecture consists of four subsequent data-flow stages:
• Source layer: uses heterogeneous sources of data, originally stored in corporate relational databases or legacy databases (applications running on mainframes and minicomputers that are used for operational tasks but do not fit a modern architecture), or coming from information systems outside the corporate walls.
• Data staging: data stored in the sources should be extracted, cleansed to remove inconsistencies and fill gaps, and integrated to merge heterogeneous sources into one common schema (ETL tools).
• Data warehouse layer: information is stored in one logically centralized repository, the data warehouse.
The data warehouse can be accessed directly, and it can also be used as a source to create data marts, which partially replicate the data warehouse content and are designed for specific enterprise departments. Metadata repositories store information on sources, source access procedures, data staging, users, data marts and so on.
• Analysis: in this layer, integrated data is accessed efficiently and flexibly to issue reports, dynamically analyze information and simulate hypothetical business scenarios. Technologically, it features aggregate data navigators, complex query optimizers and user-friendly GUIs.

Benefits of the two-layer architecture, in which the data warehouse is separated from the analysis applications:
• Good-quality information is always available in the data warehouse system, even when access to the sources is denied for technical or organizational reasons.
• Data warehouse analysis queries do not affect the management of transactions, whose reliability is vital for enterprises to work at an operational level.
• Data warehouses are logically structured according to the multidimensional model, while operational sources are generally based on relational or semi-structured models.
• Data warehouses can use specific solutions aimed at optimizing the performance of analysis and reporting applications.

Three Layer Architecture
(figure: source layer (operational and external data) → data staging with ETL tools → reconciled layer (reconciled data, metadata) → ETL tools and loading → data warehouse layer (data warehouse, data marts) → analysis with reporting, OLAP, data mining and what-if analysis tools)
• In this architecture, the third layer is the reconciled layer, also called the operational data store.
• This layer materializes operational data obtained after integrating and cleansing the source data.
• As the three-layer figure shows, the data warehouse is not populated from its sources directly but from the reconciled data.
• Advantage of reconciled data: it creates a common reference data model for the whole enterprise.

Additional Architecture Classification
• Independent data marts: different data marts are designed and built separately, in a non-integrated fashion. This approach may be adopted initially when the organizational divisions of a company are loosely coupled, but it tends to be replaced soon by other architectures that better achieve data integration and cross-reporting.

Independent Data Marts Architecture
(figure: each set of operational data is loaded by its own ETL tools into a separate data mart with its own metadata; reporting, OLAP, data mining and what-if analysis tools access the data marts)
• Bus architecture: similar to independent data marts, with the difference that a basic set of conformed dimensions (that is, analysis dimensions that preserve the same meaning throughout all the facts they belong to), derived by a careful analysis of the main enterprise processes, is adopted and shared as a common design guideline. This ensures logical integration of the data marts and an enterprise-wide view of information.
• Hub and spoke: the most widely used architecture in medium to large contexts, where much attention is paid to scalability and extensibility and to achieving an enterprise-wide view of information. Atomic, normalized data is stored in a reconciled layer that feeds a set of data marts containing summarized data in multidimensional form. Users mainly access the data marts, but they may occasionally query the reconciled data.
• Centralized architecture: a particular implementation of the hub-and-spoke architecture, in which the reconciled layer and the data marts are collapsed into a single physical repository.

Hub and Spoke Architecture
(figure: operational and external data → ETL tools → reconciled data (with metadata) → loading → data marts → reporting, OLAP, data mining and what-if analysis tools)
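To make the bus architecture's conformed dimensions concrete, here is a minimal Python sketch assuming two hypothetical data marts (sales and inventory) that share one conformed product dimension. Because both marts use the same product keys and the same category values, their measures can be summarized by category and combined side by side (a drill-across query). All names, keys and figures are invented for illustration.

```python
from collections import defaultdict

# Conformed product dimension shared by every data mart (hypothetical data).
PRODUCT_DIM = {
    1: {"name": "TV set", "category": "Electronics"},
    2: {"name": "Blender", "category": "Appliances"},
}

# Two separately built data marts that both reference PRODUCT_DIM keys.
SALES_MART = [
    {"product_key": 1, "units_sold": 10},
    {"product_key": 2, "units_sold": 4},
    {"product_key": 1, "units_sold": 5},
]
INVENTORY_MART = [
    {"product_key": 1, "units_on_hand": 7},
    {"product_key": 2, "units_on_hand": 12},
]


def totals_by_category(facts, measure):
    """Sum one measure of a mart, grouped by the conformed category attribute."""
    totals = defaultdict(int)
    for row in facts:
        category = PRODUCT_DIM[row["product_key"]]["category"]
        totals[category] += row[measure]
    return totals


def drill_across():
    """Combine both marts on the shared dimension: category -> (sold, on hand)."""
    sold = totals_by_category(SALES_MART, "units_sold")
    on_hand = totals_by_category(INVENTORY_MART, "units_on_hand")
    categories = sorted(set(sold) | set(on_hand))
    # Missing categories read as 0 because the totals are defaultdicts.
    return {c: (sold[c], on_hand[c]) for c in categories}


print(drill_across())
```

This prints {'Appliances': (4, 12), 'Electronics': (15, 7)}. With independent data marts that used different product codes or category names, the two summaries could not be lined up this way.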
• Federated architecture: sometimes adopted in dynamic contexts where preexisting data warehouses or data marts are to be integrated noninvasively to provide a single, cross-organization decision support environment (for instance, in the case of mergers and acquisitions). Each data warehouse or data mart is either virtually or physically integrated with the others, relying on a variety of advanced techniques such as distributed querying, ontologies and metadata interoperability.

Federated Architecture
(figure: several sets of operational data, each loaded by its own ETL tools into its own data marts; the data marts are logically or physically integrated and accessed by reporting, OLAP, data mining and what-if analysis tools)

ETL
(figure: operational and external data → extraction → validation, cleansing and filtering → transformation → reconciled data → loading → data warehouse)
• ETL consists of four separate phases: extraction (or capture), cleansing (or cleaning, or scrubbing), transformation and loading.
• Extraction: relevant data is obtained from the sources in the extraction phase.
  - Static extraction: used when the data warehouse needs to be populated for the first time.
  - Incremental extraction: used to update the data warehouse regularly; it captures the changes applied to the source data since the last extraction.
• Cleansing:
  - The main cleansing features in ETL tools are rectification and homogenization.
  - Cleansing is supposed to improve data quality by dealing with problems such as duplicate data, missing data, inconsistent values that are logically associated, impossible or wrong values, and unexpected use of fields.
• Transformation:
  - The core of the reconciliation phase: it converts operational data from its source format into a specific data warehouse format.
  - Conversion and normalization make the data uniform.
  - Matching associates equivalent fields in different sources.
  - Selection reduces the number of source fields and records.
  - When populating a data warehouse, normalization is replaced by denormalization, because data warehouse data is typically denormalized, and aggregation is required to summarize data properly.
• Loading: the last step, carried out in two ways:
  - Refresh: the data warehouse data is completely rewritten, so older data is replaced. Refresh is normally used together with static extraction to populate a data warehouse initially.
  - Update: only the changes applied to the source data are added to the data warehouse, without deleting or modifying preexisting data. Update is used together with incremental extraction to update the data warehouse regularly.
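The extraction and loading options above pair up naturally: static extraction with refresh, incremental extraction with update. Below is a minimal Python sketch of that pairing. It assumes, hypothetically, that each source row carries a last_modified timestamp that can be used to detect changes; real ETL tools may instead read database logs or compare snapshots. Cleansing and transformation are omitted to keep the focus on extraction and loading, and all names and values are invented.

```python
from datetime import datetime

# Hypothetical source table; each row records when it was last modified.
source_rows = [
    {"id": 1, "amount": 100, "last_modified": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "last_modified": datetime(2024, 1, 5)},
]


def static_extract(rows):
    """Static extraction: take a full copy of the source (first population)."""
    return [dict(r) for r in rows]


def incremental_extract(rows, since):
    """Incremental extraction: only rows changed after the previous extraction."""
    return [dict(r) for r in rows if r["last_modified"] > since]


def refresh(warehouse, rows):
    """Refresh: completely rewrite the warehouse; older data is replaced."""
    warehouse[:] = rows


def update(warehouse, rows):
    """Update: add the captured changes without deleting or modifying existing rows."""
    warehouse.extend(rows)


# Initial population: static extraction + refresh.
warehouse = []
refresh(warehouse, static_extract(source_rows))
last_run = datetime(2024, 1, 6)

# Later the source changes: row 2 is corrected and row 3 is inserted.
source_rows[1] = {"id": 2, "amount": 260, "last_modified": datetime(2024, 1, 8)}
source_rows.append({"id": 3, "amount": 80, "last_modified": datetime(2024, 1, 9)})

# Regular maintenance: incremental extraction + update.
update(warehouse, incremental_extract(source_rows, since=last_run))
```

After both runs the warehouse holds four rows (amounts 100, 250, 260, 80). The update appends the corrected version of row 2 instead of overwriting it, so the earlier value remains available as history.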
Example of Cleansing and Transforming Customer Data
Input record: John White Downing St. 10 TW1A 2AA London (UK)
• Normalization (the record is split into fields): first name: John; last name: White; address: Downing St. 10; zip code: TW1A 2AA; city: London; country: UK
• Standardization (uniform formats are applied): first name: John; last name: White; address: 10, Downing St.; zip code: TW1A 2AA; city: London; country: United Kingdom
• Correction (wrong values are fixed): first name: John; last name: White; address: 10, Downing St.; zip code: SW1A 2AA; city: London; country: United Kingdom

Conclusion
The data warehouse is an informational environment that:
• Provides an integrated and total view of the enterprise.
• Makes the enterprise's current and historical information easily available for decision making.
• Makes decision-support transactions possible without hindering operational systems.
• Renders the organization's information consistent.
• Presents a flexible and interactive source of strategic information.

Let's Discuss
1. You are a data analyst on a project building a data warehouse for an insurance company. List all possible data sources from which data will be brought into the data warehouse (state your assumptions).
2. For an airline company, identify three operational applications that would feed into the data warehouse. What would be the data load and refresh cycles?
3. Identify potential users and information delivery methods for a data warehouse supporting a large national grocery chain.
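The customer-data example of cleansing and transforming (normalization, then standardization, then correction) can be sketched in Python as three small functions applied in sequence. This is an illustration under stated assumptions, not how any particular ETL tool works. The record layout, the COUNTRY_NAMES lookup and the KNOWN_ZIPS reference table are all hypothetical, and a real correction step would check against a full postal reference database.

```python
import re

# Hypothetical record layout: "<first> <last> <street> <number> <zip> <city> (<country>)"
RECORD = re.compile(
    r"^(?P<first_name>\S+) (?P<last_name>\S+) "
    r"(?P<address>.+? \d+) "
    r"(?P<zip_code>[A-Z0-9]{2,4} \d[A-Z]{2}) "
    r"(?P<city>.+) \((?P<country>[^)]+)\)$"
)
STREET_THEN_NUMBER = re.compile(r"^(?P<street>.+?) (?P<number>\d+)$")
COUNTRY_NAMES = {"UK": "United Kingdom"}                   # hypothetical lookup
KNOWN_ZIPS = {("London", "10, Downing St."): "SW1A 2AA"}    # hypothetical reference data


def normalize(raw: str) -> dict:
    """Normalization: split the free-text record into separate fields."""
    match = RECORD.match(raw)
    if match is None:
        raise ValueError(f"unrecognised record: {raw!r}")
    return match.groupdict()


def standardize(rec: dict) -> dict:
    """Standardization: '<number>, <street>' addresses and full country names."""
    out = dict(rec)
    m = STREET_THEN_NUMBER.match(out["address"])
    if m:
        out["address"] = f"{m['number']}, {m['street']}"
    out["country"] = COUNTRY_NAMES.get(out["country"], out["country"])
    return out


def correct(rec: dict) -> dict:
    """Correction: replace a wrong zip code using the reference table."""
    out = dict(rec)
    out["zip_code"] = KNOWN_ZIPS.get((out["city"], out["address"]), out["zip_code"])
    return out


RAW = "John White Downing St. 10 TW1A 2AA London (UK)"
cleaned = correct(standardize(normalize(RAW)))
```

Running the three steps on the sample record gives address "10, Downing St.", zip code "SW1A 2AA" and country "United Kingdom". Correction runs last so that it can match against the standardized address format.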
Defining the Business Requirements
• Dimensional analysis
• Information packages
• Requirements gathering methods
• Requirements definition

Dimensional Analysis
• A data warehouse is an information delivery system.
• It is not about technology, but about solving users' problems and providing strategic information to the user.
• The requirements definition phase focuses on what information users need, not on how the information will be provided.
• Building a data warehouse is different from building an operational system. Users cannot fully describe what they want in a data warehouse, but they can provide important insights into how they think about the business.
• The analysis therefore needs to identify:
  - business dimensions
  - units of measurement
Managers Think in Business Dimensions (Numbers)
• Marketing VP:
  - How much did the new product generate?
  - Month by month, in the southern division, by user demographic, by sales office, relative to the previous version and to plan.
• Marketing Manager:
  - Sales statistics.
  - By product, summarized by product category, daily, weekly, monthly, by sales district, by distribution channel.
• Financial Controller:
  - Show expenses.
  - Listing actual vs. budget, by month, quarter and year, by budget line item, by district, by division, summarized for the whole company.

From Tables and Spreadsheets to Data Cubes
• A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.
• A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions:
  - Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
  - A fact table contains measures (such as dollars_sold) and keys to each of the related dimension tables.
• In data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Multidimensional Data
(figure: sales volume as a function of time (dates 3/1 to 3/4), city and product (juice, cola, milk, cream))
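A data cube like the sales-volume example (sales as a function of date, city and product) can be held, for illustration, as a Python dictionary keyed by coordinates. The sketch below uses hypothetical city names and values. It shows that a slice fixes one dimension, and that summing over the dimensions you drop produces a coarser view; keeping no dimensions gives the apex (grand) total. Missing keys stand for empty cells, which is how sparsity shows up.

```python
from collections import defaultdict

DIMENSIONS = ("date", "city", "product")

# Hypothetical cells: (date, city, product) -> sales volume; empty cells are absent.
CUBE = {
    ("3/1", "Delhi", "Juice"): 20,
    ("3/1", "Mumbai", "Juice"): 15,
    ("3/1", "Delhi", "Cola"): 40,
    ("3/2", "Delhi", "Milk"): 25,
    ("3/2", "Mumbai", "Cola"): 35,
}


def slice_cube(cube, dimension, value):
    """Keep only the cells whose coordinate on `dimension` equals `value`."""
    i = DIMENSIONS.index(dimension)
    return {coords: v for coords, v in cube.items() if coords[i] == value}


def aggregate(cube, keep):
    """Sum the measure over every dimension not listed in `keep`."""
    positions = [DIMENSIONS.index(d) for d in keep]
    totals = defaultdict(int)
    for coords, volume in cube.items():
        totals[tuple(coords[p] for p in positions)] += volume
    return dict(totals)


by_product = aggregate(CUBE, ["product"])     # one 1-D view of the cube
grand_total = aggregate(CUBE, [])             # apex: {(): total}
delhi = slice_cube(CUBE, "city", "Delhi")     # sub-cube for one city
```

Here by_product is {("Juice",): 35, ("Cola",): 75, ("Milk",): 25} and grand_total is {(): 135}. Each choice of dimensions to keep corresponds to one cuboid of the cube.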
Cube: A Lattice of Cuboids
Dimensions: time, item, location, supplier.
• 0-D (apex) cuboid: all
• 1-D cuboids: time; item; location; supplier
• 2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
• 3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
• 4-D (base) cuboid: time,item,location,supplier

Dimensional Nature of Business Data
(figure: a cube of product sales information (units sold) with product, time and location axes; a slice is highlighted, e.g., TV sets sold in Delhi in January)
• The cube can be extended to more than three dimensions.
• Multidimensional cubes are called hypercubes.

Examples of Business Dimensions
(figure: each business subject with its dimensions)
• Supermarket chain (sales units): time, product, promotion, store
• Insurance business (claims): time, agent, claim type, policy, insured party, status
• Airline company (frequent flights): time, customer, flight, fare class, airport, status

OLAP for Decision Support
• The goal of OLAP is to support ad hoc querying for the business analyst.
• Business analysts are familiar with spreadsheets.
• OLAP extends the spreadsheet analysis model to work with warehouse data:
  - large data sets
  - semantically enriched to understand business terms (e.g., time, geography)
  - combined with reporting features
• A multidimensional view of data is the foundation of OLAP.
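The lattice above can be generated mechanically: each cuboid is one subset of the dimensions, so n dimensions without hierarchies give 2^n cuboids (16 for the four dimensions here). The Python sketch below lists them by dimensionality. It also applies the usual textbook count for dimensions with hierarchies, where a dimension with L levels contributes L + 1 choices (its levels plus "all"). The level counts in the example are hypothetical.

```python
from itertools import combinations
from math import prod


def cuboid_lattice(dimensions):
    """Map k -> all k-dimensional cuboids (group-by sets) of the given dimensions."""
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}


def total_cuboids(levels_per_dimension):
    """Cuboid count when dimension i has L_i hierarchy levels: product of (L_i + 1)."""
    return prod(levels + 1 for levels in levels_per_dimension)


DIMS = ["time", "item", "location", "supplier"]
lattice = cuboid_lattice(DIMS)
for k, cuboids in lattice.items():
    label = {0: " (apex)", len(DIMS): " (base)"}.get(k, "")
    print(f"{k}-D{label}: {cuboids}")
```

With one level per dimension, total_cuboids([1, 1, 1, 1]) gives 16, matching the lattice. If time had four levels (say day, month, quarter, year), total_cuboids([4, 1, 1, 1]) gives 5 × 2 × 2 × 2 = 40. This shows how quickly the number of cuboids grows.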
• A pivot table is a multidimensional spreadsheet.

What is Dimensional Modeling?
• Dimensional modeling gets its name from the business dimensions we need to incorporate into the logical data model. It is a logical design technique for structuring the business dimensions and the metrics that are analyzed along these dimensions.
• With dimensional modeling, the measurements and the relevant dimensions must be captured and kept in the data warehouse. For this, an information package diagram can be drawn for the specific subject.
• It enables packaging the data in a symmetric format, which helps with:
  - high performance for queries and analysis
  - capturing critical measures
  - views along dimensions
  - being intuitive to business users

Dimensional Modeling
• In dimensional modeling there are two types of tables: dimension tables and fact tables.
• Facts are stored in fact tables.
• Dimensions are stored in dimension tables.
• Dimension tables contain textual descriptors of the business.
• Fact and dimension tables form a star schema: a "big" fact table in the center surrounded by "small" dimension tables.

Multidimensional Data Model
• The database is a set of facts (points) in a multidimensional space.
• A fact has a measure dimension: the quantity that is analyzed, e.g., sales or budget.
• A fact also has a set of dimensions on which the data is analyzed, e.g.
store, product and date associated with a sales amount.
• Dimensions form a sparsely populated coordinate system.
• Each dimension has a set of attributes, e.g., owner, city and county of a store.
• Attributes of a dimension may be related by a partial order:
  - Hierarchy, e.g., street > county > city
  - Lattice, e.g., date > month > year, date > week > year

Fact Table
• The metrics or facts from the information package diagram form the fact table; they are the facts for analysis.
• For example, for automaker sales, the actual sale price is a fact about what the actual price was for the sale. The other facts are:
  - MSRP sale price
  - Options price
  - Full price
  - Dealer add-ons
  - Dealer credits
  - Dealer invoice
  - Amount of down payment
  - Manufacturer proceeds
  - Amount financed
• All these facts can be grouped into a single data structure called the fact table; together they form the automaker sales fact table.

Properties of Fact Table
Concatenated key:
• A row in the fact table relates to a combination of rows from all the dimension tables.
• For example, in an order analysis with product, time, customer and sales representative dimensions, a single row in the fact table must relate to a particular product, a specific calendar date, a specific customer and an individual sales representative.
• This means the row in the fact table must be identified by the primary keys of these four dimension tables. Thus, the primary key of the fact table is the concatenation of the primary keys of all the dimension tables.
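The concatenated-key property can be seen in a small star schema. The sketch below uses Python's built-in sqlite3 module. The four dimension tables follow the order-analysis example (product, time, customer, sales representative), but all table names, columns and rows are hypothetical. The fact table's primary key is the concatenation of the four dimension keys, and the query is a typical star join that groups a fact measure by a dimension attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim  (product_key  INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE time_dim     (time_key     INTEGER PRIMARY KEY, calendar_date TEXT, month TEXT);
CREATE TABLE customer_dim (customer_key INTEGER PRIMARY KEY, customer_name TEXT, city TEXT);
CREATE TABLE salesrep_dim (salesrep_key INTEGER PRIMARY KEY, rep_name TEXT, territory TEXT);

-- Fact table: its primary key concatenates the four dimension keys.
CREATE TABLE order_fact (
    product_key      INTEGER REFERENCES product_dim,
    time_key         INTEGER REFERENCES time_dim,
    customer_key     INTEGER REFERENCES customer_dim,
    salesrep_key     INTEGER REFERENCES salesrep_dim,
    quantity_ordered INTEGER,
    order_dollars    REAL,
    PRIMARY KEY (product_key, time_key, customer_key, salesrep_key)
);
""")

conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?)",
                 [(1, "Sedan X", "Cars"), (2, "Pickup Y", "Trucks")])
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "2024-01-15", "2024-01"), (2, "2024-02-03", "2024-02")])
conn.execute("INSERT INTO customer_dim VALUES (1, 'A. Rao', 'Delhi')")
conn.execute("INSERT INTO salesrep_dim VALUES (1, 'S. Mehta', 'North')")
conn.executemany("INSERT INTO order_fact VALUES (?, ?, ?, ?, ?, ?)",
                 [(1, 1, 1, 1, 2, 50000.0),
                  (2, 1, 1, 1, 1, 30000.0),
                  (1, 2, 1, 1, 1, 25000.0)])

# Star join: group by a dimension attribute, aggregate a fact measure.
rows = conn.execute("""
    SELECT p.category, SUM(f.order_dollars)
    FROM order_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY p.category
    ORDER BY p.category
""").fetchall()
```

The query returns [('Cars', 75000.0), ('Trucks', 30000.0)]. Inserting a second fact row with the same four keys would raise sqlite3.IntegrityError, because that key combination already identifies a row.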
Data Grain:
• Data grain is the level of detail of the measurements or metrics.
• In this example, the metrics are at the detailed level: the quantity ordered relates to the quantity of a particular product on a single order, on a certain date, for a specific customer, procured by a specific sales representative.
• If we keep the quantity ordered as the quantity of a specific product for each month, the data grain is different and is at a higher level.
• Fully additive measures: some attributes, such as order_dollars and quantity_sold, can be summed by simple addition. These are known as fully additive measures.
• Semi-additive measures: some attributes are not fully additive; they are derived metrics calculated from other attributes in the fact table. For example, margin percentage is calculated from order_dollars and extended_cost, so it cannot simply be summed.
• Table deep, not wide: a fact table contains relatively few attributes but a very large number of rows.
• Sparse data: a fact table can have gaps; for some combinations of dimension values there is no activity, so no rows are stored for them.
• Degenerate dimensions: a fact table sometimes also contains degenerate dimensions, i.e., reference numbers such as order numbers, which are neither facts nor dimensions.

Dimension Table
• The product business dimension is used when the facts are to be analyzed by product.
• Sometimes the analysis is a breakdown by individual model; another analysis could be at a higher level, by product line.
• Yet another analysis could be at an even higher level, by product category.
• The data items relating to the product dimension are:
  - Model name, model year, package styling
  - Product line, product category
  - Exterior color, interior color
  - First model year
• All of these are related to the product in some way, so they can be grouped in one data structure: a single relational table called the product dimension table. The data items above are all attributes of this table.

Properties of Dimension Table:
• Dimension table key: the primary key of the dimension table uniquely identifies each row in the table.
• Large number of attributes (wide): a dimension table typically has many columns or attributes, so it is wide.
• Textual attributes: in a dimension table you will seldom find numerical values used for calculations; the attributes are textual.
• Attributes not directly related: some attributes in a dimension table are not directly related to the other attributes in the table.
• Flattened out, not normalized: the attributes in a dimension table are used again and again in queries. For efficient query performance, it is best if a query picks up an attribute from the dimension table and goes directly to the fact table, not through other intermediary tables. Therefore, a dimension table is flattened out, not normalized.
• Ability to drill down / roll up: the attributes in a dimension table make it possible to move from higher levels of aggregation down to lower levels of detail.
• Multiple hierarchies: dimension tables often provide for multiple hierarchies, so drilling down may be performed along any of them.
• Fewer records: a dimension table typically has far fewer records (rows) than the fact table.

Sample Data Cube
(figure: students counted by degree (Diploma, M.Sc., B.Sc.), term (1st to 4th) and country (Germany, Switzerland, U.S.A.), with ∑ totals along each dimension; the highlighted cell is German students in the 4th term pursuing a diploma)

Operations in Multidimensional Data Model
• Aggregation (roll-up):
  - dimension reduction, e.g., total sales by city
  - summarization over an aggregate hierarchy, e.g., total sales by city and year → total sales by region and by year
• Navigation to detailed data (drill-down), e.g., (sales - expense) by city, top 3% of cities by average income
• Selection (slice): defines a subcube, e.g., sales where city = Palo Alto and date = 1/15/96
• Visualization operations (e.g., pivot)

Information Packages: A New Concept
• Information package: a methodology for determining the requirements for a data warehouse based on the business dimensions along which analysis is performed. It incorporates the basic measurements and the business dimensions.
• An information package enables you to:
  - Define the common subject areas.
  - Design key business metrics.
  - Decide how data must be presented.
  - Determine how users will aggregate or roll up.
  - Decide the data quantity for user analysis or query.
  ◦ Decide how data will be accessed.
  ◦ Establish data granularity.
  ◦ Estimate data warehouse size.
  ◦ Determine the frequency of data refreshing.

An Information Package (figure): Information subject: Sales Analysis. Dimensions (with their first hierarchy level): Time Period (Year), Locations (Country), Products (Class), Age Groups (Group 1). Measured facts: Forecast Sales, Budget Sales, Actual Sales.

Cont...
• Business dimensions form the basis of the information package.
• Hierarchical levels support further processing: drilling down and rolling up for analysis.
• Categories: data elements within business dimensions, e.g., sales on holidays.
• Key business metrics, or facts: the numeric measures being analyzed.

Business Dimensions for Auto Sales Analysis
• Hierarchies and categories for each dimension:
  ◦ Product: model name, model year, package styling, product line, product category, exterior color, interior color, first model year
  ◦ Dealer: dealer name, city, state, single brand flag, date of first operation
  ◦ Customer demographics: age, gender, income, marital status, household size, vehicles owned, home value, own or rent
  ◦ Payment method: financial type, term in months, interest rate, agent
  ◦ Time: date, month, quarter, year, day of week, day of month, season, holiday flag
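The roll-up, drill-down, and slice operations listed earlier can be illustrated with a tiny star schema built from the auto-sales dimensions above. The sketch below is a minimal pure-Python illustration under stated assumptions, not an OLAP engine: the dimension contents, surrogate keys, model names, dates, and prices are all hypothetical, and only one metric (actual sale price) is kept. The product dimension is deliberately flattened, so every product attribute is reached from a fact row through a single key lookup.

```python
from collections import defaultdict

# Hypothetical, flattened product dimension keyed by a surrogate key.
# Hierarchy (top to bottom): product_category > product_line > model_name.
PRODUCT_DIM = {
    1: {"model_name": "Alpha", "product_line": "Compact", "product_category": "Car"},
    2: {"model_name": "Beta", "product_line": "Compact", "product_category": "Car"},
    3: {"model_name": "Gamma", "product_line": "Pickup", "product_category": "Truck"},
}

# Hypothetical time dimension. Hierarchy: year > quarter > month.
TIME_DIM = {
    10: {"month": "2024-01", "quarter": "2024-Q1", "year": 2024},
    11: {"month": "2024-02", "quarter": "2024-Q1", "year": 2024},
    12: {"month": "2024-04", "quarter": "2024-Q2", "year": 2024},
}

# Hypothetical fact rows: (product_key, time_key, actual_sale_price).
SALES_FACT = [
    (1, 10, 20000.0),
    (2, 10, 22000.0),
    (1, 11, 21000.0),
    (3, 11, 34000.0),
    (3, 12, 35000.0),
]


def aggregate(facts, group_by, where=None):
    """Sum actual_sale_price over the attributes named in group_by.

    Grouping by higher hierarchy levels (product_category, quarter) is a roll-up;
    grouping by lower levels (model_name, month) is a drill-down; an optional
    `where` predicate on the joined row selects a slice of the cube.
    """
    totals = defaultdict(float)
    for product_key, time_key, price in facts:
        # One lookup per dimension: no intermediary tables, since each dimension is flattened.
        row = {**PRODUCT_DIM[product_key], **TIME_DIM[time_key]}
        if where is not None and not where(row):
            continue
        totals[tuple(row[attr] for attr in group_by)] += price
    return dict(totals)


if __name__ == "__main__":
    # Roll-up: total sales by product category and quarter.
    print(aggregate(SALES_FACT, ("product_category", "quarter")))
    # Drill-down: the same facts by model and month.
    print(aggregate(SALES_FACT, ("model_name", "month")))
    # Slice: product lines in the first quarter only.
    print(aggregate(SALES_FACT, ("product_line",), where=lambda r: r["quarter"] == "2024-Q1"))
```

A real warehouse would normally express the same operations as SQL `GROUP BY` queries that join the fact table to its dimension tables.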
• Metrics for analyzing automobile sales:
  ◦ Actual sale price, option price, full price
  ◦ Dealer add-ons, dealer credits, dealer invoice
  ◦ Down payment amount, amount financed

An Information Package (figure): Information subject: Automaker Sales. Dimensions: Time, Product, Payment Method, Customer Demographics, Dealer. Hierarchy levels and categories shown include Year, Quarter, Month, Date, Season, and Holiday Flag (Time); Model Name, Model Year, and Package (Product); Financial Type (Payment Method); Age and Gender (Customer Demographics); Dealer Name, City, State, and Single Brand Flag (Dealer). Measured facts: actual sale price, option price, full price, dealer add-ons, etc.

Classification of Users of Data Warehouse
• Senior executives (including sponsors): have a sense of direction; involved in the focus area
• Key departmental managers: report to executives in the area of focus
• Business analysts: prepare reports and analyses for executives and managers
• Operational system DBAs: provide information only
• Others nominated by the above

What Requirements to Gather?
Broad list:
• Data elements: fact classes, dimensions
• Recording of data in terms of time
• Data extracts from source systems
• Business rules: attributes, ranges, domains, operational records

Interviews
• Interviewing is an important method for collecting data on human and system information requirements.
• Kimball et al. (1998) state that two basic procedures can be used to conduct user requirements analysis: interviews and facilitated sessions.
• Interviews are conducted with individuals or small, homogeneous groups.
  ◦ Everyone can participate, which results in a very detailed list of specifications.
• Facilitated sessions involve large, heterogeneous groups.
  ◦ They encourage creative brainstorming.
  ◦ They aim at setting general priorities and typically follow interviews, building on the detailed specifications.

Requirements Gathering Methods
• Interviews: one-to-one sessions
• Group sessions: not good in the initial stage; useful for confirming requirements
• JAD (Joint Application Development) sessions: a joint approach in which the concerned group works toward a well-defined purpose
• Review of existing documents: documentation from user departments; documentation from IT

Interview Process: Tasks Before the Project Launches
• Select and train the team members who will conduct the interviews
• Assign roles to team members
• Prepare the questionnaire:
  ◦ Current information sources
  ◦ Subject areas
  ◦ Key performance metrics
  ◦ Information frequency
• Pre-interview research:
  ◦ History and current structure of the business unit
  ◦ Number of employees, and their roles and responsibilities
  ◦ Location of users
  ◦ Primary purpose of the business unit
  ◦ Company market
  ◦ Competitors in the market
• List of users to be interviewed
• List of expectations
Initial Document for Requirements Definition
• Interview write-ups
• User profile
• Background and objectives
• Information requirements
• Analysis requirements
• Current tools used
• Success criteria
• Useful business metrics
• Relevant business dimensions

Types of Interview Questions
• Open-ended
  ◦ What do you think of data quality? What are the key objectives your unit has to face?
• Closed
  ◦ Are you interested in sorting out purchases by hour? Do you want to receive a sales report every week?
• Evidential
  ◦ Could you please give me an example of how you calculate your business unit budget? Could you please describe the issues with poor data quality that your business unit is experiencing?

Expectations From Interviews
• Senior executives
  ◦ Organizational objectives
  ◦ Criteria for measuring success
  ◦ Key business issues, current and future
  ◦ Problem identification
  ◦ Vision and direction of the organization
  ◦ Anticipated usage of the DW
• Departmental managers / analysts
  ◦ Departmental objectives
  ◦ Success metrics
  ◦ Factors limiting success
  ◦ Key business issues
  ◦ Products and services
  ◦ Useful business dimensions for analysis
  ◦ Anticipated usage of the DW
• IT department professionals
  ◦ Key operational source systems
  ◦ Current information delivery processes
  ◦ Types of routine analysis
  ◦ Known quality issues
  ◦ Current IT support for information requests
  ◦ Concerns about the proposed DW
Information Gathering: Interactive Methods

Objectives
• Recognize the value of interactive methods for information gathering.
• Construct interview questions to elicit human information requirements.
• Structure interviews in a way that is meaningful to users.
• Understand the concept of JAD and when to use it.
• Write effective questions to survey users about their work.
• Design and administer effective questionnaires.
Kendall & Kendall, Copyright © 2011 Pearson Education, Inc. Publishing as Prentice Hall

Major Topics
• Interviewing
  ◦ Interview preparation
  ◦ Question types
  ◦ Arranging questions
  ◦ The interview report
• Joint Application Design (JAD)
  ◦ Involvement
  ◦ Location
• Questionnaires
  ◦ Writing questions
  ◦ Using scales
  ◦ Design
  ◦ Administering

Interviewing
• Interviewing is an important method for collecting data on human and system information requirements.
• Interviews reveal information about:
  ◦ Interviewee opinions
  ◦ Interviewee feelings
  ◦ Goals
  ◦ Key HCI concerns

Interview Preparation
• Reading background material.
• Establishing interview objectives.
• Deciding whom to interview.
• Preparing the interviewee.
• Deciding on question types and structure.
Question Types
• Open-ended
• Closed

Open-Ended Questions
• Open-ended interview questions allow interviewees to respond however they wish, and at whatever length they wish.
• Open-ended interview questions are appropriate when the analyst is interested in breadth and depth of reply.

Advantages of Open-Ended Questions
• Puts the interviewee at ease.
• Allows the interviewer to pick up on the interviewee’s vocabulary.
• Provides richness of detail.
• Reveals avenues of further questioning that may have gone untapped.
• Provides more interest for the interviewee.
• Allows more spontaneity.
• Makes phrasing easier for the interviewer.
• Useful if the interviewer is unprepared.

Disadvantages of Open-Ended Questions
• May result in too much irrelevant detail.
• Possibly losing control of the interview.
• May take too much time for the amount of useful information gained.
• Potentially making the interviewer seem unprepared.
• Possibly giving the impression that the interviewer is on a “fishing expedition”.

Closed Interview Questions
• Closed interview questions limit the number of possible responses.
• Closed interview questions are appropriate for generating precise, reliable data that is easy to analyze.
• The methodology is efficient, and it requires little skill for interviewers to administer.

Benefits of Closed Interview Questions
• Saving interview time.
• Easily comparing interviews.
• Getting to the point.
• Keeping control of the interview.
• Covering a large area quickly.
• Getting to relevant data.

Disadvantages of Closed Interview Questions
• Boring for the interviewee.
• Failure to obtain rich details.
• Missing main ideas.
• Failing to build rapport between interviewer and interviewee.
Bipolar Questions
• Bipolar questions are those that may be answered with “yes” or “no”, or “agree” or “disagree”.
• Bipolar questions should be used sparingly.
• They are a special kind of closed question.

Probes
• Probing questions elicit more detail about previous questions.
• The purpose of probing questions is:
  ◦ To get more meaning.
  ◦ To clarify.
  ◦ To draw out and expand on the interviewee’s point.
• Probes may be either open-ended or closed.

Arranging Questions
• Pyramid: starting with closed questions and working toward open-ended questions.
• Funnel: starting with open-ended questions and working toward closed questions.
• Diamond: starting with closed questions, moving toward open-ended ones, and ending with closed questions.

Pyramid Structure
• Begins with very detailed, often closed questions.
• Expands by allowing open-ended questions and more generalized responses.
• Is useful if interviewees need to be warmed up to the topic or seem reluctant to address the topic.
Pyramid Structure (figure): the pyramid structure for interviewing goes from specific to general questions.

Funnel Structure
• Begins with generalized, open-ended questions.
• Concludes by narrowing the possible responses using closed questions.
• Provides an easy, nonthreatening way to begin an interview.
• Is useful when the interviewee feels strongly about the topic.

Funnel Structure (figure): the funnel structure for interviewing begins with broad questions, then funnels to specific questions.

Diamond Structure
• A diamond-shaped structure begins in a very specific way.
• Then more general issues are examined.
• Concludes with specific questions.
• Combines the strengths of both the pyramid and funnel structures.
• Takes longer than the other structures.

Diamond-Shaped Structure (figure): the diamond-shaped structure for interviewing combines the pyramid and funnel structures.

Closing the Interview
• Always ask “Is there anything else that you would like to add?”
• Summarize and provide feedback on your impressions.
• Ask whom you should talk with next.
• Set up any future appointments.
• Thank them for their time and shake hands.

Interview Report
• Write it as soon as possible after the interview.
• Provide an initial summary, then more detail.
• Review the report with the respondent.

Joint Application Design (JAD)
• Joint Application Design (JAD) can replace a series of interviews with the user community.
• JAD is a technique that allows the analyst to accomplish requirements analysis and design the user interface with the users in a group setting.
Conditions that Support the Use of JAD
• Users are restless and want something new.
• The organizational culture supports joint problem-solving behaviors.
• Analysts forecast an increase in the number of ideas using JAD.
• Personnel can be away from their jobs for the length of time required.

JAD: Five-Phased Approach
• Project definition
  ◦ Complete high-level interviews
  ◦ Conduct management interviews
  ◦ Prepare the management definition guide
• Research
  ◦ Become familiar with the business area and systems
  ◦ Document user information requirements
  ◦ Document business processes
  ◦ Gather preliminary information
  ◦ Prepare the agenda for the sessions
• Preparation
  ◦ Create working documents from the previous phase
  ◦ Train the scribes
  ◦ Prepare visual aids
  ◦ Conduct pre-session meetings
  ◦ Set up a venue for the sessions
  ◦ Prepare a checklist of objectives

Cont...
• JAD sessions
  ◦ Open with a review of the agenda and purpose
  ◦ Review assumptions
  ◦ Review data requirements
  ◦ Review business metrics and dimensions
  ◦ Discuss dimension hierarchies and roll-ups
  ◦ Resolve open issues
  ◦ Close the sessions with a list of action items
• Final document
  ◦ Convert the working document
  ◦ Map the gathered information
  ◦ List all data sources
  ◦ Identify all business dimensions and hierarchies
  ◦ Assemble and edit the document
  ◦ Conduct review sessions
  ◦ Get final approvals
  ◦ Establish a procedure for changing requirements
• The success of a project using JAD depends on the JAD team.

JAD Involves
• All project team members must be committed to the JAD approach and become involved.
• Executive sponsor: a senior person who will introduce and conclude the JAD session.
• Analyst: gives an expert opinion about any disproportionate costs of proposed solutions.
• Users: try to select users who can articulate what information they need to perform their jobs, as well as what they desire in a new or improved computer system.
• Session leader: someone with excellent communication skills who can facilitate appropriate interactions.
• Observers: analysts or technical experts from other functional areas who offer technical explanations and advice.
• Scribe: formally writes down everything that is done.
JAD Team
• Executive sponsor: the person controlling the funding, providing direction, and empowering team members
• Facilitator: the person guiding the team through the JAD process
• Scribe: the person designated to record all decisions
• Full-time participants: involved in decision making for the data warehouse
• On-call participants: people affected by the project, but only in a specific area
• Observers: people who attend specific sessions without participating in decisions

Where to Hold JAD Meetings
• Offsite
  ◦ Comfortable surroundings
  ◦ Minimize distractions
• Attendance
  ◦ Schedule when participants can attend
• Agenda
  ◦ Orientation meeting

Benefits of JAD
• Time is saved, compared with traditional interviewing.
• Rapid development of systems.
• Improved user ownership of the system.
• Creative idea production is improved.

Drawbacks of Using JAD
• JAD requires a large block of time to be available for all session participants.
• If preparation or the follow-up report is incomplete, the session may not be successful.
• The organizational skills and culture may not be conducive to a JAD session.
Requirements Definition: Scope and Content
• Formal documentation is often neglected.
• In the requirements definition phase, the team conducts interviews and group discussions (GD) and reviews the existing documentation.
• The requirements definition document is the basis for the next phases of the system development life cycle, yet teams often skip the detailed documentation of the requirements definition.

Questionnaires
Questionnaires are useful in gathering information from key organization members about:
• Attitudes
• Beliefs
• Behaviors
• Characteristics

When to Use Questionnaires
• The people to be questioned are widely dispersed.
• Many people are involved with the project, and you need to know the level of approval for a proposed system.
• Exploratory work is needed to gauge opinion.
• You need to identify and address problems with the current system.

Question Types
Questions are designed as either:
• Open-ended
  ◦ Try to anticipate the response you will get.
  ◦ Well suited for getting opinions.
• Closed
  ◦ Use when all the options may be listed.
  ◦ Use when the options are mutually exclusive.
Tradeoffs between the Use of Open-Ended and Closed Questions on Questionnaires (figure)

Questionnaire Language
• Simple
• Specific
• Short
• Not patronizing
• Free of bias
• Addressed to those who are knowledgeable
• Technically accurate
• Appropriate for the reading level of the respondent

Measurement Scales
• The two different forms of measurement scales are:
  ◦ Nominal
  ◦ Interval

Nominal Scales
• Nominal scales are used to classify things.
• They are the weakest form of measurement.
• They are used to get totals for each category.
Example: What type of software do you use the most?
1 = Word Processor
2 = Spreadsheet
3 = Database
4 = An Email Program

Interval Scales
• An interval scale is used when the intervals between scale points are equal.
• There is no absolute zero.
Example: How useful is the support given by the Technical Support Group?
NOT USEFUL AT ALL  1  2  3  4  5  EXTREMELY USEFUL

Validity and Reliability
• Validity is the degree to which the question measures what the analyst intends to measure.
• Reliability of scales refers to consistency in response, that is, the likelihood of getting the same results if the same questionnaire were administered again under the same conditions.

Problems with Scales
• Leniency
  ◦ Caused by easy raters.
  ◦ Solution: move the “average” category to the left or right of center.
• Central tendency
  ◦ Central tendency occurs when respondents rate everything as average.
  ◦ Improve by making the differences smaller at the two ends.
  ◦ Adjust the strength of the descriptors.
  ◦ Create a scale with more points.
• Halo effect
  ◦ Occurs when the impression formed about an item in one question carries into the next question.
  ◦ Solution: change the focus from items to traits, by placing one trait and several items on each page.

Designing the Questionnaire
• Allow ample white space.
• Allow ample space to write or type in responses.
• Make it easy for respondents to clearly mark their answers.
• Be consistent in style.
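To make the difference between the two scales concrete, the sketch below summarizes made-up answers to the two example questions above. It is a minimal illustration under assumptions: the responses are hypothetical, and the "midpoint share" is only a rough indicator of the central-tendency problem, not a standard statistic prescribed by these notes. Nominal codes are only labels, so the sketch counts them; interval points are equally spaced, so an average is meaningful.

```python
from collections import Counter
from statistics import mean

# Nominal question: "What type of software do you use the most?"
# The codes are labels only; their numeric order means nothing.
SOFTWARE_LABELS = {1: "Word Processor", 2: "Spreadsheet", 3: "Database", 4: "An Email Program"}

# Hypothetical answers from eight respondents.
nominal_responses = [2, 1, 2, 4, 3, 2, 1, 4]

# Interval question: usefulness of the Technical Support Group,
# on a scale where 1 = not useful at all and 5 = extremely useful (hypothetical answers).
interval_responses = [3, 3, 4, 3, 2, 3, 3, 5]


def nominal_totals(responses, labels):
    """Nominal scale: the meaningful summary is a total per category."""
    counts = Counter(responses)
    return {label: counts[code] for code, label in labels.items()}


def interval_summary(responses, midpoint=3):
    """Interval scale: equal spacing between points makes the mean meaningful.

    midpoint_share is the fraction of answers at the scale's middle point;
    a high value may hint at central tendency (everything rated 'average').
    """
    return {
        "mean": mean(responses),
        "midpoint_share": responses.count(midpoint) / len(responses),
    }


if __name__ == "__main__":
    print(nominal_totals(nominal_responses, SOFTWARE_LABELS))
    print(interval_summary(interval_responses))
```

Averaging the nominal codes (here about 2.4) would be meaningless, which is why the sketch only counts them.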
Order of Questions
• Place the most important questions first.
• Cluster items of similar content together.
• Introduce less controversial questions first.

Different Ways to Capture Responses (figure): when designing a Web survey, keep in mind that there are different ways to capture responses.

Methods of Administering the Questionnaire
• Convening all concerned respondents together at one time
• Personally administering the questionnaire
• Allowing respondents to self-administer the questionnaire
• Mailing questionnaires
• Administering over the Web or via email

Electronically Submitting Questionnaires
• Reduced costs.
• Collecting and storing the results electronically.
Summary
• Interviewing
  ◦ Interview preparation
  ◦ Question types
  ◦ Arranging questions
  ◦ The interview report
• Joint Application Design (JAD)
  ◦ Involvement and location
• Questionnaires
  ◦ Writing questions
  ◦ Using scales and overcoming problems
  ◦ Design and order
  ◦ Administering and submitting

Data Sources
• The requirements definition document should include the following information:
  ◦ Available data sources
  ◦ Data structures within the data sources
  ◦ Location of the data sources
  ◦ Data extraction procedures
  ◦ Availability of historical data

Cont...
• Data transformation: data transformation necessarily involves mapping source data to the data in the data warehouse.
• Data storage: the requirements definition document must include sufficient details about storage requirements.
• Information delivery:
  ◦ Drill-down analysis
  ◦ Roll-up analysis
  ◦ Slicing
  ◦ Ad hoc reports
• Information package diagram

Cont...
• Information package diagram: the information package diagrams crystallize the information requirements for the data warehouse. They contain the critical metrics measuring the performance of the business units, the business dimensions along which the metrics are analyzed, and the details of how drill-down and roll-up analyses are done.
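The mapping of source data to warehouse data described above can be sketched as a small table of field-to-column rules. Everything here is hypothetical: the source field names (`CUST_NM`, `SALE_DT`, `AMT`, `BRANCH_CD`), the day/month/year date format, and the target column names are assumptions chosen for illustration, not taken from any real source system. A derived quarter column shows how a transformation can also populate a level of the time hierarchy.

```python
from datetime import datetime

# Hypothetical source-to-target mapping: source field -> (warehouse column, conversion).
FIELD_MAP = {
    "CUST_NM": ("customer_name", str.strip),
    "SALE_DT": ("sale_date", lambda text: datetime.strptime(text, "%d/%m/%Y").date()),
    "AMT": ("actual_sale_price", float),
}


def transform(source_record):
    """Map one source record onto the warehouse layout.

    Fields not listed in FIELD_MAP are dropped; a KeyError signals that a
    mapped field is missing from the source record.
    """
    target = {}
    for source_field, (column, convert) in FIELD_MAP.items():
        target[column] = convert(source_record[source_field])
    # Derived attribute: the quarter, a level in the time hierarchy.
    sale_date = target["sale_date"]
    target["sale_quarter"] = f"{sale_date.year}-Q{(sale_date.month - 1) // 3 + 1}"
    return target


if __name__ == "__main__":
    record = {"CUST_NM": "  Asha Rao ", "SALE_DT": "15/01/2024", "AMT": "21000.50", "BRANCH_CD": "X9"}
    print(transform(record))
```

Keeping the mapping as data rather than code makes it easy to review against the requirements definition document and to extend when a new source field is added.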
Requirements Definition Document Outline
1. Introduction: purpose and scope of the project.
2. General requirements description: source system review (e.g., interview summaries); state what types of information are required in the data warehouse.
3. Specific requirements: data transformation and storage requirements.
4. Information packages: in the form of information package diagrams.
5. Other requirements: data extract frequency, data loading methods, locations for information delivery, etc.
6. User expectations: how the users expect to use the data warehouse.
7. User participation: list of tasks in which users are expected to participate throughout the development life cycle.
8. General implementation plan: a high-level plan for implementation.

Let's Discuss
1. As VP of marketing for a nationwide appliance manufacturer with three production plants, describe three ways to analyze sales. What are the business dimensions for analysis?
2. BigBook Inc. is a large book distributor with domestic and international distribution to all leading booksellers. It initially wants to build a data warehouse to analyze shipments made from the company's many warehouses. Determine the metrics and business dimensions, and prepare an information package diagram.
3. For a data warehouse on AuctionsPlus.com, an upscale Internet auction site for works of art, gather requirements for sales analysis. Find the key metrics, business dimensions, hierarchies, and categories, and draw the information package diagram.
4. Create a detailed outline of a formal requirements definition document for a data warehouse to analyze the profitability of a large department store chain.

Business Requirements as the Driving Force
[Figure: business requirements drive every phase of the project: planning & management, design, construction, deployment, and maintenance. Design and construction each cover architecture, infrastructure, data acquisition, data storage, and information delivery.]

Data Design
• In the design phase, data models are required for:
  - The staging area, where data from the source systems is transformed, cleansed, and integrated
  - The data warehouse repository

Requirements Driving the Data Model
[Figure: Requirements Driving the Data Model. Components shown: information package diagram; dimensional model and data marts (conformed/dependent); enterprise data model, relational model, and enterprise data warehouse.]

Composition of the Components
• Source data: operational source systems; computing platforms, O/S, database files; departmental data such as files, documents, and spreadsheets; external data sources
• Data staging: data mapping between data sources and staging-area data structures; data transformation; data cleansing; data integration
• Data storage: size of extracted and integrated data; DBMS features; growth potential; centralized or distributed
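The four data staging activities listed above (mapping, transformation, cleansing, integration) can be sketched as a small Python pipeline. This is a minimal illustration only: the two source systems (`billing`, `crm`), their field names, and the gender code table are hypothetical, and a real staging area would usually run inside an ETL tool or the DBMS rather than in plain Python.

```python
from __future__ import annotations

# Hypothetical field mappings from each source system to the staging layout.
SOURCE_MAPPINGS = {
    "billing": {"cust_no": "customer_id", "cust_name": "name", "sex": "gender"},
    "crm": {"CustomerID": "customer_id", "FullName": "name", "Gender": "gender"},
}

# Hypothetical code table: the two sources encode gender differently.
GENDER_CODES = {"M": "Male", "F": "Female", "1": "Male", "2": "Female"}
MISSING = (None, "", "Unknown")


def map_record(source: str, record: dict) -> dict:
    """Data mapping: rename source fields to the staging-area structure."""
    return {target: record[src] for src, target in SOURCE_MAPPINGS[source].items()}


def transform(record: dict) -> dict:
    """Data transformation: standardize keys, names, and coded values."""
    out = dict(record)
    out["customer_id"] = str(out["customer_id"]).strip()
    out["name"] = " ".join(str(out["name"]).split()).title()
    out["gender"] = GENDER_CODES.get(str(out["gender"]).strip().upper(), "Unknown")
    return out


def cleanse(records: list[dict]) -> list[dict]:
    """Data cleansing (one illustrative rule): drop records with no usable key."""
    return [r for r in records if r["customer_id"]]


def integrate(records: list[dict]) -> dict[str, dict]:
    """Data integration: merge records for the same customer across sources,
    filling a missing or unknown value from a later record when possible."""
    merged: dict[str, dict] = {}
    for r in records:
        existing = merged.setdefault(r["customer_id"], dict(r))
        for k, v in r.items():
            if existing.get(k) in MISSING and v not in MISSING:
                existing[k] = v
    return merged


def stage(sources: dict[str, list[dict]]) -> dict[str, dict]:
    mapped = [transform(map_record(src, rec)) for src, recs in sources.items() for rec in recs]
    return integrate(cleanse(mapped))


# Hypothetical extracts from the two source systems.
sources = {
    "billing": [
        {"cust_no": 1001, "cust_name": "  jane   DOE ", "sex": "f"},
        {"cust_no": " ", "cust_name": "Nobody", "sex": "M"},
    ],
    "crm": [
        {"CustomerID": "1001 ", "FullName": "Jane Doe", "Gender": ""},
        {"CustomerID": "1002", "FullName": "raj kumar", "Gender": 1},
    ],
}
staged = stage(sources)
```

The record with a blank key is dropped during cleansing, and the two records for customer 1001 are merged into one row with a consistent name and a decoded gender value.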
• Information delivery: types and number of users; types of queries and reports; classes of analysis; front-end DSS applications
• Metadata: operational metadata; ETL (data extraction/transformation/loading) metadata; end-user metadata; metadata storage
• Management & control: data loading; external sources; alert systems; end-user information delivery

Impact of Requirements on Architecture
[Figure: business requirements shape each architectural component: source data, data staging, data storage, information delivery, metadata, and management & control.]

Data Quality
Bad data leads to bad decisions.
• Data pollution sources:
  - System conversions and migrations
  - Heterogeneous system integration
  - Inadequate database design of source systems
  - Data aging
  - Incomplete information from customers
  - Input errors
  - Internationalization/localization of systems
  - Lack of data management policies/procedures
• Types of data quality problems:
  - Dummy values in source system fields
  - Absence of data in source system fields
  - Multipurpose fields
  - Cryptic data
  - Contradicting data
  - Improper use of name fields
  - Violation of rules
  - Reused primary keys
  - Non-unique identifiers
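Several of the data quality problems listed above can be detected mechanically before data is loaded. The sketch below is a hypothetical Python profiling function that flags absence of data, dummy values, rule violations, and non-unique identifiers. The placeholder set `DUMMY_VALUES`, the field names, and the ZIP-code rule are assumptions for illustration; in practice each source system needs its own list of dummies and rules.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable

# Hypothetical placeholder values that source systems sometimes store as dummies.
DUMMY_VALUES = {"999-99-9999", "00000", "N/A", "XXX"}


def profile(records: list[dict], key_field: str, required_fields: list[str],
            rules: dict[str, Callable[[object], bool]] | None = None) -> list[tuple]:
    """Return (row index, field, problem) tuples for common data quality problems."""
    rules = rules or {}
    key_counts = Counter(r.get(key_field) for r in records)
    issues = []
    for i, r in enumerate(records):
        # Absence of data and dummy values in fields that must be populated.
        for f in required_fields:
            value = r.get(f)
            if value is None or not str(value).strip():
                issues.append((i, f, "absence of data"))
            elif str(value).strip().upper() in DUMMY_VALUES:
                issues.append((i, f, "dummy value"))
        # Violation of rules: each predicate returns True for a valid value.
        for f, is_valid in rules.items():
            value = r.get(f)
            if value not in (None, "") and not is_valid(value):
                issues.append((i, f, "violation of rules"))
        # Non-unique identifiers: the same key appears on more than one record.
        if key_counts[r.get(key_field)] > 1:
            issues.append((i, key_field, "non-unique identifier"))
    return issues


# Hypothetical source extract and business rule.
records = [
    {"id": "C1", "zip": "10001", "phone": "555-0100"},
    {"id": "C2", "zip": "00000", "phone": ""},
    {"id": "C2", "zip": "94105", "phone": "n/a"},
]
ZIP_RULE = {"zip": lambda z: len(z) == 5 and z.isdigit()}
issues = profile(records, "id", ["zip", "phone"], ZIP_RULE)
```

Here the second and third records are flagged (dummy ZIP, missing phone, a dummy "n/a" phone, and the duplicated key C2), while the first record passes every check.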
Impact of Requirements on Metadata
Business requirements determine the data warehouse metadata:
• Operational: source system data structures, external data formats
• Extraction/transformation: data cleansing, conversion, integration
• End-user: querying, reporting, analysis, OLAP, special applications

Data Storage Specifications
• The DBMS should be compatible with both the back end and the front end.
• Business elements that affect the choice of DBMS: level of experience; type of queries; need for openness; data loads; metadata management; data repository location; data warehouse growth
• Size estimation: data staging area; overall corporate data warehouse; data marts, dependent or conformed; multidimensional databases

Impact of Business Requirements on Information Delivery
[Figure: the requirements definition on users, locations, queries, reports, and analysis shapes the information delivery component. Delivery channels: online, intranet, Internet, e-mail. Outputs: ad hoc reports, complex queries, multidimensional (MD) analysis, statistical analysis, executive information system (EIS) feeds, data mining. Users: novice and casual users, business analysts, senior managers, high-level managers.]

Conclusion
• Gathering requirements for a data warehouse is not the same as for an operational system.
• The requirements definition guides the whole process of system design and development.
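Size estimation for the storage components is largely arithmetic: rows per period × retention period × row width, plus an allowance for indexes and summaries. The Python sketch below shows that calculation. Every number in it (one million fact rows per day, 100-byte rows, three years of history, a 2× overhead factor, data marts at 25% of the warehouse, one week of extracts in staging) is a hypothetical planning assumption, not a recommendation.

```python
def estimate_table_bytes(rows_per_day: int, days_retained: int, bytes_per_row: int,
                         overhead_factor: float = 1.0) -> float:
    """Rows x row width, scaled by an assumed factor for indexes and free space."""
    return rows_per_day * days_retained * bytes_per_row * overhead_factor


def to_gib(n_bytes: float) -> float:
    return n_bytes / 1024**3


# All figures below are hypothetical planning assumptions, not measurements.
warehouse = estimate_table_bytes(1_000_000, 3 * 365, 100, overhead_factor=2.0)
estimates = {
    "staging_area": estimate_table_bytes(1_000_000, 7, 100),  # one week of extracts
    "corporate_warehouse": warehouse,                          # three years of detail
    "data_marts": 0.25 * warehouse,                            # summaries, assumed 25%
}
total = sum(estimates.values())

if __name__ == "__main__":
    for name, size in estimates.items():
        print(f"{name:20s} {to_gib(size):8.1f} GiB")
    print(f"{'total':20s} {to_gib(total):8.1f} GiB")
```

Under these assumptions the total comes to about 255.6 GiB, dominated by the detailed history in the corporate warehouse; changing the retention period or overhead factor shows quickly which assumption drives the estimate.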
• A data warehouse environment is an information delivery system in which users themselves access the data repository and create their own output, whereas in an operational system users are provided with predefined outputs.
• It is essential to have the right elements of information in the most optimal format.

Review Questions
Objective Questions:
1) Which of the following describes a data warehouse?
a) Can be updated by end users.
b) Contains numerous naming conventions and formats.
c) Organized around important subject areas.
d) Contains only current data.
2) An operational system is which of the following?
a) A system that is used to run the business in real time and is based on historical data.
b) A system that is used to run the business in real time and is based on current data.
c) A system that is used to support decision making and is based on current data.
d) A system that is used to support decision making and is based on historical data.
3) The generic two-level data warehouse architecture includes which of the following?
a) At least one data mart
b) Data that can be extracted from numerous internal and external sources
c) Near real-time updates
d) All of the above
4) The active data warehouse architecture includes which of the following?
a) At least one data mart
b) Data that can be extracted from numerous internal and external sources
c) Near real-time updates
d) All of the above
5) Reconciled data is which of the following?
a) Data stored in the various operational systems throughout the organization.
b) Current data intended to be the single source for all decision support systems.
c) Data stored in one operational system in the organization.
d) Data that has been selected and formatted for end-user support applications.
6) Transient data is which of the following?
a) Data in which changes to existing records cause the previous version of the records to be eliminated
b) Data in which changes to existing records do not cause the previous version of the records to be eliminated
c) Data that are never altered or deleted once they have been added
d) Data that are never deleted once they have been added
7) The extract process is which of the following?
a) Capturing all of the data contained in various operational systems
b) Capturing a subset of the data contained in various operational systems
c) Capturing all of the data contained in various decision support systems
d) Capturing a subset of the data contained in various decision support systems
8) Data scrubbing is which of the following?
a) A process to reject data from the data warehouse and to create the necessary indexes
b) A process to load the data in the data warehouse and to create the necessary indexes
c) A process to upgrade the quality of data after it is moved into a data warehouse
d) A process to upgrade the quality of data before it is moved into a data warehouse
9) The load and index is which of the following?
a) A process to reject data from the data warehouse and to create the necessary indexes
b) A process to load the data in the data warehouse and to create the necessary indexes
c) A process to upgrade the quality of data after it is moved into a data warehouse
d) A process to upgrade the quality of data before it is moved into a data warehouse
10) Data transformation includes which of the following?
a) A process to change data from a detailed level to a summary level
b) A process to change data from a summary level to a detailed level
c) Joining data from one source into various sources of data
d) Separating data from one source into various sources of data

Short answer type questions
Q1. Explain the need for metadata in a data warehouse.
Q2. What do you mean by strategic information?
Q3. Differentiate between a data warehouse and a data mart.
Q4. What do you mean by a Web-enabled data warehouse?
Q5. Define OLTP.
Q6. What type of processing takes place in a data warehouse?
Q7. Define an ETL routine.
Q8. What data does an information package contain?
Q9. In which situations can the JAD methodology be successful for collecting requirements?
Q10. List the various data sources that feed the data warehouse.

Long answer type questions
Q1. Explain data warehouse architecture in detail.
Q2. Explain business dimensions. Why and how can business dimensions be useful for defining requirements for the data warehouse?
Q3. State any three factors that indicate continued growth in data warehousing. Can you think of some examples?
Q4. Discuss the top-down and bottom-up approaches to creating a data warehouse.
Q5. For a commercial bank, name five types of strategic objectives and explain each objective in detail.
Q6. What do you mean by information packages? Also explain the need for information packages.
Q7. A data warehouse is an environment, not a product. Discuss.
Q8. Explain the various types of data warehouse metadata in detail.
Q9. For an airline company, how can strategic information increase the number of frequent flyers? Discuss, giving specific details.
Q10. Examine the opportunities that strategic information can provide for a medical center. Can you explain five such opportunities?

Suggested Reading/References
1. Paulraj Ponniah, "Data Warehousing Fundamentals", John Wiley & Sons, 2003.
2. Sam Anahory, "Data Warehousing in the Real World: A Practical Guide for Building Decision Support Systems", John Wiley, 2004.
3. W. H. Inmon, "Building the Operational Data Store", 2nd Ed., John Wiley, 1999.
4. J. Han and M. Kamber, "Data Mining: Concepts and Techniques", Harcourt India Pvt. Ltd., 2001.