Harvesting Hidden Information in Structured Data
Mark Grundy, ConSolve Pty Ltd
[email protected]

In this talk
  - What is Data Mining?
  - Who is doing it
  - What they’re finding
  - How you do it
  - Techy buzzwords
  - Sample tools
  - Tricks and traps
  - Data Mining and Knowledge Management
  - Conclusions

Data Mining: Definition
  KDD: “Knowledge Discovery in Databases”
  “The process of finding valid, novel, potentially useful and understandable patterns in data”
  Usama Fayyad, The KDD Process for Extracting Useful Knowledge from Volumes of Data, CACM 1996, v39, 11

What does it mean?
  - Process: it is iterative and interactive
  - Pattern: models, trends, links, causes, behaviours
  - Valid: you want the truth
  - Novel: you didn’t already know it
  - Useful: should help you with some problem
  - Understandable: need to know why the pattern is there

Management Reporting vs Data Mining
  Management Reporting                  | Data Mining
  Recent events                         | Potential future events
  Periodic                              | Ad-hoc
  Automated                             | Interactive, iterative
  Well defined problem, eg:             | Ill defined problems, eg:
    profit, operational performance     |   fraud, customer behaviours
  Simple tools, simple processes        | Complex tools, complex processes
  Separated data sets                   | “Joined up” data sets
  Low risk, medium value                | High risk, high value

Who is doing it
  Sector                | Why
  Financial             | Customer prediction
  Communications        | Customer prediction
  Retail                | Customer prediction
  Health                | Customer prediction, epidemics
  Manufacturing         | Production failures, supply chain management
  Science               | Modelling complex relationships
  Business in general   | Efficiency, performance, waste minimisation, compliance & audit

What you can find
  - Hidden customer behaviours:
      - Buying patterns (eg, beer and nappies)
      - Life events (eg, childbirth, divorce)
      - Churn (eg, mobile phones)
      - Health trends (eg, preventative medicine)
  - Performance opportunities
  - Underdeveloped markets
  - Emerging trends
  - Inefficiencies
  - Fraud and non-compliance
  → “Nuggets” of gold: new discoveries
  → New predictive models

KDD Process – How you do it
  - Learn the domain
  - Locate the right data
  - Clean the data
  - Develop the model
  - Analyse
  - Interpret
  - Apply the learning

Key KDD Tools
  [Diagram: labels only]
  Simplified Process: Schema, Goal, Dictionary, Database, Cleansing, Modelling, Analysis, Output
  Supporting Tools: Query Tools; Stats, AI Tools; Visualisation Tools; Presentation Tools
  Sample Products:
    - Red Brick data warehouses
    - SQL Server Analysis Services
    - Datastage ETL, Microsoft DTS
    - SAS, Intelligent Miner
    - Cognos, Hyperion, Business Objects
    - Netmap

Some Techy KDD Buzzwords
  - Data mart, data warehouse, cube
  - Extract, Transform, Load (ETL)
  - Classification and Regression Trees (CART)
  - OLAP, ROLAP, MOLAP, HOLAP, DOLAP
  - Clusters and centroids
  - Genetic algorithms
  - Univariate & multivariate analyses

KDD vs Knowledge Management
  [Diagram: labels only]
  Prediction; Formal Knowledge; Informal Knowledge; Tacit Knowledge; Domain Study, Model Interpretation; Knowledge Discovery

Key issues
  - Finding the right problems
  - Data quality
  - Data meaning and interpretation
  - Effective models (FABRIC criteria)

Theory vs. practice: what can go wrong
  - Poor sponsorship
  - Expectations
  - Politics, exposure, accountability
  - Scope & success criteria
  - Business & data definitions
  - Poor model designs
  - Technology vs. users
  - Process vs. technologies
  - Process vs. outcome
  - Focus on just internal records

What IM can contribute
  - Text mining (work in progress)
  - Image/sound mining?
  - Metadata, meanings
  - Business information definitions
  - Inter-organisational standards
  - Registration of data sets, reports

Questions & Discussion
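
Worked illustration: KDD process steps
  A minimal sketch, in Python, of how the middle steps of the KDD process listed earlier (clean the data, develop the model, analyse, interpret) might look for the "clusters and centroids" buzzword. Everything specific here is an assumption made for illustration: the customer records, the meaning of the two fields (monthly spend, visits per month), the function names, and the choice of k-means as the clustering technique. The talk itself does not name an algorithm or a data set.

```python
from statistics import mean

# Hypothetical raw records: (monthly spend, visits per month).
# Two records are deliberately dirty (missing or non-numeric values).
records = [(20, 2), (22, 3), (None, 4), (200, 20), (210, 22), ("n/a", 1), (18, 1)]


def clean(rows):
    """'Clean the data': keep only rows whose fields are all numeric."""
    cleaned_rows = []
    for row in rows:
        try:
            cleaned_rows.append(tuple(float(value) for value in row))
        except (TypeError, ValueError):
            continue  # drop rows with missing or unparseable values
    return cleaned_rows


def kmeans(points, centroids, iterations=10):
    """'Develop the model': plain k-means with fixed starting centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assign every point to its nearest centroid (squared distance).
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Move each centroid to the mean of its cluster; keep it if empty.
        centroids = [
            tuple(mean(dim) for dim in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


cleaned = clean(records)
# Start from the lowest and highest spender so the run is repeatable.
centroids, clusters = kmeans(cleaned, [min(cleaned), max(cleaned)])

# 'Analyse' / 'Interpret': each centroid summarises one customer segment.
for centroid, cluster in zip(centroids, clusters):
    print(f"segment of {len(cluster)} customers, centroid {centroid}")
```

  On this toy data the two centroids separate low-spend, low-visit customers from high-spend, high-visit ones. Whether such a segment is valid, novel and useful still has to be judged against the domain, which is the "interpret" step.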