Database, KDD, Database Design, and Data Access

Part 1. Overview by Dr. Junping Sun
https://www.nova.edu/publications/scis/faculty-viewbook/#42
Part 2. Hands-On by Dr. Peixiang Liu
https://www.nova.edu/publications/scis/faculty-viewbook/#26

Department of Computer Science
College of Engineering and Computing
Nova Southeastern University

Junping Sun  Database, KDD, Database Design and Data Access

Data, Database, Database Management, and Applications

Outline of Presentation
• Basic Concepts: Data, Database, Database Management System, Database System
• Overview and History of Database
• Database Design and Database Application Development
• Data Access and Manipulation
• Data Mining and Knowledge Discovery in Databases
• Other Relevant Issues
• Hands-on Practice by Dr. Peixiang Liu
• Questions/Answers

IOT?

IOT = ABC
Internet of Things = Artificial Intelligence + Big Data (Analysis) + Cloud Computing
STEM (Science, Technology, Engineering, Mathematics)

Data, Data Type, Database, and Database System
Data: Real-world data can be collected in various formats and types from various sources.
Data Type:
• Primitive: Integer, Float/Real, Character, Boolean, etc.
• Non-Primitive: Structured (composition of primitive data types), Semi-structured (hypertext, hypermedia, XML), Unstructured (documents)
Database: A database is a collection of data with different data types and various formats.
Database System:
• Database System = Database(s) (data) + Database Management System(s) (system software)

A Sample Relational Database Table
Customer-Name  Social-Security  Customer-Street  Customer-City  Account-Number
Johnson        192-83-7465      Alma             Palo Alto      A-101
Smith          019-28-3746      North            Rye            A-215
Hayes          677-89-9011      Main             Harrison       A-102
Turner         182-73-6091      Putnam           Stamford       A-201
Johnson        192-83-7465      Alma             Palo Alto      A-201
Jones          321-12-3123      Main             Harrison       A-217
Lindsay        336-66-9999      Park             Pittsfield     A-222
Smith          019-28-3746      North            Rye            A-201

Database Management System (DBMS)
Database Management System (Database Engine): system software that manages the storage and retrieval of data.
• Relational DBMS or SQL DBMS: IBM DB2 (20.2%), Oracle (48.8%), Microsoft SQL Server (17%), Sybase (4.6%), Teradata (3.7%), etc.
• Open Source Relational DBMS: MySQL, PostgreSQL, SQLite, etc.
• Object-Oriented DBMS: Gemstone, Versant
• NoSQL Database: HBase (modeled after Google's Bigtable, column based), MongoDB (document oriented), CouchDB (key-value), graph based

Database Management System as an Interface
[Figure: layered diagram, top to bottom]
• Applications
• System Software: DBMS, others
• Operating Systems: Kernels and Internals
• Computer Hardware

A Sample Network Database
[Figure: customer records linked to account records; the links are not recoverable from the text.]
Customer records:
Johnson  192-83-7465  Alma    Palo Alto
Smith    019-28-3746  North   Rye
Hayes    677-89-9011  Main    Harrison
Turner   182-73-6091  Putnam  Stamford
Jones    321-12-3123  Main    Harrison
Lindsay  336-66-9999  Park    Pittsfield
Account records (account number, balance):
A-101 500, A-215 700, A-102 400, A-305 350, A-201 900, A-217 750, A-222 700

A Sample Hierarchical Database
[Figure: tree of customer records, each with its account records as children; the parent-child links are not recoverable from the text.]
Customer records: Johnson 192-83-7465, Smith 019-28-3746, Hayes 677-89-9011, Turner 182-73-6091, Jones 321-12-3123, Lindsay 336-66-9999
Account records, in extraction order (shared accounts are repeated under each owner): A-101 500, A-201 900, A-102 400, A-217 750, A-215 700, A-201 900, A-305 350, A-222 700

Terminologies of Relational Database
• Table Name or Schema Name: CUSTOMER
• Column Names or Attributes: Customer-Name, Social-Security, Customer-Street, Customer-City, Account-Number
• Records or Tuples: the rows of the table

CUSTOMER
Customer-Name  Social-Security  Customer-Street  Customer-City  Account-Number
Johnson        192-83-7465      Alma             Palo Alto      A-101
Smith          019-28-3746      North            Rye            A-215
Hayes          677-89-9011      Main             Harrison       A-102
Turner         182-73-6091      Putnam           Stamford       A-201
Johnson        192-83-7465      Alma             Palo Alto      A-201
Jones          321-12-3123      Main             Harrison       A-217
Lindsay        336-66-9999      Park             Pittsfield     A-222
Smith          019-28-3746      North            Rye            A-201

Example of a Relational Database

EMPLOYEE
FNAME     MINIT  LNAME    SSN        BDATE      ADDRESS                   SEX  SALARY  SUPERSSN   DNO
John      B      Smith    123456789  09-JAN-55  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  08-DEC-45  638 Voss, Houston, TX     M    40000   888665555  5
Alicia    J      Zelaya   999887777  19-JUL-58  3321 Castle, Spring, TX   F    25000   987654321  4
Jennifer  S      Wallace  987654321  20-JUN-31  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  15-SEP-52  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  31-JUL-62  5631 Rice, Houston, TX    F    25000   333445555  5
Ahmad     V      Jabbar   987987987  29-MAR-59  980 Dallas, Houston, TX   M    25000   987654321  4
James     E      Borg     888665555  10-NOV-27  450 Stone, Houston, TX    M    55000   null       1

DEPARTMENT
DNAME           DNUMBER  MGRSSN     MGRSTARTDATE
Research        5        333445555  22-MAY-78
Administration  4        987654321  01-JAN-85
Headquarters    1        888665555  19-JUN-71

DEPT_LOCATIONS
DNUMBER  DLOCATION
1        Houston
4        Stafford
5        Bellaire
5        Sugarland
5        Houston

WORKS_ON
ESSN       PNO  HOURS
123456789  1    32.5
123456789  2    7.5
666884444  3    40.0
453453453  1    20.0
453453453  2    20.0
333445555  2    10.0
333445555  3    10.0
333445555  10   10.0
333445555  20   10.0
999887777  30   30.0
999887777  10   10.0
987987987  10   35.0
987987987  30   5.0
987654321  30   20.0
987654321  20   15.0
888665555  20   null

PROJECT
PNAME            PNUMBER  PLOCATION  DNUM
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

DEPENDENT
ESSN       DEPENDENT_NAME  SEX  BDATE      RELATIONSHIP
333445555  Alice           F    05-APR-76  DAUGHTER
333445555  Theodore        M    25-OCT-73  SON
333445555  Joy             F    03-MAY-48  SPOUSE
987654321  Abner           M    29-FEB-32  SPOUSE
123456789  Michael         M    01-JAN-78  SON
123456789  Alice           F    31-DEC-78  DAUGHTER
123456789  Elizabeth       F    05-MAY-57  SPOUSE

Database Design Outline
Real World (Mini World) -> Semantic Model (ER Model) -> one of:
• Relational Schema
• Network Schema
• Hierarchical Schema
• Object-Oriented Schema
• Object-Relational Schema

A Sample Entity-Relationship Diagram
[Figure: ER diagram]
• Entity customer, with attributes customer-name, social-security, customer-street, customer-city
• Entity account, with attributes account-number, balance
• Relationship deposit between customer (M) and account (N), with attribute amount
Customers (noun) deposit (verb) into bank accounts (noun).
A customer can have several accounts, and an account may be shared by several customers.

Translating Entity-Relationship Diagram to Relational Tables
• CUSTOMER (Customer Name, Social Security Number, Customer Street, Customer City)
• DEPOSIT (Account Number, Social Security Number, Amount)
• ACCOUNT (Account Number, Balance)

Data Models
Data Model: An abstraction framework that captures the semantic meaning of data for database design.
Data Model = Schema + Operations + Constraints
The Entity-Relationship model (by Peter Chen, 1976) is used for the first step of database design.
The relational model (by E. F. Codd, 1970) is widely used for database development, implementation, and applications.
The relational model is based on first-order predicate logic and calculus.
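The translation of the customer, deposit, and account ER diagram into relational tables can be tried out directly. What follows is a minimal sketch in Python using the standard-library `sqlite3` module, not the presenter's own code. The column types, the primary and foreign keys, and the deposit amounts are assumptions added for illustration; the customers, accounts, balances, and customer-account pairs are taken from the sample tables earlier in the deck.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# One table per entity, plus one table for the M:N relationship "deposit".
# Keys and column types are assumptions; the slides list attribute names only.
conn.executescript("""
CREATE TABLE customer (
    social_security_number CHAR(9) PRIMARY KEY,
    customer_name          VARCHAR(20),
    customer_street        VARCHAR(20),
    customer_city          VARCHAR(10)
);
CREATE TABLE account (
    account_number VARCHAR(10) PRIMARY KEY,
    balance        INTEGER
);
CREATE TABLE deposit (
    account_number         VARCHAR(10) REFERENCES account(account_number),
    social_security_number CHAR(9) REFERENCES customer(social_security_number),
    amount                 INTEGER,
    PRIMARY KEY (account_number, social_security_number)
);
""")

customers = [
    ("192837465", "Johnson", "Alma", "Palo Alto"),
    ("019283746", "Smith", "North", "Rye"),
    ("182736091", "Turner", "Putnam", "Stamford"),
]
accounts = [("A-101", 500), ("A-201", 900), ("A-215", 700)]
# (account, ssn, amount): the pairs follow the sample table; the amounts are hypothetical.
deposits = [
    ("A-101", "192837465", 100),
    ("A-201", "192837465", 200),
    ("A-215", "019283746", 300),
    ("A-201", "019283746", 400),
    ("A-201", "182736091", 500),
]
conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", customers)
conn.executemany("INSERT INTO account VALUES (?, ?)", accounts)
conn.executemany("INSERT INTO deposit VALUES (?, ?, ?)", deposits)


def accounts_of(name):
    """Account numbers held by the customer with the given name."""
    rows = conn.execute(
        "SELECT d.account_number FROM deposit d "
        "JOIN customer c ON c.social_security_number = d.social_security_number "
        "WHERE c.customer_name = ? ORDER BY d.account_number",
        (name,),
    )
    return [r[0] for r in rows]


def holders_of(account_number):
    """Names of the customers who share the given account."""
    rows = conn.execute(
        "SELECT c.customer_name FROM deposit d "
        "JOIN customer c ON c.social_security_number = d.social_security_number "
        "WHERE d.account_number = ? ORDER BY c.customer_name",
        (account_number,),
    )
    return [r[0] for r in rows]


johnson_accounts = accounts_of("Johnson")
a201_holders = holders_of("A-201")
print(johnson_accounts)
print(a201_holders)
```

The separate deposit table is what lets one customer hold several accounts and one account be shared by several customers, which is the M:N relationship drawn in the diagram.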
Reference: “Relational Database: A Practical Foundation for Productivity”, ACM 1981 Turing Award Lecture, by E. F. Codd.

Entity Relationship Model
The Entity-Relationship model (by Peter Chen, 1976) is used for the first step of database design in most IT practice.
A database design created with the ER model can be regarded as the pseudocode of the database schema, written before it is translated into an implementation model schema such as a relational schema.
ER CASE tools include Microsoft Visio, ERwin, etc.

Relational Data Model
The relational model (by E. F. Codd, 1970) is widely used for database development, implementation, and applications.
Reference: “Relational Database: A Practical Foundation for Productivity”, ACM 1981 Turing Award Lecture, by E. F. Codd.

More on Relational Data Model
According to Jeffrey D. Ullman (Professor Emeritus of Computer Science, Stanford University):
• The relational model is the best example of good theory.
• The relational model provides one basic (simple) structure.
• The relational model is good for anything (any data application).
• The relational model is perfect for almost nothing.

Arithmetic System vs. Relational Model
Schema
• Arithmetic System: number systems
• Relational Model: a set of two-dimensional tables
Operations
• Arithmetic System: +, −, ×, ÷, etc.
• Relational Model: select, project, union, intersect, difference, Cartesian product, join, division, etc.
Constraints
• Arithmetic System: 5 ÷ 0 is undefined. + and × are commutative and associative. The results of + and × are closed.
• Relational Model: select, project, union, and Cartesian product are commutative and associative. The result of operations applied to relational tables is a table.
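The closure and commutativity properties listed for the relational model can be checked on a small table. The sketch below is an illustration in plain Python, not a DBMS implementation. It treats a relation as a set of rows and defines select, project, and union as functions that take relations and return a relation. The helper names and the shortened attribute names (`name`, `city`, `account`) are assumptions; the rows come from the sample customer table.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Row = Tuple[Tuple[str, str], ...]  # (attribute, value) pairs, sorted by attribute
Relation = FrozenSet[Row]          # a relation is a set of rows, so no duplicates


def relation(rows: Iterable[Dict[str, str]]) -> Relation:
    """Build a relation from dictionaries that share the same attributes."""
    return frozenset(tuple(sorted(r.items())) for r in rows)


def select(rel: Relation, pred: Callable[[Dict[str, str]], bool]) -> Relation:
    """Keep the rows that satisfy the predicate."""
    return frozenset(row for row in rel if pred(dict(row)))


def project(rel: Relation, attrs: Iterable[str]) -> Relation:
    """Keep only the named attributes; duplicate rows collapse automatically."""
    keep = set(attrs)
    return frozenset(tuple((a, v) for a, v in row if a in keep) for row in rel)


def union(r1: Relation, r2: Relation) -> Relation:
    """Rows that appear in either relation."""
    return r1 | r2


customer = relation([
    {"name": "Johnson", "city": "Palo Alto", "account": "A-101"},
    {"name": "Smith", "city": "Rye", "account": "A-215"},
    {"name": "Hayes", "city": "Harrison", "account": "A-102"},
    {"name": "Turner", "city": "Stamford", "account": "A-201"},
    {"name": "Johnson", "city": "Palo Alto", "account": "A-201"},
    {"name": "Jones", "city": "Harrison", "account": "A-217"},
    {"name": "Lindsay", "city": "Pittsfield", "account": "A-222"},
    {"name": "Smith", "city": "Rye", "account": "A-201"},
])


def in_harrison(r: Dict[str, str]) -> bool:
    return r["city"] == "Harrison"


def named_jones(r: Dict[str, str]) -> bool:
    return r["name"] == "Jones"


def in_rye(r: Dict[str, str]) -> bool:
    return r["city"] == "Rye"


# Commutativity of select: the order of the two filters does not matter.
a = select(select(customer, in_harrison), named_jones)
b = select(select(customer, named_jones), in_harrison)

# Closure: the result of project is again a relation and can feed another operation.
names = project(customer, ["name"])

# Commutativity of union.
u1 = union(select(customer, in_harrison), select(customer, in_rye))
u2 = union(select(customer, in_rye), select(customer, in_harrison))

print(a == b, len(names), u1 == u2)
```

Because every function returns the same `Relation` type it accepts, operations can be nested freely, in the same way that the result of adding two numbers is again a number.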
Data Access: Storage and Retrieval by Structured Query Language (SQL)
CUSTOMER (Customer Name, Social Security Number, Customer Street, Customer City)

    create table customer
      (customer_name          varchar(20),
       social_security_number char(9),
       customer_street        varchar(20),
       customer_city          varchar(10));

    insert into customer values
      ('Richard Smith', '123456789', '3301 College Avenue', 'Davie');

    select * from customer;

A Simplified Database System Environment
[Figure: layered diagram, top to bottom]
• Users/Programmers
• Application Programs/Queries
• DBMS Software: software to process queries/programs, and software to access stored data
• Stored Database Definition (Meta-Data) and Stored Databases

History of Data Processing
• Data collection and database creation (1960's and earlier): primitive file processing
• Database management systems (1970's): hierarchical, network, and relational database systems; online transaction processing
• Advanced database systems (mid 1980's to present): advanced data models (object-relational, object-oriented)
• Data warehousing and data mining (late 1980's to present): online analytical processing
• New generation of information systems (2000 onward): NoSQL, search engines

The Evolution of Databases
[Figure: evolution diagram; the arrows are not recoverable from the text. Labels in extraction order: File Systems, Network, Hierarchical, Relational, Object-Oriented, Semantic Models, Languages, Complex Object Models, Object-Oriented Databases, Hypermedia, Information Retrieval, Artificial Intelligence, Intelligent Databases.]

Who are Those Famous People in Database History
Charles William Bachman (1924-2017)
ACM Turing Award Recipient (1973)
Contribution: Database Technology, Network Data Model
http://amturing.acm.org/award_winners/bachman_9385610.cfm

Who are Those Famous People in Database History
Edgar F. (“Ted”) Codd (1923-2003)
ACM Turing Award Recipient (1981)
Contribution: Relational Model of Data
http://amturing.acm.org/award_winners/codd_1000892.cfm

Who are Those Famous People in Database History
James Nicholas Gray (born in 1944, disappeared in 2007, declared legally dead in 2012)
ACM Turing Award Recipient (1998)
Major contribution to the theory and practice of transaction processing
http://amturing.acm.org/award_winners/gray_3649936.cfm

Who are Those Famous People in Database History
Michael Stonebraker (1943 - )
ACM Turing Award Recipient (2014)
Fundamental contributions to the concepts and practices underlying modern database systems (INGRES, Postgres, etc.)
http://amturing.acm.org/award_winners/stonebraker_1172121.cfm

An Overview of the Steps Comprising the KDD Process
Data -> (Selection) -> Target Data -> (Preprocessing) -> Processed Data -> (Transformation) -> Transformed Data -> (Data Mining) -> Patterns -> (Interpretation/Evaluation) -> Knowledge

Example of K-Nearest Neighbor Classification in Data Mining
[Figure: scatter plot with the legend Class 1 Sample, Class 2 Sample, Unknown Sample]

Justification of K-Nearest Neighbor Classification
If it walks like a duck, quacks like a duck, and looks like a duck, then it is probably a duck.

How to Measure “Nearby” or the Similarity?
One of the most popular distance measures is the Euclidean distance from analytic geometry.
Given two data points (or data records):
    X = (x_1, …, x_p)
    U = (u_1, …, u_p)
the Euclidean distance between the data points X and U is:
    d(X, U) = \sqrt{\sum_{i=1}^{p} (x_i - u_i)^2}

Example
We can now use the training data set to classify an unknown case (Age = 48 and Loan = $142,000) using Euclidean distance. With (x_1, x_2) the unknown case and (y_1, y_2) a training record:
    D = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = \sqrt{(48 - 33)^2 + (142000 - 150000)^2} \approx 8000.01
K = 1 means that only the single nearest record is used. If K = 1, the nearest neighbor is the last case in the training set, with Default = Y.
With K = 3, there are two Default = Y and one Default = N among the three closest neighbors. The prediction for the unknown case is again Default = Y.
http://www.saedsayad.com/flash/KNN_flash.html

Standardized or Normalized Distance
[Formulas 1 to 3 on this slide are not recoverable from the text.]

Database Technology
• Database technology is a comprehensive application of computer science and other technologies, and it encompasses most computer science subjects.
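The K-nearest-neighbor example worked earlier (Age = 48, Loan = $142,000) can be reproduced in a few lines. This is a minimal sketch in Python, not the method used to produce the slides. Only the unknown case and the training record (33, 150000, Default = Y) come from the deck; the other four training records are hypothetical stand-ins, chosen so that the K = 1 and K = 3 outcomes agree with the statements on the slide.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[Tuple[float, float], str]  # ((age, loan), default label)


def euclidean(x: Sequence[float], u: Sequence[float]) -> float:
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))


def knn_predict(training: List[Record], query: Sequence[float], k: int) -> str:
    """Majority label among the k training records closest to the query."""
    nearest = sorted(training, key=lambda rec: euclidean(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training: List[Record] = [
    ((33, 150000), "Y"),  # from the slides: nearest neighbor of the unknown case
    ((35, 120000), "N"),  # hypothetical
    ((60, 100000), "Y"),  # hypothetical
    ((25, 40000), "N"),   # hypothetical
    ((48, 220000), "Y"),  # hypothetical
]
unknown = (48, 142000)

d = euclidean(unknown, (33, 150000))
pred_k1 = knn_predict(training, unknown, 1)
pred_k3 = knn_predict(training, unknown, 3)
print(round(d, 2), pred_k1, pred_k3)
```

Loan is measured in dollars and Age in years, so the loan difference dominates the raw distance. That imbalance is the motivation for the standardized or normalized distance.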
Computer science subject             Database topic
Compiler                             Database Languages
Data Structures & Algorithms         Storage Structures & Data Access
Operating Systems                    Concurrency Control
System Analysis                      Database Modeling and Design
Software Engineering                 DB & DBMS Development
AI and Expert Systems                Heuristic Search and DSS
Optimization Theory                  Query Optimization
User Interface and Human Factors     Database Interfaces (GUI)
Network and Distributed Systems      Distributed Database Systems
Mathematical Predicate Logic         Database Theory
Cryptography and Others              Database Security

Issues in Database Applications
[Figure: layered diagram, top to bottom]
• Data Processing (Search): correctness, efficiency, security, and user friendliness
• Information Storage and Retrieval
• Search Algorithms & Access Mechanisms
• Data Structures
• Physical Storage
• Computer Hardware

Database and DBMS Revisited
• Programs = Data Structures + Algorithms (Niklaus Wirth)
• Database System = Databases + Database Management System(s)
• Database Management System = Data Model + Data Structures + Algorithms
• Database Query (Data Query): Deductive Processing
• Data Mining (Knowledge Query): Inductive Processing

Multiples of Bytes (Orders of Magnitude of Data)
Decimal (metric)            Binary
Value   Symbol  Name        Value            JEDEC          IEC
1000    kB      kilobyte    1024   = 2^10    KB  kilobyte   KiB  kibibyte
1000^2  MB      megabyte    1024^2 = 2^20    MB  megabyte   MiB  mebibyte
1000^3  GB      gigabyte    1024^3 = 2^30    GB  gigabyte   GiB  gibibyte
1000^4  TB      terabyte    1024^4 = 2^40    TB             TiB  tebibyte
1000^5  PB      petabyte    1024^5 = 2^50    PB             PiB  pebibyte
1000^6  EB      exabyte     1024^6 = 2^60    EB             EiB  exbibyte
1000^7  ZB      zettabyte   1024^7 = 2^70    ZB             ZiB  zebibyte
1000^8  YB      yottabyte   1024^8 = 2^80    YB             YiB  yobibyte
• JEDEC: Joint Electron Device Engineering Council
• IEC: International Electrotechnical Commission

Different Perspective Views of Database Systems
• Application View
• Management View
• Data Management
• Systematic View
• Database System Management
• Methodological/Algorithmic View
• Theoretical View
• Logical View
• Philosophical View

Computer Science is a Problem Solving Science
• Database applications are comprehensive applications of computer science and other technologies.
• Database technology is one of the branches of computer science.
• Computer science is a problem-solving science, so database technology is used to solve various problems in data processing.
• Domain Knowledge + Critical Thinking + Methodology = Problem Solving.
Book: “Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists,” by Dennis Shasha and Cathy Lazere, Copernicus, an imprint of Springer-Verlag, ISBN: 0-387-97992-1

IOT = ABC
Internet of Things = Artificial Intelligence + Big Data (Analysis) + Cloud Computing
STEM (Science, Technology, Engineering, Mathematics)

A Few Words at Last, But Not Necessarily Least
• If you realize there is something you did not know, you are learning and advancing.
• One of the purposes of this summer camp is to help you prepare for the next milestone in your life.
• Your future success will, to some extent, reflect the success of our summer camp.
• You may want to visit these websites:
Bureau of Labor Statistics, United States Department of Labor:
http://www.bls.gov/ooh/computer-and-information-technology/home.htm
http://www.marketwatch.com/story/want-better-work-life-balance-here-are-the-25-best-jobs-for-you-2015-10-20?siteid=rss&rss=1

Thank You!
Questions/Answers