Relational Data Model - College of Engineering and Computing

Database, KDD, Database Design,
and Data Access
Part 1. Overview
by Dr. Junping Sun
https://www.nova.edu/publications/scis/faculty-viewbook/#42
Part 2. Hands-On
by Dr. Peixiang Liu
https://www.nova.edu/publications/scis/faculty-viewbook/#26
Department of Computer Science
College of Engineering and Computing
Nova Southeastern University
Junping Sun
Database, KDD, Database Design and Data Access
1-1
Data, Database, Database Management,
and Applications
Outline of Presentation
• Basic Concepts
  Data, Database, Database Management System, Database System
• Overview and History of Database
• Database Design and Database Application Development
• Data Access and Manipulation
• Data Mining and Knowledge Discovery in Databases
• Other Relevant Issues
• Hands-on Practice by Dr. Peixiang Liu
• Questions/Answers
IOT?
IOT = ABC
Internet of Things = Artificial Intelligence +
Big Data (Analysis) +
Cloud Computing
STEM
(Science, Technology, Engineering, Mathematics)
Data, Data Type, Database, and Database System
Data:
• Real-world data can be collected in various formats and types from various sources.
Data Type:
• Primitive: Integer, Float/Real, Character, Boolean, etc.
• Non-Primitive: Structured (composition of primitive data types), Semi-structured (hypertext, hypermedia, XML), Unstructured (documents)
Database:
• A database is a collection of data with different data types and various formats.
Database System:
• Database System = Database(s) (data) + Database Management System(s) (system software)
A Sample Relational Database Table
Customer-Name  Social-Security  Customer-Street  Customer-City  Account-Number
Johnson        192-83-7465      Alma             Palo Alto      A-101
Smith          019-28-3746      North            Rye            A-215
Hayes          677-89-9011      Main             Harrison       A-102
Turner         182-73-6091      Putnam           Stamford       A-201
Johnson        192-83-7465      Alma             Palo Alto      A-201
Jones          321-12-3123      Main             Harrison       A-217
Lindsay        336-66-9999      Park             Pittsfield     A-222
Smith          019-28-3746      North            Rye            A-201
Database Management System (DBMS)
Database Management System (Database Engine):
• System software that manages the storage and retrieval of data.
Relational DBMS or SQL DBMS:
• IBM DB2 (20.2%), Oracle (48.8%), Microsoft SQL Server (17%), Sybase (4.6%), Teradata (3.7%), etc.
Open Source Relational DBMS:
• MySQL, PostgreSQL, SQLite, etc.
Object-Oriented DBMS:
• GemStone, Versant
NoSQL Database:
• HBase (modeled after Google's Bigtable, column based)
• MongoDB (document oriented)
• CouchDB (document oriented; documents are stored and retrieved by key)
• Graph based
Database Management System as an Interface
[Figure: layered architecture, top to bottom]
Applications
System Software: DBMS, Others
Operating Systems Kernels and Internals
Computer Hardware
A Sample Network Database
[Figure: customer records linked to account records; the links between records are not recoverable from the extracted text.]
Customer records (name, social security, street, city):
Johnson  192-83-7465  Alma    Palo Alto
Smith    019-28-3746  North   Rye
Hayes    677-89-9011  Main    Harrison
Turner   182-73-6091  Putnam  Stamford
Jones    321-12-3123  Main    Harrison
Lindsay  336-66-9999  Park    Pittsfield
Account records (account number, balance):
A-101  500
A-215  700
A-102  400
A-305  350
A-201  900
A-217  750
A-222  700
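A Sample Hierarchical Database
[Figure: tree of customer records with account records as children; the parent-child links are not recoverable from the extracted text. Note that account A-201 appears twice, because a shared account is duplicated under each owner.]
Customer records:
Johnson  192-83-7465 ...
Smith    019-28-3746 ...
Hayes    677-89-9 ...
Turner   182-73-609 ...
Jones    321-12-3123 ...
Lindsay  336-66-9999 ...
Account records (account number, balance), in order of appearance:
A-101  500
A-201  900
A-102  400
A-217  750
A-215  700
A-201  900
A-305  350
A-222  700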
Terminologies of Relational Database
Table name or schema name: CUSTOMER
Column names or attributes: Customer-Name, Social-Security, Customer-Street, Customer-City, Account-Number
Records or tuples: the rows of the table

CUSTOMER
Customer-Name  Social-Security  Customer-Street  Customer-City  Account-Number
Johnson        192-83-7465      Alma             Palo Alto      A-101
Smith          019-28-3746      North            Rye            A-215
Hayes          677-89-9011      Main             Harrison       A-102
Turner         182-73-6091      Putnam           Stamford       A-201
Johnson        192-83-7465      Alma             Palo Alto      A-201
Jones          321-12-3123      Main             Harrison       A-217
Lindsay        336-66-9999      Park             Pittsfield     A-222
Smith          019-28-3746      North            Rye            A-201
Example of a Relational Database

EMPLOYEE
FNAME     MINIT  LNAME    SSN        BDATE      ADDRESS                   SEX  SALARY  SUPERSSN   DNO
John      B      Smith    123456789  09-JAN-55  731 Fondren, Houston, TX  M    30000   333445555  5
Franklin  T      Wong     333445555  08-DEC-45  638 Voss, Houston, TX     M    40000   888665555  5
Alicia    J      Zelaya   999887777  19-JUL-58  3321 Castle, Spring, TX   F    25000   987654321  4
Jennifer  S      Wallace  987654321  20-JUN-31  291 Berry, Bellaire, TX   F    43000   888665555  4
Ramesh    K      Narayan  666884444  15-SEP-52  975 Fire Oak, Humble, TX  M    38000   333445555  5
Joyce     A      English  453453453  31-JUL-62  5631 Rice, Houston, TX    F    25000   333445555  5
Ahmad     V      Jabbar   987987987  29-MAR-59  980 Dallas, Houston, TX   M    25000   987654321  4
James     E      Borg     888665555  10-NOV-27  450 Stone, Houston, TX    M    55000   null       1

DEPARTMENT
DNAME           DNUMBER  MGRSSN     MGRSTARTDATE
Research        5        333445555  22-MAY-78
Administration  4        987654321  01-JAN-85
Headquarters    1        888665555  19-JUN-71

DEPT_LOCATIONS
DNUMBER  DLOCATION
1        Houston
4        Stafford
5        Bellaire
5        Sugarland
5        Houston

WORKS_ON
ESSN       PNO  HOURS
123456789  1    32.5
123456789  2    7.5
666884444  3    40.0
453453453  1    20.0
453453453  2    20.0
333445555  2    10.0
333445555  3    10.0
333445555  10   10.0
333445555  20   10.0
999887777  30   30.0
999887777  10   10.0
987987987  10   35.0
987987987  30   5.0
987654321  30   20.0
987654321  20   15.0
888665555  20   null

PROJECT
PNAME            PNUMBER  PLOCATION  DNUM
ProductX         1        Bellaire   5
ProductY         2        Sugarland  5
ProductZ         3        Houston    5
Computerization  10       Stafford   4
Reorganization   20       Houston    1
Newbenefits      30       Stafford   4

DEPENDENT
ESSN       DEPENDENT_NAME  SEX  BDATE      RELATIONSHIP
333445555  Alice           F    05-APR-76  DAUGHTER
333445555  Theodore        M    25-OCT-73  SON
333445555  Joy             F    03-MAY-48  SPOUSE
987654321  Abner           M    29-FEB-32  SPOUSE
123456789  Michael         M    01-JAN-78  SON
123456789  Alice           F    31-DEC-78  DAUGHTER
123456789  Elizabeth       F    05-MAY-57  SPOUSE
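The tables in this example are tied together by shared values: DEPARTMENT.MGRSSN refers to EMPLOYEE.SSN. The following is a minimal sketch of how a join follows that link, using Python's built-in sqlite3 module. Only a few columns are loaded, and the lowercase table and column names and the TEXT/INTEGER types are assumptions made for this illustration.

```python
import sqlite3

# In-memory database holding a reduced version of two of the tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    fname TEXT, lname TEXT, ssn TEXT PRIMARY KEY, dno INTEGER
);
CREATE TABLE department (
    dname TEXT, dnumber INTEGER PRIMARY KEY,
    mgrssn TEXT REFERENCES employee(ssn)
);
""")

# Rows copied from the EMPLOYEE and DEPARTMENT tables of the example.
conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)", [
    ("John", "Smith", "123456789", 5),
    ("Franklin", "Wong", "333445555", 5),
    ("Alicia", "Zelaya", "999887777", 4),
    ("Jennifer", "Wallace", "987654321", 4),
    ("Ramesh", "Narayan", "666884444", 5),
    ("Joyce", "English", "453453453", 5),
    ("Ahmad", "Jabbar", "987987987", 4),
    ("James", "Borg", "888665555", 1),
])
conn.executemany("INSERT INTO department VALUES (?, ?, ?)", [
    ("Research", 5, "333445555"),
    ("Administration", 4, "987654321"),
    ("Headquarters", 1, "888665555"),
])

# Join each department to the employee whose SSN matches its MGRSSN.
managers = conn.execute("""
    SELECT d.dname, e.fname, e.lname
    FROM department d JOIN employee e ON d.mgrssn = e.ssn
    ORDER BY d.dname
""").fetchall()

for row in managers:
    print(row)
```

Each department row is matched with exactly one employee row, which is how a relational database represents "this department is managed by that employee" without pointers.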
Database Design Outline
Real World (Mini World)
→ Semantic Model (ER Model)
→ one of: Relational Schema, Network Schema, Hierarchical Schema, Object-Oriented Schema, Object-Relational Schema
A Sample Entity-Relationship Diagram
[Figure: ER diagram. The entity "customer" (attributes: customer-name, social-security, customer-street, customer-city) is connected through the M:N relationship "deposit" (attribute: amount) to the entity "account" (attributes: account-number, balance).]
• Customers (noun) deposit (verb) into bank accounts (noun).
• A customer can have several accounts, and an account may be shared by several customers.
Translating an Entity-Relationship Diagram to Relational Tables
CUSTOMER (Customer Name, Social Security Number, Customer Street, Customer City)
DEPOSIT (Account Number, Social Security Number, Amount)
ACCOUNT (Account Number, Balance)
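The translation above can be sketched as table definitions, run here through Python's built-in sqlite3 module. This is an illustration under stated assumptions: the column types, the choice of primary keys, and the deposit amounts are not given on the slides; the two customers and the balance of account A-201 are taken from the sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (
    social_security_number CHAR(9) PRIMARY KEY,
    customer_name   VARCHAR(20),
    customer_street VARCHAR(20),
    customer_city   VARCHAR(10)
);
CREATE TABLE account (
    account_number VARCHAR(10) PRIMARY KEY,
    balance        NUMERIC
);
-- The M:N relationship becomes its own table. Its key combines the
-- keys of the two entities it connects (an assumed design choice).
CREATE TABLE deposit (
    account_number         VARCHAR(10) REFERENCES account(account_number),
    social_security_number CHAR(9) REFERENCES customer(social_security_number),
    amount                 NUMERIC,
    PRIMARY KEY (account_number, social_security_number)
);
""")

conn.executemany("INSERT INTO customer VALUES (?, ?, ?, ?)", [
    ("192837465", "Johnson", "Alma", "Palo Alto"),
    ("019283746", "Smith", "North", "Rye"),
])
conn.execute("INSERT INTO account VALUES ('A-201', 900)")
conn.executemany("INSERT INTO deposit VALUES (?, ?, ?)", [
    ("A-201", "192837465", 100),  # deposit amounts are made up
    ("A-201", "019283746", 50),
])

# Which customers share account A-201?
holders = [row[0] for row in conn.execute("""
    SELECT c.customer_name
    FROM customer c JOIN deposit d
      ON c.social_security_number = d.social_security_number
    WHERE d.account_number = 'A-201'
    ORDER BY c.customer_name
""")]
print(holders)
```

Because DEPOSIT stores one row per (account, customer) pair, a shared account is recorded once in ACCOUNT and never duplicated, unlike in the hierarchical layout.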
Data Models
Data Model:
• An abstract framework that captures the semantic meaning of data for database design.
• Data Model = Schema + Operations + Constraints
• The Entity-Relationship model (Peter Chen, 1976) is used for the first step of database design.
• The relational model (E. F. Codd, 1970) is widely used for database development, implementation, and applications. It is based on first-order predicate logic.
• "Relational Database: A Practical Foundation for Productivity", 1981 ACM Turing Award Lecture, by E. F. Codd.
Entity-Relationship Model
• The Entity-Relationship model (Peter Chen, 1976) is used for the first step of database design in most IT practice.
• A database design expressed in the ER model can be regarded as the pseudocode of the database schema, written before it is translated into an implementation model schema such as a relational schema.
• ER CASE tools include Microsoft Visio, ERwin, etc.
Relational Data Model
• The relational model (E. F. Codd, 1970) is widely used for database development, implementation, and applications.
• "Relational Database: A Practical Foundation for Productivity", 1981 ACM Turing Award Lecture, by E. F. Codd.
More on Relational Data Model
According to Jeffrey D. Ullman (Professor Emeritus of Computer Science, Stanford University):
• The relational model is the best example of good theory.
• The relational model provides one basic (simple) structure.
• The relational model is good for anything (any data application).
• The relational model is perfect for almost nothing.
Arithmetic System vs. Relational Model
Schema
  Arithmetic system: number systems
  Relational model: a set of two-dimensional tables
Operations
  Arithmetic system: +, -, ×, ÷, etc.
  Relational model: select, project, union, intersect, difference, Cartesian product, join, division, etc.
Constraints
  Arithmetic system: 5 ÷ 0 is undefined. + and × are commutative and associative. The results of + and × are closed.
  Relational model: select, project, union, and Cartesian product are commutative and associative. The result of any operation applied to relational tables is again a table.
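The closure and commutativity properties claimed for the relational model can be checked on a toy implementation. The sketch below is not how a DBMS is built; it models a relation as a set of rows, with each row a set of (attribute, value) pairs, and all function names are made up for this illustration. It demonstrates select, project, union, and natural join only. The data comes from the sample CUSTOMER table.

```python
def relation(rows):
    """Build a relation: a frozenset of rows, each row a frozenset of (attribute, value) pairs."""
    return frozenset(frozenset(r.items()) for r in rows)

def select(rel, predicate):
    """Keep the rows for which predicate(row_as_dict) is true."""
    return frozenset(row for row in rel if predicate(dict(row)))

def project(rel, attrs):
    """Keep only the named attributes; duplicate rows collapse automatically."""
    return frozenset(frozenset((a, v) for a, v in row if a in attrs) for row in rel)

def union(r, s):
    """Set union of two relations over the same attributes."""
    return r | s

def natural_join(r, s):
    """Combine rows of r and s that agree on all attributes they share."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return frozenset(out)

customers = relation([
    {"name": "Johnson", "city": "Palo Alto"},
    {"name": "Smith", "city": "Rye"},
    {"name": "Hayes", "city": "Harrison"},
])
deposits = relation([
    {"name": "Johnson", "account": "A-101"},
    {"name": "Johnson", "account": "A-201"},
    {"name": "Smith", "account": "A-215"},
    {"name": "Smith", "account": "A-201"},
    {"name": "Hayes", "account": "A-102"},
])

# Closure: every result is a relation, so operations can be chained.
joined = natural_join(customers, deposits)
sharing = project(select(joined, lambda r: r["account"] == "A-201"), {"name"})
print(sorted(dict(row)["name"] for row in sharing))

# Commutativity: the order of the operands or of cascaded selects does not matter.
not_rye = lambda r: r["city"] != "Rye"
not_hayes = lambda r: r["name"] != "Hayes"
same_selects = select(select(customers, not_rye), not_hayes) == \
    select(select(customers, not_hayes), not_rye)
same_join = natural_join(customers, deposits) == natural_join(deposits, customers)
same_union = union(customers, customers) == customers
```

Rows are keyed by attribute name rather than by column position, which is why swapping the join operands yields an identical relation here.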
Data Access - Storage and Retrieval by Structured Query Language (SQL)
CUSTOMER (Customer Name, Social Security Number, Customer Street, Customer City)
    create table customer
    (customer_name           varchar(20),
     social_security_number  char(9),
     customer_street         varchar(20),
     customer_city           varchar(10));
    insert into customer values
    ('Richard Smith', '123456789', '3301 College Avenue', 'Davie');
    select *
    from customer;
A Simplified Database System Environment
[Figure: layered diagram, top to bottom]
Users/Programmers
Application Programs/Queries
DBMS Software:
  Software to process queries/programs
  Software to access stored data
Stored database definition (meta-data) and stored databases
History of Data Processing
Data collection and database creation (1960's and earlier)
- Primitive file processing
Database management systems (1970's)
- Hierarchical, network, relational database systems
- Online transaction processing
Advanced database systems (mid 1980's - present)
- Advanced data models: object-relational, object-oriented
Data warehousing and data mining (late 1980's - present)
- Online analytical processing
New generation of information systems (2000 - ...)
- NoSQL, search engines
The Evolution of Databases
[Figure: evolution diagram; the arrows are not recoverable from the extracted text. Labels in order of appearance:]
File Systems; Network; Hierarchical; Relational; Object-Oriented Languages; Semantic Models; Complex Object Models; Object-Oriented Databases; Hypermedia; Information Retrieval; Artificial Intelligence; Intelligent Databases
Who are Those Famous People in Database History
Charles William Bachman (1924-2017)
• ACM Turing Award Recipient (1973)
• Contribution: Database Technology, Network Data Model
• http://amturing.acm.org/award_winners/bachman_9385610.cfm
Who are Those Famous People in Database History
Edgar F. ("Ted") Codd (1923-2003)
• ACM Turing Award Recipient (1981)
• Contribution: Relational Model of Data
• http://amturing.acm.org/award_winners/codd_1000892.cfm
Who are Those Famous People in Database History
James Nicholas Gray
(Born in 1944, disappeared in 2007, declared legally dead in 2012)
• ACM Turing Award Recipient (1998)
• Contribution: Theory and Practice of Transaction Processing
• http://amturing.acm.org/award_winners/gray_3649936.cfm
Who are Those Famous People in Database History
Michael Stonebraker (1943 - )
• ACM Turing Award Recipient (2014)
• Contribution: Concepts and practices underlying modern database systems (INGRES, Postgres, etc.)
• http://amturing.acm.org/award_winners/stonebraker_1172121.cfm
An Overview of the Steps Comprising the KDD Process
[Figure: the KDD process as a pipeline]
Data → Selection → Target Data → Preprocessing → Processed Data → Transformation → Transformed Data → Data Mining → Patterns → Interpretation/Evaluation → Knowledge
Example of K-Nearest Neighbor Classification in Data Mining
[Figure: scatter plot with Class 1 samples, Class 2 samples, and one unknown sample]
If it walks like a duck, quacks like a duck, and looks like a duck, then it is probably a duck.
Justification of K-Nearest Neighbor Classification
• If it walks like a duck, quacks like a duck, and looks like a duck, then it is probably a duck.
How to Measure "Nearby" or Similarity?
• One of the most popular distance measures is the Euclidean distance from analytic geometry.
• Given two data points (or data records) X = (x1, …, xp) and U = (u1, …, up), the Euclidean distance between X and U is:
  \( d(X, U) = \sqrt{(x_1 - u_1)^2 + \cdots + (x_p - u_p)^2} \)
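As a minimal sketch, the Euclidean distance between two records of equal length can be computed as follows. The function name is chosen for this illustration.

```python
import math

def euclidean_distance(x, u):
    """Square root of the summed squared differences between x and u."""
    if len(x) != len(u):
        raise ValueError("both records must have the same number of attributes")
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

# A 3-4-5 right triangle: the distance from (0, 0) to (3, 4) is 5.
print(euclidean_distance((0, 0), (3, 4)))
```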
Example
• We can now use the training data set to classify an unknown case (Age = 48 and Loan = $142,000) using Euclidean distance.
• \( D = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2} = \sqrt{(48 - 33)^2 + (142000 - 150000)^2} \approx 8000.01 \)
• K = 1 means using the single nearest record. With K = 1, the nearest neighbor is the last case in the training set, with Default = Y.
• With K = 3, there are two Default = Y and one Default = N among the three closest neighbors. The prediction for the unknown case is again Default = Y.
http://www.saedsayad.com/flash/KNN_flash.html
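The classification steps above can be sketched in a few lines. The training table from the slides did not survive extraction, so the training set below is hypothetical except for the record (Age = 33, Loan = 150000, Default = Y), which appears in the distance calculation. The function names are also made up for this illustration.

```python
import math
from collections import Counter

def euclidean_distance(x, u):
    """Square root of the summed squared differences between x and u."""
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

def knn_classify(training, query, k):
    """Return the majority label among the k training records nearest to query."""
    nearest = sorted(training, key=lambda rec: euclidean_distance(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# ((age, loan), default) records. Only the last one comes from the slides;
# the others are invented so that the example has something to vote on.
training = [
    ((25, 40000), "N"),
    ((45, 80000), "N"),
    ((35, 120000), "N"),
    ((60, 100000), "Y"),
    ((33, 150000), "Y"),
]
unknown = (48, 142000)

for k in (1, 3):
    print(k, knn_classify(training, unknown, k))
```

With an odd K and two classes, the vote can never tie, which is one reason odd values of K are a common choice.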
Standardized or Normalized Distance
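When attributes have very different scales, the raw Euclidean distance is dominated by the largest one: in the loan example, a difference of a few thousand dollars outweighs any difference in age. The formulas on this slide did not survive extraction, so the sketch below shows one common remedy, min-max scaling of each attribute to the range [0, 1], as an assumption rather than as the slide's exact method. The data points are hypothetical apart from (33, 150000) and the unknown case (48, 142000).

```python
import math

def euclidean_distance(x, u):
    """Square root of the summed squared differences between x and u."""
    return math.sqrt(sum((xi - ui) ** 2 for xi, ui in zip(x, u)))

def make_min_max_scaler(rows):
    """Return a function mapping each attribute to [0, 1] using the min and max seen in rows."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]

    def scale(row):
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(row, lows, highs)
        )
    return scale

def nearest_index(points, query):
    """Index of the point closest to query."""
    return min(range(len(points)), key=lambda i: euclidean_distance(points[i], query))

# (age, loan) records; hypothetical except (33, 150000).
training = [(33, 150000), (35, 120000), (60, 100000), (45, 80000), (25, 40000)]
query = (48, 142000)

raw_nearest = training[nearest_index(training, query)]

scale = make_min_max_scaler(training)
scaled_training = [scale(p) for p in training]
scaled_nearest = training[nearest_index(scaled_training, scale(query))]

print("nearest on raw values:   ", raw_nearest)
print("nearest on scaled values:", scaled_nearest)
```

On this data the nearest neighbor changes once both attributes are scaled, because age now contributes on equal footing with the loan amount.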
Database Technology
Database Technology:
• It is a comprehensive application of computer science and other technologies, and it draws on most computer science subjects.
Computer science subjects and the corresponding database topics:
Compiler: Database Languages
Data Structures & Algorithms: Storage Structures & Data Access
Operating Systems: Concurrency Control
System Analysis: Database Modeling and Design
Software Engineering: DB & DBMS Development
AI and Expert Systems: Heuristic Search and DSS
Optimization Theory: Query Optimization
User Interface and Human Factors: Database Interfaces (GUI)
Network and Distributed Systems: Distributed Database Systems
Mathematical Predicate Logic: Database Theory
Cryptography and Others: Database Security
Issues in Database Applications
Data Processing (Search)
Correctness, Efficiency, Security, and User Friendliness
Information Storage and Retrieval
Search Algorithms & Access Mechanisms
Data Structures
Physical Storage
Computer Hardware
Database and DBMS Revisited
• Programs = Data Structures + Algorithms (Niklaus Wirth)
• Database System = Databases + Database Management System(s)
• Database Management System = Data Model + Data Structures + Algorithms
• Database Query (Data Query): Deductive Processing
• Data Mining (Knowledge Query): Inductive Processing
Multiples of Bytes (Orders of Magnitude of Data)
Decimal                 Binary
Value    Metric         Value            JEDEC          IEC
1000     KB kilobyte    1024   = 2^10    KB kilobyte    KiB kibibyte
1000^2   MB megabyte    1024^2 = 2^20    MB megabyte    MiB mebibyte
1000^3   GB gigabyte    1024^3 = 2^30    GB gigabyte    GiB gibibyte
1000^4   TB terabyte    1024^4 = 2^40    TB             TiB tebibyte
1000^5   PB petabyte    1024^5 = 2^50    PB             PiB pebibyte
1000^6   EB exabyte     1024^6 = 2^60    EB             EiB exbibyte
1000^7   ZB zettabyte   1024^7 = 2^70    ZB             ZiB zebibyte
1000^8   YB yottabyte   1024^8 = 2^80    YB             YiB yobibyte
• JEDEC - Joint Electron Device Engineering Council
• IEC - International Electrotechnical Commission
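The gap between the decimal and binary columns grows with each prefix. The short sketch below computes both values and their ratio; the function name and the prefix list are chosen for this illustration.

```python
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]

def byte_multiples(n):
    """Return (decimal, binary) byte counts for the n-th prefix; n = 1 is kilo/kibi."""
    return 1000 ** n, 1024 ** n

for n, prefix in enumerate(PREFIXES, start=1):
    decimal, binary = byte_multiples(n)
    # 1024 is 2^10, so 1024^n is always 2^(10n).
    assert binary == 2 ** (10 * n)
    print(f"{prefix:>5}: 1000^{n} = {decimal:.3e}   "
          f"1024^{n} = 2^{10 * n} = {binary:.3e}   "
          f"ratio = {binary / decimal:.4f}")
```

The ratio starts at 1.024 for kilo and compounds from there, which is why a drive sold in decimal gigabytes or terabytes reports a smaller number when an operating system counts in binary units.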
Different Perspectives on Database Systems
• Application View
• Management View
  Data Management
• Systematic View
  Database System Management
• Methodological/Algorithmic View
• Theoretical View
• Logical View
• Philosophical View
Computer Science is a Problem-Solving Science
• Database applications are comprehensive applications of computer science and other technologies.
• Database technology is one of the branches of computer science.
• Computer science is a problem-solving science, and database technology is used to solve various problems in data processing.
• Domain Knowledge + Critical Thinking + Methodology = Problem Solving.
Book:
"Out of Their Minds: The Lives and Discoveries of 15 Great Computer Scientists," by Dennis Shasha and Cathy Lazere
Copernicus, an imprint of Springer-Verlag,
ISBN: 0-387-97992-1
IOT = ABC
Internet of Things = Artificial Intelligence +
Big Data (Analysis) +
Cloud Computing
STEM
(Science, Technology, Engineering, Mathematics)
A Few Words at Last, But Not Necessarily Least
• If you realize there is something you did not know, you are learning and advancing.
• One of the purposes of this summer camp is to help you prepare for the next milestone in your life.
• Your future success may, to some extent, reflect the success of our summer camp.
• You may want to visit the website of the Bureau of Labor Statistics, United States Department of Labor:
http://www.bls.gov/ooh/computer-and-information-technology/home.htm
http://www.marketwatch.com/story/want-better-work-life-balance-here-arethe-25-best-jobs-for-you-2015-10-20?siteid=rss&rss=1
Thank You!
Questions/Answers