Database

COP 3540 - Introduction to Database Structures
Introduction
Welcome
Course web-site
http://faculty.eng.fau.edu/yangk/COP3540/index.html
Data
Data are raw or isolated facts from which the required
information is produced.
Information is a collection of processed data.
Data
 Banking
 Airlines
 Universities
 Credit card transaction
 Telecommunication
 Finance
 Sales
 Manufacturing
 Human resources
 Biology
 Ecology
 Geospatial
Why is it important?
Database systems are an essential component of life
in modern society.
Examples include banking systems, airline reservation systems,
mobile phone services, and car navigation systems.
File Systems
A file is a collection of records, which contain logically
related data. Each record contains a logically connected
set of one or more fields, where each field represents
some characteristic of the real-world object that is being
modeled.
The limitations of the file-based approach come from two factors:
1) the definition of the data is embedded in the
application programs, rather than being stored
separately and independently;
2) there is no control over the access and manipulation
of data beyond that imposed by the application
programs.
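A minimal sketch of limitation 1, assuming a hypothetical fixed-width account record (the field names and sizes are illustrative, not from the course). The record layout exists only inside the program; the bytes in the file carry no description of themselves.

```python
import struct

# Hypothetical record layout, hard-coded in the application:
# 4-byte integer id, 20-byte name, 8-byte floating-point balance.
# "<" selects little-endian encoding with no padding.
RECORD_FORMAT = "<i20sd"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)


def pack_record(account_id, name, balance):
    """Encode one record as raw bytes using the hard-coded layout."""
    return struct.pack(RECORD_FORMAT, account_id, name.encode("ascii"), balance)


def unpack_record(raw):
    """Decode one record; correct only if the reader assumes the same layout."""
    account_id, name, balance = struct.unpack(RECORD_FORMAT, raw)
    return account_id, name.rstrip(b"\x00").decode("ascii"), balance


raw = pack_record(1, "Ann", 250.0)
record = unpack_record(raw)
```

If the layout changes (say, a longer name field), every program that reads the file has to be edited to match, because nothing outside the programs records what the layout is.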
File Systems
A file is a sequence of records.
 All records in a file are of the same record type.
 A file-processing system is supported by a conventional
operating system. The system stores permanent
records in various files, and it needs different
application programs to extract records from, and add
records to, the appropriate files.
Database
Database is a collection of related data.
 Database is a shared collection of logically related data,
and a description of this data, designed to meet the
information needs of an organization.
 Database is also defined as a self-describing collection of
integrated records. The description of the data is known as
the system catalog (or data dictionary or metadata –
the ‘data about data’). It is the self-describing nature of a
database that provides program–data independence.
Database
Database represents the entities, the attributes, and
the logical relationships between the entities.
An entity is a distinct object (a person, place, thing,
concept, or event) in the organization that is to be
represented in the database. An attribute is a property
that describes some aspect of the object that we wish to
record, and a relationship is an association between
entities.
System Catalog
 The system catalog is one of the fundamental
components of a DBMS. It contains ‘data about the data’,
or metadata.
 The catalog should be accessible to users.
 The Information Resource Dictionary System is an ISO
standard that defines a set of access methods for a data
dictionary. This allows dictionaries to be shared and
transferred from one system to another.
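As an illustration of a user-accessible catalog, the sketch below uses SQLite through Python's sqlite3 module. SQLite is an assumption chosen for convenience; the slides do not name a particular DBMS. The table definition is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid TEXT, name TEXT, gpa REAL)")

# sqlite_master is SQLite's catalog table: it holds the type, name, and
# defining SQL of every table, index, and view in the database.
catalog_rows = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk).
column_names = [row[1] for row in conn.execute("PRAGMA table_info(Students)")]
conn.close()
```

The catalog is queried with ordinary SELECT statements, so the description of the data is stored in the database itself rather than in application code.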
Database Management System (DBMS)
Database Management System (DBMS) is software
that manages and controls access to the database.
 DBMS enables users to define, create, maintain, and
control access to the database.
 DBMS interacts with the users’ application programs and
the database.
DBMS provides a Data Definition Language (DDL),
which allows users to define the database, and a Data
Manipulation Language (DML), which allows users to
insert, update, delete, and retrieve data from the
database.
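The difference between DDL and DML can be sketched with SQL statements issued through Python's sqlite3 module (SQLite is an assumption here, used only because it needs no setup). The credit values and the second course are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure of the database.
conn.execute(
    "CREATE TABLE Courses (cid TEXT PRIMARY KEY, cname TEXT, credits INTEGER)"
)

# DML: insert, update, delete, and retrieve data.
conn.execute(
    "INSERT INTO Courses VALUES "
    "('COP3540', 'Introduction to Database Structures', 3)"
)
conn.execute("INSERT INTO Courses VALUES ('XYZ1000', 'Placeholder Course', 1)")
conn.execute("UPDATE Courses SET credits = 4 WHERE cid = 'COP3540'")
conn.execute("DELETE FROM Courses WHERE cid = 'XYZ1000'")
rows = conn.execute("SELECT cid, cname, credits FROM Courses").fetchall()
conn.close()
```

CREATE TABLE is the only DDL statement; INSERT, UPDATE, DELETE, and SELECT are DML, and SELECT alone is the query-language part.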
Database Management System (DBMS)
DBMS provides controlled access to the database.
 a security system, which prevents unauthorized users
from accessing the database;
 an integrity system, which maintains the consistency of
stored data;
 a concurrency control system, which allows shared
access of the database;
 a recovery control system, which restores the database
to a previous consistent state following a hardware or
software failure;
 a user-accessible catalog, which contains descriptions of
the data in the database.
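A small sketch of the integrity system at work, assuming SQLite and a hypothetical Students table with a GPA range of 0.0 to 4.0. The constraints are declared once in the schema, and the DBMS rejects any insert that violates them, whichever program issues it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " sid TEXT PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL CHECK (gpa BETWEEN 0.0 AND 4.0))"
)
conn.execute("INSERT INTO Students VALUES ('s1', 'Ann', 3.5)")

# Try two bad rows: a duplicate key and an out-of-range GPA.
rejected = []
for row in [("s1", "Bob", 3.0), ("s2", "Cy", 7.2)]:
    try:
        conn.execute("INSERT INTO Students VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

count = conn.execute("SELECT COUNT(*) FROM Students").fetchone()[0]
```

Both bad rows are refused and the table still holds only the one valid student.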
Database Management System (DBMS)
Some advantages of the database approach include
control of data redundancy, data consistency,
sharing of data, and improved security and integrity.
Some disadvantages include complexity, cost,
reduced performance, and higher impact of a failure.
Files vs. DBMS
Application must stage large datasets between main
memory and secondary storage (e.g., buffering, page-oriented
access, 32-bit addressing, etc.)
Special code for different queries
Must protect data from inconsistency due to multiple
concurrent users
Crash recovery
Security and access control
Application Program
Application program is a computer program that
interacts with the database by issuing an appropriate
request (typically an SQL statement) to the DBMS.
The more inclusive term database system is used to
define a collection of application programs that interact
with the database along with the DBMS and database
itself.
Why Use a DBMS?
Data independence and efficient access.
Reduced application development time.
Data integrity and security.
Uniform data administration.
Concurrent access, recovery from crashes.
Why Study Databases?
Shift from computation to information
 at the “low end”: scramble to webspace (a mess!)
 at the “high end”: scientific applications
Datasets increasing in diversity and volume.
 Digital libraries, interactive video, Human Genome
project, EOS project
 ... need for DBMS exploding
DBMS encompasses most of CS
 OS, languages, theory, AI, multimedia, logic
History of Database Management Systems
 The roots of the DBMS lie in file-based systems.
 The hierarchical and CODASYL systems represent the first
generation of DBMSs. The hierarchical model is typified by IMS
(Information Management System) and the network or
CODASYL model by IDS (Integrated Data Store), both
developed in the mid-1960s.
 The relational model, proposed by E. F. Codd in 1970,
represents the second generation of DBMSs. It has had a
fundamental effect on the DBMS community and there are now
over one hundred relational DBMSs.
 The third generation of DBMSs is represented by the
Object-Relational DBMS and the Object-Oriented DBMS.
Data Model
Data model is a collection of concepts that can be
used to describe a set of data, the operations to
manipulate the data, and a set of integrity constraints
for the data.
Semantic data model is a more abstract, high-level
data model that describes the meaning of its
instances.
The entity-relationship (ER) model is a widely used
semantic data model.
Level of Abstraction in DBMS
 The ANSI-SPARC database architecture uses three levels
of abstraction: external, conceptual, and internal.
 The external level consists of the users’ views of the
database. The conceptual level is the community view of
the database. It specifies the information content of the
entire database, independent of storage considerations.
 The conceptual level represents all entities, their attributes,
and their relationships, as well as the constraints on the data,
and security and integrity information.
 The internal level is the computer’s view of the database. It
specifies how data is represented, how records are
sequenced, what indexes and pointers exist, and so on.
Level of Abstraction in DBMS
 The external/conceptual mapping transforms requests and
results between the external and conceptual levels.
 The conceptual/internal mapping transforms requests and
results between the conceptual and internal levels.
Level of Abstraction in DBMS
 Database schema is a description of the database structure.
Data independence makes each level immune to changes to
lower levels.
 Logical data independence refers to the immunity of the
external schemas to changes in the conceptual schema.
 Physical data independence refers to the immunity of the
conceptual schema to changes in the internal schema.
Levels of Abstraction in a DBMS
Database Language
 Database Language consists of two parts: a Data Definition
Language (DDL) and a Data Manipulation Language (DML).
 DDL is used to specify the database schema.
 DML is used to both read and update the database. The part of
a DML that involves data retrieval is called a query language.
Review
 A data model is a collection of concepts for describing
data.
 A schema is a description of a particular collection of
data, using a given data model.
 The relational model of data is the most widely used
model today.
 Main concept: relation, basically a table with rows
and columns.
 Every relation has a schema, which describes the
columns, or fields.
Relational Model
 The relational data model was proposed by E. F. Codd in 1970.
 A mathematical relation is a subset of the Cartesian product
of two or more sets. In database terms, a relation is any subset
of the Cartesian product of the domains of the attributes.
 A relation is normally written as a set of n-tuples, in which each
element is chosen from the appropriate domain.
 Relations are physically represented as tables, with the rows
corresponding to individual tuples and the columns to
attributes.
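The definition of a relation as a subset of a Cartesian product can be checked directly. The sketch below uses two tiny, made-up domains (student ids and course ids) and Python sets.

```python
from itertools import product

# Hypothetical attribute domains.
sid_domain = {"s1", "s2"}
cid_domain = {"c1", "c2"}

# The Cartesian product: every possible (sid, cid) pair.
cartesian = set(product(sid_domain, cid_domain))

# A relation is any subset of that product, written as a set of 2-tuples.
enrolled = {("s1", "c1"), ("s2", "c1")}
is_relation = enrolled <= cartesian  # subset test
```

Viewed as a table, `enrolled` has two rows (the tuples) and two columns (the attributes sid and cid).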
Example: University Database
Conceptual schema:
 Students(sid: string, name: string, login: string, age:
integer, gpa:real)
 Courses(cid: string, cname:string, credits:integer)
 Enrolled(sid:string, cid:string, grade:string)
Physical schema:
 Relations stored as unordered files.
 Index on first column of Students.
External Schema (View):
 Course_info(cid:string,enrollment:integer)
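The three schemas above can be sketched in SQLite through Python's sqlite3 module. Several details are assumptions: SQLite type names stand in for string, integer, and real; the index name is invented; and the view is defined as a count of Enrolled rows per course, which is one plausible reading of Course_info(cid, enrollment).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Conceptual schema: the three relations.
CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

-- Physical schema: index on the first column of Students.
CREATE INDEX idx_students_sid ON Students (sid);

-- External schema: a view exposing only per-course enrollment counts.
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;

-- Sample data (made up).
INSERT INTO Enrolled VALUES ('s1', 'c1', 'A'), ('s2', 'c1', 'B'), ('s1', 'c2', 'A');
""")

course_info = conn.execute(
    "SELECT cid, enrollment FROM Course_info ORDER BY cid"
).fetchall()
conn.close()
```

With this definition a course with no enrolled students does not appear in the view; a join against Courses would be needed to report it with a count of zero.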
Example: University Database
Applications insulated from how data is structured and
stored.
 Logical data independence: Protection from changes
in logical structure of data.
 Physical data independence: Protection from changes
in physical structure of data.
One of the most important benefits of using a DBMS!
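A rough demonstration of both kinds of independence, assuming SQLite and the same hypothetical Course_info view definition (a count of enrollments per course). The application query is fixed; first the physical structure changes, then the logical structure, and the query keeps returning the same answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);
INSERT INTO Enrolled VALUES ('s1', 'c1', 'A'), ('s2', 'c1', 'B');
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Enrolled GROUP BY cid;
""")

# The only SQL the application knows about.
APP_QUERY = "SELECT cid, enrollment FROM Course_info ORDER BY cid"
before = conn.execute(APP_QUERY).fetchall()

# Physical change: add an index. The application query is untouched.
conn.execute("CREATE INDEX idx_enrolled_cid ON Enrolled (cid)")
after_physical = conn.execute(APP_QUERY).fetchall()

# Logical change: rename the base table, then redefine the view over it.
conn.executescript("""
DROP VIEW Course_info;
ALTER TABLE Enrolled RENAME TO Registrations;
CREATE VIEW Course_info AS
    SELECT cid, COUNT(*) AS enrollment FROM Registrations GROUP BY cid;
""")
after_logical = conn.execute(APP_QUERY).fetchall()
conn.close()
```

In the logical case the view definition has to be updated by whoever changes the schema, but programs written against the view do not.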
Transaction Management
A transaction is a series of actions, carried out by a
single user or application program, which accesses or
changes the contents of the database.
A transaction is a logical unit of work consisting of
one or more SQL statements that is guaranteed to be
atomic with respect to recovery.
Transaction Management
A DBMS must furnish a mechanism that will ensure
either that all the updates corresponding to a given
transaction are made or that none of them is made.
A DBMS must furnish a mechanism to ensure that the
database is updated correctly when multiple users are
updating the database concurrently.
A DBMS must furnish a mechanism for recovering the
database in the event that the database is damaged in
any way.
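The all-or-nothing requirement can be sketched with a bank transfer in SQLite. The Accounts table, the balances, and the `transfer` helper are hypothetical. The connection is used as a context manager, which in Python's sqlite3 module commits if the block succeeds and rolls back if it raises.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts ("
    " id TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO Accounts VALUES ('A', 100), ('B', 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if committed."""
    try:
        with conn:  # commit on success, roll back on exception
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "A", "B", 30)       # both updates succeed
failed = transfer(conn, "A", "B", 500)  # debit violates the CHECK constraint
balances = dict(conn.execute("SELECT id, balance FROM Accounts"))
conn.close()
```

In the second call the credit to B has already been applied when the debit fails, so the rollback is what prevents B from keeping money that never left A.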
Concurrency Control
 Concurrent execution of user programs is essential for
good DBMS performance.
 Because disk accesses are frequent and relatively slow, it
is important to keep the CPU busy by working on
several user programs concurrently.
 Interleaving actions of different user programs can lead
to inconsistency: e.g., check is cleared while account
balance is being computed.
 DBMS ensures such problems don’t arise: users can
pretend they are using a single-user system.
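A deterministic simulation of one such problem, the lost update, written in plain Python without a DBMS. The amounts and the two "programs" are made up; the point is only the order of the reads and writes.

```python
def run_interleaved():
    """Both programs read the balance before either writes: one update is lost."""
    balance = 100
    read_by_check = balance          # program 1 reads 100 (clearing a check for 30)
    read_by_deposit = balance        # program 2 reads 100 (depositing 50)
    balance = read_by_check - 30     # program 1 writes 70
    balance = read_by_deposit + 50   # program 2 overwrites with 150
    return balance


def run_serial():
    """The same two programs run one after the other."""
    balance = 100
    balance = balance - 30           # program 1 runs to completion
    balance = balance + 50           # then program 2
    return balance


interleaved_result = run_interleaved()
serial_result = run_serial()
```

The interleaved run ends with 150 instead of the correct 120 because the check's debit is overwritten. Concurrency control in a DBMS exists to rule out interleavings like this while still letting programs overlap.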
Roles in the Database Environment
 Database Administrator (DBA) is responsible for the
physical realization of the database, including physical
database design and implementation, security and
integrity control, maintenance of the operational system,
and ensuring satisfactory performance of the
applications for users.
 Logical database designer is concerned with
identifying the data (that is, the entities and attributes),
the relationships between the data, and the constraints
on the data that is to be stored in the database.
 Application Developers
 End-Users
Databases make these folks happy ...
End users and DBMS vendors
DB application programmers
 E.g., smart webmasters
Database administrator (DBA)
 Designs logical/physical schemas
 Handles security and authorization
 Data availability, crash recovery
 Database tuning as needs evolve
Structure of a DBMS
A typical DBMS has a layered architecture.
This is one of several possible architectures;
each system has its own variations.
The figure does not show the concurrency control and
recovery components; all of the layers must take
concurrency control and recovery into account.
Layers, from top to bottom:
 Query Optimization and Execution
 Relational Operators
 Files and Access Methods
 Buffer Management
 Disk Space Management
 DB
Architecture of DBMS
Summary
A DBMS is used to maintain and query large datasets.
Benefits include recovery from system crashes,
concurrent access, quick application development,
data integrity and security.
Levels of abstraction give data independence.
A DBMS typically has a layered architecture.
DBAs hold responsible jobs and are well-paid!
DBMS R&D is one of the broadest, most exciting
areas in CS.