DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION SYSTEMS
HCS 206- MODELS OF DATABASE AND DATABASE DESIGN
MODULE
by
T.G.GWANZURA
[email protected]
COURSE OUTLINE
General Information
Semester: August- November 2010
Date and Time: Tue 1200-1400 and Fri 0800-1000
Location: To be advised
Contact
Instructor: Ms T.G. Gwanzura
Office: 9
E-mail: [email protected]
Course Outline
 Introduction
 Definition of terms: database; database management systems
 Database system vs File system
 ANSI-SPARC architecture; database application architectures i.e. 2-tier and 3-tier architecture
 Data models
 Database languages
 Database users and administrators
 Entity Relationship Model
 Definition of key terms i.e. entity sets, relationship sets, attributes
 Constraints of an ER model: mapping cardinalities; participation constraints
 Keys: definition of different keys; super key; candidate key; primary key
 Constructing the Entity Relationship Diagram
 Unified Modeling Language (UML)
 Relational Model
 Structure of a relational database
 Relational algebra and operations in relational algebra
 SQL
 Structure of the SQL expression i.e. select clause; where clause; from clause
 Creating databases, tables and performing various operations using SQL
statements
 Integrity and Security
 Types of integrity i.e. referential integrity; entity integrity
 Database security: definition of database security; types of database security
violations, authorization, views, privileges; audit trails;
 Encryption and Authentication: encryption techniques; authentication
 Relational Database Design
 First Normal Form
 Functional Dependencies and Full Functional Dependencies
 Second Normal Form
 Third Normal Form
 Distributed Databases
 Homogeneous and Heterogeneous Databases
 Distributed Data Storage
 Distributed Transactions
 Distributed Query Processing
 Heterogeneous Distributed Database
 Practical Sessions using MySQL
 Creating databases
 Creating tables
 Identifying suitable columns for the tables specifying attribute type and field
length
 Inserting records and identifying the primary keys
 Modifying data
RECOMMENDED READINGS
 Elmasri et al (c2006), Fundamentals of Database Systems, Dorling
Kindersley (India) Pvt. Ltd.
 Silberschatz et al (2002), Database System Concepts 4th Edition, McGraw-Hill Higher Education.
 Mannino, M.V. (c2004), Database Design, Application Development, and
Administration 2nd Edition, McGraw-Hill.
 Date, C. J. (c2000), An Introduction to Database Systems 7th Edition,
Pearson Education India.
These are not the only books to read. Any other book on database
systems may be consulted, and you are also advised to read electronic books
on database systems.
CHAPTER ONE- INTRODUCTION
Databases and database systems have become an essential component of everyday life in modern
society. Most of us have several encounters each day that involve some interaction with a
database, for example making a deposit or withdrawing funds at a bank or making a reservation
(such activities involve a person or a computer program accessing a database). It is therefore
important that we start from the basics of traditional database applications.
In this chapter we first define what a database is and then give definitions of other basic
terms. Next we discuss the main characteristics of database systems and the types of
personnel whose jobs involve using and interacting with them. The database
architectures, data models and database languages are briefly discussed as part of the
introduction.
What is a database?
- A database consists of an organized collection of data for one or more uses, typically in digital
form.
- A database is an application that manages data and allows fast storage and retrieval of that
data.
The key word in the above definitions is data, which refers to known facts that can be recorded
and that have implicit meaning, for example the names, telephone numbers and addresses of
people you know. This information may be stored in your phone or on a computer using
software such as Microsoft Access or Excel.
Alternatively, a database may be defined as a collection of persistent data that can be shared and
interrelated.
This definition points out some important properties of a database, as discussed
below:
 Persistent, meaning that data resides on stable storage such as magnetic disk or tape.
This data has been accepted by the Database Management System (DBMS) for entry into
the database and can subsequently be removed from the database only by an explicit
request to the DBMS.
 Shared, meaning that a database can have multiple uses and users. It provides a common
memory for multiple functions in an organisation, for example payroll calculations and
performance evaluations. Many users can access the database at the same time, for example
when making airline reservations.
 Interrelated, meaning that data stored as separate units can be connected to provide a
whole picture, for example a customer database relates customer data (name, address
etc).
The discussion of these properties introduced an important term, Database Management
System, which we now examine further.
What is a Database Management System?
It can be defined as:
 a software system that facilitates the creation and maintenance and use of an electronic
database
wordnetweb.princeton.edu/perl/webwn
 A software system used to access and retrieve data stored in a database. [NARA's
Managing Electronic Records Instructional Guide]
www.ischool.utexas.edu/~scisco/lis389c.5/email/gloss.html
 A management system associated with a structured set of data that allows the data to be
accessed in a variety of ways. In a Relational DBMS, the relationships between the data
elements form keys to reduce the amount of data needing to be held and to improve
navigation and access speeds
www.infodiv.unimelb.edu.au/knowledgebase/itservices/a-z/d.html
 A set of computer programs for organizing the information in a database. A DBMS
supports the structuring of the database in a standard format and provides tools for data
input, verification, storage, retrieval, query, and manipulation.
celiang.tongji.edu.cn/weian/3S/GIS/ESRI/AboutGIS/a_d.html
 A collection of programs that enables users to create and maintain a database
The definitions above highlight that the DBMS is a general purpose software system that
facilitates the processes of defining, constructing, manipulating, sharing and
maintaining databases among various users and applications. Defining a database involves
specifying the data types, structures and constraints for the data to be stored in the database;
constructing the database is the process of storing the data itself on some storage medium
that is controlled by the DBMS; manipulating a database includes functions such as querying the
database to retrieve specific data, updating the database to reflect changes in the miniworld
(the part of the real world that the database represents) and generating reports from the data;
sharing a database allows multiple users and programs to access the database concurrently;
and maintenance means the DBMS must maintain the database system by allowing the system to
evolve as requirements change over time.
Database management systems can be categorized according to the database model that they
support, such as relational or XML; the type(s) of computer they support, such as a server
cluster or a mobile phone; the query language(s) used to access the database, such as SQL or
XQuery; and their performance trade-offs, such as maximum scale or maximum speed. Some
DBMSs cover more than one entry in these categories, e.g. supporting multiple query languages.
Why use a database?
Why should one use a database system? What are its advantages over a file based system?
Limitations of File-based Approach
1. File processing systems store groups of records in separate files…
2. Separation and isolation of data
- Each program maintains its own set of data.
- Users of one program may be unaware of potentially useful data held by other programs.
3. Duplication of data (redundancy)
- Same data is held by different programs.
- Wasted space and potentially different values and/or different formats for the same item =>
data integrity problem: produce inconsistent results
4. Data dependence/ application program dependency
– File structure/ format and records are defined in the program/application code.
– Changes in formats must be reflected in the code
– Time consuming and error prone tasks
5. Incompatible file formats
– Programs are written in different languages and so cannot easily access each other’s files;
files produced by programs in different languages cannot readily be combined or compared.
6. Fixed Queries/Proliferation of application programs
- Programs are written to satisfy particular functions. Any new requirement needs a new
program.
7. Difficulty of representing data from the user’s viewpoint
 Relationships among records are not readily represented or processed
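The duplication problem listed above can be illustrated with a small sketch. It is only an illustration: the two dictionaries stand in for the private files of two hypothetical programs (a sales program and a billing program), and the customer details are invented.

```python
# Each hypothetical program keeps its own copy of the same customer record.
sales_file = {"C001": {"name": "Jones", "address": "12 Main St"}}
billing_file = {"C001": {"name": "Jones", "address": "12 Main St"}}

# The customer moves, but only the billing program is told about it.
billing_file["C001"]["address"] = "7 Park Rd"

# The two copies now disagree, which is the data integrity problem.
inconsistent = [
    key
    for key in sales_file
    if sales_file[key]["address"] != billing_file.get(key, {}).get("address")
]
print(inconsistent)
```

With a single shared database the address would be stored once, so there would be no second copy to fall out of date.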
Database Approach
Arose because:
– Definition of data was embedded in application programs, rather than being stored
separately and independently.
– No control over access and manipulation of data beyond that imposed by application
programs.
Result
– The database and Database Management System (DBMS).
Database Approach
- Integrated data
- All the application data is stored in a database
- Programmer is not responsible for co-ordinating files; DBMS will do it.
- Less duplication of data
- Data is stored in only one place
- Fewer data integrity problems
- Program/data independence
- Record formats are stored in DB itself, so it is accessed by DBMS, not by application
programs
- Minimises the impact of data format changes on application programs
- Easier representation of user’s view of data.
- Controlled access to database
Controlled access to database may include:
– A security system.
– An integrity system.
– A concurrency control system.
– A recovery control system.
– A user-accessible catalog.
Benefits of the database approach:
- Minimal data redundancy and improved data consistency
The concept of normalisation ensures that there is reduced data redundancy in a database.
- Ease of access to data/ Improved data accessibility and responsiveness
Data in a database is interrelated and is in the same format. This facilitates better data retrieval
for general use.
- Increased development productivity/Ease of application development and reduced program
maintenance/Reduced application development time.
New application programs to manipulate data can be written with ease because the data is
integrated and is in the same format, although designing and implementing a new database from
scratch may take more time.
- Improved data sharing
Database systems allow multiple access and update of data in a consistent manner. They also
allow different views of the same data.
- Enforcement of standards/Improved security and integrity
- Improved data quality
- Availability of up to date information
- Flexibility/Program-data independence - Data is independent from applications and shared by
multiple users and applications. It should be possible to effect changes to an application program
that accesses the data without having to change the structure of the data itself. Similarly it
should be possible to change the structure of the data without affecting the application program
that operates on it.
- Persistence: It is possible to maintain data over long periods of time, independent of any
program that accesses it.
- Resilience
The ability of data to survive hardware and software failures without sustaining loss or becoming
inconsistent can be provided for in a DB environment.
Database Architectures:
The ANSI-SPARC Architecture
• The architecture of most commercial DBMSs is based on the ANSI-SPARC architecture (1975).
– American National Standards Institute (ANSI)
– Standards Planning And Requirements Committee (SPARC)
• Although this never became a formal standard, it is useful to help
understand the functionality of a typical DBMS.
• The ANSI-SPARC model of a database identifies three distinct levels
at which data items can be described.
• These levels form a three-level architecture comprising:
– an external level,
– a conceptual level, and
– an internal level.
• The objective of the three-level architecture is to separate the users’
view(s) of the database from the way that it is physically represented.
This is desirable for the following reasons:
• 1. It allows independent customised user views.
– Each user should be able to access the same data, but have a different
customised view of the data. These should be independent: changes to one
view should not affect others.
• 2. It hides the physical storage details from users.
– Users should not have to deal with physical database storage details. They
should be allowed to work with the data itself, without concern for how it
is physically stored.
• 3. The database administrator should be able to change the database
storage structures without affecting the users’ views.
– From time to time rationalisations or other changes to the structure of an
organisation’s data will be required.
• 4. The internal structure of the database should be unaffected by
changes to the physical aspects of the storage.
– For example, a changeover to a new disk.
• 5. The database administrator should be able to change the conceptual
or global structure of the database without affecting the users.
– This should be possible while still maintaining the desired individual
users’ views.
The External Level
• The external level represents the user’s view of the database.
– It consists of a number of different views of the database, potentially one
for each user.
• It describes the part of the database that is relevant to a particular user.
– For example, large organisations may have finance and stock control
departments.
– Workers in finance will not usually view stock details as they are more
concerned with the accounting side of things, for example.
– Thus, workers in each department will require a different user interface to
the information stored in the database.
• Views may provide different representations of the same data.
– For example, some users might view dates in the form (day/month/year)
while others prefer (year/month/day).
• Some views might include derived or calculated data.
– For example, a person’s age might be calculated from their date of birth
since storing their age would require it to be updated each year.
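The idea of external views can be sketched with SQL views. The example below is a minimal sketch, not part of the original notes: it uses Python's built-in sqlite3 module as a stand-in DBMS (the practicals use MySQL), and the table `staff`, the views `staff_dmy` and `staff_age`, and the sample row are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, dob TEXT)")
conn.execute("INSERT INTO staff VALUES ('S01', 'Jones', '1990-05-27')")

# View for users who prefer dates as day/month/year.
conn.execute(
    """
    CREATE VIEW staff_dmy AS
    SELECT staff_no, name, strftime('%d/%m/%Y', dob) AS dob
    FROM staff
    """
)

# View for users who prefer year/month/day and also want the age.
# The age is derived from the stored date of birth each time the view is read.
conn.execute(
    """
    CREATE VIEW staff_age AS
    SELECT staff_no,
           strftime('%Y/%m/%d', dob) AS dob,
           CAST(strftime('%Y', 'now') AS INTEGER)
             - CAST(strftime('%Y', dob) AS INTEGER)
             - (strftime('%m-%d', 'now') < strftime('%m-%d', dob)) AS age
    FROM staff
    """
)

dmy = conn.execute("SELECT dob FROM staff_dmy").fetchone()[0]
ymd, age = conn.execute("SELECT dob, age FROM staff_age").fetchone()
print(dmy, ymd, age)
```

Both views read the same stored record; only the presentation differs, and the age never has to be updated by hand.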
The Conceptual Level
• The conceptual level describes what data is stored in the database and
the relationships among the data.
• It is a complete view of the data requirements of the organisation that is
independent of any storage considerations.
• The conceptual level represents:
– All entities, their attributes, and their relationships.
– The constraints on the data.
– Security and integrity information.
• The conceptual level supports each external view, in that any data
available to a user must be contained in, or derivable from, the
conceptual level.
• The description of the conceptual level must not contain any storage-dependent details.
The Internal Level
• The internal level covers the physical representation of the database on
the computer (and may be specified in some programming language).
• It describes how the data is stored in the database in terms of particular
data structures and file organisations.
• The internal level is concerned with:
– Allocating storage space for data and indexes.
– Describing the forms that records will take when stored.
– Record placement. Assembling records into files.
– Data compression and encryption techniques.
• The internal level interfaces with the OS to place data on the storage
devices, build the indexes, retrieve the data, etc.
• Below the internal level is the physical level which is managed by the
OS under the direction of the DBMS. It deals with the mechanics of
physically storing data on a device such as a disk.
Database Schemas
• The overall description of a database is called the database schema.
• There are three different types of schema corresponding to the three
levels in the ANSI-SPARC architecture.
• The external schemas describe the different external views of the data.
– There may be many external schemas for a given database.
• The conceptual schema describes all the data items and relationships
between them, together with integrity constraints.
– There is only one conceptual schema per database.
• At the lowest level, the internal schema contains definitions of the
stored records, the methods of representation, the data fields, and
indexes.
– There is only one internal schema per database.
Mapping Between Schemas
• The DBMS is responsible for mapping between the three types of
schema (i.e. how they actually correspond with each other).
• It must also check the schemas for consistency.
– Each external schema must be derivable from the conceptual schema.
• Each external schema is related to the conceptual schema by the
external/conceptual mapping.
• This enables the DBMS to map data in the user’s view onto the
relevant part of the conceptual schema.
• A conceptual/internal mapping relates the conceptual schema to the
internal schema.
• This enables the DBMS to find the actual record or combination of
records in physical storage that constitute a logical record in the
conceptual schema.
Notes on the above example
• The two external views are based on the conceptual view.
– The Age field is derived from the DOB (Date of Birth) field.
– The Sno field is mapped onto the StaffNo field of the conceptual record.
• The conceptual level is mapped onto the internal level.
• The internal level contains a physical description of the structure for
the conceptual record expressed in a high-level language.
• Note that the order of the fields in the physical structure is different
from that of the conceptual record.
• The physical structure contains a “pointer”, next. This will be simply
the memory address at which the next record is stored. Thus the set of
staff records may be physically linked together to form a chain.
Data Independence -I
• A major objective of the ANSI-SPARC architecture is to provide data
independence meaning that upper levels are isolated from changes to
lower levels.
• There are two kinds of data independence:
• Logical data independence refers to the immunity of external schemas
to changes in the conceptual schema.
– Changes to the conceptual schema (adding/removing entities, attributes, or
relationships) should be possible without having to change existing
external schemas or rewrite application programs.
• Physical data independence refers to the immunity of the conceptual
schema to changes in the internal schema.
– Changes to the internal schema (using different storage structures or file
organisations) should be possible without having to change the conceptual
or external schemas.
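Both kinds of data independence can be demonstrated in a few lines. This is a sketch under stated assumptions: sqlite3 stands in for the DBMS, the view `staff_names` plays the role of an external schema, and the table, index and column names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_no TEXT PRIMARY KEY, name TEXT, branch TEXT)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?, ?)",
    [("S01", "Jones", "Harare"), ("S02", "Moyo", "Gweru")],
)

# External schema: a view that exposes staff_no under the name sno.
conn.execute("CREATE VIEW staff_names AS SELECT staff_no AS sno, name FROM staff")

query = "SELECT sno, name FROM staff_names ORDER BY sno"
before = conn.execute(query).fetchall()

# Change to the internal schema: add a new storage structure (an index).
conn.execute("CREATE INDEX idx_staff_branch ON staff (branch)")
after_index = conn.execute(query).fetchall()

# Change to the conceptual schema: add a new attribute.
conn.execute("ALTER TABLE staff ADD COLUMN email TEXT")
after_column = conn.execute(query).fetchall()

print(before == after_index == after_column)
```

The application query is never rewritten, yet it keeps returning the same rows after each change.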
Database Application Architectures
Two-tier client/server architectures have 2 essential components
1. A Client PC and
2. A Database Server
2-Tier Considerations:
• Client program accesses database directly
o Requires a code change to port to a different database
o Potential bottleneck for data requests
o High volume of traffic due to data shipping
• Client program executes application logic
o Limited by processing capability of client workstation (memory, CPU)
o Requires application code to be distributed to each client workstation
Two – Tier Pros and Cons
Advantages
Development Issues:
-Simple structure
-Easy to set up and maintain
Performance:
-Adequate performance for low to medium volume environments
-Business logic and database are physically close, which provides higher performance
Disadvantages
Development Issues:
-Complex application rules are difficult to implement in the database server, so more code is
required for the client
-Complex application rules are difficult to implement in the client and have poor performance
-Changes to business logic are not automatically enforced by a server; changes require new
client side software to be distributed and installed
-Not portable to other database server platforms
Performance:
-Inadequate performance for medium to high volume environments, since the database server is
required to perform business logic. This slows down database operations on the database server
RESEARCH MORE ON 2-TIER ARCHITECTURE
3-Tier client-server architectures have 3 essential components:
1. A Client PC
2. An Application Server
3. A Database Server
3-Tier Architecture Considerations:
• Client program contains presentation logic only
o Fewer resources needed for client workstation
o No client modification if database location changes
o Less code to distribute to client workstations
• One server handles many client requests
o More resources available for server program
o Reduces data traffic on the network
3 – Tier Pros and Cons
Advantages
Development Issues:
• Complex application rules easy to implement
in application server
• Business logic off-loaded from database
server and client, which improves
performance
• Changes to business logic automatically
enforced by server – changes require only
new application server software to be
installed
• Application server logic is portable to other
database server platforms by virtue of the
application software
Performance:
• Superior performance for medium to high
volume environments
Disadvantages
Development Issues:
• More complex structure
• More difficult to setup and maintain.
Performance:
• The physical separation of application
servers containing business logic
functions and database servers containing
databases may moderately affect
performance.
RESEARCH MORE ON 3-TIER ARCHITECTURE
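The separation of responsibilities in a 3-tier architecture can be sketched as follows. This is a simplified illustration only: in practice the three tiers run on separate machines and talk over a network, whereas here they are ordinary Python objects in one process. The class names, the `account` table and the no-overdraft rule are all assumptions made for the example.

```python
import sqlite3


class DatabaseServer:
    """Data tier: stores and retrieves data, with no business rules."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL)"
        )
        self.conn.execute("INSERT INTO account VALUES ('A1', 100.0)")

    def get_balance(self, acc_no):
        row = self.conn.execute(
            "SELECT balance FROM account WHERE acc_no = ?", (acc_no,)
        ).fetchone()
        return row[0]

    def set_balance(self, acc_no, balance):
        self.conn.execute(
            "UPDATE account SET balance = ? WHERE acc_no = ?", (balance, acc_no)
        )


class ApplicationServer:
    """Middle tier: the only place where business logic lives."""

    def __init__(self, db):
        self.db = db

    def withdraw(self, acc_no, amount):
        balance = self.db.get_balance(acc_no)
        if amount <= 0 or amount > balance:
            return False  # business rule: no overdrafts
        self.db.set_balance(acc_no, balance - amount)
        return True


def client_withdraw(app, acc_no, amount):
    """Client tier: presentation logic only."""
    return "Withdrawal accepted" if app.withdraw(acc_no, amount) else "Withdrawal refused"


db = DatabaseServer()
app = ApplicationServer(db)
first = client_withdraw(app, "A1", 30.0)
second = client_withdraw(app, "A1", 500.0)
print(first, second, db.get_balance("A1"))
```

If the overdraft rule changes, only `ApplicationServer` has to be replaced; the client code and the database are untouched, which is the development advantage listed above.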
Data Models
A data model is an abstract model that describes how data are represented and accessed. Data
models formally define data elements and relationships among data elements for a domain of
interest. According to Hoberman (2009), "A data model is a wayfinding tool for both business
and IT professionals, which uses a set of symbols and text to precisely explain a subset of real
information to improve communication within the organization and thereby lead to a more flexible
and stable application environment."
A data model explicitly determines the structure of data or structured data. Typical applications
of data models include database models, design of information systems, and enabling exchange of
data. Usually data models are specified in a data modeling language.
The three levels of data modeling (conceptual data model, logical data model, and physical data
model) were discussed previously (refer to the ANSI-SPARC architecture). Here we compare
these three types of data models. The table below compares their features:
Feature                Conceptual   Logical   Physical
Entity Names           ✓            ✓         -
Entity Relationships   ✓            ✓         -
Attributes             -            ✓         -
Primary Keys           -            ✓         ✓
Foreign Keys           -            ✓         ✓
Table Names            -            -         ✓
Column Names           -            -         ✓
Column Data Types      -            -         ✓
Below we show the conceptual, logical, and physical versions of a single data model
[Figure: Conceptual model design]
[Figure: Logical model design]
[Figure: Physical model design]
From the above, we can see that the complexity increases from conceptual to logical to physical.
This is why we always start with the conceptual data model (so we understand at a high level
what the different entities in our data are and how they relate to one another), then move on to
the logical data model (so we understand the details of our data without worrying about how they
will actually be implemented), and finally the physical data model (so we know exactly how to
implement our data model in the database of choice).
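To make the physical level concrete, the sketch below writes out a small physical model: table names, column names, column data types, primary keys and a foreign key. It is an illustration under assumptions: the `customer` and `orders` tables are invented, and sqlite3 is used as the database of choice (a MySQL version would differ in details such as data type names).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript(
    """
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        VARCHAR(50) NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        order_date  DATE NOT NULL
    );
    """
)

conn.execute("INSERT INTO customer VALUES (1, 'Jones')")
conn.execute("INSERT INTO orders VALUES (10, 1, '2010-09-01')")

# An order for a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, '2010-09-02')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```

At the conceptual level the same design would simply say "a customer places orders"; the keys and data types appear only as we move down to the logical and physical levels.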
RESEARCH ON KINDS OR TYPES OF DATA MODELS
Database Languages
These are of two types as discussed below:
Data Definition Languages
A Data Definition Language (DDL) is a computer language used for defining data structures.
In SQL, the DDL is the subset of statements that define and change the structure of the
database; the most common ones are briefly mentioned below:
Create statements - To make a new database, table, index, or stored query.
Drop statements - To destroy an existing database, table, index, or view.
Alter statements - To modify an existing database object.
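The three kinds of DDL statement can be tried out as shown in this minimal sketch. It assumes sqlite3 as the DBMS and an invented `course` table; the helper `table_names` and the catalogue query it uses are specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names():
    """List the tables currently defined, using SQLite's catalogue table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]


# CREATE: make a new table.
conn.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")
after_create = table_names()

# ALTER: modify the existing table by adding a column.
conn.execute("ALTER TABLE course ADD COLUMN credits INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(course)")]

# DROP: destroy the table.
conn.execute("DROP TABLE course")
after_drop = table_names()

print(after_create, columns, after_drop)
```

Note that none of these statements touches any rows of data; they change only the structure.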
Data Manipulation Language
Data Manipulation Language (DML) commands are used to manipulate the data in the database.
This is done by retrieving information from existing rows, entering new rows, changing
existing rows or removing unwanted rows from tables in the database.
The DML commands are: SELECT, INSERT, UPDATE and DELETE.
Data Manipulation Languages have their functional capability organized by the initial word in a
statement, which is almost always a verb. In the case of SQL, these verbs are:
 SELECT ... FROM ... WHERE ...
 INSERT INTO ... VALUES ...
 UPDATE ... SET ... WHERE ...
 DELETE FROM ... WHERE ...
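The four SQL verbs can be seen working together in the short sketch below. The `student` table and its rows are assumptions made for illustration, and sqlite3 is used in place of MySQL so that the example runs without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (reg_number TEXT PRIMARY KEY, name TEXT, programme TEXT)"
)

# INSERT INTO ... VALUES ...: enter new rows.
conn.execute("INSERT INTO student VALUES ('R001', 'Jones', 'HCS')")
conn.execute("INSERT INTO student VALUES ('R002', 'Moyo', 'HIS')")

# UPDATE ... SET ... WHERE ...: change an existing row.
conn.execute("UPDATE student SET programme = 'HCS' WHERE reg_number = 'R002'")

# DELETE FROM ... WHERE ...: remove an unwanted row.
conn.execute("DELETE FROM student WHERE reg_number = 'R001'")

# SELECT ... FROM ... WHERE ...: retrieve information from existing rows.
remaining = conn.execute(
    "SELECT reg_number, name, programme FROM student WHERE programme = 'HCS'"
).fetchall()
print(remaining)
```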
There are two types of DML:
 procedural: the user specifies what data is needed and how to get it
 nonprocedural: the user only specifies what data is needed
 Easier for user
 May not generate code as efficient as that produced by procedural languages
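The difference between the two styles can be sketched by answering the same question twice. This is an illustration under assumptions: the `account` table is invented, and a Python loop plays the part of a procedural language that visits records one at a time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance REAL)")
conn.executemany(
    "INSERT INTO account VALUES (?, ?)",
    [("A1", 50.0), ("A2", 250.0), ("A3", 900.0)],
)

# Procedural style: we state what we want AND how to get it,
# by fetching every record and testing each one ourselves.
procedural = []
for acc_no, balance in conn.execute(
    "SELECT acc_no, balance FROM account ORDER BY acc_no"
):
    if balance > 100:
        procedural.append(acc_no)

# Nonprocedural style: we state only what we want;
# the DBMS decides how to find the matching rows.
nonprocedural = [
    row[0]
    for row in conn.execute(
        "SELECT acc_no FROM account WHERE balance > 100 ORDER BY acc_no"
    )
]
print(procedural, nonprocedural)
```

Both approaches return the same accounts, but in the second one the search strategy is left entirely to the DBMS.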
Database Users and Administrators
Database Designers- they are responsible for identifying the data to be stored in the database
and for choosing appropriate structures to represent and store this data. These tasks are mostly
undertaken before the database is actually implemented and populated with data. It is the
responsibility of database designers to communicate with all prospective database users in order
to understand their requirements and to come up with a design that meets these requirements.
Database Administrators (DBAs)- the DBA is the chief administrator who oversees and manages the
database, the DBMS and related software. The DBA is responsible for authorizing access to the
database, for co-ordinating and monitoring its use and for acquiring software and hardware
resources as needed. He or she is accountable for problems such as breaches of security or poor
system response time.
End Users- these are people whose jobs require access to the database for querying, updating
and generating reports. There are several categories of end users as discussed below:
 Casual end users- occasionally access the database, but they may need different
information each time. They use a sophisticated database query language to specify their
requests and are typically middle or high level managers or other occasional browsers.
 Naive or parametric end users- make up a sizeable portion of database end users. Their
main job function revolves around constantly querying and updating the database, using
standard types of queries and updates called 'canned transactions' that have been
carefully programmed and tested. Examples of such users include bank tellers and
reservation clerks.
 Sophisticated end users- include engineers, scientists, business analysts and others who
familiarise themselves with the facilities of the DBMS so as to implement their
applications to meet their complex requirements.
 Standalone users- they maintain personal databases by using ready made program
packages that provide easy to use menu based or graphics based interfaces, for example a
tax package that stores a variety of personal financial data for tax purposes.
RESEARCH THE ROLE PLAYED BY EACH OF THE FOLLOWING GROUPS OF
PEOPLE:
 Data administrator
 system analysts and application programmers
 tool developers
 DBMS system designers and implementers
 operators and maintenance personnel
SUMMARY
In this chapter we defined a database as a collection of persistent data that can be shared and
interrelated. A DBMS was defined as a general purpose software system that facilitates the
processes of defining, constructing, manipulating, sharing and maintaining
databases among various users and applications. We identified several characteristics that
distinguish the database approach from traditional file processing applications. We discussed the
ANSI-SPARC architecture and how it reflects the different data models used in a DBMS. Lastly
we discussed the different database languages that are used and the main database users.
CHAPTER TWO- ENTITY RELATIONSHIP MODEL
The Entity-Relationship model is a data model for high-level descriptions of conceptual data
models, and it provides a graphical notation for representing such data models in the form of
entity-relationship diagrams. Such data models are typically used in the first stage of
information-system design; they are used, for example, to describe information needs and/or the
type of information that is to be stored in the database during the requirements analysis.
The ER data model employs three basic notions: entity sets, relationship sets and attributes.
Entity sets
Entity- is a 'thing' or 'discrete object' in the real world with an independent existence. An
entity may be an object with a physical existence (for example a person, car, house or employee)
or it may be an object with a conceptual existence (for example a company, job or university
course).
Entity Set- may be defined as a collection of entities all belonging to the same class.
Alternatively it may be defined as a set of entities of the same type that share the same
properties or attributes. The set of all persons who are customers at a given bank for example
can be defined as the entity set customer. The individual entities that constitute a set are said
to be the extension of the entity set. Thus all the individual bank customers are the extension of
the entity set customer.
Attributes
An entity is represented by attributes which are defined as descriptive properties possessed by
each member of an entity set for example the student entity may be described by the student
name, date of birth, registration number, degree programme etc.
Each entity has a value for each of the attributes for example the student entity may have the
value Jones for student name, the value 27-05-90 for date of birth and so on. The attribute
values that describe each entity become a major part of the data stored in the database.
For each attribute there is a set of permitted values called the 'domain' or 'value set' of that
attribute.
The domain of attribute student name might be the set of all text strings of a certain length.
There are three main ways of classifying attributes, as discussed below:
 Simple and composite attributes- in the example given above, the attributes are simple,
that is they are not divided into subparts (student name, date of birth etc). However,
composite attributes may be divided into subparts, for example the attribute student name
may be structured as a composite attribute consisting of first name, middle name and
surname; the attribute address may be structured as a composite attribute consisting of
street, city, country and postal code. Composite attributes help us to group together related
attributes, making the modeling cleaner.
 Single valued and multivalued attributes- in the examples given above, the
attributes have a single value for a particular entity, for example a student has only one
date of birth, one registration number and so on. However, there may be instances where
an attribute has a set of values for a specific entity, for example a person may have
several cell phone numbers. This type of attribute is said to be multivalued. Where
appropriate, upper and lower bounds may be placed on the number of values in a
multivalued attribute, for example one may limit the number of phone numbers recorded to
two.
 Stored and derived attributes- in some cases, two (or more) attribute values are related,
for example the age and DOB attributes of a person. For a particular person, the value
of age can be determined from the current (today's) date and the value of that person's
date of birth. The age attribute is hence called a derived attribute and is said to be
derivable from the DOB attribute, which is called a stored attribute.
An attribute takes a null value when an entity does not have a value for it. The null value may
indicate 'not applicable', that is the value does not exist for the entity. For example, one may not
have a middle name. Null can also designate that an attribute value is not known. An unknown
value may either be missing (the value does exist but we do not have that information) or not
known (we do not know whether or not the value actually exists).
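The attribute types and the null value described above can be sketched in code. The classes `Name` and `Student`, the sample student and the phone numbers are all hypothetical; this is one possible way of picturing the ideas, not a prescribed representation.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class Name:
    """Composite attribute: a name made up of subparts."""
    first_name: str
    surname: str
    middle_name: Optional[str] = None  # null: not every student has one


@dataclass
class Student:
    reg_number: str          # simple, single valued attribute
    name: Name               # composite attribute
    date_of_birth: date      # stored attribute
    phone_numbers: List[str] = field(default_factory=list)  # multivalued attribute

    def age(self, today: date) -> int:
        """Derived attribute: worked out from date_of_birth, never stored."""
        had_birthday = (today.month, today.day) >= (
            self.date_of_birth.month,
            self.date_of_birth.day,
        )
        return today.year - self.date_of_birth.year - (0 if had_birthday else 1)


student = Student(
    "R001",
    Name("Tendai", "Jones"),
    date(1990, 5, 27),
    ["0771000001", "0712000002"],
)
age_in_2010 = student.age(date(2010, 11, 1))
print(age_in_2010, student.name.middle_name, len(student.phone_numbers))
```

Here the middle name is null in the 'not applicable' sense, and the age is recomputed from the date of birth whenever it is needed.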
Relationship sets
A relationship is an association among several entities for example we can define a relationship
that associates student x with degree program z. We can say the student x is a student studying
for a degree z.
A relationship set is a set of relationships of the same type.