DEPARTMENT OF COMPUTER SCIENCE AND INFORMATION SYSTEMS
HCS 206 - MODELS OF DATABASE AND DATABASE DESIGN
MODULE by T.G. GWANZURA [email protected]

COURSE OUTLINE

General Information
Semester: August - November 2010
Date and Time: Tue 1200-1400 and Fri 0800-1000
Location: To be advised

Contact
Instructor: Ms T.G. Gwanzura
Office: 9
E-mail: [email protected]

Course Outline

Introduction
- Definition of terms: database; database management systems
- Database system vs file system
- ANSI-SPARC architecture; database application architectures, i.e. 2-tier and 3-tier architecture
- Data models
- Database languages
- Database users and administrators

Entity Relationship Model
- Definition of key terms, i.e. entity sets, relationship sets, attributes
- Constraints of an ER model: mapping cardinalities; participation constraints
- Keys: definition of different keys; super key; candidate key; primary key
- Constructing the Entity Relationship Diagram
- Unified Modeling Language (UML)

Relational Model
- Structure of a relational database
- Relational algebra and operations in relational algebra

SQL
- Structure of the SQL expression, i.e. select clause; where clause; from clause
- Creating databases, tables and performing various operations using SQL statements

Integrity and Security
- Types of integrity i.e.
referential integrity; entity integrity
- Database security: definition of database security; types of database security violations; authorization; views; privileges; audit trails
- Encryption and authentication: encryption techniques; authentication

Relational Database Design
- First Normal Form
- Functional Dependencies and Full Functional Dependencies
- Second Normal Form
- Third Normal Form

Distributed Databases
- Homogeneous and Heterogeneous Databases
- Distributed Data Storage
- Distributed Transactions
- Distributed Query Processing
- Heterogeneous Distributed Database

Practical Sessions using MySQL
- Creating databases
- Creating tables
- Identifying suitable columns for the tables, specifying attribute type and field length
- Inserting records and identifying the primary keys
- Modifying data

RECOMMENDED READINGS
Elmasri et al (c2006), Fundamentals of Database Systems, Dorling Kindersley (India) Pvt. Ltd.
Silberschatz et al (2002), Database System Concepts 4th Edition, McGraw-Hill Higher Education.
Mannino, M.V. (c2004), Database Design, Application Development and Administration 2nd Edition, McGraw-Hill.
Date, C. J. (c2000), An Introduction to Database Systems 7th Edition, Pearson Education India.

These are not the only books to be read. Any other book on database systems may be consulted, and you are advised to read any electronic books on database systems.

CHAPTER ONE - INTRODUCTION

Databases and database systems have become an essential component of everyday life in modern society. Most of us have had several encounters that involve some interaction with a database, for example making a deposit or withdrawing funds from the bank, or making a reservation (our activities involve someone or some computer program accessing a database). It is therefore imperative that we start from the basics of traditional database applications.
In this chapter we will first define what a database is and then give definitions of other basic terms. Next we will discuss the main characteristics of database systems and the types of personnel whose jobs involve using and interacting with them. Database architectures, data models and database languages will be briefly discussed as part of the introduction.

What is a database?
- A database consists of an organized collection of data for one or more uses, typically in digital form.
- A database is an application that manages data and allows fast storage and retrieval of that data.

The key word in the above definitions is data, which refers to known facts that can be recorded and that have implicit meaning, for example the names, telephone numbers and addresses of people you know. This information may have been stored in your phone or on a computer using software such as Microsoft Access and Excel.

Alternatively, a database may be defined as a collection of persistent data that can be shared and interrelated. This definition points out some important properties that a database possesses, as discussed below:
- Persistent: the data resides on stable storage such as magnetic disk or tape. This data has been accepted by the Database Management System (DBMS) for entry into the database, and it can subsequently be removed from the database only by some explicit request to the DBMS.
- Shared: a database can have multiple uses and users. It provides a common memory for multiple functions in an organisation, for example payroll calculations and performance evaluations. Many users can access the database at the same time, for example when making airline reservations.
- Interrelated: data stored as separate units can be connected to provide a whole picture, for example a customer database relates customer data (name, address etc).

In discussing the properties of a database, an important term was introduced: Database Management System.
It is therefore imperative that we discuss this concept further as outlined below:

What is a Database Management System?
It can be defined as:
- A software system that facilitates the creation, maintenance and use of an electronic database. (wordnetweb.princeton.edu/perl/webwn)
- A software system used to access and retrieve data stored in a database. (NARA's Managing Electronic Records Instructional Guide, www.ischool.utexas.edu/~scisco/lis389c.5/email/gloss.html)
- A management system associated with a structured set of data that allows the data to be accessed in a variety of ways. In a Relational DBMS, the relationships between the data elements form keys to reduce the amount of data needing to be held and to improve navigation and access speeds. (www.infodiv.unimelb.edu.au/knowledgebase/itservices/a-z/d.html)
- A set of computer programs for organizing the information in a database. A DBMS supports the structuring of the database in a standard format and provides tools for data input, verification, storage, retrieval, query, and manipulation. (celiang.tongji.edu.cn/weian/3S/GIS/ESRI/AboutGIS/a_d.html)
- A collection of programs that enables users to create and maintain a database.

The definitions above highlight that the DBMS is a general purpose software system that facilitates the processes of defining, constructing, manipulating, sharing and maintaining databases among various users and applications.
Defining a database involves specifying the data types, structures and constraints for the data to be stored in the database. Constructing the database is the process of storing the data itself on some storage medium that is controlled by the DBMS. Manipulating a database includes functions such as querying the database to retrieve specific data, updating the database to reflect changes in the miniworld and generating reports from the data. Sharing a database allows multiple users and programs to access the database concurrently. Maintenance means the DBMS must allow the database system to evolve as requirements change over time.

Database management systems can be categorized according to the database model that they support (such as relational or XML), the type(s) of computer they support (such as a server cluster or a mobile phone), the query language(s) that access the database (such as SQL or XQuery), performance trade-offs (such as maximum scale or maximum speed), or other criteria. Some DBMSs cover more than one entry in these categories, e.g. by supporting multiple query languages.

Why use a database?
Why should one use a database system? What are its advantages over a file-based system?

Limitations of File-based Approach
1. File processing systems store groups of records in separate files…
2. Separation and isolation of data
- Each program maintains its own set of data.
- Users of one program may be unaware of potentially useful data held by other programs.
3. Duplication of data (redundancy)
- The same data is held by different programs.
- Space is wasted, and the same item may have different values and/or different formats in different files. This is a data integrity problem that produces inconsistent results.
4. Data dependence / application program dependency
– File structure/format and records are defined in the program/application code.
– Changes in formats must be reflected in the code.
– Such changes are time-consuming and error-prone tasks.
5.
Incompatible file formats
– Programs are written in different languages and so cannot easily access each other's files; in other words, files written by programs in different programming languages cannot readily be combined or compared.
6. Fixed queries / proliferation of application programs
- Programs are written to satisfy particular functions. Any new requirement needs a new program.
7. Difficulty of representing data from the user's viewpoint
- Relationships among records are not readily represented or processed.

Database Approach
Arose because:
– Definition of data was embedded in application programs, rather than being stored separately and independently.
– There was no control over access and manipulation of data beyond that imposed by application programs.
Result:
– The database and Database Management System (DBMS).

Database Approach
- Integrated data
  - All the application data is stored in a database.
  - The programmer is not responsible for co-ordinating files; the DBMS will do it.
- Less duplication of data
  - Data is stored in only one place.
  - Fewer data integrity problems.
- Program/data independence
  - Record formats are stored in the database itself, so data is accessed through the DBMS, not directly by application programs.
  - This minimises the impact of data format changes on application programs.
- Easier representation of the user's view of data.
- Controlled access to database

Controlled access to database may include:
– A security system.
– An integrity system.
– A concurrency control system.
– A recovery control system.
– A user-accessible catalog.

Benefits of the database approach:
- Minimal data redundancy and improved data consistency
The concept of normalisation ensures that there is reduced data redundancy in a database.
- Ease of access to data / improved data accessibility and responsiveness
Data in a database is interrelated and is in the same format. This facilitates better data retrieval for general use.
- Increased development productivity / ease of application development / reduced program maintenance and application development time
New application programs to manipulate data can be written with ease because the data is integrated and is in the same format. Designing and implementing a new database from scratch may, however, take more time.
- Improved data sharing
Database systems allow multiple users to access and update data in a consistent manner. They also allow different views of the same data.
- Enforcement of standards / improved security and integrity
- Improved data quality
- Availability of up-to-date information
- Flexibility / program-data independence
Data is independent from applications and shared by multiple users and applications. It should be possible to effect changes to an application program that accesses the data without having to change the structure of the data itself. Similarly, it should be possible to change the structure of the data without affecting the application program that operates on it.
- Persistence
It is possible to maintain data over long periods of time, independent of any program that accesses it.
- Resilience
The ability of data to survive hardware and software failures without sustaining loss or becoming inconsistent can be provided for in a database environment.

Database Architectures: The ANSI-SPARC Architecture
• The architecture of most commercial DBMSs is based on the ANSI-SPARC architecture (1975).
– American National Standards Institute (ANSI)
– Standards Planning And Requirements Committee (SPARC)
• Although this never became a formal standard, it is useful to help understand the functionality of a typical DBMS.
• The ANSI-SPARC model of a database identifies three distinct levels at which data items can be described.
• These levels form a three-level architecture comprising:
– an external level,
– a conceptual level, and
– an internal level.
The objective of the three-level architecture is to separate the users' view(s) of the database from the way that it is physically represented. This is desirable for the following reasons:
1. It allows independent customised user views.
– Each user should be able to access the same data, but have a different customised view of the data. These should be independent: changes to one view should not affect others.
2. It hides the physical storage details from users.
– Users should not have to deal with physical database storage details. They should be allowed to work with the data itself, without concern for how it is physically stored.
3. The database administrator should be able to change the database storage structures without affecting the users' views.
– From time to time rationalisations or other changes to the structure of an organisation's data will be required.
4. The internal structure of the database should be unaffected by changes to the physical aspects of the storage.
– For example, a changeover to a new disk.
5. The database administrator should be able to change the conceptual or global structure of the database without affecting the users.
– This should be possible while still maintaining the desired individual users' views.

The External Level
• The external level represents the user's view of the database.
– It consists of a number of different views of the database, potentially one for each user.
• It describes the part of the database that is relevant to a particular user.
– For example, large organisations may have finance and stock control departments.
– Workers in finance will not usually view stock details, as they are more concerned with the accounting side of things.
– Thus, workers in each department will require a different user interface to the information stored in the database.
• Views may provide different representations of the same data.
– For example, some users might view dates in the form (day/month/year) while others prefer (year/month/day).
• Some views might include derived or calculated data.
– For example, a person's age might be calculated from their date of birth, since storing their age would require it to be updated each year.

The Conceptual Level
• The conceptual level describes what data is stored in the database and the relationships among the data.
• It is a complete view of the data requirements of the organisation that is independent of any storage considerations.
• The conceptual level represents:
– All entities, their attributes, and their relationships.
– The constraints on the data.
– Security and integrity information.
• The conceptual level supports each external view, in that any data available to a user must be contained in, or derivable from, the conceptual level.
• The description of the conceptual level must not contain any storage-dependent details.

The Internal Level
• The internal level covers the physical representation of the database on the computer (and may be specified in some programming language).
• It describes how the data is stored in the database in terms of particular data structures and file organisations.
• The internal level is concerned with:
– Allocating storage space for data and indexes.
– Describing the forms that records will take when stored.
– Record placement: assembling records into files.
– Data compression and encryption techniques.
• The internal level interfaces with the operating system (OS) to place data on the storage devices, build the indexes, retrieve the data, etc.
• Below the internal level is the physical level, which is managed by the OS under the direction of the DBMS. It deals with the mechanics of physically storing data on a device such as a disk.

Database Schemas
• The overall description of a database is called the database schema.
• There are three different types of schema corresponding to the three levels in the ANSI-SPARC architecture.
• The external schemas describe the different external views of the data.
– There may be many external schemas for a given database.
• The conceptual schema describes all the data items and relationships between them, together with integrity constraints.
– There is only one conceptual schema per database.
• At the lowest level, the internal schema contains definitions of the stored records, the methods of representation, the data fields, and indexes.
– There is only one internal schema per database.

Mapping Between Schemas
• The DBMS is responsible for mapping between the three types of schema (i.e. how they actually correspond with each other).
• It must also check the schemas for consistency.
– Each external schema must be derivable from the conceptual schema.
• Each external schema is related to the conceptual schema by the external/conceptual mapping.
• This enables the DBMS to map data in the user's view onto the relevant part of the conceptual schema.
• A conceptual/internal mapping relates the conceptual schema to the internal schema.
• This enables the DBMS to find the actual record or combination of records in physical storage that constitute a logical record in the conceptual schema.

Notes on the above example
• The two external views are based on the conceptual view.
– The Age field is derived from the DOB (Date of Birth) field.
– The Sno field is mapped onto the StaffNo field of the conceptual record.
• The conceptual level is mapped onto the internal level.
• The internal level contains a physical description of the structure for the conceptual record, expressed in a high-level language.
• Note that the order of the fields in the physical structure is different from that of the conceptual record.
• The physical structure contains a "pointer", next. This is simply the memory address at which the next record is stored. Thus the set of staff records may be physically linked together to form a chain.
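
The mappings described in these notes can be sketched in a few lines of Python using the standard sqlite3 module. This is only an illustrative sketch under stated assumptions: SQLite stands in for the DBMS, and the table name staff, the view names view_one and view_two, and every column other than the Sno/StaffNo and Age/DOB pair mentioned in the notes are hypothetical. The table plays the role of the conceptual record, the two views play the role of external views, and the index is one example of an internal-level storage decision.

```python
import sqlite3


def build_staff_database():
    """Create a conceptual-level table and two external views over it."""
    conn = sqlite3.connect(":memory:")

    # Conceptual schema: one logical record per member of staff.
    conn.execute(
        """CREATE TABLE staff (
               staff_no  TEXT PRIMARY KEY,
               fname     TEXT,
               lname     TEXT,
               dob       TEXT,   -- ISO date such as '1990-05-27'
               salary    REAL,
               branch_no TEXT)"""
    )

    # External view 1: staff_no is exposed as sno, and age is derived
    # from dob (whole years between dob and the current UTC date).
    conn.execute(
        """CREATE VIEW view_one AS
           SELECT staff_no AS sno, fname, lname,
                  (strftime('%Y%m%d', 'now') - strftime('%Y%m%d', dob)) / 10000 AS age,
                  salary
           FROM staff"""
    )

    # External view 2: a different user sees only branch-related fields.
    conn.execute(
        """CREATE VIEW view_two AS
           SELECT staff_no, lname, branch_no FROM staff"""
    )

    # Internal-level decision (assumed here): an index on branch_no.
    # Neither view has to change when it is added or dropped.
    conn.execute("CREATE INDEX idx_staff_branch ON staff (branch_no)")
    return conn


if __name__ == "__main__":
    conn = build_staff_database()
    conn.execute(
        "INSERT INTO staff VALUES ('S21', 'John', 'White', '1965-10-01', 30000, 'B005')"
    )
    print(conn.execute("SELECT sno, age FROM view_one").fetchall())
    print(conn.execute("SELECT * FROM view_two").fetchall())
```

Each query against a view is translated by the DBMS into a query against the staff table, which is the external/conceptual mapping at work; how SQLite lays the rows out on disk is hidden from both views.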
Data Independence
• A major objective of the ANSI-SPARC architecture is to provide data independence, meaning that upper levels are isolated from changes to lower levels.
• There are two kinds of data independence:
• Logical data independence refers to the immunity of external schemas to changes in the conceptual schema.
– Changes to the conceptual schema (adding/removing entities, attributes, or relationships) should be possible without having to change existing external schemas or rewrite application programs.
• Physical data independence refers to the immunity of the conceptual schema to changes in the internal schema.
– Changes to the internal schema (using different storage structures or file organisations) should be possible without having to change the conceptual or external schemas.

Database Application Architectures
Two-tier client/server architectures have 2 essential components:
1. A Client PC
2. A Database Server

2-Tier Considerations:
• Client program accesses database directly
o Requires a code change to port to a different database
o Potential bottleneck for data requests
o High volume of traffic due to data shipping
• Client program executes application logic
o Limited by processing capability of client workstation (memory, CPU)
o Requires application code to be distributed to each client workstation

Two-Tier Pros and Cons
Advantages - Development issues:
- Simple structure
- Easy to set up and maintain
Disadvantages - Development issues:
- Complex application rules are difficult to implement in the database server, which requires more code for the client
- Complex application rules are difficult to implement in the client and have poor performance
- Changes to business logic are not automatically enforced by a server; changes require new client-side software to be distributed and installed
- Not portable to other database server platforms
Advantages - Performance:
- Adequate performance for low to medium volume environments
- Business logic and database are physically close, which provides
higher performance.
Disadvantages - Performance:
- Inadequate performance for medium to high volume environments, since the database server is required to perform business logic. This slows down database operations on the database server.

RESEARCH MORE ON 2-TIER ARCHITECTURE

3-Tier client-server architectures have 3 essential components:
1. A Client PC
2. An Application Server
3. A Database Server

3-Tier Architecture Considerations:
• Client program contains presentation logic only
o Fewer resources needed for client workstation
o No client modification if database location changes
o Less code to distribute to client workstations
• One server handles many client requests
o More resources available for server program
o Reduces data traffic on the network

3-Tier Pros and Cons
Advantages
Development Issues:
• Complex application rules are easy to implement in the application server
• Business logic is off-loaded from database server and client, which improves performance
• Changes to business logic are automatically enforced by the server; changes require only new application server software to be installed
• Application server logic is portable to other database server platforms by virtue of the application software
Performance:
• Superior performance for medium to high volume environments
Disadvantages
Development Issues:
• More complex structure
• More difficult to set up and maintain
Performance:
• The physical separation of application servers containing business logic functions and database servers containing databases may moderately affect performance.

RESEARCH MORE ON 3-TIER ARCHITECTURE

Data Models
A data model is an abstract model that describes how data are represented and accessed. Data models formally define data elements and relationships among data elements for a domain of interest.
According to Hoberman (2009), "A data model is a wayfinding tool for both business and IT professionals, which uses a set of symbols and text to precisely explain a subset of real information to improve communication within the organization and thereby lead to a more flexible and stable application environment." A data model explicitly determines the structure of data or structured data. Typical applications of data models include database models, design of information systems, and enabling exchange of data. Usually data models are specified in a data modeling language.

The three levels of data modeling (conceptual data model, logical data model and physical data model) were discussed previously (refer to the ANSI-SPARC architecture). Here we will compare these three types of data models. The table below compares the different features:

Feature               Conceptual  Logical  Physical
Entity Names          ✓           ✓
Entity Relationships  ✓           ✓
Attributes                        ✓
Primary Keys                      ✓        ✓
Foreign Keys                      ✓        ✓
Table Names                                ✓
Column Names                               ✓
Column Data Types                          ✓

Below we show the conceptual, logical, and physical versions of a single data model.
[Figures: CONCEPTUAL MODEL DESIGN; LOGICAL MODEL DESIGN; PHYSICAL MODEL DESIGN]

From the above, we can see that the complexity increases from conceptual to logical to physical. This is why we always start with the conceptual data model (so we understand at a high level what the different entities in our data are and how they relate to one another), then move on to the logical data model (so we understand the details of our data without worrying about how they will actually be implemented), and finally the physical data model (so we know exactly how to implement our data model in the database of choice).

RESEARCH ON KINDS OR TYPES OF DATA MODELS

Database Languages
These are of two types as discussed below:

Data Definition Languages
A Data Definition Language (DDL) is a computer language used for defining data structures.
In SQL, the DDL is the subset of statements that define and change the structure of database objects. The most common ones are briefly mentioned below:
- Create statements: to make a new database, table, index, or stored query.
- Drop statements: to destroy an existing database, table, index, or view.
- Alter statements: to modify an existing database object.

Data Manipulation Language
Data Manipulation Language (DML) commands are used to manipulate the data in the database. This is done by retrieving information from existing rows, entering new rows, changing existing rows or removing unwanted rows from tables in the database. The DML commands are SELECT, INSERT, UPDATE and DELETE.

Data Manipulation Languages have their functional capability organized by the initial word in a statement, which is almost always a verb. In the case of SQL, these verbs are:
SELECT ... FROM ... WHERE ...
INSERT INTO ... VALUES ...
UPDATE ... SET ... WHERE ...
DELETE FROM ... WHERE ...

There are two types of DML:
- Procedural: the user specifies what data is needed and how to get it.
- Nonprocedural: the user only specifies what data is needed. This is easier for the user, but may not generate code as efficient as that produced by procedural languages.

Database Users and Administrators
Database Designers- they are responsible for identifying the data to be stored in the database and for choosing appropriate structures to represent and store this data. These tasks are mostly undertaken before the database is actually implemented and populated with data. It is the responsibility of database designers to communicate with all prospective database users in order to understand their requirements and to come up with a design that meets these requirements.

Database Administrators- the database administrator (DBA) is the chief administrator who oversees and manages the database, the DBMS and related software. The DBA is responsible for authorizing access to the database, for co-ordinating and monitoring its use and for acquiring software and hardware resources as needed.
He or she is accountable for problems such as breach of security or poor system response.

End Users- these are people whose jobs require access to the database for querying, updating and generating reports. There are several categories of end users as discussed below:
- Casual end users- occasionally access the database, but they may need different information each time. They use a sophisticated database query language to specify their requests and are typically middle or high level managers or other occasional browsers.
- Naive or parametric end users- make up a sizeable portion of database end users. Their main job function revolves around constantly querying and updating the database, using standard types of queries and updates called 'canned transactions' that have been carefully programmed and tested. Examples of such users include bank tellers and reservation clerks.
- Sophisticated end users- include engineers, scientists, business analysts and others who familiarise themselves with the facilities of the DBMS so as to implement applications that meet their complex requirements.
- Standalone users- maintain personal databases by using ready-made program packages that provide easy-to-use menu-based or graphics-based interfaces, for example a tax package that stores a variety of personal financial data for tax purposes.

RESEARCH ON THE ROLE PLAYED BY EACH OF THE FOLLOWING GROUPS OF PEOPLE:
- Data administrator
- System analysts and application programmers
- Tool developers
- DBMS system designers and implementers
- Operators and maintenance personnel

SUMMARY
In this chapter we defined a database as a collection of persistent data that can be shared and interrelated.
A DBMS is defined as a general purpose software system that facilitates the processes of defining, constructing, manipulating, sharing and maintaining databases among various users and applications. We identified several characteristics that distinguish the database approach from traditional file processing applications. We discussed the ANSI-SPARC architecture and how it reflects the different data models used in a DBMS. Lastly we discussed the different database languages that are used and the main database users.

CHAPTER TWO - ENTITY RELATIONSHIP MODEL

The Entity-Relationship model is a data model for high-level descriptions of conceptual data models, and it provides a graphical notation for representing such data models in the form of entity-relationship diagrams. Such data models are typically used in the first stage of information-system design; they are used, for example, to describe information needs and/or the type of information that is to be stored in the database during the requirements analysis. The ER data model employs three basic notions: entity sets, relationship sets and attributes.

Entity sets
Entity- a 'thing' or 'discrete object' in the real world with an independent existence. An entity may be an object with a physical existence (for example a person, car, house or employee) or it may be an object with a conceptual existence (for example a company, job or university course).

Entity Set- may be defined as a collection of entities all belonging to the same class. Alternatively, it may be defined as a set of entities of the same type that share the same properties or attributes. For example, the set of all persons who are customers at a given bank can be defined as the entity set customer. The individual entities that constitute a set are said to be the extension of the entity set. Thus all the individual bank customers are the extension of the entity set customer.
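
The distinction between an entity, an entity set and its extension can be illustrated with a small Python sketch. It is not part of the ER notation itself; the class name Customer, its three attributes and the sample values are all hypothetical and chosen only to mirror the bank example above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Customer:
    """One entity: a single bank customer (attribute names are assumed)."""
    customer_id: str
    name: str
    city: str


# The entity set 'customer'. The individual Customer objects it holds
# at any moment are its extension.
customer_entity_set = {
    Customer("C001", "Jones", "Gweru"),
    Customer("C002", "Moyo", "Harare"),
    Customer("C003", "Ncube", "Bulawayo"),
}


def shared_attributes(entity_type):
    """Return the attribute names that every entity of this type shares."""
    return [f.name for f in fields(entity_type)]


if __name__ == "__main__":
    print(shared_attributes(Customer))
    print(len(customer_entity_set), "entities in the extension")
```

Adding or removing a customer changes the extension, while the entity set itself (the type and its attributes) stays the same.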
Attributes
An entity is represented by attributes, which are defined as descriptive properties possessed by each member of an entity set. For example, the student entity may be described by the student name, date of birth, registration number, degree programme etc. Each entity has a value for each of the attributes; for example, a student entity may have the value Jones for student name, the value 27-05-90 for date of birth and so on. The attribute values that describe each entity become a major part of the data stored in the database. For each attribute there is a set of permitted values called the 'domain' or 'value set' of that attribute. The domain of the attribute student name might be the set of all text strings of a certain length.

There are 3 main attribute types as discussed below:

Simple and composite attributes- in the example given above, the attributes are simple, that is, they are not divided into subparts (student name, date of birth etc). Composite attributes, however, may be divided into subparts. For example, the attribute student name may be structured as a composite attribute consisting of first name, middle name and surname; the attribute address may be structured as a composite attribute consisting of street, city, country and postal code. Composite attributes help us to group together related attributes, making the modeling cleaner.

Single valued and multivalued attributes- in the examples given above, the attributes have a single value for a particular entity; for example, a student has only one date of birth, one registration number and so on. However, there may be instances where an attribute has a set of values for a specific entity; for example, a person may have several cell phone numbers. This type of attribute is said to be multivalued. Where appropriate, upper and lower bounds may be placed on the number of values in a multivalued attribute; for example, one may limit the number of phone numbers recorded to two.
Stored and derived attributes- in some cases, two (or more) attribute values are related, for example the age and DOB attributes of a person. For a particular person, the value of age can be determined from the current (today's) date and the value of that person's date of birth. The Age attribute is hence called a derived attribute and is said to be derivable from the DOB attribute, which is called a stored attribute.

An attribute takes a null value when an entity does not have a value for it. The null value may indicate 'not applicable', that is, the value does not exist for the entity. For example, one may not have a middle name. Null can also designate that an attribute value is not known. An unknown value may either be missing (the value does exist but we do not have that information) or not known (we do not know whether or not the value actually exists).

Relationship sets
A relationship is an association among several entities. For example, we can define a relationship that associates student x with degree programme z: we can say that student x is studying for degree z. A relationship set is a set of relationships of the same type.
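
The attribute types and the relationship set described above can be put together in a short Python sketch. It is a rough illustration under stated assumptions rather than a prescribed design: the class names Name and Student, the field names, the sample student and the limit MAX_PHONES are all hypothetical, with the limit of two taken from the phone number example in the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

MAX_PHONES = 2  # assumed upper bound on the multivalued attribute


@dataclass(frozen=True)
class Name:
    """Composite attribute: a name divided into subparts."""
    first: str
    surname: str
    middle: Optional[str] = None  # null: not applicable or not known


@dataclass(frozen=True)
class Student:
    reg_number: str               # simple, single-valued attribute
    name: Name                    # composite attribute
    dob: date                     # stored attribute
    phones: Tuple[str, ...] = ()  # multivalued attribute

    def __post_init__(self):
        # Enforce the upper bound placed on the multivalued attribute.
        if len(self.phones) > MAX_PHONES:
            raise ValueError("at most %d phone numbers allowed" % MAX_PHONES)

    def age(self, today: date) -> int:
        """Derived attribute: computed from dob, never stored."""
        had_birthday = (today.month, today.day) >= (self.dob.month, self.dob.day)
        return today.year - self.dob.year - (0 if had_birthday else 1)


# A relationship set: each pair associates one student (by registration
# number) with one degree programme.
studies = {
    ("R101", "BSc Computer Science"),
    ("R102", "BSc Information Systems"),
}

if __name__ == "__main__":
    jones = Student("R101", Name("Tariro", "Jones"), date(1990, 5, 27), ("0771", "0712"))
    print(jones.age(date(2010, 11, 1)), jones.name.middle)
    print(("R101", "BSc Computer Science") in studies)
```

Keeping age as a method rather than a field reflects the point made above: the stored value is the date of birth, and the age is worked out whenever it is needed.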