Define the scope and function of the enterprise you have selected

CS 5423 Project Part 1 by Hussachai Puripunpinyo
Step 1
Define the scope and function of the enterprise you have selected. Identify the
requirements of your database.
The system that I have selected is a blog system, because many people today keep their own blog
to share opinions, knowledge, or even a personal diary. Enterprises in the blogging business include
Blogger (by Google) and WordPress. For this system, I am trying to keep it lean and include only the
important features.
Functional requirements.
1. The blog must be able to authenticate the user against credentials stored in the database and
to authorize the user in order to limit what the user can do. Passwords must not be stored in
plain text; they must be hashed with a standard hash algorithm such as SHA-1 or SHA-2.
2. Users can customize their profile. The profile of each user should be extensible. Examples of
regular profile fields are first name, last name, and URL. Each user also has a system-specific
profile, such as password expiration date and locale. The common name for both personal and
system profile entries is settings.
3. Users can create a new blog post and edit an existing one. The owner of a post can make it
public or private by setting the post status. A blog post can have many comments, or none if
the owner decides to disable comments. Blog posts can be accessed either chronologically or
through a pretty URL (slug).
4. Comments can be nested. In other words, a user can reply to a specific comment. If the user
clicks the reply link under the blog post, the user is replying to the post, not to another
comment. The comment rule of a blog post controls commenting: authenticated users only,
anonymous comments allowed, or disabled.
5. A category is like a directory in a file system, and it can be nested. A blog post is like a
file: it can reside in only one directory (category). Categories keep blog posts organized by
grouping similar posts together.
6. A tag is a short string that gives quick access to blog posts. Tags are similar to categories
but simpler and flat: a tag cannot be nested, one blog post can have many tags, and one tag can
be attached to many blog posts. The set of tags is displayed dynamically in the side bar or at
any other position in the blog. The font size of each tag varies with the number of blog posts
that refer to it.
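The tag sizing rule in item 6 can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names, the linear scaling, and the 12 to 32 px range are all assumptions chosen for the example. Only the idea that the size grows with the number of posts referring to the tag comes from the requirement.

```python
def tag_font_size(ref_count, min_count, max_count, min_px=12, max_px=32):
    """Map a tag's post count to a font size by linear interpolation (assumed rule)."""
    if max_count == min_count:
        # All tags are equally popular, so use the smallest size for all of them.
        return min_px
    ratio = (ref_count - min_count) / (max_count - min_count)
    return round(min_px + ratio * (max_px - min_px))


def tag_cloud(ref_counts):
    """Return {tag name: font size in px} for a {tag name: ref_count} mapping."""
    if not ref_counts:
        return {}
    low, high = min(ref_counts.values()), max(ref_counts.values())
    return {name: tag_font_size(count, low, high) for name, count in ref_counts.items()}
```

For example, `tag_cloud({"mysql": 10, "java": 0, "sql": 5})` gives the most used tag the largest size and the unused tag the smallest one.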
Database design aspect.
The database must use UTF-8 encoding to support multiple languages. To be more specific, I used the
utf8 character set with the utf8_general_ci collation (MySQL's default collation for utf8).
I decided to use surrogate keys in the database design because any attribute in the database may
change except the ID, so deriving the key from ordinary attributes is not a good idea. A surrogate key
can also simplify the program and improve database performance in some cases. For example, indexing a
variable-length character column is slower than indexing a number, and it consumes more memory as
well.
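To illustrate why a stable surrogate key helps, the sketch below changes a user's email without touching any referencing row. It uses Python's built-in sqlite3 module with an in-memory database purely so that it runs anywhere; the project itself targets MySQL, and the reduced tables and sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE blog_users (
        id    INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE
    );
    CREATE TABLE blog_posts (
        id        INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES blog_users(id),
        title     TEXT NOT NULL
    );
""")
conn.execute("INSERT INTO blog_users (id, email) VALUES (1, 'old@example.com')")
conn.execute("INSERT INTO blog_posts (id, author_id, title) VALUES (1, 1, 'Hello')")

# The user changes email; only one row in blog_users is updated.
conn.execute("UPDATE blog_users SET email = 'new@example.com' WHERE id = 1")

# The post still resolves to its author through the unchanged surrogate key.
author_email = conn.execute(
    "SELECT u.email FROM blog_posts p JOIN blog_users u ON u.id = p.author_id "
    "WHERE p.id = 1"
).fetchone()[0]
```

Had email been the primary key, the same change would also have to be propagated to every row that refers to the user.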
The target database is MySQL because I am required to use the one on the CSX server. The storage
engine is InnoDB because it is fully ACID compliant and supports transactions and foreign keys.
Most attribute names are not prefixed with the table name because I think it is unnecessary and
cumbersome: the table name already acts as the namespace for its attributes.
Because CSX limits each student to one database, I decided to prefix table names with the system
name, “blog_”, to separate this project’s tables from the tables of other systems added in the future.
Use cases
- After a user is created, the user can log in to the system. The email is used as the username.
  The password is used for authentication and the role is used for authorization.
- The email is used as the username, and users can change their profile, including the email.
- If the user's role is admin, he or she can do anything in the system.
- If the user's role is author, he or she can create new blog posts in addition to everything the
  user role can do.
- If the user's role is user, he or she can comment on blog posts whose comments are limited to
  authenticated users only.
- Anyone can comment on a blog post, even without being a member, except on blog posts whose
  comment rule allows authenticated users only. Unregistered users may choose not to enter a
  name when they comment (if they are allowed to comment); those users are shown as
  “Anonymous”.
- A comment can refer to another comment. We call this feature “reply to”.
- Users with the author role can delete blog posts that they created; all associated comments
  are removed as well.
- Blog posts are displayed in reverse chronological order, with the latest one at the top.
- A specific blog post can be accessed in 2 ways. For example,
  http://somedomain.com/blog/post/ID (ID is the id of blog post)
  http://somedomain.com/blog/post/slug (slug is the preferred way)
- Blog posts are displayed as a list when the user accesses a URL in one of the following formats:
  http://somedomain.com/blog/posts/dateformat
  http://somedomain.com/blog/posts/category/category1/category2/...
  http://somedomain.com/blog/posts/tag
- Admin can create, edit, or delete categories. A category can be placed under another category.
  The concept is like the directory structure in a file system, where a blog post is like a file.
  When admin deletes a category, all categories under it are deleted as well. However, admin
  cannot delete a category that is referred to by a blog post; only empty categories can be
  deleted.
- When an author adds a tag to a blog post, the counter associated with that tag is incremented
  by one. When an author removes a tag from a blog post, the counter is decremented by one. The
  counter is used to change the size of the tag label displayed in the side bar.
- Whatever other roles can do, admin can do.
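The role rules above can be sketched as a small permission check. The role names admin, author, and user come from the use cases; the "anonymous" level, the action names, and the `can` function are hypothetical and only illustrate that each role inherits the abilities of the roles below it.

```python
# Roles ordered from least to most privileged, following the use cases.
# "anonymous" is an assumed label for visitors who are not logged in.
ROLE_LEVEL = {"anonymous": 0, "user": 1, "author": 2, "admin": 3}

# Minimum role required for each action (action names are illustrative assumptions).
REQUIRED_ROLE = {
    "comment_on_open_post": "anonymous",
    "comment_on_members_only_post": "user",
    "create_post": "author",
    "manage_categories": "admin",
}


def can(role, action):
    """Return True if `role` is at least as privileged as the action requires."""
    return ROLE_LEVEL[role] >= ROLE_LEVEL[REQUIRED_ROLE[action]]
```

Ownership checks, such as an author deleting only his or her own posts, would also need the post's author_id and are left out of this sketch.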
Step 2
Model the data for the enterprise you have selected using Entity-Relation (E-R) diagrams.
There must be a minimum of 5 entities.
[E-R diagram. Entities: users, settings, posts, comments, categories, tags. Relationships: owns,
authors_post, authors_comment, has, type_of, tagged_by, replies_to, belongs_to (with parent and child
roles on the nested relationships). Attribute labels in the figure: id, email, password, display_name,
status, role, registered_date, setting_name, setting_value, user_id, author_id, title, slug, content,
comment_rule, created_date, modified_date, author_name, author_email, author_url, approved, name,
description, ref_count.]
Figure 1. The E-R diagram of blog system.
Table “blog_users”
Candidate Keys
- id (primary key)
- email
Foreign Keys
- (none)
Constraints on attributes
- All attributes (id, email, password, display_name, status, role, registered_date) are not null.
Constraints on tuples
- id is primary key
- email is unique index
Remark
I created surrogate key, id, to identify user tuple instead of using email because email can be
changed.

Table “blog_settings”
Candidate Keys
- setting_name + user_id (primary key)
Foreign Keys
- user_id → “blog_users.id” (on delete cascade)
Constraints on attributes
- setting_name is not null.
Constraints on tuples
- setting_name + user_id is primary key
Remark
All settings that are associated with the user shall be deleted when user is deleted.
Table “blog_posts”
Candidate Keys
- id (primary key)
Foreign Keys
- author_id → “blog_users.id” (on delete cascade)
- category_id → “blog_categories.id” (on delete restrict)
Constraints on attributes
- All attributes except modified_date are not null.
Constraints on tuples
- id is primary key
- slug is unique index
Remark
Rows in blog_posts are deleted when their user is deleted. A row in blog_categories cannot be deleted
while it is referred to by a row in blog_posts.

Table “blog_comments”
Candidate Keys
- id (primary key)
Foreign Keys
- post_id → “blog_posts.id” (on delete cascade)
- author_id → “blog_users.id” (on delete set null)
- reply_to → “blog_comments.id” (on delete cascade)
Constraints on attributes
- post_id is not null
- content is not null
Constraints on tuples
- id is primary key
Remark
The author_name, author_email and author_url attributes are not duplicate data. They are declared on
purpose because this blog system allows non-users and anonymous visitors to comment.
Table “blog_tags”
Candidate Keys
- id (primary key)
Foreign Keys
- (none)
Constraints on attributes
- all attributes except modified_date are not null.
Constraints on tuples
- id is primary key
- name is unique index
Remark
It has many-to-many relationship with blog_posts

Table “blog_posts_tags”
Candidate Keys
- post_id + tag_id (primary key)
Foreign Keys
- post_id → “blog_posts.id” (on delete cascade)
- tag_id → “blog_tags.id” (on delete cascade)
Constraints on attributes
- All attributes are primary key and they’re unique implicitly.
Constraints on tuples
- post_id + tag_id is primary key
Remark
- (none)

Table “blog_categories”
Candidate Keys
- id (primary key)
Foreign Keys
- parent_id → “blog_categories.id” (on delete cascade)
Constraints on attributes
- name is not null
Constraints on tuples
- id is primary key
- name is unique index
Remark
When we delete the parent category, all children categories will be deleted as well.
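
The delete rules listed for the tables above can be tried out with the sketch below. It is an approximation under stated assumptions: it uses Python's built-in sqlite3 module instead of the MySQL/InnoDB target, it includes only the columns needed to show the foreign keys, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE blog_categories (
        id        INTEGER PRIMARY KEY,
        parent_id INTEGER REFERENCES blog_categories(id) ON DELETE CASCADE,
        name      TEXT NOT NULL UNIQUE
    );
    CREATE TABLE blog_posts (
        id          INTEGER PRIMARY KEY,
        category_id INTEGER NOT NULL
                    REFERENCES blog_categories(id) ON DELETE RESTRICT,
        slug        TEXT NOT NULL UNIQUE
    );
    INSERT INTO blog_categories (id, parent_id, name) VALUES
        (1, NULL, 'tech'), (2, 1, 'databases'), (3, NULL, 'travel');
    INSERT INTO blog_posts (id, category_id, slug) VALUES (1, 3, 'my-trip');
""")

# Deleting a parent category removes its child categories (on delete cascade).
conn.execute("DELETE FROM blog_categories WHERE id = 1")
remaining = [row[0] for row in conn.execute("SELECT name FROM blog_categories ORDER BY id")]

# Deleting a category that a post still refers to is rejected (on delete restrict).
try:
    conn.execute("DELETE FROM blog_categories WHERE id = 3")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```

After the script runs, only the 'travel' category is left, and the attempt to delete it fails because the post 'my-trip' still belongs to it.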
Normalization Steps
The models appear to have been in BCNF from the start: they have no repeating groups, every table
has an identified primary key, there are no partial or transitive dependencies, and every determinant
is a candidate key. The one table that may look like an exception is “blog_users”, because email, which
is another candidate key, also determines all other attributes. Since email is itself a candidate key,
this dependency does not break 3NF or BCNF; the table would also be in BCNF with email as its only
key and no surrogate key. However, the requirement states that the user must be able to change the
email, and an attribute that is likely to change should not be used as a primary key. That is the
reason why I used a surrogate key in the “blog_users” table.
Step 3
Transform the E-R model into relations. There must be a minimum of 5 relations
Relations
Figure 2. The physical data model of blog system.
Listing the relations again here would duplicate the figure. See the physical model in Figure 2 for
the relations.
The database name is hussach.