Modeling NoSQL Databases: From Conceptual to Logical Level Design
Shreya Banerjee, Anirban Sarkar
Department of Computer Applications, National Institute of Technology, Durgapur, India
Abstract: NoSQL databases have emerged as a revolutionary technology for modern web-scale and cloud-based applications. A variety of NoSQL databases have been developed, with different types of physical level data models. However, the lack of common standardization among NoSQL databases makes it harder for organizations to choose the right database. Moreover, the lack of a commonly accepted conceptual model for NoSQL databases often makes it difficult to choose the right physical data model for a specific application. Further, such databases do not have a separate logical schema, and thus maintenance becomes complicated. Hence, there is a strong need for a common conceptual model for these databases. In addition, a proper data validation mechanism must be enforced in order to maintain a rich set of variously typed data and to achieve good data quality in the context of NoSQL databases. This paper proposes a common conceptual level model for varied types of NoSQL databases. Further, a NoSQL data specification language is devised to represent the equivalent logical level data model, independent of any physical level representation. In addition, distinct validation rules for the proposed conceptual model are proposed, formalized and illustrated using a suitable case study.
Keywords: NoSQL Databases; Conceptual Model; Logical Model; Data Validation
I. INTRODUCTION
Modern web-scale applications impose distinct and challenging demands on data management. The most challenging part is dealing with the huge quantity (from terabytes to petabytes) and the different types (structured, unstructured and hybrid) of data that are synthesized and shared at a rapid rate [2]. To cope with these challenges of modern agile applications, databases require support for flexible schemas and good horizontal scalability for simple read/write operations distributed over many servers [3]. Traditional databases such as RDBMS (Relational Database Management Systems) have limited capability for horizontal scaling. Besides, those databases have a fixed schema. Nowadays, NoSQL databases can be used alone or as a complement to relational databases for these new web applications [1]. These databases have distinguishing features such as persistent and non-relational data, flexible schema, dynamic insertion of any kind of data, high availability of data, massive horizontal scaling, distribution, replication, and support for the BASE (Basically Available, Soft State, Eventually Consistent [4]) and CAP (Consistency, Availability and Partition tolerance [4]) consistency models [4]. Further, a variety of NoSQL databases have been developed by practitioners and web companies to meet their requirements [3]. Experts classify NoSQL databases into four basic categories based on their physical level data models: Key-Value stores, Document stores, Column-Family stores and Graph databases [5].
The heterogeneity in the physical level data models of NoSQL databases poses significant challenges for application developers [6]. The main challenge is the lack of a standard. This is a great concern for organizations interested in adopting any of these NoSQL databases [7]. Moreover, it also makes it hard for application developers to choose the right physical level data model for their applications. Standardization can be achieved by identifying the concepts and relationships in the data models that appear to be general [8]. Hence, a common conceptual model is needed that can conceptualize the discrete facets of the various types of NoSQL databases. Further, current NoSQL databases appear to make no distinction between logical and physical schema, which complicates their maintenance. Hence, a methodological framework consisting of conceptual level data modeling followed by logical level data modeling is another prime requisite, since these databases have to be accessed by applications [9]. In addition, data model validation is an important part of data analysis tasks. The consequences of having invalid data range from harmless application failures to serious errors in the decision-making process [10]. Since data is constantly being updated, deleted, modified and queried in NoSQL databases, having valid data is a must. Moreover, adequate data validation methods help to achieve good data quality [11], which can facilitate a more consistent and functional design of NoSQL databases and provide more data insights to users and designers.
Very few approaches to a common conceptual level data model for NoSQL databases are available in the literature. In [12], the proposed data model is built based on data query requirements and the stored data structure. In [13], the proposed data model is used to specify a system-independent realization of application data and provides scalability, performance and consistency. The authors of [14] identify the distinctive elements and the database hierarchy of NoSQL and relational databases with a common formal representation. In [8], the authors provide a programming interface to NoSQL databases based on a meta-layer that provides a common structure across NoSQL databases. Further, the authors have also implemented the meta-layer in JSON (JavaScript Object Notation). All of these proposed data models are used to specify a system-independent realization of application data. However, most of these authors have not properly identified the distinct relationships between data. Further, none of these approaches specifies a transformation from conceptual to logical and further to physical level data models. Moreover, none of these approaches suggests data validation rules.
With the intention of addressing the above-mentioned issues, this paper proposes a unifying conceptual model for different NoSQL databases in Section II. This common model is intended to standardize conceptual level modeling of NoSQL databases. Besides, a NoSQL data specification language based on the proposed conceptual model is proposed in Section III. This language is devised in order to separate the logical level representation of NoSQL databases from their conceptual level representation, independently of their physical level design. Further, in Section IV this specification language is illustrated using a suitable case study. In addition, several data validation methods are demonstrated in Section V to validate NoSQL data in the context of the proposed conceptual model. Finally, Section VI presents the conclusion and future work.
II. PROPOSED CONCEPTUAL MODEL FOR NOSQL DATABASES
The proposed conceptual model consists of a common set of constructs, relationships and a number of significant properties of relationships, which together unify the conceptual level representations of different NoSQL databases. In addition, the proposed model comprises three inter-related layers. Every layer has its own identifiable construct types, which make the layers distinct from each other. Fig. 1 describes the layered organization of the proposed conceptual model.
[Figure: three stacked layers. Collection Layer (Collection Type); Family Layer (Top-Most Level Family Type down to Bottom-Most Level Family Type, connected by different relationships); Attribute Layer (Attribute Type).]
Fig. 1. Different layers, main construct types and distinct relationships of the proposed conceptual model
A. Constructs and Layers in Proposed Model
The proposed conceptual model can be realized as a layered organization composed of three main layers, namely Collection, Family and Attribute. The three layers have their respective construct types: Collection (Col), Family (FA) and Attribute (AT). Different construct types are related to each other using the relationships specified in Section II.B.
Attribute Layer: It is the base layer of the conceptual model. An AT construct type is the group of all possible instances of the same kind and is elementary in nature. A specific AT known as Primary_Attribute (PAT) uniquely identifies individual instances.
Formalization:
Here, Icnt_AT denotes the Inverse Containment relationships attached to an Attribute, and the predicate IN() specifies the instances of an Attribute.
Family Layer: It is the middle layer of the conceptual model. It may consist of numerous FA construct types. Several semantically related Attributes are grouped together to form the lowest level FA construct type of the Family Layer. This layer can be further decomposed into multiple levels as the designer chooses.
Formalization:
Here, FA_llev, FA_ulev and FA_lev denote Families in the bottom-most level, in the top-most level and in any level respectively.
Collection Layer: It is the uppermost layer of the conceptual model. Semantically related Families of the top-most level are assembled to form a Col. From the top level, the entire database can be viewed as a set of Cols.
Formalization:
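The three layers described above can be pictured with a small data-structure sketch. This is only an illustration under stated assumptions, not the authors' formalization: the class names, the `primary` flag and the `depth()` helper are hypothetical, and the element names are borrowed from the case study used later in the paper.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Attribute:
    """Attribute Layer construct (AT): elementary, holds no other construct."""
    name: str
    type_name: str = "string"
    primary: bool = False  # assumption: True marks the Primary_Attribute (PAT)


@dataclass
class Family:
    """Family Layer construct (FA): groups Attributes and, optionally, other Families."""
    name: str
    members: List[Union["Family", Attribute]] = field(default_factory=list)

    def depth(self) -> int:
        # Number of Family levels below and including this one.
        nested = [m.depth() for m in self.members if isinstance(m, Family)]
        return 1 + max(nested, default=0)


@dataclass
class Collection:
    """Collection Layer construct (Col): assembles top-most level Families."""
    name: str
    families: List[Family] = field(default_factory=list)


vital = Family("Patient_vital_signs", [Attribute("pulse", "integer")])
observations = Family(
    "Patient_guided_observations", [vital, Attribute("patient_nutrition")]
)
records = Collection("E-prescription records", [observations])
print(observations.depth())
```

In this sketch the multi-level decomposition of the Family Layer shows up as nesting of `Family` objects, while a `Collection` holds only Families.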
B. Relationships in Proposed Model
Distinct constructs of the proposed model are connected with one another using distinct relationships. These relationships are of two kinds: Inter-layer relationships and Intra-layer relationships.
Inter-layer kind relationships: These relationships hold between disparate construct types of two different layers.
Intra-layer kind relationships: These relationships hold between analogous construct types of the same layer.
Containment (Cnt): A Containment relationship exists between two construct types when one encapsulates constructs of similar or different types.
Inverse Containment (Icnt): This relationship enables one construct type to de-encapsulate itself in order to dynamically encapsulate similar or dissimilar construct types.
Association (AS): Association relationships connect similar types of constructs that together accomplish several objectives.
Inheritance (IH): An Inheritance relationship holds between similar types of constructs when one derives several properties of another construct.
Reference (ref): A Reference relationship is the symmetric connection in which one construct type refers to other constructs of a similar type that have the same instances.
HasTime (HT): This is the relationship between each construct type and its survival time-duration or time stamp.
Containment and Inverse Containment can be both Inter-layer kind and Intra-layer kind relationships. The rest can only be Intra-layer kind relationships.
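The rule in the last paragraph (only Containment and Inverse Containment may cross layers) can be expressed as a small check. This is a minimal sketch: the function name `relationship_allowed` and the dictionary layout are assumptions made for illustration, and the check deliberately ignores which specific layer pairs a Containment may connect.

```python
# Layer of each construct type; the abbreviations follow the paper.
LAYER = {"Col": "Collection", "FA": "Family", "AT": "Attribute"}

# Relationship kinds that may be inter-layer as well as intra-layer.
INTER_OR_INTRA = {"Cnt", "Icnt"}
# Relationship kinds restricted to constructs of the same layer.
INTRA_ONLY = {"AS", "IH", "ref", "HT"}


def relationship_allowed(kind: str, source: str, target: str) -> bool:
    """Return True if `kind` may connect construct types `source` and `target`."""
    same_layer = LAYER[source] == LAYER[target]
    if kind in INTER_OR_INTRA:
        return True
    if kind in INTRA_ONLY:
        return same_layer
    raise ValueError(f"unknown relationship kind: {kind}")


print(relationship_allowed("Cnt", "FA", "AT"))
print(relationship_allowed("AS", "Col", "FA"))
```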
C. Properties of various Relationships
Relationships in this data model can have several properties: Multiplicity, Modality, Ordering, Availability, Conditional-Participation and Consistency.
Multiplicity (Mlp): It defines how many instances of a construct type take part in a relationship.
Modality (Mdl): This defines whether the relationship is mandatory or optional.
Ordering (or): It states whether or not the constructs participating in a relationship preserve some ordering constraints.
Availability (Avl): This specifies the existence time-duration or time stamps of the relationship.
Conditional-Participation (Cnd): This property identifies how many instances of constructs participate in the relationship based on conditions such as all of, any of, etc.
Consistency (Cnst): This specifies whether the existence of relationships keeps the whole model in a stable state. However, the proposed model can be in an inconsistent state for a short time when inverse contained relationships first appear in the data model.
In the proposed data model, Containment, Inverse Containment and Association relationships have all six properties. Inheritance and HasTime relationships have three properties: Modality, Consistency and Availability. The Reference relationship has four properties: Modality, Availability, Conditional-Participation and Consistency. Further, the proposed conceptual model supports the BASE [4] and CAP [4] consistency models due to its well-formed constructs and relations.
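The assignment of properties to relationships stated above can be summarized as a lookup table. The sketch below is illustrative only; the names `PROPERTIES` and `unsupported_properties` are hypothetical, and the abbreviations are the ones used in this section.

```python
ALL_PROPERTIES = {"Mlp", "Mdl", "or", "Avl", "Cnd", "Cnst"}

# Properties each relationship kind may carry, as listed in the text.
PROPERTIES = {
    "Cnt": ALL_PROPERTIES,
    "Icnt": ALL_PROPERTIES,
    "AS": ALL_PROPERTIES,
    "IH": {"Mdl", "Cnst", "Avl"},
    "HT": {"Mdl", "Cnst", "Avl"},
    "ref": {"Mdl", "Avl", "Cnd", "Cnst"},
}


def unsupported_properties(kind, used):
    """Return the properties in `used` that `kind` does not support, sorted."""
    return sorted(set(used) - PROPERTIES[kind])


# An Inheritance relationship cannot carry Multiplicity.
print(unsupported_properties("IH", ["Mlp", "Mdl"]))
```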
III. PROPOSED NOSQL DATA SPECIFICATION LANGUAGE: LOGICAL LEVEL REPRESENTATION
Conceptual level data models are closer to the way users perceive an application domain, and logical level data models are closer to the way designers perceive the application domain [15]. A specification language is proposed here to represent the proposed conceptual model of NoSQL databases at the logical level. Based on the proposed specification language, the proposed conceptual model can be transformed into any existing physical level data model of NoSQL databases. Two types of Module Templates are defined in the proposed specification language for the logical level representation of NoSQL data, namely (i) the Regular Module Template and (ii) the Complex Module Template. In both templates, a Collection encapsulates only Families, and Families may encapsulate other Families and Attributes. Further, each template consists of several segments such as Time Stamp, UID, Contained Elements, Associated Elements etc. However, not all segments are mandatory. In addition, segments like Contained Elements or Associated Elements have several fields such as Mandatory, time stamp etc. Tables I and II summarize these. Besides static creation, both templates can also be created dynamically. If any of these proposed templates contains an Inverse Contained Elements segment, then it is a dynamic template. In both templates, segments are in bold letters and end with a colon, user- or system-defined values are specified within <>, and choices are given within ().
TABLE I. DESCRIPTIONS OF SEGMENTS IN MODULE TEMPLATES
Segments | Descriptions
Module | Beginning of a Module
Module Name | Name of the specific Module
Type | The kind of construct type
UID | Primary Attribute specification
Time Stamp | Time version or time duration of existence of the specific construct type
Contained Elements | Encapsulated elements of the specified construct type using Containment relationship
Inverse Contained Elements | Dynamically appended elements in the specified construct type using Inverse Containment relationship
Associated Elements | Elements which are related to the corresponding construct type using Association relationship
Referred Elements | Elements which are referred to by the corresponding construct type using Reference relationship
Parent Elements | Parent elements which are inherited by the specific construct type
Child Elements | Child elements which derive from the specific construct type
TABLE II. DESCRIPTION OF FIELDS OF CONTAINMENT SEGMENT OF MODULE TEMPLATES
Fields | Descriptions
Family Type | Specify whether the encapsulated elements are of Family Type
Attribute Type | Specify whether the encapsulated elements are of Attribute Type
Element’s Name | Names of encapsulated elements
Type | Specify the built-in data type or value of a particular Attribute
Referencing Attribute | Specify the Attribute which refers to another Attribute
Referred Attribute | Specify the type or value of Referencing Attribute
Visibility | Refers to different access specifiers of elements like Private, Protected and Public
Mandatory, Optional | Specify Modality of relationship
Allof, onlyoneof, anyof, noneof | Specify Conditional Participation
Ordered, partial ordered and unordered | Specify Ordering
time stamp | Specify Availability
Maximum and minimum cardinality | Specify Multiplicity
Module <Module Name> Type (Col | FA): <visibility>
UID: <uid>
Time Stamp: <time stamp>
Contained Element:
    Family Type Element: (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <maximum cardinality> <minimum cardinality> <Element name>
    Attribute Type Element: (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <Element name: Type>
Inverse Contained Element:
    (Mandatory | Optional) <time stamp> <Element name>
End Module
Fig. 2. Regular Module Template
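To make the segment layout of the Regular Module Template concrete, the following sketch renders a module specification as text. It is a simplified illustration under stated assumptions: the helper `render_regular_module` and the class `ContainedFamily` are hypothetical, the visibility and cardinality fields of contained elements are omitted, and every element reuses the module's time stamp.

```python
from dataclasses import dataclass


@dataclass
class ContainedFamily:
    name: str
    participation: str = "allof"   # allof | anyof | onlyoneof | noneof
    ordering: str = "ordered"      # ordered | unordered | partial ordered
    modality: str = "Mandatory"    # Mandatory | Optional


def render_regular_module(name, kind, visibility, time_stamp,
                          contained, inverse=(), uid=None):
    """Render a simplified Regular Module Template as plain text."""
    lines = [f"Module {name} Type ({kind}): {visibility}"]
    if uid is not None:            # segments are optional, so UID may be absent
        lines.append(f"UID: {uid}")
    lines.append(f"Time Stamp: {time_stamp}")
    if contained:
        lines.append("Contained Element:")
        for fam in contained:
            lines.append(
                f"  Family Type Element: ({fam.participation}) ({fam.ordering}) "
                f"({fam.modality}) <{time_stamp}> {fam.name}"
            )
    if inverse:                    # this segment makes the template dynamic
        lines.append("Inverse Contained Element:")
        for element in inverse:
            lines.append(f"  (Optional) <{time_stamp}> {element}")
    lines.append("End Module")
    return "\n".join(lines)


text = render_regular_module(
    "Demo_records", "Col", "Public", "2015-01-20T09:30:10",
    [ContainedFamily("Patient"), ContainedFamily("Doctor")],
    inverse=["Disease_Identification"], uid="record_number",
)
print(text)
```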
Regular Module Template: This module template represents the simple structure of Collection and Family construct types with their corresponding Contained and Inverse Contained construct types at the logical level. Fig. 2 illustrates the regular module template.
Complex Module Template: This module template represents the complex structure of Family and Collection construct types with their corresponding Contained, Inverse Contained, Associated, Child, Parent and Referred elements. Fig. 3 illustrates the complex module template.
Module <Module Name>
Type (Col | FA): <visibility>
UID: <uid>
Time Stamp: <time stamp>
Contained Element:
    Family Type Element: (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <maximum cardinality> <minimum cardinality> <Element Name>
    Attribute Type Element: (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <Element name: Type>
    Referred Attribute Type Element: (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <Element name: Type>
Inverse Contained Element:
    (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <Element name>
Child Element:
    (Mandatory | Optional) <time stamp> <visibility> <Child Collection’s name for Col construct type module>
    (Mandatory | Optional) <time stamp> <visibility> <Child Family’s name for FA construct type module>
Parent Element:
    (Mandatory | Optional) <time stamp> <visibility> <Parent Collection’s name for Col construct type module>
    (Mandatory | Optional) <time stamp> <visibility> <Parent Family’s name for FA construct type module>
Associated Element:
    (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <maximum cardinality> <minimum cardinality> <Associated Collection’s name for Col construct type module>
    (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <maximum cardinality> <minimum cardinality> <Associated Family’s name for FA construct type module>
Referred Element:
    (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <maximum cardinality> <minimum cardinality> <Referred Collection’s name for Col construct type module>
    (allof | anyof | onlyoneof | noneof) (ordered | unordered | partial ordered) (Mandatory | Optional) <visibility> <time stamp> <maximum cardinality> <minimum cardinality> <Referred Family’s name for FA construct type module>
End Module
Fig. 3. Complex Module Template
IV. ILLUSTRATION OF PROPOSED NOSQL DATA SPECIFICATION LANGUAGE USING A CASE STUDY
Consider an e-prescription application with which a doctor can issue a prescription to a patient electronically. This application helps doctors to (1) query the guided observation of a patient's health status, previous prescriptions, medical history and vital signs, (2) analyze the input information and (3) prescribe medication to patients according to their current and previous health status.
This case study has several facets such as patient, doctor, medication advice, medical history of patients, guided observation records of patients etc. Every facet has additional information; for example, patients and doctors have personal information such as name, age and registration number. In addition, patients often have no previous prescriptions or medical history. Besides, the types of guided observations of patients and the drug information can change frequently. In addition, a patient may have e-prescriptions issued by different doctors for the same disease. All of these characteristics of the case study imply that the data set is highly irregular and needs a flexible representation. Hence, NoSQL databases are required to manage this data set.
According to the case study, “E-prescription records” is a Collection in the proposed conceptual model. It consists of several Families such as “E_prescription_info”, “Patient” etc. Further, Families have Attributes. For example, “E_prescription_number” is an Attribute of the Family “E_prescription_info”. The key elements of this case study are listed below. Collections (Col) are specified in bold letters. Families (FA) are identified in italic letters. Attributes (AT) are denoted in non-italic letters. Elements listed directly within the parentheses are mandatory, whereas elements within braces are optional and can be added to the model on the fly. The regular static module specification of “E-prescription records” is demonstrated in Fig. 4. In addition, the complex dynamic module specification of “Patient_guided_observations” is given in Fig. 5.
E-prescription records (E_prescription_info, Patient, Doctor, Medications, {Disease_Identification})
E_prescription_info (e_prescription_number, prescription_date)
Patient (Patient_Personnal_info, {Patient_Medical_history}, Patient_guided_observations)
Patient_Personnal_info (patient_name, patient_age, patient_gender, patient_registration_number)
Patient_Medical_history ({Previous_prescription_number}, {drug_allergy}, {previous_diseases})
Patient_guided_observations ({Patient_symptoms}, {diagnosis_result}, patient_nutrition, patient_vital_signs)
Patient_vital_signs (blood_pressure, pulse, height, weight, BMI, temperature)
Patient_Symptoms (anyof {Importance, affected region}, Intensity, {symptom_type})
Doctor (doctor_name, doctor_registration_number, doctor_contact_number)
Medications (anyof {drug_generic_name, comments}, {dosage_form}, {quantity}, {duration})
Diseases_Identification (diseases_symptoms, Treatment)
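The listing notation above (mandatory elements directly inside the parentheses, optional elements inside braces) is regular enough to be read mechanically. The sketch below shows one possible way to split a listing line into its mandatory and optional elements. The function `parse_listing` is hypothetical, and as a simplification it treats an `anyof {...}` group as a set of optional elements without keeping the grouping.

```python
import re


def parse_listing(line):
    """Split 'Name (a, {b}, c)' into (name, mandatory elements, optional elements)."""
    match = re.match(r"\s*(.+?)\s*\((.*)\)\s*$", line, re.S)
    if match is None:
        raise ValueError(f"not a listing line: {line!r}")
    name, body = match.group(1), match.group(2)
    # Everything inside braces is optional.
    optional = [
        item.strip()
        for group in re.findall(r"\{([^}]*)\}", body)
        for item in group.split(",")
        if item.strip()
    ]
    # What is left after removing brace groups (and an anyof prefix) is mandatory.
    remainder = re.sub(r"(any_?of\s*)?\{[^}]*\}", "", body)
    mandatory = [item.strip() for item in remainder.split(",") if item.strip()]
    return name, mandatory, optional


print(parse_listing(
    "Patient (Patient_Personnal_info, {Patient_Medical_history}, "
    "Patient_guided_observations)"
))
```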
Module E-prescription records Type (Col): Public
UID: E_prescription_number
Time Stamp: 2015-01-20T09:30:10
Contained Element:
    Family Type Element: (allof) (Ordered) (Mandatory) <2015-01-20T09:30:10> E_prescription_info
    (allof) (Ordered) (Mandatory) <2015-01-20T09:30:10> Patient
    (allof) (Ordered) (Mandatory) <2015-01-20T09:30:10> Doctor
    (allof) (Ordered) (Mandatory) <2015-01-20T09:30:10> Medications
Inverse Contained Element:
    (Optional) <2015-01-20T09:30:10> Disease_Identification
End Module
Fig. 4. Regular Module Template of Collection “E-prescription records”
Module Patient_guided_observations Type (FA): Public
Time Stamp: 2015-01-20T09:40:10
Contained Element:
    Family Type Element: (allof) (Ordered) (Mandatory) <2015-01-20T09:40:10> Patient_vital_signs
    Attribute Type Element: (allof) (Mandatory) <2015-01-20T09:40:10> patient_nutrition: string
Inverse Contained Element:
    (Optional) <2015-01-20T09:50:10> Patient_symptoms
Associated Element:
    (Ordered) (Optional) <2015-01-20T09:55:10> <maximum cardinality> <minimum cardinality> Disease_Identification
End Module
Fig. 5. Complex Module Template of Family “Patient_guided_observations”
V. PROPOSED NOSQL DATA VALIDATION METHODS
Data validation is the process of checking that data conforms to the specification of a data model [10]. NoSQL databases support easy schema changes. Although this feature is appealing at first, it becomes frustrating when dealing with huge databases: easy schema modification results in bad values or sometimes missing data in large databases. To resolve this issue, data validation should be performed on raw data. Three data validation methods for the proposed data model are presented in this section. Besides, several validation rules are formally specified and illustrated using the case study.
A. Structural Validation
Structural validation is the process of checking that real data from the domain conforms to the building blocks of the proposed data model. The proposed work includes two structural validation mechanisms: static structural validation and dynamic structural validation.
(a) Static Structural Validation: This validation method checks whether static data of the domain, which is fixed and will not change in the future, conforms to the static structural part of the data model. The proposed set of rules for such validation is as follows.
Rule 1: Elementary data should not encapsulate other elementary or composite data.
Formalization:
Example: “Patient_name”, “age” and “e-prescription numbers” are elementary data and they cannot contain other data.
Rule 2: Composite data should contain some elementary or other composite data.
Example: “Patient_guided_observations” is composite data. It can encapsulate other composite data like “patient_vital_signs” and elementary data such as “patient_nutritions”.
Rule 3: Only composite data can be parent data, and only when it has at least one child data.
Example: One “medication” may inherit substances of another “medication”. Thus a “medication” is parent data when another “medication” inherits from it.
Rule 4: Data should only refer to data of a similar type.
Example: Attribute “previous_prescription_number” of Family “Patient_medical_history” refers to another Attribute, “e-prescription_number”, of another Family, “E-prescription_info”.
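Rules 1 and 2 lend themselves to a simple recursive check. The following is a minimal sketch, assuming data is represented as nested dictionaries with a `kind` of "AT" (elementary) or "FA" (composite); this representation and the function `structural_errors` are illustrative assumptions, not part of the proposed model.

```python
def structural_errors(node, path=""):
    """Collect violations of Rule 1 and Rule 2 in a nested data description."""
    here = f"{path}/{node['name']}"
    children = node.get("children", [])
    errors = []
    if node["kind"] == "AT" and children:
        # Rule 1: elementary data must not encapsulate other data.
        errors.append(f"{here}: Rule 1 violated")
    if node["kind"] == "FA" and not children:
        # Rule 2: composite data must contain some data.
        errors.append(f"{here}: Rule 2 violated")
    for child in children:
        errors.extend(structural_errors(child, here))
    return errors


valid = {
    "name": "Patient_guided_observations", "kind": "FA",
    "children": [
        {"name": "patient_nutrition", "kind": "AT"},
        {"name": "Patient_vital_signs", "kind": "FA",
         "children": [{"name": "pulse", "kind": "AT"}]},
    ],
}
invalid = {
    "name": "Patient", "kind": "FA",
    "children": [
        {"name": "patient_age", "kind": "AT",
         "children": [{"name": "years", "kind": "AT"}]},
        {"name": "Patient_Medical_history", "kind": "FA"},
    ],
}
print(structural_errors(valid))
print(structural_errors(invalid))
```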
(b) Dynamic Structural Validation: This validation method checks whether dynamic data of the domain, whose types are not predefined or change frequently, conforms to the dynamic structural part of the data model. The proposed set of rules for such validation is as follows.
Rule 5: Dynamically entered data has no existence before or after a specific time period.
Example: “Patient_Symptoms” can appear dynamically in a database. Thus “Patient_Symptoms” has no existence before a certain time.
Rule 6: Dynamically entered data has no predefined type at the moment of its initial appearance in the database.
Example: Dynamically appended data such as “Patient_symptoms” does not have any predefined type at the moment of its initial appearance in the database. Fig. 6 illustrates rules 5 and 6.
Module Patient_guided_observations
Type (FA): Public
Time Stamp: 2015-01-20T09:30:10
Contained Element:
    Attribute Type Element: <2015-01-20T09:30:10> Patient’s_nutrition: string
Inverse Contained Element:
Module Patient_guided_observations
Type (FA): Public
Time Stamp: 2015-01-20T09:40:10
Contained Element:
    Attribute Type Element: <2015-01-20T09:30:10> Patient’s_nutrition: string
    Family Type Element: <2015-01-20T09:36:10> Patient_Symptoms1
Inverse Contained Element: <2015-01-20T09:55:00>
Fig. 6. Example of validation rule 5 and 6
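Rule 5 amounts to an existence-window test on time stamps. The sketch below shows one way such a check could look, assuming each dynamic element carries an ISO 8601 `start` time stamp and an optional `end` time stamp; the field names and the function `exists_at` are hypothetical.

```python
from datetime import datetime


def exists_at(element, when):
    """Rule 5: a dynamic element exists only inside its time window."""
    start = datetime.fromisoformat(element["start"])
    end = element.get("end")
    if when < start:
        return False
    return end is None or when <= datetime.fromisoformat(end)


# Time stamp taken from the Inverse Contained Element of Fig. 5.
symptoms = {"name": "Patient_symptoms", "start": "2015-01-20T09:50:10"}

print(exists_at(symptoms, datetime(2015, 1, 20, 9, 40, 10)))
print(exists_at(symptoms, datetime(2015, 1, 20, 9, 55, 0)))
```

A query for “Patient_symptoms” before 09:50:10 on that day would therefore find nothing, which is the behaviour Rule 5 describes.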
B. Constraint Validation
Constraint validation is the process of checking that real data conforms to distinct domain constraints. Further, that conformity should be recognized in the data model. The proposed set of rules for such validation is as follows.
Rule 7: Mandatory constraints specify that specific data should have a value.
Formalization:
Here, datavalue() is a predicate specifying the value of data x and □ is the necessity operator.
Example: Attribute “doctor_registration_number” should have a value in all possible states.
Rule 8: Type constraints specify that data should have a value from the value space or range of its type.
Example: Attribute “drug_generic_name” should have a value of string type at all times.
Rule 9: Cardinality constraints specify that the participation of data is not less than the minimum cardinality and not more than the maximum cardinality.
Example: A Family “Doctor” may have a minimum of one and a maximum of three “doctor_contact_number”.
Rule 10: Ordering constraints specify that a group of data should maintain some ordering conditions.
Example: Family “patient_general_conditions” should contain Attributes in this order: “blood_pressure”, “pulse”, “height”, “weight”, “BMI”, “temperature”.
Rule 11: A uniqueness constraint requires that there is elementary data whose value must be unique among all instances of an element.
Example: “Patient_registration_number” should be unique in all records of Family “Patient_personnal_info”.
Rule 12: Participation constraints specify that the participation of data in some relationships may be based on conditions such as all, only one, any one and no one participation.
Example: Family “patient_symptoms” may have information about either its “Importance” or its “Affected_area”.
Rule 13: Static constraints do not depend on the value of data in past or future states of data.
Example: Attribute “Previous_prescription_number” refers to another Attribute “e_prescription_number” in all states.
Rule 14: Dynamic constraints depend on the value of data in past or future states of data.
Example: Attribute “quantity” of Family “Medications” depends on the value of “quantity” of the previous “Medications” for the same patient. Fig. 7 demonstrates this rule.
Module Medications
Type (FA): Public
Initial Time Stamp: 2014-12-20T09:30:10
Ending Time Stamp: 2014-12-30T09:30:10
Contained_Element:
    Attribute Type Element:
    quantity: integer
End Module
Module Medications
Type (FA): Public
Initial Time Stamp: 2014-12-30T09:30:10
Ending Time Stamp: 2015-01-10T09:30:10
Contained_Element:
    Attribute Type Element:
    quantity: integer (quantity>prev_quantity)
End Module
Fig. 7. Example of validation rule 14
C. Consistency Validation
Consistency in data refers to the requirement that any transaction must affect data only in permissible ways. Consistency validation is the process of confirming that consistency in real data is preserved in the proposed model. The proposed set of rules for such validation is as follows.
Rule 15: The type of specific static data should be the same at all times.
Formalization:
Example: The type of static data like “e_prescription_number” will be integer at all possible times.
Rule 16: The type of specific dynamic structural data may not exist before or after a specific time period.
Example: The type of dynamic data such as “Patient_medical_history” can change with time. Sometimes “Patient_medical_history” encapsulates data such as “Previous_prescription_number” and “previous_disease”. Sometimes it may encapsulate data such as “drug_allergy” and “previous_disease”. So the type of the data is different at different times. Fig. 8 demonstrates this rule.
Module Patient_medical_history
Type (FA): Public
Initial Time Stamp: 2015-01-20T09:30:10
Ending Time Stamp: 2015-01-20T10:40:10
Contained_Elements:
    Attribute Type Element:
    previous_prescription_number: integer
    previous_disease: string
End Module
Module Patient_medical_history
Type (FA): Public
Initial Time Stamp: 2015-07-20T12:40:10
Ending Time Stamp: 2015-07-20T15:23:02
Contained_Elements:
    Attribute Type Element:
    drug_allergy: integer
    previous_disease: string
End Module
Fig. 8. Example of validation rule 16
Rule 17: Referred data is destroyed after destruction of the referencing data.
Formalization:
the proposed model partially. Further, the proposed design
approach differentiates the logical level design of NoSQL
Example: Attribute “e-prescription_number” of Family “Eprescription_info” will be destroyed after destruction of
Attribute
“Previous_prescription_number”
of
Family
“Patient_Medical_History” as the latter references the former.
Module diseases_symptoms
Type (FA): Public
Time Stamp: 2014-1220T09:30:10
Referred Elements:
Family Type Element:
Patient_Symptom
End Module
Rule 18: Referencing data may refer one dynamic data
existing in a particular time period. Before and after that time
period there should be no existence of that reference.
Formalization:
_
2
1
1 2
,
1, 2
_
,
2
_
1
Example:
Family
“diseases_symptom”
of
Family
“disease_identification” refers Family “patient_symptom”. The
reference will be consistent for the time period of existence of
“patient_symptom”. However, at initial time stamp just when
information regarding a “patient_symptom” inversely included
in the database then the database can be in an inconsistent state.
Figure 9 illustrates this.
Compliance of the proposed model with the BASE consistency
model can be established using validation rules 16, 17 and 18.
BASE stands for Basically Available, Soft State and Eventually
Consistent. Basically Available indicates that there will be a
response to any request made to the database. Soft State
specifies that the database may not be write consistent at all
times, and that replicas of the data may not be mutually
consistent at all times. Eventually Consistent implies that the
database becomes consistent at some later time [4]. The
Basically Available part of BASE can be demonstrated by
validation rule 16 in the proposed model, whereas rules 17 and
18 illustrate the Soft State and Eventually Consistent parts. On
the other hand, rule 15 indicates a hard state of the database,
as it entails the strict presence of the type of data at all
times. Further, the examples of these rules illustrate in
practice the conformity of the proposed model to BASE. Besides,
the proposed model is also compatible with the CAP consistency
model, as it supports Availability and Consistency.
The proposed validation rules are described here from both
syntactic and semantic viewpoints. However, this set of
validation rules is not exhaustive. Other validation methods,
such as validation of transactions, parameters, network
partitions, etc., can be devised on the proposed data model.
Fig. 9. Example of validation rule 18
databases from conceptual level design, irrespective of their
physical level representation. For this purpose a NoSQL data
specification language has also been proposed in the paper. In
addition, rule-based data validation methods are proposed using
mathematical logic to validate the raw data of NoSQL
databases. Three kinds of validation are covered: structural,
constraint and consistency validation. The proposed validation
rules are not exhaustive and may be extended further. The
proposed NoSQL database modeling approach is semantically
enriched, and thus it is understandable, manageable, and
expressive enough to represent NoSQL database facets
independent of any underlying technology.
Future work includes devising an exhaustive set of validation
rules for data modeling and automating the proposed validation
rules. Further, systematic transformation of the proposed
conceptual model to existing physical models of NoSQL
databases through the proposed logical level specification
language is also a prime direction for future work.
REFERENCES
VI. CONCLUSION AND FUTURE WORK
This paper has attempted to standardize and unify the
conceptual level modeling of NoSQL databases. The proposed
conceptual model efficiently supports insertion of data on the
fly; thus dynamic modification of the schema is allowed in the
proposed model. Further, it has several significant features
such as availability, reusability and scalability. Besides, it
sustains hierarchical and symmetric relationships and
replication of data. Moreover, it strongly supports the BASE
consistency model. However, the CAP consistency model is also
supported by
Module Patient_Symptom
Type (FA): Public
Time Stamp: 2014-12-20T09:30:20
Contained_Elements:
Attribute Type Element:
Importance : string
Referred Elements:
Family Type Element:
diseases_symptoms
End Module
[1] V. Abramova, J. Bernardino, P. Furtado, “Which NoSQL database? A
Performance Overview”, Open Journal of Databases, vol. 1(2), pp. 17-24, 2014.
[2] A.B.M. Moniruzzaman, S.A. Hossain, “NoSQL Databases: New Era of
Databases for Big Data Analytics – Classification, Characteristics, and
Comparison”, International Journal of Database Theory and Application,
vol. 6(4), pp. 1-14, 2013.
[3] R. Cattell, “Scalable SQL and NoSQL Data Stores”, ACM SIGMOD
Record, vol. 39(4), pp. 12-27, December, 2010.
[4] D. Pritchett, “BASE: An ACID Alternative”, ACM Queue, vol. 6(3), pp.
48-55, 2008.
[5] R. Hecht, S. Jablonski, “NoSQL evaluation: A use case oriented
survey”, 2011 International Conference on Cloud and Service
Computing (CSC '11), IEEE Computer Society, Hong Kong, China, pp.
336-341, 2011.
[6] L. Cabibbo, “ONDM: an Object NoSQL Datastore Mapper”, Faculty of
Engineering, Roma Tre University. Retrieved June 15th 2014.
[7] M. Stonebraker, “Stonebraker on NoSQL and Enterprises”,
Communications of the ACM, vol. 54(8), pp. 10-11, August, 2011.
[8] P. Atzeni, F. Bugiotti, L. Rossi, “Uniform Access to Non-relational
Database Systems: The SOS Platform”, 24th International Conference on
Advanced Information Systems Engineering (CAiSE’12), Poland,
pp. 160-174, 2012.
[9] P. Atzeni, C.S. Jensen, G. Orsi, S. Ram, L. Tanca, R. Torlone, “The
relational model is dead, SQL is dead, and I don’t feel so good myself”,
ACM SIGMOD Record, vol. 42(2), pp. 64-68, July, 2013.
[10] E. Sirin, “Data Validation with OWL Integrity Constraints”, Fourth
International Conference on Web Reasoning and Rule Systems (RR’10),
Italy, pp. 18-22, 2010.
[11] L. Zhu, H. Chen, K. Quach, “A Semantic Framework for Data Quality
Assurance in Medical Research”, 4th Canadian Semantic Web
Symposium part of the Semantic Trilogy, pp. 54-55, 2013.
[12] X. Li, Z. Ma, H. Chen, “QODM: A query-oriented data modeling
approach for NoSQL databases”, IEEE workshop on Advanced
Research and Technology in Industry Applications (WARTIA), pp. 338-345, September, 2014.
[13] F. Bugiotti, L. Cabibbo, P. Atzeni, R. Torlone, “Database Design for
NoSQL Systems”, 33rd International Conference on Conceptual
Modeling, Atlanta, GA, USA, pp. 223-231, October, 2014.
[14] R. Sellami, S. Bhiri, B. Defude, “Supporting multi data stores
applications in cloud environments”, IEEE Transactions on Services
Computing, issue 99, pp. 1-14, June, 2015.
[15] A. Sarkar, S. Choudhury, N. Chaki, S. Bhattacharya, “Object
specification Language for Graph Based Conceptual Level
Multidimensional Data Model”, 21st International Conference on
Software Engineering & Knowledge Engineering, pp. 694-697, 2009.