ADNAN MENDERES UNIVERSITY FACULTY OF ENGINEERING

MAT 254 – Probability and Statistics
2016-2017 Spring

TYPES of DATA

Quantitative (or Numerical) data is data given as numbers,
e.g. height in cm, weight in kg, blood pressure in mmHg.
Qualitative (or Categorical) data is data that is not given
numerically,
e.g. favourite colour, place of birth, favourite food, type of car.
Example: Identify each of the following examples as
qualitative or quantitative variables.
1. The amount of gasoline pumped by the next 10 customers.
Quantitative
2. The amount of radon in the basement of each of 25 homes in a new
development.
Quantitative
3. The color of the baseball cap worn by each of 20 students.
Qualitative
4. The length of time to complete a mathematics homework
assignment.
Quantitative
5. Birthplaces of the students in the class.
Qualitative

TYPES of DATA
Secondary data is data collected by someone other than the user.
Common sources of secondary data for social science include censuses,
organizational records and data collected through qualitative methodologies or
qualitative research.
Primary data, by contrast, are collected by the investigator conducting the
research.

TYPES of DATA
Continuous data can take any value (within a range).
Ex: A person's height could be any value within the range of human heights, not
just certain fixed heights.
Ex: Time in a race could even be measured to fractions of a second.
Discrete data can only take certain values.
Ex: The number of students in a class (you can't have half a student).
Discrete data is counted; continuous data is measured.
Data collection is the process of gathering and measuring information on variables
of interest, in an established systematic fashion that enables one to answer stated
research questions, test hypotheses, and evaluate outcomes.

Data Collection Methods

Survey Method: Standardized paper-and-pencil or phone questionnaires that ask
predetermined questions.
Observation Method: The engineer observes the process or population, disturbing it
as little as possible, and records the quantities of interest.
Experimental Method: The engineer designs an experiment and makes deliberate or
purposeful changes in the controllable variables of the system or process.
Registration Method: Registers and licenses are particularly valuable for complete
enumeration.
Use of Existing Studies





When data collection entails selecting individuals or objects from a frame, the
simplest method for ensuring a representative selection is to take a simple
random sample.
Probability sampling
• Every element in the target population or sampling frame has a known,
non-zero probability of being chosen in the sample for the survey being
conducted (an equal probability in the case of simple random sampling).
• Scientific, operationally convenient and simple in theory.
• Results may be generalized to the population.
Non-probability sampling
• Elements in the sampling frame do not all have a known probability of
being chosen in the sample; some may have no chance of selection.
• Operationally convenient and simple in theory.
• Results may not be generalizable to the population.

Simple Random Sampling
◦ Selected by using chance or random numbers
◦ Each individual subject (human or otherwise) has an
equal chance of being selected
◦ Examples:
   • Drawing names from a hat
   • Using random numbers
MAT234 - Probability & Statistics, Section #3

Simple Random Sampling
Example: I want one of the students in the MAT254 class to
answer my question, so I must make a selection among
the students.
I select a student at random using my signature list.
There are 54 students. Each student is numbered: 01, 02,
03, and so on up to 54.
All students have an equal chance of being selected.
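The selection from the signature list can be sketched in Python as follows. This is only an illustration: the fixed seed (254) and the extra sample of five students are assumptions added so that the sketch is repeatable, not part of the classroom example.

```python
import random

# Signature list numbered 01..54, as in the example above.
student_numbers = list(range(1, 55))

# Assumption: a fixed seed, used only so the sketch gives repeatable results.
rng = random.Random(254)

# Each of the 54 numbers has the same chance (1/54) of being picked.
chosen = rng.choice(student_numbers)
print(f"Selected student: {chosen:02d}")

# A simple random sample of 5 students, drawn without replacement.
sample = rng.sample(student_numbers, k=5)
print("Sample of 5:", [f"{s:02d}" for s in sample])
```

Drawing without replacement means no student can be picked twice in the same sample.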
Stratified Random Sampling
◦ Divide the population into at least two different groups
with common characteristic(s), then draw some subjects
from each group (each group is called a stratum; the plural is strata)
◦ In other words, take a random sample from each stratum
◦ Typically results in a more representative sample
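A minimal sketch of stratified random sampling is given below. The population (54 numbered students), the two strata ("year 2" and "year 3") and the helper name `stratified_sample` are all hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(population, stratum_of, per_stratum, rng):
    """Group items by stratum, then draw a simple random sample from each group."""
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    sample = {}
    for name, members in strata.items():
        # Never ask for more subjects than the stratum contains.
        k = min(per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population: 54 numbered students with a made-up year level.
students = [(number, "year 2" if number <= 30 else "year 3") for number in range(1, 55)]

rng = random.Random(3)  # assumption: fixed seed for repeatability
picked = stratified_sample(students, stratum_of=lambda s: s[1], per_stratum=4, rng=rng)
for stratum, members in picked.items():
    print(stratum, sorted(number for number, _ in members))
```

Here the same number of subjects is drawn from every stratum; drawing in proportion to stratum size is another common choice.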
MAT234 - Probability & Statistics, Section #3

Systematic Sampling
◦ Select a random starting point and then select every 𝑘th subject
in the population, where 𝑘 = 𝑁/𝑛
◦ 𝑛 is the sample size, and 𝑁 is the population size.
◦ Simple to use so it is used often
◦ The systematic technique has some inherent dangers when the
sampling frame is repetitive or cyclical in nature. In these
situations the results may not approximate a simple random
sample.
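The systematic procedure can be sketched as below, reusing a frame of 54 numbered subjects. The function name `systematic_sample`, the seed, and the choice to round k = N/n down when N is not a multiple of n are assumptions made for this illustration.

```python
import random

def systematic_sample(frame, n, rng):
    """Pick a random start among the first k items, then take every k-th item."""
    N = len(frame)
    if not 1 <= n <= N:
        raise ValueError("sample size n must be between 1 and the frame size N")
    k = N // n  # sampling interval k = N / n, rounded down (assumption)
    start = rng.randrange(k)  # random starting point in 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 55))   # N = 54 subjects numbered 1..54
rng = random.Random(7)       # assumption: fixed seed for repeatability
sample = systematic_sample(frame, n=6, rng=rng)  # k = 54 / 6 = 9
print(sample)
```

If the frame were ordered in a cycle of length 9, every selected subject would sit at the same position in the cycle, which is the danger noted above.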
HOW CAN YOU VISUALIZE
A HUGE AMOUNT OF DATA?
Data Presentation
• Presentation is the process of organizing data into logical,
sequential, and meaningful categories and classifications so
that they are ready for study and interpretation.
• Analysis and presentation put data into proper order and into
categories, reducing them to forms that are intelligible
and interpretable, so that the relationships between the
specific research questions and their intended answers can
be established.
• Data presentation means putting the results of experiments into
graphs, charts and tables so that the data can be understood easily.
Data Presentation
• There are three ways of presenting data:
   ◦ Textual – presented in paragraph form to allow
   description.
   ◦ Tabular – arranging data in rows and columns.
   ◦ Graphical – pictorial representation of data to highlight
   certain trends.
Textual Methods - Rearrangement
Raw (original) data
38  17  50  23  39  25  28  45
34  35  37  38  27  24  39  43
43  20  50  45  46  26  35  18
44  46  48  38  49  34   9  42
44  38  39  29  23  46  50  45

Rearranged data (ascending order, reading down each column)
 9  23  28  35  38  43  45  48
17  24  29  37  39  43  45  49
18  25  34  38  39  44  46  50
20  26  34  38  39  44  46  50
23  27  35  38  42  45  46  50



Textual Methods – Stem-and-Leaf Plot
Each item in the sample is divided into two parts:
the stem, consisting of the leftmost one or two digits,
and the leaf, which consists of the next digit.
Raw (original) data: the 40 values listed under Textual Methods - Rearrangement.

Stem & Leaf Plot
Stem | Leaves
0    | 9
1    | 7,8
2    | 0,3,3,4,5,6,7,8,9
3    | 4,4,5,5,7,8,8,8,8,9,9,9
4    | 2,3,3,4,4,5,5,5,6,6,6,8,9
5    | 0,0,0
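The construction of the stem-and-leaf plot can be sketched as follows, using the 40 raw scores from the rearrangement example. The function name `stem_and_leaf` is an assumption for this illustration; the stem is taken as the tens digit and the leaf as the units digit.

```python
from collections import defaultdict

# The 40 raw scores from the rearrangement example.
scores = [38, 17, 50, 23, 39, 25, 28, 45, 34, 35, 37, 38, 27, 24, 39, 43,
          43, 20, 50, 45, 46, 26, 35, 18, 44, 46, 48, 38, 49, 34, 9, 42,
          44, 38, 39, 29, 23, 46, 50, 45]

def stem_and_leaf(values):
    """Map each stem (tens digit) to the sorted list of its leaves (units digits)."""
    groups = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    # Include every stem between the smallest and largest, even if it is empty.
    return {stem: groups.get(stem, []) for stem in range(min(groups), max(groups) + 1)}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem} | {','.join(str(leaf) for leaf in leaves)}")
```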
Tabular Methods
A sample table consists of the following parts:
Table Number, Table Title, Column Header, Row Classifier, Body, Source Note.
Tabular Methods – Frequency Distribution Table
A frequency distribution table is a table which shows the data arranged into
different classes (or categories) and the number of cases (or frequencies) which fall
into each class.

Stem & Leaf Plot of Data
Stem | Leaves
0    | 9
1    | 7,8
2    | 0,3,3,4,5,6,7,8,9
3    | 4,4,5,5,7,8,8,8,8,9,9,9
4    | 2,3,3,4,4,5,5,5,6,6,6,8,9
5    | 0,0,0

Frequency Distribution Table of same Data
Scores  | Frequency
1 – 10  | 1
11 – 20 | 3
21 – 30 | 8
31 – 40 | 12
41 – 50 | 16
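The class frequencies can be checked with a short sketch that counts how many of the 40 scores fall into each class. The function name `frequency_table` is an assumption for this illustration; the class limits are the ones used in the table.

```python
# The 40 raw scores used throughout this example.
scores = [38, 17, 50, 23, 39, 25, 28, 45, 34, 35, 37, 38, 27, 24, 39, 43,
          43, 20, 50, 45, 46, 26, 35, 18, 44, 46, 48, 38, 49, 34, 9, 42,
          44, 38, 39, 29, 23, 46, 50, 45]

# Class limits (lower, upper), both inclusive, as in the table.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]

def frequency_table(values, class_limits):
    """Count how many values fall into each class (limits are inclusive)."""
    return {
        (low, high): sum(1 for value in values if low <= value <= high)
        for low, high in class_limits
    }

table = frequency_table(scores, classes)
for (low, high), count in table.items():
    print(f"{low:2d} - {high:2d} : {count}")
```

Because the classes are mutually exclusive and cover every score, the frequencies add up to the number of original data values.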
Tabular Methods – Frequency Distribution Table
Guidelines For Frequency Tables
1. Be sure that the classes are mutually exclusive.
2. Include all classes, even if the frequency is zero.
3. Try to use the same width for all classes.
4. Select convenient numbers for class limits.
5. Use between 5 and 20 classes.
6. The sum of the class frequencies must equal the number of original data
values.
Tabular Methods – Frequency Distribution Table
Relative FDT & Cumulative FDT
Score   | Frequency | Relative Frequency | Cumulative Frequency
1 – 10  | 1         | 2.5%               | 1
11 – 20 | 3         | 7.5%               | 4 (=1+3)
21 – 30 | 8         | 20%                | 12 (=4+8)
31 – 40 | 12        | 30%                | 24 (=12+12)
41 – 50 | 16        | 40%                | 40 (=24+16)
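Relative and cumulative frequencies follow directly from the class frequencies, as this sketch suggests. The variable names are assumptions for illustration; the frequencies are those of the table above.

```python
from itertools import accumulate

# Class frequencies from the frequency distribution table.
frequencies = {"1 - 10": 1, "11 - 20": 3, "21 - 30": 8, "31 - 40": 12, "41 - 50": 16}
total = sum(frequencies.values())

# Relative frequency: the share of all observations in each class, in percent.
relative = {label: 100 * count / total for label, count in frequencies.items()}

# Cumulative frequency: running total of the class frequencies.
cumulative = dict(zip(frequencies, accumulate(frequencies.values())))

for label in frequencies:
    print(f"{label:>7} | {frequencies[label]:2d} | {relative[label]:4.1f}% | {cumulative[label]:2d}")
```

The last cumulative frequency always equals the total number of observations.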
Tabular Methods – Frequency Distribution Table
Example: Guests staying at Hilton Hotel were asked to rate the quality of
their accommodations as being excellent, above average, average,
below average, or poor. The ratings provided by a sample of 20 guests
are shown below.
Below Average | Above Average | Above Average | Average | Above Average
Average       | Above Average | Average       | Above Average | Below Average
Poor          | Excellent     | Above Average | Average | Above Average
Above Average | Below Average | Poor          | Above Average | Average

Frequency Distribution Table
Rating        | Frequency
Poor          | 2
Below Average | 3
Average       | 5
Above Average | 9
Excellent     | 1
Total         | 20
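For categorical data such as these ratings, the frequency table is a simple tally. A minimal sketch using `collections.Counter` from the Python standard library is shown below; the variable names are assumptions for illustration.

```python
from collections import Counter

# The 20 guest ratings from the example.
ratings = [
    "Below Average", "Above Average", "Above Average", "Average", "Above Average",
    "Average", "Above Average", "Average", "Above Average", "Below Average",
    "Poor", "Excellent", "Above Average", "Average", "Above Average",
    "Above Average", "Below Average", "Poor", "Above Average", "Average",
]

# Categories listed from worst to best, so the table keeps a meaningful order.
order = ["Poor", "Below Average", "Average", "Above Average", "Excellent"]

counts = Counter(ratings)
table = [(rating, counts[rating]) for rating in order]
for rating, count in table:
    print(f"{rating:<13} | {count}")
print(f"{'Total':<13} | {sum(counts.values())}")
```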
Graphical Methods
Graphic presentations are used to illustrate and clarify information. Tables are essential
in the presentation of scientific data, and diagrams complement them by summarizing the
tables in an easy, attractive and simple way.
A diagram should:
• Be simple
• Be easy to understand
• Save a lot of words
• Be self-explanatory
• Have a clear title indicating its content
• Be fully labeled
• Usually use the y axis (vertical) for frequency
Graphical Methods - Dot Plot



One of the simplest graphical summaries of data is a dot plot.
A horizontal axis shows the range of data values.
Each data value is then represented by a dot placed above the axis.
[Figure: a dot plot example from the textbook. The o values represent the “nitrogen” data and
the x values represent the “no-nitrogen” data.]
Graphical Methods - Scatter Plot
• A scatter plot is useful for representing the relationship between two numeric
measurements; each observation is represented by a point
corresponding to its value on each axis.
[Figure: a scatter plot example from the textbook.]
Graphical Methods - Line Diagram
• A line diagram shows the relationship between two numeric variables (as the
scatter plot does), but the points are joined together to form a line (either a broken line or
a smooth curve).
[Figure: number of doctors working in each clinic (Clinic 1, Clinic 2, Clinic 3) during the years 1995-1998; vertical axis: number of doctors (0 to 6), horizontal axis: year (1995 to 1998).]
Graphical Methods - A cumulative frequency graph (ogive)
• A cumulative frequency graph, or ogive, is a line graph that displays the cumulative
frequency of each class at its upper class boundary.
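As a rough illustration, the points of an ogive can be computed from the frequency distribution table of the 40 scores given earlier in these notes. The sketch assumes the scores are whole numbers, so each upper class boundary is taken halfway between adjacent class limits (10.5, 20.5, and so on); that boundary convention is an assumption, not something stated in the slides.

```python
from itertools import accumulate

# Class limits and frequencies from the frequency distribution table of the 40 scores.
classes = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50)]
frequencies = [1, 3, 8, 12, 16]

# Assumption: whole-number scores, so the upper class boundary lies halfway
# between one class's upper limit and the next class's lower limit.
upper_boundaries = [high + 0.5 for _, high in classes]
cumulative = list(accumulate(frequencies))

# The ogive starts at the lower boundary of the first class with cumulative frequency 0.
ogive_points = [(classes[0][0] - 0.5, 0)] + list(zip(upper_boundaries, cumulative))
for boundary, cum_freq in ogive_points:
    print(f"{boundary:5.1f} -> {cum_freq}")
```

Joining these points with straight line segments gives the ogive.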
Graphical Methods - Bar Charts







The data presented are categorical.
The data are presented as rectangular bars.
Each bar represents one category (variant).
A suitable scale should be indicated, and the scale should start from zero.
The bars should all have the same width, and the gaps between them should be
equal.
The length of each bar is proportional to the magnitude/frequency of
the variable.
The bars may be vertical or horizontal.
Graphical Methods - Bar Charts

Multiple Bar Charts
◦ Also called compound bar charts
◦ More than one sub-variant can be expressed
Graphical Methods - Bar Charts

Component Bar Charts
◦ When there are many categories on the X-axis and they have further
subcategories, the bars may be divided into parts to accommodate them,
each part representing a certain item and
proportional to the magnitude of that particular item.
Graphical Methods - Pie Charts
• One of the most common ways of presenting categorical data
• The value of each category is divided by the total of all values and multiplied
by 360; the category is then allocated that angle (in degrees), so that its slice
shows the proportion it represents.
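The angle calculation can be illustrated with the hotel rating frequencies from the earlier example. This is a minimal sketch; the variable names are assumptions for illustration.

```python
# Rating frequencies from the hotel example (20 guests in total).
frequencies = {"Poor": 2, "Below Average": 3, "Average": 5, "Above Average": 9, "Excellent": 1}
total = sum(frequencies.values())

# Angle of each slice in degrees: (category value / total) * 360.
angles = {rating: 360 * count / total for rating, count in frequencies.items()}

for rating, angle in angles.items():
    print(f"{rating:<13} : {angle:5.1f} degrees")
```

The slice angles add up to 360 degrees, one full circle.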
Graphical Methods - Histogram
It is very similar to the bar chart, with the difference that the rectangles or bars are
adjacent (without gaps).
It is used for presenting a class frequency table (continuous data).
Each bar represents a class: its height represents the frequency (number of
cases) and its width represents the class interval.
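A rough text rendering of a histogram is sketched below for the class frequencies of the 40 scores tabulated earlier. For simplicity the bars are drawn horizontally, one per class, with the bar length equal to the frequency; the function name `text_histogram` and the `#` symbol are assumptions for illustration.

```python
# Classes and frequencies from the frequency distribution table of the 40 scores.
classes = ["1 - 10", "11 - 20", "21 - 30", "31 - 40", "41 - 50"]
frequencies = [1, 3, 8, 12, 16]

def text_histogram(labels, counts, symbol="#"):
    """Return one line per class; the bar length equals the class frequency."""
    width = max(len(label) for label in labels)
    return [
        f"{label:>{width}} | {symbol * count} ({count})"
        for label, count in zip(labels, counts)
    ]

lines = text_histogram(classes, frequencies)
for line in lines:
    print(line)
```

The bars sit on consecutive lines with no gap between them, mirroring the adjacent bars of a histogram.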
Graphical Methods - Skewness of Data
Frequency Polygon
Derived from a histogram by connecting the
midpoints of the tops of the rectangles in the
histogram; the resulting line is called a frequency polygon.
We can also draw the polygon without the rectangles, which
gives a simpler form of line graph.
Graphical Methods - Frequency Polygon
Ex:
Age in Years | Sex: Males | Sex: Females | Mid-point of interval
20-30        | 3          | 2            | (20+30)/2=25
30-40        | 5          | 5            | (30+40)/2=35
40-50        | 7          | 8            | (40+50)/2=45
50-60        | 4          | 3            | (50+60)/2=55
60-70        | 2          | 4            | (60+70)/2=65
Total        | 21         | 22           |
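The points of the two frequency polygons in this example can be computed as in the sketch below: each class is represented by its midpoint on the horizontal axis and its frequency on the vertical axis. The variable names are assumptions for illustration.

```python
# Age classes and frequencies from the example table.
age_classes = [(20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
males = [3, 5, 7, 4, 2]
females = [2, 5, 8, 3, 4]

# Midpoint of each interval: (lower limit + upper limit) / 2.
midpoints = [(low + high) / 2 for low, high in age_classes]

# Each polygon is the list of (midpoint, frequency) points joined in order.
male_polygon = list(zip(midpoints, males))
female_polygon = list(zip(midpoints, females))

print("Males:  ", male_polygon)
print("Females:", female_polygon)
```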
Graphical Methods - Box & Whisker Plot (or Box Plot)
A box plot is another way of representing the summary information that can be read from a
cumulative frequency graph. It shows:
Lowest value
Highest value
Median
Lower Quartile
Upper Quartile
Inter-Quartile Range (upper quartile minus lower quartile)
Range (highest value minus lowest value)
Note: For grouped data, the minimum value is the lowest possible value of the first group, and the
maximum value is the highest possible value of the last group.
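The quantities shown on a box plot can be computed as in the sketch below, using the 40 raw (ungrouped) scores from the rearrangement example. Several quartile conventions exist; this sketch assumes one common convention, in which the lower and upper quartiles are the medians of the lower and upper halves of the sorted data. The function name `five_number_summary` is an assumption for illustration.

```python
from statistics import median

# The 40 raw scores from the rearrangement example.
scores = [38, 17, 50, 23, 39, 25, 28, 45, 34, 35, 37, 38, 27, 24, 39, 43,
          43, 20, 50, 45, 46, 26, 35, 18, 44, 46, 48, 38, 49, 34, 9, 42,
          44, 38, 39, 29, 23, 46, 50, 45]

def five_number_summary(values):
    """Lowest value, lower quartile, median, upper quartile, highest value."""
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]
    # For an odd number of values, the middle value belongs to neither half.
    upper_half = data[half:] if len(data) % 2 == 0 else data[half + 1:]
    return {
        "lowest": data[0],
        "lower_quartile": median(lower_half),
        "median": median(data),
        "upper_quartile": median(upper_half),
        "highest": data[-1],
    }

summary = five_number_summary(scores)
iqr = summary["upper_quartile"] - summary["lower_quartile"]
data_range = summary["highest"] - summary["lowest"]
print(summary, "IQR:", iqr, "Range:", data_range)
```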
Ex: Table 1.4 from textbook (Car battery life)
Ex: Table 1.5 Stem and Leaf plot for Car battery life

Double stem and leaf plot: the stems
corresponding to leaves 0 through 4 have
been coded by the symbol ⋆ and the stems
corresponding to leaves 5 through 9 by the
symbol ·.
Ex: Table 1.6 Double Stem and Leaf plot for Car battery life
Ex: Table 1.7 Relative Frequency Distribution for Car battery life
Note: total number of observations: 40
Ex: Figure 1.6 Relative Frequency Histogram for Car battery life
Rotating the stem and leaf plot
counterclockwise through an angle of 90
degrees gives a figure similar to the
histogram.
The distribution is skewed to the left.
Ex: Exercise 1.20 from textbook
Data (50 observations):
17  20  10   9  23  13  12  19  18  24
12  14   6   9  13   6   7  16  18   8
13   3  32   9   7  10   4  13   7  18
 7  10   5  10  13   7  10  11  27  19
16  14  15  10   9   6   7   7   8  15

Data in ascending order (reading down each column):
 3   6   7   9  10  11  13  15  18  20
 4   7   7   9  10  12  13  15  18  23
 5   7   7   9  10  12  13  16  18  24
 6   7   8   9  10  13  14  16  19  27
 6   7   8  10  10  13  14  17  19  32
Ex: Exercise 1.20 from textbook
Stem | Leaf              | Freq. | Rel. Freq. | Cum. freq.
0*   | 34                | 2     | 0.04       | 2
0.   | 56667777777889999 | 17    | 0.34       | 19
1*   | 0000001223333344  | 16    | 0.32       | 35
1.   | 5566788899        | 10    | 0.20       | 45
2*   | 034               | 3     | 0.06       | 48
2.   | 7                 | 1     | 0.02       | 49
3*   | 2                 | 1     | 0.02       | 50
3.   |                   | 0     | 0          | 50
Total obs.: 50
[Figure: Frequency Histogram (using MATLAB); vertical axis: Frequency (0 to 18).]
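The double stem-and-leaf table for Exercise 1.20 can be reproduced with the sketch below, which splits each stem into a `*` row (leaves 0 through 4) and a `.` row (leaves 5 through 9) and then computes the frequency, relative frequency and cumulative frequency of each row. The slides used MATLAB for the histogram; Python is used here, and the function name `double_stem_rows` is an assumption for illustration.

```python
# The 50 observations of Exercise 1.20 as listed in these notes.
data = [17, 20, 10, 9, 23, 13, 12, 19, 18, 24,
        12, 14, 6, 9, 13, 6, 7, 16, 18, 8,
        13, 3, 32, 9, 7, 10, 4, 13, 7, 18,
        7, 10, 5, 10, 13, 7, 10, 11, 27, 19,
        16, 14, 15, 10, 9, 6, 7, 7, 8, 15]

def double_stem_rows(values):
    """Build rows (stem label, leaves, freq, rel. freq, cum. freq) with split stems."""
    rows = []
    cumulative = 0
    for stem in range(max(values) // 10 + 1):
        for mark, low, high in (("*", 0, 4), (".", 5, 9)):
            leaves = sorted(v % 10 for v in values
                            if v // 10 == stem and low <= v % 10 <= high)
            cumulative += len(leaves)
            rows.append((f"{stem}{mark}", "".join(map(str, leaves)),
                         len(leaves), len(leaves) / len(values), cumulative))
    return rows

rows = double_stem_rows(data)
for label, leaves, freq, rel_freq, cum_freq in rows:
    print(f"{label:<3}| {leaves:<18}| {freq:>2} | {rel_freq:.2f} | {cum_freq:>2}")
```

The relative frequencies are the row frequencies divided by the 50 observations, so they add up to 1.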
END OF THE LECTURE…