ENGR 112

Graphical Analysis
Why Graph Data?

Graphical methods





Require very little training
Easy to use
Massive amounts of data can be presented more readily
Can provide an understanding of the distribution of the data
May be easier to interpret for individuals with less mathematical background than engineers
Graphical methods

Quantitative data (numerical data)






Cost of a computer (continuous)
Number of production defects (discrete)
Weight of a person (continuous)
Parts produced this month (discrete)
Temperature of etch bath (continuous)
Graphical tools



Line charts
Histograms
Scatter charts
Graphical methods

Qualitative data (categorical and attribute)



Type of equipment (Manual, automated, semiautomated)
Operator (Tom, Nina, Jose)
Graphical tools



Bar charts
Pie charts
Pareto charts
Getting Started

Classify data





Quantitative vs. Qualitative
Continuous or discrete (quantitative)
Choose the right graphical tool
Choose axes and scales to provide the best “view” of the data
Label graphs to eliminate ambiguity
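The classification step can be sketched in a few lines of Python. This is only an illustration: the helper names (`classify`, `is_discrete`, `suggest_charts`) and the sample values are made up, the "all integers means discrete" rule is a rough assumption, and the chart lists simply repeat the tools named on the earlier slides.

```python
# Chart families listed in these slides for each class of data.
CHARTS_BY_DATA_CLASS = {
    "quantitative": ["line chart", "histogram", "scatter chart"],
    "qualitative": ["bar chart", "pie chart", "Pareto chart"],
}


def classify(values):
    """Hypothetical helper: numbers are quantitative, anything else is qualitative."""
    is_number = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_number else "qualitative"


def is_discrete(values):
    """Rough heuristic (an assumption): all-integer data is treated as counts."""
    return all(isinstance(v, int) for v in values)


def suggest_charts(values):
    """Return the data class and the candidate chart types for it."""
    data_class = classify(values)
    return data_class, CHARTS_BY_DATA_CLASS[data_class]


# Made-up sample values, for illustration only.
defects_per_lot = [3, 0, 2, 5, 1]            # quantitative, discrete
etch_bath_temp = [71.2, 70.8, 71.5, 70.9]    # quantitative, continuous
operators = ["Tom", "Nina", "Jose", "Tom"]   # qualitative

for sample in (defects_per_lot, etch_bath_temp, operators):
    print(suggest_charts(sample))
```

The lookup only narrows the choice of chart; picking axes, scales, and labels still takes judgment.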
Graphical Analysis
Examples
Bar or Column Graph
Displays frequency of observations that fall into nominal categories
[Figure: bar chart, "Color distribution for a random package of M&Ms"; x-axis: Color (brown, red, yellow, green, orange, blue); y-axis: Qty, 0 to 25]
Line Chart
Shows trends in data at equal intervals
[Figure: line chart; y-axis: Scan Time (Seconds), 0 to 4.5; x-axis: Performance Category (Low Light, Normal Light, Bright Light, Freehand Scan, Controlled Scan, Max Pitch Average, Max Skew Average); series: CCD1, CCD2, LR, LCCD, CMOS]
Graphical methods

Acceptable graph
[Figure: chart titled "EDC Warehouse, Test Results for Read Time, ALL SYSTEMS"; y-axis: Read Time (secs/read), 0 to 2; x-axis: RFID System, numbered 1 to 8; data labels: 1.46, 0.81, 0.88, 0.52, 0.64, 0.66, 0.20, N/A]
Graphical methods
Better graph
[Figure: chart titled "EDC Warehouse, Test Results for Read Time, ALL SYSTEMS"; y-axis: Read Time (secs/read), 0 to 2; x-axis: RFID System, lettered A to H; data labels: 1.46, 0.81, 0.88, 0.52, 0.64, 0.66, 0.20, N/A]
Graphical Analysis Details





Always label axes with titles and units
Always use chart titles
Use scales that are appropriate to the range of data being plotted
Use legends only when they add value
Use both points and lines on line graphs only when it is appropriate; do not use both if the data is discrete
Histograms

A histogram is a pictorial representation of the distribution of a measured quantity or of counted items. It is a quick tool for displaying the average and the amount of variation present.
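A minimal sketch of how a histogram is built, assuming equal-width bins. The function name `histogram_counts`, the choice of four bins, and the measurement values are made up for illustration. The average and standard deviation are printed next to the bars to show the two things a histogram is meant to convey.

```python
from statistics import mean, stdev


def histogram_counts(values, n_bins):
    """Count how many values fall into each of n_bins equal-width bins."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        if width == 0:
            index = 0  # all values identical: put everything in the first bin
        else:
            # The maximum value would land one past the end, so clamp it
            # into the last bin.
            index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    edges = [low + k * width for k in range(n_bins + 1)]
    return edges, counts


# Made-up line-width measurements (um), for illustration only.
line_widths = [2.1, 2.4, 2.5, 2.8, 2.9, 3.0, 3.0, 3.2, 3.3, 3.7, 3.9, 4.1]

edges, counts = histogram_counts(line_widths, n_bins=4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:4.2f} to {right:4.2f} | {'#' * count}")

print(f"average = {mean(line_widths):.2f}, "
      f"standard deviation = {stdev(line_widths):.2f}")
```

The number of bins is a judgment call: too few hides the shape of the distribution, and too many leaves most bins nearly empty.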
Histogram example
The Pareto principle
Dr. Joseph Juran (of total quality management fame) formulated the Pareto Principle after expanding on the work of Vilfredo Pareto, a nineteenth-century economist and sociologist. The Pareto Principle states that a small number of causes is responsible for a large percentage of the effect, usually in a ratio of about 20 percent of the causes to 80 percent of the effect.
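The ranking behind a Pareto chart can be sketched as follows: sort the causes from largest to smallest and accumulate their share of the total. The function name `pareto_table` and the defect counts are made up for illustration.

```python
def pareto_table(counts_by_cause):
    """Sort causes by count (largest first) and attach cumulative percentages."""
    total = sum(counts_by_cause.values())
    ranked = sorted(counts_by_cause.items(), key=lambda item: item[1], reverse=True)
    rows, running = [], 0
    for cause, count in ranked:
        running += count
        rows.append((cause, count, 100.0 * running / total))
    return rows


# Made-up defect counts by cause, for illustration only.
defects = {
    "scratches": 8,
    "misalignment": 50,
    "contamination": 30,
    "wrong part": 5,
    "labeling": 4,
    "other": 3,
}

for cause, count, cumulative in pareto_table(defects):
    print(f"{cause:<14} {count:>3}  {cumulative:5.1f}%")
```

In this made-up data set, the first two of six causes account for 80 percent of the defects, which is the kind of concentration the principle describes.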
Pareto example
Histogram Example in Excel
[Figure: "Line Width Histogram"; x-axis: Line Width (um), bins labeled 0.75, 1.17, 1.59, 2.01, 2.43, 2.85, 3.27, 3.69, 4.12, 4.54, 4.96; y-axis: Frequency, 0 to 70]
ENGR 112
Fitting Equations to Data
Introduction

Engineers frequently collect paired data in order to understand
  Characteristics of an object
  Behavior of a system
Relationships between paired data are often developed graphically
Mathematical relationships between paired data can provide additional insight
Regression Analysis
Regression analysis is a mathematical analysis technique used to determine something about the relationship between random variables.
Regression Analysis Goal
To develop a statistical model that can be used to predict the value of a variable based on the value of another
Regression Analysis


Regression models are used primarily for the purpose of prediction
Regression models typically involve
  A dependent or response variable
    Represented as y
  One or more independent or explanatory variables
    Represented as x1, x2, …, xn
Regression Analysis
Our focus?


Models with only one explanatory variable
These models are called simple linear regression models
Regression Analysis
A scatter diagram is used to plot an independent variable vs. a dependent variable
[Figure: scatter diagram, "Mail-Order House: Relationship b/w Weight of Mail vs. No. of Orders"; x-axis: Weight of Mail (lbs), 0 to 800; y-axis: No. of Orders (thousands), 0 to 25]
Regression Analysis
Remember!!


Relationships between variables can take many forms
Selection of the proper mathematical model is influenced by the distribution of the X and Y values on the scatter diagram
Regression Analysis
[Figure: four scatter diagrams of Y versus X]
Regression Analysis Model
SIMPLE LINEAR REGRESSION MODEL
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$
However, both $\beta_0$ and $\beta_1$ are population parameters
$\varepsilon_i$ represents the random error in Y for each observation i
Regression Analysis Model

Since we will be working with samples, the previous model becomes
$$\hat{Y}_i = b_0 + b_1 X_i$$
Where
$b_0$ = Y intercept (estimate of $\beta_0$): value of Y when X = 0
$b_1$ = Slope (estimate of $\beta_1$): expected change in Y per unit change in X
$\hat{Y}_i$ = Predicted (estimated) value of Y
Regression Analysis Model


What happened with the error term?
Unfortunately, it is not gone. We still have errors in the estimated values
$$e_i = Y_i - \hat{Y}_i$$
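As a small illustration, the error for each observation is the observed Y minus the value predicted by the line. The data points and the coefficients below are made up and are not a fitted result; the function names `predict` and `residuals` are hypothetical.

```python
def predict(b0, b1, x):
    """Predicted value (Y-hat) for a given X under the line b0 + b1*X."""
    return b0 + b1 * x


def residuals(b0, b1, xs, ys):
    """Error for each observation: observed Y minus predicted Y."""
    return [y - predict(b0, b1, x) for x, y in zip(xs, ys)]


# Made-up data and made-up coefficients, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 9, 12]
b0, b1 = 0.5, 2.0

errors = residuals(b0, b1, xs, ys)
print(errors)                       # one error per observation
print(sum(e * e for e in errors))   # sum of squared errors for this line
```

A positive error means the point lies above the line, and a negative error means it lies below.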
Regression Analysis


Find the straight line that BEST fits the data
Regression Analysis
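[Figure: "Positive Straight-Line Relationship"; axes X and Y; fitted line $\hat{Y}_i = b_0 + b_1 X_i$ with intercept $b_0$ and slope $b_1$ marked; errors $e_1$ through $e_5$ marked for five data points]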
Least Squares Method


Mathematical technique that determines the values of $b_0$ and $b_1$
It does so by minimizing the following expression
$$\min \sum_{i=1}^{n} e_i^2 = \min \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \min \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2$$
Least Squares Method
Resulting equations
$$\sum_{i=1}^{n} Y_i = n b_0 + b_1 \sum_{i=1}^{n} X_i \qquad (1)$$
$$\sum_{i=1}^{n} X_i Y_i = b_0 \sum_{i=1}^{n} X_i + b_1 \sum_{i=1}^{n} X_i^2 \qquad (2)$$
Equations (1) and (2) are called the “normal equations”
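The two normal equations are linear in b0 and b1, so they can be solved directly once the sums are known. The sketch below eliminates b0 to obtain b1 and then back-substitutes. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Solve the two normal equations for the intercept b0 and slope b1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Eliminating b0 between equations (1) and (2) gives b1;
    # equation (1) then gives b0.
    denominator = n * sum_xx - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    b1 = (n * sum_xy - sum_x * sum_y) / denominator
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


# Made-up paired data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 9, 12]

b0, b1 = fit_line(xs, ys)
print(f"Y-hat = {b0:.2f} + {b1:.2f} X")
```

For this made-up data the sums are n = 5, ΣX = 15, ΣY = 35, ΣX² = 55, and ΣXY = 127, which give a slope of 2.2 and an intercept of 0.4.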
Least Squares Method

Assume the following values
$$n = 5, \quad \sum x = 2, \quad \sum y = 20, \quad \sum x^2 = 10, \quad \sum xy = 15$$
Resulting equations
$$5 b_0 + 2 b_1 = 20 \qquad (1)$$
$$2 b_0 + 10 b_1 = 15 \qquad (2)$$
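The example stops at the two equations, 5b0 + 2b1 = 20 and 2b0 + 10b1 = 15. One way to finish it is Cramer's rule, sketched here with exact fractions; the function name `solve_2x2` is made up, and the slides do not say which solution method was intended.

```python
from fractions import Fraction


def solve_2x2(a11, a12, c1, a21, a22, c2):
    """Solve a11*b0 + a12*b1 = c1 and a21*b0 + a22*b1 = c2 by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("the two equations do not have a unique solution")
    b0 = Fraction(c1 * a22 - a12 * c2, det)
    b1 = Fraction(a11 * c2 - c1 * a21, det)
    return b0, b1


# Coefficients taken from equations (1) and (2) of the example.
b0, b1 = solve_2x2(5, 2, 20, 2, 10, 15)
print(b0, b1)                  # 85/23 35/46
print(float(b0), float(b1))    # decimal approximations
```

Substituting back confirms the result: 5(85/23) + 2(35/46) = 20 and 2(85/23) + 10(35/46) = 15, so b0 is about 3.70 and b1 is about 0.76.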
Assessing Fit

How do we know how good a regression model is?
Sum of squares of errors (SSE)
  Good if we have additional models to compare against
Coefficient of determination, $r^2$
  A value close to 1 suggests a good fit
$$r^2 = 1 - \frac{SSE}{SST}$$
Where do we get these values?
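One possible answer, sketched under an assumption: the slides do not define SST, so the code uses the standard definition, the total sum of squares of Y about its mean. SSE is the sum of squared differences between observed and predicted Y. The function name `r_squared`, the data, and the coefficients are made up for illustration.

```python
def r_squared(xs, ys, b0, b1):
    """Return SSE, SST, and r^2 = 1 - SSE/SST for the line b0 + b1*X."""
    y_bar = sum(ys) / len(ys)
    # SSE: squared differences between observed Y and predicted Y.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # SST (assumed standard definition): squared differences between Y and its mean.
    sst = sum((y - y_bar) ** 2 for y in ys)
    if sst == 0:
        raise ValueError("all Y values are identical; r^2 is undefined")
    return sse, sst, 1 - sse / sst


# Made-up data with its least-squares coefficients, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 9, 12]

sse, sst, r2 = r_squared(xs, ys, b0=0.4, b1=2.2)
print(f"SSE = {sse:.2f}, SST = {sst:.2f}, r^2 = {r2:.3f}")
```

For this made-up data SSE is 1.6 and SST is 50, so r² is about 0.97, which would suggest a good fit.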