Tuesday February 27

Homework Discussion
• Read pages 446 - 461
• Page 467: 17 – 20, 25 – 27, 61, 62, 63, 67
• See if you can find an example in your life
of a survey that might yield unreliable
results
The critical issues are:
a. Finding a sample that is representative of the population, and
b. Determining how big the sample should be.
Choosing a good sample of a reasonable size is more
important than the sampling rate.
• Bush's lead gets smaller in poll
By Susan Page, USA TODAY
WASHINGTON — President Bush leads Sen. John Kerry by 8
percentage points among likely voters, the latest USA TODAY/
CNN/Gallup Poll shows. That is a smaller advantage than the
president held in mid-September but shows him maintaining a
durable edge in a race that was essentially tied for months.
Results based on likely voters are based on the subsample of
758 survey respondents deemed most likely to vote in the
November 2004 General Election. The margin of sampling error
is ±4 percentage points.
George Gallup explained:
• Whether you poll the United States or New
York State or Baton Rouge … you need…
the same number of interviews or samples.
It’s no mystery really – if a cook has two
pots of soup on the stove, one far larger than
the other, and thoroughly stirs them both, he
doesn’t have to take more spoonfuls from
one than the other to sample the taste
accurately.
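Gallup's point can be checked with a rough calculation. The sketch below is not from the slides: it assumes a simple random sample, the worst-case proportion p = 0.5, and z = 1.96 (roughly 95% confidence), and the population sizes are rough, illustrative figures. The function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Approximate margin of sampling error for a proportion.

    Assumptions (not from the slides): simple random sample,
    worst-case p = 0.5, z = 1.96 (about 95% confidence).
    If a population size is given, the finite population
    correction is applied.
    """
    moe = z * math.sqrt(p * (1 - p) / n)
    if population is not None:
        moe *= math.sqrt((population - n) / (population - 1))
    return moe

# n comes from the poll quoted above; the population sizes are
# rough, illustrative figures only.
n = 758
for name, population in [("United States", 300_000_000),
                         ("New York State", 19_000_000),
                         ("Baton Rouge", 225_000)]:
    points = margin_of_error(n, population) * 100
    print(f"{name}: about ±{points:.1f} percentage points")
```

Under these assumptions the three results are nearly identical, and each rounds to the ±4 points reported in the poll: the sample size drives the margin of error, not the size of the population.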
Statistics is the science of dealing with data. This
includes gathering data, organizing data, interpreting data,
and understanding data.
Descriptive statistics (page 476) is the area which
describes large amounts of data in a way that is
understandable, useful, and, if need be, convincing.
EXAMPLE 1 (page 478). Stat 101 Midterm Exam Scores
(25 Points Possible): N=75

ID    Score   ID    Score   ID    Score   ID    Score   ID    Score
1257  12      2651  10      4355   8      6336  11      8007  13
1297  16      2658  11      4396   7      6510  13      8041   9
1348  11      2794   9      4445  11      6622  11      8129  11
1379  24      2795  13      4787  11      6754   8      8366  13
1450   9      2833  10      4855  14      6798   9      8493   8
1506  10      2905  10      4944   6      6873   9      8522   8
1731  14      3269  13      5298  11      6931  12      8664  10
1753   8      3284  15      5434  13      7041  13      8767   7
1818  12      3310  11      5604  10      7196  13      9128  10
2030  12      3596   9      5644   9      7292  12      9380   9
2058  11      3906  14      5689  11      7362  10      9424  10
2462  10      4042  10      5736  10      7503  10      9541   8
2489  11      4124  12      5852   9      7616  14      9928  15
2542  10      4204  12      5877   9      7629  14      9953  11
2619   1      4224  10      5906  12      7961  12      9973  10
DESCRIPTIVE STATISTICS
A data set is a collection of data values called data points.
The size of a data set is the number of data points in it.
We use N to represent size.
In statistical usage, a variable is any characteristic that varies
with members of a population. (page 481)
Baseball stats
When the possible values of a numerical variable change by
minimum increments, the variable is called discrete.
When the differences between the values of a numerical
variable can be arbitrarily small, we call the variable
continuous.
TABLE 14-2 Frequency Table
for Stat 101 Data Set
A frequency table (page 478)
is a listing of the scores along
with the frequency with which
they occur.
Frequency Table
Score   Frequency   %
 0       0            0.00%
 1       1            1.33%
 2       0            0.00%
 3       0            0.00%
 4       0            0.00%
 5       0            0.00%
 6       1            1.33%
 7       2            2.67%
 8       6            8.00%
 9      10           13.33%
10      16           21.33%
11      13           17.33%
12       9           12.00%
13       8           10.67%
14       5            6.67%
15       2            2.67%
16       1            1.33%
17       0            0.00%
18       0            0.00%
19       0            0.00%
20       0            0.00%
21       0            0.00%
22       0            0.00%
23       0            0.00%
24       1            1.33%
25       0            0.00%
N=      75          100.00%
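A frequency table like this one can be built directly from the raw scores. The following is a minimal sketch using the 75 scores from Example 1; the variable names are illustrative only.

```python
from collections import Counter

# The 75 Stat 101 midterm scores from Example 1, read column by column.
scores = [
    12, 16, 11, 24, 9, 10, 14, 8, 12, 12, 11, 10, 11, 10, 1,
    10, 11, 9, 13, 10, 10, 13, 15, 11, 9, 14, 10, 12, 12, 10,
    8, 7, 11, 11, 14, 6, 11, 13, 10, 9, 11, 10, 9, 9, 12,
    11, 13, 11, 8, 9, 9, 12, 13, 13, 12, 10, 10, 14, 14, 12,
    13, 9, 11, 13, 8, 8, 10, 7, 10, 9, 10, 8, 15, 11, 10,
]

# Counter maps each score to the number of times it occurs;
# scores that never occur are reported as 0.
frequency = Counter(scores)
N = len(scores)

print("Score  Frequency")
for score in range(0, 26):
    print(f"{score:>5}  {frequency[score]:>9}")
print(f"N = {N}")
```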
A bar graph (page 479) is a graph with the possible test
scores listed in increasing order on a horizontal axis and the
frequency of each test score displayed by the height of the
column above that test score.
[Figure: bar graph of the Stat 101 scores, N=75. Horizontal axis: Score (1 to 25). Vertical axis: Frequency (0 to 18).]
Outliers are data points that do not fit into the overall pattern
of the data.
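The slides do not give a numerical rule for spotting outliers. One common rule of thumb, used here purely as an illustration, flags any point more than 1.5 times the interquartile range below the first quartile or above the third quartile. The sketch below applies that assumed rule to the Stat 101 scores; note that quartile conventions vary, and this uses the default method of Python's statistics.quantiles.

```python
import statistics

# Stat 101 frequency table (score -> frequency), nonzero rows only.
frequency = {1: 1, 6: 1, 7: 2, 8: 6, 9: 10, 10: 16, 11: 13,
             12: 9, 13: 8, 14: 5, 15: 2, 16: 1, 24: 1}

# Expand the table back into a sorted list of the 75 scores.
scores = sorted(s for s, f in frequency.items() for _ in range(f))

# Quartiles (default "exclusive" method), then the 1.5 * IQR fences.
# The 1.5 * IQR rule is an assumption, not something the slides state.
q1, _median, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr

outliers = [s for s in scores if s < low_fence or s > high_fence]
print("Outliers:", outliers)
```

Under this rule the scores 1 and 24 are flagged, which agrees with what the bar graph suggests visually.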
Objective 7:
Creating structures and systems that model problems
and information
[Figure: relative-frequency bar graph of the Stat 101 scores, N=75. Axes: Score (1 to 25) and Relative Frequency (0% to 22%).]
Instead of representing frequencies, a bar graph may represent
relative frequencies, i.e., the frequencies expressed as
percentages of the total population.
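As a small worked example, relative frequencies for the Stat 101 data can be computed by dividing each frequency by N. This is a sketch with illustrative variable names.

```python
# Stat 101 frequency table (score -> frequency), nonzero rows only.
frequency = {1: 1, 6: 1, 7: 2, 8: 6, 9: 10, 10: 16, 11: 13,
             12: 9, 13: 8, 14: 5, 15: 2, 16: 1, 24: 1}

N = sum(frequency.values())

# Relative frequency = frequency as a percentage of N.
relative = {score: 100 * f / N for score, f in frequency.items()}

for score, pct in relative.items():
    print(f"{score:>5}  {pct:6.2f}%")
```

For instance, 16 of the 75 students scored 10, which is 16/75, or about 21.33%.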
Fancy bar graphs that use icons instead of bars to show the
frequencies are commonly referred to as pictograms.
EXAMPLE 14.3 (page 481). Yearly sales of XYZ Corporation
from 1997 through 2002

Year    Annual sales (in millions of dollars)
1997    52
1998    55
1999    61
2000    63
2001    70
2002    77

[Figure: two graphs of Annual Sales (in millions) by Year, 1997 through 2002. The vertical axis (Millions of Dollars) runs from 50 to 80 in one graph and from 0 to 80 in the other.]
EXAMPLE (page 484). SAT Scores

Score        Frequency
400-500        200
510-600        300
610-700        500
710-800        800
810-900       1000
910-1000      1100
1010-1100     1200
1110-1200      900
1210-1300      700
1310-1400      400
1410-1500      300
1510-1600      100

[Figure: bar graph of SAT score frequencies by class interval. Vertical axis: Frequency (0 to 1200).]
When we have a large number of possible scores we often
break up the range of scores into class intervals.
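To illustrate the idea, the sketch below groups the Stat 101 scores into class intervals. The choice of intervals (1-5, 6-10, and so on, each of width 5) is an assumption made for this example and does not come from the slides; the helper class_interval is hypothetical and assumes scores start at 1.

```python
# Stat 101 frequency table (score -> frequency), nonzero rows only.
frequency = {1: 1, 6: 1, 7: 2, 8: 6, 9: 10, 10: 16, 11: 13,
             12: 9, 13: 8, 14: 5, 15: 2, 16: 1, 24: 1}

def class_interval(score, width=5):
    """Return (low, high) for the interval containing `score`.

    Intervals are 1-5, 6-10, ... for width=5 (an illustrative choice).
    """
    low = ((score - 1) // width) * width + 1
    return (low, low + width - 1)

# Add up the frequencies of all scores falling in each interval.
grouped = {}
for score, f in frequency.items():
    interval = class_interval(score)
    grouped[interval] = grouped.get(interval, 0) + f

for (low, high), f in sorted(grouped.items()):
    print(f"{low:>2}-{high:<2}  {f}")
```

Grouping trades detail for readability: 26 possible scores collapse into 5 intervals, but the individual score counts can no longer be recovered from the grouped table.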
EXAMPLE (page 486). Starting Salaries of TSU Graduates
Starting Salaries of First-Year TSU Graduates

Salary            Number of Students   Percentage
40000+ - 45000     228                  7%
45000+ - 50000     456                 14%
50000+ - 55000    1043                 32%
55000+ - 60000     912                 28%
60000+ - 65000     391                 12%
65000+ - 70000     163                  5%
70000+ - 75000      65                  2%
When a numerical variable is continuous, its possible values
can vary by infinitesimally small increments. Consequently,
there are no gaps between the class intervals. In this case we
use a variation of a bar graph called a histogram.
EXAMPLE (page 486). Starting Salaries of TSU Graduates
[Figure: histogram of starting salaries, N=3258. Horizontal axis: salary class intervals from 40000-45000 to 70000-75000. Vertical axis: Percentages (0% to 35%).]
A variable that represents a measurable quantity is called a
numerical or quantitative variable.
Variables which describe characteristics that cannot be
measured numerically are called categorical, or qualitative
variables. (page 482)
EXAMPLE 3. Enrollment (by School) at Tasmania State University
TABLE 14-3 Undergraduate Enrollments at TSU

School        Enrollment   Percentage
Agriculture    2400         16%
Business       1250          8%
Education      2840         19%
Humanities     3350         22%
Science        4870         32%
Other           290          2%
Total         15000
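The percentages in Table 14-3 can be reproduced from the enrollment counts. This is a minimal sketch with illustrative variable names; it also shows a side effect of rounding each percentage to a whole number.

```python
# Undergraduate enrollments at TSU, from Table 14-3.
enrollment = {
    "Agriculture": 2400,
    "Business": 1250,
    "Education": 2840,
    "Humanities": 3350,
    "Science": 4870,
    "Other": 290,
}

total = sum(enrollment.values())

# Each school's share of total enrollment, rounded to a whole percent.
percent = {school: round(100 * count / total)
           for school, count in enrollment.items()}

for school, pct in percent.items():
    print(f"{school:<12} {enrollment[school]:>5}  {pct:>3}%")
print(f"{'Total':<12} {total:>5}  {sum(percent.values()):>3}%")
```

Because each percentage is rounded separately, the rounded values add up to 99% rather than 100%. This is ordinary rounding error, not a mistake in the table.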
[Figure: bar graph of enrollment by school at TSU, N=15,000. Vertical axis: enrollment (0 to 5000).]
[Figure: relative-frequency bar graph of enrollment by school at TSU, N=15,000. Vertical axis: percentage of enrollment (0% to 35%).]
NUMERICAL SUMMARIES OF DATA (page 558)
Measures of location (central tendency) are numbers that
tell us something about where the values of the data fall.
Measures of spread (dispersion) tell us something
about how spread out the values of data are.
The average of a set of N numbers is obtained by adding
the numbers and dividing by N.
Example. Average Home runs per season: Mike Sweeney
Example 9. The Average Test Score in the Stat 101 Test
[Figure: bar graph of the Stat 101 scores, N=75. Axes: Score and Frequency.]
THE AVERAGE (page 559).
Here s_1, ..., s_n are the distinct scores and f_1, ..., f_n are
their frequencies.
STEP 1. Calculate the total of the data.
total = (s_1 \cdot f_1) + (s_2 \cdot f_2) + \cdots + (s_n \cdot f_n)
STEP 2. Calculate N.
N = f_1 + f_2 + \cdots + f_n
STEP 3. Calculate the Average.
Average = total / N
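The three steps can be sketched in a few lines, applied here to the Stat 101 frequency table (Example 9). Variable names are illustrative.

```python
# Stat 101 frequency table (score -> frequency), nonzero rows only.
frequency = {1: 1, 6: 1, 7: 2, 8: 6, 9: 10, 10: 16, 11: 13,
             12: 9, 13: 8, 14: 5, 15: 2, 16: 1, 24: 1}

# STEP 1: total of the data, each score weighted by its frequency.
total = sum(score * f for score, f in frequency.items())

# STEP 2: N is the sum of the frequencies.
N = sum(frequency.values())

# STEP 3: the average.
average = total / N

print(f"total = {total}, N = {N}, average = {average:.2f}")
```

For this data set the total is 814 and N is 75, so the average score is 814/75, or about 10.85 points.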
Homework
• Read pages 476 – 489
• Page 499: 1 – 3, 5, 7 – 11, 19, 21, (for 23,
25, 29, 30, 32, find the mean)