Chapter 1

A foundation for analysis in the health sciences
Yongli YANG
Ph.D., Associate Professor
Department of Biostatistics & Epidemiology, College of Public Health
TEL: 67781249
E-mail: [email protected]
STATISTICS IN LIFE

According to the State Statistical Bureau, China's GDP grew by 7.7% in 2013.

Life expectancy was 74.83 years according to the 6th population census.

Weather forecast in Zhengzhou
Week  Theory course content
8     Introduction
9     Description of quantitative variables
10    Description of qualitative variables; statistical tables and graphs
10    Exercise: statistical description
11    Normal distribution
11    Sampling error and sampling distribution
12    The principle of hypothesis testing
12    t test
13    One-way analysis of variance
13    Nonparametric tests
14    Exercise: t test and ANOVA
14    Chi-square test
15    Simple linear correlation analysis
Chapter I
Introduction to biostatistics
I. Introduction
II. Some basic concepts
III. Basic steps of statistical work
IV. Review questions and exercises
Be familiar with
• The basic steps of statistical work
Understand
• The definitions of statistics and biostatistics
Master
• The definitions of population, sample, probability, quantitative variable, and qualitative variable
I. INTRODUCTION
• We are frequently reminded that we are living in the information age. Appropriately, then, this subject is about information: how it is obtained, how it is analyzed, and how it is interpreted. The information with which we are concerned is called data, and the data are available to us in the form of numbers.
Question 1
• We aim to explore whether smoking is harmful to health.
• How should we explore this?
Lung cancer, heart disease, other diseases?
Group         Lung cancer   No lung cancer   Lung cancer rate
Smoking       a             b                a/(a+b)
Non-smoking   c             d                c/(c+d)
Compare the two rates to reach a conclusion.
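The comparison of the two groups can be sketched in a few lines of Python. This is only an illustration: the counts a, b, c and d below are hypothetical numbers invented for the example, and the helper name disease_rate is not from the slides.

```python
def disease_rate(cases, non_cases):
    """Proportion of a group with the disease: cases / (cases + non_cases)."""
    return cases / (cases + non_cases)

# Hypothetical counts, invented only for illustration
a, b = 30, 970   # smoking group: lung cancer, no lung cancer
c, d = 5, 995    # non-smoking group: lung cancer, no lung cancer

rate_smoking = disease_rate(a, b)       # a / (a + b)
rate_non_smoking = disease_rate(c, d)   # c / (c + d)

print(f"Smoking group:     {rate_smoking:.3f}")
print(f"Non-smoking group: {rate_non_smoking:.3f}")
```

Whether a difference between two such rates is larger than chance alone would produce is the kind of question that statistical inference addresses.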
Question 2
Men are generally taller than women, yet some women are taller than some men.
• Therefore, if you wanted to ‘prove’ that men are taller, you would need to measure many people of each sex.
• How many people should you measure?
Question 3

A doctor treated 5 AIDS patients with a new drug, and 4 of them were cured.

Conclusion: the cure rate of this drug is 80%.
Is this conclusion right? Why or why not?
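One way to get a feeling for this question is a small simulation. The sketch below assumes, purely for illustration, that the true cure rate is 0.5 (a hypothetical value, not from the slides) and repeats a 5-patient study many times to see how much the observed rate varies.

```python
import random

random.seed(1)  # fixed seed so the run is repeatable

TRUE_CURE_RATE = 0.5   # hypothetical value, chosen only for illustration
N_PATIENTS = 5         # number of patients in the question
N_STUDIES = 1000       # number of simulated 5-patient studies

def observed_cure_rate(n, p):
    """Simulate n patients, each cured with probability p; return cured / n."""
    cured = sum(1 for _ in range(n) if random.random() < p)
    return cured / n

rates = [observed_cure_rate(N_PATIENTS, TRUE_CURE_RATE) for _ in range(N_STUDIES)]
share_at_least_80 = sum(1 for r in rates if r >= 0.8) / N_STUDIES

print(f"Smallest observed rate: {min(rates):.1f}")
print(f"Largest observed rate:  {max(rates):.1f}")
print(f"Share of studies showing a rate of 80% or more: {share_at_least_80:.2f}")
```

Under this assumption, the chance of seeing 4 or more cures among 5 patients is 6/32, about 0.19, so an observed rate of 80% from only 5 patients says little about the true cure rate.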
A knowledge of statistics is like a knowledge of foreign languages or of algebra; it may prove of use at any time under any circumstances.
---- A.L. Bowley
II. SOME BASIC CONCEPTS
① Data
② Statistics and biostatistics
③ Population and sample
④ Variable
⑤ Parameter and Statistic
⑥ Probability
① DATA


Definition: The raw material of statistics is data.
For our purposes we define data as numbers.
Sources of data:
Routinely kept records
Surveys
Experiments
External sources
Routinely kept records.
Hospitals keep day-to-day records, which
contain immense amounts of information on
patients. When the need for data arises, we
should look for them first among routinely kept
records.
Surveys
If the data needed to answer a question are not available from routinely kept records, then a logical source may be a survey.
For example, if the administration of the health department wants to learn how many people in Zhengzhou have hypertension, we may conduct a survey.
Experiments
Frequently the data needed to answer a question are available only as the result of an experiment.
For example, a nurse may wish to know which of several strategies is best for maximizing patient compliance, and may run an experiment to compare them.
External sources
The data needed to answer a question may already exist in the form of published reports.
For example: statistical yearbooks, population censuses, ...
② STATISTICS
A science dealing with the collection, analysis, interpretation and presentation of masses of numerical data
---- Webster’s International Dictionary
The science and art of dealing with variation in data through collection, classification, and analysis in such a way as to obtain reliable results.
---- John M. Last, A Dictionary of Epidemiology
The tools of statistics are employed in many fields, such as demography, national economics, psychology, and medicine:
• Demographic statistics
• National economic statistics
• Psychological statistics
• Biostatistics
• ...
② BIOSTATISTICS

When the data analyzed are derived from the
biological sciences and medicine, we use the
term “biostatistics” to distinguish this particular
application of statistical tools and concepts.
③ Population and sample
We want to learn the average income of doctors in Beijing in 2010. Suppose there were 20,000 doctors in Beijing in 2010.
→ We could investigate all the doctors one by one, but that is time-consuming.
→ Instead, 500 doctors are drawn at random from the 20,000, and the population average income is then inferred from the incomes of these 500 doctors.
Questions
What is the study aim?
What is the study population?
What is the observational unit?
What is the sample?
What is the sample size?
Answers
Study aim: to learn the average income of doctors in Beijing in 2010
Study population: the incomes of the 20,000 doctors
Observational unit: an individual doctor
Sample: the incomes of the 500 doctors
Sample size: 500
Population
Definition: A population is the largest collection of entities in which we have an interest at a particular time.
For example, if we are interested in the weights of all the children enrolled in a certain county elementary school system, our population consists of all these weights.
A population may be finite or infinite. If a population of values consists of a fixed number of these values, the population is said to be finite. If, on the other hand, a population consists of an endless succession of values, the population is an infinite one.
Sample
Definition: A sample is a randomly selected part of a population.
Suppose our population consists of the weights of all the elementary school children enrolled in a certain county school system. If we collect for analysis the weights of only a fraction of these children, we have only a part of our population of weights, that is, we have a sample.
How do we get a random part of a population?
• Simple random sampling
• Systematic sampling
• Stratified sampling
• Cluster sampling
Simple random sample: if a sample of size n is drawn from a population of size N in such a way that every possible sample of size n has the same chance of being selected, the sample is called a simple random sample.
[Figure: a population of units numbered 1 to 17, from which a sample is drawn]
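Simple random sampling can be sketched with Python's standard library. The example reuses the sizes from the Beijing doctors example (N = 20,000, n = 500); the income values themselves are hypothetical, generated only so that there is something to sample from.

```python
import random

random.seed(2010)  # fixed seed so the draw is repeatable

N = 20_000   # population size, from the Beijing doctors example
n = 500      # sample size, from the same example

# Hypothetical incomes, generated purely for illustration
population = [random.gauss(8000, 2000) for _ in range(N)]

# random.sample draws n distinct units without replacement;
# every possible subset of size n is equally likely to be chosen
sample = random.sample(population, n)

population_mean = sum(population) / N   # average over the whole population
sample_mean = sum(sample) / n           # average over the 500 sampled units

print(f"Population average income: {population_mean:.1f}")
print(f"Sample average income:     {sample_mean:.1f}")
```

In a real study the population average would not be available; it is computed here only to show that the sample average lands close to it.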
④ VARIABLE
If, as we observe a characteristic, we find that it takes on different values in different persons, places, or things, we label the characteristic a variable.
Examples: heart rate, the heights of adult males, diastolic blood pressure, gender, blood type, treatment effect
Types of variable
• Quantitative variable
• Qualitative variable
  - Binary variable
  - Multiple categorical variable
  - Ordinal variable
Quantitative variable
• also known as a metric or numerical variable
• is one that can be measured in the usual sense
• conveys information regarding amount
Examples: the weights of preschool children, diastolic blood pressure
Qualitative variable
• also known as a categorical or nominal variable
• is one that cannot be measured in the usual sense and can only be categorized
• conveys information regarding attribute
• Binary variable: two categories, for example gender, alive or dead, yes or no
• Multiple categorical variable: more than two categories with no order, for example
  blood type: A, B, AB, O
  race: white, black, yellow, brown
• Ordinal variable: there is an order among the categories, for example
  your opinion on something: unsatisfactory, normal, very satisfactory
ID       Age  Gender  Educational level   Occupation  Height  Weight
2025655  27   male    graduate            teacher     165     71.5
2025653  22   male    undergraduate       doctor      160     74
2025830  25   female  junior high school  worker      158     68
2022543  23   male    senior high school  student     161     69
2022466  25   female  senior high school  worker      159     62
2024535  27   female  elementary          farmer      157     68
2025834  20   male    graduate            cadre       158     66
2019464  24   male    graduate            student     158     70.5
2025783  29   female  junior high school  farmer      154     57
Data transformation: quantitative variable → qualitative variable
Numerical variable: weight (kg)
Ranked variable: fat or overweight, normal, thin
Binary variable: normal, abnormal
Example: WBC count (1/mm³) of five persons:
3000   lower
6000   normal
5000   normal
8000   normal
12000  higher
Binary variable: normal, 3 persons; abnormal, 2 persons
Ordinal variable: lower, 1 person; normal, 3 persons; higher, 1 person
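The WBC transformation above can be sketched in code. The cut-offs used here (4000 and 10000) are an assumption made for illustration, since the slide does not state the limits of the normal range; the function names are likewise hypothetical.

```python
from collections import Counter

# Assumed cut-offs for illustration only; the slide does not state them
LOWER_LIMIT = 4000
UPPER_LIMIT = 10000

wbc_counts = [3000, 6000, 5000, 8000, 12000]  # the five persons in the example

def to_ordinal(count):
    """Map a WBC count to an ordered category: lower, normal or higher."""
    if count < LOWER_LIMIT:
        return "lower"
    if count > UPPER_LIMIT:
        return "higher"
    return "normal"

def to_binary(count):
    """Collapse the ordered categories into normal versus abnormal."""
    return "normal" if to_ordinal(count) == "normal" else "abnormal"

ordinal = [to_ordinal(x) for x in wbc_counts]
binary = [to_binary(x) for x in wbc_counts]

ordinal_table = Counter(ordinal)   # number of persons per ordinal category
binary_table = Counter(binary)     # number of persons per binary category

print(ordinal)
print(dict(ordinal_table))
print(dict(binary_table))
```

With these assumed cut-offs the counts match the slide: 3 normal and 2 abnormal for the binary variable, and 1 lower, 3 normal, 1 higher for the ordinal variable. Note that each transformation discards information, so it cannot be reversed.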
⑤ Parameter and Statistic
Parameter
→ describes a characteristic of a population
→ usually denoted by a Greek letter, such as μ
→ usually unknown
Statistic
→ describes a characteristic of a sample
→ usually denoted by a Latin letter, such as s and p
⑥ Probability
→ the possibility of occurrence of a random event
→ designated as P, with 0 ≤ P ≤ 1
P = 0: impossible event
P = 1: certain event
P ≤ 0.05: small probability event
Random event: an event that may or may not occur in a single experiment.
Before the experiment, nobody can be sure whether the event will occur.
Example: throwing a die.
Frequency of an event: the number of times the event occurs in a sequence of repetitions of the random phenomenon.
Probability of an event: if, in a long sequence of repetitions, the relative frequency of an event approaches a fixed number, that number is the probability of the event.
[Figure: relative frequency (vertical axis, 0.00 to 1.00) plotted against a horizontal axis running from 0 to 125]
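The relationship between relative frequency and probability
→ Probability is the limit of relative frequency:
$P = \lim_{n \to \infty} f = \lim_{n \to \infty} \frac{m}{n}$
where m is the number of times the event occurs in n repetitions.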
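The idea that relative frequency settles near the probability as n grows can be illustrated by simulating throws of a die. This is a minimal sketch under the assumption of a fair six-sided die; the function name relative_frequency is introduced here only for the example.

```python
import random

random.seed(6)  # fixed seed so the run is repeatable

def relative_frequency(n_throws, face=6):
    """Throw a fair die n_throws times and return m / n for the given face."""
    m = sum(1 for _ in range(n_throws) if random.randint(1, 6) == face)
    return m / n_throws

for n in (10, 100, 1000, 10000, 100000):
    f = relative_frequency(n)
    print(f"n = {n:>6}: f = m/n = {f:.4f}")

print(f"Probability for a fair die: {1 / 6:.4f}")
```

For small n the relative frequency can be far from 1/6; as n increases it tends to stay closer and closer to that value.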

III. BASIC STEPS OF STATISTICAL WORK
1 Design
2 Collection of data
3 Sorting of data
4 Analysis of data
1 Design
Professional design
• Study aim
• Study subject
• Measures
Statistical design
• Sampling method
• Allocation method
• Calculation of sample size
• Data processing
2 Collection of data
Sources of data:
• Routinely kept records
• Surveys
• Experiments
• External sources
Principles: timely, accurate, complete
3 Sorting of data
• Checking: outliers, missing values
• Coding: blood type A(1), B(2), AB(3), O(4); gender male(1), female(2)
• Grouping: DBP and SBP into hypotension, normal, hypertension
• Computing: body mass index from weight and height
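The coding and computing steps can be sketched as follows. The codes are the ones listed on the slide; the record is hypothetical, and the sketch assumes weight is in kilograms and height in centimetres, using the usual definition of body mass index as weight (kg) divided by the square of height (m).

```python
# Codes as given on the slide
BLOOD_TYPE_CODES = {"A": 1, "B": 2, "AB": 3, "O": 4}
GENDER_CODES = {"male": 1, "female": 2}

def body_mass_index(weight_kg, height_cm):
    """BMI = weight in kg divided by the square of height in metres."""
    height_m = height_cm / 100
    return weight_kg / height_m ** 2

# Hypothetical record; units (kg, cm) are an assumption
record = {"gender": "male", "blood_type": "AB", "weight_kg": 71.5, "height_cm": 165}

coded = {
    "gender": GENDER_CODES[record["gender"]],
    "blood_type": BLOOD_TYPE_CODES[record["blood_type"]],
}
bmi = body_mass_index(record["weight_kg"], record["height_cm"])

print(coded)
print(f"BMI: {bmi:.1f}")
```

For this record the computed body mass index is about 26.3.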
4 Analysis of data
Statistical analysis is divided into two parts: descriptive statistics and inferential statistics.
Statistical description: to teach the student to organize and summarize data
• Indicators
• Tables and charts
Statistical inference: to teach the student how to reach decisions about a large body of data by examining only a small part of the data
• Parameter estimation
• Hypothesis testing
IV. REVIEW QUESTIONS AND EXERCISES
Define:
1. Quantitative variable
2. Qualitative variable
3. Population
4. Sample
5. Probability
Explain the type of each of the following variables:
1. Admitting diagnosis in a mental health clinic
2. Weights of babies born in a hospital during a year
3. Gender of babies born in a hospital during a year
4. Under-arm temperature of patients with fever