Setting the scene - University of Reading

Setting the scene
(Session 01)
SADC Course in Statistics
Learning Objectives
At the end of this session, you will be able to
• recognise situations where statistical modelling is relevant
• understand the purpose of modelling
• identify, for a given scenario, the key response variable of interest and potential factors that may affect its variation
Session Contents
In this session you will be
• provided with examples of situations
where modelling is relevant to answer
questions of importance in policy decisions
• given the opportunity to explore examples
in order to develop some insight into
modelling ideas
• introduced to the associated terminology
Examples where modelling is relevant
Two examples will be discussed initially…
• Child malnutrition and feeding practices in
Malawi, in Food and Nutrition Bulletin,
Volume 18, No. 2, 1997. United Nations
University Press, Tokyo, Japan
• Gender-sensitive education statistics and indicators, in UNESCO Training Materials for workshops on Education Statistics and Indicators in Ghana (1996) and Côte d’Ivoire (1997).
Example 1 - Nutrition:
The data come from the Malawi Demographic
and Health Survey, 1992. Primary interest
was in identifying factors affecting malnutrition.
The factors were:
• gender, age, birth size, type of breastfeeding, maternal education and area of residence amongst infants aged 4-11 months
• age, birth size, preceding and succeeding birth interval, whether still breastfeeding, number of days with diarrhoea in the past 2 weeks and other household characteristics amongst children aged 12-59 months
Example 2 - Education:
A cross-country study to determine factors
which hinder gender equality in education.
One outcome variable was a gender-equity sensitive indicator (GESI). Some of the factors studied were:
• Total fertility rate
• GNP per capita
• % female teachers in primary education
• Male and female enrolment ratios at primary and secondary levels
Identifying response and
regressor (explanatory) variables
In each of the above examples, there was a
key response of interest. This is called the
dependent variable, usually denoted by y.
Factors identified as possibly influencing the variability in y are called explanatory or regressor variables. They form the x’s in the model. In the models considered in this module, we assume the x’s are measured without error.
What are the y and the x’s in the previous examples?
What is a statistical model?
A model is a simple equation which relates a
key response (y) of interest to one or more
other variables (x1, x2, …) which are believed
to contribute to the variability in the key
response.
For example, y = 38.1 − 1.91x, where y is perinatal mortality per 1000 live births and x is the number of health centres per 1000 households (HHs). This describes the relationship between mortality and the availability of health facilities.
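Using such an equation for prediction is simple arithmetic. The minimal Python sketch below evaluates y = 38.1 − 1.91x for a few values of x; the function name is hypothetical and chosen only for illustration.

```python
def predicted_mortality(health_centres_per_1000_hh):
    """Perinatal mortality per 1000 live births predicted by y = 38.1 - 1.91x."""
    return 38.1 - 1.91 * health_centres_per_1000_hh


# Evaluate the model equation at a few illustrative values of x
for x in [0, 1, 2, 5]:
    print(f"x = {x}: predicted y = {predicted_mortality(x):.2f}")
```

Each additional health centre per 1000 households lowers the predicted perinatal mortality by 1.91 per 1000 live births, and the constant 38.1 is the predicted value when x = 0.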
Purpose of Modelling
• To determine a simple summary of the
way that a key response (y) relates to a
set of x’s
• To understand factors (x’s) affecting y
• To use the model equation to make
predictions about y
• To determine which values of the x’s will
optimise y in some way
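The first and third purposes can be illustrated together: summarise how y relates to a single x by estimating an intercept and slope, then reuse the fitted equation for prediction. The sketch below assumes ordinary least squares as the estimation method (a standard choice for a quantitative, normally distributed response, though this session does not name a method). The function name and the data are hypothetical, and the data are constructed to lie exactly on y = 38.1 − 1.91x so that the fit should recover those coefficients.

```python
def fit_straight_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # estimated change in y per unit change in x
    intercept = y_bar - slope * x_bar    # fitted line passes through (x_bar, y_bar)
    return intercept, slope


# Hypothetical data constructed to lie exactly on y = 38.1 - 1.91x
xs = [0.5, 1.0, 1.5, 2.0, 2.5]
ys = [38.1 - 1.91 * x for x in xs]

a, b = fit_straight_line(xs, ys)
print(f"fitted model: y = {a:.2f} + ({b:.2f})x")
# Reuse the fitted equation to predict y at a new value of x
print(f"prediction at x = 3: {a + b * 3:.2f}")
```

With real survey data the points would scatter around the line, and the fitted coefficients would be estimates rather than exact values.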
Types of key response
In the simplest type of statistical modelling,
the key response is a quantitative
measurement, assumed to follow a normal
distribution. This module focuses on such
responses.
However, there are other types of key response. Binary responses are common, e.g. whether or not a household is below the poverty line, whether or not contraceptives are used, or whether or not a person is HIV positive.
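A binary response is usually coded 0/1, so it takes only two values and cannot follow a normal distribution. The sketch below derives the poverty-line response from a quantitative measurement; the poverty line value, the use of per-capita consumption as the welfare measure, and the household values are all hypothetical assumptions for illustration.

```python
# Hypothetical poverty line, in the same (unspecified) units as consumption
POVERTY_LINE = 100.0


def below_poverty_line(consumption_per_capita, line=POVERTY_LINE):
    """Binary response: 1 if the household is below the line, 0 otherwise."""
    return 1 if consumption_per_capita < line else 0


# Hypothetical per-capita consumption for four households
consumption = [85.0, 120.5, 99.9, 100.0]
y = [below_poverty_line(c) for c in consumption]
print(y)  # [1, 0, 1, 0]
```

Here a household exactly at the line is coded 0; whether "below" is strict is a convention that should be stated when such a variable is defined.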
Example 3: a binary response
See Impact of HIV on tuberculosis in Zambia: a cross-sectional study, in British Medical Journal, 1990, Vol. 301, pp. 412-5.
This includes studying the relationship between HIV-1 antibody status (yes/no) and
• years of full-time education
• housing (no. of people sharing a bedroom)
• marital status (married, single, other)
• history of treatment for sexually
transmitted diseases (yes/no)
Example 4: a multinomial response
See Patterns of Tobacco Use in the Early Epidemic Stages: Malawi and Zambia, 2000-2002, in American Journal of Public Health, 2005, Vol. 95, No. 6, pp. 1009-1015.
This was a study relating tobacco use (none, light smoker, heavy smoker) to
• age, education, occupation and religion
• residence (rural/urban)
• marital status (married, single, other)
Types of regressor variables
In the above examples, the explanatory (regressor) variables can be:
• quantitative measurements, e.g. years of education;
• ordered categorical variables, e.g. extent of smoking (low, medium, high);
• nominal categorical variables, e.g. type of occupation;
• binary variables, e.g. whether or not a household possesses a specific asset.
Quantitative x’s will be considered in sessions
1-10, and other types in later sessions.
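Before any of these types can enter a model equation, each must be turned into numbers. The sketch below shows one common way, assuming hypothetical variable names, categories and records: quantitative values are used as measured, ordered categories are given order-preserving scores (scoring 0, 1, 2 assumes equally spaced levels, which is itself an assumption), a nominal variable becomes 0/1 indicator ("dummy") variables with one level held out as a reference, and a binary variable is coded 1/0.

```python
# Ordered categories scored 0, 1, 2 (assumes the levels are equally spaced)
SMOKING_SCORES = {"low": 0, "medium": 1, "high": 2}

# Nominal levels; the first is treated as the reference category (an assumption)
OCCUPATIONS = ["farmer", "trader", "teacher"]


def encode_regressors(record):
    """Turn one hypothetical record into a row of numeric x values."""
    if record["occupation"] not in OCCUPATIONS:
        raise ValueError(f"unknown occupation: {record['occupation']}")
    row = [float(record["years_education"])]          # quantitative: used as measured
    row.append(SMOKING_SCORES[record["smoking"]])      # ordered categorical: order-preserving score
    # Nominal: one 0/1 indicator per non-reference occupation
    row.extend(1 if record["occupation"] == occ else 0 for occ in OCCUPATIONS[1:])
    row.append(1 if record["owns_asset"] else 0)      # binary: 1 = yes, 0 = no
    return row


# Hypothetical records, for illustration only
people = [
    {"years_education": 8, "smoking": "low", "occupation": "farmer", "owns_asset": True},
    {"years_education": 12, "smoking": "high", "occupation": "teacher", "owns_asset": False},
]
for person in people:
    print(encode_regressors(person))
```

The columns of each row are years of education, smoking score, trader indicator, teacher indicator and asset ownership; a farmer has 0 in both occupation columns because farming is the reference level.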
Practical work follows to
ensure learning objectives
are achieved…