Worksheet

STAT10010 Introductory Statistics
Lab 2
1. Aims of Lab 2
By the end of this lab you will be able to:
i. Recognize the type of recorded data.
ii. Construct summaries of recorded variables.
iii. Calculate and interpret the margin of error in a survey.
iv. Draw a stratified random sample of data.
v. Calculate some descriptive statistics for a data set.
2. Survey data
In this lab we will work with some survey data collected in a political study in the USA. The
researcher wanted to assess if there was an association between age or gender and candidate
preference (Democrats, Republicans, and Others) in a presidential election. The researcher
randomly selected 400 individuals and asked them the following 3 questions:
1) What gender are you?
2) What age are you in years?
3) Is the candidate you will back in the upcoming presidential election a:
(a) Democrat (b) Republican or (c) other?
Q1: Is each question asked by the researcher an open question or a closed question?
From Blackboard, download the Minitab worksheet file called PoliticalPoll.mtw to your
computer and open it in Minitab. (Recall from lab 1 how to open a Minitab worksheet.) Your
worksheet should then look like the screen below:
Clearly the first column contains the gender of each person in the survey, the second column
contains their political preference and the third, their age. Scroll down to double check that there
are 400 observations/people in your data set (i.e. there should be 400 rows of data in your
worksheet).
Q2: The data recorded for the gender variable are categorical data; are the data ordinal or
nominal?
Q3: What type of data is recorded for the preference variable?
Q4: What type of data is recorded for the age variable?
Note that the ‘Gender’ column and the ‘Preference’ column are both in text format (again, recall
lab 1). It is often easier to analyse data in numerical format. To
change the format of the data, in the menu bar go to Data, then Code, then Text to Numeric.
Code the gender variable as 0 for female and 1 for male, and save the new data in column C4,
say, in your worksheet. Give your new column of data a label.
Re-code the preference data column in the same way.
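For comparison, the same recoding can be sketched outside Minitab in Python. This is only an illustration and not part of the lab: the text labels and the numeric codes chosen for preference are assumptions, so check the labels actually used in your worksheet.

```python
# Hypothetical text values; the real worksheet columns may use different labels.
gender_text = ["Female", "Male", "Male", "Female"]
preference_text = ["Democrat", "Other", "Republican", "Democrat"]

# Coding for gender follows the lab: 0 for female, 1 for male.
gender_codes = {"Female": 0, "Male": 1}
# Assumed coding for preference; the lab leaves the choice of numbers to you.
preference_codes = {"Democrat": 0, "Republican": 1, "Other": 2}

# Look up each text value in its coding table.
gender_numeric = [gender_codes[g] for g in gender_text]
preference_numeric = [preference_codes[p] for p in preference_text]

print(gender_numeric)
print(preference_numeric)
```

Whatever codes you choose, the numbers are only labels for categories; they do not turn the variable into a numerical measurement.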
Let’s look at some tables which summarise the information in our data set. In the menu bar go to
Stat, then Tables, then Tally Individual Variables. Choose your new numerically expressed
gender data, and your new numerically expressed preference data.
Q5: How many females were in the sample of 400 people?
Q6: How many people in the sample supported neither the Democrats nor the
Republicans?
Q7: What proportion support the Democrats?
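A tally of this kind simply counts how often each value occurs; dividing a count by the sample size gives the proportion. A minimal Python sketch, using a short made-up list of coded values rather than the poll data:

```python
from collections import Counter

# Hypothetical coded values (0 = female, 1 = male); not the real poll data.
gender_numeric = [0, 1, 1, 0, 0, 1, 0, 0]

counts = Counter(gender_numeric)  # how many times each code occurs
n = len(gender_numeric)           # sample size

for value in sorted(counts):
    proportion = counts[value] / n
    print(f"value {value}: count {counts[value]}, proportion {proportion:.3f}")
```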
3. The margin of error
One way of assessing the uncertainty in our estimate of the proportion which supports the
Democrats is through the margin of error. Recall from lectures that the margin of error in a
survey with a sample of size n is 1 divided by the square root of n, i.e.
MoE = 1 / √n
Let’s calculate the margin of error for the political poll data set. In the menu bar, go to Calc, then
Calculator. Store your result in the next free column (probably column C6). In the ‘Expression’
box, enter 1/SQRT(400). You should be able to find the SQRT function in the list of arithmetic
functions.
Q8: What is the margin of error (in %) of the study?
Q9: In what interval is the true proportion supporting the Democrats likely to lie?
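The calculation can also be sketched in Python. The example below uses a hypothetical survey of 100 people in which 40% support a candidate, so that it does not give away the answers for the poll data. The function name `margin_of_error` is made up for this sketch, and the interval is taken to be the sample proportion plus or minus the margin of error.

```python
import math

def margin_of_error(n):
    """Margin of error from the lab's formula, 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

# Hypothetical survey: 100 respondents, 40% supporting some candidate.
n = 100
p_hat = 0.40

moe = margin_of_error(n)
lower, upper = p_hat - moe, p_hat + moe  # estimate minus/plus the margin
print(f"MoE = {moe:.1%}; interval = ({lower:.1%}, {upper:.1%})")
```

For this hypothetical survey the margin of error is 10%, so the interval runs from 30% to 50%. Note how quickly the margin shrinks as n grows: quadrupling the sample size halves it.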
4. Stratified random sampling
Let’s now draw a stratified random sample from the political poll data. Recall from lectures
what is meant by a stratified random sample.
Let’s treat the two gender categories as our two strata. Say we wish to draw a random sample of
size 10 from each stratum. Let’s first organise our data so that all the female data are grouped
together, and then all the male data. Go to Data, then Sort. You want to sort all the columns of
data, and you want to sort them by gender. Check the original columns option.
Your worksheet should now be organised such that all the female data are in the first rows
followed by all the male data. The female observations are numbered 1 up to 204 – let’s choose a
random sample of size 10 from this set of observations.
Select Calc, then Random Data and then Integer. Ask Minitab to generate 10 rows and to store
the resulting sample in your next free column. Enter 1 as the minimum value and 204 as the
maximum. The numbers generated are random row numbers. Each observation in our original
(female) data set whose row number appears in this list will be an observation in our stratified
random sample.
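The idea of picking random row numbers can be sketched in Python as follows. This is an illustration under stated assumptions, not a description of how Minitab works internally: the seed is arbitrary, and the sketch shows both independent draws (where the same row number may come up twice) and draws without replacement (where it cannot). If your list of random integers contains a repeat, your sample will have fewer than 10 distinct observations.

```python
import random

rng = random.Random(2)  # fixed seed (an arbitrary choice) so the sketch is repeatable

# Ten integers between 1 and 204 inclusive, drawn independently; repeats are possible.
rows_with_repeats = [rng.randint(1, 204) for _ in range(10)]

# Alternative: ten distinct row numbers drawn without replacement.
rows_distinct = rng.sample(range(1, 205), 10)

print(sorted(rows_with_repeats))
print(sorted(rows_distinct))
```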
To construct a new data set consisting of the randomly sampled female observations go to Data,
Subset Worksheet. Check the row numbers box, enter the list of randomly generated numbers
and press OK. In the next Window, tell Minitab to include all the columns containing data. A
new worksheet of data should pop up – save this worksheet in your STAT10010 folder on
your H drive.
Repeat this to draw a sample of 10 observations from the male stratum and save your worksheet.
Q10: Based on your new stratified random samples, which stratum has the higher
proportion of support for Republicans?
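The whole procedure (split the data into strata, then draw a simple random sample from each) can be sketched in a few lines of Python. The helper `stratified_sample`, the record layout and the toy data are all made up for this illustration, and the sketch samples without replacement within each stratum.

```python
import random

def stratified_sample(records, stratum_key, size, rng):
    """Draw `size` records without replacement from each stratum."""
    strata = {}
    for record in records:
        # Group the records by the value of the stratum variable.
        strata.setdefault(record[stratum_key], []).append(record)
    return {name: rng.sample(group, size) for name, group in strata.items()}

# Hypothetical records: 6 females (gender 0) and 6 males (gender 1).
records = [{"gender": i % 2, "age": 20 + i} for i in range(12)]

rng = random.Random(1)  # arbitrary seed for repeatability
samples = stratified_sample(records, "gender", size=3, rng=rng)

for gender, sample in samples.items():
    print(gender, [r["age"] for r in sample])
```

Grouping first and sampling second plays the same role as sorting the worksheet by gender before picking random rows: it guarantees that each stratum contributes exactly the number of observations you asked for.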
5. Some basic descriptive statistics
To calculate some descriptive statistics for the age variable in each of your two samples, use the
Stat, Basic Statistics, Display Descriptive Statistics option. Click the statistics box, and ensure
that only the mean, minimum and maximum boxes are checked.
Q11: Which stratum has the higher average age?
Q12: Which stratum has the larger maximum age?
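The same three summaries are easy to compute by hand or in code. A minimal Python sketch, using made-up ages for two strata rather than your sampled data:

```python
from statistics import mean

# Hypothetical ages for two sampled strata; not the real poll data.
ages_by_stratum = {
    "female": [23, 35, 41, 58, 62],
    "male": [19, 30, 44, 47, 70],
}

for stratum, ages in ages_by_stratum.items():
    # Mean, minimum and maximum: the three statistics requested in the lab.
    print(f"{stratum}: mean {mean(ages):.1f}, min {min(ages)}, max {max(ages)}")
```

In this made-up example the female stratum has the higher mean age (43.8 against 42.0) even though the oldest person (70) is in the male stratum, so the answers to questions like Q11 and Q12 need not agree.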
oOo
You have now worked with some categorical and numerical data, drawn some conclusions
based on sampled data and drawn a stratified random sample. Some more steps into the world of
a statistician …
oOo