Ress Act 1 problems - University of St. Thomas

Activity I: PROBLEMS
Problem 1
Part 1: Understanding Statistical Words and Symbols
Company officials wish to determine the labor costs of producing a specialty widget that requires
extensive hand work. They select 16 workers at random from their labor force of 423 and have
them spend a day building the widget. They find that the average time to build the widget is 3.278
hours, with a standard deviation of 0.101 hours, and that the histogram of times is unimodal. They
also find that the probability of making a defective widget changes with the age of the worker:
workers over 50 make defectives 2% of the time, while those under 25 make defectives 9% of the
time.
In the context of the problem, answer or describe the importance of the following words or
symbols. Assume x = the time it takes a worker to produce one widget.
a) Random Variable of Interest
b) Other Random Variables
c) $\mu_x$
d) x
e) X
f) $S_x$
Spring 2002
Activity I - 1
g) Accurate Data
h) Reliable Data
i) Precise Data
j) Stratified Sample
k) Valid Data
l) As a manager, how would your thoughts change if the study showed average labor was 3.3 hours with a standard deviation of 1 hour and the distribution was bimodal?
m) Logic of the Measurement
n) Histogram (Probability Distribution)
Problem 1
Part 2: Producing Control Charts
A company has historical data that shows weekly sales generated by its salespeople are
distributed with a mean of approximately $10,000 and a standard deviation of about $1,500.
This mean and standard deviation are for an individual salesperson’s weekly sales. As sales
manager you have been in Europe for the past six weeks exploring an international market for
your products. You arrive back at work and are given the following summary of your
salespeople's performance.
The table below gives weekly sales, in units of $1,000, for five salespeople. The questions
will help you answer the question "How do you evaluate their performance?"
Salesperson/Week   1      2      3      4      5      6
A                  12     11.8   9.2    10.4   8.6    10.1
B                  14     12     13.5   11     9      10
C                  9      11     9      15     7.2    8
D                  12.9   10.9   11.8   13     4.9    11.9
E                  8      11     8.3    8.1    7.9    10
1. Using $\mu = 10{,}000$ and $\sigma = 1{,}500$, create a Control chart for individual sales performance for each of the 5 people (put all the charts on one chart). The chart should include the Center Line and control lines at $\pm 1\sigma$, $\pm 2\sigma$, and $\pm 3\sigma$. For example, the $\pm 3\sigma$ limits would be at $\mu \pm 3\sigma = 10{,}000 \pm 3(1{,}500) = 10{,}000 \pm 4{,}500$, that is, at 5,500 and 14,500.
2. Then create a Control Chart for the Average Weekly Sales. You will need to calculate the average sales for each week. Remember to use $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{1500}{\sqrt{5}} \approx 671 \approx 700$ instead of $\sigma$ in calculating the control limits.
3. Interpret each control chart, identifying any unnatural sales performance. Explain why it is deemed unnatural. Use pp. 54-55 of the In-Class Notes to help decide what is unnatural.
4. What assumptions are you making in this interpretation?
5. Control Charts are an example of Spatial Thinking, while the table of values is Symbolic Thinking. Which is more informative to you? Explain. Remember: Information is Data that has been condensed to force you to Take Action or to Think Differently.
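The control limits in items 1 and 2 can be checked with a short script. This is a minimal Python sketch that assumes sales are in $1,000 units. It flags only points beyond the ±3σ lines. The other "unnatural" patterns described on pp. 54-55 of the In-Class Notes are not encoded. The helper names `limits` and `beyond` are illustrative and do not come from the course materials. The sketch uses the exact standard error 1.5/√5 ≈ 0.671 rather than the rounded 0.7.

```python
import math

MU = 10.0     # historical mean weekly sales for one salesperson ($1,000s)
SIGMA = 1.5   # historical standard deviation for one salesperson ($1,000s)

# Weekly sales for weeks 1-6, copied from the table.
sales = {
    "A": [12, 11.8, 9.2, 10.4, 8.6, 10.1],
    "B": [14, 12, 13.5, 11, 9, 10],
    "C": [9, 11, 9, 15, 7.2, 8],
    "D": [12.9, 10.9, 11.8, 13, 4.9, 11.9],
    "E": [8, 11, 8.3, 8.1, 7.9, 10],
}


def limits(center, sd, k):
    """Return the (lower, upper) control lines at center +/- k standard deviations."""
    return center - k * sd, center + k * sd


def beyond(values, center, sd, k=3):
    """Return the 1-based week numbers whose values fall outside the +/- k sd lines."""
    lo, hi = limits(center, sd, k)
    return [week for week, v in enumerate(values, start=1) if v < lo or v > hi]


n = len(sales)                                    # 5 salespeople in each weekly average
weekly_avg = [sum(week) / n for week in zip(*sales.values())]
sigma_xbar = SIGMA / math.sqrt(n)                 # standard error of a weekly average

for k in (1, 2, 3):
    print(f"individuals ±{k}σ:", limits(MU, SIGMA, k))
    print(f"averages    ±{k}σ:", tuple(round(x, 2) for x in limits(MU, sigma_xbar, k)))

for name, values in sales.items():
    print(name, "weeks beyond ±3σ:", beyond(values, MU, SIGMA))
print("weekly averages:", [round(a, 2) for a in weekly_avg])
print("average weeks beyond ±3σ:", beyond(weekly_avg, MU, sigma_xbar))
```

With these data the sketch flags salesperson C in week 4 and salesperson D in week 5 on the individuals chart. It also flags week 5 on the averages chart. Using the rounded 0.7 in place of 0.671 still flags week 5.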
[Blank grids for plotting: "Individual Salespeople" (vertical axis 5 to 20, horizontal axis Weeks 1 to 6) and "Weekly Averages" (vertical axis 8 to 12, horizontal axis Weeks 1 to 6)]
Problem 2
Part 1: Working with Decision Rules
Officials at a drug company are interested in introducing a new drug that does not have to go
through FDA approval. They presently have contracts with 12 health care organizations that
produce 80% of the drug company's revenues. If more than 10 of the organizations purchase
the drug, the company will be able to make a profit on it. However, it is known that a
rival drug firm is very close to offering a similar drug, so they are unable to conduct a census.
They have time to talk to at most 3 clients. The following belief has been stated.
Belief: 10 or fewer of the clients want drug
Other Belief: More than 10 want drug
The following Decision Rule has been developed: Select 2 clients at random. If both want the
drug, accept belief. If 2 don't want the drug, reject belief. If one wants drug and one doesn't,
select one more client at random. If that person wants drug, accept belief, otherwise reject belief.
1) If in reality 9 clients want the drug and 3 don't, how effective is the decision rule? Determine the type of error (α or β) and the size of the possible error. Hint: Draw a tree diagram.
2) Comment on the logic of the Decision Rule. How well is the information being used? Remember the two types of logic errors: solving the correct problem incorrectly, or solving the wrong problem.
3) Discuss the relevance of accuracy, reliability, and precision.
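The tree diagram for question 1 can be cross-checked with exact fractions. This is a minimal sketch under two assumptions. First, clients are drawn at random without replacement and the rule is followed exactly as stated. Second, the stated Belief plays the role of the null hypothesis, so rejecting it when it is true counts as a Type I (α) error. The function name `accept_probability` is hypothetical.

```python
from fractions import Fraction


def accept_probability(want: int, total: int) -> Fraction:
    """P(the rule accepts the Belief) when `want` of `total` clients want the drug.

    Clients are sampled at random without replacement, following the stated rule.
    """
    dont = total - want
    # First two clients both want the drug -> accept immediately.
    p_both_want = Fraction(want, total) * Fraction(want - 1, total - 1)
    # First two clients split (want/don't in either order) -> ask a third client.
    p_split = 2 * Fraction(want, total) * Fraction(dont, total - 1)
    # After a split, total - 2 clients remain and want - 1 of them want the drug.
    p_third_wants = Fraction(want - 1, total - 2)
    return p_both_want + p_split * p_third_wants


p_accept = accept_probability(9, 12)
p_reject = 1 - p_accept   # Belief ("10 or fewer want") is true here, so this is the error
print("P(accept Belief) =", p_accept, "≈", round(float(p_accept), 3))
print("P(reject Belief) =", p_reject, "≈", round(float(p_reject), 3))
```

Under these assumptions the rule accepts the true Belief with probability 48/55 ≈ 0.873. It rejects the true Belief with probability 7/55 ≈ 0.127.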
Problem 2
Part 2: Understanding Relationships Between Variables
Company officials do a study to determine if there is a relationship between income and the
chance that a potential customer will purchase a new vitamin. They know that 50% of the
potential customers will buy the vitamin, 40% make under $20,000, and 10% both make under
$20,000 and buy the vitamin.
1) Sales are made by selecting a person at random from the phone book; a salesperson is able to make 600 contacts per week. The profit from the sale of one bottle of vitamins is $5. How much profit should a salesperson generate each week, if the above information is representative of the population?
2) An information service will sell you the names and phone numbers of people with incomes over $20,000. If the information costs $2 per name, is it worth purchasing? Justify your answer with numbers.
3) Are you comfortable with the logic of these numbers (measurements)? Discuss.
4) What questions would you ask about sampling methodology?
5) For these numbers, discuss the issues of accuracy, reliability, and precision.
6) If you put your systems thinking hat on, who would you involve in the decision "Should we change our marketing strategy?"
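The arithmetic behind questions 1 and 2 can be sketched with exact fractions. The sketch makes several assumptions that are not stated in the problem:

- each sale is one bottle at $5 profit;
- a salesperson makes exactly 600 contacts per week whether or not the list is bought;
- purchased names behave like the over-$20,000 part of the population.

```python
from fractions import Fraction

P_BUY = Fraction(1, 2)              # 50% of potential customers buy
P_UNDER = Fraction(2, 5)            # 40% make under $20,000
P_UNDER_AND_BUY = Fraction(1, 10)   # 10% make under $20,000 and buy

CONTACTS = 600        # contacts per salesperson per week
PROFIT_PER_SALE = 5   # dollars; assumed one bottle per sale
COST_PER_NAME = 2     # dollars per purchased name

# Income and buying are independent only if P(buy) * P(under) equals P(under and buy).
independent = P_BUY * P_UNDER == P_UNDER_AND_BUY

# Conditional probabilities of buying, by income group.
p_buy_given_under = P_UNDER_AND_BUY / P_UNDER                 # 1/4
p_buy_given_over = (P_BUY - P_UNDER_AND_BUY) / (1 - P_UNDER)  # 2/3

random_profit = CONTACTS * P_BUY * PROFIT_PER_SALE
list_profit = CONTACTS * (p_buy_given_over * PROFIT_PER_SALE - COST_PER_NAME)

print("independent?", independent)
print("P(buy | under $20k) =", p_buy_given_under, " P(buy | over $20k) =", p_buy_given_over)
print("weekly profit, random phone-book contacts: $", float(random_profit))
print("weekly profit, purchased list (net of $2/name): $", float(list_profit))
```

Under these assumptions the list raises the hit rate from 1/2 to 2/3. It still lowers weekly net profit from $1,500 to $800, because the extra expected profit per contact (about $0.83) is less than the $2 per name.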
Problem 3
Part 1: Service Contracts—Expected Value
1) An ear institute offers a service contract on hearing aids that have just 2 parts. It has found that one of the following three conditions always exists for hearing aids that are defective:
   • 30% of the hearing aids have 1 defective part.
   • 50% of the hearing aids have 2 defective parts.
   • 20% of the hearing aids have 2 good parts but the aid doesn't work. When this happens, the aid is scrapped at a cost of $4,000.
   The repair procedure is to select a part at random and test it; the test takes 2 hours of labor. If the part is defective, it is repaired and the aid is tested again. If the aid still doesn't work, further testing and/or repair may be needed on the second part. Repairing, replacing, and testing with a new part costs $100 in materials and takes 4 hours of labor. If labor costs $60/hour, what should they charge on the service contract to break even? You can assume that aids with defective parts will always work once all the defective parts have been repaired and replaced. Hint: Tree Diagram
2) What measurement issues should be considered? How would you determine if the numbers were valid? Hint: measurement, sampling and logic issues
3) Is there any additional information you would like to know about the defective parts that would
   a) better help you solve the problem?
   b) help lower the above cost?
4) If you were an Information Specialist brought in to design an Information System for the entire organization, what information could you conceive being sent from the service department and who would you send it to?
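The tree diagram for question 1 can be rolled back in a few lines. This sketch rests on one reading of the repair procedure, and the problem leaves room for others:

- testing one part costs 2 labor hours;
- the $100 + 4-hour repair includes retesting the aid;
- if the first part tests good, the second part is tested before any repair.

The result is a break-even cost per defective aid serviced. It is not a full contract price, since the problem does not say how often aids fail.

```python
from fractions import Fraction

# Assumed cost components (one reading of the problem, not the official solution).
LABOR_RATE = 60                   # $ per labor hour
TEST = 2 * LABOR_RATE             # $120 to test one part
REPAIR = 100 + 4 * LABOR_RATE     # $340 to repair, replace, and retest
SCRAP = 4_000                     # both parts good but the aid is dead

half = Fraction(1, 2)
# 30%: one defective part. Half the time the defective part is tested first.
one_bad = half * (TEST + REPAIR) + half * (TEST + TEST + REPAIR)
# 50%: two defective parts; each one is tested and repaired in turn.
two_bad = 2 * (TEST + REPAIR)
# 20%: both parts test good, so the aid is scrapped after two tests.
both_good = 2 * TEST + SCRAP

expected_cost = (Fraction(3, 10) * one_bad
                 + Fraction(1, 2) * two_bad
                 + Fraction(1, 5) * both_good)

print(f"one bad part:   ${float(one_bad):,.0f}")
print(f"two bad parts:  ${float(two_bad):,.0f}")
print(f"both parts ok:  ${float(both_good):,.0f}")
print(f"expected cost per defective aid: ${float(expected_cost):,.0f}")
```

Under these assumptions the expected cost is $1,464 per defective aid. Changing any assumption, such as charging a separate retest after each repair, changes that figure.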
Problem 3
Part 2: Conditional Probability—CEA Test
Read the CEA Test article about the recurrence of colon cancer in patients who have previously
undergone surgery for colon cancer. The article reports a large percentage (40%) of false
negatives and a large percentage (20%) of false positives. Notice that the article does not mention
what percentage of patients tested are found to actually have a recurrence of colon cancer.
a) Create a tree diagram for the situation of testing a patient. Since we do not know the chance that the patient will have a recurrence, use a variable such as p to represent this unknown probability.
b) Using your tree diagram, what is the probability, in terms of p, that a patient has had a recurrence of colon cancer, given that the CEA test is positive?
c) Using your answer to part b, what is this probability for p = 5%? p = 10%? p = 20%? p = 30%?
d) Do you agree the test is unreliable? Careful: what is the difference between accurate and reliable measurements? In the last 4 paragraphs the article says, ". . . small number of lives will be saved . . . less than 1% of patients monitored." And later, ". . . more than 500,000" patients are monitored each year. How does that fit with the last paragraph? What role does precision play in your discussion?
e) Using a Fermi approach, estimate the cost of using the test for one year in the United States.
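Parts b, c and e can be sketched as follows. The sketch assumes "40 percent false negatives" means P(negative | recurrence) = 0.4 and "20 percent false positives" means P(positive | no recurrence) = 0.2. The article's wording allows other readings. The Fermi part assumes the 500,000 patients are monitored in a single year and that their 10 to 40 tests are spread evenly over the five-year period. It uses only the article's $50 per test and ignores follow-up tests and surgery. The function name is hypothetical.

```python
from fractions import Fraction

SENSITIVITY = Fraction(6, 10)     # P(positive | recurrence) = 1 - 0.40 false negatives
FALSE_POSITIVE = Fraction(2, 10)  # P(positive | no recurrence)


def p_recurrence_given_positive(p):
    """Bayes' rule: P(recurrence | positive) = 0.6p / (0.6p + 0.2(1 - p))."""
    true_pos = SENSITIVITY * p
    false_pos = FALSE_POSITIVE * (1 - p)
    return true_pos / (true_pos + false_pos)


for pct in (5, 10, 20, 30):
    p = Fraction(pct, 100)
    print(f"p = {pct}%: P(recurrence | positive) = {float(p_recurrence_given_positive(p)):.3f}")

# Fermi sketch for part e: test fees only, figures quoted in the article.
PATIENTS_PER_YEAR = 500_000
TESTS_PER_5_YEARS = (10, 40)
COST_PER_TEST = 50
low, high = (PATIENTS_PER_YEAR * t // 5 * COST_PER_TEST for t in TESTS_PER_5_YEARS)
print(f"annual test fees: roughly ${low:,} to ${high:,}")
```

Under these readings the sketch gives P(recurrence | positive) of about 0.14, 0.25, 0.43 and 0.56 for p = 5%, 10%, 20% and 30%. The rough annual range for test fees is $50 million to $200 million.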
CEA Test Article
St. Paul Pioneer Press (MN)
August 25, 1993
Section: Main
Edition: Metro Final
Page: 1A
BYLINE: Tom Majeski, Staff Writer
Illustration: Graphic: Pioneer Press
Cancer test a failure. Researchers say the CEA test commonly used for detecting the return of
colon cancer should be discontinued. The main problems:
• 40 percent have false negatives, which delays cancer detection.
• 20 percent have false positives, requiring unnecessary further testing.
• Despite its cost, the test increased survival rate by less than 1 percent.
Source: Journal of the American Medical Association
TEST TO DETECT COLON CANCER DOESN'T WORK/MAYO RESEARCHERS SAY
PROCEDURE UNRELIABLE, SHOULD BE ABANDONED
A common blood test used to detect a recurrence of colon cancer is highly unreliable and
expensive and can lead to needless tests and even surgery on healthy patients, researchers at the
Mayo Clinic and elsewhere say.
The procedure - called a CEA test - identifies a protein-like substance called carcinoembryonic
antigen. The antigen may be produced in large quantities by cancer in the large bowel and can be
accurately identified in blood.
But in their study, published in today's edition of the Journal of the American Medical
Association, the researchers found that the test has such a low success rate that it should be
abandoned.
“We found the test extremely unreliable and hope that it will be abandoned,” said Dr. Charles
Moertel, professor of oncology at the Mayo Clinic in Rochester and chief author of the study.
In a telephone interview from Alaska, Moertel said that his research team also hopes that the
study stimulates a search for a better way to give colon cancer patients a second chance.
When the CEA test was developed about 20 years ago, some experts thought it could be used as a
screening tool for colon cancer, the nation's second-leading cancer killer behind lung cancer.
Colon cancer will kill about 57,000 Americans this year.
Although follow-up research showed that the test was too insensitive to serve as a screen, it
became the standard way to monitor patients after colon cancer surgery. To determine the test's
effectiveness, researchers at the Mayo Clinic, the Fred Hutchinson Cancer Research Center in
Seattle, Temple University School of Medicine and the University of Pennsylvania, both in
Philadelphia, and the Grand Forks Clinic in Grand Forks, N.D., followed 1,216 patients who had
undergone colon cancer surgery. Of those, 1,017 were monitored by CEA testing.
“The fundamental objective of the test is to pick up a recurrence at a stage when it can be cured,”
Moertel said. But one-year survival rates for study participants who experienced a recurrence was
2.3 percent for those who underwent CEA monitoring and 2 percent for those who did not.
Moertel said there are two major problems with the test.
About 40 percent of patients have false negatives, which means that many colon cancers go
undetected until they have spread.
About 20 percent of patients have false positives. To rule out a recurrence, they must
unnecessarily endure numerous tests. And in some instances, patients with false positives also
must undergo rigorous exploratory abdominal surgery at a cost of about $10,000.
“Our uncontrolled study indicates that the maximum anticipated gain from CEA monitoring will
be a small number of lives saved ... probably less than 1 percent of the patients monitored,” the
authors wrote.
Moertel said about 80 percent of U.S. physicians now use the test to monitor more than 500,000
patients annually. During the normal five-year monitoring period, patients can have from 10 to
40 CEA tests, each costing about $50.
To give an idea of the costs involved, the researchers calculated that monitoring and conducting
associated tests on the 1,017 patients in the study totaled nearly $1.5 million.
“No more lives are saved despite the added expense,” Moertel said.
All content © 1999 St. Paul Pioneer Press (MN) and may not be republished without
permission.
Problem 4: Regression Assignment
1) Company officials wish to study what causes defective products to be produced. They have identified several factors that may influence the number of defects, positively and negatively. They believe the number of defective parts a production worker produces each week depends on the hours of training the worker received before starting work. They have randomly selected 5 workers and recorded each worker's hours of training and number of defects per week.
X = hours of training
Y = number of defects produced each week
X     Y     X - X̄     Y - Ȳ     (X - X̄)(Y - Ȳ)     (X - X̄)²
10    12
20    10
30    5
40    4
50    4

$\hat{Y} = bX + a$
$b = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2}$
$a = \bar{Y} - b\bar{X}$

a) Determine the equation of the best fitting least squares line.
   b =
   Ŷ =
   a =
b) Give an interpretation of the slope.
c) Give an interpretation of the Y intercept.
d) What would you estimate for the number of defectives a worker would produce each week if they have received:
   20 hours of training? Estimated Defectives =
   60 hours of training? Estimated Defectives =
   Do you have the same confidence in both estimates? Explain.
e) Assuming that training costs $100 an hour and each defective costs $1,000, would you recommend increasing the training of all employees to a minimum of 40 hours? Explain.
f) Give a verbal interpretation of the meaning of the slope and the y-intercept for the following problems.
   1) X = years from 1989
      Y = Sales in millions of $
      $\hat{Y} = 52X + 40, \quad 0 \le X \le 12$
      Slope
      Intercept
   2) X = Competitor's price in $
      Y = Our sales in 1000 units sold
      $\hat{Y} = 7.5X + 2000, \quad 20 \le X \le 30$
      Slope
      Intercept
   3) X = Prime interest rate, %
      Y = Sales in $100,000
      $\hat{Y} = -12.4X + 500, \quad 5 \le X \le 10$
      Slope
      Intercept
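The least squares formulas in part a can be checked with exact fractions using the five (X, Y) pairs from the table. This is a minimal sketch. The helper name `least_squares` is illustrative. The sketch also marks any estimate outside the observed 10 to 50 hours of training as an extrapolation, which bears on the confidence question in part d.

```python
from fractions import Fraction

X = [10, 20, 30, 40, 50]   # hours of training
Y = [12, 10, 5, 4, 4]      # defects produced each week


def least_squares(xs, ys):
    """Return (b, a) for the least squares line Y-hat = bX + a, as exact fractions."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    y_bar = Fraction(sum(ys), n)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # sum (X - Xbar)(Y - Ybar)
    sxx = sum((x - x_bar) ** 2 for x in xs)                       # sum (X - Xbar)^2
    b = sxy / sxx
    a = y_bar - b * x_bar
    return b, a


b, a = least_squares(X, Y)
print(f"b = {float(b)}, a = {float(a)}")
for hours in (20, 60):
    estimate = b * hours + a
    note = "" if min(X) <= hours <= max(X) else " (extrapolated beyond 10 to 50 hours)"
    print(f"{hours} hours: about {float(estimate):.1f} defects per week{note}")
```

For these data the line is Ŷ = -0.22X + 13.6. It estimates about 9.2 defects per week at 20 hours and about 0.4 at 60 hours, and the 60-hour value is an extrapolation.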
Problem 5: Binomial Distribution
To estimate who will win an election, candidate A or B, a survey is taken.
1) If in reality 45% favor A and 55% favor B, what is the probability that 11 or more of the 20 voters in the survey will favor A? Hint: Use the Binomial Distribution (Excel, any stats package, or the tables in the book).
2) Same problem as (1) above, but now a survey of 100 voters is taken and you want to determine the probability that 51 or more of the 100 voters will favor A.
3) Explain why the probabilities determined in questions (1) and (2) are important to the person taking the survey (trying to predict the winner). Think about how the information from the survey will be used. Hint: One must quantify the risk of sampling errors (α and β errors).
4) The probabilities in (1) and (2) are different; what basic statistical concept is this difference demonstrating?
5) What questions do you have about sampling methodology?
6) For the survey, discuss the measurement issues of accuracy and reliability.
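In place of Excel or the tables, questions 1 and 2 can be computed exactly by summing the binomial probability mass function. This sketch uses only Python's standard `math.comb` (Python 3.8+). The function name `prob_at_least` is illustrative.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly from the pmf."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p20 = prob_at_least(11, 20, 0.45)    # question 1: 11 or more of 20 favor A
p100 = prob_at_least(51, 100, 0.45)  # question 2: 51 or more of 100 favor A
print(f"P(11 or more of 20 favor A)  = {p20:.4f}")
print(f"P(51 or more of 100 favor A) = {p100:.4f}")
```

The second probability comes out noticeably smaller than the first. A larger sample makes a misleading majority for A less likely when A is actually behind.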
Problem 6: Normal Distribution
An automobile insurance company wishes to estimate the total cost of settling insurance claims.
It estimates it will need to service 2 million claims next year. From a random sample of claims
from past years, it knows the average claim is $1,500, the standard deviation is $800, and claim
amounts are approximately Normally distributed.
1) How many claims can they expect to be over $2,500? Hint: Draw a picture; use the Z table or Excel.
2) What number will 95% of the claims be less than?
3) What would you estimate the total cost of settling claims will be?
4) What questions might you ask about this assertion: The distribution is Normal.
5) How does your answer to question 4 affect your faith in your answers to questions 1 and 2?
6) Discuss the importance of accuracy, precision, and reliability.
7) What questions would you ask about the sampling methodology?
8) Comment on the logic of the measurement.
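Questions 1 to 3 can be checked with Python's standard `statistics.NormalDist` (Python 3.8+) in place of the Z table. The sketch takes the Normal model at face value, as the problem states. It also reports the share of claims the model places below $0, which is relevant to question 4.

```python
from statistics import NormalDist

CLAIMS = 2_000_000
claims = NormalDist(mu=1500, sigma=800)   # claim amount in dollars

share_over_2500 = 1 - claims.cdf(2500)    # P(claim > $2,500), z = 1.25
expected_over_2500 = CLAIMS * share_over_2500
p95 = claims.inv_cdf(0.95)                # 95% of claims fall below this amount
expected_total = CLAIMS * claims.mean     # expected total settlement cost
share_negative = claims.cdf(0)            # model mass on impossible negative claims

print(f"claims over $2,500: about {expected_over_2500:,.0f}")
print(f"95th percentile: about ${p95:,.0f}")
print(f"expected total cost: ${expected_total:,.0f}")
print(f"share of claims the Normal model puts below $0: {share_negative:.1%}")
```

The last line shows the model assigning roughly 3% of claims a negative amount, which cannot happen for real claims. That is one concrete reason to question the Normal assertion.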
Problem 7: Decision Trees
Decision Tree—Oil Drilling
Decision Tree Analysis views the overall decision process as a sequence of alternatives
(controllable) and chance outcomes (uncontrollable), where the chance outcomes can be
described by a probability distribution.
You are an independent oil operator who owns rights to a particular piece of property and
you must decide whether to drill for oil yourself or sell your rights. The desirability of
drilling obviously depends on how much oil, if any, there is beneath the surface. Before
making this decision, however, you may, if you wish, obtain certain geological and
geophysical information by taking seismographic recordings, which can detect the subsurface
structure usually associated with oil pools. Unfortunately, the information obtained from such
readings does not provide perfect predictions. Sometimes oil is found where no subsurface
structure has been detected. At the present time, you must choose one of three alternative
courses of action.
I. You may sell your rights for $2,000,000.
II. You may drill immediately at a cost of $1,000,000. Based on your experience with similar geographic locations, you predict a 50-50 chance of finding oil. If there is oil, you expect the payoff to be $9,000,000.
III. You may take seismographic readings at a cost of $300,000. You estimate a 2/3 chance of finding subsurface structure. If subsurface structure is found, you can sell the rights for $3,000,000 or drill for oil. If you drill, at a cost of $1,000,000, you estimate a 60% chance of finding oil. If oil is found, the expected payoff is $12,000,000.
    If no subsurface structure is found, you can sell the rights for $1,000,000. If you drill at a cost of $1,000,000, you estimate the chance of finding oil to be 0.3. The expected payoff, if oil is found, is $5,000,000.
a) Using Decision Analysis, determine which alternative you would select.
b) Do a risk analysis of the alternative you have selected.
c) Comment on why sensitivity analysis would be helpful. You do not have to do any mathematical analysis. Relate sensitivity analysis to the Fermi process.
Do one solution using paper and pencil; do another solution using Precision Tree.
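The paper-and-pencil rollback for part a can also be written as a small recursive expected-value evaluator. It is a plain Python stand-in, not Precision Tree. The sketch assumes the quoted payoffs are gross. Drilling and seismic costs are therefore subtracted at each terminal node. The node layout and the name `rollback` are illustrative.

```python
from fractions import Fraction


def rollback(node):
    """Expected-value rollback of a decision tree.

    A node is a number (net terminal payoff), ("chance", [(prob, child), ...]),
    or ("decision", {label: child, ...}); a decision node takes its best branch.
    """
    if isinstance(node, (int, Fraction)):
        return node
    kind, branches = node
    if kind == "chance":
        return sum(p * rollback(child) for p, child in branches)
    return max(rollback(child) for child in branches.values())


M = 1_000_000
DRILL = 1 * M        # cost to drill
SEISMIC = 300_000    # cost of seismographic readings

sell_now = 2 * M
drill_now = ("chance", [(Fraction(1, 2), 9 * M - DRILL), (Fraction(1, 2), -DRILL)])
seismic = ("chance", [
    (Fraction(2, 3), ("decision", {          # subsurface structure found
        "sell": 3 * M - SEISMIC,
        "drill": ("chance", [(Fraction(6, 10), 12 * M - DRILL - SEISMIC),
                             (Fraction(4, 10), -DRILL - SEISMIC)]),
    })),
    (Fraction(1, 3), ("decision", {          # no subsurface structure
        "sell": 1 * M - SEISMIC,
        "drill": ("chance", [(Fraction(3, 10), 5 * M - DRILL - SEISMIC),
                             (Fraction(7, 10), -DRILL - SEISMIC)]),
    })),
])

alternatives = {"I. sell rights": sell_now, "II. drill now": drill_now,
                "III. seismic first": seismic}
values = {name: rollback(tree) for name, tree in alternatives.items()}
for name, value in values.items():
    print(f"{name}: expected value ${float(value):,.0f}")
print("best by expected value:", max(values, key=values.get))
```

Under these assumptions the expected values are $2,000,000, $3,500,000 and about $4,166,667. Alternative III is best: drill if structure is found, otherwise sell. The risk analysis in part b would then look at the spread of outcomes along that path, not just its mean.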
Problem 8: More Complicated Decision Tree
Part 1: Decision Tree—Alice Hope
Alice Hope has $100,000 to invest. She has three opportunities she finds intriguing. For
tax reasons she will have to cash out this investment after two years. A further restriction is
that each opportunity requires her to invest a lump sum of $100,000.
First Opportunity: A secured loan that will pay 10% per year.
Second Opportunity: A group of venture capitalists will be taking a company public. By
investing $100,000, she is given 100,000 shares in the Faith company. The plan is to take
the company public in 6 months. Alice has talked to several sources and believes there is a
30% chance the company will be able to make the breakthrough to bring the company
public. If they don't make the breakthrough, the company will be dissolved and the sale of
assets would give 40¢ on each $1 invested. If it is successful the stock will be sold at an
IPO of $5 per share. She is certain the IPO will be successful if the Faith company makes
the breakthrough. She must decide whether she should cash out her investment or hold it
and see what it does in the open market. There is a second breakthrough the company is
working on; results will be known in two years. If this is successful, she feels confident the
stock will soar to $20. Her consultants tell her there is a 20% chance of Faith pulling off
the second breakthrough. If it is unsuccessful, she believes there is a 60% chance the stock
will plunge to 50¢ a share and a 40% chance it will drop to $2 per share.
Third Opportunity: The Charity company is going public now, and she can purchase stock
in units of 10,000 for $10 per share. The Faith group has come back to her and said that
if she doesn't invest now they will need an additional $100,000 in 6 months. The difference
is that she will then receive only 50,000 shares since there is less risk that the company will
not survive. If the company needs the additional $100,000, the chance of Faith pulling off
the necessary breakthrough would be 50%. The potential price of the stock does not
change and the chance of the second breakthrough remains the same.
Because of this second option from Faith, she is considering holding Charity for only 6
months. She feels there is a 50% chance it will increase to $12 per share, a 40% chance it
will stay at $10 per share, and a 10% chance it will drop to $8 per share. If the stock drops
to $8 she has decided to hold Charity until the two years are up. She believes there is a
40% chance it will reach $16 per share, a 30% chance it stays at $8 and a 30% chance it
returns to the initial $10 per share. She has also been offered a second secured loan
investment that would pay her 8% per year for 18 months.
If Charity is successful in the first 6 months or holds at $10 per share, she can stay with
Charity and believes there is a 70% chance of it going to $20 per share and a 30% chance of
it dropping to $8. She could also cash out Charity and invest the entire amount in secured
18-month loans. She has checked with Faith and they indicate that if she has more than
$100,000 to invest in 6 months she will be able to do that with them and get the same per
share stock price as the $100,000 would. She cannot go to Faith with less than $100,000.
Using Precision Tree, create and solve a Decision Analysis Tree Diagram for the decision that
Alice faces. Discuss any assumptions that you have made in solving the problem.
Problem 9: Working with Random Variables
Find at least 100 values of a random variable; the choice of variable is yours. You could
use economic or financial or demographic data from the World Wide Web, or something from
your work or your personal life.
a. Explain why this data is a sample, and discuss what you know about the sampling methodology that created the data.
b. Discuss the accuracy, reliability and precision of the data.
c. Calculate the mean, median, standard deviation, and standard error of the mean for the random variable. Use BestFit to find a theoretical distribution that closely matches the distribution of values for your variable.
d. Find a 90% confidence interval for the mean of the random variable. Find a 95% confidence interval for the random variable.
e. Create (either by hand or by computer software) a histogram or stem plot of the data.
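The calculations in parts c and d can be sketched with Python's standard `statistics` module. The sketch does not reproduce BestFit's distribution fitting. It assumes a z-based interval (x̄ ± z·s/√n), which is a reasonable approximation for 100 or more values. A t-based interval would be slightly wider. It computes intervals for the mean only. The data are a toy placeholder, not real observations, and `summarize` is a hypothetical helper.

```python
import math
from statistics import NormalDist, mean, median, stdev


def summarize(data, levels=(0.90, 0.95)):
    """Mean, median, sample SD, standard error, and z-based CIs for the mean."""
    n = len(data)
    m = mean(data)
    s = stdev(data)              # sample standard deviation (n - 1 denominator)
    se = s / math.sqrt(n)        # standard error of the mean
    intervals = {}
    for level in levels:
        z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.645 for 90%, 1.960 for 95%
        intervals[level] = (m - z * se, m + z * se)
    return {"n": n, "mean": m, "median": median(data), "sd": s, "se": se, "ci": intervals}


# Toy placeholder only: replace with your own 100+ observations.
toy = list(range(1, 101))
result = summarize(toy)
for key in ("n", "mean", "median", "sd", "se"):
    print(key, result[key])
for level, (lo, hi) in result["ci"].items():
    print(f"{level:.0%} CI for the mean: ({lo:.2f}, {hi:.2f})")
```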
Problem 10: Multiple Regression Example
The following table lists the selling price of some homes that were sold in 1997 in Roseville,
MN. Three variables that may affect the selling price of a home are also given.
Selling Price   Square Feet   Bedrooms   Age
$120,000        1736          4          43
$135,000        2175          3          28
$118,900        1650          3          31
$122,500        1561          3          43
$140,300        1450          3          37
$209,000        2577          4          30
$142,400        1892          4          32
$162,000        2702          3          42
$175,000        2225          4          30
$137,900        1823          3          34
$125,000        1508          3          20
$156,000        2090          3          44
$104,900        1262          2          42
$112,000        1488          3          35
$160,000        2156          4          34
$101,500        1393          2          34
$212,500        2112          4          20
$212,000        2288          5          30
$156,000        2000          4          21
$195,000        2434          4          14
To help determine whether or not a linear relationship may exist between the selling price and
these three variables, the following correlation coefficients and graphs were produced.
Correlation Coefficients
                Selling Price   Square Feet   Bedrooms   Age
Selling Price   -
Square Feet     0.809           -
Bedrooms        0.766           0.614         -
Age             -0.466          -0.268        -0.404     -
1. Draw a sketch on x-y axes (or use Excel or another graphing tool) of each of the 6 pairs of variables indicated in the table. For each of the 6 graphs, explain whether or not the correlation values appear to make sense, both from looking at the graph and from your common sense.
2. Which variables seem to have the best linear relationship with the selling price? Which do not? Which are correlated with each other? Is that a problem?
A multiple regression calculation gives the following linear model, which uses all three variables:
$\text{Selling Price} = 46\,(\text{Square Feet}) + 17{,}133\,(\text{Bedrooms}) - 719\,(\text{Age}) + 26{,}200$
3. Give an English interpretation of each of the numbers in the model; note that the numbers 46, 17,133, and –719 are slopes.
4. Estimate the selling price for a 35-year-old home in Roseville with 3 bedrooms and 1,800 square feet.
5. What is the range of relevancy for this problem?
6. There are 3 possible models that use a pair of x variables (i.e., Age and Square Feet, Age and Number of Bedrooms, and Square Feet and Number of Bedrooms). Which of these models would you select based on the table of correlation coefficients? Explain your choice.
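Questions 1, 4 and 5 can be explored with the 20 sales from the Roseville table. This minimal sketch recomputes Pearson correlations from the transcribed data so they can be compared with the correlation table. It prints each variable's observed range as a starting point for the range of relevancy. It also evaluates the quoted regression model. The helper names `pearson` and `predicted_price` are illustrative.

```python
import math

# (selling price $, square feet, bedrooms, age in years), transcribed from the table
homes = [
    (120_000, 1736, 4, 43), (135_000, 2175, 3, 28), (118_900, 1650, 3, 31),
    (122_500, 1561, 3, 43), (140_300, 1450, 3, 37), (209_000, 2577, 4, 30),
    (142_400, 1892, 4, 32), (162_000, 2702, 3, 42), (175_000, 2225, 4, 30),
    (137_900, 1823, 3, 34), (125_000, 1508, 3, 20), (156_000, 2090, 3, 44),
    (104_900, 1262, 2, 42), (112_000, 1488, 3, 35), (160_000, 2156, 4, 34),
    (101_500, 1393, 2, 34), (212_500, 2112, 4, 20), (212_000, 2288, 5, 30),
    (156_000, 2000, 4, 21), (195_000, 2434, 4, 14),
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def predicted_price(square_feet, bedrooms, age):
    """The multiple regression model quoted in the problem."""
    return 46 * square_feet + 17_133 * bedrooms - 719 * age + 26_200


names = ["Selling Price", "Square Feet", "Bedrooms", "Age"]
columns = list(zip(*homes))
for i in range(1, 4):
    for j in range(i):
        print(f"r({names[i]}, {names[j]}) = {pearson(columns[i], columns[j]):.3f}")
for name, col in zip(names, columns):
    print(f"{name}: observed from {min(col)} to {max(col)}")
print("model estimate (1800 sq ft, 3 bedrooms, 35 years):", predicted_price(1800, 3, 35))
```

The model gives $135,234 for the home in question 4. That house lies inside the observed ranges of all three x variables.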