Activity I: PROBLEMS
Spring 2002

Problem 1 Part 1: Understanding Statistical Words and Symbols

Company officials wish to determine the labor costs of producing a specialty widget that requires extensive hand work. They select 16 workers at random from their labor force of 423 and have them spend a day building the widget. They found the average time to build the widget is 3.278 hours, with a standard deviation of 0.101 hours, and a histogram that is unimodal. They also found the probability of making a defective widget changes with the age of the worker. Those workers who are over 50 make defectives 2% of the time, while those under 25 make defectives 9% of the time.

In the context of the problem, answer or describe the importance of the following words or symbols. Assume x = the time it takes a worker to produce one widget.

a) Random Variable of Interest
b) Other Random Variables
c) µx
d) x
e) X
f) Sx
g) Accurate Data
h) Reliable Data
i) Precise Data
j) Stratified Sample
k) Valid Data
l) As a manager, how would your thoughts change if the study showed average labor was 3.3 hours with a standard deviation of 1 hour and the distribution was bimodal?
m) Logic of the Measurement
n) Histogram (Probability Distribution)

Problem 1 Part 2: Producing Control Charts

A company has historical data that shows weekly sales generated by its salespeople are distributed with a mean of approximately $10,000 and a standard deviation of about $1,500. This mean and standard deviation are for an individual salesperson's weekly sales. As sales manager you have been in Europe for the past six weeks exploring an international market for your products. You arrive back at work and are given the following summary of your salespeople's performance. The numbers below indicate the weekly sales in $1,000 units for five salespeople. The questions will help you answer the question "How do you evaluate their performance?"
Salesperson   Week 1   Week 2   Week 3   Week 4   Week 5   Week 6
A             12       11.8     9.2      10.4     8.6      10.1
B             14       12       13.5     11       9        10
C             9        11       9        15       7.2      8
D             12.9     10.9     11.8     13       4.9      11.9
E             8        11       8.3      8.1      7.9      10

1. Using µ = 10,000 and σ = 1,500, create a Control Chart for individual sales performance for each of the 5 people (put all the charts on one chart). The chart should include the Center Line and control lines at ±1σ, ±2σ, and ±3σ. For example, the 3σ limits would be at \( \mu \pm 3\sigma = 10{,}000 \pm 3(1{,}500) = 10{,}000 \pm 4{,}500 \), that is, 5,500 and 14,500.
2. Then create a Control Chart for the Average Weekly Sales. You will need to calculate the average sales for each week. Remember to use \( \sigma_{\bar{X}} = \sigma/\sqrt{n} = 1500/\sqrt{5} \approx 671 \approx 700 \) instead of σ in calculating the control limits.
3. Interpret each control chart, identifying any unnatural sales performance. Explain why it is deemed unnatural. Use pp. 54-55 of the In-Class Notes to help decide what is unnatural.
4. What assumptions are you making in making this interpretation?
5. Control Charts are an example of Spatial Thinking while the table of values is Symbolic Thinking. Which is more informative to you? Explain. Remember: Information is Data that has been condensed to force you to Take Action or to Think Differently.

[Blank chart: Individual Salespeople. Vertical axis marked 5, 10, 15, 20; horizontal axis Weeks 1 to 6.]
[Blank chart: Weekly Averages. Vertical axis marked 8, 10, 12; horizontal axis Weeks 1 to 6.]

Problem 2 Part 1: Working with Decision Rules

Officials at a drug company are interested in introducing a new drug that does not have to go through FDA approval. They presently have contracts with 12 health care organizations that produce 80% of the drug company's revenues. If more than 10 of the organizations would purchase the drug, they will be able to make a profit on the drug. However, it is known that a rival drug firm is very close to offering a similar drug, so they are unable to conduct a census. They have time to talk to at most 3 clients. The following belief has been stated.
Belief: 10 or fewer of the clients want the drug
Other Belief: More than 10 want the drug

The following Decision Rule has been developed: Select 2 clients at random. If both want the drug, accept belief. If 2 don't want the drug, reject belief. If one wants the drug and one doesn't, select one more client at random. If that person wants the drug, accept belief; otherwise reject belief.

1) If in reality 9 clients want the drug and 3 don't, how effective is the decision? Determine the type of error (α or β) and the size of the possible error. Hint: Draw a tree diagram.
2) Comment on the logic of the Decision Rule. How well is the information being used? Remember the two types of logic errors: solving the correct problem incorrectly, or solving the wrong problem.
3) Discuss the relevance of accuracy, reliability, and precision.

Problem 2 Part 2: Understanding Relationships Between Variables

Company officials do a study to determine if there is a relationship between income and the chance that a potential customer will purchase a new vitamin. They know that 50% of the potential customers will buy the vitamins, 40% make under $20,000, and 10% make under $20,000 and are buyers of the vitamin.

1) Sales are made by selecting a person at random from the phone book; a salesperson is able to make 600 contacts per week. The profit from a sale of one bottle of vitamins is $5. What should a salesperson generate in profits each week, if the above information is representative of the population?
2) An information service will sell you the names and phone numbers of people with incomes over $20,000. If the information costs $2 per name, is it worth purchasing the information? Justify your answer with numbers.
3) Are you comfortable with the logic of these numbers (measurements)? Discuss.
4) What questions would you ask about sampling methodology?
5) For these numbers, discuss the issues of accuracy, reliability, and precision.
6) If you put your systems thinking hat on, who would you involve in the decision: Should we change our marketing strategy?

Problem 3 Part 1: Service Contracts - Expected Value

1) An ear institute offers a service contract on hearing aids that have just 2 parts. It found one of the following three conditions always exists for hearing aids that are defective:

30% of the hearing aids have 1 defective part.
50% of the hearing aids have 2 defective parts.
20% of the hearing aids have 2 good parts but the aid doesn't work. When this happens, the aid is scrapped at a cost of $4,000.

The repair procedure is to select a part at random and test it; the test takes 2 hours of labor. If the part is defective, it is repaired and the aid is tested again. If the aid doesn't work, further testing and/or repairing may be needed on the second part. The repairing, replacing and testing with a new part costs $100 in material cost and takes 4 hours of labor. If labor costs $60/hour, what should they charge on the service contract to break even? You can assume that the aids with defective parts will always work once all the defective parts have been repaired and replaced. Hint: Tree Diagram
2) What measurement issues should be considered? How would you determine if the numbers were valid? Hint: measurement, sampling and logic issues
3) Is there any additional information you would like to know about the defective parts that would
a) better help you solve the problem?
b) help lower the above cost?
4) If you were an Information Specialist brought in to design an Information System for the entire organization, what information could you conceive being sent from the service department and who would you send it to?

Problem 3 Part 2: Conditional Probability - CEA Test

Read the CEA Test article about the recurrence of colon cancer in patients who have previously undergone surgery for colon cancer.
There is a large percent (40%) of false negatives and a large percent (20%) of false positives. Notice that the article does not mention what percent of patients being tested with this test are found to actually have a recurrence of colon cancer.

a) Create a tree diagram for the situation of testing a patient. Since we do not know the chances the patient will have a recurrence, use a variable such as p to represent this unknown probability.
b) Using your tree diagram, what is the probability, in terms of p, that a patient has had a recurrence of colon cancer, knowing that the CEA test is positive?
c) Using your answer to part b, what is this probability for p = 5%? p = 10%? p = 20%? p = 30%?
d) Do you agree the test is unreliable? Careful: what is the difference between accurate and reliable measurements? In the last 4 paragraphs the article says, ". . . small number of lives will be saved . . . less than 1% of patients monitored." And later ". . . more than 500,000" patients are monitored each year. How does that fit with the last paragraph? What role does precision play in your discussion?
e) Using a Fermi approach, estimate the cost of using the test for one year in the United States.

CEA Test Article

St. Paul Pioneer Press (MN)
August 25, 1993
Section: Main
Edition: Metro Final
Page: 1A
BYLINE: Tom Majeski, Staff Writer
Illustration: Graphic: Pioneer Press

Cancer test a failure. Researchers say the CEA test commonly used for detecting the return of colon cancer should be discontinued. The main problems: 40 percent have false negatives, which delays cancer detection. 20 percent have false positives, requiring unnecessary further testing. Despite its cost, the test increased survival rate by less than 1 percent.
Source: Journal of the American Medical Association

TEST TO DETECT COLON CANCER DOESN'T WORK / MAYO RESEARCHERS SAY PROCEDURE UNRELIABLE, SHOULD BE ABANDONED

A common blood test used to detect a recurrence of colon cancer is highly unreliable and expensive and can lead to needless tests and even surgery on healthy patients, researchers at the Mayo Clinic and elsewhere say.

The procedure - called a CEA test - identifies a protein-like substance called carcinoembryonic antigen. The antigen may be produced in large quantities by cancer in the large bowel and can be accurately identified in blood.

But in their study, published in today's edition of the Journal of the American Medical Association, the researchers found that the test has such a low success rate that it should be abandoned.

"We found the test extremely unreliable and hope that it will be abandoned," said Dr. Charles Moertel, professor of oncology at the Mayo Clinic in Rochester and chief author of the study.

In a telephone interview from Alaska, Moertel said that his research team also hopes that the study stimulates a search for a better way to give colon cancer patients a second chance.

When the CEA test was developed about 20 years ago, some experts thought it could be used as a screening tool for colon cancer, the nation's second-leading cancer killer behind lung cancer. Colon cancer will kill about 57,000 Americans this year.

Although follow-up research showed that the test was too insensitive to serve as a screen, it became the standard way to monitor patients after colon cancer surgery.

To determine the test's effectiveness, researchers at the Mayo Clinic, the Fred Hutchinson Cancer Research Center in Seattle, Temple University School of Medicine and the University of Pennsylvania, both in Philadelphia, and the Grand Forks Clinic in Grand Forks, N.D., followed 1,216 patients who had undergone colon cancer surgery. Of those, 1,017 were monitored by CEA testing.
"The fundamental objective of the test is to pick up a recurrence at a stage when it can be cured," Moertel said.

But one-year survival rates for study participants who experienced a recurrence was 2.3 percent for those who underwent CEA monitoring and 2 percent for those who did not.

Moertel said there are two major problems with the test. About 40 percent of patients have false negatives, which means that many colon cancers go undetected until they have spread. About 20 percent of patients have false positives. To rule out a recurrence, they must unnecessarily endure numerous tests. And in some instances, patients with false positives also must undergo rigorous exploratory abdominal surgery at a cost of about $10,000.

"Our uncontrolled study indicates that the maximum anticipated gain from CEA monitoring will be a small number of lives saved ... probably less than 1 percent of the patients monitored," the authors wrote.

Moertel said about 80 percent of U.S. physicians now use the test to monitor more than 500,000 patients annually. During the normal five-year monitoring period, patients can have from 10 to 40 CEA tests, each costing about $50.

To give an idea of the costs involved, the researchers calculated that monitoring and conducting associated tests on the 1,017 patients in the study totaled nearly $1.5 million.

"No more lives are saved despite the added expense," Moertel said.

All content © 1999 St. Paul Pioneer Press (MN) and may not be republished without permission.

Problem 4: Regression Assignment

1) Company officials wish to study what causes defective products to be produced. They have identified several factors that may influence the number of defects positively and negatively. They feel the number of defective parts a production worker produces each week is dependent on the hours of training they received prior to starting work.
They have randomly selected 5 workers and have looked at their hours of training and the number of defects per week.

X = hours of training
Y = number of defects produced each week

X    10   20   30   40   50
Y    12   10    5    4    4

Rows to complete for each worker: \( X - \bar{X} \), \( Y - \bar{Y} \), \( (X - \bar{X})(Y - \bar{Y}) \), \( (X - \bar{X})^2 \).

\( \hat{Y} = bX + a \)
\( b = \dfrac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} \)
\( a = \bar{Y} - b\bar{X} \)

a) Determine the equation of the best fitting least squares line.
b =
a =
\( \hat{Y} \) =
b) Give an interpretation of the slope.
c) Give an interpretation of the Y intercept.
d) What would you estimate for the number of defectives a worker would produce each week if they have received:
20 hours of training? Estimated Defectives =
60 hours of training? Estimated Defectives =
Do you have the same confidence in both estimates? Explain.
e) Assuming that training costs $100 an hour and each defective costs $1,000, would you recommend increasing the training of all employees to a minimum of 40 hours? Explain.
f) Give a verbal interpretation of the meaning of the slope and the y-intercept for the following problems.

1) X = years from 1989, Y = Sales in millions of $
\( \hat{Y} = 52X \;[+/-]\; 40, \quad 0 \le X \le 12 \)
Slope:
Intercept:

2) X = Competitor's price in $, Y = Our sales in 1000 units sold
\( \hat{Y} = 7.5X \;[+/-]\; 2000, \quad 20 \le X \le 30 \)
Slope:
Intercept:

3) X = Prime interest rate, %, Y = Sales in $100,000
\( \hat{Y} = 12.4X \;[+/-]\; 500, \quad 5 \le X \le 10 \)
Slope:
Intercept:

Problem 5: Binomial Distribution

To estimate who will win an election, candidate A or B, a survey is taken.

1) If in reality 45% favor A and 55% favor B, what is the probability that 11 or more of the 20 voters (in survey) will favor A? Hint: Use Binomial Distribution - Excel or any stats package or tables in the book
2) Same problem as (1) above, but now a survey of 100 voters is taken and you want to determine the probability that 51 or more of the 100 voters will favor A.
3) Explain why the probabilities determined in questions (1) and (2) are important to the person taking the survey (trying to predict the winner).
Think about how the information from the survey will be used. Hint: one must quantify the risk of sampling errors (α and β errors).
4) The probabilities in (1) and (2) are different; what basic statistical concept is this difference demonstrating?
5) What questions do you have about sampling methodology?
6) For the survey, discuss the measurement issues of accuracy and reliability.

Problem 6: Normal Distribution

An automobile insurance company wishes to estimate the total cost of settling insurance claims. It estimates it will need to service 2 million claims next year. Taking a random sample from past years, it knows the average claim is $1,500, the standard deviation is $800, and the numbers are approximately Normally distributed.

1) How many claims can they expect to be over $2,500? Hint: Draw picture, use the Z table, or Excel.
2) What number will 95% of the claims be less than?
3) What would you estimate the total cost of settling claims will be?
4) What questions might you ask about this assertion: The distribution is Normal.
5) How does your answer to question 4 affect your faith in your answer to questions 1 & 2?
6) Discuss the importance of accuracy, precision, and reliability.
7) What questions would you ask about the sampling methodology?
8) Comment on the logic of the measurement.

Problem 7: Decision Trees

Decision Tree - Oil Drilling

Decision Tree Analysis views the overall decision process as a sequence of alternatives (controllable) and chance outcomes (uncontrollable), where the chance outcomes can be described by a probability distribution.

You are an independent oil operator who owns rights to a particular piece of property and you must decide whether to drill for oil yourself or sell your rights. The desirability of drilling obviously depends on how much oil, if any, there is beneath the surface.
Before making this decision, however, you may, if you wish, obtain certain geological and geophysical information by taking seismographic recordings usually associated with oil pools. Unfortunately, the information obtained from such readings does not provide perfect predictions. Sometimes oil is found where no subsurface structure has been detected.

At the present time, you must choose one of three alternative courses of action.

I. You may sell your rights for $2,000,000.
II. You may drill immediately at a cost of $1,000,000. Based on your experience with similar geographic locations you predict a 50-50 chance of finding oil. If there is oil, you expect the payoff to be $9,000,000.
III. You may take a seismographic reading at a cost of $300,000. You estimate a 2/3 chance of finding subsurface structure. If subsurface structure is found, you can sell the rights for $3,000,000 or drill for oil. If you drill, at a cost of $1,000,000, you estimate a 60% chance of finding oil. If oil is found, the expected payoff is $12,000,000. If no subsurface structure is found, you can sell the rights for $1,000,000. If you drill at a cost of $1,000,000, you estimate the chance of finding oil to be .3. The expected payoff, if oil is found, is $5,000,000.

a) Using Decision Analysis, determine which alternative you would select.
b) Do a risk analysis of the alternative you have selected.
c) Comment on why sensitivity analysis would be helpful. You do not have to do any mathematical analysis. Relate sensitivity analysis to the Fermi process.

Do one solution using paper and pencil; do another solution using Precision Tree.

Problem 8: More Complicated Decision Tree

Part 1: Decision Tree - Alice Hope

Alice Hope has $100,000 to invest. She has three opportunities she finds intriguing. For tax reasons she will have to cash out this investment after two years. A further restriction is that each opportunity requires her to invest a lump sum of $100,000.
First Opportunity: A secured loan that will pay 10% per year.

Second Opportunity: A group of venture capitalists will be taking a company public. By investing $100,000, she is given 100,000 shares in the Faith company. The plan is to take the company public in 6 months. Alice has talked to several sources and believes there is a 30% chance the company will be able to make the breakthrough to bring the company public. If they don't make the breakthrough, the company will be dissolved and the sale of assets would give 40¢ on each $1 invested. If it is successful the stock will be sold at an IPO of $5 per share. She is certain the IPO will be successful if the Faith company makes the breakthrough. She must decide whether she should cash out her investment or hold it and see what it does in the open market. There is a second breakthrough the company is working on; results will be known in two years. If this is successful, she feels confident the stock will soar to $20. Her consultants tell her there is a 20% chance of Faith pulling off the second breakthrough. If it is unsuccessful she believes there is a 60% chance the stock will plunge to 50¢ a share or a 40% chance it will drop to $2 per share.

Third Opportunity: The Charity company is going public now and she can purchase stocks in units of 10,000 for $10 per share. The Faith group has come back to her and said that if she doesn't invest now they will need an additional $100,000 in 6 months. The difference is that she will then receive only 50,000 shares since there is less risk that the company will not survive. If the company needs the additional $100,000, the chance of Faith pulling off the necessary breakthrough would be 50%. The potential price of the stock does not change and the chance of the second breakthrough remains the same. Because of this second option from Faith, she is considering holding Charity for only 6 months.
She feels there is a 50% chance it will increase to $12 per share, a 40% chance it will stay at $10 per share, and a 10% chance it will drop to $8 per share. If the stock drops to $8 she has decided to hold Charity until the two years are up. She believes there is a 40% chance it will reach $16 per share, a 30% chance it stays at $8 and a 30% chance it returns to the initial $10 per share. She has also been offered a second secured loan investment that would pay her 8% per year for 18 months. If Charity is successful in the first 6 months or holds at $10 per share she can stay with Charity and believes there is a 70% chance of it going to $20 per share and a 30% chance of it dropping to $8. She could also cash out Charity and invest the entire amount in secured 18-month loans. She has checked with Faith and they indicate that if she has more than $100,000 to invest in 6 months she will be able to do that with them and get the same per-share stock price as the $100,000 would. She cannot go to Faith with less than $100,000.

Create and solve, using Precision Tree, a Decision Analysis Tree Diagram for the decision that Alice faces. Discuss any assumptions that you have made in solving the problem.

Problem 9: Working with Random Variables

Find at least 100 values of a random variable; you choose what variable that may be. You could use economic or financial or demographic data from the World Wide Web, or something from your work or your personal life.

a. Explain why this data is a sample, and discuss what you know about the sampling methodology that created the data.
b. Discuss the accuracy, reliability and precision of the data.
c. Calculate the mean, median, standard deviation, and standard error of the mean for the random variable. Use BestFit to find a theoretical distribution that closely matches the distribution of values for your variable.
d. Find a 90% confidence interval for the mean of the random variable.
Find a 95% confidence interval for the random variable.
e. Create (either by hand or by computer software) a histogram or stem plot of the data.

Problem 10: Multiple Regression Example

The following table lists the selling price of some homes that were sold in 1997 in Roseville, MN. Three variables that may affect the selling price of a home are also given.

Selling Price   Square Feet   Bedrooms   Age
$120,000        1736          4          43
$135,000        2175          3          28
$118,900        1650          3          31
$122,500        1561          3          43
$140,300        1450          3          37
$209,000        2577          4          30
$142,400        1892          4          32
$162,000        2702          3          42
$175,000        2225          4          30
$137,900        1823          3          34
$125,000        1508          3          20
$156,000        2090          3          44
$104,900        1262          2          42
$112,000        1488          3          35
$160,000        2156          4          34
$101,500        1393          2          34
$212,500        2112          4          20
$212,000        2288          5          30
$156,000        2000          4          21
$195,000        2434          4          14

To help determine whether or not a linear relationship may exist between the selling price and these three variables, the following correlation coefficients and graphs were produced.

Correlation Coefficients

                Selling Price   Square Feet   Bedrooms   Age
Selling Price   -
Square Feet     0.809           -
Bedrooms        0.766           0.614         -
Age             -0.466          -0.268        -0.404     -

1. Draw a sketch on x-y axes (or use Excel or another graphing tool) of each of the 6 pairs of variables indicated in the table. For each of the 6 graphs, explain whether or not the correlation values appear to make sense, both from looking at the graph and from your common sense.
2. Which variables seem to have the best linear relationship with the selling price? Which do not? Which are correlated to each other? Is that a problem?

A multiple regression calculation gives the following linear model, which uses all three variables:

\( \text{Selling Price} = 46\,(\text{Square Feet}) + 17{,}133\,(\text{Bedrooms}) - 719\,(\text{Age}) + 26{,}200 \)

3. Give an English interpretation of each of the numbers in the model; note that the numbers 46, 17,133, and -719 are slopes.
4.
Estimate the selling price for a 35-year-old home with 3 bedrooms that is 1800 square feet and is in Roseville.
5. What is the range of relevancy for this problem?
6. There are 3 possible models that use a pair of x variables (i.e. Age and Square Feet, Age and Number of Bedrooms, and Square Feet and Number of Bedrooms). Which of these models would you select based on the table of correlation coefficients? Explain your choice.
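
As a rough illustration of how a fitted linear model like the one in Problem 10 can be evaluated, the Python sketch below plugs a home's values into the model and checks whether each input lies inside the range seen in the data. The coefficients (+46, +17,133, -719) and intercept (+26,200) are read from the handout's model with the signs assumed as shown; the names `predict_price` and `outside_range` are hypothetical helpers introduced here, not part of the course materials. The range limits are the smallest and largest values of each variable in the 20-home table.

```python
# Coefficients of the handout's three-variable model (signs are an assumption).
COEF = {"square_feet": 46.0, "bedrooms": 17_133.0, "age": -719.0}
INTERCEPT = 26_200.0

# Smallest and largest value of each predictor in the 20-home table.
OBSERVED_RANGE = {
    "square_feet": (1262, 2702),
    "bedrooms": (2, 5),
    "age": (14, 44),
}


def predict_price(home):
    """Return the model's estimated selling price in dollars."""
    return INTERCEPT + sum(COEF[name] * home[name] for name in COEF)


def outside_range(home):
    """List the predictors whose values fall outside the observed data."""
    return [
        name
        for name, (low, high) in OBSERVED_RANGE.items()
        if not low <= home[name] <= high
    ]


# The home described in question 4.
home = {"square_feet": 1800, "bedrooms": 3, "age": 35}
print(predict_price(home))   # 135234.0
print(outside_range(home))   # []
```

An empty list from `outside_range` only says the inputs are individually within the observed values; it does not by itself establish that the estimate is trustworthy, which is part of what question 5 asks you to think about.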