Math 1011 Final Exam
May 1, 2014

Name:
ID:

1. True or false, and explain briefly if your answer is false:

(a) In an observational study, it is the subjects who assign themselves to the different groups (the treatment group and the control group). The investigators just watch what happens. (5 points)

(b) The CLT (Central Limit Theorem) can be applied to both the sum of draws and the product of draws. (5 points)

(c) Suppose a 95%-confidence interval for the average household size in a city is 2.16 ∼ 2.44. This tells you that 95% of the households in the city contain between 2.16 and 2.44 persons. (5 points)

(d) A test statistic measures the difference between the data and what is expected based on the alternative hypothesis. (5 points)

Solution.

(a) True.

(b) False. The CLT can be applied to the sum of draws but not to the product of draws.

(c) False. The statement confuses the SD with the SE. The SE measures the likely size of the chance error in the sample average, from sample to sample. The confidence level says that for about 95% of all samples of the same size, the corresponding confidence interval will cover the true value. The interval 2.16 ∼ 2.44 is just one of them.

(d) False. The expected value is based on the null hypothesis, not the alternative. All the calculations are based on the null.

2. Multiple choice: there is only one correct answer to each of the following questions. Please circle the correct one.

(i) If a technician wants to determine whether a roulette wheel (38 pockets) is ready for use or not, then the technician could use a ____. (5 points)

I. one-sample z-test
II. χ²-test

(a) only I
(b) only II
(c) either I or II
(d) none of the above

(ii) A 10-gram weight is sent to a local laboratory to determine whether it needs calibration or not. A technician reports the following measurements: 10.0001 grams, 10.0004 grams, 9.9998 grams, 10.0003 grams. Suppose the SD is known to be 0.0001 grams, and the chance errors follow the normal distribution. According to the data, the technician could use a ____.
(5 points)

(a) one-sample z-test
(b) two-sample z-test
(c) t-test
(d) χ²-test

Solution.

(i) (b). It involves 38 categories, so we have to use a χ²-test.

(ii) (a). Since we know the SD of the box and the errors follow the normal curve, we use the normal approximation, that is, a one-sample z-test.

3. In one study, an organization wants to show the dependence between family incomes and living regions in a small town. More precisely, they divide the family incomes into 3 levels: poor, median, and rich; and they also divide the small town into 2 regions: region A and region B. They plan to show that the distribution of the 3 levels in region A differs from the distribution in region B. In order to do that, they want to draw a representative sample of households in both regions, and compute the sample distributions to make a test. Suppose they do the survey in this way: they use their own judgment to choose some specific districts in both regions, then they hand out questionnaires to the households in the chosen districts and wait for the responses. Although the response rate is very low, only 8% in total, they still get a sample:

            rich   median   poor
Region A     17     266       8
Region B     52     134       3

(a) What kind of test shall the organization use? What is the null hypothesis and what is the alternative? (5 points)

(b) Is the sample they got representative? Yes or no, explain briefly. If your answer is no, please give some suggestions to improve the survey. (5 points)

Solution.

(a) The organization shall use a χ²-test. The null says the family incomes and the living regions are independent, that is, the distribution of the 3 levels in region A is the same as the distribution in region B. The alternative says the distribution differs from region to region.

(b) No. There is selection bias and nonresponse bias in the survey. The specific districts in the town might not be representative. For instance, in region B, they might choose a rich district to mail questionnaires.
Both rich and poor households tend not to respond, so the sample could understate the share of rich and poor households.

Suggestions: they should use a probability method to draw a sample rather than use their own judgment. They should have no discretion at all. It is better to send interviewers to interview randomly chosen households rather than hand out questionnaires. This could increase the response rate.

4. In a clinical trial of a new therapy for lung cancer, a group of patients was selected to be observed in the study. Some of the patients felt so despairing that they refused to try any therapy, and the others decided to try the new one. Over the period of the study, the researchers got data on the survival rate, that is, the rate of patients who survive 5 years or longer after the treatment. They wanted to show that the new therapy indeed had an effect on lung cancer by using a test of significance.

(a) What kind of test shall the researchers use? What is the null hypothesis and what is the alternative? (5 points)

(b) Are the data based on an observational study or a randomized controlled experiment? What is the treatment group, and what is the control group? (5 points)

(c) Did the data they got from the study work? Explain briefly. (5 points)

Solution.

(a) The researchers shall use a two-sample z-test. (Strictly speaking, the two-sample z-test applies only to a randomized controlled experiment, not to an observational study. The design here is poor: the researchers would have to switch to a randomized controlled experiment and then apply the two-sample z-test. So for this question, any test you answer will get credit.) The null hypothesis is: the survival rate of the treatment group is the same as the survival rate of the control group. The alternative is: the new therapy indeed had an effect, that is, the survival rate of the treatment group is higher than that of the control group.

(b) The data are based on an observational study.
The treatment group is the group of patients who decided to try the new therapy. The control group is the group of patients who refused to try any therapy.

(c) No, the data did not work. The survival rate for the treatment group might be higher than what is expected. This is because the patients in the control group are those who felt rather despairing, and the patients in the treatment group are those who were optimistic. It seems that the patients in the treatment group were more likely to survive after treatment, regardless of whether the therapy was the new one or an old one. This factor is confounded with the effect of the new therapy, so the comparison is biased.

5. In one year, 200 students took calculus I. The average score is 65, and the SD is 10. Suppose the histogram of the scores follows the normal distribution.

(a) Use the normal curve to estimate the number of students with scores between 70 and 80. (10 points)

(b) If one student claims his score is higher than 84% of all the students, use the normal approximation to estimate his score. (10 points)

Note: You may need the following data from the normal table:

z        0.50          0.75          1.00          1.25          1.50
Height   35.21         30.11         24.20         18.26         12.95
Area     38.29 (≈ 38)  54.67 (≈ 55)  68.27 (≈ 68)  78.87 (≈ 79)  86.64 (≈ 87)

Solution.

(a) Convert the scores into standard units: (70 − 65)/10 = 0.5, (80 − 65)/10 = 1.5. From the normal table, the area under the normal curve between −0.5 and 0.5 is 38%, and the area between −1.5 and 1.5 is 87%. So the area under the normal curve between 0.5 and 1.5 is 1/2 × 87% − 1/2 × 38% = 24.5%. Hence, the number of students with scores between 70 and 80 is about 200 × 24.5% = 49.

(b) Suppose the student's score is converted into the standard unit z. Then according to the claim, the area under the normal curve to the left of z is about 84%. So the area to the right of z is about 100% − 84% = 16%. By symmetry of the normal curve, the area to the left of −z is about 16%.
Therefore, the area between −z and z is about 84% − 16% = 68%. From the table, we see that z = 1.00. Hence, the score is about 65 + 10 × 1.00 = 75.

6. One year, there were about 600,000 faculty members at around 3,000 institutions of higher learning in the U.S. (including junior colleges and community colleges). As part of a continuing study of higher education, the Carnegie Commission took a simple random sample of 2,500 of these faculty members. On average, these 2,500 sample persons had published 1.7 research papers in the two years prior to the survey, and the SD was 2.3 papers. Find a 95%-confidence interval for the average number of research papers published by all 600,000 faculty members in the two years prior to the survey. (10 points)

Solution. The average number of research papers published by all 600,000 faculty members can be estimated by the sample average as 1.7. Since the SD of the population is unknown, we use the bootstrap method to estimate it as 2.3 papers. So the SE for the sum is about √2,500 × 2.3 = 115 papers. Then the SE for the average is 115/2,500 = 0.046. Hence a 95%-confidence interval is 1.7 ± 0.092, that is, 1.608 to 1.792 papers.

7. According to the records for 2000, the 60th percentile of family income in a certain city was $71,000. In 2013, a market research organization took a simple random sample of 500 families in the city; about 42% of the sample families had incomes over $71,000. Did the 60th percentile of family income in this city increase over the period 2000 to 2013? Formulate the null and alternative hypotheses, and use a test of significance to assess the statement. (15 points)

Note: You may need the following data from the normal table:

z        0.80          0.90          1.00          1.10
Height   28.97         26.61         24.20         21.79
Area     57.63 (≈ 58)  63.19 (≈ 63)  68.27 (≈ 68)  72.87 (≈ 73)

Solution. To assess the statement that the 60th percentile of family income in the city increased over the period 2000 to 2013, we use a 0-1 box.
Tickets are marked 1 for the families having incomes over $71,000 in 2013, and 0 for the others. The data are like 500 draws from the box. If the 60th percentile remained the same as in 2000, then about 40% of the family incomes were over $71,000, that is, there are 40% 1's in the box. If the difference between 42% and 40% is real, then the 60th percentile did increase over the period. So the null is: the 60th percentile remained the same as in 2000, that is, there are 40% 1's in the box. The alternative is: the 60th percentile did increase, that is, there are more than 40% 1's in the box.

We use the one-sample z-test. Based on the null, the SD of the box can be computed as √(0.4 × 0.6) ≈ 0.5. So the SE for the number is about √500 × 0.5 ≈ 11. Then the SE for the percentage is about 11/500 × 100% = 2.2%. Hence, the z-statistic is about z ≈ (42% − 40%)/2.2% ≈ 0.9. From the table, the area between −0.9 and 0.9 is about 63%, so the area to the right of 0.9 is about 1/2 × (100% − 63%) = 18.5%. Therefore, the P-value is about 18.5%, which is not significant. So we stay with the null hypothesis and conclude that the 60th percentile did not increase over the period 2000 to 2013.

8. (Bonus Problem) Do only when you finish the other problems. In Nevada roulette, there are 38 pockets numbered "0", "00", and 1 through 36.

(i) One bet is odd or even, and it pays 1 to 1. That is, a number from 1 through 36 is either odd or even. When betting $1, if you win the house will give you an extra $1, and if you lose the house will get your $1. If it comes out "0" or "00", you lose.

(ii) Another bet is 1 ∼ 18 or 19 ∼ 36, and it also pays 1 to 1. A number from 1 through 36 is either in 1 ∼ 18 or in 19 ∼ 36. When betting $1, if you win the house will give you an extra $1, and if you lose the house will get your $1. Again, if it comes out "0" or "00", you lose.

Suppose someone will bet $1 on odd, and at the same time, someone else will bet $1 on 1 ∼ 18, and this pair of bets is made 400 times.

(a) What is the expected net gain for the house? Give or take by how much or so?
(Please round to integer. 5 points)

(b) How many times will the house make money? Give or take by how much or so? (Please round to integer. 5 points)

Solution.

(a) We analyze the outcomes for the 38 pockets: (i) for "0" or "00", the house gets $2 each time; (ii) for 1 ∼ 18, the house loses $1, and on odd numbers the house loses another $1, while on even numbers the two bets cancel out, so there are 9 −$2's and 9 $0's; (iii) for 19 ∼ 36, the house wins $1, but on odd numbers it is canceled out, so there are 9 $0's and 9 $2's. In summary, the tickets are: $2, $2, 9 −$2's, 9 $0's, 9 $0's, and 9 $2's, that is, 11 $2's, 9 −$2's, and 18 $0's.

In 400 plays, the expected net gain will be (11 × 2 + 9 × (−2))/38 × 400 ≈ 42 dollars. Since the average of the box is approximately 0, the SD of the box can be estimated as √((11 × 2² + 9 × (−2)²)/38) ≈ 1.5. Then the SE for the sum is about √400 × 1.5 = 30 dollars. Hence, the net gain for the house would be about 42 dollars, give or take 30 dollars or so.

(b) We use a new 0-1 box, with tickets marked 1 for the house making money, and 0 otherwise. From part (a), the house makes money only when it gets $2, so there are 11 1's and 27 0's in the box. The expected number of times will be 11/38 × 400 ≈ 116. The SD of the box is √(11/38 × 27/38) ≈ 0.45. Then the SE for the number is √400 × 0.45 ≈ 9. Therefore, the house will make money about 116 times, give or take 9 times or so.
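The box-model arithmetic in the bonus problem can be checked numerically. The following is a minimal Python sketch, not part of the original exam; the helper name box_stats and the variable names are illustrative assumptions. It uses the exact SD of the box rather than the rounded value 1.5, so the SE for the net gain comes out near 29 dollars instead of 30; the other figures agree with the solution after rounding.

```python
import math

def box_stats(tickets, n_draws):
    """Expected value and SE of the sum of n_draws draws made at random
    with replacement from a box of tickets (hypothetical helper)."""
    avg = sum(tickets) / len(tickets)
    sd = math.sqrt(sum((t - avg) ** 2 for t in tickets) / len(tickets))
    return n_draws * avg, math.sqrt(n_draws) * sd

# One ticket per pocket: the house's net gain from the pair of bets.
gain_box = [2] * 11 + [-2] * 9 + [0] * 18

# 0-1 box: 1 when the house makes money on the pair of bets, 0 otherwise.
win_box = [1 if t > 0 else 0 for t in gain_box]

gain_ev, gain_se = box_stats(gain_box, 400)
win_ev, win_se = box_stats(win_box, 400)

print(f"net gain: about {gain_ev:.1f} dollars, give or take {gain_se:.1f}")
print(f"winning plays: about {win_ev:.1f}, give or take {win_se:.1f}")
```

The same helper applies to any box in the exam that is drawn from with replacement, for example a 0-1 box with 40% 1's for problem 7.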