Supplemental Topic 5: Ethics

S5.1 Ethical Treatment of Human and Animal Participants
S5.2 Assurance of Data Quality
S5.3 Appropriate Statistical Analyses
S5.4 Fair Reporting of Results
Case Study: Science Fair Project or Fair Science Project
Exercises

Can assertiveness training to turn away telemarketers be evaluated? See Example S5.2 (p. S5-12)

Ethical issues in experiments include ensuring proper treatment of human and animal participants, making sure data are of high quality, carrying out appropriate statistical analyses, and reporting the results in a fair and unbiased way. Many professional societies have a code of ethics for their members to follow when conducting research.

As with any human endeavor, in statistical studies there are ethical considerations that must be taken into account. These considerations fall into multiple categories, and we discuss the following issues here:

1. Ethical treatment of human and animal participants
2. Assurance of data quality
3. Appropriate statistical analyses
4. Fair reporting of results

Throughout the chapter, this icon introduces a list of resources on the StatisticsNow website at http://1pass.thomson.com that will:
• Help you evaluate your knowledge of the material
• Allow you to take an exam-prep quiz
• Provide a Personalized Learning Plan targeting resources that address areas you should study

Most professional societies have a code of ethics that their members are asked to follow. Guidelines about the conduct of research studies are often included. For instance, the American Psychological Association first published a code of ethics in 1953 and has updated it at least every ten years since then. In 1999, the American Statistical Association, one of the largest organizations of professional statisticians, published Ethical Guidelines for Statistical Practice, listing 67 recommendations divided into categories such as "Professionalism" and "Responsibilities to Research Subjects."

S5.1 Ethical Treatment of Human and Animal Participants

Increasing attention has been paid to the ethics of experimental work, partly as a consequence of some deceptive experiments that led to unintended harm to participants. Here is a classic example of an experiment conducted in the early 1960s that would probably be considered unethical today.

Example S5.1 Stanley Milgram's "Obedience and Individual Responsibility" Experiment

Social psychologist Stanley Milgram was interested in the extent to which ordinary citizens would obey an authority figure, even if it meant injuring another human being. Through newspaper ads offering people money to participate in an experiment on learning, he recruited people in the area surrounding Yale University, where he was a member of the faculty. When participants arrived, they were greeted by an authoritative researcher in a white coat and introduced to another person who they were told was a participant like them but who was actually an actor.
Lots were drawn to see who would be the "teacher" and who would be the "student," but in fact it had been predetermined that the actor would be the "student" and the local citizen would be the "teacher." The student/actor was placed in a chair with restricted movement and hooked up to what was alleged to be an electrode that administered an electric shock. The teacher conducted a memory task with the student and was instructed to administer a shock when the student missed an answer. The shocking mechanism was shown to start at 15 volts and increase in intensity with each wrong answer, up to 450 volts. When the alleged voltage reached 375, it was labeled as "Danger/Severe," and when it reached 435, it was labeled "XXX." The experimenter sat at a nearby table, encouraging the teacher to continue to administer the shocks. The student/actor would respond with visible and increasing distress. The teacher was told that the experimenter would take responsibility for any harm that came to the student. The disturbing outcome of the experiment was that 65% of the participants continued to administer the alleged shocks up to the full intensity, even though many of them were quite distressed and nervous about doing so. Even at "very strong" intensity, 80% of the participants were still administering the electric shocks. (Sources: http://www.vanderbilt.edu/AnS/Anthro/Anth101/stanley_milgram_experiment.htm and Milgram (1983).) ■

This experiment would be considered unethical today because of the stress it caused for the participants. The American Psychological Association continues to update its Ethical Principles of Psychologists and Code of Conduct. The latest version as of fall 2005 can be found at http://www.apa.org/ethics/code2002.html. One of the sections of the code is called "Deception in Research" and includes three instructions (http://www.apa.org/ethics/code2002.html#8_07); Milgram's experiment would most likely fail the criterion in part (b):

(a) Psychologists do not conduct a study involving deception unless they have determined that the use of deceptive techniques is justified by the study's significant prospective scientific, educational, or applied value and that effective nondeceptive alternative procedures are not feasible.

(b) Psychologists do not deceive prospective participants about research that is reasonably expected to cause physical pain or severe emotional distress.

(c) Psychologists explain any deception that is an integral feature of the design and conduct of an experiment to participants as early as is feasible, preferably at the conclusion of their participation, but no later than at the conclusion of the data collection, and permit participants to withdraw their data.

Informed Consent

Virtually all experiments with human participants require that the researchers obtain the informed consent of the participants. In other words, participants are to be told what the research is about and given an opportunity to make an informed choice about whether to participate. If you were a potential participant in a research study, what would you want to know in advance to make an informed choice about participation? Because of such issues as the need for a control group and the use of double-blinding, it is often the case that participants cannot be told everything in advance.
For instance, it would be antithetical to good experimental procedure to tell participants in advance whether they were taking a drug or a placebo or to tell them whether they were in the treatment group or the control group. Instead, the use of multiple groups is explained, and participants are told that they will be randomly assigned to a group but will not know what it is until the conclusion of the experiment. The information that is provided in this process is slightly different in such research as psychology experiments than it is in medical research. In both cases, participants are supposed to be told the nature and purpose of the research and to be informed about any risks or benefits. In medical research, the participants generally are suffering from a disease or illness, and an additional requirement is that they be informed about alternative treatments. Of course, it is unethical to withhold a treatment that is known to work. In section 8.02 of its code of ethics, the American Psychological Association provides these guidelines for informed consent in experiments in psychology (http://www.apa.org/ethics/code2002.html#8_02): (a) When obtaining informed consent as required in Standard 3.10, Informed Consent, psychologists inform participants about (1) the purpose of the research, expected duration, and procedures; (2) their right to decline to participate and to withdraw from the research once participation has begun; (3) the foreseeable consequences of declining or withdrawing; (4) reasonably foreseeable factors that may be expected to influence their willingness to participate such as potential risks, discomfort, or adverse effects; (5) any prospective research benefits; (6) limits of confidentiality; (7) incentives for participation; and (8) whom to contact for questions about the research and research participants’ rights. They provide opportunity for the prospective participants to ask questions and receive answers. (b) Psychologists conducting intervention research involving the use of experimental treatments clarify to participants at the outset of the research (1) the experimental nature of the treatment; (2) the services that will or will not be available to the control group(s) if appropriate; (3) the means by which assignment to treatment and control groups will be made; (4) available treatment alternatives if an individual does not wish to participate in the research or wishes to withdraw once a study has begun; and (5) compensation for or monetary costs of participating including, if appropriate, whether reimbursement from the participant or a third-party payor will be sought. S5-W3527 9/28/05 4:05 PM Page S5-5 Ethics S5-5 Informed Consent in Medical Research The U.S. Department of Health and Human Services has a detailed policy on informed consent practices in medical research, which can be accessed at http://www.hhs.gov/ohrp/humansubjects/guidance/45cfr46.htm#46.116. A more user-friendly website is provided by their Office for Protection from Research Risks: http://www.hhs.gov/ohrp/humansubjects/guidance/ictips.htm. It provides the following tips and checklist. TIPS ON INFORMED CONSENT The process of obtaining informed consent must comply with the requirements of 45 CFR 46.116. The documentation of informed consent must comply with 45 CFR 46.117. The following comments may help in the development of an approach and proposed language by investigators for obtaining consent and its approval by IRBs: 1 Informed consent is a process, not just a form. 
Information must be presented to enable persons to voluntarily decide whether or not to participate as a research subject. It is a fundamental mechanism to ensure respect for persons through provision of thoughtful consent for a voluntary act. The procedures used in obtaining informed consent should be designed to educate the subject population in terms that they can understand. Therefore, informed consent language and its documentation (especially explanation of the study’s purpose, duration, experimental procedures, alternatives, risks, and benefits) must be written in “lay language” (i.e., understandable to the people being asked to participate). The written presentation of information is used to document the basis for consent and for the subjects’ future reference. The consent document should be revised when deficiencies are noted or when additional information will improve the consent process. ● ● Use of the first person (e.g., “I understand that . . .”) can be interpreted as suggestive, may be relied upon as a substitute for sufficient factual information, and can constitute coercive influence over a subject. Use of scientific jargon and legalese is not appropriate. Think of the document primarily as a teaching tool not as a legal instrument. Describe the overall experience that will be encountered. Explain the research activity, how it is experimental (e.g., a new drug, extra tests, separate research records, or nonstandard means of management, such as flipping a coin for random assignment or other design issues). Inform the human subjects of the reasonably foreseeable harms, discomforts, inconvenience, and risks that are associated with the research activity. If additional risks are identified during the course of the research, the consent process and documentation will require revisions to inform subjects as they are recontacted or newly contacted. ● 1 The IRB (Institutional Review Board) is a board that all research institutions are required to maintain for oversight of research involving human and animal participants. S5-W3527 9/28/05 4:05 PM Page S5-6 S5-6 Supplemental Topic 5 ● Describe the benefits that subjects may reasonably expect to encounter. There may be none other than a sense of helping the public at large. If payment is given to defray the incurred expense for participation, it must not be coercive in amount or method of distribution. ● Describe any alternatives to participating in the research project. For example, in drug studies the medication(s) may be available through their family doctor or clinic without the need to volunteer for the research activity. The regulations insist that the subjects be told the extent to which their personally identifiable private information will be held in confidence. For example, some studies require disclosure of information to other parties. Some studies inherently are in need of a Certificate of Confidentiality which protects the investigator from involuntary release (e.g., subpoena) of the names or other identifying characteristics of research subjects. The IRB will determine the level of adequate requirements for confidentiality in light of its mandate to ensure minimization of risk and determination that the residual risks warrant involvement of subjects. ● ● If research-related injury (i.e., physical, psychological, social, financial, or otherwise) is possible in research that is more than minimal risk (see 45 CFR 46.102[g]), an explanation must be given of whatever voluntary compensation and treatment will be provided. 
Note that the regulations do not limit injury to “physical injury.” This is a common misinterpretation. ● The regulations prohibit waiving or appearing to waive any legal rights of subjects. Therefore, for example, consent language must be carefully selected that deals with what the institution is voluntarily willing to do under circumstances such as providing for compensation beyond the provision of immediate or therapeutic intervention in response to a research-related injury. In short, subjects should not be given the impression that they have agreed to and are without recourse to seek satisfaction beyond the institution’s voluntarily chosen limits. The regulations provide for the identification of contact persons who would be knowledgeable to answer questions of subjects about the research, rights as a research subject, and research-related injuries. These three areas must be explicitly stated and addressed in the consent process and documentation. Furthermore, a single person is not likely to be appropriate to answer questions in all areas. This is because of potential conflicts of interest or the appearance of such. Questions about the research are frequently best answered by the investigator(s). However, questions about the rights of research subjects or research-related injuries (where applicable) may best be referred to those not on the research team. These questions could be addressed to the IRB, an ombudsman, an ethics committee, or other informed administrative body. Therefore, each consent document can be expected to have at least two names with local telephone numbers for contacts to answer questions in these specified areas. ● S5-W3527 9/28/05 4:05 PM Page S5-7 Ethics S5-7 ● The statement regarding voluntary participation and the right to withdraw at any time can be taken almost verbatim from the regulations (45 CFR 46.116[a][8]). It is important not to overlook the need to point out that no penalty or loss of benefits will occur as a result of both not participating or withdrawing at any time. It is equally important to alert potential subjects to any foreseeable consequences to them should they unilaterally withdraw while dependent on some intervention to maintain normal function. Don’t forget to ensure provision for appropriate additional requirements which concern consent. Some of these requirements can be found in sections 46.116(b), 46.205(a)(2), 46.207(b), 46.208(b), 46.209(d), 46.305(a)(5– 6), 46.408(c), and 46.409(b). The IRB may impose additional requirements that are not specifically listed in the regulations to ensure that adequate information is presented in accordance with institutional policy and local law. 
● Source: http://www.hhs.gov/ohrp/humansubjects/guidance/ictips.htm INFORMED CONSENT CHECKLIST Basic and Additional Elements A statement that the study involves research An explanation of the purposes of the research The expected duration of the subject’s participation A description of the procedures to be followed Identification of any procedures which are experimental A description of any reasonably foreseeable risks or discomforts to the subject A description of any benefits to the subject or to others which may reasonably be expected from the research A disclosure of appropriate alternative procedures or courses of treatment, if any, that might be advantageous to the subject A statement describing the extent, if any, to which confidentiality of records identifying the subject will be maintained For research involving more than minimal risk, an explanation as to whether any compensation, and an explanation as to whether any medical treatments are available, if injury occurs and, if so, what they consist of, or where further information may be obtained Research Qs Rights Qs Injury Qs An explanation of whom to contact for answers to pertinent questions about the research and research subjects’ rights, and whom to contact in the event of a research-related injury to the subject S5-W3527 9/28/05 4:05 PM Page S5-8 S5-8 Supplemental Topic 5 A statement that participation is voluntary, refusal to participate will involve no penalty or loss of benefits to which the subject is otherwise entitled, and the subject may discontinue participation at any time without penalty or loss of benefits, to which the subject is otherwise entitled Additional Elements, as Appropriate A statement that the particular treatment or procedure may involve risks to the subject (or to the embryo or fetus, if the subject is or may become pregnant), which are currently unforeseeable Anticipated circumstances under which the subject’s participation may be terminated by the investigator without regard to the subject’s consent Any additional costs to the subject that may result from participation in the research The consequences of a subject’s decision to withdraw from the research and procedures for orderly termination of participation by the subject A statement that significant new findings developed during the course of the research, which may relate to the subject’s willingness to continue participation, will be provided to the subject The approximate number of subjects involved in the study Source: http://www.hhs.gov/ohrp/humansubjects/assurance/consentckls.htm As you can see, the process of obtaining informed consent can be daunting, and some potential participants may opt out because of an excess of information. There are additional issues that arise when participants are in certain categories. For example, young children cannot be expected to fully understand the informed consent process. The regulations state that children should be involved in the decision-making process to whatever extent possible and that a parent must also be involved. There are special rules that apply to research on prisoners and to research on “Pregnant Women, Human Fetuses and Neonates.” These can be found on the Internet at www.cdc.gov/od/ads/ hsrchklist.htm.1 Research on Animals Perhaps nothing is as controversial in research as the use of animals. There are some people who believe that any experimentation on animals is unethical, whereas others argue that humans have benefited immensely from such research and that justifies its use. 
There is no doubt that in the past, some research with animals, as with humans, was clearly unethical. But again, the professions of medicine and behavioral sciences, which are the two arenas in which animal research is most prevalent, have developed ethical guidelines that their members are supposed to follow. 1All websites in this chapter are current as of September 2005 but may have changed by the time you read this. S5-W3527 9/28/05 4:05 PM Page S5-9 Ethics S5-9 Nonetheless, animals do not have the rights of humans as participants in experiments. Here are two of the elements of the American Psychological Association’s Code of Ethics section on “Humane Care and Use of Animals in Research.” Clearly, these statements would not be made regarding human subjects, and even these are stated ideals that many researchers may not follow. (Source: http://www.apa.org/ethics/code2002.html#8_09). (e) Psychologists use a procedure subjecting animals to pain, stress, or privation only when an alternative procedure is unavailable and the goal is justified by its prospective scientific, educational, or applied value. (g) When it is appropriate that an animal’s life be terminated, psychologists proceed rapidly, with an effort to minimize pain and in accordance with accepted procedures. The U.S. National Institutes of Health has also established the Public Health Service Policy on Humane Care and Use of Laboratory Animals, which can be found at http://www.grants.nih.gov/grants/olaw/references/phspol.htm. However, the guidelines for research on animals are not nearly as detailed as those for human subjects. A research study of the approval of animal-use protocols found strong inconsistencies when protocols that had been approved or disapproved by one institution were submitted for approval to another institution’s review board. The decisions of the two boards to approve or disapprove the protocols agreed at no better than chance levels, suggesting that the guidelines for approval of animal research are not spelled out in sufficient detail (Plous and Herzog, 2001). For more information and links to a number of websites on the topic of research with animals, visit http://www.socialpsychology .org/methods.htm#animals. S5.2 Assurance of Data Quality When reading the results of a study, you should be able to have reasonable assurance that the researchers collected data of high quality. This is not as easy as it sounds, and we have explored many of the difficulties and disasters of collecting and interpreting data for research studies in Chapters 3 and 4. The quality of data becomes an ethical issue when there are personal, political, or financial reasons motivating one or more of those involved in the research, and steps are not taken to assure the integrity of the data. Data quality is also an ethical issue when researchers knowingly fail to report problems with data collection that may distort the interpretation of the results. As a simple example, survey data should always be reported with an explanation of how the sample was selected, what questions were asked, who responded, and how they may differ from those who didn’t respond. U.S. Federal Statistical Agencies The U.S. government spends large amounts of money to collect and disseminate statistical information on a wide variety of topics. In recent years, there has been an increased focus on making sure the data are of high quality. The Federal Register Notices on June 4, 2002 (Vol. 67, No. 
107), provided a report called S5-W3527 9/28/05 4:05 PM Page S5-10 S5-10 Supplemental Topic 5 Federal Statistical Organizations’ Guidelines for Ensuring and Maximizing the Quality, Objectivity, Utility, and Integrity of Disseminated Information. (The report can be found on the Web at http://www.fedstats.gov/policy/Stat-AgencyFR-June4 –2002.pdf.) The purpose of the report was to provide notice that each of over a dozen participating federal statistical agencies were making such guidelines available for public comment. (A list of federal statistical agencies, with links to their websites and the data they provide, can be found at www .fedstats .gov.) Most of the guidelines proposed by the statistical agencies were commonsense, self-evident good practice. As an example, the following box includes data collection principles from the Bureau of Transportation Statistics website. 3.1 DATA COLLECTION OPERATIONS Principles Forms, questionnaires, automated collection screens, and file layouts are the medium through which data are collected. They consist of sets of questions or annotated blanks on paper or computer that request information from data suppliers. They need to be designed to maximize communication to the data supplier. ● Data collection includes all the processes involved in carrying out the data collection design to acquire data. Data collection operations can have a high impact on the ultimate data quality, especially when they deviate from the design. ● ● The data collection method should be appropriate to the data complexity, collection size, data requirements, and amount of time available. For example, a reporting system will often primarily rely on the required reporting mechanism, with follow-up for missing data. Similarly, a large survey requiring a high response rate will often start off with a mail out, followed by telephone contact, and finally by a personal visit. Specific data collection environmental choices can significantly affect error introduced at the collection stage. ● For example, if the data collector is collecting as a collateral duty or is working in an uncomfortable environment, it may adversely affect the quality of the data collected. Also, if the data are particularly difficult to collect, it will affect the data quality. ● Conversion of data on paper to electronic form (e.g., key entry, scanning) introduces a certain amount of error which must be controlled. ● Third-party sources of data introduce error in their collection processes. ● Computer-assisted information collection can result in more timely and accurate information. Initial development costs will be higher, and much more lead time will be required to develop, program, and test the data collection system. However, the data can be checked and corrected S5-W3527 9/28/05 4:05 PM Page S5-11 Ethics S5-11 when originally entered, key-entry error is eliminated, and the lag between data collection and data availability is reduced. ● The use of sensors for data can significantly reduce error. Source: http://www.bts.gov/statpol/guide/chapter3.html So where does this fit in a discussion of ethics? Although it is laudable that the federal statistical agencies are spelling out these principles, the ethical issues that are involved in ensuring the quality of government data are more subtle than can be evoked from general principles. There are always judgments to be made, and politics can sometimes enter into those judgments. 
For instance, a report by the National Research Council (Citro and Norwood, 1997) examined many aspects of the functioning of the Bureau of Transportation Statistics (BTS) and formulated suggestions for improvement. The BTS was created in 1992 and therefore had been in existence for only five years at the time of the report. One of the data-quality issues discussed in the report was “comparability [of statistics about transportation issues] across data systems and time” (p. 32). One example given was that the definition of a transportation fatality was not consistent across modes of transportation until 1994, when consistency was mandated by the Secretary of Transportation. Before that, a highway traffic fatality was counted if death resulted from the accident within 30 days. But for a railroad accident, the time limit was 365 days. Continuing to use different standards for different modes of transportation would make for unfair safety comparisons. In an era in which various transportation modes compete for federal dollars, politics could easily enter into a decision to report fatality statistics one way or another. Probably the most controversial federal data collection issue surrounds the decennial United States census, most recently conducted in 2000. In 1988, a lawsuit was filed, led by New York City, alleging that urban citizens are undercounted in the census process. The political ramifications are enormous, because redistricting of congressional seats results from shifting population counts. The lawsuit began a long saga of political and statistical twists and turns that was still unresolved at the taking of the 2000 census. For an interesting and readable account, see Who Counts? The Politics of Census-Taking in Contemporary America, by Anderson and Fienberg (2001). A recent report by the National Research Council’s Committee on National Statistics (Martin, Straf, and Citro, 2001), called Principles and Practices for a Federal Statistical Agency, enumerated three principles and eleven practices that such agencies should follow. One of the recommended practices was “A strong position of independence.” The recommendation included a clear statement about separating politics from the role of statistical agencies: In essence, a statistical agency must be distinct from those parts of the department that carry out enforcement and policy-making activities. It must be impartial and avoid even the appearance that its collection, analysis, and reporting processes might be manipulated for political purposes or that individually identifiable data might be turned over for administrative, regulatory, or enforcement purposes. S5-W3527 9/28/05 4:05 PM Page S5-12 S5-12 Supplemental Topic 5 The circumstances of different agencies may govern the form that independence takes. In some cases, the legislation that establishes the agency may specify that the agency head be professionally qualified, be appointed by the President and confirmed by the Senate, serve for a specific term not coincident with that of the administration, and have direct access to the secretary of the department in which the agency is located. (p. 6) It should be apparent to you that it is not easy to maintain complete independence from politics; for instance, many of the heads of these agencies are appointed by the President and confirmed by the Senate. 
Other steps that are taken to help maintain independence include prescheduled release of important statistical information such as unemployment rates and authority to release information without approval from the policy-making branches of the organization. Experimenter Effects and Personal Bias We learned in Chapter 4 that there are numerous ways in which experimenter effects can bias statistical studies. If a researcher has a desired outcome for a study and if conditions are not very carefully controlled, it is quite likely that the researcher will influence the outcome. Here are some of the precautions that it may be possible to take to help prevent this from happening: Example S5.2 ● Randomization done by a third party with no vested interest in the experiment, or at least done by a well-tested computer randomization device ● Automated data recording without intervention from the researcher ● Double-blind procedures to ensure that no one who has contact with the participants knows which treatment or condition they are receiving ● An honest evaluation that what is being measured is appropriate for the research question of interest ● A standard protocol for the treatment of all participants that must be strictly followed Janet’s (Hypothetical) Dissertation Research This is a hypothetical example to illustrate some of the subtle (and not so subtle) ways in which experimenter bias can alter the data collected for a study. Janet, a Ph.D student, is under tremendous pressure to complete her research successfully. For her study, she hypothesized that role-playing assertiveness training for women would help them learn to say “no” to telephone solicitors. She recruited 50 undergraduate women as volunteers for the study. The plan was that each volunteer would come to her office for half an hour. For 25 of the volunteers (the control group), she would simply talk with them for 30 minutes about a variety of topics, including their feelings about saying “no” to unwanted requests. For the other 25 volunteers, Janet would spend 15 minutes on similar discussion and the remaining 15 minutes on a prespecified role-playing scenario in which the volunteer got to practice saying “no” in various situations. Two weeks after each S5-W3527 9/28/05 4:05 PM Page S5-13 Ethics S5-13 volunteer’s visit, Brad, a colleague of Janet’s, would phone the volunteer anonymously, pretending to be a telephone solicitor selling a magazine for a good price, and would record the conversation so that Janet could determine whether or not the volunteers were able to say “no.” It’s the first day of the experiment, and the first volunteer is in Janet’s office. Janet has the randomization list that was prepared by someone else, which randomly assigns each of the 50 volunteers to either Group 1 or Group 2. This first volunteer appears to be particularly timid, and Janet is sure she won’t be able to learn to say “no” to anyone. The randomization list says that volunteer 1 is to be in Group 2. But what was Group 2? Did she say in advance? She can’t remember. Oh well, Group 2 will be the control group. The next volunteer comes in, and according to the randomization list, volunteer 2 is to be assigned to Group 1, which now is defined to be the roleplaying group. Janet follows her predefined protocol for the half-hour. But when the half-hour is over, the student doesn’t want to leave. Just then, Brad comes by to say “hello,” and the three of them spend another half-hour having an amiable conversation. 
The second phase of the experiment begins, and Brad starts phoning the volunteers. The conversations are recorded so that Janet can assess the results. When listening to volunteer 2’s conversation, Janet notices that almost immediately, she says to Brad, “Your voice sounds awfully familiar, do I know you?” When he assures her that she does not and asks her to buy the magazine, she says, “I can’t place my finger on it, but this is a trick, right? I’m sure I know your voice. No thanks, no magazine!” Janet records the data: a successful “no” to the solicitation. Janet listens to another call, and although she is supposed to be blind to which group the person was in, she recognizes the voice as being one of the role-playing volunteers. Brad pitches the magazine to her, and her response is “Oh, I already get that magazine. But if you are selling any others, I might be able to buy one.” Janet records the data: a successful “no” to the question of whether she wants to buy that magazine. The second phase is finally over, and Janet has a list of the results. But now she notices a problem. There are 26 people listed in the control group and 24 listed in the role-playing group. She tries to resolve the discrepancy but can’t figure it out. She notices that the last two people on the list are both in the control group and that they both said “no” to the solicitation. She figures she will just randomly choose one of them to move over to the role-playing data, so she flips a coin to decide which one to move. This example illustrates just a few of the many ways in which experimenter bias can enter into a research study. Even when it appears that protocols are carefully in place, in the real world, it is nearly impossible to place controls on all aspects of the research. In this case, notice that every decision Janet made benefited her desired conclusion that the role-playing group would learn to say “no.” It is up to the researchers to take the utmost care not to allow these kinds of unethical influences on the results. ■ S5-W3527 9/28/05 4:05 PM Page S5-14 S5-14 Supplemental Topic 5 S5.3 Appropriate Statistical Analyses There are a number of decisions that need to be made in analyzing the results of a study, and care must be taken not to allow biases to affect those decisions. Here are some examples of decisions that can influence the results: ● Should the alternative hypothesis be one-sided or two-sided? This decision must be made before the data are examined. ● What level of confidence or level of significance should be used? ● How should outliers be handled? ● Should a nonparametric procedure be used? ● Have all appropriate conditions and assumptions been investigated and verified? One of the easiest ethical blunders to make is to try different methods of analysis until one produces the desired results. Ideally, the planned analysis should be spelled out in advance. If various analysis methods are attempted, all analyses should be reported along with the results. Example S5.3 Jake’s (Hypothetical) Fishing Expedition Jake was doing a project for his statistics class, and he knew that it was important to find a statistically significant result because all of the interesting examples in class had statistically significant results. He decided to compare the memory skills of males and females, but he did not have a preconceived idea of who should do better, so he planned to do a two-sided test. 
He constructed a memory test in which he presented people with a list of 100 words and allowed them to study it for 10 minutes. The next day, he presented another list of 100 words to the participants and asked them to indicate for each word whether it had been on the list the day before. The answers were entered into a bubble sheet and scored by computer so that Jake didn't inadvertently misreport any scores. The data for number of correct answers were as follows:

Males: 69, 70, 71, 61, 73, 68, 70, 69, 67, 72, 64, 72, 65, 70, 100
Females: 64, 74, 72, 76, 64, 72, 76, 80, 72, 73, 71, 70, 64, 76, 70

Jake remembered that they had been taught two different tests for comparing independent samples: a two-sample t-test to compare means and a Mann–Whitney rank-sum test to compare medians. He decided to try them both to see which one gave better results. Using Minitab, Jake found the results shown in the output that follows. The two-sample t-test uses the sample means, which are 70.73 for males and 71.6 for females. The difference in sample means is 70.73 - 71.6 = -0.87, and the test statistic is t = -0.34. The degrees of freedom are reported to be 21, and the p-value for the two-sided test is .739. The Mann–Whitney test uses the sample medians, which are 70 for males and 72 for females. The test statistic is reported as W = 193.5, and the p-value is .108 (adjusted for the fact that there were ties).

Two-Sample T-Test and CI: Males, Females
Two-sample T for Males vs Females
          N   Mean  StDev  SE Mean
Males    15  70.73   8.73      2.3
Females  15  71.60   4.75      1.2
Difference = mu Males - mu Females
Estimate for difference: -0.87
95% CI for difference: (-6.20, 4.47)
T-Test of difference = 0 (vs not =): T-Value = -0.34  P-Value = 0.739  DF = 21

Mann–Whitney Test and CI: Males, Females
Males    N = 15  Median = 70.00
Females  N = 15  Median = 72.00
Point estimate for ETA1-ETA2 is -3.00
95.4 Percent CI for ETA1-ETA2 is (-6.00, 1.00)
W = 193.5
Test of ETA1 = ETA2 vs ETA1 not = ETA2 is significant at 0.1103
The test is significant at 0.1080 (adjusted for ties)
Cannot reject at alpha = 0.05

Jake was disappointed that neither test gave a small enough p-value to reject the null hypothesis. He was also surprised by how different the results were. The t-test produced a p-value of .739, whereas the Mann–Whitney test produced a p-value of .108, adjusted for ties. It dawned on him that maybe he should conduct a one-tailed test instead. After all, it was clear that the mean and median for the females were both higher, so he decided that the alternative hypothesis should be that females would do better. He reran both tests. This time, the p-value for the t-test was .369, and for the Mann–Whitney test it was .054. Maybe he should simply round that one off to .05 and be done.

But Jake began to wonder why the tests produced such different results. He looked at the data and realized that there was a large outlier in the data for the males. Someone had scored 100%! Jake thought that must be impossible. He knew that he shouldn't just remove an outlier, so he decided to replace it with the median for the males, 70. Just to be sure he was being fair, he reran the original two-sided hypothesis tests. This time, the p-value for the t-test was .066, and the p-value for the Mann–Whitney test (adjusted for ties) was .0385. Finally!

Jake wrote his analysis, explaining that he did a two-sided test because he didn't have a preconceived idea of whether males or females should do better. He said that he decided to do a nonparametric test because he had small samples and he wanted to make sure none of the assumptions was violated. He didn't mention replacing the outlier because he didn't think it was a legitimate value anyway.

This hypothetical situation is obviously exaggerated to make a point, but it illustrates the dangers of data-snooping. If you manipulate the data and try enough different procedures, something will eventually produce desired results. It is not ethical to keep trying different methods of analysis until one produces a desired result. ■
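Readers who want to reproduce Jake's first round of analyses can do so with any statistical software. Here is a minimal sketch using Python with the scipy library (chosen purely for illustration; the Minitab output above is the analysis described in the example) that runs the two-sided versions of both tests on the data as originally recorded:

from scipy import stats

males = [69, 70, 71, 61, 73, 68, 70, 69, 67, 72, 64, 72, 65, 70, 100]
females = [64, 74, 72, 76, 64, 72, 76, 80, 72, 73, 71, 70, 64, 76, 70]

# Two-sample t-test without assuming equal variances, comparable to the
# Minitab output shown above (t should be close to -0.34, p close to .739).
t_stat, t_p = stats.ttest_ind(males, females, equal_var=False)

# Mann-Whitney test; scipy reports the U statistic rather than Minitab's
# rank sum W, and with ties it uses a tie-corrected normal approximation,
# so the p-value should be roughly .11, close to Minitab's adjusted .108.
u_stat, u_p = stats.mannwhitneyu(males, females, alternative="two-sided")

print(f"t = {t_stat:.2f}, two-sided p = {t_p:.3f}")
print(f"U = {u_stat:.1f}, two-sided p = {u_p:.3f}")

One benefit of scripting an analysis this way is that every choice, such as which test was run and whether the outlier was kept, is recorded and reproducible, which makes it much harder to quietly try variations and report only the one that "worked."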
Example S5.4 The Debate Over Passive Smoking

On July 28, 1993, the Wall Street Journal featured an article with the headline "Statisticians occupy front lines in battle over passive smoking" (Jerry E. Bishop, pp. B-1 and B-4). The interesting feature of this article was that it highlighted a debate between the U.S. Environmental Protection Agency (EPA) and the tobacco industry over what level of confidence should prevail in a confidence interval. Here is the first side of the story, as reported in the article:

The U.S. Environmental Protection Agency says there is a 90% probability that the risk of lung cancer for passive smokers is somewhere between 4% and 35% higher than for those who aren't exposed to environmental smoke. To statisticians, this calculation is called the "90% confidence interval."

Now for the other side of the story:

And that, say tobacco-company statisticians, is the rub. "Ninety-nine percent of all epidemiological studies use a 95% confidence interval," says Gio B. Gori, director of the Health Policy Center in Bethesda, MD, who has frequently served as a consultant and an expert witness for the tobacco industry.

The problem underlying this controversy is that the amount of data available at the time did not allow an extremely accurate estimate of the true change in risk of lung cancer for passive smokers. The EPA statisticians were afraid that the public would not understand the interpretation of a confidence interval. If a 95% confidence interval actually went below zero percent, which it might do, the tobacco industry could argue that passive smoke might reduce the risk of lung cancer. As noted by one of the EPA's statisticians,

Dr. Gori is correct in saying that using a 95% confidence interval would hint that passive smoking might reduce the risk of cancer. But, he says, this is exactly why it wasn't used. The EPA believes it is inconceivable that breathing in smoke containing cancer-causing substances could be healthy and any hint in the report that it might be would be meaningless and confusing. (p. B-4)

What do you think? Should the EPA have used the common standard and reported a 95% confidence interval? If they had done so, the interval would have included values indicating that the risk of lung cancer for those who are exposed to passive smoke may actually be lower than the risk for those who are not. The 90% confidence interval reported was narrower and did not include those values. (Source: Seeing Through Statistics, 3rd ed., Jessica M. Utts, Belmont, CA: Duxbury Press, 2005.) ■
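The statistical point behind the disagreement is simply that, for a given estimate and standard error, a 95% confidence interval is wider than a 90% interval and can therefore cross the "no effect" value when the 90% interval does not. The sketch below illustrates the idea with made-up numbers; the estimate and standard error (on the log relative risk scale) are chosen purely for illustration and are not the EPA's actual figures:

from scipy import stats

estimate, se = 0.17, 0.095  # hypothetical log relative risk and standard error

for conf in (0.90, 0.95):
    z = stats.norm.ppf(1 - (1 - conf) / 2)  # about 1.645 for 90%, 1.96 for 95%
    lower, upper = estimate - z * se, estimate + z * se
    print(f"{conf:.0%} CI for log relative risk: ({lower:.3f}, {upper:.3f})")

# With these made-up numbers the 90% interval excludes 0 (no added risk),
# while the wider 95% interval does not, the same pattern the EPA feared.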
S5.4 Fair Reporting of Results

Research results are usually reported in articles published in professional journals. Most researchers are careful to report the details of their research, but there are more subtle issues that researchers and journalists sometimes ignore. There are also more blatant reporting biases that can mislead readers, usually in the direction of making stronger conclusions than are appropriate. We discussed some of the problems with interpreting statistical inference results in earlier chapters, and some of those problems are related to how results are reported. Let's revisit some of them and discuss some other possible ethical issues that can arise when reporting the results of research.

Sample Size and Statistical Significance

Remember that whether a study achieves statistical significance depends not only on the magnitude of whatever effect or relationship may actually exist but also on the size of the study. In particular, if a study fails to find a statistically significant result, it is important to include a discussion of the sample size that was used and the power to detect an effect that would result from that sample size. Often, research is reported as having found no effect or no difference, when in fact the study had such low power that even if an effect existed, the study would have been unlikely to detect it. One of the ethical responsibilities of a researcher is to collect enough data to have a relatively high probability of finding an effect if it really does exist. In other words, it is the researcher's responsibility to determine an appropriate sample size in advance, as well as to discuss the issue of power when the results are presented. A study that is too small to have adequate power is a waste of everyone's time and money. As we discussed earlier in the book, the other side of the problem is the recognition that statistical significance does not necessarily imply practical importance. If possible, report the magnitude of an effect or difference through the use of a confidence interval, rather than just reporting a p-value.

Multiple Hypothesis Tests and Selective Reporting

In most research studies, multiple outcomes are measured. Many hypothesis tests may be done, looking for whatever statistically significant relationships may exist. For example, a study might ask people to record their dietary intake over a long time period so that the investigators can look for foods that are correlated with certain health outcomes. The problem is that even if there are no legitimate relationships in the population, something in the sample may be statistically significant just by chance. It is unethical to report conclusions only about the results that were statistically significant without informing the reader about all of the tests that were done. The Ethical Guidelines of the American Statistical Association (1999) list the problem under "Professionalism" as follows:

Recognize that any frequentist statistical test has a random chance of indicating significance when it is not really present. Running multiple tests on the same data set at the same stage of an analysis increases the chance of obtaining at least one invalid result. Selecting the one "significant" result from a multiplicity of parallel tests poses a grave risk of an incorrect conclusion. Failure to disclose the full extent of tests and their results in such a case would be highly misleading.
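To see how quickly this problem grows, the short sketch below (Python again, purely for illustration) computes the chance of at least one false positive when k independent tests are each carried out at the 0.05 significance level, even though every null hypothesis is actually true:

alpha = 0.05
for k in (1, 5, 10, 20, 50):
    # Probability that at least one of k independent tests is "significant"
    # by chance alone when no real effects exist.
    p_at_least_one = 1 - (1 - alpha) ** k
    print(f"{k:3d} tests: P(at least one false positive) = {p_at_least_one:.2f}")

With 20 independent tests the chance is already about 0.64, and with 50 it is roughly 0.92, which is why failing to disclose how many tests were run can be so misleading.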
In many cases, it is not the researcher who makes this mistake; it is the media, by doing selective reporting. The media are naturally interested in surprising or important conclusions, not in results showing that there is nothing going on. For instance, the story of interest will be that a particular food is related to higher cancer incidence, not that 30 other foods did not show that relationship. It is unethical for the media to publicize such results without explaining the possibility that multiple testing may be responsible for uncovering a spurious chance relationship. Even if there is fairly strong evidence that observed statistically significant relationships represent real relationships in the population, the media should mention other, less interesting results because they may be important for people making lifestyle decisions. For example, if the relationships between certain foods and a disease are explored, it is interesting to know which foods do not appear to be related to the disease as well as which foods appear to be related.

Example S5.5 Helpful and Harmless Outcomes from Hormone Replacement Therapy

In July 2002, the results were released from a large clinical trial studying the effects of estrogen plus progestin hormone replacement for postmenopausal women. Some of these results were covered in earlier chapters. The trial was stopped early because of increased risk of breast cancer and coronary heart disease among the women taking the hormones. However, many news stories failed to report some of the other results of the study, which showed that the hormones actually decreased the risk of other adverse outcomes and were unresolved about others. The original article (Writing Group for the Women's Health Initiative Investigators, 2002) reported the results as follows:

Absolute excess risks per 10,000 person-years attributable to estrogen plus progestin were 7 more CHD [coronary heart disease] events, 8 more strokes, 8 more PEs [pulmonary embolism], 8 more invasive breast cancers, while absolute risk reductions per 10,000 person-years were 6 fewer colorectal cancers and 5 fewer hip fractures.

These results show that in fact some outcomes were more favorable for those taking the hormones, specifically colorectal cancer and hip fractures. Because different people are at varying risk for certain diseases, it is important to report all of these outcomes so that an individual can make an informed choice about whether to take the hormones. In fact, overall, 231 out of 8506 women taking the hormones died of any cause during the study, which is 2.72%. Of the 8102 women taking the placebo, 218, or 2.69%, died, a result virtually identical to that in the hormone group. In fact, when the results are adjusted for the time spent in the study, the death rate was slightly lower in the hormone group, with an annualized rate of 0.52% compared to 0.53% in the placebo group. The purpose of this example is not to negate the serious and unexpected outcomes related to heart disease, which hormones were thought to protect against, or the serious breast cancer outcome. Instead, the purpose is to show that the results of a large and complex study such as this one are bound to be mixed and should be presented as such so that readers can make informed decisions. ■

Making Stronger or Weaker Conclusions Than Are Justified

As we have learned throughout this book, for many reasons, research studies cannot be expected to measure all possible influences on a particular outcome.
They are also likely to have problems with ecological validity, in which the fact that someone is participating in a study is enough to change their behavior or produce a result that would not naturally occur. It is important that research results be presented with these issues in mind and that the case not be overstated even when an effect or relationship is found. An obvious example, often discussed in this book, is that a cause-and-effect conclusion cannot generally be made on the basis of an observational study. However, there are more subtle ways in which conclusions stronger or weaker than is justified may be made. For example, often little attention is paid to how representative the sample is of a larger population, and results are presented as though they would apply to all men, or all women, or all adults. It is important to consider and report an accurate assessment of whom the participants in the study are really likely to represent. Sometimes there are financial or political pressures that can lead to stronger (or weaker) conclusions than are justified. Results of certain studies may be suppressed while others are published, if some studies support the desired outcome and others don’t. As we have cautioned throughout this book, when you read the results of a study that has personal importance to you, try to gain access to as much information as possible about what was done and who was included and to examine all of the analyses and results. case study S5.1 Science Fair Project or Fair Science Project? In 1998, a fourth-grade girl’s science project received extensive media coverage after its publication in the Journal of the American Medical Association (Rosa et al., 1998). Later that year, in its December 9, 1998, issue, the journal published a series of letters criticizing the study and its conclusions on a wide variety of issues. There are a number of ethical issues related to this study, some of which were raised by the letters and others that have not been raised before. The study was supposed to be examining “therapeutic touch” (TT), a procedure practiced by many nurses that involves working with patients through a five-step process, including a sensing and balancing of their “energy.” The experiment proceeded as follows: Twenty-one self-described therapeutic touch practitioners participated. They were asked to sit behind a cardboard screen and place their hands through cutout holes, resting them on a table on the other side of the screen. The 9-year-old “experimenter” then flipped a coin and used the outcome to decide which of the practitioner’s hands to hold her hand over. The practitioner was to guess which hand the girl was hovering over. Fourteen of the practitioners contributed 10 tries each, and the remaining 7 contributed 20 tries each, for a total of 280 tries. The paper had four authors, including the child and her mother. It is clear from the affiliations of the authors of the paper, as well as from language throughout the paper, that the authors were biased against therapeutic touch before the experiment began. For example, the first line read, “Therapeutic touch (TT) is a widely used nursing practice rooted in mysticism but alleged to have a scientific basis” (Rosa et al., 1998, p. 1005). 
The first author of the paper was the child's mother, and her affiliation is listed as "the Questionable Nurse Practices Task Force, National Council Against Health Fraud, Inc." The paper concludes that "Twenty-one experienced TT practitioners were unable to detect the investigator's 'energy field.' Their failure to substantiate TT's most fundamental claim is unrefuted evidence that the claims of TT are groundless and that further professional use is unjustified." The conclusion was widely reported in the media, presumably at least in part because of the novelty of a child having done the research. That would have been cute if it hadn't been taken so seriously by people on both sides of the debate on the validity of therapeutic touch.

The letters responding to the study point out many problematic issues with how the study was done and with its conclusions. Here are several quotes:

The experiments described are an artificial demonstration that some number of self-described mystics were unable to "sense the field" of the primary investigator's 9-year-old daughter. This hardly demonstrates or debunks the efficacy of TT. The vaguely described recruitment method does not ensure or even suggest that the subjects being tested were actually skilled practitioners. More important, the experiments described are not relevant to the clinical issue supposedly being researched. Therapeutic touch is not a parlor trick and should not be investigated as such. (Freinkel, 1998)

To describe this child's homework as "research" is without foundation since it clearly fails to meet the criteria of randomization, control, and valid intervention. . . . Flagrant violations against TT include the fact that "sensing" an energy field is not TT but rather a nonessential element in the 5-step process; inclusion of many misrepresentations of cited sources; use of inflammatory language that indicates significant author bias; and bias introduced by the child conducting the project being involved in the actual trials. (Carpenter et al., 1998)

I critiqued the study on TT and was amazed that a research study with so many flaws could be published. . . . The procedure was conducted in different settings with no control of environmental conditions. Even though the trials were repeated, the subjects did not change, thus claims of power based on possible repetitions of error are inappropriate. The true numbers in groups are 15 and 13, thus making a type II error highly probable with a study power of less than 30%. Another concern is whether participants signed informed consent documents or at least were truly informed as to the nature of this study and that publication of its results would be sought beyond a report to the fourth-grade teacher. (Schmidt, 1998)

As can be seen by these reader comments, many of the ethical issues covered in this chapter may cloud the results of this study. However, there are two additional points that were not raised in any of the published letters. First, it is very likely that the child knew that her mother wanted the results to show that the participants would not be able to detect which hand was being hovered over. And what child does not want to please her mother? That would not matter so much if there hadn't been so much room for "experimenter effects" to influence the results. One example is the randomization procedure. The young girl flipped a coin each time to determine which hand to hover over.
Coin tosses are very easy to influence, and presumably even a nine-year-old child could pick up on the response biases of the subjects and, consciously or not, make use of them. The fact that a proper randomization method wasn’t used should have ended any chance that this experiment would be taken seriously.

Is there evidence that experimenter bias may have entered the experiment? Absolutely. Of the 280 tries, the correct hand was identified in 123 (44%) of them. The authors of the article conclude that this number is “close to what would be expected for random chance.” In fact, that is not the case. Using the binomial distribution, the chance of identifying the correct hand in 123 or fewer of the 280 tries, if the practitioners were simply guessing, is only .0242. Even if a two-tailed test of “pure guessing” had been used instead of that one-tailed probability, the p-value would have been about .048, a statistically significant outcome; the practitioners did significantly worse than chance would predict. The 9-year-old did an excellent job of fulfilling her mother’s expectations. (A short computational check of these values follows the case study.)
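For readers who want to verify the binomial probabilities quoted in Case Study S5.1, here is a minimal computational sketch. The counts (280 tries, 123 correct) come from the case study itself; the use of Python and the SciPy library is simply an illustrative choice and is not part of the original article or its analysis.

```python
# Minimal sketch: checking the binomial probabilities quoted in Case Study S5.1.
# Assumes Python with SciPy installed; scipy.stats.binom gives the exact binomial CDF.
from scipy.stats import binom

n = 280   # total tries reported in the case study
k = 123   # tries on which the correct hand was identified
p = 0.5   # probability of a correct guess if practitioners were guessing at random

# One-tailed probability of doing this badly, or worse, by chance alone
lower_tail = binom.cdf(k, n, p)                 # about 0.024

# Two-tailed p-value for the null hypothesis of pure guessing,
# using the common convention of doubling the smaller tail
upper_tail = 1 - binom.cdf(k - 1, n, p)
two_tailed = 2 * min(lower_tail, upper_tail)    # about 0.048

print(f"P(X <= {k}) = {lower_tail:.4f}")
print(f"two-tailed p = {two_tailed:.4f}")
```

A normal approximation gives essentially the same answer: under pure guessing the expected number of correct identifications is 140 with a standard deviation of about 8.4, so 123 correct is roughly two standard deviations below what chance alone would produce.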
Key Terms

Section S5.1: informed consent, S5-4
Section S5.2: experimenter effects, S5-12
Section S5.3: data-snooping, S5-15
Section S5.4: selective reporting, S5-17

Exercises

● Denotes basic skills exercises
◆ Denotes dataset is available in StatisticsNow at http://1pass.thomson.com or on your CD but is not required to solve the exercise.
Bold-numbered exercises have answers in the back of the text and fully worked solutions in the Student Solutions Manual.

Go to the StatisticsNow website at http://1pass.thomson.com to:
• Assess your understanding of this chapter
• Check your readiness for an exam by taking the Pre-Test quiz and exploring the resources in the Personalized Learning Plan

S5.1 Visit the websites of one or more professional organizations related to your major. (You may need to ask one of your professors for help with this.) Find one that has a code of ethics. Describe whether the code includes anything about research methods. If not, explain why you think nothing is included. If so, briefly summarize what is included. Include the website address with your answer.

S5.2 A classic psychology experiment was conducted by psychology professor Philip Zimbardo in the summer of 1971 at Stanford University. The experiment is described at the website http://www.prisonexp.org/. Visit the website; if it is no longer operative, try an Internet search on “Zimbardo” and “prison” to find information about this study.
a. Briefly describe the study and its findings.
b. The website has a number of discussion questions related to the study. Here is one of them: “Was it ethical to do this study? Was it right to trade the suffering experienced by participants for the knowledge gained by the research?” Discuss these questions.
c. Another question asked at the website is “How do the ethical dilemmas in this research compare with the ethical issues raised by Stanley Milgram’s obedience experiments? Would it be better if these studies had never been done?” Discuss these questions.

S5.3 In the report Principles and Practices for a Federal Statistical Agency (Martin, Straf, and Citro, 2001), one of the recommended practices is “a strong position of independence.” Each of the following parts gives one of the characteristics that is recommended to help accomplish this. In each case, explain how the recommendation would help ensure a position of independence for the agency.
a. “Authority for selection and promotion of professional, technical, and operational staff.” (p. 6)
b. “Authority for statistical agency heads and qualified staff to speak about the agency’s statistics before Congress, with congressional staff, and before public bodies.” (p. 6)

S5.4 Refer to Example S5.2 about Janet’s dissertation research. Explain whether each of the following changes would have been a good idea or a bad idea.
a. Have someone who is blind to conditions and has not heard the volunteers’ voices before listen to the phone calls to assess whether a firm “no” was given.
b. Have Janet flip a coin each time a volunteer comes to her office to decide whether that person should go into the role-playing group or the control group.
c. Use a variety of different solicitors rather than Brad alone.

S5.5 Refer to Example S5.3 about Jake’s memory experiment. What do you think Jake should have done regarding the analysis and reporting of it?

S5.6 Find an example of a statistical study reported in the news. Explain whether you think multiple tests were done and, if so, whether they were reported.

S5.7 Marilyn is a statistician who works for a company that manufactures components for sound systems. Two development teams have each come up with a new method for producing one of the components, and management is to decide which one to adopt based on which produces components that last longer. Marilyn is given the data for both and is asked to make a recommendation about which one should be used. She computes 95% confidence intervals for the mean lifetime using each method and finds an interval from 96 hours to 104 hours, centered on 100 hours, for one of them, and an interval from 92 hours to 112 hours, centered on 102 hours, for the second one. What should she do? Should she make a clear recommendation? Explain.

Use the following scenario for Exercises S5.8 to S5.12: On the basis of the information in Case Study S5.1, describe the extent to which each of the following conditions for reducing the experimenter effect (listed in Section S5.2) was met. If you cannot tell from the description of the experiment, then explain what additional information you would need.

S5.8 Randomization done by a third party with no vested interest in the experiment, or at least done by a well-tested computer randomization device.

S5.9 Automated data recording without intervention from the researcher.

S5.10 Double-blind procedures to ensure that no one who has contact with the participants knows which treatment or condition they are receiving.

S5.11 An honest evaluation that what is being measured is appropriate for the research question of interest.

S5.12 A standard protocol for the treatment of all participants that must be strictly followed.

Exercises S5.13 to S5.22: Explain the main ethical issue of concern in each of the following. Discuss what, if anything, should have been done differently to address that concern.

S5.13 Example S5.1, Stanley Milgram’s experiment.

S5.14 In Example S5.2, Janet’s decision to make Group 2 the control group after the first volunteer came to her office.

S5.15 In Example S5.2, Janet’s handling of the fact that her data showed 26 volunteers in the control group when there should have been only 25.

S5.16 In Example S5.3, Jake’s decision to replace the outlier with the median of 70.

S5.17 In Example S5.4, the Environmental Protection Agency’s decision to report a 90% confidence interval.
S5.18 In Example S5.5, the fact that many media stories mentioned increased risk of breast cancer and coronary heart disease but not any of the other results.

S5.19 In Case Study S5.1, the concerns raised in the letter from Freinkel.

S5.20 In Case Study S5.1, the concerns raised in the letter from Carpenter et al.

S5.21 In Case Study S5.1, the concerns raised in the letter from Schmidt.

S5.22 In Case Study S5.1, the concerns raised about the experimenter effect.

Preparing for an exam? Assess your progress by taking the post-test at http://1pass.thomson.com.

Do you need a live tutor for homework problems? Access vMentor at http://1pass.thomson.com for one-on-one tutoring from a statistics expert.