CHERNOBYL NUCLEAR DISASTER

AN ACCIDENT INVESTIGATION REPORT

SUBMITTED FOR IOE491, HUMAN ERROR AND COMPLEX SYSTEM FAILURES
TO PROFESSOR NADINE B. SARTER

Human Factors Investigators Team: Nathan Gelino, Marta Rey-Babarro, Mark A Siegler, Deepti Sood, Craig Verlinden

April 14, 2005

TABLE OF CONTENTS

1.0 Synopsis
2.0 Economics Drives Nuclear Power
3.0 Utilizing the Atom
4.0 RBMK-1000 Design
5.0 Accident
  5.1 A Dangerous Test
  5.2 Chronology
  5.3 After Effects
6.0 CHERNOBYL: A System Failure
  6.1 RBMK Design Flaws
  6.2 Cultural Flaws, Economics over Safety
  6.5 Summary of Analysis
7.0 HROs and the Cognitive Processes of Mindfulness
8.0 Recommendations for Change
  8.1 Government Recommendations
  8.2 The “Team’s” Recommendations
9.0 Conclusions
Glossary
Works Cited
Appendix A Description of the Nuclear Processes
Appendix B RBMK Reactor Design
Appendix C Cognitive Analysis of Phenotypic and Genotypic Problems

Gelino, Rey, Siegler, Sood, Verlinden

1.0 SYNOPSIS

On April 26, 1986, the most serious accident in the history of nuclear power unfolded in Unit 4 of the Chernobyl Nuclear Power Plant, located in the former Ukrainian Republic of the Union of Soviet Socialist Republics, near the present borders of Belarus, the Russian Federation, and Ukraine. The reactor was destroyed, and the release of large quantities of radioactive material continued for the next ten days.
The first Russian media report of the accident came two days after it occurred, and was the fourth item in Moscow radio's evening news bulletin. On the day of the accident, operators were conducting a test designed by electrical engineers to evaluate whether the power plant could sustain itself, in the event of an off-site power failure, during the time needed to start the secondary power supply. The accident that destroyed reactor Unit 4 killed 31 people almost immediately. Soviet scientists estimated that about 4% of the 190 tons of uranium dioxide products escaped and began to spread unevenly across the surrounding environment. Immediately after the accident, the main health concern was the high level of radiation released. Emergency measures were taken to bring the release of radioactive material under control, to deal with the debris from the reactor, and subsequently to construct a confinement structure. Soviet experts presented their analysis of the accident to the International Atomic Energy Agency and the International Nuclear Safety Advisory Group at the post-accident review meeting in Vienna in August 1986. Since then, the accident has been analyzed by various organizations and individuals from a variety of perspectives. This report is a synthesis of the information available to us from all these sources and reflects the authors’ perspective on the events that led to this accident.

2.0 ECONOMICS DRIVES NUCLEAR POWER

The use of nuclear power began in 1954 as the United States and the USSR started racing to explore the possibilities this new form of energy could create. By the time of the accident at Chernobyl, the Soviet Union had become reliant on nuclear power as a relatively cheap and abundant form of energy.
According to the Soviet news agency, the Telegraph Agency of the Soviet Union (TASS), “Scientists estimate that the USSR is not threatened with a shortage of raw materials but the basic supply of oil, gas, and coal are concentrated in remote and poorly developed regions of the east and north, where there is a harsh climate, permafrost, and no roads.” The problem with the location of these resources is that the infrastructure is expensive to maintain, which makes increasing production unprofitable. Marples (1986) noted that over the preceding decade, the expenditure on the recovery of each metric ton had increased threefold. Not only had coal and other natural resources become cost prohibitive, but the Soviet government also wanted to reserve its oil for export in exchange for hard currency. Instead of pumping oil and shipping it to power plants in Eastern Europe, the USSR felt it would be more economically beneficial to produce its energy via nuclear power and export the electricity to Eastern Europe.

The major problem in the Ukrainian energy sector was the stagnation of the Donetsk coalfield, traditionally the principal source of Ukrainian energy supplies. Stagnation of this resource led to the widespread development of nuclear energy in Ukraine. The nuclear facilities at Chernobyl were believed to be the best and most reliable of any in the Soviet Union. Reactor number 4 had been renowned for providing power flawlessly since its commissioning in 1983 (May, 1989). The Soviets had developed a new type of reactor that was cheaper and allowed for the use of uranium with little or no enrichment in the core of the reactor. This new reactor type, the RBMK-1000, was used in all four of the Chernobyl Nuclear Power Plant’s reactors. It allowed them to produce power at much lower cost than with more common types of nuclear reactor. The effects of these decisions played a crucial role in the accident.
A fundamental knowledge of the physics of nuclear power is important for understanding the events that caused the accident.

3.0 UTILIZING THE ATOM

Some of the concepts required for understanding the workings of nuclear power are described in Appendix A, and specific nuclear physics terms are defined in the glossary. A nuclear power plant generates electricity by utilizing the energy released by fission. Fission occurs when a neutron collides with the nucleus of an atom and the nucleus is broken up into fission products. Nuclear power is made possible by the ability of uranium-235 (U-235) to sustain a chain reaction of fissions. When a U-235 nucleus is fissioned, two new atoms and, on average, 2.5 neutrons are released. The neutrons released by one fission can in turn induce further fissions, up to 2.5 on average. This characteristic generates an exponential increase in energy. The exponential increase must be controlled because, if the chain reaction were allowed to progress without obstruction, energy production would reach levels that would destroy the reactor in seconds. The chain reaction is managed through the use of control rods in the core of the reactor. Control rods help to maintain a constant energy level by absorbing a fraction of the fission-inducing neutrons. Neutron-absorbing material can also build up in the reactor inadvertently, because fission products sometimes include atoms with neutron-absorbing capability. When such atoms accumulate in the reactor core, the power level of the reactor becomes unstable at low power. Operating the reactor under these conditions creates dangerous unpredictability in the speed at which power levels can change.
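The exponential growth described above can be sketched numerically. The following Python snippet is a minimal, idealized illustration, not a reactor physics model. It assumes each generation multiplies the neutron count by a constant factor k, and it ignores leakage, delayed neutrons, and absorption by anything other than the control rods. The function names are hypothetical. With k = 2.5 (every released neutron causes a new fission, the upper bound implied by the figure quoted above) the population grows explosively; absorbing 60% of the neutrons brings k back to 1 and the reaction holds steady.

```python
# Idealized chain-reaction sketch (assumption: a constant multiplication
# factor k per generation; real reactors are far more complicated).

def neutron_population(n0, k, generations):
    """Return the neutron count for each generation, starting from n0."""
    counts = [float(n0)]
    for _ in range(generations):
        counts.append(counts[-1] * k)
    return counts


def steady_state_absorption(neutrons_per_fission):
    """Fraction of neutrons that must be absorbed so that exactly one
    neutron per fission goes on to cause another fission (k = 1)."""
    return 1.0 - 1.0 / neutrons_per_fission


NEUTRONS_PER_FISSION = 2.5  # average figure quoted in the text

# Uncontrolled case: every neutron induces a new fission.
unchecked = neutron_population(1, NEUTRONS_PER_FISSION, 10)

# Controlled case: control rods absorb just enough neutrons for k = 1.
absorbed = steady_state_absorption(NEUTRONS_PER_FISSION)
controlled = neutron_population(1, NEUTRONS_PER_FISSION * (1.0 - absorbed), 10)

print(f"fraction absorbed for steady state: {absorbed:.2f}")
print(f"uncontrolled population after 10 generations: {unchecked[-1]:.0f}")
print(f"controlled population after 10 generations: {controlled[-1]:.2f}")
```

In this simplified picture, the margin between a steady reaction and a runaway one is set entirely by how many neutrons are absorbed, which is why unplanned changes in neutron absorption matter so much in the events described later.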
Understanding the energy encapsulated in the atom and how it can be released through fission is the first step towards designing a nuclear power plant to produce electricity. Of the reactor designs available, the Chernobyl nuclear power plant used the Reaktor Bolshoi Moshchnosty Kanalny (RBMK) reactor.

4.0 RBMK-1000 DESIGN

The reactor used at Chernobyl was an RBMK-1000 style reactor (Appendix B). It was U-235 fueled, graphite moderated, and water cooled. This design used graphite as a moderator because graphite’s excellent moderating capability allowed the use of uranium with little or no enrichment as fuel instead of more highly enriched uranium. This reduced the enrichment costs incurred by reactors with less moderating ability. However, the tradeoff of using graphite instead of light water or heavy water is that graphite cannot serve as a coolant as well as a moderator (Bodanski, 2004). The inability of graphite to act as a coolant forces the water in the reactor into common mode functioning, acting as both a coolant and a reaction poison at the same time. Perrow (1984) points out that common mode functionality adds complexity to systems, making them more prone to failure.

The common mode functioning of water increases the danger of operating a nuclear power plant. When water functions as both a coolant and a poison, any loss of water, through conversion to steam or otherwise, can result in increased reactivity (Bodanski, 2004). As the reactor heats up, more steam is generated, and as more steam is generated, there is less liquid water to poison the reaction. With less poison in the reactor, the reactor gets hotter. As the reactor gets hotter, the amount of poison decreases further, thereby increasing reactivity. This creates the conditions for a cyclical, self-perpetuating process. This is the nature of a reactor with a positive void coefficient: when the reactor begins to heat up, it tends to continue increasing power.
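The feedback loop described above can be illustrated with a toy simulation. This is a sketch under loudly stated assumptions, not a model of the RBMK: power is expressed relative to a nominal value of 1.0, excess steam is assumed to be proportional to excess power, and the coefficient values (+0.5 and -0.5) are made up purely for illustration. The function name `simulate_power` is hypothetical.

```python
# Toy feedback loop (assumptions: linear steam response, invented
# coefficients, discrete time steps; for illustration only).

def simulate_power(void_coefficient, steps=10, initial_power=1.01):
    """Track relative power over time after a small upward disturbance.

    A positive void_coefficient means that extra steam (less liquid water
    acting as a poison) adds reactivity; a negative one means it removes it.
    """
    power = initial_power  # relative to nominal power of 1.0
    history = [power]
    for _ in range(steps):
        excess_steam = power - 1.0                    # more power -> more steam
        reactivity = void_coefficient * excess_steam  # feedback from the voids
        power *= 1.0 + reactivity                     # power responds
        history.append(power)
    return history


positive = simulate_power(+0.5)  # positive void coefficient
negative = simulate_power(-0.5)  # negative void coefficient

print("positive coefficient:", [round(p, 3) for p in positive])
print("negative coefficient:", [round(p, 5) for p in negative])
```

With the positive coefficient, the same 1% disturbance grows at every step, while with the negative coefficient it dies away and power settles back toward its nominal value. That difference in sign is the property the text refers to.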
All reactors outside of the Soviet Union were designed with an inherent tendency to decrease reaction rates, rather than increase power, when a loss of water occurs. The Soviets were aware of the potential dangers of operating a reactor with a positive void coefficient. To compensate, additional administrative and engineering safety controls were incorporated into the system at Chernobyl. The administrative controls were designed to restrict reactor operation under conditions that could encourage power level instability. Since the reactor becomes unstable at low power settings, a minimum power level was established. If the reactor power level were to dip below this minimum, reactor shutdown was required. The engineering controls consisted of a system of automatic trips that would set off the emergency core cooling system if any parameter left its safe operating range. The unsafe properties of the reactor design played a crucial role in the accident, and these safety measures could not prevent the disaster.

5.0 ACCIDENT

5.1 A Dangerous Test

There were sound reasons for conducting the test planned for April 25, 1986. The Research Design Institute for Power Engineering in Moscow had identified a safety issue in the event of outside power loss. In case of an off-site power failure, it would take two to three minutes to start the Emergency Core Cooling System (ECCS), which depended on emergency diesel generators for the energy it needed to function. The objective of the test was to see whether the reactor’s turbine would have enough residual energy to supply electricity to the plant equipment and maintain the coolant flow through the reactor during the two to three minutes in which the ECCS was coming on line (Reason, 1987). The test was designed as an electrical test only; the physical and thermal characteristics of the reactor were not taken into account (Vargo, 2000).
It was not the first time that this test had been conducted at Chernobyl. In 1982 and 1984, the test was attempted during shutdown operations of the Chernobyl-4 reactor. On both occasions, the test was aborted because it caused a rapid voltage fall-off. The 1986 test differed from the previous ones: instead of conducting the test with the reactor off, the engineers in charge were going to run several tests with the reactor operating at reduced power.

Before continuing, some psychological factors need to be explored because they had an important impact on the accident itself. The day of the test, Friday, April 25, 1986, was the eve of a national holiday in the USSR, and the following Tuesday, Unit 4 was due for its annual maintenance shutdown. If the tests were not carried out in that short time period, the plant would have to wait another year to schedule them again (Stanton, 1996). Time pressure played a significant role in the events leading up to the accident.

5.2 Chronology

The test was planned to take place early in the day. At 1:00 pm that Friday, the operators started reducing power toward the goal of 25% (700 MWt). As part of the test procedure, they disengaged the emergency cooling system. When the reactor was at approximately 50% power, the Kiev power controller required Unit 4 to continue supplying energy to the grid because of an unexpected increase in demand. The test was stopped and the reactor was brought back to full power until 11:10 pm that night, when it was no longer necessary to provide extra energy to the grid. In his analysis of the accident, Reason (1987) points out the “lax attitude” of the operators: they forgot to reconnect the ECCS when Unit 4 was reconnected to the grid. Although this omission did not affect the accident, it was a violation of the safety procedures of the power plant.
Late that Friday night, at 12:00 am, a new shift of operators came to the power plant. Although the operators were not informed that they were going to carry out the experiment until they came to work, they were extensively trained to operate the reactor. Leoneed Tatanov was in charge of controlling the reactor, Boris Yelulchuk of the water pumps, and Yuri Karnaea of the turbines. The National Geographic video reconstruction of the accident places the three men in different locations during the test.

The following chronology presents the contributing steps that led to the accident. It is based on the U.S. Department of Energy Official Report (1986) of the accident, Reason’s analysis of the accident (1987), Vargo’s (2000) risk assessment of the accident, and our own analysis. Reason identifies five of the eight symptoms of Janis’s (1972) groupthink syndrome as a plausible explanation of the events that we will analyze shortly. “Their actions were certainly consistent with an illusion of invulnerability. It is likely that they rationalized away any worries (or warnings) they might have had about the hazards of their behavior … If any one operator experienced doubts, they were probably self-censored before they were voiced.” With this attitude, the test recommenced.

12:28 am: To prepare the reactor for the set of tests to be conducted early that morning, Leoneed switched off the “auto-pilot device” and manually decreased the power level in the reactor. As a result, the reactor power rapidly fell to a dangerous 1% (30 MWt). The unpredictability of the system places the Chernobyl accident in the framework of Normal Accident Theory (Perrow, 1999), in which the complexity of the subsystems and the tight coupling of the elements set the system up for inevitable failures, as we will soon describe.
“Despite strict safety procedures prohibiting any operation below twenty percent of maximum power, the combined team of operators and electrical engineers continued with the test program” (Wickens and Hollands, 2000).

1:00 am: To compensate for the negative reactivity, several violations and errors were made in the subsequent steps that led to the accident. After more than half an hour of trying to regain control of the reactor, Leoneed stabilized the power level at 7%. This was far below the desired 25% power level, which was never reached again. Of all the violations and mistakes that were about to follow, this was the most serious, since the experiment should have been abandoned (Reason, 1987).

1:03 am: Boris started two additional cooling pumps as the test program directed. He was not aware of the low power level of the reactor or of how these two additional pumps would affect the overall system. The safety regulations of the reactor limited the total number of pumps in use to six at any time; the two additional pumps meant that eight were operating at once. In the design of the experiment, the additional pumps were thought to provide “safer” cooling. This decision is best described as an effect of bounded rationality (Simon, 1975) on the part of the designers of the experiment, since they were unaware of how this step would affect the system as a whole. The consequence of this action was that the increased water flow, combined with the reduced steam, absorbed more neutrons and thus reduced power. The automatic control rods were withdrawn from the reactor to maintain the reaction even at this low level of power. The reduced steam produced an imbalance in the steam separator, and in this state cavitation was possible.

1:19 am: Boris was unaware of how his previous actions had affected the system and kept increasing the water flow into the system.
The decreasing steam supply left less pressure to drive the turbines. At this point, the reactor would have shut itself down because of the low energy level, which had almost stopped the turbines. Leoneed, acting in what we can consider knowledge-based behavior (Rasmussen, 1983), blocked the emergency protection system to maintain the reaction, which prevented the reactor from shutting down automatically. He “helped” the reaction by manually withdrawing rods from the reactor to generate more power. Both actions violated the reactor’s safety procedures; they created more steam to drive the turbines and accelerated the nuclear reaction. This brought the core to a more energetic state.

1:22 am: The shift supervisor requested a printout of the number of control rods in the core. Although the document reported six to eight rods in the reactor, he continued with the test. Operating with six to eight rods violated the safety procedures, which forbade operation of the reactor with fewer than twelve control rods. Boris then decreased the water flow into the reactor. This action, combined with the small number of control rods in the reactor, made the temperature rise in a matter of seconds.

1:23:04 am: Yuri, unaware of his colleagues’ problems, was signaled to proceed with the preparations for the test. He closed the steam flow valve to the number eight turbine generator to begin the test. The objective of closing this valve was to establish the conditions necessary to repeat the test several times. This action necessitated the disconnection of the automatic safety trips, once more a violation of the safety procedures, and it changed the power distribution in the core. The power in the upper region of the reactor core dropped due to the insertion of the control rods. Simultaneously, the power level in the lower part of the core began to rise as water was displaced by the graphite spacers on the ends of the control rods.
Dual power levels in the reactor core made reactivity unpredictable.

1:24 am (estimated): Due to the tight coupling and the complexity of the system, the reactor temperatures began to soar. The steam formation and the sharp temperature increase created the conditions for steam-zirconium and other exothermic reactions, raising the overall temperature to one hundred times normal levels. Leoneed and Boris realized they had a serious problem. Since they were not able to perform any other tasks manually and were alarmed by the steep rise in temperature over such a short time, they tried to compensate by pressing the emergency shutdown button. Dysfunctional and complex interactions (Leveson, 2002) inherent to the system, combined with the positive void coefficient and the sharp increase in pressure, ruptured the channels and led to the three thermal explosions that blew away the reactor’s roof.

5.3 After Effects

After the explosion of Chernobyl’s reactor number four, the Soviet government made no attempt to publicize the accident. The international community was not informed of the seriousness of the accident until ten days later, and even then it was untruthfully told that the disaster had already been contained. There was no evacuation of the surrounding areas until the eleventh day, possibly because no formal plans existed to deal with such a catastrophe. In the eleven days that passed, all residents within 30 kilometers were exposed to levels of radiation capable of causing physical harm. The Soviet government began to deal with the catastrophe on the eleventh day. Most countries with nuclear power establishments have specialized robots designed to do cleanup work in an extremely radioactive environment. The purpose of these robots is to minimize human casualties from the cleanup efforts that follow a nuclear disaster. The Soviets had no such robots.
Instead, the Soviet government called on its military to clean up the radioactive debris and contain the nuclear fallout. These men, called liquidators, took turns shoveling radioactive debris back into the core for a couple of minutes at a time. After the debris had all been deposited back in the core, other liquidators were required to fly over in helicopters to dump material that would reduce the spread of radioactive contaminants. Many different materials were layered upon the core to put out the graphite fire. Some materials helped reduce the fire, some added fuel to it, and others never reached the core because the haste with which the material was dumped reduced the accuracy of its placement. The liquidators ordered to fight the Chernobyl fire suffered the most fatalities; 100,000 of the 800,000 soldiers assisting in the effort have perished due to the high levels of radiation received during their service (Swiss, 2005).

Human casualties were not the only losses suffered by the Soviet Union from the accident. Over 18,000 square kilometers of farmland were affected by radiation. Twenty years after the accident, 2,640 of the 18,000 sq km remain unusable (Mould, 2000). The high levels of radiation have caused increased cancer rates among children and adults. The Soviet Union incurred monetary losses as well: the expected cost to the governments of Belarus, Ukraine, and the Soviet Union was as high as 250 billion dollars.

A fraction of the monetary costs incurred by the Soviet Union came from efforts to provide long-term containment of the melted reactor core. Over 117,000 people worked to build the containment structure, officially called a sarcophagus, which now encloses Unit 4. This structure was built to surround the remains of the core, which still contain about half of the radioactive materials.
The sarcophagus was built under extreme time pressure to contain the spread of radiation into both the air and the water table below (May, 1989). Many serious doubts exist about the efficiency and effectiveness of the sarcophagus. Current efforts are focused on determining a safer and more permanent containment structure to replace it.

6.0 CHERNOBYL: A SYSTEM FAILURE

A system accident is a normal accident in that failure is an inherent property of the system in which it occurs (Perrow, 1999; Stang, 1996). It is important to note that failure is not an extrinsic imposition of irrepressible conditions, but an intrinsic property of high-risk industrial installations such as nuclear power plants. In these systems, because of the high levels of complexity and tight coupling within the system, multiple and unexpected interactions of failures become inevitable (Perrow, 1999). From a systems approach, the Chernobyl accident was the result of problems in the socio-technical system and an inevitable disaster resulting from the deepening crisis in Soviet society (Stang, 1996). The main problem with this view is the difficulty of explaining why the system failed at the Chernobyl plant when many other nuclear power plants working under similar social, economic, and technical constraints seemed inviolate. The lack of such an explanation could also have been the reason that accident investigations blamed the operators at Chernobyl as the root cause of this accident. In the Institute of Atomic Energy (IAE) report, the main cause of the accident was defined as the “… freak combination of infringements of rules and working practices on the part of the reactor staff, under which faults in the design of the reactor and its automatic control and safety systems became apparent” (IAE, 1986).
It is important to note that the final clause of this quote, concerning faults in the design of the reactor and its control and safety systems, never reached the International Atomic Energy Agency (IAEA), and the remaining statement places the entire blame on the operators working that day.

This analysis establishes causal links between the human decisions and actions leading to the events that destroyed the nuclear reactor. The operators at the reactor committed errors, but these errors occurred in combination with many other failures of the system’s components, making it a very complex accident scenario. The presence of groupthink syndrome, the absence of common ground among the operators managing the reactor, the water pumps, and the turbines, and bounded rationality are some of the genotypic problems that the operators faced. A systems approach to a complete investigation of the various latent and active system errors is warranted. Such an approach promotes investigation of events not in isolation, but as dysfunctional interactions that amplify effects on the system. Therefore, the various factors, and their unpredictable complex interactions, leading up to the failure of the Chernobyl nuclear power plant were analyzed. The errors could have occurred at any of the three stages of the processes ensuring plant safety: design (RBMK system), planning (cultural and environmental influences), and on-line operations (operators’ actions). The accident investigation and analysis conducted here looked at errors made at all of these stages.

6.1 RBMK Design Flaws

The problems that made Chernobyl’s RBMK-1000 reactor an unsafe system can be attributed to four main design failings. First, the positive void coefficient inherent to this design made the reactor prone to power runaway, that is, automatic increases in power. This tendency to increase power automatically, coupled with the speed at which hazardous power levels can be reached, creates a reactor that does not foster safety.
An industry that cannot tolerate an accident should not allow a design that tends to increase potential hazards when a lapse of control occurs. Many reactor designs exist that do not have this characteristic, so the additional risks intrinsic to the design can be avoided.

Second, a reactor that tends to increase power levels automatically should have a control rod system capable of countering that tendency. Chernobyl’s RBMK control rods took 18 to 20 seconds to insert, which is considered too long because a reactor power excursion can cause damage in less than ten seconds. Professor B. G. Dubovskii, head of the USSR Nuclear Safety Board for 14 years (1958-73), said, “…normal emergency systems, as used in reactors all over the world, come into operation in just a few seconds, five seconds at the most.” The accident at Chernobyl occurred five seconds after the emergency stop button had been activated. The insertion speed of the control rods was not the only flaw associated with their design. “…first effect of the insertion of the control rods from the full-out position was to increase the reactivity” (Bodanski, 2004). A system that was designed to reduce the reaction rate actually increased the reactivity. It is thought that pressing the emergency button might have accelerated the increase in power that culminated in the accident.

The third major design defect is the use of graphite in the reactor core. Not only does graphite create a positive void coefficient in the reactor, it is also a highly combustible material. Once graphite catches fire, it is extremely difficult to extinguish. Graphite burning in the reactor core is a problem because the rising heat acts as a propellant for the release of radioactive materials. The use of graphite in the design of the RBMK made stopping the release of radioactive material nearly impossible because of its highly flammable nature.
The final major deficiency in the design of Chernobyl’s RBMK reactor is the absence of a secondary containment structure. If safety measures fail and an accident does occur, there should be a containment vessel in place to reduce the risk of radioactive material escaping. The Soviets seemed quite convinced that the additional safety features designed into the reactor would prevent such an accident from occurring. Soviet officials did not deny [the safety defects] but insisted that the RBMK design has many extra safety systems and normally operates under strict regulations. This makes it generally safer and more reliable than other reactors (Medvedev, 1990). For this reason, Soviet engineers did not include a secondary containment structure in the RBMK design.

6.2 Cultural Flaws, Economics over Safety

Analysis of various reports indicates an environment pervaded by a neglectful approach to safety. This was evident in the continued use of RBMK reactors when research had provided ample evidence of the inherent risks in their operation. This belief is strengthened by the knowledge that the RBMK design remained in use after the accident, and that even the other units at Chernobyl remained operational until 15 years later. The risks associated with the RBMK system, though known at the time of the accident, were not investigated for their contributions to the accident until much later. Economics could have been a reason for the decision to continue using the design, and even for directing the attention of the accident investigation away from the inherent system flaws. The Soviet Union had six other reactors similar to the one that melted down at Unit 4 of Chernobyl. Investigating, and possibly confirming, the presence of design flaws could have put the Soviet Union under enormous international pressure to address the problems, and perhaps even to shut the reactors down until the problems were addressed (Chernousenko, 1991).
This would have led to an electricity and economic crisis throughout the Soviet Union.

Another example of economics taking precedence over safety was the flouting of design regulations in the RBMK construction at Unit 4. The initial plans for the RBMK control rods included an absorbing section and a 7-meter full-length displacer. The control rod length was shortened to 6 meters in the secondary drawings, but a thin film cooling channel was to be implemented to eliminate the positive reactivity surges when the control rods were reinserted (Chernousenko, 1991). The RBMK working designs at the time of the accident show that the shorter control rods were used but the thin film cooling channel was excluded. Possible reasons include complications in constructing the thin film cooling channels and the associated high costs. The absence of the thin film cooling channel, coupled with the shorter control rods, led to an increase in insertion times and further contributed to the positive void coefficient.

It is pertinent to ask why no control measures rectified these design problems at the design, manufacturing, or operational stages. One answer is that no external organization performed a formal design analysis at the design stage, owing to the monopoly that existed in the nuclear science area (Chernousenko, 1991). The designers did carry out analysis, but their efforts were hindered by poor experimental facilities and the backwardness of the technology available to them. There were also certification processes under the Technical Basis of Safety of Reactor Installation (TBSRI) and the TBS of Nuclear Power Stations (TBSNPS). However, 16 reactors brought into operation were never certified, and the six first generation RBMK reactors, including the one at Unit 4 of Chernobyl, could not have been certified with their existing design flaws. These latent errors in the system played a central role in the accident.
This is further illustrated through detailed analysis of the phenotypic and genotypic elements that came into play before the accident (Appendix C). The dysfunctional relationships and complex interactions, compounded by the opacity of system behavior, created an extremely difficult situation for the operators.

6.3 Test

The test was treated essentially as an electrical equipment test. The impact on nuclear plant safety was not taken into consideration. As a result, responsibility for the test was left to electrical experts only (Medvedev, 1990). The test was not approved in consultation with the project design engineer, the chief design engineer, the scientific project manager for RBMK nuclear power plants (NPP), or the governmental oversight authorities (Chernousenko, 1991; Medvedev, 1990). It is important to note that management required no such formal approval. Inadequate written procedures for conducting the test compounded the problem. In addition, no extra safety interventions or interlocks were employed even though the test could have affected the safety of the reactor.

6.4 Human Error

It is important to provide context for the plausible reasons behind the human errors cited as the main contributing factors. Anatolij Aleksandrov, former chairman of the Soviet Academy of Science and one of the founders of the Soviet Nuclear Energy Commission, said that the Chernobyl operators acted like bad car drivers. In his words: “You must understand that the reactor does have some flaws… it was designed years ago with available technology… but the problem isn’t the design. If you are driving a car and turn the wheel in the wrong direction and have an accident, do you say that the engine is at fault? or is it the designer? No. Everyone will say that it is the fault of the driver.” This comparison of an NPP to a car oversimplifies the system and hides the very complex environment in which these operators worked.
A nuclear reactor is a tightly coupled human-machine system with an enormous energy potential. Because it is a high risk system, a complete picture requires analyzing human behavior in the context of a tightly coupled and highly complex environment (Stang, 1996). The cognitive workload associated with a complex environment can affect operators’ performance. A detailed phenotypic and genotypic analysis of the events preceding the accident was undertaken to understand the operators’ mental models and the system behavior (Appendix C). Some of the recurring problems appear to be bounded rationality, a bottom-up approach to problem solving, and the lack of a good mental model of the system. These problems compound as the work shifts from rule based behavior to knowledge based behavior.

One problem that does not seem to have been addressed in any report, but that seems to have played a significant role in the operators’ actions, was the lack of common ground. The two operators were performing actions in their respective domains, but the effects of their collective actions on the system were not clearly apparent to them. Because they were also possibly physically distant from each other, their communication may not have rested on the same ground. Grounding is a collective process by which the participants in a communication add to their common ground by mutually understanding each contributor’s meaning and purposes (Clark and Brennan, 1991). This concept also applies to systems, where collective actions should be built on common ground to ensure that all contributors understand not only their own and each other’s individual roles and purposes but also their concurrent goals. In the chronology, the absence of common ground led, for example, Boris to decrease water flow to the reactor when Leonid had already taken too many control rods out of the reactor core.
This led to the reactor power increasing very rapidly.

Two approaches are available for analyzing the human fallibility that leads to human error: the person approach and the system approach (Reason, 2005). The person approach focuses on identifying individual errors and placing blame for forgetfulness, inattention, or moral weakness. The system approach concentrates on the conditions under which individuals work and tries to build defenses to avert errors or to mitigate their effects (Reason, 2005). In this analysis, a system approach was undertaken.

The Soviet accident commission placed the entire blame for the Chernobyl accident on the operators working at the plant on the night of the accident. Errors made by operators did contribute to the complex accident scenario at the Chernobyl plant, but could human error alone have caused an accident of this magnitude? Another important question is whether the operators could reasonably have anticipated and prevented the dangers. The operators’ behavior on the night of the accident can also be better understood if analyzed in the context of a malfunctioning social and organizational system. A system approach can provide a better understanding of the environment that surrounded the accident while reducing the influence of hindsight bias, given that much of the information now available about the events preceding the accident was not accessible to the operators.

Working documents submitted by the Soviet commission and the information provided by Soviet experts to the IAEA defined three main contributing factors: the disabling of automatic trips, operation at unacceptably low power, and the decision to proceed with the test.
All of these factors were related to the operators’ actions on that day and were based on what were defined as the six most dangerous infringements of working practices by the operators, which made it impossible for any safety system to avert the accident (Chernousenko, 1991). These infringements were analyzed to determine their role in the causation of the accident and, more importantly, to understand the operators’ mental model while they performed the actions that infringed safe practices. Listed here are the six infringements, with analysis based on available information (Chernousenko, 1991):

Infringement 1: Reduction of the number of rods in the reactor to much lower than permitted. This rendered the reactor’s emergency protection system ineffective.
Findings: The regulations governing staff duties in regimes involving large reductions of reactor power did not list the total number of rods in the reactor among the parameters that needed monitoring. There were also no technical regulations describing the actions required if, due to some problem (for example, a computer malfunction), the number of rods fell below the required level. The unclear display of information on the NPP monitoring system could also have led the operators to miss this information.

Infringement 2: Power output dropped to levels lower than those envisioned in the experimental plan, which made the reactor very difficult to control.
Findings: No documents dated before the day of the accident stated a minimum power output limit. The scientific supervisors of the IAE noted that they were ignorant of the dangers of operating the reactor at low power output levels. At low power output, the positive steam void coefficient can cause rapid reactivity effects on the power output. In other words, the RBMK had self accelerating properties that were difficult for the operators to predict.

Infringement 3: An excessive amount of water was added to the reactor core, exceeding the limits laid down in regulations. This caused the coolant in the circulation system to reach saturation temperature, leading to cavitation of the pumps.
Findings: None of the documents forbade the connection of all main circulation pumps to the reactor. Also, later simulations of the conditions present on that day showed that cavitation of the pumps could not have happened.

Infringement 4: Blocking of the reactor’s protection system when indicators showed that both turbine generators had stopped, which led to the loss of the reactor’s automatic trip facility.
Findings: The operators assumed the right to take manual control by pressing the emergency button, but there is conflicting information in the working practices documents which could have led them to believe that they had the right to do this.

Infringement 5: Disconnection of the steam separator safety mechanism, which completely deactivated the reactor’s protection system with regard to thermal parameters.
Findings: There is contradictory evidence regarding the actual event: analysis of the events preceding the accident indicates that parameter values were changed but that the mechanism remained in operation.

Infringement 6: Disconnection of the emergency reactor core cooling system (ECCS), which led to the loss of the ability to reduce the scale of the accident.
Findings: Analysis of the design of the ECCS gives insight into the probable mental model of the operators. The ECCS admits cold water into the main circulation pipes and fuel channels, which could be heated to over 300 degrees Celsius, leading to permanent damage to the reactor. The operators might have tried to avert this, but they were limited by their inability to perceive the scope of the reactions brewing inside the reactor.
Some sources state that the Chernobyl operators did not have sufficient knowledge of reactor physics (Gasemeyr, 1995; Laaksonen, 1986). If this was indeed the case, it could have contributed to the accident. However, Stang (1996) notes in his analysis of the Chernobyl accident that the original source of this information was the reports submitted by the Soviet accident commission to the International Atomic Energy Agency. Also, since 1989, the Soviet accident analysis reports have increasingly emphasized the role of institutional failures.

6.5 Summary of Analysis

This analysis demonstrates the need not only for safety related physical and engineering infrastructure at nuclear facilities but also for a safety culture in all aspects of atomic energy utilization. None of the deficiencies alone could have caused this accident; rather, flaws in the various defense layers were breached together on the day of the accident. Safety culture refers to defenses in depth at every level of the system and presumes an overall psychological bias towards safety (Vargo, 2000). An NPP should be a high reliability organization (HRO): given the high levels of complexity and tight coupling, multiple and unexpected interactions of failures are inevitable (Perrow, 1999).

7.0 HROs and the Cognitive Processes of Mindfulness

All of the contributing factors of the Chernobyl disaster imply that it was an accident of a complex system. The accident was therefore not about a single factor, but about multiple factors acting together. Since nuclear power plant accidents have the potential to be so catastrophic, with effects spread over large populations and several generations, a nuclear power plant has no choice but to be an HRO. Weick et al.
(1999) shaped a framework to grasp the distinctness of HROs, creating the need to look more closely at the ways in which diverse but stable cognitive processes interrelate in the service of the discovery and correction of errors. In this work they give us the concept of mindfulness, which consists of five cognitive processes: preoccupation with failure, reluctance to simplify interpretations, sensitivity to operations, commitment to resilience, and deference to expertise. Our analysis shows how the absence of these processes at Chernobyl led to what we call the “crumbling of the wall of safety”: we examine each of the five processes of mindfulness in turn and see how the wall crumbles, letting the accident occur. When HROs focus on the wall of safety, their concerns cover a broader range of unexpected events.

The first block in the wall of safety is preoccupation with failure. Our research shows that this was not apparent in the Soviet culture or at the Chernobyl power plant. The Chernobyl engineers were performing a dangerous test that could have had risky outcomes. Even though the test was to be executed during the day with senior staff at the plant, it was run at night with junior staff because of the electricity needs of the region during the day. Unexpected results occurred as the test was being run, but the operators refused to terminate it. They always thought they could overcome the obstacles encountered during the test. Weick gives us insight into a byproduct of increased attentiveness to all failures: in contrast to their inconsequential role in traditional organizations, maintenance departments in HROs become central locations for organizational learning.
The following quotation from a plant worker shows how Chernobyl did not practice preoccupation with failure in its daily work: “No attention was paid to the state of equipment until it was time for planned preventive maintenance” and he recalled that one station manager actually said: “what are you worried about? A nuclear reactor is only a samovar [a metal urn with a spigot at the base; used in Russia to boil water for tea]” (Marples, 1986). Furthermore, the safety envelope of the equipment was not well understood by those managing the operation of the plant. Finally, to illustrate the lack of preoccupation with failure, it has been argued that a generation of engineers had grown up in the Soviet nuclear industry who lacked any sort of critical attitude to the technology they were handling (Mosey, 1996).

The next cognitive process of mindfulness, for which we found no evidence at Chernobyl, is reluctance to simplify interpretations. Weick explains how HROs restrain the temptation to simplify, which is common to all organizations. HROs cultivate requisite variety and assume that it takes a complex system to sense a complex environment. These efforts take such forms as diverse checks and balances embedded in a proliferation of committees and meetings, frequent adversarial reviews, selecting new employees with non-typical prior experience, frequent job rotation, and re-training. The following example emphasizes how Chernobyl did not practice this second process of mindfulness. When choosing an employee for his shift, a manager at the Chernobyl plant had narrowed his selection to two employees. The first employee had a clean record: no reprimands, and he always did a fine job. The other employee had three reprimands on his record. The manager chose the second candidate, arguing that he did not want someone who would not take any risks!
(Shabad, 1986)

As might be expected, serious nuclear safety management problems surfaced at the highest levels of the Soviet industry shortly after the accident. It was recognized that the Soviet nuclear safety regulatory system was nearly non-existent during the construction of the plant and in the acceptance of the test at Chernobyl. One of the first steps that the Soviet government took after the accident was to set up a separate Ministry of Nuclear Energy to wrest nuclear power from what Prime Minister Nikolai Ryzhkov called the “negative influences” of the Ministry of Power and Electrification, which had responsibility for all power stations. A serious question pondered by the world was how a test program could be drawn up and implemented while, as it would appear, completely bypassing any sort of review and approval process. Another brick in the wall of safety crumbles as Chernobyl oversimplifies the important tasks it has to accomplish. Management should be aware that a complex system creates a complex environment. The test was a change in the environment; therefore extreme precaution and cross-checks should have been taken, but they were not.

Our analysis turns to sensitivity to operations at Chernobyl. Weick cites Emilie Roth for her insight into sensitivity to operations. Roth says that sensitivity to operations is achieved through a combination of shared mental representations, collective story building, multiple bubbles of varying size [situational awareness], situation assessing with continual updates, knowledge of physical interconnections and parameters of plant systems, and active diagnosis of the limitations of preplanned procedures. As we have seen in the chronology of the accident, there was a deviation from the expected 25% of full power at the beginning of the test. Later on during the test, the power level of the plant plummeted to 200 MWt.
The power level was intended to be between 700 and 1000 MWt, but the operators were never able to bring it to the specified level. This should have been taken as a free lesson, and the test should have been terminated. This example clearly shows insensitivity to operations at the plant. The operators did not “have the bubble.” If someone has the bubble at all times in an HRO, catastrophic failures are forestalled by large numbers of ongoing small adjustments that prevent errors from accumulating. The plummeting of the reactor’s power level was the first in a series of events that led to catastrophic failure. Another brick in the wall of safety crumbles as errors are allowed to occur and the operators are not sensitive to operations.

Weick et al. describe resilience as not only bouncing back from errors, but also coping with surprises in the moment. It is important to retain both connotations of resilience to avoid the idea that resilience is simply the capability to absorb change and still persist. To be resilient also means to utilize the change that is absorbed. The best HROs do not wait for an error to strike before responding to it. Gene Rochlin explains commitment to resilience with the idea of epistemic networks. Epistemic networks are informal latent networks activated only in the face of uncertainties and rapidly developing contingencies, as a supplement to the normal patterns of formal hierarchy and compliance with strict roles. The value of these networks is that they allow for rapid pooling of cognitive knowledge to handle events that were impossible to anticipate. Epistemic networks were not formed during the events at Chernobyl. As unanticipated events occurred, the operators did not pool their knowledge; they just kept going. The operators were individually trying to solve the problems they faced without realizing how each of their actions affected the others.
Without these networks forming, another brick in the wall of safety crumbles and accidents can happen.

Mathilde Bourrier describes a finding from studying nuclear power plants that shows how the final process, deference to expertise, crumbled at Chernobyl. Bourrier found that “the most important characteristic [during a planned outage] is the formal delegation of power to craft personnel supported by a nearly complete availability of top-management at all times. By being very flexible and adaptive organization, any problem can rapidly receive the attention it requires at all levels of the organization.” Chernobyl’s test was a planned outage and was supposed to be executed during the day with senior staff available. Instead, the test was run at night with senior management and senior staff at home, while junior staff executed the test. The last brick in our wall of safety falls down and the Chernobyl accident happens.

8.0 Recommendations for Change

8.1 Government Recommendations

In the aftermath of the Chernobyl accident, the Soviet Union identified several planned actions to improve the safety of RBMK type reactors. The United States government gave an assessment based on review and evaluation of the Soviet based safety team, analysis of basic effects, and the judgment of these experts. The two key safety related areas examined were human factors and design.

The Soviet report of the accident places heavy emphasis on the role of the operators. Numerous operator errors have been identified and analyzed. The human errors involved procedural, management, and operator errors. The single largest contributor was the dedication to finishing the safety-related test before the reactor was shut down. The people working on the reactor for the previous three shifts were highly motivated to complete the test before the reactor’s shutdown. This motivation seems to be a result of management direction.
The Soviets have engaged in institutional, management, and operational initiatives to address the human factors portion of safety.

The planned design changes are aimed at preventing a Chernobyl type accident from ever happening again. The reduction or elimination of the positive void coefficient may improve the safety and stability of the RBMK type reactor at all power levels. One consequence of increasing enrichment (thereby decreasing the void coefficient) is to increase power and flux peaking over the entire range of operation. The U.S. government believed that, overall, the design changes suggested by the Soviets would increase the safety of the reactor in some respects but reduce safety margins in others. The conditions under which the safety improvements were most effective were below 700 MWt, where the plant is not normally operated, as during the test that took place. Lacking information on the details of the analysis the Soviets made to weigh these tradeoffs, the team could reach no quantitative conclusion on the overall effect of these changes on safety. Additional action items needed to be considered. For example, what is the time frame for producing more enriched fuel and thereby reducing the void coefficient? It could be years. It was not clear to the U.S. that having more enriched fuel improves safety, since richer fuel might increase the control of the reaction.

The Soviets place most of the blame for the accident on the operators’ failure to follow procedures. Many of the fixes suggested by the Soviets rely on the operators following more complex procedures. According to the U.S., it was extremely easy to bypass, cut off, or otherwise render RBMK safety systems inoperative. No action item was identified to deal with the safety procedures. The reliability of these fixes depends on the method used to implement them.
For example, will the control rods be limited in how far they can come out of the core by adjustable limit switches or by physical barriers? The fix of inserting the control rods 1.2 meters into the core should be effective if assured by a mechanical stop or other form of physical barrier. The information provided by the Soviet Commission did not provide a basis for confidence in relying upon procedures to implement this fix.

The political scientist Aaron Wildavsky suggested that, because of the immense complexity of nuclear power plants, adding safety devices and procedures will at some point actually decrease safety (Pool, 2005). As an example, the control rooms of nuclear power plants have more than 600 alarm lights. Each one, considered by itself, adds to safety because it reports when something is going wrong. In a serious accident, however, the overall effect would be total confusion, as so many alarms would go off that the mind could not easily grasp what was happening. This example supports the government’s perspective that adding more complex procedures does not really improve safety.

Changes Implemented

The initial Soviet accident analysis placed heavy blame on the operators. Since the accident was thought to have stemmed from operator errors, “they gave their highest priority to organizational measures aimed at ruling out a recurrence of the status of Chernobyl Unit 4 immediately before the accident” (Birkhofer, 1996). The preventative measures focused on the idea that if operators were not allowed to let a reactor reach a similar condition, a similar accident would not occur. To prevent this status from occurring, the Soviets implemented the following measures:

1. It was forbidden to perform experiments or disable reactor protection systems.
2. Stricter regulations dealing with reactor operation were applied.
3. Operating staff were required to check if any deviations from design had occurred in the construction of a reactor.
In later years, when accident analysis began to reveal the more systemic nature of the accident, preventative measures began to confront system design problems rather than operators. The known design deficiencies were systematically addressed. The positive void coefficient of the RBMK design was eliminated by installing additional control rods, increasing the absolute minimum running power level, and increasing the enrichment of the fuel. The efficiency of the emergency stop system was increased by eliminating the initial increase in reactivity when engaged, increasing the speed of control rod insertion, and retrofitting a new 2.4 second shutdown system. The operability of the reactor was improved by a display showing the surplus reactivity more frequently.

The additional safety measures in the RBMK design made the reactor sufficiently safe. The IAEA reports that “the existing upgrading programs address most of the safety concerns” (Birkhofer, 1996). Given the adequacy of the retrofitting programs, the remaining RBMK reactors were allowed to continue operation. However, no reactors operate with a positive void coefficient and no new RBMK reactors will ever be built.

8.2 The “Team’s” Recommendations

The Chernobyl team from IOE 491, referred to as “The Third Committee” from this point forward, has worked on recommendations for preventing a Chernobyl type accident from happening again. Our recommendations are inspired by the insights of Nancy Leveson and James Bagian.

A nuclear power plant needs to focus on safety in the initial design, everyday operations, and maintenance. Preventative maintenance programs are valuable, but a plant should not postpone maintenance on equipment just because it is scheduled for a later date. If an operator hears unusual noises coming from a pump, the pump needs to be fixed now, not a month from now when preventative maintenance is scheduled.
When trying to increase safety, it is extremely important to realize that one cannot just look at components or subsystems. The system as a whole must be considered. As discussed earlier, during the events at Chernobyl one operator was pulling out control rods to increase the reactivity of the core while another operator was adding water to cool the core, which created an extremely volatile environment. Operators need to understand how their actions interact with other operators’ actions. As we saw with the Columbia and Challenger space shuttle accidents, we cannot fix the system by fixing one component; we must look at the system as a whole, since small failures can lead to catastrophic events.

The procedures at a nuclear power plant should specify more in depth analysis and encourage reporting when even small deviations occur. The entire fleet of Soviet power plants should be in close contact with each other and even with other power plants in the world. A seemingly small event that occurs at one plant should be broadcast to the entire fleet to create a learning environment in which even small mistakes become lessons. This promotes a network of safety.

The environment of the nuclear power plant must foster a safety culture. At Chernobyl, the test procedure and even the design of the plant were never approved by the Soviet regulatory commission. Even though production pressures are inevitable, safety should always be a priority. During the construction of Chernobyl, inspectors were looking at welded pipe joints. The inspectors found a faulty joint and went to look at the paperwork for the joint. Not only had the welder signed off on the joint, but his manager had signed off on it as well, all in the name of speedy construction (Marples, 1996). Production pressures should not get in the way of proper construction.

Management needs to involve themselves with all levels of plant operation.
It is not enough for management to concern themselves with their central authority. Managers must understand the entire operation of the plant and not dismiss any observations made by operators. Earlier we examined a quote in which a manager compared a nuclear power plant to a tea kettle. This is not how a manager should view an extremely complex, tightly coupled organization.

Dr. Bagian offers us insight into increasing safety in an organization. Setting up an autonomous and financially independent safety committee within the plant is important. Without such a committee, there was no scrutiny of the design of the plant or of the test. Furthermore, the RBMK was not subjected to serious trials and there was no attempt to have it certified.

A stated safety goal, such as making sure that there is no catastrophic accident, seems intuitive, but the team does not feel that it really is. The operators of the plant do their jobs and do not understand the effects of their actions on others in the plant. A stated safety goal should be prominently displayed, and even minor issues should be discussed and fixed immediately.

An open network of communications between the entire Soviet fleet of nuclear reactors and other plants in the world is important. Dr. Bagian described an example from his own experience as Director of the National Center for Patient Safety, Department of Veterans Affairs. He explained that a patient came into the VA and needed an external pacemaker to correct a heart rhythm. However, when the physician tried to complete the procedure, the pacemaker gave an error message. Fortunately, the team had another pacemaker that was working and they were able to use that instead. This pacemaker is probably the most commonly used external pacemaker in the world. Since the team at the VA had recently received patient safety training, they decided to look into what had caused the problem.
After talking to their technical staff, they discovered that there was a design flaw in the pacemaker. As a stopgap measure, they immediately labeled all the pacemakers with instructions on how to get them working if the error message came on, and then began to pressure the manufacturer to correct the design flaw. Expansive communication in the VA led to the resolution of a serious problem with everyday equipment. This example shows that workers' awareness must increase so that they understand all of the processes in a complex environment such as a power plant.

9.0 Conclusions

This analysis has shown that various factors contributed to the accident. The accident was the culmination of many facets of a complex system that aligned in time to erode the system defenses, which led to a catastrophe. It emphasizes the need for a nuclear power plant to be resilient, because accidents should be expected and prepared for.

Glossary

Cavitation: The sudden formation and collapse of low-pressure bubbles in liquids by means of mechanical forces, such as those resulting from rotation of a marine propeller.
Chain reaction: A sequence of neutron-induced nuclear reactions in which the neutrons emitted in fission produce further fission events.
Control rod: A rod composed of a material with a high probability of neutron absorption used to control the reactivity of a nuclear reactor and, if necessary, to provide for rapid shutdown of the reactor.
Core: The region of a reactor in which the nuclear chain reaction proceeds.
ECCS: Emergency Core Cooling System.
Fission: A process in which a nucleus separates into two main fragments, usually accompanied by the emission of other particles, particularly neutrons.
HRO: High Reliability Organization.
IAE: Institute of Atomic Energy.
IAEA: International Atomic Energy Agency.
Moderator: A material in which the neutron kinetic energy is reduced to very low energies, mainly through elastic scattering.
MWt: Megawatt Thermal.
NPP: Nuclear Power Plant.
Poison: A material, produced in fission or otherwise present in the reactor, with a high probability of absorbing neutrons.
RBMK: Russian acronym for High-power boiling channel.
Reactivity: A measure of the reaction rate at which a reactor is operated.
Reactor: A device in which heat is produced at a controlled rate, by nuclear fission or fusion.
VA: Veterans Affairs.
Void: A region (or bubble) of vapor in a coolant that normally is a liquid.
Void coefficient: The rate of change of the reactivity with change in the void volume; a positive void coefficient corresponds to an increase in reactivity when the void volume increases.

Works Cited

Bagian, J. (2005, February 10). High Reliability and Patient Safety. Presented at an IOE 491 lecture at the University of Michigan.
Birkhofer, A. (1996). Summary and conclusions of the International Forum “One Decade after Chernobyl: Nuclear Safety Aspects”. In International Atomic Energy Agency, One Decade After Chernobyl (1st ed., pp. 445-474). Vienna: IAEA.
Bodansky, D. (2004). Nuclear Energy: Principles, Practices, and Prospects (2nd ed.). New York: Springer.
Chernousenko, V. M. (1991). Chernobyl: Insights from the Inside. Berlin: Springer-Verlag.
Clark, H. H. and Brennan, S. A. (1991). Grounding in communication. In L. B. Resnick, J. M. Levine, and S. D. Teasley (Eds.), Perspectives on Socially Shared Cognition. Washington: APA Books.
Institute of Atomic Energy. (1986). Investigation into the Causes of the Accident at Chernobyl Nuclear Power Station. Moscow.
International Nuclear Safety Advisory Group. (1986). Summary Report on the Post-Accident Review Meeting on the Chernobyl Accident. Austria: IAEA.
Laaksonen, J. (1995). The Accident at the Chernobyl Power Plant. Proceedings of Society of Reliability Engineers Conference. Otaniemi.
Leveson, N. G. (2002). Chapter 3: Extensions Needed to Traditional Models. In A New Approach to System Safety Engineering (Work in progress, pp. 35-53). Retrieved January 2005 from www.ctools.umich.edu
Marples, D. (1986). Chernobyl and Nuclear Power in the USSR. New York: St. Martin’s Press.
May, J. (1989). Book of Nuclear Age: The Hidden History, The Human Cost. New York: Pantheon Books.
Medvedev, Z. (1990). The legacy of Chernobyl. Oxford: Basil Blackwell Ltd.
Mould, R. F. (2000). Chernobyl Record: The Definitive History of the Chernobyl Disaster. Bristol and Philadelphia: Institute of Physics Publishing.
Natvig, B. and Gasemeyr, X. (1995). Proceedings of Conference on Probability Risk Analysis of Technological Systems. Oslo.
National Geographic. (2005). Seconds from Disaster: Meltdown in Chernobyl. Retrieved video February 2005 from National Geographic Channel.
Nuclear Energy Agency. (1987). Chernobyl and the Safety of Nuclear Reactors in OECD Countries. Paris: OECD.
Perrow, C. (1999). Normal Accidents: Living With High Risk Technology. Princeton: Princeton University Press.
Pool, R. (2005). Searching for Safety. Frontline. PBS. Retrieved April 2, 2005, from http://www.pbs.org/wgbh/pages/frontline/shows/reaction/readings/search.html.
Rasmussen, J. (1983). Skills, Rules and Knowledge; Signals, Signs and Symbols and Other Distinctions in Human Performance Models. IEEE Transactions on Systems, Man and Cybernetics, 13, 3.
Reason, J. (1987). The Chernobyl Errors. Bulletin of the British Psychological Society, Vol. 40, pp. 201-206.
Shabad, T. (1986, May 13). Mismanagement at Chernobyl Noted Earlier. The New York Times, p. A7.
Stang, E. (1996). Chernobyl: System Accident or Human Error? Radiation Protection Dosimetry, Vol. 68, No. 3/4, pp. 197-201. Center for Technology and Culture-Oslo: Nuclear Technology Publishing.
Stanton, N. (1996). Human Factors in Nuclear Safety. London, Great Britain: Taylor and Francis.
Swiss Agency for Development and Cooperation. (2005). Health Effects. Retrieved April 10, 2005, from http://www.chernobyl.info.
U.S. Department of Energy. (1986). Report of the U.S. Department of Energy’s Team Analysis of the Chernobyl-4 Atomic Energy Station Accident Sequence. Washington D.C.: United States Government Printing Office.
Vargo, G. (Ed.). (2000). The Chornobyl Accident: A Comprehensive Risk Assessment. Columbus: Battelle Press.
Weick, K., Sutcliffe, K., and Obstfeld, D. (1999). Organizing for High Reliability: Processes of Collective Mindfulness. In R. I. Sutton and B. M. Staw (Eds.), Research in Organizational Behavior (Vol. 21, pp. 81-123). Stamford: JAI Press Inc.
Wickens, C. D. and Hollands, J. G. (2000). Chapter 3: Performance Levels and Error Types. In Engineering Psychology and Human Performance (3rd ed., pp. 69-118). Upper Saddle River: Prentice Hall.

Russian Resources

Avarija na Chernobyl’skoj AES i ee posledstivija. (1986). The Accident at the Chernobyl Nuclear Power Plant and its Consequences. Moscow.
Institute of Atomic Energy Report (1987), No. 33/806587 DSP.
The Soviet Report on Chernobyl to IAEA. (1986).
Appendix A Description of the Nuclear Processes

Fission
[Figure labels: Neutron, U-235, Fission Products]
A neutron collision breaks the nucleus into two pieces.
Kinetic energy is captured as heat from friction of excited atoms.
On average 2.5 new neutrons are released from each fission.

Neutron Chain Reaction
[Figure labels: U-235, Neutron, Fission Products]
Each new neutron causes another fission.
This leads to exponential growth of fissions and energy release.

Controlling the Reaction
[Figure labels: Neutron Absorbing Material, U-235, Fission Products, Neutron]
Control rods: constructed of neutron absorbing material.
Constant observance of power levels and automated control rod systems keep the energy level constant.
Operators have the ability to control the energy level by controlling the amount of neutron absorbing material in the reaction.

Constant Energy Level
[Figure labels: Absorption, Fission]

Appendix B RBMK Reactor Design

Diagram Key
1 Uranium fuel
2 Pressure tube
3 Graphite moderator
4 Control rod
5 Protective gas
6 Water/steam
7 Moisture separator
8 Steam to turbines
9 Turbine
10 Generator
11 Condenser
12 Condense pump
13 Heat transport
14 Feedwater pump
15 Preheater
16 Feedwater
17 Water counterflow
18 Circulating pump
19 Water dispenser tank
20 Steel casing
21 Concrete shield
22 Reactor building

Appendix C Cognitive Analysis of Phenotypic and Genotypic Problems

Description of Problems Identified
(Time; Task Description; Phenotype Description; Genotype Description)

Time: April 25, 1400
Task description: Emergency core cooling system disengaged as required by test procedures. Continued power reduction was delayed for 9 hours due to request from dispatch center to maintain full output of TGG # 8 to the grid. Operation with emergency core cooling system out of service was a violation of regulations.
Phenotype description:
• The emergency cooling system was turned off by the operator
  Genotype description: Increased complacency; Necessary violation
• Operator forgot to turn on the emergency cooling system
  Genotype description: Skill-based error type; Error (Slip)/Violation; Overconfidence bias

Time: April 25, 2310
Task description: During the process, the operator disengaged the automatic control rod system (LAR) and failed to properly set “hold power” setpoint on back power controller. As a result reactor power rapidly fell to 30 MWt before he could regain control.
Phenotype description:
• Operator disengages the automatic control rod system
  Genotype description: Necessary violation
• Operator failed to set “hold power” setpoint on back power controller
  Genotype description: Error (Slip/Lapse); Operator mental model/heuristics; System complexity

Time: April 26, 0100
Task description: Power stabilized at this level. An attempt to increase power to the desired 700-1000 MWt was difficult.
Phenotype description:
• Power stabilization was difficult
  Genotype description: Accident precursor; Complex interactions; Dysfunctional interactions; Coupling; System recovery

Time: April 26, 0103-0107
Task description: Two additional main cooling pumps were placed into service for a total of 8 pumps as the test program directed. Since the reactor power was far below the planned test power of 700-1000 MWt, the total flow increased above that allowed in the operating regulations. This resulted in a decrease in steam separators and the inlet coolant temperature approaching saturation. (Pumps were operating in a regime where cavitation was possible.)
Phenotype description:
• Two additional main cooling pumps were placed into service
  Genotype description: Operator mental model/heuristics
• Reactor power was far below the planned test power of 700-1000 MWt
• Changes made in system settings resulted in a decrease in steam separators and the inlet coolant temperature approaching saturation

Time: April 26, 0119
Task description: Operator began manual replenishment of water to steam separator. As the cold water reached the reactor core, there was a sharp drop in the steam fraction of the coolant, and a corresponding power decrease.
Phenotype description:
• Adding of water to steam separator caused a sharp drop in the steam fraction of the coolant, and a corresponding power decrease

Time: April 26, 0122
Task description: Operator reduced feedwater flow rate. This allowed increase of inlet temperature and compounded the events 1 minute later.
Phenotype description:
• Reduction of feedwater increased temperature and compounded the events 1 minute later

Time: April 26, 0122:30
Task description: Operator noted reactivity to reserve was about 6-8 rods as opposed to the normal of 30 rods. Far below the value where reactor shutdown is required. No action taken.
Phenotype description:
• Operator took no action even when the noted reactivity to reserve was about 6-8 rods as opposed to the normal of 30 rods

Genotype description (for the remaining 0103-0107 phenotypes through the 0122:30 entry, in source order):
• Accident precursor
• Dysfunctional interactions
• Operator error
• Complex interactions
• Dysfunctional interactions
• System error tolerance
• Complex interactions
• Dysfunctional interactions
• Operator heuristic
• Bounded rationality
• Bottom up approach
• Error detection
• Rule based behavior
• Operator heuristic
• Bounded rationality
• Bottom up approach
• Error detection
• Rule based behavior
• Misapplication of good rules
• Operator confidence bias
• Operator complacency
• Operator heuristic
• Knowledge/Rule based behavior
• Availability heuristic
• Bounded rationality

Time: April 26, 0123:04
Task description: Steam flow valve to TG # 8 was closed to begin test. The signal for reactor shutdown on closure of both turbogenerator steam valves had been disengaged in order to allow repeating the test if needed. This was in violation of the test program and not operating procedures.
The axial flux double peaked with the higher peak in the top section of the core.
Flow rate began to fall as the 4 main cooling pumps powered by TG # 8 began to run down. Steam pressure also began to increase due to removal of TG steam load and the reduction (by operator action) of the feedwater rate about 1 minute before. All these actions led to the increase in the coolant void fraction (positive reactivity insertion) and a resulting power increase.
Due to the reactor conditions at the time of the test, the void fraction increased many times more sharply than at normal power. (Also, the void reactivity coefficient “promoted deterioration of the situation”.) Only the Doppler effect partially compensated for the void reactivity increase.
Water flow continued to decrease (due to 4 main cooling pumps) and the power continued to increase. This led to the crisis of convective heat transfer, i.e. heating up of the fuel, its disintegration, rapid voiding of the coolant, sharp increase in the pressure in the operating channels, rupture of the channels, and the thermal explosion. Steam formation and the sharp temperature increase created conditions for steam zirconium and other exothermic reactions. A fuel failure specific energy was greater than 300 Cal/gm.
Recorded data shows the check valves between the coolant pumps on the fuel channel inlets closed. This was due to the rapid rise in the fuel channel pressure.
Phenotype description:
• Signal for reactor shutdown on closure of both turbogenerator steam valves had been disengaged
• Test continued, and repeated test performance was even considered
• Flow rate fell as the 4 main cooling pumps powered by TG # 8 began to run down.
• Steam pressure also began to increase due to removal of TG steam load and the reduction (by operator action) of the feedwater rate about 1 minute before.
• All these actions led to the increase in the coolant void fraction (positive reactivity insertion) and a resulting power increase.
• Due to the reactor conditions at the time of the test, the void fraction increased many times more sharply than at normal power
Genotype description (in source order):
• Complex interactions
• Dysfunctional interactions
• Violation
• Confidence bias
• Bounded rationality
• Confidence bias
• Bounded rationality
• Complex interactions
• Dysfunctional interactions
• Complex interactions
• Tight coupling
• Dysfunctional interactions
• Complex interactions
• Tight coupling
• Dysfunctional interactions
• Complex interactions
• Tight coupling
• Dysfunctional interactions
Phenotype description (continued):
• The void reactivity coefficient “promoted deterioration of the situation”.
• Only the Doppler effect partially compensated for the void reactivity increase.
• Water flow continued to decrease (due to 4 main cooling pumps) and the power continued to increase.
• This led to the crisis of convective heat transfer, i.e. heating up of the fuel, its disintegration, rapid voiding of the coolant, sharp increase in the pressure in the operating channels, rupture of the channels, and the thermal explosion.
• Steam formation and the sharp temperature increase created conditions for steam zirconium and other exothermic reactions.
• The check valves between the coolant pumps on the fuel channel inlets closed due to the rapid rise in the fuel channel pressure
Genotype description (continued, in source order):
• Complex interactions
• Tight coupling
• Dysfunctional interactions
• Error tolerance
• Complex interactions
• Tight coupling
• Dysfunctional interactions
• Complex interactions
• Tight coupling
• Dysfunctional interactions
• Complex interactions
• Tight coupling
• Dysfunctional interactions
• Complex interactions
• Tight coupling
• Dysfunctional interactions

Task description: Shift director gave orders to press scram button. Operator saw rods were stopping before they reached bottom. He “felt shocks” (other translations are loud reports, banging). He released servo mechanism to allow rods to fall into core, but their actual movements are uncertain.
Phenotype description:
• Rods could not reach the bottom
• Shock waves were experienced
• Actual motion of rods uncertain as accident took place before that

Task description: Two explosions were heard: “hot fragments and sparks flew up above the fourth plant, some of which fell on the roof of the turbo-generator room and started a fire”.
Phenotype description:
• Explosion occurred

Genotype description (for the scram and explosion entries, in source order):
• Complex interactions
• Complex interactions
• Complex interactions
• System accident
• Complex interactions
• System accident