The Always-On Assembly Line:
How Fault Tolerant Servers Boost Output and
Reduce Costs for Manufacturing
In this white paper, we will provide a management-level overview of some important
concerns regarding IT-supported manufacturing. Most importantly, we will explain how high
availability computing has become an imperative for manufacturing, and how the latest FT
servers provide an innovative method of keeping production lines and reporting running,
without interruption, and at a lower cost than other options.
Table of Contents
Executive Summary
Even a Minute of Downtime is Too Much
Downtime, Reporting, and the Law
The Server
The Central Server
High Availability
The Science of High Availability
Hot-Swappable Hardware
ActiveUpgrade™ for Software Maintenance
What Makes an FT Server the Best Value for Manufacturing?
Total Cost of Ownership
Conclusion

Executive Summary
Any downtime on the assembly line is tremendously costly. In the year 2000, industry
experts estimate that aggregate downtime costs U.S. manufacturers over $1.5 million per
hour in lost revenue alone. And even the briefest interruption not only takes a toll on
production, it can also create disastrous ripple effects down the supply chain, resulting in
fines and lost business when goods are not delivered on time.
Computer automation of manufacturing plants is increasing productivity, reducing downtime
on the line, and helping to eliminate these risks. Not only do programmable automation
controllers (PACs) and robotics perform more tasks on the assembly line, but in addition,
supply chains, quality control, human interfaces, and data collection are becoming
increasingly computerized. While automation certainly makes manufacturing more efficient
and productive, it also raises the ante for a robust IT infrastructure.
At the center of automated manufacturing is the server: the computer hardware that controls
and coordinates an intricate system of automated components and interfaces. Most recently,
manufacturing plants have relied on clusters of redundant servers to control plant and shop
floor systems. The redundancy of these clusters helps reduce computer downtime, thus
reducing assembly line downtime.
Another approach to reducing downtime, with more reliable fault tolerant (FT) servers, has
been overlooked because of the higher price of the hardware. However, the actual total cost
of ownership of an FT server is significantly lower than that of clusters or other options. Moreover,
the cost of FT servers is coming down, making FT by far the best value for automating a
manufacturing infrastructure.
Even a Minute of Downtime is Too Much
Manufacturing plants struggle constantly with the challenge of reducing—ideally
eliminating—downtime on the assembly line. Some manufacturing plants can lose hundreds
of thousands of dollars—even millions—per hour in lost production when production shuts
down unexpectedly. Other factors that exacerbate costs for downtime include the following:
• In many industries, compliance with government regulations, particularly in the areas of
safety and the environment, is a growing factor, requiring continuous data collection and
reporting mechanisms. Any downtime in the reporting system means that critical data is
lost, exposing the company to fines and even criminal liability.
• The “big board” control panels of yesteryear are being replaced by computer-controlled
human-machine interfaces (HMIs) and dashboards. Computer downtime may mean a
loss of control over plant operations.
• Today’s automated plants continually record data to track the location and status of all
parts, supplies, and goods on the plant floor. When production stops, this data is critical
for operations recovery. The more accurate the tracking data, the faster and less
expensive the restart or recovery. But if the automated systems go down and this
data is compromised or lost, manual recovery is slow and costly.
• As manufacturing has moved from vertical integration to an extended enterprise made
up of hundreds or even thousands of potential partners, the total costs of downtime are
extending throughout the supply chain. Many integrated manufacturers are establishing
costly penalties for suppliers who fail to meet minimum performance or delivery metrics.
Downtime, Reporting, and the Law
When the automated reporting mechanisms in a plant go down, even for a minute,
manufacturers face serious consequences in complying with government regulations. Here
are a few examples:
• The TREAD Act of 2000, enacted after the infamous Ford/Firestone recall, requires
automobile manufacturers to submit detailed records to the National Highway Traffic
Safety Administration (NHTSA) of the serial numbers of components that are installed
on each vehicle, to expedite any safety recalls on those components. Manufacturers
face criminal liability if any gaps in reporting interfere with a recall for a safety-related
defect that causes death or serious bodily injury. In 2006, auto manufacturers spent
over $60 billion in warranty remediation; many of these costs can be reduced or avoided
with complete and accurate monitoring.
• The US Environmental Protection Agency (EPA) oversees extensive regulations that
apply to emissions of water and air pollutants. Manufacturers who discharge pollutants
must continually monitor their emissions, and gather and maintain time-specific data to
document the volumes of pollutants they release into the environment. If data is lost,
manufacturers face heavy fines, and even litigation.
• For drug makers, medical device manufacturers, biotech companies, biologics
developers, and other industries regulated by the Food and Drug Administration (FDA),
current Good Manufacturing Practice (cGMP) regulations, particularly 21 CFR Part 11,
require manufacturers to implement and monitor quality controls, validation systems,
and documentation of all aspects of their plant operations. Downtime can result in loss
of data that is critical for FDA compliance.
The Server: Nerve Center of Manufacturing
Computer hardware is truly the nerve center, or backbone, of today’s automated
manufacturing plants. Plants have become increasingly reliant upon state-of-the-art
manufacturing equipment and controls to increase output, reduce costs, boost safety, and
eliminate the risks of missing shipments.
Components of the Automated Manufacturing Plant
An automated manufacturing plant depends on constant communication with the central
computing system in order to function properly. Uninterrupted and reliable
operation is of particular importance in the following subsystems.
• Manufacturing Execution Software (MES)
MES, which runs on the central computing system, is the software hub of the assembly
line. Its role is to control and coordinate the interrelationship between the shop, the
plant, and the enterprise. It ensures that all of the functions of the plant are operating
smoothly, that data is being captured and reported, and that human operators receive
the right information when they need it. Although various modules of MES may run on
separate servers, if any one of these servers fails it can result in a serious—sometimes
complete—stoppage of plant operations.
• Automated Human Machine Interfaces (HMIs)
The control and feedback loop between human operators and plant machinery is vital.
State-of-the-art systems now replace the “big board” controls with computer dashboards
of virtual controls and monitors. These automated HMIs are more efficient, reliable, and
comprehensive—as long as the central server is online at all times. Any server
downtime results in loss of control and interruption of all systems of the plant.
• Programmable Logic Controllers (PLCs)
Robotics, PLCs, and other forms of shop floor programmable automation controllers
contain their own control logic for the specific tasks they perform. However, they must be
constantly monitored by the shop or plant system for status and performance data.
• Assembly Line Monitoring
For better efficiency, as well as reporting and compliance, plants are automating their
systems for tracking what parts and supplies are where along the assembly line. Lapses
in this information can cause long downtimes, manual reconciliation, and in the case of
government compliance, costly penalties. The more reliable the central computing
system, the less likely that assembly line monitoring mechanisms will require human
intervention.
• Inventory
A critical step in the supply chain, inventory control is now automated using barcodes
and wireless technology. But the inventory is only as good as the incoming data. Lost
data means manual counting, reconciling, and adjusting of inventory records.
• Compliance and Reporting
Compliance and reporting requirements, an increasingly important and often burdensome part
of the manufacturing process, are more likely to increase than decrease over
time. The need for reliable methods of tracking, recording, and storing continuously
generated data, without interruption, is imperative. Manufacturers who implement the
most efficient compliance and reporting methods will save time and money, and remain
more competitive.
The Central Server: The Backbone of Effective Manufacturing
[Figure: diagram with the labels Assembly Line Monitoring, Automated HMI, PLCs, Manufacturing
Execution Software (MES), Central Computing System, Inventory, Supply Chain Management
Integration, ERP Integration, and Compliance and Reporting.]
The effectiveness of Manufacturing Execution Software (MES) is dependent on the
uninterrupted service of the central computing system.
• Supply Chain Management (SCM) Integration
The manufacturing process is just one link in the supply chain. Through automated
supply chain management (SCM), incoming parts and outgoing inventory are tracked
and routed in the most efficient manner possible. MES integrates with SCM to
ensure that goods are ordered, shipped, and received on time. Interruption of the link
between MES and SCM can have ripple effects up or down the supply chain, costing a
manufacturer time and money.
• Enterprise Resource Planning (ERP) Integration
Because the assembly line is the lifeblood of a manufacturing company, executives,
managers, legal and finance staff, and human resources staff are vitally interested in
integrating the systems they use with those that control the line. Data on product output,
supply distribution, compliance, staffing, and many other facets of the line have a direct
impact on the front- and back-office operations of the enterprise. For these reasons,
major ERP software systems are becoming increasingly integrated with MES.
Therefore, loss of data or communications due to computer downtime can interrupt
many other offices throughout a manufacturing company.
High Availability: Why Does It Matter in Manufacturing?
Automotive manufacturing is typical of industries with extended enterprises, complex supply
chains, many partners and vendors, and complex compliance issues. When an auto plant
stops due to a computer failure, as much as $40,000 per minute of profit is lost to the
company, directly. Indirect costs can include penalties from $5,000 to $50,000 if shipments
are missed, and audits and fines from the federal government if compliance data is lost.
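
As a rough illustration of the arithmetic in the paragraph above, the following Python sketch estimates the cost of a single outage from the figures quoted there. The function name and the treatment of the missed-shipment penalty as one flat charge per outage are assumptions made for illustration, and the $40,000 per minute figure is the upper bound given in the text, not a typical value.

```python
# Upper-bound figures quoted in the text above; real plants will differ.
PROFIT_LOSS_PER_MINUTE = 40_000      # dollars of profit lost per minute of stoppage
PENALTY_RANGE = (5_000, 50_000)      # dollars, if a shipment is missed


def outage_cost_range(minutes_down, shipment_missed=False):
    """Return (low, high) dollar estimates for one outage.

    Assumption: a missed shipment adds a single penalty somewhere in
    PENALTY_RANGE. Federal audits and fines are not modeled.
    """
    direct = PROFIT_LOSS_PER_MINUTE * minutes_down
    if not shipment_missed:
        return (direct, direct)
    low_penalty, high_penalty = PENALTY_RANGE
    return (direct + low_penalty, direct + high_penalty)


if __name__ == "__main__":
    low, high = outage_cost_range(10, shipment_missed=True)
    print(f"10-minute outage: ${low:,} to ${high:,}")
```

Under these assumptions, a ten-minute stoppage that also causes a missed shipment lands between $405,000 and $450,000 before any regulatory consequences are counted.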
To avoid these unnecessary costs, manufacturing environments need high availability
computer hardware for the backbone of the system. High availability describes a system
designed and implemented to ensure a certain absolute degree of operational continuity.
Availability is generally measured as a percentage of uptime. Since uptimes generally vary
from 99% to 99.999%, availability is commonly expressed in terms of “nines.” An average
availability of “five-nines,” or 99.999%, represents the optimal performance of today’s high
availability computing technologies.
“Five Nines” is State-of-the-Art for High Availability
Nines   Availability   Downtime
1       90%            36.5 days/year
2       99%            3.65 days/year
3       99.9%          8.76 hours/year
4       99.99%         52 minutes/year
5       99.999%        5.25 minutes/year
For manufacturing, the difference between “four nines” and “five nines” is significant. For
example, a system with only 99.99% uptime could be down for as much as one minute per
week. And after even one minute of downtime, bringing operations back online can take much
longer.
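
The downtime figures for each number of nines follow directly from the availability percentage. The short Python sketch below shows the calculation; the function names are illustrative, and it assumes a 365-day year and a 52-week year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600; leap years are ignored


def availability_from_nines(nines):
    """Availability as a fraction, e.g. 3 nines -> 0.999."""
    return 1 - 10 ** (-nines)


def downtime_minutes_per_year(availability):
    """Maximum downtime per year, in minutes, allowed at a given availability."""
    return (1 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for nines in range(1, 6):
        availability = availability_from_nines(nines)
        per_year = downtime_minutes_per_year(availability)
        per_week = per_year / 52
        print(f"{nines} nines ({availability:.3%}): "
              f"{per_year:,.2f} min/year, {per_week:.2f} min/week")
```

At 99.99% this works out to roughly 52.6 minutes per year, or about one minute per week, which is the figure cited above. At 99.999% it drops to roughly 5.3 minutes per year.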
The Science of High Availability
“Once manufacturing companies weigh the costs and the risks of the high availability
hardware they are using now, the decision to switch to FT servers is an obvious one.”
Lary Marshall, Business Development Manager, Rockwell International
Clearly, AL4 is the superior level of availability for manufacturing and MES applications,
since it offers an environment with no interruption of service, even through a hardware failure.
However, budget constraints unfortunately lead companies to purchase AL3 or even AL2 systems.
A more scientific approach to the varying types and levels of availability has been developed
by the analyst firm IDC. The following table is adapted from IDC’s “Availability Spectrum,”
which it uses to aid hardware buyers in selecting an appropriate level of availability.
Availability level 4 (AL4)
  Impact of component failure on priority user: Transparent to user; no interruption of work;
  no transactions lost; no degradation in performance
  System protection features: 100% component and functional redundancy
Availability level 3 (AL3)
  Impact of component failure on priority user: Stays online; current transaction may need
  restarting; may experience performance degradation
  System protection features: Automatic failover transfers user session and workload to backup
  components; multiple system connections to disks
Availability level 2 (AL2)
  Impact of component failure on priority user: User interrupted, but can quickly re-log on;
  may need to rerun some transactions from journal file; may experience performance degradation
  System protection features: User work transferred to backup components; multiple system
  access paths to disks
Availability level 1 (AL1)
  Impact of component failure on priority user: Work stops; uncontrolled shutdown; data
  integrity ensured
  System protection features: Disk mirroring or RAID, and a log-based journal file system for
  identification and recovery of incomplete in-flight transactions
Source: IDC, 2006
Although the concept of high availability computing is not new in manufacturing, the levels of
availability, and the relative costs of these levels, are new to many in manufacturing industries.
In many plants, IT attempts to “cobble together” systems using off-the-shelf computing
hardware to achieve high availability through various forms of clustering technology.
Unfortunately, this approach requires considerably more effort to install and stabilize,
advanced training of plant managers and IT staff, and reliance on third-party service
organizations for maintenance.
For these reasons, AL4 fault tolerant systems that deliver 99.999% uptime are gaining
acceptance among manufacturing companies. These systems ensure continual, always-on
operation of the MES and all of the components and systems that MES coordinates. While
in the past it has sometimes been “good enough” to save money on the hardware investment
and tolerate some computer downtime and lapse in productivity, today the reduced cost of
ownership of FT servers may save manufacturers many times their investment by increasing
productivity and avoiding line stoppages.
How Do Fault Tolerant Servers Deliver High Availability?
A fault tolerant server is built from the ground up to perform at AL4 with “five nines”
availability. It is a fully internally redundant system that goes far beyond the traditional server
cluster in terms of reliability and cost-effectiveness. The overarching benefit of an FT server,
aside from its superior AL4 high availability, is that the features that ensure high performance
are built into the hardware, so they do not have to be installed, configured, and maintained
using software.
Here are some major features of FT servers—particularly NEC’s Express5800/ft series of
servers—that set this technology apart from other high availability technologies.
Redundant Components
All of the memory, processors, and other components of the FT server are redundant to each
other, and physically configured to operate in lockstep. Therefore, one FT server is the
equivalent of two conventional servers performing the exact same processes at the same
time. In the event that one component fails, its counterpart continues functioning with no
interruption in the operation of the system. Because every component has a running
counterpart and the handover is virtually instantaneous, there is no single point of failure,
and downtime is nearly eliminated.
Conventional vs. Fault Tolerant Servers
[Figure: two block diagrams. Conventional Server System, with the labels CPU (four), Chipset,
Memory, I/O-PCI, and Disk. NEC High Availability (Fault Tolerant) Server System, with the
labels System A and System B, Processing Subsystem A and B, I/O Subsystem A and B, CPU,
Chipset, Memory, PCI, Extended PCI, Disk, Fault Detection, Fault Detection & Isolation,
Lockstep, FT Cross Bar, Mirror, and Multipath.]
Fault tolerant servers are superior to conventional servers because there is no
single point of failure, no switchover, and a single logical server.
Theoretically, a cluster of servers (providing availability at level AL2 or AL3) operates on the
same principles as an FT server. However, clustered servers must be configured to work
together. Failover function and operation must be set up and diligently maintained by skilled
technicians using specialized software. For some IT departments, the effective configuration
and maintenance of a high availability cluster may require outside technical expertise.
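
The lockstep operation described under Redundant Components can be sketched in software, although in an FT server it is implemented in hardware. The Python toy model below is an illustration under stated assumptions, not NEC's design: the class names `Module` and `LockstepPair` and the self-reported `healthy` flag are hypothetical. Both modules execute every operation; when one reports a fault it is isolated, and the surviving module's result is returned without a separate failover step.

```python
class Module:
    """One half of a redundant pair (hypothetical model, not a real API)."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.total = 0

    def execute(self, value):
        if not self.healthy:
            raise RuntimeError(f"{self.name} has failed")
        self.total += value
        return self.total


class LockstepPair:
    """Runs every operation on both modules and isolates a failed one."""

    def __init__(self):
        self.modules = [Module("A"), Module("B")]

    def execute(self, value):
        results = []
        for module in list(self.modules):
            try:
                results.append(module.execute(value))
            except RuntimeError:
                # Fault detected: isolate the module; its partner carries on.
                self.modules.remove(module)
        if not results:
            raise RuntimeError("both modules failed")
        return results[0]


if __name__ == "__main__":
    pair = LockstepPair()
    pair.execute(5)
    pair.modules[0].healthy = False   # simulate a hardware fault in module A
    print(pair.execute(3))            # module B answers with the same state
```

Because both modules have processed the same operations, the survivor already holds the current state. A cluster, by contrast, has to detect the failure and transfer the workload, which is the step that must be configured and maintained in software.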
Hot-Swappable Hardware
When a hardware component in an FT server does need replacing, repairs can be made with
no downtime, while the overall system is still “hot” and online. Even routine hardware
maintenance does not require any planned downtime.
Before hot-swappable components were available in FT servers, clusters had the advantage
in this regard, since it is relatively easy to take one machine in a cluster offline for
maintenance. Now with hot-swappable FT components, companies do not have to
compromise on high availability to maintain these servers.
“The emergence of true fault tolerant solutions for standard Windows- and Linux-based
server environments from vendors including Stratus, NEC, and others, will provide customers
with alternatives to the complexities of clustered-server solutions.”
IDC: Stephen L. Josselyn
ActiveUpgrade™ for Software Maintenance
Active Upgrade, a feature of NEC’s FT servers, allows for software updates with minimal
interruption of service. One module can be administered offline while the second module
continues to handle the operational load. When updates are completed, the two modules
synchronize data and return to full redundant operation. A rollback feature is integrated into
Active Upgrade in order to return the server to its previous state in the event there is a
problem with the software update.
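
The sequence described above (split the modules, update one offline, then either commit or roll back and resynchronize) can be sketched as a small state model. The Python below is a hypothetical illustration only; the class and method names are invented for this sketch and do not reflect NEC's actual Active Upgrade interface.

```python
class UpgradeSession:
    """Toy model of a split / update / verify / resynchronize cycle.

    Hypothetical names throughout; not a real vendor API.
    """

    def __init__(self, version):
        # Both modules start in sync on the same software version.
        self.versions = {"online": version, "offline": version}
        self.redundant = True

    def split(self):
        """Take one module offline; the other keeps handling the load."""
        self.redundant = False

    def apply_update(self, new_version):
        """Update only the offline module."""
        if self.redundant:
            raise RuntimeError("split the modules before updating")
        self.versions["offline"] = new_version

    def finish(self, update_ok):
        """Commit the update if it checked out, otherwise roll back."""
        if update_ok:
            self.versions["online"] = self.versions["offline"]
        else:
            self.versions["offline"] = self.versions["online"]  # rollback
        self.redundant = True  # modules resynchronize
        return self.versions["online"]


if __name__ == "__main__":
    session = UpgradeSession("1.0")
    session.split()
    session.apply_update("1.1")
    print(session.finish(update_ok=False))  # rollback keeps the old version
```

The point of the model is that the production module is never touched until the update has been verified, so a failed update costs redundancy for a while but not service.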
What Makes an FT Server the Best Value for Manufacturing?
The intense pressure to stay competitive in many manufacturing industries, plus the
increasing pressures of government regulations, makes it critical for manufacturing
companies to achieve the greatest productivity with the least expenditure of money and effort.
The money and effort required to install and deploy an FT server are lower, by every metric,
than those of other high availability technologies. The combination of reduced initial
hardware expense and lower operating costs makes FT servers a superior value
over any other high availability technology. There is simply no reason to compromise
productivity or risk supply chain failures.
• One NEC Express5800 FT server is less expensive than two conventional servers
It is no longer viable to try saving money by purchasing two less expensive conventional
servers and clustering them together. Like so many other computer hardware products,
FT servers are becoming more affordable.
• FT servers require less configuration and maintenance
IT departments already struggle with high maintenance expenses—in fact, analysts who
survey IT groups find that many spend 50%, even 70% of their budgets on maintaining
the systems they have, as opposed to building and improving their operations.
FT servers are less expensive to operate than clusters or RAID arrays, because the
redundancies and configurations between the components are built into the hardware,
instead of configured via software. Fewer specialized IT skills are required to implement
FT servers, so it is no longer necessary to hire or contract workers with special skills in
order to implement a high availability system.
• One FT server requires only one software license
Unlike a cluster, which requires multiple licenses of the operating system, databases,
server applications and other software (one for each server in the cluster), an FT server
requires only one license of each. Software costs can be dramatically reduced, along
with the effort required to install and maintain multiple instances of each software
product.
• Less downtime means less maintenance
Because downtime is dramatically reduced by virtue of the high availability hardware,
maintenance time is also reduced. IT staff can concentrate on other critical tasks.
“Although clustering has been the predominant vehicle for improving server availability
characteristics, the complexity that is often associated with these types of solutions in
certain market segments has hampered more widespread acceptance…. IDC believes that the
complexity that is often associated with some of these implementations must be further hidden
from the end customer to preserve ease of use for IT shops that do not specialize in
clustering technology and skill-sets (e.g. scripting to create cluster-aware applications).”
IDC: Stephen L. Josselyn
Total Cost of Ownership of High Availability Hardware Alternatives
Fault tolerant servers provide lower total cost of ownership over the
long haul, especially in the event of an outage.
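
The licensing point made earlier in this section is simple arithmetic, sketched below in Python. The product list is a hypothetical example, and the sketch assumes per-server licensing as described in the text; actual license terms vary by vendor and should be checked.

```python
def total_licenses(products, servers):
    """Licenses needed when every server needs its own copy of each product."""
    return len(products) * servers


if __name__ == "__main__":
    # Hypothetical software stack for illustration only.
    stack = ["operating system", "database", "MES server application"]
    cluster = total_licenses(stack, servers=2)    # two-node cluster
    ft_server = total_licenses(stack, servers=1)  # one logical FT server
    print(f"cluster: {cluster} licenses, FT server: {ft_server} licenses")
```

For a three-product stack, a two-node cluster needs six licenses under this assumption, while a single logical FT server needs three.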
Conclusion
A fault tolerant server is clearly the must-have hardware for an effective and productive
manufacturing plant. Manufacturing businesses can ill afford to gamble with the upstream
and downstream effects that plant stoppages can have on their supply chains, or with the
costs of failure to comply with government regulations, and now they don’t have to.
The decreasing upfront cost of FT servers, combined with lower total cost of ownership
throughout the life of the computer, makes the purchase of an FT server easy to justify for
any manufacturer that seeks to reduce risk, cut costs, and boost output.
NEC Fault Tolerant Servers
NEC offers a full line of Express5800 fault tolerant servers that offer up to 99.999% uptime.
If you would like more information, call: 866-632-3226 or visit www.necam.com/ft.
NEC CORPORATION OF AMERICA
Departmental Servers Division
2880 Scott Boulevard
Santa Clara, CA 95050
© NEC Corporation of America. All rights reserved. Specifications subject to change without notice. NEC is a registered trademark and
Empowered by Innovation is a trademark of NEC Corporation. All other trademarks are the property of their respective owners.