Get to the Root of Your Business Service Quality Issues

White Paper
Business Service Quality Matters
Business services are the lifeblood of any modern organization. They provide business
insights, empower employees, and enhance customer satisfaction. Agile delivery of high-quality business services creates competitive advantage and drives profitability. On the
other hand, poorly performing business services result in radically reduced productivity,
declining market position, financial loss and ultimately organizational failure.
To illustrate the critical importance of business service quality, consider a real-life example
– an airline. The typical air carrier relies on dozens of business services – in fact, one
leading international airline identified 230 such services, of which 60 were mission-critical.
These services include seat inventory, yield management, departure control, cargo
logistics, and many other industry-specific business systems. In addition, airlines rely on
services that are found across industries, such as payroll, accounting and email.
The failure of just one of these services can have a devastating business impact. For
instance, an outage of an airline’s ticketing system can put millions of dollars in
bookings at risk. In 2012, American Airlines boarded nearly 108 million passengers – an
average of more than 12,000 flight legs ticketed per hour, with significantly higher
volumes during peak periods. At a typical price of $200 per flight leg, this translates into
$2.4 million of affected bookings for a single hour’s downtime.
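The arithmetic behind this estimate can be checked with a few lines of Python. This is an illustrative sketch only: it assumes that each boarded passenger corresponds to one ticketed flight leg and that bookings are spread evenly over the 8,784 hours of 2012 (a leap year), both simplifications that ignore peak periods.

```python
# Illustrative check of the downtime estimate in the airline example.
# Assumptions (not from the source data): one boarded passenger = one ticketed
# flight leg, and bookings are spread evenly across the year.
HOURS_IN_2012 = 366 * 24  # 2012 was a leap year: 8,784 hours


def legs_per_hour(annual_legs: float, hours: int = HOURS_IN_2012) -> float:
    """Average number of flight legs ticketed per hour."""
    return annual_legs / hours


def bookings_at_risk(legs_in_hour: float, price_per_leg: float) -> float:
    """Booking value affected by one hour of ticketing downtime."""
    return legs_in_hour * price_per_leg


average = legs_per_hour(108_000_000)
print(f"Average legs per hour: {average:,.0f}")  # just over 12,000
print(f"Estimate at 12,000 legs/hour: ${bookings_at_risk(12_000, 200):,.0f}")
print(f"Estimate at the yearly average: ${bookings_at_risk(average, 200):,.0f}")
```

Because peak-hour volumes are higher than the yearly average, the $2.4 million figure is best read as an average-hour estimate rather than a worst case.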
A business service is not simply the application that the end user sees – it is the entire
chain that supports delivery of the service, including physical and virtualized servers,
databases, middleware, storage and networks. A failure in any of these can affect the
service – and so it is crucial that IT organizations have an integrated, accurate and
up-to-date view of all of these components and of how they work together to provide
the service.
Why Diagnosing Service Issues Is Difficult
While it is critical to know how a business service is delivered, many IT organizations
lack this visibility. Furthermore, they do not have all of the availability and performance
information that they need to detect and diagnose service health issues – even if they
could map this data to individual business services. Because of this, detecting service
issues and finding their root cause becomes an enormous challenge.
There are many reasons for this lack of visibility – and all of these need to be addressed
to ensure that business service issues are detected quickly and resolved effectively.
www.servicenow.com
Lack of an Integrated Service-Level Dashboard
IT organizations do not deliver a single business service – they deliver many. Therefore,
IT personnel need to know instantly when any of these services starts to experience an
issue – and have to be able to determine the root cause quickly and easily. While many IT
departments have a dashboard based on service transaction monitoring, this often does
not correlate service issues with underlying infrastructure problems. This leaves IT staff
with the task of analyzing vast amounts of low-level infrastructure data in order to identify
the particular IT components that are responsible for the service issue.
Complex, Siloed Infrastructure Configuration Data
IT service configuration data is typically collected from isolated technology domains,
rather than from a top-down service perspective. There is no integrated information about
how services flow – only siloed data about applications, servers, middleware, networks
and storage. At best, topological data is limited to relationships between adjacent
components – there is no view of the total set of relationships that support an end-to-end
service, forcing staff to map the service delivery path manually. In addition, much of this
configuration data is infrastructure-oriented and irrelevant from a service perspective –
adding further to the complexity of the mapping process.
Increasingly Dynamic IT Environments
In the past, IT services and infrastructures were both relatively static, with changes
happening over periods of months. However, this is no longer the case – IT organizations
are faced with an ever-accelerating pace of change. This is being driven by virtualization
and cloud, increasing demands for new business services, and new operational
approaches such as DevOps that shorten upgrade cycles dramatically.
This makes it impossible to track how services are delivered using traditional manual
methods, which were designed for changes that happen over weeks, not minutes. As a result,
IT staff lack the accurate and up-to-date service topology information that they need to
identify, analyze and resolve service issues correctly and efficiently.
Change-Related Service Issues
Change is the single biggest cause of service issues, and the impact of change is only
increasing as it becomes more frequent. Not only do IT staff lack a current and accurate
view of how services are delivered to help them diagnose problems, they are also unable
to see if anything has changed that could have caused a service impairment. This is the
typical “What happened last night?” scenario, and IT personnel now find themselves
struggling to answer the question.
Shared and Redundant Infrastructure
While tracing a service through dedicated infrastructure is difficult enough, the task
becomes even more challenging with shared or redundant infrastructure.
When a service flows across an enterprise bus or load balancer, the ingress point may
be known, but there are multiple possible egress points. This makes it very difficult to
determine which components are downstream of the shared component in the service
flow. This in turn makes it hard to diagnose the root cause of service problems and to
understand the service impact of infrastructure failures.
Similarly, when a service depends on redundant infrastructure such as server clusters,
failures of individual servers within that cluster may just reduce the capacity of the service
or not affect it at all. Without a clear model of the impact of these types of failures, IT staff
run the risk of ignoring crucial problems and of overreacting to minor ones.
Inaccurate and Out-of-Date Service Maps
To identify, diagnose and resolve service issues accurately and efficiently, IT staff need
a reliable map of each business service. This must identify all of the components that
support that service, along with the relationships that trace the service topology across
the end-to-end IT infrastructure. However, all of the service change and configuration
issues already discussed lead to fragmented, obsolete and unreliable service maps. As
a result, IT staff miss critical issues, take excessive amounts of time to resolve service
problems, and waste time on infrastructure issues that have little or no service impact.
What is needed is an automated, real-time service discovery tool that spans technology
domains, and that has the intelligence to trace services across complex shared and
redundant IT components – including virtualized and cloud infrastructure. This tool must
be able to update service maps as soon as changes occur, eliminating costly manual
service-mapping approaches that depend on scarce expertise and tribal knowledge.
Multiple Infrastructure-Oriented Monitoring Tools
In the same way that IT configuration data is siloed and infrastructure-oriented, IT
monitoring tools are not designed or configured with services in mind. These tools typically
monitor the health of individual IT infrastructure domains, and deliver information that is
defined by and intended for domain experts. As a result, much of the data they produce
is irrelevant from a service perspective and is not easy for network operations center (NOC) operators to understand.
Furthermore, the monitoring data that is relevant is not mapped back to specific business
services, leading to highly inefficient manual “swivel-chair” root-cause analysis.
At the same time, deploying these monitoring tools is expensive since the process for
doing this is not automated. Because of this, only a subset of IT components is usually
monitored. This leads to gaps in monitoring data – and even if full coverage were possible,
IT staff would be overwhelmed with the resulting flow of domain-specific information.
Since monitors are not deployed with services in mind, this translates into gaps in service
coverage. As a consequence, service issues go undetected – and even when they are
identified, there is often not enough information to determine the root cause.
Lack of Cross-Domain Expertise
Managing service health effectively is not just a technical issue – it is also an organizational
and cultural one. IT departments manage individual domains today, not services. This
creates many domain experts, but cross-domain expertise is rare. Because of this, IT
organizations lack the skills needed to trace a service issue across domains to its root
cause – and instead find themselves pointing fingers in the war room.
ServiceNow® Gets to the Root of the Problem
ServiceNow ServiceWatch gives IT staff a completely accurate and up-to-date map of how
business services are delivered. It starts with what is important – the business service – and
then drills down intelligently across domains to discover all of the IT components that deliver
that service – along with the relationships that represent the service flow through these
components. Its unique, top-down approach to service mapping is comprehensive and
surgical – it discovers everything that matters from a service perspective while eliminating
irrelevant and confusing infrastructure data.
Once ServiceWatch has discovered a business service, it then continues to monitor the
service topology in real time – updating the corresponding service map as soon as changes
are detected. As a result, IT personnel always have a correct, complete and current map
of how each business service is delivered – and can also determine the precise service
topology at any time in the past.
ServiceWatch collects service-related monitoring
data from industry-standard monitoring tools. It
also has its own monitoring system, which can
either be used to augment existing monitoring
data or to provide a complete business service
monitoring solution. ServiceWatch correlates
this monitoring data using its service maps to
produce complete and up-to-date service health
status, along with detailed information about
how infrastructure issues propagate through
the service topology. This gives IT staff instant
visibility of a business service impairment when it
occurs, along with the data they need to isolate
the problem rapidly and identify its root cause.
Figure 1.
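The correlation step described above can be illustrated with a minimal Python sketch. It is not ServiceWatch’s implementation: the service maps, component names and severity scale below are hypothetical, and the rule that a service takes the worst severity of any component in its map is a deliberate simplification that ignores redundancy.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity scale, ordered from healthy to worst."""
    OK = 0
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


# Hypothetical service maps: business service -> components that deliver it.
SERVICE_MAPS = {
    "E-banking": {"lb-1", "web-1", "web-2", "sql-1"},
    "Intranet": {"web-3", "sql-1"},
    "Payroll": {"app-7"},
}


def services_affected_by(component, service_maps):
    """Services whose map includes the component that raised an event."""
    return {svc for svc, comps in service_maps.items() if component in comps}


def service_health(events, service_maps):
    """Roll (component, severity) events up to business services.

    Each service takes the worst severity reported by any of its components;
    a shared component such as sql-1 therefore affects every service using it.
    """
    health = {svc: Severity.OK for svc in service_maps}
    for component, severity in events:
        for svc in services_affected_by(component, service_maps):
            health[svc] = max(health[svc], severity)
    return health


events = [("sql-1", Severity.MAJOR), ("app-7", Severity.MINOR)]
for service, status in service_health(events, SERVICE_MAPS).items():
    print(service, status.name)
```

Calling services_affected_by for a single event gives the reverse view: starting from one event, it lists every business service that depends on the failing component. Handling redundant components would need a richer rule than taking the maximum.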
Real-Time Service Health Dashboard
ServiceWatch’s service health dashboard lets IT staff see the status of all business services
at a glance. Each service is represented by a tile that is color-coded to represent the
health of the service. The size of the tile reflects the business impact of the service – which
is calculated using completely configurable business metrics such as the number of users
of the service. The dashboard is always up-to-date and accurate, since service health is
calculated by combining monitoring data with ServiceWatch’s real-time service maps.
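As a rough sketch of how such tiles could be derived, the Python below colors each tile by health and sizes it in proportion to one configurable business metric, here the number of users. The color palette, the health labels and the choice of metric are illustrative assumptions; the paper does not specify them.

```python
from dataclasses import dataclass

# Hypothetical palette: the source only says tiles are color-coded by health.
COLORS = {"ok": "green", "minor": "yellow", "major": "orange", "critical": "red"}


@dataclass
class Tile:
    service: str
    color: str
    area_share: float  # fraction of the dashboard area given to this tile


def layout_tiles(services):
    """Build tiles from (name, health, users) tuples.

    Tile area is proportional to the business metric (users); if every
    service reports zero users, the area is split evenly instead.
    """
    total = sum(users for _, _, users in services)
    tiles = []
    for name, health, users in services:
        share = users / total if total else 1 / len(services)
        tiles.append(Tile(name, COLORS[health], share))
    return tiles


for tile in layout_tiles([("E-banking", "critical", 6000),
                          ("Email", "ok", 3000),
                          ("Payroll", "minor", 1000)]):
    print(f"{tile.service}: {tile.color}, {tile.area_share:.0%} of area")
```

Swapping the users figure for another metric, such as revenue per hour, changes only the numbers passed in, which is one way to keep the sizing configurable.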
Figure 2.
In the example shown below, an E-banking service
has a critical problem, as do several other services.
Simply clicking on the E-banking tile displays a
prioritized list of all of the events that are affecting
the service, along with their severity.
Similarly, clicking on one of the displayed events
highlights all of the business services that
are affected.
Figure 3.
ServiceWatch also lets IT staff define configurable alerts that are raised when conditions
requiring attention occur – such as when a business service enters a critical state.
This is done simply by selecting a trigger condition along with the set of services to which
the trigger applies. The alerts can be refined further depending on the type of trigger – for
example, the business service status change trigger shown on the right can be refined by
specifying the severities that generate the alert.
Alerts can be sent using a number of different mechanisms, including SNMP traps,
command line invocations and emails. This makes it easy to integrate ServiceWatch with
service desks and other systems, and also lets ServiceWatch alert IT staff directly
when problems occur.
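The trigger model described above (a condition, a set of services, and an optional severity refinement) could be sketched as follows. The class and function names are hypothetical, and the notification callback stands in for whichever delivery mechanism is configured, such as an email, an SNMP trap or a command-line invocation.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class StatusChangeTrigger:
    """Hypothetical 'business service status change' trigger."""
    services: Set[str]    # services the trigger applies to
    severities: Set[str]  # refinement: only these new states raise an alert

    def matches(self, service: str, old: str, new: str) -> bool:
        return old != new and service in self.services and new in self.severities


@dataclass
class AlertRule:
    trigger: StatusChangeTrigger
    notify: Callable[[str], None]  # e.g. send email, SNMP trap, run a command


def on_status_change(rules: List[AlertRule], service: str, old: str, new: str) -> List[str]:
    """Evaluate every rule for one status change and return the alerts sent."""
    sent = []
    for rule in rules:
        if rule.trigger.matches(service, old, new):
            message = f"{service} changed from {old} to {new}"
            rule.notify(message)
            sent.append(message)
    return sent


outbox = []  # stand-in for a real delivery channel
rules = [AlertRule(StatusChangeTrigger({"E-banking"}, {"critical"}), outbox.append)]
on_status_change(rules, "E-banking", "major", "critical")  # alert sent
on_status_change(rules, "E-banking", "ok", "minor")        # filtered out by severity
print(outbox)
```

Keeping delivery behind a plain callable is what makes integration with a service desk or another system a matter of supplying a different function.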
Top-Down, Automated Service Map Discovery
Traditional approaches to building service maps involve collecting bottom-up infrastructure
data from multiple domains. This siloed data is then combined manually into an overall
service topology – a process that can take weeks and needs to be redone every time
there is a change. ServiceWatch, on the other hand, discovers end-to-end service maps
automatically in as little as minutes, and keeps these maps up-to-date as changes occur.
To discover a business service, ServiceWatch simply needs the service entry point – such
as a URL or MQSeries queue. It then drills down through the IT infrastructure, tracing
the service across domains – including applications, load balancers, servers, middleware,
storage and networks. It probes each component in turn, making intelligent decisions about
which components to interrogate next based on the dependencies that it discovers. This
allows it to streamline and accelerate the discovery process, focusing only on what matters
from a service perspective. Irrelevant infrastructure data is eliminated, creating an accurate and
concise service map that simplifies the tasks of problem isolation and root cause analysis.
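Top-down discovery of this kind can be thought of as a graph traversal that starts at the entry point and only probes what it reaches. The Python below is a minimal breadth-first sketch under that assumption; the probe function and the dependency data it returns are hypothetical stand-ins for the product’s real per-technology probes.

```python
from collections import deque

# Hypothetical probe results: the downstream dependencies each component would
# report when interrogated. A real tool would need to gather this per
# technology, e.g. from configuration files or network connections.
PROBES = {
    "https://ebank.example.com": ["lb-1"],
    "lb-1": ["web-1", "web-2"],     # shared load balancer: two egress points
    "web-1": ["esb-1"],
    "web-2": ["esb-1"],
    "esb-1": ["sql-1"],
    "sql-1": [],
    "backup-server": ["tape-1"],    # unrelated infrastructure
}


def discover(entry_point, probe):
    """Breadth-first, top-down discovery from a service entry point.

    Only components reachable from the entry point are ever probed, so
    infrastructure that does not serve this service never enters the map.
    """
    nodes, edges = {entry_point}, set()
    queue = deque([entry_point])
    while queue:
        current = queue.popleft()
        for dependency in probe(current):
            edges.add((current, dependency))
            if dependency not in nodes:
                nodes.add(dependency)
                queue.append(dependency)
    return nodes, edges


nodes, edges = discover("https://ebank.example.com", lambda c: PROBES.get(c, []))
print(sorted(nodes))
```

Because the traversal never visits the backup server, its data is excluded from the map without any filtering step, which mirrors the "eliminate irrelevant infrastructure data" property described above.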
An example of this
discovery is shown
here. The complete
E-Banking service
has been discovered
from the initial service
entry point, including
applications, load
balancers, web server
clusters, enterprise
buses, databases and
network connections.
Figure 4.
Optimized For Dynamic Environments
ServiceWatch is designed for today’s dynamic
IT environments. It provides complete support
for virtualization and cloud, connecting directly
with management systems such as VMware
vCenter and Citrix XenCenter. It tracks
changes in dynamic infrastructure as they
occur, keeping its service maps completely
up-to-date as it does.
Figure 5.
The image above shows an HAProxy load
balancer running in a virtualized environment.
Users can see that the load balancer is
connected to a web server cluster, and also
have complete information about the hypervisor,
physical server and virtual machine, as well as
about the load balancer application itself.
Built-In Vendor and Technology Intelligence
ServiceWatch has a knowledge-driven approach to service mapping that makes it unique.
It analyzes IT vendor infrastructure intelligently so that it is able to map end-to-end services
without requiring expert input from IT personnel. This allows it to completely automate the
mapping process – although IT staff are free to make changes to generated service maps
once they have been discovered.
Because of this, ServiceWatch is also able to trace services automatically through shared
infrastructure such as load balancers and enterprise buses. It can also determine the true
service impact of problems in redundant infrastructure such as server clusters and parallel
network links.
Efficient Root-Cause Analysis
ServiceWatch provides comprehensive analysis capabilities that let IT personnel diagnose
the root cause of service issues quickly and accurately. Using the service map, IT staff
can see how issues propagate across the service topology, and can quickly home in on
the component causing the service issue. Each component is color-coded to indicate its
current health status, and clicking on a component brings up a list of all of the issues that
are affecting it.
For example, the E-banking service shown below has an issue due to a Microsoft SQL
Server – circled in orange. Clicking on the server displays an event list that shows there is a
free space problem. Selecting the event in turn highlights that a SQL Server Integration
Services (SSIS) component and a SQL Server Analysis Services (SSAS) component are also
experiencing symptoms caused by the root-cause free space issue
– both of which are circled in white.
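One simple way to separate a root cause from its symptoms on a service map is to treat an unhealthy component as a root-cause candidate only if none of the components it depends on are unhealthy. The Python sketch below applies that heuristic to a dependency map modeled loosely on the example; it is an illustrative assumption, not ServiceWatch’s actual analysis, and the component names are hypothetical.

```python
# Hypothetical dependency map: component -> components it depends on.
DEPENDS_ON = {
    "web-1": ["ssis-1"],
    "ssis-1": ["mssql-1"],
    "ssas-1": ["mssql-1"],
    "mssql-1": [],
}


def root_cause_candidates(unhealthy, depends_on):
    """Unhealthy components whose direct dependencies are all healthy."""
    return {c for c in unhealthy
            if not any(dep in unhealthy for dep in depends_on.get(c, []))}


def reaches(start, target, depends_on):
    """True if start depends on target, directly or transitively."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == target:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(depends_on.get(node, []))
    return False


def symptoms_of(root, unhealthy, depends_on):
    """Other unhealthy components that depend on the root cause."""
    return {c for c in unhealthy if c != root and reaches(c, root, depends_on)}


unhealthy = {"mssql-1", "ssis-1", "ssas-1"}
for root in root_cause_candidates(unhealthy, DEPENDS_ON):
    print(root, "->", sorted(symptoms_of(root, unhealthy, DEPENDS_ON)))
```

The heuristic only works as well as the dependency map it is given, which is why an accurate, current service map matters for this kind of analysis.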
Figure 6.
Once the IT component causing the issue has been identified, users can then drill down
into it to see which subcomponent(s) are causing the problem. This allows IT staff to
pinpoint the exact root cause of the service issue so that they can immediately take the
right corrective actions.
For example, the E-Banking service now exhibits the same free space issue on the Microsoft
SQL Server as before, but there is also a critical System Center Operations Manager (SCOM)
event on the SSIS component, as shown below. Drilling into the SSIS component shows that all
of its subcomponents are affected by the free space problem. However, one highlighted
subcomponent – a Windows service – is the source of the critical SCOM event.
Figure 7.
In addition to isolating service issues to the subcomponent level, ServiceWatch also uses
service topology to intelligently assess how infrastructure issues affect service health,
including issues with redundant infrastructure and communications paths.
As shown below, the E-Banking service is now experiencing a major issue with a web
server cluster consisting of two Apache instances. While the cluster is shown as having
a major problem, expanding the cluster shows that one of the instances has a critical file
system issue. ServiceWatch has automatically downgraded the overall cluster status to
major since the second Apache instance is still operating normally.
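The cluster behavior in this example can be expressed as a redundancy-aware aggregation rule. The Python below is a hypothetical rule that reproduces it: while at least one member is healthy, the cluster’s status is capped at major; only when no member is healthy does the worst member status pass through. The threshold and the severity names are assumptions.

```python
SEVERITY_ORDER = ["ok", "minor", "major", "critical"]


def cluster_status(member_statuses, min_healthy=1):
    """Aggregate member statuses for a redundant cluster (hypothetical rule).

    - every member ok                 -> "ok"
    - at least min_healthy members ok -> worst member status, capped at "major"
    - otherwise                       -> worst member status
    """
    if not member_statuses:
        raise ValueError("a cluster needs at least one member")
    worst = max(member_statuses, key=SEVERITY_ORDER.index)
    healthy = sum(1 for status in member_statuses if status == "ok")
    if healthy >= min_healthy:
        return min(worst, "major", key=SEVERITY_ORDER.index)
    return worst


print(cluster_status(["critical", "ok"]))  # one Apache instance still serving
```

Raising min_healthy models clusters that need more than one live member to carry the service’s load.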
At the same time, there is a failed port on a network device. In this case, ServiceWatch has
determined which communication paths within the service are affected and has highlighted
these in red to indicate a critical problem.
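Determining which communication paths a failed port affects amounts to checking which paths traverse it. In this minimal Python sketch, each path in a service is assumed to be recorded as an ordered list of (device, port) hops; the path, device and port names are hypothetical.

```python
# Hypothetical communication paths within one service: name -> (device, port) hops.
PATHS = {
    "lb-1 -> web-1": [("lb-1", "eth0"), ("switch-a", "port-3"), ("web-1", "eth0")],
    "lb-1 -> web-2": [("lb-1", "eth1"), ("switch-a", "port-4"), ("web-2", "eth0")],
    "web-1 -> sql-1": [("web-1", "eth1"), ("switch-b", "port-7"), ("sql-1", "eth0")],
    "web-2 -> sql-1": [("web-2", "eth1"), ("switch-b", "port-7"), ("sql-1", "eth0")],
}


def affected_paths(paths, failed_ports):
    """Names of the paths that traverse at least one failed (device, port)."""
    return {name for name, hops in paths.items()
            if any(hop in failed_ports for hop in hops)}


for name in sorted(affected_paths(PATHS, {("switch-b", "port-7")})):
    print("critical:", name)
```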
Figure 8.
Correlate Service Health Issues with Infrastructure Changes
Since change is one of the biggest reasons why services have issues, it is important to
be able to correlate changes with service problems. ServiceWatch makes it easy to do
this. It can show the topology of a service at any time in the past, and can also highlight
components that have been added, deleted or changed between any two points in time.
In the example shown below, a user has compared the service maps for an intranet service
over a six-day interval. The map clearly shows that an Apache instance and a Web server
instance were added during this time period.
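Comparing two snapshots of a service map reduces to a set comparison on component identifiers, plus an attribute comparison for components present in both. The sketch below assumes each snapshot is a dictionary from a hypothetical component ID to its recorded attributes; the component names echo the intranet example, but the snapshot data is hypothetical.

```python
def diff_service_maps(before, after):
    """Compare two service-map snapshots (component ID -> attributes).

    Returns the components added, deleted and changed between the snapshots.
    """
    added = set(after) - set(before)
    deleted = set(before) - set(after)
    changed = {c for c in set(before) & set(after) if before[c] != after[c]}
    return added, deleted, changed


# Hypothetical snapshots of an intranet service taken six days apart.
day_0 = {
    "lb-1": {"version": "1.5"},
    "apache-1": {"version": "2.4.10"},
    "web-1": {"version": "8.0"},
}
day_6 = {
    "lb-1": {"version": "1.5"},
    "apache-1": {"version": "2.4.10"},
    "apache-2": {"version": "2.4.10"},
    "web-1": {"version": "8.0"},
    "web-2": {"version": "8.0"},
}

added, deleted, changed = diff_service_maps(day_0, day_6)
print("added:", sorted(added), "deleted:", sorted(deleted), "changed:", sorted(changed))
```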
ServiceWatch also displays a chronological list of all of the service health issues that
occurred over the selected time period, and allows users to drill into individual changes to
get component-level details.
Figure 9.
Integrated Service Monitoring
ServiceWatch integrates easily with industry-standard monitoring tools including Nagios and
SolarWinds. Administrators simply need to configure a connection to the monitoring system
using a user-friendly wizard interface. Once the connection has been made, they can then
select the desired monitors for any IT component from the set of monitors that are currently
active in the third-party monitoring system.
ServiceWatch also comes with its own comprehensive set of business service monitors.
These provide a complete cross-domain monitoring platform, and offer service-oriented
metrics that track business service availability and business service performance. Unlike
traditional monitoring solutions, ServiceWatch automatically deploys its monitors to the
appropriate IT components when specific monitors are selected. This vastly simplifies
the task of configuring and deploying service-focused monitoring in the IT network, and
ensures that all the information needed to track business service health is collected.
Figure 10.
Conclusion
High-quality business services are the cornerstone of an agile and efficient business. They
underpin almost every aspect of business operations, ranging from back-office functions
such as supply chain through to customer care and strategic planning. When they are
robust and responsive, they create competitive differentiation and unlock new business
opportunities.
However, delivering high-quality business services consistently is the single biggest
challenge that IT organizations face. They struggle to gain visibility of the health of their
business services, and lack the service-oriented information they need to isolate, diagnose
and resolve service issues quickly. They are caught in a perfect storm of ever-increasing
business demands, constant change and limited information to accomplish their mission.
ServiceNow ServiceWatch provides IT organizations with the service visibility they need to
transform the way that they operate. It tracks how services are delivered across the entire IT
infrastructure – including virtualized and physical servers, applications, middleware, storage
and networks. Its service maps are always up-to-date and accurate, providing a rock-solid
foundation for detecting and isolating service problems, and for diagnosing their root cause.
Unlike traditional mapping approaches that can take weeks, ServiceWatch automatically
creates service maps in as little as minutes, and keeps them up-to-date as changes occur.
ServiceWatch gives a comprehensive top-down view of business service health, starting
with high-level dashboards and extending all the way through to isolation of issues to
individual subcomponents within the IT infrastructure. It helps IT organizations to detect
and respond more quickly to critical service issues, and gives them the tools they need to
resolve these issues quickly and cost-effectively. The result is vastly improved service quality
levels, more responsive and agile IT organizations, and dramatically reduced costs.
www.servicenow.com
©2015 ServiceNow, Inc. All rights reserved.
ServiceNow believes information in this publication is accurate as of its publication date. This publication could include technical inaccuracies or typographical errors. The information is subject to change
without notice. Changes are periodically added to the information herein; these changes will be incorporated in new editions of the publication. ServiceNow may make improvements and/or changes in
the product(s) and/or the program(s) described in this publication at any time. Reproduction of this publication without prior written permission is forbidden. The information in this publication is provided
“as is”. ServiceNow makes no representations or warranties of any kind, with respect to the information in this publication, and specifically disclaims implied warranties of merchantability or fitness for a
particular purpose.
ServiceNow is a trademark of ServiceNow, Inc. All other brands, products, service names, trademarks or registered trademarks are used to identify the products or services of their respective owners.
SN-WP-RCA-072015