
Cisco IT Case Study – May 2013
Smart Call Home for Cisco UCS
Powered by Smart Call Home, Device Diagnostics
Reduce Time to Resolution for Cisco Unified
Computing System by 80 Percent
Total time to resolve issues drops from 30 hours to less than 6; automated support
decreases time customers spend monitoring, troubleshooting, and remediating
issues.
EXECUTIVE SUMMARY
BUSINESS SITUATION AND CHALLENGE
● No mechanism for monitoring health of the
hardware and automating alerts when issues
are detected
● Process to resolve faults in customer servers
averaged around 30 hours
SOLUTION AND BENEFITS
● Smart Call Home sends proactive alerts for
early detection of potential business-impacting
problems and recommended remediation
● Automatically generates support cases for
critical issues, routes to appropriate TAC team
● Enables fast, web-based access to analysis,
remediation recommendations, Call Home
device information, and more
Business Situation and Challenge
Smart Call Home, a Cisco® software feature, provides a mechanism
for automatically creating support cases and updating Cisco IT,
customers, or partners about events and changes on a Cisco device
in a customer network. Smart Call Home is supported on a broad
range of Cisco products, from routers and switches to security,
unified communications, and data center devices including Cisco
Nexus® and Multilayer Director Switches and Cisco Unified
Computing System™ (Cisco UCS®). Smart Call Home is provided to
Cisco customers with SMARTnet® and other specified service
contracts.
BUSINESS IMPACT AND METRICS
● Increased operational efficiency; time to
resolution under 6 hours versus 30 hours
● Higher server availability; devices continuously
monitored with secure, connected service
NEXT STEPS
● Continue to evolve the tool’s functionality and
delivered value (usability, supported Cisco
devices, depth and breadth of intellectual
capital)
When enabled, Smart Call Home looks for a specific set of faults that Cisco has identified through interaction
with Cisco Technical Assistance Center (TAC) engineers, the Cisco support community, and developers. “These
faults are considered significant events,” says Bryan Williams, technical marketing engineer, Technical
Services Product Management at Cisco. “Essentially we are helping reduce the ‘noise’ for customers in their
network management efforts.”
Several steps are required to isolate, troubleshoot, and remediate the problem after a significant event is detected.
Traditionally this resolution process has averaged about 30 hours from beginning to end, based on aggregate
support case data of typical IT interaction with Cisco.
“We wanted to automate as much of this interaction as possible. That’s what the Smart Call Home service does,”
says Williams. “Instead of waiting for a user to notice a problem or a fault to escalate and be reported, Smart Call
Home proactively identifies and diagnoses faults before they can affect the business.”
Solution and Benefits
When a significant event is detected, it is reported to Cisco IT through the Smart Call Home system, which
compares the alerts to an extensive set of rules based on Cisco’s intellectual capital. These rules determine when
the problem occurred and on what device in the network, the potential impact of the failure, and the analysis and
recommended steps to remediate the problem. Data, messages, and notifications transmitted to and from the
Smart Call Home system are encrypted for security.
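The rule comparison described above can be pictured with a small sketch. Everything in it is an assumption made for illustration: the `Alert` and `Rule` shapes, the fault code strings, and the `diagnose` function are hypothetical and do not reflect the actual Smart Call Home rule format, which this case study does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Alert:
    device: str       # hypothetical device identifier
    fault_code: str   # hypothetical fault code carried by the message
    timestamp: str    # when the device reported the fault


@dataclass(frozen=True)
class Rule:
    impact: str           # potential impact of the failure
    recommendation: str   # recommended remediation steps


# Hypothetical rule set standing in for codified intellectual capital.
RULES = {
    "psu-failure": Rule("loss of power redundancy", "replace the power supply"),
    "fan-degraded": Rule("risk of thermal shutdown", "inspect the fan module"),
}


def diagnose(alert: Alert) -> Optional[dict]:
    """Return when, where, impact, and remediation for a known fault, else None."""
    rule = RULES.get(alert.fault_code)
    if rule is None:
        return None  # not a significant event under this rule set
    return {
        "when": alert.timestamp,
        "device": alert.device,
        "impact": rule.impact,
        "recommendation": rule.recommendation,
    }
```

A lookup keyed on fault code is the simplest possible matching strategy; a production rule engine would presumably also correlate several messages before drawing a conclusion.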
The Smart Call Home automated workflow consists of the following tasks:
● Incoming messages are analyzed and correlated. The system assesses their severity and activates a
notification sequence based on profiles set up by the customer that specify who should be notified, how
messages should be transmitted, and what types of events should trigger alerts. The notifications contain
analysis, troubleshooting data, and remediation recommendations generated automatically based on Cisco
intellectual capital.
“Our goal,” says Williams, “is to give customers the tools they need to resolve issues quickly themselves,
without Cisco TAC intervention.”
● All the notification and supporting content is sent to a web portal that the customer and Cisco TAC
engineers can access securely. In addition to diagnostic and remediation data, the portal contains
up-to-date information about Call Home-enabled devices personalized for the customer.
● For faults deemed complex or severe, Smart Call Home initiates a support case with Cisco TAC on the
customer’s behalf. The support case includes detailed diagnostics and is sent directly to a TAC engineer
with experience in resolving the issue. The TAC engineer contacts the customer directly to troubleshoot the
problem.
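The workflow above (assess severity, notify according to customer profiles, open a case for severe faults) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Severity` levels, the `Profile` fields, and the rule that only critical faults open a case are all hypothetical, since the case study does not specify how Smart Call Home represents profiles or severity.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Severity(IntEnum):
    MINOR = 1
    MAJOR = 2
    CRITICAL = 3


@dataclass
class Profile:
    """Hypothetical customer profile: who is notified, how, and for what."""
    recipient: str
    transport: str                                  # for example "email"
    event_types: set = field(default_factory=set)   # event types of interest
    min_severity: Severity = Severity.MINOR


def route(event_type: str, severity: Severity, profiles: list) -> dict:
    """Pick notification targets and decide whether a support case is opened."""
    notify = [
        (p.recipient, p.transport)
        for p in profiles
        if event_type in p.event_types and severity >= p.min_severity
    ]
    # Assumption: only the most severe faults open a TAC case automatically.
    open_case = severity == Severity.CRITICAL
    return {"notify": notify, "open_case": open_case}
```

With one profile subscribed to major hardware events and another to all hardware events, a minor hardware fault would reach only the second recipient and open no case, while a critical one would reach both and open a case.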
“In Cisco IT, automation is becoming a mainstay for our operations.
When we activated Smart Call Home, its automated capabilities helped
us gain efficiencies that encompassed faster resolution time, increased
network uptime, and cost savings.”
—John Manville, Senior Vice President, Global Infrastructure Services, Cisco
Cisco IT: Smart Call Home on UCS
In 2008 when Cisco IT began migrating to Cisco UCS internally, Smart Call Home was a service-level option
available to clients. As the first major internal customer of Cisco UCS, the UCS operations team has experienced
the implementation, and evolutionary bumps, of Smart Call Home firsthand. The team has nearly 200 UCS
clusters globally. A standard cluster consists of 2 Cisco UCS Fabric Interconnects, 5 Blade Server Chassis, and
40 Blade Servers. Call Home is supported within the embedded Cisco UCS Manager, which provides centralized
management of all Cisco UCS software and hardware components across multiple chassis.
Before migrating to Cisco UCS, the operations team had been using third-party monitoring tools that performed
diagnostics at the hardware level. “We were looking to replicate the same type of information and thought Smart
Call Home could be our route to accomplish that,” says Sara Kogut, a services manager in Cisco IT who ran the
UCS operations readiness efforts during the migration. “Without an alerting tool like Smart Call Home we wouldn’t
have insight into the health of the hardware.”
Soon after enabling Smart Call Home, the UCS operations team encountered a big challenge—figuring out how to
process the massive volume of alerts the tool was generating. What did each alert mean? Which alerts signaled
real problems that required action or TAC involvement, and which ones were non-issues?
The team discovered that the tool was initially “over-alerting” (for example, picking up false positives for thermal
readings). Meanwhile, the Cisco UCS migration was gaining momentum, and the operations team needed an
environment that would scale with the growth.
“Cisco UCS operations is a global, virtual team. During the first year of growth, we realized that the Smart Call
Home alerts would best be dealt with by a hands-on team with on-site presence in our major data center—a
vendor that could verify issues manually when needed,” says Kogut.
Both Kogut’s team and the vendor were learning about the tool simultaneously. The vendor’s training focused on
how to categorize alerts, what to look for, when to follow up, and how to close support cases. As the UCS
environment grew, the vendor supplemented Smart Call Home alerts with physical verification, by manually
opening up the UCS chassis and looking through UCS Manager to confirm the detected problem areas.
According to Kogut, “There was a good 12 months of ramp-up time before we could all use the tool efficiently.”
“The learning curve was particularly rough initially because of the sheer number of emails that were generated by
Smart Call Home,” says Rita Gapuz, IT engineer, Milestone Technologies. “Two emails are generated for every
alert, one for notification of the problem and the other for recovery. It was challenging to go through every email.
My team’s inboxes were getting maxed out extremely fast.”
Gapuz and her team have become adept at classifying alerts by hardware type and resolution category, and
they assign people to look for and follow up on specific alert categories. It is a model that scales well within the Cisco
UCS environment.
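The triage model described here can be sketched in a few lines. The sketch assumes that each problem email and its matching recovery email share an alert identifier, and that each email names a hardware type; the dictionary keys `alert_id`, `kind`, and `hardware` are hypothetical, as the case study does not describe the email format.

```python
from collections import defaultdict


def triage(emails):
    """Group alerts that have not yet recovered by hardware type.

    Each email is a dict with hypothetical keys:
    alert_id, kind ("problem" or "recovery"), and hardware.
    """
    recovered = {e["alert_id"] for e in emails if e["kind"] == "recovery"}
    open_by_hardware = defaultdict(list)
    for e in emails:
        if e["kind"] == "problem" and e["alert_id"] not in recovered:
            open_by_hardware[e["hardware"]].append(e["alert_id"])
    return dict(open_by_hardware)
```

Dropping alerts that already have a recovery email leaves only the items that still need follow-up, and grouping by hardware type mirrors the way the team assigns people to specific alert categories.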
“From a support standpoint, we don’t have to worry about logging into every UCS cluster and doing a health check
because Smart Call Home is actively doing that,” says Gapuz. For actionable alerts, the tool also prevents the
vendor from having to open up the UCS chassis manually.
Since the internal Cisco UCS implementation began, Kogut’s team and the vendor have provided feedback to the
Smart Call Home development group. Their input helped abate over-alerting and refine other functionality in
subsequent releases of the tool.
Business Impact and Metrics
The primary benefit of Smart Call Home is that it is proactive, notifying users of problems before they can affect
the business. The diagnostics, analysis, and recommendations automatically generated by Smart Call Home
decrease troubleshooting time and minimize the number of customer support cases that must be opened.
Faster Time to Resolution
Cisco Operations, IT, and TAC personnel have experienced significant decreases in time to resolution with Smart
Call Home. Based on aggregate case data compiled for a typical IT interaction with Cisco to identify, troubleshoot,
and remediate a business-impacting fault, the use of Smart Call Home has reduced the total time to resolution
from 30 hours to less than 6 (Figure 1).
Following are the steps and elapsed time allocations for a typical fault resolution scenario before and after using
Smart Call Home.
Fault Resolution without Smart Call Home
A device fault occurs and worsens. The problem is eventually noticed by an end user while running an application
to execute a sale.
1. Isolate (30 minutes)
Operations must identify the source device or component with the failure. This step might require input from
business analysts, developers, or several IT people.
2. Troubleshoot (2 hours)
After the device is isolated, IT needs to figure out why it is not working. Several resources might be consulted to
find out what the error means and if anyone has successfully resolved the problem, such as logs, Error
Message Decoder, Cisco Support Community, and external references.
3. Open support case (2 hours, 15 minutes)
IT gathers the required information, such as contract and product serial number(s), and opens a support case
on the customer’s behalf via Cisco.com or by contacting the TAC directly.
4. Engage with Cisco TAC (26 hours, 15 minutes elapsed since device fault detected)
The bulk of remediation time is spent on this step. The technical support engineer asks the customer for
information, including an updated configuration, show commands, and logs. The customer uploads the information
to an FTP server and sends an email to the engineer. After the logs are received and analyzed, the TAC engineer
establishes a Cisco WebEx® conferencing session to troubleshoot the problem with the customer. In this case,
a device component needs to be replaced.
5. Issue Return Material Authorization (RMA) and dispatch part (assumes 4-hour replacement coverage)
The Cisco TAC engineer creates an RMA, and the replacement part is shipped to the customer per service-level
agreement requirements. Customer replaces the problem component.
Total time to resolution without Smart Call Home = 30 hours
Fault Resolution with Smart Call Home
During Smart Call Home 24-hour proactive monitoring, a device fault is detected. Based on rules contained within
Smart Call Home, the system isolates the source of the problem and automatically generates analysis and
recommended steps for remediation.
1. Isolate (0 minutes)
2. Troubleshoot (0 minutes)
3. Open support case (12 minutes)
A support case is opened on behalf of the customer automatically. The support case routes directly to a Cisco
TAC engineer who has experience resolving the problem. Significant data related to the fault are sent to the
secure web portal.
4. Engage with Cisco TAC (42 minutes)
The right support engineer has immediate access to the analysis, recommendations, and pertinent customer
information in the web portal. The Cisco TAC engineer informs the customer of the problem and verifies the steps to
remediation.
5. Issue RMA and dispatch part (1 hour, 27 minutes elapsed since device fault detected)
An RMA is automatically created and the replacement part is sent to the customer. Four-hour replacement
coverage is assumed.
Total time to resolution with Smart Call Home = 5 hours, 27 minutes
Figure 1. Total Time to Resolution Before and After Using Cisco Smart Call Home
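The totals above can be checked with a short calculation. It reads the step labels as cumulative elapsed time since the fault, which is an interpretation rather than something the case study states: with Smart Call Home the RMA is issued at 1 hour, 27 minutes, and the assumed 4-hour replacement coverage brings the total to 5 hours, 27 minutes.

```python
from datetime import timedelta

# Figures taken from the case study. Treating the step labels as cumulative
# elapsed time is an assumption made for this check.
RMA_ISSUED_WITH_SCH = timedelta(hours=1, minutes=27)
REPLACEMENT_COVERAGE = timedelta(hours=4)
TOTAL_WITHOUT_SCH = timedelta(hours=30)

total_with_sch = RMA_ISSUED_WITH_SCH + REPLACEMENT_COVERAGE
reduction = 1 - total_with_sch / TOTAL_WITHOUT_SCH

print(total_with_sch)        # 5:27:00
print(f"{reduction:.1%}")    # 81.8%
```

The result, a reduction of roughly 82 percent, is consistent with the headline figure of 80 percent.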
Self-Help Troubleshooting, Fewer Support Cases
The following Smart Call Home data, reported by Cisco IT, are actual totals for messages, notifications, and
support cases generated for a customer’s Cisco UCS deployment in the three months of May through July 2012.
This customer has a large Cisco UCS setup consisting of approximately 175 domains and 8000 serviceable
devices and subsystem components:
Total messages (alerts) received by Smart Call Home: 28,754
Total email notifications sent to the customer: 9149
Total support cases initiated: 411
“Even though Smart Call Home sent us more than 28,000 messages, less than a third of them resulted in email
notifications to the customer,” says Williams. “The intellectual capital and rules in Smart Call Home allow us to
significantly reduce the number of messages that customers have to deal with. We strive to notify customers only
when there’s something they can do to resolve the issue.”
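The proportions behind these figures are easy to verify. The snippet below uses only the three totals reported above; the variable names are illustrative.

```python
messages = 28_754        # alerts received by Smart Call Home
notifications = 9_149    # email notifications sent to the customer
cases = 411              # support cases initiated

notify_share = notifications / messages
case_share = cases / messages

print(f"{notify_share:.1%}")   # 31.8%
print(f"{case_share:.1%}")     # 1.4%
```

About 32 percent of messages produced a customer notification, which matches the "less than a third" in the quotation, and only about 1.4 percent led to a support case.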
The system analyzes and correlates the incoming messages, assesses the severity, and activates a notification
sequence based on profiles created by the customer. Notifications contain analysis, recommendations, and
troubleshooting data that give customers the tools they need to resolve issues themselves quickly, without Cisco
TAC intervention.
In the customer example (Figure 2), 28,000-plus total messages generated by the system within the 90-day period
resulted in only 411 support cases. In these instances, Smart Call Home proactively generated a Cisco TAC
support case, with detailed diagnostics and relevant debugging information attached. Support cases were routed
to the appropriate Cisco engineer who could resolve the problem. Much of the data that Cisco TAC engineers
need from customers, such as show commands, was also automatically generated and sent to a secure web
portal that the TAC engineer and customer could access, reducing Mean Time to Repair (MTTR).
Figure 2. Troubleshooting a Cisco UCS Deployment with Smart Call Home
Next Steps
The Smart Call Home team aggressively strives to evolve the tool’s capability and increase the level of value
delivered to Cisco customers and partners, with an emphasis on three areas:
● Growth and maturity of intellectual capital library
● Supported Cisco devices
● Increased functionality through feature releases
Intellectual Capital
Intellectual capital is the foundation upon which Smart Call Home is built. It consists of many bits of knowledge
gathered from within Cisco. This knowledge is codified into rules that help automate decisions about the types of
activities monitored for each device, the diagnostic path collected when a given fault is identified, the steps
followed to troubleshoot and resolve an issue, and the Cisco resources proactively engaged for critical issues.
In 2012 the Smart Call Home intellectual capital library grew by more than 91 percent, from 1616 assets to 3089
assets. At this rate, more than 200 assets are being added to the library each month, and the Smart Call Home
team anticipates accelerating this growth over time.
In late 2012 and early 2013, improvements to internal processes enabled the Smart Call Home team to release
more intellectual capital assets into production faster than in previous years.
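The growth figures can be worked through directly from the two asset counts given for 2012. The twelve-month averaging is an assumption made for this check; the case study does not say how its monthly figure was derived.

```python
start_assets = 1616   # library size at the start of 2012
end_assets = 3089     # library size at the end of 2012

added = end_assets - start_assets
growth = added / start_assets
monthly_average = added / 12   # assumes growth spread evenly over 12 months

print(added)                       # 1473
print(f"{growth:.1%}")             # 91.2%
print(f"{monthly_average:.0f}")    # 123
```

The growth rate matches the "more than 91 percent" stated above. The even twelve-month average comes to about 123 assets per month, so the "more than 200" monthly figure presumably reflects the faster release pace reached in late 2012 and early 2013 rather than the full-year average.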
Supported Cisco Devices
Smart Call Home is supported across a broad range of Cisco products, with coverage for the vast majority of
devices in the Borderless Networks and data center solution categories, as well as key devices in security and
collaboration. The inclusion of Smart Call Home in additional Cisco devices will grow along with the company’s
product portfolio.
Feature Releases
This is the most dynamic area of the Smart Call Home evolution and, by design, is heavily influenced by customer
interaction. The next few Smart Call Home feature releases are designed to focus on the following:
● Simplification of registration and entitlement processes
● APIs to enable new integration and consumption models, both for customers and Cisco partners as well as
internal stakeholders including TAC, IT, and product teams
● Mobility applications
● Platform scalability and performance
● Improved customer feedback mechanisms
For More Information
To read additional Cisco IT case studies about a variety of business solutions, visit Cisco on Cisco: Inside Cisco IT
www.cisco.com/go/ciscoit
Note
This publication describes how Cisco has benefited from the deployment of its own products. Many factors may
have contributed to the results and benefits described; Cisco does not guarantee comparable results elsewhere.
CISCO PROVIDES THIS PUBLICATION AS IS WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR
IMPLIED, INCLUDING THE IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A
PARTICULAR PURPOSE.
Some jurisdictions do not allow disclaimer of express or implied warranties, therefore this disclaimer may not apply
to you.
© 2013 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public.