ITimpulse NOC process

This is an interactive, detailed, step-by-step guide explaining how alerts are
managed at our NOC.
This document contains information that is considered proprietary and confidential. No information contained
in this document may be released, re-printed, or redistributed without prior permission from ITimpulse.
How to navigate this PPT
• View this presentation in Slide Show (full-screen) mode. Press F5 to start Slide Show mode.
• Do not navigate using the keyboard.
• Use your mouse and click the on-slide buttons; each one redirects you to the appropriate slide.
• The back and next buttons take you to the previous and the next slide.
• The more-information button redirects you to further detail on the topic.
• The home button takes you back to the first slide.
Alert detected
• An alert is generated by the RMM and an
email is sent to the NOC.
• Our Service desk responds to the alert
within minutes.
• Service desk checks if the alert is valid.
Valid
Invalid
• Service desk sets the priority and directs
the ticket to the correct resource in our NOC
team.
Valid Alerts
• Valid alerts are categorized and
assigned a priority depending on our
SLA.
• They are then assigned to L1 techs.
Urgent
High
Server Outage
Work request
Ticket Life Cycle
Low
Urgent Priority
• All urgent requests are responded to within 10 minutes. In other words, an
engineer is working toward resolving the problem within 10 minutes.
• Urgent priority tickets (excluding server outages) are assigned directly to L2
engineers.
• An L3 engineer gets involved if the problem is not resolved within an hour.
Typical Urgent alerts
Handling Server Outage
A server outage is categorized as urgent.
The NOC performs these steps to determine whether it is
a network problem or a server crash.
1. Check for a scheduled outage.
2. Check whether other devices at the same site are online.
3. Ping the site's public IP.
4. Try to access the device from another computer
on the network.
Server Down
Network Down
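The four verification steps above can be sketched as a small decision function. This is only an illustration: the `SiteChecks` fields, the `classify_outage` name, the result labels, and the order in which the check results are combined are assumptions, not a description of the NOC's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class SiteChecks:
    """Results of the four manual checks (hypothetical field names)."""
    scheduled_outage: bool       # step 1: a scheduled outage is in effect
    other_devices_online: bool   # step 2: other devices at the same site respond
    public_ip_pingable: bool     # step 3: the site's public IP answers a ping
    reachable_from_peer: bool    # step 4: device reachable from another computer


def classify_outage(checks: SiteChecks) -> str:
    """Return 'scheduled', 'server_reachable', 'server_down' or 'network_down'."""
    if checks.scheduled_outage:
        return "scheduled"
    if checks.reachable_from_peer:
        # Assumption: a device that answers from inside the network is not down.
        return "server_reachable"
    if checks.other_devices_online or checks.public_ip_pingable:
        # The site itself is reachable, so the fault is isolated to the server.
        return "server_down"
    # Nothing at the site responds, so treat it as a network problem.
    return "network_down"
```

For example, `classify_outage(SiteChecks(False, True, True, False))` yields `"server_down"`, while a site where nothing responds yields `"network_down"`.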
Server down
If a server is confirmed as offline, the NOC
performs the following actions:
1. Check whether the server reboots and comes back up.
Yes / No
2. Access the device using iLO/DRAC.
3. If the server is virtualized, check access from the host
machine.
4. Inform the customer.
Server reboots
Because we set all servers to reboot
automatically after a BSOD, they
usually come back up. If they do:
1. Once the server is back online, our engineers
perform a root cause analysis of the issue.
2. We implement a fix and monitor the server
for another 7 days.
Server stays offline
Because we set all servers to reboot
automatically after a BSOD, they
usually come back up. If they don't:
1. We inform you that the server is
offline and needs onsite attention.
2. We document the probable cause and everything
we have tried in the ticket.
3. Our engineer is available to help when
someone gets onsite.
Network down
The NOC checks whether other devices at
the site are online. Yes / No
We ping the gateway to see whether it is
an internet connection issue. Yes / No
Inform Customer
• We will call a number you have provided;
which number depends on the time of day.
• We will email you a description of the problem
along with the results of our investigation.
• All troubleshooting will be documented
in detail in your PSA.
High Priority
• All high priority requests are responded
to within 30 minutes.
• An L2 engineer gets involved on the ticket
within 60 minutes, and an L3 engineer if the
problem is not resolved within 4 hours.
• We resolve most high priority tickets within
24 hours.
Typical High priority alerts
Low priority
• All low priority requests are responded
to within 4 hours.
• An L2 engineer gets involved on the
ticket if it cannot be resolved after 1
hour of troubleshooting.
• We resolve most low priority tickets within
48 hours.
Typical Low priority alerts
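The escalation timings stated for the three priorities can be summarized in a lookup table. The sketch below is illustrative only: the table layout and the `expected_tier` function are hypothetical, urgent server outages (which the slides exclude from direct L2 assignment) are not modeled, and "1 hour of troubleshooting" for low priority is approximated as 60 minutes unresolved. No L3 timing is stated for low priority, so none is encoded.

```python
# Minutes after which each tier is expected to be involved, per priority.
# The numbers come from the slides; the data structure is an assumption.
ESCALATION_MINUTES = {
    "urgent": {"L2": 0, "L3": 60},   # assigned to L2 directly, L3 after an hour
    "high": {"L2": 60, "L3": 240},   # L2 within 60 minutes, L3 after 4 hours
    "low": {"L2": 60},               # L2 after 1 hour of troubleshooting
}


def expected_tier(priority: str, minutes_unresolved: int) -> str:
    """Return the highest support tier that should be involved by now."""
    thresholds = ESCALATION_MINUTES[priority]
    tier = "L1"
    for name in ("L2", "L3"):
        # Move up a tier once the ticket has been open past its threshold.
        if name in thresholds and minutes_unresolved >= thresholds[name]:
            tier = name
    return tier
```

Under these assumptions, a high priority ticket that has been open for 90 minutes maps to `"L2"`, and an urgent ticket open for 60 minutes maps to `"L3"`.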
Examples of urgent priority alerts
• Server down alert
• Critical Event Viewer error
• Critical RMM alert
• Event Viewer error that leads to a critical error
• Device failure caused by patch deployment
• Database offline alert
• Scheduled task failure
• Exchange service outage alerts
• Server performance threshold alert
Examples of high priority alerts
• Non-critical Event Viewer error
• Event Viewer warning
• Non-critical RMM alert
• Server antivirus scan or update alert
• Server malware infection alert
• Server backup failure alert
• Scheduled task failure
• Software or RMM agent deployment
Examples of Low priority alerts
• Event Viewer alerts from workstations
• Performance issues on workstations
• Workstation backup failure alert
• RMM alert for workstations
• Workstation antivirus scan or update alert
• Workstation malware infection alert
• Software or RMM agent deployment
• Workstation patch installation failure alert
• Patch approval
Work request
• More info on Work requests
• All work requests are responded to within
4 hours.
• All work requests are resolved within 24
hours.
• The time may vary depending on the
scope of the request.
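The response times quoted in this deck (10 minutes for urgent, 30 minutes for high, 4 hours for low priority and for work requests) can be turned into a deadline check. This is a minimal sketch: the `RESPONSE_SLA` table and both function names are hypothetical, and it assumes the clock starts when the ticket is created and runs around the clock.

```python
from datetime import datetime, timedelta

# Response targets as stated in the slides; the key names are assumptions.
RESPONSE_SLA = {
    "urgent": timedelta(minutes=10),
    "high": timedelta(minutes=30),
    "low": timedelta(hours=4),
    "work_request": timedelta(hours=4),
}


def response_deadline(priority: str, created_at: datetime) -> datetime:
    """Latest time by which the ticket should receive a first response."""
    return created_at + RESPONSE_SLA[priority]


def is_response_breached(priority: str, created_at: datetime, now: datetime) -> bool:
    """True if no response has been possible within the target window."""
    return now > response_deadline(priority, created_at)
```

For a ticket created at 09:00, `response_deadline("urgent", ...)` gives 09:10 on the same day.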
Invalid alerts
• Invalid alerts are closed and properly
documented.
Ticket Life Cycle
The service desk is our front line of support. It performs the tasks below on
every ticket. The service desk does not perform any troubleshooting.
Acknowledge
• An alert generated by your RMM creates a ticket in
the PSA. For devices managed by our NOC, this alert
is forwarded to the NOC’s Board or queue.
Validate
• Our Service desk team validates these alerts and
removes the false positives. The validated alerts are
then prioritized and categorized.
Assign
• Our Service desk assigns the ticket to the right
resource. If something needs to be done at a later
time, they also schedule it.
Assigned to L1
L1 receives tickets assigned by SD. Our L1 team follows our internal
knowledge base and documented resolutions to resolve a problem.
If input is needed, we contact you.
• Resolved: the majority of tickets are resolved by the L1 team.
• Unresolved: tickets are escalated to L2.
• Monitor: where resolution cannot be confirmed immediately.
Escalated to L2
• When a ticket cannot be resolved with known
procedures, it is escalated to L2.
• All our L2 engineers are MCITP certified and
have over 3 years of experience.
• L2 engineers find the root cause and resolve
the problem.
• Depending on priority, they get 30 minutes to 4
hours to research and resolve the problem.
• Any tickets that are not resolved are further
escalated to L3.
Assigned to L2
L2 receives tickets assigned by SD. Depending on priority, L2
engineers get 30 minutes to 4 hours to research and resolve
the problem. If input is needed, we contact you.
• Resolved: resolved tickets are documented and closed.
• Unresolved: tickets are escalated to L3.
• Monitor: where resolution cannot be confirmed immediately.
Escalated to L3
• When a ticket cannot be resolved at L2, it is
escalated to L3.
• L3 is our last tier of support. Our L3 engineers
have over 6 years of experience in the field
and are also subject matter experts in a
field of their choice.
• In the rare circumstance that a ticket cannot be
resolved by an L3 engineer, we will call you to discuss
how to proceed.
Assigned to L3
L3 receives escalations from L2. L3 engineers form our final tier
of support.
• Resolved: resolved tickets are documented and closed.
• Unresolved: tickets are escalated.
• Monitor: where resolution cannot be confirmed immediately.
Resolved tickets
• Resolved tickets are fully documented in the
PSA.
• An appropriate time entry is added in the
PSA.
• Ticket is marked closed.
• Our Quality team reviews whether the ticket was properly
closed.
• If a ticket was closed by an L2 or L3 engineer,
that engineer creates a new solution article for the
problem in our internal KB.
Assigned to Customer
• Any tickets that need physical access to
the site are assigned to the customer.
• Tickets where more information is
required for resolution are assigned to
the customer.
• Only 1 in every 50 tickets will require
your attention.
Unresolved by L3
• This often means that we have reached
a dead end and may need a
workaround or replacement because the
problem cannot be resolved.
• Our L3 engineer will call you and
discuss the available options, their downsides,
and the time each will take to
implement.
• Any changes will only be made after
your approval.
Ticket on hold for monitoring
• Some tickets may be resolved but need
confirmation before closure.
• Such tickets are assigned back to the SD
team and put on hold for a specified
period of time.
• After the period of time has passed, the
SD team checks if the issue is resolved.
• Resolved tickets are closed. Unresolved
tickets are reassigned to engineers.
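The hold-and-review rule described above can be sketched as a single function. The name `review_held_ticket`, its parameters, and the status strings are hypothetical; how the SD team actually confirms that the issue is resolved is not specified here and is passed in as a plain boolean.

```python
from datetime import datetime, timedelta


def review_held_ticket(
    held_at: datetime,
    hold_period: timedelta,
    now: datetime,
    issue_resolved: bool,
) -> str:
    """Return the ticket's next status: 'on_hold', 'closed' or 'reassigned'."""
    if now < held_at + hold_period:
        # The specified monitoring period has not passed yet.
        return "on_hold"
    # After the hold period, close if resolved, otherwise send back to engineers.
    return "closed" if issue_resolved else "reassigned"
```

The 7-day monitoring window mentioned for rebooted servers would correspond to `hold_period=timedelta(days=7)` in this sketch.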
End of ticket Life Cycle
This brings us to the end of the Ticket Life
Cycle section. Press the back button
below to go to the previous section.
Click home to return to the beginning of the
slide show.
To learn more about how to get started with NOC services, our NOC
onboarding process, and how we integrate with your existing tools to deliver
seamless NOC services, schedule a web demo with us.
Email [email protected] to schedule a live demo.
ITimpulse provides RMM-agnostic, white-label NOC services for MSPs.
For further inquiries and information please feel free to contact us at:
US: +1 646-351-8634 India: +91 020-6500-2328
Email: [email protected] Website: www.itimpulse.in
Direct mail: ITimpulse, B112, Ganga Osian Square, Wakad, Pune – 411057