Agentless or Agent-based Monitoring

Agentless or Agent-based Monitoring?
The never-ending story
Document version: 1.2
Date: November 2007
Which one is better, agentless or agent-based? There has been a lot of debate about this, and the
discussion seems endless. Because we receive many questions on the subject, this paper examines the
question in the light of new developments, Tango/04 experience, and customer requests.
Nevertheless, as we say at the end, remember that what you do with the collected data is far more
important than the method used to collect it.
What is Agent-based?
Agent-based monitoring involves the deployment of a software program that runs natively on the
monitored element. This is commonly a third-party program or script that is not part of the original
element. One example is the Windows Server Agent from Tango/04, a Windows program that you
need to install and properly configure on a Windows server.
What is Agentless?
Agentless monitoring refers to the ability to monitor an element without the need to install a third-party
program first. Usually, agentless monitoring leverages some standard technology that is already
included in the monitored element, for instance, the SNMP [1] services that are commonly found in most
operating systems today.
Typically, you can manage the monitoring capabilities from a remote point of control. Tango/04 offers
the ThinkServer Monitoring Engine, which is basically a manager of monitors. One immediate
advantage of ThinkServer is that you can inspect or change the attributes of each monitor (thresholds,
actions, data collection, and so on) from a common interface for all the monitored elements.
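The idea of a "manager of monitors" can be illustrated with a minimal Python sketch. It is not the ThinkServer design or API: the names `Monitor` and `MonitorManager`, the element names, and the lambda collectors (which stand in for real SNMP, WMI, or SSH queries) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Monitor:
    """One remote check: what it watches, how data is collected, and its limit."""
    element: str                  # monitored host or device (hypothetical name)
    metric: str                   # e.g. "cpu_percent"
    collect: Callable[[], float]  # stand-in for a real SNMP/WMI/SSH query
    threshold: float              # alert when the collected value exceeds this


class MonitorManager:
    """Central point of control: every monitor is inspected and changed here."""

    def __init__(self) -> None:
        self._monitors: Dict[str, Monitor] = {}

    def add(self, name: str, monitor: Monitor) -> None:
        self._monitors[name] = monitor

    def set_threshold(self, name: str, threshold: float) -> None:
        # Changing a threshold requires no access to the monitored element.
        self._monitors[name].threshold = threshold

    def poll(self) -> List[str]:
        """Collect every metric once and return the alerts that fired."""
        alerts = []
        for name, m in self._monitors.items():
            value = m.collect()
            if value > m.threshold:
                alerts.append(
                    f"{name}: {m.metric}={value} on {m.element} exceeds {m.threshold}"
                )
        return alerts


manager = MonitorManager()
manager.add("web01-cpu", Monitor("web01", "cpu_percent", lambda: 72.0, threshold=90.0))
manager.add("db01-disk", Monitor("db01", "disk_percent", lambda: 95.0, threshold=80.0))
print(manager.poll())                      # only the disk monitor fires
manager.set_threshold("web01-cpu", 70.0)   # tightened centrally
print(manager.poll())                      # now both monitors fire
```

The point of the sketch is that thresholds live in one place, so tightening a limit is a single call on the manager rather than a change on each monitored machine.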
A bit of History
Agent-based monitoring was the initial way to monitor systems and infrastructure elements. This was
due to several factors, in particular the lack of standards to interchange information across different
platforms. Another reason was the nature of IT infrastructure in the early days, typically composed of
centralized hosts (such as large mainframes) and static applications (see Table 1). There were very
few elements to monitor, they rarely changed, and applications were modified at a much slower pace.
Security was also a concern, since the early designs of SNMP version 1 were not considered safe
enough.
Thus, agent-based monitoring was a necessity and probably the best solution until the nineties. When
centralized systems were rapidly replaced by “rightsizing”, “Client/server computing” and “distributed
applications”, the situation changed radically. There were no longer just a few elements to monitor,
there were thousands, and they changed rapidly. Applications also started to be in a constant state of
flux when Web applications and rapid development tools became popular. Agent-based monitoring
started to show its disadvantages in the new scenario.
At the same time, monitoring standards started to evolve. The SNMP specification added versions 2
and 3, solving most of the early problems and adding new capabilities and security. Several other
initiatives appeared or were improved, including both open source standards and proprietary
specifications that became popular, such as RMON [2], Windows WMI [3], SMB/Samba [4], and JMX [5]. Most
hardware and software vendors started to include APIs to facilitate the remote monitoring of their
products.
[1] SNMP stands for Simple Network Management Protocol. It is an Internet protocol defined by the Internet
Engineering Task Force (IETF) that has been in use for about 20 years.
© 2007 Tango/04 Computing Group
                                 Before                   Now
Number of servers                Few                      Thousands
Infrastructure rate of change    Static                   In constant flux
Agentless monitoring standards   Unproven, few, unsafe    Plenty, proven, safer
Application rate of change       Few monthly changes      Web 2.0 pace: several changes per day
Diversity of elements            Mono-platform            Multiplatform
Architecture                     Centralized              Distributed
Table 1 – Characteristics of IT environments in the past decade and the present.
Common Agent-based Disadvantages
Generally mentioned disadvantages for agent-based monitoring include:
• Intrusiveness (it may affect existing applications)
• High cost to deploy
• High cost to maintain
• Increased CPU usage
As you need to deploy a third-party piece of software in a monitored element, a conflict may be
introduced in the environment. The fear of causing a disruption in a server that is working properly was
a very common reason for Tango/04 customers to demand agentless monitoring capabilities from us.
The “if it works, don’t touch it” motto is often heard when you mention the need to install additional
software in a fragile environment (such as Windows machines or critical applications in constant state
of flux). Usually, agentless monitoring leverages the fact that the data collection agent is already part of
the monitored element: it has been thoroughly tested in a multitude of scenarios, its security issues have
been openly discussed and fixed, and so on.
A common example of the danger of intrusiveness would be adding a native monitoring agent that
generates too many log entries: this native agent could fill up storage space and cause unavailability in
the element it was supposed to watch over.
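One common way to limit this particular risk is to cap the disk space an agent's own log can occupy. The sketch below uses `RotatingFileHandler` from Python's standard library; the logger name, the size limits, and the "chatty" message are assumptions chosen for illustration, not settings from any Tango/04 product.

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler


def make_bounded_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Return a logger whose files can never grow past (backups + 1) * max_bytes."""
    logger = logging.getLogger("chatty_agent")  # hypothetical agent name
    logger.setLevel(logging.INFO)
    logger.propagate = False
    # The handler starts a new file before max_bytes is exceeded and keeps
    # only the newest `backups` rotated files, deleting anything older.
    logger.addHandler(RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups))
    return logger


def directory_size(directory: str) -> int:
    """Total size in bytes of all files in the directory."""
    return sum(os.path.getsize(os.path.join(directory, f)) for f in os.listdir(directory))


log_dir = tempfile.mkdtemp()
logger = make_bounded_logger(os.path.join(log_dir, "agent.log"), max_bytes=10_000, backups=2)

# Simulate an agent that logs far more than the allowed budget.
for i in range(5_000):
    logger.info("sample event %d from a very chatty agent", i)

total = directory_size(log_dir)
print(f"{len(os.listdir(log_dir))} files, {total} bytes on disk")
```

However much the agent writes, the space it consumes stays below three files of 10,000 bytes each, so the log alone cannot fill the disk of the element being watched.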
[2] RMON means Remote Network MONitoring. The RMON MIB was developed by the IETF to support monitoring
and protocol analysis of LANs.
[3] Windows Management Instrumentation (WMI) is an extension to the Windows Driver Model that provides an
operating system interface through which instrumented components provide information and notification. WMI is
Microsoft's implementation of the Web-Based Enterprise Management (WBEM) and Common Information Model
(CIM) standards from the Distributed Management Task Force (DMTF).
[4] SMB means Server Message Block, an application-level network protocol mainly applied to shared access to files,
printers, serial ports, and miscellaneous communications between nodes on a network. Samba is an open source
implementation of SMB.
[5] JMX means Java Management Extensions, a Java technology that supplies tools for managing and
monitoring applications, system objects, devices (e.g. printers) and service oriented networks.
Agent-based deployment and maintenance costs are considered higher because the software must be
installed manually on each element, and because it is far easier to configure everything remotely
than to touch each agent on each monitored element. This is true in general, but we should note
that standard monitoring protocols usually still need some kind of setup, especially in earlier releases of
operating systems (for instance, SNMP services are not enabled by default in Windows 2000 or 2003
Server, and Windows WMI services were not even installed by default in earlier Windows versions).
Agent deployment can also be automated in some cases, which mitigates this disadvantage [6].
The increased CPU usage stems from the fact that a local agent usually needs to incorporate
program logic to set and control thresholds, store data, alert, and notify of exceptions. This is a valid
concern, but in some cases agentless monitoring may generate more network traffic, since the
monitored element would probably need to send information periodically to the controlling node.
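A back-of-the-envelope estimate helps to judge how much traffic periodic remote collection adds. The following Python sketch is only an approximation: the function name and every figure in it (1,000 elements, 20 metrics each, 200 bytes per sample including protocol overhead, one poll per minute) are assumptions for illustration, not measurements.

```python
def polling_bandwidth_bps(elements: int, metrics_per_element: int,
                          bytes_per_sample: int, interval_seconds: float) -> float:
    """Average network load, in bits per second, produced by periodic polling."""
    bytes_per_cycle = elements * metrics_per_element * bytes_per_sample
    return bytes_per_cycle * 8 / interval_seconds


# Hypothetical environment: 1,000 elements, 20 metrics each, 200 bytes per
# sample on the wire, polled once every 60 seconds.
load = polling_bandwidth_bps(elements=1000, metrics_per_element=20,
                             bytes_per_sample=200, interval_seconds=60)
print(f"{load / 1_000_000:.2f} Mbit/s")  # prints "0.53 Mbit/s"
```

Under these assumed figures the average load is roughly half a percent of a 100 Mbit/s link. Real traffic is burstier than an average suggests, so the numbers should be replaced with values observed in your own environment.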
To these disadvantages we can add the need to learn and use different user interfaces. As each agent
will probably have its own GUI or command line interface, this adds to the general complexity of the
IT environment and lengthens the learning curve for operators.
Welcome to the Agentless Era
As mentioned, with the explosion of IT infrastructure complexity and the evolution of standards, people
saw opportunities in agentless monitoring. Its commonly cited advantages are the counterpart of the
agent-based disadvantages, as discussed before, so we can mention:
• Reduced risk of conflict and undesired crashes
• Lower cost of deployment
• Lower cost of maintenance
• Reduced CPU usage
• Common interface for multiple agents in multiple platforms
The fact that you can use a similar interface for all of your agents across all your platforms significantly
reduces the learning curve and provides for knowledge reuse across domains.
All of these advantages should yield a lower Total Cost of Ownership (TCO) and a higher Return On
Investment (ROI). The differences should be more noticeable the larger the size of the IT infrastructure.
Increased requirements for network bandwidth can be a concern, but only if there is a shortage of it.
Usually the communication between the monitored element and the controlling manager (the
ThinkServer Monitoring Engine in the case of Tango/04 technology) is a local one, and customers
usually don’t have a bottleneck in the local network.
Security, as mentioned, is less of a concern when using proven protocols such as SNMPv3, which
adds authentication, privacy, and access control [7]. In fact, the facilities provided by these standards are
generally considered safer than most features provided by third-party software vendors, which usually
use the “security by obscurity” paradigm.
[6] Typically automated distribution, when feasible, covers only a part of the monitored elements, usually some
servers, but note that generally there are several different elements to monitor and not all of them have a common,
automated software distribution facility.
[7] See SNMPv3: A Security Enhancement for SNMP, William Stallings, IEEE,
http://www.comsoc.org/livepubs/surveys/public/4q98issue/stallings.html
Dealing with a high volume of data has been pointed out [8] as a disadvantage of the agentless
approach, but this argument is less and less relevant as remote monitoring technology adds the
same filtering capabilities as its agent-based counterparts.
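As an illustration of what such filtering can look like, the Python sketch below forwards a sample only when it differs from the last forwarded value of the same metric by at least a configurable amount. This is one simple technique (often called a deadband filter), chosen here as an assumption; the paper does not say which filters any particular product implements, and the function name `changed_only` is hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

Sample = Tuple[str, float]  # (metric name, value)


def changed_only(samples: Iterable[Sample], min_delta: float) -> Iterator[Sample]:
    """Yield a sample only if it moved at least min_delta since the last one kept."""
    last_kept: Dict[str, float] = {}
    for name, value in samples:
        previous = last_kept.get(name)
        # The first sample of a metric is always kept, to establish a baseline.
        if previous is None or abs(value - previous) >= min_delta:
            last_kept[name] = value
            yield name, value


raw = [("cpu", 10.0), ("cpu", 11.0), ("cpu", 20.0), ("mem", 50.0), ("mem", 50.0)]
kept = list(changed_only(raw, min_delta=5.0))
print(kept)  # [('cpu', 10.0), ('cpu', 20.0), ('mem', 50.0)]
```

In this example five raw samples are reduced to three, and the reduction grows as metrics become more stable.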
So, the winner is…?
If you look at the summary in Table 2, it looks like agentless is the one with the most advantages. Why,
then, is there continuous debate over which method is better? Well, simply put, every environment is
different, so chances are that the best answer for one company is not the best for another. Usually, the
correct answer to such a generic question is “it depends”, so reaching a scientific, definitive
conclusion on the subject is far from feasible.
                            Agent-based                                     Agentless
Installation                Requires local agent                            Nothing
Maintenance                 Requires updating for new versions              Nothing
Risks                       Application or operating system conflict,       Reduced, since no extra
                            performance degradation, crashes due to         software is deployed
                            untested interferences
Security                    Could be good, but beware of “security by       Good when using safe, open
                            obscurity” and undetected security holes in     protocols
                            third-party products that are not widely
                            deployed
Controlling interface       Typically different for each agent              Same if a common
                                                                            management engine is used
Learning curve for common   One for each agent                              One for all the common
agent maintenance tasks                                                     management engine
Bandwidth usage             Maybe less                                      Maybe higher
Resource usage              Maybe higher                                    Maybe less
Usually best for            Small number of elements, extremely critical    Large IT infrastructure
                            centralized elements (such as a big mainframe
                            server)
TCO                         Probably higher                                 Probably lower
Table 2 – Commonly cited attributes of each monitoring alternative. Please note that the facts are not as simple
as they seem. Most “advantages” are theoretical only, and can be the opposite in case of a bad implementation.
[8] The ability to get more data, and hence to perform in-depth monitoring, has also been
cited as an advantage of agent-based approaches. This is sometimes true, but far from an absolute truth. In some
cases the implementation of the agentless mechanism is as good as a local agent, or even better. Moreover, in
some cases, the agentless mechanism uses the same underlying technology as its agent-based counterpart.
However, when doing very detailed data collection for highly technical purposes, such as capacity planning, in some
environments an agent-based product can be more capable. But this opens a whole different discussion: what
should you monitor, and why? In our practical experience with hundreds of successful monitoring projects, we
have always favored the quality of controls over their mere quantity. For application monitoring
(BSM and SLM) projects this has always been a winning value proposition. So discussing the richness of
agent-based against agentless monitoring theoretically is just a futile exercise. Moreover, end-to-end service level
monitoring is performed through synthetic transactions, which are by definition agentless. And there are other cases
where agentless is the only way to go (monitoring certain devices such as Internet appliances is a good example of
this).
Since there has been so much interest in agentless technologies from our customers, Tango/04 is
creating more and more monitors that can be deployed remotely. We rewrote our native Windows
Server Agent as a WMI-based ThinkServer monitor, and we are also rewriting most of our existing
iSeries agents as Java-based Agentless ThinAgents [9]. Practically all the new monitoring functionality
we are creating is based on the ThinkServer engine, ensuring its agentless capabilities. Indeed, we
strongly believe in the benefits of this approach and we made a strategic decision to support this type
of monitoring mechanism.
But we are not dogmatic about it. In some cases we will offer hybrid alternatives, such as the ability to
send a Unix/Linux script to a remote server, execute the script natively, and retrieve data back to the
ThinkServer (which is very similar to our SSH-based monitors), or use a mix of remote and agent-based
monitors. We can also create a native implementation of our ThinkServer monitoring engine on
Unix/Linux platforms [10], so customers will have the choice to monitor either remotely or locally if they
want, at the cost of installing the ThinkServer engine on each machine.
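The hybrid approach of sending a script, running it natively, and retrieving the data can be sketched in Python as follows. This is not how ThinkServer implements it; it is a minimal illustration that assumes an OpenSSH client with key-based authentication on the controlling node and a Linux target (the script reads `/proc/loadavg`). The script contents, the host name, and the function names `ssh_runner` and `collect` are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List

# Hypothetical collection script: prints one "name=value" pair per line.
SCRIPT = """\
echo "load1=$(cut -d' ' -f1 /proc/loadavg)"
echo "users=$(who | wc -l)"
"""

Runner = Callable[[List[str], str], str]


def ssh_runner(command: List[str], script: str) -> str:
    """Run the command, feed the script on standard input, return its output."""
    result = subprocess.run(command, input=script, capture_output=True,
                            text=True, check=True, timeout=30)
    return result.stdout


def collect(host: str, script: str = SCRIPT, runner: Runner = ssh_runner) -> Dict[str, str]:
    # "sh -s" makes the remote shell read the script from standard input,
    # so no file has to be installed or left behind on the monitored server.
    output = runner(["ssh", "-o", "BatchMode=yes", host, "sh", "-s"], script)
    metrics: Dict[str, str] = {}
    for line in output.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # ignore any line that is not a name=value pair
            metrics[name.strip()] = value.strip()
    return metrics


# Demonstration with a stand-in runner, so the sketch runs without a server.
def fake_runner(command: List[str], script: str) -> str:
    return "load1=0.42\nusers=3\n"


print(collect("server.example.com", runner=fake_runner))
```

Against a real machine you would call `collect("user@host")` and let the default `ssh_runner` do the work. Passing the runner as a parameter keeps the parsing logic testable without network access.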
Conclusion
In the end, agentless monitoring is just agent-based monitoring. The only difference is that the local,
native monitoring agent is pre-installed in the monitored element. For instance, SNMP or syslog
services are already installed in most UNIX systems, but they basically act as any other monitoring
agent, collecting data and reporting it back.
Probably the best monitoring alternative for you would be the one that gives you the most chances of
success. Unfortunately, millions of dollars have been spent on complex monitoring framework
implementations with the simple goal of controlling and improving the performance of an IT
infrastructure… all in vain. The best advice we can give you when selecting a monitoring mechanism
is to look closely at the track record of the company that will be ultimately responsible for your project.
So, agentless or agent-based? It depends! Agentless is very tempting as discussed, but both methods
have their advantages and disadvantages, and you will find people advocating in both directions. But
trust us, in the end, what matters more is the ability (and willingness) of your monitoring partner to
convert you into a happy customer story instead of another sad tale of frustration.
[9] ThinAgent is what we call a ThinkServer agent. As most of the surrounding services are offered by the engine
itself, the agent is “thin”: it just collects the data and does very little else. This improves Tango/04's ability to rapidly
create new monitors and agents, as the time required to implement a new one is dramatically reduced.
[10] The ThinkServer Monitoring Engine has been designed with portability in mind. It leverages cross-platform
standards such as C++, XML, and SOAP, and we have versions of the engine running perfectly on Linux platforms in
our labs.
About Tango/04 Computing Group
Tango/04 Computing Group is one of the leading developers of systems management and automation
software. Tango/04 software helps companies maintain the operating health of all their business
processes, improve service levels, increase productivity, and reduce costs through intelligent
management of their IT infrastructure.
Founded in 1991 in Barcelona, Spain, Tango/04 is an IBM Business Partner and a key member of
IBM’s Autonomic Computing initiative. Tango/04 has more than a thousand customers who are served
by over 35 authorized Business Partners around the world.
Alliances
Partnerships
IBM Business Partner
IBM Autonomic Computing Business Partner
IBM PartnerWorld for Developers Advanced Membership
IBM ISV Advantage Agreement
IBM Early code release
IBM Direct Technical Liaison
Microsoft Developer Network
Microsoft Early Code Release
Legal notice
The information in this document was created using certain specific equipment and environments, and it
is limited in application to those specific hardware and software products and version and release
levels.
Any references in this document regarding Tango/04 Computing Group products, software or services
do not mean that Tango/04 Computing Group intends to make these available in all countries in which
Tango/04 Computing Group operates. Any reference to a Tango/04 Computing Group product,
software, or service may be used. Any functionally equivalent product that does not infringe any of
Tango/04 Computing Group’s intellectual property rights may be used instead of the Tango/04
Computing Group product, software or service.
Tango/04 Computing Group may have patents or pending patent applications covering subject matter in
this document. The furnishing of this document does not give you any license to these patents.
The information contained in this document has not been submitted to any formal Tango/04 Computing
Group test and is distributed AS IS. The use of this information or the implementation of any of these
techniques is a customer responsibility, and depends on the customer’s ability to evaluate and integrate
them into the customer’s operational environment. Although Tango/04 Computing Group
may have reviewed each item for accuracy in a specific situation, there is no guarantee that the
same or similar results will be obtained elsewhere. Customers attempting to adapt these
techniques to their own environments do so at their own risk. Tango/04 Computing Group shall not be
liable for any damages arising out of your use of the techniques described in this document, even if it
has been advised of the possibility of such damages. This document could contain technical
inaccuracies or typographical errors.
Any pointers in this publication to external web sites are provided for your convenience only and do not,
in any manner, serve as an endorsement of these web sites.
The following terms are trademarks of the International Business Machines Corporation in the United
States and/or other countries: AS/400, AS/400e, iSeries, i5, DB2, e (logo)®Server IBM ®, Operating
System/400, OS/400, i5/OS.
Microsoft, SQL Server, Windows, Windows NT, Windows XP and the Windows logo are trademarks of
Microsoft Corporation in the United States and/or other countries. Java and all Java-based trademarks
and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States
and/or other countries. UNIX is a registered trademark in the United States and other countries licensed
exclusively through The Open Group. Oracle is a registered trade mark of Oracle Corporation.
Other company, product, and service names may be trademarks or service marks of other companies.