ITCAM for Transactions: updating Web Response Time agent configuration to address Transaction Tracking overload

Preface

This document records the technical challenges encountered during a particular Agentless Monitoring deployment and the techniques and strategies used to overcome them.

Authors:
William Lanny Short, Certified Process Specialist, IBM
Robert Cheung, ITCAM for Transactions Developer, IBM

The problem

A business has a mission-critical application that spans many systems and components and creates a lot of network traffic. When the deployment of ITCAM for Transactions: Web Response Time (KT5) was planned, the intention was to configure each agent to run in appliance mode. With this deployment strategy, only a small number of agents would be required to monitor the application-generated web traffic.

To allow a KT5 agent to monitor many hosts at once, the network switches need to be placed in port spanning mode so that they give the KT5 agent host a copy of the network traffic. In this case, the network team could not support these hardware changes, and port spanning could not be implemented. As a result, KT5 agents had to be installed on every host that was to be monitored. For example, if the application consists of six IBM HTTP Servers and six WebSphere Application Servers, a KT5 agent is needed on each of those servers.

The KT5 agent, by default, monitors all network traffic seen by its host's network interfaces. This can generate a lot of noise that significantly degrades the performance of the KT5 agent and of the ITCAM infrastructure that it reports to. The Application Management Console agent (KT3), which is critical for the management and maintenance of the ITCAM for Transactions solution, could consume a lot of CPU cycles or crash. This can happen because the KT3 agent is configured, by default, to report all of the applications that each KT5 agent finds.
The CPU cycle consumption or crashing of the KT3 agent could be the result of seeing duplicate applications or other unnecessary network noise. So how do you mitigate this problem and prevent it from recurring in future application on-boarding exercises? This white paper addresses these questions.

What are the underlying issues?

The underlying issue is the performance of the Transaction Reporter (KTO) and Application Management Console (KT3) agents. Because a large number of KT5 agents are deployed, these agents receive a lot of incoming data, which causes each of them to consume a lot of CPU cycles or to crash.

The KT3 agent has the primary responsibility for configuring all the ITCAM for Transactions data and filtering it based on the applications that have been created as part of its configuration settings. It also consolidates all the applications detected by the various ITCAM for Transactions agents, such as the KT5 agent, the Robotic Response Tracking agents, and other ITCAM for Transactions agents, into one pane.

The KTO agents have the primary responsibility for displaying the ITCAM for Transactions data as it has been collected. They read through all the collected data and create transaction overlays based on the configurations created in the KT3 and on the server and client sources and destinations of the collected data. If the collected data is extremely noisy, the agents spend a significant amount of time sorting through the noise, and that is the primary cause of the CPU cycle consumption or agent crashes.

What is the solution?

The solution to this problem has two phases. The first phase is to correct the immediate problem. The steps for this phase address the current problem of the KTO and KT3 agents getting bogged down in noisy ITCAM for Transactions data, and help get the agents back to doing their jobs without performance problems. The second phase is to prevent future problems when on-boarding applications.
The steps for this phase include creating a sandbox that can be used for initial testing of the KT5 agents deployed onto the new application's components, so that filtering configuration can be done as part of the on-boarding process.

Phase 1: Correcting the immediate problem

The aim of this phase is to limit the amount of data that each KT5 agent creates by configuring it to monitor only traffic of interest. Using the above example, the KT5 agents are configured to monitor only HTTP(S) and WebSphere Application Server traffic and to ignore other traffic. Furthermore, the KTO is checked to ensure that the number of entities it is monitoring has not exceeded its capacity. If it has, additional KTO agents have to be deployed to spread the load. This second step is particularly important in scenarios where dozens or even hundreds of KT5 agents have been deployed.

First, stop the KTO agent from retrieving data from all the KT5 agents that are connected to the TEMS, and restrict it to the relevant subset instead. The KTO agent should be talking only to the KT5 agents that are deployed onto the application in question. This is done by setting the Aggregation Agent List configuration parameter of the KTO agent, which is documented in the knowledge center.

Second, using the Application Manager Console Editor (in the Tivoli Enterprise Portal), configure both the KT5 data sources and the Transaction Collector (KTU) data sources so that the KT5 agents report only on traffic related to the application at hand. Detailed instructions can be found in the best practices guide ITCAM for Transactions V7.3 Customization: Transaction Tracking Filtering and reporter.

Third, confirm that the number of nodes and edges seen in the KTO transaction overlay diagrams is manageable. To get an idea of the number of nodes, edges, and interactions, complete the following steps in the Tivoli Enterprise Portal:

1. In the navigator, select Transaction Reporter.
2. Right-click Transactions and select the Transaction Aggregate Topology workspace.
3. Click the Table/Topology view toggle button to switch the view to a table.
4. Check the number of rows returned.

Figure 1: TEP Transaction Aggregate Topology Workspace

Figure 1 shows what is seen in the TEP Transaction Aggregate Topology workspace.

Alternative methods:

- Check the Total displayed (circled in Figure 1). This is a coarser estimate because some of the nodes and interactions can be hidden by various conditions.
- Perform a more detailed investigation of the Transaction Reporter logs. During each collection interval (2 minutes by default), the KTO gathers all aggregates and interactions and logs how many of each were gathered and from which Transaction Tracking agents. To find this information, search for "collectionPeriods()" in the latest KTO log.
- Count the number of RecordIdentityxxx.xml files that are contained in the <ITMHOME>/todata directory for the KTO. Each file represents a node that the KTO has seen at some stage. For example, on UNIX or Linux run the following command:

    > find <todata directory> -name "RecordIdentity*.xml" | wc -l

  Example output:

    5628

A manageable overlay diagram should have fewer than 5,000 nodes and edges. If there are more than that, add more KTO agents to reduce the load, and then test the outcome. Repeat this process iteratively. That is, activate only a small number of KT5 agents at a time, and complete the filtering and reporting steps for those agents before activating additional KT5 agents.

Phase 2: Preventing future problems when on-boarding applications

After phase 1, the KTO agent began working, but the introduction of additional KT5 agents caused it to become overloaded again. So how can future applications be on-boarded problem free?
Ideally, create a sandbox in your Production environment where you can ensure that the on-boarding steps for the Production version of the new application do not cause any problems for the current deployment. More importantly, in the event that overload occurs, you can reset the KTO by flushing filtered transaction tracking nodes and edges.

The only required component of the sandbox is a KTO agent that is used as part of the on-boarding process. The KTO agent's Aggregation Agent List should contain only those KT5 agents that have been deployed onto the new application's components. Perform the same reporting and filtering steps that you used when correcting the original problem. After you have completed those steps, note how many nodes and edges are in the sandbox KTO transaction overlay diagram.

If the current main KTO's number of nodes and edges plus the sandbox KTO's number of nodes and edges is more than 5,000, complete the following steps:

1. Make sure that the main KTO's Aggregation Agent List includes only those KT5 agents that were already deployed.
2. Deploy and configure an additional main KTO agent, and add the KT5 agents that were pointing to the sandbox KTO to the new main KTO's Aggregation Agent List.
3. Remove the new KT5 agents from the sandbox KTO's Aggregation Agent List.
4. Reset the sandbox KTO agent by deleting the <agent_home>\todata directory and restarting that agent.
5. Repeat steps 1 - 4 until all the new application's KT5 agents have been added.

Note: Do not turn on data warehousing for the new main KTO until the filtering is complete. Never turn on data warehousing for the sandbox KTO; there is no benefit, and the agent's performance will be impacted.

Conclusion

This white paper discussed how to address Transaction Tracking overload problems that occur when a significant number of ITCAM for Transactions: Web Response Time (KT5) agents must be deployed to monitor web traffic for an application.
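
Supplementary sketch: the capacity check used in both phases (count the RecordIdentity files, then compare the main and sandbox totals against the 5,000 guideline) could be scripted along the following lines. This is only an illustrative sketch and not part of the product. The function names and the directory arguments are hypothetical. The file count reflects nodes that a KTO has seen at some stage, so it only approximates the number of nodes and edges shown in an overlay diagram.

```python
from pathlib import Path

# Guideline from this paper: a manageable overlay has fewer than
# about 5,000 nodes and edges.
NODE_LIMIT = 5000


def count_record_identity_files(todata_dir):
    """Count RecordIdentity*.xml files anywhere under todata_dir.

    Roughly equivalent to:
        find <todata directory> -name "RecordIdentity*.xml" | wc -l
    (this version counts regular files only).
    """
    root = Path(todata_dir)
    return sum(1 for p in root.rglob("RecordIdentity*.xml") if p.is_file())


def needs_additional_kto(main_count, sandbox_count, limit=NODE_LIMIT):
    """Return True when the main and sandbox counts together exceed the limit."""
    return main_count + sandbox_count > limit
```

For example, passing the counts for the main KTO's todata directory and the sandbox KTO's todata directory to needs_additional_kto gives a quick indication of whether the Phase 2 steps for deploying an additional main KTO apply. The counts from the Transaction Aggregate Topology workspace can be used in the same way.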

Document History

Date           Revision   Notes
June 2017      1.0        Initial version
June 2017      1.1        Added sandbox concept, added diagram
17 June 2017   1.2        Added Preface, more edits. Thanks to Alexander Thornton for editorial review. First published