USE CASE BRIEF

Affinities Explained: Web Application

The fundamental reason a network exists is to facilitate communication between elements of the IT infrastructure: servers, virtual machines, storage, applications, or even end users. This communication is essentially a conversation between related entities. We call these relationships affinities.

Modern web applications are increasingly distributed. To understand how they work, it is important to understand how their components communicate. This brief introduces the affinities that can exist within web applications, using Boundary's application performance monitoring product as an example.

WHAT DOES BOUNDARY'S WEB APPLICATION DO?

Boundary is a software-as-a-service (SaaS) Application Performance Management (APM) solution designed to monitor distributed applications. It monitors and analyzes application flow data in real time to ensure that applications are functioning properly and that performance is optimized.

Boundary's software places "meters" in customer environments. These meters collect information and send it to Boundary's hosted analytics engine for processing and reporting. Results are then returned to customers and displayed via Boundary's front-end graphical interface.

HOW DOES BOUNDARY'S SOLUTION WORK?

To understand the underlying communications of a distributed application like Boundary, it is important to examine its components and how they interact. As a multi-tiered web application, Boundary's APM solution has five primary components: Collectors, Phloem, Streaker, Zookeeper, and Façade. These elements work in concert to collect information at the customer site, groom the information for analysis, and return results to end users.

Boundary's meters are deployed on each virtual machine and/or server within a customer infrastructure. They collect packet header data and stream it to the Boundary Collectors every second.

Collectors do exactly what the name suggests: they collect the application and network information streamed from customer sites. The Collectors gather the network flow data sent by the meters and forward it to Boundary's hosted analytics engine. They also act as load balancers, distributing data into Boundary's datacenter and back to customer sites.

Phloem is a Boundary service backed by Kafka, a persistent, efficient, distributed message queue. This durable messaging layer buffers incoming metrics data before it is analyzed. The cluster receives a 2 Gbps stream of data from the Collectors and unpacks it for processing by the custom analytics engine, Streaker.

Streaker is Boundary's custom analytics engine. It performs packet flow analysis on the data streams sent by the Collectors and unpacked by Phloem. Streaker stores its analytics results in a database and sends streaming output to a UI service called Façade.

Zookeeper is an open-source component within Boundary's datacenter architecture. It maintains synchronized configuration across the distributed application.

Façade receives the low-rate data stream from Streaker and presents it via the Boundary graphical user interface. Load balancers connect incoming clients with this service to deliver data to end-user browsers.

WHAT ARE THE BOUNDARY AFFINITIES?

The entire Boundary APM solution is architected to collect data from end users' systems and perform "big-data" analytics in the cloud. In this architecture, the Collectors, Phloem (message queuing), and Streaker (analytics) have an interdependent relationship as parts of the data processing pipeline. In short, these components share a natural affinity. The relationship between these components centers on the exchange of high-bandwidth, latency-sensitive streams of data. Accordingly, an ideal network design would optimize for these connections.
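The collection pipeline described above (meters stream to Collectors, Phloem buffers, Streaker analyzes) can be illustrated as a simple producer/buffer/consumer chain. This is a minimal sketch of the pattern, not Boundary's actual code: all names here are invented for the example, and an in-memory queue stands in for the Kafka-backed Phloem layer.

```python
from collections import deque

class PhloemBuffer:
    """Stand-in for the Kafka-backed message queue: a durable buffer
    that absorbs bursts of flow records before analysis."""
    def __init__(self):
        self._queue = deque()

    def publish(self, record):
        self._queue.append(record)

    def consume(self):
        # Return the oldest buffered record, or None when drained.
        return self._queue.popleft() if self._queue else None

def meter_sample(host, bytes_seen):
    # A meter emits one packet-header summary per host per interval.
    return {"host": host, "bytes": bytes_seen}

def collector(records, buffer):
    # Collectors gather flow records from customer meters and
    # forward them into the message-queue layer.
    for record in records:
        buffer.publish(record)

def streaker(buffer):
    # The analytics engine drains the buffer and aggregates per host.
    totals = {}
    while (record := buffer.consume()) is not None:
        totals[record["host"]] = totals.get(record["host"], 0) + record["bytes"]
    return totals

buf = PhloemBuffer()
collector([meter_sample("vm-1", 1200), meter_sample("vm-2", 800),
           meter_sample("vm-1", 300)], buf)
print(streaker(buf))  # {'vm-1': 1500, 'vm-2': 800}
```

The key property the buffer layer provides is decoupling: Collectors can keep publishing at line rate even when the analytics stage falls momentarily behind, which is why a durable queue sits between them in the real architecture.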
Additionally, there is a relationship between Zookeeper, as the maintainer of distributed configuration, and the Collectors, Phloem (message queuing), and Streaker (analytics).

As each component distributes or analyzes data, the conversations between these components have unique requirements, independent of the underlying network. These conversations and their properties are Affinities. This applies to communications between all components, but some are more critical to the scale and performance of the overall service than others. For instance, Zookeeper's role in synchronizing the distributed analytics engine makes its communication latency sensitive. On the other hand, Façade (the remote GUI) is built to work over commodity Internet connectivity and is more tolerant of high latency and low bandwidth. Understanding these Affinities between components is crucial to optimizing performance of the system as a whole. While the specific components are unique to Boundary's solution, the overall architecture is fairly typical of multi-tiered web applications.

Optimizing network connectivity to support these Affinities is an important aspect of system design. The legacy network has no knowledge of application workload instances or components (VMs, servers, and so on), and even if it did, it has no simple mechanism to carve out specific network resources for a workload. The best common practice, therefore, is to analyze peak utilization and/or latency and design the network to support the worst-case requirements (accounting for growth). This approach is very costly, leads to a great deal of wasted network capacity when those resources are not needed, and still requires a redesign when new limits are reached.

Given this architecture, the Plexxi view of the topology is as follows:

[Diagram: Plexxi view of the Boundary topology]

OPTIMIZING FOR BOUNDARY'S AFFINITIES

What follows is an actual screenshot from Boundary's GUI. In this case, Boundary meters are being used to monitor Boundary's own application.
[Screenshot: Boundary GUI topology showing Collectors, Phloem, Streaker, and Zookeeper]

The screenshot shows how the application components are stitched together. Note the similarity to the Boundary view: both Boundary and Plexxi build their solutions around logical topologies rather than physical topologies.

Boundary's design brings all data streams into the datacenter through load balancers that pass information on to the Collectors over a relatively high-bandwidth connection. The Collectors then connect to Phloem (message queuing), which is connected to both Streaker (analytics) and Zookeeper. Boundary's GUI uses color to represent higher-latency paths, indicating that the load balancer-to-Collector connection is running slowly in this example. Arrow width represents the bandwidth of a particular link, so the Phloem-to-Streaker connection is the highest-bandwidth connection in the topology.

The Boundary solution includes many Collectors running on some number of VMs spread across a number of servers in the Boundary datacenter. The logical grouping of all Collectors is called an affinity group. Similarly, all instances of Phloem, Streaker, and Zookeeper are logically represented by affinity groups. Links are then assigned between these affinity groups, and specific SLAs (for bandwidth and latency) can be attached to them. This allows Collector-Phloem connections, for example, to always have guaranteed bandwidth capacity, while links to Zookeeper can be optimized for low-latency transfers. These attributes are shared with the Plexxi physical network, and the network topology can be optimized for the desired end-user experience.

Network latency in the data processing pipeline would result in slower processing of the data and would affect the ability of the application to provide real-time analytics to the customer.
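The affinity-group model described above can be captured in a small data structure: logical groups of like instances, plus links between groups annotated with bandwidth and latency SLAs. This is a sketch of the concept only, not Plexxi's API; the class names and SLA figures below are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AffinityLink:
    """A link between two affinity groups, carrying its SLA attributes."""
    src: str
    dst: str
    min_bandwidth_gbps: float  # guaranteed capacity for this conversation
    max_latency_ms: float      # latency bound for this conversation

# Affinity groups: logical bundles of all instances of a component.
groups = {"Collectors", "Phloem", "Streaker", "Zookeeper", "Facade"}

# Illustrative SLAs: the data pipeline wants guaranteed bandwidth,
# while Zookeeper links are optimized for low latency instead.
links = [
    AffinityLink("Collectors", "Phloem", min_bandwidth_gbps=2.0, max_latency_ms=5.0),
    AffinityLink("Phloem", "Streaker", min_bandwidth_gbps=2.0, max_latency_ms=5.0),
    AffinityLink("Streaker", "Zookeeper", min_bandwidth_gbps=0.1, max_latency_ms=1.0),
    AffinityLink("Streaker", "Facade", min_bandwidth_gbps=0.05, max_latency_ms=100.0),
]

def latency_sensitive(links, threshold_ms=2.0):
    """Pick out the links the fabric must keep on short physical paths."""
    return [link for link in links if link.max_latency_ms <= threshold_ms]

for link in latency_sensitive(links):
    print(f"{link.src} -> {link.dst}: <= {link.max_latency_ms} ms")
```

Expressing requirements per link, rather than provisioning the whole fabric for the worst case, is the design choice the brief argues for: the network can then place high-bandwidth pairs on fat paths and latency-sensitive pairs on short ones.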
Poor network throughput would cause congestion and queuing of data, which increases the cost of the infrastructure needed to deliver the application and also impacts users' access to their data.

Plexxi, Inc. | 222 Third Street, Suite 1100, Cambridge, MA 02142 | +1.888.630.PLEX (7539) | [email protected] | www.plexxi.com

©2013 Plexxi Inc. March 2013