Diskless VDI with Cisco UCS & Atlantis ILIO
Eliminating Storage from VDI Architectures
White Paper
August 2011

Contents
Executive Summary
The VDI Cost & Performance Challenge
Cisco UCS and Atlantis Computing Solutions Overview
Diskless VDI – Next Generation VDI Architecture
The Cisco UCS and Atlantis ILIO Diskless VDI Architecture
Comparing Diskless VDI to Existing Approaches to VDI Storage
Conclusion
Appendix 1 – Previous Cisco UCS and Atlantis Computing VDI Reference Architecture Testing
Appendix 2 – Test Methodology

© 2011 Cisco Systems, Inc. All rights reserved. This document is Cisco Public Information.

Executive Summary

For VDI to be successful, it must compete with physical PCs on both price and performance. Two key technology bottlenecks are driving the cost of virtual desktops upward and causing virtual desktops to underperform physical PCs:

The VDI Memory Bottleneck
VDI requires large amounts of high-speed memory to maximize the number of desktops that can run on a single server and lower the cost per desktop.
However, existing server platforms lack the ability to deliver more than 128GB of high-speed memory per server, which lowers density and increases costs. The Cisco Extended Memory Architecture eliminates the VDI memory bottleneck by delivering up to 384GB of high-speed memory on a dual-socket server.

The VDI Storage Bottleneck
The unique nature of VDI workloads requires customers to purchase massive amounts of shared storage or expensive SSD-based storage to deploy VDI with acceptable desktop performance. IT organizations routinely undersize storage to fit within the budget of a physical PC, which leaves too little storage per desktop to deliver the right level of desktop performance. This ultimately leads users to reject VDI in favor of more familiar PCs. Atlantis ILIO™ software optimizes VDI to deliver high-performance virtual desktops with less storage.

If VDI costs more and is slower than a physical PC, how can we expect it to gain widespread adoption? Cisco Systems and Atlantis Computing™ have partnered to deliver a revolutionary Diskless VDI architecture that eliminates the need for all disk-based storage. For the first time, Cisco Extended Memory Technology and Atlantis ILIO VDI Optimization software make it possible to replace shared storage or local SSDs with local memory for virtual desktop storage. With Atlantis ILIO, virtual desktops consume up to 90% less storage capacity, making it possible to run all virtual desktops on local server memory instead of disk.

What are the Benefits of Diskless VDI using Cisco Extended Memory and Atlantis ILIO?

Unmatched Performance – Memory outperforms even the fastest local SSD storage and delivers virtually unlimited IOPS to desktops locally, which dramatically improves all aspects of desktop performance including boot time, login, application launch, patching and anti-virus scanning.

Lower Cost – Using Atlantis ILIO and Cisco UCS Extended Memory, it is possible to drive down the cost per desktop from both a CAPEX and OPEX perspective:
• CAPEX – The upfront cost per desktop can be decreased below $200 per desktop including the server hardware and storage.
• OPEX – Diskless VDI architectures mean that IT organizations can lower operating expenses by eliminating rack space for SAN/NAS storage, lowering power and cooling requirements, and eliminating the operational expenses of maintaining disk-based storage and replacing failed disks.

Increased Lifespan and Reliability – Memory does not suffer from the same lifespan issues for write-intensive VDI workloads as SSDs, meaning that a Diskless VDI architecture will be more reliable and have lower operational costs because there will be no failed disks to replace.

Cisco UCS and Atlantis ILIO Diskless VDI Architecture Price vs. Performance

CAPEX Costs & Price/Performance

Scenario                      CAPEX Total Cost Per Desktop  Cost Per IOPS
1 Diskless VDI                $197                          $0.58
2 Local SAS with ILIO         $219                          $0.87
3 Local SSD with ILIO         $237                          $0.76
4 Local SAS+SSD without ILIO  $219                          $2.55
5 File Storage without ILIO   $392                          $38.52
6 File Storage with ILIO      $457                          $1.83
7 Block Storage without ILIO  $1,833                        $29.63
8 HP Servers with FusionIO    $380                          $1.04

Decreases OPEX Costs (criteria rated per scenario; the individual marks are not recoverable from this copy):
• Supports Blade Servers
• NO Disk or SSD Replacement
• NO Power & Cooling for Disks/SSD
• NO Rack Space for Disks

The VDI Cost & Performance Challenge

Virtual Desktop Infrastructure (VDI) delivers tremendous benefits in terms of reducing the cost of provisioning, upgrading and maintaining desktops, as well as making computing resources more flexible.
However, VDI is relatively new to most IT organizations and requires careful design and the right architecture to deliver a high-performance, cost-effective and scalable virtual desktop infrastructure. Many companies have deployed VDI only to realize that their VDI architecture could not scale or deliver an acceptable user experience without additional, massive investments in storage infrastructure.

For VDI to achieve broad adoption, it is critical that virtual desktops deliver better-than-physical-PC performance and that the upfront cost of implementing VDI is equal to or lower than the cost of a physical PC. While it is possible to deliver either a better-than-physical-PC user experience or a cost per desktop below that of a physical PC, it is not possible to do both with traditional storage such as SAN/NAS or local SSDs without compromising.

Better-Than-Physical PC User Experience
When implementing any major change to a desktop computing environment, winning over end users must be considered a critical success factor. With traditional storage, it is possible to deliver a good user experience, but it comes at a storage cost that can exceed $1,000 per desktop. If the IT organization sizes traditional storage to fit within the cost of a physical PC, desktop performance is far slower than that of a physical PC. When desktop performance is noticeably slower than a physical PC, users will reject VDI in favor of retaining their existing physical PCs. VDI projects are often limited in size, cancelled or re-architected because of poor desktop performance, which is most often caused by VDI storage performance.

Cost Per Desktop below a Physical PC
For most companies to transition all of their desktops from physical to virtual, the total cost of a virtual desktop must be lower than that of a physical PC. The major expense of VDI is not software or server hardware, but storage.
Storage can consume 50-80% of the total VDI budget if sized to deliver acceptable desktop performance.

The VDI Storage Bottleneck
While the memory and CPU of a VDI desktop remain on the physical hardware where the desktop executes, the virtual desktop hard drive moves from a direct connection to the physical hardware to a path that traverses multiple network switches. Simply stated, VDI replaces one dedicated, low-latency and inexpensive physical PC hard drive with an expensive, shared, high-latency storage array. The result is both increased latency, based on the number of network hops from the Windows operating system to the storage array, and decreased desktop performance. In addition, disk IO that the OS optimizes for a dedicated physical PC hard drive is randomized by the hypervisor, which converts previously sequential IO that is easy for storage to consume into random IO that decreases storage and desktop performance (this is also known as the IO Blender effect). As more desktops are added to the VDI deployment, storage contention issues arise from large numbers of VDI desktops competing for a limited pool of storage input/output operations per second (IOPS).

The VDI Memory Bottleneck
Running many desktops on a single virtualized server means running multiple OS and application instances on a single server, which demands large amounts of memory. Since CPU performance is outstripping memory performance, memory bottlenecks are a common problem. As companies migrate from Windows XP (128MB recommended memory) to Windows 7 (1-2GB recommended memory), the density of a server is constrained by memory and not CPU. Enterprises are often forced to deploy either four-socket servers or multiple two-socket servers to address this problem. These solutions result in more expensive servers, increased power costs, and higher licensing costs.

Cisco UCS and Atlantis Computing Solutions Overview

Cisco and Atlantis Computing™ have partnered to deliver a series of new and innovative VDI solutions and reference architectures that eliminate the VDI memory and storage bottlenecks that are preventing large-scale VDI deployments. This new combination of Cisco UCS datacenter infrastructure and Atlantis Computing VDI optimization software enables customers to deploy virtual desktops with amazing performance, lower cost than physical PCs and increased reliability.

Cisco Unified Computing System (UCS) Overview
The Cisco® Unified Computing System is a next-generation data center platform that unites compute, network, storage access, and virtualization into a cohesive system designed to reduce total cost of ownership (TCO) and increase business agility. The system integrates a low-latency, lossless 10 Gigabit Ethernet unified network fabric with enterprise-class, x86-architecture servers. The system is an integrated, scalable, multichassis platform in which all resources participate in a unified management domain.

Cisco UCS Servers
Cisco UCS offers both blade servers (B series) and rack servers (C series), and both can be used for VDI deployments. Both the Cisco B250 M2 and C250 M2 servers are two-socket servers that incorporate Cisco UCS Extended Memory Technology, which allows them to support up to 384GB of RAM per server in 48 DIMM slots.

Cisco Extended Memory Technology
Modern CPUs with built-in memory controllers support a limited number of memory channels and slots per CPU. The need for virtualization software to run multiple OS instances demands large amounts of memory, and that, combined with the fact that CPU performance is outstripping memory performance, can lead to memory bottlenecks. To obtain a larger memory footprint, most IT organizations are forced to upgrade to larger, more expensive, four-socket servers.
CPUs that can support four-socket configurations are typically more expensive, require more power, and entail higher licensing costs.

Cisco Extended Memory Technology expands the capabilities of CPU-based memory controllers by logically changing the geometry of main memory while still using standard DDR3 memory. This technology makes every four DIMM slots in the expanded memory server appear to the CPU's memory controller as a single DIMM that is four times the size (Figure below). For example, using standard DDR3 DIMMs, the technology makes four 8-GB DIMMs appear as a single 32-GB DIMM.

This patented technology allows the CPU to access more industry-standard memory than ever before in a two-socket server. For memory-intensive environments, data centers can better balance the ratio of CPU power to memory and install larger amounts of memory without the expense and energy waste of moving to four-socket servers simply to have a larger memory capacity. With a larger main-memory footprint, CPU utilization can improve because of fewer disk waits on page-in and other I/O operations, making more effective use of capital investments and more conservative use of energy.

For environments that need significant amounts of main memory but do not need a full 384 GB, smaller-sized DIMMs can be used in place of 8-GB DIMMs, with resulting cost savings: two 4-GB DIMMs are typically less expensive than one 8-GB DIMM.

The other key feature of Cisco Extended Memory Technology is the ability to run the DDR3 memory at the highest speed, 1333 MHz, when configured with 384 GB of memory. Typically, as memory is added to a system, the memory speed decreases. This is not the case with the extended memory servers from Cisco Systems.
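
The capacity arithmetic described above can be checked with a short sketch. The function and constant names are illustrative and not part of any Cisco tool; the inputs (48 DIMM slots, four physical DIMMs presented as one logical DIMM, 4-GB and 8-GB modules) are the figures quoted in the text.

```python
# Illustrative arithmetic only; names are hypothetical, figures come from the text.
PHYSICAL_SLOTS = 48       # DIMM slots on a B250 M2 / C250 M2 server
DIMMS_PER_LOGICAL = 4     # four physical DIMMs appear to the CPU as one logical DIMM


def memory_layout(dimm_gb: int) -> dict:
    """Return the logical DIMM count, logical DIMM size and total capacity in GB."""
    logical_dimms = PHYSICAL_SLOTS // DIMMS_PER_LOGICAL
    logical_dimm_gb = dimm_gb * DIMMS_PER_LOGICAL
    return {
        "logical_dimms": logical_dimms,
        "logical_dimm_gb": logical_dimm_gb,
        "total_gb": logical_dimms * logical_dimm_gb,
    }


if __name__ == "__main__":
    for size_gb in (4, 8):
        print(f"{size_gb}-GB DIMMs:", memory_layout(size_gb))
```

With 8-GB modules the sketch gives twelve logical 32-GB DIMMs and 384 GB in total; with 4-GB modules it gives 192 GB, which matches the two population options the paper describes.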

B250 M2 Extended-Memory Blade Server
Building on the success of the Cisco UCS B200 M1 and UCS B250 M1 servers, the Cisco UCS B200 M2 and UCS B250 M2 servers extend the capabilities of the Cisco Unified Computing System with the next generation of Intel processor technology: Intel® Xeon® 5600 series processors. These powerful processors deliver more cores, threads, and cache, all within a similar power envelope, with even faster payback, greater productivity, and better energy efficiency. When put into production, the Cisco Unified Computing System and Intel Xeon 5600 series processors together offer further reductions in TCO, increased business agility, and another big leap forward in data center virtualization.

Product Overview
The Cisco UCS B-Series Blade Servers are crucial building blocks of the Cisco Unified Computing System, delivering scalable and flexible computing for today's and tomorrow's data center while helping reduce TCO. The Cisco UCS B-Series Blade Servers are based on industry-standard server technologies and provide:
• Up to two Intel Xeon 5600 series multicore processors
• Two optional front-accessible, hot-swappable SAS hard drives
• Support for up to two dual-port mezzanine card connections for up to 40 Gbps of redundant I/O throughput
• Industry-standard double-data-rate 3 (DDR3) memory
• Remote management through an integrated service processor that also executes policy established in Cisco UCS Manager software
• Local keyboard, video, and mouse (KVM) access through a front console port on each server
• Out-of-band access by remote KVM, Secure Shell (SSH) Protocol, and virtual media (vMedia) as well as Intelligent Platform Management Interface (IPMI)

The Cisco UCS B-Series offers two blade server models that use the Intel Xeon 5600 series processors: the Cisco UCS B200 M2 2-Socket Blade Server and the Cisco UCS B250 M2 2-Socket Extended Memory Blade Server (Figure 2).
The Cisco UCS B200 M2 is a half-width blade with 12 DIMM slots for up to 192 GB of memory; it supports one mezzanine adapter. The Cisco UCS B250 M2 is a full-width blade with 48 DIMM slots for up to 384 GB of memory; it supports up to two mezzanine adapters. The UCS B250 extended memory server is the hardware platform this white paper focuses on.

C250 M2 Extended-Memory Rack-Mount Server
The Cisco UCS C250 M2 server is a high-performance, memory-intensive, 2-socket, 2 RU rack-mount server designed to increase performance and capacity for demanding virtualization workloads. Applications that are memory bound today will benefit from the 384 GB of addressable memory that the Cisco UCS C250 M2 server offers. It can also reduce the cost of memory by allowing customers to use lower-capacity DIMMs in the additional memory slots available in the C250 M2 Extended Memory Rack-Mount Server.

With 48 DIMM slots available on a single motherboard, the Cisco UCS C250 M2 server design is unique among two-socket servers based on Intel Xeon 5600 series processors. From a memory-capacity perspective, both the C250 M2 and B250 M2 servers can alleviate memory bottlenecks by removing the need to move to the costly four-socket servers that might otherwise be necessary, helping improve the price-to-performance ratio for running large-memory-footprint applications. From a memory-cost perspective, the server can be populated with low-cost 4-GB DIMMs for a total of up to 192 GB of main memory; this configuration delivers a memory footprint that other two-socket, Intel Xeon 5600 series processor-based systems require 16-GB DIMMs to achieve. The server can also be populated with 8-GB DIMMs for a total of up to 384 GB of memory.
Extended memory can operate at the same speed (1333 MHz) as smaller memory footprints (typically with x86 servers, the speed decreases as memory is added).

Cisco® UCS C-Series Rack-Mount Servers extend unified computing innovations to an industry-standard form factor to help reduce total cost of ownership (TCO) and increase business agility. Designed to operate both in standalone environments and as part of the Cisco Unified Computing System™, the series employs Cisco technology to help customers handle the most challenging workloads. The series incorporates a standards-based unified network fabric, Cisco VN-Link virtualization support, and Cisco Extended Memory Technology. It supports an incremental single-server deployment model and protects customer investments with a future migration path to unified computing.

Customers can harness these benefits of Cisco Extended Memory Technology when very large memory footprints are required, or when large, low-cost memory footprints are desirable. For example, large virtualized environments can host more or larger virtual machines with the server's larger memory footprint, and with higher performance in cases in which existing implementations are memory bound.

Atlantis ILIO VDI Storage and Performance Optimization Software
Atlantis ILIO is a VDI storage and performance-optimization software solution that complements Cisco UCS to optimize storage, boost desktop performance, and make VDI security economically feasible. Atlantis ILIO fundamentally changes the economics and performance characteristics of VDI by intelligently optimizing how the Windows operating system interacts with storage.

Scaling and Optimizing Storage for VDI
Atlantis ILIO processes up to 90% of the virtual desktop storage traffic locally, offloading the shared storage infrastructure and therefore reducing the amount of storage needed for each desktop. This enables customers to scale their VDI deployments to 4-7 times more desktops (6 times in the case of CBRE) with their existing storage systems.

Boosting Desktop Performance
Atlantis ILIO addresses the virtual desktop performance problem without requiring additional storage infrastructure. It eliminates the storage bottleneck by effectively delivering a massive amount of IOPS to boost all aspects of virtual desktop performance, including boot time, logons, profile loading, applications, productivity tasks, and virtualized applications.

Making VDI Security Possible
Anti-virus protection and endpoint security are requirements for enterprise VDI deployments. However, traditional anti-virus can cut density per server by up to 50%, degrade performance, and ultimately increase the network, storage, and server infrastructure costs of VDI. Atlantis ILIO integrates with leading anti-virus solutions to dramatically accelerate anti-virus scanning and eliminate redundant anti-virus scanning operations. With traditional anti-virus, Atlantis ILIO can eliminate the additional storage required to service the IO traffic generated by anti-virus, increasing density and accelerating anti-virus scanning.

How Atlantis ILIO Works
Atlantis ILIO is a software virtual machine that is installed on the same hypervisor or rack as the virtual desktops to optimize how the Microsoft Windows XP and Windows 7 operating systems interact with storage.
Atlantis ILIO technologies, including content-aware IO processing and inline deduplication, are highly efficient and designed specifically for VDI workloads:

Content Aware IO Processing
Atlantis ILIO software processes all VDI traffic locally with Windows NTFS content awareness, within the same server or rack, to dramatically reduce the amount of IO traffic going to storage and eliminate the huge burden normally placed on a storage array by hundreds or thousands of virtual desktops.

In-Line Deduplication for VDI Workloads
Atlantis ILIO deduplicates all VDI images inline before they reach storage, effectively eliminating the need to store up to 90% of Windows image components and further reducing the amount of storage required for a successful VDI deployment.

Diskless VDI – Next Generation VDI Architecture

VDI adoption has been constrained by two critical barriers: cost per desktop and user acceptance related to poor desktop performance. For VDI to gain widespread adoption, virtual desktops must cost less than a physical PC and deliver an equal or better user experience. With traditional servers and storage, VDI suffers from storage and memory bottlenecks that drive up the cost per desktop and degrade performance.

Cisco Systems and Atlantis Computing have eliminated the memory and storage bottlenecks of VDI and delivered VDI reference architectures and validated designs that achieve a cost per desktop lower than a physical PC together with a better-than-PC user experience. Cisco Validated Designs with Atlantis ILIO enable customers to deliver a virtual desktop that costs less than a PC while delivering better performance than a physical PC.
In addition, Cisco UCS with Atlantis ILIO provides IT organizations with the flexibility to use Citrix XenDesktop or VMware View, blade servers or rack servers, and shared NAS or local disk (SAS or SSD) storage.

To date, VDI architectures have always relied on some type of disk to store virtual desktop images and clones, whether SATA, SAS or SSD on the local server and/or a shared SAN or NAS storage system. However, the unique Cisco Extended Memory Technology and Atlantis ILIO VDI optimization software make it possible to store virtual desktops in the memory of the local server without the use of disks, achieving a "Diskless VDI" architecture that is faster, delivers better price/performance and is more reliable than any existing VDI architecture.

What is Diskless VDI?
Diskless Virtual Desktop Infrastructure (VDI) is the concept of using local server memory in combination with storage optimization software to store virtual desktop images instead of shared SAN/NAS or local SAS/SSD storage. By storing virtual desktop images in the local memory of the hypervisor where the desktops execute, response times are faster than even the most expensive local SSD drives (MLC or SLC), costs are lower when combined with Atlantis ILIO, and reliability increases. With existing VDI architectures, virtual desktop images are stored on either shared SAN/NAS storage or local SSD disks, which are costly, have limited IOPS for write-intensive VDI workloads, can have a limited lifespan and consume more power than memory.

Why is Diskless VDI Now Possible?
VDI architectures have not been able to use memory as storage for non-persistent desktops for the following four reasons:
1. Storage Capacity Required per Desktop – The amount of memory required per desktop limited the density that could be achieved per server using memory as storage.
2. Cost of Memory – The cost of memory per gigabyte was too high to consider using memory as storage for non-persistent virtual desktops.
3. Maximum Memory Limitations – It was not possible to put enough memory onto a cost-effective server to create a RAM disk to store non-persistent virtual desktops.
4. Bus Speed with Large Memory Configurations – Prior to Cisco Extended Memory technology, increasing memory beyond 128GB per two-socket server meant decreasing the bus speed below 1333MHz.

Next Generation Hardware - Cisco Extended Memory
Cisco Extended Memory technology enables a two-socket, 12-core blade or rack server to use 48 DIMM slots to deliver 384GB of RAM at 1333MHz. For Diskless VDI, this means that customers can maximize server density per core, deliver sufficient memory to each virtual desktop, and still have enough memory remaining to create a RAM disk to store Atlantis ILIO optimized virtual desktop images.

Next Generation Software - Atlantis ILIO
Atlantis ILIO reduces the size of VDI images before they reach storage, effectively eliminating the need to store up to 90% of Windows image components and further reducing the amount of storage required for a successful VDI deployment. In the context of Diskless VDI, this means that customers can store virtual desktop images with 90% less capacity, making it possible to use memory as the storage for virtual desktop images rather than local SAS/SSD or shared SAN/NAS storage.

What are the Benefits of Diskless VDI using Cisco Extended Memory and Atlantis ILIO?

Unmatched Performance – Memory outperforms even the fastest local SSD storage and delivers virtually unlimited IOPS for desktops to use locally on each server, which dramatically improves all aspects of desktop performance including boot, login, application launch, patching and anti-virus scanning.

Lower Cost – Using Atlantis ILIO and Cisco UCS Extended Memory, it is possible to drive down the cost per desktop from both a CAPEX and OPEX perspective:
• CAPEX – The upfront cost per desktop can be decreased below $250 per desktop including the server hardware and storage.
• OPEX – Diskless VDI architectures mean that IT organizations can lower operating expenses by eliminating rack space for SAN/NAS storage, lowering power consumption, and eliminating the operational expenses of maintaining disk-based storage and replacing failed disks.

Increased Lifespan and Reliability – Memory does not suffer from the same lifespan issues for write-intensive VDI workloads as SSDs, meaning that a Diskless VDI architecture will be more reliable and have lower operational costs because there will be no failed disks to replace.

The Cisco UCS and Atlantis ILIO Diskless VDI Architecture

Architecture Overview
Cisco and Atlantis Computing™ have partnered to deliver a new and innovative Diskless VDI architecture that enables customers to deploy virtual desktops without the need for disk-based storage, with better-than-PC performance, and at a lower cost than physical workstations. The Cisco UCS and Atlantis ILIO Diskless VDI architecture integrates Cisco UCS blades (B series) or rack servers (C series) with Atlantis ILIO VDI optimization technology to provide a single high-performance server without the need for disk-based storage.

Figure 1. Cisco UCS and Atlantis ILIO Diskless VDI Recommended Architecture Diagram

Diskless VDI Configuration
After testing a variety of configurations, Cisco determined that the following configuration provided the optimal density of up to 120 virtual desktops:
• Virtual Desktops Per Server (Density) – 120
• RAM for the Hypervisor – 2GB
• RAM allocated per Desktop – 1.5GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• RAM for the Atlantis ILIO – 6GB
• CPUs for Atlantis ILIO – 1
• CPU reservation for Atlantis ILIO – 3324MHz
• CPUs for Desktops – 11

For information on the other configurations tested and detailed test results, see the "Test Methodology" section of this document.

Test Findings & Results

Cisco Extended Memory
Using Cisco Extended Memory, the Cisco UCS B250 supports up to 384GB of DDR3 memory at the maximum bus speed of 1333 MHz on a two-socket server. This configuration is unique and provides an optimal server configuration for a diskless VDI architecture with the best possible price/performance.

Atlantis ILIO Inline Deduplication of Diskless VDI Images
Atlantis ILIO deduplicates all virtual desktop images inline before they reach storage, effectively eliminating the need to store up to 90% of Windows image components and applications. In testing on the Cisco UCS C250 M2, results showed a reduction of 89.7%, from 14.66GB to 1.5GB per virtual desktop. With this roughly 90% reduction in virtual desktop image size, it becomes cost-effective to use a RAM disk as the primary datastore for virtual desktop images.

Figure 2. Virtual Desktop Image Size Before Atlantis ILIO as seen in Hypervisor

Figure 3. Virtual Desktop Image Size After Atlantis ILIO Inline Deduplication

Login VSI 3.0 was used to establish the maximum density of 120 virtual desktops on the Cisco UCS Diskless VDI configuration.
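
As a rough sanity check on the tested configuration listed earlier, the sketch below adds up the stated RAM allocations against the 384GB server and recomputes the deduplication ratio from the two reported image sizes. All names are illustrative; the sketch assumes the allocations are simply additive and ignores hypervisor overhead per virtual machine, which the paper does not quantify.

```python
# Illustrative check of the tested configuration; names are hypothetical,
# values are the ones listed in the Diskless VDI Configuration section.
TOTAL_RAM_GB = 384           # Cisco Extended Memory server
DESKTOPS = 120               # tested density
RAM_PER_DESKTOP_GB = 1.5
RAM_DISK_GB = 150            # RAM disk for virtual desktop clone storage
ILIO_RAM_GB = 6
HYPERVISOR_RAM_GB = 2

IMAGE_BEFORE_GB = 14.66      # image size before Atlantis ILIO
IMAGE_AFTER_GB = 1.5         # image size after inline deduplication


def memory_budget() -> dict:
    """Sum the stated allocations (assumed additive) and report the remainder."""
    desktops_gb = DESKTOPS * RAM_PER_DESKTOP_GB
    allocated_gb = desktops_gb + RAM_DISK_GB + ILIO_RAM_GB + HYPERVISOR_RAM_GB
    return {
        "desktops_gb": desktops_gb,
        "allocated_gb": allocated_gb,
        "headroom_gb": TOTAL_RAM_GB - allocated_gb,
    }


def reduction_percent(before_gb: float, after_gb: float) -> float:
    """Percentage by which the stored image shrinks."""
    return (1 - after_gb / before_gb) * 100


if __name__ == "__main__":
    print(memory_budget())
    print(f"{reduction_percent(IMAGE_BEFORE_GB, IMAGE_AFTER_GB):.2f}% reduction")
```

Under these assumptions the allocations total 338GB, leaving 46GB unallocated, and the recomputed reduction agrees with the reported figure to within rounding.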
During the test, VSIMax was not reached, meaning that response time at 120 virtual desktops was below the maximum threshold. At lower densities of 100 and 110, the Login VSI response time was significantly faster due to lower levels of CPU utilization.

Figure 4. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 120 Density

Figure 5. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 100 Density

Atlantis ILIO IO Processing
Atlantis ILIO IO processing in combination with Cisco Extended Memory delivered 44,123 IOPS using only memory and no local disk or SAN/NAS. The test was conducted using the IOMeter performance benchmarking tool to simulate a VDI workload. Based on analysis of VMware vSphere, the IOPS on this server were limited by the network driver that was used for vSphere (E1000). With the VMXNET3 adapter, it is possible to achieve up to 100,000 IOPS on a single server with Atlantis ILIO and Cisco Extended Memory. The IOMeter test was performed on Configuration 1 with 2 vCPUs allocated to the Atlantis ILIO virtual machine. For information on the configuration of IOMeter, see the Test Methodology section of this document.

Figure 6. Diskless VDI Architecture Storage Performance Measured in IOPS

Diskless VDI Price/Performance Compared to Existing Non-Persistent VDI Architectures

Scenario                      Server                        CPU                             RAM    Storage
1 Diskless VDI                Cisco UCS C250 M2 or B250 M2  Intel 5670, 12 cores @ 2.93GHz  384GB  No storage required
2 Local SAS with ILIO         Cisco UCS C250 M2 or B250 M2  Intel 5670, 12 cores @ 2.93GHz  192GB  2-8 15K SAS
3 Local SSD with ILIO         Cisco UCS C250 M2 or B250 M2  Intel 5670, 12 cores @ 2.93GHz  192GB  2x100GB SSD
4 Local SAS+SSD without ILIO  Cisco UCS C250 M2             Intel 5680, 12 cores @ 3.33GHz  192GB  2x100GB SSD, 6x15K SAS
5 NFS Storage without ILIO    Cisco B250 M2                 Intel 5670, 12 cores @ 2.93GHz  192GB  NetApp FAS 3210 with Flash Cache
6 NFS Storage with ILIO       Cisco B250 M2                 Intel 5670, 12 cores @ 2.93GHz  192GB  Shared storage
7 Block Storage without ILIO  Cisco B200 M2                 Intel 5680, 12 cores @ 3.33GHz  192GB  VNX 5300 with FAST Cache
8 HP Servers with FusionIO    HP DL 380 G7                  Intel 5670, 12 cores @ 2.93GHz  144GB  1x FusionIO 320GB MLC

Scenario                      Density Per Server  Total IOPS Per Server  CAPEX Total Cost Per Desktop  Cost Per IOPS
1 Diskless VDI                120                 44,123                 $197                          $0.58
2 Local SAS with ILIO         80                  20,000                 $219                          $0.87
3 Local SSD with ILIO         80                  25,000                 $237                          $0.76
4 Local SAS+SSD without ILIO  70                  6,000                  $219                          $2.55
5 NFS Storage without ILIO    80                  916                    $392                          $38.52
6 NFS Storage with ILIO       80                  20,000                 $457                          $1.83
7 Block Storage without ILIO  40                  2,475                  $1,833                        $29.63
8 HP Servers with FusionIO    80                  29,200                 $380                          $1.04

Decreases OPEX Costs (criteria rated per scenario; the individual marks are not recoverable from this copy):
• Supports Blade Servers
• NO Disk or SSD Replacement
• NO Power & Cooling for Disks/SSD
• NO Rack Space for Disks

1. Diskless VDI with Cisco UCS and Atlantis ILIO
Diskless VDI with Cisco UCS and Atlantis ILIO offers the option of either the Cisco UCS C250 rack server or the B250 blade server with 12 cores and 384GB of RAM. The server is configured with an Atlantis ILIO virtual machine and a 150GB RAM disk as the primary storage. This configuration was able to deliver 44,123 IOPS, or approximately 368 IOPS per desktop, at a density of 120 virtual desktops per server.
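
The per-desktop and per-IOPS figures in the comparison can be recomputed from the table's density, IOPS and cost columns. The sketch below is only one plausible reading: the paper does not state how Cost Per IOPS was derived, so the formula used here (cost per desktop multiplied by density, divided by total IOPS per server) is an assumption, and the function names are illustrative. It reproduces several of the printed values to within rounding; where it does not, the printed figure presumably rests on inputs not shown in this section.

```python
# (scenario, density per server, total IOPS per server, CAPEX cost per desktop in USD),
# copied from the comparison table.
SCENARIOS = [
    ("Diskless VDI", 120, 44_123, 197),
    ("Local SAS with ILIO", 80, 20_000, 219),
    ("Local SSD with ILIO", 80, 25_000, 237),
    ("Local SAS+SSD without ILIO", 70, 6_000, 219),
    ("NFS Storage without ILIO", 80, 916, 392),
    ("NFS Storage with ILIO", 80, 20_000, 457),
    ("Block Storage without ILIO", 40, 2_475, 1_833),
    ("HP Servers with FusionIO", 80, 29_200, 380),
]


def iops_per_desktop(density: int, total_iops: int) -> float:
    """Average IOPS available to each desktop on one server."""
    return total_iops / density


def cost_per_iops(density: int, total_iops: int, cost_per_desktop: float) -> float:
    """Assumed formula: per-server CAPEX divided by per-server IOPS."""
    return cost_per_desktop * density / total_iops


if __name__ == "__main__":
    for name, density, iops, cost in SCENARIOS:
        print(
            f"{name:28s} {iops_per_desktop(density, iops):8.1f} IOPS/desktop "
            f"${cost_per_iops(density, iops, cost):6.2f} per IOPS"
        )
```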
With Cisco Extended Memory Technology, the server is able to provide 384GB of memory on a single server with Atlantis ILIO software for a cost per desktop of $212. The Cisco and Atlantis ILIO Diskless VDI architecture offers by far the best price/performance of the existing non-persistent VDI architectures, with a cost per IOPS of $0.58. In addition, the Diskless VDI architecture lowers operational costs by eliminating the possibility of failed drives, consuming less power and using less rack space than a shared storage architecture. The Diskless VDI architecture with Cisco UCS and Atlantis ILIO has undergone proof-of-concept testing by Cisco and has been deployed in production by a large financial services customer.

2. Local SAS Drives with Atlantis ILIO

With Atlantis ILIO, it is possible to deploy a high-performance virtual desktop image at a low cost per desktop with standard 15K SAS drives in a Cisco C250 M2 rack server. The server is configured with an Atlantis ILIO virtual machine and 2-8 15K SAS disks as the primary storage for virtual desktop images. This configuration delivered an estimated 20,000 IOPS, or 250 IOPS per desktop, at a density of 80 virtual desktops per server. With Cisco Extended Memory Technology, the server is able to provide 192GB of memory on a single server with Atlantis ILIO software for a cost per desktop of $234. The Cisco and Atlantis ILIO Local SAS VDI architecture offers a low cost per desktop and extremely fast desktop performance using standard SAS drives. This configuration was validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco VXI Cisco Validated Design for VMware View.

3. Local SSD Drives with Atlantis ILIO

Local SSD drives are often considered as a storage option for VDI because they are rated to provide a large number of IOPS per drive. However, SSD performance is dramatically reduced by the write-heavy, intense characteristics of the VDI workload.
In addition, the VDI workload can shorten the limited lifespan of an SSD from years to months, increasing the risk of disk failure. From a cost perspective, SSDs can also be very costly and have limited storage capacity, necessitating the use of SAS drives in combination with SSD drives. However, Atlantis ILIO inline deduplication decreases the capacity required per virtual desktop to the point where two 100GB SSDs are sufficient to support 80 virtual desktops per server. Atlantis ILIO IO traffic reduction reduces the write IO load on the SSD, which both extends SSD lifetime and increases the number of IOPS available to virtual desktops.

In the Local SSD architecture, the server is configured with an Atlantis ILIO virtual machine and two 100GB SSDs as the primary storage for virtual desktop images. This configuration delivered an estimated 25,000 IOPS or 333 IOPS per desktop at a density of 80 virtual desktops per server. With Cisco Extended Memory Technology, the server is able to provide 192GB of memory on a single server with Atlantis ILIO software for a cost per desktop of $252. The Cisco and Atlantis ILIO Local SSD VDI architecture offers a low cost per desktop and extremely fast desktop performance, and can be deployed with either a Cisco UCS C Series rack server or a B Series blade server. This configuration was validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco, Citrix and Atlantis ILIO VDI Reference Architecture.

4. Local SAS+SSD Drives without Atlantis ILIO

The Cisco VXI Phase 2 testing includes a deployment profile using a Cisco C250 M2 with 2x 100GB SSD drives and 6x 146GB SAS drives at a density of 70. Unlike the Local SSD with Atlantis ILIO configuration, the size of the virtual desktop images prevents achieving a cost-effective density without adding the six SAS drives.
In this configuration, the read-intensive replica or master image is placed on the SSDs, while the write-intensive clones are placed on the SAS drives. As a result, only read-intensive tasks such as boot and anti-virus scans benefit from the additional IOPS delivered by the SSDs, while normal write-intensive tasks remain bottlenecked by the limited IOPS of the SAS drives.

Note: The blue line represents reads serviced by the SSD drives storing the master image, while the red line shows reads serviced by the SAS drives storing the linked clones.

Note: The turquoise line shows writes to the SAS drives storing the linked clones, while the blue line shows no writes going to the SSDs.

“SSD drives are used strictly for the read-only replica and is reflected in the graph above by the `0' write IOPS to the SSD drives.”

According to Cisco VXI testing using a simulated VDI workload with SCAPA, this configuration delivered a peak of 6,000 total IOPS, or 86 IOPS per desktop, at a density of 70 virtual desktops per server. While the cost of this configuration is low at $202 per desktop, the cost per IOPS is very high at $2.36 (compared to $0.56 with Diskless VDI). At 86 IOPS per desktop, this configuration provides adequate performance but does not match a physical PC, and performance may be poor during times of peak usage, when desktops burst over 100 IOPS per desktop. This configuration was validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco VXI Cisco Validated Design for VMware View.

5.
Network File System (NFS) without Atlantis ILIO

The pricing used for this analysis includes the Cisco UCS server hardware and NFS shared storage but excludes the switching and VDI broker licenses for comparison purposes. The NFS array used was a shared storage architecture designed to provide 6-10 IOPS per desktop. Because buying a shared storage array is far more expensive than purchasing local SAS, SSD or memory, the cost per desktop of this shared storage component is almost double that of a local disk or memory based architecture. In addition, 6-10 IOPS per desktop will result in a poor user experience, as Windows 7 requires 30 to 100 IOPS to match physical PC performance. Due to the high cost per desktop and the low number of IOPS per desktop, the price/performance of the FlexPod architecture, at $38.52 per IOPS, is much worse than that of any of the local disk based architectures ($0.56 per IOPS for Diskless VDI). This configuration was validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco VXI Cisco Validated Design for VMware View.

Note: The green line indicates writes per second, while the red line indicates reads per second. This test was conducted at a density of 80. The write and read IOPS together total about 800 IOPS, or 10 IOPS per desktop.

6. Network File System (NFS) utilizing Atlantis ILIO

The pricing used for this analysis includes the Cisco UCS server hardware, NFS shared storage, and the Atlantis ILIO software license cost but excludes the switching and VDI broker licenses for comparison purposes. This architecture is the same as the architecture above, except that an Atlantis ILIO virtual machine is inserted between the hypervisor and NFS storage to deliver 20 times more IOPS per desktop (250 IOPS per desktop with ILIO vs.
10) and reduce the amount of storage capacity consumed. The addition of the Atlantis ILIO license cost increases the cost per desktop slightly but delivers far more IOPS per desktop, resulting in a much better price per IOPS of $1.89 compared to $38.52 per IOPS with NFS alone. In addition, providing each desktop with 250 IOPS will result in better than physical PC performance, compared to the 10 IOPS per desktop with NFS alone. The NFS architecture with Atlantis ILIO is a good choice for customers who prefer shared storage, want to take advantage of features such as vMotion that are not possible with local disk architectures, or are using persistent desktops. This configuration was validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco VXI Cisco Validated Design for VMware View.

7. Block Level Storage

The pricing used for this analysis includes the Cisco UCS server hardware and block based shared storage components. The block level storage array uses a shared storage architecture that is designed to provide 2,457 IOPS total and 62 IOPS per desktop when used with 160 virtual desktops (40 desktops per server). Because buying a shared storage array is far more expensive than purchasing local SAS, SSD or memory, the cost per desktop of the block level storage is 8 times that of a local SSD based architecture with Atlantis ILIO. Larger block level storage arrays that support more virtual desktops will likely have a lower cost per desktop, but it will always be at least twice the cost of a local disk architecture. When more desktops are included in a block level VDI storage architecture, it is important to ensure that the IOPS per desktop keeps pace with the increase in density. Due to the high cost per desktop, the price per IOPS of the block level architecture, at $29.63, is much higher than that of any of the local disk based architectures ($0.56 per IOPS for Diskless VDI).
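The per-desktop IOPS and cost-per-IOPS figures quoted across these scenarios can be cross-checked with simple arithmetic. The sketch below takes density, total IOPS and cost per desktop from the comparison table. The relationship cost per IOPS = (cost per desktop × density) / total IOPS is an assumption inferred from the table values rather than a formula this paper states, and the function names are illustrative only.

```python
# Values copied from the price/performance comparison table in this paper.
# Each entry: (desktops per server, total IOPS per server, cost per desktop in USD).
SCENARIOS = {
    "Local SAS with ILIO": (80, 20_000, 219),
    "Local SSD with ILIO": (80, 25_000, 237),
    "Local SAS+SSD without ILIO": (70, 6_000, 219),
    "NFS storage with ILIO": (80, 20_000, 457),
    "HP servers with FusionIO": (80, 29_200, 380),
}


def iops_per_desktop(density: int, total_iops: int) -> float:
    """Average IOPS available to each desktop on one server."""
    return total_iops / density


def cost_per_iops(density: int, total_iops: int, cost_per_desktop: float) -> float:
    """ASSUMED relationship: per-server desktop cost divided by per-server IOPS."""
    return cost_per_desktop * density / total_iops


def summarize() -> dict:
    """Return {scenario: (IOPS per desktop, cost per IOPS)} for every scenario."""
    return {
        name: (iops_per_desktop(d, iops), cost_per_iops(d, iops, cost))
        for name, (d, iops, cost) in SCENARIOS.items()
    }


if __name__ == "__main__":
    for name, (per_desktop, per_iops) in summarize().items():
        print(f"{name}: {per_desktop:.0f} IOPS/desktop, ${per_iops:.2f} per IOPS")
```

Under this assumption the results land within a cent or two of the table for the scenarios listed. The Diskless VDI and NFS-without-ILIO rows do not reproduce exactly from the rounded table values, so they are left out of the example.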
This configuration was validated as part of Cisco VXI Phase 2 Testing.

8. HP Servers with FusionIO

Local SSD drives are often considered as a storage option for VDI because they are rated to provide a large number of IOPS per drive. However, SSD performance is dramatically reduced by the write-heavy, intense characteristics of the VDI workload. In addition, the VDI workload can shorten the limited lifespan of an SSD from years to months, increasing the risk of disk failure. From a cost perspective, SSDs can also be very costly and have limited storage capacity, necessitating the use of SAS drives in combination with SSD drives or the purchase of higher capacity and more expensive SSD drives. Fusion IO SLC drives provide a large number of IOPS. However, even with a relatively small desktop clone size of 4GB, you need to purchase a 320GB Fusion IO SLC card, which costs $14,362 per card. If the virtual desktop clones or user data disks grow, a second 320GB SLC card will need to be added. This configuration delivered an estimated 29,200 IOPS, or 365 IOPS per desktop, at a density of 80 virtual desktops per server. At $380, the cost per desktop is 84% more expensive than the Diskless VDI architecture and 67% more expe

The Cisco and Atlantis ILIO Local SSD VDI architecture offers a low cost per desktop and extremely fast desktop performance, and can be deployed with either a Cisco UCS C Series rack server or a B Series blade server.

Comparing Diskless VDI to Existing Approaches to VDI Storage

Challenges of the VDI Workload for Traditional Storage

Traditional storage technologies, including SAN/NAS and SSDs, are not equipped to handle the unique nature of VDI workloads, which results in poor desktop performance and more storage disks required to service VDI IO traffic.
Write Heavy IO Traffic

Unlike server virtualization, VDI workloads are extremely write intensive, with a typical IOPS distribution of 80% write and 20% read during normal desktop operation. Traditional storage caching and SSDs are ineffective with write IO and do little to improve virtual desktop performance.

IO Blender Effect: From Sequential to Random Small Blocks

When the Windows operating system generates disk IO, it optimizes that IO on its local hard drive so that blocks are stored sequentially for optimal performance. With VDI, the hypervisor converts sequential IO into small blocks of random IO (the IO Blender effect), which decreases storage and desktop performance. Atlantis ILIO automatically converts the small random blocks into larger blocks of sequential IO before sending them to storage, increasing storage and desktop performance.

Peak Bursts of 10x Average IO

With VDI, end user activities such as simultaneous boot, logon and application IO storms, or common IT activities such as anti-virus scanning, patching or cloning, generate peak IO that can be 10 times or more the average IO traffic. As a result, storage can either be sized for peaks and be extremely expensive, or be sized for the average IO traffic and suffer a serious performance impact during periods of peak activity. Atlantis ILIO delivers local IOPS to virtual desktops to ensure consistently high performance during peak usage.

The Challenge of Sizing and Designing VDI Storage Architectures

With physical PCs, it was easy for IT organizations to deliver desktops without concern for scale. The desktop team would support a fixed number of standardized desktop and laptop PC models designed for different types of workers. Each model delivered a predictable level of performance for a predictable price, because physical PCs have dedicated and fixed computing resources (memory, CPU, hard drive).
With virtualization, computing resources are abstracted and pooled to be used more efficiently. With server virtualization, the workloads are predictable, enabling IT organizations to accurately predict the usage of computing resources. With desktop virtualization, however, workloads are unpredictable, with large variation between average and peak resource utilization; they are write IO heavy (80% write/20% read) and generate random 4K blocks. To achieve a balance between VDI cost and performance, IT organizations need to design server, networking and storage infrastructure components that can scale linearly as the VDI user base grows. There are three critical elements in planning VDI storage:

1. Storage IO Throughput (Measured in IOPS) – The first VDI bottleneck reached is almost always storage IO throughput, because a typical disk delivers a fixed number of input/output operations per second (IOPS) regardless of whether its capacity is 100GB or 1TB. The only way to increase the number of IOPS with traditional storage is to increase the number of disks and controllers.

2. Network Throughput to Storage (Gb/s) – With shared SAN or NAS storage, it is also critical to ensure that there is sufficient network throughput to the storage system. VDI architectures often require 10GbE or multiple Fibre Channel links to support the amount of network traffic generated by virtual desktop IO.

3. Storage Capacity (Measured in GB) – Storage capacity is often less of an issue, as you can select drive sizes to match the size of the virtual desktops. However, with persistent desktops of 20-80GB per desktop, even capacity can become a bottleneck.

Existing Approaches to VDI Storage

There are two existing approaches to VDI storage:

1. Shared Storage using SAN or NAS

2.
Local Disk Storage using SSDs, SAS or SATA drives

Shared SAN or NAS storage offers the benefit of increased reliability and the ability to support persistent desktops and virtualization features such as vMotion. However, using SAN or NAS storage can also increase the cost per desktop by 2 to 10 times compared to local disk or the diskless VDI architecture discussed in this white paper. In addition, the desktop performance delivered by a SAN or NAS is typically poor because storage systems are designed to support only 6-10 IOPS per desktop to keep desktop costs down. Conversely, local disk based storage architectures support only non-persistent virtual desktops. However, local disk architectures cost far less and deliver better desktop performance by providing 62-365 IOPS per desktop (see the Price/Performance section of this white paper). As a result, many customers with non-persistent virtual desktops are shifting to a local disk based VDI storage architecture with Cisco UCS and Atlantis ILIO to deliver

SAN or NAS Storage

Figure 7. Cisco Reference Architecture with NAS Storage

Cost

Conventional SAN/NAS storage systems can only deliver the data throughput (IOPS) needed to support VDI by increasing the number of storage drives and controllers well beyond the number needed to deliver the required disk space. Consequently, the cost of VDI storage per desktop to achieve the equivalent performance of a physical PC can be anywhere from $509 [1] to $2,385 [2], depending on the SAN or NAS storage system, virtual desktop image and other infrastructure factors. In addition, SAN or NAS storage can require additional networking, such as Fibre Channel HBA cards or 10GbE networking, to achieve acceptable performance, which further drives up the cost of VDI.
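The sizing argument above, that the drive count of a conventional array is set by IOPS rather than by capacity, can be illustrated with a rough calculation. This is a minimal sketch, not a sizing tool: the 180 IOPS per 15K drive figure is a hypothetical planning number that does not come from this paper, the function name is illustrative, and RAID overhead, write penalties and controller caching are all ignored. The 30 and 100 IOPS per desktop values, the 4GB clone size and the 146GB drive size are taken from elsewhere in this paper.

```python
import math


def drives_required(desktops, iops_per_desktop, gb_per_desktop,
                    iops_per_drive, gb_per_drive):
    """Return (drives needed to meet IOPS, drives needed to meet capacity)."""
    for_iops = math.ceil(desktops * iops_per_desktop / iops_per_drive)
    for_capacity = math.ceil(desktops * gb_per_desktop / gb_per_drive)
    return for_iops, for_capacity


if __name__ == "__main__":
    ASSUMED_IOPS_PER_DRIVE = 180  # hypothetical value for a 15K SAS drive
    for label, iops in (("average (30 IOPS/desktop)", 30),
                        ("peak (100 IOPS/desktop)", 100)):
        by_iops, by_capacity = drives_required(
            desktops=1000,
            iops_per_desktop=iops,
            gb_per_desktop=4,        # clone size used in this paper
            iops_per_drive=ASSUMED_IOPS_PER_DRIVE,
            gb_per_drive=146,        # SAS drive size used in this paper
        )
        print(f"{label}: {by_iops} drives for IOPS, {by_capacity} for capacity")
```

With these inputs the IOPS requirement, not the capacity requirement, determines how many drives must be purchased, and sizing for peak rather than average load multiplies that count further.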
While VDI functions with $50-$500 in storage per desktop, desktop performance will suffer during periods of peak usage, and users will often reject switching from a physical desktop to a virtual desktop. From an operational perspective, a 224 disk NAS storage system supporting a 1,000 virtual desktop deployment would require almost 2 racks of datacenter space, with the associated power and cooling costs.

2 Interview with Franfurter Bank with Hitachi Data Systems 9585 2GB Fiber Channel Storage estimated at $2,385 per desktop without Atlantis ILIO.

Latency

A traditional physical PC hard drive is dedicated to a single desktop and connected to the same physical hardware, resulting in a 1-2ms response time. Using traditional SAN/NAS as the hard drive of a virtual desktop (.vmdk or .vhd file) requires the hypervisor (VMware vSphere, Citrix XenServer or Hyper-V) to read and write disk IO over the network, which can introduce latency of 4-20ms. As the VDI deployment scales, more desktops compete for a limited amount of IO throughput (IOPS), which increases response time. As the IOPS load increases on a SAN/NAS system, the latency or response time also increases. This means that you can only load the SAN/NAS storage system to about 50% without increasing latency and degrading desktop performance. In the example below, latency begins to increase sharply at 50% load.

Figure 8. NAS Response Time by IO Throughput Load (Source: SPC Benchmark) [3]

Reliability

SAN and NAS storage systems are typically highly reliable because they have a variety of RAID levels available to protect against data loss when a disk failure occurs and High Availability options to protect against controller failures.
However, using a large number of SAS and SATA drives means the disks will have to be maintained and replaced when failures occur, which introduces significant operational expenses.

“We find that in the field, annual disk replacement rates typically exceed 1%, with 2-4% common and up to 13% observed on some systems. This suggests that field replacement is a fairly different process than one might predict based on datasheet MTTF.” [4]

Local Solid State Drives (SSDs)

Local Solid State Drives (SSDs) can be used with non-persistent virtual desktops to store the clones or write cache. They come in two general types: Single Level Cell (SLC) and Multi-Level Cell (MLC). Within the SLC category, there are also flash based drives that deliver more IOPS but at a much higher cost per drive and per GB. Due to concerns about lifespan and write performance under heavy VDI loads with random small block IO, VDI deployments typically use standard SLC SSD drives or flash based SLC drives.

Cost – SSD cost varies widely by drive type, from $200 for consumer grade SSDs to $14,362 for a 320GB flash based SLC drive [5]. Due to the size of virtual desktop clones, you often need to purchase multiple SSD drives or limit the density of virtual desktops per server, which drives up the cost per desktop (see the Price/Performance section of this document for examples).

Performance – SSDs typically list extremely high IOPS per drive for sequential, read-heavy workloads. However, the VDI workload, which consists of write-heavy IOPS with random 4K blocks, decreases the number of IOPS delivered. As a result, some architectures use SSDs as a storage tier for the master image, which has small capacity requirements and read-heavy IO traffic, and then places the linked

Capacity – SSD drives are limited to 100GB to 320GB of capacity, and the cost per GB is as high as $44 per GB.
While processors can handle 50-200 desktops per server, the desktops require at least 4GB each of capacity to store the virtual desktop clones, which means that VDI architectures become capacity bound with SSDs.

Reliability – MLC SSDs are not typically used for enterprise-class applications such as VDI because of their short lifespan. SLC SSDs have a much longer lifespan but can still fail within the lifespan of a VDI server (months or years). As a result, it is necessary to mirror drives using RAID to protect against disk failure, which doubles the cost per usable gigabyte.

4 http://www.usenix.org/events/fast07/tech/schroeder.html
5 http://www.cdw.ca/shop/products/FUSION-IO-320GB-IODRIVE-SLC/2318279.aspx

Conclusion

Cisco Systems and Atlantis Computing™ have partnered to deliver a revolutionary Diskless VDI architecture that eliminates the need for all disk-based storage. For the first time, Cisco Extended Memory Technology and Atlantis ILIO VDI Optimization software make it possible to replace shared storage or local SSDs with local memory for virtual desktop storage. This unique VDI architecture enables customers to create a Virtual Desktop Infrastructure with a cost per desktop below $250 per seat, better performance than a physical PC, and lower OPEX costs by eliminating the power, cooling and operational complexity of maintaining and replacing disks in the datacenter.

Availability

The products used in the Cisco and Atlantis Computing reference architectures using local disk (SAS, SSD) and shared storage are available immediately through Cisco and Atlantis Computing partners. The technology used in the Cisco UCS and Atlantis ILIO Diskless VDI architecture is being showcased for the first time at VMworld 2011 and is not yet generally available.
For more information on the Cisco Systems and Atlantis ILIO Diskless VDI Architecture, contact Cisco Systems or Atlantis Computing.

Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883

Atlantis Computing, Inc.
2570 West El Camino Real, Suite 230
Mountain View, CA 94040
USA
www.atlantiscomputing.com
Tel: 650 917-9471

Cisco, the Cisco logo, and Cisco Systems are registered trademarks or trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries. All other trademarks mentioned in this document are the property of their respective owners. (0805R)

Appendix 1 – Previous Cisco UCS and Atlantis Computing VDI Reference Architecture Testing

Cisco Systems and Atlantis Computing have partnered to deliver a variety of VDI Reference Architectures and Cisco Validated Designs to assist customers in designing their VDI deployments using both VMware View and Citrix XenDesktop with a variety of backend storage options, including:

• Local SAS Drives on a C250 M2 UCS Server (Cisco VXI Phase 2)
• Local SSD Drives on a B250 M2 UCS Server (Cisco, Citrix and Atlantis ILIO Reference Architecture)
• Shared NAS (NetApp 3170) with a B250 M2 UCS Server (Cisco, Citrix and Atlantis ILIO Reference Architecture)

Cisco Systems testing shows that Atlantis ILIO with Cisco UCS can:

• Cut VDI Storage Costs – Reduce VDI storage by up to 90%
• Scale Existing VDI Storage – Add 4-10 times more users on the same storage with better performance
• Boost VDI Performance – Eliminate IO bottlenecks to increase performance up to 10 times

Cisco VXI Phase 2 Testing Results Summary

In this profile, Atlantis ILIO was
deployed on a UCS C250 M2 with local drives to optimize storage and improve overall performance. Testing was done with 70 Windows 7 32-bit desktops running on View 4.5 and ESXi.

Test Environment and Setup

• View 4.5 on ESXi 4.1; RDP
• HVD Profile:
  – Windows 7 32-bit with 1.5GB of memory and 20GB of disk space
  – 1 vCPU, persistent desktop
• Workload Profile: Cisco VXI KW+ (Cisco Unified Personal Communicator 8.5 in deskphone mode, IE, Microsoft Office 2007 apps, Acrobat) with McAfee Move AV 1.5 running a default scan policy (see the MoveAV section below for the scan policy used)
• UCS Server: C250 M2 with 192GB of memory, two six-core Intel Xeon (EP) 5680 processors @ 3.33 GHz and 1GE uplinks
• Storage: DAS with 8 SAS drives in a RAID 10 configuration; Atlantis ILIO was deployed on the blade with 24GB of RAM; Atlantis ILIO is seen as an NFS datastore by the hypervisor
• All of the data shown in the graph below was collected using resxtop with a polling interval of 5 seconds
• User experience is measured using Scapa as outlined earlier
• For this profile, data is captured and graphed for the Login, Workload and Logout phases

Summary of Test Results

During Cisco VXI testing, the Atlantis ILIO software virtual appliance was installed on a Cisco UCS server running 70 virtual desktops to optimize how the Windows operating system interacts with the local SAS storage disks, reducing the number of disks required and boosting desktop performance. During the testing, Atlantis ILIO showed an average offload of 92% and a peak offload of 94% as measured by esxtop Disk Bandwidth in MB/s.

Metric                                               Maximum (Peak)   Average
IO Traffic from Hypervisor to Atlantis ILIO (MB/s)   245.98           11.62
IO Traffic from Atlantis ILIO to Disk (MB/s)         14.02            0.98
Percentage Offload                                   94%              92%

Figure 9. Cisco VXI Phase 2 Atlantis ILIO IOPS Offload Test Results

For more information on Cisco VXI Phase 2 Testing, visit:
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/VXI/CVD/VXI_CVD_Citrix.html
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/VXI/CVD/VXI_CVD_VMware.html

Cisco, Citrix and Atlantis Computing VDI Reference Architecture

Cisco and Atlantis Computing™ partnered to deliver a VDI solution and reference architecture that eliminates the VDI memory and storage bottlenecks, enabling customers to deploy VDI with better performance and at a lower cost than a physical PC. The joint solution includes the following components (Figure 1).

Figure 1. Solution Components

There are two Atlantis deployment options documented in the reference architecture:

Atlantis ILIO Top-of-Rack: Atlantis ILIO running on one dedicated Cisco UCS B250 M2 Blade Server, optimizing storage and performance for up to 8 blade servers using shared NAS storage (NetApp 3170).

Atlantis ILIO OnBlade: Atlantis ILIO running on each Cisco UCS B250 M2 Blade Server, optimizing storage and performance for the desktops on that blade using 2 local SSD drives.

The test results show that the Cisco and Atlantis VDI Reference Architecture is able to deliver the performance, cost and scalability required by enterprise customers with Microsoft Windows 7 running anti-virus, as shown in Table 1.

Table 1. Test Results Summary

Test Results                          Atlantis ILIO Top-of-Rack   Atlantis ILIO OnBlade
Virtual Desktop Density Per Blade     80                          80
Performance Benchmark Pass Rate       100%                        100%
Atlantis ILIO Write IOPS Offload      71%                         82%
Atlantis ILIO Read IOPS Offload       67%                         91%
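The offload percentages reported for the Cisco VXI Phase 2 test can be reproduced from the traffic measured on either side of Atlantis ILIO. The sketch below is a worked check, not part of the test tooling. It assumes that offload is defined as the share of hypervisor traffic that never reaches disk, and that 245.98 MB/s pairs with 14.02 MB/s (peak) and 11.62 MB/s pairs with 0.98 MB/s (average), since that pairing is the one that reproduces the reported 94% and 92%. The function name is illustrative.

```python
def offload_percent(traffic_in_mb_s: float, traffic_out_mb_s: float) -> float:
    """Share of incoming traffic that is absorbed before reaching disk, in percent.

    ASSUMED definition: (1 - traffic to disk / traffic from hypervisor) * 100.
    """
    if traffic_in_mb_s <= 0:
        raise ValueError("incoming traffic must be positive")
    return (1 - traffic_out_mb_s / traffic_in_mb_s) * 100


if __name__ == "__main__":
    # Traffic figures from the Cisco VXI Phase 2 offload results in this appendix.
    peak = offload_percent(245.98, 14.02)
    average = offload_percent(11.62, 0.98)
    print(f"Peak offload: {peak:.1f}%")
    print(f"Average offload: {average:.1f}%")
```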
This graph shows the LoginVSI 2.1 minimum, maximum and average response time for desktops in the Atlantis ILIO 80 desktop OnBlade test.

For more information on the Cisco, Citrix and Atlantis Computing VDI Reference Architecture, visit: www.atlantiscomputing.com/ciscocitrixatlantis

Appendix 2 - Test Methodology

Testing Overview

Testing for the Cisco and Atlantis Computing Diskless VDI reference architecture focused on determining the optimal configuration to maximize price/performance for the VDI architecture. To accomplish this, it was critical to establish the maximum density of virtual desktops per server with acceptable desktop performance. To measure this, Cisco used the LoginVSI 3.0 medium workload to ensure that the maximum density, or “VSIMAX”, was not reached during the test cycle. In addition, the LoginVSI chart shows the response time in milliseconds, which was used to compare desktop performance between the tested configurations that passed the density test.

The testing started at a density of 100 virtual machines per server, which produced an excellent LoginVSI result with a maximum CPU utilization of 70%. In configuration 2, the density was increased to 110 virtual desktops per server with 2 CPUs for Atlantis ILIO. In this test, the hypervisor CPU averaged 46% but spiked to 100% for a brief period. However, the Atlantis ILIO virtual machine only used a maximum of 39% of the CPU. Therefore, Cisco determined that the CPU reservation should be lowered to 1 CPU for Atlantis ILIO, thereby freeing up more CPU resources for desktops to add more density. In configuration 3, the density was increased to 120 with more CPU allocated to the desktops. The result was a passing LoginVSI Max score.
This was determined to be the maximum density for the configuration, based on the CPU utilization reaching 100% and the LoginVSI response time increasing compared to the 100 and 110 density configurations.

To measure the number of IOPS for the storage configuration, Cisco used IOMeter configured to simulate a VDI workload. The Cisco and Atlantis ILIO Diskless VDI architecture delivered 44,123 IOPS per server, or 367 IOPS per desktop.

Tested Diskless VDI Configurations & Detailed Results

All configurations were tested with the Cisco B250 M2 Extended Memory blade server with the Intel 5680 processor @ 3.33GHz and 384GB of memory.

Configuration 1 – 100 Density with 2GB of RAM Per Desktop

In configuration 1, the test was set up with the following configuration:

• Virtual Desktops Per Server (Density) – 100
• RAM for the Hypervisor – 2GB
• vRAM allocated per Desktop – 2GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• vRAM for the Atlantis ILIO – 6GB
• vCPUs for Atlantis ILIO – 2
• CPU reservation for Atlantis ILIO – 6648MHz
• CPUs for Desktops – 10

Figure 10. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 100 Density

Figure 11. Atlantis ILIO CPU Utilization at 100 Virtual Machines

Figure 12.
Hypervisor CPU Utilization at 100 Virtual Machines

Configuration 2 – 110 Density with 1.5GB of RAM Per Desktop

In configuration 2, the test was set up with the following configuration:

• Virtual Desktops Per Server (Density) – 110
• RAM for the Hypervisor – 2GB
• vRAM allocated per Desktop – 1.5GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• vRAM for the Atlantis ILIO – 6GB
• vCPUs for Atlantis ILIO – 2
• CPU reservation for Atlantis ILIO – 4995MHz
• CPUs for Desktops – 10-11

Figure 13. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 110 Density

Figure 14. Atlantis ILIO CPU Utilization at 110 Virtual Machines

Figure 15. Hypervisor CPU Utilization at 110 Virtual Machines

Configuration 3 – 120 Density with 1.5GB of RAM Per Desktop

In configuration 3, the test was set up with the following configuration:

• Virtual Desktops Per Server (Density) – 120
• RAM for the Hypervisor – 2GB
• vRAM allocated per Desktop – 1.5GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• vRAM for the Atlantis ILIO – 6GB
• vCPUs for Atlantis ILIO – 1
• CPU reservation for Atlantis ILIO – 3333MHz
• CPUs for Desktops – 11

Figure 16. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 120 Density

Figure 17. Atlantis ILIO CPU Utilization at 120 Virtual Machines

Figure 18. Hypervisor CPU Utilization at 120 Virtual Machines

Measuring Diskless VDI Performance

To test the performance and establish the maximum density of the Diskless VDI architecture, two different tests were performed:

1.
IOMeter – Measures storage performance in terms of input/output operations per second (IOPS) to storage.
2. Login VSI 3.0 Medium Workload – Measures overall desktop performance and establishes the maximum density for a given VDI configuration.

IOMeter

IOMeter can be configured to model different types of storage workloads. In this case, IOMeter was configured to simulate a VDI workload with the following configuration:

• Disk Targets – Maximum disk size set to 2097152 sectors (1 GB test file)
• Test connection rate – 500 transactions per second
• Access Specifications
  o 4KB transfer request size
  o 100 percent access specification
  o 80% write
  o 20% read
  o 80% random

LoginVSI 3.0

The LoginVSI Medium workload test is designed to simulate a normal knowledge worker VDI workload using common productivity applications and to measure response time.

Login VSI 3.0 Medium

• This workload emulates a medium knowledge worker using Office, IE and PDF.
• Once a session has been started, the medium workload repeats every 12 minutes.
• During each loop the response time is measured every 2 minutes.
• The medium workload opens up to 5 apps simultaneously.
• The type rate is 160ms for each character.
• The medium workload in Login VSI 2.0 is approximately 35% more resource intensive than in Login VSI 1.0.
• Approximately 2 minutes of idle time is included to simulate real-world users.

Each loop will open and use:

• Outlook 2007, browse 10 messages.
• Internet Explorer, one instance is left open (BBC.co.uk), one instance is browsed to Wired.com, Lonelyplanet.com and heavy flash app gettheglass.com (not used with MediumNoFlash workload).
• Word 2007, one instance to measure response time, one instance to review and edit a document.
• Bullzip PDF Printer & Acrobat Reader, the Word document is printed to PDF and reviewed.
• Excel 2007, a very large randomized sheet is opened.
• PowerPoint 2007, a presentation is reviewed and edited.
• 7-zip: using the command line version the output of the session is zipped.

For more information on analyzing VSI results, visit http://www.loginvsi.com/

Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706 USA
www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883

© 2011 Cisco Systems, Inc. All rights reserved. Cisco, the Cisco logo, and Cisco Systems are registered trademarks or trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other countries. All other trademarks mentioned in this document are the property of their respective owners. (0805R)

Document number: UCS-TR1000xx