
Diskless VDI with Cisco UCS & Atlantis ILIO
Eliminating Storage from VDI Architectures
White Paper
August 2011
Contents
Executive Summary
The VDI Cost & Performance Challenge
Cisco UCS and Atlantis Computing Solutions Overview
Diskless VDI – Next Generation VDI Architecture
The Cisco UCS and Atlantis ILIO Diskless VDI Architecture
Comparing Diskless VDI to Existing Approaches to VDI Storage
Conclusion
Appendix 1 – Previous Cisco UCS and Atlantis Computing VDI Reference Architecture Testing
Appendix 2 - Test Methodology
© 2011 Cisco Systems, Inc. All rights reserved. This document is Cisco Public Information.
Executive Summary
For VDI to be successful, it must compete with physical PCs on both price and performance. There are two key
technology bottlenecks that are driving the cost of virtual desktops upward and causing virtual desktops to
underperform physical PCs:
The VDI Memory Bottleneck
VDI requires large amounts of high-speed memory to maximize the number of desktops that can run on a single
server and lower the cost per desktop. However, existing server platforms lack the ability to deliver more than
128GB of high-speed memory per server, which lowers density and increases costs. The Cisco Extended
Memory Architecture eliminates the VDI memory bottleneck by delivering up to 384GB of high speed memory on
a dual socket server.
The VDI Storage Bottleneck
The unique nature of VDI workloads requires customers to purchase massive amounts of shared storage or
expensive SSD-based storage to deploy VDI with acceptable desktop performance. IT organizations routinely
undersize storage to fit within the budget of a physical PC, which leaves too little storage per desktop
to deliver the right level of desktop performance. This ultimately leads users to reject VDI in favor of more familiar
PCs. Atlantis ILIO™ software optimizes VDI to deliver high performance virtual desktops with less storage.
If VDI costs more and is slower than a physical PC, how can we expect it to gain widespread adoption?
Cisco Systems and Atlantis Computing™ have partnered to deliver a revolutionary Diskless VDI architecture that
eliminates the need for all disk-based storage. For the first time, Cisco Extended Memory Technology and Atlantis
ILIO VDI Optimization software make it possible to replace shared storage or local SSDs with local memory for
virtual desktop storage. With Atlantis ILIO, virtual desktops consume up to 90% less storage capacity, making it
possible to run all virtual desktops on local server memory instead of disk.
What are the Benefits of Diskless VDI using Cisco Extended Memory and
Atlantis ILIO?
Unmatched Performance – Memory outperforms even the fastest Local SSD storage and delivers virtually
unlimited IOPS to desktops locally, which dramatically improves all aspects of desktop performance including
boot time, login, application launch, patching and anti-virus scanning.
Lower Cost – Using Atlantis ILIO and Cisco UCS Extended Memory, it is possible to drive down the cost per
desktop from both a CAPEX and OPEX perspective:
• CAPEX – The upfront cost per desktop can be decreased below $200 per desktop including the server
hardware and storage.
• OPEX – Diskless VDI architectures mean that IT organizations can lower operating expenses by
eliminating rack space for SAN/NAS storage, lowering power and cooling requirements and eliminating the
operational expenses of maintaining disk-based storage and replacing failed disks.
Increased Lifespan and Reliability – Memory does not suffer from the same lifespan issues for write-intensive
VDI workloads as SSDs, meaning that Diskless VDI architecture will be more reliable and have lower operational
costs because there will be no failed disks to replace.
Cisco UCS and Atlantis ILIO Diskless VDI Architecture Price vs. Performance

CAPEX Costs & Price/Performance

#  Scenario                     CAPEX Total Cost Per Desktop   Cost Per IOPS
1  Diskless VDI                 $197                           $0.58
2  Local SAS with ILIO          $219                           $0.87
3  Local SSD with ILIO          $237                           $0.76
4  Local SAS+SSD without ILIO   $219                           $2.55
5  File Storage without ILIO    $392                           $38.52
6  File Storage with ILIO       $457                           $1.83
7  Block Storage without ILIO   $1,833                         $29.63
8  HP Servers with FusionIO     $380                           $1.04

The table also rates each scenario against four criteria that decrease OPEX costs: supports blade servers,
no disk or SSD replacement, no power and cooling for disks/SSD, and no rack space for disks.
The VDI Cost & Performance Challenge
Virtual Desktop Infrastructure (VDI) delivers tremendous benefits in terms of reducing the cost of provisioning,
upgrading and maintaining desktops, as well as making computing resources more flexible. However, VDI is
relatively new to most IT organizations and requires careful design and the right architecture to deliver a
high-performance, cost-effective and scalable virtual desktop infrastructure. Many companies have deployed VDI only
to realize that their VDI architecture could not scale or deliver an acceptable user experience without requiring
additional, massive investments in storage infrastructure. In order for VDI to achieve broad adoption, it is critical
that virtual desktops deliver better-than-physical PC performance and that the upfront costs of implementing VDI are
equal to or lower than the cost of a physical PC. While it is possible to deliver a better-than-physical PC user
experience or a cost per desktop below that of a physical PC, it is not possible to do both with traditional
storage such as SAN/NAS or local SSDs without compromising.
Better-Than-Physical PC User Experience
When implementing any major change to a desktop computing environment, winning over end users must be
considered a critical success factor. With traditional storage, it is possible to deliver a good user experience but it
comes at a storage cost that can exceed $1,000 per desktop. If the IT organization sizes traditional storage to fit
within the cost of a physical PC, the user performance is far slower than a physical PC. When desktop
performance is noticeably slower than a physical PC, users will reject VDI in favor of retaining their existing
physical PCs. VDI projects are often limited in size, cancelled or re-architected based on poor desktop
performance, which is most often caused by VDI storage performance.
Cost Per Desktop below a Physical PC
In order for most companies to transition all of their desktops from physical to virtual desktops, the total cost of a
virtual desktop must be lower than that of a physical PC. The major expense of VDI is not software or server
hardware, but storage. Storage can consume 50-80% of the total VDI budget if sized to deliver an acceptable
desktop performance.
The VDI Storage Bottleneck
While the Memory and CPU of a VDI desktop remain on the physical hardware where the desktop executes, the
virtual desktop hard drive is moved from being connected directly to the physical hardware to having to traverse
multiple network switches. Simply stated, VDI replaces one dedicated, low latency and inexpensive physical PC
hard drive with an expensive, shared, high latency storage array. The result is both increased latency based on
the number of network hops from the Windows operating system to the storage array and decreased desktop
performance. In addition, disk IO that is optimized by the OS for a dedicated physical PC hard drive is randomized
by the hypervisor, converting previously sequential IO that is easy for storage to consume into random IO that
decreases storage and desktop performance (this is also known as the IO Blender effect). As more desktops are
added to the VDI deployment, storage contention issues arise from large numbers of VDI desktops competing for
a limited pool of storage input/output operations per second (IOPS).
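The IO Blender effect can be illustrated with a small simulation. This is a minimal sketch, not a model of any particular hypervisor: the number of desktops, the size of each desktop's region on the datastore, and the round-robin interleaving are all assumptions chosen for illustration.

```python
from itertools import chain, zip_longest


def sequential_stream(start_block, length):
    """Block addresses one desktop reads in order from its own virtual disk."""
    return list(range(start_block, start_block + length))


def blend(streams):
    """Interleave per-desktop streams round-robin, as shared storage might see them.

    Round-robin is an assumption for illustration; real scheduling differs.
    """
    interleaved = chain.from_iterable(zip_longest(*streams))
    return [block for block in interleaved if block is not None]


def sequential_fraction(blocks):
    """Fraction of requests that directly follow the previous block address."""
    if len(blocks) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(blocks, blocks[1:]) if cur == prev + 1)
    return hits / (len(blocks) - 1)


# Hypothetical workload: 4 desktops, each reading 100 consecutive blocks
# from its own 1,000,000-block region of the shared datastore.
streams = [sequential_stream(vm * 1_000_000, 100) for vm in range(4)]
single = sequential_fraction(streams[0])
blended = sequential_fraction(blend(streams))
print(f"one desktop alone:     {single:.0%} sequential")
print(f"four desktops blended: {blended:.0%} sequential")
```

Each desktop's stream is fully sequential on its own, but once the streams are interleaved no request follows its predecessor, which is the pattern a disk-based array handles least efficiently.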
The VDI Memory Bottleneck
Running many desktops on a single virtualized server means running multiple OS and application instances on a
single server, which demands large amounts of memory. Since CPU performance is outstripping memory
performance, memory bottlenecks are a common problem. As companies migrate from Windows XP (128MB
recommended memory) to Windows 7 (1-2GB recommended memory), the density of a server is constrained by
memory and not CPU. Enterprises are often forced to deploy either four-socket servers or multiple two-socket
servers to address this problem. These solutions result in more expensive servers, increased power costs, and
higher licensing costs.
Cisco UCS and Atlantis Computing Solutions Overview
Cisco and Atlantis Computing™ have partnered to deliver a series of new and innovative VDI solutions and reference
architectures that eliminate the VDI memory and storage bottlenecks that are preventing large-scale VDI
deployments. This new combination of Cisco UCS datacenter infrastructure and Atlantis Computing VDI
optimization software enables customers to deploy virtual desktops with better-than-PC performance, lower cost than
physical PCs and increased reliability.
Cisco Unified Computing System (UCS) Overview
The Cisco® Unified Computing System is a next-generation data center platform that unites compute, network,
storage access, and virtualization into a cohesive system designed to reduce total cost of ownership (TCO) and
increase business agility. The system integrates a low-latency, lossless 10 Gigabit Ethernet unified network fabric
with enterprise-class, x86-architecture servers. The system is an integrated, scalable, multichassis platform in
which all resources participate in a unified management domain.
Cisco UCS Servers
Cisco UCS offers both blade servers (B series) and rack servers (C series), either of which can be used for VDI deployments.
Both the Cisco B250 M2 and C250 M2 servers are two-socket servers that incorporate Cisco UCS Extended Memory Technology, which allows them to support up to 384GB of RAM per server in 48 DIMM slots.
Cisco Extended Memory Technology
Modern CPUs with built-in memory controllers support a limited number of memory channels and slots per CPU.
The need for virtualization software to run multiple OS instances demands large amounts of memory, and that,
combined with the fact that CPU performance is outstripping memory performance, can lead to memory
bottlenecks. To obtain a larger memory footprint, most IT organizations are forced to upgrade to larger, more
expensive, four-socket servers. CPUs that can support four-socket configurations are typically more expensive,
require more power, and entail higher licensing costs. Cisco Extended Memory Technology expands the
capabilities of CPU-based memory controllers by logically changing the geometry of main memory while still using
standard DDR3 memory. This technology makes every four DIMM slots in the expanded memory server appear
to the CPU’s memory controller as a single DIMM that is four times the size (Figure below). For example, using
standard DDR3 DIMMs, the technology makes four 8-GB DIMMs appear as a single 32-GB DIMM.
This patented technology allows the CPU to access more industry-standard memory than ever before in a two-socket server:
For memory-intensive environments, data centers can better balance the ratio of CPU power to memory and
install larger amounts of memory without having the expense and energy waste of moving to four-socket servers
simply to have a larger memory capacity. With a larger main-memory footprint, CPU utilization can improve
because of fewer disk waits on page-in and other I/O operations, making more effective use of capital
investments and more conservative use of energy.
For environments that need significant amounts of main memory but which do not need a full 384 GB, smaller-sized DIMMs can be used in place of 8-GB DIMMs, with resulting cost savings: two 4-GB DIMMs are typically
less expensive than one 8-GB DIMM.
The other key feature of Cisco Extended Memory Technology is the ability to run the DDR3 memory at the
highest speed, 1333 MHz, when configured with 384 GB of memory. Typically, as you add memory to
systems, the memory speed decreases. This is not the case with the extended memory servers from Cisco
Systems.
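The memory geometry described in this section reduces to simple arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it models just the four-slots-into-one-DIMM grouping stated in the text, not the hardware itself.

```python
def logical_dimms(physical_slots, dimm_gb, slots_per_logical=4):
    """Return (logical DIMM count, logical DIMM size in GB, total GB).

    Models the description in the text: every four physical DIMM slots are
    presented to the CPU's memory controller as one DIMM four times the size.
    """
    if physical_slots % slots_per_logical != 0:
        raise ValueError("physical slots must be a multiple of the group size")
    count = physical_slots // slots_per_logical
    size_gb = dimm_gb * slots_per_logical
    return count, size_gb, count * size_gb


# The two DIMM sizes discussed in the text, in a 48-slot server.
for dimm_gb in (8, 4):
    count, size_gb, total_gb = logical_dimms(48, dimm_gb)
    print(f"48 x {dimm_gb}GB DIMMs -> {count} logical DIMMs of {size_gb}GB = {total_gb}GB")
```

With 8-GB DIMMs the 48 slots yield 384 GB; with 4-GB DIMMs they yield 192 GB, matching the capacities quoted for the extended memory servers.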
B250 M2 Extended-Memory Blade Server
Building on the success of the Cisco UCS B200 M1 and UCS B250 M1 servers, the Cisco UCS B200 M2 and UCS B250 M2
servers extend the capabilities of the Cisco Unified Computing System with the next generation of Intel processor technology:
Intel® Xeon® 5600 series processors. These powerful processors deliver more cores, threads, and cache, all within a similar
power envelope, with even faster payback, greater productivity, and better energy efficiency. When put into production, Cisco
Unified Computing System and Intel Xeon 5600 series processors together offer further reductions in TCO, increased
business agility, and another big leap forward in data center virtualization.
Product Overview
The Cisco UCS B-Series Blade Servers are crucial building blocks of the Cisco Unified Computing System, delivering scalable
and flexible computing for today's and tomorrow's data center while helping reduce TCO.
The Cisco UCS B-Series Blade Servers are based on industry-standard server technologies and provide:
• Up to two Intel Xeon Series 5600 multicore processors
• Two optional front-accessible, hot-swappable SAS hard drives
• Support for up to two dual-port mezzanine card connections for up to 40 Gbps of redundant I/O throughput
• Industry-standard double-data-rate 3 (DDR3) memory
• Remote management through an integrated service processor that also executes policy established in Cisco UCS Manager
software
• Local keyboard, video, and mouse (KVM) access through a front console port on each server
• Out-of-band access by remote KVM, Secure Shell (SSH) Protocol, and virtual media (vMedia) as well as Intelligent Platform
Management Interface (IPMI)
The Cisco UCS B-Series offers two blade server models that utilize the Intel Xeon 5600 series processors: the Cisco UCS
B200 M2 2-Socket Blade Server and the Cisco UCS B250 M2 2-Socket Extended Memory Blade Server (Figure 2). The Cisco
UCS B200 M2 is a half-width blade with 12 DIMM slots for up to 192 GB of memory; it supports one mezzanine adapter. The
Cisco UCS B250 M2 is a full-width blade with 48 DIMM slots for up to 384 GB of memory; it supports up to two mezzanine
adapters. The UCS B250 extended memory server is the hardware platform on which this white paper focuses.
C250 M2 Extended-Memory Rack-Mount Server
The Cisco UCS C250 M2 server is a high-performance,
memory-intensive, 2-socket, 2 RU rack-mount server
designed to increase performance and capacity for
demanding virtualization workloads. Applications that
are memory bound today will benefit from the 384 GB of
addressable memory that the Cisco UCS C250 M2
server offers. It also can reduce the cost of memory by allowing customers to use lower capacity memory with the
additional memory slots available in the C250 M2 Extended Memory Rack-Mount Server.
With 48 DIMM slots available on a single motherboard, the Cisco UCS C250 M2 server design is unique among
two-socket servers based on Intel Xeon 5600 series processors. From a memory-capacity perspective, it can
alleviate memory bottlenecks by removing the need to move to the costly four-socket servers that might otherwise
be necessary, helping improve the price-to-performance ratio for running large-memory-footprint applications.
From a memory-cost perspective, the server can be populated with low-cost 4-GB DIMMs for a total of up to
192 GB of main memory; this memory configuration delivers a memory footprint that other two-socket, Intel Xeon
5600 series processor-based systems require 16-GB DIMMs to achieve. The server also can be populated with
8-GB DIMMs for a total of up to 384 GB of memory. Extended memory can operate at the same speed (1333 MHz)
that smaller memory footprints do (typically with x86 servers, when you add memory the speed decreases).
Cisco® UCS C-Series Rack-Mount Servers extend unified computing innovations to an industry-standard form
factor to help reduce total cost of ownership (TCO) and increase business agility. Designed to operate both in
standalone environments and as part of the Cisco Unified Computing System™, the series employs Cisco
technology to help customers handle the most challenging workloads. The series incorporates a standards-based
unified network fabric, Cisco VN-Link virtualization support, and Cisco Extended Memory Technology. It supports
an incremental single server deployment model and protects customer investments with a future migration path to
unified computing.
These benefits of Cisco Extended Memory Technology can be harnessed by customers when very large memory
footprints are required, or when large, low-cost memory footprints are desirable. Large virtualized
environments, for example, can host more or larger virtual machines with the server's larger memory footprint,
and with higher performance in cases in which existing implementations are memory bound.
Atlantis ILIO VDI Storage and Performance Optimization Software
Atlantis ILIO is a VDI storage and performance-optimization software solution that complements Cisco UCS to
optimize storage, boost desktop performance, and make VDI security economically feasible. Atlantis ILIO
fundamentally changes the economics and performance characteristics of VDI by intelligently optimizing how the
Windows operating system interacts with storage.
Scaling and Optimizing Storage for VDI
Atlantis ILIO processes up to 90% of the virtual desktop storage traffic locally, offloading the shared storage
infrastructure and therefore reducing the amount of storage needed for each desktop. This enables customers to
scale their VDI deployments to 4-7 times more desktops (6 times in the case of CBRE) with their existing storage
systems.
Boosting Desktop Performance
Atlantis ILIO addresses the virtual desktop performance problem without requiring additional storage
infrastructure. It eliminates the storage bottleneck by effectively delivering a massive amount of IOPS to boost all
aspects of virtual desktop performance, including boot time, logons, profile loading, applications, productivity
tasks, and virtualized applications.
Making VDI Security Possible
Anti-virus protection and endpoint security are requirements for enterprise VDI deployments. However, traditional
anti-virus can cut density per server up to 50%, degrade performance, and ultimately increase the network,
storage, and server infrastructure costs of VDI. Atlantis ILIO integrates with leading anti-virus solutions to
dramatically accelerate anti-virus scanning and eliminate redundant anti-virus scanning operations. With
traditional anti-virus, Atlantis ILIO can eliminate the additional storage required to service the IO traffic generated
by anti-virus, increasing density and accelerating anti-virus scanning.
How Atlantis ILIO Works
Atlantis ILIO is a software virtual machine that is installed on the same hypervisor or rack as the virtual desktops
to optimize how the Microsoft Windows XP and Windows 7 operating systems interact with storage. Atlantis ILIO
technologies, including content-aware IO processing and inline deduplication, are highly efficient and designed
specifically for VDI workloads:
Content Aware IO Processing
Atlantis ILIO software processes all VDI traffic locally with Windows NTFS content awareness—within the same
server or rack—to dramatically reduce the amount of IO traffic going to storage and eliminate the huge burden
normally placed on a storage array by hundreds or thousands of virtual desktops.
In-Line Deduplication for VDI Workloads
Atlantis ILIO deduplicates inline all VDI images before they reach storage, effectively eliminating the need to store
up to 90% of Windows image components, further reducing the amount of storage required for a successful VDI
deployment.
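The general idea of inline deduplication can be sketched with a toy block store. This is a generic content-hash illustration and makes no claim about how Atlantis ILIO is implemented: the `DedupStore` class, the 4096-byte block size, the use of SHA-256 fingerprints, and the synthetic desktop images are all assumptions.

```python
import hashlib


class DedupStore:
    """Toy inline-deduplicating block store (illustrative only).

    Blocks are fingerprinted before they are written; a block whose
    fingerprint is already known is recorded as a reference, not stored again.
    """

    def __init__(self, block_size=4096):
        self.block_size = block_size   # assumed block size
        self.blocks = {}               # fingerprint -> unique block data
        self.images = {}               # image name -> list of fingerprints
        self.logical_bytes = 0         # bytes written by desktops

    def write_image(self, name, data):
        fingerprints = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(digest, block)  # store only if new
            fingerprints.append(digest)
        self.images[name] = fingerprints
        self.logical_bytes += len(data)

    def read_image(self, name):
        return b"".join(self.blocks[d] for d in self.images[name])

    def stored_bytes(self):
        return sum(len(b) for b in self.blocks.values())


# Hypothetical desktops: a shared 9-block OS image plus 1 unique block each.
store = DedupStore()
os_image = b"".join(bytes([i]) * 4096 for i in range(9))
for n in range(10):
    store.write_image(f"desktop-{n}", os_image + bytes([100 + n]) * 4096)

saved = 1 - store.stored_bytes() / store.logical_bytes
print(f"logical: {store.logical_bytes} bytes, stored: {store.stored_bytes()} bytes")
print(f"capacity saved: {saved:.0%}")
```

Because the desktops share most of their blocks, the store keeps one copy of the common image and only the blocks unique to each desktop, and the saving grows as more near-identical desktops are added.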
Diskless VDI – Next Generation VDI Architecture
VDI adoption has been constrained by two critical barriers: cost per desktop and user acceptance
related to poor desktop performance. In order for VDI to gain widespread adoption, virtual desktops must cost
less than a physical PC and deliver an equal or better user experience. With traditional servers and storage, VDI
suffers from storage and memory bottlenecks that drive the cost per desktop above that of a physical PC and
degrade performance.
Cisco Systems and Atlantis Computing have eliminated the memory and storage bottlenecks of VDI and delivered
VDI reference architectures and validated designs that deliver a cost per desktop lower than a physical PC, while
at the same time achieving a better-than-PC user experience. In addition, Cisco UCS with Atlantis ILIO provides
IT organizations with the flexibility to use Citrix XenDesktop or VMware View, blade servers or rack servers, and
shared NAS or local disk (SAS or SSD) storage.
To date, VDI architectures have always relied on some type of disk to store virtual desktop images and clones,
whether it be SATA, SAS or SSD on the local server and/or a shared SAN or NAS storage system. However, the
unique Cisco Extended Memory Technology and Atlantis ILIO VDI optimization software make it possible to
store virtual desktops in the memory of the local server without the use of disks to achieve a “Diskless VDI”
architecture that is faster, delivers better price/performance and is more reliable than any existing VDI
architecture.
What is Diskless VDI?
Diskless Virtual Desktop Infrastructure (VDI) is the concept of using local server memory in combination with
storage optimization software to store virtual desktop images instead of shared SAN/NAS or local SAS/SSD
storage. Storing virtual desktop images in the local memory of the hypervisor where the desktops execute delivers
response times faster than even the most expensive local SSD drives (MLC or SLC), costs less when combined
with Atlantis ILIO, and increases reliability. With existing VDI architectures, virtual desktop images are stored on
either shared SAN/NAS storage or local SSD disks, which are costly, have limited IOPS for write-intensive VDI
workloads, can have a limited lifespan and consume more power than memory.
Why is Diskless VDI Now Possible?
VDI architectures have not been able to use memory as storage for non-persistent desktops for the following
four reasons:
1. Storage Capacity Required per Desktop – The amount of memory required per desktop limited the
density that could be achieved per server using memory as storage.
2. Cost of Memory – The cost of memory per gigabyte was too high to consider using memory as
storage for non-persistent virtual desktops.
3. Maximum Memory Limitations – It was not possible to put enough memory onto a cost-effective server
to create a RAM disk to store non-persistent virtual desktops.
4. Bus Speed with Large Memory Configurations – Prior to Cisco Extended Memory technology,
increasing memory beyond 128GB per two-socket server meant decreasing the bus speed below
1333MHz.
Next Generation Hardware - Cisco Extended Memory
Cisco Extended Memory technology enables a two-socket, 12-core blade or rack server to use 48 DIMM slots to
deliver 384GB of RAM at 1333MHz. For Diskless VDI, this means that customers can maximize server density
per core, deliver sufficient memory to each virtual desktop, and still have enough memory remaining to create a
RAM disk to store Atlantis ILIO optimized virtual desktop images.
Next Generation Software - Atlantis ILIO
Atlantis ILIO reduces the size of VDI images before they reach storage, effectively eliminating the need
to store up to 90% of Windows image components, further reducing the amount of storage required for
a successful VDI deployment. In the context of Diskless VDI, this means that customers can store
virtual desktop images with 90% less capacity, making it possible to use memory as the storage for
virtual desktop images rather than local SAS/SSD or shared SAN/NAS storage.
What are the Benefits of Diskless VDI using Cisco Extended Memory and
Atlantis ILIO?
Unmatched Performance – Memory outperforms even the fastest Local SSD storage and delivers virtually
unlimited IOPS for desktops to use locally on each server, which dramatically improves all aspects of desktop
performance including boot, login, application launch, patching and anti-virus scanning.
Lower Cost – Using Atlantis ILIO and Cisco UCS Extended Memory, it is possible to drive down the cost per
desktop from both a CAPEX and OPEX perspective:
• CAPEX – The upfront cost per desktop can be decreased below $250 per desktop including the server
hardware and storage.
• OPEX – Diskless VDI architectures mean that IT organizations can lower operating expenses by
eliminating rack space for SAN/NAS storage, lowering power consumption and eliminating the operational
expenses of maintaining disk-based storage and replacing failed disks.
Increased Lifespan and Reliability – Memory does not suffer from the same lifespan issues for write-intensive
VDI workloads as SSDs, meaning that Diskless VDI architecture will be more reliable and have lower operational
costs because there will be no failed disks to replace.
The Cisco UCS and Atlantis ILIO Diskless VDI Architecture
Architecture Overview
Cisco and Atlantis Computing™ have partnered to deliver a new and innovative Diskless VDI architecture that
enables customers to deploy virtual desktops without the need for disk-based storage, with better-than-PC
performance and at a lower cost than physical workstations. The Cisco UCS and Atlantis ILIO Diskless VDI
architecture integrates Cisco UCS blades (B series) or rack servers (C series) with Atlantis ILIO VDI optimization
technology to provide a single high performance server without the need for disk-based storage.
Figure 1. Cisco UCS and Atlantis ILIO Diskless VDI Recommended Architecture Diagram
Diskless VDI Configuration
After testing a variety of configurations, Cisco determined that the following configuration provided the optimal
density of up to 120 virtual desktops:
• Virtual Desktops Per Server (Density) – 120
• RAM for the Hypervisor – 2GB
• RAM allocated per Desktop – 1.5GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• RAM for the Atlantis ILIO – 6GB
• CPUs for Atlantis ILIO – 1
• CPU reservation for Atlantis ILIO – 3324MHz
• CPUs for Desktops – 11
For information on the other configurations tested and detailed test results, see the “Testing Methodology” section
of this document.
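As a quick check, the RAM allocations in this configuration can be summed against the 384GB the server provides. This is a rough sketch that assumes the allocations are simply additive, with no memory overcommit and no per-VM hypervisor overhead; the dictionary keys are hypothetical names for the listed values.

```python
SERVER_RAM_GB = 384  # maximum for the Cisco UCS B250 M2 / C250 M2, per the text

# Values taken from the configuration list; key names are illustrative.
config = {
    "desktops": 120,
    "ram_per_desktop_gb": 1.5,
    "hypervisor_gb": 2,
    "ilio_vm_gb": 6,
    "ram_disk_gb": 150,
}


def memory_budget(cfg):
    """Sum the RAM allocations listed in the configuration, in GB."""
    desktops_gb = cfg["desktops"] * cfg["ram_per_desktop_gb"]
    return desktops_gb + cfg["hypervisor_gb"] + cfg["ilio_vm_gb"] + cfg["ram_disk_gb"]


allocated = memory_budget(config)
print(f"allocated: {allocated:.0f}GB of {SERVER_RAM_GB}GB")
print(f"headroom:  {SERVER_RAM_GB - allocated:.0f}GB")
```

Under these assumptions the listed allocations come to 338GB, which fits within the server's 384GB.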
Test Findings & Results
Cisco Extended Memory
Using Cisco Extended Memory, the Cisco UCS B250 supports up to 384GB of DDR3 memory at
the maximum bus speed of 1333 MHz on a two-socket server. This configuration is unique and provides an optimal
server configuration for a diskless VDI architecture with the best possible price/performance.
Atlantis ILIO Inline Deduplication of Diskless VDI Images
Atlantis ILIO deduplicates inline all virtual desktop images before they reach storage, effectively eliminating the
need to store up to 90% of Windows image components and applications. In testing on the Cisco UCS C250 M2,
results showed a reduction of 89.7% from 14.66GB to 1.5GB per virtual desktop. With the Atlantis ILIO 90%
storage capacity reduction in the virtual desktop image size, it becomes cost-effective to use a RAM Disk as the
primary datastore for virtual desktop images.
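The reduction figure follows directly from the two image sizes quoted above. The short sketch below recomputes it from those rounded sizes, so the result can differ from the quoted percentage in the last digit; the function name is illustrative.

```python
def reduction(before_gb, after_gb):
    """Fractional capacity reduction from before_gb to after_gb."""
    return (before_gb - after_gb) / before_gb


before_gb, after_gb = 14.66, 1.5  # per-desktop image sizes quoted in the text
pct = reduction(before_gb, after_gb) * 100
print(f"reduction: {pct:.1f}%")
```

From the rounded sizes the reduction works out to just under 90%, in line with the quoted result.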
Figure 2. Virtual Desktop Image Size Before Atlantis ILIO as seen in Hypervisor
Figure 3. Virtual Desktop Image Size After Atlantis ILIO Inline Deduplication
Login VSI 3.0 was used to establish the maximum density of 120 virtual desktops on the Cisco UCS Diskless VDI
configuration. During the test, VSIMax was not reached, meaning that response time at 120 virtual desktops was
below the maximum threshold. At lower densities of 100 and 110, the Login VSI response time was significantly
faster due to lower levels of CPU utilization.
Figure 4. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 120 Density
Figure 5. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 100 Density
Atlantis ILIO IO Processing
Atlantis ILIO IO processing in combination with Cisco Extended Memory was able to deliver 44,123 IOPS using
only memory and no local disk or SAN/NAS. The test was conducted using the IOMeter performance
benchmarking tool to simulate a VDI workload. Based on analysis of VMware vSphere, the IOPS on this server
were limited by the network driver that was used for vSphere (E1000). With the VMXNET3 adapter, it is possible
to achieve up to 100,000 IOPS on a single server with Atlantis ILIO and Cisco Extended Memory. The IOMeter
test was performed on Configuration 1 with 2 vCPUs allocated to the Atlantis ILIO virtual machine. For information
on the configuration of IOMeter, see the Testing Methodology section of this document.
Figure 6. Diskless VDI Architecture Storage Performance Measured in IOPS
Diskless VDI Price/Performance Compared to Existing Non-Persistent VDI Architectures

Scenario configurations

#  Scenario                     Server                         CPU                              RAM     Storage
1  Diskless VDI                 Cisco UCS C250 M2 or B250 M2   Intel 5670, 12 cores @ 2.93GHz   384GB   No storage required
2  Local SAS with ILIO          Cisco UCS C250 M2 or B250 M2   Intel 5670, 12 cores @ 2.93GHz   192GB   2-8 15K SAS
3  Local SSD with ILIO          Cisco UCS C250 M2 or B250 M2   Intel 5670, 12 cores @ 2.93GHz   192GB   2x100GB SSD
4  Local SAS+SSD without ILIO   Cisco UCS C250 M2              Intel 5680, 12 cores @ 3.33GHz   192GB   2x100GB SSD, 6x15K SAS
5  NFS Storage without ILIO     Cisco B250 M2                  Intel 5670, 12 cores @ 2.93GHz   192GB   NetApp FAS 3210 with Flash Cache
6  NFS Storage with ILIO        Cisco B250 M2                  Intel 5670, 12 cores @ 2.93GHz   192GB   Shared storage
7  Block Storage without ILIO   Cisco B200 M2                  Intel 5680, 12 cores @ 3.33GHz   192GB   VNX 5300 with FAST Cache
8  HP Servers with FusionIO     HP DL 380 G7                   Intel 5670, 12 cores @ 2.93GHz   144GB   1xFusionIO 320GB MLC

Density, performance and CAPEX costs

#  Scenario                     Density Per Server   Total IOPS Per Server   CAPEX Total Cost Per Desktop   Cost Per IOPS
1  Diskless VDI                 120                  44,123                  $197                           $0.58
2  Local SAS with ILIO          80                   20,000                  $219                           $0.87
3  Local SSD with ILIO          80                   25,000                  $237                           $0.76
4  Local SAS+SSD without ILIO   70                   6,000                   $219                           $2.55
5  NFS Storage without ILIO     80                   916                     $392                           $38.52
6  NFS Storage with ILIO        80                   20,000                  $457                           $1.83
7  Block Storage without ILIO   40                   2,475                   $1,833                         $29.63
8  HP Servers with FusionIO     80                   29,200                  $380                           $1.04

The table also rates each scenario against four criteria that decrease OPEX costs: supports blade servers,
no disk or SSD replacement, no power and cooling for disks/SSD, and no rack space for disks.
1. Diskless VDI with Cisco UCS and Atlantis ILIO
Diskless VDI with Cisco UCS and Atlantis ILIO offers the option of either the Cisco UCS C250 rack server
or the B250 Blade Server with 12 cores and 384GB of RAM. The server is configured with an Atlantis ILIO
virtual machine and a 150GB RAM disk as the primary storage. This configuration was able to deliver
44,123 IOPS, or 367 IOPS per desktop, at a density of 120 virtual desktops per server. With the Cisco
Extended Memory Technology, the server is able to provide 384GB of memory on a single server with
Atlantis ILIO software for a cost per desktop of $212. The Cisco and Atlantis ILIO diskless VDI
architecture offers by far the best price/performance of the existing non-persistent VDI architectures, with
a cost per IOPS of $0.58. In addition, the diskless VDI architecture lowers operational costs by eliminating
the possibility of failed drives, reducing power consumption and using less rack space than a shared storage
architecture. The Diskless VDI architecture with Cisco UCS and Atlantis ILIO has undergone proof-of-concept
testing by Cisco and has been deployed in production by a large financial services customer.
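The per-desktop and per-IOPS figures quoted for each scenario can be cross-checked with simple arithmetic. The sketch below assumes that cost per IOPS is the cost per desktop multiplied by the density and divided by the total IOPS per server; the white paper does not state the formula, but this assumption reproduces the $0.58 figure from the $212 cost per desktop. The class and method names are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    density: int             # virtual desktops per server
    total_iops: float        # IOPS delivered per server
    cost_per_desktop: float  # CAPEX per desktop, in dollars

    def iops_per_desktop(self) -> float:
        return self.total_iops / self.density

    def cost_per_iops(self) -> float:
        # Assumed formula: CAPEX for all desktops on the server / server IOPS.
        return self.cost_per_desktop * self.density / self.total_iops


# Figures quoted in the text for the Diskless VDI scenario.
diskless = Scenario("Diskless VDI", density=120, total_iops=44_123, cost_per_desktop=212)

print(f"{diskless.name}: {diskless.iops_per_desktop():.1f} IOPS per desktop, "
      f"${diskless.cost_per_iops():.2f} per IOPS")
```

The same class can be filled in with the density, IOPS and cost of any other scenario to compare price/performance on equal terms.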
2. Local SAS Drives with Atlantis ILIO
With Atlantis ILIO, it is possible to deploy a high-performance virtual desktop image at a low cost per
desktop with standard 15K SAS drives in a Cisco C250 M2 rack server. The server is configured with an
Atlantis ILIO virtual machine and two to eight 15K SAS disks as the primary storage for virtual desktop images.
This configuration was able to deliver an estimated 20,000 IOPS, or 250 IOPS per desktop, at a density of
80 virtual desktops per server. With the Cisco Extended Memory Technology, the server is able to provide
192GB of memory on a single server with Atlantis ILIO software for a cost per desktop of $234. The Cisco
and Atlantis ILIO Local SAS VDI architecture offers a low cost per desktop and extremely fast desktop
performance using standard SAS drives. This configuration was validated as part of Cisco VXI Phase 2
Testing. For more information, visit the Cisco VXI Cisco Validated Design for VMware View.
3. Local SSD Drives with Atlantis ILIO
Local SSD drives are often considered as a storage option for VDI because they are rated to provide a
large number of IOPS per drive. However, SSD performance is dramatically reduced by the write-heavy,
intense characteristics of the VDI workload. In addition, the VDI workload can shorten the limited
lifespan of an SSD from years to months, increasing the risk of disk failure. From a cost
perspective, SSDs can also be very costly and have limited storage capacity, necessitating the use of
SAS drives in combination with SSD drives. However, Atlantis ILIO inline deduplication decreases the
capacity required per virtual desktop to the point where two 100GB SSDs are sufficient to support 80
virtual desktops per server. Atlantis ILIO IO traffic reduction reduces the write IO load on the SSD, which
both extends SSD lifetime and increases the number of IOPS available to virtual desktops. In the Local
SSD architecture, the server is configured with an Atlantis ILIO virtual machine and two 100GB SSDs
as the primary storage for virtual desktop images. This configuration was able to deliver an estimated
25,000 IOPS, or about 312 IOPS per desktop, at a density of 80 virtual desktops per server. With the Cisco
Extended Memory Technology, the server is able to provide 192GB of memory on a single server with
Atlantis ILIO software for a cost per desktop of $252. The Cisco and Atlantis ILIO Local SSD VDI
architecture offers a low cost per desktop and extremely fast desktop performance that can be deployed
with either a Cisco UCS C Series rack server or B Series Blade Server. This configuration was validated
as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco, Citrix and Atlantis ILIO VDI
Reference Architecture.
4. Local SAS+SSD Drives without Atlantis ILIO
The Cisco VXI Phase 2 testing includes a deployment profile using a Cisco C250 M2 with 2x 100GB SSD
drives and 6x 146GB SAS drives with a density of 70. Unlike the Local SSD with Atlantis ILIO
configuration, the size of the virtual desktop images prevents achieving a cost-effective density without
adding the six SAS drives. In this configuration, the read-intensive replica or master image is placed on the
SSDs, while the write-intensive clones are placed on the SAS drives. As a result, only read-intensive
tasks such as boot and anti-virus scans benefit from the additional IOPS delivered by the SSDs, while
normal write-intensive tasks remain bottlenecked by the limited IOPS of the SAS drives.
Note: The blue line represents reads serviced by the SSD drives storing the master image, while the red
line shows reads serviced by the SAS drives storing the linked clones.
Note: The turquoise line shows writes to the SAS drives storing the linked clones, while the blue line
shows no writes going to the SSDs. “SSD drives are used strictly for the read-only replica and is reflected
in the graph above by the `0' write IOPS to the SSD drives.”
According to Cisco VXI testing using a simulated VDI workload with SCAPA, this configuration was able
to deliver a peak of 6,000 total IOPS, or 86 IOPS per desktop, at a density of 70 virtual desktops per
server. While the cost of this configuration is low at $202 per desktop, the cost per IOPS is very high at
$2.36 (compared to $0.58 with Diskless VDI). The 86 IOPS provided per desktop
will deliver adequate performance but will not match the performance of a physical PC, and
performance may be poor during times of peak usage as desktops burst over 100 IOPS per desktop. This
configuration was validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco
VXI Cisco Validated Design for VMware View.
5. Network File System (NFS) without Atlantis ILIO
The pricing used for this analysis includes the Cisco UCS server hardware and NFS shared storage but
excludes the switching and VDI broker licenses for comparison purposes. The NFS array used was a
shared storage architecture designed to provide 6-10 IOPS per desktop. Because buying a shared
storage array is far more expensive than purchasing local SAS, SSD or memory, the cost per desktop of
this shared storage component is almost double that of a local disk or memory based architecture. In
addition, 6-10 IOPS per desktop will result in a poor user experience, as Windows 7 requires 30 to 100
IOPS to achieve performance equal to a physical PC. Due to the high cost per desktop and the low number of
IOPS per desktop, the price/performance of the FlexPod architecture, at $38.52 per IOPS, is much worse
than that of any of the local disk based architectures ($0.58 per IOPS for Diskless VDI). This configuration was
validated as part of Cisco VXI Phase 2 Testing. For more information, visit the Cisco VXI Cisco Validated
Design for VMware View.
Note: The green line indicates writes per second, while the red line indicates reads per second. This test was
conducted at a density of 80. The write and read IOPS together total about 800 IOPS, or 10 IOPS per
desktop.
6. Network File System (NFS) utilizing Atlantis ILIO
The pricing used for this analysis includes the Cisco UCS server hardware, NFS shared storage, and the
Atlantis ILIO software license cost but excludes the switching and VDI broker licenses for comparison
purposes. This architecture is the same as the above architecture with the exception that an Atlantis
ILIO virtual machine is inserted between the hypervisor and NFS storage to deliver 20 times more IOPS
per desktop (250 IOPS per desktop with ILIO vs. 10) and reduce the amount of storage capacity
consumed. The addition of the Atlantis ILIO license cost increases the cost per desktop slightly but
delivers far more IOPS per desktop, resulting in a much better price per IOPS of $1.89 compared to
$38.52 per IOPS with NFS alone. In addition, providing each desktop with 250 IOPS will result in better
than physical PC performance, compared to the 10 IOPS per desktop with NFS alone. The NFS
architecture with Atlantis ILIO is a good choice for customers who prefer shared storage, want to take
advantage of features such as vMotion that aren’t possible with local disk architectures, or are using
persistent desktops. This configuration was validated as part of Cisco VXI Phase 2 Testing. For more
information, visit the Cisco VXI Cisco Validated Design for VMware View.
7. Block Level Storage
The pricing used for this analysis includes the Cisco UCS server hardware and block based shared
storage components. The block level storage array uses a shared storage architecture that is designed
to provide 2,475 IOPS per server, or 62 IOPS per desktop, when used with 160 virtual desktops (40 desktops
per server). Because buying a shared storage array is far more expensive than purchasing local SAS,
SSD or memory, the cost per desktop of the block level storage is 8 times that of a local
SSD based architecture with Atlantis ILIO. Larger block level storage arrays that support more virtual
desktops will likely have a lower cost per desktop, but it will always be at least twice the cost of a local disk
architecture. When more desktops are included in a block level VDI storage architecture, it is
important to ensure that the IOPS per desktop keeps pace with the increase in density. Due to the high
cost per desktop, the price per IOPS of the block level architecture, at $29.63, is much higher than that of any of
the local disk based architectures ($0.58 per IOPS for Diskless VDI). This configuration was validated as
part of Cisco VXI Phase 2 Testing.
8. HP Servers with FusionIO
Local SSD drives are often considered as a storage option for VDI because they are rated to provide a
large number of IOPS per drive. However, SSD performance is dramatically reduced by the write-heavy,
intense characteristics of the VDI workload. In addition, the VDI workload can shorten the limited
lifespan of an SSD from years to months, increasing the risk of disk failure. From a cost
perspective, SSDs can also be very costly and have limited storage capacity, necessitating the use of
SAS drives in combination with SSD drives or the purchase of higher capacity and more expensive SSD
drives. Fusion IO SLC drives provide a large number of IOPS. However, even with a relatively small
desktop clone size of 4GB, you need to purchase a 320GB Fusion IO SLC card, which costs $14,362
per card. If the virtual desktop clones or user data disks grow, a second 320GB SLC card will need to be
added. This configuration was able to deliver an estimated 29,200 IOPS or 365 IOPS per desktop at a
density of 80 virtual desktops per server. At $380, the cost per desktop is 84% more expensive per
desktop than the Diskless VDI architecture and 67% more expe The Cisco and Atlantis ILIO Local SSD
VDI architecture offers a low cost per desktop and extremely fast desktop performance that can be
deployed with either a Cisco UCS C Series rack server or B Series Blade Server.
Comparing Diskless VDI to Existing Approaches to VDI Storage
Challenges of the VDI Workload for Traditional Storage
Traditional storage technologies including SAN/NAS and SSDs are not equipped to handle the unique nature of
VDI workloads, which results in poor desktop performance and more storage disks required to service VDI IO
traffic.
Write Heavy IO Traffic
Unlike server virtualization, VDI workloads are extremely write intensive with the typical distribution of IOPS being
80% write and 20% read during normal desktop operation. Traditional storage caching and SSDs are ineffective
with write IO and have little impact on improving virtual desktop performance.
IO Blender Effect—From Sequential to Random Small Blocks
When the Windows operating system generates disk IO, it optimizes that IO on its local hard drive so that blocks
are stored sequentially for optimal performance. With VDI, the hypervisor converts sequential IO into small blocks
of random IO (the IO Blender effect), which decreases storage and desktop performance. Atlantis ILIO
automatically converts the small random blocks into larger blocks of sequential IO before sending to storage,
increasing storage and desktop performance.
Peak Bursts of 10x Average IO
With VDI, end user activities such as simultaneous boot, logon and application IO storms or common IT activities
such as anti-virus scanning, patching or cloning generate peak IO that can be 10 times or more the average IO
traffic. As a result, storage can either be sized for peaks and be extremely expensive or sized for the average IO
traffic and result in serious performance impact during periods of peak activity. Atlantis ILIO delivers local IOPS to
virtual desktops to ensure consistently high performance during peak usage.
The Challenge of Sizing and Designing VDI Storage Architectures
With physical PCs, it was easy for IT organizations to deliver desktops without concern for scale. The desktop
team would support a fixed number of standardized desktop and laptop PC models designed for different types of
workers. Each model delivered a predictable level of performance for a predictable price. The reason for this is
that physical PCs have dedicated and fixed computing resources (memory, CPU, hard drive). With virtualization,
computing resources are abstracted and pooled to be used more efficiently. With server virtualization, the
workloads are predictable enabling IT organizations to accurately predict the usage of computing resources.
However, with desktop virtualization, workloads are unpredictable, with large variation between average and peak
resource utilization; they are write IO heavy (80% write/20% read) and generate random 4K blocks. In order to achieve a
balance between VDI cost and performance, IT organizations need to design server, networking and storage
infrastructure components that can scale linearly as the VDI user base grows.
There are three critical elements in planning VDI storage:
1. Storage IO Throughput (Measured in IOPS) – The first VDI bottleneck reached is almost always
storage IO throughput, as a typical disk delivers a fixed number of input/output operations per second (IOPS)
but can vary in capacity from 100GB to 1TB. The only way to increase the amount of IOPS
with traditional storage is to increase the number of disks and controllers.
2. Network Throughput to Storage (Gb/s) – With shared SAN or NAS storage, it is also critical to ensure
that there is sufficient network throughput to the storage system. VDI architectures often require 10GbE or
multiple Fibre Channel links to support the amount of network traffic generated by virtual desktop IO
traffic.
3. Storage Capacity (Measured in GB) – Storage capacity is often less of an issue as you can select drive
sizes to match the size of the virtual desktops. However, with persistent desktops of 20-80GB per
desktop, even capacity can become a bottleneck.
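The interaction between the IOPS and capacity constraints can be illustrated with a short calculation. The sketch below is not taken from the testing; the per-disk figures (180 IOPS and 300GB per drive) and the workload inputs (1,000 desktops at 30 IOPS and 20GB each) are illustrative assumptions chosen from within the ranges mentioned in this document, and the function name is hypothetical.

```python
import math


def disks_required(desktops, iops_per_desktop, gb_per_desktop, disk_iops, disk_gb):
    """Return (disks needed to satisfy IOPS, disks needed to satisfy capacity)."""
    for_iops = math.ceil(desktops * iops_per_desktop / disk_iops)
    for_capacity = math.ceil(desktops * gb_per_desktop / disk_gb)
    return for_iops, for_capacity


# Assumed inputs: 1,000 desktops at 30 IOPS and 20GB each,
# on drives assumed to deliver 180 IOPS and hold 300GB apiece.
for_iops, for_capacity = disks_required(1000, 30, 20, 180, 300)

print(f"Disks needed for IOPS:     {for_iops}")
print(f"Disks needed for capacity: {for_capacity}")
print(f"Disks to purchase:         {max(for_iops, for_capacity)}")
```

Under these assumptions the IOPS requirement, not the capacity requirement, determines how many disks must be purchased, which is the pattern described in item 1 above.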
Existing Approaches to VDI Storage
There are two existing approaches to VDI storage:
1. Shared Storage using SAN or NAS
2. Local Disk Storage using SSDs, SAS or SATA drives
Shared SAN or NAS storage offers the benefit of increased reliability and the ability to support persistent desktops
and virtualization features such as vMotion. However, using SAN or NAS storage can also increase the cost per
desktop by 2 to 10 times compared to local disk or the diskless VDI architecture discussed in this
white paper. In addition, the desktop performance delivered by a SAN or NAS is typically poor because storage
systems are designed to support only 6-10 IOPS per desktop to keep desktop costs down. Conversely, local disk
based storage architectures only support non-persistent virtual desktops. However, local disk architectures cost far
less and deliver better desktop performance by providing 62-365 IOPS per desktop (see the Price/Performance
section of this white paper). As a result, many customers with non-persistent virtual desktops are shifting to a
Local Disk based VDI storage architecture with Cisco UCS and Atlantis ILIO to deliver
SAN or NAS Storage
Figure 7. Cisco Reference Architecture with NAS Storage
Cost
Conventional SAN/NAS storage systems can only deliver the data throughput (IOPS) needed to support VDI by
increasing the number of storage drives and controllers well beyond that needed to deliver the required disk
space. Consequently, the cost of VDI storage per desktop to achieve the equivalent performance of a physical
PC can be anywhere from $509 [1] to $2,385 [2], depending on the SAN or NAS storage system, virtual desktop image
and other infrastructure factors. In addition, SAN or NAS storage can require additional networking such as Fibre
Channel HBA cards or 10GbE networking to achieve acceptable performance, which further drives up the cost of
VDI. While VDI functions with $50-$500 in storage per desktop, desktop performance will suffer during periods of
peak usage, and users will often reject switching from a physical desktop to a virtual desktop. From an operational
perspective, a 224-disk NAS storage system supporting a 1,000 virtual desktop deployment would require almost
2 racks of datacenter space, along with the associated power and cooling costs.
[2] Interview with Franfurter Bank with Hitachi Data Systems 9585 2GB Fiber Channel Storage estimated at $2,385 per
desktop without Atlantis ILIO.
Latency
A traditional physical PC hard drive is dedicated to a single desktop and connected to the same physical
hardware, resulting in a 1-2ms response time. Using traditional SAN/NAS as the hard drive of a virtual desktop
(.vmdk or .vhd file) requires the hypervisor (VMware vSphere, Citrix XenServer or Hyper-V) to perform disk
reads and writes over the network, which can introduce latency of 4-20ms. As the VDI deployment scales, more desktops
compete for a limited amount of IO throughput (IOPS), which increases response time. As the IOPS load
increases on a SAN/NAS system, the latency or response time also increases. This means that you can only load
the SAN/NAS storage system to about 50% without increasing latency and degrading desktop performance. In
the example below, latency begins to increase sharply at 50% load.
Figure 8. NAS Response Time by IO Throughput Load (Source: SPC Benchmark)
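The general shape of the relationship between load and response time can be approximated with a simple queueing formula. The sketch below uses an idealized single-server (M/M/1) model, which is an assumption made for illustration and is not the model or data behind the SPC benchmark chart; the 4ms base service time is likewise an assumed value taken from the low end of the latency range quoted above.

```python
def response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Mean response time of an idealized M/M/1 queue at the given utilization."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


# Assumed base service time of 4 ms with no queueing.
for load in (0.25, 0.50, 0.75, 0.90):
    print(f"{load:.0%} load -> {response_time_ms(4.0, load):.1f} ms")
```

In this simplified model the response time has already doubled at 50% load and continues to climb steeply beyond it, which is consistent with the guidance to keep shared storage at or below roughly half of its rated throughput.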
Reliability
SAN and NAS storage systems are typically highly reliable because they have a variety of RAID levels available
to protect against data loss when a disk failure occurs and High Availability options to protect against controller
failures. However, using a large number of SAS and SATA drives means the disks will have to be maintained and
replaced when failures occur, which introduces significant operational expenses.
“We find that in the field, annual disk replacement rates typically exceed 1%, with 2-4% common and up to 13%
observed on some systems. This suggests that field replacement is a fairly different process than one might
predict based on datasheet MTTF.” [4]
Local Solid State Drives (SSDs)
Local Solid State Drives (SSDs) can be used with non-persistent virtual desktops to store the clones or
write-cache. They come in two general types: Single-Level Cell (SLC) and Multi-Level Cell (MLC). Within the category of
SLC, there are also flash-based drives that deliver more IOPS but at a much higher cost per drive and per GB. Due to
concerns about lifespan and write performance under heavy VDI loads with random small block IO, VDI typically
uses standard SLC SSD drives or flash-based SLC drives.
Cost – The cost of SSD drives varies widely by type, from $200 for consumer
grade SSDs to $14,362 [5] for a 320GB flash-based SLC drive. Due to the size of virtual desktop clones, you often
need to purchase multiple SSD drives or limit the density of virtual desktops per server, which drives up the cost
per desktop (see the Price/Performance section of this document for examples).
Performance – SSDs typically list extremely high IOPS per drive for sequential, read-heavy workloads. However,
the VDI workload, which consists of write-heavy IOPS with random 4K blocks, decreases the number of IOPS. As
a result, some architectures use SSDs as a storage tier for the master image, which is small in capacity
requirements and has read-heavy IO traffic, and then place the linked
Capacity – SSD drives are limited to 100GB to 320GB of capacity, and the cost is as high as $44 per GB.
While processors can handle 50-200 desktops per server, each desktop requires at least 4GB of capacity to
store its virtual desktop clone, which means that VDI architectures become capacity bound with SSDs.
Reliability – MLC SSDs are not typically used for enterprise-class applications such as VDI because of their short
lifespan. SLC SSDs have a much longer lifespan but can still fail within the lifespan of a VDI server (months or
years). As a result, it is necessary to mirror drives using RAID to protect against disk failure, which doubles the
cost per usable Gigabyte.
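The capacity and mirroring points above can be worked through with the figures quoted in this document: a 320GB card priced at $14,362 and 4GB of clone storage per desktop. The sketch below is a simplified illustration; the function name is hypothetical, and it assumes that mirroring means buying a second identical card for the same usable capacity.

```python
def ssd_sizing(card_price, card_gb, gb_per_desktop, mirrored=False):
    """Return (max desktops by capacity, cost per usable GB) for one usable card."""
    cards = 2 if mirrored else 1  # assumption: RAID 1 needs a second identical card
    max_desktops = card_gb // gb_per_desktop
    cost_per_usable_gb = card_price * cards / card_gb
    return max_desktops, cost_per_usable_gb


desktops, per_gb = ssd_sizing(14_362, 320, 4)
_, per_gb_mirrored = ssd_sizing(14_362, 320, 4, mirrored=True)

print(f"Capacity-bound density: {desktops} desktops")
print(f"Cost per usable GB:     ${per_gb:.2f} (${per_gb_mirrored:.2f} mirrored)")
```

With these inputs a single card is capacity bound at 80 desktops, and the cost per usable GB is roughly $44 unmirrored and twice that when mirrored.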
[4] http://www.usenix.org/events/fast07/tech/schroeder.html
[5] http://www.cdw.ca/shop/products/FUSION-IO-320GB-IODRIVE-SLC/2318279.aspx
Conclusion
Cisco Systems and Atlantis Computing™ have partnered to deliver a revolutionary Diskless VDI architecture that
eliminates the need for all disk-based storage. For the first time, Cisco Extended Memory Technology and Atlantis
ILIO VDI Optimization software make it possible to replace shared storage or local SSDs with local memory for
virtual desktop storage. This unique VDI architecture enables customers to create a Virtual Desktop Infrastructure
with a cost per desktop below $250 per seat, better performance than a physical PC and lower OPEX costs by
eliminating the power, cooling and operational complexity of maintaining/replacing disks in the datacenter.
Availability
The products used in the Cisco and Atlantis Computing reference architectures using local disk (SAS, SSD) and
shared storage are available immediately through Cisco and Atlantis Computing partners. The technology used in
the Cisco UCS and Atlantis ILIO Diskless VDI architecture is being showcased for the first time at VMworld 2011
and is not yet generally available. For more information on the Cisco Systems and Atlantis ILIO Diskless VDI
Architecture, contact Cisco Systems or Atlantis Computing.
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883

Atlantis Computing, Inc.
2570 West El Camino Real, Suite 230
Mountain View, CA 94040
USA
www.atlantiscomputing.com
Tel: 650 917-9471
© 2011 Cisco Systems, Inc. All rights reserved. Cisco, the Cisco logo, and Cisco Systems are registered
trademarks or trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other
countries. All other trademarks mentioned in this document are the property of their respective owners. (0805R)
Appendix 1 – Previous Cisco UCS and Atlantis Computing VDI Reference Architecture Testing
Cisco Systems and Atlantis Computing have partnered to deliver a variety of VDI Reference
Architectures and Cisco Validated Designs to assist customers in designing their VDI deployments
using both VMware View and Citrix XenDesktop with a variety of backend storage options including:
• Local SAS Drives on a C250 M2 UCS Server (Cisco VXI Phase 2)
• Local SSD Drives on a B250 M2 UCS Server (Cisco, Citrix and Atlantis ILIO Reference
Architecture)
• Shared NAS (NetApp 3170) with a B250 M2 UCS Server (Cisco, Citrix and Atlantis ILIO
Reference Architecture)
Cisco Systems testing shows that Atlantis ILIO with Cisco UCS can:
• Cut VDI Storage Costs – Reduce VDI storage by up to 90%
• Scale Existing VDI Storage – Add 4-10 times more users on the same storage with better
performance
• Boost VDI Performance – Eliminate IO bottlenecks to increase performance up to 10 times
Cisco VXI Phase 2 Testing Results Summary
In this profile, Atlantis ILIO was deployed on a UCS C250 M2 with local drives to optimize storage and improve overall
performance. Testing was done with 70 Windows7 32b desktops running on View 4.5 and ESXi.
Test Environment and Setup
• View 4.5 on ESXi 4.1; RDP
• HVD Profile:
– Windows 7 32b with 1.5G of memory and 20G of disk space
– 1 vCPU, Persistent desktop
• Workload Profile: Cisco VXI KW+ (Cisco Unified Personal Communicator 8.5 in deskphone mode,
IE, Microsoft Office 2007 Apps, Acrobat) with McAfee Move AV 1.5 running a default scan policy - see
MoveAV section below for the scan policy used
• UCS Server: C250 M2 with 192G of memory, two six-core Intel Xeon (EP) 5680 processors @
3.33 GHz and 1GE uplinks
• Storage: DAS with 8 SAS drives in a RAID 10 configuration; Atlantis ILIO was deployed on the
blade with 24G of RAM; Atlantis ILIO is seen as a NFS datastore by the hypervisor
• All of the data shown in the graph below is collected using resxtop with a polling interval of 5sec
• User Experience is measured using Scapa as outlined earlier
• For this profile, data is captured and graphed for Login, Workload and Logout phases
Summary of Test Results
During Cisco VXI testing, the Atlantis ILIO software virtual appliance was installed on a Cisco UCS server running 70 virtual
desktops to optimize how the Windows operating system interacts with the local SAS storage disks, reducing the number of
disks required and boosting desktop performance. During the testing, Atlantis ILIO showed an average offload of 92% and a peak
offload of 94% as measured by esxtop Disk Bandwidth in MB/s.
Metric           IO Traffic from Hypervisor     IO Traffic from Atlantis ILIO   Percentage
                 to Atlantis ILIO (MB/s)        to Disk (MB/s)                  Offload
Maximum (Peak)   245.98                         14.02                           94%
Average          11.62                          0.98                            92%
Figure 9. Cisco VXI Phase 2 Atlantis ILIO IOPS Offload Test Results
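The offload percentages can be recomputed from the bandwidth measurements. The sketch below assumes that offload is defined as the share of hypervisor-to-ILIO traffic that never reaches disk; the document does not spell out the formula, but pairing 245.98 MB/s with 14.02 MB/s (peak) and 11.62 MB/s with 0.98 MB/s (average) under this assumption reproduces the reported 94% and 92%.

```python
def offload_pct(hypervisor_mb_s: float, disk_mb_s: float) -> float:
    """Percentage of hypervisor IO traffic that ILIO absorbs before it reaches disk."""
    return 100.0 * (1.0 - disk_mb_s / hypervisor_mb_s)


peak = offload_pct(245.98, 14.02)
average = offload_pct(11.62, 0.98)

print(f"Peak offload:    {peak:.1f}%")
print(f"Average offload: {average:.1f}%")
```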
For More Information on Cisco VXI Phase 2 Testing visit:
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/VXI/CVD/VXI_CVD_Citrix.html
http://www.cisco.com/en/US/docs/solutions/Enterprise/Data_Center/VXI/CVD/VXI_CVD_VMware.html
Cisco, Citrix and Atlantis Computing VDI Reference Architecture
Cisco and Atlantis Computing™ partnered to deliver a VDI solution and reference architecture that eliminates the VDI memory
and storage bottlenecks, enabling customers to deploy VDI with better performance and at a lower cost than a physical PC.
The joint solution includes the following components (Figure 1).
Figure 1. Solution Components
There are two Atlantis deployment options documented in the reference
architecture:
Atlantis ILIO Top-of-Rack: Atlantis ILIO running on one dedicated Cisco UCS
B250 M2 Blade Server, optimizing storage and performance for up to 8 blade
servers using shared NAS storage (NetApp 3170).
Atlantis ILIO OnBlade: Atlantis ILIO running on each Cisco UCS B250 M2
Blade Server, optimizing storage and performance for the desktops on that
blade using 2 local SSD drives.
The test results show that the Cisco and Atlantis VDI Reference Architecture
is able to deliver the performance, cost and scalability required by
enterprise customers with Microsoft Windows 7 running anti-virus, as shown
in Table 1.
Table 1. Test Results
Summary Test Results                 Atlantis ILIO Top-of-Rack   Atlantis ILIO OnBlade
Virtual Desktop Density Per Blade    80                          80
Performance Benchmark Pass Rate      100%                        100%
Atlantis ILIO Write IOPS Offload     71%                         82%
Atlantis ILIO Read IOPS Offload      67%                         91%
This graph shows the LoginVSI 2.1 Minimum, Maximum and Average Response time for
desktops in the Atlantis ILIO 80 desktop OnBlade test.
For More Information on the Cisco, Citrix and Atlantis Computing VDI Reference Architecture, visit:
www.atlantiscomputing.com/ciscocitrixatlantis
Appendix 2 - Test Methodology
Testing Overview
Testing for the Cisco and Atlantis Computing Diskless VDI reference architecture focused on determining the
optimal configuration to maximize price/performance for the VDI architecture. To accomplish this, it was critical to
establish the maximum density of virtual desktops per server with acceptable desktop performance. To measure
this, Cisco used LoginVSI 3.0 medium workload to ensure that the maximum density or “VSIMAX” was not
reached during the test cycle. In addition, the LoginVSI chart shows the response time in milliseconds to
determine the desktop performance level between different tested configurations that passed the density test.
The testing started at a density of 100 virtual machines per server, which showed an excellent LoginVSI test with
a maximum CPU utilization of 70%.
In configuration 2, the density was increased to 110 virtual desktops per server with 2 CPUs for Atlantis ILIO. In
this test, the hypervisor CPU averaged 46% but spiked to 100% for a brief period. However, the Atlantis ILIO
virtual machine only used a maximum of 39% of the CPU. Therefore, Cisco determined that the CPU reservation
should be lowered to 1 CPU for Atlantis ILIO, thereby freeing up more CPU resources for desktops to add more
density.
In configuration 3, the density was increased to 120 with more CPU allocated to the desktops. The result was a
passing LoginVSI Max score. This was determined to be the maximum density for the configuration, based on
the CPU utilization reaching 100% and the LoginVSI response time increasing compared to the 100 and 110
density configurations.
To measure the number of IOPS for the storage configuration, Cisco used IOMeter configured to simulate a VDI
workload. The Cisco and Atlantis ILIO Diskless VDI architecture was able to deliver 44,123 IOPS per server or
367 IOPS per desktop.
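The per-desktop figure follows from dividing the per-server IOPS by the desktop density. The sketch below reproduces that arithmetic. It assumes the figure was computed at the 120-desktop density of configuration 3; the text does not state this explicitly, but 120 is the tested density that yields 367.

```python
# Per-server IOPS reported by the IOMeter test.
IOPS_PER_SERVER = 44_123

# Assumption: the per-desktop figure was computed at the 120-desktop density.
ASSUMED_DENSITY = 120


def iops_per_desktop(iops_per_server: int, desktops: int) -> int:
    """Return whole IOPS available per desktop (fractional part dropped)."""
    return iops_per_server // desktops


print(iops_per_desktop(IOPS_PER_SERVER, ASSUMED_DENSITY))
```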
Tested Diskless VDI Configurations & Detailed Results
All configurations were tested with the Cisco B250 M2 Extended Memory blade server with the Intel 5680
processor @3.33GHz with 384GB of Memory.
Configuration 1 – 100 Density with 2GB of RAM Per Desktop
In configuration 1, the test was set up with the following configuration:
• Virtual Desktops Per Server (Density) – 100
• RAM for the Hypervisor – 2GB
• vRAM allocated per Desktop – 2GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• vRAM for the Atlantis ILIO – 6GB
• vCPUs for Atlantis ILIO – 2
• CPU reservation for Atlantis ILIO – 6648 MHz
• CPUs for Desktops – 10
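As a rough check that these allocations fit within the 384GB blade, the memory figures above can be summed. This is only a sketch: it assumes the allocations add up linearly and ignores per-VM hypervisor overhead and any memory sharing, neither of which is described here. The function and parameter names are illustrative.

```python
# Installed memory on the tested Cisco B250 M2 blade, in GB.
SERVER_RAM_GB = 384


def memory_budget_gb(desktops, vram_per_desktop_gb,
                     ram_disk_gb=150, ilio_vram_gb=6, hypervisor_gb=2):
    """Sum the configured allocations (assumes they add linearly)."""
    return (desktops * vram_per_desktop_gb
            + ram_disk_gb + ilio_vram_gb + hypervisor_gb)


used = memory_budget_gb(desktops=100, vram_per_desktop_gb=2)
headroom = SERVER_RAM_GB - used
print(used, headroom)
```

Under these assumptions, configuration 1 allocates 358GB and leaves 26GB unallocated.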
Figure 10. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 100 Density
Figure 11. Atlantis ILIO CPU Utilization at 100 Virtual Machines
Figure 12. Hypervisor CPU Utilization at 100 Virtual Machines
Configuration 2 – 110 Density with 1.5GB of RAM Per Desktop
In configuration 2, the test was set up with the following configuration:
• Virtual Desktops Per Server (Density) – 110
• RAM for the Hypervisor – 2GB
• vRAM allocated per Desktop – 1.5GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• vRAM for the Atlantis ILIO – 6GB
• vCPUs for Atlantis ILIO – 2
• CPU reservation for Atlantis ILIO – 4995 MHz
• CPUs for Desktops – 10-11
Figure 13. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 110 Density
Figure 14. Atlantis ILIO CPU Utilization at 110 Virtual Machines
Figure 15. Hypervisor CPU Utilization at 110 Virtual Machines
Configuration 3 – 120 Density with 1.5GB of RAM Per Desktop
In configuration 3, the test was set up with the following configuration:
• Virtual Desktops Per Server (Density) – 120
• RAM for the Hypervisor – 2GB
• vRAM allocated per Desktop – 1.5GB
• RAM Disk allocated for virtual desktop clone storage – 150GB
• vRAM for the Atlantis ILIO – 6GB
• vCPUs for Atlantis ILIO – 1
• CPU reservation for Atlantis ILIO – 3333 MHz
• CPUs for Desktops – 11
Figure 16. LoginVSI 3.0 Medium Summary Chart for Diskless VDI at 120 Density
Figure 17. Atlantis ILIO CPU Utilization at 120 Virtual Machines
Figure 18. Hypervisor CPU Utilization at 120 Virtual Machines
Measuring Diskless VDI Performance
To test the performance and establish the maximum density of the Diskless VDI architecture,
two different tests were performed:
1. IOMeter – Measures storage performance in terms of input/output operations per second
(IOPS) to storage.
2. Login VSI 3.0 Medium Workload – Measures overall desktop performance and
establishes the maximum density for a given VDI configuration.
IOMeter
IOMeter can be configured to model different types of storage workloads. In this case, IOMeter
was configured to simulate a VDI workload with the following configuration:
• Disk Targets - Maximum disk size set to 2,097,152 sectors (1 GB test file)
• Test connection rate - 500 Transactions per second
• Access Specifications
o 4KB Transfer request size
o 100 percent Access specification
o 80% Write
o 20% Read
o 80% Random
LoginVSI 3.0
The LoginVSI Medium workload test is designed to simulate a normal knowledge worker VDI
workload using common productivity applications and measure response time.
Login VSI 3.0 Medium
• This workload emulates a medium knowledge worker using Office, IE and PDF.
• Once a session has been started, the medium workload repeats every 12 minutes.
• During each loop, the response time is measured every 2 minutes.
• The medium workload opens up to 5 apps simultaneously.
• The type rate is 160ms for each character.
• The medium workload in Login VSI 2.0 is approximately 35% more resource intensive
than Login VSI 1.0.
• Approximately 2 minutes of idle time is included to simulate real-world users.
Each loop will open and use:
• Outlook 2007, browse 10 messages.
• Internet Explorer, one instance is left open (BBC.co.uk), one instance is browsed to
Wired.com, Lonelyplanet.com and the heavy Flash app gettheglass.com (not used with
the MediumNoFlash workload).
• Word 2007, one instance to measure response time, one instance to review and edit a
document.
• Bullzip PDF Printer & Acrobat Reader, the Word document is printed to PDF and
reviewed.
• Excel 2007, a very large randomized sheet is opened.
• PowerPoint 2007, a presentation is reviewed and edited.
• 7-zip: using the command line version, the output of the session is zipped.
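The timing parameters of the medium workload imply a few derived figures that help when reading the response-time charts. The sketch below computes them from the stated values only. The variable names are illustrative, and it assumes the 2 minutes of idle time falls within each 12-minute loop, which the description does not state explicitly.

```python
# Values stated in the workload description.
LOOP_MINUTES = 12             # the workload repeats every 12 minutes
SAMPLE_INTERVAL_MINUTES = 2   # response time is measured every 2 minutes
TYPE_DELAY_MS = 160           # delay per typed character
IDLE_MINUTES = 2              # assumption: idle time is per loop

# Response-time measurements taken in each loop.
samples_per_loop = LOOP_MINUTES // SAMPLE_INTERVAL_MINUTES

# Simulated typing speed in characters per minute.
chars_per_minute = 60_000 / TYPE_DELAY_MS

# Share of each loop spent idle, under the per-loop assumption.
idle_fraction = IDLE_MINUTES / LOOP_MINUTES

print(samples_per_loop, chars_per_minute, round(idle_fraction, 3))
```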
For more information on analyzing VSI results, visit http://www.loginvsi.com/
Cisco Systems, Inc.
170 West Tasman Drive
San Jose, CA 95134-1706
USA
www.cisco.com
Tel: 408 526-4000
800 553-NETS (6387)
Fax: 408 527-0883
© 2011 Cisco Systems, Inc. All rights reserved. Cisco, the Cisco logo, and Cisco Systems are registered
trademarks or trademarks of Cisco Systems, Inc. and/or its affiliates in the United States and certain other
countries. All other trademarks mentioned in this document are the property of their respective owners.
(0805R)
Document number: UCS-TR1000xx