Important considerations when optimizing Office 365 in an Azure Express Route (EXP) environment

April 2016

Table of Contents

Introduction
Understanding the Exchange Online (EXO) architecture
Understanding the SharePoint Online (SPO) architecture
Challenges for SteelHead
Non-Local Host Optimization Latency
    Recommendation
Asymmetric Routing
    Recommendation – Customer-to-Microsoft traffic
    Recommendation – Microsoft-to-customer traffic
DNS
    Recommendation
Traffic Redirection
    Recommendation
SSL Certificates
    Existing SteelHead O365-D customers migrating to the next generation design
    New O365 Multi-Tenant tenant customers
Conclusion

© 2016 Riverbed Technology. All rights reserved.

Introduction

Azure Express Route (AER) gives customers the option to peer directly with Microsoft rather than traversing the Internet. With this direct peering, AER promises to: provide better network connectivity for resources located in Azure and Office 365 (O365); provide an availability SLA; and avoid the inherent congestion of the Internet.

AER poses some unique considerations for SteelHead optimization. This paper highlights some of the important issues customers should take into account prior to deploying AER with Riverbed SteelHead optimization. Please note that this paper is geared towards the Exchange Provider (EXP) model and not the Network Service Provider (NSP) model.

The reader should be familiar with the following products and concepts: Azure Express Route and BGP peering; SteelHead operation; traffic engineering; Exchange Online terminology; DNS; and SSL certificates.

Understanding the Exchange Online (EXO) architecture

In Figure 1, the O365 tenant is in the US, meaning that the users’ mailboxes are typically all located in the US. Furthermore, the mailboxes are typically spread across multiple datacenters within the US. The user in this example is logging in while traveling overseas.

Figure 1 (Source: Microsoft TechNet)

When a user accesses their email in Exchange Online (EXO), the client sends a DNS query for outlook.office365.com, amongst other hostnames, to the DNS server (steps 1 and 2), and Microsoft then returns a list of the closest client access front end (CAFÉ) servers (step 3).
Once the client has the list of IP addresses, it connects to a CAFÉ server (step 4), and the CAFÉ server initiates a separate connection to the actual Exchange server where the mailbox (MBX) is held (step 5). In other words, there are two separate TCP connections: one between the end user and the CAFÉ server, and another between the CAFÉ server and the MBX server.

In an effort to improve the underlying network connectivity into Office 365, EXO relies on DNS to direct traffic to the geographically closest CAFÉ servers in each region. This feature is known as “GeoLocation” or “GeoDNS”. For performance reasons, users typically obtain, via DHCP, DNS servers that belong to the region they are currently in. In Figure 1, a US-based user traveling to Europe uses a DNS server that is located in Europe. As such, the user receives a list of IP addresses for Exchange Online CAFÉ servers based in Europe, and the client picks a CAFÉ server from that list. The idea is to bring the traffic destined for EXO into the Microsoft backbone as soon as possible so as to avoid the unpredictability of the Internet. In Figure 1, step 4 is typically low latency, while the latency in step 5 can exceed 100ms even though the connection between the CAFÉ and MBX servers is carried over the Microsoft backbone.

This traffic flow of connecting to the closest regional CAFÉ server is not limited to international travelers; the same concept applies when users are traveling within a region. For example, a US-based user logging in from the west coast could connect to a CAFÉ server in Texas while their mailbox resides on the east coast. The only difference is that the latency may be lower for intra-region than for inter-region connectivity.

Understanding the SharePoint Online (SPO) architecture

With SPO, the architecture is simpler because there is no concept of front-end and back-end servers. When a client connects, it connects directly to the SPO instance.

Challenges for SteelHead

Figure 2

Continuing our example, the customer has an AER peering point in Europe and deploys SteelHead optimization as shown in Figure 2. The connection between the user and the Microsoft regional datacenter in Europe will be optimized, while the connection between the CAFÉ server and the MBX server will be unoptimized because both servers reside within Microsoft’s network. The fact that the CAFÉ and MBX servers can reside in different Microsoft datacenters within a region, or across different regions, means that the latency between them will most likely not be LAN-like.

The potentially high latency between the CAFÉ and MBX servers has a direct impact on the performance of the SteelHead optimization. Riverbed calls this situation, which is common in cloud scenarios, Non-Local Host Optimization (NLHO) latency. The impact NLHO latency has on SteelHead optimization is discussed further in the next section.

While SPO has a different architecture, it also suffers from NLHO latency because the server-side SteelHead could be far away from the SPO instance. In the example above, the server-side SteelHead could be in Europe while the SharePoint instance is in the US.

Non-Local Host Optimization Latency

Figure 3a illustrates the impact NLHO latency has on performance. As the NLHO latency increases, the amount of time it takes to complete an operation in Outlook increases. The tests were performed under the following parameters: fixed WAN latency of 60ms and unlimited bandwidth; AER with 200Mbps of bandwidth; Outlook 2013; Office 365 tenant based in the US; and sending a 13MB Word document.

The NLHO latency in the graph includes the latency between the server-side SteelHead and the CAFÉ server and between the CAFÉ server and the MBX server.
An NLHO latency of 3ms is the equivalent of having the CAFÉ server and MBX server at the same location (i.e. there is effectively no NLHO latency).

Figure 3a

Figure 3b compares the performance without SteelHead optimization against the performance with SteelHead optimization in the presence of NLHO latency. The orange bars in Figure 3b correspond to the orange bars in Figure 3a. “Latency” in Figure 3b includes both the WAN latency and the NLHO latency. For example, 83ms = 60ms of WAN latency + 23ms of NLHO latency.

Figure 3b

When deploying SteelHead in an AER environment, NLHO latency will almost always be present, and therefore the goal is not to eliminate it but to minimize its impact.

Recommendation

Microsoft Azure offers peering at different locations within the US. In order to minimize the impact NLHO latency has on EXO, the recommended peering locations are Dallas and/or Chicago. These two locations are ideal for optimizing EXO because they are centrally located within the US (refer to Figure 4).

Peering in Dallas and/or Chicago can reduce the NLHO latency. For example, if the client is directed by DNS to a CAFÉ server in VA but the MBX server is located in CA, then the NLHO latency will be 36ms across the AER network (Dallas-to-VA) plus 60ms across the Microsoft backbone network (VA-to-CA), for a total of 96ms. Compare this to a peering location in CA, which could incur a “full boomerang” of 60ms across the AER network (CA-to-VA) and then 60ms across the Microsoft backbone network (VA-to-CA), for a total of 120ms in the worst-case scenario.

It is possible that the peering location, the CAFÉ server, and the MBX server are all at one location, in which case there is no NLHO latency. However, given the architecture of EXO, this scenario is likely to be the exception rather than the norm.
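
The Dallas-versus-CA comparison above is simple addition, and it can be useful to script it when evaluating several candidate peering locations. The following is a minimal sketch only: the latency figures are the illustrative values quoted in the text, while the table layout and the function names (`link_latency`, `nlho_latency`) are hypothetical and not part of any Riverbed or Microsoft tool.

```python
# Illustrative latencies (ms) taken from the example in the text.
# The dictionary layout and function names are assumptions for this sketch.
LATENCY_MS = {
    frozenset({"Dallas", "VA"}): 36,  # AER network, Dallas-to-VA
    frozenset({"CA", "VA"}): 60,      # CA-to-VA (either direction)
}


def link_latency(a, b):
    """Latency between two locations; zero when they are the same site."""
    if a == b:
        return 0
    return LATENCY_MS[frozenset({a, b})]


def nlho_latency(peering, cafe, mbx):
    """NLHO latency = (peering point to CAFE) + (CAFE to MBX)."""
    return link_latency(peering, cafe) + link_latency(cafe, mbx)


if __name__ == "__main__":
    for peering in ("Dallas", "CA"):
        total = nlho_latency(peering, cafe="VA", mbx="CA")
        print(f"Peering in {peering}: {total} ms NLHO latency")
```

With real measurements substituted for the example values, the same calculation can be repeated for each combination of CAFÉ and MBX location that applies to a given tenant.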

Furthermore, to minimize the impact NLHO latency has on SPO, a peering location should be chosen that is close to the SPO instance.

The recommended AER peering points for the rest of the world are: Europe: Amsterdam and London; Asia: Hong Kong and Singapore; Australia: Sydney and Melbourne.

Figure 4

Asymmetric Routing

When SteelHeads are deployed to optimize Office 365 traffic, it is imperative that the Office 365 traffic flows symmetrically. In other words, traffic going out of one peering location must return via the same peering location. There are two reasons why this is important:

1) Customers who are using RFC 1918 addresses must NAT the source address to publicly registered IP addresses. As NAT is involved, the router performing the NAT must be able to process both the inbound and the outbound traffic; and

2) In a traditional datacenter environment, SteelHead appliances can support asymmetric routing by leveraging the Connection Forwarding feature. However, Connection Forwarding was designed to work within the same datacenter with LAN-like latency. Connection Forwarding across different Azure peering points (e.g. Dallas and Chicago) is not supported.

Asymmetric traffic could take place in either direction: customer-to-Microsoft and Microsoft-to-customer.

Recommendation – Customer-to-Microsoft traffic

When peering with Microsoft, all peering points receive the same routes from Microsoft. Therefore, depending on the interior gateway protocol (IGP) being used and how the routes are redistributed from BGP into the IGP, it is conceivable that there will be equal-cost routes to the Microsoft network. From a network symmetry perspective, having equal-cost routes to a destination is not an issue because traffic for a given flow should traverse the same path (i.e. no per-packet load balancing).

However, it has been observed that minimizing the latency between the SteelHead appliance and the SPO instance yields a noticeable performance increase. For example, if the SPO instance is located in Virginia, then it makes sense to ensure that the traffic flows through the Washington DC peering point, as the latency is typically less than 10ms versus about 36ms from Dallas. A common way to influence the traffic flow is via well-known BGP attributes such as local preference.

Microsoft has indicated that routes will be tagged with a certain BGP community value based on the service. Riverbed suggests matching the routes based on the BGP community value (e.g. SharePoint) and then redistributing those routes into the IGP. However, as of April 2016, the BGP community values are not part of the route updates.

Figure 5 (Source: https://azure.microsoft.com/en-us/documentation/articles/expressroute-routing/)

In the absence of BGP community values to match the routes that are related to SPO, the alternative is simply to match on the subnet. In general, both SPO and OneDrive traffic use the URL <domain>.sharepoint.com or <domain>-my.sharepoint.com, and performing a “ping” or DNS lookup on the name will reveal the IP address. Once the IP address is known, execute the command “show ip route x.x.x.x” on the router to determine which route matches that IP address. Note that Microsoft may be blocking ICMP in their network, in which case the “ping” command may not receive a response from the server. This is not an issue as the hostname should still resolve.

    $ ping rvbdaer.sharepoint.com
    PING prodnet320-281ipv4a0001.sharepointonline.com.akadns.net (104.146.156.34): 56 data bytes
    64 bytes from 104.146.156.34: icmp_seq=0 ttl=236 time=227.037 ms
    64 bytes from 104.146.156.34: icmp_seq=1 ttl=236 time=228.120 ms

    rtr#sh ip route 104.146.156.34
    Routing entry for 104.146.0.0/15
      Known via "bgp 18597", distance 20, metric 0
      Tag 12076, type external

Once the route has been determined, the Washington DC router can redistribute it into the IGP by matching the 104.146/15 prefix and assigning a more favorable metric than the Dallas router, thereby attracting the traffic to the Washington DC peering point.

Recommendation – Microsoft-to-customer traffic

Instead of advertising a large summarized route to Microsoft, it is recommended to allocate smaller blocks of IP addresses to each peering location and advertise the more granular routes to Microsoft. For example, Riverbed owns the range 208.70.196.0/22, and it is therefore possible for Riverbed to advertise this single prefix at both the Dallas and Chicago peering points. However, if only the 208.70.196.0/22 prefix is advertised, Microsoft will have two equal-cost routes in its routing table and could send the return traffic via Dallas and/or Chicago.

On the other hand, if smaller ranges are allocated to Dallas and Chicago, Microsoft will have more specific routes in its routing table and traffic will always flow through the corresponding peering point. For example, Dallas would be allocated 208.70.196.0/25 and the Dallas router would advertise only this prefix to Microsoft, while Chicago would be allocated 208.70.196.128/25 and the Chicago router would advertise only this prefix to Microsoft via BGP. From Microsoft’s perspective, the only way to reach 208.70.196.0/25 is via Dallas and the only way to reach 208.70.196.128/25 is via Chicago.
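
The effect of advertising the more specific prefixes can be checked with a short script. This is a minimal sketch using Python's standard `ipaddress` module and the example prefixes from the text. The function name `return_peering_point` and the longest-prefix-match selection are illustrative assumptions about how the return path is chosen, not a description of Microsoft's actual routing implementation.

```python
import ipaddress

# Example prefixes from the text. The mapping structure is an assumption
# made for this sketch.
SUMMARY = ipaddress.ip_network("208.70.196.0/22")
ADVERTISED = {
    ipaddress.ip_network("208.70.196.0/25"): "Dallas",
    ipaddress.ip_network("208.70.196.128/25"): "Chicago",
}


def return_peering_point(nat_address):
    """Return the peering point whose advertised prefix covers nat_address.

    Uses longest-prefix match over the advertised routes; returns None if
    no advertised prefix covers the address.
    """
    addr = ipaddress.ip_address(nat_address)
    matches = [net for net in ADVERTISED if addr in net]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return ADVERTISED[best]


if __name__ == "__main__":
    # Both per-site blocks are carved out of the owned /22.
    for net, site in ADVERTISED.items():
        print(f"{net} ({site}) inside {SUMMARY}: {net.subnet_of(SUMMARY)}")
    print(return_peering_point("208.70.196.10"))
    print(return_peering_point("208.70.196.200"))
```

Because each /25 is advertised from exactly one site, every NAT address maps to a single return path, which is the property that symmetric routing relies on.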

Outbound traffic will then be source NATed to these ranges depending on which peering point the traffic flows through, and the return traffic from Microsoft will traverse the same router on the way back.

DNS

As mentioned earlier in the “Understanding the Exchange Online (EXO) architecture” section, the purpose of the “GeoLocation” feature is to bring the traffic destined for EXO into the Microsoft backbone as soon as possible so as to avoid the unpredictability of the Internet. This makes sense when there is local Internet breakout, but it is likely to create an undesirable effect when used in conjunction with AER. Consider the scenario in Figure 6, in which the customer has an AER peering point in the US but has local Internet breakout in Europe and uses local DNS servers in Europe.

Figure 6

After the DNS lookup, the user receives the list of CAFÉ servers that are based in Europe. When the client initiates a connection, the traffic will traverse the MPLS/WAN rather than going directly to the Internet. This is the expected and desirable behavior, as the traffic should be attracted towards the AER peering point in the US. However, as the traffic enters the Microsoft backbone network in the US, the destination IP address remains that of the CAFÉ server in Europe. The traffic will traverse the Microsoft backbone network from the US and connect to the CAFÉ server in Europe. The CAFÉ server in Europe then initiates its connection across the Microsoft backbone network to the MBX server in the US. Depending on the location of the user, the AER peering point, and the MBX server, the total latency in this traffic flow could exceed 400ms.

Recommendation

While this is not a SteelHead-specific issue, the recommended solution is to configure conditional forwarding on the local DNS server for domains related to EXO. The IP address of the conditional forwarder should be in the same region as the O365 tenant.
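
The decision a conditionally forwarding DNS server makes can be modeled in a few lines. This is a conceptual sketch only, not configuration for any particular DNS product. The forwarded zone `office365.com` is an assumption based on the outlook.office365.com hostname mentioned earlier, both resolver addresses are hypothetical placeholders from documentation ranges, and `pick_resolver` is a hypothetical helper name.

```python
# Hypothetical values for illustration: the zone list and both resolver
# addresses are assumptions, not taken from Microsoft or Riverbed guidance.
CONDITIONAL_FORWARDERS = {
    "office365.com": "192.0.2.53",  # resolver in the tenant's region (e.g. US)
}
DEFAULT_RESOLVER = "198.51.100.53"  # local resolver (e.g. in Europe)


def pick_resolver(query_name):
    """Return the resolver that should answer a query for query_name.

    Matches whole DNS labels from the most specific suffix outwards, so
    "outlook.office365.com" matches the "office365.com" zone but
    "notoffice365.com" does not.
    """
    labels = query_name.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in CONDITIONAL_FORWARDERS:
            return CONDITIONAL_FORWARDERS[suffix]
    return DEFAULT_RESOLVER


if __name__ == "__main__":
    for name in ("outlook.office365.com", "www.riverbed.com"):
        print(name, "->", pick_resolver(name))
```

Queries for EXO names are answered by a resolver in the tenant's region, so the client receives CAFÉ servers near the AER peering point, while all other names continue to use the local resolver.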

For more information on the relevant domain names that should be conditionally forwarded, refer to this article.

Traffic Redirection

EXO and SPO typically experience the most benefit from WAN optimization. As such, only traffic related to EXO and SPO should be optimized, while the rest of the traffic should be passed through.

Recommendation

There are various ways to selectively redirect traffic for optimization, depending on the RiOS version. Please consult the Riverbed account team on the best way to redirect only EXO and SPO traffic for optimization.

SSL Certificates

Existing SteelHead O365-D customers migrating to the next generation design

Existing SteelHead customers who are migrating to the next generation O365-D environment should check that their existing SSL certificates can be used in the next generation O365-D environment.

New O365 Multi-Tenant tenant customers

New customers who are looking to deploy SteelHead appliances in an AER environment should contact their local Riverbed account team for details on configuring SSL optimization.

Conclusion

Deploying SteelHead appliances in an AER environment is supported, although it does introduce challenges not found in traditional on-premise deployments. However, many of these challenges can be mitigated through proper planning and traffic engineering, resulting in significant benefits when SteelHead appliances are deployed in an AER environment.

About Riverbed

Riverbed Technology is the IT infrastructure performance company. The Riverbed family of wide area network (WAN) optimization solutions liberates businesses from common IT constraints by increasing application performance, enabling consolidation, and providing enterprise-wide network and application visibility – all while eliminating the need to increase bandwidth, storage or servers.
Thousands of companies with distributed operations use Riverbed to make their IT infrastructure faster, less expensive and more responsive. Additional information about Riverbed is available at www.riverbed.com.

Riverbed Technology, Inc.
680 Folsom Street
San Francisco, CA 94107
Tel: (415) 247-8800
www.riverbed.com

Riverbed Technology Ltd.
Farley Hall, London Rd., Level 2
Binfield Bracknell. Berks RG42 4EU
Tel: +44 1344 354910

Riverbed Technology Pte. Ltd.
391A Orchard Road #22-06/10
Ngee Ann City Tower A
Singapore 238873
Tel: +65 6508-7400

Riverbed Technology K.K.
Shiba-Koen Plaza Building 9F
3-6-9, Shiba, Minato-ku
Tokyo, Japan 105-0014
Tel: +81 3 5419 1990