FAST Search Server 2010 for SharePoint
Capacity Planning
This document is provided "as-is". Information and views expressed in this document, including
URL and other Internet Web site references, may change without notice. You bear the risk of
using it.
Some examples depicted herein are provided for illustration only and are fictitious. No real
association or connection is intended or should be inferred.
This document does not provide you with any legal rights to any intellectual property in any
Microsoft product. You may copy and use this document for your internal, reference purposes.
© 2010 Microsoft Corporation. All rights reserved.
FAST Search Server 2010 for SharePoint
Capacity Planning
Microsoft Corporation
November 2010
Applies to: FAST Search Server 2010 for SharePoint
Summary: This document describes specific deployments of FAST Search Server 2010 for
SharePoint, including:
Test environment specifications, such as hardware, farm topology and configuration;
The workload used for data generation, including the number and class of users, and
farm usage characteristics;
Test farm dataset, including search indexes and external data sources;
Health and performance data specific to the tested environment;
Test data and recommendations for how to determine the hardware, topology and
configuration you need to deploy a similar environment, and how to optimize your
environment for appropriate capacity and performance characteristics.
Contents
Introduction
Search Overview
  Sizing Approach
  Search and Indexing Lifecycle
    Content feeding
    Query load
    Network traffic
  Web Analyzer performance dimensioning
Scenarios
  Shared specifications across all scenarios
    Query workload
    Notes on measured disk usage
    Configuration for extended content capacity
  Extra small FAST Search Farm
    Deployment alternatives
    Specifications
    Test Results
  Small FAST Search Farm
  Medium FAST Search Farm
    Deployment alternatives
    Specifications
    Test Results
  Large FAST Search Farm
    Deployment alternatives
    Specifications
    Test Results
  Extra-large FAST Search Farm
  Overall Takeaways
    Query and feeding performance
    Redundancy
    Capacity per node
    Deployments on storage area networks (SAN)
    Deployments on solid state disks (SSD)
Troubleshooting performance and scalability
  Raw I/O performance
  Analyzing feeding and indexing performance
    Content SSA
    Content distributors and item processing
    Indexing dispatcher and indexers
  Analyzing query performance
    Query SSA
    QRproxy and QRserver
    Query dispatcher
    Query matching
Introduction
This document provides capacity planning information for collaboration environment
deployments of FAST Search Server 2010 for SharePoint, hereafter referred to as
FAST Search Server. It includes the following information for sample search farm
configurations:
Test environment specifications, such as hardware, farm topology and configuration
The workload used for data generation, including the number and class of users and farm
usage characteristics
Test farm dataset, including search indexes and external data sources
Health and performance data specific to the tested environment
It also contains common test data and recommendations for how to determine the hardware,
topology and configuration you need to deploy a similar environment, and how to optimize your
environment for appropriate capacity and performance characteristics.
FAST Search Server contains a richer set of features and a more flexible topology model than
the search solution in earlier versions of SharePoint. Before you employ this architecture to
deliver more powerful features and functionality to your users, you must carefully consider the
impact upon your farm’s capacity and performance.
When you read this document, you will understand how to:
Define performance and capacity targets for your environment
Plan the hardware required to support the number and type of users, and the features
you intend to deploy
Design your physical and logical topology for optimum reliability and efficiency
Test, validate and scale your environment to achieve performance and capacity targets
Monitor your environment for key indicators
Before you read this document, you should read the following:
Performance and capacity management (SharePoint Server 2010)
Plan Farm Topology (FAST Search Server 2010 for SharePoint)
Search Overview
Sizing Approach
The scenarios in this document describe FAST Search Server test farms, with assumptions that
allow you to start planning for the correct capacity for your farm. To choose the right scenario,
you need to consider the following questions:
1. Corpus Size: How much content needs to be searchable? The total number of items
should include all objects: documents, web pages, list items, etc.
2. Availability: What are the availability requirements? Do customers need a search
solution which can survive the failure of a particular server?
3. Content Freshness: How "fresh" do you need the search results? How long after the
customer modifies the data do you expect searches to provide the updated content in the
results? How often do you expect the content to change?
4. Throughput: How many people will be searching over the content simultaneously? This
includes people typing in a query box, as well as other hidden queries like web-parts
automatically searching for data, or Microsoft Outlook 2010 Social Connectors requesting
activity feeds that contain URLs which need security trimming from the search system.
Search and Indexing Lifecycle
Content feeding
The scenarios allow you to estimate capacity at an early stage of the farm. Farms move
through multiple stages as content is crawled:
Index acquisition
This is the first stage of data population. It is characterized by:
  o Full crawls (possibly concurrent) of content.
  o Close monitoring of the crawl system, to ensure that hosts being crawled are not a bottleneck for the crawl.
Index Maintenance
This is the most common stage of a farm. It is characterized by:
  o Incremental crawls of all content, detecting new and changed content.
  o For SharePoint content crawls, a majority of the changes encountered during the crawl are related to access right changes.
Index Cleanup
This stage occurs when a content change moves the farm out of the index maintenance stage; for example, when a content database or site is moved from one search service application to another. This stage is not covered in the scenario testing behind this document, but is triggered when:
  o A content source and/or start address is deleted from a search service application.
  o A host supplying content is not found by the content connector for an extended period of time.
Index acquisition
When adding new content, feed performance is mainly determined by the configured number of
item processing components. Both the number of CPU cores and the speed of each core affect the results. As a first order approximation, a 1 GHz CPU core can process one average size Office document (around 250 kB) per second. For example, the M4 scenario discussed later has 48 CPU cores for item processing, each running at 2.26 GHz, giving a total estimated throughput of 48 cores × 2.26 GHz × 1 item/s per GHz ≈ 100 items per second on average.
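This rule of thumb can be expressed as a quick calculation. The following PowerShell sketch is an approximation only; the one-item-per-second-per-GHz rate and the 250 kB average item size are the first-order assumptions stated above, not measured constants.

# Rule-of-thumb feed rate estimate: ~1 average Office item (around 250 kB)
# processed per second per 1 GHz of CPU core available for item processing.
function Get-EstimatedFeedRate {
    param(
        [int]$CpuCores,     # CPU cores available for item processing
        [double]$CoreGHz    # clock speed per core, in GHz
    )
    [math]::Round($CpuCores * $CoreGHz)
}

# Example: the M4 scenario (48 cores at 2.26 GHz) yields ~108, roughly 100 items/s
Get-EstimatedFeedRate -CpuCores 48 -CoreGHz 2.26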
The crawl rate graph below is shown from the SharePoint administration reports. The crawl rate
varies depending on the type of the content. Most of the crawl is new additions (labeled as
"modified" in the graph).
Note:
The indicated feed rates might saturate content sources and networks during peak feeding
rate periods in the above crawl. See section Troubleshooting performance and scalability for
further information on how to monitor feeding performance.
Index maintenance
Incremental crawls can consist of various operations.
Access right (ACL) changes and deletes: These require near zero item processing, but
high processing load in the indexer. Feed rates will be higher than for full crawls.
Content updates: These require full item processing as well as more processing by the
indexer compared to adding new content. Internally, such an update corresponds to a
delete of the old item, and an addition of the new content.
Additions: Incremental crawls will to some extent also contain newly discovered items.
These have the same workload as index acquisition crawls.
Depending on the type of operation, an incremental crawl may be faster or slower than an initial
full crawl. It will be faster in the case of mainly ACL updates and deletes, and slower in the case
of mainly updated items. Using a backup indexer may slow down the incremental crawl of
updated items further.
In addition to updates from the content sources, the index is also altered by internal operations:
The FAST Search Server link analysis and click-through log analysis generate additional
internal updates to the index.
Example: A hyperlink in one item will lead to an update of the anchor text info associated
with the referenced item. Such updates have a similar load pattern as the ACL updates.
At regular intervals, the indexer performs internal reorganization of index partitions and
data defragmentation. Defragmentation is started every night at 3am, while
redistribution across partitions occurs whenever needed.
These internal operations imply that you may observe indexing activity also outside intervals
with ongoing content crawls.
Query load
Index partitioning and query evaluation
The overall index is partitioned on two levels:
Index columns: When the complete index is too large to reside on one server, it can be
split into multiple disjoint index columns. A query will then be evaluated against all index
columns within the search cluster, and the results from each index column are merged
into the final query hit list.
Index partitions: Within each index column the indexer uses a dynamic partitioning of the index in order to handle a large number of indexed items with low indexing and query latency. This partitioning is dynamic and handled internally on each index server. When a query is evaluated, each partition runs within a separate thread. The default number of partitions is 5. In order to handle more than 15 million items per server (column), you need to change the number of partitions (and associated query evaluation threads). This is discussed in the section Configuration for extended content capacity; a column-count sizing sketch follows this list.
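As a planning aid, the per-column capacities above can be turned into a rough column count. The sketch below uses the 15 million (default) and 40 million (extended capacity) per-column figures from this document; treat the output as a starting point for topology planning, not a validated sizing.

# Rough estimate of the number of index columns needed for a target corpus.
function Get-EstimatedIndexColumns {
    param(
        [long]$TotalItems,
        [switch]$ExtendedCapacity   # see Configuration for extended content capacity
    )
    $perColumn = if ($ExtendedCapacity) { 40000000 } else { 15000000 }
    [math]::Ceiling($TotalItems / $perColumn)
}

# Example: 100 million items => 7 columns at default settings, 3 with extended capacity
Get-EstimatedIndexColumns -TotalItems 100000000
Get-EstimatedIndexColumns -TotalItems 100000000 -ExtendedCapacity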
Query latency
Evaluation of a single query is schematically illustrated in the following figure.
CPU processing (light blue) is followed by waiting for disk access cycles (white) and actual disk
data read transfers (dark blue); repeated in the order of 2-10 times per query. This implies that
the query latency depends on the speed of the CPU, as well as the I/O latency of the storage
subsystem.
A single query is evaluated separately, and in parallel, across multiple index partitions in all
index columns. In the default five-partition configuration, each query is evaluated in five
separate threads within every column.
Query throughput
When query load increases, multiple queries are evaluated in parallel as indicated in the figure
below.
As different phases of the query evaluation occur at different times, simultaneous I/O accesses
are not likely to become a bottleneck. CPU processing shows considerable overlap, which will be
scheduled across the available CPU cores of the node.
In all scenarios tested, the query throughput reaches its maximum when all available CPU cores
are 100% utilized. This happens before the storage subsystem becomes saturated. More and
faster CPU cores will increase the query throughput, and eventually make disk accesses the
bottleneck.
Note:
In larger deployments with many index columns the network traffic between query processing
and query matching nodes may also become a bottleneck, and you may consider increasing
the network bandwidth for this interface.
Index size impact on query performance
Query latency is to some extent independent of query load up to the CPU starvation point at
maximum throughput. Query latency for each query is a function of the number of items in the
largest index partition.
The following diagram shows query latency on a system starting out with 5 million items in the
index, with more content being added in batches up to 43 million items. The data is taken from
the M6 scenario described later. Feeds are not running continuously, in order to see the feeding
effects on query performance at different capacity points.
There are three periods in the graph where search has been stopped, rendered as zero latency.
You can also observe that the query latency is slightly elevated when query load is applied after
an idle period. This is due to caching effects.
The query rate is on average 10 queries per minute, apart from a test for query throughput
within the first day of testing. This reached around 4000 queries per minute, making the 10 qpm
query rate graph almost invisible. Thus the graph above shows the light load query latency, and
not latency during maximum throughput.
The following diagram shows the feed rates during the same interval.
By comparing the two graphs, we see that an ongoing feed gives some degradation of query latency. Because this scenario has a search row with a backup indexer, the effect is much smaller than in systems where search runs on the same nodes as the indexer and item processing.
Percentile based query performance
The graphs presented earlier in this document show the average query latency. The SharePoint
administrative reports also provide percentile based reports. These can give a more representative performance summary, especially under high load conditions.
The following graph shows the percentile based query performance for the same system as in
the previous section. While the previous graphs showed average query latencies around 500-700 ms, the percentile graph shows that the median latency (50th percentile) is leveling out
around 400ms when content is added. The high percentiles show larger variations, both due to
the increased number of items on the system, as well as the impact of ongoing crawls.
Note:
The percentile based query performance graph includes the crawl rate of the Query SSA. This
will not show any crawling activity, as the Query SSA will only crawl user profile data for the
people search index. People search is not included in the test scenarios in this document.
Crawling of all other sources is performed by the FAST Search Server Content SSA.
The query throughput load test during the first day reveals that high query load will reduce the
latency for the high percentiles. During low query load and ongoing feed, a large fraction of the
queries will hit fresh index generations without caches. When query load increases (within the
maximum throughput capacity), the fraction of cold cache queries goes down. This will reduce
the high percentile latencies.
Deployments with indexing and queries on the same row
As crawls and queries both use CPU resources, deployments with indexing and queries on the
same row will show some degradation in query performance during content crawls. Single row
deployments are likely to have indexing, query and item processing all running on the same
servers.
The following test results are gathered by applying an increasing query load to a single row
system. The graphs are gathered from the SharePoint administrative reports. The query latency
is plotted as an area vs. the left axis, while the query throughput is a light blue line vs. the right
axis.
In the following diagram there is no ongoing content feed. The colors of the graph are as
follows:
Red: Backend, that is time consumed in the FAST Search Server nodes
Yellow: Object model
Blue: Server rendering
Query latency remains stable around 700 ms up to 8 queries per second (~500 queries per
minute). At this point the server CPU capacity becomes saturated. When applying even higher
load, query queues build up, and latency increases linearly with the queue length.
In the following diagram, the same query load is applied with ongoing content feeding. This
implies that queries need to utilize CPU capacity from the lower prioritized item processing.
Consequently, query latency now starts to increase even at low load, and also the maximum
throughput is reduced from ~600 to ~500 queries per minute. Note the change in scale on the
axis compared to the previous graph.
Query latency will have higher variation during feed. The spikes shown in the graph are due to the indexer completing larger work batches, leading to new index generations which invalidate the current query caches.
Using a dedicated search row
You can deploy a dedicated search row to isolate query traffic from indexing and item
processing. This requires twice the number of servers in the search cluster, but with the benefit of
better and more consistent query performance. Such a configuration will also provide query
matching redundancy.
A dedicated search row implies some additional traffic during crawls when the indexer creates a
new index generation (a new version of the index for a given partition). The new index data is
passed over the network from the indexer node to the query matching node. Given a proper
storage subsystem, the main effect on query performance is a slight degradation when new
generations arrive due to cache invalidation.
Search row combined with backup indexer
You can deploy a backup indexer in order to handle non-recoverable errors on the primary
indexer. You will normally co-locate the backup indexer with a search row. For this scenario you
should normally not deploy item processing to the combined backup indexer and search row.
The backup indexer increases the I/O load on the search row, as there will be additional
housekeeping communication between the primary and backup indexer to keep the index data
on the two servers in sync. This also includes additional data storage on disk for both servers.
Make sure that you dimension your storage subsystem to handle the additional load.
Network traffic
With increased CPU performance on the individual servers, the network connection between the
servers can become a bottleneck. As an example, even a small 4-node FAST Search Server farm
can process and index more than 100 items per second. If the average item is 250 Kbytes, this
will represent around 250 Mbit/s average network traffic. Such a load may saturate even a
1Gbit/s network connection.
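The arithmetic behind this example can be checked as follows. This is only the raw content payload for a single pass over the network; as the decomposition below shows, the accumulated traffic across all nodes is considerably higher, which is why the sustained total approaches the ~250 Mbit/s cited above and a 1 Gbit/s link can saturate.

# Raw content payload: feed rate x average item size, converted to Mbit/s.
$itemsPerSecond = 100
$avgItemBytes   = 250KB                                     # 250 KB average item
$mbitPerSecond  = $itemsPerSecond * $avgItemBytes * 8 / 1MB
$mbitPerSecond                                              # ~195 Mbit/s for one pass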
The network traffic generated by content feeding and indexing can be decomposed as follows:
The indexing connector within the Content SSA retrieves the content from the source
The Content SSA (within the SharePoint farm) passes the retrieved items in batches to
the content distributor component in the FAST Search Server farm
Each item batch is sent to an available item processing component, typically located on
another server
After processing, each batch is passed to the indexing dispatcher, which will split the
batches according to the index column distribution
The indexing dispatcher distributes the processed items to the indexers of each index
column
The binary index is copied to additional search rows (if deployed)
The accumulated network traffic across all nodes can be more than five times higher than the
content stream itself in a distributed system. A high performance network switch is needed to
interconnect the servers in such a deployment.
High query throughput also generates high network traffic, especially when using multiple index
columns. Make sure you define the deployment configuration and network configuration to avoid
too much overlap between network traffic from queries and network traffic from content feeding
and indexing.
Web Analyzer performance dimensioning
Performance dimensioning of the Web Analyzer component depends on the number of indexed items and whether the items contain hyperlinks. Items that contain hyperlinks, or that are linked to, represent the main load on the Web Analyzer.
Database-type content does not normally contain hyperlinks. SharePoint and other types of
Intranet content will often contain HTML with hyperlinks. External Web content is almost
exclusively HTML documents with many hyperlinks.
Although both the number of CPU cores and the amount of disk space are vital for performance dimensioning of the Web Analyzer, disk space is the most important. The following table specifies rule-of-thumb dimensioning recommendations for the Web Analyzer.
Content type            Number of items per CPU core   GB disk per million items
Database                20 million                     2
SharePoint / Intranet   10 million                     6
Public Web content      5 million                      25
Note:
The table provides dimensioning rules for the whole farm. If the Web Analyzer components
are distributed over two servers the requirement per server will be half of the given values.
The amount of memory needed is the same for all types of content, but depends on the number
of cores used. We recommend planning for 30 MBytes per million items plus 300 MBytes per
CPU core.
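The table and memory rule above can be combined into a rough dimensioning helper. The following sketch simply encodes the rule-of-thumb values from this section; it is a planning aid under those assumptions, not a validated sizing tool.

# Web Analyzer dimensioning from the rule-of-thumb table and memory rule above.
function Get-WebAnalyzerSizing {
    param(
        [long]$Items,
        [ValidateSet('Database','SharePoint','Web')]
        [string]$ContentType
    )
    $itemsPerCore = @{ Database = 20e6; SharePoint = 10e6; Web = 5e6 }[$ContentType]
    $gbPerMItems  = @{ Database = 2;    SharePoint = 6;    Web = 25  }[$ContentType]
    $cores  = [math]::Ceiling($Items / $itemsPerCore)
    $diskGB = [math]::Ceiling($Items / 1e6 * $gbPerMItems)
    $memMB  = [math]::Ceiling($Items / 1e6 * 30 + $cores * 300)  # 30 MB/M items + 300 MB/core
    @{ Cores = $cores; DiskGB = $diskGB; MemoryMB = $memMB }
}

# Example: 40 million items of SharePoint/intranet content
# => 4 cores, 240 GB disk, ~2.4 GB memory for the whole farm
Get-WebAnalyzerSizing -Items 40000000 -ContentType SharePoint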
The link, anchor text or click through log analysis will only be performed if sufficient disk space
is available. The number of CPU cores only impacts the amount of time it takes to update the
index with anchor text and rank data.
If the installation contains different types of content, the safest capacity planning strategy is to use the most demanding content type as the basis for the dimensioning. For example, if the system contains a mix of database and SharePoint content, we recommend dimensioning the system as if it contained only SharePoint content.
Scenarios
This section describes typical deployments for variously sized search farms, with some relevant
hardware variations for each scale point. The following scale points are included:
XS: Extra-small FAST Search farm tested with 1, 5 and 8 million items
S: Small FAST Search farm with 15 million items (planned for inclusion in future release)
M: Medium FAST Search farm with 40 million items
L: Large FAST Search farm with 100 million items
XL: Extra-large FAST Search farm with 500 million items (planned for inclusion in future
release)
For each of these scale points, several scenarios are defined. These are labeled as M1, M2 and
so on for the medium scale point, and correspondingly for the others. Content is crawled from
SharePoint, Web servers and file shares.
Note:
The scenarios below do not include storage sizing for storing a system backup, as backups
would normally not be stored on the FAST Search Server nodes themselves.
The next subsection describes the specifications shared across all scenarios, while each of the
following subsections describes a specific scenario. General guidelines follow in the
"Recommendations" section.
Shared specifications across all scenarios
Note:
The FAST Search Server index does not use any SQL Server based property database. The people search index uses the property database, but the test scenarios in this document do not include people search.
Query workload
This section describes the workload used for query profiling. The number of queries per second
(QPS) is varied from 1 QPS to about 40 QPS and the latency is recorded as a function of this.
The query test set consists of 76501 queries. These queries have the following characteristics:
Query terms              1       2       3      4     5     7
Number of queries        49195   24520   2411   325   43    7
Percentage of test set   64.53   32.16   2.81   0.43  0.06  0.01
There are two types of multi-term queries used:
1. ALL queries (70%), meaning all terms must appear in matching items. This includes queries containing an explicit AND, as well as lists of terms that are implicitly parsed as an AND statement.
2. ANY queries (30%), meaning at least one of the terms must appear in matching items (OR).
The queries are chosen by random selection.
The number of user agents (simulated users) defines the query load. One agent repeats the
following two steps during the test:
1. Submit a query
2. Wait for the response
There is no pause between the repetitions of these steps; the agent submits a new query
immediately after receiving a query response.
The number of agents increases in steps during the test. For example, the figure below shows a typical test where the number of agents increases periodically, adding two agents every 15 minutes.
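For illustration, a single agent's loop can be sketched as follows. This is a hedged approximation of the workload model, not the actual load driver used in the tests; the results page URL and query file are hypothetical placeholders.

# One closed-loop query agent: submit a query, wait for the response, repeat.
$client = New-Object System.Net.WebClient
$client.UseDefaultCredentials = $true
$queries = Get-Content .\queries.txt        # hypothetical file, one query per line
foreach ($q in $queries) {
    $start = Get-Date
    # Submit the query and block until the full response is received.
    $url = "http://sharepoint.contoso.com/_layouts/OSSSearchResults.aspx?k=" +
           [uri]::EscapeDataString($q)
    $null = $client.DownloadString($url)
    $latency = (Get-Date) - $start
    Write-Output ("{0}`t{1:N0} ms" -f $q, $latency.TotalMilliseconds)
}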
Notes on measured disk usage
Actual disk usage is listed for all scenarios. Please note that:
Raw source data size is only included for illustration. These data do not occupy any disk
space on the FAST Search Server system.
The indexer stores the processed items on disk in an XML based format called FiXML.
FiXML data serves as input to the indexing process which builds the indices. Every
submitted item is stored in FiXML format. Old versions are removed once a day. The data
size given contains only a single version of every item.
FAST Search Server keeps a read-only binary index file set to serve queries while building the next index file set. The worst-case disk space usage for index data is approximately 2.5 times the size of a single index file set; the extra 0.5 accounts for various temporary files. For example, a 250 GB index file set can require up to roughly 625 GB of disk during indexing.
When running with primary and backup indexers, the indexers may consume an additional 50 GB each for synchronization data. Other data includes Web Analyzer data, log files, etc.
Note:
The ratio between source data and index data depends strongly on the content type. This is
related to the different amount of searchable data in the various data formats.
Configuration for extended content capacity
FAST Search Server has a default configuration that is optimized for handling up to 15 million
items per index column, with a hard limit of 30 million items per index column. Some of the
scenarios described in this document use a modified configuration to allow for up to 40 million
items per column. This is referred to as an extended capacity configuration. The extended
content capacity configuration has more index partitions within each server node. In this way
low query latency can be maintained at the expense of reduced maximum QPS.
There are some tradeoffs by extending the capacity:
Query throughput (QPS) is reduced. Query latency (while not exceeding the throughput limitation) is less affected. Query throughput reductions can be compensated for with multiple search rows, but the saving in server count then diminishes.
Indexing will require more resources, and also more disk accesses.
More items per column require more storage space per server. The total storage space
across the entire farm is mainly the same.
There are fewer nodes for distributing item processing components. Initial feed rate will
be reduced, as the feed rate is mainly dependent on the number of available CPU cores.
Incremental feeds will also have lower throughput as each index column has more work.
Initial pre-production bulk feeds can be accelerated by temporarily adding item processing components to any search rows, or to additional servers temporarily assigned to the cluster.
More hardware resources per server are required. It is not recommended to use the extended settings on a server with fewer than 16 CPU cores/threads (24 or more is recommended). 48 GB RAM is recommended, as is a high-performance storage subsystem. See individual scenarios for tested configurations.
In summary, the extended content capacity configuration is only recommended for deployments
with:
High content volumes, but where the number of changes over time is low, typically less
than 1 million changes per column over 24 hours.
Low query throughput requirements (not more than 5-10 queries per second, depending
on CPU performance of the servers).
Search running on a different search row than the primary indexer, as the indexer is
expected to be busy most of the time.
Note:
When estimating the change rate that a farm must be able to consume, keep in mind that any
content change implies a load to the system, including ACL changes. ACL changes may appear
for many items at a time in case of access right changes to document libraries or sites,
resulting in high peak update rates.
Note:
Modifying the indexer configuration has implications for how you perform patch and service pack upgrades. See the procedure below.
Enable extended content capacity
In order to reconfigure the indexers to handle up to 40 million items per column, you must
modify the indexer template configuration file and run the deployment script to generate and
distribute the new configuration.
Note:
Only apply the following procedure to indexers which do not contain any data.
1. Verify that no crawling is ongoing
2. Verify that no items are indexed on any of the indexers. Run the following command:
%FASTSEARCH%\bin\indexerinfo -a doccount
All the indexers should report 0 items.
3. On all FAST Search Server nodes, run the following commands:
net stop fastsearchservice
net stop fastsearchmonitoring
4. On the administration server node:
a. Save a backup of the original configuration file:
%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template
You may need this backup at a later stage if this configuration file is modified in any patch or service pack upgrade.
b. Modify the following values within the original configuration file:
i. Set numberPartitions to 10 (the default is 5).
ii. Set docsDistributionMax to 6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000,6000000 (the default value is 6000000,6000000,6000000,6000000,6000000).
c. Modify the deployment file to enable re-deployment. The file %FASTSEARCH%\etc\config_data\deployment\deployment.xml must be modified in order for the Set-FASTSearchConfiguration PowerShell cmdlet to run the re-deployment. You can do that by opening the file in Notepad, adding a space, and saving the file.
d. Run the following commands:
Set-FASTSearchConfiguration
net start fastsearchservice
5. On all non-administration server nodes, run the following commands:
Set-FASTSearchConfiguration
net start fastsearchservice
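For reference, the administration server portion of this procedure (steps 3 and 4) can be scripted roughly as shown below. This is a sketch under the assumptions stated in the procedure; the edit of numberPartitions and docsDistributionMax in step 4b is deliberately left manual, because the template layout may differ between versions.

# Steps 3-4 on the administration server (run from an elevated FAST Search shell).
net stop fastsearchservice
net stop fastsearchmonitoring

$template = Join-Path $env:FASTSEARCH 'META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template'
Copy-Item $template "$template.bak"   # keep for future patch/service pack upgrades

# ... edit $template manually as described in step 4b ...

# Touch deployment.xml so that Set-FASTSearchConfiguration re-deploys;
# appending a space is the scripted equivalent of the Notepad edit above.
$deployment = Join-Path $env:FASTSEARCH 'etc\config_data\deployment\deployment.xml'
Add-Content $deployment ' '

Set-FASTSearchConfiguration
net start fastsearchservice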
Handling patches and service pack upgrades
For all future patch or service pack updates, you need to verify whether this configuration file is updated as part of the patch or service pack update. Review the readme file thoroughly to look for any mention of this configuration file. If a patch or service pack involves an update of this configuration file, the following steps must be followed.
1. Replace the configuration file
%FASTSEARCH%\META\config\profiles\default\templates\installer\etc\config_data\RTSearch\clusterfiles\rtsearchrc.xml.win32.template
with the backup of the original file that you have saved.
2. Perform the patch or service pack upgrade according to the appropriate procedure.
3. Perform the change to the configuration file template as specified above. Do not forget to
back up the modified configuration file template!
Extra small FAST Search Farm
The extra small FAST Search farm is targeting a smaller test corpus with high query rates. The
amount of content is up to 8 million items. There are no off-business hours with reduced query
load. Crawls are likely to occur at any point in time. Query performance is measured at 1M, 5M
and 8M content volume.
The configuration for the parent SharePoint farm uses four front-end Web servers, one
application server and one database server arranged as follows:
One crawl component of the Content SSA is running on a single server.
The application server also hosts Central Administration for the farm.
One database server hosts the crawl databases, the FAST Search Server administration
databases, as well as the other SharePoint databases.
The application server and Web front end servers only have disk space for the operating system and programs. No separate data storage is required.
Deployment alternatives
For the extra small farm scenario, the following alternatives have been tested for the FAST
Search Server farm back-end:
XS1. Single server install hosting all FAST Search Server components, using regular disk drives
XS2. Same as XS1, deployed on a single virtual machine
XS3. Four node install running on four virtual machines, all running on the same physical server
XS4. Same as XS1, with the addition of a dedicated search row (2 servers)
XS5. Same as XS1, but with storage on SAS SSD drives
Specifications
This section provides detailed information about the hardware, software, topology, and
configuration of the test environment.
Hardware
FAST Search Server farm servers
All the extra small size deployment alternatives run on similar hardware, although some of the setups use virtualization and others use solid state disk (SSD) storage.
Shared specifications:
Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
24 GB memory
1 Gbit/s network card
Storage subsystem
  o OS: 2x 146GB 10k RPM SAS disks in RAID1
  o Application: 7x 146 GB 10k RPM SAS disks in RAID5. Total formatted capacity of 880 GB.
  o Disk controller: HP Smart Array P410, firmware 3.30
  o Disks: HP DG0146FARVU, firmware HPD6
Changes for XS2 and XS3:
Virtualized servers running under Hyper-V. Host server has the same specification as XS1.
  o 4 CPU cores
  o 8 GB memory
  o 800 GB disk on servers with index component
Changes for XS5:
Storage subsystem
  o Application: 2x 400 GB SSD disks in RAID0. Total formatted capacity of 800 GB.
  o SSD disks: Stec ZeusIOPS MLC Gen3, part Z16IZF2D-400UCM-MSF
SharePoint Server 2010 servers
Application and Web front end servers do not need storage apart from operating system,
application binaries and log files.
Windows Server 2008 R2 x64 Enterprise edition
2x Intel L5420 CPUs
16 GB memory
1 Gbit/s network card
Storage subsystem for OS/Programs: 2x 146GB 10k RPM SAS disks in RAID1
SQL servers
Same specification as for SharePoint 2010 servers above, with additional disk RAID for SQL data
with 6x 146GB 10k RPM SAS disks in RAID5.
Topology
This section describes the topology of the test environment for all deployment alternatives.
XS1
XS1 is a generic single node install using the following deployment.xml file:
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="XS1"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>XS1</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS1.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
XS2
XS2 is a generic single node install without a deployment file, running on a single virtual
machine. In practice, the following deployment file would have given the same setup.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="XS2"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>XS2</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS2.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="1"/>
<document-processor processes="4" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
XS3
XS3 is distributed across four virtual machines, getting a comparable hardware footprint to XS1.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="XS3"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>XS3</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS3.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<document-processor processes="4" />
</host>
<host name="fs4sp2.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="0" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<document-processor processes="4" />
</host>
<host name="fs4sp4.contoso.com">
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4" />
<document-processor processes="4" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
XS4
XS4 is the same as the XS1 deployment, but extended with an additional search row to get
search redundancy.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="XS4"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>XS4</instanceid>
<connector-databaseconnectionstring>
[<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=XS4.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="16" />
</host>
<host name="fs4sp2.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>
</deployment>
XS5
XS5 uses the same deployment file as XS1, but with storage on SAS SSD drives.
Dataset
This section describes the test farm dataset, including database content and sizes, search
indexes, and external data sources. The overall metrics are shown in the table below.
Object                                 Value
Search index size (# of items)         5.4 M
Size of crawl database                 16.7 GB
Size of crawl database log file        1.0 GB
Size of property database              < 0.1 GB
Size of property database log file     < 0.1 GB
Size of SSA administration database    < 0.1 GB
The table below shows the content source types used to build the index. The numbers in the
table reflect the total number of items per source. The difference between the total number of
items and the index size above is due to two factors:
Items may be disabled from indexing in the content source
Items may have a document format that cannot be indexed
For SharePoint sources, the size of the respective content database in SQL is used as the raw
data size.
Content source   Items   Raw data size   Average size per item
HTML 1           1.1 M   8.8 GB          8.1 kB
SharePoint 1     4.5 M   2.0 TB          443 kB
HTML 2           3.2 M   137 GB          43 kB
Total            8.8 M   2.2 TB          246 kB
The test scenarios do not include people search data. People Search is crawled and indexed in a
separate index within the Query SSA.
Test Results
This section provides data that shows how the farm performed under load.
Feeding and indexing performance
All configurations apart from XS2 have the same CPU resources available. XS2 is running on a
single virtual machine, and is thus limited to four CPU cores, as opposed to 16 for the others.
The following graph shows the average number of items per second for the different content
sources during a full crawl:
Overall, XS2 shows a 65-70% performance degradation compared to running on physical hardware. This is expected, as the single VM is restricted to four CPU cores. XS3, running four VMs and thus having the same hardware footprint as XS1, shows a 35-40% degradation compared to running directly on the host computer. The major degradation stems from the lower I/O performance when running in a virtual machine using a fixed size VHD file. The split of XS3 resources across four virtual machines also incurs more server-to-server communication.
Query performance
The following subsections describe the query performance impact of different farm deployments and of varying content volume. There is also a separate test section on the effects of tuning for the low document volume in the XS scenarios, combined with solid state storage disks (SSD).
Impact of deployment configuration
The above graph shows the query performance of the different scenarios when there is no
ongoing feed. XS1 and XS5 show only minor differences, with a slightly better performance for
the SSD based XS5 (running with two SSDs versus 7 regular SAS spindles for XS1). As
expected, the additional search row in XS4 does not improve query performance under idle
crawl conditions. XS4 has the same throughput as XS1/XS5 under high load, but with slightly
increased latency. This is due to queries being directed to both search rows, implying a lower cache hit ratio, as well as additional inter-node communication.
The virtualized scenarios (XS2 and XS3) have a significantly lower query performance, and also
with higher variation than the non-virtualized options. As observed for feed performance, this
reduction is related to the storage performance, in addition to the search components having
maximum four CPU cores at disposal.
The situation is somewhat different in the above graph, showing query performance under full
crawl. The single server XS1 scenario gets a reduction in query performance under concurrent
crawl load. XS5 is less affected due to its improved storage performance, but still sees CPU contention between item processing and query components. XS4 is least impacted, as this
scenario has a dedicated search row. XS4 results vary more at concurrent high query and feed
load due to competition for network resources.
The virtualized scenarios are both below 10 QPS maximum throughput under these load
conditions. XS1 (native hardware) and XS3 (virtualized) have the same hardware footprint, with
the non-virtualized configuration having more than five times the throughput. Some of this
difference is due to virtualization overhead, especially storage performance, and some due to the limit on how many CPU cores a virtual machine has available. Under high search load, the query components can use all 16 CPU cores in XS1, while XS3 is restricted to a maximum of four CPU cores.
Impact of varying content volume
Even though the XS-scale scenarios are sized for 8M documents, query performance testing was
also run at 1M and 5M items indexed. The following graph shows how the content capacity
affects query performance:
The solid lines show that maximum query capacity improves with less content, with maximum
90 QPS at 1M items, 80 QPS at 5M items, and 64 QPS at 8M items. During feed, the 1M index
can still sustain > 40 QPS, although with a lot of variance. This is due to the total index size
being relatively small, and most of it being able to fit inside application and OS level caches.
Both 5M and 8M indices have a lower maximum query performance during feed, in the 25-30
QPS range.
Impact of tuning for high performance storage
Even though the XS5 scenario demonstrated improved performance over XS1 with default settings, configuration tuning allows better utilization of the higher IOPS potential of SSDs. This tuning is done the same way as enabling extended content capacity discussed earlier, but only changes the docsDistributionMax setting, not the number of partitions:
docsDistributionMax="2500000,2500000,2500000,2500000,2500000"
This reduces the maximum practical capacity per column to 8-9 million items, but also spreads the workload across smaller partitions than the default setting. This allows for more parallel query execution, at the expense of more disk operations.
The following graph shows the result of this tuning at full capacity (8M items), which allows the
SSD based XS5 scenario to serve up to 75 QPS, and also reduce the response time under light
query load. For example, the response time at 40 QPS at idle crawl is reduced from 0.4 to 0.2
seconds. Further, the response time during crawls is better with this tuning, as well as more
consistent. The tuned XS5 scenario is able to deliver around 40 QPS with sub-second latency
during crawls, while XS1 only delivered 15 QPS with the same load and latency requirements.
In total, using high performance storage provides improved query performance, especially during concurrent content crawls, and thus reduces or even eliminates the performance driven need to run search on dedicated rows. SSDs also provide sufficient performance with a smaller number of disks; in this case, two SSDs outperform seven SAS spindles. This is attractive where power or space restrictions do not allow for a larger number of disks, for example in blade servers.
Disk usage
The table below shows the combined increase in disk usage on all nodes after the various content sources have been indexed. Note that scenarios using replication of FiXML and/or index data need additional space.
Content source   Items   FiXML data size   Index data size   Other data size
HTML 1           1.1 M   6 GB              20 GB             4 GB
SharePoint 1     4.5 M   41 GB             108 GB            15 GB
HTML 2           3.2 M   27 GB             123 GB            22 GB
Total            8.8 M   74 GB             251 GB            41 GB
Small FAST Search Farm
This scenario has not yet been tested. Planned capacity is 15 million items per farm.
Medium FAST Search Farm
The medium FAST Search Farm is targeting a moderate test corpus. The amount of content is
up to 40 million items, and to meet freshness goals, incremental crawls are likely to occur
during business hours.
The configuration for the parent SharePoint farm uses two front-end Web servers, two
application servers and one database server arranged as follows:
Two crawl components for the Content SSA are distributed across the two application
servers. This is mainly due to I/O limitations in the test setup (1 Gbit/s network), where
a single network adapter would have been a bottleneck.
One of the application servers also hosts Central Administration for the farm.
One database server hosts the crawl databases, the FAST Search Server administration
databases, as well as the other SharePoint databases.
Application servers and Web front end servers will only have disk space for operating system
and programs. No separate data storage is required.
Deployment alternatives
For the medium farm scenario, the following alternatives have been tested for the FAST Search
Server farm back-end:
M1. One combined administration and Web Analyzer server, and three index column servers with default configuration (4 servers)
M2. Same as M1, but using SAN storage (4 servers)
M3. A single high capacity server hosting all FAST Search Server components
M4. Same as M1, with the addition of a dedicated search row (7 servers)
M5. Same as M3, with the addition of a dedicated search row (2 servers)
M6. Same as M4, but where the search row includes a backup indexer row (7 servers)
M7. Same as M5, but where the search row includes a backup indexer row (2 servers)
M8. Same as M3, but using solid state drives (1 server)
M9. Same as M3, but on more powerful hardware (1 server)
M10. Same as M1, but using solid state drives for indexer/search nodes (4 servers)
Specifications
This section provides detailed information about the hardware, software, topology, and
configuration of the test environment.
Hardware
FAST Search Server farm servers
The following hardware specifications have been used for the medium size deployment
alternatives.
Shared specifications:
Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
24 GB memory
1 Gbit/s network card
Storage subsystem
  o OS: 2x 146GB 10k RPM SAS disks in RAID1
  o Application: 18x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 9 drives each). Total formatted capacity of 2 TB.
  o Disk controller: HP Smart Array P410, firmware 3.00
  o Disks: HP DG0146FARVU, firmware HPD5
Changes for M2:
Application is hosted on 2TB partitions on a SAN
SAN used for test
  o 3Par T-400
  o 240 15k RPM spindles (450GB each)
  o Dual ported FC connection to each application server using MPIO without any FC switch. MPIO enabled in the operating system.
Changes for M3/M5:
48 GB memory
Application is hosted on 22x 300GB 10k RPM SAS drives in RAID50 (two parity groups of 11 spindles each). Total formatted capacity of 6TB.
Changes for M7:
2x Intel L5640 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
48 GB memory
Dual 1 Gbit/s network card
Storage subsystem:
  o Application hosted on 12x 1TB 7200 RPM SAS drives in RAID10. Total formatted capacity of 6TB.
  o Disk controller: Dell PERC H700, firmware 12.0.1-0091
  o Disks: Seagate Constellation ES ST31000424SS, firmware KS65
Changes for M8:
2x Intel L5640 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
48 GB memory
Dual 1 Gbit/s network card
Storage subsystem:
  o Application: 3x 1280 GB SSD cards in RAID0. Total formatted capacity of 3.6 TB.
  o SSD cards: Fusion-IO ioDrive Duo 1.28 TB MLC, firmware revision 43284, driver 2.2 build 21459
Changes for M9:
2x Intel X5670 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
48 GB memory
Dual 1 Gbit/s network card
Storage subsystem:
  o Application hosted on 12x 600GB 15k RPM SAS drives in RAID50. Total formatted capacity of 6TB.
  o Disk controller: LSI MegaRAID SAS 9260-8i, firmware 2.90-03-0933
  o Disks: Seagate Cheetah 15K.7 ST3600057SS, firmware ES62
Changes for M10:
2x Intel L5640 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
48 GB memory
Dual 1 Gbit/s network card
Storage subsystem (search cluster nodes only):
  o Application: 1x Fusion-IO ioDrive Duo 1.28 TB MLC SSD card, firmware revision 43284, driver 2.2 build 21459
SharePoint Server 2010 servers
Application and Web front end servers do not need storage apart from operating system,
application binaries and log files.
Windows Server 2008 R2 x64 Enterprise edition
2x Intel L5420 CPUs
16 GB memory
1 Gbit/s network card
Storage subsystem for OS/Programs: 2x 146GB 10k RPM SAS disks in RAID1
SQL servers
Same specification as for SharePoint 2010 servers above, with additional disk RAID for SQL data
with 6x 146GB 10k RPM SAS disks in RAID5.
Topology
This section describes the topology of the test environment for all deployment alternatives.
Note:
All the tested deployment alternatives use the same SharePoint Server and Database Server
configuration as shown for M1/M2/M10. For the other deployments only the FAST Search
Server farm topology is shown.
M1/M2/M10
M1, M2 and M10 are similar except for the storage subsystem. M1 is running on local disk, while
M2 uses SAN storage and M10 uses solid state storage for the search cluster. All three
deployment alternatives have a search cluster with 3 index columns and one search row. There
is one separate administration node that also includes the Web Analyzer components. Item
processing is spread out across all nodes.
None of these three alternatives has a dedicated search row. This implies that there will be a noticeable degradation in query performance during content feeds. The impact can be reduced by feeding during off-peak hours, or by reducing the number of item processing components to lower the maximum feed rate.
The following figure shows the M1 deployment alternative. M2 and M10 have the same
configuration.
The following deployment.xml file is used for M1, M2 and M10.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M1"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M1</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M1.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<host name="fs4sp2.contoso.com">
<content-distributor />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>
<host name="fs4sp4.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
M3
The M3 scenario combines all components on one server. Running concurrent feeding and query load has the same impact as for M1/M2/M10; in addition, the reduced number of servers implies fewer item processing components, and thus a lower feed rate.
The following figure shows the M3 deployment alternative.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M3"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M3</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M3.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
M4
M4 corresponds to M1/M2/M10 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Each of the three servers running the dedicated search row also includes a query processing component (query). In addition, the deployment includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during normal operation, but it may be used as a fallback to serve queries if the entire search row is taken down for maintenance.
The following figure shows the M4 deployment alternative.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M4"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M4</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M4.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<host name="fs4sp2.contoso.com">
<content-distributor />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>
<host name="fs4sp4.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>
<host name="fs4sp5.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>
<host name="fs4sp6.contoso.com">
<query />
<searchengine row="1" column="1" />
</host>
<host name="fs4sp7.contoso.com">
<query />
<searchengine row="1" column="2" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>
</deployment>
M5
M5 corresponds to M3 with the addition of a dedicated search row, giving the same benefits as
M4 compared to M1/M2/M10.
The following figure shows the M5 deployment alternative.
The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M5"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M5</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M5.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="16" />
</host>
<host name="fs4sp2.contoso.com">
<query />
<searchengine row="1" column="0" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>
</deployment>
M6
M6 has the same setup as M4, with an additional backup indexer enabled on the search row. The backup indexer is deployed by modifying the M4 deployment.xml file as shown below.
…
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
</searchcluster>
…
M7
M7 has the same setup as M5, with an additional backup indexer enabled on the search row. M7 also runs on nodes with more CPU cores (see the hardware specifications), which allows more item processing components in the farm; these also run on the search row. The following deployment.xml file is used.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M7"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M7</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M5.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="20" />
</host>
<host name="fs4sp2.contoso.com">
<query />
<searchengine row="1" column="0" />
<document-processor processes="8" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
</searchcluster>
</deployment>
M8/M9
The M8/M9 deployment alternatives combine all components on one server, just like M3. The differences are that M8/M9 run on hardware with better performance, especially in the disk subsystem, and that they have more CPU cores, which allows for more item processing components.
M8 uses solid state storage. M9 has more CPU power (X5670 vs. the L5520/L5640 used in most other M-scale tests) and the fastest disk spindles readily available (12x 15k RPM SAS disks).
The following deployment.xml file is used for both M8 and M9.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="M8"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>M8</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=M8.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<content-distributor />
<indexing-dispatcher />
<searchengine row="0" column="0" />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="20" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
Dataset
This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources. The following table shows the key metrics:

Object                                  Value
Search index size (# of items)          42.7 million
Size of crawl database                  138 GB
Size of crawl database log file         11 GB
Size of property database               <0.1 GB
Size of property database log file      0.3 GB
Size of SSA administration database     <0.1 GB
The table below specifies the content source types used to build the index. The numbers in the table reflect the total number of items per source, including replicated copies. The difference between the total number of items below (43.8 million) and the index size above (42.7 million) is due to two factors:
Items may be disabled from indexing in the content source
The document format type cannot be indexed
For SharePoint sources, the size of the respective content database in SQL is used as the raw data size.
Content source            Total items    Raw data size    Average size per item
File share 1 (2 copies)   1.2 M          154 GB           128 kB
File share 2 (2 copies)   29.3 M         6.7 TB           229 kB
SharePoint 1              4.5 M          2.0 TB           443 kB
SharePoint 2              4.5 M          2.0 TB           443 kB
HTML 1                    1.1 M          8.8 GB           8.1 kB
HTML 2                    3.2 M          137 GB           43 kB
Total                     43.8 M         11 TB            251 kB
Note:
To reach sufficient content volume in the testing of the medium scenario, two replicas of the file shares were added. Each copy of each document would then appear as a unique item in the index, but be treated as a duplicate by the duplicate trimming feature. From a query matching perspective, the load would be similar to having all unique documents indexed, but any results from these sources would trigger duplicate detection and collapsing in the search results.
The test scenarios do not include people search data. People Search is crawled and indexed in a
separate index within the Query SSA.
Test Results
This section provides data that shows how the farm performed under load.
Feeding and indexing performance
Test results from feeding M1 through M6 are included below. The other deployment alternatives are not included, as they do not show significantly different feed performance from their corresponding alternatives above.
Full crawl
The following diagram shows the average number of items processed per second for the various
deployment alternatives.
For full crawls, the item processors represent the bottleneck. The limiting factor is the CPU
processing capacity.
M1, M2 and M4 have similar performance characteristics because they have the same number of item processing components available; the same applies to M3 and M5. Note, however, that M1, M2 and M4 have four times the full-crawl performance of M3 and M5, because they have four times the item processing capacity. Comparing M6 to M4, it also becomes apparent that running with backup indexers incurs a performance overhead due to the extra synchronization work required. An installation without backup indexers, like M4, will typically outperform one with backup indexers, like M6.
Incremental crawl
The following diagram shows the average number of items processed per second for the various
deployment alternatives.
Incremental crawls are faster than full crawls, from slightly faster up to a factor of 2 or 3. This is mainly because incremental crawls mostly consist of partial updates, which update only metadata. It also implies that the feed performance is largely the same for all content types.
For incremental crawls, the indexers are the bottleneck, because the item processing load is limited. Disk I/O capacity is typically the limiting factor. During an incremental update, the old version of the item is fetched from disk, modified, persisted to disk, and then indexed. This is more expensive than a full-crawl operation, where the item is only persisted and indexed.
Note:
M1 through M5 were tested with content sources that had lower performance than those used for the other scenarios in this document. The performance numbers can thus not be directly compared, as the M1 through M5 tests were to some extent limited by the bandwidth of the content sources.
Query performance
M1
The following diagram shows the query latency as a function of QPS for the M1 deployment
alternative.
An idle indexer gives the best query performance, with an average latency of less than 0.7 seconds up to approximately 21 QPS. The corresponding number during a full crawl is 10 QPS, and during an incremental crawl 15 QPS.
Note that the latency is not impacted by higher QPS until you reach the maximum system capacity. The figure shows that QPS will decrease and latency will increase if you apply more query load after the maximum capacity of the system has been reached. This occurs at the point where the curve starts bending "backwards". On the M1 system, the peak QPS is about 28 with idle indexers. CPU resources are the bottleneck in this scenario. The behavior is also illustrated in the next diagram, where you can observe that performance decreases with more than 40 simultaneous user agents on an idle system.
Hyper-Threading
CPU resources are the bottleneck in the M1 deployment alternative. Enabling hyper-threading
allows more threads to execute in (near) parallel, at the expense of slightly reduced average
performance for single-threaded tasks. Note that the query matching components will run in a
single thread when QPS is low and no other tasks are running on the server.
Hyper-threading improves performance for all three feeding cases in the M1 deployment alternative. In the deployment alternatives with dedicated search rows, a small reduction (around 150 ms) in query latency is observed when running at very light query load.
The following diagram shows the impact of using hyper-threading in the CPU.
In general, hyper-threading reduces query latency and allows for higher QPS, especially when multiple components run on the same server. Disabling hyper-threading provides only a small improvement, and only under conditions where performance is already good. Hence, keeping hyper-threading enabled is recommended.
M2
The following diagram shows the query latency as a function of QPS for the M2 deployment
alternative.
Note that the latency is not impacted by higher QPS until you reach max system capacity. When
the indexers are idle, there is a slow latency increase until the deployment reaches the
saturation point at approximately 20 QPS. For full and incremental crawls the latency increases
as indicated in the graph. This test does not include test data to indicate exactly when the query
latency saturation takes place during the crawl.
The following diagram shows the same test data presented as user agents versus latency:
Comparing M1 and M2
The next diagram compares the performance of M1 and M2. The main conclusion is that M1
performs somewhat better than M2. M1 is able to handle about 3 QPS more than M2 before
reaching the saturation point. The SAN disks used on M2 should be able to match M1’s locally
attached disks in terms of I/O operations per second, but the bandwidth towards the disks is
somewhat lower with the SAN configuration.
For full crawl and incremental crawl the performance was comparable during the light load tests.
During heavy load, ongoing indexing had less impact on M2 as the SAN provided more disk
spindles to distribute the load.
M3
The M3 deployment alternative (40 million items on a single server) is able to handle about 10 QPS with idle feeding. This is shown in the diagram below as QPS versus latency. For comparison, the M1 data is also included.
One characteristic of the single node installation is that the query latency fluctuates more when
getting close to the saturation point.
Under low query load, M3 is almost able to match the performance of M1, but during higher load
the limitations become apparent. M1 has three times the number of query matching nodes and
the peak QPS capacity is close to three times as high, 28 versus 10 QPS.
M4
M4 is an M1 deployment with an added dedicated search row. The main benefit is that the index
and the search processes are not directly competing for the same resources, primarily disk and
CPU.
The diagrams below show that the added search row in M4 gives a 5 QPS gain versus M1. In
addition, the query latency is improved by about 0.2-0.4 seconds.
Adding search rows will in most cases improve query performance, but at the same time it
introduces additional network traffic that may impact the performance.
The query performance may degrade when adding search rows if you do not have sufficient network capacity. This happens when the indexers copy large index files to the query matching nodes. The index file copying may also increase indexing latency.
The following diagram shows the result of running the query test on M4 having 18, 28 and 43
million documents indexed.
The document volume impacts the maximum QPS the system is able to deliver. Adding ~10 million documents reduces the maximum QPS by ~5. Below 23 QPS, the document volume has little impact on the query latency.
M5
The diagram below shows the query performance of the M5 versus the M3 topology.
As illustrated when comparing M1 and M4, adding a dedicated search row improves query
performance. The same is the case when adding a query matching node to the single node M3
setup in order to get an M5 deployment.
M6
The diagram below shows the query performance of the M6 versus the M4 topology.
The difference between M6 and M4 is the addition of a backup indexer row. The backup indexers
will compete with query matching for available resources, and may degrade query performance.
However, in this specific test, that was not the case. The hardware used had enough resources
to handle the extra load during normal operations.
The backup indexers use significantly fewer resources than the primary indexers. This is because the primary indexers perform the actual indexing and distribute the indices to the search rows and the backup indexer row.
Note:
All indexers perform regular optimization tasks of internal data structures between 03.00 AM
and 05.59 AM every night. These tasks may, depending on the feed pattern, be quite I/O
intensive. Testing on M6 has shown that you may see a significant reduction in query
performance during indexer optimization processes. The more update and delete operations the
indexer handles, the more optimization is required.
M7
The following diagram shows the query latency as a function of QPS for the M7 deployment
alternative compared to M5.
M7 is very similar to the M5 scenario, but it is running on servers with more powerful CPUs and
more memory. On the other hand, it has a disk subsystem not capable of the same amount of
I/O operations per second (IOPS). The M7 storage subsystem has more bulk capacity but less
performance compared to M5.
The main difference in results compared to M5 is a slightly increased QPS rate before the system
becomes saturated. M5 is saturated around 10 QPS, while M7 provides roughly 12 QPS. This is
due to the increased CPU performance and added memory, although partly counterbalanced by
the weaker disk configuration.
M8
M8 has the same extended content capacity configuration as M3 and the following M9, but it uses solid state storage with much higher IOPS and throughput capabilities. This system is only limited by the available CPU processing power. A more powerful CPU configuration, for example with quad CPU sockets, should thus achieve near-linear performance improvements with the added CPU resources.
Query performance results for M8 are discussed together with M9 below.
M9
The M9 deployment alternative has the same extended content capacity configuration as M3 and M8, but has improved CPU performance (X5670) and high-end disk spindles (15k RPM SAS). M9 is thus an example of the performance gains achievable by using high-end components while keeping regular disk spindles for storage.
The improved CPU performance yields 20-30% higher crawl speeds for M9 over M8 (both with 20 item processing components), and even more compared to M3 (which had 12 item processing components). Note that M3 ran with a less powerful server for the content sources, and was more often limited by the sources than M8 and M9 were. M9 achieved more than 50 documents per second for all content sources.
The following graph shows the query performance for M8 and M9 under varying load patterns,
compared to M3 and M5:
The following observations can be made:
Both M8 and M9 perform better during idle feed than M5. M5 performance is only shown under feed, but as M5 has a dedicated search row, its query performance is relatively constant irrespective of whether a feed is ongoing. The main contribution to the peak query rate improvement is the additional CPU resources on M8 and M9 compared to M5.
During idle feed, M9 gets slightly better QPS than M8 due to the more powerful CPU. Under overload conditions (>1 second latency), however, M9 degrades in performance due to an overloaded storage subsystem (just like M5), while M8 can sustain the peak rate with its solid state storage.
M3 (M5 without the search row) saturates already at 5 QPS, while M9 provides higher QPS rates. M9 has higher latency than M3 at low QPS during feed, as M9 has more than 50% higher feed rates than M3 (due to more item processing components and a faster CPU). With feed rates reduced to M3 levels, M9 would have given better query performance than M3 at low QPS as well.
During feeds, M8 query performance is degraded by less than 20% compared to idle, mainly due to CPU congestion. The storage subsystem on M8 thus makes it possible to maintain good query performance during feed without doubling the hardware footprint with a search row. Adding more CPU resources would allow a further increase in query performance, as the storage subsystem still has spare resources in the current setup.
On M9, query latency roughly doubles during feed. M9 can still deliver acceptable performance under low QPS loads with concurrent feed, but is much more affected than M8. This is because the storage subsystem on M9 has slower read accesses when combined with write traffic from feeding and indexing.
M10
The M10 deployment alternative has the same configuration as M1 and M2, but improves performance by using solid state storage. M10 uses the same amount of storage as M8, but spreads it across three search cluster servers to get more CPU power.
It is most interesting to compare M10 to M4, as both setups aim to combine a high crawl rate with high query performance. In M4, this is done by splitting the application storage across two search rows, each with 3 columns and 18 SAS disks per server. M10 has only a single row and replaces the application disk spindles with solid state storage.
The search cluster totals are thus (both deployment alternatives have an additional
administration server):
M4: 6 servers, 108 disk spindles
M10: 3 servers, 3 solid state storage cards
With idle content crawls (solid lines), M4 achieves around 23 QPS before degrading to around 20 QPS under overload conditions, with I/O becoming the bottleneck. M10 is able to deliver 30 QPS, at which point it becomes limited by CPU throughput. Faster or more CPUs would have increased this benefit beyond the measured 30% gain.
During content crawling, M4 shows no significant change in query performance compared to idle. It achieves crawl and query load separation by using an additional set of servers in a dedicated search row. M10 sees some degradation, as content processing and queries compete for the same CPU resources. Still, M10 achieves the same 20 QPS as M4 under the highest load conditions. Also note that the content crawling rate on M10 is 20% higher than on M4 during this test, as the increased I/O performance allows for better handling of the concurrent operations.
Disk usage
Index disk usage
The table below shows the combined increase in disk usage on all nodes after the various content sources have been indexed.

Content source            Raw source data size    FiXML data size    Index data size    Other data size
File share 1 (2 copies)   154 GB                  18 GB              36 GB              5 GB
File share 2 (2 copies)   6.7 TB                  360 GB             944 GB             10 GB
SharePoint 1              2.0 TB                  70 GB              220 GB             13 GB
SharePoint 2              2.0 TB                  66 GB              220 GB             17 GB
HTML 1                    8.8 GB                  8 GB               20 GB              8 GB
HTML 2                    137 GB                  31 GB              112 GB             6 GB
Total                     11 TB                   553 GB             1.6 TB             56 GB
Web Analyzer disk usage
The following table shows disk usage for the Web Analyzer in a mixed content scenario, where the data is both file share content and SharePoint items.

Number of items in index                          40,667,601
Number of analyzed hyperlinks                     119,672,298
Average number of hyperlinks per item             2.52
Peak disk usage during analysis (GB)              77.51
Disk usage between analyses (GB)                  23.13
Disk usage per 1 million items during peak (GB)   1.63
Note:
The values in the table above are somewhat lower than the values specified for Web Analyzer
performance dimensioning in the search overview earlier in this document. The values above
derive from one specific installation where URLs are fairly short. The performance dimensioning
recommendations are based on experience from several installations.
The average number of links per item is quite low compared to pure Web content installations,
or pure SharePoint installations. For example, in a pure Web content installation the average
number of links can be as high as 50. Since the Web Analyzer only stores document IDs,
hyperlinks and anchor texts, the number of links is the dominant factor determining the disk
usage.
Large FAST Search Farm
The large FAST Search Farm targets a moderate test corpus of up to 100 million items. To meet freshness goals, incremental crawls are likely to occur during business hours.
The configuration for the parent SharePoint farm uses two front-end Web servers, two
application servers and one database server arranged as follows:
Two crawl components for the Content SSA are distributed across the two application
servers. This is mainly due to I/O limitations in the test setup (1 Gbit/s network), where
a single network adapter would have been a bottleneck.
One of the application servers also hosts Central Administration for the farm.
One database server hosts the crawl databases, the FAST Search Server administration
databases, as well as the other SharePoint databases.
Application servers and Web front end servers will only have disk space for operating system
and programs. No separate data storage is required.
Deployment alternatives
For the large farm scenario, the following alternatives have been tested for the FAST Search
Server farm back-end:
L1. Single row, six column setup, with an additional administration node (7 servers)
L2. Same as L1, with the addition of a dedicated search row (13 servers)
L3. Same as L2, but where the search row includes a backup indexer row (13 servers)
Specifications
This section provides detailed information about the hardware, software, topology, and
configuration of the test environment.
Hardware
FAST Search Server farm servers
All the large size deployment alternatives run on similar hardware. The following specifications have been used.
Windows Server 2008 R2 x64 Enterprise Edition
2x Intel L5520 CPUs
  o Hyper-threading switched on
  o Turbo Boost switched on
24 GB memory
1 Gbit/s network card
Storage subsystem:
  o OS: 2x 146 GB 10k RPM SAS disks in RAID1
  o Application: 12x 146 GB 10k RPM SAS disks in RAID50 (two parity groups of 6 drives each). Total formatted capacity of 2 TB.
  o Disk controller: HP Smart Array P410, firmware 3.30
  o Disks: HP DG0146FARVU, firmware HPD6
SharePoint Server 2010 servers
Application and Web front-end servers do not need storage beyond the operating system, application binaries, and log files.
Windows Server 2008 R2 x64 Enterprise edition
2x Intel L5420 CPUs
16 GB memory
1 Gbit/s network card
Storage subsystem for OS/Programs: 2x 146GB 10k RPM SAS disks in RAID1
SQL servers
Same specification as the SharePoint 2010 servers above, with an additional RAID disk set for SQL data: 6x 146 GB 10k RPM SAS disks in RAID5.
Topology
This section describes the topology of the test environment.
L1
L1 is a single row, six column setup with an additional administration node. The following deployment.xml file is used:
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="L1"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>L1</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L1.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<host name="fs4sp2.contoso.com">
<query />
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>
<host name="fs4sp4.contoso.com">
<content-distributor />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>
<host name="fs4sp5.contoso.com">
<content-distributor />
<searchengine row="0" column="3" />
<document-processor processes="12" />
</host>
<host name="fs4sp6.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="4" />
<document-processor processes="12" />
</host>
<host name="fs4sp7.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="5" />
<document-processor processes="12" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
</searchcluster>
</deployment>
L2
L2 corresponds to L1 with the addition of a dedicated search row. The search row adds query throughput capacity, introduces query redundancy, and provides better separation of query and feeding load. Three of the servers in the dedicated search row also include a query processing component (query). The deployment also includes a query processing component on the administration node (fs4sp1.contoso.com). The Query SSA does not use this query processing component during normal operation, but it may be used as a fallback to serve queries if the entire search row is taken down for maintenance.
<?xml version="1.0" encoding="utf-8" ?>
<deployment version="14" modifiedBy="contoso\user"
modifiedTime="2009-03-14T14:39:17+01:00" comment="L2"
xmlns="http://www.microsoft.com/enterprisesearch"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.microsoft.com/enterprisesearch deployment.xsd">
<instanceid>L2</instanceid>
<connector-databaseconnectionstring>
<![CDATA[jdbc:sqlserver://sqlbox.contoso.com\sql:1433;DatabaseName=L2.jdbc]]>
</connector-databaseconnectionstring>
<host name="fs4sp1.contoso.com">
<admin />
<query />
<webanalyzer server="true" link-processing="true" lookup-db="true" max-targets="4"/>
<document-processor processes="12" />
</host>
<host name="fs4sp2.contoso.com">
<searchengine row="0" column="0" />
<document-processor processes="12" />
</host>
<host name="fs4sp3.contoso.com">
<content-distributor />
<searchengine row="0" column="1" />
<document-processor processes="12" />
</host>
<host name="fs4sp4.contoso.com">
<content-distributor />
<searchengine row="0" column="2" />
<document-processor processes="12" />
</host>
<host name="fs4sp5.contoso.com">
<content-distributor />
<searchengine row="0" column="3" />
<document-processor processes="12" />
</host>
<host name="fs4sp6.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="4" />
<document-processor processes="12" />
</host>
<host name="fs4sp7.contoso.com">
<indexing-dispatcher />
<searchengine row="0" column="5" />
<document-processor processes="12" />
</host>
<host name=" fs4sp8.contoso.com ">
<query />
<searchengine row="1" column="0" />
64
</host>
<host name="fs4sp9.contoso.com ">
<query />
<searchengine row="1" column="1" />
</host>
<host name="fs4sp10.contoso.com ">
<query />
<searchengine row="1" column="2" />
</host>
<host name="fs4sp11.contoso.com ">
<searchengine row="1" column="3" />
</host>
<host name="fs4sp12.contoso.com">
<searchengine row="1" column="4" />
</host>
<host name="fs4sp13.contoso.com"
<searchengine row="1" column="5" />
</host>
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="none" search="true" />
</searchcluster>
</deployment>
L3
L3 has the same setup as L2, with an additional backup indexer enabled on the search row. The backup indexer is deployed by modifying the L2 deployment.xml file as shown below.
…
<searchcluster>
<row id="0" index="primary" search="true" />
<row id="1" index="secondary" search="true" />
</searchcluster>
…
Dataset
This section describes the test farm dataset, including database content and sizes, search indexes, and external data sources.

Object                                  Value
Search index size (# of items)          103 million
Size of crawl database                  358 GB
Size of crawl database log file         65 GB
Size of property database               <0.1 GB
Size of property database log file      0.6 GB
Size of SSA administration database     <0.1 GB
The table below specifies the content source types used to build the index. The numbers in the
table reflect the total number of items per source, including replicated copies. The difference
between the total number of items below and the index size above is due to two factors:
Items may be disabled from indexing in the content source
The document format type cannot be indexed
For SharePoint sources, the size of the respective content database in SQL is used as the raw
data size.
Content source            Total items    Raw data size    Average size per item
File share 1 (4 copies)   2.4 M          308 GB           128 kB
File share 2 (4 copies)   58.6 M         13.4 TB          229 kB
SharePoint 1 (4 copies)   18.1 M         8.0 TB           443 kB
SharePoint 2 (3 copies)   13.6 M         6.0 TB           443 kB
HTML 1 (3 copies)         3.2 M          26 GB            8.1 kB
HTML 2 (3 copies)         9.5 M          411 GB           43 kB
Total                     105.5 M        28 TB            268 kB

Note:
To reach sufficient content volume in these tests, replicas of the data sources were added. Each copy of each document would appear as a unique item in the index, but be treated as a duplicate by the duplicate trimming feature. From a query matching perspective, the load would be similar to having all unique documents indexed, but any results from these sources would trigger duplicate detection and collapsing in the search results.
The test scenarios do not include people search data. People Search is crawled and indexed in a
separate index within the Query SSA.
Test Results
This section provides data that shows how the farm performed under load.
Feed and indexing performance
All the large scenario deployment alternatives were limited by the bandwidth of the content sources; L1 through L3 all achieved feed rates of around 200 items per second.
Query performance
L1, L2 and L3 are scaled-up versions of M1, M4 and M6, respectively. More columns are added to the farm to be able to index more content while maintaining query performance. The following graph shows the query performance for L1 through L3, with and without ongoing crawls. L2 and L3 give roughly the same performance, and also the same performance as the corresponding smaller scale M4 and M6.
L1 shows a slightly different performance pattern compared to M1. Given a 1 second maximum allowable latency, M1 achieved 27 and 16 QPS with idle and running feed, respectively. The corresponding numbers for L1 are 18 and 9 QPS. The additional columns in L1 compared to M1 mean that more servers are involved, and the slowest one for any given query at any given time determines the query latency. This effect is much less visible for the L2 and L3 deployments with dedicated search rows, as those servers do not have other components competing for resources.
Disk usage
The table below shows the combined disk usage on all nodes in the L1 deployment alternative. L2 and L3 use additional disk space for replication of FiXML and index files on the second row.

Content source    Raw source data size    FiXML data size    Index data size    Other data size
Total             28 TB                   1.1 TB             3.8 TB             104 GB
Extra-large FAST Search Farm
This scenario has not yet been tested. Planned capacity is 500 million items per farm.
Overall Takeaways
Query and feeding performance
Performance for feeding of new content is mainly determined by the item processing capacity. It
is therefore important that you deploy the item processing component in a way that utilizes
spare CPU capacity across all servers.
Running indexer, item processing and query matching on the same server will give high
resource utilization, but also higher variations in query performance during crawling. For such a
deployment it is recommended to schedule all crawling outside periods with high query load.
A separate search row is recommended for deployments where low query latency is required at all times.
You can also combine a separate search row with a backup indexer. This will provide short
recovery time in case of a non-recoverable disk error, with some loss of query performance and
incremental update rates. For the highest query performance requirements, a pure search row is
recommended.
Redundancy
The storage subsystem for a farm must have some level of redundancy, as loss of storage, even in a redundant setup, will lead to reduced performance during a recovery period that can last for days. Using a RAID disk set, preferably also with hot spares, is essential for any installation.
A separate search row will also provide query redundancy.
Full redundancy for the feeding and indexing chain requires a backup indexer on a separate row,
with increased server count and storage volume. While this provides the quickest recovery path
from hardware failures, other options might be more attractive when hardware outages are
infrequent:
Run a full re-crawl of all the content sources after recovery. Depending on the deployment alternative, this may take several days. If you have a separate search row, you can perform the re-crawl while keeping the old index searchable.
Run regular backups of the index data.
Capacity per node
For deployments with up to 15 million items per node you should use the default configuration.
Configuration for extended content capacity can be used for up to 40 million items per node if
you have moderate query performance requirements. Given sufficient storage capacity on the
servers, this will enable a substantial cut in number of servers deployed.
Deployments on storage area networks (SAN)
FAST Search Server can use SAN storage instead of local disks if this is required for operational reasons. The requirement for high performance storage still applies. Testing of the M2 deployment alternative shows that a sufficiently powerful SAN will not be a bottleneck. Although the actual workload is scenario dependent, the following parameters can be used to estimate the required SAN resources for each node in the FAST Search Server farm:
2000 – 3000 I/O operations per second (IOPS)
50 – 100 kB average block size
Less than 10 ms average read latency
For example, for a farm setup like M4 (7 servers), the SAN must be capable of serving 15,000 – 20,000 IOPS to the FAST Search Server farm, regardless of any other traffic served by the same storage system.
Deployments on solid state disks (SSD)
FAST Search Server can take advantage of the increased I/O performance of SSDs. The XS5 deployment alternative had a low content volume and high QPS. With regular spindles, a high number of disks would have been needed, while two SSDs were sufficient. This allows for using blade servers with local disks on the blade itself while still having sufficient I/O performance.
The M8 and M10 deployment alternatives used SSDs with even higher storage capacity and I/O performance. Both of these configurations were entirely limited by CPU performance. The SSDs made it feasible to achieve high query performance without a dedicated search row, thus roughly halving the server count needed for a given performance target (1). The reduced disk count per server also yields lower power consumption.
(1) Removing the search row does, however, remove redundancy in case of server failures.
Troubleshooting performance and scalability
This section provides recommendations for how to optimize the capacity and performance of
your system environment.
It also covers troubleshooting tips for the FAST Search Server farm servers, and the FAST
Search Server specific configuration settings found in the Query and Content SSAs.
Raw I/O performance
FAST Search Server makes extensive use of the storage subsystem. Testing the raw I/O performance can serve as an early verification that the storage performance is sufficient.
One such test tool is SQLIO
(http://www.microsoft.com/downloads/details.aspx?familyid=9a8b005b-84e4-4f24-8d65-cb53442d9e19).
After installing SQLIO, the first step is to get or generate a suitable test file. The tests below include write operations, so the content of this file will be partially overwritten. The size of the file should also be much larger than the available system memory (by a factor of 10) to avoid most caching effects.
The test file can also be generated by SQLIO itself, although not directly for huge file sizes. It is recommended to generate a 1 GB file with the command "sqlio.exe -t32 -s1 -b256 1g", which will create the file named "1g" in the current directory. This file can then be concatenated into a sufficiently large file, for example 256 GB, with the command "copy 1g+1g+1g+…..+1g testfile". To ensure that caching during the test file preparation does not skew the results, a server reboot is recommended before continuing with the specified tests.
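If preferred, the seed generation and concatenation can be scripted instead of typing the long copy command by hand. The following is a minimal PowerShell sketch, assuming sqlio.exe is in the current directory and that the current directory is on the disk to be tested; 256 copies of the 1 GB seed yield a roughly 256 GB test file:
# Generate the 1 GB seed file named "1g" (same command as recommended above)
& .\sqlio.exe -t32 -s1 -b256 1g
# Build the "1g+1g+...+1g" argument and concatenate 256 binary copies into "testfile"
$parts = ("1g+" * 256).TrimEnd('+')
cmd /c "copy /b $parts testfile"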
The following set of commands is representative of the most performance-critical disk operations in FAST Search Server. All assume that a file "testfile" exists in the current directory, which should be located on the disk planned to host FAST Search Server. Each test runs for 300 seconds:
sqlio.exe -kR -t4 -o25 -b1 -frandom -s300 testfile
sqlio.exe -kR -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kW -t4 -o25 -b32 -frandom -s300 testfile
sqlio.exe -kR -t1 -o1 -b100000 -frandom -s300 testfile
sqlio.exe -kW -t1 -o1 -b100000 -frandom -s300 testfile
The first test measures the maximum number of I/O operations per second for small read transfers. The second and third tests measure the performance of medium-sized random accesses. The last two tests measure read and write throughput for large transfers. Some example results are given in the following table, with minimum recommendations during normal operation in the topmost row.
Disk layout                                      1kB read   32kB read   32kB write   100MB read   100MB write
                                                 [IOPS]     [IOPS]      [IOPS]       [MB/s]       [MB/s]
Recommended minimum                              2000       1800        900          500          250
16x SAS 10k RPM 2.5" drives, RAID50
  in two parity groups                           2952       2342        959          568          277
22x SAS 10k RPM 2.5" drives, RAID50
  in two parity groups                           4326       3587        1638         1359         266
  - with drive failure                           3144       2588        1155         770          257
12x SAS 7200 RPM 3.5" drives, RAID50
  in two parity groups                           1844       1315        518          677          780
  - with drive failure                           1424       982         531          220          477
12x SAS 7200 RPM 3.5" drives, RAID10             1682       1134        1169         762          692
  - with drive failure                           1431       925         1154         213          220
12x SAS 15k RPM 3.5" drives, RAID50
  in two parity groups                           4533       3665        848          501          235
2x ZeusIOPS 400GB MLC 2.5" drives, RAID0         52709      14253       2717 (2)     360          122
1x ioDrive 640GB MLC                             83545      21875       17687        676          533
1x ioDrive Duo 1280GB MLC                        160663     42647       32574        1309         664
3x ioDrive Duo 1280GB MLC, RAID0                 162317     83661       44420        2382         1412
3x ioDrive Duo 1280GB MLC, RAID0,
  non-default option: 4kB block size             181593 (3) 86396       47423        2340         1631
3x ioDrive Duo 1280GB MLC, RAID5                 188284     87270       11800        2459         545
  - with card failure                            126469     48564       10961        716          202
Note:
The numbers in the table reflect a deployment where the disk subsystem is at least 50% utilized in capacity before adding the test file. Testing on empty disks tends to give elevated results, as the test file is then placed on the most optimal tracks across all spindles (short-stroking), which can give 2-3x higher performance.
Rows marked "with drive failure" or "with card failure" are measured with a forced drive failure.
RAID50 provides better performance during normal operation than RAID10 for most tests apart from small writes. RAID10 has less performance degradation if a drive fails. We recommend RAID50 for most deployments, as 32kB writes are the least critical of the five tests indicated in the table above. RAID50 also provides nearly twice the storage capacity of RAID10 on the same number of disks.
(2) This is the average IOPS over the standard 300 second test period. The rate starts out at ~3500 IOPS, degrading to a sustained ~1700 IOPS after 3-4 minutes.
(3) Tested with 4kB block reads due to the different block size formatting.
If you deploy a backup indexer, 32kB writes are more frequent, because a large volume of pre-index storage files (FiXML) is passed from the primary to the backup indexer node. In such cases, RAID10 may give a performance improvement.
Note:
These results are to a large degree dependent on the disk controller and spindles used. All
scenarios in this document specify in detail the actual hardware that has been tested.
Analyzing feeding and indexing performance
The content processing chain in FAST Search Server consists of the following components, all potentially running on separate nodes:
Crawler(s): Any node pushing content into FAST Search Server, in most cases a Content SSA hosted in a SharePoint 2010 farm
Content distributor(s): Receives content in batches and redistributes them to the item processing components (document processors)
Item processing: Converts documents to a unified internal format
Indexing dispatcher(s): Schedules an indexer node for each content batch
Primary indexer: Generates the index
Backup indexer: Persists a backup of the information in the primary indexer
Content flows as indicated by arrows 1-5 in the figure above; the last flow, from the primary to the backup indexer, is an optional deployment choice. Asynchronous callbacks for completed processing propagate in the other direction, as indicated by arrows 6 through 9. Crawlers throttle the feed rate based on the callbacks (9) received for document batches (1).
The overall feed performance will be determined by the slowest component in this chain. The
following sections will describe how to monitor this.
Monitoring can be done through several tools, for example the Performance Monitor of Windows Server 2008 [R2], or System Center Operations Manager (SCOM).
Content SSA
The most frequently used crawler is the set of indexing connectors supported by the Content
SSA. The following statistics are important:
Batches ready: The number of batches that have been retrieved from the content sources and that are ready for passing on to the content distributor.
Batches submitted: The number of batches that have been sent to FAST Search Server and for which a callback is still pending.
Batches open: The total number of batches in some stage of processing.
The figure below shows these performance counters for a crawl session. Note that a different scale is used for "batches submitted" than for the other two. The feed starts with "batches submitted" ramping up until the item processing components are all busy (36 in this case), and it stays at this level as long as there is available work ("batches ready"). There is a period from around 6:45 to 8:45 where the content source is only able to provide very limited volumes of data, bringing "batches ready" to near zero.
For deployments with backup indexer rows, the "batches submitted" tend to exceed the number
of item processing components. These "additional" batches are content that has been processed,
but which has not yet been persisted in both indexer rows. The Content SSA will by default
throttle feeds in order to avoid more than 100 "batches submitted".
For large installations, the throttling parameters should be adjusted to allow for more batches to
be in some stage of processing. Tuning is only needed for deployments with at least one of the
following characteristics:
More than 100 item processing component instances deployed per crawl component in
the content SSA
More than 50 item processing component instances deployed per crawl component in the
content SSA, in conjunction with a backup indexer row
More than 3 index columns per crawl component in the content SSA
The number of crawl components within the content SSA must be dimensioned properly for
large deployments to avoid network bottlenecks. This scaling will often eliminate the need for
further configuration tuning. When one or more of the above mentioned conditions apply, the
feeding performance can be improved by increasing throttling limits in the Content SSA. These
properties are "MaxSubmittedBatches" (default 100) and "MaxSubmittedPUDocs" (default 1000),
and increased limits can be calculated as given below.
Note:
These limits apply for each crawl component within the Content SSA. If you use two crawl
components (as in some of the scenario tests), the maximum total number of batches
submitted will be two times the configured value.
a = 1 for deployments without a backup indexer, or 2 for deployments with a backup indexer
b = a × (number of item processing component instances)
c = number of index columns
s = number of crawl components in the Content SSA

MaxSubmittedBatches = (20 × c + b) / s
MaxSubmittedPUDocs = 100 × MaxSubmittedBatches
For example, the M4 scenario will have a=1, b=48, c=3, s=2; resulting in MaxSubmittedBatches
= 54 and MaxSubmittedPUDocs = 5400. The default value (100) for MaxSubmittedBatches does
not need tuning in this case. MaxSubmittedPUDocs (the maximum number of documents with
ACL changes submitted) may be increased if the feed performance is limited by a high rate of
ACL changes. These configuration parameters have not been changed in any of the scenarios
covered in this document.
These throttling limits are configurable through the SharePoint 2010 Management Shell on the
SharePoint farm hosting the Content SSA. The following commands set the default values:
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"
$ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = 100
$ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = 1000
$ssa.Update()
You need to replace the identity string "My Content SSA" with the name of your Content SSA.
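For convenience, the formula can also be evaluated directly in the Management Shell before applying the values. The following is a minimal sketch using the M4 numbers from the example above; the variable names are illustrative only and not product settings:
$a = 1              # 2 for deployments with a backup indexer row, otherwise 1
$instances = 48     # item processing component instances in the deployment
$c = 3              # index columns
$s = 2              # crawl components in the Content SSA
$b = $a * $instances
$maxBatches = [int][math]::Ceiling((20 * $c + $b) / $s)
$ssa = Get-SPEnterpriseSearchServiceApplication -Identity "My Content SSA"
$ssa.ExtendedConnectorProperties["MaxSubmittedBatches"] = $maxBatches
$ssa.ExtendedConnectorProperties["MaxSubmittedPUDocs"] = 100 * $maxBatches
$ssa.Update()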
Increasing these limits will increase the load on the item processing and indexing components. When these consume more of the farm resources, query performance will be impacted. This is less of an issue when running with a dedicated search row. Increasing "MaxSubmittedPUDocs" will increase the I/O load on the primary and backup indexers.
The table below shows the most important performance counters for the Content SSA. Note that
these are found on the node(s) hosting the Content SSA crawl components, under "OSS Search
FAST Content Plugin", and not in the FAST Search Server farm.
Performance counter | Apply to object | Notes
Batches open | Content SSA | The total number of batches in some stage of processing.
Batches submitted | Content SSA | The number of batches that have been sent to FAST Search Server and for which a callback is still pending. When zero, nothing has been sent to the FAST Search Server farm backend for processing.
Batches ready | Content SSA | The number of batches that have been retrieved from the content sources and that are ready for submitting to the content distributor. When zero, the FAST Search Server farm backend is processing content faster than the Content SSA is able to crawl.
Items Total | Content SSA | The total number of items passed through the Content SSA since the last service restart.
Available Mbytes | Memory | The total amount of available memory on the computer. The Content SSA will by default stop aggregating "batches ready" when 80% of system memory has been used.
Processor time | Processor | Overall CPU usage on the computer. High CPU load could limit the throughput of the Content SSA.
Bytes Total/sec | Network Interface | Overall network usage on the computer. High network load might become a bottleneck for the rate of data that can be crawled and pushed to the FAST Search Server farm.
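These counters can also be sampled from PowerShell on the server hosting the crawl component. The following is a sketch only; the counter paths are assumptions derived from the category and counter names above, and should be verified in Performance Monitor:
# Sample the three feeding counters every 15 seconds, 20 samples in total
Get-Counter -Counter @(
    "\OSS Search FAST Content Plugin(*)\Batches Open",
    "\OSS Search FAST Content Plugin(*)\Batches Submitted",
    "\OSS Search FAST Content Plugin(*)\Batches Ready"
) -SampleInterval 15 -MaxSamples 20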
Content distributors and item processing
Each FAST Search Server farm has one or more content distributors. These components receive all content in batches, which are passed on to the item processing components. You can ensure good performance by verifying that the following conditions are met:
Item processing components are effectively utilized
Incoming content batches are rapidly distributed for processing
Maximum throughput can only be achieved when the Content SSA described in the previous section has a constant queue of "batches ready" that can be submitted. Each item processing component will use 100% of a CPU core when busy. Item processing components can be scaled up to one per CPU core.
When there are multiple content distributors, the performance counters below should be summed across all of them to get a complete view of the system; a sketch of this follows the table.
Performance counter | Apply to object | Notes
Document processors | FAST Search Content Distributor | The number of item processing components registered with each content distributor. With multiple content distributors, the item processing components will be evenly distributed across the content distributors.
Document processors busy | FAST Search Content Distributor | The number of item processing components that are currently working on a content batch. This should be close to the total number of item processing components under maximum load.
Average dispatch time | FAST Search Content Distributor | The time needed for the content distributor to send a batch to an item processing component. This should be less than 10 ms. Higher values indicate a congested network.
Average processing time | FAST Search Content Distributor | The time needed for a batch to go through an item processing component. This time can vary depending on content types and batch sizes, but would normally be less than 60 seconds.
Available Mbytes | Memory | The total amount of available memory on the computer. Each item processing component might need up to 2 GB of memory. Processing throughput will be impacted under memory starvation.
Processor time | Processor | Overall CPU usage on the computer. Item processing components are very CPU intensive. High CPU utilization is expected during crawls, but item processing is scheduled with reduced priority and will yield CPU resources to other components when needed.
Bytes Total/sec | Network Interface | Overall network usage on the computer. High network load might become a bottleneck for the rate of data that can be processed by the FAST Search Server nodes.
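As a sketch of the summing described before the table (the counter path is again an assumption based on the object and counter names above, and should be verified in Performance Monitor):
# Sum "Document processors busy" across all content distributor instances;
# add -ComputerName for content distributors hosted on other servers
$set = Get-Counter "\FAST Search Content Distributor(*)\Document processors busy"
$busy = ($set.CounterSamples | Measure-Object -Property CookedValue -Sum).Sum
"Item processing components currently busy: $busy"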
Indexing dispatcher and indexers
Indexers are the most write-intensive components in a FAST Search Server installation, and you need to ensure high disk performance. High indexing activity can also affect query matching operations when running on the same row.
Indexers distribute the items across several partitions. Partition 0, and up to three of the other partitions, can have ongoing activity at the same time. During redistribution of items among partitions, one or more partitions might be waiting for other partitions to reach a specific checkpoint. In addition to the performance counters below, indexer status is provided by the "indexerinfo" command, for example "indexerinfo -a status".
Current queue size (FAST Search Indexer Status): Indexers queue incoming work under high load. This is normal, especially for partial updates. If the API queue never reaches zero, not even intermittently, the indexer is the bottleneck. Feeding is paused when the queue reaches 256 MB in one of the indexers. This can happen if the storage subsystem is not sufficiently powerful; it also happens during large redistributions of content between partitions, which temporarily block more content from being indexed.
FiXML fill rate (FAST Search Indexer): FiXML files are compacted at regular intervals, by default between 3am and 5am every night. A low FiXML fill rate (below 70%) leads to inefficient operation.
Active documents (FAST Search Indexer Partition): Partitions 0 and 1 should each contain less than 1 million items, preferably fewer, in order to keep indexing latency low. In periods of high item throughput these partitions grow larger and indexing latency increases, as this is more optimal for overall throughput. Items are, however, automatically rearranged into the higher-numbered partitions during periods of lighter load.
% Idle Time (Logical disk): Low disk idle time suggests a saturated storage subsystem.
% Free space (Logical disk): Indexers need space for both the index generation currently used for search and the new index generations under construction. On a fully loaded system, disk usage varies between 40% and near 100% for the same number of items, depending on the state of the indexer.
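Because disk usage can swing between roughly 40% and near 100% for the same number of items, it can be worth alerting before a new index generation runs out of room. A minimal sketch, where the volume path and the 50% warning threshold are illustrative assumptions:

    import shutil

    INDEX_VOLUME = "D:\\"      # assumed location of the index data
    WARN_FREE_FRACTION = 0.50  # illustrative threshold, not a product limit

    def check_index_volume(path=INDEX_VOLUME, warn=WARN_FREE_FRACTION):
        """Warn when free space may not fit a new index generation."""
        usage = shutil.disk_usage(path)
        free_fraction = usage.free / usage.total
        if free_fraction < warn:
            print("WARNING: only {:.0%} free on {}; a new index "
                  "generation may not fit.".format(free_fraction, path))
        else:
            print("{:.0%} free on {}: OK".format(free_fraction, path))

    if __name__ == "__main__":
        check_index_volume()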
Analyzing query performance
Query SSA
SharePoint administrative reports provide useful statistics for query performance from an end-to-end perspective. These reports are effective for tracing trends over time, as well as for identifying where to investigate when performance is not optimal.
The diagram below shows two such events. Around 2:20am, server rendering (blue graph) has a short spike due to recycling of the application pool. Later, at 3:00am, the FiXML compaction starts, impacting the backend latency.
[Diagram: query latency over time, showing the application pool recycle at 2:20am and the start of FiXML compaction at 3:00am]
In general, server rendering and object model latencies occur on the nodes running SharePoint. These latencies also depend on the performance of the SQL server(s) backing the SharePoint installation. The backend latency is incurred within the FAST Search Server nodes and is discussed in the following sections.
QRproxy and QRserver
Queries are sent from the Query SSA to the FAST Search Server farm via the QRproxy
component which resides on the server running the query processing component ("query" in the
deployment file). The performance counters in the table below can be helpful for correlating the
backend latency reported by the Query SSA, and the query matching component (named
"QRServer" in the reports). Neither of these components is likely to represent a bottleneck. Any
difference between the two is due to communication delays or processing in the QRproxy.
# Queries/sec (FAST Search QRServer): Current number of queries per second.
# Requests/sec (FAST Search QRServer): Current number of requests per second. In addition to the query load, one internal request is received every second to check that the QRServer is alive.
Average queries per minute (FAST Search QRServer): Average query load.
Average latency last ms (FAST Search QRServer): Average query latency.
Peak queries per sec (FAST Search QRServer): Peak query load seen by the QRServer since the last restart.
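A minimal sketch of that attribution, where both latency figures are hypothetical sample inputs: the backend latency read from the Query SSA administrative reports, and the QRServer's "Average latency last ms" counter.

    # Both figures are illustrative; read the real numbers from the Query
    # SSA reports and from the QRServer latency counter above.
    def qrproxy_overhead_ms(ssa_backend_ms, qrserver_ms):
        """Latency attributable to QRproxy processing plus network transit."""
        return max(0.0, ssa_backend_ms - qrserver_ms)

    print(qrproxy_overhead_ms(ssa_backend_ms=85.0, qrserver_ms=70.0))  # 15.0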
Query dispatcher
The query dispatcher (named "Fdispatch" in the reports) distributes queries across index columns. There is also a query dispatcher on each query matching node, distributing queries across index partitions. Both query dispatchers can become a bottleneck when query results contain large amounts of data, leading to network saturation. It is recommended to keep traffic in and out of fdispatch on network connections that do not carry heavy load from, for example, content crawls.
Query matching
The query matching component (named "Fsearch" in the reports) is responsible for performing the actual matching of queries against the index, computing query relevancy, and performing deep refinement. For each query, it reads the required information from the indices generated by the indexer; information that is likely to be reused is kept in a memory cache. Good query matching performance relies on a powerful CPU as well as low latency for small random disk reads (typically 16-64 kB). The performance counters below are useful for analyzing a node running query matching:
% Idle Time (Logical disk): Low disk idle time suggests a saturated storage subsystem.
Avg. Disk sec/Read (Physical disk): Each query needs a series of disk reads. An average read latency of less than 10 ms is desirable.
Avg. Disk Read Queue Length (Physical disk): On a saturated disk subsystem, read queues build up, which affects query latency. An average queue length of less than 1 is desirable for any node running query components. This is typically exceeded in single-row deployments during indexing, negatively impacting search performance.
Processor time (Processor): CPU utilization is likely to become the bottleneck for high query throughput. When query matching shows processor time near 100%, query throughput cannot increase further.
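The 10 ms read-latency target above can also be checked from a script. A minimal sketch, using the standard Windows PhysicalDisk counter via typeperf; the threshold comes from this document.

    import csv
    import subprocess

    # Standard Windows physical-disk counter; the 10 ms target is from
    # the table above.
    COUNTER = r"\PhysicalDisk(*)\Avg. Disk sec/Read"
    THRESHOLD_S = 0.010

    out = subprocess.run(
        ["typeperf", COUNTER, "-sc", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    rows = [r for r in csv.reader(out.splitlines()) if r]
    header, sample = rows[0], rows[1]
    for name, value in zip(header[1:], sample[1:]):
        if value and float(value) > THRESHOLD_S:
            print("{}: {:.1f} ms average read latency (high)".format(
                name, float(value) * 1000))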