Scaling and Tuning
Solr in the Real World
[Todd Jefferson – Michelin]
[Sr Mgr/Business Project Leader]
[Colin Stephenson – Armedia LLC]
[Solutions Architect]
[Corentin Roux – Alfresco]
[Technical Account Manager]
Scaling and Tuning Solr
How to Train Your Dragon
Who are we?
• Thing 1
• Thing 2
• Todd Jefferson
• Michelin
• Business Project Leader
• 30+ Years IT experience
• Colin Stephenson
• Armedia LLC
• Solutions Architect
• 10+ years in ECM Space
• Thing 3
• Corentin Roux
• Alfresco
• Technical Account Manager
• 8 years experience in ECM
The Project: BibDoc
Background
• A Brief History …
• System Usage and Growth …
• Monitoring & Analysis …
• Life with Solr …
• Questions? …
• Sessions of Interest
A Brief History …
• Context
• Educate Team on Alfresco
• Build Out Environment
• Migrate Legacy System
Context: File Sharing / Content Management
How many solutions do you use to store your files and information?
Wiki
JIRA
Educating the team on Alfresco
• Understand the capabilities
• Install it
• Get to “know” how it works
• Demonstrate, demonstrate, demonstrate, …
• Educate, educate, educate, …
System Usage and Growth …
System Usage and Growth
Usage
• 12 months
• 0 to 800+ users
Documents
• 12 months
• 0 to 3m +
Monitoring and Analysis …
Monitoring …
• Are you monitoring?
• What are you watching?
• Do you regularly check the data?
Quick plug
Monitoring Your Alfresco Installation
9.40 — 10.20 Thursday 25 September
Miguel Rodriguez
Monitoring and Analysis
[Chart: Queries Completing Per Hour. X-axis: hour of day (0 to 23); y-axis: 0 to 2,000 queries; series: p1, P2]
Monitoring and Analysis
[Chart: Results Returned Per Hour. X-axis: hour of day (0 to 23); y-axis: 0 to 25,000,000 results; series: p1, P2]
Monitoring and Analysis
[Chart: Queries Returned Per Minute. Y-axis: No. Of Queries Per Min (0 to 70); axis ticks: 0 to 23 and 0 to 58]
Life with Solr …
One Solr Indexer/Search Server
alfresco-global.properties
index.subsystem.name=solr
dir.keystore=${dir.root}/keystore
solr.port.ssl=8443
solrcore.properties
enable.alfresco.tracking=true
alfresco.host=myalfresco.com
alfresco.port.ssl=8443
alfresco.baseUrl=/alfresco
alfresco.secureComms=https
Separate Cores, Redundancy
custom-core-service.context.xml
<bean
    id="solrHttpClientFactoryWorkspace"
    class="org.alfresco.httpclient.HttpClientFactory"
    init-method="init">
  <property name="secureCommsType" value="${solr.secureComms.workspace}"/>
  <property name="sSLEncryptionParameters" ref="sslEncryptionParameters"/>
  <property name="keyResourceLoader" ref="springKeyResourceLoader"/>
  <property name="keyStoreParameters" ref="keyStoreParameters"/>
  <property name="encryptionParameters" ref="md5EncryptionParameters"/>
  <property name="host" value="${solr.host.workspace}"/>
  <property name="port" value="${solr.port.workspace}"/>
  <property name="sslPort" value="${solr.port.ssl.workspace}"/>
  <property name="maxTotalConnections" value="${solr.max.total.connections}"/>
  <property name="maxHostConnections" value="${solr.max.host.connections}"/>
</bean>
alfresco-global.properties
solr.host.workspace=myothersolr.com
solr.host.port=8080
solr.port.ssl.workspace=8443
solr.secureComms.workspace=none
solr.store.mappings=solrMappingAlfresco,solrMappingArchive
solr.store.mappings.value.solrMappingAlfresco.httpClientFactory=solrHttpClientFactoryWorkspace
solr.store.mappings.value.solrMappingAlfresco.baseUrl=/solr/alfresco
solr.store.mappings.value.solrMappingAlfresco.protocol=workspace
solr.store.mappings.value.solrMappingAlfresco.identifier=SpacesStore
solr.store.mappings.value.solrMappingArchive.httpClientFactory=solrHttpClientFactory
solr.store.mappings.value.solrMappingArchive.baseUrl=/solr/archive
solr.store.mappings.value.solrMappingArchive.protocol=archive
solr.store.mappings.value.solrMappingArchive.identifier=SpacesStore
Split the Indexer and Searcher
Slave - solrcore.properties
enable.alfresco.tracking=false
alfresco.secureComms=none
alfresco.port=8080
enable.master=false
enable.slave=true
Slave - solrconfig.xml
<updateHandler class="org.alfresco.solr.AlfrescoUpdateHandler2">
<lst name="slave">
  <str name="masterUrl">http://myindexer:8080/solr/archive/replication</str>
  <str name="pollInterval">00:30:00</str>
</lst>
alfresco-global.properties
index.subsystem.name=solr
dir.keystore=${dir.root}/keystore
solr.port.ssl=8443
solr.port=8080
solr.host=mysearchsolr.com
Master - solrcore.properties
enable.master=true
enable.slave=false
Master - solrconfig.xml
<lst name="master">
  <str name="replicateAfter">commit</str>
  <!-- <str name="replicateAfter">optimize</str> -->
  <str name="confFiles">schema.xml,stopwords.txt</str>
  <str name="backupAfter">startup</str>
  <str name="backupAfter">optimize</str>
</lst>
Add a Repeater.. Repeater..
Introducing the Repeater
Repeater..
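A repeater is a node that acts as a slave towards the master and as a master towards the search slaves. The fragment below is a minimal sketch of that idea using the stock Solr replication handler; the host name is reused from the slave example, and the handler placement, poll interval and file list are assumptions to adapt to your own solrconfig.xml.

```xml
<!-- Sketch of a repeater: both "master" and "slave" sections are enabled. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <!-- Master role: serve the index to downstream search slaves. -->
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <!-- Slave role: pull the index from the real indexer (assumed host). -->
  <lst name="slave">
    <str name="masterUrl">http://myindexer:8080/solr/archive/replication</str>
    <str name="pollInterval">00:30:00</str>
  </lst>
</requestHandler>
```

Search slaves would then point their masterUrl at the repeater instead of the indexer, which keeps replication traffic away from the indexing node.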
Add a Tracking-Dedicated Alfresco
• Avoid network congestion
• Indexing operations offloaded from Alfresco “user” instances
• Dedicated Alfresco for transformation operations
• Allows for specific transformation tuning on the index tier
• Allows for vertical and horizontal scalability
Where We Are Today …
Where We Are Going …
Looking at the facts…
Check Out My Stats
Check Out My Stats
Hit Ratio 0.09!!
Hit Ratio 0.97!!
How to tune Solr Caches
• Check the stats
• High evictions -> cache too small
• Low hit ratio -> worth disabling the cache
• Long warm-up time -> lower autowarmCount
• Avoid Solr resizing its cache
• Set size == initialSize
• Use an appropriate class
• LRUCache for a large number of inserts
• FastLRUCache when the cache is mostly read
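The rules above map onto the cache declarations in solrconfig.xml. This is a sketch only: the sizes and autowarm counts are made-up values, and which cache is read-mostly or insert-heavy is an assumption that should be checked against your own stats page.

```xml
<!-- Illustrative values only; derive real sizes from the cache stats. -->
<query>
  <!-- Assumed read-mostly cache: FastLRUCache.
       size == initialSize so Solr never has to resize it. -->
  <filterCache class="solr.FastLRUCache"
               size="512"
               initialSize="512"
               autowarmCount="128"/>
  <!-- Assumed insert-heavy cache: LRUCache.
       A lower autowarmCount shortens warm-up after each new searcher. -->
  <queryResultCache class="solr.LRUCache"
                    size="1024"
                    initialSize="1024"
                    autowarmCount="32"/>
</query>
```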
Tune the Searcher
• mergeFactor
• pollInterval
• timeAllowed
• Caches
• Size
• Initial Size
• Warm up
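timeAllowed is a standard Solr query parameter, in milliseconds, that caps how long a search may run before partial results are returned. One possible way to apply it to every query is as a handler default, sketched below; the handler name and the 5-second value are assumptions, and the handlers Alfresco ships may be declared differently.

```xml
<!-- Hypothetical handler: adapt the name to the handlers in your solrconfig.xml. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- Stop searching after 5000 ms and return what was found so far. -->
    <int name="timeAllowed">5000</int>
  </lst>
</requestHandler>
```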
Tune the Indexer
• mergeFactor
• ramBufferSize
• Caches
• Size
• Initial Size
• Warm up
• Clean up DB of temp ACLs
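On the indexer, mergeFactor and the RAM buffer are set in solrconfig.xml. The sketch below assumes the older Solr layout with an indexDefaults section (newer Solr versions use indexConfig instead), and the numbers are placeholders rather than recommendations.

```xml
<!-- Assumed older-style layout; values are placeholders to tune per workload. -->
<indexDefaults>
  <!-- Higher mergeFactor: faster indexing, more segments to search. -->
  <mergeFactor>10</mergeFactor>
  <!-- RAM used to buffer documents before they are flushed to disk. -->
  <ramBufferSizeMB>64</ramBufferSizeMB>
</indexDefaults>
```

A lower mergeFactor generally suits the search side, since fewer segments means faster queries at the price of more merging.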
Badly Tuned Replication
VMWare View
JVM View
Well Behaved Replication
VMWare View
JVM View
General Tuning
• Use SSD
• Disable Full Text Indexing on archive:SpacesStore
alfresco.index.transformContent=false
• Closely monitor the JVM health of both Solr and Alfresco (GC, heap usage)
• Tune transformations on the repository side -> transformation timeout
• Disable SSL if not necessary
• Save RAM for disk cache at OS level
• Use the Solr sizing spreadsheet as a guideline for setting -Xms
Side Effects of Replication
• Replication has a cost: consistency!
• The bigger your indexes, the more consistency will be impacted
• Increased network chatter
• One more area which requires tuning
Looking to the SolrCloud
• Currently
• Manifold CF WS
• https://github.com/maoo/alfresco-webscriptmanifold-connector
• Not supported
• Future
• Alfresco V5
SolrCloud brings ….
• Index Sharding
• Distributed request
• Read/Write fault Tolerance
• Disaster recovery
• Near Real Time Searching
Questions?
Sessions of Interest
All of them……
But if we had to choose ……
• Monitoring Your Alfresco Installation
• 9.40 -10.20 Thursday
• What's New with Search in Alfresco 5
• 12.10 -12.50 Wednesday
• Best Practices for Alfresco Replication, Backup and Disaster Recovery
• 10.50 -11.30 Thursday