Retail Rush

IBM Software Group
Retail Rush
Surviving the peak season
This session will be recorded and a replay will be available on IBM.COM sites and possibly social
media sites such as YouTube. When speaking, do not state any confidential information, your
name, company name or any information that you do not want shared publicly in the replay. By
speaking during this presentation, you assume liability for your comments.
© 2013 IBM Corporation
AGENDA
Peak season – Why the rush?
Peak season – Analytics
Peak Readiness - Strategy
Performance Hot spots – Tuning
General Recommendations - checklist
Smart monitoring
Questions
Peak Season – Why the rush?
Predominantly seen during the holidays in the last
quarter – Black Friday and Cyber Monday
Also seen during other sales such as Boxing Day sales, limited-time offers (LTO),
and exclusive discounts by retailers
Massive discounts provided by retailers –
Free shipping
Extended shopping hours – shops open through midnight
Lightning deals
Doorbusters
Peak season – Analytics
According to the report IBM published after Black Friday 2012, online sales rose 17% on Thanksgiving
and 20% on Cyber Monday compared with the previous year.
The increase in online sales was mainly driven by –
Mobile Shopping – Grew by 67%
The iPad Factor – Contributed 10% of online shopping
Multi-screen Shopping – Customers shopped in-store and online to get the best deals
*This data comes from IBM's cloud-based analytics findings.
Peak Readiness - Strategy
[Figure: timeline of peak-readiness phases along a Time axis: Performance Environment Readied, Performance Environment Loaded, Component Load Tests, Combination Load Tests, Endurance Tests, Assembly Tests, Failover Tests]
• Set up a performance environment that mimics production (80% of production hardware capacity, inventory
distribution, high-availability configuration).
• Run each component in isolation to ensure that the throughput and response-time SLA requirements are met.
Identify bottlenecks, tune, and rerun tests until the goals are met.
• Assemble components to reflect real-time scenarios and run them in parallel to ensure adherence to the NFR.
For example: Create + Schedule + Release + Create Shipment + Confirm Shipment + RTAM can be one assembly.
• Run endurance tests that simulate back-to-back peak load (4 hrs typical) on the system to check its robustness.
• Database and MQ high availability: ensure that the Sterling solution is correctly configured to remain available if
one of the participating components, such as MQ JMS or the database, fails over to a different node.
Hot spots – Tuning
Application server
Datasource Configuration
Configure a datasource on the application server side.
The maximum and minimum datasource connection settings are sensitive for application performance.
Tip: Set datasource MaxConnections (X) > WebContainer threads (Y). For example, X = Y + 5 leaves enough
room for application timer threads and the like.
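The sizing tip above is simple arithmetic and can be sketched as follows. This is a minimal illustration that assumes the headroom of 5 connections from the slide's example; the function names are hypothetical and are not part of any IBM or application server API.

```python
def recommended_max_connections(web_container_threads: int, headroom: int = 5) -> int:
    """Return a datasource MaxConnections value (X) above the WebContainer thread count (Y).

    The default headroom of 5 follows the slide's example X = Y + 5 (an assumption,
    not a product-mandated value).
    """
    if web_container_threads < 0 or headroom < 1:
        raise ValueError("threads must be >= 0 and headroom must be >= 1")
    return web_container_threads + headroom


def pool_is_adequate(max_connections: int, web_container_threads: int) -> bool:
    """Check the rule from the tip: MaxConnections (X) must exceed WebContainer threads (Y)."""
    return max_connections > web_container_threads


if __name__ == "__main__":
    y = 50  # illustrative WebContainer thread count
    x = recommended_max_connections(y)
    print(f"WebContainer threads={y}, suggested MaxConnections={x}")
```

The right headroom depends on how many non-web threads (timers, background work) also borrow connections, so treat the value 5 as a starting point to validate under load.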
Long-running Mission-Critical Transactions
Response time: the 90th percentile value for Inventory Lookup calls and Submit Order should be < 3s.
Optimally tune services that call an external system: set the timeout interval, number of retries, and retry interval.
Investigate long-running synchronous transactions by generating timer logs for the API/service.
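The 90th percentile target above can be checked against measured response times. The sketch below uses the nearest-rank percentile definition, which is one of several common definitions and is an assumption here; the function names and sample values are illustrative only.

```python
def percentile(samples, pct: int):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def meets_sla(samples, limit_seconds: float = 3.0, pct: int = 90) -> bool:
    """True when the pct-th percentile response time is strictly below the limit."""
    return percentile(samples, pct) < limit_seconds


if __name__ == "__main__":
    # Hypothetical response times in seconds for an inventory lookup call.
    lookups = [0.5, 0.8, 1.0, 1.2, 1.5, 1.7, 2.0, 2.2, 2.9, 6.0]
    print(percentile(lookups, 90), meets_sla(lookups))
```

A percentile is used rather than an average because a few very slow calls can hide behind a healthy mean.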
Notorious List APIs
Custom actions that make list API calls should have a crisp output template, per IBM Sterling documentation.
Use narrow search criteria: Status, EnterpriseCode, Total Number of Records, etc.
Tip: Pass a cap on the TotalNumberOfRecords in the inputs to any API that does a List action.
Hot spots – Tuning
CPU and Memory footprint:
Even a well-tuned application can show latency when run on an inadequate hardware configuration.
Closely monitor CPU and memory on the application server boxes.
Investigate any unexplained CPU or memory spike.
Tip: Even a spike of 40% on a non-peak day is a red alert and needs to be reviewed.
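The tip above can be turned into a simple check over collected samples. This sketch assumes the 40% figure refers to CPU utilization readings taken on a non-peak day; that reading of the tip, the sample format, and the function name are all assumptions made for illustration.

```python
def find_cpu_alerts(samples, threshold_pct: float = 40.0):
    """Return the (label, cpu_pct) samples at or above the threshold.

    `samples` is an iterable of (label, cpu_pct) pairs, for example a timestamp
    string and a utilization percentage exported from a monitoring tool.
    """
    return [(label, cpu) for label, cpu in samples if cpu >= threshold_pct]


if __name__ == "__main__":
    # Hypothetical non-peak-day readings for one application server box.
    readings = [("09:00", 12.5), ("09:05", 41.0), ("09:10", 18.0), ("09:15", 40.0)]
    for label, cpu in find_cpu_alerts(readings):
        print(f"Review needed: CPU at {cpu}% at {label}")
```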
Review JVM Arguments:
Tune Xmx and Xms optimally.
Enable verbosegc logging; it is one of the crucial diagnostics for troubleshooting JVM memory issues.
Sterling Reference Cache settings – Trade-off
Caching an entity reduces database overhead, but it also stretches the JVM memory footprint.
Decide to enable caching for a database table only after careful evaluation.
Monitor the application logs for frequent cache refreshes.
Look specifically for tables such as YFS_INVENTORY_NODE_CONTROL, YPM_PRICE_LIST_LINE,
YFS_ITEM, and YFS_ITEM_SHIP_NODE related entities.
Caching settings for specific tables can be set in the customer overrides property file.
Hot spots – Tuning
Agents/Integration servers
Threads and Topology:
Run 5 threads per JVM; add threads only if the JVM memory footprint permits.
Review the agent distribution and stop redundant JVMs.
Tip: Do not run multiple critical agent criteria on the same JVM.
Health Monitor:
Run the Health Monitor constantly; set retention days based on your implementation.
It is critical for effective cache propagation and for agents to register stats with the application.
Use the out-of-the-box (OOB) script provided in the bin folder to start the health monitor: startHealthMonitor.sh
Enable Bulk Sender and evaluate using JMS pooling:
Explore the option of enabling bulk sender; it benefits RTAM and other high-volume transactions.
Consider enabling Sterling JMS pooling, especially if you have JVMs that are heavy on the JMS sender side.
To use these features, add the following entries to the customer overrides property file:
yfs.agent.bulk.sender.enabled=true
yfs.jms.session.disable.pooling=N
Hot spots – Tuning
Hot SKU:
Enable the Hot SKU feature - parameters are to be carefully derived based on the expected load and the item and
inventory distribution.
If you expect a sudden burst in demand for specific items, use the yfs.yfs.hotsku.skipLockInventoryitemList feature.
Tip: Explore the option of using special parameters such as timeout locking and the InventorySkipList feature.
Purge agents
Stop the purge agents a few days before peak. Running purge agents during peak hours can cause database
overhead and impact critical agent and integration processes.
Item Based Allocation Agent
Review the IBA configuration, agent execution, and trigger interval.
If business permits, turn off the IBA agent during peak hours.
Complete Inventory Sync before peak hours begin
Inventory sync can cause significant load on the database, impacting other processes.
Add more JVMs in the early hours of the peak day and complete the process well before the rush.
Hot spots – Tuning
Order and Shipment Monitoring:
Explore the option of disabling non-critical Order Monitor rules.
Configure and run the Close Order agent before peak season.
Sourcing and Scheduling
Large Distribution Groups:
The time taken for sourcing/scheduling transactions is directly proportional to the number of ShipNodes.
Explore the option of limiting Inventory Lookup based on radius.
Tip: Explore opportunities to group large DGs into smaller subsets.
Ship from Store – Mantra for many retailers.
Define Standard Capacity for each of the stores.
For stores that can potentially run out of capacity, use the Minimum Available Capacity configuration to
keep the sourcing engine from scanning unnecessary nodes.
Tip: Review the store capacity settings well before peak.
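To illustrate why a radius limit shrinks the set of ShipNodes the sourcing engine has to consider, here is a minimal sketch that filters candidate nodes by great-circle distance. It is not how the Sterling product implements radius-based lookup (that is a product configuration); the node names, coordinates, and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nodes_within_radius(customer, nodes, radius_km: float):
    """Return the names of nodes no farther than radius_km from the customer location.

    `customer` is a (lat, lon) pair; `nodes` maps a node name to its (lat, lon).
    """
    lat, lon = customer
    return [
        name
        for name, (node_lat, node_lon) in nodes.items()
        if haversine_km(lat, lon, node_lat, node_lon) <= radius_km
    ]


if __name__ == "__main__":
    # Hypothetical ship nodes; a customer in New York only needs nearby stores scanned.
    ship_nodes = {"NYC-STORE": (40.73, -73.99), "LA-STORE": (34.05, -118.24)}
    print(nodes_within_radius((40.7128, -74.0060), ship_nodes, radius_km=50))
```

Because sourcing time grows with the number of nodes evaluated, any rule that prunes distant nodes up front reduces work per transaction.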
Hot spots – Tuning
Database
Purge the following tables using the recommended purge agents
YFS_STATISTICS_DETAIL
YFS_INVENTORY_AUDIT
YFS_INVENTORY_DEMAND
YFS_INBOX
Review Audit Tables
Run Order Purge and purge qualifying records. Order Purge takes care of purging Audit and Order Audit
records.
Explore the option of disabling audits for entities if your business permits.
Ensure Inventory Purge is running.
Gather stats
Have your DBA review the stats mechanism before peak. Stats gathering might be failing for
some tables owing to their size.
Up-to-date stats are critical for optimal execution plans, which in turn provide better response times and
an enhanced user experience.
Analyze AWRs:
Identify and tune queries with high I/O and CPU usage.
Look at the top events; focus on high I/O, queries with high elapsed time and CPU, log file switches, etc.
Confirm that critical database parameters (for example, cursor sharing) are set per the product's recommendations.
Explore the option of pinning high-volume transactions to a single DB node after consulting your DBA.
General Recommendations - checklist
Disable application logging
Enabling application logging for APIs, services, and agents can cause memory issues in the application and
lead to high response times.
Ensure that there is enough space (Audit/Index and Trans) in the database to accommodate the holiday volume
Under high volume, tablespace can fill up quickly and cause critical OMS processes to fail.
Check the storage on all the agent, application server, and MQ server boxes
Insufficient storage for the logs and other application folders can cause latency in the application.
Do a cyclic restart of all the application servers, agent/integration servers, and IHS servers
This reclaims leaked memory and open sockets and clears other resource leaks that may have
occurred during the application lifecycle.
Check for queues with very high queue depth (> 15000)
A large queue depth typically indicates a backlog and requires immediate action to clear it.
Back up all the old logs (older than 3 days) on the application and agent/integration server boxes
This eases troubleshooting and speeds up issue resolution during the peak days.
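The queue-depth item in the checklist above can be automated once depths are collected. This sketch assumes the depths have already been gathered into a dictionary by whatever MQ tooling is in use; the queue names and function name are hypothetical, and only the 15000 limit comes from the checklist.

```python
QUEUE_DEPTH_LIMIT = 15000  # threshold from the checklist


def queues_over_limit(depths, limit: int = QUEUE_DEPTH_LIMIT):
    """Return the names of queues whose depth exceeds the limit, deepest first.

    `depths` maps a queue name to its current depth (number of messages).
    """
    over = {name: depth for name, depth in depths.items() if depth > limit}
    return sorted(over, key=over.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical snapshot of queue depths.
    snapshot = {"CREATE_ORDER_Q": 120, "RTAM_Q": 15001, "SCHEDULE_Q": 48000, "RELEASE_Q": 15000}
    for name in queues_over_limit(snapshot):
        print(f"Backlog on {name}: {snapshot[name]} messages")
```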
Smart Monitoring
Smart Monitoring helps IBM’s customers manage their transactional volumes during peak sales
and provide a great guest experience. It consists of monitoring a set of core parameters with
which an administrator can track the health of a system –
Order Monitoring
Number of orders created per hour
Average number of order lines
Average time taken to create an order
The highest number of orders created in a minute during peak time
Tip: To record the average/maximum response time and number of invocations, use the table yfs_statistics_detail.
Inventory Monitoring
Top-selling items – helps in hot SKU monitoring and in predicting row lock contention in the database
Number of backordered orders – can impact the throughput of Sourcing and Scheduling transactions
Shipment Monitoring
Number of shipments created in the system during the monitoring window
Average number of shipment lines
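Two of the order metrics above, orders per hour and the highest number of orders in a minute, can be derived from order creation timestamps. The sketch below assumes the timestamps have already been extracted from the system by some means; the function names and sample data are illustrative. It counts by calendar hour and calendar minute rather than using a sliding window.

```python
from collections import Counter
from datetime import datetime


def orders_per_hour(created_at):
    """Count orders per calendar hour; keys are datetimes truncated to the hour."""
    return Counter(ts.replace(minute=0, second=0, microsecond=0) for ts in created_at)


def peak_orders_per_minute(created_at) -> int:
    """Return the highest number of orders created within any single calendar minute."""
    per_minute = Counter(ts.replace(second=0, microsecond=0) for ts in created_at)
    return max(per_minute.values(), default=0)


if __name__ == "__main__":
    # Hypothetical order creation times.
    created = [
        datetime(2012, 11, 23, 10, 0, 5),
        datetime(2012, 11, 23, 10, 0, 40),
        datetime(2012, 11, 23, 10, 1, 10),
        datetime(2012, 11, 23, 11, 30, 0),
    ]
    for hour, count in sorted(orders_per_hour(created).items()):
        print(hour, count)
    print("peak per minute:", peak_orders_per_minute(created))
```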
Smart Monitoring
Inbox Monitoring
Repetitive errors - Spurious and repetitive errors are to be reported back to the administrator
Deadlocks and lock timeouts - Deadlock and lock timeout errors are to be monitored
Consolidation_Count – A high value in the Consolidation_Count attribute of the YFS_INBOX table indicates an
impending problem
Response Time, Throughput Monitoring of Mission Critical Agents and Services
Monitor the response times of all critical services, such as Inventory Lookup.
Response times of the synchronous services have a direct impact on the guest experience and the
scalability of the application.
For all the critical asynchronous transactions, monitor the throughput - any fall in throughput during the
peak hours indicates a potential memory or resource contention issue.
Order velocity
The time the system takes to transition an order status is a critical parameter for determining the system’s health.
Any delay in order status transition indicates a broken or unhealthy JVM in the pipeline.
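A fall in throughput for an asynchronous transaction can be flagged by comparing each interval against a baseline. In this sketch the baseline, the 50% drop tolerance, and the function name are assumptions chosen for illustration; the source does not prescribe a threshold.

```python
def throughput_drops(per_interval_counts, baseline: float, max_drop_ratio: float = 0.5):
    """Return the indices of intervals whose throughput fell below the tolerated floor.

    The floor is baseline * (1 - max_drop_ratio); with the defaults, any interval
    that processed fewer than half the baseline count is flagged.
    """
    if not 0 <= max_drop_ratio <= 1:
        raise ValueError("max_drop_ratio must be between 0 and 1")
    floor = baseline * (1 - max_drop_ratio)
    return [i for i, count in enumerate(per_interval_counts) if count < floor]


if __name__ == "__main__":
    # Hypothetical records processed by an agent in consecutive 5-minute intervals.
    processed = [1000, 950, 400, 980]
    print(throughput_drops(processed, baseline=1000))
```

A sensible baseline would come from the load tests run during peak readiness, since those establish what healthy throughput looks like for each agent.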
Smart Monitoring
Host memory and CPU footprint
Have the monitoring tool send alerts in case of a resource spike so that corrective action can be taken.
Monitor the system for any heap or core dumps
These would indicate an out-of-memory condition in the JVMs running on the corresponding boxes.
Monitor the DB for any blocking sessions or long-running queries
Kill any long-running queries based on a threshold.
Have an alert mechanism to notify the interested parties if a query runs for a long time.
Tip: Use scripts running in the background to identify long-running queries and kill them.
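The selection step of such a background script can be sketched as follows. The session records here are hypothetical dictionaries; in practice they would come from the database's own session views, and the kill or alert action would be carried out with DBA-approved tooling. The threshold value is also an assumption.

```python
def long_running(sessions, threshold_seconds: float):
    """Return the sessions whose elapsed time exceeds the threshold, longest first.

    Each session is a dict with at least the keys 'session_id' and 'elapsed_seconds'.
    """
    candidates = [s for s in sessions if s["elapsed_seconds"] > threshold_seconds]
    return sorted(candidates, key=lambda s: s["elapsed_seconds"], reverse=True)


if __name__ == "__main__":
    # Hypothetical snapshot of active database sessions.
    active = [
        {"session_id": 101, "elapsed_seconds": 12},
        {"session_id": 102, "elapsed_seconds": 950},
        {"session_id": 103, "elapsed_seconds": 310},
    ]
    for session in long_running(active, threshold_seconds=300):
        print(f"Alert: session {session['session_id']} running for {session['elapsed_seconds']}s")
```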
Questions?
For further queries, email us at –
[email protected]
[email protected]