IBM Software Group

Retail Rush: Surviving the peak season

This session will be recorded and a replay will be available on IBM.COM sites and possibly social media sites such as YouTube. When speaking, do not state any confidential information, your name, company name or any information that you do not want shared publicly in the replay. By speaking during this presentation, you assume liability for your comments.

© 2013 IBM Corporation

Agenda
• Peak season - Why the rush?
• Peak season - Analytics
• Peak readiness - Strategy
• Performance hot spots - Tuning
• General recommendations - Checklist
• Smart monitoring
• Questions

Peak season - Why the rush?
• Predominantly seen during the holidays in the last quarter: Black Friday and Cyber Monday
• Also seen during other sales such as Boxing Day sales, LTO and exclusive discounts by retailers
• Massive discounts provided by retailers
• Free shipping
• Extended shopping hours: shops open through midnight
• Lightning deals
• Door busters

Peak season - Analytics
According to the report published by IBM after Black Friday 2012, online sales increased by 17% and 20% on Thanksgiving and Cyber Monday compared with the previous year. The increase in online sales was mainly driven by:
• Mobile shopping: grew by 67%
• The iPad factor: contributed to 10% of online shopping
• Multi-screen shopping: customers shopped in-store and online to get the best deals
*This data is the result of cloud-based analytics findings from IBM.

Peak readiness - Strategy
[Timeline figure, stages along the Time axis: Performance Environment Readied, Performance Environment Loaded, Component Load Tests, Combination Load Tests, Endurance Tests, Assembly Tests, Failover Tests]
• Set up the performance environment to mimic production (80% of production hardware capacity, inventory distribution, high availability configuration).
• Run each component in isolation to ensure that the throughput and response time SLA requirements are met. Identify bottlenecks, tune and rerun tests until the goals are met.
• Assemble components to reflect real-time scenarios and run them in parallel to ensure adherence to the NFRs. For example: Create + Schedule + Release + Create Shipment + Confirm Shipment + RTAM can be one assembly.
• Run endurance tests to simulate back-to-back (4 hrs typical) peak load on the system to check its robustness.
• Database and MQ high availability: ensure that the Sterling solution is correctly configured to remain available if one of the participating components, such as MQ JMS or the database, fails over to a different node.

Hot spots - Tuning

Application server
Tip: Set Datasource MaxConnections (X) > WebContainer threads (Y).
For example: X = Y + 5. This leaves enough room for application timer threads and similar.

Datasource configuration
• Configure a datasource on the application server side.
• The maximum and minimum datasource connection settings are sensitive for application performance.

Long-running mission-critical transactions
• Response time: the 90th percentile value for Inventory Lookup calls and Submit Order should be < 3 s.
• Optimally tune services calling an external system: set the timeout interval, number of retries and retry interval.
• Investigate long-running synchronous transactions by generating timer logs of the API/service.

Notorious list APIs
• Custom actions that make list API calls should use a crisp output template, per the IBM Sterling documentation.
• Use narrow search criteria: Status, EnterpriseCode, TotalNumberOfRecords, etc.
Tip: Pass a cap on TotalNumberOfRecords in the input to any API that performs a list action.

Hot spots - Tuning

CPU and memory footprint
• Even a well-tuned application can show high latency when run on an inadequate hardware configuration.
• Closely monitor CPU and memory of the application server boxes.
• Investigate any unexplained CPU or memory spike.
• Review JVM arguments: Xmx and Xms are to be optimally tuned.
• Enable verbosegc logging, one of the crucial diagnostics for troubleshooting JVM memory issues.
Tip: Even a spike of 40% on a non-peak day is a red alert and needs to be reviewed.

Sterling reference cache settings: a trade-off
• Caching an entity reduces database overhead but at the same time stretches the JVM memory footprint. The decision to enable caching for a database table should be made after careful evaluation.
• Monitor the application logs for any frequent cache refreshes.
• Specifically look for tables such as YFS_INVENTORY_NODE_CONTROL, YPM_PRICE_LIST_LINE, YFS_ITEM and YFS_ITEM_SHIP_NODE related entities.
• Caching settings for specific tables can be set in the customer overrides property file.

Hot spots - Tuning

Agents/Integration servers

Threads and topography
• Run 5 threads per JVM; add additional threads only if the JVM memory footprint permits.
• Review the agent distribution and stop redundant JVMs.
Tip: Do not run multiple critical agent criteria on the same JVM.

Health Monitor
• Run the Health Monitor constantly; set retention days based on your implementation.
• It is critical for effective cache propagation and for agents to register statistics with the application.
• Use the OOB script provided in the bin folder to start the Health Monitor: startHealthMonitor.sh

Enable bulk sender and evaluate using JMS pooling
• Explore the option of enabling the bulk sender; it benefits RTAM and other high-volume transactions.
• Consider enabling Sterling JMS pooling, especially if you have JVMs that are heavy on the JMS sender side.
• To use these features, add the following entries to the customer overrides property file:
  yfs.agent.bulk.sender.enabled=true
  yfs.jms.session.disable.pooling=N

Hot spots - Tuning

Hot SKU
• Enable the Hot SKU feature. Parameters are to be carefully derived based on the expected load, item and inventory distribution.
• If you expect a sudden outburst in demand for specific items, use the yfs.yfs.hotsku.skipLockInventoryitemList feature.
Tip: Explore the option of using special parameters such as time-out locking and the InventorySkipList feature.

Purge agents
• Stop the purge agents a few days before peak. Running purge agents during peak hours can cause database overhead and impact critical agent and integration processes.

Item Based Allocation (IBA) agent
• Review the IBA configuration, agent execution and trigger interval.
• If business permits, turn off the IBA agent during peak hours.
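As a pre-peak sanity check, the bulk sender and JMS pooling entries listed earlier in this section could be verified with a short script. The Python sketch below is an illustration only: the helper names (parse_properties, find_mismatches) and the sample file content are hypothetical, and only the two property names and values come from this deck. It handles plain key=value lines; real property files may use line continuations or escapes that this parser ignores.

```python
# Expected entries, taken from the bulk sender / JMS pooling slide.
EXPECTED = {
    "yfs.agent.bulk.sender.enabled": "true",
    "yfs.jms.session.disable.pooling": "N",
}


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def find_mismatches(props, expected=EXPECTED):
    """Return {key: (expected, actual)} for entries that are missing or differ."""
    return {
        key: (want, props.get(key))
        for key, want in expected.items()
        if props.get(key) != want
    }


if __name__ == "__main__":
    # Hypothetical excerpt of a customer overrides file (not real data).
    sample = """
    # peak-season overrides
    yfs.agent.bulk.sender.enabled=true
    yfs.jms.session.disable.pooling=Y
    """
    for key, (want, got) in find_mismatches(parse_properties(sample)).items():
        print(f"{key}: expected {want!r}, found {got!r}")
```

Running such a check against each agent and integration server's effective overrides before peak is one way to catch a node that was missed during rollout.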
Complete Inventory Sync before peak hours begin
• It can cause significant load on the database, impacting other processes.
• Add more JVMs in the early hours of the peak day and complete the process well before the rush.

Hot spots - Tuning

Order and shipment monitoring
• Explore the option of disabling non-critical Order Monitor rules.
• Configure and run the Close Order agent before peak season.

Sourcing and scheduling
• Large distribution groups (DGs): the time taken for sourcing/scheduling transactions is directly proportional to the number of ShipNodes.
• Explore the option of limiting Inventory Lookup based on radius.
Tip: Explore opportunities to group large DGs into smaller subsets.

Ship from Store: a mantra for many retailers
• Define Standard Capacity for each of the stores.
• For stores that can potentially run out of capacity, use the Minimum Available Capacity configuration to keep the sourcing engine from scanning unnecessary nodes.
Tip: Review the store capacity settings well before peak.

Hot spots - Tuning

Database

Purge the following tables using the recommended purge agents
• YFS_STATISTICS_DETAIL
• YFS_INVENTORY_AUDIT
• YFS_INVENTORY_DEMAND
• YFS_INBOX

Review audit tables
• Run Order Purge and purge qualifying records. Order Purge takes care of purging Audit and Order Audit records.
• Explore the option of disabling audits for entities if your business permits.
• Ensure Inventory Purge is running.

Gather stats
• Have your DBA review the stats mechanism before peak. Stats gathering may be failing for some tables owing to their size.
• Up-to-date stats are critical for optimal execution plans, which in turn provide better response times and an enhanced user experience.
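One way to act on the stats review above is to flag tables whose statistics are missing or old. The Python sketch below assumes the DBA can export, per table, the time statistics were last gathered; the function name stale_tables, the 7-day threshold and the sample timestamps are all hypothetical and not taken from product documentation.

```python
from datetime import datetime, timedelta


def stale_tables(last_gathered, now, max_age=timedelta(days=7)):
    """Return sorted table names whose stats are missing (None) or older than max_age."""
    return sorted(
        table
        for table, gathered_at in last_gathered.items()
        if gathered_at is None or now - gathered_at > max_age
    )


if __name__ == "__main__":
    # Hypothetical export; timestamps are made up for illustration.
    now = datetime(2012, 11, 20, 8, 0)
    export = {
        "YFS_INVENTORY_AUDIT": datetime(2012, 11, 19, 2, 0),
        "YFS_INVENTORY_DEMAND": datetime(2012, 10, 1, 2, 0),
        "YFS_INBOX": None,  # stats job never completed for this table
    }
    print(stale_tables(export, now))
```

The threshold should be agreed with the DBA; large, fast-changing tables may need a much shorter window than static reference tables.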
Analyze AWRs
• Identify and tune queries with high IO and CPU.
• Look at the top events; focus on high IO, queries with high elapsed time and CPU, log file switches, etc.
• Confirm that critical database parameters (for example, cursor sharing) are set as per the product's recommendations.
• Explore the option of pinning high-volume transactions to a single DB node after consulting your DBA.

General recommendations - Checklist
• Disable application logging. Enabling application logging for APIs, services and agents can cause memory issues in the application and lead to high response times.
• Ensure that there is enough space (Audit/Index and Trans) in the database to accommodate the holiday volume. Under high volume, tablespace can fill up quickly and cause critical OMS processes to fail.
• Check the storage on all agent, application server and MQ server boxes. Insufficient storage for logs and other application folders can cause latency in the application.
• Do a cyclic restart of all application servers, agent/integration servers and IHS servers, to reclaim any leaked memory and open sockets and to overcome other resource leaks that might have occurred during the application lifecycle.
• Check for queues with very high queue depth (> 15000). A large queue depth typically indicates a backlog and requires immediate action to clear it.
• Back up all old logs (older than 3 days) on the application and agent/integration server boxes, to ease troubleshooting and speed up issue resolution during the peak days.

Smart Monitoring
Smart Monitoring helps IBM's customers manage their transactional volumes during peak sales to provide a great guest experience.
This consists of monitoring a set of core parameters with which an administrator can track the health of a system:

Order monitoring
• Number of orders created per hour
• Average number of order lines
• Average time taken to create an order
• The highest number of orders created in a minute during peak time
Tip: To record the average/maximum response time and number of invocations, use the table yfs_statistics_detail.

Inventory monitoring
• Top-selling items: helps in hot SKU monitoring and in predicting row lock contention in the database
• Number of backordered orders: can impact the throughput of sourcing and scheduling transactions

Shipment monitoring
• Number of shipments created in the system during the monitoring window
• Average number of shipment lines

Smart Monitoring

Inbox monitoring
• Repetitive errors: spurious and repetitive errors are to be reported back to the administrator.
• Deadlocks and lock timeouts: deadlock and lock timeout errors are to be monitored.
• Consolidation_Count: a high value in the Consolidation_Count attribute of the YFS_INBOX table indicates an impending problem.

Response time and throughput monitoring of mission-critical agents and services
• Monitor the response times of all critical services, such as Inventory Lookup. Response times of synchronous services have a direct impact on the guest experience and the scalability of the application.
• For all critical asynchronous transactions, monitor the throughput. Any fall in throughput during peak hours indicates a potential memory or resource contention issue.
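The throughput check described above can be sketched as a comparison of each interval against a rolling baseline. This Python example is a minimal illustration, not a product feature: the function name throughput_drops, the 3-interval window and the 30% drop threshold are assumptions, and the sample counts are made up.

```python
from statistics import mean


def throughput_drops(counts, window=3, max_drop=0.30):
    """Return indices of intervals whose count falls more than max_drop
    below the mean of the preceding `window` intervals."""
    flagged = []
    for i in range(window, len(counts)):
        baseline = mean(counts[i - window:i])
        if baseline > 0 and counts[i] < baseline * (1 - max_drop):
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical records processed by one agent per 5-minute interval.
    per_interval = [1000, 1020, 980, 990, 600, 950]
    for index in throughput_drops(per_interval):
        print(f"interval {index}: throughput {per_interval[index]} is well below baseline")
```

A rolling baseline is used instead of a fixed target so that the check adapts as load ramps up; the trade-off is that a slow, steady decline will not be flagged and needs an absolute floor as well.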
Order velocity
• The time taken by the system to transition an order status is a critical parameter for determining the system's health.
• Any delay in order status transition indicates a broken or unhealthy JVM in the pipeline.

Smart Monitoring

Host memory and CPU footprint
• Have the monitoring tool send alerts in case of a resource spike, so that corrective action can be taken.

Monitor the system for any heap or core dumps
• These would indicate an out-of-memory condition in the JVMs running on the corresponding boxes.

Monitor the DB for any blocking sessions or long-running queries
• Kill any long-running queries based on a threshold.
• Have an alert mechanism to notify the interested parties if a query runs for a long time.
Tip: Use scripts running in the background to identify long-running queries and kill them.

Questions?
For further queries, email us at: [email protected] [email protected]