Slide 1: OpenVMS “Marvel” EV7 Proof Points of Olympic Proportions
Tech Update, Sept 2003
[email protected]

Slide 2: OpenVMS Marvel EV7 Proof Points of Olympic Proportions
• Live, mission-critical, production systems
• Multi-dimensional
• Before and after comparisons
• Upgrades from GS140 & GS160 to GS1280
• Proof of impact on maximum headroom

Slide 3: Can your enterprise benefit from an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
  – Systems with poor response time
  – Systems with insufficient peak period throughput

Slide 4: T4 – Data Sources
• Data for these comparisons was collected with the internally developed T4 (tabular timeline tracking tool) suite of coordinated collection utilities and analyzed with TLViz.
• The T4 kit and TLViz have consistently proved invaluable for this kind of before-and-after comparison project. We have now made T4 publicly available for download (it will ship with OpenVMS 7.3-2 in SYS$ETC:).
• T4 could be a useful adjunct to your performance management program.

Slide 5: Would you like to participate in our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part.
• Download the T4 kit from the public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable, T4-based performance history of your most important systems.
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
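For readers who want to reproduce the before/after overlays that TLViz draws from T4 data, here is a minimal sketch. It assumes each T4 collection has been exported as a CSV file with a "Sample Time" column plus one column per statistic; the file names and the exact column names are illustrative assumptions, not part of the T4 kit.

```python
# Sketch: overlay one statistic from two T4-style tabular timeline exports.
# Assumes CSV files with a "Sample Time" column and one column per statistic;
# the file and column names below are illustrative.
import pandas as pd
import matplotlib.pyplot as plt

before = pd.read_csv("hnam_gs140.csv", parse_dates=["Sample Time"])
after = pd.read_csv("hnam_gs1280.csv", parse_dates=["Sample Time"])

metric = "[MON.STAT]Compute"  # compute-queue depth, as on the Case 1 charts

fig, ax = plt.subplots()
ax.plot(before["Sample Time"], before[metric], color="green",
        label="GS140 700 MHz, 12 CPUs")
ax.plot(after["Sample Time"], after[metric], color="red",
        label="GS1280 1.15 GHz, 16 CPUs")
ax.set_ylabel(metric)
ax.legend()
plt.show()
```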
Slide 6: Want even more detail?
• The electronic version of this presentation contains extensive captions and notes on each slide for your further study, reflection, and review.

Slide 7: CASE 1 – Production System
12P GS140 700 MHz vs. 16P GS1280 1.15 GHz
Tremendous gains in headroom. Oracle database server with Multinet.

Slide 8: Compute Queue Completely Evaporates with GS1280
Peak queues of 57 drop to queues of 1 or 2.
[Chart: [MON.STAT]Compute on node HNAM, 08:00–12:00, 3-Mar-2003. Green = GS140, 700 MHz, 12 CPUs; red = GS1280, 1.15 GHz, 16 CPUs.]

Slide 9: CPU 0 Idle Time
With the GS1280, 73% of CPU 0 is spare during the absolute peak. With the GS140, CPU 0 is completely consumed during peaks (e.g. at 11 AM).
[Chart: [MON.MODES]Cpu 00 Idle on node HNAM, same window and color key.]

Slide 10: Almost 4 to 1 Reduction in CPU Busy with GS1280
The GS140 is nearly maxed out at more than 1150% busy of a possible 1200%, while the GS1280 is cruising along at 250% to 350% busy of a possible 1600%.
[Chart: [MON.SYST]Cpu Busy on node HNAM, same window and color key.]

Slide 11: Direct IO (includes network traffic)
The GS140 is close to maxed out at 10,000 direct I/Os per second. The GS1280 is able to push to higher peaks when the load gets heavy, while still having huge spare capacity for more work.
[Chart: [MON.SYST]Direct I/O Rate on node HNAM, same window and color key.]

Slide 12: MPsynch
MPsynch drops from 90% to under 10%, leaving plenty of room for further scaling.
[Chart: [MON.MODE]Mp Synch on node HNAM, same window and color key.]
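The headroom claims behind these charts reduce to simple arithmetic on MONITOR-style percentages, where total capacity is 100% per CPU. A sketch using the Case 1 peak figures quoted above:

```python
# Headroom arithmetic for MONITOR-style CPU percentages (100% per CPU).
def headroom(busy_pct: float, ncpus: int) -> float:
    """Return spare capacity as a multiple of the current load."""
    capacity = 100.0 * ncpus
    return (capacity - busy_pct) / busy_pct

# Case 1 peak figures from the slides:
gs140 = headroom(1150, 12)   # ~0.04x spare: effectively maxed out
gs1280 = headroom(350, 16)   # ~3.6x spare at the same workload peak
print(f"GS140 spare: {gs140:.2f}x   GS1280 spare: {gs1280:.2f}x")
```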
Slide 13: Packets Per Second Sent, a Key Throughput Metric
The GS140 maxes out at about 5,000 packets per second with little or no spare capacity. The GS1280 reaches 6,000 with substantial spare capacity; we estimate the actual maximum rate for the GS1280 at more than 20,000/sec.
[Chart: Pkts Sent/Sec for EWA0: and EWC0: on node HNAM, same window. Blue = GS140, 700 MHz, 12 CPUs; red = GS1280, 1.15 GHz, 16 CPUs.]

Slide 14: Case 1 Summary, 12P GS140 to 16P GS1280
• GS1280 delivers an estimated increase in headroom of at least 4X
• Eliminates the CPU 0 bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Almost a 4 to 1 reduction in CPU use while doing slightly more work
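The ">20,000/sec" packet estimate is consistent with a simple CPU-bound extrapolation: scale the observed rate by the ratio of total CPU capacity to CPU consumed. A sketch of that reasoning follows; the exact method is our inference, not stated on the slide.

```python
# Inferred extrapolation: if throughput is CPU-bound, the achievable ceiling
# scales with the CPU still available. Numbers are the Case 1 peak figures.
def max_rate(observed_rate: float, busy_pct: float, ncpus: int) -> float:
    return observed_rate * (100.0 * ncpus) / busy_pct

est = max_rate(6000, 350, 16)   # GS1280: 6,000 pkts/s at ~350% of 1600%
print(f"Estimated ceiling: {est:,.0f} packets/sec")
# ~27,000/sec, comfortably above the slide's conservative ">20,000/sec"
```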
Slide 16: Case 2 – Production System
10P GS140 700 MHz vs. 8P GS1280 1.15 GHz
Tremendous gains in headroom for an Oracle database server, despite the reduced CPU count. Poised to scale.

Slide 17: Compute Queue Completely Evaporates with GS1280 under the Current Workload Demand
Peak queues of 32 drop to 3.
[Chart: [MON.STAT]Compute, 10:00–11:45, 3-Mar-2003. Red = DB 8P GS1280, 1.15 GHz; green = DB 10P GS140, 700 MHz.]

Slide 18: CPU 0 Idle Time
With the GS1280, 69% of CPU 0 is spare during the absolute peak with this workload. With the GS140, CPU 0 is completely consumed during peaks (e.g. at 10:30, for many minutes at a time).
[Chart: [MON.MODES]Cpu 00 Idle, same window and color key.]

Slide 19: More than 3 to 1 Reduction in CPU Busy with GS1280
The GS140 is completely maxed out at essentially its 1000% limit, while the GS1280 is cruising along at 200% to 350% busy of a possible 800%.
[Chart: [MON.SYST]Cpu Busy, same window and color key.]

Slide 20: Direct IO (includes network traffic)
The 10P GS140 is maxed out at slightly over 8,000 direct I/Os per second. The GS1280 is able to push to higher peaks of 10,500 when the load temporarily gets heavier, while still having huge spare capacity for more work (approximately 5 CPUs).
[Chart: [MON.SYST]Direct I/O Rate, same window and color key.]

Slide 21: MPsynch (more than a 9 to 1 reduction with this workload)
MPsynch drops from peaks of 67% to peaks of only 7%, leaving plenty of room for further scaling.
[Chart: [MON.MODE]Mp Synch, same window and color key.]

Slide 22: Packets Per Second Sent, a Key Throughput Metric
The 10P GS140 maxes out at about 4,200 packets per second with no spare capacity. The 8P GS1280 reaches 4,800 with more than 4.5 CPUs to spare. We estimate the actual maximum rate for the 8P GS1280 at more than 11,000/sec; with 16P this would rise to 20,000/sec.
[Chart: [NET.EWA0:]Pkts Sent/Sec, same window and color key.]

Slide 23: CPU 0 Interrupt
Well poised for scaling to 8, 12, and even more CPUs with the GS1280. During peak periods, despite the fact that the 8P GS1280 is doing slightly more work, it uses a factor of 3.5X less CPU 0 for interrupt activity. At peaks of only 20%, the GS1280 stands ready to handle substantially higher workloads.
[Chart: [MON.MODES]Cpu 00 Inter mode, same window and color key.]
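Factors like the "3.5X" above are simply ratios of peak values taken over the same clock window on the two systems. A sketch, with hypothetical sample lists standing in for the two T4 columns:

```python
# Compare peak values of a before/after pair of timelines sampled over the
# same clock window. The sample lists are hypothetical stand-ins.
def peak_ratio(before_samples, after_samples):
    return max(before_samples) / max(after_samples)

gs140_cpu0_interrupt = [42, 55, 48, 51]    # hypothetical % samples
gs1280_cpu0_interrupt = [12, 15, 14, 16]
print(f"{peak_ratio(gs140_cpu0_interrupt, gs1280_cpu0_interrupt):.1f}x reduction")
```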
Slide 24: Disk Operations Rate
This shows the same head-and-shoulders pattern as direct IO and packets per second. During peak periods, the 10P GS140 maxes out at 2,200 disk operations per second. With this workload, the 8P GS1280 is able to reach 2,900 per second with lots of room to spare. As the load demand on the GS1280 increases, this 8P model looks capable of driving the disk op rate to 6,000/sec.
[Chart: [MON.DISK]OpRate, same window and color key.]

Slide 25: Interrupt Load
Interrupt load during peak periods drops by a factor of almost 5 to 1, from 240% to 50%. This is another excellent sign of the potential future scalability of this GS1280 to 8 CPUs, 12 CPUs, and beyond.
[Chart: [MON.MODE]Interrupt State, same window and color key.]

Slide 26: Microseconds of CPU per Direct IO
Normalized statistics like this show that the relative power of each GS1280 CPU at 1.15 GHz is between 3 and 4 times that of the GS140's 700 MHz CPUs.
[Chart: microseconds CPU per dirio, same window and color key.]
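The normalization on this slide works out directly: CPU busy of B percent means B/100 CPU-seconds consumed per elapsed second, so the cost per direct I/O in microseconds is (B/100) * 10^6 divided by the I/O rate. A sketch using the approximate Case 2 peak values:

```python
# Microseconds of CPU consumed per direct I/O, derived from MONITOR-style
# CPU busy (percent, 100% per CPU) and the direct I/O rate (per second).
def us_per_dirio(cpu_busy_pct: float, dirio_per_sec: float) -> float:
    cpu_seconds_per_sec = cpu_busy_pct / 100.0
    return cpu_seconds_per_sec * 1_000_000 / dirio_per_sec

print(us_per_dirio(1000, 8000))    # GS140  peak: ~1250 us per direct I/O
print(us_per_dirio(350, 10500))    # GS1280 peak: ~333 us, roughly 3-4x less
```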
Slide 27: Disk Reads Per Second
This shows the same head-and-shoulders pattern, even more pronounced than what we saw with network packets.
[Chart: [XFC]Read IOs/Sec, same window and color key.]

Slide 28: Case 2 Summary, 10P GS140 to 8P GS1280
• GS1280 with fewer CPUs delivers an estimated headroom increase of more than 2X
• Eliminates the CPU busy bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Well positioned to scale to 8, 12, or more CPUs and achieve headroom increases of 3.5X or even higher

Slide 30: Proof Point Patterns
• Dramatic cuts in MPsynch
• Large drops in interrupt mode
• Higher, short-lived bursts of throughput (direct IO, packets per second, etc.): the “HEAD and SHOULDERS” pattern
• Large increase in spare capacity and headroom: overall CPU, primary CPU
Where the workload stays relatively flat at the point of transition, the overall throughput numbers are not that different, but the shape of the new curve, with its sharp peaks, tells an important story.

Slide 32: Case 3 – Stress Test, Marvel 32P, RMS1
• This case shows a segment of our RMS1 testing on the 32P Marvel EV7 @ 1.15 GHz
• Using multiple 4 GB RAMdisks
• Started at 16P, ramped up the workload
• Then increased to 24P; throughput dropped
• Then affinitized jobs; throughput jumped
• Combines timeline data from T4, SPL, and bmesh

Slide 33: Background to This Test
• RMS1 is based on a customer-developed database benchmark originally written using Rdb and converted to carry out the same task with RMS.
• To generate extremely high rates of IO in order to discover the limits of Marvel 32P performance, we ran multiple copies of RMS1, each using its own dedicated RAMdisk.
• Caution: the net effect is a test that generates an extremely heavy load, but that cannot be considered to mirror any typical production environment.

Slide 34: Timing of Changes
• 12:05  16 CPUs
• 12:11  Start ramp up with 4 GB RAMdisks
• 12:30  Increase to 24 CPUs
• 12:38  Set process affinity
• 12:55  Turn off dedicated lock manager
<Observe how timelines help make sense of this complicated situation>

Slide 35: Direct IO up to 70,000 per Second!
For the RMS1 workload, the rate of direct IO per second is a key metric of maximum throughput. Increasing to 24 CPUs at 12:30 does not increase throughput. Turning on affinity causes throughput to jump from 55,000 to over 70,000, an increase of approximately 30% (1.3X).
[Chart: [MON.SYST]Direct I/O Rate on node PRF31A, 12:20–12:55, 20-Feb-2003.]

Slide 36: Kernel & MPsynch Switch Roles
12:30 is when we jumped from 16 CPUs to 24 CPUs; note how MPsynch (green) jumps up substantially at that time, to over 950%. At 12:37, we started affinitizing the different processes to CPUs we believed to be close to where their associated RAMdisk was located. Note how MPsynch and kernel mode cross over at that point.
[Chart: [MON.MODE]Kernel Mode vs Mp Synch on node PRF31A, 12:15–12:55, 20-Feb-2003.]
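A sketch of what the affinitization step at 12:38 could look like in practice: map each RMS1 worker to the CPU believed nearest its RAMdisk on the EV7 mesh and emit the corresponding DCL. The PID-to-CPU table is hypothetical, and the SET PROCESS/AFFINITY qualifiers should be verified against your OpenVMS version before use.

```python
# Emit DCL to bind each RMS1 worker process to the CPU believed nearest its
# RAMdisk on the EV7 mesh. The PID -> CPU mapping below is hypothetical;
# verify the SET PROCESS/AFFINITY qualifiers on your OpenVMS version.
workers = {
    "20400211": 2,     # worker PID -> CPU id near that worker's RAMdisk
    "20400215": 6,
    "20400219": 10,
}
for pid, cpu in workers.items():
    print(f"$ SET PROCESS/IDENTIFICATION={pid}/AFFINITY/SET={cpu}/PERMANENT")
```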
Slide 37: Lock Busy % from T4 Shows the Jump with Affinity
We had the dedicated lock manager turned on for this test, which creates a very heavy locking load. Note that there is no change when the number of CPUs is increased at around 12:30, and note the big jump in lock busy % that happens when we affinitize. At over 90% busy, locking is a clear primary bottleneck that will prevent further increases in throughput even with more CPUs.
[Chart: [LCK73]Busy % on node PRF31A, 12:15–12:55, 20-Feb-2003.]

Slide 38: Lock Requests per Second vs XFC Write: A True Linear Relationship
The maximum rate of lock requests is an astounding 450,000 per second.
[Scatter: lock requests/sec vs [XFC]Write IOs/Sec on node PRF31A, 20-Feb-2003 12:10:49 to 12:55:04; correlation 1.000.]
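The "true linear relationship" on the scatter slide can be checked straight from the two T4 columns by computing the correlation and the per-write lock cost. The arrays below are illustrative stand-ins, shaped only to match the axis ranges shown:

```python
# Check the lock-rate vs XFC-write linearity reported on the scatter slide.
# Arrays are illustrative stand-ins for the two T4 columns.
import numpy as np

xfc_writes = np.array([2000, 4000, 6000, 8000, 10000, 12000])
lock_reqs = np.array([75000, 150000, 225000, 300000, 375000, 450000])

r = np.corrcoef(xfc_writes, lock_reqs)[0, 1]
locks_per_write = lock_reqs / xfc_writes
print(f"correlation {r:.3f}, ~{locks_per_write.mean():.0f} lock requests per XFC write")
```

A stable locks-per-write ratio is exactly the "consistent ratios between key statistics" pattern called out in the Case 3 summary below.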
Slide 39: Case 3 Summary
• These are by far the best throughput numbers we have ever seen on this workload for direct IO and lock requests per second
• Performance is great out of the box
• New tools simplify bottleneck identification
• Straightforward tuning pushes to even higher values, with a surprisingly large upward jump
• Workloads show consistent ratios between key statistics (e.g. lock requests per DIRIO)
• Spinlock-related bottlenecks remain with us, albeit at dramatically higher throughput levels

Slide 41: Case 4 – Production System
• Upgrade from a 16-CPU Wildfire EV68 running at 1.224 GHz (the fastest Wildfire)
• Compared to a 16-CPU Marvel EV7 running at 1.15 GHz
• Oracle, TCPIP; mixed database server and application server

Slide 42: CPU Busy Cut in Half
Note the color switch: red is now the GS1280 with 16 CPUs at 1.15 GHz; green is the GS160 with 16 CPUs at 1.224 GHz.
[Chart: [MON_SYST]Cpu Busy on node MILP1, 09:30–11:30, 31-Mar-2003.]

Slide 43: CPU 0 Interrupt Cut by a Factor of More Than 3 to 1
[Chart: [MON_MODES]Cpu 00 Inter mode on node MILP1, same window and color key.]

Slide 44: Buffered IO – Sustained Higher Peaks
[Chart: [MON_SYST]Buffered I/O Rate on node MILP1, same window and color key.]

Slide 45: Direct IO – Sustained Higher Peaks
[Chart: [MON_SYST]Direct I/O Rate on node MILP1, same window and color key.]

Slide 46: System-Wide Interrupt Diminished by a Factor of 4 to 1
[Chart: [MON_MODE]Interrupt State on node MILP1, same window and color key.]

Slide 47: MPsynch Shrinks by More Than 8 to 1
[Chart: [MON_MODE]Mp Synch on node MILP1, same window and color key.]

Slide 48: Kernel Mode Decreases from 260 to 150
[Chart: [MON_MODE]Kernel Mode on node MILP1, same window and color key.]

Slide 49: User Mode Decreases from About 480 to 240
[Chart: [MON_MODE]User Mode on node MILP1, same window and color key.]

Slide 50: Compute Queue Disappears
[Chart: [MON_STAT]Compute on node MILP1, same window and color key.]

Slide 51: Packets per Second – Head and Shoulders with Higher Peaks
[Chart: [NET_MON]EWA0: Pkts Sent/Sec on node MILP1, same window and color key.]

Slide 52: Mailbox Writes – Head and Shoulders with Higher Peaks
[Chart: [MON_IO]Mailbox Write Rate on node MILP1, same window and color key.]
Slide 53: Dedicated Lock Manager Busy Drops from 18% to About 6%
[Chart: [LCK73_MON]Busy % on node MILP1, same window and color key.]

Slide 54: Case 4 Summary
• The GS160 with 16 CPUs had been highly tuned, yet was unable to handle the heaviest peak loads presented
• The bottleneck was related to reaching maximum TCPIP throughput, the associated MPsynch, and limits on maximum buffered IO
• The GS1280 immediately, without further adjustment, provided a dramatic increase in maximum throughput and a huge improvement in spare capacity and headroom

Slide 56: Case 5 – Production System
• NOTE the color switch in these slides: red is now the GS140 with 12 CPUs; green is the GS1280 with 16 CPUs
• Upgrade from a 12P GS140 to a 16P GS1280
• Mixed application and database server

Slide 57: MPsynch Almost Disappears
MPsynch drops from 130% to under 10%.
[Chart: [MON.MODE]Mp Synch on node ALCOR, 08:30–10:30, 6-May-2003. Red = GS140, 12 CPUs; green = GS1280, 16 CPUs.]

Slide 58: Kernel Mode Shrinks by More Than 5 to 1
[Chart: [MON.MODE]Kernel Mode on node ALCOR, same window and color key.]

Slide 59: System-Wide Interrupt Also Shrinks by More Than 5 to 1
[Chart: [MON.MODE]Interrupt State on node ALCOR, same window and color key.]

Slide 60: User Mode Is Cut in Half
[Chart: [MON.MODE]User Mode on node ALCOR, same window and color key.]

Slide 61: CPU Busy Drops by Almost 3 to 1
[Chart: [MON.SYST]Cpu Busy on node ALCOR, same window and color key.]

Slide 62: CPU 0 Interrupt Almost Disappears, Dropping by More Than 6 to 1
[Chart: [MON.MODES]Cpu 00 Inter mode on node ALCOR, same window and color key.]

Slide 63: Buffered IO Shows Consistently Higher Peaks
There was a real backlog of work waiting to be serviced.
[Chart: [MON.SYST]Buffered I/O Rate on node ALCOR, same window and color key.]

Slide 64: Direct IO Shows Substantially Higher Peaks Which Are Short-Lived
[Chart: [MON.SYST]Direct I/O Rate on node ALCOR, same window and color key.]
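The recurring "head and shoulders" shape, short-lived peaks standing well above a flat baseline, can be flagged mechanically from any T4 column, for example by comparing each sample against the series median. A sketch with hypothetical rate samples:

```python
# Flag "head and shoulders" samples: short-lived peaks rising well above the
# flat baseline of a series (here, more than 1.5x the median).
from statistics import median

def burst_indices(samples, factor=1.5):
    base = median(samples)
    return [i for i, v in enumerate(samples) if v > factor * base]

dirio = [500, 520, 510, 850, 830, 515, 505, 790, 500]  # hypothetical rates
print(burst_indices(dirio))   # -> [3, 4, 7]
```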
Slide 65: Mailbox Write Rate Increases from 1,400 to over 2,400
[Chart: [MON.IO]Mailbox Write Rate on node ALCOR, same window and color key.]

Slide 66: Compute Queue Evaporates
[Chart: [MON.STAT]Compute on node ALCOR, same window and color key.]

Slide 67: Case 5 Summary
• The huge backlog of work can now be handled successfully during long-lasting peak periods, as demonstrated by higher buffered IO and other throughput metrics
• Substantial further reserves of spare capacity
• Large changes in key performance metrics such as MPsynch and interrupt

Slide 68: Proof Point Summary
• Marvel EV7 GS1280 systems are the best performing VMS systems ever
• Excellent out-of-the-box performance
• Superior SMP scaling
• Huge increases in maximum throughput, some realized immediately, the rest held in reserve as spare capacity
• Marvel provides the headroom for future growth

Slide 69: Can your enterprise benefit from an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
  – Systems with poor response time
  – Systems with insufficient peak period throughput
Slide 70: Would you like to participate in our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part.
• Download the T4 kit from the public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable, T4-based performance history of your most important systems.
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.