Aurora – 12P GS140 700 MHz vs. 16P GS1280 1.15 GHz

OpenVMS “Marvel” EV7
Proof Points of Olympic Proportions
Tech Update - Sept 2003
[email protected]
OpenVMS Marvel EV7 Proof
Points of Olympic Proportions
• Live, mission-critical, production systems
• Multi-dimensional
• Before and after comparisons
• Upgrades from GS140 & GS160 to GS1280
• Proof of impact on maximum headroom
Can your enterprise benefit from
an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
– Systems with poor response time
– Systems with insufficient peak-period throughput
T4 - Data Sources
• Data for these comparisons was collected using the internally developed T4 (tabular timeline tracking tool) suite of coordinated collection utilities and analyzed with TLViz.
• The T4 kit and TLViz have consistently proved themselves invaluable for this kind of before-and-after comparison project. We have now made T4 publicly available for download (it will ship with OpenVMS 7.3-2 in SYS$ETC:).
• T4 could be a useful adjunct to your performance management program.
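As an illustration of how such before-and-after comparisons can be assembled, here is a minimal sketch that assumes the T4 collections have been saved as CSV timelines with a timestamp column; the file names and column headers are illustrative only and are not guaranteed to match the T4 kit's actual output.

```python
# Minimal sketch: overlay one metric from two T4-style CSV timelines
# (before vs. after an upgrade). File and column names are illustrative;
# adjust them to match your own T4 collection files.
import pandas as pd
import matplotlib.pyplot as plt

def load_t4_csv(path, time_col="Sample Time"):
    """Read a T4-style CSV timeline and index it by sample time."""
    df = pd.read_csv(path)
    df[time_col] = pd.to_datetime(df[time_col])
    return df.set_index(time_col)

before = load_t4_csv("gs140_before.csv")   # hypothetical file names
after = load_t4_csv("gs1280_after.csv")

metric = "[MON.SYST]Cpu Busy"              # any metric column common to both

ax = before[metric].plot(label="GS140 (before)", color="green")
after[metric].plot(ax=ax, label="GS1280 (after)", color="red")
ax.set_ylabel(metric)
ax.legend()
plt.title("Before/after comparison: " + metric)
plt.show()
```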
Would you like to participate in
our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part.
• Download the T4 kit from the public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable, T4-based performance history of your most important systems.
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.
Want even more detail?
• The electronic version of this presentation contains extensive captions and notes on each slide for your further study, reflection, and review.
Case 1 – Production System
12P GS140 700 MHz vs. 16P GS1280 1.15 GHz
Tremendous Gains in Headroom
Oracle Database Server with Multinet
Compute Queue Completely Evaporates with GS1280
Peak queues of 57 drop to queues of 1 or 2.
[TLViz chart: [MON.STAT]Compute queue, node HNAM, 08:00–12:00 on 3-Mar-2003; green = GS140 at 700 MHz with 12 CPUs, red = GS1280 at 1.15 GHz with 16 CPUs]
CPU 0 Idle Time
With the GS1280, CPU 0 has 73% spare capacity during the absolute peak. With the GS140, CPU 0 is completely consumed during peaks (e.g. at 11 AM).
[TLViz chart: [MON.MODES]Cpu 00 Idle, node HNAM, 08:00–12:00 on 3-Mar-2003; green = GS140 at 700 MHz with 12 CPUs, red = GS1280 at 1.15 GHz with 16 CPUs]
Almost 4 to 1 reduction in CPU Busy with GS1280
The GS140 is nearly maxed out at more than 1,150% busy of a possible 1,200%, while the GS1280 is cruising along at 250% to 350% busy of a possible 1,600%.
[TLViz chart: [MON.SYST]Cpu Busy, node HNAM, 08:00–12:00 on 3-Mar-2003; green = GS140 at 700 MHz with 12 CPUs, red = GS1280 at 1.15 GHz with 16 CPUs]
DirectIO (includes network traffic)
The GS1280 is able to push to higher peaks when the load gets heavy, while still having huge spare capacity for more work. The GS140 is close to maxed out at 10,000 direct I/Os per second.
[TLViz chart: [MON.SYST]Direct I/O Rate, node HNAM, 08:00–12:00 on 3-Mar-2003; green = GS140 at 700 MHz with 12 CPUs, red = GS1280 at 1.15 GHz with 16 CPUs]
MPsynch
MPsynch drops from 90% to under 10%, leaving plenty of room for further scaling.
[TLViz chart: [MON.MODE]Mp Synch, node HNAM, 08:00–12:00 on 3-Mar-2003; green = GS140 at 700 MHz with 12 CPUs, red = GS1280 at 1.15 GHz with 16 CPUs]
Packets Per Second Sent – a key throughput metric. Estimated actual maximum rate for the GS1280 is more than 20,000/sec.
The GS140 maxes out at about 5,000 packets per second with little or no spare capacity. The GS1280 reaches 6,000 with substantial spare capacity.
[TLViz chart: [NET.EWA0:] and [NET.EWC0:] Pkts Sent/Sec, node HNAM, 08:00–12:00 on 3-Mar-2003; blue = GS140 at 700 MHz with 12 CPUs, red = GS1280 at 1.15 GHz with 16 CPUs]
Case 1 Summary
12P GS140 to 16P GS1280
• GS1280 delivers an estimated increase in headroom of at least 4X (sketched below)
• Eliminates the CPU 0 bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Almost 4 to 1 reduction in CPU use while doing slightly more work
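The "at least 4X" estimate in the first bullet follows from simple ratio arithmetic on the CPU Busy chart; the percentages below are approximate values read from that chart, and the calculation is an illustration rather than a formal sizing method.

```python
# Rough headroom arithmetic for Case 1 (approximate values from the CPU Busy chart).
gs140_busy, gs140_capacity = 1150, 1200     # % busy vs. % available (12 CPUs)
gs1280_busy, gs1280_capacity = 300, 1600    # % busy vs. % available (16 CPUs)

# CPU consumed for roughly the same work: almost 4x less on the GS1280.
cpu_reduction = gs140_busy / gs1280_busy

# How many multiples of the current workload each box could absorb.
headroom_gs140 = gs140_capacity / gs140_busy      # ~1.04x: essentially nothing left
headroom_gs1280 = gs1280_capacity / gs1280_busy   # ~5.3x

print(f"CPU reduction factor:     {cpu_reduction:.1f}x")
print(f"GS140 headroom multiple:  {headroom_gs140:.2f}x")
print(f"GS1280 headroom multiple: {headroom_gs1280:.2f}x")
```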
Case 2 – Production System
10P GS140 700 MHz vs. 8P GS1280 1.15 GHz
Tremendous Gains in Headroom for an Oracle Database Server despite reduced CPU count
Poised to Scale
Compute Queue Completely Evaporates with GS1280 and the current workload demand
Peak queues of 32 drop to 3.
[TLViz chart: [MON.STAT]Compute queue, 10:00–11:45 on 3-Mar-2003; red = DB node, 8P GS1280 at 1.15 GHz; green = DB node, 10P GS140 at 700 MHz]
CPU 0 Idle Time
With the GS1280, CPU 0 has 69% spare capacity during the absolute peak with this workload. With the GS140, CPU 0 is completely consumed during peaks (e.g. at 10:30 for many minutes at a time).
[TLViz chart: [MON.MODES]Cpu 00 Idle, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
More than 3 to 1 reduction in CPU Busy with GS1280
The GS140 is completely maxed out at more than 1,000% busy, while the GS1280 is cruising along at 200% to 350% busy of a possible 800%.
[TLViz chart: [MON.SYST]Cpu Busy, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
DirectIO (includes network traffic)
The GS1280 is able to push to higher peaks of 10,500 when the load temporarily gets heavier, while still having huge spare capacity for more work (approximately 5 CPUs). The 10P GS140 is maxed out at slightly over 8,000 direct I/Os per second.
[TLViz chart: [MON.SYST]Direct I/O Rate, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
MPsynch (more than a 9 to 1 reduction with this workload)
MPsynch drops from peaks of 67% to peaks of only 7%, leaving plenty of room for further scaling.
[TLViz chart: [MON.MODE]Mp Synch, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
Packets Per Second Sent – a key throughput metric. Estimated actual maximum rate for the 8P GS1280 is more than 11,000/sec; with 16P this would rise to 20,000/sec (arithmetic sketched below).
The 10P GS140 maxes out at about 4,200 packets per second with no spare capacity. The 8P GS1280 reaches 4,800 with more than 4.5 CPUs to spare.
[TLViz chart: [NET.EWA0:]Pkts Sent/Sec, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
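The ">11,000/sec" and "20,000/sec" figures can be reproduced by scaling the observed peak by the fraction of the machine actually consumed; the observed values come from the slide, while the near-linear-scaling assumption (to the idle CPUs and to a 16P configuration) is an extrapolation, sketched here only for illustration.

```python
# Scaling arithmetic behind the packet-rate estimates (values from the slide).
observed_pkts = 4_800     # packets/sec at the observed GS1280 peak
cpus_total = 8
cpus_spare = 4.5          # spare CPUs reported at that peak

cpus_used = cpus_total - cpus_spare
est_max_8p = observed_pkts * cpus_total / cpus_used   # ~11,000/sec if CPU-bound
est_max_16p = est_max_8p * 2                          # assumes near-linear scaling to 16P

print(round(est_max_8p), round(est_max_16p))          # ~10,971 and ~21,943
```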
CPU 0 interrupt – well poised for scaling to 8, 12, and even more CPUs with the GS1280
During peak periods, despite the fact that the 8P GS1280 is doing slightly more work, it uses a factor of 3.5X less CPU 0 time for interrupt activity. At peaks of only 20%, the GS1280 stands ready to handle substantially higher workloads.
[TLViz chart: [MON.MODES]Cpu 00 Inter mode, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
Disk operations rate – This shows the same head and shoulders pattern as direct IO and packets per second
During peak periods, the 10P GS140 maxes out at 2,200 disk operations per second. With this workload, the 8P GS1280 is able to reach 2,900 per second with lots of room to spare. As the load demand on the GS1280 increases, this 8P model looks capable of driving the disk operation rate to 6,000/sec.
[TLViz chart: [MON.DISK]OpRate, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
Interrupt load during peak periods drops by a factor of almost 5 to 1, from 240% to 50%
This is another excellent sign of the potential future scalability of this GS1280 to 8 CPUs, 12 CPUs, and beyond.
[TLViz chart: [MON.MODE]Interrupt State, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
Microseconds of CPU per each Direct IO
Normalized statistics like this show that the relative power of each GS1280 CPU at 1.15 GHz is 3 to 4 times that of the GS140's 700 MHz CPUs.
[TLViz chart: microseconds of CPU per direct I/O (derived statistic), 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
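This derived statistic simply divides total CPU time consumed by the number of direct I/Os completed. A minimal sketch of the normalization, reusing the illustrative CSV layout from earlier (column names are not guaranteed T4 names):

```python
# Sketch of the "microseconds of CPU per direct I/O" normalization.
import pandas as pd

df = pd.read_csv("gs1280_after.csv")        # hypothetical file name

# MONITOR-style CPU busy is reported in percent of one CPU (300 = 3 CPUs busy),
# so 1% of one CPU for one second equals 10,000 microseconds of CPU time.
cpu_microseconds_per_sec = df["[MON.SYST]Cpu Busy"] * 10_000
dirio_per_sec = df["[MON.SYST]Direct I/O Rate"]

df["us_cpu_per_dirio"] = cpu_microseconds_per_sec / dirio_per_sec
print(df["us_cpu_per_dirio"].describe())
```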
Disk Reads Per Second
This shows the same head-and-shoulders pattern, but even more pronounced than what we saw with network packets.
[TLViz chart: [XFC]Read IOs/Sec, 10:00–11:45 on 3-Mar-2003; red = 8P GS1280 at 1.15 GHz, green = 10P GS140 at 700 MHz]
Case 2 Summary
10P GS140 to 8P GS1280
• GS1280 with fewer CPUs delivers an estimated headroom increase of more than 2X
• Eliminates the CPU busy bottleneck
• Drastically cuts MPsynch
• Able to handle higher peaks as they arrive
• Well positioned to scale to 8, 12, or more CPUs and achieve headroom increases of 3.5X or even higher
Proof Point Patterns
• Dramatic cuts in MPsynch
• Large drops in interrupt mode
• Higher, short-lived bursts of throughput
– direct IO, packets per second, etc.
– the “HEAD and SHOULDERS” pattern
• Large increase in spare capacity and headroom
– overall CPU, primary CPU
Where the workload stays relatively flat at the point of transition, the overall throughput numbers are not that different, but the shape of the new curve, with its sharp peaks, tells an important story.
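One simple way to put a number on the "head and shoulders" shape is to compare each run's short-lived peak with its sustained level; here is a sketch under the same illustrative CSV assumptions used earlier.

```python
# Sketch: compare short-lived peak vs. sustained level for one throughput metric.
import pandas as pd

def peak_vs_sustained(path, metric="[MON.SYST]Direct I/O Rate"):
    df = pd.read_csv(path)                  # hypothetical T4-style CSV export
    series = df[metric]
    peak = series.quantile(0.99)            # the short-lived "head"
    sustained = series.median()             # the typical "shoulders"
    return peak, sustained, peak / sustained

for label, path in [("before", "gs140_before.csv"), ("after", "gs1280_after.csv")]:
    peak, sustained, ratio = peak_vs_sustained(path)
    print(f"{label}: peak={peak:.0f}  sustained={sustained:.0f}  ratio={ratio:.2f}")
```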
Case 3 – Stress Test
Marvel 32P – RMS1
• This case shows a segment of our RMS1 testing on the 32P Marvel EV7 @ 1.15 GHz
• Using multiple 4 GB RAMdisks
• Started at 16P, ramped up the workload
• Then increased to 24P; throughput dropped
• Then affinitized jobs; throughput jumped
• Combines timeline data from t4, spl, bmesh
Background to this test
• RMS1 is based on a customer-developed database benchmark test originally written using Rdb and converted to carry out the same task with RMS
• To generate extremely high rates of I/O in order to discover the limits of Marvel 32P performance, we ran multiple copies of RMS1, each using its own dedicated RAMdisk
• Caution: the net effect is a test that generates an extremely heavy load, but that cannot be considered to mirror any typical production environment
Timing of Changes
• 12:05 – 16 CPUs
• 12:11 – Start ramp-up with 4 GB RAMdisks
• 12:30 – Increase to 24 CPUs
• 12:38 – Set process affinity
• 12:55 – Turn off dedicated lock manager
(Observe how the timelines below help make sense of this complicated situation.)
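A sketch of one way to annotate such a timeline with the change times listed above, again assuming an illustrative CSV export of the T4 data (file and column names are hypothetical):

```python
# Sketch: plot a metric timeline and mark the configuration changes on it.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("prf31a_rms1.csv")                     # hypothetical file name
df["Sample Time"] = pd.to_datetime(df["Sample Time"])
df = df.set_index("Sample Time")

changes = {                                             # times from the list above
    "12:05": "16 CPUs",
    "12:11": "start ramp-up",
    "12:30": "24 CPUs",
    "12:38": "set affinity",
    "12:55": "dedicated lock mgr off",
}

ax = df["[MON.SYST]Direct I/O Rate"].plot()
for hhmm, label in changes.items():
    t = pd.Timestamp(f"2003-02-20 {hhmm}")
    ax.axvline(t, linestyle="--", color="gray")
    ax.annotate(label, xy=(t, ax.get_ylim()[1]), rotation=90, va="top", fontsize=8)
plt.show()
```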
Direct IO up to 70,000 per second!
For the RMS1 workload, the rate of direct I/O per second is a key metric of maximum throughput. Increasing to 24 CPUs at 12:30 does not increase throughput. Turning on affinity causes throughput to jump from 55,000 to over 70,000, an increase of approximately 30% (1.3X).
[TLViz chart: [MON.SYST]Direct I/O Rate, node PRF31A, 12:20–12:50 on 20-Feb-2003]
Kernel & MPsynch switch roles
12:30 is when we jumped from 16 CPUs to 24 CPUs. Note how MPsynch (green) jumps up substantially at that time, to over 950%. At 12:37, we started affinitizing the different processes to CPUs we believed to be close to where their associated RAMdisk was located. Note how MPsynch and kernel mode cross over at that point.
[TLViz chart: [MON.MODE]Kernel Mode and [MON.MODE]Mp Synch, node PRF31A, 12:15–12:55 on 20-Feb-2003]
Lock Busy % from T4 shows jump with affinity
We had the dedicated lock manager turned on for this test, which creates a very heavy locking load. Note that there is no change when the number of CPUs is increased at around 12:30. Note the big jump in lock busy % that happens when we affinitize. At over 90% busy, locking is a clear primary bottleneck that will prevent further increases in throughput even with more CPUs.
[TLViz chart: [LCK73]Busy %, node PRF31A, 12:15–12:55 on 20-Feb-2003]
Lock requests per sec vs XFC write
A True Linear Relationship
Scatter diagram for data collection on node PRF31A between 20-FEB-2003 12:10:49.12 and 20-FEB-2003 12:55:04.05. The maximum rate of lock requests is an astounding 450,000 per second.
[TLViz scatter plot: lock requests per second vs. [XFC]Write IOs/Sec, node PRF31A]
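A sketch of how the linearity could be checked numerically from the same collection, under the same illustrative CSV assumptions (the lock-request column name in particular is hypothetical):

```python
# Sketch: check linearity between lock requests/sec and XFC write IOs/sec.
import pandas as pd

df = pd.read_csv("prf31a_rms1.csv")             # hypothetical file name

x = df["[XFC]Write IOs/Sec"]
y = df["[MON.LOCK]New ENQ Rate"]                # illustrative name for lock requests/sec

r = x.corr(y)                                   # Pearson correlation
slope = (x * y).sum() / (x * x).sum()           # least-squares slope through the origin

print(f"correlation: {r:.3f}")
print(f"lock requests per XFC write: {slope:.1f}")
```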
Case 3 - SUMMARY
• These are by far the best throughput numbers we have ever seen on this workload for direct I/O and lock requests per second
• Performance is great out of the box
• New tools simplify bottleneck identification
• Straightforward tuning pushes to even higher values, with a surprisingly large upward jump
• Workloads show consistent ratios between key statistics (e.g. lock requests per direct I/O)
• Spinlock-related bottlenecks remain with us, albeit at dramatically higher throughput levels
Case 4 – Production System
• Upgrade from a 16-CPU Wildfire EV68 running at 1.224 GHz (the fastest Wildfire)
• Compared to a 16-CPU Marvel EV7 running at 1.15 GHz
• Oracle, TCPIP, Mixed Database Server and Application Server
CPU Busy cut in half
Note the color switch: red is the GS1280 with 16 CPUs at 1.15 GHz; green is the GS160 with 16 CPUs at 1.224 GHz.
[TLViz chart: [MON_SYST]Cpu Busy, node MILP1, 09:30–11:30 on 31-Mar-2003]
CPU 0 Interrupt is cut by a factor of more than 3 to 1
[TLViz chart: [MON_MODES]Cpu 00 Inter mode, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Buffered IO – sustained higher peaks
[TLViz chart: [MON_SYST]Buffered I/O Rate, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Direct IO – sustained higher peaks
[TLViz chart: [MON_SYST]Direct I/O Rate, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
System Wide Interrupt diminished by a factor of 4 to 1
[TLViz chart: [MON_MODE]Interrupt State, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
MPsynch shrinks by more than 8 to 1
[TLViz chart: [MON_MODE]Mp Synch, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Kernel Mode decreases from 260 to 150
[TLViz chart: [MON_MODE]Kernel Mode, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
User Mode decreases from about 480 to 240
[TLViz chart: [MON_MODE]User Mode, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Compute Queue disappears
[TLViz chart: [MON_STAT]Compute queue, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Packets per second – head and shoulders with higher peaks
[TLViz chart: [NET_MON]EWA0: Pkts Sent/Sec, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Mailbox Writes – head and shoulders with higher peaks
[TLViz chart: [MON_IO]Mailbox Write Rate, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Dedicated Lock Manager Busy drops from 18% down to about 6%
[TLViz chart: [LCK73_MON]Busy %, node MILP1, 09:30–11:30 on 31-Mar-2003; red = GS1280 with 16 CPUs at 1.15 GHz, green = GS160 with 16 CPUs at 1.224 GHz]
Case 4 - SUMMARY
• The GS160 with 16 CPUs had been highly tuned, yet was unable to handle the heaviest peak loads presented
• The bottleneck was related to reaching maximum TCPIP throughput, the associated MPsynch, and limits on the maximum buffered I/O rate
• The GS1280 immediately, without further adjustment, provided a dramatic increase in maximum throughput and a huge improvement in spare capacity and headroom
Case 5 – Production System
• NOTE color switch in slides
• Upgrade 12P GS140 to 16P GS1280
• Mixed Application and Database Server
MPsynch almost disappears
Drops from 130 to under 10%
[TLViz chart: [MON.MODE]Mp Synch, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
Kernel Mode shrinks by more than 5 to 1
[TLViz chart: [MON.MODE]Kernel Mode, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
System Wide Interrupt also shrinks by more than 5 to 1
[TLViz chart: [MON.MODE]Interrupt State, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
User Mode is cut in half
[TLViz chart: [MON.MODE]User Mode, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
CPU busy drops by almost 3 to 1
[TLViz chart: [MON.SYST]Cpu Busy, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
CPU 0 Interrupt almost disappears and drops by more than 6 to 1
[TLViz chart: [MON.MODES]Cpu 00 Inter mode, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
Buffered IO – shows consistently higher peaks. There was a real backlog of work waiting to be serviced.
[TLViz chart: [MON.SYST]Buffered I/O Rate, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
Direct IO shows substantially higher peaks which are short-lived
[TLViz chart: [MON.SYST]Direct I/O Rate, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
Mailbox write increases from 1,400 to over 2,400
[TLViz chart: [MON.IO]Mailbox Write Rate, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
Compute Queue evaporates
[TLViz chart: [MON.STAT]Compute queue, node ALCOR, 08:30–10:30 on 6-May-2003; red = GS140 with 12 CPUs, green = GS1280 with 16 CPUs]
Case 5 - SUMMARY
• A huge backlog of work can now be handled successfully during long-lasting peak periods, as demonstrated by higher buffered I/O and other throughput metrics
• Substantial further reserves of spare capacity
• Large changes in key performance metrics such as MPsynch and interrupt time
Proof Point Summary
• Marvel EV7 GS1280 systems are the best-performing VMS systems ever
• Excellent out-of-the-box performance
• Superior SMP scaling
• Huge increases in maximum throughput, some realized immediately, the rest held in reserve as spare capacity
• Marvel provides the headroom for future growth
Can your enterprise benefit from
an upgrade to a GS1280?
• Systems with high MPsynch
• Systems with high primary CPU interrupt load
• Poor SMP scaling
• Heavy locking
• Heavy IO: Direct, Buffered, Mailbox
• Heavy use of Oracle, TCPIP, Multinet
• Look closer if:
– Systems with poor response time
– Systems with insufficient peak-period throughput
Would you like to participate in
our Marvel Proof Point Program?
• Contact [email protected] for more information about how you can take part.
• Download the T4 kit from the public web site: http://h71000.www7.hp.com/OpenVMS/products/t4/index.html
• Start creating a compact, portable, T4-based performance history of your most important systems.
• The T4 data will create a common and efficient language for our discussions. We can then work with you and help you evaluate your unique pattern of use and the degree to which the advantages of Marvel EV7 on OpenVMS can most benefit you.