Bank of Gansu PowerHA HyperSwap Test Record

1 A-S
1.1 Environment Description
1.1.1 Topology
[Figure: test topology. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
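Description:
The test environment simulates three sites. Site1 has two AIX partitions and one storage system; Site2 has one AIX partition and one storage system; Site3 is meant to host the tie-breaker disk of the PowerHA cluster, which is temporarily placed at Site1 for now.
1.1.2 Zone Information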
Hosts to DS8877:
D20151030_HB_HPSW_Node1fcs0_DS8877
D20151030_HB_HPSW_Node1fcs1_DS8877
D20151030_HB_HPSW_Node2fcs0_DS8877
D20151030_HB_HPSW_Node2fcs1_DS8877
D20151030_HB_HPSW_Node3fcs0_DS8877
D20151030_HB_HPSW_Node3fcs1_DS8877
Hosts to DS8876:
D20150730_GSB_HPSW_101_8876P1
D20150730_GSB_HPSW_101_8876P2
D20150730_GSB_HPSW_102_8876P1
D20150730_GSB_HPSW_102_8876P2
D20150730_GSB_HPSW_103_8876P1
D20150730_GSB_HPSW_103_8876P2
Hosts to the tie-breaker:
D20151030_HB_HPSW_Node1fcs2_DS8877
D20151030_HB_HPSW_Node2fcs2_DS8877
D20151030_HB_HPSW_Node3fcs2_DS8877
Metro Mirror between the storage systems:
D20150730_GSB_HPSW_PPRC_76to77
1.1.3 Host and IP Information
172.16.51.101   p7502901
172.16.51.102   p7502902
172.16.51.103   p7502903
172.16.34.78    serviceip
1.1.4 Disk Information
# lspv
hdisk0    00f8ba8174713d1a    rootvg           active
hdisk1    00f8ba812f5f5a5b    oravg            active
hdisk2    00f8ba813257f86f    testvg           concurrent
hdisk3    00f8ba813257e867    testvg           concurrent
hdisk4    00f8ba813257e573    testvg           concurrent
hdisk5    00f8ba813257e3cf    testvg           concurrent
hdisk6    00f8ba81428bc67c    caavg_private    active
hdisk12   00f8ba810ad98e8e    None
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk6   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -p hdisk2
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0(s)        5005076305ffd4a4  0xb3  0x00  PRIMARY
1           5005076305ffd4ac  0xb3  0x00  SECONDARY

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Enabled  fscsi0  50050763051014a4,40b3400000000000
0          1     Enabled  fscsi0  50050763050094a4,40b3400000000000
0          2     Enabled  fscsi0  50050763054014a4,40b3400000000000
0          3     Enabled  fscsi0  50050763050b14a4,40b3400000000000
0          4     Enabled  fscsi0  50050763054b14a4,40b3400000000000
0          5     Enabled  fscsi1  50050763051014a4,40b3400000000000
0          6     Enabled  fscsi1  50050763050094a4,40b3400000000000
0          7     Enabled  fscsi1  50050763054014a4,40b3400000000000
0          8     Enabled  fscsi1  50050763050b14a4,40b3400000000000
0          9     Enabled  fscsi1  50050763054b14a4,40b3400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40b3400000000000
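1.2 FC Path Failure from the Primary Partition to the Primary Storage
Fault description and expectation:
The links between the primary partition (App1) and the primary storage (DS8877) fail. In the test, this is simulated by disabling the zones between App1 and DS8877.
Expected result:
The storage swaps to DS8876; application I/O pauses for 15-20 seconds and then continues.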
[Figure: topology diagram for this scenario. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk6   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -p hdisk2
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0(s)        5005076305ffd4a4  0xb3  0x00  PRIMARY
1           5005076305ffd4ac  0xb3  0x00  SECONDARY

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Enabled  fscsi0  50050763051014a4,40b3400000000000
0          1     Enabled  fscsi0  50050763050094a4,40b3400000000000
0          2     Enabled  fscsi0  50050763054014a4,40b3400000000000
0          3     Enabled  fscsi0  50050763050b14a4,40b3400000000000
0          4     Enabled  fscsi0  50050763054b14a4,40b3400000000000
0          5     Enabled  fscsi1  50050763051014a4,40b3400000000000
0          6     Enabled  fscsi1  50050763050094a4,40b3400000000000
0          7     Enabled  fscsi1  50050763054014a4,40b3400000000000
0          8     Enabled  fscsi1  50050763050b14a4,40b3400000000000
0          9     Enabled  fscsi1  50050763054b14a4,40b3400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40b3400000000000
Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
Result:
I/O paused for 20 seconds and then continued.
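The record does not say how the I/O pause was measured. One possible way to quantify it, sketched below under that assumption, is to log the completion time of a steady stream of small writes during the test and report the largest gap between consecutive completions. The function name `longest_pause` and the sample timestamps are hypothetical and are not taken from the test.

```python
def longest_pause(timestamps):
    """Return the largest gap, in seconds, between consecutive completion times."""
    if len(timestamps) < 2:
        return 0.0
    ordered = sorted(timestamps)
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


# Hypothetical completion times (seconds) of one-per-second writes around a swap.
samples = [0.0, 1.0, 2.0, 22.0, 23.0]
print(longest_pause(samples))
```

With one write per second, the resolution of such a measurement is about one second, which is adequate for pauses in the 15-30 second range reported in this record.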
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Then manually swap back to the primary storage DS8877:
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'user' -m 'usermg' -o 'swap'
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'repository' -m 'repmg' -o 'swap'
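1.3 FC Path Failure from the Primary Partition to All Storage Systems
Fault description and expectation:
The links between the primary partition and all storage systems fail. In the test, this is simulated by disabling all zones from App1 to both DS8877 and DS8876.
Expected result:
No storage swap occurs; the resource group moves to App2.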
[Figure: topology diagram for this scenario. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
# clRGinfo
-----------------------------------------------------------------------------
Group Name     State              Node
-----------------------------------------------------------------------------
userRG         OFFLINE            p7502901@site1
               ONLINE             p7502902@site1
               ONLINE SECONDARY   p7502903@site2
Result:
After 2-3 minutes, the resource group became active on the App2 node.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Then manually move the resource group back to the App1 node:
/usr/es/sbin/cluster/utilities/clRGmove -s 'false' -m -i -g 'userRG' -n 'p7502901'
Attempting to move resource group userRG to node p7502901.
Waiting for the cluster to process the resource group movement request....
Waiting for the cluster to stabilize........
Resource group movement successful.
Resource group userRG is online on node p7502901.
Cluster Name: p7502901_cluster
Resource Group Name: userRG
Node                         Primary State    Secondary State
---------------------------- ---------------  ----------------
p7502901@site1               ONLINE           OFFLINE
p7502902@site1               OFFLINE          OFFLINE
p7502903@site2               OFFLINE          ONLINE SECONDARY
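1.4 Failure of All FC Links Between the Data Centers
Fault description and expectation:
The primary-site nodes lose access to the secondary-site storage, the secondary-site node loses access to the primary-site storage, and the PPRC links between the storage systems fail.
Expected result:
I/O pauses for 30 seconds and then resumes; no storage swap occurs.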
[Figure: topology diagram for this scenario. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_102_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_102_8876P2"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
dscli> lssi
Date/Time: 2 July 2015 04:40:03 PM IBM DSCLI Version: 7.7.10.289 DS:
Name   ID               Storage Unit     Model WWNN             State  ESSNet
==============================================================================
DS8877 IBM.2107-75CGR21 IBM.2107-75CGR20 961   5005076305FFD4A4 Online Enabled
dscli> lspprc B800 B900 BD00
Date/Time: 2 July 2015 04:31:12 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID        State     Reason                     Type         SourceLSS Timeout (secs) Critical Mode First Pass Status
====================================================================================================================
B800:B800 Suspended Internal Conditions Target Metro Mirror B8        60             Disabled      Invalid
B900:B900 Suspended Freeze                     Metro Mirror B9        60             Disabled      Invalid
BD00:BD00 Suspended Internal Conditions Target Metro Mirror BD        60             Disabled      Invalid
dscli> lspprcpath B8 B9 BD
Date/Time: 2 July 2015 04:39:55 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src Tgt State  SS   Port  Attached Port Tgt WWNN
========================================================
B8  B8  Failed FFB8                     5005076305FFD4AC
B9  B9  Failed FFB9                     5005076305FFD4AC
BD  BD  Failed FFBD I0134 I0333         5005076305FFD4AC
dscli> lssi
Date/Time: 2 July 2015 04:40:28 PM IBM DSCLI Version: 7.7.10.289 DS:
Name   ID               Storage Unit     Model WWNN             State  ESSNet
==============================================================================
DS8876 IBM.2107-75CHD91 IBM.2107-75CHD90 961   5005076305FFD4AC Online Enabled
dscli> lspprc B800 B900 BD00
Date/Time: 2 July 2015 04:31:15 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CHD91
ID        State              Reason Type         SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
B800:B800 Target Full Duplex        Metro Mirror B8        unknown        Disabled      Invalid
B900:B900 Target Full Duplex        Metro Mirror B9        unknown        Disabled      Invalid
BD00:BD00 Target Full Duplex        Metro Mirror BD        unknown        Disabled      Invalid
dscli> lspprcpath B8 B9 BD
Date/Time: 2 July 2015 04:31:05 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CHD91
Src Tgt State  SS   Port  Attached Port Tgt WWNN
========================================================
B8  B8  Failed FFB8 I0333 I0134         5005076305FFD4A4
B9  B9  Failed FFB9 I0333 I0134         5005076305FFD4A4
BD  BD  Failed FFBD I0333 I0134         5005076305FFD4A4
Result:
I/O paused for 30 seconds and then resumed; no storage swap occurred.
Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_102_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_102_8876P2"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Create the PPRC paths on DS8877:
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B3 -tgtlss B3 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B4 -tgtlss B4 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B8 -tgtlss B8 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B9 -tgtlss B9 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss BD -tgtlss BD -consistgrp I0134:I0333
Create the PPRC paths on DS8876:
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B3 -tgtlss B3 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B4 -tgtlss B4 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B8 -tgtlss B8 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B9 -tgtlss B9 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss BD -tgtlss BD -consistgrp I0333:I0134
Resynchronize the data incrementally on DS8877:
resumepprc -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -type mmir B300:B300 B400:B400 B800:B800 B900:B900 BD00:BD00
Finally, manually swap the resource group back to the primary storage DS8877.
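The path-creation commands in this recovery step differ only in the LSS, so they can be generated rather than typed by hand. The following is a minimal sketch that only builds the command strings; it does not talk to DSCLI. The function name `mkpprcpath_commands` is hypothetical, while the device IDs, WWNNs, LSS list and port pairs are the ones recorded in this section.

```python
def mkpprcpath_commands(dev, remotedev, remotewwnn, lss_list, port_pair):
    """Build one mkpprcpath command line per LSS (same source and target LSS)."""
    return [
        f"mkpprcpath -dev {dev} -remotedev {remotedev} -remotewwnn {remotewwnn} "
        f"-srclss {lss} -tgtlss {lss} -consistgrp {port_pair}"
        for lss in lss_list
    ]


LSS_LIST = ["B3", "B4", "B8", "B9", "BD"]

# Direction DS8877 -> DS8876, to be run on DS8877.
for command in mkpprcpath_commands(
    "IBM.2107-75CGR21", "IBM.2107-75CHD91", "5005076305FFD4AC", LSS_LIST, "I0134:I0333"
):
    print(command)

# Direction DS8876 -> DS8877, to be run on DS8876.
for command in mkpprcpath_commands(
    "IBM.2107-75CHD91", "IBM.2107-75CGR21", "5005076305FFD4A4", LSS_LIST, "I0333:I0134"
):
    print(command)
```

Generating both directions from one LSS list keeps the two sets of paths symmetric, which is what the later check for paths created in both directions relies on.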
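1.5 Primary Storage Failure
Fault description and expectation:
The primary storage fails. In the test, this is simulated by disabling the zones from all three partitions to DS8877 and the zone between DS8877 and DS8876.
Expected result:
The storage swaps to DS8876; application I/O pauses for 15-20 seconds and then continues.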
[Figure: topology diagram for this scenario. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage                     Secondary Storage
         state    path group   path group   WWNN                                WWNN
                  ID           ID
hdisk2   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk3   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk4   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk5   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk6   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
# lspprc -p hdisk2
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0           5005076305ffd4a4  0xb3  0x00  PRIMARY
1(s)        5005076305ffd4ac  0xb3  0x00  PRIMARY, SUSPENDED, OOS

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Failed   fscsi0  50050763051014a4,40b3400000000000
0          1     Failed   fscsi0  50050763050094a4,40b3400000000000
0          2     Failed   fscsi0  50050763054014a4,40b3400000000000
0          3     Failed   fscsi0  50050763050b14a4,40b3400000000000
0          4     Failed   fscsi0  50050763054b14a4,40b3400000000000
0          5     Failed   fscsi1  50050763051014a4,40b3400000000000
0          6     Failed   fscsi1  50050763050094a4,40b3400000000000
0          7     Failed   fscsi1  50050763054014a4,40b3400000000000
0          8     Failed   fscsi1  50050763050b14a4,40b3400000000000
0          9     Failed   fscsi1  50050763054b14a4,40b3400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40b3400000000000
#
dscli> lssi
Date/Time: 30 June 2015 03:22:59 PM IBM DSCLI Version: 7.7.10.289 DS:
Name   ID               Storage Unit     Model WWNN             State  ESSNet
==============================================================================
DS8877 IBM.2107-75CGR21 IBM.2107-75CGR20 961   5005076305FFD4A4 Online Enabled
dscli> lspprc B300 B400 B800 B900 BD00
Date/Time: 30 June 2015 03:22:11 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID        State       Reason Type         SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
B300:B300 Full Duplex -      Metro Mirror B3        60             Disabled      Invalid
B400:B400 Full Duplex -      Metro Mirror B4        60             Disabled      Invalid
B800:B800 Full Duplex -      Metro Mirror B8        60             Disabled      Invalid
B900:B900 Full Duplex -      Metro Mirror B9        60             Disabled      Invalid
BD00:BD00 Full Duplex -      Metro Mirror BD        60             Disabled      Invalid
dscli> lssi
Date/Time: 30 June 2015 03:23:55 PM IBM DSCLI Version: 7.7.10.289 DS:
Name   ID               Storage Unit     Model WWNN             State  ESSNet
==============================================================================
DS8876 IBM.2107-75CHD91 IBM.2107-75CHD90 961   5005076305FFD4AC Online Enabled
dscli> lspprc B300 B400 B800 B900 BD00
Date/Time: 30 June 2015 03:24:02 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CHD91
ID        State     Reason      Type         SourceLSS Timeout (secs) Critical Mode First Pass Status
=====================================================================================================
B300:B300 Suspended Host Source Metro Mirror B3        60             Disabled      Invalid
B400:B400 Suspended Host Source Metro Mirror B4        60             Disabled      Invalid
B800:B800 Suspended Host Source Metro Mirror B8        60             Disabled      Invalid
B900:B900 Suspended Host Source Metro Mirror B9        60             Disabled      Invalid
BD00:BD00 Suspended Host Source Metro Mirror BD        60             Disabled      Invalid
Result:
The storage swapped to DS8876; application I/O paused for 15-20 seconds and then continued.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Check whether the PPRC paths have all been created in both directions; create any missing paths manually.
Run pausepprc on DS8877 (if resync-action of usermg is set to auto, the following operations are not needed).
Then run failbackpprc on DS8876.
Finally, manually swap the storage back to the primary storage DS8877.
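1.6 Storage Replication Link Failure
Fault description and expectation:
The replication link between the two storage systems fails. In the test, this is simulated by disabling the zone between DS8877 and DS8876.
Expected result:
I/O pauses for 30 seconds and then resumes; no storage swap occurs.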
[Figure: topology diagram for this scenario. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
Result:
I/O paused for 30 seconds and then resumed; no storage swap occurred.
Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Create the PPRC paths.
Run resumepprc on DS8877.
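1.7 Primary Site Failure
Fault description and expectation:
The equipment at the primary site fails. In the test, this is simulated by powering off the App1 and App2 systems and failing the DS8877 storage.
Expected result:
The storage swaps to DS8876 and the resource group moves to the App3 node; the whole process takes about 2 minutes.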
[Figure: topology diagram for this scenario. Labels: Site1, Site2, App1, App2, App3, PowerHA Cluster, Tie-breaker Disk, SAN_1, SAN_2, Metro Mirror, DS8877 (Primary), DS8876]
Fault simulation:
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 2 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 3 --immed
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage                     Secondary Storage
         state    path group   path group   WWNN                                WWNN
                  ID           ID
hdisk2   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk3   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk4   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk5   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
hdisk6   Active   0, 1(s)      -1           5005076305ffd4a4,5005076305ffd4ac
# lspprc -p hdisk2
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0           5005076305ffd4a4  0xb3  0x00  PRIMARY
1(s)        5005076305ffd4ac  0xb3  0x00  PRIMARY, SUSPENDED

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Failed   fscsi0  50050763051014a4,40b3400000000000
0          1     Failed   fscsi0  50050763050094a4,40b3400000000000
0          2     Failed   fscsi0  50050763054014a4,40b3400000000000
0          3     Failed   fscsi0  50050763050b14a4,40b3400000000000
0          4     Failed   fscsi0  50050763054b14a4,40b3400000000000
0          5     Failed   fscsi1  50050763051014a4,40b3400000000000
0          6     Failed   fscsi1  50050763050094a4,40b3400000000000
0          7     Failed   fscsi1  50050763054014a4,40b3400000000000
0          8     Failed   fscsi1  50050763050b14a4,40b3400000000000
0          9     Failed   fscsi1  50050763054b14a4,40b3400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40b3400000000000
# lspprc -p hdisk5
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0           5005076305ffd4a4  0xb9  0x00  PRIMARY
1(s)        5005076305ffd4ac  0xb9  0x00  PRIMARY, SUSPENDED, OOS

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Failed   fscsi0  50050763051014a4,40b9400000000000
0          1     Failed   fscsi0  50050763050094a4,40b9400000000000
0          2     Failed   fscsi0  50050763054014a4,40b9400000000000
0          3     Failed   fscsi0  50050763050b14a4,40b9400000000000
0          4     Failed   fscsi0  50050763054b14a4,40b9400000000000
0          5     Failed   fscsi1  50050763051014a4,40b9400000000000
0          6     Failed   fscsi1  50050763050094a4,40b9400000000000
0          7     Failed   fscsi1  50050763054014a4,40b9400000000000
0          8     Failed   fscsi1  50050763050b14a4,40b9400000000000
0          9     Failed   fscsi1  50050763054b14a4,40b9400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40b9400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40b9400000000000
# df
Filesystem      512-blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           2097152   1516208   28%    11913     7% /
/dev/hd2           8388608   1802760   79%    63212    23% /usr
/dev/hd9var        2097152    627496   71%     7586     9% /var
/dev/hd3          20971520  14417496   32%     2886     1% /tmp
/dev/hd1           8388608   3846560   55%   430347    50% /home
/dev/hd11admin     1048576   1047696    1%        5     1% /admin
/proc                    -         -    -         -     -  /proc
/dev/hd10opt       4194304   3338184   21%     9120     3% /opt
/dev/livedump      1048576   1045624    1%        8     1% /var/adm/ras/livedump
/dev/lvora1       31457280  13321072   58%    21073     2% /oracle
/aha                     -         -    -        54     1% /aha
/dev/lvoradata    62914560  61224472    3%       17     1% /oracle/oradata
# hostname
p7502903
Result:
The storage swapped to DS8876 and the resource group moved to the App3 node; the whole process took about 2 minutes.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Create the PPRC paths on DS8877:
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B3 -tgtlss B3 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B4 -tgtlss B4 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B8 -tgtlss B8 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B9 -tgtlss B9 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss BD -tgtlss BD -consistgrp I0134:I0333
Create the PPRC paths on DS8876:
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B3 -tgtlss B3 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B4 -tgtlss B4 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B8 -tgtlss B8 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B9 -tgtlss B9 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss BD -tgtlss BD -consistgrp I0333:I0134
failbackpprc -remotedev IBM.2107-75CGR21 -type mmir B300:B300 B400:B400 B800:B800 B900:B900 BD00:BD00
failbackpprc -remotedev IBM.2107-75CGR21 -type mmir BD00:BD00
resumepprc -remotedev IBM.2107-75CGR21 -type mmir BD00:BD00
resumepprc -remotedev IBM.2107-75CGR21 -type mmir B300:B300
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o on --id 2
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o on --id 3
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 4 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o on --id 4
1.8 Common Scripts
1.8.1 AIX
1.8.1.1 Check the hdisk-to-LUN mapping
for i in 2 3 4 5 6 12
do
echo hdisk$i
lscfg -vpl hdisk$i|egrep "Serial Number|Z7"
echo "\n"
done
hdisk2
Serial Number...............75CGR21B
Device Specific.(Z7)........B300
hdisk3
Serial Number...............75CGR21B
Device Specific.(Z7)........B400
hdisk4
Serial Number...............75CGR21B
Device Specific.(Z7)........B800
hdisk5
Serial Number...............75CGR21B
Device Specific.(Z7)........B900
hdisk6
Serial Number...............75CGR21B
Device Specific.(Z7)........BD00
hdisk12
Serial Number...............75CGR21E
Device Specific.(Z7)........E204
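The loop output above is easy to read but awkward to compare against the DSCLI volume IDs. As an illustration, the sketch below turns captured output of that loop into a dictionary keyed by hdisk. The function name `parse_lun_map` is hypothetical, and the parser assumes the exact three-line layout shown in this record (hdisk name, `Serial Number`, `Device Specific.(Z7)`).

```python
import re


def parse_lun_map(text):
    """Parse the hdisk / Serial Number / Z7 triples printed by the loop above."""
    mapping = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"hdisk\d+", line):
            current = line
            mapping[current] = {}
        elif current and line.startswith("Serial Number"):
            # The value follows the run of filler dots.
            mapping[current]["serial"] = line.rsplit(".", 1)[-1]
        elif current and "(Z7)" in line:
            mapping[current]["volume_id"] = line.rsplit(".", 1)[-1]
    return mapping


SAMPLE = """\
hdisk2
Serial Number...............75CGR21B
Device Specific.(Z7)........B300

hdisk12
Serial Number...............75CGR21E
Device Specific.(Z7)........E204
"""

for disk, info in parse_lun_map(SAMPLE).items():
    print(disk, info["serial"], info["volume_id"])
```

The `volume_id` values (for example B300) are the same IDs used as `lspprc` arguments in the DSCLI sections of this record.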
1.8.1.2 Check the HyperSwap status of the hdisks
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk2   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk3   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk4   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk5   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk6   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
# lspprc -p hdisk2
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0           5005076305ffd4a4  0xb3  0x00  SECONDARY
1(s)        5005076305ffd4ac  0xb3  0x00  PRIMARY

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Enabled  fscsi0  50050763051014a4,40b3400000000000
0          1     Enabled  fscsi0  50050763050094a4,40b3400000000000
0          2     Enabled  fscsi0  50050763054014a4,40b3400000000000
0          3     Enabled  fscsi0  50050763050b14a4,40b3400000000000
0          4     Enabled  fscsi0  50050763054b14a4,40b3400000000000
0          5     Enabled  fscsi1  50050763051014a4,40b3400000000000
0          6     Enabled  fscsi1  50050763050094a4,40b3400000000000
0          7     Enabled  fscsi1  50050763054014a4,40b3400000000000
0          8     Enabled  fscsi1  50050763050b14a4,40b3400000000000
0          9     Enabled  fscsi1  50050763054b14a4,40b3400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40b3400000000000
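When checking whether a swap has taken place, the column of interest in `lspprc -Ao` is the primary storage WWNN. The sketch below is one way to summarize it per hdisk. It assumes each data row is on a single line with six whitespace-separated columns, as in a normal (non-degraded) state; rows in the degraded form, where both WWNNs share one column, are skipped. The function name `primary_storage_by_disk` is hypothetical; the WWNN-to-name mapping comes from the `lssi` output in this record.

```python
import re

# WWNNs as reported by "lssi" in this record.
WWNN_TO_NAME = {
    "5005076305ffd4a4": "DS8877",
    "5005076305ffd4ac": "DS8876",
}


def primary_storage_by_disk(lspprc_ao_output):
    """Map each hdisk to the storage system that currently holds its primary copy."""
    result = {}
    for line in lspprc_ao_output.splitlines():
        fields = line.split()
        if len(fields) != 6 or not re.fullmatch(r"hdisk\d+", fields[0]):
            continue  # header lines and anything that is not a data row
        primary_wwnn = fields[4].lower()
        if not re.fullmatch(r"[0-9a-f]{16}", primary_wwnn):
            continue  # degraded rows do not carry a single primary WWNN here
        result[fields[0]] = WWNN_TO_NAME.get(primary_wwnn, primary_wwnn)
    return result


SAMPLE = """\
hdisk2 Active 1(s) 0 5005076305ffd4ac 5005076305ffd4a4
hdisk3 Active 1(s) 0 5005076305ffd4ac 5005076305ffd4a4
"""
print(primary_storage_by_disk(SAMPLE))
```

If every disk in a mirror group reports the same storage name, the group is consistently on one side; a mix would be worth investigating before any manual swap.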
1.8.1.3 Check the PPRC path configuration of a HyperSwap disk
# lspprc -c hdisk2
Displaying all paths between LSS B3 and LSS B3
Source                           Target
WWNN             SSID LSS Port   WWNN             SSID LSS Port State
===================================================================
5005076305FFD4AC FFB3 B3  0333   5005076305FFD4A4 FFB3 B3  0134 Up
5005076305FFD4A4 FFB3 B3  0134   5005076305FFD4AC FFB3 B3  0333 Up
1.8.2 PowerHA
1.8.2.1 Check the resource group status
# clRGinfo
-----------------------------------------------------------------------------
Group Name     State              Node
-----------------------------------------------------------------------------
userRG         ONLINE             p7502901@site1
               OFFLINE            p7502902@site1
               ONLINE SECONDARY   p7502903@site2
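During failover tests it is convenient to extract just the node on which the resource group is primary-online. The following sketch does that from captured `clRGinfo` text. The function name `online_nodes` is hypothetical, and the parser assumes the three-column layout shown in this record, where the node is the last field and `ONLINE SECONDARY` marks the secondary instance.

```python
def online_nodes(clrginfo_output):
    """Return the nodes whose state is exactly ONLINE (not ONLINE SECONDARY)."""
    nodes = []
    for line in clrginfo_output.splitlines():
        fields = line.split()
        # A data row ends with "<state words> <node>@<site>".
        if len(fields) >= 2 and "@" in fields[-1] and fields[-2] == "ONLINE":
            nodes.append(fields[-1])
    return nodes


SAMPLE = """\
Group Name     State              Node
userRG         ONLINE             p7502901@site1
               OFFLINE            p7502902@site1
               ONLINE SECONDARY   p7502903@site2
"""
print(online_nodes(SAMPLE))
```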
1.8.2.2 Check the cluster manager processes
# clshowsrv -v
Status of the RSCT subsystems used by PowerHA SystemMirror:
Subsystem         Group            PID          Status
 cthags           cthags           8060928      active
 ctrmc            rsct             6422726      active
Status of the PowerHA SystemMirror subsystems:
Subsystem         Group            PID          Status
 clstrmgrES       cluster          5439738      active
 clevmgrdES       cluster          10420264     active
Status of the CAA subsystems:
Subsystem         Group            PID          Status
 clcomd           caa              6095036      active
 clconfd          caa              4194506      active
1.8.3 DS8K
1.8.3.1 Check the microcode version
dscli> ver -l
Date/Time: 1 July 2015 09:26:05 AM IBM DSCLI Version: 7.7.10.289 DS:
DSCLI          7.7.10.289
StorageManager 7.7.7.0.20140929.1
================Version=================
Storage Image    LMC
===========================
IBM.2107-75CGR21 7.7.40.364
1.8.3.2 Check the ID, WWNN, etc.
dscli> lssi
Date/Time: 1 July 2015 09:26:22 AM IBM DSCLI Version: 7.7.10.289 DS:
Name   ID               Storage Unit     Model WWNN             State  ESSNet
==============================================================================
DS8877 IBM.2107-75CGR21 IBM.2107-75CGR20 961   5005076305FFD4A4 Online Enabled
1.8.3.3 Check the PPRC status
dscli> lspprc B300 B400 B800 B900 BD00
Date/Time: 1 July 2015 09:27:29 AM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID        State              Reason Type         SourceLSS Timeout (secs) Critical Mode First Pass Status
=========================================================================================================
B300:B300 Target Full Duplex -      Metro Mirror B3        unknown        Disabled      Invalid
B400:B400 Target Full Duplex -      Metro Mirror B4        unknown        Disabled      Invalid
B800:B800 Target Full Duplex -      Metro Mirror B8        unknown        Disabled      Invalid
B900:B900 Target Full Duplex -      Metro Mirror B9        unknown        Disabled      Invalid
BD00:BD00 Target Full Duplex -      Metro Mirror BD        unknown        Disabled      Invalid
1.8.3.4 Check the PPRC path status and configuration
dscli> lspprcpath B3 B4 B8 B9 BD
Date/Time: 1 July 2015 09:28:25 AM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src Tgt State   SS   Port  Attached Port Tgt WWNN
=========================================================
B3  B3  Success FFB3 I0134 I0333         5005076305FFD4AC
B4  B4  Success FFB4 I0134 I0333         5005076305FFD4AC
B8  B8  Success FFB8 I0134 I0333         5005076305FFD4AC
B9  B9  Success FFB9 I0134 I0333         5005076305FFD4AC
BD  BD  Success FFBD I0134 I0333         5005076305FFD4AC
dscli> showlss b3
Date/Time: 1 July 2015 09:28:50 AM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID             B3
Group          1
addrgrp        B
stgtype        fb
confgvols      1
subsys         0xFFB3
pprcconsistgrp Enabled
xtndlbztimout  60 secs
resgrp         RG0
1.8.3.5 Check for a reserve lock
dscli> showfbvol -reserve b400
Date/Time: 1 July 2015 09:32:10 AM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
CMUN04003E showfbvol: Operation failure: internal error. Contact IBM technical support for assistance.
Name        HB_HPSW_Data
ID          B400
accstate    Online
datastate   Normal
configstate Normal
deviceMTM   2107-900
datatype    FB 512
1.8.4 SAN Switch
1.8.4.1 View the currently active zones
SAN768B-02:FID128:admin> cfgactvshow|grep D20151030_HB_HPSW_Node
zone: D20151030_HB_HPSW_Node1fcs0_DS8877
zone: D20151030_HB_HPSW_Node1fcs1_DS8877
zone: D20151030_HB_HPSW_Node1fcs2_DS8877
zone: D20151030_HB_HPSW_Node2fcs0_DS8877
zone: D20151030_HB_HPSW_Node2fcs1_DS8877
zone: D20151030_HB_HPSW_Node2fcs2_DS8877
zone: D20151030_HB_HPSW_Node3fcs0_DS8877
zone: D20151030_HB_HPSW_Node3fcs1_DS8877
zone: D20151030_HB_HPSW_Node3fcs2_DS8877
1.8.4.2 Remove zones
Example: simulating a primary storage failure:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
1.8.4.3 Add zones
Example: simulating recovery of the primary storage:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
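Every fault scenario in this record uses the same pattern: remove a set of zones from the `CSC_Base` configuration, save, and re-enable it, then add the same zones back to recover. A small generator, sketched below, keeps the remove and add lists identical and strips stray whitespace from zone names. The function name `zone_commands` is hypothetical; it only produces the command text and does not connect to the switch.

```python
def zone_commands(action, cfg, zones):
    """Return the switch command lines that add or remove zones from a configuration."""
    if action not in ("add", "remove"):
        raise ValueError("action must be 'add' or 'remove'")
    verb = "cfgadd" if action == "add" else "cfgremove"
    lines = [f'{verb} "{cfg}", "{zone.strip()}"' for zone in zones]
    # Save and re-enable the configuration so the change takes effect.
    lines.append("echo y | cfgSave")
    lines.append(f'echo y | cfgenable "{cfg}"')
    return lines


# Zones used in the "primary partition to primary storage" scenario (section 1.2).
APP1_TO_DS8877 = [
    "D20151030_HB_HPSW_Node1fcs0_DS8877",
    "D20151030_HB_HPSW_Node1fcs1_DS8877",
]

print("\n".join(zone_commands("remove", "CSC_Base", APP1_TO_DS8877)))
print("\n".join(zone_commands("add", "CSC_Base", APP1_TO_DS8877)))
```

Using one zone list for both directions avoids the situation where a zone is removed during fault injection but forgotten during recovery.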
2 A-A
2.1 Environment Description
2.1.1 Topology
[Figure: test topology. Labels: Site1, Site2, RAC1, RAC2, RAC3, PowerHA Cluster, NFS Quorum, SAN-11, SAN-12, SAN-21, SAN-22, Metro Mirror, DS8877 (Primary), DS8876]
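Description:
The test environment simulates three sites. Site1 has two RAC nodes and one storage system; Site2 has one RAC node and one storage system; Site3 is meant to host the third, NFS-based quorum device of Oracle RAC, which is temporarily placed at Site1 for now; the first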
2.1.2 Zone Information
Hosts to DS8877:
zone: D20151030_HB_HPSW_RAC1_fcs0_DS8877
zone: D20151030_HB_HPSW_RAC1_fcs1_DS8877
zone: D20151030_HB_HPSW_RAC2_fcs0_DS8877
zone: D20151030_HB_HPSW_RAC2_fcs1_DS8877
zone: D20151030_HB_HPSW_RAC3_fcs0_DS8877
zone: D20151030_HB_HPSW_RAC3_fcs1_DS8877
Hosts to DS8876:
zone: D20150730_GSB_HPSW_RAC1_8876P1
zone: D20150730_GSB_HPSW_RAC1_8876P2
zone: D20150730_GSB_HPSW_RAC2_8876P1
zone: D20150730_GSB_HPSW_RAC2_8876P2
zone: D20150730_GSB_HPSW_RAC3_8876P1
zone: D20150730_GSB_HPSW_RAC3_8876P2
Oracle RAC NFS quorum:
IP: 172.16.51.11
/dev/fslv00        4194304   3578608   15%        5     1% /tftpboot
# ls /tftpboot
lost+found nfs_vote
Metro Mirror between the storage systems:
D20150730_GSB_HPSW_PPRC_76to77
2.1.3 Host and IP Information
172.16.51.104   rac1
172.16.51.107   rac1-vip
172.16.50.104   rac1-priv
172.16.34.44    rac1_nfs
172.16.51.105   rac2
172.16.51.108   rac2-vip
172.16.50.105   rac2-priv
172.16.34.45    rac2_nfs
172.16.51.106   rac3
172.16.51.109   rac3-vip
172.16.50.106   rac3-priv
172.16.34.46    rac3_nfs
172.16.34.43    nfsserver
172.16.51.120   rac-scan
2.1.4 Disk Information
# lspv
hdisk0    00f8ba8109a1cacb    rootvg           active
hdisk1    00f8ba813257def7    None
hdisk2    00f8ba813257dd67    None
hdisk3    00f8ba813257dbc4    None
hdisk4    00f8ba813257f6df    None
hdisk5    00f8ba813257f550    caavg_private    active
hdisk6    none                None
hdisk13   none                None
hdisk18   none                None
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk1   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -p hdisk1
path        WWNN              LSS   VOL   path
group id                                  group status
=======================================================
0(s)        5005076305ffd4a4  0xc2  0x00  PRIMARY
1           5005076305ffd4ac  0xc2  0x00  SECONDARY

path       path  path     parent  connection
group id   id    status
=====================================================================
0          0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0          1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0          2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0          3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0          4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0          5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0          6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0          7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0          8     Missing  fscsi1  50050763050b14a4,40c2400000000000
0          9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1          10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1          11    Enabled  fscsi1  50050763051b94ac,40c2400000000000
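2.2 FC Path Failure from the RAC1 Partition to the Primary Storage
Fault description and expectation:
The links between the primary partition and the primary storage fail. In the test, this is simulated by disabling the zones from RAC1 to DS8877.
Expected result:
Storage swaps to DS8876; Oracle DB I/O pauses for 20 seconds and then resumes.
[Figure: topology diagram for this scenario. Labels: NFS Quorum; Site1; Site2; RAC1; RAC2; RAC3; PowerHA Cluster; SAN-11; SAN-12; SAN-21; SAN-22; Metro Mirror; DS8877 (Primary); DS8876]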
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk1   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                          Target
WWNN             SSID LSS Port  WWNN             SSID LSS Port State
===================================================================
5005076305FFD4A4 FFC2 C2 0134 5005076305FFD4AC FFC2 C2 0333 Up
5005076305FFD4AC FFC2 C2 0333 5005076305FFD4A4 FFC2 C2 0134 Up
# lspprc -p hdisk1
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0(s)       5005076305ffd4a4   0xc2   0x00   PRIMARY
1          5005076305ffd4ac   0xc2   0x00   SECONDARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Enabled   fscsi0   50050763051014a4,40c2400000000000
0          1      Enabled   fscsi0   50050763050094a4,40c2400000000000
0          2      Enabled   fscsi0   50050763054014a4,40c2400000000000
0          3      Missing   fscsi0   50050763050b14a4,40c2400000000000
0          4      Enabled   fscsi0   50050763054b14a4,40c2400000000000
0          5      Enabled   fscsi1   50050763051014a4,40c2400000000000
0          6      Enabled   fscsi1   50050763050094a4,40c2400000000000
0          7      Enabled   fscsi1   50050763054014a4,40c2400000000000
0          8      Missing   fscsi1   50050763050b14a4,40c2400000000000
0          9      Enabled   fscsi1   50050763054b14a4,40c2400000000000
1          10     Enabled   fscsi0   50050763051bd4ac,40c2400000000000
1          11     Enabled   fscsi1   50050763051b94ac,40c2400000000000
Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
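The fault injection follows a fixed pattern: one cfgremove per zone, then cfgSave and cfgenable, with the matching cfgadd sequence for recovery. A minimal Python sketch of generating those command lines, assuming the configuration name CSC_Base and the zone names from 2.1.2; zone_commands is a hypothetical helper that only builds text and does not connect to the switch.

```python
def zone_commands(action, zones, cfg="CSC_Base"):
    """Return the switch CLI lines that add or remove zones from a configuration.

    action is "cfgadd" or "cfgremove"; the trailing save/enable lines mirror
    the sequence used in this test record.
    """
    if action not in ("cfgadd", "cfgremove"):
        raise ValueError("action must be 'cfgadd' or 'cfgremove'")
    # strip() guards against stray spaces pasted in front of a zone name
    lines = ['%s "%s", "%s"' % (action, cfg, zone.strip()) for zone in zones]
    lines.append("echo y | cfgSave")
    lines.append('echo y | cfgenable "%s"' % cfg)
    return lines


if __name__ == "__main__":
    rac1_to_ds8877 = [
        "D20151030_HB_HPSW_RAC1_fcs0_DS8877",
        "D20151030_HB_HPSW_RAC1_fcs1_DS8877",
    ]
    # Fault injection, then the matching recovery sequence.
    print("\n".join(zone_commands("cfgremove", rac1_to_ds8877)))
    print("\n".join(zone_commands("cfgadd", rac1_to_ds8877)))
```

Generating both sequences from the same zone list keeps the recovery step symmetrical with the fault that was injected.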
Process screenshots: RAC1
Process screenshots: RAC2
Process screenshots: RAC3
# lspprc -p hdisk2
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0          5005076305ffd4a4   0xc3   0x00   SECONDARY
1(s)       5005076305ffd4ac   0xc3   0x00   PRIMARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Failed    fscsi0   50050763051014a4,40c3400000000000
0          1      Failed    fscsi0   50050763050094a4,40c3400000000000
0          2      Failed    fscsi0   50050763054014a4,40c3400000000000
0          3      Missing   fscsi0   50050763050b14a4,40c3400000000000
0          4      Failed    fscsi0   50050763054b14a4,40c3400000000000
0          5      Failed    fscsi1   50050763051014a4,40c3400000000000
0          6      Failed    fscsi1   50050763050094a4,40c3400000000000
0          7      Failed    fscsi1   50050763054014a4,40c3400000000000
0          8      Missing   fscsi1   50050763050b14a4,40c3400000000000
0          9      Failed    fscsi1   50050763054b14a4,40c3400000000000
1          10     Enabled   fscsi0   50050763051bd4ac,40c3400000000000
1          11     Enabled   fscsi1   50050763051b94ac,40c3400000000000
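The path states after a swap can be tallied per path group instead of being read row by row. A minimal Python sketch, assuming each path row of `lspprc -p` is available as one line with five fields (path group id, path id, status, parent, connection); summarize_paths is a hypothetical helper, and the sample rows are a subset of the hdisk2 values recorded in this section.

```python
from collections import Counter


def summarize_paths(text):
    """Count path statuses per path group from `lspprc -p` path rows.

    Only lines of the form 'group id status parent connection' are used;
    headers, separators and the group summary rows are ignored.
    """
    summary = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        group, status = int(fields[0]), fields[2]
        summary.setdefault(group, Counter())[status] += 1
    return summary


SAMPLE = """\
0 0 Failed fscsi0 50050763051014a4,40c3400000000000
0 3 Missing fscsi0 50050763050b14a4,40c3400000000000
0 5 Failed fscsi1 50050763051014a4,40c3400000000000
1 10 Enabled fscsi0 50050763051bd4ac,40c3400000000000
1 11 Enabled fscsi1 50050763051b94ac,40c3400000000000
"""

if __name__ == "__main__":
    for group, counts in sorted(summarize_paths(SAMPLE).items()):
        print(group, dict(counts))
```

A group whose paths are all Failed or Missing while the other group is Enabled is consistent with the host having lost access to that storage system only.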
Result:
Storage swapped to DS8876; Oracle DB I/O paused for 20 seconds and then resumed.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Then swap the storage back manually:
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'user' -m 'usermg' -o 'swap'
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'repository' -m 'repmg' -o 'swap'
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk1   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -p hdisk1
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0(s)       5005076305ffd4a4   0xc2   0x00   PRIMARY
1          5005076305ffd4ac   0xc2   0x00   SECONDARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Enabled   fscsi0   50050763051014a4,40c2400000000000
0          1      Enabled   fscsi0   50050763050094a4,40c2400000000000
0          2      Enabled   fscsi0   50050763054014a4,40c2400000000000
0          3      Missing   fscsi0   50050763050b14a4,40c2400000000000
0          4      Enabled   fscsi0   50050763054b14a4,40c2400000000000
0          5      Enabled   fscsi1   50050763051014a4,40c2400000000000
0          6      Enabled   fscsi1   50050763050094a4,40c2400000000000
0          7      Enabled   fscsi1   50050763054014a4,40c2400000000000
0          8      Missing   fscsi1   50050763050b14a4,40c2400000000000
0          9      Enabled   fscsi1   50050763054b14a4,40c2400000000000
1          10     Enabled   fscsi0   50050763051bd4ac,40c2400000000000
1          11     Enabled   fscsi1   50050763051b94ac,40c2400000000000
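2.3 FC Path Failure from the Primary Partition to All Storage
Fault description and expectation:
The links between the primary partition and all storage fail. In the test, this is simulated by disabling all zones from RAC1 to DS8877 and DS8876.
Expected result:
Storage does not swap; the RAC1 node is rebooted; I/O on RAC2 and RAC3 pauses for about 1 minute and then resumes.
[Figure: topology diagram for this scenario. Labels: NFS Quorum; Site1; Site2; RAC1; RAC2; RAC3; PowerHA Cluster; SAN-11; SAN-12; SAN-21; SAN-22; Metro Mirror; DS8877 (Primary); DS8876]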
Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
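After the fault, RAC1 loses access to all storage, including the repository disk. Because repos_mode=a was configured earlier, the node is crashed. According to the logs, the RAC1 node crashed within about 20 seconds of the fault.
Note: the fault occurred at about 17:09:40 and the crash at 17:10:00.
The following is the dump information from the RAC1 node: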
(4)> stat
SYSTEM_CONFIGURATION:
CHRP_SMP_PCI POWER_PC POWER_7 machine with 12 available CPU(s) (64-bit registers)
SYSTEM STATUS:
sysname... AIX
nodename.. rac1
release... 1
version... 7
build date Oct 15 2014
build time 12:38:00
label..... 1441D_71Q
machine... 00F8BA814C00
nid....... F8BA814C
time of crash: Mon Jul 6 17:10:00 2015
age of system: 4 day, 6 hr., 9 min., 1 sec.
xmalloc debug: enabled
FRRs active... 0
FRRs started.. 0
PANIC MESSAGES:
Lost access to cluster repository disk.
PANIC STRING:
Lost access to cluster repository disk.
Process screenshots:
RAC2:
RAC3:
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk1   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -p hdisk1
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0(s)       5005076305ffd4a4   0xc2   0x00   PRIMARY
1          5005076305ffd4ac   0xc2   0x00   SECONDARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Enabled   fscsi0   50050763051014a4,40c2400000000000
0          1      Enabled   fscsi0   50050763050094a4,40c2400000000000
0          2      Enabled   fscsi0   50050763054014a4,40c2400000000000
0          3      Missing   fscsi0   50050763050b14a4,40c2400000000000
0          4      Enabled   fscsi0   50050763054b14a4,40c2400000000000
0          5      Enabled   fscsi1   50050763051014a4,40c2400000000000
0          6      Enabled   fscsi1   50050763050094a4,40c2400000000000
0          7      Enabled   fscsi1   50050763054014a4,40c2400000000000
0          8      Missing   fscsi1   50050763050b14a4,40c2400000000000
0          9      Enabled   fscsi1   50050763054b14a4,40c2400000000000
1          10     Enabled   fscsi0   50050763051bd4ac,40c2400000000000
1          11     Enabled   fscsi1   50050763051b94ac,40c2400000000000
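Oracle alert log on the RAC2 node:
After the RAC1 node crashes, RAC2 and RAC3 reconfigure the cluster, which takes misscount = 45 seconds.
Reconfiguration therefore completes about 65 s (20 + 45) after the fault, and I/O resumes.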
2015-07-06 17:10:23.490:
[cssd(15335570)]CRS-1612:Network communication with node rac1 (1) missing for 50% of
timeout interval. Removal of this node from cluster in 21.733 seconds
2015-07-06 17:10:34.504:
[cssd(15335570)]CRS-1611:Network communication with node rac1 (1) missing for 75% of
timeout interval. Removal of this node from cluster in 10.719 seconds
2015-07-06 17:10:41.509:
[cssd(15335570)]CRS-1610:Network communication with node rac1 (1) missing for 90% of
timeout interval. Removal of this node from cluster in 3.715 seconds
2015-07-06 17:10:45.226:
[cssd(15335570)]CRS-1632:Node rac1 is being removed from the cluster in cluster incarnation
331664508
2015-07-06 17:10:45.257:
[cssd(15335570)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac2 rac3 .
2015-07-06 17:10:45.278:
[ctssd(9175164)]CRS-2407:The new Cluster Time Synchronization Service reference node is host
rac2.
2015-07-06 17:10:49.827:
[crsd(16187422)]CRS-5504:Node down event reported for node 'rac1'.
2015-07-06 17:11:08.387:
[crsd(16187422)]CRS-2773:Server 'rac1' has been removed from pool 'Generic'.
2015-07-06 17:11:08.393:
[crsd(16187422)]CRS-2773:Server 'rac1' has been removed from pool 'ora.orcl'.
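The 20 + 45 estimate can be cross-checked against the recorded timestamps. A small Python sketch, assuming the approximate fault time of 17:09:40 noted earlier in this section, the crash time from the RAC1 dump, and the CRS-1601 timestamp from the RAC2 log; elapsed_seconds is a hypothetical helper.

```python
from datetime import datetime


def elapsed_seconds(start, end):
    """Seconds between two timestamps written as 'YYYY-MM-DD HH:MM:SS[.mmm]'."""
    def parse(stamp):
        fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in stamp else "%Y-%m-%d %H:%M:%S"
        return datetime.strptime(stamp, fmt)
    return (parse(end) - parse(start)).total_seconds()


if __name__ == "__main__":
    fault = "2015-07-06 17:09:40"             # approximate, from the note in this section
    crash = "2015-07-06 17:10:00"             # time of crash in the RAC1 dump
    reconfigured = "2015-07-06 17:10:45.257"  # CRS-1601 on RAC2
    print("fault -> crash:        %.1f s" % elapsed_seconds(fault, crash))
    print("crash -> reconfigured: %.1f s" % elapsed_seconds(crash, reconfigured))
    print("fault -> reconfigured: %.1f s" % elapsed_seconds(fault, reconfigured))
```

With these values the three intervals come out at roughly 20 s, 45 s and 65 s, in line with the estimate above.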
Result:
Storage did not swap; the RAC1 node was rebooted; I/O on RAC2 and RAC3 paused for about 1 minute and then resumed.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Reboot the RAC1 node. After the reboot, RAC1 automatically rejoins the CAA cluster; then start the Oracle and PowerHA services manually.
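2.4 Failure of All FC Links Between the Data Centers
Fault description and expectation:
All cross-site FC access is interrupted.
Expected result:
RAC3 is crashed; I/O on RAC1 and RAC2 pauses for 65 seconds and then resumes; storage does not swap.
[Figure: topology diagram for this scenario. Labels: NFS Quorum; Site1; Site2; RAC1; RAC2; RAC3; PowerHA Cluster; SAN-11; SAN-12; SAN-21; SAN-22; Metro Mirror; DS8877 (Primary); DS8876]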
Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P2"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
RAC2:
# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
 2. ONLINE   473861c8d4ff4f17bf226f031c32c575 (/dev/rhdisk6) [VOTEDG]
Located 2 voting disk(s).
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk1   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk2   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk3   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk4   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
hdisk5   Active   0(s)         1            5005076305ffd4a4   5005076305ffd4ac
# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                          Target
WWNN             SSID LSS Port  WWNN             SSID LSS Port State
===================================================================
5005076305FFD4A4 FFC2 C2 0134 5005076305FFD4AC FFC2 C2 0333 Up
# lspprc -p hdisk1
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0(s)       5005076305ffd4a4   0xc2   0x00   PRIMARY
1          5005076305ffd4ac   0xc2   0x00   SECONDARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Enabled   fscsi0   50050763051014a4,40c2400000000000
0          1      Enabled   fscsi0   50050763050094a4,40c2400000000000
0          2      Enabled   fscsi0   50050763054014a4,40c2400000000000
0          3      Enabled   fscsi0   50050763050b14a4,40c2400000000000
0          4      Enabled   fscsi0   50050763054b14a4,40c2400000000000
0          5      Enabled   fscsi1   50050763051014a4,40c2400000000000
0          6      Enabled   fscsi1   50050763050094a4,40c2400000000000
0          7      Enabled   fscsi1   50050763054014a4,40c2400000000000
0          8      Enabled   fscsi1   50050763050b14a4,40c2400000000000
0          9      Enabled   fscsi1   50050763054b14a4,40c2400000000000
1          10     Failed    fscsi0   50050763051bd4ac,40c2400000000000
1          11     Failed    fscsi1   50050763051b94ac,40c2400000000000
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00059:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:24:51.774:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:24:55.794:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:24:59.814:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:03.834:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:07.854:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:11.877:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:15.895:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:16.893:
[cssd(10748108)]CRS-1612:Network communication with node rac3 (3) missing for 50% of
timeout interval. Removal of this node from cluster in 22.089 seconds
2015-07-06 18:25:17.704:
[cssd(10748108)]CRS-1604:CSSD voting file is offline: /dev/rhdisk18; details at (:CSSNM00058:) in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:19.924:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:23.959:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:27.909:
[cssd(10748108)]CRS-1611:Network communication with node rac3 (3) missing for 75% of
timeout interval. Removal of this node from cluster in 11.073 seconds
2015-07-06 18:25:27.987:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:32.015:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:34.917:
[cssd(10748108)]CRS-1610:Network communication with node rac3 (3) missing for 90% of
timeout interval. Removal of this node from cluster in 4.064 seconds
2015-07-06 18:25:36.035:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:38.984:
[cssd(10748108)]CRS-1632:Node rac3 is being removed from the cluster in cluster incarnation
331671385
2015-07-06 18:25:38.993:
[cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2015-07-06 18:25:39.022:
[crsd(8323192)]CRS-5504:Node down event reported for node 'rac3'.
2015-07-06 18:25:41.016:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:41.723:
[crsd(8323192)]CRS-2773:Server 'rac3' has been removed from pool 'Generic'.
2015-07-06 18:25:41.724:
[crsd(8323192)]CRS-2773:Server 'rac3' has been removed from pool 'ora.orcl'.
2015-07-06 18:25:42.232:
[cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-06 18:25:42.238:
[cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2
Result:
RAC3 was crashed; I/O on RAC1 and RAC2 paused for 65 seconds and then resumed.
Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P2"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgenable "CSC_Base"
Create the PPRC paths on DS8877:
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B3 -tgtlss B3 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B4 -tgtlss B4 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B8 -tgtlss B8 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B9 -tgtlss B9 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss BD -tgtlss BD -consistgrp I0134:I0333
Create the PPRC paths on DS8876:
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B3 -tgtlss B3 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B4 -tgtlss B4 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B8 -tgtlss B8 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B9 -tgtlss B9 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss BD -tgtlss BD -consistgrp I0333:I0134
Resynchronize the data incrementally, on DS8877:
resumepprc -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -type mmir B300:B300 B400:B400 B800:B800 B900:B900 BD00:BD00
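The mkpprcpath commands differ only in the LSS number and in the direction, so both sets can be generated from one LSS list. A minimal Python sketch, assuming the storage image IDs, WWNNs and I/O port pair that appear in this record; pprcpath_commands is a hypothetical helper that only builds DSCLI command text and does not run it.

```python
def pprcpath_commands(dev, remote_dev, remote_wwnn, lss_list, ports):
    """Build one mkpprcpath command line per LSS (source LSS == target LSS)."""
    template = ("mkpprcpath -dev {dev} -remotedev {remote} -remotewwnn {wwnn} "
                "-srclss {lss} -tgtlss {lss} -consistgrp {ports}")
    return [template.format(dev=dev, remote=remote_dev, wwnn=remote_wwnn,
                            lss=lss, ports=ports)
            for lss in lss_list]


if __name__ == "__main__":
    lss_list = ["B3", "B4", "B8", "B9", "BD"]  # LSS list as given in this record
    # Paths created on DS8877, towards DS8876.
    for cmd in pprcpath_commands("IBM.2107-75CGR21", "IBM.2107-75CHD91",
                                 "5005076305FFD4AC", lss_list, "I0134:I0333"):
        print(cmd)
    # Paths created on DS8876, towards DS8877.
    for cmd in pprcpath_commands("IBM.2107-75CHD91", "IBM.2107-75CGR21",
                                 "5005076305FFD4A4", lss_list, "I0333:I0134"):
        print(cmd)
```

Passing the LSS list as a parameter makes it easy to regenerate the commands for whichever LSSs the mirrored volumes actually use.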
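Result:
2.5 Primary Storage Failure
Fault description and expectation:
DS8877 fails.
Expected result:
Storage swaps to DS8876; database I/O pauses for 30 seconds and then resumes.
[Figure: topology diagram for this scenario. Labels: NFS Quorum; Site1; Site2; RAC1; RAC2; RAC3; PowerHA Cluster; SAN-11; SAN-12; SAN-21; SAN-22; Metro Mirror; DS8877 (Primary); DS8876]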
Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
RAC1:
RAC2:
RAC3:
Oracle log:
2015-07-07 13:52:30.413:
[cssd(10748108)]CRS-1605:CSSD voting file is online: /dev/rhdisk13; details in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:52:30.413:
[cssd(10748108)]CRS-1604:CSSD voting file is offline: /dev/rhdisk6; details at (:CSSNM00069:) in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:52:30.413:
[cssd(10748108)]CRS-1604:CSSD voting file is offline: /nfsvote3/nfs_vote; details at
(:CSSNM00069:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:52:30.413:
[cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-07 13:52:30.424:
[cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 rac3 .
2015-07-07 13:55:52.884:
[cssd(10748108)]CRS-1605:CSSD voting file is online: /nfsvote3/nfs_vote; details in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884:
[cssd(10748108)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884:
[cssd(10748108)]CRS-1605:CSSD voting file is online: /dev/rhdisk18; details in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884:
[cssd(10748108)]CRS-1604:CSSD voting file is offline: /dev/rhdisk13; details at (:CSSNM00069:) in
/u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884:
[cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-07 13:55:52.893:
[cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 rac3 .
2015-07-07 14:07:07.338:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 14:07:07.338:
[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00059:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 14:07:11.408:
[cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-07 14:07:11.417:
[cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 rac3 .
# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0eec82519af84f8bbf3ae0f365ec3148 (/nfsvote3/nfs_vote) [VOTEDG]
 2. ONLINE   321e9f400a774f98bf8f9d6178c91ba6 (/dev/rhdisk18) [VOTEDG]
Located 2 voting disk(s).
#
# lspprc -Ao
hdisk#   PPRC     Primary      Secondary    Primary Storage    Secondary Storage
         state    path group   path group   WWNN               WWNN
                  ID           ID
hdisk1   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk2   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk3   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk4   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
hdisk5   Active   1(s)         0            5005076305ffd4ac   5005076305ffd4a4
# lspprc -p hdisk1
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0          5005076305ffd4a4   0xc2   0x00   SECONDARY
1(s)       5005076305ffd4ac   0xc2   0x00   PRIMARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Failed    fscsi0   50050763051014a4,40c2400000000000
0          1      Failed    fscsi0   50050763050094a4,40c2400000000000
0          2      Failed    fscsi0   50050763054014a4,40c2400000000000
0          3      Failed    fscsi0   50050763050b14a4,40c2400000000000
0          4      Failed    fscsi0   50050763054b14a4,40c2400000000000
0          5      Failed    fscsi1   50050763051014a4,40c2400000000000
0          6      Failed    fscsi1   50050763050094a4,40c2400000000000
0          7      Failed    fscsi1   50050763054014a4,40c2400000000000
0          8      Failed    fscsi1   50050763050b14a4,40c2400000000000
0          9      Failed    fscsi1   50050763054b14a4,40c2400000000000
1          10     Enabled   fscsi0   50050763051bd4ac,40c2400000000000
1          11     Enabled   fscsi1   50050763051b94ac,40c2400000000000
# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                          Target
WWNN             SSID LSS Port  WWNN             SSID LSS Port State
===================================================================
5005076305FFD4AC FFC2 C2 0333 5005076305FFD4A4 FFC2 C2 0134 Up
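Whether a swap has taken place can be read from the Primary Storage WWNN column of `lspprc -Ao`. A minimal Python sketch, assuming each disk row is available as one line with six fields, and assuming that WWNN 5005076305ffd4a4 belongs to DS8877 and 5005076305ffd4ac to DS8876, as the outputs in this record indicate; primary_storage is a hypothetical helper.

```python
WWNN_TO_STORAGE = {
    "5005076305ffd4a4": "DS8877",  # assumption drawn from the outputs in this record
    "5005076305ffd4ac": "DS8876",
}


def primary_storage(text):
    """Map each hdisk in `lspprc -Ao` rows to the storage holding its primary copy."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows: hdiskN, PPRC state, primary group, secondary group, two WWNNs.
        if len(fields) == 6 and fields[0].startswith("hdisk") and fields[0] != "hdisk#":
            wwnn = fields[4].lower()
            result[fields[0]] = WWNN_TO_STORAGE.get(wwnn, wwnn)
    return result


SAMPLE = """\
hdisk1 Active 1(s) 0 5005076305ffd4ac 5005076305ffd4a4
hdisk2 Active 1(s) 0 5005076305ffd4ac 5005076305ffd4a4
"""

if __name__ == "__main__":
    print(primary_storage(SAMPLE))
```

Unknown WWNNs are returned unchanged, so a disk mirrored to a different storage pair shows up with its raw WWNN instead of a wrong name.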
Result:
Storage swapped to DS8876; database I/O paused for 30 seconds and then resumed.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
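Check whether the PPRC paths have all been re-created in both directions; if not, create the missing ones manually.
Then resynchronize the data manually (from DS8876 to DS8877).
2.6 Storage Replication Link Failure
Fault description and expectation:
The inter-site PPRC link fails. In the test, this is simulated by disabling the zone between DS8877 and DS8876.
Expected result:
Database I/O pauses for 30 seconds and then resumes; storage does not swap.
[Figure: topology diagram for this scenario. Labels: NFS Quorum; Site1; Site2; RAC1; RAC2; RAC3; PowerHA Cluster; SAN-11; SAN-12; SAN-21; SAN-22; Metro Mirror; DS8877 (Primary); DS8876]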
Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
Process screenshots:
RAC1:
# lspprc -p hdisk1
path       WWNN               LSS    VOL    path
group id                                    group status
=======================================================
0(s)       5005076305ffd4a4   0xc2   0x00   PRIMARY, SUSPENDED, OOS
1          5005076305ffd4ac   0xc2   0x00   SECONDARY
path       path   path      parent   connection
group id   id     status
=====================================================================
0          0      Enabled   fscsi0   50050763051014a4,40c2400000000000
0          1      Enabled   fscsi0   50050763050094a4,40c2400000000000
0          2      Enabled   fscsi0   50050763054014a4,40c2400000000000
0          3      Enabled   fscsi0   50050763050b14a4,40c2400000000000
0          4      Enabled   fscsi0   50050763054b14a4,40c2400000000000
0          5      Enabled   fscsi1   50050763051014a4,40c2400000000000
0          6      Enabled   fscsi1   50050763050094a4,40c2400000000000
0          7      Enabled   fscsi1   50050763054014a4,40c2400000000000
0          8      Enabled   fscsi1   50050763050b14a4,40c2400000000000
0          9      Enabled   fscsi1   50050763054b14a4,40c2400000000000
1          10     Enabled   fscsi0   50050763051bd4ac,40c2400000000000
1          11     Enabled   fscsi1   50050763051b94ac,40c2400000000000
# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                          Target
WWNN             SSID LSS Port  WWNN             SSID LSS Port State
===================================================================
5005076305FFD4AC FFC2 C2 0333 5005076305FFD4A4 FFC2 C2 0134 Down
dscli> lspprcpath C2 c3 C7 C8 CC
Date/Time: July 7, 2015 03:12:51 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src Tgt State  SS   Port Attached Port Tgt WWNN
=======================================================
C2  C2  Failed FFC2 -    -             5005076305FFD4AC
C3  C3  Failed FFC3 -    -             5005076305FFD4AC
C7  C7  Failed FFC7 -    -             5005076305FFD4AC
C8  C8  Failed FFC8 -    -             5005076305FFD4AC
CC  CC  Failed FFCC -    -             5005076305FFD4AC
Result:
Database I/O paused for 30 seconds and then resumed; storage did not swap.
Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"
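Create the PPRC paths.
Run resumepprc on DS8877.
2.7 Primary Site Failure
Fault description and expectation:
The primary site fails.
Expected result:
RAC3 takes over the workload; storage swaps; I/O pauses for 65 s.
[Figure: topology diagram for this scenario. Labels: NFS Quorum; Site1; Site2; RAC1; RAC2; RAC3; PowerHA Cluster; SAN-11; SAN-12; SAN-21; SAN-22; Metro Mirror; DS8877 (Primary); DS8876]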
Fault simulation:
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 11 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 12 --immed
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgenable "CSC_Base"
Process screenshots:
2015-07-07 16:05:08.830:
[cssd(12779668)]CRS-1612:Network communication with node rac1 (1) missing for 50% of
timeout interval. Removal of this node from cluster in 21.834 seconds
2015-07-07 16:05:09.428:
[cssd(12779668)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file
/dev/rhdisk6 will be considered not functional in 18703 milliseconds
2015-07-07 16:05:11.835:
[cssd(12779668)]CRS-1612:Network communication with node rac2 (2) missing for 50% of
timeout interval. Removal of this node from cluster in 21.967 seconds
2015-07-07 16:05:13.194:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00059:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:13.194:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:13.245:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:13.834:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:17.843:
[cssd(12779668)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file
/dev/rhdisk6 will be considered not functional in 10288 milliseconds
2015-07-07 16:05:17.983:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:19.845:
[cssd(12779668)]CRS-1611:Network communication with node rac1 (1) missing for 75% of
timeout interval. Removal of this node from cluster in 10.819 seconds
2015-07-07 16:05:22.484:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:22.849:
[cssd(12779668)]CRS-1611:Network communication with node rac2 (2) missing for 75% of
timeout interval. Removal of this node from cluster in 10.953 seconds
2015-07-07 16:05:24.841:
[cssd(12779668)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file
/dev/rhdisk6 will be considered not functional in 3290 milliseconds
2015-07-07 16:05:26.852:
[cssd(12779668)]CRS-1610:Network communication with node rac1 (1) missing for 90% of
timeout interval. Removal of this node from cluster in 3.812 seconds
2015-07-07 16:05:26.993:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:28.811:
[cssd(12779668)]CRS-1604:CSSD voting file is offline: /dev/rhdisk6; details at (:CSSNM00058:) in
/u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:29.853:
[cssd(12779668)]CRS-1610:Network communication with node rac2 (2) missing for 90% of
timeout interval. Removal of this node from cluster in 3.949 seconds
2015-07-07 16:05:31.323:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:33.802:
[cssd(12779668)]CRS-1632:Node rac1 is being removed from the cluster in cluster incarnation
331671404
2015-07-07 16:05:33.802:
[cssd(12779668)]CRS-1632:Node rac2 is being removed from the cluster in cluster incarnation
331671404
2015-07-07 16:05:33.806:
[cssd(12779668)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac3 .
2015-07-07 16:05:33.831:
[ctssd(12910816)]CRS-2407:The new Cluster Time Synchronization Service reference node is host
rac3.
2015-07-07 16:05:36.443:
[cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at
(:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:37.461:
[cssd(12779668)]CRS-1626:A Configuration change request completed successfully
2015-07-07 16:05:37.465:
[cssd(12779668)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac3 .
2015-07-07 16:05:42.397:
[crsd(7471164)]CRS-5504:Node down event reported for node 'rac1'.
2015-07-07 16:05:42.397:
[crsd(7471164)]CRS-5504:Node down event reported for node 'rac2'.
2015-07-07 16:05:44.005:
[client(29556942)]CRS-4743:File /u01/app/11.2.0.4/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was
updated from OCR(Size: 13365(New), 13378(Old) bytes)
2015-07-07 16:05:59.516:
[crsd(7471164)]CRS-2773:Server 'rac1' has been removed from pool 'Generic'.
2015-07-07 16:05:59.520:
[crsd(7471164)]CRS-2773:Server 'rac1' has been removed from pool 'ora.orcl'.
2015-07-07 16:05:59.520:
[crsd(7471164)]CRS-2773:Server 'rac2' has been removed from pool 'Generic'.
2015-07-07 16:05:59.521:
[crsd(7471164)]CRS-2773:Server 'rac2' has been removed from pool 'ora.orcl'.
Result:
RAC3 takes over the workload, the storage swaps, and I/O pauses for 65 s.
Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgenable "CSC_Base"
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 11 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 12 --immed
Restore the storage PPRC paths and resynchronize the data
Restore the voting disk configuration
Manually swap the storage back
2.8 Common scripts
2.8.1 AIX
2.8.1.1 Show the mapping between hdisks and LUNs
# for i in 1 2 3 4 5 6
do
echo hdisk$i
lscfg -vpl hdisk$i | egrep "Serial Number|Z7"
echo "\n"
done
hdisk1
Serial Number...............75CGR21C
Device Specific.(Z7)........C200
hdisk2
Serial Number...............75CGR21C
Device Specific.(Z7)........C300
hdisk3
Serial Number...............75CGR21C
Device Specific.(Z7)........C700
hdisk4
Serial Number...............75CGR21C
Device Specific.(Z7)........C800
hdisk5
Serial Number...............75CGR21C
Device Specific.(Z7)........CC00
hdisk6
Serial Number...............75CGR21E
Device Specific.(Z7)........E301
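The loop output above is easier to cross-check against the storage side once it is turned into a lookup table. The following is a minimal Python sketch, not part of the original test procedure: it assumes the output has been saved as text in exactly the three-line-per-disk layout shown above, and the function name parse_lscfg_loop is hypothetical. In this record the Z7 values for hdisk1 to hdisk5 match the volume IDs passed to dscli lspprc (C200, C300, C700, C800, CC00).

```python
import re

# Sample text copied from the loop output above (two disks only, for brevity).
SAMPLE = """\
hdisk1
Serial Number...............75CGR21C
Device Specific.(Z7)........C200
hdisk6
Serial Number...............75CGR21E
Device Specific.(Z7)........E301
"""


def parse_lscfg_loop(text):
    """Map each hdisk name to its serial number and Z7 value.

    Assumes the layout printed by the loop: a bare 'hdiskN' line followed
    by the 'Serial Number' and 'Device Specific.(Z7)' lines for that disk.
    """
    mapping = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"hdisk\d+", line):
            current = line
            mapping[current] = {"serial": None, "volume_id": None}
        elif current and line.startswith("Serial Number"):
            # The value is whatever follows the last dot of the dotted filler.
            mapping[current]["serial"] = line.rsplit(".", 1)[-1]
        elif current and "(Z7)" in line:
            mapping[current]["volume_id"] = line.rsplit(".", 1)[-1]
    return mapping


disks = parse_lscfg_loop(SAMPLE)
for name, info in disks.items():
    print(name, info["serial"], info["volume_id"])
```

Keying the table by hdisk name keeps it aligned with the lspprc output on the AIX side, which is also listed per hdisk.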
2.8.1.2 Check whether HyperSwap is enabled on each hdisk
# lspprc -Ao
hdisk#  PPRC    Primary     Secondary   Primary Storage   Secondary Storage
        state   path group  path group  WWNN              WWNN
                ID          ID
hdisk1  Active  0(s)        1           5005076305ffd4a4  5005076305ffd4ac
hdisk2  Active  0(s)        1           5005076305ffd4a4  5005076305ffd4ac
hdisk3  Active  0(s)        1           5005076305ffd4a4  5005076305ffd4ac
hdisk4  Active  0(s)        1           5005076305ffd4a4  5005076305ffd4ac
hdisk5  Active  0(s)        1           5005076305ffd4a4  5005076305ffd4ac
# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY
path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Missing  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000
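With twelve paths per disk, a path that is not Enabled is easy to overlook in the listing. The sketch below is illustrative only and assumes the path table of lspprc -p has been captured as text with one path per line in the five-column order shown above (path group id, path id, status, parent, connection); the function names are hypothetical.

```python
# Rows copied from the path table above (a subset), one path per line.
SAMPLE = """\
0 0 Enabled fscsi0 50050763051014a4,40c2400000000000
0 3 Missing fscsi0 50050763050b14a4,40c2400000000000
0 8 Missing fscsi1 50050763050b14a4,40c2400000000000
1 10 Enabled fscsi0 50050763051bd4ac,40c2400000000000
1 11 Enabled fscsi1 50050763051b94ac,40c2400000000000
"""


def parse_paths(text):
    """Return one dict per path row; header and separator lines are skipped."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # A data row has exactly five fields and starts with two integers.
        if len(parts) != 5 or not parts[0].isdigit() or not parts[1].isdigit():
            continue
        group, path_id, status, parent, connection = parts
        rows.append({
            "group": int(group),
            "path_id": int(path_id),
            "status": status,
            "parent": parent,
            "connection": connection,
        })
    return rows


def unhealthy_paths(rows):
    """Paths whose status is anything other than Enabled."""
    return [r for r in rows if r["status"] != "Enabled"]


paths = parse_paths(SAMPLE)
bad = unhealthy_paths(paths)
for r in bad:
    print(f"group {r['group']} path {r['path_id']} via {r['parent']}: {r['status']}")
```

In the listing above, both Missing entries (path IDs 3 and 8) point at the same target port, 50050763050b14a4, once through fscsi0 and once through fscsi1.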
2.8.1.3 Show the PPRC path configuration of a HyperSwap disk
# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                         Target
WWNN             SSID LSS Port WWNN             SSID LSS Port State
===================================================================
5005076305FFD4A4 FFC2 C2 0134 5005076305FFD4AC FFC2 C2 0333 Up
5005076305FFD4AC FFC2 C2 0333 5005076305FFD4A4 FFC2 C2 0134 Up
2.8.2 PowerHA
2.8.2.1 Show resource group status
# clRGinfo
-----------------------------------------------------------------------------
Group Name     State     Node
-----------------------------------------------------------------------------
racRG          ONLINE    rac1@site1
               ONLINE    rac2@site1
               ONLINE    rac3@site2
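After each failure test it is worth confirming that the resource group is ONLINE on every node before starting the next one. The sketch below is a rough illustration, not a PowerHA utility: it assumes clRGinfo output saved as text, single-word states, and node names in the node@site form seen above. Other clRGinfo layouts would need a different parser, and the function names are hypothetical.

```python
# Data rows as they appear in the clRGinfo output above.
SAMPLE = """\
racRG ONLINE rac1@site1
ONLINE rac2@site1
ONLINE rac3@site2
"""


def parse_clrginfo(text):
    """Return {group: [(state, node), ...]}.

    A three-field row starts a new group; a two-field row continues the
    previous group. Headers and separator lines match neither shape.
    """
    groups = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and "@" in parts[2]:
            current, state, node = parts
        elif len(parts) == 2 and "@" in parts[1] and current is not None:
            state, node = parts
        else:
            continue
        groups.setdefault(current, []).append((state, node))
    return groups


def not_online(groups):
    """List (group, node, state) for every instance that is not ONLINE."""
    return [(g, node, state)
            for g, members in groups.items()
            for state, node in members
            if state != "ONLINE"]


groups = parse_clrginfo(SAMPLE)
problems = not_online(groups)
print(groups)
print("all online" if not problems else problems)
```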
2.8.2.2 Show the cluster manager processes
# clshowsrv -v
Status of the RSCT subsystems used by PowerHA SystemMirror:
Subsystem         Group            PID          Status
 cthags           cthags           9502934      active
 ctrmc            rsct             6619346      active
Status of the PowerHA SystemMirror subsystems:
Subsystem         Group            PID          Status
 clstrmgrES       cluster          7602196      active
 clevmgrdES       cluster          7667716      active
Status of the CAA subsystems:
Subsystem         Group            PID          Status
 clcomd           caa              6357208      active
 clconfd          caa              8650866      active
2.8.3 DS8K
2.8.3.1 Show the microcode version
dscli> ver -l
Date/Time: 2015 年 7 月 1 日 上午 09 时 26 分 05 秒 IBM DSCLI Version: 7.7.10.289 DS:
DSCLI            7.7.10.289
StorageManager   7.7.7.0.20140929.1
================Version=================
Storage Image    LMC
===========================
IBM.2107-75CGR21 7.7.40.364
2.8.3.2 Show the storage image ID, WWNN, and related details
dscli> lssi
Date/Time: 2015 年 7 月 1 日 上午 09 时 26 分 22 秒 IBM DSCLI Version: 7.7.10.289 DS:
Name   ID               Storage Unit     Model WWNN             State  ESSNet
==============================================================================
DS8877 IBM.2107-75CGR21 IBM.2107-75CGR20 961   5005076305FFD4A4 Online Enabled
2.8.3.3 Show PPRC status
dscli> lspprc C200 C300 C700 C800 CC00
Date/Time: 2015 年 7 月 7 日 下午 05 时 27 分 50 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID        State       Reason Type         SourceLSS Timeout (secs) Critical Mode First Pass Status
==================================================================================================
C200:C200 Full Duplex -      Metro Mirror C2        60             Disabled      Invalid
C300:C300 Full Duplex -      Metro Mirror C3        60             Disabled      Invalid
C700:C700 Full Duplex -      Metro Mirror C7        60             Disabled      Invalid
C800:C800 Full Duplex -      Metro Mirror C8        60             Disabled      Invalid
CC00:CC00 Full Duplex -      Metro Mirror CC        60             Disabled      Invalid
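Before a manual swap back, every Metro Mirror pair should be back in Full Duplex. The following Python sketch shows one way to check that from saved dscli lspprc output. It assumes each pair is printed on a single line in the column order shown above (ID, State, Reason, Type, ...), and the helper names are hypothetical.

```python
import re

# Rows copied from the lspprc listing above (a subset), one pair per line.
SAMPLE = """\
C200:C200 Full Duplex - Metro Mirror C2 60 Disabled Invalid
C300:C300 Full Duplex - Metro Mirror C3 60 Disabled Invalid
CC00:CC00 Full Duplex - Metro Mirror CC 60 Disabled Invalid
"""

# Assumed row layout: "<src>:<tgt> <state> <reason> Metro Mirror ...".
# The state may contain spaces, so it is matched lazily up to the reason field.
ROW = re.compile(r"^([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\s+(.+?)\s+(\S+)\s+Metro Mirror\b")


def pair_states(text):
    """Return {(source, target): state} for every Metro Mirror row."""
    states = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            states[(m.group(1), m.group(2))] = m.group(3)
    return states


def not_in_sync(states):
    """Pairs whose state is anything other than Full Duplex."""
    return sorted(pair for pair, state in states.items() if state != "Full Duplex")


states = pair_states(SAMPLE)
pending = not_in_sync(states)
print("all pairs Full Duplex" if not pending else pending)
```

The check only compares against Full Duplex, so a multi-word reason field may be split imperfectly by the pattern without changing the verdict for that pair.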
2.8.3.4 Show PPRC path status and configuration
dscli> lspprcpath C2 c3 C7 C8 CC
Date/Time: 2015 年 7 月 7 日 下午 05 时 27 分 54 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src Tgt State   SS   Port  Attached Port Tgt WWNN
=========================================================
C2  C2  Success FFC2 I0134 I0333         5005076305FFD4AC
C3  C3  Success FFC3 I0134 I0333         5005076305FFD4AC
C7  C7  Success FFC7 I0134 I0333         5005076305FFD4AC
C8  C8  Success FFC8 I0134 I0333         5005076305FFD4AC
CC  CC  Success FFCC I0134 I0333         5005076305FFD4AC
dscli> showlss c2
Date/Time: 2015 年 7 月 7 日 下午 05 时 28 分 44 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID             C2
Group          0
addrgrp        C
stgtype        fb
confgvols      1
subsys         0xFFC2
pprcconsistgrp Enabled
xtndlbztimout  60 secs
resgrp         RG0
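2.8.4 Restoring the Oracle voting disk configuration
In some failure scenarios the Oracle voting disk group configuration changes: the group originally has three members
and loses one of them when the failure occurs. After the fault is repaired, the voting disk group configuration must be
restored manually. The steps below use a primary storage failure as the example.
Restoring the RAC voting disk settings:
Access to the voting disk on hdisk6 was lost during the test, so it must be restored manually before the next test.
Before the failure, the normal output is: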
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
2. ONLINE 473861c8d4ff4f17bf226f031c32c575 (/dev/rhdisk6) [VOTEDG]
3. ONLINE 4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG]
Located 3 voting disk(s).
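hdisk6 comes from the primary storage, DS8877
hdisk18 comes from the secondary storage, DS8876
/nfsvote3/nfs_vote comes from the NFS server at the third site
hdisk13 backs the temporary voting disk group used during recovery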
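After the failure, no node can access hdisk6 on the primary storage, so hdisk6 is evicted from the current voting disk
group. We recover with the following steps:
1. Replace the current disk group with the temporary voting disk group
2. Drop the original production disk group
3. Recreate the original production disk group
4. Replace the current disk group with the newly created disk group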
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
3. ONLINE 4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG]
Located 2 voting disk(s).
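Comparing the two crsctl query css votedisk listings by eye works for three members, but the check can also be scripted. The sketch below is an illustration under stated assumptions: the output has been saved as text, the expected member paths are the three listed for the healthy state above, and the function names are hypothetical.

```python
import re

# Expected members of VOTEDG in the healthy state, as listed earlier in this section.
EXPECTED = {"/nfsvote3/nfs_vote", "/dev/rhdisk6", "/dev/rhdisk18"}

# Output captured after the primary storage failure (from the listing above).
SAMPLE = """\
1. ONLINE 0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
3. ONLINE 4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG]
Located 2 voting disk(s).
"""

# index. STATE <32 hex digit file universal id> (path) [DISKGROUP]
ROW = re.compile(r"(\d+)\.\s+(\S+)\s+([0-9a-f]{32})\s+\((\S+)\)\s+\[(\w+)\]")


def parse_votedisks(text):
    """Return one dict per voting file found anywhere in the text."""
    return [
        {"index": int(m.group(1)), "state": m.group(2), "uuid": m.group(3),
         "path": m.group(4), "diskgroup": m.group(5)}
        for m in ROW.finditer(text)
    ]


def missing_members(rows, expected):
    """Expected paths that are not present in ONLINE state."""
    online = {r["path"] for r in rows if r["state"] == "ONLINE"}
    return sorted(expected - online)


rows = parse_votedisks(SAMPLE)
missing = missing_members(rows, EXPECTED)
print("voting files found:", len(rows))
print("missing members:", missing)
```

The pattern is searched across the whole text rather than line by line, so a row that has been joined to the separator line in a copied listing is still picked up.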
#asmcmd
ASMCMD> lsdsk -G votedg
Path
/dev/rhdisk18
/nfsvote3/nfs_vote
ASMCMD> lsdg
State    Type    Rebal  Sector  Block  AU       Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N      512     4096   1048576  40960     6712     0                6712            0              N             DATA/
MOUNTED  NORMAL  N      512     4096   1048576  10540     10337    0                5035            1              Y             VOTEDG/
MOUNTED  EXTERN  N      512     4096   1048576  1024      618      0                618             0              N             VOTETMPDG/
$ sqlplus "/as sysasm"
SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      MOUNTED
SQL> set linesize 1000
SQL> Col name format a25
SQL> select disk_number,repair_timer,state,name,path from v$asm_disk order by name;
DISK_NUMBER REPAIR_TIMER STATE    NAME                      PATH
          0            0 NORMAL   DATA_0000                 /dev/rhdisk1
          1            0 NORMAL   DATA_0001                 /dev/rhdisk2
          2            0 NORMAL   DATA_0002                 /dev/rhdisk3
          3            0 NORMAL   DATA_0003                 /dev/rhdisk4
          1            0 NORMAL   VOTEDG_0001               /dev/rhdisk18
          2            0 NORMAL   VOTEDG_0002               /nfsvote3/nfs_vote
          0            0 NORMAL   VOTETMPDG_0000            /dev/rhdisk13
          0            0 FORCING  _DROPPED_0000_VOTEDG
          0            0 NORMAL                             /dev/rhdisk6
9 rows selected.
$ crsctl replace votedisk +votetmpdg
Successful addition of voting disk 5e46bfec9eea4f37bf1c6ae33e508a23.
Successful deletion of voting disk 2393621fb4e04f4dbfc32cd2d6521d8e.
Successful deletion of voting disk 36c1a96614814f37bf5e7c1eeb0723f2.
Successfully replaced voting disk group with +votetmpdg.
CRS-4266: Voting file(s) successfully replaced
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 5e46bfec9eea4f37bf1c6ae33e508a23 (/dev/rhdisk13) [VOTETMPDG]
Located 1 voting disk(s).
$ sqlplus "/as sysasm"
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jul 6 18:06:27 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
SQL> alter diskgroup votedg dismount;
-- must be run on all nodes
Diskgroup altered.
SQL> drop diskgroup votedg force including contents;
Diskgroup dropped.
SQL> create diskgroup voteDG normal redundancy
failgroup fg1 disk '/dev/rhdisk6'
failgroup fg2 disk '/dev/rhdisk18'
quorum failgroup fg3 disk '/nfsvote3/nfs_vote'
attribute 'compatible.asm' = '11.2.0.0.0';
Diskgroup created.
SQL> SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options
$ set -o vi
$ crsctl replace votedisk +votedg
Successful addition of voting disk 0288bafd47c14f27bf70eb282798b259.
Successful addition of voting disk 473861c8d4ff4f17bf226f031c32c575.
Successful addition of voting disk 4e28ed259e1d4fd9bf38ffa1b8f53336.
Successful deletion of voting disk 5e46bfec9eea4f37bf1c6ae33e508a23.
Successfully replaced voting disk group with +votedg.
CRS-4266: Voting file(s) successfully replaced
$ crsctl query css votedisk
## STATE File Universal Id File Name Disk group
-- ----- ----------------- --------- ---------
1. ONLINE 0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
2. ONLINE 473861c8d4ff4f17bf226f031c32c575 (/dev/rhdisk6) [VOTEDG]
3. ONLINE 4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG]
Located 3 voting disk(s).
If votetmpdg turns out to be in the DISMOUNTED state when you try to replace the current disk group with it, mount
that disk group first.
SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      DISMOUNTED
SQL> alter diskgroup votetmpdg mount;
Diskgroup altered.
SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      MOUNTED