Bank of Gansu PowerHA HyperSwap Test Record

1 A-S

1.1 Environment Description

1.1.1 Topology

[Figure: topology diagram showing Site1, Site2, the PowerHA Cluster (App1, App2, App3), the Tie-breaker Disk, SAN_1, SAN_2, and Metro Mirror between DS8877 (Primary) and DS8876]

Description:
The test environment simulates three sites. Site1 has two AIX partitions and one storage system, Site2 has one AIX partition and one storage system, and Site3 hosts the tie-breaker disk of the PowerHA cluster. For now, that disk is temporarily placed in Site1.

1.1.2 Zone Information

Hosts to DS8877:
D20151030_HB_HPSW_Node1fcs0_DS8877
D20151030_HB_HPSW_Node1fcs1_DS8877
D20151030_HB_HPSW_Node2fcs0_DS8877
D20151030_HB_HPSW_Node2fcs1_DS8877
D20151030_HB_HPSW_Node3fcs0_DS8877
D20151030_HB_HPSW_Node3fcs1_DS8877

Hosts to DS8876:
D20150730_GSB_HPSW_101_8876P1
D20150730_GSB_HPSW_101_8876P2
D20150730_GSB_HPSW_102_8876P1
D20150730_GSB_HPSW_102_8876P2
D20150730_GSB_HPSW_103_8876P1
D20150730_GSB_HPSW_103_8876P2

Hosts to Tiebreaker:
D20151030_HB_HPSW_Node1fcs2_DS8877
D20151030_HB_HPSW_Node2fcs2_DS8877
D20151030_HB_HPSW_Node3fcs2_DS8877

Metro Mirror between the storage systems:
D20150730_GSB_HPSW_PPRC_76to77

1.1.3 Host and IP Information

172.16.51.101   p7502901
172.16.51.102   p7502902
172.16.51.103   p7502903
172.16.34.78    serviceip

1.1.4 Disk Information

# lspv
hdisk0    00f8ba8174713d1a    rootvg          active
hdisk1    00f8ba812f5f5a5b    oravg           active
hdisk2    00f8ba813257f86f    testvg          concurrent
hdisk3    00f8ba813257e867    testvg          concurrent
hdisk4    00f8ba813257e573    testvg          concurrent
hdisk5    00f8ba813257e3cf    testvg          concurrent
hdisk6    00f8ba81428bc67c    caavg_private   active
hdisk12   00f8ba810ad98e8e    None

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage   Secondary Storage
         state   path group ID  path group ID  WWNN              WWNN
hdisk2   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk6   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -p hdisk2
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xb3  0x00  PRIMARY
1         5005076305ffd4ac  0xb3  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40b3400000000000
0         1     Enabled  fscsi0  50050763050094a4,40b3400000000000
0         2     Enabled  fscsi0  50050763054014a4,40b3400000000000
0         3     Enabled  fscsi0  50050763050b14a4,40b3400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40b3400000000000
0         5     Enabled  fscsi1  50050763051014a4,40b3400000000000
0         6     Enabled  fscsi1  50050763050094a4,40b3400000000000
0         7     Enabled  fscsi1  50050763054014a4,40b3400000000000
0         8     Enabled  fscsi1  50050763050b14a4,40b3400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40b3400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40b3400000000000

1.2 FC Path Failure from the Primary Partition to the Primary Storage

Fault description and expected result:
The link between the primary partition (App1) and the primary storage (DS8877) fails. In the test, this is simulated by disabling the zones from App1 to DS8877.
Expected result:
The storage swaps to DS8876; application I/O pauses for 15-20 seconds and then continues.

[Figure: topology diagram for this scenario]

State before the fault:

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage   Secondary Storage
         state   path group ID  path group ID  WWNN              WWNN
hdisk2   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk6   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -p hdisk2
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xb3  0x00  PRIMARY
1         5005076305ffd4ac  0xb3  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40b3400000000000
0         1     Enabled  fscsi0  50050763050094a4,40b3400000000000
0         2     Enabled  fscsi0  50050763054014a4,40b3400000000000
0         3     Enabled  fscsi0  50050763050b14a4,40b3400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40b3400000000000
0         5     Enabled  fscsi1  50050763051014a4,40b3400000000000
0         6     Enabled  fscsi1  50050763050094a4,40b3400000000000
0         7     Enabled  fscsi1  50050763054014a4,40b3400000000000
0         8     Enabled  fscsi1  50050763050b14a4,40b3400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40b3400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40b3400000000000

Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:
[screenshot]

Result:
I/O paused for 20 seconds and then continued.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Then manually swap back to the primary storage DS8877:
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'user' -m 'usermg' -o 'swap'
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'repository' -m 'repmg' -o 'swap'

1.3 FC Path Failure from the Primary Partition to All Storage

Fault description and expected result:
The links between the primary partition and all storage systems fail. In the test, this is simulated by disabling all zones from App1 to DS8877 and DS8876.
Expected result:
The storage does not swap; the resource group moves to App2.

[Figure: topology diagram for this scenario]

Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:
# clRGinfo
-----------------------------------------------------------------------------
Group Name     State              Node
-----------------------------------------------------------------------------
userRG         OFFLINE            p7502901@site1
               ONLINE             p7502902@site1
               ONLINE SECONDARY   p7502903@site2

Result:
After 2-3 minutes, the resource group is active on the App2 node.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Then manually move the resource group back to the App1 node:
/usr/es/sbin/cluster/utilities/clRGmove -s 'false' -m -i -g 'userRG' -n 'p7502901'
Attempting to move resource group userRG to node p7502901.
Waiting for the cluster to process the resource group movement request....
Waiting for the cluster to stabilize........
Resource group movement successful.
Resource group userRG is online on node p7502901.

Cluster Name: p7502901_cluster
Resource Group Name: userRG
Node                          Primary State    Secondary State
----------------------------  ---------------  ----------------
p7502901@site1                ONLINE           OFFLINE
p7502902@site1                OFFLINE          OFFLINE
p7502903@site2                OFFLINE          ONLINE SECONDARY

1.4 Failure of All FC Links Between the Data Centers

Fault description and expected result:
Three failures occur together: the primary-site nodes lose access to the standby-site storage, the standby-site node loses access to the primary-site storage, and the PPRC links between the storage systems fail.
Expected result:
I/O pauses for 30 seconds and then resumes; the storage does not swap.

[Figure: topology diagram for this scenario]

Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_102_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_102_8876P2"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:

dscli> lssi
Date/Time: 2015 年 7 月 2 日 下午 04 时 40 分 03 秒 IBM DSCLI Version: 7.7.10.289 DS:
Name    ID                Storage Unit      Model  WWNN              State   ESSNet
==============================================================================
DS8877  IBM.2107-75CGR21  IBM.2107-75CGR20  961    5005076305FFD4A4  Online  Enabled

dscli> lspprc B800 B900 BD00
Date/Time: 2015 年 7 月 2 日 下午 04 时 31 分 12 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID         State      Reason                      Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
====================================================================================================================
B800:B800  Suspended  Internal Conditions Target  Metro Mirror  B8         60              Disabled       Invalid
B900:B900  Suspended  Freeze                      Metro Mirror  B9         60              Disabled       Invalid
BD00:BD00  Suspended  Internal Conditions Target  Metro Mirror  BD         60              Disabled       Invalid

dscli> lspprcpath B8 B9 BD
Date/Time: 2015 年 7 月 2 日 下午 04 时 39 分 55 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src  Tgt  State   SS    Port   Attached Port  Tgt WWNN
========================================================
B8   B8   Failed  FFB8                        5005076305FFD4AC
B9   B9   Failed  FFB9                        5005076305FFD4AC
BD   BD   Failed  FFBD  I0134  I0333          5005076305FFD4AC

dscli> lssi
Date/Time: 2015 年 7 月 2 日 下午 04 时 40 分 28 秒 IBM DSCLI Version: 7.7.10.289 DS:
Name    ID                Storage Unit      Model  WWNN              State   ESSNet
==============================================================================
DS8876  IBM.2107-75CHD91  IBM.2107-75CHD90  961    5005076305FFD4AC  Online  Enabled

dscli> lspprc B800 B900 BD00
Date/Time: 2015 年 7 月 2 日 下午 04 时 31 分 15 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CHD91
ID         State               Reason  Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
=========================================================================================================
B800:B800  Target Full Duplex          Metro Mirror  B8         unknown         Disabled       Invalid
B900:B900  Target Full Duplex          Metro Mirror  B9         unknown         Disabled       Invalid
BD00:BD00  Target Full Duplex          Metro Mirror  BD         unknown         Disabled       Invalid

dscli> lspprcpath B8 B9 BD
Date/Time: 2015 年 7 月 2 日 下午 04 时 31 分 05 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CHD91
Src  Tgt  State   SS    Port   Attached Port  Tgt WWNN
========================================================
B8   B8   Failed  FFB8  I0333  I0134          5005076305FFD4A4
B9   B9   Failed  FFB9  I0333  I0134          5005076305FFD4A4
BD   BD   Failed  FFBD  I0333  I0134          5005076305FFD4A4

Result:
I/O paused for 30 seconds and then resumed; the storage did not swap.

Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_101_8876P2"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_102_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_102_8876P2"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Create the PPRC paths. On DS8877:
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B3 -tgtlss B3 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B4 -tgtlss B4 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B8 -tgtlss B8 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B9 -tgtlss B9 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss BD -tgtlss BD -consistgrp I0134:I0333

Create the PPRC paths. On DS8876:
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B3 -tgtlss B3 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B4 -tgtlss B4 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B8 -tgtlss B8 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B9 -tgtlss B9 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss BD -tgtlss BD -consistgrp I0333:I0134

Resynchronize the data incrementally. On DS8877:
resumepprc -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -type mmir B300:B300 B400:B400 B800:B800 B900:B900 BD00:BD00

Finally, manually switch the resource group back to the primary storage DS8877.

1.5 Primary Storage Failure

Fault description and expected result:
The primary storage fails. In the test, this is simulated by disabling the zones from all three partitions to DS8877 and the zone between DS8877 and DS8876.
Expected result:
The storage swaps to DS8876; application I/O pauses for 15-20 seconds and then continues.

[Figure: topology diagram for this scenario]

Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage                    Secondary Storage
         state   path group ID  path group ID  WWNN                               WWNN
hdisk2   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk3   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk4   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk5   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk6   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac

# lspprc -p hdisk2
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0         5005076305ffd4a4  0xb3  0x00  PRIMARY
1(s)      5005076305ffd4ac  0xb3  0x00  PRIMARY, SUSPENDED, OOS

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Failed   fscsi0  50050763051014a4,40b3400000000000
0         1     Failed   fscsi0  50050763050094a4,40b3400000000000
0         2     Failed   fscsi0  50050763054014a4,40b3400000000000
0         3     Failed   fscsi0  50050763050b14a4,40b3400000000000
0         4     Failed   fscsi0  50050763054b14a4,40b3400000000000
0         5     Failed   fscsi1  50050763051014a4,40b3400000000000
0         6     Failed   fscsi1  50050763050094a4,40b3400000000000
0         7     Failed   fscsi1  50050763054014a4,40b3400000000000
0         8     Failed   fscsi1  50050763050b14a4,40b3400000000000
0         9     Failed   fscsi1  50050763054b14a4,40b3400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40b3400000000000

dscli> lssi
Date/Time: 2015 年 6 月 30 日 下午 03 时 22 分 59 秒 IBM DSCLI Version: 7.7.10.289 DS:
Name    ID                Storage Unit      Model  WWNN              State   ESSNet
==============================================================================
DS8877  IBM.2107-75CGR21  IBM.2107-75CGR20  961    5005076305FFD4A4  Online  Enabled

dscli> lspprc B300 B400 B800 B900 BD00
Date/Time: 2015 年 6 月 30 日 下午 03 时 22 分 11 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID         State        Reason  Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
==================================================================================================
B300:B300  Full Duplex  -       Metro Mirror  B3         60              Disabled       Invalid
B400:B400  Full Duplex  -       Metro Mirror  B4         60              Disabled       Invalid
B800:B800  Full Duplex  -       Metro Mirror  B8         60              Disabled       Invalid
B900:B900  Full Duplex  -       Metro Mirror  B9         60              Disabled       Invalid
BD00:BD00  Full Duplex  -       Metro Mirror  BD         60              Disabled       Invalid

dscli> lssi
Date/Time: 2015 年 6 月 30 日 下午 03 时 23 分 55 秒 IBM DSCLI Version: 7.7.10.289 DS:
Name    ID                Storage Unit      Model  WWNN              State   ESSNet
==============================================================================
DS8876  IBM.2107-75CHD91  IBM.2107-75CHD90  961    5005076305FFD4AC  Online  Enabled

dscli> lspprc B300 B400 B800 B900 BD00
Date/Time: 2015 年 6 月 30 日 下午 03 时 24 分 02 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CHD91
ID         State      Reason       Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
=====================================================================================================
B300:B300  Suspended  Host Source  Metro Mirror  B3         60              Disabled       Invalid
B400:B400  Suspended  Host Source  Metro Mirror  B4         60              Disabled       Invalid
B800:B800  Suspended  Host Source  Metro Mirror  B8         60              Disabled       Invalid
B900:B900  Suspended  Host Source  Metro Mirror  B9         60              Disabled       Invalid
BD00:BD00  Suspended  Host Source  Metro Mirror  BD         60              Disabled       Invalid

Result:
The storage swapped to DS8876; application I/O paused for 15-20 seconds and then continued.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Check whether the PPRC paths have been re-created in both directions; if any are missing, create them manually.
Run pausepprc on DS8877 (if resync-action for usermg is set to auto, the following steps are not needed).
Then run failbackpprc on DS8876.
Finally, manually swap the storage back to the primary storage DS8877.

1.6 Storage Replication Link Failure

Fault description and expected result:
The replication link between the storage systems fails. In the test, this is simulated by disabling the zone between DS8877 and DS8876.
Expected result:
I/O pauses for 30 seconds and then resumes; the storage does not swap.

[Figure: topology diagram for this scenario]

Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:
[screenshot]

Result:
I/O paused for 30 seconds and then resumed; the storage did not swap.

Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Create the PPRC paths.
Run resumepprc on DS8877.

1.7 Primary Site Failure

Fault description and expected result:
The equipment at the primary site fails. In the test, this is simulated by powering off the App1 and App2 partitions and failing the DS8877 storage.
Expected result:
The storage swaps to DS8876 and the resource group moves to the App3 node; the whole process takes about 2 minutes.

[Figure: topology diagram for this scenario]

Fault simulation:
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 2 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 3 --immed
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage                    Secondary Storage
         state   path group ID  path group ID  WWNN                               WWNN
hdisk2   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk3   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk4   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk5   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac
hdisk6   Active  0, 1(s)        -1             5005076305ffd4a4,5005076305ffd4ac

# lspprc -p hdisk2
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0         5005076305ffd4a4  0xb3  0x00  PRIMARY
1(s)      5005076305ffd4ac  0xb3  0x00  PRIMARY, SUSPENDED

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Failed   fscsi0  50050763051014a4,40b3400000000000
0         1     Failed   fscsi0  50050763050094a4,40b3400000000000
0         2     Failed   fscsi0  50050763054014a4,40b3400000000000
0         3     Failed   fscsi0  50050763050b14a4,40b3400000000000
0         4     Failed   fscsi0  50050763054b14a4,40b3400000000000
0         5     Failed   fscsi1  50050763051014a4,40b3400000000000
0         6     Failed   fscsi1  50050763050094a4,40b3400000000000
0         7     Failed   fscsi1  50050763054014a4,40b3400000000000
0         8     Failed   fscsi1  50050763050b14a4,40b3400000000000
0         9     Failed   fscsi1  50050763054b14a4,40b3400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40b3400000000000

# lspprc -p hdisk5
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0         5005076305ffd4a4  0xb9  0x00  PRIMARY
1(s)      5005076305ffd4ac  0xb9  0x00  PRIMARY, SUSPENDED, OOS

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Failed   fscsi0  50050763051014a4,40b9400000000000
0         1     Failed   fscsi0  50050763050094a4,40b9400000000000
0         2     Failed   fscsi0  50050763054014a4,40b9400000000000
0         3     Failed   fscsi0  50050763050b14a4,40b9400000000000
0         4     Failed   fscsi0  50050763054b14a4,40b9400000000000
0         5     Failed   fscsi1  50050763051014a4,40b9400000000000
0         6     Failed   fscsi1  50050763050094a4,40b9400000000000
0         7     Failed   fscsi1  50050763054014a4,40b9400000000000
0         8     Failed   fscsi1  50050763050b14a4,40b9400000000000
0         9     Failed   fscsi1  50050763054b14a4,40b9400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40b9400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40b9400000000000

# df
Filesystem      512-blocks      Free  %Used   Iused  %Iused  Mounted on
/dev/hd4           2097152   1516208    28%   11913      7%  /
/dev/hd2           8388608   1802760    79%   63212     23%  /usr
/dev/hd9var        2097152    627496    71%    7586      9%  /var
/dev/hd3          20971520  14417496    32%    2886      1%  /tmp
/dev/hd1           8388608   3846560    55%  430347     50%  /home
/dev/hd11admin     1048576   1047696     1%       5      1%  /admin
/proc                    -                                   /proc
/dev/hd10opt       4194304   3338184    21%    9120      3%  /opt
/dev/livedump      1048576   1045624     1%       8      1%  /var/adm/ras/livedump
/dev/lvora1       31457280  13321072    58%   21073      2%  /oracle
/aha                                             54      1%  /aha
/dev/lvoradata    62914560  61224472     3%      17      1%  /oracle/oradata

# hostname
p7502903

Result:
The storage swapped to DS8876 and the resource group moved to the App3 node; the whole process took about 2 minutes.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Create the PPRC paths. On DS8877:
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B3 -tgtlss B3 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B4 -tgtlss B4 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B8 -tgtlss B8 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B9 -tgtlss B9 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss BD -tgtlss BD -consistgrp I0134:I0333

Create the PPRC paths. On DS8876:
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B3 -tgtlss B3 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B4 -tgtlss B4 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B8 -tgtlss B8 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B9 -tgtlss B9 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss BD -tgtlss BD -consistgrp I0333:I0134

failbackpprc -remotedev IBM.2107-75CGR21 -type mmir B300:B300 B400:B400 B800:B800 B900:B900 BD00:BD00
failbackpprc -remotedev IBM.2107-75CGR21 -type mmir BD00:BD00
resumepprc -remotedev IBM.2107-75CGR21 -type mmir BD00:BD00
resumepprc -remotedev IBM.2107-75CGR21 -type mmir B300:B300

chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o on --id 2
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o on --id 3
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 4 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o on --id 4

1.8 Common Scripts

1.8.1 AIX

1.8.1.1 View the hdisk-to-LUN Mapping

# Show the storage serial number and volume ID (Z7) of each hdisk
for i in 2 3 4 5 6 12
do
    echo hdisk$i
    lscfg -vpl hdisk$i | egrep "Serial Number|Z7"
    echo "\n"
done

hdisk2
        Serial Number...............75CGR21B
        Device Specific.(Z7)........B300
hdisk3
        Serial Number...............75CGR21B
        Device Specific.(Z7)........B400
hdisk4
        Serial Number...............75CGR21B
        Device Specific.(Z7)........B800
hdisk5
        Serial Number...............75CGR21B
        Device Specific.(Z7)........B900
hdisk6
        Serial Number...............75CGR21B
        Device Specific.(Z7)........BD00
hdisk12
        Serial Number...............75CGR21E
        Device Specific.(Z7)........E204

1.8.1.2 View the HyperSwap Enablement of hdisks

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage   Secondary Storage
         state   path group ID  path group ID  WWNN              WWNN
hdisk2   Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk3   Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk4   Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk5   Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk6   Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4

# lspprc -p hdisk2
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0         5005076305ffd4a4  0xb3  0x00  SECONDARY
1(s)      5005076305ffd4ac  0xb3  0x00  PRIMARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40b3400000000000
0         1     Enabled  fscsi0  50050763050094a4,40b3400000000000
0         2     Enabled  fscsi0  50050763054014a4,40b3400000000000
0         3     Enabled  fscsi0  50050763050b14a4,40b3400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40b3400000000000
0         5     Enabled  fscsi1  50050763051014a4,40b3400000000000
0         6     Enabled  fscsi1  50050763050094a4,40b3400000000000
0         7     Enabled  fscsi1  50050763054014a4,40b3400000000000
0         8     Enabled  fscsi1  50050763050b14a4,40b3400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40b3400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40b3400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40b3400000000000

1.8.1.3 View the PPRC Path Configuration of a HyperSwap Disk

# lspprc -c hdisk2
Displaying all paths between LSS B3 and LSS B3
Source                                 Target
WWNN              SSID  LSS  Port      WWNN              SSID  LSS  Port  State
===================================================================
5005076305FFD4AC  FFB3  B3   0333      5005076305FFD4A4  FFB3  B3   0134  Up
5005076305FFD4A4  FFB3  B3   0134      5005076305FFD4AC  FFB3  B3   0333  Up

1.8.2 PowerHA

1.8.2.1 View Resource Group Status

# clRGinfo
-----------------------------------------------------------------------------
Group Name     State              Node
-----------------------------------------------------------------------------
userRG         ONLINE             p7502901@site1
               OFFLINE            p7502902@site1
               ONLINE SECONDARY   p7502903@site2

1.8.2.2 View Cluster Manager Processes

# clshowsrv -v
Status of the RSCT subsystems used by PowerHA SystemMirror:
Subsystem     Group     PID        Status
cthags        cthags    8060928    active
ctrmc         rsct      6422726    active

Status of the PowerHA SystemMirror subsystems:
Subsystem     Group     PID        Status
clstrmgrES    cluster   5439738    active
clevmgrdES    cluster   10420264   active

Status of the CAA subsystems:
Subsystem     Group     PID        Status
clcomd        caa       6095036    active
clconfd       caa       4194506    active

1.8.3 DS8K

1.8.3.1 View the Microcode Version

dscli> ver -l
Date/Time: 2015 年 7 月 1 日 上午 09 时 26 分 05 秒 IBM DSCLI Version: 7.7.10.289 DS:
DSCLI           7.7.10.289
StorageManager  7.7.7.0.20140929.1
================Version=================
Storage Image     LMC
===========================
IBM.2107-75CGR21  7.7.40.364

1.8.3.2 View ID, WWNN, and Related Information

dscli> lssi
Date/Time: 2015 年 7 月 1 日 上午 09 时 26 分 22 秒 IBM DSCLI Version: 7.7.10.289 DS:
Name    ID                Storage Unit      Model  WWNN              State   ESSNet
==============================================================================
DS8877  IBM.2107-75CGR21  IBM.2107-75CGR20  961    5005076305FFD4A4  Online  Enabled

1.8.3.3 View PPRC Status

dscli> lspprc B300 B400 B800 B900 BD00
Date/Time: 2015 年 7 月 1 日 上午 09 时 27 分 29 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID         State               Reason  Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
=========================================================================================================
B300:B300  Target Full Duplex  -       Metro Mirror  B3         unknown         Disabled       Invalid
B400:B400  Target Full Duplex  -       Metro Mirror  B4         unknown         Disabled       Invalid
B800:B800  Target Full Duplex  -       Metro Mirror  B8         unknown         Disabled       Invalid
B900:B900  Target Full Duplex  -       Metro Mirror  B9         unknown         Disabled       Invalid
BD00:BD00  Target Full Duplex  -       Metro Mirror  BD         unknown         Disabled       Invalid

1.8.3.4 View PPRC Path Status and Configuration

dscli> lspprcpath B3 B4 B8 B9 BD
Date/Time: 2015 年 7 月 1 日 上午 09 时 28 分 25 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src  Tgt  State    SS    Port   Attached Port  Tgt WWNN
=========================================================
B3   B3   Success  FFB3  I0134  I0333          5005076305FFD4AC
B4   B4   Success  FFB4  I0134  I0333          5005076305FFD4AC
B8   B8   Success  FFB8  I0134  I0333          5005076305FFD4AC
B9   B9   Success  FFB9  I0134  I0333          5005076305FFD4AC
BD   BD   Success  FFBD  I0134  I0333          5005076305FFD4AC

dscli> showlss b3
Date/Time: 2015 年 7 月 1 日 上午 09 时 28 分 50 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID              B3
Group           1
addrgrp         B
stgtype         fb
confgvols       1
subsys          0xFFB3
pprcconsistgrp  Enabled
xtndlbztimout   60 secs
resgrp          RG0

1.8.3.5 Check for a Reserve Lock

dscli> showfbvol -reserve b400
Date/Time: 2015 年 7 月 1 日 上午 09 时 32 分 10 秒 IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
CMUN04003E showfbvol: Operation failure: internal error. Contact IBM technical support for assistance.
Name         HB_HPSW_Data
ID           B400
accstate     Online
datastate    Normal
configstate  Normal
deviceMTM    2107-900
datatype     FB 512

1.8.4 SAN Switch

1.8.4.1 View the Currently Active Zones

SAN768B-02:FID128:admin> cfgactvshow|grep D20151030_HB_HPSW_Node
zone: D20151030_HB_HPSW_Node1fcs0_DS8877
zone: D20151030_HB_HPSW_Node1fcs1_DS8877
zone: D20151030_HB_HPSW_Node1fcs2_DS8877
zone: D20151030_HB_HPSW_Node2fcs0_DS8877
zone: D20151030_HB_HPSW_Node2fcs1_DS8877
zone: D20151030_HB_HPSW_Node2fcs2_DS8877
zone: D20151030_HB_HPSW_Node3fcs0_DS8877
zone: D20151030_HB_HPSW_Node3fcs1_DS8877
zone: D20151030_HB_HPSW_Node3fcs2_DS8877

1.8.4.2 Remove Zones

Example: simulating a primary storage failure:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

1.8.4.3 Add Zones

Example: recovering the primary storage:
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

2 A-A

2.1 Environment Description

2.1.1 Topology

[Figure: topology diagram showing the NFS Quorum, Site1, Site2, the PowerHA Cluster (RAC1, RAC2, RAC3), SAN-11, SAN-12, SAN-21, SAN-22, and Metro Mirror between DS8877 (Primary) and DS8876]

Description:
The test environment simulates three sites. Site1 has two RAC nodes and one storage system, Site2 has one RAC node and one storage system, and Site3 hosts the third, NFS-based quorum device of Oracle RAC. For now, that device is temporarily placed in Site1; the first [...]

2.1.2 Zone Information

Hosts to DS8877:
zone: D20151030_HB_HPSW_RAC1_fcs0_DS8877
zone: D20151030_HB_HPSW_RAC1_fcs1_DS8877
zone: D20151030_HB_HPSW_RAC2_fcs0_DS8877
zone: D20151030_HB_HPSW_RAC2_fcs1_DS8877
zone: D20151030_HB_HPSW_RAC3_fcs0_DS8877
zone: D20151030_HB_HPSW_RAC3_fcs1_DS8877

Hosts to DS8876:
zone: D20150730_GSB_HPSW_RAC1_8876P1
zone: D20150730_GSB_HPSW_RAC1_8876P2
zone: D20150730_GSB_HPSW_RAC2_8876P1
zone: D20150730_GSB_HPSW_RAC2_8876P2
zone: D20150730_GSB_HPSW_RAC3_8876P1
zone: D20150730_GSB_HPSW_RAC3_8876P2

Oracle RAC NFS quorum:
IP: 172.16.51.11
/dev/fslv00   4194304   3578608   15%   5   1%   /tftpboot
# ls /tftpboot
lost+found  nfs_vote

Metro Mirror between the storage systems:
D20150730_GSB_HPSW_PPRC_76to77

2.1.3 Host and IP Information

172.16.51.104   rac1
172.16.51.107   rac1-vip
172.16.50.104   rac1-priv
172.16.34.44    rac1_nfs
172.16.51.105   rac2
172.16.51.108   rac2-vip
172.16.50.105   rac2-priv
172.16.34.45    rac2_nfs
172.16.51.106   rac3
172.16.51.109   rac3-vip
172.16.50.106   rac3-priv
172.16.34.46    rac3_nfs
172.16.34.43    nfsserver
172.16.51.120   rac-scan

2.1.4 Disk Information

# lspv
hdisk0    00f8ba8109a1cacb    rootvg          active
hdisk1    00f8ba813257def7    None
hdisk2    00f8ba813257dd67    None
hdisk3    00f8ba813257dbc4    None
hdisk4    00f8ba813257f6df    None
hdisk5    00f8ba813257f550    caavg_private   active
hdisk6    none                None
hdisk13   none                None
hdisk18   none                None

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage   Secondary Storage
         state   path group ID  path group ID  WWNN              WWNN
hdisk1   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk2   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Missing  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000

2.2 FC Path Failure from the RAC1 Partition to the Primary Storage

Fault description and expected result:
The link between the RAC1 partition and the primary storage fails. In the test, this is simulated by disabling the zones from RAC1 to DS8877.
Expected result:
The storage swaps to DS8876; Oracle DB I/O pauses for 20 seconds and then continues.

[Figure: topology diagram for this scenario]

State before the fault:

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage   Secondary Storage
         state   path group ID  path group ID  WWNN              WWNN
hdisk1   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk2   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                                 Target
WWNN              SSID  LSS  Port      WWNN              SSID  LSS  Port  State
===================================================================
5005076305FFD4A4  FFC2  C2   0134      5005076305FFD4AC  FFC2  C2   0333  Up
5005076305FFD4AC  FFC2  C2   0333      5005076305FFD4A4  FFC2  C2   0134  Up

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Missing  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000

Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots: RAC1
[screenshot]

Process screenshots: RAC2
[screenshot]

Process screenshots: RAC3
[screenshot]

# lspprc -p hdisk2
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0         5005076305ffd4a4  0xc3  0x00  SECONDARY
1(s)      5005076305ffd4ac  0xc3  0x00  PRIMARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Failed   fscsi0  50050763051014a4,40c3400000000000
0         1     Failed   fscsi0  50050763050094a4,40c3400000000000
0         2     Failed   fscsi0  50050763054014a4,40c3400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c3400000000000
0         4     Failed   fscsi0  50050763054b14a4,40c3400000000000
0         5     Failed   fscsi1  50050763051014a4,40c3400000000000
0         6     Failed   fscsi1  50050763050094a4,40c3400000000000
0         7     Failed   fscsi1  50050763054014a4,40c3400000000000
0         8     Missing  fscsi1  50050763050b14a4,40c3400000000000
0         9     Failed   fscsi1  50050763054b14a4,40c3400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c3400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c3400000000000

Result:
The storage swapped to DS8876; Oracle DB I/O paused for 20 seconds and then continued.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Then manually swap the storage:
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'user' -m 'usermg' -o 'swap'
/usr/es/sbin/cluster/xd_generic/xd_cli/cl_clxd_manage_mg_smit -t 'repository' -m 'repmg' -o 'swap'

# lspprc -Ao
hdisk#   PPRC    Primary        Secondary      Primary Storage   Secondary Storage
         state   path group ID  path group ID  WWNN              WWNN
hdisk1   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk2   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5   Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0
50050763054b14a4,40c2400000000000 0 5 Enabled fscsi1 50050763051014a4,40c2400000000000 0 6 Enabled fscsi1 50050763050094a4,40c2400000000000 0 7 Enabled fscsi1 50050763054014a4,40c2400000000000 0 8 Missing fscsi1 50050763050b14a4,40c2400000000000 0 9 Enabled fscsi1 50050763054b14a4,40c2400000000000 1 10 Enabled fscsi0 50050763051bd4ac,40c2400000000000 1 11 Enabled fscsi1 50050763051b94ac,40c2400000000000 2.3 主分区到所有存储的 FC 路径失效 故障描述及预期: 主分区到所有存储之间的链路失效,在测试中,模拟 RAC1 到 DS8877&DS8876 的所有 Zone 实效 预期效果: 存储不切换,RAC1 节点会被重起,RAC2 和 RAC3 的 IO 暂停 1 分钟后继续 34 / 64 甘肃银行 PowerHA HyperSwap 测试记录 NFS Quorum Site2 Site1 RAC1 RAC2 PowerHA Cluster RAC3 SAN-21 SAN-11 SAN-22 SAN-12 Metro Mirror DS8877(Primary) DS8876 故障模拟: cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877" cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877" cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1" cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2" echo y | cfgSave echo y | cfgenable "CSC_Base" 故障发生后,由于 RAC1 丢失了所有存储的访问,其中包括 Repository disk,根据之前所设 置的 repos_mode=a,该节点会被 crash。从日志看,大约故障发生后 20 秒内,RAC1 节点被 crash. Note: 大约 17:09:40 发生故障,17:10:00 发生 crash. 以下为 RAC1 节点上的 dump 信息: (4)> stat SYSTEM_CONFIGURATION: CHRP_SMP_PCI POWER_PC POWER_7 machine with 12 available CPU(s) (64-bit registers) SYSTEM STATUS: sysname... AIX nodename.. rac1 release... 1 35 / 64 甘肃银行 PowerHA HyperSwap 测试记录 version... 7 build date Oct 15 2014 build time 12:38:00 label..... 1441D_71Q machine... 00F8BA814C00 nid....... F8BA814C time of crash: Mon Jul 6 17:10:00 2015 age of system: 4 day, 6 hr., 9 min., 1 sec. xmalloc debug: enabled FRRs active... 0 FRRs started.. 0 PANIC MESSAGES: Lost access to cluster repository disk. PANIC STRING: Lost access to cluster repository disk. 
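The interval quoted in the note can be cross-checked against the `time of crash` field of the dump. The following is a minimal sketch in Python, not part of the original test procedure. It assumes the fault time of about 17:09:40 from the note (a manually recorded, approximate value) and assumes the fault happened on the same date as the crash; the function name `seconds_to_crash` is illustrative only.

```python
from datetime import datetime

# Line copied from the kdb `stat` output in the dump.
DUMP_LINE = "time of crash: Mon Jul 6 17:10:00 2015"
# Approximate fault time from the note; the date is an assumption
# (taken to be the same day as the crash).
FAULT_TIME = "2015-07-06 17:09:40"


def seconds_to_crash(dump_line: str, fault_time: str) -> float:
    """Return the seconds elapsed between the fault and the crash."""
    # Everything after the first colon is the crash timestamp.
    crash_text = dump_line.split(":", 1)[1].strip()
    crash = datetime.strptime(crash_text, "%a %b %d %H:%M:%S %Y")
    fault = datetime.strptime(fault_time, "%Y-%m-%d %H:%M:%S")
    return (crash - fault).total_seconds()


print(seconds_to_crash(DUMP_LINE, FAULT_TIME))
```

With the values above the difference is 20 seconds, which matches the "within about 20 seconds" observation. Because the fault time was noted by hand, the result should be read as approximate.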
Process screenshots (RAC2, RAC3): [images]

# lspprc -Ao
hdisk#  PPRC    Primary        Secondary      Primary Storage   Secondary Storage
        state   path group ID  path group ID  WWNN              WWNN
hdisk1  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk2  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Missing  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000

Oracle alert log on RAC2 node:
After the RAC1 node crashes, RAC2 and RAC3 reconfigure the cluster, which takes misscount=45 seconds. Reconfiguration therefore completes about 65 s (20 + 45) after the fault, and I/O resumes.

2015-07-06 17:10:23.490: [cssd(15335570)]CRS-1612:Network communication with node rac1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 21.733 seconds
2015-07-06 17:10:34.504: [cssd(15335570)]CRS-1611:Network communication with node rac1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 10.719 seconds
2015-07-06 17:10:41.509: [cssd(15335570)]CRS-1610:Network communication with node rac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 3.715 seconds
2015-07-06 17:10:45.226: [cssd(15335570)]CRS-1632:Node rac1 is being removed from the cluster in cluster incarnation 331664508
2015-07-06 17:10:45.257: [cssd(15335570)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac2 rac3 .
2015-07-06 17:10:45.278: [ctssd(9175164)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac2.
2015-07-06 17:10:49.827: [crsd(16187422)]CRS-5504:Node down event reported for node 'rac1'.
2015-07-06 17:11:08.387: [crsd(16187422)]CRS-2773:Server 'rac1' has been removed from pool 'Generic'.
2015-07-06 17:11:08.393: [crsd(16187422)]CRS-2773:Server 'rac1' has been removed from pool 'ora.orcl'.

Result:
Storage did not swap; the RAC1 node was restarted; I/O on RAC2 and RAC3 paused for about 1 minute and then resumed.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Restart the RAC1 node. After the restart, RAC1 rejoins the CAA cluster automatically; then start the Oracle and PowerHA services manually.

2.4 Failure of All FC Links Between the Data Centers

Fault description and expectation:
All cross-site FC access is interrupted.
Expected result:
RAC3 is crashed; I/O on RAC1 and RAC2 pauses for 65 s and then resumes; storage does not swap.

[Figure: topology diagram: NFS Quorum; Site1, Site2; RAC1, RAC2, RAC3 (PowerHA Cluster); SAN-11, SAN-12, SAN-21, SAN-22; Metro Mirror between DS8877 (Primary) and DS8876]

Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P1"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P2"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots: [image]

RAC2:
# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
 2. ONLINE   473861c8d4ff4f17bf226f031c32c575 (/dev/rhdisk6) [VOTEDG]
Located 2 voting disk(s).

# lspprc -Ao
hdisk#  PPRC    Primary        Secondary      Primary Storage   Secondary Storage
        state   path group ID  path group ID  WWNN              WWNN
hdisk1  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk2  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                             Target
WWNN              SSID  LSS  Port  WWNN              SSID  LSS  Port  State
===================================================================
5005076305FFD4A4  FFC2  C2   0134  5005076305FFD4AC  FFC2  C2   0333  Up

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Enabled  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Enabled  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Failed   fscsi0  50050763051bd4ac,40c2400000000000
1         11    Failed   fscsi1  50050763051b94ac,40c2400000000000

[cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00059:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:24:51.774: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:24:55.794: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:24:59.814: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:03.834: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:07.854: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:11.877: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:15.895: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:16.893: [cssd(10748108)]CRS-1612:Network communication with node rac3 (3) missing for 50% of timeout interval. Removal of this node from cluster in 22.089 seconds
2015-07-06 18:25:17.704: [cssd(10748108)]CRS-1604:CSSD voting file is offline: /dev/rhdisk18; details at (:CSSNM00058:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:19.924: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:23.959: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:27.909: [cssd(10748108)]CRS-1611:Network communication with node rac3 (3) missing for 75% of timeout interval. Removal of this node from cluster in 11.073 seconds
2015-07-06 18:25:27.987: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:32.015: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:34.917: [cssd(10748108)]CRS-1610:Network communication with node rac3 (3) missing for 90% of timeout interval. Removal of this node from cluster in 4.064 seconds
2015-07-06 18:25:36.035: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:38.984: [cssd(10748108)]CRS-1632:Node rac3 is being removed from the cluster in cluster incarnation 331671385
2015-07-06 18:25:38.993: [cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 .
2015-07-06 18:25:39.022: [crsd(8323192)]CRS-5504:Node down event reported for node 'rac3'.
2015-07-06 18:25:41.016: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk18; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-06 18:25:41.723: [crsd(8323192)]CRS-2773:Server 'rac3' has been removed from pool 'Generic'.
2015-07-06 18:25:41.724: [crsd(8323192)]CRS-2773:Server 'rac3' has been removed from pool 'ora.orcl'.
2015-07-06 18:25:42.232: [cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-06 18:25:42.238: [cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2

Result:
RAC3 was crashed; I/O on RAC1 and RAC2 paused for 65 s and then resumed.

Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC1_8876P2"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P1"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_RAC2_8876P2"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgenable "CSC_Base"

Create the PPRC paths. Run on DS8877:
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B3 -tgtlss B3 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B4 -tgtlss B4 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B8 -tgtlss B8 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss B9 -tgtlss B9 -consistgrp I0134:I0333
mkpprcpath -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -remotewwnn 5005076305FFD4AC -srclss BD -tgtlss BD -consistgrp I0134:I0333

Create the PPRC paths. Run on DS8876:
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B3 -tgtlss B3 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B4 -tgtlss B4 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B8 -tgtlss B8 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss B9 -tgtlss B9 -consistgrp I0333:I0134
mkpprcpath -dev IBM.2107-75CHD91 -remotedev IBM.2107-75CGR21 -remotewwnn 5005076305FFD4A4 -srclss BD -tgtlss BD -consistgrp I0333:I0134

Resynchronize the data incrementally. Run on DS8877:
resumepprc -dev IBM.2107-75CGR21 -remotedev IBM.2107-75CHD91 -type mmir B300:B300 B400:B400 B800:B800 B900:B900 BD00:BD00

Result:

2.5 Primary Storage Failure

Fault description and expectation:
DS8877 fails.
Expected result:
Storage swaps to DS8876; database I/O pauses for 30 seconds and then resumes.

[Figure: topology diagram: NFS Quorum; Site1, Site2; RAC1, RAC2, RAC3 (PowerHA Cluster); SAN-11, SAN-12, SAN-21, SAN-22; Metro Mirror between DS8877 (Primary) and DS8876]

Fault simulation:
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node1fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node2fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_Node3fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:
RAC1: [image]
RAC2: [image]
RAC3: [image]

Oracle log:
2015-07-07 13:52:30.413: [cssd(10748108)]CRS-1605:CSSD voting file is online: /dev/rhdisk13; details in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:52:30.413: [cssd(10748108)]CRS-1604:CSSD voting file is offline: /dev/rhdisk6; details at (:CSSNM00069:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:52:30.413: [cssd(10748108)]CRS-1604:CSSD voting file is offline: /nfsvote3/nfs_vote; details at (:CSSNM00069:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:52:30.413: [cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-07 13:52:30.424: [cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 rac3 .
2015-07-07 13:55:52.884: [cssd(10748108)]CRS-1605:CSSD voting file is online: /nfsvote3/nfs_vote; details in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884: [cssd(10748108)]CRS-1605:CSSD voting file is online: /dev/rhdisk6; details in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884: [cssd(10748108)]CRS-1605:CSSD voting file is online: /dev/rhdisk18; details in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884: [cssd(10748108)]CRS-1604:CSSD voting file is offline: /dev/rhdisk13; details at (:CSSNM00069:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 13:55:52.884: [cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-07 13:55:52.893: [cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 rac3 .
2015-07-07 14:07:07.338: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 14:07:07.338: [cssd(10748108)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00059:) in /u01/app/11.2.0.4/grid/log/rac1/cssd/ocssd.log.
2015-07-07 14:07:11.408: [cssd(10748108)]CRS-1626:A Configuration change request completed successfully
2015-07-07 14:07:11.417: [cssd(10748108)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac1 rac2 rac3 .

# crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0eec82519af84f8bbf3ae0f365ec3148 (/nfsvote3/nfs_vote) [VOTEDG]
 2. ONLINE   321e9f400a774f98bf8f9d6178c91ba6 (/dev/rhdisk18) [VOTEDG]
Located 2 voting disk(s).
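When the same check is repeated across test runs, the voting disk listing can be verified mechanically. The following is a minimal sketch in Python, not part of the original test procedure. It assumes the usual column layout of `crsctl query css votedisk` (index, state, 32-character file ID, file name in parentheses, disk group in brackets); the names `parse_votedisks` and `all_online` are illustrative only.

```python
import re

# Sample copied from the `crsctl query css votedisk` output in this section.
SAMPLE = """\
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0eec82519af84f8bbf3ae0f365ec3148 (/nfsvote3/nfs_vote) [VOTEDG]
 2. ONLINE   321e9f400a774f98bf8f9d6178c91ba6 (/dev/rhdisk18) [VOTEDG]
Located 2 voting disk(s).
"""

# One data row: " 1. ONLINE <32 hex chars> (<path>) [<disk group>]"
ROW = re.compile(
    r"^\s*(\d+)\.\s+(\S+)\s+([0-9a-f]{32})\s+\((.+?)\)\s+\[(\w+)\]\s*$"
)
# Summary line printed at the end of the listing.
TOTAL = re.compile(r"^Located (\d+) voting disk\(s\)\.")


def parse_votedisks(text):
    """Return (rows, located) parsed from the command output."""
    rows, located = [], None
    for line in text.splitlines():
        row = ROW.match(line)
        if row:
            rows.append({
                "index": int(row.group(1)),
                "state": row.group(2),
                "file_id": row.group(3),
                "path": row.group(4),
                "disk_group": row.group(5),
            })
            continue
        total = TOTAL.match(line)
        if total:
            located = int(total.group(1))
    return rows, located


def all_online(rows, located):
    """True if every listed voting file is ONLINE and the count matches."""
    return located == len(rows) and all(r["state"] == "ONLINE" for r in rows)


rows, located = parse_votedisks(SAMPLE)
for r in rows:
    print(r["index"], r["state"], r["path"])
print("all online:", all_online(rows, located))
```

The sketch only inspects the text of the listing; it does not decide whether the remaining voting files are sufficient for the cluster, which depends on the configured voting file count.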
#
# lspprc -Ao
hdisk#  PPRC    Primary        Secondary      Primary Storage   Secondary Storage
        state   path group ID  path group ID  WWNN              WWNN
hdisk1  Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk2  Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk3  Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk4  Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4
hdisk5  Active  1(s)           0              5005076305ffd4ac  5005076305ffd4a4

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0         5005076305ffd4a4  0xc2  0x00  SECONDARY
1(s)      5005076305ffd4ac  0xc2  0x00  PRIMARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Failed   fscsi0  50050763051014a4,40c2400000000000
0         1     Failed   fscsi0  50050763050094a4,40c2400000000000
0         2     Failed   fscsi0  50050763054014a4,40c2400000000000
0         3     Failed   fscsi0  50050763050b14a4,40c2400000000000
0         4     Failed   fscsi0  50050763054b14a4,40c2400000000000
0         5     Failed   fscsi1  50050763051014a4,40c2400000000000
0         6     Failed   fscsi1  50050763050094a4,40c2400000000000
0         7     Failed   fscsi1  50050763054014a4,40c2400000000000
0         8     Failed   fscsi1  50050763050b14a4,40c2400000000000
0         9     Failed   fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000

# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                             Target
WWNN              SSID  LSS  Port  WWNN              SSID  LSS  Port  State
===================================================================
5005076305FFD4AC  FFC2  C2   0333  5005076305FFD4A4  FFC2  C2   0134  Up

Result:
Storage swapped to DS8876; database I/O paused for 30 seconds and then resumed.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Check that the PPRC paths have been created in both directions; create any missing paths manually.
Then synchronize the data manually (from DS8876 to DS8877).

2.6 Storage Replication Link Failure

Fault description and expectation:
The PPRC links between the sites fail. In the test, this is simulated by disabling the zone from DS8877 to DS8876.
Expected result:
Database I/O pauses for 30 seconds and then resumes; storage does not swap.

[Figure: topology diagram: NFS Quorum; Site1, Site2; RAC1, RAC2, RAC3 (PowerHA Cluster); SAN-11, SAN-12, SAN-21, SAN-22; Metro Mirror between DS8877 (Primary) and DS8876]

Fault simulation:
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Process screenshots:
RAC1: [image]

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY, SUSPENDED, OOS
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Enabled  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Enabled  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000

# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                             Target
WWNN              SSID  LSS  Port  WWNN              SSID  LSS  Port  State
===================================================================
5005076305FFD4AC  FFC2  C2   0333  5005076305FFD4A4  FFC2  C2   0134  Down

dscli> lspprcpath C2 c3 C7 C8 CC
Date/Time: July 7, 2015 03:12:51 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src  Tgt  State   SS    Port  Attached Port  Tgt WWNN
=======================================================
C2   C2   Failed  FFC2  -     -              5005076305FFD4AC
C3   C3   Failed  FFC3  -     -              5005076305FFD4AC
C7   C7   Failed  FFC7  -     -              5005076305FFD4AC
C8   C8   Failed  FFC8  -     -              5005076305FFD4AC
CC   CC   Failed  FFCC  -     -              5005076305FFD4AC

Result:
Database I/O paused for 30 seconds and then resumed; storage did not swap.

Recovery procedure:
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgSave
echo y | cfgenable "CSC_Base"

Create the PPRC paths.
Run resumepprc on DS8877.

2.7 Primary Site Failure

Fault description and expectation:
The primary site fails.
Expected result:
RAC3 takes over the workload; storage swaps; I/O pauses for 65 s.

[Figure: topology diagram: NFS Quorum; Site1, Site2; RAC1, RAC2, RAC3 (PowerHA Cluster); SAN-11, SAN-12, SAN-21, SAN-22; Metro Mirror between DS8877 (Primary) and DS8876]

Fault simulation:
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 11 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 12 --immed
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs1_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgremove "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgremove "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgenable "CSC_Base"

Process screenshots: [image]

2015-07-07 16:05:08.830: [cssd(12779668)]CRS-1612:Network communication with node rac1 (1) missing for 50% of timeout interval. Removal of this node from cluster in 21.834 seconds
2015-07-07 16:05:09.428: [cssd(12779668)]CRS-1615:No I/O has completed after 50% of the maximum interval. Voting file /dev/rhdisk6 will be considered not functional in 18703 milliseconds
2015-07-07 16:05:11.835: [cssd(12779668)]CRS-1612:Network communication with node rac2 (2) missing for 50% of timeout interval. Removal of this node from cluster in 21.967 seconds
2015-07-07 16:05:13.194: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00059:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:13.194: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:13.245: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:13.834: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:17.843: [cssd(12779668)]CRS-1614:No I/O has completed after 75% of the maximum interval. Voting file /dev/rhdisk6 will be considered not functional in 10288 milliseconds
2015-07-07 16:05:17.983: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:19.845: [cssd(12779668)]CRS-1611:Network communication with node rac1 (1) missing for 75% of timeout interval. Removal of this node from cluster in 10.819 seconds
2015-07-07 16:05:22.484: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:22.849: [cssd(12779668)]CRS-1611:Network communication with node rac2 (2) missing for 75% of timeout interval. Removal of this node from cluster in 10.953 seconds
2015-07-07 16:05:24.841: [cssd(12779668)]CRS-1613:No I/O has completed after 90% of the maximum interval. Voting file /dev/rhdisk6 will be considered not functional in 3290 milliseconds
2015-07-07 16:05:26.852: [cssd(12779668)]CRS-1610:Network communication with node rac1 (1) missing for 90% of timeout interval. Removal of this node from cluster in 3.812 seconds
2015-07-07 16:05:26.993: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:28.811: [cssd(12779668)]CRS-1604:CSSD voting file is offline: /dev/rhdisk6; details at (:CSSNM00058:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:29.853: [cssd(12779668)]CRS-1610:Network communication with node rac2 (2) missing for 90% of timeout interval. Removal of this node from cluster in 3.949 seconds
2015-07-07 16:05:31.323: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:33.802: [cssd(12779668)]CRS-1632:Node rac1 is being removed from the cluster in cluster incarnation 331671404
2015-07-07 16:05:33.802: [cssd(12779668)]CRS-1632:Node rac2 is being removed from the cluster in cluster incarnation 331671404
2015-07-07 16:05:33.806: [cssd(12779668)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac3 .
2015-07-07 16:05:33.831: [ctssd(12910816)]CRS-2407:The new Cluster Time Synchronization Service reference node is host rac3.
2015-07-07 16:05:36.443: [cssd(12779668)]CRS-1649:An I/O error occured for voting file: /dev/rhdisk6; details at (:CSSNM00060:) in /u01/app/11.2.0.4/grid/log/rac3/cssd/ocssd.log.
2015-07-07 16:05:37.461: [cssd(12779668)]CRS-1626:A Configuration change request completed successfully
2015-07-07 16:05:37.465: [cssd(12779668)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac3 .
2015-07-07 16:05:42.397: [crsd(7471164)]CRS-5504:Node down event reported for node 'rac1'.
2015-07-07 16:05:42.397: [crsd(7471164)]CRS-5504:Node down event reported for node 'rac2'.
2015-07-07 16:05:44.005: [client(29556942)]CRS-4743:File /u01/app/11.2.0.4/grid/oc4j/j2ee/home/OC4J_DBWLM_config/system-jazn-data.xml was updated from OCR(Size: 13365(New), 13378(Old) bytes)
2015-07-07 16:05:59.516: [crsd(7471164)]CRS-2773:Server 'rac1' has been removed from pool 'Generic'.
2015-07-07 16:05:59.520: [crsd(7471164)]CRS-2773:Server 'rac1' has been removed from pool 'ora.orcl'.
2015-07-07 16:05:59.520: [crsd(7471164)]CRS-2773:Server 'rac2' has been removed from pool 'Generic'.
2015-07-07 16:05:59.521: [crsd(7471164)]CRS-2773:Server 'rac2' has been removed from pool 'ora.orcl'.

Result:
RAC3 took over the workload; storage swapped; I/O paused for 65 s.

Recovery procedure:
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC1_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC2_fcs1_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs0_DS8877"
cfgadd "CSC_Base", "D20151030_HB_HPSW_RAC3_fcs1_DS8877"
cfgadd "CSC_Base", "D20150730_GSB_HPSW_PPRC_76to77"
echo y | cfgenable "CSC_Base"
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 11 --immed
chsysstate -m SVRP7750-29-SN06BA81T -r lpar -o shutdown --id 12 --immed

Restore the PPRC paths on the storage and resynchronize the data.
Restore the voting disk configuration.
Swap the storage back manually.

2.8 Common Scripts

2.8.1 AIX

2.8.1.1 Show the hdisk-to-LUN mapping

# for i in 1 2 3 4 5 6
do
echo hdisk$i
lscfg -vpl hdisk$i | egrep "Serial Number|Z7"
echo "\n"
done
hdisk1
Serial Number...............75CGR21C
Device Specific.(Z7)........C200

hdisk2
Serial Number...............75CGR21C
Device Specific.(Z7)........C300

hdisk3
Serial Number...............75CGR21C
Device Specific.(Z7)........C700

hdisk4
Serial Number...............75CGR21C
Device Specific.(Z7)........C800

hdisk5
Serial Number...............75CGR21C
Device Specific.(Z7)........CC00

hdisk6
Serial Number...............75CGR21E
Device Specific.(Z7)........E301

2.8.1.2 Check whether HyperSwap is enabled on the hdisks

# lspprc -Ao
hdisk#  PPRC    Primary        Secondary      Primary Storage   Secondary Storage
        state   path group ID  path group ID  WWNN              WWNN
hdisk1  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk2  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk3  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk4  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac
hdisk5  Active  0(s)           1              5005076305ffd4a4  5005076305ffd4ac

# lspprc -p hdisk1
path      WWNN              LSS   VOL   path
group id                                group status
=======================================================
0(s)      5005076305ffd4a4  0xc2  0x00  PRIMARY
1         5005076305ffd4ac  0xc2  0x00  SECONDARY

path      path  path     parent  connection
group id  id    status
=====================================================================
0         0     Enabled  fscsi0  50050763051014a4,40c2400000000000
0         1     Enabled  fscsi0  50050763050094a4,40c2400000000000
0         2     Enabled  fscsi0  50050763054014a4,40c2400000000000
0         3     Missing  fscsi0  50050763050b14a4,40c2400000000000
0         4     Enabled  fscsi0  50050763054b14a4,40c2400000000000
0         5     Enabled  fscsi1  50050763051014a4,40c2400000000000
0         6     Enabled  fscsi1  50050763050094a4,40c2400000000000
0         7     Enabled  fscsi1  50050763054014a4,40c2400000000000
0         8     Missing  fscsi1  50050763050b14a4,40c2400000000000
0         9     Enabled  fscsi1  50050763054b14a4,40c2400000000000
1         10    Enabled  fscsi0  50050763051bd4ac,40c2400000000000
1         11    Enabled  fscsi1  50050763051b94ac,40c2400000000000

2.8.1.3 Show the PPRC path configuration of a HyperSwap disk

# lspprc -c hdisk1
Displaying all paths between LSS C2 and LSS C2
Source                             Target
WWNN              SSID  LSS  Port  WWNN              SSID  LSS  Port  State
===================================================================
5005076305FFD4A4  FFC2  C2   0134  5005076305FFD4AC  FFC2  C2   0333  Up
5005076305FFD4AC  FFC2  C2   0333  5005076305FFD4A4  FFC2  C2   0134  Up

2.8.2 PowerHA

2.8.2.1 Show resource group status

# clRGinfo
-----------------------------------------------------------------------------
Group Name     State     Node
-----------------------------------------------------------------------------
racRG          ONLINE    rac1@site1
               ONLINE    rac2@site1
               ONLINE    rac3@site2

2.8.2.2 Show the cluster manager processes

# clshowsrv -v
Status of the RSCT subsystems used by PowerHA SystemMirror:
Subsystem    Group    PID      Status
cthags       cthags   9502934  active
ctrmc        rsct     6619346  active

Status of the PowerHA SystemMirror subsystems:
Subsystem    Group    PID      Status
clstrmgrES   cluster  7602196  active
clevmgrdES   cluster  7667716  active

Status of the CAA subsystems:
Subsystem    Group    PID      Status
clcomd       caa      6357208  active
clconfd      caa      8650866  active

2.8.3 DS8K

2.8.3.1 Show the microcode version

dscli> ver -l
Date/Time: July 1, 2015 09:26:05 AM IBM DSCLI Version: 7.7.10.289 DS:
DSCLI           7.7.10.289
StorageManager  7.7.7.0.20140929.1
================Version=================
Storage Image     LMC
===========================
IBM.2107-75CGR21  7.7.40.364

2.8.3.2 Show the ID, WWNN, and related information

dscli> lssi
Date/Time: July 1, 2015 09:26:22 AM IBM DSCLI Version: 7.7.10.289 DS:
Name    ID                Storage Unit      Model  WWNN              State   ESSNet
==============================================================================
DS8877  IBM.2107-75CGR21  IBM.2107-75CGR20  961    5005076305FFD4A4  Online  Enabled

2.8.3.3 Show PPRC status

dscli> lspprc C200 C300 C700 C800 CC00
Date/Time: July 7, 2015 05:27:50 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
ID         State        Reason  Type          SourceLSS  Timeout (secs)  Critical Mode  First Pass Status
==================================================================================================
C200:C200  Full Duplex  -       Metro Mirror  C2         60              Disabled       Invalid
C300:C300  Full Duplex  -       Metro Mirror  C3         60              Disabled       Invalid
C700:C700  Full Duplex  -       Metro Mirror  C7         60              Disabled       Invalid
C800:C800  Full Duplex  -       Metro Mirror  C8         60              Disabled       Invalid
CC00:CC00  Full Duplex  -       Metro Mirror  CC         60              Disabled       Invalid

2.8.3.4 Show PPRC path status and configuration

dscli> lspprcpath C2 c3 C7 C8 CC
Date/Time: July 7, 2015 05:27:54 PM IBM DSCLI Version: 7.7.10.289 DS: IBM.2107-75CGR21
Src Tgt State SS
Port Attached Port Tgt WWNN ========================================================= C2 C2 Success FFC2 I0134 I0333 5005076305FFD4AC C3 C3 Success FFC3 I0134 I0333 5005076305FFD4AC C7 C7 Success FFC7 I0134 I0333 5005076305FFD4AC C8 C8 Success FFC8 I0134 I0333 5005076305FFD4AC CC CC Success FFCC I0134 I0333 5005076305FFD4AC dscli> showlss c2 Date/Time: 2015 年 7 月 7 日 下午 05 时 28 分 44 秒 IBM DSCLI Version: 7.7.10.289 DS: 59 / 64 甘肃银行 PowerHA HyperSwap 测试记录 IBM.2107-75CGR21 ID C2 Group 0 addrgrp C stgtype fb confgvols 1 subsys 0xFFC2 pprcconsistgrp Enabled xtndlbztimout 60 secs resgrp RG0 2.8.4 恢复 Oracle 的 voting disk 配置 在涉及到如下场景时,Oracle voting disk group 的配置会有变化,即原先有 3 个成员,场景 发生后,会丢失其中一个成员;在恢复故障后,需要手工恢复 voting disk group 的配置,下 面以主存储故障发生后,恢复 voting disk group 的步骤进行说明: 恢复 RAC 的 voting disk 设置: 由于测试过程中丢失了 hdisk6 的 voting disk 盘访问,在进行下次测试之前,需要进行手工恢 复: 在故障之前,正常情况下是如下输出: $ crsctl query css votedisk ## STATE File Universal Id File Name Disk group -- ----------------------------- --------1. ONLINE 0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG] 2. ONLINE 473861c8d4ff4f17bf226f031c32c575 (/dev/rhdisk6) [VOTEDG] 3. ONLINE 4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG] Located 3 voting disk(s). hdisk6 来自主存储 DS8877 hdisk18 来自备存储 DS8876 /nfsvote3/nfs_vote 来自第三方站点的 NFS Server hdisk13 是用来作为临时的 voting disk group 来进行处理 60 / 64 甘肃银行 PowerHA HyperSwap 测试记录 故障发生后,所有节点不能访问位于主存储的 hdisk6,hdisk6 就会被踢出当前的 voting disk group,我们采用如下步骤: 1. 采用临时 voting disk group 替换当前 dg 2. 删除之前生产用的 dg 3. 重建之前生产用的 dg 4. 将新建的 dg 再替换成当前 dg $ crsctl query css votedisk ## STATE File Universal Id File Name Disk group -- ----------------------------- --------1. ONLINE 0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG] 3. ONLINE 4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG] Located 2 voting disk(s). 
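The degraded state shown in the post-failure output can be checked mechanically by counting the ONLINE rows. The following is only a minimal sketch: the helper name count_online_votedisks and the EXPECTED value of 3 are illustrative assumptions, and the function works on saved `crsctl query css votedisk` text read from stdin rather than calling crsctl itself.

```shell
#!/bin/sh
# Sketch only: count_online_votedisks is a hypothetical helper, not part of
# Oracle Clusterware. It reads `crsctl query css votedisk` output on stdin.
count_online_votedisks() {
    # Data rows look like " 1. ONLINE <id> (<path>) [<diskgroup>]":
    # field 1 is the row number with a trailing dot, field 2 is the state.
    awk '$1 ~ /^[0-9]+\.$/ && $2 == "ONLINE" { n++ } END { print n + 0 }'
}

# Example input: the post-failure output recorded in this test.
online=$(count_online_votedisks <<'EOF'
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
 3. ONLINE   4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG]
Located 2 voting disk(s).
EOF
)

EXPECTED=3   # assumption: this environment normally has three voting disks
if [ "$online" -lt "$EXPECTED" ]; then
    echo "voting disk group degraded: $online of $EXPECTED ONLINE"
else
    echo "voting disk group healthy: $online of $EXPECTED ONLINE"
fi
```

On a live node the same function could be fed with `crsctl query css votedisk | count_online_votedisks`; a count below the expected value indicates that the manual recovery described in this section is still pending.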
# asmcmd
ASMCMD> lsdsk -G votedg
Path
/dev/rhdisk18
/nfsvote3/nfs_vote
ASMCMD> lsdg
State    Type    Rebal  Sector  Block  AU       Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N      512     4096   1048576  40960     6712     0                6712            0              N             DATA/
MOUNTED  NORMAL  N      512     4096   1048576  10540     10337    0                5035            1              Y             VOTEDG/
MOUNTED  EXTERN  N      512     4096   1048576  1024      618      0                618             0              N             VOTETMPDG/

$ sqlplus "/as sysasm"
SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      MOUNTED

SQL> set linesize 1000
SQL> Col name format a25
SQL> select disk_number,repair_timer,state,name,path from v$asm_disk order by name;
DISK_NUMBER REPAIR_TIMER STATE    NAME                  PATH
          0            0 NORMAL   DATA_0000             /dev/rhdisk1
          1            0 NORMAL   DATA_0001             /dev/rhdisk2
          2            0 NORMAL   DATA_0002             /dev/rhdisk3
          3            0 NORMAL   DATA_0003             /dev/rhdisk4
          1            0 NORMAL   VOTEDG_0001           /dev/rhdisk18
          2            0 NORMAL   VOTEDG_0002           /nfsvote3/nfs_vote
          0            0 NORMAL   VOTETMPDG_0000        /dev/rhdisk13
          0            0 FORCING  _DROPPED_0000_VOTEDG
          0            0 NORMAL                         /dev/rhdisk6
9 rows selected.

$ crsctl replace votedisk +votetmpdg
Successful addition of voting disk 5e46bfec9eea4f37bf1c6ae33e508a23.
Successful deletion of voting disk 2393621fb4e04f4dbfc32cd2d6521d8e.
Successful deletion of voting disk 36c1a96614814f37bf5e7c1eeb0723f2.
Successfully replaced voting disk group with +votetmpdg.
CRS-4266: Voting file(s) successfully replaced

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   5e46bfec9eea4f37bf1c6ae33e508a23 (/dev/rhdisk13) [VOTETMPDG]
Located 1 voting disk(s).

$ sqlplus "/as sysasm"
SQL*Plus: Release 11.2.0.4.0 Production on Mon Jul 6 18:06:27 2015
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL> alter diskgroup votedg dismount;   -- must be run on all nodes
Diskgroup altered.

SQL> drop diskgroup votedg force including contents;
Diskgroup dropped.

SQL> create diskgroup voteDG normal redundancy
     failgroup fg1 disk '/dev/rhdisk6'
     failgroup fg2 disk '/dev/rhdisk18'
     quorum failgroup fg3 disk '/nfsvote3/nfs_vote'
     attribute 'compatible.asm' = '11.2.0.0.0';
Diskgroup created.

SQL>
SQL> quit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

$ set -o vi
$ crsctl replace votedisk +votedg
Successful addition of voting disk 0288bafd47c14f27bf70eb282798b259.
Successful addition of voting disk 473861c8d4ff4f17bf226f031c32c575.
Successful addition of voting disk 4e28ed259e1d4fd9bf38ffa1b8f53336.
Successful deletion of voting disk 5e46bfec9eea4f37bf1c6ae33e508a23.
Successfully replaced voting disk group with +votedg.
CRS-4266: Voting file(s) successfully replaced

$ crsctl query css votedisk
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   0288bafd47c14f27bf70eb282798b259 (/nfsvote3/nfs_vote) [VOTEDG]
 2. ONLINE   473861c8d4ff4f17bf226f031c32c575 (/dev/rhdisk6) [VOTEDG]
 3. ONLINE   4e28ed259e1d4fd9bf38ffa1b8f53336 (/dev/rhdisk18) [VOTEDG]
Located 3 voting disk(s).

If votetmpdg turns out to be in the DISMOUNTED state when you try to replace the current disk group with it, mount the disk group first.

SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      DISMOUNTED

SQL> alter diskgroup votetmpdg mount;
Diskgroup altered.

SQL> select name, state from v$asm_diskgroup;
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      MOUNTED
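
The mount-before-replace rule above can be expressed as a small dry-run helper. This is a sketch under stated assumptions: dg_state and plan_replace are hypothetical names, the input is saved `select name, state from v$asm_diskgroup;` output read from stdin, and the script only prints the statements recorded in this section rather than executing them.

```shell
#!/bin/sh
# Sketch only: both functions are hypothetical helpers for illustration.

# dg_state NAME: print the STATE column for disk group NAME, reading
# `select name, state from v$asm_diskgroup;` output on stdin.
dg_state() {
    awk -v dg="$1" 'NF == 2 && toupper($1) == toupper(dg) { print $2; exit }'
}

# plan_replace TMPDG: print, in order, what would have to be run before the
# temporary disk group TMPDG can take over the voting files (dry run only).
plan_replace() {
    state=$(dg_state "$1")
    if [ -z "$state" ]; then
        echo "disk group $1 not found in input" >&2
        return 1
    fi
    if [ "$state" != "MOUNTED" ]; then
        # SQL statement for the ASM instance (sqlplus "/as sysasm")
        printf '%s\n' "alter diskgroup $1 mount;"
    fi
    # Shell command, as recorded earlier in this section
    printf '%s\n' "crsctl replace votedisk +$1"
}

# Example input: the DISMOUNTED case recorded above.
plan_replace votetmpdg <<'EOF'
NAME                           STATE
------------------------------ -----------
DATA                           MOUNTED
VOTEDG                         MOUNTED
VOTETMPDG                      DISMOUNTED
EOF
```

Printing the plan instead of running it keeps the operator in control, which suits a procedure that drops and re-creates a disk group; each printed line still has to be entered in the matching tool (sqlplus or the shell).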