Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access EVA P6000 for Linux

Part Number: 710335-006
Published: November 2016

© Copyright 2015 Hewlett Packard Enterprise Development LP

The information contained herein is subject to change without notice. The only warranties for Hewlett Packard Enterprise products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. Hewlett Packard Enterprise shall not be liable for technical or editorial errors or omissions contained herein.

Confidential computer software. Valid license from Hewlett Packard Enterprise required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license.

Links to third-party websites take you outside the Hewlett Packard Enterprise website. Hewlett Packard Enterprise has no control over and is not responsible for information outside the Hewlett Packard Enterprise website.

Acknowledgments

Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

Contents

1 Introduction
    Overview of P6000/EVA and P6000 Continuous Access Concepts
        Copy sets
        Data Replication Groups (DR Groups)
        Write modes
        DR Group write history log
        Failover
        Failsafe mode
        Failsafe on Link-down/Power-up
    Overview of a Metrocluster with P6000 Continuous Access configuration
    P6000 Continuous Access Management Software
2 Configuring an application in a Metrocluster environment
    Installing the necessary hardware and software
        Setting up the storage hardware
        Installing the necessary software
    Creating the cluster
        Site Aware Failover Configuration
    Setting up the replication
        Creating VDISKs and DR groups using P6000 Command View
        Using Storage System Scripting Utility (SSSU)
        Creating the Metrocluster with Continuous Access EVA P6000 for Linux Map for the replicated storage
            Defining Management Server and SMI-S information
                Creating the Management Server list
                Creating the Management Server Mapping file
                Setting a default Management Server
                Displaying the list of Management Servers
                Adding or updating Management Server information
                Deleting a Management Server
            Defining P6000/EVA storage cells and DR groups
                Creating the Storage Map file
                Displaying information about storage devices
            Copying the Map file
    Configuring volume groups
        Identifying the device special files for Vdisks in a DR group
            Identifying special device files
        Configuring LVM volume group using Metrocluster with Continuous Access EVA P6000 for Linux
            Creating LVM volume groups
            Configuring VMware VMFS Disk
    Installing and configuring an application
        Configuring a Metrocluster package
            Setting up Disk Monitoring
            Configuring a Metrocluster package using Serviceguard Manager
3 Metrocluster features
    Data replication storage failover preview
    Rolling upgrade for Metrocluster
    Live Application Detach
4 Understanding failover/failback scenarios
    Metrocluster package failover/failback scenarios
5 Administering Metrocluster
    Adding a node to Metrocluster
    Maintaining EVA P6000 Continuous Access replication in Metrocluster
        P6000 Continuous Access Link failure scenarios
        Planned maintenance
            Node maintenance
            Metrocluster package maintenance
        Failback
    Administering Metrocluster with Serviceguard Manager
    Rolling upgrade
        Upgrading Metrocluster replication software
        Limitations of the rolling upgrade for Metrocluster
        Upgrading replication management software
            Upgrading the OpenPegasus WBEM Services for Metrocluster with Continuous Access EVA P6000 for Linux
6 Troubleshooting
    Troubleshooting Metrocluster
        Metrocluster log
        P6000/EVA storage system log
7 Support and other resources
    Accessing Hewlett Packard Enterprise Support
    Accessing updates
    Websites
    Customer self repair
    Remote support
    Documentation feedback
A Checklist and worksheet for configuring a Metrocluster with Continuous Access EVA P6000 for Linux
    Disaster Recovery Checklist
    Cluster Configuration Worksheet
    Package Configuration Worksheet
    P6000/EVA Configuration Checklist
B Package attributes for Metrocluster with Continuous Access EVA P6000 for Linux
C smiseva.conf file
D mceva.conf file
E Identifying the devices to be used with packages
    Identifying the devices created in P6000/EVA
Glossary
Index

1 Introduction

This document describes how to configure data replication solutions using HPE P6000/EVA disk arrays to provide disaster recovery for Serviceguard clusters over long distances. It also gives an overview of the P6000 Continuous Access software and the additional files that integrate P6000/EVA disk arrays with Metrocluster.

Overview of P6000/EVA and P6000 Continuous Access Concepts

P6000 Continuous Access provides remote data replication from primary P6000/EVA storage systems to remote P6000/EVA storage systems. P6000 Continuous Access uses the remote-copy function of the Hierarchical Storage Virtualization (HSV) controller running the controller software (VCS or XCS) to achieve host-independent remote data replication.

Remote replication is the continuous copying of data from selected virtual disks on a source (local) array to related virtual disks on a destination (remote) array. Virtual disks (Vdisks) are user-defined allotments of virtual or logical data storage. Applications continue to run while data is replicated in the background. Remote replication requires a fabric connection between the source and destination arrays and a logical grouping between source virtual disks and destination virtual disks.

This section describes some basic remote replication concepts.
The topics discussed are:
• Copy sets
• Data Replication Groups (DR Groups)
• Write modes
• DR Group write history log
• Failover
• Failsafe mode
• Failsafe on Link-down/Power-up

Copy sets

A pairing relationship can be created to automatically replicate a logical disk from the source array to another logical disk in the destination array. A pair of source and destination virtual disks that have a replication relationship is called a copy set.

A Vdisk does not have to be part of a copy set. Vdisks at any site can be set up for local storage and used for activities such as testing and backup. Clones and snapclones are examples of Vdisks used in this manner. When a Vdisk is not part of a copy set, it is not disaster tolerant, but it can use various Vraid types for failure tolerance.

Data Replication Groups (DR Groups)

A DR group is a logical group of virtual disks in a remote replication relationship between two arrays. DR groups operate in a paired relationship, with one DR group being a source and the other a destination. Hence, a DR group can be thought of as a collection of copy sets. The terms source and destination are sometimes referred to as a DR mode or DR role. Hosts write data to the virtual disks in the source array, and the array copies the data to the virtual disks in the destination array. I/O ordering is maintained across the virtual disks in a DR group, ensuring I/O consistency on the destination array in the event of a failure of the source array. All virtual disks used for replication must belong to a DR group, and a DR group must contain at least one Vdisk.

When a DR group is first created, a full copy (normalization) occurs to copy all the data in the DR group from the source array to the destination array, bringing the source and destination Vdisks into synchronization. Normalizations copy data from the source array to the destination array in 128 KB blocks.

The replicating direction of a DR group is always from a source to a destination. In bidirectional replication, an array can have both source and destination virtual disks, which must reside in separate DR groups. That is, one virtual disk cannot be both a source and a destination simultaneously. Bidirectional replication enables you to use both arrays for primary storage while they provide disaster protection for another site.

The remote copy feature is intended not only for disaster recovery, but also to replicate data from one storage system or physical site to another storage system or site. It also provides a method for performing a backup at either the source or destination site.

P6000 Continuous Access has the ability to suspend and resume replication. Some versions of controller software support auto suspend when a full copy of the DR group is required. This feature can be used to protect the data at the destination site by delaying the full copy operation until a snapshot or snapclone of the data has been made. See the HPE P6000 EVA Compatibility Reference available at http://www.hpe.com/support/manuals -> Storage -> Storage Software -> Storage Replication Software -> HP P6000 Command View Software to determine if your array supports this feature.

IMPORTANT: Metrocluster with Continuous Access EVA P6000 for Linux does not support enabling the auto suspend on full copy feature.

Write modes

The remote replication write modes are as follows:
• Synchronous — The array acknowledges I/O completion after the data is cached on both the local and destination arrays.
• Asynchronous — The array acknowledges I/O completion before data is replicated on the destination array. Asynchronous write mode can be basic or enhanced, depending on the software version of the controller.
  ◦ Basic asynchronous mode — An I/O completion acknowledgement is sent to the host immediately after data is written to the cache at the source controller, but before the data is delivered to the cache on the destination controller. There is no requirement to wait for the I/O completion acknowledgement from the destination controller.
  ◦ Enhanced asynchronous mode — The host receives an I/O completion acknowledgement after the data is written to local cache and then successfully written to the disk-based log/journal that queues the writes. The asynchronous replication process reads the I/O from the journal and replicates it, using current methodologies, to the destination P6000/EVA.

You can specify the replication write mode when you create DR groups. The choice of write mode, which is a business decision, has implications for bandwidth requirements and RPO. Synchronous mode provides greater data currency because the RPO will be zero. Asynchronous mode provides faster response to server I/O, but at the risk of losing data queued at the source side if a site disaster occurs. For complete information on which write modes are supported on each version of controller software, see the HPE P6000 EVA Compatibility Reference available at http://www.hpe.com/support/manuals -> Storage -> Storage Software -> Storage Replication Software -> HP P6000 Command View Software.

IMPORTANT: This product supports only the synchronous and enhanced asynchronous replication modes; it does not support basic asynchronous replication mode. For more information on supported arrays, see the Disaster Tolerant Clusters Products Compatibility Feature Matrix available at: http://www.hpe.com/info/linux-serviceguard-docs.

DR Group write history log

The DR group write history log is a virtual disk that stores a DR group's host write data. The log is created when you create the DR group. Once the log is created, it cannot be moved.

In synchronous mode or basic asynchronous mode, the DR group write history log stores data when replication to the destination DR group is stopped because the destination DR group is unavailable or suspended. This process is called logging. When replication resumes, the contents of the log are sent to the destination virtual disks in the DR group. This process of sending the I/Os contained in the write history log to the destination array is called merging. Because the data is written to the destination in the order in which it was written to the log, merging maintains an I/O-consistent copy of the DR group's data at the destination.

When using synchronous mode or basic asynchronous mode, if logging occurs because replication has been suspended or the replication links have failed, the log file expands in proportion to the amount of writes. The log file can grow only up to the user-specified maximum value or the controller software's default maximum value. You can set the maximum size for the DR group write history log while in synchronous mode. The size of the log cannot be changed while in basic asynchronous mode; you must change the write mode to synchronous, change the log file size, and then return to basic asynchronous mode.
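Because the log size can be changed only in synchronous mode, the resize sequence lends itself to an SSSU input file (SSSU is introduced in "Using Storage System Scripting Utility (SSSU)" in Chapter 2). The following is a minimal sketch only: the manager address, credentials, DR group name, and size value are placeholders, and it assumes your SSSU version supports the writemode and maxlogsize attributes of SET DR_GROUP; verify the attribute names and size units against the SSSU reference for your controller software.

select manager 15.13.244.182 user=administrator pass=administrator
select system DC-1
set DR_GROUP "\Data Replication\DRG_DB1" writemode=synchronous
set DR_GROUP "\Data Replication\DRG_DB1" maxlogsize=10240
set DR_GROUP "\Data Replication\DRG_DB1" writemode=asynchronous

Run the file with # /sbin/sssu "FILE <sssu_input_file>", as shown in Chapter 2.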
In synchronous mode and basic asynchronous mode, the log grows as needed when the DR group is logging and shrinks as entries in the log are merged to the remote array. The controller considers the log disk full when one of the following occurs:
• No free space remains in the disk group that contains the log disk.
• The log disk reaches 2 TB of Vraid1 space.
• The log reaches the default or user-specified maximum log disk size.

In enhanced asynchronous mode, the DR group write history log acts as a buffer and stores the data until it can be replicated. The consumption of the additional capacity required for the log should not be viewed as missing capacity; it is capacity used to create the log. The DR group write history log file size is set when you transition the DR group to enhanced asynchronous mode. The space for the DR group write history log must be available on both the source and destination arrays before the DR group is transitioned to enhanced asynchronous mode. Once set, the space is reserved for the DR group write history log and cannot be reduced in size.

If necessary, you can reclaim allocated log disk space from a DR group in enhanced asynchronous mode. You must first change the write mode to synchronous and then use the log control feature to reduce the log size. When the log content has been drained, you can return the DR group to enhanced asynchronous mode. Until the DR group is returned to enhanced asynchronous mode, it operates in synchronous mode, which may impact performance.

Allocated log file space is not decreased when DR group members are removed. Log space usage increases when members are added to an existing DR group unless the size of the log disk has reached the maximum of 2 TB or has been fixed to a user-defined value. For details on maximum and default log sizes for different replication modes, see the HPE P6000 EVA Compatibility Reference available at http://www.hpe.com/support/manuals -> Storage -> Storage Software -> Storage Replication Software -> HP P6000 Command View Software.

When a write history log overflows, the controller invalidates the log contents and marks the DR group for normalization to bring the source and destination arrays back into synchronization.

NOTE: When the replication mode is manually changed from asynchronous to synchronous mode, the state is displayed as Run Down.

Failover

In P6000 Continuous Access replication, failover reverses the replication direction for a DR group. The destination array assumes the role of the source, and the source array assumes the role of the destination. The process can be planned or unplanned. A planned failover allows an orderly shutdown of the system before the redundant system takes over. An unplanned failover occurs when a failure or outage occurs that may not allow an orderly transition of roles.

NOTE: Failover can take other forms:
• Controller failover — The process that occurs when one controller in a pair assumes the workload of a failed or redirected controller in the same array.
• Fabric or path failover — I/O operations transfer from one fabric or path to another.

Failsafe mode

Failsafe mode is available only when a DR group is being replicated in synchronous mode. It specifies how host I/O is handled if data cannot be replicated between the source and destination array.
The failsafe mode can be one of the following:
• Failsafe enabled — All host I/O to the DR group is stopped if data cannot be replicated between the source array and destination array. This ensures that both arrays always contain the same data (an RPO of zero). A failsafe-enabled DR group can be in one of two states:
  ◦ Locked (failsafe-locked) — Host I/O and remote replication have stopped because data cannot be replicated between the source and destination array.
  ◦ Unlocked (failsafe-unlocked) — Host I/O and remote replication have resumed once replication between the arrays is re-established.
• Failsafe disabled — If replication of data between the source and destination array is interrupted, the host continues writes to the source array, but all remote replication to the destination array stops and I/Os are put into the DR group write history log until remote replication is re-established.

NOTE: Failsafe mode is available only in synchronous write mode. Host I/O can be recovered by changing affected DR groups from failsafe-enabled mode to failsafe-disabled mode. This action begins logging of all incoming writes to the source member of the Data Replication group. Metrocluster with Continuous Access EVA P6000 for Linux does not support enabling failsafe mode.

Failsafe on Link-down/Power-up

Failsafe on Link-down/Power-up is a setting that specifies whether virtual disks in a DR group are automatically presented to hosts after a power-up (reboot) of the source array when the links to the destination array are down and the DR group is not suspended. This prevents a situation where the virtual disks in a DR group are presented to servers on the destination array following a failover, and then the virtual disks on the source array are also presented when it reboots. Values for Failsafe on Link-down/Power-up are as follows:
• Enabled — Virtual disks in a source DR group are not automatically presented to hosts. This is the default value assigned to a DR group when it is created. This behavior is called presentation blocking and provides data protection under several circumstances. Host presentation remains blocked until the destination array becomes available (and can communicate with the source array) or until the DR group is suspended.
• Disabled — Virtual disks in a source DR group are automatically presented to hosts after a controller reboot.

NOTE: Metrocluster with Continuous Access EVA P6000 for Linux does not support enabling Failsafe on Link-down/Power-up.

For more information on remote data replication concepts and planning a remote replication solution, see the HPE P6000 Continuous Access Implementation Guide available at http://www.hpe.com/support/manuals -> Storage -> Storage Software -> Storage Replication Software -> HP P6000 Continuous Access Software.

Overview of a Metrocluster with P6000 Continuous Access configuration

A Metrocluster is configured with nodes at Site A and Site B. When Site A and Site B form a Metrocluster, a third location is required where a Quorum Server or arbitrator nodes must be configured. There is a P6000/EVA storage system at each site, and the two are connected to each other through Continuous Access links.

An application is deployed in a Metrocluster by configuring it at both sites. The sites are referred to as either DC1 or DC2 for an application, based on their role. Typically, the application runs on the DC1 site, which is the primary site.
If there is a disaster at the DC1 site, the application automatically fails over to the recovery site, referred to as the DC2 site.

NOTE: DC1 and DC2 are application-specific roles of a site.

For each application, either synchronous or enhanced asynchronous mode replication is configured to replicate data between the two sites using a DR group. In a typical configuration, more than one application is configured to run in a Metrocluster. Depending on the application distribution in a Metrocluster environment, some applications can have Site A as their DC1 while other applications can have Site B as their DC1.

Figure 1 Sample Configuration of Metrocluster with Continuous Access EVA P6000 for Linux

Figure 1 depicts an example of two applications distributed in a Metrocluster with Continuous Access EVA P6000 for Linux environment, balancing the server and replication load. In this example, Site A is the primary site (DC1) for application A and the recovery site (DC2) for application B. Site B is the primary site (DC1) for application B and the recovery site (DC2) for application A.

P6000 Continuous Access Management Software

Metrocluster with Continuous Access EVA P6000 for Linux requires the following two software components to be installed on the Management Server:
• P6000 Command View (earlier known as HP StorageWorks P6000 Command View). This software component allows you to configure and manage the storage and DR groups via a web browser interface.
• The SMI-S (Storage Management Initiative Specification) EVA software, which provides the interface for the management of P6000/EVA. Metrocluster with Continuous Access EVA P6000 for Linux software uses the OpenPegasus WBEM API to communicate with SMI-S to automatically manage the DR groups that are used in the application packages.

2 Configuring an application in a Metrocluster environment

Installing the necessary hardware and software

When the following procedures are complete, an adoptive node will be able to access the data belonging to a package after it fails over.

Setting up the storage hardware

1. Before you configure this product, correctly cable the P6000/EVA with redundant paths to each node in the cluster that will run packages accessing data on the array.
2. Install and configure the hardware components of the P6000/EVA, including HSV controllers, disk arrays, SAN switches, and the Management Server.
3. Install and configure P6000 Command View and SMI-S EVA on the Management Server. For the installation and configuration process, see the HPE P6000 Command View Installation Guide.

Installing the necessary software

Before you configure a Metrocluster, make sure the following software is installed on all nodes:
• Serviceguard for Linux A.12.00.00 or later
• Metrocluster with Continuous Access EVA P6000 for Linux

See the Release Notes and Compatibility and Feature Matrix available at http://www.hpe.com/info/linux-serviceguard-docs for the latest patches available for the above mentioned software.

Creating the cluster

NOTE: The file /etc/cmcluster.conf contains the mappings that resolve symbolic references to $SGCONF, $SGROOT, $SGLBIN, and so on, used in the pathnames in the following subsections. If the Serviceguard variables are not defined on your system, include the file /etc/cmcluster.conf in your login profile for the root user.
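For example, assuming the root user's login shell is bash, a single line appended to the root profile makes these variables available in every session:

# echo '. /etc/cmcluster.conf' >> /root/.bash_profile

You can then confirm the variables resolve (the path shown is a default installation; yours may differ):

# . /root/.bash_profile
# echo $SGCONF
/opt/cmcluster/conf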
For more information on these parameters, see "Understanding the Location of Serviceguard Files" and "Enabling Serviceguard Command Access" in Managing HPE Serviceguard A.12.00.30 for Linux available at http://www.hpe.com/info/linux-serviceguard-docs.

Create a Serviceguard cluster with components on multiple sites according to the process described in Managing HPE Serviceguard A.12.00.30 for Linux available at http://www.hpe.com/info/linux-serviceguard-docs.

NOTE: A configuration with a Lock LUN is not supported in a Metrocluster environment.

Site Aware Failover Configuration

A Serviceguard cluster allows sites to be configured in a Metrocluster environment. The Serviceguard cluster configuration file includes the attributes shown in Table 1 (page 11) to define sites. This enables the use of the site_preferred or site_preferred_manual package failover policies for Metrocluster packages.

Table 1 Site Aware Failover Configuration

SITE_NAME — Defines a unique name for a site in the cluster.
SITE — Associates a node with a site; the SITE keyword appears under the node's NODE_NAME definition.

The following is a sample of the site definition in a Serviceguard cluster configuration file:

SITE_NAME san_francisco
SITE_NAME san_jose
NODE_NAME SFO_1
SITE san_francisco
.....
NODE_NAME SFO_2
SITE san_francisco
........
NODE_NAME SJC_1
SITE san_jose
.......
NODE_NAME SJC_2
SITE san_jose
........

Use the cmviewcl command to view the list of sites that are configured in the cluster and their associated nodes. The following is a sample of the command and its output:

# cmviewcl -l node
SITE_NAME san_francisco
NODE    STATUS   STATE
SFO_1   up       running
SFO_2   up       running
.........
SITE_NAME san_jose
NODE    STATUS   STATE
SJC_1   up       running
SJC_2   up       running

You can configure either of these failover policies for Metrocluster failover packages. To use these policies, you must specify site_preferred or site_preferred_manual for the failover_policy attribute in the Metrocluster package configuration file.

NOTE: For a Metrocluster package, Hewlett Packard Enterprise recommends that you set the failover_policy parameter to site_preferred.
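For example, the corresponding line in the Metrocluster package configuration file (see "Configuring a Metrocluster package" later in this chapter) is:

failover_policy site_preferred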
Setting up the replication

Creating VDISKs and DR groups using P6000 Command View

P6000 Command View is a web-based tool to configure, manage, and monitor virtual disks and DR groups, as shown in Figure 2 (page 13).

Figure 2 P6000/EVA Management Console using P6000 Command View

For more information on setting up P6000 Command View for configuring, managing, and monitoring the P6000/EVA Storage System, see the HPE P6000 Command View User Guide available at http://www.hpe.com/support/manuals -> Storage -> Storage Software -> Storage Device Management Software -> HP StorageWorks HP P6000 Command View Software.

Using the Command View (CV) web user interface, create VDISKs, create a DR group using the VDISKs, and present those VDISKs to the connected hosts. After a DR group is created, set the destination host access field to Read only using the Command View GUI.

NOTE: In the Metrocluster environment, the destination volume access mode of the DR group must be set to Read only mode. In some earlier Command View software versions, when a DR group is created, only the source volume (primary volume) is visible and accessible, with Read/Write mode. The destination volume (secondary volume) by default is not visible and accessible to its local hosts. The destination volume access mode must be changed to Read only mode before the DR group can be used, and the destination volumes must be presented to their local hosts.

Using Storage System Scripting Utility (SSSU)

HPE Storage System Scripting Utility (SSSU) is a utility to manage and monitor P6000/EVA storage arrays. Using SSSU, you can create VDISKs and DR groups to use with Metrocluster packages. For more information about SSSU commands, see the sample input file available at: $SGCONF/mccaeva/Samples/sssu_sample_input

The contents of the sample input file are listed below. In the following sample file, DC-1 is the name of the source array.

select manager 15.13.244.182 user=administrator pass=administrator
select system DC-1
set DR_GROUP "\Data Replication\DRG_DB1" accessmode=readonly
ls DR_GROUP "\Data Replication\DRG_DB1"

After you create the VDISKs and DR groups, complete the following steps to copy and edit the sample input file:
1. Copy the sample file sssu_sample_input to the desired location.
   # cp $SGCONF/mccaeva/Samples/sssu_sample_input <desired location>
2. Customize the file sssu_sample_input.
3. After you customize the sssu_input file, run the SSSU command as follows to set the destination Vdisk to read-only mode.
   # /sbin/sssu "FILE <sssu_input_file>"
4. To create the special device file name for the Vdisk on P6000/EVA, after changing the access mode of the destination Vdisk, run the /usr/bin/rescan-scsi-bus.sh command to detect and activate the disks, and then run the lsscsi command to display the configured disks.

NOTE: The lsscsi command is available in the lsscsi package in the respective OS repository.

Creating the Metrocluster with Continuous Access EVA P6000 for Linux Map for the replicated storage

The mapping file caeva.map must be present on all the cluster nodes for the Metrocluster with Continuous Access EVA P6000 for Linux product to function. The mapping file caeva.map contains information about the Management Servers as well as about the P6000/EVA storage systems and DR groups used in the Metrocluster environment.

The product provides two utility tools, smispasswd and evadiscovery, for users to provide information about the SMI-S servers running on the Management Servers and the DR groups that will be used in the Metrocluster environment. These tools must be used to create or modify the map file caeva.map.

smispasswd

Metrocluster retrieves storage information from the SMI-S server for its startup and failover operations. To contact the SMI-S server, it requires the SMI-S server's hostname/IP address, username/password, and port number. This information must be available in the caeva.map file before you configure any Metrocluster packages. The smispasswd utility, packaged along with Metrocluster, must be used to create or edit this map file with the SMI-S server login information.

evadiscovery

When querying P6000/EVA storage states through SMI-S, Metrocluster first needs to find the internal device IDs. This process takes time. However, it need not be repeated, because the IDs are static in the P6000/EVA system. This information is cached in the caeva.map file to improve package startup time.
To cache the internal device IDs, run the evadiscovery tool after the P6000/EVA and P6000 Continuous Access are configured and the storage is accessible from the hosts. The tool queries the active Management Server for the needed information and then saves it in caeva.map. Once the map file is created, it must be distributed to all the cluster nodes so that they can communicate with the P6000/EVA units.

NOTE: It is important to set the active Management Server before executing the evadiscovery tool. For details on setting the active Management Server, see "Setting a default Management Server".

Defining Management Server and SMI-S information

To define Management Server and SMI-S information, use the smispasswd tool. The following sections describe the options for defining Management Server and SMI-S information.

Creating the Management Server list

On a host that resides in the same data center as the active Management Server, create the Management Server list using an input file as follows:
1. Create a configuration input file. A template of this file is available at the following location for Red Hat and SUSE. For an example of the smiseva.conf file, see "smiseva.conf file" (page 44). The smiseva.conf is available at: $SGCONF/mccaeva/smiseva.conf
2. Copy the template file smiseva.conf to the desired location.
   # cp $SGCONF/mccaeva/smiseva.conf <desired location>
3. For each Management Server in your configuration (both local and remote sites), enter the Management Server's hostname or IP address, the administrator login name, the type of connection (secure or non-secure), the SMI-S namespace, and the SMI-S port.

Creating the Management Server Mapping file

Use the smispasswd command to create or modify the Management Server information stored in the mapping file. For each Management Server listed in the file, a password prompt is displayed. A username and password are created by your system administrator when the Management Server is configured because of the security protocol for P6000/EVA. Enter the password associated with the username of the SMI-S, and then re-enter it (as prompted) to verify that it is correct.

# smispasswd -f <desired location>/smiseva.conf
Enter password of <hostname1/ip_address1>: **********
Re-enter password of <hostname1/ip_address1>: **********
Enter password of <hostname2/ip_address2>: **********
Re-enter password of <hostname2/ip_address2>: **********
All the Management Server information has been successfully generated.

NOTE: The desired location is where the modified smiseva.conf file is located.

For more information on configuring the username and password for SMI-S on the Management Server, see the HPE P6000 Command View Installation Guide. After the passwords are entered, the configuration is written to the caeva.map map file located at: $SGCONF/mccaeva

Setting a default Management Server

Use the smispasswd command to set the active Management Server that will be used by the evadiscovery tool. For example:

# smispasswd -d <hostname/ip_address>
The Management Server <hostname/ip_address> is set as the default active SMI-S.

Displaying the list of Management Servers

Use the smispasswd command to display the current list of storage Management Servers that are accessible by the cluster software.
Example:

# smispasswd -l
MC/CAEVA Server list:
HOST         USERNAME        USE_SSL   NAMESPACE
----------------------------------------------------------------------
Host1:Port   administrator   N         root/EVA
Host2:Port   administrator   N         root/EVA

Adding or updating Management Server information

To add or update individual Management Server login information in the map file, use the following command with the options shown in Table 2:

smispasswd -h <hostname/ip_address> -n <namespace> -p <port> -u <user_name> -s <y|n>

Table 2 Individual Management Server information

-h <hostname/ip_address> — A DNS-resolvable hostname or the IP address of the Management Server.
-n <namespace> — The namespace configured for the SMI-S CIMOM (1). The default namespace is root/EVA.
-p <port> — The port on which the SMI-S server listens. This attribute is optional and is used when the SMI-S server does not listen on the default ports.
-u <user_name> — The user name used to connect to SMI-S. The user name and password are the same as those used with the sssu tool.
-s <y|n> — The type of connection to be established between the Metrocluster software and the SMI-S CIMOM. "y" allows a secure connection to the Management Server using the HTTPS protocol (HTTP using Secure Socket Layer encryption). "n" means a secure connection is not required.

(1) CIMOM: Common Information Model Object Manager, a key component that routes information between providers and clients.

This command adds a new record if it does not find the <hostname/ip_address> in the mapping file. Otherwise, it only updates the record. For example:

# smispasswd -h <hostname/ip_address> -u administrator -n root/EVA -s y
Enter password: **********
Re-enter password: ********
A new information has been successfully created

Deleting a Management Server

To delete a Management Server from the group used by the cluster, use the smispasswd command with the -r option. Example:

# smispasswd -r <hostname/ip_address>
The Management Server <hostname/ip_address> has been successfully removed from the file

Defining P6000/EVA storage cells and DR groups

On the node where access to the SMI-S server is configured, define the P6000/EVA storage systems and DR group information to be used in the Metrocluster environment. The Metrocluster software requires the internal device IDs to query the P6000/EVA storage states. The evadiscovery tool is a command line interface (CLI) that provides functions for defining P6000/EVA storage cell and DR group information. The tool caches the internal device IDs in the caeva.map file by querying and searching a list of device information. This caching improves performance whenever P6000/EVA storage states are queried through SMI-S.

To use the evadiscovery tool:
1. Create a configuration input file. This file contains the names of storage pairs and DR groups. A template of the configuration input file is available at the following location for Red Hat and SUSE: $SGCONF/mccaeva/mceva.conf
2. Copy the template file to the desired location.
   # cp $SGCONF/mccaeva/mceva.conf <desired location>
3. For each pair of storage units, enter the World Wide Name (WWN). The WWN is available on the front panel of the P6000/EVA controller or from the P6000 Command View user interface.
4. For each pair of storage units, enter the names of all DR groups that are managed by that storage pair.
5. Save the file.
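Conceptually, each entry pairs the WWNs of the two storage systems with the DR groups that the pair replicates. The following is an illustrative sketch only; the field labels are placeholders rather than the template's real keywords, so copy the exact syntax from the shipped template, which is reproduced in "mceva.conf file" (page 45):

<storage-pair keyword>   50001FE15007DBA0   50001FE15007DBD0
<DR-group keyword>       \Data Replication\drg_cc
<DR-group keyword>       \Data Replication\drg_1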
Creating the Storage Map file

After completing the P6000/EVA storage and DR group configuration file, use the evadiscovery utility to create or modify the storage map file.

# evadiscovery -f <desired location>/mceva.conf
Verifying the storage systems and DR Groups .........
Generating the mapping data .........
Adding the mapping data to the file /opt/cmcluster/conf/mccaeva/caeva.map .........
The mapping data is successfully generated.

NOTE: The desired location is where you have placed the modified mceva.conf file.

The command generates the mapping data and stores it in the caeva.map file. The mapping file caeva.map contains information about the Management Servers and about the P6000/EVA storage cells and DR groups.

Displaying information about storage devices

Use the evadiscovery command to display information about the storage systems and DR groups in your configuration. For example:

# evadiscovery -l
MC EVA Storage System and DR Groups map list:
Storage WWN: 50001FE15007DBA0
DR Group Name: \Data Replication\drg_cc
DR Group Name: \Data Replication\drg_1
Storage WWN: 50001FE15007DBD0
DR Group Name: \Data Replication\drg_cc
DR Group Name: \Data Replication\drg_1

NOTE: Run the evadiscovery tool after all the storage DR groups are configured or whenever there is a change to the storage devices. For example, if you remove and recreate a DR group that is used by an application package, the DR group's internal IDs are regenerated by the P6000/EVA system. If the name of any storage system or DR group is changed, update the external configuration file, run the evadiscovery utility, and redistribute the map file caeva.map to all Metrocluster clustered nodes. From releases B.12.00.00 and B.01.00.02 (B.01.00.00 + SGLX_00461-463) onwards, for all changes, including the addition or removal of a disk, update the external configuration file, run the evadiscovery utility, and redistribute the map file caeva.map to all Metrocluster clustered nodes. Before running the evadiscovery command, the Management Server configuration must be completed using the smispasswd command; otherwise the evadiscovery command fails.

Copying the Map file

After running the smispasswd and evadiscovery commands to generate the caeva.map file, copy this file to the same location on all cluster nodes so that the map file can be used by this product to communicate with the storage arrays.

# scp $SGCONF/mccaeva/caeva.map node:$SGCONF/mccaeva/caeva.map

Similarly, whenever new Management Servers are added or the existing server credentials are changed, regenerate the map file and redistribute it to all Metrocluster clustered nodes.

Configuring volume groups

This section describes the steps required to create a volume group for use in a Metrocluster environment. To configure volume groups, you must first identify the device special files for the source and target VDisks in the DR group. After that, create volume groups for the source volumes, export them for access by other nodes, and import them on all other cluster nodes.

Identifying the device special files for Vdisks in a DR group

To configure volume groups, you must get the device special files for the DR group's source and target VDisks on nodes at both sites. See "Identifying the devices to be used with packages" (page 47) for identifying device filenames for Vdisks.
NOTE: When using the Cluster Device Special Files (cDSF) feature, the device special file name is the same on all nodes for a source and target VDisk.

Identifying special device files

The following is sample output of the evainfo command in a Linux environment.

Table 3 evainfo command output

# evainfo -d /dev/sdj
Devicefile   Array                 WWNN                                      Capacity   Controller/Port/Mode
/dev/sdj     5000-1FE1-5007-DBA0   6005-08B4-0010-78F1-0002-4000-0143-0000   1024MB     Ctl-B/FP-4/NonOptimized

For more information on using the evainfo tool, see the HPE P6000 Evainfo Release Notes.

Use HPE P6000 Command View to identify the WWN for a Vdisk. The P6000 Command View page showing the WWN identifier of a Vdisk is shown in Figure 3.

Figure 3 P6000 Command View for the WWN identifier

Configuring LVM volume group using Metrocluster with Continuous Access EVA P6000 for Linux

LVM storage can be used in Metrocluster. The following section shows how to set up an LVM volume group. Before you create volume groups, you can create partitions on the disks, and you must enable activation protection for logical volume groups, preventing a volume group from being activated by more than one node at the same time. For more information on creating partitions and enabling activation protection for logical volume groups, see Managing HPE Serviceguard A.12.00.30 for Linux available at http://www.hpe.com/info/linux-serviceguard-docs.

Creating LVM volume groups

To create volume groups:
1. Create LVM physical volumes on each LUN.
   # pvcreate -f /dev/sda1
2. Create the volume group on the source volume.
   # vgcreate --addtag $(uname -n) /dev/<vgname> /dev/sda1
3. Create the logical volume (XXXX indicates the size in MB).
   # lvcreate -L XXXX /dev/<vgname>
4. Create a file system on the logical volume.
   # mke2fs -j /dev/<vgname>/rlvol1
5. If required, deactivate the volume group on the primary system and remove the tag.
   # vgchange -a n <vgname>
   # vgchange --deltag $(uname -n) <vgname>
   NOTE: Use the vgchange --deltag command only if you are implementing volume-group activation protection. Remember that volume-group activation protection, if implemented, must be done on every node.
6. Run the vgscan command on all the nodes to make the LVM configuration visible and to create the LVM database.
   # vgscan
7. On the source disk site, run the following commands on all the other systems that might run the Serviceguard package. If required, take a backup of the LVM configuration.
   # vgchange --addtag $(uname -n) <vgname>
   # vgchange -a y <vgname>
   # vgcfgbackup <vgname>
   # vgchange -a n <vgname>
   # vgchange --deltag $(uname -n) <vgname>
8. To verify the volume group configuration on the target disk site, fail over the DR group:
   a. Select the remote storage system from HPE P6000 Command View.
   b. Select the desired destination DR group, and then click Fail Over. This makes the destination VDisk the SOURCE.
9. On the target disk site, run the following commands on all the systems that might run the Serviceguard package. If required, take a backup of the LVM configuration.
   # vgchange --addtag $(uname -n) <vgname>
   # vgchange -a y <vgname>
   # vgcfgbackup <vgname>
   # vgchange -a n <vgname>
   # vgchange --deltag $(uname -n) <vgname>
10. To fail the DR group back:
   a. Select the local storage system from HPE P6000 Command View.
   b. Select the desired destination DR group, and then click Fail Over. This makes the destination VDisk the SOURCE.
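The per-node tag sequence in steps 7 and 9 lends itself to a small wrapper. The following bash sketch shows hypothetical helper functions (they are not shipped with Metrocluster) built only from the vgchange commands used above; error handling is omitted:

# Hypothetical helpers for the tag-based activation protection pattern.
activate_vg() {
    local vg="$1"
    vgchange --addtag "$(uname -n)" "$vg"    # tag the VG with this node's name
    vgchange -a y "$vg"                      # activate it locally
}
deactivate_vg() {
    local vg="$1"
    vgchange -a n "$vg"                      # deactivate
    vgchange --deltag "$(uname -n)" "$vg"    # release the tag for other nodes
}

# Example: back up the LVM configuration, then release the volume group.
activate_vg vgdata && vgcfgbackup vgdata && deactivate_vg vgdata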
Configuring VMware VMFS Disk

Metrocluster with Continuous Access EVA P6000 for Linux supports VMware Virtual Machine File System (VMFS) based disks (VMDK) for application use. For more information about VMware VMFS, see Managing HPE Serviceguard for Linux A.12.00.51 available at http://www.hpe.com/info/linux-serviceguard-docs. The following sections describe how to deploy the VMFS feature in a disaster recovery environment.

Before you apply or verify the Metrocluster package configuration file, ensure that the volume group uses the replication group as follows:
• Each disk that is part of the replication group in the storage array must be part of the datastore as mentioned in the package. If one of the disks is not part of the datastore, a warning message is displayed.
• Each disk in the datastore must be part of the replication group in the storage array as mentioned in the package.

Prerequisites for configuring VMware VMFS disk for Metrocluster with Continuous Access EVA P6000

To configure a disk for Metrocluster with Continuous Access EVA P6000 using VMFS:
1. Create a replication group using the storage-specific command.
2. Create LUNs in the replication group on both the Read-Only and Read-Write sites.
3. Export these LUNs to the ESXi hosts on the Read-Write and Read-Only sites.
4. Create a datastore on the ESXi host on the Read-Write site using the exported LUN.
5. Create VMDK disks on the datastore.

For more information about storage-specific commands, see the storage documentation available at http://www.hpe.com/support/manuals.

Prerequisite for configuring RDM disk for Metrocluster with Continuous Access EVA P6000 for Linux

If you need to configure a disk using RDM, the datastore can reside either on a disk that is part of the replication group or on a different disk. In the configuration where the datastore resides on the replication group, follow the steps described in the "Prerequisites for configuring VMware VMFS disk for Metrocluster with Continuous Access EVA P6000" section.

A configuration where the datastore resides on a different disk that is not part of any replication group is not supported in a Dynamically Linked Storage (DLS) configuration. An error message is displayed when you try to configure it:

ERROR: The disk with WWID WWID, used for the Datastore DS_NAME is not part of the Data Replication group CAEVA_DR_GROUP_NAME.

A configuration where the datastore resides on a different disk that is not part of any replication group is supported in a Statically Linked Storage (SLS) configuration. For more information about VMware DLS and SLS configurations, see Managing HPE Serviceguard for Linux A.12.00.51 available at http://www.hpe.com/info/linux-serviceguard-docs.

Installing and configuring an application

Replicate only the disks that contain application data, not the disks that contain application binaries and configuration files. The following section describes how to configure a package for the application.

Configuring a Metrocluster package

To create a Metrocluster modular package:
1. Run the following command to create a Metrocluster modular package configuration file:
   # cmmakepkg -m dts/mccaeva temp.config
   If the Metrocluster package uses the Oracle toolkit, add the corresponding toolkit module.
   For example, for a Metrocluster Oracle toolkit modular package, run the following command:
   # cmmakepkg -m dts/mccaeva -m tkit/oracle/oracle temp.config

   NOTE: Metrocluster is usually used with applications such as Oracle, so the application toolkit module must also be included when Metrocluster is used in conjunction with an application. Make sure to specify the Metrocluster module before specifying the toolkit module.

2. Edit the following attributes in the temp.config file:

Table 4 temp.config file attributes

dts_pkg_dir — The package directory for this Metrocluster modular package. The Metrocluster environment file is generated for this package in this directory. This value must be unique for all packages.
DT_APPLICATION_STARTUP_POLICY — Defines a policy for starting an application. It can be set to the Availability_Preferred or Data_Currency_Preferred policy.
DR_GROUP_NAME — The name of the DR group used by the package.
DC1_STORAGE_WORLD_WIDE_NAME — The World Wide Name of the P6000/EVA storage system that resides in Data Center 1.
DC1_SMIS_LIST — A list of Management Servers that reside in Data Center 1.
DC1_HOST_LIST — A list of clustered nodes that reside in Data Center 1.
DC2_STORAGE_WORLD_WIDE_NAME — The World Wide Name of the P6000/EVA storage system that resides in Data Center 2.
DC2_SMIS_LIST — A list of Management Servers that reside in Data Center 2.
DC2_HOST_LIST — A list of clustered nodes that reside in Data Center 2.

NOTE: The host name specified in the DC1/DC2 parameters must match the output of the hostname command. If the hostname command displays a fully qualified name, you must specify the fully qualified name in the DC1/DC2 host parameter list. For example:
# hostname
host1.domain1.com
If the hostname command displays only the host name, you must specify the host name in the DC1/DC2 parameter list. For example:
# hostname
host1

There are additional Metrocluster parameters available in the package configuration file. Hewlett Packard Enterprise recommends that you retain the default values of these parameters unless there is a specific business requirement to change them. For more information on the Metrocluster parameters, see Appendix B (page 42).

For the failover_policy parameter, Metrocluster failover packages can be configured to use any of the Serviceguard-defined failover policies. The site_preferred and site_preferred_manual failover policies were introduced in Serviceguard specifically for Metrocluster configurations. The site_preferred value implies that when a Metrocluster package needs to fail over, it fails over to a node in the same site as the node on which it last ran. If no other configured node is available within the same site, the package fails over to a node on the other site. The site_preferred_manual failover policy provides automatic failover of packages within a site and manual failover across sites. Configure a cluster with sites to use either of these policies. For information on configuring the failover policy to site_preferred or site_preferred_manual, see "Site Aware Failover Configuration" (page 11).

3. Validate the package configuration file.
   # cmcheckconf -P temp.config
4. Apply the package configuration file.
3. Validate the package configuration file:
# cmcheckconf -P temp.config
4. Apply the package configuration file:
# cmapplyconf -P temp.config
NOTE: If external_pre_script is specified in a Metrocluster package configuration, it is executed after the Metrocluster module scripts during package startup. Metrocluster module scripts are always executed first during package startup.
5. Run the package on a node in the Serviceguard cluster:
# cmrunpkg -n <node_name> <package_name>
6. Enable global switching for the package:
# cmmodpkg -e <package_name>
After the package is created, if the value of any Metrocluster parameter needs to be changed, edit this package configuration file and re-apply it.

Setting up Disk Monitoring

HPE Serviceguard for Linux includes a Disk Monitor that you can use to detect problems in disk connectivity. This lets you fail a package over from one node to another in the event of a disk connectivity failure. For instructions on configuring disk monitoring, see the Creating a Disk Monitor Configuration section in Managing HPE Serviceguard A.12.00.30 for Linux available at http://www.hpe.com/info/linux-serviceguard-docs.

Configuring a Metrocluster package using Serviceguard Manager

To configure a Metrocluster package using Serviceguard Manager:
1. Access one of the node's System Management Homepage at http://<nodename>:2301. Log in using the root user's credentials of the node.
2. Click Tools. If Serviceguard is installed, one of the widgets has Serviceguard as an option. Click the Serviceguard Manager link within the widget.
3. On the cluster's home page, click the Configuration tab, and then select Create A Modular Package.
Figure 4 Creating a modular package
4. If the product Metrocluster with Continuous Access EVA P6000 for Linux Toolkit is installed, you are prompted to configure a Metrocluster package. Select the dts/mccaeva module, and then click Next.
Figure 5 Selecting the Metrocluster module
5. You are next prompted to include any other toolkit modules. If the application being configured has a Serviceguard toolkit, select the appropriate toolkit; otherwise, move to the next screen.
6. Enter the package name. Metrocluster packages can be configured only as failover packages. Make sure that this option is selected as shown in Figure 6 (page 24), and then click Next.
Figure 6 Configuring the package name
7. Select additional modules if required by the application. For example, if the application uses LVM volume groups or VxVM disk groups, select the volume_group module. Click Next.
Figure 7 Selecting additional modules for the package
8. Review the node order in which the package will start, and modify other attributes, if needed. Click Next.
Figure 8 Configuring generic failover attributes
9. Configure the attributes for a Metrocluster package. All the mandatory attributes (marked with *) must be accurately filled.
a. Select the application startup policy from the list.
b. Specify the DR group name, and then enter values for Wait Time and Query Timeout, if required.
c. Select hosts for Data Center 1 and Data Center 2. Enter the DC1/DC2 storage World Wide Names.
d. Specify the list of management servers for DC1 and DC2.
Figure 9 Specifying the list of management servers for DC1 and DC2
10. Enter the values for the other modules selected in step 7.
11. After you enter the values for all the modules, review all the inputs given to the various attributes in the final screen. To validate the package configuration, click Check Configuration; otherwise, click Apply Configuration.
Figure 10 Configuring Metrocluster P6000/EVA parameters

3 Metrocluster features

Data replication storage failover preview

In an actual failure, packages fail over to the standby site. During package startup, the underlying storage is failed over based on the parameters defined in the Metrocluster package. The storage failover might fail under the following conditions:
• Incorrect configuration or setup of the Metrocluster and data replication environment. The storage failover can fail if the Metrocluster package has syntax errors or invalid parameter values, if the installed Metrocluster binaries are corrupt or have incorrect file permissions, if the SMI-S server is unreachable, or if the caeva.map file is corrupted.
• Invalid data replication state. The data might not be in write order because a track copy is in progress at the time of the failover attempt, or the data is not current (lagging behind the primary) and the Metrocluster package parameters are not set to allow a failover on non-current data.

The cmdrprev command previews the failover of data replication storage. It determines whether storage failover can succeed during an actual package failover. This command can be used in both Metrocluster and Continentalclusters. If the preview fails, the cmdrprev command writes a detailed log that lists the cause of the failure to stdout. The command options are as follows:
cmdrprev -p <package>
For more information, see the cmdrprev manpage. The command exit value indicates whether the storage failover in an actual package failover will succeed. Table 5 describes the exit values of the command.

Table 5 Command exit values and their descriptions

-1
The data replication storage failover preview failed because of invalid command usage or invalid input parameters. In the event of an actual recovery process, the data replication storage failover will not succeed on any node in the cluster.

0
The data replication storage failover preview is successful on the node where the command is run. Data replication storage failover will succeed in the event of a package failover.

1
The data replication storage failover preview failed. In the event of an actual recovery process, the data replication storage failover will not succeed on any node in the cluster.

2
The data replication storage failover preview failed due to node-specific error conditions or transient conditions. In the event of an actual recovery process, the data replication storage failover will not succeed on that node. Failover might succeed if you attempt it at a later time or on a different node in the cluster.
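For example, to preview storage failover for a hypothetical package named pkg1 and act on the result from a shell, you might run:

# cmdrprev -p pkg1
# echo $?
0

An exit value of 0 indicates that storage failover would succeed from this node; a value of 2 suggests retrying later or from a different node, per Table 5.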
Rolling upgrade for Metrocluster

Use a rolling upgrade to upgrade the software components of the cluster with minimal downtime to the applications managed by Metrocluster. See "Rolling upgrade" (page 35) for the procedure.

Live Application Detach

There may be circumstances in which you want to do maintenance that involves halting a node, or the entire cluster, without halting or failing over the affected packages. Such maintenance might consist of anything short of rebooting the node or nodes; a likely case is networking changes that will disrupt the heartbeat.

Command options introduced in Serviceguard for Linux A.12.00.00, collectively known as Live Application Detach (LAD), allow you to do this kind of maintenance while keeping the packages running. The packages are no longer monitored by Serviceguard, but the applications continue to run. Packages in this state are called detached packages. When you have done the necessary maintenance, you can restart the node or cluster, and normal monitoring resumes on the packages. For more information on the LAD feature, see Managing HPE Serviceguard A.12.00.30 for Linux available at http://www.hpe.com/info/linux-serviceguard-docs.
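As an illustration, a LAD maintenance window on a single node might look like the following. The -d (detach) option shown here is an assumption based on the standard Serviceguard node commands — confirm the exact syntax in the Serviceguard manual before using it, and node1 is a hypothetical node name:

# cmhaltnode -d node1
(perform the maintenance; the packages on node1 keep running, unmonitored)
# cmrunnode node1

Restarting the node with cmrunnode reattaches the packages and resumes normal monitoring.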
4 Understanding failover/failback scenarios

Metrocluster package failover/failback scenarios

This section discusses the package startup behavior in various failure scenarios, depending on DT_APPLICATION_STARTUP_POLICY and the replication mode. Table 6 describes the failover scenarios.

NOTE: The first failover to a node at a remote site must be done with the Management Server active for the EVA array at the remote site.

Table 6 Replication modes and failover scenarios

Scenario: Remote failover during normal operations (planned failover)
Replication mode: Synchronous or Enhanced Asynchronous
Behavior (both startup policies): The DR Group fails over and the package is started. The behavior is not affected by the presence of the FORCEFLAG file.
Resolution: N/A

Scenario: Remote failover when the CA link is down or when the primary site fails
Replication mode: Synchronous or Enhanced Asynchronous
Availability_Preferred: The DR Group fails over and the package is started. The following message appears in the package log: Warning - Unable to get remote DR group state because the CA link is down. The role of the device group on this site is "destination". The state of the replication link is down and the state of data may not be current. Because the package startup policy is AVAILABILITY_PREFERRED, the program will attempt to start up the package.
Data_Currency_Preferred: The DR Group does not fail over and the package does not start. The following message appears in the package log: The role of the device group on this site is "destination". The state of the replication link is down and the state of data may not be current. Because the package startup policy is DATA_CURRENCY_PREFERRED, the package is NOT allowed to start up.
Resolution: To forcefully start the package even though data currency cannot be determined, create a "FORCEFLAG" file in the package directory and restart the package.

Scenario: Remote failover during merging
Replication mode: Synchronous — If the merge operation completes before WAIT_TIME expires, the DR Group fails over and the package is started. Otherwise, the DR Group does not fail over and the package is not started. This behavior does not change even if the FORCEFLAG file is present. The following message appears in the package log: The DR Group is in merging state. ... The WAIT_TIME has expired. Error - Failed to failover and swap the role of the device group. The package is NOT allowed to start up. Resolution: Restart the package when the merge is completed.
Replication mode: Enhanced Asynchronous — Metrocluster waits up to WAIT_TIME for the merge operation to complete. If the merge completes before WAIT_TIME expires, the DR Group fails over and the package is started; otherwise, the package is not started. The following message appears in the package log: The replication link state is good, the role of the device group on this site is "destination" and the data Log Copy is in progress. Because the WAIT_TIME is set to xx minutes, the program will wait for completion of the log copy ... The DR Group is in merging state. ... The WAIT_TIME has expired. Error - Failed to failover and swap the role of the device group. The package is NOT allowed to start up. Resolution: To forcefully start the package, create a "FORCEFLAG" file in the package directory and restart the package; the package then does not wait for the merge to complete and starts up immediately, even if the merge is not completed within WAIT_TIME.

Scenario: Remote failover during copying
Replication mode: Synchronous or Enhanced Asynchronous
Behavior (both startup policies): If the full copy operation completes before WAIT_TIME expires, the DR Group fails over and the package is started. Otherwise, the DR Group does not fail over and the package is not started. The behavior is not affected by the presence of the FORCEFLAG file. The following message appears in the package log: The replication link state is good, the role of the device group on this site is "destination" and the data Track Copy is in progress. Because the WAIT_TIME is set to xx minutes, the program will wait for completion of the track copy. ... The WAIT_TIME has expired. Error - Failed to failover and swap the role of the device group. The package is NOT allowed to start up.
Resolution: Restart the package when the full copy is completed.

Scenario: Remote failover when the CA link is down and a merge was in progress
Replication mode: Synchronous or Enhanced Asynchronous
Availability_Preferred: The DR Group fails over and the package is started.
Data_Currency_Preferred: The DR Group does not fail over and the package does not start. The following message appears in the package log: Warning - Unable to get remote DR group state because the CA link is down. ... The role of the device group on this site is "destination". The state of the replication link is down and the state of data may not be current. Because the package startup policy is DATA_CURRENCY_PREFERRED, the package is NOT allowed to start up.
Resolution: To forcefully start the package even though data currency cannot be determined, create a "FORCEFLAG" file in the package directory, and then restart the package.

Scenario: Remote failover when the CA link is down and a full copy was in progress
Replication mode: Synchronous or Enhanced Asynchronous
Behavior (both startup policies): The DR Group does not fail over and the package does not start, because the data is not consistent on the destination storage. The following message appears in the package log: The role of the device group on this site is "destination". The state of the data may be inconsistent due to replication link is down while data copying from source to destination is still in progress. The package is NOT allowed to start up.
Resolution: The package can be restarted manually on the remote site after a consistent snapclone/snapshot of the destination Vdisk is restored. HPE recommends taking a snapshot/snapclone of the destination Vdisks before the copy starts so that a consistent copy is available for recovery.

Scenario: Remote failover when the link is manually suspended
Replication mode: Synchronous — The DR Group does not fail over and the package does not start. The behavior is not affected by the presence of the FORCEFLAG file. The following message appears in the package log: Error - The replication link is in suspend mode. The DR group cannot be failed over. Resolution: Resume the link, and then start the package.
Replication mode: Enhanced Asynchronous —
Availability_Preferred: The DR Group fails over and the package is started.
Data_Currency_Preferred: The DR Group does not fail over and the package does not start. The following message appears in the package log: The replication link of this DR group is in suspend mode. The replication link state is good and the role of the device group on this site is "destination". Because the state of the data may not be current and the package startup policy is DATA_CURRENCY_PREFERRED, the package is NOT allowed to start up. Resolution: To forcefully start the package, create a FORCEFLAG file in the package directory, and then restart the package.

Scenario: Remote failover when the DR Group is in RUNDOWN state and the link is up
Replication mode: Synchronous — N/A
Replication mode: Enhanced Asynchronous — Metrocluster waits up to WAIT_TIME for the merge operation to complete. If the merge completes before WAIT_TIME expires, the package is started; otherwise, the package startup fails. The following message appears in the package log: The replication link state is good, the role of the device group on this site is "destination" and the data Log Copy is in progress. Because the WAIT_TIME is set to <xx> minutes, the program will wait for completion of the log copy. ... The DR Group is in merging state. ... The WAIT_TIME has expired. Error - Failed to failover and swap the role of the device group. The package is NOT allowed to start up. Resolution: To forcefully start the package, create a "FORCEFLAG" file in the package directory and restart the package; the package then does not wait for the merge to complete and starts up immediately, even if the merge is not complete within WAIT_TIME.

Scenario: Remote failover when the DR Group is in RUNDOWN state and the link is down
Replication mode: Synchronous — N/A
Replication mode: Enhanced Asynchronous —
Availability_Preferred: The DR Group fails over and the package is started.
Data_Currency_Preferred: The DR Group does not fail over and the package is not started. The following message appears in the package log: Warning - Unable to get remote DR group state because the CA link is down. The role of the device group on this site is "destination". The state of the replication link is down and the state of data may not be current. Because the package startup policy is DATA_CURRENCY_PREFERRED, the package is NOT allowed to start up. Resolution: To forcefully start the package even though data currency cannot be determined, create a "FORCEFLAG" file in the package directory and restart the package.
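To make the FORCEFLAG resolution concrete: assuming a package named pkg1 whose package directory (dts_pkg_dir) is /usr/local/cmcluster/conf/pkg1 — both names are hypothetical, and the path varies by distribution — a forced start looks like:

# touch /usr/local/cmcluster/conf/pkg1/FORCEFLAG
# cmrunpkg -n node1 pkg1

Use this only in the cases Table 6 identifies, because it allows startup on data whose currency cannot be verified.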
5 Administering Metrocluster

Adding a node to Metrocluster

To add a node to Metrocluster with Continuous Access EVA P6000 for Linux:
1. Add the node to the cluster by editing the Serviceguard cluster configuration file, and then apply the configuration:
# cmapplyconf -C cluster.config
2. Copy the caeva.map file to the new node.
For Red Hat:
# scp /usr/local/cmcluster/conf/mccaeva/caeva.map \
<new_node_name>:/usr/local/cmcluster/conf/mccaeva/caeva.map
For SUSE:
# scp /opt/cmcluster/conf/mccaeva/caeva.map \
<new_node_name>:/opt/cmcluster/conf/mccaeva/caeva.map
3. Add the new node to a Metrocluster package by editing the Metrocluster package configuration and applying the configuration:
# cmapplyconf -P <package_config_file>
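To confirm that the cluster and package configuration were applied as intended, you can review the cluster state afterwards:

# cmviewcl -v

The verbose output lists the cluster nodes and each package's configured nodes, so the newly added node should appear in both.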
Maintaining EVA P6000 Continuous Access replication in Metrocluster

While a package is running, a manual storage failover on P6000 Continuous Access performed outside the Metrocluster software can cause the package to halt. Hewlett Packard Enterprise recommends that no manual storage failover be performed while the package is running. A manual change of the P6000 Continuous Access link state from suspend to resume is allowed to re-establish data replication while the package is running.

P6000 Continuous Access link failure scenarios

If all Continuous Access links fail and failsafe mode is disabled, the application package continues to run and writes new I/O to the source Vdisk. The virtual log in the P6000/EVA controller collects host write commands and data, and the DR group's log state changes from normal to logging. When a DR group is in the logging state, the log grows in proportion to the amount of write I/O being sent to the source Vdisks. Upon Continuous Access link recovery, P6000 Continuous Access automatically normalizes the source Vdisk and destination Vdisk data. If the links are down for a long time and failsafe mode is disabled, the log disk may become full, and a full copy occurs automatically upon link recovery.

If the log disk is not full when a Continuous Access connection is re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order in which the writes occurred, is called merging. While merging is in progress, write ordering is maintained, so the data on the destination Vdisk is consistent.

If the log disk is full when a Continuous Access connection is re-established, a full copy from the source Vdisk to the destination Vdisk is done. Because a full copy is done at the block level, the data on the destination Vdisk is not consistent until the copy completes. If the primary site fails while the copy is in progress, the data on the destination Vdisk is not consistent and is not usable. The package can never start on the recovery site, and the application will not be online until the primary site is restored. To manage the resynchronization and to ensure that a consistent copy of the data exists on the recovery site, do the following:
1. After all Continuous Access links fail, put the Continuous Access link state into suspend mode by using the P6000 Command View UI. When the Continuous Access link is in the suspend state, P6000 Continuous Access does not resynchronize the source and destination Vdisks upon link recovery. This helps maintain data consistency.
2. Take a local replication copy of the destination Vdisks using P6000 Business Copy software so that a consistent copy is available for recovery.
3. Change the Continuous Access link state to resume mode. This initiates the normalization upon Continuous Access link recovery.
These steps ensure that even if the primary site fails while the copy is in progress, the destination Vdisk can be used after the data is restored from the BC volume.

Planned maintenance

Node maintenance

If you take a node down for maintenance, package failover and quorum calculation are based on the remaining nodes. Make sure that nodes are taken down evenly at each site, and that enough nodes remain online to form a quorum if a failure occurs. Planned maintenance is treated the same as a failure by the cluster.
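For example, halting a single node for maintenance and returning it to the cluster might look like the following; node1 is a hypothetical node name, and the -f option (which fails the node's packages over before halting it) should be confirmed against your Serviceguard version's cmhaltnode manpage:

# cmhaltnode -f node1
(perform the maintenance)
# cmrunnode node1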
Metrocluster package maintenance

There might be situations when the package must be taken down for maintenance without moving the package to another node. The following procedure is recommended for normal maintenance of Metrocluster with Continuous Access EVA P6000 for Linux:
1. Disable package switching:
# cmmodpkg -d <pkgname>
2. If SMI-S user credentials or ports are being changed, update the smiseva.conf file and regenerate the caeva.map file. Distribute the updated caeva.map file to all the nodes of the cluster.
3. If there are changes to the package attributes, edit the configuration file, and then apply the updated package configuration:
# cmapplyconf -P <pkgname.config>
4. Start the package with the appropriate Serviceguard command:
# cmmodpkg -e <pkgname>

Failback

If the primary site is restored after a failover to the recovery site, you may want to fail the package back to the primary site. Manually resynchronize the data from the recovery site to the primary site. After resynchronization is complete, halt the package on the recovery site, and restart it on the primary site. Metrocluster performs a failover of the storage, which returns the SOURCE status to the primary Vdisks.
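As a sketch of the failback itself — assuming the data has already been resynchronized, and using the hypothetical package name pkg1 and primary node node1 — the command sequence is:

# cmhaltpkg pkg1
# cmrunpkg -n node1 pkg1
# cmmodpkg -e pkg1

The storage role swap back to the primary Vdisks happens as part of the Metrocluster package startup, not as a separate manual step.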
Administering Metrocluster with Serviceguard Manager

To administer Metrocluster or the packages configured under Metrocluster using Serviceguard Manager:
1. Access one of the node's System Management Homepage at http://<nodename>:2301. Log in using the root user's credentials of the node.
2. Click Tools. If Serviceguard is installed, one of the widgets has Serviceguard as an option. Click the Serviceguard Manager link within the widget.
3. On the cluster's home page, click the Administration tab and choose from the available options as required. Choose the appropriate packages for the required option.

Rolling upgrade

Metrocluster configurations follow the Serviceguard rolling upgrade procedure. The Serviceguard documentation includes rolling upgrade procedures to upgrade the Serviceguard version, the operating environment, and other software. This Serviceguard procedure, along with its recommendations, guidelines, and limitations, is applicable to Metrocluster versions. For more information on completing a rolling upgrade of Serviceguard, see the latest edition of Managing HPE Serviceguard A.12.00.30 for Linux, available at http://www.hpe.com/info/linux-serviceguard-docs.

Upgrading Metrocluster replication software

To perform a rolling upgrade of the Metrocluster software:
1. Disable package switching for all Metrocluster packages.
2. Install the new Metrocluster software on all nodes.
3. Enable package switching for all Metrocluster packages.
To upgrade the array-specific replication management software, see "Upgrading replication management software" (page 35).

Limitations of the rolling upgrade for Metrocluster

The following are the limitations of the rolling upgrade for Metrocluster:
• The cluster or package configuration cannot be modified until the rolling upgrade is complete. If the configuration must be edited, upgrade all nodes to the new release, and then modify the configuration file and copy it to all nodes in the cluster.
• New features of the latest version of Metrocluster cannot be used until all nodes are upgraded to the latest version.
• More than two versions of Metrocluster cannot run in the cluster while the rolling upgrade is in progress.
• The rolling upgrade procedure cannot be used as a means of running multiple versions of the Metrocluster software within the cluster. Hewlett Packard Enterprise recommends that all cluster nodes be upgraded to the latest version immediately.
• Serviceguard cannot be removed from any node while the rolling upgrade is in progress in the cluster.

Upgrading replication management software

Upgrade the replication management software that is used by Metrocluster. In this product, the array management software runs on a separate Windows system, so you can upgrade it independently without affecting the running Metrocluster.

Upgrading the OpenPegasus WBEM Services for Metrocluster with Continuous Access EVA P6000 for Linux

Metrocluster with Continuous Access EVA P6000 for Linux uses the OpenPegasus WBEM Services software to communicate with the SMI-S server that manages the P6000/EVA disk arrays. You can update OpenPegasus WBEM Services without halting the cluster or the Metrocluster packages on any of the nodes. Metrocluster communicates with this software when the Metrocluster packages are starting up. Therefore, avoid Metrocluster package failover until all the nodes in the Metrocluster are upgraded to the same version of OpenPegasus WBEM Services.

6 Troubleshooting

Troubleshooting Metrocluster

Analyze the Metrocluster and SMI-S/Command View log files to understand the problem in the respective environment, and follow the recommended action based on the error or warning messages.

Metrocluster log

Make sure you periodically review the following files for messages, warnings, and recommended actions. Hewlett Packard Enterprise recommends reviewing these files after each system, data center, or application failure:
• The system log at /var/log/messages.
• The package log file specified in the script_log_file parameter.

P6000/EVA storage system log

Analyze the CIMOM or Provider logs to understand problems with the SMI-S layer. For more information on SMI-S logs, see the SMI-S EVA Provider user guide; for Command View logs, see the HPE P6000 Command View user guide.
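A quick first pass over these logs on the node where a package failed to start might look like the following; the package log path is illustrative — use the actual value of script_log_file from your package configuration:

# grep -iE "error|warning" /var/log/messages | tail -n 20
# tail -n 50 /usr/local/cmcluster/log/pkg1/pkg1.log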
7 Support and other resources

Accessing Hewlett Packard Enterprise Support
• For live assistance, go to the Contact Hewlett Packard Enterprise Worldwide website: www.hpe.com/assistance
• To access documentation and support services, go to the Hewlett Packard Enterprise Support Center website: www.hpe.com/support/hpesc

Information to collect
• Technical support registration number (if applicable)
• Product name, model or version, and serial number
• Operating system name and version
• Firmware version
• Error messages
• Product-specific reports and logs
• Add-on products or components
• Third-party products or components

Accessing updates
• Some software products provide a mechanism for accessing software updates through the product interface. Review your product documentation to identify the recommended software update method.
• To download product updates, go to either of the following:
◦ Hewlett Packard Enterprise Support Center Get connected with updates page: www.hpe.com/support/e-updates
◦ Software Depot website: www.hpe.com/support/softwaredepot
• To view and update your entitlements, and to link your contracts and warranties with your profile, go to the Hewlett Packard Enterprise Support Center More Information on Access to Support Materials page: www.hpe.com/support/AccessToSupportMaterials

IMPORTANT: Access to some updates might require product entitlement when accessed through the Hewlett Packard Enterprise Support Center. You must have an HP Passport set up with relevant entitlements.

Websites
Hewlett Packard Enterprise Information Library — www.hpe.com/info/enterprise/docs
Hewlett Packard Enterprise Support Center — www.hpe.com/support/hpesc
Contact Hewlett Packard Enterprise Worldwide — www.hpe.com/assistance
Subscription Service/Support Alerts — www.hpe.com/support/e-updates
Software Depot — www.hpe.com/support/softwaredepot
Customer Self Repair — www.hpe.com/support/selfrepair
Insight Remote Support — www.hpe.com/info/insightremotesupport/docs
Serviceguard Solutions for HP-UX — www.hpe.com/info/hpux-serviceguard-docs
Single Point of Connectivity Knowledge (SPOCK) Storage compatibility matrix — www.hpe.com/storage/spock
Storage white papers and analyst reports — www.hpe.com/storage/whitepapers

Customer self repair

Hewlett Packard Enterprise customer self repair (CSR) programs allow you to repair your product. If a CSR part needs to be replaced, it will be shipped directly to you so that you can install it at your convenience. Some parts do not qualify for CSR. Your Hewlett Packard Enterprise authorized service provider will determine whether a repair can be accomplished by CSR. For more information about CSR, contact your local service provider or go to the CSR website: www.hpe.com/support/selfrepair

Remote support

Remote support is available with supported devices as part of your warranty or contractual support agreement. It provides intelligent event diagnosis and automatic, secure submission of hardware event notifications to Hewlett Packard Enterprise, which will initiate a fast and accurate resolution based on your product's service level. Hewlett Packard Enterprise strongly recommends that you register your device for remote support. For more information and device support details, go to the following website: www.hpe.com/info/insightremotesupport/docs

Documentation feedback

Hewlett Packard Enterprise is committed to providing documentation that meets your needs. To help us improve the documentation, send any errors, suggestions, or comments to Documentation Feedback ([email protected]). When submitting your feedback, include the document title, part number, edition, and publication date located on the front cover of the document. For online help content, include the product name, product version, help edition, and publication date located on the legal notices page.

A Checklist and worksheet for configuring a Metrocluster with Continuous Access EVA P6000 for Linux

Disaster Recovery Checklist

Use this checklist to make sure you have adhered to the disaster-tolerant architecture guidelines for two main data centers and a third-location configuration.
Data centers A and B have the same number of nodes, to maintain quorum in case an entire data center fails.
Arbitrator nodes or Quorum Server nodes are located in a separate location from either of the primary data centers (A or B).
The elements in each data center, including nodes, disks, network components, and climate control, are on separate power circuits.
Multipathing is configured for each disk used in Metrocluster.
Each disk array is configured with redundant replication links.
At least two networks are configured to function as the cluster heartbeat.
All redundant cabling for network, heartbeat, and replication links is routed along separate physical paths.

Cluster Configuration Worksheet

Use this cluster configuration worksheet either in place of, or in addition to, the worksheet provided in the latest version of the Managing HPE Serviceguard A.12.00.30 for Linux manual available at http://www.hpe.com/info/linux-serviceguard-docs. If you have already completed a Serviceguard cluster configuration worksheet, you only need to complete the first part of this worksheet.

Names and Nodes
Cluster Name: ________________________________________________
Data Center A Name and Location: _____________________________
Site Name: ___________________________________________________
Node Names: __________________________________________________
Data Center B Name and Location: _____________________________
Site Name: ___________________________________________________
Node Names: __________________________________________________
Arbitrator/Quorum Server Third Location Name and Location: ___
Arbitrator Node/Quorum Server Names: _________________________
Maximum Configured Packages: _________________________________

Subnets
Heartbeat IP Addresses: ______________________________________
Non-Heartbeat IP Addresses: __________________________________

Timing Parameters
Member Timeout: ______________________________________________
Network Polling Interval: ____________________________________
AutoStart Delay: _____________________________________________

Package Configuration Worksheet

Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the latest version of the Managing HPE Serviceguard A.12.00.30 for Linux manual available at http://www.hpe.com/info/linux-serviceguard-docs. If you have already completed a Serviceguard package configuration worksheet, you only need to complete the first part of this worksheet.
Modular package configuration worksheet

Package Configuration data
Package Name: ________________________________________________
Primary Node: ____________________ Data Center: _____________
First Failover Node: _____________ Data Center: _____________
Second Failover Node: ____________ Data Center: _____________
Third Failover Node: _____________ Data Center: _____________
Fourth Failover Node: ____________ Data Center: _____________

Table 7 Volume Group module information

Information       Entry 1    Entry 2    Entry 3    Entry 4
Volume Group
Logical Volume
File System
Mount Point
LVM/VxVM

Table 8 Package IP module information

Package IP Address    IP Subnet    IP Subnet Node

Table 9 Service module information

Service Name    Service command    Service restart    Fail fast enabled    Service halt timeout

Metrocluster Continuous Access P6000/EVA Module Information
DR Group Name: _______________________________________________
DC1 Storage Array WWN: _______________________________________
DC1 SMIS List: _______________________________________________
DC1 HOST List: _______________________________________________
DC2 Storage Array WWN: _______________________________________
DC2 SMIS List: _______________________________________________
DC2 HOST List: _______________________________________________

P6000/EVA Configuration Checklist

Use the following checklist to verify the Metrocluster with Continuous Access EVA P6000 for Linux configuration:
Redundant Management Servers are configured and accessible to all nodes.
Source and destination volumes are created for use with all packages.
Management Server security configuration is complete (smispasswd command).
P6000/EVA mapping is complete (evadiscovery command).
The caeva.map file is copied to all cluster nodes. The caeva.map file is located at $SGCONF/mccaeva/.
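As a sketch of the last three checklist items, the flow is: generate the SMI-S security configuration from the edited smiseva.conf, generate the map file from the edited mceva.conf, and copy the result to every other cluster node. The command options are deliberately elided here because they are documented on the smispasswd and evadiscovery manpages, and node2 is a hypothetical node name:

# smispasswd ...
# evadiscovery ...
# scp $SGCONF/mccaeva/caeva.map node2:$SGCONF/mccaeva/caeva.map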
B Package attributes for Metrocluster with Continuous Access EVA P6000 for Linux

This appendix lists all package attributes for this product. Hewlett Packard Enterprise recommends that you use the default settings for most of these variables, so exercise caution when modifying them.

CLUSTER_TYPE
Identifies the type of disaster recovery services cluster: Metrocluster or Continentalclusters. Set this to "metro" for a Metrocluster environment and "continental" for a Continentalclusters environment. A type of "metro" is supported only when the HPE Metrocluster product is installed. A type of "continental" is supported only when the HPE Continentalclusters and Metrocluster software are installed.

PKGDIR
If the package is a legacy package, this variable contains the full path name of the package directory. If the package is a modular package, this variable contains the full path name of the directory where the Metrocluster caeva environment file is located.

DT_APPLICATION_STARTUP_POLICY
Defines the preferred policy for starting the application with respect to the state of the data in the local volumes. Set it to one of the following two policies:
Availability_Preferred: Choose this policy if application availability is preferred. The Metrocluster software allows the application to start if the data is consistent, even if the data is not current.
Data_Currency_Preferred: Choose this policy if the application must start only on consistent and current data. The Metrocluster software allows the application to operate only on current data.
This policy only concerns the state of the local data (with respect to the application) being consistent and current. A package can be forced to start on a node by creating the FORCEFLAG file in the package directory.

WAIT_TIME (0 or greater than 0, in minutes)
Defines the timeout, in minutes, to wait for completion of the data merging or copying for the DR group before starting the package on the destination volume. If WAIT_TIME is greater than zero, and the state of the DR group is "merging in progress" or "copying in progress", the Metrocluster software waits up to the WAIT_TIME value for the merging or copying to complete. If WAIT_TIME expires while merging or copying is still in progress, the package fails to start with an error. If WAIT_TIME is 0 (the default value), and the state of the DR group is "merging in progress" or "copying in progress", the Metrocluster software does not wait and returns an exit 1 code to the Serviceguard package manager; the package fails to start with an error.

DR_GROUP_NAME
The name of the DR group used by this package. The DR group name is defined when the DR group is created.

DC1_STORAGE_WORLD_WIDE_NAME
The World Wide Name of the P6000/EVA storage system that resides in Data Center 1. This storage system name is defined when the storage is initialized.

DC1_SMIS_LIST
A list of the management servers that reside in Data Center 1. Multiple names can be defined by using commas as separators. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in their order of specification.

DC1_HOST_LIST
A list of the clustered nodes that reside in Data Center 1. Multiple names can be defined by using commas as separators.

DC2_STORAGE_WORLD_WIDE_NAME
The World Wide Name of the P6000/EVA storage system that resides in Data Center 2. This storage system name is defined when the storage is initialized.

DC2_SMIS_LIST
A list of the management servers that reside in Data Center 2. Multiple names can be defined by using commas as separators. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in their order of specification.

DC2_HOST_LIST
A list of the clustered nodes that reside in Data Center 2. Multiple names can be defined by using commas as separators.

QUERY_TIME_OUT (default 300 seconds)
Sets the time in seconds to wait for a response from the SMI-S CIMOM on the storage management appliance. The minimum recommended value is 20 seconds. If the value is set smaller than 20 seconds, the Metrocluster software may time out before getting the response from SMI-S, and the package fails to start with an error.
C smiseva.conf file

#################################################################
#                                                               #
#   smiseva.conf CONFIGURATION FILE (template)                  #
#   for use with the smispasswd utility                         #
#   in the Metrocluster CA EVA Environment                      #
#                                                               #
#   Note: This file MUST be edited before it can be used.       #
#   For complete details about SMI-S configuration for use      #
#   with Metrocluster CA EVA, consult the manual "Designing     #
#   Disaster Tolerant High Availability Clusters."              #
#                                                               #
#################################################################
#
# This file provides input to the smispasswd utility, which you
# use to set up secure access paths between cluster nodes and
# SMI-S services.
#
# Edit this file to include the appropriate information about
# the Management Server and SMI-S services that will be used in
# your Metrocluster CA EVA environment.
#
# After entering all the desired information, run the smispasswd
# command to generate the security configuration that allows
# cluster nodes to communicate with the SMI-S services.
#
# Below is an example configuration. The data is commented out.
#
# Hostname/        User_login_name  Secure      Namespace  Port
# IP_Address                        Connection             (optional)
# ---------------- ---------------- ----------  ---------  ----------
# 15.13.244.182    administrator    y           root/EVA   5989
# 15.13.244.183    administrator    N           root/EVA   5988
# 15.13.244.192    admin12309       y           root/EVA
# SANMA04          admin            y           root/EVA
#
# The example shows a list of 4 Management Server and SMI-S entries in
# the Metrocluster CA EVA environment. Each line represents a different
# Management Server and SMI-S entry; fields on each line should be separated
# either by space(s) or tab(s). The order of fields is significant. The first
# field must be a hostname or IP address, and the second field must be
# a user login name on the host. The third field must be 'y' or
# 'n' to use an SSL connection. The next field must be the namespace
# of the SMI-S service. If the SMI-S service does not use the default port,
# then use the Port column to give the customized port number.
# For details of each field, refer to the smispasswd
# man page, 'man smispasswd'.
#
# Note:
# Lines beginning with the pound sign (#) are comments. You cannot
# use the '#' character in your data entries.
#
# Enter your Management Server and SMI-S services data under the dashed lines:
#
# Hostname/        User_login_name  Secure      Namespace  Port
# IP_Address                        Connection             (optional)
# ---------------- ---------------- ----------  ---------  ----------

D mceva.conf file

##############################################################
## mceva.conf CONFIGURATION FILE (template) for use with    ##
## the evadiscovery utility in the Metrocluster Continuous  ##
## Access EVA Environment.                                  ##
## Version: A.01.00                                         ##
## Note: This file MUST be edited before it can be used.    ##
## For complete details about EVA configuration for use     ##
## with Metrocluster Continuous Access EVA, consult the     ##
## manual "Designing Disaster Tolerant High Availability    ##
## Clusters".                                               ##
##############################################################
## This file provides input to the evadiscovery utility,    ##
## which you use to generate the /etc/dtsconf/caeva.map     ##
## file. During Metrocluster Continuous Access EVA          ##
## configuration, this file is copied to all cluster nodes. ##
## Edit the file to include the appropriate data about the  ##
## EVA storage systems and DR groups that will be used in   ##
## your Metrocluster Continuous Access EVA environment.     ##
## After entering all the desired information, run the     ##
## evadiscovery command to generate the mapping data and    ##
## save it in a map file.                                   ##
## Note: Before running evadiscovery, you need to use the   ##
## smispasswd command to create a SMI-S services            ##
## configuration.                                           ##
## Enter the data for storage device pairs and DR groups    ##
## after the <storage_pair_info> and <dr_group_info> tags.  ##
## The <storage_pair_info> tag represents the starting      ##
## definition of a storage pair and its DR groups. Under a  ##
## <storage_pair_info> tag, you must provide two storage    ##
## Node World Wide Names (WWN) which both contain the DR    ##
## groups defined under the <dr_group_info> tag. You can    ##
## define as many DR groups as you need, but each DR group  ##
## must belong to only one of the storage pairs. A storage  ##
## pair can have a maximum of 64 DR groups.                 ##
## Note that you can find storage Node World Wide Names     ##
## from the front panel of your P6000 controllers or from   ##
## the 'Initialized Storage Properties' page of Command     ##
## View EVA through your Web browser.                       ##
## Below is an example of a configuration with two storage  ##
## pairs (4 storage units). The first storage pair contains ##
## 2 DR groups and the second pair contains 1 DR group.     ##
## <storage_pair_info>                                      ##
## "5000-1FE1-5000-4280"   Enter first storage WWN in       ##
##                         double quotes.                   ##
## "5000-1FE1-5000-4180"   Enter second storage WWN in      ##
##                         double quotes.                   ##
## <dr_group_info>                                          ##
## "DR Group - Package1"   Enter a DR group name in double  ##
##                         quotes.                          ##
## "DR Group - OracleDB1"  Enter a DR group name in double  ##
##                         quotes.                          ##
## <storage_pair_info>                                      ##
## "5000-1FE1-5000-4081"                                    ##
## "5000-1FE1-5000-4084"                                    ##
## <dr_group_info>                                          ##
## "DR Group - Package2"                                    ##
## Note: Since '#' marks the start of a comment, you cannot ##
## include the '#' character in any <storage_pair_info>,    ##
## <dr_group_info>, storage name, or DR group name.         ##
## Note: All the storage and DR Group names should be       ##
## enclosed in double quotes (""); otherwise, the           ##
## evadiscovery command will not detect them.               ##
## Enter your MC EVA storage pairs and DR Groups under the  ##
## dashed lines:                                            ##
## ---------------------------------------------------------
<storage_pair_info>
"5000-1FE1-5000-00DF"
"5000-1FE1-5000-00DE"
<dr_group_info>
"DR Group 1"
"DR Group 2"
"DR Group 3"
"DR Group 4"

E Identifying the devices to be used with packages

Identifying the devices created in P6000/EVA

After the WWN of the P6000/EVA virtual volume is obtained, find the WWN of the disk by using the lsscsi and scsi_id commands. For example:
# lsscsi | grep HSV | grep disk | awk '{print $6}'
After the P6000/EVA disk paths are retrieved by the lsscsi command, run the scsi_id command to find the WWN of each P6000/EVA disk. For example:
On SUSE:
# /lib/udev/scsi_id --whitelisted <HSV disk path>
On Red Hat:
# /sbin/scsi_id --whitelisted <HSV disk path>
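To map every HSV disk to its WWN in one pass, the two commands above can be combined in a small shell loop; this is shown for SUSE — on Red Hat, substitute /sbin/scsi_id:

# for d in $(lsscsi | grep HSV | grep disk | awk '{print $6}'); do
>   echo "$d $(/lib/udev/scsi_id --whitelisted $d)"
> done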
You can also use the EVAinfo tool to retrieve the WWN of a P6000/EVA disk. For example:
# evainfo -a
Devicefile  Array                WWNN                                     Controller/Port/Mode     Capacity
/dev/sdg    5000-1FE1-5007-DBD0  6005-08B4-0010-78FD-0002-4000-0018-0000  Ctl-A/FP-4/NonOptimized  1024MB
/dev/sdh    5000-1FE1-5007-DBD0  6005-08B4-0010-786B-0001-E000-0031-0000  Ctl-A/FP-4/NonOptimized  1024MB
/dev/sdp    5000-1FE1-5007-DBD0  6005-08B4-0010-78FD-0002-4000-0018-0000  Ctl-A/FP-3/NonOptimized  1024MB
/dev/sdq    5000-1FE1-5007-DBD0  6005-08B4-0010-786B-0001-E000-0031-0000  Ctl-A/FP-3/NonOptimized  1024MB
/dev/sdr    5000-1FE1-5007-DBD0  6005-08B4-0010-78FD-0002-4000-0018-0000  Ctl-B/FP-4/Optimized     1024MB
/dev/sds    5000-1FE1-5007-DBD0  6005-08B4-0010-786B-0001-E000-0031-0000  Ctl-B/FP-4/Optimized     1024MB
/dev/sdaf   5000-1FE1-5007-DBD0  6005-08B4-0010-78FD-0002-4000-0018-0000  Ctl-A/FP-4/NonOptimized  1024MB
/dev/sdag   5000-1FE1-5007-DBD0  6005-08B4-0010-786B-0001-E000-0031-0000  Ctl-A/FP-4/NonOptimized  1024MB
/dev/sdah   5000-1FE1-5007-DBD0  6005-08B4-0010-78FD-0002-4000-0018-0000  Ctl-A/FP-3/NonOptimized  1024MB
/dev/sdai   5000-1FE1-5007-DBD0  6005-08B4-0010-786B-0001-E000-0031-0000  Ctl-A/FP-3/NonOptimized  1024MB
For more information on the EVAinfo tool, see the HPE EVAInfo Release Notes.

Glossary

A, B

arbitrator
Nodes in a disaster tolerant architecture that act as tie-breakers in case all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements. The arbitrator must be located in a third data center to ensure that the failure of an entire data center does not bring the entire cluster down. See also quorum server.

automatic failover
Failover directed by automation scripts or software (such as Serviceguard) and requiring no human intervention.

C

campus cluster
A single cluster that is geographically dispersed within the confines of an area owned or leased by the organization such that it has the right to run cables above or below ground between buildings in the campus. Campus clusters are usually spread out in different rooms in a single building, or in different adjacent or nearby buildings. See also extended distance cluster.

cluster
A Serviceguard cluster is a networked grouping of HPE 9000 and/or HPE Integrity Servers series 800 servers (host systems known as nodes) having sufficient redundancy of software and hardware that a single failure will not significantly disrupt service. Serviceguard software monitors the health of nodes, networks, application services, and EMS resources, and makes failover decisions based on where the application is able to run successfully.

Continentalclusters
A group of clusters that use routed networks and/or common carrier networks for data replication and cluster communication to support package failover between separate clusters in different data centers. Continentalclusters are often located in different cities or different countries and can span hundreds or thousands of kilometers.

Continuous Access
A facility provided by the Continuous Access software option available with the HPE StorageWorks P6000/EVA disk array. This facility enables physical data replication between P6000/EVA series disk arrays.

Controller Software
Controller software manages all aspects of array operations, including communication with P6000 Command View. VCS is the controller software for the EVA3000/5000 models. XCS is the controller software for all other P6000/EVA models.

D

data center
A physically proximate collection of nodes and disks, usually all in one room.
data consistency
Whether data are logically correct and immediately usable; the validity of the data after the last write. Inconsistent data, if not recoverable to a consistent state, is corrupt.

data currency
Whether the data contain the most recent transactions, and/or whether the replica database has all of the committed transactions that the primary database contains; the speed of data replication may cause the replica to lag behind the primary copy and compromise data currency.

data replication
The scheme by which data is copied from one site to another for disaster tolerance. Data replication can be either physical (see physical data replication) or logical (see logical data replication). In a Continentalclusters environment, the process by which data that is used by the cluster packages is transferred to the Recovery Cluster and made available for use on the Recovery Cluster in the event of a recovery.

database replication
A software-based logical data replication scheme that is offered by most database vendors.

disaster
An event causing the failure of multiple components or entire data centers that renders unavailable all services at a single location; these include natural disasters such as earthquakes, fires, or floods; acts of terrorism or sabotage; and large-scale power outages.

disaster recovery
The process of restoring access to applications and data after a disaster. Disaster recovery can be manual, meaning human intervention is required, or it can be automated, requiring little or no human intervention.

disaster tolerant
The characteristic of being able to recover quickly from a disaster. Components of disaster tolerance include redundant hardware, data replication, geographic dispersion, partial or complete recovery automation, and well-defined recovery procedures.

E, F

Environment File
Metrocluster uses a configuration file that includes variables that define the environment for Metrocluster to operate in a Serviceguard cluster. This configuration file is referred to as the Metrocluster environment file. This file needs to be available on all nodes in the cluster for Metrocluster to function successfully.

failback
Failing back from a backup node, which may or may not be remote, to the primary node that the application normally runs on.

failover
The transfer of control of an application or service from one node to another node after a failure. Failover can be manual, requiring human intervention, or automated, requiring little or no human intervention.

G, H, I, J, K, L

heartbeat network
A network that provides reliable communication among nodes in a cluster, including the transmission of heartbeat messages, signals from each functioning node that are central to the operation of the cluster and that determine the health of the nodes in the cluster.

local failover
Failover on the same node; this is most often applied to hardware failover. For example, local LAN failover is switching to the secondary LAN card on the same node after the primary LAN card has failed.

LUN (Logical Unit Number)
A SCSI term that refers to a logical disk device composed of one or more physical disk mechanisms, typically configured into a RAID level.

M, N

manual failover
Failover requiring human intervention to start an application or service on another node.

Metrocluster
A Hewlett Packard Enterprise product that allows a customer to configure a Serviceguard cluster as a disaster tolerant metropolitan cluster.
metropolitan cluster
A cluster that is geographically dispersed within the confines of a metropolitan area, requiring right-of-way to lay cable for redundant network and data replication components.

mission critical application
Hardware, software, processes, and support services that must meet the uptime requirements of an organization. Examples of mission critical applications that must be able to survive regional disasters include financial trading services, e-business operations, 911 phone service, and patient record databases.

multiple points of failure (MPOF)
More than one point of failure that can bring down a Serviceguard cluster.

notification
A message that is sent following a cluster or package event.

Q

quorum server
A cluster node that acts as a tie-breaker in a disaster tolerant architecture in case all of the nodes in a data center go down at the same time. See also arbitrator.

R

remote failover
Failover to a node at another data center or remote location.

resynchronization
The process of making the data between two sites consistent and current once systems are restored following a failure. Also called data resynchronization.

S

split-brain syndrome
When a cluster reforms with equal numbers of nodes at each site, and each half of the cluster thinks it is the authority, starts up the same set of applications, and tries to modify the same data, resulting in data corruption. Serviceguard architecture prevents split-brain syndrome in all cases unless dual cluster locks are used.

sub-clusters
Sub-clusters are clusterwares that run above the Serviceguard cluster and comprise only the nodes in a Metrocluster site. Sub-clusters have access only to the storage arrays within a site.

T

transparent failover
A client application that automatically reconnects to a new server without the user taking any action.

transparent IP failover
Moving the IP address from one network interface card (NIC), in the same node or another node, to another NIC that is attached to the same IP subnet, so that users or applications may always specify the same IP name/address whenever they connect, even after a failure.

U, V, Z

virtual array
Synonymous with disk array and storage system; a group of disks in one or more disk enclosures combined with control software that presents disk storage capacity as one or more virtual disks. See also virtual disk.

Virtual Controller Software (VCS)
See controller software.

virtual disk
Variable disk capacity that is defined and managed by the array controller and presented to hosts as a disk. May be called Vdisk in the user interface.

volume group
In LVM, a set of physical volumes such that logical volumes can be defined within the volume group for user access. A volume group can be activated by only one node at a time unless you are using Serviceguard OPS Edition. Serviceguard can activate a volume group when it starts a package. A given disk can belong to only one volume group. A logical volume can belong to only one volume group.

Vraid
The level to which user data is protected. Redundancy is directly proportional to cost in terms of storage usage; the greater the level of data protection, the more storage space is required. See also: Vraid0, Vraid1, Vraid5, Vraid6.

Vraid0
Optimized for I/O speed and efficient use of physical disk space, but provides no data redundancy.

Vraid1
Optimized for data redundancy and I/O speed, but uses the most physical disk space.
Vraid5
Provides a balance of data redundancy, I/O speed, and efficient use of physical disk space.

Vraid6
Offers the features of Vraid5 while providing more protection for an additional drive failure, but uses additional physical disk space.

Index

A
accessing updates, 37

C
cluster
  continental, 42
  Serviceguard, 11
cmviewcl command, 12
configuration
  environment, 9
configure
  web-based tool, 12
Configuring
  Generic Failover Attributes, 25
  Metrocluster EVA Parameters, 26
contacting Hewlett Packard Enterprise, 37
Continentalclusters, 42
  Metrocluster, 27
customer self repair, 38

D
Disaster Recovery
  Continentalclusters worksheet, 39
  Performing, 35
documentation
  providing feedback on, 38

F
failover_policy
  site_preferred, 12
FORCEFLAG, 31

H
hardware
  software, 11
Hierarchical Storage Virtualization (HSV)
  terminology, 5
HPE P6000 Continuous Access, 5

M
Metrocluster
  configuration, 9
  Metrocluster Module, 24
  Rolling, 27
  SMI-S, 27
Metrocluster package
  Metrocluster parameters, 22
  Serviceguard Manager, 23

R
remote support, 38
Replication
  Failover Preview, 27

S
Server
  Default Management Server, 15
  Management Server, 16
Site Aware Disaster Tolerant Architecture (SADTA), 35
site_preferred_manual
  site_preferred, 11
smispasswd
  configuration, 18
  evadiscovery, 14
Storage Cells
  DR Groups, 16
storage devices
  configuration, 17
support
  Hewlett Packard Enterprise, 37

U
updates
  accessing, 37

V
VDisk
  Cluster Device Special Files (cDSF), 18

W
websites, 37
  customer self repair, 38
worksheet
  Continentalclusters, 39