Revision 3.5
Last Update: 10/10/2005

Backup Exec™ 9.1/9.2 for NetWare Servers
Novell Cluster Server Environment

INTRODUCTION
AUDIENCE
TERMINOLOGY
INSTALLATION
CONFIGURING BACKUP EXEC TO BACK UP CLUSTERED VOLUMES
HOW BACKUP EXEC WORKS IN A CLUSTER
NON SHARED STORAGE CONFIGURATIONS
SHARED STORAGE CONFIGURATIONS
GROUPWISE ON A CLUSTER
TROUBLESHOOTING AND QUESTIONS
ADDITIONAL RESOURCES

Introduction

A NetWare Cluster Services environment is a challenging environment for a backup application. In the cluster environment, data that used to be located in a predictable location day in and day out suddenly acquires a degree of mobility.
Additionally, not only can a data volume be in a different place when the backup begins; it can also move while the backup is in progress. This document is meant to help you manage Backup Exec 9.1/9.2 for NetWare Servers (BENW) in a NetWare Cluster Services (NCS) environment. It does so by explaining in detail how BENW operates and behaves in a clustered environment.

Audience

This document is intended for Backup Exec for NetWare Servers administrators using Backup Exec in a Novell Cluster environment. Familiarity with Backup Exec for NetWare Servers and the Novell Cluster environment is assumed. There are references at the end of this document for supplemental information on the above products.

Terminology

There are many new, confusing, and redundant terms in the cluster environment. These are the terms used in this document and what they mean for the purposes of this discussion.

Checkpoint Restart
Checkpoint restart is the process of restarting the backup of a cluster volume after a failover of that volume. Checkpoint restart is only available if the volume that fails over is not the volume on which the Media Server doing the backup is installed.

Cluster BE Installation
An installation of the Backup Exec Media Server on a clustered volume. This is an instance of BE that can fail over.

Cluster Object
The cluster object is the object used to manage the cluster. Selecting this object in ConsoleOne and then selecting the properties of the elements within it provides the management interface for editing the Load/Unload scripts, the secondary IP addresses, and so on.

Cluster Resources
These are resources that are managed by the cluster software. The most common examples of cluster resources are disk volumes (cluster volume resources) and applications that are managed by the cluster so as to be highly available. This document is only concerned with cluster volume resources.
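The Checkpoint Restart term defined above can be illustrated with a small sketch (conceptual Python only, not Backup Exec code; the file names are made up): after a failover, the backup resumes at the first file that was not completely written, and that file is backed up again from its beginning.

```python
# Conceptual sketch only (not Backup Exec code): resume a backup after a
# failover at the first file that was not completely written.
def resume_point(file_list, completed):
    """Return the files still to back up after a failover of the volume."""
    for i, name in enumerate(file_list):
        if name not in completed:
            return file_list[i:]  # restart at the interrupted file
    return []

print(resume_point(["a.txt", "b.txt", "c.txt"], {"a.txt"}))  # ['b.txt', 'c.txt']
```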
Cluster Resource Elements
Each cluster resource can have the following Cluster Resource Elements: a secondary IP address and Load and Unload scripts.

Failback
Failback is the process of moving a cluster resource from the secondary node back to the preferred node when the preferred node recovers from a failure.

Failover
Failover is the process of moving a cluster resource from the preferred node to a secondary node after a crash of the preferred node. Note the difference between a failover and a migration.

Load and Unload Scripts
A cluster resource always has Load and Unload scripts associated with it. You can see these scripts by looking at the Cluster Object and examining the properties of the elements inside it. The Load Script is run when the cluster resource is started on a cluster node. The Unload Script is run when the cluster resource is migrated to another node. Note that BE does not implement its own Cluster Resource type. Instead, it piggybacks on the cluster volume resource: the Load/Unload scripts for clustered BE are in the Load/Unload scripts of the cluster volume resource where BE has been installed.

Local BE Installation
An installation of the Backup Exec Media Server or Remote Agent on a non-clustered volume. This is an instance of BE that cannot fail over. Note that if a cluster volume with an installation of BE on it migrates to a server running a local BE installation, the local BE installation is shut down before the clustered BE installation starts up.

Migration
Migration is the process of manually moving a cluster resource from one node to another node in the cluster. Note that cluster resources may behave differently in a migration than they would in a failover, because a migration is a controlled event in which the unload script for the resource has a chance to run.

Node
A node is a server in the cluster: a CPU and memory running NetWare.
In the end, a cluster resource is hosted on a node in the cluster.

Preferred Node
This is the node that is primarily responsible for hosting a cluster resource. If the preferred node for a cluster resource is up and running, it is typically hosting the cluster resource. For example, if server Node1 is the preferred node for Cluster_Volume_A, Cluster_Volume_A is typically hosted by Node1.

Secondary IP Address
A cluster resource must have a secondary IP address. The purpose of the secondary IP address is to serve as an indirect pointer to the real resource. As an analogy, if I moved around a lot, I would get a Post Office (PO) box. That way I could give the PO Box address to all the people who need to mail something to me; when I move, they won't care. They can keep sending my mail to the PO Box, and it will still reach me. In the same way, a cluster resource is identified by its secondary IP address, and all users of that cluster resource use that address. When the cluster resource moves, the clustering software, which knows both the secondary IP address and the node currently hosting the resource, routes requests to the right place.

Secondary Node
The secondary node is the node that can host a cluster resource when the preferred node is down. For example, if server Node2 is the secondary node for ClusterVolA, and the preferred node goes down, ClusterVolA fails over to Node2.

Virtual Server
A Virtual Server is a NetWare Core Protocol (NCP) server that represents a cluster resource. Typically the name of this object is the cluster object name, followed by the volume name, followed by the word "server". For example, if the cluster object is named CLUSTER_OBJ and the volume is USER, the name of the Virtual Server that represents the cluster volume is CLUSTER_OBJ_USER_SERVER.

Installation

The installation of Backup Exec to the cluster proceeds very much like a normal installation of Backup Exec.
There are, however, some very important differences, which are detailed below. Note that the installation software is not cluster aware.

Media Server Installation
A clustered installation of the Backup Exec media server must be installed to a cluster volume. It does not matter whether the traditional Server/Volume name or the Cluster Resource name of the volume is used during the installation; what is important is that a clustered volume is chosen. When asked if the volume is a cluster volume, answer yes. The install copies all the necessary files to this node and cluster volume. A significant difference is that the startup/shutdown NCF files for a clustered BE installation are BESTARTC.NCF and BESTOPC.NCF instead of the traditional BESTART.NCF and BESTOP.NCF scripts. After the install is finished, invoke BESTARTC at the node where you just finished the install. Note that you will want all the nodes in the cluster to be up at this time. In addition, be aware that BE will have to temporarily offline the cluster volume in order to update its load/unload scripts. The other nodes in the cluster need to be up because BE needs to copy the BESTARTC.NCF and BESTOPC.NCF files to the SYS:\SYSTEM directory on each of the nodes. If a node is offline during this process, you will have to copy the BESTARTC.NCF and BESTOPC.NCF files to that node manually. This manual installation of the NCF files also needs to be done if a new node is later added to the cluster. The Open File Option is installed to the SYS: volume on the server. Since it is not on the cluster volume, it does not migrate or fail over along with the rest of the Backup Exec software. For this reason, the Open File Option needs to be installed on each node where it will be used.

Volume Resource Scripts
During the installation, the scripts that control loading and unloading BE are updated. Backup Exec does not have its own cluster resource.
Instead, it piggybacks on the cluster volume resource of the volume on which it was installed. This is done by adding Backup Exec specific commands to the load and unload scripts of the cluster volume resource. The following lines are added to the end of the load script:

IGNORE_ERROR BESTOP –C
%while (loaded BESTOP) cmd delay 5
BESTARTC

The IGNORE_ERROR BESTOP –C command unloads the local BE installation, if there is one; the IGNORE_ERROR prefix suppresses the resulting error when there is no local BE installation. The %while (loaded BESTOP) cmd delay 5 command waits until BESTOP.NLM is finished. The BESTARTC command launches the clustered BE installation. The following lines are added to the start of the unload script:

BESTOPC –C
%while (loaded BESTOP) cmd delay 5
IGNORE_ERROR BESTART

The BESTOPC –C command unloads the clustered BE installation. The %while (loaded BESTOP) cmd delay 5 command waits until BESTOP.NLM is finished. The IGNORE_ERROR BESTART command launches the local BE installation, if there is one; again, IGNORE_ERROR suppresses the resulting error when there is no local BE installation.

Remote Server Installation
When installing the Remote Agent for NetWare Servers (RANW) to a server in the cluster, it is important to understand RANW's role in the cluster. RANW is used to back up local (non-clustered) volumes; it is not used to back up clustered volumes. Therefore, there is no reason to install RANW to a cluster volume; leave it on a local volume. The only impact that RANW has on a cluster installation is that it should typically be loaded on a node while that node is not hosting the Cluster BE installation.

Configuring Backup Exec to Back Up Clustered Volumes

One of the most impressive demonstrations of Backup Exec's integration with Novell Cluster Services is its support of remote clustered volumes.
This is the case where a media server is backing up a cluster volume. The media server can not only access that cluster volume no matter where it is, it can even follow the volume when there is a failover in the middle of a backup operation. To achieve this functionality, there are some differences in the way cluster volumes are configured within BENW.

1. The name and IP address of the Virtual Server for each cluster volume need to be in the NDMPSVRS.DAT file on the media server. Entries to this file can be made using the Options/Serial Numbers/Agents menu item in the Backup Exec NetWare Servers Administration Console.

2. On each of the nodes that could host the volume, make sure that TSA600 or TSAFS is cluster enabled. This can be changed in the SMSSTART.NCF file. Also make sure that the AUTOEXEC.NCF file on these nodes loads the cluster software (LDNCS) before the SMS software (SMSSTART).

The Backup Exec Administration Console should then display an entry for the clustered volume. For example:

Agent CLUSTER_OBJ_USER_SERVER.NetWare Cluster File System

Notice that there is no installation of Backup Exec software on the remote nodes. In the case of a clustered volume, Backup Exec leverages the failover functionality present in the NetWare 6 and NetWare 6.5 TSAs. The data path during a backup of a cluster volume configured as above is:

Cluster Volume on remote node → TSA on remote node → SMDR on remote node → SMDR on Media Server → NDMPD on Media Server → Media Server → backup media.

How Backup Exec Works in a Cluster

Differences between NetWare 5.1 and NetWare 6, 6.5

Clustered Volumes vs. Clustered Pools
In NetWare 5.1, NSS volumes are associated with the cluster resources. That is, the mapping that provides location independence is between the cluster object and the NSS Volume objects. In NetWare 6 and 6.5, the NSS Pools (or Storage Groups) provide this functionality.
The NSS volumes inside the pools move around the cluster as a group inside the pools. To simplify the text, the term cluster volume resource is used to indicate a cluster volume generically.

Checkpoint Restart
Backup Exec does not support checkpoint restart in NetWare 5.1 SP4. The Novell SMS components in this release do not properly support checkpoint restart. It is possible that Novell will release updated SMS components for NetWare 5.1 that will make this possible in the future. Checkpoint restart is available for NetWare 6 and 6.5. Also note that checkpoint restart requires that the failure occur during the backup of a file object. If a directory object is being backed up at the point of failure, the checkpoint restart will fail. During full backups, the backup is unlikely to fail at this point. However, if the failure occurs during an incremental or differential backup, you may see the following message:

Error: Unable to restart backup for device CLUSTER-VOLUME.NetWare Cluster File System: after cluster failover/failback because Backup Exec could not determine the last successfully backed up file.

The backup of the device fails at this point. The backup operation proceeds to back up any other required sets and ends the job with a failure. The directory that was being backed up is marked in the catalog as a corrupt file.

TSA Cluster Option
By default, both TSA600 and TSAFS are cluster enabled. That is, they see clustered volumes via their cluster resource names. When the TSA is cluster enabled, it advertises the volumes on the node as cluster resources. When the TSA is not cluster enabled, it advertises the volumes on the node with the traditional Server/Volume name. TSA600 accepts the /cluster=off parameter to disable cluster support; TSAFS uses the /nocluster switch to do the same. The TSA also has a configuration file that may control the cluster support setting.
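For reference, the switches described above go on the TSA load line, typically in SMSSTART.NCF. The following fragment is illustrative only; load whichever TSA matches your NetWare version, and note that with no switch, cluster support is enabled by default:

```
# SMSSTART.NCF fragment (illustrative): disabling TSA cluster support
LOAD TSA600 /cluster=off
LOAD TSAFS /nocluster
```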
If the expected behavior is not seen with the TSA, check whether SYS:\ETC\SMS\TSA.CFG exists. If it does, look at this text file to see whether the cluster switch is set properly.

Differences between BENW 9.1 and BENW 9.2

Open File Option
In BENW 9.1, the Open File Option is only usable on cluster volumes that are hosted on the media server. In BENW 9.2, the Open File Option can also be used on remote cluster volumes (volumes that are not hosted on the same node as the media server). To accomplish this, both the Remote Agent for NetWare Servers and the Open File Option software must be installed and running on the remote node. The media server connects to the Remote Agent for NetWare Servers to initiate the Open File Option snapshot at the beginning of the backup and again at the end of the backup to close the snapshot.

Speed File Browsing
BENW 9.2 introduced the Speed File Browsing option in the Options/Media Server/Network screen. This option should not be enabled in a Novell Cluster Services environment. Doing so prevents Backup Exec from locating and backing up cluster volumes.

How does failover work?
There are two circumstances involving failover: Media Server failover and Remote Cluster Volume failover. Note that there is a difference between a migration and a failover. When a manual migration is done on a volume with the media server, the jobs are cancelled as if the user had cancelled them, and Backup Exec shuts down normally. In this case, the job will not be restarted on the failover node as it would be in the event of an actual failover.

Media Server Failover
Media Server failover happens when the node that is hosting the BE Media Server fails. Nothing happens on the failed server; it has crashed. If a backup job was active during the failover, the tape that was being written to becomes unappendable. This is the same as if you pulled the power on the server during a backup on a non-clustered node.
The tape in the drive has an incomplete set and doesn't have an end-of-tape marker. The next time BE tries to use the media, it notices the missing tape markers and flags the media as unappendable. Once the cluster software recognizes the failure, the load script for the cluster volume runs on the failover node. Any currently running instance of BE on the failover node is shut down, and the cluster BE installation on the cluster volume is started up. On startup, BE determines whether it was in the middle of a backup operation. If there was an active backup, a "Run Now" job is created to restart the backup at the beginning of the failed set. That job starts running and runs as a normal job would. Note that if any local volumes on the failed node remain to be backed up, the failover node will not have access to those volumes and their backups will fail. When looking at the job logs for a job that was active during a failover event, there are two job logs: the first is a failed job indicating that it failed due to a cluster failure; the second is the job log from the "Run Now" job run on the failover node. If the media server that failed over is in a clustered SAN, then there is the additional work of clearing the reservations for the media devices. If the library supports it, a LUN Reset is sent to the devices from the failover server. If the library does not support the LUN Reset command, the tape drive will have a stuck reservation and is unusable until the reservation is manually cleared. The available methods for manually clearing the reservation depend on the hardware being used. The following are some ways this can be done:

Power cycle the failed node that had the reservation.
Power cycle the Fibre/SCSI bridge.
Clear the reservation from BE on the failed node that had the reservation. This is done from the Administration Console: select the Drives/Robots menu item.
Select the affected robot and then the drive in that robot. Then you can clear the reservation with the Ctrl-B key. Caution is advised before using any of the above methods. Power cycling the Fibre/SCSI bridge while other servers are using it for backup operations would terminate those operations and lead to failed backups, along with even more unappendable media. Clearing the reservation for the wrong drive would be problematic as well. Note that only backup operations recover from a failover; restore and utility operations are not restarted after a failover.

Remote Cluster Volume Failover
Remote Cluster Volume failover happens when the node hosting a volume that is being backed up by BE fails. If this happens when no backup is occurring, the volume fails over to another node and is found there when the backup begins. If the failover happens during a backup operation, the backup is suspended while the volume is mounted on the failover node. Once the volume is mounted on the failover node, the file that was interrupted is marked as a corrupt file, and the backup restarts at the beginning of that file. This process is referred to as a checkpoint restart. Also note that no BE modifications are made to the Load/Unload scripts for these volumes; the only time modifications are made to the Load/Unload scripts is when a Media Server is installed on a cluster volume. If a verify operation is run on the set that was being written, the verify operation fails because of the corrupt file. The resulting corrupt file on the tape can also cause a restore of the set to end in failure, because the restore runs into the corrupt file on tape and cannot restore it. This failure can either be ignored, or it can be prevented by using the "Exclude corrupt files" setting in the NetWare window of the restore job submission screen. One wrinkle in the remote failover scenario involves the use of Pre and Post commands.
The Pre command would have been run on the failed node, and Backup Exec does not go back and re-run it on the failover node. For example, if a failover occurs on a job that has both Pre and Post commands, the Pre command is run on the starting node and the Post command ends up being run on the failover node.

Non Shared Storage Configurations

A Non Shared Storage Configuration is one in which the backup devices (tape drives and libraries) are not shared between media servers. In this case, tape hardware is directly connected to each node. When a media server fails over to another node, the new node resumes running the job on its own hardware. In this configuration, each node shares the Jobs, Policies, DrivePools, MediaSets, and Partition information. However, the Drives, Robots, Slots, and Media entries are specific to each node in the cluster. This node-specific data is stored in the BKUPEXEC\MM\NodeName directories, where NodeName is the server name of the node where the tape hardware is located. Partition Management mode is not available in clustered SSO environments; the Backup Exec installation must use Media Management mode.

Backup-to-Disk
Backup-to-Disk folders are not currently supported in SSO SAN environments. Backup-to-Disk folders can be used in a Non Shared Storage configuration, but this requires some extra configuration steps.

1. Create a Backup-to-Disk folder and put the data for that folder on the cluster volume. The easiest place to put it is on the cluster volume that BE is installed on. That way, you are assured that the Backup-to-Disk folder is available to the BE installation no matter what node it is running on.
2. Create any other Backup-to-Disk folders in the same manner.
3. Migrate BE to the other nodes.
4. On each node, create a Backup-to-Disk folder of the same name, and point it at the same location. This is necessary because the Backup-to-Disk configuration files are maintained locally on each node.
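The reasoning behind step 4 can be sketched as follows (illustrative Python only, not Backup Exec code; the node names, folder name, and path are made up):

```python
# Conceptual sketch only (not Backup Exec code): Backup-to-Disk configuration
# is kept per node, so every node must define the same folder name pointing
# at the same cluster-volume location.
node_configs = {
    "NODE1": {"B2D_FOLDER": "BE_VOL:B2D"},  # hypothetical names and path
    "NODE2": {"B2D_FOLDER": "BE_VOL:B2D"},
}

def resolve_folder(node, folder_name):
    """Look up a Backup-to-Disk folder in the given node's local configuration."""
    return node_configs[node].get(folder_name)

# A job targeted at B2D_FOLDER resolves to the same location on either node.
assert resolve_folder("NODE1", "B2D_FOLDER") == resolve_folder("NODE2", "B2D_FOLDER")
```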
By creating Backup-to-Disk folders with the same name, jobs targeted at a Backup-to-Disk folder will find it no matter what node they run on.

Best Practices
Because the drive pools are shared among the nodes, it is best to target jobs at a drive pool rather than at individual drives, and to place drives from each node into the drive pool. This way, no matter which node is running the job, a drive is available for it. Note that if the drives are identical, with identical names, host adapter numbers, and SCSI IDs, you could target a job at a particular drive and have it successfully fail over to the other node. However, creating a drive pool greatly simplifies the process. This is the procedure for setting up the drive pools:

1. Create the drive pool on the current node and assign local drives to it.
2. Migrate Backup Exec to the other node.
3. Add the local drives from the failover node to the drive pool.

As an alternative, you can simply use the All Drives pool, which automatically uses whatever local drive is available. No extra configuration is necessary to update the All Drives pool.

Shared Storage Configurations

A Shared Storage configuration is one in which the backup devices (tape libraries) are shared between two or more media servers.

Choices
In a clustered SAN environment, we have the following competing and even contradictory desires:

Local Backup of Data
It is always faster to back up data locally. That is, Data Path A is faster than Data Path B.

A. Disk → SAN → Media Server → SAN → Tape Device
B. Disk → SAN → Remote Server → LAN → Media Server → SAN → Tape Device

Data Location Independence
A clustered data volume can move around. It would be nice to set up the backup jobs so that the data volumes get backed up no matter where they happen to be when the backup occurs.

Uninterrupted Backup Operations
If a volume fails over during a backup, it may be critical to continue the backup of that volume immediately.
This is especially true of very large volumes. As you can see, the goals themselves are contradictory. It is desirable to have all local backups, yet at the same time it is not desirable to have backups tied to the local media server. Data location independence is good, but it sometimes means a performance degradation when the data has to be transported across the LAN.

SSO Rules
The following rules must be observed with BENW in a clustered SAN environment.

Only one instance of BENW can be running on a node at a time. A local instance of BE must be shut down before a clustered BE installation can be started on a node.

A cluster cannot contain both Group Servers and Primary Group Servers. This configuration simply contains too many possibilities for failure. If a Group Server were to fail over on top of the Primary Group Server, the entire BE installation would grind to a halt.

A clustered Group Server cannot fail over onto another clustered Group Server. It does not make sense to have one Group Server fail over on top of another Group Server. Each Group Server would be responsible for backing up a certain portion of the backup domain. In the event of a failover, only one of those Group Servers would be active, so the gain of being able to continue one job would be offset by losing the ability to run the jobs belonging to the Group Server on the failover node. Of course, the backup domains could completely overlap, with each Group Server backing up everything, but in that case failover of BE is unnecessary because there is already redundant backup of the entire backup domain.

Example Clustered SAN Configurations

1-1 Redundancy
In an ideal situation, there would be enough redundant hardware to ensure the constant availability of all the components necessary to do a backup, and all backups would be local. This requires a redundant server for each media server. The redundant server would typically be idling.
In the event that a media server fails, the clustered BE installation fails over to the idling server and the backup continues.

Benefits: Complete redundancy of backup operations. Local backup of data.
Drawbacks: Hardware and software costs for redundant systems. High degree of complexity.

Many-1 Redundancy
In this situation, there would be one idling server in the cluster. In the event that a media server fails, the clustered BE installation would fail over to the idling server and the backup would continue. In the event of multiple server failures, only the first failed server would be able to fail over. This configuration is not currently supported by BE; there is no mechanism to keep multiple servers from failing over to the spare server.

Distributed Local BE Installations
In this situation, all of the media servers are installed on non-cluster volumes. If a server goes down, any backup jobs processed by that system do not run until the node is brought back online. Note that if it is the Primary Group Server node that fails, the Group Servers will not be able to proceed until the Primary Group Server node is brought back online. In this scenario, it is assumed that the backups are made by selecting the Server object, so that backups hit all volumes currently mounted on the system. Note that there is no point in using the cluster volume resources in this scenario, as each server is backing itself up. If the server fails, the volumes fail over, but there is no failover of the backup job.

Benefits: Local backup of data. Low complexity. Volumes get backed up regardless of location.
Drawbacks: If a node fails during the backup, the volumes on the node won't get backed up until the next backup job runs on the nodes they failed over to.

Remote Backup of Cluster Volumes
In this situation, all of the volumes are backed up remotely. The media server is installed in one location and the Remote Agent for NetWare is installed on each of the servers in the cluster.
A backup job is submitted that backs up all the local volumes in the cluster via the Server/Volume view. The cluster volumes are backed up using the cluster server view. The media server can be either in or out of the cluster. The backups will run regardless of where the volumes are mounted. If a node fails during a backup of a cluster volume, the backup continues as soon as the volume is mounted on the failover node.

Benefits: No redundant hardware costs. Backups occur regardless of node failures. Low complexity.
Drawbacks: Remote backup of data over the LAN. This can be ameliorated by using a dedicated backup network.

Best Practices in a Clustered SAN
The best practices largely depend on the SAN configuration. It is important to check the SSO Rules section above and make sure that they are not being violated. Additionally:

Install BE on its own cluster resource. This is not necessary, but it simplifies the management of Backup Exec. By having BE installed in its own cluster pool, it can be migrated around the cluster without affecting other resources on the cluster. Just keep in mind that the volume needs to be large enough to accommodate the data generated during backup operations. The largest consumers of disk space in a BE installation are the catalogs and Backup-to-Disk folders.

Make sure that time is synchronized between the nodes. Time synchronization is likely already handled, and there are plenty of other reasons to be sure that all the servers in the cluster are time synchronized. BE needs the time synchronized because otherwise the job schedule will be interpreted differently by different nodes.

GroupWise on a Cluster

Limitations
Unlike the file system TSA, the GroupWise TSA (GWTSA) does not provide a node-independent name for the objects it publishes. Thus, when a selection list is created to back up a post office, domain, or library, the name includes the name of the node.
If the volume containing that object fails over to another node, the selection list won't work anymore. For example, when selecting the post office while it is mounted on server NODE1, the file selection looks something like this:

NODE1.GroupWise System/[Dom]DOM:

When the volume fails over to NODE2, the name for the same post office becomes:

NODE2.GroupWise System/[Dom]DOM:

A backup job using the first name won't find the post office once it has moved. There are a couple of ways to address this problem, as described below.

Back up using the high-level TSA selection
In this method, the jobs are set up such that they back up all resources currently hosted by a node. Instead of selecting the individual objects (post offices, domains, and libraries), make the selection one level higher; that is, select the entire GWTSA for backup. The effect is that instead of searching for individual objects to back up, the job simply backs up everything that happens to be available on that node at the time. The advantage is that you don't care where the volume is located when the backup runs. As long as you back up the GWTSA on every node the volume can fail over to, it will get backed up. A disadvantage is that if for some reason a volume doesn't get mounted anywhere, it won't get backed up, and there will be no error in the backups.

Redundant Backup Selections
In this method, the backup job backs up all the individual resources on a node that could possibly be mounted there. This results in the same object being described in multiple backup jobs; no matter where the object is, it gets backed up. To do this, migrate all the volumes to a node, then create the selection list for the job. Repeat this operation for each node. The big disadvantage of this technique is that the jobs always fail, because they can never find all the objects they are set up to back up.
This can be made a little easier to manage by splitting the jobs into jobs that back up objects in their "normal" location and jobs that back up objects in their "failover" locations. As long as everything is "normal", you can expect the normal jobs to complete successfully and the "failover" jobs to end in error.

Troubleshooting and Questions

Why don't I see the cluster volumes in the Backup Sources window?

There are three things to check.

First, make sure that the TSA (TSA600 or TSAFS) is cluster enabled. See the TSA Cluster Options section for details.

Second, make sure that the cluster software was started before the SMS components were loaded. This can be checked easily by unloading SMDR.NLM. If SMDR.NLM won't unload, it lists all the NLMs that must be unloaded first; unload them, then unload SMDR.NLM. Then load the TSA without disabling the cluster functionality, reload Backup Exec, and the cluster volumes should show up properly.

Lastly, make sure that NDMPSVRS.DAT has been properly updated with the cluster resource names and IP addresses. It is important to use the correct name for the cluster resource. At the media server, run the "DISPLAY SLP SERVICES SMDR.NOVELL" command. This should list all the available resources, including the cluster resource names and IP addresses, in a format that is handy for verifying that the NDMPSVRS.DAT file contains the correct names and addresses. If the volumes are not listed in the output of DISPLAY SLP SERVICES SMDR.NOVELL at the media server, but they do show up when the same command is run on the node where they are mounted, then there is a problem with SLP advertising. Try the SET SLP RESET = ON command at the media server and on the node where the resource is mounted to see if re-broadcasting the SLP information resolves the problem.

Note that the names and IP addresses can also be found in the Load/Unload scripts for the cluster resources.
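For reference, the load script of a cluster volume resource is where the secondary IP address and volume appear together. A minimal sketch of such a load script is shown below; the pool name, volume name, and address are illustrative placeholders only, so check the actual scripts for your own resources rather than copying these values.

```
nss /poolactivate=BEPOOL
mount BEVOL VOLID=254
add secondary ipaddress 10.1.1.25
```

The "add secondary ipaddress" line is the address that should match the entry for this resource in NDMPSVRS.DAT.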
The Load/Unload scripts can be viewed in ConsoleOne by selecting the cluster object in the tree view and then viewing the properties of the cluster resource.

Does the remote agent get used for remote cluster backups?

No. NDMPD.NLM is not used when backing up remote cluster volumes.

Can I do an Open File Option backup of a remote cluster volume?

BENW 9.1: No. To do an Open File Option backup of a cluster volume, the cluster volume must be local to the media server. The other option is to back up the cluster volume via the Server/Volume name of the volume.

BENW 9.2: Yes, provided that the Open File Option software has been installed on each node where the cluster volume can be mounted.

What happens when BESTOP can't stop BE?

There is a chance that BESTOP will fail to unload Backup Exec. This is most likely to occur when Backup Exec is in the middle of a job when BESTOP is invoked. When invoked with the -C switch, BESTOP attempts to forcibly terminate any jobs and clean up resources before Backup Exec unloads. If BESTOP is unable to unload Backup Exec for NetWare Servers, it fails and sits at a prompt, waiting indefinitely for user input. The cluster resource Load/Unload script that invoked BESTOP will also wait indefinitely for BESTOP to be unloaded; when a user responds to the BESTOP failure, the cluster resource script continues. Note that this is an abnormal occurrence and should not happen in normal operations.

How do I make a whole server selection when backing up a cluster node?

When the TSA is cluster enabled (see TSA Cluster Options), the user is prevented from making a selection to back up all volumes on the server. In a non-clustered environment, the TSA presents a server-centric view of the volumes on the server.
In this environment, the user can select the server object for a backup operation, which results in BE backing up all the volumes that are mounted on the server when the backup occurs. When the TSA is cluster enabled in a clustered environment, the cluster volumes are no longer presented in a server-centric fashion. The result is that a backup selection that selects the server object will not back up the cluster volumes mounted on the node at the time of backup, because the cluster volumes are not associated with the server.

Will restore or utility operations failover?

No. Only backup operations fail over.

How do the secondary nodes in a cluster know where the primary is?

The Backup Group Members in the BE_ESI_GROUP object include the cluster volume resource. The Backup Group Members can be seen in Options / Media Server Options / Shared Devices / Backup Group Members.

How do I clear a reservation on a drive that was being used during a failure?

See the Media Server Failover section above.

I am getting a "GrantAccessToServer: NWDSModifyObject(1) <-602>" error when I migrate the clustered BE installation to another node in the cluster. How do I fix it?

This typically happens when the cluster nodes are not all in the same NDS context. In that case, the Backup Exec account object does not have enough rights to add itself as an operator to the Backup Exec Queue object, and a different queue object gets created for each node that the Media Server migrates to. Grant the Backup Exec object rights to modify each of the queue objects and the problem should be resolved.
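To close out the BESTOP discussion above: BESTOP -C is normally invoked from the unload script of the clustered BE resource, before the volume is dismounted. The following is a minimal sketch of what such an unload script might look like; the volume, pool, and address names are illustrative placeholders, not values from an actual installation.

```
BESTOP -C
del secondary ipaddress 10.1.1.25
dismount BEVOL /force
nss /pooldeactivate=BEPOOL
```

Ordering matters here: Backup Exec must be stopped before its volume is dismounted, or the dismount (and therefore the resource migration) can hang behind open files.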
Additional Resources

Cluster Services for NetWare 6 Documentation
http://www.novell.com/documentation/lg/ncs6p/index.html

Backup Exec for NetWare Servers Documentation
http://support.veritas.com/main_ddProduct_BENWARE.htm

AppNote on setting up a 2-node SCSI SAN cluster
http://developer.novell.com/research/sections/netsupport/networkts/2002/february/z020201.htm

Novell TID – How to configure the GWTSA in a NetWare Cluster environment
http://support.novell.com/cgi-bin/search/searchtid.cgi?/10084545.htm