Backup Exec 9.1 for NetWare Servers

Revision: 3.5
Last Update: 10/10/2005
Backup Exec™ 9.1/9.2 for NetWare Servers
Novell Cluster Server Environment
INTRODUCTION
AUDIENCE
TERMINOLOGY
INSTALLATION
CONFIGURING BACKUP EXEC TO BACK UP CLUSTERED VOLUMES
HOW BACKUP EXEC WORKS IN A CLUSTER
NON-SHARED STORAGE CONFIGURATIONS
SHARED STORAGE CONFIGURATIONS
GROUPWISE ON A CLUSTER
TROUBLESHOOTING AND QUESTIONS
ADDITIONAL RESOURCES
Introduction
A NetWare Cluster Services environment is a challenging environment for a backup application.
In a cluster, data that used to reside in a predictable location day in and day out suddenly
acquires a degree of mobility. A data volume may not only be in a different place when the
backup begins; it may also move while the backup is in progress. This document is meant to help
you manage Backup Exec 9.1/9.2 for NetWare Servers (BENW) in a NetWare Cluster Services
(NCS) environment. It does so by explaining how BENW operates and behaves in a clustered
environment.
Audience
This document is intended for Backup Exec for NetWare Servers administrators using Backup
Exec in a Novell Cluster environment. Familiarity with Backup Exec for NetWare Servers and the
Novell Cluster environment is assumed. There are references at the end of this document for
supplemental information on the above products.
Terminology
There are many new, confusing, and redundant terms in the cluster environment. These are the
terms that are used in this document and what they mean for the purposes of this discussion.
Checkpoint Restart
Checkpoint restart is the process of restarting the
backup of a cluster volume after a failover of that
volume. Checkpoint restart is only available if the
volume that fails over is not the volume on which the
Media Server doing the backup is installed.
Cluster BE Installation
An installation of the Backup Exec Media Server on a
clustered volume. This is an instance of BE that can
fail over.
Cluster Object
The cluster object is the object that is used to manage
the cluster. Selecting this object in ConsoleOne and
then selecting the properties of the elements within it
provides the management interface for editing the
Load/Unload scripts, the secondary IP addresses, etc.
Cluster Resources
These are resources that are managed by the cluster
software. The most common examples of cluster
resources are disk volumes (cluster volume resource)
and applications that are managed by the cluster so as
to be highly available. This document is only concerned
with cluster volume resources.
Cluster Resource Elements
Each cluster resource can have the following Cluster
Resource Elements: a Secondary IP Address, and Load
and Unload Scripts.
Failback
Failback is the process of moving a cluster resource
from the secondary node back to the preferred node
when the preferred node recovers from a failure.
Failover
Failover is the process of moving a cluster resource from
the preferred node to a secondary node after a crash of
the preferred node. Note the difference between a
failover and a migration.
Load and Unload Scripts
A Cluster Resource always has Load and Unload scripts
associated with it. You can view these scripts in
ConsoleOne by selecting the Cluster Object and opening
the properties of the cluster resources inside it.
The Load Script is run when the cluster resource is
started on a cluster node. The Unload Script is run
when the cluster resource is migrated to another node.
Note that BE does not implement its own Cluster
Resource type. Instead, it piggybacks on the cluster
volume resource. The Load/Unload scripts for clustered
BE are in the Load/Unload scripts for the cluster volume
resource where BE has been installed.
Local BE Installation
An installation of the Backup Exec Media Server or
Remote Agent on a non-clustered volume. This is an
instance of BE that cannot fail over. Note that if a cluster
volume with an installation of BE on it migrates to a
server running a local BE installation, the local BE
installation is shut down before the clustered BE
installation starts up.
Migration
Migration is the process of manually moving a cluster
resource from one node to another node in the cluster.
Note that cluster resources may behave differently in a
migration than they would in the event of a failover. This
is due to the fact that a migration is a controlled event in
which the unload script for the resource has a chance to
run.
Node
A node is a server in the cluster; that is, a CPU and
memory running NetWare. A cluster resource is, in the
end, hosted on a node in the cluster.
Preferred Node
This is the node that is primarily responsible for hosting
a cluster resource. If the preferred node for a cluster
resource is up and running, it is typically hosting the
cluster resource. For example, if server Node1 is the
preferred node for Cluster_Volume_A,
Cluster_Volume_A is typically hosted by Node1.
Secondary IP address
A Cluster Resource must have a secondary IP address.
The purpose of the secondary IP address is to serve as
an indirect pointer to the real resource. As an analogy, if
I moved around a lot, I would get a Post Office (PO) box.
That way I can give out the PO Box address to all the
people who need to mail something to me. When I
move, those people won’t care; they can keep sending
my mail to the PO Box, and it will still get to me.
In the same way, a Cluster Resource is identified by its
secondary IP address. All the users of that cluster
resource use that secondary IP address. When the
cluster resource moves, the clustering software, which
knows both the secondary IP address and the node that
is currently hosting the resource, routes the requests to
the right place.
Secondary Node
The secondary node is the node that can be the host of
a cluster resource when the preferred node is down. For
example, if server Node2 is the secondary node for
ClusterVolA, and the preferred node goes down,
ClusterVolA fails over to Node2.
Virtual Server
A Virtual Server is a NetWare Core Protocol (NCP)
server that represents a cluster resource. Typically the
name of this object is the cluster object name followed
by the volume name followed by the word “server”. For
example, if the cluster object is named CLUSTER_OBJ
and the volume is USER, the name of the Virtual Server
that represents the cluster volume is
CLUSTER_OBJ_USER_SERVER.
Installation
The installation of Backup Exec to the cluster proceeds very much like a normal installation of
Backup Exec. There are, however, some very important differences, which are detailed below.
Note that the installation software is not cluster aware.
Media Server Installation
A clustered installation of the Backup Exec media server must be installed to a cluster volume. It
does not matter if the traditional Server/Volume name or the Cluster Resource name of the
volume is used during the installation. What is important is that during the installation a clustered
volume is chosen. When asked if the volume is a cluster volume, answer yes.
The install copies all the necessary files to this node and cluster volume. A significant difference
is that the new startup/stop NCF files for a clustered BE installation are BESTARTC.ncf and
BESTOPC.ncf instead of the traditional BESTART.ncf and BESTOP.ncf scripts.
After the install is finished, you need to invoke BESTARTC on the node where you just finished the
install. Note that you will want all the nodes in the cluster to be up at this time. In addition, be
aware that BE has to take the cluster volume offline temporarily in order to update the
load/unload scripts for the cluster volume.
The reason that the other nodes in the cluster need to be up is that BE needs to copy the
BESTARTC.NCF and BESTOPC.NCF files to the SYS:SYSTEM directory on each of the nodes. If
a node is offline during this process, you will have to manually copy the BESTARTC.NCF and
BESTOPC.NCF files to that node, as sketched below. This manual copy of the NCF files also needs to be done
if a new node is later added to the cluster.
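A minimal sketch of the manual copy, done from a Windows workstation, is shown below. It assumes drive J: is mapped to the SYS: volume of a node that already has the files and drive K: is mapped to the SYS: volume of the node that was offline or newly added; the drive letters are placeholders.

REM Hypothetical mapped drives: J: = SYS: on a node that has the NCF files,
REM K: = SYS: on the node that missed the install or was added later.
copy J:\SYSTEM\BESTARTC.NCF K:\SYSTEM\
copy J:\SYSTEM\BESTOPC.NCF K:\SYSTEM\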
The Open File Option is installed to the SYS: volume on the server. Since it is not on the cluster
volume, it does not migrate or failover along with the rest of the Backup Exec software. For this
reason, the Open File Option needs to be installed on each node where it will be used.
Volume Resource Scripts
During the installation, the scripts that control loading and unloading BE are updated. Backup
Exec does not have its own cluster resource. Instead, it piggybacks on the cluster volume
resource of the volume on which it was installed. This is done by adding Backup Exec specific
commands to the load and unload scripts of the cluster volume resource.
The following lines are added to the end of the load script.
IGNORE_ERROR BESTOP -C
%while (loaded BESTOP) cmd delay 5
BESTARTC
The IGNORE_ERROR BESTOP -C command unloads the local BE installation if there is one. The
IGNORE_ERROR is necessary to ignore the resulting error in the event that there is no local BE
installation.
The %while (loaded BESTOP) cmd delay 5 command waits until the BESTOP.NLM is
finished.
The BESTARTC command launches the cluster BE installation.
The following lines are added to the start of the unload script.
BESTOPC -C
%while (loaded BESTOP) cmd delay 5
IGNORE_ERROR BESTART
The BESTOPC -C command unloads the clustered BE installation.
The %while (loaded BESTOP) cmd delay 5 command waits until the BESTOP.NLM is
finished.
The IGNORE_ERROR BESTART command launches the local BE installation if there is one. The
IGNORE_ERROR is necessary to ignore the resulting error in the event that there is no local BE
installation.
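For context, the following is a hedged sketch of what the complete scripts for a cluster volume resource hosting Backup Exec might look like after the install. The pool, volume, virtual server, and IP address names are placeholders, and the non-BE lines are typical NetWare 6 cluster volume resource commands that will vary with your configuration.

Load script:
nss /poolactivate=BEPOOL
mount BEVOL VOLID=254
CLUSTER CVSBIND ADD CLUSTER_OBJ_BEVOL_SERVER 10.1.1.50
NUDP ADD CLUSTER_OBJ_BEVOL_SERVER 10.1.1.50
add secondary ipaddress 10.1.1.50
# Lines added to the end of the load script by the Backup Exec install:
IGNORE_ERROR BESTOP -C
%while (loaded BESTOP) cmd delay 5
BESTARTC

Unload script:
# Lines added to the start of the unload script by the Backup Exec install:
BESTOPC -C
%while (loaded BESTOP) cmd delay 5
IGNORE_ERROR BESTART
# Typical cluster volume resource unload commands:
del secondary ipaddress 10.1.1.50
CLUSTER CVSBIND DEL CLUSTER_OBJ_BEVOL_SERVER 10.1.1.50
NUDP DEL CLUSTER_OBJ_BEVOL_SERVER 10.1.1.50
nss /pooldeactivate=BEPOOL /overridetype=question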
Remote Server Installation
When you are installing the Remote Agent for NetWare Servers (RANW) to a server in the
cluster, it is important to understand what the role of RANW is in the cluster. RANW is used to
back up any local volumes (non-clustered). It is not used to back up clustered volumes.
Therefore, it makes no sense to install RANW on a cluster volume; just leave it on a local volume.
The only interaction RANW has with a cluster installation is that it typically runs on a node only
while that node is not hosting the Cluster BE installation.
Configuring Backup Exec to Back Up Clustered Volumes
One of the most impressive demonstrations of Backup Exec’s integration with Novell Cluster
Services is the support of remote clustered volumes. This is the case where a media server is
backing up a cluster volume. The media server can not only access that cluster volume no matter
where it is hosted, but can even follow the volume when a failover occurs in the middle of a
backup operation.
To achieve this functionality, there are some differences in the way that cluster volumes are
configured within BENW.
1. The name and IP address of the Virtual Server for each cluster volume need to be in
the NDMPSVRS.DAT file on the media server. Entries in this file can be made using the
Options/Serial Numbers/Agents menu item in the Backup Exec for NetWare Servers
Administration Console.
2. On each of the nodes that could host the volume, make sure that TSA600 or TSAFS is
cluster enabled. This can be changed in the SMSSTART.NCF file. Also make sure that
the AUTOEXEC.NCF file on these nodes loads the cluster software (LDNCS) before the
SMS software (SMSSTART). A sketch of this ordering is shown below.
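As a hedged illustration of step 2, the relevant lines might look like the following; the exact contents of AUTOEXEC.NCF and SMSSTART.NCF vary with the NetWare version and the server's configuration.

# AUTOEXEC.NCF (excerpt) - load the cluster software before SMS:
LDNCS
SMSSTART

# SMSSTART.NCF (excerpt) - the TSA is loaded without a switch that disables cluster support:
LOAD SMDR
LOAD TSAFS
# (On servers using TSA600, the equivalent line is LOAD TSA600; do not add /nocluster or /cluster=off.)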
The Backup Exec Administration console should then display an entry for the clustered volume.
For example:
Agent CLUSTER_OBJ_USER_SERVER.NetWare Cluster File System
Notice that no Backup Exec software is installed on the remote nodes. In the case of a
clustered volume, Backup Exec leverages the failover functionality present in the NetWare 6 and
NetWare 6.5 TSAs. The data path when backing up a cluster volume configured as above is:
Cluster Volume on remote node → TSA on remote node → SMDR on
remote node → SMDR on Media Server → NDMPD on Media Server → Media Server → backup
media.
How Backup Exec Works in a Cluster
Differences between NetWare 5.1 and NetWare 6, 6.5
Clustered Volumes vs. Clustered Pools
In NetWare 5.1, NSS Volumes are associated with the Cluster Resources. That is, the mapping
that provides location independence is done between the cluster object and the NSS Volume
objects.
In NetWare 6 and 6.5, the NSS Pools (or Storage Groups) provide this functionality. The NSS
volumes inside the pools move around the cluster as a group inside the pools. To simplify the
text, the term cluster volume resource is used to indicate a cluster volume generically.
Checkpoint Restart
Backup Exec does not support checkpoint restart in NetWare 5.1 SP4. The Novell SMS
components in this release do not properly support checkpoint restart. It is possible that Novell
will release updated SMS components for NetWare 5.1 that will make this possible in the future.
Checkpoint restart is available for NetWare 6 and 6.5.
Also note that checkpoint restart requires a failure during the backup of a file object. If a directory
object is being backed up at the point of the failure, the checkpoint restart will fail. During full
backups the backup is unlikely to fail at this point. However, if the failure occurs during an
incremental or differential backup you may see the following message:
Error: Unable to restart backup for device CLUSTER-VOLUME.NetWare Cluster File
System: after cluster failover/failback because Backup Exec could not determine the last
successfully backed up file.
The backup of the device fails at this point. The backup operation proceeds to back up any other
required sets and ends the job with a failure. The directory that was being backed up is marked
in the catalog as a corrupt file.
TSA Cluster Option
By default both TSA600 and TSAFS are cluster enabled. That is, they see clustered volumes via
their cluster resource name. When the TSA is cluster enabled, it advertises the volumes on the
node as cluster resources. When the TSA is not cluster enabled, it advertises the volumes on the
node with the traditional Server/Volume name. TSA600 accepts the /cluster=off parameter to
disable cluster support. TSAFS uses the /nocluster switch to do the same.
The TSA also has a configuration file that may control the cluster support setting. If the
expected behavior is not seen with the TSA, check whether SYS:\ETC\SMS\TSA.CFG exists.
If it does, look at this text file to verify that the cluster setting is correct.
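As a quick reference, the load commands with and without cluster support, using the switches described above, are:

# Cluster support enabled (the default) - volumes are advertised by their cluster resource names:
LOAD TSAFS
LOAD TSA600

# Cluster support disabled - volumes are advertised by the traditional Server/Volume names:
LOAD TSAFS /nocluster
LOAD TSA600 /cluster=off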
Differences between BENW 9.1 and BENW 9.2
Open File Option
In BENW 9.1 the Open File Option is only usable on cluster volumes that are hosted on the
media server. In BENW 9.2, the Open File Option can also be used on remote cluster volumes
(volumes that are not hosted on the same node as the media server). In order to accomplish this,
both the Remote Agent for NetWare Servers and the Open File Option software must be
installed and running on the remote node. The media server connects to the Remote Agent for
NetWare Servers to initiate the Open File Option snapshot at the beginning of the backup and
again at the end of the backup to close the snapshot.
Speed File Browsing
BENW 9.2 introduced the Speed File Browsing option in the Options/Media Server/Network
screen. This option should not be enabled in a Novell Cluster Services environment. Doing so
prevents Backup Exec from locating and backing up cluster volumes.
How does failover work?
There are two circumstances involving failover: Media Server failover and Remote Cluster
Volume failover. Note that there is a difference between a migration and a failover. When a
manual migration is done on a volume hosting the media server, the jobs are cancelled as if the user
had cancelled them, and Backup Exec shuts down normally. In this case, the job is not
restarted on the failover node as it would be in the event of an actual failover.
Media Server Failover
Media Server Failover happens when the node that is hosting the BE Media Server fails. Nothing
happens on the failed server; it has crashed. If a backup job was active during the failover, the
tape that was being written to becomes unappendable. This is the same as if you pulled the
power on the server during a backup on a non-clustered node. The tape in the drive has an
incomplete set and no end of tape marker. The next time that BE tries to use the
media, it notices the missing tape markers and flags the media as unappendable.
Once the cluster software recognizes the failure, the load script for the cluster volume runs on the
failover node. Any currently running instance of BE on the failover node is shut down and the
cluster BE installation on the cluster volume is started up. On startup, BE determines if it was in
the middle of a backup operation. If there was an active backup, a “Run Now” job is created to
restart the backup at the beginning of the failed set. That job then runs like any normal
job. Note that if any local volumes on the failed node remain to be backed up, the
failover node will not have access to those volumes and their backups will fail.
When looking at the job logs for a job that was active during a failover event, there are two job
logs. The first is a failed job that indicates that it failed due to a cluster failure. The second is the
job log from the “Run Now” job run on the failover node.
If the media server that failed over is in a Clustered SAN, then there is the additional work of
clearing the reservations for the media devices. If the library supports it, a LUN Reset is sent to
the devices from the failover server. If the library does not support the LUN reset command, the
tape drive will have a stuck reservation, and the drive is unusable until the reservation is
manually cleared. The available methods for manually clearing the
reservation depend on the hardware that is being used. The following are some ways this can be
done:
Power cycle the failed node that had the reservation.
Power cycle the Fibre/SCSI bridge.
Clear the reservation from BE on the failed node that had the reservation. This is done
from the Administration Console. Select the Drives/Robots menu item. Select the
affected Robot and then the drive in that robot. Then you can clear the reservation with
the Ctrl-B key.
Caution is advised before using any of the above methods. Power cycling the Fibre/SCSI bridge
while other servers are using it for backup operations would terminate those operations and lead
to failed backups along with even more unappendable media. Clearing the reservation for the
wrong drive would be problematic as well.
Only backup operations recover from a failover. Restore and utility operations are not
restarted after a failover.
Remote Cluster Volume failover
Remote Cluster Volume failover happens when the node that is hosting a volume that is backed
up by BE fails over. In the event that this happens when no backup is occurring, the volume fails
over to another node, and is found there when the backup begins. If the failover happens during
a backup operation, the backup is suspended while the volume is mounted on the failover node.
Once the volume is mounted on the failover node, the file that was interrupted is marked as a
corrupt file, and the backup restarts at the beginning of that file. This process is referred to as a
Checkpoint Restart.
Also note that there are no BE modifications made to the Load/Unload scripts for these volumes.
The only time that modifications are made to the Load/Unload scripts is when a Media Server is
installed on a cluster volume.
If a verify operation is run on the set that was being written, the verify operation fails because of
the corrupt file. The resulting corrupt file on the tape can also cause a restore of the set to end in
failure, because the restore runs into the corrupt file on tape and cannot
restore it. This failure can either be ignored, or it can be prevented by using the “Exclude corrupt
files” setting in the NetWare window of the restore job submission screen.
One wrinkle in the remote failover scenario involves the use of Pre and Post commands. The Pre
command would have been run on the failed node. Backup Exec does not go back and re-run
that Pre command on the failover node. For example, if a failover occurs on a job that has both
Pre and Post commands, the Pre command is run on the starting node, and the Post command
ends up being run on the failover node.
Non-Shared Storage Configurations
A Non-Shared Storage Configuration is one in which the backup devices (tape drives and
libraries) are not shared between media servers. In this case, tape hardware is directly connected
to each node. When a media server fails over to another node, the new node resumes running
the job on its own hardware.
In this configuration, each node shares the Jobs, Policies, DrivePools, MediaSets, and Partition
information. However, the Drives, Robots, Slots, and Media entries are specific to each node in
the cluster. This node specific data is stored in the BKUPEXEC\MM\NodeName directories where
NodeName is the server name of the node where this tape hardware is located.
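For example, in a two-node cluster with nodes named NODE1 and NODE2 (hypothetical names), the node specific data would live in directories such as:

BKUPEXEC\MM\NODE1   (Drives, Robots, Slots, and Media entries for the tape hardware attached to NODE1)
BKUPEXEC\MM\NODE2   (Drives, Robots, Slots, and Media entries for the tape hardware attached to NODE2)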
Partition Management mode is not available in clustered SSO environments. The Backup Exec
installation must use Media Management mode.
Backup-to-Disk
Backup-to-Disk Folders are not currently supported in SSO SAN environments.
Backup-to-Disk Folders can be used in a Non-Shared Storage configuration, but this requires some
extra configuration steps.
1. Create a Backup-to-Disk folder and put the data for that folder on the cluster volume.
The easiest place to put it is on the cluster volume that BE is installed on. That way,
you are assured that the Backup-to-Disk folder is available to the BE installation no
matter what node it is running on.
2. Create any other Backup-to-Disk folders in the same manner.
3. Migrate BE to the other nodes.
4. On each node, create a Backup-to-Disk folder of the same name, and point it at the
same location.
This is necessary because the Backup-to-Disk configuration files are maintained locally on each
node. By creating Backup-to-Disk folders with the same name, jobs targeted at a Backup-to-Disk
folder will find it no matter what node it is on.
Best Practices
Because the drive pools are shared among the nodes, it is best to target jobs at a drive pool
rather than to individual drives. Then, place drives from each node into the drive pool. This way,
no matter which node is running the job, a drive is available for the job. Note that if the drives are
identical, with identical names, host adapter numbers, and SCSI IDs, you could target a job at a
particular drive and have it successfully fail over to the other node. However, creating a drive pool
greatly simplifies the process.
This is the procedure for setting up the drive pools:
1. Create the drive pool on the current node and assign local drives to it.
2. Migrate Backup Exec to the other node.
3. Add the local drives from the failover node to the drive pool.
As an alternative, you can simply use the All Drives pool which will automatically use whatever
local drive is available. No extra configuration is necessary to update the All Drives pool.
Shared Storage Configurations
A Shared Storage configuration is one in which the backup devices (tape libraries) are shared
between two or more media servers.
Choices
In a clustered SAN environment, we have the following competing and even contradictory
desires:
Local Backup of Data
It is always faster to back up data locally. That is, Data Path A is faster than Data Path B.
A. Disk → SAN → Media Server → SAN → Tape Device
B. Disk → SAN → Remote Server → LAN → Media Server → SAN → Tape Device
Data Location Independence
A clustered data volume can move around. It would be nice to set up the backup jobs so
that the data volumes get backed up, no matter where they happen to be when the
backup occurs.
Uninterrupted Backup Operations
If a volume fails over during a backup it may be critical to continue the backup of that
volume immediately. This is especially true of very large volumes.
As you can see, the goals themselves are contradictory. It is desirable to have all local backups.
At the same time, it is not desirable to have backups tied to the local media server. Data location
independence is good, but it sometimes means a performance degradation when the data has to
be transported across the LAN.
SSO Rules
The following rules must be observed with BENW in a clustered SAN environment.
Only one instance of BENW can be running on a node at a time.
A local instance of BE must be shut down before a clustered BE installation can be
started on a node.
A cluster cannot contain both Group Servers and Primary Group Servers.
This configuration just contains too many possibilities for failure. If a Group Server were
to fail over on top of the Primary Group Server, the entire BE installation would grind to a
halt.
A clustered Group Server cannot fail over on top of another clustered Group Server.
It doesn’t make sense to have one Group Server fail over on top of another Group
Server. Each Group Server is responsible for backing up a certain portion of the
backup domain. In the event of a failover, only one of those Group Servers would be
active, so the gain of being able to continue the failed job would be offset by losing the ability
to run the jobs of the Group Server on the failover node. Of course, the backup
domains could completely overlap, with each Group Server backing up everything, but in
that case failover of BE is unnecessary because there is already a redundant backup of
the entire backup domain.
Example Clustered SAN configurations
1-1 Redundancy
In an ideal situation, there would be enough redundant hardware to ensure the constant
availability of all the components necessary to do a backup, and all backups would be local. To
do so requires a redundant server for each media server. The redundant server would typically
be idling. In the event that a media server fails, the clustered BE installation would fail over to the
idling server and the backup would continue.
Benefits
Complete redundancy of backup operations.
Local backup of data
Drawbacks
Hardware and software costs for redundant systems.
High degree of complexity.
Many-1 Redundancy
In this situation, there would be 1 idling server in the cluster. In the event that a media server
fails, the clustered BE installation would fail over to the idling server and the backup would
continue. In the event of multiple server failures, only the first server would be able to failover.
This configuration is not currently supported by BE. There is no mechanism to keep multiple
servers from failing over to the spare server.
Distributed Local BE Installations
In this situation, all of the media servers are installed on non-cluster volumes. If a server goes
down, any backup jobs processed by that system do not run until the node is brought back online.
Note that if it is the Primary Group Server node that fails, the Group Servers will not be able to
proceed until the Primary Group Server node is brought back online. In this scenario, it is
assumed that the backups are made by selecting the Server object so that backups hit all
volumes currently mounted on that server. Note that there is no point in using the cluster volume
resources in this scenario, as each server is backing itself up. If the server fails, the volumes
fail over, but there is no failover of the backup job.
Benefits
Local backup of data
Low complexity
Volumes get backed up regardless of location
Drawbacks
If a node fails during the backup, the volumes on that node won’t get backed up
until the next backup job runs on the node they failed over to.
Remote Backup of Cluster Volumes
In this situation, all of the volumes are backed up remotely. The media server is installed in one
location and the Remote Agent for NetWare is installed on each of the servers in the cluster. A
backup job is submitted that backs up all the local volumes in the cluster via the Server/Volume
view. The cluster volumes are backed up using the cluster server view. The media server can
be either in or out of the cluster. The backups will run regardless of where the volumes are
mounted. If a node fails during a backup of a cluster volume, the backup continues as soon as
the volume is mounted on the failover node.
Benefits
No redundant hardware costs
Backups occur regardless of node failures
Low complexity
Drawbacks
Remote backup of data over the LAN. This could be ameliorated by using a
dedicated backup network.
Best Practices in a Clustered SAN
The best practices largely depend on the SAN configuration. It is important to check the SSO
Rules section above and make sure that they are not being violated. Additionally:
Install BE on its own cluster resource.
This is not necessary, but simplifies the management of Backup Exec. By having BE installed in
its own cluster pool, it can be migrated around the cluster without affecting other resources on the
cluster. Just keep in mind that the volume needs to be large enough to accommodate the data
generated during backup operations. The largest consumers of disk space in a BE installation are
the catalogs and Backup-to-Disk folders.
Make sure that time is synchronized between the nodes
Having time synchronized is something that is likely already handled. There are plenty of other
reasons to be sure that all the servers in the cluster are time synchronized. BE needs the time
synchronized because otherwise the job schedule will be interpreted differently by
different nodes.
GroupWise on a Cluster
Limitations
Unlike the file system TSA, the GroupWise TSA (GWTSA) does not provide a node independent
name for the objects it publishes. Thus, when a selection list is created to back up a post office,
domain, or library, the name includes the name of the node. If the volume containing that object
fails over to another node, the selection list won’t work any more. For example, when selecting
the post office when it is mounted on server NODE1, the file selection looks something like this:
NODE1.GroupWise System/[Dom]DOM:
When the volume fails over to NODE2, the name for the same post office becomes:
NODE2.GroupWise System/[Dom]DOM:
A backup job that is backing up using the first name won’t find the post office once it has moved.
There are a couple of ways to address this problem as described below.
Back up using the high level TSA selection
In this method, the jobs are set up so that they back up all resources that are currently hosted by
a node. Instead of selecting the individual objects (post offices, domains, and libraries), make the
selection one level higher. That is, select the entire GWTSA for backup. The effect on the
backup is that instead of searching for individual objects to back up, it simply backs up everything
that happens to be available on that node at the time. The advantage is that you don’t care
where the volume is located when the backup runs. As long as you back up the GWTSA on every
node that the volume can fail over to, it will get backed up. A disadvantage is that if for some reason a
volume doesn’t get mounted anywhere, it won’t get backed up, and no error will be reported in the
backups.
Redundant Backup Selections
In this method, the backup job backs up all the individual resources on a node that could possibly
be mounted there. This results in the same object being described in multiple backup jobs. No
matter where the object is, it gets backed up. To do this, migrate all the volumes to a
node and then create the selection list for the job; repeat this operation for each node. The big
disadvantage with this technique is that all the jobs always fail because they can never find all the
objects they need to back up. This can be made a little easier to manage by splitting up jobs into
jobs that back up objects in their “normal” location, and jobs that back up objects in their “failover”
locations. As long as everything is “normal”, you would expect the normal jobs to complete
successfully, and the “failover” jobs to end in error.
Troubleshooting and Questions
Why don’t I see the cluster volumes in the Backup Sources window?
There are three things to check. First, make sure that the TSA (TSA600 or TSAFS) is cluster
enabled. See the TSA Cluster Option section for details. Second, make sure that you have started the
cluster software before loading the SMS components. This can be checked by unloading
SMDR.NLM. If SMDR.NLM won’t unload, it tells you all the NLMs that you need to unload first.
Unload them, and then unload SMDR.NLM. Then load the TSA without disabling the cluster
functionality. Reload Backup Exec and the cluster volumes should show up properly. A sketch of
this sequence is shown below.
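A hedged sketch of this sequence at the server console follows; the dependent NLM names reported by SMDR will vary (TSAFS is shown as an example):

UNLOAD SMDR
# If SMDR reports NLMs that depend on it, unload those first, then unload SMDR:
UNLOAD TSAFS
UNLOAD SMDR
# Make sure the cluster software is loaded, then reload SMS and the TSA
# without disabling cluster support:
LDNCS
SMSSTART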
Lastly, make sure that NDMPSVRS.DAT has been properly updated with the cluster resource
names and IP addresses. It is important to use the correct name for the cluster resource. At the
media server, run the “DISPLAY SLP SERVICES SMDR.NOVELL” command. This should list all
the available resources, and should include the cluster resource names and IP addresses. This
is a handy format for verifying that the NDMPSVRS.DAT file has the correct names and IP
addresses.
If the volumes are not listed in DISPLAY SLP SERVICES SMDR.NOVELL at the media server,
but they do show up using the same command on the node where they are mounted, then there
is a problem with SLP advertising. Try using the SET SLP RESET = ON command at the media
server and on the node where the resource is mounted to see if re-broadcasting the SLP
information resolves the problem.
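The commands involved, run at the server console, are:

# On the media server and on the node hosting the volume - compare the two lists:
DISPLAY SLP SERVICES SMDR.NOVELL

# If the cluster resources show up on the hosting node but not on the media server,
# force the SLP information to be re-broadcast on both servers:
SET SLP RESET = ON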
Note that the names and IP addresses can also be found in the Load/Unload scripts for the
cluster resources. The Load/Unload scripts can be viewed in ConsoleOne by selecting the
cluster object in the tree view and then viewing the properties of the cluster resource.
Does the remote agent get used for remote cluster backups?
No, NDMPD.NLM is not used when backing up remote cluster volumes.
Can I do an Open File Option backup of a remote cluster volume?
BENW 9.1
No. In order to do an Open File Option backup on a cluster volume, the cluster volume must be
local to the media server. The other option is to back up the cluster volume via the
Server/Volume name of the volume.
BENW 9.2
Yes. In order to do an Open File Option backup of a remote cluster volume, make sure that both the
Remote Agent for NetWare Servers and the Open File Option software have been installed and are
running on the node that is hosting the cluster volume (see the Open File Option discussion under
Differences between BENW 9.1 and BENW 9.2 above).
What happens when BESTOP can’t stop BE?
There is a chance that BESTOP will fail to unload Backup Exec. This is most likely to occur when
Backup Exec is in the middle of a job when BESTOP is invoked. When it is invoked with the -C
switch, BESTOP attempts to forcibly terminate any jobs and clean up resources before Backup
Exec unloads. If BESTOP is unable to unload Backup Exec for NetWare Servers, it will fail and
sit at a prompt, waiting indefinitely for user input. The cluster resource Load/Unload script that
invoked BESTOP will also wait indefinitely for BESTOP to be unloaded. When a user responds
to the BESTOP failure, the cluster resource script will continue.
Note that this is an abnormal occurrence and should not happen during normal operations.
How do I make a whole server selection when backing up a cluster node?
When the TSA is cluster enabled (see the TSA Cluster Option section), the user is prevented from making
a selection to back up all volumes on the server.
In a non-clustered environment, the TSA presents a server centric view of the volumes on the
server. In this environment, the user can select the server object for a backup operation. This
results in BE backing up all the volumes that are mounted on the server when the backup occurs.
When the TSA is cluster enabled in a clustered environment, the cluster volumes are no longer
presented in a server centric fashion. The result is that a backup selection that selects the server
object will not back up the cluster volumes mounted on the node at the time of backup. This
happens because the cluster volumes are not associated with the server.
Will restore or utility operations failover?
No. Only backup operations fail over.
How do the secondary nodes in a cluster know where the primary is?
The Backup Group Members list in the BE_ESI_GROUP object includes the cluster volume resource.
The Backup Group Members can be seen under Options/Media Server Options/Shared Devices/
Backup Group Members.
How do I clear a reservation on a drive that was being used during a
failure?
See the Media Server Failover section above.
I am getting a GrantAccessToServer: NWDSModifyObject(1) <-602> error
when I migrate the clustered BE installation to another node in the cluster.
How do I fix it?
This typically happens when the cluster nodes are not all in the same NDS context. In that case,
the Backup Exec account object doesn’t have enough rights to add
itself to the Backup Exec Queue object as an operator, and a different queue
object gets created for each node that the Media Server migrates to. Grant the Backup Exec
account object rights to modify each of the queue objects, and this problem should be resolved.
Additional Resources
Cluster Services for NetWare 6 Documentation
http://www.novell.com/documentation/lg/ncs6p/index.html
Backup Exec for NetWare Servers Documentation
http://support.veritas.com/main_ddProduct_BENWARE.htm
AppNote on setting up a 2 node SCSI SAN cluster
http://developer.novell.com/research/sections/netsupport/networkts/2002/february/z020201.htm
Novell TID – How to configure the GWTSA in a NetWare Cluster environment?
http://support.novell.com/cgi-bin/search/searchtid.cgi?/10084545.htm