Other Options to Consider

Chapter 12
unknown to the system because DNS is not configured on this system. The mrranger
node is powered down so it is known but not reachable. Notice the difference in the
outputs for these two similar, but very different, situations. Please study the code
related to both of these tests in the ping_nodes function.
Other Options to Consider
As always, we can improve on any shell script, and this one is no exception. I have
listed some options that you may want to consider.
$PINGLIST Variable Length Limit Problem
In this scripting solution we gave the user the capability to comment out specific nodes
in the $PINGFILE. We assigned the list of nodes, which is a list without the comments,
to a variable. This is fine for a relatively short list of nodes, but a problem arises when
the maximum variable length, which is usually 2048 characters, is exceeded. If you
have a long list of nodes that you want to ping and you notice that the script never gets
to the end of the ping list, you have hit this limit. The same is true if you see a
funny-looking node name, which is probably a hostname that has been cut off by the
variable limit, associated with a system error message. To resolve this issue, define
the PINGLIST variable to point to a new file, and then use that file to store the ping
list data instead of a variable. To use PINGLIST as a file, add or change the
following lines:
ADD THIS LINE:
PINGLIST=/tmp/pinglist.out
CHANGE THIS LINE:
PINGLIST=$(cat $PINGFILE | grep -v '^#')
TO THIS LINE:
cat $PINGFILE | grep -v '^#' > $PINGLIST
CHANGE THIS LINE:
for HOSTPINGING in $(echo $PINGLIST)
TO THIS LINE:
for HOSTPINGING in $(cat $PINGLIST)
Automated Hosts Pinging with Notification
Using a file to store the ping list data raises the limit to the maximum file size
that the system supports, or to the point where the filesystem fills up, which should be
plenty of space for anyone. This modified shell script is located on this book’s
companion Web site. The script name is pingnodes_using_a_file.ksh.
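As a variation on the file-based approach, the loop can read the ping list file directly instead of expanding it with $(cat $PINGLIST). The following is only a minimal sketch, not the companion script: the function names build_pinglist and each_host are hypothetical, and the echo stands in for whatever ping function the script actually calls.

```shell
#!/bin/sh
# Sketch only: build_pinglist and each_host are illustrative names,
# not functions from the original script.

# Copy every line that is not a comment from the list file ($1)
# into the ping list file ($2).
build_pinglist() {
    grep -v '^#' "$1" > "$2"
}

# Read the ping list file ($1) one line at a time. Only the first
# word of each line is treated as the node name.
each_host() {
    while read -r HOSTPINGING REST
    do
        # Skip blank lines
        [ -z "$HOSTPINGING" ] && continue
        echo "$HOSTPINGING"     # a real script would ping the node here
    done < "$1"
}
```

Because the loop reads one line at a time, the node list is never held in a single variable or command line, so its length is limited only by the file.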
Ping the /etc/hosts File Instead of a List File
This may be overkill for any large shop, but it is easy to modify the shell script to
accomplish this task. Starting with the shell script shown in Listing 12.1, first
complete the tasks in the previous section, “$PINGLIST Variable Length Limit Problem,”
and then make the following change.
CHANGE THESE LINES:
if [ -s $PINGFILE ]
then
     PINGLIST=$(cat $PINGFILE | grep -v '^#')
TO THESE LINES:
if [ -s /etc/hosts ]
then
     # Ping all nodes in the /etc/hosts file
     cat /etc/hosts | sed /^#/d | sed /^$/d | grep -v 127.0.0.1 \
          | awk '{print $2}' > $PINGLIST
In this changed code we cat the /etc/hosts file and pipe the output to a sed
statement, sed /^#/d. This sed statement removes every line in the /etc/hosts file
that begins with a pound sign (#). The output of this sed statement is then piped to
another sed statement, sed /^$/d, which removes all of the blank lines in the
/etc/hosts file (the blank lines are specified by the ^$). This sed output is sent to a
grep command that removes the loopback address from the list. Finally, the remaining
output is piped to an awk statement that extracts the hostname out of the second field.
The resulting output is redirected to the $PINGLIST file. This modified shell script to
ping the /etc/hosts file is included on the Web site that accompanies the book. The
filename is pinghostsfile.ksh.
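The same filtering can also be done in a single awk pass. This is a hedged sketch rather than the code in pinghostsfile.ksh: the function name list_hosts_from is hypothetical, and unlike the grep in the pipeline above it compares only the first field to 127.0.0.1 instead of matching that pattern anywhere on the line.

```shell
#!/bin/sh
# Sketch only: list_hosts_from is an illustrative name.

# Print the second field (the hostname) of every line in the hosts
# file ($1) that is not a comment, has at least two fields (so blank
# lines are skipped), and is not the IPv4 loopback address.
list_hosts_from() {
    awk '$1 !~ /^#/ && NF >= 2 && $1 != "127.0.0.1" { print $2 }' "$1"
}
```

A call such as list_hosts_from /etc/hosts > $PINGLIST would then fill the ping list file. An IPv6 loopback entry (::1), if present, is not filtered by either version.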
Logging
I have not added any logging capability to this shell script. Adding a log file, in addition to user notification, can help you find trends of when nodes are unreachable.
Adding a log file is not too difficult to do. The first step is to define a unique log filename in the definitions section and assign the filename to a variable, maybe LOGFILE.
Then, in the script, test for the existence of the log file and create it if it is missing. A test similar to the following statement will work.
ADD THESE LINES:
LOGPATH=/usr/local/log
LOGFILE=${LOGPATH}/pingnodes.log
# Create the log file, and its directory, if they do not exist yet
if [ ! -s $LOGFILE ]
then
     if [ ! -d $LOGPATH ]
     then
          echo "\nCreating directory ==> $LOGPATH\c"
          mkdir $LOGPATH
          if (( $? != 0 ))
          then
               echo "\nUnable to create the $LOGPATH directory...EXITING\n"
               exit 1
          fi
          chown $USER $LOGPATH
          chmod 755 $LOGPATH
          echo
     fi
     echo "\nCreating Logfile ==> $LOGFILE\c"
     cat /dev/null > $LOGFILE     # Create an empty log file
     chown $USER $LOGFILE
     echo
fi
After adding these lines of code, use the tee -a $LOGFILE command in a pipe to
both display the text on the screen and log the data in the $LOGFILE.
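As an illustration of the tee -a technique, a small helper can timestamp each message before it is displayed and logged. This is only a sketch under assumptions: the function name log_msg, the timestamp format, and the /tmp default for LOGFILE are not part of the original script.

```shell
#!/bin/sh
# Sketch only: log_msg and the default LOGFILE value are assumptions.
LOGFILE=${LOGFILE:-/tmp/pingnodes.log}

# Write a timestamped message to the screen and append the same
# line to $LOGFILE.
log_msg() {
    echo "$(date '+%Y-%m-%d %H:%M:%S') $*" | tee -a "$LOGFILE"
}
```

For example, log_msg "yogi is unreachable" prints the line and appends it to the log, so the log file builds up a history that can be searched for trends.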
Notification of “Unknown Host”
You may want to add notification, and maybe logging too, for nodes that are not
known to the system. This usually occurs when the machine cannot resolve the node
name into an IP address. This can be caused by the node not being listed in the
/etc/hosts file or failure of the DNS lookup. Check both conditions when you get
the Unknown host message. Currently, this shell script only echoes this information
to the screen. You may want to add this message to the notification.
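One way to add the message to the notification is to scan the captured ping output for the Unknown host text and append a line to the file that is later mailed. The sketch below is an assumption-laden illustration: the function name note_unknown_host and the MAILFILE default are hypothetical, and the exact wording of the resolver error varies between operating systems, so the pattern may need adjusting.

```shell
#!/bin/sh
# Sketch only: note_unknown_host and the MAILFILE default are
# assumptions, not names confirmed by the original script.
MAILFILE=${MAILFILE:-/tmp/mailfile.out}

# $1 = node name, $2 = text captured from the ping command.
# Returns 0 and records the node if the output reports an unknown host.
note_unknown_host() {
    if echo "$2" | grep -i 'unknown host' >/dev/null
    then
        echo "$1 is unknown to this system (check /etc/hosts and DNS)" >> "$MAILFILE"
        return 0
    fi
    return 1
}
```

The same line could be piped through tee -a $LOGFILE if logging has been added to the script.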
Notification Method
In this shell script we use email notification. I like email notification, but if you have a
network failure this is not going to help you. To get around the network down problem
with email, you may want to set up a modem, for dial-out only, to dial your alphanumeric pager number and leave you a message. At least you will always get the
message. I have had times, though, when I received the message two hours later due to
a message overflow to the modem.
You may just want to change the notification to another method, such as SNMP
traps. If you execute this shell script from an enterprise management tool, then the
response required back to the program is usually an SNMP trap. Refer to the documentation of the program you are using for details.
Automated Execution Using a Cron Table Entry
I know you do not want to execute this shell script from the command line every 15
minutes yourself! I use a root cron table entry to execute this shell script every 15 minutes, 24 hours a day, Monday through Saturday, and 8:00 A.M. to midnight on Sunday;
of course, this requires two cron table entries. Because weekly backups and reboots
happen early Sunday morning, I do not want to be awakened every Sunday morning
when a machine reboots, so I have a special cron entry for Sunday. Both root cron table
entries shown execute this script every 15 minutes.
5,20,35,50 * * * 1-6 /usr/local/bin/pingnodes.ksh >/dev/null 2>&1
5,20,35,50 8-23 * * 0 /usr/local/bin/pingnodes.ksh >/dev/null 2>&1
The first entry executes the pingnodes.ksh shell script at 5, 20, 35, and 50 minutes
of every hour from Monday through Saturday. The second entry executes the
pingnodes.ksh shell script at 5, 20, 35, and 50 minutes past each hour from 8:00 A.M.
until 11:59 P.M. on Sunday, with the last ping test running at 11:50 P.M. Sunday night.
Summary
In this chapter we took a different approach than that of some other shell scripts in this
book. Instead of creating a different function for each operating system, we created a
single shell script and then used a separate function to execute the correct command
syntax for the specific operating system. The uname command is a very useful tool for
shell scripting solutions for various Unix flavors in a single shell script.
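The uname technique can be sketched as a case statement that selects the command syntax by operating system name. This is an illustration, not the chapter's function: the name ping_command_for is hypothetical, only the -c count syntax for AIX and Linux is filled in, and other flavors are deliberately left as a placeholder because their argument order differs.

```shell
#!/bin/sh
# Sketch only: ping_command_for is an illustrative name.

# $1 = operating system name as reported by uname
# $2 = node name, $3 = number of packets to send
ping_command_for() {
    case $1 in
    AIX|Linux)
        echo "ping -c$3 $2"
        ;;
    *)
        # Other flavors order the arguments differently; check the
        # local ping man page before adding a branch here.
        echo "unsupported"
        return 1
        ;;
    esac
}
```

A caller could use it as CMD=$(ping_command_for "$(uname)" yogi 3) && $CMD, so the rest of the script never needs to know which flavor it is running on.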
I hope you enjoyed this chapter. I think we covered some unique ways to solve the
scripting problems that arise when programming for multiple Unix flavors in the same
script. In the next chapter we will dive into the task of taking a system snapshot. The
idea is to get a point-in-time system configuration for later comparison if a system
problem has you puzzled. See you in the next chapter!
CHAPTER 13
Taking a System Snapshot
Have you ever rebooted a system and it came up in an unusual state? Any time you
reboot a system you run a risk that the system will not come back up properly. When
problems arise it is nice to have before and after pictures of the state of the machine. In
this chapter we are going to look at some options for shell scripts that execute a series
of commands to take a snapshot of the state of the machine. Some of the things to consider for this system snapshot include filesystems that are mounted, NFS mounts,
processes that are running, network statistics and configuration, and a list of defined
system resources, just to name a few. This is different from gathering a snapshot of
performance statistics, which is gathered over a period of time. All we are looking for
is system configuration data and the system’s state at a point in time, specifically
before the system is rebooted or when it is running in a normal state with all of the
applications running properly.
With this information captured before a system reboot, you have a better chance of
fixing a reboot problem quickly and reducing down time. I like to store snapshot information in a directory called /usr/local/reboot with the command names used for
filenames. For this shell script all of the system information is stored in a single file
with a section header added for each command output. Overall, this is not a difficult
shell script to write, but gathering the list of commands that you want to run can sometimes be a challenge. For example, if you want to gather an application’s configuration
you need to find the commands that will produce the desired output. I always prefer
having too much information, rather than not enough information, to troubleshoot a
problem.
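The single-file layout with a section header for each command can be sketched with one small wrapper function. This is a minimal sketch under assumptions: the function name snap_section, the banner style, and the snapshot filename in the usage note are hypothetical, and only the /usr/local/reboot directory comes from the text above.

```shell
#!/bin/sh
# Sketch only: snap_section and the banner layout are assumptions.
SNAPDIR=${SNAPDIR:-/usr/local/reboot}

# $1 = section title, remaining arguments = command to run.
# Prints a header and then the command's output, errors included.
snap_section() {
    TITLE=$1
    shift
    printf '\n#################################################\n'
    printf '# %s\n' "$TITLE"
    printf '#################################################\n\n'
    "$@" 2>&1
}
```

A snapshot could then be gathered with a group such as { snap_section "Hostname" hostname; snap_section "Filesystems" df -k; } > $SNAPDIR/sys_snapshot.out, adding one snap_section line for each command you want to capture.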
In this chapter I have put together a list of commands and created a bunch of functions to execute in the shell script. The commands selected are the most critical for troubleshooting an AIX machine; however, you will need to tailor this set of commands to
suit your particular needs, operating system, and environment. Every shop is different,
but they are all the same in some sense, especially when it comes to troubleshooting a
problem. Let’s look at some commands and the syntax that is required.
Syntax
As always, we need the commands and the proper syntax for these commands before
we can write a shell script. The commands presented in this section are just a sample of
the information that you can gather from the system. This set of commands is for an
AIX system, but most apply to other Unix flavors with modified syntax. The list of AIX
commands is shown in Listing 13.1.
# Hostname of the machine
hostname
OR
uname -n
# Unix flavor
uname -s
# AIX OS version
oslevel
# AIX maintenance level patch set
instfix -i | grep AIX_ML
OR
oslevel -r
# Time zone for this system
cat /etc/environment | grep TZ | awk -F'=' '{print $2}'
# Real memory in the system
echo "$(bootinfo -r)KB"
OR
lsattr -El sys0 -a realmem | awk '{print $2}'
# Machine type/architecture
uname -M
OR - Depending on the architecture
uname -p
# List of defined system devices
lsdev -C
# Long directory listing of /dev
ls -l /dev
# List of all defined disks
lsdev -Cc disk
# List of all defined pdisks for SSA disks
lsdev -Cc pdisk
# List of defined tape drives
lsdev -Cc tape
# List of defined CD-ROMs
lsdev -Cc cdrom
# List of all defined adapters
lsdev -Cc adapter
# List of network routes
netstat -rn
# Network adapter statistics
netstat -i
# Filesystem Statistics
df -k
AND
mount
# List of defined Volume Groups
lsvg | sort -r
# List of varied-on Volume Groups
lsvg -o | sort -r
# List of Logical Volumes in each Volume Group
for VG in $(lsvg -o | sort -r)
do
lsvg -l $VG
done
# Paging space definitions and usage
lsps -a
AND
lsps -s
# List of all hdisks in the system
lspv
# Disk drives listed by Volume Group assignment
for VG in $(lsvg -o | sort -r)
do
lsvg -p $VG
done
# List the HACMP configuration, if installed
if [ -x /usr/sbin/cluster/utilities/cllsif ]
then
/usr/sbin/cluster/utilities/cllsif
     echo "\n"
fi
if [ -x /usr/sbin/cluster/utilities/clshowres ]
then
/usr/sbin/cluster/utilities/clshowres
fi
# List of all defined printers
lpstat -W | tail +3
AND
cat /etc/qconfig
Listing 13.1 System snapshot commands for AIX. (continues)