Non-Stop Progress

Preparing for High Availability
Adam Backman
V.P. of Technology
White Star Software
What We Will Cover
Planning considerations
 Installation issues
 Maintenance issues

Planning Phase - People
Who “owns” the data
 Be inclusive
 This is not solely an IT decision
 Eliminate surprises

Planning Considerations
Budget – high availability is not free
 Hardware – fault tolerant, redundancy, …
 Software – Progress is good but how is your
“other” software?
 Knowledge – buy or rent
 Time – schedule and outage time
 Personnel constraints – Who is on call?

Goals During Outage
Do no additional damage
 Shortest amount of time
 Reduce/Eliminate impact to customer

The Cost of Downtime

Wages
 Idle workers
 Cost to replace data

Production
 Lost production

Impact to the customer
 Can’t click website
 Can’t place order
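
A rough way to put a number on the first two categories is sketched below. The function name and every figure (headcount, hourly wage, lost production per hour, cost to replace data) are hypothetical and only illustrate the arithmetic. Impact to the customer is not captured and has to be judged separately.

```python
def downtime_cost(hours, idle_workers, hourly_wage, lost_production_per_hour,
                  data_replacement_cost=0.0):
    """Rough cost of an outage: idle wages + lost production + data re-entry."""
    wages = hours * idle_workers * hourly_wage
    production = hours * lost_production_per_hour
    return wages + production + data_replacement_cost


# Hypothetical example: 4-hour outage, 50 idle workers at 30 per hour,
# 10,000 per hour of lost production, 2,000 to replace lost data.
cost = downtime_cost(4, 50, 30.0, 10_000.0, 2_000.0)
print(f"Estimated cost of outage: {cost:,.2f}")
```

Running the same function with your own figures gives a first estimate to weigh against the budget for high availability.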
How Much Downtime Can You Afford?

For maintenance
 Application
 Database

For failures
 Hardware
 Software
 Natural disaster
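
One way to frame the question is to convert an availability target into a yearly downtime budget that maintenance and failures must share. The sketch below assumes a 365-day year. The targets shown are common examples, not figures from this presentation.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(availability_percent):
    """Hours per year a system may be down at the given availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


for target in (99.0, 99.9, 99.99):
    hours = allowed_downtime_hours(target)
    print(f"{target}% availability allows {hours:.2f} hours of downtime per year")
```

Planned maintenance windows come out of the same budget as unplanned failures, so both lists above need an estimate.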
Planning Phase - Budget

Less downtime = additional cost
 Better disks (RAID, Mirrors, EMC, …)
 Redundant system
 Remote site

More money does not equal less downtime
 Prioritize
 Look for most likely scenarios
 Look beyond cool
Planning Phase - Hardware

Disks – The only moving part
 RAID – Redundant Array of Inexpensive Disks
 Avoid software mirroring
 Use multiple controllers
 Try to stick with a single-vendor solution
What RAID really means
RAID has many levels; here are the most common:
 RAID 0: This level is also called striping. It offers no redundancy.
 RAID 1: This is referred to as mirroring.
 RAID 5: Striping with parity. Poor write performance.
 RAID 10: This is mirroring and striping combined. Also written RAID 1+0, and often loosely called RAID 0+1.
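
The trade-off between these levels can be sketched as usable capacity against the number of disk failures that are always survived. This is a simplified model and the function name is hypothetical. It assumes equal-sized disks, and RAID 10 is modeled as a stripe across two-way mirrored pairs.

```python
def raid_summary(level, disks, disk_gb):
    """Return (usable capacity in GB, disk failures always survived)."""
    if level == 0:                       # striping: all the space, no redundancy
        return disks * disk_gb, 0
    if level == 1:                       # mirroring: every disk is a full copy
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_gb, disks - 1
    if level == 5:                       # striping with one disk's worth of parity
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_gb, 1
    if level == 10:                      # stripe across two-way mirrored pairs
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, 4 or more")
        return disks // 2 * disk_gb, 1
    raise ValueError(f"RAID level {level} is not covered by this sketch")


for level, disks in ((0, 4), (1, 2), (5, 4), (10, 4)):
    usable, survives = raid_summary(level, disks, 100)
    print(f"RAID {level} on {disks} x 100 GB: {usable} GB usable, "
          f"always survives {survives} disk failure(s)")
```

RAID 10 can survive more than one failure when the failed disks are in different mirrored pairs, but only one failure is guaranteed.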
Planning Phase - Hardware

CPU
Check with vendor to ensure fault tolerance

Memory
Do not interleave memory

Vendor
Choose a reliable vendor (IBM, HP, Sun,
Compaq, …)
Planning Phase - Hardware

Other hardware
 File servers
 Network stuff (LAN & WAN)
 Phone/Internet connections
Planning Phase - Software
Inventory all software (client and server)
and make sure it is current and supported
 Determine what software is needed all of
the time (Production control – Yes,
Reporting software – No)

Planning Phase - Progress
Version of Progress (look for patches)
 Layout of database

 Single database or Multi-database
 Storage area layout (logical and physical
layout)

Application issues
 Client/Server, N-Tier or Host based
 Where does the application code reside?
Planning Database Layout

Single database
 Easy to maintain
 Still have storage areas to spread data
 Single point of failure

Multi-database
 More to maintain
 Allows application partitioning
 Maintenance flexibility
 Two phase commit
After Imaging
Before image files keep information about
records giving you the ability to undo a
transaction
 After image files keep information about
records that allows you to redo a transaction
in the event of media failure
 After imaging is only part of a high
availability strategy

After Imaging
Every high availability system should have
after imaging enabled
 Multiple after image areas are required for
high availability
 Only enable after imaging after you have a
comprehensive backup and recovery plan in
place

How Does Journaling Work?
Here is a logical over-simplification of how
journaling works
FOR EACH customer:
    /* BI note written */
    UPDATE customer.
    /* AI note written */
END.
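
The undo and redo roles of the two journals can be illustrated with a toy model. This is not how Progress stores its notes. The dictionary "database", the list-based journals and all names below are hypothetical, and exist only to show why BI notes support undo and AI notes support redo.

```python
def update(db, bi, ai, key, new_value):
    """Apply one change, journaling the old value (BI) and the new value (AI)."""
    bi.append((key, db.get(key)))   # before image: enough to undo the change
    db[key] = new_value
    ai.append((key, new_value))     # after image: enough to redo the change


def undo(db, bi):
    """Back out a transaction by replaying its BI notes in reverse order."""
    while bi:
        key, old_value = bi.pop()
        if old_value is None:       # the key did not exist before the change
            db.pop(key, None)
        else:
            db[key] = old_value


def redo(backup, ai):
    """Rebuild a lost database by applying AI notes to a restored backup."""
    restored = dict(backup)
    for key, new_value in ai:
        restored[key] = new_value
    return restored


backup = {"cust-1": 100}
db, bi, ai = dict(backup), [], []
update(db, bi, ai, "cust-1", 150)
update(db, bi, ai, "cust-2", 75)

recovered = redo(backup, ai)   # media failure: restore the backup, roll forward
undo(db, bi)                   # transaction undone: roll back with BI notes
print("recovered from backup + AI:", recovered)
print("after undo with BI:", db)
```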
Planning Phase - Knowledge

Own
Our people have the knowledge to do the project

Buy
We can train our people to do this project

Rent
We will hire consultants to implement this for us
(Insert shameless plug here)
Planning Phase - Time
Schedule for project
 Machine purchase and delivery
 Software availability
 Resource availability
 Do we need a long weekend for
implementation?
Timings measured later may dictate items in the
implementation schedule
Planning Phase - Personnel

24 hr. Operators
If you don’t have operators you will need to
develop monitoring routines with paging ability
Database Administrator(s)
 System Administrator(s)
Develop an escalation plan with “on call”
schedule for off hours issues

Installation Phase
All items should have been already
developed and tested prior to this stage
 All items should have been already
developed and tested prior to this stage
 All items should have been already
developed and tested prior to this stage
 Get the point?

Installation Steps
Develop a schedule with timings and leave
room for error as there WILL be errors
 Write scripts to do tasks where possible to
eliminate the human factor
 Have a master checklist with the person/people
responsible for each item

Maintenance Goals
Provide consistent performance
 Allow for advance planning
 Avoid unscheduled outages

Maintenance
Don’t design something you cannot support
 Scripting should be flexible but bulletproof

 Example: www.peg.com/utilities.html
Monitoring and trending are very important
to maintain high availability systems
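
A minimal trending sketch: given periodic samples of how much of a resource is in use (disk space, area blocks, licenses), project when it will run out. The function name and sample figures are hypothetical, and the straight-line projection between the first and last sample is an assumption, not a method from this presentation.

```python
def days_until_full(samples, capacity):
    """Project how many days remain before usage reaches capacity.

    samples: list of (day_number, units_used), oldest first.
    Returns None when usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    growth_per_day = (last_used - first_used) / (last_day - first_day)
    if growth_per_day <= 0:
        return None
    return (capacity - last_used) / growth_per_day


# Hypothetical weekly samples of GB used on a 500 GB file system.
weekly_samples = [(0, 400), (7, 414), (14, 428)]
print("Days until full:", days_until_full(weekly_samples, capacity=500))
```

Feeding the result into the paging or escalation routine turns a future outage into a scheduled maintenance task.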
Monitoring
Areas of concern for high availability
Progress
 Database areas filling
 BI not being reused
 AI space depleted
 Running out of licenses

System
 Disk space
 Resources (memory, CPU, tunables, …)

Monitoring Progress - DB
/* Storage Area fill rate program */
DEFINE VARIABLE percent-free AS DECIMAL FORMAT ">>9.99" NO-UNDO.

FOR EACH _AreaStatus NO-LOCK:
    /* Percent of the area's blocks still above the high-water mark */
    percent-free = 100 - (_AreaStatus-HiWater / _AreaStatus-TotBlocks * 100).
    DISPLAY _AreaStatus-areaname "Percent Free:" percent-free.
END.
Monitoring Progress - BI
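/* Last BI file growth program */
DEFINE VARIABLE t_filename AS CHARACTER FORMAT "x(40)" NO-UNDO.

/* Match the BI extents: file names beginning with <dbname>.b */
t_filename = PDBNAME(1) + ".b".
FIND LAST _ActIOFile NO-LOCK
    WHERE _IOFile-filename BEGINS t_filename NO-ERROR.
IF NOT AVAILABLE _ActIOFile THEN
    DISPLAY "No BI extent found".
ELSE IF _IOFile-Extends = 0 THEN
    DISPLAY "ALL IS WELL".
ELSE
    DISPLAY "The Sky is Falling !!!".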
Monitoring Progress - AI
# Program: After image extent full checker
# Assumes $DB holds the database name; the threshold of 9 is site-specific
FULL_EXT=$(rfutil "$DB" -C aimage extent list | grep -ic full)
if [ "$FULL_EXT" -lt 9 ]
then
    echo "$DB has $FULL_EXT full extents STATUS - OK"
else
    echo "WARNING - $DB has $FULL_EXT full extents"
fi
Monitoring Progress - Users
/* License count tester */
DEFINE VARIABLE remaining-licenses AS INTEGER NO-UNDO.

FIND FIRST _License NO-LOCK.
remaining-licenses = _Lic-ValidUsers - _Lic-MaxActive.
/* You may want to use _Lic-ActiveConns instead of _Lic-MaxActive */
IF .10 > (remaining-licenses / _Lic-ValidUsers) THEN
    DISPLAY "Less than 10% of licenses remaining"
        WITH FRAME X.
ELSE
    DISPLAY "10% or more of licenses remaining"
        WITH FRAME Y.
System Monitoring

Disk Space
 How much disk available for growth
 Also look at throughput capacity (average wait)

Memory capacity
 Free memory is not a good indicator
 I focus on the scan rate

CPU Capacity
 How much idle time
Maintenance Tasks
Backup and restore
 After imaging
 Log based replication
 Data maintenance

Backup and Restore
Progress online backup
 Quiet point backup
 Warm standby backup

Backup and Restore
Why can’t I just back up the database and
before image files while the database is at a
slow point?
Answer: While it is up, the database consists
of three portions: the database files, the
before image file(s) and shared memory
Portions of an Active DB
Shared memory: holds the most volatile data
DB: the database contains older committed data
BI: the before image holds transaction information
All three are needed for a complete backup
Online Backup
What happens during an online backup?
1. Grab a db latch
2. Do a pseudo-checkpoint (this synchs memory to disk)
3. Switch AI file (if necessary)
4. Back up the before image file
5. Release the db latch
6. Back up the database (starting at the end)
Quiet Points
Very little impact to system availability
 Allows for integration with hardware
utilities
 Only way to get an online backup with an
operating system utility without shutting
down the broker

How quiet points work.
Get database latch
 do pseudo checkpoint
 wait for quiet point to be removed

NOTE: All processing will wait for the quiet
point to be removed
Quiet Point Backup
How to do a quiet point backup
1. Enable the quiet point (This synchs memory
to disk)
2. Synchronize your disk mirrors
3. Split your disk mirrors
4. Disable the quiet point
5. Mount the mirrors as different file systems
6. Back up your mounted mirrors with an OS
utility (tar, cpio, fdump, …)
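
The six steps can be scripted so the quiet point is always removed, even when a mirror step fails. The sketch below assumes the `proquiet dbname enable` and `proquiet dbname disable` commands. The mirror, mount and backup commands are hypothetical placeholders, since they depend on the disk vendor and site. The example performs a dry run and executes nothing.

```python
import subprocess


def quiet_point_backup(db, sync_cmd, split_cmd, mount_cmd, backup_cmd,
                       run=lambda cmd: subprocess.run(cmd, check=True)):
    """Run the six quiet point backup steps in order.

    The mirror, mount and backup commands are passed in because they are
    vendor- and site-specific; `run` can be replaced for a dry run.
    """
    run(["proquiet", db, "enable"])        # 1. enable the quiet point
    try:
        run(sync_cmd)                      # 2. synchronize the disk mirrors
        run(split_cmd)                     # 3. split the disk mirrors
    finally:
        run(["proquiet", db, "disable"])   # 4. always release waiting users
    run(mount_cmd)                         # 5. mount the split mirrors
    run(backup_cmd)                        # 6. back up with an OS utility


# Dry run: record the commands instead of executing them.
# Every command below except proquiet is a hypothetical placeholder.
planned = []
quiet_point_backup(
    "sports",
    sync_cmd=["mirror-sync", "dbvol"],
    split_cmd=["mirror-split", "dbvol"],
    mount_cmd=["mount", "/dev/dbvol_copy", "/backup_mnt"],
    backup_cmd=["tar", "cf", "/dev/tape0", "/backup_mnt"],
    run=planned.append,
)
for cmd in planned:
    print(" ".join(cmd))
```

Because all processing waits while the quiet point is enabled, only the synchronize and split steps run inside it. Mounting and backing up happen after users have been released.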
After Imaging
Every high availability system should have
after imaging enabled
 Only enable after imaging after you have a
comprehensive backup and recovery plan in
place
 AI is sometimes referred to as the redo log

Multi-volume after image files
Not a backup but a journal of completed
transactions
 Can be used to keep a copy of the database
up to date
 Can be switched with no interruption to user
processing
 Should be part of every high availability
environment

How to integrate after imaging
In conjunction with a backup site
 To update a report server
 As a means of backup

AI to update a backup site
Poor man’s replication
 Allows for periodic update of a copy of the
database
 The copy can then be backed up with a
conventional backup mechanism

Log Based Replication
Log based replication is another way to say
applying AI files to a copy of your database
 Excellent way to maintain a warm copy of
your database for fail over
 Can be used on the same machine or on a
remote machine for additional protection
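
Applying AI files to the standby copy can be sketched as a loop around `rfutil ... -C roll forward -a`. The archive file names and the assumption that they sort in the order the extents were filled are hypothetical. The example is a dry run that only prints the commands.

```python
import subprocess


def apply_ai_files(standby_db, ai_files,
                   run=lambda cmd: subprocess.run(cmd, check=True)):
    """Roll archived AI files forward onto the standby copy, oldest first.

    Assumes the archived file names sort in the order the extents were
    filled (for example a zero-padded sequence number in the name).
    """
    applied = []
    for ai_file in sorted(ai_files):
        # With the default runner a failed command raises and stops the
        # loop, so later AI files are never applied on top of a gap.
        run(["rfutil", standby_db, "-C", "roll", "forward", "-a", ai_file])
        applied.append(ai_file)
    return applied


# Dry run with hypothetical archive names; nothing is executed.
commands = []
apply_ai_files("standby", ["prod.0003.ai", "prod.0001.ai", "prod.0002.ai"],
               run=commands.append)
for cmd in commands:
    print(" ".join(cmd))
```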

Log Based Replication Rules
The standby database can only be accessed
read-only (-RO) which means no remote
(client/server) connections to the standby
data
 You must have a multi-volume AI. This is
a must for high availability in any case
 The standby database can have a different
structure than the primary data

AI as a Means of Backup

Not generally a good idea
 Increased recovery time
 Reduced reliability

Back up the database each weekend
Back up the AI file(s) each weeknight
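
Why reliability drops can be shown with simple arithmetic: a recovery late in the week needs the weekend backup plus every nightly AI backup since, and all of them must be readable. The 99% per-file figure and the assumption that failures are independent are hypothetical and only illustrate the trend.

```python
def recovery_chain(nights_since_full_backup, file_reliability):
    """Return (files needed to recover, chance that every one is usable)."""
    # The weekend backup plus one AI backup per night since then.
    files_needed = 1 + nights_since_full_backup
    return files_needed, file_reliability ** files_needed


for nights in (0, 1, 5):
    files, chance = recovery_chain(nights, 0.99)
    print(f"{nights} night(s) after the full backup: {files} file(s) to restore, "
          f"{chance:.1%} chance all are usable")
```

Each extra file also adds a restore or roll forward step, which is where the increased recovery time comes from.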

Backup – Points to Remember
Simplicity and minimizing user interaction
will increase backup reliability
 You are only as good as your last tested
backup
 Archiving off site is essential

Database Maintenance

Data Stuff
 Table move
 Database analysis

Index Stuff
 Index rebuild (offline)
 Index Compress
 Index Fix
Table Move

Pros
 Simple
 Bulletproof

Cons
 Slow
 Table is read only for the duration of the move
 Uses tons of logging space
Table Move
Syntax:
proutil dbname -C tablemove tablename table-area [index-area]
table-area = The target application data area into which the
table is to be moved
index-area = The name of the target index area. If not
specified, the indexes will be left in their existing location
Database Analysis
Useful tool for determining low level
storage information
 Helpful for determining records per block in
storage areas
 Helps determine when to compress/rebuild
indexes

Database Analysis
Record Level
RECORD BLOCK SUMMARY FOR AREA "Schema Area" : 6
--------------------------------------------------------------------------
                                -Record Size (B)-  ---Fragments--- Scatter
Table           Records    Size   Min   Max  Mean    Count  Factor  Factor
PUB.agedar           26  871.0B    31    41    33       26     1.0     1.7
PUB.customer         33    5.7K   159   196   175       33     1.0     0.9
PUB.item             55    4.5K    73    95    83       55     1.0     1.1
PUB.monthly          20  798.0B    37    42    39       20     1.0     1.0
PUB.order            20    2.2K    98   138   113       20     1.0     1.2
PUB.order-line       71    2.1K    29    31    30       71     1.0     1.0
PUB.salesrep          3  219.0B    71    75    73        3     1.0     1.0
PUB.shipping        250    5.7K    18    24    23      250     1.0     1.0
PUB.state            51    1.7K    29    40    34       51     1.0     1.1
PUB.syscontrol        1  134.0B   134   134   134        1     1.0     1.0
Database Analysis
Index Level
INDEX BLOCK SUMMARY FOR AREA "Schema Area" : 6
-------------------------------------------------------------------------
Table              Index  Fields  Levels  Blocks     Size  % Util  Factor
PUB.agedar
  ar_cust              8       1       1       1   194.0B     4.8     1.0
  ar_inv               9       1       1       1   242.0B     5.9     1.0
  ar_invdat           10       1       1       1    55.0B     1.4     1.0
PUB.customer
  cust-num            11       1       1       1   305.0B     7.5     1.0
  name                12       1       1       1   809.0B    19.9     1.0
  zip                 13       1       1       1   326.0B     8.0     1.0
PUB.item
  idesc               14       1       1       1   879.0B    21.6     1.0
  item-num            15       1       1       1   503.0B    12.4     1.0
Index Rebuild
Used to be the only way to repair and
reorganize indexes
 Provides a positive performance impact in
many cases
 Off line – database only available in read
only mode

Index Compress
The best new feature in version 9
 Reorganizes indexes while the database is
online
 Allows you to specify compaction level
 Possible performance gain with no
downtime

Index fix
Combine with index compress to get all of
the benefits of an index rebuild
 Corrects problems with the index (missing
and extra entries)
 Works online with little interruption to user
activity

Database Maintenance
Points to Remember
Progress provides many utilities, more are
moving online with each release
 Table and index move WILL interrupt
operations
 Periodically analyze your database
 Use Index compress to optimize indexes

Non-Stop Progress
Points to Remember

Plan
 Timing is everything
 Script to avoid mistakes

Implement
 Checklist, Checklist, Checklist

Maintain
 Trending
 Low impact is paramount
Questions