Preparing for High Availability
Adam Backman
V.P. of Technology
White Star Software

What We Will Cover
  • Planning considerations
  • Installation issues
  • Maintenance issues

Planning Phase - People
  • Who “owns” the data?
  • Be inclusive
  • This is not solely an IT decision
  • Eliminate surprises

Planning Considerations
  • Budget: high availability is not free
  • Hardware: fault tolerant, redundancy, …
  • Software: Progress is good, but how is your “other” software?
  • Knowledge: buy or rent
  • Time: schedule and outage time
  • Personnel constraints: who is on call?

Goals During Outage
  • Do no additional damage
  • Recover in the shortest amount of time
  • Reduce or eliminate impact to the customer

The Cost of Downtime
  • Wages
    - Idle workers
    - Cost to replace data
  • Production
    - Lost production
  • Impact to the customer
    - Can’t click website
    - Can’t place order

How Much Downtime Can You Afford?
  • For maintenance
    - Application
    - Database
  • For failures
    - Hardware
    - Software
    - Natural disaster

Planning Phase - Budget
  • Less downtime = additional cost
    - Better disks (RAID, mirrors, EMC, …)
    - Redundant system
    - Remote site
  • More money does not equal less downtime
    - Prioritize
    - Look for most likely scenarios
    - Look beyond cool

Planning Phase - Hardware
  • Disks: the only moving part
    - RAID: Redundant Array of Inexpensive Disks
    - Avoid software mirroring
    - Use multiple controllers
    - Try to stick with a single-vendor solution

RAID
  • What RAID really means
  • RAID has many levels; here are the most common:
    - RAID 0: Also called striping.
    - RAID 1: Referred to as mirroring.
    - RAID 5: Striping with parity. A poor-performance RAID level.
    - RAID 10: Mirroring and striping combined.
      Also written RAID 1+0; often loosely called RAID 0+1.

Planning Phase - Hardware
  • CPU
    - Check with vendor to ensure fault tolerance
  • Memory
    - Do not interleave memory
  • Vendor
    - Choose a reliable vendor (IBM, HP, Sun, Compaq, …)

Planning Phase - Hardware
  • Other hardware
    - File servers
    - Network stuff (LAN & WAN)
    - Phone/Internet connections

Planning Phase - Software
  • Inventory all software (client and server) and make sure it is current and supported
  • Determine what software is needed all of the time (production control: yes; reporting software: no)

Planning Phase - Progress
  • Version of Progress (look for patches)
  • Layout of database
    - Single database or multi-database
    - Storage area layout (logical and physical layout)
  • Application issues
    - Client/server, N-tier or host based
    - Where does the application code reside?

Planning Database Layout
  • Single database
    - Easy to maintain
    - Still have storage areas to spread data
    - Single point of failure
  • Multi-database
    - More to maintain
    - Allows application partitioning
    - Maintenance flexibility
    - Two-phase commit

After Imaging
  • Before image files keep information about records, giving you the ability to undo a transaction
  • After image files keep information about records that allows you to redo a transaction in the event of media failure
  • After imaging is only part of a high availability strategy

After Imaging
  • Every high availability system should have after imaging enabled
  • Multiple after image areas are required for high availability
  • Only enable after imaging after you have a comprehensive backup and recovery plan in place

How Does Journaling Work?
Here is a logical oversimplification of how journaling works:

    FOR EACH customer:
        /* BI note written */
        UPDATE customer.
        /* AI note written */
    END.
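
The undo/redo roles of the BI and AI notes can be illustrated with a toy model. This is only a sketch in Python, not how Progress stores its notes: the dictionary "database", the two lists standing in for the BI and AI files, and the function names update, undo and redo are all hypothetical and chosen for illustration.

```python
_MISSING = object()  # marks "record did not exist before the update"


def update(db, bi_log, ai_log, key, new_value):
    """Change one record, writing a BI note before and an AI note after."""
    bi_log.append((key, db.get(key, _MISSING)))  # BI note: old value (undo)
    db[key] = new_value
    ai_log.append((key, new_value))              # AI note: new value (redo)


def undo(db, bi_log):
    """Back out an unfinished transaction by replaying BI notes in reverse."""
    while bi_log:
        key, old_value = bi_log.pop()
        if old_value is _MISSING:
            db.pop(key, None)
        else:
            db[key] = old_value


def redo(backup, ai_log):
    """Roll a restored backup forward by replaying AI notes in order."""
    restored = dict(backup)
    for key, new_value in ai_log:
        restored[key] = new_value
    return restored


db = {"customer-1": "Old Name"}
backup = dict(db)                 # backup taken before the updates
bi_log, ai_log = [], []

update(db, bi_log, ai_log, "customer-1", "New Name")
update(db, bi_log, ai_log, "customer-2", "Second")

recovered = redo(backup, ai_log)  # media failure: backup plus AI notes
undo(db, bi_log)                  # crash mid-transaction: back out with BI
```

In this model the backup plus the AI notes reproduces the updated data, while the BI notes return the live copy to its state before the transaction. That is why the slides treat after imaging as a complement to backups and not a replacement for them.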

Planning Phase - Knowledge
  • Own
    - Our people have the knowledge to do the project
  • Buy
    - We can train our people to do this project
  • Rent
    - We will hire consultants to implement this for us (Insert shameless plug here)

Planning Phase - Time
  • Schedule for project
    - Machine purchase and delivery
    - Software availability
    - Resource availability
  • Do we need a long weekend for implementation?
  • Timings measured later may change items in the implementation schedule

Planning Phase - Personnel
  • 24 hr. operators
    - If you don’t have operators you will need to develop monitoring routines with paging ability
  • Database administrator(s)
  • System administrator(s)
  • Develop an escalation plan with an “on call” schedule for off-hours issues

Installation Phase
  • All items should have been already developed and tested prior to this stage
  • All items should have been already developed and tested prior to this stage
  • All items should have been already developed and tested prior to this stage
  • Get the point?

Installation Steps
  • Develop a schedule with timings and leave room for error, as there WILL be errors
  • Write scripts to do tasks where possible to eliminate the human factor
  • Have a master checklist with the person/people responsible for each item

Maintenance Goals
  • Provide consistent performance
  • Allow for advance planning
  • Avoid unscheduled outages

Maintenance
  • Don’t design something you cannot support
  • Scripting should be flexible but bulletproof
    - Example: www.peg.com/utilities.html
  • Monitoring and trending are very important to maintain high availability systems

Monitoring
Areas of concern for high availability:
  • Progress
    - Database areas filling
    - BI not being reused
    - AI space depleted
    - Running out of licenses
  • System
    - Disk space
    - Resources (memory, CPU, tunables, …)

Monitoring Progress - DB

    /* Storage Area fill rate program */
    /* ">>9.99" leaves room for an area that is 100.00 percent free */
    DEF VAR percent-free AS DEC FORMAT ">>9.99".
    FOR EACH _AreaStatus NO-LOCK:
        percent-free = 100 - ((_AreaStatus-HiWater / _AreaStatus-TotBlocks * 100)).
        DISPLAY _AreaStatus-areaname "Percent Free:" percent-free.
    END.
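
Since the slides stress trending as well as monitoring, the percent-free arithmetic from the storage area program can be combined with a rough growth estimate. The Python sketch below is an assumption-laden illustration and not part of the original program: the function names, the (day, high-water mark) sample format and the straight-line growth model are all hypothetical.

```python
def percent_free(hi_water, total_blocks):
    """Percent of an area's blocks that lie above the high-water mark."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    return 100.0 - (hi_water / total_blocks * 100.0)


def days_until_full(samples, total_blocks):
    """Estimate days until the high-water mark reaches total_blocks.

    samples is a list of (day_number, hi_water) pairs, oldest first.
    Growth is assumed to be linear between the first and last sample.
    Returns None when the area is not growing over that period.
    """
    (first_day, first_hw), (last_day, last_hw) = samples[0], samples[-1]
    elapsed = last_day - first_day
    if elapsed <= 0 or last_hw <= first_hw:
        return None
    blocks_per_day = (last_hw - first_hw) / elapsed
    return (total_blocks - last_hw) / blocks_per_day


# Example readings: 7000 blocks used on day 0, 7500 on day 10, 10000 total.
history = [(0, 7000), (10, 7500)]
free_now = percent_free(7500, 10000)
runway = days_until_full(history, 10000)
```

Recording the high-water mark once a day and feeding the history to a function like this turns a point-in-time check into a trend, so an extent can be added before the area fills and not after.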

Monitoring Progress - BI

    /* Last BI file growth program */
    DEF VAR t_filename AS CHAR FORMAT "x(40)".
    t_filename = PDBNAME(1) + ".b".
    /* The last BI extent should never have been extended */
    FIND LAST _ActIOFile WHERE _IOFile-filename BEGINS t_filename NO-LOCK.
    IF _IOFile-Extends = 0 THEN
        DISPLAY "ALL IS WELL".
    ELSE
        DISPLAY "The Sky is Falling !!!".

Monitoring Progress - AI

    # Program: After image extent full checker
    FULL_EXT=`rfutil $DB -C aimage extent list | grep -i full | wc -l`
    if [ $FULL_EXT -lt 9 ]
    then
        echo "$DB has $FULL_EXT full extents STATUS - OK"
    else
        echo "WARNING - $DB has $FULL_EXT full extents"
    fi

Monitoring Progress - Users

    /* License count tester */
    DEF VAR remaining-licenses AS INT.
    FIND _License NO-LOCK.
    remaining-licenses = _Lic-ValidUsers - _Lic-MaxActive.
    /* You may want to use _Lic-ActiveConns instead of _Lic-MaxActive */
    IF .10 > (remaining-licenses / _Lic-ValidUsers) THEN
        DISPLAY "Less than 10% of licenses remaining" WITH FRAME X.
    ELSE
        DISPLAY "At least 10% of licenses remaining" WITH FRAME Y.

System Monitoring
  • Disk space
    - How much disk is available for growth
    - Also look at throughput capacity (average wait)
  • Memory capacity
    - Free memory is not a good indicator
    - I focus on the scan rate
  • CPU capacity
    - How much idle time

Maintenance Tasks
  • Backup and restore
  • After imaging
  • Log based replication
  • Data maintenance

Backup and Restore
  • Progress online backup
  • Quiet point backup
  • Warm standby backup

Backup and Restore
  • Why can’t I just back up the database and before image files while the database is at a slow point?
  • Answer: while it is up, the database consists of three portions: the database files, the before image file(s) and memory

Portions of an Active DB
  (Diagram: Shared memory, DB, BI)
  • Shared memory holds the most volatile data
  • The database contains older committed data
  • The before image holds transaction information
  • All three are needed for a complete backup

Online Backup
What happens during an online backup?
  1. Grab a db latch
  2. Do a pseudo-checkpoint (this syncs memory to disk)
  3. Switch AI file (if necessary)
  4. Back up the before image file
  5. Release the db latch
  6. Back up the database (starting at the end)

Quiet Points
  • Very little impact to system availability
  • Allows for integration with hardware utilities
  • Only way to get an online backup with an operating system utility without shutting down the broker
  • How quiet points work:
    - Get database latch
    - Do pseudo-checkpoint
    - Wait for quiet point to be removed
  • NOTE: All processing will wait for the quiet point to be removed

Quiet Point Backup
How to do a quiet point backup:
  1. Enable the quiet point (this syncs memory to disk)
  2. Synchronize your disk mirrors
  3. Split your disk mirrors
  4. Disable the quiet point
  5. Mount the mirrors as different file systems
  6. Back up your mounted mirrors with an OS utility (tar, cpio, fdump, …)

After Imaging
  • Every high availability system should have after imaging enabled
  • Only enable after imaging after you have a comprehensive backup and recovery plan in place
  • AI is sometimes referred to as the redo log

Multi-volume after image files
  • Not a backup but a journal of completed transactions
  • Can be used to keep a copy of the database up to date
  • Can be switched with no interruption to user processing
  • Should be part of every high availability environment

How to integrate after imaging
  • In conjunction with a backup site
  • To update a report server
  • As a means of backup

AI to update a backup site
  • Poor man’s replication
  • Allows for periodic update of a copy of the database
  • The copy can then be backed up with a conventional backup mechanism

Log Based Replication
  • Log based replication is another way to say applying AI files to a copy of your database
  • Excellent way to maintain a warm copy of your database for failover
  • Can be used on the same machine or on a remote machine for additional protection

Log Based Replication Rules
  • The standby database can only be accessed read-only (-RO), which means no remote (client/server)
    connections to the standby database
  • You must have a multi-volume AI. This is a must for high availability in any case
  • The standby database can have a different structure than the primary database

AI as a Means of Backup
  • Not generally a good idea
    - Increased recovery time
    - Reduced reliability
  • Back up the database each weekend
  • Back up the AI file(s) each weeknight

Backup - Points to Remember
  • Simplicity and minimizing user interaction will increase backup reliability
  • You are only as good as your last tested backup
  • Archiving off site is essential

Database Maintenance
  • Data stuff
    - Table move
    - Database analysis
  • Index stuff
    - Index rebuild (offline)
    - Index compress
    - Index fix

Table Move
  • Pros
    - Simple
    - Bulletproof
  • Cons
    - Slow
    - Table is read-only for the duration of the move
    - Uses tons of logging space

Table Move
Syntax:

    proutil dbname -C tablemove tablename table-area [index-area]

  • table-area = the target application data area into which the table is to be moved
  • index-area = the name of the target index area; if not specified, the indexes will be left in their existing location

Database Analysis
  • Useful tool for determining low-level storage information
  • Helpful for determining records per block in storage areas
  • Helps determine when to compress/rebuild indexes

Database Analysis Record Level

    RECORD BLOCK SUMMARY FOR AREA "Schema Area" : 6
    --------------------------------------------------------------------------
                                     -Record Size (B)- ---Fragments--- Scatter
    Table            Records    Size   Min   Max  Mean   Count  Factor  Factor
    PUB.agedar            26  871.0B    31    41    33      26     1.0     1.7
    PUB.customer          33    5.7K   159   196   175      33     1.0     0.9
    PUB.item              55    4.5K    73    95    83      55     1.0     1.1
    PUB.monthly           20  798.0B    37    42    39      20     1.0     1.0
    PUB.order             20    2.2K    98   138   113      20     1.0     1.2
    PUB.order-line        71    2.1K    29    31    30      71     1.0     1.0
    PUB.salesrep           3  219.0B    71    75    73       3     1.0     1.0
    PUB.shipping         250    5.7K    18    24    23     250     1.0     1.0
    PUB.state             51    1.7K    29    40    34      51     1.0     1.1
    PUB.syscontrol         1  134.0B   134   134   134       1     1.0     1.0

Database Analysis Index Level

    INDEX BLOCK SUMMARY FOR AREA "Schema Area" : 6
    -----------------------------------------------------------------------
    Table             Index  Fields  Levels  Blocks    Size  % Util  Factor
    PUB.agedar
      ar_cust             8       1       1       1  194.0B     4.8     1.0
      ar_inv              9       1       1       1  242.0B     5.9     1.0
      ar_invdat          10       1       1       1   55.0B     1.4     1.0
    PUB.customer
      cust-num           11       1       1       1  305.0B     7.5     1.0
      name               12       1       1       1  809.0B    19.9     1.0
      zip                13       1       1       1  326.0B     8.0     1.0
    PUB.item
      idesc              14       1       1       1  879.0B    21.6     1.0
      item-num           15       1       1       1  503.0B    12.4     1.0

Index Rebuild
  • Used to be the only way to repair and reorganize indexes
  • Provides a positive performance impact in many cases
  • Offline: database only available in read-only mode

Index Compress
  • The best new feature in version 9
  • Reorganizes indexes while the database is online
  • Allows you to specify compaction level
  • Possible performance gain with no downtime

Index Fix
  • Combine with index compress to get all of the benefits of an index rebuild
  • Corrects problems with the index (missing and extra entries)
  • Works online with little interruption to user activity

Database Maintenance Points to Remember
  • Progress provides many utilities; more are moving online with each release
  • Table and index move WILL interrupt operations
  • Periodically analyze your database
  • Use index compress to optimize indexes

Non-Stop Progress Points to Remember
  • Plan
    - Timing is everything
    - Script to avoid mistakes
  • Implement
    - Checklist, Checklist, Checklist
  • Maintain
    - Trending
    - Low impact is paramount

Questions