DB-15: Inside The Recovery Subsystem
Plan to commit; Be prepared to rollback.

Richard Banville
Fellow, Technology and Product Architecture
Progress OpenEdge
© 2007 Progress Software Corporation

Recovery Types
Transaction Recovery*
  • Before image rollback/undo and crash recovery
Hard Failure Recovery
  • Roll forward after images
  • Point in time, transaction, retry
Coordinated distributed txn consistency
  • OpenEdge® 2PC - Prepare Phase, Commit Phase
Heterogeneous distributed txn consistency (JTA)
  • External distributed transaction coordinator
  • Requires application changes
  • Available for OpenEdge SQL only
* Before Imaging is the focus of this presentation

Agenda
The BI Units of Measure
Some Simple Rules
General Processing (the fun stuff)
Reliability Switches
Summary

BI Layout: Notes and Blocks
Notes are the basis for recording change in the database
BI made up of many Notes
Notes are variable sized
Notes are organized in order of operation
Notes are stored into BI blocks
BI block size can be customized (1-16K)
I/O is performed in BI Blocksize

BI Layout: Clusters
Blocks are grouped to form a cluster
BI cluster size can be customized (16KB – 256MB)
Size affects checkpoint frequency (among other things)

BI Layout: Clusters
Clusters are allocated as needed
Clusters are logically joined and ordered into a ring
Only ever one cluster accepting BI writes

BI Layout: Storage
[Diagram: BI File, BI File, BI File]
The Primary Recovery Area:
BI data stored in the extents of area #2 of the database
It grows as needed
Space is re-used when possible
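The ring behaviour described on the BI Layout slides can be pictured with a small toy model. This is only a sketch under stated assumptions, not OpenEdge code: the class names `Cluster` and `BiRing`, the `reusable` flag, and the byte-counting are all hypothetical, BI blocks are left out, and the real condition for reusing a cluster is reduced to a flag that the caller sets.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One BI cluster (hypothetical model; BI blocks are not modelled)."""
    number: int
    capacity: int                       # bytes of note data the cluster can hold
    used: int = 0
    notes: list = field(default_factory=list)
    reusable: bool = False              # assumption: set by the caller when space may be reused


class BiRing:
    """Toy ring of clusters; exactly one cluster accepts writes at a time."""

    def __init__(self, cluster_size: int):
        self.cluster_size = cluster_size
        self.ring = [Cluster(1, cluster_size)]
        self.current = 0                # index of the only cluster accepting writes

    def write_note(self, note: bytes) -> int:
        """Append a variable-sized note; return the number of the cluster that took it."""
        if len(note) > self.cluster_size:
            raise ValueError("note larger than a cluster in this toy model")
        cluster = self.ring[self.current]
        if cluster.used + len(note) > cluster.capacity:
            cluster = self._advance()
        cluster.notes.append(note)
        cluster.used += len(note)
        return cluster.number

    def _advance(self) -> Cluster:
        """Move to the next cluster in the ring, reusing it if allowed, else growing."""
        nxt = (self.current + 1) % len(self.ring)
        candidate = self.ring[nxt]
        if nxt != self.current and candidate.reusable:
            # Space is re-used when possible.
            candidate.notes.clear()
            candidate.used = 0
            candidate.reusable = False
            self.current = nxt
        else:
            # Clusters are allocated as needed: splice a new one in after the current one.
            new = Cluster(len(self.ring) + 1, self.cluster_size)
            self.ring.insert(self.current + 1, new)
            self.current += 1
        return self.ring[self.current]
```

Writing notes into a `BiRing` grows the ring until some cluster is flagged reusable; from then on the writer wraps around instead of allocating, which mirrors the "grows as needed, re-used when possible" wording above.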
What's in a note?
Example:
  Trid: 81180 code = RL_RMCR version = 2
  Trid: 81180 area = 8 dbkey = 14528 update counter = 4770
Header
  • Length & note version
  • Note code/identifier
  • Transaction Id
  • Block pointer & area
  • Block update counter
Note Specific Info
  • Record #
  • Table number
  • Size of record
  • Split information
Data Portion (if needed)
  • Block change data
  • i.e., Record data itself
  • Only if needed
[Callouts on the slide: Note type; Associates action]

Agenda: The BI Units of Measure; Some Simple Rules; General Processing (the fun stuff); Reliability Switches; Summary

Rules to live by
#1 - Write ahead logging (WAL)
  • Recovery log notes written BEFORE data
    – Assures atomic and durable transactions
    – BI, AI - reliable write I/O
    – Can relax data write I/O
        Write prior to BI-reuse
        Cluster close
        Missing data applied by redo
        Deferring writes allows multiple updates to occur with a single I/O
#2 - Write ordering rule (FS and hardware)
  • AI, BI writes get to disk in order requested

Rules to follow
#3 - BI Space Reuse
  • Only when cluster is closed
  • Cluster closes when its last transaction ends
    – Checkpoint DOES NOT close a cluster
    – Checkpoint occurs when cluster fills up
#4 - Exclusive Block Access
  • When changing data in database
#5 - Atomic Physical Changes
  • Such as block chain manipulations
  • Enforced by internal TXE mechanism
  • SYSTEM ERROR: User 5 died during micro txn.

Rule
#6 - Without exception:
  • All DB changes are recorded in recovery log.

Rules were meant to be broken
#6 - Without exception:
  • All DB changes are recorded in recovery log.
Exception:
  • Control Area (area #1) changes are not logged.
    – Why should I care?
    – Allows structural changes w/o affecting recovery
        Such as adding space while in roll forward.
    – Recovery Mechanism: Builddb

Agenda: The BI Units of Measure; Some Simple Rules; General Processing (the fun stuff); Reliability Switches; Summary

Forward Processing
So you want to perform a database action
Locate/Lock the data block to change
  • Not all notes require a block
    – Transaction begin, end
  • Not all DB changes require a block!
    – Acquiring additional space
    – Certain index sub-operations
Ensure begin transaction recorded
Record the change in the BI log (via the BI buffer pool)

BI Buffer Pool – Recording a change
[Diagram labels: -bibufs 10; Forward Processing: New Notes (Actions), Free List, Current Output Buffer, Modified Queue; Rollback Processing: Current Input Buffer, Backout Buffer; BI]

BI Buffer Pool – Recording a change
[Diagram labels: -bibufs 10; Forward Processing: New Notes (Actions), Free List, Current Output Buffer, Modified Queue; BI]
PROMON: Total BI Writes, Records (notes) written, Busy buffer waits, Empty buffer waits, Partial Writes
Is it OK to buffer dirty BI blocks? YES
Is it OK to buffer committed BI data? Delayed commit is up to you!
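The forward-processing path through the BI buffer pool can be sketched roughly as follows. Everything here is an illustrative assumption rather than the actual implementation: the class `BiBufferPool` and its methods are hypothetical, notes never span buffers, writes are synchronous, and the counter names are only borrowed from the PROMON labels on the slide without claiming PROMON's exact semantics.

```python
from collections import deque


class BiBufferPool:
    """Toy model of the BI buffer pool used while recording changes."""

    def __init__(self, bibufs: int, block_size: int):
        self.block_size = block_size
        self.free = deque(bytearray() for _ in range(bibufs - 1))   # free list
        self.current = bytearray()      # current output buffer
        self.modified = deque()         # filled buffers waiting to be written
        self.disk = []                  # stands in for the BI file
        self.notes_written = 0
        self.bi_writes = 0
        self.partial_writes = 0
        self.empty_buffer_waits = 0

    def record(self, note: bytes) -> None:
        """Record a note in the current output buffer, rotating buffers when full."""
        if len(note) > self.block_size:
            raise ValueError("toy model: a note must fit in one buffer")
        if len(self.current) + len(note) > self.block_size:
            self._rotate()
        self.current += note
        self.notes_written += 1

    def _rotate(self) -> None:
        """Queue the full buffer and take an empty one from the free list."""
        self.modified.append(self.current)
        if not self.free:
            # No empty buffer: the caller pays for the BI write itself.
            self.empty_buffer_waits += 1
            self.write_oldest()
        self.current = self.free.popleft()

    def write_oldest(self) -> None:
        """Write the oldest modified buffer (the job a BIW would normally do)."""
        buf = self.modified.popleft()
        self.disk.append(bytes(buf))
        self.bi_writes += 1
        buf.clear()
        self.free.append(buf)

    def flush(self) -> None:
        """Write everything, including a partially filled current buffer."""
        while self.modified:
            self.write_oldest()
        if self.current:
            if len(self.current) < self.block_size:
                self.partial_writes += 1
            self.disk.append(bytes(self.current))
            self.bi_writes += 1
            self.current.clear()
```

In this model a larger `bibufs` value, or a background caller invoking `write_oldest()` often enough, keeps `empty_buffer_waits` at zero, which is the effect the later slides attribute to -bibufs and the BIW.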
Forward Processing (continued)
The BI Note has been written…
Finally perform the DB action (make the change)
  • Logical, physical or a mix
Data block's update ctr is incremented
  • Identifies if a noted change made it to disk yet
  • Ensures changes re-applied in order
Dependency counter maintained in ctlr struct
  • Ensures associated BI flushed if -B eviction
User may be forced to do (expensive) BI I/O
  • On -B eviction or No BI buffers available
  • Avoid with APWs, BIW and -bibufs

Helping avoid OLTP BI I/O

Broker Processing: Helping Avoid OLTP BI I/O
[Diagram labels: -bibufs 10, New Notes (Actions), Free List, Current Output Buffer, Modified Queue, Broker, BI]
PROMON: Total BI Writes, Records (notes) written, Partial Writes
Delayed commit (Durability)
Based on -Mf value, Broker may flush BI buffers to disk
  • For aged txn ends

BIW Processing: Helping Avoid OLTP BI I/O
[Diagram labels: -bibufs 10, New Notes (Actions), Free List, Current Output Buffer, Modified Queue, BIW, BI]
PROMON: Total BI Writes, Records (notes) written, Partial Writes, BIW Writes

APW Processing: Helping Avoid OLTP BI I/O
[Diagram labels: -bibufs 10, New Notes (Actions), Free List, Current Output Buffer, Modified Queue, Checkpoint Queue, Data Blocks, Associated BI Note (dependency ctr), WAL, APW, BI, db]

BI Clusters And Checkpointing

The Precious Ring
[Diagram labels: BI Files, BI Cluster Layout 1 2 3 4]
BI blocks are grouped together to form a cluster of blocks.
The clusters of blocks are logically joined together in a ring.
[Diagram labels: Current Out Buffer, Modified Queue, -bibufs, -B buffer pool, Database]

Checkpoint – Synchronization point
All Database Changes Halted!
BI buffer pool flushed
Db buffer pool scanned
Db buffers previously marked for chkpt are written out (OUCH!)
Dirty buffers are marked for chkpt & put on checkpoint queue
Fuzzy checkpointing avoids I/O
File system cache is synchronized
No more sync delay
[Diagram labels: BI Files, BI Cluster Layout 1 2 3 4, Current Out Buffer, Modified Queue, -bibufs, -B buffer pool, File System Cache, Database]

Checkpoint (with -directio)
All Database Changes Halted!
BI buffer pool flushed
Db buffer pool scanned
Db buffers marked for chkpt are written out
Dirty buffers are marked for chkpt & put on checkpoint queue
Fuzzy checkpointing avoids I/O
[Diagram labels: BI Files, BI Cluster Layout 1 2 3 4, -B buffer pool (unbuffered I/O), Database]

The APW
The APWs help w/checkpoints too
PROMON: Buffers Flushed at checkpoint, BIW Writes
[Diagram labels: APW, APW Queue, Checkpoint Queue, -B Buffer Pool, db; sample buffer numbers 172 128 128 256 1024 512 1152 1664 …]

Checkpoint – Size Does Matter
Larger cluster sizes
  • Fewer checkpoints (sync points)
    – Will a crash result in additional lost data?
  • Longer recovery time
    – Recovery starts at last cluster - 1
  • Longer BI format time (runtime)
  • Longer BI format time after truncate
    – Use at least one fixed length extent
        Also use a variable length extent
    – Use bigrow

Checkpoints and Promon
Seeing is believing…
[Callout on the checkpoint table: Ooops!!]
Ckpt                      ------ Database Writes ------
No.  Time      Len  Freq  Dirty  CPT Q  Scan  APW Q  Flushes
27   10:23:12    4     0    384     52     0      0        0
26   10:22:46   25    26    381    381     0      0        0
25   10:22:18   27    28    380    380     0      0        0
24   10:21:50   27    28    346    158   201      0        0
23   10:21:21   28    29    372    360   115      0        0

Checkpoints and Promon
Seeing is believing…
Len: begin to end time - Time cluster was actively available for writes
Freq: begin time to begin time - Time between checkpoints
Time spent performing checkpoint operation: Freq - Len
Dirty: # data blocks newly updated – not incremented when "made dirtier"

Checkpoints and Promon
APW Specific Activity…
CPT Q: # data buffers APW wrote from checkpoint queue (from prev chkpt)
Scan: # data buffers APW wrote while scanning -B
APW Q: # data buffers APW wrote from APW Q
  • Dirty buffers added to APWQ from -B LRU eviction

Checkpoints and Promon
To be avoided…
Flushes: Number of blocks written during checkpoint (marked from previous checkpoint)
Len: Checkpointing too often should be avoided

Reusing space in the BI file

BI Space Reuse
[Diagram sequence over three slides: BI Files with clusters 1 2 3 4, then 1 2 3 4 5, then 1 2 3 4 5 6]
When can BI space be reused?
  • No need to "Age" cluster anymore
    – -G 0 vs -G 60
    – Thanks fdatasync()
  • No open transactions in cluster
  • Changes have been written to data files
Why??
  • Checkpoint DOES NOT close a cluster!!
  • If outstanding transaction were to roll back, where would the undo action come from?
BI files grow to some working set size

Rollback

Rollback Processing
Read backwards & UNDO until tx begin
[Diagram labels: -bibufs 10, Free List, Current Output Buffer, Modified Queue, Current Input Buffer, Backout Buffer, BI, .lbi]
PROMON: Input buffer hits, Output buffer hits, Mod buffer hits, Busy buffer waits, Total BI Reads, Notes read
ABL sub transaction rollback: ABL requests compensating action

What about BOB?
[Diagram labels: -bibufs 10, Free List, Current Output Buffer, Modified Queue, Current Input Buffer, Backout Buffer, BI]
PROMON: Input buffer hits, Output buffer hits, Mod buffer hits, BO Buffer hits

Crash Recovery

Crash Recovery
Performed on each database startup
  • Only needed phases performed
Brings DB up to last known consistent state
  • Physically sound
  • In-flight transactions rolled back
  • Missing committed transactions re-applied

Physical Redo
Bring DB up to point of crash
[Diagram labels: Before-Image Log, redo phase - forward scan, Oldest active txn, Last Recorded Note]
Find last active cluster and back up one
*** Begin Physical Redo Phase, 4 at 0.
Apply notes based on updctr
No BI notes generated during redo
*** Physical Redo Phase Completed at block, off, upd…
*** At end of Physical Redo, txn table is 128

Physical Undo
Back out physical DB changes (if needed)
[Diagram labels: Before-Image Log, redo phase - forward scan, Physical undo, Oldest active txn, Last Note]
*** Begin Physical Undo 10 txns at block 128 offset 1608
Starts at crash point.
Undo physical and physiological notes
Causes new BI notes to be generated
Ends when 1st transaction end encountered
*** Physical Undo Completed at 128 (block #)

Logical Undo
Back out all uncommitted transactions
[Diagram labels: Before-Image Log, redo phase - forward scan, Physical undo, Logical undo backward scan, Oldest active txn, Last Note]
*** Begin Logical Undo Phase, 10 incomplete txns are being backed out.
*** Logical Undo Phase begin at Block 1136 offset 1608.
Starts where physical undo left off
Undo logical and physiological notes
*** Logical Undo Phase Completed at Block 1135 offset 7743.

Agenda: The BI Units of Measure; Some Simple Rules; General Processing; Reliability Switches; Summary

Switches: Reliability and Integrity
-I : No longer a valid parameter.
  • Never had anything to do with crash recovery
-R : Default - Reliable BI I/O
  • Writes bypass the FS cache
  • Use for OLTP
*** Before-Image File I/O (-r -R): Reliable.
*** Crash Recovery (-i): Enabled.

Switches: Reliability and Integrity
-r : BI writes are buffered (un-reliable) to FS
  • Well tuned system overshadows any gain of -r
  • All notes recorded
  • Rollback will work
  • Crash recovery likely to work
  • Recovery from OS crash will most likely fail
*** This session is running with the non-raw (-r) parameter.
*** Before-Image File I/O (-r -R): Not Reliable.
*** Crash Recovery (-i): Enabled.
*** An earlier -r session crashed, the database may be damaged.

Switches: Reliability and Integrity
Why provide it then?
-i : Does not record purely physical notes
  • BI I/O is buffered (un-reliable) to FS
  • No FS sync at checkpoint
  • Rollback will work.
  • OS or DB crash, abnormal termination
    – Must restore from backup
*** This session is being run with the no-integrity (-i) option.
*** Crash Recovery (-i): Not Enabled.
*** Before-Image File I/O (-r -R): Not Reliable.
Switches: Last Resort
-F (dash Foolish)
  • Enter DB without recovery
  • Use as a last resort
  • Integrity NOT maintained
  • Usually need to
    – Validate Data Integrity
    – Dump and load

Agenda: The BI Units of Measure; Some Simple Rules; General Processing; Reliability Switches; Summary

Summary
Recovery is a complex thing
You can do things to improve the process
We make it simple for you

Questions?
[Diagram labels: -bibufs 10, Free List, Current Out Buffer, Modified Queue, Checkpoint Queue, Associated BI Note, APW, BI, db, clusters 1 2 3 4]

Thank you for your time!

Other recovery related Switches
-bi
-biblocksize
-directio
  • No need for sync at checkpoint time
-bwdelay
-bibufs, -aibufs
-bistall, -bithold

Switches: Transactions
-Mf : Delayed commit
  • # seconds a commit note can reside in -bibufs
  • Some commits lost/Integrity Maintained
Group Commit Technique
  • -groupdelay only runs w/-Mf 0
  • Only in multi user mode
  • # milliseconds to sleep at commit time
-G : # seconds to age cluster (use & re-use)
  • No longer needed with fdatasync()
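The "some commits lost, integrity maintained" trade-off of delayed commit can be illustrated with a tiny simulation. This is a hedged sketch, not how the broker is implemented: `DelayedCommitLog`, `tick`, and `crash` are hypothetical names, time is passed in explicitly, and the model simply flushes every buffered commit note once the oldest one has waited `mf_seconds`.

```python
class DelayedCommitLog:
    """Toy model of delayed commit: commit notes may sit in memory for a while."""

    def __init__(self, mf_seconds: float):
        self.mf_seconds = mf_seconds    # plays the role of the -Mf value
        self.buffered = []              # (txn_id, commit_time) not yet on disk
        self.durable = []               # txn ids whose commit note reached disk

    def commit(self, txn_id: int, now: float) -> None:
        """Buffer a commit note; with mf_seconds == 0 it is flushed immediately."""
        self.buffered.append((txn_id, now))
        self.tick(now)

    def tick(self, now: float) -> None:
        """Periodic check: flush once the oldest buffered commit has aged enough."""
        if self.buffered and now - self.buffered[0][1] >= self.mf_seconds:
            self.durable.extend(txn_id for txn_id, _ in self.buffered)
            self.buffered.clear()

    def crash(self) -> list:
        """Simulate a crash: buffered commits are lost, durable ones survive."""
        lost = [txn_id for txn_id, _ in self.buffered]
        self.buffered.clear()
        return lost
```

With `mf_seconds=3`, commits made within the last three seconds before a crash come back as lost, while every earlier commit is in `durable`; with `mf_seconds=0` nothing is ever left in the buffer, at the cost of a flush per commit.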