ADBMS20051018

Recovery
10/18/05
Implementing atomicity
• Note, when a transaction commits, the portion of the system implementing durability ensures that the transaction’s effects are recorded in persistent storage.
• However, while a transaction is active (not yet committed), a failure of the transaction is a real problem for atomicity: the DB may be left in an inconsistent state. NOT GOOD!

Reasons for rollback:
• Any sort of system SW or HW crash.
• Transaction abort:
  • User initiated.
  • The transaction, T, itself, e.g., error handling.
  • System, e.g., T is involved in a deadlock.
  • Letting T complete may lead to inconsistency (i.e., violate the consistency property).
How to roll back in immediate update systems

Immediate update system:
• If T’s request to write x is granted, x is immediately updated in the DB.
• If T’s request to read x is granted, the value of x is returned.
• Note, concurrency control is relied upon to prevent reads of data that were written by uncommitted transactions.

Immediate update systems maintain a log of records.
Log in immediate update system
• Only append log records; never change or delete them.
• The system uses the log to maintain atomicity and durability.
• For durability:
  • The log is used to restore the effects of committed transactions.
  • The log is a sequential file on disk.
  • Often, multiple copies are kept on separate non-volatile storage.
Log: first assume all log records are only on disk

• Update record (for writes):
  • Before image (aka “undo record”).
  • Transaction id: the transaction executing the write.
• To roll back T, scan the log backward starting from the last record. Write the before image of each of T’s update records to the DB.
• To improve performance (avoid a scan of the complete log), have each T write a begin record when T starts.
• Also to improve performance, link the log records of a particular T together (stack-wise).
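
The rollback procedure above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used by any particular DBMS: the names `LogRecord`, `Log`, `write`, and `rollback` are hypothetical, the "DB" is an in-memory dict, and the per-transaction back pointer (`prev`) is one assumed way to link a transaction's records stack-wise.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LogRecord:
    kind: str                   # "begin" or "update"
    txn: str                    # transaction id
    item: Optional[str] = None  # data item written (update records only)
    before: Any = None          # before image, i.e. the undo record
    prev: Optional[int] = None  # index of this transaction's previous record


class Log:
    """Append-only log; records are never changed or deleted."""

    def __init__(self):
        self.records = []
        self.last = {}  # txn -> index of its most recent record

    def append(self, kind, txn, item=None, before=None):
        rec = LogRecord(kind, txn, item, before, self.last.get(txn))
        self.records.append(rec)
        self.last[txn] = len(self.records) - 1


def write(db, log, txn, item, value):
    # Immediate update: log the before image, then change the DB in place.
    log.append("update", txn, item, db.get(item))
    db[item] = value


def rollback(db, log, txn):
    # Follow the transaction's chain backward instead of scanning the whole log.
    idx = log.last.get(txn)
    while idx is not None:
        rec = log.records[idx]
        if rec.kind == "begin":      # begin record reached: rollback is complete
            break
        db[rec.item] = rec.before    # restore the before image
        idx = rec.prev


db = {"x": 1, "y": 2}
log = Log()
log.append("begin", "T1")
log.append("begin", "T2")
write(db, log, "T1", "x", 10)
write(db, log, "T2", "y", 20)
write(db, log, "T1", "x", 11)
rollback(db, log, "T1")
print(db)  # {'x': 1, 'y': 20}
```

Only T1's two updates are undone; T2's write to y is untouched because the chain for T1 never visits T2's records.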
Savepoint record

• To increase flexibility in doing rollbacks, a transaction can specify a savepoint during its execution. One can then do a partial rollback to a specified savepoint (especially useful for transaction error handling).
• A savepoint record contains the transaction id and the savepoint id (and any other useful information).
• To roll back to a specified savepoint, scan the log backward to the specified savepoint record, applying the before images to the DB.
Example of use of savepoint
begin_transaction();
stmt1;
sp1 := create_savepoint();
stmt2;
sp2 := create_savepoint();
if (cond1) rollback(sp1);
else if (cond2) rollback(sp2);
else …
commit();
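
A possible Python rendering of partial rollback to a savepoint, in the spirit of the pseudocode above. It is a sketch under simplifying assumptions: a single transaction with its own log list, an in-memory dict as the DB, and a hypothetical `Transaction` class whose method names merely echo the pseudocode.

```python
class Transaction:
    def __init__(self, db, txn_id):
        self.db = db
        self.txn_id = txn_id
        self.log = [("begin", txn_id)]
        self._next_sp = 1

    def write(self, item, value):
        # Update record: (kind, transaction id, item, before image).
        self.log.append(("update", self.txn_id, item, self.db.get(item)))
        self.db[item] = value

    def create_savepoint(self):
        sp_id = self._next_sp
        self._next_sp += 1
        # Savepoint record: (kind, transaction id, savepoint id).
        self.log.append(("savepoint", self.txn_id, sp_id))
        return sp_id

    def rollback(self, sp_id):
        target = ("savepoint", self.txn_id, sp_id)
        if target not in self.log:
            raise ValueError(f"unknown savepoint {sp_id}")
        # Scan backward, applying before images, until the savepoint record.
        for rec in reversed(self.log):
            if rec == target:
                break
            if rec[0] == "update":
                self.db[rec[2]] = rec[3]


db = {"a": 0, "b": 0}
t = Transaction(db, "T1")
t.write("a", 1)              # stmt1
sp1 = t.create_savepoint()
t.write("b", 2)              # stmt2
sp2 = t.create_savepoint()
t.write("a", 3)
t.rollback(sp1)              # undo everything after sp1, keep stmt1
print(db)  # {'a': 1, 'b': 0}
```

The effect of stmt1 survives because the scan stops at the savepoint record for sp1 and never reaches the earlier update.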
Concurrent transactions
• In order to use the log, the system must determine which transactions have completed (committed or aborted) and which are active.
• All transactions that were active at the time of the failure need to be aborted.

What does commit mean here?
• If the commit record has not been written to the log and the database fails, then the transaction will be rolled back.
• SO! Commit means the commit record has been written to the log.

Checkpoints
• A checkpoint record lists all currently active transactions (e.g., written to the log by the transaction manager).
• To use the checkpoint record, scan backward to the most recent checkpoint record. If T is listed there and no completion record for T (abort or commit) has been seen so far, then T was active at the failure, and the backward scan continues until T’s begin record is reached.

Log example
B1          // T1 begin
B2
U1          // T1 update
C1          // T1 commit
CK: T2      // checkpoint
U2          // (a)
U2          // (b)
<<fail!>> --> scan back: undo (b), undo (a), discover only T2 is active, ignore C1, ignore U1, stop at B2.
Another log example
Notes on the right follow the backward scan, so read them bottom-up from the failure.

...
B2
B3                      <- ok, T3 scan complete
B1                      <- ok, T1 scan complete
C2  // T2 commit        <- ignore
B5                      <- ignore
U3                      <- undo
U5                      <- ignore
A5  // T5 abort         <- ignore
CK: T4, T1, T3          <- only T1, T3 matter
U1                      <- undo
U4                      <- can ignore
B6                      <- done with T6
C4  // T4 commit        <- T4 completed!
U6                      <- T6 active - undo
U1                      <- T1 active - undo
<< fail >>
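
The backward scan traced in this example can be sketched as follows. This is an illustrative reading of the slides, not a definitive algorithm: the function name `find_and_undo` and the tuple encoding of records (`("B", 1)`, `("U", 1)`, `("CK", [4, 1, 3])`, and so on) are assumptions, and "undo" here just collects log positions instead of writing before images.

```python
def find_and_undo(log):
    """Scan `log` backward from the failure.

    Returns (undone, rolled_back): the positions of the update records that
    must be undone, in scan order, and the set of transactions rolled back.
    """
    completed = set()    # transactions whose commit/abort has been seen
    pending = set()      # active transactions whose begin is not yet reached
    rolled_back = set()  # every transaction found to be active at the failure
    undone = []
    checkpoint_seen = False

    for pos in range(len(log) - 1, -1, -1):
        kind, arg = log[pos]
        if kind in ("C", "A"):
            completed.add(arg)
        elif kind == "CK":
            checkpoint_seen = True
            for t in arg:                 # listed and not completed => active
                if t not in completed:
                    pending.add(t)
                    rolled_back.add(t)
        elif kind == "U":
            if arg not in completed:
                if not checkpoint_seen and arg not in rolled_back:
                    pending.add(arg)      # first sighting of an active txn
                    rolled_back.add(arg)
                if arg in rolled_back:
                    undone.append(pos)
        elif kind == "B":
            if not checkpoint_seen and arg not in completed:
                rolled_back.add(arg)      # active, possibly with no updates
            pending.discard(arg)          # scan for this transaction is done
        if checkpoint_seen and not pending:
            break                         # nothing earlier can matter
    return undone, rolled_back


# The log from the example above, oldest record first (positions 0..14).
log = [
    ("B", 2), ("B", 3), ("B", 1), ("C", 2), ("B", 5), ("U", 3), ("U", 5),
    ("A", 5), ("CK", [4, 1, 3]), ("U", 1), ("U", 4), ("B", 6), ("C", 4),
    ("U", 6), ("U", 1),
]
undone, rolled_back = find_and_undo(log)
print(undone)               # [14, 13, 9, 5]
print(sorted(rolled_back))  # [1, 3, 6]
```

The undone positions are the final U1, U6, the earlier U1, and U3, matching the "undo" notes in the example; the scan stops at B3 and never visits B2.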
Yet another log example
...                     <- continue for T1
B6                      <- T6 scan done
U5                      <- ignore
U4                      <- ignore
CK: 1, 4, 5, 6          <- only T1, T6 matter
A5                      <- T5 done
U4                      <- ignore
C4                      <- T4 done
U6                      <- T6 active - undo
<< fail >>
Write-ahead log

• MUST always write the log before the DB is updated.
• Suppose we don’t do write-ahead: T executes an update --> first change the DB, then write the log.
  • If there is a crash between changing the DB and writing the log, there is no way to recover the DB to a consistent state.
• Suppose we do do write-ahead: T executes an update --> first write the log, then change the DB.
  • If there is a crash between writing the log and changing the DB, recovery will write the before image (which is the same as what is currently stored in the DB).
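
The two orderings can be compared with a small simulation. This is a sketch with assumed names (`update`, `recover`, and the `crash_between` flag are hypothetical), using a dict as the DB and a list of (item, before image) pairs as the log.

```python
def update(db, log, item, value, write_ahead, crash_between=False):
    """Apply one update, optionally 'crashing' between the two writes."""
    before = db[item]
    if write_ahead:
        log.append((item, before))   # 1. write the log record
        if crash_between:
            return
        db[item] = value             # 2. change the DB
    else:
        db[item] = value             # 1. change the DB
        if crash_between:
            return
        log.append((item, before))   # 2. write the log record


def recover(db, log):
    """Roll back the uncommitted work by writing before images, newest first."""
    for item, before in reversed(log):
        db[item] = before


# Write-ahead: the crash leaves a log record but an unchanged DB.
db_a, log_a = {"x": 1}, []
update(db_a, log_a, "x", 2, write_ahead=True, crash_between=True)
recover(db_a, log_a)

# No write-ahead: the crash leaves a changed DB and no log record.
db_b, log_b = {"x": 1}, []
update(db_b, log_b, "x", 2, write_ahead=False, crash_between=True)
recover(db_b, log_b)

print(db_a, db_b)  # {'x': 1} {'x': 2}
```

With write-ahead, recovery rewrites a value that is already there, which is harmless. Without it, the uncommitted value 2 stays in the DB because nothing in the log says what to restore.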
Performance stinks because each DB write requires two I/O writes!
• Use volatile storage for the last part of the log: the log buffer.
• The log buffer is periodically flushed to the log.
• When the system crashes, the contents of the log buffer are lost.
• Note, using a cache is analogous:
  • We want the cache to improve performance, but…
  • Cached data (DB pages and maybe the log buffer) are lost in a crash.
Modify previous scheme for log buffer and cache

• Recall, we must write a record to the log before writing to the DB.
• So, a dirty page in the cache is not written to the DB until after the log buffer containing the update record for the corresponding data item is appended to the log. Either:
  • Append the record to the log buffer. Eventually the buffer is flushed, and then the dirty cache page can be written.
  • Append the record to the log buffer, then immediately write the log buffer. AKA a forced write.
• For a normal (unforced) write, DMA can proceed concurrently with transaction execution.
• BUT! For a forced write, the disk write system call cannot return until the write is complete.
Alternative implementation for log buffer and cache

Add (overhead) data:
• Add a log sequence number (LSN) to each log record.
• For each DB page, store the LSN of the log record for the most recent change to that page.
Continuing alternative implementation

• When space is needed in the cache, choose a dirty page, P, to write out.
• Determine whether the log buffer contains the update record whose LSN is the LSN stored in P.
  • If so, the log buffer must be force-written before P is written to the DB.
  • If not, the log on mass storage is already up to date wrt P.
Example:

DISK: DB
Page#          LSN
O              3
P              95
Q              3

Volatile: Cache
Page#          LSN
P’ (x, y, z)   95, 101, 102, 103
Q’ (a, b, c)   99
O (l, m, n)    3

Log
1     …
…
99    U5(m)

Log Buffer
LSN   record
100   ...
101   U1(x)
102   U2(y)
103   U2(x)

• To remove clean O: no change to the DB.
• To remove dirty Q: cache->Q->LSN <= log’s maximum LSN. Therefore, Q can just be written out.
• To remove dirty P: cache->P->LSN > log’s maximum LSN. Therefore, the log buffer must be forced (from its beginning up to P->LSN); then P can be written out.
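
The eviction rule can be sketched with the numbers from this example. Everything here is an assumption made for illustration: the function name `evict`, the dict layout for cached pages, and the record strings. The placeholder for record 100 is kept as "..." because the slide elides it, and only the last durable log record (99) is modeled.

```python
def evict(page, cache, disk_lsn, log, log_buffer):
    """Remove `page` from the cache, writing it out if it is dirty.

    Returns the LSNs that had to be forced from the log buffer to the log
    before the page could be written.
    """
    entry = cache.pop(page)
    if not entry["dirty"]:
        return []                        # clean page: no change to the DB
    max_durable = log[-1][0] if log else 0
    forced = []
    if entry["lsn"] > max_durable:
        # Force the buffer from its beginning up to the page's LSN.
        while log_buffer and log_buffer[0][0] <= entry["lsn"]:
            rec = log_buffer.pop(0)
            log.append(rec)
            forced.append(rec[0])
    disk_lsn[page] = entry["lsn"]        # now the page itself may be written
    return forced


disk_lsn = {"O": 3, "P": 95, "Q": 3}
cache = {
    "P": {"lsn": 103, "dirty": True},
    "Q": {"lsn": 99, "dirty": True},
    "O": {"lsn": 3, "dirty": False},
}
log = [(99, "U5(m)")]
log_buffer = [(100, "..."), (101, "U1(x)"), (102, "U2(y)"), (103, "U2(x)")]

forced_o = evict("O", cache, disk_lsn, log, log_buffer)
forced_q = evict("Q", cache, disk_lsn, log, log_buffer)
forced_p = evict("P", cache, disk_lsn, log, log_buffer)
print(forced_o, forced_q, forced_p)  # [] [] [100, 101, 102, 103]
print(disk_lsn)                      # {'O': 3, 'P': 103, 'Q': 99}
```

Comparing the page's LSN with the largest LSN already on disk is equivalent to asking whether the page's latest update record is still in the log buffer, since LSNs increase along the log.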
Force policy (on commits)

Force policy: T wants to commit, but first!
• If T’s last update is still in the log buffer, force the log buffer (the before image is now durable).
• Dirty pages in the cache that were updated by T are forced (the new values are now durable).
• Then, log T’s commit into the log buffer.
• When that part of the log buffer is written, T is durable.
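
The order of steps in a force-policy commit can be sketched as below. The function names `commit_force` and `is_durable`, the `dirtied_by` bookkeeping, and the tuple record layout are all hypothetical choices for this illustration; real systems track dirty pages differently.

```python
def commit_force(txn, cache, disk, log, log_buffer):
    """Commit `txn` under a force policy; returns the steps taken, in order."""
    steps = []
    # 1. If any of T's update records is still buffered, force the log buffer.
    if any(rec[0] == "update" and rec[1] == txn for rec in log_buffer):
        log.extend(log_buffer)
        log_buffer.clear()
        steps.append("force log buffer")
    # 2. Force the dirty pages T updated, making the new values durable.
    for page, entry in cache.items():
        if txn in entry["dirtied_by"]:
            disk[page] = entry["value"]
            entry["dirtied_by"].clear()
            steps.append(f"force page {page}")
    # 3. Only now log the commit record into the log buffer.
    log_buffer.append(("commit", txn))
    steps.append("append commit record")
    return steps


def is_durable(txn, log):
    # T is durable once its commit record is in the on-disk log.
    return ("commit", txn) in log


disk = {"P": 1}
cache = {"P": {"value": 2, "dirtied_by": {"T1"}}}
log = []
log_buffer = [("update", "T1", "P", 1)]

steps = commit_force("T1", cache, disk, log, log_buffer)
print(steps)  # ['force log buffer', 'force page P', 'append commit record']
durable_before_flush = is_durable("T1", log)
log.extend(log_buffer)   # the buffer holding the commit record is written
log_buffer.clear()
durable_after_flush = is_durable("T1", log)
print(durable_before_flush, durable_after_flush)  # False True
```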
Example on board.
For no-force commit policy: new log record type: after-image

• The after-image (aka redo record) is a copy of the new value of the item.
• The motivation for having the after-image in the log is to improve disk access performance. I.e., new data is durable once the log buffer has been written out (even though the page in the cache has not). So, there is no required order between writing the commit record (to disk) and writing the dirty page.

Example on board.
Three pass recovery: Do, Undo, Redo

• Pass 1: Scan the log backward to the most recent checkpoint, determining which transactions to roll back (i.e., which were active at the crash).
• Pass 2: Replay the log forward from the checkpoint. For all update records (of committed, aborted, and active transactions), update the corresponding items in the DB using the after-image. Now the DB is up to date wrt all changes prior to the crash.
• Pass 3: Scan backward to roll back all transactions active at the time of the crash. Use the before-image to reverse each DB value. This pass ends when the begin records of all rolled-back transactions have been reached.
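
A compact sketch of the three passes, under stated assumptions: records are tuples such as `("update", txn, item, before, after)`, the DB is a dict standing in for the on-disk state at the crash, and every change logged before the checkpoint is assumed to have reached the DB already. The function name `recover` is hypothetical.

```python
def recover(log, db):
    """Three-pass recovery over `log` (oldest record first); mutates `db`."""
    # Pass 1: backward to the most recent checkpoint, finding active txns.
    completed, active = set(), set()
    ckpt = 0
    for pos in range(len(log) - 1, -1, -1):
        rec = log[pos]
        if rec[0] in ("commit", "abort"):
            completed.add(rec[1])
        elif rec[0] in ("begin", "update"):
            if rec[1] not in completed:
                active.add(rec[1])
        elif rec[0] == "checkpoint":
            active |= set(rec[1]) - completed
            ckpt = pos
            break

    # Pass 2: replay forward from the checkpoint using after-images.
    for rec in log[ckpt:]:
        if rec[0] == "update":
            _, _txn, item, _before, after = rec
            db[item] = after

    # Pass 3: backward, undoing active txns until all their begins are seen.
    pending = set(active)
    for rec in reversed(log):
        if not pending:
            break
        if rec[0] == "update" and rec[1] in pending:
            db[rec[2]] = rec[3]          # restore the before-image
        elif rec[0] == "begin":
            pending.discard(rec[1])
    return active


log = [
    ("begin", "T1"),
    ("update", "T1", "x", 0, 1),
    ("checkpoint", ["T1"]),
    ("begin", "T2"),
    ("update", "T2", "y", 0, 5),
    ("commit", "T2"),
    ("update", "T1", "x", 1, 2),
]
# State on disk at the crash: T1's first write reached the DB, while
# T2's committed write did not (no-force).
db = {"x": 1, "y": 0}
active = recover(log, db)
print(active, db)  # {'T1'} {'x': 0, 'y': 5}
```

Committed T2's update is redone from its after-image, and active T1 is rolled all the way back to its begin record, including the update logged before the checkpoint.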
Caveat to Do, Undo, Redo

• Checkpoint followed by a T update, then a T abort:
  • The update was rolled back in the DB before the abort was logged. During recovery, Pass 2 restores the update, but Pass 3 does not roll it back, because T is not active at the crash.
• To fix, an abort of a transaction that had updated x needs TWO records in the log: update (xold, xnew), followed by compensation (xnew, xold).
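
The caveat and its fix can be seen by replaying two small logs. This sketch assumes a hypothetical `redo` function and tuple records of the form `(kind, txn, item, old, new)`; the undo pass is omitted because T has an abort record and so is not active at the crash.

```python
def redo(db, log):
    """Pass 2 only: replay after-images, treating compensation as an update."""
    for rec in log:
        if rec[0] in ("update", "compensation"):
            _, _txn, item, _old, new = rec
            db[item] = new
    return db


# T updated x from 0 to 1 and then aborted, after the checkpoint.
without_fix = [
    ("update", "T", "x", 0, 1),
    ("abort", "T"),
]
with_fix = [
    ("update", "T", "x", 0, 1),
    ("compensation", "T", "x", 1, 0),   # logged while rolling T back
    ("abort", "T"),
]

db_bad = redo({"x": 0}, without_fix)
db_good = redo({"x": 0}, with_fix)
print(db_bad, db_good)  # {'x': 1} {'x': 0}
```

Without the compensation record, replay resurrects the aborted update and nothing later removes it. With it, replay reapplies the rollback as well.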
Example on board
Class discussion of ARIES.