Overview of Transaction Management

Part V:
Transactions, Concurrency control,
Scheduling, and Recovery
1
Transaction Management Overview
Chapter 16
2
Database and Concurrency



Concurrent execution of user programs is essential for good
DBMS performance. Why?
 Because disk accesses are frequent, and relatively slow, it
is important to keep the CPU humming by working on
several user programs concurrently.
A user’s program may carry out many operations on the
data retrieved from the database, but the DBMS is only
concerned about what data is read/written from/to the
database. Is this OK?
Database state is defined in terms of data values. With
concurrent access to shared data, what might be a problem
with the database state?
 It is dynamic, not static.
3
Database Correctness

The correctness of a DB is defined by consistency
 Internal consistency (semantic integrity)
 Mutual consistency
 They cannot be enforced in each action. Why?
 Example 1: Transfer $1M from account A to account B.
 Example 2: A = 2*B; Increase A by 50% -> same for B

A transaction
 DBMS’s abstract view of a user program: a sequence of
reads and writes
 A complete and consistent computation
 Atomicity, consistency, isolation, durability (ACID)
 Scheduler synchronizes concurrent operations
4
Overall System Structure
Naïve users
Query processing
Application
interfaces
Object code
Application
programmers
Sophisticated
users
Application
programs
query
Embedded
DML compiler
DML
compiler
DBA
Database scheme
DDL
interpreter
Query evaluation
engine
Transaction
manager
DBMS
System
Cache manager
File manager
Data files
indices
Statistical data
Data dictionary
Disk
storage
5
Database System Model

Functional decomposition
 Transaction manager (TM)
 Scheduler
 Data manager (DM)
• Recovery manager (RM)
• Cache manager (CM)

Transaction manager
 A process running on behalf of the user transaction
 Performs any required preprocessing of database (e.g.,
assigning a transaction id, selecting the participating
sites) and operations it receives from transactions
6
Database System Model

Data manager



Operates directly on the DB and responsible for
transaction termination
Consists of RM and CM
Recovery manager
 Atomicity
 Resilient to failures: transaction, system, and media
 Operations: start, commit, abort, read, write

Cache manager
 Manages data movement between main (volatile)
memory and disk (stable) storage
 Actions: fetch and flush
7
Concurrency in a DBMS

Users submit transactions, and can think of each
transaction as executing by itself.



Concurrency is achieved by the DBMS, which interleaves
actions (reads/writes of DB objects) of various transactions.
Each transaction must leave the database in a consistent
state if the DB is consistent when the transaction begins.
• DBMS will enforce some ICs, depending on the ICs
declared in CREATE TABLE statements.
• Beyond this, the DBMS does not really understand the
semantics of the data. (e.g., it does not understand how
the interest on a bank account is computed).
Issues: Effect of concurrent transactions, and failures.
8
Atomicity of Transactions
A transaction might commit after completing all its
actions, or it could abort (or be aborted by the DBMS)
after executing some actions.
 A very important property guaranteed by the DBMS
for all transactions is that they are atomic.
 Atomicity: a transaction is assumed to be executing
all its actions in one step, or not executing any
actions at all.


Not easy to achieve. Why?
9
Atomicity: Why Difficult?
A transaction executed alone leaves DB consistent.
 During execution of a transaction, the DB state
may be temporarily in an inconsistent state.
 Other transactions may observe inconsistent DB
state
 The system may crash in the middle of the
transaction execution.

10
Consistency, Isolation, Durability
A transaction executed in isolation must preserve
DB consistency.
 Even if multiple transactions executed
concurrently, each should be unaware of other
transactions are being executed concurrently.
 When a transaction complete successfully, the
changes it made must persist, even with failures
afterwards.

11
Example

Consider two transactions (Xacts):
T1:
T2:
BEGIN A=A+100, B=B-100 END
BEGIN A=1.06*A, B=1.06*B END
Intuitively, the first transaction is transferring $100
from B’s account to A’s account. The second is
crediting both accounts with a 6% interest payment.
 There is no guarantee that T1 will execute before T2 or
vice-versa, if both are submitted together. However,
the net effect must be equivalent to these two
transactions running serially in some order.

12
Example (Contd.)

Consider a possible interleaving:
T1:
T2:


A=1.06*A,
B=B-100
B=1.06*B
Is this OK?
What about the following?
T1:
T2:

A=A+100,
A=A+100,
A=1.06*A, B=1.06*B
B=B-100
The DBMS’s view of the second schedule:
T1:
T2:
R(A), W(A),
R(A), W(A), R(B), W(B)
R(B), W(B)
13
Anomalies with Interleaved Execution

T1:
T2:

T1:
T2:
Reading Uncommitted Data (WR Conflicts,
“dirty reads”):
R(A), W(A),
R(A), W(A), C
R(B), W(B), Abort
Unrepeatable Reads (RW Conflicts):
R(A),
R(A), W(A), C
R(A), W(B), C
14
Anomalies (Continued)

T1:
T2:
Overwriting Uncommitted Data (WW
Conflicts):
W(A),
W(A), W(B), C
W(B), C
15
Scheduling Transactions





Schedule: An execution history
Serial schedule: Schedule that does not interleave the
actions of different transactions. Is it efficient?
Equivalent schedules: For any database state, the effect of
executing the first schedule is identical to the effect of
executing the second schedule.
Serializable schedule: A schedule that is equivalent to
some serial execution of the same set of transactions.
Is the following statement correct?
If each transaction preserves consistency, every
serializable schedule preserves consistency.
16
Scheduling and Ordering




Ta=a1(x)a2(y); Tb=b1(x)b2(y)
Is a1a2b1b2 equivalent to a1b1a2b2? What about a1b1b2a2?
The ordering a1b1a2b2 preserves the consistency,
while the other does not.
Ordering actions serves the purpose of implementing
atomic operations so as to preserve the consistency of the
database.
DBMS may execute a set of transactions in any order as
long as the effect is the same as that of some serial order. Is
this OK?
What if the user or application wants a specific order
between two transactions T1 and T2? How to enforce such
ordering?
17
Conflicting Operations

Two operations conflict if
 They are issued by different transactions
 They operate on the same data object
 At least one of them is a write operation
R1(A)W2(A); W1(A)W2(A); W1(A)R2(A)
 R1(A)R2(A)


Conflict equivalent
 Two executions are conflict equivalent if in both
executions, all conflicting operations have the same
order.
18
Serializability



Correctness criterion
 Serializability is the correctness definition of DB
 All serializable schedules are equally correct
 Scheduling algorithms enforce certain ordering
 In distributed DBMS, variable delays may disturb any
particular ordering which is supposed to occur
Equivalent schedules (broader than conflict equivalence)
 Two schedules are equivalent if
• Every read operation reads from the same write in
both schedules
• Both schedules have the same final write
Serialization graph (dependency graph) shows dependency
relationship among transactions
19
Conflicts and Equivalence



When do two operations conflict?
 They are issued by different transactions
 They operate on the same data object
 At least one of them is a write operation
Conflict equivalent
 Two executions are conflict equivalent if in both
executions, all conflicting operations have the same
order.
Equivalent schedules (broader than conflict equivalence)
 Two schedules are equivalent if
• Every read operation reads from the same write in
both schedules
• Both schedules have the same final write
20
Serializability

Correctness criterion
 Serializability is the correctness definition of DB
 All serializable schedules are equally correct
 Scheduling algorithms enforce certain ordering
 In distributed DBMS, variable delays may disturb any
particular ordering which is supposed to occur

Serialization graph (dependency graph) shows dependency
relationship among transactions
21
Serializability Theorem

Dependency relation
 Ti -> Tj implies that for some data object X,
ri(X) -> wj(X), or wi(X) -> ri(X), or wi(X) -> wj(X)

Serialization (dependency) graph
 A directed graph whose nodes are transactions and
there is an edge from Ti to Tj if and only if Ti ->Tj

Serialization Theorem
 For a schedule H, if SG(H) is acyclic, then H is
serializable.

Examples
22
Equivalent Execution
T1=r1(x)r1(z)w1(x)
T2=r2(y)r2(z)w2(y)
T3=w3(x)r3(y) w3(z)
H1=w3(x)r1(x)r3(y)r2(y)w3(z)r1(z)r2(z)w2(y)w1(x)

Dependence relationship?
 T3->T1
 T3->T2
H2=w3(x)r3(y) w3(z)r2(y)r2(z)w2(y)r1(x)r1(z)w1(x)
Is H2 a serial execution?
 Is H1 equivalent to H2?
 Is H1 serializable execution?

23
Properties of Schedules

Recoverability
 To ensure that aborting a transaction does not change
the semantics of committed transactions
w1(x)r2(x)w2(y)C2
 Is it recoverable? What if T1 aborts?
 Recoverable execution depends on commit order
 A transaction cannot commit until all values it read are
guaranteed not to be aborted. How to do it?
 Delaying commit: T2 cannot commit until T1 commits
 Cascaded abort is sometimes necessary. Why?
w1(x)r2(x)w2(x)A1
24
Properties of Schedules

Recoverability
 Cascaded abort is sometimes necessary
w1(x)r2(x)w2(x)A1

Avoiding cascaded aborts
 Achieved if every transaction reads only the values
written by committed transactions
 Must delay each r(x) until all transactions that issued
w(x) is either committed or aborted
w1(x) ….. C1 r2(x) w2(y) …
25
Properties of Schedules

Restoring before images
 Implementing transaction abort by simply restoring
before images of all writes is very convenient
w0(x)w1(x)w2(x) A1 A2
 Value of x must be restored to the initial value, not the
value written by T1
 Solution: delay w(x) until all transactions that have
written x are either committed or aborted

Strictness
 Executions that satisfy both requirements
 Delay both r(x) and w(x) until all transactions that have
written w(x) are either committed or aborted
r1(x)w1(x) w1(y) w2(z) w2(x) C1 --- is it strict?
26
Properties of Schedules
Recoverability (RC)
 RC if Ti reads from Tj and Ci is in H, then Ci follows Cj
 Avoiding cascaded aborts (ACA)
 ACA if Ti reads x from Tj then ri(x) follows Cj
 Strictness (ST)
 ST if whenever Oi(x) follows wj(x), then Oi(x) follows
either Aj or Cj
T1=w1(x)w1(y)w1(z) C1; T2=r2(u)w2(x)r2(y)w2(y) C2
H1=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y)C2 w1(z) C1 –- SR? RC?
C2 before C1 --- SR but not RC
H2=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y) w1(z) C1 C2 –- RC? ACA?
RC but not ACA --- if A1 rather than C1, it must be A2, not C2
H3=w1(x)w1(y)r2(u)w2(x)w1(z) C1 r2(y)w2(y) C2 --- ACA? ST?
ACA but not ST --- w2(x) before C1

27
Exercise on Properties of Schedules

What are the properties of the following schedule?
H1=w1(x)r3(x) w1(y) C1 w3(y) w3(x) C3
H2=w1(x)r2(y)w1(y)w2(z)r1(u) C2 C1
H3=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y) C2 w1(z) C1
H4=w1(x)w1(y)r2(u)w2(x)r2(y)w2(y) w1(z) C1 C2
1) H1 is SR and RC, but not ACA. Is it ST?
2) How can you reschedule H1 to make it ST?
3) H2 is SR and RC. Is it ACA?
4) H3 is SR but not RC. Is it ACA?
5) H4 is SR and RC, but not ACA. Is it ST?
28
Relationships among Properties




Recoverability (RC)
 RC if Ti reads from Tj and Ci is in H, then Ci follows
Cj
Avoiding cascaded aborts (ACA)
 ACA if Ti reads x from Tj then ri(x) follows Cj
Strictness (ST)
 ST if whenever Oi(x) follows wj(x), then Oi(x) follows
either Aj or Cj
What is the relationship among ST, ACA, and RC?
ST < ACA < RC

What about with SR and Serial execution?
29
View Serializability









A schedule is view serializable if it is view equivalent to
some serial schedule.
Schedules S1 and S2 are view equivalent if:
 If Ti reads value of A written by Tj in S1, then Ti also
reads value of A written by Tj in S2
 If Ti writes final value of A in S1, then Ti also writes final
value of A in S2
H1=w1(x)w2(x)w3(x)w2(y)r1(y)
H2=w2(x)w2(y)w1(x)r1(y)w3(x)
Is H1 view equivalent to H2?
Is H1 view serializable?
Is H1 conflict serializable? How can you prove it?
Serialization graph of H1
Relationship: Every C-SR is V-SR, but not the other way.
30
Scheduling and Concurrency Control


Objective
 Atomic execution of transactions on shared data by
controlling the execution order of concurrent access
Approaches
 Two-phase locking
 Timestamp ordering
 Optimistic scheduling
 Hybrid schemes
31
Scheduling and Concurrency Control



TM <-> Scheduler <-> DM
Options for a scheduler

When receiving a request from TM
1) Immediately schedule it (send it to DM)
2) Delay it (put it into a queue)
3) Reject it (causing aborting the transaction)
Aggressive vs conservative approaches
 Optimistic vs pessimistic
 Aggressive favors immediate action (option 1); if not
possible to finish, abort T (option 3)
 Conservative favors option 2
 Performance trade-offs between the two
 Most well-known conservative approach: two-phase
locking
32
Lock-Based Concurrency Control

Two-phase Locking (2PL) Protocol:






Each data object has a lock associated with it.
Transaction must obtain a S (shared) lock on object before
reading, and an X (exclusive) lock on object before writing.
If a conflicting lock is already set by other transaction, the
requested operation will be delayed, forcing the
transaction to wait
• If Ti holds an X lock on an object, no other Tj can get a
lock (S or X) on that object.
Two-phase: growing phase and shrinking phase
Once Ti releases a lock on any data object, it cannot get
another lock on any data object. Why?
Examples
33
Example of Simple Locking
T1=A+100->A; B+100->B
T2=Ax2->A; Bx2->B
Simple locking:
T1: lock A; A+100->A; unlock A; lock B; B+100->B; unlock B
T2: lock A; lock B; Ax2->A; Bx2->B; unlock A; unlock B
 Is this OK?
 What can be a problem here?
Another example: T1=r1(x)w1(y)C1 T2=w2(x)w2(y)C2
H=r1(x)w2(x)w2(y)C2w1(y)C1
 Is it SR? How can you tell?
 Is it RC? Does it matter?
 How can you fix it?
34
Example of Two-Phase Locking
T1=A+100->A; B+100->B
T2=Ax2->A; Bx2->B
2PL:
T1: lock A; A+100->A; lock B; unlock A; B+100->B; unlock B
T2: Lock A; lock B; Ax2->A; Bx2->B; unlock A; unlock B
 Is this OK?
 H= T1: lock A; T1: A+100->A; T1: lock B; T1: unlock A;
T2: lock A; T2: … What will happen?
 T2 will be blocked and wait to get lock B
 T2 will continue when T1 finish B+100->B and unlock B
35
Correctness of Schedulers

Need to prove


All schedules representing the executions that can be
produced by the scheduler are serializable (SR)
How to do it?
 Enumerate all the possible schedules and check any one of
them is not SR.
 Good, but is it feasible?

Two –step approach
 Characterize the properties of its schedules
 Prove that any schedule with such properties are SR

How to characterize the properties?
36
Properties of 2PL

1.
2.
3.

Notation:
Oi(x): Operation O by transaction Ti on x
OLi(x): lock for Oi(x); OUi(x): unlock after Oi(x)
If Oi(x) is in the schedule, then OLi(x) and OUi(x) are also in
the schedule, and OLi(x) < Oi(x) < OUi(x)
If Oi(x) and Oj(x) are conflicting operations in the schedule,
then either OUi(x) < OLj(x) or OUj(x) < OLi(x)
If Oi(x) and Oi(y) is in the schedule, then Oli(x) < OUi(y)
Where is this last property (property 3) coming from?
37
Correctness of 2PL


1.
2.
3.
4.
5.
Theorem: 2PL is correct (i.e., SR)
Proof
If Ti -> Tj is in the schedule, then for some x, OUi(x) < OLj(x)
If Ti -> Tj ->Tk in the schedule, then Ti releases some lock
before Tj sets the lock, and the same for Tj and Tk. By
induction, same for T1 and Tn if T1->T2-> … ->Tn
If the schedule has a cycle in the serialization graph, i.e.,
T1->T2-> … ->Tn->T1, then T1 releases some lock before T1
sets a lock.
This is a violation of two-phaseness (property 3), and it
cannot be a 2PL schedule.
Hence a cycle cannot exist.
38
Variations of 2PL

Strict Two-phase Locking Protocol:






Each Xact must obtain a S (shared) lock on object before
reading, and an X (exclusive) lock on object before writing.
All locks held by a transaction are released together when
the transaction completes
Non-strict 2PL: Release locks anytime, but cannot acquire
locks after releasing any lock.
Almost all 2PL implementations are strict 2PL. Why?
Practical reason: when scheduler can release lock?
Strict 2PL allows only serializable schedules.
 Additionally, it simplifies transaction aborts
 Non-strict 2PL also allows only serializable schedules, but
involves more complex abort processing
39
Aborting a Transaction
If a transaction Ti is aborted, all its actions have to be
undone. Not only that, if Tj reads an object last
written by Ti, Tj must be aborted as well!
 We can avoid such cascading aborts by releasing a
transaction’s locks only at commit time.



If Ti writes an object, Tj can read this only after Ti commits.
In order to undo the actions of an aborted transaction,
the DBMS maintains a log in which every write is
recorded. This mechanism is also used to recover
from system crashes: all active Xacts at the time of the
crash are aborted when the system comes back up.
40
The Log

The following actions are recorded in the log:


Ti writes an object: the old value and the new value.
• Log record must go to disk before the changed page!
Ti commits/aborts: a log record indicating this action.
Log records are chained together by Xact id, so it’s
easy to undo a specific Xact.
 Log is often duplicated and archived on stable storage.
 All log related activities (and in fact, all CC related
activities such as lock/unlock, dealing with deadlocks
etc.) are handled transparently by the DBMS.

41
Recovering From a Crash

There are 3 phases in the Aries recovery algorithm:



Analysis: Scan the log forward (from the most recent
checkpoint) to identify all Xacts that were active, and all dirty
pages in the buffer pool at the time of the crash.
Redo: Redoes all updates to dirty pages in the buffer pool,
as needed, to ensure that all logged updates are in fact
carried out and written to disk.
Undo: The writes of all Xacts that were active at the crash
are undone (by restoring the before value of the update,
which is in the log record for the update), working
backwards in the log. (Some care must be taken to handle
the case of a crash occurring during the recovery process!)
42
Summary
Concurrency control and recovery are among the
most important functions provided by a DBMS.
 Users need not worry about concurrency.



System automatically inserts lock/unlock requests and
schedules actions of different Xacts in such a way as to
ensure that the resulting execution is equivalent to
executing the Xacts one after the other in some order.
Write-ahead logging (WAL) is used to undo the
actions of aborted transactions and to restore the
system to a consistent state after a crash.

Consistent state: Only the effects of commited Xacts seen.
43