VI

Transactions, Concurrency & Recovery

Transaction management, concurrency control, and database recovery mechanisms

1. Overview

The last three fundamental components of a database system are:

Transactions

Atomic units of work that must all succeed or all fail

Concurrency Control

Allow multiple transactions simultaneously while maintaining consistency

Recovery

Restore the database to a consistent state after failures

📖 Transaction

A transaction is a collection of operations that perform a single logical function in a database application. It is a unit of work that is atomic - it either completes entirely or not at all.

2. Transaction Concept

Characteristics of a Transaction

Logical Unit of Work

Operations are related to a single task

Atomic

All operations complete or none

Consistent

Transforms database from one consistent state to another

Isolated

Executes independently of other transactions

Transaction Example

Bank Transfer Transaction
BEGIN TRANSACTION;
    UPDATE Account A SET balance = balance - 100
    WHERE accountID = '12345';

    UPDATE Account B SET balance = balance + 100
    WHERE accountID = '67890';

    -- If any operation fails, ROLLBACK
    -- If all succeed, COMMIT
END TRANSACTION;

⚠️ Why This Example Matters

Both updates must succeed. If the first succeeds but the second fails, money is lost. The transaction ensures all-or-nothing execution.

3. Transaction States

📖 Transaction State Diagram

A transaction goes through several states during its execution, representing different points in its lifecycle.

Active
Partially Committed
↙ ↘
Committed
Failed

Transaction States Explained

Active

The transaction is executing. SQL statements are being processed.

Partially Committed

The final statement has been executed, but changes are not yet written to disk.

Committed

All changes have been written to disk and the transaction has completed successfully. Cannot be rolled back.

Failed

The transaction has encountered an error and cannot continue normally.

📝 Abort vs Commit

From Failed state, the transaction is aborted (rolled back), returning to the state before the transaction began.

4. ACID Properties

ACID properties are fundamental guarantees that a DBMS must provide to ensure data integrity and reliability.

Atomicity (All or Nothing)

A transaction is atomic: it either executes completely or not at all. There are no partial transactions.

Example: In a bank transfer, either both accounts are updated or neither is.

Consistency

A transaction transforms the database from one consistent state to another consistent state. No integrity constraints are violated.

Example: Total money in the bank never changes during a transfer (only moves between accounts).

Isolation

Transactions execute independently. Incomplete transactions are not visible to other transactions.

Example: One transaction's intermediate changes don't affect another transaction's execution.

Durability

Once committed, the changes are permanent and survive any failures (crashes, power loss, etc.).

Example: Even if the server crashes after COMMIT, the changes are still in the database.

ACID in Practice

Property Implemented By
Atomicity Transaction management, Rollback mechanisms
Consistency Integrity constraints, Triggers, Application logic
Isolation Concurrency control, Locking, Timestamp ordering
Durability Logging, Backup, Recovery mechanisms

5. Transaction Control Commands

COMMIT Statement

Saves all changes made during the transaction to the database permanently.

COMMIT Syntax
BEGIN TRANSACTION;
    INSERT INTO account VALUES (...);
    UPDATE account SET balance = balance + 500;
COMMIT;

ROLLBACK Statement

Undoes all changes made during the transaction, returning to the previous consistent state.

ROLLBACK Syntax
BEGIN TRANSACTION;
    UPDATE account A SET balance = balance - 100;
    -- Something went wrong...
ROLLBACK;

SAVEPOINT Statement

Creates a point within a transaction to which you can later rollback.

SAVEPOINT Example
BEGIN TRANSACTION;
    UPDATE account A SET balance = balance - 100;
    SAVEPOINT sp1;

    UPDATE account B SET balance = balance + 100;
    -- Error occurred, rollback to savepoint
    ROLLBACK TO sp1;

    COMMIT;

📝 Transaction Control Flow

  • Every statement in a transaction is atomic itself
  • COMMIT makes all changes permanent
  • ROLLBACK undoes everything since BEGIN
  • SAVEPOINT allows partial rollback

6. Concurrent Executions & Issues

When multiple transactions execute simultaneously, several problems can arise if not properly controlled.

Lost Update Problem

Lost Update

Two transactions read the same data and make conflicting updates, causing one update to be overwritten (lost).

T₁:
READ A (value: 100)
T₂:
READ A (value: 100)
T₁:
A = A + 50 → WRITE A (150)
T₂:
A = A + 20 → WRITE A (120)

Result: A = 120 (T₁'s update is lost!)

Dirty Read Problem

Dirty Read

One transaction reads data that has been written by an uncommitted transaction, which might be rolled back.

T₁:
READ A (100), A = A + 50 → WRITE A (150)
T₂:
READ A (150) ← dirty read!
T₁:
ROLLBACK (A reverts to 100)

T₂ read a value that never existed!

Unrepeatable Read Problem

Unrepeatable Read

A transaction reads the same data twice and gets different values because another transaction modified it between the reads.

T₁:
READ A (100)
T₂:
WRITE A (150)
T₁:
READ A (150) ← different value!

Phantom Read Problem

Phantom Read

A transaction re-executes a query and finds new rows that weren't there before due to another transaction's insert.

T₁:
SELECT * FROM Account WHERE balance > 1000 (5 rows)
T₂:
INSERT INTO Account ... (balance = 2000)
T₁:
SELECT * FROM Account WHERE balance > 1000 (6 rows) ← phantom!

7. Serializability

📖 Serializability

A concurrent execution of transactions is serializable if its effect is equivalent to some serial execution of the same transactions.

Types of Serializability

Conflict Serializability

Concept

Two transactions have a conflict if they access the same data and at least one of them writes to it.

Conflict Types:

  • Read-Write (RW) Conflict
  • Write-Read (WR) Conflict
  • Write-Write (WW) Conflict

Two schedules are conflict equivalent if: They involve the same transactions and every pair of conflicting operations is ordered the same way in both schedules.

View Serializability

Concept

Two schedules are view equivalent if:

  • The same transactions read the same initial values
  • If a transaction reads a value written by another, it's the same in both schedules
  • The same transactions perform final writes

Note: View serializability is more general but harder to enforce than conflict serializability.

📝 Serializability Goals

Ensure that the concurrent execution of transactions produces the same result as if they were executed serially (one after another), maintaining consistency.

8. Isolation Levels

Isolation levels define the degree to which concurrent transactions are isolated from each other.

READ UNCOMMITTED

Allows: Dirty reads, Non-repeatable reads, Phantom reads

Problems: Maximum concurrency, minimum safety

READ COMMITTED

Prevents: Dirty reads

Allows: Non-repeatable reads, Phantom reads

Common default: Balance between safety and performance

REPEATABLE READ

Prevents: Dirty reads, Non-repeatable reads

Allows: Phantom reads

Locks: All rows read during transaction

SERIALIZABLE

Prevents: All anomalies

Effect: Equivalent to serial execution

Cost: Maximum locks, lowest concurrency

Isolation Level Trade-offs

Isolation Level Dirty Read Non-Repeatable Phantom Read Performance
READ UNCOMMITTED ✗ Possible ✗ Possible ✗ Possible ✓ Fastest
READ COMMITTED ✓ Prevented ✗ Possible ✗ Possible Good
REPEATABLE READ ✓ Prevented ✓ Prevented ✗ Possible Fair
SERIALIZABLE ✓ Prevented ✓ Prevented ✓ Prevented ✗ Slowest

9. Lock-Based Concurrency Control

📖 Locks

A lock is a mechanism to control concurrent access to a database object. Only one transaction can hold a lock on an item at a time.

Types of Locks

Shared Lock (Read Lock)

S

Multiple transactions can hold shared locks on the same item

Prevents write by other transactions

Exclusive Lock (Write Lock)

X

Only one transaction can hold an exclusive lock

Prevents read and write by other transactions

Lock Compatibility Matrix

Held Lock Requested Lock
Shared (S) Exclusive (X)
Shared (S) ✓ Compatible ✗ Conflict
Exclusive (X) ✗ Conflict ✗ Conflict

Two-Phase Locking (2PL)

Two-Phase Locking Protocol

All lock requests must come before any unlock requests.

1

Growing Phase

Transaction acquires locks but cannot release any lock

2

Shrinking Phase

Transaction releases locks but cannot acquire new locks

Guarantee: Ensures conflict serializable schedules

Lock Acquisition Example
BEGIN TRANSACTION;
    -- Growing phase
    LOCK A IN SHARED MODE;      -- Acquire S lock on A
    LOCK B IN EXCLUSIVE MODE;   -- Acquire X lock on B

    READ A;
    WRITE B;

    -- Shrinking phase
    UNLOCK A;
    UNLOCK B;
COMMIT;

Strict Two-Phase Locking

All exclusive locks (not shared locks) are held until the transaction commits or aborts. This prevents dirty reads and cascading rollbacks.

10. Deadlock Handling

📖 Deadlock

A deadlock occurs when two or more transactions are waiting for locks held by each other, creating a circular wait condition where neither can proceed.

Deadlock Example

T₁:
LOCK A, needs LOCK B
T₂:
LOCK B, needs LOCK A

Both waiting for each other = DEADLOCK!

Deadlock Prevention

Wait-Die Protocol

Older transaction waits, younger dies

  • If Ti asks for a lock held by Tj
  • If Ti is older: wait
  • If Ti is younger: die (abort)

Wound-Wait Protocol

Older transaction wounds, younger waits

  • If Ti asks for a lock held by Tj
  • If Ti is older: wound Tj (abort)
  • If Ti is younger: wait

Deadlock Detection

Use a Wait-For-Graph (WFG): If a cycle exists, deadlock is detected.

Nodes: Transactions

Edges:

Ti → Tj means Ti is waiting for a lock held by Tj

If cycle exists: Deadlock detected, abort one transaction

Deadlock Recovery

Choose a victim transaction to abort (usually the youngest or least expensive to roll back).

11. Timestamp-Based Concurrency Control

📖 Timestamp Ordering

A concurrency control method that assigns each transaction a unique timestamp and orders transactions based on these timestamps.

Basic Timestamp Ordering (BTO)

For each data item X, maintain:

  • RTS(X) - Read timestamp (latest transaction to read X)
  • WTS(X) - Write timestamp (latest transaction to write X)

Timestamp Ordering Rules

READ Operation

Transaction Ti requests to READ X:

  • If TS(Ti) < WTS(X): Reject (dirty read)
  • Else: Allow and update RTS(X)

WRITE Operation

Transaction Ti requests to WRITE X:

  • If TS(Ti) < RTS(X): Reject (outdated write)
  • If TS(Ti) < WTS(X): Reject (Thomas Write Rule)
  • Else: Allow and update WTS(X)

📝 Timestamp Advantages vs Locks

  • ✓ No deadlock (deadlock-free)
  • ✓ Automatic ordering
  • ✗ More overhead (maintaining timestamps)
  • ✗ Cascading rollbacks possible

12. Recovery Concepts

📖 Recovery

Recovery is the process of restoring a database to a consistent state after a failure.

Types of Failures

Transaction Failure

A transaction cannot complete (e.g., constraint violation)

Recovery: ROLLBACK to undo changes

System Failure

Hardware/software crash (power loss)

Recovery: REDO and UNDO operations

Media Failure

Disk crash (permanent data loss)

Recovery: Restore from backup

Recovery Techniques

Deferred Update Recovery

Changes are only written to disk after COMMIT. If failure occurs before COMMIT, no UNDO is needed (REDO only).

Immediate Update Recovery

Changes are written to disk immediately. If failure occurs before COMMIT, both UNDO and REDO might be needed.

Shadowing

Maintain two copies of the database. Write to shadow copy, swap on COMMIT (risky due to I/O overhead).

13. Log-Based Recovery

📖 Transaction Log

A log is a sequence of log records that record all modifications to the database and transaction operations. Used for recovery and rollback.

Log Record Types

Log Record Type Format Purpose
<START T₁> START transaction T₁ Mark transaction start
<T₁, X, V_old, V_new> UPDATE record Log data modification with before/after values
<COMMIT T₁> COMMIT transaction Mark successful completion
<ABORT T₁> ABORT transaction Mark failed transaction
<CKPT> CHECKPOINT Mark stable point for recovery

Log Structure Example

<START T₁>
<T₁, Account_A, 1000, 950>
<START T₂>
<T₁, Account_B, 500, 550>
<COMMIT T₁>
<T₂, Account_A, 950, 900>
<CKPT>

UNDO/REDO Operations

UNDO

Restore old value V_old for a failed or aborted transaction

When: Transaction aborted or failure before COMMIT

REDO

Reapply new value V_new for a committed transaction

When: Transaction committed but changes not yet written to disk

Recovery Algorithm

1

REDO Phase

Scan forward from last CHECKPOINT. For every COMMIT record, REDO the transaction.

2

UNDO Phase

Scan backward from end. For every ABORT or incomplete transaction (no COMMIT), UNDO the transaction.

📝 CHECKPOINT

A checkpoint marks a point where all modified pages have been written to disk. Allows recovery to start from the checkpoint instead of the beginning of the log, reducing recovery time.

Example Recovery Scenario

Scenario: Crash occurs after T₁ commits but before T₂ commits

Log contents:

<START T₁> <T₁, X, 100, 150> <COMMIT T₁> <START T₂> <T₂, Y, 200, 250> [CRASH]

Recovery:

  • REDO: REDO T₁ (X = 150) - already durable
  • UNDO: UNDO T₂ (Y = 200) - restore old value

Result: T₁ effects are durable, T₂ is rolled back

14. Practice Questions

Explain ACID properties with bank transfer example.

Bank Transfer: Transfer $100 from Account A to Account B

Atomicity (All or Nothing):

Both updates must succeed or both must fail. We cannot have A reduced but B not increased.

Consistency:

Total money in the bank before = Total money after. If A had $1000 and B had $500, after transfer A has $900 and B has $600. Total remains $1500.

Isolation:

Other concurrent transactions cannot see the intermediate state where A is reduced but B is not yet increased.

Durability:

After COMMIT, the transfer is permanent. Even if the server crashes immediately after COMMIT, both accounts will still reflect the transfer when the database recovers.

Define and explain the Lost Update problem with example.

Lost Update Problem: Two transactions read the same data and make updates, but one transaction's update is overwritten (lost) by the other.

Example: Two users increment a counter

Initial value: counter = 5

Time T₁ (User 1) T₂ (User 2) 1 READ counter (5) 2 READ counter (5) 3 counter = 5+1=6 4 WRITE counter (6) 5 counter = 5+1=6 6 WRITE counter (6) Final value: counter = 6 (should be 7!) T₁'s increment is lost!

Solution: Use locks to ensure one transaction completes before another starts, or use timestamp ordering.

What is Dirty Read? How is it prevented?

Dirty Read: A transaction reads data written by another transaction that has not yet committed.

Problem: The writing transaction might be rolled back, leaving the reading transaction with invalid data.

Example:

T₁ reads account balance = $1000 T₂ reads account balance = $1000 T₂ updates balance to $500 and writes it T₁ reads balance again = $500 (dirty read!) T₂ ROLLBACK - balance goes back to $1000 T₁ used invalid data

Prevention Methods:

  • Locks: Exclusive locks prevent reads until transaction commits
  • Isolation Level: Use READ COMMITTED or higher
  • 2PL Protocol: Ensures dirty reads don't occur

Explain Two-Phase Locking (2PL) protocol.

2PL Protocol: All lock requests must come before any unlock requests.

Two Phases:

1. Growing Phase:

  • Transaction acquires locks on data items
  • No locks are released
  • At end of this phase, transaction has all locks it needs

2. Shrinking Phase:

  • Transaction releases locks
  • No new locks can be acquired

Guarantee: 2PL ensures conflict-serializable schedules.

Example:

LOCK A (GROWING) LOCK B (GROWING) READ A, WRITE B UNLOCK A (SHRINKING) UNLOCK B

Note: Strict 2PL holds all exclusive locks until COMMIT to prevent cascading rollback.

What is Deadlock? Explain Wait-Die and Wound-Wait protocols.

Deadlock: Circular wait where two or more transactions are waiting for locks held by each other.

Example:

T₁ locks A, waiting for B (held by T₂) T₂ locks B, waiting for A (held by T₁) Neither can proceed = DEADLOCK

Wait-Die Protocol:

Older transaction waits, younger transaction dies (aborts)

  • If T_old requests lock held by T_young: wait
  • If T_young requests lock held by T_old: die (abort)
  • Prevents circular waits (older never waits for younger)

Wound-Wait Protocol:

Older transaction wounds (aborts) younger, younger waits for older

  • If T_old requests lock held by T_young: wound T_young (abort it)
  • If T_young requests lock held by T_old: wait
  • Younger transactions avoid being aborted if they start first

Explain log-based recovery with REDO and UNDO operations.

Log-Based Recovery: Uses a transaction log to recover from failures.

Log Records:

  • <START T> - Transaction begins
  • <T, X, old_val, new_val> - Modification record
  • <COMMIT T> - Transaction committed
  • <ABORT T> - Transaction aborted

REDO Operation: Reapply changes of committed transactions

UNDO Operation: Undo changes of uncommitted transactions

Recovery Algorithm:

Phase 1 - REDO: Scan log forward from last CHECKPOINT

  • For each <COMMIT T>, REDO the transaction (reapply new values)
  • Ensures durable changes aren't lost

Phase 2 - UNDO: Scan log backward from end

  • For each transaction without <COMMIT>, UNDO it (restore old values)
  • Removes incomplete transactions

Example:

<START T₁> <T₁, A, 100, 150> <COMMIT T₁> <START T₂> <T₂, B, 200, 250> [CRASH - no COMMIT for T₂] Recovery: - REDO T₁ (A = 150) - UNDO T₂ (B = 200)

Explain the difference between Conflict and View Serializability.

Serializability: A concurrent execution is serializable if it produces the same result as some serial execution.

Conflict Serializability:

Two schedules are conflict equivalent if every pair of conflicting operations is ordered the same way in both schedules.

  • Conflicting operations: involve same data item, at least one is write
  • Easier to implement (basis for 2PL)
  • More restrictive than view serializability

View Serializability:

Two schedules are view equivalent if:

  • Same transactions read same initial values
  • If T reads value written by T', it's the same in both schedules
  • Same transactions perform final writes

Comparison:

  • All conflict serializable schedules are view serializable
  • Not all view serializable are conflict serializable
  • Checking view serializability is NP-complete (computationally expensive)
  • Checking conflict serializability is polynomial (efficient)