Transactions, Concurrency & Recovery
Transaction management, concurrency control, and database recovery mechanisms
1. Overview
The last three fundamental components of a database system are:
Transactions
Atomic units of work that must all succeed or all fail
Concurrency Control
Allow multiple transactions simultaneously while maintaining consistency
Recovery
Restore the database to a consistent state after failures
📖 Transaction
A transaction is a collection of operations that perform a single logical function in a database application. It is a unit of work that is atomic - it either completes entirely or not at all.
2. Transaction Concept
Characteristics of a Transaction
Logical Unit of Work
Operations are related to a single task
Atomic
All operations complete or none
Consistent
Transforms database from one consistent state to another
Isolated
Executes independently of other transactions
Transaction Example
BEGIN TRANSACTION;
UPDATE Account A SET balance = balance - 100
WHERE accountID = '12345';
UPDATE Account B SET balance = balance + 100
WHERE accountID = '67890';
-- If any operation fails, ROLLBACK
-- If all succeed, COMMIT
END TRANSACTION;
⚠️ Why This Example Matters
Both updates must succeed. If the first succeeds but the second fails, money is lost. The transaction ensures all-or-nothing execution.
3. Transaction States
📖 Transaction State Diagram
A transaction goes through several states during its execution, representing different points in its lifecycle.
Transaction States Explained
Active
The transaction is executing. SQL statements are being processed.
Partially Committed
The final statement has been executed, but changes are not yet written to disk.
Committed
All changes have been written to disk and the transaction has completed successfully. Cannot be rolled back.
Failed
The transaction has encountered an error and cannot continue normally.
📝 Abort vs Commit
From Failed state, the transaction is aborted (rolled back), returning to the state before the transaction began.
4. ACID Properties
ACID properties are fundamental guarantees that a DBMS must provide to ensure data integrity and reliability.
Atomicity (All or Nothing)
A transaction is atomic: it either executes completely or not at all. There are no partial transactions.
Example: In a bank transfer, either both accounts are updated or neither is.
Consistency
A transaction transforms the database from one consistent state to another consistent state. No integrity constraints are violated.
Example: Total money in the bank never changes during a transfer (only moves between accounts).
Isolation
Transactions execute independently. Incomplete transactions are not visible to other transactions.
Example: One transaction's intermediate changes don't affect another transaction's execution.
Durability
Once committed, the changes are permanent and survive any failures (crashes, power loss, etc.).
Example: Even if the server crashes after COMMIT, the changes are still in the database.
ACID in Practice
| Property | Implemented By |
|---|---|
| Atomicity | Transaction management, Rollback mechanisms |
| Consistency | Integrity constraints, Triggers, Application logic |
| Isolation | Concurrency control, Locking, Timestamp ordering |
| Durability | Logging, Backup, Recovery mechanisms |
5. Transaction Control Commands
COMMIT Statement
Saves all changes made during the transaction to the database permanently.
BEGIN TRANSACTION;
INSERT INTO account VALUES (...);
UPDATE account SET balance = balance + 500;
COMMIT;
ROLLBACK Statement
Undoes all changes made during the transaction, returning to the previous consistent state.
BEGIN TRANSACTION;
UPDATE account A SET balance = balance - 100;
-- Something went wrong...
ROLLBACK;
SAVEPOINT Statement
Creates a point within a transaction to which you can later rollback.
BEGIN TRANSACTION;
UPDATE account A SET balance = balance - 100;
SAVEPOINT sp1;
UPDATE account B SET balance = balance + 100;
-- Error occurred, rollback to savepoint
ROLLBACK TO sp1;
COMMIT;
📝 Transaction Control Flow
- Every statement in a transaction is atomic itself
- COMMIT makes all changes permanent
- ROLLBACK undoes everything since BEGIN
- SAVEPOINT allows partial rollback
6. Concurrent Executions & Issues
When multiple transactions execute simultaneously, several problems can arise if not properly controlled.
Lost Update Problem
Lost Update
Two transactions read the same data and make conflicting updates, causing one update to be overwritten (lost).
Result: A = 120 (T₁'s update is lost!)
Dirty Read Problem
Dirty Read
One transaction reads data that has been written by an uncommitted transaction, which might be rolled back.
T₂ read a value that never existed!
Unrepeatable Read Problem
Unrepeatable Read
A transaction reads the same data twice and gets different values because another transaction modified it between the reads.
Phantom Read Problem
Phantom Read
A transaction re-executes a query and finds new rows that weren't there before due to another transaction's insert.
7. Serializability
📖 Serializability
A concurrent execution of transactions is serializable if its effect is equivalent to some serial execution of the same transactions.
Types of Serializability
Conflict Serializability
Concept
Two transactions have a conflict if they access the same data and at least one of them writes to it.
Conflict Types:
- Read-Write (RW) Conflict
- Write-Read (WR) Conflict
- Write-Write (WW) Conflict
Two schedules are conflict equivalent if: They involve the same transactions and every pair of conflicting operations is ordered the same way in both schedules.
View Serializability
Concept
Two schedules are view equivalent if:
- The same transactions read the same initial values
- If a transaction reads a value written by another, it's the same in both schedules
- The same transactions perform final writes
Note: View serializability is more general but harder to enforce than conflict serializability.
📝 Serializability Goals
Ensure that the concurrent execution of transactions produces the same result as if they were executed serially (one after another), maintaining consistency.
8. Isolation Levels
Isolation levels define the degree to which concurrent transactions are isolated from each other.
READ UNCOMMITTED
Allows: Dirty reads, Non-repeatable reads, Phantom reads
Problems: Maximum concurrency, minimum safety
READ COMMITTED
Prevents: Dirty reads
Allows: Non-repeatable reads, Phantom reads
Common default: Balance between safety and performance
REPEATABLE READ
Prevents: Dirty reads, Non-repeatable reads
Allows: Phantom reads
Locks: All rows read during transaction
SERIALIZABLE
Prevents: All anomalies
Effect: Equivalent to serial execution
Cost: Maximum locks, lowest concurrency
Isolation Level Trade-offs
| Isolation Level | Dirty Read | Non-Repeatable | Phantom Read | Performance |
|---|---|---|---|---|
| READ UNCOMMITTED | ✗ Possible | ✗ Possible | ✗ Possible | ✓ Fastest |
| READ COMMITTED | ✓ Prevented | ✗ Possible | ✗ Possible | Good |
| REPEATABLE READ | ✓ Prevented | ✓ Prevented | ✗ Possible | Fair |
| SERIALIZABLE | ✓ Prevented | ✓ Prevented | ✓ Prevented | ✗ Slowest |
9. Lock-Based Concurrency Control
📖 Locks
A lock is a mechanism to control concurrent access to a database object. Only one transaction can hold a lock on an item at a time.
Types of Locks
Shared Lock (Read Lock)
S
Multiple transactions can hold shared locks on the same item
Prevents write by other transactions
Exclusive Lock (Write Lock)
X
Only one transaction can hold an exclusive lock
Prevents read and write by other transactions
Lock Compatibility Matrix
| Held Lock | Requested Lock | |
|---|---|---|
| Shared (S) | Exclusive (X) | |
| Shared (S) | ✓ Compatible | ✗ Conflict |
| Exclusive (X) | ✗ Conflict | ✗ Conflict |
Two-Phase Locking (2PL)
Two-Phase Locking Protocol
All lock requests must come before any unlock requests.
Growing Phase
Transaction acquires locks but cannot release any lock
Shrinking Phase
Transaction releases locks but cannot acquire new locks
Guarantee: Ensures conflict serializable schedules
BEGIN TRANSACTION;
-- Growing phase
LOCK A IN SHARED MODE; -- Acquire S lock on A
LOCK B IN EXCLUSIVE MODE; -- Acquire X lock on B
READ A;
WRITE B;
-- Shrinking phase
UNLOCK A;
UNLOCK B;
COMMIT;
Strict Two-Phase Locking
All exclusive locks (not shared locks) are held until the transaction commits or aborts. This prevents dirty reads and cascading rollbacks.
10. Deadlock Handling
📖 Deadlock
A deadlock occurs when two or more transactions are waiting for locks held by each other, creating a circular wait condition where neither can proceed.
Deadlock Example
Both waiting for each other = DEADLOCK!
Deadlock Prevention
Wait-Die Protocol
Older transaction waits, younger dies
- If Ti asks for a lock held by Tj
- If Ti is older: wait
- If Ti is younger: die (abort)
Wound-Wait Protocol
Older transaction wounds, younger waits
- If Ti asks for a lock held by Tj
- If Ti is older: wound Tj (abort)
- If Ti is younger: wait
Deadlock Detection
Use a Wait-For-Graph (WFG): If a cycle exists, deadlock is detected.
Nodes: Transactions
Edges:
Ti → Tj means Ti is waiting for a lock held by Tj
If cycle exists: Deadlock detected, abort one transaction
Deadlock Recovery
Choose a victim transaction to abort (usually the youngest or least expensive to roll back).
11. Timestamp-Based Concurrency Control
📖 Timestamp Ordering
A concurrency control method that assigns each transaction a unique timestamp and orders transactions based on these timestamps.
Basic Timestamp Ordering (BTO)
For each data item X, maintain:
RTS(X)- Read timestamp (latest transaction to read X)WTS(X)- Write timestamp (latest transaction to write X)
Timestamp Ordering Rules
READ Operation
Transaction Ti requests to READ X:
- If TS(Ti) < WTS(X): Reject (dirty read)
- Else: Allow and update RTS(X)
WRITE Operation
Transaction Ti requests to WRITE X:
- If TS(Ti) < RTS(X): Reject (outdated write)
- If TS(Ti) < WTS(X): Reject (Thomas Write Rule)
- Else: Allow and update WTS(X)
📝 Timestamp Advantages vs Locks
- ✓ No deadlock (deadlock-free)
- ✓ Automatic ordering
- ✗ More overhead (maintaining timestamps)
- ✗ Cascading rollbacks possible
12. Recovery Concepts
📖 Recovery
Recovery is the process of restoring a database to a consistent state after a failure.
Types of Failures
Transaction Failure
A transaction cannot complete (e.g., constraint violation)
Recovery: ROLLBACK to undo changes
System Failure
Hardware/software crash (power loss)
Recovery: REDO and UNDO operations
Media Failure
Disk crash (permanent data loss)
Recovery: Restore from backup
Recovery Techniques
Deferred Update Recovery
Changes are only written to disk after COMMIT. If failure occurs before COMMIT, no UNDO is needed (REDO only).
Immediate Update Recovery
Changes are written to disk immediately. If failure occurs before COMMIT, both UNDO and REDO might be needed.
Shadowing
Maintain two copies of the database. Write to shadow copy, swap on COMMIT (risky due to I/O overhead).
13. Log-Based Recovery
📖 Transaction Log
A log is a sequence of log records that record all modifications to the database and transaction operations. Used for recovery and rollback.
Log Record Types
| Log Record Type | Format | Purpose |
|---|---|---|
<START T₁> |
START transaction T₁ | Mark transaction start |
<T₁, X, V_old, V_new> |
UPDATE record | Log data modification with before/after values |
<COMMIT T₁> |
COMMIT transaction | Mark successful completion |
<ABORT T₁> |
ABORT transaction | Mark failed transaction |
<CKPT> |
CHECKPOINT | Mark stable point for recovery |
Log Structure Example
UNDO/REDO Operations
UNDO
Restore old value V_old for a failed or aborted transaction
When: Transaction aborted or failure before COMMIT
REDO
Reapply new value V_new for a committed transaction
When: Transaction committed but changes not yet written to disk
Recovery Algorithm
REDO Phase
Scan forward from last CHECKPOINT. For every COMMIT record, REDO the transaction.
UNDO Phase
Scan backward from end. For every ABORT or incomplete transaction (no COMMIT), UNDO the transaction.
📝 CHECKPOINT
A checkpoint marks a point where all modified pages have been written to disk. Allows recovery to start from the checkpoint instead of the beginning of the log, reducing recovery time.
Example Recovery Scenario
Scenario: Crash occurs after T₁ commits but before T₂ commits
Log contents:
<START T₁> <T₁, X, 100, 150> <COMMIT T₁> <START T₂> <T₂, Y, 200, 250> [CRASH]
Recovery:
- REDO: REDO T₁ (X = 150) - already durable
- UNDO: UNDO T₂ (Y = 200) - restore old value
Result: T₁ effects are durable, T₂ is rolled back
14. Practice Questions
Explain ACID properties with bank transfer example.
▼Bank Transfer: Transfer $100 from Account A to Account B
Atomicity (All or Nothing):
Both updates must succeed or both must fail. We cannot have A reduced but B not increased.
Consistency:
Total money in the bank before = Total money after. If A had $1000 and B had $500, after transfer A has $900 and B has $600. Total remains $1500.
Isolation:
Other concurrent transactions cannot see the intermediate state where A is reduced but B is not yet increased.
Durability:
After COMMIT, the transfer is permanent. Even if the server crashes immediately after COMMIT, both accounts will still reflect the transfer when the database recovers.
Define and explain the Lost Update problem with example.
▼Lost Update Problem: Two transactions read the same data and make updates, but one transaction's update is overwritten (lost) by the other.
Example: Two users increment a counter
Initial value: counter = 5
Time T₁ (User 1) T₂ (User 2)
1 READ counter (5)
2 READ counter (5)
3 counter = 5+1=6
4 WRITE counter (6)
5 counter = 5+1=6
6 WRITE counter (6)
Final value: counter = 6 (should be 7!)
T₁'s increment is lost!
Solution: Use locks to ensure one transaction completes before another starts, or use timestamp ordering.
What is Dirty Read? How is it prevented?
▼Dirty Read: A transaction reads data written by another transaction that has not yet committed.
Problem: The writing transaction might be rolled back, leaving the reading transaction with invalid data.
Example:
T₁ reads account balance = $1000
T₂ reads account balance = $1000
T₂ updates balance to $500 and writes it
T₁ reads balance again = $500 (dirty read!)
T₂ ROLLBACK - balance goes back to $1000
T₁ used invalid data
Prevention Methods:
- Locks: Exclusive locks prevent reads until transaction commits
- Isolation Level: Use READ COMMITTED or higher
- 2PL Protocol: Ensures dirty reads don't occur
Explain Two-Phase Locking (2PL) protocol.
▼2PL Protocol: All lock requests must come before any unlock requests.
Two Phases:
1. Growing Phase:
- Transaction acquires locks on data items
- No locks are released
- At end of this phase, transaction has all locks it needs
2. Shrinking Phase:
- Transaction releases locks
- No new locks can be acquired
Guarantee: 2PL ensures conflict-serializable schedules.
Example:
LOCK A (GROWING)
LOCK B (GROWING)
READ A, WRITE B
UNLOCK A (SHRINKING)
UNLOCK B
Note: Strict 2PL holds all exclusive locks until COMMIT to prevent cascading rollback.
What is Deadlock? Explain Wait-Die and Wound-Wait protocols.
▼Deadlock: Circular wait where two or more transactions are waiting for locks held by each other.
Example:
T₁ locks A, waiting for B (held by T₂)
T₂ locks B, waiting for A (held by T₁)
Neither can proceed = DEADLOCK
Wait-Die Protocol:
Older transaction waits, younger transaction dies (aborts)
- If T_old requests lock held by T_young: wait
- If T_young requests lock held by T_old: die (abort)
- Prevents circular waits (older never waits for younger)
Wound-Wait Protocol:
Older transaction wounds (aborts) younger, younger waits for older
- If T_old requests lock held by T_young: wound T_young (abort it)
- If T_young requests lock held by T_old: wait
- Younger transactions avoid being aborted if they start first
Explain log-based recovery with REDO and UNDO operations.
▼Log-Based Recovery: Uses a transaction log to recover from failures.
Log Records:
- <START T> - Transaction begins
- <T, X, old_val, new_val> - Modification record
- <COMMIT T> - Transaction committed
- <ABORT T> - Transaction aborted
REDO Operation: Reapply changes of committed transactions
UNDO Operation: Undo changes of uncommitted transactions
Recovery Algorithm:
Phase 1 - REDO: Scan log forward from last CHECKPOINT
- For each <COMMIT T>, REDO the transaction (reapply new values)
- Ensures durable changes aren't lost
Phase 2 - UNDO: Scan log backward from end
- For each transaction without <COMMIT>, UNDO it (restore old values)
- Removes incomplete transactions
Example:
<START T₁>
<T₁, A, 100, 150>
<COMMIT T₁>
<START T₂>
<T₂, B, 200, 250>
[CRASH - no COMMIT for T₂]
Recovery:
- REDO T₁ (A = 150)
- UNDO T₂ (B = 200)
Explain the difference between Conflict and View Serializability.
▼Serializability: A concurrent execution is serializable if it produces the same result as some serial execution.
Conflict Serializability:
Two schedules are conflict equivalent if every pair of conflicting operations is ordered the same way in both schedules.
- Conflicting operations: involve same data item, at least one is write
- Easier to implement (basis for 2PL)
- More restrictive than view serializability
View Serializability:
Two schedules are view equivalent if:
- Same transactions read same initial values
- If T reads value written by T', it's the same in both schedules
- Same transactions perform final writes
Comparison:
- All conflict serializable schedules are view serializable
- Not all view serializable are conflict serializable
- Checking view serializability is NP-complete (computationally expensive)
- Checking conflict serializability is polynomial (efficient)