Dealing with Contention

Optimistic locking, pessimistic locking, versioning, and handling concurrent writes.

Race conditions (conflicts) occur when multiple processes compete for the same resource simultaneously—booking the last concert ticket, bidding on an auction item, or transferring money between accounts. Most contention problems are solved with database transactions and row locking within a single database. Only reach for distributed coordination when you've truly outgrown vertical scaling, which happens much later than most engineers think.

Start with pessimistic locking (acquire locks upfront) for high-contention scenarios where conflicts are likely. Use optimistic concurrency control (detect conflicts after they occur) when conflicts are rare but you need high throughput. Distributed coordination adds significant complexity and should be your last resort.

The Double-Booking Problem

Two users want the last seat for a Taylor Swift concert. Alice and Bob both click "Buy Now" at exactly the same moment. Without proper coordination:

sequenceDiagram
    participant Alice
    participant Server
    participant Bob
    Alice->>Server: Check availability
    Bob->>Server: Check availability
    Server-->>Alice: 1 seat available
    Server-->>Bob: 1 seat available
    Alice->>Server: Purchase seat
    Bob->>Server: Purchase seat
    Server-->>Alice: Payment successful
    Server-->>Bob: Payment successful
    Note over Alice,Bob: Both get the same seat number

Both users see "1 seat available," both proceed to payment, both get charged $500, and both receive confirmation for Row 5, Seat 12. One person gets kicked out at the venue, and the venue issues a refund while dealing with two angry customers.

This race condition happens because there's a gap between "check current state" and "update based on that state"—a classic read-modify-write problem where instruction interleaving allows the world to change between operations. In that tiny window—microseconds in memory, milliseconds over a network—concurrent processes can see stale data and make decisions based on outdated assumptions. The problem scales with concurrent users: 10,000 people hitting the same resource creates massive conflicts.

Part 1: Single Database Solutions

When all your data lives in a single database, contention solutions are straightforward. Modern databases like PostgreSQL can handle tens of terabytes and thousands of concurrent connections, covering the vast majority of applications.

Quick Decision Matrix

Approach                 Contention Level   Operational Complexity   Use Cases
Database Transactions    Any                Low                      Money transfers, order processing
Pessimistic Locking      High               Low                      Seat booking, inventory with limited stock
Optimistic Concurrency   Low                Medium                   User profiles, product catalogs
SERIALIZABLE Isolation   Medium             Low                      Financial reporting, audit trails

Database Transactions

Transactions provide atomicity—a group of operations either all succeed or all fail, maintaining data invariants (business rules like "money can't be created or destroyed"). No partial completion. If you're transferring money between accounts, either both the debit and credit happen, or neither does.

BEGIN TRANSACTION;

-- Debit Alice's account
UPDATE accounts SET balance = balance - 100 WHERE user_id = 'alice';

-- Credit Bob's account  
UPDATE accounts SET balance = balance + 100 WHERE user_id = 'bob';

COMMIT; -- Both operations succeed together

For concert tickets, atomicity ensures seat reservation and ticket creation happen together:

BEGIN TRANSACTION;

-- Reserve the seat
UPDATE concerts
SET available_seats = available_seats - 1
WHERE concert_id = 'taylor_swift'
  AND available_seats > 0;

-- Create the ticket record
INSERT INTO tickets (user_id, concert_id, seat_number, purchase_time)
VALUES ('user123', 'taylor_swift', 'A15', NOW());

COMMIT;

But transactions alone don't prevent race conditions. If the availability check and the update are separate statements (a plain SELECT followed by an unconditional UPDATE), two people can start transactions simultaneously, both see available_seats = 1, and both succeed in reserving the same seat. We need coordination mechanisms.

Pessimistic Locking

Pessimistic locking prevents conflicts by acquiring locks upfront. The approach assumes conflicts will happen and blocks them before they occur.

BEGIN TRANSACTION;

-- Lock the row first to prevent race conditions
SELECT available_seats FROM concerts
WHERE concert_id = 'taylor_swift'
FOR UPDATE;

-- Safely update only if seats are available
UPDATE concerts
SET available_seats = available_seats - 1
WHERE concert_id = 'taylor_swift'
  AND available_seats > 0;

-- Application checks affected row count here
-- If 0 rows updated, ROLLBACK instead of inserting

INSERT INTO tickets (user_id, concert_id, seat_number, purchase_time)
VALUES ('user123', 'taylor_swift', 'A15', NOW());

COMMIT;

The FOR UPDATE clause acquires an exclusive lock on the concert row. When Alice runs this code, Bob's identical transaction blocks at the SELECT statement until Alice's transaction completes. This serializes access and prevents both from seeing the same initial seat count.

The AND available_seats > 0 guard is critical. Locking prevents concurrent access, but doesn't enforce business rules. Without that check, the second transaction would decrement seats to -1.

Performance considerations matter with locks. Lock as few rows as possible for as short a time as possible. Lock entire tables and you kill concurrency. Hold locks for seconds instead of milliseconds and you create bottlenecks.

Optimistic Concurrency Control

Optimistic concurrency control assumes conflicts are rare and detects them after they occur. Instead of blocking transactions with locks, you let them proceed and retry the ones that conflict.

The pattern stores a version number alongside your data. Every update increments the version. When updating, specify both the new value and the expected current version:

-- Alice reads: concert has 1 seat, version 42
-- Bob reads: concert has 1 seat, version 42

-- Alice tries to update first:
BEGIN TRANSACTION;
UPDATE concerts 
SET available_seats = available_seats - 1, version = version + 1
WHERE concert_id = 'taylor_swift' 
  AND version = 42;  -- Expected version

INSERT INTO tickets (user_id, concert_id, seat_number, purchase_time)
VALUES ('alice', 'taylor_swift', 'A15', NOW());
COMMIT;

-- Alice's update succeeds, seats = 0, version = 43

-- Bob tries to update:
BEGIN TRANSACTION;
UPDATE concerts
SET available_seats = available_seats - 1, version = version + 1
WHERE concert_id = 'taylor_swift'
  AND version = 42;  -- Stale version!

-- Bob's UPDATE affects 0 rows (version mismatch)
-- Application MUST check affected row count and ROLLBACK
ROLLBACK;

The UPDATE with a stale version won't raise a database error—it just updates zero rows. Your application must check the affected row count and roll back if zero rows matched. When Bob's update affects zero rows, he knows someone else modified the record and can retry with the current version.
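
The mechanics can be simulated in isolation. This sketch uses a plain JavaScript object in place of the database row (the function name and row shape are illustrative) to show why Bob's stale-version update is a silent no-op rather than an error:

```javascript
// In-memory stand-in for the concerts row
const concert = { id: 'taylor_swift', availableSeats: 1, version: 42 };

// Mimics UPDATE ... WHERE version = expectedVersion AND available_seats > 0;
// returns the number of rows affected, as a database driver would report it
function optimisticDecrement(row, expectedVersion) {
  if (row.version !== expectedVersion || row.availableSeats <= 0) {
    return 0; // no rows matched: stale version or sold out
  }
  row.availableSeats -= 1;
  row.version += 1;
  return 1;
}

// Alice and Bob both read version 42
const aliceRows = optimisticDecrement(concert, 42); // 1 row: succeeds, version is now 43
const bobRows = optimisticDecrement(concert, 42);   // 0 rows: version 42 is stale
```

Bob's code must treat the zero row count as a conflict signal, roll back, and re-read the row before retrying.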

You can use existing data as the version instead of a separate column:

-- Use available seats count as the version
UPDATE concerts 
SET available_seats = available_seats - 1
WHERE concert_id = 'taylor_swift' 
  AND available_seats = 1;  -- Expected current value

Be careful with the ABA problem: a value changes from A to B and back to A between your read and write, making it appear unchanged when important state transitions occurred. For example, if a bank balance goes from $100 to $50 and back to $100, your optimistic check passes even though transactions happened. Use business values as versions only when they change monotonically (like auction bids). A dedicated version column is safest.
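
The ABA hazard can be demonstrated in a few lines. A minimal sketch, using a plain object for the account row (helper names are invented): a value-based check passes after the round trip, while a version counter catches the intervening writes.

```javascript
const account = { balance: 100, version: 7 };

// Every write bumps the version, even if the balance returns to an old value
function applyTx(acct, delta) {
  acct.balance += delta;
  acct.version += 1;
}

// Reader snapshots the row before deciding what to write
const seenBalance = account.balance; // 100
const seenVersion = account.version; // 7

// Meanwhile, two writes occur: $100 -> $50 -> $100
applyTx(account, -50);
applyTx(account, +50);

// Value-based check passes even though the row changed twice (ABA)
const valueCheckPasses = account.balance === seenBalance;   // true
// Version-based check correctly detects the intervening writes
const versionCheckPasses = account.version === seenVersion; // false
```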

Optimistic concurrency works well when conflicts are uncommon. For most e-commerce scenarios, the chance of two people buying the exact same item simultaneously is low. The occasional retry is worth avoiding locking overhead.

Isolation Levels

Instead of explicit locking, you can raise the isolation level to let the database automatically handle conflicts. Isolation levels control how much concurrent transactions can see of each other's changes.

-- Set isolation level for this transaction
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;

UPDATE concerts
SET available_seats = available_seats - 1
WHERE concert_id = 'taylor_swift'
  AND available_seats > 0;

INSERT INTO tickets (user_id, concert_id, seat_number, purchase_time)
VALUES ('user123', 'taylor_swift', 'A15', NOW());

COMMIT;

With SERIALIZABLE isolation, the database detects conflicts and aborts one transaction if they would interfere. Implementation varies: PostgreSQL uses predicate locks and dependency tracking, while MySQL uses gap locks. The aborted transaction must retry. This provides strong consistency but is more expensive than explicit locks—the database must track all reads and writes to detect conflicts.
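
Because SERIALIZABLE aborts conflicting transactions rather than blocking them, the caller must be ready to retry. A sketch of the retry loop in plain JavaScript; `withSerializableRetry`, `bookSeat`, and the error class are illustrative stand-ins (in PostgreSQL, for example, these aborts surface as SQLSTATE 40001, serialization_failure):

```javascript
class SerializationError extends Error {}

// Retry a transaction function until it commits or attempts run out
async function withSerializableRetry(txFn, maxAttempts = 5) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await txFn();
    } catch (err) {
      if (!(err instanceof SerializationError) || attempt === maxAttempts) {
        throw err; // non-retryable error, or out of attempts
      }
      // brief jittered pause before retrying
      await new Promise(r => setTimeout(r, Math.random() * 20));
    }
  }
}

// Simulated transaction: aborted once by the database, succeeds on retry
let calls = 0;
async function bookSeat() {
  calls += 1;
  if (calls === 1) throw new SerializationError('could not serialize access');
  return 'booked';
}
```

Running `withSerializableRetry(bookSeat)` resolves to 'booked' on the second attempt; a non-serialization error would propagate immediately.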

Part 2: Distributed Coordination

When you need to coordinate updates across multiple databases, complexity increases significantly. If you can keep related data in a single database, do it. Distributed coordination should be your last resort.

Quick Decision Matrix

Approach            Consistency   Availability   Operational Complexity   Use Cases
Two-Phase Commit    Strong        Low            Very High                Banking transfers, ACID across services
Saga Pattern        Eventual      High           High                     E-commerce checkout, microservices
Distributed Locks   Strong        Medium         Medium                   Seat reservations, leader election

Two-Phase Commit (2PC)

Two-phase commit coordinates transactions across multiple databases. A coordinator service manages the transaction across database participants.

sequenceDiagram
    participant Coordinator
    participant DB_A as Database A
    participant DB_B as Database B
    
    Coordinator->>DB_A: PREPARE (debit Alice $100)
    Coordinator->>DB_B: PREPARE (credit Bob $100)
    DB_A-->>Coordinator: PREPARED
    DB_B-->>Coordinator: PREPARED
    
    Note over Coordinator: All participants prepared
    
    Coordinator->>DB_A: COMMIT
    Coordinator->>DB_B: COMMIT
    DB_A-->>Coordinator: COMMITTED
    DB_B-->>Coordinator: COMMITTED

The coordinator asks all participants to prepare the transaction in phase one. If everyone prepares successfully, it tells them to commit in phase two. If any participant fails to prepare, everyone aborts.
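
The vote-then-decide shape can be sketched with in-memory participants. This is a toy model (the participant objects and helper names are invented); it captures only the two phases and deliberately omits the persistent log and crash recovery that real implementations need:

```javascript
// Phase 1: collect votes. Phase 2: broadcast the unanimous decision.
function twoPhaseCommit(participants) {
  const votes = participants.map(p => p.prepare());
  const decision = votes.every(v => v === 'PREPARED') ? 'COMMIT' : 'ABORT';
  for (const p of participants) {
    if (decision === 'COMMIT') p.commit();
    else p.abort();
  }
  return decision;
}

function makeParticipant(name, canPrepare) {
  return {
    name,
    state: 'IDLE',
    prepare() { this.state = canPrepare ? 'PREPARED' : 'ABORTED'; return this.state; },
    commit() { this.state = 'COMMITTED'; },
    abort() { this.state = 'ABORTED'; },
  };
}

const dbA = makeParticipant('A', true);
const dbB = makeParticipant('B', true);
const outcome = twoPhaseCommit([dbA, dbB]); // 'COMMIT': both prepared
```

If either participant fails to prepare, the decision flips to 'ABORT' and every participant rolls back.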

The coordinator must write to a persistent log before sending commit decisions. Without this log, coordinator crashes create unrecoverable situations where participants don't know whether to commit or abort.

The prepare phase holds locks on affected rows, blocking other operations. If the coordinator crashes between prepare and commit, databases wait indefinitely for a decision. This blocking problem is 2PC's biggest weakness.

2PC guarantees atomicity across multiple systems but sacrifices availability. Use only when you absolutely need strong consistency across databases and can tolerate the operational complexity.

Saga Pattern

The saga pattern breaks operations into independent steps that can each be undone. Instead of coordinating everything atomically, you do operations sequentially and compensate if something fails.

sequenceDiagram
    participant Service
    participant DB_A as Database A
    participant DB_B as Database B
    
    Service->>DB_A: Debit Alice $100
    DB_A-->>Service: SUCCESS
    
    Service->>DB_B: Credit Bob $100
    DB_B-->>Service: FAILURE
    
    Note over Service: Compensate for failure
    
    Service->>DB_A: Credit Alice $100 (compensation)
    DB_A-->>Service: SUCCESS

For a bank transfer:

  1. Debit $100 from Alice's account in Database A, commit immediately
  2. Credit $100 to Bob's account in Database B, commit immediately
  3. Send confirmation notifications

If step 2 fails (Bob's account doesn't exist), run compensation for step 1 (credit Alice's account back).

Each step is a complete, committed transaction. No long-running locks across network calls. The system is temporarily inconsistent—after step 1, Alice's account is debited but Bob's isn't credited yet. This eventual consistency is what makes sagas practical.

Sagas need a durable coordinator to track progress. If the coordinator crashes after step 1 but before step 2, you need to resume. Workflow engines like Temporal handle this orchestration.

Distributed Locks

Distributed locks ensure only one process can work on a resource across your entire system. Instead of coordinating complex transactions, you prevent concurrent access entirely.

// Acquire locks on both account IDs before starting
const lockKeys = ['account:alice', 'account:bob'].sort(); // Consistent ordering
const locks = await acquireLocks(lockKeys, { ttl: 30000 }); // 30 second TTL

try {
  // Perform transfer operations
  await debitAccount('alice', 100);
  await creditAccount('bob', 100);
} finally {
  await releaseLocks(locks);
}

Implementation options:

Redis with TTL:

// Atomic lock acquisition with expiration
const acquired = await redis.set(
  `lock:${resourceId}`, 
  processId, 
  'PX', 30000,  // 30 second TTL
  'NX'          // Only set if not exists
);

Database columns:

-- Add lock columns to existing tables
UPDATE resources 
SET locked_by = 'process123', locked_until = NOW() + INTERVAL '30 seconds'
WHERE resource_id = 'concert_123' 
  AND (locked_until IS NULL OR locked_until < NOW());

ZooKeeper/etcd:

// Ephemeral sequential nodes auto-delete when the client disconnects
// (API shape varies by client library; shown schematically)
const lockPath = await zk.create(
  `/locks/${resourceId}`, 
  processId,
  { ephemeral: true, sequential: true }
);

Distributed locks work well for user-facing flows. Instead of letting users compete for resources, create intermediate states that give temporary exclusive access. When users select concert seats, immediately reserve them with a 10-minute TTL. This prevents the terrible UX of users filling out payment info only to find the seat was taken.
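
The seat-hold idea can be sketched as an in-memory reservation map with expiry, a single-process stand-in for the Redis SET NX PX pattern above (the map and function names are illustrative):

```javascript
// seatId -> { userId, expiresAt }
const holds = new Map();

// Try to hold a seat for userId; succeeds only if no unexpired hold exists.
// `now` is injectable so expiry is easy to reason about and test.
function tryHoldSeat(seatId, userId, ttlMs, now = Date.now()) {
  const existing = holds.get(seatId);
  if (existing && existing.expiresAt > now) {
    return false; // someone else still holds it
  }
  holds.set(seatId, { userId, expiresAt: now + ttlMs });
  return true;
}

const t0 = 1000000;
const aliceHold = tryHoldSeat('A15', 'alice', 600000, t0);           // true: 10-minute hold
const bobNow = tryHoldSeat('A15', 'bob', 600000, t0 + 1000);         // false: still held
const bobLater = tryHoldSeat('A15', 'bob', 600000, t0 + 700000);     // true: Alice's hold expired
```

The TTL is what makes this safe: if Alice abandons checkout, the seat frees itself without any cleanup job.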

Common Challenges

Deadlock Prevention

Deadlocks occur when transactions acquire locks in different orders. Alice transfers to Bob while Bob transfers to Alice. Transaction A locks Alice's account then tries to lock Bob's. Transaction B locks Bob's account then tries to lock Alice's. Both wait forever.

graph LR
    subgraph "Transaction A"
    A1[Lock Alice] --> A2[Wait for Bob]
    end
    
    subgraph "Transaction B"  
    B1[Lock Bob] --> B2[Wait for Alice]
    end
    
    A2 -.-> B1
    B2 -.-> A1

The solution is ordered locking—always acquire locks in consistent order regardless of business logic. Sort all resources by ID before locking:

// Always lock accounts in ID order, not business logic order
const accountIds = [aliceId, bobId].sort();
for (const accountId of accountIds) {
  await acquireLock(accountId);
}

If Alice (ID 456) transfers to Bob (ID 123), lock Bob first because 123 < 456. If Bob transfers to Alice, still lock Bob first. The ordering scheme doesn't matter as long as it's globally consistent.

The Celebrity Problem

When everyone wants the same resource—Taylor Swift announces a surprise concert and millions try to buy tickets simultaneously. Normal scaling strategies break down because you can't split one concert across multiple databases.

graph TD
    subgraph "Queue-Based Serialization"
    Requests[Millions of Requests] --> Queue[Single Queue]
    Queue --> Worker[Single Worker Thread]
    Worker --> DB[(Database)]
    end

Implement queue-based serialization. Put all requests for the hot resource into a dedicated queue processed by a single worker. This eliminates contention by making operations sequential rather than concurrent.

The tradeoff is latency—users wait longer for processing. But this is better than your entire system grinding to a halt under contention.
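
The single-worker idea can be sketched in-process; `processSequentially` is an illustrative helper, not a production queue, but it shows why draining requests one at a time cannot oversell:

```javascript
// All requests for the hot resource drain through one sequential worker
function processSequentially(requests, inventory) {
  const results = [];
  for (const req of requests) { // one at a time: no interleaving, no races
    if (inventory.availableSeats > 0) {
      inventory.availableSeats -= 1;
      results.push({ user: req.user, status: 'confirmed' });
    } else {
      results.push({ user: req.user, status: 'sold_out' });
    }
  }
  return results;
}

const inventory = { availableSeats: 2 };
const requests = [{ user: 'alice' }, { user: 'bob' }, { user: 'carol' }];
const results = processSequentially(requests, inventory);
// alice and bob confirmed, carol sold_out; inventory never goes negative
```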

Coordinator Failures

In 2PC, coordinator crashes leave databases with prepared transactions waiting for commit/abort decisions. Those transactions hold locks, potentially blocking other operations indefinitely.

Production systems handle this with coordinator failover and transaction recovery. When a new coordinator starts, it reads persistent logs to determine which transactions were in-flight and completes them.

Sagas are more resilient—coordinator failure just pauses progress rather than leaving participants in limbo. No locks are held across network calls.

Real-World Failure Scenarios

Payment Processing Failures:

// Seat reserved, payment fails - need compensation
try {
  await reserveSeat(concertId, seatId);
  await processPayment(userId, amount);
} catch (PaymentError) {
  await releaseSeat(concertId, seatId); // Compensation
  throw new BookingFailedError('Payment declined');
}

Inventory Sync Conflicts:

-- Same seats sold on multiple platforms simultaneously
-- Platform A: Ticketmaster
UPDATE concerts SET available_seats = available_seats - 1 
WHERE id = ? AND available_seats = ? AND platform_version = ?;

-- Platform B: StubHub (different version)
UPDATE concerts SET available_seats = available_seats - 1 
WHERE id = ? AND available_seats = ? AND platform_version = ?;
-- One succeeds, other gets version conflict and must sync

Network Partition Scenarios:

graph TD
    subgraph "East Coast"
    E1[User East] --> E2[East Server]
    E2 --> E3[East DB]
    end
    
    subgraph "West Coast"  
    W1[User West] --> W2[West Server]
    W2 --> W3[West DB]
    end
    
    E3 -.->|Network Split| W3
    
    Note1[Both try to book last seat]
    Note2[Partition heals: conflict resolution needed]

These scenarios show why simple solutions (single database) are preferable when possible—they eliminate entire classes of failure modes that distributed systems must handle.

Conclusion

Contention handling is essential for reliable systems, but the right approach isn't what most engineers expect. Exhaust single-database solutions before considering distributed coordination. Modern databases handle the vast majority of applications, and the complexity jump to distributed systems comes with significant overhead.

Stay within a single database as long as possible. Pessimistic locking handles high contention predictably. Optimistic concurrency delivers excellent performance when conflicts are rare. Only move to distributed coordination when you've truly outgrown vertical scaling.

The simplest solution that works is almost always the right one. Adding distributed coordination introduces new failure modes—network partitions, coordinator crashes, split-brain scenarios—that single-database solutions avoid entirely.

Appendix: Implementation Patterns

Core Concepts

  • Lock: Mechanism preventing concurrent access to a resource; provides mutual exclusion.
  • Invariant: Business rule that must always be true (e.g., "account balance ≥ 0").
  • Mutual Exclusion: Only one process can access a critical section at a time.
  • Throughput vs Locks: More locks = less concurrency = lower throughput. Balance safety with performance.

System-Specific Implementations

  • PostgreSQL: MVCC with snapshot isolation; predicate locks for SERIALIZABLE
  • Redis: Single-threaded execution eliminates race conditions; Lua scripts for atomicity
  • DynamoDB: Conditional writes with expected values; optimistic concurrency built in
  • MySQL: MVCC with gap locks; automatic deadlock detection and rollback

Programming Language Primitives (Java)

// Synchronized blocks - mutual exclusion
synchronized(lockObject) { /* critical section */ }

// ReentrantLock - explicit locking with timeout
ReentrantLock lock = new ReentrantLock();
if (lock.tryLock(5, TimeUnit.SECONDS)) {
  try { /* work */ } finally { lock.unlock(); } // always release
}

// AtomicInteger - lock-free operations
AtomicInteger counter = new AtomicInteger();
counter.compareAndSet(expected, newValue); // CAS operation

// Semaphore - limit concurrent access
Semaphore permits = new Semaphore(10);
permits.acquire(); /* work */ permits.release();

Retry Logic for Optimistic Concurrency

Handles version conflicts with exponential backoff and jittered delays to prevent thundering herd.

async function updateWithRetry(resourceId, updateFn, maxRetries = 3) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const resource = await db.findById(resourceId);
      const updated = updateFn(resource);
      
      const result = await db.update(resourceId, updated, {
        where: { version: resource.version }
      });
      
      if (result.affectedRows === 0) {
        throw new OptimisticLockError('Version mismatch');
      }
      
      return result;
    } catch (error) {
      if (error instanceof OptimisticLockError && attempt < maxRetries - 1) {
        await sleep(Math.pow(2, attempt) * 50 + Math.random() * 50); // Exponential backoff with jitter
        continue;
      }
      throw error;
    }
  }
}

Distributed Lock with Fencing Tokens

Prevents split-brain scenarios where expired lock holders continue operating with stale tokens.

class DistributedLock {
  async acquire(resourceId, ttl = 30000) {
    const token = generateMonotonicToken();
    const acquired = await redis.set(
      `lock:${resourceId}`,
      token,
      'PX', ttl,
      'NX'
    );
    
    if (!acquired) {
      throw new LockAcquisitionError('Resource already locked');
    }
    
    return { resourceId, token, ttl };
  }
  
  async validateToken(resourceId, token) {
    const currentToken = await redis.get(`lock:${resourceId}`);
    return currentToken === token;
  }
}

// Storage layer validates tokens. For true fencing, the store should also
// reject any token lower than the highest it has seen, atomically with the write.
async function updateResource(resourceId, data, fencingToken) {
  const isValid = await lock.validateToken(resourceId, fencingToken);
  if (!isValid) {
    throw new StaleTokenError('Lock expired or stolen');
  }
  
  return db.update(resourceId, data);
}

Saga Orchestration Pattern

Manages multi-step transactions with compensation logic for failure recovery.

class TransferSaga {
  constructor(fromAccount, toAccount, amount) {
    this.steps = [
      { action: 'debitAccount', compensation: 'creditAccount' },
      { action: 'creditAccount', compensation: 'debitAccount' },
      { action: 'sendNotification', compensation: 'sendFailureNotification' }
    ];
    this.state = { fromAccount, toAccount, amount, completedSteps: [] };
  }
  
  async execute() {
    try {
      for (const [index, step] of this.steps.entries()) {
        await this.executeStep(step, index);
        this.state.completedSteps.push(index);
        await this.persistState();
      }
    } catch (error) {
      await this.compensate();
      throw error;
    }
  }
  
  async compensate() {
    // Run compensations in reverse order
    for (let i = this.state.completedSteps.length - 1; i >= 0; i--) {
      const stepIndex = this.state.completedSteps[i];
      const step = this.steps[stepIndex];
      await this.executeCompensation(step.compensation, stepIndex);
    }
  }
}

Connection Lifecycle Examples

Pessimistic Locking Timeline:

1. Transaction A: SELECT FOR UPDATE (acquires lock)
2. Transaction B: SELECT FOR UPDATE (blocks waiting)
3. Transaction A: UPDATE + INSERT + COMMIT (releases lock)
4. Transaction B: Unblocks, continues execution

Optimistic Concurrency Timeline:

1. Transaction A: SELECT (version = 42)
2. Transaction B: SELECT (version = 42)  
3. Transaction A: UPDATE WHERE version = 42 (succeeds, version = 43)
4. Transaction B: UPDATE WHERE version = 42 (0 rows affected)
5. Transaction B: Detects conflict, retries with version = 43

This foundation explains why each approach has different performance characteristics and when to choose each pattern for your specific contention scenarios.

Thought Exercise: Concert Booking System

Scenario: Design ticket booking for a 50,000-seat stadium with 1 million concurrent users.

Basic Single-Seat Booking:

-- Pessimistic: Lock entire concert row, serialize all purchases
SELECT * FROM concerts WHERE id = 'taylor_swift' FOR UPDATE;
-- Pro: No overselling, simple logic
-- Con: Terrible throughput; every purchase serializes on one row lock

-- Optimistic: Use available_seats as version, allow concurrent reads
UPDATE concerts SET available_seats = available_seats - 1 
WHERE id = 'taylor_swift' AND available_seats = ?;
-- Pro: High throughput when seats available
-- Con: High retry rate as seats become scarce

Multi-Tier Pricing (Different Contention Patterns):

-- VIP seats: High contention, low volume, high value
UPDATE seat_tiers 
SET available_count = available_count - 1
WHERE concert_id = ? AND tier = 'VIP' AND available_count > 0;

-- General admission: Medium contention, high volume
UPDATE seat_tiers 
SET available_count = available_count - 1
WHERE concert_id = ? AND tier = 'GENERAL' AND available_count > 0;
-- Different locking strategies per tier based on business value

Group Bookings (Complex Atomicity):

-- Reserve 8 consecutive seats - all-or-nothing
BEGIN TRANSACTION;
SELECT section_id, row_number, start_seat 
FROM seat_blocks 
WHERE concert_id = ? AND consecutive_seats >= 8 
FOR UPDATE SKIP LOCKED; -- Skip already locked blocks

UPDATE seat_blocks 
SET available_seats = available_seats - 8
WHERE section_id = ? AND row_number = ? AND start_seat = ?;
-- Demonstrates row-level vs block-level locking trade-offs

Dynamic Pricing (Business Logic Conflicts):

-- Price updates based on remaining inventory
UPDATE concerts 
SET current_price = calculate_dynamic_price(available_seats),
    price_version = price_version + 1
WHERE id = ? AND price_version = ?;
-- Shows optimistic concurrency in pricing algorithms
-- Conflicts when multiple price updates happen simultaneously

Waitlist Management (Saga Pattern):

sequenceDiagram
    participant User
    participant Booking
    participant Waitlist
    participant Payment
    
    User->>Booking: Cancel ticket
    Booking->>Booking: Release seat
    Booking->>Waitlist: Notify next person
    Waitlist->>Payment: Reserve seat (10min TTL)
    Payment-->>User: Payment timeout
    Payment->>Waitlist: Return to waitlist (compensation)

Scalability Scenarios:

  • Celebrity vs Local Band: 1M concurrent (Taylor Swift) vs 100 concurrent (local venue)
  • Geographic Distribution: Same artist, multiple cities, shared inventory pools
  • Flash Sales: Presale windows with time-based distributed locks

Each approach trades off throughput, fairness, and complexity differently. The key insight: contention patterns vary dramatically based on business context, not just technical requirements.