Real-time Updates

WebSockets, Server-Sent Events, long polling, and pushing data to clients.

The 80/20

Real-time updates involve two separate challenges: (1) delivering data from your server to clients, and (2) routing updates from their source to the correct server instance.

For client delivery, you have three main options:

  • Simple polling (2-5 second latency) - easiest to implement and operate
  • Server-Sent Events (sub-second latency) - one-way streaming over HTTP
  • WebSockets (millisecond latency) - bidirectional with operational complexity

Start with polling. Most applications don't need real-time precision, and you can always upgrade the protocol later while keeping the same server-side architecture.

The Two-Hop Problem

Real-time systems face two distinct routing challenges:

graph LR
    A[Update Source] -->|Hop 2| B[Your Server]
    B -->|Hop 1| C[Client]

Hop 1 (Server → Client): How do you deliver updates to clients? HTTP is request-response by design—clients ask, servers answer, connection closes. For real-time delivery, you need persistent connections or efficient polling.

Hop 2 (Source → Server): How do updates reach the right server instance? In distributed systems, clients connect to different servers. When data changes, you need to route that update to whichever server holds the client's connection.

These are independent problems with different solutions. You might use WebSockets for Hop 1 and Redis pub/sub for Hop 2, or polling for Hop 1 and database queries for Hop 2.

Part 1: Client-Server Protocols

The core challenge: HTTP connections close after each request-response cycle. To deliver updates immediately, you need to either keep connections open or poll frequently enough to meet your latency requirements.

Quick Decision Matrix

Protocol       | Latency   | Direction     | Operational Complexity | Infrastructure             | Use When
Simple Polling | 2-5s      | Pull          | Low                    | Standard HTTP              | Dashboards, status pages
Long Polling   | 100-500ms | Push          | Low                    | Standard HTTP              | Infrequent updates, payments
SSE            | 50-200ms  | Server→Client | Medium                 | HTTP streaming support     | Live feeds, AI token streaming
WebSockets     | 10-100ms  | Bidirectional | High                   | L4 load balancer preferred | Chat, collaboration
WebRTC         | 10-50ms   | Peer-to-peer  | Very High              | STUN/TURN servers          | Video calls, gaming

graph TD
    A[Need Real-time Updates?] --> B{Latency Requirement?}
    B -->|> 2 seconds OK| C[Simple Polling]
    B -->|< 2 seconds| D{Bidirectional?}
    D -->|No, one-way| E[SSE]
    D -->|Yes| F{High frequency?}
    F -->|No| G[Long Polling]
    F -->|Yes| H[WebSockets]
    D -->|Peer-to-peer| I[WebRTC]

Simple Polling

The simplest approach is to make an HTTP request every N seconds. Your client asks for updates at regular intervals, the server responds immediately, and the connection closes. Then you wait and repeat.

setInterval(() => {
  fetch('/api/messages')
    .then(res => res.json())
    .then(data => updateUI(data));
}, 2000);

The resource implications are straightforward. If you have 10,000 clients polling every 5 seconds, you're handling 2,000 requests per second regardless of whether anything changed. This creates predictable load patterns—no surprise traffic spikes, no connection state to manage, no debugging WebSocket disconnections at 3am. Your infrastructure team will appreciate the consistency.

The main trade-offs are (1) wasted bandwidth and (2) added latency. When nothing changes, you're still paying for full round trips. You can reduce the overhead with:

  • HTTP keep-alive: Reuses TCP connections, eliminating handshake overhead (typically 40ms per request)
  • Conditional requests: ETags or If-Modified-Since headers return 304 Not Modified when data hasn't changed, reducing payload size
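
A sketch of the conditional-request variant (the /api/messages endpoint and updateUI callback are placeholders carried over from the examples above):

```javascript
// Polling with conditional requests: remember the last ETag and send it
// back, so unchanged data costs only a 304 response with no body.
let lastEtag = null;

async function pollOnce() {
  const headers = lastEtag ? { 'If-None-Match': lastEtag } : {};
  const res = await fetch('/api/messages', { headers });
  if (res.status === 304) return null;   // nothing changed since last poll
  lastEtag = res.headers.get('ETag');    // remember the new version tag
  return res.json();
}

// Usage: setInterval(async () => {
//   const data = await pollOnce();
//   if (data) updateUI(data);
// }, 5000);
```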

This works because you're trading latency for simplicity. Updates arrive within your polling interval plus request processing time—typically 2-5 seconds total. For dashboards, job monitoring, or analytics, this delay is acceptable. Users don't need millisecond precision; they need reliable updates without operational complexity.

Polling breaks down in two scenarios: when you need sub-second latency (chat applications, collaborative editors, gaming), or when update frequency is very low but delivery must be immediate. If updates happen once per hour but users need them within seconds, you're wasting 3,599 requests for every useful one. For everything else, start with polling.

Long Polling

Long polling improves on simple polling by keeping the HTTP connection open until new data arrives. The client makes a request, the server waits (potentially 30-60 seconds) until something changes, then responds. The client immediately makes a new request. This eliminates wasted requests when nothing is happening.

async function longPoll() {
  while (true) {
    try {
      const response = await fetch('/api/updates');
      const data = await response.json();
      processData(data);
    } catch (error) {
      await new Promise(resolve => setTimeout(resolve, 1000));
    }
  }
}

The latency characteristics are more complex than simple polling. The first update arrives quickly—just network round-trip time. But if multiple updates happen in quick succession, you pay the full round-trip cost for each one. With 100ms network latency, two updates 10ms apart will arrive 100ms and 290ms after they occurred (90ms for first response to complete, 100ms for second request, 100ms for second response).

sequenceDiagram
    participant Client
    participant Server
    Client->>Server: Long poll request
    Note over Server: Wait for update...
    Note over Server: Update available!
    Server-->>Client: Response with data
    Client->>Server: Immediate new request
    Note over Server: Wait for update...

The server-side implementation requires holding connections open, which consumes resources. Each waiting request occupies a thread or async task. Modern async frameworks (Node.js, Go, Rust) handle this efficiently by using event loops rather than blocking threads—a single thread can manage thousands of waiting connections. Traditional threaded servers (older Java/Python) struggle here since each connection needs its own thread.

Long polling works well for scenarios where updates are infrequent but need faster delivery than simple polling provides. Payment processing is a classic example: you initiate a payment, then long-poll for the result. The payment might take 5-30 seconds to process, and you want to show the success page immediately when it completes.

The approach breaks down with high-frequency updates. If updates happen every 100ms, you're constantly establishing new requests, which adds overhead. At that point, WebSockets or SSE become more efficient. Long polling also complicates monitoring—requests that take 30 seconds aren't errors, they're normal operation.

Server-Sent Events (SSE)

SSE extends long polling by streaming multiple updates through a single persistent HTTP connection. Over HTTP/1.1 the server uses Transfer-Encoding: chunked to send data incrementally without closing the connection (HTTP/2 achieves the same with streams). Each update is a chunk that the client receives immediately.

// Client-side
const eventSource = new EventSource('/api/updates');

eventSource.onmessage = (event) => {
  const data = JSON.parse(event.data);
  updateUI(data);
};

eventSource.onerror = () => {
  // Browser automatically reconnects
};
// Server-side (Node.js/Express)
app.get('/api/updates', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');

  const sendUpdate = (data) => {
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  };

  dataSource.on('update', sendUpdate);

  req.on('close', () => {
    dataSource.off('update', sendUpdate);
  });
});

The efficiency gain over long polling is significant. With long polling, two updates 10ms apart require two full request-response cycles. With SSE, both updates flow through the same connection with minimal overhead—just the data itself plus a few bytes of framing. For high-frequency updates, this reduces bandwidth and latency substantially.

The browser's EventSource API handles reconnection automatically. If the connection drops, the browser waits a few seconds and reconnects. The SSE protocol includes a "last event ID" mechanism: the server can tag each event with an ID, and when the client reconnects, it sends the last ID it received. The server can then replay any missed events.
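
The event-id mechanism can be sketched server-side. This assumes a simple in-memory buffer (`recentEvents`); a production system would back this with Redis Streams or a database:

```javascript
// SSE endpoint that tags events with ids and replays missed events on
// reconnect via the Last-Event-ID header the browser sends automatically.
const recentEvents = []; // bounded buffer of { id, data }
let nextId = 1;

function writeEvent(res, event) {
  // The "id:" field is what the browser echoes back as Last-Event-ID.
  res.write(`id: ${event.id}\ndata: ${JSON.stringify(event.data)}\n\n`);
}

function handleSse(req, res) {
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
  });

  // Replay anything the client missed while disconnected...
  const lastId = Number(req.headers['last-event-id'] || 0);
  for (const event of recentEvents) {
    if (event.id > lastId) writeEvent(res, event);
  }
  // ...then keep the connection open and stream new events as they occur.
}

function publish(data) {
  const event = { id: nextId++, data };
  recentEvents.push(event);
  if (recentEvents.length > 1000) recentEvents.shift(); // cap the buffer
  return event;
}
```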

SSE is ordinary HTTP traffic, so it passes through most firewalls and proxies that allow HTTP. After the upgrade handshake, WebSocket frames are no longer HTTP, and intermediaries that only understand HTTP (some corporate firewalls and older proxies) may block or drop them. This makes SSE more reliable in restrictive network environments.

The main limitation is one-way communication. The server can push to the client, but the client can't send messages back through the SSE connection. For many use cases, this is fine—the client can make separate HTTP POST requests when it needs to send data. But if you need frequent bidirectional communication, WebSockets are more appropriate.

SSE is ideal for live dashboards, activity feeds, and AI applications that stream tokens as they're generated. The browser support is excellent (all modern browsers), the implementation is straightforward, and the operational complexity is minimal compared to WebSockets.

WebSockets

WebSockets provide full-duplex communication over a single TCP connection. The connection starts as a standard HTTP request with special headers (Upgrade: websocket), then the server responds with 101 Switching Protocols, upgrading the connection to the WebSocket protocol. After this handshake, both client and server can send messages at any time without request-response overhead.

// Client-side
function connectWebSocket() {
  const ws = new WebSocket('ws://api.example.com/socket');

  ws.onopen = () => {
    ws.send(JSON.stringify({ type: 'subscribe', channel: 'chat' }));
  };

  ws.onmessage = (event) => {
    const data = JSON.parse(event.data);
    handleUpdate(data);
  };

  ws.onclose = () => {
    // Reconnect after a short delay; production code should back off
    setTimeout(connectWebSocket, 1000);
  };
}

connectWebSocket();
// Server-side (Node.js/ws)
const WebSocket = require('ws');
const wss = new WebSocket.Server({ port: 8080 });

wss.on('connection', (ws) => {
  ws.on('message', (message) => {
    const data = JSON.parse(message);
    processMessage(data);
  });

  const forward = (data) => ws.send(JSON.stringify(data));
  dataSource.on('update', forward);

  // Remove the listener on disconnect, or every closed socket leaks one
  ws.on('close', () => dataSource.off('update', forward));
});

The protocol overhead is minimal. After the initial HTTP upgrade, messages are framed with just 2-14 bytes of overhead (depending on payload size and masking). This makes WebSockets extremely efficient for high-frequency updates. A chat application sending 100 messages per second benefits significantly compared to making 100 HTTP requests.

sequenceDiagram
    participant Client
    participant Server
    Client->>Server: HTTP GET /socket (Upgrade: websocket)
    Server-->>Client: 101 Switching Protocols
    Note over Client,Server: Connection upgraded to WebSocket
    Client->>Server: Message 1
    Server->>Client: Message 2
    Server->>Client: Message 3
    Client->>Server: Message 4
    Note over Client,Server: Connection stays open

The operational complexity comes from managing stateful connections. Each WebSocket connection consumes server memory (typically 4-8KB) and must be tracked. When a server restarts, all connections drop and clients must reconnect. This means deployments cause user-visible disruptions unless you implement graceful shutdown (stop accepting new connections, wait for existing ones to close naturally, then shut down).
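
The graceful-shutdown sequence can be sketched library-agnostically. Here `server` is anything with a close() that stops accepting new connections (http.Server, ws's WebSocketServer) and `connections` is a Set of open sockets; both names and the drain timeout are assumptions:

```javascript
// Stop accepting new connections, let existing clients drain, then
// force-close whatever is left after the deadline.
function gracefulShutdown(server, connections, drainMs = 30000) {
  server.close(); // stop accepting new connections

  return new Promise((resolve) => {
    // Poll until every client has disconnected on its own...
    const check = setInterval(() => {
      if (connections.size === 0) finish();
    }, 1000);

    // ...but never wait longer than the drain deadline.
    const deadline = setTimeout(() => {
      for (const socket of connections) socket.close(); // force-close stragglers
      finish();
    }, drainMs);

    function finish() {
      clearInterval(check);
      clearTimeout(deadline);
      resolve();
    }

    if (connections.size === 0) finish(); // nothing to drain
  });
}
```

Pairing this with client-side reconnect logic turns a deploy into a brief blip rather than a hard disconnect.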

Load balancing requires careful consideration. A Layer 7 load balancer terminates the client's connection and opens its own to a backend server, so it must explicitly support the WebSocket upgrade to proxy frames correctly; many do, but not all. Layer 4 load balancers are more reliable here: they forward the TCP connection directly, so the upgrade handshake happens end-to-end.

Scaling WebSocket servers is more complex than scaling stateless HTTP servers. You can't just add servers and distribute load randomly—once a client connects to a server, that server must handle all messages for that client. If load becomes uneven (some servers have many more connections than others), you need strategies to rebalance. Using a "least connections" load balancing algorithm helps, but it's not perfect.

Many architectures separate WebSocket handling from business logic. A dedicated WebSocket service manages connections and forwards messages to backend services via pub/sub or RPC. This keeps the WebSocket layer thin and stateless (except for the connections themselves), while business logic runs in separately scalable services.

graph TD
    C1[Client 1] -->|WebSocket| WS1[WebSocket Server 1]
    C2[Client 2] -->|WebSocket| WS1
    C3[Client 3] -->|WebSocket| WS2[WebSocket Server 2]
    WS1 --> PS[Pub/Sub Service]
    WS2 --> PS
    PS --> BL[Business Logic Services]

WebSockets are the right choice when you need frequent bidirectional communication with low latency. Chat applications, collaborative editors, multiplayer games, and real-time trading platforms all benefit from WebSockets. But if your communication is primarily one-way (server to client), SSE is simpler. And if updates are infrequent, long polling avoids the operational overhead.

WebRTC

WebRTC enables direct peer-to-peer connections between browsers, bypassing the server for data transfer. This is fundamentally different from the other protocols—instead of client-server communication, you have client-client communication.

The setup process involves a signaling server that helps peers discover each other and exchange connection information. Once peers have each other's network details, they attempt to establish a direct connection. If both peers are behind NAT (which is common), WebRTC uses STUN servers to discover public IP addresses and TURN servers as relays when direct connection isn't possible.

sequenceDiagram
    participant P1 as Peer 1
    participant SS as Signaling Server
    participant P2 as Peer 2
    P1->>SS: Connect (WebSocket/SSE)
    P2->>SS: Connect (WebSocket/SSE)
    P1->>SS: Offer (connection info)
    SS->>P2: Forward offer
    P2->>SS: Answer (connection info)
    SS->>P1: Forward answer
    Note over P1,P2: ICE candidates exchanged
    P1->>P2: Direct peer connection established
    P1->>P2: Data/video/audio
// Simplified WebRTC setup
const pc = new RTCPeerConnection({
  iceServers: [
    { urls: 'stun:stun.l.google.com:19302' },
    { urls: 'turn:turn.example.com', username: 'user', credential: 'pass' }
  ]
});

// Get local media stream
const stream = await navigator.mediaDevices.getUserMedia({
  video: true,
  audio: true
});

stream.getTracks().forEach(track => {
  pc.addTrack(track, stream);
});

// Create offer and send to peer via signaling server
const offer = await pc.createOffer();
await pc.setLocalDescription(offer);
signalingServer.send({ type: 'offer', offer });

The latency advantage is significant. With client-server-client communication, a message travels twice across the network. With peer-to-peer, it travels once. For video calls, this can reduce latency from 200-400ms to 50-100ms. For gaming, it's the difference between playable and frustrating.

The infrastructure requirements are lighter than you might expect. The signaling server handles minimal traffic—just connection setup and teardown. The STUN server is stateless and handles simple requests. The TURN server only relays traffic when direct connection fails, which happens in 10-20% of cases depending on network topology.

WebRTC is complex to implement correctly. NAT traversal is finicky, and debugging connection failures requires understanding network topology. The peer connection setup can take several seconds, which adds latency to the initial interaction. And if one peer has poor network connectivity, it affects the other peer directly—there's no server to buffer or retry.

The use cases are specific: video/audio calls, screen sharing, and gaming where latency is critical. Some collaborative editing tools use WebRTC for cursor position and presence information, keeping that high-frequency data peer-to-peer while syncing document changes through servers. This hybrid approach reduces server load while maintaining consistency for important data.

For most real-time update scenarios, WebRTC is overkill. The complexity and setup time outweigh the latency benefits unless you're specifically doing media streaming or need to minimize server bandwidth costs at scale.

Part 2: Server-Side Propagation

Once you've chosen how to deliver updates to clients, you need to solve the routing problem: how do updates get from their source to the server handling each client's connection?

Quick Decision Matrix

Approach               | State Location | Scaling | Complexity | Use When
Pull (Polling)         | Database       | Easy    | Low        | Updates can be delayed
Push (Consistent Hash) | Server memory  | Medium  | High       | Significant per-connection state
Push (Pub/Sub)         | Message broker | Easy    | Medium     | Broadcasting to many clients

graph TD
    A[Update occurs] --> B{How to route?}
    B -->|Client pulls| C[Store in DB]
    B -->|Server pushes| D{Connection state?}
    D -->|Minimal state| E[Pub/Sub broadcast]
    D -->|Significant state| F[Consistent hash routing]

Pull-Based with Database

The simplest server-side approach is to store updates in a database and let clients pull them. When a message is sent, you write it to the database. When a client polls, you query for messages newer than the last one they received.

sequenceDiagram
    participant U as User A
    participant S as Server
    participant DB as Database
    participant C as Client B
    U->>S: Send message
    S->>DB: INSERT message
    Note over C: Polling interval...
    C->>S: GET /messages?since=timestamp
    S->>DB: SELECT messages WHERE timestamp > ?
    DB-->>S: Return messages
    S-->>C: Return messages

The resource implications scale with client count and polling frequency. With 10,000 clients polling every 5 seconds, you're executing 2,000 database queries per second. Each query needs an index on the timestamp column to be efficient. If your messages table has millions of rows, you'll want to partition by time or user to keep query performance consistent.

This approach decouples the update source from the clients consuming updates. The database acts as a buffer—updates can happen at any rate, and clients pull at their own pace. This makes the system more resilient to traffic spikes and simplifies deployment (no connection state to worry about).

The main limitation is latency. Updates sit in the database until the next poll cycle. For applications where 2-5 second delays are acceptable, this is fine. For real-time chat or collaboration, it's not.

This pattern works well when combined with simple polling on the client side. You avoid the complexity of persistent connections and message routing entirely. The database becomes your message queue, and standard database scaling techniques (read replicas, caching, partitioning) apply directly.
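
The core of the pull endpoint is a since-timestamp query. A sketch of that logic (the in-memory `messages` array stands in for the indexed table; the Express-style route in the comment is an assumption):

```javascript
// Equivalent of: SELECT * FROM messages
//                WHERE created_at > ? ORDER BY created_at
function messagesSince(messages, since) {
  return messages
    .filter(m => m.createdAt > since)
    .sort((a, b) => a.createdAt - b.createdAt);
}

// Route handler shape:
// app.get('/api/messages', (req, res) => {
//   const since = Number(req.query.since) || 0;
//   res.json(messagesSince(allMessages, since));
// });
```

The client stores the createdAt of the last message it received and passes it as `since` on the next poll.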

Push-Based with Consistent Hashing

When clients maintain persistent connections (WebSocket, SSE), you need to route updates to the specific server holding each client's connection. Consistent hashing solves this by deterministically mapping each client to a server: any component can compute which server owns a given client without consulting a lookup table, and the hash function spreads connections evenly across servers.

The basic approach uses a hash function to assign clients to servers. If you have N servers numbered 0 to N-1, client with ID user_id connects to server hash(user_id) % N. When an update needs to reach that client, you hash their ID, determine which server they're on, and send the update there.

graph TD
    subgraph "Client Connection"
    C1[Client user_123] -->|hash % 3 = 1| S1[Server 1]
    C2[Client user_456] -->|hash % 3 = 2| S2[Server 2]
    C3[Client user_789] -->|hash % 3 = 0| S0[Server 0]
    end
    
    subgraph "Update Routing"
    U[Update for user_123] -->|hash % 3 = 1| S1
    S1 -->|via WebSocket| C1
    end

The challenge is scaling. When you add or remove servers, N changes, which means almost every client hashes to a different server. With simple modulo hashing, adding one server to a 10-server cluster causes 90% of clients to reconnect to different servers.

Consistent hashing minimizes this disruption. Instead of a simple modulo, you map both servers and clients onto a hash ring (0 to 2^32-1). Each client connects to the next server clockwise on the ring. When you add a server, only the clients between the new server and the next server need to move. When you remove a server, only its clients need to move.

graph TD
    subgraph "Hash Ring"
    direction TB
    N0[Server 0: hash=100] --> N1[Server 1: hash=500]
    N1 --> N2[Server 2: hash=900]
    N2 --> N0
    end
    
    U1[user_123: hash=250] -.->|connects to next server clockwise| N1
    U2[user_456: hash=600] -.->|connects to next server clockwise| N2
    U3[user_789: hash=50] -.->|connects to next server clockwise| N0

The implementation requires a coordination service (ZooKeeper, etcd, Consul) to track which servers are available and their positions on the ring. When a server starts, it registers itself. When it shuts down, it deregisters. All servers watch this registry and update their routing tables.

When a client first connects, they either connect to a random server (which redirects them to the correct one) or use the same hash function client-side to connect directly. The redirect approach is simpler but adds a round trip. The direct approach requires clients to know the server list, which means they need to query the coordination service or a separate discovery endpoint.

// Server-side routing
const crypto = require('crypto');

// Map any string to a point on the ring (0 to 2^32 - 1)
function hashFunction(key) {
  return crypto.createHash('md5').update(String(key)).digest().readUInt32BE(0);
}

class ConsistentHashRouter {
  constructor(servers) {
    this.ring = servers
      .map(s => ({ hash: hashFunction(s.id), server: s }))
      .sort((a, b) => a.hash - b.hash);
  }

  getServer(userId) {
    const userHash = hashFunction(userId);
    // Walk clockwise: the first server at or past the user's position
    for (const node of this.ring) {
      if (node.hash >= userHash) {
        return node.server;
      }
    }
    return this.ring[0].server; // Wrap around past the top of the ring
  }
}

Consistent hashing makes sense when each connection has significant server-side state that's expensive to rebuild. For example, in a collaborative document editor, each connection might have the full document loaded in memory, along with operational transform state for conflict resolution. Losing that state on reconnection means reloading the entire document and recomputing state, which could take seconds.

For simpler use cases where connections just forward messages, pub/sub is easier to operate. You avoid the coordination service, the hash ring logic, and the complexity of handling server additions and removals gracefully.

Push-Based with Pub/Sub

Pub/sub decouples update routing from connection management. Clients connect to any available server (endpoint servers), which subscribe to relevant topics in a message broker (Redis, Kafka, RabbitMQ). When an update occurs, it's published to the broker, which broadcasts it to all subscribed servers. Each server then forwards the update to its connected clients.

graph TD
    subgraph "Clients"
    C1[Client A] -->|WebSocket| ES1[Endpoint Server 1]
    C2[Client B] -->|WebSocket| ES1
    C3[Client C] -->|WebSocket| ES2[Endpoint Server 2]
    end
    
    subgraph "Pub/Sub"
    ES1 -->|Subscribe: user_A, user_B| PS[Redis Pub/Sub]
    ES2 -->|Subscribe: user_C| PS
    end
    
    subgraph "Updates"
    U[Update Service] -->|Publish: user_A| PS
    PS -->|Broadcast| ES1
    ES1 -->|Forward| C1
    end

The connection flow is straightforward. When a client connects to an endpoint server, the server subscribes to that client's topic (typically user:{user_id} or channel:{channel_id}). The server maintains a map from topics to connections. When a message arrives on a topic, the server looks up which connections care about that topic and forwards the message.

// Endpoint server
const connections = new Map(); // topic -> Set<WebSocket>

wss.on('connection', (ws, req) => {
  const userId = authenticate(req);
  const topic = `user:${userId}`;
  
  // Subscribe to user's topic
  redisClient.subscribe(topic);
  
  // Track connection
  if (!connections.has(topic)) {
    connections.set(topic, new Set());
  }
  connections.get(topic).add(ws);
  
  ws.on('close', () => {
    const sockets = connections.get(topic);
    sockets.delete(ws);
    if (sockets.size === 0) {
      connections.delete(topic); // drop the empty entry
      redisClient.unsubscribe(topic);
    }
  });
});

// Forward messages from Redis to WebSockets
redisClient.on('message', (topic, message) => {
  const sockets = connections.get(topic);
  if (sockets) {
    sockets.forEach(ws => ws.send(message));
  }
});

The update flow is equally simple. When an update occurs (a message is sent, a document is edited, a notification is triggered), the update service publishes it to the appropriate topic. The pub/sub broker handles distribution to all subscribed servers.

// Update service
async function sendMessage(userId, message) {
  await db.messages.insert(message);
  await redisClient.publish(`user:${userId}`, JSON.stringify(message));
}

The scaling characteristics are excellent. Endpoint servers are stateless except for the connections themselves. You can add or remove servers freely—clients just reconnect to a different server, which subscribes to the same topics. Load balancing is simple: use "least connections" to distribute new connections evenly.

The pub/sub broker becomes the scaling bottleneck. Redis Pub/Sub handles roughly 100K messages per second per node. Kafka handles millions of messages per second but with higher latency (10-50ms vs 1-5ms for Redis). For most applications, a single Redis instance is sufficient. When you need more throughput, Redis Cluster shards topics across multiple nodes.

The main limitation is that the pub/sub broker doesn't know which clients are actually connected. If you publish a message to a topic with no subscribers, the broker accepts it and discards it. If a client disconnects and reconnects, they might miss messages published during the disconnection. You need application-level logic to handle this—typically by storing messages in a database and having clients request missed messages on reconnection.

Pub/sub works well for broadcasting updates to many clients, especially when the per-client state is minimal. Chat applications, notification systems, live dashboards, and activity feeds all fit this pattern. The operational simplicity and scaling characteristics make it the default choice for most real-time systems.

Common Challenges

Connection Failures and Reconnection

Networks are unreliable. Mobile clients lose connectivity constantly. WiFi drops out. Servers restart. Your real-time system must handle disconnections gracefully.

The first challenge is detecting disconnections quickly. TCP connections don't always signal when they break—a client might think it's connected while the server has already cleaned up the connection. Implementing heartbeats solves this: the client sends a ping every 30 seconds, and if the server doesn't receive one within 45 seconds, it closes the connection. Similarly, if the client doesn't receive a pong within 45 seconds, it reconnects.

// Client-side heartbeat
let heartbeatInterval;

function connectWithHeartbeat() {
  const ws = new WebSocket('ws://api.example.com');

  ws.onopen = () => {
    heartbeatInterval = setInterval(() => {
      ws.send(JSON.stringify({ type: 'ping' }));
    }, 30000);
  };

  ws.onclose = () => {
    clearInterval(heartbeatInterval);
    setTimeout(connectWithHeartbeat, 1000); // reconnect after a short delay
  };
}
};

The second challenge is resuming without data loss. When a client reconnects, it needs to receive any updates that occurred while it was disconnected. This requires tracking what each client has received. The simplest approach is sequence numbers: each message gets a monotonically increasing ID, and clients track the last ID they received. On reconnection, they request all messages since that ID.

// Client reconnection with sequence tracking
let lastMessageId = 0;

function connect() {
  const ws = new WebSocket(`ws://api.example.com?since=${lastMessageId}`);
  
  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    lastMessageId = message.id;
    handleMessage(message);
  };
}

The server needs to buffer recent messages. Redis Streams work well for this—they provide an append-only log with automatic expiration. When a client reconnects, you query the stream for messages since their last ID and send them before resuming normal operation.
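
A sketch of that replay path, assuming a node-redis v4 client (xAdd/xRange) and an exclusive-range replay (Redis 6.2+). The client is passed in rather than created here, and the stream naming is an assumption:

```javascript
// Buffer each outgoing message in a Redis Stream; '*' lets Redis assign
// a monotonically increasing entry id, which doubles as the sequence number.
async function bufferMessage(redis, channel, message) {
  return redis.xAdd(`stream:${channel}`, '*', {
    payload: JSON.stringify(message),
  });
}

// On reconnect, replay everything AFTER the client's last-seen id.
// The '(' prefix makes the XRANGE start exclusive.
async function replaySince(redis, channel, lastId, send) {
  const entries = await redis.xRange(`stream:${channel}`, `(${lastId}`, '+');
  for (const entry of entries) {
    send(entry.message.payload); // entry.id becomes the client's new lastId
  }
  return entries.length;
}
```

Stream entries can be capped with XADD's MAXLEN option so the buffer doesn't grow without bound.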

The Celebrity Problem

When a user with millions of followers posts an update, you need to deliver it to millions of clients simultaneously. Naive fan-out (write the update to each user's inbox) creates massive write amplification. Publishing to a single topic and having millions of servers subscribe creates a broadcast storm.

The solution is hierarchical aggregation. Instead of publishing directly to user topics, publish to regional topics. Regional aggregators subscribe to these topics, batch updates, and forward them to endpoint servers in their region. This reduces the fan-out at each level.

graph TD
    U[Celebrity posts update] --> PS[Pub/Sub: celebrity_posts]
    PS --> R1[Regional Aggregator US-East]
    PS --> R2[Regional Aggregator US-West]
    PS --> R3[Regional Aggregator EU]
    R1 --> ES1[Endpoint Servers 1-100]
    R2 --> ES2[Endpoint Servers 101-200]
    R3 --> ES3[Endpoint Servers 201-300]
    ES1 --> C1[Clients]
    ES2 --> C2[Clients]
    ES3 --> C3[Clients]

Another approach is to cache the update and have clients pull it. Instead of pushing to millions of users, you push a notification that says "new post from celebrity X." Clients then fetch the post from a CDN-cached endpoint. This converts a write-heavy problem into a read-heavy one, which is easier to scale.
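
The notify-then-pull pattern can be sketched as follows; `cache` and `pubsub` are assumed interfaces (put/publish) injected by the caller, not a specific library:

```javascript
// Cache the heavy payload once, then fan out only a tiny pointer.
async function publishCelebrityPost(cache, pubsub, post) {
  // Store the full post where a CDN can serve it cheaply...
  await cache.put(`/posts/${post.id}`, post);
  // ...then notify millions of subscribers with just its id.
  await pubsub.publish('celebrity_posts', JSON.stringify({ postId: post.id }));
}

// Client side: on notification, fetch the cached payload.
// eventSource.onmessage = async (e) => {
//   const { postId } = JSON.parse(e.data);
//   const post = await fetch(`/posts/${postId}`).then(r => r.json());
// };
```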

Message Ordering

When multiple servers handle updates, ensuring consistent ordering becomes complex. Two messages sent milliseconds apart might arrive out of order if they travel different network paths or get processed by different servers.

For most applications, the solution is to funnel related messages through a single server or partition. If all messages for a chat room go through the same server, that server can assign timestamps and establish a total order. Clients receive messages in the order that server processed them.

graph LR
    U1[User A] -->|Message 1| S[Chat Room Server]
    U2[User B] -->|Message 2| S
    S -->|Ordered: 1, 2| PS[Pub/Sub]
    PS --> ES1[Endpoint Server 1]
    PS --> ES2[Endpoint Server 2]

For more complex scenarios where messages come from multiple sources, you need vector clocks or logical timestamps. Each server maintains a clock, and messages include timestamp information that helps recipients determine the correct order. This is typically overkill for product-focused systems—funneling through a single server is simpler and sufficient.

Conclusion

Real-time updates require solving two problems: client-server communication and server-side propagation. Start with the simplest solution that meets your latency requirements. Simple polling works for most dashboards and monitoring tools. Long polling or SSE work for moderate real-time needs. WebSockets are for high-frequency bidirectional communication. WebRTC is for peer-to-peer media.

On the server side, pull-based polling with a database is simplest. Pub/sub scales well for broadcasting to many clients. Consistent hashing makes sense when connection state is expensive to rebuild.

Most teams overestimate their real-time requirements. A 2-second delay is often acceptable, and the operational simplicity of polling outweighs the efficiency gains of more complex protocols. When you do need real-time, understand the trade-offs and choose the approach that matches your scale and complexity tolerance.

Appendix: HTTP Protocol Fundamentals

Understanding HTTP mechanics is essential for choosing the right real-time approach. Here's how the browser-to-server communication stack works:

The HTTP Request-Response Cycle

sequenceDiagram
    participant Browser
    participant Server
    Browser->>Server: HTTP Request
    Note over Browser,Server: TCP connection established
    Server-->>Browser: HTTP Response
    Note over Browser,Server: Connection closes (default)

Standard HTTP follows a strict request-response pattern:

  1. Browser opens TCP connection to server
  2. Browser sends HTTP request (headers + optional body)
  3. Server processes request and sends response (headers + body)
  4. Connection closes (unless keep-alive is negotiated; HTTP/1.1 keeps connections open by default)

This cycle repeats for every request, which is why real-time updates need different approaches.

Key HTTP Headers for Real-Time

Connection Management:

  • Connection: keep-alive - Reuses TCP connection for multiple requests
  • Connection: close - Closes connection after response (the default in HTTP/1.0; HTTP/1.1 defaults to keep-alive)
  • Connection: Upgrade - Signals protocol upgrade (used by WebSockets)

Streaming Responses:

  • Transfer-Encoding: chunked - Sends response in pieces (used by SSE)
  • Cache-Control: no-cache - Prevents proxy/browser caching of live data
  • Content-Type: text/event-stream - Identifies SSE streams

Conditional Requests:

  • If-Modified-Since: <date> - Only return data if changed since date
  • ETag: "<hash>" - Unique identifier for response content
  • If-None-Match: "<hash>" - Only return data if ETag doesn't match
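
The conditional-request headers above combine like this on the server side. A sketch, assuming an ETag derived from a content hash; `handleConditional` and the truncated-SHA-256 choice are illustrative, not a standard API:

```typescript
import { createHash } from "node:crypto";

// Sketch: server-side conditional GET. If the client's If-None-Match
// matches the current ETag, reply 304 with no body; otherwise 200.
function etagFor(body: string): string {
  return `"${createHash("sha256").update(body).digest("hex").slice(0, 16)}"`;
}

function handleConditional(body: string, ifNoneMatch?: string) {
  const etag = etagFor(body);
  if (ifNoneMatch === etag) {
    return { status: 304, headers: { ETag: etag } };     // unchanged: headers only
  }
  return { status: 200, headers: { ETag: etag }, body }; // changed: full payload
}
```

A polling client stores the ETag from each 200 response and echoes it back as If-None-Match, so polls that find nothing new cost only headers.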

Browser APIs and Protocol Mapping

graph TD
    subgraph Browser["Browser APIs"]
    F[fetch API] --> H[HTTP Request/Response]
    ES[EventSource API] --> SSE[Server-Sent Events]
    WS[WebSocket API] --> WSP[WebSocket Protocol]
    end
    
    subgraph Network["Network Layer"]
    H --> TCP1[TCP Connection]
    SSE --> TCP2[TCP Connection + HTTP]
    WSP --> TCP3[TCP Connection]
    end
    
    subgraph Server["Server Handling"]
    TCP1 --> S1[Standard HTTP Handler]
    TCP2 --> S2[Streaming HTTP Handler]
    TCP3 --> S3[WebSocket Handler]
    end

fetch() API:

  • Creates standard HTTP request-response
  • Connection closes after response
  • Suitable for polling approaches

EventSource API:

  • Maintains persistent HTTP connection
  • Automatically handles reconnection
  • Parses server-sent event format
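
The wire format EventSource parses is line-based: `data:` lines accumulate until a blank line terminates the event. A minimal parser sketch, simplified to the `event:` and `data:` fields only (real EventSource also handles `id:`, `retry:`, comments, and partial chunks):

```typescript
// Sketch: parse a complete SSE stream buffer into events.
// Per the spec, a blank line ends an event and multiple data:
// lines within one event are joined with newlines.
interface SseEvent {
  event: string; // defaults to "message"
  data: string;
}

function parseSse(stream: string): SseEvent[] {
  const events: SseEvent[] = [];
  for (const block of stream.split("\n\n")) {
    const dataLines: string[] = [];
    let event = "message";
    for (const line of block.split("\n")) {
      // Note: the spec strips exactly one leading space; trimStart is a simplification.
      if (line.startsWith("data:")) dataLines.push(line.slice(5).trimStart());
      else if (line.startsWith("event:")) event = line.slice(6).trimStart();
    }
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return events;
}
```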

WebSocket API:

  • Upgrades from HTTP to WebSocket protocol
  • Bidirectional message passing
  • Manual reconnection handling required
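
Since WebSocket clients must handle reconnection themselves, the usual pattern is capped exponential backoff. A sketch of the delay schedule and reconnect loop (jitter is omitted for clarity; production code should add it to avoid thundering herds):

```typescript
// Sketch: capped exponential backoff for WebSocket reconnects.
// attempt 0 -> base, attempt 1 -> 2*base, ... capped at `capMs`.
function backoffDelayMs(attempt: number, baseMs = 1000, capMs = 30000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative reconnect loop using the standard browser WebSocket API
// (also available as a global in recent Node versions).
function connect(url: string, attempt = 0): void {
  const ws = new (globalThis as any).WebSocket(url);
  ws.onopen = () => { attempt = 0; }; // reset backoff on success
  ws.onclose = () => {
    setTimeout(() => connect(url, attempt + 1), backoffDelayMs(attempt));
  };
}
```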

Layer 4 vs Layer 7 Protocols

Layer 4 (Transport Layer):

  • Raw TCP/UDP connections
  • Load balancers forward packets without inspection
  • WebSockets operate here after HTTP upgrade
  • Better performance, less flexibility

Layer 7 (Application Layer):

  • HTTP protocol with headers, methods, status codes
  • Load balancers can inspect and modify requests
  • SSE and polling operate here
  • More features, higher overhead

graph TD
    subgraph L7["Layer 7 HTTP"]
    HTTP[HTTP Request] --> L7LB[L7 Load Balancer]
    L7LB --> Server1[Server 1]
    L7LB --> Server2[Server 2]
    end
    
    subgraph L4["Layer 4 TCP"]
    TCP[TCP Connection] --> L4LB[L4 Load Balancer]
    L4LB --> Server3[Server 3]
    L4LB --> Server4[Server 4]
    end

Why This Matters:

  • SSE works with any HTTP infrastructure (L7 load balancers, proxies, CDNs)
  • WebSockets need L4 load balancers or L7 load balancers with WebSocket support
  • Corporate firewalls often block non-HTTP protocols, making SSE more reliable

Connection Lifecycle Examples

Polling with Keep-Alive:

1. Browser → Server: GET /api/messages (Connection: keep-alive)
2. Server → Browser: 200 OK + messages (Connection: keep-alive)
3. Wait 5 seconds
4. Browser → Server: GET /api/messages + If-None-Match (reuses same TCP connection)
5. Server → Browser: 304 Not Modified (if the ETag still matches)

Server-Sent Events:

1. Browser → Server: GET /events (Accept: text/event-stream)
2. Server → Browser: 200 OK (Transfer-Encoding: chunked)
3. Server → Browser: data: {"message": "hello"}\n\n
4. Server → Browser: data: {"message": "world"}\n\n
5. Connection stays open...

WebSocket Upgrade:

1. Browser → Server: GET /socket (Upgrade: websocket, Connection: Upgrade)
2. Server → Browser: 101 Switching Protocols
3. Browser ↔ Server: Binary/text frames (no HTTP headers)
4. Either side can send messages anytime

This foundation explains why each real-time approach has different operational characteristics and infrastructure requirements.