
How We Achieved <100ms Event Latency at Scale

A deep dive into the architecture decisions that power our real-time event pipeline.

Alex Rivera · January 15, 2025 · 8 min read

[Pipeline diagram: Ingest → Process → Deliver]

When a player opens a chest or makes a purchase, the game SDK fires an event. That event needs to reach the server, get persisted, and trigger downstream logic (segment evaluation, intervention checks, dashboard updates) before the player notices anything happened. In games, latency is the difference between a real-time LiveOps platform and a stale analytics dashboard that is always a step behind.

Early on, we were seeing p99 ingestion latencies hovering around 200ms. For a game with 100,000 daily active users generating millions of events per day, that meant a growing backlog of unprocessed events and segments that lagged behind reality. Intervention windows closed before we could act on them. Mobile SDKs were buffering events client-side because the server could not keep up during traffic spikes, and buffered events on a phone are one app crash away from being lost forever.

We needed ingestion under 100ms at the p99, with headroom to spare for burst traffic. Here is how we built the pipeline that got us there.

The Two-Phase Pipeline

The single most impactful architectural decision was splitting event handling into two distinct phases: a fast synchronous ingestion path and a slower asynchronous processing path. This separation means the latency a game SDK experiences is decoupled from the complexity of what happens to that event afterward.

1. Ingest
2. Validate
3. Ack
4. Queue
5. Process
6. Store

Key insight: The SDK receives its acknowledgment at step 3. Everything after that happens asynchronously, so adding processing complexity never increases ingestion latency.

Phase 1: Fast Ingestion

The Execution API, a fully async FastAPI application, is the front door for all SDK traffic. When an event arrives at the ingestion endpoint, the server does three things:

  1. Validate the payload using Pydantic. This is pure CPU work, typically under 1ms.
  2. Write the event to PostgreSQL with processed=False.
  3. Return a success response to the SDK.

No segment evaluation. No player attribute updates. No intervention logic. The API acknowledges the event the moment it is safely persisted. A fire-and-forget Redis pub/sub notification goes out to any connected dashboards, but it is wrapped in exception suppression so it can never block or fail the response.
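The ingestion handler can be sketched roughly as follows. This is a simplified stand-in, not our production code: `EventIn`, `ingest_event`, and the in-memory `EVENTS` list are illustrative (the real path writes to PostgreSQL and publishes over Redis), but the three-step shape and the exception suppression around the pub/sub call are the point.

```python
import contextlib
from typing import Any

from pydantic import BaseModel, Field

# In-memory stand-in for the PostgreSQL events table, so the sketch is runnable.
EVENTS: list[dict[str, Any]] = []

class EventIn(BaseModel):
    event_name: str
    player_id: str
    properties: dict[str, Any] = Field(default_factory=dict)

def publish_to_dashboards(event: dict[str, Any]) -> None:
    raise ConnectionError("Redis briefly unavailable")  # simulate a flaky broker

def ingest_event(payload: dict[str, Any]) -> dict[str, str]:
    event = EventIn(**payload)                 # 1. validate: pure CPU, typically <1ms
    row = {**event.model_dump(), "processed": False}
    EVENTS.append(row)                         # 2. persist with processed=False
    with contextlib.suppress(Exception):       # fire-and-forget notification:
        publish_to_dashboards(row)             #    can never block or fail the response
    return {"status": "accepted"}              # 3. ack to the SDK
```

Even with the pub/sub call failing outright, the SDK still gets its acknowledgment.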

For batch ingestion, which most production SDKs use, the endpoint accepts up to 1,000 events in a single request and writes them with a single INSERT ... ON CONFLICT DO NOTHING statement. One database round-trip regardless of batch size. A 100-event batch that would take 500-1,500ms as individual inserts completes in 10-30ms. A full 1,000-event batch finishes in 30-80ms. The architecture pays for itself here.
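The single-statement batch write looks roughly like this. SQLite stands in for PostgreSQL so the sketch runs anywhere; the statement shape (one multi-row `INSERT ... ON CONFLICT DO NOTHING`) is the same, and `ingest_batch` is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE analytics_events ("
    "  idempotency_key TEXT UNIQUE,"
    "  event_name TEXT NOT NULL,"
    "  processed INTEGER NOT NULL DEFAULT 0)"
)

def ingest_batch(events: list[tuple[str, str]]) -> int:
    # One multi-row INSERT: a single round-trip regardless of batch size,
    # with duplicate idempotency keys silently skipped.
    placeholders = ",".join(["(?, ?, 0)"] * len(events))
    sql = (
        "INSERT INTO analytics_events (idempotency_key, event_name, processed) "
        f"VALUES {placeholders} ON CONFLICT DO NOTHING"
    )
    flat = [value for row in events for value in row]
    cur = conn.execute(sql, flat)
    conn.commit()
    return cur.rowcount

inserted = ingest_batch(
    [("k1", "chest_opened"), ("k2", "purchase"), ("k1", "chest_opened")]
)
```

The duplicate `k1` event is dropped by the conflict clause rather than erroring, which is what makes at-least-once SDK retries safe.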

Phase 2: Async Processing

Events marked processed=False are picked up by Celery workers running on a dedicated events queue. Each event flows through a processing pipeline: update the player record, evaluate segment membership, check intervention triggers, publish to the event stream, and finally mark the event as processed.

Batch processing with configurable commit checkpoints (defaulting to every 20 events) prevents any single transaction from growing too large or holding locks for too long. Failed events retry with exponential backoff, with three attempts maximum and a 60-second base delay.
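The checkpointing and backoff logic can be mirrored in plain Python (the real implementation runs inside Celery tasks; `process_batch`, `retry_delay`, and the constants here are illustrative, with values taken from the settings above):

```python
CHECKPOINT = 20      # commit every N events: bounded transactions, short lock holds
BASE_DELAY_S = 60    # exponential backoff base
MAX_RETRIES = 3

def retry_delay(attempt: int) -> int:
    """Delay before retry N (1-based): 60s, 120s, 240s."""
    return BASE_DELAY_S * 2 ** (attempt - 1)

def process_batch(events: list[dict], commit) -> None:
    for i, event in enumerate(events, start=1):
        event["processed"] = True    # update player, evaluate segments, check triggers
        if i % CHECKPOINT == 0:
            commit()                 # checkpoint: no transaction grows unbounded
    commit()                         # flush the final partial chunk

commits: list[int] = []
process_batch([{"id": n} for n in range(45)], commit=lambda: commits.append(1))
```

For 45 events the batch commits three times: at events 20 and 40, plus a final flush.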

Because the two phases are independent, the p99 ingestion latency stays flat even as we add more processing steps. Adding a new intervention type or a more complex segmentation rule has zero impact on how fast the SDK gets its acknowledgment.

Redis: The Sub-Millisecond Backbone

Redis is the nervous system that ties the platform together, handling authentication, real-time broadcasting, rate limiting, and more.


Connection Pooling

We run a centralized RedisPoolManager singleton that maintains shared connection pools across all services. Three dedicated Redis databases separate concerns: DB 0 for sessions, rate limiting, and pub/sub; DB 1 for application caches; DB 2 for the Celery broker and result backend. The pool is configured with 50 max connections, 5-second socket timeouts, and 30-second health check intervals with keepalive enabled. Connection reuse means most Redis operations complete in under 1ms.

Caching the Auth Hot Path

API key validation is the most frequent operation in the system, since every SDK request requires it. Without caching, that is a database query on every single event. Our ApiKeyCacheService stores validated key data in Redis with a 5-minute TTL, using pipelines for atomic writes. A cache hit skips the database entirely, saving 5-15ms per request. Secondary indexes keyed by game_id enable O(1) cache invalidation when keys are rotated, so revoked keys stop working within minutes.

Pub/Sub for Real-Time Fan-Out

When an event is ingested, connected dashboards need to know about it. Redis pub/sub channels follow a namespaced pattern: ilara:game:{game_id} for game-specific events, ilara:dashboard:{tenant_id} for dashboard updates. A background asyncio.Task listens on subscribed channels and routes messages to local WebSocket connections. Because all API server replicas share the same Redis pub/sub channels, this works across multiple instances without any sticky session requirements.
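The fan-out pattern, server-free: an `asyncio.Queue` stands in for the Redis pub/sub channel, and per-channel subscriber queues stand in for local WebSocket connections. All names here are illustrative.

```python
import asyncio

subscribers: dict[str, list[asyncio.Queue]] = {}

async def listener(bus: asyncio.Queue) -> None:
    # Background task: drain the shared channel and route each message
    # to the WebSocket connections registered on this replica.
    while True:
        channel, message = await bus.get()
        for ws in subscribers.get(channel, []):
            ws.put_nowait(message)
        bus.task_done()

async def demo() -> str:
    bus: asyncio.Queue = asyncio.Queue()
    dashboard: asyncio.Queue = asyncio.Queue()
    subscribers["ilara:game:g1"] = [dashboard]
    task = asyncio.create_task(listener(bus))
    await bus.put(("ilara:game:g1", "chest_opened"))
    msg = await asyncio.wait_for(dashboard.get(), timeout=1)
    task.cancel()
    return msg

result = asyncio.run(demo())
```

Because the real channel lives in Redis rather than in-process, every replica's listener sees the same messages, which is what removes the need for sticky sessions.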

Rate Limiting Without Latency Penalty

Rate limiting uses Redis sorted sets for a sliding window algorithm: ZRANGEBYSCORE, ZADD, and EXPIRE execute in a single pipeline. The rate limiting middleware runs as the outermost layer in the middleware stack, so rejected requests are cheap: a Redis lookup and a 429 response, no database or business logic involved. Tier-based limits (100 requests per minute on the free tier up to unlimited on enterprise) are resolved from tenant context that is already cached in Redis.

Database Write Optimization

PostgreSQL handles the persistent storage layer, and for a write-heavy event pipeline, how you write matters as much as what you write.

The AnalyticsEvent model is deliberately lean on indexes. Only the columns that serve active query patterns are indexed: composite indexes on tenant + time range, game + event name + time, and player + time. The idempotency_key column carries a unique constraint for deduplication but is nullable. Only events that need dedup carry it, so the index stays small.

Arbitrary event data goes into a JSONB properties column. No schema migrations when a game studio wants to track a new custom property. For a multi-tenant platform where each game has its own event taxonomy, this flexibility matters.

On the connection side, SQLAlchemy 2.0 with the asyncpg driver gives us true async PostgreSQL operations rather than the thread-pool wrapping that older approaches rely on. The session is configured with expire_on_commit=False to avoid unnecessary refresh queries after commits, and autoflush=False to prevent surprise queries during object attribute access. Connection pooling with pool_pre_ping=True quietly replaces stale connections before they cause errors.
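Pulled together, the session configuration looks roughly like this (the connection URL is a placeholder and `pool_size` is an illustrative value not stated above; this is a config sketch, not our exact setup):

```python
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

engine = create_async_engine(
    "postgresql+asyncpg://app:***@db:5432/events",
    pool_pre_ping=True,        # quietly replace stale connections before they error
    pool_size=20,              # illustrative; tune to worker concurrency
)

SessionLocal = async_sessionmaker(
    engine,
    expire_on_commit=False,    # no refresh queries after commit
    autoflush=False,           # no surprise queries during attribute access
)
```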

For dashboard queries, we avoid scanning the raw events table entirely. Pre-aggregated EventSummary records are computed periodically by Celery Beat tasks, giving the dashboard fast reads without competing with the write path.

Async Python Done Right

We went fully async because the concurrency model matches the workload. Event ingestion is almost entirely I/O-bound: validate a payload, write to a database, publish to Redis. A synchronous thread-per-request model wastes most of its time waiting on network round-trips.

FastAPI on Uvicorn handles each incoming request as a lightweight coroutine. The database driver (asyncpg) and the Redis client (redis.asyncio) are both natively async, meaning there is no thread pool indirection adding latency and complexity. A single process can handle thousands of concurrent connections with minimal memory overhead.

The middleware pipeline is ordered deliberately. Rate limiting is outermost so rejected requests are cheapest. Tenant context extraction happens once and is stored in a contextvars.ContextVar, making it available to any async code in the request lifecycle at zero additional cost.
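The `ContextVar` pattern in miniature (a sketch, with hypothetical `middleware` and `handler` names standing in for the real middleware and endpoint code):

```python
import asyncio
import contextvars

tenant_ctx: contextvars.ContextVar[str] = contextvars.ContextVar("tenant")

async def handler() -> str:
    # Deep in the request lifecycle: no tenant parameter threaded through.
    return tenant_ctx.get()

async def middleware(tenant_id: str) -> str:
    tenant_ctx.set(tenant_id)    # extracted once, e.g. from the validated API key
    return await handler()

async def main() -> tuple[str, str]:
    # Each task gets its own copy of the context, so concurrent
    # requests never see each other's tenant.
    return tuple(await asyncio.gather(middleware("t-1"), middleware("t-2")))

result = asyncio.run(main())
```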

For Celery workers, which run in a synchronous context, we bridge the gap with a run_async() utility that manages per-thread event loops. This lets task code use async/await for database and Redis operations without sacrificing Celery's task management, retry logic, and queue routing.

Scaling Horizontally

The Execution API is fully stateless. Persistent state lives in PostgreSQL; ephemeral state lives in Redis. Spinning up additional API replicas behind a load balancer requires zero configuration changes, since each replica connects to the same database and the same Redis pub/sub channels. Health endpoints at /health/live, /health, and /health/ready give load balancers and orchestrators the signals they need for proper health checking and rolling deployments.

On the worker side, Celery tasks are routed to dedicated queues by domain: events, segments, notifications, retention, and others. Event processing workers can scale independently from notification workers. If event volume spikes, we add event workers without touching anything else. Task acknowledgment is configured with task_acks_late=True and task_reject_on_worker_lost=True, so if a worker crashes mid-processing, the event goes back on the queue instead of being silently dropped.

The Numbers

Here is what the production pipeline looks like for a typical single-event ingestion:

| Stage | Latency |
|---|---|
| Pydantic validation | <1ms |
| API key validation (cache hit) | <2ms |
| PostgreSQL INSERT | 5-15ms |
| Redis pub/sub publish | <1ms |
| Total p50 | 15-25ms |
| Total p99 | 50-80ms |

Batch ingestion amplifies the gains. A 100-event batch completes in the same time as a handful of individual inserts. A full 1,000-event batch, the maximum per request, finishes in 30-80ms total. That is a 10-50x throughput improvement over individual ingestion, and it is the mode most production SDKs operate in.

What We Would Do Differently at 10x Scale

Our current architecture serves us well, but we are already thinking about the next order of magnitude. The synchronous PostgreSQL write is the largest contributor to ingestion latency. Here is our scaling roadmap:

1. Redis Streams write-ahead buffer: Replace the synchronous PostgreSQL write with a Redis Streams write-ahead buffer that drains asynchronously to PostgreSQL, pushing ingestion latency below 10ms.
2. TimescaleDB / ClickHouse for events: Automatic time-based partitioning and columnar compression for the events table without managing it ourselves.
3. Adaptive client-side batching: SDK-side flush intervals that adjust based on network conditions, smoothing out traffic spikes before they ever reach the server.

These are planned improvements, not theoretical ones. The two-phase architecture we built from the start makes each of these changes possible without rethinking the entire system. The ingestion path and the processing path are independent by design, so we can swap out the storage layer under the ingestion path without touching a single line of processing code.

Bottom line
Getting to sub-100ms latency came down to drawing the right boundary between "acknowledge fast" and "process thoroughly," then optimizing each side independently.