Horizontal Scaling Guide

Controller (Django)

The Controller is stateless — all state lives in Postgres, Redis, and MinIO.

Scaling strategy

  • Run multiple Controller instances behind a load balancer
  • Session affinity not required (sessions are stored in Redis/DB; see the settings sketch below)
  • Each instance runs its own gunicorn workers
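
Because sessions live in Redis, any instance can serve any request. A minimal settings sketch, assuming Django 4+ and a placeholder Redis URL:

# settings.py (sketch; the Redis URL is a placeholder)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.redis.RedisCache",
        "LOCATION": "redis://redis:6379/1",
    }
}

# Keep sessions in the Redis-backed cache so no instance holds local state
SESSION_ENGINE = "django.contrib.sessions.backends.cache"
SESSION_CACHE_ALIAS = "default"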

Configuration

# Production: 4 workers per instance, 2-4 instances
gunicorn config.wsgi:application \
  --bind 0.0.0.0:8000 \
  --workers 4 \
  --threads 2 \
  --timeout 120 \
  --max-requests 1000 \
  --max-requests-jitter 50

Bottlenecks

  • Database connections: Each worker holds a connection. Use PgBouncer for connection pooling at scale (see the settings sketch after this list).
  • Brief assembly: CPU-bound (JSON serialization + hashing). Scales linearly with workers.
  • Celery workers: Scale independently. Add more workers for brief lifecycle, health checks, and failure reports.
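
A minimal sketch of pointing Django at PgBouncer in transaction mode; the host, port, database name, and credentials are placeholders:

# settings.py (sketch; host, port, and credentials are placeholders)
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "app",
        "USER": "app",
        "PASSWORD": "change-me",
        "HOST": "pgbouncer",
        "PORT": "6432",
        # Transaction pooling: no persistent connections, no server-side cursors
        "CONN_MAX_AGE": 0,
        "DISABLE_SERVER_SIDE_CURSORS": True,
    }
}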

Kubernetes

apiVersion: apps/v1
kind: Deployment
metadata:
  name: controller
spec:
  replicas: 3
  selector:
    matchLabels:
      app: controller
  template:
    metadata:
      labels:
        app: controller
    spec:
      containers:
        - name: controller
          image: controller:latest  # placeholder; use your registry image
          resources:
            requests:
              cpu: "500m"
              memory: "512Mi"
            limits:
              cpu: "1"
              memory: "1Gi"

Dispatcher (Go)

The Dispatcher manages container lifecycle — scaling depends on the queue source.

Internal queue (Redis list)

  • Only ONE Dispatcher instance can consume from a Redis list (BRPOP is exclusive)
  • Scale by increasing NUM_CONSUMERS (parallel goroutines within one instance)
  • For multi-instance: switch to redis-stream queue source

Redis Stream queue

  • Multiple Dispatchers can consume from the same stream via consumer groups
  • Each instance joins the dispatchers consumer group
  • Messages are distributed across instances automatically (see the sketch after this list)
  • Set QUEUE_SOURCE=redis-stream on all instances
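
The Dispatcher's consumer is written in Go; the redis-py sketch below only illustrates the consumer-group semantics, using the stream and group names from the configuration that follows:

# Illustration only (the real consumer is Go): each message goes to exactly one consumer
import socket
import redis

r = redis.Redis(host="redis", port=6379)
STREAM, GROUP = "kohakku:tasks", "dispatchers"

try:
    r.xgroup_create(STREAM, GROUP, id="0", mkstream=True)  # create the group once
except redis.ResponseError:
    pass  # group already exists

consumer = socket.gethostname()  # unique consumer name per instance
while True:
    replies = r.xreadgroup(GROUP, consumer, {STREAM: ">"}, count=10, block=5000)
    for _stream, messages in replies or []:
        for msg_id, fields in messages:
            print("processing", msg_id, fields)  # dispatch the task here
            r.xack(STREAM, GROUP, msg_id)        # ack so it is not redelivered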

Configuration

# Single instance, high parallelism
NUM_CONSUMERS=10
MAX_CONCURRENT_PULLS=4

# Multi-instance with Redis Streams
QUEUE_SOURCE=redis-stream
QUEUE_STREAM_NAME=kohakku:tasks
QUEUE_CONSUMER_GROUP=dispatchers
NUM_CONSUMERS=5

Bottlenecks

  • Image pulls: Gated by MAX_CONCURRENT_PULLS. Cold pulls dominate latency.
  • Docker socket: Local backend shares one Docker daemon. For higher throughput, use K8s/ECS backends.
  • Redis: Single Redis handles queue + state. Separate Redis instances for queue vs state at high scale.

Temporal Worker

  • Stateless — run multiple workers on the same task queue
  • Temporal server distributes workflow executions across workers
  • Scale workers independently of Controller instances

# Run 3 worker instances
for i in 1 2 3; do
  python temporal_worker.py &
done
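
A minimal sketch of what temporal_worker.py could look like with the temporalio Python SDK; the server address, task queue name, workflow, and activity are placeholders:

# temporal_worker.py (sketch; workflow, activity, and task queue names are placeholders)
import asyncio
from temporalio.client import Client
from temporalio.worker import Worker

from workflows import BriefWorkflow    # hypothetical workflow class
from activities import assemble_brief  # hypothetical activity function

async def main():
    client = await Client.connect("temporal:7233")
    worker = Worker(
        client,
        task_queue="briefs",            # all instances share this queue
        workflows=[BriefWorkflow],
        activities=[assemble_brief],
    )
    await worker.run()                  # blocks; run several copies to scale out

if __name__ == "__main__":
    asyncio.run(main())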

Celery Workers

  • Scale independently from the Controller
  • Separate queues for different task types if needed (see the routing sketch below)

# High-priority queue for dispatch
celery -A config worker -l info -Q celery,dispatch -c 4

# Background queue for cleanup
celery -A config worker -l info -Q cleanup -c 2
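
For the queue split above to take effect, tasks need to be routed to those queues. A minimal sketch, assuming Celery reads its config from Django settings with the CELERY_ namespace and using placeholder task paths:

# settings.py (sketch; the task module paths are placeholders)
CELERY_TASK_ROUTES = {
    "briefs.tasks.dispatch_brief": {"queue": "dispatch"},
    "briefs.tasks.cleanup_expired": {"queue": "cleanup"},
    # Unrouted tasks fall through to the default "celery" queue
}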

Database

  • Read replicas: Django supports database routers for read-heavy loads (see the router sketch after this list)
  • Connection pooling: PgBouncer in transaction mode
  • Indexes: Run manage.py dbshell and check EXPLAIN ANALYZE on slow queries
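
A minimal router sketch for sending reads to a replica, assuming a "replica" alias is defined in DATABASES alongside "default":

# routers.py (sketch; assumes DATABASES defines "default" and "replica" aliases)
class ReadReplicaRouter:
    def db_for_read(self, model, **hints):
        return "replica"   # all reads go to the replica

    def db_for_write(self, model, **hints):
        return "default"   # writes always hit the primary

    def allow_relation(self, obj1, obj2, **hints):
        return True        # both aliases point at the same data

# settings.py
DATABASE_ROUTERS = ["routers.ReadReplicaRouter"]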

Redis

  • Persistence: AOF enabled by default in docker-compose (appendonly yes)
  • Maxmemory: Set to prevent OOM. LRU eviction for cache, noeviction for queue
  • Separate instances: One for cache/sessions, one for the task queue, one for the Celery broker (see the settings sketch below)
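
A minimal settings sketch of that split; CELERY_BROKER_URL is the standard Celery setting, while the other two names and all hostnames are placeholders:

# settings.py (sketch; REDIS_CACHE_URL and REDIS_QUEUE_URL are hypothetical names)
REDIS_CACHE_URL = "redis://redis-cache:6379/0"     # cache + sessions (feed this into CACHES)
REDIS_QUEUE_URL = "redis://redis-queue:6379/0"     # task queue consumed by the Dispatcher
CELERY_BROKER_URL = "redis://redis-broker:6379/0"  # Celery broker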