Introduction
Without queues, services depend on each other to be alive at the same time. With queues, they only need to share a channel. This single shift explains how large systems stay resilient under load.
The Problem Queues Solve
User clicks “Place Order.” System needs to confirm the order, send email, send SMS, notify warehouse, update inventory, calculate loyalty points, log analytics.
Without a queue — synchronous:
Save to DB (50ms) → Send email (200ms) → Send SMS (150ms)
→ Notify warehouse (100ms) → Update inventory (80ms)
→ Calculate loyalty (120ms) → Log analytics (90ms)
→ Return "Order Confirmed"
Total: ~790ms — user waits for all of this
If SMS service is down → entire order fails.
With a queue — asynchronous:
Save to DB (50ms) → drop message in queue (5ms)
→ Return "Order Confirmed" ✅
Total: ~55ms
Meanwhile in background:
Queue → Email Service
Queue → SMS Service
Queue → Warehouse Service
Queue → Inventory Service
Queue → Analytics Service
User gets instant confirmation. Everything else runs in the background. If SMS is down, the message stays in the queue and the queue retries delivery automatically.
What a Message Queue Actually Is
Producer → puts message in queue → Consumer picks it up
- Producer: generates work
- Queue: holds work until someone is ready
- Consumer: processes the work
The producer and consumer never talk directly. They only know about the queue. This is decoupling.
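The triad can be sketched with Python's stdlib `queue.Queue` standing in for the broker. This is a toy model, not a real broker; all names are illustrative:

```python
import queue
import threading

work_queue = queue.Queue()   # the channel both sides share
processed = []

def consumer():
    # The consumer only knows about the queue, never the producer.
    while True:
        msg = work_queue.get()
        if msg is None:          # sentinel: shut down
            break
        processed.append(f"handled {msg}")
        work_queue.task_done()

worker = threading.Thread(target=consumer)
worker.start()

# The producer only knows about the queue, never the consumer.
for order_id in (1, 2, 3):
    work_queue.put(order_id)

work_queue.put(None)             # tell the worker to stop
worker.join()
print(processed)                 # ['handled 1', 'handled 2', 'handled 3']
```

Neither side holds a reference to the other; swapping the consumer for a different service would not touch the producer's code.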
Why Decoupling Matters
Without a queue — tightly coupled:
Order Service → directly calls Email Service
→ directly calls SMS Service
→ directly calls Warehouse Service
If Email Service is down → Order Service breaks
If Warehouse is slow → Order Service slows
Adding a new service → must modify Order Service
With a queue — decoupled:
Order Service → Queue ← Email Service
← SMS Service
← Warehouse Service
← Any new service added tomorrow
Email Service down → queue holds messages, retries later
Warehouse slow → queue absorbs the backlog
Adding new service → just subscribe to queue, nothing else changes
Each service lives and dies independently.
A queue converts a hard dependency into a soft dependency. Services no longer need each other alive at the same time.
Two Messaging Models
Point to Point
One producer, one pool of competing workers. Each message is delivered to exactly one consumer, so no message is processed twice.
Order placed → Queue → Worker 1 processes it
→ Worker 2 processes next
→ Worker 3 processes next
Use when: Exactly one thing should happen per event — payment processing, order fulfillment, video compression.
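The competing-workers pattern can be sketched with stdlib threads pulling from one shared queue. A toy model, not a broker; worker names are made up:

```python
import queue
import threading
from collections import Counter

task_queue = queue.Queue()
results = []
lock = threading.Lock()

def worker(name):
    while True:
        task = task_queue.get()
        if task is None:
            task_queue.put(None)   # pass the shutdown sentinel to the next worker
            break
        with lock:
            results.append((name, task))

for task in range(6):
    task_queue.put(task)
task_queue.put(None)

threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Which worker took which task varies run to run, but every task
# is processed exactly once.
counts = Counter(task for _, task in results)
print(sorted(counts.items()))
# [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
```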
Pub/Sub (Publish Subscribe)
One producer. Multiple independent consumers each receive every message.
Order placed → Topic → Email Service receives it
→ SMS Service receives it
→ Analytics Service receives it
→ Warehouse Service receives it
Use when: Multiple services need to react to the same event independently.
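The fan-out can be sketched by giving each subscriber its own private queue. A toy model under illustrative names, not any real pub/sub API:

```python
import queue

class Topic:
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, name):
        q = queue.Queue()
        self.subscribers[name] = q
        return q

    def publish(self, message):
        # Unlike point-to-point, every subscriber gets its own copy.
        for q in self.subscribers.values():
            q.put(message)

orders = Topic()
email_q = orders.subscribe("email")
sms_q = orders.subscribe("sms")

orders.publish({"order_id": 42})

email_msg = email_q.get()
sms_msg = sms_q.get()
print(email_msg, sms_msg)   # the same event, delivered to both services
```

A new service added tomorrow just calls `subscribe`; the publisher never changes.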
Kafka vs RabbitMQ
RabbitMQ — Smart Broker
The broker handles routing, retries, delivery guarantees. Messages are deleted after consumption.
- Push-based — broker pushes to consumers
- Messages deleted after acknowledgement
- Complex routing rules built in
- Throughput: hundreds of thousands/sec
Best for: Task queues, job processing, background jobs, scheduled work.
Kafka — Distributed Log
Kafka is not really a queue — it’s a distributed log. Messages are written to a log and stay there. Consumers read from wherever they left off.
Producer → Kafka Log → Consumer A reads from position 100
→ Consumer B reads from position 847
→ Consumer C replays from position 1
Messages are NOT deleted after consumption. They stay for a configured retention period.
- Pull-based — consumers pull at their own pace
- Messages retained after consumption — replay is possible
- Ordering guaranteed within a partition
- Throughput: millions/sec
Best for: Event streaming, analytics pipelines, audit logs, anything needing replay.
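The log-plus-offsets idea fits in a few lines of plain Python. This is a toy model of the concept, not Kafka's actual API; class and method names are made up:

```python
class Log:
    def __init__(self):
        self.entries = []
        self.offsets = {}          # consumer name -> next position to read

    def append(self, message):
        self.entries.append(message)

    def read(self, consumer):
        pos = self.offsets.get(consumer, 0)
        batch = self.entries[pos:]
        self.offsets[consumer] = len(self.entries)
        return batch

    def replay(self, consumer, position=0):
        # Rewind is trivial because reading never deleted anything.
        self.offsets[consumer] = position

log = Log()
for event in ("created", "paid", "shipped"):
    log.append(event)

first = log.read("analytics")      # ['created', 'paid', 'shipped']
again = log.read("analytics")      # [] (caught up)
log.replay("analytics")
replayed = log.read("analytics")   # full history again
print(first, again, replayed)
```

Each consumer tracks only its own position, which is why consumers A, B, and C above can read from positions 100, 847, and 1 against the same log.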
The Decision Rule
"One task, done once, by one worker"
→ RabbitMQ
"Many systems react to the same event, replay might be needed"
→ Kafka
Key Concepts
Acknowledgement
Consumer tells the queue “I processed this successfully.” No acknowledgement = queue retries.
Consumer picks up message
→ Processes it
→ Sends ACK ✅ → queue deletes message
→ Crashes before ACK ❌ → queue retries with another consumer
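A toy version of the ack-or-retry loop, with stdlib `queue` standing in for the broker and a handler that fails on its first attempt (all names illustrative):

```python
import queue

def consume_with_ack(q, handler):
    msg = q.get()
    try:
        handler(msg)
        # Handler finished: this is the ACK, the message is gone for good.
        return True
    except Exception:
        # Crashed before ACK: requeue so another attempt can retry it.
        q.put(msg)
        return False

q = queue.Queue()
q.put("send-sms-for-order-42")

attempts = []
def flaky_handler(msg):
    attempts.append(msg)
    if len(attempts) == 1:
        raise RuntimeError("SMS provider timed out")   # first try fails

consume_with_ack(q, flaky_handler)   # fails, message requeued
consume_with_ack(q, flaky_handler)   # succeeds, message acked
print(len(attempts), q.qsize())      # 2 0
```

The message survives the failed attempt precisely because deletion is tied to the ACK, not to delivery.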
Dead Letter Queue (DLQ)
Message fails repeatedly → moves to DLQ for investigation instead of blocking the queue or disappearing silently.
Message fails 3 times → moves to DLQ
→ engineers investigate
→ fix bug → replay from DLQ
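Sketching the retry-then-park logic, with the threshold of 3 matching the example above (a toy model; a real broker tracks delivery counts for you):

```python
import queue

MAX_ATTEMPTS = 3                     # configurable retry threshold
dead_letter_q = queue.Queue()

def process_with_dlq(msg, handler):
    attempts = 0
    while attempts < MAX_ATTEMPTS:
        try:
            handler(msg)
            return "ok"
        except Exception:
            attempts += 1
    # Exhausted retries: park the message for humans instead of
    # blocking the main queue or dropping it silently.
    dead_letter_q.put({"message": msg, "attempts": attempts})
    return "dead-lettered"

def always_fails(msg):
    raise RuntimeError("downstream bug")

status = process_with_dlq("notify-warehouse", always_fails)
print(status, dead_letter_q.qsize())   # dead-lettered 1
```

Once the bug is fixed, replaying means feeding the DLQ's contents back through the normal path.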
Consumer Groups (Kafka)
Multiple consumers split partitions between them for parallel processing.
Kafka topic: 6 partitions (numbered 0-5, as Kafka numbers them)
Consumer group: 3 consumers
→ Consumer 1 handles partitions 0-1
→ Consumer 2 handles partitions 2-3
→ Consumer 3 handles partitions 4-5
Add more consumers to a group → more throughput, up to the number of partitions. Consumers beyond the partition count sit idle.
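The split can be sketched as a range-style assignment: contiguous chunks of partitions per consumer, with partitions numbered from 0 as Kafka numbers them. A simplification of real rebalancing, not Kafka's actual assignor:

```python
import math

def assign(partitions, consumers):
    # Each consumer gets a contiguous chunk; leftover consumers get nothing.
    chunk = math.ceil(len(partitions) / len(consumers))
    return {
        c: partitions[i * chunk:(i + 1) * chunk]
        for i, c in enumerate(consumers)
    }

print(assign(list(range(6)), ["c1", "c2", "c3"]))
# {'c1': [0, 1], 'c2': [2, 3], 'c3': [4, 5]}

# A 4th consumer on a 3-partition topic sits idle:
print(assign([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': []}
```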
Connections to Previous Lessons
Flash sale from Lesson 6:
Price changes → Kafka topic
10M users connected via WebSocket ← subscribed to topic
Price update published once → all users notified
Zero polling. Zero DB flood.
Video processing from Lesson 1:
500 videos/hour → each upload drops message in RabbitMQ
→ Worker pool processes in parallel
→ Worker crashes → message retried automatically
Order placement with polyglot persistence:
Order saved to PostgreSQL (ACID guaranteed)
→ Event published to Kafka
→ Email, SMS, Inventory, Analytics all react independently
Exercise: Instagram Notifications
Events: photo likes, post comments, new follows, story mentions, weekly activity summary.
For each: RabbitMQ or Kafka? Point-to-point or Pub/Sub? What happens when notification service is down?
Reference answer:
| Event | Tool | Model | Reasoning |
|---|---|---|---|
| Likes | Kafka | Pub/Sub | Notifications + analytics + feed algorithm all react; extreme throughput |
| Comments | Kafka | Pub/Sub | Same — multiple consumers, high volume |
| Follows | Kafka | Pub/Sub | Notification + recommendations + feed all react |
| Story mentions | Kafka | Pub/Sub | Notifications + moderation + analytics |
| Weekly summary | RabbitMQ | Point-to-Point | One job, one email, scheduled, predictable volume |
If notification service is down: messages stay in queue until it recovers. Service restarts and processes the backlog. Repeated failures go to Dead Letter Queue for investigation.
Key Takeaways
- Queues decouple services — producer and consumer don’t need to be alive simultaneously.
- Point-to-point (RabbitMQ) for task processing; Pub/Sub (Kafka) for event-driven reactions.
- Kafka retains messages — consumers can replay history. RabbitMQ deletes after ACK.
- Acknowledgement + Dead Letter Queue = nothing is silently lost.
- Social/event-driven systems are almost always Kafka. Background job systems are often RabbitMQ.
Part of the system design series. Next: CDN — how global apps serve content fast everywhere.