Introduction
A notification system looks simple on the surface. In practice it touches fan-out scaling, channel routing, idempotency, real-time infrastructure, and hard decisions about complexity versus cost. Every lesson from Phases 1 and 2 converges here.
Step 1: Requirements
Functional:
- Send notifications across channels: push (mobile), email, SMS, in-app
- Trigger types: user-triggered (like, follow), system-triggered (order shipped), scheduled (weekly summary)
- User preferences: disable notification types, choose channels, set Do Not Disturb hours
- Notification history: user can view past notifications
Non-functional:
- High throughput — millions/second during peak events
- Low latency — notifications arrive within seconds
- Reliability — transactional notifications (OTP, order confirmed) must never be lost
- Idempotency — same notification never sent twice
Step 2: Scale Estimation
100M daily active users
20 notifications/user/day
= 2 billion notifications/day
= ~23,000 notifications/second normally
Peak (IPL, flash sale): could spike to millions/second
This immediately tells you:
- Synchronous processing will collapse — async is mandatory
- Message queues are the backbone
- Must handle massive bursts (fan-out)
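A quick back-of-envelope check of those numbers (a sketch; the inputs are just the assumptions stated above):

```python
# Back-of-envelope throughput check for the figures above.
DAU = 100_000_000                      # daily active users (assumption above)
NOTIFS_PER_USER = 20                   # notifications per user per day (assumption above)

daily = DAU * NOTIFS_PER_USER          # 2,000,000,000 per day
steady = daily / 86_400                # seconds in a day
print(f"{daily:,}/day ≈ {steady:,.0f}/s steady state")   # ~23,148/s
```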
Step 3: The Core Challenge — Fan-Out
Fan-out means one event triggers notifications to many users.
Virat Kohli posts a photo
→ 250 million followers
→ 250 million notifications in seconds
Wrong approach:
Post created
→ Fetch all 250M followers from DB (single query, kills DB)
→ Loop through each, send one by one
→ Takes hours
Right approach:
Post created
→ Publish ONE event to Kafka
→ Notification workers consume from Kafka
→ Workers fetch followers in batches of 1,000
→ Thousands of workers process in parallel
→ 250M notifications sent in seconds
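A minimal sketch of that batched fan-out worker, assuming kafka-python, the topic names used later in this post, and a hypothetical paginated follower lookup:

```python
import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("post.events", group_id="fanout-workers",
                         bootstrap_servers="kafka:9092")
producer = KafkaProducer(bootstrap_servers="kafka:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

def follower_batches(author_id, batch_size=1_000):
    # Hypothetical paginated read of the follower store; a real version pages
    # through followers batch_size at a time instead of one giant query.
    yield from ()

for msg in consumer:                        # one post event in...
    event = json.loads(msg.value)
    for batch in follower_batches(event["authorId"]):
        for follower_id in batch:           # ...millions of notification events out
            producer.send("notification.events", {
                "type": "POST_CREATED",
                "recipientId": follower_id,
                "postId": event["postId"],
            })
```

Because all fan-out workers share one consumer group, Kafka spreads the partitions across them and the expansion runs in parallel.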
Step 4: Architecture
Layer 1: Event Sources
All application services publish notification events to Kafka:
Order Service → "order.shipped" event
Social Service → "post.liked" event
Auth Service → "new.login" event
Scheduler → "weekly.summary.due" event
↓
[Kafka]
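What the publish side might look like from the Order Service (a sketch; the topic and field names are assumptions mirroring the examples above):

```python
import json
from kafka import KafkaProducer

producer = KafkaProducer(bootstrap_servers="kafka:9092",
                         value_serializer=lambda v: json.dumps(v).encode())

# The Order Service fires one event and moves on; delivery is the
# notification pipeline's problem, not the order flow's.
producer.send("notification.events", {
    "type": "ORDER_SHIPPED",
    "recipientId": "u456",
    "orderId": "o789",
})
producer.flush()
```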
Layer 2: Notification Service (The Brain)
Consumes Kafka events and makes decisions:
Event: { type: "POST_LIKED", senderId: "u123", recipientId: "u456" }
Notification Service:
→ Fetch user u456 preferences (Redis cache)
→ Disabled like notifications? → skip
→ DND hours active? → delay
→ Preferred channels? → push + email
→ Build content
→ Route to channel queues
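A sketch of that decision flow as one function, assuming the Redis key layout above; the TRANSACTIONAL set and the channel defaults are illustrative:

```python
import json
import redis

r = redis.Redis(decode_responses=True)
TRANSACTIONAL = {"OTP", "ORDER_SHIPPED"}       # assumed always-send types

def decide(event: dict, now_hour: int) -> list[tuple[str, dict]]:
    """Return (channel, payload) pairs to enqueue; [] means skip or delay."""
    prefs = json.loads(r.get(f"prefs:{event['recipientId']}") or "{}")

    if event["type"] in prefs.get("disabled_types", []):
        return []                              # user disabled this type: skip
    start, end = prefs.get("dnd_start_hour"), prefs.get("dnd_end_hour")
    if start is not None:
        in_dnd = ((start <= now_hour < end) if start < end
                  else (now_hour >= start or now_hour < end))
        if in_dnd and event["type"] not in TRANSACTIONAL:
            return []                          # DND: write to scheduled_notifications instead
    payload = {"recipientId": event["recipientId"], "type": event["type"]}
    return [(ch, payload) for ch in prefs.get("channels", ["push"])]
```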
Layer 3: Channel Queues (Separate Per Channel)
Notification Service
↓ ↓ ↓ ↓
Push Queue Email Queue SMS Queue In-App Queue
↓ ↓ ↓ ↓
Push Worker Email Worker SMS Worker In-App Worker
↓ ↓ ↓ ↓
APNs/FCM SendGrid Twilio Cassandra
Why separate queues per channel:
- SMS is slow — don’t let it block push delivery
- Email provider down → push and SMS still work
- Each channel scales independently
- A push spike doesn’t affect email throughput
Kafka vs RabbitMQ Decision
Both work for the channel queues. Many companies use Kafka end-to-end:
Kafka: "notification.push"
Kafka: "notification.email"
Kafka: "notification.sms"
Push consumer group → each notification processed by exactly one worker
Use RabbitMQ if you need native priority queues or per-message TTL (both complex in Kafka). Use Kafka if team already uses it and you want one consistent system.
There is no single correct answer. The key is justifying your choice:
“Kafka end-to-end — consumer groups route each message to exactly one worker per channel, consistent technology, handles our scale”
“Kafka at ingestion for fan-out, RabbitMQ at channel layer for native priority queues and simpler retry config”
Both are valid.
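Assuming the Kafka-end-to-end option, a push worker is just a consumer in a shared group; partition assignment is what puts each message on exactly one worker. Topic and group names here are assumptions:

```python
import json
from kafka import KafkaConsumer

def deliver_push(payload: dict) -> None:
    ...   # placeholder for the real APNs/FCM call

consumer = KafkaConsumer(
    "notification.push",
    group_id="push-workers",             # all push workers share the partitions
    bootstrap_servers="kafka:9092",
    enable_auto_commit=False,            # commit only after a successful send
    value_deserializer=lambda v: json.loads(v),
)

for msg in consumer:
    deliver_push(msg.value)
    consumer.commit()
```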
Step 5: Data Model
Notifications table — Cassandra:
id → unique notification ID
recipient_id → partition key
created_at → clustering key (time-series access: each user's notifications sorted by time)
type → "POST_LIKED", "ORDER_SHIPPED"
channel → "push", "email", "sms", "in_app"
status → "pending", "sent", "failed", "read"
metadata → JSON
Cassandra fits: 2B writes/day, time series, access pattern is “all notifications for user u456 sorted by time.”
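One way that table might be expressed in CQL via the DataStax Python driver (keyspace and table names are assumptions):

```python
from cassandra.cluster import Cluster

session = Cluster(["cassandra"]).connect("notifications")
session.execute("""
    CREATE TABLE IF NOT EXISTS notification_history (
        recipient_id text,
        created_at   timeuuid,        -- timeuuid avoids same-millisecond collisions
        type         text,
        channel      text,
        status       text,
        metadata     text,
        PRIMARY KEY ((recipient_id), created_at)
    ) WITH CLUSTERING ORDER BY (created_at DESC)
""")

# The access pattern above: all notifications for u456, newest first.
rows = session.execute(
    "SELECT * FROM notification_history WHERE recipient_id = %s LIMIT 50", ("u456",))
```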
User preferences — PostgreSQL:
user_id → FK to users
push_enabled → boolean
email_enabled → boolean
dnd_start_hour → 22
dnd_end_hour → 8
disabled_types → ["MARKETING"]
timezone → "Asia/Kolkata"
PostgreSQL fits: relational (tied to user accounts), mutable, low volume, ACID for preference updates.
Cache preferences in Redis:
Key: "prefs:u456"
TTL: 1 hour
2B notifications/day hitting PostgreSQL for preferences = disaster. Read from Redis first.
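A cache-aside sketch of that read path (key layout follows the text; the SQL and connection details are assumptions):

```python
import json
import redis
import psycopg2

r = redis.Redis(decode_responses=True)
pg = psycopg2.connect("dbname=notifications")
FIELDS = ["push_enabled", "email_enabled", "dnd_start_hour",
          "dnd_end_hour", "disabled_types", "timezone"]

def get_prefs(user_id: str) -> dict:
    cached = r.get(f"prefs:{user_id}")
    if cached:
        return json.loads(cached)                  # hot path: Redis only
    with pg.cursor() as cur:                       # miss: fall back to PostgreSQL
        cur.execute(f"SELECT {', '.join(FIELDS)} FROM user_preferences"
                    " WHERE user_id = %s", (user_id,))
        row = cur.fetchone()
    prefs = dict(zip(FIELDS, row)) if row else {}
    r.set(f"prefs:{user_id}", json.dumps(prefs), ex=3600)   # 1 hour TTL, as above
    return prefs
```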
Step 6: Hard Problems
Idempotency — No Duplicate Notifications
In distributed systems, failures cause retries. Retry = potential duplicate.
Worker sends push → APNs delivers ✅
Network timeout before confirmation → worker retries
User gets notification twice ❌
Fix — idempotency key:
notificationId = hash(recipientId + eventType + eventId)
Before sending:
→ Check Redis: "sent:{notificationId}" exists?
→ Yes → already sent, skip
→ No → send, set "sent:{notificationId}" in Redis (24hr TTL)
Same event processed multiple times → delivered exactly once.
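A sketch of that check; the hash inputs and key layout follow the text, and the deliver() callable is a placeholder. Using SET NX makes the check-and-record step a single atomic call:

```python
import hashlib
import redis

r = redis.Redis()

def send_once(recipient_id: str, event_type: str, event_id: str, deliver) -> None:
    nid = hashlib.sha256(f"{recipient_id}:{event_type}:{event_id}".encode()).hexdigest()
    key = f"sent:{nid}"
    # SET NX both checks and records the idempotency key atomically,
    # closing the gap between "check" and "set" in the flow above.
    if not r.set(key, 1, nx=True, ex=86_400):     # 24h TTL
        return                                    # already sent: skip
    try:
        deliver()
    except Exception:
        r.delete(key)                             # release so the retry path can run
        raise
```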
Do Not Disturb
Notification arrives at 11 PM, user has DND 10 PM–8 AM:
→ Don't discard — delay it
→ Store in "scheduled_notifications"
→ Scheduler job runs at 8 AM, sends delayed notifications
Transactional notifications override DND:
- OTP → send immediately regardless
- Order confirmed → send immediately
- Marketing email → respect DND
- Like notification → respect DND
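A sketch of the DND window check in the user's own timezone (field names follow the preferences table above; note the window usually wraps past midnight, e.g. 22 → 8):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def in_dnd_window(prefs: dict, now_utc: datetime) -> bool:
    """now_utc must be timezone-aware; prefs comes from the preferences table."""
    local_hour = now_utc.astimezone(ZoneInfo(prefs["timezone"])).hour
    start, end = prefs["dnd_start_hour"], prefs["dnd_end_hour"]
    if start < end:                                    # e.g. 13 → 15, same day
        return start <= local_hour < end
    return local_hour >= start or local_hour < end     # wraps midnight: 22 → 8

# Transactional types skip this check entirely; anything else that lands in the
# window goes to scheduled_notifications and is sent when DND ends.
```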
Third Party Failures
Worker attempts send
→ APNs/SendGrid/Twilio returns error
→ Exponential backoff retry:
Retry 1 → wait 1 min
Retry 2 → wait 5 min
Retry 3 → wait 30 min
Retry 4 → Dead Letter Queue → alert engineers
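A sketch of that schedule; in a real worker you re-enqueue the message with the delay (or use a delay queue) rather than sleeping in the consumer:

```python
RETRY_DELAYS_SECONDS = [60, 300, 1_800]    # 1 min, 5 min, 30 min

def next_action(attempt: int) -> tuple[str, int]:
    """Given how many sends have already failed, decide what happens next."""
    if attempt < len(RETRY_DELAYS_SECONDS):
        return "retry", RETRY_DELAYS_SECONDS[attempt]
    return "dead_letter", 0                # park the message and page a human

# attempt 0 fails -> ("retry", 60); attempt 3 fails -> ("dead_letter", 0)
```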
Priority Queues
OTP for login → Critical — seconds matter
Order confirmed → High — send quickly
Post like → Normal — minutes acceptable
Weekly summary → Low — any time today
Separate queues per priority. Most workers assigned to Critical.
Full Architecture
[Event Sources]
Order / Social / Auth Services
↓
[Kafka]
"notification.events"
↓
[Notification Service]
Checks Redis prefs cache
Checks DND
Builds content
Routes to channel queues
↓
[Channel Queues]
Push | Email | SMS | In-App
↓
[Channel Workers]
↓
APNs/FCM | SendGrid | Twilio | DB
↓
[Storage]
Cassandra → notification history
PostgreSQL → user preferences
Redis → preference cache + idempotency keys
Feature Extension: Notification Grouping
The ask: show “John, Mary and 2 others liked your post” instead of 4 separate notifications.
Redis aggregation window:
Like arrives
→ Don't send immediately
→ Redis key: "notif:group:{postId}:{recipientId}"
Value: ["John", "Mary", "Alex"]
TTL: 30 seconds
New like:
→ Key exists? → append the liker's name, reset the TTL (the window extends)
→ Key missing? → create it and start a new 30-second window
When the TTL finally expires, the window closes:
→ Worker triggered by the expiry → sends "John, Mary and 1 other liked your post"
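A sketch of the window bookkeeping (key layout follows the text). One practical wrinkle: an expired key's value is already gone when the expiry event fires, so implementations typically put the TTL on a separate marker key and keep the name list readable until the grouped notification is sent.

```python
import redis

r = redis.Redis(decode_responses=True)
WINDOW_SECONDS = 30

def record_like(post_id: str, recipient_id: str, liker_name: str) -> None:
    key = f"notif:group:{post_id}:{recipient_id}"
    r.rpush(key, liker_name)            # first like creates the list
    r.expire(key, WINDOW_SECONDS)       # every new like resets the window

def grouped_text(names: list[str]) -> str:
    if len(names) == 1:
        return f"{names[0]} liked your post"
    if len(names) == 2:
        return f"{names[0]} and {names[1]} liked your post"
    others = len(names) - 2
    return (f"{names[0]}, {names[1]} and {others} "
            f"other{'s' if others > 1 else ''} liked your post")

# grouped_text(["John", "Mary", "Alex"]) -> "John, Mary and 1 other liked your post"
```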
Never group:
- OTP, order updates, direct messages, security alerts — each is distinct and time-sensitive
Feature Extension: Real-Time Unread Counter
The ask: bell icon updates in real time without page refresh.
Option A: SSE + Redis Pub/Sub
Notification arrives for u456
→ Increment Redis counter: "unread:u456"
→ Publish to Redis Pub/Sub: channel "user:u456"
→ SSE Server holding u456's connection receives signal
→ Pushes to browser → bell icon updates instantly
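The publish side of Option A is only a few lines (key and channel names follow the text; the SSE server holds a matching subscription):

```python
import redis

r = redis.Redis(decode_responses=True)

def on_notification(recipient_id: str) -> None:
    unread = r.incr(f"unread:{recipient_id}")        # counter is the source of truth
    r.publish(f"user:{recipient_id}", str(unread))   # fire-and-forget signal

# On the SSE server holding u456's connection:
#   pubsub = r.pubsub(); pubsub.subscribe("user:u456")
#   for message in pubsub.listen(): push message["data"] down the open SSE stream
```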
Redis Pub/Sub vs Kafka:
| | Redis Pub/Sub | Kafka |
|---|---|---|
| Message persistence | No — fire and forget | Yes — stored on disk |
| Offline consumers | Message lost | Message waits |
| Replay | No | Yes |
| Latency | Sub-millisecond | Low but higher |
| Use case | Real-time signaling | Event streaming |
Redis Pub/Sub is right here because:
- Message loss is acceptable — the Redis counter is the source of truth
- User offline = no SSE connection = nobody to signal anyway
- Sub-millisecond latency required
The scaling problem: 100M users = 100M SSE connections = 2,000 servers (at ~50,000 connections each) at $300K/month just for a bell icon counter.
Option B: Polling (Often the Right Answer)
Client polls every 30 seconds:
GET /api/notifications/count
→ App server reads Redis "unread:u456"
→ Returns count in <1ms
→ Bell icon updates
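The whole polling path is a single key read. A minimal sketch, assuming FastAPI purely for illustration (any framework and your real auth layer would do):

```python
import redis
from fastapi import FastAPI

app = FastAPI()
r = redis.Redis(decode_responses=True)

@app.get("/api/notifications/count")
def unread_count(user_id: str):          # in practice the id comes from the session
    return {"unread": int(r.get(f"unread:{user_id}") or 0)}
```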
Why polling often wins for this specific feature:
- Redis counter already exists — polling just reads it
- No extra infrastructure, no extra cost
- 30 second delay on a notification counter is acceptable
- 100M users × poll every 30s = ~3.3M req/sec — single-key reads that a replicated Redis tier absorbs comfortably
When to use SSE instead of polling:
- Feature genuinely requires sub-second updates (live auction, live score)
- You already have WebSocket infrastructure (piggyback for free)
- Poll requests themselves become the bottleneck
When to use push notifications (mobile):
- APNs (iOS) and FCM (Android) maintain the connections
- Apple and Google handle delivery to device
- Zero SSE servers needed for mobile users
Start with the simplest solution that meets the requirement. Add complexity only when the simple solution genuinely breaks.
For a notification counter: polling is simpler, cheaper, and sufficient. Build SSE only if polling proves inadequate.
Everything Connects
| Component | Lesson |
|---|---|
| Kafka fan-out | Lesson 8 — Message Queues |
| Channel workers with RabbitMQ/Kafka | Lesson 8 |
| Redis preference cache | Lesson 6 — Caching |
| Redis idempotency keys | Lesson 6 |
| Redis Pub/Sub for SSE routing | Lesson 6 |
| Cassandra for notification history | Lesson 7 — Databases |
| PostgreSQL for user preferences | Lesson 7 |
| Load balancer across notification servers | Lesson 5 |
| AP for notification history, CP for idempotency | Lesson 3 — CAP |
Key Takeaways
- Fan-out is the hardest scaling problem — one event, millions of recipients. Kafka + batched workers solves it.
- Separate queues per channel — failures and slowdowns in one channel don’t affect others.
- Idempotency keys in Redis — the simplest way to prevent duplicate delivery.
- DND means delay, not discard. Transactional notifications always bypass DND.
- Redis Pub/Sub = fire and forget signaling. Kafka = guaranteed delivery with replay. Don’t confuse them.
- Real-time unread counter: polling + Redis counter is often the right answer. SSE is only worth the infrastructure cost when sub-second updates are a genuine requirement.
Part of the system design series. Next: designing a feed system — how 500 million users get a personalised feed in under 100ms.