Why Kafka Sits Between Cassandra and the Fan-out Service


The Architecture With Kafka

User creates post
→ App Server saves to Cassandra
→ App Server publishes to Kafka
→ Fan-out Service reads from Kafka
→ Fan-out Service updates Redis feeds
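The whole pipeline can be sketched in a few lines. This is a minimal in-memory stand-in, not real infrastructure: a dict for Cassandra, a `queue.Queue` for the Kafka topic, another dict for Redis feeds, and all names (`posts_db`, `fan_out_worker`, etc.) are hypothetical.

```python
from queue import Queue

# In-memory stand-ins for the real systems (names are illustrative).
posts_db = {}           # Cassandra: durable post storage
post_events = Queue()   # Kafka: buffer between app server and fan-out
feeds = {}              # Redis: follower_id -> list of post_ids

followers = {"alice": ["bob", "carol"]}  # author -> their followers

def create_post(author, post_id, text):
    """App server: save the post, publish an event, return immediately."""
    posts_db[post_id] = {"author": author, "text": text}
    post_events.put({"author": author, "post_id": post_id})
    return "Post created"

def fan_out_worker():
    """Fan-out service: drain the topic and prepend posts to follower feeds."""
    while not post_events.empty():
        event = post_events.get()
        for follower in followers.get(event["author"], []):
            feeds.setdefault(follower, []).insert(0, event["post_id"])

create_post("alice", "p1", "hello")
fan_out_worker()
print(feeds)  # {'bob': ['p1'], 'carol': ['p1']}
```

The key detail: `create_post` returns before `fan_out_worker` runs. Everything below follows from that separation.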

What Breaks Without It

Option A — App Server Calls Fan-out Directly

Problem 1 — User waits for fan-out

Save to Cassandra → 10ms
Call Fan-out Service → updates 1,000 follower feeds in Redis → 500ms
Return "Post created" to user → 510ms total

For a celebrity with 10M followers → user waits minutes ❌

With Kafka: save to Cassandra (10ms) → drop message in Kafka (5ms) → return immediately. Fan-out happens in background.
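The arithmetic behind those numbers, using the figures from the text (0.5ms per follower, 10ms Cassandra save, 5ms Kafka publish):

```python
# Back-of-envelope latency; all constants come from the example above.
CASSANDRA_SAVE_MS = 10
KAFKA_PUBLISH_MS = 5
PER_FOLLOWER_MS = 0.5   # 500ms for 1,000 followers

def sync_response_ms(follower_count):
    """User-facing latency when fan-out happens inside the request."""
    return CASSANDRA_SAVE_MS + follower_count * PER_FOLLOWER_MS

def async_response_ms(follower_count):
    """User-facing latency when fan-out is deferred to Kafka."""
    return CASSANDRA_SAVE_MS + KAFKA_PUBLISH_MS  # independent of followers

print(sync_response_ms(1_000))                    # 510.0 ms
print(sync_response_ms(10_000_000) / 1000 / 60)   # ~83 minutes for a celebrity
print(async_response_ms(10_000_000))              # 15 ms, regardless of audience
```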

Problem 2 — Tight coupling

Without Kafka:
Fan-out Service goes down for 2 minutes
→ App Server gets connection errors
→ Post creation fails for users
→ Core feature broken by a background service ❌

With Kafka:
Fan-out Service goes down for 2 minutes
→ Messages accumulate in Kafka
→ Post creation works perfectly ✅
→ Service comes back, processes backlog
→ Slightly delayed feeds, nothing lost
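The failure scenario above can be simulated with a plain queue standing in for the topic (a sketch, not real Kafka semantics):

```python
from queue import Queue

events = Queue()  # stand-in for the Kafka topic

def create_post(post_id):
    """Publishing is the only dependency; it succeeds even if fan-out is down."""
    events.put(post_id)
    return "Post created"

# Fan-out is "down for 2 minutes": nobody consumes, yet every post succeeds.
results = [create_post(i) for i in range(5)]
assert all(r == "Post created" for r in results)

# Fan-out comes back and drains the backlog; nothing was lost.
backlog = []
while not events.empty():
    backlog.append(events.get())
print(backlog)  # [0, 1, 2, 3, 4]
```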

Problem 3 — Traffic spike absorption

IPL match ends — 10 million users post simultaneously:

Without Kafka:
10M posts/sec → App Server calls Fan-out directly
→ Fan-out overwhelmed → crashes
→ App Server gets errors → post creation fails ❌

With Kafka:
10M posts/sec → messages drop into Kafka (it's a buffer)
Fan-out processes at its own pace (e.g. 100k/sec)
→ Catches up in ~100 seconds
→ Zero post creation failures ✅
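The catch-up time is just backlog divided by consumer throughput, using the spike numbers above:

```python
# Spike absorption: how long until the fan-out service catches up?
spike_posts = 10_000_000   # posts buffered during the spike
consume_rate = 100_000     # posts/sec the fan-out service sustains

catch_up_seconds = spike_posts / consume_rate
print(catch_up_seconds)  # 100.0 seconds of delayed feeds, zero failures
```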

Problem 4 — Multiple consumers

When a post is created, multiple services need to react:

Without Kafka:
App Server → calls Fan-out API
           → calls Notification API
           → calls Analytics API
           → calls Search Indexing API
           → calls Moderation API
5 direct calls. Any one fails → problem.
Adding a new service → modify App Server code. ❌

With Kafka:
App Server → publishes ONE event
All 5 services independently subscribe.
Adding a new service → just subscribe to the topic.
App Server code never changes. ✅
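The publish/subscribe shape reduces to this: the publisher loops over registered handlers and never changes when a subscriber is added. A minimal sketch (in real Kafka each consumer group reads the topic independently rather than being called in-process):

```python
# Minimal publish/subscribe: one event, many independent consumers.
subscribers = []

def subscribe(handler):
    subscribers.append(handler)

def publish(event):
    for handler in subscribers:  # publisher code is fixed forever
        handler(event)

seen = []
subscribe(lambda e: seen.append(("fanout", e)))
subscribe(lambda e: seen.append(("notify", e)))
subscribe(lambda e: seen.append(("analytics", e)))

# Adding a new service is just another subscribe; publish() never changes.
subscribe(lambda e: seen.append(("search", e)))

publish("post_created:p1")
print(len(seen))  # 4 services reacted to one published event
```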

Option B — App Server Writes Directly to Redis

User creates post
→ App Server saves to Cassandra
→ App Server fetches all followers
→ App Server updates 1,000 Redis keys
→ Return response

At 1,200 posts/sec with ~1,000 followers each: 1.2M follower-list reads + 1.2M Redis writes per second, all inside user-facing request handlers. App servers overwhelmed, latency spikes for everyone.
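Where the 1.2M/sec figure comes from:

```python
# Load if the app server does fan-out inline (numbers from the text).
posts_per_sec = 1_200
avg_followers = 1_000

follower_rows_read = posts_per_sec * avg_followers  # follower-list reads/sec
redis_writes = posts_per_sec * avg_followers        # feed writes/sec
print(follower_rows_read, redis_writes)  # 1200000 1200000
```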


The Three Problems Kafka Solves

| Problem | Without Kafka | With Kafka |
|---|---|---|
| Latency | User waits for fan-out | Instant response, fan-out async |
| Coupling | Post creation fails if fan-out fails | Completely independent |
| Traffic spikes | Fan-out overwhelmed and crashes | Kafka buffers the spike |

Bonus: Replay

Fan-out service had a bug for 2 hours.
Posts processed incorrectly.

Without Kafka: those 2 hours of posts → processed wrongly → can't redo ❌
With Kafka: reset consumer offset to 2 hours ago → reprocess correctly ✅
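Replay works because a Kafka topic is an append-only log that retains events, and each consumer just tracks an offset into it. A toy sketch with a list as the log (real consumers would use the client's seek-to-offset API):

```python
# Replay: the topic retains events; the consumer owns its read position.
log = [f"post-{i}" for i in range(10)]  # stand-in for the retained topic

def consume_from(offset):
    """Reprocess every event from `offset` onward."""
    return log[offset:]

# A bug affected the last 4 events: rewind the offset and reprocess them.
reprocessed = consume_from(len(log) - 4)
print(reprocessed)  # ['post-6', 'post-7', 'post-8', 'post-9']
```

Without the log, the original events are gone the moment the buggy service finishes with them; with it, "2 hours ago" is just a smaller offset.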

The Mental Model — When to Use Kafka

Ask three questions:

  1. Can this work happen after I respond to the user? → candidate for async
  2. Would my core feature break if this downstream processing fails? → need decoupling
  3. Can load be uneven — spikes followed by quiet periods? → need buffering

For fan-out in the feed system, all three answers are yes.


One Line Summary

Post creation and feed fan-out are two different concerns with different speeds, different failure modes, and different scale requirements. Without Kafka they are one fragile synchronous chain. With Kafka they are two independent systems that communicate through a buffer.