Three questions come up again and again in payment system design, and they separate candidates who understand distributed systems from those who only know the patterns.
Question 1 — Can the System Handle 10,000 Transactions/Second?
Each transaction involves:
→ Validate payment details
→ Check account balance
→ Debit sender account
→ Credit receiver account
→ Record transaction log
→ Send confirmation
The balance check, debit, credit, and log write must run as one ACID transaction.
Cannot lose a single transaction.
Cannot double charge.
Cannot credit without debiting.
Where It Breaks
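Single PostgreSQL node: ~5,000–10,000 simple queries/second (rough figure; depends heavily on hardware and workload)
Each transaction = 4–6 queries
Effective capacity: ~1,000–2,000 TPS (bounds: 5,000 ÷ 6 ≈ 830 and 10,000 ÷ 4 = 2,500)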
10,000 TPS overwhelms a single node ❌
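The arithmetic behind that estimate can be checked in a few lines. This is only a back-of-the-envelope sketch: the function name is made up for illustration, and the query rates are the rough figures quoted above, not benchmarks.

```python
def effective_tps(queries_per_second: int, queries_per_txn: int) -> float:
    """TPS a node sustains if every transaction costs a fixed number of queries."""
    return queries_per_second / queries_per_txn

# Pessimistic end: slower node, 6 queries per transaction.
low = effective_tps(5_000, 6)
# Optimistic end: faster node, 4 queries per transaction.
high = effective_tps(10_000, 4)

target_tps = 10_000
print(f"single node: {low:.0f} to {high:.0f} TPS, target: {target_tps} TPS")
print(f"nodes needed even at the optimistic end: {target_tps / high:.0f}")
```

Even the optimistic end leaves a single node at a quarter of the target, which is why the fix below combines several layers.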
The Fix — Tiered Architecture
Layer 1 — Redis validation (before DB is touched)
→ Account exists? (Redis cache)
→ Card valid? (Redis cache)
→ Daily limit exceeded? (Redis counter)
→ Rate limit? (Redis)
~80% of validation in Redis → sub-millisecond
Invalid requests filtered before reaching DB ✅
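A minimal sketch of the Layer 1 idea. Assumptions: `FakeRedis` is an in-memory stand-in so the example runs without a server (it mirrors only the `exists`, `incrby`, and `set` calls a Redis client offers), and the key names and `DAILY_LIMIT` value are hypothetical.

```python
class FakeRedis:
    """In-memory stand-in exposing the few Redis-style calls used below."""
    def __init__(self):
        self.data = {}

    def exists(self, key):
        return 1 if key in self.data else 0

    def incrby(self, key, amount):
        self.data[key] = int(self.data.get(key, 0)) + amount
        return self.data[key]

    def set(self, key, value):
        self.data[key] = value


DAILY_LIMIT = 50_000  # hypothetical per-account daily limit


def prevalidate(r, account_id: str, amount: int) -> str:
    """Cheap checks that run before the database is touched."""
    if not r.exists(f"account:{account_id}"):
        return "rejected: unknown account"
    # Reserve the amount against today's counter, then check the limit.
    spent = r.incrby(f"spent_today:{account_id}", amount)
    if spent > DAILY_LIMIT:
        r.incrby(f"spent_today:{account_id}", -amount)  # undo the reservation
        return "rejected: daily limit exceeded"
    return "ok"
```

Incrementing first and checking afterwards keeps the limit check to a single atomic counter operation, so two concurrent requests can't both slip under the limit.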
Layer 2 — Horizontal sharding
Shard accounts by account_id:
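Shard 1 → accounts 1–25M
Shard 2 → accounts >25M–50M
Shard 3 → accounts >50M–75M
Shard 4 → accounts >75M–100M
10,000 TPS / 4 shards = 2,500 TPS per shard, assuming load spreads evenly. That still sits at the top of the ~1,000–2,000 TPS single-node estimate, so either add shards (8 shards → 1,250 TPS each) or count on Layer 1 to keep rejected requests off the DB ✅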
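Routing a request to its shard can be sketched as a range lookup. The function names are illustrative, and treating each range's upper bound as inclusive (so account 25,000,000 lives on Shard 1) is an assumption.

```python
import bisect

# Inclusive upper bound of each shard's account_id range, per the ranges above.
SHARD_UPPER_BOUNDS = [25_000_000, 50_000_000, 75_000_000, 100_000_000]


def shard_for(account_id: int) -> int:
    """Return the 1-based shard number that owns account_id."""
    if not 1 <= account_id <= SHARD_UPPER_BOUNDS[-1]:
        raise ValueError(f"account_id out of range: {account_id}")
    return bisect.bisect_left(SHARD_UPPER_BOUNDS, account_id) + 1


def is_cross_shard(sender_id: int, receiver_id: int) -> bool:
    """True when debit and credit land on different shards."""
    return shard_for(sender_id) != shard_for(receiver_id)
```

`is_cross_shard` matters later: a payment whose two accounts sit on different shards can't be wrapped in one local database transaction.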
Layer 3 — Connection pooling
PgBouncer sits between app servers and DB
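10,000 in-flight requests
don't need 10,000 DB connections: each one holds a connection only while its transaction runs
Pool of 500 connections handles 10,000 TPS, as long as a transaction holds its connection for about 50 ms or less ✅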
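Where a pool size like 500 can come from: Little's law says the average number of busy connections equals the arrival rate times how long each transaction holds a connection. The sketch below assumes a 50 ms hold time, which is an illustrative figure rather than a measurement.

```python
import math


def connections_needed(tps: int, txn_ms: int) -> int:
    """Little's law: busy connections = arrival rate x time each one is held."""
    return math.ceil(tps * txn_ms / 1000)


print(connections_needed(10_000, 50))   # 50 ms per transaction
print(connections_needed(10_000, 200))  # 200 ms per transaction
```

The second call shows how sensitive the number is: if transactions slow to 200 ms, the same load needs four times the connections, so latency regressions show up as pool exhaustion.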
Layer 4 — Async non-critical work via Kafka
Critical path (synchronous):
→ Debit sender ✅
→ Credit receiver ✅
→ Return confirmation ✅ (<100ms)
Async via Kafka:
→ Email receipt
→ Fraud scoring
→ Analytics
→ Loyalty points
→ Push notification
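A toy version of that split. Assumptions: a `queue.Queue` stands in for a Kafka topic, a plain dict stands in for the accounts table, and the event names are hypothetical.

```python
import queue

events = queue.Queue()  # stand-in for a Kafka topic


def pay(balances: dict, sender: str, receiver: str, amount: int) -> dict:
    # Critical path: must finish before the user gets a response.
    if balances[sender] < amount:
        return {"status": "declined"}
    balances[sender] -= amount
    balances[receiver] += amount
    # Non-critical work: enqueue and return without waiting for consumers.
    for kind in ("email_receipt", "analytics", "loyalty_points"):
        events.put({"type": kind, "sender": sender, "amount": amount})
    return {"status": "success"}


accounts = {"alice": 100, "bob": 0}
first_result = pay(accounts, "alice", "bob", 60)
second_result = pay(accounts, "alice", "bob", 60)  # insufficient funds now
```

The response time depends only on the debit and credit; a slow email provider delays the receipt, not the payment.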
Question 2 — Database Crashes Mid-Transaction
The Scary Scenarios
Scenario A:
→ Debit sender ✅
→ DB crashes
→ Credit receiver ❌ never happens
→ Money vanished ❌
Scenario B:
→ Debit sender ✅
→ Credit receiver ✅
→ DB crashes before COMMIT
→ Both rolled back — user thinks payment went through ❌
Scenario C:
→ Transaction committed
→ DB crashes before writing to disk
→ Was it committed or not? Nobody knows ❌
Solution 1 — Write-Ahead Log (WAL)
PostgreSQL’s core protection:
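Before any change reaches the data files:
→ Append the change to the WAL (sequential disk write, fast)
→ Acknowledge COMMIT only after the WAL record is flushed to disk
→ Data files are updated later, in the background
DB crashes mid-transaction:
→ Restart → replay WAL → no commit record for that transaction
→ Its changes are discarded; the database is back at the last consistent state ✅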
DB crashes after commit:
→ WAL shows transaction was committed
→ PostgreSQL replays WAL on restart
→ Transaction restored ✅
WAL means the database always knows exactly what happened and can recover to a consistent state.
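The recovery rule can be shown with a toy store. This is a teaching sketch, not how PostgreSQL is implemented: the class and method names are invented, a Python list plays the on-disk log, and isolation between transactions is ignored.

```python
class TinyWalStore:
    """Toy key-value store: every change is logged before it is applied."""

    def __init__(self):
        self.wal = []   # survives a crash (stands in for the on-disk log)
        self.data = {}  # lost on crash (stands in for in-memory pages)

    def begin(self, txid):
        self.wal.append(("begin", txid))

    def write(self, txid, key, value):
        self.wal.append(("write", txid, key, value))  # log first
        self.data[key] = value                         # then apply

    def commit(self, txid):
        self.wal.append(("commit", txid))

    def crash_and_recover(self):
        """Rebuild state from the log, keeping only committed transactions."""
        committed = {rec[1] for rec in self.wal if rec[0] == "commit"}
        self.data = {}
        for rec in self.wal:
            if rec[0] == "write" and rec[1] in committed:
                self.data[rec[2]] = rec[3]


store = TinyWalStore()
store.begin(1); store.write(1, "alice", 90); store.write(1, "bob", 10); store.commit(1)
store.begin(2); store.write(2, "alice", 0)  # crash hits before commit
store.crash_and_recover()
print(store.data)
```

Transaction 1 comes back in full and transaction 2 disappears in full, which covers Scenarios A and B; Scenario C is answered by the commit record either being in the log or not.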
Solution 2 — Saga Pattern
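For cross-shard payments, a local ACID transaction isn't enough: it covers one database, and the debit and credit now live on two. Two-phase commit can span both, but it holds locks while waiting on every participant, so payment systems commonly reach for Sagas instead.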
A Saga breaks one big transaction into smaller steps, each with a compensating action that undoes it if something fails later.
Step 1: Debit sender (Shard 1)
→ Fail → stop, show error
Step 2: Credit receiver (Shard 2)
→ Fail → COMPENSATE Step 1: refund sender ✅
Step 3: Record transaction log
→ Fail → COMPENSATE Steps 1 & 2 → both accounts restored ✅
Every step has a compensating action:
| Action | Compensation |
|---|---|
| Debit sender | Refund sender |
| Credit receiver | Debit receiver back |
| Send receipt | Send correction email |
If DB crashes mid-saga:
System restarts
→ Reads saga state from persistent log
→ Knows exactly which steps completed and which one failed
→ Executes compensating actions for completed steps, in reverse order
→ Accounts restored to their starting balances ✅
→ No money lost. No inconsistent state.
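A minimal orchestrator sketch for the flow above. Assumptions: `run_saga` and the step functions are hypothetical names, an in-memory list stands in for the persistent saga log, and a dict stands in for the two shards.

```python
def run_saga(steps, log):
    """steps: list of (name, action, compensation). Returns True if all succeed."""
    completed = []
    for name, action, compensate in steps:
        try:
            action()
            log.append(("done", name))
            completed.append((name, compensate))
        except Exception:
            log.append(("failed", name))
            # Undo finished steps, most recent first.
            for done_name, undo in reversed(completed):
                undo()
                log.append(("compensated", done_name))
            return False
    return True


balances = {"sender": 100, "receiver": 0}

def debit():    balances["sender"] -= 30
def refund():   balances["sender"] += 30
def credit():   raise RuntimeError("shard 2 unavailable")
def uncredit(): balances["receiver"] -= 30

saga_log = []
ok = run_saga(
    [("debit_sender", debit, refund), ("credit_receiver", credit, uncredit)],
    saga_log,
)
```

In a real system each log entry would be written durably before moving on, so a restarted orchestrator can resume compensation from the log instead of from memory.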
Solution 3 — Synchronous Replication + Automatic Failover
Primary DB handles all writes
Replica — synchronous replication
→ Transaction only confirmed when BOTH primary AND replica have written it
Primary crashes:
→ Replica promoted automatically
→ Failover: 30–60 seconds
→ No confirmed transaction is lost ✅ (unconfirmed in-flight ones roll back and get retried)
Question 3 — Same Request Hits Server Twice (Network Retry)
The Scenario
User clicks Pay ₹10,000
Request 1 → processes → ₹10,000 debited ✅ → response sent
Network drops before response reaches user
User sees spinner → clicks Pay again (or app auto-retries)
Request 2 → looks like a new transaction → ₹10,000 debited again ❌
User charged twice.
This happens constantly in production — network timeouts, mobile drops, load balancer retries.
Solution — Idempotency Keys
Industry standard. Used by Stripe, Razorpay, PayPal, and other major payment processors.
Client generates a unique key before sending:
idempotencyKey = UUID() → "a3f9b2c1-4d5e-6f7a-8b9c"
POST /api/payment
{
  "amount": 10000,
  "to": "receiver123",
  "idempotencyKey": "a3f9b2c1-4d5e-6f7a-8b9c"
}
Server logic:
Request arrives with idempotencyKey
Check Redis: "idempotency:a3f9b2c1-4d5e-6f7a-8b9c" exists?
→ No (first time):
Process transaction
Store result in Redis:
Key: "idempotency:a3f9b2c1..."
Value: { status: "success", transactionId: "t789" }
TTL: 24 hours
Return result ✅
→ Yes (duplicate):
Don't process again
Return SAME stored result
No double charge ✅
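The server logic above, sketched in code. Assumptions: `FakeRedis` is an in-memory stand-in that ignores the TTL, `handle_payment` is a hypothetical handler name, and appending to `charges` stands in for the real debit. It deliberately does not handle two copies arriving at the same instant.

```python
import json


class FakeRedis:
    """In-memory stand-in for get/set; the ex (TTL) argument is accepted but ignored."""
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value, ex=None):
        self.data[key] = value
        return True


charges = []  # stand-in for the real debit


def handle_payment(r, request: dict) -> dict:
    key = f"idempotency:{request['idempotencyKey']}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)       # duplicate: replay the stored result
    charges.append(request["amount"])   # first time: process the payment
    result = {"status": "success", "transactionId": f"t{len(charges)}"}
    r.set(key, json.dumps(result), ex=24 * 60 * 60)  # keep for 24 hours
    return result


redis_stub = FakeRedis()
req = {"amount": 10000, "to": "receiver123", "idempotencyKey": "a3f9b2c1-4d5e-6f7a-8b9c"}
first = handle_payment(redis_stub, req)
second = handle_payment(redis_stub, req)  # network retry of the same request
```

Both calls return the same response, and the account is charged once.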
Why the client must generate the key (not the server):
If server generates it → client must receive it first
Network drop before client receives key
→ Client retries without key → duplicate possible ❌
Client generates before sending:
→ Key exists regardless of network
→ Same key used for all retries
→ Server deduplicates correctly ✅
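On the client side, the point is that the key is created once and reused on every attempt. A sketch, assuming a hypothetical `send` callable that performs the POST and raises `TimeoutError` when no response arrives:

```python
import uuid


def pay_with_retries(send, amount: int, to: str, max_attempts: int = 3) -> dict:
    """send: callable that POSTs the payload and may raise TimeoutError."""
    key = str(uuid.uuid4())  # generated once, before the first attempt
    payload = {"amount": amount, "to": to, "idempotencyKey": key}
    for _ in range(max_attempts):
        try:
            return send(payload)
        except TimeoutError:
            continue  # retry with the SAME payload and key
    raise TimeoutError("payment status unknown after retries")


keys_seen = []

def flaky_send(payload):
    """Test double: times out twice, then succeeds."""
    keys_seen.append(payload["idempotencyKey"])
    if len(keys_seen) < 3:
        raise TimeoutError
    return {"status": "success"}

outcome = pay_with_retries(flaky_send, 10000, "receiver123")
```

Generating the key inside the retry loop would be the bug to avoid: every attempt would look like a new payment to the server.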
Edge Case — Two Identical Requests Arrive Simultaneously
Request 1 arrives → starts processing → not yet done
Request 2 arrives → same key → Redis key not stored yet → thinks new request → double charge ❌
Fix — SET NX as a lock:
Request arrives:
SET "idempotency:key123" "processing" NX EX 30
→ NX: only set if key doesn't exist
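→ EX 30: lock expires in 30 seconds, so a crashed worker can't block the key forever
SET succeeded → this request owns processing → proceed, then overwrite the key with the final result (TTL 24 hours)
SET failed → another request is processing → wait, poll the key, return the stored result
Only one request processes, as long as processing finishes within the 30-second lock window ✅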
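The claim step can be sketched as below. Assumptions: `FakeRedis` is an in-memory stand-in that mimics the `set(name, value, nx=..., ex=...)` call shape of the redis-py client (returning `None` when NX fails), and `try_claim` is a hypothetical helper name.

```python
import time


class FakeRedis:
    """In-memory stand-in for SET with NX and EX options."""
    def __init__(self):
        self.data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, nx=False, ex=None):
        now = time.monotonic()
        current = self.data.get(key)
        live = current is not None and (current[1] is None or current[1] > now)
        if nx and live:
            return None  # key already held: NX refuses to overwrite
        self.data[key] = (value, (now + ex) if ex else None)
        return True


def try_claim(r, idempotency_key: str) -> bool:
    """True if this request won the right to process the payment."""
    claimed = r.set(f"idempotency:{idempotency_key}", "processing", nx=True, ex=30)
    return bool(claimed)


lock_store = FakeRedis()
winner = try_claim(lock_store, "key123")  # first copy of the request
loser = try_claim(lock_store, "key123")   # simultaneous duplicate
```

Because check-and-set happens in one command, there is no gap between "does the key exist?" and "store the key" for a second request to slip through.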
Complete Payment Architecture
[Client]
Generate idempotencyKey
↓
[Rate Limiter]
Max 5 attempts/user/minute
↓
[App Server]
Check idempotency key in Redis
→ Duplicate → return cached result immediately
→ New → SET NX lock → proceed
↓
[Redis Validation]
Account exists? Daily limit? Card valid? Fraud score?
Failures rejected here — DB never touched
↓
[Saga Orchestrator]
Step 1: Debit sender (Shard 1) → WAL first → record step
Step 2: Credit receiver (Shard 2) → WAL first → record step
Step 3: Commit → store idempotency result in Redis
↓
[Primary DB + Synchronous Replica]
Written to both before confirming
Primary crashes → replica promotes → zero data loss
↓
[Kafka — async]
Email, push notification, analytics, fraud analysis, loyalty points
↓
[Response to user: <200ms end to end; the async Kafka work does not block it]
What Happens in Each Failure Scenario
| Failure | What saves it |
|---|---|
| DB crashes mid-transaction | WAL rollback + Saga compensating actions |
| Network retry sends duplicate | Idempotency key → returns cached result |
| Primary DB goes down | Replica promotes; in-flight tx roll back; client retries with same key → processed once on new primary |
| Saga step fails midway | Compensating actions restore all accounts |
The Three Non-Negotiable Principles
Everything in payment systems reduces to these three:
Atomicity — all steps complete or none do → ACID + Saga pattern
Idempotency — same request processed exactly once, regardless of retries → Client-generated idempotency keys + Redis SET NX
Durability — committed transactions survive any failure → WAL + synchronous replication
Everything else — performance, scale, features — is secondary to getting these three right.