Design a simple service architecture for a payment processor.
Payment processing is the act of receiving payment requests, forwarding those requests to downstream processors, and then returning responses.
The flow consists of:
Hold: Receive a hold request containing an account number (e.g., a credit card number) and an amount, and forward it to a downstream processor.
The downstream processor may approve or deny the hold. If it approves, the funds in the account are held for a short period of time.
Charge: If the hold is approved, the customer can sign (and optionally add a tip), after which the merchant sends a charge request.
If no charge request is received, eventually the held funds are released.
The charge requests are gathered up into a large batch file grouped by downstream processor.
Every evening at 10pm, that file is sent and the funds are wire-transferred in one large batch.
Example: A customer might tap their American Express credit card at a coffee shop.
The coffee shop's card machine sends a hold request to the processor (us), and we forward it, with the credit card number and amount, to the downstream processor (American Express) to hold funds for the payment.
American Express might approve that hold based on factors such as whether the account has sufficient credit.
Later, the retailer sends a charge request, which the processor (us) batches into a file with all the other American Express transactions; that file initiates one large transfer for all of the funds.
Note: This is an intentionally simplified view of a payments system to fit in the time period of the interview. Many real-world payment systems have additional complexity.
System Design - Payment System
Payment Processing System Design - Detailed Technical Walkthrough
First Half - Standard Design:
Chose DynamoDB (more familiar with it than with other databases)
After analyzing traffic patterns and comparing different database options, decided to stick with DynamoDB
Key points that earned credit:
Mentioned idempotency keys for preventing duplicate charges
Discussed state transitions (pending → authorized → captured → settled)
Second Half - Where Things Got Difficult:
The daily 10pm batch commit was the challenging part:
This is where merchants can adjust the final amount (e.g., adding tips) before committing
Spent too much time thinking about partition key and sort key design to make queries simpler
Struggled with designing an efficient way to query and batch all the captured transactions
What Could Have Been Better:
"In hindsight, I should have just added state as a Local Secondary Index (LSI). That would have made querying by state much simpler for the batch settlement process."
Result:
Did not receive a "strong hire" recommendation
Main reason: batch commit design wasn't handled well
Interview Style:
System is "quite small" in scope, so questions went very deep on the details
Need to specify exact API designs (endpoints, request/response formats)
Need to design database table schemas (specific columns, indexes, data types)
Common Follow-Up Questions:
10x Scale: "How would you handle 10x the traffic?"
Need to discuss sharding strategies
Database bottlenecks (write throughput)
External processor API rate limits
Global Launch: "How would you design this for global deployment?"
Multi-region architecture
Data residency and compliance (GDPR, PCI-DSS)
Regional settlement processes with local processors
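For the sharding question, the AI-generated solution below proposes sharding payments by payment_id. A minimal routing sketch, assuming a fixed shard count (NUM_SHARDS and shard_for are hypothetical names). It uses a stable hash because Python's built-in hash() is salted per process and would route the same payment differently after a restart.

```python
import hashlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_for(payment_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a payment to a shard with a stable hash (not Python's salted hash())."""
    digest = hashlib.sha256(payment_id.encode("utf-8")).digest()
    # First 8 bytes of the digest as an unsigned integer, reduced to a shard index.
    return int.from_bytes(digest[:8], "big") % num_shards
```

Plain modulo reassigns most keys when the shard count changes, so in practice many fixed logical shards are mapped onto physical nodes, or consistent hashing is used.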
Key Takeaway: Because the problem scope is intentionally simplified, interviewers compensate by asking for very specific implementation details and challenging follow-ups about scaling and global deployment.
✦ AI-Generated Solution · System Design · Comprehensive
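Functional
Hold: accept (account_number, amount), forward it to the correct downstream processor, and record the approve/deny result.
Charge: capture an authorized hold with the final amount (including any tip), or release the hold if no charge arrives.
Settlement: at 10pm, batch all captured charges into one file per downstream processor and initiate the wire transfer.
Non-functional
No double charges under retries, strong consistency for state transitions, a complete audit trail, and a settlement batch that is safe to re-run after a crash.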
Model a payment as a single entity advancing through an explicit state machine. This is the single most important idea the interviewer is looking for.

```
PENDING ── hold approved ──> AUTHORIZED ── charge req ──> CAPTURED ── 10pm batch ──> SETTLED
   │                            │
   └── hold denied ──> DECLINED └── timeout (no charge) ──> EXPIRED (funds released)
```
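A minimal sketch of the diagram as a transition table (State, TRANSITIONS, transition and IllegalTransition are illustrative names, not from any library). Rejecting every edge that is not in the table is what makes a second capture, or a capture after expiry, fail loudly instead of silently succeeding.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    AUTHORIZED = "AUTHORIZED"
    DECLINED = "DECLINED"
    CAPTURED = "CAPTURED"
    EXPIRED = "EXPIRED"
    SETTLED = "SETTLED"


# Legal transitions, read directly off the diagram; anything else is rejected.
TRANSITIONS: dict[State, set[State]] = {
    State.PENDING: {State.AUTHORIZED, State.DECLINED},
    State.AUTHORIZED: {State.CAPTURED, State.EXPIRED},
    State.CAPTURED: {State.SETTLED},
    # Terminal states have no outgoing edges.
    State.DECLINED: set(),
    State.EXPIRED: set(),
    State.SETTLED: set(),
}


class IllegalTransition(Exception):
    pass


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the diagram does not allow the move."""
    if target not in TRANSITIONS[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

For example, transition(State.PENDING, State.AUTHORIZED) returns State.AUTHORIZED, while transition(State.CAPTURED, State.CAPTURED) raises IllegalTransition.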
```sql
-- One row per payment, mutated through its lifecycle (state is indexed for batch queries)
CREATE TABLE payments (
    payment_id      UUID PRIMARY KEY,
    account_number  TEXT NOT NULL,    -- tokenized / vaulted, never the raw PAN
    processor_id    TEXT NOT NULL,    -- amex, visa, ...
    amount_auth     BIGINT NOT NULL,  -- minor units (cents)
    amount_capture  BIGINT,           -- final amount incl. tip
    state           TEXT NOT NULL
        CHECK (state IN ('PENDING', 'AUTHORIZED', 'CAPTURED',
                         'SETTLED', 'DECLINED', 'EXPIRED')),
    idempotency_key TEXT UNIQUE,      -- dedupes retried hold requests
    batch_id        UUID,             -- settlement batch; set on CAPTURED -> SETTLED
    hold_expires_at TIMESTAMPTZ,
    created_at      TIMESTAMPTZ DEFAULT now(),
    updated_at      TIMESTAMPTZ DEFAULT now()
);

CREATE INDEX idx_settlement ON payments (processor_id, state, hold_expires_at)
    WHERE state = 'CAPTURED';  -- partial index makes the 10pm batch query cheap

CREATE TABLE payment_events (  -- append-only audit log (event sourcing)
    event_id   BIGSERIAL PRIMARY KEY,
    payment_id UUID NOT NULL REFERENCES payments (payment_id),
    type       TEXT NOT NULL,  -- HOLD_REQUESTED, HOLD_APPROVED, CAPTURED, SETTLED, ...
    payload    JSONB,
    created_at TIMESTAMPTZ DEFAULT now()
);
```
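The candidate's hindsight above was to index state. They named an LSI, but in DynamoDB a GSI is the more natural fit: an LSI keeps the table's partition key (so it only helps if that key is already the grouping you batch by, such as processor_id) and must be defined when the table is created, while a GSI on state, or on processor_id plus state, can be added later and queried across all payments. The partial index idx_settlement above is the relational equivalent: it makes the batch settlement query a fast index range scan instead of a full table scan.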
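SQLite also supports partial indexes, so the settlement query can be tried locally with Python's built-in sqlite3 module. This sketch trims the schema to the columns the query needs and pages through one processor's CAPTURED rows by payment_id; keyset pagination and the captured_page helper are assumptions for illustration. Whether the planner actually picks idx_settlement depends on the data, and EXPLAIN QUERY PLAN shows which index it uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (
        payment_id      TEXT PRIMARY KEY,
        processor_id    TEXT NOT NULL,
        amount_capture  INTEGER,          -- minor units (cents)
        state           TEXT NOT NULL,
        hold_expires_at TEXT
    );
    -- Same shape as the Postgres index: only CAPTURED rows are indexed.
    CREATE INDEX idx_settlement ON payments (processor_id, state, hold_expires_at)
        WHERE state = 'CAPTURED';
    """
)
conn.executemany(
    "INSERT INTO payments VALUES (?, ?, ?, ?, ?)",
    [
        ("p1", "amex", 4600, "CAPTURED", None),
        ("p2", "amex", None, "AUTHORIZED", "2024-01-01T23:00:00Z"),
        ("p3", "visa", 1250, "CAPTURED", None),
        ("p4", "amex", 900, "CAPTURED", None),
    ],
)
conn.commit()


def captured_page(processor_id: str, after_id: str = "", page_size: int = 500):
    """Return up to page_size CAPTURED rows for one processor, after after_id."""
    return conn.execute(
        "SELECT payment_id, amount_capture FROM payments "
        "WHERE processor_id = ? AND state = 'CAPTURED' AND payment_id > ? "
        "ORDER BY payment_id LIMIT ?",
        (processor_id, after_id, page_size),
    ).fetchall()


# Stream one processor's rows page by page, resuming after the last id seen.
after = ""
while page := captured_page("amex", after, page_size=1):
    for payment_id, amount in page:
        print(payment_id, amount)
    after = page[-1][0]
```

The AUTHORIZED row p2 never appears, because the query (like the index) only covers CAPTURED rows.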
```
POST /v1/holds                   Idempotency-Key: <uuid>
{ "account_number": "tok_...", "amount": 4000, "processor": "amex" }
-> 201 { "payment_id": "...", "state": "AUTHORIZED" }   # or DECLINED

POST /v1/payments/{id}/capture   Idempotency-Key: <uuid>
{ "amount": 4600 }                                       # original + tip
-> 200 { "state": "CAPTURED" }

POST /v1/payments/{id}/release   Idempotency-Key: <uuid>
-> 200 { "state": "EXPIRED" }

GET  /v1/payments/{id}
-> 200 current state + event history
```
Every mutating endpoint requires an Idempotency-Key. The server stores the key with the result; a replay returns the stored result instead of re-executing.
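A single-process sketch of that idempotency store, using sqlite3 for brevity (the idempotency_keys table, run_once and InProgress are hypothetical names). The key is claimed with an INSERT before the operation runs, so the database's uniqueness check, not an application-level read, decides which of two racing retries executes.

```python
import json
import sqlite3
from typing import Callable


class InProgress(Exception):
    """Raised when a request with the same key has not finished yet."""


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE idempotency_keys (
           key      TEXT PRIMARY KEY,  -- client-supplied Idempotency-Key
           response TEXT               -- JSON result; NULL while in progress
       )"""
)


def run_once(key: str, operation: Callable[[], dict]) -> dict:
    """Execute operation at most once per key; replays get the stored result."""
    try:
        # Claim the key first: the PRIMARY KEY rejects a duplicate claim here.
        with conn:
            conn.execute("INSERT INTO idempotency_keys VALUES (?, NULL)", (key,))
    except sqlite3.IntegrityError:
        (stored,) = conn.execute(
            "SELECT response FROM idempotency_keys WHERE key = ?", (key,)
        ).fetchone()
        if stored is None:
            raise InProgress(key)
        return json.loads(stored)  # replay: return the saved result, do not re-run

    try:
        result = operation()
    except Exception:
        # Assumes the operation failed cleanly, so the key is released for a retry.
        # An ambiguous processor timeout needs reconciliation with the processor instead.
        with conn:
            conn.execute("DELETE FROM idempotency_keys WHERE key = ?", (key,))
        raise

    with conn:
        conn.execute(
            "UPDATE idempotency_keys SET response = ? WHERE key = ?",
            (json.dumps(result), key),
        )
    return result
```

In the real service this table would live in the payments database, so the result can be committed in the same transaction as the payment row it describes.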

Authorization service: creates the payment in PENDING, calls the downstream processor, and transitions it to AUTHORIZED or DECLINED. Wrap the external call so a timeout doesn't leave an ambiguous state (reconcile via the processor's query API).
Capture service: accepts a charge only for an AUTHORIZED payment, writes CAPTURED, and emits an event.
Settlement job (10pm): selects payments with state = CAPTURED, writes a settlement file to object storage, initiates the wire transfer, and atomically flips rows to SETTLED (marking them with a batch_id so a re-run is idempotent).
Settlement query: read the partial index by (processor_id, 'CAPTURED') in pages and stream rows into a settlement file grouped by processor.
Re-runnable batch: assign a batch_id and flip each included row CAPTURED -> SETTLED in the same transaction that records the batch membership, so a crash mid-run is safely re-runnable (rows already SETTLED are skipped).
Double charges: the idempotency_key UNIQUE constraint plus the state machine (AUTHORIZED -> CAPTURED only) prevents double capture even under retries.
Processor timeouts: leave the payment in PENDING, then reconcile using the processor's idempotency key or query API.
Storage: prefer a relational database, because the state transitions need strong consistency and conditional writes. If using DynamoDB, use conditional updates (condition: state = AUTHORIZED) and a GSI on state for the batch query.
Scaling: shard payments by payment_id; the settlement job fans out per processor partition. The authorize path is the write hot path, so scale the stateless services horizontally; the batch is throughput-bound, not latency-bound.
Monitoring: alert on payments stuck in PENDING/AUTHORIZED (reconciliation gaps), on parity between the settlement-file row count and the captured count, and on per-processor wire acknowledgments.

| Decision | Choice | Why |
|---|---|---|
| Core model | Explicit state machine + event log | Auditability, no illegal transitions |
| Dedup | Idempotency-Key unique constraint | Exactly-once under retries |
| Settlement query | Partial index on (processor, state) | Cheap range scan at 10pm |
| Re-runnable batch | batch_id + atomic state flip | Crash-safe settlement |
| Storage | Relational / conditional writes | Strong consistency for money |
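
A sketch of the two conditional writes the design depends on, again with sqlite3 and a trimmed schema (settlement_batches, capture and settle are hypothetical names). capture() only succeeds while the row is still AUTHORIZED; settle() records a batch and flips its rows to SETTLED in one transaction, and it assumes only one settlement run per processor at a time (e.g., a single cron job or a lock). Because the batch is committed before the file is sent, a crash after the commit leaves a recorded batch that can be re-sent by batch_id rather than re-selected.

```python
import sqlite3
import uuid
from typing import List, Optional, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE payments (
        payment_id     TEXT PRIMARY KEY,
        processor_id   TEXT NOT NULL,
        amount_auth    INTEGER NOT NULL,
        amount_capture INTEGER,
        state          TEXT NOT NULL,
        batch_id       TEXT
    );
    CREATE TABLE settlement_batches (  -- hypothetical batch-membership record
        batch_id     TEXT PRIMARY KEY,
        processor_id TEXT NOT NULL,
        total        INTEGER NOT NULL  -- minor units (cents)
    );
    """
)


def capture(payment_id: str, amount: int) -> bool:
    """Conditional write: only an AUTHORIZED payment may become CAPTURED."""
    with conn:
        cur = conn.execute(
            "UPDATE payments SET state = 'CAPTURED', amount_capture = ? "
            "WHERE payment_id = ? AND state = 'AUTHORIZED'",
            (amount, payment_id),
        )
    # 0 rows: already captured, expired, declined, or unknown; never a second charge.
    return cur.rowcount == 1


Batch = Tuple[str, List[Tuple[str, int]]]


def settle(processor_id: str) -> Optional[Batch]:
    """Settle one processor's CAPTURED rows; a re-run skips rows already SETTLED."""
    with conn:  # batch record and state flips commit (or roll back) together
        rows = conn.execute(
            "SELECT payment_id, amount_capture FROM payments "
            "WHERE processor_id = ? AND state = 'CAPTURED' ORDER BY payment_id",
            (processor_id,),
        ).fetchall()
        if not rows:
            return None  # nothing captured since the last batch
        batch_id = str(uuid.uuid4())
        conn.execute(
            "INSERT INTO settlement_batches VALUES (?, ?, ?)",
            (batch_id, processor_id, sum(amount for _, amount in rows)),
        )
        conn.executemany(
            "UPDATE payments SET state = 'SETTLED', batch_id = ? "
            "WHERE payment_id = ? AND state = 'CAPTURED'",
            [(batch_id, payment_id) for payment_id, _ in rows],
        )
    # The caller builds and sends the settlement file from these rows, keyed by batch_id.
    return batch_id, rows
```

Calling settle("amex") twice in a row returns the batch the first time and None the second, because the first call already moved every row out of CAPTURED.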