
Payment Processing System (Stripe-like)

System Design · Onsite · Phone · Software Engineer · Reported Apr, 2026 · High Frequency

Problem Statement

Design a simple service architecture for a payment processor.

Payment processing is the act of receiving payment requests, forwarding those requests to downstream processors, and then returning responses.

The flow consists of:

Hold: Receive a request for an account with an account number (e.g., a credit card number) and amount. This request is forwarded to a downstream processor.

The downstream processor has the choice to approve or deny the hold request. If approved, the money in the account is held for some short period of time.

Charge: If the hold request is approved, the customer has the opportunity to sign (and potentially tip) and send a charge request.

If no charge request is received, eventually the held funds are released.

The charge requests are gathered up into a large batch file grouped by downstream processor.

Every evening at 10pm, that file is sent and the funds are wire transferred in a big batch.

Example: A customer might tap their American Express credit card at a coffee shop.

The coffee shop's card machine sends a hold request to the processor (us), and we forward the request to the downstream processor (American Express) to hold funds for the payment with the credit card number and amount.

American Express might approve that hold based on factors including whether there is sufficient credit on the account.

Later, the retailer sends a charge request which is batched by the processor (us) and sent in a batch file with all the American Express transactions to initiate a large transfer for all of the funds.

Note: This is an intentionally simplified view of a payments system to fit in the time period of the interview. Many real-world payment systems have additional complexity.

System Design - Payment System

Payment Processing System Design - Detailed Technical Walkthrough

Real Interview Experiences

Experience #1: DynamoDB Choice & Batch Settlement Struggles

First Half - Standard Design:

Chose DynamoDB (was more familiar with it compared to other databases)

After analyzing traffic patterns and comparing different database options, decided to stick with DynamoDB

Key points that earned credit:

Mentioned idempotency keys for preventing duplicate charges

Discussed state transitions (pending → authorized → captured → settled)

Second Half - Where Things Got Difficult:

The daily 10pm batch commit was the challenging part:

This is where merchants can adjust the final amount (e.g., adding tips) before committing

Spent too much time thinking about partition key and sort key design to make queries simpler

Struggled with designing an efficient way to query and batch all the captured transactions

What Could Have Been Better:

"In hindsight, I should have just added state as a Local Secondary Index (LSI). That would have made querying by state much simpler for the batch settlement process."

Result:

Did not receive a "strong hire" recommendation

Main reason: batch commit design wasn't handled well

Experience #2: Small System, Very Detailed Questions

Interview Style:

System is "quite small" in scope, so questions went very deep on the details

Need to specify exact API designs (endpoints, request/response formats)

Need to design database table schemas (specific columns, indexes, data types)

Common Follow-Up Questions:

10x Scale: "How would you handle 10x the traffic?"

Need to discuss sharding strategies

Database bottlenecks (write throughput)

External processor API rate limits

Global Launch: "How would you design this for global deployment?"

Multi-region architecture

Data residency and compliance (GDPR, PCI-DSS)

Regional settlement processes with local processors

Key Takeaway: Because the problem scope is intentionally simplified, interviewers compensate by asking for very specific implementation details and challenging follow-ups about scaling and global deployment.


Reference solution

#24 Payment Processing System (Stripe-like) — Solution

✦ AI-Generated Solution · System Design · Comprehensive


1. Requirements

Functional

  • Hold (authorize): accept (account_number, amount), forward to the correct downstream processor, record approve/deny.
  • Charge (capture): on an approved hold, accept a charge (with optional tip) and queue it for settlement.
  • Release: if no charge arrives before the hold expires, release the held funds.
  • Batch settlement: every evening at 10pm, group captured charges by downstream processor into one file and initiate a bulk wire transfer.

Non-functional

  • Correctness over latency — money must never be double-charged or lost. Exactly-once semantics on capture.
  • Idempotency — terminals retry; the same logical request must not create two holds/charges.
  • Auditability — every state transition is immutable and traceable (regulatory).
  • Scale target to discuss: ~thousands of auths/sec; settlement file of millions of rows/day.

2. Core Abstractions & State Machine

Model a payment as a single entity advancing through an explicit state machine. This is the single most important idea the interviewer is looking for.

Payment state machine

PENDING ── hold approved ──> AUTHORIZED ── charge req ──> CAPTURED ── 10pm batch ──> SETTLED
   │                             │
   └── hold denied ──> DECLINED  └── timeout (no charge) ──> EXPIRED (funds released)
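
The diagram above can be enforced in code as a lookup table of allowed edges. The following is a minimal sketch, not a definitive implementation: the names State, ALLOWED, IllegalTransition, and transition are illustrative assumptions, and only the state names and edges come from the diagram.

```python
from enum import Enum


class State(Enum):
    PENDING = "PENDING"
    AUTHORIZED = "AUTHORIZED"
    CAPTURED = "CAPTURED"
    SETTLED = "SETTLED"
    DECLINED = "DECLINED"
    EXPIRED = "EXPIRED"


# Allowed edges, copied from the diagram. Terminal states have no outgoing edges.
ALLOWED = {
    State.PENDING: {State.AUTHORIZED, State.DECLINED},
    State.AUTHORIZED: {State.CAPTURED, State.EXPIRED},
    State.CAPTURED: {State.SETTLED},
    State.SETTLED: set(),
    State.DECLINED: set(),
    State.EXPIRED: set(),
}


class IllegalTransition(Exception):
    """Raised when a requested move is not an edge in the diagram."""


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the diagram has no such edge."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

Keeping the edges in one table means every service validates a move the same way, and an attempt such as SETTLED -> CAPTURED fails loudly instead of silently corrupting a payment.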

3. Data Model

-- One row per payment, mutated through its lifecycle (state is indexed for batch queries)
CREATE TABLE payments (
  payment_id      UUID PRIMARY KEY,
  account_number  TEXT NOT NULL,          -- tokenized / vaulted, never raw PAN
  processor_id    TEXT NOT NULL,          -- amex, visa, ...
  amount_auth     BIGINT NOT NULL,        -- minor units (cents)
  amount_capture  BIGINT,                 -- final amount incl. tip
  state           TEXT NOT NULL,          -- PENDING|AUTHORIZED|CAPTURED|SETTLED|DECLINED|EXPIRED
  idempotency_key TEXT UNIQUE,            -- dedupes retries
  hold_expires_at TIMESTAMPTZ,
  batch_id        UUID,                   -- set when the row joins a settlement batch
  created_at      TIMESTAMPTZ DEFAULT now(),
  updated_at      TIMESTAMPTZ DEFAULT now()
);
CREATE INDEX idx_settlement ON payments (processor_id, state, hold_expires_at)
  WHERE state = 'CAPTURED';   -- partial index makes the 10pm batch query cheap

CREATE TABLE payment_events (   -- append-only audit log (event sourcing)
  event_id   BIGSERIAL PRIMARY KEY,
  payment_id UUID NOT NULL,
  type       TEXT NOT NULL,            -- HOLD_REQUESTED, HOLD_APPROVED, CAPTURED, SETTLED...
  payload    JSONB,
  created_at TIMESTAMPTZ DEFAULT now()
);

One interview report above noted that the candidate who struggled with settlement wished they had indexed state (as a DynamoDB Local Secondary Index). The partial index above, which covers only CAPTURED rows and leads with processor_id, is the relational version of that fix: the batch settlement query becomes an index range scan instead of a full table scan.

4. API Design

POST /v1/holds            Idempotency-Key: <uuid>
  { "account_number": "tok_...", "amount": 4000, "processor": "amex" }
  -> 201 { "payment_id": "...", "state": "AUTHORIZED" }   # or DECLINED

POST /v1/payments/{id}/capture   Idempotency-Key: <uuid>
  { "amount": 4600 }                                       # original + tip
  -> 200 { "state": "CAPTURED" }

POST /v1/payments/{id}/release   -> 200 { "state": "EXPIRED" }
GET  /v1/payments/{id}           -> current state + event history

Every mutating endpoint requires an Idempotency-Key. The server stores the key with the result; a replay returns the stored result instead of re-executing.
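
The replay behavior described above can be sketched as a small store that runs a handler at most once per key. This is an illustrative in-memory stand-in under stated assumptions: IdempotencyStore and run_once are hypothetical names, and a real service would keep the key and result in a durable table rather than a dict.

```python
import threading
from typing import Callable, Dict, Tuple

Response = Tuple[int, dict]  # (HTTP status, JSON body)


class IdempotencyStore:
    """In-memory stand-in for a durable table keyed by Idempotency-Key."""

    def __init__(self) -> None:
        self._results: Dict[str, Response] = {}
        self._lock = threading.Lock()

    def run_once(self, key: str, handler: Callable[[], Response]) -> Response:
        # Holding one lock across the handler serializes all requests.
        # That is a simplification; a database would use a unique
        # constraint on the key instead.
        with self._lock:
            if key in self._results:
                return self._results[key]  # replay: return the stored result
            result = handler()             # first time: execute and remember
            self._results[key] = result
            return result
```

A terminal that retries POST /v1/holds with the same key gets the first response back, and the handler that talks to the downstream processor runs only once.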

5. Architecture

Payment processing architecture

  • API Gateway — authN/Z, idempotency-key check, request validation.
  • Hold Service — writes PENDING, calls the downstream processor, transitions to AUTHORIZED/DECLINED. Wrap the external call so a timeout doesn't leave an ambiguous state (reconcile via the processor's query API).
  • Charge Service — validates an AUTHORIZED payment, writes CAPTURED, emits an event.
  • Batch Settlement (10pm cron) — for each processor, range-scans state = CAPTURED, writes a settlement file to object storage, initiates the wire transfer, and atomically flips rows to SETTLED (mark with a batch_id so a re-run is idempotent).
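
The Hold Service's handling of a downstream timeout can be sketched as follows. This is a simplified illustration: ProcessorTimeout, place_hold, the call_processor callback, and the needs_reconcile flag are all hypothetical names, and the payment is a plain dict standing in for a database row.

```python
from typing import Callable


class ProcessorTimeout(Exception):
    """Raised by the (hypothetical) processor client when no answer arrives in time."""


def place_hold(payment: dict, call_processor: Callable[[dict], bool]) -> dict:
    # In a real service this PENDING write would be persisted before the
    # external call, so a crash still leaves a visible record.
    payment["state"] = "PENDING"
    try:
        approved = call_processor(payment)
    except ProcessorTimeout:
        # Outcome unknown: the processor may have placed the hold anyway.
        # Stay PENDING and flag the row for reconciliation instead of
        # guessing DECLINED.
        payment["needs_reconcile"] = True
        return payment
    payment["state"] = "AUTHORIZED" if approved else "DECLINED"
    return payment
```

A separate reconciliation job would later look up flagged payments through the processor's query API and move them to AUTHORIZED or DECLINED.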

6. Batch Settlement Deep Dive (the part candidates fail)

  1. Cron triggers at 10pm; acquire a distributed lock per processor (only one settlement run at a time).
  2. Range-scan the partial index (processor_id, 'CAPTURED') in pages; stream rows into a settlement file (group-by processor).
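  3. Assign a batch_id and flip each included row CAPTURED -> SETTLED in the same transaction that records the batch membership, so a crash mid-run is safely re-runnable: rows already SETTLED are skipped by the next scan, and a batch that was recorded but never acknowledged can be rebuilt from its batch_id and re-sent.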
  4. Upload the file, call the processor's bulk-transfer API, and persist the processor's acknowledgment. Reconcile the next morning against the processor's report.
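
Steps 2 and 3 above can be sketched with SQLite standing in for the production database. This is a minimal sketch under stated assumptions: settle_processor is a hypothetical name, the payments table is assumed to carry a batch_id column, and the claim is done in one statement rather than in pages to keep the example short.

```python
import sqlite3
import uuid
from typing import List, Tuple


def settle_processor(
    conn: sqlite3.Connection, processor_id: str
) -> Tuple[str, List[Tuple[str, int]]]:
    """Claim every CAPTURED row for one processor into a new batch.

    Returns (batch_id, rows), where each row is (payment_id, amount_capture).
    """
    batch_id = str(uuid.uuid4())
    with conn:  # one transaction: commit on success, roll back on error
        # Flip the state and record batch membership in the same statement,
        # so a row is never SETTLED without a batch_id.
        conn.execute(
            "UPDATE payments SET state = 'SETTLED', batch_id = ? "
            "WHERE processor_id = ? AND state = 'CAPTURED'",
            (batch_id, processor_id),
        )
        # The settlement file is built from the rows this batch just claimed.
        rows = conn.execute(
            "SELECT payment_id, amount_capture FROM payments "
            "WHERE batch_id = ? ORDER BY payment_id",
            (batch_id,),
        ).fetchall()
    return batch_id, rows
```

Because the file contents are selected by batch_id, a second run for the same processor claims nothing new, and an interrupted upload can be retried by re-reading the same batch.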

7. Idempotency, Consistency & Failure Handling

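  • Exactly-once capture: the conditional state transition (AUTHORIZED -> CAPTURED only) rejects a second capture, and the UNIQUE idempotency_key lets a retried request return the original result instead of executing again. Together they prevent double capture under retries.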
  • Ambiguous downstream timeouts: never assume failure; record PENDING, then reconcile using the processor's idempotency key / query API.
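  • DB choice: a relational store (PostgreSQL, or MySQL behind Vitess) or DynamoDB; either way you need strong consistency and conditional writes for the state transitions. If using DynamoDB, guard each update with a condition expression (state must equal AUTHORIZED) and add a GSI on state for the batch query. GSI reads are eventually consistent, so the batch job should still re-check state with a conditional write when it claims each row.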
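
The conditional write behind the AUTHORIZED -> CAPTURED transition can be sketched with SQLite as a stand-in. The capture function is a hypothetical name; the point of the sketch is that the state check and the update happen in one statement, so two concurrent captures cannot both succeed.

```python
import sqlite3


def capture(conn: sqlite3.Connection, payment_id: str, amount: int) -> bool:
    """Move AUTHORIZED -> CAPTURED; return False if the row was not AUTHORIZED."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "UPDATE payments SET state = 'CAPTURED', amount_capture = ? "
            "WHERE payment_id = ? AND state = 'AUTHORIZED'",
            (amount, payment_id),
        )
    # rowcount is 1 only for the caller whose update matched the guard.
    return cur.rowcount == 1
```

A caller that receives False can read the row and decide whether this was a harmless retry (already CAPTURED) or a real error (DECLINED or EXPIRED).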

8. Scaling & Follow-ups

  • 10× traffic: shard payments by payment_id; the settlement job fans out per processor partition. The authorize path is the write hot path, so scale the stateless services horizontally. The batch is throughput-bound, not latency-bound.
  • Global deployment: regional processing with local downstream processors; keep data residency per region (PCI-DSS, GDPR); settle per region.
  • Security: never store raw PANs — tokenize/vault; encrypt at rest; scope PCI to the vault.
  • Observability: alert on stuck PENDING/AUTHORIZED (reconciliation gaps), settlement file row-count vs captured-count parity, and per-processor wire acknowledgments.
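
Sharding by payment_id, as mentioned in the 10× bullet above, can be sketched as a stable hash of the ID. This is one simple option under stated assumptions: shard_for is a hypothetical name, and plain modulo routing would force data movement if the shard count changed, which consistent hashing or a lookup table would avoid.

```python
import hashlib


def shard_for(payment_id: str, num_shards: int) -> int:
    """Map a payment_id to a shard index in [0, num_shards)."""
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    # Use a stable hash: Python's built-in hash() is randomized per process
    # for strings, so it would route the same ID differently across servers.
    digest = hashlib.sha256(payment_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Every service instance computes the same shard for a given payment_id, so the hold, capture, and lookup requests for one payment always land on the same partition.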

9. Summary

| Decision | Choice | Why |
|---|---|---|
| Core model | Explicit state machine + event log | Auditability, no illegal transitions |
| Dedup | Idempotency-Key unique constraint | Exactly-once under retries |
| Settlement query | Partial index on (processor, state) | Cheap range scan at 10pm |
| Re-runnable batch | batch_id + atomic state flip | Crash-safe settlement |
| Storage | Relational / conditional writes | Strong consistency for money |