
Design an AI Chatbot System

System Design · Onsite, Phone · Frontend Engineer, Software Engineer · Reported Apr 2026 · High Frequency

Problem Statement

Design a simple AI chatbot system similar to ChatGPT's main interface. The system should allow users to interact with an AI assistant through a web-based chat interface.

Core Requirements

User Authentication: Users must be able to authenticate before accessing the chatbot

Message Exchange: Users should be able to send messages and receive responses from the AI

Streaming Responses: AI responses should be displayed in real-time as tokens stream in (not all at once)

Session Management: Conversations are ephemeral; when users refresh the page, a new conversation begins and no data persists

Key Constraints

No Backend Storage: User messages and conversation history should NOT be stored in a database

Client-Side Storage: All conversation data should be stored in the browser

Frontend Focus: The design should emphasize full-stack considerations, particularly frontend architecture

Follow-up Questions

During the interview, you may be asked to discuss:

Streaming Technology Choice: Compare and contrast different streaming technologies (Server-Sent Events vs WebSockets) for delivering AI responses

API Integration: How to integrate with Chat Completion APIs effectively

Credential Management: Details about handling user authentication and credentials securely

State Management: How to manage conversation state in the browser

Error Handling: How to handle network failures, API timeouts, and streaming interruptions

Solution Resources

This problem has well-documented solutions available online. We recommend reviewing these comprehensive guides:

Design ChatGPT

System Design School - ChatGPT Solution

System Design Handbook - OpenAI Interview Guide

Note: These resources provide detailed architectural approaches. During your interview, you should develop and articulate your own solution based on your understanding of the requirements and trade-offs.

Interview Experience & Insights

Interview Format

Duration: 60 minutes

Role Type: This question is commonly asked for Full-stack Engineer positions

Interviewer Background: The interviewer may have a frontend engineering background, which influences the focus areas

What Interviewers Look For

Based on real interview experiences, interviewers typically emphasize:

Frontend Architecture: Since this is a full-stack problem with frontend emphasis, expect deep dives into:

How you structure the client-side application

State management patterns for chat interfaces

Efficient rendering of streaming content

Streaming Implementation: Strong understanding of:

Server-Sent Events (SSE) vs WebSockets trade-offs

When to use each technology

How to handle backpressure and connection issues

API Design: Clear thinking about:

How the frontend communicates with the Chat Completion API

Request/response patterns for streaming

Error handling and retry logic

Authentication Flow: Detailed discussion of:

User credential management

Token-based authentication

Session handling without server-side persistence

Browser Storage: Practical approaches to:

Storing conversation history locally

Managing storage limits

Clearing data on refresh


Reference solution

#26 Design an AI Chatbot System (ChatGPT-like) — Solution

✦ AI-Generated Solution · System Design (full-stack, frontend emphasis) · Comprehensive

This question is scored mostly on frontend architecture and streaming, with a deliberately stateless, no-database backend.


1. Requirements

Functional

  • Authenticated access before chatting.
  • Send a message, receive an AI response.
  • Streaming responses token-by-token (not all at once).
  • Ephemeral sessions: a refresh starts a brand-new conversation, and nothing is persisted server-side.

Constraints (given)

  • No backend database for messages/history.
  • All conversation data lives in the browser.
  • Emphasis on full-stack reasoning, especially the client.

2. Architecture

[Figure: AI chatbot architecture]

The only backend component is a thin Backend-for-Frontend (BFF) proxy. It (a) holds the model API key so the key never reaches the browser, (b) enforces authentication and rate limits, and (c) relays the streamed tokens from the model API to the client. There is no message store.
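Below is a minimal sketch of such a BFF endpoint. It assumes a runtime with the Web-standard fetch/Request/Response API (Node 18+, Deno and similar). It performs the three duties in order: authenticate, rate-limit, then forward the request with the server-held key and relay the streamed body.

The following are illustrative assumptions, not any provider's actual API:
- the names `createChatHandler` and `verifyUser`;
- the upstream URL and the `model` field;
- the per-minute token bucket.

```javascript
// Hypothetical BFF handler: createChatHandler, verifyUser, upstreamUrl, model and
// ratePerMinute are illustrative assumptions, not a specific provider's API.
function createChatHandler({ apiKey, upstreamUrl, model, verifyUser, fetchImpl = fetch, ratePerMinute = 20 }) {
  // Naive in-process token bucket per user (counters only, no messages).
  const buckets = new Map();
  function allow(userId, now = Date.now()) {
    const b = buckets.get(userId) ?? { tokens: ratePerMinute, at: now };
    b.tokens = Math.min(ratePerMinute, b.tokens + ((now - b.at) / 60_000) * ratePerMinute);
    b.at = now;
    buckets.set(userId, b);
    if (b.tokens < 1) return false;
    b.tokens -= 1;
    return true;
  }

  return async function handleChat(request) {
    const token = (request.headers.get("Authorization") ?? "").replace(/^Bearer /, "");
    const user = await verifyUser(token); // e.g. validate the OIDC access token
    if (!user) return new Response("Unauthorized", { status: 401 });
    if (!allow(user.id)) return new Response("Too Many Requests", { status: 429 });

    const { messages } = await request.json(); // used for this call only, never stored
    const upstream = await fetchImpl(upstreamUrl, {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` }, // key stays server-side
      body: JSON.stringify({ model, messages, stream: true }),
      signal: request.signal, // aborts upstream if the runtime signals a client disconnect
    });
    if (!upstream.ok || !upstream.body) return new Response("Upstream error", { status: 502 });

    // Relay the token stream unchanged; a real BFF might re-encode frames into its own format.
    return new Response(upstream.body, {
      headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
    });
  };
}
```

The bucket map is per process. With several BFF instances, the limit would need a shared counter or enforcement at the gateway. It stores counters, not conversation data, so the no-message-store constraint still holds.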

3. Streaming: SSE vs WebSocket (the core trade-off)

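| Aspect | Server-Sent Events (SSE) | WebSocket |
|---|---|---|
| Direction | Server → client only | Full duplex |
| Transport | Plain HTTP (EventSource adds auto-reconnect) | Separate ws:// connection via an HTTP Upgrade handshake |
| Fit for LLM tokens | Excellent (one-way stream) | Overkill |
| Infra/proxy friendliness | High (just HTTP) | Needs sticky sessions / WebSocket-aware load balancer |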

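Recommendation: SSE. A chat completion is a one-way token stream, and SSE is simpler and more proxy-friendly because it is ordinary HTTP. Reach for WebSockets only if you need bidirectional features, such as live collaboration or interrupt/typing signals flowing both ways alongside server push.

In practice, use fetch() and read the text/event-stream body through a ReadableStream rather than the browser's EventSource API. EventSource only issues GET requests and cannot set custom headers such as Authorization, whereas fetch() lets you POST the prompt with a bearer token. The trade-off is that EventSource's built-in reconnection no longer applies, so the client must detect dropped streams and retry on its own.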

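// Client: stream tokens from the BFF (fetch + ReadableStream, so the prompt can be POSTed).
// Assumes the BFF emits `data: {"delta":"..."}` events and a final `data: [DONE]`.
async function streamChat(messages, accessToken, onToken, signal) {
  const res = await fetch("/api/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${accessToken}`,      // short-lived token kept in memory
    },
    body: JSON.stringify({ messages, stream: true }),
    signal,                                        // AbortController -> "stop generating"
  });
  if (!res.ok || !res.body) throw new Error(`Chat request failed: HTTP ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buf = "";
  while (true) {
    const { value, done } = await reader.read();  // rejects if the connection drops
    if (done) break;
    buf = (buf + decoder.decode(value, { stream: true })).replace(/\r\n/g, "\n");
    const events = buf.split("\n\n");              // SSE events end with a blank line
    buf = events.pop();                            // keep the trailing partial event for the next chunk
    for (const event of events) {
      for (const line of event.split("\n")) {     // an event may carry several fields
        if (!line.startsWith("data:")) continue;
        const data = line.slice(5).replace(/^ /, "");
        if (data === "[DONE]") return;
        onToken(JSON.parse(data).delta);           // append token to UI
      }
    }
  }
  throw new Error("Stream ended before [DONE]");  // truncated reply: let the UI offer a retry
}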
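For reference, here is a sketch of what the BFF might write on the wire for the client parser above. `encodeSSE` is a hypothetical helper. The `{"delta": ...}` payload and the `[DONE]` sentinel are this design's own convention, taken from the client code, and are not part of the SSE standard. The field syntax (`data:`, `event:`, `id:`, blank-line terminator) is the standard text/event-stream format.

```javascript
// Hypothetical helper: encode one SSE event in text/event-stream format.
function encodeSSE(data, { event, id } = {}) {
  let frame = "";
  if (event) frame += `event: ${event}\n`;      // optional named event type
  if (id !== undefined) frame += `id: ${id}\n`; // lets EventSource send Last-Event-ID on reconnect
  for (const line of String(data).split("\n")) {
    frame += `data: ${line}\n`;                 // multi-line payloads need one data: field per line
  }
  return frame + "\n";                          // a blank line terminates the event
}

// Example stream body for the reply "Hello" (the { delta } / [DONE] convention is an assumption):
const body =
  encodeSSE(JSON.stringify({ delta: "Hel" })) +
  encodeSSE(JSON.stringify({ delta: "lo" })) +
  encodeSSE("[DONE]");
// body === 'data: {"delta":"Hel"}\n\ndata: {"delta":"lo"}\n\ndata: [DONE]\n\n'
```

Each token becomes its own event, so the client can render it as soon as the blank line arrives.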

4. Frontend Architecture

  • Component tree: App → AuthGate → ChatView → {MessageList (virtualized), Composer}. While a reply streams, its token deltas render as an in-progress assistant message at the end of the list.
  • State management: a single store (Redux/Zustand/Context+reducer) holding messages[], streamingStatus (idle | streaming | error), and the in-flight AbortController. With Redux, keep the non-serializable AbortController in a ref instead. Keep the streaming text in a dedicated field and commit it to messages[] when the stream ends ([DONE], abort, or error), to avoid re-rendering the whole list per token.
  • Efficient streaming render: append to a single text node / use requestAnimationFrame batching; virtualize the message list so long chats stay smooth; auto-scroll only when the user is at the bottom.
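  • Browser storage: because sessions are ephemeral, keep the live conversation in in-memory state only. A page refresh discards it automatically, which is exactly the required behavior. Avoid mirroring it to sessionStorage: sessionStorage survives reloads in the same tab (it is cleared only when the tab closes), so it would restore the conversation after a refresh. localStorage is worse still, since it persists across tabs and browser restarts. If persistence were ever wanted, handle storage-quota errors (QuotaExceededError) on write.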
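Below is a minimal sketch of the store logic and rendering helpers from the list above, written in plain JavaScript so it could back Redux, Zustand or useReducer. It illustrates two points:
- Tokens accumulate in a dedicated `streamingText` field, which is committed to `messages[]` only when the stream ends.
- Token callbacks are coalesced so the store updates at most once per animation frame.

`chatReducer`, `createTokenBatcher`, `isNearBottom`, the action names and the 40 px threshold are all hypothetical.

```javascript
// Hypothetical store logic: a plain reducer usable with Redux, Zustand, or useReducer.
const initialState = { messages: [], streamingText: "", streamingStatus: "idle" };

function chatReducer(state, action) {
  switch (action.type) {
    case "send": // user message is committed; an empty assistant stream starts
      return {
        ...state,
        messages: [...state.messages, { role: "user", content: action.text }],
        streamingText: "",
        streamingStatus: "streaming",
      };
    case "tokens": // batched deltas touch only the dedicated field, not messages[]
      return { ...state, streamingText: state.streamingText + action.text };
    case "finish": // reason: "done" | "aborted" | "error"; partial text is kept either way
      return {
        ...state,
        messages: [
          ...state.messages,
          { role: "assistant", content: state.streamingText, incomplete: action.reason !== "done" },
        ],
        streamingText: "",
        streamingStatus: action.reason === "error" ? "error" : "idle",
      };
    default:
      return state;
  }
}

// Coalesce per-token callbacks into at most one store update per animation frame.
// `schedule` defaults to requestAnimationFrame and is injectable for tests.
function createTokenBatcher(dispatch, schedule = (cb) => requestAnimationFrame(cb)) {
  let pending = "";
  let scheduled = false;
  function flush() {
    scheduled = false;
    if (pending === "") return;
    const text = pending;
    pending = "";
    dispatch({ type: "tokens", text });
  }
  function push(delta) {
    pending += delta;
    if (!scheduled) {
      scheduled = true;
      schedule(flush);
    }
  }
  return { push, flush }; // call flush() before dispatching "finish"
}

// Auto-scroll only if the user is already (nearly) at the bottom of the list.
function isNearBottom(el, thresholdPx = 40) {
  return el.scrollHeight - el.scrollTop - el.clientHeight <= thresholdPx;
}
```

Wiring it up:
- Pass `batcher.push` as the per-token callback.
- Call `batcher.flush()` just before dispatching `finish`, so no buffered tokens are lost.
- Marking aborted or failed replies `incomplete` gives the UI its cue to offer "regenerate".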

5. Auth & Credential Management

  • User authenticates via OAuth/OIDC → short-lived access token (in memory) + refresh token in an HttpOnly, Secure, SameSite cookie (not readable by JS → mitigates XSS token theft).
  • The model API key never touches the browser — the BFF injects it server-side. The browser only ever holds the user's token.
  • BFF enforces per-user rate limits and usage quotas before proxying.
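Here is one possible client-side realization of this token split, as a sketch:
- The access token lives only in a closure. A page reload drops it, and the HttpOnly refresh cookie silently obtains a new one.
- A 401 triggers exactly one refresh-and-retry.

The `/api/auth/refresh` endpoint, its `{ accessToken }` JSON response and the `createAuthClient` name are assumptions for illustration. Deduplicating concurrent refreshes is left out.

```javascript
// Hypothetical auth client. The access token is held only in this closure (never in
// localStorage); the refresh token is an HttpOnly cookie the browser attaches itself.
// "/api/auth/refresh" and its { accessToken } response are assumed, not a real API.
function createAuthClient(fetchImpl = fetch) {
  let accessToken = null;

  async function refresh() {
    const res = await fetchImpl("/api/auth/refresh", { method: "POST", credentials: "include" });
    if (!res.ok) throw new Error("Session expired: sign in again");
    accessToken = (await res.json()).accessToken;
  }

  // fetch wrapper: attach the bearer token, refresh once on 401, then retry.
  // Assumes `init.headers` is a plain object and `init.body` can be re-sent (e.g. a string).
  async function authFetch(url, init = {}) {
    if (!accessToken) await refresh();
    const send = () =>
      fetchImpl(url, { ...init, headers: { ...init.headers, Authorization: `Bearer ${accessToken}` } });
    let res = await send();
    if (res.status === 401) {
      await refresh();
      res = await send();
    }
    return res;
  }

  return { authFetch, signOut: () => { accessToken = null; } };
}
```

Because the token never touches Web Storage, an XSS payload cannot simply read it back from storage. The refresh cookie, being HttpOnly, is not readable from script at all.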

6. Error Handling & Resilience

  • Network/stream interruption: a dropped connection surfaces as reader.read() rejecting, or as the body ending without a [DONE] event; show a "retry" affordance. Because state is client-side, you can re-send the conversation so far to continue.
  • API timeout / 429: exponential backoff with jitter; surface a friendly message; preserve the user's typed input.
  • Partial responses: keep already-streamed tokens; mark the message as incomplete so the user can regenerate.
  • Stop generation: AbortController.abort() cancels the client's fetch. The BFF should treat the closed connection as a cancel and abort its upstream request so the model stops generating.
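Below is a sketch of the backoff policy for 429s, 5xx responses and network failures. It uses the "full jitter" variant: a uniformly random delay up to an exponentially growing, capped ceiling. `backoffDelay`, `withRetry` and the default numbers are illustrative assumptions. Honoring a numeric `Retry-After` header is an addition beyond the list above.

```javascript
// Exponential backoff with "full jitter": a random delay in [0, min(cap, base * 2^attempt)).
// Defaults are illustrative, not tuned values.
function backoffDelay(attempt, { baseMs = 500, capMs = 8000, random = Math.random } = {}) {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Retries 429/5xx responses and network errors; never retries a user abort.
// Meant to wrap the request *before* any tokens stream, so output is never duplicated.
async function withRetry(doRequest, { maxAttempts = 4, sleep, ...backoff } = {}) {
  const wait = sleep ?? ((ms) => new Promise((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 0; ; attempt++) {
    const last = attempt + 1 >= maxAttempts;
    let res;
    try {
      res = await doRequest();
    } catch (err) {
      if (err.name === "AbortError" || last) throw err; // "stop generating" or out of attempts
      await wait(backoffDelay(attempt, backoff));
      continue;
    }
    if ((res.status === 429 || res.status >= 500) && !last) {
      const retryAfterSec = Number(res.headers.get("Retry-After")); // honor the server's hint if numeric
      await wait(retryAfterSec > 0 ? retryAfterSec * 1000 : backoffDelay(attempt, backoff));
      continue;
    }
    return res; // success, a non-retryable error, or the final attempt's response
  }
}
```

Wrap only the initial request, for example a function that calls fetch and returns the Response before its body is read. Once tokens have started streaming, a failure should instead keep the partial reply and offer regenerate, as described above.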

7. Summary

| Concern | Decision |
|---|---|
| Transport | SSE over fetch + ReadableStream; prompt sent via POST |
| Backend | Stateless BFF proxy, no DB, holds the API key |
| State | Single store; streaming buffer committed when the stream ends |
| Persistence | In-memory only; ephemeral by design (sessionStorage would survive a refresh) |
| Auth | OIDC; access token in memory, refresh token in HttpOnly cookie |
| Rendering | Virtualized list, rAF-batched token appends, smart auto-scroll |
| Resilience | Abort/stop, backoff on 429/timeout, resend to continue |