Design a simple AI chatbot system similar to ChatGPT's main interface. The system should allow users to interact with an AI assistant through a web-based chat interface.
User Authentication: Users must be able to authenticate before accessing the chatbot
Message Exchange: Users should be able to send messages and receive responses from the AI
Streaming Responses: AI responses should be displayed in real-time as tokens stream in (not all at once)
Session Management: Conversations are ephemeral - when users refresh the page, a new conversation begins with no data persistence
No Backend Storage: User messages and conversation history should NOT be stored in a database
Client-Side Storage: All conversation data should be stored in the browser
Frontend Focus: The design should emphasize full-stack considerations, particularly frontend architecture
During the interview, you may be asked to discuss:
Streaming Technology Choice: Compare and contrast different streaming technologies (Server-Sent Events vs WebSockets) for delivering AI responses
API Integration: How to integrate with Chat Completion APIs effectively
Credential Management: Details about handling user authentication and credentials securely
State Management: How to manage conversation state in the browser
Error Handling: How to handle network failures, API timeouts, and streaming interruptions
This problem has well-documented solutions available online. We recommend reviewing these comprehensive guides:
Design ChatGPT
System Design School - ChatGPT Solution
System Design Handbook - OpenAI Interview Guide
Note: These resources provide detailed architectural approaches. During your interview, you should develop and articulate your own solution based on your understanding of the requirements and trade-offs.
Duration: 60 minutes
Role Type: This question is commonly asked for Full-stack Engineer positions
Interviewer Background: The interviewer may have a frontend engineering background, which influences the focus areas
Based on real interview experiences, interviewers typically emphasize:
Frontend Architecture: Since this is a full-stack problem with frontend emphasis, expect deep dives into:
How you structure the client-side application
State management patterns for chat interfaces
Efficient rendering of streaming content
Streaming Implementation: Strong understanding of:
Server-Sent Events (SSE) vs WebSockets trade-offs
When to use each technology
How to handle backpressure and connection issues
API Design: Clear thinking about:
How the frontend communicates with the Chat Completion API
Request/response patterns for streaming
Error handling and retry logic
Authentication Flow: Detailed discussion of:
User credential management
Token-based authentication
Session handling without server-side persistence
Browser Storage: Practical approaches to:
Storing conversation history locally
Managing storage limits
Clearing data on refresh
✦ AI-Generated Solution · System Design (full-stack, frontend-emphasis) · Comprehensive
This question is scored mostly on frontend architecture and streaming, with a deliberately stateless, no-database backend.
Functional
Constraints (given)

The only backend component is a thin BFF / proxy that (a) holds the model API key so it never reaches the browser, (b) enforces auth and rate limits, and (c) relays the streamed tokens. There is no message store.
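A minimal sketch of such a relay, written as plain functions so it can be dropped into any Node HTTP framework. Every name here (`sseFrame`, `createRateLimiter`, `relayChat`, `verifyToken`, `callModel`) is hypothetical, `req.body` is assumed to be already parsed as JSON, and `callModel` is assumed to return an async iterable of text deltas. The in-process rate limiter is only an illustration; with several BFF instances the limit would apply per instance.

```javascript
// Format one SSE frame: a "data:" line followed by a blank line.
function sseFrame(payload) {
  const text = typeof payload === "string" ? payload : JSON.stringify(payload);
  return `data: ${text}\n\n`;
}

// Fixed-window rate limiter held in process memory (no database).
function createRateLimiter(limit, windowMs, now = Date.now) {
  const windows = new Map(); // userId -> { start, count }
  return function allow(userId) {
    const t = now();
    const w = windows.get(userId);
    if (!w || t - w.start >= windowMs) {
      windows.set(userId, { start: t, count: 1 });
      return true;
    }
    w.count += 1;
    return w.count <= limit;
  };
}

// Relay handler: authenticate, rate-limit, then stream upstream deltas as SSE.
// The API key is passed only to callModel and is never written to the response.
async function relayChat(req, res, { verifyToken, allow, callModel, apiKey }) {
  const userId = verifyToken(req.headers.authorization); // assumed: null if invalid
  if (!userId) { res.writeHead(401); res.end(); return; }
  if (!allow(userId)) { res.writeHead(429); res.end(); return; }

  res.writeHead(200, {
    "Content-Type": "text/event-stream",
    "Cache-Control": "no-cache",
  });
  for await (const delta of callModel(req.body.messages, apiKey)) {
    res.write(sseFrame({ delta }));
  }
  res.write(sseFrame("[DONE]")); // end-of-stream sentinel expected by the client
  res.end();
}
```

Nothing in the handler writes messages anywhere, which is what keeps the backend stateless: the conversation exists only in the request body and in the browser.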
| | Server-Sent Events (SSE) | WebSocket |
|---|---|---|
| Direction | Server→client only | Full duplex |
| Transport | Plain HTTP; EventSource auto-reconnects | HTTP Upgrade handshake to ws:// or wss:// |
| Fit for LLM tokens | Excellent (one-way stream) | Overkill |
| Infra/proxy friendliness | High (just HTTP) | Needs sticky/ws-aware LB |
Recommendation: SSE. A chat completion is a one-way token stream; SSE is simpler, proxy-friendly, and, when consumed through EventSource, reconnects automatically. Reach for WebSockets only if you need bidirectional features (live collaboration, interrupt/typing signals with server push). A common variant is fetch() with a ReadableStream reading the text/event-stream body, which also lets you POST the prompt and send an Authorization header (EventSource is GET-only and cannot set custom headers). The trade-off is that reconnection and retry then become the client's responsibility.
    // Client: stream tokens from the BFF.
    // `token` is the in-memory access token; passing it as a parameter is an assumption.
    async function streamChat(messages, onToken, signal, token) {
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json", Authorization: `Bearer ${token}` },
        body: JSON.stringify({ messages, stream: true }),
        signal, // AbortController -> "stop generating"
      });
      if (!res.ok || !res.body) throw new Error(`Chat request failed: ${res.status}`);

      const reader = res.body.getReader();
      const decoder = new TextDecoder();
      let buf = "";
      while (true) {
        const { value, done } = await reader.read();
        if (done) break;
        buf += decoder.decode(value, { stream: true });
        const frames = buf.split("\n\n"); // SSE frames end with a blank line
        buf = frames.pop(); // last piece may be an incomplete frame; keep it for the next read
        for (const frame of frames) {
          if (!frame.startsWith("data: ")) continue;
          const data = frame.slice(6);
          if (data === "[DONE]") return;
          onToken(JSON.parse(data).delta); // append token to UI
        }
      }
    }
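The frame handling in the loop above is the part most prone to off-by-one bugs, because a network chunk can end in the middle of a frame. One way to make it testable is to isolate it as a pure function. This is a sketch under the same assumptions as the client code ("\n\n" delimiters, single-line "data: " payloads); `parseSSE` is a hypothetical name, not part of any library.

```javascript
// Split buffered text into complete SSE payloads plus a leftover tail.
function parseSSE(buffer) {
  const parts = buffer.split("\n\n");
  const rest = parts.pop(); // text after the last blank line: possibly an incomplete frame
  const events = [];
  for (const frame of parts) {
    if (frame.startsWith("data: ")) events.push(frame.slice(6));
  }
  return { events, rest };
}

// Usage: a frame split across two chunks is emitted only once it is complete.
let { events, rest } = parseSSE('data: {"delta":"Hel"}\n\ndata: {"del');
console.log(events.length, JSON.stringify(rest)); // one complete event; the tail is held back
({ events, rest } = parseSSE(rest + 'ta":"lo"}\n\n'));
console.log(JSON.parse(events[0]).delta); // the second delta, now that its frame is whole
```

The caller carries `rest` into the next read, so no partial frame is ever parsed as JSON.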
Component tree: App → AuthGate → ChatView → (MessageList (virtualized), Composer). Token deltas append to the last assistant message.
State: a single store holds messages[], streamingStatus (idle | streaming | error), and the in-flight AbortController. Keep streaming text in a dedicated field and commit it to the message on [DONE] to avoid re-rendering the whole list per token.
Rendering: batch token appends with requestAnimationFrame; virtualize the message list so long chats stay smooth; auto-scroll only when the user is at the bottom.
Persistence: keep the conversation in memory, optionally mirrored to sessionStorage (cleared on tab close), and explicitly not localStorage, to honor "refresh starts fresh." Note that sessionStorage survives a reload of the same tab, so only in-memory state strictly resets on refresh. Mention storage-quota handling if persistence were ever wanted.
Error handling: on a network failure or interrupted stream, cancel the reader and show a "retry" affordance. Because state is client-side, you can re-send the conversation so far to continue.
Stop generating: AbortController.abort() cancels the fetch, which lets the BFF detect the disconnect and close the upstream stream.

| Concern | Decision |
|---|---|
| Transport | SSE (fetch + ReadableStream), POST prompt |
| Backend | Stateless BFF proxy, no DB, holds API key |
| State | Single store; streaming buffer committed on [DONE] |
| Persistence | In-memory (+ optional sessionStorage); ephemeral by design |
| Auth | OIDC; access token in memory, refresh in HttpOnly cookie |
| Rendering | Virtualized list, rAF-batched token appends, smart auto-scroll |
| Resilience | Abort/stop, backoff on 429/timeout, resend-to-continue |
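
The "streaming buffer committed on [DONE]" decision can be sketched as a small reducer. This is an illustration only: the action types and field names (`send`, `tokens`, `done`, `error`, `streamingText`) are hypothetical, and the same shape could live in any store library or in plain component state.

```javascript
const initialState = { messages: [], streamingText: "", streamingStatus: "idle" };

function chatReducer(state, action) {
  switch (action.type) {
    case "send": // user submits a prompt; streaming begins
      return {
        messages: [...state.messages, { role: "user", content: action.text }],
        streamingText: "",
        streamingStatus: "streaming",
      };
    case "tokens": // a batch of deltas, e.g. flushed once per animation frame
      return { ...state, streamingText: state.streamingText + action.text };
    case "done": // [DONE] received or the user pressed stop: commit the buffer once
      return {
        messages: state.streamingText
          ? [...state.messages, { role: "assistant", content: state.streamingText }]
          : state.messages,
        streamingText: "",
        streamingStatus: "idle",
      };
    case "error": // keep the partial text so the UI can offer a retry
      return { ...state, streamingStatus: "error" };
    default:
      return state;
  }
}

// Usage: replay a short exchange through the reducer.
const finalState = [
  { type: "send", text: "Hi" },
  { type: "tokens", text: "Hel" },
  { type: "tokens", text: "lo" },
  { type: "done" },
].reduce(chatReducer, initialState);
console.log(finalState.messages.length); // 2
```

While tokens arrive, the `messages` array keeps the same identity, so a memoized or virtualized list has nothing to re-render; only the component bound to `streamingText` updates.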