Design a prompt engineering playground similar to ChatGPT Playground or Anthropic Console. This is a product-focused system design question from a client engineer's perspective, emphasizing UX flows, client state management, streaming, performance, and how the frontend collaborates with backend services.
Prompt engineering is the process of iterating on input prompts to make a model better at performing specific tasks. Unlike conversational chat, playgrounds focus on stateless runs: each run is independent, but a single run can include multi-message context (system + user + assistant examples).
A user wants to generate a recipe. They might:
Start simple: "Write me a recipe"
Iterate to improve: "You are a great chef who has written many cookbooks. Write me a recipe for..."
Add examples: Include sample good/bad recipes for the model to learn from
Keep refining until they get the desired output
Save the successful prompt for future use
Disclaimer: This is a sample solution to help you get started. To better prepare for the interview, you should think through the question yourself and try to come up with your own solution. System design questions are open-ended and have multiple valid approaches.
Users should be able to:
Create and edit prompts with system instructions, user messages, and assistant examples
Configure model parameters (model version, temperature, max tokens, top-p, stop sequences)
Execute prompts and see streaming responses in real-time
Save and organize prompts into projects/folders for reuse
View execution history to compare outputs across different prompt versions and configurations
Share prompts with team members (for enterprise users)
For a 45-minute interview, focus on 3-5 core flows: prompt editing, execution with streaming, and saving/organizing prompts. Mention sharing and history as stretch goals.
Clarify the boundaries:
MVP Focus: Web-based prompt editor with real-time execution
Platforms: Desktop-first web app (power users prefer larger screens for prompt iteration)
User Types: Individual developers and enterprise teams
Integrations: API key management, usage tracking, billing integration
Latency: Time to first token p50 < 1s, p95 < 3s (LLM-dependent); show a progress indicator if slower
Streaming: Responses must stream token-by-token (not wait for complete response)
Reliability: Executions should gracefully handle timeouts, disconnects, and model errors
Autosave: Prompt changes persist automatically (no lost work)
Concurrent editing: Enterprise users may need collaborative editing
Privacy: Users control history retention; clear UI on what is stored
For a platform like Anthropic Console, we might have 10K daily active users, each running 20-50 prompt iterations per session. During peak hours (business hours across time zones), we could see 1000+ concurrent executions. The LLM backend is the bottleneck, not our system.
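As a rough check on those numbers, the daily run volume can be worked out directly. This is a back-of-envelope sketch that assumes one session per user per day, which the figures above do not state; the function names are illustrative only.

```javascript
// Back-of-envelope load estimate.
// Assumption (not from the text above): each active user has one session per day.
const DAILY_ACTIVE_USERS = 10_000;
const RUNS_PER_SESSION = { low: 20, high: 50 };
const SECONDS_PER_DAY = 24 * 60 * 60;

function estimateDailyRuns(users, runsPerSession) {
  return {
    low: users * runsPerSession.low,
    high: users * runsPerSession.high,
  };
}

function averageRunsPerSecond(dailyRuns) {
  return dailyRuns / SECONDS_PER_DAY;
}

const daily = estimateDailyRuns(DAILY_ACTIVE_USERS, RUNS_PER_SESSION);
console.log(daily.low, daily.high); // 200000 500000
console.log(averageRunsPerSecond(daily.high).toFixed(1)); // 5.8
```

The average request rate is small for a CRUD backend; how many runs overlap at peak depends on how bursty traffic is and how long each streamed completion stays open, which is why the model backend dominates capacity planning.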
User
├── user_id (UUID)
├── name
├── org_id
├── api_key_hash
└── created_at
Organization
├── org_id (UUID)
├── name
├── billing_plan
└── usage_limits
Project
├── project_id (UUID)
├── org_id
├── name
├── description
└── created_at
Prompt
├── prompt_id (UUID)
├── project_id
├── name
├── system_prompt
├── messages[]
├── model_config
├── latest_checkpoint_version_id
├── created_at
└── updated_at
PromptVersion
├── prompt_version_id (UUID)
├── prompt_id
├── version_number
├── content_snapshot
├── model_config_snapshot
├── variables_snapshot
├── created_at
└── created_by
Execution
├── execution_id (UUID)
├── prompt_id
├── prompt_version_id
├── user_id
├── variable_values
├── input_tokens
├── output_tokens
├── latency_ms
├── status (running, completed, failed, cancelled)
└── created_at
ExecutionResult
├── execution_result_id (UUID)
├── execution_id
├── response_text
├── finish_reason
└── resolved_model_version
SharedPrompt
├── shared_prompt_id (UUID)
├── prompt_id
├── shared_with_user_id
└── permission_level (view, edit)
Each execution should point to an immutable PromptVersion, not just the mutable Prompt. Before running, the client flushes any pending autosave and the server creates or reuses a version snapshot for the current prompt content, variables, and model config, then stores prompt_version_id on the execution. This makes history, comparison, audit, and reruns reproducible after the prompt is edited.
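One possible way to implement "create or reuse" is to compare a canonical serialization of the draft with the latest checkpoint. The sketch below uses an in-memory map as a stand-in for the prompt_versions table; `canonicalize`, `createVersionStore`, and the ID format are hypothetical names chosen for illustration.

```javascript
// Stable serialization: object keys are sorted so equal content yields equal strings.
function canonicalize(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const entries = Object.keys(value)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(value[key])}`);
    return `{${entries.join(',')}}`;
  }
  return JSON.stringify(value);
}

// In-memory stand-in for the prompt_versions table (hypothetical).
function createVersionStore() {
  const versionsByPrompt = new Map();
  return {
    // Returns the latest version if the draft is unchanged, else a new immutable one.
    checkpoint(promptId, draft) {
      const versions = versionsByPrompt.get(promptId) ?? [];
      const snapshot = canonicalize(draft);
      const latest = versions[versions.length - 1];
      if (latest && latest.snapshot === snapshot) return latest;
      const version = Object.freeze({
        prompt_version_id: `pv_${promptId}_${versions.length + 1}`,
        prompt_id: promptId,
        version_number: versions.length + 1,
        snapshot,
      });
      versionsByPrompt.set(promptId, [...versions, version]);
      return version;
    },
  };
}
```

Running the same draft twice then maps both executions to one prompt_version_id, while any change to content, variables, or model config produces a new version number.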
{
"prompt": {
"system_prompt": "You are a helpful recipe chef...",
"messages": [
{ "role": "user", "content": "Write a recipe for..." },
{ "role": "assistant", "content": "Here's a recipe..." },
{ "role": "user", "content": "{{user_input}}" }
],
"model_config": {
"model": "claude-3-5-sonnet-20241022",
"temperature": 0.7,
"max_tokens": 1024,
"top_p": 1.0,
"stop_sequences": []
},
"variables": ["user_input"]
}
}
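To make the `{{user_input}}` placeholder concrete, here is a minimal sketch of how variable values might be substituted before a run. The double-brace syntax comes from the example above; the function names and the choice to throw on a missing value are assumptions.

```javascript
// Matches {{name}} with optional inner whitespace.
const VARIABLE_PATTERN = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

// Replaces every placeholder in one string; throws if a value is missing.
function renderTemplate(text, values) {
  return text.replace(VARIABLE_PATTERN, (match, name) => {
    if (!Object.prototype.hasOwnProperty.call(values, name)) {
      throw new Error(`Missing value for variable "${name}"`);
    }
    return String(values[name]);
  });
}

// Applies substitution to the system prompt and every message.
function renderPrompt(prompt, values) {
  return {
    system_prompt: renderTemplate(prompt.system_prompt, values),
    messages: prompt.messages.map((message) => ({
      ...message,
      content: renderTemplate(message.content, values),
    })),
  };
}
```

Failing loudly on a missing variable lets the UI point at the empty input before any tokens are spent.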
The prompt editor screen needs everything in one load—users shouldn't wait for multiple requests while iterating. Design your API to return the complete prompt state in a single call.
This is the core of the interview. Design APIs that feel intuitive to developers using the playground.
# List projects
GET /projects?cursor=...&limit=20
Response: { "projects": [...], "next_cursor": "..." }
# Create a project
POST /projects
Request: { "name": "Recipe Experiments", "description": "..." }
Response: { "project": { "id": "...", "name": "...", ... } }
# Get project with prompts
GET /projects/:project_id
Response: {
"project": {...},
"prompts": [{ "id": "...", "name": "...", "updated_at": "..." }]
}
# CRUD for prompts
POST /prompts
Request: { "project_id": "...", "name": "...", "system_prompt": "...", "messages": [...], "model_config": {...} }
Response: { "prompt": {...} }
GET /prompts/:prompt_id
Response: { "prompt": {...}, "recent_executions": [...] }
PATCH /prompts/:prompt_id
Request: { "name": "...", "system_prompt": "...", ... } // Partial update
Response: { "prompt": {...} }
DELETE /prompts/:prompt_id
Response: { "success": true }
# Execute a prompt (streaming)
POST /prompts/:prompt_id/execute
Headers: Accept: text/event-stream
Request: { "variable_values": { "user_input": "chocolate cake" } }
Response: Server-Sent Events stream
event: start
data: { "execution_id": "exec_123", "prompt_version_id": "pv_456", "model": "claude-3-5-sonnet" }
event: token
data: { "text": "Here" }
event: token
data: { "text": "'s a" }
event: token
data: { "text": " delicious" }
...
event: done
data: { "execution_id": "exec_123", "input_tokens": 150, "output_tokens": 523, "finish_reason": "end_turn" }
# If something goes wrong, an error event is sent instead
event: error
data: { "code": "RATE_LIMITED", "message": "Rate limit exceeded. Retry in 30s." }
# Cancel an execution
POST /executions/:execution_id/cancel
Response: { "cancelled": true }
# Get execution result (for history or if SSE disconnected)
GET /executions/:execution_id
Response: {
"execution": {
"id": "...",
"prompt_id": "...",
"prompt_version_id": "...",
"status": "completed",
"variable_values": { "user_input": "chocolate cake" },
"model_config_snapshot": { "model": "claude-3-5-sonnet", "temperature": 0.7 },
"response_text": "...",
"input_tokens": 150,
"output_tokens": 523,
"latency_ms": 2340,
"created_at": "..."
}
}
# List executions for a prompt
GET /prompts/:prompt_id/executions?cursor=...&limit=20
Response: {
"executions": [
{ "id": "...", "prompt_version_id": "...", "status": "completed", "input_tokens": 150, "output_tokens": 523, "created_at": "..." }
],
"next_cursor": "..."
}
# Compare two executions
GET /executions/compare?ids=exec_1,exec_2
Response: {
"executions": [
{ "id": "exec_1", "prompt_version_id": "pv_1", "variable_values": {...}, "model_config_snapshot": {...}, "response_text": "..." },
{ "id": "exec_2", "prompt_version_id": "pv_2", "variable_values": {...}, "model_config_snapshot": {...}, "response_text": "..." }
]
}
# Share a prompt
POST /prompts/:prompt_id/shares
Request: { "email": "colleague@company.com", "permission": "edit" }
Response: { "share": { "id": "...", "user": {...}, "permission": "edit" } }
# List shared prompts
GET /prompts/shared-with-me
Response: { "prompts": [...] }
# Fork a shared prompt (create your own copy)
POST /prompts/:prompt_id/fork
Request: { "project_id": "my_project_id" }
Response: { "prompt": { "id": "new_prompt_id", ... } }
1. Streaming via SSE (Server-Sent Events)
SSE is simpler than WebSockets for unidirectional streaming
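Works with standard HTTP infrastructure (load balancers, CDNs), provided intermediaries do not buffer the response
Client reads the stream with fetch and a stream reader; the native EventSource API only supports GET, so it does not fit the POST execute endpoint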
2. Execution is Tied to Prompt, Not Ad-Hoc
POST /prompts/:id/execute instead of POST /execute with full prompt
Benefits: Automatic history tracking, easier analytics, prompt versioning
Persist prompt_version_id on every execution so the run can be replayed even after the prompt changes
Trade-off: Requires saving prompt first (but autosave handles this)
3. Partial Updates with PATCH
Users constantly tweak prompts, so don't resend the entire prompt on every autosave
Send only changed fields: { "model_config": { "temperature": 0.9 } }
4. Cursor-Based Pagination
For execution history, chronological cursors work well
Handles concurrent writes better than offset pagination
5. Idempotency for Executions
Optional Idempotency-Key header for POST /prompts/:id/execute prevents duplicate runs on network retry
Less critical than payments, but useful for expensive model calls
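The cursor-based pagination decision above can be sketched as follows. This is an in-memory illustration, not the service's actual query: it assumes executions are ordered newest first by (created_at, id), that created_at is an ISO 8601 string in a single format so string comparison matches time order, and that the cursor is an opaque base64 encoding of the last item's sort key.

```javascript
// Cursor = opaque base64 of the last returned item's sort key.
function encodeCursor(item) {
  return btoa(JSON.stringify([item.created_at, item.id]));
}

function decodeCursor(cursor) {
  const [created_at, id] = JSON.parse(atob(cursor));
  return { created_at, id };
}

// Newest first; ties on created_at are broken by id so the order is total.
function compareNewestFirst(a, b) {
  if (a.created_at !== b.created_at) return a.created_at < b.created_at ? 1 : -1;
  if (a.id === b.id) return 0;
  return a.id < b.id ? 1 : -1;
}

function listExecutions(executions, { cursor = null, limit = 20 } = {}) {
  const sorted = [...executions].sort(compareNewestFirst);
  let start = 0;
  if (cursor !== null) {
    const last = decodeCursor(cursor);
    // First item that sorts strictly after the cursor position.
    start = sorted.findIndex((e) => compareNewestFirst(last, e) < 0);
    if (start === -1) start = sorted.length;
  }
  const page = sorted.slice(start, start + limit);
  const hasMore = start + limit < sorted.length;
  return {
    executions: page,
    next_cursor: hasMore ? encodeCursor(page[page.length - 1]) : null,
  };
}
```

Because the cursor pins a position in the sort order rather than a row offset, a run that finishes while the user is scrolling does not shift or duplicate items on the next page.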
{
"error": {
"code": "CONTEXT_LENGTH_EXCEEDED",
"message": "Prompt exceeds maximum context length of 200k tokens",
"details": {
"prompt_tokens": 215000,
"max_tokens": 200000
}
}
}
Common error codes:
RATE_LIMITED — Too many requests, include retry-after
CONTEXT_LENGTH_EXCEEDED — Prompt too long
MODEL_UNAVAILABLE — Model temporarily unavailable
INVALID_API_KEY — Authentication failed
QUOTA_EXCEEDED — Monthly usage limit hit
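On the client, these codes typically split into errors worth retrying and errors the user must fix. The sketch below is one possible policy: treating RATE_LIMITED and MODEL_UNAVAILABLE as retryable is an assumption, and `details.retry_after_s` is a hypothetical field standing in for the retry-after hint mentioned above.

```javascript
// Assumption: only these codes are transient; the rest need user action.
const RETRYABLE_CODES = new Set(['RATE_LIMITED', 'MODEL_UNAVAILABLE']);

// Exponential backoff: base, 2x base, 4x base, ... capped at maxMs.
function backoffDelayMs(attempt, { baseMs = 500, maxMs = 30_000 } = {}) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Decides whether to retry and how long to wait before the next attempt.
function planRetry(error, attempt, maxAttempts = 4) {
  if (!RETRYABLE_CODES.has(error.code) || attempt >= maxAttempts) {
    return { retry: false };
  }
  // details.retry_after_s is a hypothetical field name for the server's hint.
  const hintedMs = (error.details?.retry_after_s ?? 0) * 1000;
  return { retry: true, delayMs: Math.max(hintedMs, backoffDelayMs(attempt)) };
}
```

A production client would usually add random jitter to the delay so that many tabs do not retry in lockstep; it is left out here to keep the example deterministic.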
A user opens the prompt playground and wants to iterate on a recipe generator:
Call GET /projects to see their projects
Select a project, triggering GET /projects/:id to load prompts
Click 'New Prompt', which sends POST /prompts with initial content
As they type in the editor, debounced PATCH /prompts/:id calls autosave changes
They click 'Run', which calls POST /prompts/:id/execute with variable values
The client flushes any pending autosave; the server checkpoints the current prompt as a PromptVersion, creates an Execution linked to that version, and streams tokens back
When done, they tweak the temperature and run again
They open history with GET /prompts/:id/executions to compare outputs
Happy with the result, they share it via POST /prompts/:id/shares
Web Client
├── Project List
├── Prompt Editor
└── Execution Stream
API Gateway (Auth, Rate Limiting)
├── Project Service (CRUD projects)
├── Prompt Service (CRUD, autosave)
└── Execution Service (SSE streaming, history)
PostgreSQL: projects, prompts, prompt_versions, executions, shares
LLM Backend (Claude API): streaming, token counts
1. Authentication (Client Perspective)
Use session cookies or short-lived JWTs for the web UI
Never store provider API keys in the browser; backend proxies LLM calls
Optional BYOK: store encrypted keys server-side with explicit user consent
All requests authenticated via Authorization header or secure cookies
2. Autosave Pattern
Debounce changes (500ms delay)
Send PATCH with only changed fields
Show "Saving..." → "Saved ✓" indicator
Conflict resolution: last write wins (acceptable for single-user editing)
3. Streaming Execution
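// Client-side SSE handling with fetch (POST + streaming).
// Runs inside an async function; variableValues and the UI helpers are assumed to exist.
const response = await fetch(`/prompts/${id}/execute`, {
  method: 'POST',
  headers: { 'Accept': 'text/event-stream', 'Content-Type': 'application/json' },
  body: JSON.stringify({ variable_values: variableValues })
});
if (!response.ok || !response.body) throw new Error(`Execute failed: HTTP ${response.status}`);
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = '';
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries
  buffer += decoder.decode(value, { stream: true });
  // Events end with a blank line (assumes \n line endings); keep any partial event buffered
  const events = buffer.split('\n\n');
  buffer = events.pop();
  for (const rawEvent of events) {
    const lines = rawEvent.split('\n');
    const type = lines.find((line) => line.startsWith('event: '))?.slice(7);
    const dataLine = lines.find((line) => line.startsWith('data: '));
    if (!dataLine) continue;
    const data = JSON.parse(dataLine.slice(6));
    if (type === 'token') appendToResponse(data.text);
    else if (type === 'done') showExecutionStats(data);
    else if (type === 'error') showExecutionError(data); // hypothetical UI helper
  }
}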
Note: Standard EventSource only supports GET requests. For POST with SSE, use fetch() with a ReadableStream reader as shown above.
4. Retry and Timeout Handling
Execution timeout: 120 seconds (long-running completions)
Client shows estimated progress based on max_tokens
If SSE disconnects, client can fetch result via GET /executions/:id
Retries: exponential backoff for transient errors
5. Client Instrumentation (Product Architecture)
Track time-to-first-token, run success rate, average iterations per prompt
Capture client errors (stream parse, aborts, timeouts)
Use these metrics to tune UX (default max_tokens, autosave debounce)
Where does state live?
Prompt content: Server (PostgreSQL), synced via autosave
Unsaved changes: Client (in-memory), persisted to server on debounce
Execution stream: Client buffers tokens, server stores final result linked to prompt_version_id
UI state: Client-only (expanded panels, selected tabs)
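For the execution stream in particular, client state can be modeled as a pure reducer over the SSE events defined in the API section. This is a sketch only: the event names and payload fields follow the stream format above, while the state shape and the `reduceExecutionEvent` name are assumptions.

```javascript
const initialExecutionState = Object.freeze({
  status: 'idle', // idle | running | completed | failed
  executionId: null,
  text: '',
  usage: null,
  error: null,
});

// Pure function: (previous state, parsed SSE event) -> next state.
function reduceExecutionEvent(state, event) {
  switch (event.type) {
    case 'start':
      return { ...initialExecutionState, status: 'running', executionId: event.data.execution_id };
    case 'token':
      return { ...state, text: state.text + event.data.text };
    case 'done':
      return {
        ...state,
        status: 'completed',
        usage: {
          input_tokens: event.data.input_tokens,
          output_tokens: event.data.output_tokens,
          finish_reason: event.data.finish_reason,
        },
      };
    case 'error':
      return { ...state, status: 'failed', error: event.data };
    default:
      return state; // ignore unknown event types for forward compatibility
  }
}
```

Keeping this logic pure makes it easy to unit test and to replay a recorded stream when debugging a rendering issue.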
Handling Execution Cancellation
User clicks "Stop"
Client calls POST /executions/:id/cancel
Server signals LLM backend to abort
Partial response is saved to history with finish_reason: "cancelled"
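The client side of this flow might look like the sketch below. It pairs an AbortController, whose signal would be passed to the streaming fetch call, with the cancel endpoint. `createRunController` is a hypothetical helper, and `fetchImpl` is injected so the example can run without a server.

```javascript
// Couples local stream teardown with the server-side cancel call.
function createRunController(fetchImpl) {
  const abortController = new AbortController();
  let executionId = null;
  return {
    signal: abortController.signal, // pass as { signal } to the streaming fetch
    setExecutionId(id) {
      executionId = id; // known once the "start" event arrives
    },
    async stop() {
      abortController.abort(); // stop reading the stream immediately
      if (executionId === null) {
        // No id yet: this sketch relies on the server noticing the disconnect.
        return { cancelled: false };
      }
      const response = await fetchImpl(`/executions/${executionId}/cancel`, { method: 'POST' });
      return response.json();
    },
  };
}
```

Aborting locally first keeps the Stop button responsive even if the cancel request is slow.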
1. Autosave vs. Explicit Save
Decision: Autosave with smart versioning.
Autosave on every change (debounced)
Create or reuse an immutable version immediately before execution if the current draft differs from the latest checkpoint
Create additional versions between runs only after significant change (e.g., 10+ minutes of editing since the last checkpoint)
Show "last saved X minutes ago" indicator
2. Streaming vs. Buffered Response
Decision: Streaming with fallback.
Primary: SSE for real-time token display
Fallback: If SSE fails, poll GET /executions/:id every 2 seconds
3. Execution Tied to Saved Prompt vs. Ad-Hoc
Decision: Execute saved prompts, but make saving invisible.
The Run action flushes any pending autosave so the server snapshots the editor state the user actually sees
Execution snapshots the exact prompt/config into PromptVersion and stores prompt_version_id
User never explicitly "saves before running"
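The polling fallback from the streaming decision above can be sketched as a small loop. The 2-second interval comes from the text; the cap of 60 attempts (about the 120-second execution timeout) and the injected `fetchExecution` and `sleep` parameters are assumptions made so the example is testable.

```javascript
const TERMINAL_STATUSES = new Set(['completed', 'failed', 'cancelled']);

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls until the execution reaches a terminal status or attempts run out.
// fetchExecution(id) is assumed to wrap GET /executions/:id and return the execution object.
async function pollExecution(
  fetchExecution,
  executionId,
  { intervalMs = 2000, maxAttempts = 60, sleep = defaultSleep } = {},
) {
  for (let attempt = 0; attempt < maxAttempts; attempt += 1) {
    const execution = await fetchExecution(executionId);
    if (TERMINAL_STATUSES.has(execution.status)) return execution;
    await sleep(intervalMs);
  }
  throw new Error(`Execution ${executionId} still running after ${maxAttempts} polls`);
}
```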
Coarse-Grained: GET /prompts/:id
Returns prompt + recent executions + model config in one call
Rationale: Prompt editor needs all this data immediately
Fine-Grained: PATCH /prompts/:id
Accepts partial updates (just temperature, just system prompt)
Rationale: Autosave sends frequent small changes
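A minimal sketch of the autosave side of this contract: edits are merged into one pending patch and sent after a quiet period, with an explicit flush for the Run action. The 500ms delay comes from the autosave pattern above; `createAutosaver`, `sendPatch`, and the injectable timers are hypothetical.

```javascript
// Coalesces rapid edits into a single PATCH body.
// sendPatch(patch) is assumed to wrap PATCH /prompts/:id.
function createAutosaver(
  sendPatch,
  { delayMs = 500, setTimer = setTimeout, clearTimer = clearTimeout } = {},
) {
  let pending = {};
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimer(timer);
      timer = null;
    }
    if (Object.keys(pending).length === 0) return Promise.resolve();
    const patch = pending;
    pending = {};
    return Promise.resolve(sendPatch(patch));
  }

  return {
    // Shallow-merges top-level fields; later edits to the same field win.
    change(fields) {
      pending = { ...pending, ...fields };
      if (timer !== null) clearTimer(timer);
      timer = setTimer(flush, delayMs);
    },
    flush, // call before Run so the server snapshots what the user sees
  };
}
```

Because the merge is shallow, nested objects such as model_config should be passed whole in each change; a deeper merge is possible but not shown here.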
1. Prefetching
On project list load, prefetch first 3 prompts
On prompt open, prefetch recent executions
2. Optimistic Updates
Prompt edits appear instantly (autosave in background)
Show "Saving..." only if the save takes longer than 1 second
3. Lazy Loading
Execution history loads on scroll (infinite scroll with cursor pagination)
Response text for old executions fetched on demand
4. Rendering Performance
Virtualize execution history for large lists
Incrementally append streamed tokens (avoid full re-render per token)
Use memoization for prompt editor panes and config panels
5. Offline Handling
Queue autosave requests when offline
Show banner: "You're offline. Changes will sync when connected."
Prevent execution (requires server)
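For the rendering point about incrementally appending streamed tokens, one common approach is to batch tokens and flush them once per animation frame. The sketch below illustrates that idea; `createTokenBatcher` is a hypothetical helper, and the scheduler is injectable so it also runs outside a browser.

```javascript
// Buffers streamed tokens and hands them to render() at most once per scheduled frame.
function createTokenBatcher(render, schedule = (fn) => requestAnimationFrame(fn)) {
  let buffer = '';
  let scheduled = false;
  return {
    push(text) {
      buffer += text;
      if (scheduled) return; // a flush is already queued for this frame
      scheduled = true;
      schedule(() => {
        scheduled = false;
        const batch = buffer;
        buffer = '';
        if (batch) render(batch); // append-only update, not a full re-render
      });
    },
  };
}
```

With fast models this turns dozens of DOM updates per frame into one, at the cost of at most one frame of added display latency.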
1. Long-Running Executions
Max timeout: 120 seconds
Show progress indicator based on token generation rate
Allow cancellation at any time
2. Token Limit Exceeded
Validate prompt length before execution
Show token counter in UI (current / max)
Fail gracefully with a specific message: "Prompt exceeds limit by X tokens"
3. Concurrent Editing (Enterprise)
For MVP: Last write wins
Future: Operational Transformation (as in Google Docs) for real-time collaboration
Show presence indicators: "Alex is viewing this prompt"
4. API Key Rotation
User can generate new API key
Old key remains valid for 24 hours (grace period)
Webhook notification when key is rotated
5. Sensitive Data
Warning banner before saving prompts with secrets
Per-project retention settings (e.g., 30 days, never store history)
Copy-to-clipboard sanitization to avoid leaking API keys
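The token-limit case above can be illustrated with a client-side pre-check. This sketch uses a rough heuristic of about 4 characters per token, which is an assumption and not a real tokenizer; an actual playground would rely on the model's tokenizer or a server-side count. The function names are illustrative.

```javascript
// Rough heuristic only (~4 characters per token); not a real tokenizer.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Returns a counter string when within the limit, or a specific error when over it.
function checkPromptLength(prompt, maxContextTokens) {
  const texts = [prompt.system_prompt, ...prompt.messages.map((m) => m.content)];
  const promptTokens = texts.reduce((sum, text) => sum + estimateTokens(text), 0);
  const overBy = promptTokens - maxContextTokens;
  if (overBy > 0) {
    return { ok: false, promptTokens, message: `Prompt exceeds limit by ${overBy} tokens` };
  }
  return { ok: true, promptTokens, message: `${promptTokens} / ${maxContextTokens}` };
}
```

Since the estimate is approximate, the server should still validate and return CONTEXT_LENGTH_EXCEEDED when the true count is over the limit.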
Strategy: URL versioning (/v1/prompts, /v2/prompts)
Backward Compatibility Rules:
Add new fields, never remove old ones
New endpoints for breaking changes
Deprecation timeline: 6 months notice before removal
Feature Flags in Response:
{
"prompt": {...},
"_capabilities": {
"streaming": true,
"variables": true,
"collaboration": false
}
}
Before concluding, verify you've covered:
Requirements
Core user flows: edit, execute, save, history, share
Non-functional: streaming latency, autosave, reliability
Scope: web-first, individual + enterprise users
Data Model
Prompt structure with system prompt, messages, variables
Model configuration (temperature, max_tokens, etc.)
Execution history linked to immutable prompt versions with token counts and latency
API Design
CRUD for projects and prompts
Streaming execution with SSE
Execution cancellation and history
Sharing and collaboration endpoints
Error response format
High-Level Design
Client-to-LLM data flow
Autosave pattern with debouncing
Streaming connection handling
State management approach
Deep Dive
Autosave vs. explicit save trade-off
Streaming vs. buffered response
Client optimization (prefetch, offline)
Edge cases (timeout, token limit, concurrent edit)