
Prompt Engineering Playground System Design

System Design · Onsite · Phone · Software Engineer · Reported Apr 2026 · High Frequency

Problem Statement

Design a prompt engineering playground similar to ChatGPT Playground or Anthropic Console. This is a product-focused system design question from a client engineer's perspective, emphasizing UX flows, client state management, streaming, performance, and how the frontend collaborates with backend services.

What is Prompt Engineering?

Prompt engineering is the process of iterating on input prompts to make a model better at performing specific tasks. Unlike conversational chat, playgrounds focus on stateless runs: each run is independent, but a single run can include multi-message context (system + user + assistant examples).

Example Workflow

A user wants to generate a recipe. They might:

Start simple: "Write me a recipe"

Iterate to improve: "You are a great chef who has written many cookbooks. Write me a recipe for..."

Add examples: Include sample good/bad recipes for the model to learn from

Keep refining until they get the desired output

Save the successful prompt for future use

Disclaimer: This is a sample solution to help you get started. To better prepare for the interview, you should think through the question yourself and try to come up with your own solution. System design questions are open-ended and have multiple valid approaches.

Phase 1: Requirements (~5-7 minutes)

Functional Requirements

Users should be able to:

Create and edit prompts with system instructions, user messages, and assistant examples

Configure model parameters (model version, temperature, max tokens, top-p, stop sequences)

Execute prompts and see streaming responses in real-time

Save and organize prompts into projects/folders for reuse

View execution history to compare outputs across different prompt versions and configurations

Share prompts with team members (for enterprise users)

For a 45-minute interview, focus on 3-5 core flows: prompt editing, execution with streaming, and saving/organizing prompts. Mention sharing and history as stretch goals.

Product Scope

Clarify the boundaries:

MVP Focus: Web-based prompt editor with real-time execution

Platforms: Desktop-first web app (power users prefer larger screens for prompt iteration)

User Types: Individual developers and enterprise teams

Integrations: API key management, usage tracking, billing integration

Non-Functional Requirements (Client-Led)

Latency: time to first token p50 < 1s, p95 < 3s (LLM-dependent); show a progress indicator if slower

Streaming: Responses must stream token-by-token (not wait for the complete response)

Reliability: Executions should gracefully handle timeouts, disconnects, and model errors

Autosave: Prompt changes persist automatically (no lost work)

Concurrent editing: Enterprise users may need collaborative editing

Privacy: Users control history retention; clear UI on what is stored

Capacity: Quick Sanity Check

For a platform like Anthropic Console, we might have 10K daily active users, each running 20-50 prompt iterations per session. During peak hours (business hours across time zones), we could see 1000+ concurrent executions. The LLM backend is the bottleneck, not our system.
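A Little's-law back-of-envelope (concurrency ≈ arrival rate × average run duration) makes the assumptions behind the concurrency figure explicit. The peak window length, the share of runs that fall inside it, and the average streaming duration below are illustrative assumptions, not measured values, and `estimatePeakConcurrency` is a hypothetical helper.

```javascript
// Back-of-envelope concurrency estimate using Little's law: L = λ × W.
// All inputs are illustrative assumptions, not measurements.
function estimatePeakConcurrency({
  dailyActiveUsers,
  runsPerUser,
  peakShare,        // fraction of daily runs that land in the peak window
  peakWindowHours,  // length of the peak window
  avgRunSeconds,    // average time a run holds an open stream
}) {
  const peakRuns = dailyActiveUsers * runsPerUser * peakShare;
  const arrivalsPerSecond = peakRuns / (peakWindowHours * 3600); // λ
  return arrivalsPerSecond * avgRunSeconds;                      // L = λ × W
}

// Pessimistic: every user at the top of the 20-50 range, all runs in a 4-hour window.
const upperBound = estimatePeakConcurrency({
  dailyActiveUsers: 10_000, runsPerUser: 50, peakShare: 1.0,
  peakWindowHours: 4, avgRunSeconds: 30,
});

// Moderate: mid-range usage, half of the runs in the peak window, shorter runs.
const typical = estimatePeakConcurrency({
  dailyActiveUsers: 10_000, runsPerUser: 35, peakShare: 0.5,
  peakWindowHours: 4, avgRunSeconds: 20,
});

console.log(Math.round(upperBound), Math.round(typical)); // 1042 243
```

Under these assumptions, 1000+ concurrent executions is a pessimistic upper bound, and a few hundred is a more typical peak. Either way, each open stream spends almost all of its time waiting on the model, which is why the LLM backend rather than the playground's own services sets the limit.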

Phase 2: Data Model (~8-10 minutes)

Core Entities

User
├── user_id (UUID)
├── email
├── name
├── org_id
├── api_key_hash
└── created_at

Organization
├── org_id (UUID)
├── name
├── billing_plan
└── usage_limits

Project
├── project_id (UUID)
├── org_id
├── name
├── description
└── created_at

Prompt
├── prompt_id (UUID)
├── project_id
├── name
├── system_prompt
├── messages[]
├── variables[]
├── model_config
├── latest_checkpoint_version_id
├── created_at
└── updated_at

PromptVersion
├── prompt_version_id (UUID)
├── prompt_id
├── version_number
├── content_snapshot
├── model_config_snapshot
├── variables_snapshot
├── created_at
└── created_by

Execution
├── execution_id (UUID)
├── prompt_id
├── prompt_version_id
├── user_id
├── variable_values
├── input_tokens
├── output_tokens
├── latency_ms
├── status (running, completed, failed, cancelled)
└── created_at

ExecutionResult
├── execution_result_id (UUID)
├── execution_id
├── response_text
├── finish_reason
└── resolved_model_version

SharedPrompt
├── shared_prompt_id (UUID)
├── prompt_id
├── shared_with_user_id
└── permission_level (view, edit)

Each execution should point to an immutable PromptVersion, not just the mutable Prompt. Before running, the client flushes any pending autosave and the server creates or reuses a version snapshot for the current prompt content, variables, and model config, then stores prompt_version_id on the execution. This makes history, comparison, audit, and reruns reproducible after the prompt is edited.
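One way to implement "create or reuse" is to hash a canonical serialization of the snapshot and compare it with the latest checkpoint. In this sketch, `canonicalJson`, `snapshotHash`, `checkpointForRun`, the `pv_N` id format, and the in-memory `versions` array are hypothetical stand-ins for the Prompt Service and its prompt_versions table; it uses Node's built-in `crypto` module.

```javascript
const { createHash } = require('crypto');

// Serialize with sorted object keys so logically equal snapshots hash identically.
function canonicalJson(value) {
  if (Array.isArray(value)) return `[${value.map(canonicalJson).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const keys = Object.keys(value).sort();
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonicalJson(value[k])}`).join(',')}}`;
  }
  return JSON.stringify(value);
}

// Hash only the fields that affect model output.
function snapshotHash(prompt) {
  const snapshot = {
    system_prompt: prompt.system_prompt,
    messages: prompt.messages,
    model_config: prompt.model_config,
    variables: prompt.variables,
  };
  return createHash('sha256').update(canonicalJson(snapshot)).digest('hex');
}

// Reuse the latest checkpoint if nothing changed; otherwise append a new immutable version.
// `versions` is an in-memory stand-in for the prompt_versions table (hypothetical).
function checkpointForRun(prompt, versions) {
  const hash = snapshotHash(prompt);
  const latest = versions[versions.length - 1];
  if (latest && latest.hash === hash) return latest; // reuse: nothing changed since last run
  const version = {
    prompt_version_id: `pv_${versions.length + 1}`,
    prompt_id: prompt.prompt_id,
    version_number: versions.length + 1,
    hash,
    snapshot: JSON.parse(JSON.stringify(prompt)), // deep copy so later edits cannot mutate it
    created_at: new Date().toISOString(),
  };
  versions.push(version);
  return version;
}
```

Hashing also gives the table a cheap column to index, so "has this exact prompt been run before?" becomes a single lookup.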

Prompt Structure Detail

{
  "prompt": {
    "system_prompt": "You are a helpful recipe chef...",
    "messages": [
      { "role": "user", "content": "Write a recipe for..." },
      { "role": "assistant", "content": "Here's a recipe..." },
      { "role": "user", "content": "{{user_input}}" }
    ],
    "model_config": {
      "model": "claude-3-5-sonnet-20241022",
      "temperature": 0.7,
      "max_tokens": 1024,
      "top_p": 1.0,
      "stop_sequences": []
    },
    "variables": ["user_input"]
  }
}
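The `{{variable}}` placeholders can be discovered and filled with a small helper. This sketch assumes the double-brace syntax shown above and that an unfilled variable should block the run rather than be sent to the model literally; `extractVariables` and `renderPrompt` are hypothetical names.

```javascript
// Matches {{name}} placeholders (letters, digits, underscores), optionally padded with spaces.
const VARIABLE_PATTERN = /\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}/g;

// Collect unique variable names in order of first appearance.
function extractVariables(prompt) {
  const texts = [prompt.system_prompt ?? '', ...prompt.messages.map((m) => m.content)];
  const names = new Set();
  for (const text of texts) {
    for (const match of text.matchAll(VARIABLE_PATTERN)) names.add(match[1]);
  }
  return [...names];
}

// Substitute values into every message; throw if any variable has no value.
function renderPrompt(prompt, values) {
  const hasValue = (name) => Object.prototype.hasOwnProperty.call(values, name);
  const missing = extractVariables(prompt).filter((name) => !hasValue(name));
  if (missing.length > 0) {
    throw new Error(`Missing values for: ${missing.join(', ')}`);
  }
  // A replacer function avoids special "$" patterns in user-supplied values.
  const fill = (text) => text.replace(VARIABLE_PATTERN, (_, name) => String(values[name]));
  return {
    system_prompt: fill(prompt.system_prompt ?? ''),
    messages: prompt.messages.map((m) => ({ ...m, content: fill(m.content) })),
  };
}
```

The same `extractVariables` result can populate the `variables` array when the prompt is saved, so the UI always shows one input field per placeholder.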

Client-First Thinking: What Data Does Each Screen Need?

The prompt editor screen needs everything in one load—users shouldn't wait for multiple requests while iterating. Design your API to return the complete prompt state in a single call.

Phase 3: API Design (~15-20 minutes)

This is the core of the interview. Design APIs that feel intuitive to developers using the playground.

Project & Prompt Management

# List user's projects

GET /projects?cursor=...&limit=20

  Response: { "projects": [...], "next_cursor": "..." }

# Create a project

POST /projects

  Request:  { "name": "Recipe Experiments", "description": "..." }
  Response: { "project": { "id": "...", "name": "...", ... } }

# Get project with prompts

GET /projects/:project_id

  Response: { 
    "project": {...}, 
    "prompts": [{ "id": "...", "name": "...", "updated_at": "..." }] 
  }

# CRUD for prompts

POST /prompts

  Request:  { "project_id": "...", "name": "...", "system_prompt": "...", "messages": [...], "model_config": {...} }
  Response: { "prompt": {...} }

GET /prompts/:prompt_id

  Response: { "prompt": {...}, "recent_executions": [...] }

PATCH /prompts/:prompt_id

  Request:  { "name": "...", "system_prompt": "...", ... }  // Partial update
  Response: { "prompt": {...} }

DELETE /prompts/:prompt_id

  Response: { "success": true }

Prompt Execution (Critical Path)

# Execute a prompt (streaming)

POST /prompts/:prompt_id/execute

  Headers:  Accept: text/event-stream
  Request:  { "variable_values": { "user_input": "chocolate cake" } }
  Response: Server-Sent Events stream
    event: start
    data: { "execution_id": "exec_123", "prompt_version_id": "pv_456", "model": "claude-3-5-sonnet" }

    event: token
    data: { "text": "Here" }

    event: token
    data: { "text": "'s a" }

    event: token
    data: { "text": " delicious" }
    ...

    event: done
    data: { "execution_id": "exec_123", "input_tokens": 150, "output_tokens": 523, "finish_reason": "end_turn" }

    event: error (if something goes wrong)
    data: { "code": "RATE_LIMITED", "message": "Rate limit exceeded. Retry in 30s." }

# Cancel an execution

POST /executions/:execution_id/cancel

  Response: { "cancelled": true }

# Get execution result (for history or if SSE disconnected)

GET /executions/:execution_id

  Response: { 
    "execution": { 
      "id": "...", 
      "prompt_id": "...",
      "prompt_version_id": "...",
      "status": "completed",
      "variable_values": { "user_input": "chocolate cake" },
      "model_config_snapshot": { "model": "claude-3-5-sonnet", "temperature": 0.7 },
      "response_text": "...", 
      "input_tokens": 150,
      "output_tokens": 523,
      "latency_ms": 2340,
      "created_at": "..."
    } 
  }

Execution History

# List executions for a prompt

GET /prompts/:prompt_id/executions?cursor=...&limit=20

  Response: { 
    "executions": [
      { "id": "...", "prompt_version_id": "...", "status": "completed", "input_tokens": 150, "output_tokens": 523, "created_at": "..." }
    ],
    "next_cursor": "..." 
  }

# Compare two executions

GET /executions/compare?ids=exec_1,exec_2

  Response: { 
    "executions": [
      { "id": "exec_1", "prompt_version_id": "pv_1", "variable_values": {...}, "model_config_snapshot": {...}, "response_text": "..." },
      { "id": "exec_2", "prompt_version_id": "pv_2", "variable_values": {...}, "model_config_snapshot": {...}, "response_text": "..." }
    ] 
  }

Sharing & Collaboration

# Share a prompt

POST /prompts/:prompt_id/shares

  Request:  { "email": "colleague@company.com", "permission": "edit" }
  Response: { "share": { "id": "...", "user": {...}, "permission": "edit" } }

# List shared prompts

GET /prompts/shared-with-me

  Response: { "prompts": [...] }

# Fork a shared prompt (create your own copy)

POST /prompts/:prompt_id/fork

  Request:  { "project_id": "my_project_id" }
  Response: { "prompt": { "id": "new_prompt_id", ... } }

Key API Design Decisions

1. Streaming via SSE (Server-Sent Events)

SSE is simpler than WebSockets for unidirectional streaming

Works with standard HTTP infrastructure (load balancers, CDNs), provided intermediaries don't buffer the response

Client reads the stream with fetch and a response body reader, since the native EventSource API cannot send a POST body

2. Execution is Tied to Prompt, Not Ad-Hoc

POST /prompts/:id/execute instead of POST /execute with full prompt

Benefits: Automatic history tracking, easier analytics, prompt versioning

Persist prompt_version_id on every execution so the run can be replayed even after the prompt changes

Trade-off: Requires saving prompt first (but autosave handles this)

3. Partial Updates with PATCH

Users constantly tweak prompts, so autosave shouldn't resend the entire prompt on every save

Send only changed fields, e.g. { "model_config": { "temperature": 0.9 } }
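A client can compute that partial body by diffing the last-saved prompt against the current draft. This sketch assumes JSON Merge Patch semantics (RFC 7396): nested objects are diffed recursively, arrays such as `messages` are replaced whole, and a removed field is sent as `null`. `buildMergePatch` is a hypothetical helper.

```javascript
const isPlainObject = (v) => v !== null && typeof v === 'object' && !Array.isArray(v);

// Returns a JSON Merge Patch that turns `saved` into `draft`, or null if nothing changed.
function buildMergePatch(saved, draft) {
  const patch = {};
  for (const key of Object.keys(draft)) {
    const before = saved[key];
    const after = draft[key];
    if (isPlainObject(before) && isPlainObject(after)) {
      const nested = buildMergePatch(before, after); // e.g. only model_config.temperature
      if (nested !== null) patch[key] = nested;
    } else if (JSON.stringify(before) !== JSON.stringify(after)) {
      patch[key] = after; // scalars and arrays (like messages) are sent whole
    }
  }
  for (const key of Object.keys(saved)) {
    if (!(key in draft)) patch[key] = null; // null means "remove this field" in merge patch
  }
  return Object.keys(patch).length > 0 ? patch : null;
}
```

Returning `null` for an unchanged draft lets the autosave loop skip the request entirely. The `JSON.stringify` comparison is order-sensitive, which is acceptable for editor state that is always built the same way.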

4. Cursor-Based Pagination

For execution history, chronological cursors work well

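Handles concurrent writes better than offset pagination: new executions inserted at the head don't shift later pages, so rows are neither repeated nor skipped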
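Chronological cursors are typically keyset cursors: the cursor encodes the (created_at, id) of the last row returned, and the next page starts strictly after it. Below, an in-memory array stands in for the executions table, and the base64 cursor format and `listExecutions` are assumptions; in PostgreSQL this corresponds roughly to `WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3`.

```javascript
// Opaque cursor: base64-encoded JSON of the last row's sort key (format is an assumption).
const encodeCursor = (row) =>
  Buffer.from(JSON.stringify([row.created_at, row.id])).toString('base64');
const decodeCursor = (cursor) => JSON.parse(Buffer.from(cursor, 'base64').toString('utf8'));

// Newest-first order by (created_at, id); id breaks ties between equal timestamps.
// ISO-8601 strings in one format compare correctly as plain strings.
const isOlder = (row, [createdAt, id]) =>
  row.created_at < createdAt || (row.created_at === createdAt && row.id < id);

function listExecutions(rows, { cursor = null, limit = 20 } = {}) {
  const sorted = [...rows].sort((a, b) => {
    if (a.created_at !== b.created_at) return a.created_at < b.created_at ? 1 : -1;
    return a.id < b.id ? 1 : -1;
  });
  const after = cursor ? decodeCursor(cursor) : null;
  const candidates = after ? sorted.filter((row) => isOlder(row, after)) : sorted;
  const page = candidates.slice(0, limit);
  return {
    executions: page,
    next_cursor: candidates.length > limit ? encodeCursor(page[page.length - 1]) : null,
  };
}
```

If a new run arrives between page 1 and page 2, the keyset cursor still continues from the last row the user saw; an offset of 2 would instead return the previous page's last row a second time.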

5. Idempotency for Executions

Optional Idempotency-Key header for POST /prompts/:id/execute prevents duplicate runs on network retry

Less critical than payments, but useful for expensive model calls
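On the server, the key can map to the execution it first created, so a retried request reattaches to that run instead of starting a second model call. The in-memory store, the 24-hour TTL, and the function names below are hypothetical; a multi-instance deployment would keep keys in a shared store and make the check-and-set atomic.

```javascript
// In-memory idempotency store: scopedKey -> { executionId, expiresAt }. Hypothetical stand-in.
const IDEMPOTENCY_TTL_MS = 24 * 60 * 60 * 1000; // assumption: keys are remembered for 24 hours
const idempotencyStore = new Map();

// `startExecution` is a hypothetical function that creates the Execution and starts the model call.
function executeWithIdempotency(userId, idempotencyKey, startExecution, now = Date.now()) {
  if (!idempotencyKey) return { executionId: startExecution(), replayed: false };

  const scopedKey = `${userId}:${idempotencyKey}`; // scope per user so keys cannot collide across users
  const existing = idempotencyStore.get(scopedKey);
  if (existing && existing.expiresAt > now) {
    return { executionId: existing.executionId, replayed: true }; // retry: reuse the original run
  }
  const executionId = startExecution();
  idempotencyStore.set(scopedKey, { executionId, expiresAt: now + IDEMPOTENCY_TTL_MS });
  return { executionId, replayed: false };
}
```

When `replayed` is true, the handler can resume streaming the existing execution or point the client at GET /executions/:id instead of paying for a second model call.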

Error Handling

{
  "error": {
    "code": "CONTEXT_LENGTH_EXCEEDED",
    "message": "Prompt exceeds maximum context length of 200k tokens",
    "details": {
      "prompt_tokens": 215000,
      "max_context_tokens": 200000
    }
  }
}

Common error codes:

RATE_LIMITED — Too many requests, include retry-after

CONTEXT_LENGTH_EXCEEDED — Prompt too long

MODEL_UNAVAILABLE — Model temporarily unavailable

INVALID_API_KEY — Authentication failed

QUOTA_EXCEEDED — Monthly usage limit hit

Walk Through a User Flow

A user opens the prompt playground and wants to iterate on a recipe generator:

Call GET /projects to see their projects

Select a project, triggering GET /projects/:id to load prompts

Click 'New Prompt', which sends POST /prompts with initial content

As they type in the editor, debounced PATCH /prompts/:id calls autosave changes

They click 'Run', which calls POST /prompts/:id/execute with variable values

The client flushes any pending autosave; the server checkpoints the current prompt as a PromptVersion, creates an Execution linked to that version, and streams tokens back

When done, they tweak the temperature and run again

They open history with GET /prompts/:id/executions to compare outputs

Happy with the result, they share it via POST /prompts/:id/shares

Phase 4: High-Level Design (~10-15 minutes)

Client-to-Backend Data Flow

Web Client (Project List, Prompt Editor, Execution Stream)
  → API Gateway (Auth, Rate Limiting)
      → Project Service (CRUD projects)
      → Prompt Service (CRUD, autosave)
      → Execution Service (SSE streaming, history)
          → LLM Backend (Claude API: streaming, token counts)

PostgreSQL (shared by the services): projects, prompts, prompt_versions, executions, shares


Client-Server Interaction Patterns

1. Authentication (Client Perspective)

Use session cookies or short-lived JWTs for the web UI

Never store provider API keys in the browser; backend proxies LLM calls

Optional BYOK: store encrypted keys server-side with explicit user consent

All requests authenticated via Authorization header or secure cookies

2. Autosave Pattern

Debounce changes (500ms delay)

Send PATCH with only changed fields

Show "Saving..." → "Saved ✓" indicator

Conflict resolution: last write wins (acceptable for single-user editing)
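The autosave loop and the "flush before Run" step can share one small controller. In this sketch, `createAutosaver`, the `save` callback (which would send the PATCH), and `onStatus` are hypothetical; timers are plain `setTimeout`, and the 500 ms default comes from the list above.

```javascript
// Debounced autosave: coalesces rapid edits, reports status, and exposes flush() for Run.
function createAutosaver(save, { delayMs = 500, onStatus = () => {} } = {}) {
  let timer = null;
  let pending = null;            // latest draft not yet sent
  let inFlight = Promise.resolve();

  function sendPending() {
    clearTimeout(timer);
    timer = null;
    if (pending === null) return inFlight;
    const draft = pending;
    pending = null;
    onStatus('saving');
    // Chain saves so PATCHes reach the server in edit order, even after a failed save.
    const attempt = inFlight.catch(() => {}).then(() => save(draft));
    inFlight = attempt.then(
      () => { if (pending === null) onStatus('saved'); },
      (err) => {
        if (pending === null) pending = draft; // keep the draft so the next flush retries it
        onStatus('error');
        throw err;
      },
    );
    return inFlight;
  }

  return {
    // Call on every editor change; only the last draft within delayMs is sent.
    update(draft) {
      pending = draft;
      clearTimeout(timer);
      timer = setTimeout(() => { sendPending().catch(() => {}); }, delayMs);
    },
    // Call before POST /prompts/:id/execute so the server snapshots what the user sees.
    flush: sendPending,
  };
}
```

A Run handler would `await autosaver.flush()` before calling the execute endpoint; if the flush rejects, the UI can show the save error instead of running a stale prompt.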

3. Streaming Execution

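// Client-side SSE handling with fetch (POST + streaming).
// appendToResponse, showExecutionStats, and showError are UI callbacks defined elsewhere.
async function executePrompt(id, variableValues, signal) {
  const response = await fetch(`/prompts/${id}/execute`, {
    method: 'POST',
    headers: { 'Accept': 'text/event-stream', 'Content-Type': 'application/json' },
    body: JSON.stringify({ variable_values: variableValues }),
    signal, // optional AbortSignal so the Stop button can close the local stream
  });
  if (!response.ok) throw new Error(`Execute failed with HTTP ${response.status}`);

  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;

    // { stream: true } keeps multi-byte characters that span chunks intact.
    buffer += decoder.decode(value, { stream: true });

    // Events end with a blank line (assuming the server sends "\n" line endings).
    // A network chunk can end mid-event, so keep the unfinished tail for the next read.
    const events = buffer.split('\n\n');
    buffer = events.pop();

    for (const rawEvent of events) {
      let eventType = 'message';
      const dataLines = [];
      for (const line of rawEvent.split('\n')) {
        if (line.startsWith('event:')) eventType = line.slice(6).trim();
        else if (line.startsWith('data:')) dataLines.push(line.slice(5).trim());
      }
      if (dataLines.length === 0) continue;
      const data = JSON.parse(dataLines.join('\n'));
      if (eventType === 'token') appendToResponse(data.text);
      else if (eventType === 'done') showExecutionStats(data);
      else if (eventType === 'error') showError(data);
    }
  }
}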

Note: Standard EventSource only supports GET requests. For POST with SSE, use fetch() with a ReadableStream reader as shown above.

4. Retry and Timeout Handling

Execution timeout: 120 seconds (long-running completions)

Client shows estimated progress based on max_tokens

If SSE disconnects, client can fetch result via GET /executions/:id

Retries: exponential backoff for transient errors
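A retry helper for the non-streaming calls (autosave PATCHes, GET /executions/:id) might look like the following. It assumes errors carry the `code` values listed earlier plus an optional `retryAfterMs` derived from the server's retry-after hint; the helper names, attempt cap, and base delay are assumptions. Which codes to retry is a judgment call: here RATE_LIMITED and MODEL_UNAVAILABLE are retried, while authentication, quota, and context-length errors fail fast because retrying cannot fix them.

```javascript
// Codes worth retrying; NETWORK_ERROR is a hypothetical client-side code for dropped connections.
const RETRYABLE_CODES = new Set(['RATE_LIMITED', 'MODEL_UNAVAILABLE', 'NETWORK_ERROR']);

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Exponential backoff with "full jitter": a random wait in [0, min(max, base * 2^attempt)).
function backoffDelay(attempt, { baseMs = 500, maxMs = 10_000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

async function withRetry(operation, { maxAttempts = 4, ...backoff } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await operation();
    } catch (err) {
      const retryable = RETRYABLE_CODES.has(err.code);
      if (!retryable || attempt + 1 >= maxAttempts) throw err;
      // Prefer the server's retry-after hint when present.
      await sleep(err.retryAfterMs ?? backoffDelay(attempt, backoff));
    }
  }
}
```

Jitter spreads retries from many clients over time, so a brief rate-limit spike does not turn into synchronized waves of retries.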

5. Client Instrumentation (Product Architecture)

Track time-to-first-token, run success rate, average iterations per prompt

Capture client errors (stream parse, aborts, timeouts)

Use these metrics to tune UX (default max_tokens, autosave debounce)

State Management

Where does state live?

Prompt content: Server (PostgreSQL), synced via autosave

Unsaved changes: Client (in-memory), persisted to server on debounce

Execution stream: Client buffers tokens, server stores final result linked to prompt_version_id

UI state: Client-only (expanded panels, selected tabs)

Handling Execution Cancellation

User clicks "Stop"

Client calls POST /executions/:id/cancel

Server signals LLM backend to abort

Partial response is saved to history, and the execution's status is set to "cancelled"
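On the server, cancellation can be wired through one AbortController per running execution. In this sketch, the in-memory `runs` registry, `runExecution`, `cancelExecution`, and the `modelStream(signal)` async iterable (standing in for the LLM client's token stream) are hypothetical. With several server instances, the cancel request may land on a different instance than the stream, so a shared signal such as a pub/sub message would be needed.

```javascript
// Registry of in-flight executions on this server instance (hypothetical, in-memory).
const runs = new Map(); // executionId -> AbortController

// Streams tokens to the client; returns the record to persist in history.
// `modelStream(signal)` is a hypothetical async iterable of text tokens from the LLM client.
async function runExecution(executionId, modelStream, sendToken) {
  const controller = new AbortController();
  runs.set(executionId, controller);
  let text = '';
  let failed = false;
  try {
    for await (const token of modelStream(controller.signal)) {
      if (controller.signal.aborted) break; // stop forwarding once cancelled
      text += token;
      sendToken(token);
    }
  } catch (err) {
    // An abort may surface as an error from the model client; only other errors are failures.
    failed = !controller.signal.aborted;
  } finally {
    runs.delete(executionId);
  }
  const status = controller.signal.aborted ? 'cancelled' : failed ? 'failed' : 'completed';
  return { executionId, status, response_text: text }; // partial text is kept on cancel
}

// Handler body for POST /executions/:id/cancel.
function cancelExecution(executionId) {
  const controller = runs.get(executionId);
  if (!controller) return { cancelled: false }; // already finished, or unknown on this instance
  controller.abort(); // the model client should observe the signal and close the upstream request
  return { cancelled: true };
}
```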


Phase 5: Deep Dive & Trade-offs (~8-10 minutes)

Product Trade-offs (Client-Led)

1. Autosave vs. Explicit Save

Decision: Autosave with smart versioning.

Autosave on every change (debounced)

Create or reuse an immutable version immediately before execution if the current draft differs from the latest checkpoint

Between runs, create extra versions only occasionally (e.g., when the draft has changed and 10+ minutes have passed since the last version)

Show "last saved X minutes ago" indicator
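The checkpoint rule can be written as a small pure function. It assumes the draft and the latest checkpoint are compared by a content hash and that the 10-minute threshold applies only to background saves, never to runs; `decideCheckpoint` and its return values are hypothetical.

```javascript
const IDLE_CHECKPOINT_MS = 10 * 60 * 1000; // the "10+ minutes" rule

// trigger: 'run' (user clicked Run) or 'autosave' (debounced background save).
// latest: { hash, createdAt } of the newest PromptVersion, or null if none exists.
// Returns 'reuse' (point at the latest version), 'create' (new immutable version), or 'none'.
function decideCheckpoint({ trigger, draftHash, latest, now }) {
  const changed = !latest || latest.hash !== draftHash;
  if (trigger === 'run') return changed ? 'create' : 'reuse';
  // Background saves checkpoint only occasionally, so history isn't flooded with versions.
  if (changed && (!latest || now - latest.createdAt >= IDLE_CHECKPOINT_MS)) return 'create';
  return 'none';
}
```

Keeping the rule pure makes it easy to unit-test and to tune the threshold later from the client instrumentation data.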

2. Streaming vs. Buffered Response

Decision: Streaming with fallback.

Primary: SSE for real-time token display

Fallback: If SSE fails, poll GET /executions/:id every 2 seconds
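The polling fallback reduces to a loop that stops once the execution leaves the running state. `getExecution` is a hypothetical wrapper around GET /executions/:id, and the 120-second cap mirrors the execution timeout used elsewhere in this design.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll until the execution finishes (completed, failed, or cancelled) or the deadline passes.
// `getExecution(id)` is a hypothetical wrapper returning the GET /executions/:id body.
async function pollExecution(getExecution, executionId, { intervalMs = 2000, timeoutMs = 120_000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const { execution } = await getExecution(executionId);
    if (execution.status !== 'running') return execution;
    if (Date.now() + intervalMs > deadline) {
      throw new Error(`Execution ${executionId} still running after ${timeoutMs} ms`);
    }
    await sleep(intervalMs);
  }
}
```

Because the server persists the final result regardless of the client connection, polling only has to wait for a terminal status; it never has to reconstruct the token stream.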

3. Execution Tied to Saved Prompt vs. Ad-Hoc

Decision: Execute saved prompts, but make saving invisible.

The Run action flushes any pending autosave so the server snapshots the editor state the user actually sees

Execution snapshots the exact prompt/config into PromptVersion and stores prompt_version_id

User never explicitly "saves before running"

API Granularity Decisions

Coarse-Grained: GET /prompts/:id

Returns prompt + recent executions + model config in one call

Rationale: Prompt editor needs all this data immediately

Fine-Grained: PATCH /prompts/:id

Accepts partial updates (just temperature, just system prompt)

Rationale: Autosave sends frequent small changes

Client Optimization

1. Prefetching

On project list load, prefetch first 3 prompts

On prompt open, prefetch recent executions

2. Optimistic Updates

Prompt edits appear instantly (autosave in background)

Show "Saving..." only if > 1 second delay

3. Lazy Loading

Execution history loads on scroll (infinite scroll with cursor pagination)

Response text for old executions fetched on demand

4. Rendering Performance

Virtualize execution history for large lists

Incrementally append streamed tokens (avoid full re-render per token)

Use memoization for prompt editor panes and config panels
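One way to append streamed tokens without a re-render per token is to buffer them and flush once per frame. This sketch injects the scheduler so it runs outside a browser; in the UI you would pass `requestAnimationFrame`, and `render` would append to the response pane. `createTokenBatcher` is a hypothetical name.

```javascript
// Coalesces many token events into one UI update per scheduled frame.
function createTokenBatcher(render, schedule) {
  let buffer = [];
  let scheduled = false;

  function flush() {
    scheduled = false;
    if (buffer.length === 0) return;
    const text = buffer.join('');
    buffer = [];
    render(text); // e.g. append one text node instead of re-rendering the whole pane
  }

  return {
    push(token) {
      buffer.push(token);
      if (!scheduled) {
        scheduled = true;
        schedule(flush); // at most one pending frame at a time
      }
    },
    flush, // call on the `done` event so the tail is not left in the buffer
  };
}
```

At typical streaming rates several tokens arrive per frame, so this bounds DOM work to roughly one update per frame regardless of how fast the model streams.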

5. Offline Handling

Queue autosave requests when offline

Show banner: "You're offline. Changes will sync when connected."

Prevent execution (requires server)

Edge Cases

1. Long-Running Executions

Max timeout: 120 seconds

Show progress indicator based on token generation rate

Allow cancellation at any time

2. Token Limit Exceeded

Validate prompt length before execution

Show token counter in UI (current / max)

Error gracefully: "Prompt exceeds limit by X tokens"
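For the live counter, the client can show a cheap estimate while typing and leave the authoritative count to the server before execution. The ~4 characters per token ratio is a rough rule of thumb for English text, not the model's tokenizer, so the UI label is marked approximate. Requiring the requested `max_tokens` to fit alongside the input is a conservative assumption, and `estimateTokens` and `tokenBudget` are hypothetical names.

```javascript
const CHARS_PER_TOKEN = 4; // rough heuristic for English text; not a real tokenizer

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Conservative check: estimated input plus requested output must fit the context window.
function tokenBudget(prompt, { contextLimit, maxOutputTokens }) {
  const texts = [prompt.system_prompt ?? '', ...prompt.messages.map((m) => m.content)];
  const estimatedInput = texts.reduce((sum, text) => sum + estimateTokens(text), 0);
  const overBy = estimatedInput + maxOutputTokens - contextLimit;
  return {
    label: `~${estimatedInput.toLocaleString('en-US')} / ${contextLimit.toLocaleString('en-US')} tokens`,
    overBy: Math.max(0, overBy), // feeds "Prompt exceeds limit by X tokens"
  };
}
```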

3. Concurrent Editing (Enterprise)

For MVP: Last write wins

Future: Operational Transform (like Google Docs) for real-time collaboration

Show presence indicators: "Alex is viewing this prompt"

4. API Key Rotation

User can generate new API key

Old key remains valid for 24 hours (grace period)

Webhook notification when key is rotated
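The 24-hour grace period amounts to giving the old key an expiry at rotation time instead of revoking it immediately. The key record shape (`hash`, `createdAt`, `expiresAt`), `rotateKey`, and `isKeyAccepted` are hypothetical; keys are compared by SHA-256 hash, in the spirit of the api_key_hash field in the data model.

```javascript
const { createHash } = require('crypto');

const GRACE_PERIOD_MS = 24 * 60 * 60 * 1000; // old key stays valid for 24 hours after rotation

const hashKey = (key) => createHash('sha256').update(key).digest('hex');

// Rotation: currently active keys get an expiry; the new key has none.
function rotateKey(keys, newKey, now) {
  for (const record of keys) {
    if (record.expiresAt === null) record.expiresAt = now + GRACE_PERIOD_MS;
  }
  keys.push({ hash: hashKey(newKey), createdAt: now, expiresAt: null });
}

// A presented key is accepted if its hash matches a record that has not expired.
function isKeyAccepted(keys, presentedKey, now) {
  const hash = hashKey(presentedKey);
  return keys.some((r) => r.hash === hash && (r.expiresAt === null || now < r.expiresAt));
}
```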

5. Sensitive Data

Warning banner before saving prompts with secrets

Per-project retention settings (e.g., 30 days, never store history)

Copy-to-clipboard sanitization to avoid leaking API keys
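The pre-save warning and clipboard sanitization can start as a pattern scan over prompt text. The patterns below are illustrative assumptions (a few well-known key shapes plus a generic `api_key = ...` assignment), not an exhaustive or authoritative list; a real scanner needs a much broader rule set and will still miss some secrets. `findLikelySecrets` and `redactSecrets` are hypothetical names.

```javascript
// Illustrative patterns only; real secret scanners use far larger rule sets.
const SECRET_PATTERNS = [
  { name: 'Anthropic API key', regex: /sk-ant-[A-Za-z0-9_-]{20,}/ },
  { name: 'AWS access key ID', regex: /\bAKIA[0-9A-Z]{16}\b/ },
  { name: 'Private key block', regex: /-----BEGIN [A-Z ]*PRIVATE KEY-----/ },
  { name: 'Credential assignment', regex: /\b(api[_-]?key|secret|password|token)\s*[:=]\s*['"]?[^\s'"]{8,}/i },
];

// Returns the names of matched patterns so the banner can say what looked sensitive.
function findLikelySecrets(prompt) {
  const texts = [prompt.system_prompt ?? '', ...prompt.messages.map((m) => m.content)];
  const hits = new Set();
  for (const text of texts) {
    for (const { name, regex } of SECRET_PATTERNS) {
      if (regex.test(text)) hits.add(name);
    }
  }
  return [...hits];
}

// Masks matches before text is copied, keeping the first 4 characters for recognizability.
function redactSecrets(text) {
  return SECRET_PATTERNS.reduce(
    (out, { regex }) =>
      out.replace(new RegExp(regex.source, regex.flags + 'g'), (m) => `${m.slice(0, 4)}…[redacted]`),
    text,
  );
}
```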

API Evolution & Versioning

Strategy: URL versioning (/v1/prompts, /v2/prompts)

Backward Compatibility Rules:

Add new fields, never remove old ones

New endpoints for breaking changes

Deprecation timeline: 6 months notice before removal

Feature Flags in Response:

{
  "prompt": {...},
  "_capabilities": {
    "streaming": true,
    "variables": true,
    "collaboration": false
  }
}

Interview Checklist

Before concluding, verify you've covered:

Requirements

Core user flows: edit, execute, save, history, share

Non-functional: streaming latency, autosave, reliability

Scope: web-first, individual + enterprise users

Data Model

Prompt structure with system prompt, messages, variables

Model configuration (temperature, max_tokens, etc.)

Execution history linked to immutable prompt versions with token counts and latency

API Design

CRUD for projects and prompts

Streaming execution with SSE

Execution cancellation and history

Sharing and collaboration endpoints

Error response format

High-Level Design

Client-to-LLM data flow

Autosave pattern with debouncing

Streaming connection handling

State management approach

Deep Dive

Autosave vs. explicit save trade-off

Streaming vs. buffered response

Client optimization (prefetch, offline)

Edge cases (timeout, token limit, concurrent edit)
