Design a cloud-based IDE similar to Replit or GitHub Codespaces. Users can write code, manage files, run terminal commands, and see real-time output—all in the browser without local setup. This problem tests your ability to design systems with resource management, real-time streaming, and multi-tenancy isolation. The core challenges are VM/container lifecycle management and efficient terminal output streaming at scale.
This walkthrough follows the Interview Framework. Use it as a guide, not a script—adapt based on interviewer cues.
Users should be able to create workspaces with a file tree (create, edit, delete files and folders)
Users should be able to run code and terminal commands with real-time output streaming (stdout/stderr)
Users should be able to stop running processes
Users should be able to install packages and persist environment state within a session
Users should be able to share workspaces with others (view/edit permissions)
Sharing here means access control (view/edit) and last-write-wins. Real-time collaborative editing (OT/CRDT) is out of scope unless explicitly required.
Package installs persist within an active session. Assume user-level installs (pip/npm/etc.) into /workspace or /home; OS-level installs require prebuilt images or an allowlist. Cross-session environment persistence (persisting the full filesystem) is optional (paid tier) and discussed as a trade-off later.
| Requirement | Target | Rationale |
|---|---|---|
| Cold start latency | < 5 seconds | Users expect near-instant execution |
| Output latency | < 100ms | Real-time feel for terminal output |
| Availability | 99.9% | Critical for paid/enterprise users |
| Concurrent users | 100K simultaneous | Scale for popular platforms |
| Execution isolation | Strong | Security: users can't access each other's data |
In an interview, clarify: "Should we support long-running jobs (hours) or just interactive development sessions?" This significantly impacts VM lifecycle design. For this guide, we'll focus on interactive development with a 12-hour max session runtime.
Assumptions:
100K concurrent users, each with 1 active workspace session
Average session: 2 vCPU, 4GB RAM
Peak concurrent sessions: 100K
Compute resources:
100K sessions × 2 vCPU = 200K vCPUs needed at peak
100K sessions × 4GB = 400TB RAM at peak
At ~40GB usable RAM per node, this requires ~10,000 compute nodes
Terminal output streaming:
Not all sessions actively run processes simultaneously—assume 50% peak utilization
50K active processes × 1KB/second = 50MB/second of output data
This is manageable with a modest Kafka cluster (3-5 brokers)
The main cost driver is compute, not storage. VM utilization optimization (pre-warming, pooling) directly impacts infrastructure costs.
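The arithmetic above can be reproduced in a few lines. This is only a sketch: the constants are the assumptions listed in this section, the names are illustrative, and decimal units (1 TB = 1000 GB) are used as in the estimates.

```python
import math

# Assumptions copied from the estimates above.
CONCURRENT_SESSIONS = 100_000
VCPU_PER_SESSION = 2
RAM_GB_PER_SESSION = 4
USABLE_RAM_GB_PER_NODE = 40
ACTIVE_FRACTION = 0.5      # share of sessions running a process at peak
OUTPUT_KB_PER_SEC = 1      # terminal output per active process


def estimate_capacity() -> dict:
    """Return peak compute and streaming estimates for the stated assumptions."""
    total_vcpus = CONCURRENT_SESSIONS * VCPU_PER_SESSION
    total_ram_gb = CONCURRENT_SESSIONS * RAM_GB_PER_SESSION
    # Node count here is bounded by memory only.
    nodes = math.ceil(total_ram_gb / USABLE_RAM_GB_PER_NODE)
    active_processes = int(CONCURRENT_SESSIONS * ACTIVE_FRACTION)
    output_mb_per_sec = active_processes * OUTPUT_KB_PER_SEC / 1000
    return {
        "total_vcpus": total_vcpus,
        "total_ram_tb": total_ram_gb / 1000,
        "nodes": nodes,
        "active_processes": active_processes,
        "output_mb_per_sec": output_mb_per_sec,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value}")
```

Note that the node count is derived from RAM alone; spreading 200K vCPUs over 10,000 nodes implies about 20 vCPUs per node, so CPU could become the tighter constraint on smaller machines.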
Workspace
├── id: UUID
├── owner_id: UUID
├── name: string
├── template: string (e.g., "python", "node", "go")
├── created_at: timestamp
├── updated_at: timestamp
└── sharing_mode: enum (private, view, edit)
File
├── id: UUID
├── workspace_id: UUID (FK)
├── path: string (e.g., "/src/main.py")
├── content: text (for small files)
├── content_ref: string (optional, object storage pointer for large files)
├── is_directory: boolean
├── created_at: timestamp
└── updated_at: timestamp
Process
├── id: UUID
├── workspace_id: UUID (FK)
├── sandbox_id: UUID (FK)
├── command: string (e.g., "python main.py", "npm run dev")
├── status: enum (pending, running, completed, failed, cancelled)
├── started_at: timestamp
├── finished_at: timestamp
└── exit_code: integer
Sandbox (VM/Container instance)
├── id: UUID
├── workspace_id: UUID (FK)
├── user_id: UUID
├── status: enum (provisioning, warm, assigned, running, idle, terminated)
├── instance_type: string (cpu-small, cpu-large, gpu)
├── ip_address: string
├── created_at: timestamp
├── last_activity_at: timestamp
└── expires_at: timestamp
User 1:N Workspace 1:N File
Workspace 1:1 Sandbox (active session)
Workspace 1:N Process
Sandbox 1:N Process
Keep Sandbox as a separate entity from Process. A sandbox persists across multiple command executions within a session, maintaining installed packages and filesystem state.
| Operation | Protocol | Reason |
|---|---|---|
| CRUD operations | REST | Standard request-response |
| Terminal streaming | WebSocket | Real-time bidirectional |
| File uploads | REST + multipart | Large payloads |
# Workspace management
POST /api/workspaces Create workspace
GET /api/workspaces/{id} Get workspace with file tree
PUT /api/workspaces/{id} Update workspace metadata
DELETE /api/workspaces/{id} Delete workspace
# File operations
GET /api/workspaces/{id}/files List files (tree structure)
GET /api/files/{id} Get file content
POST /api/workspaces/{id}/files Create file or directory
PUT /api/files/{id} Update file content
DELETE /api/files/{id} Delete file or directory
POST /api/files/{id}/move Move/rename file
# Process execution
POST /api/workspaces/{id}/run Run command (returns stream token + sandbox_id)
POST /api/processes/{id}/cancel Cancel running process
POST /api/processes/{id}/input Send stdin input
# Sandbox management
POST /api/workspaces/{id}/sandbox Request/connect sandbox for workspace
GET /api/sandboxes/{id}/status Get sandbox status
DELETE /api/sandboxes/{id} Terminate sandbox
Prefer WebSocket messages for interactive stdin/cancel to minimize latency. Keep REST input/cancel as a fallback for non-WS clients or automation.
Run response (202):
{
"process_id": "proc-123",
"sandbox_id": "sbx-456",
"stream_token": "signed-token"
}
The token is sandbox-scoped and short-lived; the server can return the same token for subsequent commands until it expires. If a sandbox already exists for the workspace, the API returns the existing sandbox_id and a refreshed token.
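One way to build a sandbox-scoped, short-lived token is an HMAC-signed payload. The sketch below is an assumption about how such a token could work, not a format prescribed by this design; `SECRET`, the claim names, and both function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-secret"  # hypothetical; a real deployment would load this from a secret store


def issue_stream_token(sandbox_id: str, user_id: str,
                       ttl_seconds: int = 300, now: Optional[float] = None) -> str:
    """Sign a payload that binds the token to one sandbox and an expiry time."""
    issued_at = time.time() if now is None else now
    payload = json.dumps(
        {"sbx": sandbox_id, "uid": user_id, "exp": issued_at + ttl_seconds},
        sort_keys=True,
    ).encode()
    signature = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode()
            + "." + base64.urlsafe_b64encode(signature).decode())


def verify_stream_token(token: str, sandbox_id: str, now: Optional[float] = None) -> bool:
    """Accept the token only if the signature, sandbox scope, and expiry all check out."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return False
    claims = json.loads(payload)
    current = time.time() if now is None else now
    return claims["sbx"] == sandbox_id and claims["exp"] > current
```

The WebSocket server would call `verify_stream_token` with the `sandbox_id` from the connection URL, and still check workspace ownership separately, since the token only proves it was issued for that sandbox.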
The client uses a short-lived stream_token from the run response; the server validates the token and sandbox ownership.
# Client connects to stream terminal output
WSS /api/stream/{sandbox_id}?token=stream_token
# Server → Client messages
{
"type": "output",
"process_id": "proc-123",
"stream": "stdout" | "stderr",
"data": "Hello, world!\n",
"timestamp": 1699999999999
}
{
"type": "status",
"process_id": "proc-123",
"status": "completed",
"exit_code": 0
}
# Client → Server messages
{
"type": "resume",
"process_id": "proc-123",
"last_seen_id": "1699999999999-0"
}
{
"type": "input",
"process_id": "proc-123",
"data": "user input\n"
}
{
"type": "cancel",
"process_id": "proc-123"
}
The WebSocket connection is per-sandbox, not per-process. This allows streaming output from multiple concurrent processes (e.g., a dev server and a build command) while maintaining a single connection, reducing overhead.
Unlike notebook-style systems where executions are serialized, a Cloud IDE typically allows multiple concurrent processes (e.g., running a server while executing tests). The sandbox manages process isolation internally.
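Because one connection carries output for several processes, the receiver has to demultiplex by `process_id`. A minimal sketch of that routing, using the message shapes shown above (the `TerminalDemux` class is illustrative, not part of the design):

```python
import json
from collections import defaultdict


class TerminalDemux:
    """Routes messages from a single sandbox connection into per-process buffers."""

    def __init__(self):
        # process_id -> {"stdout": [...], "stderr": [...]}
        self.output = defaultdict(lambda: {"stdout": [], "stderr": []})
        # process_id -> exit code, filled in when a status message arrives
        self.exit_codes = {}

    def handle(self, raw: str) -> None:
        msg = json.loads(raw)
        process_id = msg["process_id"]
        if msg["type"] == "output":
            self.output[process_id][msg["stream"]].append(msg["data"])
        elif msg["type"] == "status":
            self.exit_codes[process_id] = msg.get("exit_code")


if __name__ == "__main__":
    demux = TerminalDemux()
    demux.handle(json.dumps({"type": "output", "process_id": "proc-1",
                             "stream": "stdout", "data": "server listening\n"}))
    demux.handle(json.dumps({"type": "output", "process_id": "proc-2",
                             "stream": "stderr", "data": "1 test failed\n"}))
    demux.handle(json.dumps({"type": "status", "process_id": "proc-2",
                             "status": "completed", "exit_code": 1}))
    print(dict(demux.output), demux.exit_codes)
```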
[Architecture diagram: the web browser (Clients) reaches the Edge Layer (Load Balancer, CDN for static assets) over HTTPS and WSS. The Application Layer holds the API Servers and WebSocket Servers. Sandbox Orchestration holds the Sandbox Manager, Warm Pool Controller, and Kubernetes Cluster. Terminal Output Streaming is a Kafka / log bus. The Storage Layer holds PostgreSQL (metadata), Redis (session state), and Object Storage (workspaces/files). Sandbox Compute runs Sandbox Pods 1..N.]
API Servers
Handle REST requests for workspaces, files, processes
Authenticate users, authorize actions
Persist metadata to PostgreSQL, files to S3
WebSocket Servers
Maintain persistent connections with clients
Subscribe to shared Kafka topics (keyed by sandbox_id)
Fan out terminal output to connected clients
Append recent output to Redis streams for reconnect replay
Handle stdin input forwarding
Use consistent hashing/partition affinity so the server holding the client connection consumes that sandbox's partition (or add a routing layer for fan-out)
Sandbox Manager
Orchestrate sandbox lifecycle (create, monitor, terminate)
Route process execution requests to appropriate sandbox
Track sandbox health and resource usage
Handle sandbox assignment for workspaces
Warm Pool Controller
Maintain a pool of pre-provisioned sandboxes
Scale pool size based on demand prediction
Handle different instance types (CPU, GPU)
Kubernetes Cluster
Run sandbox containers/pods
Provide network isolation between sandboxes
Enforce resource limits (CPU, memory, disk)
Let's walk through what happens when a user clicks "Run" or executes a terminal command:
[Sequence diagram. Participants: User Browser, API Server, Sandbox Manager, Redis, Kubernetes, Sandbox Pod, Kafka, WebSocket Server.]
1. POST /workspaces/{id}/run {command}
2. Request process execution
3. Check workspace's active sandbox
4. If there is no active sandbox: create sandbox pod; pod ready (IP, port); store sandbox mapping
5. Execute command via gRPC
6. Process started
7. 202 Accepted {process_id, sandbox_id, stream_token}
8. Subscribe to process output (sandbox_id + stream_token)
9. Get sandbox routing metadata
10. Subscribe to shared topic (keyed by sandbox_id)
11. Output streaming loop: publish stdout/stderr; consume messages; forward via WebSocket
12. Publish completion status
13. Consume completion
14. Send status: completed
Each sandbox runs as an isolated Kubernetes pod:
# Sandbox Pod Specification
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-{workspace_id}
  labels:
    type: sandbox
    workspace: {workspace_id}
spec:
  containers:
    - name: runtime
      image: sandbox-python:3.11
      resources:
        requests:
          cpu: "1"
          memory: "2Gi"
        limits:
          cpu: "2"
          memory: "4Gi"
      securityContext:
        runAsNonRoot: true
        readOnlyRootFilesystem: true
        capabilities:
          drop: ["ALL"]
      volumeMounts:
        - name: workspace
          mountPath: /workspace
        - name: home
          mountPath: /home/sandbox
        - name: tmp
          mountPath: /tmp
    - name: agent
      image: sandbox-agent:latest
      # Handles execution requests, streams output
      ports:
        - containerPort: 50051  # gRPC
  # Network policy: no internet by default
  # Egress allowed only to package registries
  volumes:
    - name: workspace
      emptyDir: {}
    - name: home
      emptyDir: {}
    - name: tmp
      emptyDir: {}
With a read-only root filesystem, package installs write to mounted volumes like /workspace, /home/sandbox, and /tmp. Use virtualenvs or language-specific paths under those mounts.
Two-container pattern:
Runtime container: Runs user processes (Python/Node/Go/etc.), provides terminal shell access
Agent container: Manages process lifecycle, captures terminal output, syncs files, communicates with control plane
Security is critical. User code runs in untrusted sandboxes. Use: (1) Container isolation with dropped capabilities, (2) Network policies blocking unauthorized egress, (3) Resource limits preventing DoS, (4) Read-only filesystem where possible, (5) Non-root user execution.
The output streaming pipeline is the heart of real-time terminal feel:
[Pipeline diagram: in the Sandbox Pod, the User Process writes stdout/stderr to the Agent Process, which batches and compresses into a Buffer and publishes to the Message Bus (Kafka, keyed partitions). In the WebSocket Layer, the WS Server subscribes and fans out to Client 1 and Client 2.]
Avoid one Kafka topic per sandbox. Use a small number of shared topics with partitions keyed by sandbox_id to preserve ordering without exploding metadata.
At scale, shard sandbox_id to a WebSocket server so only the owning shard consumes that partition; otherwise every WS server would need to consume everything and filter locally.
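The sharding idea can be sketched as two small functions: one maps a sandbox to a partition, the other maps a partition to the WebSocket server that owns it. This is an illustration under assumptions: the hash below is a stand-in (a real system would have to reuse whatever partitioner the Kafka producer applies to the key), the modulo assignment is simpler than true consistent hashing, and the function and server names are hypothetical.

```python
import hashlib
from typing import List


def partition_for(sandbox_id: str, num_partitions: int) -> int:
    """Stable partition for a sandbox; must match the producer's partitioner in practice."""
    digest = hashlib.sha256(sandbox_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def owner_for(partition: int, ws_servers: List[str]) -> str:
    """Static partition-to-server assignment (plain modulo, not consistent hashing)."""
    return ws_servers[partition % len(ws_servers)]


if __name__ == "__main__":
    servers = ["ws-0", "ws-1", "ws-2", "ws-3"]
    partition = partition_for("sbx-456", num_partitions=64)
    print(partition, owner_for(partition, servers))
```

A routing layer (or the load balancer) would use the same mapping to send a client's WebSocket connection to the server that consumes its sandbox's partition. With plain modulo, changing the server count reassigns most partitions, which is the cost that consistent hashing is meant to reduce.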
Agent output handling:
# Sketch: agent captures and streams terminal output
import time

import msgpack  # third-party: pip install msgpack

FLUSH_INTERVAL_S = 0.05  # 50ms
FLUSH_BYTES = 4096       # 4KB


class OutputStreamer:
    def __init__(self, kafka_producer, sandbox_id):
        self.producer = kafka_producer
        self.sandbox_id = sandbox_id
        self.topic = "sandbox-output"
        self.key = sandbox_id.encode()
        self.buffer = []
        self.last_flush = time.monotonic()

    def capture(self, process_id: str, stream: str, data: bytes):
        self.buffer.append({
            "process_id": process_id,
            "stream": stream,
            "data": data,
            "ts": int(time.time() * 1000),
        })
        # Flush every 50ms or 4KB, whichever comes first.
        # The time check only runs when new output arrives, so a periodic
        # timer should also call _flush() to push out trailing output.
        if self._should_flush():
            self._flush()

    def _should_flush(self):
        buffer_size = sum(len(m["data"]) for m in self.buffer)
        time_elapsed = time.monotonic() - self.last_flush
        return buffer_size >= FLUSH_BYTES or time_elapsed >= FLUSH_INTERVAL_S

    def _flush(self):
        if not self.buffer:
            return
        # Batch messages for efficiency
        self.producer.send(
            self.topic,
            key=self.key,
            value=msgpack.packb({
                "sandbox_id": self.sandbox_id,
                "events": self.buffer,
            }),
        )
        self.buffer = []
        self.last_flush = time.monotonic()
Why batch output?
Individual characters would create millions of messages
50ms batching provides perceived real-time feel
Reduces Kafka throughput and WebSocket message overhead
Interview insight: Mention the trade-off between latency and throughput. Smaller batches mean lower latency but higher per-message overhead. 50ms is a reasonable sweet spot: delays under roughly 100ms are generally not perceived as lag.
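To make the trade-off concrete, the following sketch computes an upper bound on message rate for a given flush window. It assumes every active process has output to flush in every window, which overstates real traffic; the function name is illustrative and the 50K figure comes from the earlier estimate.

```python
def max_messages_per_second(active_processes: int, flush_interval_ms: int) -> int:
    """Upper bound: each process sends one batch per flush window."""
    flushes_per_second = 1000 // flush_interval_ms  # whole flushes per second
    return active_processes * flushes_per_second


if __name__ == "__main__":
    for window_ms in (10, 50, 100):
        rate = max_messages_per_second(50_000, window_ms)
        print(f"{window_ms}ms window -> up to {rate:,} messages/s")
```

Under these assumptions, moving from a 10ms to a 50ms window cuts the worst-case message rate by a factor of five while staying inside the 100ms output latency target.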
A common interview follow-up: "What happens if the user disconnects mid-process and reconnects?"
Solution: short-term Redis replay buffer + Kafka for live streaming
Output Replay Strategy:
┌─────────────────────────────────────────────────────────┐
│ 1. WS server appends output to Redis stream per process │
│ 2. Redis stream has TTL (e.g., 1 hour) and size cap │
│ 3. Client sends process_id + last_seen_id on reconnect │
│ 4. Server replays from Redis, then resumes live stream │
└─────────────────────────────────────────────────────────┘
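# WebSocket reconnection handler
# Assumes `redis` is a redis.asyncio.Redis client created with decode_responses=True
import json


async def handle_reconnect(ws, process_id, last_seen_id):
    stream_key = f"proc:{process_id}:output"
    if last_seen_id:
        # Resume after the last entry the client saw. The "(" prefix makes the
        # lower bound exclusive (Redis 6.2+), so that entry is not sent twice.
        events = await redis.xrange(stream_key, min=f"({last_seen_id}", max="+")
    else:
        # New connection: start from earliest available
        events = await redis.xrange(stream_key, min="-", max="+")
    # Replay buffered messages, then switch to live streaming
    for entry_id, fields in events:
        # Include the stream ID so the client can report it as last_seen_id
        await ws.send(json.dumps({"id": entry_id, **fields}))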
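A Cloud IDE typically allows multiple concurrent processes. Accept a map of process_id -> last_seen_id on reconnect and replay each stream independently. For the client to report a last_seen_id, each output message it receives needs to carry the Redis stream ID of the entry it came from.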
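A sketch of per-process replay with a cursor map follows. To keep it self-contained, plain Python lists stand in for the Redis streams, and stream IDs are compared as (milliseconds, sequence) pairs in the way Redis orders them; the `replay` and `parse_id` names are hypothetical.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, str]  # (stream_id, data)


def parse_id(stream_id: str) -> Tuple[int, int]:
    """Split a Redis-style ID such as '1699999999999-0' into comparable integers."""
    ms, seq = stream_id.split("-")
    return int(ms), int(seq)


def replay(streams: Dict[str, List[Entry]],
           cursors: Dict[str, str]) -> Dict[str, List[Entry]]:
    """For each process in the cursor map, return the entries after its last_seen_id."""
    result: Dict[str, List[Entry]] = {}
    for process_id, last_seen_id in cursors.items():
        entries = streams.get(process_id, [])
        if last_seen_id:
            after = parse_id(last_seen_id)
            entries = [e for e in entries if parse_id(e[0]) > after]
        # An empty cursor means "replay everything still buffered".
        result[process_id] = list(entries)
    return result


if __name__ == "__main__":
    buffered = {
        "proc-1": [("1000-0", "a"), ("1000-1", "b"), ("1050-0", "c")],
        "proc-2": [("1010-0", "x")],
    }
    print(replay(buffered, {"proc-1": "1000-0", "proc-2": ""}))
```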
For terminal history: Unlike notebook-style systems, Cloud IDEs typically don't persist terminal output long-term. Terminal history is kept in Redis for reconnection (TTL ~1 hour). If users need persistent logs (e.g., for build output), store them in object storage with a signed URL for later access.
Don't over-engineer output persistence. Users expect real-time streaming for running processes. Terminal history is ephemeral by nature—focus on the live experience.
Cold-starting a sandbox pod (scheduling, image pull, container start) can take 10-30 seconds, while users expect under 5 seconds. Solution: pre-warmed pools.
Pool Configuration:
┌─────────────────┬─────────┬───────────┬──────────┐
│ Instance Type │ Min │ Target │ Max │
├─────────────────┼─────────┼───────────┼──────────┤
│ python-cpu-sm │ 100 │ 500 │ 2000 │
│ python-cpu-lg │ 50 │ 200 │ 1000 │
│ python-gpu │ 10 │ 50 │ 200 │
│ node-cpu-sm │ 50 │ 200 │ 1000 │
└─────────────────┴─────────┴───────────┴──────────┘
Pool Controller Logic:
class WarmPoolController:
    BUFFER_MINUTES = 10

    def reconcile(self):
        for pool_type in self.pool_types:
            current = self.count_warm_sandboxes(pool_type)
            target = self.calculate_target(pool_type)
            if current < target:
                # Scale up: provision more sandboxes
                to_create = min(target - current, self.max_batch_size)
                self.provision_sandboxes(pool_type, to_create)
            elif current > target * 1.5:
                # Scale down: terminate excess (with buffer)
                to_terminate = current - int(target * 1.2)
                self.terminate_oldest(pool_type, to_terminate)

    def calculate_target(self, pool_type):
        # Predict demand based on:
        # 1. Current active sandboxes
        # 2. Historical patterns (time of day, day of week)
        # 3. Recent allocation rate
        current_active = self.count_active(pool_type)
        # Allocation rate is assumed to be sandboxes per minute over the window
        allocation_rate = self.get_allocation_rate(pool_type, window_minutes=5)
        # Target = active + (allocation_rate × buffer_minutes)
        return current_active + int(allocation_rate * self.BUFFER_MINUTES)
Warm pools are a significant cost: pre-provisioned sandboxes consume resources even when idle. Balance fast allocation against cost efficiency based on usage patterns.
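The pool configuration table lists Min and Max per instance type, but the controller logic does not say how they are applied. One plausible reading, shown here as an assumption rather than the stated design, is to clamp the predicted target into that range; `PoolLimits` and `clamp_target` are hypothetical names, and the numbers are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolLimits:
    minimum: int
    maximum: int


# Min and Max columns from the pool configuration table.
POOL_LIMITS = {
    "python-cpu-sm": PoolLimits(100, 2000),
    "python-cpu-lg": PoolLimits(50, 1000),
    "python-gpu": PoolLimits(10, 200),
    "node-cpu-sm": PoolLimits(50, 1000),
}


def clamp_target(pool_type: str, predicted: int) -> int:
    """Keep the predicted warm-pool size within the configured bounds."""
    limits = POOL_LIMITS[pool_type]
    return max(limits.minimum, min(predicted, limits.maximum))


if __name__ == "__main__":
    print(clamp_target("python-gpu", 3))     # below Min, raised to the floor
    print(clamp_target("python-gpu", 500))   # above Max, capped
    print(clamp_target("node-cpu-sm", 120))  # within range, unchanged
```

The floor keeps a few sandboxes ready during quiet periods, and the cap bounds idle spend when the demand prediction overshoots, which matters most for GPU pools.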
Cold Start Latency (< 5s)
| Strategy | Latency Reduction | Trade-off |
|---|---|---|
| Warm pools | 10-30s → 1-2s | Higher idle cost |
| Container image optimization | 5-10s saved | Limited customization |
| Lazy package loading | 2-5s saved | First import slower |
| Snapshot/restore (Firecracker) | Near-instant | Complexity |
Output Latency (< 100ms)
Kafka partitioning by sandbox_id ensures ordering
WebSocket servers co-located with Kafka brokers
Client-side buffering for smooth rendering
Execution Isolation
Kubernetes namespaces per organization/tenant (not per user at 100K+ scale)
Network policies: sandboxes can't communicate with each other
Seccomp profiles restricting dangerous syscalls
Resource quotas preventing noisy neighbor issues
Problem: All process requests route through Sandbox Manager.
Solution:
Stateless Sandbox Manager instances behind load balancer
Sandbox state stored in Redis (not in-memory)
Leader election for pool management tasks only
Problem: 50K active processes × 1KB/s = 50MB/s sustained throughput (at peak, could spike to 100MB/s).
Solution:
Partition by sandbox_id for parallelism
3-5 broker Kafka cluster handles this comfortably
Consider alternatives: Redis Streams for simpler cases, Pulsar for higher scale
Problem: Each connection holds buffer state. 10K connections × 10KB = 100MB per server (100K total connections across ~10 servers).
Solution:
Horizontal scaling with sticky sessions
Connection limits per server (10K connections)
Offload buffering to Redis for reconnection support
[State diagram. States: Provisioning, Warm, Assigned, Running, Idle, Terminated. Transition triggers, in diagram order: Create request, Container ready, Workspace allocation, First process, All processes complete, New process, Timeout (30min), Explicit workspace stop, Max lifetime (12h), Pool scale-down.]
State transitions:
from datetime import datetime, timezone


class SandboxStateMachine:
    TRANSITIONS = {
        "provisioning": ["warm", "terminated"],
        "warm": ["assigned", "terminated"],
        "assigned": ["running", "terminated"],
        "running": ["idle", "terminated"],
        "idle": ["running", "terminated"],
    }

    def handle_idle_timeout(self, sandbox):
        """Called when sandbox has been idle too long"""
        if sandbox.status != "idle":
            return
        # Assumes last_activity_at is a timezone-aware datetime
        idle_duration = datetime.now(timezone.utc) - sandbox.last_activity_at
        # Free tier: 5 min idle timeout
        # Paid tier: 30 min idle timeout
        # (get_timeout_for_tier is assumed to return a timedelta)
        timeout = self.get_timeout_for_tier(sandbox.user_id)
        if idle_duration > timeout:
            # Save workspace files to S3
            self.snapshot_workspace(sandbox)
            # Terminate to free resources
            self.terminate(sandbox)
Cost optimization insight: Aggressive idle timeouts save money but hurt UX. Differentiate by user tier—free users get shorter timeouts, paid users get longer sessions.
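The transition table is only useful if something enforces it. A minimal sketch of that check, using the same states as the sandbox status enum (the `transition` function and `InvalidTransition` exception are illustrative, and the empty `terminated` entry is added here for completeness):

```python
TRANSITIONS = {
    "provisioning": {"warm", "terminated"},
    "warm": {"assigned", "terminated"},
    "assigned": {"running", "terminated"},
    "running": {"idle", "terminated"},
    "idle": {"running", "terminated"},
    "terminated": set(),  # terminal state: nothing may follow it
}


class InvalidTransition(Exception):
    """Raised when a status change is not allowed by the table."""


def transition(current: str, target: str) -> str:
    """Return the new status, or raise if the move is not allowed."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target}")
    return target


if __name__ == "__main__":
    status = "provisioning"
    for nxt in ("warm", "assigned", "running", "idle", "running", "terminated"):
        status = transition(status, nxt)
        print(status)
```

Running every status update through one such function makes it harder for concurrent Sandbox Manager instances to, for example, hand out a sandbox that has already been terminated, provided the update is also applied atomically in the backing store.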
Firecracker MicroVMs (what AWS Lambda uses)
Pros:
Sub-second cold starts (150ms possible)
Stronger isolation than containers
Snapshot/restore for instant warm starts
Cons:
More operational complexity
Less tooling than Kubernetes
Requires custom orchestration
When to choose: High-security requirements, need for instant cold starts, Lambda-like execution model.
gVisor/Kata Containers
Pros:
Better isolation than standard containers
Works with Kubernetes
Lower overhead than full VMs
Cons:
Some syscall compatibility issues
Performance overhead (10-20%)
When to choose: Need stronger isolation without leaving Kubernetes ecosystem.
| Approach | Pros | Cons |
|---|---|---|
| Ephemeral | Simple, cheap | Files lost on timeout |
| Persistent workspace (Replit-like) | Better UX, feels like local dev | Higher storage cost |
| Hybrid | Flexible | Complex to implement |
In an interview, recommend the hybrid approach:
Source files: Always persisted to S3 (synced on save)
Environment (packages, dependencies): Persist within session; optional snapshots for paid tier (size-capped)
Runtime state (running processes, variables): Lost on timeout, user restarts as needed
Clarified interactive development vs. batch job execution model
Defined cold start latency target
Discussed isolation/security requirements
Established scale (concurrent users, workspaces)
Explained sandbox container architecture (two-container pattern)
Designed terminal output streaming pipeline (agent → Kafka → WebSocket)
Covered warm pool strategy for fast workspace startup
Showed sandbox state machine (provisioning → warm → assigned → running → idle → terminated)
Addressed reconnection/output replay (common follow-up question)
Addressed cold start with warm pools
Discussed WebSocket horizontal scaling
Mentioned security isolation (network policies, capabilities)
Covered cost optimization (idle timeouts, tiered pools)
| Aspect | Decision | Rationale |
|---|---|---|
| Sandbox runtime | Kubernetes pods | Mature orchestration, good isolation |
| Terminal streaming | Kafka + WebSocket | Decouples producers/consumers, handles backpressure |
| Cold start mitigation | Warm pools | Predictable latency without Firecracker complexity |
| Output batching | 50ms windows | Balance latency and throughput |
| File persistence | Hybrid (files persisted, runtime ephemeral) | Cost-effective, acceptable UX |
| Isolation | Network policies + resource limits + seccomp | Defense in depth |
The key insight for this design is that real-time feel comes from the streaming pipeline, not the compute layer. Users tolerate 2-3 second sandbox startup, but terminal output must stream within 100ms. Design your output pipeline carefully—it's the difference between "feels instant" and "feels broken."