Design a Cloud IDE

System Design | Onsite, Phone | Software Engineer | Reported Apr 2026 | Medium Frequency

Design a cloud-based IDE similar to Replit or GitHub Codespaces. Users can write code, manage files, run terminal commands, and see real-time output—all in the browser without local setup. This problem tests your ability to design systems with resource management, real-time streaming, and multi-tenancy isolation. The core challenges are VM/container lifecycle management and efficient terminal output streaming at scale.

This walkthrough follows the Interview Framework. Use it as a guide, not a script—adapt based on interviewer cues.

Phase 1: Requirements

Functional Requirements

Users should be able to create workspaces with a file tree (create, edit, delete files and folders)

Users should be able to run code and terminal commands with real-time output streaming (stdout/stderr)

Users should be able to stop running processes

Users should be able to install packages and persist environment state within a session

Users should be able to share workspaces with others (view/edit permissions)

Sharing here means access control (view/edit) and last-write-wins. Real-time collaborative editing (OT/CRDT) is out of scope unless explicitly required.

Package installs persist within an active session. Assume user-level installs (pip/npm/etc.) into /workspace or /home; OS-level installs require prebuilt images or an allowlist. Cross-session environment persistence (persisting the full filesystem) is optional (paid tier) and discussed as a trade-off later.

Non-Functional Requirements

Requirement	Target	Rationale
Cold start latency	< 5 seconds	Users expect near-instant execution
Output latency	< 100ms	Real-time feel for terminal output
Availability	99.9%	Critical for paid/enterprise users
Concurrent users	100K simultaneous	Scale for popular platforms
Execution isolation	Strong	Security: users can't access each other's data

In an interview, clarify: "Should we support long-running jobs (hours) or just interactive development sessions?" This significantly impacts VM lifecycle design. For this guide, we'll focus on interactive development with a 12-hour max session runtime.

Capacity Estimation

Assumptions:

100K concurrent users, each with 1 active workspace session

Average session: 2 vCPU, 4GB RAM

Peak concurrent sessions: 100K

Compute resources:

100K sessions × 2 vCPU = 200K vCPUs needed at peak

100K sessions × 4GB = 400TB RAM at peak

At ~40GB usable RAM per node, this requires ~10,000 compute nodes (each node then needs about 20 vCPUs to cover the CPU demand)

Terminal output streaming:

Not all sessions actively run processes simultaneously—assume 50% peak utilization

50K active processes × 1KB/second = 50MB/second of output data

This is manageable with a modest Kafka cluster (3-5 brokers)

The main cost driver is compute, not storage. VM utilization optimization (pre-warming, pooling) directly impacts infrastructure costs.
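The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch; the constant and function names are illustrative, and decimal units (1TB = 1000GB, 1MB = 1000KB) are assumed to match the figures in the text.

```python
import math

# Inputs taken from the assumptions listed above.
CONCURRENT_SESSIONS = 100_000
VCPU_PER_SESSION = 2
RAM_GB_PER_SESSION = 4
USABLE_RAM_GB_PER_NODE = 40
ACTIVE_FRACTION = 0.5        # share of sessions running a process at peak
OUTPUT_KB_PER_SEC = 1        # average terminal output per active process


def estimate_capacity(sessions: int = CONCURRENT_SESSIONS) -> dict:
    """Back-of-the-envelope sizing for compute and output streaming."""
    total_vcpus = sessions * VCPU_PER_SESSION
    total_ram_gb = sessions * RAM_GB_PER_SESSION
    # RAM is the binding constraint in this estimate, so size nodes by RAM.
    nodes = math.ceil(total_ram_gb / USABLE_RAM_GB_PER_NODE)
    active = int(sessions * ACTIVE_FRACTION)
    return {
        "total_vcpus": total_vcpus,
        "total_ram_tb": total_ram_gb / 1000,
        "nodes": nodes,
        "vcpus_per_node": total_vcpus / nodes,
        "active_processes": active,
        "output_mb_per_sec": active * OUTPUT_KB_PER_SEC / 1000,
    }


if __name__ == "__main__":
    for name, value in estimate_capacity().items():
        print(f"{name}: {value}")
```

Changing a single input (for example, 8GB per session) shows quickly how sensitive the node count is to the per-session RAM assumption.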

Phase 2: Data Model

Core Entities

Workspace

├── id: UUID
├── owner_id: UUID
├── name: string
├── template: string (e.g., "python", "node", "go")
├── created_at: timestamp
├── updated_at: timestamp
└── sharing_mode: enum (private, view, edit)

File

├── id: UUID
├── workspace_id: UUID (FK)
├── path: string (e.g., "/src/main.py")
├── content: text (for small files)
├── content_ref: string (optional, object storage pointer for large files)
├── is_directory: boolean
├── created_at: timestamp
└── updated_at: timestamp

Process

├── id: UUID
├── workspace_id: UUID (FK)
├── sandbox_id: UUID (FK)
├── command: string (e.g., "python main.py", "npm run dev")
├── status: enum (pending, running, completed, failed, cancelled)
├── started_at: timestamp
├── finished_at: timestamp
└── exit_code: integer

Sandbox (VM/Container instance)

├── id: UUID
├── workspace_id: UUID (FK)
├── user_id: UUID
├── status: enum (provisioning, warm, assigned, running, idle, terminated)
├── instance_type: string (cpu-small, cpu-large, gpu)
├── ip_address: string
├── created_at: timestamp
├── last_activity_at: timestamp
└── expires_at: timestamp

Entity Relationships

User 1:N Workspace 1:N File

Workspace 1:1 Sandbox (active session)

Workspace 1:N Process

Sandbox 1:N Process

Keep Sandbox as a separate entity from Process. A sandbox persists across multiple command executions within a session, maintaining installed packages and filesystem state.
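To make the Sandbox/Process split concrete, here is a minimal in-memory sketch. Field names follow the entity listing above; the `start_process` helper is hypothetical and only illustrates that several processes share one long-lived sandbox.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional
from uuid import UUID, uuid4


class ProcessStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class Process:
    workspace_id: UUID
    sandbox_id: UUID
    command: str
    status: ProcessStatus = ProcessStatus.PENDING
    id: UUID = field(default_factory=uuid4)
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    exit_code: Optional[int] = None


@dataclass
class Sandbox:
    workspace_id: UUID
    user_id: UUID
    id: UUID = field(default_factory=uuid4)
    status: str = "assigned"
    processes: List[Process] = field(default_factory=list)

    def start_process(self, command: str) -> Process:
        """Hypothetical helper: record a new process running in this sandbox."""
        proc = Process(
            workspace_id=self.workspace_id,
            sandbox_id=self.id,
            command=command,
            status=ProcessStatus.RUNNING,
            started_at=datetime.now(timezone.utc),
        )
        self.processes.append(proc)
        self.status = "running"
        return proc
```

Running `pip install requests` and then `python main.py` through `start_process` yields two Process rows that point at the same `sandbox_id`, which is why the installed package is still present for the second command.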

Phase 3: API Design

Protocol Choices

Operation	Protocol	Reason
CRUD operations	REST	Standard request-response
Terminal streaming	WebSocket	Real-time bidirectional
File uploads	REST + multipart	Large payloads

REST Endpoints

# Workspace management
POST   /api/workspaces                   Create workspace
GET    /api/workspaces/{id}              Get workspace with file tree
PUT    /api/workspaces/{id}              Update workspace metadata
DELETE /api/workspaces/{id}              Delete workspace

# File operations
GET    /api/workspaces/{id}/files        List files (tree structure)
GET    /api/files/{id}                   Get file content
POST   /api/workspaces/{id}/files        Create file or directory
PUT    /api/files/{id}                   Update file content
DELETE /api/files/{id}                   Delete file or directory
POST   /api/files/{id}/move              Move/rename file

# Process execution
POST   /api/workspaces/{id}/run          Run command (returns stream token + sandbox_id)
POST   /api/processes/{id}/cancel        Cancel running process
POST   /api/processes/{id}/input         Send stdin input

# Sandbox management
POST   /api/workspaces/{id}/sandbox      Request/connect sandbox for workspace
GET    /api/sandboxes/{id}/status        Get sandbox status
DELETE /api/sandboxes/{id}               Terminate sandbox

Prefer WebSocket messages for interactive stdin/cancel to minimize latency. Keep REST input/cancel as a fallback for non-WS clients or automation.

Run response (202):

{
  "process_id": "proc-123",
  "sandbox_id": "sbx-456",
  "stream_token": "signed-token"
}

The token is sandbox-scoped and short-lived; the server can return the same token for subsequent commands until it expires. If a sandbox already exists for the workspace, the API returns the existing sandbox_id and a refreshed token.
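One way to build a sandbox-scoped, short-lived token is an HMAC-signed payload. The sketch below is an assumption, not the format any particular product uses: the claim names (`sbx`, `uid`, `exp`), the 300-second TTL, and the hard-coded `SECRET` are all illustrative, and a real deployment would load the key from a secret store or use an established format such as JWT.

```python
import base64
import binascii
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"demo-only-secret"  # assumption: replace with a managed signing key


def _b64(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _unb64(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_stream_token(sandbox_id: str, user_id: str,
                       ttl_seconds: int = 300,
                       now: Optional[float] = None) -> str:
    """Sign {sandbox, user, expiry} so the WebSocket tier can verify it offline."""
    issued = time.time() if now is None else now
    claims = {"sbx": sandbox_id, "uid": user_id, "exp": int(issued) + ttl_seconds}
    payload = json.dumps(claims, separators=(",", ":")).encode()
    sig = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return f"{_b64(payload)}.{_b64(sig)}"


def verify_stream_token(token: str, sandbox_id: str,
                        now: Optional[float] = None) -> Optional[dict]:
    """Return the claims if the token is authentic, unexpired, and for this sandbox."""
    current = time.time() if now is None else now
    try:
        payload_b64, sig_b64 = token.split(".")
        payload, sig = _unb64(payload_b64), _unb64(sig_b64)
    except (ValueError, binascii.Error):
        return None
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):  # constant-time comparison
        return None
    claims = json.loads(payload)
    if claims["sbx"] != sandbox_id or claims["exp"] <= current:
        return None
    return claims
```

Because verification needs only the shared key, the WebSocket server can reject a bad or expired token without a database round trip; ownership checks against workspace sharing rules would still happen separately.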

WebSocket Protocol

The client uses a short-lived stream_token from the run response; the server validates the token and sandbox ownership.

# Client connects to stream terminal output
WSS /api/stream/{sandbox_id}?token=stream_token

# Server → Client messages
{
  "type": "output",
  "process_id": "proc-123",
  "stream": "stdout" | "stderr",
  "data": "Hello, world!\n",
  "timestamp": 1699999999999
}

{
  "type": "status",
  "process_id": "proc-123",
  "status": "completed",
  "exit_code": 0
}

# Client → Server messages
{
  "type": "resume",
  "process_id": "proc-123",
  "last_seen_id": "1699999999999-0"
}

{
  "type": "input",
  "process_id": "proc-123",
  "data": "user input\n"
}

{
  "type": "cancel",
  "process_id": "proc-123"
}

The WebSocket connection is per-sandbox, not per-process. This allows streaming output from multiple concurrent processes (e.g., a dev server and a build command) while maintaining a single connection, reducing overhead.

Unlike notebook-style systems where executions are serialized, a Cloud IDE typically allows multiple concurrent processes (e.g., running a server while executing tests). The sandbox manages process isolation internally.
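Because one connection carries messages for several processes, the server has to validate and route each client frame by `type` and `process_id`. A minimal parsing sketch follows; the dataclass names and the choice to raise `ValueError` on bad input are assumptions made for illustration.

```python
import json
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resume:
    process_id: str
    last_seen_id: str


@dataclass(frozen=True)
class Input:
    process_id: str
    data: str


@dataclass(frozen=True)
class Cancel:
    process_id: str


ClientMessage = Union[Resume, Input, Cancel]

# Message type -> (class, required string fields), mirroring the protocol above.
_SPECS = {
    "resume": (Resume, ("process_id", "last_seen_id")),
    "input": (Input, ("process_id", "data")),
    "cancel": (Cancel, ("process_id",)),
}


def parse_client_message(raw: str) -> ClientMessage:
    """Validate one client -> server frame and return a typed message."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("frame is not valid JSON") from exc
    if not isinstance(msg, dict):
        raise ValueError("frame must be a JSON object")
    kind = msg.get("type")
    if not isinstance(kind, str) or kind not in _SPECS:
        raise ValueError(f"unknown message type: {kind!r}")
    cls, fields = _SPECS[kind]
    missing = [f for f in fields if not isinstance(msg.get(f), str)]
    if missing:
        raise ValueError(f"{kind} message missing fields: {missing}")
    return cls(**{f: msg[f] for f in fields})
```

The returned object can then be dispatched to the handler for that process, for example forwarding `Input.data` to the agent or signalling the process named in `Cancel`.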

Phase 4: High-Level Design

Architecture Overview

Architecture diagram, by layer:

Clients: Web Browser (HTTPS for REST, WSS for terminal streams)

Edge Layer: Load Balancer, CDN (static assets)

Application Layer: API Servers, WebSocket Servers

Sandbox Orchestration: Sandbox Manager, Warm Pool Controller, Kubernetes Cluster

Terminal Output Streaming: Kafka / Log Bus

Storage Layer: PostgreSQL (metadata), Redis (session state), Object Storage (workspaces/files)

Sandbox Compute: Sandbox Pod 1, Sandbox Pod 2, ..., Sandbox Pod N

Component Responsibilities

API Servers

Handle REST requests for workspaces, files, processes

Authenticate users, authorize actions

Persist metadata to PostgreSQL, files to S3

WebSocket Servers

Maintain persistent connections with clients

Subscribe to shared Kafka topics (keyed by sandbox_id)

Fan out terminal output to connected clients

Append recent output to Redis streams for reconnect replay

Handle stdin input forwarding

Use consistent hashing/partition affinity so the server holding the client connection consumes that sandbox's partition (or add a routing layer for fan-out)

Sandbox Manager

Orchestrate sandbox lifecycle (create, monitor, terminate)

Route process execution requests to appropriate sandbox

Track sandbox health and resource usage

Handle sandbox assignment for workspaces

Warm Pool Controller

Maintain a pool of pre-provisioned sandboxes

Scale pool size based on demand prediction

Handle different instance types (CPU, GPU)

Kubernetes Cluster

Run sandbox containers/pods

Provide network isolation between sandboxes

Enforce resource limits (CPU, memory, disk)

Data Flow: Run Command

Let's walk through what happens when a user clicks "Run" or executes a terminal command:

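Sequence diagram. Participants: User Browser, API Server, Sandbox Manager, Redis, Kubernetes, Sandbox Pod, Kafka, WebSocket Server.

1. User Browser → API Server: POST /workspaces/{id}/run {command}

2. API Server → Sandbox Manager: Request process execution

3. Sandbox Manager → Redis: Check workspace's active sandbox

4. Alt [No active sandbox]: Sandbox Manager → Kubernetes: Create sandbox pod; Kubernetes → Sandbox Manager: Pod ready (IP, port); Sandbox Manager → Redis: Store sandbox mapping

5. Sandbox Manager → Sandbox Pod: Execute command via gRPC

6. Sandbox Pod → Sandbox Manager: Process started

7. API Server → User Browser: 202 Accepted {process_id, sandbox_id, stream_token}

8. User Browser → WebSocket Server: Subscribe to process output (sandbox_id + stream_token)

9. WebSocket Server → Redis: Get sandbox routing metadata

10. WebSocket Server → Kafka: Subscribe to shared topic (keyed by sandbox_id)

11. Loop [Output streaming]: Sandbox Pod → Kafka: Publish stdout/stderr; Kafka → WebSocket Server: Consume messages; WebSocket Server → User Browser: Forward via WebSocket

12. Sandbox Pod → Kafka: Publish completion status; Kafka → WebSocket Server: Consume completion; WebSocket Server → User Browser: Send status: completed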

Sandbox Container Architecture

Each sandbox runs as an isolated Kubernetes pod:

# Sandbox Pod Specification
apiVersion: v1
kind: Pod
metadata:
  name: sandbox-{workspace_id}
  labels:
    type: sandbox
    workspace: {workspace_id}
spec:
  containers:
  - name: runtime
    image: sandbox-python:3.11
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        cpu: "2"
        memory: "4Gi"
    securityContext:
      runAsNonRoot: true
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: workspace
      mountPath: /workspace
    - name: home
      mountPath: /home/sandbox
    - name: tmp
      mountPath: /tmp

  - name: agent
    image: sandbox-agent:latest
    # Handles execution requests, streams output
    ports:
    - containerPort: 50051  # gRPC

  # Network policy: no internet by default
  # Egress allowed only to package registries
  volumes:
  - name: workspace
    emptyDir: {}
  - name: home
    emptyDir: {}
  - name: tmp
    emptyDir: {}

With a read-only root filesystem, package installs write to mounted volumes like /workspace, /home/sandbox, and /tmp. Use virtualenvs or language-specific paths under those mounts.

Two-container pattern:

Runtime container: Runs user processes (Python/Node/Go/etc.), provides terminal shell access

Agent container: Manages process lifecycle, captures terminal output, syncs files, communicates with control plane

Security is critical. User code runs in untrusted sandboxes. Use: (1) Container isolation with dropped capabilities, (2) Network policies blocking unauthorized egress, (3) Resource limits preventing DoS, (4) Read-only filesystem where possible, (5) Non-root user execution.

Terminal Output Streaming Deep Dive

The output streaming pipeline is the heart of real-time terminal feel:

Pipeline diagram:

Sandbox Pod: User Process → (stdout/stderr) → Agent Process → (batch + compress) → Buffer

Message Bus: Buffer → (publish) → Kafka → (keyed partitions) → Partitions

WebSocket Layer: Partitions → WS Server → (fan-out) → Client 1, Client 2

Avoid one Kafka topic per sandbox. Use a small number of shared topics with partitions keyed by sandbox_id to preserve ordering without exploding metadata.

At scale, shard sandbox_id to a WebSocket server so only the owning shard consumes that partition; otherwise every WS server would need to consume everything and filter locally.
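The sharding idea can be sketched as two pure functions: one maps a sandbox to a partition, the other maps a partition to the WebSocket server that owns it. This is an illustration under assumptions: the SHA-256 hash here is a stand-in (Kafka's default partitioner uses its own hash of the key, so a real router must use the same partitioner as the producer), and the static modulo assignment ignores rebalancing when servers join or leave.

```python
import hashlib
from typing import List


def partition_for(sandbox_id: str, num_partitions: int) -> int:
    """Stable sandbox_id -> partition mapping (stand-in for the producer's partitioner)."""
    digest = hashlib.sha256(sandbox_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def owner_for(partition: int, ws_servers: List[str]) -> str:
    """Static partition -> WebSocket server assignment."""
    return ws_servers[partition % len(ws_servers)]


if __name__ == "__main__":
    servers = ["ws-0", "ws-1", "ws-2"]  # hypothetical server names
    for sandbox_id in ("sbx-456", "sbx-789"):
        p = partition_for(sandbox_id, num_partitions=64)
        print(sandbox_id, "partition", p, "owner", owner_for(p, servers))
```

The edge layer would use the same mapping to route a client's WebSocket connection to the owning server, so each server consumes only its assigned partitions instead of filtering the whole topic.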

Agent output handling:

# Sketch: agent captures and streams terminal output.
# Assumes a kafka-python style producer (send(topic, key=..., value=...))
# and the msgpack package.
import time

import msgpack

FLUSH_INTERVAL_S = 0.050  # 50ms
FLUSH_BYTES = 4096        # 4KB


class OutputStreamer:
    def __init__(self, kafka_producer, sandbox_id):
        self.producer = kafka_producer
        self.sandbox_id = sandbox_id
        self.topic = "sandbox-output"
        self.key = sandbox_id.encode()
        self.buffer = []
        self.buffered_bytes = 0
        self.last_flush = time.monotonic()

    def capture(self, process_id: str, stream: str, data: bytes):
        self.buffer.append({
            "process_id": process_id,
            "stream": stream,
            "data": data,
            "ts": int(time.time() * 1000),  # wall-clock milliseconds
        })
        self.buffered_bytes += len(data)

        # Flush every 50ms or 4KB, whichever comes first.
        # This check only runs when new output arrives; a periodic timer should
        # also call _flush() so a process that goes quiet does not strand output.
        if self._should_flush():
            self._flush()

    def _should_flush(self):
        time_elapsed = time.monotonic() - self.last_flush
        return self.buffered_bytes >= FLUSH_BYTES or time_elapsed >= FLUSH_INTERVAL_S

    def _flush(self):
        if not self.buffer:
            return

        # Batch messages for efficiency; keying by sandbox_id preserves ordering
        self.producer.send(
            self.topic,
            key=self.key,
            value=msgpack.packb({
                "sandbox_id": self.sandbox_id,
                "events": self.buffer,
            }),
        )
        self.buffer = []
        self.buffered_bytes = 0
        self.last_flush = time.monotonic()

Why batch output?

Individual characters would create millions of messages

50ms batching provides perceived real-time feel

Reduces Kafka throughput and WebSocket message overhead

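Interview insight: Mention the trade-off between latency and throughput. Smaller batches mean lower latency but higher per-message overhead. A 50ms window is a reasonable middle ground: it leaves headroom within the 100ms output-latency target, and delays under roughly 100ms generally still feel instantaneous to users.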

Output Persistence & Reconnection

A common interview follow-up: "What happens if the user disconnects mid-process and reconnects?"

Solution: short-term Redis replay buffer + Kafka for live streaming

Output Replay Strategy:

┌─────────────────────────────────────────────────────────┐
│ 1. WS server appends output to Redis stream per process │
│ 2. Redis stream has TTL (e.g., 1 hour) and size cap     │
│ 3. Client sends process_id + last_seen_id on reconnect  │
│ 4. Server replays from Redis, then resumes live stream  │
└─────────────────────────────────────────────────────────┘

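# WebSocket reconnection handler.
# Assumes `redis` is a redis.asyncio.Redis client created with decode_responses=True.
import json


async def handle_reconnect(ws, process_id, last_seen_id):
    stream_key = f"proc:{process_id}:output"

    if last_seen_id:
        # Resume after the last event the client saw. The "(" prefix makes the
        # lower bound exclusive (Redis 6.2+); a plain ID would resend that event.
        events = await redis.xrange(stream_key, min=f"({last_seen_id}", max="+")
    else:
        # New connection: start from earliest available
        events = await redis.xrange(stream_key, min="-", max="+")

    # Replay buffered messages, then switch to live streaming
    for event_id, fields in events:
        await ws.send(json.dumps({"id": event_id, **fields}))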

A Cloud IDE typically allows multiple concurrent processes. Accept a map of process_id -> last_seen_id on reconnect and replay each stream independently.

For terminal history: Unlike notebook-style systems, Cloud IDEs typically don't persist terminal output long-term. Terminal history is kept in Redis for reconnection (TTL ~1 hour). If users need persistent logs (e.g., for build output), store them in object storage with a signed URL for later access.

Don't over-engineer output persistence. Users expect real-time streaming for running processes. Terminal history is ephemeral by nature—focus on the live experience.
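The replay semantics (size cap, resume after `last_seen_id`, one cursor per process) can be tried out without Redis. The class below is an in-memory stand-in written for illustration only: it uses integer IDs instead of Redis stream IDs and omits the TTL, and `ReplayBuffer` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple

Event = Tuple[int, str]  # (event_id, data)


class ReplayBuffer:
    """Bounded per-process output history, mimicking a capped Redis stream."""

    def __init__(self, max_events: int = 1000):
        self._max = max_events
        self._next_id = 0
        self._streams: Dict[str, Deque[Event]] = {}

    def append(self, process_id: str, data: str) -> int:
        """Store one output event; the oldest events fall off once the cap is hit."""
        self._next_id += 1
        stream = self._streams.setdefault(process_id, deque(maxlen=self._max))
        stream.append((self._next_id, data))
        return self._next_id

    def replay(self, cursors: Dict[str, Optional[int]]) -> Dict[str, List[Event]]:
        """For each process, return events strictly after the client's last seen ID.

        A cursor of None means "send everything still buffered".
        """
        result: Dict[str, List[Event]] = {}
        for process_id, last_seen in cursors.items():
            events = self._streams.get(process_id, ())
            result[process_id] = [
                e for e in events if last_seen is None or e[0] > last_seen
            ]
        return result
```

On reconnect the client would send a map such as `{"proc-123": 42, "proc-456": None}`; the server replays each stream independently and then resumes live delivery. If the cap has already evicted events after the client's cursor, the gap is simply lost, which matches the ephemeral-history stance above.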

Warm Pool Strategy

Cold-starting a container takes 10-30 seconds. Users expect < 5 seconds. Solution: pre-warm pools.

Pool Configuration:

┌─────────────────┬─────────┬───────────┬──────────┐
│ Instance Type   │ Min     │ Target    │ Max      │
├─────────────────┼─────────┼───────────┼──────────┤
│ python-cpu-sm   │ 100     │ 500       │ 2000     │
│ python-cpu-lg   │ 50      │ 200       │ 1000     │
│ python-gpu      │ 10      │ 50        │ 200      │
│ node-cpu-sm     │ 50      │ 200       │ 1000     │
└─────────────────┴─────────┴───────────┴──────────┘

Pool Controller Logic:

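class WarmPoolController:
    BUFFER_MINUTES = 10  # minutes of expected allocations to keep warm

    def reconcile(self):
        for pool_type in self.pool_types:
            current = self.count_warm_sandboxes(pool_type)
            target = self.calculate_target(pool_type)

            if current < target:
                # Scale up: provision more sandboxes, capped per reconcile pass
                to_create = min(target - current, self.max_batch_size)
                self.provision_sandboxes(pool_type, to_create)

            elif current > target * 1.5:
                # Scale down: terminate excess, keeping a 20% buffer above target
                to_terminate = current - int(target * 1.2)
                self.terminate_oldest(pool_type, to_terminate)

    def calculate_target(self, pool_type):
        # Predict demand based on:
        # 1. Recent allocation rate (sandboxes handed out per minute)
        # 2. Historical patterns (time of day, day of week)
        # 3. Current active sandboxes (as an input to the forecast)
        allocation_rate = self.get_allocation_rate(pool_type, window_minutes=5)

        # Warm target = allocation_rate × buffer_minutes, clamped to the pool's
        # configured Min/Max. Active sandboxes are already assigned, so they are
        # not added to the count of warm sandboxes to hold in reserve.
        # self.pool_config is assumed to hold the Min/Max values from the table.
        expected = int(allocation_rate * self.BUFFER_MINUTES)
        config = self.pool_config[pool_type]
        return max(config.min, min(expected, config.max))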

Warm pools are a significant cost: pre-provisioned sandboxes consume resources even when idle. Size the pool from observed usage patterns to balance fast allocation against idle spend.
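The scale-up and scale-down thresholds in the reconcile loop are easier to reason about as a pure function. The sketch below mirrors those thresholds (create up to a batch limit when below target; trim down to 1.2× target once above 1.5× target) using integer arithmetic. `PoolAction`, `plan_pool_action`, and the default batch size of 50 are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolAction:
    create: int = 0
    terminate: int = 0


def plan_pool_action(current_warm: int, target: int,
                     max_batch_size: int = 50) -> PoolAction:
    """Decide how many warm sandboxes to add or remove in one reconcile pass."""
    if current_warm < target:
        # Below target: provision, but cap the batch to avoid a thundering herd.
        return PoolAction(create=min(target - current_warm, max_batch_size))
    if current_warm * 2 > target * 3:  # same as current_warm > 1.5 * target
        # Far above target: trim down to 1.2x target, not all the way,
        # so a small demand bump does not immediately trigger provisioning.
        return PoolAction(terminate=current_warm - (target * 6) // 5)
    # Between 1.0x and 1.5x target: do nothing (hysteresis band).
    return PoolAction()
```

The gap between the scale-up and scale-down thresholds acts as hysteresis: a pool hovering slightly above target is left alone rather than being repeatedly trimmed and refilled.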

Phase 5: Scaling & Trade-offs

Addressing Non-Functional Requirements

Cold Start Latency (< 5s)

Strategy	Latency Reduction	Trade-off
Warm pools	10-30s → 1-2s	Higher idle cost
Container image optimization	5-10s saved	Limited customization
Lazy package loading	2-5s saved	First import slower
Snapshot/restore (Firecracker)	Near-instant	Complexity

Output Latency (< 100ms)

Kafka partitioning by sandbox_id ensures ordering

WebSocket servers co-located with Kafka brokers

Client-side buffering for smooth rendering

Execution Isolation

Kubernetes namespaces per organization/tenant (not per user at 100K+ scale)

Network policies: sandboxes can't communicate with each other

Seccomp profiles restricting dangerous syscalls

Resource quotas preventing noisy neighbor issues

Bottleneck Analysis

Sandbox Manager as Single Point of Failure

Problem: All process requests route through Sandbox Manager.

Solution:

Stateless Sandbox Manager instances behind load balancer

Sandbox state stored in Redis (not in-memory)

Leader election for pool management tasks only

Kafka Throughput

Problem: 50K active processes × 1KB/s = 50MB/s sustained throughput (at peak, could spike to 100MB/s).

Solution:

Partition by sandbox_id for parallelism

3-5 broker Kafka cluster handles this comfortably

Consider alternatives: Redis Streams for simpler cases, Pulsar for higher scale

WebSocket Server Memory

Problem: Each connection holds buffer state. 10K connections × 10KB = 100MB per server (100K total connections across ~10 servers).

Solution:

Horizontal scaling with sticky sessions

Connection limits per server (10K connections)

Offload buffering to Redis for reconnection support

Deep Dive: VM Lifecycle Management

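State diagram: Provisioning → Warm → Assigned → Running ⇄ Idle → Terminated

(start) → Provisioning: Create request

Provisioning → Warm: Container ready

Warm → Assigned: Workspace allocation

Assigned → Running: First process

Running → Idle: All processes complete

Idle → Running: New process

Termination triggers: Timeout (30min), Explicit workspace stop, Max lifetime (12h), Pool scale-down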

State transitions:

from datetime import datetime, timezone


class SandboxStateMachine:
    # Allowed next states; "terminated" is terminal and has no outgoing transitions
    TRANSITIONS = {
        "provisioning": ["warm", "terminated"],
        "warm": ["assigned", "terminated"],
        "assigned": ["running", "terminated"],
        "running": ["idle", "terminated"],
        "idle": ["running", "terminated"],
    }

    def handle_idle_timeout(self, sandbox):
        """Called when sandbox has been idle too long"""
        if sandbox.status != "idle":
            return

        # Assumes last_activity_at is a timezone-aware UTC datetime
        idle_duration = datetime.now(timezone.utc) - sandbox.last_activity_at

        # Free tier: 5 min idle timeout
        # Paid tier: 30 min idle timeout
        # (returned as a timedelta)
        timeout = self.get_timeout_for_tier(sandbox.user_id)

        if idle_duration > timeout:
            # Save workspace files to S3
            self.snapshot_workspace(sandbox)
            # Terminate to free resources
            self.terminate(sandbox)

Cost optimization insight: Aggressive idle timeouts save money but hurt UX. Differentiate by user tier—free users get shorter timeouts, paid users get longer sessions.
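A transition table like the one above is only useful if every status change is checked against it. Here is a minimal, self-contained sketch of that check; the `transition` function and `InvalidTransition` exception are hypothetical names, and the explicit empty list for "terminated" is added here to mark it as a terminal state.

```python
from typing import Dict, List

TRANSITIONS: Dict[str, List[str]] = {
    "provisioning": ["warm", "terminated"],
    "warm": ["assigned", "terminated"],
    "assigned": ["running", "terminated"],
    "running": ["idle", "terminated"],
    "idle": ["running", "terminated"],
    "terminated": [],  # terminal: nothing may follow
}


class InvalidTransition(Exception):
    """Raised when a sandbox status change is not allowed by the table."""


def transition(current: str, new: str) -> str:
    """Return the new status if current -> new is allowed, else raise."""
    allowed = TRANSITIONS.get(current)
    if allowed is None:
        raise InvalidTransition(f"unknown state: {current!r}")
    if new not in allowed:
        raise InvalidTransition(f"{current} -> {new} is not allowed")
    return new
```

With several stateless Sandbox Manager instances, the same check would typically be paired with a compare-and-set on the stored status, so two managers cannot both move a sandbox out of the same state.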

Alternative Architectures

Firecracker MicroVMs (what AWS Lambda uses)

Pros:

Sub-second cold starts (150ms possible)

Stronger isolation than containers

Snapshot/restore for instant warm starts

Cons:

More operational complexity

Less tooling than Kubernetes

Requires custom orchestration

When to choose: High-security requirements, need for instant cold starts, Lambda-like execution model.

gVisor/Kata Containers

Pros:

Better isolation than standard containers

Works with Kubernetes

Lower overhead than full VMs

Cons:

Some syscall compatibility issues

Performance overhead (10-20%)

When to choose: Need stronger isolation without leaving Kubernetes ecosystem.

Trade-off: Persistence Model

Approach	Pros	Cons
Ephemeral	Simple, cheap	Files lost on timeout
Persistent workspace (Replit-like)	Better UX, feels like local dev	Higher storage cost
Hybrid	Flexible	Complex to implement

In an interview, recommend the hybrid approach:

Source files: Always persisted to S3 (synced on save)

Environment (packages, dependencies): Persist within session; optional snapshots for paid tier (size-capped)

Runtime state (running processes, variables): Lost on timeout, user restarts as needed

Interview Checklist

Requirements Phase

Clarified interactive development vs. batch job execution model

Defined cold start latency target

Discussed isolation/security requirements

Established scale (concurrent users, workspaces)

Design Phase

Explained sandbox container architecture (two-container pattern)

Designed terminal output streaming pipeline (agent → Kafka → WebSocket)

Covered warm pool strategy for fast workspace startup

Showed VM state machine (provisioning → warm → assigned → running → idle → terminated)

Addressed reconnection/output replay (common follow-up question)

Scaling Phase

Addressed cold start with warm pools

Discussed WebSocket horizontal scaling

Mentioned security isolation (network policies, capabilities)

Covered cost optimization (idle timeouts, tiered pools)

Summary

Aspect	Decision	Rationale
Sandbox runtime	Kubernetes pods	Mature orchestration, good isolation
Terminal streaming	Kafka + WebSocket	Decouples producers/consumers, handles backpressure
Cold start mitigation	Warm pools	Predictable latency without Firecracker complexity
Output batching	50ms windows	Balance latency and throughput
File persistence	Hybrid (files persisted, runtime ephemeral)	Cost-effective, acceptable UX
Isolation	Network policies + resource limits + seccomp	Defense in depth

The key insight for this design is that real-time feel comes from the streaming pipeline, not the compute layer. Users tolerate 2-3 second sandbox startup, but terminal output must stream within 100ms. Design your output pipeline carefully—it's the difference between "feels instant" and "feels broken."
