Data Labeling Task Scheduler

Low-Level DesignCodingOnsitePhoneMachine Learning EngineerSoftware EngineerReported Apr, 2026

Problem Overview

This OpenAI Machine Learning Engineer question was reported as a multi-part coding problem about constructing a schedule for data labeling work.

Part 1 is the easy version: build any valid schedule.

Part 2 adds stronger fairness constraints that must hold at every prefix of the returned list.

Part 3 is a streaming followup: tasks arrive in daily batches and the scheduler must update incrementally.

Problem Statement

You are given:

t tasks, indexed 0..t-1

m models, indexed 0..m-1

h human labelers, indexed 0..h-1

a target k

Return a scheduling, represented as a list of tuples:

(task, model, human)

Each tuple means:

human human works on task task

that assignment is paired with model model

Your schedule must satisfy:

Every human labeler participates in at least k assignments total.

Every (task, human) pair appears at most once.

For every task x, and for every prefix of the schedule, the counts across models stay balanced:

max_i count_prefix(x, model=i) - min_i count_prefix(x, model=i) <= 1

For every task x, and for every prefix of the schedule, the counts across humans also stay balanced:

max_j count_prefix(x, human=j) - min_j count_prefix(x, human=j) <= 1

If no valid schedule exists, return failure in whatever form your language uses (None, empty list, exception, etc.).

Part 1: Build Any Valid Schedule

Ignore the prefix-balance constraints for now. Just return any schedule such that:

each human appears at least k times

each human works on a given task at most once

Key Observation

Because each human can touch a task at most once, a human can do at most t assignments total. Therefore:

k > t => impossible

That is the main feasibility condition for Part 1.

Simple Construction

If k <= t and m >= 1, assign each human to k distinct tasks, for example:

for human in range(h):

for offset in range(k):

task = (human + offset) % t

model = 0

schedule.append((task, model, human))

This is already enough for the easy part.

Part 2: Prefix-Balanced Schedule

Now add the stronger requirement that the schedule must remain balanced at every intermediate point, not just at the end.

This sounds more complicated than it is. The clean construction is:

Schedule work in exactly k rounds.

In each round, every human gets exactly one assignment.

Rotate tasks cyclically so each human sees a new task each round.

For each task, assign models in local round-robin order.

Why This Works

For round r and human u, choose:

task = (u + r) mod t

This gives every human exactly one assignment per round, and over k rounds each human sees k distinct tasks as long as k <= t.

Now keep a counter task_seen[task], meaning how many times task task has already been scheduled. When task task appears again, assign:

model = task_seen[task] mod m

Then increment task_seen[task].

This makes the model counts for each fixed task cycle through:

0, 1, 2, ..., m-1, 0, 1, 2, ...

So for every task, the counts across models are always as even as possible, even at intermediate prefixes.

Important Simplification

The (task, human) balance condition is mostly automatic once you enforce:

each (task, human) pair is used at most once

Since every such count is then either 0 or 1, the difference between the maximum and minimum is never more than 1.

That means the real nontrivial part is balancing models for each task while also giving every human at least k distinct tasks.

Worked Example

Suppose:
t = 3, m = 2, h = 4, k = 2
One valid schedule is: 
[
    (0, 0, 0),
    (1, 0, 1),
    (2, 0, 2),
    (0, 1, 3),
    (1, 1, 0),
    (2, 1, 1),
    (0, 0, 2),
    (1, 0, 3),
]
Check the properties:
every human appears exactly 2 times
no human repeats the same task
for task 0, model counts evolve as (1,0), then (1,1), then (2,1) so the difference never exceeds 1
the same holds for tasks 1 and 2

Reference Solution

from typing import List, Optional, Tuple

Assignment = Tuple[int, int, int]

def build_basic_schedule(t: int, m: int, h: int, k: int) -> Optional[List[Assignment]]:

"""

Part 1: ignore prefix balance and build any valid schedule.

"""

if k == 0:

return []

if t <= 0 or m <= 0 or h <= 0:

return None

if k > t:

return None

schedule: List[Assignment] = []

for human in range(h):

for offset in range(k):

task = (human + offset) % t

model = 0

schedule.append((task, model, human))

return schedule

def build_balanced_schedule(t: int, m: int, h: int, k: int) -> Optional[List[Assignment]]:

"""

Part 2: build a schedule such that:

each human appears at least k times
each (task, human) pair appears at most once
for every task, model counts stay prefix-balanced
task-human balance is automatic because counts are only 0/1

Returns a minimal-length valid schedule of exactly h * k assignments,

or None if impossible.

"""

if k == 0:

return []

if t <= 0 or m <= 0 or h <= 0:

return None

if k > t:

return None

schedule: List[Assignment] = []

task_seen = [0] * t

for round_idx in range(k):

for human in range(h):

task = (human + round_idx) % t

model = task_seen[task] % m

schedule.append((task, model, human))

task_seen[task] += 1

return schedule

Why The Construction Is Correct

1. Each human gets at least k assignments

There are exactly k rounds, and each human receives exactly one assignment per round. So each human appears exactly k times.

2. No human repeats the same task

For a fixed human u, the assigned tasks are:

(u + 0) mod t,

(u + 1) mod t,

...,

(u + k - 1) mod t

These are all distinct when k <= t.

3. Model counts stay balanced for every task

Fix any task x. Each time x appears, we assign the next model in round-robin order:

0, 1, 2, ..., m-1, 0, 1, ...

After any number of occurrences of task x, each model has been used either:

floor(c / m) times, or

ceil(c / m) times

where c is the number of times task x has appeared so far.

Therefore the difference between the largest and smallest model count is always at most 1.

4. The schedule is minimal in length

Every tuple increases the total assignment count of exactly one human by 1. Since each of the h humans must reach at least k, any valid schedule needs at least:

h * k

assignments.

Our construction uses exactly h * k, so it is optimal in schedule length.

Complexity

The construction is straightforward:

Time: O(hk)

Space: O(t) auxiliary space, plus the output list

Part 3: Streaming / Incremental Scheduler

A separate followup variant turns the same problem into an online one. Instead of returning a full schedule up front, the platform receives a fresh batch of tasks each day and must emit assignments incrementally.

New Constraints

Each day a batch of tasks arrives. The scheduler must produce that day's (task, model, human) tuples before the next day begins.

Each human can do at most one task per day.

History accumulates across days. The same (task, human) pair is still forbidden, even across different days.

The scheduler must support incremental updates. It cannot recompute the entire history from scratch every day.

Long-term goal: across all days combined, model and human counts per task remain balanced (the same prefix-balance condition as in Part 2).

The k target from Parts 1 and 2 is not part of this variant. Instead, balance is measured against the cumulative schedule.

Design

Maintain a small amount of persistent state across days:

next_model[task]: round-robin pointer for the next model to assign to this task

next_human[task]: round-robin pointer for the next human to try for this task

pair_used: set of (task, human) pairs already scheduled in any prior day

Each day, also track:

used_today: set of humans already assigned today (resets each day)

For every task in today's batch, walk the human round-robin starting at next_human[task] and pick the first human that is neither in used_today nor already paired with this task. Pick the model as next_model[task] % m and advance the model pointer.

Reference Solution

from typing import Iterable, List, Set, Tuple

Assignment = Tuple[int, int, int]

class StreamingScheduler:

"""

Incremental scheduler for daily batches of labeling tasks.

Each call to schedule_day consumes the tasks that arrived that day

and returns the assignments produced for that day, while updating

persistent state used by subsequent days.

"""

def init(self, t: int, m: int, h: int) -> None:

if t <= 0 or m <= 0 or h <= 0:

raise ValueError("t, m, h must all be positive")

self.t = t

self.m = m

self.h = h

self.next_model = [0] * t

self.next_human = [0] * t

self.pair_used: Set[Tuple[int, int]] = set()

def schedule_day(self, tasks_today: Iterable[int]) -> List[Assignment]:

used_today: Set[int] = set()

result: List[Assignment] = []

for task in tasks_today:

human = self._pick_human(task, used_today)

if human is None:

No labeler is available for this task today.

The platform can defer it to a later day.

continue

model = self.next_model[task] % self.m

result.append((task, model, human))

used_today.add(human)

self.pair_used.add((task, human))

self.next_model[task] += 1

self.next_human[task] = (human + 1) % self.h

return result

def _pick_human(self, task: int, used_today: Set[int]) -> int | None:

start = self.next_human[task]

for offset in range(self.h):

human = (start + offset) % self.h

if human in used_today:

continue

if (task, human) in self.pair_used:

continue

return human

return None

Why This Stays Balanced

Models per task. next_model[task] advances by exactly one whenever task task is scheduled, so the model assigned cycles through 0, 1, ..., m-1, 0, 1, .... After any number of occurrences of task task, model counts differ by at most one. This is the same argument as Part 2, just spread across days.

Humans per task. Each (task, human) pair is used at most once, so per-task human counts are always 0 or 1. The difference between max and min is therefore at most one.

Per-day cap. used_today resets each day and blocks reusing a human, so no labeler is overloaded within a day.

What Can Go Wrong

If a task arrives on a day where no human satisfies both constraints (every available human has already labeled this task before, or has already been used today), the scheduler must defer that task. The reference solution above silently skips it. Real systems would push it onto a backlog and retry the next day.

The lifetime hard limit is h assignments per task, since each (task, human) pair can be used at most once. Once every human has labeled a given task, no further work for that task can ever be scheduled, and the total schedule length across all days is bounded by t * h.

Complexity (per day)

Let b be the size of today's batch.

Time: O(b * h) worst case, since picking a human scans up to h candidates.

Space: O(t + P) where P is the cumulative number of (task, human) pairs already used. P grows monotonically and is bounded by t * h.

Auto-save enabled

Loading editor…

Output

Run your code to see the output here.