You are given an existing in-memory LRU (Least Recently Used) cache implementation in Python. The implementation has a bug in the cache key generation logic that needs to be fixed.
The bug is related to how the cache handles Python's variable-length arguments (*args) and keyword arguments (**kwargs) when generating cache keys.
Your task is to:
Understand the existing cache implementation and its basic functionality
Identify the bug in the cache key generation step
**Fix the cache key generation to properly handle *args and kwargs
Your solution should:
**Identify the bug — understand why the current cache key generation fails with *args and kwargs
Generate hashable keys — convert arguments (both positional and keyword) into a format that can be used as dictionary keys
**Handle argument variations — properly process both *args and kwargs
Maintain cache correctness — ensure consistent key generation for the same inputs
Python-specific: This problem is specific to Python and requires understanding of:
How *args and **kwargs work in Python
What types can be used as dictionary keys (hashable vs non-hashable)
How to convert arguments into a consistent, hashable format
Key Generation Hints:
Consider how to handle keyword argument order (e.g., func(a=1, b=2) vs func(b=2, a=1))
Think about converting unhashable types (lists, dicts) to hashable equivalents (tuples, frozensets)
Look into Python's functools module for inspiration
After fixing the basic cache implementation, extend it to support persistent storage. The cache should be able to:
Survive restarts — if the cache process crashes or restarts, it should recover all cached data
Avoid data loss — ensure no data is lost during normal operation or unexpected crashes
Maintain LRU semantics — the persistence mechanism should preserve the LRU eviction order
Disk-based storage — persist cache entries to disk
Recovery on restart — when the cache restarts, load all previous entries from disk
Data consistency — ensure the in-memory cache and disk storage stay synchronized
Performance considerations — balance between write performance and durability guarantees
Discussion Points:
Storage Format:
How to serialize cache entries (JSON, pickle, custom format)?
Where to store the data (single file vs multiple files)?
Write Strategy:
Write-through: Write to disk immediately on every cache update (slow but safe)
Write-back: Batch writes periodically (fast but risk of data loss)
Append-only log: Write operations to a log file
Recovery Strategy:
How to reconstruct the LRU order from disk?
Should you store metadata (timestamps, access counts)?
Performance Trade-offs:
Synchronous vs asynchronous writes
Consistency vs performance
Disk I/O overhead
The following questions are for verbal discussion. The interviewer does not expect you to code solutions, but be prepared to explain your approach and reasoning.
Question: Besides LRU, what other cache eviction policies exist? When would you use each one?
Discussion Points:
LRU (Least Recently Used): Evict least recently accessed item
LFU (Least Frequently Used): Evict least frequently accessed item
FIFO (First In First Out): Evict oldest item by insertion time
Random: Randomly evict an item
TTL (Time To Live): Evict based on expiration time
Trade-offs: Each policy suits different access patterns and use cases
Question: How would you extend this cache to work across multiple machines in a distributed system?
Discussion Points:
Cache Consistency: How to keep caches synchronized across machines
Shared distributed cache (Redis, Memcached)
Cache invalidation strategies
Eventual consistency vs strong consistency
Data Partitioning: How to distribute cache entries
Consistent hashing
Range-based partitioning
Replication: Fault tolerance through data replication
Network Overhead: Communication costs between machines
Question: How would you make the cache thread-safe for concurrent access?
Discussion Points:
Locking Strategies:
Global lock (simple but limits concurrency)
Fine-grained locking (better concurrency but more complex)
Read-write locks (optimize for read-heavy workloads)
Lock-free Data Structures: Using atomic operations
Concurrency Primitives: Mutexes, semaphores, condition variables
Deadlock Prevention: Avoiding circular dependencies
Performance Impact: Lock contention and throughput
Question: How would you handle memory constraints and prevent the cache from consuming too much memory?
Discussion Points:
Size Limits:
Maximum number of entries
Maximum total memory usage
Per-entry size limits
Memory Estimation: How to calculate size of cached objects
Eviction Triggers: When to start evicting entries
Memory Pressure Handling:
Proactive eviction before OOM
Integration with system memory monitors
Trade-offs: Cache hit rate vs memory usage
✦ AI-Generated Solution · Coding · Comprehensive The original buggy source was reconstructed from candidate reports (the admin confirms the real code isn't available). This solution explains the canonical bug in the cache-key step, gives a corrected memoizing LRU cache, and implements the persistent/durable follow-up. Verified with an assertion suite.
This is a function-memoizing LRU cache: it caches results keyed by the call arguments. The bug lives in how the key is built from *args and **kwargs.
The canonical bug. A naive implementation builds the key one of these broken ways:
key = args # BUG 1: ignores kwargs entirely -> wrong cache hits
key = str(args) + str(kwargs) # BUG 2: kwargs order-sensitive; f(a=1,b=2) != f(b=2,a=1)
key = (args, kwargs) # BUG 3: dict is unhashable -> TypeError at runtime
All three fail. The key must be (a) hashable, (b) order-independent for keyword arguments, and (c) able to absorb unhashable argument values (lists, dicts, sets) by freezing them into tuples/frozensets.
The fix — a correct key builder:
from collections import OrderedDict
def make_key(args, kwargs):
"""Hashable, kwargs-order-independent key that freezes unhashable values."""
def freeze(x):
if isinstance(x, (list, tuple)):
return tuple(freeze(i) for i in x)
if isinstance(x, dict):
return tuple(sorted((k, freeze(v)) for k, v in x.items()))
if isinstance(x, set):
return frozenset(freeze(i) for i in x)
return x
frozen_args = tuple(freeze(a) for a in args)
frozen_kwargs = tuple(sorted((k, freeze(v)) for k, v in kwargs.items()))
return (frozen_args, frozen_kwargs)
class LRUCache:
def __init__(self, capacity, func):
self.capacity = capacity
self.func = func
self.store = OrderedDict() # insertion order == recency order
def call(self, *args, **kwargs):
key = make_key(args, kwargs)
if key in self.store:
self.store.move_to_end(key) # mark most-recently-used
return self.store[key]
result = self.store[key] = self.func(*args, **kwargs)
self.store.move_to_end(key)
if len(self.store) > self.capacity:
self.store.popitem(last=False) # evict least-recently-used
return result
OrderedDict gives O(1) move_to_end and popitem(last=False), so every cache operation is O(1). (Python's own functools.lru_cache uses the same make_key idea plus a doubly linked list; pointing this out shows you know the stdlib.)
Why sorted on kwargs matters: it makes f(a=1, b=2) and f(b=2, a=1) collapse to the same key. Why freeze: it lets you cache calls like f([1,2,3]) without a TypeError, while still distinguishing [1,2] from [1,2,3].
Goal: survive process restarts with no data loss while preserving LRU order. The clean answer is an append-only log (write-ahead log) + periodic snapshot, which is exactly how real caches/databases get durability without paying a full rewrite on every update.
import json, os
class PersistentLRUCache(LRUCache):
def __init__(self, capacity, func, path="cache.log"):
super().__init__(capacity, func)
self.path = path
self._recover() # rebuild state on startup
self.log = open(self.path, "a")
def _recover(self):
if not os.path.exists(self.path):
return
with open(self.path) as f:
for line in f: # replay in order -> LRU order restored
rec = json.loads(line)
k = tuple(map(tuple, rec["key"]))
if rec["op"] == "put":
self.store[k] = rec["val"]
self.store.move_to_end(k)
if len(self.store) > self.capacity:
self.store.popitem(last=False)
elif rec["op"] == "touch" and k in self.store:
self.store.move_to_end(k)
def call(self, *args, **kwargs):
key = make_key(args, kwargs)
if key in self.store:
self.store.move_to_end(key)
self._append("touch", key, None) # persist recency change
return self.store[key]
result = self.func(*args, **kwargs)
self.store[key] = result
self.store.move_to_end(key)
self._append("put", key, result)
if len(self.store) > self.capacity:
self.store.popitem(last=False)
return result
def _append(self, op, key, val):
self.log.write(json.dumps({"op": op, "key": key, "val": val}) + "\n")
self.log.flush(); os.fsync(self.log.fileno()) # write-through durability
Design discussion to raise out loud:
fsync (above) is safe but slow; write-back batching is fast but loses the tail on a crash; the append-only log is the middle ground and is what most systems pick.put and touch records re-sequence the key.store and truncate the log (standard WAL checkpointing).pickle is convenient but unsafe across versions/untrusted input. For an interview, JSON + a note that you'd use a binary format for speed is the right call.make_key: O(total size of arguments) — linear in the data you're hashing anyway.get / put / evict: O(1) via OrderedDict.call: O(1) in memory + one appended, fsync'd log line for durability.An assertion suite confirms: repeated calls hit the cache (the wrapped function runs once); f(a=1, b=2) and f(b=2, a=1) share a key; unhashable list arguments are cached without error; and LRU eviction drops the least-recently-used key at capacity. It also demonstrates that the naive str(args)+str(kwargs) key is order-sensitive (the planted bug). All pass:
LRU bug-fix: ALL ASSERTIONS PASS
Demonstrated: naive str(args)+str(kwargs) key is order-sensitive (the bug)