Product Requirements · v0.1 Draft

Staged Interview Practice Harness

A self-verifying, multi-stage coding interview harness that reproduces the full shape of a real senior-eng interview — staged progression, realistic edge cases, discussion follow-ups — not just LeetCode-style "did the function return the right value."

Author: Raunaq Naidu

Status: Draft

Target launch: v0.2 in ~4 weeks

Stakeholders: Self · candidates prepping for senior interviews

01 Problem & opportunity

What's missing today

LeetCode-style platforms test correctness, not the multi-stage interview arc (build → optimize → defend → discuss).
Public solutions show the answer immediately, so you can't practice the actual judgment work — "is my approach reasonable?"
Tests don't catch the edge cases a real interviewer probes (off-host links in a crawler, kwargs-with-unhashable-values in an LRU, RGBA inputs in an image pipeline).
No surface for the discussion follow-ups ("why processes not threads here?") — yet those are 30–40% of the signal.
Existing tooling locks you into one language / one library; real interviews let you choose.
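
One of the edge cases named above, off-host links in a crawler, can be made concrete with a short sketch. It assumes the intended behaviour is to stay on the seed page's host; the source does not spell that out. The helper name `same_host_links` is hypothetical.

```python
from urllib.parse import urljoin, urlparse


def same_host_links(page_url, hrefs):
    """Resolve hrefs against page_url and keep only http(s) links on the same host.

    Assumption: "off-host" means a different network location (host:port)
    than the page the link was found on.
    """
    host = urlparse(page_url).netloc
    kept = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # handles relative and scheme-relative links
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            kept.append(absolute)
    return kept
```

A test built on this idea would feed the crawler a page that mixes relative links, `mailto:` links, and scheme-relative links to another host, then assert that only the same-host URLs were visited.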

What this product does

Ships each problem in practice/, solution/, test/ triples with a single-line toggle to run tests against either implementation.
Splits each problem into Stage 1 (correctness), Stage 2 (optimization), and Stage 3 (discussion-only follow-ups embedded in the practice file).
Tests are written by someone who's been on the asking side — they catch the edge cases an interviewer actually probes, not just the happy path.
Where the problem allows it, tests are library-agnostic — they only inspect on-disk artifacts, so candidates can use Pillow / OpenCV / numpy / whatever.
Reference solutions are gated behind the toggle — you don't see them by default.
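
A minimal sketch of what a single-line toggle could look like. The `TARGET` constant, the `load_impl` helper, and the `lru_cache` module name are illustrative assumptions, not the harness's actual names, and the sketch assumes `practice/` and `solution/` are importable packages that expose the same names.

```python
import importlib

# Hypothetical single-line toggle: edit this one line to switch targets.
TARGET = "practice"  # change to "solution" to verify the suite itself


def load_impl(problem, target=TARGET):
    """Import `<target>.<problem>`, for example `practice.lru_cache`.

    Assumes both packages expose identical names for each problem, so the
    rest of the test file does not care which one was loaded.
    """
    if target not in ("practice", "solution"):
        raise ValueError(f"unknown target {target!r}; use 'practice' or 'solution'")
    return importlib.import_module(f"{target}.{problem}")
```

A test file would then start with something like `LRUCache = load_impl("lru_cache").LRUCache`, and every test below that line runs unchanged against either implementation.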

02 Target users

Solo learner

Primary

"I have an onsite in 3 weeks. LeetCode trains me for the wrong shape — I keep failing the optimization stage and the follow-up discussion."

Wants: ~5 high-quality problems with the full arc, realistic edge cases, and a way to verify correctness without spoiling the answer.

Coach / mentor

Secondary

"I run mock interviews for friends. I keep recreating the same problems from scratch in a Google Doc with my own test cases."

Wants: a curated catalog they can hand to a candidate, with the discussion prompts already written down.

Hiring team

Tertiary / later

"We want to standardize what we ask, but maintaining a private problem bank is a lot of overhead."

Wants: a way to fork the harness, add private problems with the same structure, and trust the test suite.

03 Product principles

The arc, not the answer

Every problem ships staged. Stage 1 alone is uninteresting; the bar is "candidate can defend their Stage 2 choice in Stage 3."

Tests are the spec

The test file is the source of truth for "did you understand the problem." Solutions are reference, not canon.

Don't spoil by default

A practice template should never accidentally give away the API shape, the data structure, or the library. Spoilers are gated behind a deliberate toggle.

Library-agnostic where you can

If the problem lets the candidate choose tools, the tests must too. Inspect on-disk artifacts, not in-memory objects.
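
As an illustration of inspecting an on-disk artifact rather than an in-memory object, the sketch below reads an image's dimensions straight from the file header using only the standard library. It assumes the problem's output files are PNGs; the helper name `png_size` is hypothetical.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Return (width, height) read from a PNG file's IHDR chunk.

    Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type
    ("IHDR"), then big-endian 4-byte width and 4-byte height.
    """
    header = Path(path).read_bytes()[:24]
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a valid PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height
```

Because the check only touches bytes on disk, it gives the same answer whether the candidate produced the file with Pillow, OpenCV, or anything else.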

Self-verifying

Toggle to solution → all tests pass. That's how the candidate trusts the suite isn't lying to them.

No external state

Tests generate their own fixtures programmatically. No checked-in images, no network, no database, no flakiness.

04 Functional requirements

Tier	Requirement	Status (v0.1)
P0	Each problem ships as a triple: `practice/`, `solution/`, `test/`.	Done for 3 problems.
P0	Single-line toggle at the top of the test file flips between practice and solution targets.	Done.
P0	Each test suite passes 100% against its `solution/` impl.	Done. LRU, crawler, image processor all green.
P0	Each problem has a Stage 1 (correctness) and Stage 2 (optimization or extension).	Done.
P0	Stage 3 (discussion-only) follow-ups embedded as comments in the practice file.	Done for crawler and image processor; LRU partial.
P1	Library-agnostic tests for problems where the candidate picks tools.	Done for image processor; N/A for LRU and crawler (pure Python).
P1	Per-problem `INTERVIEW_QUESTIONS.md` with the interviewer's probe questions in order.	Done for crawler only.
P1	Tests generate fixtures programmatically; no checked-in binary assets.	Done.
P1	Tests run in < 1 second per problem on a laptop.	Done. ~0.4s for image processor, <0.1s for LRU.
P1	Each practice file's docstring states the stages, time budgets, and the exported names the tests will import.	Done.
P2	CLI to scaffold a new problem (creates the practice/solution/test triple from a template).	Not started.
P2	Progress tracking: per-problem pass count, time-to-first-green, attempts.	Not started.
P2	Catalog index page listing all problems with stage / difficulty / topic tags.	Not started.
P2	Coach mode: hide solution module entirely until a pass-count threshold is reached.	Not started.
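
The docstring and Stage 3 requirements in the table above could take a shape like the following sketch of a practice file. Everything in it is hypothetical: the exported name `LRUCache`, the `<N>` time-budget placeholders, and the wording of the discussion prompt are stand-ins rather than content from the real files.

```python
"""Practice: LRU cache with persistence (hypothetical header layout).

Stage 1 (correctness, <N> min): in-memory cache behaviour.
Stage 2 (optimization or extension, <N> min): persistence.
Stage 3 (discussion only): answer the prompts at the bottom of this file.

Exported names the tests will import: LRUCache
"""


class LRUCache:
    # Deliberately bare: the template names what the tests import
    # without hinting at the data structure or method signatures.
    def __init__(self, *args, **kwargs):
        raise NotImplementedError("Stage 1: implement LRUCache")


# Stage 3 (discussion only, no code required):
#   - Placeholder prompt: which trade-off in your Stage 2 design would you
#     revisit first, and why?
```

Raising `NotImplementedError` from the stub means the suite fails with a clear message on a fresh checkout instead of an import error.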

05 Non-functional requirements

Determinism. No flakiness. Concurrency tests must use generous timeouts; speed assertions use loose bounds (at worst 3× the sequential time), never tight ones.
Portability. Tests run on macOS and Linux without modification. Anything Windows-specific must be flagged.
Zero install at runtime. Tests use stdlib only, except for problem-specific dependencies the practice file requires (e.g. Pillow for the image processor).
Readable failure messages. When a test fails, the message should tell the candidate what they got wrong, not just "AssertionError." Use assertEqual(..., msg="...") aggressively.
No hidden state across tests. Each test gets its own temp directory; teardown is mandatory.
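
The requirements above can be sketched in one small `unittest` case. It reads the 3× rule as "the parallel run must finish within three times the sequential time", which is an interpretation; `slow_task` and the test names are hypothetical stand-ins.

```python
import shutil
import tempfile
import time
import unittest
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def slow_task(_item):
    """Hypothetical stand-in for one unit of real work."""
    time.sleep(0.02)


class ExampleNonFunctionalTest(unittest.TestCase):
    def setUp(self):
        # Fresh temp directory per test, so no state leaks between tests.
        self.tmp = Path(tempfile.mkdtemp())

    def tearDown(self):
        # Teardown always removes the directory, even after a failure.
        shutil.rmtree(self.tmp, ignore_errors=True)

    def test_failure_message_names_the_mistake(self):
        out = self.tmp / "result.txt"
        out.write_text("ok")
        self.assertEqual(
            out.read_text(),
            "ok",
            msg=f"{out.name} should contain 'ok'; check that output is "
            "written to the directory the test supplies",
        )

    def test_parallel_speed_uses_a_loose_bound(self):
        items = range(8)
        start = time.perf_counter()
        for item in items:
            slow_task(item)
        sequential = time.perf_counter() - start

        start = time.perf_counter()
        with ThreadPoolExecutor(max_workers=4) as pool:
            list(pool.map(slow_task, items))
        parallel = time.perf_counter() - start

        # Loose bound: fails only on a pathological slowdown, not on jitter.
        self.assertLess(
            parallel,
            3 * sequential,
            msg=f"parallel run took {parallel:.2f}s versus {sequential:.2f}s "
            "sequential; is the work actually running concurrently?",
        )
```

The loose bound trades sensitivity for stability: it will not detect a modest regression, but it also will not fail on a busy laptop.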

06 Differentiation

Capability	This harness	LeetCode	NeetCode	Pair-mocks (e.g. interviewing.io)
Multi-stage problem arc	Yes	No	No	Sometimes
Library-agnostic tests	Yes	No	No	N/A (human-graded)
Solution hidden by default	Yes	No	N/A
Edge cases an interviewer would probe	Yes	Sometimes	Sometimes	Yes (live)
Discussion follow-up prompts	Yes	No	Video only	Yes (live)
Local, free, offline	Yes	No	No	No

07 Success metrics

Catalog depth

10 problems

Target for v1.0 (currently 3).

Solution test pass rate

100%

Toggle-to-solution must always be green. Hard gate.

Topic coverage

≥ 5 distinct skill areas

Currently 3: concurrency, persistence, image processing. Still needed: OOP design, distributed systems, parsing/state machines.

Time to first scaffold

< 2 min

From clone to "tests fail meaningfully on a stub" — requires the v0.2 scaffolding CLI.

Personal: onsite confidence

Subjective

Single user (me) reports feeling prepared for Stage 2 + Stage 3 of the target interview. Reassessed weekly.

External adopters

≥ 5 forks

Only meaningful after public release. Lagging indicator.

08 Phased roadmap

v0.1 · now

Three problems, ad-hoc structure

LRU cache + persistence, web crawler (BFS + multithreading), batch image processor (sequential + parallel). All passing against their solutions. Toggle convention established.

v0.2 · ~4 weeks

Standardize + scaffold

Lift toggle, docstring header, and test-fixture helpers into a small library. CLI to create a new problem triple from a template. Add 2 problems: a parser/state-machine problem and an OOP design problem.
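
A rough sketch of the scaffolding step, assuming one `.py` file per problem in each of the three folders and a `test_` prefix for test files. The `scaffold` function, the `TEMPLATES` contents, and the file-naming scheme are all hypothetical; the real templates would be lifted from the existing problems.

```python
from pathlib import Path

# Hypothetical placeholder templates, keyed by folder name.
TEMPLATES = {
    "practice": '"""Practice: {name}. Stages, time budgets, exported names go here."""\n',
    "solution": '"""Reference solution: {name}."""\n',
    "test": '"""Tests: {name}. The practice/solution toggle goes at the top."""\n',
}


def scaffold(root, name):
    """Create the practice/solution/test triple for `name` under `root`."""
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not a valid Python module name")
    paths = {}
    for folder in TEMPLATES:
        filename = f"test_{name}.py" if folder == "test" else f"{name}.py"
        paths[folder] = Path(root) / folder / filename
    # Check everything first so a clash never leaves a half-created triple.
    for path in paths.values():
        if path.exists():
            raise FileExistsError(f"{path} already exists; refusing to overwrite")
    for folder, path in paths.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(TEMPLATES[folder].format(name=name))
    return list(paths.values())
```

Refusing to overwrite keeps a mistyped command from destroying a half-finished practice attempt.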

v0.3

Catalog & tags

Top-level index with topic / difficulty / time-budget tags. Per-problem README pulled from the practice docstring. Progress tracking (per-problem pass count) in a local JSON.
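
Progress tracking in a local JSON file could be as small as the sketch below. The file layout and the field names (`attempts`, `best_passed`, `total`) are assumptions made for illustration, as is the `record_run` helper.

```python
import json
from pathlib import Path


def record_run(progress_file, problem, passed, total):
    """Update per-problem progress in a local JSON file and return the entry.

    Hypothetical schema: {problem: {"attempts", "best_passed", "total"}}.
    """
    path = Path(progress_file)
    data = json.loads(path.read_text()) if path.exists() else {}
    entry = data.setdefault(problem, {"attempts": 0, "best_passed": 0, "total": total})
    entry["attempts"] += 1
    entry["best_passed"] = max(entry["best_passed"], passed)
    entry["total"] = total  # the suite may grow between runs
    path.write_text(json.dumps(data, indent=2, sort_keys=True))
    return entry
```

Storing the best pass count rather than the latest one means a failed experiment never makes the record look worse than an earlier run.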

v0.4

Coach mode + private problem bank

Hide solution module until a pass-count threshold. Document the contract for adding private problems (so teams can fork and extend).

v1.0

Public release

≥ 10 problems across ≥ 5 skill areas. Public GitHub repo with template-fork instructions. Optional hosted runner.

09 Open questions

Distribution. Single repo template, pip package, or hosted runner? The repo-template route is simplest but doesn't enforce shared structure across forks.
Language scope. v0.1 is Python-only. Is the harness pattern (practice/solution/test + toggle) worth porting to JS/TS or Go, or is Python sufficient for the target audience?
Solution gating. Toggle is a comment edit today. Should it be enforced via env var / CLI flag instead so candidates can't accidentally glance at the solution?
Discussion prompts. Live as comments in the practice file today. Better as a separate FOLLOWUPS.md the candidate is told to open after their code passes?
Problem provenance. Several of these problems are adapted from real interview leaks. What's the policy on attribution / on hosting them publicly?

10 Non-goals

Comprehensive LeetCode coverage. This product is depth (the full arc), not breadth.
Automated grading, scoring, or leaderboards. Tests pass or they don't; subjective signal stays subjective.
Live mock interview matchmaking. Out of scope; the harness complements the services that already do this well.
Behavioral / systems design / take-home problems in v1.0. Coding only.