Architecture Decision · Task Builder

One repo, two processes

Should the Task Builder stay a separate service in a separate repo, or move into the Utkrushta backend as a monolith? The evidence says: merge the repo, keep the process separate — and the "input files" belong in Supabase, not AWS DocumentDB.

Scope · decision doc, no code changed
Repos · ngm9/Utkrushta · ngm9/utkrusht-task
Companion · "Create a Task via Chat, from a Position"
Date · 4 Jul 2026

00 Verdict

Repo boundary
Merge ✓ one repo
Move the task-generation code into the Utkrushta monorepo as a new service directory. The schema already lives there; the DAO layer it needs already exists there.
Process boundary
Don't merge ✗ own container
Keep it an independently deployed FastAPI container — the notifications/ pattern. Multi-minute pipeline jobs don't belong inside the recruiter or candidate API processes.
Input files
Supabase JSONB ✗ DocumentDB
297 files / 2.7 MB of JSON derived from the competencies table. One new Postgres table, not a new AWS database cluster.

The companion design doc is unaffected

The recruiter feature ("create a task via chat, from a position") targets the Task Builder's HTTP API contract — proxy, sessions, chat, build, runs. That contract is identical whether the code lives in utkrusht-task or in Utkrushta/task_builder_service/. The two workstreams can proceed in parallel.

01 The real question(s)

"Separate service vs monolith" bundles two independent decisions that deserve separate answers:

Question	What it's really about	Answer
Repo boundary — one repo or two?	Maintainability, code review, the duplicated data-access layer, schema/code drift, "we have to fix the same thing twice."	Merge
Process boundary — one deployed process or two?	Runtime behavior: request lifetimes, worker model, failure isolation, deploy cadence, dependency weight.	Keep separate

Almost all of the pain being felt — same DAO logic in both repos, drift, double maintenance — is repo-boundary pain. None of it requires collapsing the process. Utkrushta itself already proves this: it is one repo shipping four independent runtimes (Flask :4000, FastAPI :9000, notifications :5000, Airflow) off one shared data layer.

02 Current state: the hidden coupling

The two repos look independent. They aren't. The most load-bearing fact found in this investigation:

The Task Builder's database schema already lives in the other repo

Every table the Task Builder writes — conversations, generation_jobs, generated_scenarios, templates, task_template_match — is defined by migrations in Utkrushta/supabase/migrations/. utkrusht-task has zero migrations of its own. Its conversation_repo.py even carries a comment pointing at the other repo: "The shape mirrors the conversations migration in the Utkrushta backend repo."

ngm9/Utkrushta

supabase/migrations/ — owns the schema (86 files, incl. all 5 task-builder tables)
shared/daos/ — 45 typed DAOs incl. task_dao, competency_dao
Rule: no raw Supabase outside DAOs

ngm9/utkrusht-task

task_builder/ + generators/ — writes those same tables
Raw .table() calls in ~15 files
create_client() re-implemented 8+ times
No migrations, no DAO layer

⚠ invisible cross-repo contract — schema defined here, written there, enforced nowhere

↓ ↓

Shared Supabase — "aptitudetests"

tasks · competencies · conversations · generation_jobs · generated_scenarios · templates · positions · task_sessions

Today: two repos, one database. Utkrushta defines every table; utkrusht-task writes to them with raw, untyped clients. A column rename in one repo silently breaks the other.

This is the worst version of a service boundary: the deployment is split, but the data model is shared with no enforcement. A real service boundary would own its schema and expose an API. What exists today is a monolith's data layer spread across two repos — which is why it hurts.
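To make "enforced nowhere" concrete, here is a minimal sketch of how a column rename slips past a raw payload but is caught by a typed row. The `ConversationRow` fields (`testmaker_id`, `status`) are hypothetical stand-ins; the real columns are whatever the `conversations` migration defines.

```python
from dataclasses import asdict, dataclass


# Hypothetical column set for illustration only; the real `conversations`
# columns are defined by the migration in Utkrushta/supabase/migrations/.
@dataclass(frozen=True)
class ConversationRow:
    id: str
    testmaker_id: str
    status: str


def raw_payload() -> dict:
    # Raw style: a plain dict. If `status` was renamed by a migration in the
    # other repo, nothing fails here; the error surfaces only at the database.
    return {"id": "c1", "testmaker_id": "t1", "state": "open"}  # stale key


def typed_payload() -> dict:
    # Typed style: the stale key is rejected as soon as the row is built,
    # which is also where a type checker or import test would flag it.
    return asdict(ConversationRow(id="c1", testmaker_id="t1", state="open"))
```

Calling `typed_payload()` raises `TypeError` before any network call is made, while `raw_payload()` happily returns a dict the database would reject.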

03 The duplication inventory

Concrete, file-level duplication found between the repos — this is the maintenance tax being paid today:

What	Utkrushta side	utkrusht-task side
Access to `tasks`	`shared/daos/task_dao.py` + `shared/models/task.py` (typed: `task_blob`, `criterias`, `status`, `eval_info`…)	`generators/task/persistence.py` — raw `.table("tasks").insert/update` of the exact same fields
Access to `competencies`	`shared/daos/competency_dao.py` + model (`scope`, `long_scope`, `proficiency`)	`generators/input_files/generator.py:fetch_competencies_from_db` — raw select of the same columns
Client init	One `BaseDAO` with injected client	`init_supabase()/create_client()` duplicated in 8+ files
GitHub tooling	`fastapi_service/github_utils.py` (~1,000 LOC)	`infra/github_utils.py` (247 LOC) — an identical-signature subset (`slugify`, `create_github_template_repo`, `create_repo_from_template`, `upload_files_batch`…)
Task lifecycle logic	`fastapi_service/task_utils/multiagent.py` — the deploy half	`multiagent.py` — the generate half. Historically one file, now split across repos.
E2B templates	`e2b_templates/` (runtime deploy set)	`infra/e2b/templates/` (build-gate set) — overlapping, divergent
Env conventions	Identical on both sides already: `SUPABASE_URL_APTITUDETESTS[DEV]`, `GITHUB_UTKRUSHTAPPS_TOKEN`, `PORTKEY_API_KEY`, `E2B_API_KEY` — merging costs nothing here

Why "add DAOs to utkrusht-task" doesn't fix it

Copying the DAO layer into the second repo creates a third thing to keep in sync (two DAO copies + the schema). The only structure that removes the duplication instead of relocating it is one repo where the schema, the DAOs, and every consumer live together.

04 Why merge the repo

Schema and writers reunited. The migrations already live in Utkrushta. Moving the writer next to them makes every schema change a single reviewable PR that updates the migration, the DAO, and all consumers atomically.
The DAO layer already exists; the task-builder code just isn't using it. ConversationDAO, GenerationJobDAO, GeneratedScenarioDAO, and TemplateDAO are the only new classes needed; task_dao and competency_dao are adopted as-is. The 8+ copies of create_client() collapse into BaseDAO.
The monorepo pattern is proven in-house. Four runtimes already share shared/ via PYTHONPATH, each with its own Dockerfile and a path-triggered CI workflow (fastapi_service/** changes build only the FastAPI image). Adding a fifth service directory is following the paved road, not paving a new one.
One review culture, one import-safety net. Utkrushta's CI import tests and its "no raw Supabase outside DAOs" rule start covering task generation for free.
Small team economics. Two repos means two dependency updates, two CI setups, two sets of conventions, two places to grep. Nothing about the task builder's domain justifies that overhead — it shares the product's core tables.
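The DAO consolidation described above can be sketched as one base class that holds an injected client, plus a thin subclass per table. This is an illustration under assumptions, not the contents of `shared/daos/`: the `Conversation` fields are hypothetical, and `FakeClient` is an in-memory stand-in that mimics the `table(...).insert(...).execute()` call chain so the sketch runs without a Supabase project.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Conversation:
    # Hypothetical fields; a real model would mirror the migration.
    id: str
    status: str


class BaseDAO:
    """Holds the single injected client; subclasses only name their table."""

    table_name: str = ""

    def __init__(self, client: Any) -> None:
        self._client = client

    def _table(self) -> Any:
        return self._client.table(self.table_name)


class ConversationDAO(BaseDAO):
    table_name = "conversations"

    def create(self, conv: Conversation) -> Conversation:
        self._table().insert(asdict(conv)).execute()
        return conv


class FakeQuery:
    """In-memory stand-in for a query builder (illustration only)."""

    def __init__(self, rows: list) -> None:
        self._rows = rows
        self._pending = None

    def insert(self, row: dict) -> "FakeQuery":
        self._pending = row
        return self

    def execute(self) -> "FakeQuery":
        if self._pending is not None:
            self._rows.append(self._pending)
            self._pending = None
        return self


class FakeClient:
    def __init__(self) -> None:
        self.tables: dict = {}

    def table(self, name: str) -> FakeQuery:
        return FakeQuery(self.tables.setdefault(name, []))
```

Because the client is injected once, every pipeline stage that needs a table receives a DAO instead of calling `create_client()` itself, and tests can pass a fake as shown.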

05 Why not merge the process

Folding the Task Builder into the Flask or FastAPI process would be a category error. Its workload is nothing like a request/response API:

Property	flask / fastapi services	Task Builder
Request duration	milliseconds–seconds	minutes — a 5-stage LLM pipeline per job
Execution model	Gunicorn, multiple workers, stateless	Single process; daemon threads spawning 5 subprocesses per run (`python -m generators.*`), progress scraped from stdout
Concurrency	high, uniform	semaphore-capped at 3 concurrent jobs; in-process job state
Deploy tolerance	restart any time	a restart kills in-flight generations — merging means every recruiter-API deploy aborts running builds
Failure blast radius	one bad request	a runaway generation job (memory, subprocess, LLM loop) — must not take candidate sessions down with it
Dependencies	web stack	`dspy`, `claude-agent-sdk`, `e2b`, `PyGithub`, `sse-starlette` — heavy, fast-moving
Python	3.11 / 3.12 images	3.13 image today
External side effects	DB + a few APIs	creates GitHub repos, gists, E2B sandboxes, S3 traces
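The execution model in the table (a parent process that spawns a stage as a subprocess and scrapes progress from its stdout) looks roughly like the sketch below. The `PROGRESS <n>` line format and the inline stand-in stage are assumptions for illustration; the real stages are `python -m generators.*` modules whose output format is not shown in this document.

```python
import subprocess
import sys

# Stand-in for a real stage; the PROGRESS line format is an assumption.
FAKE_STAGE = "print('PROGRESS 50'); print('noise'); print('PROGRESS 100')"


def run_stage(code: str) -> list[int]:
    """Run one stage as a child process and scrape progress from stdout."""
    proc = subprocess.Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        text=True,
    )
    progress: list[int] = []
    assert proc.stdout is not None
    for line in proc.stdout:  # blocks until the child prints or exits
        if line.startswith("PROGRESS "):
            progress.append(int(line.split()[1]))
    if proc.wait() != 0:
        raise RuntimeError(f"stage exited with {proc.returncode}")
    return progress
```

The parent holds the only handle to the child, which is why restarting the parent process abandons every in-flight stage.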

The in-repo precedent already exists: notifications/

notifications/ is a service that lives in the Utkrushta repo but deploys as its own lean container (21 deps, own Dockerfile, own CI workflow). That is exactly the shape the Task Builder should take — co-located for maintainability, isolated for runtime. Note one deliberate difference: notifications keeps its own db layer off shared/; the Task Builder should do the opposite and adopt shared/daos, because sharing the data layer is the entire point of this move.

The decision, side by side

	A · Status quo (two repos, two services)	B · True monolith (code + process into the backend APIs)	C · Monorepo + own service (chosen)
Schema & writers in one place	✗	✓	✓
One DAO layer, zero duplication	✗	✓	✓
One repo to maintain	✗	✓	✓
Deploys don't kill multi-minute runs	✓	✗	✓
Failure isolation from candidate/recruiter APIs	✓	✗	✓
Heavy deps stay out of API images	✓	✗	✓
Independent release cadence (prompt tweaks ≠ API deploys)	✓	✗	✓ (path-triggered CI)
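The "path-triggered CI" cell in option C can be pictured as a mapping from changed paths to the images that need rebuilding. The trigger map below is hypothetical (real triggers live in each CI workflow file), and treating `shared/` as a trigger for every service is an assumption of this sketch.

```python
from fnmatch import fnmatch

# Hypothetical trigger map for illustration; not copied from any workflow.
TRIGGERS = {
    "flask": ["flask_service/*", "shared/*"],
    "fastapi": ["fastapi_service/*", "shared/*"],
    "task_builder": ["task_builder_service/*", "shared/*"],
}


def images_to_build(changed_paths: list[str]) -> set[str]:
    """Return the services whose trigger patterns match any changed path."""
    return {
        service
        for service, patterns in TRIGGERS.items()
        if any(fnmatch(path, pat) for path in changed_paths for pat in patterns)
    }
```

Under these assumptions a prompt tweak inside `task_builder_service/` rebuilds only the task-builder image, while a DAO change under `shared/` rebuilds every consumer.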

06 Target state

Utkrushta/
├── flask_service/               # unchanged · recruiter API :4000
├── fastapi_service/             # unchanged · candidate API :9000
├── notifications/               # unchanged · :5000
├── task_builder_service/        # NEW, moved from utkrusht-task
│   ├── task_builder/            # FastAPI app :8000 (1.5k LOC)
│   ├── generators/              # pipeline stages 00–04 (11k LOC)
│   ├── infra/                   # llm_provider · e2b · tracing · prompt_cache
│   ├── task_generation_prompts/ # prompt libraries (stay as code)
│   ├── task_input_parser/       # needed for the 00b_parse_brief step
│   ├── requirements.txt         # own deps: dspy · claude-agent-sdk · e2b…
│   └── task_builder.Dockerfile
├── shared/
│   ├── daos/                    # + conversation_dao · generation_job_dao
│   │                            #   generated_scenario_dao · template_dao
│   └── models/                  # + matching Pydantic models
└── supabase/migrations/         # already owns the schema; nothing moves

recruiter-utkrusht (Next.js)

/api/task-builder/[...path] proxy → stamps X-Testmaker-Id + X-Internal-Token

↓ TASK_BUILDER_URL — unchanged contract

flask :4000 · own image + CI
fastapi :9000 · own image + CI
notifications :5000 · own image + CI
task_builder :8000 ★ · own image + CI · single worker · 5-stage pipeline inside

↓ all via shared/daos ↓

Shared Supabase

one schema · one migration ledger · typed access everywhere

Target: one repo, four+ independently deployed containers, one typed data layer. The recruiter portal's proxy and the companion design doc are untouched — only the code's home changes.
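The proxy contract in the diagram rests on two stamped headers. A minimal sketch of the service-side check follows; the header names come from this document, but the validation logic and the function name `authorize` are assumptions, not the service's actual code.

```python
import hmac
from typing import Mapping, Optional


def authorize(headers: Mapping[str, str], expected_token: str) -> Optional[str]:
    """Return the testmaker id if the proxy headers check out, else None."""
    token = headers.get("X-Internal-Token", "")
    testmaker_id = headers.get("X-Testmaker-Id", "")
    if not expected_token:
        return None  # refuse to run with an unset secret
    # compare_digest avoids leaking the token through timing differences.
    if not hmac.compare_digest(token.encode(), expected_token.encode()):
        return None
    return testmaker_id or None
```

A plain dict is case-sensitive, unlike the header mappings web frameworks provide, so a real handler would read the values through the framework's request object.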

07 What moves, what doesn't

Component	Disposition	Why
`task_builder/` · `generators/` · runtime slice of `infra/` · `run_pipeline.py` + stage entrypoints · `cli/`	Moves	The actual runtime of the service (~20k LOC of real source).
`task_generation_prompts/` (32k LOC data-as-code)	Moves	Prompts benefit from git review; keep as code for now.
`task_input_parser/`	Moves	The companion doc's `00b_parse_brief` step needs it soon.
`tests/`	Moves	Comes with the code it tests.
`infra/github_utils.py`	Merges	Fold the 247-LOC subset into one shared module with `fastapi_service/github_utils.py`.
E2B templates (both sets)	Merges	One canonical template dir serving build-gate and deploy.
`data/generated/input_files/` (297 files, 2.7 MB)	→ DB	Becomes the `competency_input_files` table — see §08.
`data/generated/task_artifacts/` (98 MB of generated repos)	Stays out	Build output, not source. Lives in S3 / the created GitHub repos.
`trace_ui/` · `task_quality/` · `task_validation/`	Tools dir / later	Offline eval & debug tooling; not imported by the service.
`flows/` (pr_review · non_tech)	Decide separately	Own CLIs, own deps (google-api). Not part of this move.
`paramiko` / `python-digitalocean` deps	Drop	Legacy droplet path — already superseded by E2B.

08 Input files: a database row, not a database cluster

What they actually are

297 files (2.7 MB) across ~82 competency directories: a competency_*.json + background_forQuestions_*.json pair per competency. They are not hand-authored source — pipeline stage 01 generates them from the Supabase competencies table plus one OpenAI call, writes them to disk, and stages 02/04 read them back by absolute path. Humans occasionally hand-tune a single field (minutes_range) via git commits.

Name the pattern and the answer falls out

This is a derived cache keyed by (competency, proficiency), whose source of truth is already a Postgres table. Derived data belongs next to its source — in the same database — not in git, and certainly not in a second database technology.

Options compared

	Supabase JSONB (chosen)	S3	AWS DocumentDB (rejected)
New infrastructure	none — DB already shared by every service	none (trace bucket exists)	new cluster, VPC networking, new client + ops surface
Cost	~0	~0	≈ $200+/month minimum
Query / join to `competencies`	native SQL join	none	no join to your Postgres — it's a separate MongoDB-compatible store
Fits the DAO pattern	one more DAO	no	no
Human tuning (`minutes_range`)	admin-UI field instead of git commits	awkward	new tooling needed
Proportionality for 2.7 MB of derived JSON	right-sized	fine but unqueryable	overkill by orders of magnitude

Proposed table

create table competency_input_files (
  competency_id    uuid references competencies,
  proficiency      text not null,
  competency_json  jsonb not null,   -- was competency_*.json
  background_json  jsonb not null,   -- was background_forQuestions_*.json
  source_hash      text,             -- hash of the competency row it derives from
  generated_at     timestamptz not null default now(),
  edited_by        text,             -- preserves the human-tuning workflow
  primary key (competency_id, proficiency)
);

Stage 01 becomes an upsert-if-stale (compare source_hash); stages 02/04 read from the DB — or, as a zero-risk first step, a thin shim materializes temp files so the subprocess interfaces don't change at all. data/generated/ then leaves git entirely.
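The upsert-if-stale step can be sketched as below. This is an illustration under assumptions: `store` is a dict standing in for the `competency_input_files` table, `generate` stands in for the stage-01 LLM call, and hashing the canonical JSON of the source row with SHA-256 is one reasonable choice for `source_hash`, not a decision recorded in this document.

```python
import hashlib
import json
from typing import Callable

Key = tuple[str, str]  # (competency_id, proficiency)


def source_hash(competency_row: dict) -> str:
    """Stable hash of the source row; key order must not affect it."""
    canonical = json.dumps(competency_row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert_if_stale(
    store: dict,
    key: Key,
    competency_row: dict,
    generate: Callable[[dict], dict],
) -> bool:
    """Regenerate only when the stored hash no longer matches the source.

    Returns True if a row was written, False on a cache hit.
    """
    new_hash = source_hash(competency_row)
    existing = store.get(key)
    if existing is not None and existing["source_hash"] == new_hash:
        return False  # cache hit: skip the LLM call entirely
    store[key] = {
        "competency_json": generate(competency_row),
        "source_hash": new_hash,
    }
    return True
```

A real implementation would also have to decide whether hand-tuned fields such as `minutes_range` survive a regeneration; this sketch simply overwrites the row.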

Deliberately unchanged: task_generation_prompts/ stays as code. Prompt libraries are reviewed, diffed, and reasoned about like source — that's a feature, not a smell. Only the machine-generated cache moves to the DB.
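For the zero-risk first step in this section, the temp-file shim for stages 02/04 could look roughly like this. The file-name prefixes follow the `competency_*.json` / `background_forQuestions_*.json` pattern named earlier; the `slug` argument, the directory layout, and the function name are assumptions of this sketch.

```python
import json
import tempfile
from pathlib import Path


def materialize_input_files(row: dict, slug: str) -> Path:
    """Write one table row back out as the two JSON files the stages expect.

    `row` is assumed to carry the competency_json and background_json
    columns of competency_input_files as already-decoded Python objects.
    """
    out_dir = Path(tempfile.mkdtemp(prefix="input_files_"))
    (out_dir / f"competency_{slug}.json").write_text(
        json.dumps(row["competency_json"]), encoding="utf-8"
    )
    (out_dir / f"background_forQuestions_{slug}.json").write_text(
        json.dumps(row["background_json"]), encoding="utf-8"
    )
    return out_dir
```

The caller passes the returned directory to the subprocess as before and removes it when the run finishes, so the stage interfaces stay unchanged.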

09 Migration plan

1. Ship the recruiter MVP against the existing service, in parallel. The feature targets the stable HTTP contract; do not block it on this migration.
2. Move the runtime slice into Utkrushta/task_builder_service/ with its own Dockerfile, requirements, and a path-triggered CI workflow cloned from the notifications pattern. Goal: the service deploys from the monorepo with zero behavior change. Same TASK_BUILDER_URL, same INTERNAL_PROXY_TOKEN.
3. Adopt the shared data layer. Add ConversationDAO, GenerationJobDAO, GeneratedScenarioDAO, TemplateDAO (+ models); switch the pipeline to the existing task_dao/competency_dao; delete the 8+ create_client() copies. Merge github_utils into one shared module.
4. Move input files to competency_input_files (upsert-if-stale in stage 01, DB reads or temp-file shim in stages 02/04). Remove data/generated/ from git; keep the 98 MB of task artifacts in S3/GitHub, where they already end up.
5. Align Python and archive. Settle on one version (likely 3.12; verify dspy and claude-agent-sdk first), then archive ngm9/utkrusht-task with a pointer to the new home.
6. (Later, cheap once co-located) Harden the job model: a worker that polls the existing generation_jobs table so in-flight runs survive restarts. The table is already there; only the runner moves out-of-process.
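The hardening in the last step hinges on a worker claiming a queued row atomically. The sketch below uses an in-memory SQLite table so it runs anywhere; the `status` and `worker` columns are hypothetical, and on Postgres the usual idiom for the same claim is `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction.

```python
import sqlite3
from typing import Optional


def claim_next_job(conn: sqlite3.Connection, worker_id: str) -> Optional[int]:
    """Move one queued job to running and return its id, or None."""
    row = conn.execute(
        "SELECT id FROM generation_jobs WHERE status = 'queued' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # The status guard makes the claim a no-op if another worker won the race.
    cur = conn.execute(
        "UPDATE generation_jobs SET status = 'running', worker = ? "
        "WHERE id = ? AND status = 'queued'",
        (worker_id, row[0]),
    )
    conn.commit()
    return row[0] if cur.rowcount == 1 else None


def demo() -> list:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE generation_jobs (id INTEGER PRIMARY KEY, status TEXT, worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO generation_jobs (status) VALUES (?)", [("queued",), ("queued",)]
    )
    # Two jobs are claimed in order; the third poll finds nothing queued.
    return [claim_next_job(conn, "w1") for _ in range(3)]
```

Because the job's state lives in the row rather than in process memory, a restarted worker can find `running` rows it owns and resume or requeue them.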

10 Risks & caveats, stated honestly

Risk	Assessment & mitigation
Utkrushta repo gets heavier	True: +~20k LOC source, +32k prompt data, new CI workflow. Mitigated by per-service requirements and path-triggered builds — the Flask/FastAPI images and pipelines are untouched by task-builder changes.
In-process job model stays fragile	The merge doesn't fix it — a crash still loses in-flight runs (DB row survives). Accepted for now; step 6 (queue worker on `generation_jobs`) is the fix and becomes much cheaper once the code lives beside the backend.
Python version skew (3.13 vs 3.11/3.12)	Must be resolved before the move; verify the two heaviest deps (`dspy`, `claude-agent-sdk`) on the chosen version. Low risk, but do it first, not last.
Single-worker constraint	The thread + semaphore job model only works with one process. Fine as its own container (that's the point); would silently break under Gunicorn multi-worker — the strongest single argument against option B.
Migration disrupts the recruiter feature	Avoided by sequencing: the HTTP contract is frozen during the move; the recruiter portal only ever sees `TASK_BUILDER_URL`.

A Evidence appendix

Every claim above traces to files read in both repos on 4 Jul 2026:

Utkrushta/supabase/migrations/20260530091150_create_conversations.sql — header: "Replaces an in-memory SESSIONS dict in task_builder/server.py"; siblings create generation_jobs · generated_scenarios · templates · task_template_match
utkrusht-task/task_builder/conversation_repo.py:8 — "The shape mirrors the conversations migration in the Utkrushta backend repo"
Utkrushta/shared/daos/ — ~45 DAOs incl. task_dao.py · competency_dao.py · base.py; ~40 models in shared/models/; rule & exceptions documented in Utkrushta/CLAUDE.md
utkrusht-task/generators/task/persistence.py · generators/input_files/generator.py · generators/prompts/db_queries.py · generators/scenarios/repository.py · infra/e2b/supabase_helpers.py · gist_manager.py · trace_ui/server.py · task_agent_preflight.py — the 8+ raw create_client() sites
utkrusht-task/task_builder/jobs.py — daemon-thread runner, BoundedSemaphore(3), "no cross-process queue" consolidation note; task_builder/runner.py — 5 subprocess stages, stdout scraping
utkrusht-task/infra/github_utils.py (247 LOC) vs Utkrushta/fastapi_service/github_utils.py (~1,000 LOC) — identical signatures
utkrusht-task/data/generated/input_files/ — 297 files / 2.7 MB / ~82 competency dirs; paths hard-coded in generators/input_files/generator.py:41 and run_pipeline.py:53; human-tuning note at generator.py:555
utkrusht-task/task_builder/Dockerfile — python:3.13-slim; Utkrushta service images on 3.11/3.12; notifications precedent: Utkrushta/notifications/ (own Dockerfile, 21 deps, own db layer)
Scale: Utkrushta ≈ shared 30k · flask 21k · fastapi 13k · airflow 37k LOC; utkrusht-task runtime ≈ task_builder 1.5k · generators 11k · infra 7k · prompts 32k · task_artifacts 98 MB (generated)

Utkrusht · "Task Builder — one repo, two processes" · companion to Create a Task via Chat, from a Position. This document changes no application code; it records the architecture decision and the migration plan for the follow-up implementation.