One repo, two processes
Should the Task Builder stay a separate service in a separate repo, or move into the Utkrushta backend as a monolith? The evidence says: merge the repo, keep the process separate — and the "input files" belong in Supabase, not AWS DocumentDB.
00 Verdict
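- Merge the repo, keep the process separate, following the notifications/ pattern. Multi-minute pipeline jobs don't belong inside the recruiter or candidate API processes.
- Input files go in Supabase, keyed to the competencies table. One new Postgres table, not a new AWS database cluster.
- The recruiter MVP is unaffected by whether the service lives in utkrusht-task or in Utkrushta/task_builder_service/. The two workstreams can proceed in parallel.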
01 The real question(s)
"Separate service vs monolith" bundles two independent decisions that deserve separate answers:
| Question | What it's really about | Answer |
|---|---|---|
| Repo boundary — one repo or two? | Maintainability, code review, the duplicated data-access layer, schema/code drift, "we have to fix the same thing twice." | Merge |
| Process boundary — one deployed process or two? | Runtime behavior: request lifetimes, worker model, failure isolation, deploy cadence, dependency weight. | Keep separate |
Almost all of the pain being felt — same DAO logic in both repos, drift, double maintenance — is repo-boundary pain. None of it requires collapsing the process. Utkrushta itself already proves this: it is one repo shipping four independent runtimes (Flask :4000, FastAPI :9000, notifications :5000, Airflow) off one shared data layer.
02 Current state and the hidden coupling
The two repos look independent. They aren't. The most load-bearing fact found in this investigation:
The schema for all five task-builder tables (conversations, generation_jobs, generated_scenarios, templates, task_template_match) is defined by migrations in Utkrushta/supabase/migrations/. utkrusht-task has zero migrations of its own. Its conversation_repo.py even carries a comment pointing at the other repo: "The shape mirrors the conversations migration in the Utkrushta backend repo."
Utkrushta
- supabase/migrations/: owns the schema (86 files, incl. all 5 task-builder tables)
- shared/daos/: 45 typed DAOs incl. task_dao, competency_dao
- Rule: no raw Supabase outside DAOs

utkrusht-task
- task_builder/ + generators/: writes those same tables
- Raw .table() calls in ~15 files
- create_client() re-implemented 8+ times
- No migrations, no DAO layer

Shared Supabase tables
- tasks · competencies · conversations · generation_jobs · generated_scenarios · templates · positions · task_sessions
This is the worst version of a service boundary: the deployment is split, but the data model is shared with no enforcement. A real service boundary would own its schema and expose an API. What exists today is a monolith's data layer spread across two repos — which is why it hurts.
03 The duplication inventory
Concrete, file-level duplication found between the repos — this is the maintenance tax being paid today:
| What | Utkrushta side | utkrusht-task side |
|---|---|---|
| Access to tasks | shared/daos/task_dao.py + shared/models/task.py (typed: task_blob, criterias, status, eval_info…) | generators/task/persistence.py (raw .table("tasks").insert/update of the exact same fields) |
| Access to competencies | shared/daos/competency_dao.py + model (scope, long_scope, proficiency) | generators/input_files/generator.py:fetch_competencies_from_db (raw select of the same columns) |
| Client init | One BaseDAO with injected client | init_supabase()/create_client() duplicated in 8+ files |
| GitHub tooling | fastapi_service/github_utils.py (~1,000 LOC) | infra/github_utils.py (247 LOC), an identical-signature subset (slugify, create_github_template_repo, create_repo_from_template, upload_files_batch…) |
| Task lifecycle logic | fastapi_service/task_utils/multiagent.py (the deploy half) | multiagent.py (the generate half). Historically one file, now split across repos. |
| E2B templates | e2b_templates/ (runtime deploy set) | infra/e2b/templates/ (build-gate set); overlapping, divergent |
| Env conventions | Identical on both sides already: SUPABASE_URL_APTITUDETESTS[DEV], GITHUB_UTKRUSHTAPPS_TOKEN, PORTKEY_API_KEY, E2B_API_KEY. Merging costs nothing here. | |
04 Why merge the repo
- Schema and writers reunited. The migrations already live in Utkrushta. Moving the writer next to them makes every schema change a single reviewable PR that updates the migration, the DAO, and all consumers atomically.
- The DAO layer already exists; it just isn't being used. ConversationDAO, GenerationJobDAO, GeneratedScenarioDAO, TemplateDAO are the only new classes needed; task_dao and competency_dao are adopted as-is. The 8 copies of create_client() collapse into BaseDAO.
- The monorepo pattern is proven in-house. Four runtimes already share shared/ via PYTHONPATH, each with its own Dockerfile and a path-triggered CI workflow (fastapi_service/** changes build only the FastAPI image). Adding a fifth service directory is following the paved road, not paving a new one.
- One review culture, one import-safety net. Utkrushta's CI import tests and its "no raw Supabase outside DAOs" rule start covering task generation for free.
- Small team economics. Two repos means two dependency updates, two CI setups, two sets of conventions, two places to grep. Nothing about the task builder's domain justifies that overhead: it shares the product's core tables.
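To make the DAO adoption described above concrete, here is a minimal sketch of what one of the four new classes might look like. The real BaseDAO in shared/daos/base.py is not reproduced in this document, so the constructor shape, the get/upsert method names, and the Conversation fields are all assumptions made for illustration. Only the call chain (table().select().eq().execute(), with rows on .data) follows the supabase-py client. A small in-memory stand-in replaces the real client so the sketch runs without credentials.

```python
from dataclasses import dataclass
from typing import Any, Optional


class BaseDAO:
    """Assumed shape: one injected client, one table name per subclass."""
    table_name: str = ""

    def __init__(self, client: Any) -> None:
        self._client = client  # injected once; no create_client() call here

    def _table(self) -> Any:
        return self._client.table(self.table_name)


@dataclass
class Conversation:
    # Hypothetical fields; the real columns are defined by the migration.
    id: str
    testmaker_id: str


class ConversationDAO(BaseDAO):
    table_name = "conversations"

    def get(self, conversation_id: str) -> Optional[Conversation]:
        res = self._table().select("*").eq("id", conversation_id).execute()
        return Conversation(**res.data[0]) if res.data else None

    def upsert(self, conv: Conversation) -> None:
        row = {"id": conv.id, "testmaker_id": conv.testmaker_id}
        self._table().upsert(row).execute()


# --- in-memory stand-in for the Supabase client (illustration only) ---
class _FakeResult:
    def __init__(self, data: list) -> None:
        self.data = data


class _FakeQuery:
    def __init__(self, rows: list) -> None:
        self._rows, self._filters, self._pending = rows, [], None

    def select(self, _cols: str) -> "_FakeQuery":
        return self

    def eq(self, col: str, val: Any) -> "_FakeQuery":
        self._filters.append((col, val))
        return self

    def upsert(self, row: dict) -> "_FakeQuery":
        self._pending = dict(row)
        return self

    def execute(self) -> _FakeResult:
        if self._pending is not None:
            # Replace any existing row with the same id, then append.
            self._rows[:] = [r for r in self._rows if r["id"] != self._pending["id"]]
            self._rows.append(self._pending)
            return _FakeResult([self._pending])
        match = [r for r in self._rows
                 if all(r.get(c) == v for c, v in self._filters)]
        return _FakeResult(match)


class FakeClient:
    def __init__(self) -> None:
        self._tables: dict = {}

    def table(self, name: str) -> _FakeQuery:
        return _FakeQuery(self._tables.setdefault(name, []))
```

With this shape, a pipeline stage would receive a DAO (or the client to build one) from its entrypoint instead of calling create_client() itself, which is what lets the "no raw Supabase outside DAOs" rule be checked mechanically.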
05 Why not merge the process
Folding the Task Builder into the Flask or FastAPI process would be a category error. Its workload is nothing like a request/response API:
| Property | flask / fastapi services | Task Builder |
|---|---|---|
| Request duration | milliseconds–seconds | minutes — a 5-stage LLM pipeline per job |
| Execution model | Gunicorn, multiple workers, stateless | Single process; daemon threads spawning 5 subprocesses per run (python -m generators.*), progress scraped from stdout |
| Concurrency | high, uniform | semaphore-capped at 3 concurrent jobs; in-process job state |
| Deploy tolerance | restart any time | a restart kills in-flight generations — merging means every recruiter-API deploy aborts running builds |
| Failure blast radius | one bad request | a runaway generation job (memory, subprocess, LLM loop) — must not take candidate sessions down with it |
| Dependencies | web stack | dspy, claude-agent-sdk, e2b, PyGithub, sse-starlette — heavy, fast-moving |
| Python | 3.11 / 3.12 images | 3.13 image today |
| External side effects | DB + a few APIs | creates GitHub repos, gists, E2B sandboxes, S3 traces |
notifications/ is a service that lives in the Utkrushta repo but deploys as its own lean container (21 deps, own Dockerfile, own CI workflow). That is exactly the shape the Task Builder should take — co-located for maintainability, isolated for runtime. Note one deliberate difference: notifications keeps its own db layer off shared/; the Task Builder should do the opposite and adopt shared/daos, because sharing the data layer is the entire point of this move.
The decision, side by side
| | A · Status quo (two repos, two services) | B · True monolith (code + process into the backend APIs) | C · Monorepo + own service (chosen) |
|---|---|---|---|
| Schema & writers in one place | ✗ | ✓ | ✓ |
| One DAO layer, zero duplication | ✗ | ✓ | ✓ |
| One repo to maintain | ✗ | ✓ | ✓ |
| Deploys don't kill multi-minute runs | ✓ | ✗ | ✓ |
| Failure isolation from candidate/recruiter APIs | ✓ | ✗ | ✓ |
| Heavy deps stay out of API images | ✓ | ✗ | ✓ |
| Independent release cadence (prompt tweaks ≠ API deploys) | ✓ | ✗ | ✓ (path-triggered CI) |
06 Target state
[Diagram: target architecture. Recoverable labels:]
- /api/task-builder/[...path] proxy → stamps X-Testmaker-Id + X-Internal-Token
- Each existing service: own image + CI
- Task Builder service: own image + CI · single worker · 5-stage pipeline inside
- All services ↓ shared/daos: one schema · one migration ledger · typed access everywhere
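The target state has the proxy stamp two headers on every forwarded request. How the Task Builder validates them is not described in this document, so the following is only a sketch of one reasonable check, under the assumption that X-Internal-Token is compared against a shared secret (such as the INTERNAL_PROXY_TOKEN value) and X-Testmaker-Id is then trusted as the caller identity. The authenticate function is hypothetical.

```python
import hmac
from typing import Mapping, Optional


def authenticate(headers: Mapping[str, str], expected_token: str) -> Optional[str]:
    """Return the testmaker id if the internal token matches, else None.

    Assumes `headers` uses the exact header casing shown; real frameworks
    usually expose a case-insensitive mapping.
    """
    presented = headers.get("X-Internal-Token", "")
    testmaker_id = headers.get("X-Testmaker-Id", "")
    if not expected_token or not testmaker_id:
        return None  # misconfigured secret or missing identity: reject
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(presented.encode(), expected_token.encode()):
        return None
    return testmaker_id
```

Rejecting when the expected token is empty matters: otherwise an unset environment variable would let a request with an empty header through.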
07 What moves, what doesn't
| Component | Disposition | Why |
|---|---|---|
| task_builder/ · generators/ · runtime slice of infra/ · run_pipeline.py + stage entrypoints · cli/ | Moves | The actual runtime of the service (~20k LOC of real source). |
| task_generation_prompts/ (32k LOC data-as-code) | Moves | Prompts benefit from git review; keep as code for now. |
| task_input_parser/ | Moves | The companion doc's 00b_parse_brief step needs it soon. |
| tests/ | Moves | Comes with the code it tests. |
| infra/github_utils.py | Merges | Fold the 247-LOC subset into one shared module with fastapi_service/github_utils.py. |
| E2B templates (both sets) | Merges | One canonical template dir serving build-gate and deploy. |
| data/generated/input_files/ (297 files, 2.7 MB) | → DB | Becomes the competency_input_files table (see §08). |
| data/generated/task_artifacts/ (98 MB of generated repos) | Stays out | Build output, not source. Lives in S3 / the created GitHub repos. |
| trace_ui/ · task_quality/ · task_validation/ | Tools dir / later | Offline eval & debug tooling; not imported by the service. |
| flows/ (pr_review · non_tech) | Decide separately | Own CLIs, own deps (google-api). Not part of this move. |
| paramiko / python-digitalocean deps | Drop | Legacy droplet path, already superseded by E2B. |
08 Input files: a database row, not a database cluster
What they actually are
297 files (2.7 MB) across ~82 competency directories: a competency_*.json + background_forQuestions_*.json pair per competency. They are not hand-authored source — pipeline stage 01 generates them from the Supabase competencies table plus one OpenAI call, writes them to disk, and stages 02/04 read them back by absolute path. Humans occasionally hand-tune a single field (minutes_range) via git commits.
Options compared
| | Supabase JSONB (chosen) | S3 | AWS DocumentDB (rejected) |
|---|---|---|---|
| New infrastructure | none (DB already shared by every service) | none (trace bucket exists) | new cluster, VPC networking, new client + ops surface |
| Cost | ~0 | ~0 | ≈ $200+/month minimum |
| Query / join to competencies | native SQL join | none | no join to your Postgres; it's a separate MongoDB-compatible store |
| Fits the DAO pattern | one more DAO | no | no |
| Human tuning (minutes_range) | admin-UI field instead of git commits | awkward | new tooling needed |
| Proportionality for 2.7 MB of derived JSON | right-sized | fine but unqueryable | overkill by orders of magnitude |
Proposed table
create table competency_input_files (
competency_id uuid references competencies,
proficiency text not null,
competency_json jsonb not null, -- was competency_*.json
background_json jsonb not null, -- was background_forQuestions_*.json
source_hash text, -- hash of the competency row it derives from
generated_at timestamptz not null default now(),
edited_by text, -- preserves the human-tuning workflow
primary key (competency_id, proficiency)
);
Stage 01 becomes an upsert-if-stale (compare source_hash); stages 02/04 read from the DB — or, as a zero-risk first step, a thin shim materializes temp files so the subprocess interfaces don't change at all. data/generated/ then leaves git entirely.
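A minimal sketch of the two mechanisms in the paragraph above: the staleness check behind upsert-if-stale, and the temp-file shim for stages 02/04. Everything here is an assumption layered on the proposed table. The choice of SHA-256 over canonical JSON, the function names, and the suffix parameter used to build file names are illustrative only; the document does not specify how source_hash is computed or how the existing files are named beyond their prefixes.

```python
import hashlib
import json
from pathlib import Path
from typing import Any, Optional


def source_hash(competency_row: dict[str, Any]) -> str:
    """Stable hash of a competencies row (assumed: SHA-256 of canonical JSON)."""
    canonical = json.dumps(
        competency_row, sort_keys=True, separators=(",", ":"), default=str
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(competency_row: dict[str, Any], stored: Optional[dict[str, Any]]) -> bool:
    """True when stage 01 should regenerate: no row yet, or the source changed."""
    return stored is None or stored.get("source_hash") != source_hash(competency_row)


def materialize(stored: dict[str, Any], directory: Path, suffix: str) -> tuple[Path, Path]:
    """Shim: write the two JSONB columns back to disk for stages 02/04.

    `suffix` is a hypothetical stand-in for whatever the current file
    names carry after the competency_ / background_forQuestions_ prefix.
    """
    directory.mkdir(parents=True, exist_ok=True)
    competency_path = directory / f"competency_{suffix}.json"
    background_path = directory / f"background_forQuestions_{suffix}.json"
    competency_path.write_text(json.dumps(stored["competency_json"]), encoding="utf-8")
    background_path.write_text(json.dumps(stored["background_json"]), encoding="utf-8")
    return competency_path, background_path
```

Sorting keys before hashing keeps the hash independent of column order in the query result. One open question this sketch does not settle: what a regeneration should do with a row whose edited_by is set, since a blind upsert would discard the hand-tuned value.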
Deliberately unchanged: task_generation_prompts/ stays as code. Prompt libraries are reviewed, diffed, and reasoned about like source — that's a feature, not a smell. Only the machine-generated cache moves to the DB.
09 Migration plan
1. Ship the recruiter MVP against the existing service, in parallel. The feature targets the stable HTTP contract; do not block it on this migration.
2. Move the runtime slice into Utkrushta/task_builder_service/ with its own Dockerfile, requirements, and a path-triggered CI workflow cloned from the notifications pattern. Goal: deploys from the monorepo with zero behavior change. Same TASK_BUILDER_URL, same INTERNAL_PROXY_TOKEN.
3. Adopt the shared data layer. Add ConversationDAO, GenerationJobDAO, GeneratedScenarioDAO, TemplateDAO (+ models); switch the pipeline to the existing task_dao / competency_dao; delete the 8 create_client() copies. Merge github_utils into one shared module.
4. Move input files to competency_input_files (upsert-if-stale in stage 01, DB reads or temp-file shim in stages 02/04). Remove data/generated/ from git; keep 98 MB of task artifacts in S3/GitHub where they already end up.
5. Align Python and archive. Settle on one version (likely 3.12; verify dspy and claude-agent-sdk first), then archive ngm9/utkrusht-task with a pointer to the new home.
6. (Later, cheap once co-located) Harden the job model: a worker that polls the existing generation_jobs table so in-flight runs survive restarts. The table is already there; only the runner moves out-of-process.
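The hardening in the final step can be sketched as a small polling loop. This is an illustration under stated assumptions, not a design: the columns of generation_jobs are not listed in this document, so the status values ("queued", "running", "succeeded", "failed"), the JobStore interface, and the run_worker signature are all hypothetical, and an in-memory store stands in for the table.

```python
import time
from typing import Callable, Optional, Protocol


class JobStore(Protocol):
    def claim_next(self) -> Optional[dict]: ...
    def finish(self, job_id: str, status: str) -> None: ...


class InMemoryJobStore:
    """Stand-in for a DAO over generation_jobs (hypothetical columns)."""

    def __init__(self, jobs: list[dict]) -> None:
        self.jobs = {j["id"]: dict(j) for j in jobs}

    def claim_next(self) -> Optional[dict]:
        for job in self.jobs.values():
            if job["status"] == "queued":
                job["status"] = "running"   # claim before doing any work
                return dict(job)
        return None

    def finish(self, job_id: str, status: str) -> None:
        self.jobs[job_id]["status"] = status


def run_worker(
    store: JobStore,
    run_pipeline: Callable[[dict], None],
    poll_seconds: float = 1.0,
    max_idle_polls: Optional[int] = None,   # None = poll forever
) -> int:
    """Claim and run jobs until idle for max_idle_polls consecutive polls."""
    processed = 0
    idle = 0
    while max_idle_polls is None or idle < max_idle_polls:
        job = store.claim_next()
        if job is None:
            idle += 1
            time.sleep(poll_seconds)
            continue
        idle = 0
        try:
            run_pipeline(job)
            store.finish(job["id"], "succeeded")
        except Exception:
            store.finish(job["id"], "failed")   # one bad job must not stop the loop
        processed += 1
    return processed
```

Against Postgres the claim would need to be atomic so two workers cannot take the same row, for example with SELECT ... FOR UPDATE SKIP LOCKED inside the claiming transaction. Surviving a restart also requires a rule for rows left in "running" by a dead worker, such as requeueing them after a timeout.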
10 Risks & caveats, stated honestly
| Risk | Assessment & mitigation |
|---|---|
| Utkrushta repo gets heavier | True: +~20k LOC source, +32k prompt data, new CI workflow. Mitigated by per-service requirements and path-triggered builds — the Flask/FastAPI images and pipelines are untouched by task-builder changes. |
| In-process job model stays fragile | The merge doesn't fix it — a crash still loses in-flight runs (DB row survives). Accepted for now; step 6 (queue worker on generation_jobs) is the fix and becomes much cheaper once the code lives beside the backend. |
| Python version skew (3.13 vs 3.11/3.12) | Must be resolved before the move; verify the two heaviest deps (dspy, claude-agent-sdk) on the chosen version. Low risk, but do it first, not last. |
| Single-worker constraint | The thread + semaphore job model only works with one process. Fine as its own container (that's the point); would silently break under Gunicorn multi-worker — the strongest single argument against option B. |
| Migration disrupts the recruiter feature | Avoided by sequencing: the HTTP contract is frozen during the move; the recruiter portal only ever sees TASK_BUILDER_URL. |
A Evidence appendix
Every claim above traces to files read in both repos on 4 Jul 2026:
- Utkrushta/supabase/migrations/20260530091150_create_conversations.sql — header: "Replaces an in-memory SESSIONS dict in task_builder/server.py"; siblings create generation_jobs · generated_scenarios · templates · task_template_match
- utkrusht-task/task_builder/conversation_repo.py:8 — "The shape mirrors the conversations migration in the Utkrushta backend repo"
- Utkrushta/shared/daos/ — ~45 DAOs incl. task_dao.py · competency_dao.py · base.py; ~40 models in shared/models/; rule & exceptions documented in Utkrushta/CLAUDE.md
- utkrusht-task/generators/task/persistence.py · generators/input_files/generator.py · generators/prompts/db_queries.py · generators/scenarios/repository.py · infra/e2b/supabase_helpers.py · gist_manager.py · trace_ui/server.py · task_agent_preflight.py: the 8+ raw create_client() sites
- utkrusht-task/task_builder/jobs.py: daemon-thread runner, BoundedSemaphore(3), "no cross-process queue" consolidation note; task_builder/runner.py: 5 subprocess stages, stdout scraping
- utkrusht-task/infra/github_utils.py (247 LOC) vs Utkrushta/fastapi_service/github_utils.py (~1,000 LOC) — identical signatures
- utkrusht-task/data/generated/input_files/ — 297 files / 2.7 MB / ~82 competency dirs; paths hard-coded in generators/input_files/generator.py:41 and run_pipeline.py:53; human-tuning note at generator.py:555
- utkrusht-task/task_builder/Dockerfile — python:3.13-slim; Utkrushta service images on 3.11/3.12; notifications precedent: Utkrushta/notifications/ (own Dockerfile, 21 deps, own db layer)
- Scale: Utkrushta ≈ shared 30k · flask 21k · fastapi 13k · airflow 37k LOC; utkrusht-task runtime ≈ task_builder 1.5k · generators 11k · infra 7k · prompts 32k · task_artifacts 98 MB (generated)
Utkrusht · "Task Builder — one repo, two processes" · companion to Create a Task via Chat, from a Position. This document changes no application code; it records the architecture decision and the migration plan for the follow-up implementation.