A four-phase plan to evolve HTML Docs from single published pages into the default way developers host documentation — internal project wikis an agent keeps current, and public docs sites — built almost entirely on primitives the product already has.
Status: plan approved in principle, implementation not started · Date: Jul 4, 2026 · Grounding: every claim verified against the codebase
01 · Why this, why now
HTML Docs already has the hard parts of a documentation platform: instant CLI/agent publishing, collaborative region-based editing, comments with AI answers, and folders with cascading permissions. What it lacks is exactly three things — and all three are additive, not rewrites:
Gap | Today
No multi-page sites | Every published page is standalone. Folders group docs for permissions, not navigation. "Tabs" exist in the editor but only the root tab publishes.
No search | Zero server-side full-text search. The dashboard filters titles client-side; the chat source picker does a title-only ilike.
No repo bridge | No way to sync a repo's /docs markdown into a living site. The API already accepts markdown, but only one doc per request, with no manifest and no idempotence.
The differentiator against Notion/Confluence (internal) and Mintlify/GitBook (external) is the agent surface that already exists: hdk_ API keys with per-agent attribution, an MCP server, a published skill, and account-level webhooks. Nobody else's docs product treats the agent as a first-class maintainer.
Decisions locked
One "sites" primitive serves both wedges — internal wiki and public docs site are the same thing with different access settings.
App-canonical + repo sync — HTML Docs is the source of truth; markdown syncs in from repos via CLI/CI (one-way; two-way deferred).
Knowledge layer v1 = search + "ask this wiki" — full-text search and grounded AI answers first; wikilinks/backlinks/graph later.
02 · The core idea: a site is a published folder
Folders already have everything a wiki needs for structure and access: nesting (25 deep), collaborator roles (viewer → admin), cascading permissions, visibility, share codes. So we don't invent a new container — we add a thin sites table that points at a folder and owns only the publishing concerns: the URL slug, theme, index page, and (later) a custom domain. Pages are ordinary documents inside the folder, each gaining a page slug and a position.
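As a rough sketch of how thin the new container could be, the shapes below model a site that points at a folder and pages that are ordinary documents with a slug and a position. The interface and function names, the theme fields, and the choice to serve the index page at the bare site URL are all assumptions for illustration, not the actual schema.

```typescript
// Hypothetical shapes for illustration; not the real table definitions.
interface Site {
  id: string;
  folderId: string; // 1:1 pointer at an existing folder
  slug: string; // URL slug of the site
  theme: { accent?: string; logoUrl?: string };
  indexPageId: string | null;
  customDomain: string | null; // later phase
}

// An ordinary document that has gained site fields.
interface SitePage {
  documentId: string;
  siteId: string;
  pageSlug: string;
  pagePosition: number;
  title: string;
}

// Pages of one site in navigation order.
function pagesInOrder(site: Site, docs: SitePage[]): SitePage[] {
  return docs
    .filter((d) => d.siteId === site.id)
    .sort((a, b) => a.pagePosition - b.pagePosition);
}

// Assumption: the index page is served at the bare site URL.
function pageUrl(site: Site, page: SitePage): string {
  return page.documentId === site.indexPageId
    ? `/site/${site.slug}`
    : `/site/${site.slug}/${page.pageSlug}`;
}
```

Permissions and visibility are deliberately absent from `Site`: they stay on the folder, which is the point of reusing it.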
Fig. 1 — Blue marks what's new. The folder tree, permissions, and page-rendering pipeline are untouched; the chrome injection reuses the exact string-injection seam that already adds OG tags and the view tracker to every published page.
Internal wiki vs public docs site is one switch: a private site runs a folder-role check in the serving route (session cookies are already available there) and responds with no-store cache headers; a public site keeps today's CDN caching and gets indexed.
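A minimal sketch of that switch, assuming the serving route already knows the site's access setting and the visitor's folder role (null when there is none). The function names are hypothetical, and the public `s-maxage=60` value is an assumption standing in for whatever caching published pages use today.

```typescript
type SiteAccess = "public" | "private";

// Assumption: any folder role, from viewer up, is enough to read a private site.
function canServe(access: SiteAccess, folderRole: string | null): boolean {
  return access === "public" || folderRole !== null;
}

// Response headers per access mode.
function siteResponseHeaders(access: SiteAccess): Record<string, string> {
  if (access === "private") {
    return {
      // Keep private pages out of shared caches and search indexes.
      "Cache-Control": "private, no-store",
      "X-Robots-Tag": "noindex",
    };
  }
  // Assumed stand-in for the existing CDN caching of public pages.
  return { "Cache-Control": "public, s-maxage=60" };
}
```

Keeping both decisions in two small pure functions makes the cache-leak risk easy to unit-test before any route code changes.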
03 · The four phases
1 · Sites primitive: folder → published multi-page site with nav (effort L)
2 · Repo sync: md → site via CLI/CI, agents maintain (effort M)
3 · Knowledge layer v1: search + "ask this wiki" (effort M/L)
4 · Growth: domains, SEO, teams, imports (effort L, parallel tracks)
Each phase ships independently and is useful on its own.
Phase 1 — Sites primitive
ships the product · effort L
Everything needed to publish a folder as a navigable site at /site/<site>/<page>.
Schema: new sites table (1:1 with a folder); documents gain site_id, page_slug, page_position. Page slugs unique per site; site slugs share the existing namespace with single-doc slugs (checked in both directions).
Routing: the single-slug route becomes a catch-all. One segment behaves exactly as today (zero regression for every existing published page), then falls through to site lookup; two segments resolve site + page.
Shared chrome: sidebar (logo, page tree grouped by subfolder sections, search box), breadcrumbs, prev/next — rendered in Declarative Shadow DOM so the page's own CSS can't bleed into it. Theme knobs (accent, logo) per site; an opt-out for full-viewport pages like dashboards.
Freshness over fan-out: nav is one indexed query per request; a renamed page appears across a 250-page site within the existing 60-second cache window instead of triggering 250 cache invalidations.
Private wikis: role check in the route + private, no-store headers + noindex. Public sites keep CDN caching.
Editor: "Publish as site" on folders, page reordering, "new page in site", and a link picker that inserts links to sibling pages.
Billing: each published site page consumes one existing plan page-slot — no pricing changes needed to ship.
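The routing rule above can be sketched as a pure resolution function: a single segment tries the existing single-doc lookup first and only then falls through to a site; two segments resolve site plus page. The `Lookups` shape and all names here are hypothetical; the real route would query the database rather than in-memory sets.

```typescript
type Resolution =
  | { kind: "doc"; slug: string }
  | { kind: "site-index"; site: string }
  | { kind: "site-page"; site: string; page: string }
  | { kind: "not-found" };

// Hypothetical stand-in for database lookups.
interface Lookups {
  docSlugs: Set<string>; // existing single-doc slugs
  sitePages: Map<string, Set<string>>; // site slug -> page slugs
}

function resolveSegments(segments: string[], db: Lookups): Resolution {
  if (segments.length === 1) {
    const slug = segments[0]!;
    // Existing published pages win, so their behavior is unchanged.
    if (db.docSlugs.has(slug)) return { kind: "doc", slug };
    if (db.sitePages.has(slug)) return { kind: "site-index", site: slug };
    return { kind: "not-found" };
  }
  if (segments.length === 2) {
    const site = segments[0]!;
    const page = segments[1]!;
    if (db.sitePages.get(site)?.has(page)) return { kind: "site-page", site, page };
  }
  return { kind: "not-found" };
}
```

Because site slugs and single-doc slugs share one namespace, the two one-segment branches should never both match; the ordering only matters as a safety net.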
Phase 2 — Repo sync & agent maintenance
the wedge · effort M
The "second brain that stays current" story: a repo's /docs folder becomes a site, and agents keep it alive.
Fig. 2 — One-way sync in; one-way notify out. When someone edits a synced page in the app, the repo owner's webhook fires with the source path so a bot can open an issue or PR.
New sync endpoint takes a manifest of pages (path, slug, title, order, hash, markdown) and upserts idempotently — unchanged files cost nothing, so CI runs are cheap.
Reuses the existing machinery: markdown→HTML conversion, document creation, and content replacement (which already snapshots a version first — in-app edit history survives syncs).
CLI + GitHub Action: npx @html-docs/cli sync ./docs --site acme; directory structure becomes nav sections; README/index becomes the index page.
Agent skill updated so any coding agent can create, sync, and maintain a project wiki with the keys/attribution that already exist.
Known limitation (documented): a full content replacement can detach comments anchored to rewritten regions; region-stable diffing is a later refinement.
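To illustrate the idempotent upsert, here is a minimal planning step that compares an incoming manifest against what is already stored. The manifest fields follow the list above; keying pages by source path, the `StoredPage` shape, and the function name are assumptions made for this sketch.

```typescript
interface ManifestPage {
  path: string;
  slug: string;
  title: string;
  order: number;
  hash: string;
  markdown: string;
}

// Hypothetical view of what the server remembers per synced page.
interface StoredPage {
  path: string;
  hash: string;
}

interface SyncPlan {
  create: string[];
  update: string[];
  unchanged: string[];
  archive: string[];
}

function planSync(manifest: ManifestPage[], stored: StoredPage[]): SyncPlan {
  const storedHashes = new Map<string, string>();
  for (const page of stored) storedHashes.set(page.path, page.hash);

  const plan: SyncPlan = { create: [], update: [], unchanged: [], archive: [] };
  const seen = new Set<string>();
  for (const page of manifest) {
    seen.add(page.path);
    const previous = storedHashes.get(page.path);
    if (previous === undefined) plan.create.push(page.path);
    else if (previous !== page.hash) plan.update.push(page.path);
    else plan.unchanged.push(page.path); // same hash: no write, no new version
  }
  // Pages missing from the manifest are archived, not destroyed.
  for (const page of stored) {
    if (!seen.has(page.path)) plan.archive.push(page.path);
  }
  return plan;
}
```

Running the same manifest twice puts every page in `unchanged` on the second pass, which is the no-op behavior CI runs rely on.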
Phase 3 — Knowledge layer v1
the brain · effort M/L
Full-text search without a reindex pipeline: a generated tsvector column directly on the region table (the text already lives there), kept fresh by Postgres itself on every write. No dirty flags, no cron, and per-region snippets/deep-links come free. Title matches boosted in ranking.
Two surfaces: the site search box in the Phase-1 chrome (public for public sites; role-gated for private wikis), and real workspace search in the dashboard replacing today's title-only filter.
"Ask this wiki": Docsmith gains a retrieval tool over the site's pages and answers with page citations. Ships to members first; a public ask-widget on published sites comes later, gated by plan + owner-billed credits + durable rate limiting (the current in-memory limiter isn't enough for public abuse control).
Embeddings-ready: retrieval hides behind one interface, so swapping Postgres FTS for pgvector later changes the implementation, not the callers.
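One way to picture the single retrieval interface: callers depend only on `Retriever`, and the backing implementation can change underneath them. Everything named here is hypothetical. The in-memory keyword scorer merely stands in for Postgres FTS so the sketch runs on its own, and the interface is synchronous only for brevity (a database-backed version would presumably be async).

```typescript
interface Hit {
  pageSlug: string;
  snippet: string;
  score: number;
}

// The one seam callers depend on.
interface Retriever {
  search(siteId: string, query: string, limit: number): Hit[];
}

interface Page {
  siteId: string;
  slug: string;
  title: string;
  text: string;
}

// Stand-in implementation: substring matching with title hits boosted.
class KeywordRetriever implements Retriever {
  private readonly pages: Page[];
  constructor(pages: Page[]) {
    this.pages = pages;
  }

  search(siteId: string, query: string, limit: number): Hit[] {
    const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
    return this.pages
      .filter((p) => p.siteId === siteId) // site-scoped results only
      .map((p) => {
        const title = p.title.toLowerCase();
        const body = p.text.toLowerCase();
        let score = 0;
        for (const t of terms) {
          if (title.includes(t)) score += 2;
          if (body.includes(t)) score += 1;
        }
        return { pageSlug: p.slug, snippet: p.text.slice(0, 80), score };
      })
      .filter((h) => h.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, limit);
  }
}

// A caller such as "ask this wiki" sees only the interface.
function citePages(retriever: Retriever, siteId: string, question: string): string[] {
  return retriever.search(siteId, question, 3).map((h) => h.pageSlug);
}
```

A swap test then amounts to running the same `citePages` assertions against a second `Retriever` implementation.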
Phase 4 — Growth & monetization
effort L · independent tracks
Track | Approach
Custom domains (first) | Host routing in middleware rewrites docs.acme.com/* to the site; Vercel Domains API + TXT verification from a settings panel. Business plan.
SEO | Per-site sitemap.xml + robots.txt from the same route; private sites are already noindex from Phase 1.
Teams / seats | Real org model is its own initiative. Interim: folder collaborators already give per-site teams; price by sites/pages, defer seats.
Versioned docs | Defer: sync already snapshots every version, so "view page as of vX" is a cheap read-only render later.
Import funnels | Notion / Confluence / GitBook exports are zips of md+html; they funnel through the Phase-2 sync endpoint as guided flows.
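For the custom-domain track, the host rewrite reduces to a small mapping from (host, path) to the internal site path. The sketch below assumes a hypothetical lookup of verified domains to site slugs and is framework-agnostic; wiring it into the actual middleware is left out.

```typescript
// Returns the internal path to rewrite to, or null when the host is not
// a verified custom domain. `domains` maps hostname -> site slug (hypothetical).
function rewriteForHost(
  host: string,
  pathname: string,
  domains: Map<string, string>,
): string | null {
  // Normalize case and drop any port suffix.
  const hostname = host.toLowerCase().split(":")[0]!;
  const siteSlug = domains.get(hostname);
  if (siteSlug === undefined) return null;
  const rest = pathname === "/" ? "" : pathname;
  return `/site/${siteSlug}${rest}`;
}
```

Under these assumptions, docs.acme.com/getting-started maps to /site/acme/getting-started, and requests for unknown hosts pass through untouched.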
04 · Validate these three things before writing Phase-1 code
Risk | How we de-risk it
Chrome injection vs arbitrary user HTML | Published pages are arbitrary HTML: full-viewport dashboards, flex bodies, fixed headers. Prototype the shadow-DOM sidebar against 15–20 real production pages before committing. Fallback: iframe shell (worse for SEO).
Private wikis on a CDN | One wrong cache header leaks a private wiki to strangers. Verify on a Vercel preview that sessions are readable in the serving route and that no-store responses never hit the edge cache.
FTS backfill at scale | Adding a generated column rewrites every region row. Dry-run the migration on a staging copy with real row counts; fallback is a trigger-maintained column with identical query shape.
05 · How we'll know each phase works
P1: a 10-page folder publishes as a site with nav/breadcrumbs/prev-next; a private site 302s logged-out visitors and never appears in the edge cache; every existing single-doc URL, OG image, and print view is regression-tested unchanged.
P2: syncing a real repo's docs twice makes the second run a no-op (hash idempotence); renames/deletes archive rather than destroy; an in-app edit of a synced page fires the repo webhook with the source path.
P3: seeded-corpus searches return ranked, snippeted, site-scoped results; private search is denied without a role; ask-this-wiki cites pages; the retrieval interface passes a swap test.
P4: a test custom domain serves end-to-end with correct canonical URLs.
Plan doc · companion to the approved plan file · sources: two codebase exploration passes + an architecture design pass, all grounded in named files (app/site/[slug]/route.ts, lib/actions/folders.ts, lib/access/folder-access.ts, lib/document-api-ops.ts, app/api/chat/route.ts, lib/billing/plans.ts). Nothing has been implemented yet — say go and Phase 1 starts with the three risk validations.