html-docs · wiki sites · runbook
Everything shipped on branch wiki-sites is live on a preview deployment and fails open until you apply four SQL migrations. This runbook has the exact SQL to run, in order, and a copy-paste test script for each phase.
| Piece | What it does |
|---|---|
| Sites | Publish a folder as a multi-page site: every doc becomes a page at /site/<site>/<page> with a shared sidebar (sections from subfolders, search, prev/next). access: private = internal wiki, gated by folder roles, never cached publicly or indexed. Existing single-doc URLs are untouched. |
| Repo sync | POST /api/v1/sites/:id/sync — a manifest of markdown pages upserts idempotently (hash match = skip). Prune archives, never deletes. Sections auto-create subfolders. |
| Search + AI | Postgres FTS over titles + region content; GET /api/v1/search powers the sidebar search box and agent workspace search; Docsmith gets a searchWiki tool that cites sibling pages. |
| Domains | docs.acme.com → your site, via middleware host routing + the Vercel Domains API (Business plan). Per-site sitemap.xml / robots.txt. |
Every code path checks for the new tables and fails open — you can merge the branch before running any SQL and nothing breaks; sites features simply stay dormant until the migrations exist.
Run each block in the Supabase SQL Editor (dashboard → SQL Editor → paste → Run), or from the worktree with psql:
# from the worktree — each file below also lives in scripts/sql/
cd ~/projects/html-docs/.claude/worktrees/wiki-sites
psql "$POSTGRES_URL" -f scripts/sql/sites.sql
psql "$POSTGRES_URL" -f scripts/sql/sites_sync.sql
psql "$POSTGRES_URL" -f scripts/sql/sites_domains.sql
psql "$POSTGRES_URL" -f scripts/sql/search_fts.sql # see note below first
-- Sites: publish a folder as a multi-page documentation site / wiki.
--
-- A site is a THIN publishing layer over a folder. The folder keeps
-- owning structure + permissions (nesting, folder_collaborators roles,
-- cascade via lib/access/folder-access.ts); the sites row owns only
-- publishing concerns: the public slug, access mode, theme, index page,
-- and (later) a custom domain. Pages are ordinary documents inside the
-- folder tree that carry a per-site page_slug + position.
--
-- Run after folders.sql and documents_publish.sql.
create table if not exists sites (
id uuid primary key default gen_random_uuid(),
folder_id uuid not null unique references folders (id) on delete cascade,
owner_id uuid not null,
slug text not null,
name text not null default 'Untitled site',
-- 'public' → docs-site mode: CDN-cached, indexed
-- 'private' → wiki mode: folder-role gated, no-store, noindex
access text not null default 'public'
check (access in ('public', 'private')),
index_document_id uuid references documents (id) on delete set null,
-- { accent?: string, logo_url?: string, footer_text?: string, chrome?: 'full'|'none' }
theme jsonb not null default '{}'::jsonb,
published_at timestamptz,
created_at timestamptz not null default now(),
updated_at timestamptz not null default now()
);
create unique index if not exists sites_slug_key on sites (slug);
create index if not exists sites_owner_idx on sites (owner_id);
-- Matches the app-wide posture (RLS disabled; access enforced in app code).
alter table sites disable row level security;
-- ── Pages: slug + ordering on documents, scoped to a site ─────────────
--
-- site_id is DENORMALIZED from the folder chain (set when a doc is
-- created in / moved into a site's folder tree, cleared on move-out) so
-- that page-tree fetches, the per-site page_slug uniqueness index, and
-- FTS scoping are single-table queries. Kept in sync by app code in
-- lib/actions/folders.ts + lib/actions/sites.ts; drift is repairable
-- with sites_repair.sql.
alter table documents add column if not exists site_id uuid references sites (id) on delete set null;
alter table documents add column if not exists page_slug text;
alter table documents add column if not exists page_position integer not null default 0;
create unique index if not exists documents_site_page_slug_key
on documents (site_id, page_slug)
where site_id is not null and page_slug is not null;
create index if not exists documents_site_idx
on documents (site_id, page_position)
where site_id is not null;
-- ── Shared slug namespace ──────────────────────────────────────────────
--
-- sites.slug and documents.published_slug both resolve at /site/<seg1>,
-- so a value may exist in at most one of the two tables. App code checks
-- both in checkSlugAvailable(); these triggers are the belt-and-suspenders
-- guarantee under concurrent publishes.
create or replace function sites_slug_not_in_documents() returns trigger
language plpgsql as $$
begin
if exists (select 1 from documents where published_slug = new.slug) then
raise exception 'slug % already used by a published document', new.slug
using errcode = '23505';
end if;
return new;
end $$;
create or replace function documents_slug_not_in_sites() returns trigger
language plpgsql as $$
begin
if new.published_slug is not null
and exists (select 1 from sites where slug = new.published_slug) then
raise exception 'slug % already used by a site', new.published_slug
using errcode = '23505';
end if;
return new;
end $$;
drop trigger if exists sites_slug_cross_check on sites;
create trigger sites_slug_cross_check
before insert or update of slug on sites
for each row execute function sites_slug_not_in_documents();
drop trigger if exists documents_slug_cross_check on documents;
create trigger documents_slug_cross_check
before insert or update of published_slug on documents
for each row execute function documents_slug_not_in_sites();
-- Repo → site sync metadata (Phase 2 of the wiki/docs-sites plan).
--
-- Docs created/updated by `POST /api/v1/sites/:id/sync` are keyed to the
-- source file in the repo so CI runs are idempotent:
-- sync_source_path — repo-relative path of the markdown file
-- sync_hash — content hash from the manifest; matching hash → skip
--
-- Run after sites.sql.
alter table documents add column if not exists sync_source_path text;
alter table documents add column if not exists sync_hash text;
create index if not exists documents_sync_source_idx
on documents (site_id, sync_source_path)
where sync_source_path is not null;
-- Custom domains for sites (Phase 4 of the wiki/docs-sites plan).
--
-- custom_domain — apex or subdomain (e.g. docs.acme.com), unique
-- domain_verified_at — set once the TXT challenge + Vercel domain
-- attachment succeed; middleware only routes
-- verified domains
--
-- Run after sites.sql.
alter table sites add column if not exists custom_domain text;
alter table sites add column if not exists domain_verified_at timestamptz;
create unique index if not exists sites_custom_domain_key
on sites (custom_domain)
where custom_domain is not null;
This adds a trigger + a nullable tsv column (instant), but the GIN index build and the backfill touch every region row. On your current data size it's likely fine to run as-is; if editable_regions has grown into the millions, run the two create index statements separately during low traffic, then loop the backfill below until it reports UPDATE 0.
-- Full-text search over documents (Phase 3 of the wiki/docs-sites plan).
--
-- Region-level indexing: editable_regions is the text store, so the tsv
-- lives there — per-region snippets/anchors come free and no per-doc
-- reindex pipeline is needed. TRIGGER-maintained (not a generated
-- column) so the backfill can run in batches on a large table instead
-- of one giant table rewrite; the query shape is identical either way.
--
-- Run after sites.sql. The backfill loop at the bottom is safe to
-- re-run; run CREATE INDEX steps during low traffic.
-- Strip tags + script/style bodies for indexing. IMMUTABLE so it could
-- back a generated column later without changing callers.
create or replace function html_docs_strip_tags(html text) returns text
language sql immutable as $$
select regexp_replace(
regexp_replace(coalesce(html, ''),
'<(script|style)[^>]*>.*?</\1>', ' ', 'gis'),
'<[^>]+>', ' ', 'g')
$$;
-- ── Region text vectors ────────────────────────────────────────────────
alter table editable_regions add column if not exists tsv tsvector;
create or replace function editable_regions_tsv_update() returns trigger
language plpgsql as $$
begin
new.tsv := to_tsvector('english', left(html_docs_strip_tags(new.content), 100000));
return new;
end $$;
drop trigger if exists editable_regions_tsv_trigger on editable_regions;
create trigger editable_regions_tsv_trigger
before insert or update of content on editable_regions
for each row execute function editable_regions_tsv_update();
-- GIN index (use CONCURRENTLY in production: run this statement alone,
-- outside a transaction).
create index if not exists editable_regions_tsv_idx
on editable_regions using gin (tsv);
-- ── Title vectors ──────────────────────────────────────────────────────
alter table documents add column if not exists title_tsv tsvector;
create or replace function documents_title_tsv_update() returns trigger
language plpgsql as $$
begin
new.title_tsv := to_tsvector('english', coalesce(new.title, ''));
return new;
end $$;
drop trigger if exists documents_title_tsv_trigger on documents;
create trigger documents_title_tsv_trigger
before insert or update of title on documents
for each row execute function documents_title_tsv_update();
create index if not exists documents_title_tsv_idx
on documents using gin (title_tsv);
-- ── Search RPC ─────────────────────────────────────────────────────────
--
-- Ranks per document: 4× title match + best region match; returns the
-- best region's headline as the snippet. Scope with p_site_id (site
-- search) or p_owner_id (workspace search); at least one is required by
-- the API layer.
create or replace function search_pages(
p_query text,
p_site_id uuid default null,
p_owner_id uuid default null,
p_limit int default 20
) returns table (
document_id uuid,
title text,
page_slug text,
site_id uuid,
published_slug text,
score real,
snippet text
)
language sql stable as $$
with q as (select websearch_to_tsquery('english', p_query) as tsq),
region_hits as (
select r.document_id,
max(ts_rank(r.tsv, q.tsq)) as region_rank,
(array_agg(
ts_headline('english',
left(html_docs_strip_tags(r.content), 4000), q.tsq,
'MaxWords=24, MinWords=8, MaxFragments=1')
order by ts_rank(r.tsv, q.tsq) desc
))[1] as best_snippet
from editable_regions r, q
where r.tsv @@ q.tsq
group by r.document_id
),
title_hits as (
select d.id as document_id, ts_rank(d.title_tsv, q.tsq) as title_rank
from documents d, q
where d.title_tsv @@ q.tsq
),
merged as (
select coalesce(rh.document_id, th.document_id) as document_id,
coalesce(th.title_rank, 0) * 4 + coalesce(rh.region_rank, 0) as score,
rh.best_snippet
from region_hits rh
full outer join title_hits th on th.document_id = rh.document_id
)
select d.id, d.title, d.page_slug, d.site_id, d.published_slug,
m.score, m.best_snippet
from merged m
join documents d on d.id = m.document_id
where d.parent_document_id is null
and (p_site_id is null or d.site_id = p_site_id)
and (p_owner_id is null or d.owner_id = p_owner_id)
order by m.score desc
limit greatest(1, least(p_limit, 50))
$$;
-- ── Backfill (batched; re-run until it reports 0 rows) ────────────────
--
-- Each run vectorizes up to 5000 unindexed regions. Loop this statement
-- (psql \watch or a script) until done; it never locks the whole table.
-- update editable_regions
-- set tsv = to_tsvector('english', left(html_docs_strip_tags(content), 100000))
-- where id in (
-- select id from editable_regions where tsv is null limit 5000
-- );
-- update documents
-- set title_tsv = to_tsvector('english', coalesce(title, ''))
-- where id in (
-- select id from documents where title_tsv is null limit 5000
-- );
Then backfill existing rows (re-run each statement until it says UPDATE 0 — every run does up to 5,000 rows and never locks the table):
update editable_regions
set tsv = to_tsvector('english', left(html_docs_strip_tags(content), 100000))
where id in (select id from editable_regions where tsv is null limit 5000);
update documents
set title_tsv = to_tsvector('english', coalesce(title, ''))
where id in (select id from documents where title_tsv is null limit 5000);
Recomputes documents.site_id from the folder tree. You shouldn't need it — the app maintains membership on every move — but it's the fix if a page ever fails to show up in its site.
-- Repair drift in the denormalized documents.site_id column.
--
-- documents.site_id must equal the sites row of the nearest ancestor
-- folder that has one (folder chains are ≤ 25 deep — see
-- lib/access/folder-access.ts MAX_FOLDER_DEPTH). App code maintains it
-- on create/move; run this if a bug or manual DB edit lets it drift.
-- Idempotent; safe to run any time.
with recursive chain as (
-- Every folder mapped to its ancestor path (self first).
select f.id as folder_id, f.id as ancestor_id, 0 as depth
from folders f
union all
select c.folder_id, f.parent_folder_id, c.depth + 1
from chain c
join folders f on f.id = c.ancestor_id
where f.parent_folder_id is not null and c.depth < 25
),
resolved as (
-- Nearest ancestor (lowest depth) that is a site root.
select distinct on (c.folder_id) c.folder_id, s.id as site_id
from chain c
join sites s on s.folder_id = c.ancestor_id
order by c.folder_id, c.depth
)
update documents d
set site_id = r.site_id
from resolved r
where d.folder_id = r.folder_id
and d.site_id is distinct from r.site_id;
-- Clear site_id on docs whose folder chain no longer reaches a site.
update documents d
set site_id = null, page_slug = null
where d.site_id is not null
and (
d.folder_id is null
or not exists (
with recursive up as (
select d.folder_id as fid
union all
select f.parent_folder_id from folders f join up on f.id = up.fid
where f.parent_folder_id is not null
)
select 1 from up join sites s on s.folder_id = up.fid
)
);
Everything below works on the branch preview (PREVIEW=https://v0-collaborative-html-editor-git-wiki-sites-raunaqbns-projects.vercel.app) since it shares the production database — or locally with pnpm dev in the worktree. Get an agent key from Settings → Agent API keys and export it as $KEY.
test-wiki → Publish./site/test-wiki — you should land on the first page with the sidebar: pages listed, the subfolder as a section heading, prev/next at the bottom.# nav chrome + caching are right:
curl -s "$PREVIEW/site/test-wiki" | grep -c "hd-site-chrome" # → 1
curl -sI "$PREVIEW/site/test-wiki" | grep -i cache-control # public, max-age=60, s-w-r
# legacy regression — any EXISTING single-doc page still serves identically:
curl -sI "$PREVIEW/site/<an-existing-old-slug>" | head -3
Then flip the site to Private wiki in the dialog and verify the leak-proofing — this is the critical check:
# logged out: 302 to login, and NEVER a public cache header
curl -sI "$PREVIEW/site/test-wiki" | grep -iE "location|cache-control|x-robots"
# want: location: /auth/login?... cache-control: private, no-store (on page responses)
# run twice and confirm x-vercel-cache is MISS/BYPASS both times — never HIT
curl -sI "$PREVIEW/site/test-wiki" | grep -i x-vercel-cache
# create a site (bootstraps its folder too)
SITE_ID=$(curl -sS -X POST "$PREVIEW/api/v1/sites" \
-H "Authorization: Bearer $KEY" -H 'Content-Type: application/json' \
-d '{"folder_title":"Synced Docs","slug":"synced-docs"}' | jq -r .site.id)
# sync two pages
curl -sS -X POST "$PREVIEW/api/v1/sites/$SITE_ID/sync" \
-H "Authorization: Bearer $KEY" -H 'Content-Type: application/json' \
-d '{"pages":[
{"path":"docs/index.md","slug":"overview","title":"Overview","order":10,
"hash":"sha256:aaa","markdown":"# Overview\nHello from the repo."},
{"path":"docs/deploy.md","slug":"deploy","title":"Deploying","order":20,
"section":"Guides","hash":"sha256:bbb","markdown":"# Deploy\nShip it."}
]}' | jq .counts
# → { "created": 2 }
# run the EXACT same call again — idempotence check
# → { "skipped": 2 } (hashes matched; no versions spammed)
# change one hash+body and re-run → { "updated": 1, "skipped": 1 }
# drop deploy.md from the manifest and add "prune":true → { "archived": 1, ... }
curl -s "$PREVIEW/site/synced-docs/deploy" | grep -o "Ship it"
# site search (public, powers the sidebar box)
curl -s "$PREVIEW/api/v1/search?site=synced-docs&q=deploy" | jq '.results[0]'
# → title, url (/site/synced-docs/deploy), snippet with <b>deploy</b>
# workspace search (agent key; whole account, title + body)
curl -s "$PREVIEW/api/v1/search?q=onboarding" -H "Authorization: Bearer $KEY" | jq '.results | length'
searchWiki and answer citing "Deploying" with the page link.403.VERCEL_API_TOKEN + VERCEL_PROJECT_ID in the project env (one-time; without them the dialog saves the domain and shows a manual-setup note).docs.yourtest.com → Attach → add the shown DNS records → Check until Verified.# after verification, from anywhere:
curl -sI https://docs.yourtest.com/ | head -3 # serves the site index
curl -s https://docs.yourtest.com/sitemap.xml | head -5 # root-relative URLs
curl -s https://docs.yourtest.com/robots.txt # Allow + sitemap pointer
| Behavior | Why / what to expect |
|---|---|
| Nav updates take ≤60s on other pages | Renaming/adding a page revalidates only that page + the index; the rest of the site rides the 60-second edge-cache window. Deliberate — no 250-page invalidation fan-outs. |
| Sync can detach comments | A changed file replaces the whole page content (after a version snapshot), so region-anchored comments on rewritten text may orphan. Documented v1 limitation. |
| Full-viewport pages + sidebar | Dashboards with 100vw/100vh layouts can fight the sidebar's push. Per-site escape hatch: set theme.chrome = "none" (PATCH the site). Worth eyeballing a couple of real published pages. |
| Search returns nothing | Until search_fts.sql is applied AND the backfill loops report 0 — new edits index instantly via trigger, old content needs the backfill. |