html-docs · wiki sites · runbook

Migrations & testing: the wiki-sites branch

Everything shipped on branch wiki-sites is live on a preview deployment and fails open until you apply four SQL migrations. This runbook has the exact SQL to run, in order, and a copy-paste test script for each phase.

Branch wiki-sites (3 commits) · Preview git-wiki-sites preview · PR compare → open PR

01What shipped, in one minute

PieceWhat it does
SitesPublish a folder as a multi-page site: every doc becomes a page at /site/<site>/<page> with a shared sidebar (sections from subfolders, search, prev/next). access: private = internal wiki, gated by folder roles, never cached publicly or indexed. Existing single-doc URLs are untouched.
Repo syncPOST /api/v1/sites/:id/sync — a manifest of markdown pages upserts idempotently (hash match = skip). Prune archives, never deletes. Sections auto-create subfolders.
Search + AIPostgres FTS over titles + region content; GET /api/v1/search powers the sidebar search box and agent workspace search; Docsmith gets a searchWiki tool that cites sibling pages.
Domainsdocs.acme.com → your site, via middleware host routing + the Vercel Domains API (Business plan). Per-site sitemap.xml / robots.txt.
Safe-by-default

Every code path checks for the new tables and fails open — you can merge the branch before running any SQL and nothing breaks; sites features simply stay dormant until the migrations exist.

02The SQL — run these, in this order

Run each block in the Supabase SQL Editor (dashboard → SQL Editor → paste → Run), or from the worktree with psql:

# from the worktree — each file below also lives in scripts/sql/
cd ~/projects/html-docs/.claude/worktrees/wiki-sites
psql "$POSTGRES_URL" -f scripts/sql/sites.sql
psql "$POSTGRES_URL" -f scripts/sql/sites_sync.sql
psql "$POSTGRES_URL" -f scripts/sql/sites_domains.sql
psql "$POSTGRES_URL" -f scripts/sql/search_fts.sql   # see note below first

1 · sites.sql — the core tables safe, additive

scripts/sql/sites.sql table + columns + slug triggers
-- Sites: publish a folder as a multi-page documentation site / wiki.
--
-- A site is a THIN publishing layer over a folder. The folder keeps
-- owning structure + permissions (nesting, folder_collaborators roles,
-- cascade via lib/access/folder-access.ts); the sites row owns only
-- publishing concerns: the public slug, access mode, theme, index page,
-- and (later) a custom domain. Pages are ordinary documents inside the
-- folder tree that carry a per-site page_slug + position.
--
-- Run after folders.sql and documents_publish.sql.

create table if not exists sites (
  id                 uuid primary key default gen_random_uuid(),
  folder_id          uuid not null unique references folders (id) on delete cascade,
  owner_id           uuid not null,
  slug               text not null,
  name               text not null default 'Untitled site',
  -- 'public'  → docs-site mode: CDN-cached, indexed
  -- 'private' → wiki mode: folder-role gated, no-store, noindex
  access             text not null default 'public'
                     check (access in ('public', 'private')),
  index_document_id  uuid references documents (id) on delete set null,
  -- { accent?: string, logo_url?: string, footer_text?: string, chrome?: 'full'|'none' }
  theme              jsonb not null default '{}'::jsonb,
  published_at       timestamptz,
  created_at         timestamptz not null default now(),
  updated_at         timestamptz not null default now()
);

create unique index if not exists sites_slug_key on sites (slug);
create index if not exists sites_owner_idx on sites (owner_id);

-- Matches the app-wide posture (RLS disabled; access enforced in app code).
alter table sites disable row level security;

-- ── Pages: slug + ordering on documents, scoped to a site ─────────────
--
-- site_id is DENORMALIZED from the folder chain (set when a doc is
-- created in / moved into a site's folder tree, cleared on move-out) so
-- that page-tree fetches, the per-site page_slug uniqueness index, and
-- FTS scoping are single-table queries. Kept in sync by app code in
-- lib/actions/folders.ts + lib/actions/sites.ts; drift is repairable
-- with sites_repair.sql.

alter table documents add column if not exists site_id uuid references sites (id) on delete set null;
alter table documents add column if not exists page_slug text;
alter table documents add column if not exists page_position integer not null default 0;

create unique index if not exists documents_site_page_slug_key
  on documents (site_id, page_slug)
  where site_id is not null and page_slug is not null;
create index if not exists documents_site_idx
  on documents (site_id, page_position)
  where site_id is not null;

-- ── Shared slug namespace ──────────────────────────────────────────────
--
-- sites.slug and documents.published_slug both resolve at /site/<seg1>,
-- so a value may exist in at most one of the two tables. App code checks
-- both in checkSlugAvailable(); these triggers are the belt-and-suspenders
-- guarantee under concurrent publishes.

create or replace function sites_slug_not_in_documents() returns trigger
language plpgsql as $$
begin
  if exists (select 1 from documents where published_slug = new.slug) then
    raise exception 'slug % already used by a published document', new.slug
      using errcode = '23505';
  end if;
  return new;
end $$;

create or replace function documents_slug_not_in_sites() returns trigger
language plpgsql as $$
begin
  if new.published_slug is not null
     and exists (select 1 from sites where slug = new.published_slug) then
    raise exception 'slug % already used by a site', new.published_slug
      using errcode = '23505';
  end if;
  return new;
end $$;

drop trigger if exists sites_slug_cross_check on sites;
create trigger sites_slug_cross_check
  before insert or update of slug on sites
  for each row execute function sites_slug_not_in_documents();

drop trigger if exists documents_slug_cross_check on documents;
create trigger documents_slug_cross_check
  before insert or update of published_slug on documents
  for each row execute function documents_slug_not_in_sites();

2 · sites_sync.sql — repo sync columns safe, additive

scripts/sql/sites_sync.sql 2 columns + 1 index
-- Repo → site sync metadata (Phase 2 of the wiki/docs-sites plan).
--
-- Docs created/updated by `POST /api/v1/sites/:id/sync` are keyed to the
-- source file in the repo so CI runs are idempotent:
--   sync_source_path — repo-relative path of the markdown file
--   sync_hash        — content hash from the manifest; matching hash → skip
--
-- Run after sites.sql.

alter table documents add column if not exists sync_source_path text;
alter table documents add column if not exists sync_hash text;

create index if not exists documents_sync_source_idx
  on documents (site_id, sync_source_path)
  where sync_source_path is not null;

3 · sites_domains.sql — custom domains safe, additive

scripts/sql/sites_domains.sql 2 columns + 1 unique index
-- Custom domains for sites (Phase 4 of the wiki/docs-sites plan).
--
--   custom_domain      — apex or subdomain (e.g. docs.acme.com), unique
--   domain_verified_at — set once the TXT challenge + Vercel domain
--                        attachment succeed; middleware only routes
--                        verified domains
--
-- Run after sites.sql.

alter table sites add column if not exists custom_domain text;
alter table sites add column if not exists domain_verified_at timestamptz;

create unique index if not exists sites_custom_domain_key
  on sites (custom_domain)
  where custom_domain is not null;

4 · search_fts.sql — full-text search read the note

Before running

This adds a trigger + a nullable tsv column (instant), but the GIN index build and the backfill touch every region row. On your current data size it's likely fine to run as-is; if editable_regions has grown into the millions, run the two create index statements separately during low traffic, then loop the backfill below until it reports UPDATE 0.

scripts/sql/search_fts.sql strip fn + triggers + indexes + search_pages RPC
-- Full-text search over documents (Phase 3 of the wiki/docs-sites plan).
--
-- Region-level indexing: editable_regions is the text store, so the tsv
-- lives there — per-region snippets/anchors come free and no per-doc
-- reindex pipeline is needed. TRIGGER-maintained (not a generated
-- column) so the backfill can run in batches on a large table instead
-- of one giant table rewrite; the query shape is identical either way.
--
-- Run after sites.sql. The backfill loop at the bottom is safe to
-- re-run; run CREATE INDEX steps during low traffic.

-- Strip tags + script/style bodies for indexing. IMMUTABLE so it could
-- back a generated column later without changing callers.
create or replace function html_docs_strip_tags(html text) returns text
language sql immutable as $$
  select regexp_replace(
           regexp_replace(coalesce(html, ''),
             '<(script|style)[^>]*>.*?</\1>', ' ', 'gis'),
           '<[^>]+>', ' ', 'g')
$$;

-- ── Region text vectors ────────────────────────────────────────────────

alter table editable_regions add column if not exists tsv tsvector;

create or replace function editable_regions_tsv_update() returns trigger
language plpgsql as $$
begin
  new.tsv := to_tsvector('english', left(html_docs_strip_tags(new.content), 100000));
  return new;
end $$;

drop trigger if exists editable_regions_tsv_trigger on editable_regions;
create trigger editable_regions_tsv_trigger
  before insert or update of content on editable_regions
  for each row execute function editable_regions_tsv_update();

-- GIN index (use CONCURRENTLY in production: run this statement alone,
-- outside a transaction).
create index if not exists editable_regions_tsv_idx
  on editable_regions using gin (tsv);

-- ── Title vectors ──────────────────────────────────────────────────────

alter table documents add column if not exists title_tsv tsvector;

create or replace function documents_title_tsv_update() returns trigger
language plpgsql as $$
begin
  new.title_tsv := to_tsvector('english', coalesce(new.title, ''));
  return new;
end $$;

drop trigger if exists documents_title_tsv_trigger on documents;
create trigger documents_title_tsv_trigger
  before insert or update of title on documents
  for each row execute function documents_title_tsv_update();

create index if not exists documents_title_tsv_idx
  on documents using gin (title_tsv);

-- ── Search RPC ─────────────────────────────────────────────────────────
--
-- Ranks per document: 4× title match + best region match; returns the
-- best region's headline as the snippet. Scope with p_site_id (site
-- search) or p_owner_id (workspace search); at least one is required by
-- the API layer.

create or replace function search_pages(
  p_query    text,
  p_site_id  uuid default null,
  p_owner_id uuid default null,
  p_limit    int  default 20
) returns table (
  document_id uuid,
  title       text,
  page_slug   text,
  site_id     uuid,
  published_slug text,
  score       real,
  snippet     text
)
language sql stable as $$
  with q as (select websearch_to_tsquery('english', p_query) as tsq),
  region_hits as (
    select r.document_id,
           max(ts_rank(r.tsv, q.tsq)) as region_rank,
           (array_agg(
              ts_headline('english',
                left(html_docs_strip_tags(r.content), 4000), q.tsq,
                'MaxWords=24, MinWords=8, MaxFragments=1')
              order by ts_rank(r.tsv, q.tsq) desc
            ))[1] as best_snippet
    from editable_regions r, q
    where r.tsv @@ q.tsq
    group by r.document_id
  ),
  title_hits as (
    select d.id as document_id, ts_rank(d.title_tsv, q.tsq) as title_rank
    from documents d, q
    where d.title_tsv @@ q.tsq
  ),
  merged as (
    select coalesce(rh.document_id, th.document_id) as document_id,
           coalesce(th.title_rank, 0) * 4 + coalesce(rh.region_rank, 0) as score,
           rh.best_snippet
    from region_hits rh
    full outer join title_hits th on th.document_id = rh.document_id
  )
  select d.id, d.title, d.page_slug, d.site_id, d.published_slug,
         m.score, m.best_snippet
  from merged m
  join documents d on d.id = m.document_id
  where d.parent_document_id is null
    and (p_site_id is null or d.site_id = p_site_id)
    and (p_owner_id is null or d.owner_id = p_owner_id)
  order by m.score desc
  limit greatest(1, least(p_limit, 50))
$$;

-- ── Backfill (batched; re-run until it reports 0 rows) ────────────────
--
-- Each run vectorizes up to 5000 unindexed regions. Loop this statement
-- (psql \watch or a script) until done; it never locks the whole table.

-- update editable_regions
-- set tsv = to_tsvector('english', left(html_docs_strip_tags(content), 100000))
-- where id in (
--   select id from editable_regions where tsv is null limit 5000
-- );

-- update documents
-- set title_tsv = to_tsvector('english', coalesce(title, ''))
-- where id in (
--   select id from documents where title_tsv is null limit 5000
-- );

Then backfill existing rows (re-run each statement until it says UPDATE 0 — every run does up to 5,000 rows and never locks the table):

update editable_regions
set tsv = to_tsvector('english', left(html_docs_strip_tags(content), 100000))
where id in (select id from editable_regions where tsv is null limit 5000);

update documents
set title_tsv = to_tsvector('english', coalesce(title, ''))
where id in (select id from documents where title_tsv is null limit 5000);

Optional · sites_repair.sql only if drift

Recomputes documents.site_id from the folder tree. You shouldn't need it — the app maintains membership on every move — but it's the fix if a page ever fails to show up in its site.

scripts/sql/sites_repair.sql idempotent repair
-- Repair drift in the denormalized documents.site_id column.
--
-- documents.site_id must equal the sites row of the nearest ancestor
-- folder that has one (folder chains are ≤ 25 deep — see
-- lib/access/folder-access.ts MAX_FOLDER_DEPTH). App code maintains it
-- on create/move; run this if a bug or manual DB edit lets it drift.
-- Idempotent; safe to run any time.

with recursive chain as (
  -- Every folder mapped to its ancestor path (self first).
  select f.id as folder_id, f.id as ancestor_id, 0 as depth
  from folders f
  union all
  select c.folder_id, f.parent_folder_id, c.depth + 1
  from chain c
  join folders f on f.id = c.ancestor_id
  where f.parent_folder_id is not null and c.depth < 25
),
resolved as (
  -- Nearest ancestor (lowest depth) that is a site root.
  select distinct on (c.folder_id) c.folder_id, s.id as site_id
  from chain c
  join sites s on s.folder_id = c.ancestor_id
  order by c.folder_id, c.depth
)
update documents d
set site_id = r.site_id
from resolved r
where d.folder_id = r.folder_id
  and d.site_id is distinct from r.site_id;

-- Clear site_id on docs whose folder chain no longer reaches a site.
update documents d
set site_id = null, page_slug = null
where d.site_id is not null
  and (
    d.folder_id is null
    or not exists (
      with recursive up as (
        select d.folder_id as fid
        union all
        select f.parent_folder_id from folders f join up on f.id = up.fid
        where f.parent_folder_id is not null
      )
      select 1 from up join sites s on s.folder_id = up.fid
    )
  );

03Testing — phase by phase

Everything below works on the branch preview (PREVIEW=https://v0-collaborative-html-editor-git-wiki-sites-raunaqbns-projects.vercel.app) since it shares the production database — or locally with pnpm dev in the worktree. Get an agent key from Settings → Agent API keys and export it as $KEY.

Phase 1 — publish a folder as a site (2 min, in the UI)

# nav chrome + caching are right:
curl -s "$PREVIEW/site/test-wiki" | grep -c "hd-site-chrome"        # → 1
curl -sI "$PREVIEW/site/test-wiki" | grep -i cache-control          # public, max-age=60, s-w-r

# legacy regression — any EXISTING single-doc page still serves identically:
curl -sI "$PREVIEW/site/<an-existing-old-slug>" | head -3

Then flip the site to Private wiki in the dialog and verify the leak-proofing — this is the critical check:

# logged out: 302 to login, and NEVER a public cache header
curl -sI "$PREVIEW/site/test-wiki" | grep -iE "location|cache-control|x-robots"
# want: location: /auth/login?...  cache-control: private, no-store (on page responses)
# run twice and confirm x-vercel-cache is MISS/BYPASS both times — never HIT
curl -sI "$PREVIEW/site/test-wiki" | grep -i x-vercel-cache

Phase 2 — repo sync (the second-brain loop)

# create a site (bootstraps its folder too)
SITE_ID=$(curl -sS -X POST "$PREVIEW/api/v1/sites" \
  -H "Authorization: Bearer $KEY" -H 'Content-Type: application/json' \
  -d '{"folder_title":"Synced Docs","slug":"synced-docs"}' | jq -r .site.id)

# sync two pages
curl -sS -X POST "$PREVIEW/api/v1/sites/$SITE_ID/sync" \
  -H "Authorization: Bearer $KEY" -H 'Content-Type: application/json' \
  -d '{"pages":[
    {"path":"docs/index.md","slug":"overview","title":"Overview","order":10,
     "hash":"sha256:aaa","markdown":"# Overview\nHello from the repo."},
    {"path":"docs/deploy.md","slug":"deploy","title":"Deploying","order":20,
     "section":"Guides","hash":"sha256:bbb","markdown":"# Deploy\nShip it."}
  ]}' | jq .counts
# → { "created": 2 }

# run the EXACT same call again — idempotence check
# → { "skipped": 2 }   (hashes matched; no versions spammed)

# change one hash+body and re-run → { "updated": 1, "skipped": 1 }
# drop deploy.md from the manifest and add "prune":true → { "archived": 1, ... }

curl -s "$PREVIEW/site/synced-docs/deploy" | grep -o "Ship it"

Phase 3 — search + ask-the-wiki needs search_fts.sql + backfill

# site search (public, powers the sidebar box)
curl -s "$PREVIEW/api/v1/search?site=synced-docs&q=deploy" | jq '.results[0]'
# → title, url (/site/synced-docs/deploy), snippet with <b>deploy</b>

# workspace search (agent key; whole account, title + body)
curl -s "$PREVIEW/api/v1/search?q=onboarding" -H "Authorization: Bearer $KEY" | jq '.results | length'

Phase 4 — custom domain (needs Business plan + Vercel env)

# after verification, from anywhere:
curl -sI https://docs.yourtest.com/ | head -3            # serves the site index
curl -s  https://docs.yourtest.com/sitemap.xml | head -5 # root-relative URLs
curl -s  https://docs.yourtest.com/robots.txt            # Allow + sitemap pointer

04Known limits to keep in mind while testing

BehaviorWhy / what to expect
Nav updates take ≤60s on other pagesRenaming/adding a page revalidates only that page + the index; the rest of the site rides the 60-second edge-cache window. Deliberate — no 250-page invalidation fan-outs.
Sync can detach commentsA changed file replaces the whole page content (after a version snapshot), so region-anchored comments on rewritten text may orphan. Documented v1 limitation.
Full-viewport pages + sidebarDashboards with 100vw/100vh layouts can fight the sidebar's push. Per-site escape hatch: set theme.chrome = "none" (PATCH the site). Worth eyeballing a couple of real published pages.
Search returns nothingUntil search_fts.sql is applied AND the backfill loops report 0 — new edits index instantly via trigger, old content needs the backfill.