html-docs · design system 2.0

The quiet system

An audit of what made our published documents read as machine-made, and a revamped system that fixes it — with a worked example for every page type. Switch tabs to see each archetype.

July 2026 · 6 example pages + audit · self-contained, CSS-only tabs

the audit

Six tics made every document look generated. Here is what replaces them.

The old system had good bones — archetypes, self-contained rendering, real diagrams. But its mandates for a “memorable signature,” dramatic scale, and atmospheric backgrounds pushed every page toward the same recognizable AI look.

Scope skill design system + recent published docs · Verdict keep the archetypes, replace the surface language

01 What the audit found

Before: the tic | After: the rule
Gradient-filled headline text (background-clip:text) | Ink at scale. Size is the emphasis, never a fill effect.
Pill badges on everything (border-radius:999px, filled) | Pills only for status, outlined, square-cornered.
Every block in a rounded, drop-shadowed card | Hairline rules and whitespace. A card must earn its border.
Centered hero: eyebrow + huge headline + subtitle + emoji | Left-aligned title block; one eyebrow per document, at the top.
Balanced five-color palettes; status hues used decoratively | Ink does the work. One accent, under 10% of the surface.
The same warm radial-gradient wash behind every document | Flat paper. Variety comes from type and accent, not texture.

02 Before / after

Before — reads as generated
🚀 Q3 Platform Review
Supercharging Our Infrastructure
Innovation Velocity Impact
After — reads as written
Platform · Q3 review
Infrastructure held at 4× traffic. Two systems didn’t.

What scaled, what buckled, and the three fixes we’re funding next quarter.

30 Jun 2026 · platform team · internal

03 The ten rules

  1. Ink does the work. Color annotates; it never carries the hierarchy.
  2. One accent per document, used on less than ~10% of the surface: eyebrow, section numbers, links, one rule.
  3. Hairlines over shadows. Corner radius 4px or less. Shadows only on genuinely interactive elements.
  4. A card must earn its border. The default container is whitespace and a horizontal rule.
  5. Left-align. No centered heroes. Titles state a finding, not a theme.
  6. Hierarchy comes from the type scale, not from boxes, backgrounds, or weight-800 sans.
  7. Pills are for status only — outlined, uppercase, 11px. Everything else is plain text.
  8. No gradient text, no emoji as icons, no decorative gradients anywhere.
  9. Diagrams are monochrome plus the accent. Every box and arrow labeled; a legend only past two colors.
  10. Status colors mean status. Green/amber/red appear only when something is actually good, at risk, or broken.

04 Tokens

Neutrals — shared by every document

Paper
#FBFAF7
Ink
#1C1A17
Soft
#55514B
Faint
#8B867D
Hairline
#E6E2D9

Accents — one per document, never together

Clay
#BC5B3C
Slate
#35618E
Teal
#2F6B75
Plum
#6D5586
Bronze
#8A6A2F
Sage
#4C7350
Sepia
#7A4A3A

Each tab in this showcase uses a different accent — that is the entire per-document theming budget. Same paper, same ink, same hairlines everywhere.

Type scale — Georgia display over system sans

42 / display · The quiet system
21.5 / section · Proposed design
16 / body · Body text stays 15–16px, line-height 1.6, max 640px measure.
11 / label · Section label · uppercase · letterspaced
12.5 / mono · p95 184ms · used for numbers, dates, identifiers
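The Python sketch below makes the theming budget concrete. It turns the tokens above into one CSS :root block. It assumes the tokens ship as CSS custom properties. The property names (--paper, --accent, --size-display and so on) and the build_theme helper are hypothetical.

```python
# Sketch: emit one document's theme as CSS custom properties.
# Hex values and sizes come from the token tables above; the --name
# property names and build_theme() itself are hypothetical.

NEUTRALS = {
    "paper": "#FBFAF7",
    "ink": "#1C1A17",
    "soft": "#55514B",
    "faint": "#8B867D",
    "hairline": "#E6E2D9",
}

ACCENTS = {
    "clay": "#BC5B3C",
    "slate": "#35618E",
    "teal": "#2F6B75",
    "plum": "#6D5586",
    "bronze": "#8A6A2F",
    "sage": "#4C7350",
    "sepia": "#7A4A3A",
}

# role -> size in px, from the type scale
TYPE_SCALE = {
    "display": 42,
    "section": 21.5,
    "body": 16,
    "label": 11,
    "mono": 12.5,
}


def build_theme(accent: str) -> str:
    """Return a :root block with the shared neutrals plus exactly one accent."""
    if accent not in ACCENTS:
        raise ValueError(f"unknown accent {accent!r}; pick one of {sorted(ACCENTS)}")
    lines = [":root {"]
    for name, hex_value in NEUTRALS.items():
        lines.append(f"  --{name}: {hex_value};")
    # One accent per document: only a single --accent variable is ever emitted.
    lines.append(f"  --accent: {ACCENTS[accent]};")
    for role, px in TYPE_SCALE.items():
        lines.append(f"  --size-{role}: {px:g}px;")
    lines.append("}")
    return "\n".join(lines)


print(build_theme("slate"))
```

build_theme accepts one accent name, so the generator cannot emit a second accent. Rule 2 is therefore enforced by the function's shape rather than by review.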

RFC-041 · platform

Take image processing off the request path

Uploads block on resize and transcode today. This RFC proposes returning 202 immediately and processing asynchronously through a queue.

In review · Owner platform team · Updated 30 Jun 2026 · Reviewers 2 of 3 approved

TL;DR

Upload p95 is 3.1s because resize/transcode runs inline. Move processing to a queue and worker pool, return 202 Accepted with a status URL, and target p95 under 400ms. Cost: one new managed component (SQS). Rollout is dual-write behind a flag, two weeks.

01 Problem

Every image upload synchronously resizes to four variants and transcodes to WebP before responding. At current volume (~140k uploads/day) this holds a request thread for 2–4 seconds, dominates API p95, and couples upload availability to ImageMagick failures — 31% of our 5xx responses in June traced to processing errors, not upload errors.
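A rough Little's-law estimate shows what holding a request thread for 2 to 4 seconds costs. It rests on two assumptions: uploads arrive evenly across the day, and the 40 msg/s pipeline peak quoted in the queue decision equals 40 uploads/s at peak.

```latex
\begin{aligned}
\lambda_{\text{avg}} &= \frac{140\,000\ \text{uploads}}{86\,400\ \text{s}} \approx 1.62\ \text{uploads/s} \\
L_{\text{avg}} &= \lambda_{\text{avg}}\, W \approx 1.62 \times (2\ \text{to}\ 4\ \text{s}) \approx 3.2\ \text{to}\ 6.5\ \text{threads} \\
L_{\text{peak}} &\approx 40\ \text{uploads/s} \times (2\ \text{to}\ 4\ \text{s}) = 80\ \text{to}\ 160\ \text{threads}
\end{aligned}
```

On an average day, processing pins only a handful of threads. At peak, under these assumptions, it pins on the order of a hundred.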

02 Goals and non-goals

Goals

  • Upload p95 under 400ms
  • Processing failures never fail the upload
  • Client can poll or subscribe for readiness

Non-goals

  • Changing the variant set or formats
  • On-the-fly transformation API
  • Migrating existing stored images

03 Proposed design

The API stores the original and enqueues a job. Workers pull, process, write variants, and flip the asset’s status. The client gets a status_url in the 202 response.

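[Diagram: Client, API, Queue (SQS), Worker pool, Object storage. The client uploads to the API, which stores the original in object storage and enqueues a job. Workers pull the job and write variants to object storage. The API answers the client with 202 + status_url.]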
Fig. 1 — After. Blue marks the new asynchronous path; the synchronous path ends at “store original.”
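The minimal, in-process Python sketch below models the flow in Fig. 1. It is useful for reasoning about which steps stay on the request path. A queue.Queue stands in for SQS, and dicts stand in for the asset table and object storage. The following are all hypothetical:

- handle_upload and run_worker_once
- the variant names
- the /assets/<id>/status URL shape

```python
import queue
import uuid

# In-process stand-ins (hypothetical): a Queue plays SQS, dicts play the
# asset table and object storage.
jobs: "queue.Queue[str]" = queue.Queue()
assets: dict[str, dict] = {}      # asset id -> {"status": ..., "variants": [...]}
storage: dict[str, bytes] = {}    # object key -> bytes

# The RFC says "four variants"; these names are assumed.
VARIANTS = ["thumb", "small", "medium", "large"]


def handle_upload(data: bytes) -> tuple[int, dict]:
    """Store the original, enqueue a job, and answer 202 without processing."""
    asset_id = uuid.uuid4().hex
    storage[f"{asset_id}/original"] = data      # the synchronous path ends here
    assets[asset_id] = {"status": "processing", "variants": []}
    jobs.put(asset_id)                          # enqueue
    return 202, {"id": asset_id, "status_url": f"/assets/{asset_id}/status"}


def run_worker_once() -> None:
    """Pull one job, write its variants, then flip the asset's status."""
    asset_id = jobs.get()
    original = storage[f"{asset_id}/original"]
    for name in VARIANTS:
        # A real worker would resize and transcode to WebP; this just copies bytes.
        storage[f"{asset_id}/{name}.webp"] = original
    assets[asset_id] = {"status": "ready", "variants": list(VARIANTS)}
    jobs.task_done()


status, body = handle_upload(b"fake image bytes")
print(status, assets[body["id"]]["status"])   # 202 processing
run_worker_once()
print(assets[body["id"]]["status"])           # ready
```

handle_upload performs only a storage write and an enqueue. Its latency therefore no longer depends on resize or transcode time. A failing worker leaves the asset in "processing" instead of failing the upload.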

04 Alternatives considered

Option | Upload p95 | New infra | Verdict
Async queue + workers | ~350ms | SQS + worker deploy | Proposed
Bigger API instances | ~1.9s | none | Cost scales with the wrong axis; failures still coupled
Process in Lambda on S3 event | ~350ms | Lambda + IAM surface | Viable; weaker retry semantics, cold-start tail

05 Rollout

06 Risks

Risk | Mitigation
Clients that assume variants exist at 200 | Audit found 3 call sites; all migrate to status_url polling this sprint
Queue backlog during traffic spikes | Worker autoscale on queue depth; alert at 5-minute lag
Lost jobs | SQS at-least-once + idempotent workers; DLQ with replay runbook
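The "Lost jobs" mitigation relies on two properties working together:

- At-least-once delivery means a job can arrive twice, so the worker must be idempotent.
- A job that never succeeds has to stop somewhere.

The minimal in-memory Python sketch below assumes a redrive threshold of 3 (hypothetical). On SQS, the queue's redrive policy performs the DLQ move, so only the idempotency check would live in worker code. The Message shape and drain() are also hypothetical.

```python
from collections import deque
from dataclasses import dataclass

MAX_RECEIVES = 3  # assumed redrive threshold


@dataclass
class Message:
    job_id: str
    receive_count: int = 0


def drain(main: deque, dlq: list, done: set, process) -> None:
    """Deliver messages until the main queue is empty.

    A job whose id is already in `done` is acknowledged without reprocessing,
    so a duplicate delivery has no extra effect. A job that keeps failing is
    moved to the DLQ after MAX_RECEIVES attempts instead of retrying forever.
    """
    while main:
        msg = main.popleft()
        msg.receive_count += 1
        if msg.job_id in done:
            continue                  # duplicate delivery: already processed, just ack
        try:
            process(msg.job_id)
            done.add(msg.job_id)      # record success before acking
        except Exception:
            if msg.receive_count >= MAX_RECEIVES:
                dlq.append(msg)       # park it for the replay runbook
            else:
                main.append(msg)      # like a visibility timeout expiring: redeliver


calls = []


def process(job_id: str) -> None:
    calls.append(job_id)
    if job_id == "bad":
        raise RuntimeError("transcode failed")


main = deque([Message("a"), Message("a"), Message("bad")])  # "a" is delivered twice
dlq, done = [], set()
drain(main, dlq, done, process)
print(calls)                    # ['a', 'bad', 'bad', 'bad']
print([m.job_id for m in dlq])  # ['bad']
```

In production, the done-set would need to be durable, for example the asset's own status field, so that idempotency survives a worker restart.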

platform health · week 26

Steady week. Ingest error rate is the one thing to watch.

Operational · Updated 30 Jun 2026, 09:00 PT · Window Jun 16 – 29

Requests: 41.2M (+6.4% wow)
p95 latency: 184ms (−12ms wow)
Error rate: 0.42% (+0.09pt wow)
Cost / 1k req: $0.86 (flat)

01 p95 latency, last 14 days

[Chart: p95 latency, Jun 16 – Jun 29; y-axis 160–240ms; latest value 184ms.]
Milliseconds. The drop after Jun 20 is the CDN cache-key fix shipping.

02 Error rate by service

ingest: 1.8%
api: 0.9%
search: 0.5%
auth: 0.3%
billing: 0.2%

Ingest is elevated because retries against the legacy image path double-count during the RFC-041 rollout. Expected to normalize when the inline path is removed in week 28.

03 Active alerts

Alert | Service | Since | State
Queue lag over 5 min | ingest | Jun 27 | Investigating
Cert expires in 20 days | edge | Jun 29 | Scheduled
Elevated 429s, one tenant | api | Jun 28 | Resolved

decision · infrastructure

Queue backend for the image pipeline

Three candidates evaluated against six criteria for RFC-041. Weighted toward operational burden — this team runs no 24/7 on-call for infra.

Evaluated Jun 2026 · Deciders platform team · Status decided

Recommendation

SQS. It concedes throughput ceiling and strict ordering — neither of which this workload needs — and wins everywhere we actually feel pain: zero ops, native DLQ, per-message visibility timeouts. Confidence: high.

01 Scoring matrix

Criterion | SQS · recommended | Redis Streams | Kafka
Ops burden (×2 weight) | ●●●●● | ●●●○○ | ○○○○
Retry / DLQ semantics | ●●●●● | ●●○○○ | ●●●○○
Throughput ceiling | ●●●○○ | ●●●● | ●●●●●
Ordering guarantees | ●●○○○ | ●●●● | ●●●●●
Team familiarity | ●●●● | ●●●● | ●●○○○
Cost at our scale | ●●●● ~$40/mo | ●●●○○ ~$120/mo | ○○○○ ~$900/mo
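The ×2 weight on ops burden is easiest to see as arithmetic, so here is a minimal Python sketch of a weighted matrix. The weights mirror the table: ops burden counts 2, and every other criterion counts 1. The weight of 1 for the unmarked rows is an assumption. option_a, option_b and their 1–5 scores are placeholders, not the dot ratings above.

```python
# Weights follow the matrix above; the unmarked rows are assumed to be 1.
WEIGHTS = {
    "ops_burden": 2,
    "retry_dlq": 1,
    "throughput": 1,
    "ordering": 1,
    "familiarity": 1,
    "cost": 1,
}


def weighted_total(scores: dict[str, int]) -> int:
    """Sum each 1-5 score times its criterion weight."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing scores for {sorted(missing)}")
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)


def rank(options: dict[str, dict[str, int]]) -> list[tuple[str, int]]:
    """Return (option, total) pairs, best first."""
    totals = [(name, weighted_total(s)) for name, s in options.items()]
    return sorted(totals, key=lambda pair: pair[1], reverse=True)


# Placeholder scores for illustration only.
options = {
    "option_a": {"ops_burden": 5, "retry_dlq": 4, "throughput": 3,
                 "ordering": 2, "familiarity": 4, "cost": 4},
    "option_b": {"ops_burden": 2, "retry_dlq": 3, "throughput": 5,
                 "ordering": 5, "familiarity": 2, "cost": 1},
}
print(rank(options))  # [('option_a', 27), ('option_b', 20)]
```

Without the doubling, the same placeholder scores total 22 and 18. Here the weight widens the gap rather than reversing the order.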

02 Where each one wins, and what we give up

SQS

Recommended

Zero servers, IAM-native, DLQ and visibility timeouts out of the box. Trade-off we accept: ~3k msg/s soft ceiling per queue and best-effort ordering. The pipeline is embarrassingly parallel and peaks at 40 msg/s, so that ceiling leaves about 75× headroom, close to two orders of magnitude.

Redis Streams

Fastest option and we already run Redis for caching. But consumer-group DLQ behavior is hand-rolled, and it couples job durability to a cache instance we currently treat as disposable.

Kafka

The right answer at 100× our volume or with multiple consumers replaying history. Today it is three brokers of ops burden for guarantees nothing downstream consumes.

roadmap · search relaunch

Search relaunch, H2 2026

On track · Owner discovery team · Range May – Nov 2026 · Updated 30 Jun

Goal: Replace Elasticsearch 6 with hybrid lexical + vector search
Where we are: Indexing pipeline in shadow mode on 10% of writes
Next milestone: Full corpus reindexed · Aug 15

Timeline: Foundations (May – mid Jun) · Indexing (mid Jun – Aug; now, week 3 of 9) · Query & ranking (Sep – Oct) · GA rollout (Nov)

01 Phases

Foundations

Done · May 5 – Jun 13
  • Cluster provisioned; embedding model selected (bge-small, self-hosted)
  • Golden query set — 400 queries with judged relevance
  • Offline eval harness reporting nDCG@10 per experiment
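For readers new to the metric, here is a minimal Python sketch of nDCG@10 over a judged query set like the golden set above. It uses linear gain, dividing relevance by log2(rank + 1). Some harnesses use 2^rel − 1 instead, and this document does not say which one its harness uses. The qrels and run dictionary shapes are assumptions.

```python
import math


def dcg(gains: list[float], k: int) -> float:
    """Discounted cumulative gain over the top k positions.

    The item at 1-indexed rank r is discounted by log2(r + 1); with a
    0-indexed position i that is log2(i + 2).
    """
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids: list[str], judgments: dict[str, float], k: int = 10) -> float:
    """nDCG@k for one query. Unjudged documents count as relevance 0."""
    gains = [judgments.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal = sorted(judgments.values(), reverse=True)   # best possible ordering
    ideal_dcg = dcg(ideal, k)
    return dcg(gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(runs: dict[str, list[str]],
              qrels: dict[str, dict[str, float]], k: int = 10) -> float:
    """Average nDCG@k over every judged query; a missing run scores 0."""
    return sum(ndcg_at_k(runs.get(q, []), qrels[q], k) for q in qrels) / len(qrels)


qrels = {"q1": {"d1": 3, "d2": 1}}
perfect = {"q1": ["d1", "d2"]}
swapped = {"q1": ["d2", "d1"]}
print(mean_ndcg(perfect, qrels))            # 1.0
print(round(mean_ndcg(swapped, qrels), 3))  # 0.797
```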

Indexing pipeline

In progress · Jun 16 – Aug 15
  • Change-data-capture from Postgres into the index queue
  • Shadow-index 10% → 100% of live writes ← current
  • Backfill full corpus (28M docs) with checkpointed batch job
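A checkpointed backfill resumes from the last committed position instead of restarting all 28M docs after a failure. The minimal Python sketch below assumes:

- integer doc ids processed in ascending order
- a JSON checkpoint file

backfill and index_batch are hypothetical names. A real job would page through Postgres by key rather than hold the id list in memory.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def backfill(
    doc_ids: list[int],
    index_batch: Callable[[list[int]], None],
    checkpoint: Path,
    batch_size: int = 1000,
) -> int:
    """Index doc_ids (ascending) in batches, resuming after the last
    checkpointed id. Returns how many docs this run indexed."""
    last_done = -1
    if checkpoint.exists():
        last_done = json.loads(checkpoint.read_text())["last_id"]
    pending = [d for d in doc_ids if d > last_done]
    indexed = 0
    for start in range(0, len(pending), batch_size):
        batch = pending[start:start + batch_size]
        index_batch(batch)
        # Checkpoint only after the batch succeeds. A crash mid-batch re-runs
        # that batch, so index writes must be idempotent upserts.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps({"last_id": batch[-1]}))
        tmp.replace(checkpoint)  # rename, atomic on POSIX filesystems
        indexed += len(batch)
    return indexed


# Usage: the first run dies on the batch starting at id 5; the second resumes.
ckpt = Path(tempfile.mkdtemp()) / "backfill.json"
written: list[int] = []
crash = {"armed": True}


def index_batch(batch: list[int]) -> None:
    if batch[0] == 5 and crash["armed"]:
        crash["armed"] = False
        raise RuntimeError("replica went away")
    written.extend(batch)


ids = list(range(1, 11))
try:
    backfill(ids, index_batch, ckpt, batch_size=2)
except RuntimeError:
    pass
print(backfill(ids, index_batch, ckpt, batch_size=2))  # 6
print(written == ids)                                   # True
```

The checkpoint is written after the batch, via a temp file and a rename. A crash can therefore repeat at most one batch but never skip one.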

Query & ranking

Upcoming · Sep 1 – Oct 24
  • Hybrid retrieval (BM25 + ANN) behind the existing search API
  • Interleaving experiment against production ranking
  • Ship if nDCG@10 improves ≥ 8% with p95 under 250ms
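The roadmap does not say how BM25 and ANN results are combined. Reciprocal rank fusion is one common choice, and the sketch below uses it purely as an illustrative assumption, with k = 60. The sketch also encodes the phase-3 ship rule. It reads "≥ 8%" as a relative nDCG@10 gain, though the rule could also mean absolute points. rrf and ship_gate are hypothetical names.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: each ranked list adds 1 / (k + rank) per doc."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def ship_gate(baseline_ndcg: float, candidate_ndcg: float,
              candidate_p95_ms: float) -> bool:
    """Ship only if nDCG@10 improves by at least 8% (relative) and p95 < 250ms."""
    return candidate_ndcg >= baseline_ndcg * 1.08 and candidate_p95_ms < 250


bm25 = ["d1", "d2", "d3"]   # lexical results
ann = ["d3", "d1", "d4"]    # vector results
print(rrf([bm25, ann]))     # ['d1', 'd3', 'd2', 'd4']
```

RRF uses only ranks, so it needs no calibration between BM25 scores and vector distances. That is one reason it is a common first choice for hybrid retrieval.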

GA rollout

Upcoming · Nov 2 – Nov 27
  • 10% → 100% traffic; ES6 read path removed
  • Decommission legacy cluster (saves ~$3.1k/mo)

02 What’s next, and what could slip it

Immediate (2 weeks): Shadow indexing to 100%; backfill dry-run on 1M docs
Decisions needed: Ship with hybrid retrieval only, or add a cross-encoder reranker in phase 3? (owner: mira, by Jul 11)
Risks: Backfill contends with nightly analytics load on the Postgres replica; may need a dedicated replica (+1 week)

runbook · publishing

Publishing runbook

How to publish, update, and debug documents on html-docs. Answers are under 150 words; commands are copy-ready.

Maintainer platform · Last verified 30 Jun 2026 · 6 entries

Getting started

How do I publish a page?

One POST, no account needed:

curl -sS -X POST https://www.html-docs.com/api/v1/docs -H 'Content-Type: text/html' --data-binary @page.html

The response contains url (share this) and token (keep it — it authorizes every later edit).

Can I choose my own URL?

Add an X-Slug: my-name header and the page is served at /site/my-name. Slugs are first-come, first-served; omit the header for an auto-generated one.

Updating

How do I update a published page?

PUT replaces the whole document:

curl -sS -X PUT https://www.html-docs.com/api/v1/docs/<id> -H 'x-doc-token: <token>' -H 'Content-Type: text/html' --data-binary @new.html

If the doc has comments, prefer PATCHing a single region instead — a full PUT orphans comment anchors.
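The Python sketch below makes the same two calls with the standard library, so the headers are explicit. Only the endpoint and headers come from this runbook. The helper names are hypothetical, and send() assumes the response body is JSON carrying url and token.

```python
import json
import urllib.request
from typing import Optional

API = "https://www.html-docs.com/api/v1/docs"


def publish_request(html: bytes, slug: Optional[str] = None) -> urllib.request.Request:
    """POST a new page. X-Slug is optional; omit it for an auto-generated slug."""
    headers = {"Content-Type": "text/html"}
    if slug:
        headers["X-Slug"] = slug
    return urllib.request.Request(API, data=html, headers=headers, method="POST")


def update_request(doc_id: str, token: str, html: bytes) -> urllib.request.Request:
    """PUT replaces the whole document; x-doc-token authorizes the edit."""
    return urllib.request.Request(
        f"{API}/{doc_id}",
        data=html,
        headers={"x-doc-token": token, "Content-Type": "text/html"},
        method="PUT",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Builds the request only; nothing goes over the network until send() is called.
req = publish_request(b"<!doctype html><p>hello</p>", slug="my-name")
print(req.get_method(), req.full_url)  # POST https://www.html-docs.com/api/v1/docs
```

Nothing above touches the network until send() runs. That keeps the request-building half easy to check.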

I lost the token. Can I still edit?

Only if the doc belongs to your account: an API key (Authorization: Bearer hdk_…) works on all docs you own. Anonymous docs with a lost token are immutable — republish and share the new URL.

Troubleshooting

My page renders blank or unstyled

Almost always an external dependency. The viewer renders inside a sandboxed shadow DOM: CDN scripts, external stylesheets, and remote fonts are stripped. Inline all CSS in one <style> block and hand-author charts as inline SVG.

My chart library doesn’t draw anything

Same cause: Chart.js, D3, and Mermaid load from CDNs and are blocked. Draw bars as <rect>s and lines as <polyline>s; for most doc-sized charts that is less code than the library setup.
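The minimal Python sketch below shows the hand-authored approach: one <svg> containing a single <polyline>, scaled to the data. The function name, dimensions and padding are arbitrary choices. stroke="currentColor" lets the document's one accent, or ink, come from CSS.

```python
def sparkline_svg(values: list[float], width: int = 320, height: int = 80,
                  pad: int = 4) -> str:
    """Render a series as one inline <svg> with a single <polyline>, no library."""
    if len(values) < 2:
        raise ValueError("need at least two points")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                       # flat series: avoid dividing by zero
    step = (width - 2 * pad) / (len(values) - 1)  # horizontal distance between points
    points = []
    for i, v in enumerate(values):
        x = pad + i * step
        # SVG y grows downward, so invert: the largest value sits at the top.
        y = pad + (hi - v) / span * (height - 2 * pad)
        points.append(f"{x:.1f},{y:.1f}")
    return (
        f'<svg viewBox="0 0 {width} {height}" width="{width}" height="{height}" '
        f'role="img" aria-label="line chart">'
        f'<polyline fill="none" stroke="currentColor" stroke-width="1.5" '
        f'points="{" ".join(points)}"/></svg>'
    )


print(sparkline_svg([10, 20, 15]))
```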

Still stuck?

Check the live API contract at GET /api/v1, or ask in #html-docs. Include the doc id and the exact curl command you ran.

weekly debrief · platform

Week 26: the async pipeline shipped to shadow, and we chose our queue

Period Jun 23 – 27, 2026 · Team platform (4 people) · Author weekly rotation: sam

TL;DR
  • Image pipeline — RFC-041 approved by 2 of 3 reviewers; queue + workers live in shadow mode, processing 100% of uploads with zero user-facing writes.
  • Search relaunch — CDC indexing turned on for 10% of writes; eval harness caught a tokenizer bug that would have silently hurt recall.
  • Ops — one warn-level incident (queue lag, 22 min, no data loss); June cost review came in 4% under budget.
PRs merged: 14
Incidents: 1 (0 sev-1)
Decisions made: 2
p95, end of week: 184ms

01 Image pipeline

What happened: the async processing path from RFC-041 went from design to running code. Workers process every upload in shadow mode; output variants are diffed against the inline path nightly — 0 mismatches across 610k images so far.

Why it matters: this is the last de-risking step before the flag flips next week. Upload p95 on the shadow path is 341ms against a 400ms target.
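One way to implement the nightly diff is to compare a content hash per variant from each path. The minimal Python sketch below assumes the encoder is deterministic, so identical inputs give byte-identical WebP output. If it is not, a perceptual comparison would be needed instead. diff_variants and the key layout are hypothetical.

```python
import hashlib


def digest(blob: bytes) -> str:
    """SHA-256 of a variant, so a nightly job can store hashes instead of blobs."""
    return hashlib.sha256(blob).hexdigest()


def diff_variants(inline: dict[str, bytes], shadow: dict[str, bytes]) -> list[str]:
    """Return the keys whose bytes differ or that exist on only one path."""
    mismatches = []
    for key in sorted(inline.keys() | shadow.keys()):
        a, b = inline.get(key), shadow.get(key)
        if a is None or b is None or digest(a) != digest(b):
            mismatches.append(key)
    return mismatches


inline = {"a/thumb.webp": b"\x01", "a/large.webp": b"\x02"}
shadow = {"a/thumb.webp": b"\x01", "a/large.webp": b"\x03", "b/thumb.webp": b"\x04"}
print(diff_variants(inline, shadow))  # ['a/large.webp', 'b/thumb.webp']
```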

Decision

SQS over Redis Streams and Kafka for the job queue — ops burden was the deciding criterion. Full matrix in the comparison doc. Revisit if sustained volume exceeds 1k msg/s.

02 Search relaunch

What happened: change-data-capture indexing is live on 10% of writes. The offline eval harness flagged that the new analyzer dropped hyphenated tokens (t-shirt → shirt), cutting recall on 3% of the golden set; it was fixed before any user saw it.

Why it matters: the harness paid for itself in week one. Every ranking change now gets a scored nDCG@10 report in CI instead of a vibes check.

Decision

Backfill gets a dedicated Postgres replica. Costs ~$400 for six weeks; removes the contention risk with nightly analytics that threatened the Aug 15 milestone.

03 Action items

Item | Owner | Due
Flip async_images to 5% and watch DLQ for 48h | priya | Jul 2
Migrate 3 call sites that assume variants at 200 | sam | Jul 4
Reranker recommendation memo (phase-3 scope) | mira | Jul 11
Provision backfill replica + teardown date on calendar | dev | Jul 8