HTML Docs · Design System 2.0

the audit

Six tics made every document look generated. Here is what replaces them.

The old system had good bones — archetypes, self-contained rendering, real diagrams. But its mandates for a “memorable signature,” dramatic scale, and atmospheric backgrounds pushed every page toward the same recognizable AI look.

Scope: skill design system + recent published docs · Verdict: keep the archetypes, replace the surface language

01 What the audit found

Before — the tic	After — the rule
Gradient-filled headline text (`background-clip:text`)	Ink at scale. Size is the emphasis, never a fill effect.
Pill badges on everything (`border-radius:999px`, filled)	Pills only for status, outlined, square-cornered.
Every block in a rounded, drop-shadowed card	Hairline rules and whitespace. A card must earn its border.
Centered hero: eyebrow + huge headline + subtitle + emoji	Left-aligned title block; one eyebrow per document, at the top.
Balanced five-color palettes; status hues used decoratively	Ink does the work. One accent, under 10% of the surface.
The same warm radial-gradient wash behind every document	Flat paper. Variety comes from type and accent, not texture.

02 Before / after

Before — reads as generated

🚀 Q3 Platform Review

Supercharging Our Infrastructure

Innovation Velocity Impact

After — reads as written

Platform · Q3 review

Infrastructure held at 4× traffic. Two systems didn’t.

What scaled, what buckled, and the three fixes we’re funding next quarter.

30 Jun 2026 · platform team · internal

03 The ten rules

Ink does the work. Color annotates; it never carries the hierarchy.
One accent per document, used on under 10% of the surface: eyebrow, section numbers, links, one rule.
Hairlines over shadows. Corner radius 4px or less. Shadows only on genuinely interactive elements.
A card must earn its border. The default container is whitespace and a horizontal rule.
Left-align. No centered heroes. Titles state a finding, not a theme.
Hierarchy comes from the type scale, not from boxes, backgrounds, or weight-800 sans.
Pills are for status only — outlined, uppercase, 11px. Everything else is plain text.
No gradient text, no emoji as icons, no decorative gradients anywhere.
Diagrams are monochrome plus the accent. Every box and arrow labeled; a legend only past two colors.
Status colors mean status. Green/amber/red appear only when something is actually good, at risk, or broken.
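
Several of these rules correspond to CSS fragments named in the audit table, so a rough lint pass is possible. The sketch below is a heuristic under stated assumptions: the pattern list and the `find_tics` name are illustrative and not part of the design system, and a regex scan will miss anything a stylesheet expresses differently. The shadow check is advisory only, since shadows remain allowed on interactive elements.

```python
import re

# Hypothetical pattern list: each regex targets one tic from the audit table.
TIC_PATTERNS = {
    "gradient headline text": re.compile(r"background-clip\s*:\s*text"),
    "filled pill badge": re.compile(r"border-radius\s*:\s*999px"),
    "radial-gradient wash": re.compile(r"radial-gradient\s*\("),
    "drop shadow": re.compile(r"box-shadow\s*:"),
}


def find_tics(css: str) -> list[str]:
    """Return the names of tics whose pattern appears in the CSS text."""
    return [name for name, pattern in TIC_PATTERNS.items() if pattern.search(css)]


if __name__ == "__main__":
    sample = "h1 { background-clip:text } .tag { border-radius: 999px }"
    print(find_tics(sample))
```

A match is a prompt to look, not a verdict: a status pill or an interactive button may legitimately trip a pattern.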

04 Tokens

Neutrals — shared by every document

Paper

#FBFAF7

Ink

#1C1A17

Soft

#55514B

Faint

#8B867D

Hairline

#E6E2D9

Accents — one per document, never together

Clay

#BC5B3C

Slate

#35618E

Teal

#2F6B75

Plum

#6D5586

Bronze

#8A6A2F

Sage

#4C7350

Sepia

#7A4A3A

Each tab in this showcase uses a different accent — that is the entire per-document theming budget. Same paper, same ink, same hairlines everywhere.
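
One way to enforce "one accent per document" is to generate the stylesheet's variables from the token table rather than typing them by hand. The sketch below assumes CSS custom properties named after the tokens (`--paper`, `--ink`, `--accent`, and so on); those names and the `root_block` helper are hypothetical, while the hex values are the ones listed above.

```python
NEUTRALS = {
    "paper": "#FBFAF7",
    "ink": "#1C1A17",
    "soft": "#55514B",
    "faint": "#8B867D",
    "hairline": "#E6E2D9",
}

ACCENTS = {
    "clay": "#BC5B3C",
    "slate": "#35618E",
    "teal": "#2F6B75",
    "plum": "#6D5586",
    "bronze": "#8A6A2F",
    "sage": "#4C7350",
    "sepia": "#7A4A3A",
}


def root_block(accent: str) -> str:
    """Build a :root rule with the shared neutrals and exactly one accent."""
    if accent not in ACCENTS:
        raise ValueError(f"unknown accent: {accent}")
    lines = [f"  --{name}: {value};" for name, value in NEUTRALS.items()]
    lines.append(f"  --accent: {ACCENTS[accent]};")
    return ":root {\n" + "\n".join(lines) + "\n}"


if __name__ == "__main__":
    print(root_block("slate"))
```

Because the function takes a single accent name, a document cannot end up with two accents by accident.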

Type scale — Georgia display over system sans

42 / display: The quiet system

21.5 / section: Proposed design

16 / body: Body text stays 15–16px, line-height 1.6, max 640px measure.

11 / label: Section label · uppercase · letterspaced

12.5 / mono: p95 184ms · used for numbers, dates, identifiers

RFC-041 · platform

Take image processing off the request path

Uploads block on resize and transcode today. This RFC proposes returning 202 immediately and processing asynchronously through a queue.

In review · Owner: platform team · Updated: 30 Jun 2026 · Reviewers: 2 of 3 approved

TL;DR

Upload p95 is 3.1s because resize/transcode runs inline. Move processing to a queue and worker pool, return 202 Accepted with a status URL, and target p95 under 400ms. Cost: one new managed component (SQS). Rollout is dual-write behind a flag, two weeks.

01 Problem

Every image upload synchronously resizes to four variants and transcodes to WebP before responding. At current volume (~140k uploads/day) this holds a request thread for 2–4 seconds, dominates API p95, and couples upload availability to ImageMagick failures — 31% of our 5xx responses in June traced to processing errors, not upload errors.

02 Goals and non-goals

Goals

Upload p95 under 400ms
Processing failures never fail the upload
Client can poll or subscribe for readiness

Non-goals

Changing the variant set or formats
On-the-fly transformation API
Migrating existing stored images

03 Proposed design

The API stores the original and enqueues a job. Workers pull, process, write variants, and flip the asset’s status. The client gets a status_url in the 202 response.

Fig. 1 — After. Blue marks the new asynchronous path; the synchronous path ends at “store original.”
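
The flow in the proposed design can be sketched with in-memory stand-ins. Everything named here is an assumption for illustration: the `/assets/<id>/status` path, the `processing` / `ready` / `failed` status values, and the helper names are not specified by the RFC, and a real deployment would use object storage and SQS rather than dictionaries and `queue.Queue`.

```python
import queue
import uuid

# In-memory stand-ins for object storage, the asset table, and the job queue.
originals: dict[str, bytes] = {}
status: dict[str, str] = {}
jobs: "queue.Queue[str]" = queue.Queue()


def handle_upload(body: bytes) -> tuple[int, dict[str, str]]:
    """Store the original, enqueue a job, and answer 202 without processing."""
    asset_id = uuid.uuid4().hex
    originals[asset_id] = body
    status[asset_id] = "processing"
    jobs.put(asset_id)
    return 202, {"status_url": f"/assets/{asset_id}/status"}


def run_worker_once() -> None:
    """Pull one job, pretend to write variants, then flip the asset's status."""
    asset_id = jobs.get_nowait()
    try:
        _ = originals[asset_id]  # resize/transcode would happen here
        status[asset_id] = "ready"
    except Exception:
        # A processing failure marks the asset; the upload already succeeded.
        status[asset_id] = "failed"


if __name__ == "__main__":
    code, payload = handle_upload(b"raw image bytes")
    print(code, payload)
    run_worker_once()
```

The point of the split is visible in the control flow: nothing in `handle_upload` can raise a processing error, so the second goal (processing failures never fail the upload) holds by construction.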

04 Alternatives considered

Option	Upload p95	New infra	Verdict
Async queue + workers	~350ms	SQS + worker deploy	Proposed
Bigger API instances	~1.9s	none	Cost scales with the wrong axis; failures still coupled
Process in Lambda on S3 event	~350ms	Lambda + IAM surface	Viable; weaker retry semantics, cold-start tail

05 Rollout

Week 0 — queue + worker deployed, shadow traffic only
Week 1 — dual-write behind async_images flag, 5% → 50%
Week 2 — 100%, remove inline path, delete ImageMagick from API image
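
A percentage ramp like 5% → 50% is usually implemented with deterministic bucketing, so that anything enabled at 5% stays enabled at 50%. The sketch below is one common approach, not the team's flag system: the `in_rollout` helper is hypothetical, and the RFC does not say whether the unit is a user, a tenant, or an upload.

```python
import hashlib


def in_rollout(flag: str, unit_id: str, percent: float) -> bool:
    """Hash flag + unit into a stable bucket (0..9999) and compare to percent."""
    digest = hashlib.sha256(f"{flag}:{unit_id}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000
    return bucket < percent * 100


if __name__ == "__main__":
    print(in_rollout("async_images", "user-42", 5))
```

Because the bucket depends only on the flag name and the unit id, raising the percentage only ever adds units; nobody flips back to the inline path mid-ramp.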

06 Risks

Risk	Mitigation
Clients that assume variants exist at 200	Audit found 3 call sites; all migrate to `status_url` polling this sprint
Queue backlog during traffic spikes	Worker autoscale on queue depth; alert at 5-minute lag
Lost jobs	SQS at-least-once + idempotent workers; DLQ with replay runbook
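
The "at-least-once + idempotent workers" mitigation means a job may be delivered twice and the second delivery must be harmless. A minimal sketch, assuming each job carries a stable id: the `processed` set stands in for a durable store, and the names are hypothetical.

```python
processed: set[str] = set()
variants_written: list[str] = []


def handle_message(job_id: str) -> bool:
    """Process a job once; return False when a redelivery is skipped."""
    if job_id in processed:
        return False  # duplicate delivery: nothing to redo
    variants_written.append(job_id)  # stand-in for writing the variants
    processed.add(job_id)
    return True


if __name__ == "__main__":
    print(handle_message("job-1"), handle_message("job-1"))
```

In a real worker the check and the write would need to be atomic, or the write itself would need to be safe to repeat (for example, overwriting the same object keys).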

platform health · week 26

Steady week. Ingest error rate is the one thing to watch.

Operational · Updated: 30 Jun 2026, 09:00 PT · Window: Jun 16 – 29

Metric	Value	Week over week
Requests	41.2M	+6.4%
p95 latency	184ms	−12ms
Error rate	0.42%	+0.09pt
Cost / 1k req	$0.86	flat

01 p95 latency, last 14 days

Milliseconds. The drop after Jun 20 is the CDN cache-key fix shipping.

02 Error rate by service

Service	Error rate
ingest	1.8%
api	0.9%
(unlabeled)	0.5%
auth	0.3%
billing	0.2%

Ingest is elevated because retries against the legacy image path double-count during the RFC-041 rollout. Expected to normalize when the inline path is removed in week 28.

03 Active alerts

Alert	Service	Since	State
Queue lag over 5 min	ingest	Jun 27	Investigating
Cert expires in 20 days	edge	Jun 29	Scheduled
Elevated 429s, one tenant	api	Jun 28	Resolved

decision · infrastructure

Queue backend for the image pipeline

Three candidates evaluated against six criteria for RFC-041. Weighted toward operational burden — this team runs no 24/7 on-call for infra.

Evaluated: Jun 2026 · Deciders: platform team · Status: decided

Recommendation

SQS. It concedes throughput ceiling and strict ordering — neither of which this workload needs — and wins everywhere we actually feel pain: zero ops, native DLQ, per-message visibility timeouts. Confidence: high.

01 Scoring matrix

Criterion	SQS · recommended	Redis Streams	Kafka
Ops burden (×2 weight)	●●●●●	●●●○○	●○○○○
Retry / DLQ semantics	●●●●●	●●○○○	●●●○○
Throughput ceiling	●●●○○	●●●●○	●●●●●
Ordering guarantees	●●○○○	●●●●○	●●●●●
Team familiarity	●●●●○	●●●●○	●●○○○
Cost at our scale	●●●●○ ~$40/mo	●●●○○ ~$120/mo	●○○○○ ~$900/mo
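
The matrix can be collapsed to one number per candidate. The sketch below assumes each filled dot counts as one point and that every criterion other than ops burden has weight 1; the document states only the ×2 weight, so the totals are an illustration of the method rather than the deciders' own arithmetic.

```python
# Dot counts transcribed from the matrix, in criterion order:
# ops burden, retry/DLQ, throughput, ordering, familiarity, cost.
SCORES = {
    "SQS": [5, 5, 3, 2, 4, 4],
    "Redis Streams": [3, 2, 4, 4, 4, 3],
    "Kafka": [1, 3, 5, 5, 2, 1],
}
WEIGHTS = [2, 1, 1, 1, 1, 1]  # only ops burden carries the x2 weight


def weighted_total(scores: list[int]) -> int:
    """Sum of score times weight across the six criteria."""
    return sum(s * w for s, w in zip(scores, WEIGHTS))


totals = {name: weighted_total(s) for name, s in SCORES.items()}

if __name__ == "__main__":
    for name, total in totals.items():
        print(f"{name}: {total} / {5 * sum(WEIGHTS)}")
```

Under these assumptions SQS totals 28, Redis Streams 23, and Kafka 18 out of a possible 35, which matches the direction of the recommendation.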

02 Where each one wins — and what we give up

SQS

Recommended

Zero servers, IAM-native, DLQ and visibility timeouts out of the box. Trade-off we accept: ~3k msg/s soft ceiling per queue and best-effort ordering. The pipeline is embarrassingly parallel and peaks at 40 msg/s, about 75× below that ceiling.

Redis Streams

Fastest option and we already run Redis for caching. But consumer-group DLQ behavior is hand-rolled, and it couples job durability to a cache instance we currently treat as disposable.

Kafka

The right answer at 100× our volume or with multiple consumers replaying history. Today it is three brokers of ops burden for guarantees nothing downstream consumes.

roadmap · search relaunch

Search relaunch, H2 2026

On track · Owner: discovery team · Range: May – Nov 2026 · Updated: 30 Jun

Goal: Replace Elasticsearch 6 with hybrid lexical + vector search

Where we are: Indexing pipeline in shadow mode on 10% of writes

Next milestone: Full corpus reindexed · Aug 15

Phase	Window
Foundations	May – mid Jun
Indexing (now · week 3 of 9)	mid Jun – Aug
Query & ranking	Sep – Oct
GA rollout	Nov

01 Phases

Foundations

Done · May 5 – Jun 13

Cluster provisioned; embedding model selected (bge-small, self-hosted)
Golden query set — 400 queries with judged relevance
Offline eval harness reporting nDCG@10 per experiment

Indexing pipeline

In progress · Jun 16 – Aug 15

Change-data-capture from Postgres into the index queue
Shadow-index 10% → 100% of live writes ← current
Backfill full corpus (28M docs) with checkpointed batch job

Query & ranking

Upcoming · Sep 1 – Oct 24

Hybrid retrieval (BM25 + ANN) behind the existing search API
Interleaving experiment against production ranking
Ship if nDCG@10 improves ≥ 8% with p95 under 250ms
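
The ship criterion can be written down as a small check. This is a sketch under assumptions: it uses the linear-gain form of DCG (some harnesses use 2^rel − 1 instead), it computes the ideal ordering from the returned list only, and it reads "improves ≥ 8%" as a relative gain over the baseline. The function names are hypothetical and not taken from the team's eval harness.

```python
import math


def dcg_at_k(relevances: list[float], k: int = 10) -> float:
    """Discounted cumulative gain: rel_i / log2(i + 1), ranks starting at 1."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """DCG of the returned order divided by DCG of the best possible order."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


def should_ship(baseline: float, candidate: float, p95_ms: float) -> bool:
    """Gate: relative nDCG@10 gain of at least 8% with p95 under 250ms."""
    if baseline <= 0:
        return False
    return (candidate - baseline) / baseline >= 0.08 and p95_ms < 250


if __name__ == "__main__":
    print(ndcg_at_k([3, 2, 0, 1]))
    print(should_ship(baseline=0.50, candidate=0.55, p95_ms=240))
```

Writing the gate as code forces the ambiguity into the open: if the team means an absolute gain of 8 points rather than a relative 8%, `should_ship` is the one line to change.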

GA rollout

Upcoming · Nov 2 – Nov 27

10% → 100% traffic; ES6 read path removed
Decommission legacy cluster (saves ~$3.1k/mo)

02 What’s next, and what could slip it

Immediate (2 weeks)	Decisions needed	Risks
Shadow indexing to 100%; backfill dry-run on 1M docs	Reranker: ship with hybrid only, or add cross-encoder in phase 3? owner: mira, by Jul 11	Backfill contends with nightly analytics load on the Postgres replica — may need a dedicated replica (+1 week)

runbook · publishing

Publishing runbook

How to publish, update, and debug documents on html-docs. Answers are under 150 words; commands are copy-ready.

Maintainer: platform · Last verified: 30 Jun 2026 · 6 entries

Getting started

How do I publish a page?

One POST, no account needed:

curl -sS -X POST https://www.html-docs.com/api/v1/docs -H 'Content-Type: text/html' --data-binary @page.html

The response contains url (share this) and token (keep it — it authorizes every later edit).

Can I choose my own URL?

Add an X-Slug: my-name header and the page is served at /site/my-name. Slugs are first-come, first-served; omit the header for an auto-generated one.
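
For scripts, the same publish call can be assembled with the Python standard library. The sketch below only builds the request described above (endpoint, `Content-Type`, optional `X-Slug`) and does not send it; `build_publish_request` is a hypothetical helper name, and the response format beyond the `url` and `token` fields is not specified here, so parsing is left out.

```python
import urllib.request
from typing import Optional

API_URL = "https://www.html-docs.com/api/v1/docs"


def build_publish_request(html: bytes, slug: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the publish POST, with an optional X-Slug header."""
    headers = {"Content-Type": "text/html"}
    if slug is not None:
        headers["X-Slug"] = slug
    return urllib.request.Request(API_URL, data=html, headers=headers, method="POST")


if __name__ == "__main__":
    req = build_publish_request(b"<h1>Hello</h1>", slug="my-name")
    print(req.get_method(), req.full_url, req.header_items())
```

Passing the result to `urllib.request.urlopen` would perform the upload; keeping construction separate makes the headers easy to inspect before anything is published.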

Updating

How do I update a published page?

PUT replaces the whole document:

curl -sS -X PUT https://www.html-docs.com/api/v1/docs/<id> -H 'x-doc-token: <token>' -H 'Content-Type: text/html' --data-binary @new.html

If the doc has comments, prefer PATCHing a single region instead — a full PUT orphans comment anchors.

I lost the token. Can I still edit?

Only if the doc belongs to your account: an API key (Authorization: Bearer hdk_…) works on all docs you own. Anonymous docs with a lost token are immutable — republish and share the new URL.

Troubleshooting

My page renders blank or unstyled

Almost always an external dependency. The viewer renders inside a sandboxed shadow DOM: CDN scripts, external stylesheets, and remote fonts are stripped. Inline all CSS in one <style> block and hand-author charts as inline SVG.

My chart library doesn’t draw anything

Same cause: Chart.js, D3, and Mermaid load from CDNs and are blocked. Draw the chart as inline SVG instead: bars are <rect>s, lines are <polyline>s. For most doc-sized charts that is less code than the library setup.
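
As an illustration of how little code a doc-sized chart needs, the sketch below generates a horizontal bar chart as an inline SVG string with no external dependencies. The function name, dimensions, and label column width are arbitrary choices for the example; the fill color is the Ink token.

```python
from xml.sax.saxutils import escape


def bar_chart_svg(data: list[tuple[str, float]], width: int = 320,
                  bar_h: int = 14, gap: int = 8) -> str:
    """Render horizontal bars as <rect> elements with <text> labels."""
    label_w = 70  # space reserved on the left for labels
    peak = max((value for _, value in data), default=0) or 1
    height = len(data) * (bar_h + gap)
    parts = [
        f'<svg xmlns="http://www.w3.org/2000/svg" '
        f'viewBox="0 0 {width} {height}" role="img">'
    ]
    for i, (label, value) in enumerate(data):
        y = i * (bar_h + gap)
        w = round((width - label_w) * value / peak, 1)  # scale to the widest bar
        parts.append(f'<text x="0" y="{y + bar_h - 3}" font-size="11">{escape(label)}</text>')
        parts.append(f'<rect x="{label_w}" y="{y}" width="{w}" height="{bar_h}" fill="#1C1A17"/>')
    parts.append("</svg>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(bar_chart_svg([("ingest", 1.8), ("api", 0.9), ("auth", 0.3)]))
```

The returned string can be pasted directly into the page body, which keeps the document self-contained inside the sandboxed viewer.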

Still stuck?

Check the live API contract at GET /api/v1, or ask in #html-docs. Include the doc id and the exact curl command you ran.

weekly debrief · platform

Week 26: the async pipeline shipped to shadow, and we chose our queue

Period: Jun 23 – 27, 2026 · Team: platform (4 people) · Author (weekly rotation): sam

TL;DR

Image pipeline — RFC-041 approved by 2 of 3 reviewers; queue + workers live in shadow mode, processing 100% of uploads with zero user-facing writes.
Search relaunch — CDC indexing turned on for 10% of writes; eval harness caught a tokenizer bug that would have silently hurt recall.
Ops — one warn-level incident (queue lag, 22 min, no data loss); June cost review came in 4% under budget.

PRs merged

Incidents (0 sev-1)

Decisions made

184ms

p95, end of week

01 Image pipeline

What happened: the async processing path from RFC-041 went from design to running code. Workers process every upload in shadow mode; output variants are diffed against the inline path nightly — 0 mismatches across 610k images so far.

Why it matters: this is the last de-risking step before the flag flips next week. Upload p95 on the shadow path is 341ms against a 400ms target.

Decision

SQS over Redis Streams and Kafka for the job queue — ops burden was the deciding criterion. Full matrix in the comparison doc. Revisit if sustained volume exceeds 1k msg/s.

02 Search relaunch

What happened: change-data-capture indexing is live on 10% of writes. The offline eval harness flagged that the new analyzer dropped hyphenated tokens (t-shirt → shirt), cutting recall on 3% of the golden set — fixed before any user saw it.

Why it matters: the harness paid for itself in week one. Every ranking change now gets a scored nDCG@10 report in CI instead of a vibes check.

Decision

Backfill gets a dedicated Postgres replica. Costs ~$400 for six weeks; removes the contention risk with nightly analytics that threatened the Aug 15 milestone.

03 Action items

Item	Owner	Due
Flip `async_images` to 5% and watch DLQ for 48h	priya	Jul 2
Migrate 3 call sites that assume variants at 200	sam	Jul 4
Reranker recommendation memo (phase-3 scope)	mira	Jul 11
Provision backfill replica + teardown date on calendar	dev	Jul 8

Design System 2.0 · every example above is fictional but structurally real — copy the patterns, not the numbers. Tabs are CSS-only radio inputs; the page prints with all panels expanded.