An 18-hour exposure, and the cultural shift it demands
A misconfigured gatekeeper exposed an unreleased Live Notes feature — and a paid upsell — to 215 Wearables EAP users. Reddit told us before our monitoring did. The fix is technical; the lesson is cultural.
Incident: Live Notes GK exposure
Window: May 18, 2:42 PM PT → May 19, 9:25 AM PT
Detection: External (Reddit post)
Data stored: None
1. What happened, by the numbers
Detection gap: ~18h
Users with access: 215
Sent audio to server: 29
Records stored: 0
A GK intended for QA + Wearables EAP was mistakenly opened to the app EAP cohort. 60 users saw a Meta One upsell for a feature that doesn't ship until June 29. No data persisted — but only because mitigation happened to land before any upload completed end-to-end. We got lucky, not safe.
2. The cultural shift: what has to change
Every line item in this incident maps to a default we got wrong. The technical fixes are easy; the harder work is moving the team from these old defaults to new ones — for every launch, not just this one.
Shift 1 · From "happy path is enough" to "negative cases are the test"
Test plan validates users who should have access → Test plan also validates users who should NOT
Why it matters: The GK test plan passed. The bug was who else got in. A launch checklist that only proves "it works for the right people" cannot catch "it also works for the wrong ones."
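A minimal sketch of what a negative test for a GK change could look like. The cohort names (`qa`, `wearables_eap`, `app_eap`), the `User` type, and `has_access` are hypothetical stand-ins, not the real gatekeeper API; the point is only that the test lists users who must be excluded alongside users who must be admitted.

```python
from dataclasses import dataclass

# Hypothetical cohort names; the real GK cohort identifiers are not given in this post.
INTENDED_COHORTS = frozenset({"qa", "wearables_eap"})


@dataclass(frozen=True)
class User:
    user_id: str
    cohorts: frozenset


def has_access(user: User, allowed: frozenset) -> bool:
    """Stand-in for the gate check: a user passes if any of their cohorts is allowed."""
    return bool(user.cohorts & allowed)


def run_gate_tests(allowed: frozenset) -> list:
    """Return failure messages; an empty list means positive and negative cases pass."""
    should_pass = [
        User("qa-1", frozenset({"qa"})),
        User("weap-1", frozenset({"wearables_eap"})),
    ]
    # Negative cases: users who must NOT get in.
    should_fail = [
        User("app-1", frozenset({"app_eap"})),
        User("pub-1", frozenset()),
    ]
    failures = []
    for user in should_pass:
        if not has_access(user, allowed):
            failures.append(f"{user.user_id} should have access but does not")
    for user in should_fail:
        if has_access(user, allowed):
            failures.append(f"{user.user_id} should NOT have access but does")
    return failures
```

With the intended config, `run_gate_tests(INTENDED_COHORTS)` returns an empty list. With a config that also includes `app_eap`, the positive cases still pass and only the negative case for `app-1` fails, which is the class of bug a happy-path test plan misses.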
Shift 2 · From "any reviewer" to "a reviewer with feature context"
Single-reviewer GK mutations, rubber-stamped → Mandatory second reviewer who owns the feature, plus AI-assisted intent-vs-change validation
Why it matters: The mistake was visible in the config — it just needed someone who knew what each cohort meant. Process has to assume reviewers will miss context unless we engineer for it.
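One way to picture intent-vs-change validation is a plain set comparison, sketched below. This is a deterministic stand-in for the AI-assisted check, not a description of it, and every name here (`review_blockers`, the cohort strings, the reviewer names in the usage note) is an assumption made for illustration.

```python
def review_blockers(declared_intent, proposed_cohorts, reviewers, feature_owners):
    """Return reasons a GK change should not land yet; an empty list means it can."""
    blockers = []

    # Any cohort in the change that the author did not declare is a red flag.
    extra = sorted(set(proposed_cohorts) - set(declared_intent))
    if extra:
        blockers.append("cohorts not in declared intent: " + ", ".join(extra))

    # At least one reviewer must own the feature, so someone knows what each cohort means.
    if not set(reviewers) & set(feature_owners):
        blockers.append("no reviewer owns the feature")

    return blockers
```

For example, a change declared as "QA + Wearables EAP" that also adds `app_eap`, reviewed only by someone outside the feature team, would come back with both blockers instead of a silent approval.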
Shift 3 · From "client-side gating" to "server is the source of truth"
GK + mobile config is the kill switch → Server-side kill switches on ASR and clip upload paths, by default, for every launch
Why it matters: Mobile config propagation takes 4–24h, and the frames MC has no emergency push at all. The only mitigation that actually stopped data movement in this incident was server-side rejection, and it was added during the SEV, not before. That order has to flip: server-side rejection must exist before launch.
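A rough sketch of a server-side kill switch that is off by default. The path names, the `KillSwitches` class, the `user_is_allowed` flag, and the status codes are all assumptions for illustration; the real ASR and clip upload services are not described in this post.

```python
from dataclasses import dataclass, field


@dataclass
class KillSwitches:
    # Default-off: a path accepts nothing until it is explicitly enabled.
    enabled_paths: set = field(default_factory=set)

    def is_enabled(self, path: str) -> bool:
        return path in self.enabled_paths


def handle_upload(path, user_is_allowed, payload, switches, store):
    """Reject on the server before anything is written; return an HTTP-style status."""
    if not switches.is_enabled(path):
        return 503  # path is killed (or was never turned on)
    if not user_is_allowed:
        return 403  # e.g. not an employee or test user
    store.append(payload)
    return 200
```

Because the check runs on the server, flipping the switch takes effect on the next request and does not wait for mobile config to propagate to devices.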
Shift 4 · From "Reddit is our pager" to "we detect our own exposure"
External users discover unintended access first → Synthetic monitors and alerts fire the moment a non-target user touches a pre-GA surface
Why it matters: Comms told engineering 18 hours after the GK landed. That gap is the entire incident. Detection has to be a feature we build, not a thing we wait for.
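A minimal sketch of the detection rule, assuming access events carry the user's cohorts and that each pre-GA surface has a declared target set. The `AccessEvent` type, the surface name, and the cohort names are hypothetical; a real monitor would read from logging and page someone rather than return a list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessEvent:
    user_id: str
    surface: str
    cohorts: frozenset


def find_unintended_access(events, pre_ga_targets):
    """Return events where a user outside the target cohorts touched a pre-GA surface.

    pre_ga_targets maps surface name -> frozenset of cohorts allowed before GA.
    Surfaces missing from the map are treated as GA and are not checked.
    """
    alerts = []
    for event in events:
        targets = pre_ga_targets.get(event.surface)
        if targets is None:
            continue
        if not (event.cohorts & targets):
            alerts.append(event)
    return alerts
```

Run against a stream of events, a check like this would flag the first non-target user on the surface, rather than leaving discovery to an external post.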
Shift 5 · From "defaults are open" to "defaults are internal-only"
GKs require explicit constraints to be safe → GKs fail closed to internal-only by default; opening to external cohorts is the special case
Why it matters: The blast radius of a one-line mistake should be employees, not 215 external users with a half-baked experience and a paid upsell.
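One possible shape for an internal-only default, sketched under assumptions: the config keys `cohorts` and `external_approved`, and the `employees` cohort name, are invented for this example. An external cohort takes effect only if it appears in both lists, so a single added line cannot widen access on its own.

```python
INTERNAL_COHORT = "employees"  # hypothetical name for the internal-only cohort


def effective_cohorts(config: dict) -> frozenset:
    """Resolve who a gate admits, failing closed to internal-only.

    A cohort listed under "cohorts" is ignored unless it is also listed under
    "external_approved"; an empty or missing config admits employees only.
    """
    requested = frozenset(config.get("cohorts", ()))
    approved = frozenset(config.get("external_approved", ()))
    return frozenset({INTERNAL_COHORT}) | (requested & approved)
```

Under this scheme, adding `app_eap` to `cohorts` by mistake changes nothing until the same cohort is also approved, which turns a one-line mistake into a two-step decision.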
Shift 6 · From "our team learned" to "the org learned"
Postmortem stays inside the Live Notes team → Findings broadcast across Wearables; every team adopts the new defaults before their next launch
Why it matters: Every team in the org is one config change away from this exact incident. Treating the learnings as portable is the difference between an isolated fix and a culture change.
3. Timeline at a glance
May 18 · 2:42 PM PT
GK lands for Live Notes testing — opens to unintended EAP cohort
May 19 · 8:45 AM PT
Comms (Alan) flags Reddit post to engineering — SEV created by Jeff
May 19 · 9:25 AM PT
Jahan lands GK rollback (within first hour of detection)
May 19 · 11:52 AM PT
Server team engaged to block non-employee/non-test users at the server
May 19 · 1:56 PM PT
Legal consulted — no notification required (no production data stored)
May 19 · ~4:00 PM PT
Bakshi lands server-side clip rejection
May 19 · ~9:00 PM PT
Daniel Rogers lands ASR audio rejection; full mitigation complete (~12h after detection)
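The headline durations can be checked directly from the timestamps above. The sketch below assumes a placeholder year (the post does not state one, and the durations do not depend on it) and treats the "~9:00 PM" entry as exactly 9:00 PM.

```python
from datetime import datetime

YEAR = 2000  # placeholder only; the year is not stated in the timeline


def at(day, hour, minute):
    """Build a naive PT timestamp in May of the placeholder year."""
    return datetime(YEAR, 5, day, hour, minute)


def minutes_between(start, end):
    return (end - start).total_seconds() / 60


gk_landed = at(18, 14, 42)      # May 18, 2:42 PM PT
detected = at(19, 8, 45)        # May 19, 8:45 AM PT
rollback = at(19, 9, 25)        # May 19, 9:25 AM PT
asr_rejection = at(19, 21, 0)   # May 19, ~9:00 PM PT (approximate in the timeline)

detection_gap_h = minutes_between(gk_landed, detected) / 60
rollback_after_detection_min = minutes_between(detected, rollback)
full_mitigation_after_detection_h = minutes_between(detected, asr_rejection) / 60
```

This gives a detection gap of 18.05 hours, a GK rollback 40 minutes after detection, and full server-side mitigation about 12.25 hours after detection.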
4. Action items
Neha: Add alerts that detect and notify when unintended production users access a feature before public release
Neha: Establish a process and share Live Notes learnings with other projects in the Wearables org
Live Notes: Work with Jim on a better emergency frames MC refresh mechanism (parity with app-side emergency push)
Live Notes: Revisit and solidify the rollout plan for Live Notes EAP and GA before production launch
Live Notes: Clean up deprecated MCs and code paths no longer POR
Neha: Add negative test cases for GK changes that verify users who should NOT have access don't get it
Live Notes: Partner with AIT to ensure service-side kill switches are in place for future feature launches
Live Notes: Publish a broader post so all Wearables teams can learn and take this forward
"No data was stored" is the outcome we want every time — but this time it was luck, not design. The cultural shift is making it design.
Defaults that fail closed · Negative tests that prove exclusion · Detection we own · Learnings that travel beyond the team