Audio Notes

Friday, June 5

4 sessions recorded
🎙️ 12:18 AM – 1:07 AM · Late-night sessions
Daily Overview

A packed late-night session on Friday covered three major threads across four recordings. The most consequential discussion was the beta launch debate: with the recommendation recall score sitting at 72% — well short of the 85% target — the team had to decide whether to delay. After a spirited back-and-forth between Saurabh (delay to protect quality) and Pooja (ship on time and iterate), Dan brokered a compromise: launch on schedule but restrict to major cities only for the first two weeks while the model catches up.

In parallel, the team dove deep into the speaker identification prototype for glasses. The technology shows promise — six or seven embeddings across three speakers are being tested — but significant hurdles remain. Legal won't allow open-source publication, so the work needs to move inside PE. The broader debate surfaced a real philosophical tension: how much to lean into privacy controls and consent flows for bystander comfort versus maximizing raw utility for the device owner. One participant argued that in a world of agents, users will want raw data streams flowing into their AI — but acknowledged the legal lines are "not well defined."

The team also ran live hardware tests on the wearable device, discovering that device positioning critically affects speech detection accuracy. Overlapping speech causes misattribution, and breathing sounds were identified as a possible source of incorrect speaker tagging. A separate short session flagged a post-processing bug, though the overall state was described as "pretty good."

Session Details
🕛 12:18 AM 👤 Dan, Saurabh, Pooja 📍 Beta Launch Decision
1

Beta Launch Debate: 72% Recall vs. 85% Target — Major-Cities Compromise

The Problem

  • Recommendation recall score at 72%, short of the 85% launch target
  • Model performs well for major cities but struggles with niche destinations
  • Risk of early negative reviews and lost user trust if launched broadly

The Debate

  • Saurabh argued for a 2-week delay to integrate new augmented training data — "launching at 72% is too risky"
  • Pooja pushed back strongly — core value prop (schedule quality) has a 91% satisfaction rating; delaying for a trending-up metric is "unnecessary risk to our schedule"
  • Dan mediated: "Time to market is critical, but we can't sacrifice the experience for niche users"

Resolution

  • Launch on time with major-cities-only restriction for the first 2 weeks
  • Geofence protects core experience where accuracy is higher
  • Team silently deploys final model weights during the restriction window
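
The major-cities-only restriction could be sketched roughly as below. This is a minimal illustration, not the team's implementation: the city list, the 50 km radius, and the names `MAJOR_CITIES`, `RADIUS_KM`, `haversine_km`, and `city_in_beta` are all assumptions made for the example.

```python
import math

# Hypothetical allow-list of city centers as (latitude, longitude).
# The notes do not say which cities are included.
MAJOR_CITIES = {
    "New York": (40.7128, -74.0060),
    "London": (51.5074, -0.1278),
    "Tokyo": (35.6762, 139.6503),
}
RADIUS_KM = 50.0  # assumed fence radius, not from the notes


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometers."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def city_in_beta(lat, lon):
    """Return the first major city whose fence contains the point, else None."""
    for name, (clat, clon) in MAJOR_CITIES.items():
        if haversine_km(lat, lon, clat, clon) <= RADIUS_KM:
            return name
    return None
```

A caller would show the normal recommendation flow when `city_in_beta` returns a city name and the temporary restricted UI when it returns `None`.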

Key Exchanges
12:18 AM Dan: Let's talk about the final week before we lock the beta build. The biggest open question remains the recommendation recall score sitting at 72%, short of our 85% target.
12:18 AM Saurabh: Launching at 72% is too risky. The model works well for major cities but struggles with niche destinations, and that will lead to early negative reviews and lost user trust.
12:19 AM Pooja: I strongly disagree with the delay. Our core value prop — schedule quality — is solid with a 91% user satisfaction rating. Delaying for a quality metric that is trending up is an unnecessary risk to our schedule.
12:19 AM Dan: What if we launch on time but restrict the initial audience to a major-cities-only mode for the first 2 weeks? This protects the core experience where the accuracy is higher while giving your team time to silently deploy the final model weights.
12:20 AM Pooja: From an engineering standpoint, implementing a geofence and a temporary UI change is a manageable lift. I can do that.
🕛 12:23 AM 👤 3 speakers 📍 Speaker ID / Privacy
2

Speaker Identification Prototype — Privacy Controls vs. Raw Utility for Glasses

Prototype Status

  • 6–7 embeddings saved spanning 3 speakers; system classifies audio as known speaker or unknown
  • Feature not yet operational on reviewer's glasses
  • Open-source blocked by legal — must integrate inside PE
  • TE-based storage being provisioned to persist embeddings
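
The "known speaker or unknown" decision described above can be illustrated with a nearest-embedding check. This is only a sketch under stated assumptions: the notes do not say how the prototype scores matches, so cosine similarity, the 0.7 threshold, the toy 3-dimensional vectors, and the names `cosine`, `identify`, and `ENROLLED` are all hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify(query, enrolled, threshold=0.7):
    """Return (label, score): the best-matching enrolled speaker, or "unknown"
    when no stored embedding reaches the similarity threshold."""
    best_name, best_score = None, -1.0
    for name, embeddings in enrolled.items():
        for emb in embeddings:
            score = cosine(query, emb)
            if score > best_score:
                best_name, best_score = name, score
    if best_score < threshold:
        return "unknown", best_score
    return best_name, best_score


# Toy enrollment: six stored embeddings spanning three speakers (made-up values).
ENROLLED = {
    "speaker_a": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    "speaker_b": [[0.0, 1.0, 0.0], [0.1, 0.9, 0.0]],
    "speaker_c": [[0.0, 0.0, 1.0], [0.0, 0.1, 0.9]],
}
```

With this toy data, a query close to one speaker's stored vectors is labeled with that speaker, while a query roughly equidistant from all three falls below the threshold and is labeled "unknown". The threshold trades false matches against missed matches and would need tuning on real audio.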

The Privacy Tension

  • Path A: Lean into privacy controls — consent flows, bystander comfort, tagging databases, controlled integration
  • Path B: Raw data stream into user's agent — "In a world of agents, my app will be different from yours. Just give me the raw data dump."
  • Key distinction raised: Meta-provided recording (Meta is liable) vs. user-supplied audio (user's responsibility)
  • Three angles identified: legal, bystander comfort, and owner utility — "they're not all the same"

Integration Work

  • Recognition on TE stack estimated at a couple of weeks out
  • Design underway for Hatch to access captured audio nodes
  • Internal dogfooding has moved to TE builds; C50 update enables manual upload

Key Exchanges
12:23 AM Speaker 2: We have saved a few of the embeddings — that includes your embeddings also. We have six or seven, and we have 3 speakers. So it's trying to choose between them or nothing.
12:24 AM Speaker 1: Legal won't let us put this on open source. We have to get it inside PE.
12:26 AM Speaker 1: There's one argument which is — we're gonna really lean into privacy and control… to make sure bystanders are comfortable. There's another way: give me a raw data stream heading into my agent… I'll have a different interface from all you guys.
12:28 AM Speaker 3: The line is — if we build the recording part and it's in our app, then legal says Meta's liable. If the user uses Voice Memo on iOS and sends that audio to Hatch… the user gave me some audio, I have no idea what this is.
12:28 AM Speaker 1: There's 3 angles to this: the legal angle, the bystander comfort angle, and what's most useful to the person who owns the glasses. They're very different… I worry a little bit that we're very legally motivated.
🕐 1:03 AM 📍 Bug Report
3

Post-Processing Bug Observed — Quick Proximity Test

  • A post-processing bug was spotted during testing
  • Another team member may already be aware of the issue
  • A quick test adjusting the device's proximity was run to see whether the behavior changed
  • Despite the bug, overall state described as "pretty good"

Key Exchanges
1:03 AM Speaker 1: Post-processing bug is happening. I think he already knows.
1:04 AM Speaker 1: Right now I think it's pretty good, but like…
1:04 AM Self: Let's see if I bring it closer…
🕐 1:04 AM 👤 Saurabh, Raunaq 📍 Architecture / Hardware Testing
4

Open Server Solution & Live Wearable Speech Detection Testing

Open Server Strategy

  • Team discussed adopting an open server solution to operate more as a backend team while supporting the experience team on first-party apps
  • Hatch and other teams may request the solution; team is confident they have the best one available

Live Hardware Testing Findings

  • Raunaq tested speech detection by attempting to yell — struggled to project volume
  • Device positioning is critical: wearing the device properly produced noticeably better results
  • Continuous single-speaker input remained stable
  • Overlapping speech causes misattribution and segmentation issues
  • Breathing sounds hypothesized as a cause for incorrect speaker attribution — "Could be the breathing"
  • When self wore the device: "This is much better, right? It is perfect — like perfect"

Key Exchanges
1:04 AM Saurabh: That's why we actually need the open server solution… because we become a backend team, right?
1:05 AM Raunaq: Maybe I should yell a bit and see if that helps. I cannot yell — I struggle.
1:05 AM Raunaq: Is this the sweet spot, or does it take time for it to detect?
1:06 AM Self: How about I just wear it and see? This is much better, right? It is perfect — like perfect.
1:06 AM Saurabh: When you keep speaking, it will be fine. It started getting confused because you wore it and started doing the "hmm"…
1:07 AM Saurabh: Could be the breathing.

Action Items

Pooja: Implement a geofence and a temporary UI change to restrict the beta to major-cities-only mode for the first 2 weeks
From: Beta Launch Debate

Team: Work on getting the embeddings inside PE, since legal won't allow putting them on open source
From: Speaker Identification Prototype