Over three months, the Voice AI Platform team took Live Notes on Meta Ray-Ban smart glasses from early prototyping through dogfooding milestones to beta launch readiness. The arc began in mid-April with foundational product decisions (how to navigate wiretapping laws with a manual-upload flow, how to make "Hey Meta, start live notes" feel seamless) and ended in June with a major strategic decision: launching the beta on schedule despite recommendation relevance sitting at 72%, thirteen points below the 85% target.
The defining stretch was an intense three-day sprint in the first week of June (June 2–4) that produced 83 of the 123 total recordings, about two-thirds. The team debated the beta launch question from every angle, weighing Saurabh's quality concerns against Pooja's launch-window argument, and arrived at a pragmatic compromise: geofence the beta to major cities for the first two weeks while silently deploying improved model weights. This decision preserved market timing while protecting user trust by limiting early exposure to locations where the model performs well.
In parallel, the team built the speaker identification pipeline from scratch—collecting voice embeddings, debugging diarization in live meetings, designing per-user voice libraries with Manifold persistence—while simultaneously investigating a Live Notes outage, pushing TE reliability from 75% toward 86%, and planning the M1/M2 milestone roadmap that will carry the product into 2027.
The team shaped the core Live Notes experience through two days of intensive testing and product discussions. The central challenge: wiretapping laws prevent live-streaming audio to a server, so the team designed a manual-upload mitigation where audio is stored on-device and uploaded by the user post-session.
Focus shifted to measurement and data quality. The team confronted poor speaker-distribution metrics in sessions with more than three speakers and decided to build a golden dataset for annotation. Model performance comparisons between Gemini and Avocado showed similar results, but reliability metrics were falling short.
The team navigated dual pressures: a tight DF7 timeline (code complete by 5/12, dogfood by 5/19) and stakeholder skepticism about product quality. A comprehensive reliability document was commissioned to address concerns, while technical limitations of the 9B model's 16K context window threatened summary quality for long transcripts.
After weeks away, the team made a decisive call: stop debating product questions and ship the existing end-to-end v0 immediately, even with rough design. Meanwhile, a deep dive into OpenAI's WebRTC architecture (for ChatGPT Voice and Realtime API) informed the team's own real-time audio infrastructure decisions.
The defining three days of the quarter. The team tackled beta launch readiness, the speaker identification pipeline, a Live Notes outage, and multiple concurrent workstreams at a pace that produced two-thirds of all recordings in the entire period.
The dominant thread: recommendation relevance at 72% versus the 85% target. This was debated across dozens of sessions, with Saurabh advocating for a delay to protect user trust, and Pooja pushing to maintain the launch window with a 91% itinerary satisfaction score as evidence the core product works.
The compromise: Launch on schedule with a major-cities-only geofence for the first two weeks. Saurabh's team silently deploys improved model weights while Pooja implements the geofence and UI restriction. This preserved market timing while constraining exposure to regions where the model performs well.
Parallel to the launch debate, the team built and debugged the speaker ID pipeline in live sessions:
A Live Notes outage starting May 31 was traced to a Sunday routing change with secondary effects on SMC Service Discovery tiers. Reverting didn't immediately fix it. Final summaries failed ~50% of the time.
Alex presented the successful Connect keynote: Mark's live handwriting demo at 30 wpm, Vanguards on the skate ramp, and DC Rainmaker's review titled "finally smart glasses that don't suck." Glasses are tracking to 4–5M units for 2025, below the 10M target but 3x the prior year.
The most recent recording maps out the M1/M2 milestone roadmap with Alex, setting the direction that carries the product into 2027.
The single most discussed topic across all recordings. Recommendation relevance sat at 72% against an 85% target, creating a genuine strategic tension between quality and timing.
Saurabh's position: Launching below target risks early negative reviews and lost user trust, especially for niche destinations where the model struggles. Proposed a two-week delay to integrate augmented training data.
Pooja's position: The core itinerary experience shows 91% user satisfaction. A clear launch window exists. Launch-blocking reliability issues (budget overflow bug, stale hotel price caching) are higher priority than incremental relevance gains.
Resolution: Launch on schedule with a temporary geofence restricting the beta to major cities for the first two weeks. Saurabh's team packages improved model weights for silent deployment. Pooja specs out the geofencing logic and temporary UI flag.
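The geofencing logic in this resolution could look roughly like the sketch below. It is an illustration only, not the spec Pooja's team wrote: the city list, the 50 km radius, the `beta_enabled` function name, and the rule that the restriction lifts automatically after two weeks are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
# The summary says the geofence applies "for the first two weeks".
GEOFENCE_DURATION = timedelta(weeks=2)


@dataclass(frozen=True)
class CityFence:
    name: str
    lat: float
    lon: float
    radius_km: float


# Hypothetical allowlist; the real set of "major cities" is not given in the notes.
MAJOR_CITIES = (
    CityFence("New York", 40.7128, -74.0060, 50.0),
    CityFence("London", 51.5074, -0.1278, 50.0),
)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points, in kilometres."""
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def beta_enabled(lat: float, lon: float, now: datetime, launch: datetime,
                 cities: tuple = MAJOR_CITIES) -> bool:
    """Return True if a user at (lat, lon) should see the beta at time `now`."""
    if now < launch:
        return False  # beta has not launched yet
    if now - launch >= GEOFENCE_DURATION:
        return True  # assumed: the temporary restriction lifts after two weeks
    # Inside the geofence window: allow only users within a city's radius.
    return any(
        haversine_km(lat, lon, c.lat, c.lon) <= c.radius_km for c in cities
    )
```

The temporary UI flag mentioned above could be driven by the same check, so the restriction and the UI state expire together instead of needing a second rollout.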
A technical thread that ran from April through June, evolving from basic measurement problems to a working prototype:
The team tracked reliability from 75% to 86%+ over the quarter:
The team managed aggressive dogfooding timelines while balancing quality expectations:
Several architectural decisions shaped the platform's direction:
Consolidated and deduplicated from 198 raw action items across all sessions. Grouped by workstream.
Most frequently appearing speakers across all recordings.
A heatmap of recording intensity across the quarter. June 2–4 stands out as the most intense working period.