The intelligence is here. The body isn't. That's the bottleneck nobody is building for.
AI agents crossed an inflection point in 2025. Reasoning models can write production code, manage complex workflows, and hold multi-turn conversations indistinguishable from those with a human colleague. The intelligence layer is no longer the constraint. But look at where these agents actually live -- and what they can actually perceive -- and the picture changes completely.
Two numbers tell the story. The agentic AI market is projected to hit $48 billion by 2030 -- a 6x increase from 2025. Meanwhile, 156 million smart speakers shipped last year. There's enormous demand for AI in the physical world. But the devices people actually own are running decade-old voice interaction models that can barely set a timer reliably.
The agents are brilliant. The bodies are braindead.
Claude can reason through a 50-page legal contract. GPT can generate production React components. OpenClaw can orchestrate multi-step workflows across dozens of APIs. But ask any of them what's happening in the room you're sitting in and they have absolutely no idea.
This isn't a minor limitation. It's the fundamental bottleneck. Every AI agent in existence is sensory-deprived.
Every piece of physical-world context must be manually narrated by the user. "It's raining." "I'm in the kitchen." "There's a package at the door." You are the sensor -- and you're an unreliable one. You forget to mention things, you don't think to share context, and you definitely don't narrate your environment in real-time. The agent is only as aware as you make it.
An agent can monitor your email or watch a stock price. It can poll any API. But it cannot notice your toddler walking toward the garage, smoke coming from the stove, someone at your door, or that you've been motionless at your desk for five hours. Zero physical-world triggers can reach it.
Every interaction is session-based. You open a chat, describe your situation, get help, close the chat. The agent has no ambient understanding of your life. It doesn't know you just had a tense phone call, that your house is freezing, that three friends came over, or that you've been practicing guitar for two hours. Every conversation starts from zero physical context.
"What's wrong with this circuit board?" "Is this rash something to worry about?" "What bird is making that sound?" Some questions are inherently visual, auditory, or spatial. Today you photograph it, upload it, type a prompt, wait. The friction kills the use case -- most people just don't bother.
The shift from disembodied to embodied isn't incremental. It changes the category of problem an agent can solve.
Today: you ask a question, the agent answers. With embodiment: the agent notices things and acts before you ask. It sees the FedEx truck pull up. It hears the baby crying in the next room. It detects CO2 levels spiking because you've had the windows closed all day. The agent becomes the one who initiates.
Today: context lasts as long as the conversation. With embodiment: the agent has a persistent model of your physical environment. It knows who's home, what room you're in, what the temperature is. When you talk to it, it already has context. "Turn down the heat" works because it already knows which room you're in.
Today: you type or dictate, agent responds with text. With embodiment: you hold up a component and ask "what is this?" You play a chord and ask "what am I playing?" You point at a weed and ask "should I pull that?" Interaction bandwidth goes from ~40 words per minute to the full richness of the physical world.
Today: agents have zero concept of physical space. With embodiment and multiple sensor nodes: the agent knows your home's layout, which rooms are occupied, where sounds originate. It coordinates actions across spaces. "Someone's at the front door" isn't a notification you configured -- it's something the agent just knows.
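As a rough illustration of the persistent, spatial context described above, here is a minimal sketch of how a command like "Turn down the heat" could resolve to a room without the user naming one. The `HomeState` and `RoomState` classes and their fields are hypothetical names invented for this example, and presence events are assumed to arrive from some sensor pipeline that is not shown.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RoomState:
    """Latest known facts about one room (hypothetical schema)."""
    occupied: bool = False
    temperature_c: Optional[float] = None


@dataclass
class HomeState:
    """Persistent model of the home that outlives any single conversation."""
    rooms: dict[str, RoomState] = field(default_factory=dict)
    last_seen: dict[str, str] = field(default_factory=dict)  # person -> room

    def observe_presence(self, person: str, room: str) -> None:
        # Record where the person is now and mark that room occupied.
        previous = self.last_seen.get(person)
        self.last_seen[person] = room
        self.rooms.setdefault(room, RoomState()).occupied = True
        if previous and previous != room:
            # The old room stays occupied only if someone else was last seen there.
            still_there = any(r == previous for r in self.last_seen.values())
            self.rooms[previous].occupied = still_there

    def resolve_room(self, speaker: str, named_room: Optional[str] = None) -> Optional[str]:
        # An explicitly named room wins; otherwise fall back to where the
        # speaker was last seen. Returns None if nothing is known.
        return named_room or self.last_seen.get(speaker)
```

With this shape, a thermostat action would call `resolve_room(speaker)` first and only ask a clarifying question when it returns `None`.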
The market has attempted embodied AI multiple times. Every attempt has been vertically integrated: one company's hardware running one company's AI. The results range from disappointing to catastrophic.
Amazon has reportedly lost over $10 billion on Alexa since its launch.[1] Despite 40% of US households owning a smart speaker,[2] usage patterns are stubbornly shallow: 75% use them for music, and the most common voice command is still "set a timer."[3] Alexa+ -- Amazon's 2025 generative AI upgrade ($19.99/month or free with Prime) -- promised agentic capabilities and natural conversation. User reception has been mixed at best, with XDA Developers bluntly noting that "Alexa+ didn't revolutionize anything in 2025."[4]
The problem isn't the form factor. It's that these devices are voice-in/voice-out pipes to a locked cloud. They have microphones and speakers. That's it. No camera, no environmental sensing, no presence detection, no expandable sensor stack. And the AI is permanently welded to the hardware: you get Amazon's agent, running Amazon's models, in Amazon's cloud. When the agent is mediocre, the hardware is worthless.
2024 produced two high-profile attempts to give AI agents physical form. Both were disasters.
Humane AI Pin ($699 + $24/month) projected its interface onto the user's palm, required a monthly subscription, and at one point saw returns outpace new sales. Returns exceeded $1 million against $9 million in total sales, with 7,000 units reported unsold.[5] Fire risk concerns meant returned units couldn't even be refurbished.
Rabbit R1 ($199) was a dedicated hardware device for an AI agent that could have been an app. Tom's Guide called it "one of the worst gadgets I've ever reviewed."[6] Its problems included unreliable voice commands, inaccurate answers, 4-hour battery life, and a security flaw that exposed user data through hardcoded API keys.[7] One tally puts combined losses across Sora, Humane, and Rabbit R1 at over $5 billion within 12 months.[8]
Meta's partnership with EssilorLuxottica has produced the most commercially successful AI hardware product of this generation. Ray-Ban Meta glasses sold 2 million units with sales tripling year-over-year.[9] Production is scaling toward 10 million units annually by late 2026,[10] and the glasses outsell traditional Ray-Bans in 60% of EMEA retail stores.[11] Global smart glasses shipments surged 110% in H1 2025, with Meta capturing 73% market share.[12]
This validates consumer appetite for AI-in-the-physical-world. People want it. But the glasses are locked to Meta AI, processed in Meta's cloud, with no developer SDK and no ability to run a different agent. When Meta's AI is good, the glasses are great. When it's not, you're stuck. And the form factor -- great for mobile first-person context -- can't do what a stationary hub can: persistent room monitoring, environmental sensing, multi-room coordination, always-on spatial awareness.
Home Assistant -- the open-source, self-hosted smart home platform run by the Open Home Foundation -- is the closest thing to a working model for agent-agnostic physical AI.[13] It integrates with over 1,000 devices, runs locally, supports OpenAI/Claude/Ollama for LLM-powered automation, and its "Year of the Voice" initiative produced open-source voice hardware. The 2025.8 release added "AI Tasks" for structured agent delegation.[14]
But Home Assistant stitches together third-party hardware that was never designed for AI agent use. A Zigbee motion sensor from 2019, a Ring camera talking to Ring's cloud, an ESP32 mic module with a 3-meter pickup range. There is no coherent, optimized sensor stack purpose-built for giving AI agents rich physical-world perception. The software intelligence is there. The hardware body is a Frankenstein.
Every hardware vendor in this space has made the same mistake: welding the intelligence to the hardware. The table below compares sensor coverage, agent flexibility, privacy posture, and developer access.
| Product | Agent Flexibility | Sensor Stack | Privacy Model | Dev SDK | Verdict |
|---|---|---|---|---|---|
| Amazon Echo / Alexa+ ($19.99/mo or Prime) | Alexa only. Skills are sandboxed wrappers, not real agent access. | Mic array + speaker. Some models add a screen. No camera on most. No environmental sensors. | Cloud only. Poor track record. Recordings sent to Amazon. | Skills Kit (limited) | Mediocre |
| Google Nest / Home (free with Google account) | Google Assistant only. Actions deprecated; Gemini pivot underway. | Mic + speaker. Nest Hub adds camera + temperature sensor. Limited sensing. | Cloud only. Data feeds Google's ad model. | Actions on Google (being sunset) | Mediocre |
| Apple HomePod ($299) | Siri only. SiriKit is extremely restricted. | Mic + speaker + temperature + humidity. No camera, no presence. | Best-in-class. On-device processing. Apple's privacy commitment. | SiriKit (very limited) | Mediocre |
| Meta Ray-Ban Glasses ($299-$799) | Meta AI only. No third-party agents. No SDK. | Camera + mic + speaker. No environmental sensors. Mobile only. | Cloud only. All processing in Meta's cloud. | None | Good hardware, locked ecosystem |
| Rabbit R1 / Humane Pin ($199 / $699+$24/mo) | Custom locked agent. R1 = LAM. Pin = Cosmos. | Camera + mic + speaker. No environmental sensors. No expansion. | Cloud only. R1 had hardcoded API key leak. | None | Failed products |
| Home Assistant (free / $65 Green hub) | Any LLM. OpenAI, Claude, Ollama. Full automation engine. | Depends on third-party hardware. No unified sensor stack. Fragmented. | Local-first. Self-hosted. No cloud dependency. | Full API + add-ons | Right software, no hardware |
| Embodied (this, $149-199 target) | Any agent. Claude, GPT, OpenClaw, custom. Full SDK. | Camera + 4-mic array + speaker + mmWave radar + temp/humidity/AQ + LED ring. Expandable modules. | Hardware-enforced. Kill switches. Local-first. Edge compute. | Full open SDK | The missing piece |
The pattern is obvious: every product either has good hardware with a locked agent or agent flexibility with no hardware. Nobody has built an open, sensor-rich hardware platform that any AI agent can plug into.
A family of hardware devices -- starting with a single hub -- that serve as the physical body for any AI agent. The hardware is open. The SDK is open. The agent is whatever you want it to be.
A compact, desk/shelf-mountable hub (~palm-sized) containing the essential sensory stack for physical-world AI perception.
Concept render: The Puck -- a palm-sized hub with camera, mic array, speaker, and ambient sensors.
Wireless add-on modules that extend the Puck's sensory range. Each module pairs over BLE and auto-registers capabilities with the SDK.
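One way to picture the auto-registration step is a small capability registry on the hub. This is only a sketch: `Capability`, `ModuleRegistry`, and the example capability names are hypothetical and not part of any published SDK, and the BLE pairing handshake is assumed to have already completed before `register` is called.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Capability:
    """One thing a module can sense or do (hypothetical schema)."""
    name: str  # e.g. "temperature", "presence"
    unit: str  # e.g. "celsius", "bool"


@dataclass
class ModuleRegistry:
    """Tracks which paired module provides which capability."""
    _providers: dict[str, list[str]] = field(default_factory=dict)

    def register(self, module_id: str, capabilities: list[Capability]) -> None:
        # Called once pairing completes; the BLE handshake is out of scope here.
        for cap in capabilities:
            providers = self._providers.setdefault(cap.name, [])
            if module_id not in providers:
                providers.append(module_id)

    def unregister(self, module_id: str) -> None:
        # Drop a module that was unpaired or went out of range.
        for name in list(self._providers):
            remaining = [m for m in self._providers[name] if m != module_id]
            if remaining:
                self._providers[name] = remaining
            else:
                del self._providers[name]

    def providers_of(self, capability: str) -> list[str]:
        # Modules that can currently serve this capability, in pairing order.
        return list(self._providers.get(capability, []))

    def capabilities(self) -> list[str]:
        # Everything an agent can ask for right now, sorted for stable output.
        return sorted(self._providers)
```

An agent would then query `capabilities()` instead of hard-coding which sensors exist, so adding a module changes what the agent can perceive without changing the agent.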
Glasses or clip-on form factor giving agents first-person visual perspective + always-on audio. Think Meta Ray-Bans but agent-agnostic. V1 focuses on the stationary hub -- the wearable is a harder problem (battery, thermals, optics) and benefits from proving the SDK first.
Each of these is impossible for a disembodied agent. That's the bar -- if an agent could do it through a chat window, it doesn't belong here.
Camera and environmental sensors in the kitchen. Agent watches while you cook.
Without embodiment: The agent would need you to type "I'm cooking chicken with garlic and lemons, what should I make?" The proactive, ambient nature disappears entirely. You'd never think to tell the agent the air quality is dropping -- you don't know it yourself.
Camera pointed at your workspace. Mic always listening (wake-word activated). Presence detection tracks focus time.
Without embodiment: The agent doesn't know you're in a meeting, can't see your screen, doesn't know you've been sitting for hours. You'd have to switch tabs, paste the error, describe the context. The friction means you just Google it instead.
The Workshop Companion: when your hands are covered in sawdust, a chat window isn't an option.
Ruggedized camera module in a garage, studio, or workshop. Mic for hands-free interaction.
Without embodiment: Hands are occupied. You're covered in sawdust, solder, or paint. Picking up a phone isn't just inconvenient -- it breaks your flow and sometimes isn't physically possible. Embodiment is highest-value when the user's hands are busy.
Multiple sensor nodes across a home. Presence detection, environmental sensors, door/window contacts. Camera optional per room.
Without embodiment: The agent has zero knowledge of your home's physical state. Can't know which doors are open, who's home, whether the stove is on. You check everything yourself -- which is exactly what you do today.
Full sensory stack optimized for users with visual impairments, mobility limitations, or cognitive disabilities. This is where embodiment goes from nice-to-have to the entire value proposition.
Without embodiment: These users often can't easily use phones, tablets, or keyboards. The voice + vision interface IS the product. There's no "just use the chat app" fallback.
Multi-node deployment across a property. Each unit runs a security-focused agent with camera and presence sensing. All processing local.
Without embodiment: A disembodied agent can read your Ring camera's notification -- but it can't decide whether to unlock the door, track someone across angles, or reason about a scene in real time. The intelligence layer is missing from existing security hardware.
Every competitor locks hardware to one AI. We don't care which agent you run. Claude, GPT, OpenClaw, Ollama, your custom stack -- the hardware works with all of them through the SDK. When a better model ships next quarter, you swap the brain, not the body.
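The "swap the brain, not the body" idea can be sketched as a structural interface that any agent satisfies. Everything here is an assumption made for illustration: `PerceptionEvent`, `Agent`, `Hub`, and the two toy agents are hypothetical names, and a real agent would call a model rather than format a string.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class PerceptionEvent:
    """A single observation from the hardware (hypothetical schema)."""
    source: str  # which sensor produced it, e.g. "radar"
    kind: str    # e.g. "presence", "doorbell"
    detail: str


class Agent(Protocol):
    """Anything with a matching handle() method counts as an agent."""
    def handle(self, event: PerceptionEvent) -> str: ...


class EchoAgent:
    """Toy agent that simply describes what it was told."""
    def handle(self, event: PerceptionEvent) -> str:
        return f"{event.kind} from {event.source}: {event.detail}"


class QuietAgent:
    """Toy agent that chooses not to speak."""
    def handle(self, event: PerceptionEvent) -> str:
        return ""


class Hub:
    """The body: forwards events to whichever brain is plugged in."""
    def __init__(self, agent: Agent) -> None:
        self.agent = agent

    def swap_agent(self, agent: Agent) -> None:
        # Replacing the agent requires no change to the sensor side.
        self.agent = agent

    def dispatch(self, event: PerceptionEvent) -> str:
        return self.agent.handle(event)
```

Using `typing.Protocol` means agents do not need to inherit from anything the platform ships; matching the method signature is enough.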
The SDK is the product, not the device. Every agent developer in the world becomes a potential customer when their agent can gain a physical body through a standard API. We build for the people building agents, not just the people using them.
Hardware kill switches for camera and mic. On-device processing for wake-word, basic ASR, and presence. No cloud dependency for core sensing. In a world where 65% of consumers report discomfort with always-on devices, local-first isn't just ethical -- it's a competitive moat.
Open SDK means third-party sensor modules, community-built agent templates, shared perception models. More agents built for the platform → more hardware sold. More hardware deployed → more agents built. Home Assistant proved this flywheel works in software. We bring it to hardware.
Before the iPhone, phones were vertically integrated: Nokia's hardware ran Nokia's software ran Nokia's apps. The iPhone (and later Android) separated the hardware platform from the application layer. Suddenly any developer could build for the hardware. The app ecosystem -- not the hardware itself -- became the value.
The shift: from talking at a speaker to ambient spatial intelligence.
AI agents are in the "Nokia era" right now. Alexa's hardware runs Alexa's agent. Meta's glasses run Meta's agent. Google's hub runs Google's agent. Each vendor's intelligence is welded to its own body.
But the intelligence is commoditizing. Open-source models close the gap with frontier models every quarter. Claude, GPT, Gemini, Llama, Mistral -- the brain is becoming interchangeable. What's not interchangeable is the body. A camera in your kitchen. A mic array in your workshop. A presence sensor in every room. The physical layer is the scarce resource.
We're building the platform that separates the body from the brain -- so any brain can have a body.
On-device ASR (Whisper tiny/small) is fast but lower quality. On-device vision is limited. How much must go to the cloud? How do we preserve privacy guarantees for cloud-processed requests?
Always-on sensing drains power. Stationary Puck is wall-powered (solved). Wearable is the hard problem -- probably 4-6 hours active with current battery tech.
When three agents run on your Puck, how does the user know which is talking? Which is listening? LED ring helps but the interaction model needs careful design.
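One possible starting point for this question is a single "floor holder" model: only one agent owns the speaker at a time, and the LED ring shows that agent's color. The sketch below assumes this design; `SpeakerArbiter`, the agent ids, and the colors are hypothetical, and it deliberately ignores priorities and preemption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SpeakerArbiter:
    """Grants the speaker to one agent at a time, first come first served."""
    led_colors: dict[str, str]  # agent id -> LED ring color
    _holder: Optional[str] = None
    _waiting: list[str] = field(default_factory=list)

    def request(self, agent_id: str) -> bool:
        # Returns True if the agent may speak now, False if it was queued.
        if agent_id not in self.led_colors:
            raise KeyError(f"unknown agent: {agent_id}")
        if self._holder is None:
            self._holder = agent_id
            return True
        if agent_id == self._holder:
            return True
        if agent_id not in self._waiting:
            self._waiting.append(agent_id)
        return False

    def release(self, agent_id: str) -> None:
        # Only the current holder can give up the floor; the next waiter gets it.
        if agent_id != self._holder:
            return
        self._holder = self._waiting.pop(0) if self._waiting else None

    def led(self) -> str:
        # What the ring should show: the talking agent's color, or "off".
        return self.led_colors[self._holder] if self._holder else "off"
```

Tying the LED state to the same object that grants the speaker keeps the indicator from drifting out of sync with who is actually talking.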
Open platform means some agents will be bad. A poorly built agent that spams the speaker hurts the hardware brand. Need permissions + review without becoming an app store gatekeeper.
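For the speaker-spam case specifically, a platform-level budget could cap how often any one agent speaks without reviewing the agent itself. This is a minimal sliding-window sketch; `SpeechBudget` and its limits are hypothetical, and the caller passes in the current time so the logic stays easy to test.

```python
from collections import deque


class SpeechBudget:
    """Allows each agent at most max_utterances per rolling window."""

    def __init__(self, max_utterances: int, window_s: float) -> None:
        self.max_utterances = max_utterances
        self.window_s = window_s
        self._history: dict[str, deque] = {}  # agent id -> utterance timestamps

    def allow(self, agent_id: str, now: float) -> bool:
        recent = self._history.setdefault(agent_id, deque())
        # Forget utterances that have aged out of the window.
        while recent and now - recent[0] >= self.window_s:
            recent.popleft()
        if len(recent) >= self.max_utterances:
            return False  # over budget: the platform stays silent
        recent.append(now)
        return True
```

A budget like this is enforced at runtime rather than at submission time, which is one way to get guardrails without an app-store style review queue.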
Perceive (camera) → process (cloud LLM) → act (speaker). Round trip: 2-5 seconds. Too slow for conversation. Must identify which loops stay on-device vs. which tolerate latency.
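The on-device versus cloud split can be framed as a deadline check per loop. In the sketch below, the 5-second figure is the upper end of the round trip quoted above; the `Loop` type, the `place` function, and the example deadlines are assumptions chosen for illustration, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loop:
    """One perceive-process-act loop and what it can tolerate."""
    name: str
    deadline_ms: int         # how long the user will wait (assumed value)
    on_device_capable: bool  # can a local model handle it at all?


CLOUD_WORST_CASE_MS = 5000  # upper end of the 2-5 second round trip


def place(loop: Loop, cloud_ms: int = CLOUD_WORST_CASE_MS) -> str:
    """Decide where a loop should run given its latency deadline."""
    if loop.deadline_ms >= cloud_ms:
        return "cloud"   # latency-tolerant: the bigger model is affordable
    if loop.on_device_capable:
        return "device"  # too urgent for the cloud, but local can do it
    return "unmet"       # neither option meets the deadline today


loops = [
    Loop("wake word", 300, True),
    Loop("conversational reply", 800, False),
    Loop("scene summary", 8000, False),
]
placements = {loop.name: place(loop) for loop in loops}
```

Loops that come back as `"unmet"` are the ones that need either a smaller local model or a faster cloud path before they are worth shipping.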
Camera + 4-mic array + speaker + radar + environmental sensors + Wi-Fi 6E + BLE + SoC + 64GB. Fitting all of that into the $149-199 target price is tight but feasible at scale. The dev kit uses off-the-shelf parts at a higher unit cost; the production Puck benefits from a custom PCB and volume.
The embedded AI market is projected to reach $48.9 billion by 2034, growing at 17.5% CAGR.[15] Edge AI hardware alone is projected to hit $36.4 billion by 2033.[16] The consumer AI autonomous agent market -- devices that orchestrate cross-device interactions -- is projected to grow from $458M in 2026 to $833M by 2032.[17]
But none of these projections account for what happens when the body and the brain are decoupled -- when any developer building an AI agent can give it eyes, ears, and spatial awareness through a $149 device with a standard SDK. That's not an incremental market. It's a platform shift.
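The growth rates behind the projections above can be sanity-checked from their endpoints with the standard compound annual growth rate formula. The start values, end values, and years below are taken from references [15] to [17]; the assumption that each span runs from the stated start year to the stated end year is mine.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two endpoints."""
    return (end / start) ** (1 / years) - 1


# (start value, end value, number of years), in the units each source uses.
projections = {
    "embedded AI [15]": (13.49, 48.90, 2034 - 2026),
    "edge AI hardware [16]": (12.5, 36.4, 2033 - 2024),
    "consumer AI agents [17]": (458.46, 833.21, 2032 - 2026),
}

implied = {
    label: cagr(start, end, years)
    for label, (start, end, years) in projections.items()
}

for label, rate in implied.items():
    print(f"{label}: {rate:.1%}")
```

Under that assumption the embedded AI endpoints reproduce the quoted 17.5%, while the edge AI hardware endpoints imply roughly 12.6% rather than the 13.4% quoted in [16], which suggests that source compounds over a different span than 2024 to 2033.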
The intelligence is commoditizing. The physical layer is the moat. We own the body.
[1] Emarketer. "Google Assistant leads U.S. voice assistant adoption... Amazon's $10 billion loss from Alexa's decline." emarketer.com
[2] ElectroIQ. "35% of U.S. adults own smart speakers... global market reached $21.4B in 2025, 156M units shipped." electroiq.com; US Smart Speaker Market Research (MRFR) reports 40% household ownership.
[3] Digital Trends. "75% of smart speaker users primarily use them for music... 43% have made purchases." digitaltrends.com
[4] XDA Developers. "Alexa+ and Google Home's AI didn't revolutionize anything in 2025, but Home Assistant did." xda-developers.com
[5] MacRumors. "Returns of Humane AI Pin outpacing sales: $9 million in sales, $1 million in returns, 7,000 units unsold." macrumors.com
[6] Tom's Guide. "The Rabbit R1 is one of the worst gadgets I've ever reviewed." tomsguide.com
[7] SafeWise. "Rabbit R1 security flaw: hardcoded API keys allowing unauthorized access to user data." safewise.com
[8] Digital Applied. "AI Product Failures 2026: Sora, Humane & Rabbit R1 -- combined $5 billion+ in losses within 12 months." digitalapplied.com
[9] UploadVR / Entrepreneur. "Ray-Ban Meta glasses sales tripled year-over-year; 2 million units sold as of 2025." uploadvr.com
[10] Reuters. "Meta and EssilorLuxottica considering doubling production to 20M units annually; 10M-pair target by late 2026." reuters.com
[11] TechCrunch. "Meta's smart glasses outsell traditional Ray-Bans in 60% of European, Middle Eastern, and African stores." techcrunch.com
[12] Counterpoint Research. "Global smart glasses shipments surged 110% YoY in H1 2025, Meta capturing 73% market share." counterpointresearch.com
[13] Home Assistant. "Open-source home automation platform, 1,000+ device integrations, local-first." home-assistant.io
[14] Home Assistant. "2025.8: The Summer of AI -- AI Tasks for structured task delegation." home-assistant.io
[15] Fortune Business Insights. "Global embedded AI market projected to grow from $13.49B in 2026 to $48.90B by 2034 at 17.5% CAGR." fortunebusinessinsights.com
[16] LinkedIn / Market Research. "Edge AI Hardware Market projected to grow from $12.5B in 2024 to $36.4B by 2033 at 13.4% CAGR." linkedin.com
[17] Research and Markets. "Consumer Electronics AI Autonomous Agent Market -- $458.46M in 2026 to $833.21M by 2032." researchandmarkets.com
[18] MarketsandMarkets. "Global embodied AI market projected to grow from $4.44B in 2025 to $23.06B by 2030 at 39.0% CAGR." marketsandmarkets.com
Draft v0.2 · May 2026