Building a Data Layer for AI Personalization on Robots

Robots that understand people need more than sensors and actuators. They need a memory architecture — a structured data layer that captures who you are, what you prefer, and how you change over time. This document lays out the design.

Raunaq Naidu · May 2026 · Working draft — open for review
Contents
  1. Why Robots Need a Personalization Data Layer
  2. Architecture: What the Data Layer Looks Like
  3. Key Design Decisions
  4. The AI Stack: Foundation Models Meet Personalization
  5. Concrete Examples Across Domains
  6. Infrastructure Patterns
  7. Open Problems & Research Directions
  8. The Agent Infrastructure Thesis

1. Why Robots Need a Personalization Data Layer

The robotics stack has historically been organized around two problems: perception (what is in the world?) and control (how do I move through it?). Decades of research have produced increasingly capable solutions for both. A modern mobile manipulator can navigate a cluttered kitchen, identify objects on a counter, and execute grasps with sub-centimeter precision.

And yet, most robots deployed in homes and workplaces today are, frankly, not very useful. Not because they can't see or move — but because they don't know you.

The core gap

A robot that can fetch a coffee mug is a demo. A robot that knows which mug you want, that you take oat milk on weekdays and whole milk on weekends, that you don't like being interrupted before 9 AM, and that your partner's preferences are different — that's a product. The gap between those two robots is not perception or control. It's a data layer.

This data layer — what we might call the personalization substrate — sits between the foundation-model reasoning layer and the physical execution layer. It provides the memory, preferences, and contextual knowledge that make a robot's actions feel intelligent rather than merely competent.

What "personalization" actually means for robots

Personalization on robots is not a recommendation engine. It spans at least four dimensions:

Preference Memory
Persistent knowledge of a user's likes, dislikes, habits, and routines. Goes beyond explicit settings — inferred from interaction history over weeks and months.
Contextual Adaptation
Adjusting behavior based on time of day, who else is present, what happened recently, ambient conditions. The same request at 7 AM and 11 PM may need different responses.
Behavioral Calibration
Speed, chattiness, proactivity, personal space, noise tolerance — the meta-parameters of how a robot behaves, not just what it does.
Environmental Memory
Persistent spatial knowledge: where things belong, what's changed since last time, the layout of a space, and named locations ("the good shelf," "Mom's chair").

None of these fit neatly into a SLAM map or an object detector. They require a dedicated data architecture — one designed for the unique constraints of physical agents operating in human environments.

The Personalization Gap
[Figure: without the data layer, the robot says "Here is a mug," cannot answer "Which mug?", and does not know you: a demo. With the data layer, it says "Your blue mug, oat milk, no sugar. Good morning, Alice.": a product.]

2. Architecture: What the Data Layer Looks Like

The personalization data layer is not a single database. It is a composition of stores, each optimized for a different kind of knowledge, connected through a unified query interface that the robot's reasoning layer can access in real time.

Personalization Stack
From top to bottom:

  1. Application Layer: task execution, dialogue, planning
  2. Personalization Engine: preference resolution, context fusion, adaptation
  3. Data Layer (this document): user profiles, interaction logs, embeddings, context graphs, spatial memory
  4. Infrastructure: event bus, vector store, graph DB, key-value cache, sync engine
  5. Hardware & Sensors: cameras, LiDAR, microphones, IMU, tactile sensors

Core data stores

The data layer decomposes into five primary stores:

Five Core Data Stores
[Figure: a Unified Query API sits on top of five stores: User Profiles, Interaction History, Preference Models, Context Graph, and Spatial / Environmental Memory.]

Store | Contents | Access Pattern | Backing Tech
User Profiles | Demographics, explicit preferences, accessibility needs, household roles | Key-value lookup by user ID | Document store / KV
Interaction History | Timestamped log of every interaction: commands, corrections, feedback, implicit signals | Append-only, time-range queries, aggregation | Event store / time-series DB
Preference Models | Learned preference vectors, Bayesian priors, multi-armed bandit state for ongoing optimization | Read per-decision, batch update on schedule | Model artifact store + KV cache
Context Graph | Relationships between users, objects, locations, routines, and activities | Graph traversal (who → uses → what → when) | Property graph DB
Spatial / Environmental Memory | Persistent map annotations, object placements, named regions, change history | Spatial queries (nearest, within region), temporal diffs | Spatial index + document store

The unified query interface

The reasoning layer should not need to know which store holds what. Instead, it queries a personalization context API that assembles a coherent context object for any given situation:

// Pseudocode: what the reasoning layer sees
const ctx = await personalization.getContext({
  user: "alice",
  location: "kitchen",
  time: now(),
  task: "prepare_morning_beverage",
  history_window: "7d"
});

// ctx contains:
// - alice.preferences.beverages → { weekday: "oat latte", weekend: "cortado" }
// - alice.behavior.morning_tolerance → "low" (don't be chatty)
// - kitchen.objects.mug_locations → [{ name: "alice_favorite", shelf: 2, slot: 3 }]
// - kitchen.recent_changes → [{ item: "oat_milk", status: "low", detected: "2h ago" }]
// - interaction_history.beverage_requests → last 7 days of what was asked/served (per history_window)

This context object is the primary input to the LLM or policy network that decides what to do. The personalization data layer's job is to make assembling this context fast (sub-100ms), consistent (no stale preference contradicting a recent correction), and privacy-preserving (only surface data the current task is authorized to access).
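The authorization requirement can be illustrated with a small sketch. It assumes in-memory stores and a per-task scope table; `assembleContext`, `ContextStore`, `TASK_SCOPES`, and the scope names are all hypothetical and not part of any existing API. It shows only the filtering step, not the latency or consistency guarantees.

```typescript
// Hypothetical sketch: every name here is illustrative, not a real API.
type Scope = "preferences" | "behavior" | "spatial" | "health";

interface ContextFragment {
  scope: Scope;
  key: string;
  value: unknown;
}

interface ContextRequest {
  user: string;
  location: string;
  task: string;
}

// A backing store returns whatever fragments it holds for the request.
type ContextStore = (req: ContextRequest) => ContextFragment[];

// Assumed policy table: which scopes each task is authorized to read.
const TASK_SCOPES: Record<string, Scope[]> = {
  prepare_morning_beverage: ["preferences", "behavior", "spatial"],
  medication_management: ["preferences", "health"],
};

function assembleContext(
  req: ContextRequest,
  stores: ContextStore[],
): Record<string, unknown> {
  const allowed: Scope[] = TASK_SCOPES[req.task] || [];
  const assembled: Record<string, unknown> = {};
  for (const store of stores) {
    for (const fragment of store(req)) {
      // Drop anything the current task is not authorized to see.
      if (allowed.indexOf(fragment.scope) !== -1) {
        assembled[fragment.key] = fragment.value;
      }
    }
  }
  return assembled;
}

const profileStore: ContextStore = (req) => [
  { scope: "preferences", key: req.user + ".preferences.beverages", value: { weekday: "oat latte", weekend: "cortado" } },
  { scope: "health", key: req.user + ".health.medications", value: ["evening pill"] },
];
const spatialStore: ContextStore = (req) => [
  { scope: "spatial", key: req.location + ".objects.mug_locations", value: [{ name: "alice_favorite", shelf: 2, slot: 3 }] },
];

const beverageContext = assembleContext(
  { user: "alice", location: "kitchen", task: "prepare_morning_beverage" },
  [profileStore, spatialStore],
);
```

Filtering at assembly time means the reasoning layer never receives out-of-scope data, rather than being trusted to ignore it.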


3. Key Design Decisions

On-device vs. cloud

This is the defining architectural tension. On-device keeps data private and latency low, but limits storage, compute, and fleet-wide learning. Cloud enables richer models and cross-device continuity, but introduces latency and trust concerns.

The On-Device vs Cloud Tradeoff
[Figure: on-device (the preferred default) holds profiles and preferences (sub-10ms, private), spatial memory (works offline), and the cache. Cloud (enhancement only) handles fleet learning on aggregated models, plus sync and backup.]
On-Device First
Keep by default: user profiles, preference cache, recent interaction history, hot spatial memory, behavioral calibration state.

Why: Sub-10ms access. Works offline. No data leaves the home. Critical for user trust — especially in healthcare and domestic settings.
Cloud When Needed
Sync selectively: anonymized interaction patterns for fleet learning, model updates, cross-device preference sync (if user opts in), long-term archival.

Why: Training data aggregation, backup, multi-robot households, and enrichment from external knowledge.
Design principle

The robot should be fully functional on its own device data, with cloud as an enhancement layer, never a dependency. If the network goes down, the robot still knows your name and how you like your coffee.

Privacy by design

Personalization data is inherently sensitive. The architecture must encode privacy as a structural property, not a policy overlay:

Real-time inference vs. batch learning

The data layer supports two temporal modes, and getting the boundary right matters enormously:

Dual-Loop Learning Architecture
[Figure: each interaction (the user speaks or acts) is appended to the event store, an append-only log. A fast loop (seconds) updates the cache and a slow loop (hours) retrains models. Their outputs are combined by a recency-weighted merge.]

The fast loop is authoritative for recent corrections. The slow loop provides the baseline. Conflicts are resolved with a recency-weighted merge — explicit recent corrections always win over statistical patterns.
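One way to make "recency-weighted" concrete is sketched below. The exponential decay, the `CORRECTION_WEIGHT` constant, and the function name are assumptions chosen for illustration, not the design's actual merge rule. A correction starts with more weight than any baseline confidence can reach, then decays so the slow loop's baseline regains influence once the correction is old.

```typescript
interface Correction {
  value: string;
  timestampMs: number; // when the user gave the explicit correction
}

interface LearnedPreference {
  value: string;
  confidence: number; // 0..1, produced by the slow loop
}

// Assumed constant: a fresh correction outweighs any baseline (confidence <= 1).
const CORRECTION_WEIGHT = 2;

function resolvePreference(
  learned: LearnedPreference | undefined,
  corrections: Correction[],
  nowMs: number,
  halfLifeMs: number,
): string | undefined {
  const scores: Record<string, number> = {};
  if (learned) {
    scores[learned.value] = learned.confidence;
  }
  for (const c of corrections) {
    const ageMs = Math.max(0, nowMs - c.timestampMs);
    // Exponential decay: the weight halves every halfLifeMs.
    const weight = CORRECTION_WEIGHT * Math.pow(0.5, ageMs / halfLifeMs);
    scores[c.value] = (scores[c.value] || 0) + weight;
  }
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const value of Object.keys(scores)) {
    if (scores[value] > bestScore) {
      best = value;
      bestScore = scores[value];
    }
  }
  return best;
}
```

With these constants, a correction younger than one half-life always beats the baseline, which matches the "recent corrections always win" rule; the half-life itself would need tuning per domain.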

Schema design for heterogeneous form factors

A home assistant, a warehouse logistics bot, and a healthcare companion share no physical morphology. But they can share a personalization schema — because personalization is about the human, not the robot.

// Core personalization schema (form-factor agnostic)
PersonalizationRecord {
  user_id:          string
  household_id:     string
  
  // Explicit preferences (user-set)
  explicit_prefs:   Map<domain, Map<key, value>>
  
  // Learned preferences (model-derived)
  learned_prefs:    Map<domain, PreferenceVector>
  
  // Behavioral calibration
  interaction_style: {
    proactivity:    float   // 0 = only when asked, 1 = anticipate needs
    verbosity:      float   // 0 = silent, 1 = chatty
    pace:           float   // 0 = deliberate, 1 = fast
    personal_space: float   // meters — physical proximity comfort
  }
  
  // Context graph edges (portable across robots)
  routines:         Routine[]
  relationships:    Relationship[]    // to other users, pets, objects
  
  // Form-factor-specific extensions
  extensions:       Map<form_factor, any>
}

The key insight: the core schema is about the person and their context, with form-factor-specific capabilities layered as extensions. When a household adds a new robot, it inherits the family's personalization data immediately. Day-one personalization without a cold-start problem.
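As a rough illustration of how a new robot could inherit the shared core while seeing only its own extension, here is a trimmed TypeScript rendering of the schema. It omits `learned_prefs`, `routines`, and `relationships` for brevity, and `viewForFormFactor`, `RobotView`, and the example extension field are hypothetical names introduced for this sketch.

```typescript
interface InteractionStyle {
  proactivity: number;    // 0 = only when asked, 1 = anticipate needs
  verbosity: number;      // 0 = silent, 1 = chatty
  pace: number;           // 0 = deliberate, 1 = fast
  personal_space: number; // meters
}

interface PersonalizationRecord {
  user_id: string;
  household_id: string;
  explicit_prefs: Record<string, Record<string, string>>;
  interaction_style: InteractionStyle;
  extensions: Record<string, Record<string, unknown>>;
}

interface RobotView {
  user_id: string;
  household_id: string;
  explicit_prefs: Record<string, Record<string, string>>;
  interaction_style: InteractionStyle;
  extension: Record<string, unknown>; // only this robot's form factor
}

// Project a record for one robot: the portable core plus its own extension.
function viewForFormFactor(record: PersonalizationRecord, formFactor: string): RobotView {
  return {
    user_id: record.user_id,
    household_id: record.household_id,
    explicit_prefs: record.explicit_prefs,
    interaction_style: record.interaction_style,
    extension: record.extensions[formFactor] || {},
  };
}

const aliceRecord: PersonalizationRecord = {
  user_id: "alice",
  household_id: "household_1",
  explicit_prefs: { beverage: { mug: "green_mug" } },
  interaction_style: { proactivity: 0.3, verbosity: 0.2, pace: 0.5, personal_space: 1.0 },
  extensions: { mobile_manipulator: { max_handoff_height_m: 1.1 } },
};

// A newly added robot of a different form factor still gets the shared core.
const vacuumView = viewForFormFactor(aliceRecord, "vacuum");
const manipulatorView = viewForFormFactor(aliceRecord, "mobile_manipulator");
```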


4. The AI Stack: Foundation Models Meet Personalization

Foundation models — LLMs, vision-language models, multimodal transformers — are increasingly the reasoning backbone of robot systems. The personalization data layer connects to these models in three primary ways:

Embeddings for user intent

Every user interaction is embedded into a shared vector space — not just the words, but the intent in context. "Make it warmer" means different things when said to a thermostat robot vs. a beverage-making robot vs. a robot adjusting a blanket.

The personalization layer maintains per-user intent embeddings that capture how this specific user uses language. Over time, the system learns that when Alice says "the usual," she means a specific sequence of actions that's different from what Bob means.

Intent Resolution Pipeline
[Figure: raw utterance ("the usual," voice input) + user context (Alice, 7:15 AM, kitchen) = resolved intent (oat_latte, alice_mug, no_sugar). Same words, different meaning per user; the data layer resolves the ambiguity.]

Retrieval-augmented personalization (RAP)

Just as RAG (retrieval-augmented generation) grounds LLMs in external documents, RAP grounds robot reasoning in personal context. Before the LLM plans an action:

  1. The query is embedded and used to retrieve the top-k relevant personalization records from the vector store.
  2. Retrieved context is injected into the LLM prompt alongside the task specification.
  3. The LLM generates a personalized plan that accounts for the user's specific preferences and history.
// RAP prompt assembly
const relevantContext = await vectorStore.query({
  embedding: await embed(userRequest),  // embedding calls are typically async
  filters: { user_id: "alice", recency: "90d" },
  top_k: 10
});

const plan = await llm.generate({
  system: ROBOT_SYSTEM_PROMPT,
  context: [
    { role: "personalization", content: formatContext(relevantContext) },
    { role: "environment", content: currentSceneDescription },
    { role: "user", content: userRequest }
  ]
});

This pattern is powerful because it separates the personalization data from the reasoning model. You can upgrade the LLM without losing personalization. You can transfer personalization to a new robot without retraining. The data layer is the durable asset; the model is a replaceable reasoning engine.
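The retrieval step can be sketched without any vector database. The version below does an exhaustive cosine-similarity scan with a user and recency filter; a real deployment would presumably use an approximate index instead. `MemoryRecord`, `retrieveTopK`, and this particular `formatContext` are illustrative assumptions, and the two-dimensional embeddings are toy values.

```typescript
interface MemoryRecord {
  user_id: string;
  text: string;
  embedding: number[];
  ageDays: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Filter by user and recency first, then rank by similarity to the query.
function retrieveTopK(
  records: MemoryRecord[],
  queryEmbedding: number[],
  userId: string,
  maxAgeDays: number,
  k: number,
): MemoryRecord[] {
  return records
    .filter((r) => r.user_id === userId && r.ageDays <= maxAgeDays)
    .map((r) => ({ record: r, score: cosineSimilarity(r.embedding, queryEmbedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.record);
}

// Render retrieved records as a bullet list for prompt injection.
function formatContext(records: MemoryRecord[]): string {
  return records.map((r) => "- " + r.text).join("\n");
}

const memories: MemoryRecord[] = [
  { user_id: "alice", text: "Prefers oat latte on weekdays", embedding: [1, 0], ageDays: 3 },
  { user_id: "alice", text: "Likes blinds half open", embedding: [0, 1], ageDays: 10 },
  { user_id: "bob", text: "Prefers espresso", embedding: [1, 0], ageDays: 1 },
  { user_id: "alice", text: "Used to order cortado", embedding: [0.9, 0.1], ageDays: 200 },
];

const retrieved = retrieveTopK(memories, [1, 0.1], "alice", 90, 2);
const promptContext = formatContext(retrieved);
```

Applying the user filter before ranking keeps one person's memories from ever entering another person's prompt, regardless of how similar the embeddings are.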

Few-shot adaptation

For novel preferences that don't have enough history for statistical learning, the system uses few-shot adaptation: a small number of examples (often just one explicit correction) are stored and used as in-context demonstrations for the LLM.

Example: Learning a new preference in one interaction
Day 1: Robot serves Alice tea in a random mug. Alice says, "Use the green mug next time."
Stored: { user: "alice", domain: "tea", key: "mug_preference", value: "green_mug", source: "explicit_correction", confidence: 0.95 }
Day 2: Robot serves tea in the green mug without asking. Alice says nothing → implicit confirmation → confidence → 0.98.
Day 14: Slow loop detects consistent pattern across 12 interactions. Promotes from "few-shot correction" to "stable preference."
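The lifecycle in this example can be sketched as two small functions. The update rule (each implicit confirmation closes a fixed fraction of the remaining gap to 1.0) and the gain of 0.6 are assumptions picked so that 0.95 moves to 0.98 as in the example; the threshold of 12 confirmations is likewise taken from the example rather than from a specified policy.

```typescript
type PreferenceStatus = "few_shot_correction" | "stable_preference";

interface StoredPreference {
  user: string;
  domain: string;
  key: string;
  value: string;
  confidence: number;
  confirmations: number;
  status: PreferenceStatus;
}

// Assumed rule: each implicit confirmation closes a fixed fraction of the gap to 1.0.
function applyImplicitConfirmation(pref: StoredPreference, gain: number): StoredPreference {
  return {
    ...pref,
    confidence: pref.confidence + (1 - pref.confidence) * gain,
    confirmations: pref.confirmations + 1,
  };
}

// The slow loop promotes a correction once enough consistent interactions exist.
function promoteIfStable(pref: StoredPreference, minConfirmations: number): StoredPreference {
  if (pref.status === "few_shot_correction" && pref.confirmations >= minConfirmations) {
    return { ...pref, status: "stable_preference" };
  }
  return pref;
}

let mugPref: StoredPreference = {
  user: "alice",
  domain: "tea",
  key: "mug_preference",
  value: "green_mug",
  confidence: 0.95,
  confirmations: 0,
  status: "few_shot_correction",
};

// Day 2: Alice says nothing, which counts as an implicit confirmation.
mugPref = applyImplicitConfirmation(mugPref, 0.6);
const confidenceAfterDayTwo = mugPref.confidence;

// Eleven more consistent interactions, then the slow loop runs.
for (let i = 0; i < 11; i++) {
  mugPref = applyImplicitConfirmation(mugPref, 0.6);
}
mugPref = promoteIfStable(mugPref, 12);
```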

Multimodal grounding

Personalization doesn't only come from language. The data layer ingests signals from multiple modalities:

All signals are fused into the context graph as weighted edges, creating a rich, multimodal representation of the user's current state and long-term patterns.


5. Concrete Examples Across Domains

Three Domains, One Data Layer
[Figure: Home (multi-person household, routines and preferences, spatial memory), Warehouse (worker pace matching, ergonomic adaptation, shift patterns), and Healthcare (communication style, medication reminders, mood-aware interaction) share one personalization schema across all form factors.]

Home assistant robots

Scenario: Multi-person household

A family of four shares a home assistant robot. The personalization layer maintains:

  • Per-person profiles: Alice (parent, early riser, vegetarian), Bob (parent, night owl, coffee enthusiast), Charlie (teen, gaming schedule), Dana (child, needs supervision for certain tasks).
  • Household context graph: Shared preferences (thermostat range everyone tolerates), individual overrides (Alice's office is 2° cooler), pet care schedule, shared calendar integration.
  • Interaction routing: Robot identifies who it's interacting with (voice ID + face recognition → user_id lookup) and loads that person's context. When context is ambiguous (two people in the room), it asks rather than guessing.
  • Routine learning: After 3 weeks, the robot has learned the family's weekday morning sequence. It pre-positions items, adjusts its proactivity level based on who's awake, and avoids vacuuming during Charlie's sleep-in hours.
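The "ask rather than guess" rule from the interaction-routing item can be sketched as a threshold check. The combined voice-plus-face score, the `minScore` and `minMargin` thresholds, and all names below are assumptions for illustration; the source does not specify how identification confidence is computed.

```typescript
interface IdentityCandidate {
  user_id: string;
  score: number; // combined voice + face confidence, 0..1 (assumed)
}

type RoutingDecision =
  | { kind: "load_context"; user_id: string }
  | { kind: "ask_who" };

function routeInteraction(
  candidates: IdentityCandidate[],
  minScore: number,
  minMargin: number,
): RoutingDecision {
  const ranked = candidates.slice().sort((a, b) => b.score - a.score);
  // Nobody recognized confidently enough: ask.
  if (ranked.length === 0 || ranked[0].score < minScore) {
    return { kind: "ask_who" };
  }
  // Two plausible speakers (for example, two people in the room): ask.
  if (ranked.length > 1 && ranked[0].score - ranked[1].score < minMargin) {
    return { kind: "ask_who" };
  }
  return { kind: "load_context", user_id: ranked[0].user_id };
}

const clearCase = routeInteraction(
  [{ user_id: "alice", score: 0.95 }, { user_id: "bob", score: 0.3 }],
  0.8,
  0.2,
);
const ambiguousCase = routeInteraction(
  [{ user_id: "alice", score: 0.85 }, { user_id: "bob", score: 0.8 }],
  0.8,
  0.2,
);
```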

Warehouse & logistics robots

Scenario: Adapting to worker patterns

A fleet of mobile robots in a fulfillment center personalizes to individual workers:

  • Pace matching: Experienced picker Maria works 40% faster than average. The robots assigned to her lane pre-stage items more aggressively and use tighter delivery windows.
  • Ergonomic adaptation: Worker James has a shoulder injury (flagged in his profile with consent). Robots present items at waist height rather than above-shoulder, even if that's less efficient for packing.
  • Shift-pattern memory: The fleet knows Monday morning after a holiday weekend will be slower. It adjusts throughput expectations and reduces pressure on new workers.
  • Handoff preferences: Some workers prefer the robot to wait; others prefer a "drop and go" pattern. Learned from interaction history, not configured manually.

Healthcare companion robots

Scenario: Elderly patient support

A companion robot in an assisted living facility, where personalization directly affects quality of life:

  • Communication calibration: Mrs. Chen speaks softly and prefers Mandarin for emotional topics but English for daily logistics. The robot switches languages based on conversational context, not a static setting.
  • Medication adherence: The robot knows Mrs. Chen's medication schedule, which pills she consistently forgets (the evening one), and the gentlest effective reminder strategy (visual cue on the table lamp, not a verbal nag).
  • Mood-aware interaction: On days when Mrs. Chen's activity level drops (detected via interaction frequency), the robot increases proactive social engagement — showing family photos, suggesting a call with her daughter, playing favorite music.
  • Privacy hierarchy: Medical data is strictly scoped. The robot's general-purpose assistant mode cannot access health records. Only the medication-management task, which runs under a separate permission scope, can read that data.

6. Infrastructure Patterns

Event sourcing for robot interactions

Every interaction is an event. Rather than maintaining mutable state ("Alice's mug preference is green"), the data layer stores the full event stream ("Alice corrected mug choice to green at timestamp T"). This pattern provides:

// Interaction event schema
InteractionEvent {
  event_id:       uuid
  timestamp:      datetime
  user_id:        string
  robot_id:       string
  session_id:     string
  
  event_type:     enum {
    COMMAND, CORRECTION, FEEDBACK_EXPLICIT,
    FEEDBACK_IMPLICIT, OBSERVATION, SYSTEM
  }
  
  domain:         string        // "beverage", "navigation", "social", ...
  payload:        json          // event-type-specific data
  
  context_snapshot: {
    location:     string
    time_of_day:  string
    people_present: string[]
    ambient:      json
  }
  
  // Derived at write time
  embeddings: {
    intent:       float[768]    // semantic embedding of the interaction
    context:      float[384]    // compressed context embedding
  }
}
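A well-known property of event sourcing is that current state is derived by replaying the log, so the state as of any past moment can be reconstructed. The sketch below folds a trimmed version of the event schema into a preference map; `replayPreferences`, the `key`/`value` payload shape, and the composite key format are assumptions for illustration.

```typescript
type InteractionEventType =
  | "COMMAND" | "CORRECTION" | "FEEDBACK_EXPLICIT"
  | "FEEDBACK_IMPLICIT" | "OBSERVATION" | "SYSTEM";

interface InteractionEvent {
  event_id: string;
  timestamp: number; // epoch milliseconds
  user_id: string;
  event_type: InteractionEventType;
  domain: string;
  payload: { key?: string; value?: string }; // assumed payload shape
}

// Fold the log into "current preferences" as of a point in time.
function replayPreferences(events: InteractionEvent[], asOf: number): Record<string, string> {
  const state: Record<string, string> = {};
  const ordered = events.slice().sort((a, b) => a.timestamp - b.timestamp);
  for (const e of ordered) {
    if (e.timestamp > asOf) {
      break;
    }
    // Only corrections change preference state; later ones overwrite earlier ones.
    if (e.event_type === "CORRECTION" && e.payload.key !== undefined && e.payload.value !== undefined) {
      state[e.user_id + "/" + e.domain + "/" + e.payload.key] = e.payload.value;
    }
  }
  return state;
}

const eventLog: InteractionEvent[] = [
  { event_id: "e1", timestamp: 100, user_id: "alice", event_type: "COMMAND", domain: "beverage", payload: {} },
  { event_id: "e2", timestamp: 200, user_id: "alice", event_type: "CORRECTION", domain: "beverage", payload: { key: "mug", value: "green_mug" } },
  { event_id: "e3", timestamp: 300, user_id: "alice", event_type: "CORRECTION", domain: "beverage", payload: { key: "mug", value: "blue_mug" } },
];

const stateAt250 = replayPreferences(eventLog, 250);
const stateAt400 = replayPreferences(eventLog, 400);
```

Because nothing is overwritten in the log itself, the same replay can also answer "why" questions by pointing to the specific event that set a value.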

Vector stores for semantic memory

The interaction history, once embedded, lives in a vector store that supports semantic retrieval — "find interactions similar to this situation" rather than "find interactions matching these exact fields."

This powers several key capabilities:

Analogical Reasoning
"Alice liked X in situation Y. This situation Z is similar to Y. She'll probably want something like X."
Anomaly Detection
"This request doesn't match any prior pattern for this user. Ask for confirmation rather than assuming."
Preference Transfer
"Alice hasn't used this feature before, but her general preference patterns are similar to other users who preferred option B."

The vector store runs on-device with a compact index (HNSW with product quantization, typically under 500MB for a year of household interactions). Cloud sync pushes anonymized embeddings (not raw events) for fleet-wide model improvement.
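The anomaly-detection capability above reduces to a nearest-neighbor check: if nothing in the user's history is close to the current request, ask before acting. This is a minimal sketch using an exhaustive scan in place of an HNSW index; `needsConfirmation` and the `minSimilarity` threshold are hypothetical.

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// True when no prior interaction is similar enough to the current request.
function needsConfirmation(
  requestEmbedding: number[],
  priorEmbeddings: number[][],
  minSimilarity: number,
): boolean {
  let best = -1; // cosine similarity is never below -1
  for (const prior of priorEmbeddings) {
    best = Math.max(best, cosine(prior, requestEmbedding));
  }
  return best < minSimilarity;
}

const aliceHistory = [[1, 0], [0.9, 0.1]];
const familiarRequest = needsConfirmation([0.95, 0.05], aliceHistory, 0.8);
const unusualRequest = needsConfirmation([0, 1], aliceHistory, 0.8);
```

An empty history always triggers confirmation, which is a reasonable default for a user the robot has never seen.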

Federated learning for fleet-wide personalization

The hardest infrastructure problem: how do you improve personalization across a fleet of thousands of robots without centralizing raw user data?

Federated Personalization Pipeline
[Figure: Robot A and Robot B upload gradients (no raw data) with differential privacy applied. An aggregation server performs federated averaging and pushes the updated model back to the fleet. The fleet gets smarter without sharing family data.]

The federated approach works in three stages:

  1. Local training: Each robot trains (or fine-tunes) a personalization model on its local data. This happens during idle periods — overnight, when the house is empty, during charging.
  2. Gradient sharing: Instead of sending raw data, each robot sends model gradients (or parameter deltas) to the aggregation server. Gradients summarize statistical patterns rather than raw events, but they can still leak information about individual users, so each update is clipped and differential privacy noise is added before upload.
  3. Federated averaging: The server averages gradients across the fleet, producing an improved global model that benefits from the collective experience of all robots. This model is pushed back to each robot as a better starting point for local adaptation.
Privacy guarantee

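With differential privacy (ε ≤ 2), the influence of any single user's data on the uploaded updates is mathematically bounded, which limits what the aggregation server can infer about an individual's preferences. This is a bound on privacy loss, not a proof that reconstruction is impossible, and its strength depends on ε and on how the privacy budget is accounted across training rounds. The fleet gets smarter without any robot uploading its family's raw data.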
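The three stages can be sketched in a few functions: clip each local update to a maximum L2 norm, add Gaussian noise on the device, and average on the server. This is an illustrative outline only. All function names are hypothetical, and choosing `noiseStd` to meet a target ε requires a privacy accountant that is out of scope here.

```typescript
function l2Norm(v: number[]): number {
  let sumSquares = 0;
  for (const x of v) {
    sumSquares += x * x;
  }
  return Math.sqrt(sumSquares);
}

// Scale the update down so its L2 norm is at most maxNorm.
function clipUpdate(update: number[], maxNorm: number): number[] {
  const norm = l2Norm(update);
  const scale = norm > maxNorm ? maxNorm / norm : 1;
  return update.map((x) => x * scale);
}

// Standard normal sample via the Box-Muller transform.
function sampleGaussian(rand: () => number): number {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

// On-device step: clip, then add noise before anything leaves the robot.
function privatizeUpdate(
  update: number[],
  maxNorm: number,
  noiseStd: number,
  rand: () => number,
): number[] {
  return clipUpdate(update, maxNorm).map((x) => x + noiseStd * sampleGaussian(rand));
}

// Server step: element-wise mean of the uploaded updates.
function federatedAverage(updates: number[][]): number[] {
  if (updates.length === 0) {
    throw new Error("no updates to average");
  }
  const sum = updates[0].map(() => 0);
  for (const u of updates) {
    for (let i = 0; i < sum.length; i++) {
      sum[i] += u[i];
    }
  }
  return sum.map((x) => x / updates.length);
}

const robotA = privatizeUpdate([0.3, -0.1], 1.0, 0.05, Math.random);
const robotB = privatizeUpdate([0.2, 0.4], 1.0, 0.05, Math.random);
const globalUpdate = federatedAverage([robotA, robotB]);
```

Clipping bounds how much any one household can move the global model, which is what makes the added noise meaningful.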

Sync engine for multi-robot households

When a household has more than one robot (or when a user interacts with robots in multiple locations — home, office, car), the personalization data layer needs a conflict-free sync protocol:
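One simple way to get conflict-free convergence, offered here as an assumption rather than the protocol this design intends, is a last-writer-wins map: each preference carries a timestamp, and merging two replicas keeps the newer value per key. The names below are hypothetical, and the approach presumes robot clocks are roughly synchronized.

```typescript
interface VersionedValue {
  value: string;
  updatedAtMs: number;
  robot_id: string; // which robot recorded the write
}

type PreferenceReplica = Record<string, VersionedValue>;

// Pick the later write; break timestamp ties deterministically by robot ID.
function newerOf(a: VersionedValue, b: VersionedValue): VersionedValue {
  if (a.updatedAtMs !== b.updatedAtMs) {
    return a.updatedAtMs > b.updatedAtMs ? a : b;
  }
  return a.robot_id >= b.robot_id ? a : b;
}

// Merge is commutative and idempotent, so replicas converge in any sync order.
function mergeReplicas(left: PreferenceReplica, right: PreferenceReplica): PreferenceReplica {
  const merged: PreferenceReplica = {};
  for (const key of Object.keys(left)) {
    merged[key] = left[key];
  }
  for (const key of Object.keys(right)) {
    const existing = merged[key];
    merged[key] = existing ? newerOf(existing, right[key]) : right[key];
  }
  return merged;
}

const kitchenRobot: PreferenceReplica = {
  mug: { value: "green_mug", updatedAtMs: 2000, robot_id: "kitchen" },
  room_temp: { value: "68F", updatedAtMs: 1000, robot_id: "kitchen" },
};
const officeRobot: PreferenceReplica = {
  mug: { value: "blue_mug", updatedAtMs: 1000, robot_id: "office" },
  room_temp: { value: "70F", updatedAtMs: 3000, robot_id: "office" },
  pace: { value: "slow", updatedAtMs: 1500, robot_id: "office" },
};

const mergedAB = mergeReplicas(kitchenRobot, officeRobot);
const mergedBA = mergeReplicas(officeRobot, kitchenRobot);
```

Last-writer-wins silently discards the older write, so it suits scalar preferences better than lists or counters, which would need richer merge rules.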


7. Open Problems & Research Directions

This architecture is a starting point. Several hard problems remain unsolved or under-explored:

Preference drift detection
People change. A preference learned six months ago may be wrong today. How do you detect drift (gradual change) vs. noise (one-off exception) vs. context switch (different behavior in different situations)? Current approaches use sliding-window statistics, but we lack good benchmarks for evaluating drift detection in human-robot interaction.
Cold start for new household members
When a guest visits or a new family member joins, the robot has zero personalization data. How much should it infer from household patterns? From demographic priors? From the first few interactions? The tension between "be helpful immediately" and "don't make assumptions" is unresolved. Transfer learning from similar users helps but raises its own privacy concerns.
Explainability of personalized decisions
When a robot makes a personalized choice, the user should be able to ask why — and get a truthful, understandable answer. "I used the green mug because you told me to on March 14" is good. "My preference vector's cosine similarity was 0.94" is useless. Generating natural-language explanations from the personalization data layer, traceable to specific events, is an active research problem.
Multi-agent preference negotiation
In a shared space, preferences conflict. Alice wants the room at 68°F, Bob wants 72°F. The robot mediates. Current approaches are primitive (average, take turns, defer to whoever asked last). A principled framework for multi-stakeholder preference resolution — that feels fair and transparent — is wide open.
Adversarial robustness
What happens when someone deliberately tries to corrupt a robot's personalization model? A child repeatedly giving incorrect feedback, a malicious visitor injecting false preferences, or a coordinated attack on the federated learning pipeline. Robustness to adversarial personalization inputs is largely unstudied in the robotics context.
Cross-cultural personalization
Personalization defaults and interaction norms vary dramatically across cultures. Eye contact, personal space, silence tolerance, directness of communication, household roles — all are culturally loaded. The data layer needs cultural context models, but encoding culture as a set of parameters risks oversimplification and stereotyping. Research needed on respectful, adaptive cultural calibration.
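To make the sliding-window approach mentioned under preference drift detection concrete, here is a deliberately naive sketch. It compares recent observed choices against the stored preference and separates isolated exceptions from sustained disagreement. The window size, threshold, and verdict names are assumptions, and the sketch does not address context switches at all.

```typescript
type DriftVerdict = "stable" | "noise" | "drift_suspected";

// choices: observed values for one preference, oldest first.
function classifyDrift(
  choices: string[],
  storedValue: string,
  windowSize: number,
  driftThreshold: number,
): DriftVerdict {
  const recent = choices.slice(-windowSize);
  if (recent.length === 0) {
    return "stable";
  }
  const disagreements = recent.filter((c) => c !== storedValue).length;
  if (disagreements === 0) {
    return "stable";
  }
  // Fraction of the recent window that contradicts the stored preference.
  const rate = disagreements / recent.length;
  return rate >= driftThreshold ? "drift_suspected" : "noise";
}

const oneOff = classifyDrift(
  ["oat", "oat", "oat", "whole", "oat", "oat"],
  "oat",
  6,
  0.5,
);
const sustained = classifyDrift(
  ["oat", "oat", "whole", "whole", "whole", "whole"],
  "oat",
  6,
  0.5,
);
```

A fixed threshold like this is exactly the kind of heuristic that the missing benchmarks would help evaluate.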

Benchmarks we need

The field lacks standard benchmarks for evaluating personalization quality in embodied agents. We need:


8. The Agent Infrastructure Thesis

Here's the bigger picture: robots are physical agents. And the personalization data layer they need is structurally identical to what software agents need.

Consider what a good AI coding agent maintains about its user: coding style preferences, project context, past decisions, preferred tools, communication style, review feedback history. Now consider what a home robot maintains: interaction style preferences, environmental context, past decisions, preferred routines, communication calibration, correction history.

These are the same primitives.

Shared Agent Personalization Primitives
[Figure: software agents and physical agents share the same primitives: user memory, preference learning, context retrieval, interaction history, and few-shot adaptation. They differ only at the edges: tool preferences and style calibration for software agents, object preferences and behavior calibration for physical agents.]

The implication is that the personalization data layer should not be built as a robotics-specific system. It should be built as a general agent personalization infrastructure that physical agents consume alongside software agents.

This convergence is already happening in practice. Systems like LangMem, Mem0, and Zep provide memory and personalization layers for LLM-based agents. The missing piece is extending these to handle the additional modalities and constraints of embodied agents: spatial data, real-time sensor streams, physical safety constraints, and multi-user environments.

What a unified agent personalization platform looks like

Capability | Software Agent Today | Physical Agent Extension
User profiles | Preferences, settings, memory | + physical characteristics, accessibility, spatial preferences
Interaction history | Chat logs, tool usage, feedback | + multimodal events, corrections, implicit signals, environmental context
Context retrieval | RAG over docs and memory | + spatial queries, temporal context, multi-user presence
Preference learning | Embedding-based, few-shot | + real-time sensor fusion, behavioral calibration, safety constraints
Privacy | Encryption, access control | + on-device processing, federated learning, physical-space privacy zones
Multi-agent | Shared memory across tools | + fleet coordination, multi-robot handoffs, cross-form-factor sync

The company that builds this unified layer — agent personalization infrastructure that works for both software and physical agents — is building one of the most important pieces of the AI stack. Every agent, whether it lives in a terminal or a robot body, needs to know its user. The data layer that enables that knowledge is the foundation everything else is built on.


Summary

Robots don't fail because they can't perceive or act. They fail because they don't remember and don't adapt. The personalization data layer — user profiles, interaction history, preference models, context graphs, and environmental memory — is the missing infrastructure that turns competent machines into useful ones.

And the best part: this infrastructure isn't specific to robots. It's the same memory and personalization substrate that every AI agent needs. Build it once, and you've built the foundation for both the software agents of today and the physical agents of tomorrow.

Published on HTML Docs · Open for review and comments