Agent Engineering · 15 min

When your vector database stops being enough

RAG covered the retrieval question. The four failure modes that show up next (tenant leakage, wrong tenant credentials, unreconstructable decisions, audit asks) live outside the scope of vector retrieval. Here is where they live instead.


You shipped a vector database. It solved the problem it was built to solve: semantic recall over unstructured content the agent had to read. That was the right bet. The question that follows, the one most 2026 agent teams are sitting with, is when the vector layer stops being enough and what the next decision is.

The next decisions are not retrieval decisions. They sit one level above the data layer, and the four most common failure modes look like data structure problems only because they share an address space with your data layer.

What vector retrieval does not cover

Skim the recurring incident shapes from the last twelve months of agent post mortems and four keep showing up:

  • Customer A's data leaking into Customer B's prompt context
  • The agent calling a tool with the wrong tenant's credentials
  • An operations team unable to reconstruct why the agent made a specific decision last Thursday
  • An auditor asking "show me every action your agent took on behalf of this customer" and the team reconstructing it from log files

None of them are retrieval problems. They are governance problems that show up at the data layer because that is where the agent's read and write traffic surfaces. The data structure choice that prevents them is upstream of "which database." If you want a vocabulary for the underlying threat surface, the OWASP Top 10 for LLM Applications names most of these failure modes by their proper categories (prompt injection, sensitive information disclosure, insecure output handling). It is a better starting reading list than any vector database vendor's blog.

Vector retrieval is a component decision. Preventing these four failures is a governance decision. Confusing the two is how you find out which one you were actually buying.

The four questions sitting on top of retrieval

When you reach for the next layer of your agent's data architecture, four questions are in play whether they are named or not:

  1. State. What does the agent need to know right now to act?
  2. Memory. What does it need to remember across tasks?
  3. Audit. What does an operator need to reconstruct after the fact?
  4. Constraint. What must be verifiably true before the agent acts?

Three of these four are governance questions. Vector retrieval (and the vector vs graph conversation more broadly) addresses question two. The other three live upstream of storage entirely.

The data structure question is a governance question with a storage layer. Pick the governance posture first; the storage shape follows.

The four axes of agent access control

Any production agent system that makes it past Stage 2 (the stages are defined in the operational maturity ladder below) has to make decisions across four distinct access control axes. They get rolled up as "access control" in conversation, but each one matures at a different time and lives in a different part of the codebase. Naming them separately is a forcing function for the design conversation. If you want an external anchor, the four buckets below trace fairly cleanly to NIST SP 800-53: the first three mostly to the Access Control (AC) family, audit to the Audit and Accountability (AU) family.

Figure: Where the four access-control axes sit in a single agent request. Each agent action passes three gates before it touches data (01 tenant isolation via row-level policy, 02 RBAC via role policy, 03 secret dereference via opaque reference); a fourth element, 04 audit and non-repudiation, is an append-only event log that receives every checkpoint decision.

1. Tenant isolation. Customer A cannot see Customer B's data. The canonical pattern is a row level policy at the database layer keyed on tenant identifier, with the qualifier that row level security has escape hatches (admin bypass roles, connection pooler quirks) you accept as documented tradeoffs. App layer enforcement is defensible only when the tradeoff is named and the audit log can prove no request bypassed it.

Imagine your agent reads from a vendor's API and writes to a tenant scoped table in your own Postgres. The vendor call returns data you trust to be correctly scoped to the tenant whose credentials you used. The write path is where row level security earns its keep. The failure mode that actually shows up: the agent connecting as a role with the BYPASSRLS attribute, because somebody granted it during a debugging session and never revoked it. Tenant isolation lives or dies on which role each connection authenticates as and what that role is allowed to do, not on the policy file.

If your vector index is not filtered by tenant metadata, it should be: that filter is the retrieval side implementation of tenant isolation. The policy that decides which tenant a request belongs to lives one layer above.
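A minimal sketch of that retrieval side filter, assuming an in-memory index where every chunk carries a tenant_id. A hosted vector store would express the same rule as a metadata filter on the query; Chunk, cosine, and search are hypothetical names, not any vendor's API.

```python
import math
from dataclasses import dataclass

# Sketch only: hypothetical in-memory index standing in for a vector store.


@dataclass(frozen=True)
class Chunk:
    tenant_id: str
    text: str
    embedding: tuple[float, ...]


def cosine(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[Chunk], query: tuple[float, ...], tenant_id: str, k: int = 3) -> list[Chunk]:
    """Top-k chunks for one tenant. The tenant filter runs before ranking,
    and there is deliberately no code path that searches across tenants."""
    if not tenant_id:
        raise ValueError("tenant_id is required on every retrieval call")
    candidates = [c for c in index if c.tenant_id == tenant_id]
    candidates.sort(key=lambda c: cosine(c.embedding, query), reverse=True)
    return candidates[:k]


index = [
    Chunk("tenant-a", "A's refund policy", (1.0, 0.0)),
    Chunk("tenant-b", "B's refund policy", (1.0, 0.0)),  # identical vector, other tenant
]
print([c.text for c in search(index, (1.0, 0.0), tenant_id="tenant-a")])  # ["A's refund policy"]
```

In this shape the tenant_id argument should come from the authenticated request, never from anything the model wrote.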

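The Postgres row level security docs spell out the escape hatches in detail; read them before betting tenant isolation on RLS alone. Connection poolers add one more: if your policy reads a per session setting (the current tenant, say) and your pooler hands the same server session to different clients, one request can inherit another request's tenant, bypassing the policy without any code change to your app.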
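The grants side can be checked mechanically. A minimal sketch, assuming you have pulled rows from the pg_roles catalog view (rolname, rolsuper, rolbypassrls) and know which role names your agent connections use; RoleRow and roles_that_bypass_rls are hypothetical helpers. Superusers and roles with BYPASSRLS skip row level security entirely. Table owners also skip it unless the table has FORCE ROW LEVEL SECURITY set, a case this sketch does not cover.

```python
from dataclasses import dataclass

# Sketch only. Assumed source query (run with a monitoring role):
#   SELECT rolname, rolsuper, rolbypassrls FROM pg_roles;


@dataclass(frozen=True)
class RoleRow:
    rolname: str
    rolsuper: bool
    rolbypassrls: bool


def roles_that_bypass_rls(rows: list[RoleRow], agent_roles: set[str]) -> list[str]:
    """Agent-facing roles for which every RLS policy is silently skipped."""
    return sorted(
        r.rolname for r in rows
        if r.rolname in agent_roles and (r.rolsuper or r.rolbypassrls)
    )


rows = [
    RoleRow("agent_rw", rolsuper=False, rolbypassrls=True),  # granted while debugging
    RoleRow("agent_ro", rolsuper=False, rolbypassrls=False),
    RoleRow("postgres", rolsuper=True, rolbypassrls=True),   # not an agent role
]
print(roles_that_bypass_rls(rows, {"agent_rw", "agent_ro"}))  # ['agent_rw']
```

On the pooling point, one common mitigation is to set the tenant variable your policy reads with SET LOCAL inside each request's transaction, so the setting cannot outlive that transaction on a shared session.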

2. Role based access (RBAC). Different humans (and different agents) see different slices of the same tenant's data. RBAC shares a code path with tenant isolation (both are policies at the data boundary) but answers a different question: not which customer the row belongs to, but which caller is allowed to see it.

Imagine the same tenant scoped table. Now a finance role inside the customer needs read access to the invoice columns but not the support transcript columns, and the support role needs the inverse. Tenant isolation has done none of this work. The minimal shape is a roles table that ties tenant, user, and a named role together, with a join the data layer policy can check against. Keep it simple so the audit log can show exactly which role decided each read.

The agent is also a caller. If your agent acts on behalf of a finance user, it should inherit that role's slice, not the union of every role available in the tenant. Agents that act with a tenant wide service account are RBAC bypasses with extra steps.
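A minimal sketch of that roles join with the agent inheriting one user's slice, assuming an in-memory stand-in for the roles table and a hypothetical mapping from named roles to visible columns. In Postgres the same check would live in policies and column grants; here it is plain Python so the shape is visible. The function returns the deciding role alongside the data so the audit log can record which role decided each read.

```python
from typing import Optional

# Sketch only. Hypothetical roles table: (tenant_id, user_id) -> named role.
ROLE_ASSIGNMENTS = {
    ("acme", "dana"): "finance",
    ("acme", "sam"): "support",
}

# Hypothetical column slice each named role may read.
ROLE_COLUMNS = {
    "finance": frozenset({"invoice_id", "amount_cents", "due_date"}),
    "support": frozenset({"ticket_id", "transcript"}),
}


def agent_read(row: dict, tenant_id: str, on_behalf_of: str) -> tuple[Optional[str], dict]:
    """Project a row to the slice of the one user the agent acts for.

    Returns (deciding_role, visible_columns). The agent never receives the
    union of every role in the tenant.
    """
    if row["tenant_id"] != tenant_id:
        raise PermissionError("row belongs to another tenant")  # tenant isolation first
    role = ROLE_ASSIGNMENTS.get((tenant_id, on_behalf_of))
    allowed = ROLE_COLUMNS.get(role, frozenset())  # unknown user: sees nothing
    return role, {k: v for k, v in row.items() if k in allowed}


row = {"tenant_id": "acme", "invoice_id": "inv_1", "amount_cents": 4200, "transcript": "..."}
print(agent_read(row, "acme", on_behalf_of="dana"))  # ('finance', {'invoice_id': 'inv_1', 'amount_cents': 4200})
print(agent_read(row, "acme", on_behalf_of="sam"))   # ('support', {'transcript': '...'})
```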

3. Agent secret separation. Your agent should never see API keys, encryption keys, or raw PII it doesn't need. The pattern: the agent operates on opaque references (customer_ref: abc123); a separate component with a much smaller attack surface dereferences them when an action lands in the real world.

Imagine your agent drafts a refund. The refund tool's input schema accepts customer_ref and amount_cents. Inside the tool implementation, a small dereferencer (running outside the prompt context, with its own narrower set of permissions) resolves the ref to a real Stripe customer ID and a real payment method, calls Stripe, and returns a refund ID back to the agent. The agent never sees the Stripe key, never sees the last four digits of the card, never sees the customer's email. If the prompt is jailbroken, the secret surface is bounded to whatever the dereferencer holds, not whatever was in the agent's working context.
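A minimal sketch of that split, assuming an in-memory ref table and an injected function standing in for the payment processor call. PaymentTarget, Dereferencer, and refund_tool are hypothetical, and fake_issue_refund is deliberately not the Stripe SDK. The point is the boundary: the tool the agent calls takes only customer_ref and amount_cents and returns only a refund ID.

```python
import secrets
from dataclasses import dataclass
from typing import Callable

# Sketch only: hypothetical names; the processor call is injected and faked below.


@dataclass(frozen=True)
class PaymentTarget:
    processor_customer_id: str
    payment_method_id: str


class Dereferencer:
    """Runs outside the prompt context; the only holder of the ref mapping
    (and, in a real system, of the processor credentials)."""

    def __init__(self, refs: dict[str, PaymentTarget],
                 issue_refund: Callable[[PaymentTarget, int], str]):
        self._refs = dict(refs)
        self._issue_refund = issue_refund

    def refund(self, customer_ref: str, amount_cents: int) -> str:
        if amount_cents <= 0:
            raise ValueError("amount_cents must be positive")
        target = self._refs.get(customer_ref)
        if target is None:
            raise KeyError(f"unknown customer_ref {customer_ref!r}")
        return self._issue_refund(target, amount_cents)


def refund_tool(deref: Dereferencer, customer_ref: str, amount_cents: int) -> dict:
    """The agent-visible tool: opaque ref in, refund ID out, nothing else."""
    return {"refund_id": deref.refund(customer_ref, amount_cents)}


def fake_issue_refund(target: PaymentTarget, amount_cents: int) -> str:
    # Stand-in for the real processor call; returns a made-up refund ID.
    return "re_" + secrets.token_hex(4)


deref = Dereferencer({"abc123": PaymentTarget("cus_example", "pm_example")}, fake_issue_refund)
print(refund_tool(deref, "abc123", 1500))
```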

4. Audit and non repudiation. After the fact, who saw what, who did what, and how do you prove it hasn't been tampered with? This is an append only event log, not a last_updated_at column. The failure shape this prevents: an audit log that captures user actions but misses agent initiated tool calls. Three months later, no one can answer who deleted what. Building audit correctly means more than turning on a table. Stage 3 onward is where idempotency keys and an outbox pattern stop being optional, because dual writes between the LLM call and the state store will silently corrupt the trail you spent effort to build. My read is that the cost of bolting audit onto a mutable system after a compliance ask runs much higher than building it right from the start. How much higher depends on how deeply the mutable state has forked across services.

Imagine an auditor at month nine asking for every action the agent took on Customer A in March, with the prompt trail and proof the rows have not been edited since write. Two design choices make the audit table do the work it does at Stage 4. A hash chain across event rows is what makes any later edit detectable, the property non repudiation rests on. A typed actor column that names agent as a distinct value, separate from user, is what makes the agent's tool calls reconstructable as first class events, not as side effects of user actions. Both are cheap to add at Stage 4 entry; both are painful to back fill three months later.
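A minimal sketch of both choices together, assuming an in-memory list standing in for the append only table. ActorKind, AuditLog, and the field names are hypothetical. Each row's hash covers the previous row's hash, so editing any earlier row breaks verification from that point on. A production system would also need the table itself to reject updates, and would likely anchor the latest hash somewhere the database operators cannot rewrite.

```python
import hashlib
import json
from dataclasses import dataclass
from enum import Enum

# Sketch only: hypothetical names; a list stands in for the append-only table.
GENESIS = "0" * 64


class ActorKind(str, Enum):
    USER = "user"
    AGENT = "agent"  # agent tool calls are first class events, not side effects
    SYSTEM = "system"


@dataclass(frozen=True)
class AuditEvent:
    seq: int
    actor_kind: ActorKind
    actor_id: str
    tenant_id: str
    action: str
    payload: dict
    prev_hash: str
    row_hash: str


def _row_hash(prev_hash: str, seq: int, kind: ActorKind, actor_id: str,
              tenant_id: str, action: str, payload: dict) -> str:
    body = json.dumps([prev_hash, seq, kind.value, actor_id, tenant_id, action, payload],
                      sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode()).hexdigest()


class AuditLog:
    def __init__(self) -> None:
        self._events: list[AuditEvent] = []

    @property
    def events(self) -> tuple[AuditEvent, ...]:
        return tuple(self._events)

    def append(self, kind: ActorKind, actor_id: str, tenant_id: str,
               action: str, payload: dict) -> AuditEvent:
        prev = self._events[-1].row_hash if self._events else GENESIS
        seq = len(self._events)
        h = _row_hash(prev, seq, kind, actor_id, tenant_id, action, payload)
        event = AuditEvent(seq, kind, actor_id, tenant_id, action, payload, prev, h)
        self._events.append(event)
        return event

    def verify(self) -> bool:
        """Recompute every hash; any edited row breaks the chain."""
        prev = GENESIS
        for e in self._events:
            expected = _row_hash(prev, e.seq, e.actor_kind, e.actor_id,
                                 e.tenant_id, e.action, e.payload)
            if e.prev_hash != prev or e.row_hash != expected:
                return False
            prev = e.row_hash
        return True


log = AuditLog()
log.append(ActorKind.USER, "dana", "acme", "approve_refund", {"customer_ref": "abc123"})
log.append(ActorKind.AGENT, "refund-agent", "acme", "tool_call:refund",
           {"customer_ref": "abc123", "amount_cents": 1500})
print(log.verify())  # True
log.events[1].payload["amount_cents"] = 1  # simulate an edit after write
print(log.verify())  # False
```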

Yes, the primitives are IAM and audit logging. The naming matters because the moment the four collapse into a single bucket called "auth," each one stops being designed for. This four axis split is one possible decomposition. Some teams will fold secret separation under RBAC; some will treat tenant isolation as a special case of RBAC scoped to a tenant role. Pick the split that keeps each decision named.

The data structure question is downstream of these four axes. If you can't draw them on a whiteboard, you're not architecting an agent. You're prototyping a breach.

One ownership note. The four axes do not have one owner. Tenant isolation and agent secret separation usually live with the security engineering function. RBAC sits with product engineering, because the role model is shaped by the product surface. Audit and non repudiation sit with compliance, or with engineering if compliance is not a function yet. The architect's job is naming the four owners, not collapsing them into one role.

The operational maturity ladder

The shape of your data structure tracks where you are on this ladder. User count is a noisy proxy at best.

| Stage | When you enter | Axes that light up |
| --- | --- | --- |
| 1. Self operated | You are the only operator | None |
| 2. Co builders / design partners | Second operator plus first NDA'd external user | Tenant isolation |
| 3. Paying customers | Pricing live, contracts signed | + RBAC |
| 4. Regulated customers | First SOC2 / HIPAA / PCI / GLBA ask | + Agent secret separation + Audit |
| 5. Multi agent at consequence | Agents acting concurrently, cosigners required, reasoning artifacts demanded | All four hardened + reasoning artifact retention |

A 50 user fintech in regulated territory sits at Stage 4. A 10K user consumer tool with no PII exposure may sit at Stage 2 forever. Some teams enter mid ladder (a regulated from launch fintech starts at Stage 4) or skip a stage (a consumer tool that suddenly touches PHI lands at Stage 3 and Stage 4 simultaneously). The ladder is the typical sequence, not a guarantee.

Nothing here is legal or compliance advice; the SOC2, HIPAA, PCI, and GLBA references are for orientation, not interpretation. Talk to a real attorney or compliance lead before betting your roadmap on this article's framing of what any of those regimes require.

Figure: When each access-control axis lights up across the five operational maturity stages (posture per axis: dormant, lights up, required, hardened). Tenant isolation: dormant at Stage 1, lights up at 2, required at 3, hardened at 4 and 5. RBAC: dormant through Stage 2, lights up at 3, required at 4, hardened at 5. Agent secret separation and audit: dormant through Stage 3, light up at 4, required at 5. Stage progression is set by compliance posture and agent autonomy, not user count.

Stages 1 and 2 look similar from the outside (small team, small data) but the boundary between them is easy to cross silently. A cofounder gets added to the system, a design partner signs an NDA, and tenant isolation is suddenly load bearing before anyone has formally decided it should be. The telling line at the Stage 2 transition is the first time the schema grows a tenant_id column and a row level security policy beside it; that single migration is the moment Stage 1 ends. The telling line at Stage 3 is a roles table joining tenant_id to user_id to a named role; permissions stopped being binary. At Stage 4, it is the first append only table with a hash chain column and an actor_kind enum that names agent as a distinct first class actor. Each transition has one schema or migration that defines it, and once you can read your own migration log as a stage progression, you have a roadmap.

If I were designing for regulated customers at the Stage 4 boundary, I'd reach for an append only audit table with audit specific read replicas before reaching for event sourcing. The audit table covers the regulatory ask (reconstructability, non repudiation, who saw what) without committing the team to the long tail of event sourcing operational work (replay, projection rebuilds, schema evolution across the event stream). Event sourcing is the right answer for some Stage 5 systems and a tax for most Stage 4 ones.

Patterns that matter at each stage

The pattern literature for distributed systems is large. Most of it does not apply to your agent yet. The short version of which pattern earns its keep at which stage:

  • Stage 2: row level security policies. The single primitive that makes tenant isolation real. The pattern is documented in the Postgres row level security docs referenced above.
  • Stage 3: idempotency keys and the outbox pattern. Once you have paying customers, dual writes between the LLM call and the database silently corrupt state. Chris Richardson's outbox pattern writeup is the canonical reference; the idea is that the database write and the outbound message live in the same transaction, then a separate worker drains the outbox. Idempotency keys are the consumer side complement, and together they prevent the "ran the refund twice" failure.
  • Stage 3: materialized views for read heavy agent queries. When your agent's retrieval queries start dominating the OLTP plan, materialized views buy you a cache that lives next to your source of truth, refreshable on a schedule the agent does not need to think about.
  • Stage 4: append only audit tables with hash chaining. Discussed above. Pair with audit specific read replicas so the auditor's reads do not slow the agent down. Read replicas are not optional once the audit log becomes the query target for compliance.
  • Stage 4: tombstones for GDPR style deletion. When a regulated customer asks for their data to be deleted, you cannot literally delete rows from an append only log without breaking the hash chain. The tombstone pattern (mark deleted, retain the row, redact the payload, log the deletion as its own event) keeps the audit chain intact while honoring the deletion ask, provided the chain hashes a digest of each payload rather than the payload itself; otherwise the redaction breaks the chain just as a delete would.
  • Stage 5: semantic cache and token budget aware retrieval. Only at Stage 5 does the cost of redundant retrieval and the latency of full context rebuilds dominate the engineering work. Earlier than that, a semantic cache is a premature optimization that locks in a retrieval shape before you know which retrieval shape matters.
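A minimal sketch of the tombstone pattern, assuming each row's hash covers a digest of its payload so the payload can later be redacted while the digest keeps the chain verifiable. RedactableAuditLog and its fields are hypothetical, and subject_id is assumed to be an opaque reference rather than raw PII. A production version would probably salt or key the payload digest, since an unsalted hash of low entropy personal data can be guessed.

```python
import hashlib
import json

# Sketch only: hypothetical names; a list of dicts stands in for the audit table.
GENESIS = "0" * 64


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode()).hexdigest()


def canonical(obj) -> str:
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


class RedactableAuditLog:
    """Append only log whose chain covers each payload's digest, not the payload."""

    def __init__(self) -> None:
        self.rows: list[dict] = []

    def _link(self, prev: str, seq: int, action: str, subject_id: str, digest: str) -> str:
        return sha256_hex(canonical([prev, seq, action, subject_id, digest]))

    def append(self, action: str, subject_id: str, payload: dict) -> None:
        prev = self.rows[-1]["row_hash"] if self.rows else GENESIS
        seq = len(self.rows)
        digest = sha256_hex(canonical(payload))
        self.rows.append({
            "seq": seq, "action": action, "subject_id": subject_id,
            "payload": payload, "payload_digest": digest, "tombstoned": False,
            "prev_hash": prev, "row_hash": self._link(prev, seq, action, subject_id, digest),
        })

    def tombstone(self, subject_id: str) -> None:
        """Redact every payload for one subject, then log the deletion as its own event."""
        for row in self.rows:
            if row["subject_id"] == subject_id:
                row["payload"] = None
                row["tombstoned"] = True
        self.append("erasure", subject_id, {"reason": "deletion request"})

    def verify(self) -> bool:
        erased = {r["subject_id"] for r in self.rows if r["action"] == "erasure"}
        prev = GENESIS
        for row in self.rows:
            if row["tombstoned"]:
                if row["subject_id"] not in erased:  # redaction without a logged erasure
                    return False
            elif sha256_hex(canonical(row["payload"])) != row["payload_digest"]:
                return False
            expected = self._link(prev, row["seq"], row["action"], row["subject_id"],
                                  row["payload_digest"])
            if row["prev_hash"] != prev or row["row_hash"] != expected:
                return False
            prev = row["row_hash"]
        return True


log = RedactableAuditLog()
log.append("tool_call:update_email", "cust_ref_42", {"email": "a@example.com"})
log.append("tool_call:refund", "cust_ref_7", {"amount_cents": 1500})
log.tombstone("cust_ref_42")
print(log.verify())            # True
print(log.rows[0]["payload"])  # None
```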

If you want the longer reading list on the audit and replay side, Greg Young's event sourcing talks are still the cleanest treatment of the tradeoffs.

What you do not need yet

Tests for whether your team is over built:

  • You do not need a second vector database until you can name an unstructured input flow your existing setup cannot serve. A pgvector column on the OLTP Postgres covers most Stage 2 and Stage 3 retrieval shapes; the move to a dedicated store earns its keep when the workload has been measured and the existing database is the bottleneck.
  • You do not need a graph database until you can write down three traversal queries your agent actually needs and prove joins in your SQL DB cannot serve them. Graphs do not make agents reason better; they make join heavy retrieval cheaper.
  • You do not need event sourcing at Stage 4 unless you have a documented temporal query or regulatory replay requirement. Audit lights up at Stage 4, but event sourcing is one implementation; an append only audit table (with audit specific read replicas, not just OLTP replicas) is another. The latter is the honest answer for most regulated systems at this scale. Event sourcing has a brutal operational tax (replay, schema evolution, projection rebuilds) that almost no Stage 4 team pays back.
  • You do not need a semantic cache until you have measured the agent recomputing the same retrieval at a rate that shows up in your bill. "Might save us tokens someday" is not measurement.
  • You do not need multi region replication until your customers tell you they need it. They will tell you.
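Measuring redundant retrieval does not require building the cache first. A rough sketch, assuming you already log each retrieval call's query text and token cost; normalize, repeat_rate, and redundant_tokens are hypothetical helpers. Exact repeats after normalization are only a lower bound on what a semantic cache could catch: if they are already a large share of spend, the case for a cache is easy to make; if they are small, you still need a similarity based measurement before building one.

```python
from collections import Counter

# Sketch only: hypothetical helpers over an assumed log of (query, tokens) pairs.


def normalize(query: str) -> str:
    # Deliberately crude: lowercase and collapse whitespace.
    return " ".join(query.lower().split())


def repeat_rate(queries: list[str]) -> float:
    """Fraction of retrieval calls whose normalized query had already been seen."""
    if not queries:
        return 0.0
    counts = Counter(normalize(q) for q in queries)
    return sum(n - 1 for n in counts.values()) / len(queries)


def redundant_tokens(calls: list[tuple[str, int]]) -> int:
    """Tokens spent on calls that repeated an earlier normalized query."""
    seen: set[str] = set()
    wasted = 0
    for query, tokens in calls:
        key = normalize(query)
        if key in seen:
            wasted += tokens
        else:
            seen.add(key)
    return wasted


calls = [
    ("Refund policy for ACME", 900),
    ("refund policy  for acme", 900),
    ("invoice status", 400),
    ("Refund policy for acme", 950),
]
print(repeat_rate([q for q, _ in calls]))  # 0.5
print(redundant_tokens(calls))             # 1850
```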

These are all vendor blog defaults. The cost of building them prematurely is not the infrastructure spend. It is the schema decisions you commit to assuming you will need them, decisions that lock you into shapes wrong for your actual workload.

Diagnostic for your Monday

Find a quiet 30 minutes this week. For each of the four access control axes, answer one question:

  1. Tenant isolation: where in your system is this enforced, and is it at the database layer or higher? If higher, can the audit log prove no bypass?
  2. RBAC: what roles exist today, and which ones can your agents impersonate?
  3. Agent secret separation: what secrets does your agent currently have in its context that it should not? For each: write down the action that would need to dereference the secret, and confirm that action lives outside the agent's prompt.
  4. Audit: if a customer asked tomorrow "show me every action your agent took on my behalf in March," could you reconstruct it? Agent initiated tool calls, not just user actions.

If any answer is "I am not sure," that is the axis to spend a quarter on, regardless of stage. A "no" on the audit question is itself a stage answer: you are not ready for Stage 4 customers yet.
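With a typed actor column in place, the March question becomes a filter instead of a log file reconstruction. A minimal sketch, assuming events carry UTC timestamps, a tenant_id, and an actor_kind field; Event and agent_actions_for are hypothetical names. A real answer would also join in the prompt trail and verify the log's integrity before handing anything to a customer.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Sketch only: hypothetical event shape; timestamps assumed to be stored in UTC.


@dataclass(frozen=True)
class Event:
    at: datetime
    actor_kind: str  # "user" | "agent" | "system"
    actor_id: str
    tenant_id: str
    action: str


def agent_actions_for(events: list[Event], tenant_id: str, year: int, month: int) -> list[Event]:
    """Every agent initiated action for one tenant in one calendar month (UTC), oldest first."""
    start = datetime(year, month, 1, tzinfo=timezone.utc)
    if month == 12:
        end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    else:
        end = datetime(year, month + 1, 1, tzinfo=timezone.utc)
    hits = [e for e in events
            if e.tenant_id == tenant_id and e.actor_kind == "agent" and start <= e.at < end]
    return sorted(hits, key=lambda e: e.at)


utc = timezone.utc
events = [
    Event(datetime(2026, 3, 3, 14, 0, tzinfo=utc), "agent", "refund-agent", "acme", "tool_call:refund"),
    Event(datetime(2026, 3, 4, 9, 30, tzinfo=utc), "user", "dana", "acme", "approve_refund"),
    Event(datetime(2026, 2, 28, 23, 59, tzinfo=utc), "agent", "refund-agent", "acme", "tool_call:lookup"),
    Event(datetime(2026, 3, 10, 8, 0, tzinfo=utc), "agent", "refund-agent", "globex", "tool_call:refund"),
]
print([e.action for e in agent_actions_for(events, "acme", 2026, 3)])  # ['tool_call:refund']
```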

Closing

The data structure is downstream. Tenant isolation, RBAC, secret separation, and audit are the upstream decisions; they are the axes a regulator or a careful customer will probe first, and they are the axes that lock in the storage shape that follows. The vector database you shipped was the right answer to the retrieval question. The four axes are the next question. Pick the governance posture before you pick the next topology, and the topology stops being the load bearing decision. This is the first post in a series. More on each operational maturity stage in subsequent posts, when each is actually useful to a reader sitting at that boundary.

If you are walking into one of the stage transitions above and the answer to the Monday diagnostic was ugly, I'd welcome a conversation. The contact form at the top of the site goes to my inbox.

Frequently Asked Questions

When should I actually use a vector database?

Use one when your existing setup is the measured bottleneck on a named retrieval flow. Until then, a pgvector column on the OLTP Postgres is the cheaper default. If you cannot describe the retrieval shape in one sentence, you do not have a vector problem; you have a schema problem dressed as one. The signal that you are overbuilt is a dedicated vector store at Stage 2 with no measured pressure on the existing database. The signal that you are underbuilt is the agent missing recall on documents the team can describe by name.

What's the practical difference between tenant isolation and RBAC if they share a code path?

Tenant isolation answers "which customer does this row belong to." RBAC answers "is this caller allowed to read or write this row." They share a code path in the sense that both end up as policies at the data boundary, but they fail in different ways. Tenant isolation failures look like data leakage across customers. RBAC failures look like a support agent who can read columns they should not. The audit log has to capture both, and the schema has to name both, even when one policy file enforces them.

Do I really need event sourcing at Stage 4?

In my view, no, not by default. Stage 4 needs audit and reconstructability. Event sourcing is one implementation of that, and it has a real operational tax (replay, projection rebuilds, schema evolution across events). An append only audit table with hash chaining and audit specific read replicas covers most regulated reconstruction asks at lower cost. Event sourcing earns its keep when you have a documented temporal query requirement, regulatory replay, or a Stage 5 multi agent coordination problem.

How do I know which stage I'm at?

Read the "When you enter" column in the maturity ladder table, then read your own database migrations. The earliest migration that adds a tenant_id column is the moment you crossed into Stage 2. The earliest migration that adds a roles table is Stage 3. The earliest migration that adds an append only event log with an actor_kind column is Stage 4. If your migrations do not record these transitions, the audit log is the next place to look; the first time the audit log captures a non user actor is the moment Stage 4 became real for you, and the audit story will need to catch up.
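A rough heuristic for reading your migrations that way, assuming they are plain SQL files whose filenames sort in apply order. The regex markers and helper names are hypothetical and will miss anything your team named differently, so treat the output as a starting point for reading the log, not a verdict.

```python
import re

# Sketch only. Hypothetical markers; adjust the patterns to your own naming conventions.
STAGE_MARKERS = [
    (2, re.compile(r"\btenant_id\b", re.IGNORECASE)),
    (3, re.compile(r"\bcreate\s+table\s+(if\s+not\s+exists\s+)?\w*roles\b", re.IGNORECASE)),
    (4, re.compile(r"\bactor_kind\b", re.IGNORECASE)),
]


def stage_transitions(migrations: list[tuple[str, str]]) -> dict[int, str]:
    """Map each stage to the earliest migration (by filename order) matching its marker."""
    found: dict[int, str] = {}
    for name, sql in sorted(migrations):
        for stage, pattern in STAGE_MARKERS:
            if stage not in found and pattern.search(sql):
                found[stage] = name
    return found


def current_stage(migrations: list[tuple[str, str]]) -> int:
    # Highest stage with a marker; stages can be skipped, so no ordering check.
    return max(stage_transitions(migrations), default=1)


migrations = [
    ("0001_init.sql", "CREATE TABLE notes (id serial PRIMARY KEY, body text);"),
    ("0002_tenancy.sql", "ALTER TABLE notes ADD COLUMN tenant_id uuid NOT NULL;"),
    ("0003_roles.sql", "CREATE TABLE tenant_user_roles (tenant_id uuid, user_id uuid, role text);"),
]
print(stage_transitions(migrations))  # {2: '0002_tenancy.sql', 3: '0003_roles.sql'}
print(current_stage(migrations))      # 3
```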

Is this just renamed IAM + audit logging?

Yes, mostly. The point of the renaming is not novelty. It is that the moment four distinct decisions collapse into one bucket called "auth," each of the four stops being designed for. Naming them separately is a forcing function for the design conversation, not a claim that the primitives are new. NIST 800-53 has been describing these primitives for years.

What about agents that act across multiple tenants (e.g., a back office support agent)?

Cross tenant agents are the case where tenant isolation alone is not enough; you need an explicit policy for which tenants the agent can read, write, or act on behalf of, and the audit log has to capture the agent's tenant context per action. The minimal pattern is a cross tenant scope claim on the agent's call (the agent calls with an explicit acting_for_tenant value), checked against an allowlist, and recorded in the audit log on every action. Without that, a single jailbroken prompt can pull data from any tenant the agent has ambient access to.
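A minimal sketch of that scope claim, assuming an in-memory allowlist and audit list; CrossTenantAgent, allowed_tenants, and the handler callback are hypothetical, and only acting_for_tenant comes from the pattern described above. Denials are recorded as well, so attempted cross tenant access shows up in the trail.

```python
from dataclasses import dataclass, field
from typing import Callable

# Sketch only: hypothetical names; lists stand in for the allowlist store and audit log.


@dataclass
class CrossTenantAgent:
    agent_id: str
    allowed_tenants: frozenset[str]
    audit: list[dict] = field(default_factory=list)

    def act(self, acting_for_tenant: str, action: str, handler: Callable[[str], object]) -> object:
        """Check the explicit scope claim, record the decision, then run the action."""
        allowed = acting_for_tenant in self.allowed_tenants
        # Denials are recorded too, so attempted cross tenant reads are reconstructable.
        self.audit.append({
            "actor_kind": "agent",
            "actor_id": self.agent_id,
            "acting_for_tenant": acting_for_tenant,
            "action": action,
            "decision": "allow" if allowed else "deny",
        })
        if not allowed:
            raise PermissionError(f"{self.agent_id} may not act for {acting_for_tenant}")
        # The handler only ever receives the one tenant named in the claim.
        return handler(acting_for_tenant)


agent = CrossTenantAgent("backoffice-support", frozenset({"acme", "globex"}))
agent.act("acme", "read_ticket", lambda tenant: f"tickets for {tenant}")
try:
    agent.act("initech", "read_ticket", lambda tenant: f"tickets for {tenant}")
except PermissionError:
    pass
print([e["decision"] for e in agent.audit])  # ['allow', 'deny']
```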

What about single tenant systems (internal tools, lab work, single tenant SaaS)?

The four axes collapse to three. Tenant isolation is not a runtime concern when there is one tenant, but it still shows up as an audit boundary the moment you add a second tenant, even a test one. The other three axes (RBAC, agent secret separation, audit) hold unchanged. A single tenant internal tool with no audit story is still under built for Stage 4 the moment it acts on regulated data; the missing axis is audit, not tenancy.

Where does the outbox pattern fit?

At the Stage 3 boundary, when the agent's LLM call and the database write start needing to behave as one logical operation. The LLM call itself cannot join a database transaction, so the outbox pattern keeps everything that follows from it together: the agent's intent gets written to an outbox table inside the same transaction that updates state, and a separate worker drains the outbox to downstream side effects. This prevents the failure where the LLM call succeeded, the database write succeeded, and the downstream side effect (refund, email, ticket update) silently dropped. Pair it with idempotency keys on the consumer side so retries do not double bill the customer.
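A minimal sketch of both halves, using the standard library sqlite3 module as a stand-in for the production database. The table names, the refund topic, and the helper functions are hypothetical. The producer writes the state row and the outbox row in one transaction; the worker may deliver a message more than once if it crashes before marking it sent, which is exactly why the consumer checks the idempotency key.

```python
import json
import sqlite3
import uuid
from typing import Callable

# Sketch only: hypothetical schema; sqlite3 stands in for the production database.
SCHEMA = """
CREATE TABLE refunds (id TEXT PRIMARY KEY, customer_ref TEXT, amount_cents INTEGER, status TEXT);
CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, idempotency_key TEXT UNIQUE,
                     topic TEXT, payload TEXT, sent INTEGER NOT NULL DEFAULT 0);
CREATE TABLE processed_keys (idempotency_key TEXT PRIMARY KEY);
"""


def record_refund_intent(conn: sqlite3.Connection, customer_ref: str, amount_cents: int) -> str:
    """Write the state change and the outbound message in one transaction."""
    refund_id = str(uuid.uuid4())
    with conn:  # commits both inserts together, or rolls both back on error
        conn.execute("INSERT INTO refunds VALUES (?, ?, ?, 'pending')",
                     (refund_id, customer_ref, amount_cents))
        conn.execute("INSERT INTO outbox (idempotency_key, topic, payload) VALUES (?, ?, ?)",
                     (refund_id, "refund.requested",
                      json.dumps({"customer_ref": customer_ref, "amount_cents": amount_cents})))
    return refund_id


def drain_outbox(conn: sqlite3.Connection, deliver: Callable[[str, dict], None]) -> None:
    """Worker: deliver unsent messages, then mark them sent. At least once, not exactly once."""
    rows = conn.execute(
        "SELECT id, idempotency_key, payload FROM outbox WHERE sent = 0 ORDER BY id").fetchall()
    for row_id, key, payload in rows:
        deliver(key, json.loads(payload))  # a crash here means this row is delivered again
        with conn:
            conn.execute("UPDATE outbox SET sent = 1 WHERE id = ?", (row_id,))


def make_consumer(conn: sqlite3.Connection,
                  side_effect: Callable[[dict], None]) -> Callable[[str, dict], None]:
    """Consumer: skip any idempotency key that has already been recorded."""
    def deliver(key: str, message: dict) -> None:
        with conn:
            cur = conn.execute("INSERT OR IGNORE INTO processed_keys VALUES (?)", (key,))
            if cur.rowcount == 0:
                return  # already processed: a retry, so skip the refund
            side_effect(message)  # if this raises, the key insert rolls back too
    return deliver


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
refunds_issued: list[dict] = []
deliver = make_consumer(conn, refunds_issued.append)

record_refund_intent(conn, "abc123", 1500)
drain_outbox(conn, deliver)
key, payload = conn.execute("SELECT idempotency_key, payload FROM outbox").fetchone()
deliver(key, json.loads(payload))  # simulate a redelivery after a worker crash
print(len(refunds_issued))  # 1
```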

Code Atelier · NYC
