Agent Engineering · 26 min

When the contract is downstream of your data

Stage 4 is the chapter where what your system can prove matters more than what it does. Two additive migrations, one architectural surgery, and the structure that lets a third party verify what happened on whose behalf.


It is a Monday morning in early winter. The room is the same room, the desk is the same slab of birch, and the lamp is on at seven because the apartment has the kind of cold that has nothing to do with the thermostat. The light through the window is the grey of a sheet of unrolled aluminum. The agent in the other terminal has been running for almost a year. The events table you created on a Friday night in late spring, in the same room, has accumulated four million two hundred and thirty one thousand eight hundred and seven rows of organic write history. You ran the count thirty seconds ago and it is still on the screen.

On the second monitor there are two things. The first is an email, sent at five forty three this morning from the chief information security officer of a fintech that processes loan applications. The fintech signed your contract eight days ago. The contract has a security addendum that you read three times the week before signing and a fourth time on the morning you countersigned. The addendum names two obligations: annual SOC 2 Type II evidence from all third party vendors, and, on request, a reconstructable audit log that records every action taken on customer data, the actor that took it, and the timestamp, with cryptographic proof that the rows have not been tampered with since they were written. The CISO's email says, in two short paragraphs, that the security review begins this Thursday, that the attached questionnaire is the orientation document, and that the SOC 2 control list at the bottom is what the review will be calibrated against.

The second thing on the second monitor is the row count from the events table. Four million two hundred and thirty one thousand eight hundred and seven. The table is the one from Decision 4 of the Stage 1 post, the one you wrote on a Friday in May and screenshotted to your co founder when the first ten rows appeared. The schema is the same one it has been since the first weekend. There is no hash chain. There is no append only constraint. The actor_kind column is a Postgres text column, not the enum the brief in the back of your head has been promising to enforce. About 88% of the rows carry 'system' because that was the Stage 1 default and a Stage 2 cleanup never landed.

You make a third cup of coffee. The bag is almost empty. The question on the screen, the only one that matters between this Monday morning and the Thursday on the CISO's calendar, is small. What work, done this week, lets the auditor read your events table and write the report you want them to write?

In the first post in this series I named four axes and a five stage operational ladder. Audit and secret separation were the two axes that did no work at Stage 1, sat dormant through the Stage 2 rollout of row level security, and stayed deferred through Stage 3's roles table and outbox. The trigger for both was "regulated customer signs," and the trigger arrived eight days ago in the form of a PDF. Stage 4 is the chapter where what your system can prove matters more than what it does.

This is a piece about two migrations plus one architectural surgery. The migrations are additive against the events table you already have. The surgery is invasive, takes a week, rewrites every tool implementation that currently holds a secret, and narrows the agent's access surface permanently. If Stage 2's actor_kind discipline was paid, the actor_kind migration is a new type, a backfill, and a constraint. If it was not paid, that migration looks the same on paper and hides three weeks of forensic reconstruction behind it.

A short legal pause, before the work continues

Before the work continues, the note that belongs here. Nothing in this post is legal or compliance advice. Talk to your customer's compliance officer and your own counsel about which specific controls apply to your situation. The NIST 800-53 AC family reference is for orientation, not interpretation. Specific framework controls (HIPAA, PCI, GLBA, SOC 2 Type II) live in the hands of the customer's auditor and your own counsel. Every regulated engagement has specific contractual requirements that the frameworks alone do not determine. The post is the notes of a peer founder on a transition I have thought about. The notes are not a substitute for the conversation with your lawyer or your customer's compliance officer.

The first piece of work, which is the column that makes the table read as honest

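The cursor is on a new migration file. 2026/12/stage_4_hash_chain.sql. The migration adds two columns and one trigger. The columns are previous_hash text and current_hash text. The trigger fires on every insert and computes current_hash = sha256(previous_hash || row_payload). The first row's previous_hash is the all zeros sentinel. Every subsequent row's previous_hash is the current_hash of the row with the immediately preceding id. That id order tracks insertion order because the IDs are UUID v7 from Decision 3 of Stage 1, and v7 sorts lexicographically by its embedded millisecond timestamp. It tracks it only as well as the clocks that mint the IDs: two IDs from the same millisecond can sort either way unless the generator uses v7's optional monotonic counter, and a slow writer can land a row after one with a higher id. The chain stays walkable only while "highest id" and "last inserted" name the same row, which argues for minting the IDs in one place.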

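The canonical payload format matters more than the hash function. You write it down as a comment in the migration, because audit tools will need to replicate the computation. The payload is the previous row's hash followed by id, tenant_id, actor_kind, actor_id, action, payload_jsonb, and created_at as a UTC ISO 8601 string, joined with '|'. The payload_jsonb field is taken exactly as Postgres renders it with ::text, which orders object keys shorter first and then bytewise and puts a space after every colon and comma. That is not what a generic sorted keys serializer produces, so an external tool should hash the rendered text rather than re-serialize the JSON itself. The order is arbitrary. The fact that the order is documented and reproducible is what makes the chain auditable later.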
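Since audit tools have to reproduce this computation outside the database, the format is easier to check as code. A minimal Python sketch, assuming every field arrives as text exactly as Postgres renders it (payload_jsonb selected with ::text, created_at already formatted as the UTC ISO 8601 string) and assuming a missing actor_id renders as an empty string; canonical_payload and event_hash are hypothetical names:

```python
import hashlib

GENESIS = "0" * 64  # the all zeros sentinel used as the first previous_hash


def canonical_payload(prev_hash: str, row: dict) -> str:
    """Join the hashed fields with '|' in the documented order.

    Every value is expected as text, exactly as Postgres renders it:
    payload_text is payload_jsonb::text and created_at is the UTC ISO 8601
    string. A missing actor_id becomes an empty string (an assumption).
    """
    fields = [
        prev_hash,
        row["id"],
        row["tenant_id"],
        row["actor_kind"],
        row.get("actor_id") or "",
        row["action"],
        row["payload_text"],
        row["created_at"],
    ]
    return "|".join(fields)


def event_hash(prev_hash: str, row: dict) -> str:
    """Hex SHA-256 over the UTF-8 bytes of the canonical payload."""
    data = canonical_payload(prev_hash, row).encode("utf-8")
    return hashlib.sha256(data).hexdigest()
```

The '|' separator is not escaped, so in principle two rows whose free text fields contain '|' could render to the same string. Length prefixing each field would remove that ambiguity; the sketch keeps the format as the trigger writes it.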

The trigger has to read the current_hash of the previous row inside the same transaction as the insert, and it has to make concurrent inserts take turns, so that two of them cannot read the same previous row and write two children of the same parent, which would break the chain. That serialization costs throughput. My read is that for the write rate the agent generates at Stage 4 the cost is real but acceptable; for systems writing thousands of events per second, the right pattern is a per shard chain rather than a single global chain, and that work belongs to Stage 5.
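The property the serialization protects can be shown without a database: reading the last hash and appending its child must be one indivisible step. A minimal Python sketch, with an in-memory list and a threading.Lock standing in for the table and for whatever lock the database uses (both are illustrations, not the migration's code), plus a hypothetical find_forks check that a walk script could also run against the real table:

```python
import hashlib
import threading
from collections import Counter

GENESIS = "0" * 64


class ChainAppender:
    """In-memory stand-in for the events table (hypothetical, illustration only)."""

    def __init__(self) -> None:
        self.rows = []  # list of (payload, previous_hash, current_hash)
        self._lock = threading.Lock()

    def append(self, payload: str) -> None:
        # Reading the last hash and appending the child happen under one lock,
        # so no two writers can ever read the same parent.
        with self._lock:
            prev = self.rows[-1][2] if self.rows else GENESIS
            cur = hashlib.sha256(f"{prev}|{payload}".encode("utf-8")).hexdigest()
            self.rows.append((payload, prev, cur))


def find_forks(rows):
    """Return every previous_hash that more than one row claims as its parent."""
    counts = Counter(prev for _, prev, _ in rows)
    return sorted(h for h, n in counts.items() if n > 1)
```

Eight threads appending concurrently through ChainAppender produce one line with no forks; drop the lock and find_forks is the check that would notice two rows sharing a parent.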

The backfill walks the existing four million rows in id order, the same order the trigger chains new rows in, computes the canonical payload for each row, hashes it against the running chain, and writes the two new columns. The script batches in ten thousand row chunks and carries the last current_hash of each chunk into the next. You run it against a copy of the production database first, which is the second most important sentence in this section. The dry run takes thirty eight minutes. You walk the finished chain to confirm that every row's previous_hash equals the current_hash of the row before it, roll back, and commit to the window.
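The only state the backfill carries from one chunk to the next is the running hash. A minimal Python sketch of that loop, assuming a hypothetical fetch_batch(after_id, limit) that pages by id (the in-memory helper stands in for WHERE id > $1 ORDER BY id LIMIT $2) and rows that already carry their canonical string, meaning every hashed field after previous_hash joined with '|':

```python
import hashlib
from typing import Callable, Iterator, List, Optional, Tuple

GENESIS = "0" * 64


def row_hash(prev_hash: str, canonical: str) -> str:
    """sha256(previous_hash | canonical) as lowercase hex."""
    return hashlib.sha256(f"{prev_hash}|{canonical}".encode("utf-8")).hexdigest()


def backfill(
    fetch_batch: Callable[[Optional[str], int], List[dict]],
    batch_size: int = 10_000,
) -> Iterator[List[Tuple[str, str, str]]]:
    """Yield one list of (id, previous_hash, current_hash) updates per chunk.

    fetch_batch(after_id, limit) must return rows with id > after_id, sorted by id.
    The running hash is carried from the end of one chunk into the next.
    """
    prev = GENESIS
    after_id = None
    while True:
        rows = fetch_batch(after_id, batch_size)
        if not rows:
            return
        updates = []
        for row in rows:
            cur = row_hash(prev, row["canonical"])
            updates.append((row["id"], prev, cur))
            prev = cur
        yield updates
        after_id = rows[-1]["id"]


def in_memory_fetch(rows: List[dict]) -> Callable[[Optional[str], int], List[dict]]:
    """Keyset pagination over a list, standing in for the SQL query."""
    ordered = sorted(rows, key=lambda r: r["id"])

    def fetch(after_id: Optional[str], limit: int) -> List[dict]:
        tail = [r for r in ordered if after_id is None or r["id"] > after_id]
        return tail[:limit]

    return fetch
```

Keyset pagination rather than OFFSET keeps each chunk's query cheap at four million rows, and because the chunks arrive in id order the running hash never has to be recomputed from the start.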

The maintenance window is forty five minutes at two a.m. Pacific. The agent is paused. The backfill runs. The append only constraint comes online at two thirty seven, the trigger at two thirty eight. The agent is unpaused at two thirty nine. The first new event carries a previous_hash that matches the current_hash of the row last written before the pause. You verify by hand before going back to sleep.

The chain is now tamper evident, which is the precise word I would write on the wall in the apartment. Any retroactive modification to a row breaks every hash downstream of it. The caveat is that anyone able to rewrite a row can also recompute every hash after it, so the evidence is only as strong as the latest head hash kept somewhere the database's own credentials cannot write. The chain does not give you non repudiation on its own; non repudiation involves timestamping authorities and lives outside this post. Tamper evidence is what most SOC 2 auditors are calibrated for, and what the contract's reconstructability clause asks for. The auditor's question is "how do you know this log has not been modified." The hash chain is the answer. The walk script that recomputes the chain over a sample of recent rows is the demonstration.
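The walk script is short enough to sketch. A minimal Python version, assuming rows arrive in id order with their canonical string (every hashed field after previous_hash, joined with '|'), stored previous_hash, and stored current_hash; walk and its parameters are hypothetical. It reports the first row whose link or content fails, and can compare the final hash against a copy of the head kept outside the database:

```python
import hashlib
from typing import List, Optional

GENESIS = "0" * 64


def row_hash(prev_hash: str, canonical: str) -> str:
    return hashlib.sha256(f"{prev_hash}|{canonical}".encode("utf-8")).hexdigest()


def walk(rows: List[dict], start_prev: str = GENESIS,
         expected_head: Optional[str] = None) -> Optional[int]:
    """Return the first failing index, len(rows) for a head mismatch, or None.

    rows: in id order, each with 'canonical', 'previous_hash', 'current_hash'.
    start_prev: current_hash of the row just before the slice (GENESIS for a full walk).
    expected_head: an external copy of the newest current_hash, used when the
    slice ends at the newest row.
    """
    expected_prev = start_prev
    for i, row in enumerate(rows):
        if row["previous_hash"] != expected_prev:
            return i  # the link to the previous row is broken
        if row_hash(row["previous_hash"], row["canonical"]) != row["current_hash"]:
            return i  # the row's content no longer matches its stored hash
        expected_prev = row["current_hash"]
    if expected_head is not None and expected_prev != expected_head:
        return len(rows)  # every link holds, but the head was rewritten
    return None
```

A tampered row fails at its own index. Recompute only its hash and the failure moves to the next row's link. Rewrite every later hash as well and the link and content checks pass; only the expected_head comparison catches it.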

Figure: The hash chain and the cascade that follows a tamper. A five row slice of the events table (events #4231803 to #4231807), with one tamper at row three: each row's previous_hash points back to the previous row's current_hash, and the modified charge.create row and every row after it are marked broken. Tamper evidence is the chain's only job. Every modification cascades. The chain does not prevent the tamper; it makes the tamper impossible to hide.

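CREATE OR REPLACE FUNCTION compute_event_hash() RETURNS trigger AS $$
DECLARE
  prev_hash text;
  payload text;
BEGIN
  -- Serialize inserting transactions: the advisory lock is held until commit,
  -- so the next writer's SELECT (a fresh snapshot under READ COMMITTED) sees
  -- this row. FOR UPDATE on the last row cannot do this job: a waiter still
  -- gets the old last row once the lock clears, and FOR UPDATE needs the
  -- UPDATE privilege the append only migration revokes. The key is arbitrary.
  PERFORM pg_advisory_xact_lock(4242);
  SELECT current_hash INTO prev_hash
    FROM events ORDER BY id DESC LIMIT 1;
  prev_hash := COALESCE(prev_hash, repeat('0', 64));
  -- jsonb::text renders keys in jsonb's internal order (shorter keys first,
  -- then bytewise) with ', ' and ': ' separators. Document that rendering as
  -- the canonical format, pin the Postgres major version, and re-test on upgrade.
  payload := prev_hash || '|' || NEW.id || '|' || NEW.tenant_id || '|' || NEW.actor_kind
    || '|' || COALESCE(NEW.actor_id::text, '') || '|' || NEW.action
    || '|' || NEW.payload_jsonb::text
    -- Assumes created_at is timestamptz; rendering it in UTC makes the "Z" true.
    || '|' || to_char(NEW.created_at AT TIME ZONE 'UTC', 'YYYY-MM-DD"T"HH24:MI:SS.US"Z"');
  NEW.previous_hash := prev_hash;
  -- convert_to, not payload::bytea: a text to bytea cast parses backslashes as
  -- escapes, so escaped characters inside the JSON would fail or change the bytes.
  NEW.current_hash := encode(sha256(convert_to(payload, 'UTF8')), 'hex');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER compute_event_hash BEFORE INSERT ON events
  FOR EACH ROW EXECUTE FUNCTION compute_event_hash();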

The trigger function fits on one screen. Those lines are the entire load bearing surface area of the tamper evidence story.

The second piece of work, which is the constraint that changes what the table is

The cursor moves to a second migration file. 2026/12/stage_4_append_only.sql. The migration is short enough to read in one breath.

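REVOKE UPDATE, DELETE ON events FROM app_role;
-- The migration role retains UPDATE and DELETE under the runbook's
-- two engineer requirement. The escape hatch is documented, not removed.
-- Row level triggers do not fire on TRUNCATE, so confirm that app_role
-- holds no TRUNCATE privilege on events either.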

CREATE OR REPLACE FUNCTION reject_event_mutation() RETURNS trigger AS $$
BEGIN
  IF current_user NOT IN ('audit_admin') THEN
    RAISE EXCEPTION 'events table is append only for role %', current_user;
  END IF;
  RETURN COALESCE(NEW, OLD);
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER prevent_event_update BEFORE UPDATE OR DELETE ON events
  FOR EACH ROW EXECUTE FUNCTION reject_event_mutation();

Three statements. The first revokes UPDATE and DELETE on the events table from the application role; the migration role keeps them under the runbook's two engineer requirement, and even then the trigger rejects its writes unless the session runs as audit_admin. The second defines a trigger function that raises an exception on any mutation attempt unless the calling role is audit_admin, which you created last week and have not yet issued any credentials for. The third attaches that function to the table. The two layers are belt and braces. Together they close the failure mode where a role attribute change or a future grant quietly restores the privilege: the statement then gets past the permission check, and the trigger still rejects it.

The behavior change is that the events table is now write once from the application's perspective. Correcting a bad row is no longer an UPDATE. It is a new event with action correct_prior_event and a payload that references the bad row's ID. The correction is itself an event in the chain. The bad row stays.
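Readers of the log still need the corrected value, so corrections get folded in at read time rather than written over the bad row. A minimal Python sketch, assuming a hypothetical payload shape for correct_prior_event, {'corrects': <bad event id>, 'fields': {...}}, which the text does not specify:

```python
from typing import Dict, List


def current_view(events: List[dict]) -> Dict[str, dict]:
    """Fold an append only log into the latest view of each event's payload.

    Hypothetical correction shape: action 'correct_prior_event' with payload
    {'corrects': <bad event id>, 'fields': {...replacement fields...}}.
    Original events are never modified; corrections apply in log order.
    """
    view: Dict[str, dict] = {}
    for event in events:
        if event["action"] == "correct_prior_event":
            target = event["payload"]["corrects"]
            if target in view:
                # Build a new dict so the stored event payload stays untouched.
                view[target] = {**view[target], **event["payload"]["fields"]}
        else:
            view[event["id"]] = dict(event["payload"])
    return view
```

A correction that points at an id the log has not seen yet is silently ignored here; a production reader would probably flag it instead.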

The escape hatch, because every system has one and the auditor will ask, is the audit_admin role. It exists for the circumstances written down in the contract addendum and the runbook together: a court order, a security incident remediation, a customer request that legal has approved. The use of the role is logged into a separate audit table. The auditor's question is "who can modify these rows and under what conditions." The answer is a named role with a documented use policy and its own audit trail.

The Postgres docs on row level triggers and on GRANT and REVOKE are the right reading if the pattern is new. What the docs are silent about, which I would name to a peer founder reviewing this work, is the interaction between the trigger and the privilege revocation. The privilege revocation prevents a connection from issuing the statement. The trigger catches the case where the privilege is restored and the statement reaches the table. Both layers should be present.

A short detour into the failure mode that lives on the other side of this Monday

Before the next piece of work lands, I want to put a sibling scene on the table the way the prior posts in this series did. The question of whether your actor_kind column carries meaningful values is the most consequential question in this post.

Picture a different Monday in a different apartment. A four person engineering team has shipped Stages 1 through 3 cleanly. The tenant_id columns are everywhere. The row level policies are forced. The application role lost BYPASSRLS six months ago. The roles table has the four roles their first paying customer asked for. The idempotency layer rejects duplicates with the patience the Stage 3 post describes. The outbox drains cleanly. They are a team that has read the operational maturity literature and applied it.

What they did not do, eleven months ago, was the actor_kind discipline the Stage 2 work asked for. The column existed. The Stage 1 default was 'system'. The team intended to switch the log_event function to write real values, but the change felt like premature work at the time. The change went on the backlog. The backlog had other things on it.

Eleven months later, the team's first regulated customer signs and the auditor's first question lands. The question is "every action a human took on the customer's data in the last 30 days." The team runs the query. The query returns twelve million rows where actor_kind = 'system', because that is what every row says. The reconstruction the auditor is asking for cannot be answered from the events table alone, because the events table does not know which of those rows were agent actions taken on behalf of a human and which were the agent acting autonomously.

The reconstruction that follows is forensic, not migrational. The team joins the events table to the application's session log on a session_id column that has been on the events table since Stage 2 but was never used until this week. They cross reference the LLM provider's API logs to separate agent initiated calls from user initiated calls. They pull VPC flow logs to corroborate IP ranges. They write a SQL script that walks the twelve million rows and assigns each a reconstructed value. The script runs for nine hours. About six hundred thousand rows are unrecoverable because the session log retention is shorter than the audit window. Those rows are disclosed as partial.

The customer's compliance officer requests weekly updates. The founder spends two weekends reading the team's own application logs. The team eventually delivers the reconstruction with the partial rows flagged. The auditor accepts it. The compliance officer copies the report into the contract addendum's renewal review file.

The cost of the eleven months is not the three weeks of reconstruction. The cost is that the three weeks happened during the customer's first impression of the team's operational posture. The technical work was bounded. The trust cost is not. The lesson I would write on the wall in the apartment: Stage 2's actor_kind discipline pays for itself at Stage 4, and skipping it makes the Stage 4 work forensic, not migrational. The Stage 1 default is the leak. The Stage 2 cleanup closes it.

Back to your apartment, where the lamp is still on and the third cup of coffee is now warm enough to drink.

The third piece of work, which is the column that should have started carrying real values nine months ago

The cursor moves to a third migration file. 2026/12/stage_4_actor_kind_enum.sql. This is the migration where the discipline the Stage 2 post named comes due, and you paid that discipline only in part. You set up the log_event parameter on Stage 2's Thursday afternoon. The parameter writes 'user' and 'agent' correctly for the call sites you swept. What it does not do is reach the call sites you did not sweep. About 88% of the rows carry actor_kind = 'system'. Some are real system writes from cron jobs and from a backfill you ran in early autumn. Most are the Stage 1 default leaking through call paths the Stage 2 sweep missed.

The migration is in three pieces. The enum, the backfill, and the constraint.

CREATE TYPE actor_kind_enum AS ENUM (
  'user', 'agent', 'system', 'admin', 'unknown_pre_stage_2'
);

Five values. user for human originated writes. agent for the LLM's autonomous actions. system for cron jobs, backfills, and scheduled work the agent did not originate. admin for the kind of write that comes from an internal tool or a customer support session. unknown_pre_stage_2 for the rows the backfill cannot reconstruct cleanly. The fifth value carries the cost of the incomplete Stage 2 sweep into the enum vocabulary itself, which is the honest way to encode an incomplete history. The auditor's question, when it comes, will be "what does this value mean," and the answer is "rows written before the Stage 2 actor discipline could be enforced, where session reconstruction could not assign a confident value." Renaming it to something more euphemistic would be a small dishonesty, and compounded across the auditor's reading, it would erode trust on a margin that matters.

The backfill is the part that takes time. You write a SQL script that walks the four million rows, joins each row to the session log via the session_id column that has been on the events table since Stage 2, and assigns the value based on the session's authentication context. Sessions initiated by a human login get 'user'. Sessions initiated by an LLM provider API call get 'agent'. Sessions from a scheduled job runner get 'system'. Sessions from the admin panel get 'admin'. Sessions that cannot be joined (the session row has rolled off retention) get 'unknown_pre_stage_2'. The script runs against a staging copy first, takes six hours, and reports the distribution at the end. About 60% are 'agent', 25% 'user', 9% 'system', 1% 'admin', and 5% (around two hundred thousand rows) 'unknown_pre_stage_2'.
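The assignment rule reads naturally as a lookup with an honest fallback. A minimal Python sketch, assuming hypothetical session origin labels (human_login, llm_provider_api, job_runner, admin_panel) for the authentication contexts the text names, and assuming the backfill only reassigns rows still carrying 'system' while keeping the values the Stage 2 sweep already wrote. It fails the whole run on any value outside the enum, the way the type conversion would:

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

ACTOR_KINDS = {"user", "agent", "system", "admin", "unknown_pre_stage_2"}

# Hypothetical labels for a session's authentication context.
ORIGIN_TO_KIND = {
    "human_login": "user",
    "llm_provider_api": "agent",
    "job_runner": "system",
    "admin_panel": "admin",
}


def assign_actor_kind(origin: Optional[str]) -> str:
    """Map a session origin to actor_kind; no session or an unknown origin is unknown."""
    if origin is None:
        return "unknown_pre_stage_2"  # the session row rolled off retention
    return ORIGIN_TO_KIND.get(origin, "unknown_pre_stage_2")


def reconstruct(events: List[dict], sessions: Dict[str, str]) -> Tuple[Dict, Counter]:
    """Return (event id -> actor_kind, distribution).

    Rows the Stage 2 sweep already wrote keep their value; only rows still
    carrying 'system' are reassigned from the session join.
    """
    assigned = {}
    for event in events:
        if event["actor_kind"] != "system":
            assigned[event["id"]] = event["actor_kind"]
        else:
            origin = sessions.get(event.get("session_id"))
            assigned[event["id"]] = assign_actor_kind(origin)
    stray = set(assigned.values()) - ACTOR_KINDS
    if stray:
        # Fail the whole run, as the ALTER ... TYPE conversion would.
        raise ValueError(f"values outside the enum: {sorted(stray)}")
    return assigned, Counter(assigned.values())
```

The Counter is the distribution report the staging run prints at the end; comparing it against the production run is a cheap check that the two copies behaved the same.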

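The migration runs against production during the same maintenance window as the hash chain backfill, because both touch every row and one maintenance period is cheaper for the customer than two. The six hour session join does not fit in a forty five minute window, so it has to run ahead of time against the snapshot, leaving the window to apply its results. The order is enum conversion first, hash chain second. actor_kind is one of the hashed fields, the backfill rewrites it on most rows, and any change made after the chain is computed would break every hash downstream of the first modified row. The backfill is also an UPDATE, so it has to finish before the append only trigger comes online. You write ALTER TABLE events ALTER COLUMN actor_kind TYPE actor_kind_enum USING actor_kind::actor_kind_enum; after the backfill completes. The conversion checks every value against the enum. If any row carries a value not in the enum, the conversion fails atomically. The conversion runs cleanly. The enum alone does not stop the leak, though, because 'system' is one of its values, and a call path that still writes the Stage 1 default passes the check. What moves the failure to the database boundary is removing the 'system' default, from the column and from log_event, and making the column NOT NULL. The application code that wrote the 'system' default for nine months now fails loudly instead of writing the wrong value silently.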

The relationship between this work and the sibling scene is the relationship between the discipline you paid and the cost you face. The Stage 2 sweep covered the call sites the application code knew about. The Stage 4 migration covers the remaining 88% by joining to session logs you extended in early summer. If the Stage 2 sweep had not happened, the entire 100% would be a reconstruction effort, and the maintenance window would not be a window but a quarter.

The fourth piece of work, which is the surgery that takes a week and rewrites the agent's relationship to its secrets

The cursor moves to the agent's source tree, which has not changed in a way this large since the first weekend in May. Up until this Monday, the agent has held the secrets it needs in its environment file. OpenAI key. Anthropic key. The fintech customer's Stripe key. Database credentials. API keys for downstream services. The tool implementations call third party APIs with the raw key in the function call. The architecture has been this shape for forty nine weeks.

At Stage 4 it changes. The work is invasive. There is no path through it that does not rewrite every tool implementation in the agent. There is no path through it that lets the existing test harness keep its current shape. There is no path through it that does not require the founder, which is to say me if I were the founder this week, to spend the days between this Monday and next writing a runbook for a new service that did not exist last Friday.

The pattern is a dereference layer. A separate service, deployed independently of the agent, holds the secrets. The agent no longer holds them. The tool implementations call the dereference service with opaque handles. The dereference service resolves the handle to the real secret value, makes the third party call on the agent's behalf, scopes the secret to that specific call's intent, expires the resolved value after the call completes, logs the dereference to its own audit chain, and returns the third party API's response. The agent's process never holds the resolved value. The agent's logs never see the resolved value. The agent's context, which the LLM observes during planning, never carries the resolved value.

The new tool signature changes shape. Where the tool used to accept a raw key in the arguments dictionary, it now accepts an opaque handle string. The handle is opaque to the agent: the agent knows neither the secret it resolves to nor the scope it carries. The dereference service knows that the handle resolves to the customer's Stripe secret key, that the key is scoped to the Stripe charges API and no other Stripe endpoint, that the scope expires sixty seconds after the dereference completes, and that the call is logged with the agent's session ID, the tool call ID, and the tenant context. The Stripe handle cannot be used to call Slack. The cross service substitution that an exfiltrated handle could produce is bounded by the scope the dereference service enforces.
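The division of knowledge between the agent and the dereference service can be sketched in a few dozen lines. A minimal Python sketch in which the in-memory store, the target label stripe.charges.create, the handle name, and the transport callable are all hypothetical stand-ins (a real service would read from a secrets store and make the HTTP call itself); the sixty second expiry is not modeled:

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class HandleScope:
    secret_name: str     # key in the secrets store (hypothetical)
    allowed_target: str  # the one API operation this handle may reach


class DereferenceService:
    """Resolves opaque handles and makes the call; never returns the secret."""

    def __init__(self, store: dict, scopes: dict,
                 transport: Callable[[str, str, dict], Any]) -> None:
        self._store = store          # stands in for Secrets Manager, Vault, etc.
        self._scopes = scopes        # handle -> HandleScope
        self._transport = transport  # (target, secret, request) -> response
        self.audit_log = []          # the service's own record of dereferences

    def call(self, handle: str, target: str, request: dict, *,
             session_id: str, tool_call_id: str, tenant_id: str) -> Any:
        scope = self._scopes.get(handle)
        allowed = scope is not None and scope.allowed_target == target
        # Log identifiers and the scope decision only; the secret never enters the log.
        self.audit_log.append({
            "at": time.time(), "handle": handle, "target": target,
            "session_id": session_id, "tool_call_id": tool_call_id,
            "tenant_id": tenant_id, "outcome": "allowed" if allowed else "denied",
        })
        if not allowed:
            raise PermissionError(f"handle {handle!r} is not scoped to {target!r}")
        secret = self._store[scope.secret_name]
        return self._transport(target, secret, request)


# Agent side: the tool receives a handle in its arguments, never a key.
def create_charge_tool(args: dict, deref: DereferenceService, ctx: dict) -> Any:
    return deref.call(args["handle"], "stripe.charges.create", args["request"], **ctx)
```

Trying the same handle against a Slack target raises PermissionError and leaves a "denied" entry in the service's log, which is the bound on cross service substitution in miniature.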

The work is a week of focused engineering. I want to name the duration honestly, as a contrast signal rather than a project estimate. The actual duration depends on how many tool implementations the agent has and how many of them touch real credentials. For the agent on the screen this morning, twelve tools and five credential types, the work is roughly a week. For an agent with thirty tools and ten credential types, two to three weeks. For an agent with three tools and one credential type, two days. The duration scales with the surface area.

The work is, in order: build the dereference service as a small standalone process; configure the secrets store it reads from (AWS Secrets Manager, HashiCorp Vault, or a Postgres table with column level encryption); write the scoping rules; rewrite each tool implementation to accept a handle and call the dereference service; update the test harness so every test uses a stub dereference service; remove the raw secrets from the agent's environment; verify the agent's process no longer holds the secrets in memory at startup. The last step is the one most teams skip and the one the auditor will ask about.
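The last step lends itself to a startup check. Scanning the environment is only a proxy for "the process does not hold the secret in memory," but it catches the common regression of a raw key creeping back into the env file. A minimal Python sketch; the variable name heuristics and the sk- style pattern are assumptions, and only the sk_live_ prefix comes from the text:

```python
import re

# Assumed heuristics; tune them to the credential types actually in scope.
SUSPICIOUS_NAME = re.compile(r"API_KEY|SECRET|TOKEN|PASSWORD", re.IGNORECASE)
VALUE_PATTERNS = {
    "stripe_live_secret": re.compile(r"sk_live_[0-9A-Za-z]+"),
    # Assumed shape for sk- style provider keys; verify against provider docs.
    "sk_dash_key": re.compile(r"sk-[0-9A-Za-z_-]{20,}"),
}


def secrets_in_env(env: dict) -> list:
    """Return (variable name, reason) pairs; never the values themselves."""
    findings = []
    for name, value in sorted(env.items()):
        if SUSPICIOUS_NAME.search(name):
            findings.append((name, "suspicious name"))
            continue
        for label, pattern in VALUE_PATTERNS.items():
            if pattern.search(value):
                findings.append((name, label))
                break
    return findings


def assert_clean_env(env: dict) -> None:
    """Refuse to start if anything in env looks like a raw credential."""
    findings = secrets_in_env(env)
    if findings:
        raise RuntimeError(f"raw secrets in the agent's environment: {findings}")
```

Called at agent startup as assert_clean_env(dict(os.environ)), it refuses to boot with a raw key present. The name heuristic is deliberately noisy; a false positive costs a rename, a false negative costs a disclosure.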

The runbook for the dereference service becomes the artifact you write the most carefully this week. The dereference service is now the agent's most security critical component. If it is unavailable, the agent loses access to every downstream API at once. If it is compromised, every secret the system holds is exposed. The change narrows the agent's blast radius and concentrates it in the dereference service. The runbook has to name the failure modes, the recovery procedures, and the operational practices that keep the new concentration from becoming a worse problem than the surface it replaced.

Figure: The agent's secret surface, two shapes. Same call, two architectures. Before Stage 4, the raw Stripe key sits in the agent process's environment and memory and travels through the tool implementation to the Stripe API; this is what the auditor reads as too much trust in the agent's process. After Stage 4, the agent holds only an opaque handle, and a separately deployed dereference service resolves it, scopes it to the Stripe charges API, expires it after the call (60 second TTL), and logs the dereference to its own audit chain. The dereference layer is the structure the auditor calls least privilege: NIST 800-53 AC-6, AC-3.

The NIST 800-53 AC family anchors what the dereference layer is doing in compliance vocabulary. AC-6, least privilege, is the control the scoping rules implement. AC-3, access enforcement, is the control the handle resolution implements. The NIST AC family documentation names both controls at the orientation level. The specific interpretation of how those controls apply to your customer's contract is the work the customer's auditor and your own counsel do together.

A second sibling scene, which is the leak that has already happened

I owe you the second sibling scene before the thesis lands. The first scene was about a Stage 2 discipline skipped. The symmetric scene, the one the dereference layer is closing the barn door against, is about a Stage 1 debugging session that left a side effect in a third party log retention service the team does not fully control.

Picture a different fintech, a different team, the same early winter. The team shipped Stages 1 through 3 cleanly. They are reading the brief in the back of their head the same way you are this Monday morning. The dereference layer is on their plan for this week.

What they discover, on the Tuesday of the same week, is that nine months ago a debugging session left a tool call that logs its full input to CloudWatch. The tool call was for Plaid token exchange. The fix for the original bug landed. The log line did not get removed. Three months of production traffic passed through that code path. The CloudWatch log group has a 90 day retention. The Plaid access tokens written into the log group during the three month window are still in the log group right now, in a service the customer's SecOps team has read access to as part of the contract.

The team's CTO finds the tokens at eleven on Tuesday morning, while doing the secret in logs grep that is a Stage 4 pre flight step. The tokens have already been rotated by the application's refresh logic; the original tokens are no longer valid. But the log retention is 90 days. The customer's contract requires the team to disclose any prior token exposure, including exposure to tokens that have since been rotated, because the contract treats the exposure event itself as the disclosed item. The founder spends the afternoon writing a remediation plan to the customer's CISO.

The remediation plan is four pages. A forensic timeline (when the logging started, what token formats were exposed, what API surface those tokens could access). An attestation that the tokens have been revoked and that no anomalous Plaid API activity has been observed during the exposure window. A runbook for preventing recurrence, in which the new dereference layer is the load bearing component, because the pattern makes it impossible for a tool call's input to carry a raw secret in a form a log line can capture. An offer of pro rated contract relief if the customer asks for it. The CISO calls the founder on Thursday morning. The conversation is calibrated and difficult and ends with the customer reserving the right to revisit the matter at the next renewal.

The lesson here: secret separation at Stage 4 is closing the barn door, but the horse left a long time ago. The pattern shipped this week does not undo the Stage 1 debugging session from nine months ago. The data is already in a third party log service the team does not fully control. The exposure is irreversible. The disclosure is unavoidable. The runbook update is what the customer's CISO is calibrating against when she reads the remediation plan and decides what posture to take into the renewal.

The pre flight check that surfaces this leak before the auditor does is the secret in logs grep. The grep pattern is calibrated to the credential formats in scope. Stripe live keys begin with sk_live_. Plaid access tokens have a recognizable structure. The grep runs against production log groups, staging log groups, and archived log groups inside the audit window. Hits are the leak. The cleanup is rotation, disclosure, and a written remediation plan. The cleanup is not deletion of the log data, which CloudWatch Logs only lets you do a whole log stream or log group at a time, and which would not undo an exposure that has already happened.
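The grep itself can be a few lines of Python. A minimal sketch, assuming the log lines have already been exported from the production, staging, and archived log groups (how is outside the sketch). The sk_live_ prefix comes from the text; the Plaid pattern assumes the access-<environment>-<uuid> token shape and should be checked against Plaid's documentation. Findings carry a short fingerprint instead of the token, so the report does not become a second leak:

```python
import hashlib
import re
from typing import Iterable, Iterator

PATTERNS = {
    # Stripe live secret keys begin with sk_live_.
    "stripe_live_secret": re.compile(r"sk_live_[0-9A-Za-z]{8,}"),
    # Assumed Plaid access token shape (access-<environment>-<uuid>);
    # confirm against Plaid's documentation before relying on it.
    "plaid_access_token": re.compile(
        r"access-(?:sandbox|development|production)-"
        r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}"
    ),
}


def scan(lines: Iterable[str], source: str) -> Iterator[dict]:
    """Yield one finding per match, with a fingerprint instead of the secret."""
    for lineno, line in enumerate(lines, start=1):
        for kind, pattern in PATTERNS.items():
            for match in pattern.finditer(line):
                yield {
                    "source": source,
                    "line": lineno,
                    "kind": kind,
                    # A short SHA-256 fingerprint lets the same token be matched
                    # across log groups without copying it into the report.
                    "fingerprint": hashlib.sha256(match.group(0).encode()).hexdigest()[:16],
                }
```

Matching fingerprints across log groups is what turns a list of hits into the forensic timeline the remediation plan needs: when a token first appears, and everywhere it went.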

Back to your apartment, where the dereference layer's first commit is in the repository and the third cup of coffee is almost empty.

The fifth piece of work, which is the tombstone if your customer has European users

The cursor moves to a fourth migration file, conditional on whether your fintech customer has European users in scope for GDPR or a comparable deletion obligation. The fintech on your screen has a small EU subsidiary, and the security addendum carries a deletion clause that references the right to be forgotten. The migration is the answer to the tension between the append only constraint you applied an hour ago and the deletion right the customer has to honor.

The pattern is tombstones. A deletion request does not delete the row from the events table, because the append only constraint forbids it and because deleting a row would break the hash chain downstream. The deletion is a new event. Action data_deleted. The payload references the rows being logically deleted by ID. The rows being deleted have their payload columns nulled or replaced with a redaction marker, but the rows themselves stay in the events table at their original positions, with their previous_hash and current_hash left exactly as written. The links between rows stay intact. What changes is that a redacted row can no longer be re-hashed from its own contents, so the walk script has to check tombstoned rows by their links and by the data_deleted event rather than by recomputation. The audit log records the request. The data behind the request is gone.

The schema is small. A new nullable column tombstone_id uuid on the events table that references a new tombstones table with four columns: id, user_ref (external identifier, not data), requested_at, completed_at. The lifecycle is that a deletion request creates a row in tombstones with completed_at null. A background worker, running as audit_admin because every redaction is an UPDATE the append only trigger would otherwise reject, walks the events table, nulls the payloads of matching rows, sets the tombstone_id on each, and updates completed_at. The worker is bounded to thirty days, the contractual maximum for the deletion timeline.
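The lifecycle, and the one place the hash chain needs care, can be sketched in Python. This is a minimal in-memory sketch, not the worker: the matches predicate (standing in for however rows are tied to user_ref), the event shape, and the choice to verify a tombstoned row by its links alone are assumptions:

```python
import hashlib
from datetime import datetime, timedelta
from typing import Callable, List, Optional

GENESIS = "0" * 64
DELETION_DEADLINE = timedelta(days=30)  # the contractual maximum from the text


def row_hash(prev_hash: str, payload: Optional[str]) -> str:
    return hashlib.sha256(f"{prev_hash}|{payload}".encode("utf-8")).hexdigest()


def append(events: List[dict], event_id: str, action: str, payload: str) -> None:
    prev = events[-1]["current_hash"] if events else GENESIS
    events.append({"id": event_id, "action": action, "payload": payload,
                   "tombstone_id": None, "previous_hash": prev,
                   "current_hash": row_hash(prev, payload)})


def process_tombstone(events: List[dict], tombstone: dict,
                      matches: Callable[[dict], bool], now: datetime) -> bool:
    """Redact matching rows in place, record data_deleted, report deadline status."""
    redacted = []
    for event in events:
        if event["tombstone_id"] is None and matches(event):
            event["payload"] = None                  # the personal data is gone
            event["tombstone_id"] = tombstone["id"]  # hashes left exactly as stored
            redacted.append(event["id"])
    append(events, f"del-{tombstone['id']}", "data_deleted",
           f"tombstone={tombstone['id']} rows={','.join(redacted)}")
    tombstone["completed_at"] = now
    return now - tombstone["requested_at"] <= DELETION_DEADLINE


def walk(events: List[dict]) -> Optional[int]:
    """First failing index, or None. Tombstoned rows are checked by link only."""
    prev = GENESIS
    for i, event in enumerate(events):
        if event["previous_hash"] != prev:
            return i
        if event["tombstone_id"] is None and row_hash(prev, event["payload"]) != event["current_hash"]:
            return i
        prev = event["current_hash"]
    return None
```

A walk that ignored tombstone_id would fail at the first redacted row, because its stored hash was computed over data that no longer exists; checking that row by its links keeps the rest of the chain verifiable.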

The behavior is that the user's data, in the sense of the payloads that referenced them, is gone. The audit chain still references the tombstone row, so the chain is intact. The tombstone pattern retains the row's position in the chain while removing the personal data the row contained. Whether this satisfies the deletion obligation in your specific contract and regulatory context is a question for the customer's compliance officer and your counsel, not for this post. The technical note is that the append only constraint and the deletion right are not in conflict if deletion is implemented as redaction plus a first class audit event rather than as a row removal.

If your customer does not have EU users, this migration is deferred. Without a GDPR obligation in scope, the append only constraint alone is enough.

The thesis, which lands in one sentence and earns the chapter it belongs to

You step back from the migration files. There are four of them in 2026/12/ now, plus the dereference service's repository in a sibling directory. You look at the work as a set, and you write down on the back of an envelope, in pencil, the shape of what landed this week.

Two migrations, additive against the events table you already had. One architectural surgery, the dereference layer, which rewrites every tool implementation that touched a raw secret and concentrates the secret surface into a separately deployed service with its own audit chain. One conditional migration, the GDPR tombstones, which closes the tension between the append only constraint and the deletion right by recording the request as a first class event rather than as a row removal.

That is the entire surface area of Stage 4 for a team that paid down the Stage 1 deferrals and kept the Stage 2 discipline. The schema grows by three columns. The schema's type system grows by one enum. The privileges on the events table narrow by two, UPDATE and DELETE. The agent's process loses the raw secrets from its environment. The dereference service gains them, in a configuration that scopes every resolution to the call's intent. The runbook gains a new chapter.

The frame the whole post has been building toward lands here. Stage 4 is the chapter where the contract is now downstream of your data structure. Stages 1, 2, 3 were about what your system did. Stage 4 is about what your system can prove. The hash chain, the append only events table, the actor_kind enum, the secret dereference layer, the tombstone pattern. These are not features. They are the structure that lets a third party with no access to your team's standups verify what happened, when, and on whose behalf. The auditor does not need to take your word for any of it, because the structure of the system itself carries the proof.

Stage 4 is the chapter where what your system can prove matters more than what it does.
Figure: One events row, four annotations that make it reconstructable. A single Stage 4 events row (tenant_42, actor_kind agent, action charge.create, tombstone_id NULL) annotated with the hash chain (previous_hash links back to the prior row, current_hash signs forward), the actor_kind enum (one of five, NOT NULL: user, agent, system, admin, unknown_pre_stage_2), the nullable tombstone reference, and the entry this event's side effect wrote into the dereference service's own audit chain. Together these four columns compose what an auditor calls reconstructability: the third party never asks the team, because the structure of the row carries the proof.

What does not light up at Stage 4, and why each absence is a decision

There is a list, and it is half the point of the post, because the discipline of refusing to do things the regulated customer's signature might seem to require is half of what makes Stage 4 a one week transition rather than a three month rewrite.

Multi region replication stays off. The trigger is Stage 5, either a contractual geographic residency requirement the customer names explicitly in the addendum or measured cross region latency the workload cannot tolerate from a single region. If your customer has not named a specific residency or latency requirement, this is a Stage 5 deferral with a documented trigger, not a compliance gap.

Sharding stays off. The trigger is measured database pressure on the write rate or storage footprint that unsharded Postgres cannot serve. Four million event rows is not a sharding problem. Four hundred million might be, measured. The append only constraint and the hash chain make sharding significantly more invasive than it would be on a mutable table; the chain has to be partitioned per shard.

Read replicas stay off. The trigger is Stage 5 measured read load. Building the replica this week would be early, in the sense the previous posts used: the mistake symmetric to under building. Defer.

Custom database forks stay off. Never. There is no scenario at Stage 4 where the right answer is a fork of Postgres with custom audit primitives compiled in.

Microservices decomposition of the working monolith stays off. Never, in the sense that the trigger is not "the regulated customer signed." The dereference service is the one new boundary the chapter creates, because the security control requires it. Splitting the agent into a service mesh because the auditor might prefer to see services is a Stage 5 plus mistake. The auditor is calibrated for controls, not for topology.

Each absence has a trigger. None of the triggers is "the team had bandwidth this week." Building any of them earlier is early, not unfinished. The deferrals are the decisions, the same as they were at Stages 1, 2, and 3.

Figure: What lights up at Stage 4, and what carries over. The Stage 3 deferral heatmap across 02 Co-builders (first NDA'd operator), 03 Paying (first contract signed), 04 Regulated (audit + secrets live), 05 Multi-agent (concurrent at consequence), and 05+ Almost never (documented requirement). Three cells turn solid at Stage 4: agent secret separation, hash chained append only audit, and GDPR tombstones. Five cells stay required from Stages 2 and 3: row level security policies, roles table and RBAC, idempotency key enforcement, the outbox pattern, and actor_kind discipline with real values. Materialized views, replicas, and cache, along with multi region, failover, and sharding, hold at half intensity for Stage 5 measured pressure; event sourcing stays muted as "if ever". Eight cells lit, three deferred.

Monday afternoon, when the work earns the security review

It is now Monday afternoon. The grey is the same grey. The four migration files have been committed and merged. The dereference service is in its first day of development. The runbook is open in a third window. The CISO's email is still on the screen because you have not yet replied. The Thursday on the CISO's calendar is three days away.

What runs this afternoon is the pre flight. Eight checks, in order, between calls and a sandwich. The checklist is not the artifact. The ninety minutes between the third coffee and the call with the customer's engineering lead at four is the artifact.

First, the audit row immutability test. You connect to the database as the application role. You attempt to execute UPDATE events SET action = 'test'. The expected result is permission denied. The query returns permission denied immediately, because the privilege revocation catches the statement before it reaches the trigger. You attempt DELETE FROM events. Permission denied. The privilege layer of the append only constraint is working; the trigger layer behind it gets its own check at the end of the list. Three minutes.

Second, the hash chain integrity walk. You select the most recent ten thousand rows from the events table ordered by id. For each row, you compute the expected current_hash from the canonical payload and the previous_hash, and compare against the stored value. Expected: every row matches and every previous_hash matches the preceding row's current_hash. The walk runs in twelve seconds. Every row matches. You wire the walk into CI as a permanent regression check, run a deliberate test that mutates a row's hash in a staging copy, confirm the build fails as expected, and leave the regression armed for every future deploy.
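A sketch of the walk in Python, under stated assumptions: SHA-256 over a JSON encoding with sorted keys stands in for the canonical format, the hashed fields are the row's id, tenant_id, actor_kind, action, and previous_hash (position and metadata only, so redacting a payload cannot break the chain), and the genesis value is a hypothetical string of zeros. The column set and function names are illustrative, not the post's schema.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List, Optional

GENESIS = "0" * 64  # hypothetical previous_hash for the very first row


@dataclass
class Row:
    id: int
    tenant_id: str
    actor_kind: str
    action: str
    previous_hash: str
    current_hash: str = ""


def compute_hash(row: Row) -> str:
    """SHA-256 over a canonical encoding of position, metadata, and the back link."""
    canonical = json.dumps(
        {
            "id": row.id,
            "tenant_id": row.tenant_id,
            "actor_kind": row.actor_kind,
            "action": row.action,
            "previous_hash": row.previous_hash,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def append(chain: List[Row], tenant_id: str, actor_kind: str, action: str) -> Row:
    """Link a new row to the current head of the chain."""
    previous = chain[-1].current_hash if chain else GENESIS
    row = Row(len(chain) + 1, tenant_id, actor_kind, action, previous)
    row.current_hash = compute_hash(row)
    chain.append(row)
    return row


def first_broken_row(rows: List[Row]) -> Optional[int]:
    """Walk rows ordered by id; return the id of the first failing row, or None.

    The first row in the window is only self-checked, because its predecessor
    sits outside the window (for example, the most recent ten thousand rows).
    """
    for index, row in enumerate(rows):
        if row.current_hash != compute_hash(row):
            return row.id
        if index > 0 and row.previous_hash != rows[index - 1].current_hash:
            return row.id
    return None
```

The CI regression is the same function pointed at a staging copy with one hash deliberately mutated. Note that an attacker who edits a row and recomputes its current_hash only moves the break: the next row's previous_hash no longer matches.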

Third, the actor_kind enum coverage check. SELECT actor_kind, COUNT(*) FROM events GROUP BY actor_kind. Expected: five values, zero null rows, zero values outside the enum. The query returns the expected distribution. Two minutes.
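The pass condition is small enough to sketch. This Python version takes the (actor_kind, count) rows the GROUP BY query returns and reports NULLs, values outside the enum, and enum values with no rows at all. The five values are the ones this post uses for the actor_kind enum; the function name is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple

ACTOR_KINDS = frozenset({"user", "agent", "system", "admin", "unknown_pre_stage_2"})


def coverage_problems(rows: Iterable[Tuple[Optional[str], int]]) -> List[str]:
    """Check (actor_kind, count) rows from:

        SELECT actor_kind, COUNT(*) FROM events GROUP BY actor_kind

    Returns an empty list when all five values are present, no row is NULL,
    and nothing falls outside the enum.
    """
    counts = dict(rows)
    problems = []
    nulls = counts.pop(None, 0)  # GROUP BY returns NULL as its own group
    if nulls:
        problems.append(f"{nulls} rows have NULL actor_kind")
    for value in sorted(set(counts) - ACTOR_KINDS):
        problems.append(f"unexpected actor_kind {value!r}: {counts[value]} rows")
    for value in sorted(ACTOR_KINDS - set(counts)):
        problems.append(f"no rows with actor_kind {value!r}")
    return problems
```

With the column typed as a NOT NULL enum, the first two checks should never fire; keeping them catches a regression if the column type ever drifts back to text.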

Fourth, the secret in logs grep. The pattern is the union of all credential formats in scope: Stripe live keys, Plaid access tokens, Anthropic keys, OpenAI keys, the fintech customer's API key format. The grep runs for eight minutes against production, staging, and archived log groups. Zero hits. You save the result for the Thursday review as evidence of the check.

Fifth, the dereference layer scope test. You call the dereference service with a handle that belongs to a different tenant's session. Expected: scope error. The call returns the scope error. You then take a handle that is valid for the current tenant and present it against the wrong tool intent (a Stripe handle called against the Slack endpoint). Scope error. The scoping is enforcing on both tenant and intent. Four minutes.
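A minimal sketch of the scoping this check exercises, assuming a Python service in which each handle is bound to one tenant and one tool intent, and each tool intent maps to one fixed outbound operation, so the raw secret is used inside the service and never handed back to the agent. Scope, DereferenceService, the handle strings, and the intent names are all hypothetical. A real service would also scope to the session and write to its own audit chain rather than an in-memory list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


class ScopeError(Exception):
    """A handle was presented outside the tenant or tool intent it was issued for."""


@dataclass(frozen=True)
class Scope:
    tenant_id: str
    tool_intent: str  # hypothetical naming, e.g. "stripe.charge"


# One fixed outbound call per intent: receives the secret and the agent's args.
Operation = Callable[[str, Dict[str, Any]], Any]


class DereferenceService:
    """Holds raw secrets; the agent only ever holds opaque handles."""

    def __init__(self, operations: Dict[str, Operation]) -> None:
        self._operations = operations
        self._vault: Dict[str, Tuple[Scope, str]] = {}
        self.audit_log: List[Dict[str, Any]] = []  # the service's own trail

    def register(self, handle: str, scope: Scope, secret: str) -> None:
        self._vault[handle] = (scope, secret)

    def call(self, handle: str, tenant_id: str, tool_intent: str, args: Dict[str, Any]) -> Any:
        grant = self._vault.get(handle)
        allowed = (
            grant is not None
            and grant[0] == Scope(tenant_id, tool_intent)
            and tool_intent in self._operations
        )
        # Every resolution attempt is recorded, allowed or not.
        self.audit_log.append(
            {"handle": handle, "tenant_id": tenant_id, "tool_intent": tool_intent, "allowed": allowed}
        )
        if not allowed:
            raise ScopeError(f"handle {handle!r} is not scoped to {tenant_id}/{tool_intent}")
        secret = grant[1]
        # The secret is used here, inside the service, and never returned.
        return self._operations[tool_intent](secret, args)
```

Binding operations to intents inside the service, rather than letting the caller pass in a callback, is the design choice that keeps the secret from flowing back to the agent.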

Sixth, the GDPR tombstone test. You issue a test deletion request through the API path the deletion workflow uses, against a synthetic user in a staging tenant. The relevant payloads are nulled; a data_deleted event is present; the tombstone row exists with completed_at set; the chain across the redacted rows is intact (the chain hashes over positions and metadata, which did not change). Six minutes.

Seventh, the auditor read replica isolation test. Conditional on the audit replica being deployed, which it is not. You flag the absence in the runbook as a "deferred until measured" item with the trigger named. Two minutes spent documenting the deferral.

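Eighth, the append only trigger smoke test. You confirm the trigger exists with \d events, then fire it: an UPDATE that gets past the privilege layer, attempted by any role other than audit_admin, should raise the trigger's exception. Postgres does not let you call a trigger function directly, so the test has to go through a statement. Three minutes.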
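The trigger layer can be rehearsed locally. This sketch uses Python's built-in sqlite3 module as a stand-in for Postgres: SQLite has no roles, so neither the privilege layer nor the audit_admin exemption is modeled, and in Postgres the same rule would be a BEFORE UPDATE OR DELETE trigger whose PL/pgSQL function raises an exception. Table, trigger, and function names are illustrative.

```python
import sqlite3
from typing import List

# SQLite stand-in for the trigger layer only (no roles, no audit_admin exemption).
SCHEMA = """
CREATE TABLE events (
    id INTEGER PRIMARY KEY,
    actor_kind TEXT NOT NULL,
    action TEXT NOT NULL
);
CREATE TRIGGER events_no_update BEFORE UPDATE ON events
BEGIN
    SELECT RAISE(ABORT, 'events is append only');
END;
CREATE TRIGGER events_no_delete BEFORE DELETE ON events
BEGIN
    SELECT RAISE(ABORT, 'events is append only');
END;
"""

FORBIDDEN = ("UPDATE events SET action = 'test'", "DELETE FROM events")


def smoke_test() -> List[str]:
    """Fire each forbidden statement at a seeded table and collect the outcomes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Row level triggers only fire when a row is touched, so seed one first.
    conn.execute("INSERT INTO events (actor_kind, action) VALUES ('agent', 'charge.create')")
    outcomes = []
    for statement in FORBIDDEN:
        try:
            conn.execute(statement)
        except sqlite3.DatabaseError as exc:
            outcomes.append(f"blocked: {exc}")
        else:
            outcomes.append(f"NOT BLOCKED: {statement}")
    remaining = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
    outcomes.append(f"rows remaining: {remaining}")
    conn.close()
    return outcomes
```

The seeded row matters: a row level trigger on an empty table never fires, so a smoke test against an empty copy passes for the wrong reason.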

The whole rhythm is roughly ninety minutes. The pre flight passed. You write the reply to the CISO now. Three paragraphs. Honest about which controls the architecture supports as of this Monday afternoon. The reply links to the runbook, which is now in a shape the CISO can read.

Thursday afternoon, with the security review behind you

It is now Thursday afternoon. The security review with the CISO began at one and ended at four. The CISO had questions. The questions were calibrated. The questions were about the hash chain's canonical format, the dereference service's scoping rules, the actor_kind enum's unknown_pre_stage_2 value (whose honesty the CISO appreciated in a way you did not expect), and the runbook's incident response section. The CISO had no blockers. The review went well, in the calibrated sense: the CISO does not say so in words, but the absence of follow up flags in the closing remarks is the signal.

You set the laptop aside. You walk to the kitchen. You make a fourth cup of coffee, the third since the one at six a.m. The bag is now empty in a way that means tomorrow's coffee is going to be from a different bag than the one that started Monday. You look out the window. The winter light is grey and the apartment is cold. The street has snow on the sidewalk that has been there since Tuesday. The building across the way has lights on in two windows, both of them the warm yellow of someone reading.

The agent is running. The agent is running for a customer who will never meet you. The CISO ran the security review three hours ago and the CISO does not need to take your word for any of it, because the structure of the system itself carries the proof. The hash chain is the proof. The append only constraint is the proof. The actor_kind enum is the proof. The dereference layer is the proof. The runbook is the proof. The pre flight that ran on Monday afternoon is the proof. None of it requires you to be in the room.

If you are at the Stage 3 to Stage 4 boundary and any of this resonates, I would welcome a conversation. The contact form at the top of the site goes to my inbox.

Up next: Stage 5, when measured pressure means something specific.

Frequently Asked Questions

I shipped Stages 1-3 well but skipped Stage 2's actor_kind discipline. How big is the Stage 4 reconstruction?

Directionally, three weeks of forensic work if your auxiliary log retention is long enough to support the reconstruction, and indefinite if it is not. The effort joins the events table to whatever external log sources can identify which sessions were human originated versus agent originated. The minimum you need is an authentication provider log that distinguishes human logins from API initiated sessions, and an LLM provider log that distinguishes agent triggered tool calls from user triggered ones. If both are available with retention covering the audit period, the reconstruction is laborious but bounded. If either is missing, the reconstruction is partial and has to be disclosed as partial. The honest answer is that the cost is unknown until you inventory your log sources and their retention. Do the inventory before telling the regulated customer the audit is ready.
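The core of that join can be sketched in a few lines. This Python sketch assumes, hypothetically, that each event row carries a session_id, that the authentication provider export maps sessions to "human_login" or "api", and that the LLM provider log can be reduced to a set of agent triggered session IDs. Rows that no source can place fall back to unknown_pre_stage_2, and the count of those is the number you disclose as partial.

```python
from typing import Dict, Iterable, Set, Tuple


def reclassify(
    events: Iterable[Tuple[int, str, str]],
    auth_origin: Dict[str, str],
    agent_sessions: Set[str],
) -> Tuple[Dict[int, str], int]:
    """Propose actor_kind values for legacy 'system' rows.

    events: (event_id, actor_kind, session_id) rows from the events table.
    auth_origin: session_id -> "human_login" or "api" (hypothetical export values).
    agent_sessions: session_ids the LLM provider log shows as agent triggered.
    Returns the proposed values and how many rows no source could place.
    """
    proposed: Dict[int, str] = {}
    for event_id, actor_kind, session_id in events:
        if actor_kind != "system":
            continue  # only the Stage 1 default needs reconstruction
        if session_id in agent_sessions:
            proposed[event_id] = "agent"
        elif auth_origin.get(session_id) == "human_login":
            proposed[event_id] = "user"
        else:
            # No log source covers this row; it stays unknown and is disclosed.
            proposed[event_id] = "unknown_pre_stage_2"
    unplaced = sum(1 for kind in proposed.values() if kind == "unknown_pre_stage_2")
    return proposed, unplaced
```

The output is a proposal for review, not a write. A real reconstruction would also need a rule for rows that genuinely were system originated, which this sketch does not attempt.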

Do I need a hash chain at Stage 4, or is append only enough?

My read is that you need both. The append only constraint prevents writes from mutating the table going forward. It does not prevent mutations that happened before the constraint was applied, and it does not give you a way to prove the table is unmodified to a third party who cannot watch the constraint enforce in real time. The hash chain is the artifact a third party can verify after the fact. An auditor who asks "how do you know this log has not been modified" can read the chain walk and the regression check in CI. An auditor who hears "we have an append only constraint, you can trust us" hears something different. Skipping the chain and keeping the constraint leaves a gap that compounds across the renewal conversation.

What about HIPAA, PCI, GLBA, or SOC 2 specifically?

Nothing in this post is legal or compliance advice. The specific controls that apply to your situation live in the contract addendum your customer signed and in the auditor's interpretation of the relevant framework. The architecture work this post describes is calibrated against the NIST 800-53 AC family at the orientation level, intended to give a peer founder enough vocabulary to read their own contract addendum critically. The interpretation of which control language applies to which architectural decision is the work the customer's auditor and your own counsel do together. Talk to your customer's compliance officer. Talk to your own counsel. Both conversations are cheaper than the conversation that follows a misread.

Can I drop the dereference layer if the agent only calls our own API?

The dereference layer earns its keep in cases where the agent calls third party APIs with credentials the third party issued. If the agent's tool surface only ever calls your own internal API, the credential is one you issued and can revoke, and the pattern is over engineered for the surface you have. The pattern still earns its keep in two specific cases: when the agent's tool surface might grow to include third party credentials later (in which case you do not want to retrofit the dereference pattern), and when the internal API credentials are scoped per call type (in which case the dereference layer is the natural enforcement surface). The architectural posture I would name explicitly is that the agent should never hold a long lived credential it can use across unbounded call types.

Do I need GDPR tombstones if I have no EU users?

Not for GDPR specifically. The pattern earns its keep in two adjacent cases. The first is California's CCPA, which has a deletion right that is narrower than GDPR's but real. The second is any customer contract that includes a contractual deletion obligation independent of statutory law, which some enterprise customers write into their security addenda for reasons that have to do with their own downstream compliance obligations. If neither case applies, the pattern is deferred until it does. Whether one of them applies is a question for the customer's compliance officer.

What about non repudiation, does the hash chain give me that?

No, and I want to name the limit honestly because the contract addendum language can be misread on this point. The hash chain gives you tamper evidence. Any modification to a row after it was written breaks the chain downstream, which a third party who recomputes the chain can detect. Non repudiation is a stronger property. It requires that writes be signed by a key the writer cannot deny holding, typically attested by a timestamping authority. The hash chain on its own does not invoke a timestamping authority. The implementation that would add non repudiation is to sign each row's hash with a private key held in a hardware security module and to timestamp the signature against an external service. That work is a Stage 4 plus or Stage 5 consideration that earns its keep when the contract specifically names non repudiation rather than reconstructability. Most SOC 2 auditors are calibrated for tamper evidence.

How do I prepare for the actual security review with the customer?

The honest answer, as a peer founder rather than a consultant, is that the preparation is the runbook and the pre flight passing cleanly. The CISO will ask questions calibrated to the contract addendum and the SOC 2 control list. The right posture is to answer in the terms the architecture actually implements, with references to the runbook when the question is technical, and with explicit deferrals to the customer's compliance officer and your own counsel when the question crosses the interpretation line. The runbook is the document the CISO will read most carefully. The pre flight is the rehearsal that makes you ready to read the runbook out loud if asked. The honesty in the unknown_pre_stage_2 value is the kind of honesty a CISO is calibrated to notice; in my read, it counts for more than the absence of the gap would have. The review is a conversation between two technical people about what the system can prove. The architecture is the substrate. Both deserve preparation.

Code Atelier · NYC
