If you are a non-technical executive about to approve an AI agent pilot this quarter, you are almost certainly being asked to make a risk decision without a shared vocabulary for the risk itself. I want to name that problem plainly, because I think most of the confusion around agentic AI right now is downstream of it.
For every other enterprise technology a board member cares about, there is a single artifact you can point at. SOC 2 Type II for SaaS security. PCI DSS for card data. ISO 27001 for information security management. HIPAA for protected health information. You can walk into a procurement conversation and ask, "show me your certification," and the answer is either a PDF or a real problem.
For agentic AI in April 2026, there is no equivalent artifact. There is no "Agentic Risk Standard" you can ask a vendor to produce. What exists instead is a constellation of partial frameworks, each written for a different reader, none of which a CFO can use as a go or no-go gate on a pilot.
This piece is commentary on that gap. I am going to use the phrase Agentic Risk Standard, or ARS, as a shorthand for the thing that does not exist yet. It is not a framework I am selling. It is not an existing certification. It is a name for a hole in the market that executives are feeling every time a vendor pitches them on an autonomous agent and they do not know what to ask.
A quick note before we go further. Nothing in this article is legal advice. Regulatory references cite primary sources. If you are in a regulated industry and making a deployment decision, your counsel should be in the room.
The standard you are about to ask for does not exist yet
Here is the test. Imagine your head of engineering brings you a proposal to deploy an autonomous agent that will take actions in three systems: your CRM, your email, and your payment processor. The agent will answer customer questions, update records, and issue small refunds up to some cap.
Your instinct as a buyer is correct. You want to ask: what standard has this been evaluated against, and who certified it? That is the exact question you would ask about a SaaS vendor handling customer data.
The honest answer your engineering team will give you, if they are honest, is some version of: "There is no single standard. We have mapped the agent against OWASP's Top 10 for Agentic Applications, we are tracking NIST's guidance, we have pen tested it using MITRE ATLAS techniques, and our identity provider handles the non-human identity piece."
That answer is not wrong. It may even be a sign of a serious team. But notice what just happened. You asked for a pass or fail, and you got a reading list. That is the gap.
What you have instead: the real frameworks that each cover part of the picture
Before I argue that the gap is real, I want to give the existing frameworks a fair hearing. Every one of the following exists, is publicly documented, and has serious people behind it. None of them, by itself, solves the buyer problem I am describing.
1. NIST AI Risk Management Framework and the Generative AI Profile
The NIST AI Risk Management Framework 1.0 was published in January 2023. It is a voluntary, general-purpose framework organized around four functions: Govern, Map, Measure, and Manage. In July 2024 NIST added the Generative AI Profile (NIST AI 600-1), which calls out twelve GenAI-specific risks and suggested mitigations.
Neither of those is agentic-specific. NIST acknowledged the gap in February 2026 when it announced the AI Agent Standards Initiative through the Center for AI Standards and Innovation, with an AI Agent Interoperability Profile expected in Q4 2026. That is the most credible candidate I see to start closing the gap, and it is still announced rather than published.
If you ask a vendor "are you NIST-aligned," you are asking a question that does not have a specific agentic answer yet. The best a serious vendor can say is "we mapped our controls to the Generative AI Profile risks that apply to us." That is a process answer, not a pass or fail.
2. OWASP Top 10 for Agentic Applications 2026
The OWASP Top 10 for Agentic Applications was released in December 2025 by the OWASP GenAI Security Project. It is the most detailed public threat list for agent security right now. It identifies categories like Agent Goal Hijack (ASI01), Tool Misuse, Identity and Privilege Abuse, and Rogue Agents. Three of the top four categories are specifically about identity, tools, and delegated trust boundaries.
This is an excellent list. It is also written for application security engineers. A CFO cannot use it as a purchase gate because it does not answer "has this vendor passed?" It answers "here are the ten things your vendor should have thought about." Those are different questions.
3. MITRE ATLAS
MITRE ATLAS, the Adversarial Threat Landscape for AI Systems, is a structured knowledge base of real-world tactics and techniques used to attack AI systems. As of the v5.4.0 release in February 2026, it contains 16 tactics, 84 techniques, and 56 sub-techniques. That is up from 15 tactics and 66 techniques in October 2025, largely because MITRE has been adding agentic-specific techniques fast.
The new agentic entries include AI Agent Context Poisoning, Memory Manipulation, Modify AI Agent Configuration, Publish Poisoned AI Agent Tool, and Escape to Host. If you are a red teamer, this is invaluable. If you are a board member trying to decide whether to approve a pilot, you cannot directly use it, and you should not try. It is a map of attacker behavior, not a buyer rubric.
4. EU AI Act
The EU AI Act is the closest thing to a binding regulatory regime for AI systems, and its treatment of agentic AI is worth understanding precisely. Governance rules and general-purpose AI obligations have been applicable since 2 August 2025. The Commission's enforcement powers enter into application on 2 August 2026, and the majority of the high-risk and transparency rules come into force on the same date.
Fines run up to 35 million EUR or 7 percent of global annual turnover for prohibited practices, and up to 15 million EUR or 3 percent for violations of human oversight, audit trail, and transparency requirements. Those are the exact categories an agentic system has to worry about.
Here is the important nuance. The Act does not have a specific agentic carve-out. Agents are regulated by their use case. A customer-service agent that also makes refund decisions is likely to land in the limited-risk or high-risk bucket depending on the context. That means there is no "EU AI Act agentic certificate" you can ask for, because agentic is not the regulatory category. My read is that this will force buyers to do the use-case classification themselves, which most buyers are not equipped to do in April 2026.
5. Cloud Security Alliance Agentic AI Red Teaming Guide
The Cloud Security Alliance Agentic AI Red Teaming Guide is probably the most operationally useful document in this whole landscape for a security team. It lays out threat categories that map directly to what actually goes wrong in production: authorization and control hijacking, checker out of the loop, goal manipulation, knowledge base poisoning, multi-agent exploitation, and untraceability.
In March 2026, CSA launched the CSAI Foundation specifically to build identity-first controls for non-human actors and runtime authorization governance for the "agentic control plane." This matters because the same source estimates that in 2026, non-human identities outnumber human users roughly 100 to 1 in a typical enterprise environment, driven by the rise of autonomous agents operating with real privileges.
Again, this is builder guidance, not buyer guidance. It tells a security engineer what to test. It does not give a board member a score.
6. Vendor-published safety frameworks
Anthropic published its framework for developing safe and trustworthy agents and has iterated on it through early 2026. Its core principles are reasonable: autonomy balanced with oversight, transparency into the agent's reasoning, and human control over high-stakes decisions. Anthropic's Plan Mode in Claude Code, where a user reviews and modifies an entire execution plan upfront instead of approving each step individually, is one practical answer to approval fatigue.
In December 2025, Anthropic, OpenAI, and Block founded the Agentic AI Foundation under the Linux Foundation to coordinate open, interoperable infrastructure for agents. That consortium has the scale to influence where this ends up. It is too early to say what the buyer-facing output will look like.
Vendor-published frameworks are useful, and I read all of them. They are also vendor-published. You would not accept "our CEO says we are secure" as a substitute for SOC 2. You should not accept "our foundation model provider published a framework" as a substitute for an independent standard either.
7. US regulatory activity: California and the FTC
California AB 316, effective 1 January 2026, eliminates the "autonomous AI" defense in civil litigation. Defendants cannot argue that the AI system independently caused the harm as a way to escape liability. The FTC's Operation AI Comply has produced real settlements, including Cleo AI and Air AI. These are not frameworks. They are enforcement. They shape what the downside looks like if your pilot goes wrong.
For the full legal picture on liability, the earlier piece on the five controls that separate reliable AI agents from costly mistakes goes deeper into the specific cases: Air Canada, Meta, DPD, Nippon Life. Those cases are the empirical base every one of the frameworks above is trying to formalize.
Why the gap is a buyer problem, not a vendor problem
It would be easy to read everything above and conclude that the standards bodies just need more time, and that in 12 months NIST's Q4 2026 profile will close the gap. I think that is partly true and partly wrong. It is partly true because the direction of travel is real: NIST, OWASP, MITRE, and CSA are converging on a shared vocabulary. It is partly wrong because the thing that is actually missing is not technical content. It is an artifact designed for the buyer.
SOC 2 works as a buyer tool for a specific reason. It is an attestation, produced by an independent third party, that a vendor meets a defined set of criteria, and it produces a report the buyer can put in a procurement file. The buyer does not need to understand the underlying controls. They need to know that someone with a professional obligation looked at the controls and signed their name to a report.
None of the agentic frameworks I listed above has that shape right now. OWASP does not certify vendors. NIST does not certify vendors. MITRE does not certify vendors. The EU AI Act produces conformity assessments for high-risk systems, but only for systems in scope and only after enforcement starts in August 2026, and the attestation is about regulatory conformity rather than a general agent-safety posture.
So when I say the buyer problem is the hard part, I mean this. The underlying content of a mature Agentic Risk Standard already exists, scattered across five or six documents. The packaging, the independence, the attestation, and the buyer UX do not. Until they do, the buyer is holding the risk, not the vendor.
If you are a board member, that should make you uncomfortable, because you are being asked to make a decision without the artifact you would demand for any other piece of enterprise technology.
A short detour on why SOC 2 actually works
It is worth being specific about what SOC 2 gives a buyer, because the properties it has are the ones the agentic ecosystem is missing. SOC 2 is a report, not a checklist. The report is produced by a CPA firm that has an independent professional obligation to the integrity of the attestation. The underlying Trust Services Criteria are maintained by the AICPA, a standards body that does not sell SOC 2 reports itself. The auditor and the standard-maker are separate organizations, and both are separate from the vendor being audited.
Those three separations are what give a SOC 2 Type II report its force. A buyer can look at the report, see who the auditor was, look up the auditor's reputation, and trust the attestation as much as they trust the auditor. That trust is portable across vendors, because the same auditor can audit many vendors against the same criteria.
Now map that to the agentic AI world. Who maintains the criteria? Right now, nobody in a single place. Who is the independent auditor? There is no equivalent of a CPA firm with an agentic audit practice and a professional obligation tied to that attestation. Who is the standards body? Several, and none of them is in the business of writing buyer-facing criteria for independent attestation. This is not a criticism of any of the existing efforts. It is a structural observation about what the ecosystem has not yet produced.
Until those three roles exist separately, the buyer has to do the auditor's work themselves, and most buyers are not equipped to do that. That is the whole problem in one sentence.
The ten dimensions a mature Agentic Risk Standard would need to cover
This is the constructive part of the piece. If you accept that no single standard exists today, the next useful question is: what would one have to cover to be worth reading? I am going to sketch ten dimensions. They are drawn from the union of the frameworks above plus the documented agent failures of 2024 through 2026.
Treat this as a mental model for now. If someone hands you an agent proposal tomorrow, you can run these ten questions against it and get a clearer read than any one of the existing frameworks will give you on its own.
1. Blast radius
What is the maximum damage a single agent action can cause before a human reviews it? Put it in units. Dollars moved. Records modified. Emails sent. Customers contacted. If the answer is "theoretically unlimited," or worse, "we have not measured it," that is your loudest signal.
The Meta incident in 2025, where an AI alignment director watched an agent delete over 200 of her emails while ignoring the stop commands she sent, is a blast-radius failure. The DPD chatbot incident in January 2024, where the agent was manipulated into swearing at a customer and calling DPD "the worst delivery firm in the world," is a blast-radius failure in reputation terms.
Blast radius is also the easiest dimension to make concrete in a conversation with a non-technical stakeholder, which is why I put it first. Ask specific questions in specific units. What is the largest refund this agent can issue in a single action? What is the largest number of customers it can contact in a single hour? What is the largest number of database records it can modify before a human reviews them? If your vendor cannot produce those numbers in five minutes, you have your answer. Not because the vendor is bad, but because a team that has not measured blast radius has not yet done the thinking that would tell them to measure it.
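For the engineers in the room, here is a minimal sketch of what a measured blast radius can look like in code. Every name and number is hypothetical; the property that matters is that the caps exist as explicit, reviewable values and that the check fails closed before the action is dispatched.

```python
from dataclasses import dataclass

# Hypothetical caps for illustration. The point is that they live in a
# reviewable artifact, not in an engineer's head.
@dataclass(frozen=True)
class BlastRadiusPolicy:
    max_refund_usd: float = 50.00       # largest single-action refund
    max_contacts_per_hour: int = 100    # largest outbound contact burst
    max_records_per_action: int = 25    # largest unreviewed bulk update

class BlastRadiusExceeded(Exception):
    pass

def check_refund(policy: BlastRadiusPolicy, amount_usd: float) -> None:
    # The check runs before the action is dispatched, not after it lands.
    if amount_usd > policy.max_refund_usd:
        raise BlastRadiusExceeded(
            f"refund {amount_usd:.2f} exceeds cap {policy.max_refund_usd:.2f}; "
            "route to human review"
        )
```

A vendor who has done this work can answer the five-minute question by reading three numbers off a policy object.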
2. Authority scoping
Exactly which credentials, APIs, data scopes, and systems can the agent touch? Is there a documented least-privilege posture, and does it exist as an artifact, not as a verbal promise? This is the OWASP ASI03 category (Identity and Privilege Abuse), and the CSA's 2026 focus on non-human identity governance lives here.
The right answer looks like a spreadsheet or a policy document. The wrong answer looks like "the agent uses the service account that the ops team uses," which is a way of saying the agent has whatever permissions the last ops engineer happened to need.
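What does the spreadsheet-or-policy-document answer look like when it is enforced rather than merely written down? One plausible shape, sketched with hypothetical scope names, is a deny-by-default scope map checked into version control:

```python
# A least-privilege manifest as a checked-in artifact. Scope names are
# hypothetical; the load-bearing property is deny-by-default.
AGENT_SCOPES: dict[str, set[str]] = {
    "crm":      {"contacts.read", "tickets.read", "tickets.update"},
    "email":    {"send.single"},             # no bulk-send scope granted
    "payments": {"refunds.create.capped"},   # no payouts, no new payees
}

def is_authorized(system: str, scope: str) -> bool:
    # Anything not explicitly granted is denied, including unknown systems.
    return scope in AGENT_SCOPES.get(system, set())
```

The artifact is the answer. A reviewer can diff it, an auditor can read it, and a shared ops service account cannot hide inside it.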
3. Reversibility
Is every action the agent takes reversible? If not, which actions are one-way doors, and how are those specifically gated? A refund is reversible with paperwork. A public social media post is not. A bulk email to 50,000 customers is not. A payment sent to a new payee is difficult to reverse and depends entirely on the counterparty's cooperation.
The principle here is simple. One-way actions deserve a different class of control than reversible ones. If your vendor's answer treats them the same, you have a reversibility blind spot.
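One way to make that distinction enforceable rather than aspirational is to classify every action the agent can take and fail closed on anything unclassified. A minimal sketch, with hypothetical action names:

```python
from enum import Enum

class Reversibility(Enum):
    REVERSIBLE = "reversible"   # e.g. an internal record update
    ONE_WAY = "one_way"         # e.g. a public post or a bulk email

# Hypothetical classification table; the real one belongs in a reviewed
# policy document, not in an engineer's head.
ACTION_CLASS = {
    "crm.update_record":  Reversibility.REVERSIBLE,
    "email.send_bulk":    Reversibility.ONE_WAY,
    "social.publish_post": Reversibility.ONE_WAY,
}

def requires_human_gate(action: str) -> bool:
    # Unknown actions are treated as one-way doors: fail closed.
    return ACTION_CLASS.get(action, Reversibility.ONE_WAY) is Reversibility.ONE_WAY
```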
4. Authentication and non-repudiation
After the fact, can you prove which agent took which action on behalf of which human? This is the identity provenance question, and it is the one the CSA has correctly identified as the control-plane challenge of 2026. If your auditor or regulator subpoenas an agent action log 18 months from now, can you produce a chain of custody?
The failure mode here is subtle. Most enterprise logging assumes a human user or a known service principal. Agents often run under service accounts that were never designed to carry user-level non-repudiation. That is fixable, but only if someone insists on fixing it before the agent goes live.
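Fixing it starts with insisting that every agent action carries the delegation chain as data. Here is a sketch of what such a record can contain; the field names are illustrative, not a standard:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Illustrative chain-of-custody record. The requirement it encodes:
# every action names the delegating human, not just a service account.
@dataclass(frozen=True)
class AgentActionRecord:
    action_id: str        # unique per action
    agent_id: str         # the specific agent instance
    on_behalf_of: str     # the human who delegated authority
    delegation_id: str    # ties back to the original grant of authority
    tool: str             # which tool or API was invoked
    inputs_digest: str    # hash of the inputs, not the raw data
    occurred_at: datetime

record = AgentActionRecord(
    action_id="act-000123",
    agent_id="support-agent-7",
    on_behalf_of="j.doe@example.com",
    delegation_id="grant-2026-04-01-0042",
    tool="payments.refund",
    inputs_digest="sha256:placeholder",
    occurred_at=datetime.now(timezone.utc),
)
```

If every row in the action log looks like this, the 18-month subpoena question has an answer.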
5. Human in the loop thresholds
Which exact actions require explicit human approval? What is the approval interface? How does the system prevent approval fatigue, where humans reflexively click yes on every prompt?
Anthropic's Plan Mode in Claude Code is one credible answer: instead of approving every individual tool call, a human reviews and modifies an entire execution plan upfront. That compresses the approval decision into a single thoughtful review instead of 40 rubber stamps. The pattern matters more than the specific product. If your vendor's answer is "the operator will approve each step," ask how they have tested that it still works on day 90 when the operator has approved 10,000 steps.
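Generically, and not as a description of any vendor's actual implementation, plan-level approval can be enforced structurally: the human approves one immutable plan, and the executor refuses any step outside it. A minimal sketch:

```python
from dataclasses import dataclass

# A generic sketch of plan-level approval. The human reviews one frozen
# plan up front; execution rejects any step that was not in it.
@dataclass(frozen=True)
class ApprovedPlan:
    plan_id: str
    steps: tuple[str, ...]   # e.g. ("crm.lookup", "payments.refund")

def run_step(plan: ApprovedPlan, step: str, dispatch) -> None:
    if step not in plan.steps:
        raise PermissionError(f"step {step!r} was not in the approved plan")
    dispatch(step)
```

The design choice is that drift from the approved plan is a hard error, not a fresh approval prompt, which is what keeps the day-90 operator from rubber-stamping it.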
6. Auditability
Is there a tamper-evident log of every action, tool call, input, and output? Can a regulator subpoena it? The EU AI Act's human-oversight and audit-trail requirements sit exactly here, and from 2 August 2026 they come with real enforcement teeth.
Tamper-evident is the important word. Logs your engineering team can edit after the fact are not audit logs. They are notes. The distinction matters when you are the one testifying.
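Tamper evidence does not require exotic infrastructure. A hash chain, where each log entry commits to the one before it, is enough to make silent edits detectable. A minimal sketch; real deployments would add signing and external anchoring:

```python
import hashlib
import json

def append_entry(log: list[dict], event: dict) -> None:
    # Each entry's hash covers the previous entry's hash, so editing any
    # past entry breaks every hash after it.
    prev_hash = log[-1]["entry_hash"] if log else "0" * 64
    body = json.dumps(event, sort_keys=True)
    entry_hash = hashlib.sha256((prev_hash + body).encode()).hexdigest()
    log.append({"event": event, "prev_hash": prev_hash, "entry_hash": entry_hash})

def verify_chain(log: list[dict]) -> bool:
    prev = "0" * 64
    for entry in log:
        body = json.dumps(entry["event"], sort_keys=True)
        expected = hashlib.sha256((prev + body).encode()).hexdigest()
        if entry["prev_hash"] != prev or entry["entry_hash"] != expected:
            return False
        prev = entry["entry_hash"]
    return True
```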
7. Grounding and output provenance
Can the agent cite the source of any claim it makes to a customer or internal user? Can you trace a given response back to the specific document, record, or tool call that produced it?
Air Canada's fabricated refund policy is the canonical failure of this dimension. The chatbot generated a policy that did not exist, the customer relied on it, the British Columbia Civil Resolution Tribunal held Air Canada liable. The airline's defense, that the chatbot was a separate legal entity responsible for its own actions, was rejected. The grounding question is no longer academic. It is a liability question.
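Structurally, one fix is to make an unsourced claim impossible to construct, rather than hoping the model cites things. A minimal sketch, with hypothetical field names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Source:
    document_id: str    # the specific record or document relied on
    retrieved_at: str   # when it was fetched, for later reconstruction

@dataclass(frozen=True)
class GroundedClaim:
    text: str
    sources: tuple[Source, ...]

    def __post_init__(self):
        # An unsourced claim fails here, before it reaches a customer,
        # instead of surfacing later as an invented policy.
        if not self.sources:
            raise ValueError("refusing to construct an unsourced claim")
```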
8. Kill switch and containment
Can the agent be stopped immediately, and is the kill switch actually tested? This sounds obvious. Judging by public postmortems, it is also the single most commonly skipped control.
The test is literal. Can someone in operations, at 3am, with nothing but a terminal and their phone, stop the agent in under 60 seconds? If the answer involves filing a ticket, or paging an engineer who knows the specific internal system, you do not have a kill switch. You have a hope.
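Architecturally, the pattern that passes the 3am test is an external halt flag the agent polls between actions, flippable without any knowledge of the agent's internals. A minimal sketch; the path and loop shape are hypothetical:

```python
import os
import time

# Hypothetical halt flag: anything that can create this file stops the
# agent, which is exactly what makes it usable at 3am from a phone.
KILL_SWITCH_PATH = "/var/run/agent/halt"

def halted() -> bool:
    return os.path.exists(KILL_SWITCH_PATH)

def agent_loop(next_action, perform):
    while not halted():
        action = next_action()
        if action is None:
            time.sleep(1)
            continue
        if halted():   # re-check right before the side effect
            break
        perform(action)
```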
9. Supply chain integrity
Are the models, tools, Model Context Protocol servers, and training data the agent depends on verified? Is there a bill of materials for the agent's dependencies the way there is for a software package?
The LiteLLM supply chain incident is the reason this dimension is in the list. Agents pull from a growing stack of model providers, routing proxies, tool libraries, and MCP servers. Each of those is a link in a supply chain that did not exist five years ago. MITRE ATLAS added "Publish Poisoned AI Agent Tool" as a specific technique in its February 2026 update because this class of attack is now common enough to name.
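The software analogue suggests what an agent bill of materials could look like: every model, tool, and MCP server pinned to a digest, with anything unpinned refusing to load. A sketch with placeholder names and digests:

```python
import hashlib

# An agent BOM, sketched. Names and digests are placeholders; the
# property that matters is that unpinned dependencies refuse to load.
AGENT_BOM: dict[str, str] = {
    "tool:refund_tool@1.4.2": "<sha256 of the audited package>",
    "mcp:crm-server@2.0.1":   "<sha256 of the audited server build>",
    "model:support-agent-v7": "<sha256 of the pinned model artifact>",
}

def verify_dependency(name: str, artifact_bytes: bytes) -> None:
    pinned = AGENT_BOM.get(name)
    if pinned is None:
        raise RuntimeError(f"{name} is not in the agent BOM; refusing to load")
    if hashlib.sha256(artifact_bytes).hexdigest() != pinned:
        raise RuntimeError(f"{name} digest mismatch; possible tampering")
```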
10. Adversarial testing
Has the agent been red-teamed against prompt injection, goal hijack, context poisoning, and tool abuse? Is there a written report? Is the red-team exercise repeated after every major update?
The CSA Agentic AI Red Teaming Guide and the MITRE ATLAS technique library give security teams a playbook for this. The buyer-side test is simpler: ask to see the report. If there is no report, there has been no serious red teaming. If the report is from the same team that built the agent, it is better than nothing but it is not independent.
The questions an executive can ask a vendor this week
Ten dimensions are a lot to hold in your head. If you are walking into a vendor meeting tomorrow morning, here is the shorter version. Eight questions, in plain language, that you do not need to be an engineer to ask.
- What is the largest financial action this agent can take without a human in the loop, in dollars?
- Show me the document that lists every system, API, and credential this agent can access.
- Which actions are one-way doors, and how are they specifically gated?
- If a regulator asks in two years who authorized a specific agent action, can you produce a chain of custody?
- Walk me through how a human operator stops this agent in under 60 seconds. I want to see it happen.
- Show me the most recent adversarial test report for this agent. Who ran it and when?
- When the agent makes a claim to a customer, can you show me the source the claim was grounded in?
- Which EU AI Act risk category do you believe this system falls into, and why?
None of those questions require technical expertise to ask. All of them are answerable in 15 minutes by a vendor who has actually done the work. If the answers are vague, that is your signal. The signal is not always that the vendor is bad. Sometimes the signal is that the vendor has not thought about the right dimensions, which is almost the same thing from a buyer's perspective.
What to watch for over the next 12 to 24 months
The landscape is going to change fast, and a few specific markers are worth watching.
The first is NIST's AI Agent Interoperability Profile, expected in Q4 2026. Of the real candidates to start closing the buyer gap, this is the most credible one. A NIST profile will not be a certification, but it will give the rest of the ecosystem a shared reference.
The second is the EU AI Act's enforcement start on 2 August 2026. Once the first enforcement actions land, the market will have real precedents for what "sufficient human oversight" and "adequate audit trail" look like in practice. Regulators tend to teach the market through enforcement, not through guidance documents.
The third is whether an independent attestation body emerges. There is no SOC 2 for agents because there is no equivalent of the AICPA sitting behind an agent-specific attestation standard yet. That role could be filled by an existing standards body extending its scope, or by a new entity formed for the purpose. The Agentic AI Foundation formed under the Linux Foundation in December 2025 is a plausible home for part of this work, but it was founded by model providers, not by auditors. Independent attestation almost certainly has to come from somewhere else.
The fourth is whether the insurance market catches up. When cyber insurance carriers start asking specific agent-related questions on underwriting forms, the market will have a de facto standard, because carriers will force vendors to answer the same set of questions the same way. That pressure tends to move faster than voluntary standards bodies. The same pattern played out in cyber insurance in 2018 through 2021: long before any formal certification was universal, the underwriting questionnaire became the effective standard, because no carrier was willing to write a policy without an answer. I would watch for the first carriers to add specific agent questions to their 2026 and 2027 renewals. When that happens, the market moves.
The fifth marker is quieter but important. Watch what large enterprise buyers start putting into their master services agreements and data processing addenda. Procurement contracts tend to codify buyer demands faster than standards bodies codify best practices. If Fortune 500 procurement teams start requiring specific agentic-risk representations and warranties in vendor contracts, that contractual language becomes the de facto standard whether or not a formal body ever publishes one. The enforcement mechanism is not a regulator. It is the breach-of-contract lawsuit that follows the next incident.
A note on what this is, and is not
I called this commentary in the opening, and I want to keep that promise in the closing. Everything above is one engineer's read of what is published, what is not, and what the gap means for the people making procurement decisions.
I am not proposing the Agentic Risk Standard as a product, a certification, or a framework I am trying to own. I am using the name to talk about a shape-of-hole in the market. The actual closing of the gap is going to be done by some combination of NIST, OWASP, CSA, the European Commission, a future attestation body, and the insurance industry. None of those are me, and none of them are Code Atelier.
What I have written here is a mental model. I have shipped agentic systems in production, including a RAG pipeline and an agentic workflow at a previous company, so I know the difference between agent problems that look scary on a slide and agent problems that actually bite you in operations. The ten dimensions above are the ones I would want to answer myself before I put an autonomous system into a customer-facing path with real money behind it.
If you are walking into that kind of decision this quarter, I hope the list is useful. And if you want to compare notes on any of it, I am easy to reach. The contact form at the top of the site goes directly to my inbox.