Making AI Accountable: How Explainability and Observability Meet Regulation and Trust

AI adoption is accelerating, but trust and accountability are essential. Learn how explainability, observability, and behavioural oversight help security teams meet regulations, manage risk, and scale AI Agents with confidence.
In conversations about AI, it is often described as a “black box.” That perception reflects the complexity of the technology, but it does not capture its potential. As adoption accelerates across industries, there is an opportunity for security leaders to transform AI from something opaque into something explainable, observable, and trusted.
Performance on its own is no longer enough. Boards, regulators, and customers will increasingly demand systems that can be explained, audited, and overseen with the same rigor as any other critical business system. For security and IT leaders, this is more than just a technical challenge. It is a chance to shape enterprise AI adoption in a way that balances risk with opportunity, and compliance with innovation.
Different disciplines of AI, from traditional machine learning to transformers, large language models, and now agents, require different approaches to explainability and observability. Regulations like the EU AI Act, ISO 42001, and sector-specific frameworks are reinforcing this shift, moving explainability and observability from technical aspirations to business obligations.
For security leaders, this goes beyond a compliance challenge. It is a chance to step into a strategic role: enabling innovation while ensuring the enterprise can adopt AI safely and at scale, leading adoption with clarity, aligning risk with opportunity, and making AI something the business can grow with confidence.
The Landscape of Explainability and Observability
When people talk about “explainability” in AI, they use the same word but often mean very different things. This is because the methods and the level of clarity they provide depend heavily on the type of system being used.
You can think of the various kinds of explainability like meals with and without recipes. Explaining a decision from a simple rules-based system is like showing someone a recipe. You can point to each ingredient and the steps that led to the final dish. With modern AI, such as AI Agents, you are often staring at a meal without the recipe, trying to guess how it was made. That is where different types of explainability and observability come into play.
Inherently explainable models
Some models are transparent by design. Decision trees, for example, let you trace exactly which questions the model asked (“Is income above this threshold? Is credit history longer than 5 years?”) and how each answer shaped the outcome. Tree interpreters show this path clearly, which is why they have long been used in regulated areas like finance.
The upside is clarity: you can hand an auditor the “recipe.” The downside is that these models do not perform well with the messy, high-dimensional data we see in language, images, or unstructured text.
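To make this concrete, the snippet below is a minimal sketch, assuming scikit-learn and a toy, entirely hypothetical credit dataset, of how a tree interpreter exposes the full "recipe" and the exact path a single decision took. The feature names and thresholds are illustrative only.

```python
# Minimal sketch: tracing the "recipe" of a decision tree with scikit-learn.
# Feature names, data, and thresholds are illustrative, not from a real credit model.
from sklearn.tree import DecisionTreeClassifier, export_text
import numpy as np

# Toy training data: [income_in_thousands, credit_history_years]
X = np.array([[30, 2], [85, 7], [45, 1], [120, 10], [60, 4], [25, 6]])
y = np.array([0, 1, 0, 1, 1, 0])  # 0 = declined, 1 = approved

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The full set of questions the model asks, in order.
print(export_text(tree, feature_names=["income_k", "credit_history_years"]))

# The exact path one applicant followed through those questions.
applicant = np.array([[55, 6]])
node_path = tree.decision_path(applicant)
print("Nodes visited:", node_path.indices.tolist())
print("Prediction:", tree.predict(applicant)[0])
```

This is the "recipe" you can hand an auditor: every split the model used, and the specific route a given case took through them.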
Post-hoc explainability
For more complex systems like deep neural networks or transformers, the “recipe” is not visible. The logic is spread across millions or billions of parameters, like a dish cooked in a kitchen with thousands of invisible chefs all stirring at once. To make sense of the result, we need post-hoc methods that explain decisions after the fact.
These include:
- Feature attribution tools such as SHAP or LIME, which highlight which inputs had the most influence, like showing which ingredients dominated the dish.
- Attention or saliency maps, which visualise what the model “looked at” most closely, like showing which ingredients the chef paid the most attention to while cooking.
- Counterfactual explanations, which answer “what if” questions, like asking what would have happened if the chef had added a little more of one ingredient or swapped another, and seeing how the dish would have turned out differently.
Post-hoc methods provide valuable insights, but they’re more like a lens than a blueprint. In high-stakes domains, relying on approximations alone can be risky.
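As an illustration, the sketch below, assuming the shap package and a toy scikit-learn model (both hypothetical stand-ins for a real system), shows the typical shape of a feature-attribution workflow: fit a model, then ask a post-hoc explainer which inputs most influenced a single prediction.

```python
# Minimal sketch of post-hoc feature attribution with SHAP.
# The data and model are illustrative only; a real deployment would explain a production model.
import shap
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy tabular data standing in for a real, higher-dimensional feature set.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Explain a single prediction after the fact: which inputs pushed it up or down?
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:1])

# shap_values approximates each feature's contribution to this one output:
# a lens on the decision, not the model's full internal logic.
print(shap_values)
```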
Observability
Explainability is retrospective. It helps you understand why a decision was made. Observability is more like real-time monitoring. It tells you what the system is doing right now, how it is behaving over time, and where risks may be emerging.
In traditional machine learning, observability focuses on data drift (Are inputs changing?), bias (Are outcomes skewed?), or anomalies (Are predictions failing unexpectedly?). In LLMs, it extends to monitoring for things like prompt injection, hallucinations, or output variability.
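As a small illustration of what the traditional-ML side of this looks like in practice, here is a hypothetical drift check using scipy's two-sample Kolmogorov-Smirnov test. The feature, data, and 0.05 threshold are assumptions, not a recommended configuration.

```python
# Minimal sketch of an input-drift check for a traditional ML pipeline.
# The feature data and the 0.05 p-value threshold are illustrative assumptions.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
training_income = rng.normal(loc=60, scale=15, size=5000)    # what the model was trained on
production_income = rng.normal(loc=72, scale=15, size=1000)  # what it is seeing today

statistic, p_value = ks_2samp(training_income, production_income)

if p_value < 0.05:
    # In a real pipeline this would raise an alert, not just print.
    print(f"Possible data drift detected (KS statistic={statistic:.3f}, p={p_value:.4f})")
else:
    print("No significant drift detected")
```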
With agents, the bar is even higher. Observability must capture behaviour: what actions did the agent take, in what order, and with what outcomes? It should also provide the context needed to pair with explainability methods, so leaders can understand not only what the agent did, but what influenced its choices. This combination helps track a chain of decisions across time, tools, and systems.
A Technical Deep Dive into Explainability
For leaders who want to look under the hood, here are the most common methods of explainability in use today. Each has strengths, but also limitations that matter when decisions carry regulatory, security, or compliance weight.
- Tree interpreters: Trace the exact path of a decision, step by step. Regulators value this clarity, which is why tree-based models remain common in finance and risk. Limitation: they do not scale to deep learning or unstructured data.
- Feature attribution (e.g., SHAP, LIME): Highlight which inputs had the most influence on an outcome. Intuitive and widely used, but only approximate the model’s logic rather than fully revealing it.
- Attention or saliency maps: Visualise which parts of the input the model focused on most, useful for images or text. Limitation: can be unstable and hard to interpret consistently.
- Counterfactual explanations: Show what would need to change for an outcome to flip (“If income were $5,000 higher, the loan would be approved”). Powerful for human understanding, but computationally expensive and not always scalable.
No single approach offers a perfect solution. The right method depends on context: structured data, unstructured text, high-stakes compliance, or operational scale. The common thread is that explainability must match the system and the risk profile and be paired with observability when autonomy and behaviour come into play.
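To ground the counterfactual idea from the list above, here is a minimal, hypothetical sketch: a brute-force search over one feature (income) to find the smallest change that flips a toy model's decision. Real counterfactual tooling is considerably more sophisticated; this only illustrates the question being asked.

```python
# Minimal sketch of a one-feature counterfactual search on a toy model.
# The model, data, and $500 step size are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy applicants: [income_in_thousands, credit_history_years]
X = np.array([[30, 2], [85, 7], [45, 1], [120, 10], [60, 4], [25, 6], [95, 3], [40, 8]])
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])  # 0 = declined, 1 = approved
model = LogisticRegression().fit(X, y)

applicant = np.array([40.0, 5.0])  # a currently declined applicant in this toy setup
print("Current decision:", model.predict([applicant])[0])

# Search: how much higher would income need to be for the decision to flip?
for extra in np.arange(0.5, 50.5, 0.5):  # steps of $500, up to $50k
    candidate = applicant.copy()
    candidate[0] += extra
    if model.predict([candidate])[0] == 1:
        print(f"Counterfactual: approval if income were ${extra * 1000:,.0f} higher")
        break
else:
    print("No approval found within the searched range")
```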
Regulatory Drivers of Explainability
Regulation is moving quickly from broad principles to concrete expectations. The EU AI Act sets the tone: high-risk systems must be transparent, traceable, and under meaningful human oversight. It does not prescribe a single method of explainability, but it makes clear that opaque, uninterpretable systems will not be acceptable.
Alongside this, ISO/IEC 42001, the new international standard for AI management systems, raises the bar further. Think of it as ISO 27001’s cousin for AI. Instead of focusing on information security, it creates a management framework for AI, requiring organisations to:
- Establish governance structures and accountability for AI.
- Define risk management processes tailored to AI’s unique risks, including explainability and transparency.
- Document system behaviour, limitations, and intended use cases so stakeholders know what to expect.
- Provide monitoring and auditability mechanisms to ensure compliance and continuous improvement.
This is important because ISO 42001 shifts explainability from a technical nice-to-have to a management obligation. It is not enough for engineers to know how a model works. Boards, risk teams, and regulators need evidence that explainability and observability are systematically addressed. The same discipline is also recommended for systems adopted directly by the business, such as SaaS-based agents or low- or no-code agent platforms like Microsoft Copilot or AWS Bedrock.
Beyond general AI regulation, sector-specific rules already demand explainability and observability in practice:
- Financial services (DORA in the EU): requires resilience, traceability, and auditability of critical ICT systems. As agents begin to automate financial processes, regulators will expect banks and insurers to evidence not only what decisions were made, but how systems behaved under stress.
- Healthcare and pharmaceuticals (e.g., FDA in the US, EMA in the EU): demand explainability in clinical decision support and drug development. AI outputs affecting safety or efficacy must be documented, interpretable, and validated. For agents in this domain, behavioural observability will be essential for demonstrating compliance and patient safety.
- Critical infrastructure (energy, transport, telecoms): increasingly governed by resilience and safety regulations (e.g., NIS2 in the EU), where explainability must extend to how autonomous systems interact with physical and digital infrastructure.
Matching Explainability to Application
Different AI disciplines demand different approaches to explainability and observability. The right mix depends on both the system and the risk profile.
| Application | Explainability Needs | Observability Needs |
| --- | --- | --- |
| Traditional Machine Learning (structured data) | Inherently interpretable models (decision trees, regression) or feature attribution methods (e.g., SHAP, LIME). Required for use cases like risk scoring, fraud detection, or predictive analytics where regulators expect clear logic. | Monitoring for model drift, data quality issues, bias, and anomalies that could degrade performance or fairness. |
| Transformers & Deep Learning | Post-hoc explainability, such as attention or saliency maps, to reveal what parts of the data influenced decisions. | Observability for performance stability, bias over time, and anomaly detection across large datasets. |
| Large Language Models in productivity apps | Transparency of data sources, fine-tuning, and limitations. Capturing LLM reasoning as part of the prompt or tool invocations. Guardrails for content and style. | Logging of prompts and responses; monitoring for hallucinations, prompt injection, or toxicity. |
| Agents in enterprise ops | Counterfactuals and decision-path explanations to show why actions were taken. | Behavioural observability: capture decisions, tool use, context changes, and outcomes across time. |
| Cross-system orchestration | Explainability at the system level: why agents handed off tasks and what the compounded decisions led to. | System-wide observability: monitor interactions between agents, dependencies, and risk propagation. |
A Note on Guardrails
The term guardrails comes up often in AI discussions, but it isn’t always used with precision. For most systems today, guardrails mean predefined rules that limit what a model or agent can say or do.
For language models, this might mean:
- Blocking unsafe or toxic outputs
- Restricting answers on sensitive topics
- Enforcing formatting or style rules
For agents, guardrails extend further, defining what the agent is permitted to do within a system, as sketched in the example after this list:
- Allowing an HR agent to draft an onboarding document, but not access payroll data
- Enabling an IT agent to triage tickets, but not execute high-risk configuration changes
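To make the idea concrete, here is a minimal, hypothetical sketch of that kind of permission guardrail: a simple allowlist that defines which tools an agent may call. The agent names, tool names, and policy class are illustrative and do not refer to any particular platform.

```python
# Minimal sketch of an agent tool-permission guardrail.
# Agent names, tool names, and the policy itself are hypothetical.
from dataclasses import dataclass, field

@dataclass
class GuardrailPolicy:
    allowed_tools: set[str] = field(default_factory=set)

    def check(self, agent: str, tool: str) -> bool:
        permitted = tool in self.allowed_tools
        if not permitted:
            # In practice this would be written to an audit trail, not just printed.
            print(f"BLOCKED: {agent} attempted to use '{tool}' outside its guardrails")
        return permitted

hr_agent_policy = GuardrailPolicy(allowed_tools={"draft_onboarding_doc", "read_org_chart"})

hr_agent_policy.check("hr-agent", "draft_onboarding_doc")  # permitted
hr_agent_policy.check("hr-agent", "read_payroll_data")     # blocked
```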
Guardrails are an important first layer of control. They help enterprises reduce obvious risks and reassure stakeholders that systems are designed not to step outside of defined limits. But for security teams, the real question is: what happens inside the guardrails?
That’s where explainability and observability come in. Guardrails are designed to constrain actions, but they cannot explain why a system made a decision, or show how an agent behaved within its allowed scope. Without that visibility, leaders are left with blind spots. And if an agent bypasses its guardrails, explainability and observability become even more critical, ensuring risks are visible and governable rather than hidden.
This is why guardrails must be complemented with observability and governance. Guardrails set the initial boundaries; observability shows whether the system stayed within them, why decisions were made, and what outcomes were produced. Together, they create both surface-level safety and deep operational trust, and the kind of layered assurance that boards and regulators increasingly expect.
For security leaders, the takeaway is clear: guardrails are useful, but not sufficient. To truly manage AI responsibly, enterprises need to pair them with explainability and behavioural observability so that autonomy, accountability, and innovation can coexist without compromise.
Why Behavioural Observability Is the Next Frontier
Explainability and observability have long helped enterprises understand and monitor AI systems. But with agents, a new layer is required.
- Explainability is retrospective. It answers why a model produced a specific output or decision.
- Observability is real-time and forward-looking. It answers what the system is doing right now, how it is behaving over time, and where risks may be emerging.
Agents demand more. Because they act with autonomy, selecting tools, chaining actions, and adapting to changing contexts, enterprises need visibility both into outputs and into behaviours over time. Behavioural observability delivers this.
Behavioural observability captures the what, when, and how of agent actions: what steps were taken, which tools were used, how the agent responded as context evolved, and what outcomes were produced. It creates a traceable log of behaviour that can be audited, governed, and aligned with business expectations.
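As an illustration of what such a traceable log might contain, the sketch below defines a hypothetical agent-action event record. The field names and example values are assumptions for illustration, not a prescribed schema.

```python
# Minimal sketch of a behavioural observability record for one agent action.
# Field names and values are illustrative assumptions, not a standard schema.
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone

@dataclass
class AgentActionEvent:
    agent_id: str
    action: str          # what the agent did
    tool: str            # which tool it used
    inputs_summary: str  # the context it acted on
    outcome: str         # what resulted
    timestamp: str

def record_action(agent_id: str, action: str, tool: str, inputs_summary: str, outcome: str) -> str:
    event = AgentActionEvent(
        agent_id=agent_id,
        action=action,
        tool=tool,
        inputs_summary=inputs_summary,
        outcome=outcome,
        timestamp=datetime.now(timezone.utc).isoformat(),
    )
    # In practice this would go to an append-only audit store, not stdout.
    line = json.dumps(asdict(event))
    print(line)
    return line

record_action(
    agent_id="it-triage-agent",
    action="classify_ticket",
    tool="ticketing_api",
    inputs_summary="Ticket: user cannot access VPN",
    outcome="routed_to_network_team",
)
```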
This doesn’t replace explainability. It complements and extends it by providing the operational lens enterprises need to manage agents at scale. Without it, leaders are left with blind spots. With it, they gain the ability to monitor, intervene, and ensure agents remain accountable, enabling innovation with the clarity and confidence that security and compliance demand.
Final Thoughts
AI governance isn’t about finding a single “best” method of explainability or observability. It’s about choosing the right approach for the right technology and the right context. As businesses move from models to agents, behavioural observability will become a defining capability: a safeguard that ensures autonomy doesn’t come at the expense of compliance or trust.
There is a clear and present opportunity to move beyond black-box uncertainty into a future of clarity, control, and confidence. When explainability and observability are harnessed together, they go beyond mitigating risk to actively power safe, scalable innovation.