When the audit trail shows up late

By Nathan Donaldson

Figure: The audit trail threads up through every layer to a single pane of oversight. Layer 5 is what catches the light.

Imagine a debt notice arrives in the post. It says you owe Centrelink several thousand dollars. The number on the page comes from a calculation a computer did. Nobody at the agency picked your case off a desk and looked at it. Nobody can tell you which rule you broke, only that the rule found you. If you want to argue with the number, the burden is on you to prove it wrong.

That is roughly what happened to a lot of Australians between 2015 and 2019. A system called the Online Compliance Intervention, which most people now know as Robodebt, was raising debts straight from averaged tax-office data. The Federal Court found in 2019 that raising a debt that way was unlawful. The class action settled for one-and-a-bit billion dollars. The Royal Commission report landed in 2023 and called the scheme “a crude and cruel mechanism, neither fair nor legal.” Two mothers gave evidence about their sons’ suicides. The Commission noted a third tragic death linked to a 2017 discrepancy letter.

The thing I want to point at is not the automation. It is what was missing around the automation.

There was no published rule that a citizen could read to see how their debt was calculated. There was no individual explanation in the letter that arrived. There was no real human gate before recovery action started. There was no audit record an outsider could inspect to ask whether the rule the computer ran actually matched the law. The technical capacity to do the match arrived years before the oversight scaffolding that would have made the resulting decisions legitimate.

That gap is what I want to talk about today.

At Boost, we have been thinking about Agentic Government as a five-layer model

At Boost, we have been thinking about Agentic Government as a five-layer model. Layer 1 is the foundation, the identity and data exchange and compute that everything else sits on. Layers 2, 3 and 4 are where agents do most of the visible work: moving cases between agencies, talking to citizens, doing the work itself. Layer 5 is the one that hardly anyone leads with, and the one I keep coming back to.

Layer 5 is oversight and governance. Audit logs. Agent registers. The rule written in a way a computer can run, which is also the way an auditor or a tribunal can read it back. A human-in-the-loop where the consequences earn it. A real right of appeal. An explanation a normal person can understand.

The model is architectural, not a maturity ladder. A serious Agentic Government build needs all five. The reason layer 5 is the binding constraint, in my read, is not that automation is dangerous. It is that the failure cases in this field, the ones with names and royal commissions attached, are layer-5 failures. The automation was the easy part. The audit trail showed up late.

What the failure cases look like, and what they share

Robodebt is the cleanest case to point at because the rule it was running was a set rule, not a learning model. A simple averaging calculation, the kind any junior developer could write. The catastrophe was not the rule. The catastrophe was that the rule had no statutory basis, no published explanation, no inspectable trail, no real contest path, and an Ombudsman’s office that was actively obstructed when it went looking.
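To make the averaging problem concrete, here is a minimal sketch in Python. The payment amount, income-free threshold and taper rate are made-up round numbers, not Centrelink's actual rates, and the entitlement formula is deliberately simplified. The point is only to show how spreading annual income evenly across fortnights can manufacture a debt for someone who was paid correctly.

```python
# Hypothetical, simplified benefit rules: NOT Centrelink's actual rates or formula.
FORTNIGHTS = 26
MAX_PAYMENT = 500.0   # assumed maximum benefit per fortnight
INCOME_FREE = 300.0   # assumed income allowed before the payment starts to reduce
TAPER = 0.5           # assumed reduction per dollar earned above INCOME_FREE


def entitlement(fortnight_income: float) -> float:
    """Benefit payable for one fortnight under the simplified rules."""
    reduction = max(0.0, fortnight_income - INCOME_FREE) * TAPER
    return max(0.0, MAX_PAYMENT - reduction)


def total_entitlement(incomes: list[float]) -> float:
    """Sum the fortnightly entitlements over a year of reported incomes."""
    return sum(entitlement(income) for income in incomes)


# Someone who worked for half the year, then had no income and claimed for the other half.
actual = [2000.0] * 13 + [0.0] * 13
# Averaging spreads the annual total evenly across every fortnight.
averaged = [sum(actual) / FORTNIGHTS] * FORTNIGHTS

correct = total_entitlement(actual)        # what the person was really entitled to
by_average = total_entitlement(averaged)   # what the averaged view says
phantom_debt = correct - by_average        # the "debt" that averaging creates
```

Under these assumed numbers the person was correctly paid 6,500 dollars across the year, but the averaged view says they were entitled to 3,900, so the calculation raises a 2,600 dollar debt that does not exist.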

The Netherlands has a related story from the same decade. Their childcare-benefits affair, the toeslagenaffaire, ran from 2005 to 2019. The tax administration used a risk-scoring model to flag families for fraud investigation. Nationality was one of the inputs. The model’s design was not published. There was no individual explanation for the families flagged. About 26,000 families were wrongly accused. The third Rutte cabinet resigned in full over it in January 2021.

Different country, overlapping years, different kind of model. A learning model in the Dutch case, a set-rules calculator in the Australian one. Same shape of failure. No inspectable rule-trail. No individual explanation. No proportionate contest path. No real human gate before recovery action started. The kind of automation matters for the design of layer 5, but not for whether you need it.

Figure: Same decision, two designs: one sealed shut, one with an inspectable trail you can read back.

What the cases that worked look like

The other half of the story is the one that does not get told as often. Canada has had a Directive on Automated Decision-Making since 2019. Before any federal department deploys an automated decision system, it has to fill out an Algorithmic Impact Assessment. The form scores the system into four impact tiers. The higher the tier, the more the department has to do: published plain-language description of the decision, public posting of the assessment, mandatory notice to the affected person, access to human review, peer review of the system design, head-of-agency sign-off at the top tier. Seven years in, I haven’t found a Canadian-federal Robodebt-class case on the public record.

The UK has come at the same problem from the other direction. Its Algorithmic Transparency Recording Standard (ATRS) sits on a public hub on gov.uk, and by 2025 publication had been made mandatory for central government departments and specified arm’s-length bodies. Each record names the tool, the owner, the data it uses, how humans oversee it, and how to contest it. More than 125 records have been published so far. The OECD points to it as world-leading.
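A register entry of that kind can be sketched as a small data structure. The field names below are my own shorthand for the five things listed above, not the actual ATRS schema, and the publication check is an assumption about how a department might enforce completeness.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TransparencyRecord:
    # Hypothetical field names, loosely mirroring the items a record is described as naming.
    tool_name: str
    owner: str
    data_used: list[str]
    human_oversight: str
    how_to_contest: str


def missing_fields(record: TransparencyRecord) -> list[str]:
    """List the fields left empty, so an incomplete record can be held back from publication."""
    return [name for name, value in asdict(record).items() if not value]


record = TransparencyRecord(
    tool_name="Example triage tool",
    owner="Example Department",
    data_used=["application form"],
    human_oversight="",  # left blank on purpose to show the check firing
    how_to_contest="Email review@example.org",
)
```

Here `missing_fields(record)` reports the blank oversight field, which is the kind of gap a register is meant to make visible before a tool goes live.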

Estonia is the substrate proof. X-Road has been running since 2001. Every exchange of personal data between Estonian agencies is authenticated, encrypted, timestamped, signed and logged. Citizens can see in the e-Estonia portal which officials looked at their records and why. Unauthorised access is a criminal offence. Twenty-five years on, no comparable accountability blow-up has surfaced at the data-exchange layer that I can find. The lesson there is structural. Layer 5 is cheap when it is built into the foundation. It is expensive and brittle when it is bolted on after.
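To show why a log built into the foundation is cheap to check, here is a toy tamper-evident access log. It is not X-Road's implementation or protocol. It is a minimal sketch that assumes a shared HMAC key for signing and chains each entry to the one before it; a real system would use managed keys and asymmetric signatures. All names are hypothetical.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone

SECRET = b"demo-signing-key"  # assumption: stands in for properly managed signing keys


def _sign(body: dict) -> str:
    """Sign the canonical JSON form of an entry body."""
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()


def append_entry(log: list[dict], official: str, record_id: str, reason: str) -> dict:
    """Append a timestamped, signed entry that is chained to the previous entry."""
    body = {
        "official": official,
        "record_id": record_id,
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prev": log[-1]["signature"] if log else "",
    }
    entry = {**body, "signature": _sign(body)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Return True only if every signature is valid and the chain is unbroken."""
    prev = ""
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "signature"}
        if entry["prev"] != prev or not hmac.compare_digest(_sign(body), entry["signature"]):
            return False
        prev = entry["signature"]
    return True


log: list[dict] = []
append_entry(log, "officer-17", "citizen-042", "benefit review")
append_entry(log, "officer-23", "citizen-042", "appeal lookup")
```

Because each entry carries the signature of the one before it, quietly editing or deleting an old entry makes `verify(log)` return False. That property is easy to get when logging is designed in from the start and hard to retrofit.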

The common thread across these three is that the oversight design was specified before, or in lockstep with, the automation. Not after.

Rules-based is the easy case to get layer 5 right on

Here is the constructive bit. The rules-based cases that failed, Robodebt, Michigan’s unemployment-fraud system, the Arkansas Medicaid algorithm, all failed in the same way: the rule was kept beyond inspection. It did not have to be. A 24-line eligibility rule expressed in code is more inspectable than a discretionary human queue with no record of which criteria each officer applied. Sweden’s Trelleborg municipality automated simple-case social-assistance decisions in 2017, and turnaround on those decisions improved materially.

That is the constructive limb of the layer-5 argument. Rules-based automated decisions are the easy case to get right first, because deterministic rules produce an explicit, publishable rule-trail. The same automation that critics worry about can be more transparent than the human queue it replaces. But only if layer 5 turns that latent auditability into something a citizen, an auditor or a review tribunal can actually see and contest.
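Here is roughly what an explicit rule-trail looks like in practice. The rule is invented for illustration: the clause ids, thresholds and wording are assumptions, not any agency's real eligibility test. The sketch shows a deterministic rule returning the outcome together with a record of every clause it checked.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    eligible: bool
    trail: list[str] = field(default_factory=list)


def assess(age: int, weekly_income: float, resident: bool) -> Decision:
    """Evaluate a hypothetical eligibility rule and record every clause checked."""
    # Each clause: (published clause id, plain-language text, result for this applicant).
    clauses = [
        ("s1", "applicant is 18 or older", age >= 18),
        ("s2", "weekly income is under 400", weekly_income < 400),
        ("s3", "applicant is a resident", resident),
    ]
    trail = [f"{cid}: {text} -> {'met' if ok else 'NOT met'}" for cid, text, ok in clauses]
    return Decision(eligible=all(ok for _, _, ok in clauses), trail=trail)


decision = assess(age=30, weekly_income=450.0, resident=True)
```

The applicant in this example is declined, and the trail names the income clause as the one not met. The same lines can go into the decision letter, the audit log and the tribunal bundle, which is the sense in which the rule is readable back.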

The harder case is the learning-model side, the kind of system the Netherlands had, the kind Denmark’s Udbetaling Danmark agency is currently running. Amnesty International reported in November 2024 that the agency operates more than sixty machine-learning models to flag suspected benefit fraud. The agency says human caseworkers always review flagged cases. Amnesty’s point is that the human-review claim is not externally auditable. That argument, the one over whether the human-in-the-loop is real or asserted, is the layer-5 argument that the next decade of this work will keep landing on.

The EU has legislated for the learning-model side. The AI Act classifies systems that decide eligibility for essential public benefits as high risk. The original compliance date was 2 August 2026. The European Commission has proposed deferring the high-risk obligations to 2 December 2027; at the time of writing that deferral has political agreement but has not been formally adopted. Either way, the structure of what the Act requires for those systems, the documentation, the logging, the human oversight, the conformity assessment, the EU register, is recognisably a layer-5 design pattern at supranational scale.

The live local instance

New Zealand has just made the move every jurisdiction in this story has made or is making. The Social Security Modernisation Amendment Bill passed at the end of May 2026. It permits MSD to use automated systems to make social-security decisions. MSD has been clear that the automation is rules-based, not generative AI. The Minister has been clear that human judgement remains where it is needed.

What this country has now, in plain words, is statutory authorisation for rules-based automated decisions inside a major benefit-delivery system, sitting on top of a non-statutory transparency overlay called the Algorithm Charter. The Charter was published in 2020. It asks signatory agencies to document the algorithm in plain English, to partner with Māori, to focus attention on high and critical-risk decisions, to keep humans in the oversight loop, to publish a clear contact point for review.

The opportunity is structural. The Bill authorises the rules-based case, which is the case the global evidence says is the easy one to get layer 5 right on. The Charter describes the oversight maturity those decisions deserve. The toolkit other jurisdictions have published, the Canadian impact assessment, the UK transparency register, the Estonian audit-trail model, the EU documentation standard, is sufficient to design for the rules-based regime today. Not a new act. A design pattern. An inspectable rule, a human-failsafe gate proportionate to the consequence, a published register of which decisions are automated and under which rule, a contestable explanation, an audit log.
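One piece of that pattern, the human gate proportionate to the consequence, can be sketched as a routing rule. The consequence tiers, the examples attached to them and the routing outcomes are all assumptions for illustration; they are not drawn from the Bill, the Charter or any agency's policy.

```python
from enum import IntEnum


class Consequence(IntEnum):
    # Hypothetical tiers; a real scheme would define these in published policy.
    LOW = 1      # e.g. a reminder notice
    MEDIUM = 2   # e.g. a payment adjustment
    HIGH = 3     # e.g. raising a debt or cancelling a benefit


def route(consequence: Consequence, adverse: bool) -> str:
    """Decide whether an automated outcome may be issued or must wait for a person."""
    if adverse and consequence >= Consequence.HIGH:
        return "hold for human decision"
    if adverse and consequence == Consequence.MEDIUM:
        return "issue, then sample for human review"
    return "issue automatically"
```

The design choice is that favourable and low-stakes outcomes flow straight through, while an adverse high-consequence outcome cannot leave the system until a person has decided it.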

How I would test whether I’m wrong about this

The claim I am testing is this: layer 5 is the binding constraint on Agentic Government, and the rules-based ADM regime the Bill authorises is the easier case to get right first.

Here is what would change that, measured from 22 May 2026. A jurisdiction ships a material-scale agentic-government deployment, the kind that decides millions of citizen-affecting cases (welfare eligibility, immigration processing, or tax assessment), without parallel layer-5 maturity. No agent registry. No rules-as-code. No programmatic audit trail meeting or exceeding the X-Road-class baseline. And the deployment runs for eighteen months or more without a serious accountability incident, a public-trust crisis, or a judicial reversal. If that pattern shows up in two or more comparable jurisdictions, not one, then the binding-constraint reading is wrong and I will need to revise it.
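Written as a check, the test above looks something like this. The field names are my own, and collapsing agent registry, rules-as-code and audit trail into a single `has_layer5` flag is a simplification for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    jurisdiction: str
    material_scale: bool     # decides millions of citizen-affecting cases
    has_layer5: bool         # simplification: registry, rules-as-code and audit trail in one flag
    months_running: int
    serious_incident: bool   # accountability incident, public-trust crisis or judicial reversal


def thesis_falsified(deployments: list[Deployment]) -> bool:
    """True if two or more jurisdictions ran at scale without layer 5 and without incident."""
    counterexamples = {
        d.jurisdiction
        for d in deployments
        if d.material_scale
        and not d.has_layer5
        and d.months_running >= 18
        and not d.serious_incident
    }
    return len(counterexamples) >= 2
```

Counting distinct jurisdictions rather than deployments matters: two clean deployments in the same country would count once, which matches the "two or more comparable jurisdictions, not one" condition.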

My best guess, on the rules-based regime authorised in NZ, is that the next two years will tell us whether the implementation joins the constructive set or the failed-in-the-dark set. The variable that decides which set it joins is layer 5. The audit trail, the inspectable rule, the contest path, the human gate where the consequences earn it. Not whether to automate. How to make the automated decision legible to the person it affects and to the auditor who comes later.

If you’ve worked on any of this, in New Zealand, Australia, Canada, or anywhere else, send me a note.
