Layer 4: the part everyone watches

By Nathan Donaldson

Boost's 5-layer model is a map of Agentic Government. This post is about layer 4, the layer the whole debate fixates on, and the one the model treats as the wrong place to spend the attention.

Agentic AI is software that can chase goals on its own, not just answer one prompt at a time. The five layers, in brief:

  1. The foundation underneath. Identity, registers, data exchange.
  2. Internal coordination. Moving a case between agencies.
  3. The citizen interface. The front door.
  4. The work itself. The decisions and the doing.
  5. Oversight and governance. Audit, appeal, and the record.

The numbering is about parts, not steps. Today, layer 4.

What layer 4 is

Layer 4 is the work itself. An agent making the actual call. Working out eligibility. Drafting the document. Moving the case along. Modelling a policy choice.

This is the layer the argument lives at. Are the models good enough yet? Can an AI really make the decision? Picture government as a building: layer 4 is the busy working floors, full of desks, full of motion. Impressive to walk through. But a building does not get its certificate based on how busy the floors look. That call is made one floor up.

The point of this layer: the layer everyone watches is the wrong place to look. Model skill at layer 4 matters, but it is not the thing that holds agentic government back. The capability is arriving on its own. The harder questions sit one floor up, at oversight, and further down, at coordination and the foundation.

The test that tells real work from dressed-up automation

One line carries this layer. If the agent's ability to act on its own were removed and replaced with a scripted call or a person doing the click, would the system still work the same way? If yes, it is not layer-4 agentic. It is the foundation with a chat skin on top.

This is why robotic process automation is not the same thing. That kind of automation runs one process by a fixed script. No goal of its own. No reasoning across a case. No record built for an agent's choices. It is useful. It just is not what the agentic claim is about. "We already do automation" is the most common way the layer-4 claim gets faked.
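
The substitution test above can be sketched in a few lines. This is a minimal illustration, not part of Boost's model: the case fields, the income threshold, and every function name here are hypothetical, and `stand_in_agent_step` is a placeholder for an agent, not a real model. The idea is only to swap the step for a fixed script and check whether any outcome changes.

```python
from typing import Callable, Iterable

Case = dict[str, int]          # hypothetical case record
Step = Callable[[Case], str]   # a decision step: case in, outcome out


def scripted_step(case: Case) -> str:
    """Fixed script: one threshold, no goal, no reasoning across the case."""
    return "approve" if case["income"] < 30_000 else "refer"


def chat_skin_step(case: Case) -> str:
    """A 'chat' front end that only calls the script underneath."""
    return scripted_step(case)


def stand_in_agent_step(case: Case) -> str:
    """Placeholder for an agent weighing the whole case (not a real model)."""
    if case["income"] < 30_000 or case["dependants"] >= 3:
        return "approve"
    return "refer"


def autonomy_matters(step: Step, replacement: Step, cases: Iterable[Case]) -> bool:
    """True if swapping the step for the replacement changes any outcome."""
    return any(step(c) != replacement(c) for c in cases)


cases = [
    {"income": 20_000, "dependants": 0},
    {"income": 35_000, "dependants": 3},
]

print(autonomy_matters(chat_skin_step, scripted_step, cases))       # False
print(autonomy_matters(stand_in_agent_step, scripted_step, cases))  # True
```

The chat-skinned step fails the test because the script reproduces it exactly. Comparing outcomes on a handful of cases is only a rough proxy for "would the system still work the same way", but it shows the shape of the question.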

What is actually running

Most of what runs in government today, at real scale, is a person being helped by a model, not a model deciding alone. Two examples make the pattern clear.

In the United Kingdom, the Department for Work and Pensions uses an AI tool to help caseworkers match medical evidence for a benefit. It has handled more than 780,000 cases since 2020. An early version got the match right about a third of the time, and human caseworkers corrected the rest. Read that the right way. The humans are doing the deciding. The tool is assisting. That is not an agent making the call.

In Aotearoa New Zealand, Inland Revenue uses machine learning to help spot compliance risk. Its own published page is plain about the limit: the activities that follow from these algorithms are subject to human oversight and human decision-making. Again, a model helping a person, not a model deciding alone. Inland Revenue is also rolling out an everyday AI assistant to its staff. That is an internal productivity tool, not a citizen-facing decision system. It says something, though. Capability is arriving inside government workplaces well before any agent runs a citizen-facing decision.

The frameworks point further out. The Tony Blair Institute imagines a National Policy Twin for modelling policy. The Agentic State paper has domains for decision-making and crisis management. Those are designs on paper, not running systems.

New Zealand's 30 May 2026 law, letting the Ministry of Social Development approve automated decisions in social security, belongs here too, if read carefully. The Ministry was clear it means rules-based decisions, not generative AI. By the test above, that is not yet layer-4 agentic. It is the easy, honest case. Every decision has an explicit rule behind it, which makes it more inspectable than a human queue, not less.
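
To see why a rules-based decision is easy to inspect, here is a small sketch in which every outcome carries the rule that produced it. The rules, thresholds, and field names are invented for illustration. They are not the Ministry's actual criteria, and this is not how any real system is known to be built.

```python
from dataclasses import dataclass
from typing import Callable

Application = dict[str, int]  # hypothetical application record


@dataclass(frozen=True)
class Rule:
    rule_id: str
    applies: Callable[[Application], bool]
    outcome: str
    reason: str


@dataclass(frozen=True)
class Decision:
    outcome: str
    rule_id: str
    reason: str


# Invented rules for illustration only; not the Ministry's actual criteria.
RULES = [
    Rule("R1", lambda a: a["age"] < 18, "refer", "applicant is under 18"),
    Rule("R2", lambda a: a["weekly_income"] > 800, "refer", "weekly income above 800"),
]


def decide(application: Application) -> Decision:
    """Return the outcome together with the first rule that produced it."""
    for rule in RULES:
        if rule.applies(application):
            return Decision(rule.outcome, rule.rule_id, rule.reason)
    return Decision("approve", "DEFAULT", "no referral rule applied")


print(decide({"age": 30, "weekly_income": 900}))
```

Because the rule identifier travels with the outcome, an auditor or an appeal can ask which rule fired and get a direct answer. That is the property the paragraph above points at.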

The honest edges of this layer

The line between layer 2 and layer 4 is not clean. An agent that moves a case across agencies is also making the small judgements along the way. One deployment, two layers. The model keeps the example narrow when the layer-4 framing has to stay sharp.

The claim that capability is arriving on its own, outside government's control, can be pushed on, fairly. Procurement, model testing, and sovereign-cloud rules do shape what a layer-4 system can do inside government. The direction of travel is set outside government. The exact shape inside it is not. The model tries not to over-claim that government has no say.

And the most important caution. The United Kingdom's troubles with AI in benefits, the bias findings, the complaints about opacity, are real and worth taking seriously. But they come from tools that assist humans at scale, not from agents deciding alone. So they are not proof of anything about the agentic stage. They are an early warning. If the failures show up already, at the gentler human-assist stage, the case for getting oversight right before the agent stage gets stronger, not weaker.

Where this leaves the model

On this reading, layer 4 will keep getting better mostly on its own, and watching it is a distraction from the layers that actually decide whether agentic government works safely. The interesting work is up at oversight and down at the foundation.

The next post moves to the layer the model points at as the binding constraint: oversight and governance.
