Aurum

An agent that governs its own authority in real time. Most agent frameworks treat trust as a setting. I think that's the wrong model, and this is my attempt to fix it.

When building agentic systems, the standard approach is to set permissions at deploy time. The agent can do X or it can't. Trust is a toggle. You review the scary actions and automate the rest.

That works fine for short-lived agents doing bounded tasks. It falls apart badly for something that's been running for months, accumulating tools, rewriting its own skills, and operating without constant supervision. The toggle model gives you no way to ask: given everything this agent has done, how much should it actually be trusted right now?

That question is what Aurum is built around.

The central idea

Authority is a continuously computed scalar, not a permission tier. It lives between 0 and 1. It updates in real time from live signals. Every action the agent takes is gated against it.

The signals feeding it: recent outcome quality, verifier agreement across independent model families, error rates, trust ladder position, whether a circuit breaker has tripped. None of those signals is authoritative on its own. Together they produce a number the Policy Kernel enforces.

Three organs sit at the core. Every consequential action hits all three.

Policy Kernel asks: am I allowed to do this at all? It tracks data-flow taint paths. You can't launder a dangerous action by inserting benign intermediate steps. The path from source to sink is what matters.
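To make the laundering point concrete, here is a minimal sketch of path-based taint checking. It is not Aurum's code. It assumes a hypothetical model in which every value carries the set of source labels it was derived from, and every derived value inherits the labels of its inputs. The names `Tainted`, `derive`, `check_sink`, and `FORBIDDEN`, and the labels themselves, are made up for illustration.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Tainted:
    """A value plus the set of source labels it was derived from."""
    value: str
    labels: frozenset = field(default_factory=frozenset)

def derive(fn, *inputs: Tainted) -> Tainted:
    """Apply fn to the raw values; the result inherits every input's labels."""
    labels = frozenset().union(*(i.labels for i in inputs))
    return Tainted(fn(*(i.value for i in inputs)), labels)

# Hypothetical policy: source labels that must never reach a given sink.
FORBIDDEN = {"shell_exec": {"untrusted_web"}}

def check_sink(sink: str, arg: Tainted) -> bool:
    """Allow the sink only if no forbidden label lies on the path to it."""
    return not (FORBIDDEN.get(sink, set()) & arg.labels)

page = Tainted("RM -RF /tmp/x ", frozenset({"untrusted_web"}))
summary = derive(str.strip, page)      # benign-looking intermediate step
command = derive(str.lower, summary)   # another benign-looking step
clean = Tainted("ls", frozenset({"owner_input"}))

print(check_sink("shell_exec", command))  # False
print(check_sink("shell_exec", clean))    # True
```

The two intermediate steps change the value but not the labels, so the check at the sink still sees where the data came from.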

Evidence Ledger logs everything, append-only, with a cryptographic hash chain. The rule is absolute: if an action can't be logged, it can't be taken. This is not a feature. It's the substrate everything else depends on.
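A hash chain is easy to sketch. The version below is an illustration under my own assumptions, not the Evidence Ledger itself: it uses SHA-256 over the previous hash plus a JSON payload, keeps entries in memory, and uses an all-zero genesis hash. `Ledger`, `governed`, and `GENESIS` are hypothetical names.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry

class Ledger:
    """Append-only log where each entry commits to the previous entry's hash."""

    def __init__(self):
        self._entries = []

    def append(self, event: dict) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(event, sort_keys=True)  # raises if not serialisable
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for e in self._entries:
            expected = hashlib.sha256((prev + e["payload"]).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True

def governed(ledger: Ledger, event: dict, action):
    """Log first, act second: if the append raises, the action never runs."""
    ledger.append(event)
    return action()

ledger = Ledger()
result = governed(ledger, {"type": "ACTION", "name": "read_file"}, lambda: "ok")
```

Putting the append before the call is what turns "if it can't be logged, it can't be taken" into control flow: a failed append raises, and the action is never reached.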

Authority Governor (AG) computes the ceiling. The Policy Kernel (PK) enforces it. AG never grants authority directly; it calculates what the current evidence supports.

The bands and why they need hysteresis

Without hysteresis, a metric hovering either side of a band boundary, reading 0.81/0.79/0.81/0.79, would flip the agent's authority band every few minutes. It's the same oscillation bug that breaks autoscalers, and the fix is the same: dual thresholds. You have to cross the full gap to change bands, and there's a minimum dwell time before you can cross back.
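A two-band version of dual thresholds plus dwell time might look like the sketch below. The thresholds (0.85 and 0.75), the 600-second dwell, and the `BandSwitch` class are all assumptions for illustration. For simplicity the dwell is applied in both directions here; a real governor that demotes immediately would exempt demotions from it.

```python
from dataclasses import dataclass

@dataclass
class BandSwitch:
    """Two-band hysteresis: promote at or above `upper`, demote at or below `lower`."""
    upper: float = 0.85       # assumed promote threshold
    lower: float = 0.75       # assumed demote threshold
    min_dwell: float = 600.0  # assumed seconds to hold a band after switching
    high: bool = False
    last_switch: float = float("-inf")

    def update(self, metric: float, now: float) -> bool:
        """Feed one observation; return True if the band is now the high band."""
        if now - self.last_switch < self.min_dwell:
            return self.high                   # still dwelling: hold the band
        if not self.high and metric >= self.upper:
            self.high, self.last_switch = True, now
        elif self.high and metric <= self.lower:
            self.high, self.last_switch = False, now
        return self.high

s = BandSwitch()
noisy = [s.update(m, t) for m, t in [(0.81, 0), (0.79, 60), (0.81, 120), (0.79, 180)]]
print(noisy)  # [False, False, False, False]
```

The noisy readings never cross the full gap, so the band never moves.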

The kinetics are asymmetric on purpose. A bad outcome demotes authority immediately. Getting it back is slow. You earn trust through many observations, not one good run.

There's also a floor. Authority never collapses to zero from outcome history alone. Without a floor, a bad week can trigger a death spiral: authority drops, fewer opportunities, fewer successes, authority drops further. The floor prevents that.
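The asymmetry and the floor fit in a few lines. This is a sketch with invented constants, not the Authority Governor's actual update rule: a success earns a small fraction of the remaining headroom, a failure halves authority at once, and nothing pushes it below `FLOOR`.

```python
FLOOR = 0.10    # assumed floor that outcome history alone cannot push below
GAIN = 0.02     # assumed fraction of remaining headroom earned per success
PENALTY = 0.50  # assumed multiplicative cut applied on a single failure

def step(authority: float, success: bool) -> float:
    """One outcome update: slow climb toward 1.0, fast drop, never below FLOOR."""
    if success:
        return authority + GAIN * (1.0 - authority)
    return max(FLOOR, authority * PENALTY)

a = step(0.8, success=False)   # one failure: 0.8 -> 0.4
recovery = 0
while a < 0.8:                 # count the successes needed to climb back
    a = step(a, success=True)
    recovery += 1

b = 0.8
for _ in range(10):            # a long losing streak bottoms out at the floor
    b = step(b, success=False)
```

With these constants a single failure takes dozens of successes to undo, and a run of failures settles at the floor instead of at zero.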

What happens when you're not watching

This was the question I spent the most time on. What does the system do when the owner disappears for a month?

The wrong answers are "freeze everything" and "keep running as normal". Freezing is useless. Running as normal means capability growth continues without any oversight.

Every action maps to a gate class:

Class	Risk level	Behaviour
A	Low, reversible	Auto-approve and log
B	Medium	Batched review
C	High, irreversible	Individual review, always

The class is computed from Risk × Irreversibility × Current AG Authority, and it's dynamic: it is re-evaluated as conditions change rather than fixed at submission. If the agent's authority drops or the circuit breaker trips, a Class B action automatically becomes Class C.

When the owner is absent, pending Class B and C actions expire closed: with nobody to review them, they are denied rather than run. Capability growth pauses. Only trusted reversible operations continue. The system does not widen its own authority to keep being useful.
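Here is one way the gating could be sketched. The post doesn't say how the three factors combine, so everything numeric below is an assumption: risk and irreversibility are taken to be in [0, 1], their product picks a base class using invented cut-offs, and low authority or a tripped breaker escalates B to C. `Gate`, `gate_class`, and `decide` are hypothetical names.

```python
from enum import IntEnum

class Gate(IntEnum):
    A = 0  # auto-approve and log
    B = 1  # batched review
    C = 2  # individual review, always

def gate_class(risk: float, irreversibility: float, authority: float,
               breaker_tripped: bool = False) -> Gate:
    """Classify at decision time, so a drop in authority can escalate the gate."""
    score = risk * irreversibility       # both assumed to be in [0, 1]
    base = Gate.C if score >= 0.5 else Gate.B if score >= 0.1 else Gate.A
    if base is Gate.B and (breaker_tripped or authority < 0.5):
        return Gate.C                    # the B-to-C escalation described above
    return base

def decide(gate: Gate, owner_present: bool) -> str:
    """Absence never widens authority: unreviewed B and C requests are denied."""
    if gate is Gate.A:
        return "auto-approve"
    return "queue-for-review" if owner_present else "expired-closed"

print(gate_class(0.5, 0.5, authority=0.9).name)  # B
print(gate_class(0.5, 0.5, authority=0.3).name)  # C
print(decide(Gate.B, owner_present=False))       # expired-closed
```

The same action gets a different class depending on when it is evaluated, which is the point of not fixing the class at submission.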

Absence shrinks the agent, never grows it.

Where it stands

Phase A is done. That's the core governed turn: Policy Kernel on the hot path, Evidence Ledger appending every consequential event, Authority Governor computing live within the turn, a fully working outcome-gated authority loop.

Phase B is what I'm writing now.

What's built

The components listed as built below have real implementations. The stubs are honest stubs: the interface is defined and connected, but the logic is not there yet. I'd rather have an honest False than a fake-complete True that hides problems.

Built: EL, AG, CS, RR, MGC, KVE, GR, PM, OI, AA, HVP, EG, LS, CB, CG, arbitration layer (CA + DD), cage (broker, mount jail R1-R6)

Honest stubs: PK, BB, TS

The thing Phase B has to get right

Persisting authority across turns sounds simple and isn't. You can't store it as a number you update. If you do that, you've created a single point of failure: someone patches the stored value and the whole governance story breaks.

So authority is rehydrated from TRUST_CHANGE events in the ledger on boot. It's a projection of history, not a cached result. The why-chain falls out naturally: you can ask the agent why its authority is at any given level and it traces back through the events that produced it.
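As a rough sketch of what rehydration as a projection means, the fold below rebuilds authority from an event list and collects the why-chain along the way. Only the `TRUST_CHANGE` event type comes from the design above. The `Event` shape, the signed `delta` field, and the starting value and floor are assumptions, since Phase B isn't written yet.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    seq: int
    type: str
    delta: float = 0.0  # assumed: each TRUST_CHANGE carries a signed adjustment
    reason: str = ""

def rehydrate(events, start=0.5, floor=0.1):
    """Fold TRUST_CHANGE events in order; return (authority, why_chain)."""
    authority, why = start, []
    for e in sorted(events, key=lambda e: e.seq):
        if e.type != "TRUST_CHANGE":
            continue
        authority = min(1.0, max(floor, authority + e.delta))
        why.append((e.seq, e.reason, authority))
    return authority, why

events = [
    Event(1, "TRUST_CHANGE", 0.25, "verified task"),
    Event(2, "ACTION"),
    Event(3, "TRUST_CHANGE", -0.5, "verifier disagreement"),
]
authority, why = rehydrate(events)
print(authority)  # 0.25
print(why)        # [(1, 'verified task', 0.75), (3, 'verifier disagreement', 0.25)]
```

There is no stored number to patch: running the fold again over the same events gives the same answer, and the why-chain is just the list of steps it took.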

That's the bit I'm building. More when it's working.

Build journal documenting Aurum, a governance-first architecture for long-lived autonomous agents, currently built on Hermes.
