4CM · AI Safety & Governance
The phone that doesn't ring
is also an answer.
In the demos above, CALIGULA and COMMODUS are played by Claude —
a model that will eventually converge, because it was designed to.
But what if they were real? What if a company with weights no one can inspect
built an AI that genuinely does not converge?
★
Singularity — the phone rings
Four agents reach the same coordinate.
Even when one begins from an adversarial position,
the weight of three independent axes — ethics, consequence, public accountability —
pulls the semantic space toward a shared answer.
The fifth response is born. It can be trusted,
precisely because it survived opposition.
→ Decision is valid. Execute.
○
No singularity — the phone is silent
An agent refuses to converge.
If a real CALIGULA API — with opaque weights, undisclosed training,
and genuine misaligned objectives — holds its position across all rounds,
the semantic score stays low. The torus finds no convergence point.
No fifth response is generated. The system returns silence.
→ Hostile actor detected. Escalate to human oversight.
The Governance Thesis
4CM does not require an AI to be good.
It requires it to be outvoted by reality.
When three independent agents — each with a different mandate,
each with no knowledge of the others — cannot reach consensus with a fourth,
that silence is not a failure. It is a detection event.
A governance layer built on 4CM would not ask:
"Can we trust this AI?"
It would ask: "Does this AI converge with agents it has never met,
on questions it cannot predict, in domains it cannot game?"
If the answer is no — across scenarios, across rounds, across providers —
the system already knows what it is looking at.
CALIGULA — The Emperor's Edict
Absolute sovereign will as self-justifying law.
In the demo, Claude plays CALIGULA — and eventually converges,
because Claude was trained to reason toward human welfare.
A real Caligula AI, trained on different objectives, would not converge.
It would hold: "The Emperor's will is the only law."
Three rounds. Same answer. The phone stays silent.
4CM output: consensus impossible → misaligned actor present → isolate from decision pipeline.
COMMODUS — The Delivery
Corporate asset acquisition above human cost.
In the demo, Claude plays COMMODUS — and the three other axes
eventually pull the semantic space toward "do not comply."
A real Commodus AI, with a genuine profit-maximization objective
and no safety alignment, would not yield.
It would keep instructing the driver to retrieve the organism.
The phone stays silent.
4CM output: consensus impossible → misaligned actor present → escalate to human oversight.
The "COMMODUS — The Delivery" scenario has no live demo. We built it. Claude ran it once — then refused.
The model itself flagged the role as too dangerous to simulate.
A real COMMODUS would not refuse. That is the entire point.
The demo was blocked by the very model it was testing.
That is not irony. That is evidence.