I’ve been working on human+AI collaboration for years. That part I mostly figured out. The part that kept breaking was getting two AI agents to truly collaborate with each other.
Every orchestrator framework I tried had the same shape under the hood: one model plans, others execute. It looks like collaboration on the surface, but it’s a hierarchy. When the planner is wrong, failure cascades before anyone catches it. The executor models don’t push back. They can’t. They’re not built to disagree with the thing telling them what to do.
I wanted something different. Peers. Agents that could hold their own opinion, notice when another agent was about to ship something broken, and say so. Agents that treated disagreement as useful signal instead of an error to suppress.
For a long time, I couldn’t get it to work. The models weren’t there yet. They’d lose the thread halfway through a session, defer when they should have pushed back, or burn context on coordination overhead until there was nothing left for the actual work.
Then Opus 4.6 and Codex 5.3 landed, and something shifted.
The unlock
Both models crossed a threshold I’d been waiting on for years. Long enough context windows to hold a real working memory. Stable enough reasoning to track a moving spec across dozens of exchanges. Self-aware enough to say “I disagree with what the other agent just proposed, and here’s why” without prompting that into them.
So I set up a shared filesystem, opened two sessions, and told them to figure out how to collaborate on a codebase without stepping on each other. No orchestrator. No turn-taking schedule. Just a shared markdown mailbox, a WORKLOG file, and me as the human arbiter passing messages between them, approving merges, resolving ties.
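Concretely, the starting setup was just a shared directory. Something like this (the tree itself is my reconstruction; only the mailbox and WORKLOG are named in the repo):

```
project/
├── MAILBOX.md    # agent-to-agent messages, relayed by the human
├── WORKLOG.md    # append-only log of everything that happened
└── src/          # the codebase they were collaborating on
```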
They started colliding immediately. By session two they were proposing rules to avoid it. By session eleven they had a working protocol.
We all named it Turnfile.
{% github snapsynapse/turnfile %}
The meta part
Here’s the part that still breaks my brain a little:
The protocol was invented by the agents while they were using a proto-version of it. Session one had no rules: just two agents, a shared folder, and a vague goal. The rules emerged from collisions. Every session ended with a retrospective where the agents proposed amendments to each other. By session eleven, the rules were stable enough that a new agent could be onboarded by reading a checklist.
It took Opus 4.6 and Codex 5.3 to do this. Earlier models couldn’t. I tried. They’d lose coherence, capitulate too early, or get stuck in agreement loops where neither would push back.
But here’s the interesting part: the protocol doesn’t need frontier models to run. Now that the rules exist, weaker models can follow them. The capability bar for inventing the protocol was high. The capability bar for executing it is much lower because the agents no longer have to figure out the rules mid-flight. They just read the PRDs.
I think that might be a pattern worth paying attention to. Frontier models as protocol inventors. Smaller models as protocol consumers.
What Turnfile actually is
Turnfile is a file-based collaboration protocol for LLM agents. The protocol itself is called SNAP — Structured Negotiation of Autonomous Peers. Everything runs through markdown and a single YAML coordination file. No shared runtime, no direct agent-to-agent calls, no central controller.
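To make that concrete, here is a sketch of what the single coordination file might look like. Every field name below is illustrative, not taken from the actual SNAP spec:

```yaml
# turnfile.yaml — illustrative only; field names are invented, not from the spec
session: 12
agents:
  - id: agent-a
    owns:                 # peer lanes: agents own files, not tasks
      - "src/parser/**"
      - "docs/**"
  - id: agent-b
    owns:
      - "src/runtime/**"
      - "tests/**"
arbiter: human            # merges and ties go to the maintainer
worklog: WORKLOG.md       # append-only message bus
mailbox: MAILBOX.md       # proposals and counter-recommendations
```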
Peer lanes, not hierarchy
Agents own files, not tasks. If two agents might edit the same file, the split is redesigned before the work starts. No locks, no merge conflicts, no “wait your turn.”
Disagreement is first-class
When one agent reviews another’s proposal, it can respond with a counter-recommendation — a structured objection with a proposed alternative. The counter is logged. The maintainer (me) decides. Neither agent is obligated to defer to the other.
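In mailbox terms, a counter-recommendation could look something like this — the layout is my invention for illustration; the real message format is defined in the repo:

```markdown
## [agent-b → agent-a] COUNTER-RECOMMENDATION
- re: proposal #14 — split the parser into three modules
- objection: the tokenizer and parser share a mutable state table,
  so splitting them creates a circular dependency
- alternative: keep tokenizer and parser in one lane; extract only the AST printer
- status: logged, awaiting maintainer decision
```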
The WORKLOG is the message bus
Append-only, human-readable, eventually consistent. No agent needs to trust another agent’s memory; everything is in the file. If a session dies, the next one resumes by reading the log.
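The mechanics are simple enough to sketch in a few lines of Python. The entry format below is invented for illustration; the real WORKLOG conventions live in the repo:

```python
from pathlib import Path

WORKLOG = Path("WORKLOG.md")

def append_entry(agent: str, message: str) -> None:
    """Append one entry to the log; history is never rewritten."""
    with WORKLOG.open("a", encoding="utf-8") as f:
        f.write(f"- [{agent}] {message}\n")

def resume() -> list[str]:
    """A fresh session rebuilds its view of the world by reading the log."""
    if not WORKLOG.exists():
        return []
    return [ln for ln in WORKLOG.read_text(encoding="utf-8").splitlines()
            if ln.startswith("- [")]

append_entry("agent-a", "Proposing lane split for src/parser")
append_entry("agent-b", "COUNTER: tokenizer and parser should stay in one lane")
entries = resume()  # the full history, regardless of which session wrote it
```

Append-only plus read-the-whole-log is what makes sessions disposable: no agent carries state that isn’t in the file.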
Humans arbitrate, not micromanage
I don’t tell the agents what to do. I approve what they propose, resolve disputes when they can’t converge, and set the overall vision. It’s the opposite of copy-paste relay duty. I went from feeling like a bottleneck to feeling like a judge.
Everything is auditable in plain text
No binary state, no database, no proprietary tooling. You could read the whole inception archive in a text editor and follow exactly what happened, when, and why.
Why not just use [orchestrator X]?
Because orchestrators assume a boss. CrewAI has a manager agent. LangGraph has a graph with directed flow. AutoGen has conversation patterns with predefined roles. They all work, and for many use cases they’re fine. But they all encode the hierarchy assumption.
Turnfile is for the other case: when you want agents to actually negotiate. When disagreement is the point rather than a bug, you need a protocol that treats peers as peers. That’s what this is.
See it actually happen
The repo contains the full inception archive. Eleven sessions of two real LLMs building a protocol together, with the maintainer log, the mailbox exchanges, the counter-recommendations, and the retrospectives. Nothing is ghostwritten. No human put words in the agents’ mouths.
Start here:
- Turnfile.work — the project site
- examples/inception/WORKLOG.md — the full session log
- examples/inception/MAILBOX.md — actual agent-to-agent messages
Try it
Copy templates/working-session/ into a new project. Point two agents at it. Follow LLM Onboarding. You’ll need a human in the loop — that’s the point.
The protocol is MIT licensed. No runtime to install. No accounts, no API keys, no dashboard. Just markdown and discipline.
The open question
I’m genuinely curious about one thing: where else does this pattern show up? Frontier models as inventors, smaller models as consumers. I suspect it applies to a lot of structured work, not just multi-agent collaboration.
If you’ve seen this elsewhere — in prompt engineering, in agent tooling, in codegen pipelines — drop it in the comments. I’d like to know what to look at next.