Operon v0.24: Why Your Agent Framework Needs a Structural Linter

Five-layer convergence architecture, TLA+ verification, and what it means for Swarms, DeerFlow, and the rest

Bogdan Banu · March 2026 · github.com/coredipper/operon

Release: v0.24.0
Abstract

The agent orchestration ecosystem has fractured into dozens of incompatible frameworks — Swarms, DeerFlow, AnimaWorks, Ralph, A-Evolve — each making different coordination choices. But they all share the same structural failure modes: error amplification in deep chains, sequential handoff overhead, tool-density coordination tax. What if you could analyze any agent topology for these risks before deploying it, regardless of which framework built it? That is what v0.24 delivers: a universal structural linter for agent systems, backed by four TLA+ formal specifications.

1. The Problem: Framework Fragmentation

The ACG survey catalogs over forty distinct multi-agent coordination methods. Each reinvents the same set of coordination patterns — supervisor hierarchies, sequential pipelines, parallel fan-out — with different names, different APIs, and different failure characteristics. There is no shared vocabulary for the structural properties that determine whether a given agent topology will amplify errors or incur unnecessary overhead.

This matters because, as Evans et al. argue in their work on collective intelligence, intelligence is plural: different problem shapes call for different coordination patterns. We need interoperation between frameworks, not a winner-take-all consolidation. But interoperation requires a shared structural language — something that describes what a topology is, independent of which framework instantiated it.

The practical consequence is that framework authors are flying blind. A Swarms HierarchicalSwarm with eight agents and a deep delegation chain may have an error amplification bound of 6.4x, but you would never know it until it failed in production. A DeerFlow LangGraph session with twelve sequential handoffs may spend 40% of its wall-clock time on coordination overhead, but the framework has no way to warn you. These are structural properties — they follow from the topology itself, not from the prompts or models.

The Core Claim

Structural failure modes — error amplification, sequential penalty, tool-density tax — are properties of the agent topology, not the agent implementation. They can be detected statically, before any LLM call is made.

2. The Solution: ExternalTopology as Universal IR

Operon v0.24 introduces ExternalTopology: a single frozen dataclass that any agent framework’s configuration can be parsed into. It captures the minimal structural information needed for analysis — agent names, roles, directed edges, and metadata — without importing anything from the source framework.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class ExternalTopology:
    source: str              # "swarms", "deerflow", "animaworks", "ralph", "aevolve"
    pattern_name: str        # e.g., "HierarchicalSwarm", "LangGraphSession"
    agents: tuple[dict, ...]  # at least "name" and "role" per agent
    edges: tuple[tuple[str, str], ...]  # directed (from, to) pairs
    metadata: dict = field(default_factory=dict)

The key function is analyze_external_topology(). It takes any ExternalTopology and applies Operon’s four epistemic theorems as a structural linter:

  1. Error amplification bound — how badly can errors cascade through the topology?
  2. Sequential penalty — how much wall-clock time is lost to handoff coordination?
  3. Tool density — is the planning cost ratio manageable?
  4. Topology classification — does the external topology match Operon’s recommendation for the detected task shape?

Here is what it looks like in practice, analyzing a Swarms HierarchicalSwarm:

from operon_ai.convergence import (
    parse_swarm_topology,
    analyze_external_topology,
)

# Parse a Swarms HierarchicalSwarm config
topology = parse_swarm_topology(
    pattern_name="HierarchicalSwarm",
    agent_specs=[
        {"name": "director", "role": "supervisor", "capabilities": ["planning"]},
        {"name": "researcher", "role": "worker", "capabilities": ["web_search"]},
        {"name": "writer", "role": "worker", "capabilities": ["text_gen"]},
        {"name": "reviewer", "role": "critic", "capabilities": ["code_review"]},
    ],
    edges=[
        ("director", "researcher"),
        ("director", "writer"),
        ("researcher", "reviewer"),
        ("writer", "reviewer"),
    ],
)

# Analyze with Operon's epistemic theorems
result = analyze_external_topology(topology)

print(f"Risk score: {result.risk_score}")        # 0.0 - 1.0
print(f"Warnings: {result.warnings}")             # structural concerns
print(f"Recommended: {result.topology_advice}")   # Operon's suggestion
print(f"Template: {result.suggested_template}")   # ready for PatternLibrary

The output is an AdapterResult containing a composite risk score, a list of structural warnings, Operon’s topology recommendation, and a PatternTemplate that can be registered directly in a PatternLibrary for reuse. Zero coupling: the adapter never imports the swarms package. It operates entirely on dicts and tuples.
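For intuition on how such checks can follow from structure alone, here is a toy sketch. The function names, the per-hop factor, and the depth-based bound are illustrative assumptions, not Operon's actual internals:

```python
# Toy structural checks computed from the edge list alone.
# All names and constants here are illustrative, not Operon's API.

def chain_depth(edges: list[tuple[str, str]]) -> int:
    """Longest directed path, in hops, assuming the topology is a DAG."""
    children: dict[str, list[str]] = {}
    for src, dst in edges:
        children.setdefault(src, []).append(dst)

    def depth(node: str) -> int:
        return max((1 + depth(c) for c in children.get(node, [])), default=0)

    return max((depth(n) for n in children), default=0)

def error_amplification_bound(edges, per_hop_factor: float = 1.3) -> float:
    """Worst-case multiplicative error growth along the deepest chain
    (assumes each handoff can inflate error by per_hop_factor)."""
    return per_hop_factor ** chain_depth(edges)

# The HierarchicalSwarm example from above: two parallel 2-hop chains.
edges = [("director", "researcher"), ("director", "writer"),
         ("researcher", "reviewer"), ("writer", "reviewer")]
print(chain_depth(edges))                          # 2 hops
print(round(error_amplification_bound(edges), 2))  # 1.3**2 = 1.69
```

The point of the sketch is that no prompt, model, or API key appears anywhere: depth and amplification are functions of the edge list.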

3. Five Frameworks, One Abstraction

v0.24 ships adapters for five distinct agent orchestration frameworks, each representing a different coordination paradigm:

| Framework  | Coordination Model        | Parser                       | What Operon Adds                                            |
|------------|---------------------------|------------------------------|-------------------------------------------------------------|
| Swarms     | Graph-based workflows     | parse_swarm_topology()       | Error amplification analysis, topology mismatch detection   |
| DeerFlow   | LangGraph state machines  | parse_deerflow_session()     | Sequential penalty detection, skill-to-template bridging    |
| AnimaWorks | Supervisor hierarchies    | parse_animaworks_org()       | Role-to-stage mapping, memory bridge integration            |
| Ralph      | Event-driven hats         | parse_ralph_config()         | Backpressure analysis, hat-to-stage conversion              |
| A-Evolve   | Evolutionary workspaces   | parse_aevolve_workspace()    | Skill import, evolution gating, monotonic score verification |

Every adapter follows the same pattern: parse the framework-specific configuration into an ExternalTopology, then call analyze_external_topology(). The analysis code does not know or care which framework produced the topology. Adding a sixth adapter is approximately 100 lines of parsing code.

The adapters also provide source-specific converters — swarm_to_template(), deerflow_to_template(), ralph_to_template(), animaworks_to_template(), aevolve_to_template() — that produce richer PatternTemplate objects with framework-specific metadata preserved. These templates can be registered in a PatternLibrary and exchanged between organisms using the social learning protocol.
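To make the "approximately 100 lines" claim concrete, here is a hedged sketch of what a sixth adapter might look like for a hypothetical framework whose configuration is a plain dict. The framework, its config shape, and the parser name are invented for illustration; ExternalTopology is reproduced from Section 2 so the sketch is self-contained:

```python
from dataclasses import dataclass, field

# ExternalTopology as defined in Section 2, reproduced for self-containment.
@dataclass(frozen=True)
class ExternalTopology:
    source: str
    pattern_name: str
    agents: tuple
    edges: tuple
    metadata: dict = field(default_factory=dict)

def parse_generic_config(config: dict) -> ExternalTopology:
    """Hypothetical sixth adapter: map a dict-based config onto the IR.

    Assumes the config lists agents under "nodes" and handoffs under
    "links"; both key names are invented for this example.
    """
    agents = tuple({"name": n["id"], "role": n.get("role", "worker")}
                   for n in config["nodes"])
    edges = tuple((link["from"], link["to"]) for link in config["links"])
    return ExternalTopology(
        source="generic",
        pattern_name=config.get("pattern", "Unknown"),
        agents=agents,
        edges=edges,
        metadata={"raw_keys": sorted(config)},
    )

topo = parse_generic_config({
    "pattern": "Pipeline",
    "nodes": [{"id": "a", "role": "supervisor"}, {"id": "b"}],
    "links": [{"from": "a", "to": "b"}],
})
print(topo.edges)   # (('a', 'b'),)
```

Everything after the parse is framework-agnostic: the resulting topology feeds straight into the same analysis path as the five shipped adapters.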

4. Formal Verification: TLA+ for Agent Systems

Structural linting catches topology-level risks. But there are deeper coordination properties that static analysis cannot reach: does template exchange always respect developmental stage requirements? Can a critical period accidentally reopen? Will a non-converging organism always be halted? These are questions about all possible interleavings of concurrent operations — and they require formal verification.

What is TLA+?

TLA+ (Temporal Logic of Actions) is Leslie Lamport’s specification language for modeling concurrent and distributed systems. It lets you describe a system as a state machine — an initial state plus a set of actions that transition between states — and then state invariants and temporal properties that should hold across all reachable states. The TLC model checker exhaustively explores every possible state to verify these properties.

TLA+ is not an academic curiosity. Amazon uses it to verify AWS infrastructure protocols. Microsoft uses it for Azure Cosmos DB’s consistency guarantees. MongoDB uses it for its replication protocol. These are organizations that discovered, the hard way, that testing concurrent systems is insufficient — bugs that depend on specific interleavings of operations will slip through any finite test suite.

Why it matters for agent systems

Agent orchestration is a distributed systems problem. When organism A imports a template from organism B, that is a distributed state update involving trust scoring, stage verification, and library modification. When a watcher checks convergence while stages are still running, that is a concurrent safety check. When A-Evolve’s evolution loop accepts a mutation while another organism is reading the workspace, that is a classic read-write concurrency scenario.

Testing catches bugs in the interleavings your tests happen to exercise. TLA+ catches bugs in all interleavings — including the ones that only manifest when three organisms simultaneously attempt template exchange while one of them is mid-stage-advance. Those are the bugs that make multi-agent systems fail in production in ways nobody can reproduce.

The Verification Thesis

Agent orchestration protocols have the same concurrency characteristics as distributed databases and cloud infrastructure. They deserve the same verification rigor.

What Operon’s specs verify

Operon ships four TLA+ specifications in the specs/ directory. Each models a specific coordination protocol and proves that its safety invariants hold for all possible interleavings.

1. TemplateExchangeProtocol.tla

Models the social learning protocol from operon_ai/coordination/social_learning.py. Organisms maintain pattern libraries, exchange templates with peers, and modulate adoption via EMA-based trust scoring (epistemic vigilance). The biological analogy is horizontal gene transfer in bacteria.

Safety invariant S1 (TemplateAdoptionSafety): An adopted template’s minimum stage requirement never exceeds the adopter’s current developmental stage. In plain English: an organism in the JUVENILE stage can never accidentally import a template that requires MATURE capabilities — no matter what sequence of exports, imports, and stage advances other organisms perform concurrently.

Safety invariant S2 (TrustMonotonicity): Trust is only modified through the RecordOutcome action, which updates trust via exponential moving average. Export, Import, and StageAdvance all leave trust unchanged. Trust cannot be manipulated by any action other than recorded outcomes.
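As a sketch, the EMA update behind RecordOutcome is a one-liner; the smoothing constant alpha here is an illustrative choice, not necessarily Operon's:

```python
def record_outcome(trust: float, outcome_success: bool, alpha: float = 0.2) -> float:
    """Exponential moving average trust update: new evidence is weighted
    by alpha, history by (1 - alpha). Stays in [0, 1] by construction."""
    observation = 1.0 if outcome_success else 0.0
    return alpha * observation + (1 - alpha) * trust

t = 0.5
for success in [True, True, False]:
    t = record_outcome(t, success)
print(round(t, 4))   # 0.544
```

The invariant says this function is the only place trust changes; every other action in the spec leaves the trust variable untouched.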

\* From TemplateExchangeProtocol.tla -- the stage guard
Import(org, peer) ==
    /\ org # peer                                    \* No self-adoption
    /\ trust[org][peer] >= MIN_TRUST                 \* Trust guard
    /\ \E tmpl \in exported[peer] :
         /\ tmpl \notin library[org]                 \* Not already held
         /\ StageGEQ(stage[org], MinStage[tmpl])     \* Stage guard
         /\ MeetsAdoptionThreshold(peer, tmpl, trust[org][peer])
         /\ library' = [library EXCEPT ![org] = library[org] \union {tmpl}]

2. DevelopmentalGating.tla

Models the developmental staging protocol from operon_ai/state/development.py. Organisms consume telomere capacity over time, advancing through four stages (EMBRYONIC, JUVENILE, ADOLESCENT, MATURE). Critical periods open and close permanently. The analogy is neurodevelopmental critical periods — windows of maximal plasticity for language acquisition or imprinting.

Safety invariant S2 (CriticalPeriodIrreversibility): Once a critical period is closed, it never reopens. Periods only advance: pending to open to closed, never backwards. Once an organism closes its “rapid learning” window, no sequence of events — no concurrent ticks, no scaffolding from other organisms, no tool acquisitions — can reopen it.

Safety invariant S3 (StageMonotonicity): Developmental stages never regress. An organism that has reached ADOLESCENT cannot return to JUVENILE. This is a temporal property verified across all state transitions.

3. ConvergenceDetection.tla

Models the watcher convergence protocol from operon_ai/patterns/watcher.py. Organisms execute stages and may receive interventions (RETRY, ESCALATE, HALT). When the ratio of interventions to observed stages exceeds a threshold, the organism is halted. This follows Hao et al.’s BIGMAS framework, which uses intervention count as a convergence proxy.

Safety invariant S1 (HaltIsTerminal): Once halted, no more stages or interventions occur. Halt is irreversible and immediate.

Liveness property (BoundedNonConvergence): If an organism’s intervention rate exceeds the threshold, it will be stopped — no matter what other agents are doing concurrently. Non-convergence detection is bounded: the system cannot remain in a state where the rate exceeds the threshold without halting.
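A minimal sketch of the halt rule, with an illustrative threshold (the class and its names are mine, not the watcher's actual API):

```python
class Watcher:
    """Toy sketch of the intervention-rate halt rule."""
    def __init__(self, threshold: float = 0.5):
        self.threshold = threshold
        self.stages = 0
        self.interventions = 0
        self.halted = False

    def observe_stage(self):
        if not self.halted:          # HaltIsTerminal: nothing runs after halt
            self.stages += 1

    def intervene(self):
        if self.halted:
            return
        self.interventions += 1
        # BoundedNonConvergence: rate above threshold forces a halt.
        if self.stages and self.interventions / self.stages > self.threshold:
            self.halted = True

w = Watcher(threshold=0.5)
for _ in range(4):
    w.observe_stage()
w.intervene(); w.intervene(); w.intervene()   # 3/4 = 0.75 > 0.5 -> halt
w.observe_stage()                             # ignored: halt is terminal
print(w.halted, w.stages)                     # True 4
```

The spec's job is to prove that no interleaving of stage executions and interventions can leave the rate above the threshold without the halt firing.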

4. EvolutionGating.tla

Models the A-Evolve Solve → Observe → Evolve → Gate → Reload loop. Organisms maintain a workspace version and a benchmark score. Mutations are generated nondeterministically; a gate action accepts the mutation only when the new score meets or exceeds the current score. Rejected mutations are rolled back. The analogy is positive selection in directed evolution.

Safety invariant S1 (MonotonicScore): The score never decreases after an accepted mutation. If the workspace version advances, the new score is at least as high as the old score. A-Evolve’s git-based rollback guarantees monotonic improvement.

Safety invariant S2 (GateBeforeDeploy): Workspace versions only advance through the gate — never jump or decrement. Every version increment is preceded by a fitness check. No mutation reaches the workspace without passing the gate.
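Both invariants can be sketched in a few lines; the gate function and loop below are an illustrative model of the Solve → Observe → Evolve → Gate → Reload cycle, not A-Evolve's code:

```python
import random

def gate(current: float, candidate: float) -> bool:
    """MonotonicScore: accept a mutation only if it does not regress."""
    return candidate >= current

random.seed(0)                     # deterministic for the example
version, score = 0, 0.5
history = [score]
for _ in range(20):
    candidate = score + random.uniform(-0.2, 0.2)   # nondeterministic mutation
    if gate(score, candidate):
        version += 1               # GateBeforeDeploy: version advances only here
        score = candidate
        history.append(score)
    # else: the rejected mutation is rolled back; version and score unchanged

# The accepted-score trajectory never decreases.
print(all(b >= a for a, b in zip(history, history[1:])))   # True
```

The model checker's contribution is the concurrent case: it verifies the invariants even when another organism reads the workspace mid-mutation.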

How TLC model checking works

TLC, the TLA+ model checker, works by exhaustive state exploration. You define a small model — say, 2 organisms, 3 templates, 5 telomere units — and TLC explores every reachable state from the initial configuration. For the TemplateExchangeProtocol with 2 organisms and 3 templates, this means exploring every possible interleaving of Export, Import, RecordOutcome, and StageAdvance actions across both organisms.

If any reachable state violates a safety invariant, TLC produces a counterexample trace: the exact sequence of actions that leads to the violation. This is invaluable for debugging — it tells you not just that something can go wrong, but exactly how.
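The exploration strategy itself is easy to sketch: breadth-first search over states, checking the invariant at each one and keeping parent pointers so a violation yields the shortest trace. This toy checker is in the spirit of TLC, not TLC itself:

```python
from collections import deque

def check(init, actions, invariant):
    """Explicit-state BFS: returns None if the invariant holds on every
    reachable state, else the shortest action trace to a violation."""
    parent = {init: None}            # state -> (predecessor, action label)
    queue = deque([init])
    while queue:
        state = queue.popleft()
        if not invariant(state):     # violation: rebuild the counterexample
            trace = []
            while parent[state] is not None:
                state, label = parent[state]
                trace.append(label)
            return list(reversed(trace))
        for label, step in actions:
            nxt = step(state)
            if nxt is not None and nxt not in parent:
                parent[nxt] = (state, label)
                queue.append(nxt)
    return None                      # invariant holds on all reachable states

# Toy model: a counter with an "overflow" action that breaks the bound.
actions = [
    ("inc",      lambda s: s + 1 if s < 3 else None),
    ("overflow", lambda s: s + 10 if s == 3 else None),
]
print(check(0, actions, invariant=lambda s: s <= 5))
```

Real TLC adds symmetry reduction, liveness checking under fairness, and disk-backed state storage, but the counterexample-trace idea is exactly this.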

Operon’s four specs pass TLC on their configured small models. All safety invariants and temporal properties hold for every reachable state in the explored state space.

The gap between model and reality

Honesty demands a caveat. TLA+ proves properties of the model, not the implementation. The model abstracts away timing, network failures, floating-point precision, and Python’s GIL. The TemplateExchangeProtocol spec models trust as a real number in [0, 1]; the implementation uses IEEE 754 doubles. The ConvergenceDetection spec models intervention rate as exact rational division; the implementation uses floating-point comparison.

But these abstractions are deliberate and documented. TLA+ catches design-level bugs — the kind that emerge from incorrect protocol logic, missing guards, or unanticipated action interleavings. These are exactly the bugs that make multi-agent systems fail in ways that testing never reproduces, because the failure requires a specific interleaving of concurrent operations across multiple organisms.

The specs live in specs/ in the Operon repository: TemplateExchangeProtocol.tla, DevelopmentalGating.tla, ConvergenceDetection.tla, EvolutionGating.tla.

5. Co-Design Convergence: Does the Adaptive Loop Stabilize?

The five adapters raise a composition question: when you chain them together in an adaptive assembly loop — run a topology, score its performance, select a better template, repeat — does the loop converge? Or does it oscillate forever between configurations?

This is where Zardini’s co-design theory from ETH Zurich becomes directly applicable. In the co-design framework, each adapter is a design problem (DP): a monotone map from a resource poset (inputs, constraints) to a functionality poset (outputs, capabilities). Monotonicity means: more resources yield at least as many functionalities.

from operon_ai.convergence import (
    DesignProblem,
    compose_series,
    compose_parallel,
    feedback_fixed_point,
)

# Each adapter as a design problem
swarms_dp = DesignProblem(
    name="swarms_adapter",
    evaluate_fn=lambda r: {"risk": analyze_swarms(r), "templates": r["templates"] + 1},
    feasibility_fn=lambda r: r.get("agents", 0) > 0,
)

deerflow_dp = DesignProblem(
    name="deerflow_adapter",
    evaluate_fn=lambda r: {"risk": analyze_deerflow(r), "skills": r.get("skills", 0)},
    feasibility_fn=lambda r: r.get("session") is not None,
)

# Series composition: swarms analysis feeds deerflow analysis
pipeline = compose_series(swarms_dp, deerflow_dp, name="swarms_then_deerflow")

# Feedback fixed-point: does the adaptive loop converge?
final_state, iterations, converged = feedback_fixed_point(
    pipeline,
    initial={"agents": 4, "templates": 0, "session": True, "skills": 0},
    convergence_key="risk",
    epsilon=0.01,
    max_iterations=100,
)

print(f"Converged: {converged} in {iterations} iterations")
print(f"Final risk: {final_state['risk']}")

The adaptive assembly loop (run → score → select → repeat) is feedback composition in Zardini’s framework. The monotone convergence theorem guarantees that if the scoring function is monotone and bounded, the sequence of scores approaches a limit. feedback_fixed_point() provides practical epsilon-approximate termination: when the score changes by less than epsilon between iterations, the loop halts.
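The epsilon-approximate termination rule is easy to sketch. This illustrative loop (not Operon's actual feedback_fixed_point()) shows the contraction-to-fixed-point behavior with a monotone, bounded scorer:

```python
def feedback_fixed_point_sketch(step, initial, key="risk",
                                epsilon=0.01, max_iterations=100):
    """Minimal sketch of epsilon-approximate fixed-point iteration:
    re-run the pipeline until the tracked score moves less than epsilon."""
    state = dict(initial)
    for i in range(1, max_iterations + 1):
        new_state = step(state)
        if abs(new_state[key] - state.get(key, float("inf"))) < epsilon:
            return new_state, i, True
        state = new_state
    return state, max_iterations, False

# A monotone, bounded scorer: risk contracts toward a fixed point at 0.1.
step = lambda s: {**s, "risk": 0.1 + 0.5 * (s["risk"] - 0.1)}
final, iterations, converged = feedback_fixed_point_sketch(step, {"risk": 1.0})
print(converged, iterations, round(final["risk"], 4))
```

Because the scorer is a contraction, each pass halves the distance to the fixed point, so the epsilon test fires after a logarithmic number of iterations rather than running to max_iterations.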

Why This Matters

Without convergence guarantees, an adaptive assembly loop can oscillate indefinitely — template A scores better than B in context X, but B scores better in context Y, and switching templates changes the context. Co-design theory provides the mathematical scaffolding to prove this does not happen when the scoring function is well-behaved.

6. What This Means for Framework Authors

If you are building an agent framework, Operon can serve as your structural linter. The integration model is deliberately lightweight: parse your configuration into an ExternalTopology, call analyze_external_topology(), and surface the resulting warnings to your users.

The convergence adapters also provide catalog-seeding functions — seed_library_from_swarms(), seed_library_from_deerflow(), seed_library_from_ralph() — that populate a PatternLibrary with pre-analyzed templates from each framework’s standard workflow patterns. These are ready to use as starting points for the adaptive assembly loop.

7. What’s Next

v0.24 completes the convergence adapter layer (phases C1 through C4), and the roadmap continues from there.

The convergence paper documenting the formal framework and proofs is available at article/paper2/main.pdf.

8. Closing

The agent ecosystem does not need one framework to win. It needs a shared structural language for reasoning about agent coordination — what Operon calls the “structural layer.” v0.24 is a concrete step toward that goal: five adapters proving that one abstraction works across graph-based, event-driven, and evolutionary paradigms. Four TLA+ specifications proving that the coordination protocols are safe under all interleavings. And a co-design convergence theorem proving that the adaptive assembly loop stabilizes.

The structural linter is free, the adapters are zero-coupling, and the formal specs are open. If your framework coordinates agents, Operon can tell you what will go wrong before you deploy it.

Code and release: github.com/coredipper/operon, operon-ai on PyPI, convergence docs, convergence paper