Operon v0.21: Adaptive Foundations
Pattern Repository, Watcher Component, Adaptive Assembly, and the Experience Pool That Learns from Its Own Interventions
Release: v0.21.0 – v0.21.1
v0.19 gave Operon temporal epistemics. v0.20 wired it into the
runtime. v0.21 adds the other missing piece: the ability to learn
from experience and intervene when things go wrong. The theoretical
backbone comes from Dupoux, LeCun, and Malik’s
“Why AI Systems Don’t Learn” (arXiv:2603.15381),
which proposes a three-system cognitive architecture —
System A (observation), System B (action), System M
(meta-control) — and an evolutionary-developmental framework
for bootstrapping them. This release maps those ideas onto Operon:
a PatternLibrary as evolutionary memory, a
WatcherComponent as System M, and the run-loop
intervention mechanism as the meta-action surface. Together they
form the static scaffolding for Phase 4’s dynamic
assembly — the point where Operon starts choosing its own
structure.
1. What Was Missing
After v0.20, Operon could build explicit multi-stage workflows, attach telemetry, and maintain auditable bi-temporal state. What it could not do was remember which workflow shapes worked for which kinds of tasks, or detect and recover from runtime degradation without manual intervention.
In biological terms: the organism had structure and memory, but no immune response and no evolutionary selection. A cell that cannot detect stagnation and cannot learn from prior infections is fragile in exactly the ways that matter for production agent systems.
2. Why AI Systems Don’t Learn
The theoretical grounding for v0.21 comes from a March 2026 paper by Emmanuel Dupoux (EHESS / Meta FAIR), Yann LeCun (NYU / Meta FAIR), and Jitendra Malik (UC Berkeley / Meta FAIR): “Why AI Systems Don’t Learn and What to Do About It: Lessons on Autonomous Learning from Cognitive Science.” The paper argues that the dominant paradigm — hyperscaling text-based LLMs — is hitting structural limits that no amount of data or compute will resolve. The core problem is not model size. It is that deployed AI systems are static: they cease to learn the moment training ends, and every adaptation requires human intervention.
Three Roadblocks
The paper identifies three structural roadblocks that prevent current AI from achieving autonomous learning:
- Conceptual fragmentation. Observation-based learning (self-supervised learning on text, images, video) and action-based learning (reinforcement learning through trial and error) are treated as separate paradigms with separate data pipelines, separate training recipes, and separate research communities. In biological organisms, these are not siloed — they are deeply integrated from birth.
- Externalization of learning. The entire MLOps pipeline — data sourcing, curation, loss function design, training recipe orchestration, performance benchmarking, signal monitoring — is outsourced to human experts. The model itself does none of this. Once deployed, it is a fixed function. In contrast, biological organisms continuously adapt their own learning processes without external supervision.
- No scalable construction method. Even if you design a multi-component learning architecture, there is no established method for bootstrapping it. The components depend on each other — the meta-controller needs experience from the learners, but the learners need the meta-controller to guide their learning. This chicken-and-egg problem has no clean solution in current AI practice.
System A: Learning from Observation
System A corresponds to passive, statistical learning from sensory streams. In cognitive science, this is the infant accumulating visual and auditory regularities — learning to discriminate faces, recognize phonetic boundaries, and form intuitive models of physics, all without being told what to look for. In AI, the closest analogue is self-supervised learning (SSL): models trained on large corpora of text, images, or video to predict masked inputs, reconstruct corrupted signals, or match augmented views.
System A scales beautifully with data. It discovers abstract, hierarchical representations that transfer well to downstream tasks. But it has two fundamental limitations: it cannot actively curate its own training signal (it learns from whatever data humans provide), and it struggles to distinguish correlation from causation (observing that umbrellas correlate with rain does not teach you that rain causes umbrellas to appear).
Operon Mapping: System A → Fast Nucleus
In Operon, System A maps to fast_nucleus stages
(mode="fixed"). These are cheap, statistical,
pattern-matching stages — routing, classification, extraction.
They process input efficiently but do not reason deeply or pursue
goals. They are the organism’s perceptual layer.
System B: Learning from Action
System B corresponds to active, goal-directed learning through interaction with the environment. In cognitive science, this is the toddler learning to walk — trying, falling, adjusting motor commands based on consequences. In AI, the closest analogue is reinforcement learning (RL): agents that learn policies by maximizing reward signals through trial and error.
System B is grounded in interaction. It can discover novel solutions that no amount of passive observation would reveal. But it is notoriously sample-inefficient, struggles with high-dimensional action spaces, and requires well-specified reward functions — which are often the hardest part of the problem to define.
Operon Mapping: System B → Deep Nucleus
In Operon, System B maps to deep_nucleus stages
(mode="fuzzy"). These are expensive,
reasoning-heavy stages — planning, synthesis, evaluation.
They pursue goals, weigh trade-offs, and produce outputs that
require genuine deliberation. They are the organism’s
executive layer.
The Synergy: A Helps B, B Helps A
The paper’s key insight is that neither system is sufficient alone. They must be deeply integrated:
- A helps B by providing compressed representations (so the RL agent doesn’t operate on raw pixels), predictive world models (so the agent can plan rather than blindly explore), and intrinsic reward signals (prediction error, novelty, uncertainty) that guide exploration.
- B helps A by actively selecting informative data for the SSL model to learn from (directing attention to complex or uncertain stimuli) and by generating rich, grounded, task-relevant experience through its own goal-directed behavior.
In Operon, this synergy is the SkillOrganism run loop
itself. Fast stages produce routing signals, classifications, and
summaries that deep stages consume. Deep stages produce rich outputs
that inform subsequent fast stages. The organism composes cheap
perception and expensive deliberation into a single coherent
workflow — the same integration the paper argues is essential.
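This fast→deep handoff can be sketched as a toy pipeline. Everything here is plain Python, not the Operon API: `fast_classify` and `deep_reason` are hypothetical names standing in for a fast-nucleus stage and a deep-nucleus stage.

```python
def fast_classify(task: str) -> dict:
    # System A analogue: cheap, statistical pattern matching that
    # produces a routing signal for the deep stage.
    kind = "numeric" if any(c.isdigit() for c in task) else "text"
    return {"kind": kind, "task": task}

def deep_reason(signal: dict) -> str:
    # System B analogue: expensive deliberation, conditioned on the
    # fast stage's routing signal rather than on raw input.
    return f"deep[{signal['kind']}]: {signal['task']}"

def run_pipeline(task: str) -> str:
    # The organism's run loop composes perception and deliberation.
    return deep_reason(fast_classify(task))
```

The design point is only that the deep stage never sees raw input cold; it consumes the compressed signal the fast stage produced.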
System M: The Meta-Controller
This is the paper’s central proposal. System M is an autonomous meta-controller — analogous to the prefrontal cortex’s executive functions or the control plane in software-defined networking — that orchestrates the learning process itself. It monitors low-dimensional “meta-states” and issues “meta-actions” to dynamically route data between System A and System B.
The meta-states fall into three categories:
| Category | Examples | Biological Analogue |
|---|---|---|
| Epistemic | Prediction error, uncertainty, learning gain, surprise | Orienting reflex, curiosity, “aha” moments |
| Species-specific | Direct gaze, looming stimuli, threat signatures | Innate fear responses, social attention biases |
| Somatic | Energy levels, pain, fatigue, homeostatic deviations | Metabolic regulation, fight-or-flight, sleep pressure |
Based on these signals, System M issues meta-actions: connect or disconnect learning modules, switch between operating modes (learning, inference, optimization), provide internal rewards or training targets, and activate or suppress specific data streams. Crucially, the paper hypothesizes that System M’s core routing policy is largely hardwired — shaped by evolution, not learned from scratch during an individual agent’s lifetime.
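A hardwired routing policy of this kind fits in a few lines. Everything below is illustrative: the `MetaState` fields and action strings are stand-ins, not the paper's formalism or Operon's API; the point is that the rules are fixed, not learned.

```python
from dataclasses import dataclass

@dataclass
class MetaState:
    prediction_error: float  # epistemic signal
    threat_level: float      # species-specific signal
    energy: float            # somatic signal

def meta_action(state: MetaState) -> str:
    # Fixed priority: innate threat response first, homeostasis second,
    # epistemic routing last.
    if state.threat_level > 0.9:
        return "suppress_stream"       # innate fear response wins
    if state.energy < 0.1:
        return "switch_mode:conserve"  # homeostatic override
    if state.prediction_error > 0.5:
        return "route_to_system_b"     # surprise: engage active learning
    return "route_to_system_a"         # default: passive observation
```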
Operon Mapping: System M → WatcherComponent
The WatcherComponent is Operon’s concrete
instantiation of System M. It monitors
EpiplexityMonitor (epistemic),
ATP_Store (somatic), and
ImmuneSystem (species-specific), then issues
meta-actions: RETRY, ESCALATE, or
HALT. Its policy is configured, not learned —
matching the paper’s hypothesis that System M is
hardwired. Phase 4’s experience pool will add
learned refinement on top of this fixed scaffolding.
The Evo-Devo Framework
The paper’s third contribution is a strategy for bootstrapping the A-B-M architecture. The chicken-and-egg problem — M needs experience from A and B, but A and B need M to guide their learning — is solved by borrowing from biology’s own solution: evolution and development operating at two timescales.
The framework is formalized as bilevel optimization:
- Inner loop (development): A single agent lifetime. The agent’s architecture (A, B, M) is initialized from meta-parameters φ. Systems A and B update their parameters through interaction with the environment, guided by a fixed System M. This is one “life.”
- Outer loop (evolution): The meta-parameters φ are optimized over many lifetimes using a fitness function evaluated over entire life cycles in simulated environments. This is natural selection operating on the genome.
The genome φ specifies everything: initial weights for A and B, the routing policy of M, and the developmental curriculum (which environments appear when, how complexity increases over the agent’s lifetime). Evolution shapes these biases; development instantiates them.
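A toy version of the bilevel loop, under heavy simplifying assumptions: the genome φ is a single scalar, a lifetime's fitness peaks at a hidden environmental optimum, and evolution is plain mutate-and-select. None of this is the paper's formal setup; it only illustrates the two timescales.

```python
import random

OPTIMUM = 0.7  # hidden property of the environment

def lifetime_fitness(phi: float) -> float:
    # Inner loop (development): one simulated life under genome phi.
    return 1.0 - abs(phi - OPTIMUM)

def evolve(generations: int = 30, pop: int = 16) -> float:
    # Outer loop (evolution): select genomes over many lifetimes.
    random.seed(0)  # deterministic for illustration
    population = [random.random() for _ in range(pop)]
    for _ in range(generations):
        scored = sorted(population, key=lifetime_fitness, reverse=True)
        parents = scored[: pop // 2]          # selection keeps the top half
        children = [min(1.0, max(0.0, p + random.gauss(0, 0.05)))
                    for p in parents]         # mutated offspring
        population = parents + children
    return max(population, key=lifetime_fitness)
```

The inner loop never sees the fitness function directly; only the outer loop selects on it, which is the separation the framework relies on.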
Operon Mapping: Evo-Devo → PatternLibrary
The PatternLibrary is Operon’s evolutionary
memory. A PatternTemplate is the genome φ —
it specifies the topology, stage specs, and intervention policy.
PatternRunRecord instances are fitness evaluations.
The success_rate feeds back into
top_templates_for() scoring, so templates that
consistently succeed rise in the rankings. One organism run is
one “lifetime”; the library accumulates across
many lifetimes. This is the outer loop.
Emergent Cognitive Modes
The paper argues that even a fixed System M can give rise to sophisticated learning behaviors that are typically associated with large-brained species:
- Learning through communication. System M’s species-specific signals include attention to pedagogical cues (direct gaze, pointing, infant-directed speech). This enables a form of social learning: attending selectively to informative teachers and trusting their signals proportionally to past reliability. This is epistemic vigilance — not blind trust, but calibrated trust.
- Learning through imagination. During periods of low external stimulation (sleep, rest), System M can redirect data flow from external sensors to internal memory. Episodic memories are replayed, consolidated, and compressed into schemas. The agent can also simulate counterfactual scenarios — “what would have happened if I had chosen differently?” — using its world model from System A.
These are not implemented in v0.21. They are on the roadmap for
Phase 5 (SleepConsolidation cycle,
counterfactual_replay() over bi-temporal memory) and
Phase 6 (SocialLearning extension of
QuorumSensing). The point is that the static
scaffolding — the hardwired System M policy, the
evolutionary template memory — is what makes those emergent
modes structurally possible. You cannot build sleep consolidation
without a watcher that knows when to trigger it.
3. The Pattern Repository
The PatternLibrary is evolutionary memory for collaboration
patterns. It stores PatternTemplate instances —
blueprints describing a topology, stage specifications, and default
intervention policies — and retrieves them by matching against
a TaskFingerprint.
from operon_ai import PatternLibrary, PatternTemplate, TaskFingerprint
lib = PatternLibrary()
# Register a template
lib.register_template(PatternTemplate(
    template_id=lib.make_id(),
    name="Enterprise Review",
    topology="skill_organism",
    stage_specs=(
        {"name": "research", "role": "Researcher", "mode": "fuzzy"},
        {"name": "strategy", "role": "Strategist", "mode": "fuzzy"},
        {"name": "critique", "role": "Critic", "mode": "fuzzy"},
    ),
    intervention_policy={"max_retries": 2},
    fingerprint=TaskFingerprint(
        task_shape="sequential", tool_count=4,
        subtask_count=3, required_roles=("researcher", "strategist", "critic"),
    ),
))

# Later: retrieve ranked matches for a new task
ranked = lib.top_templates_for(TaskFingerprint(
    task_shape="sequential", tool_count=3,
    subtask_count=4, required_roles=("researcher", "analyst"),
))
Scoring
The top_templates_for() method scores each template
using a weighted combination of six factors:
| Factor | Weight | Metric |
|---|---|---|
| Task shape | 0.30 | Exact match (1.0 or 0.0) |
| Tool count | 0.15 | 1 / (1 + |delta|) |
| Subtask count | 0.15 | 1 / (1 + |delta|) |
| Role overlap | 0.20 | Jaccard similarity |
| Tag overlap | 0.10 | Jaccard similarity |
| Success rate | 0.10 | Historical win rate (default 0.5) |
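The table's weighting can be restated as a small scoring function. This is a sketch over plain dicts and tuples; the library's real signatures and field names may differ.

```python
def jaccard(a: set, b: set) -> float:
    # Jaccard similarity: |intersection| / |union|, 0.0 for two empties.
    return len(a & b) / len(a | b) if (a or b) else 0.0

def score_template(tpl: dict, fp: dict) -> float:
    shape = 1.0 if tpl["task_shape"] == fp["task_shape"] else 0.0
    tools = 1.0 / (1.0 + abs(tpl["tool_count"] - fp["tool_count"]))
    subtasks = 1.0 / (1.0 + abs(tpl["subtask_count"] - fp["subtask_count"]))
    roles = jaccard(set(tpl["roles"]), set(fp["roles"]))
    tags = jaccard(set(tpl.get("tags", ())), set(fp.get("tags", ())))
    success = tpl.get("success_rate", 0.5)  # default before any run records
    return (0.30 * shape + 0.15 * tools + 0.15 * subtasks
            + 0.20 * roles + 0.10 * tags + 0.10 * success)
```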
The success rate component closes the feedback loop: as
PatternRunRecord instances accumulate, templates that
consistently succeed rise in the rankings, while those that fail
sink. This is the evo-devo outer loop described in §2: the
template is the genome φ; run-record scoring is natural
selection.
4. The Watcher Component
The WatcherComponent is a
SkillRuntimeComponent that observes stage execution and
classifies signals into the three meta-state categories described
in §2:
- Epistemic — epiplexity and prediction error.
Sourced from
EpiplexityMonitor. Low epiplexity means the agent is repeating itself without learning — the stagnation signal. - Somatic — ATP and metabolic state. Sourced
from
ATP_Store. When the budget is nearly exhausted, the organism is running on fumes. - Species-specific — immune threats and
membrane alerts. Sourced from
ImmuneSystem. A confirmed or critical threat means the agent’s behavior has diverged from its baseline in a way that looks adversarial.
Why Three Categories
The distinction matters because the appropriate intervention depends on the signal type. Epistemic stagnation calls for escalation to a more capable model. Somatic depletion calls for conservation or halt. Immune threats call for immediate quarantine. Collapsing all signals into a single “health score” loses the information needed to choose the right response.
5. The Intervention Mechanism
When the watcher decides to intervene, it writes a
WatcherIntervention to shared_state under
a reserved key. The organism’s run loop checks for this key
after all component on_stage_result hooks complete:
| Intervention | Run Loop Behavior |
|---|---|
RETRY | Re-execute the current stage. The retry result replaces the original. |
ESCALATE | Re-execute using the deep nucleus. Only applies to agent-backed stages. |
HALT | Break the stage loop immediately. No further stages execute. |
Component hooks are not re-invoked after retry or escalation. This prevents recursive intervention loops: the watcher decided to retry; the retry result stands. The next stage’s hooks will fire normally.
from operon_ai import WatcherComponent, WatcherConfig, skill_organism
watcher = WatcherComponent(
    config=WatcherConfig(max_intervention_rate=0.5),
    epiplexity_monitor=monitor,  # optional
    budget=atp_store,            # optional
)

organism = skill_organism(
    stages=[...],
    fast_nucleus=fast,
    deep_nucleus=deep,
    components=[watcher, telemetry],
)
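Schematically, the run-loop check behaves like the sketch below. The reserved key name and the function signatures are assumptions for illustration, not Operon's exact internals.

```python
INTERVENTION_KEY = "__watcher_intervention__"  # hypothetical reserved key

def run_stages(stages, shared_state, execute, execute_deep):
    results = []
    for stage in stages:
        result = execute(stage)
        # Component on_stage_result hooks would run here; the watcher
        # may write an intervention under the reserved key during them.
        kind = shared_state.pop(INTERVENTION_KEY, None)
        if kind == "HALT":
            results.append(result)
            break                             # no further stages execute
        if kind == "RETRY":
            result = execute(stage)           # retry result replaces original
        elif kind == "ESCALATE":
            result = execute_deep(stage)      # re-run on the deep nucleus
        # Hooks are NOT re-invoked after retry/escalation: the key was
        # popped once, so the retry result stands.
        results.append(result)
    return results
```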
6. The BIGMAS Insight: Why Routing Decisions Reveal Failure
The most interesting intervention is the one that stops the organism entirely. The theoretical grounding comes from a second paper: “Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning” by Hao, Dai, Qin, and Yu (Chinese Academy of Sciences, March 2026). The paper proposes BIGMAS — a multi-agent framework inspired by Global Workspace Theory (GWT) — and in the process reveals an empirical regularity that turns out to be directly useful for Operon’s watcher.
Global Workspace Theory in Multi-Agent Systems
GWT, originally a theory of human consciousness, posits that flexible cognition arises from the dynamic formation of coalitions among distributed specialized processors, coordinated via a shared central workspace. BIGMAS operationalizes this for LLM reasoning:
- Problem-adaptive graph design. A GraphDesigner agent analyzes each problem and autonomously constructs a task-specific directed agent graph — nodes are specialized roles (generator, validator, analyzer, optimizer), edges are information flow. No fixed topology; the structure adapts to the problem.
- Centralized workspace. All agents coordinate exclusively through a shared workspace with four partitions: context (read-only problem), work (read-write intermediate results), system metadata (execution history), and answer (final output). No point-to-point communication.
- Global orchestrator. A routing agent sees the full workspace state and complete execution history before deciding which node activates next. This avoids the partial-information bottleneck of reactive systems like ReAct.
Operon Mapping: BIGMAS Architecture
BIGMAS’s workspace partitions map directly onto
Operon’s three-layer context model: B_ctx
(problem context) → topology layer;
B_work (intermediate results) →
shared_state ephemeral layer;
B_sys (execution metadata) → telemetry +
watcher signals. The orchestrator role maps to the
WatcherComponent — both make routing
decisions conditioned on global state rather than local
observation.
The Empirical Finding
BIGMAS was evaluated across six frontier LLMs (including DeepSeek-V3.2, Claude 4.5 Sonnet, Gemini 2.5 Pro, and GPT-5) on three combinatorial reasoning benchmarks. The results showed consistent improvements — even GPT-5 went from 91% to 98% on Tower of London — and crucially, gains were orthogonal to model-level reasoning enhancements. Applying BIGMAS to LRMs (models with extended chain-of-thought) yielded additional gains on top of the thinking-mode improvements, showing that architectural coordination and individual model capability are complementary.
But the most useful finding for Operon was in the routing analysis. The paper measured how many orchestrator routing decisions each run required and found a systematic pattern:
| Task | Correct Runs (mean decisions) | Incorrect Runs (mean decisions) |
|---|---|---|
| Game24 | 1.1 | 1.7 |
| Six Fives | 2.9 | 4.6 |
| Tower of London | 7.3 | 9.4 |
Incorrect runs consistently required more routing decisions than successful ones across all tasks. The orchestrator continued cycling through nodes in unsuccessful attempts, accumulating calls without converging. The paper notes this as an “emergent proxy for instance-level difficulty and non-convergence” and suggests it could inform early-stopping mechanisms.
Operationalizing the Signal
That is exactly what Operon’s watcher does. The
_check_convergence() method tracks the ratio of
cumulative interventions (retries + escalations) to observed stages.
When this ratio exceeds max_intervention_rate
(default 0.5), the watcher emits a non-convergence HALT. This is
approximately 20 lines of code in the decision function, but it
encodes the BIGMAS empirical insight: an organism that keeps needing
corrections is not converging, and continuing to retry is worse than
stopping.
The pattern library can then record the failed run, demoting the template for future tasks with similar fingerprints. This closes the loop: BIGMAS provides the signal (routing decisions correlate with failure), the watcher operationalizes it (intervention rate as convergence proxy), and the pattern library learns from it (failed runs lower template scores).
What Operon Borrows, What It Doesn’t
Operon borrows the convergence signal but not the architecture. BIGMAS uses a single centralized workspace with a global orchestrator that sees everything. Operon uses distributed component hooks where the watcher sees stage results sequentially. BIGMAS constructs a fresh graph per problem instance. Operon retrieves templates from a scored library. BIGMAS validates workspace writes with a structured protocol. Operon validates through stage-level type coercion and bi-temporal append-only semantics.
The architectural differences are deliberate. BIGMAS optimizes for single-problem reasoning depth. Operon optimizes for multi-run pattern learning across many problems. BIGMAS’s per-problem graph design is expensive (the paper reports 18-37% of total tokens spent on graph design alone) but appropriate for complex combinatorial problems. Operon’s template retrieval is cheap (a dict lookup + scoring function) and appropriate for repeated enterprise workflows where the same topology works across many similar tasks.
Complementary, Not Competing
BIGMAS and Operon sit at different points on the adaptation-cost spectrum. BIGMAS pays high per-problem cost for maximum structural adaptation. Operon pays low per-problem cost by amortizing adaptation across many runs via the pattern library. A hybrid — use BIGMAS-style graph design for novel problems, then register successful graphs as Operon templates for future retrieval — is a natural Phase 4 extension.
7. Decision Priority
The watcher evaluates signals in a fixed priority order:
- Intervention rate exceeded → HALT (non-convergence)
- Critical immune threat → HALT
- Critical epiplexity → ESCALATE (or HALT if already deep)
- Stagnant epiplexity on fast model → ESCALATE
- Stage failure with retries remaining → RETRY
- Otherwise → no intervention
Higher-priority signals always win. A critical immune threat triggers HALT even if the epiplexity looks fine. This is intentional: the categories are not weighted against each other; they are checked in order of severity.
8. Zero Overhead When Not Attached
Both new components are entirely opt-in. An organism with no
WatcherComponent in its components list
never checks for intervention keys. The
PatternLibrary is a standalone object with no coupling
to the run loop. Existing tests (987 from v0.20) pass without
modification.
9. Closing the Loop: Adaptive Assembly (v0.21.1)
With the scaffolding in place, v0.21.1 closes the adaptive loop.
The AdaptiveSkillOrganism wrapper composes the full
lifecycle: fingerprint → retrieve → assemble →
run → record.
from operon_ai import adaptive_skill_organism, PatternLibrary, TaskFingerprint
lib = PatternLibrary()
# ... register templates ...
adaptive = adaptive_skill_organism(
    "Prepare a Q4 earnings analysis.",
    fingerprint=TaskFingerprint(
        task_shape="sequential", tool_count=3, subtask_count=3,
        required_roles=("researcher", "strategist"),
    ),
    library=lib,
    fast_nucleus=fast,
    deep_nucleus=deep,
    handlers={"intake": intake_fn, "research": research_fn, "strategy": strategy_fn},
)
result = adaptive.run("Prepare a Q4 earnings analysis.")
# result.template — which template was selected
# result.record — PatternRunRecord stored in library
# result.watcher_summary — signal/intervention counts
The factory adaptive_skill_organism() auto-fingerprints
the task if no fingerprint is provided, queries
PatternLibrary.top_templates_for() for the best
template, and calls assemble_pattern() to dispatch
on topology — skill_organism,
reviewer_gate, specialist_swarm, or
single_worker. A WatcherComponent and
TelemetryProbe are automatically attached.
Experience Pool
After each run, the wrapper records a
PatternRunRecord in the library (closing the scoring
feedback loop) and populates the watcher’s experience pool
with ExperienceRecord instances for each intervention.
The experience pool persists across runs: when rule-based decision
logic returns no intervention, the watcher consults past experiences
with matching (stage, signal category, fingerprint shape) and
recommends the intervention kind that was most often successful.
Rule-based decisions always take priority. Experience is a fallback. This ensures backward compatibility — an empty experience pool changes nothing — while allowing the watcher to learn from operational history.
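The fallback lookup can be sketched as follows; the class and method names are illustrative, not the library's ExperienceRecord API.

```python
from collections import Counter

class ExperiencePool:
    """Recommend the historically most-successful intervention kind
    for a matching (stage, signal category, fingerprint shape) key."""

    def __init__(self):
        self.records = []  # (key, intervention_kind, succeeded)

    def record(self, stage, category, shape, kind, succeeded):
        self.records.append(((stage, category, shape), kind, succeeded))

    def recommend(self, stage, category, shape):
        key = (stage, category, shape)
        # Count only successful interventions for this key.
        wins = Counter(k for rk, k, ok in self.records if rk == key and ok)
        if not wins:
            return None  # empty pool changes nothing (backward compatible)
        return wins.most_common(1)[0][0]
```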
The Evo-Devo Inner Loop, Realized
One organism run is one developmental lifetime. The library’s scoring across many lifetimes is evolutionary selection. Templates that consistently succeed rise in the rankings; those that fail sink. The experience pool adds Lamarckian refinement on top: not just which templates work, but which interventions work within those templates. This is the progression the roadmap describes: structure → memory → adaptation. v0.17–0.18 gave explicit structure. v0.19–0.20 gave auditable memory. v0.21 delivers adaptation.
10. Validation
| Suite | Tests | Status |
|---|---|---|
| Pattern repository unit tests | 16 | All pass |
| Watcher component unit tests | 16 | All pass |
| Organism intervention integration tests | 5 | All pass |
| Adaptive assembly unit tests (v0.21.1) | 15 | All pass |
| Experience pool unit tests (v0.21.1) | 14 | All pass |
| Full regression suite at release | 1053 | All pass |
operon_ai/patterns/repository.py,
operon_ai/patterns/watcher.py,
operon_ai/patterns/adaptive.py,
operon_ai/patterns/organism.py,
examples/72_pattern_repository.py,
examples/73_watcher_component.py,
examples/74_adaptive_assembly.py,
examples/75_experience_driven_watcher.py
11. What Comes Next
v0.21 delivers both the scaffolding (v0.21.0) and the adaptive loop (v0.21.1). With patterns stored, scored, and automatically assembled, and with interventions learned from operational history, the next phases build the cognitive extensions that Dupoux, LeCun, and Malik predict will emerge from a working System M.
Phase 5 implements learning through imagination:
SleepConsolidation replays episodic memories from
bi-temporal storage, compresses successful patterns into templates,
and runs counterfactual_replay() to ask “what
would have happened with different facts?” Phase 6
implements learning through communication:
SocialLearning extends QuorumSensing so
organisms can adopt templates that worked for peer organisms, with
epistemic vigilance (trust calibrated to track record) preventing
blind imitation.
The deeper point is the one the paper makes: autonomous learning requires all three systems working together. System A alone gives you pattern recognition without grounding. System B alone gives you trial-and-error without abstraction. System M alone gives you monitoring without anything to monitor. v0.21 is the first release where all three systems are operational and connected — the static scaffolding, the adaptive loop, and the experience pool compose into a working whole. The cognitive extensions are what comes next; the foundation is now in place.
Code and release: github.com/coredipper/operon, operon-ai on PyPI, skill organisms docs, watcher dashboard, adaptive assembly