Operon v0.21: Adaptive Foundations
Pattern Repository, Watcher Component, Adaptive Assembly, and the Experience Pool That Learns from Its Own Interventions
Release: v0.21.0 – v0.21.1
v0.19 gave Operon temporal epistemics. v0.20 wired it into the
runtime. v0.21 adds the other missing piece: the ability to learn
from experience and intervene when things go wrong. The theoretical
backbone comes from Dupoux, LeCun, and Malik’s
“Why AI Systems Don’t Learn” (arXiv:2603.15381),
which proposes a three-system cognitive architecture —
System A (observation), System B (action), System M
(meta-control) — and an evolutionary-developmental framework
for bootstrapping them. This release maps those ideas onto Operon:
a PatternLibrary as evolutionary memory, a
WatcherComponent as System M, and the run-loop
intervention mechanism as the meta-action surface. Together they
form the static scaffolding for Phase 4’s dynamic
assembly — the point where Operon starts choosing its own
structure.
1. What Was Missing
After v0.20, Operon could build explicit multi-stage workflows, attach telemetry, and maintain auditable bi-temporal state. What it could not do was remember which workflow shapes worked for which kinds of tasks, or detect and recover from runtime degradation without manual intervention.
In biological terms: the organism had structure and memory, but no immune response and no evolutionary selection. A cell that cannot detect stagnation and cannot learn from prior infections is fragile in exactly the ways that matter for production agent systems.
2. Why AI Systems Don’t Learn
The theoretical grounding for v0.21 comes from a March 2026 paper by Emmanuel Dupoux (EHESS / Meta FAIR), Yann LeCun (NYU / Meta FAIR), and Jitendra Malik (UC Berkeley / Meta FAIR): “Why AI Systems Don’t Learn and What to Do About It: Lessons on Autonomous Learning from Cognitive Science.” The paper argues that the dominant paradigm — hyperscaling text-based LLMs — is hitting structural limits that no amount of data or compute will resolve. The core problem is not model size. It is that deployed AI systems are static: they cease to learn the moment training ends, and every adaptation requires human intervention.
Three Roadblocks
The paper identifies three structural roadblocks that prevent current AI from achieving autonomous learning:
- Conceptual fragmentation. Observation-based learning (self-supervised learning on text, images, video) and action-based learning (reinforcement learning through trial and error) are treated as separate paradigms with separate data pipelines, separate training recipes, and separate research communities. In biological organisms, these are not siloed — they are deeply integrated from birth.
- Externalization of learning. The entire MLOps pipeline — data sourcing, curation, loss function design, training recipe orchestration, performance benchmarking, signal monitoring — is outsourced to human experts. The model itself does none of this. Once deployed, it is a fixed function. In contrast, biological organisms continuously adapt their own learning processes without external supervision.
- No scalable construction method. Even if you design a multi-component learning architecture, there is no established method for bootstrapping it. The components depend on each other — the meta-controller needs experience from the learners, but the learners need the meta-controller to guide their learning. This chicken-and-egg problem has no clean solution in current AI practice.
System A: Learning from Observation
System A corresponds to passive, statistical learning from sensory streams. In cognitive science, this is the infant accumulating visual and auditory regularities — learning to discriminate faces, recognize phonetic boundaries, and form intuitive models of physics, all without being told what to look for. In AI, the closest analogue is self-supervised learning (SSL): models trained on large corpora of text, images, or video to predict masked inputs, reconstruct corrupted signals, or match augmented views.
System A scales beautifully with data. It discovers abstract, hierarchical representations that transfer well to downstream tasks. But it has two fundamental limitations: it cannot actively curate its own training signal (it learns from whatever data humans provide), and it struggles to distinguish correlation from causation (observing that umbrellas correlate with rain does not teach you that rain causes umbrellas to appear).
Operon Mapping: System A → Fast Nucleus
In Operon, System A maps to fast_nucleus stages
(mode="fixed"). These are cheap, statistical,
pattern-matching stages — routing, classification, extraction.
They process input efficiently but do not reason deeply or pursue
goals. They are the organism’s perceptual layer.
System B: Learning from Action
System B corresponds to active, goal-directed learning through interaction with the environment. In cognitive science, this is the toddler learning to walk — trying, falling, adjusting motor commands based on consequences. In AI, the closest analogue is reinforcement learning (RL): agents that learn policies by maximizing reward signals through trial and error.
System B is grounded in interaction. It can discover novel solutions that no amount of passive observation would reveal. But it is notoriously sample-inefficient, struggles with high-dimensional action spaces, and requires well-specified reward functions — which are often the hardest part of the problem to define.
Operon Mapping: System B → Deep Nucleus
In Operon, System B maps to deep_nucleus stages
(mode="fuzzy"). These are expensive,
reasoning-heavy stages — planning, synthesis, evaluation.
They pursue goals, weigh trade-offs, and produce outputs that
require genuine deliberation. They are the organism’s
executive layer.
The Synergy: A Helps B, B Helps A
The paper’s key insight is that neither system is sufficient alone. They must be deeply integrated:
- A helps B by providing compressed representations (so the RL agent doesn’t operate on raw pixels), predictive world models (so the agent can plan rather than blindly explore), and intrinsic reward signals (prediction error, novelty, uncertainty) that guide exploration.
- B helps A by actively selecting informative data for the SSL model to learn from (directing attention to complex or uncertain stimuli) and by generating rich, grounded, task-relevant experience through its own goal-directed behavior.
In Operon, this synergy is the SkillOrganism run loop
itself. Fast stages produce routing signals, classifications, and
summaries that deep stages consume. Deep stages produce rich outputs
that inform subsequent fast stages. The organism composes cheap
perception and expensive deliberation into a single coherent
workflow — the same integration the paper argues is essential.
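This fast→deep handoff can be sketched as a toy pipeline. Everything here is plain Python, not the Operon API: `fast_classify` and `deep_reason` are hypothetical names standing in for a fast-nucleus stage and a deep-nucleus stage.

```python
def fast_classify(task: str) -> dict:
    # System A analogue: cheap, statistical pattern matching that
    # produces a routing signal for the deep stage.
    kind = "numeric" if any(c.isdigit() for c in task) else "text"
    return {"kind": kind, "task": task}

def deep_reason(signal: dict) -> str:
    # System B analogue: expensive deliberation, conditioned on the
    # fast stage's routing signal rather than on raw input.
    return f"deep[{signal['kind']}]: {signal['task']}"

def run_pipeline(task: str) -> str:
    # The organism's run loop composes perception and deliberation.
    return deep_reason(fast_classify(task))
```

The design point is only that the deep stage never sees raw input cold; it consumes the compressed signal the fast stage produced.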
System M: The Meta-Controller
This is the paper’s central proposal. System M is an autonomous meta-controller — analogous to the prefrontal cortex’s executive functions or the control plane in software-defined networking — that orchestrates the learning process itself. It monitors low-dimensional “meta-states” and issues “meta-actions” to dynamically route data between System A and System B.
The meta-states fall into three categories:
| Category | Examples | Biological Analogue |
|---|---|---|
| Epistemic | Prediction error, uncertainty, learning gain, surprise | Orienting reflex, curiosity, “aha” moments |
| Species-specific | Direct gaze, looming stimuli, threat signatures | Innate fear responses, social attention biases |
| Somatic | Energy levels, pain, fatigue, homeostatic deviations | Metabolic regulation, fight-or-flight, sleep pressure |
Based on these signals, System M issues meta-actions: connect or disconnect learning modules, switch between operating modes (learning, inference, optimization), provide internal rewards or training targets, and activate or suppress specific data streams. Crucially, the paper hypothesizes that System M’s core routing policy is largely hardwired — shaped by evolution, not learned from scratch during an individual agent’s lifetime.
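A hardwired routing policy of this kind fits in a few lines. Everything below is illustrative: the `MetaState` fields and action strings are stand-ins, not the paper's formalism or Operon's API; the point is that the rules are fixed, not learned.

```python
from dataclasses import dataclass

@dataclass
class MetaState:
    prediction_error: float  # epistemic signal
    threat_level: float      # species-specific signal
    energy: float            # somatic signal

def meta_action(state: MetaState) -> str:
    # Fixed priority: innate threat response first, homeostasis second,
    # epistemic routing last.
    if state.threat_level > 0.9:
        return "suppress_stream"       # innate fear response wins
    if state.energy < 0.1:
        return "switch_mode:conserve"  # homeostatic override
    if state.prediction_error > 0.5:
        return "route_to_system_b"     # surprise: engage active learning
    return "route_to_system_a"         # default: passive observation
```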
Operon Mapping: System M → WatcherComponent
The WatcherComponent is Operon’s concrete
instantiation of System M. It monitors
EpiplexityMonitor (epistemic),
ATP_Store (somatic), and
ImmuneSystem (species-specific), then issues
meta-actions: RETRY, ESCALATE, or
HALT. Its policy is configured, not learned —
matching the paper’s hypothesis that System M is
hardwired. Phase 4’s experience pool will add
learned refinement on top of this fixed scaffolding.
The Evo-Devo Framework
The paper’s third contribution is a strategy for bootstrapping the A-B-M architecture. The chicken-and-egg problem — M needs experience from A and B, but A and B need M to guide their learning — is solved by borrowing from biology’s own solution: evolution and development operating at two timescales.
The framework is formalized as bilevel optimization:
- Inner loop (development): A single agent lifetime. The agent’s architecture (A, B, M) is initialized from meta-parameters φ. Systems A and B update their parameters through interaction with the environment, guided by a fixed System M. This is one “life.”
- Outer loop (evolution): The meta-parameters φ are optimized over many lifetimes using a fitness function evaluated over entire life cycles in simulated environments. This is natural selection operating on the genome.
The genome φ specifies everything: initial weights for A and B, the routing policy of M, and the developmental curriculum (which environments appear when, how complexity increases over the agent’s lifetime). Evolution shapes these biases; development instantiates them.
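A toy version of the bilevel loop, under heavy simplifying assumptions: the genome φ is a single scalar, a lifetime's fitness peaks at a hidden environmental optimum, and evolution is plain mutate-and-select. None of this is the paper's formal setup; it only illustrates the two timescales.

```python
import random

OPTIMUM = 0.7  # hidden property of the environment

def lifetime_fitness(phi: float) -> float:
    # Inner loop (development): one simulated life under genome phi.
    return 1.0 - abs(phi - OPTIMUM)

def evolve(generations: int = 30, pop: int = 16) -> float:
    # Outer loop (evolution): select genomes over many lifetimes.
    random.seed(0)  # deterministic for illustration
    population = [random.random() for _ in range(pop)]
    for _ in range(generations):
        scored = sorted(population, key=lifetime_fitness, reverse=True)
        parents = scored[: pop // 2]          # selection keeps the top half
        children = [min(1.0, max(0.0, p + random.gauss(0, 0.05)))
                    for p in parents]         # mutated offspring
        population = parents + children
    return max(population, key=lifetime_fitness)
```

The inner loop never sees the fitness function directly; only the outer loop selects on it, which is the separation the framework relies on.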
Operon Mapping: Evo-Devo → PatternLibrary
The PatternLibrary is Operon’s evolutionary
memory. A PatternTemplate is the genome φ —
it specifies the topology, stage specs, and intervention policy.
PatternRunRecord instances are fitness evaluations.
The success_rate feeds back into
top_templates_for() scoring, so templates that
consistently succeed rise in the rankings. One organism run is
one “lifetime”; the library accumulates across
many lifetimes. This is the outer loop.
Emergent Cognitive Modes
The paper argues that even a fixed System M can give rise to sophisticated learning behaviors that are typically associated with large-brained species:
- Learning through communication. System M’s species-specific signals include attention to pedagogical cues (direct gaze, pointing, infant-directed speech). This enables a form of social learning: attending selectively to informative teachers and trusting their signals proportionally to past reliability. This is epistemic vigilance — not blind trust, but calibrated trust.
- Learning through imagination. During periods of low external stimulation (sleep, rest), System M can redirect data flow from external sensors to internal memory. Episodic memories are replayed, consolidated, and compressed into schemas. The agent can also simulate counterfactual scenarios — “what would have happened if I had chosen differently?” — using its world model from System A.
These are not implemented in v0.21. They are on the roadmap for
Phase 5 (SleepConsolidation cycle,
counterfactual_replay() over bi-temporal memory) and
Phase 6 (SocialLearning extension of
QuorumSensing). The point is that the static
scaffolding — the hardwired System M policy, the
evolutionary template memory — is what makes those emergent
modes structurally possible. You cannot build sleep consolidation
without a watcher that knows when to trigger it.
3. The Pattern Repository
The PatternLibrary is evolutionary memory for collaboration
patterns. It stores PatternTemplate instances —
blueprints describing a topology, stage specifications, and default
intervention policies — and retrieves them by matching against
a TaskFingerprint.
from operon_ai import PatternLibrary, PatternTemplate, TaskFingerprint
lib = PatternLibrary()
# Register a template
lib.register_template(PatternTemplate(
    template_id=lib.make_id(),
    name="Enterprise Review",
    topology="skill_organism",
    stage_specs=(
        {"name": "research", "role": "Researcher", "mode": "fuzzy"},
        {"name": "strategy", "role": "Strategist", "mode": "fuzzy"},
        {"name": "critique", "role": "Critic", "mode": "fuzzy"},
    ),
    intervention_policy={"max_retries": 2},
    fingerprint=TaskFingerprint(
        task_shape="sequential", tool_count=4,
        subtask_count=3, required_roles=("researcher", "strategist", "critic"),
    ),
))

# Later: retrieve ranked matches for a new task
ranked = lib.top_templates_for(TaskFingerprint(
    task_shape="sequential", tool_count=3,
    subtask_count=4, required_roles=("researcher", "analyst"),
))
Scoring
The top_templates_for() method scores each template
using a weighted combination of six factors:
| Factor | Weight | Metric |
|---|---|---|
| Task shape | 0.30 | Exact match (1.0 or 0.0) |
| Tool count | 0.15 | 1 / (1 + |delta|) |
| Subtask count | 0.15 | 1 / (1 + |delta|) |
| Role overlap | 0.20 | Jaccard similarity |
| Tag overlap | 0.10 | Jaccard similarity |
| Success rate | 0.10 | Historical win rate (default 0.5) |
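The table's weighting can be restated as a small scoring function. This is a sketch over plain dicts and tuples; the library's real signatures and field names may differ.

```python
def jaccard(a: set, b: set) -> float:
    # Jaccard similarity: |intersection| / |union|, 0.0 for two empties.
    return len(a & b) / len(a | b) if (a or b) else 0.0

def score_template(tpl: dict, fp: dict) -> float:
    shape = 1.0 if tpl["task_shape"] == fp["task_shape"] else 0.0
    tools = 1.0 / (1.0 + abs(tpl["tool_count"] - fp["tool_count"]))
    subtasks = 1.0 / (1.0 + abs(tpl["subtask_count"] - fp["subtask_count"]))
    roles = jaccard(set(tpl["roles"]), set(fp["roles"]))
    tags = jaccard(set(tpl.get("tags", ())), set(fp.get("tags", ())))
    success = tpl.get("success_rate", 0.5)  # default before any run records
    return (0.30 * shape + 0.15 * tools + 0.15 * subtasks
            + 0.20 * roles + 0.10 * tags + 0.10 * success)
```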
The success rate component closes the feedback loop: as
PatternRunRecord instances accumulate, templates that
consistently succeed rise in the rankings, while those that fail
sink. This is the evo-devo outer loop described in §2: the
template is the genome φ; run-record scoring is natural
selection.
4. The Watcher Component
The WatcherComponent is a
SkillRuntimeComponent that observes stage execution and
classifies signals into the three meta-state categories described
in §2:
- Epistemic — epiplexity and prediction error.
Sourced from
EpiplexityMonitor. Low epiplexity means the agent is repeating itself without learning — the stagnation signal. - Somatic — ATP and metabolic state. Sourced
from
ATP_Store. When the budget is nearly exhausted, the organism is running on fumes. - Species-specific — immune threats and
membrane alerts. Sourced from
ImmuneSystem. A confirmed or critical threat means the agent’s behavior has diverged from its baseline in a way that looks adversarial.
Why Three Categories
The distinction matters because the appropriate intervention depends on the signal type. Epistemic stagnation calls for escalation to a more capable model. Somatic depletion calls for conservation or halt. Immune threats call for immediate quarantine. Collapsing all signals into a single “health score” loses the information needed to choose the right response.
5. The Intervention Mechanism
When the watcher decides to intervene, it writes a
WatcherIntervention to shared_state under
a reserved key. The organism’s run loop checks for this key
after all component on_stage_result hooks complete:
| Intervention | Run Loop Behavior |
|---|---|
RETRY | Re-execute the current stage. The retry result replaces the original. |
ESCALATE | Re-execute using the deep nucleus. Only applies to agent-backed stages. |
HALT | Break the stage loop immediately. No further stages execute. |
Component hooks are not re-invoked after retry or escalation. This prevents recursive intervention loops: the watcher decided to retry; the retry result stands. The next stage’s hooks will fire normally.
from operon_ai import WatcherComponent, WatcherConfig, skill_organism
watcher = WatcherComponent(
    config=WatcherConfig(max_intervention_rate=0.5),
    epiplexity_monitor=monitor,  # optional
    budget=atp_store,            # optional
)

organism = skill_organism(
    stages=[...],
    fast_nucleus=fast,
    deep_nucleus=deep,
    components=[watcher, telemetry],
)
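Schematically, the run-loop check behaves like the sketch below. The reserved key name and the function signatures are assumptions for illustration, not Operon's exact internals.

```python
INTERVENTION_KEY = "__watcher_intervention__"  # hypothetical reserved key

def run_stages(stages, shared_state, execute, execute_deep):
    results = []
    for stage in stages:
        result = execute(stage)
        # Component on_stage_result hooks would run here; the watcher
        # may write an intervention under the reserved key during them.
        kind = shared_state.pop(INTERVENTION_KEY, None)
        if kind == "HALT":
            results.append(result)
            break                             # no further stages execute
        if kind == "RETRY":
            result = execute(stage)           # retry result replaces original
        elif kind == "ESCALATE":
            result = execute_deep(stage)      # re-run on the deep nucleus
        # Hooks are NOT re-invoked after retry/escalation: the key was
        # popped once, so the retry result stands.
        results.append(result)
    return results
```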
6. The BIGMAS Insight: Why Routing Decisions Reveal Failure
The most interesting intervention is the one that stops the organism entirely. The theoretical grounding comes from a second paper: “Brain-Inspired Graph Multi-Agent Systems for LLM Reasoning” by Hao, Dai, Qin, and Yu (Chinese Academy of Sciences, March 2026). The paper proposes BIGMAS — a multi-agent framework inspired by Global Workspace Theory (GWT) — and in the process reveals an empirical regularity that turns out to be directly useful for Operon’s watcher.
Global Workspace Theory in Multi-Agent Systems
GWT, originally a theory of human consciousness, posits that flexible cognition arises from the dynamic formation of coalitions among distributed specialized processors, coordinated via a shared central workspace. BIGMAS operationalizes this for LLM reasoning:
- Problem-adaptive graph design. A GraphDesigner agent analyzes each problem and autonomously constructs a task-specific directed agent graph — nodes are specialized roles (generator, validator, analyzer, optimizer), edges are information flow. No fixed topology; the structure adapts to the problem.
- Centralized workspace. All agents coordinate exclusively through a shared workspace with four partitions: context (read-only problem), work (read-write intermediate results), system metadata (execution history), and answer (final output). No point-to-point communication.
- Global orchestrator. A routing agent sees the full workspace state and complete execution history before deciding which node activates next. This avoids the partial-information bottleneck of reactive systems like ReAct.
Operon Mapping: BIGMAS Architecture
BIGMAS’s workspace partitions map directly onto
Operon’s three-layer context model: B_ctx
(problem context) → topology layer;
B_work (intermediate results) →
shared_state ephemeral layer;
B_sys (execution metadata) → telemetry +
watcher signals. The orchestrator role maps to the
WatcherComponent — both make routing
decisions conditioned on global state rather than local
observation.
The Empirical Finding
BIGMAS was evaluated across six frontier LLMs (including DeepSeek-V3.2, Claude 4.5 Sonnet, Gemini 2.5 Pro, and GPT-5) on three combinatorial reasoning benchmarks. The results showed consistent improvements — even GPT-5 went from 91% to 98% on Tower of London — and crucially, gains were orthogonal to model-level reasoning enhancements. Applying BIGMAS to LRMs (models with extended chain-of-thought) yielded additional gains on top of the thinking-mode improvements, showing that architectural coordination and individual model capability are complementary.
But the most useful finding for Operon was in the routing analysis. The paper measured how many orchestrator routing decisions each run required and found a systematic pattern:
| Task | Correct Runs (mean decisions) | Incorrect Runs (mean decisions) |
|---|---|---|
| Game24 | 1.1 | 1.7 |
| Six Fives | 2.9 | 4.6 |
| Tower of London | 7.3 | 9.4 |
Incorrect runs consistently required more routing decisions than successful ones across all tasks. The orchestrator continued cycling through nodes in unsuccessful attempts, accumulating calls without converging. The paper notes this as an “emergent proxy for instance-level difficulty and non-convergence” and suggests it could inform early-stopping mechanisms.
Operationalizing the Signal
That is exactly what Operon’s watcher does. The
_check_convergence() method tracks the ratio of
cumulative interventions (retries + escalations) to observed stages.
When this ratio exceeds max_intervention_rate
(default 0.5), the watcher emits a non-convergence HALT. This is
approximately 20 lines of code in the decision function, but it
encodes the BIGMAS empirical insight: an organism that keeps needing
corrections is not converging, and continuing to retry is worse than
stopping.
The pattern library can then record the failed run, demoting the template for future tasks with similar fingerprints. This closes the loop: BIGMAS provides the signal (routing decisions correlate with failure), the watcher operationalizes it (intervention rate as convergence proxy), and the pattern library learns from it (failed runs lower template scores).
What Operon Borrows, What It Doesn’t
Operon borrows the convergence signal but not the architecture. BIGMAS uses a single centralized workspace with a global orchestrator that sees everything. Operon uses distributed component hooks where the watcher sees stage results sequentially. BIGMAS constructs a fresh graph per problem instance. Operon retrieves templates from a scored library. BIGMAS validates workspace writes with a structured protocol. Operon validates through stage-level type coercion and bi-temporal append-only semantics.
The architectural differences are deliberate. BIGMAS optimizes for single-problem reasoning depth. Operon optimizes for multi-run pattern learning across many problems. BIGMAS’s per-problem graph design is expensive (the paper reports 18-37% of total tokens spent on graph design alone) but appropriate for complex combinatorial problems. Operon’s template retrieval is cheap (a dict lookup + scoring function) and appropriate for repeated enterprise workflows where the same topology works across many similar tasks.
Complementary, Not Competing
BIGMAS and Operon sit at different points on the adaptation-cost spectrum. BIGMAS pays high per-problem cost for maximum structural adaptation. Operon pays low per-problem cost by amortizing adaptation across many runs via the pattern library. A hybrid — use BIGMAS-style graph design for novel problems, then register successful graphs as Operon templates for future retrieval — is a natural Phase 4 extension.
7. Decision Priority
The watcher evaluates signals in a fixed priority order:
- Intervention rate exceeded → HALT (non-convergence)
- Critical immune threat → HALT
- Critical epiplexity → ESCALATE (or HALT if already deep)
- Stagnant epiplexity on fast model → ESCALATE
- Stage failure with retries remaining → RETRY
- Otherwise → no intervention
Higher-priority signals always win. A critical immune threat triggers HALT even if the epiplexity looks fine. This is intentional: the categories are not weighted against each other; they are checked in order of severity.
8. Zero Overhead When Not Attached
Both new components are entirely opt-in. An organism with no
WatcherComponent in its components list
never checks for intervention keys. The
PatternLibrary is a standalone object with no coupling
to the run loop. Existing tests (987 from v0.20) pass without
modification.
9. Closing the Loop: Adaptive Assembly (v0.21.1)
With the scaffolding in place, v0.21.1 closes the adaptive loop.
The AdaptiveSkillOrganism wrapper composes the full
lifecycle: fingerprint → retrieve → assemble →
run → record.
from operon_ai import adaptive_skill_organism, PatternLibrary, TaskFingerprint
lib = PatternLibrary()
# ... register templates ...
adaptive = adaptive_skill_organism(
    "Prepare a Q4 earnings analysis.",
    fingerprint=TaskFingerprint(
        task_shape="sequential", tool_count=3, subtask_count=3,
        required_roles=("researcher", "strategist"),
    ),
    library=lib,
    fast_nucleus=fast,
    deep_nucleus=deep,
    handlers={"intake": intake_fn, "research": research_fn, "strategy": strategy_fn},
)
result = adaptive.run("Prepare a Q4 earnings analysis.")
# result.template — which template was selected
# result.record — PatternRunRecord stored in library
# result.watcher_summary — signal/intervention counts
The factory adaptive_skill_organism() auto-fingerprints
the task if no fingerprint is provided, queries
PatternLibrary.top_templates_for() for the best
template, and calls assemble_pattern() to dispatch
on topology — skill_organism,
reviewer_gate, specialist_swarm, or
single_worker. A WatcherComponent and
TelemetryProbe are automatically attached.
Experience Pool
After each run, the wrapper records a
PatternRunRecord in the library (closing the scoring
feedback loop) and populates the watcher’s experience pool
with ExperienceRecord instances for each intervention.
The experience pool persists across runs: when rule-based decision
logic returns no intervention, the watcher consults past experiences
with matching (stage, signal category, fingerprint shape) and
recommends the intervention kind that was most often successful.
Rule-based decisions always take priority. Experience is a fallback. This ensures backward compatibility — an empty experience pool changes nothing — while allowing the watcher to learn from operational history.
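The fallback lookup can be sketched as follows; the class and method names are illustrative, not the library's ExperienceRecord API.

```python
from collections import Counter

class ExperiencePool:
    """Recommend the historically most-successful intervention kind
    for a matching (stage, signal category, fingerprint shape) key."""

    def __init__(self):
        self.records = []  # (key, intervention_kind, succeeded)

    def record(self, stage, category, shape, kind, succeeded):
        self.records.append(((stage, category, shape), kind, succeeded))

    def recommend(self, stage, category, shape):
        key = (stage, category, shape)
        # Count only successful interventions for this key.
        wins = Counter(k for rk, k, ok in self.records if rk == key and ok)
        if not wins:
            return None  # empty pool changes nothing (backward compatible)
        return wins.most_common(1)[0][0]
```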
The Evo-Devo Inner Loop, Realized
One organism run is one developmental lifetime. The library’s scoring across many lifetimes is evolutionary selection. Templates that consistently succeed rise in the rankings; those that fail sink. The experience pool adds Lamarckian refinement on top: not just which templates work, but which interventions work within those templates. This is the progression the roadmap describes: structure → memory → adaptation. v0.17–0.18 gave explicit structure. v0.19–0.20 gave auditable memory. v0.21 delivers adaptation.
10. Validation
| Suite | Tests | Status |
|---|---|---|
| Pattern repository unit tests | 16 | All pass |
| Watcher component unit tests | 16 | All pass |
| Organism intervention integration tests | 5 | All pass |
| Adaptive assembly unit tests (v0.21.1) | 15 | All pass |
| Experience pool unit tests (v0.21.1) | 14 | All pass |
| Full regression suite at release | 1053 | All pass |
operon_ai/patterns/repository.py,
operon_ai/patterns/watcher.py,
operon_ai/patterns/adaptive.py,
operon_ai/patterns/organism.py,
examples/72_pattern_repository.py,
examples/73_watcher_component.py,
examples/74_adaptive_assembly.py,
examples/75_experience_driven_watcher.py
11. What Comes Next
v0.21 delivers both the scaffolding (v0.21.0) and the adaptive loop (v0.21.1). With patterns stored, scored, and automatically assembled, and with interventions learned from operational history, the next phases build the cognitive extensions that Dupoux, LeCun, and Malik predict will emerge from a working System M.
Phase 5 implements learning through imagination:
SleepConsolidation replays episodic memories from
bi-temporal storage, compresses successful patterns into templates,
and runs counterfactual_replay() to ask “what
would have happened with different facts?” Phase 6
implements learning through communication:
SocialLearning extends QuorumSensing so
organisms can adopt templates that worked for peer organisms, with
epistemic vigilance (trust calibrated to track record) preventing
blind imitation.
The deeper point is the one the paper makes: autonomous learning requires all three systems working together. System A alone gives you pattern recognition without grounding. System B alone gives you trial-and-error without abstraction. System M alone gives you monitoring without anything to monitor. v0.21 is the first release where all three systems are operational and connected — the static scaffolding, the adaptive loop, and the experience pool compose into a working whole. The cognitive extensions are what comes next; the foundation is now in place.
Code and release: github.com/coredipper/operon, operon-ai on PyPI, skill organisms docs, watcher dashboard, adaptive assembly