SLAM Already Solved Stagnation
operon-langgraph-gates v0.1 is a discrete-state port of factor-graph fixed-lag smoothing (Kaess et al. 2012). The port is trivial; the scope discipline that falls out is the upgrade.
A LangGraph user filed issue #6731 asking for protection against infinite agent loops. It was closed, not as wontfix, but as NOT_PLANNED: out of scope for the framework.
I do not blame the maintainers. It is out of scope. But it was also the most common pain point in the four-source sample of LangGraph issues, discussions, and blog posts I reviewed before starting on operon-langgraph-gates. Four sources is not a census, but it was enough to shift my priors. Agents that cannot detect that they have stopped making progress are a live, unsolved reliability problem in production LangGraph deployments right now.
The thing I kept not saying out loud, because it felt a little too neat, is this: there is a well-understood structural pattern from a different domain that matches the shape of this problem. Factor-graph SLAM has been using it since the mid-2000s, formalised incrementally in iSAM2 (Kaess et al. 2012). It is called fixed-lag smoothing over a factor graph, and it is how the factor-graph formulation behind modern SLAM back-ends decides whether a belief about the world is getting better or getting stuck.
Frank Dellaert just published a blog post positioning factor graphs as a concrete, structured instance of the energy-based “world models” the JEPA line of work is building on differently. He calls the loop STAG: Sense, Think, Act, with Graphs. Past graph does perception (fixed-lag smoothing). Future graph does planning (predictive control). They share a dynamics model. Both are energy minimisation.
When I read that post the same morning I was writing the README for the LangGraph gates wedge, I realised something that had been sitting in the code for a year: Operon’s runtime guards have the graph structure of a discrete-state STAG loop. The pre-guard is a dynamics-residual check against a standing checkpoint. The windowed stability verifier is the past-graph smoother it feeds into. The post-guard is a (degenerate) future-graph planner. The cleanest way to put it, borrowed from the companion paper appendix: the guards admit a STAG description even though the code does not use factor-graph machinery to compute it.
This post is the explanation. It is also the theoretical-basis paragraph for operon-langgraph-gates v0.1.
Connected to: the April 21 post on cert-firing
If you read Score-Rejection Isn’t Cert-Firing (and n=10 Isn’t Enough) from two days ago, you saw the sharper version of a distinction this blog keeps returning to: a cert fires only when a named structural property is attested, not when a score crosses a threshold. The April 21 update was me cleaning up that conflation at the evaluation-outcome layer. This post is the same discipline, one layer deeper, at the mechanism layer: what does it mean for the primitive that does the detecting to be structural and not algorithmic?
Factor graphs are the cleanest answer I know. They force you to be concrete about exactly three things — what is a state, what counts as an observation, what is the dynamics model — before you can even describe the loop. If you cannot write those down, your stagnation detector is not a detector; it is vibes. Every post on this blog since What Biological Agent Design Actually Buys You has been making the same argument from a different angle: structural guarantees, not algorithmic sophistication. Factor graphs are what that slogan commits you to if you actually write down the math.
The thing LangGraph users actually want
The ask in issue #6731 is narrow and specific: make the agent halt when it is not making progress. The ask is not “give me a world model” or “prove termination.” The ask is the robot-operator version: if my smoother says the belief has not moved in the last W steps, stop.
The reason the ask is natural is that humans know stuck-ness when they see it. You have watched an LLM agent retry the same failing call six times in a row. You have watched it ask itself “should I use tool X?” and answer “yes, I should use tool X” and then not use tool X. The pattern is unmistakable. The question is only whether your framework can detect it before you hit the budget cap.
LangChain’s answer, for reasons I actually find defensible, is “observability will tell you, after the fact.” LangSmith shows you the trace. You notice. You tighten the prompt. This is fine for one-off debugging and wrong as a runtime reliability story. There is a well-understood structural pattern from a different domain that fits the shape of this class of failures. You can port it, or you can keep not-porting it.
Factor graphs, really briefly
A factor graph has variables (what you care about) and factors (functions that score combinations of variables). The joint score is the product of factor values. For robotics, the variables are states at different times x_{t−5}, …, x_t. The factors come in two flavours:
- Measurement factors score how well a state matches a sensor reading.
- Dynamics factors score how well a state transition matches the motion model.
Fixed-lag smoothing anchors the graph at the current time t and keeps the last W steps in memory. Every time a new observation arrives, you slide the window, add one measurement factor, add one dynamics factor, marginalise the oldest variable into a prior factor, and re-solve. You get a rolling MAP estimate of where the robot has been. If that estimate stops moving, the robot is stuck.
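Written out, the windowed objective looks roughly like the following. This is a sketch in my own notation, not a formula lifted from Kaess et al. or from Dellaert's post: the symbols z_k (the observation at step k) and the factor names φ_prior, φ_meas, and φ_dyn are assumptions introduced here for illustration.

```latex
\begin{aligned}
F(x_{t-W+1:t})
  &= \phi_{\text{prior}}(x_{t-W+1})
     \prod_{k=t-W+1}^{t} \phi_{\text{meas}}(x_k; z_k)
     \prod_{k=t-W+2}^{t} \phi_{\text{dyn}}(x_{k-1}, x_k), \\
\hat{x}_{t-W+1:t}
  &= \arg\max_{x_{t-W+1:t}} F(x_{t-W+1:t})
   = \arg\min_{x_{t-W+1:t}} \bigl[ -\log F(x_{t-W+1:t}) \bigr].
\end{aligned}
```

The prior factor is where the marginalised history lives, and the negative log of the product is the "energy" that the smoother minimises each time the window slides.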
Dellaert’s STAG framing adds the future side: the same variable x_t is also the anchor of a second graph looking forward, with goal factors and control variables. Plan becomes energy minimisation on the future graph. Perception becomes energy minimisation on the past graph. One dynamics model, two graphs glued at the present.
The reason this has survived 20 years of scrutiny is that it forces you to be concrete about exactly three things: what is a state, what counts as an observation, and what is the dynamics model. You cannot wave your hands. If you cannot write those down, your smoother is not a smoother; it is vibes.
The translation to LLM agents
I want to do this term by term because the cleanness of the translation is the point.
State. In Operon, x_t is the agent’s tuple of (genome, expression, short-term memory) between stages. It is symbolic, not Euclidean. The dynamics is deterministic: a checkpoint-predicted transition, not a learned model.
Measurement factor (with one caveat). In operon_ai/core/certificate.py, the behavioral_stability_windowed verifier computes a per-window residual r = max(0, x − τ) where x is the per-window mean of some quality signal and τ is the stability threshold. Each window’s “measurement factor” is the indicator 1[r = 0] — which, to be fair to the factor-graph reader, collapses a likelihood factor to a hard admissibility constraint. We inherit the graph topology but not the probabilistic semantics: no Mahalanobis residual, no information matrix, no covariance on the smoothed belief. That is a real loss and it is the first place a SLAM reader should push back.
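To make the collapse from likelihood to hard constraint concrete, here is a minimal sketch of the residual and its indicator. The function names are hypothetical and this is not the code in certificate.py; it only assumes what the paragraph above states, namely that x is a per-window mean and τ is a fixed threshold.

```python
from statistics import mean


def window_residual(signal, tau):
    """Residual r = max(0, x - tau), where x is the mean of the window's signal."""
    x = mean(signal)
    return max(0.0, x - tau)


def window_admissible(signal, tau):
    """Hard 'measurement factor': the indicator 1[r = 0] (hypothetical helper)."""
    return window_residual(signal, tau) == 0.0


# A window whose mean sits at the threshold is admissible; one above it is not.
print(window_admissible([0.25, 0.5, 0.75], tau=0.5))
print(window_residual([0.5, 1.0], tau=0.5))
```

Note what is missing: nothing here weights the residual by an uncertainty, which is exactly the loss of probabilistic semantics described above.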
Dynamics factor. DNARepair.scan in operon_ai/state/dna_repair.py checks whether the observed genome hash matches the checkpoint’s predicted hash. That is the dynamics residual. The checkpoint is the (deterministic) dynamics model. If the residual is positive, the agent has transitioned into an inadmissible state, and the pre-guard halts.
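A hash comparison of this kind can be sketched in a few lines. Everything named below (state_hash, dynamics_residual, pre_guard, and the choice of SHA-256 over canonical JSON) is an assumption made for illustration; it is not the signature or the hashing scheme of DNARepair.scan.

```python
import hashlib
import json


def state_hash(state: dict) -> str:
    """SHA-256 of a JSON-serialisable state, with sorted keys so order does not matter."""
    payload = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def dynamics_residual(observed: dict, predicted_hash: str) -> int:
    """0 if the observed state matches the checkpoint's predicted hash, else 1."""
    return 0 if state_hash(observed) == predicted_hash else 1


def pre_guard(observed: dict, predicted_hash: str) -> str:
    """Halt on a positive residual, otherwise let the stage proceed."""
    return "halt" if dynamics_residual(observed, predicted_hash) > 0 else "proceed"


# The checkpoint stores the hash it expects to see at the next stage.
checkpoint = state_hash({"genome": "abc", "version": 1})
print(pre_guard({"version": 1, "genome": "abc"}, checkpoint))   # same state, different key order
print(pre_guard({"version": 2, "genome": "abc"}, checkpoint))   # drifted state
```

Because the dynamics model is deterministic, the residual is binary: there is no notion of "close enough" to the predicted state.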
Past graph — windowed verifier, not the pre-guard. The windowed stability verifier is the piece that acts like a fixed-lag smoother: it marginalises a sequence of per-window residuals into a single pass/fail over the window. The pre_guard in compile_guarded_graph (operon_ai/convergence/guarded_graph.py) is a separate, one-step component — a DNARepair.scan against the standing checkpoint. Blog readers sometimes conflate these two; the appendix is careful, so the blog should be too. The pre-guard is the residual check; the windowed verifier is the smoother-shaped component that feeds into pass/fail decisions.
Future graph. Same file, the post-guard runs a rubric verifier and either accepts or routes to retry. Horizon h=1. One goal factor. This is the degenerate but operationally honest version of STAG’s future graph, and I want to be clear that it is deliberately small. The moment you go to h > 1 you are planning, and planning is where agent systems go to die.
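An h=1 post-guard is small enough to sketch directly. The names here are hypothetical and this is not the code in guarded_graph.py; in particular, the retry cap and the "halt" outcome are my own assumptions, added only so the sketch cannot loop forever.

```python
from typing import Callable


def post_guard(
    output: str,
    rubric: Callable[[str], bool],
    attempts: int,
    max_retries: int = 2,
) -> str:
    """One-step lookahead: a single goal factor decides accept versus retry."""
    if rubric(output):
        return "accept"
    # Assumption: cap retries so the guard itself cannot become the infinite loop.
    return "retry" if attempts < max_retries else "halt"


def mentions_total(text: str) -> bool:
    """Toy rubric (hypothetical): the answer must mention a total."""
    return "total" in text.lower()


print(post_guard("Total: 42", mentions_total, attempts=0))
print(post_guard("I should use tool X", mentions_total, attempts=0))
print(post_guard("I should use tool X", mentions_total, attempts=2))
```

There is no search over future actions here, which is the point: one goal factor, one step, one routing decision.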
Shared dynamics. Both guards use the same coalgebraic state transition defined in operon_ai/core/coalgebra.py. One dynamics model, two guards glued at the current stage. Exactly the STAG split.
Joining across agents (with the same caveat). When two agents exchange certificates via the A2A codec (operon_ai/convergence/a2a_certificate.py), they are treating a certificate as a shared factor conditional on schema agreement. The receiving agent can verify the factor locally and fold it in, or forward-without-verify under the graceful-degradation rule. This is not full distributed-SLAM joining — distributed SLAM shares continuous latent state (agreement on a pose), while Operon agents share a predicate name plus a boolean. The monoidal structure of the optimiser is consistent with associative certificate composition; I would not claim more than that without writing the functor down. It is a shared-evidence pattern under a shared schema, with the topology of factor joining but not its semantics.
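The "predicate name plus a boolean" shape can be sketched as follows. The Certificate fields, the receive and compose helpers, and the use of plain conjunction as the composition rule are all assumptions for illustration; none of this is the A2A codec's actual interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class Certificate:
    predicate: str  # name of the attested structural property
    holds: bool     # boolean outcome of the check
    verified: bool  # False when forwarded without local verification


def receive(cert: Certificate, verifiers: Dict[str, Callable[[], bool]]) -> Certificate:
    """Verify locally when the predicate is known, otherwise forward unverified."""
    verifier = verifiers.get(cert.predicate)
    if verifier is None:
        # Graceful degradation: pass the claim along, marked as unverified.
        return Certificate(cert.predicate, cert.holds, verified=False)
    return Certificate(cert.predicate, verifier(), verified=True)


def compose(certs: Iterable[Certificate]) -> bool:
    """Conjunction over booleans: associative, with True as the identity."""
    return all(c.holds for c in certs)


local = {"behavioral_stability_windowed": lambda: True}
a = receive(Certificate("behavioral_stability_windowed", True, False), local)
b = receive(Certificate("unknown_predicate", True, False), local)
print(a.verified, b.verified, compose([a, b]))
```

Grouping does not change the result of a conjunction, which is all the associativity claim above amounts to in this sketch.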
That is the whole mapping. No new math, no new optimiser, no learned factors. Just the claim that the pre/post-guard pattern in front of an LLM has the same graph topology as the rolling reliability loop SLAM has been running in front of a camera for a decade and a half. To borrow a line from the appendix: there is no dual benefit from the structure, only a dual description.
What this gets you that naming does not
The translation matters for four reasons, and I want to separate them so the useful ones stand out from the merely-aesthetic.
1. Citation lineage. “We gate agents with an ad-hoc stagnation check” is a weaker story than “we gate agents with a discrete-state analogue of fixed-lag smoothing, a technique used in production robotics since Kaess et al. 2012.” The second is also what the code actually does. For reviewers, for reliability engineers picking a framework, and frankly for me, the second framing earns its keep.
2. Composition is not ad-hoc. The A2A codec was, in my head, a serialisation format with graceful degradation. After the translation, it is shared-evidence-under-shared-schema with the topology of factor joining. That is still a load-bearing upgrade — reliability gates at different agents compose, and the composition inherits associativity from the schema-and-monoidal structure — but the joining is over booleans, not over continuous latents, so the algebra you get is correspondingly weaker than in distributed SLAM. Still useful. Do not over-read it.
3. Vocabulary for future scope. Every reasonable extension of the gates wedge — longer pre-guard windows, goal factors in the post-guard, richer dynamics from richer checkpoints — now has a name. When somebody proposes learning the factors from data, we know exactly what we are signing up for: we are giving up the structural guarantee and taking on an empirical one. That is sometimes fine. But at least it is visible now.
4. A hard line against scope drift. Here is the load-bearing one. It is very tempting, once you write down a factor graph, to start training the factors. That is the whole JEPA direction, and it is a perfectly respectable research programme. It is also not what Operon does and not what operon-langgraph-gates is going to do. Factors fixed in code. Topology fixed. Only the set of theorems grows. The moment we start fitting residuals to data, the reliability story becomes an empirical one, and I have enough empirical reliability stories in this field.
What I am not claiming
I want to say this clearly because the translation is neat enough that it invites overreach.
- I am not claiming this is novel. Fixed-lag smoothing is 20+ years old. What is new-ish is running it over a symbolic LLM agent state with verification instead of gradient descent as the solver. That is a porting exercise, not a breakthrough.
- I am not claiming Operon is a world model. There is no learned dynamics, no predictive rollout, no energy function in the JEPA sense. The analogy to STAG is structural, not dynamical. Our “energy” is a residual count.
- I am not claiming factor graphs solve LangGraph #6731 by themselves. They give a vocabulary and a design discipline. The actual closing of the loop happens in code, in the wedge repo, in about 200 lines.
The wedge
operon-langgraph-gates v0.1 ships two primitives. StagnationGate wraps the rolling-window stagnation detector that Paper 4 §4.3 benchmarks at 0.960 on convergence and false-stagnation scenarios with real sentence embeddings; its test is, in the language of this post, “the smoothed belief did not move in the last W steps.” IntegrityGate wraps user-defined invariants against a standing checkpoint: a dynamics-residual check. Both are drop-in LangGraph nodes. Neither mentions the words “organism” or “skill,” or anything biological, because nobody wiring up their LangGraph agent cares about the mitochondrial analogy and I have finally internalised that.
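For a feel of what "did not move in the last W steps" means over embeddings, here is a minimal sketch. It is not the StagnationGate implementation or its API: the class name, the cosine-similarity test between consecutive steps, and the default window and threshold are all assumptions chosen for illustration.

```python
import math
from collections import deque


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class RollingStagnationDetector:
    """Hypothetical detector: flags when the last `window` transitions barely moved."""

    def __init__(self, window=3, threshold=0.95):
        self.window = window
        self.threshold = threshold
        self._sims = deque(maxlen=window)  # similarities of consecutive steps
        self._prev = None

    def update(self, embedding):
        """Record one step; return True once the whole window looks stagnant."""
        if self._prev is not None:
            self._sims.append(cosine(self._prev, embedding))
        self._prev = list(embedding)
        return len(self._sims) == self.window and min(self._sims) >= self.threshold


detector = RollingStagnationDetector(window=2, threshold=0.99)
for step in ([1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]):
    print(detector.update(step))
```

In a graph, the boolean would drive a routing decision (continue or halt); the thresholds are fixed in code, not fitted, in keeping with the hard line drawn earlier.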
What the README says, in the paragraph this post exists to earn, is that the primitives are not new. They are a port. They come from the oldest, best-tested reliability loop in autonomy research. And the reason the field has drifted away from them is not that they stopped working — it is that LLM agents arrived on a separate vocabulary track and nobody stopped to notice the overlap.
This post is me stopping and noticing.
- operon-langgraph-gates on GitHub
- Paper 6 appendix §8 — the full term-by-term mapping, worked example with real seed-0 numbers, and an explicit record of where the analogy stops
- Dellaert, “Factor Graphs and World Models” — the GTSAM post this one is written in reply to
- StagnationGate demo on HuggingFace Spaces
- Pinecones and the Portable Certificate — companion cross-domain post: a materials-engineering preprint reaches the same compositional-verification framework, one layer up at the theorem level rather than the mechanism level