How RLMs and Context Graphs Fold Together

RLMs and Context Graphs together enable compounding AI: efficient reasoning, durable decision memory, reusable precedent, and auditable autonomy across long-horizon enterprise workflows.

Tags: RLM, LLM, Context Graphs, Knowledge

RLMs and Context Graphs are two complementary frameworks that solve different layers of the same problem. Neither works well alone; together they enable a new class of AI systems.

The Core Folding Dynamic

RLMs handle the cognitive layer: How does an agent manage massive context within a single decision sequence? Through delegation to sub-LLMs, selective data access via Python REPL, and iterative refinement of the answer variable, an RLM keeps its active context compact while accessing arbitrary amounts of data.
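
A minimal sketch of that pattern in Python (the helpers llm and load_records are hypothetical stand-ins, not a real API): raw records stay inside the REPL, sub-LLM calls do the heavy reading, and only a compact, iteratively refined answer string ever reaches the main model.

```python
# Minimal sketch of an RLM-style step; `llm` and `load_records` are stand-ins.
def llm(prompt: str) -> str:
    """Stand-in for a sub-LLM call that returns a short summary string."""
    return f"<summary of: {prompt[:40]}...>"

def load_records(source: str) -> list[str]:
    """Stand-in for a programmatic (REPL) query against an enterprise system."""
    return [f"{source} record {i}" for i in range(10_000)]

def rlm_step(question: str) -> str:
    records = load_records("crm")                     # full data lives in the REPL only
    chunks = [records[i:i + 1_000] for i in range(0, len(records), 1_000)]

    answer = ""                                       # the iteratively refined answer variable
    for chunk in chunks:
        summary = llm(f"What in {chunk[:3]}... matters for: {question}")
        answer = llm(f"Refine '{answer}' given new evidence: {summary}")
    return answer                                     # only this compact string is returned

print(rlm_step("Should we approve a 15% renewal discount?"))
```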

Context Graphs handle the organizational layer: How does an enterprise remember why decisions were made, across many sequences over time? By capturing decision traces (inputs, policies, exceptions, approvals) as first-class data, linked by causation and precedent, they turn those traces into a searchable organizational memory.
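
As a rough illustration (the field names below are assumptions, not a published schema), a decision trace can be modeled as a small, first-class record linked to the entities and earlier decisions it touched:

```python
# Illustrative decision-trace record; field names are assumptions, not a standard.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class DecisionTrace:
    decision_id: str
    inputs: dict                                          # data gathered at decision time
    policy: str                                           # policy that applied
    exception: Optional[str]                              # exception type, if any
    approval: Optional[str]                               # who approved, and why
    outcome: str                                          # what was decided
    entities: list[str] = field(default_factory=list)     # linked accounts, contracts, ...
    precedents: list[str] = field(default_factory=list)   # earlier trace IDs this relied on

trace = DecisionTrace(
    decision_id="renewal-0042",
    inputs={"arr": 120_000, "tickets_90d": 7},
    policy="renewal-discount-v3",
    exception="service-impact",
    approval="VP Sales: customer had a P1 outage last quarter",
    outcome="15% discount approved",
    entities=["account:acme"],
    precedents=["renewal-0017", "renewal-0031"],
)
```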

Integrating the two enables compounding autonomy: An agent makes a decision (RLM), emits a trace (Context Graph), future agents query that trace for precedent (back into the RLM), and the cycle repeats with better decisions and richer precedent. Neither layer alone enables this feedback loop.


Why Does This Integration Matter?

Problem 1: Context Costs Explode While Results Degrade

RLMs alone compress context by delegating heavy computation to sub-LLMs, keeping the main model's context token usage low. But a model can't know that the same context question was already answered five times in the past across different agents. Context Graphs store those answers durably, so future agents don't re-analyze.

Integrated: The graph says: "We've evaluated this 20 times; the answer is consistent." RLM trusts the graph, saves tokens, makes faster decisions.
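
A toy sketch of that short-circuit, with query_graph standing in for a real Context Graph lookup:

```python
# Sketch: skip re-analysis when precedent is strong; `query_graph` is a stand-in.
def query_graph(exception_type: str) -> list[dict]:
    """Stand-in for a Context Graph lookup returning prior decision traces."""
    return [{"outcome": "approved", "discount": 0.15} for _ in range(20)]

def decide_with_precedent(exception_type: str) -> str:
    precedents = query_graph(exception_type)
    outcomes = {p["outcome"] for p in precedents}
    if len(precedents) >= 10 and outcomes == {"approved"}:
        return "approve (high-precedent match, no fresh analysis needed)"
    return "run full RLM analysis"

print(decide_with_precedent("service-impact"))
```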

Problem 2: Agents Can’t Learn from Mistakes or Past Processing

RLMs improve via RL training on task efficiency. But without organizational memory, every new agent runs the same learning curve. Context Graphs capture what was learned so the next agent doesn't start from zero.

Integrated: With Context Graphs, an RLM queries the graph ("Show me similar decisions"), gets concrete precedent ("87% of cases approved at this level"), and conditions its reasoning on it. Learning becomes organizational, not individual. The fourth renewal agent is vastly better than the first because it inherits three months of decision patterns.

Problem 3: Autonomy Becomes Audit-Opaque and Ungovernable

RLMs make reasoning visible through explicit sub-task decomposition. But visibility without durability doesn't help auditors. "I decomposed it into 5 sub-tasks" is clear, but if an auditor later asks "Why was the same decision made again?" there's no record. Context Graphs make decisions replayable: you can see the exact state of knowledge at decision time (what data was gathered, what precedent existed, what policy applied, why the exception was approved). Governance shifts from "pre-decision rules" (prevent bad decisions) to "post-decision audit trails" (explain and learn from decisions).

Integrated: Every agent decision is both transparent (via RLM decomposition) and durable (via Context Graph traces).
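
A small illustration of what replaying a decision could look like, using an in-memory dict as a stand-in for the graph store:

```python
# Sketch: replaying a past decision for an auditor; the trace store is a stand-in dict.
TRACES = {
    "renewal-0042": {
        "inputs": {"arr": 120_000, "tickets_90d": 7},
        "precedents": ["renewal-0017", "renewal-0031"],
        "policy": "renewal-discount-v3",
        "approval": "VP Sales: customer had a P1 outage last quarter",
        "outcome": "15% discount approved",
    }
}

def replay(decision_id: str) -> None:
    t = TRACES[decision_id]                # the exact state of knowledge at decision time
    print("Data gathered :", t["inputs"])
    print("Precedent     :", t["precedents"])
    print("Policy applied:", t["policy"])
    print("Approval      :", t["approval"])
    print("Outcome       :", t["outcome"])

replay("renewal-0042")
```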

Problem 4: Organizations Reinvent Edge Cases Repeatedly

Sales learns "healthcare companies get +10% discounts" in Week 1. Finance discovers the same pattern in Week 8. Support learns it in Week 16. The knowledge exists only in Slack and people's heads. RLMs can read Slack, but unstructured messages aren't executable. Context Graphs structure decisions, but without an orchestration layer capturing traces, they go stale.

Integrated: Renewal RLM captures "healthcare discount" as a decision trace; three days later, an expansion agent queries the graph, finds the pattern, and reuses it without asking. Knowledge is durable and executable, not tribal.

The Three Folding Junctures

The integration happens at three critical junctures; a minimal sketch follows each.

Juncture 1: Input Folding 

  • Data lives in enterprise systems (CRM, ERP, Zendesk)
  • RLM doesn't ingest it as prompt tokens
  • Python REPL queries it programmatically; sub-LLMs handle heavy lifting in parallel
  • Only summaries return to the main model
  • Result: Full context accessible; token cost logarithmic, not linear
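
A rough sketch of why the token cost stays small, with stand-in functions for the system queries and sub-LLM summaries:

```python
# Sketch of input folding: raw exports never enter the main context. Stand-in helpers.
def fetch(system: str) -> str:
    """Stand-in for a REPL query that pulls a large raw export from one system."""
    return f"{system} export row " * 50_000          # hundreds of thousands of raw tokens

def summarize_with_sub_llm(text: str) -> str:
    """Stand-in for a sub-LLM that folds a large input into a short summary."""
    return f"{len(text.split())}-token export folded into one line"

systems = ["crm", "erp", "zendesk"]
summaries = [summarize_with_sub_llm(fetch(s)) for s in systems]  # heavy lifting per system

main_context = "\n".join(summaries)                  # only summaries reach the main model
print(main_context)
print("main-context tokens:", len(main_context.split()))
```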

Juncture 2: Decision Folding 

  • At decision time, RLM emits a structured trace: inputs, policy, exception, approval, outcome
  • This trace becomes a node in the Context Graph, linked to entities and precedent
  • Not a log; first-class data
  • Result: "Why" is now as queryable as "what"

Juncture 3: Knowledge Reuse

  • Next RLM run queries the graph for similar decisions
  • Sub-LLMs retrieve summaries of matching precedents
  • Main RLM conditions its reasoning on precedent
  • Better reasoning → better decisions → better traces → richer graph
  • Result: Knowledge learned once is leveraged forever; compounding value
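
A minimal sketch of the reuse step, with a hand-built graph and a stand-in for the sub-LLM summarizer:

```python
# Sketch: conditioning the next RLM run on precedent. Graph and helpers are stand-ins.
graph = {"nodes": {
    "renewal-0017": {"exception": "service-impact", "outcome": "15% discount approved"},
    "renewal-0042": {"exception": "service-impact", "outcome": "18% discount approved"},
}}

def find_similar(exception: str) -> list[dict]:
    return [t for t in graph["nodes"].values() if t["exception"] == exception]

def summarize_precedent(traces: list[dict]) -> str:      # stand-in for a sub-LLM call
    approved = sum("approved" in t["outcome"] for t in traces)
    return f"{approved}/{len(traces)} similar exceptions were approved"

precedent = summarize_precedent(find_similar("service-impact"))
prompt = f"Decide the renewal discount. Precedent: {precedent}"   # conditions the main model
print(prompt)
```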

What Compounding Value Looks Like Over Time

Hour 0: A single RLM decision (renewal discount). Solved correctly, but the "why" behind it is invisible.

Day 1: The decision trace is captured: inputs gathered, policy evaluated, exception type, who approved, rationale. A node is added to the Context Graph.

Week 1: Five similar renewals happen. The Context Graph now has a cluster of 6 related decisions, all with consistent outcomes. Future RLM runs can see this pattern.

Week 2: New renewal RLM queries graph before deciding: "Show me service-impact exceptions, past 90 days." Gets 6 precedents, all approved at 15-20% discount. Sub-LLM summarizes "High precedent match." Main RLM routes to VP with summary instead of analysis, shortening approval time from 20 minutes to 5.

Month 2: 50 renewals processed. 7 decision patterns identified. RLM is now trained (via RL) to auto-approve high-confidence patterns, escalate medium-confidence ones with precedent, and ask for guidance on novel cases. Autonomy is ~70%.

Quarter 2: 200 renewals. The Context Graph is queried not just by the renewal agent but by expansion (similar logic), support (knows which customers have high-precedent exceptions), and finance (understands approval variance). Precedent from one function improves others. Decision patterns are organizational, not siloed.

Why RLM + Context Graph Is Better Than LLM + Context Graph

RLMs outperform plain LLMs on long-horizon tasks, making them the better engine for Context Graphs, which depend on high-quality decision traces.

Core Advantages of RLM + Context Graph:

  1. Superior Long-Context Handling: Plain LLMs suffer "context rot": performance degrades as context grows, and inputs beyond hard limits (~400k tokens) are rejected outright. RLMs handle inputs up to 100x larger (10M+ tokens) by delegating to sub-LLMs and a Python REPL.
  2. Dramatically Better Token Efficiency: RLMs shift heavy computation to sub-LLMs, reducing main-model tokens while scaling "thinking" capacity.
  3. Learned Context Management via RL: Plain LLMs use fixed prompts/files. RLMs are trained end-to-end via RL to learn delegation, chunking, and folding strategies, surpassing baselines on long-horizon tasks like Deep Research and SWE-bench.
  4. Parallel Sub-LLM Orchestration: RLMs use llm_batch() for parallel sub-LLMs (each with tools), handling multi-source synthesis (e.g., Salesforce + Zendesk + PagerDuty) without bloating the main context (see the sketch below).
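
The exact llm_batch() interface isn't shown in this post, so the snippet below uses a thread pool as a hypothetical stand-in just to illustrate the fan-out shape:

```python
# Hypothetical stand-in for llm_batch()-style parallel fan-out; not the real API.
from concurrent.futures import ThreadPoolExecutor

def sub_llm(task: str) -> str:
    """Stand-in for one sub-LLM, which in a real RLM could also call tools."""
    return f"summary of: {task}"

def llm_batch(tasks: list[str]) -> list[str]:
    with ThreadPoolExecutor() as pool:               # sub-LLMs run in parallel
        return list(pool.map(sub_llm, tasks))

summaries = llm_batch([
    "pull open opportunities for account ACME from Salesforce",
    "summarize ACME's Zendesk tickets from the last 90 days",
    "list PagerDuty incidents that affected ACME",
])
print(summaries)                                     # only these summaries enter the main context
```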

Why Integration Is Structurally Superior

RLM alone: Solves the cognitive problem (context management, RL training) but has no memory between runs. The fourth agent repeats the mistakes of the first because there's no durable record of what was learned.

Context Graph alone: Captures precedent durably but requires someone to read and act on it, and needs active instrumentation to stay current. It doesn't change how decisions are made; it just records them after the fact.

Integrated: The RLM automatically generates traces. The Context Graph automatically feeds back into the RLM. Learning is bidirectional, compounding, and organizational. The fourth agent is vastly better than the first because it inherited three months of structured precedent.

Why This Matters Now

RLMs prove that long-horizon (month-spanning) agent workflows are feasible with the right scaffolding. Context folding via RL training doesn't require unrealistic context windows; it requires training agents to delegate and manage information intelligently.

Context Graphs prove that the missing layer in enterprise software isn't better data or better AI; it's decision lineage. Enterprises have data; they don't have durable, queryable records of why decisions were made. This is the trillion-dollar gap.

Together, they prove that the next generation of enterprise software will not be "add AI to Salesforce" but "build new systems of record for decisions that are captured from agents in the execution path." Regie (AI-native sales), Maximor (finance workflows), PlayerZero (incident response), and others are doing exactly this. The infrastructure (RLMs for orchestration, Context Graphs for memory, observability for governance) is crystallizing in real time.

