The Recursion Curse: Why “Summarizing Summaries” Quietly Breaks Long‑Context LLMs
Most teams building LLM features for support analytics, QA, or long‑running workflows eventually hit the same wall: demos look great, but production summaries feel shallow, inconsistent, or just wrong. The failure rarely comes from the model alone—it comes from the architecture pattern behind it.
The most common culprit is recursive, hierarchical summarization: summarizing chunks, then summarizing those summaries, and so on. It’s easy to scale, but every added layer guarantees further information loss and gives errors more room to accumulate.
The Popular Pattern That Doesn’t Scale
The standard approach to “long context” looks like this:
C1, C2, …, Cn
↓
S1 = summarize(C1)
S2 = summarize(C2)
...
Sn = summarize(Cn)
↓
S_final = summarize(S1 + S2 + ... + Sn)
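In code, the pattern usually looks something like the sketch below; `call_llm` is a stand-in for whatever model client you use, not a specific API:

```python
from concurrent.futures import ThreadPoolExecutor

def call_llm(prompt: str) -> str:
    """Placeholder for your model client (OpenAI, Anthropic, local, ...)."""
    raise NotImplementedError

def summarize(text: str) -> str:
    # One lossy, noisy compression step.
    return call_llm(f"Summarize the following:\n\n{text}")

def hierarchical_summary(chunks: list[str]) -> str:
    # Layer 1: summarize each chunk independently (parallelizes cleanly).
    with ThreadPoolExecutor() as pool:
        layer1 = list(pool.map(summarize, chunks))
    # Layer 2: summarize the summaries — this is where loss compounds.
    return summarize("\n\n".join(layer1))
```

Nothing in this loop ever re-reads the raw chunks after layer 1, which is exactly where the problems described below come from.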
This map–reduce / tree summarization pattern is attractive:
- Parallelizes cleanly across many chunks
- Latency grows logarithmically with corpus size
But there is a hidden cost: with every extra layer, the system gets further from the ground truth and more confident about potentially wrong inferences.
In practice, teams see three symptoms:
- Crucial details disappear across boundaries
- Tone and sentiment get exaggerated
- Early hallucinations harden into “facts”
Why This Happens: An Information‑Theory Intuition
A hierarchical pipeline can be modeled as:
X → Y → Z
Where:
- X: original corpus (e.g., months of tickets)
- Y: first‑layer summaries
- Z: final summary
The Data Processing Inequality states:
I(X; Z) ≤ I(X; Y)
Meaning: once information is processed, later stages cannot recover what was lost earlier.
LLM summarization is both:
- Lossy (details are dropped)
- Noisy (hallucinations, sentiment drift, mis‑labeling)
A useful heuristic model for summary quality over multiple hops is:
Q_{t+1} ≈ α Q_t − β
Where:
- α ∈ (0,1) is the signal retained per step
- β > 0 is the noise injected per step
As depth increases, signal decays geometrically while noise accumulates.
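A quick toy calculation makes the decay concrete (α = 0.9 and β = 0.05 are illustrative guesses, not measured values):

```python
# Iterate Q_{t+1} ≈ α·Q_t − β for a few layers of summarization.
# Closed form: Q_t = α^t·Q_0 − β·(1 − α^t)/(1 − α).
alpha, beta = 0.9, 0.05   # assumed signal retention and per-step noise
q = 1.0                   # quality of the raw context, normalized to 1.0
for depth in range(1, 6):
    q = alpha * q - beta
    print(f"depth {depth}: quality ≈ {q:.2f}")
# Signal falls to roughly 0.85 after one layer and under 0.4 by depth 5.
```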
Failure Modes You Actually See in Production
1. Broken Links Across Chunks
Long‑range dependencies break when chunk boundaries hide context:
- Ticket #5: “502 Gateway error in checkout”
- Ticket #50: “It happened again today”
Independent summaries lose the referent of “it,” and the final output drops the actionable detail.
2. Semantic Drift (Telephone Game Effect)
Repeated rewriting nudges content toward model priors:
"slightly annoyed" → "dissatisfied" → "angry"
By the top layer, nuance is gone and sentiment is exaggerated.
3. Hallucination Anchoring
Early assumptions become facts:
- First layer: “User is on iOS” (weak inference)
- Later layers: analyze everything through an iOS lens
Downstream reasoning rationalizes around the error instead of questioning it.
What Works Better in Production
1. Rolling Refinement (Single Evolving State)
Instead of summarizing summaries, maintain a rolling state:
S_{t+1} = f(S_t, C_{t+1})
Each update sees:
- Existing beliefs
- Fresh raw context
This allows later evidence to correct earlier mistakes and preserves long‑range links.
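In code, a rolling-refinement loop is only a few lines. The sketch below reuses the hypothetical `call_llm` stand-in; each step passes the current state plus one raw chunk:

```python
def call_llm(prompt: str) -> str:
    """Placeholder for your model client."""
    raise NotImplementedError

def rolling_summary(chunks: list[str]) -> str:
    state = ""  # S_0: empty beliefs before any evidence
    for chunk in chunks:
        # S_{t+1} = f(S_t, C_{t+1}): the model sees its own beliefs AND fresh
        # raw text, so later chunks can correct earlier mistakes.
        state = call_llm(
            "Current summary of the conversation so far:\n"
            f"{state or '(nothing yet)'}\n\n"
            "New raw context:\n"
            f"{chunk}\n\n"
            "Update the summary. Correct anything the new context contradicts."
        )
    return state
```

The trade-off is sequential latency: these updates cannot be parallelized the way map–reduce layers can.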
2. Schema‑First Extraction
For most enterprise cases, structured state beats prose.
Example schema update:
{
  "errors": ["502 Gateway"],
  "component": "checkout",
  "sentiment": "frustrated",
  "occurrences": 3
}
Each chunk updates fields; aggregation is deterministic. Narrative summaries are generated only after the structured state has stabilized.
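A minimal sketch of the idea, assuming each chunk yields a small dict-shaped update (field names loosely mirror the schema above; the sentiment vote counter and set types are illustrative choices):

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class TicketState:
    errors: set[str] = field(default_factory=set)
    components: set[str] = field(default_factory=set)
    sentiment_votes: Counter = field(default_factory=Counter)
    occurrences: int = 0

def merge(state: TicketState, update: dict) -> TicketState:
    # Deterministic aggregation: no LLM involved at this step.
    state.errors.update(update.get("errors", []))
    if update.get("component"):
        state.components.add(update["component"])
    if update.get("sentiment"):
        state.sentiment_votes[update["sentiment"]] += 1
    state.occurrences += update.get("occurrences", 0)
    return state
```

Because merging is plain code, re-running it is cheap and reproducible; only the per-chunk extraction touches the model.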
3. Anchored Claims
If hierarchical summarization is unavoidable:
- Require every claim to carry message IDs or span references
- Only propagate claims with valid anchors
Unanchored statements are treated as low confidence.
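If claims must be propagated upward, a simple gate like the sketch below (field names are hypothetical) keeps unanchored statements from hardening into facts:

```python
from dataclasses import dataclass

@dataclass
class Claim:
    text: str
    anchors: list[str]  # e.g. ticket/message IDs or character spans

def filter_claims(claims: list[Claim], known_ids: set[str]) -> list[Claim]:
    """Only propagate claims whose anchors resolve to real source messages."""
    anchored = []
    for claim in claims:
        if claim.anchors and all(a in known_ids for a in claim.anchors):
            anchored.append(claim)
        # Unanchored claims are dropped here; alternatively, tag them
        # low-confidence and exclude them from downstream reasoning.
    return anchored
```

Whether unanchored claims are dropped or merely down-weighted is a policy choice; the important part is that they never propagate as unqualified facts.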
The Golden Rule
Summarization behaves like a lossy codec with noise.
Never summarize a summary unless raw context is re‑introduced or updates are applied to a structured schema.
Teams that design with this constraint see more stable analytics, fewer hallucinations, and systems that keep working beyond the demo.
References & Further Reading
- Agenta — Top Techniques to Manage Context Length in LLMs: https://agenta.ai/blog/top-6-techniques-to-manage-context-length-in-llms
- Snowflake Engineering — Impact of Chunking in Long‑Context RAG: https://www.snowflake.com/en/engineering-blog/impact-retrieval-chunking-finance-rag/
- Galileo AI — LLM Summarization in Production: https://galileo.ai/blog/llm-summarization-production-guide
- PromptQL — Failure Modes in RAG Systems: https://promptql.io/blog/fundamental-failure-modes-in-rag-systems
- Zyphra — Graph‑Based RAG and Multi‑Hop QA: https://www.zyphra.com/post/understanding-graph-based-rag-and-multi-hop-question-answering
- arXiv — On Information Loss and Noise in Multi‑Step LLM Reasoning: https://arxiv.org/html/2407.13101v2
- arXiv — Signal Degradation in Multi‑Hop LLM Systems: https://arxiv.org/html/2504.16787
- ICLR Blog — Data Processing Inequality and Representation Learning: https://iclr-blogposts.github.io/2024/blog/dpi-fsvi/
- UC Davis — Information Theory & Data Processing Inequality: https://csc.ucdavis.edu/~cmg/Group/readings/MDM-StochasticResonance.pdf
- Neon Redwood — Hierarchical Insights with LLMs: https://www.neonredwood.com/blog/case-study-llm-application-for-hierarchical-insights-in-glp-1
- Kubiya — Context Engineering Best Practices: https://www.kubiya.ai/blog/context-engineering-best-practices


