
Knowledge Tree: Structured Long-Term Memory for LLMs

Moving beyond context limits with navigable, structured knowledge trees

June 4, 2024


Abstract

Large Language Models are stateless, and context windows are finite. Existing solutions - RAG, extended context, recurrence - each have fundamental tradeoffs. Knowledge Tree takes a different approach: build a navigable tree of structured nodes from long-form content, then let the LLM reason about which branches to explore. Unlike prior work using plain text summaries, each node contains typed metadata (content types, decisions, actions, events) that informs navigation. Combined with partial-answer detection and multi-path retry logic, this enables question answering over corpora far exceeding context limits.


1. Introduction

Large Language Models have transformed how we interact with information. They can summarize documents, answer questions, generate code, and reason through complex problems. Yet they suffer from a fundamental limitation: they are stateless. Every conversation starts fresh. Every context window has a hard limit. The moment content exceeds that limit, something must be discarded.

This matters because real-world knowledge work isn't stateless. Projects accumulate months of decisions, discussions, and documentation. Teams need to recall why a particular architecture was chosen six months ago, or what blockers were discussed in last quarter's retrospectives. The information exists - scattered across tickets, documents, and meeting notes - but it far exceeds what any context window can hold.

1.1 The Limits of Current Approaches

Several approaches attempt to bridge this gap, each with fundamental tradeoffs:

Extended Context Windows

Models now support 100K, 200K, even 1M tokens. But longer isn't always better:

  • Attention quality degrades with distance; models exhibit "lost in the middle" effects where information in the center of long contexts is poorly recalled
  • Positional bias causes models to favor content at the beginning or end
  • Cost scales linearly (or worse) with context length
  • The context window remains finite - eventually, you hit the wall

Retrieval-Augmented Generation (RAG)

RAG systems retrieve relevant chunks and inject them into the prompt. This works well for document search but struggles with coherent long-text understanding:

  • Retrieval is optimized for similarity, not relevance to complex queries
  • Chunks are selected independently, losing narrative coherence
  • Multi-hop reasoning ("find X, then use X to find Y") requires multiple retrieval rounds
  • No mechanism to backtrack when retrieved content proves unhelpful

Recurrence and Summarization

Recurrent approaches compress earlier content into summaries, carrying them forward:

  • Each compression step loses information
  • Older content fades as it passes through multiple summarization layers
  • No way to recover detail once it's been compressed away
  • The compression isn't query-aware - important details for future questions may be discarded

1.2 A Different Approach: Memory as Navigation

Knowledge Tree treats long-form memory as a structure to navigate rather than content to retrieve or compress.

This mirrors how humans handle large bodies of knowledge. We don't load everything into working memory at once. We build mental models - hierarchies of concepts, indexes of where to find what, intuitions about which areas are relevant to which questions. When we need specific information, we navigate to it: "That decision was made during the architecture review... which happened after the Q2 planning... let me look at the technical decisions from that period."

Applied to LLMs, this means:

  1. Build a navigable structure: Transform long-form content into a tree of nodes, each containing structured metadata about what lies beneath
  2. Navigate, don't retrieve: When a query arrives, the LLM reasons about which branches are most likely to contain relevant information
  3. Extract structured knowledge: Nodes aren't just text summaries - they contain typed metadata (decisions, actions, events, topics) that inform navigation
  4. Recover from wrong turns: If a path proves unhelpful, backtrack and try alternatives

The LLM becomes an active explorer rather than a passive recipient of retrieved chunks. It reasons about where to look, evaluates what it finds, and adapts when its first guess is wrong.

1.3 Contributions

This paper presents Knowledge Tree, extending prior work on interactive reading (MemWalker) with several key innovations:

  • Structured node extraction: Beyond plain summaries, we extract typed metadata - content types, decisions, actions, events - that enables more informed navigation
  • Content type taxonomy: A predefined categorization scheme ensures consistent tagging and enables categorical reasoning ("look for decisions, not meeting notes")
  • Robust partial-answer handling: Explicit detection of incomplete answers triggers systematic exploration of alternative paths
  • Multi-attempt navigation: Configurable retry logic across branches and leaves prevents early termination on wrong paths

Together, these enable question answering over corpora far exceeding context limits.

1.4 Paper Organization

  • Section 2 reviews MemWalker and identifies opportunities for improvement
  • Section 3 details Knowledge Tree's key innovations
  • Section 4 explains the construction and navigation algorithms
  • Section 5 walks through a practical example with real project data
  • Section 6 discusses implementation considerations
  • Section 7 addresses limitations and future directions
  • Section 8 concludes


2. Background: MemWalker

MemWalker (Chen et al., 2023) introduced an interactive reading approach for long-context understanding:

What MemWalker Got Right

  • Two-stage approach: Build a tree structure, then navigate it
  • Iterative prompting: LLM decides which branch to explore
  • Revert capability: Can backtrack if a path proves unfruitful
  • Working memory: Carries context from visited nodes

What MemWalker Left on the Table

  • Plain text summaries: Nodes contain only text, no structure
  • Generic navigation: Chooses based on summary similarity alone
  • Binary outcomes: Limited handling of partial information
  • Single-purpose: Tree is built for one query, then discarded


3. Knowledge Tree: Key Innovations

3.1 Structured Node Extraction

Where MemWalker creates text summaries, Knowledge Tree extracts structured knowledge:

Field             | Purpose                      | Example
Summary           | Concise overview of content  | "Team discussed Q3 roadmap priorities"
Content Types     | Categorization from taxonomy | ["Meeting minutes", "Project plans"]
Critical Actions  | Action items, tasks, TODOs   | "Design review scheduled for Friday"
Decisions         | Choices made, commitments    | "Decided to use PostgreSQL over MongoDB"
Noteworthy Events | Important occurrences        | "Client approved the proposal"
About             | Topics/entities mentioned    | ["authentication", "API redesign", "Q3 goals"]
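These fields map naturally onto a small data class. A minimal Python sketch (the class and attribute names are our own, chosen to mirror the table above):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class NodeSummary:
    """Structured metadata extracted for each Knowledge Tree node."""
    summary: str                                              # concise overview
    content_types: list[str] = field(default_factory=list)    # from the taxonomy
    critical_actions: list[str] = field(default_factory=list) # tasks, TODOs
    decisions: Optional[str] = None                           # choices, commitments
    noteworthy_events: Optional[str] = None                   # important occurrences
    about: list[str] = field(default_factory=list)            # topics/entities

# Example: the metadata from the table above
leaf = NodeSummary(
    summary="Team discussed Q3 roadmap priorities",
    content_types=["Meeting minutes", "Project plans"],
    critical_actions=["Design review scheduled for Friday"],
    decisions="Decided to use PostgreSQL over MongoDB",
    noteworthy_events="Client approved the proposal",
    about=["authentication", "API redesign", "Q3 goals"],
)
```

Keeping the optional fields nullable matters later: empty `decisions` or `critical_actions` are themselves navigation signals ("this node records no choices").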

The Content Type Taxonomy

A predefined taxonomy of 50+ content types enables:

  • Consistent categorization across nodes
  • Filtering during navigation ("look for decisions, not meeting notes")
  • Future hybrid retrieval (semantic + category)

Example categories:

  • Meeting notes & minutes
  • Task records & tickets
  • Design documents
  • Decisions & agreements
  • Requirements & specifications

The taxonomy is domain-specific - define categories that match your content.

Why Structure Helps Navigation

Plain summary comparison:

"Summary 0: The team met to discuss project updates..."
"Summary 1: Technical review of the authentication system..."


Structured comparison:

"Option 0: Meeting minutes | Decisions: None | About: [status updates, timeline]"
"Option 1: Design document | Decisions: OAuth2 selected | About: [authentication, security]"


The LLM can now reason: "The question asks about auth decisions. Option 1 explicitly contains decisions about authentication."


3.2 Informed Navigation

Navigation prompts present full metadata, not just summaries:

Options:
- Index: 0
  Summary: "..."
  Content Types: [Meeting minutes]
  Decisions: None
  Critical Actions: ["Schedule follow-up"]
  About: [roadmap, timeline]

- Index: 1
  Summary: "..."
  Content Types: [Design document]
  Decisions: "Selected OAuth2 for authentication"
  Critical Actions: None
  About: [authentication, security, API]

This gives the LLM multiple signals to reason about relevance.
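Assembling that Options section from node metadata is mechanical. A sketch (the `format_options` helper and its dict keys are illustrative, not the system's actual API):

```python
def format_options(options: list[dict]) -> str:
    """Render child-node metadata as the Options section of a navigation prompt.

    The dict keys (summary, content_types, ...) are illustrative field
    names, not the system's actual schema.
    """
    lines = ["Options:"]
    for i, opt in enumerate(options):
        lines.append(f"- Index: {i}")
        lines.append(f'  Summary: "{opt.get("summary", "")}"')
        lines.append(f"  Content Types: {opt.get('content_types', [])}")
        lines.append(f"  Decisions: {opt.get('decisions') or 'None'}")
        lines.append(f"  Critical Actions: {opt.get('critical_actions') or 'None'}")
        lines.append(f"  About: {opt.get('about', [])}")
    return "\n".join(lines)

prompt = format_options([
    {"summary": "...", "content_types": ["Design document"],
     "decisions": "Selected OAuth2 for authentication",
     "about": ["authentication", "security", "API"]},
])
```

Rendering absent fields as an explicit `None` (rather than omitting them) lets the LLM use their absence as a signal.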


3.3 Graceful Degradation

Real-world queries often require exploring multiple paths. Knowledge Tree handles this with:

Partial Answer vs No Answer

Response        | Meaning                                | Action
No Answer       | Content is irrelevant to query         | Try different leaf/branch
Partial Answer  | Some information found, but incomplete | Try additional paths, may combine
Complete Answer | Query fully satisfied                  | Return response
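The three outcomes map naturally onto an enum plus a small classifier over the leaf response's boolean flags. A minimal sketch (names are our own):

```python
from enum import Enum

class AnswerStatus(Enum):
    NO_ANSWER = "no_answer"  # content irrelevant: try a different leaf/branch
    PARTIAL = "partial"      # incomplete: try additional paths, may combine
    COMPLETE = "complete"    # query fully satisfied: return response

def classify(response: dict) -> AnswerStatus:
    """Map the leaf-evaluation booleans onto the three outcomes above."""
    if response.get("No Answer"):
        return AnswerStatus.NO_ANSWER
    if response.get("Partial Answer"):
        return AnswerStatus.PARTIAL
    return AnswerStatus.COMPLETE
```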

Multi-Attempt Strategy

max_branch_attempts = 3
leaves_per_branch = 2

for each branch attempt:
    select best branch
    for each leaf attempt:
        select best leaf
        try to answer
        if complete: return
        if partial/none: remove leaf, retry
    if still incomplete: remove branch, retry

This systematic exploration prevents early termination on wrong paths.
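The strategy above can be fleshed out into runnable form. In this sketch, `select_best` and `try_answer` are hypothetical stand-ins for the LLM navigation and leaf-evaluation calls, and the `"leaves"` key is an assumed branch structure:

```python
def navigate(branches, query, select_best, try_answer,
             max_branch_attempts=3, leaves_per_branch=2):
    """Multi-attempt navigation: retry across leaves, then across branches.

    `select_best(options, query)` and `try_answer(leaf, query)` stand in
    for the LLM calls; `try_answer` returns (status, answer) where status
    is "complete", "partial", or "none".
    """
    best_partial = None
    branches = list(branches)
    for _ in range(max_branch_attempts):
        if not branches:
            break
        branch = select_best(branches, query)
        leaves = list(branch["leaves"])
        for _ in range(leaves_per_branch):
            if not leaves:
                break
            leaf = select_best(leaves, query)
            status, answer = try_answer(leaf, query)
            if status == "complete":
                return answer
            if status == "partial" and best_partial is None:
                best_partial = answer
            leaves.remove(leaf)        # remove tried leaf, retry
        branches.remove(branch)        # still incomplete: remove branch, retry
    return best_partial                # best partial answer, or None
```

Removing tried leaves and branches from the candidate lists is what prevents the loop from re-selecting the same dead end.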


4. How It Works

4.1 Tree Construction

Diagram

flowchart TB
    subgraph "Input"
        CONTENT[/"Long-form content<br/>(documents, tickets, notes)"/]
    end

    subgraph "Stage 1: Segmentation"
        CONTENT --> SEG["Split into fixed-size chunks<br/>(~5000 characters each)"]
    end

    subgraph "Stage 2: Leaf Extraction"
        SEG --> LEAF["For each chunk:<br/>Extract structured summary<br/>+ Content Types<br/>+ Decisions<br/>+ Actions<br/>+ Events<br/>+ About"]
        LEAF --> LEAVES[("LEAF nodes")]
    end

    subgraph "Stage 3: Branch Synthesis"
        LEAVES --> GROUP["Group leaves<br/>(5-8 per branch)"]
        GROUP --> BRANCH["Aggregate summaries<br/>Merge Content Types<br/>Merge About topics"]
        BRANCH --> BRANCHES[("BRANCH nodes")]
    end

    subgraph "Stage 4: Root Creation"
        BRANCHES --> ROOT["Synthesize final summary<br/>from all branches"]
        ROOT --> ROOTNODE[("ROOT node")]
    end

Each node contains:

  • Content: Original text (for leaves) or child references (for branches/root)
  • Summary: Structured metadata object with all extracted fields
  • Parents: References to child nodes for traversal (note: despite the name, this field points downward, to children)
  • Level: Node type identifier (Leaf, Branch, Root)
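The four construction stages can be sketched end to end. Here `extract_summary` is a stand-in for the LLM summarization call, and the node dicts use the field names above:

```python
def build_tree(text, extract_summary, chunk_size=5000, group_size=6):
    """Four-stage construction sketch: chunks -> leaves -> branches -> root.

    `extract_summary(content, level)` stands in for the LLM summarization
    call and returns a structured-metadata dict.
    """
    # Stage 1: fixed-size segmentation (~5000 characters per chunk)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    # Stage 2: one leaf per chunk, each with structured metadata
    leaves = [{"level": "Leaf", "content": c,
               "summary": extract_summary(c, "Leaf")} for c in chunks]
    # Stage 3: group 5-8 leaves per branch and aggregate their summaries
    branches = []
    for i in range(0, len(leaves), group_size):
        group = leaves[i:i + group_size]
        branches.append({"level": "Branch", "parents": group,
                         "summary": extract_summary(
                             [l["summary"] for l in group], "Branch")})
    # Stage 4: synthesize the root from all branches
    root = {"level": "Root", "parents": branches,
            "summary": extract_summary(
                [b["summary"] for b in branches], "Root")}
    return root
```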

4.2 Query Navigation

Diagram

flowchart TB
    START([Query Received]) --> ROOT[Start at ROOT]
    ROOT --> PRESENT[Present child nodes<br/>with full metadata]
    PRESENT --> SELECT{LLM selects<br/>best option}

    SELECT --> |"Selected BRANCH"| DESCEND[Descend to branch]
    DESCEND --> PRESENT

    SELECT --> |"Selected LEAF"| ATTEMPT[Attempt to answer<br/>from leaf content]
    ATTEMPT --> CHECK{Answer<br/>complete?}

    CHECK --> |"Complete"| RETURN([Return answer])
    CHECK --> |"Partial/None"| BACKTRACK[Remove tried leaf<br/>Backtrack]
    BACKTRACK --> RETRY{More leaves<br/>to try?}
    RETRY --> |"Yes"| PRESENT
    RETRY --> |"No"| BRANCHBACK[Backtrack to<br/>parent branch]
    BRANCHBACK --> RETRY2{More branches<br/>to try?}
    RETRY2 --> |"Yes"| PRESENT
    RETRY2 --> |"No"| BEST([Return best<br/>partial answer])

The navigation loop continues until either:

  1. A complete answer is found
  2. All retry attempts are exhausted (returns best partial answer)
  3. No relevant content exists (returns "no answer")

4.3 The Navigation Loop

At each node, the LLM:

  1. Observes structured metadata from all child options
  2. Reasons about which option most likely contains relevant information
  3. Decides which path to explore
  4. Evaluates whether the answer is complete
  5. Adapts by backtracking if the path proves unfruitful

This loop continues until a complete answer is found or all paths are exhausted.

See Section 5 for a complete end-to-end walkthrough with real data.

5. Practical Example: Project Management

To illustrate Knowledge Tree in practice, we walk through a real scenario: a project management assistant that needs to answer questions about a software project's history spanning several months of activity.

5.1 The Scenario

Input corpus: 6 months of project data for "Botterfly MVP" including:

  • 50+ Jira tickets with descriptions, status changes, and assignments
  • Project documentation (features, architecture, integrations)
  • Team information and roles
  • Business model and pitch deck content

Total content size: ~40,000 tokens - beyond many models' context windows at the time, and too costly to process in full on every query

Goal: Answer natural language questions like:

  • "What UI issues were reported in November?"
  • "Who is working on the notification system?"
  • "What integrations are planned?"


5.2 Tree Construction

The construction phase transforms raw content into a navigable structure:

Diagram

flowchart TB
    subgraph Input
        RAW[/"40,000 tokens of project data"/]
    end

    subgraph "Stage 1: Chunking"
        RAW --> C1[Chunk 1<br/>5000 chars]
        RAW --> C2[Chunk 2<br/>5000 chars]
        RAW --> C3[Chunk 3<br/>5000 chars]
        RAW --> C4[...]
        RAW --> C8[Chunk 8<br/>5000 chars]
    end

    subgraph "Stage 2: Leaf Extraction"
        C1 --> L1[Leaf 1]
        C2 --> L2[Leaf 2]
        C3 --> L3[Leaf 3]
        C4 --> L4[...]
        C8 --> L8[Leaf 8]
    end

    subgraph "Stage 3: Branch Aggregation"
        L1 --> B1[Branch 1]
        L2 --> B1
        L3 --> B1
        L4 --> B2[Branch 2]
        L8 --> B2
    end

    subgraph "Stage 4: Root Synthesis"
        B1 --> ROOT[Root]
        B2 --> ROOT
    end

Each node stores structured metadata, not just text summaries.


5.3 Node Structure

A leaf node extracted from ticket data might look like:

LEAF NODE: L3
├── level: "Leaf"
├── content: [raw chunk - 5000 chars of ticket data]
└── summary:
    ├── Summary: "UI refinements for dashboard including navbar
    │            fixes, margin adjustments, and scroll behavior"
    ├── Content Types: ["Bug & issue tracking records",
    │                   "Task lists & tickets"]
    ├── Critical Actions: ["Fix navbar font color",
    │                      "Adjust margins per Figma",
    │                      "Make columns scrollable"]
    ├── Decisions: "Replace history icon with new chat icon"
    ├── Noteworthy Events: None
    └── About: ["UI", "dashboard", "navbar", "Beenish Khan",
                "BMVP-66", "scroll behavior"]

A branch node aggregating multiple leaves:

BRANCH NODE: B1
├── level: "Branch"
├── parents: [L1._id, L2._id, L3._id]
└── summary:
    ├── Summary: "Frontend development tasks including UI fixes,
    │            component design, and dashboard implementation"
    ├── Content Types: ["Bug & issue tracking records",
    │                   "Task lists & tickets",
    │                   "Design documents"]
    ├── Critical Actions: [aggregated from children]
    ├── Decisions: [aggregated from children]
    └── About: ["UI", "dashboard", "React", "frontend", ...]
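Branch aggregation - the "[aggregated from children]" entries above - can be sketched as a merge over child metadata. The deduplicating union is an assumption about the merge policy; in practice the branch-level Summary would come from an LLM synthesis call rather than the simple join shown here:

```python
def aggregate_children(children: list[dict]) -> dict:
    """Merge leaf summaries into branch-level metadata (sketch).

    Content Types and About become order-preserving deduplicated unions;
    Decisions are collected from children that have them; the Summary
    join is a placeholder for an LLM synthesis call.
    """
    def union(key):
        seen, out = set(), []
        for child in children:
            for item in child.get(key, []):
                if item not in seen:
                    seen.add(item)
                    out.append(item)
        return out

    return {
        "Summary": " / ".join(c.get("Summary", "") for c in children),
        "Content Types": union("Content Types"),
        "Critical Actions": union("Critical Actions"),
        "Decisions": [c["Decisions"] for c in children if c.get("Decisions")],
        "About": union("About"),
    }
```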

5.4 Navigation Walkthrough

Query: "What UI issues were reported and who is fixing them?"

Diagram

flowchart TB
    subgraph "Step 1: Root Selection"
        ROOT[ROOT] --> |"Present options"| CHOOSE1{LLM Chooses}
        CHOOSE1 --> |"Option 0: Frontend tasks<br/>Content Types: Bug tracking, Tasks<br/>About: UI, dashboard, React"| B1[Branch 1 ✓]
        CHOOSE1 -.-> |"Option 1: Backend & integrations<br/>Content Types: API docs, Integration<br/>About: MS Teams, APIs"| B2[Branch 2]
    end

    subgraph "Step 2: Branch Selection"
        B1 --> |"Present leaves"| CHOOSE2{LLM Chooses}
        CHOOSE2 --> |"Leaf 2: Sign-up changes<br/>Decisions: Role dropdown"| L2[Leaf 2]
        CHOOSE2 -.-> |"Leaf 3: Dashboard UI issues<br/>Actions: Fix navbar, margins<br/>About: UI, Beenish Khan"| L3[Leaf 3 ✓]
    end

    subgraph "Step 3: Answer Attempt"
        L2 --> |"First attempt"| ANS1{Try Answer}
        ANS1 --> |"Partial Answer:<br/>Found assignee but<br/>not all UI issues"| RETRY[Backtrack]
        RETRY --> L3
        L3 --> ANS2{Try Answer}
        ANS2 --> |"Complete Answer"| DONE[Return Response]
    end

    style B1 fill:#90EE90
    style L3 fill:#90EE90
    style DONE fill:#90EE90

5.5 The Navigation in Detail

At ROOT, the LLM sees:

Option | Summary                                | Content Types         | About
0      | Frontend tasks: UI fixes, dashboard... | Bug tracking, Tasks   | UI, React, dashboard
1      | Backend: APIs, integrations, auth...   | API docs, Integration | MS Teams, Jira, API

The LLM reasons: "Question asks about UI issues. Option 0 explicitly mentions 'UI fixes' and has Content Type 'Bug tracking'. Selecting Option 0."

At Branch 1, the LLM sees leaves:

Option | Summary                                | Critical Actions            | About
0      | Convert frontend from VueJS to ReactJS | None                        | VueJS, ReactJS
1      | Sign-up form changes                   | None                        | sign-up, dropdown
2      | Dashboard UI issues                    | Fix navbar, margins, scroll | UI, Beenish Khan, BMVP-66
3      | Sidebar implementation                 | None                        | sidebar, dashboard

The LLM's first pick is Option 1 (Leaf 2): its summary mentions form changes, a plausible UI match - but, as the next step shows, an incomplete one.

At Leaf 2 (first attempt - wrong path):

The LLM finds sign-up related content but not comprehensive UI issues.

Response: { "Partial Answer": true, "Answer": "Found role dropdown change..." }

Retry mechanism triggers → removes Leaf 2, tries Leaf 3

At Leaf 3 (correct path):

The LLM finds the full ticket content:

"Font color in navbar is all white. Not visible. Paddings and margins are bigger than in the UI design... Each column needs to be individually scrollable..."

Assignee: Beenish Khan

Response: { "Partial Answer": false, "Answer": "Several UI issues were reported including navbar visibility, margin adjustments, scroll behavior, and icon changes. Beenish Khan is assigned to fix these issues (BMVP-66)." }


5.6 Where Knowledge Tree Made the Difference

Structured Metadata Enabled Precise Navigation

Without Structure                | With Structure
"Summary mentions dashboard..."  | Content Types: ["Bug tracking"] → clearly issue-related
"Might be about UI?"             | About: ["UI", "navbar", "Beenish Khan"] → confirms relevance
"Unknown who's responsible"      | Critical Actions list shows specific fixes

Partial Answer Handling Prevented False Negatives

Diagram

flowchart LR
    Q[Query] --> L2[Leaf 2]
    L2 --> |"MemWalker would stop here<br/>with incomplete answer"| FAIL[❌ Incomplete]

    Q --> L2B[Leaf 2]
    L2B --> |"Partial Answer = true"| RETRY[Retry]
    RETRY --> L3[Leaf 3]
    L3 --> |"Complete Answer"| SUCCESS[✓ Full Answer]

    style FAIL fill:#FFB6C1
    style SUCCESS fill:#90EE90

The original MemWalker approach would have returned after the first leaf, missing critical information. Knowledge Tree's explicit partial answer handling triggered exploration of additional leaves.

Content Type Taxonomy Avoided Wrong Branches

The query mentioned "issues" - the taxonomy distinguished between:

  • Bug & issue tracking records (correct)
  • Meeting minutes & notes (wrong - would contain discussion, not tickets)
  • Project plans & roadmaps (wrong - future-looking, not issues)

This categorical signal helped the LLM avoid branches that might have similar keywords but wrong content types.


5.7 Tree Statistics

For this example corpus:

Metric                | Value
Input tokens          | ~40,000
Leaf nodes            | 8
Branch nodes          | 2
Tree depth            | 3 (Root → Branch → Leaf)
Avg navigation steps  | 2.5
Tokens read per query | ~15,000 (37% of total)

The tree structure reduced the tokens processed per query by 63% compared to feeding the entire corpus.


6. Implementation Considerations

Model Selection

  • Reasoning capability is critical (per original MemWalker findings)
  • 70B+ parameter models recommended for complex navigation
  • Smaller models may work for simpler trees / fewer branches

Chunk Size & Tree Depth Tradeoffs

Larger chunks                 | Smaller chunks
Fewer nodes, shallower tree   | More nodes, deeper tree
More context per leaf         | More precise localization
Risk losing detail in summary | Risk fragmenting coherent content

Recommended starting point: ~5000 characters per leaf segment.
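Fixed-size slicing can cut a sentence in half. A softer variant packs whole paragraphs up to the target size; the paragraph-boundary heuristic below is our own suggestion, not part of the original pipeline:

```python
def split_chunks(text: str, target: int = 5000) -> list[str]:
    """Split on paragraph boundaries, packing paragraphs up to ~target chars.

    Chunks stay near the recommended size without cutting a paragraph
    mid-sentence; a single paragraph longer than `target` still becomes
    its own (oversized) chunk.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if current and len(current) + len(para) + 2 > target:
            chunks.append(current)       # flush: adding para would overshoot
            current = para
        else:
            current = current + "\n\n" + para if current else para
    if current:
        chunks.append(current)
    return chunks
```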

Token Budget

  • Navigation prompts grow with number of children
  • Limit children per node (5-8 recommended)
  • Working memory may need truncation for deep traversals


7. Limitations & Future Work

Current Limitations

  • Scaling: Very large corpora produce large trees; construction cost grows linearly
  • Static structure: Tree is built once; updates require reconstruction
  • Single-tree: One tree per corpus; no cross-tree navigation

Future Directions

  • Incremental updates: Add new content without full rebuild
  • Hybrid retrieval: Combine tree navigation with vector similarity
  • Multi-tree federation: Navigate across multiple knowledge trees
  • Self-improvement: Use query patterns to restructure tree over time


8. Conclusion

Knowledge Tree treats long-form memory as a structure to navigate rather than text to retrieve. The LLM decides which branches to explore, evaluates what it finds, and backtracks when needed. Three properties make this work:

Structure enables reasoning. Plain text summaries force the LLM to infer relevance from prose. Structured metadata provides explicit signals: content types enable categorical matching, decision fields surface choices directly, action fields highlight tasks. The LLM reasons about structure, not just semantics.

Graceful degradation beats early termination. Real queries often require information scattered across multiple locations. Explicit partial-answer detection and multi-path retry logic keep the system exploring until it finds complete answers or exhausts its options.

The tree is reusable. Unlike embeddings, Knowledge Tree nodes contain human-readable structured knowledge. A tree built for one query serves future queries, and can support purposes beyond Q&A: generating summaries, identifying patterns, onboarding.

When to Use Knowledge Tree

Knowledge Tree is well-suited for scenarios involving:

  • Coherent long-form content: Project histories, documentation, conversation logs - where context and narrative matter
  • Complex queries: Questions requiring reasoning across multiple pieces of information
  • Persistent knowledge bases: Corpora that will be queried repeatedly, justifying upfront construction cost
  • Explainability needs: The navigation trace shows exactly which content informed each answer

It is less suited for:

  • Simple keyword lookup (traditional search is faster)
  • Rapidly changing content (tree reconstruction has cost)
  • Single-use queries over disposable content

Looking Forward

The limitations noted in Section 7 - static construction, single-tree scope - are engineering challenges, not fundamental barriers. As LLM reasoning capabilities improve, structured navigation approaches become more viable.

The shift is conceptual: from "what text matches this query?" to "where should I look?" That reframing sidesteps the context window problem entirely.


Knowledge Tree builds on the MemWalker approach introduced by Chen et al. (2023), extending it with structured extraction, content taxonomies, and robust partial-answer handling.

Appendix: Prompt Templates

Knowledge Tree uses three core prompt templates: one for tree construction (summarization) and two for navigation (branch selection and leaf evaluation).


A.1 Summarization Prompt (Tree Construction)

Used to extract structured metadata from content chunks during tree building.

Instruction: |
  Evaluate content and extract structured metadata.
  Return JSON only in the format specified under 'Provide Answers'.

Context Explanation:
  Strategy: |
    Build a memory tree that condenses long texts into structured
    summaries, enabling guided navigation to segments relevant
    to user queries.

  Memory Tree Details:
    Total Levels: [Root, Branch, Leaf]
    Current Level: {level}  # Leaf, Branch, or Root

Field Explanations:
  About: |
    Should help traverse the memory tree easily. List everything
    mentioned or discussed in the current content - entities,
    topics, people, systems, etc.

  Content Types: |
    Categorize content using the taxonomy. Avoid over-tagging;
    prefer high-level types when multiple apply. Introduce new
    types if none fit.

  Possible Content Types:
    - Meeting notes & minutes
    - Task records & tickets
    - Design documents
    - Decisions & agreements
    # ... [domain-specific categories]

Content: {content_chunk}

Provide Answers:
  Summary: <Concise overview of content>
  Content Types: [<List of applicable types>]
  Critical Actions: <Action items, tasks, TODOs if any>
  Decisions: <Choices made, commitments if any>
  Noteworthy Events: <Important occurrences if any>
  About: [<List of topics, entities, people mentioned>]

Key design decisions:

  • Explicit field explanations reduce ambiguity
  • Taxonomy provided in-prompt ensures consistency
  • "If any" qualifiers prevent hallucinated metadata
  • About field optimized for navigation relevance
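Wiring this template into code amounts to filling placeholders, calling the model, and parsing JSON with tolerant defaults. A sketch where `call_llm` stands in for any chat-completion API, with the prompt abridged from the full template above:

```python
import json

def summarize_chunk(chunk: str, level: str, call_llm) -> dict:
    """Fill an (abridged) summarization template, call the model, parse JSON.

    `call_llm(prompt) -> str` is a stand-in for any chat-completion API;
    optional "if any" fields default to None instead of raising KeyError.
    """
    prompt = (
        "Evaluate content and extract structured metadata. "
        "Return JSON only.\n"
        f"Current Level: {level}\n"
        f"Content: {chunk}\n"
    )
    raw = json.loads(call_llm(prompt))
    fields = ["Summary", "Content Types", "Critical Actions",
              "Decisions", "Noteworthy Events", "About"]
    return {name: raw.get(name) for name in fields}
```

Normalizing missing fields to `None` keeps downstream navigation code simple: every node exposes the same six keys regardless of what the model chose to emit.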


A.2 Navigation Prompt (Branch/Node Selection)

Used at non-leaf nodes to select which child to explore.

Instruction: |
  The user has asked a question related to the project. Evaluate
  the options and select the option index with highest potential
  to answer the user's question.

  Don't worry about options not fully answering the question.
  Consider 'Content Types', 'Critical Actions', 'Noteworthy Events',
  and 'Decisions' to determine each option's potential.

  Your response will be used to navigate further down the tree.
  Return JSON only in format under 'Provide Answers'.

Memory Tree Details:
  Total Levels: [Root, Branch, Leaf]
  Current Level: {level}  # Branch or Leaf

Project Context:
  Project Name: {project_name}
  Project Root Summary: {root_summary}

# Included when navigating from a branch (provides breadcrumb context)
Selected Branch:  # Optional - only present after first navigation
  Summary: {branch_summary}
  Content Types: {branch_content_types}
  Critical Actions: {branch_actions}
  Decisions: {branch_decisions}
  Noteworthy Events: {branch_events}
  About: {branch_about}

Options:
  - Index: 0
    Summary: {option_0_summary}
    Content Types: {option_0_types}
    Critical Actions: {option_0_actions}
    Decisions: {option_0_decisions}
    Noteworthy Events: {option_0_events}
    About: {option_0_about}

  - Index: 1
    Summary: {option_1_summary}
    # ... same fields

  # ... additional options

Question: {user_query}

Provide Answers:
  Selected Option Index: <integer>
  Selection Reason: <brief justification>

Key design decisions:

  • Full metadata per option enables informed comparison
  • "Selection Reason" captures reasoning (useful for debugging/explainability)
  • Branch context (when present) provides navigation breadcrumb
  • Explicit instruction to consider all metadata fields, not just summary


A.3 Answer Prompt (Leaf Evaluation)

Used at leaf nodes to attempt answering from content, with explicit partial/no answer detection.

Instruction: |
  You are a knowledge assistant. Answer the user's question based
  on the provided context.

  Determine the completeness of your answer:
  - Set 'Partial Answer' to true if any part of the question
    remains unanswered due to insufficient context
  - Set 'No Answer' to true only if you cannot provide any
    response based on the context

Answer Requirements: |
  Write a full, conversational answer under 'Answer' field.
  Ensure all parts of the user's question are addressed.

Response Format: JSON under 'Provide Answers'

Context: {leaf_content}

Question: {user_query}

Partial Answer Explained: |
  Set to true if the provided context is not sufficient to
  completely satisfy ALL aspects of the user's question, or
  if required information is missing. If any sub-question
  remains unanswered, set to true.

No Answer Explained: |
  Set to true ONLY if the provided context does not allow
  for any form of answer whatsoever.

Provide Answers:
  Answer: <full conversational response or null>
  No Answer: <boolean>
  Partial Answer: <boolean>

Key design decisions:

  • Explicit definitions for Partial vs No Answer prevent ambiguity
  • "ALL aspects" language catches multi-part questions
  • Conversational answer requirement ensures useful responses
  • Boolean flags enable programmatic retry logic


A.4 Prompt Design Principles

The prompts follow several principles that improve reliability:

Principle            | Implementation
Structured output    | JSON format with explicit schema
Field explanations   | Each field has inline documentation
Taxonomy constraints | Content types from predefined list
Graceful unknowns    | "If any" qualifiers for optional fields
Reasoning capture    | Selection reason for explainability
Explicit thresholds  | Clear definitions for partial/no answer
Context preservation | Branch breadcrumbs during navigation

A.5 Example: Summarization Prompt in Action

Input content chunk:

BMVP-66: Fix minor UI issues in Dashboard view
Status: Selected for Development
Assignee: Beenish Khan

Description: Font color in navbar is all white. Not visible.
Paddings and margins are bigger than in the UI design.
Make sure to follow the margins used in the figma file.
Icons to collapse each section are inverted...

LLM Response:

{
  "Summary": "UI bug fixes for dashboard view including navbar
              visibility, margin adjustments, and icon corrections",
  "Content Types": ["Bug & issue tracking records",
                    "Task lists & tickets"],
  "Critical Actions": "Fix navbar font color, adjust margins per
                       Figma, correct collapse icons, make columns
                       scrollable",
  "Decisions": "Replace history icon with new chat icon",
  "Noteworthy Events": null,
  "About": ["dashboard", "UI", "navbar", "Beenish Khan",
            "BMVP-66", "Figma", "scroll behavior", "margins"]
}

This structured output enables precise navigation when a user later asks "What UI bugs were reported?" - the Content Type Bug & issue tracking records and About field ["UI", "dashboard", ...] directly signal relevance.


References

Chen, H., Pasunuru, R., Weston, J., & Celikyilmaz, A. (2023). Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading. arXiv:2310.05029.