Overview

code-tree is a Rust-based tool that traverses codebases to generate “structured summary contexts”. The generated data is stored in the .ctree/ directory and organized in LLM-friendly formats including tree structures, symbols, dependencies, and diff information.

It is available as a CLI tool (ctree) and as an MCP server (ctree-mcp), so it can be integrated directly with editors and agents.

Design Intent

Initial Design: Starting from Programming Languages

code-tree was originally designed as a static analysis tool for extracting symbols from programming languages like Rust, Go, and Python. However, real-world operation revealed critical gaps:

  • Document Format Handling: In Hugo-based blogs or documentation projects, Markdown content makes up most of the project. Excluding it from the context leaves the project only partially understood
  • Template Structure: Changes to HTML templates (Hugo/Jinja) weren’t reflected in the context

This experience led to extending code-tree to support Markdown and HTML templates. See “Building code-tree HTML Template and Markdown Scanner” for details.

The Need for Context Compression

For large-scale projects, the naive approach of reading all source code each time is impractical.

  • Token Cost Growth: For large repositories, information has to be reduced to a minimal, reproducible structure before it is passed to an LLM
  • Token Waste: Instead of sending full scan results each time, use diff revisions (snapshots/rev/*.txt) and hash references so that only what changed is sent
  • Operational Consistency: A single set of tools centered on ctree_check gives editors and agents the same reliable workflow

Controlling Noise and Variance

In team development environments, output reproducibility is critical.

  • Fixed generation parameters like sw (strong width) and ww (weak width) suppress output variance
  • Hash-based references allow context window savings while drilling down to required information

Primary Specifications

Supported Languages

Language support is modularized, so languages can be added incrementally as they are needed. Implementation currently starts with the languages actually used in development.

Planned language support: Go, Rust, Python, TypeScript, JavaScript, C#, Dart, Lua, Awk, Shell (sh/bash/zsh), Kotlin, Swift, Markdown

Each language is modularized, enabling easy extension through pattern definition and scan logic additions.

Scope Classification

Files are classified into three tiers, in priority order (strong > weak > symbol_only):

  • strong: Generates detailed summaries. Covers function definitions, class definitions, and module structures, i.e. the symbols central to the overall project architecture
  • weak: Generates simplified summaries. Covers utility functions, helper methods, and config files, i.e. reference-level symbols
  • symbol_only: Extracts existence information and primary symbols only. Covers dependency packages, test code, and generated code
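
The priority ordering can be pictured as a first-match lookup over the three tiers. The following is a minimal sketch, not code-tree's actual implementation: the function name classify is hypothetical, the patterns are illustrative, and Python's fnmatch stands in for whatever glob engine the tool really uses (fnmatch lets * cross directory separators, which may differ from the tool's behavior).

```python
from fnmatch import fnmatchcase

# Tier patterns, checked in priority order: strong > weak > symbol_only.
# The globs are illustrative, written in the style of a .ctree.toml file.
SCOPES = [
    ("strong", ["src/main.rs", "src/lib.rs", "src/*/mod.rs"]),
    ("weak", ["src/**/*.rs", "tests/**/*.rs"]),
    ("symbol_only", ["vendor/**/*"]),
]


def classify(path):
    """Return the first (highest-priority) tier whose patterns match path."""
    for tier, patterns in SCOPES:
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return tier
    return None  # the file is outside every configured scope


if __name__ == "__main__":
    for p in ["src/net/mod.rs", "src/net/client.rs", "vendor/lib/util.c", "README.md"]:
        print(p, "->", classify(p))
```

Note that src/net/mod.rs matches both a strong and a weak pattern here; checking the tiers in order is what makes strong win.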

Output Structure (.ctree/)

File                         Purpose
tree_annotated_<lang>.txt    Directory tree with annotations
symbols_<lang>.jsonl         Extracted symbols list (JSON Lines format)
depends_<lang>.jsonl         File and module dependencies
text_store_<lang>.jsonl      Text store for hash references
strong_summary_<lang>.txt    Text summary of strong scope only
weak_summary_<lang>.txt      Text summary including weak scope
ctree_ctx_<lang>.txt         Final context optimized for LLM input

Diff Management and History

Compact diffs are stored in .ctree/snapshots/rev/ as numbered files such as 0001.txt.

  • Symbols are hashed based on path|kind|name and continuously tracked
  • Each revision records symbol additions (+S), symbol deletions (-S), dependency additions (+D), and dependency deletions (-D)
  • Change points are searchable in O(k log n) without merge joins (k: changes, n: total symbols)
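
To make the tracking scheme concrete, here is a rough sketch of hashing the path|kind|name key and classifying changes against the previous snapshot. Several details are assumptions rather than facts about code-tree: the choice of SHA-256, the 12-character truncation, the function names, and the exact "+S <hash>" line layout. Binary search over a sorted hash list is just one way to arrive at a bound of the O(k log n) kind.

```python
import hashlib
from bisect import bisect_left


def symbol_hash(path, kind, name):
    """Hash the identity key path|kind|name (SHA-256, truncated: an assumption)."""
    key = f"{path}|{kind}|{name}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:12]


def is_known(sorted_hashes, h):
    """Binary search: O(log n) membership test against the previous snapshot."""
    i = bisect_left(sorted_hashes, h)
    return i < len(sorted_hashes) and sorted_hashes[i] == h


def revision_lines(previous_sorted, touched):
    """Turn k touched symbols into +S / -S lines in O(k log n).

    touched is a list of (hash, exists_now) pairs, for example the symbols
    of files that changed since the last run.
    """
    lines = []
    for h, exists_now in touched:
        known = is_known(previous_sorted, h)
        if exists_now and not known:
            lines.append(f"+S {h}")  # symbol added
        elif known and not exists_now:
            lines.append(f"-S {h}")  # symbol deleted
    return lines


if __name__ == "__main__":
    old = symbol_hash("src/lib.rs", "fn", "parse")
    kept = symbol_hash("src/lib.rs", "fn", "scan")
    new = symbol_hash("src/lib.rs", "fn", "render")
    previous = sorted([old, kept])
    print(revision_lines(previous, [(new, True), (old, False), (kept, True)]))
```

Because the hash depends only on path, kind, and name, editing a function body leaves its hash unchanged, while renaming or moving it shows up as one deletion plus one addition.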

Configuration Management

Supports configuration via .ctree.toml. Priority order is: “CLI arguments > .ctree.toml > defaults”.

  [ctree]
  strong = ["src/main.rs", "src/lib.rs", "src/*/mod.rs"]
  weak = ["src/**/*.rs", "tests/**/*.rs"]
  symbol_only = ["vendor/**/*"]

  [output]
  template = "markdown"  # Unified Markdown format

Output is unified in Markdown format, enabling direct viewing and editing in Obsidian and other tools.
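
The priority order "CLI arguments > .ctree.toml > defaults" amounts to layering three sources of settings. The sketch below illustrates that layering with plain dictionaries. It is an assumption-laden illustration: the default values are invented, the resolve function is hypothetical, and whether sw and ww can also be set in .ctree.toml is not confirmed by anything stated here.

```python
from collections import ChainMap

# Hypothetical defaults; the tool's real default values are not stated.
DEFAULTS = {"sw": 10, "ww": 3, "template": "markdown"}


def resolve(cli_args, file_config):
    """Layer settings so that CLI arguments > .ctree.toml > defaults."""
    # Options the user did not pass on the command line arrive as None
    # and must not shadow values from the lower layers.
    given = {k: v for k, v in cli_args.items() if v is not None}
    # ChainMap returns the value from the first mapping that has the key.
    return dict(ChainMap(given, file_config, DEFAULTS))


if __name__ == "__main__":
    file_config = {"sw": 20}          # as if read from .ctree.toml
    cli_args = {"sw": None, "ww": 5}  # as if parsed from --sw / --ww
    print(resolve(cli_args, file_config))
```

Filtering out None before layering is the important detail: an omitted flag should fall through to the file or the defaults, not override them.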

MCP Tool Suite

Tool                  Function
ctree_init            Create configuration template
ctree_generate        Generate/update context
ctree_reset           Reset history and regenerate
ctree_check           Combined generation, diff checking, and text retrieval
ctree_get_baseline    Retrieve baseline
ctree_get_revs        Fetch diff history
ctree_get_text        On-demand retrieval of specific text by hash
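
On the wire, an MCP client invokes tools like these with a JSON-RPC 2.0 tools/call request. The sketch below builds such a request for ctree_get_text. The helper name tool_call_request is hypothetical, and the hashes argument simply follows the call shown under Step 4; the authoritative argument schema is whatever ctree-mcp itself advertises.

```python
import json


def tool_call_request(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # The hash values are placeholders, not real text-store entries.
    request = tool_call_request(1, "ctree_get_text", {"hashes": ["abc123", "def456"]})
    print(json.dumps(request, indent=2))
```

In practice an editor or agent framework usually builds this message for you; seeing its shape mainly helps when debugging the server.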

Expected Adoption Benefits

Token Consumption Optimization (Design Expectation)

By passing only a minimal index to the LLM, we expect improvements in both cost and processing speed.

  • Initial context: Compress entire project skeleton with .ctree/ctree_ctx_<lang>.txt
  • Detailed information: Retrieve only required symbols on-demand via hash reference
  • Diff information: Pass only changes since previous snapshot

Verification Status: In practice, when a diff check is run at the start of a new session, the diffs have been enough for the LLM to understand the intent of the ongoing work accurately. Quantitative verification of the token cost reduction is planned as future work.

Reduced Synchronization Cost

Enables “follow latest only” workflows, eliminating the need to resend context per conversation turn.

  • Immediately understand “what changed” from rev file diffs
  • Specify related symbol hashes to dynamically fetch detailed information

Improved Visualization and Review Efficiency

Structural changes become easier to understand, and the starting points for an investigation become clear.

  • Visualize project structure with tree_annotated_<lang>.txt
  • Index all symbols in symbols_<lang>.jsonl for fast grep/jq searching
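
Besides grep and jq, the symbol index is easy to query from a script because each line is one JSON object. The following sketch filters records by kind and name. The field names path, kind, and name are an assumption inferred from the path|kind|name hash key, the sample records are made up, and the real schema may contain other or differently named fields.

```python
import json


def find_symbols(jsonl_lines, kind=None, name_contains=""):
    """Yield symbol records matching a kind and/or a name substring."""
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if kind is not None and record.get("kind") != kind:
            continue
        if name_contains not in record.get("name", ""):
            continue
        yield record


if __name__ == "__main__":
    # Stand-in for the lines of a symbols_<lang>.jsonl file.
    sample = [
        '{"path": "src/lib.rs", "kind": "fn", "name": "parse_config"}',
        '{"path": "src/lib.rs", "kind": "struct", "name": "Config"}',
        '{"path": "src/main.rs", "kind": "fn", "name": "main"}',
    ]
    for rec in find_symbols(sample, kind="fn", name_contains="parse"):
        print(rec["path"], rec["name"])
```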

Step 1: Initialization

  ctree_init --config .ctree.toml
  

Initialize project-specific configuration (strong/weak scope definitions).

Step 2: Baseline Generation

  ctree generate --config .ctree.toml
  

Generate the initial context in the .ctree/ directory. Record this state as revision 0000.

Step 3: Daily Operations

  ctree_check --config .ctree.toml
  

ctree_check performs these operations in one call:

  1. Detect source code updates
  2. Regenerate .ctree/
  3. Generate new rev file (0001.txt, etc.)
  4. Output final context for LLM
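
The first and third steps can be sketched in a few lines. This is only an illustration under stated assumptions: how ctree_check really detects updates is not described here, so content hashing is a stand-in, and the helper names digest, detect_updates, and next_rev_name are hypothetical. The four-digit zero-padded file name follows the 0001.txt pattern.

```python
import hashlib


def digest(content):
    """Content fingerprint for one file (bytes in, hex string out)."""
    return hashlib.sha256(content).hexdigest()


def detect_updates(previous, files):
    """Compare stored digests with current contents.

    previous: {path: digest} from the last run
    files:    {path: bytes} as they are now
    Returns the sorted list of added, modified, or deleted paths.
    """
    current = {path: digest(data) for path, data in files.items()}
    changed = {p for p, d in current.items() if previous.get(p) != d}
    changed |= previous.keys() - current.keys()  # deleted files
    return sorted(changed)


def next_rev_name(last_rev):
    """Zero-padded name of the next revision file, e.g. 1 -> '0002.txt'."""
    return f"{last_rev + 1:04d}.txt"


if __name__ == "__main__":
    previous = {"src/lib.rs": digest(b"fn a() {}"), "src/old.rs": digest(b"")}
    files = {"src/lib.rs": b"fn a() {}\nfn b() {}", "src/new.rs": b"fn c() {}"}
    changed = detect_updates(previous, files)
    if changed:  # only write a new revision when something changed
        print("changed:", changed, "->", next_rev_name(0))
```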

Step 4: LLM Integration

Provide the following as baseline context to the LLM:

  # Project Structure
  <Contents of ctree_ctx_<lang>.txt>

  # Recent Changes (rev 0010)
  <Contents of snapshots/rev/0010.txt>

When detailed information is needed, specify hashes:

  MCP call: ctree_get_text(hashes=["abc123", "def456"])
  

Design Philosophy

Rather than always producing “heavy full text” output, the design emphasizes “lightweight index + on-demand retrieval”, maximizing the effectiveness of LLM integration in large-scale development.

“Telescope” Design

The design of recording only hashes and fetching detailed text on demand offers excellent synergy with LLM agents.

  • Low Magnification (Overview): Examine project structure via tree_annotated_<lang>.txt
  • Medium Magnification (Module Unit): Integrate symbols_<lang>.jsonl with Serena, enabling drill-down from symbol search to structural understanding
  • High Magnification (Code Details): Reference specific symbol implementation via text_store_<lang>.jsonl
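
The high-magnification step boils down to a lookup from hash to stored text. The sketch below shows the idea against an in-memory stand-in for a text store file. The record fields hash and text, the sample hashes, and the function names are all assumptions made for illustration; the real text_store_<lang>.jsonl layout may differ.

```python
import json


def load_text_store(jsonl_lines):
    """Index text-store records by hash for constant-time lookup."""
    store = {}
    for line in jsonl_lines:
        if line.strip():
            record = json.loads(line)
            store[record["hash"]] = record["text"]
    return store


def get_text(store, hashes):
    """Return {hash: text} for the requested hashes, skipping unknown ones."""
    return {h: store[h] for h in hashes if h in store}


if __name__ == "__main__":
    # Stand-in for the lines of a text_store_<lang>.jsonl file.
    lines = [
        '{"hash": "abc123", "text": "fn parse_config() { /* ... */ }"}',
        '{"hash": "def456", "text": "struct Config { /* ... */ }"}',
    ]
    store = load_text_store(lines)
    for h, text in get_text(store, ["abc123", "missing"]).items():
        print(h, "=>", text)
```

Only the requested snippets enter the context window, which is the point of keeping hashes rather than full text in the overview files.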

Volume Control for Monorepos

In projects structured with DDD and Clean Architecture (e.g., features/{module}/{domain,infra,presentation}), volume control works by increasing annotation density for the active module (--sw: strong width) while reducing it for dependencies (--ww: weak width).

Configuration Example:

  # Active module (detailed summary)
  strong = ["features/payment/**/*.rs"]

  # Dependencies (simplified)
  weak = ["features/*/domain/**/*.rs", "features/*/infra/**/*.rs"]

  # Tests and generated code (symbols only)
  symbol_only = ["tests/**/*.rs"]

This enables dynamic context volume control even at monorepo scale:

  ctree generate --config .ctree.toml --sw 20 --ww 5
  

The active module (e.g., payment) includes detailed dependencies and implementations, while other modules’ domain/infra layers provide only structural overviews, reducing token costs while maintaining necessary information.

Current Implementation Status

  • Language Support: Modularized design enables incremental language implementation as needed
  • Diff Management: Diff tracking using the rev file format is complete. In new sessions, the diffs have been confirmed to convey the intent of the ongoing work accurately
  • Token Cost Reduction: While the design is theoretically sound, quantitative verification is planned for future work

Summary

code-tree is a tool that improves LLM agent effectiveness from the “context compression” perspective. It enables lightweight, reproducible context management even for large projects, simultaneously reducing token costs and improving development efficiency.