Introduction

During pathfinder development, multiple optimization cycles refined the system from a simple “return path candidates” function into a sophisticated solution with quantization-aware model selection, dynamic parameter tuning, filesystem integration, and history-based refinement.

Optimization Cycles

Phase 1-2: Foundations and Dynamic TopK

Initial Challenges

  • ONNX model warm-up latency during filesystem indexing
  • Relative vs absolute path confusion as primary LLM failure mode
  • Fixed TopK unable to adapt to project complexity

Solutions

1. .gitignore Integration

  # Automatically exclude based on .gitignore
  __pycache__
  node_modules
  dist
  build
  target
  .git

This reduced index size and improved search precision. The MCP config uses INCLUDE_DIRS for explicit exceptions.

2. Dynamic TopK Adjustment

Replace fixed TopK (e.g., 10) with filesystem-adaptive value:

  TopK = ceil(total_files * ratio)  # ratio: 0.05 or 0.1

Example: 3,000 files → TopK=150; 300 files → TopK=15.

Search space grows proportionally with project size, maintaining latency/precision balance across scales.
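The adjustment above can be sketched as a small helper. This is a minimal sketch: `dynamic_topk` is an illustrative name, and the floor is assumed to come from the RESOLVE_TOPK minimum (default 10) documented in the CLI help.

```rust
/// Adaptive TopK: scale the candidate pool with index size.
/// `ratio` is typically 0.05; `min_topk` mirrors the assumed
/// RESOLVE_TOPK floor (default 10).
fn dynamic_topk(total_files: usize, ratio: f64, min_topk: usize) -> usize {
    let k = (total_files as f64 * ratio).ceil() as usize;
    k.max(min_topk)
}

fn main() {
    assert_eq!(dynamic_topk(3_000, 0.05, 10), 150); // large monorepo
    assert_eq!(dynamic_topk(300, 0.05, 10), 15);    // small project
    assert_eq!(dynamic_topk(40, 0.05, 10), 10);     // floor applies
    println!("ok");
}
```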

3. ReRanking Warm-up

ONNX first inference incurs startup overhead. Mitigation:

  // Pre-warm during initialization
  let _ = rerank("alpha_path", "beta_path", &embedder)?;

Subsequent queries execute with warm runtime, stabilizing latency.

Phase 3-4: Filesystem Integration and Configuration Simplification

Challenges

  • Complex IGNORE_DIRS environment variable configuration
  • .gitignore negative patterns unsupported (e.g., !mxbai-colbert exceptions)
  • Sampling environment too small for realistic benchmarking

Solutions

1. Negative Pattern Parsing

  /models/*        # exclude all under models/
  !mxbai-colbert   # except this directory

The parser was enhanced to handle gitignore negation correctly, implementing the full exclude/re-include precedence logic.
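The last-match-wins negation behavior can be sketched as follows. This is a simplified illustration, not the actual parser: the `matches` helper only handles bare directory-name patterns and stands in for real glob matching and anchoring.

```rust
/// Reduce a pattern like "/models/*" to the directory name "models";
/// a naive stand-in for real gitignore glob matching.
fn matches(path: &str, pattern: &str) -> bool {
    let name = pattern.trim_matches('/').trim_end_matches("/*");
    path.split('/').any(|seg| seg == name)
}

/// Last-match-wins gitignore semantics: a leading '!' re-includes
/// a path that an earlier pattern excluded.
fn is_ignored(path: &str, patterns: &[&str]) -> bool {
    let mut ignored = false;
    for pat in patterns {
        if let Some(body) = pat.strip_prefix('!') {
            if matches(path, body) {
                ignored = false; // negation re-includes
            }
        } else if matches(path, pat) {
            ignored = true;
        }
    }
    ignored // the last matching pattern decided the outcome
}

fn main() {
    let pats = ["/models/*", "!mxbai-colbert"];
    assert!(is_ignored("models/other/weights.bin", &pats));
    assert!(!is_ignored("models/mxbai-colbert/tokenizer.json", &pats));
    assert!(!is_ignored("src/main.rs", &pats));
    println!("ok");
}
```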

2. Design Flip: IGNORE_DIRS → INCLUDE_DIRS

Strategy reversal:

  • Default: Follow .gitignore
  • Exception: Explicitly add via INCLUDE_DIRS env

Projects now mostly auto-configure via .gitignore compliance.
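With this flip, the remaining knob is a single comma-separated variable. A minimal sketch of parsing the INCLUDE_DIRS value documented in the CLI help (`parse_include_dirs` is an illustrative name; the real server may structure this differently):

```rust
use std::collections::HashSet;

/// Parse a comma-separated INCLUDE_DIRS value (e.g. the result of
/// `std::env::var("INCLUDE_DIRS")`) into the set of directory names
/// indexed even when .gitignore would exclude them.
fn parse_include_dirs(raw: &str) -> HashSet<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.to_string())
        .collect()
}

fn main() {
    let inc = parse_include_dirs("dist, models/mxbai-colbert,");
    assert!(inc.contains("dist"));
    assert!(inc.contains("models/mxbai-colbert"));
    println!("ok");
}
```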

3. Sampling Environment Expansion

Constructed 3,000+ file complex tree for realistic benchmarking:

  sampling/
  ├── cmd/server/main.go
  ├── internal/
  │   ├── domain/
  │   │   ├── knowledge/
  │   │   ├── llm/
  │   │   └── pipeline/
  │   └── infra/
  │       ├── llamacpp/
  │       ├── openaihttp/
  │       ├── postgres/
  │       └── vllm/
  ├── devstack/
  └── packages/resolver-go/

Simulates real monorepo scenarios.

Phase 5-6: Quantization Precision Benchmarking

Challenges

  • INT8 accuracy impact unknown
  • Latency vs precision tradeoffs unquantified
  • Monorepo accuracy degradation risk unclear

Solutions

1. Parallel Model/Quantization Testing

  Model candidates:
  ├── mxbai-edge-colbert-v0-32m
  │   ├── model_int8.onnx    ← fast, precision?
  │   ├── model_fp16.onnx    ← balanced
  │   └── model_fp32.onnx    ← baseline
  ├── lightonai-mxbai-edge
  └── ort-comm-colbert-sm-v1

Measured results (full monorepo benchmark: 55 cases spanning all categories including typos, path prefix omissions, extension errors, and ambiguous bare filenames; 3,261 files):

Configuration                           Accuracy        Avg Latency
answerai-colbert-small INT8 (default)   87.3% (48/55)   9.07ms
answerai-colbert-small FP16             87.3% (48/55)   12.61ms
answerai-colbert-small FP32             90.9% (50/55)   1,080ms
mxbai-colbert-edge INT8                 87.3% (48/55)   8.95ms
mxbai-edge-colbert INT8                 87.3% (48/55)   9.12ms

Key finding: all three ColBERT models produced identical accuracy. The 7 failures were caused by the scoring algorithm, not by model limitations, so neural model improvements cannot fix them.

FP32 gained +3.6pp (87.3% → 90.9%) but at a 119x latency cost (9ms → 1,080ms). INT8 offered the best accuracy per millisecond and was adopted as the baseline.

2. Ensemble Experiment → Abandoned

Tested multiple ColBERT ensemble averaging:

  let scores_m1 = rerank_with_model(query, candidates, &model1)?;
  let scores_m2 = rerank_with_model(query, candidates, &model2)?;
  let final_scores: Vec<f32> = scores_m1.iter()
      .zip(&scores_m2)
      .map(|(a, b)| (a + b) / 2.0)
      .collect();

A three-model ensemble still achieved only 87.3% accuracy at 3x the latency (33.2ms), so the approach was abandoned.

Phase 7: History Correlation for Precision Gain

1. Query History Buffer

Maintain in-memory N recent queries (N=5):

  struct QueryHistory {
      queries: Vec<QueryRecord>,
  }

  struct QueryRecord {
      query: String,
      parent_dir: String,
      timestamp: u64,
  }

On new query, check past 5 queries’ parent directories; if match found, re-filter candidates to that parent.
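The buffer's eviction and lookup can be sketched self-contained, restating the structs above for completeness. `push` and `shares_parent` are hypothetical helper names, not necessarily the server's actual API.

```rust
const HISTORY_CAP: usize = 5; // N = 5 recent queries

#[allow(dead_code)]
struct QueryRecord {
    query: String,
    parent_dir: String,
    timestamp: u64,
}

struct QueryHistory {
    queries: Vec<QueryRecord>,
}

impl QueryHistory {
    /// Push a record, evicting the oldest once the buffer is full.
    fn push(&mut self, rec: QueryRecord) {
        if self.queries.len() >= HISTORY_CAP {
            self.queries.remove(0);
        }
        self.queries.push(rec);
    }

    /// Does any recent query share this parent directory?
    fn shares_parent(&self, parent_dir: &str) -> bool {
        self.queries.iter().any(|r| r.parent_dir == parent_dir)
    }
}

fn main() {
    let mut h = QueryHistory { queries: Vec::new() };
    for i in 0..6u64 {
        h.push(QueryRecord {
            query: format!("q{i}"),
            parent_dir: format!("internal/domain/d{i}"),
            timestamp: i,
        });
    }
    assert_eq!(h.queries.len(), 5);                  // capacity bounded
    assert!(!h.shares_parent("internal/domain/d0")); // oldest evicted
    assert!(h.shares_parent("internal/domain/d5"));
    println!("ok");
}
```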

2. Parent Directory Affinity Scoring

  // Same parent as recent query: +20 points
  // Similar structure (depth, pattern): +10 points

  fn compute_history_affinity(
      candidate_dir: &str,
      history: &QueryHistory,
  ) -> f32 {
      // Alignment to the past 5 parent dirs; depth equality stands
      // in here for the fuller structural (depth, pattern) check.
      let depth = |d: &str| d.matches('/').count();
      history.queries.iter()
          .map(|rec| {
              if rec.parent_dir == candidate_dir { 20.0 }
              else if depth(&rec.parent_dir) == depth(candidate_dir) { 10.0 }
              else { 0.0 }
          })
          .fold(0.0, f32::max)
  }

Correlation benchmark results (ambiguous bare filename subset only: 20 scenarios, sequential queries):

Mode                   Accuracy        Breakdown
With history (primed)  85.0% (17/20)   dir_affinity: 13/15, pkg_affinity: 3/4, explicit: 1/1
Baseline (no history)  35.0% (7/20)    dir_affinity: 4/15, pkg_affinity: 2/4, explicit: 1/1
Delta                  +50.0pp         11 improved, 1 regressed

+50pp improvement. Bare filename ambiguity that only resolved at 35% without history was lifted to 85% with a 5-entry ring buffer.

Critical Design Decisions

Skip Threshold (Early Exit Criterion)

Score-based decision to avoid reranking:

  score >= 50.0        → high confidence, return immediately
  30.0 < score < 50.0  → marginal, consider reranking
  score <= 30.0        → low confidence, always rerank

Dynamic threshold adjustment was initially considered, but with INT8 reranking adding only ~9ms of overhead, the pipeline was fast enough that design simplicity was prioritized over further optimization. A relatively high fixed threshold (50) was adopted for now.
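The three-band decision can be sketched as a small function. The enum and function names are illustrative, not the server's actual types; only the thresholds come from the rules above.

```rust
/// Early-exit decision taken after lexical scoring, before the
/// ONNX reranker runs.
#[derive(Debug, PartialEq)]
enum RerankDecision {
    ReturnImmediately, // score >= 50: high confidence
    MaybeRerank,       // 30 < score < 50: marginal
    AlwaysRerank,      // score <= 30: low confidence
}

fn skip_decision(score: f32) -> RerankDecision {
    if score >= 50.0 {
        RerankDecision::ReturnImmediately
    } else if score > 30.0 {
        RerankDecision::MaybeRerank
    } else {
        RerankDecision::AlwaysRerank
    }
}

fn main() {
    assert_eq!(skip_decision(72.0), RerankDecision::ReturnImmediately);
    assert_eq!(skip_decision(40.0), RerankDecision::MaybeRerank);
    assert_eq!(skip_decision(30.0), RerankDecision::AlwaysRerank);
    println!("ok");
}
```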

Reranking Cost-Benefit

INT8 reranking adds ~9ms average overhead, justified by:

  1. Latency increase acceptable for tool calls
  2. Accuracy gains substantial
  3. “Always rerank” simplifies implementation

LLM Instruction Anchoring

Embed into tool initialization:

For file operations in this project, always use tool_retry_with_resolve. Path corrections are automatic.

Report reranking occurrences to LLM to reinforce path resolution reliance.

Technical Implementation Details

Failure Pattern Analysis

Monorepo benchmark breakdown (55 cases):

Category                      Correct  Total  Rate
Cross-package confusion       7        7      100%
Filename typo                 5        5      100%
Wrong nesting depth           4        4      100%
Wrong directory               4        4      100%
Wrong extension               4        7      57%
Combined typo+wrong-pkg       1        2      50%
Other (ambiguous bare names)  9        12     75%

Wrong extension (e.g., matcher.py → matcher.rs) and ambiguous bare filenames (client.go existing in 8 locations) remain open challenges. The former requires scoring design changes; the latter is addressed by history correlation.

Quantization Conclusion

INT8 and FP16 produce identical accuracy (87.3%), differing only in latency (9ms vs 13ms). FP32 gains +3.6pp at a 119x slowdown. INT8 is optimal as the default. The accuracy bottleneck is scoring algorithm design, not model precision.
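The precision switch can be sketched as a lookup from the RESOLVE_MODEL_PRECISION value to the model_*.onnx file name shown in the model tree and CLI help. `model_file_for` is an illustrative name; the real loader may differ.

```rust
/// Map RESOLVE_MODEL_PRECISION ("int8" default, "fp16", "fp32")
/// to the ONNX file loaded from RESOLVE_MODEL_DIR.
fn model_file_for(precision: &str) -> Result<&'static str, String> {
    match precision {
        "" | "int8" => Ok("model_int8.onnx"), // unset falls back to default
        "fp16" => Ok("model_fp16.onnx"),
        "fp32" => Ok("model_fp32.onnx"),
        other => Err(format!("unknown RESOLVE_MODEL_PRECISION: {other}")),
    }
}

fn main() {
    assert_eq!(model_file_for("").unwrap(), "model_int8.onnx");
    assert_eq!(model_file_for("fp32").unwrap(), "model_fp32.onnx");
    assert!(model_file_for("int4").is_err());
    println!("ok");
}
```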

Deliverable

CLI Help Output

  ksh3@desktop.home.arpa ~/Development/loftllc-web % pathfinder-mcp -h
pathfinder-mcp — deterministic path resolution MCP server

USAGE
    pathfinder [OPTIONS]

OPTIONS
    --root <PATH>     Add a project root directory to watch and index.
                      May be specified multiple times.  Defaults to $PWD.
    -h, --help        Print this help message and exit.

DESCRIPTION
    An MCP (Model Context Protocol) server that resolves ENOENT / NotFound
    path errors for AI coding agents.  It builds an in-memory path index of
    configured root directories and uses fuzzy matching combined with ColBERT
    MaxSim re-ranking (when an ONNX model is available) to resolve incorrect
    paths to their most likely existing counterparts.

    Communication is via JSON-RPC over stdin/stdout.  Evaluation metrics are
    written to stderr as JSON lines (redirect with 2>metrics.jsonl).

    The server runs a stdin reader thread with periodic orphan detection and
    exits automatically when the parent MCP client process disappears.

ENVIRONMENT VARIABLES
    RESOLVE_MODEL_PRECISION  Model precision: "int8" (default), "fp16", or "fp32".
    RESOLVE_MODEL_DIR        Model directory containing model_*.onnx + tokenizer.json.
    RESOLVE_TOPK             Minimum topk value (default 10).
    INCLUDE_DIRS             Comma-separated directory names to force-include.

MCP TOOLS
    path_resolve             Resolve a failed file path to the best match.
    tool_retry_with_resolve  Resolve and automatically retry the operation.
    roots_list               Return configured root directories.
    reindex_paths            Force a full rebuild of the path index.

MCP CLIENT CONFIGURATION (Claude Code)
    "pathfinder-mcp": {
      "command": "pathfinder",
      "args": ["--root", "${workspaceFolder}"]
    }
  

Quantitative Summary

Metric               Value
Rust source code     2,117 lines (server 1,951 + inference 166)
Python benchmarks    1,587 lines (6 scripts)
Dependencies         8 crates
MCP tools            4
Scoring features     12 lexical + 2 history + 1 neural
Test sampling files  3,261 files / 628 directories
Models evaluated     3 ColBERT variants
Benchmark suites     4 (basic, small-project, monorepo, correlation)

Conclusion

pathfinder optimization was not a search for a single “perfect” scoring function, but rather iterative refinement through benchmark-driven hypothesis testing.

Key insights from improvement iterations:

  1. History correlation is highly effective: a mere 5-entry ring buffer yielded a +50pp accuracy gain. For bare filename disambiguation, exploiting query context vastly outperforms model improvement.
  2. INT8 is sufficient for this use case: FP32's 119x latency cost buys only +3.6pp of accuracy. For interactive path resolution, INT8 at 9ms is the optimal choice.