shelpa: Design and Lessons from a Scrapped Sandbox MCP

Overview

shelpa was an MCP (Model Context Protocol) compliant virtual pipeline shell built to give LLM agents a safe file-writing mechanism. Its design routed every write through the tee command and mirrored each operation to .shelpa/ to preserve edit history. As a concept, it was sound.

However, this tool was ultimately scrapped. This article records shelpa’s design philosophy, security implementation, and why it was abandoned.

Design Philosophy: Consolidating Writes to tee

Intent

Conventional MCP file operation tools allow LLM agents to freely modify files via write_file and edit_file. shelpa’s intent was to restrict the write mechanism to UNIX pipeline tee only:

  rg "pattern" src/ | awk '{print $2}' | tee output.txt

rg, awk, sed, jq, etc. are used as read buffers
The final write goes through tee — the only write mechanism
tee writes are automatically mirrored to .shelpa/, preserving edit history
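
The rules above can be sketched as a small validator. This is a minimal illustration, not shelpa's actual code: the function names split_stages and validate are hypothetical, the allowlist is copied from the tool's help text, and backslash escapes and awk's internal file output are deliberately not handled.

```rust
/// Commands allowed as pipeline stages (list taken from the help text).
const ALLOWED: &[&str] = &["tail", "rg", "awk", "sed", "tr", "jq", "wc", "tee", "fd"];

/// Split a pipeline on unquoted `|` and reject unquoted `>` redirects.
/// Hypothetical helper: only single and double quotes are understood.
fn split_stages(line: &str) -> Result<Vec<String>, String> {
    let mut stages = Vec::new();
    let mut current = String::new();
    let mut quote: Option<char> = None;
    for c in line.chars() {
        match (quote, c) {
            // Closing quote: leave quoted mode.
            (Some(q), _) if c == q => {
                quote = None;
                current.push(c);
            }
            // Anything else inside quotes is literal text.
            (Some(_), _) => current.push(c),
            (None, '\'') | (None, '"') => {
                quote = Some(c);
                current.push(c);
            }
            (None, '>') => return Err("redirects are not allowed; save via tee".into()),
            (None, '|') => {
                stages.push(current.trim().to_string());
                current.clear();
            }
            (None, _) => current.push(c),
        }
    }
    if quote.is_some() {
        return Err("unterminated quote".into());
    }
    stages.push(current.trim().to_string());
    Ok(stages)
}

/// Check every stage against the allowlist and refuse `sed -i`.
fn validate(line: &str) -> Result<Vec<String>, String> {
    let stages = split_stages(line)?;
    for stage in &stages {
        let cmd = stage.split_whitespace().next().ok_or("empty stage")?;
        if !ALLOWED.contains(&cmd) {
            return Err(format!("command not allowed: {cmd}"));
        }
        // Conservative: this also rejects "-i" appearing inside a quoted sed script.
        let in_place = stage
            .split_whitespace()
            .skip(1)
            .any(|a| a.starts_with("-i") || a.starts_with("--in-place"));
        if cmd == "sed" && in_place {
            return Err("sed -i is not allowed; save via tee".into());
        }
    }
    Ok(stages)
}
```

Under these assumptions, the example pipeline above splits into three accepted stages, while `rg x > out.txt` is rejected because the only way to write is a tee stage.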

The Scheme: Disguising Tools as Shell Commands

shelpa’s design had another key intent: disguising the tool interface as shell commands so LLMs would reuse the “how to use a shell” knowledge already acquired during pre-training — essentially, tricking the model into adoption.

LLMs have learned massive amounts of shell scripts and terminal logs during pre-training and “know” how to use commands like tail, rg, awk, sed, and tee. To exploit this knowledge, tool names were intentionally aligned with UNIX shell vocabulary:

Tool Name	Role
`shelpa_pipe`	Execute pipeline commands
`shelpa_tee`	File writing (via tee)
`shelpa_write`	Direct writing (tee wrapper)
`shelpa`	Navigation (cd, pwd)

The actual help output looked like this:

shelpa-mcp (MCP stdio server)
Usage:
  shelpa-mcp [--root <ROOT>] [--help]
Notes:
  - This binary speaks MCP over stdio. It does not serve HTTP.
  - Use your MCP client to call tools below.
  - --root sets the workspace root directory for all tool calls.
  - cwd defaults to the workspace root if not provided.
Tools:
  - shelpa_pipe    Execute a virtual safety pipeline
  (tail  rg  awk  sed  tr  jq  wc  tee  fd  ls  sort  head)
  - shelpa_write   Execute a virtual safety tee (auto guard, save override history)
Allowed pipeline commands: tail rg awk sed tr jq wc tee fd
Navigation commands (single-stage only): pwd  cd <path>  ls [path]
Pipes only. No redirects (> >>). No sed -i. No awk file output. Save via tee only.
CRITICAL: Never use standard file editing tools (such as write_file, replace, etc.)
  - always use the specified tool exclusively.

The hypothesis: an LLM seeing shelpa_pipe would recognize “this is a tool for executing shell pipelines” and naturally construct pipelines using its pre-trained shell knowledge. The CRITICAL line in the help output was also designed as a system prompt instruction, strongly telling the LLM “never use standard file editing tools, use shelpa exclusively.”

Security Design

Vulnerability 1: Path Escape via cwd Option

The tool_pipe cwd parameter was canonicalized but never checked against the workspace root, so a cwd containing ../ segments could move the pipeline outside the workspace.

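// After fix: canonicalize first, then require the result to sit under the workspace root.
let canonical = fs::canonicalize(&resolved)?;
// Path::starts_with compares whole path components, not string prefixes.
if !canonical.starts_with(root.canonicalize()?) {
    return Err(GuardViolation::new(
        GuardReason::PathEscape,
        "cwd must be within workspace root"
    ));
}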
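
For readers who want to try the check in isolation, here is a self-contained sketch using only the standard library. The function name resolve_cwd and the use of io::Error in place of GuardViolation are assumptions made so the example compiles on its own.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Resolve an optional caller-supplied `cwd` against `root`, rejecting escapes.
/// Hypothetical stand-in for the guarded lookup described above.
fn resolve_cwd(root: &Path, cwd: Option<&str>) -> io::Result<PathBuf> {
    let root = fs::canonicalize(root)?;
    let candidate = match cwd {
        // Note: joining an absolute path replaces `root`; the check below still catches it.
        Some(p) => root.join(p),
        // cwd defaults to the workspace root when not provided.
        None => root.clone(),
    };
    // canonicalize resolves `..` segments and symlinks, and fails if the path does not exist.
    let canonical = fs::canonicalize(&candidate)?;
    // Component-wise prefix check: "/ws-evil" does not start with "/ws".
    if !canonical.starts_with(&root) {
        return Err(io::Error::new(
            io::ErrorKind::PermissionDenied,
            "cwd must be within workspace root",
        ));
    }
    Ok(canonical)
}
```

Both sides of the comparison are canonicalized, so a symlinked workspace root (or a temp directory that is itself a symlink) does not cause false rejections.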

Vulnerability 2: Path Escape Within .shelpa

shelpa_write_path directly joined the target, allowing writes outside .shelpa/ via ../ patterns.

// After fix: compute relative path from canonicalized real path
pub fn shelpa_write_path(workspace_root: &Path, real_path: &Path) -> Result<PathBuf, GuardViolation> {
    // Resolve symlinks and ../ segments before any comparison.
    let real_canonical = fs::canonicalize(real_path)?;
    // strip_prefix fails when the real path is not under the workspace root.
    let relative = real_canonical.strip_prefix(workspace_root.canonicalize()?)
        .map_err(|_| GuardViolation::new(GuardReason::PathEscape, "Path is outside workspace"))?;
    // The mirror path is rebuilt from the verified relative part only.
    Ok(workspace_root.join(".shelpa").join(relative))
}
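
One detail worth knowing about this pattern: fs::canonicalize returns an error for a path that does not exist yet. If the mirror path has to be computed before the target file is created, one option is to canonicalize the parent directory and re-attach the file name. The sketch below illustrates that variant; mirror_path_for is a hypothetical name, io::Error stands in for GuardViolation, and nothing here is confirmed to match shelpa's actual handling.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Compute the `.shelpa/` mirror path for a target that may not exist yet.
/// Assumption: the target's parent directory already exists inside the workspace.
fn mirror_path_for(workspace_root: &Path, target: &Path) -> io::Result<PathBuf> {
    let root = fs::canonicalize(workspace_root)?;
    // `file_name` is None for paths ending in `..`, which rejects them outright.
    let name = target.file_name().ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "target has no file name")
    })?;
    // For a bare file name, `parent()` is the empty path and the join yields the root.
    let parent = target.parent().unwrap_or_else(|| Path::new(""));
    // Only the parent is canonicalized, so the file itself need not exist.
    let parent = fs::canonicalize(root.join(parent))?;
    let real = parent.join(name);
    let relative = real.strip_prefix(&root).map_err(|_| {
        io::Error::new(io::ErrorKind::PermissionDenied, "path is outside workspace")
    })?;
    Ok(root.join(".shelpa").join(relative))
}
```

This variant does not resolve the final path component, so a target that is itself a symlink pointing outside the workspace would need the full canonicalization shown above, or a separate check.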

Vulnerability 3: Unbounded Memory Buffering in tee

All output was buffered in memory during tee operations, risking OOM with large outputs. Addressed with two-tier limits:

Hard limit (1MB): Excess is truncated, truncated=true set in metadata
Soft limit (2KB, approval gate): Returns APPROVAL_REQUIRED error when exceeded, requiring explicit user confirmation

// Soft limit: refuse oversized writes unless the caller has explicitly confirmed.
if out_data.len() > TEE_APPROVAL_BYTES && !confirm_oversize {
    return Err(GuardViolation::new(
        GuardReason::ApprovalRequired,
        format!("tee would write {} bytes, exceeding the {}-byte threshold. \
                 Re-invoke with confirm_oversize=true to proceed.",
                out_data.len(), TEE_APPROVAL_BYTES)
    ));
}
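
The two tiers can be sketched together as follows. This is an illustration under stated assumptions rather than shelpa's implementation: read_capped and check_approval are hypothetical names, TEE_HARD_LIMIT_BYTES is an invented constant name for the 1MB limit, and the choice to cap while reading (so memory stays bounded) is one possible way to apply the hard limit.

```rust
use std::io::{self, Read};

const TEE_HARD_LIMIT_BYTES: usize = 1024 * 1024; // hard limit from the text (1MB)
const TEE_APPROVAL_BYTES: usize = 2 * 1024; // soft limit from the text (2KB)

/// Read at most `limit` bytes from `reader`; report whether input was cut off.
fn read_capped<R: Read>(reader: R, limit: usize) -> io::Result<(Vec<u8>, bool)> {
    let mut buf = Vec::new();
    // Read one byte past the limit: that extra byte only signals that more input existed.
    reader.take(limit as u64 + 1).read_to_end(&mut buf)?;
    let truncated = buf.len() > limit;
    buf.truncate(limit);
    Ok((buf, truncated))
}

/// Approval gate: sizes above the soft limit need explicit confirmation.
fn check_approval(len: usize, confirm_oversize: bool) -> Result<(), String> {
    if len > TEE_APPROVAL_BYTES && !confirm_oversize {
        return Err(format!(
            "APPROVAL_REQUIRED: tee would write {} bytes, exceeding the {}-byte threshold",
            len, TEE_APPROVAL_BYTES
        ));
    }
    Ok(())
}

/// Apply both tiers to one tee input: cap first, then gate on the capped size.
fn guarded_tee_input<R: Read>(reader: R, confirm_oversize: bool) -> Result<(Vec<u8>, bool), String> {
    let (data, truncated) =
        read_capped(reader, TEE_HARD_LIMIT_BYTES).map_err(|e| e.to_string())?;
    check_approval(data.len(), confirm_oversize)?;
    Ok((data, truncated))
}
```

Capping at read time means the buffer never grows beyond the hard limit plus one byte, regardless of how much the upstream stage produces.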

Audit Mirror Design

All writes were mirrored to .shelpa/. On each tee execution, the same content was saved in .shelpa/, with timestamp separators inserted on overwrite:

--- shelpa:overwrite ts=2026-02-25T13:17:30Z record_id=1772025450395572000 ---
(new content)
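
A minimal sketch of this append-with-separator behavior is shown below. It assumes the mirror file accumulates every version in order, with the separator written only when the mirror already exists. The function names are hypothetical, and the timestamp and record ID are passed in by the caller rather than generated here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Build a separator line in the format shown above.
fn separator(ts: &str, record_id: u128) -> String {
    format!("--- shelpa:overwrite ts={ts} record_id={record_id} ---\n")
}

/// Append `content` to the mirror file, preceded by a separator on overwrite.
fn append_to_mirror(mirror: &Path, content: &[u8], ts: &str, record_id: u128) -> io::Result<()> {
    if let Some(dir) = mirror.parent() {
        fs::create_dir_all(dir)?;
    }
    // An existing mirror means this write replaces earlier content of the real file.
    let is_overwrite = mirror.exists();
    let mut file = OpenOptions::new().create(true).append(true).open(mirror)?;
    if is_overwrite {
        file.write_all(separator(ts, record_id).as_bytes())?;
    }
    file.write_all(content)
}
```

The sketch writes the separator directly after the previous content, so it only lands on its own line if that content ended with a newline.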

Why It Was Scrapped

The Fundamental Problem: Difficulty of Model Behavior Correction

shelpa’s design was technically sound, but it proved impossible to establish the behavior pattern of “write through shelpa” in LLM agents.

Specifically:

Low tool selection priority: LLMs default to using write_file and edit_file. Even when shelpa usage was enforced via system prompts, models would forget as context grew longer
Pipeline syntax construction errors: Frequent failures in correctly building pipelines like rg pattern | awk '{print $2}' | tee output.txt. Quote escaping was a common point of failure
Shell disguise didn’t fool the model: The intent was to trick LLMs with shell-style naming like shelpa_pipe and shelpa_tee, but models didn’t recognize them as “a type of shell.” The scheme to leverage pre-trained shell knowledge had less effect than expected

Lessons Learned

1. Tools That Fight LLM Default Behavior Won’t Stick

LLMs are strongly pulled toward tool usage patterns learned during pre-training. When providing custom interfaces, system prompts alone are insufficient for correction.

2. You Can’t Fool Models with Shell Disguise

Shell-style naming like shelpa_pipe and shelpa_tee was meant to trick models into reusing pre-trained shell knowledge, but it failed to change LLM behavior. Tool naming tricks are overwhelmingly weaker than the LLM’s internal pattern of “call write_file when writing files.” The model did occasionally send one-liners through the pipe, but reviewing each of those by hand was a chore of its own.

3. The Audit Mirror Idea Itself Is Valid

The design of mirroring all writes to .shelpa/ was useful from a post-hoc audit perspective. Using pattern matching, you could quickly find checkpoints you wanted to roll back to. This was convenient for recovery when an LLM agent accidentally overwrote important files.
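
As an illustration of that pattern matching, the sketch below splits a mirror file into checkpoints at each separator line. It assumes the separator format shown in the Audit Mirror Design section; Checkpoint, parse_separator, and parse_mirror are hypothetical names, and file content that happens to contain a separator-shaped line would be misread as a boundary.

```rust
#[derive(Debug, PartialEq)]
struct Checkpoint {
    /// Timestamp from the separator; None for the initial version.
    ts: Option<String>,
    /// Record ID from the separator; None for the initial version.
    record_id: Option<String>,
    /// File content written at this checkpoint.
    content: String,
}

/// Parse one line as a separator, returning (ts, record_id) on a match.
fn parse_separator(line: &str) -> Option<(String, String)> {
    let body = line
        .trim_end()
        .strip_prefix("--- shelpa:overwrite ")?
        .strip_suffix(" ---")?;
    let mut ts = None;
    let mut id = None;
    for field in body.split_whitespace() {
        match field.split_once('=') {
            Some(("ts", v)) => ts = Some(v.to_string()),
            Some(("record_id", v)) => id = Some(v.to_string()),
            _ => {}
        }
    }
    Some((ts?, id?))
}

/// Split mirror text into checkpoints, oldest first.
fn parse_mirror(text: &str) -> Vec<Checkpoint> {
    let mut out = vec![Checkpoint { ts: None, record_id: None, content: String::new() }];
    // split_inclusive keeps line endings, so content is reproduced byte for byte.
    for line in text.split_inclusive('\n') {
        match parse_separator(line) {
            Some((ts, id)) => out.push(Checkpoint {
                ts: Some(ts),
                record_id: Some(id),
                content: String::new(),
            }),
            // `out` always holds at least one checkpoint, so this unwrap cannot fail.
            None => out.last_mut().unwrap().content.push_str(line),
        }
    }
    out
}
```

Rolling back then amounts to picking the checkpoint with the wanted ts or record_id and writing its content back to the real file.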

Summary

shelpa was an MCP tool built on the concept of “tee as the sole write mechanism + audit mirror,” but it proved impossible to get LLMs to adopt this tool, and it was ultimately scrapped.

Model behavior correction is difficult, and overriding a model’s defaults for operations as fundamental as reading and writing files was not practical at this point. That said, the virtual pipeline design and the security defense patterns were interesting to implement, so even though the tool was scrapped, building it was fun.
