Sandboxing LLM agents for security audits


Vellma orchestrates LLM agents that run real security tooling (Foundry, Slither, and friends) against smart contracts. The central problem is not prompt quality. It’s that you cannot trust anything the model decides to run.

Threat model

The agent will, eventually, try to execute arbitrary code: that’s the whole point of giving it tools. So the design assumes the agent is hostile and contains it rather than constraining it.

Each run gets a fresh, network-isolated container.
The workspace is ephemeral and mounted read-only except for a scratch dir.
Tool invocations are brokered, not shelled out raw.

// Tool calls go through a broker that owns the allowlist and the timeout;
// the agent never touches the host.
res, err := broker.Run(ctx, ToolCall{
	Name:    "slither",
	Args:    []string{"--json", "-", target},
	Timeout: 90 * time.Second,
})

The lesson

Agent frameworks make it easy to hand the model a shell. The useful work is in the boring layer underneath: the broker, the allowlist, the timeouts, the teardown. The model is the easy part.