Persistent Storage for AI Agent Sandboxes: Volumes, Copy-on-Write Forks, and Snapshots
AI agent sandboxes are ephemeral by default. Every sandbox gets its own isolated filesystem, and when that container stops — by timeout, deletion, or crash — everything inside it is gone. For a single-turn agent that runs, returns a result, and exits, this is fine. For anything longer — multi-step workflows, parallel evaluation runs, pipelines that hand state from one agent to the next — ephemeral storage is the wrong foundation.
Persistent storage for AI agent sandboxes means decoupling the data layer from the container lifecycle: a named volume that exists independently of any sandbox, mountable on demand, and accessible to multiple agents simultaneously via the standard POSIX filesystem interface. No object storage SDK. No serialization. No context window transfers.
This is what Sandbox0 volumes are. The rest of this post explains how they work, how to share them across sandboxes, and how copy-on-write forks make parallel agent workloads storage-efficient.
Why Ephemeral Sandbox Storage and Context Passing Break Down
The problems compound quickly:
Context windows have hard limits. A 200KB file passed through a context window consumes roughly 50,000 tokens. At $15 per million input tokens, every file-sharing hop costs money directly proportional to file size. More importantly, if the file exceeds the context limit, you have to chunk, summarize, or drop data — all of which introduce correctness problems.
You can't resume from where you stopped. If an agent runs for 30 minutes and builds up a local workspace, a crash or timeout loses everything. Restarting from scratch is not just slow — some operations aren't safely repeatable.
Parallel agents need a shared view. If you want ten agents to independently process variants of the same dataset, you either copy the dataset ten times (expensive, slow) or have each agent download it independently (ten times the bandwidth, ten times the startup latency).
Sequential pipelines create coupling. When agent A's output becomes agent B's input through a context variable, your pipeline is brittle. Any schema change in A breaks B. There's no audit trail, no version history, and no way to replay step B with the same inputs.
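The context-window cost quoted above is easy to reproduce. The sketch below assumes roughly 4 bytes per token, which is the ratio implied by the 200KB to 50,000-token estimate; real tokenizers vary with content, so treat the output as an order-of-magnitude figure.

```python
# Rough cost model for passing a file through a context window.
# BYTES_PER_TOKEN is an assumed average (implied by 200 KB ~ 50,000 tokens);
# real tokenizers vary by content and language.
BYTES_PER_TOKEN = 4
PRICE_PER_MILLION_INPUT_TOKENS = 15.00  # USD, the rate quoted in the text


def tokens_for_file(size_bytes: int) -> int:
    """Estimate how many input tokens a file of this size consumes."""
    return size_bytes // BYTES_PER_TOKEN


def cost_per_hop(size_bytes: int) -> float:
    """Estimated USD cost of sending the file through one model call."""
    return tokens_for_file(size_bytes) * PRICE_PER_MILLION_INPUT_TOKENS / 1_000_000


def pipeline_cost(size_bytes: int, hops: int) -> float:
    """Cost when the same file is re-sent at every hop of a pipeline."""
    return cost_per_hop(size_bytes) * hops


if __name__ == "__main__":
    size = 200_000  # 200 KB
    print(tokens_for_file(size))   # 50000
    print(cost_per_hop(size))      # 0.75
    print(pipeline_cost(size, 5))  # 3.75
```

A five-step pipeline that re-sends the same 200KB file at every hop pays about $3.75 under these assumptions, before any output tokens are counted.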
Sandbox Volumes: Persistent Storage Independent of Container Lifetime
A Sandbox0 Volume is a named, persistent storage unit that exists independently of any sandbox. When a sandbox stops — whether it times out, is explicitly deleted, or crashes — the volume data remains. Mount the same volume into a new sandbox, and you're back exactly where you left off.
```bash
# Create a persistent volume
s0 volume create --access-mode RWX

# Mount it to a running sandbox
s0 sandbox volume mount \
  --sandbox-id <sandbox-id> \
  --volume-id <volume-id> \
  --path /mnt/workspace
```
From inside the sandbox, the agent reads and writes to /mnt/workspace like any normal directory. There is no special API, no object storage SDK, no serialization step. The POSIX interface is preserved end to end — the agent code doesn't need to know it's talking to a distributed filesystem backed by S3 and PostgreSQL.
Volumes are backed by JuiceFS: data blocks are stored in S3, and filesystem metadata lives in PostgreSQL. This combines S3-grade durability with POSIX filesystem semantics.
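Because the volume is just a directory, resuming after a crash needs nothing beyond ordinary file I/O. The sketch below is a hypothetical agent loop: the checkpoint file name and the placeholder "work" are illustrative, and a temporary directory stands in for /mnt/workspace so the example runs anywhere.

```python
import json
import os
import tempfile
from typing import Optional

CHECKPOINT_NAME = "checkpoint.json"  # hypothetical file name


def load_checkpoint(workspace: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    path = os.path.join(workspace, CHECKPOINT_NAME)
    if not os.path.exists(path):
        return {"next_step": 0, "results": []}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(workspace: str, state: dict) -> None:
    """Write to a temp file, then rename, so readers never see a partial file."""
    path = os.path.join(workspace, CHECKPOINT_NAME)
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def run_agent(workspace: str, total_steps: int,
              stop_after: Optional[int] = None) -> dict:
    """Run steps from the last checkpoint; stop_after simulates a crash or timeout."""
    state = load_checkpoint(workspace)
    executed = 0
    while state["next_step"] < total_steps:
        if stop_after is not None and executed >= stop_after:
            break
        step = state["next_step"]
        state["results"].append(step * step)  # placeholder for real work
        state["next_step"] = step + 1
        save_checkpoint(workspace, state)
        executed += 1
    return state


if __name__ == "__main__":
    # A temp directory stands in for the mounted volume path.
    with tempfile.TemporaryDirectory() as workspace:
        run_agent(workspace, total_steps=5, stop_after=2)  # first sandbox stops early
        final = run_agent(workspace, total_steps=5)        # second sandbox resumes
        print(final["results"])  # [0, 1, 4, 9, 16]
```

The second call never repeats steps 0 and 1. That matters for the operations mentioned earlier that aren't safely repeatable.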
Three Access Modes
Volumes support three sharing modes:
| Mode | Description | When to Use |
|---|---|---|
| RWO | Read-write, single sandbox | Exclusive agent workspace, database files |
| ROX | Read-only, multiple sandboxes | Shared model weights, static reference datasets |
| RWX | Read-write, multiple sandboxes | Collaborative workspaces, shared output queues |
The access mode is set at volume creation time. Trying to mount an ROX volume for writing fails immediately — the constraint is enforced at the infrastructure layer, not by convention.
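The three modes come down to two questions: how many sandboxes may hold the mount, and whether they may write. The toy model below encodes the table for illustration only. The names are hypothetical, it is not how Sandbox0 enforces the rules, and it assumes an RWO volume rejects any second mount, including a read-only one.

```python
from enum import Enum


class AccessMode(Enum):
    RWO = "RWO"  # read-write, single sandbox
    ROX = "ROX"  # read-only, multiple sandboxes
    RWX = "RWX"  # read-write, multiple sandboxes


def mount_allowed(mode: AccessMode, active_mounts: int, want_write: bool) -> bool:
    """Return True if one more sandbox may mount the volume.

    active_mounts is the number of sandboxes that already have it mounted.
    """
    if mode is AccessMode.ROX and want_write:
        return False  # read-only volumes never accept writers
    if mode is AccessMode.RWO and active_mounts >= 1:
        return False  # assumed: a single-sandbox volume rejects a second mount
    return True


if __name__ == "__main__":
    print(mount_allowed(AccessMode.RWX, active_mounts=3, want_write=True))   # True
    print(mount_allowed(AccessMode.ROX, active_mounts=0, want_write=True))   # False
    print(mount_allowed(AccessMode.RWO, active_mounts=1, want_write=False))  # False
```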
Sharing a Persistent Volume Across Multiple Agent Sandboxes
The most direct use of RWX mode is a producer-consumer pattern: one agent writes, another reads, with no serialization through a context window.
```python
# Assumes `client` is an initialized Sandbox0 SDK client and `results` holds
# the producer's serialized output; SDK imports are omitted here.

# Create a shared volume
volume = client.volumes.create(CreateSandboxVolumeRequest(
    access_mode=VolumeAccessMode.RWX,
))

# Mount to both sandboxes
sandbox1 = client.sandboxes.claim("default")
sandbox2 = client.sandboxes.claim("default")

with sandbox1.mount(volume.id, "/mnt/shared"):
    with sandbox2.mount(volume.id, "/mnt/shared"):
        # Agent 1 writes its output
        sandbox1.write_file("/mnt/shared/analysis.json", results)

        # Agent 2 can read it immediately: no transfer, no tokens consumed
        data = sandbox2.read_file("/mnt/shared/analysis.json")
```
This also works for fan-out patterns where multiple agents need concurrent read access to the same large dataset. Instead of each agent downloading or receiving a copy of a 10GB model file, all of them mount the same ROX volume. The data is fetched once from S3, cached at the storage layer, and served to all readers.
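When the producer and consumer run as independent agent processes rather than in one script, the consumer needs a way to tell that a file is complete. A common convention on any POSIX filesystem is to write under a temporary name and rename into place. The sketch below uses only the standard library, with a temporary directory standing in for /mnt/shared. The helper names are hypothetical, and it assumes rename is atomic on the mount, which is worth verifying for your setup.

```python
import json
import os
import tempfile
import threading
import time
from typing import Any


def publish(shared_dir: str, name: str, payload: Any) -> str:
    """Write payload under a temp name, then rename it into place."""
    final_path = os.path.join(shared_dir, name)
    tmp_path = f"{final_path}.{os.getpid()}.tmp"
    with open(tmp_path, "w") as f:
        json.dump(payload, f)
        f.flush()
        os.fsync(f.fileno())  # push data to storage before the file becomes visible
    os.replace(tmp_path, final_path)
    return final_path


def wait_for(shared_dir: str, name: str, timeout: float = 5.0,
             poll_interval: float = 0.05) -> Any:
    """Poll until the named file exists, then return its parsed contents."""
    path = os.path.join(shared_dir, name)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                return json.load(f)
        except FileNotFoundError:
            time.sleep(poll_interval)
    raise TimeoutError(f"{name} did not appear within {timeout} seconds")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        # The timer thread plays the producer agent; the main thread is the consumer.
        producer = threading.Timer(0.2, publish, args=(shared, "analysis.json", {"score": 0.9}))
        producer.start()
        print(wait_for(shared, "analysis.json"))  # {'score': 0.9}
        producer.join()
```

The consumer only ever opens the final name, so it either finds no file yet or a complete one.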
Snapshots: Checkpointing Agent State
A snapshot is a point-in-time, read-only copy of a volume, created almost instantly using copy-on-write semantics. Snapshots don't copy data blocks; they record which blocks existed at the moment the snapshot was taken, and new blocks are written only when the live volume diverges from the snapshot state.
```bash
# Take a snapshot before a risky operation
s0 volume snapshot create vol_abc123xyz --name "before-refactor"

# If something goes wrong, roll back
s0 volume snapshot restore vol_abc123xyz snap_def456uvw
```
A restore replaces the live volume state with the snapshot state, so all writes made after the snapshot was taken are lost. It is reversible only if you take another snapshot first. The docs are explicit about this: create a backup snapshot before any restore.
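The snapshot and restore semantics can be made concrete with a small in-memory model. This is a deliberately simplified sketch, not JuiceFS's actual data structures: it tracks one block per file, and the class and method names are hypothetical.

```python
class ToyVolume:
    """In-memory stand-in for a volume: maps file paths to block IDs."""

    def __init__(self) -> None:
        self.blocks = {}     # block_id -> bytes (plays the role of the object store)
        self.files = {}      # path -> block_id (live metadata)
        self.snapshots = {}  # snapshot name -> frozen copy of the metadata
        self._next_id = 0

    def write(self, path: str, data: bytes) -> None:
        # Never overwrite in place: new data always lands in a new block.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        self.files[path] = block_id

    def read(self, path: str) -> bytes:
        return self.blocks[self.files[path]]

    def snapshot(self, name: str) -> None:
        # Copies only the path -> block_id table, never the data itself.
        self.snapshots[name] = dict(self.files)

    def restore(self, name: str) -> None:
        # Replaces the live metadata; later writes become unreachable.
        self.files = dict(self.snapshots[name])


if __name__ == "__main__":
    vol = ToyVolume()
    vol.write("main.py", b"v1")
    vol.snapshot("before-refactor")
    vol.write("main.py", b"v2")    # the live volume diverges from the snapshot
    vol.snapshot("backup")         # backup snapshot taken before restoring
    vol.restore("before-refactor")
    print(vol.read("main.py"))     # b'v1'
    vol.restore("backup")
    print(vol.read("main.py"))     # b'v2'
```

Taking the snapshot adds no blocks, and the v2 write survives the rollback only because the backup snapshot still references it.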
Snapshots are useful for agent workflows that involve state mutation:
- Evaluation checkpoints: snapshot before running an agent against an evaluation set, so you can restore and re-run with different parameters without re-building the initial state
- Rollback on failure: if an agent modifies a shared dataset and produces corrupt output, roll back to the last known-good snapshot
- Version tracking: give snapshots meaningful names (v1.0-release, after-training-run-47) and treat the volume like a versioned artifact
Copy-on-Write Fork: Branch a Sandbox Volume for Parallel Agent Workloads
Fork is the operation that makes parallel agent workflows economically viable. When you fork a volume, Sandbox0 creates a new independent volume that shares all data blocks with the source — using copy-on-write (COW) semantics at the JuiceFS metadata layer. The fork starts with zero additional storage cost. Storage only diverges as each fork writes new data.
Concretely: the fork operation copies the directory metadata (inodes and block references), not the data blocks in S3. The original blocks are shared by reference. Only when a fork writes to a file are new blocks created in S3 for the affected region — the source volume's blocks are untouched. Ten forks of a large source volume all reading the same data pay no duplication cost at all.
```bash
# Fork a base dataset volume into 10 independent variants
for i in $(seq 1 10); do
  s0 volume fork <base-volume-id> --access-mode RWO
done
```
Each forked volume is fully independent. An agent writing to fork 3 doesn't affect fork 7. When the agents finish, you can inspect each fork's output, pick the best result, and discard the rest. The data blocks shared with the source volume are not deleted until all references are gone.
This maps directly to several real agent patterns:
Parallel evaluation: fork a codebase volume 5 times, run 5 agents with different prompts or models, compare their outputs without any agent seeing another's changes.
Safe mutation: fork a production dataset, run an agent on the fork, and promote the fork to production only if the results are correct — without ever touching the original data during the experiment.
Multi-agent search: fork an initial research volume, run a tree of agents exploring different hypotheses, and collapse back to the best branch.
The fork API inherits performance configuration from the source volume, but you can override any parameter at fork time, for example giving write-heavy forks more buffer or read-heavy forks more prefetch.
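The block sharing behind forks, including the rule that shared blocks are kept until all references are gone, can be modeled with reference counts. The sketch below is a toy under stated assumptions: it tracks one block per file rather than block regions, all names are hypothetical, and it is not how JuiceFS or Sandbox0 implement forking.

```python
class BlockStore:
    """Stand-in for the object store, with a reference count per block."""

    def __init__(self) -> None:
        self.data = {}  # block_id -> bytes
        self.refs = {}  # block_id -> number of volumes referencing it
        self._next_id = 0

    def put(self, payload: bytes) -> int:
        block_id = self._next_id
        self._next_id += 1
        self.data[block_id] = payload
        self.refs[block_id] = 1
        return block_id

    def incref(self, block_id: int) -> None:
        self.refs[block_id] += 1

    def decref(self, block_id: int) -> None:
        self.refs[block_id] -= 1
        if self.refs[block_id] == 0:  # last reference gone: reclaim the block
            del self.refs[block_id]
            del self.data[block_id]


class Volume:
    def __init__(self, store: BlockStore, files=None) -> None:
        self.store = store
        self.files = dict(files or {})  # path -> block_id

    def write(self, path: str, payload: bytes) -> None:
        old = self.files.get(path)
        self.files[path] = self.store.put(payload)  # new block, never in place
        if old is not None:
            self.store.decref(old)

    def read(self, path: str) -> bytes:
        return self.store.data[self.files[path]]

    def fork(self) -> "Volume":
        # Copy metadata only; every existing block gains one more reference.
        for block_id in self.files.values():
            self.store.incref(block_id)
        return Volume(self.store, self.files)

    def delete(self) -> None:
        for block_id in self.files.values():
            self.store.decref(block_id)
        self.files.clear()


if __name__ == "__main__":
    store = BlockStore()
    base = Volume(store)
    base.write("dataset.csv", b"raw")
    forks = [base.fork() for _ in range(10)]
    print(len(store.data))                  # 1: ten forks, no extra blocks
    forks[2].write("dataset.csv", b"cleaned")
    print(len(store.data))                  # 2: only the diverging fork added a block
    print(forks[6].read("dataset.csv"))     # b'raw'
```

Deleting the base volume in this model leaves the original block in place as long as any fork still points at it.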
How the Storage Layer Works
The POSIX semantics agents see are provided by storage-proxy, a centralized component that manages JuiceFS filesystems and volume metadata. A write to a file under /mnt/workspace takes three hops: procd (the PID-1 process inside the sandbox) receives it and forwards it to storage-proxy over gRPC, and storage-proxy then updates the JuiceFS metadata store in PostgreSQL and stages the data blocks to S3.
This architecture is why volumes survive sandbox deletion: the storage-proxy and the volume metadata exist outside the sandbox pod lifecycle. The sandbox is an execution environment; the volume is the data layer. They're decoupled by design.
It's also why multiple sandboxes can mount the same volume simultaneously — the storage-proxy mediates all access, handles the JuiceFS metadata, and coordinates concurrent reads and writes at the filesystem layer.
Nix for Reproducible Environments
Volumes can also persist runtime environment artifacts — not just application data. Pairing a volume with Nix lets you store a dependency closure once and reuse it across sandbox restarts without re-downloading packages.
What volumes can persist across sandbox restarts: dependency closures, build caches, package stores, lock files, and workspace-level toolchains.
What volumes cannot persist: live process runtime state — in-memory variables, process stacks, open sockets, PID state. A new sandbox mounts the volume and sees the files, but starts as a new process. It's not a resume of a paused container; it's a fresh process with access to a prebuilt filesystem.
FAQ
Does the agent need to change its code to use a persistent sandbox volume?
No. From the agent's perspective, the persistent volume is just a directory at the mount path. Any file operation — open, read, write, stat, rename — works through the standard POSIX interface. The storage-proxy translates those operations to JuiceFS calls transparently. There is no object storage SDK, no special write API, and no serialization step required.
What happens to the volume if a sandbox crashes?
The volume is unaffected. Sandbox crashes don't unmount the volume from the storage-proxy's perspective — they disconnect the sandbox pod from the mount, but the volume data is unchanged. Mount it to a new sandbox to continue from where the crashed agent left off.
Can multiple sandboxes write to the same RWX volume concurrently?
Yes. The RWX mode is designed for concurrent read-write access. You're responsible for handling write conflicts at the application level — if two agents might write to the same file, use a lock file or a coordination protocol. The filesystem does not enforce write ordering between concurrent writers.
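A lock file is the simplest of those coordination options. The sketch below relies on exclusive file creation (O_CREAT with O_EXCL), which POSIX defines as atomic. It assumes that guarantee holds on the shared mount, and the helper names are hypothetical. Threads and a temporary directory stand in for separate sandboxes and the RWX volume.

```python
import os
import tempfile
import threading
import time
from contextlib import contextmanager


@contextmanager
def lock_file(path: str, timeout: float = 10.0, poll_interval: float = 0.005):
    """Hold an exclusive lock by creating `path`; remove it on exit."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if the file exists, so one caller wins.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll_interval)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(str(os.getpid()))  # record the holder for debugging
        yield
    finally:
        os.unlink(path)


def increment(counter_path: str, lock_path: str) -> None:
    """Read-modify-write a shared counter file under the lock."""
    with lock_file(lock_path):
        with open(counter_path) as f:
            value = int(f.read())
        with open(counter_path, "w") as f:
            f.write(str(value + 1))


def run_writers(shared_dir: str, writers: int = 4, rounds: int = 10) -> int:
    """Start several concurrent writers and return the final counter value."""
    counter = os.path.join(shared_dir, "counter.txt")
    lock = os.path.join(shared_dir, "counter.lock")
    with open(counter, "w") as f:
        f.write("0")

    def worker() -> None:
        for _ in range(rounds):
            increment(counter, lock)

    threads = [threading.Thread(target=worker) for _ in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    with open(counter) as f:
        return int(f.read())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as shared:
        print(run_writers(shared))  # 40
```

One caveat: if a lock holder crashes before the cleanup runs, the lock file is left behind, so a production version would need a staleness check or a lease.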
Is a fork the same as a snapshot?
A snapshot is a read-only point-in-time record tied to a source volume. A fork is a new independent writable volume that starts from the source's current state. You can take many snapshots of one volume; each snapshot points back to its source. A fork is its own volume — once created, it has no runtime dependency on the source.
Does forking require the source volume to be unmounted?
No. Forking works whether or not the source volume is currently mounted, because it operates directly on the JuiceFS metadata layer.
Can I self-host the volume system?
Yes. The storage-proxy component is part of the open-source Sandbox0 repository. It runs in a Kubernetes cluster alongside manager and the other data plane components. You bring your own S3-compatible object storage and PostgreSQL instance.
Volume operations — create, mount, snapshot, fork, and restore — are documented in the Volume section of the Sandbox0 docs. For mounting volumes to running sandboxes, see Volume Mounts. If you want to keep one working tree aligned between your laptop and a cloud sandbox, read Sync a Local Workspace with a Cloud Sandbox for AI Agents.