# Benchmarking Plan

This document specifies the benchmarking strategy for AnyFS, to be carried out once the implementation exists. Functionality and security are the primary goals; performance validation is secondary but still important.
## Goals

- **Validate design decisions** - Confirm that the Tower-style middleware approach doesn’t introduce unacceptable overhead
- **Identify optimization opportunities** - Find hot paths that need attention
- **Establish baselines** - Know where we stand relative to alternatives
- **Prevent regressions** - Track performance across versions
## Benchmark Categories

### 1. Backend Benchmarks
Compare AnyFS backends against equivalent solutions for their specific use cases.
#### MemoryBackend vs Alternatives

| Competitor | Use Case | Why Compare |
|---|---|---|
| `std::collections::HashMap` | Raw key-value baseline | Theoretical minimum overhead |
| `tempfile` + `std::fs` | Disk-backed temp files for tests | Common testing approach |
| `vfs::MemoryFS` | Virtual filesystem | Direct competitor |
| `virtual-fs` | In-memory FS | Another VFS crate |
Metrics:
- Sequential read/write throughput (1KB, 64KB, 1MB, 16MB files)
- Random access latency (small reads at random offsets)
- Directory listing performance (10, 100, 1000, 10000 entries)
- Memory overhead per file/directory
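A minimal `criterion` sketch for the first metric, assuming the eventual harness shape: the `HashMap` baseline runs today, while the `MemoryBackend` half is left as a commented placeholder because that API does not exist yet.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion, Throughput};
use std::collections::HashMap;

fn bench_write_64k(c: &mut Criterion) {
    let mut group = c.benchmark_group("write_64k");
    group.throughput(Throughput::Bytes(64 * 1024));

    // Raw HashMap: the theoretical minimum for an in-memory write.
    group.bench_function("hashmap", |b| {
        let mut map: HashMap<String, Vec<u8>> = HashMap::new();
        let data = vec![0u8; 64 * 1024];
        b.iter(|| map.insert("file.bin".to_string(), black_box(data.clone())));
    });

    // Hypothetical MemoryBackend API; fill in once the real trait lands.
    // group.bench_function("memory_backend", |b| {
    //     let fs = MemoryBackend::new();
    //     let data = vec![0u8; 64 * 1024];
    //     b.iter(|| fs.write("file.bin", black_box(&data)));
    // });

    group.finish();
}

criterion_group!(benches, bench_write_64k);
criterion_main!(benches);
```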
#### SqliteBackend vs Alternatives

| Competitor | Use Case | Why Compare |
|---|---|---|
| `rusqlite` raw | Baseline SQLite performance | Measure our abstraction cost |
| `sled` | Embedded database | Alternative storage engine |
| `redb` | Embedded database | Modern alternative |
| File-per-record | Direct filesystem | Traditional approach |
Metrics:
- Insert throughput (batch vs individual)
- Read throughput (sequential vs random)
- Transaction overhead
- Database size vs raw file size
- Startup time (opening existing database)
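A sketch of the insert-throughput metric at the raw `rusqlite` layer, which doubles as the baseline row in the table above. The schema, row count, and payload size are arbitrary illustration choices.

```rust
use criterion::{criterion_group, criterion_main, BatchSize, Criterion};
use rusqlite::{params, Connection};

// Fresh in-memory database per iteration so runs don't interfere.
fn fresh_db() -> Connection {
    let conn = Connection::open_in_memory().unwrap();
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)", [])
        .unwrap();
    conn
}

fn bench_inserts(c: &mut Criterion) {
    let mut group = c.benchmark_group("sqlite_insert_1000_rows");

    // Individual: one implicit transaction per INSERT statement.
    group.bench_function("individual", |b| {
        b.iter_batched(
            fresh_db,
            |conn| {
                for i in 0..1000 {
                    conn.execute(
                        "INSERT INTO files VALUES (?1, ?2)",
                        params![format!("/f/{i}"), &[0u8; 16][..]],
                    )
                    .unwrap();
                }
            },
            BatchSize::SmallInput,
        )
    });

    // Batched: one explicit transaction around all rows.
    group.bench_function("batched", |b| {
        b.iter_batched(
            fresh_db,
            |mut conn| {
                let tx = conn.transaction().unwrap();
                for i in 0..1000 {
                    tx.execute(
                        "INSERT INTO files VALUES (?1, ?2)",
                        params![format!("/f/{i}"), &[0u8; 16][..]],
                    )
                    .unwrap();
                }
                tx.commit().unwrap();
            },
            BatchSize::SmallInput,
        )
    });

    group.finish();
}

criterion_group!(benches, bench_inserts);
criterion_main!(benches);
```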
#### VRootFsBackend vs Alternatives

| Competitor | Use Case | Why Compare |
|---|---|---|
| `std::fs` direct | Baseline filesystem | Measure containment overhead |
| `cap-std` | Capability-based FS | Security-focused alternative |
| `chroot` simulation | Traditional sandboxing | System-level approach |
Metrics:
- Path resolution overhead
- Symlink traversal cost
- Escape attempt detection cost
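A sketch of the path-resolution comparison, assuming only `std::fs` and the `tempfile` crate; the contained-backend half is commented out because `VRootFsBackend` and its method names here are placeholders.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use std::fs;

fn bench_path_resolution(c: &mut Criterion) {
    // Build a real directory tree for the std::fs baseline.
    let dir = tempfile::tempdir().unwrap();
    let deep = dir.path().join("a/b/c/d/e");
    fs::create_dir_all(&deep).unwrap();
    let file = deep.join("leaf.txt");
    fs::write(&file, b"x").unwrap();

    // Baseline: metadata lookup straight through std::fs.
    c.bench_function("std_fs_metadata", |b| {
        b.iter(|| fs::metadata(black_box(&file)).unwrap())
    });

    // Hypothetical contained lookup; VRootFsBackend and its methods are
    // assumed names, to be replaced by the real API.
    // let vroot = VRootFsBackend::new(dir.path()).unwrap();
    // c.bench_function("vroot_metadata", |b| {
    //     b.iter(|| vroot.metadata(black_box("a/b/c/d/e/leaf.txt")).unwrap())
    // });
}

criterion_group!(benches, bench_path_resolution);
criterion_main!(benches);
```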
### 2. Middleware Overhead Benchmarks
Measure the cost of each middleware layer.
| Middleware | What to Measure |
|---|---|
| `Quota<B>` | Size tracking overhead per operation |
| `PathFilter<B>` | Glob matching cost per path |
| `ReadOnly<B>` | Should be near zero (writes just return an error) |
| `RateLimit<B>` | Fixed-window counter check overhead |
| `Tracing<B>` | Span creation/logging cost |
| `Cache<B>` | Cache hit/miss latency difference |
**Key question:** What’s the cost of a 5-layer middleware stack vs direct backend access?

**Target:** Middleware overhead should be <5% of I/O time for typical operations.
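The key question can be prototyped before AnyFS exists: a no-op backend plus a passthrough wrapper is enough to check whether five monomorphized layers inline down to the direct call. The trait and types below are stand-ins, not the real AnyFS API.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};

// Stand-in trait with one hot-path operation.
trait Fs {
    fn read(&self, path: &str) -> usize;
}

// Stand-in backend: cheap enough that any layer overhead is visible.
struct Backend;
impl Fs for Backend {
    fn read(&self, path: &str) -> usize {
        path.len()
    }
}

// Passthrough middleware, like ReadOnly on the read path.
struct Layer<B: Fs>(B);
impl<B: Fs> Fs for Layer<B> {
    fn read(&self, path: &str) -> usize {
        self.0.read(path)
    }
}

fn bench_stack_depth(c: &mut Criterion) {
    let direct = Backend;
    let stacked = Layer(Layer(Layer(Layer(Layer(Backend)))));

    c.bench_function("direct", |b| b.iter(|| direct.read(black_box("/a/b/c"))));
    c.bench_function("five_layers", |b| b.iter(|| stacked.read(black_box("/a/b/c"))));
}

criterion_group!(benches, bench_stack_depth);
criterion_main!(benches);
```

With full monomorphization the two functions should measure identically; any gap is the real cost of layering.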
### 3. Composition Benchmarks
Measure real-world stacks, not isolated components.
#### AI Agent Sandbox Stack

```text
Quota → PathFilter → RateLimit → Tracing → MemoryBackend
```
Compare against:
- Raw MemoryBackend (baseline)
- Manual checks in application code (alternative approach)
#### Persistent Database Stack

```text
Cache → Tracing → SqliteBackend
```
Compare against:
- Raw SqliteBackend (baseline)
- Application-level caching (alternative approach)
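For reference, the "manual checks in application code" comparison target might look like the hypothetical sketch below: quota and path policy enforced inline at each call site, which is exactly the duplication the middleware stack is meant to replace. All names are illustrative.

```rust
use std::collections::HashMap;

// Hypothetical hand-rolled enforcement: what applications do without
// middleware. Every write site has to repeat these checks.
fn checked_write(
    files: &mut HashMap<String, Vec<u8>>,
    used: &mut u64,
    quota: u64,
    path: &str,
    data: &[u8],
) -> Result<(), &'static str> {
    if path.contains("..") {
        return Err("path escapes root"); // PathFilter stand-in
    }
    if *used + data.len() as u64 > quota {
        return Err("quota exceeded"); // Quota stand-in
    }
    *used += data.len() as u64;
    files.insert(path.to_string(), data.to_vec());
    Ok(())
}
```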
### 4. Trait Implementation Benchmarks
Validate that strategic boxing doesn’t hurt performance.
| Operation | Expected Cost |
|---|---|
| `read()` / `write()` | Zero-cost (monomorphized) |
| `open_read()` → `Box<dyn Read>` | ~50 ns allocation, negligible vs I/O |
| `read_dir()` → `ReadDirIter` | One allocation per call |
| `FileStorage::boxed()` | One-time cost at setup |
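The ~50 ns claim is straightforward to check in isolation with `std::io::Cursor` standing in for a backend reader; no AnyFS types are needed.

```rust
use criterion::{black_box, criterion_group, criterion_main, Criterion};
use std::io::{Cursor, Read};

fn bench_boxed_reader(c: &mut Criterion) {
    let data = vec![0u8; 1024];

    // Monomorphized reader: no allocation, calls can inline.
    c.bench_function("concrete_reader", |b| {
        b.iter(|| {
            let mut r = Cursor::new(black_box(&data));
            let mut buf = [0u8; 64];
            r.read(&mut buf).unwrap()
        })
    });

    // Boxed trait object, the shape open_read() is expected to return:
    // one heap allocation plus dynamic dispatch.
    c.bench_function("boxed_reader", |b| {
        b.iter(|| {
            let mut r: Box<dyn Read + '_> = Box::new(Cursor::new(black_box(&data)));
            let mut buf = [0u8; 64];
            r.read(&mut buf).unwrap()
        })
    });
}

criterion_group!(benches, bench_boxed_reader);
criterion_main!(benches);
```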
## Competitor Matrix

### By Use Case
| Use Case | AnyFS Component | Primary Competitors |
|---|---|---|
| Testing/mocking | MemoryBackend | tempfile, vfs::MemoryFS |
| Embedded database | SqliteBackend | sled, redb, raw SQLite |
| Sandboxed host access | VRootFsBackend | cap-std, chroot |
| Policy enforcement | Middleware stack | Manual application code |
| Union filesystem | Overlay | overlayfs (kernel), fuse-overlayfs |
### Crate Comparison
| Crate | Strengths | Weaknesses | Compare For |
|---|---|---|---|
| `vfs` | Simple API | No middleware, limited features | API ergonomics |
| `virtual-fs` | WASM support | Less composable | Cross-platform |
| `cap-std` | Security-focused | Different abstraction level | Sandboxing |
| `tempfile` | Battle-tested | Not a VFS | Temp file operations |
| `include_dir` | Compile-time embedding | Read-only | Embedded assets |
## Benchmark Infrastructure

### Framework

Use `criterion` for statistical rigor:
- Warm-up iterations
- Outlier detection
- Comparison between runs
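A possible shared configuration, with warm-up and sample counts as starting points to tune rather than settled values:

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::time::Duration;

// Shared Criterion settings applied to every group in this harness.
fn config() -> Criterion {
    Criterion::default()
        .warm_up_time(Duration::from_secs(3))
        .measurement_time(Duration::from_secs(10))
        .sample_size(200)
}

// Placeholder target so the harness compiles standalone.
fn bench_placeholder(c: &mut Criterion) {
    c.bench_function("noop", |b| b.iter(|| 1 + 1));
}

criterion_group! {
    name = benches;
    config = config();
    targets = bench_placeholder
}
criterion_main!(benches);
```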
### Test Data Sets
| Dataset | Contents | Purpose |
|---|---|---|
| Small files | 1000 files × 1KB | Metadata-heavy workload |
| Large files | 10 files × 100MB | Throughput workload |
| Deep hierarchy | 10 levels × 10 dirs | Path resolution stress |
| Wide directory | 1 dir × 10000 files | Listing performance |
| Mixed realistic | Project-like structure | Real-world simulation |
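Generation sketches for two of these datasets, written against a plain directory so they run today; the same shapes would drive any backend. Function names and fill bytes are arbitrary.

```rust
use std::fs;
use std::io;
use std::path::Path;

// "Small files" / "wide directory" rows: `count` files of `size` bytes
// in a single directory.
fn generate_flat_dataset(root: &Path, count: usize, size: usize) -> io::Result<()> {
    fs::create_dir_all(root)?;
    let data = vec![0xABu8; size];
    for i in 0..count {
        fs::write(root.join(format!("file_{i:05}.bin")), &data)?;
    }
    Ok(())
}

// "Deep hierarchy" row, read here as `fanout` top-level branches, each a
// chain `levels` directories deep (a full 10-ary tree 10 levels deep
// would be infeasible to materialize).
fn generate_deep_dataset(root: &Path, levels: usize, fanout: usize) -> io::Result<()> {
    for branch in 0..fanout {
        let mut path = root.join(format!("d{branch}"));
        for level in 0..levels {
            path = path.join(format!("l{level}"));
        }
        fs::create_dir_all(&path)?;
    }
    Ok(())
}
```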
### Reporting
Generate:
- Throughput charts (ops/sec, MB/sec)
- Latency histograms (p50, p95, p99)
- Memory usage graphs
- Comparison tables vs competitors
## Performance Targets
These are aspirational targets to validate during implementation:
| Metric | Target | Rationale |
|---|---|---|
| Middleware overhead | <5% of I/O time | Composability shouldn’t cost much |
| MemoryBackend vs HashMap | <2x slower | Abstraction cost |
| SqliteBackend vs raw SQLite | <1.5x slower | Thin wrapper |
| VRootFsBackend vs std::fs | <1.2x slower | Path checking cost |
| 5-layer stack | <10% overhead | Real-world composition |
## Benchmark Workflow

### Development Phase

```sh
cargo bench --bench <component>
```

Run focused benchmarks during development to catch regressions.
### Release Phase

```sh
cargo bench --all
```

Full benchmark suite before releases, with comparison to the previous version.
### CI Integration
- Run subset of benchmarks on PR (smoke test)
- Full benchmark suite on main branch
- Store results for trend analysis
## Non-Goals

- **Beating `std::fs` at raw I/O** - We add abstraction; some overhead is acceptable
- **Micro-optimizing cold paths** - Focus on hot paths (read, write, metadata)
- **Benchmark gaming** - Optimize for real use cases, not synthetic benchmarks
## Tracking

GitHub Issue: Implement benchmark suite

- Blocked by: Core AnyFS implementation
- Dependencies: `criterion`, test data generation
- Milestone: Post-1.0 (after functionality and security are solid)