Benchmarking Plan

This document specifies the benchmarking strategy for AnyFS once the implementation exists. Functionality and security are the primary goals; performance validation is secondary but still important.


Goals

  1. Validate design decisions - Confirm that the Tower-style middleware approach doesn’t introduce unacceptable overhead
  2. Identify optimization opportunities - Find hot paths that need attention
  3. Establish baselines - Know where we stand relative to alternatives
  4. Prevent regressions - Track performance across versions

Benchmark Categories

1. Backend Benchmarks

Compare AnyFS backends against equivalent solutions for their specific use cases.

MemoryBackend vs Alternatives

| Competitor | Use Case | Why Compare |
|---|---|---|
| `std::collections::HashMap` | Raw key-value baseline | Theoretical minimum overhead |
| `tempfile` + `std::fs` | In-memory temp files | Common testing approach |
| `vfs::MemoryFS` | Virtual filesystem | Direct competitor |
| `virtual-fs` | In-memory FS | Another VFS crate |

Metrics:

  • Sequential read/write throughput (1KB, 64KB, 1MB, 16MB files)
  • Random access latency (small reads at random offsets)
  • Directory listing performance (10, 100, 1000, 10000 entries)
  • Memory overhead per file/directory
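
As a sketch of how these throughput numbers might be collected, the criterion benchmark below measures sequential writes against the raw HashMap baseline at the file sizes listed above. The MemoryBackend arm is left as a comment because its API is not finalized; everything else (criterion, HashMap) is real.

```rust
// Sequential-write throughput baseline using a plain HashMap as the
// "theoretical minimum" competitor from the table above.
use std::collections::HashMap;
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion, Throughput};

fn sequential_write(c: &mut Criterion) {
    let mut group = c.benchmark_group("sequential_write");
    for &size in &[1 << 10, 64 << 10, 1 << 20, 16 << 20] {
        let payload = vec![0u8; size];
        group.throughput(Throughput::Bytes(size as u64));
        group.bench_with_input(BenchmarkId::new("hashmap", size), &payload, |b, data| {
            b.iter(|| {
                // One "file" write: insert the payload under a fixed key.
                let mut store: HashMap<String, Vec<u8>> = HashMap::new();
                store.insert("file.bin".to_string(), black_box(data.clone()));
                black_box(store)
            })
        });
        // MemoryBackend arm (hypothetical API): same shape, but writing the
        // payload through the backend instead of the HashMap.
    }
    group.finish();
}

criterion_group!(benches, sequential_write);
criterion_main!(benches);
```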

SqliteBackend vs Alternatives

| Competitor | Use Case | Why Compare |
|---|---|---|
| `rusqlite` raw | Baseline SQLite performance | Measure our abstraction cost |
| `sled` | Embedded database | Alternative storage engine |
| `redb` | Embedded database | Modern alternative |
| File-per-record | Direct filesystem | Traditional approach |

Metrics:

  • Insert throughput (batch vs individual)
  • Read throughput (sequential vs random)
  • Transaction overhead
  • Database size vs raw file size
  • Startup time (opening existing database)
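
The batch-vs-individual insert metric can be pinned against the raw rusqlite baseline with a sketch like the one below. The `files` table schema and the 1 KB payloads are placeholders, not SqliteBackend's actual layout.

```rust
// Raw rusqlite baseline for "insert throughput (batch vs individual)".
use rusqlite::{params, Connection};

fn individual_inserts(conn: &Connection, n: usize) -> rusqlite::Result<()> {
    // One implicit transaction per statement: the worst case for throughput.
    let mut stmt = conn.prepare("INSERT INTO files (path, data) VALUES (?1, ?2)")?;
    for i in 0..n {
        stmt.execute(params![format!("/file-{i}"), vec![0u8; 1024]])?;
    }
    Ok(())
}

fn batched_inserts(conn: &mut Connection, n: usize) -> rusqlite::Result<()> {
    // All rows inside one explicit transaction: amortizes journal/commit cost.
    let tx = conn.transaction()?;
    {
        let mut stmt = tx.prepare("INSERT INTO files (path, data) VALUES (?1, ?2)")?;
        for i in 0..n {
            stmt.execute(params![format!("/file-{i}"), vec![0u8; 1024]])?;
        }
    }
    tx.commit()
}

fn main() -> rusqlite::Result<()> {
    let mut conn = Connection::open_in_memory()?;
    conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, data BLOB)", [])?;
    individual_inserts(&conn, 1_000)?;
    conn.execute("DELETE FROM files", [])?;
    batched_inserts(&mut conn, 1_000)?;
    Ok(())
}
```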

VRootFsBackend vs Alternatives

| Competitor | Use Case | Why Compare |
|---|---|---|
| `std::fs` direct | Baseline filesystem | Measure containment overhead |
| `cap-std` | Capability-based FS | Security-focused alternative |
| chroot simulation | Traditional sandboxing | System-level approach |

Metrics:

  • Path resolution overhead
  • Symlink traversal cost
  • Escape attempt detection cost
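
Path resolution overhead needs a `std::fs` baseline to divide by. A minimal criterion sketch, assuming a 10-level hierarchy like the deep-hierarchy dataset described later, could look like this; the VRootFsBackend arm would repeat the same stat through the contained root once that type exists.

```rust
// Baseline for path-resolution overhead: stat a file at the bottom of a
// 10-level directory hierarchy through plain std::fs.
use std::fs;
use std::hint::black_box;
use std::path::PathBuf;

use criterion::{criterion_group, criterion_main, Criterion};

fn deep_path_metadata(c: &mut Criterion) {
    // Build dir0/dir1/.../dir9/leaf.txt inside a temporary directory.
    let root = tempfile::tempdir().expect("create temp dir");
    let mut deep: PathBuf = root.path().to_path_buf();
    for level in 0..10 {
        deep.push(format!("dir{level}"));
    }
    fs::create_dir_all(&deep).expect("create hierarchy");
    let leaf = deep.join("leaf.txt");
    fs::write(&leaf, b"x").expect("write leaf");

    c.bench_function("std_fs_deep_metadata", |b| {
        b.iter(|| black_box(fs::metadata(black_box(&leaf)).unwrap()))
    });
}

criterion_group!(benches, deep_path_metadata);
criterion_main!(benches);
```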

2. Middleware Overhead Benchmarks

Measure the cost of each middleware layer.

| Middleware | What to Measure |
|---|---|
| `Quota<B>` | Size tracking overhead per operation |
| `PathFilter<B>` | Glob matching cost per path |
| `ReadOnly<B>` | Should be zero (just error return) |
| `RateLimit<B>` | Fixed-window counter check overhead |
| `Tracing<B>` | Span creation/logging cost |
| `Cache<B>` | Cache hit/miss latency difference |

Key question: What’s the cost of a 5-layer middleware stack vs direct backend access?

Target: Middleware overhead should be <5% of I/O time for typical operations.
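
One way to answer the key question is a paired benchmark: the same read through the full stack and through the raw backend. The sketch below is purely illustrative; `Quota::new`, `PathFilter::new`, `RateLimit::new`, `Tracing::new`, `MemoryBackend::new`, and `read()` are placeholder signatures, since none of this API exists yet.

```rust
// Illustrative only: the constructors and read() call are hypothetical.
use std::hint::black_box;

use criterion::{criterion_group, criterion_main, Criterion};

fn stack_vs_direct(c: &mut Criterion) {
    // Baseline: the raw backend every stacked measurement is compared against.
    let direct = MemoryBackend::new();

    // Hypothetical 5-layer stack matching the sandbox composition in the next section.
    let stacked = Quota::new(
        PathFilter::new(
            RateLimit::new(Tracing::new(MemoryBackend::new()), 1_000),
            &["/workspace/**"],
        ),
        64 * 1024 * 1024,
    );

    // (Populating /workspace/file.txt in both backends is omitted here.)
    c.bench_function("read_direct", |b| {
        b.iter(|| black_box(direct.read("/workspace/file.txt")))
    });
    c.bench_function("read_stacked", |b| {
        b.iter(|| black_box(stacked.read("/workspace/file.txt")))
    });
}

criterion_group!(benches, stack_vs_direct);
criterion_main!(benches);
```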

3. Composition Benchmarks

Measure real-world stacks, not isolated components.

AI Agent Sandbox Stack

Quota → PathFilter → RateLimit → Tracing → MemoryBackend

Compare against:

  • Raw MemoryBackend (baseline)
  • Manual checks in application code (alternative approach)
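
For the second comparison point, a hand-rolled version of "manual checks in application code" might look like the sketch below: the same policy as the sandbox stack, written around a plain HashMap store. The `ManualSandbox` type, the quota, the rate-limit window, and the allowed prefix are illustrative, not AnyFS defaults.

```rust
// Manual policy enforcement: the alternative the middleware stack is measured against.
use std::collections::HashMap;

struct ManualSandbox {
    store: HashMap<String, Vec<u8>>,
    bytes_used: usize,
    quota: usize,
    ops_this_window: u32,
    ops_per_window: u32,
}

impl ManualSandbox {
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), &'static str> {
        // PathFilter equivalent: only allow writes under /workspace.
        if !path.starts_with("/workspace/") {
            return Err("path outside sandbox");
        }
        // RateLimit equivalent: fixed-window operation counter.
        if self.ops_this_window >= self.ops_per_window {
            return Err("rate limit exceeded");
        }
        // Quota equivalent: track total bytes stored.
        if self.bytes_used + data.len() > self.quota {
            return Err("quota exceeded");
        }
        self.ops_this_window += 1;
        self.bytes_used += data.len();
        self.store.insert(path.to_string(), data.to_vec());
        Ok(())
    }
}

fn main() {
    let mut sandbox = ManualSandbox {
        store: HashMap::new(),
        bytes_used: 0,
        quota: 64 * 1024 * 1024, // 64 MiB, mirroring a Quota layer
        ops_this_window: 0,
        ops_per_window: 1_000,   // fixed window, mirroring a RateLimit layer
    };
    assert!(sandbox.write("/workspace/a.txt", b"hello").is_ok());
    assert!(sandbox.write("/etc/passwd", b"nope").is_err());
}
```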

Persistent Database Stack

Cache → Tracing → SqliteBackend

Compare against:

  • Raw SqliteBackend (baseline)
  • Application-level caching (alternative approach)

4. Trait Implementation Benchmarks

Validate that strategic boxing doesn’t hurt performance.

| Operation | Expected Cost |
|---|---|
| `read()` / `write()` | Zero-cost (monomorphized) |
| `open_read()` → `Box<dyn Read>` | ~50ns allocation, negligible vs I/O |
| `read_dir()` → `ReadDirIter` | One allocation per call |
| `FileStorage::boxed()` | One-time cost at setup |
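
The ~50ns allocation claim can be checked in isolation by benchmarking a boxed reader against a concrete one over an in-memory buffer, as in this sketch; the buffer and read sizes are arbitrary.

```rust
// Isolates the cost attributed to open_read(): allocating a Box<dyn Read>
// versus using a concrete reader, with no real I/O involved.
use std::hint::black_box;
use std::io::{Cursor, Read};

use criterion::{criterion_group, criterion_main, Criterion};

fn boxed_reader_cost(c: &mut Criterion) {
    let data = vec![0u8; 4096];

    c.bench_function("concrete_reader", |b| {
        b.iter(|| {
            let mut reader = Cursor::new(black_box(data.as_slice()));
            let mut buf = [0u8; 512];
            reader.read(&mut buf).unwrap();
            black_box(buf)
        })
    });

    c.bench_function("boxed_reader", |b| {
        b.iter(|| {
            // Same work, plus one heap allocation for the trait object.
            let mut reader: Box<dyn Read + '_> = Box::new(Cursor::new(black_box(data.as_slice())));
            let mut buf = [0u8; 512];
            reader.read(&mut buf).unwrap();
            black_box(buf)
        })
    });
}

criterion_group!(benches, boxed_reader_cost);
criterion_main!(benches);
```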

Competitor Matrix

By Use Case

| Use Case | AnyFS Component | Primary Competitors |
|---|---|---|
| Testing/mocking | MemoryBackend | `tempfile`, `vfs::MemoryFS` |
| Embedded database | SqliteBackend | `sled`, `redb`, raw SQLite |
| Sandboxed host access | VRootFsBackend | `cap-std`, chroot |
| Policy enforcement | Middleware stack | Manual application code |
| Union filesystem | Overlay | overlayfs (kernel), fuse-overlayfs |

Crate Comparison

| Crate | Strengths | Weaknesses | Compare For |
|---|---|---|---|
| `vfs` | Simple API | No middleware, limited features | API ergonomics |
| `virtual-fs` | WASM support | Less composable | Cross-platform |
| `cap-std` | Security-focused | Different abstraction level | Sandboxing |
| `tempfile` | Battle-tested | Not a VFS | Temp file operations |
| `include_dir` | Compile-time embedding | Read-only | Embedded assets |

Benchmark Infrastructure

Framework

Use criterion for statistical rigor:

  • Warm-up iterations
  • Outlier detection
  • Comparison between runs
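
A minimal harness showing how those knobs are configured might look like this; the warm-up time, sample size, and measurement time below are starting points to tune, not project policy.

```rust
// Minimal criterion harness with explicit warm-up and sampling configuration.
use std::hint::black_box;
use std::time::Duration;

use criterion::{criterion_group, criterion_main, Criterion};

fn example_bench(c: &mut Criterion) {
    // Placeholder workload; real benchmarks substitute backend operations here.
    c.bench_function("noop_baseline", |b| b.iter(|| black_box(2 + 2)));
}

criterion_group! {
    name = benches;
    config = Criterion::default()
        .warm_up_time(Duration::from_secs(3))      // warm-up before measuring
        .sample_size(100)                          // samples per benchmark
        .measurement_time(Duration::from_secs(5)); // time spent collecting samples
    targets = example_bench
}
criterion_main!(benches);
```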

Test Data Sets

| Dataset | Contents | Purpose |
|---|---|---|
| Small files | 1000 files × 1KB | Metadata-heavy workload |
| Large files | 10 files × 100MB | Throughput workload |
| Deep hierarchy | 10 levels × 10 dirs | Path resolution stress |
| Wide directory | 1 dir × 10000 files | Listing performance |
| Mixed realistic | Project-like structure | Real-world simulation |
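
As an illustration, the small-files dataset could be generated on disk like this for backends that need real paths; the `generate_small_files` helper, the payload byte, and the naming scheme are illustrative.

```rust
// Generate the "small files" dataset (1000 files × 1KB) in a temp directory.
use std::fs;
use std::io::Write;
use std::path::Path;

fn generate_small_files(root: &Path, count: usize, size: usize) -> std::io::Result<()> {
    fs::create_dir_all(root)?;
    let payload = vec![0xABu8; size];
    for i in 0..count {
        let mut file = fs::File::create(root.join(format!("file-{i:04}.bin")))?;
        file.write_all(&payload)?;
    }
    Ok(())
}

fn main() -> std::io::Result<()> {
    // 1000 files × 1KB: the metadata-heavy workload from the table above.
    let dir = tempfile::tempdir()?;
    generate_small_files(&dir.path().join("small-files"), 1_000, 1_024)?;
    Ok(())
}
```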

Reporting

Generate:

  • Throughput charts (ops/sec, MB/sec)
  • Latency histograms (p50, p95, p99)
  • Memory usage graphs
  • Comparison tables vs competitors

Performance Targets

These are aspirational targets to validate during implementation:

| Metric | Target | Rationale |
|---|---|---|
| Middleware overhead | <5% of I/O time | Composability shouldn't cost much |
| MemoryBackend vs HashMap | <2x slower | Abstraction cost |
| SqliteBackend vs raw SQLite | <1.5x slower | Thin wrapper |
| VRootFsBackend vs std::fs | <1.2x slower | Path checking cost |
| 5-layer stack | <10% overhead | Real-world composition |

Benchmark Workflow

Development Phase

cargo bench --bench <component>

Run focused benchmarks during development to catch regressions.

Release Phase

cargo bench --all

Run the full benchmark suite before each release, with a comparison against the previous version.

CI Integration

  • Run subset of benchmarks on PR (smoke test)
  • Full benchmark suite on main branch
  • Store results for trend analysis

Non-Goals

  • Beating std::fs at raw I/O - We add abstraction; some overhead is acceptable
  • Micro-optimizing cold paths - Focus on hot paths (read, write, metadata)
  • Benchmark gaming - Optimize for real use cases, not synthetic benchmarks

Tracking

GitHub Issue: Implement benchmark suite

  • Blocked by: Core AnyFS implementation
  • Dependencies: criterion, test data generation
  • Milestone: Post-1.0 (after functionality and security are solid)