Prior Art Analysis: Filesystem Abstraction Libraries

This document analyzes filesystem abstraction libraries in other languages to learn from their successes, identify features we should adopt, and avoid known vulnerabilities.


Executive Summary

| Library | Language | Key Strength | Key Weakness | What We Can Learn |
| --- | --- | --- | --- | --- |
| fsspec | Python | Async + caching + data science integration | No middleware composition | Caching strategies, async design |
| PyFilesystem2 | Python | Clean URL-based API | Symlink handling issues | Path normalization |
| Afero | Go | Composition (CopyOnWrite, Cache, BasePathFs) | Symlink escape in BasePathFs | Composition patterns |
| Apache Commons VFS | Java | Enterprise-grade, many backends | CVE: Path traversal with encoded .. | URL encoding attacks |
| System.IO.Abstractions | .NET | Perfect for testing, mirrors System.IO | No middleware/composition | MockFileSystem patterns |
| memfs | Node.js | Browser + Node unified API | Fork exists due to “longstanding bugs” | In-memory implementation |
| soft-canonicalize | Rust | Non-existing path resolution, TOCTOU-safe | Real FS only (not virtual) | Attack patterns to defend |
| strict-path | Rust | 19+ attack types blocked, type-safe markers | Real FS only (not virtual) | Attack catalog for testing |

Detailed Analysis

1. Python: fsspec

Repository: fsspec/filesystem_spec

What they do well:

  1. Unified Interface Across 20+ Backends

    • Local, S3, GCS, Azure, HDFS, HTTP, FTP, SFTP, ZIP, TAR, Git, etc.
    • Same API regardless of backend
  2. Sophisticated Caching

    # Block-wise caching - only download accessed parts
    fs = fsspec.filesystem('blockcache', target_protocol='s3',
                           cache_storage='/tmp/cache')
    
    # Whole-file caching
    fs = fsspec.filesystem('filecache', target_protocol='s3',
                           cache_storage='/tmp/cache')
    
  3. Async Support

    • AsyncFileSystem base class for async implementations
    • Concurrent bulk operations (cat fetches many files at once)
    • Used by Dask for parallel data processing
  4. Data Science Integration

    • Native integration with Pandas, Dask, Intake
    • Parquet optimization with parallel chunk fetching

What we should adopt:

  • Block-wise caching strategy (not just whole-file LRU)
  • Async design from the start (our ADR-024 async plan)
  • Consider “parts caching” for large file access patterns
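
The block-wise strategy can be sketched in Rust roughly as follows. This is a minimal illustration under stated assumptions, not fsspec's implementation or an existing AnyFS API: `BlockCache`, `read_range`, and the `fetch` callback are hypothetical names, and eviction is left out entirely.

```rust
use std::collections::HashMap;

/// Caches fixed-size blocks of a remote object so that a read only
/// fetches the blocks it touches. `fetch(offset, len)` is a hypothetical
/// callback that returns up to `len` bytes starting at `offset`.
pub struct BlockCache<F: FnMut(u64, usize) -> Vec<u8>> {
    block_size: usize,
    blocks: HashMap<u64, Vec<u8>>,
    fetch: F,
    /// Number of backend round trips, exposed for inspection.
    pub fetches: usize,
}

impl<F: FnMut(u64, usize) -> Vec<u8>> BlockCache<F> {
    pub fn new(block_size: usize, fetch: F) -> Self {
        assert!(block_size > 0, "block size must be non-zero");
        Self { block_size, blocks: HashMap::new(), fetch, fetches: 0 }
    }

    /// Reads `len` bytes starting at `offset`, fetching only missing blocks.
    /// A short block (end of object) shortens the result.
    pub fn read_range(&mut self, offset: u64, len: usize) -> Vec<u8> {
        let bs = self.block_size as u64;
        let end = offset + len as u64;
        let mut out = Vec::with_capacity(len);
        let mut pos = offset;
        while pos < end {
            let index = pos / bs;
            if !self.blocks.contains_key(&index) {
                let data = (self.fetch)(index * bs, self.block_size);
                self.fetches += 1;
                self.blocks.insert(index, data);
            }
            let block = &self.blocks[&index];
            let start = (pos - index * bs) as usize;
            let want = ((end - pos) as usize).min(self.block_size - start);
            if start >= block.len() {
                break; // past the end of the object
            }
            let stop = (start + want).min(block.len());
            out.extend_from_slice(&block[start..stop]);
            if stop - start < want {
                break; // the object ended inside this block
            }
            pos += want as u64;
        }
        out
    }
}
```

A second read that falls inside an already cached block costs no backend round trip, which is the property that distinguishes this from whole-file caching.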

What they lack that we have:

  • No middleware composition pattern
  • No quota/rate limiting built-in
  • No path filtering/sandboxing

2. Python: PyFilesystem2

Repository: PyFilesystem/pyfilesystem2

What they do well:

  1. URL-based Filesystem Specification

    from fs import open_fs
    
    home_fs = open_fs('osfs://~/')
    zip_fs = open_fs('zip://foo.zip')
    ftp_fs = open_fs('ftp://ftp.example.com')
    mem_fs = open_fs('mem://')
    
  2. Consistent Path Handling

    • Forward slashes everywhere (even on Windows)
    • Paths normalized automatically
  3. Glob Support Built-in

    for match in fs.glob('**/*.py'):
        print(match.path)
    

Known Issues (from GitHub):

| Issue | Description | Impact |
| --- | --- | --- |
| #171 | Symlink loops cause infinite recursion | DoS potential |
| #417 | No symlink creation support | Missing feature |
| #411 | Incorrect handling of symlinks with non-existing targets | Broken functionality |
| #61 | Symlinks not detected properly | Security concern |

Lessons for AnyFS:

  • ⚠️ Symlink handling is complex - we must handle loops, non-existent targets, and escaping
  • URL-based opening is convenient - consider for future
  • Consistent path format - virtual backends use forward slashes internally; OS-backed backends follow OS semantics

3. Go: Afero

Repository: spf13/afero

What they do well:

  1. Composition Pattern (Similar to Ours!)

    // Sandboxing
    baseFs := afero.NewOsFs()
    restrictedFs := afero.NewBasePathFs(baseFs, "/var/data")
    
    // Caching layer
    cachedFs := afero.NewCacheOnReadFs(baseFs, afero.NewMemMapFs(), time.Hour)
    
    // Copy-on-write
    cowFs := afero.NewCopyOnWriteFs(baseFs, afero.NewMemMapFs())
    
  2. io/fs Compatibility

    • Works with Go 1.16+ standard library interfaces
    • ReadDirFS, ReadFileFS, etc.
  3. Extensive Backend Support

    • OS, Memory, SFTP, GCS
    • Community: S3, MinIO, Dropbox, Google Drive, Git

Known Issues:

| Issue | Description | Our Mitigation |
| --- | --- | --- |
| #282 | Symlinks in BasePathFs can escape jail | Use strict-path crate for VRootFsBackend |
| #88 | Symlink handling inconsistent | Document behavior clearly |
| #344 | BasePathFs fails when basepath is . | Test edge cases |

BasePathFs Symlink Escape Issue:

“SymlinkIfPossible will resolve the RealPath of underlayer filesystem before make a symlink. For example, creating a link like ‘/foo/bar’ -> ‘/foo/file’ will be transform into a link point to ‘/{basepath}/foo/file.’”

This means symlinks can potentially point outside the base path!

Our Solution:

  • VRootFsBackend uses strict-path for real filesystem containment
  • Virtual backends (Memory, SQLite) are inherently safe - paths are just keys
  • PathFilter middleware provides additional sandboxing layer

What we should verify:

  • Test symlink creation pointing outside VRootFsBackend
  • Test .. in symlink targets
  • Test symlink loops with max depth

4. Java: Apache Commons VFS

Repository: Apache Commons VFS

🔴 CRITICAL VULNERABILITY: CVE in versions < 2.10.0

The Bug:

// FileObject API has resolveFile with scope parameter
FileObject plain = baseFile.resolveFile("../secret.txt", NameScope.DESCENDENT);
// SHOULD throw exception - "../secret.txt" is not a descendent

// BUT with URL encoding:
FileObject encoded = baseFile.resolveFile("%2e%2e/secret.txt", NameScope.DESCENDENT);
// DOES NOT throw exception! Returns file outside base directory.

Root Cause: Path validation happened BEFORE URL decoding.

Lesson for AnyFS:

use std::path::PathBuf;

// WRONG - validate then decode
fn resolve_validate_first(path: &str) -> Result<PathBuf, FsError> {
    validate_no_traversal(path)?;  // Checks for "..", but only sees "%2e%2e"
    let decoded = url_decode(path);  // "../" appears after decode!
    Ok(PathBuf::from(decoded))
}

// CORRECT - decode then validate
fn resolve_decode_first(path: &str) -> Result<PathBuf, FsError> {
    let decoded = url_decode(path);
    let normalized = normalize_path(&decoded);  // Resolve all ".."
    validate_containment(&normalized)?;
    Ok(normalized)
}

Action Items:

  • Add test: URL-encoded %2e%2e path traversal attempt
  • Add test: Double-encoding %252e%252e
  • Ensure path normalization happens BEFORE validation
  • Document in security model
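
The decode-then-validate ordering, including the double-encoding case, could look like the following self-contained sketch. Everything here is an assumption made for illustration: `resolve_contained`, `percent_decode`, and `ResolveError` are hypothetical names, decoding to a bounded fixed point is one possible policy for `%252e`-style input (rejecting such input outright is another), and treating `\` as a separator is a deliberately conservative choice.

```rust
/// Error type for this sketch only (not the AnyFS `FsError`).
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    EscapesRoot,
    TooManyEncodingLayers,
}

fn hex_val(b: u8) -> Option<u8> {
    (b as char).to_digit(16).map(|d| d as u8)
}

/// Decodes one layer of %XX escapes; malformed escapes are kept as-is.
fn percent_decode(input: &str) -> String {
    let bytes = input.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let (Some(hi), Some(lo)) = (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                out.push(hi * 16 + lo);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Decodes first, then normalizes, then checks containment.
/// Returns a normalized path relative to a virtual root "/".
pub fn resolve_contained(input: &str) -> Result<String, ResolveError> {
    // 1. Decode until the string stops changing (at most three layers).
    let mut decoded = input.to_string();
    let mut rounds = 0;
    loop {
        let next = percent_decode(&decoded);
        if next == decoded {
            break;
        }
        rounds += 1;
        if rounds > 3 {
            return Err(ResolveError::TooManyEncodingLayers);
        }
        decoded = next;
    }
    // 2. Normalize: drop "" and ".", apply "..", refuse to climb past root.
    let mut stack: Vec<&str> = Vec::new();
    for part in decoded.split(|c| c == '/' || c == '\\') {
        match part {
            "" | "." => {}
            ".." => {
                if stack.pop().is_none() {
                    return Err(ResolveError::EscapesRoot);
                }
            }
            other => stack.push(other),
        }
    }
    // 3. Whatever remains is contained by construction.
    Ok(format!("/{}", stack.join("/")))
}
```

Because validation runs on the fully decoded string, `%2e%2e`, `%252e%252e`, and `..%2f` all reach the `..` branch instead of slipping past it as opaque text.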

5. .NET: System.IO.Abstractions

Repository: TestableIO/System.IO.Abstractions

What they do well:

  1. Perfect API Compatibility

    • Mirrors System.IO exactly
    • Drop-in replacement for testing
  2. MockFileSystem for Testing

    var fileSystem = new MockFileSystem(new Dictionary<string, MockFileData>
    {
        { @"c:\myfile.txt", new MockFileData("Testing") },
        { @"c:\demo\jQuery.js", new MockFileData("jQuery content") },
    });
    
    // Use in tests
    var sut = new MyComponent(fileSystem);
    
  3. Analyzers Package

    • Roslyn analyzers warn when using System.IO directly
    • Guides developers to use abstractions

What they lack:

  • No middleware/composition
  • No caching layer
  • No sandboxing/path filtering
  • Testing-focused, not production backends

What we should adopt:

  • Consider Rust analyzer/clippy lint for std::fs usage
  • MockFileSystem pattern is similar to our MemoryBackend

6. Node.js: memfs + unionfs

Repository: streamich/memfs

What they do well:

  1. Browser + Node Unified

    • Works in browser via File System API
    • Same API as Node’s fs
  2. Union Filesystem Composition

    import { Union } from 'unionfs';
    import { fs as memfs } from 'memfs';
    import * as fs from 'fs';
    
    const ufs = new Union();
    ufs.use(fs);        // Real filesystem as base
    ufs.use(memfs);     // Memory overlay
    

Known Issues:

“There is a fork of memfs maintained by SageMath (sagemathinc/memfs-js) which was created to fix 13 security vulnerabilities revealed by npm audit. This fork exists because, as their GitHub description notes, ‘there are longstanding bugs’ in the upstream memfs.”

Lesson: Even popular libraries can have security issues. Our conformance test suite should be comprehensive.


Vulnerabilities Summary

| Library | Vulnerability | Type | Our Mitigation |
| --- | --- | --- | --- |
| Apache Commons VFS | CVE (pre-2.10.0) | URL-encoded path traversal | Decode before validate |
| Afero (Go) | Issue #282, #88 | Symlink escape from BasePathFs | Use strict-path, test thoroughly |
| PyFilesystem2 | Issue #171 | Symlink loop causes infinite recursion | Loop detection with max depth |
| memfs (Node) | 13 vulns in npm audit | Various (unspecified) | Comprehensive test suite |

Features Comparison Matrix

Feature | fsspec | PyFS2 | Afero | Commons VFS | System.IO.Abs | AnyFS
Middleware composition
Quota enforcement
Path sandboxing
Rate limiting
Caching layer
Async support 🔜
Block-wise caching
URL-based opening
Union/overlay FS
Memory backend
SQLite backend
FUSE mounting 🔜
Type-safe wrappers *

Future Ideas to Consider

These are optional extensions inspired by other ecosystems. They are intentionally not part of the core scope.

Keep (add-ons that fit the current design):

  • URL-based backend registry (sqlite://, mem://, stdfs://) as a helper crate, not in core APIs.
  • Bulk operation helpers (read_many, write_many, copy_many, glob, walk) as FsExt or a utilities crate.
  • Early async adapter crate (anyfs-async) to support remote backends without changing sync traits.
  • Bash-style shell (example app or anyfs-shell crate) that routes ls/cd/cat/cp/mv/rm/mkdir/stat through FileStorage to demonstrate middleware and backend neutrality (navigation and file management only, not full bash scripting).
  • Copy-on-write overlay middleware (Afero-style CopyOnWriteFs) as a specialized Overlay variant.
  • Archive backends (zip/tar) as separate crates implementing Fs (PyFilesystem/fsspec-style).
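
To make the copy-on-write overlay idea concrete, here is a minimal sketch. It assumes plain `HashMap`s in place of real backends; `CowOverlay` and its methods are hypothetical names, not Afero's `CopyOnWriteFs` and not an existing AnyFS type.

```rust
use std::collections::{HashMap, HashSet};

/// A read-only base layer plus a writable in-memory upper layer.
pub struct CowOverlay {
    base: HashMap<String, Vec<u8>>,  // stands in for a read-only backend
    upper: HashMap<String, Vec<u8>>, // receives every write
    deleted: HashSet<String>,        // whiteouts that hide base entries
}

impl CowOverlay {
    pub fn new(base: HashMap<String, Vec<u8>>) -> Self {
        Self { base, upper: HashMap::new(), deleted: HashSet::new() }
    }

    /// Reads prefer the upper layer and fall back to the base layer.
    pub fn read(&self, path: &str) -> Option<&[u8]> {
        if self.deleted.contains(path) {
            return None;
        }
        self.upper
            .get(path)
            .or_else(|| self.base.get(path))
            .map(|v| v.as_slice())
    }

    /// Writes never touch the base layer.
    pub fn write(&mut self, path: &str, data: &[u8]) {
        self.deleted.remove(path);
        self.upper.insert(path.to_string(), data.to_vec());
    }

    /// Removal drops the upper copy and records a whiteout if the base
    /// layer still holds the path. Returns whether the path was visible.
    pub fn remove(&mut self, path: &str) -> bool {
        let existed = self.read(path).is_some();
        self.upper.remove(path);
        if self.base.contains_key(path) {
            self.deleted.insert(path.to_string());
        }
        existed
    }

    /// Exposes the untouched base layer for inspection.
    pub fn base(&self) -> &HashMap<String, Vec<u8>> {
        &self.base
    }
}
```

The whiteout set is what lets a delete succeed without mutating the base layer; without it, a removed file would reappear on the next read.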

Defer (valuable, but needs data or wider review):

  • Range/block caching middleware for read_range heavy workloads (fsspec-style block cache).
  • Runtime capability discovery (Capabilities struct) for feature detection (symlink control, case sensitivity, max path length).
  • Lint/analyzer to discourage direct std::fs usage in app code (System.IO.Abstractions-style).
  • Retry/timeout middleware for remote backends (once remote backends exist).

Drop for now (adds noise or cross-platform complexity):

  • Change notification support (optional FsWatch trait or polling middleware).

Security Tests to Add

Based on vulnerabilities found in other libraries, add these to our conformance test suite:

Path Traversal Tests

#[test]
fn test_url_encoded_path_traversal() {
    let fs = create_sandboxed_fs("/sandbox");

    // These should all fail or be contained
    assert!(fs.read("%2e%2e/etc/passwd").is_err());      // URL-encoded ../
    assert!(fs.read("%252e%252e/secret").is_err());      // Double-encoded
    assert!(fs.read("..%2f..%2fetc/passwd").is_err());   // Mixed encoding
    assert!(fs.read("....//....//etc/passwd").is_err()); // Extra dots
}

#[test]
fn test_symlink_escape() {
    let fs = create_sandboxed_fs("/sandbox");

    // Symlink pointing outside should fail or be contained
    assert!(fs.symlink("/etc/passwd", "/sandbox/link").is_err());
    assert!(fs.symlink("../../../etc/passwd", "/sandbox/link").is_err());

    // Even if symlink created, reading should fail
    fs.symlink("../secret", "/sandbox/link").ok();
    assert!(fs.read("/sandbox/link").is_err());
}

#[test]
fn test_symlink_loop_detection() {
    let fs = MemoryBackend::new();

    // Create loop: a -> b -> a
    fs.symlink("/b", "/a").unwrap();
    fs.symlink("/a", "/b").unwrap();

    // Should detect loop, not hang
    let result = fs.read("/a");
    assert!(matches!(result, Err(FsError::TooManySymlinks { .. })));
}

Resource Exhaustion Tests

#[test]
fn test_deep_directory_traversal() {
    let fs = create_fs_with_depth_limit(64);

    // Creating very deep paths should fail
    let deep_path = "/".to_string() + &"a/".repeat(100);
    assert!(fs.create_dir_all(&deep_path).is_err());
}

#[test]
fn test_many_open_handles() {
    let fs = create_fs();
    let mut handles = vec![];

    // Opening many files shouldn't crash
    for i in 0..10000 {
        fs.write(format!("/file{}", i), b"x").unwrap();
        if let Ok(h) = fs.open_read(format!("/file{}", i)) {
            handles.push(h);
        }
    }
    // Should either succeed or return resource error, not crash
}

Action Items

High Priority

| Task | Source | Priority |
| --- | --- | --- |
| Add URL-encoded path traversal tests | Apache Commons VFS CVE | 🔴 Critical |
| Add symlink escape tests for VRootFsBackend | Afero issues | 🔴 Critical |
| Add symlink loop detection | PyFilesystem2 #171 | 🔴 Critical |
| Verify strict-path handles all edge cases | Afero BasePathFs issues | 🔴 Critical |

Medium Priority (Future)

| Task | Source | Priority |
| --- | --- | --- |
| Consider block-wise caching for large files | fsspec | 🟡 Enhancement |
| Add async support | fsspec async design | 🟡 Enhancement |
| URL-based filesystem specification | PyFilesystem2, Commons VFS | 🟢 Nice-to-have |

Documentation

| Task | Source |
| --- | --- |
| Document symlink behavior for each backend | All libraries have issues |
| Add security considerations for path handling | Apache Commons VFS CVE |
| Compare AnyFS to alternatives | This analysis |

Sibling Rust Projects: Path Security Libraries

AnyFS builds on foundational security work from two related Rust crates that specifically address path resolution vulnerabilities. AnyFS plans to use both crates in its path handling implementation.

soft-canonicalize-rs

Repository: DK26/soft-canonicalize-rs

Purpose: Path canonicalization that works with non-existing paths, which std::fs::canonicalize rejects with an error.

Security Features:

| Feature | Description | Attack Prevented |
| --- | --- | --- |
| NTFS ADS validation | Blocks alternate data stream syntax | Hidden data, path escape |
| Symlink cycle detection | Bounded depth tracking | DoS via infinite loops |
| Path traversal clamping | Can’t ascend past root | Directory escape |
| Null byte rejection | Early validation | Null injection |
| TOCTOU resistance | Atomic-like resolution | Race conditions |
| Windows UNC handling | Normalizes extended paths | Path confusion |
| Linux namespace preservation | Uses proc-canonicalize | Container escape via /proc/PID/root |

Key Innovation: Anchored Canonicalization

// All paths (including symlink targets) are clamped to anchor
let result = anchored_canonicalize("/workspace", user_input)?;
// If symlink points to /etc/passwd, result becomes /workspace/etc/passwd

This is exactly what VRootFsBackend needs for safe path containment.

strict-path-rs

Repository: DK26/strict-path-rs

Purpose: Type-safe path handling that rejects traversal attempts when a path is validated and uses marker types to catch boundary mix-ups at compile time.

Two Modes:

| Mode | Behavior | Use Case |
| --- | --- | --- |
| StrictPath | Returns Err(PathEscapesBoundary) on escape | Archive extraction, file uploads |
| VirtualPath | Clamps escape attempts within sandbox | Multi-tenant, per-user storage |
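
The difference between the two modes can be illustrated with a purely lexical sketch. This is not the strict-path API: `strict_join`, `virtual_join`, and `EscapesBoundary` are hypothetical names, and the real crate also resolves symlinks on disk, which this sketch does not attempt.

```rust
/// Error for the strict mode of this sketch.
#[derive(Debug, PartialEq)]
pub struct EscapesBoundary;

/// Normalizes `input` component by component. With `clamp` set, a ".."
/// at the root is ignored; otherwise it is reported as an escape.
fn walk(input: &str, clamp: bool) -> Result<Vec<String>, EscapesBoundary> {
    let mut stack = Vec::new();
    for part in input.split('/') {
        match part {
            "" | "." => {}
            ".." => {
                if stack.pop().is_none() && !clamp {
                    return Err(EscapesBoundary);
                }
            }
            other => stack.push(other.to_string()),
        }
    }
    Ok(stack)
}

/// Joins normalized components onto `root`.
fn render(root: &str, parts: &[String]) -> String {
    let mut out = String::from(root.trim_end_matches('/'));
    for part in parts {
        out.push('/');
        out.push_str(part);
    }
    if out.is_empty() {
        out.push('/');
    }
    out
}

/// Strict mode: any attempt to leave `root` is an error.
pub fn strict_join(root: &str, input: &str) -> Result<String, EscapesBoundary> {
    walk(input, false).map(|parts| render(root, &parts))
}

/// Virtual mode: escape attempts are clamped so the result stays under `root`.
pub fn virtual_join(root: &str, input: &str) -> String {
    let parts = walk(input, true).unwrap_or_default();
    render(root, &parts)
}
```

Strict mode suits inputs where an escape attempt signals an attack worth surfacing, while virtual mode suits per-user views where `/` should simply mean the user's own root.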

Documented Attack Coverage (19+ vulnerabilities):

| Attack Type | Description |
| --- | --- |
| Symlink/junction escapes | Follows and validates canonical paths |
| Windows 8.3 short names | Detects PROGRA~1 obfuscation |
| NTFS Alternate Data Streams | Blocks file.txt:hidden:$DATA |
| Zip Slip (CVE-2018-1000178) | Validates archive entries before extraction |
| TOCTOU (CVE-2022-21658) | Handles time-of-check-time-of-use races |
| Unicode/encoding bypasses | Normalizes path representations |
| Mixed separators | Handles / and \ on Windows |
| UNC path tricks | Prevents \\?\C:\..\..\ attacks |

Type-Safe Marker Pattern (mirrors AnyFS’s design!):

struct UserFiles;
struct SystemFiles;

fn process_user(f: &StrictPath<UserFiles>) { /* ... */ }
// Wrong marker type = compile error
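
A self-contained version of the marker idea, for readers who want to try it without the crate, might look like this. `Scoped` is a hypothetical stand-in for `StrictPath`; it carries no validation at all and only demonstrates how a zero-sized marker type separates otherwise identical paths.

```rust
use std::marker::PhantomData;
use std::path::{Path, PathBuf};

/// A path tagged with a zero-sized marker type `M`.
/// Hypothetical stand-in for a validated path type; no validation here.
pub struct Scoped<M> {
    path: PathBuf,
    _marker: PhantomData<M>,
}

impl<M> Scoped<M> {
    pub fn new(path: impl Into<PathBuf>) -> Self {
        Self { path: path.into(), _marker: PhantomData }
    }

    pub fn as_path(&self) -> &Path {
        &self.path
    }
}

/// Marker types: they exist only at the type level.
pub struct UserFiles;
pub struct SystemFiles;

/// Accepts only paths tagged as user files.
pub fn process_user(file: &Scoped<UserFiles>) -> String {
    format!("user:{}", file.as_path().display())
}
```

Passing a `Scoped<SystemFiles>` to `process_user` is rejected by the compiler as a type mismatch, so the mix-up never reaches runtime.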

Applicability to AnyFS

Important distinction:

| Backend Type | Storage Mechanism | Path Resolution Provider |
| --- | --- | --- |
| VRootFsBackend | Real filesystem | OS (backend is SelfResolving) |
| MemoryBackend | HashMap keys | FileStorage (symlink-aware) |
| SqliteBackend | DB strings | FileStorage (symlink-aware) |

For virtual backends (Memory, SQLite, etc.):

  • These third-party crates perform real filesystem resolution (follow actual symlinks on disk)
  • Virtual backends treat paths as keys, so these crates can’t help
  • AnyFS implements its own path resolution in FileStorage that:
    1. Walks path components via metadata() and read_link()
    2. Resolves symlinks by reading targets from virtual storage
    3. Handles .. correctly after symlink resolution
    4. Detects loops by tracking visited virtual paths
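
The four steps above can be sketched over an in-memory node table. This is an illustration under stated assumptions, not the FileStorage implementation: `Node`, `resolve`, `ResolveError`, and `MAX_HOPS` are hypothetical, a `HashMap` lookup stands in for `metadata()` and `read_link()`, and loops are caught with a hop counter rather than a visited-path set.

```rust
use std::collections::HashMap;

/// What a virtual path can point at in this sketch.
pub enum Node {
    File,
    Dir,
    Symlink(String),
}

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    NotFound,
    TooManySymlinks,
}

/// Upper bound on symlink expansions per lookup (assumed value).
const MAX_HOPS: usize = 40;

/// Splits a path into components, reversed so `pop()` yields the next one.
fn components_rev(path: &str) -> impl Iterator<Item = String> + '_ {
    path.split('/').rev().filter(|s| !s.is_empty()).map(String::from)
}

/// Resolves `path` to a symlink-free absolute virtual path.
pub fn resolve(nodes: &HashMap<String, Node>, path: &str) -> Result<String, ResolveError> {
    let mut pending: Vec<String> = components_rev(path).collect();
    let mut resolved: Vec<String> = Vec::new();
    let mut hops = 0;

    while let Some(part) = pending.pop() {
        match part.as_str() {
            "." => continue,
            ".." => {
                // Applied to the already resolved prefix, so ".." after a
                // symlink climbs from the link's target, not from the link.
                resolved.pop();
                continue;
            }
            _ => {}
        }
        resolved.push(part);
        let current = format!("/{}", resolved.join("/"));
        match nodes.get(&current) {
            None => return Err(ResolveError::NotFound),
            Some(Node::Symlink(target)) => {
                hops += 1;
                if hops > MAX_HOPS {
                    return Err(ResolveError::TooManySymlinks);
                }
                resolved.pop(); // the link is replaced by its target
                if target.starts_with('/') {
                    resolved.clear(); // absolute target restarts at the root
                }
                pending.extend(components_rev(target));
            }
            Some(_) => {}
        }
    }
    Ok(format!("/{}", resolved.join("/")))
}
```

Because targets are read from the node table rather than from disk, a link can never lead outside the virtual namespace, and the hop limit bounds the work done on a cycle such as `/a -> /b -> /a`.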

For VRootFsBackend only:

  • Since it wraps the real filesystem, strict-path provides safe containment
  • The backend implements SelfResolving, so FileStorage skips its own resolution

Security Tests Added to Conformance Suite

Based on these libraries, we’ve added tests for:

Windows-Specific:

  • NTFS Alternate Data Streams (file.txt:hidden)
  • Windows 8.3 short names (PROGRA~1)
  • UNC path traversal (\\?\C:\..\..\)
  • Reserved device names (CON, PRN, NUL)
  • Junction point escapes
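
Several of the Windows cases can be screened per path component before any filesystem call. The helpers below are a hedged sketch with hypothetical names; the checks are deliberately conservative and not exhaustive, and they do not replace the canonicalization performed by strict-path.

```rust
/// True if the component names a DOS device such as CON or NUL.
/// Windows has historically ignored the extension, so "NUL.txt" is
/// treated as reserved here. COM0/LPT0 are included to stay conservative.
pub fn is_reserved_device_name(component: &str) -> bool {
    let stem = component
        .split('.')
        .next()
        .unwrap_or("")
        .trim_end_matches(' ');
    let upper = stem.to_ascii_uppercase();
    matches!(upper.as_str(), "CON" | "PRN" | "AUX" | "NUL")
        || (upper.len() == 4
            && (upper.starts_with("COM") || upper.starts_with("LPT"))
            && upper.as_bytes()[3].is_ascii_digit())
}

/// True if the component uses alternate data stream syntax ("name:stream").
/// Assumes drive prefixes such as "C:" were stripped beforehand.
pub fn has_ads_syntax(component: &str) -> bool {
    component.contains(':')
}

/// True if the stem looks like an 8.3 short name such as "PROGRA~1":
/// some text, a tilde, then only digits.
pub fn looks_like_short_name(component: &str) -> bool {
    let stem = component.split('.').next().unwrap_or("");
    match stem.rsplit_once('~') {
        Some((head, digits)) => {
            !head.is_empty()
                && !digits.is_empty()
                && digits.bytes().all(|b| b.is_ascii_digit())
        }
        None => false,
    }
}
```

A conformance test can feed each listed attack string through these predicates and assert that the sandboxed backend rejects every component they flag.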

Linux-Specific:

  • /proc/PID/root magic symlinks
  • /dev/fd/N file descriptor symlinks

Unicode:

  • NFC vs NFD normalization
  • Right-to-Left Override (U+202E)
  • Homoglyph confusion (Cyrillic vs Latin)

TOCTOU:

  • Check-then-use race conditions
  • Symlink target changes during resolution

Conclusion

What makes AnyFS unique:

  1. Middleware composition - Of the libraries surveyed, only Afero offers this; AnyFS generalizes it with Tower-style layers
  2. Quota + rate limiting - No other library has built-in resource control
  3. Type-safe wrappers - Users can create wrapper newtypes for compile-time container isolation
  4. SQLite backend - No other abstraction library offers this
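
To show what Tower-style composition with built-in resource control means in practice, here is a minimal sketch. The `Fs` trait, `FsError`, `MemoryFs`, `Quota`, and `ReadOnly` below are simplified, hypothetical stand-ins written for this example; they are not the AnyFS definitions.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
pub enum FsError {
    NotFound,
    QuotaExceeded,
    ReadOnly,
}

/// A deliberately tiny filesystem trait for the sketch.
pub trait Fs {
    fn read(&self, path: &str) -> Result<Vec<u8>, FsError>;
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), FsError>;
}

/// Innermost layer: an in-memory backend.
#[derive(Default)]
pub struct MemoryFs {
    files: HashMap<String, Vec<u8>>,
}

impl Fs for MemoryFs {
    fn read(&self, path: &str) -> Result<Vec<u8>, FsError> {
        self.files.get(path).cloned().ok_or(FsError::NotFound)
    }
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), FsError> {
        self.files.insert(path.to_string(), data.to_vec());
        Ok(())
    }
}

/// Middleware: caps the total bytes stored through this layer.
pub struct Quota<F: Fs> {
    inner: F,
    limit: usize,
    used: usize,
}

impl<F: Fs> Quota<F> {
    pub fn new(inner: F, limit: usize) -> Self {
        Self { inner, limit, used: 0 }
    }
}

impl<F: Fs> Fs for Quota<F> {
    fn read(&self, path: &str) -> Result<Vec<u8>, FsError> {
        self.inner.read(path)
    }
    fn write(&mut self, path: &str, data: &[u8]) -> Result<(), FsError> {
        // Overwriting a file releases the bytes of the old contents.
        let old = self.inner.read(path).map(|d| d.len()).unwrap_or(0);
        let new_used = self.used.saturating_sub(old) + data.len();
        if new_used > self.limit {
            return Err(FsError::QuotaExceeded);
        }
        self.inner.write(path, data)?;
        self.used = new_used;
        Ok(())
    }
}

/// Middleware: rejects every write, passes reads through.
pub struct ReadOnly<F: Fs>(pub F);

impl<F: Fs> Fs for ReadOnly<F> {
    fn read(&self, path: &str) -> Result<Vec<u8>, FsError> {
        self.0.read(path)
    }
    fn write(&mut self, _path: &str, _data: &[u8]) -> Result<(), FsError> {
        Err(FsError::ReadOnly)
    }
}
```

Each layer implements the same trait it wraps, so `ReadOnly(Quota::new(MemoryFs::default(), 8))` is itself an `Fs`, and policies stack without the backend knowing about them.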

What we should learn from others:

  1. Path traversal via encoding - Apache Commons VFS vulnerability
  2. Symlink handling complexity - All libraries struggle with this
  3. Caching strategies - fsspec’s block-wise caching is sophisticated
  4. Async support - fsspec shows how to do this well

Critical security tests to add:

  1. URL-encoded path traversal (%2e%2e)
  2. Symlink escape from sandboxed directories
  3. Symlink loop detection
  4. Deep path exhaustion
