IndexedBackend Pattern

SQLite Metadata + Content-Addressed Blob Storage

This document describes the IndexedBackend architecture pattern: separating filesystem metadata (stored in SQLite) from file content (stored as blobs). This enables efficient queries, large file support, and flexible storage backends.

Ecosystem Implementation: The anyfs-indexed crate provides IndexedBackend as a production-ready implementation using local disk blobs. See the Backends Guide for usage. This document covers the underlying design pattern for those building custom implementations (e.g., with S3, cloud storage, or custom blob stores).


Overview

The IndexedBackend pattern separates:

  • Metadata (directory structure, inodes, permissions) → SQLite
  • Content (file bytes) → Content-Addressed Storage (CAS)
┌─────────────────────────────────────────────────────────┐
│            IndexedBackend (pattern)                     │
│  ┌─────────────────────┐    ┌────────────────────────┐  │
│  │   SQLite Metadata   │    │   Blob Store (CAS)     │  │
│  │                     │    │                        │  │
│  │  - inodes           │    │  - content-addressed   │  │
│  │  - dir_entries      │    │  - deduplicated        │  │
│  │  - blob references  │    │  - S3, local, etc.     │  │
│  │  - audit log        │    │                        │  │
│  └─────────────────────┘    └────────────────────────┘  │
└─────────────────────────────────────────────────────────┘

Custom backends can use S3, cloud storage, or other blob stores.
IndexedBackend implements a simpler variant with UUID-named local blobs
(optimized for streaming; see note below on storage models).

Why this pattern?

  • SQLite is great for metadata queries (directory listings, stats, audit)
  • Blob stores scale better for large file content
  • Content-addressing enables deduplication
  • Separating concerns enables independent scaling

Storage Model Variants

Model             | Blob Naming            | Dedup  | Best For
------------------|------------------------|--------|-----------------------------------
Content-Addressed | SHA-256 of content     | ✅ Yes | Cloud/S3, archival, multi-tenant
UUID+Timestamp    | {uuid}-{timestamp}.bin | ❌ No  | Streaming large files, simplicity

IndexedBackend uses UUID+Timestamp naming because:

  • Large files can be streamed without buffering the entire file to compute a hash
  • Write latency is consistent (no hash computation)
  • Simpler garbage collection (delete blob when reference removed)
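
A minimal sketch of that naming scheme, assuming the uuid crate (v4 feature); the helper name and the .bin suffix are illustrative, not anyfs-indexed API:

// Generate a UUID+timestamp blob name. No content hash is required, so the name can be
// chosen before any bytes arrive, which is what makes streaming writes possible.
use std::time::{SystemTime, UNIX_EPOCH};
use uuid::Uuid;

fn new_blob_name() -> String {
    let ts = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before the Unix epoch")
        .as_secs();
    format!("{}-{}.bin", Uuid::new_v4(), ts)
}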

Custom implementations may prefer content-addressed storage when:

  • Deduplication is valuable (many users uploading same files)
  • Using cloud blob stores with native CAS support (S3, GCS)
  • Building archival systems where write latency is acceptable

Framework Validation

Do Current Traits Support This?

Yes. The Fs traits define operations, not storage implementation.

Trait Method      | Hybrid Implementation
------------------|------------------------------------
read(path)        | SQLite lookup → blob fetch
write(path, data) | Blob upload → SQLite update
metadata(path)    | SQLite query only
read_dir(path)    | SQLite query only
remove_file(path) | SQLite update (refcount--)
rename(from, to)  | SQLite update only
copy(from, to)    | SQLite update (refcount++)

The traits don’t care where bytes come from - that’s the backend’s business.

Thread Safety

The current design requires &self methods with interior mutability. For a hybrid backend:

pub struct CustomIndexedBackend {
    // SQLite needs single-writer (see "Write Queue" below)
    metadata: Arc<Mutex<Connection>>,

    // Blob store is typically already thread-safe
    blobs: Arc<dyn BlobStore>,

    // Write queue for serializing SQLite writes
    write_tx: mpsc::Sender<WriteCmd>,
}

This aligns with ADR-023 (interior mutability).


Data Model

SQLite Schema

-- Inode table (one row per file/directory/symlink)
CREATE TABLE nodes (
    inode       INTEGER PRIMARY KEY,
    parent      INTEGER NOT NULL,
    name        TEXT NOT NULL,
    node_type   TEXT NOT NULL,  -- 'file', 'dir', 'symlink'
    size        INTEGER NOT NULL DEFAULT 0,
    mode        INTEGER NOT NULL DEFAULT 420,  -- 0o644
    nlink       INTEGER NOT NULL DEFAULT 1,
    blob_id     TEXT,           -- NULL for directories
    symlink_target TEXT,        -- NULL unless symlink
    created_at  INTEGER NOT NULL,
    modified_at INTEGER NOT NULL,
    accessed_at INTEGER NOT NULL,

    UNIQUE(parent, name)
);

-- Root directory (inode 1)
INSERT INTO nodes (inode, parent, name, node_type, size, mode, created_at, modified_at, accessed_at)
VALUES (1, 1, '', 'dir', 0, 493, strftime('%s', 'now'), strftime('%s', 'now'), strftime('%s', 'now'));  -- 493 = 0o755

-- Blob reference tracking (for dedup + GC)
CREATE TABLE blobs (
    blob_id     TEXT PRIMARY KEY,  -- sha256 hex
    size        INTEGER NOT NULL,
    refcount    INTEGER NOT NULL DEFAULT 0,
    created_at  INTEGER NOT NULL
);

-- Audit log (optional but recommended)
CREATE TABLE audit (
    seq         INTEGER PRIMARY KEY AUTOINCREMENT,
    timestamp   INTEGER NOT NULL,
    operation   TEXT NOT NULL,
    path        TEXT,
    actor       TEXT,
    details     TEXT  -- JSON
);

-- Indexes
CREATE INDEX idx_nodes_parent ON nodes(parent);
CREATE INDEX idx_nodes_blob ON nodes(blob_id) WHERE blob_id IS NOT NULL;
CREATE INDEX idx_blobs_refcount ON blobs(refcount) WHERE refcount = 0;

Blob Store Interface

use std::io::{Read, Write};

/// Content-addressed blob storage.
pub trait BlobStore: Send + Sync {
    /// Store bytes, returns content hash (blob_id).
    fn put(&self, data: &[u8]) -> Result<String, BlobError>;

    /// Retrieve bytes by content hash.
    fn get(&self, blob_id: &str) -> Result<Vec<u8>, BlobError>;

    /// Check if blob exists.
    fn exists(&self, blob_id: &str) -> Result<bool, BlobError>;

    /// Delete blob (only call after refcount reaches 0).
    fn delete(&self, blob_id: &str) -> Result<(), BlobError>;

    /// Streaming read for large files.
    fn open_read(&self, blob_id: &str) -> Result<Box<dyn Read + Send>, BlobError>;

    /// Streaming write, returns blob_id on completion.
    fn open_write(&self) -> Result<Box<dyn BlobWriter>, BlobError>;
}

pub trait BlobWriter: Write + Send {
    /// Finalize the blob and return its content hash.
    fn finalize(self: Box<Self>) -> Result<String, BlobError>;
}

Implementations could be:

  • LocalCasBackend - local directory with content-addressed files
  • S3BlobStore - S3-compatible object storage
  • MemoryBlobStore - in-memory for testing
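
To make the trait concrete, here is a hedged sketch of the MemoryBlobStore variant. It assumes the sha2 and hex crates and a BlobError::NotFound variant; adapt those details to the real error type.

// In-memory, content-addressed BlobStore for tests. Everything lives in one HashMap
// behind a Mutex, so the store is Send + Sync as the trait requires.
use std::collections::HashMap;
use std::io::{Cursor, Read, Write};
use std::sync::{Arc, Mutex};
use sha2::{Digest, Sha256};

#[derive(Default, Clone)]
pub struct MemoryBlobStore {
    blobs: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl BlobStore for MemoryBlobStore {
    fn put(&self, data: &[u8]) -> Result<String, BlobError> {
        let blob_id = hex::encode(Sha256::digest(data));
        self.blobs.lock().unwrap().insert(blob_id.clone(), data.to_vec());
        Ok(blob_id)
    }

    fn get(&self, blob_id: &str) -> Result<Vec<u8>, BlobError> {
        self.blobs.lock().unwrap().get(blob_id).cloned().ok_or(BlobError::NotFound)
    }

    fn exists(&self, blob_id: &str) -> Result<bool, BlobError> {
        Ok(self.blobs.lock().unwrap().contains_key(blob_id))
    }

    fn delete(&self, blob_id: &str) -> Result<(), BlobError> {
        self.blobs.lock().unwrap().remove(blob_id);
        Ok(())
    }

    fn open_read(&self, blob_id: &str) -> Result<Box<dyn Read + Send>, BlobError> {
        // Everything is buffered in memory anyway, so a Cursor over a copy is enough.
        Ok(Box::new(Cursor::new(self.get(blob_id)?)))
    }

    fn open_write(&self) -> Result<Box<dyn BlobWriter>, BlobError> {
        Ok(Box::new(MemoryBlobWriter { buf: Vec::new(), blobs: Arc::clone(&self.blobs) }))
    }
}

struct MemoryBlobWriter {
    buf: Vec<u8>,
    blobs: Arc<Mutex<HashMap<String, Vec<u8>>>>,
}

impl Write for MemoryBlobWriter {
    fn write(&mut self, buf: &[u8]) -> std::io::Result<usize> {
        self.buf.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> std::io::Result<()> {
        Ok(())
    }
}

impl BlobWriter for MemoryBlobWriter {
    fn finalize(self: Box<Self>) -> Result<String, BlobError> {
        // Hash once at the end; acceptable for an in-memory test double.
        let blob_id = hex::encode(Sha256::digest(&self.buf));
        self.blobs.lock().unwrap().insert(blob_id.clone(), self.buf);
        Ok(blob_id)
    }
}

Swapping this in for a disk-backed store keeps backend tests hermetic and fast.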

Implementation Sketch

Core Structure

use anyfs_backend::{FsRead, FsWrite, FsDir, FsError, Metadata, ReadDirIter, DirEntry, FileType};
use rusqlite::Connection;
use std::sync::{Arc, Mutex};
use std::path::{Path, PathBuf};
use tokio::sync::{mpsc, oneshot};

pub struct CustomIndexedBackend {
    /// SQLite connection (metadata)
    db: Arc<Mutex<Connection>>,

    /// Content-addressed blob storage
    blobs: Arc<dyn BlobStore>,

    /// Write command queue (single-writer pattern)
    write_tx: mpsc::UnboundedSender<WriteCmd>,

    /// Background writer handle
    _writer_handle: Arc<WriterHandle>,
}

enum WriteCmd {
    Write {
        path: PathBuf,
        blob_id: String,
        size: u64,
        reply: oneshot::Sender<Result<(), FsError>>,
    },
    Remove {
        path: PathBuf,
        reply: oneshot::Sender<Result<(), FsError>>,
    },
    CreateDir {
        path: PathBuf,
        reply: oneshot::Sender<Result<(), FsError>>,
    },
    // ... other write operations
}

Read Operations (Direct)

Read operations can query SQLite and blob store directly (no queue needed):

impl FsRead for CustomIndexedBackend {
    fn read(&self, path: &Path) -> Result<Vec<u8>, FsError> {
        // 1. Query SQLite for blob_id
        let db = self.db.lock().map_err(|_| FsError::Backend("lock poisoned".into()))?;

        let (blob_id, node_type): (Option<String>, String) = db.query_row(
            "SELECT blob_id, node_type FROM nodes WHERE inode = (
                SELECT inode FROM nodes WHERE parent = ? AND name = ?
            )",
            // ... path resolution params
            |row| Ok((row.get(0)?, row.get(1)?)),
        ).map_err(|_| FsError::NotFound { path: path.to_path_buf() })?;

        if node_type != "file" {
            return Err(FsError::NotAFile { path: path.to_path_buf() });
        }

        let blob_id = blob_id.ok_or_else(|| FsError::NotFound { path: path.to_path_buf() })?;

        drop(db);  // Release lock before blob fetch

        // 2. Fetch from blob store
        self.blobs.get(&blob_id)
            .map_err(|e| FsError::Backend(e.to_string()))
    }

    fn exists(&self, path: &Path) -> Result<bool, FsError> {
        let db = self.db.lock().map_err(|_| FsError::Backend("lock poisoned".into()))?;

        // Pure SQLite query
        let exists: bool = db.query_row(
            "SELECT EXISTS(SELECT 1 FROM nodes WHERE parent = ? AND name = ?)",
            // ... params
            |row| row.get(0),
        ).unwrap_or(false);

        Ok(exists)
    }

    fn metadata(&self, path: &Path) -> Result<Metadata, FsError> {
        let db = self.db.lock().map_err(|_| FsError::Backend("lock poisoned".into()))?;

        // Pure SQLite query - no blob store needed
        db.query_row(
            "SELECT node_type, size, mode, nlink, created_at, modified_at, accessed_at, inode
             FROM nodes WHERE parent = ? AND name = ?",
            // ... params
            |row| {
                let node_type: String = row.get(0)?;
                Ok(Metadata {
                    file_type: match node_type.as_str() {
                        "file" => FileType::File,
                        "dir" => FileType::Directory,
                        "symlink" => FileType::Symlink,
                        _ => FileType::File,
                    },
                    size: row.get(1)?,
                    permissions: Some(row.get(2)?),
                    // ... other fields
                })
            },
        ).map_err(|_| FsError::NotFound { path: path.to_path_buf() })
    }

    // ... other FsRead methods
}
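
The "... path resolution params" placeholders above hide the step that turns a path into an inode. A hedged sketch of that lookup, walking (parent, name) pairs from root inode 1 as defined in the schema (the lookup_inode helper is hypothetical):

// Resolve a path to its inode with one indexed query per component.
fn lookup_inode(db: &Connection, path: &Path) -> Result<i64, FsError> {
    use std::path::Component;

    let mut inode: i64 = 1; // root

    for component in path.components() {
        let name = match component {
            Component::RootDir => continue,
            Component::Normal(name) => name.to_string_lossy().into_owned(),
            _ => return Err(FsError::Backend("unsupported path component".into())),
        };

        inode = db
            .query_row(
                "SELECT inode FROM nodes WHERE parent = ?1 AND name = ?2",
                rusqlite::params![inode, name],
                |row| row.get(0),
            )
            .map_err(|_| FsError::NotFound { path: path.to_path_buf() })?;
    }

    Ok(inode)
}

The UNIQUE(parent, name) constraint already gives SQLite an index that serves this query, so each component resolves in a single indexed lookup.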

Write Operations (Two-Phase Commit)

Writes use a two-phase pattern: upload blob first, then commit SQLite:

impl FsWrite for CustomIndexedBackend {
    fn write(&self, path: &Path, data: &[u8]) -> Result<(), FsError> {
        let path = path.to_path_buf();

        // Phase 1: Upload blob (can fail independently)
        let blob_id = self.blobs.put(data)
            .map_err(|e| FsError::Backend(format!("blob upload failed: {}", e)))?;

        // Phase 2: Commit metadata (via write queue)
        let (tx, rx) = oneshot::channel();

        self.write_tx.send(WriteCmd::Write {
            path,
            blob_id,
            size: data.len() as u64,
            reply: tx,
        }).map_err(|_| FsError::Backend("write queue closed".into()))?;

        // Wait for SQLite commit
        rx.blocking_recv()
            .map_err(|_| FsError::Backend("write cancelled".into()))?
    }

    fn remove_file(&self, path: &Path) -> Result<(), FsError> {
        let path = path.to_path_buf();

        // Queue the removal (blob cleanup happens in background via GC)
        let (tx, rx) = oneshot::channel();

        self.write_tx.send(WriteCmd::Remove { path, reply: tx })
            .map_err(|_| FsError::Backend("write queue closed".into()))?;

        rx.blocking_recv()
            .map_err(|_| FsError::Backend("remove cancelled".into()))?
    }

    fn copy(&self, from: &Path, to: &Path) -> Result<(), FsError> {
        // Copy is just a metadata operation - increment refcount, no blob copy!
        let (tx, rx) = oneshot::channel();

        self.write_tx.send(WriteCmd::Copy {
            from: from.to_path_buf(),
            to: to.to_path_buf(),
            reply: tx,
        }).map_err(|_| FsError::Backend("write queue closed".into()))?;

        rx.blocking_recv()
            .map_err(|_| FsError::Backend("copy cancelled".into()))?
    }

    // ... other FsWrite methods
}

Write Queue Worker

The single-writer pattern for SQLite:

async fn write_worker(
    db: Arc<Mutex<Connection>>,
    blobs: Arc<dyn BlobStore>,
    mut rx: mpsc::UnboundedReceiver<WriteCmd>,
) {
    while let Some(cmd) = rx.recv().await {
        let result = {
            let mut db = db.lock().unwrap();

            match cmd {
                WriteCmd::Write { path, blob_id, size, reply } => {
                    // PathBuf does not implement Display; interpolate its display() form below.
                    let path = path.display();
                    let result = db.execute_batch(&format!(r#"
                        BEGIN;

                        -- Upsert blob record
                        INSERT INTO blobs (blob_id, size, refcount, created_at)
                        VALUES ('{blob_id}', {size}, 1, strftime('%s', 'now'))
                        ON CONFLICT(blob_id) DO UPDATE SET refcount = refcount + 1;

                        -- Update or insert node
                        -- (simplified - real impl needs path resolution)

                        -- Audit log
                        INSERT INTO audit (timestamp, operation, path)
                        VALUES (strftime('%s', 'now'), 'write', '{path}');

                        COMMIT;
                    "#));

                    let _ = reply.send(result.map_err(|e| FsError::Backend(e.to_string())));
                }

                WriteCmd::Remove { path, reply } => {
                    // Decrement refcount (GC cleans up when refcount = 0)
                    let result = db.execute_batch(&format!(r#"
                        BEGIN;

                        -- Get blob_id before delete
                        -- Decrement refcount
                        -- Remove node
                        -- Audit log

                        COMMIT;
                    "#));

                    let _ = reply.send(result.map_err(|e| FsError::Backend(e.to_string())));
                }

                // ... other commands
            }
        };
    }
}
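
The struct above holds a write_tx and a _writer_handle, but the wiring is not shown. A hedged sketch of a constructor that opens SQLite, spawns write_worker on an already-running Tokio runtime, and keeps both; the open signature and the WriterHandle definition are assumptions, not anyfs-indexed API:

// Hypothetical wiring: a single background task owns every SQLite mutation.
pub struct WriterHandle(tokio::task::JoinHandle<()>);

impl CustomIndexedBackend {
    pub fn open(
        db_path: impl AsRef<Path>,
        blobs: impl BlobStore + 'static,
    ) -> Result<Self, FsError> {
        let conn = Connection::open(db_path.as_ref())
            .map_err(|e| FsError::Backend(e.to_string()))?;
        let db = Arc::new(Mutex::new(conn));
        let blobs: Arc<dyn BlobStore> = Arc::new(blobs);

        let (write_tx, write_rx) = mpsc::unbounded_channel();

        // Single-writer pattern: the worker is the only code that mutates SQLite.
        let worker = tokio::spawn(write_worker(Arc::clone(&db), Arc::clone(&blobs), write_rx));

        Ok(Self {
            db,
            blobs,
            write_tx,
            _writer_handle: Arc::new(WriterHandle(worker)),
        })
    }
}

This is also the shape the middleware example below assumes when it calls CustomIndexedBackend::open("drive.db", LocalCasBackend::new("./blobs")).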

Deduplication

Content-addressing gives you dedup for free:

impl BlobStore for LocalCasBackend {
    fn put(&self, data: &[u8]) -> Result<String, BlobError> {
        // Hash the content
        let hash = sha256(data);
        let blob_id = hex::encode(hash);

        // Check if already exists
        let blob_path = self.root.join(&blob_id[0..2]).join(&blob_id);

        if blob_path.exists() {
            // Already have this content - dedup!
            return Ok(blob_id);
        }

        // Store new blob
        std::fs::create_dir_all(blob_path.parent().unwrap())?;
        std::fs::write(&blob_path, data)?;

        Ok(blob_id)
    }

    // ... other BlobStore methods (get, exists, delete, open_read, open_write)
}

Dedup in action:

  • User A writes report.pdf (10 MB) → blob abc123, refcount = 1
  • User B writes identical report.pdf → same blob abc123, refcount = 2
  • Physical storage: 10 MB (not 20 MB)
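
The same behavior is easy to spot-check against any content-addressed BlobStore, for example the hypothetical MemoryBlobStore sketched earlier:

// Two identical payloads hash to the same blob_id, so only one blob is stored.
let store = MemoryBlobStore::default();
let first = store.put(b"identical report bytes")?;
let second = store.put(b"identical report bytes")?;
assert_eq!(first, second); // same content hash, same blob, stored once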

Refcount Management

-- On file write (new reference to blob)
UPDATE blobs SET refcount = refcount + 1 WHERE blob_id = ?;

-- On file delete
UPDATE blobs SET refcount = refcount - 1 WHERE blob_id = ?;

-- On copy (no blob copy needed!)
UPDATE blobs SET refcount = refcount + 1 WHERE blob_id = ?;

SQLite Performance

The SQLite metadata database benefits from the same tuning as SqliteBackend:

Setting      | Default     | Purpose                           | Tradeoff
-------------|-------------|-----------------------------------|--------------------------------
journal_mode | WAL         | Concurrent reads during writes    | Creates .wal/.shm files
synchronous  | FULL        | Index integrity on power loss     | Safe default, opt-in to NORMAL
cache_size   | 16 MB       | Smaller cache for metadata-only   | Tune based on index size
busy_timeout | 5000 ms     | Gracefully handle lock contention | Prevents SQLITE_BUSY errors
auto_vacuum  | INCREMENTAL | Reclaim space from deletions      | Gradual space recovery

Why FULL synchronous: Index corruption means paths no longer resolve to blobs—blobs become orphaned and unreachable. Use FULL as the safe default; opt-in to NORMAL only with battery-backed storage or when index can be rebuilt.
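
A hedged sketch of applying those settings when opening the metadata database with rusqlite; the values mirror the table above, the open_tuned name is illustrative, and pragma_update's exact signature varies slightly across rusqlite versions:

// Open the metadata database with the tuning from the table above.
fn open_tuned(path: &Path) -> rusqlite::Result<Connection> {
    let conn = Connection::open(path)?;
    conn.pragma_update(None, "journal_mode", "WAL")?;
    conn.pragma_update(None, "synchronous", "FULL")?;
    conn.pragma_update(None, "cache_size", -16000)?; // negative value = size in KiB (~16 MB)
    conn.pragma_update(None, "busy_timeout", 5000)?; // milliseconds
    // auto_vacuum only takes effect on an empty database or after a VACUUM.
    conn.pragma_update(None, "auto_vacuum", "INCREMENTAL")?;
    Ok(conn)
}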

SQL Indexes (critical):

CREATE INDEX idx_nodes_parent ON nodes(parent);
CREATE INDEX idx_nodes_blob ON nodes(blob_id) WHERE blob_id IS NOT NULL;
CREATE INDEX idx_blobs_refcount ON blobs(refcount) WHERE refcount = 0;

Without proper indexes, path lookups become full table scans—catastrophic for large filesystems.

Connection pooling: 4-8 reader connections for concurrent metadata queries; single writer for updates. See SQLite Operations Guide for detailed patterns.


Garbage Collection

Blobs with refcount = 0 are orphans and can be deleted:

impl CustomIndexedBackend {
    /// Run garbage collection (call periodically or on-demand).
    pub fn gc(&self) -> Result<GcStats, FsError> {
        let db = self.db.lock().map_err(|_| FsError::Backend("lock".into()))?;

        // Find orphaned blobs
        let orphans: Vec<String> = db.prepare(
            "SELECT blob_id FROM blobs WHERE refcount = 0"
        )?.query_map([], |row| row.get(0))?
          .filter_map(|r| r.ok())
          .collect();

        drop(db);

        // Delete from blob store
        let mut deleted = 0;
        for blob_id in &orphans {
            if self.blobs.delete(blob_id).is_ok() {
                deleted += 1;
            }
        }

        // Remove from SQLite
        let db = self.db.lock().unwrap();
        db.execute(
            "DELETE FROM blobs WHERE refcount = 0",
            [],
        )?;

        Ok(GcStats { orphans_found: orphans.len(), blobs_deleted: deleted })
    }
}

GC Safety:

  • Never delete blobs referenced by snapshots
  • Add snapshot_refs table or use refcount that includes snapshot references
  • Run GC in background, not during writes
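
A hedged sketch of that last point, assuming a Tokio runtime and that FsError implements Debug; the interval and the spawn_gc name are arbitrary:

// Run gc() periodically off the async executor so it never blocks reads or writes.
use std::time::Duration;

fn spawn_gc(backend: Arc<CustomIndexedBackend>) -> tokio::task::JoinHandle<()> {
    tokio::spawn(async move {
        let mut ticker = tokio::time::interval(Duration::from_secs(60 * 60));
        loop {
            ticker.tick().await;

            // gc() does blocking SQLite and blob I/O, so hand it to the blocking pool.
            let backend = Arc::clone(&backend);
            if let Ok(Err(e)) = tokio::task::spawn_blocking(move || backend.gc()).await {
                // Not fatal: orphans are simply retried on the next pass.
                eprintln!("gc failed: {e:?}");
            }
        }
    })
}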

Snapshots and Backup

Creating a Snapshot

impl CustomIndexedBackend {
    /// Create a point-in-time snapshot.
    pub fn snapshot(&self, name: &str) -> Result<SnapshotId, FsError> {
        let db = self.db.lock().unwrap();

        db.execute_batch(&format!(r#"
            BEGIN;

            -- Record snapshot
            INSERT INTO snapshots (name, created_at, root_manifest)
            VALUES ('{name}', strftime('%s', 'now'),
                    (SELECT json_group_array(blob_id) FROM blobs WHERE refcount > 0));

            -- Pin all current blobs (prevent GC)
            UPDATE blobs SET refcount = refcount + 1
            WHERE blob_id IN (SELECT blob_id FROM nodes WHERE blob_id IS NOT NULL);

            COMMIT;
        "#))?;

        Ok(SnapshotId(name.to_string()))
    }

    /// Export as a single portable artifact.
    pub fn export(&self, dest: impl AsRef<Path>) -> Result<(), FsError> {
        // 1. SQLite backup API for metadata
        let db = self.db.lock().unwrap();
        // rusqlite's backup API (requires the "backup" feature) copies straight to a file path.
        db.backup(rusqlite::DatabaseName::Main, dest.as_ref().join("metadata.db"), None)?;

        // 2. Copy referenced blobs
        let blob_ids: Vec<String> = db.prepare(
            "SELECT DISTINCT blob_id FROM nodes WHERE blob_id IS NOT NULL"
        )?.query_map([], |row| row.get(0))?
          .filter_map(|r| r.ok())
          .collect();

        drop(db);

        let blobs_dir = dest.as_ref().join("blobs");
        std::fs::create_dir_all(&blobs_dir)?;

        for blob_id in blob_ids {
            let data = self.blobs.get(&blob_id)?;
            std::fs::write(blobs_dir.join(&blob_id), data)?;
        }

        Ok(())
    }
}

Middleware Integration

Middleware works unchanged - it wraps the hybrid backend like any other:

use anyfs::{FileStorage, QuotaLayer, TracingLayer, PathFilterLayer};

let backend = CustomIndexedBackend::open("drive.db", LocalCasBackend::new("./blobs"))?;

// Standard middleware stack
let backend = backend
    .layer(QuotaLayer::builder()
        .max_total_size(50 * 1024 * 1024 * 1024)  // 50 GB
        .build())
    .layer(PathFilterLayer::builder()
        .deny("**/.env")
        .build())
    .layer(TracingLayer::new());

let fs = FileStorage::new(backend);

// Use like any other filesystem
fs.write("/documents/report.pdf", &pdf_bytes)?;

Quota tracking note: QuotaLayer tracks logical size (what users see), not physical size (with dedup). For physical tracking, the backend could expose physical_usage() separately.
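
A hedged sketch of what that physical_usage() accessor could look like; the method is hypothetical and not part of any existing trait:

impl CustomIndexedBackend {
    /// Physical bytes stored: each blob counted once, no matter how many paths reference it.
    /// Logical usage would instead sum nodes.size across all file nodes.
    pub fn physical_usage(&self) -> Result<u64, FsError> {
        let db = self.db.lock().map_err(|_| FsError::Backend("lock poisoned".into()))?;

        let bytes: i64 = db.query_row(
            "SELECT COALESCE(SUM(size), 0) FROM blobs WHERE refcount > 0",
            [],
            |row| row.get(0),
        ).map_err(|e| FsError::Backend(e.to_string()))?;

        Ok(bytes as u64)
    }
}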


Async Considerations

The hybrid pattern benefits significantly from async (ADR-024):

Operation         | Sync Pain       | Async Benefit
------------------|-----------------|----------------------
Blob upload to S3 | Blocks thread   | Concurrent uploads
Multiple reads    | Sequential      | Parallel fetches
Write queue       | blocking_recv() | Native async channel
GC                | Blocks all ops  | Background task

When AsyncFs traits exist (ADR-024), the hybrid backend can use them naturally:

#[async_trait]
impl AsyncFsRead for CustomIndexedBackend {
    async fn read(&self, path: &Path) -> Result<Vec<u8>, FsError> {
        let blob_id = self.lookup_blob_id(path).await?;
        self.blobs.get_async(&blob_id).await  // Non-blocking!
    }
}

Identified Gaps

Areas where the current framework could be enhanced:

Gap                      | Current State              | Recommendation
-------------------------|----------------------------|------------------------------
Two-phase commit pattern | Not documented             | Add to backend guide
Refcount/GC patterns     | Not documented             | Add section
Streaming large files    | open_read/open_write exist | Document chunked patterns
Physical vs logical size | Quota tracks logical only  | Consider PhysicalStats trait
Background tasks (GC)    | No pattern                 | Document spawn pattern
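
For the streaming-large-files gap, a hedged sketch of the chunked pattern on top of the open_write/BlobWriter interface defined earlier; the chunk size is arbitrary and the error plumbing assumes BlobError implements std::error::Error:

// Stream from any reader into the blob store in fixed-size chunks, never buffering the
// whole file. finalize() returns the blob_id (UUID or content hash, depending on the store).
use std::io::{Read, Write};

fn stream_into_blob(
    blobs: &dyn BlobStore,
    mut source: impl Read,
) -> Result<String, Box<dyn std::error::Error>> {
    let mut writer = blobs.open_write()?;
    let mut chunk = vec![0u8; 8 * 1024 * 1024]; // 8 MiB per read

    loop {
        let n = source.read(&mut chunk)?;
        if n == 0 {
            break;
        }
        writer.write_all(&chunk[..n])?;
    }

    writer.finalize().map_err(Into::into)
}

The returned blob_id then goes through the same two-phase commit as write(): queue a WriteCmd so SQLite records the new reference.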

Summary

Framework validation: PASSED

The current AnyFS trait design supports hybrid backends:

  • Traits define operations, not storage
  • Interior mutability allows single-writer patterns
  • Middleware composes unchanged
  • Async strategy (ADR-024) enhances this pattern

Key patterns for hybrid backends:

  1. Single-writer queue for SQLite
  2. Two-phase commit (blob upload → SQLite commit)
  3. Content-addressing for dedup
  4. Refcounting for GC safety
  5. Snapshot pinning for backup safety

This validates that AnyFS is flexible enough for advanced storage architectures while maintaining its simple middleware composition model.