Crate ragfs_embed

§ragfs-embed

Local embedding generation for RAGFS using the Candle ML framework.

This crate generates vector embeddings offline and keeps your data private: no text is sent to an external API. Embeddings are generated with the gte-small model (thenlper/gte-small on Hugging Face).

§Features

  • Local-first: All computation happens on your machine
  • Offline capable: Works without internet after initial model download
  • No API costs: No rate limits or usage fees
  • Concurrent: Embedder pool for parallel embedding generation, with concurrency limited by a semaphore
  • Cached: LRU cache to avoid redundant computations

§Cargo Features

  • candle (default): Enables the Candle ML stack for real embeddings
  • Without candle: NoopEmbedder is the only embedder implementation available (for testing/development); EmbedderPool remains available

§Model Details

| Property     | Value              |
|--------------|--------------------|
| Model        | thenlper/gte-small |
| Dimension    | 384                |
| Max tokens   | 512                |
| Architecture | BERT-based         |
| Size         | ~100 MB            |

§Usage

use ragfs_embed::{CandleEmbedder, EmbedderPool, EmbeddingCache};
use ragfs_core::{Embedder, EmbeddingConfig};
use std::sync::Arc;

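// Create and initialize the embedder.
// Note: Rust's standard library does not expand `~`; unless the crate
// expands it itself, pass an absolute path in real code.
let embedder = CandleEmbedder::new("~/.local/share/ragfs/models".into());
embedder.init().await?;  // Downloads the model on first run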

// Wrap with a thread pool for concurrency
let pool = EmbedderPool::new(Arc::new(embedder), 4);

// Embed documents
let config = EmbeddingConfig::default();
let texts = vec!["Hello world", "Machine learning"];
let embeddings = pool.embed_batch(&texts, &config).await?;
// Each embedding is a Vec<f32> with 384 dimensions
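
Once you have embeddings, a common next step is to compare them. The following is a minimal sketch, not part of this crate: `cosine_similarity` is a hypothetical helper that assumes each embedding is available as a plain `&[f32]` slice of equal length (384 for gte-small).

```rust
/// Hypothetical helper (not provided by ragfs_embed).
/// Cosine similarity of two equal-length vectors.
/// Returns `None` on a length mismatch or when either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    // Dot product and Euclidean norms, accumulated in f32.
    let dot: f32 = a.iter().zip(b.iter()).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}
```

Returning `Option` rather than panicking keeps a dimension mismatch (for example, vectors produced by a different model) visible to the caller.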

§Caching

Use EmbeddingCache to avoid recomputing embeddings for identical text:

use ragfs_embed::EmbeddingCache;

// Create a cache with default capacity (10,000 entries)
let cache = EmbeddingCache::new(embedder);

// Or with custom capacity
let cache = EmbeddingCache::with_capacity(embedder, 50_000);

// Embeddings are cached by content hash
let result = cache.embed_text(&["Hello"], &config).await?;
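
To illustrate what caching by content hash with LRU eviction means, here is a rough, synchronous sketch using only the standard library. It is not the crate's implementation: `HashKeyedCache`, `get_or_compute`, and the use of `DefaultHasher` are all assumptions made for illustration, and the real `EmbeddingCache` is async and may hash and evict differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{HashMap, VecDeque};
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an LRU embedding cache keyed by content hash.
struct HashKeyedCache {
    capacity: usize,
    entries: HashMap<u64, Vec<f32>>,
    /// Keys ordered from least to most recently used.
    order: VecDeque<u64>,
}

impl HashKeyedCache {
    fn new(capacity: usize) -> Self {
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Hashes the text content. A 64-bit hash can collide, so a production
    /// cache may prefer a wider hash or also compare the stored text.
    fn key(text: &str) -> u64 {
        let mut hasher = DefaultHasher::new();
        text.hash(&mut hasher);
        hasher.finish()
    }

    /// Returns the cached vector for `text`, or computes, stores, and returns it.
    fn get_or_compute<F>(&mut self, text: &str, compute: F) -> Vec<f32>
    where
        F: FnOnce(&str) -> Vec<f32>,
    {
        let key = Self::key(text);
        if let Some(hit) = self.entries.get(&key).cloned() {
            // Cache hit: mark the key as most recently used.
            self.order.retain(|k| *k != key);
            self.order.push_back(key);
            return hit;
        }
        let value = compute(text);
        if self.capacity == 0 {
            return value; // Caching disabled.
        }
        if self.entries.len() >= self.capacity {
            // Evict the least recently used entry.
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.entries.insert(key, value.clone());
        self.order.push_back(key);
        value
    }
}
```

The `VecDeque` makes each hit O(n) in the number of entries, which is fine for a sketch; a real LRU cache would typically use a linked structure for O(1) updates.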

§Components

| Type           | Description                                                            |
|----------------|------------------------------------------------------------------------|
| CandleEmbedder | Transformer-based embeddings using gte-small (requires candle feature) |
| EmbeddingCache | LRU cache for embedding results (requires candle feature)              |
| EmbedderPool   | Concurrent embedding with semaphore limiting (always available)        |
| NoopEmbedder   | No-op embedder for testing (always available)                          |
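
The idea of semaphore-limited concurrency mentioned for EmbedderPool can be sketched with standard-library threads. This is an illustration only: the actual pool is async and its internals are not shown on this page, so `Semaphore` and `embed_with_limit` below are hypothetical names, and the `work` closure stands in for a real embedder call.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore built on Mutex + Condvar
/// (the standard library does not ship one).
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Self { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Runs `work` on every text with at most `limit` calls in flight at once.
fn embed_with_limit<F>(texts: Vec<String>, limit: usize, work: F) -> Vec<Vec<f32>>
where
    F: Fn(&str) -> Vec<f32> + Send + Sync + 'static,
{
    assert!(limit > 0, "a limit of 0 would block forever");
    let semaphore = Arc::new(Semaphore::new(limit));
    let work = Arc::new(work);
    let handles: Vec<_> = texts
        .into_iter()
        .map(|text| {
            let semaphore = Arc::clone(&semaphore);
            let work = Arc::clone(&work);
            thread::spawn(move || {
                semaphore.acquire();
                let result = (*work)(&text);
                semaphore.release();
                result
            })
        })
        .collect();
    // Joining in spawn order keeps outputs aligned with inputs.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

The sketch spawns one thread per text and only gates the work itself; an async pool would instead hold a permit across an `.await`, but the limiting principle is the same.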

Re-exports§

pub use cache::EmbeddingCache;
pub use candle::CandleEmbedder;
pub use noop::NoopEmbedder;
pub use pool::EmbedderPool;

Modules§

  • cache: Embedding cache for avoiding redundant computations.
  • candle: GTE-small embedder using Candle.
  • noop: No-op embedder for testing without Candle.
  • pool: Embedder pool for concurrent embedding operations.