Under the Hood: How Willustore Works
A step-by-step walkthrough of decentralized vector storage, from embedding generation to low-latency local retrieval.
Data Vectorization
Your AI generates embeddings
Your application (a RAG pipeline, semantic search engine, or AI assistant) generates vector embeddings from raw data such as documents, images, or user behavior, using models from providers like OpenAI or Hugging Face, or your own.
These high-dimensional vectors (typically 384–1536 dimensions) are the fingerprints of your data. They capture semantic meaning in a format that can be searched and compared at scale.
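To make "searched and compared" concrete, here is a minimal sketch of cosine similarity, one common way to compare two embeddings. The function name and the toy 3-dimensional vectors are illustrative assumptions; the walkthrough does not state which distance metric Willustore uses.

```javascript
// Cosine similarity: close to 1 for vectors pointing the same way, 0 for orthogonal ones.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same number of dimensions");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

// Toy vectors standing in for real 384 to 1536 dimensional embeddings.
const query = [1, 0, 1];
const sameDirection = [2, 0, 2];
const orthogonal = [0, 3, 0];
console.log(cosineSimilarity(query, sameDirection));
console.log(cosineSimilarity(query, orthogonal));
```

Real embeddings are compared the same way, just over many more dimensions.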
// Example: Generating embeddings
const embedding = await openai.embeddings.create({
  model: "text-embedding-ada-002",
  input: "Willustore decentralized storage"
});
// Send to Willustore
await willustore.upsert({
  id: "doc_001",
  vector: embedding.data[0].embedding,
  metadata: { source: "docs", org: "acme-corp" }
});

Fragmentation & Encryption
Split, encrypt, never expose
Before any data leaves the central orchestration server, it is split into fragments using a secret sharing scheme (similar to Shamir's Secret Sharing). Each fragment is independently encrypted with AES-256-GCM.
Critically, no single device ever holds enough fragments to reconstruct the original vector. A configurable threshold (e.g., 3 of 5 fragments) is required for reconstruction.
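The threshold idea can be sketched with a textbook Shamir split over a prime field. This is an illustration under stated assumptions, not Willustore's implementation: it shares a single integer secret (a real vector would first need each component encoded as an integer), and the field prime, function bodies, and share format are choices made for the example.

```javascript
const { randomBytes } = require("node:crypto");

// Assumption: arithmetic in the field of the Mersenne prime 2^61 - 1.
// Secrets must be integers in the range [0, PRIME).
const PRIME = 2n ** 61n - 1n;
const mod = (x) => ((x % PRIME) + PRIME) % PRIME;

// Modular exponentiation, used for inverses via Fermat's little theorem.
function modPow(base, exp) {
  let result = 1n;
  base = mod(base);
  while (exp > 0n) {
    if (exp & 1n) result = mod(result * base);
    base = mod(base * base);
    exp >>= 1n;
  }
  return result;
}
const modInv = (x) => modPow(x, PRIME - 2n);

// Random field element (128 random bits reduced mod PRIME; fine for a sketch).
const randomCoefficient = () =>
  mod(BigInt("0x" + randomBytes(16).toString("hex")));

// Split `secret` into n shares so that any k of them reconstruct it.
function shamirSplit(secret, k, n) {
  // Random polynomial of degree k - 1 whose constant term is the secret.
  const coeffs = [mod(secret)];
  for (let i = 1; i < k; i++) coeffs.push(randomCoefficient());
  const shares = [];
  for (let x = 1n; x <= BigInt(n); x++) {
    // Evaluate the polynomial at x using Horner's rule.
    let y = 0n;
    for (let i = coeffs.length - 1; i >= 0; i--) y = mod(y * x + coeffs[i]);
    shares.push({ x, y });
  }
  return shares;
}

// Lagrange interpolation at x = 0 recovers the constant term (the secret).
function shamirReconstruct(shares) {
  let secret = 0n;
  for (const { x: xi, y: yi } of shares) {
    let num = 1n;
    let den = 1n;
    for (const { x: xj } of shares) {
      if (xj === xi) continue;
      num = mod(num * -xj);
      den = mod(den * (xi - xj));
    }
    secret = mod(secret + yi * num * modInv(den));
  }
  return secret;
}

const shares = shamirSplit(123456789n, 3, 5);
console.log(shamirReconstruct(shares.slice(0, 3)) === 123456789n);
console.log(shamirReconstruct([shares[4], shares[1], shares[3]]) === 123456789n);
```

Any three of the five shares recover the secret; one or two shares alone carry no information about it.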
// Conceptual fragmentation logic (shamirSplit, aes256gcm, and uuid are placeholders)
function fragmentAndEncrypt(vector, orgKey, k = 3, n = 5) {
  // Split the vector into n shares; any k of them can reconstruct it
  const shares = shamirSplit(vector, k, n);
  return shares.map(share => ({
    encrypted: aes256gcm.encrypt(share, orgKey),
    fragment_id: uuid(),
    threshold_k: k,
    total_n: n,
  }));
}

Distributed Storage via Gossip
Spread across trusted devices
The central index uses a gossip protocol to discover available participating devices on your network. Encrypted fragments are distributed to devices with available storage, prioritizing proximity.
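As a rough sketch of "available storage, prioritizing proximity", the function below filters devices by free space and picks the closest ones. The device record shape (`id`, `freeBytes`, `latencyMs`), the use of latency as a proxy for proximity, and the one-fragment-per-device rule are all assumptions made for illustration.

```javascript
// Pick n distinct devices for the n fragments of one vector.
// Hypothetical device records: { id, freeBytes, latencyMs }.
function selectDevices(devices, n, fragmentBytes) {
  const eligible = devices
    .filter((d) => d.freeBytes >= fragmentBytes)   // enough room for a fragment
    .sort((a, b) => a.latencyMs - b.latencyMs);    // nearest (lowest latency) first
  if (eligible.length < n) {
    throw new Error(`Need ${n} devices with free space, found ${eligible.length}`);
  }
  // One fragment per device, so no device holds two shares of the same vector.
  return eligible.slice(0, n).map((d) => d.id);
}

const devices = [
  { id: "device:a1b2", freeBytes: 5_000_000, latencyMs: 3 },
  { id: "device:c3d4", freeBytes: 100, latencyMs: 1 },
  { id: "device:e5f6", freeBytes: 9_000_000, latencyMs: 12 },
];
console.log(selectDevices(devices, 2, 4096));
```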
The gossip protocol ensures fault tolerance — as devices join and leave, fragments are automatically rebalanced. The system maintains redundancy so even if 40% of devices go offline, all data remains retrievable.
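One way to read the 40% figure: with a 3-of-5 threshold, a vector stays recoverable as long as any 3 of its 5 fragment holders are reachable, so up to 2 of 5 (40%) can be offline. The check below is a minimal sketch of that arithmetic; the function name and device IDs are hypothetical, and the actual rebalancing logic is not shown in this walkthrough.

```javascript
// A vector is recoverable while at least k of its fragment holders are online.
function isRecoverable(fragmentHolders, offlineDevices, k) {
  const offline = new Set(offlineDevices);
  const reachable = fragmentHolders.filter((id) => !offline.has(id));
  return reachable.length >= k;
}

const holders = ["dev-1", "dev-2", "dev-3", "dev-4", "dev-5"];
console.log(isRecoverable(holders, ["dev-1", "dev-2"], 3));          // 2 of 5 offline
console.log(isRecoverable(holders, ["dev-1", "dev-2", "dev-3"], 3)); // 3 of 5 offline
```

Losing a third holder drops the reachable count below the threshold, which is why rebalancing has to restore fragments before that point.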
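// Gossip protocol node discovery (conceptual; MIN_STORAGE is defined elsewhere)
class GossipNode {
  // Discover peers and hand the encrypted fragments to those with enough free storage
  async broadcast(fragments) {
    const peers = await this.discover();
    const healthyPeers = peers.filter(p => p.storage > MIN_STORAGE);
    await this.distributeFragments(fragments, healthyPeers);
  }
  // Persist an incoming fragment, then acknowledge it to the central index
  async handleFragment(fragment) {
    await this.storage.save(fragment.fragment_id, fragment.encrypted);
    await this.ackToIndex(fragment.fragment_id, this.nodeId);
  }
}

HNSW Indexing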
Lightning-fast vector search
The central index maintains an HNSW (Hierarchical Navigable Small World) graph — a state-of-the-art approximate nearest neighbor data structure. It stores only vector IDs and fragment location metadata, never the raw vectors.
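HNSW query time grows roughly logarithmically with the number of indexed vectors, and its search parameters can be tuned for recall rates above 98%. In practice, this means that even with millions of vectors distributed across hundreds of devices, search queries typically resolve in under 10 milliseconds.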
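Recall here means the fraction of the true nearest neighbors that the approximate search actually returns. The sketch below computes it against a brute-force exact search; the function names and sample data are illustrative, and the "approximate" result is a hand-written stand-in rather than output from a real HNSW index.

```javascript
// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += (a[i] - b[i]) ** 2;
  return sum;
}

// Exact k-nearest neighbors by brute force: every item is compared to the query.
function exactNearest(query, items, k) {
  return items
    .map(({ id, vector }) => ({ id, dist: squaredDistance(query, vector) }))
    .sort((a, b) => a.dist - b.dist)
    .slice(0, k)
    .map((hit) => hit.id);
}

// Recall@k: share of the true top-k that the approximate search returned.
function recallAtK(approxIds, exactIds) {
  const truth = new Set(exactIds);
  const hits = approxIds.filter((id) => truth.has(id)).length;
  return hits / exactIds.length;
}

const items = [
  { id: "a", vector: [0, 0] },
  { id: "b", vector: [1, 0] },
  { id: "c", vector: [0, 2] },
  { id: "d", vector: [5, 5] },
];
const exact = exactNearest([0, 0], items, 3); // true top 3: a, b, c
const approx = ["a", "c", "d"];               // pretend ANN output that missed "b"
console.log(recallAtK(approx, exact));        // 2 of the 3 true neighbors found
```

A recall of 98% would mean that, averaged over queries, 98 of every 100 true neighbors appear in the approximate results.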
// HNSW index stores only metadata
const index = new HNSWIndex({ dimensions: 1536, ef: 200 });
// On insert: register location, not data
index.add(vector_id, {
  // Illustrative device IDs; at least `threshold` holders are needed to reconstruct
  fragment_locations: ["device:a1b2", "device:c3d4", "device:e5f6"],
  threshold: 3,
  timestamp: Date.now(),
});
// On query: find nearest IDs, then fetch fragments
const topK = 10;
const nearestIds = index.search(queryVector, topK);

Retrieval via WiFi / Bluetooth
Local network, maximum speed
When your application issues a vector query, the index resolves which devices hold the required fragments. The client agent fetches fragments from nearby devices using WiFi (LAN) or Bluetooth — no internet round-trips needed.
This local-first retrieval model means campus or office deployments see latency under 5ms for most queries.
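Because only a threshold of fragments is needed, a client does not have to contact every holder. The sketch below picks the fastest local holders up to the threshold; the location record shape (`deviceId`, `transport`, `rttMs`) and the selection rule are assumptions for illustration, not the documented client behavior.

```javascript
// Transports treated as local, matching the WiFi (LAN) / Bluetooth description.
const LOCAL_TRANSPORTS = new Set(["lan", "bluetooth"]);

// Choose which k fragment holders to contact, fastest local peers first.
// Hypothetical location records: { deviceId, transport, rttMs }.
function planRetrieval(locations, k) {
  const local = locations
    .filter((loc) => LOCAL_TRANSPORTS.has(loc.transport))
    .sort((a, b) => a.rttMs - b.rttMs);
  if (local.length < k) {
    throw new Error(`Only ${local.length} local holders reachable, need ${k}`);
  }
  return local.slice(0, k).map((loc) => loc.deviceId);
}

const locations = [
  { deviceId: "device:a1b2", transport: "lan", rttMs: 2 },
  { deviceId: "device:c3d4", transport: "bluetooth", rttMs: 8 },
  { deviceId: "device:e5f6", transport: "lan", rttMs: 1 },
  { deviceId: "device:0789", transport: "wan", rttMs: 45 },
];
console.log(planRetrieval(locations, 3));
```

Contacting only the fastest holders keeps the slowest device in the set from dominating query latency.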
// Local retrieval priority
async function retrieveFragments(fragmentLocations, threshold = 3) {
  const localPeers = fragmentLocations.filter(
    loc => isOnSameSubnet(loc.ip) || isBluetooth(loc)
  );
  if (localPeers.length < threshold) {
    throw new Error("Not enough local peers to meet the threshold");
  }
  // Prefer local peers (< 5ms latency)
  const fragments = await Promise.all(
    localPeers.map(peer => peer.fetchFragment())
  );
  // Fragments are still encrypted; the client agent decrypts and reconstructs them
  return fragments;
}

Secure Reconstruction & Response
Assemble, decrypt, deliver
Retrieved fragments are sent to the client's local agent, where each one is decrypted and the set is assembled using the secret sharing reconstruction algorithm. The original vector is returned to your application; no storage device and no index ever holds the reconstructed vector.
The entire retrieval pipeline from query to response typically completes in under 20 milliseconds for same-network deployments.
// Client-side reconstruction
async function resolveQuery(queryVector) {
  const nearestIds = await index.search(queryVector);
  const fragmentSets = await fetchAllFragments(nearestIds);
  return fragmentSets.map(({ fragments, key }) => ({
    // Each share was encrypted individually, so decrypt first, then reconstruct
    vector: shamirReconstruct(
      fragments.map(f => aes256gcm.decrypt(f.encrypted, key))
    ),
    metadata: fragments[0].metadata,
  }));
}

Defense in Depth
Multiple independent layers of security ensure that even if one layer is compromised, your data remains protected.
AES-256-GCM Encryption
Every fragment is encrypted with a unique key derived from your organization master key. Authenticated encryption prevents tampering.
Secret Sharing Threshold
Shamir's Secret Sharing means k of n fragments (for example, 3 of 5) are required for reconstruction. Compromising an individual device is insufficient.
Zero-Knowledge Index
The central index holds only IDs and device metadata — never raw vector data. Index compromise reveals nothing sensitive.
Mutual TLS Authentication
All device-to-index and device-to-device communication requires mutual TLS with certificates issued by your org's CA.
Audit Logging
Every read, write, and device join/leave event is cryptographically signed and logged. Full traceability for compliance.
Hardware Attestation
The network can optionally require TPM-based attestation from devices before they join, ensuring only trusted hardware participates.
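The AES-256-GCM layer above mentions per-fragment keys and tamper detection. A minimal sketch using Node's built-in `crypto` module might look like the following; the HKDF-SHA256 derivation, the all-zero salt, the use of the fragment ID as both HKDF `info` and GCM associated data, and the function names are assumptions for illustration, since the key derivation method is not specified here.

```javascript
const nodeCrypto = require("node:crypto");

// Derive a 32-byte per-fragment key from the organization master key.
// Assumption: HKDF-SHA256 with the fragment ID as the `info` input.
function deriveFragmentKey(masterKey, fragmentId) {
  const salt = Buffer.alloc(32); // fixed all-zero salt, for this sketch only
  return Buffer.from(
    nodeCrypto.hkdfSync("sha256", masterKey, salt, fragmentId, 32)
  );
}

function encryptFragment(masterKey, fragmentId, plaintext) {
  const key = deriveFragmentKey(masterKey, fragmentId);
  const iv = nodeCrypto.randomBytes(12); // 96-bit nonce, the usual size for GCM
  const cipher = nodeCrypto.createCipheriv("aes-256-gcm", key, iv);
  // Bind the ciphertext to its fragment ID as additional authenticated data.
  cipher.setAAD(Buffer.from(fragmentId));
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return { fragmentId, iv, ciphertext, tag: cipher.getAuthTag() };
}

function decryptFragment(masterKey, { fragmentId, iv, ciphertext, tag }) {
  const key = deriveFragmentKey(masterKey, fragmentId);
  const decipher = nodeCrypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAAD(Buffer.from(fragmentId));
  decipher.setAuthTag(tag);
  // final() throws if the ciphertext, tag, or associated data was modified.
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]);
}

const masterKey = nodeCrypto.randomBytes(32);
const sealed = encryptFragment(masterKey, "frag-001", Buffer.from("share bytes"));
console.log(decryptFragment(masterKey, sealed).toString());
```

Flipping a single bit of the stored ciphertext makes `decryptFragment` throw instead of returning corrupted data, which is the tamper protection that authenticated encryption provides.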
Ready to Try It Yourself?
Our documentation and SDK make integration straightforward. Deploy your first private vector store in under an hour.