Anyone using Turso (LibSQL) for edge AI inference caching? Performance questions

Kit M. · 142d ago

Tags: openai, rag, vector-db

Been experimenting with Turso as a distributed cache for our edge AI pipeline and curious about others' experiences. We're running LLM inference at the edge (using Cloudflare Workers) and need to cache embeddings + model responses across multiple regions.

Currently hitting ~15ms query latency from our edge functions, which feels reasonable, but wondering if anyone's squeezed better performance out of LibSQL for similar workloads. Our setup:

CREATE TABLE embedding_cache (
  input_hash TEXT PRIMARY KEY,
  embedding BLOB,
  model_version TEXT,
  created_at INTEGER
);

Storing 1536-dim vectors as BLOBs (about 6KB each, which works out to 4 bytes per component). Cache hit rate is solid at 68%, but the cold misses are painful because each one falls back to OpenAI's API.
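For anyone wondering how the BLOB size works out, here is a minimal sketch of packing an embedding into bytes and back. It assumes float32 components (1536 × 4 bytes = 6144 bytes, consistent with the ~6KB figure above); the helper names `embeddingToBlob` and `blobToEmbedding` are made up for illustration and are not part of any LibSQL API.

```typescript
// Sketch only: assumes embeddings are float32 and stored in platform byte
// order (little-endian on virtually all current runtimes).
const EMBEDDING_DIM = 1536; // dimension mentioned in the post

// Hypothetical helper: copy the raw float32 bytes so they can be bound
// to a BLOB column as a Uint8Array.
function embeddingToBlob(embedding: Float32Array): Uint8Array {
  // The view respects byteOffset/byteLength; slice() makes an independent copy.
  return new Uint8Array(
    embedding.buffer,
    embedding.byteOffset,
    embedding.byteLength,
  ).slice();
}

// Hypothetical helper: turn BLOB bytes read from the cache back into floats.
function blobToEmbedding(blob: Uint8Array): Float32Array {
  if (blob.byteLength % 4 !== 0) {
    throw new Error("BLOB length is not a multiple of 4 bytes");
  }
  // Copy first so the backing buffer starts at offset 0 (4-byte aligned).
  const copy = blob.slice();
  return new Float32Array(copy.buffer, 0, copy.byteLength / 4);
}
```

Copying on both sides costs an extra 6KB allocation per call, but it avoids alignment errors when the driver hands back a view into a larger buffer.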

Two specific questions:

1. Has anyone tried clustering similar embeddings in separate tables vs one monolithic table? Wondering about query performance implications.
2. Any tricks for optimizing BLOB storage/retrieval in LibSQL? Currently just using standard INSERT/SELECT.

The multi-region replication is honestly the killer feature here - way simpler than managing Redis clusters across edge locations. Just want to make sure I'm not missing obvious optimization opportunities.

Running about 50K queries/day currently, planning to scale to 500K+ soon.
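To put those volumes in perspective, a rough back-of-the-envelope sketch. It assumes the 68% hit rate holds at the higher volume (it may not) and treats traffic as evenly spread over the day, so real peaks will be higher; `estimateLoad` is just an illustrative helper.

```typescript
// Sketch only: average-case estimate, ignores traffic peaks.
interface CacheLoad {
  hits: number;   // queries served from the cache per day
  misses: number; // queries that fall back to the upstream API per day
  avgQps: number; // average queries per second over 24h
}

function estimateLoad(queriesPerDay: number, hitRate: number): CacheLoad {
  const hits = Math.round(queriesPerDay * hitRate);
  const misses = queriesPerDay - hits;
  const avgQps = queriesPerDay / 86_400; // seconds in a day
  return { hits, misses, avgQps };
}

// Figures from the post: 50K/day now, 500K/day planned, 68% hit rate.
const current = estimateLoad(50_000, 0.68);
const planned = estimateLoad(500_000, 0.68);

console.log(current.misses, current.avgQps.toFixed(2));
console.log(planned.misses, planned.avgQps.toFixed(2));
```

That is about 16,000 fallback calls per day now and about 160,000 at the planned volume, at an average of under 6 queries per second. On these numbers the miss count looks like the bigger scaling concern, more so than raw query throughput.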

2 Comments

Remy T. · 141d ago

Sorry if this is basic, but I'm still wrapping my head around edge caching - are you storing the actual embedding vectors in Turso or just references/metadata? And when you say 15ms latency, is that for reads or writes? I'm working on something similar but using Supabase right now and getting way worse performance (~200ms). Trying to understand if the bottleneck is my database choice or if I'm doing something fundamentally wrong with how I'm structuring the cached data.

Ari T. · 141d ago

We switched from Turso to Upstash Redis for our edge caching last month and saw query times drop to ~3-5ms. The key was using their global replicas + connection pooling. For embeddings specifically, we're using their vector similarity search which handles the heavy lifting. Setup was pretty straightforward with their REST API - works great with CF Workers since you don't need persistent connections. Only downside is cost scales quickly with data volume, but for our use case the performance gain was worth it.