---
tags:
- vector-database
- benchmarks
- faiss
- weaviate
- chroma
- multimodal
- clip
- retrieval
license: apache-2.0
---

# Vector Database Benchmarks: FAISS vs Chroma vs Weaviate

This repository contains experiments benchmarking popular vector databases on **multimodal embeddings** generated from the [Flickr8k dataset](https://huggingface.co/datasets/jxie/flickr8k).  
We focused on four key evaluation dimensions:

1. **Latency per query**  
2. **Recall@5 vs Flat (accuracy tradeoffs)**  
3. **Queries per second (QPS throughput)**  
4. **Ingestion scaling performance**

All experiments were run on **Google Colab** (T4 GPU for embedding generation, CPU backend for databases).  
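The latency and QPS metrics listed above can be collected with a small timing harness. The sketch below shows one reasonable approach under stated assumptions; it is not the notebook's actual code. `search_fn` is a hypothetical adapter around whichever client is under test, and QPS here is measured sequentially (one query at a time), so its output need not match the table.

```python
import time
from statistics import mean
from typing import Callable, Dict, Sequence


def measure_latency_and_qps(
    search_fn: Callable[[Sequence[float], int], object],
    queries: Sequence[Sequence[float]],
    k: int = 5,
    warmup: int = 10,
) -> Dict[str, float]:
    """Return average per-query latency (ms) and sequential QPS.

    `search_fn(query, k)` is a hypothetical adapter around the client under test.
    """
    if not queries:
        raise ValueError("need at least one query")

    # Warm-up calls are not timed, so lazy initialisation and cold caches do not skew results.
    for q in queries[:warmup]:
        search_fn(q, k)

    per_query = []
    start_total = time.perf_counter()
    for q in queries:
        t0 = time.perf_counter()
        search_fn(q, k)
        per_query.append(time.perf_counter() - t0)
    total = time.perf_counter() - start_total

    return {
        "avg_latency_ms": mean(per_query) * 1000.0,
        # Queries completed divided by wall-clock seconds, one query at a time.
        "qps": len(queries) / total,
    }


# Usage with a stand-in "database" that just sorts the query values.
stats = measure_latency_and_qps(lambda q, k: sorted(q)[:k], [[3.0, 1.0, 2.0]] * 100)
print(stats)
```

Excluding warm-up calls matters most for clients that build connections or load indexes lazily on the first query.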

---

## Methodology

- Dataset: 6k images and 30k captions from Flickr8k.  
- Embeddings: CLIP (OpenAI ViT-B/32).  
- Workload: Caption-to-image retrieval (cross-modal).  
- Baseline: a FAISS Flat (exact, brute-force) index serves as the ground truth for recall calculations.  

Each vector database was tested under the same conditions for ingestion, search, and recall.  
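Recall@5 against the Flat baseline compares each database's top-5 results with the exact top-5 from brute-force search. A minimal pure-Python sketch of that comparison follows; it assumes embeddings are L2-normalized so that inner product equals cosine similarity, and the helper names (`flat_top_k`, `recall_at_k`) are hypothetical. The benchmark itself uses a FAISS Flat index for the exact search.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def flat_top_k(query: Sequence[float], corpus: Sequence[Sequence[float]], k: int) -> List[int]:
    """Exact top-k ids by inner product, scanning every vector like a flat index."""
    scores = [sum(a * b for a, b in zip(query, doc)) for doc in corpus]
    # Highest score first; ties go to the lower id so results are deterministic.
    return sorted(range(len(corpus)), key=lambda i: (-scores[i], i))[:k]


def recall_at_k(approx: Sequence[Sequence[int]], exact: Sequence[Sequence[int]], k: int = 5) -> float:
    """Average fraction of each query's exact top-k ids also found in the approximate top-k.

    Assumes every exact result list holds k ids.
    """
    hits = sum(len(set(a[:k]) & set(e[:k])) for a, e in zip(approx, exact))
    return hits / (k * len(exact))


# Toy example: 2-D "image" embeddings and one "caption" query.
corpus = [l2_normalize(v) for v in [[1, 0], [0, 1], [1, 1], [-1, 0]]]
query = l2_normalize([1, 0.1])
exact = [flat_top_k(query, corpus, k=2)]  # [[0, 2]]
approx = [[0, 1]]                          # pretend an approximate index missed id 2
print(recall_at_k(approx, exact, k=2))     # 0.5
```

Because the baseline is exact search, FAISS Flat scores 1.00 against itself by construction; the metric only becomes informative for approximate indexes.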

---

## Results Summary

| Metric                                 | FAISS     | Chroma    | Weaviate  |
|----------------------------------------|-----------|-----------|-----------|
| **Avg latency per query**              | 0.19 ms   | 0.76 ms   | 1.82 ms   |
| **Recall@5 (vs FAISS Flat baseline)**  | 1.00      | 0.002     | 0.918     |
| **Throughput (QPS)**                   | 1929.94   | 719.01    | 598.40    |
| **Ingestion time (20k vectors)**       | 0.024 s   | 2.806 s   | 4.000 s   |


![Vector DB Comparison](./vectordb_metrics.png)
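The ingestion figures can be gathered by timing batched inserts at several corpus sizes. This sketch assumes a hypothetical `make_store` factory and `add_fn` adapter per database, and a batch size of 1,000 chosen only for illustration; the notebook's actual batch sizes are not stated here.

```python
import time
from typing import Callable, Dict, List, Sequence, TypeVar

Store = TypeVar("Store")


def time_ingestion(
    make_store: Callable[[], Store],
    add_fn: Callable[[Store, List[Sequence[float]]], None],
    vectors: Sequence[Sequence[float]],
    sizes: Sequence[int],
    batch_size: int = 1000,
) -> Dict[int, float]:
    """Return wall-clock seconds needed to ingest the first n vectors, for each n in `sizes`."""
    results: Dict[int, float] = {}
    for n in sizes:
        if n > len(vectors):
            raise ValueError(f"size {n} exceeds the {len(vectors)} available vectors")
        store = make_store()  # fresh store per size; creation is not timed
        start = time.perf_counter()
        for i in range(0, n, batch_size):
            # The final batch is truncated at n, so exactly n vectors are added.
            add_fn(store, list(vectors[i:min(i + batch_size, n)]))
        results[n] = time.perf_counter() - start
    return results


# Usage with a plain Python list standing in for a vector database.
timings = time_ingestion(list, lambda store, batch: store.extend(batch),
                         [[0.0] * 4] * 5000, sizes=[1000, 5000])
print(timings)
```

Creating a fresh store per size keeps one run's state from affecting the next, and excluding store creation isolates the cost of the inserts themselves.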

---

## Key Takeaways

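- **FAISS** is the fastest on every speed metric measured here (latency, throughput, and ingestion), thanks to in-memory array ingestion and customizable indexing strategies.  
- **Chroma** offers simplicity and easy integration but struggles at scale, which we attribute to batching and internal constraints; in these runs its Recall@5 against the Flat baseline was also very low (0.002).  
- **Weaviate** provides a more feature-rich ecosystem (schema, hybrid search, persistence), at the cost of the highest ingestion and query overhead in this benchmark.  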

At the million-vector scale, speed alone will not decide your choice; **engineering tradeoffs, developer productivity, and system features** will.  
Benchmarks tell one part of the story; your use case tells the rest.  

---

## Usage

You can reproduce these experiments using the provided notebook and Hugging Face dataset.  
See full code here: [rag-experiments/VectorDB-Benchmarks](https://huggingface.co/rag-experiments/VectorDB-Benchmarks).
Dataset used: Flickr8k (train split: 6k images and 30k captions, i.e. paired image and text data), embedded with CLIP. Dataset author: Johnathan Xie.

---

## Citation

If you find this useful, please cite this repository: