Skip to content

myriel-io/ivf-index

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 

Repository files navigation

IVF vs HNSW Comparison App

A Streamlit application that demonstrates the differences between IVF (Inverted File Index) and HNSW (Hierarchical Navigable Small World) indexing methods for approximate nearest neighbor search.

About

This app allows you to:

  • Compare performance metrics between IVF and HNSW methods
  • Adjust parameters to see their impact on search performance
  • Visualize differences in build time, query time, recall, and memory usage
  • Learn when to use each method for your specific use cases

Getting Started

Prerequisites

  • Python 3.8+ (Python 3.12+ recommended for best compatibility with the latest dependencies)
  • pip

Installation

  1. Clone this repository:
git clone https://github.com/yourusername/ivf-index.git
cd ivf-index
  1. Create and activate a virtual environment (recommended):
# On macOS/Linux
python -m venv venv
source venv/bin/activate

# On Windows
python -m venv venv
venv\Scripts\activate
  1. Install the required dependencies:
pip install -r requirements.txt

Note: If you encounter installation errors, try updating pip first:

pip install --upgrade pip setuptools wheel

Running the App

Run the Streamlit app with:

streamlit run app.py

The app will open in your default web browser at http://localhost:8501.

Features

  • Interactive Parameters: Adjust dataset size, dimensions, and algorithm-specific parameters
  • Performance Metrics: Compare build time, query time, recall, and memory usage
  • Visualization: Bar charts showing relative performance
  • Educational Content: Explanations of how each algorithm works and when to use them

How It Works

  1. The app loads and processes the 20 Newsgroups dataset, converting text to TF-IDF vectors
  2. PCA is applied to reduce dimensionality to a manageable size
  3. Both IVF and HNSW indices are built with user-specified parameters
  4. Random query vectors are selected from the dataset
  5. Search is performed with both methods and compared to an exact (brute force) search
  6. Results are displayed as metrics and visualizations

Key Differences Between IVF and HNSW

IVF (Inverted File Index)

  • Clustering-based approach
  • Partitions vectors into Voronoi cells
  • Searches only within relevant clusters
  • Memory-efficient but requires careful parameter tuning

HNSW (Hierarchical Navigable Small World)

  • Graph-based approach with multiple layers
  • Creates "highways" for fast traversal
  • Generally higher recall and faster queries
  • More memory intensive but handles high dimensions well

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages