IVF vs HNSW Comparison App

A Streamlit application that demonstrates the differences between IVF (Inverted File Index) and HNSW (Hierarchical Navigable Small World) indexing methods for approximate nearest neighbor search.

About

This app allows you to:

Compare performance metrics between IVF and HNSW methods
Adjust parameters to see their impact on search performance
Visualize differences in build time, query time, recall, and memory usage
Learn when to use each method for your specific use cases

Getting Started

Prerequisites

Python 3.8+ (Python 3.12+ recommended for best compatibility with the latest dependencies)
pip

Installation

Clone this repository:

git clone https://github.com/yourusername/ivf-index.git
cd ivf-index

Create and activate a virtual environment (recommended):

# On macOS/Linux
python -m venv venv
source venv/bin/activate

# On Windows
python -m venv venv
venv\Scripts\activate

Install the required dependencies:

pip install -r requirements.txt

Note: If you encounter installation errors, try updating pip first:
pip install --upgrade pip setuptools wheel

Running the App

Run the Streamlit app with:

streamlit run app.py

The app will open in your default web browser at http://localhost:8501.

Features

Interactive Parameters: Adjust dataset size, dimensions, and algorithm-specific parameters
Performance Metrics: Compare build time, query time, recall, and memory usage
Visualization: Bar charts showing relative performance
Educational Content: Explanations of how each algorithm works and when to use them

How It Works

The app loads and processes the 20 Newsgroups dataset, converting text to TF-IDF vectors
PCA is applied to reduce dimensionality to a manageable size
Both IVF and HNSW indices are built with user-specified parameters
Random query vectors are selected from the dataset
Search is performed with both methods and compared to an exact (brute force) search
Results are displayed as metrics and visualizations

Key Differences Between IVF and HNSW

IVF (Inverted File Index)

Clustering-based approach
Partitions vectors into Voronoi cells
Searches only within relevant clusters
Memory-efficient but requires careful parameter tuning

HNSW (Hierarchical Navigable Small World)

Graph-based approach with multiple layers
Creates "highways" for fast traversal
Generally higher recall and faster queries
More memory intensive but handles high dimensions well

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
README.md		README.md
app.py		app.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

IVF vs HNSW Comparison App

About

Getting Started

Prerequisites

Installation

Running the App

Features

How It Works

Key Differences Between IVF and HNSW

IVF (Inverted File Index)

HNSW (Hierarchical Navigable Small World)

Contributing

About

Uh oh!

Releases

Packages

Uh oh!

Languages

myriel-io/ivf-index

Folders and files

Latest commit

History

Repository files navigation

IVF vs HNSW Comparison App

About

Getting Started

Prerequisites

Installation

Running the App

Features

How It Works

Key Differences Between IVF and HNSW

IVF (Inverted File Index)

HNSW (Hierarchical Navigable Small World)

Contributing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages