ibalampanis/face-recognition-clustering

Face Recognition Clustering Tool

A comprehensive tool for clustering photos based on faces detected in them, plus person-specific photo search capabilities. This tool can process thousands of photos, detect faces, cluster similar faces together, organize photos by person, and search for specific people across your entire photo collection.

Features

  • Face Detection: Uses MTCNN for accurate face detection
  • Face Recognition: Uses ModernArcFace with InceptionResnetV1 for generating high-quality face embeddings
  • Clustering: Uses DBSCAN to group similar faces with optimized parameters
  • Photo Organization: Automatically organizes photos by detected persons
  • Person-Specific Search: Search for photos of specific people using reference photos
  • Backup Creation: Creates compressed backups of person-specific photo collections
  • Progress Tracking: Real-time progress bars and logging
  • Caching: Saves face embeddings to avoid reprocessing
  • Apple Silicon Support: Optimized for MPS acceleration on Apple devices

🚀 Quick Start

# 1. Install dependencies
conda create -n face-recognition python=3.9 -y
conda activate face-recognition
pip install -r requirements.txt

# 2. Organize photos by person (face clustering)
python src/main.py --input_dir ./data/Gallery --output_dir ./output

# 3. Search for a specific person
python search_person.py --reference ./reference_photos --gallery ./data/Gallery --confidence 0.7 --output ./person_matches

Project Structure

face-recognition-clustering/
├── src/
│   ├── face_detector.py          # Face detection using MTCNN
│   ├── face_encoder.py           # Modern ArcFace encoding with InceptionResnetV1
│   ├── clustering.py             # Face clustering algorithms
│   ├── photo_organizer.py        # Photo organization and backup
│   ├── person_search.py          # Person-specific photo search
│   ├── utils.py                  # Utility functions
│   └── main.py                   # Main application entry point
├── search_person.py              # Command-line person search tool
├── data/
│   └── Gallery/                  # Your photo collection
├── output/
│   ├── clusters/                 # Clustered photos by person
│   ├── face_previews/            # Face thumbnail previews
│   └── backups/                  # Person-specific photo backups
├── cache/                        # Cached face embeddings
├── logs/                         # Application logs
└── requirements.txt              # Python dependencies

Installation

Installation with Conda

To set up the environment manually instead of installing from requirements.txt:

# Create conda environment
conda create -n face-recognition python=3.9 -y
conda activate face-recognition

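# Install PyTorch. The macOS arm64 wheels include MPS support; on Linux or Windows
# this index installs CPU-only builds, so use a CUDA index there for NVIDIA GPUs.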
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu

# Install other dependencies
pip install facenet-pytorch mtcnn Pillow opencv-python scikit-learn numpy matplotlib seaborn tqdm pandas psutil

Usage

1. Face Clustering (Primary Feature)

First, activate the conda environment (if using the recommended installation):

conda activate face-recognition

Then run the face clustering to organize your photos by person:

# Automatic device selection (will use MPS on Apple Silicon)
python src/main.py --input_dir ./data/Gallery --output_dir ./output

# Set the DBSCAN parameters explicitly (these values are also the defaults)
python src/main.py --input_dir ./data/Gallery --output_dir ./output --eps 0.6 --min_samples 3

2. Person-Specific Photo Search 🔍

After clustering (or independently), search for specific people across your photo collection:

Quick Start

# Search for a person using reference photos
python search_person.py --reference ./reference_photos --gallery ./data/Gallery --confidence 0.7 --output ./person_matches

Common Use Cases

# Search only (no --output given, so no files are moved)
python search_person.py -r ./reference_photos -g ./data/Gallery -c 0.7

# High confidence search (fewer false positives)
python search_person.py -r ./reference_photos -g ./data/Gallery -c 0.85 -o ./high_confidence_matches

# Broader search (more matches, some false positives)
python search_person.py -r ./reference_photos -g ./data/Gallery -c 0.6 -o ./moderate_matches --verbose

Reference Photos Setup

Create a folder with 3-10 high-quality photos of the person you want to search for:

reference_photos/
├── person_photo_1.jpg    # Clear, well-lit face
├── person_photo_2.jpg    # Different angle
├── person_photo_3.jpg    # Different expression
└── person_photo_4.jpg    # Good quality
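
Before running a search, it can help to check that the reference folder holds a sensible number of images. The sketch below is not part of the tool: the helper names and the set of accepted extensions are assumptions, and it only counts files, it does not check face quality.

```python
from pathlib import Path
import tempfile

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}  # assumed set of supported formats

def list_reference_photos(folder):
    """Return the image files in `folder`, sorted by name."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )

def reference_count_ok(photos, low=3, high=10):
    """True when the photo count is within the suggested 3-10 range."""
    return low <= len(photos) <= high

# Demo on a throwaway folder with four empty placeholder files and one non-image.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(1, 5):
        (Path(tmp) / f"person_photo_{i}.jpg").touch()
    (Path(tmp) / "notes.txt").touch()
    photos = list_reference_photos(tmp)

print(len(photos), reference_count_ok(photos))  # 4 True
```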

Person Search Features:

  • 🎯 Confidence Thresholding: 70% recommended, 60%+ for broader search
  • 📁 Automatic File Management: Moves matching photos to dedicated folders
  • 📊 Detailed Results: JSON reports with similarity scores and statistics
  • 🚀 Apple Silicon Optimized: Uses MPS acceleration
  • 📈 Match Statistics: Shows match rates and confidence distribution

3. Advanced Clustering Options

python src/main.py \
    --input_dir ./data/Gallery \
    --output_dir ./output \
    --min_cluster_size 10 \
    --eps 0.6 \
    --face_threshold 0.9 \
    --backup_person "Person_1" \
    --create_backup

4. Moving Photos from Gallery

After clustering, you can move photos of specific persons from the original Gallery structure to organized folders:

# Move photos for Person_1 from Gallery to a single folder
python move_person_photos.py 1 --gallery_dir ./data/Gallery --output_dir ./output

# Copy photos instead of moving (preserves originals)
python move_person_photos.py 1 --copy

# List available persons
python move_person_photos.py --list

Or use the main script with move options:

python src/main.py \
    --input_dir ./data/Gallery \
    --output_dir ./output \
    --move_from_gallery \
    --move_person "Person_1" \
    --copy_mode
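
The difference between moving and copying can be sketched with the standard library. This is an illustration only, not the code behind `--copy_mode`; `transfer_photos` is a hypothetical helper, and it does not handle two gallery photos that share a file name.

```python
import shutil
import tempfile
from pathlib import Path

def transfer_photos(photo_paths, dest_dir, copy_mode=False):
    """Copy or move each photo into dest_dir and return the new paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    transferred = []
    for src in map(Path, photo_paths):
        target = dest / src.name  # same-named files would overwrite each other
        if copy_mode:
            shutil.copy2(src, target)           # original stays in the Gallery
        else:
            shutil.move(str(src), str(target))  # original leaves the Gallery
        transferred.append(target)
    return transferred

# Demo with a fake photo in a temporary Gallery.
with tempfile.TemporaryDirectory() as tmp:
    gallery = Path(tmp) / "Gallery" / "2021"
    gallery.mkdir(parents=True)
    photo = gallery / "photo1.jpg"
    photo.write_bytes(b"fake image bytes")
    out = transfer_photos([photo], Path(tmp) / "Person_1_from_Gallery", copy_mode=True)
    original_kept = photo.exists()
    copied = out[0].exists()

print(original_kept, copied)  # True True
```

Copying first and deleting later is the safer choice while you are still tuning the clustering parameters.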

Parameters

Face Clustering Parameters

  • --input_dir: Directory containing photos to process
  • --output_dir: Output directory for results
  • --min_cluster_size: Minimum photos per person cluster (default: 5)
  • --eps: DBSCAN epsilon, the maximum distance between two faces for them to count as neighbors (default: 0.6)
  • --min_samples: DBSCAN minimum number of neighboring faces, including the face itself, needed to start a cluster (default: 3)
  • --face_threshold: Face detection confidence threshold (default: 0.9)
  • --backup_person: Create backup for specific person cluster
  • --create_backup: Enable backup creation
  • --use_cache: Use cached embeddings if available
  • --device: Processing device (auto/cpu/cuda/mps, default: auto)
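
To show how these flags fit together, here is a minimal `argparse` sketch. It is not the project's actual parser: the defaults are the ones listed above, while treating `--create_backup` and `--use_cache` as boolean switches is an assumption.

```python
import argparse

def build_clustering_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented clustering flags."""
    p = argparse.ArgumentParser(description="Cluster photos by face (sketch)")
    p.add_argument("--input_dir", required=True, help="Directory containing photos")
    p.add_argument("--output_dir", required=True, help="Directory for results")
    p.add_argument("--min_cluster_size", type=int, default=5)
    p.add_argument("--eps", type=float, default=0.6)
    p.add_argument("--min_samples", type=int, default=3)
    p.add_argument("--face_threshold", type=float, default=0.9)
    p.add_argument("--backup_person", default=None)
    p.add_argument("--create_backup", action="store_true")  # assumed to be a switch
    p.add_argument("--use_cache", action="store_true")      # assumed to be a switch
    p.add_argument("--device", choices=["auto", "cpu", "cuda", "mps"], default="auto")
    return p

# Parse a sample command line instead of sys.argv so the sketch runs anywhere.
args = build_clustering_parser().parse_args(
    ["--input_dir", "./data/Gallery", "--output_dir", "./output", "--eps", "0.5"]
)
print(args.eps, args.min_samples, args.device)  # 0.5 3 auto
```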

Person Search Parameters

  • --reference, -r: Directory with reference photos (required)
  • --gallery, -g: Gallery directory to search through (required)
  • --confidence, -c: Confidence threshold 0.0-1.0 (default: 0.7)
  • --output, -o: Directory that matching photos are moved to (optional; without it, matches are only reported)
  • --results-file: JSON file to save detailed results (optional)
  • --verbose, -v: Enable detailed logging
  • --device: Processing device (auto/cpu/cuda/mps, default: auto)

Photo Moving Parameters

  • --move_from_gallery: Enable moving photos from original Gallery
  • --move_person: Move specific person (e.g., "Person_1")
  • --move_all_persons: Move all persons with minimum photo count
  • --copy_mode: Copy files instead of moving them

Workflow

Primary Workflow: Face Clustering

  1. Detection Phase: Scans all photos and detects faces
  2. Encoding Phase: Generates face embeddings using ModernArcFace
  3. Clustering Phase: Groups similar faces using DBSCAN (optimized eps=0.6, min_samples=3)
  4. Organization Phase: Creates person-specific folders
  5. Backup Phase: Creates compressed backups (optional)
  6. Move Phase: Moves person photos from Gallery to single folders (optional)
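
The Clustering Phase can be illustrated with a small, dependency-free DBSCAN. This is a teaching sketch on made-up 2-D points, not the project's code: the real pipeline clusters 512-D embeddings and, since scikit-learn is in the dependency list, presumably calls `sklearn.cluster.DBSCAN`. The function name `dbscan` and the Euclidean metric are assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dbscan(points, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # Includes the point itself, as scikit-learn's min_samples does.
        return [j for j, q in enumerate(points) if euclidean(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may still become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Two tight groups of three "faces" and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
          (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
          (9.0, 0.0)]
labels = dbscan(points, eps=0.6, min_samples=3)
print(labels)  # [0, 0, 0, 1, 1, 1, -1]
```

A larger `eps` lets more distant faces join the same cluster, while a larger `min_samples` turns sparse groups into noise.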

Secondary Workflow: Person-Specific Search

  1. Reference Setup: Prepare 3-10 high-quality photos of the target person
  2. Reference Processing: Extract and encode face embeddings from reference photos
  3. Gallery Search: Scan target gallery and compare face embeddings
  4. Similarity Matching: Calculate cosine similarity scores for each detected face
  5. Filtering: Apply confidence threshold to identify matches
  6. File Management: Optionally move matching photos to dedicated folders
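
Steps 3 to 5 can be sketched in plain Python. The example below uses toy 3-D vectors in place of real 512-D embeddings; `find_matches` is a hypothetical helper, and scoring each face against its best-matching reference photo (instead of, say, an averaged reference) is an assumption about how the tool works.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_similarity(face, references):
    """Highest similarity between one gallery face and any reference face."""
    return max(cosine_similarity(face, ref) for ref in references)

def find_matches(gallery, references, confidence=0.7):
    """gallery maps image path -> list of face embeddings found in that image."""
    matches = []
    for path, faces in gallery.items():
        scores = [best_similarity(face, references) for face in faces]
        hits = [s for s in scores if s >= confidence]
        if hits:  # at least one face in the image passes the threshold
            matches.append({
                "image_path": path,
                "best_similarity": round(max(hits), 3),
                "matching_faces": len(hits),
                "total_faces": len(faces),
            })
    return matches

references = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0)]
gallery = {
    "a.jpg": [(1.0, 0.05, 0.0), (0.0, 1.0, 0.0)],  # one face close to the references
    "b.jpg": [(0.0, 0.0, 1.0)],                    # unrelated face
}
matches = find_matches(gallery, references, confidence=0.7)
print([m["image_path"] for m in matches])  # ['a.jpg']
```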

Combined Workflow (Recommended)

  1. First: Run face clustering to organize your entire photo collection by person
  2. Then: Use person search to find additional photos of specific people in new batches
  3. Finally: Merge results and maintain organized photo collections

Output

Face Clustering Output

  • output/clusters/Person_X/: Folders containing photos for each detected person
  • output/face_previews/: Face thumbnails organized by person
  • output/backups/Person_X.zip: Compressed backups of person-specific photos
  • Person_X_from_Gallery/: Photos moved from original Gallery structure
  • cache/face_embeddings.pkl: Cached face embeddings for future runs
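
A per-person backup such as `output/backups/Person_X.zip` can be produced with `shutil.make_archive`. This is a minimal sketch under the folder layout listed above; `backup_person` is a hypothetical helper, and the tool's own backup code may differ.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def backup_person(clusters_dir, backups_dir, person):
    """Zip clusters_dir/<person> into backups_dir/<person>.zip and return its path."""
    source = Path(clusters_dir) / person
    Path(backups_dir).mkdir(parents=True, exist_ok=True)
    # make_archive appends ".zip" to the base name itself.
    archive = shutil.make_archive(
        str(Path(backups_dir) / person), "zip", root_dir=str(source)
    )
    return Path(archive)

# Demo with one fake photo in a temporary output tree.
with tempfile.TemporaryDirectory() as tmp:
    person_dir = Path(tmp) / "clusters" / "Person_1"
    person_dir.mkdir(parents=True)
    (person_dir / "photo1.jpg").write_bytes(b"fake image bytes")
    archive = backup_person(Path(tmp) / "clusters", Path(tmp) / "backups", "Person_1")
    archive_name = archive.name
    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()

print(archive_name)
```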

Person Search Output

  • person_matches/: Photos matching your reference person (when using --output)
  • person_search_results_TIMESTAMP.json: Detailed search results with confidence scores
  • person_search.log: Search process logs and statistics

Example Search Results

{
  "search_parameters": {
    "gallery_directory": "./data/Gallery",
    "confidence_threshold": 0.7,
    "output_directory": "./person_matches"
  },
  "statistics": {
    "total_images_searched": 1234,
    "matching_images": 42,
    "match_rate": 0.034,
    "moved_images": 42
  },
  "matches": [
    {
      "image_path": "./data/Gallery/photo1.jpg",
      "best_similarity": 0.847,
      "matching_faces": 1,
      "total_faces": 3
    }
  ]
}
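
A report like the one above can be inspected with the standard `json` module. The sketch below embeds a trimmed copy of the example instead of reading a file, and the 0.8 cut-off is an arbitrary value chosen for illustration.

```python
import json

report_text = """
{
  "statistics": {"total_images_searched": 1234, "matching_images": 42},
  "matches": [
    {"image_path": "./data/Gallery/photo1.jpg", "best_similarity": 0.847,
     "matching_faces": 1, "total_faces": 3}
  ]
}
"""

report = json.loads(report_text)
stats = report["statistics"]

# match_rate is matching_images divided by total_images_searched.
rate = stats["matching_images"] / stats["total_images_searched"]

# Keep only the matches that clear a stricter threshold than the search used.
strong = [m["image_path"] for m in report["matches"] if m["best_similarity"] >= 0.8]

print(round(rate, 3), strong)
```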

Models Used

  • MTCNN: Multi-task CNN for face detection and alignment
  • ModernArcFace: Advanced face recognition using InceptionResnetV1 backbone with enhanced training
  • DBSCAN: Density-based clustering algorithm for grouping similar faces
  • Cosine Similarity: Similarity measure for comparing face embeddings in person search

Technical Specifications

Face Recognition

  • Embedding Dimension: 512D face vectors
  • Face Resolution: 160x160 pixels, with pixel values normalized to [-1, 1]
  • Detection Confidence: 90% default threshold
  • Similarity Metric: Cosine similarity for person matching
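
One common way to map 8-bit pixel values into roughly [-1, 1] is the fixed standardization shown below, which is the formula facenet-pytorch's `fixed_image_standardization` applies to tensors. Whether this project uses exactly this step is an assumption; the sketch works on single values to stay dependency-free.

```python
def standardize_pixel(value):
    """Map an 8-bit pixel value (0-255) into roughly [-1, 1]."""
    return (value - 127.5) / 128.0

darkest = standardize_pixel(0)
brightest = standardize_pixel(255)
print(darkest, brightest)  # -0.99609375 0.99609375
```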

Clustering Parameters (Optimized)

  • eps: 0.6 (distance threshold for grouping faces)
  • min_samples: 3 (minimum faces to form a cluster)
  • min_cluster_size: 5 (minimum photos per person folder)
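
The effect of `min_cluster_size` can be sketched as a filter applied after clustering. The helper below is hypothetical: it assumes one cluster label per detected face, with -1 marking DBSCAN noise, and it assumes folders are named `Person_<label>`.

```python
from collections import defaultdict

def photos_by_person(labels, photo_paths, min_cluster_size=5):
    """Group photos by cluster label, dropping noise and small clusters."""
    groups = defaultdict(set)
    for label, path in zip(labels, photo_paths):
        if label != -1:              # -1 is DBSCAN's noise label
            groups[label].add(path)  # a set, so a photo counts once per person
    return {
        f"Person_{label}": sorted(paths)
        for label, paths in sorted(groups.items())
        if len(paths) >= min_cluster_size
    }

# Five faces for cluster 0, three for cluster 1, one noise face.
labels = [0, 0, 0, 0, 0, 1, 1, 1, -1]
paths = [f"img{i}.jpg" for i in range(9)]
result = photos_by_person(labels, paths, min_cluster_size=5)
print(list(result))  # ['Person_0']
```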

Person Search Parameters

  • Recommended Confidence: 70% (good balance of precision/recall)
  • High Confidence: 85%+ (conservative matching)
  • Broad Search: 60-70% (more matches, some false positives)

Performance Tips

Face Clustering

  • First run will be slower as it processes all images
  • Subsequent runs use cached embeddings for faster processing
  • Lower eps if different people end up in the same cluster; raise it if one person is split across several clusters (default: 0.6)
  • Use min_cluster_size to filter out persons with few photos
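
The caching idea can be sketched as a pickle file keyed by photo path, so that only new photos are encoded on later runs. This is an illustration, not the tool's cache format: the helper names are hypothetical and `fake_encode` stands in for the real face encoder. Only unpickle cache files you created yourself.

```python
import pickle
import tempfile
from pathlib import Path

def load_cache(cache_file):
    path = Path(cache_file)
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    return {}

def save_cache(cache_file, cache):
    path = Path(cache_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(cache, fh)

def embeddings_for(photo_paths, cache, encode):
    """Encode only photos missing from the cache; return how many were encoded."""
    encoded = 0
    for photo in photo_paths:
        key = str(photo)
        if key not in cache:
            cache[key] = encode(photo)
            encoded += 1
    return encoded

def fake_encode(photo):
    return [float(len(str(photo)))]  # placeholder for a real 512-D embedding

with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "cache" / "face_embeddings.pkl"
    cache = load_cache(cache_file)
    first_run = embeddings_for(["a.jpg", "b.jpg"], cache, fake_encode)
    save_cache(cache_file, cache)
    cache = load_cache(cache_file)  # simulate a later run
    second_run = embeddings_for(["a.jpg", "b.jpg", "c.jpg"], cache, fake_encode)

print(first_run, second_run)  # 2 1
```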

Person-Specific Search

  • Reference Photo Quality: Use 3-10 clear, well-lit photos of the target person
  • Confidence Tuning: Start with 70%, adjust based on results
  • Performance: ~100-500 images per minute depending on hardware
  • Memory Usage: ~2-4GB for large galleries
  • Apple Silicon: Automatically uses MPS acceleration for faster processing

Hardware Optimization

  • Apple Silicon (M1/M2/M3): Uses MPS acceleration automatically
  • NVIDIA GPU: Use --device cuda for CUDA acceleration
  • CPU Only: Fallback option for maximum compatibility
  • Memory: 8GB+ recommended for large photo collections
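
The `auto` device choice can be approximated as follows. This is a sketch, not the tool's own logic: `pick_device` is a hypothetical helper, checking CUDA before MPS is an assumption, and it falls back to the CPU when PyTorch is not installed.

```python
def pick_device(requested="auto"):
    """Resolve 'auto' to 'cuda', 'mps', or 'cpu' based on what PyTorch reports."""
    if requested != "auto":
        return requested  # honor an explicit --device value
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch missing: nothing to accelerate
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in older PyTorch builds
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

device = pick_device()
print(device)
```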

Troubleshooting

Person Search Issues

  • No matches found: Lower confidence threshold or improve reference photos
  • Too many false positives: Increase confidence threshold to 0.8+
  • Performance issues: Use GPU acceleration or process smaller batches
  • Memory errors: Close other applications or use CPU processing

Clustering Issues

  • Too many small clusters: Increase eps parameter (try 0.7)
  • Too few clusters (different people merged together): Decrease eps parameter (try 0.5)
  • Giant Person_0 cluster: The default eps=0.6 should prevent this; if it still appears, decrease eps

Use Cases

Face Clustering

  • 📸 Organize family photos by person automatically
  • 🎯 Process bulk photo imports from multiple sources
  • 📱 Merge photos from different devices and accounts
  • 🗂️ Create person-specific albums for easier browsing

Person-Specific Search

  • 👥 Find all photos of a family member across thousands of photos
  • 🔍 Locate specific people in large photo collections
  • 📅 Create person timelines by finding photos across different time periods
  • 🎉 Prepare personalized albums for special occasions
  • 🏷️ Quality control with adjustable confidence thresholds
