A tool for clustering photos by the faces detected in them, with person-specific photo search. It can process thousands of photos: it detects faces, clusters similar faces together, organizes photos by person, and searches for specific people across your entire photo collection.
- Face Detection: Uses MTCNN for accurate face detection
- Face Recognition: Uses ModernArcFace with InceptionResnetV1 for generating high-quality face embeddings
- Clustering: Uses DBSCAN to group similar faces with optimized parameters
- Photo Organization: Automatically organizes photos by detected persons
- Person-Specific Search: Search for photos of specific people using reference photos
- Backup Creation: Creates compressed backups of person-specific photo collections
- Progress Tracking: Real-time progress bars and logging
- Caching: Saves face embeddings to avoid reprocessing
- Apple Silicon Support: Optimized for MPS acceleration on Apple devices
# 1. Install dependencies
conda create -n face-recognition python=3.9 -y
conda activate face-recognition
pip install -r requirements.txt
# 2. Organize photos by person (face clustering)
python src/main.py --input_dir ./data/Gallery --output_dir ./output
# 3. Search for a specific person
python search_person.py --reference ./reference_photos --gallery ./data/Gallery --confidence 0.7 --output ./person_matches
face-recognition-clustering/
├── src/
│ ├── face_detector.py # Face detection using MTCNN
│ ├── face_encoder.py # Modern ArcFace encoding with InceptionResnetV1
│ ├── clustering.py # Face clustering algorithms
│ ├── photo_organizer.py # Photo organization and backup
│ ├── person_search.py # Person-specific photo search
│ ├── utils.py # Utility functions
│ └── main.py # Main application entry point
├── search_person.py # Command-line person search tool
├── data/
│ └── Gallery/ # Your photo collection
├── output/
│ ├── clusters/ # Clustered photos by person
│ ├── face_previews/ # Face thumbnail previews
│ └── backups/ # Person-specific photo backups
├── cache/ # Cached face embeddings
├── logs/ # Application logs
├── requirements.txt # Python dependencies
If you prefer manual setup:
# Create conda environment
conda create -n face-recognition python=3.9 -y
conda activate face-recognition
# Install PyTorch with MPS support (Apple Silicon)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
# Install other dependencies
pip install facenet-pytorch mtcnn Pillow opencv-python scikit-learn numpy matplotlib seaborn tqdm pandas psutil
First, activate the conda environment (if using the recommended installation):
conda activate face-recognition
Then run the face clustering to organize your photos by person:
# Automatic device selection (will use MPS on Apple Silicon)
python src/main.py --input_dir ./data/Gallery --output_dir ./output
# Explicitly setting the clustering parameters (these values are the optimized defaults)
python src/main.py --input_dir ./data/Gallery --output_dir ./output --eps 0.6 --min_samples 3
After clustering (or independently), search for specific people across your photo collection:
# Search for a person using reference photos
python search_person.py --reference ./reference_photos --gallery ./data/Gallery --confidence 0.7 --output ./person_matches
# Search only (without --output, no files are moved)
python search_person.py -r ./reference_photos -g ./data/Gallery -c 0.7
# High confidence search (fewer false positives)
python search_person.py -r ./reference_photos -g ./data/Gallery -c 0.85 -o ./high_confidence_matches
# Broader search (more matches, some false positives)
python search_person.py -r ./reference_photos -g ./data/Gallery -c 0.6 -o ./moderate_matches --verbose
Create a folder with 3-10 high-quality photos of the person you want to search for:
reference_photos/
├── person_photo_1.jpg # Clear, well-lit face
├── person_photo_2.jpg # Different angle
├── person_photo_3.jpg # Different expression
└── person_photo_4.jpg # Good quality
Person Search Features:
- 🎯 Confidence Thresholding: 70% recommended, 60%+ for broader search
- 📁 Automatic File Management: Moves matching photos to dedicated folders
- 📊 Detailed Results: JSON reports with similarity scores and statistics
- 🚀 Apple Silicon Optimized: Uses MPS acceleration
- 📈 Match Statistics: Shows match rates and confidence distribution
python src/main.py \
--input_dir ./data/Gallery \
--output_dir ./output \
--min_cluster_size 10 \
--eps 0.6 \
--face_threshold 0.9 \
--backup_person "Person_1" \
--create_backup
After clustering, you can move photos of specific persons from the original Gallery structure to organized folders:
# Move photos for Person_1 from Gallery to a single folder
python move_person_photos.py 1 --gallery_dir ./data/Gallery --output_dir ./output
# Copy photos instead of moving (preserves originals)
python move_person_photos.py 1 --copy
# List available persons
python move_person_photos.py --list
Or use the main script with move options:
python src/main.py \
--input_dir ./data/Gallery \
--output_dir ./output \
--move_from_gallery \
--move_person "Person_1" \
--copy_mode
- --input_dir: Directory containing photos to process
- --output_dir: Output directory for results
- --min_cluster_size: Minimum photos per person cluster (default: 5)
- --eps: DBSCAN clustering epsilon parameter (default: 0.6, optimized)
- --min_samples: DBSCAN minimum samples parameter (default: 3, optimized)
- --face_threshold: Face detection confidence threshold (default: 0.9)
- --backup_person: Create backup for specific person cluster
- --create_backup: Enable backup creation
- --use_cache: Use cached embeddings if available
- --device: Processing device (auto/cpu/cuda/mps, default: auto)
- --reference, -r: Directory with reference photos (required)
- --gallery, -g: Gallery directory to search through (required)
- --confidence, -c: Confidence threshold 0.0-1.0 (default: 0.7)
- --output, -o: Output directory to move matching photos (optional)
- --results-file: JSON file to save detailed results (optional)
- --verbose, -v: Enable detailed logging
- --device: Processing device (auto/cpu/cuda/mps, default: auto)
- --move_from_gallery: Enable moving photos from original Gallery
- --move_person: Move specific person (e.g., "Person_1")
- --move_all_persons: Move all persons with minimum photo count
- --copy_mode: Copy files instead of moving them
- Detection Phase: Scans all photos and detects faces
- Encoding Phase: Generates face embeddings using ModernArcFace
- Clustering Phase: Groups similar faces using DBSCAN (optimized eps=0.6, min_samples=3)
- Organization Phase: Creates person-specific folders
- Backup Phase: Creates compressed backups (optional)
- Move Phase: Moves person photos from Gallery to single folders (optional)
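The clustering phase above can be illustrated with a small, dependency-free DBSCAN. This is only a sketch of the algorithm, not the code in clustering.py: it assumes Euclidean distance (the metric the project actually uses is not stated here), and the 2D toy points stand in for real 512D embeddings. The eps and min_samples defaults mirror the documented values.

```python
import math

NOISE = -1  # same convention as scikit-learn: -1 means "no cluster"

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dbscan(points, eps=0.6, min_samples=3):
    """Return one cluster label per point; NOISE marks unclustered points."""
    labels = [None] * len(points)

    def neighbors(i):
        # Indices within eps of point i, including i itself.
        return [j for j, p in enumerate(points) if euclidean(points[i], p) <= eps]

    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_samples:  # j is a core point, keep expanding
                queue.extend(reachable)
        cluster += 1
    return labels

# Toy "embeddings": two tight groups and one outlier.
toy_points = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (10, 10)]
labels = dbscan(toy_points)
print(labels)
```

Raising eps merges nearby groups into fewer, larger clusters; lowering it splits them, which is the trade-off behind the eps tuning advice in this README.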
- Reference Setup: Prepare 3-10 high-quality photos of the target person
- Reference Processing: Extract and encode face embeddings from reference photos
- Gallery Search: Scan target gallery and compare face embeddings
- Similarity Matching: Calculate cosine similarity scores for each detected face
- Filtering: Apply confidence threshold to identify matches
- File Management: Optionally move matching photos to dedicated folders
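The matching and filtering steps of the search workflow might look roughly like the sketch below. It assumes each face is scored by its best cosine similarity against any reference embedding (the project could equally average the references), and the function names are hypothetical. The result fields mirror those in the JSON report.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_similarity(face, references):
    """Highest cosine similarity between one face and any reference embedding."""
    return max(cosine_similarity(face, ref) for ref in references)

def find_matches(gallery, references, confidence=0.7):
    """gallery maps an image path to the list of face embeddings found in it."""
    matches = []
    for path, faces in gallery.items():
        scores = [best_similarity(face, references) for face in faces]
        hits = [s for s in scores if s >= confidence]
        if hits:
            matches.append({
                "image_path": path,
                "best_similarity": max(hits),
                "matching_faces": len(hits),
                "total_faces": len(faces),
            })
    return matches

# Toy 2D vectors stand in for real 512D embeddings.
references = [[1.0, 0.0], [0.9, 0.1]]
gallery = {
    "a.jpg": [[1.0, 0.05], [0.0, 1.0]],  # one face close to the references
    "b.jpg": [[0.0, 1.0]],               # no face close to the references
}
matches = find_matches(gallery, references, confidence=0.7)
print(matches)
```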
- First: Run face clustering to organize your entire photo collection by person
- Then: Use person search to find additional photos of specific people in new batches
- Finally: Merge results and maintain organized photo collections
- output/clusters/Person_X/: Folders containing photos for each detected person
- output/face_previews/: Face thumbnails organized by person
- output/backups/Person_X.zip: Compressed backups of person-specific photos
- Person_X_from_Gallery/: Photos moved from original Gallery structure
- cache/face_embeddings.pkl: Cached face embeddings for future runs
- person_matches/: Photos matching your reference person (when using --output)
- person_search_results_TIMESTAMP.json: Detailed search results with confidence scores
- person_search.log: Search process logs and statistics
{
"search_parameters": {
"gallery_directory": "./data/Gallery",
"confidence_threshold": 0.7,
"output_directory": "./person_matches"
},
"statistics": {
"total_images_searched": 1234,
"matching_images": 42,
"match_rate": 0.034,
"moved_images": 42
},
"matches": [
{
"image_path": "./data/Gallery/photo1.jpg",
"best_similarity": 0.847,
"matching_faces": 1,
"total_faces": 3
}
]
}
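A report with this shape can be post-processed with the standard json module. The sketch below recomputes the match rate and picks the strongest match. The helper name summarize_report is hypothetical, and the inline string simply repeats the sample above in place of a real results file.

```python
import json

def summarize_report(report):
    """Recompute the match rate and find the highest-scoring match."""
    stats = report["statistics"]
    total = stats["total_images_searched"]
    rate = stats["matching_images"] / total if total else 0.0
    ranked = sorted(report["matches"], key=lambda m: m["best_similarity"], reverse=True)
    return {
        "match_rate": round(rate, 3),
        "top_match": ranked[0]["image_path"] if ranked else None,
    }

sample_report = json.loads("""
{
  "search_parameters": {"gallery_directory": "./data/Gallery",
                        "confidence_threshold": 0.7,
                        "output_directory": "./person_matches"},
  "statistics": {"total_images_searched": 1234, "matching_images": 42,
                 "match_rate": 0.034, "moved_images": 42},
  "matches": [{"image_path": "./data/Gallery/photo1.jpg", "best_similarity": 0.847,
               "matching_faces": 1, "total_faces": 3}]
}
""")
summary = summarize_report(sample_report)
print(summary)
```

For a real run, load the file passed to --results-file with json.load instead of parsing an inline string.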
- MTCNN: Multi-task CNN for face detection and alignment
- ModernArcFace: Advanced face recognition using InceptionResnetV1 backbone with enhanced training
- DBSCAN: Density-based clustering algorithm for grouping similar faces
- Cosine Similarity: Similarity measure for comparing face embeddings in person search
- Embedding Dimension: 512D face vectors
- Face Resolution: 160x160 pixels (normalized to [-1, 1])
- Detection Confidence: 90% default threshold
- Similarity Metric: Cosine similarity for person matching
- eps: 0.6 (distance threshold for grouping faces)
- min_samples: 3 (minimum faces to form a cluster)
- min_cluster_size: 5 (minimum photos per person folder)
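To show how min_cluster_size acts after clustering, here is a minimal sketch that groups photos by cluster label and drops small groups. It assumes noise faces carry the label -1 (scikit-learn's DBSCAN convention) and that cluster label X maps directly to a Person_X folder. Both are assumptions about this project, and group_by_person is a hypothetical helper.

```python
from collections import defaultdict

def group_by_person(labels, photo_paths, min_cluster_size=5):
    """Map 'Person_X' to its photos, keeping only clusters with enough photos."""
    clusters = defaultdict(set)
    for label, path in zip(labels, photo_paths):
        if label != -1:  # skip faces DBSCAN marked as noise
            clusters[label].add(path)  # a set, so each photo counts once per person
    return {
        f"Person_{label}": sorted(paths)
        for label, paths in sorted(clusters.items())
        if len(paths) >= min_cluster_size
    }

# One label per detected face, aligned with the photo each face came from.
face_labels = [0, 0, 0, 0, 0, 1, 1, -1]
face_photos = [f"photo_{i}.jpg" for i in range(8)]
people = group_by_person(face_labels, face_photos)
print(people)
```

Here Person_1 has only two photos, so it is filtered out under the default of 5, and the noise face is ignored entirely.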
- Recommended Confidence: 70% (good balance of precision/recall)
- High Confidence: 85%+ (conservative matching)
- Broad Search: 60-70% (more matches, some false positives)
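The effect of these thresholds is easy to see by counting how many faces pass at each level. The similarity scores below are made-up illustrative values, not measurements from this tool, and count_matches is a hypothetical helper.

```python
def count_matches(scores, thresholds=(0.6, 0.7, 0.85)):
    """Number of scores at or above each confidence threshold."""
    return {t: sum(1 for s in scores if s >= t) for t in thresholds}

# Illustrative best-similarity scores for seven detected faces.
toy_scores = [0.91, 0.86, 0.74, 0.71, 0.66, 0.62, 0.41]
counts = count_matches(toy_scores)
print(counts)
```

A lower threshold admits more candidates (and more false positives); a higher one keeps only the strongest matches.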
- First run will be slower as it processes all images
- Subsequent runs use cached embeddings for faster processing
- Adjust eps parameter if clustering is too loose/tight (0.6 is optimized)
- Use min_cluster_size to filter out persons with few photos
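The speed-up from cached embeddings comes from encoding each photo only once. The sketch below shows one way such a cache could work with pickle. The helper names and the path-keyed dictionary layout are assumptions, not the actual format of cache/face_embeddings.pkl, and fake_encode stands in for the real face encoder.

```python
import os
import pickle
import tempfile

def load_cache(cache_path):
    """Return the cached {path: embedding} dict, or an empty one on first run."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    return {}

def save_cache(cache, cache_path):
    os.makedirs(os.path.dirname(cache_path) or ".", exist_ok=True)
    with open(cache_path, "wb") as fh:
        pickle.dump(cache, fh)

def encode_missing(paths, cache, encode):
    """Encode only paths absent from the cache; return how many were encoded."""
    encoded = 0
    for path in paths:
        if path not in cache:
            cache[path] = encode(path)
            encoded += 1
    return encoded

def fake_encode(path):
    # Stand-in for the real encoder; returns a dummy one-element "embedding".
    return [float(len(path))]

with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "cache", "face_embeddings.pkl")
    photos = ["a.jpg", "b.jpg"]

    cache = load_cache(cache_file)  # first run: nothing cached yet
    first_run = encode_missing(photos, cache, fake_encode)
    save_cache(cache, cache_file)

    cache = load_cache(cache_file)  # second run: everything is cached
    second_run = encode_missing(photos, cache, fake_encode)

print(first_run, second_run)
```

Only unpickle cache files you created yourself, since pickle can execute arbitrary code when loading untrusted data.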
- Reference Photo Quality: Use 3-10 clear, well-lit photos of the target person
- Confidence Tuning: Start with 70%, adjust based on results
- Performance: ~100-500 images per minute depending on hardware
- Memory Usage: ~2-4GB for large galleries
- Apple Silicon: Automatically uses MPS acceleration for faster processing
- Apple Silicon (M1/M2/M3): Uses MPS acceleration automatically
- NVIDIA GPU: Use --device cuda for CUDA acceleration
- CPU Only: Fallback option for maximum compatibility
- Memory: 8GB+ recommended for large photo collections
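Automatic device selection along these lines can be sketched as a small pure function plus a PyTorch probe. The function names and the CUDA-before-MPS priority are assumptions rather than this project's confirmed behaviour. torch.cuda.is_available() and torch.backends.mps.is_available() are real PyTorch calls (the latter requires PyTorch 1.12 or newer).

```python
def pick_device(requested="auto", cuda_available=False, mps_available=False):
    """Resolve a --device value to a concrete device name."""
    if requested != "auto":
        return requested  # honour an explicit cpu/cuda/mps choice
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

def detect_device(requested="auto"):
    """Probe PyTorch for available backends; fall back to CPU without it."""
    try:
        import torch
    except ImportError:
        return pick_device(requested)
    return pick_device(
        requested,
        cuda_available=torch.cuda.is_available(),
        mps_available=torch.backends.mps.is_available(),
    )

print(pick_device("auto", cuda_available=False, mps_available=True))
```

Keeping the decision in pick_device, separate from the hardware probe, makes the selection logic testable on machines without a GPU.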
- No matches found: Lower confidence threshold or improve reference photos
- Too many false positives: Increase confidence threshold to 0.8+
- Performance issues: Use GPU acceleration or process smaller batches
- Memory errors: Close other applications or use CPU processing
- Too many small clusters: Increase eps parameter (try 0.7)
- Too few clusters: Decrease eps parameter (try 0.5)
- Giant Person_0 cluster: Current optimized parameters (eps=0.6) should prevent this
- 📸 Organize family photos by person automatically
- 🎯 Process bulk photo imports from multiple sources
- 📱 Merge photos from different devices and accounts
- 🗂️ Create person-specific albums for easier browsing
- 👥 Find all photos of a family member across thousands of photos
- 🔍 Locate specific people in large photo collections
- 📅 Create person timelines by finding photos across different time periods
- 🎉 Prepare personalized albums for special occasions
- 🏷️ Quality control with adjustable confidence thresholds