A complete pipeline for processing PDF resumes using AI-powered analysis with Claude CLI integration.
This toolkit consists of three main scripts that work together to:
- Convert PDF resumes to text format
- Extract contact information and build a searchable database
- Perform AI-powered CV analysis using Claude CLI
- Python 3.8+
- UV package manager installed
- Claude CLI installed and configured (`claude` command available in PATH)
```bash
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/rimads/custom-ai-ats.git
cd custom-ai-ats

# Install dependencies with UV
uv sync

# Install Claude CLI (if not already installed)
# Follow instructions at: https://docs.anthropic.com/claude/docs
```
The scripts support the `ATS_WORKING_DIR` environment variable to specify the working directory:
```bash
# Set working directory (optional)
export ATS_WORKING_DIR=/path/to/your/cv/files

# Or use inline
ATS_WORKING_DIR=/path/to/data python3 pdf_to_text.py
```
If not set, scripts default to the current working directory.
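If the scripts resolve the directory the usual way, the logic amounts to a one-liner; a minimal sketch (the `WORKING_DIR` name is illustrative, not necessarily what the scripts use):

```python
import os
from pathlib import Path

# Fall back to the current directory when ATS_WORKING_DIR is unset
WORKING_DIR = Path(os.environ.get("ATS_WORKING_DIR", os.getcwd()))
```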
Place all PDF resumes in your working directory, then run:
```bash
# Process all PDFs in working directory
uv run python pdf_to_text.py

# Or process a specific PDF
uv run python pdf_to_text.py resume.pdf

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python pdf_to_text.py

# Alternative: using installed script
uv run pdf-to-text
```
This will:
- Process all `.pdf` files in the working directory
- Create corresponding `.txt` files with the extracted text content
- Skip files that have already been converted (see the sketch after this list)
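Conceptually, the conversion step works like the sketch below. It assumes the `pypdf` library purely for illustration (the actual script may use a different extractor), and `convert_pdf` is a hypothetical helper:

```python
from pathlib import Path

from pypdf import PdfReader  # assumed extractor; the real script may differ


def convert_pdf(pdf_path: Path) -> None:
    """Write <name>.txt next to <name>.pdf, skipping existing output."""
    txt_path = pdf_path.with_suffix(".txt")
    if txt_path.exists():  # skip files that were already converted
        return
    reader = PdfReader(str(pdf_path))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    txt_path.write_text(text, encoding="utf-8")


for pdf in sorted(Path(".").glob("*.pdf")):
    convert_pdf(pdf)
```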
Extract contact information and build a SQLite database:
```bash
# Process all text files and create database
uv run python extract_emails.py

# Search by email domain
uv run python extract_emails.py search_domain gmail.com

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python extract_emails.py

# Alternative: using installed script
uv run extract-emails
```
This will:
- Process all `.txt` files (converted CVs)
- Extract emails, phone numbers, LinkedIn profiles, GitHub links, and websites (see the sketch after this list)
- Create `cv_database.db` with all contact information
- Build a full-text search index for efficient querying
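A minimal sketch of the extraction idea, using simple regular expressions (the patterns below are illustrative and narrower than whatever the real script uses):

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.IGNORECASE)
GITHUB_RE = re.compile(r"github\.com/[\w-]+", re.IGNORECASE)


def extract_contacts(text: str) -> dict:
    """Pull the contact fields out of one converted CV."""
    emails = EMAIL_RE.findall(text)
    return {
        "email": emails[0] if emails else None,  # primary email
        "all_emails": emails,
        "linkedin": LINKEDIN_RE.findall(text),
        "github": GITHUB_RE.findall(text),
    }
```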
Analyze CVs using Claude CLI integration with flexible criteria:
```bash
# Process all unprocessed CVs with default frontend criteria
uv run python batch_reviewer.py cv_database.db

# Use custom criteria directly
uv run python batch_reviewer.py cv_database.db --criteria "Analyze for backend roles focusing on Python, Django, and API development. Return JSON with: python_skills, api_experience, years_of_experience, company_names."

# Use criteria from file
uv run python batch_reviewer.py cv_database.db --criteria-file criteria_examples/fullstack_developer.txt

# Show processing progress
uv run python batch_reviewer.py cv_database.db --progress

# Custom retry settings
uv run python batch_reviewer.py cv_database.db --max-retries 5

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python batch_reviewer.py

# Alternative: using installed script
uv run batch-reviewer --criteria-file my_criteria.txt
```
The AI reviewer uses flexible, customizable criteria and stores results in a JSON `features` column. You can define any analysis criteria for different roles:
- Frontend Developer (default): `knows_reactjs`, `knows_redux`, `years_of_experience`, `company_names`, `clear_impact`, `primary_skills`, `seniority_level`, `css_frameworks`, `testing_experience`
- Full-stack Developer: `frontend_experience`, `backend_experience`, `database_skills`, `cloud_experience`, `devops_skills`, `architecture_experience`, `team_leadership`
- Data Scientist: `python_experience`, `ml_frameworks`, `data_tools`, `education_level`, `domain_expertise`, `visualization_tools`, `cloud_ml`
Create your own analysis criteria by specifying JSON fields and their evaluation logic:
```bash
# Custom criteria for DevOps roles
uv run python batch_reviewer.py cv_database.db --criteria "
Analyze for DevOps engineering roles and return JSON with:
- docker_experience: 1 if mentions Docker/containers, 0 if not, null if unclear
- kubernetes_skills: 1 if mentions Kubernetes/K8s, 0 if not, null if unclear
- cloud_platforms: Array of cloud platforms (AWS, Azure, GCP), null if none
- years_of_experience: Total professional experience (integer), null if unclear
- automation_tools: Array of automation tools mentioned, null if none
- monitoring_experience: 1 if mentions monitoring tools (Prometheus, Grafana, etc.), 0 if not, null if unclear
"
```
Example `features` output for a CV analyzed with the default frontend criteria:

```json
{
  "knows_reactjs": 1,
  "years_of_experience": 5,
  "company_names": ["Google", "Microsoft", "Startup Inc"],
  "clear_impact": 1,
  "primary_skills": ["JavaScript", "TypeScript", "Node.js"],
  "seniority_level": "senior"
}
```
The SQLite database includes these key tables:
```sql
-- Main CV table
CREATE TABLE cvs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT UNIQUE NOT NULL,
    content TEXT NOT NULL,
    email TEXT,
    phones TEXT,
    linkedin TEXT,
    github TEXT,
    website TEXT,
    all_emails TEXT,
    features TEXT, -- JSON column for AI analysis results
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Full-text search index
CREATE VIRTUAL TABLE cvs_fts USING fts5(...);

-- Query JSON features examples:
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.knows_reactjs') = 1;
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.years_of_experience') > 5;
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.seniority_level') = 'senior';
-- SELECT JSON_EXTRACT(features, '$.primary_skills') FROM cvs WHERE id = 1;
```
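The same JSON queries work from Python through the standard `sqlite3` module (recent Python builds ship SQLite with the JSON functions enabled):

```python
import sqlite3

conn = sqlite3.connect("cv_database.db")
rows = conn.execute(
    """
    SELECT filename, json_extract(features, '$.years_of_experience') AS yoe
    FROM cvs
    WHERE json_extract(features, '$.knows_reactjs') = 1
    ORDER BY yoe DESC
    """
).fetchall()
for filename, yoe in rows:
    print(filename, yoe)
conn.close()
```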
```bash
# 1. Set working directory (optional)
export ATS_WORKING_DIR=/path/to/your/cv/data

# 2. Place PDF resumes in working directory
ls $ATS_WORKING_DIR/*.pdf
# john_doe_resume.pdf, jane_smith_cv.pdf, ...

# 3. Convert to text
uv run python pdf_to_text.py
# Creates: john_doe_resume.txt, jane_smith_cv.txt, ...

# 4. Build database
uv run python extract_emails.py
# Creates: cv_database.db

# 5. AI analysis with custom criteria
uv run python batch_reviewer.py cv_database.db --criteria-file criteria_examples/frontend_developer.txt
# Processes all CVs through Claude CLI

# 6. Check results
uv run python batch_reviewer.py cv_database.db --progress
# Progress: 150/150 CVs processed (100.0%)
# Sample features found: ['knows_reactjs', 'knows_redux', 'years_of_experience', 'company_names', 'clear_impact', 'primary_skills', 'seniority_level']
# CVs with analysis features: 150
```
The batch reviewer includes robust error handling (a sketch of the timeout-and-retry pattern follows this list):
- Timeout Protection: 60-second timeout for Claude CLI calls
- SQL Validation: Tests queries before execution using transactions
- Auto-Fix: Handles common SQL formatting issues (quotes, NULL values)
- Retry Logic: Up to 3 attempts per CV with enhanced error context
- Graceful Failures: Continues processing other CVs if one fails
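A minimal sketch of that timeout-and-retry pattern, assuming the CLI is invoked as a subprocess with the prompt on stdin (the exact command and flags the script uses may differ):

```python
import subprocess
from typing import Optional


def review_cv(prompt: str, max_retries: int = 3, timeout: int = 60) -> Optional[str]:
    """Call the claude CLI with a timeout, retrying on failure."""
    for _ in range(max_retries):  # retry logic: up to max_retries attempts
        try:
            result = subprocess.run(
                ["claude", "-p"],  # assumed invocation; check your CLI's flags
                input=prompt,      # prompt passed on stdin
                capture_output=True,
                text=True,
                timeout=timeout,   # timeout protection (60 s by default)
            )
            if result.returncode == 0:
                return result.stdout
        except subprocess.TimeoutExpired:
            continue  # treat a timeout like any other failed attempt
    return None  # graceful failure: caller logs it and moves on
```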
- `*.txt` - Text versions of PDF resumes
- `cv_database.db` - SQLite database with all CV data and analysis
- `test.db` - Test database (if created for testing)
- Large Batches: The script processes all CVs in one run - no manual batching needed
- Resume Processing: Script handles various CV formats and extracts contact info reliably
- Database Queries: Use standard SQLite commands to query results
- Testing: Create test databases with small subsets before processing large batches (see the sketch below)
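For that last tip, one way to carve out a small test database is to copy a handful of rows into `test.db` (a sketch with the standard `sqlite3` module; the file name and row limit are illustrative):

```python
import sqlite3

# Copy the first 10 CVs into test.db; CREATE TABLE ... AS SELECT drops
# constraints, which is acceptable for a quick dry run
conn = sqlite3.connect("cv_database.db")
conn.execute("ATTACH DATABASE 'test.db' AS test")
conn.execute("CREATE TABLE test.cvs AS SELECT * FROM cvs LIMIT 10")
conn.commit()
conn.close()
```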
- Claude CLI Issues: Ensure the `claude` command is available and authenticated
- PDF Conversion: Some PDFs may not extract cleanly - check the `.txt` output
- Database Locks: Close any database connections before running scripts
- Memory Usage: Large CV collections may require sufficient RAM for processing
For questions or issues, refer to the individual script documentation or Claude CLI documentation.