A complete pipeline for processing PDF resumes using AI-powered analysis with Claude CLI integration.
This toolkit consists of three main scripts that work together to:
- Convert PDF resumes to text format
- Extract contact information and build a searchable database
- Perform AI-powered CV analysis using Claude CLI
- Python 3.8+
- UV package manager installed
- Claude CLI installed and configured (`claude` command available in PATH)
```bash
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/rimads/custom-ai-ats.git
cd custom-ai-ats

# Install dependencies with UV
uv sync

# Install Claude CLI (if not already installed)
# Follow instructions at: https://docs.anthropic.com/claude/docs
```
The scripts support the `ATS_WORKING_DIR` environment variable to specify the working directory:
```bash
# Set working directory (optional)
export ATS_WORKING_DIR=/path/to/your/cv/files

# Or use inline
ATS_WORKING_DIR=/path/to/data python3 pdf_to_text.py
```
If not set, scripts default to the current working directory.
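If the scripts resolve the directory the usual way, the logic amounts to a one-liner; a minimal sketch (the `WORKING_DIR` name is illustrative, not necessarily what the scripts use):

```python
import os
from pathlib import Path

# Fall back to the current directory when ATS_WORKING_DIR is unset
WORKING_DIR = Path(os.environ.get("ATS_WORKING_DIR", os.getcwd()))
```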
Place all PDF resumes in your working directory, then run:
```bash
# Process all PDFs in working directory
uv run python pdf_to_text.py

# Or process a specific PDF
uv run python pdf_to_text.py resume.pdf

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python pdf_to_text.py

# Alternative: using installed script
uv run pdf-to-text
```
This will:
- Process all `.pdf` files in the working directory
- Create corresponding `.txt` files with the extracted text content
- Skip files that have already been converted (see the sketch after this list)
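Conceptually, the conversion step works like the sketch below. It assumes the `pypdf` library purely for illustration (the actual script may use a different extractor), and `convert_pdf` is a hypothetical helper:

```python
from pathlib import Path

from pypdf import PdfReader  # assumed extractor; the real script may differ


def convert_pdf(pdf_path: Path) -> None:
    """Write <name>.txt next to <name>.pdf, skipping existing output."""
    txt_path = pdf_path.with_suffix(".txt")
    if txt_path.exists():  # skip files that were already converted
        return
    reader = PdfReader(str(pdf_path))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    txt_path.write_text(text, encoding="utf-8")


for pdf in sorted(Path(".").glob("*.pdf")):
    convert_pdf(pdf)
```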
Extract contact information and build a SQLite database:
```bash
# Process all text files and create database
uv run python extract_emails.py

# Search by email domain
uv run python extract_emails.py search_domain gmail.com

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python extract_emails.py

# Alternative: using installed script
uv run extract-emails
```
This will:
- Process all `.txt` files (converted CVs)
- Extract emails, phone numbers, LinkedIn profiles, GitHub links, and websites (see the sketch after this list)
- Create `cv_database.db` with all contact information
- Build a full-text search index for efficient querying
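A minimal sketch of the extraction idea, using simple regular expressions (the patterns below are illustrative and narrower than whatever the real script uses):

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.IGNORECASE)
GITHUB_RE = re.compile(r"github\.com/[\w-]+", re.IGNORECASE)


def extract_contacts(text: str) -> dict:
    """Pull the contact fields out of one converted CV."""
    emails = EMAIL_RE.findall(text)
    return {
        "email": emails[0] if emails else None,  # primary email
        "all_emails": emails,
        "linkedin": LINKEDIN_RE.findall(text),
        "github": GITHUB_RE.findall(text),
    }
```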
Analyze CVs using Claude CLI integration with flexible criteria:
```bash
# Process all unprocessed CVs with default frontend criteria
uv run python batch_reviewer.py cv_database.db

# Use custom criteria directly
uv run python batch_reviewer.py cv_database.db --criteria "Analyze for backend roles focusing on Python, Django, and API development. Return JSON with: python_skills, api_experience, years_of_experience, company_names."

# Use criteria from file
uv run python batch_reviewer.py cv_database.db --criteria-file criteria_examples/fullstack_developer.txt

# Show processing progress
uv run python batch_reviewer.py cv_database.db --progress

# Custom retry settings
uv run python batch_reviewer.py cv_database.db --max-retries 5

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python batch_reviewer.py

# Alternative: using installed script
uv run batch-reviewer --criteria-file my_criteria.txt
```
The AI reviewer uses flexible, customizable criteria and stores results in a JSON `features` column. You can define any analysis criteria for different roles:
- Frontend Developer (default): `knows_reactjs`, `knows_redux`, `years_of_experience`, `company_names`, `clear_impact`, `primary_skills`, `seniority_level`, `css_frameworks`, `testing_experience`
- Full-stack Developer: `frontend_experience`, `backend_experience`, `database_skills`, `cloud_experience`, `devops_skills`, `architecture_experience`, `team_leadership`
- Data Scientist: `python_experience`, `ml_frameworks`, `data_tools`, `education_level`, `domain_expertise`, `visualization_tools`, `cloud_ml`
Create your own analysis criteria by specifying JSON fields and their evaluation logic:
```bash
# Custom criteria for DevOps roles
uv run python batch_reviewer.py cv_database.db --criteria "
Analyze for DevOps engineering roles and return JSON with:
- docker_experience: 1 if mentions Docker/containers, 0 if not, null if unclear
- kubernetes_skills: 1 if mentions Kubernetes/K8s, 0 if not, null if unclear
- cloud_platforms: Array of cloud platforms (AWS, Azure, GCP), null if none
- years_of_experience: Total professional experience (integer), null if unclear
- automation_tools: Array of automation tools mentioned, null if none
- monitoring_experience: 1 if mentions monitoring tools (Prometheus, Grafana, etc.), 0 if not, null if unclear
"
```
Example `features` output for a CV analyzed with the default frontend criteria:

```json
{
  "knows_reactjs": 1,
  "years_of_experience": 5,
  "company_names": ["Google", "Microsoft", "Startup Inc"],
  "clear_impact": 1,
  "primary_skills": ["JavaScript", "TypeScript", "Node.js"],
  "seniority_level": "senior"
}
```
The SQLite database includes these key tables:
```sql
-- Main CV table
CREATE TABLE cvs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT UNIQUE NOT NULL,
    content TEXT NOT NULL,
    email TEXT,
    phones TEXT,
    linkedin TEXT,
    github TEXT,
    website TEXT,
    all_emails TEXT,
    features TEXT, -- JSON column for AI analysis results
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Full-text search index
CREATE VIRTUAL TABLE cvs_fts USING fts5(...);

-- Query JSON features examples:
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.knows_reactjs') = 1;
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.years_of_experience') > 5;
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.seniority_level') = 'senior';
-- SELECT JSON_EXTRACT(features, '$.primary_skills') FROM cvs WHERE id = 1;
```
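The same JSON queries work from Python through the standard `sqlite3` module (recent Python builds ship SQLite with the JSON functions enabled):

```python
import sqlite3

conn = sqlite3.connect("cv_database.db")
rows = conn.execute(
    """
    SELECT filename, json_extract(features, '$.years_of_experience') AS yoe
    FROM cvs
    WHERE json_extract(features, '$.knows_reactjs') = 1
    ORDER BY yoe DESC
    """
).fetchall()
for filename, yoe in rows:
    print(filename, yoe)
conn.close()
```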
```bash
# 1. Set working directory (optional)
export ATS_WORKING_DIR=/path/to/your/cv/data

# 2. Place PDF resumes in working directory
ls $ATS_WORKING_DIR/*.pdf
# john_doe_resume.pdf, jane_smith_cv.pdf, ...

# 3. Convert to text
uv run python pdf_to_text.py
# Creates: john_doe_resume.txt, jane_smith_cv.txt, ...

# 4. Build database
uv run python extract_emails.py
# Creates: cv_database.db

# 5. AI analysis with custom criteria
uv run python batch_reviewer.py cv_database.db --criteria-file criteria_examples/frontend_developer.txt
# Processes all CVs through Claude CLI

# 6. Check results
uv run python batch_reviewer.py cv_database.db --progress
# Progress: 150/150 CVs processed (100.0%)
# Sample features found: ['knows_reactjs', 'knows_redux', 'years_of_experience', 'company_names', 'clear_impact', 'primary_skills', 'seniority_level']
# CVs with analysis features: 150
```
The batch reviewer includes robust error handling (a sketch of the timeout-and-retry pattern follows this list):
- Timeout Protection: 60-second timeout for Claude CLI calls
- SQL Validation: Tests queries before execution using transactions
- Auto-Fix: Handles common SQL formatting issues (quotes, NULL values)
- Retry Logic: Up to 3 attempts per CV with enhanced error context
- Graceful Failures: Continues processing other CVs if one fails
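A minimal sketch of that timeout-and-retry pattern, assuming the CLI is invoked as a subprocess with the prompt on stdin (the exact command and flags the script uses may differ):

```python
import subprocess
from typing import Optional


def review_cv(prompt: str, max_retries: int = 3, timeout: int = 60) -> Optional[str]:
    """Call the claude CLI with a timeout, retrying on failure."""
    for _ in range(max_retries):  # retry logic: up to max_retries attempts
        try:
            result = subprocess.run(
                ["claude", "-p"],  # assumed invocation; check your CLI's flags
                input=prompt,      # prompt passed on stdin
                capture_output=True,
                text=True,
                timeout=timeout,   # timeout protection (60 s by default)
            )
            if result.returncode == 0:
                return result.stdout
        except subprocess.TimeoutExpired:
            continue  # treat a timeout like any other failed attempt
    return None  # graceful failure: caller logs it and moves on
```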
- `*.txt` - Text versions of PDF resumes
- `cv_database.db` - SQLite database with all CV data and analysis
- `test.db` - Test database (if created for testing)
- Large Batches: The script processes all CVs in one run - no manual batching needed
- Resume Processing: Script handles various CV formats and extracts contact info reliably
- Database Queries: Use standard SQLite commands to query results
- Testing: Create test databases with small subsets before processing large batches (see the sketch below)
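For that last tip, one way to carve out a small test database is to copy a handful of rows into `test.db` (a sketch with the standard `sqlite3` module; the file name and row limit are illustrative):

```python
import sqlite3

# Copy the first 10 CVs into test.db; CREATE TABLE ... AS SELECT drops
# constraints, which is acceptable for a quick dry run
conn = sqlite3.connect("cv_database.db")
conn.execute("ATTACH DATABASE 'test.db' AS test")
conn.execute("CREATE TABLE test.cvs AS SELECT * FROM cvs LIMIT 10")
conn.commit()
conn.close()
```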
- Claude CLI Issues: Ensure the `claude` command is available and authenticated
- PDF Conversion: Some PDFs may not extract cleanly - check the `.txt` output
- Database Locks: Close any database connections before running scripts
- Memory Usage: Large CV collections may require sufficient RAM for processing
For questions or issues, refer to the individual script documentation or Claude CLI documentation.