AI-Powered ATS CV Processing Pipeline

A complete pipeline for processing PDF resumes using AI-powered analysis with Claude CLI integration.

Overview

This toolkit consists of three main scripts that work together to:

  1. Convert PDF resumes to text format
  2. Extract contact information and build a searchable database
  3. Perform AI-powered CV analysis using Claude CLI

Prerequisites

  • Python 3.8+
  • UV package manager installed
  • Claude CLI installed and configured (claude command available in PATH)

Installation

# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone the repository
git clone https://github.com/rimads/custom-ai-ats.git
cd custom-ai-ats

# Install dependencies with UV
uv sync

# Install Claude CLI (if not already installed)
# Follow instructions at: https://docs.anthropic.com/claude/docs

Environment Variables

The scripts support the ATS_WORKING_DIR environment variable to specify the working directory:

# Set working directory (optional)
export ATS_WORKING_DIR=/path/to/your/cv/files

# Or use inline
ATS_WORKING_DIR=/path/to/data uv run python pdf_to_text.py

If ATS_WORKING_DIR is not set, the scripts default to the current working directory.
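
For illustration, here is a minimal sketch of how a script might resolve this variable in Python (the helper name get_working_dir is hypothetical, not taken from the repository):

import os
from pathlib import Path

def get_working_dir() -> Path:
    # Hypothetical helper: use ATS_WORKING_DIR if set, else the current directory
    return Path(os.environ.get("ATS_WORKING_DIR", os.getcwd())).resolve()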

Usage

Step 1: Convert PDFs to Text

Place all PDF resumes in your working directory, then run:

# Process all PDFs in working directory
uv run python pdf_to_text.py

# Or process a specific PDF
uv run python pdf_to_text.py resume.pdf

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python pdf_to_text.py

# Alternative: using installed script
uv run pdf-to-text

This will (see the sketch below):

  • Process all .pdf files in the working directory
  • Create corresponding .txt files with extracted text content
  • Skip files that have already been converted
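
The extraction library used by pdf_to_text.py is not named here; a minimal sketch of the same convert-and-skip behavior, assuming pypdf as the dependency, might look like:

from pathlib import Path
from pypdf import PdfReader  # assumed library; the repository's actual dependency may differ

def convert_pdf(pdf_path: Path) -> None:
    txt_path = pdf_path.with_suffix(".txt")
    if txt_path.exists():  # skip files that were already converted
        return
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    txt_path.write_text(text, encoding="utf-8")

for pdf in Path(".").glob("*.pdf"):  # every PDF in the working directory
    convert_pdf(pdf)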

Step 2: Build Contact Database

Extract contact information and build a SQLite database:

# Process all text files and create database
uv run python extract_emails.py

# Search by email domain
uv run python extract_emails.py search_domain gmail.com

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python extract_emails.py

# Alternative: using installed script
uv run extract-emails

This will (see the sketch below):

  • Process all .txt files (converted CVs)
  • Extract emails, phone numbers, LinkedIn profiles, GitHub links, websites
  • Create cv_database.db with all contact information
  • Build full-text search index for efficient querying
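
A rough sketch of this step, assuming simple regular expressions and a reduced version of the schema shown under Database Schema below (the repository's actual patterns are likely more thorough):

import re
import sqlite3
from pathlib import Path

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
LINKEDIN_RE = re.compile(r"linkedin\.com/in/[\w-]+", re.I)
GITHUB_RE = re.compile(r"github\.com/[\w-]+", re.I)

conn = sqlite3.connect("cv_database.db")
conn.execute("""CREATE TABLE IF NOT EXISTS cvs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT UNIQUE NOT NULL,
    content TEXT NOT NULL,
    email TEXT, linkedin TEXT, github TEXT)""")

for txt in Path(".").glob("*.txt"):
    content = txt.read_text(encoding="utf-8", errors="ignore")
    emails = EMAIL_RE.findall(content)
    linkedin = LINKEDIN_RE.search(content)
    github = GITHUB_RE.search(content)
    conn.execute(
        "INSERT OR IGNORE INTO cvs (filename, content, email, linkedin, github) "
        "VALUES (?, ?, ?, ?, ?)",
        (txt.name, content,
         emails[0] if emails else None,
         linkedin.group(0) if linkedin else None,
         github.group(0) if github else None))
conn.commit()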

Step 3: AI-Powered CV Analysis

Analyze CVs using Claude CLI integration with flexible criteria:

# Process all unprocessed CVs with default frontend criteria
uv run python batch_reviewer.py cv_database.db

# Use custom criteria directly
uv run python batch_reviewer.py cv_database.db --criteria "Analyze for backend roles focusing on Python, Django, and API development. Return JSON with: python_skills, api_experience, years_of_experience, company_names."

# Use criteria from file
uv run python batch_reviewer.py cv_database.db --criteria-file criteria_examples/fullstack_developer.txt

# Show processing progress
uv run python batch_reviewer.py cv_database.db --progress

# Custom retry settings
uv run python batch_reviewer.py cv_database.db --max-retries 5

# Using custom working directory
ATS_WORKING_DIR=/path/to/cvs uv run python batch_reviewer.py

# Alternative: using installed script
uv run batch-reviewer --criteria-file my_criteria.txt
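
The exact CLI invocation inside batch_reviewer.py is not documented here; a plausible sketch, assuming the Claude CLI's non-interactive print mode (claude -p) and an illustrative prompt format, is:

import subprocess

def review_cv(cv_text: str, criteria: str) -> str:
    # Combine the analysis criteria and the CV text into one prompt (format assumed)
    prompt = f"{criteria}\n\nCV:\n{cv_text}\n\nRespond with JSON only."
    result = subprocess.run(
        ["claude", "-p", prompt],  # -p: print the response and exit
        capture_output=True, text=True, timeout=60, check=True)
    return result.stdout.strip()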

Analysis Criteria

The AI reviewer uses flexible, customizable criteria and stores results in a JSON features column. You can define any analysis criteria for different roles:

Built-in Criteria Examples:

  1. Frontend Developer (default):
    • knows_reactjs, knows_redux, years_of_experience
    • company_names, clear_impact, primary_skills
    • seniority_level, css_frameworks, testing_experience

  2. Full-stack Developer:
    • frontend_experience, backend_experience, database_skills
    • cloud_experience, devops_skills, architecture_experience
    • team_leadership

  3. Data Scientist:
    • python_experience, ml_frameworks, data_tools
    • education_level, domain_expertise, visualization_tools
    • cloud_ml

Custom Criteria:

Create your own analysis criteria by specifying JSON fields and their evaluation logic:

# Custom criteria for DevOps roles
uv run python batch_reviewer.py cv_database.db --criteria "
Analyze for DevOps engineering roles and return JSON with:
- docker_experience: 1 if mentions Docker/containers, 0 if not, null if unclear
- kubernetes_skills: 1 if mentions Kubernetes/K8s, 0 if not, null if unclear  
- cloud_platforms: Array of cloud platforms (AWS, Azure, GCP), null if none
- years_of_experience: Total professional experience (integer), null if unclear
- automation_tools: Array of automation tools mentioned, null if none
- monitoring_experience: 1 if mentions monitoring tools (Prometheus, Grafana, etc.), 0 if not, null if unclear
"

Example Output Structure:

{
    "knows_reactjs": 1,
    "years_of_experience": 5,
    "company_names": ["Google", "Microsoft", "Startup Inc"],
    "clear_impact": 1,
    "primary_skills": ["JavaScript", "TypeScript", "Node.js"],
    "seniority_level": "senior"
}

Database Schema

The SQLite database includes these key tables:

-- Main CV table
CREATE TABLE cvs (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    filename TEXT UNIQUE NOT NULL,
    content TEXT NOT NULL,
    email TEXT,
    phones TEXT,
    linkedin TEXT, 
    github TEXT,
    website TEXT,
    all_emails TEXT,
    features TEXT,  -- JSON column for AI analysis results
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Full-text search index
CREATE VIRTUAL TABLE cvs_fts USING fts5(...);

-- Query JSON features examples:
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.knows_reactjs') = 1;
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.years_of_experience') > 5;
-- SELECT * FROM cvs WHERE JSON_EXTRACT(features, '$.seniority_level') = 'senior';
-- SELECT JSON_EXTRACT(features, '$.primary_skills') FROM cvs WHERE id = 1;
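
The same JSON queries can also be run from Python with the standard sqlite3 module (using SQLite's built-in JSON1 functions, available in most modern builds); the full-text query below uses the generic fts5 table-name match, since the indexed columns are not listed here:

import json
import sqlite3

conn = sqlite3.connect("cv_database.db")

# Senior candidates who know React
rows = conn.execute(
    "SELECT filename, features FROM cvs "
    "WHERE json_extract(features, '$.knows_reactjs') = 1 "
    "AND json_extract(features, '$.seniority_level') = 'senior'").fetchall()

for filename, features in rows:
    print(filename, json.loads(features).get("primary_skills"))

# Full-text search across all indexed columns of cvs_fts
hits = conn.execute(
    "SELECT rowid FROM cvs_fts WHERE cvs_fts MATCH ?", ("react",)).fetchall()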

Example Workflow

# 1. Set working directory (optional)
export ATS_WORKING_DIR=/path/to/your/cv/data

# 2. Place PDF resumes in working directory
ls $ATS_WORKING_DIR/*.pdf
# john_doe_resume.pdf, jane_smith_cv.pdf, ...

# 3. Convert to text
uv run python pdf_to_text.py
# Creates: john_doe_resume.txt, jane_smith_cv.txt, ...

# 4. Build database
uv run python extract_emails.py  
# Creates: cv_database.db

# 5. AI analysis with custom criteria
uv run python batch_reviewer.py cv_database.db --criteria-file criteria_examples/frontend_developer.txt
# Processes all CVs through Claude CLI

# 6. Check results
uv run python batch_reviewer.py cv_database.db --progress
# Progress: 150/150 CVs processed (100.0%)
# Sample features found: ['knows_reactjs', 'knows_redux', 'years_of_experience', 'company_names', 'clear_impact', 'primary_skills', 'seniority_level']
# CVs with analysis features: 150

Error Handling

The batch reviewer includes robust error handling (a minimal sketch follows the list):

  • Timeout Protection: 60-second timeout for Claude CLI calls
  • SQL Validation: Tests queries before execution using transactions
  • Auto-Fix: Handles common SQL formatting issues (quotes, NULL values)
  • Retry Logic: Up to 3 attempts per CV with enhanced error context
  • Graceful Failures: Continues processing other CVs if one fails
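
A minimal sketch of what the timeout-plus-retry wrapper might look like (function names, flags, and backoff are assumptions, not the repository's actual code):

import subprocess
import time
from typing import Optional

def call_claude_with_retries(prompt: str, max_retries: int = 3) -> Optional[str]:
    for attempt in range(1, max_retries + 1):
        try:
            result = subprocess.run(
                ["claude", "-p", prompt],
                capture_output=True, text=True, timeout=60, check=True)
            return result.stdout.strip()
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError) as exc:
            print(f"Attempt {attempt}/{max_retries} failed: {exc}")
            time.sleep(2 * attempt)  # simple backoff before retrying
    return None  # graceful failure: the caller skips this CV and continues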

Output Files

  • *.txt - Text versions of PDF resumes
  • cv_database.db - SQLite database with all CV data and analysis
  • test.db - Optional test database (created only during testing)

Tips

  • Large Batches: The script processes all CVs in one run - no manual batching needed
  • Resume Processing: The script handles various CV formats and extracts contact information reliably
  • Database Queries: Use standard SQLite commands to query results
  • Testing: Create test databases with small subsets before processing large batches

Troubleshooting

  • Claude CLI Issues: Ensure claude command is available and authenticated
  • PDF Conversion: Some PDFs may not extract cleanly - check .txt output
  • Database Locks: Close any database connections before running scripts
  • Memory Usage: Very large CV collections can consume significant RAM during processing

For questions or issues, refer to the individual script documentation or the Claude CLI documentation.
