Using GPT-OSS to label emails with .LBNL.record

This repository contains a Python package for analyzing Gmail emails using local Large Language Models (LLMs) via Ollama to automatically identify and classify Berkeley Lab records according to official LBNL record management policies.

Overview

The email analyzer automatically processes Gmail emails to determine if they qualify as Berkeley Lab records based on the official definition:

A record is material, in any media, that has been created or received in the course of Laboratory business, and provides evidence of the Lab's decisions or actions related to a research or operational function.

An email qualifies as a lab record if it meets BOTH criteria:

- Lab business: Related to responsibilities at the Lab
- Action/Decision: Documents an action or decision

The system also excludes emails that are explicitly NOT records:

- Calendar responses (invitations, acceptances, meeting announcements, agendas, Zoom invitations)
- Formal and informal announcements (system outages, drills, routine IT maintenance)
- Personal emails (anything unrelated to Lab business)
- Newsletters/Listservs and junk mail
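The decision rule above (both criteria must hold, and an explicit exclusion always wins) can be sketched as a small function. This is only an illustration: in the package the judgements come from the LLM, and the names `EmailJudgement`, `ExclusionCategory`, and `is_lab_record` are hypothetical, not taken from the package.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ExclusionCategory(Enum):
    """Hypothetical labels for the explicit non-record categories."""
    CALENDAR_RESPONSE = "calendar_response"
    ANNOUNCEMENT = "announcement"
    PERSONAL = "personal"
    NEWSLETTER_OR_JUNK = "newsletter_or_junk"


@dataclass
class EmailJudgement:
    """Hypothetical per-email judgements, e.g. as extracted from an LLM reply."""
    is_lab_business: bool
    documents_action_or_decision: bool
    exclusion: Optional[ExclusionCategory] = None


def is_lab_record(judgement: EmailJudgement) -> bool:
    # An explicit exclusion overrides the two positive criteria.
    if judgement.exclusion is not None:
        return False
    # Otherwise BOTH criteria must hold.
    return judgement.is_lab_business and judgement.documents_action_or_decision
```

For example, an email that is Lab business but documents no action or decision would not count as a record under this sketch.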

Features

- Automated Gmail Integration: Connects to Gmail API to retrieve recent emails
- Local LLM Analysis: Uses Ollama to run GPT-OSS locally for privacy and security
- Berkeley Lab Specific: Tailored to LBNL record management requirements
- Headless Authentication: Supports server environments without GUI
- Comprehensive Reporting: Generates detailed analysis reports and JSON outputs
- Configurable: Customizable time ranges, confidence thresholds, and analysis parameters

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Gmail API     │────│  Email Analyzer  │────│   Ollama/GPT    │
│   (OAuth 2.0)   │    │     Package      │    │   (Local LLM)   │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                        │                        │
         ▼                        ▼                        ▼
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│ Recent Emails   │    │ Analysis Results │    │ Record          │
│ (Last N days)   │    │ (JSON/Reports)   │    │ Classification  │
└─────────────────┘    └──────────────────┘    └─────────────────┘
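
The data flow in the diagram can be sketched as a function that takes the Gmail side and the LLM side as injected callables. This is a minimal sketch, not the package's code: `run_pipeline`, `fake_fetch`, and `fake_classify` are hypothetical names, and the stand-in functions replace the real Gmail client and Ollama-backed classifier.

```python
from typing import Callable, Dict, Iterable, List

Email = Dict[str, str]
Verdict = Dict[str, object]


def run_pipeline(
    fetch_recent: Callable[[int], Iterable[Email]],
    classify: Callable[[Email], Verdict],
    days_back: int = 7,
) -> List[Verdict]:
    """Fetch emails from the last `days_back` days and classify each one."""
    results = []
    for email in fetch_recent(days_back):
        verdict = dict(classify(email))
        verdict["email_id"] = email["id"]  # keep a link back to the source email
        results.append(verdict)
    return results


# Stand-ins for the Gmail client and the LLM classifier (assumptions).
def fake_fetch(days_back: int) -> List[Email]:
    return [
        {"id": "a1", "subject": "Approved: purchase of cryostat"},
        {"id": "b2", "subject": "Weekly newsletter"},
    ]


def fake_classify(email: Email) -> Verdict:
    return {"is_lab_record": "Approved" in email["subject"]}


results = run_pipeline(fake_fetch, fake_classify, days_back=7)
```

Injecting the two callables keeps the flow testable without a Gmail account or a running model.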

Using Gmail API

Enabling the Gmail API and Downloading Credentials

1. Create Google Cloud Project
   - Go to Google Cloud Console
   - Create a new project or select an existing one
   - Project name suggestion: "Berkeley Lab Email Analyzer"
2. Enable Gmail API
   - Navigate to "APIs & Services" → "Library"
   - Search for "Gmail API"
   - Click "Enable"
3. Configure OAuth Consent Screen
   - Go to "APIs & Services" → "OAuth consent screen"
   - Choose "External" user type
   - Fill required fields:
     - App name: "Berkeley Lab Email Analyzer"
     - User support email: Your email
     - Developer contact: Your email
   - Add scopes: https://www.googleapis.com/auth/gmail.readonly
   - Add test users (your email address)
4. Create OAuth 2.0 Credentials
   - Go to "APIs & Services" → "Credentials"
   - Click "Create Credentials" → "OAuth client ID"
   - Important: Choose "Desktop app" as the application type (not "Web application")
   - Name: "Email Analyzer Desktop Client"
   - Download the JSON file and save it as credentials.json
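
A quick way to confirm that the downloaded file is a Desktop client rather than a Web client is to look at its top-level key: Google's client secrets files use "installed" for Desktop clients and "web" for Web clients. The helper names below are hypothetical and this check is not part of the package; it is a sketch of one way to catch the wrong client type early.

```python
import json


def credentials_client_type(raw_json: str) -> str:
    """Return the top-level client type key of a client secrets file."""
    data = json.loads(raw_json)
    for key in ("installed", "web"):
        if key in data:
            return key
    raise ValueError("not a recognizable OAuth client secrets file")


def check_desktop_credentials(path: str = "credentials.json") -> None:
    """Raise ValueError unless the file at `path` describes a Desktop client."""
    with open(path, encoding="utf-8") as fh:
        kind = credentials_client_type(fh.read())
    if kind != "installed":
        raise ValueError(f"expected a Desktop client, found a '{kind}' client")
```

Calling `check_desktop_credentials()` before the first authentication would fail fast if a Web application client was created by mistake.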

Headless Authentication

For server environments without GUI access, the package supports console-based authentication:

- Automatic Detection: The system automatically detects headless environments (SSH sessions, no DISPLAY variable)
- Manual Console Flow:
  - System generates an authorization URL
  - Copy URL and open in any browser (phone, laptop, etc.)
  - Complete Google authentication
  - Copy the authorization code back to terminal
- Force Console Mode: Set environment variable FORCE_CONSOLE_AUTH=true
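
The detection described above might look roughly like the following. This is a sketch under assumptions, not the package's logic: the function name `use_console_auth` is hypothetical, the specific SSH variables checked are a guess, and treating either an SSH session or a missing DISPLAY as headless is one possible reading of the rule.

```python
import os
from typing import Mapping, Optional


def use_console_auth(env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to fall back to the console-based OAuth flow."""
    env = os.environ if env is None else env

    # An explicit override always wins.
    if env.get("FORCE_CONSOLE_AUTH", "").strip().lower() in ("1", "true", "yes"):
        return True

    # Assumption: any of these variables indicates an SSH session.
    in_ssh = any(env.get(name) for name in ("SSH_CONNECTION", "SSH_CLIENT", "SSH_TTY"))
    # Assumption: no DISPLAY means no local browser can be opened.
    no_display = not env.get("DISPLAY")
    return in_ssh or no_display
```

Passing the environment as a parameter makes the rule easy to check without changing the real process environment.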

Example Authentication Flow:

# System detects headless environment and shows:
MANUAL AUTHENTICATION REQUIRED
============================================================

1. Copy and paste this URL into a web browser:
------------------------------------------------------------
https://accounts.google.com/o/oauth2/auth?response_type=code&client_id=...
------------------------------------------------------------

2. Complete the authentication in your browser
3. After authorization, Google will display an authorization code
4. Copy that authorization code and paste it below

Enter the authorization code: [paste code here]

Install the Package

Prerequisites

- Python 3.8 or higher
- Internet connection for Gmail API
- Ollama installed and running locally

Installation Steps

Clone the Repository

git clone <repository-url>
cd email_analyzer_package

Create Virtual Environment (Recommended)

python -m venv email_analyzer_env

# Activate virtual environment
# On Linux/Mac:
source email_analyzer_env/bin/activate
# On Windows:
email_analyzer_env\Scripts\activate

Install Dependencies

# Install in development mode
pip install -e .

# Or install dependencies directly
pip install -r requirements.txt

Set Up Configuration

# Copy your Gmail credentials
cp /path/to/downloaded/credentials.json ./credentials.json

# Create environment configuration
cat > .env << EOF
GMAIL_CREDENTIALS_PATH=credentials.json
GMAIL_TOKEN_PATH=token.json
OLLAMA_BASE_URL=http://localhost:11434
OLLAMA_MODEL=gpt-oss:120b
MAX_EMAILS_PER_BATCH=10
DAYS_BACK=7
FORCE_CONSOLE_AUTH=true
EOF
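
To illustrate how these settings could be read, here is a standard-library sketch that parses KEY=VALUE lines into a typed settings object. It is not the package's config.py: `parse_env` and `Settings` are hypothetical names, and the defaults simply mirror the values in the .env example above.

```python
from dataclasses import dataclass
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comment lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop a trailing inline comment such as "50   # note".
        value = value.split(" #", 1)[0].strip()
        values[key.strip()] = value  # a later duplicate key overrides an earlier one
    return values


@dataclass
class Settings:
    ollama_base_url: str = "http://localhost:11434"
    ollama_model: str = "gpt-oss:120b"
    max_emails_per_batch: int = 10
    days_back: int = 7
    force_console_auth: bool = False

    @classmethod
    def from_env(cls, env: Dict[str, str]) -> "Settings":
        defaults = cls()
        return cls(
            ollama_base_url=env.get("OLLAMA_BASE_URL", defaults.ollama_base_url),
            ollama_model=env.get("OLLAMA_MODEL", defaults.ollama_model),
            max_emails_per_batch=int(env.get("MAX_EMAILS_PER_BATCH", defaults.max_emails_per_batch)),
            days_back=int(env.get("DAYS_BACK", defaults.days_back)),
            force_console_auth=env.get("FORCE_CONSOLE_AUTH", "false").lower() == "true",
        )


sample = """
OLLAMA_MODEL=gpt-oss:120b
MAX_EMAILS_PER_BATCH=50     # Process more emails at once
DAYS_BACK=30
FORCE_CONSOLE_AUTH=true
"""
settings = Settings.from_env(parse_env(sample))
```

Converting values to int and bool at load time surfaces a malformed setting immediately rather than partway through a run.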

Package Structure

email_analyzer_package/
├── credentials.json          # Gmail API credentials (you provide)
├── token.json               # Auto-generated after first auth
├── .env                     # Configuration file
├── requirements.txt         # Python dependencies
├── setup.py                # Package installation
├── README.md               # This file
├── example_usage.py        # Example script
├── authenticate_manual.py  # Standalone auth script
└── email_analyzer/         # Main package
    ├── __init__.py
    ├── config.py           # Configuration management
    ├── gmail_client.py     # Gmail API interface
    ├── llm_analyzer.py     # LLM analysis logic
    ├── email_processor.py  # Main processing logic
    └── cli.py             # Command-line interface

Pull and Serve GPT-OSS Locally with Ollama

Reference: the OpenAI Cookbook guide on running gpt-oss locally with Ollama

Install and Run Ollama

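# Download and install Ollama
curl -L https://ollama.com/download/ollama-linux-amd64.tgz -o ollama-linux-amd64.tgz
# Create the target directory first; tar -C fails if it does not exist
mkdir -p $CFS/nstaff/$USER/ollama
tar -C $CFS/nstaff/$USER/ollama -xzf ollama-linux-amd64.tgz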

# Set up environment variables
export PATH=$CFS/nstaff/$USER/ollama/bin:$PATH
export LD_LIBRARY_PATH=$CFS/nstaff/$USER/ollama/lib:$LD_LIBRARY_PATH
export OLLAMA_MODELS=$SCRATCH/ollama-models

# Start Ollama server
ollama serve

Note: Keep this terminal session running. Ollama must be serving for the email analyzer to work.

Pull and Run the Model

On the same node where ollama serve is running, open a second terminal and pull the model:

# Set up environment (same as above)
export PATH=$CFS/nstaff/$USER/ollama/bin:$PATH
export LD_LIBRARY_PATH=$CFS/nstaff/$USER/ollama/lib:$LD_LIBRARY_PATH

# Pull the GPT-OSS model (this may take time - model is large)
ollama pull gpt-oss:120b

# Verify model is available
ollama list

# Test the model (optional)
ollama run gpt-oss:120b
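
Beyond the interactive `ollama run`, the served model can be called over Ollama's HTTP API. The sketch below uses the `/api/generate` endpoint with streaming disabled and JSON output requested. The function names are hypothetical, and the shape of the model's answer (keys such as `is_lab_record` and `confidence`) is an assumption about what a classification prompt might ask for, not the package's actual schema.

```python
import json
import urllib.request


def build_generate_payload(model: str, prompt: str) -> dict:
    """Request body for a single, non-streaming, JSON-formatted completion."""
    return {"model": model, "prompt": prompt, "stream": False, "format": "json"}


def parse_generate_response(body: str) -> dict:
    """Extract the model's JSON answer from a non-streaming /api/generate reply."""
    outer = json.loads(body)
    # The generated text is returned as a string in the "response" field.
    return json.loads(outer["response"])


def classify_with_ollama(
    prompt: str,
    model: str = "gpt-oss:120b",
    base_url: str = "http://localhost:11434",
) -> dict:
    """Send the prompt to a running Ollama server (requires `ollama serve`)."""
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(build_generate_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as resp:
        return parse_generate_response(resp.read().decode("utf-8"))
```

Keeping payload construction and response parsing separate from the network call lets both be checked without a running server.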

Alternative Models

If gpt-oss:120b is too large or unavailable, you can use alternative models:

# Smaller alternatives
ollama pull llama2:13b      # Good balance of size/performance
ollama pull mistral:7b      # Faster, smaller model
ollama pull codellama:13b   # Good for structured output

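# Update the OLLAMA_MODEL line in your .env file accordingly
# (appending with >> would leave two conflicting OLLAMA_MODEL entries)
sed -i 's/^OLLAMA_MODEL=.*/OLLAMA_MODEL=llama2:13b/' .env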

Set Up the Package and Run

Quick Start

Verify Prerequisites

# Check Python version
python --version  # Should be 3.8+

# Check Ollama is running
curl http://localhost:11434/api/tags
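
The same check can be done from Python, which also makes it easy to confirm that the configured model has actually been pulled. The `/api/tags` endpoint returns a JSON object with a `models` list; the helper names below are hypothetical and this is a sketch rather than package code.

```python
import json
import urllib.request
from typing import List


def model_names(tags_body: str) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    return [model["name"] for model in json.loads(tags_body).get("models", [])]


def fetch_model_names(base_url: str = "http://localhost:11434") -> List[str]:
    """Query a running Ollama server for its locally available models."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=10) as resp:
        return model_names(resp.read().decode("utf-8"))


def is_model_available(name: str, available: List[str]) -> bool:
    return name in available
```

For instance, `is_model_available("gpt-oss:120b", fetch_model_names())` would report whether the default model is ready before any emails are fetched.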

First-Time Authentication

# Run authentication (will open browser or show console instructions)
python authenticate_manual.py

Run Email Analysis

# Basic usage - analyze last 7 days
python example_usage.py

# Custom time range
python -c "
from email_analyzer import EmailProcessor
processor = EmailProcessor()
results = processor.process_recent_emails(days_back=30)
print(f'Analyzed {len(results)} emails')
"

Command Line Interface

The package includes a CLI for advanced usage:

# Install with CLI support
pip install -e .

# Basic analysis
email-analyzer --days-back 7 --confidence 0.5

# Verbose output with custom settings
email-analyzer --days-back 30 --confidence 0.7 --verbose --format both

# Help
email-analyzer --help
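
For readers curious how such flags are typically wired up, here is an argparse sketch that accepts the options shown above. It is not the package's cli.py: the default values and the `json` and `text` choices for `--format` are assumptions (only `both` appears in the examples).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="email-analyzer",
        description="Classify recent Gmail emails as lab records.",
    )
    parser.add_argument("--days-back", type=int, default=7,
                        help="how many days of email to analyze")
    parser.add_argument("--confidence", type=float, default=0.5,
                        help="minimum confidence for reporting a lab record")
    parser.add_argument("--verbose", action="store_true",
                        help="print per-email details")
    # Assumed choices; only "both" is shown in the usage examples.
    parser.add_argument("--format", choices=["json", "text", "both"], default="both",
                        help="output format")
    return parser


args = build_parser().parse_args(
    ["--days-back", "30", "--confidence", "0.7", "--verbose", "--format", "both"]
)
```

argparse converts `--days-back` to the attribute `args.days_back`, so the parsed values can be passed straight to the processing code.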

Example Usage Script

#!/usr/bin/env python3
"""
Example: Analyze recent emails for lab records
"""

from email_analyzer import EmailProcessor
import json

def main():
    # Initialize processor
    processor = EmailProcessor()
    
    # Analyze recent emails
    results = processor.process_recent_emails(days_back=14)
    
    # Filter high-confidence lab records
    lab_records = processor.filter_lab_records(results, min_confidence=0.7)
    
    # Generate report
    report = processor.generate_report(results)
    print(report)
    
    # Save results
    processor.save_results(results, 'analysis_results.json')
    
    # Summary statistics
    stats = processor.get_summary_stats(results)
    print(f"\nSummary: {stats['lab_records']}/{stats['total_emails']} emails are lab records")

if __name__ == "__main__":
    main()
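
To make the filtering and summary steps in the script concrete, the sketch below shows what they might compute on plain dictionaries. The result keys `is_lab_record` and `confidence` are assumptions about the result schema; the stats keys `lab_records` and `total_emails` match the ones used in the script above. These standalone functions are illustrative and are not the package's methods.

```python
from typing import Dict, List

Result = Dict[str, object]


def filter_lab_records(results: List[Result], min_confidence: float = 0.7) -> List[Result]:
    """Keep results flagged as lab records at or above the confidence threshold."""
    return [
        r for r in results
        if r.get("is_lab_record") and float(r.get("confidence", 0.0)) >= min_confidence
    ]


def summary_stats(results: List[Result]) -> Dict[str, int]:
    """Count all analyzed emails and those flagged as lab records."""
    return {
        "total_emails": len(results),
        "lab_records": sum(1 for r in results if r.get("is_lab_record")),
    }


sample_results = [
    {"subject": "Approved budget change", "is_lab_record": True, "confidence": 0.92},
    {"subject": "Decision on beamtime?", "is_lab_record": True, "confidence": 0.55},
    {"subject": "Cafeteria menu", "is_lab_record": False, "confidence": 0.98},
]
high_confidence = filter_lab_records(sample_results, min_confidence=0.7)
stats = summary_stats(sample_results)
```

Note that in this sketch the summary counts every flagged record, while the filter additionally applies the confidence threshold.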

Configuration Options

Customize analysis via .env file:

# Email processing
MAX_EMAILS_PER_BATCH=50     # Process more emails at once
DAYS_BACK=30                # Look further back

# LLM settings  
OLLAMA_MODEL=gpt-oss:120b   # Use specific model
OLLAMA_BASE_URL=http://localhost:11434

# Authentication
FORCE_CONSOLE_AUTH=true     # Always use console auth
GMAIL_CREDENTIALS_PATH=./credentials.json

Output Files

The analyzer generates several output files:

- email_analysis_results.json: Complete analysis with all details
- lab_records_only.json: Only emails classified as lab records
- email_analysis_YYYYMMDD_HHMMSS_report.txt: Human-readable summary report
- token.json: Gmail authentication token (auto-generated)
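
The timestamped report name follows the pattern `email_analysis_YYYYMMDD_HHMMSS_report.txt`. A filename of that shape can be produced with a datetime format string, as in this sketch (the function name `report_filename` is hypothetical, and whether the package uses local time or UTC is not stated in this README):

```python
from datetime import datetime


def report_filename(now: datetime) -> str:
    """Build a report filename of the form email_analysis_YYYYMMDD_HHMMSS_report.txt."""
    return f"email_analysis_{now:%Y%m%d_%H%M%S}_report.txt"


example = report_filename(datetime(2025, 1, 2, 3, 4, 5))
```

Because the timestamp fields are zero-padded and ordered from year to second, sorting these filenames alphabetically also sorts them chronologically.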

Troubleshooting

Common Issues and Solutions:

"Ollama connection failed"

# Check if Ollama is running (the [o] keeps grep from matching itself)
ps aux | grep '[o]llama'
# Restart if needed
ollama serve

"Gmail authentication failed"

# Delete token and re-authenticate
rm token.json
python authenticate_manual.py

"No emails found"
- Check date range (DAYS_BACK setting)
- Verify Gmail account has emails in the specified period
- Check Gmail API quotas in Google Cloud Console
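
When checking the date range, it can help to see which search window a given DAYS_BACK value corresponds to. Gmail's search syntax supports an `after:YYYY/MM/DD` operator, and the API's message list call accepts such a string as its `q` parameter. Whether the package builds its query this way is an assumption; `gmail_query` is a hypothetical helper.

```python
from datetime import date, timedelta
from typing import Optional


def gmail_query(days_back: int, today: Optional[date] = None) -> str:
    """Return a Gmail search string covering the last `days_back` days."""
    if days_back < 1:
        raise ValueError("days_back must be at least 1")
    today = today or date.today()
    start = today - timedelta(days=days_back)
    return f"after:{start:%Y/%m/%d}"


query = gmail_query(30, today=date(2025, 1, 31))
```

Pasting the resulting string into the Gmail search box is a quick way to confirm that the account really has mail in that window.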

"Model not found"

# List available models
ollama list
# Pull required model
ollama pull gpt-oss:120b

"Permission denied"

# Check file permissions
chmod 600 credentials.json token.json
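
The same permission change can be applied from Python, for example right after a token file is written. This is a sketch for POSIX systems only; `restrict_to_owner` is a hypothetical helper, not part of the package.

```python
import os
import stat
import tempfile


def restrict_to_owner(path: str) -> int:
    """Set owner read/write only (0o600) and return the resulting mode bits."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)


# Demonstrate on a throwaway file rather than real credentials.
with tempfile.NamedTemporaryFile() as tmp:
    mode = restrict_to_owner(tmp.name)
```

On a POSIX system, `oct(mode)` is then `0o600`, matching the `chmod 600` command above.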

Performance Notes

- Model Size: gpt-oss:120b requires significant RAM (~80GB). Use smaller models for resource-constrained environments.
- Processing Speed: Analysis time depends on model size and number of emails. Expect 2-5 seconds per email.
- Gmail API Limits: Google imposes rate limits. The package includes automatic throttling.
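
One common way to cope with rate limits is to retry a failed call with exponentially growing delays. The sketch below illustrates that general technique; it is not the package's throttling implementation, and `call_with_backoff` and its parameters are hypothetical.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call `fn`, retrying on failure with delays of base_delay * 2**attempt."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")


# Demonstration with a function that fails twice, then succeeds.
delays = []
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


result = call_with_backoff(flaky, sleep=delays.append)
```

Passing `delays.append` as the sleep function records the waits instead of actually pausing, which keeps the demonstration instant.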

Security Considerations

- Credentials: Keep credentials.json and token.json secure and private
- Local Processing: All email content is processed locally via Ollama; no data is sent to external LLM services
- Read-Only Access: The package only requests read-only Gmail permissions
- Token Management: Tokens are refreshed automatically; manual intervention is rarely needed

For additional support or questions, refer to the troubleshooting section or check the project's issue tracker.


dingp/email_analyzer
