
[AI] OpenAI Embedding Generation System #1808

@ad-m-ss

Description


Ticket Information

Context & Background

Implement an OpenAI embedding generation service using Azure OpenAI's text-embedding-3-small model to convert article chunks into vector embeddings for semantic search. This includes batch processing, error handling, cost monitoring, and integration with the content pipeline.

Reference Documents:

Requirements & Acceptance Criteria

  • Install and configure the OpenAI Python client for Azure OpenAI
  • Implement embedding generation service using text-embedding-3-small model
  • Create batch processing system for efficient embedding generation
  • Add comprehensive error handling and retry logic for API failures
  • Implement cost monitoring and usage tracking for OpenAI API calls
  • Create management commands for embedding generation and batch processing
  • Add comprehensive unit tests for embedding service
  • Set up monitoring and alerting for embedding generation processes

Implementation Steps

1. OpenAI Client Configuration

  • Add OpenAI and Azure Identity packages to project requirements
    • openai >= 1.0.0 for Azure OpenAI client
    • azure-identity >= 1.12.0 for Azure authentication
  • Create knowledge/services/openai_client.py with AzureOpenAIClient class
  • Configure Azure OpenAI client with the following settings:
    • API Key: From Azure OpenAI service (AZURE_OPENAI_API_KEY)
    • API Version: Azure OpenAI API version (AZURE_OPENAI_API_VERSION)
    • Endpoint: Azure OpenAI service endpoint (AZURE_OPENAI_ENDPOINT)
    • Model: text-embedding-3-small for embedding generation
    • Retry Logic: Maximum 3 retries with exponential backoff
    • Logging: Comprehensive logging for debugging and monitoring

Required methods:

  • generate_embedding(text, attempt) - Single text embedding with retry logic
  • _prepare_text(text) - Text cleaning and preparation
  • _log_usage(usage) - API usage logging for cost tracking

Error handling requirements (a client sketch follows this list):

  • Rate Limit Errors: Exponential backoff retry mechanism
  • API Errors: Proper logging and error propagation
  • Text Preparation: Handle empty text, length limits, whitespace cleanup
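
A minimal sketch of this client, assuming the settings names listed above are exposed through Django settings and that the Azure deployment name matches the model name (in Azure OpenAI the `model` argument refers to the deployment):

```python
import logging
import time

from django.conf import settings
from openai import APIError, AzureOpenAI, RateLimitError

logger = logging.getLogger(__name__)

MAX_RETRIES = 3
MAX_INPUT_CHARS = 8000  # assumed character-level safety cut; the real limit is token-based


class AzureOpenAIClient:
    def __init__(self):
        self.client = AzureOpenAI(
            api_key=settings.AZURE_OPENAI_API_KEY,
            api_version=settings.AZURE_OPENAI_API_VERSION,
            azure_endpoint=settings.AZURE_OPENAI_ENDPOINT,
        )
        self.model = "text-embedding-3-small"  # Azure deployment name assumed to match

    def generate_embedding(self, text, attempt=1):
        """Return the embedding vector for a single text, retrying on rate limits."""
        prepared = self._prepare_text(text)
        try:
            response = self.client.embeddings.create(model=self.model, input=prepared)
        except RateLimitError:
            if attempt >= MAX_RETRIES:
                raise
            time.sleep(2 ** attempt)  # exponential backoff: 2s, 4s, 8s
            return self.generate_embedding(text, attempt + 1)
        except APIError:
            logger.exception("Azure OpenAI embedding request failed")
            raise
        self._log_usage(response.usage)
        return response.data[0].embedding

    def _prepare_text(self, text):
        """Collapse whitespace, reject empty input, truncate overly long text."""
        cleaned = " ".join((text or "").split())
        if not cleaned:
            raise ValueError("Cannot generate an embedding for empty text")
        return cleaned[:MAX_INPUT_CHARS]

    def _log_usage(self, usage):
        """Log token counts so cost tracking can aggregate them later."""
        logger.info(
            "embedding usage: prompt_tokens=%s total_tokens=%s",
            usage.prompt_tokens, usage.total_tokens,
        )
```

The recursive call keeps the `attempt` parameter from the required method signature; an iterative retry loop would work just as well.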

2. Batch Embedding Generation

Create batch processing functionality with the following features:

  • Batch Processing: Process multiple texts efficiently in configurable batches
  • Text Preparation: Clean and validate text inputs before processing
  • Rate Limiting: Implement pauses between batches to respect API limits
  • Usage Tracking: Monitor token usage and costs for budget management
  • Error Handling: Handle individual text failures gracefully

Key methods (sketched after this list):

  • generate_batch_embeddings(texts, batch_size) - Process multiple texts in batches
  • _prepare_text(text) - Clean and validate individual text inputs
  • _log_usage(usage) - Track API usage for cost monitoring
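
A sketch of the batch method, written as a continuation of the AzureOpenAIClient class from step 1 (it reuses that sketch's imports, logger, and the _prepare_text/_log_usage helpers); the 0.5 second inter-batch pause and the default batch size of 50 are assumed values, not documented requirements:

```python
    def generate_batch_embeddings(self, texts, batch_size=50):
        """Embed texts in input order; entries that fail validation or API calls map to None."""
        results = [None] * len(texts)
        for start in range(0, len(texts), batch_size):
            indices, prepared = [], []
            for offset, text in enumerate(texts[start:start + batch_size]):
                try:
                    prepared.append(self._prepare_text(text))
                    indices.append(start + offset)
                except ValueError:
                    logger.warning("Skipping empty text at index %s", start + offset)
            if not prepared:
                continue
            try:
                response = self.client.embeddings.create(model=self.model, input=prepared)
            except (RateLimitError, APIError):
                logger.exception("Batch starting at index %s failed", start)
                continue
            self._log_usage(response.usage)
            # response.data preserves input order, so zip maps embeddings back to their indices
            for index, item in zip(indices, response.data):
                results[index] = item.embedding
            time.sleep(0.5)  # brief pause between batches to stay under API rate limits
        return results
```

Returning None for failed entries keeps the output aligned with the input list, so callers can retry only the failed texts.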

3. Embedding Service Integration

Create knowledge/services/embedding_generator.py with Django integration:

  • Article Processing: Generate embeddings for all chunks of an article
  • Batch Operations: Efficient processing of multiple articles
  • Database Integration: Update ArticleChunk models with generated embeddings
  • Progress Tracking: Monitor and log embedding generation progress
  • Error Recovery: Handle partial failures and resume processing

Required methods (see the sketch after this list):

  • generate_embeddings_for_article(article) - Process single article
  • generate_embeddings_batch(limit) - Process multiple articles
  • regenerate_embeddings_for_article(article) - Regenerate existing embeddings
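
A minimal sketch of the Django-side service. The ArticleChunk fields (`content`, a nullable `embedding` vector field) and the `chunks` related name on Article are assumptions, not confirmed model definitions:

```python
from knowledge.models import ArticleChunk  # assumed model location
from knowledge.services.openai_client import AzureOpenAIClient


class EmbeddingGenerator:
    def __init__(self):
        self.client = AzureOpenAIClient()

    def _embed_chunks(self, chunks, batch_size=50):
        """Embed the given chunks and persist the vectors; returns the number saved."""
        embeddings = self.client.generate_batch_embeddings(
            [chunk.content for chunk in chunks], batch_size=batch_size
        )
        updated = 0
        for chunk, embedding in zip(chunks, embeddings):
            if embedding is None:
                continue  # failed items are left for a later retry pass
            chunk.embedding = embedding
            chunk.save(update_fields=["embedding"])
            updated += 1
        return updated

    def generate_embeddings_for_article(self, article):
        """Embed every chunk of one article that does not yet have an embedding."""
        return self._embed_chunks(list(article.chunks.filter(embedding__isnull=True)))

    def generate_embeddings_batch(self, limit=500, batch_size=50):
        """Embed up to `limit` pending chunks across all articles."""
        pending = list(ArticleChunk.objects.filter(embedding__isnull=True)[:limit])
        return self._embed_chunks(pending, batch_size=batch_size)

    def regenerate_embeddings_for_article(self, article):
        """Clear an article's existing embeddings and embed all of its chunks again."""
        article.chunks.update(embedding=None)
        return self.generate_embeddings_for_article(article)
```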

4. Management Commands

Create knowledge/management/commands/generate_embeddings.py (sketched after this list) with:

  • Command Arguments: Flexible processing options
    • --article-id: Process specific article by ID
    • --batch-size: Configurable batch size (default 50)
    • --limit: Maximum chunks to process (default 500)
    • --regenerate: Regenerate existing embeddings
  • Processing Logging: Integration with ProcessingLog model
  • Progress Reporting: Track and report processing progress
  • Error Handling: Comprehensive error handling with Sentry integration
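
A sketch of the command that wires the arguments above to the service from step 3; ProcessingLog and Sentry integration are omitted here, and the Article import path is an assumption:

```python
from django.core.management.base import BaseCommand, CommandError

from knowledge.models import Article  # assumed model location
from knowledge.services.embedding_generator import EmbeddingGenerator


class Command(BaseCommand):
    help = "Generate vector embeddings for article chunks"

    def add_arguments(self, parser):
        parser.add_argument("--article-id", type=int, help="Process a single article by ID")
        parser.add_argument("--batch-size", type=int, default=50, help="Texts per API batch")
        parser.add_argument("--limit", type=int, default=500, help="Maximum chunks to process")
        parser.add_argument("--regenerate", action="store_true", help="Regenerate existing embeddings")

    def handle(self, *args, **options):
        generator = EmbeddingGenerator()
        if options["article_id"]:
            try:
                article = Article.objects.get(pk=options["article_id"])
            except Article.DoesNotExist:
                raise CommandError(f"Article {options['article_id']} not found")
            if options["regenerate"]:
                updated = generator.regenerate_embeddings_for_article(article)
            else:
                updated = generator.generate_embeddings_for_article(article)
        else:
            updated = generator.generate_embeddings_batch(
                limit=options["limit"], batch_size=options["batch_size"]
            )
        self.stdout.write(self.style.SUCCESS(f"Embedded {updated} chunks"))
```

Example invocation: `python manage.py generate_embeddings --article-id 42 --regenerate`.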

5. Cost Monitoring and Usage Tracking

Implement cost monitoring with the following features (a cost-estimation sketch follows the list):

  • Token Usage Tracking: Monitor tokens consumed per request
  • Cost Calculation: Calculate estimated costs based on Azure OpenAI pricing
  • Budget Alerts: Alert when approaching budget thresholds
  • Usage Analytics: Track usage patterns and optimization opportunities
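
A minimal cost-estimation sketch. The unit price below is a placeholder (OpenAI's list price for text-embedding-3-small is about $0.02 per million tokens); the actual Azure OpenAI rate and the monthly budget should come from configuration:

```python
# Placeholder rate; keep the real Azure OpenAI price in Django settings, not in code.
PRICE_PER_MILLION_TOKENS_USD = 0.02  # assumed value for text-embedding-3-small


def estimate_cost_usd(total_tokens):
    """Estimate spend for a number of embedding tokens at the configured unit price."""
    return (total_tokens / 1_000_000) * PRICE_PER_MILLION_TOKENS_USD


def should_alert(total_tokens, monthly_budget_usd, alert_ratio=0.8):
    """Return True once estimated spend crosses the alert threshold (default 80% of budget)."""
    return estimate_cost_usd(total_tokens) >= monthly_budget_usd * alert_ratio
```

At that assumed rate, embedding one million tokens costs about $0.02, so a $10 monthly budget covers roughly 500 million tokens.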

6. Performance Optimization

Implement performance optimizations (a caching sketch follows the list):

  • Batch Size Optimization: Configure optimal batch sizes for API efficiency
  • Concurrent Processing: Process multiple batches concurrently where possible
  • Memory Management: Efficient memory usage for large text processing
  • Caching: Cache embeddings to avoid regeneration
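
As one concrete option for the caching item, a sketch that keys the cache on a content hash so unchanged text is never re-embedded. Django's cache framework is an assumption here; storing a content hash on ArticleChunk would work equally well:

```python
import hashlib

from django.core.cache import cache


def get_or_generate_embedding(client, text, ttl_seconds=7 * 24 * 3600):
    """Return a cached embedding when this exact text has been embedded before."""
    key = "embedding:" + hashlib.sha256(text.encode("utf-8")).hexdigest()
    embedding = cache.get(key)
    if embedding is None:
        embedding = client.generate_embedding(text)
        cache.set(key, embedding, timeout=ttl_seconds)
    return embedding
```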

Code Changes Required

  • Create knowledge/services/openai_client.py with Azure OpenAI client
  • Create knowledge/services/embedding_generator.py for Django integration
  • Add OpenAI configuration to Django settings
  • Create management commands for embedding generation
  • Update ArticleChunk model with embedding-related methods
  • Add comprehensive unit tests for embedding services
  • Create integration tests with mocked OpenAI responses (see the test sketch below)
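
A sketch of one such test, mocking the Azure OpenAI constructor so no network call or API spend occurs; the settings values are dummies injected with override_settings, and the module path matches the client sketch in step 1:

```python
from unittest import mock

from django.test import SimpleTestCase, override_settings

from knowledge.services.openai_client import AzureOpenAIClient


@override_settings(
    AZURE_OPENAI_API_KEY="test-key",
    AZURE_OPENAI_API_VERSION="2024-02-01",
    AZURE_OPENAI_ENDPOINT="https://example.openai.azure.com/",
)
class AzureOpenAIClientTests(SimpleTestCase):
    @mock.patch("knowledge.services.openai_client.AzureOpenAI")
    def test_generate_embedding_returns_vector(self, mock_azure):
        fake_response = mock.Mock()
        fake_response.data = [mock.Mock(embedding=[0.1, 0.2, 0.3])]
        fake_response.usage = mock.Mock(prompt_tokens=5, total_tokens=5)
        mock_azure.return_value.embeddings.create.return_value = fake_response

        client = AzureOpenAIClient()
        embedding = client.generate_embedding("hello world")

        self.assertEqual(embedding, [0.1, 0.2, 0.3])
        mock_azure.return_value.embeddings.create.assert_called_once_with(
            model="text-embedding-3-small", input="hello world"
        )
```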

External Documentation

Deliverables

  1. Azure OpenAI client with comprehensive error handling
  2. Batch embedding generation system
  3. Django service integration layer
  4. Management commands for embedding operations
  5. Cost monitoring and usage tracking
  6. Performance optimization and monitoring
  7. Comprehensive test suite
  8. Documentation for embedding generation workflow

Performance Requirements

  • Generate embeddings for 1000+ text chunks per hour
  • Handle batch processing with 50-100 texts per batch
  • Maintain 99% success rate for embedding generation
  • Response time < 30 seconds for typical batch operations
  • Cost monitoring with budget alerts

Next Steps
