Description
Ticket Information
- Assigned Team: Engineering Team
- Dependencies:
- [AI] Azure Environment Setup #1803 (Azure Environment Setup)
- [AI] Content Ingestion Pipeline #1806 (Content Ingestion Pipeline)
Context & Background
Implement an OpenAI embedding generation service using Azure OpenAI's text-embedding-3-small model to convert article chunks into vector embeddings for semantic search. The work covers batch processing, error handling, cost monitoring, and integration with the content pipeline.
Reference Documents:
- Phase 1 Implementation Plan: docs/ai/phase1-implementation.rst
- Azure Environment Setup: [AI] Azure Environment Setup #1803
Requirements & Acceptance Criteria
- Install and configure OpenAI Python client for Azure OpenAI
- Implement embedding generation service using text-embedding-3-small model
- Create batch processing system for efficient embedding generation
- Add comprehensive error handling and retry logic for API failures
- Implement cost monitoring and usage tracking for OpenAI API calls
- Create management commands for embedding generation and batch processing
- Add comprehensive unit tests for embedding service
- Set up monitoring and alerting for embedding generation processes
Implementation Steps
1. OpenAI Client Configuration
- Add OpenAI and Azure Identity packages to project requirements
- openai >= 1.0.0 for Azure OpenAI client
- azure-identity >= 1.12.0 for Azure authentication
- Create knowledge/services/openai_client.py with an AzureOpenAIClient class
- Configure the Azure OpenAI client with the following settings:
- API Key: From Azure OpenAI service (AZURE_OPENAI_API_KEY)
- API Version: Azure OpenAI API version (AZURE_OPENAI_API_VERSION)
- Endpoint: Azure OpenAI service endpoint (AZURE_OPENAI_ENDPOINT)
- Model: text-embedding-3-small for embedding generation
- Retry Logic: Maximum 3 retries with exponential backoff
- Logging: Comprehensive logging for debugging and monitoring
Required methods:
- generate_embedding(text, attempt) - Single text embedding with retry logic
- _prepare_text(text) - Text cleaning and preparation
- _log_usage(usage) - API usage logging for cost tracking
Error handling requirements:
- Rate Limit Errors: Exponential backoff retry mechanism
- API Errors: Proper logging and error propagation
- Text Preparation: Handle empty text, length limits, whitespace cleanup
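The client, retry, and text-preparation requirements above can be sketched as follows. This is a minimal illustration, not the final implementation: the real class would wrap openai.AzureOpenAI with the AZURE_OPENAI_* settings, and here the API call is an injected callable (create_embedding) so the retry and cleanup control flow can be shown without credentials. MAX_INPUT_CHARS and the backoff schedule are assumed values.

```python
import logging
import time

logger = logging.getLogger(__name__)

MAX_RETRIES = 3
MAX_INPUT_CHARS = 8000  # assumed cap; tune to the model's real token budget


class AzureOpenAIClient:
    def __init__(self, create_embedding, max_retries=MAX_RETRIES):
        # create_embedding stands in for the real Azure OpenAI call,
        # e.g. client.embeddings.create(model=..., input=...)
        self._create_embedding = create_embedding
        self.max_retries = max_retries

    def _prepare_text(self, text):
        """Collapse whitespace, reject empty input, enforce a length cap."""
        cleaned = " ".join(text.split())
        if not cleaned:
            raise ValueError("Cannot embed empty text")
        return cleaned[:MAX_INPUT_CHARS]

    def generate_embedding(self, text, attempt=1):
        """Single-text embedding with exponential-backoff retries."""
        prepared = self._prepare_text(text)
        try:
            response = self._create_embedding(prepared)
        except Exception as exc:  # real code would catch openai.RateLimitError etc.
            if attempt >= self.max_retries:
                logger.error("Embedding failed after %d attempts: %s", attempt, exc)
                raise
            time.sleep(2 ** attempt)  # 2s, 4s, ... between attempts
            return self.generate_embedding(text, attempt=attempt + 1)
        self._log_usage(response.get("usage", {}))
        return response["embedding"]

    def _log_usage(self, usage):
        logger.info("OpenAI usage: %s tokens", usage.get("total_tokens", "?"))
```

Injecting the API call also makes the retry path easy to unit-test with a callable that fails a configurable number of times before succeeding.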
2. Batch Embedding Generation
Create batch processing functionality with the following features:
- Batch Processing: Process multiple texts efficiently in configurable batches
- Text Preparation: Clean and validate text inputs before processing
- Rate Limiting: Implement pauses between batches to respect API limits
- Usage Tracking: Monitor token usage and costs for budget management
- Error Handling: Handle individual text failures gracefully
Key methods:
- generate_batch_embeddings(texts, batch_size) - Process multiple texts in batches
- _prepare_text(text) - Clean and validate individual text inputs
- _log_usage(usage) - Track API usage for cost monitoring
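A possible shape for the batching logic, shown as a free function with an injected embed_fn (standing in for a single-batch API call with a list input); names and defaults are illustrative, not the final service API:

```python
import time


def generate_batch_embeddings(texts, embed_fn, batch_size=50, pause_seconds=0.0):
    """Embed texts in fixed-size batches, skipping blanks, pausing between calls."""
    embeddings = []
    # Prepare and validate inputs up front; drop empties rather than
    # failing the whole batch on one bad chunk.
    prepared = [" ".join(t.split()) for t in texts]
    valid = [t for t in prepared if t]
    for start in range(0, len(valid), batch_size):
        batch = valid[start:start + batch_size]
        embeddings.extend(embed_fn(batch))
        # Crude rate limiting: pause between batches, not after the last one.
        if start + batch_size < len(valid) and pause_seconds:
            time.sleep(pause_seconds)
    return embeddings
```

In the real service this would live on the client class and also accumulate per-batch usage for the cost-tracking requirement.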
3. Embedding Service Integration
Create knowledge/services/embedding_generator.py with Django integration:
- Article Processing: Generate embeddings for all chunks of an article
- Batch Operations: Efficient processing of multiple articles
- Database Integration: Update ArticleChunk models with generated embeddings
- Progress Tracking: Monitor and log embedding generation progress
- Error Recovery: Handle partial failures and resume processing
Required methods:
- generate_embeddings_for_article(article) - Process single article
- generate_embeddings_batch(limit) - Process multiple articles
- regenerate_embeddings_for_article(article) - Regenerate existing embeddings
4. Management Commands
Create knowledge/management/commands/generate_embeddings.py with:
- Command Arguments: Flexible processing options
- --article-id: Process specific article by ID
- --batch-size: Configurable batch size (default 50)
- --limit: Maximum chunks to process (default 500)
- --regenerate: Regenerate existing embeddings
- Processing Logging: Integration with ProcessingLog model
- Progress Reporting: Track and report processing progress
- Error Handling: Comprehensive error handling with Sentry integration
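The argument surface above can be exercised outside Django with plain argparse; the actual command would declare the same options in BaseCommand.add_arguments, which this sketch only approximates:

```python
import argparse


def build_parser():
    """Parser mirroring the proposed generate_embeddings command options."""
    parser = argparse.ArgumentParser(prog="generate_embeddings")
    parser.add_argument("--article-id", type=int, help="Process a specific article by ID")
    parser.add_argument("--batch-size", type=int, default=50, help="Texts per API batch")
    parser.add_argument("--limit", type=int, default=500, help="Maximum chunks to process")
    parser.add_argument("--regenerate", action="store_true", help="Regenerate existing embeddings")
    return parser
```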
5. Cost Monitoring and Usage Tracking
Implement cost monitoring with the following features:
- Token Usage Tracking: Monitor tokens consumed per request
- Cost Calculation: Calculate estimated costs based on Azure OpenAI pricing
- Budget Alerts: Alert when approaching budget thresholds
- Usage Analytics: Track usage patterns and optimization opportunities
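A minimal usage/cost tracker covering the first three bullets could look like this. The per-token price is an assumption (roughly the published text-embedding-3-small list price at the time of writing); real code should read it from settings and keep it current against Azure OpenAI pricing:

```python
PRICE_PER_MILLION_TOKENS_USD = 0.02  # assumed rate; verify against current pricing


class UsageTracker:
    def __init__(self, budget_usd):
        self.budget_usd = budget_usd
        self.total_tokens = 0

    def record(self, tokens):
        """Accumulate token counts from each API response's usage field."""
        self.total_tokens += tokens

    @property
    def estimated_cost_usd(self):
        return self.total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD

    def over_threshold(self, fraction=0.8):
        """True once estimated spend passes the given fraction of budget (alert hook)."""
        return self.estimated_cost_usd >= self.budget_usd * fraction
```

The over_threshold hook is where a Sentry or email alert would be wired in for the budget-alert requirement.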
6. Performance Optimization
Implement performance optimizations:
- Batch Size Optimization: Configure optimal batch sizes for API efficiency
- Concurrent Processing: Process multiple batches concurrently where possible
- Memory Management: Efficient memory usage for large text processing
- Caching: Cache embeddings to avoid regeneration
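For the caching bullet, one option is to key the cache on a hash of the chunk text, so unchanged content never pays for a second API call. This sketch uses an in-memory dict; a real deployment might back it with Redis or a database column instead:

```python
import hashlib


class EmbeddingCache:
    def __init__(self, embed_fn):
        self._embed_fn = embed_fn
        self._store = {}
        self.misses = 0  # counts actual API calls, useful for hit-rate metrics

    @staticmethod
    def _key(text):
        # Content hash as key: identical text always maps to the same entry.
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def get_embedding(self, text):
        key = self._key(text)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._embed_fn(text)
        return self._store[key]
```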
Code Changes Required
- Create knowledge/services/openai_client.py with Azure OpenAI client
- Create knowledge/services/embedding_generator.py for Django integration
- Add OpenAI configuration to Django settings
- Create management commands for embedding generation
- Update ArticleChunk model with embedding-related methods
- Add comprehensive unit tests for embedding services
- Create integration tests with mocked OpenAI responses
External Documentation
- Azure OpenAI Service Documentation
- OpenAI Python Client Documentation
- Azure OpenAI Embeddings Guide
- OpenAI API Rate Limits
Deliverables
- Azure OpenAI client with comprehensive error handling
- Batch embedding generation system
- Django service integration layer
- Management commands for embedding operations
- Cost monitoring and usage tracking
- Performance optimization and monitoring
- Comprehensive test suite
- Documentation for embedding generation workflow
Performance Requirements
- Generate embeddings for 1000+ text chunks per hour
- Handle batch processing with 50-100 texts per batch
- Maintain 99% success rate for embedding generation
- Response time < 30 seconds for typical batch operations
- Cost monitoring with budget alerts
Next Steps
- Upon completion, enable [AI] Vector Database Integration with ChromaDB #1807 (Vector Database Integration)
- Enable [AI] Search API Development #1809 (Search API Development)
- Schedule cost optimization review with infrastructure team