Migrate Django clearsessions to Celery Task #1833

@ad-m-ss

Overview

Migrate the Django built-in clearsessions management command to a Celery task for consistent background job management. This task removes expired sessions from the database to prevent session table bloat.

Current Implementation Analysis

Command Location

  • Command: Django built-in clearsessions management command
  • Current Schedule: Daily ("@daily")
  • Execution: Via cron with file locking (.contrib/docker/cron/run_locked.sh)

Task Complexity

  • Purpose: Remove expired Django sessions from the database
  • Operations: Database cleanup, session expiry evaluation
  • Dependencies: Django session framework, database backend
  • Runtime: Low-Medium (depends on session table size)
  • Failure Points: Database locks, large session tables, concurrent access

Implementation Tasks

1. Create Celery Task Wrapper

  • Create or update poradnia/core/tasks.py (or appropriate utility app)
  • Create Celery task that wraps Django's clearsessions command
  • Maintain existing cleanup functionality
  • Add Celery-specific error handling and monitoring

2. Error Handling & Database Safety

  • Implement retry logic for database lock conflicts (max 3 retries)
  • Add specific exception handling for:
    • Database connection errors
    • Table lock timeouts
    • Large transaction issues
    • Concurrent access conflicts
  • Safe handling of large session deletion batches
  • Transaction management for cleanup operations
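The retry behaviour listed above (at most 3 retries on lock conflicts) can be sketched in plain Python. This is only an illustration of the semantics, not the project's implementation: `run_with_retries` is a hypothetical helper, and `sqlite3.OperationalError` stands in for whatever lock error the real database backend raises. In a Celery task the same effect would more likely come from the task's `autoretry_for` and `retry_kwargs` options.

```python
import sqlite3
import time


def run_with_retries(operation, retryable=(sqlite3.OperationalError,),
                     max_retries=3, base_delay=0.0, sleep=time.sleep):
    """Call operation(); on a retryable error, retry up to max_retries times."""
    attempt = 0
    while True:
        try:
            return operation()
        except retryable:
            if attempt >= max_retries:
                raise  # retries exhausted: surface the original error
            attempt += 1
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Errors outside `retryable` propagate immediately, so a misconfiguration fails fast instead of being retried.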

3. Logging & Monitoring

  • Structured logging with cleanup progress tracking
  • Track number of sessions removed
  • Log cleanup execution time and performance
  • Database table size monitoring (before/after)
  • Integration with Celery result backend
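One possible shape for the cleanup metrics is sketched below. The helper name `run_cleanup_with_metrics` and its callable parameters are assumptions made for illustration; in the real task, `count_sessions` could be `Session.objects.count` and `cleanup` a call to the `clearsessions` command. Returning a plain dict keeps the result serializable for the Celery result backend.

```python
import time


def run_cleanup_with_metrics(count_sessions, cleanup, clock=time.monotonic):
    """Run cleanup() and return counts and elapsed time as a plain dict."""
    started = clock()
    initial_count = count_sessions()
    cleanup()
    final_count = count_sessions()
    return {
        "status": "completed",
        "initial_count": initial_count,
        "final_count": final_count,
        # Clamp at zero: sessions created during cleanup can make
        # final_count exceed initial_count.
        "sessions_removed": max(initial_count - final_count, 0),
        "duration_seconds": round(clock() - started, 3),
    }
```

Because the two counts are taken at different moments, `sessions_removed` is an estimate on a live site rather than an exact figure.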

4. Performance Optimization

  • Batch deletion for large session tables
  • Monitor database performance impact
  • Optimize deletion queries for efficiency
  • Handle potential deadlocks gracefully
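Batch deletion can be illustrated with the standard-library `sqlite3` module. This is a sketch under assumptions, not the proposed task: the table mirrors Django's default `django_session` table (`session_key`, `session_data`, `expire_date`), while `delete_expired_in_batches` and the batch size are hypothetical. Each batch runs in its own short transaction, so locks are held briefly instead of for one large delete.

```python
import sqlite3
from datetime import datetime, timedelta


def delete_expired_in_batches(conn, now, batch_size=1000):
    """Delete expired rows batch by batch; return the total number deleted."""
    total = 0
    while True:
        with conn:  # commits one short transaction per batch
            cursor = conn.execute(
                "DELETE FROM django_session WHERE session_key IN ("
                "  SELECT session_key FROM django_session"
                "  WHERE expire_date < ? LIMIT ?)",
                (now.isoformat(), batch_size),
            )
        if cursor.rowcount == 0:
            return total
        total += cursor.rowcount


# Demo data: 25 expired sessions and 5 active ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE django_session ("
    "session_key TEXT PRIMARY KEY, session_data TEXT, expire_date TEXT)"
)
now = datetime(2024, 1, 1, 3, 0, 0)
rows = [(f"expired-{i}", "", (now - timedelta(days=1)).isoformat()) for i in range(25)]
rows += [(f"active-{i}", "", (now + timedelta(days=1)).isoformat()) for i in range(5)]
conn.executemany("INSERT INTO django_session VALUES (?, ?, ?)", rows)
conn.commit()

removed = delete_expired_in_batches(conn, now, batch_size=10)
remaining = conn.execute("SELECT COUNT(*) FROM django_session").fetchone()[0]
```

Note that Django's built-in `clearsessions` issues a single delete for the database backend, so batching like this would mean replacing the wrapped command with custom deletion code.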

5. Scheduling Configuration

  • Configure Celery beat periodic task (daily)
  • Use database-backed scheduling (django-celery-beat)
  • Allow runtime schedule modifications
  • Choose optimal daily execution time (low-traffic period)

Files to Modify/Create

New/Updated Files

  • poradnia/core/tasks.py - Celery task implementation (create app if needed)
  • poradnia/core/__init__.py - Create core app if needed
  • poradnia/core/apps.py - Django app configuration if needed

Modified Files

  • poradnia/settings/base.py - Add clearsessions to Celery beat schedule and INSTALLED_APPS if core app created
  • docs/celery.rst - Documentation updates

Configuration

# poradnia/settings/base.py - Add to CELERY_BEAT_SCHEDULE
# Requires: from celery.schedules import crontab
'clearsessions': {
    'task': 'poradnia.core.tasks.clearsessions',
    'schedule': crontab(hour=3, minute=0),  # Daily at 03:00 (low traffic)
},

Dependencies

Work on this issue cannot begin until the Celery infrastructure from #1828 is fully operational.

Related Issues

Can be developed in parallel with other task migrations once infrastructure is ready. This is the lowest priority of the migration tasks.

Testing Requirements

Unit Tests

  • Test task execution with mock session data
  • Test cleanup logic with various session states
  • Test error handling for database issues
  • Test retry logic for lock conflicts
  • Test performance with large session datasets

Integration Tests

  • Test full Celery task execution
  • Test daily scheduling via Celery beat
  • Test database cleanup with real session data
  • Test concurrent execution handling

Performance Tests

  • Test cleanup performance with large session tables
  • Test database impact during cleanup operations
  • Test memory usage during batch deletions
  • Compare performance with original management command

Implementation Example Structure

# poradnia/core/tasks.py
from celery import shared_task
from celery.utils.log import get_task_logger
from django.contrib.sessions.models import Session
from django.core.management import call_command
from django.core.management.base import CommandError
from django.db import DatabaseError, transaction

logger = get_task_logger(__name__)


# DatabaseError is retried automatically: up to 3 retries, 600 seconds apart.
@shared_task(
    bind=True,
    autoretry_for=(DatabaseError,),
    retry_kwargs={'max_retries': 3, 'countdown': 600},
)
def clearsessions(self):
    """
    Clear expired Django sessions from the database.
    Wrapper around Django's built-in clearsessions command with Celery integration.
    """
    logger.info("Starting Django session cleanup task")

    # Session count before cleanup (for reporting only; concurrent logins
    # and logouts make the difference approximate)
    initial_count = Session.objects.count()

    try:
        # Execute Django's clearsessions command
        with transaction.atomic():
            call_command('clearsessions', verbosity=0)
    except CommandError as cmd_exc:
        # Raised when the configured session backend does not support
        # clearing expired sessions; retrying cannot fix that, so fail fast.
        logger.error(f"Django clearsessions command failed: {cmd_exc}")
        raise
    except DatabaseError as db_exc:
        # Log, then re-raise so that autoretry_for schedules the retry
        # with the countdown configured above.
        logger.error(f"Database error during session cleanup: {db_exc}")
        raise

    # Session count after cleanup
    final_count = Session.objects.count()
    sessions_removed = initial_count - final_count

    result = {
        "status": "completed",
        "sessions_removed": sessions_removed,
        "initial_count": initial_count,
        "final_count": final_count,
    }

    logger.info(f"Session cleanup completed: {result}")
    return result

Acceptance Criteria

  • Celery task successfully clears expired Django sessions
  • Task runs on a daily schedule at a low-traffic time
  • All current cleanup functionality is preserved
  • Error handling improves reliability over the cron-based system
  • Task execution can be monitored through Celery
  • Database performance impact is minimal
  • Session cleanup metrics are tracked and logged
  • Task handles large session tables efficiently

Django Session Considerations

  • Respect Django session backend configuration
  • Handle different session backends (database, cached database, cache, file); the cache-only backend has no expired rows to clear
  • Maintain session expiry logic accuracy
  • Consider session security implications
  • Handle session table locks gracefully
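As a sketch of how the task might branch on the configured backend, the hypothetical helper below maps a `SESSION_ENGINE` value to a cleanup strategy. The engine paths are Django's built-in session backends; the function name and the strategy labels are assumptions for illustration. Session-table counts are only meaningful for the database-backed engines, and the cache and signed-cookie backends store nothing that needs clearing.

```python
# Built-in engines that store sessions in the django_session table.
DB_BACKED_ENGINES = {
    "django.contrib.sessions.backends.db",
    "django.contrib.sessions.backends.cached_db",
}
# Built-in engines whose expired sessions need no explicit cleanup.
NOOP_ENGINES = {
    "django.contrib.sessions.backends.cache",
    "django.contrib.sessions.backends.signed_cookies",
}
FILE_ENGINE = "django.contrib.sessions.backends.file"


def cleanup_strategy(session_engine):
    """Return a label describing how cleanup applies to this engine."""
    if session_engine in DB_BACKED_ENGINES:
        return "database"  # table counts before/after are meaningful
    if session_engine in NOOP_ENGINES:
        return "noop"  # nothing to delete; the task can exit early
    if session_engine == FILE_ENGINE:
        return "file"  # expired session files are removed from disk
    return "unknown"  # custom backend: behaviour must be checked by hand
```

A task could call `cleanup_strategy(settings.SESSION_ENGINE)` and skip the before/after table counts unless the result is "database".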

App Structure Decision

If creating a new core app for utility tasks:

  • Create poradnia/core/ directory structure
  • Add to INSTALLED_APPS in Django settings
  • Follow existing app patterns in the project
  • Document the purpose of the core app

Rollback Plan

  • Keep original cron-based clearsessions during transition
  • Monitor database performance after migration
  • Document rollback procedure for utility tasks

Success Metrics

  • Reliability: 100% daily execution success rate
  • Performance: Cleanup time comparable to or better than original
  • Database Health: Session table size maintained efficiently
  • Monitoring: Clear visibility into cleanup operations and results
