
Unrecoverable crash every few days: context deadline exceeded #359

Open
@nathang21

Description

  • [x] I have checked the existing issues to avoid duplicates
  • [x] I have redacted any info hashes and content metadata from any logs or screenshots attached to this issue

Describe the bug

The Bitmagnet container crashes roughly every 1-3 days from what I can tell. It does not restart/recover automatically, seemingly because the exit code is 1 (Docker is picky about restarting only for certain exit codes and states). I have been having intermittent network stability issues with my ISP, and the crashes seem loosely correlated with that, although they have also happened when I didn't notice any other problems on my network, so I'm not fully convinced that is the trigger/root cause.

I've attached the raw debug logs from a recent failure. AFAICT there isn't anything sensitive in them, as it's mostly errors, but apologies if I missed anything; happy to edit/redact if needed.
bitmagnet.log

To Reproduce

Steps to reproduce the behavior:

  1. Boot Bitmagnet
  2. Wait for the issue to recur
  3. View the stopped container and inspect its logs (see the commands below)
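
For step 3, these are roughly the commands I use to confirm the failure (assuming the container name bitmagnet from the compose file below):

docker inspect --format '{{.State.ExitCode}} {{.State.FinishedAt}}' bitmagnet   # recorded exit code and stop time
docker logs --tail 200 bitmagnet   # last output before the crash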

Expected behavior

Bitmagnet should remain stable and not crash; if it does crash, ideally it would self-recover, too (see the watchdog sketch below).
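
As a stopgap for the self-recovery part, something like this hypothetical watchdog script (run periodically from cron on the host; container name bitmagnet assumed) should restart the container when Docker leaves it in the exited state:

#!/bin/sh
# Hypothetical watchdog sketch: restart bitmagnet if it has exited.
state=$(docker inspect --format '{{.State.Status}}' bitmagnet 2>/dev/null)
if [ "$state" = "exited" ]; then
  code=$(docker inspect --format '{{.State.ExitCode}}' bitmagnet)
  echo "bitmagnet exited with code $code; restarting" >&2
  docker start bitmagnet
fi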

Environment Information (Required)

  • Bitmagnet version: v0.9.5
  • OS and version: macOS 15.1.1 (24B2091)
  • Browser and version (if issue is with WebUI): Chrome 131.0.6778.70 (Official Build) (arm64) (not WebUI related)
  • Please specify any config values for which you have overridden the defaults: See docker compose below

Additional context

Bitmagnet was very heavy on disk I/O, and since I have plenty of RAM I made some tweaks to the Postgres config to prefer RAM over disk in some cases. That has helped a lot with performance on my Synology NAS DS423+.
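
For reference, the overrides can be verified from inside the Postgres container (container name bitmagnet-postgres, as in the compose file below); this is just a sanity check, not part of the fix:

docker exec bitmagnet-postgres psql -U postgres -d bitmagnet \
  -c "SELECT name, setting, unit FROM pg_settings WHERE name IN ('shared_buffers', 'work_mem', 'synchronous_commit');"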

Docker Compose:

  bitmagnet:
    container_name: bitmagnet
    image: ghcr.io/bitmagnet-io/bitmagnet:latest
    volumes:
      - /volume2/docker/starr-trash/bitmagnet:/root/.local/share/bitmagnet
    restart: always
    environment:
      - LOG_FILE_ROTATOR_ENABLED=true
      - POSTGRES_HOST=bitmagnet-postgres
      - POSTGRES_PASSWORD=<REDACTED>
      - TMDB_API_KEY=<REDACTED>
      - CLASSIFIER_DELETE_XXX=true
      - DHT_CRAWLER_SCALING_FACTOR=5
      - LOG_LEVEL=debug
    labels:
      - autoheal=true
    shm_size: 1g
    logging:
      driver: json-file
      options:
        max-file: ${DOCKERLOGGING_MAXFILE}
        max-size: ${DOCKERLOGGING_MAXSIZE}
    # logging:
    #   driver: none
    # Ports mapped via VPN
    # ports:
    #   - 3333:3333 # Bitmagnet - API and WebUI
    #   - 3334:3334/tcp # Bitmagnet - BitTorrent
    #   - 3334:3334/udp # Bitmagnet - BitTorrent
    network_mode: service:gluetun
    depends_on:
      gluetun:
        condition: service_healthy # Used by gluetun-healthcheck.sh script.
        restart: true
      bitmagnet-postgres:
        condition: service_healthy
        restart: true
    healthcheck:
      # Port 9999 is gluetun's built-in health server, reachable here because this
      # container shares gluetun's network namespace; "kill 1" forces an exit so the
      # container gets restarted when the VPN goes down.
      test: "nc -z localhost 9999 || kill 1"
      interval: 1m
      timeout: 1m
      start_period: 300s
    command:
      - worker
      - run
      # Run all workers:
      - --all
      # Or enable individual workers:
      # - --keys=http_server
      # - --keys=queue_server
      # - --keys=dht_crawler

  bitmagnet-postgres:
    image: postgres:16-alpine
    container_name: bitmagnet-postgres
    volumes:
      - /volume2/docker/starr-trash/bitmagnet/postgres:/var/lib/postgresql/data
    ports:
      - "6432:5432"
    shm_size: 3g
    restart: always
    command:
      -c shared_buffers=3GB
      -c work_mem=256MB
      -c maintenance_work_mem=512MB
      -c checkpoint_timeout=30min
      -c checkpoint_completion_target=0.9
      -c wal_buffers=128MB
      -c effective_cache_size=6GB
      -c synchronous_commit=off
      -c autovacuum_vacuum_cost_limit=2000
      -c autovacuum_vacuum_cost_delay=10ms
      -c autovacuum_max_workers=3
      -c autovacuum_naptime=20s
      -c autovacuum_vacuum_scale_factor=0.05
      -c autovacuum_analyze_scale_factor=0.02
      -c temp_file_limit=5GB
      # These risk data loss:
      # -c fsync=off
      # -c full_page_writes=off
    # logging:
    #   driver: none
    environment:
      - POSTGRES_PASSWORD=<REDACTED>
      - POSTGRES_DB=bitmagnet
      - PGUSER=postgres
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
      interval: 10s
      start_period: 60s
    networks:
      syno-bridge:
        # https://github.com/qdm12/gluetun-wiki/blob/main/setup/inter-containers-networking.md#between-a-gluetun-connected-container-and-another-container
        # Required until fixed: https://github.com/qdm12/gluetun/issues/281
        ipv4_address: <REDACTED>
