tickets/DM-43715 #894

Draft
wants to merge 52 commits into
base: main
98988f0
Build jemalloc from source with profiling enabled
fritzm Mar 14, 2025
ab1036f
Made MyISAM the only engine option for the result tables
iagaponenko Apr 11, 2025
9b035db
Extended transient API of QMeta to read workers-to-chunks map from da…
iagaponenko Apr 5, 2024
a71e691
CzarFamilyMap create now waits for a successful read.
jgates108 Jul 22, 2024
4e1126b
Changed Czar to catch 5GB limit.
jgates108 Dec 18, 2024
47338f0
Extended transient API of QMeta to read workers-to-chunks map from da…
iagaponenko Apr 5, 2024
23ed9e1
Czar and workers can send http messages to each other.
jgates108 May 16, 2024
9c57462
Added cancellation code for queries, uberjobs, and czar restart.
jgates108 Sep 3, 2024
4d96824
Added worker believed czar was dead handling.
jgates108 Oct 1, 2024
8307937
Changed Czar to catch 5GB limit.
jgates108 Dec 18, 2024
32084a7
Added family map option to not use chunk size for distribution.
jgates108 Feb 12, 2025
24f5d37
Added JobErrorMsg.
jgates108 Apr 29, 2025
4a487a6
Rebase fixes.
jgates108 May 9, 2025
c4ca9f1
Removed protobufs.
jgates108 May 9, 2025
94365bc
Review changes.
jgates108 May 14, 2025
6e012ef
Merge pull request #924 from lsst/tickets/DM-50667
jgates108 May 15, 2025
6ae7661
Fixed a bug in the result management service of the worker
iagaponenko Jun 4, 2025
4c3a0d8
Removed protobufs.
jgates108 May 9, 2025
b89985c
Review changes.
jgates108 May 14, 2025
32c54de
Added worker executable.
jgates108 May 16, 2025
b358bc6
Worker starts but doesn't connect to mysql.
jgates108 May 20, 2025
2039982
Passes integration tests without xrootd.
jgates108 May 21, 2025
c7edfc1
Removed dead code.
jgates108 May 22, 2025
271dc27
Capped source length and fixed issue with map reading.
jgates108 May 29, 2025
51ae801
Fixed Qmeta message table write.
jgates108 Jun 4, 2025
53b560d
Fixed czar hanging on CsvStream::push.
jgates108 Jun 5, 2025
a1338b2
Fixed a bug in the result management service of the worker
iagaponenko Jun 4, 2025
0108bec
Added worker executable.
jgates108 May 16, 2025
662482d
Worker starts but doesn't connect to mysql.
jgates108 May 20, 2025
8036eb0
Passes integration tests without xrootd.
jgates108 May 21, 2025
da6b994
Removed dead code.
jgates108 May 22, 2025
19855dc
Capped source length and fixed issue with map reading.
jgates108 May 29, 2025
2a02dad
Fixed Qmeta message table write.
jgates108 Jun 4, 2025
c354d40
Fixed czar hanging on CsvStream::push.
jgates108 Jun 5, 2025
e3256a1
Merge branch 'tickets/DM-50621' of github.com:lsst/qserv into tickets…
jgates108 Jun 5, 2025
761153d
Merge pull request #926 from lsst/tickets/DM-50621
jgates108 Jun 6, 2025
166246d
Improved implementation of the chunk map building algorithm
iagaponenko Jun 25, 2025
c16c9ab
Merge branch 'tickets/DM-51534' into tickets/DM-43715
iagaponenko Jun 25, 2025
2021e5e
Changed FQDN calls to blocking or using stored value.
jgates108 Jul 14, 2025
17830dd
Merge pull request #942 from lsst/tickets/DM-51795
jgates108 Jul 14, 2025
3e6ea56
Added CsvStream child that stores complete result in memory.
jgates108 Jun 13, 2025
ef786f1
Added memory/disk hybrid for transferring csv files.
jgates108 Jun 20, 2025
2b51c1e
Some integration tests failing.
jgates108 Jun 25, 2025
5a94437
Fixed shared pointer loops associated with CsvStream.
jgates108 Jun 26, 2025
587c76a
Fix for LIMIT problem.
jgates108 Jun 27, 2025
2b8e83a
Fix for large LIMIT queries.
jgates108 Jul 1, 2025
cb40f0c
Removed "stream" and "memory" csv file transfer methods.
jgates108 Jul 2, 2025
b3a1a18
Reverted CsvBuffer.
jgates108 Jul 2, 2025
6465dcb
Added czar id to temporary file names.
jgates108 Jul 3, 2025
422696a
Improved LIMIT fix.
jgates108 Jul 3, 2025
9fa1acd
Changed integration testing configuration.
jgates108 Jul 7, 2025
40c4d39
Merge pull request #939 from lsst/tickets/DM-51299
jgates108 Jul 17, 2025
24 changes: 4 additions & 20 deletions .github/workflows/ci.yml
@@ -305,37 +305,21 @@ jobs:
if: always()
run: docker logs ${USER}-czar-http-1

- name: Czar CMSD Log
if: always()
run: docker logs ${USER}-czar-cmsd-1

- name: Czar XROOTD Log
if: always()
run: docker logs ${USER}-czar-xrootd-1

- name: Czar MariaDB Log
if: always()
run: docker logs ${USER}-czar-mariadb-1

- name: Qzerv Worker 0 CMSD Log
- name: Qzerv Worker 0 worker-svc Log
if: always()
run: docker logs ${USER}-worker-cmsd-0-1

- name: Qzerv Worker 0 XROOTD Log
if: always()
run: docker logs ${USER}-worker-xrootd-0-1
run: docker logs ${USER}-worker-svc-0-1

- name: Qzerv Worker 0 MariaDB Log
if: always()
run: docker logs ${USER}-worker-mariadb-0-1

- name: Qzerv Worker 1 CMSD Log
if: always()
run: docker logs ${USER}-worker-cmsd-1-1

- name: Qzerv Worker 1 XROOTD Log
- name: Qzerv Worker 1 worker-svc Log
if: always()
run: docker logs ${USER}-worker-xrootd-1-1
run: docker logs ${USER}-worker-svc-1-1

- name: Qzerv Worker 1 MariaDB Log
if: always()
156 changes: 28 additions & 128 deletions admin/local/docker/compose/docker-compose.yml
@@ -15,18 +15,11 @@ x-log-volume:
- type: bind
source: ./log/
target: /config-etc/log/
x-worker-cmsd:
&worker-cmsd
image: "${QSERV_IMAGE:?err}"
init: true
# ports are published in worker-xrootd because this container uses that container's network stack.
x-worker-xrootd:
&worker-xrootd
x-worker-svc:
&worker-svc
image: "${QSERV_IMAGE:?err}"
init: true
expose:
- "1094"
- "2131"
- "3306" # for the worker db, which shares this container's network stack.
x-repl-worker:
&repl-worker
@@ -43,6 +36,7 @@ volumes:
volume_czar_xrootd:
volume_czar_home:
volume_czar_cfg:
volume_czar_transfer:

volume_czar_mariadb_data:
volume_czar_mariadb_cfg:
@@ -53,14 +47,12 @@ volumes:

volume_worker_0_data:
volume_worker_0_results:
volume_worker_0_xrootd:
volume_worker_0_home:
volume_worker_0_mariadb_lib:
volume_worker_0_mariadb_run:

volume_worker_1_data:
volume_worker_1_results:
volume_worker_1_xrootd:
volume_worker_1_home:
volume_worker_1_mariadb_lib:
volume_worker_1_mariadb_run:
@@ -97,30 +89,26 @@ services:
- type: volume
source: volume_worker_0_mariadb_run
target: /var/run/mysqld # This is where the mariadb container puts the socket file
network_mode: "service:worker-xrootd-0"
worker-xrootd-0:
<< : *worker-xrootd
network_mode: "service:worker-svc-0"

worker-svc-0:
<< : *worker-svc
command: >
entrypoint worker-xrootd
entrypoint worker-svc
--db-uri mysql://qsmaster:[email protected]:3306
--db-admin-uri mysql://root:[email protected]:3306
--vnid-config "@/usr/local/lib64/libreplica.so {{db_uri}}/qservw_worker 0 0"
--repl-instance-id qserv_proj
--repl-auth-key replauthkey
--repl-admin-auth-key=repladminauthkey
--repl-registry-host repl-registry
--repl-registry-port 25082
--results-dirname /qserv/data/results
--cmsd-manager-name czar-xrootd
--mysql-monitor-password CHANGEME_MONITOR
--log-cfg-file=/config-etc/log/log-worker-xrootd.cnf
--log-cfg-file=/config-etc/log/log-worker-svc.cnf
volumes:
- type: volume
source: volume_worker_0_results
target: /qserv/data/results
- type: volume
source: volume_worker_0_xrootd
target: /var/run/xrootd
- type: volume
source: volume_worker_0_home
target: /home/qserv
@@ -131,36 +119,8 @@ services:
networks:
default:
aliases:
- worker-cmsd-0
- worker-mariadb-0
worker-cmsd-0:
<< : *worker-cmsd
command: >
entrypoint worker-cmsd
--db-uri mysql://qsmaster:CHANGEME2@worker-mariadb-0:3306
--vnid-config "@/usr/local/lib64/libreplica.so mysql://qsmaster:[email protected]:3306/qservw_worker 0 0"
--results-dirname /qserv/data/results
--repl-instance-id qserv_proj
--repl-auth-key replauthkey
--repl-admin-auth-key=repladminauthkey
--repl-registry-host repl-registry
--repl-registry-port 25082
--cmsd-manager-name czar-xrootd
network_mode: "service:worker-xrootd-0"
volumes:
- type: volume
source: volume_worker_0_results
target: /qserv/data/results
- type: volume
source: volume_worker_0_xrootd
target: /var/run/xrootd
- type: volume
source: volume_worker_0_home
target: /home/qserv
- type: volume
source: volume_worker_0_mariadb_run
target: /qserv/mariadb/run # This matches the ?socket=... location in --db-uri and --db-admin-uri
- << : *log-volume

repl-worker-0:
<< : *repl-worker
command: >
@@ -185,6 +145,7 @@ services:
source: volume_worker_0_home
target: /home/qserv
- << : *log-volume

# worker 1 uses and validates socket file (where possible) to connect to the worker-mariadb
worker-mariadb-1:
<< : *worker-mariadb
@@ -202,31 +163,27 @@ services:
- type: volume
source: volume_worker_1_mariadb_run
target: /var/run/mysqld # This is where the mariadb container puts the socket file
network_mode: "service:worker-xrootd-1"
worker-xrootd-1:
<< : *worker-xrootd
network_mode: "service:worker-svc-1"

worker-svc-1:
<< : *worker-svc
command: >
entrypoint --log-level DEBUG worker-xrootd
entrypoint --log-level DEBUG worker-svc
--db-uri mysql://qsmaster:[email protected]:3306?socket={{db_socket}}
--db-admin-uri mysql://root:[email protected]:3306?socket={{db_socket}}
--vnid-config "@/usr/local/lib64/libreplica.so mysql://qsmaster:[email protected]:3306/qservw_worker 0 0"
--repl-instance-id qserv_proj
--repl-auth-key replauthkey
--repl-admin-auth-key=repladminauthkey
--repl-registry-host repl-registry
--repl-registry-port 25082
--results-dirname /qserv/data/results
--cmsd-manager-name czar-xrootd
--mysql-monitor-password CHANGEME_MONITOR
--targs db_socket=/qserv/mariadb/run/mysqld.sock
--log-cfg-file=/config-etc/log/log-worker-xrootd.cnf
--log-cfg-file=/config-etc/log/log-worker-svc.cnf
volumes:
- type: volume
source: volume_worker_1_results
target: /qserv/data/results
- type: volume
source: volume_worker_1_xrootd
target: /var/run/xrootd
- type: volume
source: volume_worker_1_home
target: /home/qserv
@@ -237,36 +194,8 @@ services:
networks:
default:
aliases:
- worker-cmsd-1
- worker-mariadb-1
worker-cmsd-1:
<< : *worker-cmsd
command: >
entrypoint --log-level DEBUG worker-cmsd
--db-uri mysql://qsmaster:CHANGEME2@worker-mariadb-1:3306?socket=/qserv/mariadb/run/mysqld.sock
--vnid-config "@/usr/local/lib64/libreplica.so mysql://qsmaster:[email protected]:3306/qservw_worker 0 0"
--results-dirname /qserv/data/results
--repl-instance-id qserv_proj
--repl-auth-key replauthkey
--repl-admin-auth-key=repladminauthkey
--repl-registry-host repl-registry
--repl-registry-port 25082
--cmsd-manager-name czar-xrootd
network_mode: "service:worker-xrootd-1"
volumes:
- type: volume
source: volume_worker_1_results
target: /qserv/data/results
- type: volume
source: volume_worker_1_xrootd
target: /var/run/xrootd
- type: volume
source: volume_worker_1_home
target: /home/qserv
- type: volume
source: volume_worker_1_mariadb_run
target: /qserv/mariadb/run
- << : *log-volume

repl-worker-1:
<< : *repl-worker
# qserv-replica-worker app does not support socket file yet.
@@ -292,42 +221,7 @@ services:
source: volume_worker_1_home
target: /home/qserv
- << : *log-volume
czar-xrootd:
image: "${QSERV_IMAGE:?err}"
init: true
command: >
entrypoint xrootd-manager
--cmsd-manager-name czar-xrootd
hostname: czar-xrootd
expose:
- "1094"
- "2131"
volumes:
- type: volume
source: volume_czar_xrootd
target: /var/run/xrootd
- type: volume
source: volume_worker_1_home
target: /home/qserv
- << : *log-volume
networks:
default:
aliases:
- czar-cmsd
czar-cmsd:
image: "${QSERV_IMAGE:?err}"
init: true
# NOTE!! cms-delay-servers must match the number of workers being launched!
command: entrypoint cmsd-manager --cms-delay-servers 2
network_mode: "service:czar-xrootd"
volumes:
- type: volume
source: volume_czar_xrootd
target: /var/run/xrootd
- type: volume
source: volume_czar_home
target: /home/qserv
- << : *log-volume

czar-mariadb:
image: "${QSERV_MARIADB_IMAGE:?err}"
init: true
@@ -351,6 +245,7 @@ services:
- type: volume
source: volume_czar_mariadb_run
target: /var/run/mysqld

czar-proxy:
image: "${QSERV_IMAGE:?err}"
init: true
@@ -359,7 +254,6 @@ services:
--db-uri mysql://qsmaster:[email protected]:3306?socket={{db_socket}}
--db-admin-uri mysql://root:[email protected]:3306?socket={{db_socket}}
--targs db_socket=/qserv/mariadb/run/mysqld.sock
--xrootd-manager czar-xrootd
--log-cfg-file=/config-etc/log/log-czar-proxy.cnf
--repl-instance-id qserv_proj
--repl-auth-key replauthkey
@@ -379,6 +273,10 @@ services:
- type: volume
source: volume_czar_mariadb_run
target: /qserv/mariadb/run
- type: volume
source: volume_czar_transfer
target: /tmp

- << : *log-volume
expose:
- "3306" # for czar-mariadb
@@ -395,7 +293,6 @@ services:
command: >
entrypoint --log-level DEBUG czar-http
--db-uri mysql://qsmaster:CHANGEME2@czar-mariadb:3306/
--xrootd-manager czar-xrootd
--czar-name http
--http-port 4048
--http-threads 4
@@ -414,6 +311,9 @@ services:
- type: volume
source: volume_czar_cfg
target: /config-etc
- type: volume
source: volume_czar_transfer
target: /tmp
- type: volume
source: volume_czar_home
target: /home/qserv
@@ -453,10 +353,10 @@ services:
--instance-id=qserv_proj
--auth-key=replauthkey
--admin-auth-key=repladminauthkey
--xrootd-host=czar-xrootd
--registry-host=repl-registry
--controller-auto-register-workers=1
--qserv-sync-force
--qserv-chunk-map-update
--debug
expose:
- "25081"
2 changes: 0 additions & 2 deletions admin/local/docker/compose/log/log-czar-proxy.cnf
@@ -4,5 +4,3 @@ log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-ddTHH:mm:ss.SSSZ} LWP %-5X{LWP} %-5p %m%n

log4j.logger.lsst.qserv.xrdssi.msgs=WARN
#log4j.logger.lsst.qserv.xrdssi.msgs=DEBUG
@@ -4,4 +4,3 @@ log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-ddTHH:mm:ss.SSSZ} LWP %-5X{LWP} %-5p %m%n

log4j.logger.lsst.qserv.xrdssi.msgs=WARN
3 changes: 0 additions & 3 deletions admin/local/docker/compose/log/log.cnf
@@ -4,6 +4,3 @@ log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
log4j.appender.CONSOLE.layout.ConversionPattern=%d{yyyy-MM-ddTHH:mm:ss.SSSZ} LWP %-5X{LWP} %-5p %m%n

log4j.logger.lsst.qserv.xrdssi.msgs=WARN
#log4j.logger.lsst.qserv.xrdssi.msgs=DEBUG
