Description
IcoswISC240_WOA23_performance_test is failing on Chicoma.
Test log:
compass calling: compass.ocean.tests.global_ocean.performance_test.PerformanceTest.run()
inherited from: compass.testcase.TestCase.run()
in /users/althea/code/compass/main/compass/testcase.py
compass calling: compass.run.serial._run_test()
in /users/althea/code/compass/main/compass/run/serial.py
Running steps:
prognostic_ice_shelf_melt
data_ice_shelf_melt
* step: prognostic_ice_shelf_melt
compass calling: compass.ocean.tests.global_ocean.forward.ForwardStep.runtime_setup()
in /users/althea/code/compass/main/compass/ocean/tests/global_ocean/forward.py
Warning: replacing namelist options in namelist.ocean
config_dt = 02:00:00
config_btr_dt = 00:06:00
compass calling: compass.ocean.tests.global_ocean.forward.ForwardStep.run()
in /users/althea/code/compass/main/compass/ocean/tests/global_ocean/forward.py
Warning: replacing namelist options in namelist.ocean
config_pio_num_iotasks = 1
config_pio_stride = 36
Running: gpmetis graph.info 36
******************************************************************************
METIS 5.0 Copyright 1998-13, Regents of the University of Minnesota
(HEAD: , Built on: Jan 8 2025, 16:43:49)
size of idx_t: 64bits, real_t: 64bits, idx_t *: 64bits
Graph Information -----------------------------------------------------------
Name: graph.info, #Vertices: 7301, #Edges: 21002, #Parts: 36
Options ---------------------------------------------------------------------
ptype=kway, objtype=cut, ctype=shem, rtype=greedy, iptype=metisrb
dbglvl=0, ufactor=1.030, no2hop=NO, minconn=NO, contig=NO, nooutput=NO
seed=-1, niter=10, ncuts=1
Direct k-way Partitioning ---------------------------------------------------
- Edgecut: 1446, communication volume: 1535.
- Balance:
constraint #0: 1.026 out of 0.005
- Most overweight partition:
pid: 25, actual: 208, desired: 202, ratio: 1.03.
- Subdomain connectivity: max: 6, min: 2, avg: 4.33
- Each partition is contiguous.
Timing Information ----------------------------------------------------------
I/O: 0.004 sec
Partitioning: 0.016 sec (METIS time)
Reporting: 0.001 sec
Memory Information ----------------------------------------------------------
Max memory used: 1.575 MB
******************************************************************************
Running: srun -c 1 -N 1 -n 36 ./ocean_model -n namelist.ocean -s streams.ocean
PE 0: MPICH processor detected:
PE 0: AMD Rome (23:49:0) (family:model:stepping)
MPI VERSION : CRAY MPICH version 8.1.28.29 (ANL base 3.4a2)
MPI BUILD INFO : Wed Nov 15 20:57 2023 (git hash 1cde46f) (CH4)
PE 0: MPICH environment settings =====================================
PE 0: MPICH_ENV_DISPLAY = 1
PE 0: MPICH_VERSION_DISPLAY = 1
PE 0: MPICH_ABORT_ON_ERROR = 0
PE 0: MPICH_CPUMASK_DISPLAY = 0
PE 0: MPICH_STATS_DISPLAY = 0
PE 0: MPICH_RANK_REORDER_METHOD = 1
PE 0: MPICH_RANK_REORDER_DISPLAY = 0
PE 0: MPICH_MEMCPY_MEM_CHECK = 0
PE 0: MPICH_USE_SYSTEM_MEMCPY = 0
PE 0: MPICH_OPTIMIZED_MEMCPY = 1
PE 0: MPICH_ALLOC_MEM_PG_SZ = 4096
PE 0: MPICH_ALLOC_MEM_POLICY = PREFERRED
PE 0: MPICH_ALLOC_MEM_AFFINITY = SYS_DEFAULT
PE 0: MPICH_MALLOC_FALLBACK = 0
PE 0: MPICH_MEM_DEBUG_FNAME =
PE 0: MPICH_INTERNAL_MEM_AFFINITY = SYS_DEFAULT
PE 0: MPICH_NO_BUFFER_ALIAS_CHECK = 0
PE 0: MPICH_COLL_SYNC = MPI_Bcast
PE 0: MPICH_SINGLE_HOST_ENABLED = 1
PE 0: MPICH_USE_PERSISTENT_TOPS = 0
PE 0: MPICH_DISABLE_PERSISTENT_RECV_TOPS = 0
PE 0: MPICH_MAX_TOPS_COUNTERS = 0
PE 0: MPICH_ENABLE_ACTIVE_WAIT = 0
PE 0: MPICH/RMA environment settings =================================
PE 0: MPICH_RMA_MAX_PENDING = 128
PE 0: MPICH_RMA_SHM_ACCUMULATE = 0
PE 0: MPICH/Dynamic Process Management environment settings ==========
PE 0: MPICH_DPM_DIR =
PE 0: MPICH_LOCAL_SPAWN_SERVER = 0
PE 0: MPICH_SPAWN_USE_RANKPOOL = 0
PE 0: MPICH/SMP environment settings =================================
PE 0: MPICH_SMP_SINGLE_COPY_MODE = XPMEM
PE 0: MPICH_SMP_SINGLE_COPY_SIZE = 8192
PE 0: MPICH_SHM_PROGRESS_MAX_BATCH_SIZE = 8
PE 0: MPICH/COLLECTIVE environment settings ==========================
PE 0: MPICH_COLL_OPT_OFF = 0
PE 0: MPICH_BCAST_ONLY_TREE = 1
PE 0: MPICH_BCAST_INTERNODE_RADIX = 4
PE 0: MPICH_BCAST_INTRANODE_RADIX = 4
PE 0: MPICH_ALLTOALL_SHORT_MSG = 64-512
PE 0: MPICH_ALLTOALL_SYNC_FREQ = 1-24
PE 0: MPICH_ALLTOALLV_THROTTLE = 8
PE 0: MPICH_ALLGATHER_VSHORT_MSG = 1024-4096
PE 0: MPICH_ALLGATHERV_VSHORT_MSG = 1024-4096
PE 0: MPICH_GATHERV_SHORT_MSG = 131072
PE 0: MPICH_GATHERV_MIN_COMM_SIZE = 64
PE 0: MPICH_GATHERV_MAX_TMP_SIZE = 536870912
PE 0: MPICH_GATHERV_SYNC_FREQ = 16
PE 0: MPICH_IGATHERV_MIN_COMM_SIZE = 1000
PE 0: MPICH_IGATHERV_SYNC_FREQ = 100
PE 0: MPICH_IGATHERV_RAND_COMMSIZE = 2048
PE 0: MPICH_IGATHERV_RAND_RECVLIST = 0
PE 0: MPICH_SCATTERV_SHORT_MSG = 2048-8192
PE 0: MPICH_SCATTERV_MIN_COMM_SIZE = 64
PE 0: MPICH_SCATTERV_MAX_TMP_SIZE = 536870912
PE 0: MPICH_SCATTERV_SYNC_FREQ = 16
PE 0: MPICH_SCATTERV_SYNCHRONOUS = 0
PE 0: MPICH_ALLREDUCE_MAX_SMP_SIZE = 262144
PE 0: MPICH_ALLREDUCE_BLK_SIZE = 716800
PE 0: MPICH_GPU_ALLGATHER_VSHORT_MSG_ALGORITHM = 1
PE 0: MPICH_GPU_ALLREDUCE_USE_KERNEL = 0
PE 0: MPICH_GPU_COLL_STAGING_BUF_SIZE = 1048576
PE 0: MPICH_GPU_ALLREDUCE_STAGING_THRESHOLD = 256
PE 0: MPICH_ALLREDUCE_NO_SMP = 0
PE 0: MPICH_REDUCE_NO_SMP = 0
PE 0: MPICH_REDUCE_SCATTER_COMMUTATIVE_LONG_MSG_SIZE = 524288
PE 0: MPICH_REDUCE_SCATTER_MAX_COMMSIZE = 1000
PE 0: MPICH_SHARED_MEM_COLL_OPT = 1
PE 0: MPICH_SHARED_MEM_COLL_NCELLS = 8
PE 0: MPICH_SHARED_MEM_COLL_CELLSZ = 256
PE 0: MPICH MPIIO environment settings ===============================
PE 0: MPICH_MPIIO_HINTS_DISPLAY = 0
PE 0: MPICH_MPIIO_HINTS = NULL
PE 0: MPICH_MPIIO_ABORT_ON_RW_ERROR = disable
PE 0: MPICH_MPIIO_CB_ALIGN = 2
PE 0: MPICH_MPIIO_DVS_MAXNODES = -1
PE 0: MPICH_MPIIO_AGGREGATOR_PLACEMENT_DISPLAY = 0
PE 0: MPICH_MPIIO_AGGREGATOR_PLACEMENT_STRIDE = -1
PE 0: MPICH_MPIIO_MAX_NUM_IRECV = 50
PE 0: MPICH_MPIIO_MAX_NUM_ISEND = 50
PE 0: MPICH_MPIIO_MAX_SIZE_ISEND = 10485760
PE 0: MPICH_MPIIO_OFI_STARTUP_CONNECT = disable
PE 0: MPICH_MPIIO_OFI_STARTUP_NODES_AGGREGATOR = 2
PE 0: MPICH MPIIO statistics environment settings ====================
PE 0: MPICH_MPIIO_STATS = 0
PE 0: MPICH_MPIIO_TIMERS = 0
PE 0: MPICH_MPIIO_WRITE_EXIT_BARRIER = 1
PE 0: MPICH Thread Safety settings ===================================
PE 0: MPICH_ASYNC_PROGRESS = 0
PE 0: MPICH_OPT_THREAD_SYNC = 1
PE 0: rank 0 required = funneled, was provided = funneled
MPICH ERROR [Rank 0] [job id 21208684.35] [Fri Jan 10 09:53:02 2025] [nid001265] - Abort(1734831948) (rank 0 in comm 0): application called MPI_Abort(MPI_COMM_WORLD, 1734831948) - process 0
aborting job:
application called MPI_Abort(MPI_COMM_WORLD, 1734831948) - process 0
srun: error: nid001265: task 0: Exited with exit code 255
srun: Terminating StepId=21208684.35
slurmstepd: error: *** STEP 21208684.35 ON nid001265 CANCELLED AT 2025-01-10T09:53:02 ***
srun: error: nid001265: tasks 1-35: Terminated
srun: Force Terminated StepId=21208684.35
Failed
Exception raised while running the steps of the test case
Traceback (most recent call last):
File "/users/althea/code/compass/main/compass/run/serial.py", line 322, in _log_and_run_test
_run_test(test_case, available_resources)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/althea/code/compass/main/compass/run/serial.py", line 419, in _run_test
_run_step(test_case, step, test_case.new_step_log_file,
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
available_resources)
^^^^^^^^^^^^^^^^^^^^
File "/users/althea/code/compass/main/compass/run/serial.py", line 470, in _run_step
step.run()
~~~~~~~~^^
File "/users/althea/code/compass/main/compass/ocean/tests/global_ocean/forward.py", line 224, in run
run_model(self, update_pio=update_pio)
~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/althea/code/compass/main/compass/model.py", line 60, in run_model
run_command(args=args, cpus_per_task=cpus_per_task, ntasks=ntasks,
~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
openmp_threads=openmp_threads, config=config, logger=logger)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/althea/code/compass/main/compass/parallel.py", line 149, in run_command
check_call(command_line_args, logger, env=env)
~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/users/althea/miniforge3/envs/dev_compass_1.7.0-alpha.1/lib/python3.13/site-packages/mpas_tools/logging.py", line 59, in check_call
raise subprocess.CalledProcessError(process.returncode,
print_args)
subprocess.CalledProcessError: Command 'srun -c 1 -N 1 -n 36 ./ocean_model -n namelist.ocean -s streams.ocean' returned non-zero exit status 143.
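
A note on reading the exit statuses above: `srun` returned 143, which under the common shell convention is 128 + 15, i.e. termination by SIGTERM. That fits the "tasks 1-35: Terminated" line and suggests the 143 is a side effect of Slurm tearing down the step after rank 0 called `MPI_Abort` (exit code 255), not the root cause. The sketch below decodes a status this way. The helper name `describe_exit_status` is hypothetical and not part of compass or mpas_tools, and the 128 + N interpretation is an assumption about how `srun` reports signalled tasks.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Describe a shell-style exit status (hypothetical helper).

    Assumes the convention that a status above 128 means
    "terminated by signal (status - 128)".
    """
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # No signal with that number on this platform, so treat
            # the value as an ordinary exit code.
            return f"exit code {status} (no signal numbered {signum})"
        return f"terminated by {name} (128 + {signum})"
    return f"exit code {status}"


if __name__ == "__main__":
    # Statuses taken from the log above.
    for status in (143, 255):
        print(status, "->", describe_exit_status(status))
```

If that reading is right, the actual model error would have to come from rank 0's own output rather than from this traceback.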