Conversation

@aerorahul
Contributor

This PR:

  • specifies explicit files for use in initial-condition generation instead of globs
  • unifies the GFS and GEFS processing code and supplies the variables read from the grib2 files via configuration YAMLs
  • adds Python logging to the code base; --debug can be used to enable verbose logging
  • adds testing to the download class; more testing will be added for initial-condition pre-processing

Usage:

❯ gen_ics --help
usage: gen_ics [-h] {gfs,gefs} ...

Download IC data for GFS or GEFS

positional arguments:
  {gfs,gefs}  Model to download and process initial conditions for [GFS | GEFS]
    gfs       Download GFS data
    gefs      Download GEFS ensemble data

options:
  -h, --help  show this help message and exit

Detailed usage for gfs|gefs can be obtained as:

❯ gen_ics gfs --help
usage: gen_ics gfs [-h] --current-cycle YYYYMMDDHH [--source {local,s3}] [--target TARGET]
                   [--bucket-name BUCKET_NAME] [--bucket-root-directory BUCKET_ROOT_DIRECTORY]
                   [--comroot COMROOT] [--num-levels NUM_LEVELS] [--varinfo-yaml VARINFO_YAML]
                   [--output OUTPUT] [--debug] [--download-only]

options:
  -h, --help            show this help message and exit
  --current-cycle YYYYMMDDHH
                        Datetime to download and process initial conditions for in YYYYMMDDHH format
  --source {local,s3}   Data source for getting model grib2 data
  --target TARGET       Target directory to store grib2 model data into
  --bucket-name BUCKET_NAME
                        S3 bucket name. [default: noaa-gfs-bdp-pds (for GFS), noaa-ncepdev-none-ca-ufs-
                        cpldcld (for GEFS)]
  --bucket-root-directory BUCKET_ROOT_DIRECTORY
                        S3 bucket root directory. [default: (for GFS), Linlin.Cui/gefs_wcoss2 (for GEFS)]
  --comroot COMROOT     Root directory. [default: /lfs/h1/ops/prod/com/gfs/v16.3 (for GFS),
                        /lfs/h1/ops/prod/com/gefs/v12.3 (for GEFS)]
  --num-levels NUM_LEVELS
                        Number of vertical levels to download from the model data
  --varinfo-yaml VARINFO_YAML
                        Path to the varinfo YAML file
  --output OUTPUT       Name of the output NetCDF file
  --debug               Set logging level to DEBUG
  --download-only       Only download the data, do not process

gefs provides additional arguments for member:

  --member {c00,p01,p02,p03,p04,p05,p06,p07,p08,p09,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28,p29,p30}
                        Ensemble member

pytests on GitHub Actions are failing due to the missing g2c library. I'll need to add that to the cache action soon.
Tests pass locally where pytest is available.

@LinlinCui-NOAA
Collaborator

@aerorahul Creating a venv failed on Ursa because the library libg2c.so could not be found:

Collecting grib2io>=2.5.4 (from mlglobal==0.1.0)
  Downloading grib2io-2.5.4.tar.gz (938 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 938.6/938.6 kB 22.0 MB/s eta 0:00:00
  Installing build dependencies ... done

  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error

  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [26 lines of output]
      Traceback (most recent call last):
        File "/scratch3/NAGAPE/gpu-ai4wp/Linlin.Cui/git/aerorahul/MLGlobal/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/scratch3/NAGAPE/gpu-ai4wp/Linlin.Cui/git/aerorahul/MLGlobal/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/scratch3/NAGAPE/gpu-ai4wp/Linlin.Cui/git/aerorahul/MLGlobal/venv/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
                 ^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-ptqltm13/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 331, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-build-env-ptqltm13/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 301, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-ptqltm13/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 317, in run_setup
          exec(code, locals())
        File "<string>", line 295, in <module>
        File "<string>", line 70, in get_package_info
        File "<string>", line 148, in find_library
      ValueError:

      The library "libg2c.so" could not be found in any of the following
      directories:
      ['/usr', '/usr/local', '/opt/local', '/opt/homebrew', '/opt', '/sw']


      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
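The traceback shows the failure mode: grib2io's build script searches a fixed list of prefixes for libg2c.so and raises when none contains it. A minimal reproduction of that kind of prefix search (a sketch mirroring the error message, not grib2io's actual code) makes the failure easy to see:

```python
import os

# The prefixes listed in the error message above.
SEARCH_PREFIXES = ["/usr", "/usr/local", "/opt/local", "/opt/homebrew", "/opt", "/sw"]


def find_library(name: str, prefixes=SEARCH_PREFIXES) -> str:
    """Return the first path under `prefixes` containing `name`, else raise."""
    for prefix in prefixes:
        for root, _dirs, files in os.walk(prefix):
            if name in files:
                return os.path.join(root, name)
    raise ValueError(
        f'The library "{name}" could not be found in any of the following '
        f"directories:\n{prefixes}"
    )
```

On an HPC system where NCEPLIBS-g2c is installed under a nonstandard prefix, the fix is to build or install g2c so that libg2c.so is visible to the search (or via whatever location hint grib2io's build supports).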

@aerorahul
Contributor Author

Installing grib2io on Ursa needs g2c. Discussed with @LinlinCui-NOAA and got that resolved.

Since this uses grib2io instead of wgrib2, we think it is a bit risky for the upcoming release. We will consider it for a future release.

aerorahul pushed a commit that referenced this pull request Oct 10, 2025
This PR includes the last changes that SPA team made to
- `oper/gen_aigfs_ics.py`
- `oper/gen_aigefs_ics.py`

to operationalize the scripts. These changes are a band-aid fix that
will need a coordinated fix in the next release. A PR such as #54 would
likely resolve these issues. Some key areas we will need to improve on
include:
- `oper/gen_aigefs_ics.py`
- Remove boto initialization (it caused failures during testing) and calls
to download data from S3 buckets
- The output netCDF file still has the `mlgefs` prefix, which is consistent
with
[exaigefs_prep.sh](https://github.com/NOAA-EMC/aigefs/blob/62f3cb50438e13fce5d95ddd3a09301a68ca1214/scripts/exaigefs_prep.sh#L40)
- Both scripts
  - Remove colon characters from intermediate file names
  - Explicitly locate files instead of looping and globbing (#36)
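The last two cleanups in the commit message can be sketched as follows. The file-name templates and helper names here are illustrative assumptions, not the scripts' actual code:

```python
from datetime import datetime


def sanitize_name(name: str) -> str:
    """Strip colon characters, which some filesystems and tools mishandle."""
    return name.replace(":", "")


def ic_files(cycle: datetime, fhours=(0, 6)) -> list:
    """Enumerate the exact grib2 files for a cycle instead of globbing.

    The gfs.tHHz.pgrb2.0p25.fFFF template is a hypothetical example of
    the kind of explicit file list that replaces a loop over glob results.
    """
    hh = f"{cycle:%H}"
    return [f"gfs.t{hh}z.pgrb2.0p25.f{fh:03d}" for fh in fhours]


print(sanitize_name("mlgefs.2025-10-10T00:00:00.nc"))
print(ic_files(datetime(2025, 10, 10, 6)))
```

Enumerating files explicitly makes a missing input an immediate, named error rather than a silently shorter glob result.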