Workflow Examples for ERA Initialization

Introduction

EHRATM supports the use of custom ERA met files to initialize WRF simulations. As with GFS inputs, the WPS components ultimately create a set of met_em* files, which can then be used to drive the WRF components. ERA met files are not, however, handled by the default WRF Preprocessing System (WPS) configuration, and need different treatment: the Vtables for conversion from GRIB2 format to WPS intermediate format are different; the ERA data comes on model and surface levels and must be processed into pressure levels; and metgrid must then take all of this data to produce the standard set of met_em* files. From that point on, however, WRF processing is the same for GFS and ERA initialized data.

Without much explanation, these examples simply illustrate the use of wfnamelists to perform each of the steps through metgrid, both in standalone mode and in a single full-workflow mode.

These examples are intended to follow on from the previous GFS examples, so it is assumed that the environment variables and test directory (with symbolic link epython3) have been set up in the same way. As a quick review, the environment variables:

export PYTHONPATH=<path-to-high-res-atm>/packages/nwpservice/src:<path-to-high-res-atm>/packages/ehratm/src:<path-to-high-res-atm>/packages/ehratmworkflow/src

As a reminder, copies of all of the following wfnamelists (and other important files) are available in the repository directory packages/ehratmworkflow/docs/UserPerspective/sample_workflows/

$ tree sample_workflows
sample_workflows
├── flexp_input.txt
├── namelist.input.gfs_twonest
├── small_domain_twonest.nml
├── wfnamelist.donothing
├── wfnamelist.ecmwf-fullwps-twonest-4mpitasks
├── wfnamelist.ecmwf-metgridonly-twonest-4mpitasks
├── wfnamelist.ecmwfungrib
├── wfnamelist.flexwrf+srm
├── wfnamelist.fullworkflow
├── wfnamelist.fullwps-twonest
├── wfnamelist.geogrid-twonest-4mpitasks
├── wfnamelist.gfs-metgridonly
├── wfnamelist.gfs-realonly
├── wfnamelist.gfsungrib
└── wfnamelist.gfs-wrfonly

0 directories, 15 files

Finally, a copy of a recent Workflow Namelist Reference document is available for understanding the format and options within a wfnamelist.


ERA ungrib

Unlike global GFS and ECMWF met data, ERA data is not stored in a standard place on CTBTO machines (in part because these datasets often represent varying, custom, regional domains for specific purposes). Therefore, EHRATM has adopted the convention that the ERA data it looks for is stored in a directory hierarchy similar to the one used for the global data, namely

<path-to-root-dir>/YYYY/MM/DD/

and the files will be named with an EA prefix, followed by a YYYYMMDDHH date/time string, followed by either a .ml (for 3D model levels) or .sfc (for 2D surface fields) extension. For example, in the following case we have a root data directory of /dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA/, which looks as follows

$ tree /dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA
/dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA
└── 2014
    └── 01
        └── 24
            ├── EA2014012400.ml
            ├── EA2014012400.sfc
            ├── EA2014012403.ml
            ├── EA2014012403.sfc
            ├── EA2014012406.ml
            └── EA2014012406.sfc

3 directories, 6 files
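Under this convention, the set of files EHRATM will look for over a time window is fully determined by the root directory, the start/end times, and the interval. The following Python sketch (the function name is illustrative only, not part of EHRATM) enumerates the expected paths:

```python
from datetime import datetime, timedelta
from pathlib import Path

def era_file_paths(rootdir, start, end, hours_intvl=3):
    """Yield the (model-level, surface) file paths EHRATM expects
    under <rootdir>/YYYY/MM/DD/ for each analysis time.

    start/end are YYYYMMDDHH strings, matching the wfnamelist times.
    """
    t = datetime.strptime(start, "%Y%m%d%H")
    t_end = datetime.strptime(end, "%Y%m%d%H")
    while t <= t_end:
        day_dir = Path(rootdir) / t.strftime("%Y/%m/%d")
        stamp = t.strftime("%Y%m%d%H")
        yield day_dir / f"EA{stamp}.ml", day_dir / f"EA{stamp}.sfc"
        t += timedelta(hours=hours_intvl)
```

For the test data above, era_file_paths("/dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA", "2014012400", "2014012406") yields exactly the six EA*.ml/EA*.sfc paths shown in the tree listing.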

The workflow namelist in this case consists of two ungrib entries, one for the model level data and one for the surface data (which may be located in the same directory). ehratmwf.py will recognize that for this type of data, once it has been ungribbed, it must be processed by a WPS utility to create the pressure level ungribbed files.

It’s important to note that the ungrib components in this case use custom Vtable files that aren’t part of the default WRF distributions. EHRATM is set up with default locations for these files, but the user may specify their own within the namelist. In this example, for simplicity, we use the EHRATM default locations, so they don’t need to appear in the wfnamelist.

wfnamelist.ecmwfungrib

&control
  workflow_list = 'ungrib'
/

&time
  start_time = '2014012400'
  end_time = '2014012406'
  wrf_spinup_hours = 0
/

&grib_input1
  type = 'ecmwf_ml'
  hours_intvl = 3
  rootdir = '/dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA'
/

&grib_input2
  type = 'ecmwf_sfc'
  hours_intvl = 3
  rootdir = '/dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA'
/
$ ./epython3 ehratmwf.py -n wfnamelist.ecmwfungrib
2023-09-19:19:03:08 --> Workflow started
2023-09-19:19:03:08 --> Start process_namelist()
2023-09-19:19:03:08 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201
2023-09-19:19:03:08 --> Start run_ungrib()
2023-09-19:19:03:11 --> Start run_ecmwfplevels()

-----------------------
Workflow Events Summary
-----------------------
2023-09-19:19:03:08 --> Workflow started
2023-09-19:19:03:08 --> Start process_namelist()
2023-09-19:19:03:08 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201
2023-09-19:19:03:08 --> Start run_ungrib()
2023-09-19:19:03:11 --> Start run_ecmwfplevels()
2023-09-19:19:03:13 --> Workflow completed...

The structure of the output directory is a little busier than we’ve seen with GFS.

$ ls  -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201
ecmwfplevels_namelist.wps  namelist.wps_ecmwf_sfc  ungrib_rundir_ecmwf_ml/
ecmwfplevels_rundir/       tmpmetdata-ecmwf_ml/    ungrib_rundir_ecmwf_sfc/
namelist.wps_ecmwf_ml      tmpmetdata-ecmwf_sfc/

The ungribbed files are found in separate run directories for the model level inputs and the surface level inputs. As in the case of the GFS examples, these run directories are complete, and the interested user can go in here and experiment or troubleshoot, if desired.

In both directory listings, the ungribbed output files are the first three entries, the ones with timestamps.

$ ls -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_ml/WPS
ECMWF_ML:2014-01-24_00  GRIBFILE.AAB@   namelist.wps.all_options  ungrib.exe@
ECMWF_ML:2014-01-24_03  GRIBFILE.AAC@   namelist.wps.fire*        ungrib.log
ECMWF_ML:2014-01-24_06  link_grib.csh*  namelist.wps.global       util/
geogrid/                metgrid/        namelist.wps.nmm          Vtable@
geogrid.exe@            metgrid.exe@    README
GRIBFILE.AAA@           namelist.wps    ungrib/

$ ls -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_sfc/WPS
ECMWF_SFC:2014-01-24_00  GRIBFILE.AAB@   namelist.wps.all_options  ungrib.exe@
ECMWF_SFC:2014-01-24_03  GRIBFILE.AAC@   namelist.wps.fire*        ungrib.log
ECMWF_SFC:2014-01-24_06  link_grib.csh*  namelist.wps.global       util/
geogrid/                 metgrid/        namelist.wps.nmm          Vtable@
geogrid.exe@             metgrid.exe@    README
GRIBFILE.AAA@            namelist.wps    ungrib/

Finally, the execution of this wfnamelist has also produced pressure level ungribbed files in their own run directory - the timestamped ungribbed files with the PRES prefix. When we run metgrid, we will need to specify the locations of all of these ungribbed files.

$ ls  -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ecmwfplevels_rundir/WPS
ecmwf_coeffs@             geogrid/        namelist.wps.all_options  README
ECMWF_ML:2014-01-24_00@   geogrid.exe@    namelist.wps.fire*        ungrib/
ECMWF_ML:2014-01-24_03@   link_grib.csh*  namelist.wps.global       ungrib.exe@
ECMWF_ML:2014-01-24_06@   logfile.log     namelist.wps.nmm          util/
ECMWF_SFC:2014-01-24_00@  metgrid/        PRES:2014-01-24_00
ECMWF_SFC:2014-01-24_03@  metgrid.exe@    PRES:2014-01-24_03
ECMWF_SFC:2014-01-24_06@  namelist.wps    PRES:2014-01-24_06

Note that the executable for producing the pressure level files is, relative to the above directory, util/calc_ecmwf_p.exe, and, as in the other run directories, we can run this alone for experimenting and troubleshooting - the data is already set up for this.
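All of these ungribbed intermediate files follow the WPS naming convention <PREFIX>:YYYY-MM-DD_HH, which metgrid later uses to match files to analysis times. A tiny sketch of that mapping (the helper name is ours, purely illustrative):

```python
from datetime import datetime

def wps_intermediate_name(prefix, when):
    """WPS intermediate filename for an analysis time, e.g.
    ('PRES', 2014-01-24 00Z) -> 'PRES:2014-01-24_00'."""
    return f"{prefix}:{when:%Y-%m-%d_%H}"
```

This is why the ECMWF_ML, ECMWF_SFC, and PRES prefixes seen in the listings above must be passed to the metgrid step exactly as produced.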


Simple two-nest geogrid

This is exactly the same as the two-nest geogrid illustrated for GFS, but it’s repeated here for completeness. The generation of the geogrid products is independent of the type of met data input being used. In fact, if you’ve saved your geogrid output from the GFS examples, you don’t even need to do the following - you can just use its path when running metgrid in the next step. This, hopefully, illustrates the asynchronous capabilities of the EHRATM workflow component system - you don’t have to run full workflows when it doesn’t make sense.

For this one, we’ll use wfnamelist.geogrid-twonest-4mpitasks (the same one we used for the GFS example), which refers to the domain definition namelist, small_domain_twonest.nml (both of which I copy to my local dir for the following test). Note that most of the file/dir entries in the wfnamelist support full pathnames. If full paths are not used, entries are interpreted relative to the current working directory.

The domain(s) created by geogrid are defined in the domain definition namelist, small_domain_twonest.nml

&domain_defn
  parent_id          =   1, 1,
  parent_grid_ratio  =   1, 3,
  i_parent_start     =   1, 20,
  j_parent_start     =   1, 20,
  e_we               =  51, 19,
  e_sn               =  42, 16,
  geog_data_res      =  '10m', '5m'
  dx                 =  30000,
  dy                 =  30000,
  map_proj           = 'lambert',
  ref_lat            = 50.00,
  ref_lon            = 5.00,
  truelat1           = 60.0,
  truelat2           = 30.0,
  stand_lon = 5.0,
/

whose path (in this case, a relative path) is specified in the wfnamelist, wfnamelist.geogrid-twonest-4mpitasks

&control
  workflow_list = 'geogrid'
  workflow_rootdir = '/dvlscratch/ATM/morton/tmp'
/

&domain_defn
  domain_defn_path = 'small_domain_twonest.nml'
/

&geogrid
  num_mpi_tasks = 4
/
$ ./epython3 ehratmwf.py -n wfnamelist.geogrid-twonest-4mpitasks
2023-09-19:22:53:05 --> Workflow started
2023-09-19:22:53:05 --> Start process_namelist()
2023-09-19:22:53:05 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366
2023-09-19:22:53:05 --> Start run_geogrid()

-----------------------
Workflow Events Summary
-----------------------
2023-09-19:22:53:05 --> Workflow started
2023-09-19:22:53:05 --> Start process_namelist()
2023-09-19:22:53:05 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366
2023-09-19:22:53:05 --> Start run_geogrid()
2023-09-19:22:53:16 --> Workflow completed...

Again, note the location of the run directory, as we’ll need this information to run the standalone metgrid component.

$ ls -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366/geogrid_rundir/WPS
geo_em.d01.nc     geogrid.log.0001  metgrid.exe@              namelist.wps.nmm
geo_em.d02.nc     geogrid.log.0002  namelist.wps              README
geogrid/          geogrid.log.0003  namelist.wps.all_options  ungrib/
geogrid.exe@      link_grib.csh*    namelist.wps.fire*        ungrib.exe@
geogrid.log.0000  metgrid/          namelist.wps.global       util/

The geogrid output files are geo_em.d0?.nc, and can be viewed with the ncview utility (if it’s not in the general system path, a copy of the executable is available in the repository, misc/utilities/ncview)

$ ncview /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366/geogrid_rundir/WPS/geo_em.d01.nc

[Image: ncview-landmask]


Simple two-nest metgrid

For this one, we’ll use wfnamelist.ecmwf-metgridonly-twonest-4mpitasks

Because this is a standalone metgrid run, we need to specify where it can find the ungrib and geogrid files produced above. You will obviously need to edit these paths for your own run directories. This differs from the simple GFS case because we need to specify the locations of three sets of ungribbed files - model level, surface level, and pressure level - using the output directories from the previous two components.

&control
  workflow_list = 'metgrid'
/

&time
  start_time = '2014012400'
  end_time = '2014012403'
  wrf_spinup_hours = 0
/

&metgrid
  ug_prefix_01 = 'ECMWF_ML'
  ug_path_01 = '/dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_ml/WPS'
  ug_prefix_02 = 'ECMWF_SFC'
  ug_path_02 = '/dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_sfc/WPS'
  ug_prefix_03 = 'PRES'
  ug_path_03 = '/dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ecmwfplevels_rundir/WPS'
  geogrid_path = '/dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366/geogrid_rundir/WPS'
  num_nests = 2
  hours_intvl = 3
  num_mpi_tasks = 4
/
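Since a standalone metgrid run depends on outputs from earlier, separate runs, a quick pre-flight check that each ug_path_* directory actually contains intermediate files with the matching prefix can save a failed run. A sketch of such a check (not part of EHRATM; the helper name is ours):

```python
from pathlib import Path

def missing_ungrib_inputs(pairs):
    """Given (prefix, path) pairs from the &metgrid namelist, return
    the pairs whose directory contains no 'PREFIX:*' intermediate
    files (i.e. the inputs metgrid would fail to find)."""
    return [(prefix, path) for prefix, path in pairs
            if not any(Path(path).glob(f"{prefix}:*"))]
```

An empty result means every specified directory holds at least one ungribbed file for its prefix.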

And then, we run

$ ./epython3 ehratmwf.py -n wfnamelist.ecmwf-metgridonly-twonest-4mpitasks
2023-09-19:23:04:55 --> Workflow started
2023-09-19:23:04:55 --> Start process_namelist()
2023-09-19:23:04:55 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_230455.938253
2023-09-19:23:04:55 --> Start run_metgrid()

-----------------------
Workflow Events Summary
-----------------------
2023-09-19:23:04:55 --> Workflow started
2023-09-19:23:04:55 --> Start process_namelist()
2023-09-19:23:04:55 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_230455.938253
2023-09-19:23:04:55 --> Start run_metgrid()
2023-09-19:23:04:58 --> Workflow completed...

In viewing excerpts of the run directory listing, we see links to all of the necessary geogrid and ungrib files (model level, surface level and pressure level) as well as the met_em* output files. This complex run directory is all set up for another execution, if desired for experimental or troubleshooting purposes!

$ ls -l /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_230455.938253/metgrid_rundir/WPS
total 27204
lrwxrwxrwx. 1 morton consult      108 Sep 19 23:04 ECMWF_ML:2014-01-24_00 -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_ml/WPS/ECMWF_ML:2014-01-24_00
lrwxrwxrwx. 1 morton consult      108 Sep 19 23:04 ECMWF_ML:2014-01-24_03 -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_ml/WPS/ECMWF_ML:2014-01-24_03
lrwxrwxrwx. 1 morton consult      110 Sep 19 23:04 ECMWF_SFC:2014-01-24_00 -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_sfc/WPS/ECMWF_SFC:2014-01-24_00
lrwxrwxrwx. 1 morton consult      110 Sep 19 23:04 ECMWF_SFC:2014-01-24_03 -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ungrib_rundir_ecmwf_sfc/WPS/ECMWF_SFC:2014-01-24_03
lrwxrwxrwx. 1 morton consult       91 Sep 19 23:04 geo_em.d01.nc -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366/geogrid_rundir/WPS/geo_em.d01.nc
lrwxrwxrwx. 1 morton consult       91 Sep 19 23:04 geo_em.d02.nc -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_225305.871366/geogrid_rundir/WPS/geo_em.d02.nc
.
.
.
-rw-r--r--. 1 morton consult 11547876 Sep 19 23:04 met_em.d01.2014-01-24_00:00:00.nc
-rw-r--r--. 1 morton consult 11547876 Sep 19 23:04 met_em.d01.2014-01-24_03:00:00.nc
-rw-r--r--. 1 morton consult  1552892 Sep 19 23:04 met_em.d02.2014-01-24_00:00:00.nc
-rw-r--r--. 1 morton consult  1552892 Sep 19 23:04 met_em.d02.2014-01-24_03:00:00.nc

.
.
.
lrwxrwxrwx. 1 morton consult      101 Sep 19 23:04 PRES:2014-01-24_00 -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ecmwfplevels_rundir/WPS/PRES:2014-01-24_00
lrwxrwxrwx. 1 morton consult      101 Sep 19 23:04 PRES:2014-01-24_03 -> /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_190308.462201/ecmwfplevels_rundir/WPS/PRES:2014-01-24_03
.
.
.

Full WPS ungrib + geogrid + metgrid Workflow

In general, we will be interested in a single wfnamelist that defines a full workflow. The above examples were provided as more of an academic demonstration of how components can be chained together separately for asynchronous workflows, and specifically demonstrated the added complexity of the ERA inputs. In the following, we define the entire workflow in one wfnamelist. In this scenario, we will not need to keep track of where the geogrid and ungrib outputs are - ehratmwf.py will handle all of that.

wfnamelist.ecmwf-fullwps-twonest-4mpitasks

&control
  workflow_list = 'ungrib', 'geogrid', 'metgrid'
/

&time
  start_time = '2014012400'
  end_time = '2014012403'
  wrf_spinup_hours = 0
/

&grib_input1
  type = 'ecmwf_ml'
  hours_intvl = 3
  rootdir = '/dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA'
/

&grib_input2
  type = 'ecmwf_sfc'
  hours_intvl = 3
  rootdir = '/dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA'
/

&domain_defn
  domain_defn_path = 'small_domain_twonest.nml'
/

&geogrid
  num_mpi_tasks = 4
/

&metgrid
  hours_intvl = 3
  num_mpi_tasks = 4
/

Then, we run it, again noting the run directory, which in this case will include run directories for ungrib, geogrid, and metgrid.

$ ./epython3 ehratmwf.py -n wfnamelist.ecmwf-fullwps-twonest-4mpitasks
2023-09-19:23:21:36 --> Workflow started
2023-09-19:23:21:36 --> Start process_namelist()
2023-09-19:23:21:36 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_232136.681431
2023-09-19:23:21:36 --> Start run_ungrib()
2023-09-19:23:21:38 --> Start run_ecmwfplevels()
2023-09-19:23:21:39 --> Start run_geogrid()
2023-09-19:23:21:48 --> Start run_metgrid()

-----------------------
Workflow Events Summary
-----------------------
2023-09-19:23:21:36 --> Workflow started
2023-09-19:23:21:36 --> Start process_namelist()
2023-09-19:23:21:36 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_232136.681431
2023-09-19:23:21:36 --> Start run_ungrib()
2023-09-19:23:21:38 --> Start run_ecmwfplevels()
2023-09-19:23:21:39 --> Start run_geogrid()
2023-09-19:23:21:48 --> Start run_metgrid()
2023-09-19:23:21:50 --> Workflow completed...
$ tree -L 2 -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230919_232136.681431
/dvlscratch/ATM/morton/tmp/ehratmwf_20230919_232136.681431
├── ecmwfplevels_namelist.wps
├── ecmwfplevels_rundir/
│   ├── GEOG_DATA -> /dvlscratch/ATM/morton/WRFDistributions/WRFV4.3/GEOG_DATA/
│   ├── WPS/
│   └── WRF/
├── geogrid_namelist.wps
├── geogrid_rundir/
│   ├── GEOG_DATA -> /dvlscratch/ATM/morton/WRFDistributions/WRFV4.3/GEOG_DATA/
│   ├── WPS/
│   └── WRF/
├── metgrid_rundir/
│   ├── GEOG_DATA -> /dvlscratch/ATM/morton/WRFDistributions/WRFV4.3/GEOG_DATA/
│   ├── WPS/
│   └── WRF/
├── namelist.wps
├── namelist.wps_ecmwf_ml
├── namelist.wps_ecmwf_sfc
├── tmpmetdata-ecmwf_ml/
│   ├── EA2014012400.ml -> /dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA/2014/01/24/EA2014012400.ml
│   └── EA2014012403.ml -> /dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA/2014/01/24/EA2014012403.ml
├── tmpmetdata-ecmwf_sfc/
│   ├── EA2014012400.sfc -> /dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA/2014/01/24/EA2014012400.sfc
│   └── EA2014012403.sfc -> /dvlscratch/ATM/morton/EHRATM_ERA_TESTDATA/2014/01/24/EA2014012403.sfc
├── ungrib_rundir_ecmwf_ml/
│   ├── GEOG_DATA -> /dvlscratch/ATM/morton/WRFDistributions/WRFV4.3/GEOG_DATA/
│   ├── WPS/
│   └── WRF/
└── ungrib_rundir_ecmwf_sfc/
    ├── GEOG_DATA -> /dvlscratch/ATM/morton/WRFDistributions/WRFV4.3/GEOG_DATA/
    ├── WPS/
    └── WRF/

22 directories, 9 files

Each of the run directories is self-contained and fully set up for execution of its specific component, and the interested user can go into them and rerun for experimentation and/or troubleshooting.