===================================================================
"Unit" testing of the ehratmworkflow package with clibatchtest.py
===================================================================

**clibatchtest.py** - **Command Line Interface batch test** - is a program that iterates through a list of specified wfnamelists, running each and reporting its success or failure in an organised way. Its primary intent is to serve in the way that unit tests would, iterating through a set of problems that are all expected to “pass.” This kind of utility can be particularly helpful when dealing with a new environment or after making underlying code modifications.

The presentation here will proceed incrementally, as follows:

* Use of clibatchtest.py to run a single case (including debugging features)
* Use of clibatchtest.py to run more than one case
* Testing with a portable collection of wfnamelists

Running a simple case
======================

This section introduces the basics of **clibatchtest.py** usage, beginning with a simple wfnamelist, and then introducing an error in order to illustrate how we might start troubleshooting the problem.

The program is found in the repository in **packages/ehratmworkflow/tests/clibatchtest.py** and, to work correctly, needs the correct environment set up in the accompanying file **setup_test_env.sh**. To start, since you’ll be editing **clibatchtest.py** to specify the location of wfnamelists and to enable troubleshooting when necessary, and you may also need to edit **setup_test_env.sh**, it may be a good idea to work with copies of both files outside of the repository, in a working directory. That's what the following will assume (I’m doing this in **~/clibatchtestdir/**).

In the following, we will

* use **setup_test_env.sh** to set up necessary environment variables and a compatible Python environment
* copy in some test files that we used in the :doc:`User Perspective ` tests
* edit **clibatchtest.py** to run a single case
* introduce an error in the wfnamelist and demonstrate how we can begin to obtain more information for troubleshooting the problem

setup_test_env.sh
------------------

About eight environment variables need to be defined, pointing to the repository location, the locations of the WPS/WRF and FlexpartWRF distributions, and the location of met data. In addition, the ``CONDA_ENV`` variable assumes that you have available a conda environment of the specified name, one which is compatible with the EHRATM system. This was discussed in the :doc:`User Perspective ` document and, for the purposes of this demonstration, if you don’t want to set up an environment you can hard-code it (as suggested in the User Perspective document), in this case by creating a link to my own, working python3. This is not the normal way you would do things, but it allows you to temporarily avoid the complexities of getting the right Python environment set up.

.. code-block:: bash

    $ ln -sf /dvlscratch/ATM/morton/anaconda3/envs/ehratmv0.02/bin/python3 epython3
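Before running anything, it can be worth a quick check that the expected variables actually made it into the environment. The following is a minimal, hedged sketch (it is not part of the repository); only ``CONDA_ENV`` and ``TEMP_ROOTDIR`` are names taken from this document, and the list should be extended with whatever your copy of **setup_test_env.sh** actually defines.

.. code-block:: python

    # check_env.py -- hedged sketch; extend REQUIRED_VARS from your copy of
    # setup_test_env.sh.  Only CONDA_ENV and TEMP_ROOTDIR are named in this
    # document; the remaining variables are system-specific.
    import os
    import sys

    REQUIRED_VARS = ["CONDA_ENV", "TEMP_ROOTDIR"]

    missing = [name for name in REQUIRED_VARS if not os.environ.get(name)]
    if missing:
        sys.exit("Not set: " + ", ".join(missing))
    print("All listed environment variables are set")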
Setup of the simple test case
------------------------------

For this example, we will copy into our working directory the following wfnamelist file that was used in the `User Perspective ` document, found in the repository at **packages/ehratmworkflow/docs/UserPerspective/sample_workflows/wfnamelist.gfsungrib**

.. code-block::

    &control
        workflow_list = 'ungrib'
        workflow_rootdir = '/dvlscratch/ATM/morton/tmp'
    /

    &time
        start_time = '2021051415'
        end_time = '2021051421'
        wrf_spinup_hours = 0
    /

    &grib_input1
        type = 'gfs_ctbto'
        subtype = '0.5.fv3'
        hours_intvl = 3
        rootdir = '/ops/data/atm/live/ncep'
    /

The user will want to change the **workflow_rootdir** value to one of their own writable directories. Then, **clibatchtest.py** itself will need to be edited.

**clibatchtest.py** is set up under the assumption that it will be used from within the repository. This is probably more convenient for the experienced developer who is already somewhat familiar with this code, but for this initial exposure, the following assumes that we are - in accordance with the guidance given above - using copies of **clibatchtest.py** and **setup_test_env.sh** in a working directory (in this example case, **/home/consult/morton/clibatchtestdir/**).

The copy of **clibatchtest.py** in this directory should be modified to 1) use the **epython3** symlink described above, and 2) use the current directory as the directory that contains the wfnamelists to be tested. In **clibatchtest.py**, this means

* Change ``PYTHON = 'python3'`` to ``PYTHON = './epython3'``
* Comment out the existing definition of **WFNML_TESTDIR** and set it to the current working directory, ``'./'``

**clibatchtest.py** relies on several (currently eight) environment variables being set correctly, and these are defined in **setup_test_env.sh**. Users will need to adjust these to known distributions of WRF and Flexpart WRF, and a known good copy / branch of the **high-res-atm** repository. Additionally, the **TEMP_ROOTDIR** variable will need to be set to an existing scratch location for which the user has write permission. If you are using the **epython3** symlink for Python, you should also comment out the last statement in the file, which sets up a conda environment.

Finally, we need to define the list of wfnamelists to test. There is a single variable in **clibatchtest.py**, **WFNML_TESTLIST**, representing a Python list of the wfnamelists to be tested. The program assumes that all of the files are in the **WFNML_TESTDIR** defined above. Although this section of **clibatchtest.py** is long and looks very complicated, the reality is that most of it is commented out in a way that lets the knowledgeable user construct a list of all possible tests, subsets of these tests, or single tests. The bottom line, however, is that at the end of this section, the uncommented assignment to **WFNML_TESTLIST** should be a Python list of the names of the wfnamelists to be tested. It’s really that simple.

In our case, we want to test a single, simple wfnamelist, **wfnamelist.gfsungrib**, which was already copied into the working directory, so in **clibatchtest.py** we simply set

.. code-block::

    WFNML_TESTLIST = ['wfnamelist.gfsungrib']
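As an aside - and purely for illustration, this is not code from the program - the "subsets" mentioned above can be built with ordinary Python list operations in that uncommented assignment. The file names below are examples only, and ``ALL_TESTS`` mirrors the variable discussed later for the portable tests.

.. code-block:: python

    # Hypothetical selection patterns; the file names are examples only.
    ALL_TESTS = ['wfnamelist.gfsungrib',
                 'wfnamelist.geogrid-twonest-4mpitasks',
                 'wfnamelist.full-era-workflow']

    WFNML_TESTLIST = ALL_TESTS                             # run everything
    WFNML_TESTLIST = [t for t in ALL_TESTS if 'era' in t]  # just a subset
    WFNML_TESTLIST = ['wfnamelist.gfsungrib']              # a single test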
To review, we have

* Created copies of **clibatchtest.py** and **setup_test_env.sh** in a local working directory. This is done to keep the inexperienced user from accidentally corrupting the repository code. The experienced user will likely work right from the repository and not bother with this.
* Copied a simple wfnamelist into the working directory so that we can demonstrate a test on it
* Created a link to “my” Python 3 on *devlan*, calling it **epython3**. The experienced user will generally forego this step and make sure they are using the correct conda environment.
* Edited variables at the beginning of **clibatchtest.py** for this particular setup. The experienced user will find that if they just run all of this out of the repository, they won’t need to do this.
* Edited **setup_test_env.sh** for the needed environment variables.

Once this is accomplished, we simply need to make sure that our environment is set, and run **clibatchtest.py** for the single test we defined.

.. code-block:: bash

    $ . setup_test_env.sh
    $ ./epython3 clibatchtest.py

    ===========================================================================
    TEST 1/1
    Running Test: ./wfnamelist.gfsungrib
    ***************************************************************************
    Last 10 stdout lines: ./wfnamelist.gfsungrib
    ***************************************************************************
    .
    .
    .
    2023-09-24:22:58:25 --> Start run_ungrib()

    ----------------------- Workflow Events Summary -----------------------
    2023-09-24:22:58:25 --> Workflow started
    2023-09-24:22:58:25 --> Start process_namelist()
    2023-09-24:22:58:25 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230924_225825.014276
    2023-09-24:22:58:25 --> Start run_ungrib()
    2023-09-24:22:58:35 --> Workflow completed...

    =====================================================
    SUMMARY RESULTS
    =====================================================
    return code    wfnamelist
    -----------    ----------
    0              ./wfnamelist.gfsungrib

    Number successful: 1/1

    WARNING - workflow product directories NOT automatically deleted

In the above example, the Unix return code is displayed; in this case, a zero implies that the test completed without any operating system error - the simulation ran. It’s important to note that this does not test the actual output values, just that the system itself ran the test without apparent problems.

Note that, by default (this can be changed), the run directories for each test are retained and, just as in the :doc:`User Perspective ` document, we can go in and experiment or troubleshoot cases.

.. code-block:: bash

    $ ls -F /dvlscratch/ATM/morton/tmp/ehratmwf_20230924_225825.014276/ungrib_rundir_gfs_ctbto/WPS
    geogrid/           GRIBFILE.AAB@   namelist.wps.all_options   ungrib.exe@
    geogrid.exe@       GRIBFILE.AAC@   namelist.wps.fire*         ungrib.log
    GFS:2021-05-14_15  link_grib.csh*  namelist.wps.global        util/
    GFS:2021-05-14_18  metgrid/        namelist.wps.nmm           Vtable@
    GFS:2021-05-14_21  metgrid.exe@    README
    GRIBFILE.AAA@      namelist.wps    ungrib/

And that’s it. If we set up **WFNML_TESTLIST** for a number of tests, they would be run iteratively, with a little bit of information made available for each test, followed by a summary statement. The following screenshots show an example of a full set of tests

.. |clibatchdisplay1| image:: clibatchdisplay1.png
   :scale: 80%

|clibatchdisplay1|

.. |clibatchdisplay2| image:: clibatchdisplay2.png
   :scale: 80%

|clibatchdisplay2|

These figures show that there is a large collection of tests already created, running from single-component to full simulation workflows, and being able to run them all through a single command allows us to understand whether our system is working correctly. In order to keep the output of such a test immediately informative, any output produced by the simulations has been suppressed, so by default the user sees only a brief timeline for each test, as well as a summary. However, when there is a failure it is useful to be able to see debugging output and stack traces, and the next section describes this.
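Before moving on to that, the iterate-and-summarise behaviour just described can be pictured with a rough, hedged sketch. This is not the actual implementation in **clibatchtest.py**; in particular, the driver script name and the exact command line are assumptions made only for illustration.

.. code-block:: python

    # Hedged sketch of batch iteration.  PYTHON, WFNML_TESTDIR and
    # WFNML_TESTLIST mirror the variables discussed above, but DRIVER and
    # the command line are assumptions, not the real invocation.
    import os
    import subprocess

    PYTHON = './epython3'
    WFNML_TESTDIR = './'
    WFNML_TESTLIST = ['wfnamelist.gfsungrib']
    DRIVER = 'ehratmwf.py'            # hypothetical workflow driver script

    results = []
    for name in WFNML_TESTLIST:
        wfnml = os.path.join(WFNML_TESTDIR, name)
        proc = subprocess.run([PYTHON, DRIVER, wfnml],
                              capture_output=True, text=True)
        print('\n'.join(proc.stdout.splitlines()[-10:]))  # last 10 stdout lines
        results.append((proc.returncode, wfnml))

    print('SUMMARY RESULTS')
    for code, wfnml in results:
        print(f'{code:<12d} {wfnml}')
    successes = sum(1 for code, _ in results if code == 0)
    print(f'Number successful: {successes}/{len(results)}')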
Accessing runtime logging information for a single test
--------------------------------------------------------

In this section we demonstrate features of **clibatchtest.py** that allow us to gain additional information that might be helpful for troubleshooting. In this example, we introduce a minor error in **wfnamelist.gfsungrib**, one that I’ve encountered. We will change ``subtype = '0.5.fv3'`` to ``subtype = '0.5'``. Since there is no data for this subtype during the period of interest, the simulation is expected to fail.

.. code-block:: bash

    $ ./epython3 clibatchtest.py

    ===========================================================================
    TEST 1/1
    Running Test: ./wfnamelist.gfsungrib
    ***************************************************************************
    Last 10 stdout lines: ./wfnamelist.gfsungrib
    ***************************************************************************
    .
    .
    .
    2023-09-25:01:15:20 --> Workflow started
    2023-09-25:01:15:20 --> Start process_namelist()
    2023-09-25:01:15:20 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230925_011520.564059
    2023-09-25:01:15:20 --> Start run_ungrib()

    =====================================================
    SUMMARY RESULTS
    =====================================================
    return code    wfnamelist
    -----------    ----------
    1              ./wfnamelist.gfsungrib

    Number successful: 0/1

    WARNING - workflow product directories NOT automatically deleted

We might start by looking in the listed run directory, **/dvlscratch/ATM/morton/tmp/ehratmwf_20230925_011520.564059/**, and we find that it’s empty. This implies that something went wrong early, but what? We have nothing to go on. There is a flag in **clibatchtest.py**, however, that will allow us to enable the output of all simulation stdout and stderr

.. code-block::

    ###########################################################################
    # This variable will normally be set to False, allowing clean iteration
    # through all tests.
    #
    # If this is set to true, then the single test (if there is more than
    # one listed, only the first one will be executed) will be run with
    # stdout/stderr all going to screen, and after execution this program
    # will terminate.  Since this program by default tries to keep things
    # clean, if something goes wrong it can be difficult to understand.  So,
    # this option allows for all the ugly details to be generated.
    ####################################
    ####  FOR DEBUGGING  ########
    SINGLE_RUN_DEBUGGING = False
    ####################################
    ###########################################################################

As the comments say, by setting **SINGLE_RUN_DEBUGGING** to **True**, all messages will go to the screen for the first simulation in the list of tests, and then **clibatchtest.py** will exit. So, if we run it with the option enabled, we get some information that might help the experienced user immediately find the problem (newer GFS files are stored in subdirectory **0.5.fv3**, not **0.5**).

.. code-block:: bash

    $ ./epython3 clibatchtest.py

    BEGIN DEBUGGING DUMP...
    2023-09-25:01:21:33 --> Workflow started
    2023-09-25:01:21:33 --> Start process_namelist()
    WARNING [ungrib:ungrib.py:gribmet_verify_gfs_ctbto:134] --> Unable to find metpath: /ops/data/atm/live/ncep/2021/05/14/0.5/GD21051415
    WARNING [ungrib:ungrib.py:gribmet_verify_gfs_ctbto:134] --> Unable to find metpath: /ops/data/atm/live/ncep/2021/05/14/0.5/GD21051418
    WARNING [ungrib:ungrib.py:gribmet_verify_gfs_ctbto:134] --> Unable to find metpath: /ops/data/atm/live/ncep/2021/05/14/0.5/GD21051421
    2023-09-25:01:21:33 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230925_012133.745995
    2023-09-25:01:21:33 --> Start run_ungrib()
    CRITICAL [ehratmwf:ehratmwf.py:run_workflow:1795] --> run_ungrib() ABEND ==> Use log level of DEBUG for full stack trace and messages

    END DEBUGGING DUMP...

If that’s not enough information, we can, as the message suggests, specify a more verbose log level, and this is accomplished by adding ``log_level = 'debug'`` into the **&control** section of the wfnamelist. Because I have made extensive (some might say obsessive) use of **DEBUG** logger statements in all of the codes, this will firehose much more information than you would ever want, but with patience it hopefully helps to zero in on the problem area.

.. code-block:: bash

    $ ./epython3 clibatchtest.py

    BEGIN DEBUGGING DUMP...

    Log level set to wfnamelist specified value: DEBUG
    2023-09-25:01:26:53 --> Workflow started
    INFO [ehratmwf:ehratmwf.py:run_workflow:1741] --> process_namelist
    2023-09-25:01:26:53 --> Start process_namelist()
    DEBUG [wfnamelist:wfnamelist.py:standards_check:182] --> wf_ref_namelist_dict: &control
        workflow_list = 'UNGRIB', 'GEOGRID'
    .
    .
    .
    DEBUG [wfnamelist:wfnamelist.py:nml_grib_inputs:530] --> unordered_gribinput: {'type': 'gfs_ctbto', 'subtype': '0.5', 'hours_intvl': 3, 'rootdir': '/ops/data/atm/live/ncep'}
    DEBUG [wfnamelist:wfnamelist.py:nml_grib_inputs:587] --> Verifying met type gfs_ctbto
    DEBUG [ungrib:ungrib.py:gribmet_verify_gfs_ctbto:119] --> Verifying: 2021-05-14 15:00:00
    DEBUG [ungrib:ungrib.py:gribmet_verify_gfs_ctbto:131] --> Checking for: /ops/data/atm/live/ncep/2021/05/14/0.5/GD21051415
    WARNING [ungrib:ungrib.py:gribmet_verify_gfs_ctbto:134] --> Unable to find metpath: /ops/data/atm/live/ncep/2021/05/14/0.5/GD21051415
    .
    .
    .
    INFO [ehratmwf:ehratmwf.py:run_workflow:1784] --> run_ungrib()
    2023-09-25:01:26:53 --> Start run_ungrib()
    DEBUG [ehratmwf:ehratmwf.py:run_ungrib:422] --> Start _run_ungrib
    DEBUG [ehratmwf:ehratmwf.py:run_ungrib:431] --> products_dir exists and writable: /dvlscratch/ATM/morton/tmp/ehratmwf_20230925_012653.020535
    Traceback (most recent call last):
      File "/dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/src/ehratmworkflow/ehratmwf.py", line 1786, in run_workflow
        ungribbed_info_list = self.run_ungrib(
      File "/dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/src/ehratmworkflow/ehratmwf.py", line 447, in run_ungrib
        raise ValueError(msg)
    ValueError: Problem with grib input type: gfs_ctbto

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/src/ehratmworkflow/ehratmwf.py", line 2159, in <module>
        main()
      File "/dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/src/ehratmworkflow/ehratmwf.py", line 2153, in main
        Workflow.run_workflow()
      File "/dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/src/ehratmworkflow/ehratmwf.py", line 1790, in run_workflow
        raise Exception
    Exception

    END DEBUGGING DUMP...
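The difference between the normal, quiet iteration and the **SINGLE_RUN_DEBUGGING** mode essentially comes down to whether the child process's output is captured or allowed to stream straight to the terminal. The following hedged sketch illustrates that distinction only; it is not the literal implementation, and the command line shown is an assumption.

.. code-block:: python

    # Hedged sketch: cmd stands in for whatever command clibatchtest.py
    # builds for a single wfnamelist; only the capture behaviour matters here.
    import subprocess
    import sys

    SINGLE_RUN_DEBUGGING = True
    cmd = ['./epython3', 'ehratmwf.py', './wfnamelist.gfsungrib']   # assumed

    if SINGLE_RUN_DEBUGGING:
        print('BEGIN DEBUGGING DUMP...')
        subprocess.run(cmd)          # stdout/stderr go straight to the screen
        print('END DEBUGGING DUMP...')
        sys.exit(0)                  # only the first listed test is run
    else:
        proc = subprocess.run(cmd, capture_output=True, text=True)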
In this way, we can run through a small or large collection of tests and then, if any fail, gain a great deal more information with little effort.

Running two cases
==================

In this example, we run two tests in **clibatchtest.py** - a single-component geogrid test and a full ECMWF workflow (with fdda). Nothing really new is done in this example - it’s just meant to show additional variety, and how we can process more than one test. The necessary files are found in the repository **UserPerspective/sample_workflows/** directory used in the :doc:`User Perspective ` documentation. They are

* **wfnamelist.geogrid-twonest-4mpitasks**

  * need to edit **workflow_rootdir** for your own filesystem

* **wfnamelist.full-era-workflow**

  * Edit **workflow_rootdir** in **&control**
  * May need to edit **rootdir** in **&grib_input1** and **&grib_input2** if the ERA met files have been moved

* **small_domain_twonest.nml**

* **namelist.input.era-twonest-fdda**

  * Edit **anne_bypass_namelist_input** in **&real** and **&wrf** for the correct *FULL* path to **namelist.input.era-twonest-fdda**
  * Edit **bypass_user_input** in **&flexwrf** for the correct *FULL* path to **flxp_input.txt.twonest-from-era**

.. code-block:: bash

    $ ./epython3 ./clibatchtest.py

    ===========================================================================
    TEST 1/2
    Running Test: ./wfnamelist.geogrid-twonest-4mpitasks
    ***************************************************************************
    Last 10 stdout lines: ./wfnamelist.geogrid-twonest-4mpitasks
    ***************************************************************************
    .
    .
    .
    2023-09-29:02:16:34 --> Start run_geogrid()

    ----------------------- Workflow Events Summary -----------------------
    2023-09-29:02:16:34 --> Workflow started
    2023-09-29:02:16:34 --> Start process_namelist()
    2023-09-29:02:16:34 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230929_021634.821765
    2023-09-29:02:16:34 --> Start run_geogrid()
    2023-09-29:02:16:44 --> Workflow completed...

    ===========================================================================
    TEST 2/2
    Running Test: ./wfnamelist.full-era-workflow
    ***************************************************************************
    Last 10 stdout lines: ./wfnamelist.full-era-workflow
    ***************************************************************************
    .
    .
    .
    2023-09-29:02:16:44 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20230929_021644.232178
    2023-09-29:02:16:44 --> Start run_ungrib()
    2023-09-29:02:16:46 --> Start run_ecmwfplevels()
    2023-09-29:02:16:47 --> Start run_geogrid()
    2023-09-29:02:16:56 --> Start run_metgrid()
    2023-09-29:02:17:00 --> Start run_real()
    2023-09-29:02:17:02 --> Start run_wrf()
    2023-09-29:02:19:03 --> Start run_flexwrf()
    2023-09-29:02:22:07 --> Start run_srm()
    2023-09-29:02:22:07 --> Workflow completed...

    =====================================================
    SUMMARY RESULTS
    =====================================================
    return code    wfnamelist
    -----------    ----------
    0              ./wfnamelist.geogrid-twonest-4mpitasks
    0              ./wfnamelist.full-era-workflow

    Number successful: 2/2

    WARNING - workflow product directories NOT automatically deleted

Portable Tests for Reproducibility
===================================

The wfnamelist approach uses the concept of Fortran namelists, with processing well-supported (though sometimes tricky) by the Python **f90nml** package.
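For readers unfamiliar with **f90nml**, reading and inspecting a wfnamelist takes only a few lines of Python; the file name below is the one used in the earlier examples.

.. code-block:: python

    import f90nml

    # Parse the Fortran namelist used in the earlier examples
    nml = f90nml.read('wfnamelist.gfsungrib')

    print(nml['control']['workflow_rootdir'])   # '/dvlscratch/ATM/morton/tmp'
    print(nml['grib_input1']['subtype'])        # '0.5.fv3'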
This is a powerful system for processing namelists, but an important shortcoming in the context of setting up automated tests for a variety of users is that there is no native support for using variables within a namelist to represent parameters like paths. For example, many of the previous examples have had scenarios like the following

.. code-block::

    &control
        workflow_list = 'ungrib'
        workflow_rootdir = '/dvlscratch/ATM/morton/tmp'
    /

    &time
        start_time = '2021051415'
        end_time = '2021051421'
        wrf_spinup_hours = 0
    /

    &grib_input1
        type = 'gfs_ctbto'
        subtype = '0.5.fv3'
        hours_intvl = 3
        rootdir = '/ops/data/atm/live/ncep'
    /

where the **workflow_rootdir** in the **&control** group and the **rootdir** in the **&grib_input1** group have very specific paths. With more than thirty wfnamelists currently set up for automated batch testing, it would require a great deal of error-prone work to modify them all for different **workflow_rootdir** or **rootdir** paths. This is especially bothersome when one considers working on a system with a different file structure.

Because the **f90nml** package - to the best of my knowledge - does not provide any kind of support for conditional values, I decided to implement an optional string-pattern replacement preprocessing step for wfnamelists, so that when a specially coded string is found in the wfnamelist it is replaced with the value found in an environment variable. For example, the above wfnamelist is written as

.. code-block::

    &control
        workflow_list = 'ungrib'
        workflow_rootdir = '#!!#WFN_TEMP_ROOTDIR#!!#/myworkflowdir'
    /

    &time
        start_time = '2021051415'
        end_time = '2021051421'
        wrf_spinup_hours = 0
    /

    &grib_input1
        type = 'gfs_ctbto'
        subtype = '0.5.fv3'
        hours_intvl = 3
        rootdir = '#!!#WFN_METDATA_NCEP#!!#'
    /

For each wfnamelist it processes, **clibatchtest.py** reads it line by line and copies it to a temporary wfnamelist, which is what is actually used for the test (transparently to the user). If it encounters a string delimited by ``#!!# … #!!#``, it looks for an environment variable whose name matches that string and, if found, replaces the whole pattern with the value of the environment variable. In this way, custom paths can be supplied to the wfnamelists without having to edit them, and the same set of test wfnamelists can be used across different user accounts and computing systems. The preceding might be run like

.. code-block:: bash

    $ export WFN_TEMP_ROOTDIR=/dvlscratch/ATM/tipka/scratch
    $ export WFN_METDATA_NCEP=/ops/data/atm/live/ncep
    $ ./epython3 ./clibatchtest.py

The wfnamelist would be processed into the following (the general user would not be aware of this)

.. code-block::

    &control
        workflow_list = 'ungrib'
        workflow_rootdir = '/dvlscratch/ATM/tipka/scratch/myworkflowdir'
    /

    &time
        start_time = '2021051415'
        end_time = '2021051421'
        wrf_spinup_hours = 0
    /

    &grib_input1
        type = 'gfs_ctbto'
        subtype = '0.5.fv3'
        hours_intvl = 3
        rootdir = '/ops/data/atm/live/ncep'
    /

and then processed with the custom directories correctly encoded.
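The preprocessing idea itself is simple. The sketch below is only a hedged illustration of the mechanism just described, not the actual code in **src/ehratmworkflow/preprocnml.py**; the function name and regular expression are assumptions.

.. code-block:: python

    # Hedged sketch of the delimiter substitution described above; the real
    # implementation lives in src/ehratmworkflow/preprocnml.py.
    import os
    import re

    DELIMITER = '#!!#'
    PATTERN = re.compile(re.escape(DELIMITER) + r'(\w+)' + re.escape(DELIMITER))

    def preprocess(in_path, out_path):
        """Copy a wfnamelist, replacing #!!#NAME#!!# with os.environ['NAME']."""
        with open(in_path) as fin, open(out_path, 'w') as fout:
            for line in fin:
                fout.write(PATTERN.sub(lambda m: os.environ[m.group(1)], line))

    # e.g. preprocess('wfnamelist.gfsungrib', 'wfnamelist.gfsungrib.tmp')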
The use of ``#!!# … #!!#`` as the delimiter was chosen after some consideration, mostly wanting to make sure that it couldn’t possibly conflict with other string sequences. I know it’s a bit of overkill, and Anne has already commented on this, but I’m afraid it is what it is for now - all of the “portable” wfnamelist test cases have been encoded with this delimiter. The preprocessing has been implemented with the intent that changing the delimiter to something else would be a matter of simply changing the

.. code-block::

    DELIMITER = '#!!#'

statement at the top of **src/ehratmworkflow/preprocnml.py** to something else (note that I haven't tested this feature).

Running the portable tests
----------------------------

Running the set of portable tests - the ones that should run successfully (with one intermittent condition, discussed below) on properly configured systems - involves the following. Note that the documentation in **clibatchtest.py** says the default setup assumes it is run from within the repository directory **packages/ehratmworkflow/tests/**. Given that this is essentially a development tool, that is a logical place to run from. If you are going to run elsewhere you will need to modify some of the following appropriately.

* Edit **WFNML_TESTDIR** in **clibatchtest.py** to specify the path to the portable tests. The default is already coded in, expecting these to be in the repository in **packages/ehratmworkflow/tests/wfnml_tests_portable/**.
* Unless debugging, ensure that **SINGLE_RUN_DEBUGGING** is set to its default value of False.
* Set up the list **WFNML_TESTLIST** with the names of the wfnamelists you wish to test. Note that the variable **ALL_TESTS** in the program contains the list of all wfnamelists, and simply setting ``WFNML_TESTLIST = ALL_TESTS`` will test them all. Otherwise, edit for the desired set of tests.
* A number of correctly-set environment variables are expected by **clibatchtest.py**. These are set in **packages/ehratmworkflow/tests/setup_test_env.sh**, and most are based on system-specific paths. This file was addressed in previous examples. Note that at the end of the file are three WorkFlowNamelist (**WFN_**) variables that will be used to replace the string patterns discussed above.

Although the above may seem complicated, **clibatchtest.py** “should” be ready to go simply by editing a few root paths in **setup_test_env.sh**, setting ``WFNML_TESTLIST = ALL_TESTS``, and running in **packages/ehratmworkflow/tests/**.

On devlan, buffering may make it appear that nothing is happening for a while, but running ``top`` in a separate window will allow you to see the various WPS/WRF and FlexpartWRF executables running.

.. code-block:: bash

    $ . setup_test_env.sh
    $ ./clibatchtest.py 2>&1 | tee batchtest.log

    ===========================================================================
    TEST 1/33
    Running Test: /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwf-metgridonly
    ***************************************************************************
    Last 10 stdout lines: /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwf-metgridonly
    ***************************************************************************
    .
    .
    .
    2023-10-05:22:54:16 --> Start run_metgrid()

    ----------------------- Workflow Events Summary -----------------------
    2023-10-05:22:54:16 --> Workflow started
    2023-10-05:22:54:16 --> Start process_namelist()
    2023-10-05:22:54:16 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20231005_225416.898253
    2023-10-05:22:54:16 --> Start run_metgrid()
    2023-10-05:22:54:20 --> Workflow completed...

    ===========================================================================
    TEST 2/33
    .
    .
    .
    =====================================================
    SUMMARY RESULTS
    =====================================================
    return code    wfnamelist
    -----------    ----------
    0              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwf-metgridonly
    0              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwf-metgridonly-twonest-4mpitasks
    .
    .
    .
                   /wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid+metgrid+real+wrf-twonest-1mpitask
    0              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid+metgrid+real+wrf-twonest-1mpitask-fdda
    1              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid+metgrid-twonest-4mpitasks
    1              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid-twonest
    0              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid-twonest+4mpitasks
    .
    .
    .
    0              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.twonest-flexwrf-only
    0              /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.twonest-flexwrf-srm

    Number successful: 31/33

    WARNING - workflow product directories NOT automatically deleted

In this case we see that two tests failed (return code of ``1``). Unfortunately, this is a result of an intermittent memory problem on *devlan* associated with one of the WPS utilities. You will note that the two failures have the *ecmwfungrib* component in them, and that’s where it’s failing. If you were to run this test again, it’s quite likely that these two tests would pass, but another test or two with the *ecmwfungrib* component would fail. This problem has been traced to the WPS **util/calc_ecmwf_p.exe** executable, and similar problems have been mentioned in various bug reports. So, this is simply something that a user needs to be aware of for now.

Review of troubleshooting
--------------------------

This section is written like a sidebar, just to remind the technical user how they might troubleshoot problems like this. As a review, one could look through **batchtest.log** and see that this test proceeded as follows

.. code-block::

    ===========================================================================
    TEST 11/33
    Running Test: /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid+metgrid-twonest-4mpitasks
    ***************************************************************************
    Last 10 stdout lines: /dvlscratch/ATM/morton/git/high-res-atm/packages/ehratmworkflow/tests/wfnml_tests/wfnml_tests_portable/wfnamelist.ecmwfungrib+geogrid+metgrid-twonest-4mpitasks
    ***************************************************************************
    .
    .
    .
    2023-10-05:23:08:47 --> Workflow started
    2023-10-05:23:08:47 --> Start process_namelist()
    2023-10-05:23:08:47 --> Started run_workflow(): /dvlscratch/ATM/morton/tmp/ehratmwf_20231005_230847.514046
    2023-10-05:23:08:47 --> Start run_ungrib()
    2023-10-05:23:08:49 --> Start run_ecmwfplevels()

The workflow for this test never made it past the **run_ecmwfplevels()** operation.
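If many tests are involved, scanning the log by eye gets tedious; a small, hedged convenience snippet like the one below (based only on the log excerpts shown above, and not part of the repository) can report the last workflow event reached by each test.

.. code-block:: python

    # Hedged convenience snippet: report the last "Start run_..." event seen
    # for each test in batchtest.log.  The log format is inferred from the
    # excerpts above.
    current, last_event, progress = None, None, {}

    with open('batchtest.log') as log:
        for line in log:
            if line.startswith('Running Test:'):
                if current is not None:
                    progress[current] = last_event
                current, last_event = line.split(':', 1)[1].strip(), None
            elif '--> Start run_' in line:
                last_event = line.strip()
        if current is not None:
            progress[current] = last_event

    for test, event in progress.items():
        print(test, '->', event)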
One could further go into the provided run directory - **/dvlscratch/ATM/morton/tmp/ehratmwf_20231005_230847.514046** in this case - and notice that in the **ecmwfplevels_rundir/WPS/** subdirectory, the expected **PRES\*** output files were not produced. From within that directory, we could run **util/calc_ecmwf_p.exe** and, in this case, it would likely run correctly, producing the expected output files. We might even be able to run it another time or two successfully, but eventually we would hit the intermittent error:

.. code-block::

    .
    .
    .
     136   0.000000       0.9976300001
     137   0.000000       1.0000000000
    Reading from ECMWF_SFC at time 2014-01-24_00
    Found PSFC field in ECMWF_SFC:2014-01-24_00

    Operating system error: Cannot allocate memory
    Allocation would exceed memory limit

And, if we ran it again, it would probably work again a couple more times.

.. toctree::