
Output post processing

Michael Dietze edited this page Jan 6, 2014 · 1 revision


Synopsis

ED2 output is formatted in Hierarchical Data Format 5 (HDF5). For more information on this data format, see the main HDF5 web page: http://www.hdfgroup.org/HDF5/. The HDF5 format was chosen because it offers compression options, APIs for parallel I/O, the ability to embed metadata/descriptions in the file, and self-description of byte sizes and byte order. The current implementation of ED2 does not use compression for output files, because compression both slows down file writes and complicates the parallel writing of hyperslabs. This may change in the future.

File Types

By observing the ED2IN namelist file, one will see that there are typically six output file types that can be created during run time. These six file types are described as follows:

State (history) Files: They use an -S- tag denoting "state" and are written out at a frequency of the user's discretion. These are intended to contain all of the information necessary to restart the simulation at any point and produce output that is bit-for-bit identical to an original (non-restarted) run.

Fast Files: They use an -F- tag denoting "fast" and are written at a frequency of the user's discretion. These are intended to contain diagnostic variables of environmental conditions, typically at the polygon level. These diagnostic variables are either instantaneous, or are mean quantities derived from the integration period between fast-file write-outs.

Daily Files: They use a -D- tag, for obvious reasons. These files are written at 00:00 GMT of simulation time. The data contained in the file references mean quantities for the day that has just passed, and therefore the time stamp on the file will read 00:00 GMT for the date of the previous day. These are intended to contain diagnostic variables of the dynamic environmental conditions, and also ecological states. Dynamic environmental states are reported at the polygon level, averaged over the day. In a similar fashion, ecological states are averaged spatially within the polygon.

Monthly Files: They use an -E- tag, again for obvious reasons. The data is written on the first of each month, at 00:00 GMT. Like the convention in writing daily files, they provide mean quantities for the previous month, and as such are time stamped 00:00 GMT for the first day of the month that was just integrated. They are intended to offer the same variables for analysis that daily files do.

Yearly Files: They use a -Y- tag. The timing of the file writing and the data composition are similar to the monthly and daily files. They contain ecological state variables (e.g. biomass, soil carbon, etc.) that operate on a slower time scale, in contrast to the flux terms that are a larger focus of the daily and monthly files.

"Tower" Files: They use a -T- tag. These files contain an annual time series of fluxes at the "fast" time step and are intended to be analogous to an eddy-covariance tower's output, but with additional partitioning of the flux terms. They are particularly useful for comparison with flux towers or for optimizing the model at a tower site. The information in these files can also be extracted from the fast files, but without the hassle of manipulating large numbers of files.
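For orientation, the type tags appear in the output file names themselves. The exact pattern depends on the output prefixes set in ED2IN and may differ between versions, so treat the names below as a hypothetical illustration only:

```
analysis-E-2004-01-00-000000-g01.h5    <- monthly (-E-) file for January 2004, grid 1
analysis-D-2004-06-15-000000-g01.h5    <- daily (-D-) file
history-S-2004-06-01-000000-g01.h5     <- state/history (-S-) file
```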

File Metadata Control

The output files can be written with or without metadata. The metadata written with each variable provides the end user with three types of information:

(1) a long description of the variable
(2) the units of the variable
(3) the meaning of the variable's dimensions

This information is appended to each dataset as an attribute. Attaching the metadata does not add considerably to the size of the file, but it may slow file writing somewhat (this has never been tested). If the user wishes to turn on file metadata, they can simply indicate this in the ED2IN namelist file with the following flag:

NL%ATTACH_METADATA = 1

File Composition Control

In the file-type descriptions above, the word intended was used frequently. Those descriptions refer to the default composition of the files. The composition of each file can be changed if the user so desires, but these changes can only be activated by modifying a registry in the model code and then re-compiling. This registry is contained in five subroutines in the core memory file of the model code, ED/src/memory/ed_state_vars.f90.

Every single global variable in the model code is referenced in one of the following subroutines: filltab_globtype, filltab_edtype, filltab_polygontype, filltab_sitetype, filltab_patchtype.

Any global variable in the model code can be directed to any output file at write time. CAUTION: only the default set of variables assigned to each output file type has been tested. Many variables are only correct, interpretable, and properly averaged because they are preprocessed at the correct time in the code. Modify file composition with caution.

That being said, let's take an example. Say you want the number of plants in each cohort written to monthly files. The variable is found in the registry; it is a cohort-level variable, and is therefore assigned a "patch"-level pointer: look for cpatch%nplant. Its default registry entry can be found in ed_state_vars.f90 using the grep command or the text-search utility in your favorite editor. Here it is:

Image:registry_nplant.png

By default it is only included in the history file and the yearly output file (for good reason: nplant only changes on a monthly time scale, and it is one of the most memory-consuming variables in the code). To add nplant to the monthly files, add the ":mont" token to the last parameter so that it reads 'NPLANT :41:hist:anal:mont'. The following is the full list of tokens used in this context:

  • hist
  • anal
  • lite
  • mpti
  • mpt1
  • mpt2
  • mpt3
  • recycle
  • mont
  • dail
  • year
  • opti
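The token string effectively acts as a routing table: each token present in the last registry argument enables the variable for one class of output file. As an illustration only, the following Python sketch shows how such a string could be interpreted; the helper name is hypothetical and not part of the ED2 code base:

```python
# Sketch of how a registry token string routes a variable to output
# files. The string format follows the NPLANT example in the text:
# 'NAME :dimcode:token:token:...'. parse_registry_entry is a
# hypothetical helper, not an ED2 function.

def parse_registry_entry(entry):
    """Split a registry string into the variable name, its
    dimensionality code, and the set of output-file tokens."""
    fields = [f.strip() for f in entry.split(":")]
    name = fields[0]
    dim_code = int(fields[1])
    tokens = set(fields[2:])
    return name, dim_code, tokens

# Default NPLANT entry with the ':mont' token appended, as in the text.
name, dim_code, tokens = parse_registry_entry("NPLANT :41:hist:anal:mont")

print(name)               # NPLANT
print("mont" in tokens)   # True  -> now written to monthly (-E-) files
print("year" in tokens)   # False -> not written to yearly (-Y-) files
```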

File (and Memory) Structure

ED2 has a unique spatial hierarchy for its variables. Grids contain polygons, polygons contain sites, sites contain patches and patches contain cohorts. In the HDF5 output files, every single variable is written as a vector. When the model writes these variables to a file, it will append all the cohort data in the entire model state into a single, continuous block of vector data. This single vector contains data that originally came from numerous other vectors with different pointers. So given a single vector of something like cohort variables, how do we determine which patch, site, polygon and grid each cohort is a member of?

Here is a hypothetical example. The domain is very small: just 2 polygons in the grid (2 grid points). Since we are not considering hillslope, hydrologic, or topographic effects within the polygon, each polygon has only one site. Each site will have only 4 patches, and each patch will have only 3 cohorts. In this example the following are true:

  • A grid-level variable, such as latitude, will be stored as a single vector of 2 entries (2 polygons). Each index of the vector is unique to a single polygon.
  • A polygon-level variable, such as elevation, will be stored as a single vector, also of 2 entries (2 polygons x 1 site/polygon). Each index of the vector is unique to a single site (and, in this example, also to a single polygon).
  • A site-level variable, such as patch age, will be stored as a single vector with 8 entries (2 sites x 4 patches/site). Each index of the vector is unique to a single patch.
  • A patch-level variable, such as leaf biomass, will be stored as a single vector with 24 entries (8 patches x 3 cohorts/patch). Each index is unique to a single cohort.

To find which polygon, site and patch any given cohort in the HDF5 output dataset is associated with, there is a set of mapping variables. The mapping variables are simply indices into the child-level vectors: for each parent, they record where its children start and how many there are.

/pysi_id

 (the index of the first site of each polygon; this points to their location in the site-level vector)

/pysi_n

 (the number of sites in each polygon)

/sipa_id

 (the index of the first patch of each site; this points to their locations in the patch-level vector)

/sipa_n

 (the number of patches in each site)

/paco_id

 (the index of the first cohort of each patch; this points to their locations in the cohort-level vector)

/paco_n

 (the number of cohorts in each patch)

Image:hdf5_mapping.png
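The mapping can be exercised with the hypothetical domain from the example above (2 polygons, 1 site per polygon, 4 patches per site, 3 cohorts per patch). A minimal sketch in plain Python, where the arrays are hand-built stand-ins for the /pysi_*, /sipa_*, and /paco_* datasets and indices are 1-based as in the Fortran output:

```python
# Reconstruct which polygon/site/patch each cohort belongs to, using
# the mapping vectors described above. The values are hand-built for
# the hypothetical domain in the text (2 polygons, 1 site/polygon,
# 4 patches/site, 3 cohorts/patch); indices are 1-based.

pysi_id = [1, 2]          # first site of each polygon
pysi_n  = [1, 1]          # number of sites in each polygon
sipa_id = [1, 5]          # first patch of each site
sipa_n  = [4, 4]          # number of patches in each site
paco_id = [1, 4, 7, 10, 13, 16, 19, 22]   # first cohort of each patch
paco_n  = [3, 3, 3, 3, 3, 3, 3, 3]        # number of cohorts in each patch

def parent_of(index, id_vec, n_vec):
    """Return the 1-based index of the parent whose child range
    [first, first + count) contains the 1-based child `index`."""
    for parent, (first, count) in enumerate(zip(id_vec, n_vec), start=1):
        if first <= index < first + count:
            return parent
    raise ValueError("index %d not covered by mapping" % index)

# Walk cohort 14 up the hierarchy:
patch   = parent_of(14, paco_id, paco_n)     # patch owning cohort 14
site    = parent_of(patch, sipa_id, sipa_n)  # site owning that patch
polygon = parent_of(site, pysi_id, pysi_n)   # polygon owning that site
print(patch, site, polygon)   # -> 5 2 2
```

In a real analysis the six mapping vectors would be read from the HDF5 file instead of hand-built, but the index arithmetic is the same.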

Sample Analysis Script

Attached is a zip file with a sample MATLAB script for analyzing ED2 output. The contents of the file include the base MATLAB m-file, an auxiliary file used for generating the vertices of the ED polygons (used to make patch objects), a text file that lists the sequence of H5 files to be analyzed, and of course the sequence of H5 files themselves.

Give it a try! The base code is documented to give you an idea of what is going on. The test data is from a coarse-resolution regional simulation of the Amazon basin in South America. In this run we happened to be testing some respiration parameterizations, so the results are not valid. The simulation was started at an arbitrary date, 1200 AD, and continues for 199 years until 1399. The simulation was initialized from a near-bare-ground condition.

Download file: Sample Analysis Script

The sample analysis tool should give you some maps of AGB and a histogram of cohort heights for each polygon.

Image:Sample_for_ed2_io.png