Case configuration
WAM2layers uses case configuration files to store the settings for an
experiment. That makes it possible to run various experiments without changing
the model code. The configuration files are written in YAML format. When
you run WAM2layers, these settings are loaded into a Config object. The
options in your YAML file should correspond to the attributes of the Config
class listed below.
An example configuration file is available here. Alternatively, check out the configuration files for the example cases for Volta and Eiffel.
- class wam2layers.config.Config
- model_config = {'arbitrary_types_allowed': True, 'validate_assignment': True}
Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.
- filename_template: str
The filename pattern of the raw input data.
Used to find the input files during preprocessing. The pattern will be interpreted during execution of the model to find the input data for each date and variable.
For example, the following pattern:
filename_template: /ERA5data/{year}/{month:02}/ERA5_{year}-{month:02d}-{day:02d}{levtype}_{variable}.nc
will be converted to
/ERA5data/2021/07/ERA5_2021-07-15_ml_u.nc
for date 2021-07-15, variable u, and levtype “_ml” (note the underscore).
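The conversion above can be reproduced with Python's str.format, which is what the format specifiers in the pattern (such as {day:02d}) suggest. This is a minimal sketch under that assumption; the helper expand_template is hypothetical and not part of WAM2layers.

```python
from datetime import date

# Pattern copied from the filename_template example above.
filename_template = (
    "/ERA5data/{year}/{month:02}/"
    "ERA5_{year}-{month:02d}-{day:02d}{levtype}_{variable}.nc"
)


def expand_template(template: str, when: date, variable: str, levtype: str) -> str:
    """Fill in the placeholders for one date and variable (hypothetical helper)."""
    return template.format(
        year=when.year,
        month=when.month,  # "02" and "02d" both zero-pad integers to width 2
        day=when.day,
        levtype=levtype,
        variable=variable,
    )


path = expand_template(filename_template, date(2021, 7, 15), variable="u", levtype="_ml")
print(path)  # /ERA5data/2021/07/ERA5_2021-07-15_ml_u.nc
```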
- preprocessed_data_folder: HttpUrl | Path
Location where the preprocessed data should be stored.
If it does not exist, it will be created during preprocessing.
For example:
preprocessed_data_folder: ~/floodcase_202107/preprocessed_data
- calendar: Literal['standard', 'gregorian', 'proleptic_gregorian', 'noleap', '365_day', '360_day', 'julian', 'all_leap', '366_day']
Which calendar the input data and preprocessed data are on.
ERA5 uses the standard calendar; however, future scenarios from Earth system models can use different calendars.
For example:
calendar: noleap
- tracking_direction: Literal['forward', 'backward']
The tracking direction: forward or backward.
This setting is required; you have to specify whether forward or backward tracking should be performed.
For example:
tracking_direction: backward
- tagging_region: Annotated[Annotated[Path, PathType(path_type=file)] | BoundingBox, AfterValidator(func=validate_region)]
Subdomain delimiting the tagging region from which evaporation is tagged in forward mode (i.e., a source region) or from which precipitation is tagged in backward mode (i.e., a sink region).
You can specify a path to a netCDF file or a shapefile, or a bounding box of the form [west, south, east, north]:
- The netCDF file should consist of ones (tagging region) and zeros (non-tagging region). Values between 0 and 1 are possible as well and can be used when the region of interest only partly overlaps a grid cell. If the netCDF file has a time dimension, the field nearest in time will be used as tagging region, and the time should still be between tagging_start_date and tagging_end_date. A dynamic tagging region is thus possible as well. TODO: test the dynamic tagging region in more detail.
- The shapefile should contain only one polygon. The mask generated from the shapefile is also written to the debug directory in the output folder. The entire shapefile polygon needs to be inside [-180, -80, 180, 80].
- The bounding box should be inside [-180, -80, 180, 80]; if west > east, the coordinates will be rolled to retain a continuous longitude. This can be used for studies crossing the antimeridian.
If a file is given, it should exist.
For example:
tagging_region: /data/volume_2/era5_2021/tagging_region_global.nc
tagging_region: tests/test_data/shape/Rhine.shp
tagging_region: [0, 50, 10, 55]
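To make the three accepted forms concrete, here is a minimal sketch that classifies a tagging_region value and checks a bounding box against the limits described above. The function classify_tagging_region is hypothetical (WAM2layers does its own checking in validate_region), it assumes south must be smaller than north, and it does not open files or check that they exist.

```python
from pathlib import Path
from typing import Sequence, Union

# Bounds from the text above, in the order [west, south, east, north].
WEST_LIMIT, SOUTH_LIMIT, EAST_LIMIT, NORTH_LIMIT = -180, -80, 180, 80


def classify_tagging_region(region: Union[str, Path, Sequence[float]]) -> str:
    """Report which kind of tagging region a config value describes (hypothetical)."""
    if isinstance(region, (str, Path)):
        suffix = Path(region).suffix.lower()
        if suffix == ".nc":
            return "netcdf"
        if suffix == ".shp":
            return "shapefile"
        raise ValueError(f"unsupported file type: {suffix!r}")

    west, south, east, north = region
    for lon in (west, east):
        if not WEST_LIMIT <= lon <= EAST_LIMIT:
            raise ValueError(f"longitude {lon} outside [-180, 180]")
    if not SOUTH_LIMIT <= south < north <= NORTH_LIMIT:
        raise ValueError("latitudes must satisfy -80 <= south < north <= 80")
    # west > east means the box wraps across the antimeridian, so the
    # longitudes have to be rolled to stay continuous.
    return "bbox crossing antimeridian" if west > east else "bbox"


print(classify_tagging_region("tests/test_data/shape/Rhine.shp"))  # shapefile
print(classify_tagging_region([0, 50, 10, 55]))                    # bbox
print(classify_tagging_region([170, -10, -170, 10]))               # bbox crossing antimeridian
```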
- tracking_domain: Annotated[BoundingBox, AfterValidator(func=validate_region)] | None
Subdomain delimiting the region considered during tracking.
This is useful when you have global pre-processed data but you don’t need global tracking.
You can specify a bounding box of the form [west, south, east, north].
The bounding box should be inside [-180, -80, 180, 80]; if west > east, the coordinates will be rolled to retain a continuous longitude.
If it is set to null, the full domain of the preprocessed data will be used.
Note that you should always set periodic_boundary to False if you use a subdomain that does not have all Earth’s longitudes.
For example:
tracking_domain: [0, 50, 10, 55]
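The note about periodic_boundary can be expressed as a simple consistency check. This is a sketch with a simplified rule (only a domain running from -180 to 180 is treated as covering all longitudes); check_periodic_boundary is a hypothetical helper, and it skips the check for tracking_domain = null because the extent of the preprocessed data is not known to it.

```python
from typing import Optional, Sequence


def check_periodic_boundary(
    tracking_domain: Optional[Sequence[float]], periodic_boundary: bool
) -> None:
    """Raise if periodic boundaries are combined with a regional subdomain (hypothetical)."""
    if tracking_domain is None or not periodic_boundary:
        return  # nothing to check
    west, _south, east, _north = tracking_domain
    if not (west == -180 and east == 180):
        raise ValueError(
            "periodic_boundary should be False for a tracking_domain "
            "that does not cover all longitudes"
        )


check_periodic_boundary([0, 50, 10, 55], periodic_boundary=False)      # OK
check_periodic_boundary([-180, -80, 180, 80], periodic_boundary=True)  # OK
```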
- output_folder: Path
Location where output of tracking and analysis should be written.
For example:
output_folder: ~/floodcase_202107/output_data
- preprocess_start_date: DatetimeNoLeap | Datetime360Day | DatetimeAllLeap | DatetimeGregorian | DatetimeJulian | DatetimeProlepticGregorian
Start date for preprocessing.
Should be formatted as: “YYYY-MM-DD[T]HH:MM”. Start date < end date. The preprocess_start_date is included in the preprocessing.
For example:
preprocess_start_date: "2021-07-01T00:00"
- preprocess_end_date: DatetimeNoLeap | Datetime360Day | DatetimeAllLeap | DatetimeGregorian | DatetimeJulian | DatetimeProlepticGregorian
End date for preprocessing.
Should be formatted as: “YYYY-MM-DD[T]HH:MM”. Start date < end date. The preprocess_end_date is included in the preprocessing.
For example:
preprocess_end_date: "2021-07-15T23:00"
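The date format and the inclusive end points can be checked with the standard library. This sketch uses plain datetime objects in the standard calendar (WAM2layers itself stores these dates as cftime objects) and assumes hourly input data, as in input_frequency: '1h'.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%Y-%m-%dT%H:%M"  # "YYYY-MM-DD[T]HH:MM" with a literal "T"

start = datetime.strptime("2021-07-01T00:00", DATE_FORMAT)
end = datetime.strptime("2021-07-15T23:00", DATE_FORMAT)
assert start < end, "preprocess_start_date must be earlier than preprocess_end_date"

# Both end points are included, hence the + 1.
input_step = timedelta(hours=1)
n_times = (end - start) // input_step + 1
print(n_times)  # 360
```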
- parallel_preprocess: bool
Run the preprocessor with multiple parallel processes.
Optional. Should be True or False. Parallel preprocessing is considerably faster but also much more resource-intensive. Note that this feature is still experimental and needs more testing.
For example:
parallel_preprocess: True
- parallel_processes: int | None
The number of parallel processes to use (if parallel processing is enabled).
If this is left empty, the number of processes is set to the number of CPU threads available on your system. You can define a lower number to reduce the CPU and memory load.
For example:
parallel_processes: 4
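The fallback described above amounts to the following sketch. resolve_process_count is a hypothetical helper; WAM2layers may determine the number of available CPU threads differently.

```python
import os
from typing import Optional


def resolve_process_count(parallel_processes: Optional[int]) -> int:
    """Use the configured value, or the number of CPU threads when left empty."""
    if parallel_processes is not None:
        return parallel_processes
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


print(resolve_process_count(4))  # 4
```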
- tracking_start_date: DatetimeNoLeap | Datetime360Day | DatetimeAllLeap | DatetimeGregorian | DatetimeJulian | DatetimeProlepticGregorian
Start date for tracking.
Should be formatted as: “YYYY-MM-DD[T]HH:MM”. Start date < end date, even for backward tracking. When tracking backward, the tracking_start_date is not included as an output date.
For example:
tracking_start_date: "2021-07-01T00:00"
- tracking_end_date: DatetimeNoLeap | Datetime360Day | DatetimeAllLeap | DatetimeGregorian | DatetimeJulian | DatetimeProlepticGregorian
End date for tracking.
Should be formatted as: “YYYY-MM-DD[T]HH:MM”. Start date < end date, even for backward tracking.
For example:
tracking_end_date: "2021-07-15T23:00"
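Why the start date must precede the end date even for backward tracking can be illustrated by treating backward tracking as walking the same interval in reverse. This is a simplified sketch, not WAM2layers' internal loop (which steps with timestep and interpolates the data); tracking_times is a hypothetical helper.

```python
from datetime import datetime, timedelta
from typing import List


def tracking_times(
    start: datetime, end: datetime, step: timedelta, direction: str
) -> List[datetime]:
    """List times in the order they would be visited (illustrative only)."""
    if not start < end:
        raise ValueError("tracking_start_date must be earlier than tracking_end_date")
    n = (end - start) // step
    times = [start + i * step for i in range(n + 1)]
    if direction == "backward":
        times.reverse()  # same interval, visited from end to start
    elif direction != "forward":
        raise ValueError("direction must be 'forward' or 'backward'")
    return times


times = tracking_times(
    datetime(2021, 7, 1, 0), datetime(2021, 7, 1, 3), timedelta(hours=1), "backward"
)
print([t.hour for t in times])  # [3, 2, 1, 0]
```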
- tagging_start_date: DatetimeNoLeap | Datetime360Day | DatetimeAllLeap | DatetimeGregorian | DatetimeJulian | DatetimeProlepticGregorian
Start date for tagging.
To track individual (e.g. heavy precipitation) events, you can set the tagging start and end dates to differ from the overall tracking start and end dates; you can also specify the hours that you want to track. The start date is included.
Should be formatted as: “YYYY-MM-DD[T]HH:MM”. Start date < end date, even for backward tracking.
For example:
tagging_start_date: "2021-07-13T00:00"
- tagging_end_date: DatetimeNoLeap | Datetime360Day | DatetimeAllLeap | DatetimeGregorian | DatetimeJulian | DatetimeProlepticGregorian
End date for tagging.
To track individual (e.g. heavy precipitation) events, you can set the tagging start and end dates to differ from the overall tracking start and end dates; you can also specify the hours that you want to track. The end date is included.
Should be formatted as: “YYYY-MM-DD[T]HH:MM”. Start date < end date, even for backward tracking.
For example:
tagging_end_date: "2021-07-14T23:00"
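Since both tagging dates are inclusive, a time is tagged when it falls in the closed interval between them. The sketch below uses the example dates from this page; the requirement that the tagging period lies within the tracking period is an assumption of the sketch, and is_tagged is a hypothetical helper.

```python
from datetime import datetime


def is_tagged(t: datetime, tagging_start: datetime, tagging_end: datetime) -> bool:
    """Both tagging_start_date and tagging_end_date are inclusive."""
    return tagging_start <= t <= tagging_end


tracking_start, tracking_end = datetime(2021, 7, 1, 0), datetime(2021, 7, 15, 23)
tagging_start, tagging_end = datetime(2021, 7, 13, 0), datetime(2021, 7, 14, 23)

# Assumption: the tagging period should fall within the tracking period.
assert tracking_start <= tagging_start < tagging_end <= tracking_end

print(is_tagged(datetime(2021, 7, 14, 23), tagging_start, tagging_end))  # True
print(is_tagged(datetime(2021, 7, 15, 0), tagging_start, tagging_end))   # False
```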
- input_frequency: Annotated[str, AfterValidator(func=fix_deprecated_frequency)]
Time frequency of the raw input data.
This refers to the time frequency of the climate or weather model data. It is primarily used during preprocessing, but the setting carries over to the tracking as well. TODO: Enable surface fluxes and atmospheric fluxes to have different frequencies, which may happen with some data sets.
For example:
input_frequency: '1h'
- timestep: int
Timestep in seconds with which to perform the tracking.
The data will be interpolated during model execution. A timestep that is too large will violate the CFL criterion; one that is too small will lead to excessive numerical diffusion and slow progress. For best performance, the input_frequency should be divisible by the timestep.
For example:
timestep: 600 # timestep in seconds
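The divisibility recommendation can be checked by converting the frequency string to seconds. The parser below is a hypothetical stand-in that only understands a few simple strings such as '1h' or '1d' (WAM2layers handles frequency strings with its own code, e.g. fix_deprecated_frequency); it shows how many tracking steps fit into one input interval.

```python
import re

# Hypothetical parser for a small subset of frequency strings.
_UNIT_SECONDS = {"min": 60, "h": 3600, "d": 86400}


def frequency_to_seconds(freq: str) -> int:
    match = re.fullmatch(r"(\d*)(min|h|d)", freq)
    if match is None:
        raise ValueError(f"unsupported frequency: {freq!r}")
    count = int(match.group(1) or 1)
    return count * _UNIT_SECONDS[match.group(2)]


input_seconds = frequency_to_seconds("1h")  # input_frequency: '1h'
timestep = 600                              # seconds
if input_seconds % timestep != 0:
    print("Warning: input_frequency is not divisible by the timestep")
substeps = input_seconds // timestep
print(substeps)  # 6
```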
- output_frequency: Annotated[str, AfterValidator(func=fix_deprecated_frequency)]
Frequency at which to write output to file. TODO: clarify if this is used during preprocessing, tracking or both
For example, for daily output files:
output_frequency: '1d'
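As a quick piece of arithmetic, the number of tracking steps that fall into one output period follows from the output_frequency and timestep examples on this page. The frequencies are written here directly as timedelta values; converting '1d' this way is an assumption of the sketch.

```python
from datetime import timedelta

# Values from the examples on this page: output_frequency '1d', timestep 600 s.
output_period = timedelta(days=1)
timestep = timedelta(seconds=600)

steps_per_output = output_period // timestep
remainder = output_period % timestep  # zero when the period is a whole number of steps
print(steps_per_output, remainder)  # 144 0:00:00
```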
- level_type: Literal['model_levels', 'pressure_levels']
Type of vertical levels in the raw input data.
Can be either model_levels or pressure_levels.
For example:
level_type: model_levels
- levels: List[int] | Literal['All']
Which levels to use from the raw input data.
A list of integers corresponding to the levels in the input data, or a subset thereof. The shorthand “All” will attempt to use all 137 ERA5 levels.
For example:
levels: [20,40,60,80,90,95,100,105,110,115,120,123,125,128,130,131,132,133,134,135,136,137]
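The “All” shorthand can be thought of as expanding to the full list of ERA5 model levels, which are numbered 1 (top of the atmosphere) to 137 (nearest the surface). resolve_levels is a hypothetical helper illustrating this; it is not the WAM2layers implementation.

```python
from typing import List, Literal, Union

N_ERA5_MODEL_LEVELS = 137


def resolve_levels(levels: Union[List[int], Literal["All"]]) -> List[int]:
    """Expand the 'All' shorthand to ERA5 model levels 1..137 (hypothetical)."""
    if levels == "All":
        return list(range(1, N_ERA5_MODEL_LEVELS + 1))
    return list(levels)


all_levels = resolve_levels("All")
print(len(all_levels), all_levels[0], all_levels[-1])  # 137 1 137
```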
- restart: bool
Whether to restart from previous run.
If set to true, this will attempt to read the output from a previous tracking run and continue from there. The output from the previous timestep must be available for this to work. TODO: test this more extensively
For example:
restart: False
- periodic_boundary: bool
Whether to use periodic boundaries in the zonal direction.
This should be used when working with global datasets.
For example:
periodic_boundary: true
- kvf: float
Net-to-gross vertical flux multiplication parameter. TODO: test this more extensively
For example:
kvf: 3
- level_layer_boundary: int
Which level to use for the layer boundary.
Defaults to layer number 111 (ERA5). The pressure at this boundary can be found in `WAM2layers/src/wam2layers/preprocessing/tableERA5model_to_pressure.csv`.
TODO: Check if this is a reasonable choice for boundary
Any layer numbers greater than the specified one will be included in the lower layer.
For example:
level_layer_boundary: 111
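Applied to the example levels list above, the rule “layer numbers greater than the boundary go to the lower layer” splits the levels as in this sketch. split_layers is a hypothetical helper, and placing the boundary level itself in the upper layer is an assumption that follows from the wording “greater than”.

```python
from typing import List, Tuple


def split_layers(levels: List[int], level_layer_boundary: int = 111) -> Tuple[List[int], List[int]]:
    """Return (upper, lower); levels numbered above the boundary form the lower layer."""
    upper = [lev for lev in levels if lev <= level_layer_boundary]
    lower = [lev for lev in levels if lev > level_layer_boundary]
    return upper, lower


levels = [20, 40, 60, 80, 90, 95, 100, 105, 110, 115, 120, 123, 125,
          128, 130, 131, 132, 133, 134, 135, 136, 137]
upper, lower = split_layers(levels)
print(len(upper), len(lower))  # 9 13
```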
- pressure_boundary_factor: float
Pressure level boundary multiplication factor.
The layer boundary is placed at:
P_boundary = A * P_surf + B
where P_surf is the air pressure at the surface (at this location and time), A is the pressure_boundary_factor, and B is the pressure_boundary_offset.
Any pressure levels above this point end up in the upper layer; the others end up in the lower layer.
The default, together with pressure_boundary_offset, corresponds to layer number 111 (ERA5). The pressure at this boundary can be found in `WAM2layers/src/wam2layers/preprocessing/tableERA5model_to_pressure.csv`.
For example:
pressure_boundary_factor: 0.72878581
- pressure_boundary_offset: float
Pressure level boundary offset.
The layer boundary is placed at:
P_boundary = A * P_surf + B
where P_surf is the air pressure at the surface (at this location and time), A is the pressure_boundary_factor, and B is the pressure_boundary_offset.
Any pressure levels above this point end up in the upper layer; the others end up in the lower layer.
The default, together with pressure_boundary_factor, corresponds to layer number 111 (ERA5). The pressure at this boundary can be found in `WAM2layers/src/wam2layers/preprocessing/tableERA5model_to_pressure.csv`.
For example:
pressure_boundary_offset: 7438.803223
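Plugging the default values into P_boundary = A * P_surf + B gives a feel for where the boundary lies. This sketch assumes pressures in Pa and uses standard sea-level pressure (101325 Pa) as an example surface pressure; the helper names are hypothetical. Levels at or below the boundary (higher pressure) go to the lower layer, as described above.

```python
def boundary_pressure(p_surf: float, factor: float = 0.72878581, offset: float = 7438.803223) -> float:
    """Pressure (Pa) of the layer boundary: A * P_surf + B."""
    return factor * p_surf + offset


def in_lower_layer(pressure: float, p_surf: float) -> bool:
    """Levels above the boundary (lower pressure) belong to the upper layer."""
    return pressure >= boundary_pressure(p_surf)


p_b = boundary_pressure(101325.0)  # example surface pressure in Pa
print(round(p_b))  # 81283
print(in_lower_layer(85000.0, 101325.0), in_lower_layer(50000.0, 101325.0))  # True False
```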
- classmethod from_yaml(config_file)
Read settings from a configuration.yaml file.
For example:
from wam2layers.config import Config
config = Config.from_yaml('../../cases/floodcase_2021.yaml')
- convert_dates()
- check_date_order()
- to_file(fname: str | Path) → None
Export the configuration to a file.
Note that any comments and formatting from an original YAML file are lost.
- to_string()
Pydantic can’t serialize the dates. We convert these to string manually.
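The idea of converting dates to strings before serialization can be sketched with the standard library. This is an illustration only: WAM2layers stores the dates as cftime objects and writes YAML, whereas this sketch uses datetime and json, and settings_to_string is a hypothetical helper.

```python
import json
from datetime import datetime


def settings_to_string(settings: dict) -> str:
    """Serialize settings after converting datetime values to 'YYYY-MM-DDTHH:MM' strings."""
    converted = {
        key: value.strftime("%Y-%m-%dT%H:%M") if isinstance(value, datetime) else value
        for key, value in settings.items()
    }
    return json.dumps(converted, indent=2)


text = settings_to_string(
    {"tracking_direction": "backward", "tracking_start_date": datetime(2021, 7, 1)}
)
print(text)
```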