Dataset Processor Module

dataset_processor.deseasonalize_dataset(dataset)

Calculate daily climatology and subtract the climatology from the input dataset

Parameters:dataset (dataset.Dataset) – The dataset to deseasonalize.
Returns:A Dataset with the daily climatology subtracted from its values.
Return type:dataset.Dataset
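
The idea of subtracting a daily climatology can be illustrated with a minimal standard-library sketch. This is not the OCW implementation: the helper name `deseasonalize` and the use of plain lists in place of a Dataset are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date


def deseasonalize(times, values):
    """Subtract the calendar-day mean from each value (hypothetical helper)."""
    # Group values by (month, day) so that, e.g., every 1 January shares one mean.
    groups = defaultdict(list)
    for t, v in zip(times, values):
        groups[(t.month, t.day)].append(v)
    climatology = {key: sum(vals) / len(vals) for key, vals in groups.items()}
    # Anomaly = value minus the climatological mean for that calendar day.
    return [v - climatology[(t.month, t.day)] for t, v in zip(times, values)]


times = [date(2000, 1, 1), date(2000, 1, 2), date(2001, 1, 1), date(2001, 1, 2)]
values = [10.0, 20.0, 14.0, 30.0]
anomalies = deseasonalize(times, values)
```

Each returned value is the departure from the mean of all samples that fall on the same calendar day.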
dataset_processor.ensemble(datasets)

Generate a single dataset which is the mean of the input datasets

An ensemble dataset combines the input datasets, assuming they all have similar shape, dimensions, and units.

Parameters:datasets – Datasets from which to compose the ensemble dataset. All Datasets must be the same shape.
Return type:dataset.Dataset
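
A rough sketch of the ensemble mean follows, using plain lists as stand-ins for the datasets' value arrays. The function name `ensemble_mean` is hypothetical and the shape check is an assumption about how a mismatch might be reported, not the library's documented behaviour.

```python
def ensemble_mean(arrays):
    """Element-wise mean of equally sized 1-D value lists (hypothetical helper)."""
    # The inputs must agree in shape before they can be averaged point by point.
    lengths = {len(a) for a in arrays}
    if len(lengths) != 1:
        raise ValueError("all inputs must have the same shape")
    # zip(*arrays) yields one tuple per grid point, holding one value per input.
    return [sum(column) / len(arrays) for column in zip(*arrays)]


model_a = [1.0, 2.0, 3.0]
model_b = [3.0, 4.0, 5.0]
mean_values = ensemble_mean([model_a, model_b])
```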
dataset_processor.mask_missing_data(dataset_array)

Check for missing values in observation and model datasets. If any dataset in dataset_array has a missing value at a grid point, the values at that grid point are masked in all other datasets.

Parameters:dataset_array – An array of OCW datasets.
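
The masking rule described above can be sketched as follows. This illustration assumes missing values are represented by `None` in plain lists; OCW datasets would normally carry masked arrays, so treat the helper `mask_common` as a hypothetical stand-in.

```python
def mask_common(arrays):
    """Mask a grid point everywhere if it is missing anywhere (hypothetical helper)."""
    n_points = len(arrays[0])
    # A point is flagged when at least one input has no value there.
    missing = [any(a[i] is None for a in arrays) for i in range(n_points)]
    # Apply the combined mask to every input.
    return [[None if m else v for v, m in zip(a, missing)] for a in arrays]


obs = [1.0, None, 3.0, 4.0]
model = [1.5, 2.5, None, 4.5]
masked_obs, masked_model = mask_common([obs, model])
```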

dataset_processor.normalize_dataset_datetimes(dataset, timestep)

Normalize Dataset datetime values.

Force daily data to a time value of 00:00:00. Force monthly data to the first of the month at midnight.

Parameters:
  • dataset (dataset.Dataset) – The Dataset which will have its time value normalized.
  • timestep (string) – The timestep of the Dataset’s values. Either ‘daily’ or ‘monthly’.
Returns:

A new Dataset with normalized datetime values.

Return type:

dataset.Dataset
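
The normalization rule can be shown with the standard `datetime` module alone. This is a sketch of the rule as described, applied to a bare list of datetimes rather than a Dataset; the helper name `normalize_times` is an assumption.

```python
from datetime import datetime


def normalize_times(times, timestep):
    """Snap datetimes to midnight (daily) or to the first of the month (monthly)."""
    if timestep == "daily":
        return [t.replace(hour=0, minute=0, second=0, microsecond=0) for t in times]
    if timestep == "monthly":
        return [
            t.replace(day=1, hour=0, minute=0, second=0, microsecond=0) for t in times
        ]
    raise ValueError("timestep must be 'daily' or 'monthly'")


raw = [datetime(2000, 3, 15, 12, 30)]
daily = normalize_times(raw, "daily")
monthly = normalize_times(raw, "monthly")
```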

dataset_processor.safe_subset(target_dataset, subregion, subregion_name=None)

Safely subset the given dataset with subregion information. A standard subset requires that the provided subregion be entirely contained within the dataset's bounds. safe_subset returns the overlap of the subregion and dataset without raising an error.

Parameters:
  • subregion (dataset.Bounds) – The Bounds with which to subset the target Dataset.
  • target_dataset (dataset.Dataset) – The Dataset object to subset.
  • subregion_name (string) – The subset-ed Dataset name
Returns:

The subset-ed Dataset object

Return type:

dataset.Dataset
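
The "overlap" behaviour can be pictured as clipping the requested box to the dataset's extent. The sketch below uses plain `(lat_min, lat_max, lon_min, lon_max)` tuples instead of `dataset.Bounds`, and the helper `clip_bounds` is hypothetical. It does not handle boxes that fail to overlap at all.

```python
def clip_bounds(requested, available):
    """Overlap of two (lat_min, lat_max, lon_min, lon_max) boxes (hypothetical helper)."""
    # Lower edges move up to the dataset's minimum; upper edges move down to its maximum.
    lat_min = max(requested[0], available[0])
    lat_max = min(requested[1], available[1])
    lon_min = max(requested[2], available[2])
    lon_max = min(requested[3], available[3])
    return (lat_min, lat_max, lon_min, lon_max)


dataset_extent = (0.0, 40.0, 20.0, 100.0)
requested = (-10.0, 30.0, 10.0, 120.0)
overlap = clip_bounds(requested, dataset_extent)
```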

dataset_processor.spatial_regrid(target_dataset, new_latitudes, new_longitudes, boundary_check=True)

Regrid a Dataset using the new latitudes and longitudes

Parameters:
  • target_dataset (dataset.Dataset) – Dataset object to be spatially regridded
  • new_latitudes (numpy.ndarray) – Array of latitudes
  • new_longitudes (numpy.ndarray) – Array of longitudes
  • boundary_check (bool) – Check whether the regridding domain’s boundaries are outside target_dataset’s domain
Returns:

A new spatially regridded Dataset

Return type:

dataset.Dataset
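
One plausible form of the boundary check is to compare the extent of the new grid with the extent of the original grid. The sketch below is an assumption about what such a check could look like, not the library's actual logic, and `regrid_exceeds_domain` is a hypothetical name.

```python
def regrid_exceeds_domain(new_lats, new_lons, old_lats, old_lons):
    """Return True if the new grid extends beyond the original grid's extent."""
    return (
        min(new_lats) < min(old_lats)
        or max(new_lats) > max(old_lats)
        or min(new_lons) < min(old_lons)
        or max(new_lons) > max(old_lons)
    )


old_lats = [0.0, 1.0, 2.0]
old_lons = [10.0, 11.0, 12.0]
inside = regrid_exceeds_domain([0.5, 1.5], [10.5, 11.5], old_lats, old_lons)
outside = regrid_exceeds_domain([-1.0, 1.0], [10.5, 11.5], old_lats, old_lons)
```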

dataset_processor.subset(target_dataset, subregion, subregion_name=None, extract=True, user_mask_values=[1])

Subset given dataset(s) with subregion information

Parameters:
  • subregion (dataset.Bounds) – The Bounds with which to subset the target Dataset.
  • target_dataset (dataset.Dataset) – The Dataset object to subset.
  • subregion_name (string) – The subset-ed Dataset name
  • extract (boolean) – If False, the data inside the subregion will be masked.
  • user_mask_values (int) – Grid points where mask_variable == user_mask_value will be extracted or masked.
Returns:

The subset-ed Dataset object

Return type:

dataset.Dataset

Raises:

ValueError

dataset_processor.temperature_unit_conversion(dataset)

Convert temperature units as necessary. Automatically converts Celsius to Kelvin in the given dataset.

Parameters:dataset (dataset.Dataset) – The dataset for which units should be updated.
Returns:The dataset with (potentially) updated units.
Return type:dataset.Dataset
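
The conversion itself is a fixed offset of 273.15. A minimal sketch follows, assuming the units are carried as a string. The accepted unit labels and the helper name `celsius_to_kelvin` are hypothetical and are not taken from the library.

```python
def celsius_to_kelvin(values, units):
    """Convert to Kelvin only when the units label indicates Celsius."""
    # Hypothetical set of labels treated as Celsius for this illustration.
    if units.strip().lower() in ("c", "degc", "celsius"):
        return [v + 273.15 for v in values], "K"
    # Anything else is returned unchanged, hence "potentially" updated units.
    return list(values), units


converted, new_units = celsius_to_kelvin([0.0, 25.0], "degC")
unchanged, same_units = celsius_to_kelvin([280.0], "K")
```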
dataset_processor.temporal_rebin(target_dataset, temporal_resolution)

Rebin a Dataset to a new temporal resolution

Parameters:
  • target_dataset (dataset.Dataset) – Dataset object to be temporally rebinned
  • temporal_resolution (string) – The new temporal resolution
Returns:

A new temporally rebinned Dataset

Return type:

dataset.Dataset

dataset_processor.temporal_rebin_with_time_index(target_dataset, nt_average)

Rebin a Dataset to a new temporal resolution

Parameters:
  • target_dataset (dataset.Dataset) – Dataset object to be temporally rebinned
  • nt_average – Time resolution for the output dataset, given as the number of time indices to be averaged. (Length of time dimension in the rebinned dataset) = (original time dimension length / nt_average)
Returns:

A new temporally rebinned Dataset

Return type:

dataset.Dataset
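
The relationship between nt_average and the output length can be illustrated with a block average over a plain list. This is a sketch only: `rebin_by_index` is a hypothetical helper, and dropping a trailing partial block is an assumption the source does not specify.

```python
def rebin_by_index(values, nt_average):
    """Average consecutive blocks of nt_average time steps (hypothetical helper)."""
    # Output length = original length / nt_average (integer division;
    # any trailing partial block is dropped in this sketch).
    n_out = len(values) // nt_average
    return [
        sum(values[i * nt_average:(i + 1) * nt_average]) / nt_average
        for i in range(n_out)
    ]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
rebinned = rebin_by_index(series, 3)
```

Six input time steps averaged three at a time yield two output time steps.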

dataset_processor.temporal_slice(target_dataset, start_time, end_time)

Temporally slice the given dataset(s) between a start time and an end time. This does not spatially subset the target dataset.

Parameters:
  • start_time (int) – start time
  • end_time (datetime.datetime) – end time
  • target_dataset (dataset.Dataset) – The Dataset object to subset.
Returns:

The subset-ed Dataset object

Return type:

dataset.Dataset

Raises:

ValueError
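
A time slice can be sketched as a filter on the time axis. The example below assumes both limits are `datetime` objects, treats the range as inclusive, and raises ValueError when the limits are reversed. All three are assumptions, since the source does not state the condition under which ValueError is raised, and `slice_by_time` is a hypothetical name.

```python
from datetime import datetime


def slice_by_time(times, values, start_time, end_time):
    """Keep (time, value) pairs with start_time <= time <= end_time."""
    if start_time > end_time:
        raise ValueError("start_time must not be after end_time")
    return [(t, v) for t, v in zip(times, values) if start_time <= t <= end_time]


times = [datetime(2000, m, 1) for m in range(1, 7)]
values = [10, 20, 30, 40, 50, 60]
sliced = slice_by_time(times, values, datetime(2000, 2, 1), datetime(2000, 4, 1))
```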

dataset_processor.temporal_subset(target_dataset, month_start, month_end, average_each_year=False)

Temporally subset data to the months from month_start to month_end.

Parameters:
  • month_start (int) – An integer for beginning month (Jan=1)
  • month_end (int) – An integer for ending month (Jan=1)
  • target_dataset (Open Climate Workbench Dataset Object) – Dataset object to be temporally subset
  • average_each_year (boolean) – If True, output dataset is averaged for each year
Returns:

A temporal subset OCW Dataset

Return type:

Open Climate Workbench Dataset Object
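
Month-based subsetting, with optional per-year averaging, can be sketched on a plain time series. The helper `subset_months` is hypothetical, and this sketch assumes month_start <= month_end; it makes no claim about how the library treats ranges that wrap around the end of the year.

```python
from datetime import date


def subset_months(times, values, month_start, month_end, average_each_year=False):
    """Keep samples whose month lies in [month_start, month_end] (Jan=1)."""
    kept = [(t, v) for t, v in zip(times, values)
            if month_start <= t.month <= month_end]
    if not average_each_year:
        return kept
    # Collapse the kept samples to one mean value per calendar year.
    by_year = {}
    for t, v in kept:
        by_year.setdefault(t.year, []).append(v)
    return [(year, sum(vs) / len(vs)) for year, vs in sorted(by_year.items())]


times = [date(y, m, 1) for y in (2000, 2001) for m in range(1, 13)]
values = [float(i) for i in range(1, 25)]
summer = subset_months(times, values, 6, 8)
summer_means = subset_months(times, values, 6, 8, average_each_year=True)
```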

dataset_processor.variable_unit_conversion(dataset)

Convert water flux or temperature variable units as necessary

For water flux variables, convert full SI water flux units to more common units. For temperature, convert Celsius to Kelvin.

Parameters:dataset (dataset.Dataset) – The dataset to convert.
Returns:A Dataset with values converted to new units.
Return type:dataset.Dataset
dataset_processor.water_flux_unit_conversion(dataset)

Convert water flux variable units as necessary

Convert full SI water flux units to more common units.

Parameters:dataset (dataset.Dataset) – The dataset to convert.
Returns:A Dataset with values converted to new units.
Return type:dataset.Dataset
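
The source does not name the units involved. As an illustration only, the sketch below assumes the SI flux is given in kg m-2 s-1 and the "more common" unit is mm/day; for liquid water, 1 kg m-2 corresponds to a 1 mm layer, so the factor is the number of seconds in a day. The helper name and the unit strings are hypothetical.

```python
SECONDS_PER_DAY = 86400


def water_flux_to_mm_per_day(values, units):
    """Convert kg m-2 s-1 to mm/day; leave other units untouched (assumed rule)."""
    if units == "kg m-2 s-1":
        return [v * SECONDS_PER_DAY for v in values], "mm/day"
    return list(values), units


flux, flux_units = water_flux_to_mm_per_day([1.0e-5, 0.0], "kg m-2 s-1")
```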
dataset_processor.write_netcdf(dataset, path, compress=True)

Write a dataset to a NetCDF file.

Parameters: