Evaluation Module

class evaluation.Evaluation(reference, targets, metrics, subregions=None)

Container for running an evaluation.

An Evaluation is the running of one or more metrics on one or more target datasets and a reference dataset, which is optional unless binary metrics are used. An Evaluation can handle two types of metrics, unary and binary. The validity of an Evaluation depends on the number and type of its metrics as well as the number of datasets.

A unary metric is a metric that runs over a single dataset. If you add a unary metric to the Evaluation, you only need to add a reference dataset or a target dataset. If there are multiple datasets in the Evaluation, the unary metric is run over all of them.

A binary metric is a metric that runs over a reference dataset and a target dataset. If you add a binary metric, you are required to add a reference dataset and at least one target dataset. Binary metrics are run over every (reference dataset, target dataset) pair in the Evaluation.

An Evaluation must have at least one metric to be valid.
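
For illustration, a minimal sketch of these rules follows. The variables ref_ds and target_ds stand for previously loaded dataset.Dataset instances, and metrics.Bias (binary) and metrics.TemporalStdDev (unary) are assumed metric classes; substitute whichever metrics.Metric subclasses your installation provides.

    # Module paths follow the names used in this documentation; adjust
    # to your package layout if they differ.
    import evaluation
    import metrics

    # Binary metric: requires a reference dataset AND at least one target.
    binary_eval = evaluation.Evaluation(ref_ds, [target_ds], [metrics.Bias()])

    # Unary metric: a single dataset (reference or target) is enough; the
    # metric runs over every dataset present in the Evaluation.
    unary_eval = evaluation.Evaluation(ref_ds, [], [metrics.TemporalStdDev()])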

Default Evaluation constructor.

Parameters:
  • reference (dataset.Dataset) – The reference Dataset for the evaluation.
  • targets (list of dataset.Dataset) – A list of one or more target datasets for the evaluation.
  • metrics (list of metrics.Metric) – A list of one or more Metric instances to run in the evaluation.
  • subregions (list of dataset.Bounds) – (Optional) Subregion information to use in the evaluation. A subregion is specified with a Bounds object.
Raises: ValueError
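
A hedged construction example follows; the Bounds keyword arguments shown are assumptions about its signature, so check dataset.Bounds before copying.

    # Sketch: constructing an Evaluation with one subregion. ref_ds and
    # target_ds are assumed dataset.Dataset instances; metrics.Bias is an
    # assumed binary metric class; the Bounds kwargs are assumptions.
    import dataset

    subregion = dataset.Bounds(lat_min=30, lat_max=45,
                               lon_min=-15, lon_max=15)
    my_eval = evaluation.Evaluation(ref_ds, [target_ds],
                                    [metrics.Bias()],
                                    subregions=[subregion])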

add_dataset(target_dataset)

Add a Dataset to the Evaluation.

A target Dataset is compared against the reference dataset when the Evaluation is run with one or more metrics.

Parameters: target_dataset (dataset.Dataset) – The target Dataset to add to the Evaluation.
Raises: ValueError – If target_dataset isn’t an instance of dataset.Dataset.
add_datasets(target_datasets)

Add multiple Datasets to the Evaluation.

Parameters: target_datasets (list of dataset.Dataset) – The list of target Datasets to add to the Evaluation.
Raises: ValueError – If any dataset to add isn’t an instance of dataset.Dataset.
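
For illustration, a short sketch (extra_ds and more_ds are assumed dataset.Dataset instances loaded elsewhere):

    # Targets can be added after construction, singly or in bulk.
    my_eval.add_dataset(extra_ds)
    my_eval.add_datasets([extra_ds, more_ds])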
add_metric(metric)

Add a metric to the Evaluation.

A metric is an instance of a class which inherits from metrics.Metric.

Parameters: metric (metrics.Metric) – The metric instance to add to the Evaluation.
Raises: ValueError – If metric isn’t an instance of a class that inherits from metrics.Metric.
add_metrics(metrics)

Add multiple metrics to the Evaluation.

A metric is an instance of a class which inherits from metrics.Metric.

Parameters: metrics (list of metrics.Metric) – The list of metric instances to add to the Evaluation.
Raises: ValueError – If any metric to add isn’t an instance of a class that inherits from metrics.Metric.
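
For illustration, a short sketch (the metric class names are assumptions; any metrics.Metric subclass works):

    # Metrics can likewise be added after construction, singly or in bulk.
    my_eval.add_metric(metrics.Bias())
    my_eval.add_metrics([metrics.Bias(), metrics.TemporalStdDev()])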
metrics = None

The list of “binary” metrics (a metric that takes two Datasets) that the Evaluation should use.

results = None

A list containing the results of running the binary metric evaluations. The shape of results is (num_target_datasets, num_metrics) if the user doesn’t specify subregion information. Otherwise the shape is (num_target_datasets, num_metrics, num_subregions).
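
For example, given the shape above, the result of metric j on target dataset i is read as sketched below (a run without subregions is assumed for the first line):

    # Sketch: indexing results after my_eval.run().
    first_result = my_eval.results[0][0]       # (target 0, metric 0)
    # With subregions, a third index selects the subregion:
    # sub_result = my_eval.results[0][0][0]    # (target 0, metric 0, subregion 0)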

run()

Run the evaluation.

There are two phases to a run of the Evaluation. First, if there are any “binary” metrics they are run through the evaluation. Binary metrics are only run if there is a reference dataset and at least one target dataset.

If subregion information is provided, each dataset is subset before being run through the binary metrics.

Note: Only the binary metrics are subset with subregion information.

Next, if there are any “unary” metrics they are run. Unary metrics are only run if there is at least one target dataset or a reference dataset.
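
A minimal end-to-end sketch of the two phases, under the same assumptions as above (ref_ds and target_ds are dataset.Dataset instances; the metric class names are assumptions):

    # Binary metrics run over (reference, target) pairs; unary metrics
    # run over every dataset present.
    my_eval = evaluation.Evaluation(ref_ds, [target_ds],
                                    [metrics.Bias(), metrics.TemporalStdDev()])
    my_eval.run()
    print(my_eval.results)        # binary metric results
    print(my_eval.unary_results)  # unary metric results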

target_datasets = None

The target dataset(s) which should each be compared with the reference dataset when the evaluation is run.

unary_metrics = None

The list of “unary” metrics (a metric that takes one Dataset) that the Evaluation should use.

unary_results = None

A list containing the results of running the unary metric evaluations. The shape of unary_results is (num_targets, num_metrics), where num_targets = num_target_datasets + (1 if ref_dataset is not None else 0).
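
As a sketch of that bookkeeping (attribute names follow the text above):

    # Size of the first axis of unary_results: every target dataset, plus
    # one slot for the reference dataset when it is present.
    num_target_ds = len(my_eval.target_datasets)
    num_targets = num_target_ds + (1 if my_eval.ref_dataset is not None else 0)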