# Benchmarks

The SANGOMA proposal starts with the observation that the majority of current MyOcean products (implementing the GMES marine core service for ocean monitoring and forecasting) are based on suboptimal assimilation methods, which provide only limited information about the uncertainties in the model nowcast/forecast. To go beyond the current situation, the main objective of SANGOMA is to advance the status of probabilistic assimilation methods and their applicability to operational MyOcean systems. This requires (i) establishing a European network of experts in probabilistic data assimilation, and (ii) providing harmonized access to state-of-the-art concepts, algorithms and software (often developed through individual efforts). To set up an effective connection between the SANGOMA partners and the MyOcean consortium, a key element of the project is to assess the performance of the methods in a variety of test cases, including realistic assimilation problems. The implementation of these test cases (benchmarks) is the purpose of WP4, and the objective of this webpage is to document the standard data assimilation problems defining the SANGOMA benchmarks.

Usually, assimilation methods are first developed and tested using simple, idealized assimilation problems. This is important to check the mathematical consistency of the method, and to understand how it works without being blinded by real-world approximations. On the other hand, the main purpose of assimilation methods is to solve real-world problems, so it is also important to evaluate their performance on problems of higher complexity, and their robustness to unavoidable approximations. This is why the SANGOMA project includes a hierarchy of benchmarks of increasing complexity: (i) a small case benchmark, based on the Lorenz-96 model with 40 variables, (ii) a medium case benchmark, based on an idealized square ocean model, and (iii) a large case benchmark, based on a realistic North Atlantic model at 1/4° resolution. To be compliant with most MyOcean systems, the last two benchmarks are based on the NEMO model (see the description of the model configurations below).
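To give a concrete idea of the small case forward model, here is a minimal sketch of the Lorenz-96 equations, dx_i/dt = (x_{i+1} - x_{i-2}) x_{i-1} - x_i + F with cyclic indices, integrated with a fourth-order Runge-Kutta scheme. Only the number of variables (40) comes from the benchmark description above. The forcing F = 8, the time step of 0.05, the Runge-Kutta integrator and all function names are assumptions chosen for illustration, and may differ from the actual benchmark configuration.

```python
# Illustrative sketch only: forcing, time step and integrator are assumed,
# not taken from the SANGOMA benchmark specification.

def lorenz96_tendency(x, forcing=8.0):
    """Right-hand side of the Lorenz-96 equations with cyclic boundaries."""
    n = len(x)
    # Negative list indices wrap around, which handles x[i-1] and x[i-2];
    # the modulo handles x[i+1] at the upper end.
    return [(x[(i + 1) % n] - x[i - 2]) * x[i - 1] - x[i] + forcing
            for i in range(n)]


def rk4_step(x, dt=0.05, forcing=8.0):
    """Advance the state by one time step with classical 4th-order Runge-Kutta."""
    def shifted(scale, k):
        # Returns x + scale * k, element by element.
        return [xi + scale * ki for xi, ki in zip(x, k)]

    k1 = lorenz96_tendency(x, forcing)
    k2 = lorenz96_tendency(shifted(dt / 2.0, k1), forcing)
    k3 = lorenz96_tendency(shifted(dt / 2.0, k2), forcing)
    k4 = lorenz96_tendency(shifted(dt, k3), forcing)
    return [xi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]


def integrate(x0, n_steps, dt=0.05, forcing=8.0):
    """Integrate from x0 for n_steps time steps and return the final state."""
    x = list(x0)
    for _ in range(n_steps):
        x = rk4_step(x, dt, forcing)
    return x


if __name__ == "__main__":
    n_variables = 40                 # size of the small case benchmark
    state = [8.0] * n_variables      # unstable equilibrium x_i = F
    state[0] += 0.01                 # small perturbation to trigger chaos
    final_state = integrate(state, n_steps=1000)
    print(len(final_state), min(final_state), max(final_state))
```

Starting from the equilibrium x_i = F with a small perturbation on one variable is a common way to spin the model up onto its chaotic attractor before running assimilation experiments.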

The definition of each benchmark must specify (i) the forward model that is used to describe the system (see the description of the model configurations below), and (ii) the inverse problem that must be solved (see benchmark specifications). Only in a second step can the various methods be compared according to their relative merits in terms of reliability (in the description of the prior and posterior probability distributions), resolution (information gain or uncertainty reduction), complexity (e.g. number of free parameters that must be tuned by the user), numerical cost, etc. This is why each benchmark requires defining appropriate metrics to measure the merits of every method (see the definition of the metrics below). Nevertheless, at this stage of the project, these definitions and metrics must remain flexible enough to be adjusted to the specificities of each inverse problem (which still need to be fully specified in the next deliverable).

# Detailed definition of the benchmarks

The definition of the benchmarks includes (i) the description of the model configurations, (ii) the detailed specification of the assimilation problem, (iii) the definition of a set of metrics to assess the performance of the assimilation systems, and (iv) the evaluation of the results of the experiments: