Friday 28 July 2017

Finding the Perfect Place to Put New Measurement Stations

One of the major outputs of this project was an "Optimal Network Design" for the placement of new CO2 measurement stations. South Africa already has a long-term monitoring station at Cape Point, where various atmospheric "species", including CO2, are continually monitored. This station is located at one of the most beautiful spots in South Africa, and if you ever get the opportunity to visit Cape Point in the Table Mountain National Park, I would highly recommend it. The purpose of the station is to obtain background levels of the different substances it's geared to measure. It's run by the most wonderful bunch of scientists from the South African Weather Service.


Information about Cape Point GAW Station


The job of the new stations would be to provide information about continental CO2. From what we think we know about where fossil fuel emissions originate and how much is being emitted, and about the amount of CO2 taken up and respired by vegetation, we obtained the prior (or initial, best-guess) estimates of the different South African emission sources we were interested in. We then hooked these up to a Lagrangian particle dispersion model (a fancy name for a physics model which uses various bits of climate information, like wind speed and temperature, to drive parcels of air around), which could then tell us, for a particular location, what information we could hope to obtain about the major emission sources (and therefore how much we could reduce our uncertainty in what we thought we knew about the emissions) if a new station were located in that position.
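For the statistically minded, here's a rough sketch of the kind of calculation hiding behind that last sentence. This is not our actual code - the matrices and names are purely illustrative - but it shows the standard Bayesian update that turns a transport model's sensitivities into an uncertainty reduction score for a candidate site:

```python
import numpy as np

def uncertainty_reduction(H, prior_cov, obs_error_cov):
    """Score a candidate site by the flux uncertainty it would remove.

    H             : (n_obs, n_sources) sensitivity matrix, i.e. the
                    Lagrangian model's footprints for the candidate site
    prior_cov     : (n_sources, n_sources) prior flux error covariance
    obs_error_cov : (n_obs, n_obs) measurement error covariance
    """
    # Standard Bayesian posterior covariance:
    # A = (H^T R^-1 H + B^-1)^-1
    B_inv = np.linalg.inv(prior_cov)
    R_inv = np.linalg.inv(obs_error_cov)
    post_cov = np.linalg.inv(H.T @ R_inv @ H + B_inv)
    # Fraction of the total prior uncertainty removed by the new observations
    return 1.0 - np.sqrt(np.trace(post_cov)) / np.sqrt(np.trace(prior_cov))
```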

From a statistical/mathematical point of view, the interesting things about this task were the different optimisation techniques available (like the Genetic Algorithm and Incremental Optimisation), and the effect that changing the spatial resolution of the information would have on the results of the optimisation.

Of all the factors we considered, the spatial resolution had the most profound impact on the solution. The spatial resolution determines something called "aggregation error". This error results from the fact that we are using block-averaged quantities, like wind speed and surface fluxes, to represent an entire block in the grid, whose size of course depends on the resolution. If, for example, the particles being carried by the Lagrangian particle dispersion model skirt the boundary of a block, the information they carry away with them will be the average information of that block. If the section that was skirted is very different from the rest of the block (e.g. the particles skirt over a section of busy road, whereas the rest of the block is covered in natural vegetation), then the information we think we're taking away will look like biogenic activity (and our model will try to correct information about biogenic activity), when in reality we're seeing a lot more fossil fuel information than the model thinks. That's why the higher the spatial resolution that can be managed, the better for an optimal network design, as this reduces the amount of aggregation error (which is really difficult to quantify). The main limit on the spatial resolution in these optimisation exercises is the amount of computational resources and time available.
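To make that concrete, here's a toy numerical illustration (with completely made-up numbers) of how block-averaging can misattribute what the particles actually sampled:

```python
import numpy as np

# Toy example: a coarse grid block containing mostly natural vegetation
# (small fluxes) plus one busy road (large fossil fuel flux).
# All values are purely illustrative.
fine_fluxes = np.array([0.1, 0.1, 0.1, 5.0])  # fourth sub-cell is the road

# The coarse model represents all four sub-cells by their mean...
coarse_flux = fine_fluxes.mean()  # 1.325 everywhere in the block

# ...so particles that only skirt the road sub-cell get attributed the
# block average instead of the much larger flux they actually sampled.
aggregation_error = fine_fluxes[3] - coarse_flux
print(f"model sees: {coarse_flux:.3f}, particles saw: {fine_fluxes[3]:.1f}, "
      f"error: {aggregation_error:.3f}")
```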

Another factor which impacts the computational resources needed is the optimisation algorithm used. We considered two: Incremental Optimisation and the Genetic Algorithm. Incremental Optimisation is pretty straightforward and is actually very handy for this type of optimisation task. It starts from a list of all the potential sites where new stations could be located. For every single location in the list, the algorithm calculates the uncertainty reduction (i.e. the information gained) of having a measurement station there, and then selects the site which gives the highest information gain. That's the first station sorted. It then goes through all the remaining sites again until it finds the one which gives the best information gain in combination with the station already in the solution set, and proceeds in this way until the network contains the required number of stations.
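In Python-flavoured pseudocode, the greedy procedure looks something like this (score_network is a hypothetical stand-in for the full uncertainty-reduction calculation sketched earlier):

```python
def incremental_optimisation(candidate_sites, n_stations, score_network):
    """Greedy (incremental) network design.

    candidate_sites : list of potential station locations
    n_stations      : number of stations required in the network
    score_network   : hypothetical function mapping a set of sites to
                      the total uncertainty reduction it achieves
    """
    network = []
    remaining = list(candidate_sites)
    while len(network) < n_stations:
        # Try every remaining site in combination with the stations
        # already chosen, and keep the one with the best score.
        best_site = max(remaining, key=lambda s: score_network(network + [s]))
        network.append(best_site)
        remaining.remove(best_site)
    return network
```

The catch with this approach is that early choices are locked in: a station that looks best on its own may not belong to the best overall set, which is exactly where the Genetic Algorithm comes in.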

The Genetic Algorithm is a little more complicated, but has the potential to find much more interesting solutions. Whereas Incremental Optimisation only ever considers one station at a time, and in a sequential manner, the Genetic Algorithm tries to solve for all five (in our case) stations at the same time. It does this by creating a large population of different five-station solutions. Each solution is assigned a fitness value (based on the information gained). Those with the highest fitness have the largest probability of surviving and reproducing in the next generation of solutions. The subsequent generation of solutions is then made up of the fittest members from the previous generation and the offspring of those solutions. New combinations are created using mutation and cross-over (all genetics terms), whereby parts of two solutions can randomly cross over, and an individual station in a solution can randomly mutate into another station from the list. This ensures diversity in the population of solutions. In our setup, we always ensured that the best solution of the previous generation made it through to the new generation intact (this is known as elitism). The solution with the best fitness after a set number of generations is then selected as the final solution.
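Here's a stripped-down sketch of a genetic algorithm along these lines. It's not our implementation - the population size, mutation rate and number of generations are placeholder values, and fitness is assumed to return a positive information-gain score:

```python
import random

def genetic_algorithm(candidate_sites, fitness, n_stations=5,
                      pop_size=100, n_generations=200, mutation_rate=0.05):
    """Toy GA for a five-station network (all parameter values illustrative)."""
    population = [random.sample(candidate_sites, n_stations)
                  for _ in range(pop_size)]
    for _ in range(n_generations):
        weights = [fitness(sol) for sol in population]
        elite = max(population, key=fitness)  # elitism: best always survives
        next_gen = [elite]
        while len(next_gen) < pop_size:
            # Fitness-proportional selection of two parent solutions
            a, b = random.choices(population, weights=weights, k=2)
            # Cross-over: take the head of one parent, fill in from the other
            cut = random.randrange(1, n_stations)
            tail = [s for s in b if s not in a[:cut]]
            child = a[:cut] + tail[:n_stations - cut]
            # Mutation: a station randomly turns into another from the list
            if random.random() < mutation_rate:
                new_site = random.choice(
                    [s for s in candidate_sites if s not in child])
                child[random.randrange(n_stations)] = new_site
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)
```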

If you want to read more about the optimal network solution for South Africa, it's available at Cape Town Optimisation Paper ACP.

Some of the results from the South African optimal network design (Nickless et al. 2015)

And at the beginning of September 2017 I'll get a chance to present some of the results specifically looking at the Genetic Algorithm and how it compares to Incremental Optimisation under different conditions, at the RSS 2017 conference. That is - the Royal Statistical Society Conference, as opposed to the Robotics: Science and Systems conference (which would have been cool too).