An example of scikit-learn and SciPy used for the analysis of extreme weather with Andre R. Erler

Presentation on Sunday from 3:10 PM to 3:20 PM in Room 1180.

There is considerable interest, both in the scientific community and among the public, in the effect of climate change on extreme weather. However, detecting changes in extreme weather events in the observational record is extremely difficult: extreme events are by definition rare, and the instrumental record is not long enough to establish robust statistics from a single station record.

In this talk I show how tools from the scientific Python software stack can be used to analyze precipitation (rainfall) data, overcome this problem, and detect changes in the observational record.

The analysis proceeds in two stages: first, a k-means clustering algorithm (sklearn.cluster) is used to aggregate data from different stations that have similar climatological characteristics; then a theoretical distribution function is fitted to the aggregated data (scipy.stats). The first step increases the number of data points available to constrain the fit in the second step, assuming all stations in the same cluster share the same underlying distribution. The second step serves to further reduce noise and to extrapolate the distribution to the most extreme quantiles. Finally, a statistical test (scipy.stats) can be used to detect changes and assess their statistical significance.
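The two stages and the final test can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the talk: the station data are synthetic, the clustering features (mean and standard deviation of annual maxima) are a simple stand-in for "climatological characteristics", and the choice of a generalized extreme value distribution (scipy.stats.genextreme) and a two-sample Kolmogorov-Smirnov test (scipy.stats.ks_2samp) is an assumption, since the abstract does not name the distribution or the test. The function names cluster_stations and analyze are hypothetical.

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans


def cluster_stations(samples, n_clusters, seed=0):
    """Stage 1: group stations by simple climatological features.

    samples has shape (n_stations, n_years). The features used here
    (mean and standard deviation) are an assumption for illustration.
    """
    features = np.column_stack([samples.mean(axis=1), samples.std(axis=1)])
    # Standardize so both features contribute equally to the distance.
    features = (features - features.mean(axis=0)) / features.std(axis=0)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return km.fit_predict(features)


def analyze(early, late, n_clusters=2, quantile=0.99):
    """Cluster stations, fit a distribution per cluster, test for change."""
    # Simplification: cluster membership is derived from the early period only.
    labels = cluster_stations(early, n_clusters)
    results = []
    for k in range(n_clusters):
        # Pool all stations in the cluster to enlarge the sample.
        pooled_early = early[labels == k].ravel()
        pooled_late = late[labels == k].ravel()
        # Stage 2: fit a GEV distribution (assumed choice) to each period.
        params_early = stats.genextreme.fit(pooled_early)
        params_late = stats.genextreme.fit(pooled_late)
        # Final step: two-sample KS test (assumed choice) for a change.
        ks = stats.ks_2samp(pooled_early, pooled_late)
        results.append({
            "cluster": k,
            "n_stations": int((labels == k).sum()),
            "q_early": float(stats.genextreme.ppf(quantile, *params_early)),
            "q_late": float(stats.genextreme.ppf(quantile, *params_late)),
            "p_value": float(ks.pvalue),
        })
    return labels, results


# Synthetic stand-in for annual precipitation maxima (not real station data):
# 40 stations in two regimes, 30 years per period, later period shifted upward.
rng = np.random.default_rng(0)
n_stations, n_years = 40, 30
regime = np.repeat([0, 1], n_stations // 2)
loc = np.where(regime == 0, 30.0, 60.0)[:, None]
scale = np.where(regime == 0, 8.0, 15.0)[:, None]
shape = (n_stations, n_years)
early = stats.genextreme.rvs(-0.1, loc=loc, scale=scale, size=shape,
                             random_state=rng)
late = stats.genextreme.rvs(-0.1, loc=1.15 * loc, scale=scale, size=shape,
                            random_state=rng)

labels, results = analyze(early, late)
for row in results:
    print(row)
```

Pooling stations within a cluster is what makes the fit usable: each station contributes only 30 values per period here, while each cluster contributes several hundred. The pooled test treats those values as independent draws from one distribution, which is the same assumption stated above for the fit.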

I will introduce the analysis algorithm using historical data from meteorological stations (Environment Canada), but I will also show how this technique can be applied to climate model projections of future climate change.

The analysis was conducted using the GeoPy analysis package, which is described in a separate talk. The package is available on GitHub: An extended abstract submitted to the Climate Informatics workshop in Boulder (September 24-25, 2015; 2 pages) is available here:

Andre R. Erler Bio

Andre is a young researcher and climate modeler; he runs regional and global climate models at the SciNet High Performance Computing facility and analyses their output. He uses Python and its scientific software stack for data handling (or "data plumbing"), analysis and visualization, and develops tools for these tasks. Andre is also interested in machine learning and the use of data science techniques in and outside of climate science, and is somewhat concerned about the state of software development in science.

He cares deeply about open source software, open science, the environment and sustainable global development.