Multiresolution analysis ( MRA ) classification of plurennial to multi-1 decadal climate drivers to streamflow in France using Wavelet 2 Transform and Geostatistical Euclidean Distance Clustering 3 4

Abstract. It is interesting to try and classify the spatial data to infer the spatial scales associated with certain characteristics within a group of time series. It is indeed very important for operational practices such as flood management and/or climate change adaptation to not only know the severity of a potential impact but also predict the spatial extent of the impact, since similar stations should suffer similar consequences. Classifications on streamflow have already been studied (Hannah etal., 2000; Renard et al., 2006; Wilson et al., 2013) and climate to rainfall or streamflow classification has also been realized by Boe and Terray (2008), Martinez et al. (2008), Martinez and Garavaglia et al. (2010) Garavaglia. Most of those studies establish classification based on annual hydrological regimes. This is in deep contrast with the study that focuses on low frequency time series. There are two axes that the present study aims at improving on. First most of the previous works were carried on raw signals while it is known that different scales of variability (TSV) characterize variabilities of geophysical signals. For instance, atmospheric pressure, precipitation, annual cycle, aquifer contribution or even tidal cycle variability may all be contributing to streamflow variabilities (Labat et al., 2000; Massei and Fournier, 2012). Several studies have highlighted the correlation between some time scales of variability in streamflow or rainfall and the corresponding scales in climatic fields (Massei et al., 2007; Massei et al., 2010; Feliks and Ghil, 2010; Fritier et al., 2012; Pinault, 2012). More specifically on streamflow, Boe and Habets (2013) have shown that multi-decadal variability in streamflow in France was impacted by large multi-decadal climate oscillation especially the Atlantic Multi-decadal Variability. Dieppois et al. (2013) Dieppois highlighted the importance of considering the different TSV in assessing the link between climate fields and local temperature and rainfall variability. Relation of hydroclimate time scales to spatial scales has also been recently also in vestigated by Laepple and Huybers (2014). Those studies highlight the importance of considering each time scale separately. Thus prior to doing a clustering analysis, the input signals must be decomposed into nearly independent components (as Palus (2014) and Jajcay et al. (2016) Jajcay pointed out, pure independence is rarely seen). The second axis is the classification method itself. Classifying fields is always difficult because the extension in two dimensions brings the notion of "general similarity" which is different from "local similarity" i.e. two fields may have differences pointwise but an otherwise similar global shape. To deal with it, fields are either represented as index (Boe and Habets, 2013; Lopez and Frances, 2013) which necessarily lose a lot of spatial information or a mean of taking into account the global shape has to be found like, for example, the Teweles-Wobus score. However the performance of such score is very variable dependent (it works well on pressure fields but less on other field variables) (Teweles and Wobus, 1954). This study presents a method called Geostatistical Euclidean Distance Clustering (GeoEDC) whose performance, by considering the field as an image, is less variable-dependent.

Equation 1 states that the signal can be both represented on this orthonormal basis (wavelet analysis), and perfectly reconstructed from its Hilbert representation (wavelet synthesis).
The wavelet functions families that form the orthonormal basis of the Hilbert space are defined as: As equation 2 states, the wavelet at a given scale and position is a scaled and translated version of the same wavelet from the next/previous scale.One can see that wavelet transforms are by nature multiresolution transforms like Fourier transforms but they differ in that the wavelet is not periodic.
This allows the wavelet transform to act as a time-frequency transform, providing not only information on the time scales (that can also be approximated as Fourier frequencies), but also on the temporal evolution of those scales., , the wavelet coefficients, express how the signal varies from one position to another, those position depending on the scale (Precise interpretation of wavelet coefficients depends on the wavelet function family chosen and can sometimes be very tedious).
The choice of the wavelet function family is important.This choice must be based on three parameters: the function itself (which conditions the interpretation of wavelet coefficients and the reconstruction process), the size of its support (the size where the wavelet value is non zero), and the number of its vanishing moments.The choice will mainly be a compromise between frequency and time resolution.A wavelet with a short support will have a very good time resolution (being able to detect singularities) but a bad frequency resolution and vice versa.A number of wavelet functions have been developed and more information can be found in the reference previously cited.
Using equation 1, wavelet transforms can be used mainly in two ways: data analysis and data reduction.The former involves studying wavelet coefficients to gather information about the variation of the data at different scales and times (equation 1 solved for , ), while the latter mainly involves decomposing the signal into a set of orthonormal components ( at chosen scales) and retaining only the most significant ones (in term of variability).Example of the first use is time series analysis while image/video compression is a common application of the second.
The wavelet method used in this study is the multiresolution analysis (MRA) with the aim of decomposing both streamflow and climate fields into components of increasing time scale of variability (TSV).Detailed theory explaining the multi resolution nature of wavelets can be found in Mallat(1989) and one of its most famous applications for computers in Percival and Walden (2000).The MRA analysis is based on equation 1, but the scales and positions are chosen so as to decrease the number of needed information as much as possible.The scales chosen are dyadic (i.e: power of 2), and positions are chosen so that the number of positions where the coefficients are calculated is decreased by 2 at each scale.This peculiar MRA technique is a discrete wavelet transform (DWT) process that satisfies the orthogonality of the basis both in translation and scale.It is also possible to use an undecimated discrete wavelet transform (MODWT) where the number of coefficients isn't down sampled at each scale.This kind of MRA loses orthogonality in scale, but allows for shift invariance of the coefficients.For this study the DWT was chosen as the MRA technique.The wavelet chosen for the MRA is a D4 (Daubechies 4 wavelet), which offers a good frequency-time resolution compromise.Both streamflow and climate data are the pixels.The idea is that pixels that are close do not exhibit strong differences in gray levels and that when it happens it is due to a deformation.Thus the method gives importance to differences of gray levels that are spatially far enough from each other.The aim there is to be able to recognize global patterns that are similar even if locally small distortions exist.Such a method is thus adapted to the clustering of climate fields whose impact on streamflow depends both on actual values (pressure, wind strength etc ), but also the shape of the fields.For example it is well known that the pressure field pattern changes in the North Atlantic often identified as "NAO+/-, Atlantic Ridge or Scandinavian Blocking weather regime", will dictate a large part of the climate variability over Western Europe.The IMED method was thus modified and adapted in this work into the GeoEDC method.The difference lies into the calculation of the distance between pixels which now becomes a distance between coordinates on the map and thus takes into account Earth curvature.
The basic steps are as following: The Euclidean distance between two correlation maps is calculated as in: ( , ), the distance between correlation map of station and correlation map of station can be calculated by the outer product of the two correlation maps , and the matrix of weights : is the matrix of distance between each coordinates.can be seen as a Gaussian ponderation of the variable (e.g.geopotentials correlation) difference between two coordinates based on their spatial distance.Instead of calculating the spatial distance using trigonometric functions the coordinates are translated into n-vectors.This prevents calculation errors when angles approach certain values.
A n-vector is a 3 dimensional vector defined as: When used on a computer, calculation of the spatial distance between two n-vectors is defined as: The spatial distance between a n-vector and n-vector , , is thus the product of radius of Earth and the angle between the two coordinates represented by the n-vectors, 2(| × |, .).
To sum up, due to matrix , neighbor's coordinates will see their variable difference decreased and far away coordinates will have it untouched.Extremely far away coordinates are also filtered.
The GeoEDC gives a distance matrix between correlation map, and an ascendant hierarchical clustering (AHC) is applied on the distance matrix using the complete metric distance.A silhouette optimization algorithm gives the best number of clusters for each climate driver by minimizing the inverse of the sum of the mean total silhouette width and the intra silhouette standard deviation.

Cluster maps of streamflow based on their climate drivers
Figure 3 shows the results of the GeoEDC applied to the correlation climate fields for the four climate drivers and for the three TSVs.
Clear spatial patterns are visible for all climate drivers as well as for all TSV even when the number of clusters is only two.
Even at the highest TSV clear spatial repartition shows up.
Inter climate driver comparison shows that clusters are not necessarily the same depending on the climate driver.This highlights the complexity of climate influence, even for relatively close stations.Nevertheless as the TSV increases the inter climate driver variability decreases, that is more stations share the same climate drivers.
Inter TSV comparison indicates that for a given climate driver the clustering changes quite significantly according to time scale.However one can see that as TSV increases the clusters shape and repartition are more and more dipolar (often with north/south shape and repartition).
Spatially, the Mediterranean area is often a singularity even at high TSV.The results can be broken down into two components.First the intra TSV comparison, for one given TSV, assesses the difference between clusters.The second one is an inter TSV comparison.The latter answers the question "Are CCFs different depending on the TSV?".Indeed, it must be stressed that a certain cluster at one TSV, say cluster 1, do not contain the same stations at another TSV so this latter way of tackling the results cannot be called "intra-cluster".It will be referred as inter-TSV.

Climate Correlation Fields
Intra TSV results show that climate driver clustering is sometimes based on shapes i.e for a given TSV, clusters CCF differ on their very shape, and sometimes only correlation values are distinct, which is the case as one go to higher TSV.
Inter TSV results show significant variability in CCF depending on the TSV.Geopotential and wind show very different structures for each of their TSV.Of course fields themselves are of particular interest.At the 6 year TSV, geopotential is organized with clear pressure dipole in the Euro-Atlanic zone with a negative correlation zone over England and a positive correlation zone, located along an arc starting from Western Africa coast to South Mediterranean basin for stations of Central to Eastern France (Figure 5a).
Interestingly, for the Mediterranean stations (cluster 2) streamflow is not correlated with Mediterranean geopotential (Figure 5a).Correlation values are not very high with values up to ±0.6.Zonal Wind patterns clearly show Westward reinforcing influence with a positive correlation zone over the North Atlantic.Differences between clusters result from the latitudinal localization of the high positive correlation band (Figure 5b).For example southern stations are mainly correlated with southernmost westerlies.Meridional wind shows evolution towards two vortices identified by their opposing poles, one corresponds to a counterclockwise vortex over England, the other to a clockwise vortex over western coast of Africa which is in line with the geopotential CCFS previously described (Figure 5c).Correlation values for winds are high, with some areas having higher than ±0.7 correlation values.Air humidity show a local pattern with high correlation over France for all but the southern stations where streamflow is positively correlated with air humidity over the Eastern North Atlantic (Figure 5d).Correlation values are high with values up to ±0.7.
At the 10 year TSV, geopotential shows a clear dipole correlation pattern with a strong negative band from the Eastern US coast to England, and a strong positive correlation band from Eastern Canada to Scandinavia.The difference between clusters is the presence or absence of a positive correlation area over the African continent (Figure 5a).Correlation values are high up to ±0.8.Zonal wind shows a similar arrangement with the decrease in the western wind from North America Eastern Coast to England, and a positive correlation zone at the south of that band.Difference between clusters seems to be the presence or absence of vortex like systems with some clusters more uniform along the band (Figure 5b).Indeed Meridional wind show vortex-like circulations on the eastern coast of North America and over Northern Europe (Figure 5c).
Correlation values are high for winds, going up to ±0.9.Air humidity shows positive correlation value at the location of the wind and geopotential vortex-like systems.Cluster differences are based on the strength of a positive correlation band going from England to the Middle-East (Figure 5d).Correlation values are high (±0.9).
At the 21 year TSV, Geopotential shows a structure reminiscent of the 10 year band with the band of negative correlation over the Eastern US coast to England, and a positive band over Eastern Canada to Scandinavia.The difference between clusters is the general level of correlation.While most stations are highly (anti)correlated with geopotential, the Mediterranean and South Britany stations are not (Figure 5a).Both winds CCFs are also very close to their 10 year TSV counterpart and they are both having high correlation values.Note that the number of clusters has gone from 3 in the 10 year TSV, to 2 in the 21 year TSV (Figure 5b and 5c).Air humidity on the other hand shows a different shape from its previous TSV with a generalization of positive correlation zones over the Northern Atlantic (Figure 5d).High correlation values are still present.

About GeoEDC clustering
The GeoEDC method is not the first to focus on the shape of fields rather than strict values of the variables, however the method differs from those of Boé and Terray(2008) their gradient because it translates into the notion of wind patterns (gradient of pressure effectively being the main driver with Coriolis force behind wind direction).It follows that such technique work well with pressure fields, but may not be adaptable to humidity fields for example.By treating fields as images, the GeoEDC method seems quite suitable for any type of field where global shape is more important that the mean values.The real strength of GeoEDC lies in its ability to account for local deformation and thus concentrate on global shapes.When those are comparable between stations, the method performs classification on mean values as is it still a Euclidean Distance based method.
One important aspect of the GeoEDC method is its capacity to produce smooth clustering i.e. making smooth transitions between stations belonging to different clusters but close spatially.Here we will verify this property by the following two step approach: The first one only consists in comparing CCF at stations located at the interface between two clusters, to see if they share similar CCFs.The second step is more complex, and requires finding the point at the center of the cluster, based on the rational that closest stations to this point are also the farthest from the interfaces and thus should have the most representative characteristics of the cluster's CCF.Finding a center for a cluster is easy; however this center is only relevant if the cluster is physically bounded by other clusters.If the boundaries aren't made of other clusters (for example, the cluster is limited on one of its side by sea or if the cluster has sides on the borders of the stations coverage) then the cluster is not totally representative of the CCF, and the real physical center may lie elsewhere.This is the case for the cluster maps produced for this study.The clusters all have an artificial boundary and thus are never completely bounded by other clusters.
Finding the geometric center of the clusters as they are may be of some use however.Computing it and comparing stations close to that center with stations on interfaces will give information about the real spatial extent of the cluster.If stations close to that center are very different from those at interfaces, this means that the real extent of the cluster is close to the one provided by the study, otherwise this indicates that one should look at stations close to artificial boundaries as they may provide very different CCFs.

Cluster maps
The results show that for one TSV, clusters are sometimes different depending on the climate drivers.The exploration of the physical processes behind this repartition is beyond the scope of this study, and should be treated in future work.However one thing that appears in the results is that with increasing TSV, the clustering is more and more the same between climate drivers especially between geopotential and wind.Indeed, already from the 10 year TSV, the clusters are all along the same lines, only the number of clusters differs.This translates into stations sharing more and more of the same climate drivers.
This suggests that the spatial scales of climate drivers correlated with stations increase with increasing TSV.This is a Hydrol.Earth Syst.Sci. Discuss., doi:10.5194/hess-2016-395, 2016 Manuscript under review for journal Hydrol.Earth Syst.Sci.Published: 26 August 2016 c Author(s) 2016.CC-BY 3.0 License.
commonly accepted hypothesis that large TSV are correlated with large spatial scales.This is however something that would have to be verified in future work.One interesting detail is that for each TSV, geopotential always displays the smallest number of clusters, and the most homogeneous spatial patterns.This suggests that the causal chain of climate drivers correlated with streamflow in France is first geopotential, then winds and humidity.This is what was highlighted in the results section i.e. winds and air humidity clusters differ in either latitude or longitude, while the geopotential CCF are more similar and have a far more homogeneous structure.
Looking at inter-TSV difference in clustering, the same remark can be made.With increasing TSV, the clusters align more and more along a latitudinal repartition with Southern, Central and Northern France apart.The number of clusters decreases for each climate driver as the TSV increases.This again suggests that spatial scales increase with TSV.

Climate Correlation fields
On a general note, it is important to understand one nuance that comes from wavelet analysis.Components of a wavelet transform are subtracted not from the mean but from the original signal.It means that the CCF should be understood not as e.g."streamflow increases when winds are westward", but as "streamflow increases when the winds tend to be accentuated westward" (an eastward wind becoming less eastward falls in this last category).Thus, the correlation field map should be treated as correlation between streamflow and climate driver variations of variation (Correlation analysis treats with variations, but wavelet analysis do so too hence the "variations of variation").As an example, the zonal wind CCF at 10 and 21 year TSVs doesn't show easterly winds at the North (negative correlation area), and westerly at the south of the North Atlantic.Similarly, they do not show that vortices develop on eastern US coast and over north England.Rather, they show that streamflow reacts to some dynamics that tend towards those structures.The total (the actual field of wind values) field for this TSV may still be different as shown in Figure 6.One can see that the field is in fact mostly made of westerlies on the Northern part of the North Atlantic, while the inverse is true for the Southern part.
Establishing the climate fields that drive either precipitation or streamflow over France has already been achieved, for example by Boé and Terray(2008) and Garavaglia et al.(2010)Garavaglia focusing on spatial extent like this study.However other studies have been realized focusing more on the temporal evolution of climate forcing at some TSV.For example, Sutton and Hodson(2005), Sutton and Dong(2012) inspected the role of multi-decadal variability in SST to explain changes in Europe climate.In particular Boé and Habets(2013) studied the multi-decadal streamflow variability in France for a number of rivers, and studied their link with large scale fields such as sea level pressure and Atlantic Multidecadal Variability.Their work concentrated on spring months but they noted that in winter (the months chosen for this study), multidecadal variabilities were not synchronously connected with local climate variation (e.g temperature, precipitation).The authors found that when correlation existed, streamflow were correlated with negative pressure fields over continental Europe, and positive pressure fields over central Atlantic.The results of this study are consistent with those from the authors in that the same field at that scale was found.However, the present study found good correlation between geopotential and streamflow during winter months in contrary to the previous authors.It should be noted that Boé and Habets(2013) used an index, while "original" fields were used in this study.This may explain the differences in correlation results.In addition, it should be noted that stations from the second cluster at the 21 year scale are far less correlated that the one in the first cluster.Those stations almost all lie in the Western Mediterranean area.
Finally, the spatial to temporal scale relationship is really interesting.As noted previously, station clustering in France tends to get less complex as TSV increases both intra-climate driver and inter-TSV.This is confirmed by CCF maps which show that the patterns at the higher TSV become more "global", and the difference between clusters more due to differences in correlation values instead of field shape (weather regimes tend to be similar).However a close look highlights areas where main dipole value differences, but some distinct differences in shape are visible on the Eastern coast of the South African continent.Similarly, the Western Siberia seems to be under westerlies reinforcement area in the first cluster but completely out in the second.The same can be said of air humidity 21 year TSV CCF (Figure 5d), where Western Mediterranean basin is either under a highly positive correlation area or not at all.This leads to think that several spatial scales actually coexist within one TSV.Future work is needed to investigate this temporal-spatial scale relationship.

Conclusion
This study aimed at assessing whether a regional classification of streamflow variability in France based on large-scale climate drivers was possible according to interannual to interdecadal time-scales.

decomposed into 9
components according to the following time scales: 3 Months, 7 months, Annual, 1.5 years, 4 years, 6 years, 10 years and 21 years.The last component is composed of the rest of the signal (usually contains the mean).For this study, only the 6, 10 and 21 year components are kept.Correlation Analysis between a streamflow time series and a climate field is done pointwise.Hydrol.Earth Syst.Sci.Discuss., doi:10.5194/hess-2016-395,2016 Manuscript under review for journal Hydrol.Earth Syst.Sci.Published: 26 August 2016 c Author(s) 2016.CC-BY 3.0 License.Once the correlation maps are obtained, a Geostatistical Euclidean Distance Clustering (GEDC) is undertaken to see which stations have similar climate fields.GEDC is a method adapted from the IMage Euclidean Distance (IMED) proposed by Wang et al.(2005)Wang who designed a Euclidean distance algorithm for images taking into account the distance between

Figure 4
Figure 4 represents the average correlation fields corresponding to each cluster, for each climate driver, and each TSV.The GeoEDC method first distinguishes the general shape of the climate correlation field (CCF).If two are very similar then discrimination is made on the actual correlation values.

Figure 5
Figure5illustrates the latter point.The spatial structure of CCFs on the Mediterranean part of the cluster 2 (black points) of geopotential for the 6 year TSV is shown.The cluster has North and West boundaries with another cluster, but the South and East boundaries are artificial (respectively coastline and France/Italy border).Analyzing the stations at the interface between clusters shows that indeed the GeoEDC provides smooth transitions as stations from two different clusters that are the closest do produce similar CCFs (Arrows on Figure5point at stations from different clusters that are close together).The geometrical center of the cluster was calculated using a convex set minimization problem with linear constrains taking into account the Earth curvature.This problem can be explained as finding the smallest spherical cap that contains all the stations of the cluster, and then finding the center line of the spherical cap.This is equivalent to finding the smallest vector from the center of Earth to the base plane of the spherical cap.One can see that the stations at the interface seem to follow a counter clockwise differencing scheme.Starting from the Southeast most stations, going counter clockwise, CCFs change gradually towards a Scandinavia-Equatorial/Mid latitudes pressure dipole.The stations that are closest to the geometric center actually take the CCF of the northern stations even those on interfaces.Since those stations are not different from stations at the interfaces, this means the real geometric center of the cluster lies probably much further south (possibly over the Mediterranean Sea).

Fig. 1 .
Fig. 1.Streamflow time series stations map; Stations chosen are those with little anthropogenic impact.

Fig. 6 .
Fig. 6.Average Zonal wind Field over the Euro Atlantic Area Wang et al.(2005)on 152 climate sensitive streamflow watersheds over France.Our work focused on three time scales of variability (TSV): 6, 10 and 21 years.In order to test this hypothesis, a clustering by hierarchical ascendant classification was applied to 4 types of correlation maps between climate fields and streamflow time series in France: Geopotential, zonal and meridional wind and air humidity (all at 850mb).To this end, we developed a method called Geo Euclidean Distance (GeoEDC), adapted from Image Euclidean Distance first proposed byWang et al.(2005)Wang for image processing.This method computes similarity between fields based on their global shapes before looking at mean values.By treating field as images, GeoEDC works on more types of fields that methods previously developed.The results of the clustering show distinct spatial patterns for every climate drivers and for all time scales of variability considered.The general trend is that at the lowest TSV (6 year) stations that share one climate driver do not necessarily share the others.As the TSV increases, more stations share the same climate drivers.