Hydrological response to climate extremes in mesoscale (pre-)Alpine basins at 0.5° and hyperresolution

The response of key hydrological variables to climate extremes within five meso-scale basins in the Swiss Alps is investigated at two different resolutions using the distributed hydrological model Spatial Processes in Hydrology (SPHY). Based on elevation and the presence of glaciers, three catchments are identified as Alpine and two as pre-Alpine. We run SPHY both at hyperresolution and at 0.5° × 0.5°, and aggregate simulated runoff and evapotranspiration per season. For four seasonal extremes representing flood and drought/heatwave conditions, we investigate the simulated response at both model resolutions. Results from the high-resolution model show that the within-basin response becomes more complex with more extreme events. The response within each basin can be grouped per land use type, owing to different dominant runoff-generating processes. A comparison with the coarse-resolution model results shows a large discrepancy between the two simulated responses. The low-resolution model is not able to correctly simulate the complex hydrological response simulated with the distributed model, since neither the complex topography nor the land use classes are properly represented. We show that the hydrological response simulated with a high-resolution model can be considerably more extreme than a low-resolution model might indicate, which has important implications for global assessments carried out at coarse resolution.


Introduction
It is currently well understood that global warming will lead to an intensification of the global hydrological cycle (Huntington, 2006; Stocker et al., 2013). Many studies have investigated global or continental trends in the terrestrial water cycle that result from recent and projected changes in climate conditions (e.g. Luterbacher et al., 2004; Sánchez et al., 2004; Barnett et al., 2005; Beniston et al., 2007; Sheffield and Wood, 2008; Adam et al., 2009; Sheffield et al., 2012; Van Huijgevoort et al., 2014; Jacob et al., 2014). Most of these studies used models that are run at rather coarse spatial resolutions, often 0.5° × 0.5°, and rely on standardized anomalies such as the Standardized Precipitation Index (SPI) to make qualitative comparisons between different climatic regions across the globe. For example, Van Huijgevoort et al. (2014) used the output of multiple global hydrological models at a relatively coarse resolution of 0.5° × 0.5° and investigated the effects of climate change on droughts. Sánchez et al. (2004) and Beniston et al. (2007) both used regional climate models to identify areas showing an extreme response to future events in the Mediterranean region and the entire European continent, respectively. In coarse-scale global studies, it is often implicitly assumed that the sign of the coarse-scale anomaly represents the average anomaly at the finer scale. However, few, if any, studies have focused on the comparison of anomalies at multiple scales.
We note the recent trend in hydrological modeling that brings together coarse global modeling and local high-resolution modeling: hyperresolution modeling (Wood et al., 2011; Bierkens, 2015; Bierkens et al., 2015). Global simulations at high resolution are very likely to better represent the locally relevant hydrological processes, reducing or even removing the possible discrepancy induced by coarse-resolution modeling. However, most current global studies are still run at relatively coarse spatial resolutions, due to, e.g., a lack of global input data at the required resolution or a lack of computational power (Beven and Cloke, 2012; Beven et al., 2015; Melsen et al., 2016b). As a result, an analysis of potential scaling issues in anomalies at the scale of mesoscale basins is highly relevant.
In contrast to global-scale studies, research on basin-scale hydrological response to climate change or climatic extremes often makes use of high-resolution models fed with output from global or regional climate models to examine the local response in more detail. For example, Immerzeel et al. (2012) analyzed the impact of climate change on a glacierized catchment in the Himalayas using a high-resolution hydrological model. Wong et al. (2011) studied the effects of climate change on droughts in Norway using HBV at a 1 × 1 km resolution. Middelkoop et al. (2001) described the impact of climate change on the entire Rhine basin using different hydrological models for different regions of the basin. Hurkmans et al. (2009) and Hurkmans et al. (2010) used the Variable Infiltration Capacity model to examine changes in streamflow throughout the Rhine basin as a result of land use change and climate change, respectively. Driessen et al. (2010) used HBV at a high resolution (0.088°) to study changes resulting from three possible climate change scenarios in the Ourthe, a tributary of the Meuse. The majority of individual river basin studies describe the impact in terms of change in fluxes with respect to the normal situation, without relying on standardized values as is common in (global) climate studies. This ensures meaningful comparisons between the new and old situations, but makes comparing different basins more difficult.
Due to the many nonlinearities and thresholds in hydrological processes, discrepancies in simulated hydrological responses can arise between lumped simulations or simulations on coarse global grids at similar scales on the one hand, and high-resolution simulations on the other hand. This indicates not only that a small change in input or in the initial conditions can lead to a relatively large change in hydrological response, but also that scaling of processes over heterogeneous catchments might not be trivial. For example, Blöschl et al. (2013) investigated the 2013 flood of the Danube river caused by extreme precipitation. They found that the discharge peak could have been higher, since not all precipitation fell as rain. In parts of the catchment that were high enough for the temperature to stay below the snow threshold, a fraction of the precipitation fell as snow and did not directly contribute to the discharge. Teuling et al. (2013) showed that evaporation increased during droughts, based on data from several headwater catchments in Europe, including the Swiss Rietholzbach catchment located within the larger Thur basin (Seneviratne et al., 2012). This was explained by the lack of rainfall coinciding with reduced cloud cover and increasing net radiation, which outweighed the effect of lower soil moisture conditions. Jolly et al. (2005) studied how vegetation responded to the extreme summer of 2003 in the Swiss Alps. They found that the vegetation response was not homogeneous, but differed per elevation zone. Zappa and Kan (2007) presented the hydrological response during the summer of 2003 for multiple basins in the Swiss Alps, and found that basins with glaciers generally respond differently from basins without glaciers. These four examples indicate the complexity of hydrological processes in response to climate variability and how highly variable they are in both space and time. We expect that model resolution will play an important role in the simulated response, especially in complex basins, where most of the variability occurs at scales smaller than the grid size of coarse-scale models.
To investigate the potential discrepancy in simulated hydrological response at different scales or resolutions, we focus on five meso-scale basins in Switzerland with a size roughly corresponding to the 0.5° × 0.5° pixel scale. These basins have complex topography with large elevation gradients, and are known to show complex behavior due to the non-trivial response of snow and glaciers to extreme events (Verbunt et al., 2003; Zappa and Kan, 2007; Van Tiel et al., 2017). With more than one-sixth of the global population's water supply originating from glaciers and snow packs (Barnett et al., 2005), correct understanding of these regions is of great importance. Numerous studies have investigated the role of resolution in the simulated response, both for regional climate models and for hydrological models (e.g. Haddeland et al., 2002; Leung and Qian, 2003; Carpenter and Georgakakos, 2006; Gao et al., 2006; Lucas-Picher et al., 2012; Pryor et al., 2012; Lobligeois et al., 2014; Kumar et al., 2016; Melsen et al., 2016a). Most studies conclude that high-resolution modeling yields more realistic results, but that this depends on the complexity of the region, the process representation within the model and the resolution of the input data. Furthermore, the Swiss Alps have been studied intensively, with or without a focus on extremes (e.g. Gurtz et al., 2003; Verbunt et al., 2003; Jolly et al., 2005; Schaefli et al., 2007; Zappa and Kan, 2007; Bavay et al., 2013; Speich et al., 2015). However, no study has investigated the distribution of hydrological flux anomalies in response to seasonal climate extremes, and how these depend on model resolution.

Methods, model and data
For this study, we selected five meso-scale basins in the Swiss Alps. The response of these basins is not only relevant at the regional scale: they can also contribute considerably to large rivers in Europe. During the warm and dry summer of 2003, melt water from snow and ice, mostly originating in the Swiss Alps, contributed almost 40% of the total discharge of one of the largest basins in Northern Europe, the Rhine (Wolf et al., 1999; Stahl et al., 2016). While not all basins are tributaries of the Rhine, they nonetheless provide important insight into the behavior of mountainous catchments.
The basins were selected based on size (roughly corresponding to a 0.5° × 0.5° pixel scale), elevation range, percentage of glaciation and data availability (see Table 1 for these statistics). Figure 1 shows the locations and digital elevation models of each catchment. Please note that the entire river basin is not always chosen; see Table 1 for the names and station IDs used by the Swiss Federal Office for the Environment. Two basin categories can be distinguished: high-elevation catchments with glaciers (Reuss, Rhone and Inn) and lower-elevation basins without glaciers (Emme and Thur). We will refer to those basin categories as Alpine and pre-Alpine, respectively.
The Spatial Processes in Hydrology model (SPHY) was used to simulate each basin (Lutz et al., 2013, 2014; Terink et al., 2015; Lutz et al., 2016; Hunink et al., 2017). SPHY is a spatially distributed conceptual hydrological model, including representations of rainfall-runoff, cryosphere, evapotranspiration, dynamic vegetation, lake/reservoir and soil moisture processes, as well as their main nonlinearities and thresholds. The model runs at a fixed daily time step and a user-defined resolution. Subgrid variability is taken into account via cell fractions. A schematic overview of the model concept is presented in Fig. 2. Based on the daily average temperature, SPHY determines whether precipitation will fall as snow or rain. The liquid precipitation falls on the land surface, where part of the water can be directed to the river as surface runoff, depending on the volume of water already present in the rootzone. The remainder infiltrates into the first soil layer (the rootzone), where it is subject to evapotranspiration based on the type of land use. Water in the rootzone can either percolate to the second soil layer (the subzone) or be transported to the river network as lateral flow. From the subzone, water can either move back into the rootzone as a result of capillary rise, or percolate to the third soil layer (the groundwater layer). Water in the groundwater layer can contribute to the river discharge as baseflow. The solid precipitation is added to the snow storage, from which melted snow is diverted to the stream network as snow runoff. Finally, part of the grid cell can consist of glaciers. A fraction of the melted ice is added to the groundwater storage, and another fraction is transported to the river as glacier runoff. The glaciers in SPHY are fixed in space and time, so glaciers cannot advance or retreat. More information about the model structure and parameterizations is provided by Terink et al. (2015).
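The temperature threshold between rain and snow is one of the nonlinearities referred to above. As a minimal illustration, the sketch below implements a generic degree-day snow routine for a single cell and a single day. It is not SPHY's actual code: the function name, the default parameter values, and the choice to use the same threshold for snowfall and melt are assumptions made for illustration only, and refreezing is ignored.

```python
def snow_step(precip_mm, temp_c, snow_store_mm,
              t_crit_c=0.0, ddf_mm_per_degc_day=4.0):
    """One daily step of a generic degree-day snow routine (illustrative only).

    t_crit_c and ddf_mm_per_degc_day are hypothetical values, not calibrated
    SPHY parameters.
    """
    if temp_c < t_crit_c:
        # Below the critical temperature all precipitation is stored as snow.
        snowfall, rain = precip_mm, 0.0
    else:
        snowfall, rain = 0.0, precip_mm
    snow_store_mm += snowfall
    # Potential melt scales linearly with degrees above the threshold ...
    potential_melt = ddf_mm_per_degc_day * max(temp_c - t_crit_c, 0.0)
    # ... but cannot exceed the snow that is actually available.
    melt = min(potential_melt, snow_store_mm)
    snow_store_mm -= melt
    return rain, melt, snow_store_mm
```

Because the partitioning switches at a threshold, a coarse cell with a single average temperature can store all precipitation as snow while part of the same area, resolved at higher resolution, already produces rain and melt.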
The model is forced with daily precipitation and temperature from MeteoSwiss (MeteoSwiss, 2013, 2016). Land use data were obtained from WSL (2016) and grouped into four classes: forest, grass, glacier and sparse/bare vegetation (referred to as "other"). Discharge observations were obtained from FOEN (2016). Catchment elevation, delineation and stream network are derived from the digital elevation model of Jarvis et al. (2008).
SPHY was applied to each basin at two different resolutions: at ∼500 × 500 m (corresponding to the hyperresolution) and at ∼40 × 40 km (corresponding to the 0.5° × 0.5° resolution). All input data were resampled to match the spatial resolution of the hydrological model. SPHY was calibrated individually for both resolutions and all basins, using the L-BFGS-B algorithm (Zhu et al., 1997) to minimize the sum of squares between monthly simulated and observed discharge. We calibrate on monthly discharge since we are interested in seasonal responses, and not in the day-to-day variation. SPHY was calibrated over a period of two years (1999-2000), with the preceding year 1998 used as spin-up period. Four parameters were selected for calibration, all of which were found to influence the monthly discharge: rootzone depth, the degree-day factor for snow melt, a parameter determining the fraction of water that can refreeze in the snow pack, and the critical temperature below which precipitation falls as snow. Since the L-BFGS-B algorithm is highly sensitive to the initial parameter guess, 10 different starting parameter sets were generated using Latin Hypercube Sampling to cover the parameter space (McKay et al., 1979). The calibration resulted in 10 new parameter sets per region and model type, and the best performing set was selected based on the Kling-Gupta efficiency (Gupta et al., 2009). Using this parameter set, SPHY was run from 1993 to 2014, where the first year was used as a spin-up period, resulting in 21 years of data used for analysis.
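For reference, the Kling-Gupta efficiency used to rank the calibrated parameter sets can be computed as in the sketch below, which follows the formulation of Gupta et al. (2009): one minus the Euclidean distance of the correlation, the variability ratio and the bias ratio from their ideal value of 1. The function name and the use of NumPy are assumptions for illustration; the sketch does not reproduce the calibration workflow itself.

```python
import numpy as np

def kling_gupta_efficiency(sim, obs):
    """Kling-Gupta efficiency of simulated versus observed discharge."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # ratio of standard deviations
    beta = sim.mean() / obs.mean()    # ratio of means (bias)
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)
```

A perfect simulation gives a value of 1; among the 10 calibrated parameter sets, the one with the highest value would be retained.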
Figure 2 (caption fragment): Overview based on the more detailed concept by Terink et al. (2015).

Model output was used to describe the response within each basin, focusing on the distribution of generated runoff and actual evapotranspiration. Output is averaged over three months, grouping the hydrological response per season: December, January and February for winter (DJF); March, April and May for spring (MAM); June, July and August for summer (JJA); September, October and November for autumn (SON). Standardized anomalies are used to quantify the magnitude of the deviation within each season, and are calculated for each individual model cell according to:

$$ Z^{S}_{x_i} = \frac{x^{S}_{i} - \mu^{S}_{x}}{\sigma^{S}_{x}} \quad (2) $$

where $\mu^{S}_{x}$ is the mean of variable $x$ in season $S$, $x^{S}_{i}$ is the value of variable $x$ for year $i$ in season $S$, with $i$ ranging from 1994 to 2014, $\sigma^{S}_{x}$ is the standard deviation of $x$ based on the same period, and $Z^{S}_{x_i}$ is the dimensionless standardized anomaly of variable $x$ for year $i$ in season $S$. We note that climatologies are most often calculated from data series of 30 years or more. Since we are not interested in absolute values, but in the patterns and relations, we do not expect different conclusions when longer time series are used.
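The per-cell standardized anomaly can be sketched as follows for one flux and one season. The array layout (years along the first axis, cells along the second), the use of the sample standard deviation, and the choice to return 0 for cells with no interannual variation are assumptions made for this illustration.

```python
import numpy as np

def standardized_anomalies(x):
    """Standardized anomaly per cell for one variable and one season.

    x has shape (n_years, n_cells). Returns an array of the same shape.
    """
    x = np.asarray(x, dtype=float)
    mu = x.mean(axis=0)              # seasonal mean per cell
    sigma = x.std(axis=0, ddof=1)    # sample standard deviation per cell (assumption)
    z = np.zeros_like(x)
    valid = sigma > 0                # cells without variation keep an anomaly of 0
    z[:, valid] = (x[:, valid] - mu[valid]) / sigma[valid]
    return z
```

With this layout, each column of the result holds the anomaly time series of one model cell, so a map for a given season and year is a single row.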

For this study, we are also interested in the difference between lumped and distributed model simulations, and its implications for the simulated anomalies. Different metrics can be used to quantify those differences. Two metrics are well known and widely used. The first is the percentile of the lumped anomaly within the distributed results, which shows, for example, whether the lumped value coincides with the median of the distributed results. However, the percentile does not provide information about the error committed when neglecting or simplifying spatial complexity, only about the relative position of the lumped value. The second well-known option is the root mean square error (RMSE). This metric gives insight into the average error between the single lumped anomaly and the entire distributed dataset. It can be rewritten as the root of the sum of the variance and the squared bias, calculated for each hydrological flux and season, according to the following equation:

$$ \mathrm{RMSE} = \sqrt{\sigma^{2} + \left(\mu - Z_{\mathrm{lumped}}\right)^{2}} $$

where $\sigma^{2}$ and $\mu$ are the variance and mean of the distributed model anomalies and $Z_{\mathrm{lumped}}$ is the lumped model anomaly. The RMSE provides more insight into the error induced by the change in model resolution, since it measures the mean error. For this study, however, we are interested in the entire range of distributed simulation results, since we assume every model cell to be equally important. The variance and mean used in the RMSE calculation do not provide sufficient information to capture the entire data range. As a result, we conclude that neither metric provides sufficient information to describe the error for this study.
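The two conventional metrics discussed above can be illustrated with a short sketch. It computes the percentile of the lumped anomaly within the distributed anomalies and verifies numerically that the RMSE equals the root of the variance plus the squared bias. The function name and the percentile convention (share of cells with an anomaly less than or equal to the lumped value) are assumptions for illustration; the identity requires the population variance.

```python
import numpy as np

def lumped_vs_distributed(z_distr, z_lumped):
    """Percentile and RMSE of a lumped anomaly relative to distributed anomalies."""
    z = np.asarray(z_distr, dtype=float)
    # Share of cells (in %) with an anomaly at or below the lumped value.
    percentile = 100.0 * np.mean(z <= z_lumped)
    # RMSE computed directly over all cells ...
    rmse_direct = np.sqrt(np.mean((z - z_lumped) ** 2))
    # ... and via the variance plus squared bias decomposition.
    rmse_decomposed = np.sqrt(z.var() + (z.mean() - z_lumped) ** 2)
    return percentile, rmse_direct, rmse_decomposed
```

Both RMSE values agree to floating-point precision, which makes explicit that the RMSE only sees the mean and variance of the distributed anomalies and not the shape of their distribution.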
To better quantify the differences between the lumped model and the distributed model output, we propose a new metric: the Density Weighted Distance (DWD). DWD measures the distance between the lumped anomaly and the extent of the distributed dataset, weighted by the density of data that is present between these fractions. The extent is measured in this case as the 5-95% data range, to exclude outliers. DWD is calculated following Eqs. (5) to (7) (see Fig. 3a for the concept behind these equations), where $d_{\mathrm{lower}}$ and $d_{\mathrm{upper}}$ are the distances between the lumped standardized anomaly ($Z_{\mathrm{lumped}}$) and the 5% and 95% standardized anomalies of the distributed model ($Z^{5\%}_{\mathrm{distr}}$ and $Z^{95\%}_{\mathrm{distr}}$, respectively), which are corrected to 0 if they are positioned outside the 5-95% data range. $P_{\mathrm{lumped}}$ is the percentile of the distributed data that corresponds to the lumped model anomaly.
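The three ingredients named above (the two distances and the percentile of the lumped anomaly) can be sketched as follows. This is one possible reading of the description, not the DWD formula itself: the way the ingredients are combined into DWD is deliberately left out, and the function name, the clipping of negative distances to 0, and the percentile convention are assumptions.

```python
import numpy as np

def dwd_ingredients(z_distr, z_lumped):
    """Distances to the 5% and 95% distributed anomalies and lumped percentile.

    Returns (d_lower, d_upper, p_lumped); p_lumped is a fraction between 0 and 1.
    """
    z = np.asarray(z_distr, dtype=float)
    z05, z95 = np.percentile(z, [5, 95])
    # Distances are set to 0 when the lumped anomaly lies beyond that bound
    # (an assumed interpretation of the correction described in the text).
    d_lower = max(z_lumped - z05, 0.0)
    d_upper = max(z95 - z_lumped, 0.0)
    # Fraction of distributed cells at or below the lumped anomaly.
    p_lumped = float(np.mean(z <= z_lumped))
    return d_lower, d_upper, p_lumped
```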
Substituting the sample values from Fig. 3a into Eqs. (5) to (7) gives the following result: [...] position of the data point within the distributed simulations. In Fig. 3b, the "Normal.2", "Skewed" and "Bimodal" cases all have the same percentile, yet the error varies widely between them. The RMSE is able to capture these differences, but it does not provide the best information about the size of the spread that is missed by the lumped simulations, since it is a measure of the error with respect to the mean and the variance. This is not the case with the Density Weighted Distance (DWD), since this metric measures the distances and corrects those for the relative frequency of data. Lastly, we also compared the effects of selecting a different data range: 25-75% instead of 5-95%. We conclude that this mostly influences results in terms of absolute size, but does not alter the relative differences much. We expect that when using the 25-75% range, lumped model results will more often fall outside the range than when using the 5-95% range. Furthermore, we assume that all grid cells in the distributed model are equally important and will therefore use the largest data range to calculate the DWD, only excluding the outer 10% to remove any unwanted behavior resulting from outliers.

(Figure caption fragment) R² values are presented in the bottom left box of each graph, calculated using the simulated and observed discharge anomalies per season.

Results and discussion
This section contains the results and discussion of the model simulations. Results of the distributed model are presented and discussed first, followed by the comparison with the lumped model simulations.

High resolution simulations
The key focus of this work is the catchment response to extreme events. To identify those extreme events, standardized precipitation and temperature anomalies are calculated for each season (see Fig. 4). Since patterns are similar within each of the two catchment types, results of only two basins are depicted in this figure. It should be noted that, because values are averaged over three months, extreme events with a shorter duration are very likely averaged out at this three-monthly time step.
The colored semicircles give an indication of model performance during each season. Matching colors indicate that SPHY simulated the same standardized anomaly as is found in the observed discharge. To better quantify model performance, the coefficient of determination (R²) is calculated between the simulated and observed discharge anomalies. Model performance is best in the pre-Alpine basin, but still satisfactory in the Alpine basin. The winter R² value in the Alpine basin is very low, which can be explained by the inability of the model to simulate discharge during low-flow periods. This might be related to the relatively coarse monthly calibration time step. Overall, model performance is satisfactory to answer our research question, since patterns are simulated correctly.
The pre-Alpine basin seems to have a distinct pattern in the discharge anomalies, where high precipitation anomalies are often paired with high discharge deviations and vice versa. Temperature also seems to influence the discharge anomalies, but this relation is less evident. The Alpine basin shows a much more random pattern, without any clear relation with temperature and/or precipitation. This indicates that processes [...]

The highlighted dots in Fig. 4 show the extreme seasons selected for this study, for which the hydrological response will be analyzed. The seasons were selected based on extreme precipitation and/or temperature events. Brönnimann et al. (2007) and MeteoSwiss (2017) both mention the extremely warm spring of 2007 in Switzerland. The extremely warm and dry summer of 2003 is known to be the most extreme summer in at least the last 500 years (Luterbacher et al., 2004; Zappa and Kan, 2007; Seneviratne et al., 2012). The extremely heavy precipitation during November 2002 caused mudflows in eastern Switzerland (Schmidli and Frei, 2005). No literature reference was found for the extremely wet winter of 1994/95.
Hydrological response maps for the two main hydrological fluxes (actual evapotranspiration (ET) and generated runoff) during each extreme event are presented in Fig. 5. Grid cells are colored by their cell-specific standardized anomalies. ET anomaly maps are only shown for spring and summer periods, when this flux is most important. During the two other seasons, large parts of the basins are covered with snow, where the model assumes no ET to occur. All basins show roughly the same ET response to the warm spring conditions in 2007. In the areas with a standardized anomaly of exactly 0, no evapotranspiration was simulated since the cells were covered with snow. Cells close to this region show a particularly high standardized anomaly. These cells are snow-free only for a limited time during spring, distorting the mean and standard deviation used to calculate the standardized anomaly.
A more complex response is present during the extremely warm and dry summer of 2003. In each basin, cells at low elevation show a different anomaly sign than cells at mid and high altitudes. Higher temperatures increase potential evapotranspiration. Actual ET increased in cells that had sufficient water available, whereas cells with negative ET anomalies could not meet the increased potential ET, indicating that they became water limited in the course of the summer. This leads to a situation in which both negative and positive anomalies are present within the same meso-scale basin, even at the seasonal timescale and in response to a rather homogeneous distribution of temperature anomalies.
Runoff anomalies generally show a similarly complex response, in particular in the Alpine regions. These basins often show contrasting anomaly signs within each basin. Generally, cells at low elevation show a different anomaly than cells at high altitude. This dependency is not visible in the pre-Alpine basins, where all model cells show roughly the same response. We will further investigate the cause of this response below in Fig. 7.
In Fig. 6, the spatial variability (as expressed by the standard deviation) of both fluxes is plotted against the average forcing for each season. Here the standard deviation (sd) is used as a measure of complexity, with large sd values indicating a complex and highly spatially variable hydrological response. This gives insight into how the response complexity varies with average forcing.
The precipitation-evapotranspiration relation was excluded since no interesting relations were found. Spread in the actual evapotranspiration response seems correlated with temperature (Fig. 6a), where higher temperatures result in larger ET standard deviations. As mentioned earlier, ET is expected to increase with higher temperatures, but so is the number of water-stressed cells. This combination increases the spatial sd, and is visible in almost all basins and seasons.
The standard deviation of generated runoff shows a different response to temperature and seems most sensitive during summer and autumn (Fig. 6b). The two catchment types show a different response: the runoff sd increases with temperature for the Alpine basins, while the sd decreases in the pre-Alpine basins. The cause of this discrepancy is the presence of glaciers: glacier melt will increase with higher temperatures, while regions without glaciers will evaporate more, reducing the runoff and thus the standard deviation. Please note that the average temperatures in both catchment types show almost no overlap, making it difficult to identify the exact cause of this disparity.
The influence of average precipitation on the runoff sd seems smaller (Fig. 6c). During winter, only the pre-Alpine basins show a response in runoff sd to precipitation. The lack of response in the Alpine basins is related to temperature: average winter temperatures are always below 0 °C, indicating that precipitation falls as snow and does not directly contribute to runoff.
Average winter temperatures in the pre-Alpine basins are more often above freezing point, resulting in a more rapid runoff [...] Since there is only one event this extreme in the 21 years of simulations (see Fig. 4), it remains difficult to separate the effects induced by temperature and/or precipitation. The autumn period shows a similar response to the summer months, but the relation with temperature again needs to be taken into account. As visible in Fig. 4, high precipitation events are often related to lower temperatures, while low precipitation events are often paired with higher temperatures, independent of the basin. This could indicate that the relation between precipitation and runoff sd might be the inverse of the temperature-runoff sd relation.
To gain a better understanding of the hydrological behavior within each basin, the standardized anomalies of each individual grid cell are plotted against elevation (see Fig. 7). We again only show results for one basin of each catchment type, since the response patterns were similar across the different basins. The forcing anomalies show very little spread: the 95% confidence interval is almost always thinner than the plotted line, making it barely visible. In both catchment types, the spread in runoff anomalies is larger than the spread in forcing anomalies, making it impossible to explain the hydrological response solely by the forcing anomalies. Each dot in Fig. 7 is colored by land use type. Land use shows a clear correlation with elevation, best visible in the Alpine basin. The pre-Alpine basins did not contain any glacier cells and only a limited number of sparse/bare cells. This is explained by their more limited elevation range compared to the Alpine basins (see Fig. 1b).
The hydrological response can be grouped per land use type: "forest" and "glacier" cells almost always show a different response within the same basin and season, while "grass" and "other" cells cover a gradual transition between the two extremes. This grouping can be explained by the runoff-generating processes. Areas at high elevation generate runoff by melting ice and snow (if present), while areas at low altitudes rely on rootzone and/or groundwater processes. The latter are mostly driven by the amount of available water (water limited), while runoff from ice and snow is mostly dependent on the incoming energy (energy limited). This dependency is best visible in Fig. 7a, where the hydrological anomalies at lower elevations coincide with the sign and size of the precipitation anomaly, while the hydrological response shifts towards the temperature anomaly at higher elevations. Because the pre-Alpine basin contains too few "other" cells and no "glacier" cells, this relation is not as evident as in the Alpine basin. In the pre-Alpine basin, runoff anomalies seem to follow precipitation anomalies, indicating that the runoff-generating processes are mostly driven by available water (Fig. 7b). Only during the winter of 1995 do cells at high altitudes shift towards the temperature anomaly, implying that the response of those cells depends on the amount of available energy.
This grouping of different responses matches the zones defined by Theurillat and Guisan (2001): colline: <700 m, montane: 700-1400 m, subalpine: 1400-2100 m, alpine: 2100-2800 m, nival: >2800 m. These zones match the land use classes defined in our study: the first class is not represented in basin Reuss, montane corresponds to the "forest" group, subalpine to the "grass" group, and alpine and nival to the "other" and "glacier" groups. Jolly et al. (2005) described that these zones could also be used to group vegetation responses to the extreme summer of 2003. This indicates that elevation, and thus vegetation cover, controls the hydrological response to extreme events.

Our results may be influenced by parameterizations defined within the model. For example, the limited evapotranspiration of snow-covered cells is a choice made by the developers of SPHY. One could argue whether this is realistic. Furthermore, the glaciers in SPHY are fixed in location and extent. The importance of dynamical glaciers was investigated by Van Tiel et al. (2017), who conclude that using a dynamical glacier module is most important for long-term studies. The simulation period of our study was rather short, and we therefore expect only minor differences in the location and extent of the glaciers over our time period. We do not expect major differences in results and conclusions as a result of those parameterizations within SPHY.

Impact of model resolution
With a better understanding of the hydrological response to extreme events at high model resolution, we can compare those results with the results obtained when the basins are simulated on a 0.5° × 0.5° [...] range, but does not show a consistent position (e.g. always equal to the mean). Nevertheless, this figure does not provide enough information to draw firm conclusions about the performance of the lumped model with respect to the distributed model results.
For each hydrological flux, basin and extreme season, the Density Weighted Distance was calculated and is presented in Table 2. This table shows that the runoff DWD in the Alpine basins is generally higher than the DWD in the pre-Alpine basins (average Alpine DWD = 2.27 and average pre-Alpine DWD = 1.06). This is also visible in Fig. 8, where the pre-Alpine runoff violin plots cover a smaller anomaly range than the Alpine violin plots. These averages indicate that the distributed model anomaly can, on average, deviate by 2.27 and 1.06 standard deviations from the lumped model, in no specific direction, for the Alpine and pre-Alpine basins, respectively. This illustrates that in these areas the local hydrological response can be considerably more extreme than the basin-average response. This effect is largest in the Alpine basins, which can be explained by the wider range in elevation and land cover types. [...] this season (see Fig. 8). The lumped model is not able to replicate this response, since the model only consists of a single grid cell. This entire cell was clearly not water limited during the extremely warm and dry summer of 2003, since higher than average ET was simulated. As a result, the lumped model is not able to mimic basin responses which are at least 2.90 standard deviations away from the lumped anomaly.
Actual evapotranspiration depends not only on the amount of available water in a specific grid cell; snow cover is also a very important factor. For example, the high DWD value during spring 2007 in the Inn basin can be attributed to snow cover. In the lumped model, the entire cell was covered with snow, which prevented any ET from occurring. In the distributed model this occurred in about half of all cells. This resulted in a larger spread in actual ET in the distributed model, while the lumped model had an anomaly of exactly 0, because the lumped model cell remained snow-covered in every spring of the simulation period.
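The zero anomaly of the lumped cell can be illustrated with a short sketch of a standardized anomaly calculation. This is not the procedure used in the study: the use of the population standard deviation, the convention of returning 0 when a series has no variance, and all numbers below are assumptions made for illustration.

```python
from statistics import mean, pstdev


def standardized_anomalies(values):
    """Return (x - mean) / std for each value in a seasonal series.

    Assumption: if the series has zero variance (for example ET = 0 in
    every spring), every anomaly is defined as 0.0 instead of dividing by 0.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical spring ET totals (mm): a lumped cell that is snow-covered in
# every spring, and a distributed cell that is snow-free in some springs.
lumped_spring_et = [0.0, 0.0, 0.0, 0.0]
cell_spring_et = [10.0, 0.0, 25.0, 5.0]

lumped_anomalies = standardized_anomalies(lumped_spring_et)
cell_anomalies = standardized_anomalies(cell_spring_et)
```

With these made-up numbers the lumped series yields only zeros, whereas the distributed cell produces both positive and negative anomalies, which is the kind of spread that a single snow-covered cell cannot show.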

Conclusions
In this study, we investigated the complex hydrological response anomalies in five basins in the Swiss Alps. These basins were selected based on their complex topography and their expected non-trivial response during extreme events, due to the presence of snow and glaciers. Three out of five basins are situated at high elevations and contain glaciers (Alpine basins), and the two other basins are situated at lower elevations and do not contain glaciers (pre-Alpine basins). Large parts of all basins are covered with snow during the winter months. We ran the hydrological model Spatial Processes in Hydrology (SPHY) at

- Spread in the hydrological response tends to increase with higher temperatures. We conclude that the hydrological response gets more complex with more extreme events. A relation with precipitation is less pronounced: the summer results in particular might be influenced by the extremely warm and dry summer of 2003.
- Variability in the intra-basin hydrological response is highly dependent on the land use type. This dependency is most clearly visible in the Alpine basins, where the elevation range and thus the variation in land use types is largest. Elevation classes defined by Theurillat and Guisan (2001) match the different response groups in our study. We found that runoff anomalies tend to match the temperature anomalies when the main runoff generating processes are dependent on available energy (melting of snow and glaciers), and runoff anomalies tend to match the precipitation anomalies when the main runoff generating processes are dependent on the amount of available water (root zone and/or groundwater processes).
The two pre-Alpine basins generally show a different response to the extreme events. These differences can be attributed to the lower variation in elevation and land use classes in these basins.
- A new metric has been proposed to quantify the difference between the distributed and lumped model results: Density Weighted Distance (DWD). This metric measures the distance between the 5%-95% range of the distributed data and the lumped model anomaly, weighted by the density of data that is present between the respective distances. This metric gives more information than, for example, the mean square error, because the distributed model results are not Gaussian distributed.
- A comparison between the distributed and lumped model results using DWD shows that distributed model responses can be a lot more extreme than the lumped model anomaly. DWD values show that the lumped model often misses roughly 2 standard deviations worth of standardized anomalies, with the largest deviations in the generated runoff response. This effect is largest in the Alpine basins, due to their wider range in elevation and land cover types.
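The exact DWD formula is not restated in this part of the text, so the sketch below is only one possible reading of the verbal description: the distances from the lumped anomaly to the 5% and 95% values of the distributed anomalies are each weighted by the fraction of distributed data lying between the lumped anomaly and that bound. The weighting form, the quantile interpolation and all function names are assumptions.

```python
def quantile(sorted_values, q):
    """Linearly interpolated quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_values) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def dwd_from_parts(p, d_lower, d_upper):
    """Assumed weighting: each distance is weighted by the share of data
    between the lumped anomaly (percentile score p) and the 5% or 95% bound."""
    w_lower = min(max(p - 0.05, 0.0), 0.90)
    w_upper = min(max(0.95 - p, 0.0), 0.90)
    return w_lower * d_lower + w_upper * d_upper


def density_weighted_distance(distributed, lumped):
    """DWD of one lumped anomaly against a list of distributed anomalies."""
    values = sorted(distributed)
    # Percentile score: fraction of distributed anomalies at or below the lumped one.
    p = sum(1 for v in values if v <= lumped) / len(values)
    d_lower = max(lumped - quantile(values, 0.05), 0.0)
    d_upper = max(quantile(values, 0.95) - lumped, 0.0)
    return dwd_from_parts(p, d_lower, d_upper)
```

As a plausibility check only: with the values quoted for the Rhone basin in the summer of 2003 (P = 0.08, d_upper = 4.39), the upper term alone gives (0.95 - 0.08) × 4.39 ≈ 3.82, close to the reported DWD of 3.83. The lower distance is not quoted in this section, so this does not confirm that the assumed form is the one used in the study.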
Our most important conclusion is in line with the recent results for the Thur basin as reported by Melsen et al. (2016a), who state that spatial variability is very likely to be underestimated in hydrological models, increasing the uncertainty. They conclude that we should be careful with the interpretation of results from large-domain models. Our results stress the importance of this statement: we showed that results generated with a high-resolution model can be a lot more extreme than a coarse/lumped model might indicate. This effect is most severe in basins with large elevation ranges and/or many different land cover types with different important runoff generating processes.
Competing interests. The authors declare that they have no conflict of interest.

Figure 1. Overview of the location (a) and elevation (b) of the five basins used in this study. Names of the main river basin are plotted in the center of each basin. Two-character abbreviations represent country identifications.

Figure 2. Schematic overview of the conceptualization in SPHY. Blue arrows represent fluxes contributing to total runoff generated in each model cell, the red arrow represents the actual evapotranspiration, and the small grey arrows the fluxes between the different reservoirs.

Hydrol. Earth Syst. Sci. Discuss., https://doi.org/10.5194/hess-2017-629. Manuscript under review for journal Hydrol. Earth Syst. Sci. Discussion started: 16 November 2017. © Author(s) 2017. CC BY 4.0 License.

are the percentile score and the root mean square error. The percentile score provides information about the position of the lumped model anomaly within the entire range of distributed model results. A value of 0.1 indicates that the lumped anomaly corresponds to the bottom 10% anomaly-value of the distributed model, and a value of 0.5 indicates that the lumped anomaly

Figure 3. The concept behind Density Weighted Distance (a). Comparison between different metrics (b); MSE is the abbreviation for mean square error and DWD the abbreviation for Density Weighted Distance. The large box in (b) represents the 5-95% data range, and the smaller box the 25-75% data range.

Figure 4. Relation between climate anomalies and observed and simulated anomalies in hydrological response. Each dot represents a single season and is colored by the standardized observed discharge anomaly (left half) and the standardized simulated discharge anomaly (right half). Dots with a black outline represent the selected extreme events (winter of 1995, spring of 2007, summer of 2003 and autumn of 2002).

Figure 5. Spatial distribution of anomalies of actual evapotranspiration (a) and generated runoff (b) during four extreme seasons, for all basins.

Figure 6. Relation between simulated spatial variability in hydrological response and basin-averaged climate conditions: temperature versus evapotranspiration (a), temperature versus runoff (b), precipitation versus runoff (c). Each point represents a single season in the 1994-2014 period. A linear regression through these points is shown as a solid line, with the shaded area indicating the 95% uncertainty range.
response to precipitation events. A more pronounced relation is visible in the last two seasons, where the standard deviation (sd) in Alpine basins decreases with increasing precipitation, and vice versa in the pre-Alpine basins. The Alpine regression lines are strongly influenced by the extremely warm and dry summer of 2003: without this event the regression lines would have been much more horizontal.

Figure 7. Relation between elevation and hydrological response colored by land use type, presented for the Reuss (a) and Thur (b) basins. Each point represents the standardized anomaly for a single model cell, based on the data in Fig. 5. The solid and dotted lines show the smoothed precipitation and temperature anomalies, with the shaded area showing the 95% data range. Land use type "other" represents all sparse and bare vegetation types.

Figure 8. Model response to extreme events for both generated runoff (a) and actual evapotranspiration (b), where violin plots represent the distributed model response and the diamond the lumped model response.
The summer of 2003 in the Rhone basin shows a very high DWD value (DWD = 3.83). This is mostly related to a very low percentile (P = 0.08), which emphasizes the large distance between the upper 95% distributed model anomaly and the lumped model anomaly (d_upper = 4.39). A very large portion of the distributed values is close to the lumped model anomaly, implying that a small increase in the lumped model anomaly would significantly increase the P value, which would reduce the emphasis on d_upper and thereby decrease the DWD value. Another high value is found for actual evapotranspiration during the 2003 summer in the Thur basin (DWD = 2.90). The distributed model shows a long tail towards negative anomalies, caused by model cells which are severely water limited during

two different resolutions for each basin: as a distributed model at roughly 500 × 500 m, and as a lumped model with a single cell of roughly 40 × 40 km, corresponding to the hyperresolution and 0.5° × 0.5° resolution, respectively. Model results were aggregated per season, and analyzed based on standardized anomalies. Over the simulation period 1993-2014, we selected one extreme event per season, based on standardized precipitation and temperature anomalies: the winter of 1995, spring of 2007, summer of 2003 and autumn of 2002. From the results, we can draw the following conclusions:

Table 1. Statistics for each catchment. Station elevation represents the elevation of the outlet of the basin (FOEN, 2016).

Table 2. Density Weighted Distance for both hydrological fluxes during the four extreme seasons, for all basins.