Flash-flood forecasting in two Spanish Mediterranean catchments : a comparison of distinct hydrometeorological ensemble prediction strategies

Hydrological Ensemble Prediction Systems (HEPSs) are increasingly popular tools for dealing with the meteorological and hydrological uncertainties that affect discharge forecasts. These uncertainties are particularly difficult to handle in Mediterranean flash-flood forecasting, as many hydrological and meteorological factors come into play and precipitation is produced by small-scale convective systems. In this work, the performances of distinct HEPSs are compared for two heavy precipitation events that affected two different semi-arid Spanish Mediterranean catchments: the 03 November 2011 event on the Llobregat River in Catalonia, and the 28 September 2012 event on the Guadalentín River in Murcia. The latter case corresponds to IOP8 of the HyMeX field campaign. The uncertainty in quantitative precipitation forecasts (QPFs) is sampled by using two different meteorological ensemble generation strategies. The first is a convection-permitting EPS, which consists of dynamically downscaling the ECMWF-EPS directly by means of the WRF model. The second EPS strategy is based on the AROME-WMED convective-scale model. Its deterministic QPFs are perturbed on the basis of a previous rainfall forecast error climatology, using the probability density functions of the errors in terms of total amounts and location of the heaviest rainfall. Both ensembles comprise 50 members, which are used to drive the HEC-HMS and ISBA-TOP hydrological models. For each HEPS, the performance is assessed in terms of the quantitative discharge forecasts. The results point out the benefits of using (i) a hydrological model when evaluating highly variable and convectively driven precipitation fields and (ii) an EPS to better encompass the uncertainties arising at different levels of the HEPS. Issues regarding the optimal number of ensemble members and the impact of the ensemble forecasting lead time are also addressed with a view to optimal flash-flood forecasting.


Introduction
Flash floods are among the worst hazards worldwide (Doocy et al., 2013). These hydrometeorological episodes can result in substantial human, social and economic losses. The HYdrological cycle in the Mediterranean EXperiment (HyMeX, http://www.hymex.or) programme fosters a better understanding, quantification and modelling of precipitation and flood extremes in the Mediterranean (Drobinski et al., 2014). The Spanish Mediterranean region is recurrently affected by extreme precipitation resulting in flash flooding, mainly at the end of the warm season. The early-autumn intrusion of upper-level cold air masses into the Western Mediterranean region, together with its comparatively high sea surface temperature, boosts the convective available potential energy. In addition, the prominent orography of Mediterranean Spain lifts the warm and moist maritime air and leads to the subsequent generation of deep moist convection (Fig. 1).
Flash floods result from the persistence for several hours of high precipitation rates over specific hydrographic catchments.
In the Spanish Mediterranean region, quasi-stationary precipitation is often a consequence of long-lived Mesoscale Convective Systems (MCSs) which remain anchored by the prominent orography (Romero et al., 2000; Ducrocq et al., 2014). In addition, many semi-arid river basins are small-to-medium sized, highly urbanized and contain steep coastal streams. All these factors further shorten the hydrological response times. As many small rivers are ephemeral, large and rapid flows carrying extensive quantities of debris can exacerbate flood damage. Therefore, the development and evaluation of state-of-the-art hydrometeorological tools is an issue of major relevance. These tools can contribute to a better understanding and forecasting of flash floods, so that more reliable forecasting and warning systems can be implemented over Mediterranean Spain.
For this purpose, we have selected the 03 November 2011 and the 28 September 2012 flash floods in the Llobregat and Guadalentín basins, respectively. The Guadalentín and Llobregat basins are archetypes of Spanish Mediterranean catchments that are recurrently affected by flooding (for further details, see Amengual et al. (2007, 2009, 2015)). The Guadalentín River basin is located in Murcia while the Llobregat watershed is located in Catalonia, in south-eastern and north-eastern Spain respectively (Fig. 1). For both catchments, we have implemented two event-based models: (i) the semi-distributed and conceptual-based HEC-HMS model (USACE-HEC, 2000); and (ii) the fully-distributed and physically-based ISBA-TOP model (Bouilloud et al., 2010; Vincendon et al., 2010, 2016).
The first step of the present work is to explore how these distinct hydrological model set-ups, which account for different levels of variability in the rainfall fields and basin properties, affect the simulated flood hydrographs. Using distinct hydrological models for both extreme events allows a more comprehensive investigation of the potential impacts of the aforementioned modelling issues in real time.
Regarding the aim of the present work, the use of two models with different formulations, especially for the soil infiltration mechanism, may prove beneficial to better understand and describe the rainfall-runoff transformation processes, depending on the nature of the rainfall episode that occurs over the catchment in question. Specifically, we use two distinct model structures and physical parameterizations to simulate the complex rainfall-runoff transformation of intense precipitation over semi-arid basins. In fact, the different soil infiltration and routing schemes are determining factors that modulate peak discharges, timings and runoff volumes. As a matter of fact, the characteristics of the rainfall event (i.e. spatial-temporal distri-tive scores. The impact of the forecasting lead time of the SREPSs as well as the size of the ensemble have also been investigated.
Section 2 presents a short overview of the flash floods, the study areas and the observational networks; section 3 describes the hydrological tools, the atmospheric models and the ensemble generation strategies; results are presented in section 4. The last section summarizes the main conclusions and provides further remarks.
2 Study regions, databases and flash-flood episodes

2.1 The Llobregat River basin and the first selected episode

The Llobregat basin is the largest internal hydrographic catchment of Catalonia (Fig. 2a). Altitudes range from above 3,000 m in the Pyrenees to between 200 and 750 m in the pre-coastal and littoral ranges. The Llobregat River basin covers an area of 5,040 km², with a maximum length of around 170 km. The Llobregat covers a wide spectrum of annual rainfall amounts depending on altitude. Precipitation is broadly above 1000 mm where the altitude is higher than 1000 m in the Pyrenees.
Annual rainfall amounts are roughly 700 mm over the pre-Pyrenees, with altitudes ranging from 600 to 1000 m, and barely reach 500 mm at lower elevations. The Llobregat basin exhibits a mild and rainy cold season and a hot and dry warm season, characteristic of the Mediterranean climate. Furthermore, extreme precipitation events affect the Spanish Mediterranean region every year and represent a substantial part of the total amounts. Two dams are found in the mountainous regions of the Llobregat (Fig. 2a).
Available raw precipitation consists of 5-minute rainfall data recorded at 81 stations inside or very close to the Llobregat basin (Fig. 2a). These pluviometric stations belong either to the Catalan Agency of Water - Automatic Hydrological Information System (ACA-SAIH) or to the Spanish Agency of Meteorology (AEMET) networks. Five-minute streamflow data are recorded at four hydrometric sections in the Llobregat River. These stream-gauges are included in the ACA-SAIH network and are deployed in: (i) Súria town, with a drainage area of 940 km² (labelled as Súria); (ii) Sant Sadurní d'Anoia city, with a drainage area of 736 km² (Sadurní); and (iii) Castellbell (3340 km²) and (iv) Sant Joan Despí (4915 km²) towns (labelled as Castellbell and Despí, respectively). A more detailed description of the Llobregat River basin and databases can be found in Amengual et al. (2007).
The first selected event occurred in autumn 2011, a season marked by numerous heavy precipitation events (HPEs) in the north-western Mediterranean (Silvestro et al., 2012; Hally et al., 2013; Rebora et al., 2013). The northern part of Catalonia was particularly affected at the beginning of November 2011. On 2 November, the large-scale situation was dominated by a deep, cold upper-level trough approaching from the North Atlantic Ocean while a south-easterly low-level flow was strengthening over Catalonia. These atmospheric conditions favoured convection, and heavy rainfall occurred. A maximum daily precipitation of 202.9 mm was recorded on 3 November over Catalonia, and slightly above 150 mm over the Llobregat River basin. Most of the rivers of Catalonia flooded, although only minor damage was reported (Llasat et al., 2014).

The Guadalentín River basin and the second selected episode
The Segura is one of the most important Spanish rivers running into the Mediterranean Sea. This catchment spans an area of 18 208 km² and the maximum length of the Segura River is about 325 km. The Guadalentín is the main tributary of the Segura River. Its basin extends over 3343.1 km² and the river length is close to 121 km (Fig. 2b). The Guadalentín river basin comprises altitudes ranging from above 2,000 m (in the Baetic system) to 1,200 m in the Murcian pre-littoral depression, and to barely 110 m at its confluence with the Segura. The Guadalentín basin lies in a particularly arid region of Mediterranean Spain, owing to its particular setting. The Baetic range shelters this area from the frequent passage of cold fronts that come from the Atlantic in the wet season and bring copious precipitation to other Spanish regions. Thus, precipitation in the catchment relies on the cyclogenesis of Mediterranean systems and the subsequent impinging of humid low-level flow from the south-west. However, these disturbances are sparse in time and small in space. Rainfall amounts are altitude-dependent and range from 50 to 300 mm. The hydraulic division of the Segura River Basin Authority (Confederación Hidrográfica del Segura, CHS) is well aware of the short return periods of the catastrophic flash floods affecting the whole region.
Therefore, numerous structural measures have been implemented over the Segura River basin over the years. Specifically, four reservoirs are located within the Guadalentín. Furthermore, a channel was constructed downstream of Paretón so as to link the river directly with the Mediterranean Sea (Fig. 2b). This channel was specifically designed to prevent hazardous flooding from affecting Murcia city, and it partially diverts large runoff discharges from the Guadalentín River into the Mediterranean Sea. Further details on the Guadalentín and databases can be found in Amengual et al. (2015).
The second study case is a classical Spanish Mediterranean HPE that affected Andalusia, Murcia and, later on, Valencia and Catalonia (although less intensely there) from 27 to 29 September 2012. This case was documented during the HyMeX campaign within an Intensive Observing Period, IOP8 (Amengual et al., 2015). About ten fatalities were reported and the damage has been estimated at more than 120 million euros. On 27 September 2012, the upper-level synoptic situation was controlled by a cut-off low centred over the south-west of the Iberian Peninsula and moving slowly north-eastward, with a low-level depression centred over inner Andalusia. The upward forcing associated with the north-east flank of the cut-off low favoured the triggering of deep convection and the development of a low-level convergence zone. These ingredients reinforced convection and heavy precipitation across Murcia, Valencia, the Balearics and Catalonia, the latter being affected on 30 September 2012. Precipitation amounts locally reached 214 mm in 24 hours in Andalusia and 240 mm in Murcia. They caused flash flooding around Murcia, especially that of the Guadalentín River (Ducrocq et al., 2014; Amengual et al., 2015).

The Weather Research and Forecasting (WRF) model version 3.5 (Skamarock, 2008) has been implemented with a single computational domain of 767 x 575 grid points that is centred on the Western Mediterranean and spans the entire Mediterranean Spanish coast (Fig. 1). We have used a horizontal resolution of 2.5 km, 50 vertical levels and an integration time-step of 12 s. This model set-up allows deep moist convective systems of relevant size to be resolved explicitly (Weisman and Klemp, 1997; Bryan et al., 2003; Roberts and Lean, 2008; Zheng et al., 2016). With this model configuration, the aim is to improve the prediction of convective precipitation through a more accurate representation of the physics of the convective systems (Done et al., 2004).
The operational European Centre for Medium-Range Weather Forecasts global Ensemble Prediction System (ECMWF-EPS) aims at coping with the uncertainty in the initial atmospheric conditions, taking all the available observed and modelled information into account. In particular, the global ECMWF-EPS consists of 50 integrations with a spatial resolution of ∼20 km, obtained after applying singular vector perturbations to an initial forecast (Palmer et al., 1998; Leutbecher, 2005; Hoskins and Coutinho, 2005). By dynamically downscaling the global ECMWF-EPS, we rely on the sampling of the initial and lateral boundary condition (IC/LBC) uncertainties provided by the system, but we simulate the small-scale interactions more precisely so as to enhance the high-resolution quantitative precipitation forecasts (Branković et al., 2008). Lateral boundary fields are updated every 3 h. Physical parameterizations are identical across all WRF ensemble members and involve: the WRF single-moment 6-class microphysical scheme with graupel (WSM6; Hong and Lim (2006)); the 1.5-order Mellor-Yamada-Janjić boundary layer scheme (MYJ; Janjić (1994)); the Dudhia short-wave scheme (Dudhia, 1989); the RRTM long-wave scheme (Mlawer et al., 1997); the unified NOAH land surface model (Tewari et al., 2004); and the eta similarity surface layer model (Janjić, 1994). Atmospheric simulations span two consecutive 48-h periods for the Llobregat event: 02-04 November and 03-05 November 2011 00 UTC, respectively. For the Guadalentín episode, NWP simulations span two consecutive 48-h periods: 27-29 September and 28-30 September 2012 00 UTC, respectively. Next, hourly QPF outputs are used to force the hydrological models.
Note that the model set-up matches the WRF operational configuration employed at the University of the Balearic Islands (http://meteo.uib.es/wrf).

The AROME-WMED model
AROME-WMED is a research convection-permitting model dedicated to HyMeX (Fourrié et al., 2015). It is based on AROME-France, the operational non-hydrostatic model of Météo-France (Seity et al., 2011). The model runs at a 2.5-km horizontal resolution. It has 60 vertical levels that range from 10 m above ground to 1 hPa. Deep convection is thus explicitly resolved. Its microphysical parameterization is a one-moment 5-class scheme (Pinty and Jabouille, 1998; Caniaux et al., 1994) with rain, snow, graupel, cloud liquid water and cloud ice. In the boundary layer, the vertical turbulent transport follows two schemes: the parameterization of Cuxart et al. (2000) for prognostic turbulent kinetic energy for the eddy diffusivity part, and the mass flux scheme of Pergaud et al. (2009) for dry thermals and shallow convection. The surface scheme is SURFEX (Masson et al., 2013) and the surface boundary layer is SLB (Masson and Seity, 2009). AROME-WMED covers a domain that encompasses most of the Western Mediterranean Sea (34N-11W, 48N-20E; see Fourrié et al. (2015) to visualize it). Its lateral boundary conditions come from forecasts of the French operational global model ARPEGE (Courtier et al., 1991). During HyMeX SOP1, a daily 48-h AROME-WMED forecast was run in real time, starting at 00 UTC every day from September to December 2012. A 3D-Var data assimilation scheme is used to produce the initial atmospheric state. Compared to the AROME-France analysis, more satellite and surface observations and some experimental measurements, such as low-level balloon data and additional radiosoundings, were included in the data assimilation process. The overall quality of the AROME-WMED model proved to be as good as that of AROME-France, especially for QPFs. All details can be found in Fourrié et al. (2015).

Nat. Hazards Earth Syst. Sci. Discuss., https://doi.org/10.5194/nhess-2017-353. Manuscript under review for journal Nat. Hazards Earth Syst. Sci. Discussion started: 18 October 2017. © Author(s) 2017. CC BY 4.0 License.

Vincendon et al. (2011) used the object-oriented verification method called SAL (Wernli et al., 2008) to evaluate AROME-France QPFs in terms of location and amplitude errors. The references used were the quantitative precipitation estimates (QPEs) provided by radar data (see Tabary, 2007; Tabary et al., 2007). No systematic biases in the magnitude of the rainfall were found and location errors remained lower than 50 km in 80% of the cases. Therefore, the high-resolution process-based model trajectory can be used to feed a hydrological model, provided that the uncertainty that affects the QPF is taken into account.
A classical way to take the uncertainty of atmospheric forecasts into account is to use a meteorological EPS. A prototype of a convective-scale meteorological ensemble based on the AROME model has been developed (Vié et al., 2011; Nuissier et al., 2012; Bouttier et al., 2012). This system is called AROME-EPS. Unfortunately, the AROME-EPS domain does not encompass the Murcia area, so it has not been used in this work and another approach has been adopted. Vincendon et al. (2011) developed the so-called "perturbation method" to create a set of QPF scenarios on the basis of the AROME-France deterministic forecast, which contains valuable information. First, an object-oriented climatology of forecast errors was established in addition to the SAL evaluation described above. Rainy objects were defined as connected grid cells with more than 2 mm an hour. The same was done for convective objects, defined as connected grid cells with more than 100 mm an hour. Probability density functions (PDFs) of the errors in terms of rain amounts and location of those objects were then computed. For the present study, this climatology has been extended by using a larger domain (including Northern Spain) and more rainy events. The study sample gathers rainy events from 2008 to 2010. A perturbation tuned from those PDFs was then introduced into the deterministic forecast. The perturbation method consists of the following steps:
-Shifting of the rainy objects in accordance with the PDF of the errors in location of the AROME deterministic forecasts.
-Change of the rainfall intensity inside the rainy objects in accordance with the PDF of the errors in amplitude of the rainy objects.
-Change of the convective objects within each rainy object that are set more or less peaked/flat in accordance with the PDF of the amplitude errors of the convective objects.
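The three steps listed above can be illustrated with a minimal Python sketch. It is not the implementation of Vincendon et al. (2011): the function names `perturb_qpf` and `make_ensemble` are hypothetical, the whole field is shifted instead of individual connected objects, objects are approximated by all cells above the threshold, and the normal-like integer and lognormal sampling distributions are placeholders standing in for the climatological error PDFs.

```python
import numpy as np

def perturb_qpf(qpf, shift_cells, rain_factor, conv_factor,
                rain_thr=2.0, conv_thr=100.0):
    """Build one perturbed member from a deterministic hourly QPF field (mm/h).

    Simplified illustration: thresholds follow the text (2 and 100 mm/h),
    but no connected-component labelling of objects is performed.
    """
    # Step 1: displace the rainfall pattern (np.roll wraps around the borders;
    # a real system would pad or mask instead).
    shifted = np.roll(qpf, shift=shift_cells, axis=(0, 1))
    out = shifted.copy()
    # Step 2: rescale the intensity inside the rainy cells.
    rainy = shifted > rain_thr
    out[rainy] *= rain_factor
    # Step 3: make the convective cores more or less peaked.
    convective = shifted > conv_thr
    out[convective] *= conv_factor
    return out

def make_ensemble(qpf, n_members=50, seed=0):
    """Draw perturbation parameters and stack the members (members, ny, nx)."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # Placeholder distributions (assumptions, not the published PDFs).
        dy, dx = rng.integers(-20, 21, size=2)   # shift in grid cells
        rain_factor = rng.lognormal(mean=0.0, sigma=0.3)
        conv_factor = rng.lognormal(mean=0.0, sigma=0.2)
        members.append(
            perturb_qpf(qpf, (int(dy), int(dx)), rain_factor, conv_factor))
    return np.stack(members)
```

With a 2.5-km grid, a shift of 20 cells corresponds to the 50-km location error quoted earlier; in practice the sampling would be driven by the object-based error climatology itself.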
Next, this method was applied to the AROME-WMED model forecasts. This is possible owing to the very close statistics obtained with AROME-France and AROME-WMED in QPF evaluations. The method tuned on AROME-France can thus be applied to AROME-WMED forecasts for other regions and periods, provided that the rainy events considered share common characteristics with some of those in the sample.
This system is not a meteorological EPS, but it provides several members from the AROME forecasts. For simplicity, the system is hereafter called the AROME ensemble. It is worth mentioning that the AROME ensemble ultimately has a higher effective horizontal resolution than the WRF ensemble, owing to the different approaches used to generate the members of the two ensembles. In this work, 48-h AROME-WMED forecasts started on 02 and 03 November 2011 at 00 UTC for the Llobregat study case, and on 27 and 28 September 2012 at 00 UTC for the Guadalentín study case.

The HEC-HMS model
HEC-HMS has been used in a semi-distributed, conceptual and event-based configuration for both river basins (USACE-HEC, 2000). A single hyetograph is used to drive the hydrological model for each subbasin. First, the hourly spatial rainfall accumulations are obtained from the 40 rain-gauges available. Spatial distributions are obtained by the kriging method. The semi-variogram has been fitted with a linear model. This technique is based on minimizing the error variance and is recommended for rain-gauge stations with an uneven spatial distribution (Bhagarva and Danard, 1994; Seo, 1998). Next, we have computed the time series of the hourly rainfall amounts for each individual subwatershed as the area-average of the spatially gridded precipitation within the corresponding subbasin. Note that the same procedure is used to drive the hydrological model with the QPFs, but using NWP model grid points instead of the observed precipitation. The spatially gridded precipitation has a 1-km resolution.
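As a minimal sketch of the area-averaging step, the following Python function computes one hourly hyetograph per subbasin from a gridded precipitation array. The function name `subbasin_hyetographs`, the array layout and the integer subbasin mask (0 meaning outside the basin) are assumptions made for illustration; the kriging step itself is not shown.

```python
import numpy as np

def subbasin_hyetographs(precip, subbasin_id):
    """Area-averaged rainfall series for each subbasin.

    precip      : array (time, ny, nx), hourly rainfall in mm on the grid
    subbasin_id : array (ny, nx) of integer labels, 0 = outside the basin
    Returns a dict {label: array (time,)} of mean rainfall per time step.
    """
    labels = np.unique(subbasin_id[subbasin_id > 0])
    return {
        # Boolean indexing keeps the cells of one subbasin: (time, n_cells)
        int(lab): precip[:, subbasin_id == lab].mean(axis=1)
        for lab in labels
    }
```

The same function could be applied to each ensemble member in turn, since the QPF fields only need to be available on the grid used for the subbasin mask.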
Runoff is computed according to the curve number methodology (CN; USDA, 1986). CNs depend non-linearly on a wide range of factors: accumulated precipitation, lithology, land use and the soil's antecedent moisture condition (AMC; Chow and W. (1988)). CNs were obtained from experimental field campaigns with AMC II for both river basins. The dimensionless unit hydrograph (UH) provided by the American Soil Conservation Service has been applied to transform the effective rainfall into overland flow for every sub-basin. This conceptual scheme links the runoff maximum with the time-to-peak by accounting for the sub-basin area and a conversion constant (USACE-HEC, 2000). The Muskingum method has been implemented to propagate the flood hydrograph (Chow and W., 1988). Therefore, the model set-up employs spatially uniform conceptual model parameters for each sub-basin, and for the dynamical formulation as well.
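For readers unfamiliar with the curve number approach, the sketch below shows the standard SCS-CN relation in metric units, with potential retention S = 25400/CN - 254 (mm) and the conventional initial abstraction Ia = 0.2 S. The function names and the default ratio are illustrative assumptions; the calibrated CN values of the study are not reproduced here.

```python
def scs_cn_runoff(p_mm, cn, ia_ratio=0.2):
    """Cumulative direct runoff (mm) for cumulative rainfall p_mm (mm)."""
    s = 25400.0 / cn - 254.0      # potential maximum retention (mm)
    ia = ia_ratio * s             # initial abstraction (mm)
    if p_mm <= ia:
        return 0.0                # all rainfall is abstracted
    return (p_mm - ia) ** 2 / (p_mm - ia + s)

def incremental_runoff(hourly_rain_mm, cn):
    """Hourly effective rainfall from an hourly hyetograph (mm per step)."""
    excess, cum_p, prev_q = [], 0.0, 0.0
    for p in hourly_rain_mm:
        cum_p += p
        q = scs_cn_runoff(cum_p, cn)
        excess.append(q - prev_q)   # runoff generated during this step
        prev_q = q
    return excess
```

The non-linearity mentioned in the text is visible here: early rainfall is entirely abstracted, and the runoff fraction grows as the cumulative rainfall increases.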
The dams have been modelled by using the following information provided by the CHS and ACA hydraulic divisions: the storage capacity, maximum outflow and elevation, and initial water level. The reservoirs have been simulated by using their elevation-storage-outflow relationships (USACE-HEC, 2000). Finally, we have implemented a diversion element to account for the diversion of the flood volumes towards the Mediterranean Sea in the Guadalentín river. Note that data on diverted flows have been provided by the CHS. Model calibration focused on peak discharges, timings and runoff volumes, which are strongly dependent on infiltration and routing processes. In semi-arid Mediterranean Spain, sparse vegetation together with torrential convective precipitation, which easily exceeds the high initial soil infiltration capacity, favours fast Hortonian flows and rapid flow velocities in the river streams (Belmonte and Beltrán, 2001). Consequently, we calibrated the following model parameters: SCS-CNs, times of concentration and flood wave celerities. Note that the SCS-CNs effectively encompass the initial soil moisture conditions. The calibration procedure was carried out by combining a manual and an automatic approach. The automatic procedure uses the peak-weighted root mean square error as objective function and the univariate-gradient method as search algorithm (USACE-HEC, 2000). A complete description of the HEC-HMS set-ups and of the calibration and validation tasks for both catchments can be found in Amengual et al. (2007, 2009, 2015).
All the hydrologic simulations comprise a 96-h period.For the Llobregat flooding, HEC-HMS has been run from 02 to 06 November 2011 00 UTC.For the Guadalentín flash flood, model simulations span from 27 September to 01 October 2012 00 UTC.These simulation periods encompass the whole flash-flood episodes.Note that the hydrological model applies a linear interpolation to convert hourly rainfall amounts to the model time-step.

The ISBA-TOP model
The hydrological model ISBA-TOP (Bouilloud et al., 2010) is dedicated to the simulation of Mediterranean catchments. ISBA-TOP fully couples the land surface model ISBA (Interaction Surface Biosphere Atmosphere, Noilhan and Planton, 1989) and a version of TOPMODEL (Beven and Kirkby, 1979) that has been adapted to Mediterranean areas (Pellarin et al., 2002). This coupling consists of introducing into ISBA a lateral distribution of soil water following the TOPMODEL concept. ISBA deals with the soil-atmosphere exchanges: water and energy budgets are managed over a rectangular domain described by 1 km² grid cells. The hydrological processes are simulated over vertical soil columns. The catchments are described by small grid cells (called pixels) according to a DTM. The ISBA soil moisture over a grid cell is used to determine the water-storage deficit on the corresponding catchment pixels as well as the hill slope recharge. The TOPMODEL equations are used to compute the lateral water transfers within the catchment from the topographical information and the rainfall spatial distribution. The pixel water-storage deficits are thus updated and transformed back into ISBA soil moisture. Pixels with null deficits define the saturated contributive areas.
ISBA computes soil water flows from those new saturated areas and soil moisture fields. Runoff over saturated areas (Dunne, 1978) occurs when there is excess water over the contributive areas diagnosed by the TOPMODEL approach. Hortonian runoff, which occurs when rainfall intensity exceeds the infiltration capacity (Horton, 1933), is estimated on the non-saturated fraction of the grid. ISBA also computes gravitational drainage at the bottom of the deepest soil layer. The drainage flow computed for an ISBA grid cell is distributed over all the corresponding catchment pixels, whereas total runoff is assumed to occur on the saturated catchment pixels only. The DTM describes the geomorphology of the catchment, including the distance between each hill slope pixel and the river. The water flows are routed to the river so as to compute total discharges at each river pixel through a geomorphological method. Artinyan et al. (2016) link the river discharge variable and the water velocity in the river. A full description of the coupling principle is available in Bouilloud et al. (2010). This original ISBA-TOP version has recently been improved so as to obtain satisfactory results without calibrating the parameters, which are defined through pedotransfer functions (Vincendon et al., 2016). Reservoirs and diversion elements have been considered on the same basis as described in section 3.

Summarizing the different experiments, distinct types of data have been used to drive the hydrological models:
-A reference simulation uses rainfall collected by rain-gauges and spatially interpolated by kriging. These rainfall data are called QPE (Quantitative Precipitation Estimates). The discharge simulations obtained with both hydrological models driven by measured rainfall are denoted REF 2011 for the November 2011 case and REF 2012 for the HyMeX IOP8 case (see Table 1).
-The hydrological models driven by deterministic QPFs (Quantitative Precipitation Forecasts) provide deterministic QDFs (Quantitative Discharge Forecasts). AROME natively produces deterministic QPFs (as described in section 3.1.2) while, for the WRF model, the unperturbed member of the ECMWF-EPS downscaled by WRF is used as control and is denoted as "deterministic" as well. For each event, two deterministic forecast simulations have been produced at two distinct starting times.
-Ensemble rainfall scenarios provided by the WRF and AROME SREPSs provide ensembles of QDFs. The same experiments as for the deterministic forecasts are produced (two starting times for each study case).
In total, more than 2400 discharge time series have been obtained.

Evaluation method
The skill of the hourly discharge ensemble forecasts has been assessed by computing objective scores on the whole data sample, using all the outlets and both cases in order to increase the statistical significance. See Appendix A for their definition. Classically, the Root Mean Square Error (RMSE) has been computed to provide a comparison with observed discharge data, and the standard deviation σ to inform about the spread of the HEPSs. For an informative ensemble, RMSE is small and σ has the same order of magnitude as RMSE. Ratios σ/RMSE lower than one therefore indicate a lack of spread. Brier Scores (BS) have also been computed to quantify the ability of the ensembles to forecast threshold exceedance. The threshold exceedance for the HEPS is generally compared with a threshold exceedance in observations, which are considered as a reference. In the present study, rain-gauge-driven discharge simulations are used as the reference. Computing BS for several thresholds makes it possible to highlight any dependence of the skill on the threshold.
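A minimal Python sketch of these scores is given below, using common textbook definitions that may differ in detail from those of the appendix: RMSE is computed on the ensemble mean, σ as the square root of the mean ensemble variance, and BS from the fraction of members exceeding the threshold. Function names and array layouts are illustrative assumptions.

```python
import numpy as np

def rmse_and_spread(ens, ref):
    """ens: (members, time) discharges; ref: (time,) reference discharges."""
    err = ens.mean(axis=0) - ref
    rmse = float(np.sqrt(np.mean(err ** 2)))
    # Spread: square root of the time-averaged ensemble variance.
    spread = float(np.sqrt(np.mean(ens.var(axis=0, ddof=1))))
    return rmse, spread

def brier_score(ens, ref, threshold):
    """Mean squared difference between forecast probability and occurrence."""
    p_fc = (ens > threshold).mean(axis=0)       # forecast probability
    occ = (ref > threshold).astype(float)       # 1 if the reference exceeds
    return float(np.mean((p_fc - occ) ** 2))
```

A ratio `spread / rmse` below one would then flag an under-dispersive ensemble, and a BS of zero a perfect probabilistic forecast of the exceedance.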
An extension of BS to all the considered thresholds is the Ranked Probability Score (RPS). In the present study, the RPS of the ensembles is compared against the reference, that is, the simulations with the deterministic versions of the models, by computing the Ranked Probability Skill Score (RPSS). RPSS assesses the benefit of the studied ensembles compared with the deterministic version of the same model. An RPSS greater than zero means better skill for the ensemble than for the reference.
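The following Python sketch illustrates one common formulation of RPS and RPSS, assuming that the deterministic forecast is treated as a one-member ensemble and that no normalization by the number of thresholds is applied; the appendix definitions may differ. The names `rps` and `rpss` and the argument layout are hypothetical.

```python
import numpy as np

def rps(ens, ref, thresholds):
    """Ranked probability score averaged over time.

    ens: (members, time); ref: (time,); thresholds: increasing sequence.
    """
    thr = np.asarray(thresholds, dtype=float)[:, None, None]      # (K, 1, 1)
    # Cumulative forecast probability of not exceeding each threshold.
    cdf_fc = (ens[None, :, :] <= thr).mean(axis=1)                # (K, time)
    # Same for the reference: a step function (0 or 1).
    cdf_ref = (ref[None, :] <= thr[:, 0, :]).astype(float)        # (K, time)
    return float(np.mean(np.sum((cdf_fc - cdf_ref) ** 2, axis=0)))

def rpss(ens, det, ref, thresholds):
    """Skill of the ensemble relative to the deterministic forecast det."""
    rps_det = rps(det[None, :], ref, thresholds)
    if rps_det == 0.0:
        return float("nan")     # perfect deterministic forecast: undefined
    return 1.0 - rps(ens, ref, thresholds) / rps_det
```

With this convention, a value of 1 corresponds to a perfect ensemble, 0 to no improvement over the deterministic run, and negative values to a degradation.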

Hydrological models
The different experiments have been assessed at the following outlets of each river basin, where the stream measurements are quality-controlled by the ACA and CHS hydraulic divisions:
-Despí, Castellbell and Sadurní for the Llobregat River, and
-Lorca and Paretón for the Guadalentín River.
Lay and Saulnier (2007). The Nash-Sutcliffe efficiencies computed for each simulation are ranked from the smallest to the largest. The probability of a value being less than the i-th smallest Nash-Sutcliffe efficiency, given by (i − 0.5)/n with n the total number of data, gives the cumulative frequency.
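The ranking described above can be sketched in a few lines of Python. The Nash-Sutcliffe efficiency is written in its usual form, and the plotting position (i − 0.5)/n follows the text; the function names are illustrative assumptions.

```python
import numpy as np

def nash_sutcliffe(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 equals the observed mean."""
    sim = np.asarray(sim, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(1.0 - np.sum((obs - sim) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))

def cumulative_frequency(nse_values):
    """Rank efficiencies and attach the frequency (i - 0.5) / n to each."""
    ranked = np.sort(np.asarray(nse_values, dtype=float))
    n = ranked.size
    freq = (np.arange(1, n + 1) - 0.5) / n
    return ranked, freq
```

Plotting `freq` against `ranked` for each model gives the cumulative distributions that are compared in the following paragraph.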
On the graph, the further the distribution is shifted to the right, the better the skill of the model. This representation characterises the overall accuracy of a model from a regional point of view. Better results are obtained with the HEC-HMS model, with more frequent high Nash-Sutcliffe efficiencies. This is confirmed by the comparison of observed versus simulated discharges (see Fig. 4). With the HEC-HMS model, the points are quite well aligned along the 1:1 line. Note that HEC-HMS was previously calibrated and validated by using observed streamflow and precipitation data for both river basins (Amengual et al., 2007, 2009, 2015), while ISBA-TOP has been run without any previous calibration task, as described by Vincendon et al. (2016), but by using a spatial estimation of the initial soil moisture. Note that the initially dry soils and high infiltration capacities of these semi-arid catchments enhance the non-linear response of runoff to intense precipitation and large rainfall amounts. In addition, heterogeneities also arise in the hydraulic response of these basins to flash floods. Without a previous calibration and verification task, it is difficult to cope with these strong non-linearities (Amengual et al., 2017).
An illustration of the raingauge-driven simulated hourly discharge time series is given by Figures 5 and 6, which show some examples at Castellbell and Paretón of the deterministic as well as the ensemble forecasts. For the November 2011 case, the flow peaks obtained with ISBA-TOP driven by QPE are underestimated compared to the observations at Despí (not shown) and Castellbell (see Fig. 5a) and overestimated at Sadurní (not shown). Two flow peaks are obtained with both hydrological models, and a moderate time lag is obtained with HEC-HMS. In addition, this model simulates a peak of the correct order of magnitude at Castellbell (Fig. 5b), but an overestimation is obtained at Sadurní (not shown). That is, ISBA-TOP presents more inaccuracies when modelling the surface flow of the first wave of intense rains, probably as a consequence of estimating initial soil conditions drier than the actual ones. Another reason could be that the ISBA-TOP model first computes the runoff over saturated areas, so the saturation excess process contributes more to runoff generation than the Hortonian process. However, both flash floods were mainly dominated by a rapid surface flow production as a consequence of the very intense precipitation.
The hourly discharges produced by the QPE-driven HEC-HMS simulations are quite close to the observed discharges for the September 2012 case (Fig. 6b), whereas they are slightly underestimated with ISBA-TOP (Fig. 6a). Naturally, the distinct physical schemes of the two hydrological models lead to distinct hydrological responses, and both configurations yield useful information. The calibrated HEC-HMS model simulates the peak flow amplitude more properly, but the timing seems better reproduced by the uncalibrated ISBA-TOP coupled system, as it simulates the dynamic processes of this flood more accurately. This tends to show that a multi-model approach can be really informative. That is, the SCS-CN method is more suitable for simulating flow volumes before fast response Hortonian floods, but a spatially-distributed modelling of the dynamics of the overland flow is paramount for properly capturing the flood timing. A reason could be that once the rain has moistened the soil, the saturation excess runoff generation process plays a stronger role in streamflow dynamics and ISBA-TOP succeeds better in simulating it.

QPF evaluation
Catchment-averaged precipitation amounts have been computed for 24-h periods for each experiment of Table 1 and for the distinct rainfall sources: QPE based on raingauge measurements, deterministic QPFs from the AROME and WRF models, and ensemble forecasts. Figures 7 and 8 show the results for the cases of November 2011 and September 2012, respectively.
The Llobregat catchment-averaged precipitation amount from 03/11/2011 at 00 UTC to 04/11/2011 at 00 UTC is overestimated by the AROME and underestimated by the WRF deterministic QPFs for both RUN02 and RUN03 (see Fig. 7). On the contrary, all the ensemble strategies succeed in encompassing the raingauge value, either in the interquartile range (for the AROME ensemble) or in the "highest scenarios" (for the WRF ensemble), as Figure 7 shows. Note also the generally higher ensemble spread for AROME than for WRF.
The deterministic QPFs from both models and both starting times underestimate the 24-h accumulated rainfall averages on the Guadalentín catchment (see Fig. 8). The ensemble QPF scenarios do not succeed in correcting this tendency, except for the RUN28 corresponding to the AROME ensemble. These facts illustrate the arduous task of properly simulating extreme precipitation over medium-sized basins in terms of location, timing and rainfall amounts. Recall that small-scale convective systems are highly sensitive to different NWP model configurations, physical schemes and IC/LBCs.
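The comparison of catchment-averaged 24-h amounts against the ensemble envelope can be illustrated with a short sketch. It assumes gridded hourly rainfall fields and a boolean basin mask, both hypothetical inputs, and the function names are illustrative only; the actual spatial averaging used in the experiments may differ.

```python
import numpy as np


def catchment_mean_accumulation(hourly_fields, basin_mask):
    """24-h accumulation averaged over the grid cells inside the basin.

    hourly_fields: array (hours, ny, nx) of rainfall in mm per hour.
    basin_mask: boolean array (ny, nx), True inside the catchment (assumed input).
    """
    total = np.asarray(hourly_fields, dtype=float).sum(axis=0)
    return float(total[np.asarray(basin_mask, dtype=bool)].mean())


def position_in_ensemble(observed, member_values):
    """Locate an observed amount relative to the ensemble interquartile range."""
    q25, q75 = np.percentile(member_values, [25, 75])
    if observed < q25:
        return "below interquartile range"
    if observed > q75:
        return "above interquartile range"
    return "within interquartile range"
```

Applying `position_in_ensemble` to the raingauge-based amount and to the member amounts indicates whether the observation is captured by the central part of the ensemble or only by its highest scenarios.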
As far as the spatial distribution of rainfall is concerned, the AROME deterministic QPF overestimates the highest rainfall for the RUN02 and RUN03 experiments and locates it too far to the north-west compared to the observations. The location is better for the WRF deterministic QPF, but the rainfall amounts are underestimated (not shown). The behaviour of the ensembles is illustrated by Figure 9. The AROME ensemble still overestimates the spatial extension of the highest precipitation, whereas the WRF ensemble still mislocates it. So the errors of the deterministic NWP models are still present in the ensembles. The AROME RUN27 and WRF RUN27 deterministic as well as ensemble runs do not succeed in simulating either good rainfall amounts or a good spatial distribution (see Fig. 10). WRF RUN28 forecasts a good location for the heaviest rains, but the amounts are far too weak (except for the highest percentiles of the ensemble). The contrary is obtained with AROME RUN28, whose QPF reaches the correct totals but with a mislocation: AROME RUN28 has a clear tendency to extend the rainfall further westward.
That is, WRF fails to realistically simulate the extreme rainfall values produced by the convective-scale systems anchored by the complex orography, while AROME fails to locate them realistically. It is worth noting that the WRF ensemble does not account for the uncertainties associated with the diversity of physical parameterizations. Some previous studies have shown the benefits of using multiple-physics ensemble strategies to further reduce biases when forecasting HPEs in Mediterranean Spain (Tapiador et al., 2012; Amengual et al., 2017). Moreover, the higher actual resolution of the AROME ensemble can induce a slightly better rainfall intensity forecast.

QDF evaluation
The raingauge-driven discharges remain uncertain, as shown in section 4.1, due to both hydrological modelling and QPE uncertainties. Therefore, the rain-gauge driven discharge simulations, that is, the REF 2011 and REF 2012 experiments, are considered as the references against which the HEPS performances are assessed. In this way, the hydrological modelling uncertainties are avoided when evaluating the QDFs. As aforementioned, Figures 5 and 6 show some examples of the hourly discharge time series. Those examples illustrate that the HEPS approach improves the forecast compared to the deterministic experiments for both extreme floods. Indeed, the interquartile envelopes better encompass the raingauge-driven discharges (blue lines) than the deterministic forecasts (green lines) do. This conclusion is the same as that reached for rainfall. A focus on the RUN03 experiment is interesting. The observed rainfall is well approximated by the 25th percentile of the AROME ensemble or by the 75th percentile of the WRF ensemble (see Fig. 7). But as far as the discharges are concerned (see Fig. 5), the peak discharge simulation is close to the 75th percentile for the AROME ensemble driven HEPS and is out of the interquartile range for the WRF ensemble driven HEPS. This fact highlights the highly nonlinear nature of the rainfall-runoff transformation in semi-arid basins. The cumulative distribution functions (CDFs) of the flow peaks at Castellbell and Paretón are plotted for the previously commented examples (Figs. 11 and 12; Ferraris et al., 2002). This additional information further complements the benefits of accounting for HEPSs when coping with flash floods. Peak discharge exceedance probabilities P(Q ≥ q) of the observed flows quantify the likelihood of forecasting these extremes. P(Q ≥ Qobs) is 0.36 and 0.20 for the AROME-ISBA HEPSs at Castellbell and Paretón, respectively. Similarly, the peak discharge exceedance probabilities are 0.18 and 0.16 for the WRF-HEC HEPSs (Table 2). As a reference, AEMET declares an alert when extreme weather has a probability of occurrence greater than or equal to 0.20. In addition, almost all HEPSs issue unequivocal probabilities of surpassing the different discharge return periods (QpT), spanning from 0.34 to 0.98 depending on the hydrometric section (Table 2; for further information about the values of the QpT, see Amengual et al., 2009, 2015).
Even if the observed peak discharge exceedance probabilities range from low to moderate for both HEPSs, the AROME-ISBA HEPSs would have triggered emergency procedures before both episodes according to the AEMET criterion. Furthermore, the distinct hydrological ensemble strategies would prove useful for conveying proper information to civil protection and emergency decision-makers before both floods, as the P(Q ≥ q) values are all well above 0.2. Note that the different discharge return periods quantify the risk of facing hazardous floods.
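As an illustration of how a peak discharge exceedance probability P(Q ≥ q) can be derived from an ensemble, the following minimal sketch counts the fraction of members whose peak reaches a threshold and compares it with the 0.20 alert level quoted above. The function names are hypothetical and the member peaks in the usage note are invented values, not results from Table 2.

```python
import numpy as np

ALERT_PROBABILITY = 0.20  # AEMET alert level quoted in the text


def exceedance_probability(member_peaks, threshold):
    """Fraction of ensemble members whose peak discharge reaches the threshold."""
    member_peaks = np.asarray(member_peaks, dtype=float)
    return float(np.mean(member_peaks >= threshold))


def triggers_alert(member_peaks, threshold, alert_probability=ALERT_PROBABILITY):
    """True when the exceedance probability reaches the alert level."""
    return exceedance_probability(member_peaks, threshold) >= alert_probability
```

With five illustrative member peaks of 100, 200, 300, 400 and 500 m³ s⁻¹ and a threshold of 400 m³ s⁻¹, two members out of five reach the threshold, so the exceedance probability is 0.4 and the alert level is reached.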
Table 3 shows positive RPSS values for all the ensembles, which confirms the benefit of driving the hydrological models with the ensembles rather than with the deterministic QPFs. Nevertheless, the RPSS is lower when using the WRF meteorological ensemble.
The RMSE values are not significantly different for these extreme floods (they differ by less than 10%). On the contrary, the σ scores differ widely. The spread is higher using AROME than using WRF, and higher with the HEC-HMS model than with ISBA-TOP. However, the ratio σ/RMSE is always lower than 1. This tends to show a lack of spread, except when HEC-HMS is driven with the AROME ensemble. The lesser spread of the WRF ensemble compared with the AROME ensemble can be attributed to the fact that the synoptic and large mesoscale dynamical and thermodynamical environment is sufficiently accurate in the ECMWF unperturbed member. Therefore, less additional information is conveyed by the perturbed IC/LBC ensemble members. On the other hand, the AROME ensemble is founded on a climatological model error database, accounting not just for inaccuracies in the IC/LBCs, but also in the model formulation. In this case, it would be advisable to also account for inaccuracies in the WRF model physical schemes when using a perturbed IC/LBC ensemble in order to further widen the EPS spread. The different spreads in the HEC-HMS and ISBA-TOP models can be attributed to the distinct model infiltration schemes, when initiating runoff as a threshold-based process.
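The spread-skill diagnostic discussed above can be sketched as follows. The exact definitions of RMSE and σ used for Table 3 are not restated in this section, so the sketch assumes one common convention: the RMSE of the ensemble mean against the reference discharge, and σ as the square root of the time-averaged ensemble variance. The function name is hypothetical.

```python
import numpy as np


def spread_skill(ensemble, reference):
    """Return (RMSE, spread, spread/RMSE) for an ensemble of discharge series.

    ensemble: array (members, times); reference: array (times,).
    Assumed convention: RMSE of the ensemble mean, spread as the root of the
    time-averaged ensemble variance. Undefined when the RMSE is zero.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    reference = np.asarray(reference, dtype=float)
    ensemble_mean = ensemble.mean(axis=0)
    rmse = float(np.sqrt(np.mean((ensemble_mean - reference) ** 2)))
    spread = float(np.sqrt(np.mean(ensemble.var(axis=0, ddof=1))))
    return rmse, spread, spread / rmse
```

Under this convention, a ratio below 1 indicates that the ensemble spread is smaller than the error of its mean, which is the lack of spread mentioned in the text.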

Impact of the forecasting lead time
In the following, the study cases are considered separately. The Brier scores have been computed for the hourly QDFs rendered by the meteorological models started at different lead times. Figures 13 and 14 depict the results for the November 2011 and September 2012 episodes, respectively. The forecast probability levels are: Qq < 0.05; 0.05 ≤ Qq < 0.15; 0.15 ≤ Qq < 0.25; ...; 0.85 ≤ Qq < 0.95; and Qq ≥ 0.95. All HEPS are verified for the following hourly discharge thresholds: 0, 1, 2, 4, 8, 16, 32, 128, 256 and 512 m³ s⁻¹. To increase the statistical robustness, a bootstrap with 1000 repetitions has been applied (Diaconis and Efron, 1983). It appears that the forecasting lead time of the atmospheric EPSs has a strong impact on the performance of the flash-flood forecasts. Figure 13 shows that, for the November 2011 case, except for the HEPS where HEC-HMS is driven by AROME, lower Brier scores are obtained with RUN03 than with RUN02. For the September 2012 case, RUN28 leads to better results than RUN27 for the four HEPS (see Fig. 14). This tends to show that better results are obtained using the latest meteorological forecasts. This can be expected for the WRF ensemble since it is built by perturbing the initial conditions. So, the closer the start time is to the episode, the more accurate the representation of the initial atmospheric state that results in the flash floods. With AROME, the results are more varied. In fact, the AROME ensemble comes from a perturbation of the forecast rainfall, so it depends more on the deterministic scenario. The AROME deterministic simulations also rely on the initial conditions. So, in theory, the later the simulations start, the closer the IC/LBCs are to those producing the flash floods. But a start time closer to the episode date does not guarantee better simulations than an earlier start, especially when dealing with extremes. This has to be confirmed on a wider climatology, but it points to a possible improvement of the forecasting and warning schemes before hazardous floods in order to better establish the confidence levels from an operational perspective.
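The Brier score computation with a bootstrap can be sketched as below. This is an illustrative simplification: forecast probabilities are taken directly as the fraction of members exceeding a discharge threshold, without the binning into probability levels described above, and the resampling is done over hourly forecast-outcome pairs. The function names and the resampling unit are assumptions.

```python
import numpy as np


def ensemble_probabilities(ensemble, threshold):
    """Hourly probability of exceeding a threshold, from an (members, times) array."""
    return (np.asarray(ensemble, dtype=float) >= threshold).mean(axis=0)


def brier_score(forecast_probabilities, outcomes):
    """Mean squared difference between forecast probabilities and binary outcomes."""
    p = np.asarray(forecast_probabilities, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    return float(np.mean((p - o) ** 2))


def bootstrap_brier(forecast_probabilities, outcomes, repetitions=1000, seed=0):
    """Brier scores recomputed on resampled (with replacement) forecast-outcome pairs."""
    p = np.asarray(forecast_probabilities, dtype=float)
    o = np.asarray(outcomes, dtype=float)
    rng = np.random.default_rng(seed)
    scores = np.empty(repetitions)
    for k in range(repetitions):
        idx = rng.integers(0, p.size, size=p.size)  # resample the hourly pairs
        scores[k] = brier_score(p[idx], o[idx])
    return scores
```

Here the binary outcome for each hour would be whether the reference (rain-gauge driven) discharge exceeds the threshold. A lower Brier score indicates a better probabilistic forecast, and the bootstrap distribution gives an idea of the sampling uncertainty of the score.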

Impact of the number of ensemble members
The computational cost of a HEPS with 50 members is very high from an operational point of view. So, another issue of the utmost interest is to examine the ensemble forecasting skill in terms of the ensemble size. That is, are such large ensembles really necessary, or do smaller ensembles have similar forecasting skill? To investigate this issue, a sub-sample of ensemble members is selected randomly among the whole population for each ensemble experiment. How the performance is affected in terms of probabilistic QDFs is assessed by computing the aforementioned statistical scores (i.e. RPSS, RMSE, σ and σ/RMSE) for the different sub-sets of HEPSs. The size of the additional ensembles ranges from 10 to 40 members. Fig. 15 shows the scores depending on the number of members for the September 2012 case study. Reducing the number of members leads to a deterioration of the scores, but this relationship is not linear and depends on the considered HEPS. It seems that the best compromise in terms of RPSS and spread (i.e. σ/RMSE for instance) is obtained around 20 or 25 members. These outcomes do not change when considering only the November 2011 episode or both episodes together (figures not shown).
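The random sub-sampling of members can be sketched as follows. This is a minimal illustration in which a single random draw is made per ensemble size; whether the study repeated the draws is not stated here, so that choice, the function names and the generic `score` callable are all assumptions.

```python
import numpy as np


def subsample_members(ensemble, size, seed=0):
    """Randomly select `size` distinct members from an (members, times) array."""
    ensemble = np.asarray(ensemble, dtype=float)
    rng = np.random.default_rng(seed)
    chosen = rng.choice(ensemble.shape[0], size=size, replace=False)
    return ensemble[np.sort(chosen)]


def score_by_size(ensemble, reference, sizes, score, seed=0):
    """Evaluate a score(ensemble, reference) callable on sub-ensembles of each size."""
    return {
        size: score(subsample_members(ensemble, size, seed), reference)
        for size in sizes
    }
```

Calling `score_by_size` with sizes such as 10, 20, 30 and 40 and a score function of interest gives the kind of score-versus-size curve that is discussed for Fig. 15.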

Conclusions
In line with the major scientific goal of the HyMeX program of improving flood early warning procedures and mitigation measures, several distinct HEPSs have been built to forecast two flash-flood events that occurred over two semi-arid Spanish Mediterranean watersheds. These HEPSs make use of two different meteorological ensembles at convective scale and two hydrological modelling systems. First, the performances of the hydrological models driven by observed rainfall data have been assessed. Next, the skill of the HEPSs is studied for the two flash-flood events. The aim was twofold: to analyze the forecasting skill of several distinct HEPSs as well as to investigate the impact of the forecast lead time and of the number of ensemble members. The main conclusions are:
- The calibrated SCS-CN method better simulates the peak discharges, but a spatially-distributed (even uncalibrated) model better reproduces the flood dynamics.
- The use of an ensemble rather than a deterministic approach clearly improves the forecasts of both extreme rainfall episodes over the watersheds and discharges at the different hydrometric sections. This is clear whatever the case, catchment and HEPS.
- The AROME ensemble shows a higher spread than the WRF ensemble. Recall that high ensemble spread can avoid underdispersion. This result would indicate that accounting for the physical scheme uncertainty could enhance the ensemble spread.
- The results of the HEPS intercomparison vary depending on the case and catchment. But it is interesting to note that the conclusions drawn from the catchment-averaged rainfall forecasts can differ from those drawn from the discharge forecasts, owing to the strong nonlinearities in the rainfall-runoff transformation. This fact clearly shows the added value of assessing ensembles from a hydrological point of view.
- As far as lead time is concerned, a better skill is obtained for ensembles with shorter forecasting lead times, at least in the sample considered in our study.
- Regarding the ensemble size, the ensembles with 50 members obtain the best objective scores, but considering more computationally affordable ensemble sizes (around 20 or 25 members) does not deteriorate the ensemble performance significantly. This result could help in the optimal design of real-time hydrometeorological forecasting chains.
Although two case studies do not allow general conclusions to be reached about the predictability of this kind of hydrometeorological hazard or about the optimal hydrometeorological forecasting strategy in an operational framework, they point out important aspects to take into account in future studies. Note that the 28 September 2012 episode is a prototype of the long-lasting mesoscale convective systems that are responsible for the most hazardous flash floods over the Western Mediterranean. Finally, this study aims at assessing HEPSs that combine multi-model and ensemble approaches. This kind of hydrometeorological approach will develop in the future. Integrative studies with integrated chains considering meteorology, hydrology, but also hydraulics and impacts, are promising approaches for a future transfer to an operational context and for breakthrough improvements in risk management strategies.

The solid blue and green lines correspond to the rain-gauge and NWP-deterministic driven maximum discharges, respectively. The dashed red line represents the ensemble median peak discharge. The light gray shaded area depicts the ensemble spread between quantiles q0.25 and q0.75 of the members. Variables Qp5 and Qp10 denote peak discharge exceedance probabilities of the 5- and 10-year return periods, respectively.
108 automatic rain-gauges are located within the Confederación Hidrográfica del Segura (CHS) boundaries (Fig. 2b). Raw precipitation accumulations are recorded with a time frequency of 5 minutes. These automatic stations belong either to the Confederación Hidrográfica del Segura Automatic Hydrological Information System (CHS-SAIH) or to the AEMET networks. Almost 40 of these 108 stations are deployed over the Guadalentín River watershed or in its vicinity. Streamflow is recorded at three hydrometric sections along the Guadalentín River: in the cities of Lorca and Paretón de Totana (labelled as Lorca and Paretón, respectively) and in Salabosque, on the outskirts of Murcia city. Their respective drainage areas are 1827.1 km², 2384.7 km² and 3170.4 km² (Fig. 2b). Runoff is registered in the CHS-SAIH network with a temporal frequency of 5 minutes as well.
In this work, ISBA-TOP has been run with an hourly time step. Hourly rainfall amounts collected by rain gauges and spatially distributed (through a linear horizontal interpolation method) are used to drive ISBA-TOP. Other parameters (2 m air temperature, 2 m air humidity, 10 m wind speed and direction, ...) are set constant. AROME-WMED analyses provide the ISBA-TOP initial soil water contents and temperatures. ISBA-TOP simulations systematically start 2 days before the beginning of the studied period, that is, on 01/11/2011 at 00 UTC and on 26/09/2012 at 00 UTC, to ensure soil moisture balance at the beginning of the rainfall event.

3.3 Designed HEPS
Several HEPS have been built by coupling the HEC-HMS and ISBA-TOP hydrological models with both the WRF and AROME-WMED SREPSs. Recall that the SREPSs have been run for two different 48-h periods for each case study: on 02/11/2011 00 UTC and 03/11/2011 00 UTC for the Llobregat flood, and on 27/09/2012 00 UTC and 28/09/2012 00 UTC for the Guadalentín flood. Each SREPS has 50 members plus the unperturbed simulation. The resulting HEPSs have been labelled as AROME-ISBA, AROME-HMS, WRF-ISBA and WRF-HMS, respectively, and have been run for the aforementioned periods. In brief, we have four experimental set-ups encompassing the 03/11/2011 and 28/09/2012 floods: RUN02 (starting on 02/11/2011 00 UTC), RUN03 (starting on 03/11/2011 00 UTC), RUN27 (starting on 27/09/2012 00 UTC) and RUN28 (starting on 28/09/2012 00 UTC). For each experimental set-up we have two different 48-h EPSs and four subsequent HEPS. Several experiments have been designed for evaluation purposes. They are listed in Table 1 together with their starting time and experimental name.

Figure 3
Figure 3 presents the cumulated Nash-Sutcliffe efficiency frequencies at these five stream-gauges for the simulated hourly discharges within the REF 2011 and REF 2012 experiments with the HEC-HMS model (in blue) and ISBA-TOP (in orange). These empirical cumulative distributions are built following the method of Le Lay and Saulnier (2007).
close to the 75th percentile for the AROME ensemble driven HEPS and is out of the interquartile range for the WRF ensemble driven HEPS. This fact highlights the highly nonlinear nature of the rainfall-runoff transformation in semi-arid basins. The cumulative distribution functions (CDFs) of the flow peaks at Castellbell and Paretón are plotted for the previously commented examples
the forecasting skill of several distinct HEPSs as well as to investigate the impact of the forecast lead time and of the number of ensemble members. The main conclusions are:

Figure 1. Configuration of the computational domain used for the WRF numerical simulations. Main geographical features mentioned in the text are shown. The thick continuous lines show the regions where the Llobregat and Guadalentín River basins are located in the Internal Basins of Catalonia (IBC) and the Confederación Hidrográfica del Segura (CHS), respectively. Coordinates are indicated in latitude and longitude.

Table 3. Scores for hourly discharges (m³ s⁻¹) with the meteorological ensembles AROME-EPS and WRF-EPS and the hydrological models ISBA-TOP and HEC-HMS. All catchments and cases are considered.