Interactive comment on “Frequency Analysis of Extreme Sub-Daily Precipitation under Stationary and Non-Stationary Conditions across Two Contrasting Hydroclimatic Environments” by

The paper aims to use observed sub-daily summer (June-October) precipitation intensities from two ARS sites to test for evidence of temporal trends and to build IDF curves using Annual Maximum Series (AMS) and Partial Duration Series (PDS) approaches, together with a Bayesian method that takes into account the non-stationarity of the time series; this last approach is used for a failure analysis addressed to infrastructure designed under a stationary approach. The paper is interesting and in line with some recent literature on the topic. There are a few major points to be discussed:

General comments

This paper compares Intensity-Duration-Frequency (IDF) curves resulting from stationary and nonstationary frequency analysis. This exercise is not new in the literature, even though the Authors emphasize the quality of the analysed (fine-scale) rainfall data set. The statistical methodology devised for such a type of analysis is already available in both frequentist and Bayesian flavours. In this respect, the paper does not introduce any improvement; on the other hand, I found the application quite confused and questionable. To summarize, in my opinion, this paper is in line with a growing literature adopting some misconceptions and misinterpretations about these topics. In the following I'll provide some remarks, which however are secondary compared with the fact that (non)stationarity is a property of models and cannot be inferred from data only (as done in this paper), thus making these studies (based on trend tests and model selection) not very informative, if not meaningless and misleading. I regret to provide such a negative opinion. I hope that the other reviewers will provide positive feedback.
We thank Dr. Serinaldi for the valuable and useful comments on this manuscript; they have certainly helped improve our understanding of the statistical concepts we are using in our study. We believe that these suggestions will further improve our manuscript, and we will address these comments in the revised manuscript. We understand the Reviewer's concerns and we are aware of the available scientific literature that reflects on the misuse of statistics by hydrologists (Clarke, 2010; WMO, 2013; Koutsoyiannis and Montanari, 2015; Lins and Cohn, 2011). However, despite the limitations of our approach, we think that there is considerable scientific evidence, albeit uncertain, that precipitation intensities have increased with warmer atmospheric temperatures (Fischer and Knutti, 2016; Lenderink and van Meijgaard, 2008; Mishra et al., 2012; Muschinski and Katz, 2013; Wasko and Sharma, 2015; Wasko et al., 2016; Westra et al., 2014), and these increases need to be taken into account in the design process. The authors do not think that a measure of humility (Lins and Cohn, 2011) is enough to design structures and to implement water management policies that allow a sustainable improvement of people's quality of life. In many localities, sub-daily duration design storms are specified for the design of storm water management structures. The dataset employed in our analysis is not optimal in length of record (few are, if the Reviewer's criteria are used), but nonetheless, the analysis of these data provides valuable insights into sub-daily intensities and the differences in their trends as compared to the daily analyses for which the vast majority of intensity trend analyses have been conducted.

Specific comments
P2L14-19: the Authors are right when they talk about 'stationarity of the model', but then they confuse stationarity (of the model) with low-frequency fluctuations of ENSO, AMO or whatever climate index as possible sources of nonstationarity. Low-frequency fluctuations resulting in local upward or downward trends (in observed time series) can be due to persistence or other causes. If we do not know the cause of such local trends, they can result from both nonstationary processes and stationary processes (with persistence, regime switches, etc.). The class of stationary models goes far beyond the i/id case, and comprises (non)i/id processes whose realizations show local trends that can be confused with the effect of nonstationarity. Unfortunately, without identifying the generating mechanism, no definite conclusion can be made about the stationarity or nonstationarity of the underlying process.
The Reviewer raises an excellent point. We consider that natural systems are inherently non-stationary, with hydro-climatological time series having falling and rising local trends known as long-term persistence or the Hurst phenomenon (Koutsoyiannis, 2005).
In the text (P2L15), we refer to them as low-frequency components of climate variability such as ENSO, AMO, and PDO. The presence of long-term persistence can result in the detection of statistically significant trends even when no trend is present (Cohn and Lins, 2005; Villarini et al., 2009). We are aware that from a statistical perspective even the longest available record of a climate variable can exhibit long-term persistence when a spectral analysis, which is also a model with limitations and assumptions, is performed on it (Lins and Cohn, 2011). However, identifying the forcing mechanism generating such trends can be challenging due to the length of the available observational records, and it might be easier, as Villarini et al. (2009) suggested, to proclaim the demise of stationarity than to prove it with the available data.
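To illustrate this point numerically, the following sketch (our own illustration, with arbitrary parameter values, not part of the manuscript's analysis) simulates a stationary AR(1) process and counts how often an ordinary-least-squares trend test that assumes independent errors declares a "significant" trend:

```python
import numpy as np

rng = np.random.default_rng(42)

def ar1_series(n, phi, rng):
    # stationary AR(1): x_t = phi * x_{t-1} + white noise (no deterministic trend)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.standard_normal()
    return x

def naive_trend_significant(y):
    # OLS slope t-test that (wrongly) assumes independent errors
    n = len(y)
    t_c = np.arange(n) - (n - 1) / 2.0
    slope = (t_c @ (y - y.mean())) / (t_c @ t_c)
    resid = y - y.mean() - slope * t_c
    se = np.sqrt((resid @ resid) / (n - 2) / (t_c @ t_c))
    return abs(slope / se) > 2.01  # approx. two-sided 5% critical t value for n = 50

hits = sum(naive_trend_significant(ar1_series(50, 0.8, rng)) for _ in range(500))
rate = hits / 500
print(f"spurious 'significant trend' rate: {rate:.2f} (nominal level: 0.05)")
```

Under strong persistence the spurious detection rate far exceeds the nominal 5% level, which is exactly why persistence complicates trend attribution from short records.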
After reading Dr. Serinaldi's comments, we realized that our wording is incorrect and needs to be improved in the revised version of the manuscript. Specifically, the text should read (P2L14) stationarity of the time series, not stationarity of the model. We will revise this section in the revised manuscript and provide additional information about the limitations of our approach.
Concerning the increases in observed precipitation linked to a warmer atmosphere, it is worth recalling P1L28-29: the increase of water vapor is a model projection (we can also project a vapor decrease, if we want), and the change of the magnitude and frequency of intense precipitation events is a hypothesis (among many others equally reasonable based on the available data).
We agree that climate models are imperfect representations of the complex climate system. Their uncertainties are constantly being evaluated by the scientific community, and improved model versions are being developed as computational power increases. However, we believe that despite their limitations, climate models constitute the best available tool to reproduce past and future climates. Stochastic models are excellent options, but they are black-box approaches that rely only on the memory of the available data. Climate models, on the other hand, use physically-based equations, when feasible, to represent natural processes in the ocean, atmosphere, and land surface.

In addition to climate model simulations, precipitation and temperature observations during the last half of the 20th century show, in the absence of changes in circulation patterns, that precipitation intensities have increased at ratios that exceed those expected from the Clausius-Clapeyron relation, which describes the capacity of the atmosphere to hold moisture (Fischer and Knutti, 2016; Lenderink and van Meijgaard, 2008; Mishra et al., 2012; Wasko et al., 2016). Within virtually all of the scientifically informed climate community there is clear consensus that atmospheric temperatures are increasing, as shown in the observational record, with increasing greenhouse gases. It is a matter of simple physics. As greenhouse gas releases are projected to increase for some time, even under the most optimistic reduction scenarios, it is safe to assume temperatures will also increase. The physics clearly indicates that a warmer atmosphere can contain more water vapor, so the projection of increasing atmospheric water vapor is hardly a contentious assertion.

P2L20: see also Rootzén and Katz (2013) and Serinaldi (2015) for a wider discussion.

Thank you for referencing these papers. We will incorporate them in the revised version of the manuscript.

P2L22-24: 'A few climate studies have shown that design storm estimates from stationary models are significantly lower than estimates from nonstationary models'. If we assume an increasing trend in the parameters of a distribution (as done in those works and in the present paper), an increase of the return level for a given return period or an increase of the risk of failure for a given design value is unavoidable, and trivial, as they are a direct consequence of the model structure. On the other hand, if we incorporate decreasing trends, we should conclude that the estimates from nonstationary models are significantly lower than estimates from stationary models (which is also a trivial conclusion, given the model structure). The
point is that we do not know the actual direction without identifying the mechanism driving the evolution of the system. Only if we know such a mechanism and we identify a deterministic component is the use of nonstationary modelling legitimate; otherwise, it reduces to a simple numerical exercise of parameter fitting on a bunch of data (which is always possible, but uninformative).

This is a very interesting point. The mechanism driving the trends is not well understood and should be further investigated. In the semi-arid site, winter, spring and fall precipitation is enhanced under El Niño-Southern Oscillation (ENSO) conditions and summer precipitation is reduced (Webb and Betancourt, 1992). Annual droughts are more frequent when a positive Atlantic Multidecadal Oscillation (AMO) and a negative Pacific Decadal Oscillation (PDO) occur (McCabe et al., 2004). In the temperate site, wetter conditions during winter have been dominated by El Niño conditions with high Pacific-North American pattern (PNA) and Pacific Decadal Oscillation (PDO) indices (Ning and Bradley, 2014).

P2L1-6 and the outcome of the trend tests confirm that we actually have both positive and negative trends: if we fit stationary and nonstationary distributions, we will find that stationary estimates are lower (higher) than nonstationary estimates in the former (latter) case.

This pattern of positive and negative trends in precipitation has been reported in the literature. We agree that the lower/higher estimates between stationary/nonstationary models are expected; however, engineering design needs an estimate, albeit uncertain, of the magnitude of those ups and downs. Saying that the estimates are lower or higher than the available design storms is useless for informing structural design.

P3L21-27: Sincerely, I cannot see any large inter-decadal variability in Fig.
1b; on the contrary, the 11-yr moving average denotes a very regular (flat) behaviour. Please, do not confuse isolated minima and maxima with decadal patterns. Data and resolution are not enough to draw such conclusions, and the diagrams do not support them in any case. The same holds for Fig. 1c: please stop fitting untenable, unjustified, and evidently unsuitable straight lines to hydro-meteorological time series. I hope not to be the only one who sees that they are ill-posed and uninformative... maybe not (Poppick et al. 2017). If you do not agree with me, please try this experiment: fit straight lines to the data for the periods 1961-1980, 1981-2002, and 1961-2002, then assess the significance of the trend slope, extrapolate to 2015, and finally draw conclusions about the nonstationarity of the process and the design values that we should have adopted. Actually, I think there is no steady increasing pattern, but only a very short time series whose length is not enough to draw conclusions about the nature of the observed variability, and to support nonstationary models. Concerning sample size requirements and the unreliability of nonstationary design values in future time windows, you can refer to Prosdocimi et al. (2014), Serinaldi and Kilsby (2015), and Luke et al. (2017).
Thank you for your comments. The Reviewer is correct; based on the figure alone it is not possible to infer the presence of inter-decadal variability. We will rewrite the description of Fig. 1(b) and (c) to incorporate this comment.
Regarding the fitting of a straight line to the figures, the Reviewer is correct to point out that the magnitude and direction of the trend are dependent on the beginning and ending years, as has been clearly demonstrated with different periods of the Southwest USA observations (Goodrich et al., 2008). We agree with the Reviewer that fitting a straight line to a different period might change the results. In the revised manuscript, we will add a Block Bootstrapping Mann-Kendall (BBS-MK) trend analysis to examine whether the rainfall intensities for different time series lengths show nonstationary behavior. In the BBS-MK approach (Kundzewicz and Robson, 2004), the original time series are randomly resampled in blocks for a set period of time, which allows the original short-term memory of the system to be preserved. The method is flexible and robust and avoids any modification of the original dependence structure of the data, which constitutes one of its strengths.

P4L21-26: T = 1/P and T = 1/rP are not definitions but expected values obtained under specific model assumptions (see Fernández and Salas, 1999; Douglas et al. 2002; Serinaldi 2015; Volpi et al. 2015; Salvadori et al. 2016).

Thank you for pointing this out. We will replace the equations in the manuscript with the corresponding expected-value expressions, T = E(X) = ...
We agree with the Reviewer that the traditional (stationary) approach for defining T is based on the assumption of serially independent annual maximum events that follow a geometric distribution with an exceedance probability p. Under stationary conditions, the mean return period is always 1/p; however, under non-stationary conditions p is no longer constant from year to year (Read and Vogel, 2015; Vogel et al., 2015). The approach we are using, Non-stationary Extreme Value Analysis (NEVA) (Cheng and AghaKouchak, 2014; Cheng et al., 2014), computes time-varying exceedance probabilities as outlined in Salas and Obeysekera (2014).

P5L5: for an updated discussion on threshold selection see Langousis et al. (2016) and references therein.

We will improve the discussion in the revised version of the manuscript.
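To make the expected-waiting-time idea concrete, here is a small sketch (our own illustration, not the NEVA code itself) of the return period of Salas and Obeysekera (2014) for a Gumbel distribution (GEV with zero shape, for simplicity) whose location parameter drifts linearly; all parameter values are arbitrary:

```python
import math

def gumbel_exceed(z, mu, sigma):
    # P(X > z) for a Gumbel distribution (GEV with shape parameter 0)
    return 1.0 - math.exp(-math.exp(-(z - mu) / sigma))

def expected_waiting_time(z, mu0, mu1, sigma, t_max=2000):
    # Salas and Obeysekera (2014): T = 1 + sum_{x>=1} prod_{t=1..x} (1 - p_t),
    # with p_t the time-varying exceedance probability in year t
    T, surv = 1.0, 1.0
    for t in range(1, t_max + 1):
        surv *= 1.0 - gumbel_exceed(z, mu0 + mu1 * t, sigma)
        T += surv
    return T

mu0, sigma = 10.0, 2.0
z100 = mu0 - sigma * math.log(-math.log(0.99))        # stationary 100-yr level at t = 0
T_stat = expected_waiting_time(z100, mu0, 0.0, sigma)   # ~100: recovers T = 1/p
T_ns = expected_waiting_time(z100, mu0, 0.05, sigma)    # < 100 under an upward trend
print(T_stat, T_ns)
```

With a zero trend the formula collapses back to the geometric-distribution result T = 1/p, while an upward drift in the location parameter shortens the expected waiting time for the same design level.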
P5L11-14: the MK test does not check for linear trends but for stochastic dominance (double-check Mann's paper and the rationale of the Mann-Whitney U-statistic). Moreover, if you check for monotonic trends, why should the trend pattern be restricted to the linear one? Why not an S-shaped pattern or something else?
The Reviewer is right; the Mann test is a test of randomness against trend. Why linear? That is a very good point; a linear trend is frequently chosen for hydrologic and climatic analyses, hence we chose it to be able to compare our results with similar analyses available in the literature (Groisman et al., 2005; Kunkel et al., 1999; Kunkel et al., 2012; Pryor et al., 2009, among others).

P5L20-23: Please, pay attention to cookbook recipes! In their Eq. 10 for equivalent sample size, Yue and Wang (2002) merged the results for lag-1 ACF corresponding to a Markov model (from Matalas and Langbein (1962)) with those for all lags (lag-1 included) from Bayley and Hammersley (1946), which in turn do not rely on the Markov assumption. Such hybridization is not required and mixes results relying on different hypotheses. Moreover, ACF estimates are prone to strong bias, and the required corrections depend (once again) on the assumed model (see, e.g., Koutsoyiannis, 2003; Serinaldi and Kilsby 2016).
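For concreteness, the lag-1 (Markov) variance-correction component alone, in the spirit of Matalas and Langbein (1962), can be sketched as follows; this is an illustrative stand-alone version, not the hybrid Eq. 10 under discussion, and the synthetic series parameters are arbitrary:

```python
import numpy as np

def effective_sample_size_ar1(y):
    """Effective sample size n* = n (1 - rho1) / (1 + rho1) for an AR(1)
    (Markov) process, using the sample lag-1 autocorrelation."""
    y = np.asarray(y, float)
    n = len(y)
    y0 = y - y.mean()
    rho1 = (y0[:-1] @ y0[1:]) / (y0 @ y0)  # lag-1 autocorrelation estimate
    return n * (1.0 - rho1) / (1.0 + rho1)

# a serially correlated synthetic series carries fewer 'effective' observations
rng = np.random.default_rng(0)
x = np.zeros(2000)
for t in range(1, len(x)):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
n_eff = effective_sample_size_ar1(x)
print(n_eff)  # roughly n * (1 - 0.6) / (1 + 0.6) = 0.25 n
```

The correction shrinks the nominal sample size used in the trend test's variance, which is the intent of the Yue and Wang (2002) adjustment; as the Reviewer notes, the specific form of the correction depends on the assumed dependence model.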

We thank the Reviewer for pointing this out. We are not sure what the Reviewer is referring to with cookbook recipes. We have implemented Eq. 10 from Yue and Wang (2002) as indicated by the authors.

P5L25-33: I do not understand which estimation method you use. In these few lines you mention a Bayesian framework without giving any detail, but in the same paragraph, P5L8-10, and in the remainder, you talk about maximum-likelihood estimates, and perform a frequentist analysis based on classical GoF tests, likelihood ratio tests (P6L12-22), etc. Please clarify. My guess is that nothing is Bayesian in this study. If I am wrong, please use a unique approach throughout the paper. Both frequentist and Bayesian approaches come with a full set of tools for complete inference. Choose the method that you prefer, but avoid exotic hybridizations, because there is already enough confusion (and misuse) regarding statistics in hydrology.
We appreciate the Reviewer's comment. We are using a Bayesian-based Markov chain Monte Carlo approach to obtain the posterior distribution of the parameters over the parameter space. First, the prior parameter values are specified with an assumed probability distribution (Cheng et al., 2014); then, assuming that the observations are independent, Bayes' theorem is used to compute the posterior parameter distribution.
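As a sketch of this workflow (not the actual NEVA sampler, whose priors and proposals differ), a minimal random-walk Metropolis algorithm for a stationary Gumbel model might look like this; the flat prior on µ and the 1/σ prior are our simplifying assumptions, chosen so the log-scale proposal needs no Jacobian correction:

```python
import math
import random

def gumbel_loglik(data, mu, sigma):
    # log-likelihood of a Gumbel(mu, sigma) sample
    z = [(x - mu) / sigma for x in data]
    return sum(-math.log(sigma) - zi - math.exp(-zi) for zi in z)

def metropolis_gumbel(data, n_iter=6000, step=0.15, seed=1):
    rng = random.Random(seed)
    mu = sum(data) / len(data)  # crude starting point
    sigma = 1.0
    ll = gumbel_loglik(data, mu, sigma)
    samples = []
    for _ in range(n_iter):
        mu_p = mu + rng.gauss(0.0, step)
        sigma_p = sigma * math.exp(rng.gauss(0.0, step))  # symmetric move in log(sigma)
        ll_p = gumbel_loglik(data, mu_p, sigma_p)
        # flat prior on mu plus 1/sigma prior: posterior in (mu, log sigma) is the likelihood
        if math.log(rng.random()) < ll_p - ll:
            mu, sigma, ll = mu_p, sigma_p, ll_p
        samples.append((mu, sigma))
    return samples[n_iter // 2:]  # discard burn-in

# synthetic Gumbel data with true mu = 5, sigma = 2
rng = random.Random(3)
data = [5.0 - 2.0 * math.log(-math.log(rng.random())) for _ in range(200)]
post = metropolis_gumbel(data)
mu_hat = sum(m for m, _ in post) / len(post)
sigma_hat = sum(s for _, s in post) / len(post)
print(mu_hat, sigma_hat)
```

The retained samples approximate the joint posterior of (µ, σ); posterior means land near the true values for this synthetic record, and the spread of the samples is what feeds credible intervals on return levels.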
The Reviewer is correct about using the ML method to fit the GEV and GPD models. We realized that the statement in P5L5-10 is incorrect. We used the ML method to estimate the GPD threshold parameter under stationary assumptions. The goodness of fit of both probabilistic models was measured with the log-likelihood ratio method, which was used to compare the fit of the stationary and nonstationary models (Coles, 2001).
We will include additional information about the methods used and remove confusing information in the revised manuscript.

P6L1-6: I am sorry, but I think that this is not the correct way to perform this type of analysis. You pretend to compare results from a GEV model where the location parameter (controlling the shift of the distribution) varies linearly with a GPD where the linear behaviour is assigned to the scale parameter, related to the variability (spread) of the distribution. This is simply nonsensical: you should choose if the trend is in the mean or in the variance! Note that you performed trend tests only on the average levels, not on the squared residuals. At most, you should use a GPD with a time-varying threshold. Moreover, linear trends in the GEV and GPD cannot be justified by the outcome of the trend tests, because the distribution introduces a nonlinear transformation between quantiles and location and scale parameters. Again, what does it mean that you multiplied the 40000 parameters by the linear trend? This is not how NS inference works. In the NS framework your distribution is parametrized as F(x; µ0, µ1, σ, ξ); so, your set of MC(MC?) parameters already describes the linear trend and its uncertainty. You do not need to multiply by anything. I only hope that you did not fit the model in a stationary set-up, then use the fitted straight lines as correction factors. If so, your approach is not only empirically and theoretically inconsistent, but it also strongly underestimates one of the main sources of uncertainty, i.e., the variability of the shape and magnitude of the parameters' trend.
We included the GPD scale parameter (σ) based on recommendations (Cheng, personal communication; Cheng et al., 2014) indicating that long-term time series are required to model the temporal changes in this parameter. We do not think that our approach is nonsensical; we are clearly stating which parameters were included in the analysis, and a side-by-side comparison of GEV and GPD is not one of the goals of the study. That said, we realized that this is not clear in the manuscript in its current form, and we will incorporate the Reviewer's comments in a revised version of it.
The Reviewer suggests "At most, you should use a GPD with a time-varying threshold". We chose to keep the threshold constant, which allows us to evaluate whether precipitation intensities have increased/decreased in time during the study period. By allowing the threshold parameter to move in time, it is not possible to have a benchmark for that selection.

We realized that our wording was unclear and misleading. We did not fit the GEV/GPD model in a stationary setup and then add the trend line. For each Monte Carlo realization (40000 in total), the time-variant location parameter (in the case of the GEV) varies linearly in time, µ(t) = µ1 t + µ0. The model parameters are used to estimate nonstationary precipitation intensities for different exceedance probabilities (AghaKouchak et al., 2013; Cheng and AghaKouchak, 2014; Cheng et al., 2014).
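For illustration only, with made-up parameter draws and a Gumbel (zero-shape) simplification of the GEV, each Monte Carlo realization then yields its own time-varying return level:

```python
import math

def ns_gumbel_level(p_exceed, t, mu0, mu1, sigma):
    # return level at year t when mu(t) = mu1 * t + mu0 (Gumbel, i.e. GEV with xi = 0)
    return (mu0 + mu1 * t) - sigma * math.log(-math.log(1.0 - p_exceed))

# hypothetical posterior draws of (mu0, mu1, sigma); real draws come from the MCMC sampler
draws = [(9.8, 0.04, 1.9), (10.1, 0.06, 2.1), (10.0, 0.05, 2.0)]
levels_t30 = [ns_gumbel_level(0.01, 30, *d) for d in draws]
print(min(levels_t30), max(levels_t30))  # the spread reflects parameter uncertainty
```

No multiplication by a fitted trend line is involved: the trend enters only through the (µ0, µ1) pair carried by each draw, and the ensemble of levels across draws gives the uncertainty band on the nonstationary intensity.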
We will modify the text in the revised manuscript to reflect this comment.

P7L1: Eq. 8 confirms the confusion about the meaning and derivation of the concept of return period: Tt simply does not exist. In both the stationary and nonstationary frameworks, the return period and the corresponding return level are unique values. NS risk of failure can only be written in terms of time-varying probabilities of exceedance Pt, because T results from integration (averaging) over a time window; writing Tt = 1/Pt is meaningless. As mentioned above, T = 1/P is not a definition (axiom) but a relationship derived from calculations under specified model assumptions.
We are not sure what the Reviewer means by "Tt simply does not exist". In our study the nonstationary return periods are computed with the expected waiting time theory. Details of the approach are found in Cheng and AghaKouchak (2014) and Salas and Obeysekera (2014). As we mentioned in a previous comment, we are aware that T is dependent on model assumptions (usually a geometric distribution under stationary conditions) that are not valid for non-stationary conditions (Read and Vogel, 2015; Vogel et al., 2015). The approach we are using, Non-stationary Extreme Value Analysis (NEVA) (Cheng and AghaKouchak, 2014; Cheng et al., 2014), computes time-varying exceedance probabilities as outlined in Salas and Obeysekera (2014).

P7L19-31: It seems that the results of the trend tests are somewhat mixed, with positive and negative trends. So, why do you select two sites with only positive trends, and, in this set, discard the case showing a significant negative trend (temperate, PDS, 60-min)? Leaving aside that the use of trend tests to justify NS models makes little sense, a fair picture should show results for both increasing and decreasing cases. Of course, this would contradict the main conclusion, resulting in a less catchy but more realistic and obvious result: we have both positive and negative trends; they are likely related to random (short-term and long-term) fluctuations (or at least the information is not enough to identify a true deterministic evolution), and then NS design values are lower or higher than the stationary counterpart.

This is a very good point. We selected only positive trends because we consider that, from a structural design perspective, the consequences of underestimating the design storms are more damaging to society than overdesigning them. We agree that including negative trends will most likely result in underestimations of precipitation intensities. We think that incorporating the negative trends in our analysis will be an interesting exercise and will
improve the manuscript. To show the differences, we will implement the approach using negative trends in the revised manuscript.

P8L9-10: Why are you sure that negative trends are related to natural variability and do not deserve an NS model telling us that the risk is decreasing, whereas positive trends are due to a deterministic mechanism (which is required to justify the NS models) and need NS modelling? Why are negative trends 'children of a lesser god'?
This comment is related to the previous one. Negative trends are not "children of a lesser god"; we considered that their impact on engineering design was not as crucial as in the case of increasing trends.

P8L15-24: What does it mean that the theoretical distributions were found statistically significant? Did they pass the GoF tests or not? Fig. 3 shows that the fitting is pretty good. Since you have 40000 MC parameter sets, please complement point estimates with confidence intervals (or Bayesian credible intervals).
The GoF for the stationary models is very good and the fitting passed the test. This is shown in Figure 3. We are not plotting the fitting for the NS models in this figure.

P8L26-31: The QQ plots in Fig. 4 should correspond to the results in Fig. 3. However, the panels in Fig. 4 seem to contain many more data points, and the bad agreement on the upper and lower tails does not match the good fit shown in Fig. 3. It seems that the QQ plots show results for the stations altogether. I hope to be wrong. Moreover, the temperate site is excluded from the QQ plots because it does not show significant trends. Leaving aside that this site shows two significant trends for PDS at 60 and 1440 minutes (see Fig. 2), in P6L10-11 you write that 'This method [QQ plots] was only implemented for the stationary models since the parameters of the non-stationary models change with time'. Thus, you apply QQ plots for stationary cases, but you do not show QQ plots for the stationary case! Is this a three-card Monte game? :-) By the way, QQ plots can be used in the NS case after suitable rescaling (see Furrer and Katz, 2008).
We will verify that the data used to plot Figures 3 and 4 are consistent. We didn't include the temperate site in the Q-Q plots in Figure 4 because we didn't use the site for further analysis in the paper. We will include the data in the revised manuscript. We are not sure we understand the Reviewer's comment "Thus, you apply QQ plots for stationary cases, but you do not show QQ plots for the stationary case! Is this a three-card Monte game? :-)". Perhaps he meant the nonstationary case. We didn't show the Q-Q plots for the NS models; we will include them in the revised version of the manuscript.

P9L1-9: I think that a fairer assessment should include all sites.

We will include all sites in the analysis in the revised manuscript.

P9L15-19: As mentioned above, there is no evidence for any linear trend, and in any case, this does not translate into linear trends in the GEV location parameter, and even less in the GPD scale parameter. Extrapolating the observed trend is not what NS models do. They extrapolate the law of variation of their parameters with the corresponding uncertainty.
We extrapolate the observed linear trend in our models since we assume that the rate of change in precipitation intensities will stay constant in the future. Based on this assumption, the GEV location parameter and the GPD scale parameter will vary accordingly. Perhaps this is an approach with its limitations; however, it is widely used in the climate and hydrologic sciences (Condon et al., 2015; Liuzzo et al., 2017; Luke et al., 2017; Sarhadi and Soulis, 2017; Wi et al., 2016). The goal of this paper is to use available statistical tools to evaluate changes in design storms, not to develop new ones.
P9L21-23: Please, use transparency to show the overlap of the confidence bands. As mentioned above, NS design values are obviously systematically higher than the stationary values, according to the magnitude of the increasing trend introduced in the GEV location parameter. This is a trivial result. Please show also the NS cases with negative trends, where design values decrease compared with the stationary ones.
We will use transparent confidence bands and include negative trends in the revised manuscript.

P9L25-30: 'These results indicate that a specific precipitation intensity will be more likely to occur, i.e., shorter return period, in the future if the observed trends are incorporated in the IDF design.'... For sure, and it will be less likely in the cases of negative trends... and will be unchanged if the trend is null... and water is wet :-) I hope you will understand that the above sentence is pleonastic, actually tautological.
We will include the analysis with negative trends in the revised version to be able to quantify the magnitude of the changes. We do not consider our work superfluous, or we would not be doing it. We will improve the writing in the revised manuscript to avoid needless repetition of ideas.

P10L1-12: In the text and Fig. 6, you refer to absolute differences, but the caption reports the expression of the relative percentage difference. So, what is shown in Fig. 6? Which values are discussed in the text? Absolute or relative?