Interactive comment on “Using StorAge Selection functions to quantify ecohydrological controls on the time-variant age of evapotranspiration, soil water, and recharge” by Aaron A. Smith et al

First, the evaluation of the model is not very compelling and needs to be improved to make the model results more credible. The model seems to have a large number of parameters calibrated to a relatively small number of isotopic measurements with high within-day scatter. The main indicator of model skill shown in the manuscript is an ability to roughly reproduce a seasonal signal in isotopic concentration that dampens with depth (Figure 3). The model also, presumably, simulates soil moisture, but this was not compared to data in the manuscript. The calibration keeps the 100 "best" pa-

Response to R3C1: The authors agree that the benefit of using SAS functions can be clearer.The benefit of SAS functions provides a method to examine incomplete soil water mixing while also considering the potential non-stationarity of mixing.Within the Bruntland Burn, the non-stationarity of soil water mixing has previously been observed at the catchment scale (Benettin et al., 2017), which suggests that the same non-stationarity may be present within smaller soil volumes.For the number of model outputs that are calibrated, the number of parameters within the model is quite low.For example, time-series are produced for 5cm, 10cm, 15cm, 20cm, and xylem water and evaluated with δ 2 H, δ 18 O, and lc-excess (15 efficiency criteria).Each efficiency criteria yields different information content.The number of parameters is quite low given the high number of efficiency criteria for calibration (15).In the development of the model, numerous model assumptions were tested.Model structure testing included parameterization of SAS functions at each depth, 3 SAS function parameters at each depth and 12 parameters for 4 soil layers.However, preliminary simulations found little difference in the SAS functions in the layers below 5cm.Additionally, parameterization of SAS functions at each depth provides a high degree of parameter freedom and the model would be significantly over-parameterized.Furthermore, there is little evidence to suggest a significantly different soil types occur within the top 20cm of soil for either site.The SAS function for D was selected as uniform due to insufficient evidence to provide a different distribution.The uniform SAS function required no parameterization.Within model calibration, the mixing (D) was negligible and the soils are not greatly sensitive to the SAS function.D was included to the potential for diffusive mixing in drier soils (Sprenger et al., 2018).As mentioned in the manuscript, evaporation and root-uptake fluxes were assumed to be uniform due to estimation of evaporation and root-uptake source.Assuming a different distribution would over-parameterize the flux and would result in a lack of uniqueness between the SAS function of evaporation (or root-uptake) and the selection of the flux from depth.

R3C2:
Development of the model.The model (Equations 1-12) is not described well and potentially wrong.My concerns are listed below.The symbol Sz was used for both C3 fast and slow flow domains.I assumed that the Sz in Equation 1 is the age-ranked storage for the fast domain and that the Sz in Equation 2 is the storage for the slow domain, as it is stated in Lines 16-17 on page 4.

Response to R3C2:
The authors recognize that the simplification of the equation term names provided may lead to confusion (see Response to R1C10).The manuscript will be modified where the subscripts f and s represent the relative age-ranked storage in the fast and slow domain, respectively, and D S F = D F S for all time-steps and represent the movement of slow domain to fast domain (D S F ) and fast domain to slow domain (D F S ).Additionally, the water balance of the soil will be shown: where Q F S fills the slow domain (dV S /dt = 0), and V F and V S are the fast and slow domain volumes, respectively.For the fluxes in all equations, the subscripts F and S represent the volume of water from the fast or slow domain, respectively.

R3C3:
There are no influx terms in Equations 1 and 2. All storages will be continuously depleted.I believe it is a typo.However, it is important to know if the influxes go either to the fast flow zone or to the slow domain, or if the influxes somehow partitioned into both domains.If the latter is the case, how the partition occurs also needs to be C4 may be "selected" from the total storage volume.The revision will term the distribution as the "cumulative residence time of layer relative to the soil water age-ranked storage" R3C12: Equation 6 1. δ z should be defined better as it is a function of S z on the righthand side term and is a function of z on the left-hand side term.2. And, again, ω z is not a SAS function.
Response to R3C12: ω z was intended to represent the SAS function of a particular flux rather than the definition provided in Eq 5.The manuscript will be amended to: where * indicates the type of flux leaving storage (Q, D, E, or R), S F and S S indicate the absolute age-ranked storage in the fast and slow flow domains, respectively, δ F and δ S are the absolute age-ranked isotopic composition of the fast and slow domain, respectively, and V F and V S indicate the total volume of the fast and slow flow domains, respectively.The units of ω are inverse storage (1/mm), which is integrated with respect to storage.
R3C13: Equation 7The z and t dependencies of the terms are not described well.
Response to R3C13: This equation will be modified and defined with respect to T , t, and z for each flux, for example for the fast flow domain: dT where δ F (S F , t, z) and δ S (S S , t, z) are the absolute age-ranked isotopic composition of the fast and slow domains, respectively at soil layer z, S F and S S indicate the absolute age-ranked storage in the fast and slow flow domains, respectively, q(T, t, z − ∆z) is the downward flux from the storage above with absolute age T , and dS F , dF S , r F , C7 and e F are the fluxes in/out of storage (diffusion, root-uptake, and evaporation, respec-tively) with absolute age = T .The parenthesis of the absolute age-ranked storage (e.g. S F (T, t, z) = S F ) have been removed for clarity of the equation.The isotopic compositions were evaluated for each value of T (i.e. S F /dT ).R3C14: Equations 8-9.It is not described well how the δ E , estimated from Equation 8, can substitute the one in Equation 7. In Equation 7, δ E is a function of T , while δ E in Equation 8 is not.
Response to R3C14: The authors will add additional detail on the application of δ E within equation 7. Equation 8 is a function of T as it has δ z .For further clarification, Equation 8 will be modified to (shown here for the fast flow domain): Response to R3C15: The equation (Eq 10) will be modified in the manuscript to re-duce confusion of the terms.Since δ E is not measured the equation will be updated to reflect root-uptake: where δ R indicates the total isotopic composition of a vapour flux (E or R), where the summation increases with steps of ∆z evaluated at i ∈ 1 • • • N (N number of soil layers) until the maximum modelling domain depth of Z, f R is the probability distribution of C8 root-uptake from depth with units of 1/depth evaluated for a specific soil layer, and the combination of isotopic composition of fast and slow flow domain are volume weighted.

R3C16:
What are the values of the parameters used?Also, were the same values used for all the layers?
Response to R3C16: The only parameter within equations 8 and 9 is the C K , which is the ratio of diffusivities of H2/H1 and O18/O16.The remaining values in the equations are measured (i.e.humidity, temperature, soil moisture) or estimated (δ A , and θ o ).Since n is dependent on θ o and θ (which were different for each soil layer), the value of n changed between layers.Although the differences of n between layers were quite small.θ o was estimated using equations 11 and 12.The revised manuscript will rearrange the equation to define θ o prior to Eqs 8 and 9.
R3C17: Equations 11-12.Equation 12 is a water mass balance model using the relationship described in Equation 11.Thus, the introducing sentence which reads "Eq.11 is rearranged to solve for the . .." is not correct.
Response to R3C17: The statement will be amended to better describe the evaluation for equation 12.
R3C18: Potential inconsistency in slow and fast flow domain storage: There are two different slow domain storages in the model: θ o ∆z (as stated in Lines 9-10 on Page 7) and S T (T, t) (used in Equation 2).If those are different, this inconsistency should be introduced and treated carefully in the manuscript.Such inconsistency also exists for the fast flow domain.

Response to R3C18:
The revised manuscript will modify this terminology (see Response to R3C2 for the terms that will be used).

R3C19:
In addition to the above point, I think there are three domains in the model among four available combinations: fastθ-fastST , fastθ-slowST , slowθ-fastST , and slowθ-slowST , where fastθ is the fast flow domain determined by θ, fastST is the C9 fast flow domain determined by S T , and so on.These four available domains are not described at all in the manuscript, and the manuscript misleads readers by stating that there are only two types of domains.The authors stated that the parameter u can be time-variant.However, how u was formulated is not described.There is no λ in Equation 13-14, while the authors stated that "λ in Equations 1314 was permitted to equal 0, time-invariant conditions" in Lines 5-6, Page C3 in AC2: Response to Reviewer 2. λ was introduced later under Equation 16 but not used in Equation 13-14.Perhaps, an equation similar to Equation 16 is required to formulate u here.Also, it would be arbitrary that how the functional form for u was selected.Can it be justified?

Response to R3C19:
The authors recognize that some of the terminology is confusing, and will rearrange the methodology for clarity (see Response to R3C16).There are 2 domains in each CV, fast and slow.The volumes of the domains are determined by equation 12.The estimated value of θ o is used to estimate the volume of water in the slow domain while the different of θ and θ o is used to estimate the volume of water in the fast domain.The authors will add in the linear definition of u F for clarity.A linear function was used for simplicity and to reduce parameterization.Within the function, if there is no slope the function has no time-variance, and permits the assessment of how sensitive the soil is to time-variant conditions.R3C20: Model performance measure: Equation 19: Was the adjusted NSE newly developed in this study, or are there any references to cite?
Response to R3C20: The adjusted NSE is newly developed to better account for the large variability of measurement in the soil water.This metric was created throughout model development as it was determined that the mode efficiencies (not limited to NSE) were highly sensitive to the event in January 2016 (1 in 200 year event).Using the mean value on each day was not representative of the measurements, as each day presented different uncertainties of measurement.

R3C21:
It is unclear what samples were used in the density estimation.Thus, I had to guess that the replicated samples (n = 4) were used to construct the density.It is also not clear what bandwidth was used for the kernel density estimation.

Response to R3C21:
The soil samples (n = 5) were used to derive the distribution.The manuscript will be revised to clarify how p in Eq. 19 was estimated.The use of kernel density was misstated.The kernel density estimations were only used for visual inspection of the results and a proxy of the GLUE method estimation, not within the NSE metric.Rather, the normal distribution was used (as the reviewer suggested in R3C22).R3C22: Moreover, wouldn't the (perhaps chosen arbitrarily) bandwidth plays an important role in considering the measurement uncertainty in the adjusted NSE?I wonder what the benefit of using the kernel density would be compared to the likelihood functions which considers the measurement uncertainty in a statistically more rigorous way (by using statistics of the measurement such as standard deviation).I don't think the use of the kernel density would be a better way of accounting the uncertainty than using such likelihood functions.

Response to R3C22:
As mentioned in Response to R3C21, it was misstated in the manuscript that kernel density estimation was used within the adjusted NSE estimation.The standard deviation of the samples was used with the normal distribution to estimate the probability (p) used in Eq. 19.Rather than fully addressing the uncertainty of the model, this method was used to reduce emphasis on sample days with high uncertainty.For example, the 1-in-200 year event (January 2016) was highly depleted from some samples, and less depleted for others.Most efficiency criteria are highly sensitive to this sample day as it has a larger deviation from the other sample days.However, there is high uncertainty associated with the sample day.Other evaluation tools to include measurement uncertainty (e.g.Kuppel et al., 2013), use a similar weighting method (e.g.covariance matrices) to reduce the influence of high uncertainty days on the overall model performance.C11 R3C22: It is not described well how the 100 parameter sets were chosen using the multiple NSEs (for different types of measurements and different measures).The only description is: "The "best" calibrations were selected using the N SE adj for all measurements (..) and a cumulative distribution function (Ala-Aho et al., 2017)" in Lines 2-4 on Page 9.It is unclear what the "cumulative distribution function" is in this context until one looks at the cited paper.More detailed description is required so that potential readers can grasp what the authors did without looking at the cited paper.(By the way, the description (Equations 6-8) in the cited paper is written with typos, so it was hard to understand the method.Thus, I think the equations should be re-written in this paper).Moreover, selecting the 100 best parameter sets (not 200, 1000, ..) is quite arbitrary, and the arbitrariness makes it difficult to interpret the model's uncertainty estimation.

Response R3C22:
The authors direct the reviewer to the Initial Response to Reviewers 1 and 2 for how the 100 best parameter sets were selected.The parameter sets were based on minimum efficiency criteria rather than arbitrarily selected.The authors will include a more details explanation of how the parameters were selected, and subsequently ranked using the cumulative distribution function (CDF): n(X) = (∩ 5 i=1 ∩ 3 j=1 P i,j (X ≤ x))/100 where P i,j is a CDF for a model layer i with efficiency criteria j, and n(X) is the number of simulations meeting the objective (X ≤ x).
R3C23: This part, mainly the discussion, was very hard to read as I don't have a clear picture of the developed model.Thus, I wrote only a few comments on this part at this stage.The authors stated that "this model structure does not make the assumption that uptake is time-variant or time-invariant" (for example, in Lines 6-7 on Page C3 in AC2: Response to Reviewer 2), and they argued that the model and data supported the time-variant hypothesis as perhaps the NSEs were higher when the time-variability was allowed (when the parameters u F is time-variant).I don't agree with the statements for two reasons.First, the criteria for choosing the time-invariant model is unclear.Second, C12 and more importantly, I don't think the authors have enough data to discuss the timevariability, and the detected time-variability could be an artifact.I will discuss these in more detail in the following.First, when can you say the beta distribution in Equations 14-15 was time-invariant?When the calibrated parameter (let's say λ) is "exactly" zero?Or, when the absolute value of λ is less than a certain threshold?Moreover, it seems to me that the isotope dataset (presented in Figure 3) is perhaps not sufficient to test the time-variability.With the above (threshold) issue in mind, perhaps the best way to test the hypothesis on the time-variability is to see how the model works for two different cases: one with the λ parameter set to 0 and another by allowing calibration of the parameter.If the authors can identify several periods when the model with λ = 0 captures the observed time-variability, the authors perhaps can say that the timevariable model was required.However, I don't think the dataset is enough to be used for this, and perhaps the model still would do a relatively good job with the λ parameter set to 0. Thus, I suggest the authors show the model results with the λ parameter set to zero.As the use of NSE would not be sufficient to discuss it, the timeseries of model results (similar to Figure 3) should be included (at least in Supplemental material) so that readers can agree to the argument on the time-variability Response to R3C23: The authors will include the parameter distributions in the appendix of the revisions as well as the sensitivity analysis.The parameter, λ, is one of the most sensitive parameters, which defines how variable the SAS function is in time.We did not previously include a direct comparison of time-variant vs time-invariant due to the observed values of calibrated λ.The values of λ determined through calibration showed values trending away from values of 0, or even values near zero.Parameter values trending towards values of zero would indicate that the soils are more likely time-invariant than time-variant.We calibrated mean values of λ of 3.7 for Site A and 2.1 for Site B. For improved comparison, we will provide a time-series of time-invariant and time-variant solutions in the appendix.
R3C24: It should be described better with an equation of how the SAS functions in C13 Figure 6 were estimated.
Response to R3C24: The equation will be included in the methods section.

R3C25:
Comparison of the estimated range of water age.Page 20, Lines 24-25: It seems to me that the ranges overlapped each other quite a lot; thus, it is hard to agree with the statement.
Response to R3C25: While there are brief periods of overlap of the evaporation and root-uptake, the difference between the two fluxes is amplified in the spring and summer of 2016.During this period there are both low soil moisture (prior to June) and high soil moisture conditions (mid June).It is also during these periods that the evaporation and root-uptake are at the greatest values, and the differences between the two will be the largest.This distinct periods will be discussed further in the revised manuscript.