The new importance measures based on vector projection for multivariate output: application on hydrological model

Abstract. Analyzing the effects of the inputs on correlated multivariate output is important for assessing risk and making decisions in hydrological processes. However, the existing methods, such as the output decomposition approach and the covariance decomposition approach, cannot provide sufficient information on these effects, since they only measure the influence of the input variables on the magnitudes of the variances of the dimensionalities in the multiple output space and ignore the effects on the directions of the output variances. In this paper, a new kind of sensitivity index based on vector projection is proposed for multivariate output. By projecting the conditional variance vectors onto the unconditional variance vector in the dimensionless multiple output space, the new sensitivity indices measure the influence of the input variables on the magnitudes and directions of the variance dimensionalities simultaneously. The mathematical properties of the proposed index are discussed, and its link with the Sobol' indices is derived. Polynomial Chaos Expansion (PCE) is used to estimate the proposed sensitivity indices. The results for two numerical examples and a hydrological model indicate the validity and potential benefits of the vector projection index and the efficiency of the estimation approach.


Introduction
Models with multivariate output are widely used in engineering and science, and the outputs are usually correlated to some degree. For example, the outputs of multiple elicitation surveys are applied to the cost of key low-carbon energy technologies (Bosetti, Marangoni et al. 2015), and many dynamic models used to study risk assessment and decision support in ecology and crop science generate time-dependent model predictions, with time being either discretized in a finite number of time steps or considered as continuous (Lamboni, Monod et al. 2011).
Traditional methods for Global Sensitivity Analysis (GSA), including the elementary effect method (Campolongo, Cariboni et al. 2007, Campolongo, Saltelli et al. 2011), the variance based method (Homma and Saltelli 1996, Sobol' 2001), the derivative based method (Sobol' and Kucherenko 2009, Sobol' and Kucherenko 2010) and the moment independent method (Borgonovo 2007, Cui, Lü et al. 2010, Luyi, Zhenzhou et al. 2012), were designed for scalar output. The direct way to perform sensitivity analysis for models with multivariate output is to analyze each output separately. However, this is just a repetition of the traditional GSA and it ignores the correlations among the outputs. Thus, it may be insufficient to perform sensitivity analysis on each output separately or on a few context specific scalar functions of the output (Lamboni, Monod et al. 2011). When the correlation among the outputs is strong, many of the resulting sensitivity indices are redundant, and the result is difficult to interpret (Garcia-Cabrejo and Valocchi 2014). Saltelli and Tarantola proposed defining a scalar index of interest and applying GSA to it in order to simplify the original problem (Saltelli, Tarantola et al. 2000). It is therefore recommended to apply sensitivity analysis to the multivariate output as a whole, and criteria and methods need to be developed for the sensitivity analysis of multivariate output.

Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-259, 2016. Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: 5 July 2016. © Author(s) 2016. CC-BY 3.0 License.
Campbell (Campbell, McKay et al. 2006) proposed the output decomposition method for sensitivity analysis, which consists in (i) performing an orthogonal decomposition of the multivariate output, and (ii) applying sensitivity analysis to the most informative components separately. This method gives more attention to a few components than to the whole output.
To summarize the sensitivity over the whole output, Lamboni (Lamboni, Monod et al. 2011) proposed a new synthetic sensitivity criterion and extended the criterion to the continuous case. Generalized Sobol' sensitivity indices for multivariate output, based on the decomposition of the covariance matrix of the model outputs, were defined by Gamboa et al. (Gamboa, Janon et al. 2013). They are more computationally efficient than the output decomposition method (Lamboni, Monod et al. 2011), since they do not need a spectral decomposition.
These sensitivity analysis methods for multivariate output only consider the sum of the variances of the outputs, which implicitly assumes that the relationship between the outputs is simple and additive.
However, the outputs have different dimensions of measurement and orders of magnitude, so they cannot be used directly in a comprehensive analysis. Therefore, the outputs need to be made dimensionless before the comprehensive analysis. Besides, each output represents one dimensionality of the multivariate output space. The variance of each output represents the uncertainty of that dimensionality, and can be regarded as the magnitude of the corresponding variance dimensionality. The covariance decomposition method compares the importance of the model inputs by their influence on the variance of each output. In the output decomposition method, the original outputs are transformed into a new set of outputs, which form a transformed space, and the influence of the model inputs on the variance of the new outputs tells the importance of each input. The sensitivity methods above therefore measure the influence of the model inputs on the variances of the model outputs, that is, on the magnitudes of all the variance dimensionalities. However, they cannot measure the influence of the model inputs on the directions of the variance dimensionalities, i.e., the direction of the variance vector of the output space, which reflects another characteristic of the multivariate output uncertainty space. Thus, these methods are not sufficient to tell the importance of the model inputs.
In this work, we introduce a new sensitivity index based on vector projection, which captures the influence of the input uncertainties on the magnitudes and directions of the multivariate output variance vector simultaneously. Through a dimensionless process, the influence of the dimension of the outputs is eliminated. The conditional variances and unconditional variances of the outputs are then arranged as vectors, and the influence of the input uncertainties is reflected by the similarity between the conditional variance vector and the unconditional variance vector, which is measured by the vector projection.
The rest of this paper is organized as follows. The next section briefly reviews the variance based global sensitivity indices for scalar output and multiple outputs, and then gives the definition and properties of the vector projection method. In Section 3, Polynomial Chaos Expansion (PCE) is applied to estimate the new sensitivity index. In Section 4, the new sensitivity index is illustrated by two numerical examples and the HBV model, which gives hydrological forecasts and predicts potential climate changes or floods. Conclusions are given at the end of the paper.

Variance based method
The importance measure (IM) is defined as "the study of how uncertainty in the output of a model (numerical or otherwise) can be apportioned to different sources of uncertainty in the model input" (Saltelli, Tarantola et al. 2004), and Sobol decomposition is one of the main methodologies for IM.
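For a scalar output $Y = f(X)$ with independent inputs $X = (X_1, \ldots, X_n)$, the Sobol' decomposition reads

$$Y = f_0 + \sum_{i} f_i(X_i) + \sum_{i<j} f_{ij}(X_i, X_j) + \cdots + f_{1,2,\ldots,n}(X_1, \ldots, X_n) \quad (1)$$

Taking the variance of both sides gives

$$V = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots + V_{1,2,\ldots,n} \quad (2)$$

The first order partial variance $V_i$ represents the individual effect of $X_i$. The second order partial variance $V_{i_1 i_2}$ represents the interaction effect between $X_{i_1}$ and $X_{i_2}$. The higher order partial variances can be explained in the same way. The total partial variance $V_{T_i}$ is the summation of all terms in Eq. (2) with subscripts including $i$:

$$V_{T_i} = V_i + \sum_{j \neq i} V_{ij} + \cdots + V_{1,2,\ldots,n} \quad (3)$$

$V_{T_i}$ consists of the individual effect of $X_i$ and its interaction effects with all the other inputs. The main effect and total effect Sobol' indices are

$$S_i = \frac{V_i}{V} \quad \text{and} \quad ST_i = \frac{V_{T_i}}{V}$$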
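The main and total Sobol' indices can be illustrated with a minimal Monte Carlo sketch. The test function `model`, the sample size and the choice of estimators (the Saltelli et al. 2010 form for the main effect and the Jansen form for the total effect) are assumptions made here for illustration and are not taken from this paper. For the assumed function with independent standard normal inputs, the analytic values are $S = (1/6, 2/3, 0)$ and $ST = (1/3, 2/3, 1/6)$.

```python
import random


def model(x):
    # Hypothetical test function (not from the paper): Y = X1 + 2*X2 + X1*X3
    return x[0] + 2.0 * x[1] + x[0] * x[2]


def sobol_indices(f, n_inputs, n_samples=20000, seed=0):
    """Estimate main (S_i) and total (ST_i) Sobol' indices by Monte Carlo."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_samples)]
    ya = [f(row) for row in a]
    yb = [f(row) for row in b]
    mean = sum(ya) / n_samples
    var = sum((y - mean) ** 2 for y in ya) / (n_samples - 1)
    main, total = [], []
    for i in range(n_inputs):
        # Rows of A with column i replaced by the corresponding value from B.
        yab = [f(ra[:i] + [rb[i]] + ra[i + 1:]) for ra, rb in zip(a, b)]
        # Main effect partial variance: mean of f(B) * (f(A_B^i) - f(A)).
        v_i = sum(q * (p - r) for q, p, r in zip(yb, yab, ya)) / n_samples
        # Total partial variance: half the mean of (f(A) - f(A_B^i))^2.
        v_ti = sum((r - p) ** 2 for r, p in zip(ya, yab)) / (2.0 * n_samples)
        main.append(v_i / var)
        total.append(v_ti / var)
    return main, total


main, total = sobol_indices(model, 3)
```

The gap between `total[i]` and `main[i]` is the share of variance due to interactions involving input `i`, which is non-zero here only for the first and third inputs.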

Covariance decomposition approach
In the case of multivariate output $Y \in \mathbb{R}^m$ ($m > 1$), taking the covariance matrices of both sides of the decomposition of $Y$, Gamboa et al. (Garcia-Cabrejo and Valocchi 2014) obtained

$$C = \sum_{i} C_i + \sum_{i<j} C_{ij} + \cdots + C_{1,2,\ldots,n}$$

The expression implies that the covariance matrix $C$ of the multivariate output can be partitioned into the sum of the covariance matrices that come from changes in single input variables ($C_i$), pairs of input variables ($C_{ij}$), and so on.
According to Eq. (7), the multivariate single effect index $S_i^{M1}$ of the input variable $X_i$ is given by

$$S_i^{M1} = \frac{\mathrm{Tr}(C_i)}{\mathrm{Tr}(C)}$$

while the multivariate total effect index $ST_i^{M}$ is defined in the same way from the traces of all the covariance terms that involve $X_i$.

If the outputs are correlated, the traditional sensitivity measures for multivariate output are difficult to interpret. Furthermore, these methods ignore the influence of the dimension of the output variables. If some outputs have a higher order of magnitude than others, they make an improperly large contribution to the index over the whole set of outputs (Szirtes and Rózsa 2007). To solve these problems, an alternative importance measure for multivariate output is proposed. It is called the vector projection approach.
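The effect of output magnitude on the trace based index can be seen in a small sketch for a linear model, for which the index has a closed form. The coefficient matrix below is hypothetical, and the closed form $\mathrm{Tr}(C_i) = \mathrm{Var}(X_i)\sum_r a_{ri}^2$ holds only under the assumption of a linear model with independent inputs.

```python
def generalized_main_indices(coeffs, input_vars):
    """Trace based main effect index Tr(C_i) / Tr(C) for a linear model.

    coeffs[r][i] is the coefficient of input i in output r and input_vars[i]
    is Var(X_i). For a linear model with independent inputs there are no
    interaction terms and Tr(C_i) = Var(X_i) * sum_r coeffs[r][i] ** 2.
    """
    n = len(input_vars)
    tr_ci = [input_vars[i] * sum(row[i] ** 2 for row in coeffs) for i in range(n)]
    tr_c = sum(tr_ci)
    return [t / tr_c for t in tr_ci]


# Hypothetical two-output, two-input model (values chosen for illustration).
coeffs = [[4.0, 1.0], [1.0, 2.0]]
balanced = generalized_main_indices(coeffs, [1.0, 1.0])

# Same model with the second output magnified 100 times.
magnified = [coeffs[0], [100.0 * c for c in coeffs[1]]]
scaled = generalized_main_indices(magnified, [1.0, 1.0])
```

With comparable magnitudes the index is 17/22 for the first input. After magnification it is close to (0.2, 0.8), which is the scalar Sobol' result of the second output alone, so the first output no longer influences the ranking.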

Preliminaries
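Assume that the random input variables $X = (X_1, \ldots, X_n)$ are independent and defined on some probability space $(\Omega, \mathcal{F}, P)$, and that $Y = (Y^{(1)}, \ldots, Y^{(m)})$ is the multivariate model output.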

Definition of Vector Projection approach
Transformed by Eq. (10), the output $Y^{(r)}$ can be made dimensionless:

$$\hat{Y}^{(r)} = \frac{Y^{(r)}}{\left|E(Y^{(r)})\right|} \quad (10)$$

where $E(\cdot)$ is the expectation operator and $|\cdot|$ is the absolute value operator.
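From Eq. (2), the variance decomposition of $\hat{Y}^{(r)}$ can be obtained:

$$V^{\hat{Y}^{(r)}} = \sum_{i} V_i^{\hat{Y}^{(r)}} + \sum_{i<j} V_{ij}^{\hat{Y}^{(r)}} + \cdots + V_{1,2,\ldots,n}^{\hat{Y}^{(r)}} \quad (11)$$

where $V_{i_1,\ldots,i_s}^{\hat{Y}^{(r)}}$ and $V^{\hat{Y}^{(r)}}$ are the $r$th ($r \in \{1, 2, \ldots, m\}$) conditional variances and unconditional variance of the multivariate output, respectively. Ignoring the superscript $\hat{Y}$ and collecting the $m$ outputs into vectors, the above equation can be simplified as

$$V = \sum_{i} V_i + \sum_{i<j} V_{ij} + \cdots + V_{1,2,\ldots,n} \quad (12)$$

with $V = (V^{(1)}, \ldots, V^{(m)})$ and $V_{i_1,\ldots,i_s} = (V_{i_1,\ldots,i_s}^{(1)}, \ldots, V_{i_1,\ldots,i_s}^{(m)})$.

The vector projection can be used to generalize the importance measure. According to the definition of the inner product (Durier 1994), the vector projection $Q_i$ of the vector $V_i$ on the vector $V$ is given by

$$Q_i = \|V_i\| \cos\theta_i = \frac{\langle V_i, V \rangle}{\|V\|} \quad (13)$$

where $\theta_i$ is the angle from the vector $V$ to the vector $V_i$, $\langle \cdot, \cdot \rangle$ represents the inner product of two vectors, and $\|\cdot\|$ represents the magnitude of a vector. The projection is then normalized by dividing by the norm of the vector $V$, and the new main effect index $P_i$ is defined as:

$$P_i = \frac{Q_i}{\|V\|} = \frac{\langle V_i, V \rangle}{\|V\|^2} \quad (14)$$

Similarly, the vector projection $Q_{i_1,i_2,\ldots,i_s}$ ($i_1, \ldots, i_s \in \{1, \ldots, n\}$, $1 < s \le n$) of the interaction effect is given by:

$$Q_{i_1,i_2,\ldots,i_s} = \|V_{i_1,i_2,\ldots,i_s}\| \cos\theta_{i_1,i_2,\ldots,i_s} = \frac{\langle V_{i_1,i_2,\ldots,i_s}, V \rangle}{\|V\|} \quad (15)$$

where $\theta_{i_1,i_2,\ldots,i_s}$ is the angle from the vector $V$ to the vector $V_{i_1,i_2,\ldots,i_s}$. The interaction effect index is:

$$P_{i_1,i_2,\ldots,i_s} = \frac{Q_{i_1,i_2,\ldots,i_s}}{\|V\|} = \frac{\langle V_{i_1,i_2,\ldots,i_s}, V \rangle}{\|V\|^2} \quad (16)$$

Like Eq. (3), the total effect can be expressed as Eq. (17):

$$V_{T_i} = V_i + \sum_{j \neq i} V_{ij} + \cdots + V_{1,2,\ldots,n} \quad (17)$$

$V_{T_i}$ includes the individual effect of $X_i$ and its interaction effects with all the other inputs. It represents the remaining variances of all model outputs when all the inputs but $X_i$ are fixed over their full ranges.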
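The vector projection of the total effect $Q_{T_i}$ can be expressed as:

$$Q_{T_i} = \|V_{T_i}\| \cos\theta_{T_i} = \frac{\langle V_{T_i}, V \rangle}{\|V\|} \quad (18)$$

And the total effect index is:

$$P_{T_i} = \frac{Q_{T_i}}{\|V\|} = \frac{\langle V_{T_i}, V \rangle}{\|V\|^2} \quad (19)$$

Lemma 2.2.1 The vector projection measures sum up to 1, i.e.,

$$\sum_{i} P_i + \sum_{i<j} P_{ij} + \cdots + P_{1,2,\ldots,n} = 1$$

Proof. The unconditional variance vector is the sum of all the partial variance vectors, $V = \sum_i V_i + \sum_{i<j} V_{ij} + \cdots + V_{1,2,\ldots,n}$. By the linearity of the inner product, the sum of all the indices equals $\langle V, V \rangle / \|V\|^2 = 1$.

Proposition 2.2.1 For all input terms $i$ the vector projection indices satisfy:

(i) $0 \le P_i \le P_{T_i} \le 1$.

(ii) If $X_i$ has no effect on $\hat{Y}$, then $P_i = 0$.

(iii) If $X_i$ has no effect on $\hat{Y}^{(k)}$ but has an effect on $\hat{Y}^{(l)}$, then $P_i = \frac{1}{\|V\|^2} \sum_{r \neq k} (V^{(r)})^2 \, \mathrm{corr}^2\left(\hat{Y}^{(r)}, E(\hat{Y}^{(r)} \mid X_i)\right) > 0$, where $\mathrm{corr}(\cdot, \cdot)$ is the correlation coefficient between two random variables.

(iv) $P_i$ is a weighted sum of the Sobol' indices of the individual outputs.

(v) When $m = 1$, $P_i$ equals the Sobol' index $S_i$.

Proof. Point (i): positivity is clear, as all the components of the vectors in Eq. (19) are variances and hence non-negative. Points (ii) and (iii) are easy to prove from the definition. For (iv) and (v), more details are given in Section 2.3.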
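The definitions above can be sketched numerically. The partial variance vectors below are hypothetical values for $n = 2$ inputs and $m = 2$ outputs, chosen only to show that the normalized projections of all the partial variance vectors on the unconditional variance vector sum to one and that the total effect index is the sum of the main and interaction indices.

```python
def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def projection_index(v_part, v_total):
    """Projection of v_part on v_total, normalized by the norm of v_total."""
    return dot(v_part, v_total) / dot(v_total, v_total)


# Hypothetical partial variance vectors; one component per output.
v1 = [2.0, 1.0]      # main effect of X1
v2 = [1.0, 3.0]      # main effect of X2
v12 = [1.0, 0.5]     # interaction effect of X1 and X2

# The unconditional variance vector is the sum of all the partial vectors.
v = [a + b + c for a, b, c in zip(v1, v2, v12)]

p1 = projection_index(v1, v)
p2 = projection_index(v2, v)
p12 = projection_index(v12, v)

# Total effect of X1: project the sum of every term that involves X1.
v_t1 = [a + c for a, c in zip(v1, v12)]
p_t1 = projection_index(v_t1, v)
```

Because the inner product is linear, `p_t1` equals `p1 + p12`, and `p1 + p2 + p12` equals one up to rounding.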

The link between the vector projection index and the Sobol index
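A comparable definition of $S_i$ proposed by Sobol' (Sobol 1996) is based on the correlation between the $k$th output $\hat{Y}^{(k)}$ and the conditional expectation $E(\hat{Y}^{(k)} \mid X_i)$:

$$S_i^{(k)} = \frac{V_i^{(k)}}{V^{(k)}} = \mathrm{corr}^2\left(\hat{Y}^{(k)}, E(\hat{Y}^{(k)} \mid X_i)\right)$$

And $P_i$ is given by

$$P_i = \frac{\sum_{k=1}^{m} V_i^{(k)} V^{(k)}}{\sum_{k=1}^{m} (V^{(k)})^2} = \sum_{k=1}^{m} \omega_k S_i^{(k)}$$

where $\omega_k = (V^{(k)})^2 / \sum_{r=1}^{m} (V^{(r)})^2$. And when $m = 1$, the index $P_i$ reduces to the Sobol' index $S_i$.

Fig. 1 Geometric interpretation of the vector projection index for two outputs.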
For the scalar output, the Sobol index measures the ratio of magnitude of the conditional variance to the magnitude of the unconditional variance.For the multivariate output case, multiple parameters can be expressed as a vector and the inner product can measure the similarity between the vectors.
The degree of coincidence between the vector $V_i$, which contains all the conditional variances, and the vector $V$, which contains all the unconditional variances, indicates the main effect contribution of each input factor to the multivariate variance of all the outputs. Fig. 1 shows the geometric interpretation of the vector projection measure for multivariate output ($m = 2$). In Fig. 1, the vector angle $\theta_i$ reflects the difference in direction between the two vectors: the smaller $\theta_i$ is, the more closely the vectors overlap. Besides the vector angle, the magnitude of the vector $V_i$ also influences the degree of coincidence. The new sensitivity index is the ratio of the projection of the vector $V_i$ on the vector $V$ to the norm of the unconditional variance vector.
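The geometric reading and the link with the scalar Sobol' indices can be checked on a small example. The variance values below are hypothetical, and the weighted-sum form used in `projection_from_sobol` is our reading of the relation between the projection index and the per-output Sobol' indices (weights proportional to the squared unconditional variances), stated here as an assumption.

```python
import math


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def projection_from_angle(v_i, v):
    """Return (P_i, theta_i in degrees) from magnitudes and the vector angle."""
    cos_theta = sum(a * b for a, b in zip(v_i, v)) / (norm(v_i) * norm(v))
    cos_theta = min(1.0, cos_theta)  # guard against rounding above 1
    p_i = norm(v_i) * cos_theta / norm(v)
    return p_i, math.degrees(math.acos(cos_theta))


def projection_from_sobol(s_i, v):
    """P_i as a weighted sum of per-output Sobol' indices (assumed form)."""
    weights = [x * x for x in v]
    return sum(w * s for w, s in zip(weights, s_i)) / sum(weights)


# Hypothetical variances for m = 2 outputs.
v = [17.0, 5.0]               # unconditional variances V^(1), V^(2)
v_1 = [16.0, 1.0]             # conditional variances of X_1
s_1 = [16.0 / 17.0, 1.0 / 5.0]  # scalar Sobol' indices of X_1 per output

p_angle, theta_deg = projection_from_angle(v_1, v)
p_sobol = projection_from_sobol(s_1, v)
p_scalar = projection_from_sobol([0.3], [2.0])  # m = 1 reduces to S_i
```

Both routes give the same value, 277/314, and with a single output the index coincides with the scalar Sobol' index.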

Polynomial Chaos Expansion
The Polynomial Chaos Expansion (PCE) of a second-order random variable is a decomposition of the form (Wiener 1938, Ghanem and Spanos 1991)

$$Y = \sum_{j=0}^{\infty} a_j \Psi_j(\xi) \quad (24)$$

where $a_j$ is the $j$th coefficient, the multivariate polynomials $\Psi_j$ ($j = 0, 1, \ldots$) are orthogonal to each other with respect to the corresponding PDF (Xiu and Karniadakis 2002, Xiu 2010), and $\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ is a vector of independent standard normal random variables. To be used in engineering models, Eq. (24) needs to be truncated. If the order of the polynomials is $M$ and the number of input variables is $n$, the total number of terms $P + 1$ with order less than or equal to $M$ is given by

$$P + 1 = \frac{(n + M)!}{n! \, M!} \quad (25)$$

There are two approaches for the estimation of the coefficients: projection and regression. The projection approach takes advantage of the orthogonality of the polynomials $\Psi_j$ and estimates the coefficients by multidimensional numerical integration (Ghanem and Spanos 1991), but it requires a large number of model evaluations to compute the integrals (Xiu 2010). In the regression approach (Berveiller, Sudret et al. 2006, Sudret 2008) we define $\Psi$ as the matrix whose entries are $\Psi_{kj} = \Psi_j(\xi^{(k)})$, $k = 1, \ldots, N$; $j = 0, \ldots, P$, that is, the orthogonal polynomials evaluated at the collocation points $\xi^{(k)}$. From these we can obtain the coefficients $a = (a_0, \ldots, a_P)^T$ as

$$a = (\Psi^T \Psi)^{-1} \Psi^T Y \quad (26)$$

where $Y$ is the vector of the $N$ model evaluations. Due to the orthogonality of the basis, it is easy to show that the mean and variance of the random variable $Y$ respectively read:

$$E[Y] = a_0, \qquad \mathrm{Var}[Y] = \sum_{j=1}^{P} a_j^2 \, E[\Psi_j^2] \quad (27)$$
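The regression approach can be sketched for a single standard normal input. The sketch assumes probabilists' Hermite polynomials as the orthogonal basis (for which $E[\Psi_j^2] = j!$), solves the normal equations with a small hand-written elimination routine, and uses a hypothetical test function $Y = \xi^2 + \xi$ whose exact expansion has coefficients $(1, 1, 1)$ and variance 3. None of these choices is taken from the paper.

```python
import math
import random


def hermite_e(order, x):
    """Probabilists' Hermite polynomial He_order(x) by the three-term recurrence."""
    if order == 0:
        return 1.0
    h_prev, h = 1.0, x
    for k in range(1, order):
        h_prev, h = h, x * h - k * h_prev
    return h


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_pce_1d(f, order, n_points=200, seed=1):
    """Least-squares PCE coefficients from random collocation points."""
    rng = random.Random(seed)
    xi = [rng.gauss(0.0, 1.0) for _ in range(n_points)]
    psi = [[hermite_e(j, x) for j in range(order + 1)] for x in xi]
    y = [f(x) for x in xi]
    size = order + 1
    gram = [[sum(row[j] * row[k] for row in psi) for k in range(size)] for j in range(size)]
    rhs = [sum(row[j] * yk for row, yk in zip(psi, y)) for j in range(size)]
    return solve_linear(gram, rhs)


def pce_mean_variance(coeffs):
    """Mean and variance from the coefficients, using E[He_j^2] = j!."""
    variance = sum(a * a * math.factorial(j) for j, a in enumerate(coeffs) if j > 0)
    return coeffs[0], variance


coeffs = fit_pce_1d(lambda x: x * x + x, order=2)
mean, variance = pce_mean_variance(coeffs)
```

Once the coefficients are available, the moments follow from them directly with no further model evaluations.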

Estimating the new sensitivity indices for multivariate output by the PCE
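The truncated PCE of each dimensionless output $\hat{Y}^{(k)}$ can be rearranged by grouping its terms according to the input variables on which they depend, where $I_{i_1,\ldots,i_s}$ is the set of indices $j$ whose polynomial $\Psi_j$ depends on exactly the variables $(\xi_{i_1}, \ldots, \xi_{i_s})$. It is an integer set used to match each term in Eq. (24) to the orthogonal polynomials (Sudret 2008). The variance of $\hat{Y}^{(k)}$ can be obtained using Eq. (27), and the partial variances follow from the corresponding groups of coefficients, $V_{i_1,\ldots,i_s}^{(k)} = \sum_{j \in I_{i_1,\ldots,i_s}} (a_j^{(k)})^2 E[\Psi_j^2]$. Therefore the main effect of the new sensitivity index for an input variable $X_i$ can be given by:

$$P_i = \frac{\sum_{k=1}^{m} V_i^{(k)} V^{(k)}}{\sum_{k=1}^{m} (V^{(k)})^2}, \qquad V_i^{(k)} = \sum_{j \in I_i} (a_j^{(k)})^2 E[\Psi_j^2] \quad (30)$$

And the interaction effect of any group of input variables $X_{i_1}, \ldots, X_{i_s}$ can be estimated by PCE as follows:

$$P_{i_1,\ldots,i_s} = \frac{\sum_{k=1}^{m} V_{i_1,\ldots,i_s}^{(k)} V^{(k)}}{\sum_{k=1}^{m} (V^{(k)})^2} \quad (31)$$

Similarly, the total effect index of $X_i$ can be compactly expressed as

$$P_{T_i} = \frac{\sum_{k=1}^{m} V_{T_i}^{(k)} V^{(k)}}{\sum_{k=1}^{m} (V^{(k)})^2}, \qquad V_{T_i}^{(k)} = \sum_{j \in I_{T_i}} (a_j^{(k)})^2 E[\Psi_j^2] \quad (32)$$

where $I_{T_i}$ is the set of indices $j$ whose polynomial $\Psi_j$ depends on $\xi_i$.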
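The post-processing of PCE coefficients into vector projection indices can be sketched as follows. The data layout (one dictionary per output, mapping a tuple of per-input polynomial degrees to a coefficient), the Hermite normalization $E[\Psi^2] = \prod_d \alpha_d!$ and the coefficient values are all assumptions made for illustration; inputs are numbered from zero in the code.

```python
import math


def partial_variance_vectors(pce_outputs):
    """Group squared PCE coefficients by the set of inputs each term depends on.

    pce_outputs holds one dict per output; each dict maps a multi-index
    (polynomial degree per input) to the coefficient of that basis term.
    Returns {tuple of active inputs: [partial variance for each output]}.
    """
    m = len(pce_outputs)
    groups = {}
    for k, pce in enumerate(pce_outputs):
        for alpha, coef in pce.items():
            active = tuple(i for i, degree in enumerate(alpha) if degree > 0)
            if not active:
                continue  # the constant term carries the mean, not variance
            # E[Psi^2] for a product of probabilists' Hermite polynomials.
            norm_sq = math.prod(math.factorial(d) for d in alpha)
            groups.setdefault(active, [0.0] * m)[k] += coef * coef * norm_sq
    return groups


def projection_indices(groups):
    """Normalized projection of each partial variance vector on the total."""
    m = len(next(iter(groups.values())))
    v = [sum(vec[k] for vec in groups.values()) for k in range(m)]
    norm_sq = sum(x * x for x in v)
    return {g: sum(a * b for a, b in zip(vec, v)) / norm_sq for g, vec in groups.items()}


def total_index(indices, i):
    """Total effect index: sum over every group that contains input i."""
    return sum(p for g, p in indices.items() if i in g)


# Hypothetical coefficients for two outputs of a two-input model.
pce_y1 = {(0, 0): 1.0, (1, 0): 2.0, (0, 1): 1.0, (1, 1): 1.0}
pce_y2 = {(0, 0): 0.5, (1, 0): 1.0, (0, 1): 2.0}

indices = projection_indices(partial_variance_vectors([pce_y1, pce_y2]))
p_total_first = total_index(indices, 0)
```

For these coefficients the main effect indices are 29/61 and 26/61, the interaction index is 6/61, and the total effect index of the first input is 35/61.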

Numerical examples
Example 4.1 Consider a linear model with multivariate output whose input variables are independent standard normal variables, $X_i \sim N(0,1)$ ($i = 1, \ldots, 4$). The sensitivity results obtained by the Sobol' index $S_i$ for each output, the covariance decomposition index $S_i^{M1}$ and the proposed index $P_i$ are listed in Table 1. In this model, the second function $y^{(2)}$ is magnified 100 times to simulate the influence of dimension.
Since the input variables are standard normal random variables, the importance ranking can be found directly through qualitative analysis. Since there is no interaction term in this example, only the main effect indices are calculated. The following observations can be made from Table 1.
Firstly, the importance rankings and the sensitivity values obtained by $S_i$ for $y^{(2)}$ and by $S_i^{M1}$ are the same. This is explained by the fact that $S_i^{M1}$ is dominated by the high order of magnitude of the second function $y^{(2)}$. Therefore, $S_i^{M1}$ cannot describe the output variability comprehensively when the dimensions of the outputs differ and the orders of magnitude of $y^{(1)}$ and $y^{(2)}$ are far apart. Secondly, the ranking given by the vector projection index agrees with the ranking from the qualitative analysis, which indicates that the new sensitivity index is more applicable than the traditional indices for multivariate output. Thirdly, PCE requires less computational cost to obtain convergent values, so the new sensitivity index can be estimated accurately and efficiently by the PCE based method.
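The role of the dimensionless step can be illustrated with a hypothetical linear model (not the model of Example 4.1, whose coefficients are not reproduced here). The sketch assumes that each output is made dimensionless by dividing it by the absolute value of its mean, and that the inputs are independent standard normal variables, so that the partial variance of input $i$ in output $r$ is the squared coefficient.

```python
def linear_projection_indices(coeffs, consts, dimensionless=True):
    """Main effect vector projection indices for a linear model.

    Output r is consts[r] + sum_i coeffs[r][i] * X_i with independent
    X_i ~ N(0, 1), so its mean is consts[r]. With dimensionless=True each
    output is divided by |consts[r]| before the variances are computed
    (an assumed form of the dimensionless transform).
    """
    m, n = len(coeffs), len(coeffs[0])
    scale = [abs(c) if dimensionless else 1.0 for c in consts]
    v_i = [[(coeffs[r][i] / scale[r]) ** 2 for r in range(m)] for i in range(n)]
    v = [sum(v_i[i][r] for i in range(n)) for r in range(m)]
    norm_sq = sum(x * x for x in v)
    return [sum(a * b for a, b in zip(v_i[i], v)) / norm_sq for i in range(n)]


# Hypothetical model and the same model with the second output magnified.
coeffs, consts = [[4.0, 1.0], [1.0, 2.0]], [2.0, 1.0]
coeffs_mag, consts_mag = [[4.0, 1.0], [100.0, 200.0]], [2.0, 100.0]

p_original = linear_projection_indices(coeffs, consts)
p_magnified = linear_projection_indices(coeffs_mag, consts_mag)
p_raw = linear_projection_indices(coeffs_mag, consts_mag, dimensionless=False)
```

With the dimensionless step, `p_original` and `p_magnified` coincide, so magnifying an output does not change the ranking. Without it, `p_raw` is close to (0.2, 0.8), the scalar result of the magnified output alone.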
Example 4.2 Consider the nonlinear model used in (Luyi, Zhenzhou et al. 2016). The input variables follow normal distributions, and their distribution parameters are shown in Table 2. The sensitivity results are listed in Table 3. Since this example has interaction terms, both the main effect indices and the total effect indices based on the covariance decomposition and the vector projection are presented in Table 3. To compare the differences other than dimension between the new indices and the traditional method for multivariate output, we also calculate $\hat{S}_i^{M1}$ and $\hat{ST}_i^{M}$, in which the influence of the dimension of the outputs is eliminated. The proposed measure provides an efficient alternative for the sensitivity analysis of multivariate output by taking the dimension, magnitudes and directions of the multivariate variances into account simultaneously.

The hydrological model: HBV model
The HBV model is a conceptual model for rainfall-runoff simulation that takes precipitation, temperature and potential evaporation as inputs. The model consists of a degree-day snow model, a soil-moisture accounting model and a runoff response model (Kollat, Reed et al. 2012). There are a variety of criteria for the calibration of the HBV model (Diskin and Simon 1977, van Werkhoven, Wagener et al. 2009). Here we consider three metrics: the Nash-Sutcliffe Efficiency (NSE) (Nash and Sutcliffe 1970, Kollat, Reed et al. 2012), the Transformed Root-Mean Square Error (TRMSE) (Kollat, Reed et al. 2012) and the Slope of the Flow Duration Curve (SFDCE). Seibert suggested that a combination of different objective functions is suitable for judging different parameter sets which may perform more or less similarly well (Seibert 1997).

Nash-Sutcliffe Efficiency (NSE)
The first objective emphasizes peak flow errors using the Nash-Sutcliffe Efficiency as shown in Eq. (34),

$$NSE = 1 - \frac{\sum_{t=1}^{N} (Q_{s,t} - Q_{o,t})^2}{\sum_{t=1}^{N} (Q_{o,t} - \bar{Q}_o)^2} \quad (34)$$

where $Q_{s,t}$ and $Q_{o,t}$ are the simulated and observed flows at time step $t$ and $\bar{Q}_o$ is the mean observed flow.

The second objective emphasizes low flow errors using the Box-Cox transformed root-mean-square error (TRMSE).

Table 4 The parameters of HBV model and the corresponding ranges

Slope of the Flow Duration Curve (SFDCE)
The third objective emphasizes the flashiness of a watershed's response by minimizing the error in simulating the slope of the flow duration curve (SFDCE) as shown in Eq. (35),

$$SFDCE = \left| \frac{Q_{s,67\%} - Q_{s,33\%}}{Q_{o,67\%} - Q_{o,33\%}} - 1 \right| \times 100\% \quad (35)$$

The three response outputs have different dimensions, and the third function has the largest order of magnitude. As a result, the covariance decomposition indices are dominated by SFDCE, as seen for $ST_i^{M}$ in Fig. 6, which is caused by the influence of the dimension of SFDCE. This suggests that in the multivariate output case, the order of magnitude of the dimension has a great impact on the ranking results. For the main effect, Fig. 5 shows that although both $P_i$ and $S_i^{M1}$ identify the same important variables, BETA and FC, the rankings they give are not the same.
$P_i$ indicates that FC is more important than BETA based on the vector projection, while $S_i^{M1}$ indicates that BETA has the largest importance for the multivariate output, followed by FC.
For the total effect, Fig. 6 shows that the rankings obtained by $P_{T_i}$ and $ST_i^{M}$ also differ.
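The three calibration metrics described in this section can be sketched as follows. The Box-Cox exponent of 0.3, the linear-interpolation quantile used for the 33% and 67% points of the flow duration curve, and the sample flow values are assumptions made for illustration and may differ from the implementation used with the HBV model.

```python
import math


def nse(sim, obs):
    """Nash-Sutcliffe Efficiency; 1 is a perfect fit, 0 matches the mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def box_cox(q, lam=0.3):
    """Box-Cox transform of a flow value (exponent is an assumed default)."""
    return ((1.0 + q) ** lam - 1.0) / lam


def trmse(sim, obs, lam=0.3):
    """Root-mean-square error of the Box-Cox transformed flows."""
    n = len(obs)
    return math.sqrt(sum((box_cox(s, lam) - box_cox(o, lam)) ** 2 for s, o in zip(sim, obs)) / n)


def quantile(values, q):
    """Quantile by linear interpolation between sorted values."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lower = int(math.floor(pos))
    upper = min(lower + 1, len(data) - 1)
    return data[lower] + (pos - lower) * (data[upper] - data[lower])


def sfdce(sim, obs):
    """Percent error in the mid-segment slope of the flow duration curve."""
    slope_sim = quantile(sim, 0.67) - quantile(sim, 0.33)
    slope_obs = quantile(obs, 0.67) - quantile(obs, 0.33)
    return abs(slope_sim / slope_obs - 1.0) * 100.0


# Hypothetical observed flows and two trivial simulations.
observed = [1.0, 3.0, 2.0, 5.0, 4.0, 2.5]
mean_flow = sum(observed) / len(observed)
constant_sim = [mean_flow] * len(observed)
doubled_sim = [2.0 * q for q in observed]
```

The three metrics have different units and ranges (NSE is at most 1, TRMSE is in transformed flow units, SFDCE is a percentage), which is why a dimensionless treatment matters when they are analyzed jointly.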

Conclusions
The vector projection importance measure is proposed in this paper to evaluate the comprehensive effect of the inputs on the magnitudes and directions of the variances in the multiple output space. The mathematical properties of the new sensitivity index are derived and its geometric significance is discussed. Two numerical examples and a hydrological model are employed to verify the effectiveness of the proposed method. Comparison with the covariance decomposition method shows that the new sensitivity index based on the vector projection can measure the effect of the inputs on the whole uncertainty of the multivariate output synthetically.
The rankings of the input variables obtained by the generalized sensitivity indices are not necessarily the same as those obtained by the proposed index. This is because the vector projection based method additionally considers the effects on the dimension and directions, which are ignored by the traditional indices. Thus, measuring only the effects of the input variables on the magnitudes of the variances is not enough to reflect the relative importance of the input variables comprehensively. In addition, the Polynomial Chaos Expansion method is used to estimate the new sensitivity indices for the multivariate output, and the main computational cost of the PCE based method is the estimation of the coefficients of the expansions. Thus the PCE based method for estimating the new sensitivity index is efficient compared with the Monte Carlo Simulation.
The variance decomposition of the multivariate output can be obtained as follows.
The trace $\mathrm{Tr}[C]$ is the sum of the variances of all outputs $Y^{(r)}$ ($r = 1, \ldots, m$), and $\mathrm{Tr}[C_i]$ and $\mathrm{Tr}[C_{\sim i}]$ can be regarded as the sums of the variances of all the outputs $Y$ associated with the input variable $X_i$ and with $X_{\sim i}$, respectively. Garcia-Cabrejo et al. (Garcia-Cabrejo and Valocchi 2014) pointed out that the output decomposition method and the covariance decomposition method are equivalent if the first $K$ eigenvectors in the principal component decomposition preserve the original variance of the outputs. The output and covariance decomposition methods mainly focus on the sum of the variances of the multivariate output. However, the comprehensive effect of an input variable on the multiple outputs may not be equal to the sum of its contributions to each scalar output.
The proposed new sensitivity indices for the multivariate output have analytical expressions which are estimated from the coefficients of the PCE of the output variables. There is no additional cost for obtaining the new sensitivity indices once the coefficients of the PCE are available.

4. Example
In this section the new sensitivity index is applied to two numerical examples and a hydrological model to analyze the influence of the input variables on the multivariate output. The results of the vector projection index are compared with those of the sensitivity index based on the covariance decomposition method. To ensure the convergence of the computational results, the sample size of the Monte Carlo Simulation (MCS) for all the sensitivity indices is taken as $N = 100000$. The results of the PCE of different orders ($M$) are compared with the results of the MCS to verify the effectiveness of the PCE based method.
The vector projection indices account for both the magnitudes of the variances and the directions in the multiple output space. This indicates that the importance measures based on the vector projection are more comprehensive than the generalized Sobol' indices. In addition, although the results of $P_i$ and $P_{T_i}$ estimated by MCS and PCE are approximately equal, the computational cost of PCE is much less than that of MCS. Once the coefficients of the PCE are estimated, the multivariate sensitivity indices can be obtained without additional computational cost, as shown in Eqs. (30)-(32).

Fig. 2 Sketch map of the HBV model

There are 13 parameters that should be calibrated for the HBV model. The parameters and the corresponding ranges are shown in Table 4 (the first four parameters are related to the degree-day snow module, the next three to the soil-moisture accounting model, and the last six to the runoff response model). The ranges of the parameters are based on prior studies (Kollat, Reed et al. 2012).

Fig. 6 The total effect indices of multivariate output of the HBV model

In Figs. 3 and 4, the results of the Sobol' indices $S_i$ and $ST_i$ of the three outputs are presented, respectively. The sensitivity analysis results of the multivariate output of the HBV model, obtained by the vector projection indices $P_i$ and $P_{T_i}$ and by the covariance decomposition indices, are shown in Figs. 5 and 6. The two methods rank the interaction effects between the input variables differently, since $P_{T_i}$ includes the magnitudes of the variances and the directions in the dimensionless multiple output space, whereas the traditional sensitivity index includes only the magnitudes of the variances in the multiple output space. In addition, Figs. 5 and 6 show that the results of PCE are similar to those of MCS. The PCE is able to evaluate the proposed index with 6720 model evaluations ($M = 3$), which is much lower than the number required by MCS.

From the above results, it can be concluded that, among the 13 inputs, the parameters BETA and FC are by far the most important for the three outputs represented by NSE, TRMSE and SFDCE. The next most important inputs are the parameters CFMAX, TS, CWH, K1, UZL and MAXBAS, since they have large interaction effects. The remaining parameters CFR, LP, PERC, K0 and K2 contribute less to the multivariate output.
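The reported cost of 6720 model evaluations can be related to the size of the expansion with a short calculation. The sketch assumes the standard count of polynomial terms of total order at most $M$ in $n$ variables, $(n+M)!/(n!\,M!)$; reading the ratio of evaluations to terms as an oversampling factor of the regression is our interpretation and is not stated in the text.

```python
import math


def pce_term_count(n_inputs, order):
    """Number of polynomial terms of total order <= order in n_inputs variables."""
    return math.comb(n_inputs + order, order)


terms = pce_term_count(13, 3)   # 13 HBV parameters, third-order expansion
evaluations = 6720              # model evaluations reported for M = 3
ratio = evaluations / terms     # evaluations per unknown coefficient
```

Under these assumptions the expansion has 560 coefficients for each output, so 6720 evaluations correspond to 12 evaluations per coefficient, compared with the MCS sample size of $N = 100000$.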
In the regression approach, the coefficients are estimated by minimizing the sum of squares of the differences between a set of $N$ model evaluations and the corresponding PCE approximations.
Table 2 The distribution parameters of Example 4.2

Table 3 Sensitivity indices for Example 4.2