Interactive comment on “A nonlinear modelling-based high-order response surface method for predicting monthly pan evaporations” by Behrooz Keshtegar and Ozgur Kisi

This manuscript deals with the estimation of monthly pan evaporation using a high-order response surface (HORS) function along with three machine-learning algorithms. The abstract is clear and well written. It is not clear whether pan evaporation or total pan evaporation is estimated, and the difference between these two terms is not explained. Introduction: Page 3, line 56: The authors state that the main disadvantage of the ANFIS and ANN methods is their complex formulations. How is the proposed HORS a simpler formulation compared to ANFIS and ANN? This needs clarification. It would be useful to mention studies carried out using HORS in water resources. Why was HORS chosen? Section 2, pages 3-5, High-order response surface method: This section is lengthy and difficult to read. It is suggested to provide a flow chart of the methodology. Section 4: Comparative statistics


Introduction
Evaporation is critical in water resources development and management. In arid and semiarid regions where water resources are scarce, the prediction of evaporation is of particular interest in the planning and management of water resources (Karimi-Googhari, 2010).
Accurate determination of the evaporation amount from the soil is vital for analyzing the water balance at the land surface, which is essential to compute drainage requirements for preventing waterlogging and removing excess water from the root zone to improve crop production (de Ridder and Boonstra, 1994; Kim et al., 2014). In practice, the estimation of evaporation can be accomplished by direct or indirect methods. Pan evaporation (Epan) is one of the direct methods for evaporation measurement. Estimation of Epan is of great significance to hydrologists and agriculturists. Indirect methods, for example mass transfer and water budget techniques, which take meteorological data into account, have been used by numerous researchers to estimate evaporation from a water body (Coulomb et al., 2001; Gavin and Agnew, 2004).
Hydrol. Earth Syst. Sci. Discuss., doi:10.5194/hess-2016-191, 2016. Manuscript under review for journal Hydrol. Earth Syst. Sci. Published: June 2016. © Author(s) 2016. CC-BY 3.0 License.
In recent decades, data-driven methods such as fuzzy genetic (FG), artificial neural network (ANN) and adaptive neuro-fuzzy inference system (ANFIS) have been applied for modelling Epan (Sudheer et al., 2002; Kisi, 2009; Dogan et al., 2010; Kim et al., 2013; Kisi and Tombul, 2013; Malik and Kumar, 2015) and have been successfully applied in water resources (Moghaddamnia et al., 2009; Amini et al., 2010; Sanikhani et al., 2012; Kisi and Tombul, 2013; Liu et al., 2014; Li et al., 2015; Khan and Valeo, 2015; Xu et al., 2016). Sudheer et al. (2002) used ANN in modelling Epan and compared it with the Stephens-Stewart (SS) method. They found that a two-input ANN whose inputs are air temperature and solar radiation performed better than the SS method. Kisi (2009) compared the accuracy of three different ANN techniques in modelling daily Epan and indicated that the multi-layer perceptron (MLP) and radial basis ANN gave almost similar estimates and that their accuracies were better than those of the generalized regression neural network (GRNN) and SS models. Dogan et al. (2010) successfully applied ANFIS to Epan of Yuvacik Dam, Turkey, and compared it with multiple linear regression (MLR). Shiri et al. (2011) successfully estimated daily Epan using ANFIS and ANN methods. Kim et al. (2013) used two different ANNs, an MLP and a cascade correlation neural network (CCNN), in the prediction of daily Epan and found the CCNN to perform better than the MLP. Kisi and Tombul (2013) successfully modeled Epan of the Antalya and Mersin stations, Turkey, using FG and compared it with the ANN and ANFIS methods. They found the FG method to be superior to the other methods in modelling Epan. Malik and Kumar (2015) modeled daily Epan of Pantnagar, located in Uttarakhand, India, using co-active ANFIS, ANN and MLR methods and reported that the ANN had better accuracy than the other models. The main disadvantage of the ANFIS and ANN methods is their complex formulations.
Therefore, simpler and more efficient models are needed for estimating Epan in practical applications.
The main aim of this study is to investigate the ability of the high-order response surface (HORS) function in the estimation of Epan and to compare it with the FG, ANFIS and ANN models previously developed by Kisi and Tombul (2013). This is the first study that applies HORS in Epan modelling.

High-order response surface method
In a stochastic process such as pan evaporation, accurate prediction in terms of a set of input variables, which are selected based on climatic data, is vitally important. Generally, evaporation is an implicit process that can depend on several input variables (X) such as air temperature, solar radiation, relative humidity and wind speed. Finding a closed-form expression for evaporation based on the input climatic variables is the main effort in predicting the evaporation process, because evaporation cannot be evaluated when an accurate approximation is not available. To overcome this difficulty, the response surface methodology (RSM) can be implemented to estimate the monthly pan evaporation by an approximate closed-form expression.
The RSM was proposed based on a set of mathematical polynomial functions fitted through a number of experiments to increase computational efficiency (Bucher and Bourgund 1990); it is useful for modelling and evaluating an implicit process as a response surface function (RSF) in explicit form. A function E(X) of n input variables X = (x1, x2, …, xn) is expressed using a second-order polynomial form with cross terms as follows (Khuri and Cornell 1996):

$$E(\mathbf{X}) = a_0 + \sum_{i=1}^{n} a_i x_i + \sum_{i=1}^{n} a_{ii} x_i^2 + \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} a_{ij} x_i x_j \qquad (1)$$

where $a_0$, $a_i$, $a_{ii}$ and $a_{ij}$ are the unknown coefficients to be determined based on calibration with a number of experimental samples. The inclusion of the cross terms in the second-order polynomial RSF may provide an appropriate prediction of evaporation. However, it may not produce accurate approximations for a highly nonlinear process with several input variables. Therefore, a quadratic polynomial form of the RSF is inappropriate for approximating the pan evaporation over a wide range of stations, since the degree of nonlinearity of the evaporation function varies from station to station. The high-order response surface method was developed by Gavin and Yau (2008) to achieve better flexibility of the RSF. The high-order RSF proposed by Gavin and Yau (2008) is computationally inefficient, because the order of each variable in a mixed term must be determined and a four-step calibration process is used to compute the unknown coefficients. Therefore, its application to the prediction of evaporation is not simple when more input variables are involved. A high-order response surface function based on Eq. (1) is proposed as follows: [...] coefficients, in which Or is the order of the RSF. The main effort in the RSM is to fit an RSF based on Eq. (2) to the limited experimental points. The high-order RSF of Eq. (2) is rewritten in matrix form as (Kang et al. 2010):
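$$E(\mathbf{X}) = \mathbf{P}(\mathbf{X})^{T}\mathbf{a} \qquad (3)$$

where $\mathbf{a}$ is the coefficient vector and $\mathbf{P}(\mathbf{X})$ is the polynomial basis function vector, which is defined based on the polynomial order of the RSF (Eq. 4). The least squares estimator is commonly used to evaluate the unknown coefficients of the RSF in terms of the experimental points (Kang et al. 2010). In the least squares method, the unknown coefficients $\mathbf{a}$ are computed by minimizing the error between the experimental ($\mathbf{E}$) and approximate ($\mathbf{P}\mathbf{a}$) values:

$$e(\mathbf{a}) = (\mathbf{E} - \mathbf{P}\mathbf{a})^{T}(\mathbf{E} - \mathbf{P}\mathbf{a}) \qquad (5)$$

in which $\mathbf{E} = [E_1, \ldots, E_{NE}]^{T}$ and $\mathbf{P} = [\mathbf{P}(\mathbf{X}_1), \ldots, \mathbf{P}(\mathbf{X}_{NE})]^{T}$ are the experiment vector and the polynomial function matrix for the NE data points, respectively. Minimizing the error function in Eq. (5) with respect to the unknown coefficients gives $\partial e / \partial \mathbf{a} = 0$. Thus, the coefficients $\mathbf{a}$ are obtained as follows:

$$\mathbf{a} = [\mathbf{P}^{T}\mathbf{P}]^{-1}\mathbf{P}^{T}\mathbf{E} \qquad (6)$$

Substituting Eq. (6) into Eq. (3), the predicted evaporation based on the high-order RSF is obtained as follows:

$$E(\mathbf{X}) = \mathbf{P}(\mathbf{X})^{T}[\mathbf{P}^{T}\mathbf{P}]^{-1}\mathbf{P}^{T}\mathbf{E} \qquad (7)$$

The proposed high-order polynomial RSF of Eq. (2) produces more accurate results, and its unknown coefficients are obtained simply and with greater computational efficiency by Eq. (7). The high-order RSF for predicting the pan evaporation is obtained from the set of observed climatic data points using code written in MATLAB 7.10 (2010) and run on an Intel(R) Core(TM) i5 laptop with two 2.53 GHz CPU processors and 4.0 GB of RAM, through the following algorithm:

Algorithm of high-order RSF:
Give initial parameters and database: NE (number of experiments including the train and test data); NV (number of validation data); X (input train and test data); XV (input validation data); E (evaporation of the test and train database).
FOR i = 1 to NE
Compute P(X_i) based on Eq. (4)
END FOR
Compute the unknown coefficients a based on Eq. (6)
Compute the predicted evaporation based on the high-order RSF as P(X)^T a
FOR i = 1 to NV
Compute P(XV_i) based on Eq. (4)
END FOR
Determine the validated evaporation using the high-order RSF as P(XV)^T a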
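As an illustration of the calibration and validation steps of the algorithm, the following Python sketch fits a polynomial response surface by least squares. It is not the authors' MATLAB code. The function names (poly_basis, fit_rsf, predict_rsf) are hypothetical, and the basis is assumed to contain every monomial of the inputs up to total degree `order`, which may differ from the basis actually defined in Eq. (4). The coefficients are obtained with `numpy.linalg.lstsq`, which solves the same least squares problem as the normal equations without forming the matrix inverse explicitly.

```python
from itertools import combinations_with_replacement

import numpy as np


def poly_basis(X, order):
    """Build the basis matrix P, one row per sample.

    Assumption: the basis holds every monomial of total degree <= order
    (constant, linear, pure and cross terms). The paper's own basis may
    use a different subset of terms.
    """
    X = np.asarray(X, dtype=float)
    n_samples, n_vars = X.shape
    columns = [np.ones(n_samples)]  # constant term a0
    for degree in range(1, order + 1):
        for idx in combinations_with_replacement(range(n_vars), degree):
            # Product of the selected input columns, e.g. x0 * x0 * x1
            columns.append(np.prod(X[:, list(idx)], axis=1))
    return np.column_stack(columns)


def fit_rsf(X, E, order):
    """Calibrate the coefficient vector a by least squares."""
    P = poly_basis(X, order)
    a, *_ = np.linalg.lstsq(P, np.asarray(E, dtype=float), rcond=None)
    return a


def predict_rsf(XV, a, order):
    """Evaluate the fitted response surface at new input points."""
    return poly_basis(XV, order) @ a
```

In use, `fit_rsf` would be called once on the training and test inputs X with the observed evaporation E, and `predict_rsf` would then be applied to the validation inputs XV with the same order.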

Case study
In the applications, monthly climatic data of two automated weather stations, Antalya and Mersin, operated by the Turkish Meteorological Organization (TMO) in Turkey were used. These data were also used by Kisi and Tombul (2013). The Mediterranean Region has a Mediterranean climate characterized by warm to hot, dry summers and mild to cool, wet winters. The winter temperature reaches a maximum of 24 °C, and in summer it may be as high as 40 °C.
Monthly data composed of twenty years of monthly values of air temperature (T), solar radiation (SR), wind speed (W), relative humidity (H) and Epan were used. The first ten years of data (50% of the whole data) were used to train the models, the second five years of data (25% of the whole data) were used for testing and the remaining five years of data (25% of the whole data) were used for validation for each station. Detailed information about the data can be obtained from Kisi and Tombul (2013).
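The 50/25/25 partition of the record can be sketched in Python as follows. This is only an illustration under the assumption that the monthly records are stored in chronological order; the function name chronological_split and the placeholder record list are hypothetical and not taken from the study.

```python
def chronological_split(records, train_frac=0.50, test_frac=0.25):
    """Split an ordered record into train, test and validation parts.

    The first block is used for training, the next for testing and the
    remainder for validation, without shuffling.
    """
    n = len(records)
    n_train = int(round(n * train_frac))
    n_test = int(round(n * test_frac))
    train = records[:n_train]
    test = records[n_train:n_train + n_test]
    validation = records[n_train + n_test:]
    return train, test, validation


# Twenty years of monthly values give 240 records (placeholder indices here).
months = list(range(240))
train, test, validation = chronological_split(months)
```

With twenty years of monthly data this gives 120 training, 60 testing and 60 validation months.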

Comparative statistics
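In this study, several statistical parameters were used to evaluate the performance of the models, which are given by the following relations (Nash and Sutcliffe 1970, Willmott 1981, Daren and Smith 2007), in which $E_i$ and $\hat{E}_i$ are the observed and predicted evaporation for the ith data point and $\bar{E}$ is the average of the observed values.

Root mean square error (RMSE)

$$\mathrm{RMSE} = \sqrt{\frac{1}{NE}\sum_{i=1}^{NE}\left(E_i - \hat{E}_i\right)^2} \qquad (8)$$

Mean absolute error (MAE)

$$\mathrm{MAE} = \frac{1}{NE}\sum_{i=1}^{NE}\left|E_i - \hat{E}_i\right| \qquad (9)$$

Efficiency factor (EF)

$$\mathrm{EF} = 1 - \frac{\sum_{i=1}^{NE}\left(E_i - \hat{E}_i\right)^2}{\sum_{i=1}^{NE}\left(E_i - \bar{E}\right)^2} \qquad (10)$$

Agreement index (d)

$$d = 1 - \frac{\sum_{i=1}^{NE}\left(E_i - \hat{E}_i\right)^2}{\sum_{i=1}^{NE}\left(\left|\hat{E}_i - \bar{E}\right| + \left|E_i - \bar{E}\right|\right)^2} \qquad (11)$$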
The efficiency factor (EF) is calculated on the basis of the relationship between the predicted and observed mean deviations, and it can show the correlation between the predicted and observed data. EF is better suited to evaluating model goodness-of-fit than R², because R² is insensitive to additive and proportional differences between model predictions and observations.
The agreement index is a descriptive measure whose range is similar to that of R²: it varies between 0 (no correlation) and 1 (perfect fit). Because R² is overly sensitive to extreme values and insensitive to differences in the observed and predicted means and variances, the agreement index d of Eq. (11) can be applied to overcome these difficulties, since it was not designed to be a measure of correlation (Daren and Smith 2007).
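The four comparative statistics can be computed as in the following Python sketch. It assumes the standard definitions of RMSE and MAE, the Nash and Sutcliffe (1970) form of EF and the Willmott (1981) form of d; the function name comparative_statistics is hypothetical.

```python
import numpy as np


def comparative_statistics(observed, predicted):
    """Return RMSE, MAE, EF and d for paired observed/predicted values.

    Assumes the Nash-Sutcliffe efficiency for EF and Willmott's index
    of agreement for d.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = obs - pred
    mean_obs = obs.mean()
    sse = np.sum(err ** 2)  # sum of squared errors
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    ef = 1.0 - sse / np.sum((obs - mean_obs) ** 2)
    d = 1.0 - sse / np.sum(
        (np.abs(pred - mean_obs) + np.abs(obs - mean_obs)) ** 2
    )
    return {"RMSE": rmse, "MAE": mae, "EF": ef, "d": d}
```

A perfect prediction gives RMSE = MAE = 0 and EF = d = 1, whereas predicting the observed mean for every month gives EF = 0.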

Illustrative applications and results
The performance, including both the accuracy and agreement, of the HORS methods is evaluated through two stations, Antalya and Mersin. The four comparative statistics, i.e. RMSE, MAE, d and EF, are used to illustrate the performance of the proposed HORS functions, and the performance of the HORS functions is compared with the FG, ANFIS and ANN models in three applications. In the first application, pan evaporations were separately calibrated based on climatic input data for each station. In the second application, Mersin's pan evaporations were estimated using data from the Antalya station.
In the third application, Mersin's pan evaporations were approximated using input climatic data from both the Antalya and Mersin stations. For these three applications, the results of three RSFs (2-order, 3-order and 4-order) are determined and compared with the soft computing-based FG, ANFIS and ANN models. A program code was developed in MATLAB for the HORS models based on the algorithm of the high-order RSF. The results of the FG, ANFIS and ANN models were obtained from the study of Kisi and Tombul (2013).

Predicting monthly pan evaporations of Antalya and Mersin stations
In the present paper, three different HORS models, namely a 2-order RSF (a response surface function with second-order polynomial form), a 3-order RSF and a 4-order RSF, were developed for predicting the monthly pan evaporations based on four inputs, T, SR, W and H, for the Antalya and Mersin stations. The test and validation results of each model are tabulated and compared with FG, ANFIS and ANN in Table 1. In the table, the FG(2,gauss,100000) model represents an FG model comprising 2, 2, 2 and 2 Gaussian membership functions (MFs) for each climatic input and 100000 iterations. The ANFIS(2,gauss,10) model represents an ANFIS model including 2, 2, 2 and 2 Gaussian MFs for each input and 10 iterations, and the ANN(4,1,1) model indicates an ANN model having 4, 1 and 1 nodes for the input, hidden and output layers, respectively. At Antalya Station, the RSF models perform better than the FG, ANFIS and ANN models in both the test and validation periods. The accuracy of the FG model with respect to RMSE, MAE, EF and d was improved by 69%, 82%, 10% and 3% using the 4-order RSF, respectively. At Mersin Station, the RSF models also have better accuracy than the soft computing techniques from the RMSE, MAE, EF and d viewpoints. The 4-order RSF improved the accuracy of the FG model with respect to RMSE, MAE, EF and d by 176%, 202%, 7.2% and 44%, respectively. Figures 1-2 illustrate the estimates of the FG, ANFIS, ANN and RSF models in the validation stage for the Antalya and Mersin stations, respectively. It is apparent from the fit line equations and R² values that the RSF models have less scattered estimates, which are closer to the ideal line than those of the soft computing models. The 3-order and 4-order RSF models have almost similar accuracy, and they are slightly better than the 2-order RSF models. At both stations, the accuracy ranking of the applied models in the validation period is: 4-order RSF, 3-order RSF, 2-order RSF, FG, ANFIS and ANN.
Table 2 reports the total pan evaporation (TPA) predictions of each model. It is clearly observed from the table that the RSF models estimate TPA better than the soft computing methods. Among the RSF methods, the 4-order RSF provides the closest estimate for both stations in the validation stage. For the Antalya Station, the observed TPA of 322 mm was estimated as 306 mm by the 4-order RSF, an underestimation of 4.8%, while it was estimated as 303, 302, 283, 275 and 275 mm by the 3-order RSF, 2-order RSF, FG, ANFIS and ANN models, with underestimations of 5.7, 6.1, 12, 14.5 and 15.3%, respectively. For the Mersin Station, the 4-order RSF estimated the TPA as 179 mm, compared to the measured 173 mm, an overestimation of 3.5% in the validation period, while the 3-order RSF, 2-order RSF, FG, ANFIS and ANN models resulted in 180, 186, 216, 225 and 230 mm, with overestimations of 4, 7.4, 25, 30 and 33%, respectively.
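The over- and underestimation percentages quoted for TPA are relative errors with respect to the observed total. A minimal Python sketch of this arithmetic is given below, assuming the percentage is defined as 100 × (estimated − observed) / observed; the function name relative_error_percent is hypothetical, and the rounded totals from the text are used as inputs, so the result may differ slightly from percentages computed from unrounded totals.

```python
def relative_error_percent(estimated, observed):
    """Signed relative error in percent.

    Positive values indicate overestimation, negative values
    underestimation.
    """
    return 100.0 * (estimated - observed) / observed


# Mersin Station, 4-order RSF: 179 mm estimated against 173 mm observed.
mersin_rsf4 = relative_error_percent(179.0, 173.0)
```

For the Mersin values above, the result is about 3.5%, i.e. an overestimation, in line with the figure reported in the text.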

Predicting Mersin's pan evaporations using climatic data of Antalya
In this section of the study, the accuracy of the RSF models was tested in the prediction of Mersin's Epan using climatic input data of Antalya Station, and the results were compared with the soft computing methods. The validation results of the applied models are given in Table 3. It is apparent from the table that the RSF models perform better than the FG, ANFIS and ANN models in terms of RMSE, MAE, EF and d. The RMSE accuracies of the FG, ANFIS and ANN models were increased by 110, 132 and 133% using the 4-order RSF, respectively. Even the worst RSF model, the 2-order RSF, increased the MAE, EF and d accuracies of the best soft computing model (FG) by 67, 224 and 32%, respectively. The TPA predictions are also compared in Table 3. Similar to the previous applications, here also the RSF models outperform the soft computing methods. The 4-order RSF estimated the TPA as 176 mm, instead of the measured 173 mm, an overestimation of 1.8% in the validation period, while the 3-order RSF, 2-order RSF, FG, ANFIS and ANN resulted in 178, 184, 205, 215 and 212 mm, with overestimations of 2.6, 6.4, 18.2, 24.1 and 22.7%, respectively. There is only a slight difference between the RSF models. Figure 3 compares the Epan estimates of each model with the corresponding observed values in the validation stage. It is obvious that the RSF models have less scattered estimates, which are closer to the ideal line than those of the soft computing methods. It can be said that the RSF models can be successfully used in the estimation of Epan without local input data.

Predicting Mersin's pan evaporations using climatic data of Antalya and Mersin
In this section of the study, the RSF models are compared with the soft computing methods in Epan estimation using local and external inputs. Climatic input data of the Mersin and Antalya stations were used as inputs to the applied models to estimate Epan of Mersin Station.
Limited climatic inputs were also considered as inputs to the models in this part of the study.
Estimating Epan using limited input variables is essential, especially for developing countries where wind speed and relative humidity data are missing or unavailable. The validation results of the RSF and soft computing methods are provided in Table 4. The superior accuracy of the RSF models over the soft computing methods is clearly seen from the table. In the four-input case, the 4-order RSF1 increased the accuracy of FG1 by 316, 371, 7.3 and 43% in terms of RMSE, MAE, EF and d, respectively. Furthermore, the RMSE accuracies of the two-input FG2, ANFIS2 and ANN2 models were increased by 143, 243, 158 and 54 using the 4-order RSF2 model with two inputs. The RSF models seem to be more successful than the soft computing models in estimating TPA values in the validation stage. The scatterplots of the estimates obtained from the RSF and soft computing models in the validation stage are shown in Figures 4 and 5 for the four- and two-input models. In both cases, the 4-order and 3-order RSF models have similar estimates, and they are closer to the observed Epan values than those of the other models. Comparison of the two- and four-input models indicates that the wind speed and relative humidity variables are very effective on Epan, and removing these inputs significantly decreases the models' accuracies, especially for the RSF models.

Conclusions
The present study investigated the ability of the response surface method to predict monthly pan evaporations. A high-order response surface (HORS) function with a simple formulation was proposed to estimate the pan evaporations using climatic input variables including air temperature (T), relative humidity (H), wind speed (W) and solar radiation (SR) for the Antalya and Mersin stations. The HORS function was extended, in terms of the order of the polynomial functions, to more than two input variables. In this approach, the high-order polynomial functions are simply and directly calibrated based on the observed climatic data and the corresponding evaporation data for each station. The accuracy of the HORS function with second, third and fourth order was compared to the FG, ANFIS and ANN approaches for estimating the monthly pan evaporations using several comparative statistics: root mean square error (RMSE), mean absolute error (MAE), model efficiency factor (EF) and agreement index (d). Three applications of the HORS function were compared with the soft computing-based models based on input variables of the Antalya and Mersin stations. In the first application, the performance of the proposed HORS models was compared in estimating pan evaporations of the Antalya and Mersin stations separately. In the second application, the prediction results of the HORS functions for evaporation at Mersin station with input variables of Antalya were compared.
In the third part of the study, the HORS, FG, ANFIS and ANN models were compared with each other in estimating Mersin's pan evaporations using input data of the Antalya and Mersin stations. Comparison of the models indicated that the 4-order RSF models generally performed better than the 2-order RSF, 3-order RSF, FG, ANFIS and ANN models. The RSFs with second, third and fourth-order polynomial functions performed better than the soft computing-based models in both accuracy (lower RMSE and MAE than FG, ANFIS and ANN) and agreement (higher EF and d than FG, ANFIS and ANN). This result revealed that the HORS models were much simpler than the other models and could be successfully used in estimating monthly pan evaporations. The 3-order RSF and 4-order RSF models provided the closest total pan evaporation estimates based on RMSE for the Antalya and Mersin stations in the validation period, respectively. The comparative statistics computed for both stations were similar for the 3-order RSF and 4-order RSF models.

P(Xi) is the polynomial basis function vector at the experimental point Xi, which is defined based on the polynomial order of the RSF [...]
Algorithm of high-order RSF: Give initial parameters and database: NE (number of experiments including the train and test data); NV (number of validation data); X (input train and test data); XV (input validation data); E (evaporation of the test and train database). [...]
[...] of monthly values of air temperature (T), solar radiation (SR), wind speed (W), relative humidity (H) and Epan. The first ten years [...]
[...] evaluated through two different stations, Antalya and Mersin. The four comparative statistics, i.e. RMSE, MAE, d and EF, are used to illustrate the performance of the proposed HORS functions, and the performance of the HORS functions is compared with the FG, ANFIS and ANN models in three applications. In the first application, pan evaporations [...]
[...] computing techniques from the RMSE, MAE, EF and d viewpoints. The 4-order RSF improved the accuracy of the FG model with respect to RMSE, MAE, EF and d by 176%, 202%, 7.2% and 44%, respectively. Figures 1-2 illustrate the estimates of the FG, ANFIS, [...] compares the Epan estimates of each model with the corresponding observed values in the validation stage.

Fig. 1. The observed and estimated pan evaporation of the Antalya Station in the validation period (the results of FG, ANFIS and ANN were obtained from Kisi and Tombul (2013)).

Fig. 2. The observed and estimated pan evaporation of the Mersin Station in the validation period (the results of FG, ANFIS and ANN were obtained from Kisi and Tombul (2013)).

Fig. 3. The observed and estimated pan evaporation of the Mersin Station using the climatic data of Antalya Station in the validation period (the results of FG, ANFIS and ANN were obtained from Kisi and Tombul (2013)).

Fig. 4. The observed and estimated pan evaporation of the Mersin Station using the climatic data of the Antalya and Mersin stations (i.e. TA, SRA, WA, HA, TM, SRM, WM and HM) in the validation period (the results of FG, ANFIS and ANN were obtained from Kisi and Tombul (2013)).

Fig. 5. The observed and estimated pan evaporation of the Mersin Station using the climatic data of the Antalya and Mersin stations (i.e. TA, SRA, TM and SRM) in the validation period (the results of FG, ANFIS and ANN were obtained from Kisi and Tombul (2013)).
where NE is the number of data points and $\bar{E}$ is the average of the observed monthly pan evaporation for each station. RMSE and MAE show the average difference between the predicted ($\hat{E}_i$) and observed ($E_i$) evaporation for the ith data point. Lower values of RMSE and MAE indicate a better fit, with zero indicating a perfect prediction.

Table 3. Comparison of models in estimating Mersin's pan evaporation using the climatic data of Antalya in the validation period. Note that the validation results of the FG, ANFIS and ANN models were obtained from Kisi and Tombul (2013).

Table 4. Comparison of models in estimating Mersin's pan evaporation using the climatic data of Antalya and Mersin in the validation period. Note that the validation results of the FG, ANFIS and ANN models were obtained from Kisi and Tombul (2013).