Watershed models have been used extensively for quantifying nonpoint source (NPS) pollution, but few studies have examined the error-transitivity from different input datasets to NPS modeling. In this paper, the effects of four types of input data, namely rainfall, digital elevation models (DEMs), land use maps, and the amount of fertilizer, on NPS simulation were quantified and compared. Input-induced uncertainty was systematically investigated using a watershed model for phosphorus load prediction. Based on the results, rain gauge density resulted in the largest model uncertainty, followed by DEMs, whereas land use and fertilizer amount exhibited limited impacts. The mean coefficients of variation for errors in single rain gauge, multiple gauge, ASTER GDEM, NFGIS DEM, land use, and fertilizer amount information were 0.390, 0.274, 0.186, 0.073, 0.033 and 0.005, respectively. The use of specific input information, such as key gauges, is also highlighted as a way to achieve the required model accuracy. These results provide valuable information to other model-based studies for the control of prediction uncertainty.
Nonpoint source (NPS) pollution has become the major obstacle to sustaining high-quality water supplies in developed countries, such as the United States, as well as in developing countries, such as China (Zheng et al., 2011). Hydrological models, such as the Agricultural Non-Point Source Model (AGNPS) and the Soil and Water Assessment Tool (SWAT) (Arnold et al., 1998), provide essential tools for quantifying NPS loads and understanding their effects on water quality deterioration. Nevertheless, due to the complexity of watershed systems and the substantial requirements for input data, uncertainty is an inevitable part of model-based research and thus of management plans (Beven, 2006; Xue et al., 2014). Typically, model uncertainty comes from model structure, parameter choice and input data. Structure uncertainty results from incomplete knowledge of watershed processes or different assumptions during model setup, whereas parameter uncertainty arises from the limited availability of validation data and the imprecise representation of parameter ranges and distributions. In addition, input uncertainty is generated from simplifications of natural randomness and of temporal-spatial data variability, and it is inevitably magnified by model uncertainty into larger output errors.
Model inputs typically include spatial data, such as spatial precipitation input, digital elevation models (DEMs), land use maps and soil maps, as well as attribute data, such as fertilizer amount (Shen et al., 2013). The uncertainty of spatial data, typically in the form of GIS maps, is derived from many factors, including the quantity of available scenes, the resolution at which the data were captured, and the choice of interpolation techniques (Wu et al., 2005). Rainfall plays a crucial role in runoff production and mass transport, so its reliability has been considered a major factor in the accuracy of hydrological models. Traditionally, the rain station is the fundamental tool for representing the spatial distribution of rainfall within a watershed (Andréassian et al., 2001). Designing the proper location, number and density of rain-gauge stations is important to hydrological research (Duncan et al., 1993). Studies have explored the impact of heterogeneous rainfall data on parameter estimation and model outputs and concluded that large bias could be expected if detailed variations in the rainfall data are not considered (Strauch et al., 2012).
As another important type of GIS data, a DEM is used to extract surface characteristic parameters, such as watershed boundary, slope, and thus flow direction, so its resolution influences model outputs (Lin et al., 2013; Wellen et al., 2014). Studies have noted that coarser DEMs smooth watershed slope and thereby reduce the simulated peak flow or sediment yields (Zhang et al., 2014). It has also been shown that nitrogen output decreased with decreasing DEM resolution, whereas a decrease in DEM resolution did not always result in decreased total phosphorus (TP) (Chaubey et al., 2005). In this sense, the question of whether higher-resolution data would always lead to better model performance should be considered first (Shen et al., 2013). In the meantime, GIS data may be available from alternative sources; therefore, another question is which specific dataset should be used. For example, land use maps could be obtained from federal, state and local government agencies, whereas county and local governments are developing detailed datasets (Shen and Zhao, 2010; Han et al., 2014). Land use maps for a specific point in time, typically obtained by interpreting remote sensing data, are often used, and possible changes in land use during the simulation period are not considered (Mango et al., 2011; Pai and Saraswat, 2013).
Despite the research progress described above, input-induced uncertainty remains a significant challenge because of the variety of input data, which largely limits the applicability of watershed models. For example, model-based programs, such as Total Maximum Daily Loads (TMDLs), are often criticized for their inadequate consideration of input uncertainty (Chen et al., 2012). First, uncertainty research has focused more on hydrological processes than on NPS pollution. Second, the sensitivity of watershed models also depends on how well aggregated attribute data describe the relevant characteristics of human management. Thus, it is useful to understand the assumptions behind attribute data and how these assumptions are likely to affect the model results. Third, previous studies have not evaluated the relative contribution of each input dataset, so a cost-effective strategy for reducing input uncertainty cannot be formulated (Munoz-Carpena et al., 2006).
The main objective of this paper is to conduct a comprehensive assessment of input-induced uncertainty in TP modeling. Four key types of input data, i.e., rainfall, topography, land use and fertilizer amount, are analyzed, and their uncertainties are quantified. The uncertainties related to these input data are then compared.
The Upper Daning River Watershed, which is located in the Three Gorges
Reservoir Area of China, was selected as the studied watershed (Fig. 1). This
watershed, covering an area of 2421
In this study, the SWAT model, a commonly used watershed model, was used for NPS-TP modeling. The studied watershed was partitioned into 22 sub-watersheds from a constructed DEM, and each sub-watershed was then divided into hydrologic response units (HRUs) with homogeneous slope, soil, and land use. To use the SWAT model efficiently and effectively, the SWAT-CUP software (Abbaspour et al., 2007) was applied for model calibration and validation. The measured water quality and flow data were obtained from the Changjiang Water Resources Commission and the local government. Thereafter, the SWAT model was calibrated and validated using the initial input data (Shen et al., 2012a), and the error-transitivity from input data to model outputs was quantified by changing the available datasets while keeping the calibrated parameters fixed.
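The one-at-a-time procedure described above (swap a single input dataset, keep the calibrated parameters fixed, rerun the model) can be sketched as follows. This is only an illustrative sketch: `swap_one_input`, `toy_tp_load`, the coefficient and all numbers are hypothetical stand-ins, not SWAT itself and not values from this study.

```python
from typing import Callable, Dict


def swap_one_input(
    run_model: Callable[[Dict[str, float]], float],
    baseline: Dict[str, float],
    input_name: str,
    alternatives: Dict[str, float],
) -> Dict[str, float]:
    """Rerun the model once per alternative dataset for a single input,
    leaving every other input at its baseline value."""
    results = {}
    for label, dataset in alternatives.items():
        inputs = dict(baseline)       # copy, so the baseline is never mutated
        inputs[input_name] = dataset  # change only the input under study
        results[label] = run_model(inputs)
    return results


# Hypothetical stand-in for a calibrated model run (NOT SWAT).
CALIBRATED_EXPORT_COEFF = 0.002  # made-up "calibrated parameter", kept fixed


def toy_tp_load(inputs: Dict[str, float]) -> float:
    return CALIBRATED_EXPORT_COEFF * inputs["rainfall_mm"] * inputs["mean_slope"]


baseline = {"rainfall_mm": 1000.0, "mean_slope": 0.5}   # made-up values
rain_alternatives = {"gauge_A": 900.0, "gauge_B": 1200.0}  # made-up values
tp_by_gauge = swap_one_input(toy_tp_load, baseline, "rainfall_mm", rain_alternatives)
print(tp_by_gauge)
```

Because the parameters are never recalibrated between runs, any spread in the resulting outputs can be attributed to the swapped input alone.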
Errors introduced by rainfall data, DEMs and land use maps were analyzed.
The influence of soil type maps was not analyzed, because only one soil map
data (coarse resolution at
In this study, rainfall datasets were collected from twelve rain gauges
located within the watershed boundary and two outside stations that were
within approximately 10
In this watershed, two DEM sets were available for NPS modeling: (1) the
National Fundamental Geographic Information System of China DEM (NFGIS DEM)
and (2) the ASTER GDEM. Specifically, the NFGIS DEM was acquired in 1998 from
a topographic map with a resolution of 90
As discussed above, land use data available for the modeling effort will likely come from numerous sources; therefore, an assessment of available land use data and the time period covered by these data should be made. In this study, land use data were obtained from the 1980s (1980–1989), 1995, 2000, and 2007. The land use statistics are shown in Table 2. Specifically, maps from the 1980s, 1995 and 2000 were interpreted from MSS/TM/ETM images by the Chinese Academy of Sciences, whereas the land use map for 2007 was created from a TM image. In our previous study (Shen et al., 2013), the resolution of land use data was shown to have only a slight influence on simulated NPS-P for the study region; therefore, the land use map was not resampled in this study.
Attribute data, including crop planting time, irrigation, fertilization, and
tillage, were mainly obtained from the agricultural bureau and local
farmers, so these data reflect only aggregate information at an
average level. Because management practices inevitably differ among
farmers, the use of this average information might result in errors in
the fertilizer amount. In this analysis, the
uncertainty due to the amount of fertilizer applied was therefore treated as
input uncertainty. Initially, the annual applied urea and compound
fertilizer was set as 450
This study focused on error-transitivity from input data to NPS-TP
predictions (the sum of organic P and mineral P) at the WX outlet for the period
from 2000 to 2007. First, the sensitivity of simulated TP to each input dataset
was quantified in the form of summary statistics, such as the standard deviation
(SD) and the coefficient of variation (CV). Specifically, the CV, which is a
normalized measure of the dispersion of a probability distribution, is
a dimensionless number defined as the ratio of the SD to the mean value (MV).
Compared with the SD, the CV is more appropriate for comparing different datasets;
therefore, it was used as the main measure of uncertainty in this study.
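As a minimal sketch of the statistic defined above, the CV can be computed per year across a set of simulations and then averaged. The TP loads below are made-up numbers for illustration only, and the use of the sample SD (n - 1 denominator) is an assumption, since the text does not state which SD estimator was applied.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = SD / MV; dimensionless, so it is comparable across datasets."""
    mv = mean(values)
    if mv == 0:
        raise ValueError("CV is undefined when the mean value is zero")
    return stdev(values) / mv  # assumption: sample SD (n - 1 denominator)


# Hypothetical simulated TP loads per year, one value per input scenario.
simulated_tp = {
    2000: [10.0, 12.0, 14.0],
    2001: [20.0, 20.0, 20.0],
}

annual_cv = {year: coefficient_of_variation(loads)
             for year, loads in simulated_tp.items()}
mean_cv = mean(list(annual_cv.values()))
print(annual_cv, mean_cv)
```

Dividing by the mean is what allows years or inputs with very different load magnitudes to be compared on one scale.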
To re-validate the range of input data, the Nash–Sutcliffe coefficients
(
Generally, watershed modeling involves two kinds of uncertainty: (1)
systematic model uncertainty regardless of correct input, and (2) uncertainty
due to inaccurate input. In this study, model structure was fixed and model
results will be dependent on the interaction of input errors. Based on the
performance ratings by Moriasi et al. (2007),
As shown in Table 3, for the flow simulation, the
To show the sensitivity to each input dataset, the degree of
uncertainty of simulated TP is illustrated in Fig. 2. As shown in
Fig. 2a, the annual mean CV ranged from 0.284 (2006) to 0.587 (2003),
indicating there were significant uncertainties in these single rain gauge
simulations. The
With the NFGIS DEMs (Fig. 2c), the CV values were low, with an annual mean CV of 0.026–0.119, whereas the CV values were higher with the ASTER GDEMs (Fig. 2d), ranging from 0.105 to 0.383. Figure 2e shows the statistical analysis using different land use maps. Compared with the input data presented above, the annual mean CV values, which ranged from 0.009 to 0.036, were relatively low. In addition, as shown in Fig. 2f, the simulated TP showed only slight variation related to the errors in the amount of fertilizer, with mean CV values of 0.003–0.008.
Finally, a multi-input ensemble method was used for a comprehensive evaluation of input-induced model uncertainty. As shown in Table 4, the annual CV values of simulated TP ranged from 0.101 to 0.271, indicating a temporal variation for the period from 2000 to 2007. The ensemble of input-induced outputs was also determined for all six given outlets. As illustrated in Fig. 3, the annual mean CV values were 0.190 for XX, 0.088 for DX, 0.206 for HX, 0.162 for BY, 0.168 for WX and 0.135 for CF.
Table 4 gives a clear comparison between the different types of input
data. For the given catchment and rainfall characteristics, rainfall input
is identified as the most important factor in NPS simulation, and rain
gauge density is the most important source contributing to the overall
uncertainty. The results from the statistical analysis are reasonable as
rainfall is the major driving force of NPS pollution (Andréassian et al., 2001; McMillan et al., 2011). As shown in Table 1, rainfall data
varied substantially among different gauges, with a 933
Figure 2b illustrates that there were reductions in the CV values
compared with the single-gauge simulations, which clearly showed that the
ensemble of multi-gauge simulations outperformed the single-gauge
simulations. However, no clear relationship existed between the
As illustrated in Fig. 2c and d, the second highest uncertainty was caused by
DEMs, and the ASTER GDEM-induced uncertainty was higher than that
induced by the NFGIS DEM. These higher values could be due to the following two
reasons. First, the NFGIS DEM had already been validated in many places in China,
which was not the case for the ASTER GDEM (Wu et al., 2007; Dixon and Earls,
2009). In fact, the ASTER GDEM contains systematic errors, i.e., a significant
number of anomalies attributable to cloud disturbances, the algorithm used to
generate the final GDEM, and the absence of an inland water mask. Second, the
initial resolution of NFGIS DEM (
In contrast, land use maps and fertilizer amount resulted in low uncertainties. These results differ from those of Payraudeau et al. (2004), who found that model outputs were highly sensitive to land use changes. This could be explained by the fact that most agricultural land was redistributed to forest and other land uses in the study of Payraudeau et al. (2004), which led to significant changes in soil compaction and ground cover. The low values in our study, however, could be due to the minor land use changes during the period from the 1980s to 2007. As shown in Table 2, the fraction of forest area decreased gradually from 61.75 to 54.76 %, whereas agricultural land increased from 25.68 to 33.47 %. Figure 2f indicates that the fertilizer input has only a slight impact on in-stream TP loads. This was because P application was low in this watershed, with inorganic N being applied in greater amounts and more widely. Additionally, according to the mechanism of the SWAT model, P would be taken up mainly through crop rotation, and this process would govern the turnover rates and transport of P. Therefore, only a small proportion of P will finally flow into the water body as in-stream NPS-TP. In this sense, the CV values might also be minor if other representative attribute data, e.g., tillage data, were selected. This indicates that the degree of sensitivity to a single input dataset depends on two factors: the ratio of each individual input contribution to the total load (which is the case for management data) and the error in the individual input (which is more meaningful for land use maps).
Figure 3 demonstrates that input-induced uncertainty may be highly area-specific, i.e., dependent upon the scale of the drainage area and rainfall variability. For example, when multiple gauges (from 1 to 12) were used as model inputs, the simulated TP remained stable for DX and no model uncertainty was observed. This could be due to the mechanism of SWAT, in which only the rainfall data from the gauge closest to the centroid are chosen and used as the sole model input for that specific sub-watershed. As shown in Fig. 1, there is only one sub-watershed in the DX region and the XN gauge is closest to its centroid; therefore, the rainfall data from the same gauge were used every time for this region. However, the CV values remained high for the other outlets, ranging from 0.187 (CF) to 0.448 (XX), suggesting that rain gauge density had different impacts at different spatial scales of drainage area. In addition, using different DEM data, the CV values were relatively low for XX, DX, WX and CF, with an annual mean CV of 0.022–0.055, but the CV values were relatively high for HX and BY, with values of 0.152 and 0.136, respectively. This could be explained by the fact that there are more mountainous areas along XX, DX, WX and CF; therefore, the generated topography in these regions, such as the watershed boundary, surface slope and other characteristic parameters, could be extracted more easily from the DEM data.
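The nearest-gauge assignment described above can be illustrated with a small sketch. The gauge names (`G1` to `G3`), the planar coordinates and the centroid are hypothetical, and the function is a simplified stand-in for the selection rule, not SWAT's actual code.

```python
import math


def nearest_gauge(centroid, gauges):
    """Return the name of the gauge closest to a sub-watershed centroid
    (planar Euclidean distance; ties go to the first gauge listed)."""
    cx, cy = centroid
    return min(
        gauges,
        key=lambda name: math.hypot(gauges[name][0] - cx, gauges[name][1] - cy),
    )


# Hypothetical layout: G1 sits next to the centroid, G2 and G3 are far away.
centroid = (1.0, 1.0)
all_gauges = {"G1": (0.0, 0.0), "G2": (10.0, 0.0), "G3": (0.0, 10.0)}

# Grow the gauge network from 1 to 3 gauges and record the selected gauge.
picks = []
for n in range(1, len(all_gauges) + 1):
    subset = dict(list(all_gauges.items())[:n])
    picks.append(nearest_gauge(centroid, subset))
print(picks)  # ['G1', 'G1', 'G1']
```

As long as the closest gauge stays in the network, adding more distant gauges never changes the rainfall series assigned to this sub-watershed, which is consistent with the absence of rainfall-induced uncertainty at an outlet draining a single sub-watershed.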
These results pose two significant scientific challenges for TMDLs. First, because model uncertainty is difficult to quantify, the margin of safety (MOS) is often arbitrarily assumed to be a 10 % error. However, as shown in Table 4, this assumption is not closely related to the reliability of the model system and supports the quantification of TMDLs poorly. Specifically, compared with our previous study (Shen et al., 2012b), the uncertainties caused by input errors were greater than those resulting from model parameters in 2001, 2005, and 2007, whereas the uncertainties caused by inputs were lower in the remaining years. Overall, the mean CV (0.168) for input-induced TP uncertainty was slightly higher than that (0.156) for the parameter uncertainty, which agrees with previous studies (Kuczera et al., 2006). Therefore, input data uncertainty is critical in NPS modeling and efforts should be made to reduce this type of uncertainty. Second, as illustrated in Fig. 3, the input data-induced uncertainty varies considerably in time and space as a complex function of climate, underlying topography, land use, soil type, and management (Shen and Zhao, 2010; Chen et al., 2012). In this sense, a site-specific MOS might be more robust to any particular sequence of input errors than the current constant MOS.
In this research, the impacts of four different input data types, namely rainfall data, DEMs, land use maps, and amount of fertilizer, on NPS modeling were quantified and compared. Based on the results, input data-induced uncertainty is critical in NPS modeling and efforts should be made to decrease this type of uncertainty. For the case study, the annual CV values ranged from 0.101 to 0.271, with a mean (0.168) slightly higher than that for the parameter uncertainty (0.156). The study indicated that rainfall input resulted in the highest uncertainty, followed by DEMs, land use maps, and fertilizer amount. Therefore, measures should be taken first to reduce rainfall-related uncertainty by adding rain gauges, modifying the rain gauge selection mechanism in SWAT, and using appropriate interpolation techniques. This paper also demonstrated that the required model accuracy could be achieved if several key rain gauges and moderate-resolution DEMs are used. This paper provides valuable information for developing TMDLs in the Three Gorges Reservoir Area, and these results are also valuable to other model-based watershed studies for the control of model uncertainty.
However, this conclusion might be appropriate only for NPS-TP and not for other pollutants, because the generation and transport of nitrogen differ substantially from those of NPS-P. Furthermore, the influence of soil type maps was not analyzed, because only one coarse soil map was available for the study region. More research is needed once more detailed input datasets are collected.
The data can be obtained by emailing the first author.
Z. Shen designed the experiments. L. Chen and Y. Gong developed the SWAT model and performed the simulations. L. Chen prepared the manuscript with contributions from all co-authors.
This project was supported by the Fund for Innovative Research Group of the National Natural Science Foundation of China (Grant No. 51421065), the National Natural Science Foundation of China (No. 51409003 & 51579011), and a project funded by the China Postdoctoral Science Foundation.
The recorded annual mean rainfall data for each rain gauge (2000–2007).
MV indicates the mean value and SD represents the standard deviation.
The fraction of land use types within the watershed for different periods.
The values of
The sensitivity of simulated TP (CV values) to different input datasets.
Locations of the rain gauges within the Upper Daning River Watershed.
Uncertainty of simulated TP induced by each input dataset, in which the line, error bar and inverted column indicate the mean value, SD and CV values, respectively.
Comprehensive uncertainty of input data-induced simulated TP, in which the line, error bar and inverted column indicate the mean value, SD and CV values, respectively.