Hydrology and Earth System Sciences An interactive open-access journal of the European Geosciences Union
https://doi.org/10.5194/hess-2019-648
© Author(s) 2020. This work is distributed under
the Creative Commons Attribution 4.0 License.

Submitted as: research article 06 Jan 2020

Review status
This discussion paper is a preprint. It is a manuscript under review for the journal Hydrology and Earth System Sciences (HESS).

Systematic comparison of five machine-learning methods in classification and interpolation of soil particle size fractions using different transformed data

Mo Zhang (1,2) and Wenjiao Shi (1,3)
  • 1 Key Laboratory of Land Surface Pattern and Simulation, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China
  • 2 School of Earth Sciences and Resources, China University of Geosciences, Beijing 100083, China
  • 3 College of Resources and Environment, University of Chinese Academy of Sciences, Beijing 100049, China

Abstract. Soil texture and soil particle size fractions (PSFs) play an increasingly important role in physical, chemical and hydrological processes. Many previous studies have used machine-learning and log ratio transformation methods for soil texture classification and soil PSF interpolation to improve prediction accuracy. However, few studies have systematically compared their performance in both classification and interpolation. Here, a total of 45 evaluation models, generated by combining five machine-learning models (K-nearest neighbor (KNN), multilayer perceptron neural network (MLP), random forest (RF), support vector machines (SVM) and extreme gradient boosting (XGB)) with the original data and three log ratio methods (additive log ratio (ALR), centered log ratio (CLR) and isometric log ratio (ILR)), were applied to 640 soil samples from the Heihe River Basin (HRB) in China to evaluate and compare both tasks. The results demonstrated that log ratio transformation methods reduced the skewness of the soil PSF distributions. For soil texture classification, RF and XGB performed better in terms of overall accuracy and kappa coefficient, and, according to the area under the precision-recall curve (AUPRC) analysis, they were also recommended for the classification of imbalanced data. For soil PSF interpolation, RF delivered the best performance among the five machine-learning models, with the lowest root mean squared error (RMSE; sand: 15.09 %, silt: 13.86 %, clay: 6.31 %), mean absolute error (MAE; sand: 10.65 %, silt: 9.99 %, clay: 5.00 %), Aitchison distance (AD; 0.84) and standardized residual sum of squares (STRESS; 0.61), and the highest coefficient of determination (R²; sand: 53.28 %, silt: 45.77 %, clay: 53.75 %). Log ratio methods, especially CLR and ILR, improved STRESS. Comparing direct and indirect classification, the prediction maps were similar in the upper and middle reaches of the HRB but differed in the lower reaches.
Moreover, indirect classification maps based on log ratio transformed data provided more detailed information. Indirect methods improved the kappa coefficient of soil texture classification by 21.3 % compared with direct methods. Based on the accuracy of both soil PSF interpolation and soil texture classification, RF is recommended as the best of the five machine-learning models, and ILR is recommended for component-wise machine-learning methods without multivariate treatment, given the constrained nature of compositional data. In addition, XGB is preferable to the other models when the trade-off between accuracy and computation time is considered. Our findings can serve as a reference for other studies predicting the spatial distribution of soil PSFs and texture over large areas with machine-learning methods, particularly when the soil PSF data are skewed.
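The three log ratio transformations compared in the abstract can be sketched in pure Python as below. This is a minimal illustration, not the authors' code: it assumes strictly positive sand/silt/clay percentages (zeros would have to be replaced first), uses the last part (clay) as the ALR reference, and uses pivot coordinates as one common choice of ILR basis; the paper may use a different reference part or basis. The sample values are hypothetical.

```python
import math

def closure(parts, total=100.0):
    """Rescale positive parts so they sum to `total` (100 % for soil PSFs)."""
    s = sum(parts)
    return [total * v / s for v in parts]

def alr(x):
    """Additive log ratio: log of each part over the last part (here clay)."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def alr_inv(y, total=100.0):
    """Inverse ALR: exponentiate, append the reference part, re-close."""
    return closure([math.exp(v) for v in y] + [1.0], total)

def clr(x):
    """Centred log ratio: log of each part over the geometric mean; sums to 0."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def clr_inv(z, total=100.0):
    """Inverse CLR: exponentiate and re-close."""
    return closure([math.exp(v) for v in z], total)

def pivot_basis(D):
    """Orthonormal basis of the CLR plane that yields pivot (ILR) coordinates."""
    basis = []
    for i in range(D - 1):
        k = D - i - 1  # number of parts after pivot part i
        e = [0.0] * D
        e[i] = math.sqrt(k / (k + 1))
        for j in range(i + 1, D):
            e[j] = -1.0 / math.sqrt(k * (k + 1))
        basis.append(e)
    return basis

def ilr(x):
    """Isometric log ratio: project the CLR vector onto the pivot basis."""
    c = clr(x)
    return [sum(ci * ei for ci, ei in zip(c, e)) for e in pivot_basis(len(x))]

def ilr_inv(z, total=100.0):
    """Inverse ILR: rebuild the CLR vector from the coordinates, then invert CLR."""
    D = len(z) + 1
    c = [0.0] * D
    for zi, e in zip(z, pivot_basis(D)):
        for j in range(D):
            c[j] += zi * e[j]
    return clr_inv(c, total)

sample = [40.0, 35.0, 25.0]  # hypothetical sand, silt, clay (%)
print(alr(sample), clr(sample), ilr(sample))
print(ilr_inv(ilr(sample)))  # recovers the sample up to floating-point rounding
```

ALR and ILR map a three-part composition to two unconstrained coordinates, while CLR keeps three coordinates that sum to zero; every inverse re-closes the result so back-transformed PSFs are positive and sum to 100 %.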
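The abstract recommends ILR for "component-wise" machine-learning methods, i.e. models that predict each transformed coordinate independently. A minimal sketch of that workflow follows, assuming pivot ILR coordinates for a (sand, silt, clay) composition and using a toy K-nearest-neighbor regressor as a stand-in for the five learners; the covariates, sites and `predict_psf` helper are hypothetical and not taken from the paper.

```python
import math

def ilr3(p):
    """Pivot ILR coordinates of a (sand, silt, clay) composition."""
    l_sand, l_silt, l_clay = (math.log(v) for v in p)
    return (math.sqrt(2 / 3) * (l_sand - (l_silt + l_clay) / 2),
            math.sqrt(1 / 2) * (l_silt - l_clay))

def ilr3_inv(z):
    """Back-transform two ILR coordinates to percentages summing to 100."""
    z1, z2 = z
    c = (math.sqrt(2 / 3) * z1,
         -z1 / math.sqrt(6) + z2 / math.sqrt(2),
         -z1 / math.sqrt(6) - z2 / math.sqrt(2))
    e = [math.exp(v) for v in c]
    s = sum(e)
    return [100.0 * v / s for v in e]

def knn_regress(X_train, y_train, x, k=3):
    """Toy KNN regressor: mean target of the k nearest training points."""
    nearest = sorted(range(len(X_train)),
                     key=lambda i: math.dist(X_train[i], x))[:k]
    return sum(y_train[i] for i in nearest) / len(nearest)

def predict_psf(X_train, psf_train, x, k=3):
    """Component-wise prediction: one independent model per ILR coordinate."""
    coords = [ilr3(p) for p in psf_train]
    z_hat = [knn_regress(X_train, [c[j] for c in coords], x, k) for j in range(2)]
    return ilr3_inv(z_hat)

# Hypothetical covariates (two per site) and observed sand/silt/clay (%).
X = [(1.0, 2.0), (1.5, 1.8), (5.0, 8.0), (6.0, 9.0), (1.2, 2.2)]
psf = [(60, 30, 10), (55, 33, 12), (20, 50, 30), (15, 55, 30), (58, 31, 11)]
pred = predict_psf(X, psf, (1.1, 2.0))
print([round(v, 2) for v in pred], sum(pred))
```

Because each coordinate is modeled separately in unconstrained space and only then back-transformed, the predicted fractions are always positive and sum to 100 %, which a model fitted directly to raw sand, silt and clay percentages does not guarantee.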
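Several of the interpolation accuracy measures named in the abstract can be written down directly. The sketch below is a hedged illustration: it assumes R² is computed as 1 minus the ratio of residual to total sum of squares, that AD is the Euclidean distance between CLR vectors averaged over validation samples, and it leaves out STRESS because its exact formulation is not given in the abstract. The observed and predicted values are hypothetical.

```python
import math

def rmse(obs, pred):
    """Root mean squared error for one fraction (e.g. sand %)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error for one fraction."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def r2(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def clr(x):
    """Centred log ratio of a strictly positive composition."""
    logs = [math.log(v) for v in x]
    m = sum(logs) / len(logs)
    return [v - m for v in logs]

def aitchison_distance(a, b):
    """Aitchison distance: Euclidean distance between CLR vectors."""
    return math.dist(clr(a), clr(b))

def mean_aitchison_distance(obs_comps, pred_comps):
    """AD averaged over all (observed, predicted) composition pairs."""
    pairs = list(zip(obs_comps, pred_comps))
    return sum(aitchison_distance(o, p) for o, p in pairs) / len(pairs)

# Hypothetical observed vs predicted (sand, silt, clay) in %.
obs = [(60, 30, 10), (20, 50, 30), (40, 40, 20)]
pred = [(55, 33, 12), (25, 45, 30), (40, 40, 20)]
for j, name in enumerate(("sand", "silt", "clay")):
    o = [c[j] for c in obs]
    p = [c[j] for c in pred]
    print(name, rmse(o, p), mae(o, p), r2(o, p))
print("mean AD", mean_aitchison_distance(obs, pred))
```

RMSE, MAE and R² are evaluated fraction by fraction, whereas AD scores the whole composition at once and is unchanged by rescaling (for example, 60/30/10 and 6/3/1 are at distance zero).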
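For the classification side, overall accuracy, the (Cohen's) kappa coefficient and a precision-recall summary can be sketched as follows. This is an illustrative sketch, not the paper's evaluation code: AUPRC is approximated here by average precision for one texture class in a one-vs-rest setting (one common step-wise estimate; the paper may integrate the curve differently), and the class labels and scores are hypothetical.

```python
from collections import Counter

def overall_accuracy(y_true, y_pred):
    """Share of samples whose predicted class matches the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement corrected for chance: (p_o - p_e) / (1 - p_e).

    Undefined when p_e == 1 (e.g. a single class everywhere).
    """
    n = len(y_true)
    p_o = overall_accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (p_o - p_e) / (1.0 - p_e)

def average_precision(is_positive, scores):
    """Step-wise AUPRC estimate for one class; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(is_positive)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos

# Hypothetical observed and predicted texture classes.
y_true = ["loam", "loam", "silt loam", "sandy loam", "loam", "silt loam"]
y_pred = ["loam", "silt loam", "silt loam", "sandy loam", "loam", "loam"]
print(overall_accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
print(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]))
```

Kappa discounts the agreement expected by chance from the class frequencies, and a precision-recall summary focuses on how well a (possibly rare) class is retrieved; both matter when texture classes are imbalanced.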

Interactive discussion
Status: open (until 02 Mar 2020)
Short summary
We systematically compared 45 models for direct and indirect soil texture classification and soil particle size fraction interpolation, using 5 machine-learning models and 3 log ratio transformation methods. Random forest performed strongly in both the classification of imbalanced data and the regression assessment. Extreme gradient boosting is more practical and computationally efficient when dealing with large data sets. Indirect classification and log ratio methods are recommended.
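Indirect classification, as described above, first predicts the sand, silt and clay fractions and then assigns each prediction a texture class. The abstract does not name the classification scheme, so the sketch below assumes the USDA texture triangle, encoded with the class-boundary rules used by the USDA NRCS texture calculator; the predicted fractions are hypothetical, and the helper names are illustrative only.

```python
def usda_texture(sand, silt, clay):
    """USDA texture class from sand/silt/clay percentages summing to 100."""
    if silt + 1.5 * clay < 15:
        return "sand"
    if silt + 2 * clay < 30:
        return "loamy sand"
    if ((7 <= clay < 20 and sand > 52 and silt + 2 * clay >= 30)
            or (clay < 7 and silt < 50 and silt + 2 * clay >= 30)):
        return "sandy loam"
    if 7 <= clay < 27 and 28 <= silt < 50 and sand <= 52:
        return "loam"
    if (silt >= 50 and 12 <= clay < 27) or (50 <= silt < 80 and clay < 12):
        return "silt loam"
    if silt >= 80 and clay < 12:
        return "silt"
    if 20 <= clay < 35 and silt < 28 and sand > 45:
        return "sandy clay loam"
    if 27 <= clay < 40 and 20 < sand <= 45:
        return "clay loam"
    if 27 <= clay < 40 and sand <= 20:
        return "silty clay loam"
    if clay >= 35 and sand > 45:
        return "sandy clay"
    if clay >= 40 and silt >= 40:
        return "silty clay"
    if clay >= 40 and sand <= 45 and silt < 40:
        return "clay"
    return None  # only reachable for inputs that do not sum to 100

def indirect_classify(predicted_psfs):
    """Close each predicted (sand, silt, clay) triple to 100 %, then classify."""
    classes = []
    for p in predicted_psfs:
        s = sum(p)
        sand, silt, clay = (100.0 * v / s for v in p)
        classes.append(usda_texture(sand, silt, clay))
    return classes

predicted = [(80, 80, 40), (70, 20, 10), (10, 40, 50)]  # hypothetical predictions
print(indirect_classify(predicted))
```

Closing the predictions first matters for models trained on raw percentages, whose outputs need not sum to 100 %; predictions back-transformed from log ratio coordinates are already closed.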