Ecosystems https://doi.org/10.1007/s10021-024-00928-7  2024 The Author(s) From Rangelands to Cropland, Land-UseChangeandIts ImpactonSoil OrganicCarbonVariables inaPeruvian Andean Highlands: AMachine Learning Modeling Approach Mariella Carbajal,1,2,3* David A. Ramı́rez,1* Cecilia Turin,4,5 Sean M. Schaeffer,6 Julie Konkel,7 Johan Ninanya,1,8 Javier Rinza,1 Felipe De Mendiburu,9 Percy Zorogastua,10 Liliana Villaorduña,11 and Roberto Quiroz12 1International Potato Center (CIP), Headquarters, P.O. Box 1558, 15024 Lima, Peru; 2Electrical and Computer Engineering, North Carolina State University, Raleigh, North Carolina, USA; 3NC Plant Sciences Initiative, North Carolina State University, Raleigh, NC, USA; 4Department of Animal Production, School of Animal Science, Universidad Nacional Agraria La Molina (UNALM), 15024 Lima, Peru; 5Centro Experimental La Molina, Dirección de Supervisión y Monitoreo en Las Estaciones Experimentales Agrarias, Instituto Nacional de Innovación Agraria (INIA), Av. La Molina N 1981, 15024 Lima, Peru; 6Department of Biosystems Engineering and Soil Science, University of Tennessee, 2506 E.J. Chapman Drive, Knoxville, Tennessee 37996, USA; 7Blount County Soil Conservation District, 1217 McArthur Road Maryville, Maryville, Tennessee 37804, USA; 8Applied Meteorology Master Program, Universidad Nacional Agraria La Molina (UNALM), 15024 Lima, Peru; 9Department of Statistics and Informatics, School of Economics and Planning, Universidad Nacional Agraria La Molina (UNALM), 15024 Lima, Peru; 10Department of Agronomy, School of Agriculture, Universidad Nacional Agraria La Molina (UNALM), 15024 Lima, Peru; 11Water Resources Engineering Master Program, Universidad Nacional Agraria La Molina (UNALM), 15024 Lima, Peru; 12Sistema Nacional de Investigadores (SENACYT), Edificio 205, Ciudad Saber, Clayton, Panama Received 17 April 2024; accepted 13 August 2024 ABSTRACT Supplementary Information: The online version contains supple- Andean highland soils contain significant quanti- mentary material available at https://doi.org/10.1007/s10021-024-0092 ties of soil organic carbon (SOC); however, more 8-7. efforts still need to be made to understand the Author contributions MC performed soil C analyses, ML modeling, SOC mapping, and led manuscript writing. DR led manuscript writing processes behind the accumulation and persistence and the research team. CT led soil sampling, land use, and land cover of SOC and its fractions. This study modeled SOC categorization. SMS and JK established and led soil C analyses. JN and JR variables—SOC, refractory SOC (RSOC), and the conducted statistical analysis and contributed to the writing. FDM, PZ, 13 and LV designed and performed soil sampling. RQ led the research team, C isotope composition of SOC (d 13CSOC)—using conceptualized the main study, and obtained financial support for the machine learning (ML) algorithms in the Central research.MC performed soil C analyses, ML modeling, SOC mapping, and Andean Highlands of Peru, where grasslands and led manuscript writing. DR led manuscript writing and the research team. CT led soil sampling, land use, and land cover categorization. SMS and JK wetlands (‘‘bofedales’’) dominate the landscape established and led soil C analyses. JN and JR conducted statistical surrounded by Junin National Reserve. A total of analysis and contributed to the writing. FDM, PZ, and LV designed and 198 soil samples (0.3 m depth) were collected to performed soil sampling. RQ led the research team, conceptualized the main study, and obtained financial support for the research. assess SOC variables. Four ML algorithms—random *Corresponding author; e-mail: mcarbaj@ncsu.edu; d.ramirez@cgiar.org forest (RF), support vector machine (SVM), artifi- M. Carbajal and others cial neural networks (ANNs), and eXtreme gradient ment; Rolando and others 2017a) make these areas boosting (XGB)—were used to model SOC vari- especially vulnerable to climate change. Rising ables using remote sensing data, land-use and land- temperatures have led to an expansion of crops to cover (LULC, nine categories), climate topography, higher elevations (Skarbø and VanderMolen 2016), and sampled physical–chemical soil variables. RF promoting an increasing incidence of pests (Dan- was the best algorithm for SOC and d13CSOC pre- gles and others 2008) and diseases. diction, whereas ANN was the best to model RSOC. Global warming drives crop encroachment on ‘‘Bofedales’’ showed 2–3 times greater SOC the Andes’s higher lands (Rolando and others (11.2 ± 1.60%) and RSOC (1.10 ± 0.23%) and 2017a), which causes a substantial land-use change more depleted d13CSOC (- 27.0 ± 0.44 &) than and the reduction of soil organic carbon (SOC) other LULC, which reflects high C persistent, pools. External market demand, environmental turnover rates, and plant productivity. This high- policy, and management of high Andean grasslands lights the importance of ‘‘bofedales’’ as SOC reser- have led to regrettable examples of landscape voirs. LULC and vegetation indices close to the degradation and transformation. In the Andean near-infrared bands were the most critical envi- highlands of Junin-Peru, the so-called ‘‘boom’’ of ronmental predictors to model C variables SOC and maca (Lepidium meyenii), a ‘‘superfood’’ appreciated d13CSOC. In contrast, climatic indices were more for its energizing nutritional power with high de- important environmental predictors for RSOC. This mand in the Asian market during 2011–2015 study’s outcomes suggest the potential of ML (Turin and others 2018), has transformed a land- methods, with a particular emphasis on RF, for scape dominated by highland grassland cover to a mapping SOC and its fractions in the Andean prevalence of bare soil degraded by maca cultiva- highlands. tion. This cultivation process involves burning and plowing the grassland with heavy machinery, Key words: Artificial neural networks; Bofedales; releasing significant amounts of carbon (123– 13C isotope composition; Extreme gradient boost- 136 t ha-1; Rolando and others 2017b). Further- ing; Grasslands; Random forest; Refractory C frac- more, other activities put to risk the conservation tion; Support vector machine. and functioning of high Andes wetlands named ‘‘bofedales,’’ which are crucial for water security in lowlands (MINAM 2015) and for conserving sig- nificant soil C stocks (Monge-Salazar and others HIGHLIGHTS 2022; Hribljan and others 2016) and biodiversity (Polk and others 2019; Maldonado 2014). These activities involve extracting compact blocks of  ML algorithms consistently modeled SOC vari- vegetation with a thin layer of soil, which is then ables with high performance used as alternative energy for heating and cooking  Free publicly available remote sensing data was (Caro and others 2014) and the overgrazing caused useful for SOC variables prediction by domestic livestock (Cochi Machaca and others  Bofedales and grasslands were the most impor- 2018). tant reservoirs of SOC and fractions Andean soil contains high quantities of SOC, the carbon that remains in the soil after the partial decomposition of organic matter by microorgan- INTRODUCTION isms (Alavi-Murillo and others 2022). However, The High Andes, located between 5S and 20S and few studies address SOC assessment and modeling above 4000 m.a.s.l., are characterized by their rich in the Andean highlands region. Refractory SOC agro biodiversity (Monge-Salazar and others 2022) (RSOC) represents a fraction that persists in soil and ecosystem services (Rolando and others and has a finite turnover time of thousands of years 2017a). However, the high melting rate of their (Krull and others 2003). It represents one of the glaciers (Zemp and others 2019), high frequency significant global SOC pools (Jagadamma and oth- and intensity of extreme events (heavy rainfalls, ers 2010), and its quantification is crucial for frosts, strong winds, droughts, among others; Po- understanding C dynamics (decomposition and veda and others 2020), and changes in land use stabilization processes). Also, the 13C isotope com- (mainly agricultural intensification and encroach- position of SOC (d 13CSOC) constitutes another cru- From Rangelands to Cropland, Land-use Change and Its Impact cial soil trait because it may be used to estimate topography, and soil properties, and iv) to spatially plant inputs into soil organic matter (Ehleringer model SOC across the study area. and others 2000; Bernoux and others 1998). Moreover, d13CSOC has been shown to vary with MATERIALS AND METHODS SOC turnover rate, and sources of SOC under land- use change (Ehleringer and others 2000; Xia and Study Area others 2021; Han and others 2023). Predictions of The study was conducted in the central Peruvian quantity and turnover rate based on d13CSOC are Andean highlands within the districts of Junin and subject to errors associated with climate variability, Carhuamayo in the department of Junin (10 01¢ S, temporal differences, and anthropogenic contami- 76 07¢ W, 4200 m a.s.l.). The study area comprised nation. Therefore, it is essential to quantify these about 800 km2 within the Junin National Reserve errors and compare them with other results to buffer zone (Figure 1), whose primary purpose is to achieve robustness. Artificial intelligence methods, protect the grassland and bofedal ecosystems and including machine learning (ML) and deep learn- biodiversity of Junins lake and the surrounding ing, emerged in the last two decades in pedometrics central Andean highlands. The Ramsar Convention and have been demonstrated to outperform other identifies this site as an essential wetland area (site SOC modeling approaches, such as linear regres- number 882; RSIS 2021). The climate is rainy and sion and geostatistical approaches, due to their cold, with dry autumn/winter according to the ability to find nonlinear patterns in a multidimen- Thornthwaite climatic classification system (SE- sional set of potential environmental predictions NAMHI 2022). The annual average maximum (Somarathna and others 2016; Keskin and others temperature, minimum temperature, and precipi- 2019; Veronesi and Schillaci 2019; Chen and others tation are 9–19 C, -3–3 C, and 500–1200 mm, 2022; Grunwald 2022; Zhu and others 2022). respectively (period 1981–2010; SENAMHI 2022). However, multiple algorithms are commonly tested The soil in the study area is characterized mainly by and compared because no rule exists for choosing a predominance of Inceptisols with a trend of high the best ML algorithm. This is because ML models SOC concentrations and acidic pH (Rolando and are considered black boxes (the underlying pro- others 2018). cesses for prediction are unknown), and the algo- As grasslands and ‘‘bofedales’’ dominate the rithms fit differently depending on the input data. landscape, the primary land use and the main Remote sensing data have been used as a primary livelihood is grazing livestock consisting of cattle source of predictor variables. Multispectral ima- and sheep, which coexist with wild vicuñas (Vi- gery, including Landsat (Ayala Izurieta and others cugna vicugna). In some cases, subsistence agricul- 2021), MODIS (Sreenivas and others 2016), SPOT ture is practiced with crops of potato and maca and (Liu and others 2015a), and others, is used as a is limited to a few small spots of land. However, nondestructive data source to study SOC variability from the 1990s to the present day, maca cultivation (Gehl and Rice 2007; Chatterjee and others 2021) has had a significant expansion, becoming the through the calculation of different spectral indices. primary driver of land-use change and the leading The Andean highlands have received little atten- disruptor of the high Andean drylands (‘‘puna’’) tion for quantifying soil C fractions. No approaches ecosystem (Turin and others 2018). for developing predictive models that help to understand C process dynamics and the main dri- Measured Soil Data vers for this system have been validated, perhaps due to the high spatial heterogeneity and limited The soil sampling sites were selected following the resources to conduct sampling. In this study, SOC, Latin Hypercube sampling (LHS) statistical method, RSOC, and d13CSOC (referred to as soil C target which provides an efficient way of sampling vari- variables hereafter) were measured and used to ables, ensuring a good representation of the envi- develop predictive models using ML algorithms and ronmental characteristics of the study area (Carré publicly available remote sensing data in the An- and others 2007; Wang and others 2022; Stein dean highlands of Junin-Peru. This study aims: i) to 1987; McKay and others 1979). The LHS method compare the soil C target variables among the most used the multidimensional distributions of the important land uses in the zone, ii) to analyze the slope, precipitation, minimum and maximum performance of some ML methods for predictive temperatures, normalized difference vegetation modeling, iii) to find their most important envi- index (NDVI), and land cover estimated by a ronmental predictors related to land use, climate, supervised classification from Landsat 8 imagery from the United States Geological Survey (USGS M. Carbajal and others Figure 1. The study area and the 198 soil sample locations in the proximities of Junin Lake are located in the Central Peruvian Andean Region of the Province of Junin. Advanced Land Observing Satellite (ALOS) Phased Array type L-band Synthetic Aperture Radar (PALSAR) digital elevation model was used for mapping. 2020) to determine the sampling locations. The estimated by multiplying SOC (see its determina- sampling locations were adjusted in practice due to tion below) with bulk density following Rolando high slopes and accessibility, resulting in the and others’ (2017b) procedure. Unfortunately, bulk selection of 198 sites. A composite soil sample density measurements were made for just 64% of ( 5 kg) was gathered at each sampling site from the sites selected due to operational inconve- five locations: one central point and four points niences; therefore, LULC averaged values are re- positioned 2 m apart in the N, S, E, and W cardinal ported. directions. These samples were collected using an Composite soil samples were analyzed for texture auger from the 0.3 m soil profile (Art’s Manufac- and pH using a hydrometer and suspension turing & Supply Inc., model Mud Augers, USA). In potentiometer (water in 1:1 relation) at the Soil addition, a pit measuring 0.8 9 0.7 9 0.5 m3 was Laboratory of the National Agrarian University La dug at the central point for bulk density measure- Molina—Lima, Peru. The soil C target variables’ ments within the 0–0.3 m soil profile, using metal values were determined using a Combustion cylinders of 0.05 m in diameter. Then, C stock was Module coupled to a Cavity Ring-Down Spec- From Rangelands to Cropland, Land-use Change and Its Impact troscopy (CM-CRDS) system based on Liu and uct. The climate indices were the minimum and other’s (2018) procedure for SOC and d13CSOC. maximum of the average monthly minimum Thus, a soil subsample per site was sieved to < 2 (TMNN and TMNX, respectively) and maximum mm, dried at 60 C, and ground with a mortar and temperatures (TMXN and TMXX, respectively) and pestle. Then, the final soil sample weights to be the average annual total precipitation (PREC), analyzed were determined by land-use type based calculated from WorldClim version 2.1 climate data on their mean expected soil C concentration. (period 1970–2000 with  1 km resolution). Hence, 0.015, 0.030, 0.027, and 0.0075 g were Vegetation also plays a vital role in these carbon packaged in tin capsules for maca crops, fallow and variables, so the nine spectral bands and several cultivated pastures, native grasslands and improved vegetation indexes were estimated from a Landsat pastures, and wetlands (‘‘bofedales,’’ see below), 8 Operational Land Imager (OLI) imagery from respectively. For RSOC, a second soil subsample per November 26th, 2014 (see list in Table 1). Remote site was oxidized using H2O2, according to Jaga- sensing data was preprocessed using Environmen- damma and others (2010), with slight modifica- tal Systems Research Institute (ESRI) ArcGIS soft- tions. Thus, 1 g of sieved soil (< 2 mm) was ware (ESRI, 2011, Redlands, CA). oxidized by adding 90 ml of 10% H2O2 for 2– In addition, as the predominant vegetation was 3 days, centrifugated for 15 min, washed three grasslands and grasslands converted into maca times with deionized water, and freeze-dried. From fields, finer land-use and land-cover (LULC) cate- the remaining soil, 0.075 g was weighed and gories were defined depending on the type of packaged in tin capsules. Finally, all tin capsules grassland, condition, and history (see Figure 2). were submitted to a CM-CRDS system (G2131-iA- ‘‘Vigorous grasslands’’ (n = 45) was defined as nalyzer, Picarro Inc., USA). d13CSOC was estimated healthy, tall grasslands with good cover and sparse from the 13C/12C natural abundance values re- bare soil. ‘‘Partially degraded grasslands’’ (n = 57) ported by the equipment relative to international were referred to as medium-sized, sparse grasslands standard VPDB (Vienna Pee Dee Belemnite) using with some bare soil, whereas ‘‘degraded grass- the equations by Liu and others (2018). All the lands’’ (n = 47) were typified as low and sparse analyses were performed in the Schaeffer Lab in grasslands covered surrounded by abundant bare the Department of Biosystems Engineering and Soil soil. All grassland categories are land used neither Science at the University of Tennessee, Knoxville, for cropping activities nor perturbed. ‘‘Improved USA. pastures’’ (n = 5) referred to grasslands with introduced cultivated species such as white clover Environmental Predictors and Land-Use (Trifolium pratense) and red clover (Trifolium repens) and Land-Cover Categories through inter-seeding, implying a minimum per- turbation since it does not require plowing. ‘‘Cul- Given that the soil C target variables result from tivated pastures’’ (n = 24) was defined to transform complex processes and interactions of several native grasslands into an association of species such environmental factors—including topography, cli- as king grasses (Lolium multiflorum, Lolium perenne) mate, soil properties, and vegetation—the primary and clovers (Trifolium spp.), introduced 40 years environmental predictors underpinning their un- ago in the case of the multi-communal cooperative ique processes are likely to vary in significance. system and 15 years ago in the farmer community Despite this complexity and considering the limited system. ‘‘Bofedales’’ (n = 10) is a type of Andean ML experience in predicting soil C variables beyond highland wetland with hydromorphic vegetation SOC, this study utilized an identical set of features and generally accumulates peat, seasonally or per- (environmental predictors hereafter) for SOC, 13 manently saturated with water (Monge-SalazarRSOC, and d CSOC. Thus, the environmental pre- and others 2022). ‘‘Fallow 1’’ (n = 13) referred to dictors considered for the models were obtained bare soils from recently harvested maca crops or up from publicly available remote sensing data, soil lab to 2 years of fallow, which in turn come from the analysis, and vegetation type and condition at soil recent conversion of vigorous or partially degraded sampling (see definitions in Table 1). The topo- grasslands plowed to be converted to maca crop- graphic variables were elevation (DEM—Digital land. ‘‘Fallow 2’’ (n = 20) was composed of bare elevation model), slope, aspect, and topographic soils with invasive sparse grass species, coming wetness index (TWI), derived from the Advanced from maca crops harvested 3 to 5 years ago, which Land Observing Satellite (ALOS) Phased Array type in turn result from the conversion of ‘‘vigorous’’ or L-band Synthetic Aperture Radar ‘‘partially degraded grasslands’’ that have been (PALSAR)—Radiometric Terrain Correction prod- plowed to be transformed into maca cropland. M. Carbajal and others Table 1. Features Considered as Potential Environmental Predictors in Soil Organic Carbon Prediction Variables Abbreviation Equation Source Land-use and land-cover condition at LULC Field sampling sampling Soil texture particle sizes: sand con- SAND tent (%) Soil texture particle sizes: silt content SILT (%) Soil texture particle sizes: clay con- CLAY tent (%) pH in water pH Closest Euclidean distance to the lake DLAKE d(lake, sample) Average annual precipitation (mm) PREC Climate variables: Minimum of average monthly mini- TMNN min(Tmin) WorldClim v2.1, cli- mum temperature (C) mate data from 1970– Maximum of average monthly mini- TMNX max(Tmin) 2000. 30 s resolution mum temperature (C) ( 1 km) Minimum of average monthly maxi- TMXN min(Tmax) mum temperature (C) Maximum of average monthly max- TMXX max(Tmax) imum temperature (C) Global-Aridity index ARID CGIAR Consortium for Global-Potential evapotranspiration PET Spatial Information (CSI) Ultra-blue band (435 – 451 nm) UBLUE Landsat 8 Operational Blue band (452 – 512 nm) BLUE Land Imager (OLI) Green band (533 – 590 nm) GREEN Red band (636 – 673 nm) RED Near-infrared band (851 – 879 nm) NIR short-wave infrared-1 band (1566 – SWIR1 1651 nm) Short-wave infrared-2 band (2107 – SWIR2 2294 nm) Spectral vegetation indexes 1 SER1 Red=Green Derived vegetation in- Spectral vegetation indexes 2 SER2 Red=SWIR2 dexes from Landsat 8 Spectral vegetation indexes 3 SER3 SWIR1=SWIR2 OLI Normalized Difference Vegetation NDVI ðNIR RedÞ=ðNIRþ RedÞ Index   Enhanced Vegetation Index EVI 2:5 NIRRed NIRþ6Red7:5Blueþ1 Soil-Adjusted Vegetation Index. SAVI ð1þ LÞ  ððNIR RedÞ=ðNIRþ Red þ LÞÞ L = 0.5 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi 2NIRþ1 ð  þ 22 NIR 1Þ 8ðNIRRedÞ Modified Soil Adjusted Vegetation MSAVI 2 Index Normalized Difference Moisture In- NDMI ðNIR SWIR1Þ=ðNIRþ SWIR1Þ dex Normalized Burn Ratio 1 and 2 NBR1 ðNIR SWIR2Þ=ðNIRþ SWIR2Þ Normalized Burn Ratio 1 and 2 NBR2 ðSWIR1 SWIR2Þ=ðSWIR1þ SWIR2Þ Digital Elevation Model DEM ALOS PALSAR* Slope in degrees SLOPE Derived topographic Aspect ASPECT properties from DEM Topographic Wetness Index** TWI lnða=tanbÞ (*) Advanced Land Observing Satellite (ALOS) Phased Array type L-band Synthetic Aperture Radar (PALSAR). (**) Calculated according to Quinn and others (1991), where a ¼ ðTotalcatchmentareaÞ=ðFlowwidthÞ and b ¼ Slope: From Rangelands to Cropland, Land-use Change and Its Impact ‘‘Fallow-3’’ (n = 12) referred to invasive grass erature recommendation, especially for regression species with sparse low vegetation resulting from and when variable importance is of interest, feature long-standing maca fallow (> 5 years) of trans- scaling was applied even for the tree-based algo- formed grasslands into maca cropland. (Table 2). rithms RF and XGB (Strobl and others 2007; Bal- abaeva and Kovalchuk, 2019). Then, Modeling Approach hyperparameters were tuned using ‘‘out-of-bag, ‘‘tenfold cross-validation repeated three times, and From the 42 potential environmental predictors ‘‘leave-one-out cross-validation’’ resampling considered for this study (33 numerical and nine methods for RF, SVM, and ANN-XGB. For every categorical from LULC, Table 1), some may be soil C target variable modeled, performance metrics nonessential or repetitive, and it is always better to were averaged across the fivefold partitions for identify and exclude them from the model build- both the training and testing phases (see next sec- ing. Addressing and preselecting the minimum- tion) and compared to identify the best predictive optimal and all-relevant features to include (fea- ML model. Once the best model was found and due ture selection) helps optimize the model prediction to the small dataset, the ML model was retrained and reduce overfitting (Parsaie and others 2021). using the whole dataset (without partitioning), and Among the feature selection methods, the Boruta the important variables were evaluated. The ML method (Kursa and Rudnicki 2010) has yielded models were built and assessed using R 3.6.1 and better results when working with environmental the packages ‘‘Boruta’’ v7.0.0 (Kursa and Rudnicki processes like SOC decomposition due to its ability 2010) for feature selection and ‘‘caret’’ v6.0.86 to identify linear and nonlinear relationships from (Kuhn and others 2019) for applying the RF, ANN, complex processes (Keskin and others 2019; Zer- SVM, and XGB algorithms. aatpisheh and others 2022). This study used Boruta Finally, its spatial distribution was mapped, and to select all-relevant and tentative environmental SOC was identified as the primary variable of predictors for building the models for every soil C interest. The RF model was recalibrated by target variable. Based on a random forest (RF) retraining it, using the most important spatially classification algorithm, this method creates ran- available environmental predictors, which included domness in the system and determines the unim- LULC, SER2, NDMI, MSAVI, NDVI, DLAKE, portant, meaningful, and tentative attributes of a SWIR2, and NBR1. For LULC, a land-cover classi- given variable. After the Boruta feature selection, fication was performed using the RF classification the new dataset underwent balancing and parti- algorithm in Google Collaboratory. This classifica- tioning, including the selected environmental pre- tion used the 198 sample sites across the nine LULC dictors and the soil C target variables. This categories and categories for water bodies, inun- partitioning for model training and testing was dated areas, urban areas, rocks, and cattails (Man- based on the values of the soil C target variable, tas and Caro 2023) and the same Landsat imagery utilizing a fivefold approach (Yates and others used in this study. These additional land-use cate- 2022). The approach comprises five cycles of model gories were masked together and defined as ‘‘Non- training and testing, where each iteration involves carbon storing surfaces’’ for mapping purposes. permuting four folds for training (75%) and Furthermore, a raster depicting the Euclidean dis- reserving onefold for testing (25%). The four top- tance—the shortest distance—to Junin Lake performing algorithms in predicting SOC (John and (DLAKE) was generated based on the lake’s others 2020; Emadi and others 2020)—RF, artificial boundary. The remaining predictors were Landsat- neural Networks (ANN), Support Vector Machine based indices, which were already spatially avail- (SVM), and eXtreme Gradient Boosting able. The training samples for classification, the (XGB)—were employed to develop predictive DLAKE raster, and the process of raster snapping models for the soil C target variables. Due to the (at 30 m resolution) for all variables were con- differences in the ranges and distributions of the ducted in ArcGIS. environmental predictors’ values, feature scaling (transformations of values) through scaling (sub- Statistical Comparison of Soil Organic C tracting feature mean and dividing by feature Variables and Models Performance standard deviation, mean 0, and standard deviation 1) and normalization (dividing by the feature Assessment maximum, range from 0 to 1), was executed and The Kruskal–Wallis rank sum test was used to test tested to determine the most effective method for significant differences among LULC for the soil C enhancing model performance. Following the lit- target variables, followed by Dunn’s post hoc test M. Carbajal and others Figure 2. Photos of land-use and land-cover categories (see definition in Materials and Methods section): A Bofedales, B Cultivated pastures, C Improved pastures, D Vigorous grasslands, E Partially degraded grasslands, F Degraded Grasslands, G Fallow areas fallow with 0–2 years after maca cultivation (Fallow 1), H Fallow areas fallow with 3–5 years after maca cultivation (Fallow 2), I Fallow areas fallow with > 5 years after maca cultivation (Fallow-3). with Holm’s correction method for adjusting p- where X̂i, Xi, and N are the model predicted values, values for multiple comparisons. For that analysis, observed values, and, total number of observed the R packages ‘‘stats’’ (R Core Team 2022) and values, respectively. Higher R2 (close to 1) and ‘‘DescTools’’ (Signorel and others 2022) were used. lower RMSE (close to 0) mean better ML model Next, the coefficient of determination (R2) and root performance. Model performance metrics were mean square error (RMSE) were used to assess the calculated as the average across the fivefold parti- performance of the ML models tested. R2 represents tions for training and testing. the proportion of variance explained by each ML model, and RMSE indicates the accuracy of the RESULTS predicted values (Yang and others 2014). R2 and RMSE were calculated as follows: Soil C Measurements by LULC 0 12 P P P b SOC values ranged between 1.67–17.77%, with theB N N N R2 ¼ @BrhffiffiffiffiffiffiffiPffiffiffiffiffiffiffiffiffiffiffiffiffiffi Nffiffiffiffiffiffiffiffiffii¼ffiffiPffiffi 1ffiffiffiXffiffiffiiffiX̂ffiffiffiiffiffiffiffiffiffiffiffiiffiffiffihffi iffi¼ffiffiffi1ffiffiXPffiffiffiffi iffiffiffiffiffiffiffiiffi¼ffiffiffi1ffiffiffiXffiffiffiiffiffiffiffi CC N N 2  N P ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiiffiffiA lowest value found in ‘‘degraded grasslands,’’ N 2 N i¼1 X 2  2i i¼1 Xi N i¼1 X̂i  i¼1 X̂i which was significantly lower than that of ‘‘bofe- sffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi dales’’ (p-value < 0.001) and ‘‘partially degradedP grasslands’’ (p-value < 0.01) (Figure 3A,N ð  Þ2X̂i Xi RMSE ¼ i¼1 Table S1). The highest SOC value was found in N ‘‘bofedales,’’ being significantly (p-value < 0.05) From Rangelands to Cropland, Land-use Change and Its Impact Table 2. Model Performance Metrics (R2—Coefficient of Determination and RMSE—Root mean Square Error) for Random Forest (RF), Artificial Neural Networks (ANNs), Support Vector Machine (SVM), and eXtreme Gradient Boosting (XGB) Algorithms on the Models’ Training and Testing of Soil Organic Carbon (SOC), Refractory SOC (RSOC), and 13C Isotopic Composition of SOC (d13CSOC) Soil C variable Model Training Testing RMSE R2 RMSE R2 SOC RF 0.77 0.87 1.47 0.49 ANN 1.15 0.70 1.49 0.43 SVM 1.11 0.72 1.66 0.36 XGB 0.92 0.81 1.53 0.42 d13CSOC RF 0.35 0.89 0.79 0.42 ANN 0.60 0.66 0.85 0.29 SVM 0.42 0.84 0.88 0.26 XGB 0.30 0.84 0.81 0.37 RSOC RF 0.10 0.87 0.19 0.46 ANN 0.12 0.80 0.18 0.50 SVM 0.14 0.75 0.20 0.46 XGB 0.05 0.95 0.20 0.41 2–3 times higher than that of the other LULC cat- egories except for ‘‘improved pastures’’ (Figure 3A, Table S1). RSOC values ranged between 0.01 and 2.58%, being the lowest and highest ones found in Figure 3. A Soil organic carbon (SOC), B refractory SOC ‘‘fallow–1’’ and ‘‘bofedales,’’ respectively (Fig- (RSOC), and C 13C isotopic composition of SOC ure 3B). ‘‘Cultivated’’ and ‘‘improved pastures’’ (d13CSOC) on different land-use and land-cover were not significantly (p-value > 0.05) lower than categories (see definition in Materials and Methods ‘‘bofedales’’ which was 2–3 times higher than the section). Red dashed horizontal line represents the other LULC categories (Table S1). Values of d13C global average.SOC ranged between - 29.09–20.35 & being the high- est one found in ‘‘fallow–3’’ (Figure 3C). The models, achieving R 2 > 0.87 during training, ex- lowest value was found in ‘‘bofedales’’ (all its val- cept for XGB in the RSOC model (0.95), and 2 ues were below the overall mean of - R > 0.42 during testing, except for ANN in the 24.76 ± 0.074 &) which was significantly different RSOC model (R 2 = 0.50) (Table 2). Thus, following to all the other LULC categories except to ‘‘culti- the criteria indicated in Sect. ’’Statistical Compar- vated’’ and ‘‘improved pastures’’ (Figure 3C, ison of Soil Organic C Variables and Models Per- Table S1). ‘‘Fallow–3’’ (p-value < 0.01), ‘‘de- formance Assessment’’ and analyzing the average 2 graded grasslands’’ (p-value < 0.01), and ‘‘par- fivefold R and RMSE values for training and test- tially degraded grasslands’’ (p-value < 0.05) ing, respectively (Table 2), RF was selected as the showed significant differences compared to ‘‘Cul- most appropriate model for predicting SOC and 13 tivated pastures’’ (Table S1). Bulk density in ‘‘bo- d CSOC, and ANN for predicting RSOC. fedales’’ was approximately half compared to other LULC categories (0.49 t m-3 vs. 0.98–1.09 t m-3), Explanatory Variables while carbon stock was nearly twice as high The environmental predictors excluded (see selec- (210.9 t ha-1 vs. 97.4–126.3 t ha-1) (Table S2). tion criteria in Sect. ’’Modeling Approach’’) from the model building of the soil C target variables Model Performance and Comparison were SILT (silt content), CLAY (clay content), Overall, RF consistently outperformed other ML BLUE (blue band), GREEN (green band), SWIR1 algorithms in modeling soil C target variables (short-wave infrared-1 band), SLOPE, ASPECT, M. Carbajal and others TWI, TMNN, TMNX, and ‘‘Cultivated pastures’’ in the southernmost part of the study area, pri- (data not shown). A total of 23, 22, and 20 out of marily corresponding to cultivated pasture zones. the 42 environmental predictors were selected (see Conversely, the lowest SOC values were predicted selection criteria in Sect. ’’Modeling Approach’’) in the western Reserve Buffer Zone. for building the models for SOC, d13CSOC, and RSOC, respectively (data not shown). From the DISCUSSION selected environmental predictors, ‘‘bofedales’’ were identified as the most critical for SOC, fol- ‘‘Bofedales’’ as Essential Reservoirs lowed by SER2 (Spectral vegetation indexes 2) and of Soil Organic Carbon in the Andes NDMI (Normalized Difference Moisture Index), Highlands both of which were considerably less important d13 Bofedales showed higher SOC amounts compared(Figure 4A). Regarding CSOC, ‘‘bofedales’’ also to other assessed land uses (Figure 2). Even though were the most critical environmental predictor, these wetlands have been recognized as an essen- followed by NDMI and DLAKE, which had similar tial reservoir of SOC in the Andes (Alavi-Murillo importance, and then by NIR (Near-infrared band) and others 2022; Segnini and others 2013), their and pH, which were the next ones in importance relevance in policy incidences and conservation/ (Figure 4B). For RSOC, pH was as critical as ‘‘Fal- restoration actions is scarce, or null (Maldonado low-3,’’ followed by SWIR2 (short-wave infrared-2 2014). The SOC range of values in this study (3.2– band), ‘‘bofedales,’’ and EVI (Enhanced Vegetation 17.8%) was in the lower range of values reported Index) with lower importance (Figure 4C). by other studies (13.2– 83.2%) (Cooper and others SOC Mapping 2010; Segnini and others 2010; Alavi-Murillo and others 2022; Monge-Salazar and others 2022). The land-cover classification yielded accuracies of Plant biomass extraction from the soil through 95% during training and 60% during testing. Pre- ‘‘champeo’’ and overgrazing has been reported in dicted SOC values within the study area ranged the study area (Caro and others 2007, 2014; Sal- from 2.7 to 11.5% (Figure 5). The highest SOC vador and others 2014; Mantas and Caro 2023); values were predominantly found north and south these perturbations could promote SOC reduction. of Junins lake, mainly in the ‘‘bofedales’’ zone. On the other hand, C stock values (in the 0–0.3 m Areas with the next highest SOC values were found soil profile) found in this study (Table S2) were Table 3. Caused Impacts by the Primary Land-Use Changes in the Study Area and Highland Andean Ecosystems Reported by Literature Effect The main land-use Drivers in the study area Soil/land components & function Ecosystem changes services  Native Fuel demand and inappro- SOM reduction a, erosion b, loss of plant fl FWFP, fl ‘‘bofedales’’/grasslands priate management diversity and soil degradation c,d, water CS, fl NC to Degraded (‘‘champeo’’, overgrazing) retention d, loss of productivity e ‘‘bofedales’’/grassland  Native grasslands to cul- Agricultural policy reforms > soil aggregation and potential soil C › FWFP, › tivated grassland sequestration f, > plant productivity and CS, › SF, soil fertility f,g ›NC  Native grasslands to External market demand Soils macroaggregate disruption f, > nutrient › FP, › SF maca crop mineralization and soil fertility g  Maca crop to the fallow External market demand SOC reduction f, slow plant recovery fl FWFP, fl areas f, > proneness to erosion and runoff f,g CS, fl NC, flWR aAdler and Morales (1999), b Rolando and others (2017a), c Catorci and others (2014), d Cochi Machaca and others (2018), e Caro and others (2014), fRolando and others (2017b), gRolando and others (2018). Likely positive (›) and adverse (fl) effects on provisioning (FWFP = food, wool, and fiber provision, FP = food provision), regulating (CS = carbon sequestration, WR = water regulation), and supporting (SF = soil fertility, NC = nutrient cycling,) ecosystem services are referred based on Millennium Ecosystem Assessment (Corvalán and others 2005). In bold are the processes directly and indirectly observed/measured in this study. From Rangelands to Cropland, Land-use Change and Its Impact thickness can reach as deep as 15 m). The study findings highlighted the importance of ‘‘bofedales’’ as a reservoir of SOC and its stable C fractions and called for its conservation and restoration (see Scale, Reach, and Impacts of Land-use Changes and their Implications for Conservation Section). The highest depletion of d13CSOC (ranged from - 29.5 to - 25.0 &) in ‘‘bofedales’’ than other LULC suggested that SOC was formed from plants under no water restriction conditions and better photosynthetic performance discriminating against 13C (Farquhar and others 1989; More and others 2022). This finding highlights the potential for relatively high primary productivity in ‘‘bofe- dales’’ in this Andean ecosystem. On the other hand, high enrichment of d13C is also related to higher fractions of persistent SOC pools (Ehleringer and others 2000), which is consistent with our findings considering that ‘‘bofedales’’ showed the highest RSOC (1.10 ± 0.23%) than other LULC (Figure 3C). Furthermore, Segnini and others (2010) found an increase in persistent SOC pools with soil depth in Andean- ‘‘bofedales.’’ Highland grasslands have been reported as other important reservoirs of C stocks and SOC in the Andes (Gibbon and others 2010; Zimmermann and others 2010; Farley and others 2013). In our study area, Rolando and others (2017b) detected that cultivated pastures showed similar values of SOC but a higher depletion of d13C (4.5 ± 0.2% and - 26.0 ± 0.1 &, respectively) than native grasslands (4.6 ± 0.3% and - 25.6 ± 0.1&, respectively) and Figure 4. Rankings of the top most important fallow areas (4.1 ± 0.3% and - 25.6 ± 0.1 &, environmental predictors defined for the best- respectively). This has been interpreted as a higher performed machine learning model for soil organic 13 carbon (SOC) with random forest (A, in %), 13C depletion of d C in cultivated pastures from incor- isotopic composition of SOC (d.13C ) with random porating N-fixer species (white clover) and long-s-SOC forest (B, in &), and refractory SOC (RSOC) with tanding perennial grasses (like ryegrass), manure, Artificial Neural Network (C, in %). Importance is and supplemental irrigation. In this study (in defined as the increase in the MSE prediction when the agreement with Rolando and others 2017b), ‘‘cul- variables are permuted. The environmental predictors are tivated pastures’’ LULC showed significantly more described in Table 1. depleted d13CSOC (- 25.4 ± 0.16 &) than ‘‘partially degraded’’ (- 24.6 ± 0.12 &) and ‘‘degraded slightly lower than those values reported in the grasslands’’ (- 24.5 ± 0.12 &), and fallows area literature for ‘‘bofedales’’ (211 vs. 230–306 t ha-1 after three years (- 23.7 ± 0.46 &) (Figure 3). This from Segnini and others 2010), grasslands (102– result suggested that vegetation that formed SOC in 126 vs. 135–144 t ha-1 from Farley and others cultivated pastures had better physiological perfor- 2013), fallows (106–11 vs.  123 t ha-1 from Ro- mance and that soil in degraded grasslands and fal- lando and others 2017b) and pastures (97–119 vs. low areas likely had more labile C forms. 136 t ha-1 from Rolando and others 2017b). There were no significant differences in C stock values RF as Promising ML Algorithm among LULC categories except for ‘‘bofedales’’, for Predicting Soil C Variables which was almost twice as high (Table S2). In in the Andean Highlands ‘‘bofedales,’’ C stocks are more extensive and pro- found than the other LULC categories and range Overall, among the ML algorithms, RF performed from 30–700 t C ha-1 per meter of peat depth (peat the best, capturing C processes’ nonlinear interac- M. Carbajal and others Figure 5. Predicted spatial distribution of soil organic carbon across the Lake Junı́n Region, Junı́n, Peru. This map showcases the distribution, as inferred by a random forest algorithm utilizing the eight most significant environmental predictors available spatially. Non-carbon storing surfaces correspond to classes such as water bodies, inundated surfaces, cattails, rocks, and urban areas. tions with acceptable and consistent R2 and RMSE the study area, and model algorithm (Grunwald performances (Table 2), which agrees with most of 2022). Sample size has a more significant effect the reported SOC modeling studies. In the litera- than the model algorithm on the model perfor- ture, the performance of ML algorithms predicting mance (Somarathna and others 2017). R2 is among SOC is highly variable. It depends on multiple the most reported model performance indicators for factors, like the observed sample size, number and ML regression algorithms for soil C models due to type of covariates, time–space resolution, extent of its more straightforward interpretation, especially From Rangelands to Cropland, Land-use Change and Its Impact when comparing multiple site applications where and others 2003; Luo and others 2017; Sing and target value ranges and/or units may differ to use others 2018; Dynarski and others 2020). Even RMSE. However, most of these R2 values ranged though land use significantly affects both labile and from 0.24 to 0.68 (from first to third quartile) persistent C pools (Liu and others 2020; Padb- (Grunwald 2022), reflecting little understanding of hushan and others 2022; Smith 2008), the latter the main drivers and methods for predicting SOC. responds much slower than labile C pools to land- For this study, the R2 of predicted soil C target use and other human-induced changes (for exam- variables varied from 0.42 to 0.50 for the best ML ple, land management) (Dynarski and others 2020; algorithms, agreeing with other studies with small Padbhushan and others 2022; Sainepo and others sampling sizes and similar covariates (Zeraatpisheh 2018). Thus, LULC was one of the leading envi- and others 2022). Using multi-temporal data or soil ronmental predictors, ‘‘bofedales’’ the most rele- nutrient indicators as covariates has been a strategy vant for SOC and d13CSOC, and ‘‘Fallow-3’’ for to counter the effect of a small sample size, allow- RSOC. Several studies have highlighted the ing somewhat higher R2, 0.58–0.68 (John and importance of LULC as a predictor variable for SOC others 2020; Shafizadeh-Moghadam and others (Emadi and others 2020; Keskin and others 2019; 2022). Therefore, the moderate performance of the Xiong and others 2014) and RSOC (Keskin and models, especially in predicting d13CSOC, suggested others 2019; Xiao and others 2022) using ML that the processes involved are too complex for the algorithms. Regarding d13CSOC, Wang and others given small sample size and/or some essential (2015) stress that the litter quality and soil water variables at the correct time–space scale were can increase the carbon isotope fractionation dur- missing as covariates. Regarding the RMSE, pre- ing organic matter decomposition. Because soil 13C dicting SOC got 1.47%, which seems high, but isotope composition (d13C) is strongly influenced considering the small sample size and high SOC by leaf (litter) d13C, variations in this variable can values from ‘‘bofedales,’’ it is fair and in the mid- be influenced by LULC because it determines the range of the reported values from 0.59 to 2.7 across type and quality of litter inputs into the soil (Smith multiple SOC studies (Padarian and others 2019; and Chalk 2021; Wang and others 2013). Thus, Peng and others 2015; Safanelli and others 2020). d13C values in labile C pools (that is, relatively Few studies modeled other C fractions apart from ‘‘new’’ material) would reflect d13C values closer to SOC with ML techniques; for example, Adi and the current vegetation, whereas d13C values in Grunwald (2020) and Keskin and others (2019) persistent C pools (that is, older material) shows modeled persistent C fraction at 0–0.2 m depth for relatively enriched d13C values due to isotopic dis- Florida State using 850 and 1014 soil samples and crimination of the heavy isotope in soil organic 151 and 327 environmental predictors, respec- matter compounds (Wang and others 2013). In tively. When employing the RF algorithm, these addition, the crucial role of soil water and soil studies achieved acceptable R2 values of 0.68 and temperature and pH during soil organic matter 0.72, respectively. This suggests that model per- decomposition has been highlighted as they in- formances could be improved by adding sampled crease the activity of soil fauna and microorganisms data and potential environmental predictors. The (Wang and others 2013; Wang and others 2015; ANN model was selected for RSOC predictions due Smith and Chalk 2021). Thus, we found that for to its balanced performance in the training and ‘‘bofedales,’’ some indicators of soil water (DLAKE testing phases. Although XGB and RF demon- and NDMI) and vegetation (SER2), and pH were strated superior learning capabilities during train- relevant environmental predictors for d13CSOC and ing, ANN performed well in training and exhibited RSOC (Figure 3). The greater relevance of pH for the best generalization to unseen data in the testing RSOC and d13CSOC than for SOC could be due to its phase (Table 2). impact on the activity and growth of microorgan- isms, which metabolize the different forms of C, Vegetation and Climatic Indices resulting in a variation in the organic carbon iso- as Essential Predictors of Soil Organic topic composition of the soil (Neina 2019; Klink Carbon and others 2022). Also, soil pH can affect the interactions between soil minerals and organic Quality and quantity of SOC are mainly deter- matter, which determines the preservation and mined by a soil’s physical and chemical environ- stability of C (Neina 2019). Although some soil ment, physical accessibility of organic matter to variables, such as clay content, were reported as biological agents (that is, microbes and/or en- essential predictors for SOC (John and others 2020; zymes), and the ratio of C inputs to losses (Krull Davy and Koen 2013), in this study, it was not of M. Carbajal and others high relevance, likely due to the importance of pH ‘‘bofedales’’ and grasslands. Both perturbations against other chemical indicators to explain SOC in (‘‘champeo’’ and overgrazing) are the most Andean highlands soils (Alavi-Murillo and others important drivers that impact the change from 2022). vigorous/native to degraded ‘‘bofedales’’/grass- The relationship between SOC and remotely lands, reducing SOC (Figure 3) and provisioning, sensed and easily accessible variables has rarely regulation, and supporting ecosystem services (Ta- been reported (Mirchooli and others 2020). How- ble 3). Land policy reforms during the’70 s pro- ever, Lamichhane and others (2019) reported that moted establishing a multi-communal agrarian these variables were among the top five for SOC company (SAIS Tupac Amaru) in the region, cov- prediction. NDMI is a vegetation index that detects ering more than 0.2 Mha, to increase grassland vegetation water content and is a good predictor for productivity for livestock (Diez 2020). Through measuring SOC using ML methods (John and these reforms, the natural grasslands from these others 2020). Mirchooli and others (2020) found lands were managed by incorporating productive that coloration index and NDMI are the most crit- pastures (ryegrass-white clover), irrigation, inor- ical environmental predictors for SOC prediction in ganic–organic fertilization, and rotational livestock the RF model, followed by elevation, NDVI, and grazing (Rolando and others 2017b). The land-use slope. NDMI is indirectly related to soil moisture in change from native to cultivated grasslands was the the surface layers (0–0.3 m), and the latter can only one that was not considered a perturbation; it prevent the net loss of organic soils through oxi- increased plant productivity (see first Discussion dation (Liu and others 2015b). In this study, NDMI section) and soil health, promoting provisioning, was the main environmental predictor in both SOC regulation, and supporting ecosystem services (Ta- and d13CSOC under the RF model, followed by ble 3). Land-use changes caused by crop SER2, NIR, and NBR1. These last variables are encroachment in highland grasslands are consid- closely related by the NIR and SWIR2 bands, found ered one of the most critical perturbations that in the spectrums wavelengths from 850 to threaten the ecosystem services of these landscapes 2200 nm. Bishop and others (2008) found a strong in the highland Andes region (Tovar and others absorption near 1400 nm (also for Kaolinite) and 2013; Rolando and others 2017a). Climate change 1900 nm, indicating the presence of water bound facilitating the upward expansion of agriculture in the interlayer lattices of soil. This could provide (Tovar and others 2012; Arce and others 2019) and the conditions for a physical protection mechanism socioeconomic factors like the increase of interna- through the interaction of SOC with the soil min- tional market demand (like quinoa, Gamboa and eral matrix and the stabilization process by aggre- others 2020) have been crucial drivers of Andes gate formation (Krull and others 2003). Also, Al- grasslands transformation. abbas and others (1972) reported an inverse rela- In the study area, maca (Lepidium meyenii) culti- tionship to SOC approximately near this region of vation was gradually extended in the grasslands of the spectrum, and with all this, it could have ob- Junin since the early 90 s for local, American, and tained the affinity to be one of the best environ- European markets. Still, its expansion was massive mental predictors for SOC and d13CSOC. in 2011–2015 to cover the high demands of the Asian markets. This led to a rapid transformation of Scale, Reach, and Impacts of Land-use the high Andean landscape with direct conse- Changes and Their Implications quences on ‘‘puna’’ ecosystem services, such as the for Conservation decrease of grassland primary production, reduced grazing areas, reduced land cover, loss of water The extraction of vegetation and part of the topsoil infiltration and retention capacity of soils, besides of ‘‘bofedales’’ and grasslands (an activity locally changes in the main livelihood (Turin and others called ‘‘champeo’’) has been carried out for decades 2018) (see Table 3). This study corroborates find- by rural inhabitants (Caro and others 2014). ings previously reported in the field (Rolando and ‘‘Champeo’’ allows the local population to guar- others 2017b, 2018), highlighting the occurrence of antee fuel for domestic use (mainly cooking); a degradation process following maca cultivation however, it also constitutes a critical perturbation (as indicated in Table 3), particularly in steep ter- affecting SOC accumulation (Table 3). Overgrazing rains. Swift restorative measures are imperative to caused by domestic livestock is another activity reinstate ecosystem services provided by grasslands. reported in the study area (Caro and others 2007; Despite the inclusion of high Andean natural pas- Salvador and others 2014) that reduces peat pro- ture management for greenhouse gas reduction duction and can affect SOC pools from the assessed within the National Determined Contribution From Rangelands to Cropland, Land-use Change and Its Impact (NDC), outlined by the Peruvian multisectoral CONCLUSION working group (MINAM 2019), further measures are warranted to ensure the preservation of soil C Processes that drive SOC and fractions like RSOC 13 stored within grasslands and unique ‘‘bofedales’’ and d CSOC in high Andean rangeland systems ecosystems. Economic and social incentives for have not been studied yet, challenging the choice pastoralists must be implemented to guarantee the of environmental predictors (LULC identification establishment of best management practices (rota- and classification, remote sensing products, climate tional grazing, improved fallows with legumes, and soil variables, among others) for their model- water harvesting, wetland, and grassland restora- ing. Under this context, ML algorithms capture tion) to avoid the expansion of the agricultural nonlinear interaction and process complexity to frontier. Special attention must be provided to model the studied soil C target variables with ‘‘bofedales’’ which occupy around 0.8% of the acceptable and consistent performance. ‘‘Bofe- Peru surface ( 1.05 Mha) and are found pre- dales’’ were the most important reservoirs in terms dominantly in mine concessions (41% of total of the total and the refractory fraction of SOC ‘‘bofedales’’ surface), keeping 21% of them under compared to the other land uses. Its highest 13 the custody of rural inhabitants (Fuentealba and depletion of dC is a potential indicator of higher Rios 2023). Despite that, an increase of + 2% year- turnover rates, high plant productivity, and C 1 in areas of ‘‘bofedales’’ (by greater availability of persistence. Because ‘‘bofedales’’ are affected by water resources in dry seasons due to deglaciation) strong perturbations (extraction of vegetation and has been reported for the 1986–2005 period in the part of the topsoil—‘‘champeo,’’, overgrazing) in southern Andes (Pauca-Tanco and others 2020), in the study area, it is recommended to establish recent years there has been a reduction of areas of restoration activities to guarantee ecosystem ser- ‘‘bofedales.’’ Thus, some studies reported an area vices from those ecosystems. For example, the loss rate of - 3.8 to - 0.4% year-1 during the management of natural grasslands through culti- 2005–2016 period (Machuca-Crespo 2018; Pauca- vated pastures showed indicators of higher pro- 13 Tanco and others 2020; Pamo-Sedano and Oscco- ductivity (more depletion of d C), remarking its Coa 2022). These ecosystems can be restored by potential for grassland restoration after crop establishing artificial ‘‘bofedales,’’ which can pre- encroachment (like maca crop) in this area. serve the same ecosystem services as natural ones, Free, publicly available remote sensing data can as was remarked in recent studies (Monge-Salazar be beneficial for SOC prediction. Vegetation indices and others 2022). close to the NIR band, such as NDMI and SER2, The present study was conducted in the Junin were good environmental predictors for the total 13 National Reserve, which covers 5303.9 and soil C (SOC and d CSOC). However, to improve the 3608.8 ha of ‘‘bofedales’’ of the Junin and Pasco prediction, vegetation and climatic indices must be departments, respectively (Fuentealba and Rios complemented with data taken in situ, such as pH, 2023). Conservation areas can be crucial as a life and especially LULC, because it is the primary dri- lab to test and monitor restoration activities ver of SOC variation. Together, these variables can involving local communities, thus improving the explain SOC dynamics, facilitating their prediction geospatial modeling of SOC to build an interoper- using ML algorithms. Considering the high reser- able public digital infrastructure that can serve as a voirs of C in the soils of highland Andean ecosys- monitoring-verification system for future compen- tems, future SOC and fractions mapping will be sation schemes for the benefit of indigenous pas- essential for decision-makers and regional govern- toralists and rural inhabitants. Focusing on the ments for compensation schemes in voluntary or ecologically significant and delineated regions of regulated C markets. The SOC map elaborated in the Junin National Reserve and its buffer zone, our this study can be used for this aim, and some predictive mapping depicted distinct variations in improvements can be achieved if more soil sam- SOC distribution. Specifically, within the reserve plings are collected, especially in ‘‘bofedales,’’ im- itself, approximately 32% of the C storing surfaces proved and cultivated pastures, and fallows LULC. had SOC values over 9.6%, compared to only 8% within its buffer zone (Figure 5). While RSOC and ACKNOWLEDGEMENTS d13CSOC are key variables that provide valuable This research was carried out under the support of information, the significant importance of pH—a the ‘‘Innovate’’ Work Package of ‘‘Excellence in site-specific sampled predictor—in their models Agronomy’’ and the 3rdWork Package of ‘‘AgriLAC limited our ability to produce accurate spatial dis- Resiliente’’ OneCGIAR initiatives. MC received tribution maps. M. Carbajal and others support from the USDA Foreign Agricultural Ser- Across Farming Landscapes. Land 8:169. https://doi.org/10.3 vice, Borlaug Fellowship Program (BF-CR-16-009). 390/land8110169. The authors greatly thank producers of the study Ayala Izurieta JE, Márquez CO, Garcı́a VJ, Jara Santillán CA, Sisti JM, Pasqualotto N, Van Wittenberghe S, Delegido J. area in Junin, Peru, for allowingus to sample soils for 2021. Multi-predictor mapping of soil organic carbon in the their properties, Lake Junin National Reserve staff, alpine tundra: a case study for the central Ecuadorian páramo. and especially Alan Chamorro of ECOAN, for the Carbon Balance and Management 16(1):1–19. help provided during fieldwork. Balabaeva K, Kovalchuk S. 2019. Comparison of temporal and non-temporal features effect on machine learning models OPEN ACCESS quality and interpretability for chronic heart failure patients. Procedia Computer Science 156:87–96. https://doi.org/10.10 This article is licensed under a Creative Commons 16/j.procs.2019.08.183 Attribution-NonCommercial-NoDerivatives 4.0 Bernoux M, Cerri CC, Neill C, de Moraes JF. 1998. The use of International License, which permits any non- stable carbon isotopes for estimating soil organic matter turnover rates. Geoderma 82(1–3):43–58. commercial use, sharing, distribution and repro- Bishop JL, Lane MD, Dyar MD, Brown AJ. 2008. Reflectance duction in any medium or format, as long as you and emission spectroscopy study of four groups of phyllosili- give appropriate credit to the original author(s) and cates: Smectites, kaolinite-serpentines, chlorites and micas. the source, provide a link to the Creative Commons Clay Minerals 431:35–54. licence, and indicate if you modified the licensed Caro C, Quinteros Z, Mendoza V. 2007. Identificación de indi- material. You do not have permission under this cadores de conservación para la Reserva Nacional de Junı́n. licence to share adapted material derived from this Perú. Ecologı́a Aplicada 6(1–2):67–74. article or parts of it. The images or other third party Caro C, Sánchez E, Quinteros Z, Castañeda L. 2014. Respuesta de los pastizales altoandinos a la perturbación generada por material in this article are included in the article’s extracción mediante la actividad de ‘‘champeo’’ en los ter- Creative Commons licence, unless indicated renos de la Comunidad Campesina Villa de Junı́n. Perú. otherwise in a credit line to the material. If material Ecologı́a Aplicada 13(2):85–95. is not included in the article’s Creative Commons Carré F, McBratney AB, Minasny B. 2007. Estimation and licence and your intended use is not permitted by potential improvement of the quality of legacy soil samples for statutory regulation or exceeds the permitted use, digital soil mapping. Geoderma 141(1–2):1–14. you will need to obtain permission directly from Catorci A, Cesaretti S, Velasquez JL, Malatesta L, Zeballos H. 2014. The interplay of land forms and disturbance intensity the copyright holder. To view a copy of this licence, drive the floristic and functional changes in the dry Puna visit http://creativecommons.org/licenses/by-nc-n pastoral systems (southern Peruvian Andes). Plant Biosystems d/4.0/. 148:547–557. https://doi.org/10.1080/11263504.2014.90012 6. Chatterjee S, Hartemink AE, Triantafilis J, Desai AR, Soldat D, DATA AVAILABILITY Zhu J, Townsend PA, Zhang Y, Huang J. 2021. Characteriza- tion of field-scale soil variation using a stepwise multi-sensor The data that support the findings of this study are fusion approach and a cost-benefit analysis. Catena openly available in Dataverse CGIAR repository at 201:105190. https://doi.org/https://doi.org/10.21223/VVIFL9. Chen S, Arrouays D, Mulder VL, Poggio L, Minasny B, Roudier P, Libohova Z, Lagacherie P, Shi Z, Hannam J, Meersmans J, Richer-de-Forges A, Walter C. 2022. Digital mapping of REFERENCES globalsoilmap soil properties at a broad scale: a review. Geo- derma 409:115567. Adi SH, Grunwald S. 2020. Integrative environmental modeling Cochi Machaca N, Condori B, Rojas Pardo A, Anthelme F, of soil carbon fractions based on a new latent variable model Meneses RI, Weeda CE, Perotto-Baldivieso HL. 2018. Effects approach. Science of the Total Environment 711:134566. of grazing pressure on plant species composition and water Adler PB, Morales JM. 1999. Influence of environmental factors presence on bofedales in the Andes mountain range of Boli- and sheep grazing on an Andean grassland. Journal of Range via. Mires Peat 21:1–15. Management 52:471–481. https://doi.org/10.2307/4003774. Cooper D, Wolf E, Colson C, Vering W, Granda A, Meyer M. Al-Abbas AH, Swain PH, Baumgardner MF. 1972. Relating or- 2010. Alpine Peatlands of the Andes, Cajamarca, Peru. Arctic, ganic matter and clay content to the multispectral radiance of Antarctic, and Alpine Research 42:19–33. soils. Soil Science 114(6):477–485. Corvalán C, Hales S, McMichael AJ. 2005. Ecosystems and hu- Alavi-Murillo G, Diels J, Gilles J, Willems P. 2022. Soil organic man well-being: health synthesis, Millennium ecosystem carbon in Andean high-mountain ecosystems: importance, assessment. Millennium Ecosystem Assessment (Program), challenges, and opportunities for carbon sequestration. Re- World Health Organization (Eds.). World Health Organiza- gional Environmental Change 22:128. tion, Geneva, Switzerland. Arce A, de Haan S, Juarez H, Dhar Burra D, Plasencia F, Ccanto Dangles O, Carpio C, Barragan AR, Zeddam JL, Silvain JF. 2008. R, Polreich S, Scurrah M. 2019. The Spatial-Temporal Temperature as a key driver of ecological sorting among Dynamics of Potato Agrobiodiversity in the Highlands of invasive pest species in the tropical Andes. Ecological Appli- Central Peru: A Case Study of Smallholder Management cations 18:1795–1809. From Rangelands to Cropland, Land-use Change and Its Impact Davy MC, Koen TB. 2013. Variations in soil organic carbon for Klink S, Keller AB, Wild AJ, Baumert VL, Gube M, Lehndorff E, two soil types and six land uses in the Murray catchment, New Meyer N, Mueller CW, Phillips RP,Pausch J. 2022. Stable iso- South Wales. Australia. Soil Research 51(8):631–644. topes reveal that fungal residues contribute more to mineral- Diez A. 2020. Reforma agraria y procesos comunales: las associated organic matter pools than plant residues. Soil comunidades de las SAIS Cahuide y Túpac Amaru en la sierra Biology and Biochemistry 168:108634. https://doi.org/10.10 central del Perú. Revista Del Instituto Riva-Agüero 5:299–337. 16/j.soilbio.2022.108634 https://doi.org/10.18800/revistaira.202002.010. Krull ES, Baldock JA, Skjemstad JO. 2003. Importance of Dynarski KA, Bossio DA, Scow KM. 2020. Dynamic stability of mechanisms and processes of the stabilisation of soil organic soil carbon: reassessing the ‘‘permanence’’ of soil carbon matter for modelling carbon turnover. Functional Plant Biol- sequestration. Frontiers in Environmental Science 8:514701. ogy 30(2):207–222. Ehleringer JR, Buchmann N, Flanagan LB. 2000. Carbon isotope Kuhn M, Wing J, Weston S, Williams A, Keefer C, Engelhardt A, ratios in belowground carbon cycle processes. Ecological Cooper T, et al. 2019. Caret: classification and regression Applications 10(2):412–422. training. R Package Version 6:86. Emadi M, Taghizadeh-Mehrjardi R, Cherati A, Danesh M, Mo- Kursa MB, Rudnicki WR. 2010. Feature selection with the savi A, Scholten T. 2020. Predicting and mapping of soil or- Boruta package. Journal of Statistical Software 36:1–13. ganic carbon using machine learning algorithms in Northern Lamichhane S, Kumar L, Wilson B. 2019. Digital soil mapping Iran. Remote Sensing 12(14):2234. algorithms and covariates for soil organic carbon mapping and Farley KA, Bremer LL, Harden CP, Hartsig J. 2013. Changes in their implications: a review. Geoderma 352:395–413. carbon storage under alternative land uses in biodiverse An- Liu S, An N, Yang J, Dong S, Wang C, Yin Y. 2015a. Prediction of dean grasslands: implications for payment for ecosystem ser- soil organic matter variability associated with different land vices. Conservation Letters 6:21–27. use types in mountainous landscape in southwestern Yunnan Farquhar GD, Ehleringer JR, Hubick KT. 1989. Carbon isotope province, China. Catena 133:137–144. discrimination and photosynthesis. Annual Review of Plant Liu Y, Guo L, Jiang Q, Zhang H, Chen Y. 2015b. Comparing Physiology and Plant Molecular Biology 40:503–537. geospatial techniques to predict SOC stocks. Soil and Tillage Fuentealba B, Rios R. 2023. Memoria descriptiva inventario Research 148:46–58. nacional de bofedales 2023. Instituto Nacional de Investi- Liu D, Yu Z, Lin J. 2018. Application of combustion module gación en Glaciares y Ecosistemas de Montaña (INAIGEM). coupled with cavity ring-down spectroscopy for simultaneous Huaraz, 205 p. https://repositorio.inaigem.gob.pe/handle/16 measurement of SOC and d13C-SOC. Journal of Spectroscopy 072021/466 2018:1–5. Gamboa C, Bojacá CR, Schrevens E, Maertens M. 2020. Sus- Liu X, Chen D, Yang T, Huang F, Fu S, Li L. 2020. Changes in soil tainability of smallholder quinoa production in the Peruvian labile and recalcitrant carbon pools after land-use change in a Andes. J. Clean. Prod. 264:121657. https://doi.org/10.1016/j. semi-arid agro-pastoral ecotone in Central Asia. Ecological jclepro.2020.121657. Indicators 110:105925. Gehl RJ, Rice CW. 2007. Emerging technologies for in situ Luo Z, Feng W, Luo Y, Baldock J, Wang E. 2017. Soil organic measurement of soil carbon. Climatic Change 80(1–2):43–54. carbon dynamics jointly controlled by climate, carbon inputs, Gibbon A, Silman MR, Malhi Y, Fisher JB, Meir P, Zimmermann soil properties and soil carbon fractions. Global Change Biol- M, Dargie GC, Farfan WR, Garcia KC. 2010. Ecosystem carbon ogy 23(10):4430–4439. storage across the grassland-forest transition in the high Andes Machuca-Crespo DV. 2018. Efectos de la extracción de turba en of Manu National Park. Peru. Ecosystems 13(7):1097–1111. un sistema socio-ecológico altoandino: bofedales de Caram- Grunwald S. 2022. Artificial intelligence and soil carbon mod- poma-Lima (Bachelor’s thesis, Pontifica Universidad Católica eling demystified: power, potentials, and perils. Carbon del Perú). Footprints 1:5. Maldonado F. 2014. An introduction to the bofedales of the Han R, Zhang Q, Xu Z. 2023. Responses of soil organic carbon Peruvian High Andes. Mires and Peat 15(5):1–13. cycle to land degradation by isotopically tracing in a typical Mantas V, Caro C. 2023. User-relevant land cover products for karst area, southwest China. PeerJ 11:e15249. https://doi.org/ informed decision-making in the complex terrain of the 10.7717/peerj.15249 Peruvian Andes. Remote Sensing 15(13):3303. Hribljan JA, Suárez E, Heckman KA, Lilleskov EA, Chimner RA. McKay MD, Beckman RJ, Conover WJ. 1979. A comparison of 2016. Peatland carbon stocks and accumulation rates in the three methods for selecting values of input variables in the Ecuadorian páramo. Wetlands Ecology and Management analysis of output from a computer code. Technometrics 24:113–127. 21(2):239–245. Jagadamma S, Lal R, Ussiri DA, Trumbore SE, Mestelan S. 2010. MINAM. 2015. Mapa nacional de cobertura vegetal: memoria Evaluation of structural chemistry and isotopic signatures of descriptiva. Ministerio del Ambiente. Dirección General de refractory soil organic carbon fraction isolated by wet oxida- Evaluación, Valoración y Financiamiento del Patrimonio tion methods. Biogeochemistry 98:29–44. Natural. Lima-Perú. John K, Abraham Isong I, Michael Kebonye N, Okon Ayito E, MINAM. 2019. Catálogo de Medidas de Mitigación. Ministerio Chapman Agyeman P, Marcus Afu S. 2020. Using machine del Ambiente. Dirección General de Cambio Climático y learning algorithms to estimate soil organic carbon variability Desertificación. Lima-Perú. with environmental variables and soil nutrient indicators in Mirchooli F, Kiani-Harchegani M, Darvishan AK, Falahatkar S, an alluvial soil. Land 9(12):487. Sadeghi SH. 2020. Spatial distribution dependency of soil or- Keskin H, Grunwald S, Harris WG. 2019. Digital mapping of soil ganic carbon content to important environmental variables. carbon fractions with machine learning. Geoderma 339:40– Ecological Indicators 116:106473. 58. M. Carbajal and others Monge-Salazar MJ, Tovar C, Cuadros-Adriazola J, Baiker JR, different land uses in the Peruvian high-Andean Puna. Geo- Montesinos-Tubée DB, Bonnesoeur V, Antiporta J, Román- derma 307:65–72. Dañobeytia F, Fuentealba B, Ochoa-Tocachi BF, Buytaert W. Rolando JL, Dubeux JCB Jr, Ramirez DA, Ruiz-MorenoM, Turin 2022. Ecohydrology and ecosystem services of a natural and C, Mares V, Sollenberger LE, Quiroz R. 2018. Land Use Effects an artificial bofedal wetland in the central Andes. Science of on soil fertility and nutrient cycling in the Peruvian high- the Total Environment 838:155968. Andean Puna grasslands. Soil Science Society of America More SJ, Ravi V, Raju S. 2022. Carbon isotope discrimination Journal 82:463–474. studies in plants for abiotic stress. In: Shanker C, Anand A, RSIS. 2021. Ramsar Sites Information Service: Reserva Nacional Maheswari M, Eds. Shanker AK, . Climate Change and Crop de Junı́n. https://rsis.ramsar.org/es/ris/882Accessed 20 Apr Stress: Molecules to ecosystems. Academic Press International 2023. Publishing. pp 493–537. Safanelli JL, Chabrillat S, Ben-Dor E, Demattê JA. 2020. Mul- Neina D. 2019. The role of soil pH in plant nutrition and soil tispectral models from bare soil composites for mapping top- remediation. Applied and environmental soil science soil properties over Europe. Remote Sensing 12(9):1369. 2019(1):5794869. https://doi.org/10.1155/2019/5794869 Sainepo BM, Gachene CK, Karuma A. 2018. Assessment of soil Padarian J, Minasny B, McBratney AB. 2019. Using deep organic carbon fractions and carbon management index under learning for digital soil mapping. Soil 5(1):79–89. different land use types in Olesharo Catchment, Narok Padbhushan R, Kumar U, Sharma S, Rana DS, Kumar R, Kohli County. Kenya. Carbon Balance and Management 13(1):1–9. A, Kumari P, Parmar B, Kaviraj M, Kumar Sinha A, Anna- Salvador F, Monerris J, Rochefort L. 2014. Peatlands of the purna K, Gupta VV. 2022. Impact of land-use changes on soil Peruvian Puna ecoregion: types, characteristics and distur- properties and carbon pools in India: a meta-analysis. Fron- bance. Mires Peat 15:1–17. tiers in Environmental Science 9:722. Segnini A, Posadas A, Quiroz R, Milori D, Saab SC, Neto LM, Vaz Pamo-Sedano J, Oscco-Coa CE. 2022. Análisis espacio temporal CMP. 2010. Spectroscopic assessment of soil organic matter in del bofedal de la comunidad de Ancomarca (Tacna-Perú) wetlands from the high Andes. Soil Science Society of durante el perı́odo 1990–2021, con técnicas de teledetección. America Journal 74(6):2246–2253. Revista Ciencias Biológicas y Ambientales 1(1):43–53. http Segnini A, de Souza AA, Novotny EH, Milori D, da Silva WTL, s://doi.org/10.33326/29585309.2022.1.1587 Bonagamba TJ, Posadas A, Quiroz R. 2013. Characterization Parsaie F, Farrokhian Firouzi A, Mousavi SR, Rahmani A, Sedri of peatland soils from the high Andes through 13C nuclear MH, Homaee M. 2021. Large-scale digital mapping of topsoil magnetic resonance spectroscopy. Soil Science Society of total nitrogen using machine learning models and associated America Journal 77(2):673–679. uncertainty map. Environmental Monitoring and Assessment SENAMHI: Servicio Nacional de Meterologı́a e Hidrologı́a del 193(4):1–15. Perú. 2022. Mapa Climático del Perú. https://www.senamhi.g Pauca-Tanco A, Ramos-Mamani C, Luque-Fernández CR, ob.pe/?p=mapa-climatico-del-peru. Last accessed 15/12/2022. Talavera-Delgado C, Villasante-Benavides JF, Quispe-Turpo Shafizadeh-Moghadam H, Minaei F, Talebi-khiyavi H, Xu T, JP, Villegas-Paredes L. 2020. Análisis espacio temporal y cli- Homaee M. 2022. Synergetic use of multi-temporal Sentinel- mático del humedal altoandino de Chalhuanca (Perú) durante 1, Sentinel-2, NDVI, and topographic factors for estimating el periodo 1986–2016. Revista de Teledetección 55:105–118. soil organic carbon. Catena 212:106077. https://doi.org/10.4995/raet.2020.13325 Signorel A, Aho K, Alfons A, Anderegg N, Aragon T, Arppe A, Peng Y, Xiong X, Adhikari K, Knadel M, Grunwald S, Greve MH. and others. 2022. DescTools: Tools for Descriptive Statistics. R 2015. Modeling soil organic carbon at regional scale by com- package version 0.99.47. bining multi-spectral images with laboratory spectra. PloS One Singh M, Sarkar B, Sarkar S, Churchman J, Bolan N, Mandal S, 10(11):e0142295. Menon M, Purakayastha TJ, Beerling DJ. 2018. Stabilization Polk MH, Young KR, Cano A, León B. 2019. Vegetation of an- of soil organic carbon as influenced by clay mineralogy. In: dean wetlands (bofedales) in huascarán national park. Peru. Sparks DL, editor. Advances in agronomy. Academic Press Mires and Peat 24(01):1–26. International Publishing. pp 33–84. Poveda G, Espinoza JC, Zuluaga MD, Solman SA, Garreaud R, Skarbø K, VanderMolen K. 2016. Maize migration: key crop Van Oevelen PJ. 2020. High impact weather events in the expands to higher altitudes under climate change in the An- Andes. Frontiers in Earth Science 8:162. des. Climate and Development 8:245–255. Quinn P, Beven K, Chevallier P, Planchon O. 1991. The pre- Smith P. 2008. Soil organic carbon dynamics and land-use diction of hillslope flow paths for distributed hydrological change. In: Braimoh AK, Vlek PLG, Eds. Land use and soil modelling using digital terrain models. Hydrological Processes resources, . Dordrecht International Publishing: Springer. pp 5(1):59–79. 9–22. R Core Team. 2022. R: A language and environment for statis- Smith CJ, Chalk PM. 2021. Carbon (d13C) dynamics in agroe- tical computing. Vienna, Austria: R Foundation for Statistical cosystems under traditional and minimum tillage systems: a Computing. review. Soil Research 59(7):661–672. Rolando JL, Turin C, Ramı́rez DA, Mares V, Monerris J, Quiroz Somarathna PDSN, Malone BP, Minasny B. 2016. Mapping soil R. 2017a. Key ecosystem services and ecological intensifica- organic carbon content over New South Wales, Australia tion of agriculture in the tropical high-Andean Puna as af- using local regression kriging. Geoderma Regional 7(1):38–48. fected by land-use and climate changes. Agriculture, Somarathna PDSN, Minasny B, Malone BP. 2017. More data or a Ecosystems & Environment 236:221–233. better model? Figuring out what matters most for the spatial Rolando JL, Dubeux JC, Perez W, Ramirez DA, Turin C, Ruiz- prediction of soil carbon. Soil Science Society of America Moreno M, Comerford NB, Mares V, Garcia S, Quiroz R. Journal 81(6):1413–1426. 2017b. Soil organic carbon stocks and fractionation under From Rangelands to Cropland, Land-use Change and Its Impact Sreenivas K, Dadhwal VK, Kumar S, Harsha GS, Mitran T, Su- biomarkers. Land Degradation & Development 32(4):1591– jatha G, Rama Suresh GJ, Fyzee MA, Ravisankar T. 2016. 1605. https://doi.org/10.1002/ldr.3720 Digital mapping of soil organic and inorganic carbon status in Xiao Y, Xue J, Zhang X, Wang N, Hong Y, Jiang Y, Zhou Y, Teng India. Geoderma 269:160–173. H, Hu B, Lugato E, Richer-de-Forges A, Arrouays D, Shi Z, Stein M. 1987. Large sample properties of simulations using Chen S. 2022. Improving pedotransfer functions for predicting Latin hypercube sampling. Technometrics 29(2):143–151. soil mineral associated organic carbon by ensemble machine Strobl C, Boulesteix AL, Zeileis A, Hothorn T. 2007. Bias in learning. Geoderma 428:116208. random forest variable importance measures: Illustrations, Xiong X, Grunwald S, Myers DB, Kim J, Harris WG, Comerford sources and a solution. BMC bioinformatics 8:1–21. https://d NB. 2014. Holistic environmental soil-landscape modeling of oi.org/10.1186/1471-2105-8-25 soil organic carbon. Environmental Modelling & Software Tovar C, Duivenvoorden JF, Sánchez-Vega I, Seijmonsbergen 57:202–215. AC. 2012. Recent changes in patch characteristics and plant Yang JM, Yang JY, Liu S, Hoogenboom G. 2014. An evaluation communities in the jalca grasslands of the Peruvian Andes. of the statistical methods for testing the performance of crop Biotropica 44:321–330. https://doi.org/10.1111/j.1744-7429. models with observed data. Agricultural Systems 127:81–89. 2011.00820.x. Yates LA, Aandahl Z, Richards SA, Brook BW. 2022. Cross val- Tovar C, Seijmonsbergen AC, Duivenvoorden JF. 2013. Moni- idation for model selection: a review with examples from toring land use and land cover change in mountain regions: ecology. Ecological Monographs 93(1):e1557. An example in the Jalca grasslands of the Peruvian Andes. Zemp M, Huss M, Thibert E, Eckert N, McNabb R, Huber J, Landsc. Urban Plan. 112:40–49. https://doi.org/10.1016/j.land Barandun M, Machguth H, Nussbaumer SU, Gärtner-Roer I, urbplan.2012.12.003. Thomson L, Paul F, Maussion F, Kutuzov S, Cogley JG. 2019. Turin C, Carbajal M, Zorogastúa P, Chamorro A, Quiroz R. 2018. Global glacier mass changes and their contributions to sea- El boom de la maca: transformando paisajes y sociedades level rise from 1961 to 2016. Nature 568:382–386. rurales de la zona central altoandina del Perú. 17 Seminario Zeraatpisheh M, Garosi Y, Owliaie HR, Ayoubi S, Taghizadeh- Permanente de Investigacion Agraria (SEPIA). Cajamarca Mehrjardi R, Scholten T, Xu M. 2022. Improving the spatial Perú. 29–31 Ago 2017. prediction of soil organic carbon using environmental United States Geological Survey: USGS 02323500 Suwan- covariates selection: A comparison of a group of environ- neeRiver Near Wilcox, Fla. 2020. https://waterdata.usgs.gov/ mental covariates. Catena 208:105723. usa/ nwis/uv?site_no=02323500. Last accessed 24/07/2022. Zhu C, Wei Y, Zhu F, Lu W, Fang Z, Li Z, Pan J. 2022. Digital Veronesi F, Schillaci C. 2019. Comparison between geostatistical mapping of soil organic carbon based on machine learning and and machine learning models as predictors of topsoil organic regression kriging. Sensors 22:8997. carbon with a focus on local uncertainty estimation. Ecolog- Zimmermann M, Meir P, Silman MR, Fedders A, Gibbon A, ical Indicators 101:1032–1044. Malhi Y, Urrego DH, Bush MB, Feeley KJ, Garcia KC, Dargie Wang S, Fan J, Song M, Yu G, Zhou L, Liu J, Zhong H, Gao L, Hu GC, Farfan WR, Goetz BP, Johnson WT, Kline KM, Modi AT, Z, Wu W, Song T. 2013. Patterns of SOC and soil 13C and their Rurau NMQ, Staudt BT, Zamora F. 2010. No Differences in relations to climatic factors and soil characteristics on the soil carbon stocks across the tree line in the Peruvian Andes. Qinghai-Tibetan Plateau. Plant and Soil 363:243–255. Ecosystems 13:62–74. Wang G, Jia Y, Li W. 2015. Effects of environmental and biotic factors on carbon isotopic fractionation during decomposition of soil organic matter. Scientific Reports 5(1):11043. Springer Nature or its licensor (e.g. a society or other partner) Wang Y, Qi Q, Bao Z, Wu L, Geng Q, Wang J. 2022. A novel holds exclusive rights to this article under a publishing agree- sampling design considering the local heterogeneity of soil for ment with the author(s) or other rightsholder(s); author self- farm field-level mapping with multiple soil properties. Preci- archiving of the accepted manuscript version of this article is sion Agriculture 24:1–22. solely governed by the terms of such publishing agreement and Xia S, Song Z, Wang Y, Wang W, Fu X, Singh BP, Kuzyakov Y, applicable law. Wang H. 2021. Soil organic matter turnover depending on land use change: Coupling C/N ratios, d13C, and lignin