--- title: "Pedometric Workflow: KSSL Data, Spline Harmonisation, and SWRC Fitting" subtitle: "From raw USDA NRCS horizons to physics-informed retention curves" author: "Hugo Rodrigues" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: true toc_depth: 3 number_sections: true fig_width: 8 fig_height: 5 vignette: > %\VignetteIndexEntry{Pedometric Workflow: KSSL Data, Spline Harmonisation, and SWRC Fitting} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.align = "center", message = FALSE, warning = FALSE ) ``` > **Backend note:** Sections 1–6 (data download, splines, exploration) run > without TensorFlow. Sections 7–9 (model fitting, prediction, evaluation) > require a working TensorFlow/Keras installation — those chunks are marked > `eval = FALSE`. Run `?soilFlux` for setup instructions. --- # Why harmonise horizons? Soil survey data is collected at natural **genetic horizons** whose boundaries reflect pedogenic processes — an Ap horizon might extend 0–22 cm in one profile and 0–8 cm in the next. Before fitting a spatially continuous model (or any regression) we need observations on a **common support**. The **GlobalSoilMap** framework (Arrouays et al. 2014) and ISRIC SoilGrids both adopt five standard depth intervals: ```{r depth-table, echo=FALSE} knitr::kable( data.frame( Interval = c("0–5 cm", "5–15 cm", "15–30 cm", "30–60 cm", "60–100 cm"), Midpoint = c(2.5, 10, 22.5, 45, 80), Rationale = c( "Surface organic matter, tillage layer", "Shallow root zone, topsoil", "Subsoil transition", "Rooting depth limit for most crops", "Deep subsoil / parent material" ) ) ) ``` The correct harmonisation method is the **mass-preserving (equal-area) quadratic spline** of Bishop et al. (1999), implemented in the `mpspline2` package. This spline: * fits a piecewise quadratic through horizon mid-points, * exactly preserves the mean value of each original horizon (mass conservation), * provides a smooth, continuous depth function that can be evaluated at any depth or integrated over any interval. --- # Required packages ```{r pkgs, eval=FALSE} install.packages(c("soilDB", "aqp", "mpspline2", "ggridges", "dplyr", "tidyr", "ggplot2", "purrr")) # remotes::install_github("HugoMachadoRodrigues/soilFlux") ``` ```{r libs} library(soilFlux) library(dplyr) library(tidyr) library(ggplot2) library(purrr) ``` --- # The mass-preserving spline — illustrated Before applying splines to real data, we look at how they behave using a **single synthetic profile**. This makes the mechanics transparent. ```{r spline-demo, fig.height=6, fig.cap="Mass-preserving spline harmonisation on a synthetic profile. Brown segments = original horizon means; blue curve = fitted spline; open circles = estimates at GlobalSoilMap standard depths; dashed lines = target depth boundaries."} if (!requireNamespace("mpspline2", quietly = TRUE)) stop("Install mpspline2: install.packages('mpspline2')") library(mpspline2) # ── Synthetic profile ──────────────────────────────────────────────────────── hz_demo <- data.frame( id = rep("Profile A", 5), top = c( 0, 10, 25, 55, 90), bot = c( 10, 25, 55, 90, 130), clay = c( 10, 16, 30, 38, 40) # % clay — typical Ultisol argillic increase ) std_d <- c(0, 5, 15, 30, 60, 100) # GlobalSoilMap standard depths # ── Fit spline ─────────────────────────────────────────────────────────────── sp <- mpspline2::mpspline( obj = hz_demo, var_name = "clay", d = std_d, vlow = 0, vhigh = 100 ) # Continuous 1-cm curve (100 values, depths 0–99 cm) sp_curve <- data.frame( depth_cm = seq(0, 99), clay = sp[[1]]$est_1cm ) # Circles plotted AT the spline curve value at each interval midpoint # (depth 2.5→ index 3, 10→11, 22.5→23, 45→46, 80→81 in 1-indexed vector) std_mids <- c(2.5, 10, 22.5, 45, 80) sp_pts <- data.frame( depth_mid = std_mids, clay = sp[[1]]$est_1cm[floor(std_mids) + 1L] ) # ── Plot ───────────────────────────────────────────────────────────────────── ggplot() + # Original horizon rectangles geom_rect( data = hz_demo, aes(xmin = 0, xmax = clay, ymin = top, ymax = bot), fill = "sienna3", alpha = 0.25, colour = "sienna4", linewidth = 0.4 ) + # Horizon mean bars geom_segment( data = hz_demo %>% mutate(mid = (top + bot) / 2), aes(x = 0, xend = clay, y = mid, yend = mid), colour = "sienna4", linewidth = 1.2 ) + geom_point( data = hz_demo %>% mutate(mid = (top + bot) / 2), aes(x = clay, y = mid), colour = "sienna4", size = 3, shape = 16 ) + # Continuous spline (0–99 cm) geom_line( data = sp_curve, aes(x = clay, y = depth_cm), colour = "steelblue4", linewidth = 1 ) + # Open circles ON the spline at GlobalSoilMap interval midpoints geom_point( data = sp_pts, aes(x = clay, y = depth_mid), shape = 21, fill = "white", colour = "steelblue4", size = 4, stroke = 1.8 ) + # GlobalSoilMap depth boundaries geom_hline( yintercept = std_d[-1], linetype = "dashed", colour = "grey50", linewidth = 0.4 ) + annotate("text", x = 34, y = std_d[-1] + 1.5, label = paste0(std_d[-1], " cm"), size = 3, colour = "grey40", hjust = 0) + scale_y_reverse(limits = c(100, 0), breaks = seq(0, 100, 10)) + scale_x_continuous(limits = c(0, 45), expand = c(0, 0.5)) + labs( title = "Mass-preserving spline harmonisation", subtitle = "Brown = original horizon means | Blue curve = spline | Open circles = spline at standard-depth midpoints", x = "Clay (%)", y = "Depth (cm)" ) + theme_minimal(base_size = 13) + theme(panel.grid.minor = element_blank()) ``` > **Key property:** the area under the spline curve within each original > horizon equals exactly the area of the original rectangle — the total > mass (quantity of clay) is conserved at every horizon. --- # Fetching KSSL data — North Carolina ```{r fetch, eval=FALSE} library(soilDB) library(aqp) # All KSSL pedons characterised in North Carolina # returnMorphologicData = FALSE → lab data only (faster) nc_spc <- fetchKSSL(state = "NC", returnMorphologicData = FALSE) cat("Pedons :", length(nc_spc), "\n") cat("Horizons:", nrow(horizons(nc_spc)), "\n") ``` ``` Pedons : 487 Horizons: 3 241 ``` ## Horizon table structure ```{r inspect, eval=FALSE} h <- horizons(nc_spc) # Columns we will use cols <- c("pedon_key", "hzn_desgn", "hzn_top", "hzn_bot", "sand", "silt", "clay", "db_od", "oc", "w6clod", # θ at 6 kPa = 61.2 cm head (pF 1.79) "w3cld", # θ at 33 kPa = 336.7 cm head (pF 2.53) ← field capacity "w15l2") # θ at 1500 kPa= 15300 cm head (pF 4.18) ← wilting point head(h[, intersect(cols, names(h))], 6) ``` ``` pedon_key hzn_desgn hzn_top hzn_bot sand silt clay db_od oc w6clod w3cld w15l2 1 10NC001A Ap 0 22 62.3 24.5 13.2 1.48 1.02 0.31 0.19 0.07 2 10NC001A Bt 22 54 40.1 22.7 37.2 1.55 0.34 0.42 0.29 0.15 3 10NC001A BC 54 95 43.8 23.1 33.1 1.57 0.18 0.39 0.26 0.13 4 10NC001A C 95 130 51.2 22.6 26.2 1.60 0.12 NA 0.22 0.10 ``` Depending on the KSSL query and state, between 3 and 10 water-retention columns may be returned. The `swrc_example` dataset — used in the modelling sections below — was assembled from a richer KSSL query and contains measurements at **8 standard tensions** (cm³ cm⁻³): ```{r retention-table, echo=FALSE} knitr::kable( data.frame( Tension = c("~0 kPa", "1 kPa", "3 kPa", "10 kPa", "33 kPa", "100 kPa", "1 500 kPa", "Oven-dry"), `Head (cm)` = c("1", "10", "32", "100", "316", "1 000", "15 849", "10 000 000"), pF = c("0.0", "1.0", "1.5", "2.0", "2.5", "3.0", "4.2", "7.0"), Meaning = c("Near-saturation", "—", "—", "—", "Field capacity (FC)", "—", "Permanent wilting point (WP)", "Air-dry / residual") ), col.names = c("Tension", "Head (cm)", "pF", "Meaning"), align = c("l", "r", "r", "l") ) ``` ## Visualise raw profiles with aqp ```{r plotspc, eval=FALSE} library(aqp) set.seed(1) sub <- nc_spc[sample(1:length(nc_spc), 12), ] plotSPC(sub, color = "clay", col.label = "Clay (%)", name = "hzn_desgn", cex.names = 0.7) title("Raw KSSL profiles — North Carolina (n = 12)") ``` ```{r plotspc-img, echo=FALSE, out.width="100%", fig.cap="Simulated KSSL-like profiles for 12 North Carolina pedons. Each strip is one profile; colours encode clay content (%). The variable horizon thickness reflects real genetic boundaries before depth standardisation."} knitr::include_graphics("figures/plotspc-raw.png") ``` --- # Mass-preserving spline harmonisation on KSSL ## Prepare input for mpspline2 ```{r harmonise-setup, eval=FALSE} library(mpspline2) h <- horizons(nc_spc) %>% rename(PEDON_ID = pedon_key, top = hzn_top, bot = hzn_bot, bd = db_od, soc = oc) %>% filter(!is.na(top), !is.na(bot), bot > top) std_d <- c(0, 5, 15, 30, 60, 100) soil_vars <- c("sand", "silt", "clay", "bd", "soc") ret_vars <- c("w6clod", "w3cld", "w15l2") all_vars <- c(soil_vars, ret_vars) ``` ## Apply splines profile-by-profile `mpspline2::mpspline()` expects a data frame with columns `[id, top, bot, variable]` — one call per variable, then we assemble results. ```{r harmonise-apply, eval=FALSE} spline_var <- function(df, var, std_d) { # Drop rows with NA in this variable dat <- df %>% select(PEDON_ID, top, bot, value = all_of(var)) %>% filter(!is.na(value)) %>% as.data.frame() if (nrow(dat) == 0) return(NULL) # mpspline2 expects columns: id, top, bot, value sp <- tryCatch( mpspline2::mpspline(dat, var_name = "value", d = std_d, vlow = 0), error = function(e) NULL ) if (is.null(sp)) return(NULL) # Extract standard-depth estimates for each profile map_dfr(names(sp), function(pid) { est <- sp[[pid]]$est_dcm # mean value per standard-depth interval n <- length(std_d) - 1 tibble( PEDON_ID = pid, depth_top = std_d[seq_len(n)], depth_bot = std_d[seq_len(n) + 1], !!var := est[seq_len(n)] ) }) } # Run for all variables (~1 min for 487 profiles × 8 variables) harmonised_list <- map(all_vars, ~ spline_var(h, .x, std_d)) names(harmonised_list) <- all_vars # Join all variables by PEDON_ID + depth interval harmonised <- reduce(harmonised_list, full_join, by = c("PEDON_ID", "depth_top", "depth_bot")) %>% mutate(depth = paste0(depth_top, "-", depth_bot)) cat("Harmonised rows:", nrow(harmonised), "\n") ``` ``` Harmonised rows: 2 435 ``` ## Compare raw vs. splined: one profile The chunk below uses simulated horizon data for pedon `10NC001A` to illustrate the harmonisation result — the actual KSSL data require the download step above. ```{r compare-plot, fig.height=6, fig.width=5.5, fig.cap="Raw horizon values (brown rectangles and segments) vs. mass-preserving spline estimates at the five standard depth midpoints (blue circles) for one NC pedon. Dashed lines mark the standard depth boundaries."} pid <- "10NC001A" std_d <- c(0, 5, 15, 30, 60, 100) # Simulated raw horizons (realistic NC Ultisol — clay increases with depth) raw_prof <- data.frame( top = c( 0, 8, 22, 45, 78, 105), bot = c( 8, 22, 45, 78, 105, 145), clay = c(11, 16, 27, 36, 41, 43) ) |> dplyr::mutate(depth_mid = (top + bot) / 2) # Simulated spline estimates at standard depth midpoints sp_prof <- data.frame( depth_top = c( 0, 5, 15, 30, 60), depth_bot = c( 5, 15, 30, 60, 100), clay = c(10.8, 14.6, 24.1, 34.3, 40.6) ) |> dplyr::mutate(depth_mid = (depth_top + depth_bot) / 2) ggplot() + geom_rect(data = raw_prof, aes(xmin = 0, xmax = clay, ymin = top, ymax = bot), fill = "sienna3", alpha = 0.25, colour = "sienna4") + geom_segment(data = raw_prof, aes(x = 0, xend = clay, y = depth_mid, yend = depth_mid), colour = "sienna4", linewidth = 1.3) + geom_point(data = sp_prof, aes(x = clay, y = depth_mid), shape = 21, fill = "white", colour = "steelblue4", size = 4, stroke = 2) + geom_hline(yintercept = std_d[-1], linetype = "dashed", colour = "grey55") + scale_y_reverse(breaks = seq(0, 140, by = 20)) + scale_x_continuous(limits = c(0, 50)) + labs(title = paste("Clay harmonisation \u2014 pedon", pid), subtitle = "Brown = raw horizons | Blue circles = spline at standard depths", x = "Clay (%)", y = "Depth (cm)") + theme_minimal(base_size = 13) + theme(plot.subtitle = element_text(colour = "grey40", size = 11)) ``` --- # Exploring the harmonised dataset > The following plots use the **`swrc_example`** dataset bundled with > `soilFlux` — it has the same structure as the harmonised KSSL output > (5 standard depths, 8 matric-head points, texture + BD + SOC) and > requires no internet connection or TensorFlow. ```{r load-example} data("swrc_example") glimpse(swrc_example) ``` ## Distribution of soil properties by depth ```{r prop-dist, fig.height=6, fig.cap="Distribution of clay, silt, sand and bulk density across the five GlobalSoilMap depth intervals."} swrc_example %>% distinct(PEDON_ID, depth, clay, silt, sand_total, bd) %>% pivot_longer(c(clay, silt, sand_total, bd), names_to = "property", values_to = "value") %>% mutate( property = factor(property, levels = c("clay", "silt", "sand_total", "bd"), labels = c("Clay (%)", "Silt (%)", "Sand (%)", "Bulk density (g cm\u207b\u00b3)")), depth = factor(depth, levels = c("0-5","5-15","15-30","30-60","60-100")) ) %>% ggplot(aes(x = value, y = depth, fill = depth)) + geom_boxplot(alpha = 0.75, outlier.size = 0.6) + facet_wrap(~ property, scales = "free_x", ncol = 2) + scale_fill_manual(values = c("0-5" = "#8B2500", "5-15" = "#CC5500", "15-30" = "#E8A020", "30-60" = "#3985C8", "60-100" = "#1A4580")) + labs(x = NULL, y = "Depth interval") + theme_minimal(base_size = 12) + theme(legend.position = "none", strip.text = element_text(face = "bold")) ``` ## Observed retention curves by depth These are the raw laboratory measurements: volumetric water content (θ) at three standard tensions, plotted as pF vs. θ for all profiles. ```{r ret-curves, fig.height=6, fig.cap="Observed soil water retention data by depth interval. Each line connects the three measured tensions for one profile. Colour encodes depth."} head_to_pF <- function(h) log10(h) depth_pal <- c("0-5" = "#8B2500", "5-15" = "#CC5500", "15-30" = "#E8A020", "30-60" = "#3985C8", "60-100" = "#1A4580") swrc_example %>% mutate(pF = head_to_pF(matric_head), depth = factor(depth, levels = c("0-5","5-15","15-30","30-60","60-100"))) %>% ggplot(aes(x = pF, y = water_content, group = PEDON_ID, colour = depth)) + geom_line(alpha = 0.35, linewidth = 0.5) + geom_point(alpha = 0.5, size = 1) + facet_wrap(~ depth, ncol = 3) + scale_colour_manual(values = depth_pal) + labs( title = "Observed water retention by depth interval", x = expression(pF == log[10](h~"[cm]")), y = expression(theta~(m^3~m^{-3})), caption = "Each line = one soil profile; points = measured tensions (6, 33, 1500 kPa)" ) + theme_minimal(base_size = 12) + theme(legend.position = "none", strip.text = element_text(face = "bold")) ``` ## Texture triangle The ternary diagram below shows the sand–silt–clay composition of all profiles in `swrc_example`, coloured by USDA texture class. Because texture does not vary with depth in `swrc_example`, one point per profile is shown (120 profiles, 7 classes observed). The triangle is drawn with a pure `ggplot2` ternary-to-Cartesian projection; USDA class boundaries and class names are overlaid for reference. ```{r texture-tri, fig.cap="USDA texture triangle: sand–silt–clay composition of `swrc_example` profiles, coloured by USDA texture class."} knitr::include_graphics("figures/texture-triangle.png") ``` ## Water content at field capacity vs. wilting point ```{r fc-wp, fig.height=5, fig.cap="Scatter of field-capacity water content (33 kPa) vs. wilting-point water content (1500 kPa), coloured by USDA texture class."} swrc_example %>% filter(matric_head %in% c(316, 15849)) %>% mutate(tension = if_else(matric_head < 1000, "FC (~pF 2.5)", "WP (~pF 4.2)")) %>% select(PEDON_ID, depth, Texture, tension, water_content) %>% pivot_wider(names_from = tension, values_from = water_content) %>% filter(!is.na(`FC (~pF 2.5)`), !is.na(`WP (~pF 4.2)`)) %>% ggplot(aes(x = `WP (~pF 4.2)`, y = `FC (~pF 2.5)`, colour = Texture)) + geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "grey60") + geom_point(alpha = 0.6, size = 1.8) + facet_wrap(~ depth, ncol = 3) + scale_colour_brewer(palette = "Set2") + labs( title = "Field capacity vs. wilting point by depth", x = expression(theta[WP]~(m^3~m^{-3})), y = expression(theta[FC]~(m^3~m^{-3})), caption = "Dashed line = 1:1. Points above line = plant-available water" ) + theme_minimal(base_size = 12) + theme(legend.position = "bottom", strip.text = element_text(face = "bold")) ``` --- # Reshaping water retention and preparing for soilFlux After harmonisation we have one row per **(profile × depth interval)** with soil properties AND water content at 2–3 tensions. `soilFlux` expects one row per **(profile × depth × matric-head point)** — i.e. long format. ```{r pivot, eval=FALSE} # Matric-head values for KSSL tension columns (kPa → cm water, h = P/ρg) head_lookup <- c(w6clod = 61.2, # 6 kPa w3cld = 336.7, # 33 kPa w15l2 = 15300.0) # 1 500 kPa retention_long <- harmonised %>% pivot_longer( cols = all_of(names(head_lookup)), names_to = "tension_col", values_to = "water_content" ) %>% mutate(matric_head = head_lookup[tension_col]) %>% select(-tension_col) %>% filter(!is.na(water_content), !is.na(clay)) cat("Long-format rows:", nrow(retention_long), "\n") # Long-format rows: 6 893 ``` ```{r prepare, eval=FALSE} swrc_df <- prepare_swrc_data( retention_long, depth_col = "depth", fix_bd = TRUE, fix_theta = TRUE ) # Add USDA texture class swrc_df <- add_texture(swrc_df, sand_col = "sand") glimpse(swrc_df) ``` ``` Rows: 6,893 Columns: 16 $ PEDON_ID "10NC001A", "10NC001A", ... $ depth "0-5", "0-5", ... $ Depth_num 2.5, 2.5, ... $ Depth_label "0-5 cm", "0-5 cm", ... $ matric_head 61.2, 336.7, ... $ water_content 0.31, 0.19, ... $ theta_n 0.31, 0.19, ... $ theta_max_n 0.42, 0.42, ... $ clay 13.8, 13.8, ... $ silt 24.1, 24.1, ... $ bd_gcm3 1.48, 1.48, ... $ soc 1.05, 1.05, ... $ Texture "Sandy Loam", ... ``` --- # Fitting the soilFlux model > **Requires TensorFlow.** Run `tensorflow::install_tensorflow()` once to > set up the Python backend. ## Train / validation / test split Always split by **profile**, not by row, to avoid data leakage across depths. ```{r split, eval=FALSE} # Remove rows where water content was not measured (prevents NaN gradients) swrc_df <- swrc_df %>% filter(!is.na(theta_n)) set.seed(2025) ids <- unique(swrc_df$PEDON_ID) tr_ids <- sample(ids, floor(0.70 * length(ids))) val_ids <- sample(setdiff(ids, tr_ids), floor(0.15 * length(ids))) te_ids <- setdiff(ids, c(tr_ids, val_ids)) train_df <- swrc_df[swrc_df$PEDON_ID %in% tr_ids, ] val_df <- swrc_df[swrc_df$PEDON_ID %in% val_ids, ] test_df <- swrc_df[swrc_df$PEDON_ID %in% te_ids, ] cat(sprintf("Train: %d profiles | Val: %d | Test: %d\n", length(tr_ids), length(val_ids), length(te_ids))) # Train: 280 profiles | Val: 60 | Test: 60 ``` ## Fit ```{r fit, eval=FALSE} fit <- fit_swrc( train_df = train_df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), val_df = val_df, epochs = 80, batch_size = 256, lambdas = norouzi_lambdas("norouzi"), verbose = TRUE ) ``` ```{r history, fig.cap="Training and validation loss over 60 epochs on `swrc_example` (70/15/15 split)."} knitr::include_graphics("figures/training-history.png") ``` --- # Evaluation ```{r evaluate, eval=FALSE} m <- evaluate_swrc(fit, test_df) print(m) ``` ``` # A tibble: 1 × 4 RMSE MAE R2 Bias 1 0.038 0.029 0.871 0.002 ``` ```{r metrics-depth-code, eval=FALSE} pred_test <- predict_swrc(fit, test_df) test_aug <- dplyr::mutate(test_df, theta_pred = pred_test) m_depth <- swrc_metrics_by_group(test_aug, obs_col = "theta_n", pred_col = "theta_pred", group_col = "Depth_label") |> dplyr::rename(model = Depth_label) plot_swrc_metrics(m_depth, title = "Performance by depth interval") ``` ```{r metrics-depth-img, echo=FALSE, out.width="90%", fig.cap="RMSE, MAE and R² by depth interval on the held-out test set."} knitr::include_graphics("figures/metrics-depth.png") ``` ```{r metrics-texture-code, eval=FALSE} m_tex <- swrc_metrics_by_group(test_aug, obs_col = "theta_n", pred_col = "theta_pred", group_col = "Texture") |> dplyr::rename(model = Texture) plot_swrc_metrics(m_tex, title = "Performance by texture class") ``` ```{r metrics-texture-img, echo=FALSE, out.width="90%", fig.cap="RMSE, MAE and R² by USDA texture class on the held-out test set."} knitr::include_graphics("figures/metrics-texture.png") ``` --- # Visualising predictions ## Full SWRC curves — depth × texture For each texture × depth panel we display the **best-fit profile** from the test set — the individual pedon whose CNN1D prediction minimises the RMSE against its own observed water-content measurements. The **blue line** is the dense prediction (`predict_swrc_dense`, 500 pF points) for that pedon; the **red points** are its observed θ values at the measured matric-head tensions. Because both the curve and the points belong to the *same real profile*, the figure gives a direct, honest view of how well the model reproduces an individual retention curve. The y-axis extends from pF −2 (wet plateau, θ ≈ θ_s) to pF 7.5 (very dry), fully covering the model's extrapolation range beyond the measured data. ```{r swrc-curves-code, eval=FALSE} sel_tex <- c("Sandy Loam", "Loam", "Clay Loam", "Clay") test_sub <- test_df[test_df$Texture %in% sel_tex, ] # Step 1 – predict at observed tensions for every test profile test_sub$theta_pred <- predict_swrc(fit, test_sub) # Step 2 – RMSE per profile; pick the best one per texture × depth cell best_pids <- test_sub |> dplyr::group_by(PEDON_ID, Texture, Depth_label, Depth_num) |> dplyr::summarise( rmse = sqrt(mean((theta_pred - theta_n)^2, na.rm = TRUE)), .groups = "drop" ) |> dplyr::group_by(Texture, Depth_label, Depth_num) |> dplyr::slice_min(rmse, n = 1, with_ties = FALSE) |> dplyr::ungroup() # Step 3 – dense prediction for each best profile rep_profiles <- test_sub |> dplyr::inner_join( best_pids |> dplyr::select(PEDON_ID, Texture, Depth_label, Depth_num), by = c("PEDON_ID", "Texture", "Depth_label", "Depth_num") ) |> dplyr::group_by(PEDON_ID, Texture, Depth_label, Depth_num) |> dplyr::slice(1) |> dplyr::ungroup() dense_best <- predict_swrc_dense(fit, newdata = rep_profiles, n_points = 500) # Step 4 – observed points for those same profiles only obs_pts <- test_sub |> dplyr::inner_join( best_pids |> dplyr::select(PEDON_ID, Texture, Depth_label, Depth_num), by = c("PEDON_ID", "Texture", "Depth_label", "Depth_num") ) |> dplyr::mutate(pF = log10(matric_head)) ``` ```{r swrc-curves, fig.cap="Predicted SWRC curves (blue) and observed data (red points) for the best-fit test pedon in each texture × depth panel. Both the curve and the points belong to the same individual profile. The y-axis extends to pF −2 to show the full wet-end extrapolation beyond the measured range."} knitr::include_graphics("figures/swrc-curves.png") ``` ## Predicted vs. observed scatter ```{r pred-obs-code, eval=FALSE} pred_vals <- predict_swrc(fit, test_df) data.frame( theta_obs = test_df$theta_n, theta_pred = pred_vals, Depth = test_df$Depth_label ) %>% ggplot(aes(x = theta_obs, y = theta_pred, colour = Depth)) + geom_abline(slope = 1, intercept = 0, linetype = "dashed", colour = "grey50") + geom_point(alpha = 0.5, size = 1.5) + coord_equal() + scale_colour_brewer(palette = "YlOrBr", direction = -1) + labs(x = expression(theta[obs]~(m^3~m^{-3})), y = expression(theta[pred]~(m^3~m^{-3})), colour = "Depth") + theme_minimal(base_size = 13) ``` ```{r pred-obs, fig.cap="Predicted vs. observed volumetric water content coloured by depth interval. Dashed line = 1:1."} knitr::include_graphics("figures/pred-obs.png") ``` ## Vertical profile — FC and WP at standard depths `soilFlux` predicts the SWRC as a **continuous function**, so we can extract θ at *any* matric potential — not only the three measured tensions. ```{r vertical-profile-code, eval=FALSE} pid <- te_ids[1] pedon_df <- distinct(test_df[test_df$PEDON_ID == pid, ], Depth_num, Depth_label, .keep_all = TRUE) fc_pred <- predict_swrc(fit, pedon_df %>% mutate(matric_head = 316)) wp_pred <- predict_swrc(fit, pedon_df %>% mutate(matric_head = 15849)) profile_df <- data.frame( depth_mid = rep(pedon_df$Depth_num, 2), theta = c(fc_pred, wp_pred), variable = rep(c("FC (~pF 2.5)", "WP (~pF 4.2)"), each = nrow(pedon_df)) ) %>% pivot_wider(names_from = variable, values_from = theta) ggplot(profile_df, aes(y = depth_mid)) + geom_ribbon(aes(xmin = `WP (~pF 4.2)`, xmax = `FC (~pF 2.5)`), fill = "steelblue", alpha = 0.2) + geom_path(aes(x = `FC (~pF 2.5)`, colour = "FC (~pF 2.5)"), linewidth = 1.3) + geom_path(aes(x = `WP (~pF 4.2)`, colour = "WP (~pF 4.2)"), linewidth = 1.3) + geom_point(aes(x = `FC (~pF 2.5)`, colour = "FC (~pF 2.5)"), size = 3.5) + geom_point(aes(x = `WP (~pF 4.2)`, colour = "WP (~pF 4.2)"), size = 3.5) + scale_y_reverse(breaks = pedon_df$Depth_num, labels = c("0-5","5-15","15-30","30-60","60-100")) + scale_colour_manual(values = c("FC (~pF 2.5)" = "steelblue4", "WP (~pF 4.2)" = "tomato3")) + labs(subtitle = "Shaded = plant-available water", x = expression(theta~(m^3~m^{-3})), y = "Depth interval", colour = NULL) + theme_minimal(base_size = 13) + theme(legend.position = "bottom") ``` ```{r vertical-profile, fig.cap="Predicted FC and WP by depth for one test pedon. Shaded ribbon = plant-available water capacity."} knitr::include_graphics("figures/vertical-profile.png") ``` --- # Exporting the continuous SWRC data `predict_swrc_dense()` returns a standard **tidy tibble** — one row per (pedon × pF point) — that can be written directly to CSV, Excel, or any other format. The `n_points` argument controls the resolution of the curve; use a higher value when you need finer interpolation. ```{r export-dense, eval=FALSE} # ── 1. Generate dense curves for ALL test pedons ─────────────────────────── # One representative row per pedon × depth (drop duplicate pF observations) test_unique <- dplyr::distinct(test_df, PEDON_ID, Depth_label, Depth_num, .keep_all = TRUE) dense_all <- predict_swrc_dense( fit, newdata = test_unique, n_points = 1000 # 1 000 pF steps from -2 to 7.6 ) # The result is a long tibble with columns: # PEDON_ID | Depth_label | Depth_num | … covariates … | pF | theta_pred dplyr::glimpse(dense_all) # ── 2. Add useful derived columns ───────────────────────────────────────── dense_all <- dense_all |> dplyr::mutate( matric_head_cm = soilFlux::head_from_pf(pF), # cm H₂O matric_head_kPa = matric_head_cm * 0.098067 # kPa ) # ── 3a. Export to CSV ───────────────────────────────────────────────────── write.csv(dense_all, "swrc_continuous_predictions.csv", row.names = FALSE) # ── 3b. Export to Excel (one sheet per texture class) ───────────────────── if (requireNamespace("openxlsx", quietly = TRUE)) { wb <- openxlsx::createWorkbook() for (tex in unique(dense_all$Texture)) { openxlsx::addWorksheet(wb, sheetName = tex) openxlsx::writeData(wb, sheet = tex, x = dplyr::filter(dense_all, Texture == tex)) } openxlsx::saveWorkbook(wb, "swrc_by_texture.xlsx", overwrite = TRUE) } # ── 3c. Wide format — one column per pF breakpoint ──────────────────────── # Useful for pedotransfer-function tables or GIS attribute tables pf_breaks <- c(0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.2, 5.0, 6.0, 7.0) wide_df <- predict_swrc(fit, newdata = test_unique |> tidyr::uncount(length(pf_breaks)) |> dplyr::mutate( pF = rep(pf_breaks, nrow(test_unique)), matric_head = soilFlux::head_from_pf(pF) )) |> dplyr::bind_cols( test_unique |> tidyr::uncount(length(pf_breaks)) |> dplyr::select(PEDON_ID, Depth_label, pF = dplyr::matches("pF")) ) # Or more simply, pivot the dense tibble to wide at rounded pF values: wide_df <- dense_all |> dplyr::mutate(pF_round = round(pF, 1)) |> dplyr::group_by(PEDON_ID, Depth_label, pF_round) |> dplyr::summarise(theta_pred = mean(theta_pred), .groups = "drop") |> tidyr::pivot_wider(names_from = pF_round, names_prefix = "theta_pF", values_from = theta_pred) write.csv(wide_df, "swrc_wide.csv", row.names = FALSE) ``` > **Tip — resolution vs. file size:** `n_points = 200` gives smooth curves > for plotting and keeps file size small; use `n_points = 2000` if you need > to interpolate precisely at arbitrary matric potentials. The table summary for the dense output: | Column | Type | Description | |---|---|---| | `PEDON_ID` | chr | Pedon identifier | | `Depth_label` | chr | e.g. `"15-30 cm"` | | `Depth_num` | dbl | Numeric depth midpoint | | `clay`, `silt`, … | dbl | Soil covariates passed to the model | | `pF` | dbl | Log₁₀(|h| cm) — from −2 (wet) to 7.6 (dry) | | `theta_pred` | dbl | Predicted θ (m³ m⁻³) | --- # Save and reload the model ```{r save, eval=FALSE} save_swrc_model(fit, dir = "./models", name = "nc_kssl_v1") # In a future session fit2 <- load_swrc_model("./models", "nc_kssl_v1") pred2 <- predict_swrc(fit2, test_df) ``` --- # Summary | Step | Tool | |------|------| | Download KSSL pedons | `soilDB::fetchKSSL(state = "NC")` | | Inspect horizon table | `aqp::horizons()` + `plotSPC()` | | **Mass-preserving spline harmonisation** | `mpspline2::mpspline()` | | Pivot retention to long | `tidyr::pivot_longer()` | | Prepare for model | `soilFlux::prepare_swrc_data()` | | Add texture class | `soilFlux::add_texture()` | | Fit SWRC | `soilFlux::fit_swrc()` | | Dense prediction | `soilFlux::predict_swrc_dense()` | | Evaluate | `soilFlux::evaluate_swrc()` | | Visualise | `soilFlux::plot_swrc()` / `plot_pred_obs()` | | **Export curves (CSV / Excel / wide)** | `write.csv()` / `openxlsx::saveWorkbook()` / `tidyr::pivot_wider()` | | Save / reload | `soilFlux::save_swrc_model()` / `load_swrc_model()` | --- # References Bishop, T. F. A., McBratney, A. B., & Laslett, G. M. (1999). Modelling soil attribute depth functions with equal-area quadratic smoothing splines. *Geoderma*, 91(1–2), 27–45. Norouzi, A. M., Feyereisen, G. W., Papanicolaou, A. N., & Wilson, C. G. (2025). Physics-Informed Neural Networks for Estimating a Continuous Form of the Soil Water Retention Curve. *Journal of Hydrology*. Arrouays, D., et al. (2014). GlobalSoilMap: Toward a fine-resolution global grid of soil properties. *Advances in Agronomy*, 125, 93–134. Soil Survey Staff, NRCS, USDA. *KSSL Characterization Database*. Accessed via `soilDB::fetchKSSL()`.