| Title: | Physics-Informed Neural Networks for Soil Water Retention Curves |
|---|---|
| Description: | Implements a physics-informed one-dimensional convolutional neural network (CNN1D-PINN) for estimating the complete soil water retention curve (SWRC) as a continuous function of matric potential, from soil texture, organic carbon, bulk density, and depth. The network architecture ensures strict monotonic decrease of volumetric water content with increasing suction by construction, through cumulative integration of non-negative slope outputs (monotone integral architecture). Four physics-based residual constraints adapted from Norouzi et al. (2025) <doi:10.1029/2024WR038149> are embedded in the loss function: (S1) linearity at the dry end (pF in [5, 7.6]); (S2) non-negativity at pF = 6.2; (S3) non-positivity at pF = 7.6; and (S4) a near-zero derivative in the saturated plateau region (pF in [-2, -0.3]). Includes tools for data preparation, model training, dense prediction, performance metrics, texture classification, and publication-quality visualisation. |
| Authors: | Hugo Rodrigues [aut, cre] (ORCID: <https://orcid.org/0000-0002-8070-8126>) |
| Maintainer: | Hugo Rodrigues <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.1.5 |
| Built: | 2026-05-26 08:55:51 UTC |
| Source: | https://github.com/hugomachadorodrigues/soilflux |
Add texture classification column to a data frame
add_texture( df, sand_col = "sand_total", silt_col = "silt", clay_col = "clay", out_col = "Texture" )add_texture( df, sand_col = "sand_total", silt_col = "silt", clay_col = "clay", out_col = "Texture" )
df |
A data frame. |
sand_col |
Column name for sand (default |
silt_col |
Column name for silt (default |
clay_col |
Column name for clay (default |
out_col |
Name of the output column (default |
The input data frame with an additional out_col column.
df <- data.frame(sand_total = c(70, 20), silt = c(15, 50), clay = c(15, 30)) add_texture(df)df <- data.frame(sand_total = c(70, 20), silt = c(15, 50), clay = c(15, 30)) add_texture(df)
Scales df[, scaler$cols] using the precomputed min/range.
Returns a numeric matrix with the same column order as scaler$cols.
apply_minmax(df, scaler)apply_minmax(df, scaler)
df |
A data frame or tibble. |
scaler |
A scaler object returned by |
A numeric matrix scaled to approximately [0, 1] per column.
Columns correspond to scaler$cols.
df_train <- data.frame(sand = c(20, 40, 60), clay = c(30, 20, 10)) sc <- fit_minmax(df_train, c("sand", "clay")) df_new <- data.frame(sand = c(50, 25), clay = c(15, 28)) apply_minmax(df_new, sc)df_train <- data.frame(sand = c(20, 40, 60), clay = c(30, 20, 10)) sc <- fit_minmax(df_train, c("sand", "clay")) df_new <- data.frame(sand = c(50, 25), clay = c(15, 28)) apply_minmax(df_new, sc)
Generates four sets of collocation points (random soil-property vectors and associated pF values) used to evaluate the physics constraints during training.
build_residual_sets( df_raw, x_inputs, S1 = 1500L, S2 = 500L, S3 = 500L, S4 = 1500L, pF_lin_min = 5, pF_lin_max = 7.6, pF0_pos = 6.2, pF1_neg = 7.6, pF_sat_min = -2, pF_sat_max = -0.3, seed = 123L )build_residual_sets( df_raw, x_inputs, S1 = 1500L, S2 = 500L, S3 = 500L, S4 = 1500L, pF_lin_min = 5, pF_lin_max = 7.6, pF0_pos = 6.2, pF1_neg = 7.6, pF_sat_min = -2, pF_sat_max = -0.3, seed = 123L )
df_raw |
Data frame with covariate columns (training split). |
x_inputs |
Character vector of covariate column names. |
S1 |
Number of S1 points — dry-end linearity (default 1500). |
S2 |
Number of S2 points — non-negativity at pF0 (default 500). |
S3 |
Number of S3 points — non-positivity at pF1 (default 500). |
S4 |
Number of S4 points — saturated plateau (default 1500). |
pF_lin_min |
Lower pF for the S1 linearity constraint (default 5.0). |
pF_lin_max |
Upper pF for the S1 linearity constraint (default 7.6). |
pF0_pos |
pF at which theta must be >= 0 — S2 (default 6.2). |
pF1_neg |
pF at which theta must be <= 0 — S3 (default 7.6). |
pF_sat_min |
Lower pF for the S4 plateau constraint (default -2.0). |
pF_sat_max |
Upper pF for the S4 plateau constraint (default -0.3). |
seed |
Integer random seed (default 123). |
A named list with four data frames: set1, set2, set3,
set4. Each data frame has one row per collocation point, with columns
corresponding to x_inputs (sampled uniformly within training range)
and a pF column.
## Not run: df <- data.frame( clay = c(20, 30, 10), silt = c(30, 40, 20), sand_total = c(50, 30, 70), Depth_num = c(15, 30, 60) ) sets <- build_residual_sets(df, c("clay", "silt", "sand_total", "Depth_num"), S1 = 50L, S2 = 20L, S3 = 20L, S4 = 50L) ## End(Not run)## Not run: df <- data.frame( clay = c(20, 30, 10), silt = c(30, 40, 20), sand_total = c(50, 30, 70), Depth_num = c(15, 30, 60) ) sets <- build_residual_sets(df, c("clay", "silt", "sand_total", "Depth_num"), S1 = 50L, S2 = 20L, S3 = 20L, S4 = 50L) ## End(Not run)
Constructs a Keras model implementing the monotone-integral architecture of Norouzi et al. (2025). The returned list contains two models that share weights:
theta_model — full prediction model (pF query + covariates → theta).
param_model — extracts the saturated water content (theta_s) from
covariates only.
build_swrc_model(n_covariates, hidden = c(128L, 64L), dropout = 0.1, K = 64L)build_swrc_model(n_covariates, hidden = c(128L, 64L), dropout = 0.1, K = 64L)
n_covariates |
Integer. Number of soil-property covariates ( |
|
Integer vector of length 2. Number of filters in the
first and second Conv1D layers (default |
|
dropout |
Numeric dropout rate after each Conv1D layer
(default |
K |
Integer. Number of knot points for the cumulative
integration grid (default |
A named list:
theta_modelKeras model: inputs [Xseq_knots, pf_norm],
output shape [N, 2] (theta_hat, theta_s).
param_modelKeras model: input Xseq_knots,
output shape [N, 1] (theta_s only).
KThe K value used.
dkThe knot spacing 1 / (K - 1).
knot_gridNumeric vector of knot positions.
## Not run: mod <- build_swrc_model(n_covariates = 9L) ## End(Not run)## Not run: mod <- build_swrc_model(n_covariates = 9L) ## End(Not run)
Returns the USDA texture class name for each row based on the sand, silt, and clay fractions. Inputs are expected in per-cent (0–100) and must sum to approximately 100.
classify_texture(sand, silt, clay, tol = 1)classify_texture(sand, silt, clay, tol = 1)
sand |
Numeric vector: sand content (%). |
silt |
Numeric vector: silt content (%). |
clay |
Numeric vector: clay content (%). |
tol |
Tolerance for the 100 % sum check (default |
Character vector of USDA texture class names. Returns NA
for rows where values are missing or do not sum to approximately 100.
classify_texture(sand = c(70, 20, 10, 40), silt = c(15, 50, 30, 40), clay = c(15, 30, 60, 20))classify_texture(sand = c(70, 20, 10, 40), silt = c(15, 50, 30, 40), clay = c(15, 30, 60, 20))
Functions to prepare soil data for input to the CNN1D monotone-integral model, including 3-D sequence array construction and observation matrix creation.
A convenience wrapper that calls predict_swrc() and swrc_metrics().
evaluate_swrc(object, newdata, obs_col = "theta_n")evaluate_swrc(object, newdata, obs_col = "theta_n")
object |
A |
newdata |
Data frame with covariate columns and |
obs_col |
Name of the observed theta column in |
A tibble with columns R2, RMSE, MAE, n.
Computes per-column minimum and range from df[, cols].
Constant columns (range == 0) are assigned a range of 1 to avoid
division by zero.
fit_minmax(df, cols)fit_minmax(df, cols)
df |
A data frame or tibble. |
cols |
Character vector of column names to include. |
A list with elements:
minNamed numeric vector of per-column minima.
rngNamed numeric vector of per-column ranges.
colsThe character vector cols (stored for later use).
df <- data.frame(sand = c(20, 40, 60), clay = c(30, 20, 10)) sc <- fit_minmax(df, c("sand", "clay")) scdf <- data.frame(sand = c(20, 40, 60), clay = c(30, 20, 10)) sc <- fit_minmax(df, c("sand", "clay")) sc
The main user-facing function for training. Given prepared training and (optionally) validation data, it builds the model, creates physics residual sets, runs the training loop with early stopping, and returns a fitted object for prediction and evaluation.
fit_swrc( train_df, x_inputs, val_df = NULL, hidden = c(128L, 64L), dropout = 0.1, lr = 0.001, epochs = 80L, batch_size = 256L, patience = 5L, K = 64L, lambdas = norouzi_lambdas("norouzi"), S1 = 1500L, S2 = 500L, S3 = 500L, S4 = 1500L, pF_lin_min = 5, pF_lin_max = 7.6, pF0_pos = 6.2, pF1_neg = 7.6, pF_sat_min = -2, pF_sat_max = -0.3, wet_split_cm = 4.2, w_wet = 1, w_dry = 1, pf_left = -2, pf_right = 7.6, seed = 123L, verbose = TRUE )fit_swrc( train_df, x_inputs, val_df = NULL, hidden = c(128L, 64L), dropout = 0.1, lr = 0.001, epochs = 80L, batch_size = 256L, patience = 5L, K = 64L, lambdas = norouzi_lambdas("norouzi"), S1 = 1500L, S2 = 500L, S3 = 500L, S4 = 1500L, pF_lin_min = 5, pF_lin_max = 7.6, pF0_pos = 6.2, pF1_neg = 7.6, pF_sat_min = -2, pF_sat_max = -0.3, wet_split_cm = 4.2, w_wet = 1, w_dry = 1, pf_left = -2, pf_right = 7.6, seed = 123L, verbose = TRUE )
train_df |
Data frame for training (output of
|
x_inputs |
Character vector of covariate column names. |
val_df |
Optional validation data frame (same structure as
|
|
Integer vector of length 2: Conv1D filter counts
(default |
|
dropout |
Dropout rate (default |
lr |
Learning rate for the Adam optimizer (default |
epochs |
Maximum number of epochs (default |
batch_size |
Mini-batch size (default |
patience |
Early-stopping patience in multiples of 5 epochs
(default |
K |
Number of knot points (default |
lambdas |
Named list of loss weights; use |
S1, S2, S3, S4
|
Residual set sizes (defaults: 1500, 500, 500, 1500). |
pF_lin_min |
Lower pF for S1 linearity constraint (default |
pF_lin_max |
Upper pF for S1 linearity constraint (default |
pF0_pos |
pF threshold for S2 (default |
pF1_neg |
pF threshold for S3 (default |
pF_sat_min |
Lower pF for S4 (default |
pF_sat_max |
Upper pF for S4 (default |
wet_split_cm |
Matric head (cm) separating wet/dry end (default |
w_wet |
Sample weight for wet observations (default |
w_dry |
Sample weight for dry observations (default |
pf_left |
Left pF domain boundary (default |
pf_right |
Right pF domain boundary (default |
seed |
Random seed (default |
verbose |
Logical; print progress (default |
An S3 object of class swrc_fit, a named list containing:
theta_modelThe fitted Keras model.
param_modelThe theta_s extractor model.
x_inputsCovariate names used.
scalerFitted min-max scaler.
KNumber of knot points.
dkKnot spacing.
knot_gridKnot positions in [0, 1].
pf_left,pf_right
pF domain boundaries.
theta_factorUnit multiplier for theta.
best_epochEpoch at which validation loss was minimised.
lambdasLoss weights used during training.
historyData frame of per-epoch training/validation losses.
## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) } ## End(Not run)## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) } ## End(Not run)
If the median raw value is > 10 it is assumed to be in kg/m3 and is divided by 100 to convert to g/cm3.
fix_bd_units(bd_raw)fix_bd_units(bd_raw)
bd_raw |
Numeric vector of raw bulk-density values. |
Numeric vector in g/cm3.
fix_bd_units(c(1.2, 1.45, 1.3)) # already g/cm3 fix_bd_units(c(120, 145, 130)) # kg/m3 -> g/cm3fix_bd_units(c(1.2, 1.45, 1.3)) # already g/cm3 fix_bd_units(c(120, 145, 130)) # kg/m3 -> g/cm3
Convert pF to matric head (cm)
head_from_pf(pf)head_from_pf(pf)
pf |
Numeric vector of pF values. |
Numeric vector of matric head values in cm.
head_from_pf(c(0, 1, 2, 3, 4.2))head_from_pf(c(0, 1, 2, 3, 4.2))
Convenience wrapper: converts head to pF then normalises.
head_normalize(h_cm, pf_left = -2, pf_right = 7.6)head_normalize(h_cm, pf_left = -2, pf_right = 7.6)
h_cm |
Numeric vector of matric heads in cm. |
pf_left |
Left boundary (default |
pf_right |
Right boundary (default |
Numeric vector in [0, 1].
head_normalize(c(1, 10, 100, 15850))head_normalize(c(1, 10, 100, 15850))
Converts scaled values back to original units.
invert_minmax(X_scaled, scaler)invert_minmax(X_scaled, scaler)
X_scaled |
Numeric matrix (or vector) of scaled values. |
scaler |
A scaler object returned by |
Numeric matrix in the original (unscaled) units.
df <- data.frame(sand = c(20, 40, 60), clay = c(30, 20, 10)) sc <- fit_minmax(df, c("sand", "clay")) Xs <- apply_minmax(df, sc) invert_minmax(Xs, sc)df <- data.frame(sand = c(20, 40, 60), clay = c(30, 20, 10)) sc <- fit_minmax(df, c("sand", "clay")) Xs <- apply_minmax(df, sc) invert_minmax(Xs, sc)
Functions to persist a fitted swrc_fit object to disk
and reload it for later use (prediction, spatial mapping, etc.).
Reconstructs the CNN1D Keras model from the saved weights and metadata
and returns a swrc_fit-compatible list that can be passed to
predict_swrc(), predict_swrc_dense(), etc.
load_swrc_model(dir = "./models/swrc", name = "swrc_model")load_swrc_model(dir = "./models/swrc", name = "swrc_model")
dir |
Directory containing the saved files (default
|
name |
Stem name used when saving (default |
A swrc_fit object (without history or param_model).
## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) save_swrc_model(fit, dir = tempdir(), name = "model_test") fit2 <- load_swrc_model(tempdir(), "model_test") } ## End(Not run)## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) save_swrc_model(fit, dir = tempdir(), name = "model_test") fit2 <- load_swrc_model(tempdir(), "model_test") } ## End(Not run)
Functions to compute R², RMSE, and MAE for soil water content predictions.
Build the physics-informed 1-D CNN with a monotone integral output layer, as described in Norouzi et al. (2025).
The model takes two inputs:
Xseq_knots: a 3-D tensor of shape [N, K, p+1] — for each
observation, p scaled covariates are broadcast across K knot
positions, and the knot positions themselves form the last channel.
pf_norm: a 2-D tensor of shape [N, 1] — the query pF value
normalised to [0, 1].
The output satisfies:
where is a 1-channel convolutional output. Monotone decrease
is guaranteed by construction because the integrand is always positive.
Norouzi, A. M., et al. (2025). Physics-Informed Neural Networks for Estimating a Continuous Form of the Soil Water Retention Curve. Journal of Hydrology.
Table 1 of Norouzi et al. (2025) defines six loss-weight hyperparameters.
This function returns them as a named list that can be passed to
fit_swrc() and compute_physics_loss().
norouzi_lambdas(config = c("norouzi", "smooth"))norouzi_lambdas(config = c("norouzi", "smooth"))
config |
Character string; either |
A named list:
lambda_wetWeight for wet-end data loss (lambda1 = 1).
lambda_dryWeight for dry-end data loss (lambda2 = 10).
lambda3S1 dry-end linearity (lambda3 = 1 or 10).
lambda4S2 non-negativity at pF0 (lambda4 = 1000).
lambda5S3 non-positivity at pF1 (lambda5 = 1000).
lambda6S4 saturated-plateau flatness (lambda6 = 1).
norouzi_lambdas() norouzi_lambdas("smooth")norouzi_lambdas() norouzi_lambdas("smooth")
Accepts strings of the form "0-5", "5-15", "100", etc. and returns
the numeric midpoint and a human-readable label (e.g. "0-5 cm").
parse_depth(s)parse_depth(s)
s |
A character string describing a depth interval or single depth. |
A named list with elements:
midNumeric midpoint in cm.
labelCharacter label, e.g. "0-5 cm".
parse_depth("0-5") parse_depth("100-200") parse_depth("30")parse_depth("0-5") parse_depth("100-200") parse_depth("30")
Applies parse_depth() row-wise and appends Depth_num and
Depth_label columns.
parse_depth_column(df, depth_col = "depth")parse_depth_column(df, depth_col = "depth")
df |
A data frame. |
depth_col |
Name of the depth column (character string). |
The input data frame with two extra columns:
Depth_num (numeric midpoint) and Depth_label (factor, ordered by
depth).
df <- data.frame(depth = c("0-5", "5-15", "15-30"), x = 1:3) parse_depth_column(df, "depth")df <- data.frame(depth = c("0-5", "5-15", "15-30"), x = 1:3) parse_depth_column(df, "depth")
Convert matric head (cm) to pF
pf_from_head(h_cm)pf_from_head(h_cm)
h_cm |
Numeric vector of matric head values in cm (positive). |
Numeric vector of pF values ().
pf_from_head(c(1, 10, 100, 1000, 15850))pf_from_head(c(1, 10, 100, 1000, 15850))
Maps the pF domain [pf_left, pf_right] linearly to [0, 1].
Values outside the domain are clipped.
pf_normalize(pf, pf_left = -2, pf_right = 7.6)pf_normalize(pf, pf_left = -2, pf_right = 7.6)
pf |
Numeric vector of pF values. |
pf_left |
Left boundary of the pF domain (default |
pf_right |
Right boundary of the pF domain (default |
Numeric vector in [0, 1].
pf_normalize(c(-2, 0, 4, 7.6))pf_normalize(c(-2, 0, 4, 7.6))
Functions for defining, building, and evaluating the four physics-based residual constraints of Norouzi et al. (2025).
Norouzi, A. M., et al. (2025). Physics-Informed Neural Networks for Estimating a Continuous Form of the Soil Water Retention Curve. Journal of Hydrology.
Creates a scatter plot of predicted vs. observed theta with a 1:1 line and optional regression line, optionally faceted by a grouping variable.
plot_pred_obs( df, obs_col = "theta_n", pred_col = "theta_predicted", group_col = NULL, show_lm = TRUE, show_stats = TRUE, ncol = 5L, base_size = 12, point_alpha = 0.25, title = NULL )plot_pred_obs( df, obs_col = "theta_n", pred_col = "theta_predicted", group_col = NULL, show_lm = TRUE, show_stats = TRUE, ncol = 5L, base_size = 12, point_alpha = 0.25, title = NULL )
df |
Data frame containing observed and predicted columns. |
obs_col |
Column name for observed theta (default |
pred_col |
Column name for predicted theta (default
|
group_col |
Column name for facet grouping (default |
show_lm |
Logical; add a linear regression line (default |
show_stats |
Logical; add R², RMSE, MAE text annotations (default
|
ncol |
Number of facet columns when |
base_size |
Base font size (default |
point_alpha |
Point transparency (default |
title |
Plot title (default |
A ggplot object.
## Not run: df_plot <- data.frame( theta_n = c(0.30, 0.25, 0.20, 0.15, 0.10), theta_predicted = c(0.28, 0.26, 0.22, 0.14, 0.11) ) plot_pred_obs(df_plot) ## End(Not run)## Not run: df_plot <- data.frame( theta_n = c(0.30, 0.25, 0.20, 0.15, 0.10), theta_predicted = c(0.28, 0.26, 0.22, 0.14, 0.11) ) plot_pred_obs(df_plot) ## End(Not run)
Creates a ggplot2 figure showing continuous SWRC predictions
(lines) optionally overlaid with observed data points.
plot_swrc( pred_curves, obs_points = NULL, curve_col = "theta", obs_col = "theta_n", group_col = "PEDON_ID", facet_row = NULL, facet_col = NULL, x_limits = NULL, y_limits = c(-0.25, 7.75), line_colour = "steelblue4", point_colour = "black", base_size = 12, title = NULL )plot_swrc( pred_curves, obs_points = NULL, curve_col = "theta", obs_col = "theta_n", group_col = "PEDON_ID", facet_row = NULL, facet_col = NULL, x_limits = NULL, y_limits = c(-0.25, 7.75), line_colour = "steelblue4", point_colour = "black", base_size = 12, title = NULL )
pred_curves |
A data frame (or tibble) of dense curve predictions,
typically returned by |
obs_points |
Optional data frame of observed data. Must contain
|
curve_col |
Column in |
obs_col |
Column in |
group_col |
Column name used to distinguish individual profiles
in |
facet_row |
Column for facet rows (default |
facet_col |
Column for facet columns (default |
x_limits |
Numeric vector of length 2 for the x-axis (theta)
range (default |
y_limits |
Numeric vector of length 2 for the y-axis (pF)
range (default |
line_colour |
Colour of the predicted curve lines (default
|
point_colour |
Colour of the observed data points (default
|
base_size |
Base font size for |
title |
Plot title (default |
A ggplot object.
## Not run: pred_curves <- data.frame( PEDON_ID = rep(c("P1", "P2"), each = 5), pF = rep(c(0, 1, 2, 3, 4), 2), theta = c(0.42, 0.36, 0.28, 0.18, 0.08, 0.38, 0.32, 0.25, 0.16, 0.07) ) plot_swrc(pred_curves, group_col = "PEDON_ID") ## End(Not run)## Not run: pred_curves <- data.frame( PEDON_ID = rep(c("P1", "P2"), each = 5), pF = rep(c(0, 1, 2, 3, 4), 2), theta = c(0.42, 0.36, 0.28, 0.18, 0.08, 0.38, 0.32, 0.25, 0.16, 0.07) ) plot_swrc(pred_curves, group_col = "PEDON_ID") ## End(Not run)
Creates a bar chart comparing R², RMSE, and MAE across multiple models or configurations.
plot_swrc_metrics( metrics_df, model_col = "model", palette = "Blues", base_size = 12, title = NULL )plot_swrc_metrics( metrics_df, model_col = "model", palette = "Blues", base_size = 12, title = NULL )
metrics_df |
A data frame with columns |
model_col |
Column name for the model identifier (default
|
palette |
RColorBrewer palette (default |
base_size |
Base font size (default |
title |
Plot title (default |
A ggplot object.
## Not run: m1 <- swrc_metrics(c(0.30, 0.25, 0.20), c(0.28, 0.26, 0.22)) |> dplyr::mutate(model = "Model 1") m2 <- swrc_metrics(c(0.30, 0.25, 0.20), c(0.29, 0.24, 0.21)) |> dplyr::mutate(model = "Model 2") plot_swrc_metrics(dplyr::bind_rows(m1, m2)) ## End(Not run)## Not run: m1 <- swrc_metrics(c(0.30, 0.25, 0.20), c(0.28, 0.26, 0.22)) |> dplyr::mutate(model = "Model 1") m2 <- swrc_metrics(c(0.30, 0.25, 0.20), c(0.29, 0.24, 0.21)) |> dplyr::mutate(model = "Model 2") plot_swrc_metrics(dplyr::bind_rows(m1, m2)) ## End(Not run)
Plots the per-epoch training and (optionally) validation loss from the
history slot of a swrc_fit object. A log(1 + x) y-axis is used so
that both the (larger) composite training loss and the (small) validation
MSE remain clearly visible on the same panel.
plot_training_history( fit, loss_col = "loss", val_col = "val_mse", log_scale = TRUE, base_size = 12 )plot_training_history( fit, loss_col = "loss", val_col = "val_mse", log_scale = TRUE, base_size = 12 )
fit |
A |
loss_col |
Column in |
val_col |
Validation loss column (default |
log_scale |
Logical; apply |
base_size |
Base font size (default |
A ggplot object.
Functions for visualising soil water retention curves, predicted vs. observed scatter plots, and model performance metrics.
Functions for generating soil water content predictions
from a fitted swrc_fit object, both at specific pF points and as
dense continuous curves.
See predict_swrc() and predict_swrc_dense() for return value
details.
Given a fitted swrc_fit object and a new data frame of soil properties,
returns predicted volumetric water content at each supplied pF (or matric
head) value.
predict_swrc(object, newdata, pf = NULL, heads = NULL, ...)predict_swrc(object, newdata, pf = NULL, heads = NULL, ...)
object |
A |
newdata |
A data frame with the same covariate columns used during
training (i.e. |
pf |
Optional numeric vector of pF values (overrides
|
heads |
Optional numeric vector of matric heads in cm (overrides
|
... |
Ignored. |
A numeric vector of predicted theta values (m3/m3), one per row
in newdata.
## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) pred <- predict_swrc(fit, newdata = df) } ## End(Not run)## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) pred <- predict_swrc(fit, newdata = df) } ## End(Not run)
For each unique (PEDON_ID × depth) profile in newdata, predicts theta
across a dense grid of pF values and returns a tidy long-format tibble.
predict_swrc_dense( object, newdata, n_points = 1000L, pf_range = NULL, id_cols = c("PEDON_ID", "Depth_num", "Depth_label", "Texture") )predict_swrc_dense( object, newdata, n_points = 1000L, pf_range = NULL, id_cols = c("PEDON_ID", "Depth_num", "Depth_label", "Texture") )
object |
A |
newdata |
A data frame with covariate columns plus (optionally)
|
n_points |
Number of equally spaced pF points (default |
pf_range |
Numeric vector of length 2: min and max pF values for
the output grid (default |
id_cols |
Character vector of columns used to identify profiles
(default |
A tibble with columns: all id_cols present in newdata,
pF, matric_head, and theta (predicted volumetric water content
in m3/m3).
## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) dense <- predict_swrc_dense(fit, newdata = df, n_points = 50) } ## End(Not run)## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) dense <- predict_swrc_dense(fit, newdata = df, n_points = 50) } ## End(Not run)
Uses the param_model (which maps covariate inputs to theta_s) to
extract the modelled saturated water content for each row of newdata.
predict_theta_s(object, newdata)predict_theta_s(object, newdata)
object |
A |
newdata |
Data frame with covariate columns. |
Numeric vector of theta_s values (m3/m3).
Dispatches to predict_swrc().
## S3 method for class 'swrc_fit' predict(object, newdata, pf = NULL, heads = NULL, ...)## S3 method for class 'swrc_fit' predict(object, newdata, pf = NULL, heads = NULL, ...)
object |
A |
newdata |
A data frame with the same covariate columns used during
training (i.e. |
pf |
Optional numeric vector of pF values (overrides
|
heads |
Optional numeric vector of matric heads in cm (overrides
|
... |
Ignored. |
A numeric vector of predicted volumetric water content values
(m3/m3), one per row in newdata.
A convenience wrapper that:
Renames columns to standard names.
Parses the depth column.
Fixes bulk-density units.
Detects and normalises volumetric water content units.
Computes per-profile maximum theta.
Drops rows with missing key variables.
prepare_swrc_data( df, x_cols = NULL, depth_col = "depth", fix_bd = TRUE, fix_theta = TRUE )prepare_swrc_data( df, x_cols = NULL, depth_col = "depth", fix_bd = TRUE, fix_theta = TRUE )
df |
A data frame with soil characterization data. |
x_cols |
Named character vector mapping standard names to
actual column names. Standard names: |
depth_col |
Column name for depth (character string, default
|
fix_bd |
Logical; apply |
fix_theta |
Logical; scale theta to m3/m3 if needed (default
|
A tibble with standardised columns plus Depth_num,
Depth_label, bd_gcm3, theta_n (normalised WC), and
theta_max_n (per-profile maximum theta).
## Not run: df <- data.frame( ID = rep("P01", 3), sand_pct = c(50, 50, 50), silt = c(30, 30, 30), clay = c(20, 20, 20), soc = c(1.2, 1.2, 1.2), bd_gcc = c(1.3, 1.3, 1.3), matric_head = c(10, 100, 1000), theta = c(0.40, 0.30, 0.20), depth = c("0-30", "0-30", "0-30") ) df_prep <- prepare_swrc_data(df, x_cols = c(PEDON_ID = "ID", sand = "sand_pct", bd = "bd_gcc", water_content = "theta")) ## End(Not run)## Not run: df <- data.frame( ID = rep("P01", 3), sand_pct = c(50, 50, 50), silt = c(30, 30, 30), clay = c(20, 20, 20), soc = c(1.2, 1.2, 1.2), bd_gcc = c(1.3, 1.3, 1.3), matric_head = c(10, 100, 1000), theta = c(0.40, 0.30, 0.20), depth = c("0-30", "0-30", "0-30") ) df_prep <- prepare_swrc_data(df, x_cols = c(PEDON_ID = "ID", sand = "sand_pct", bd = "bd_gcc", water_content = "theta")) ## End(Not run)
Print method for swrc_fit
## S3 method for class 'swrc_fit' print(x, ...)## S3 method for class 'swrc_fit' print(x, ...)
x |
An |
... |
Ignored. |
Invisibly returns x (called for its side effect of printing a
summary of the fitted model to the console).
Saves the Keras model weights as an HDF5 file and the R metadata
(scalers, hyperparameters, etc.) as an .rds file inside dir.
save_swrc_model(fit, dir, name = "swrc_model")save_swrc_model(fit, dir, name = "swrc_model")
fit |
A |
dir |
Directory where the model will be saved. Created if it does not exist. |
name |
Stem name for the output files (default |
Invisibly returns a named list with paths to the two files:
weights_pathPath to the .weights.h5 file.
meta_pathPath to the .rds metadata file.
## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) save_swrc_model(fit, dir = tempdir(), name = "model_test") } ## End(Not run)## Not run: if (reticulate::py_module_available("tensorflow")) { df <- prepare_swrc_data(swrc_example, depth_col = "depth") fit <- fit_swrc(df, x_inputs = c("clay", "silt", "bd_gcm3", "soc", "Depth_num"), epochs = 2L, verbose = FALSE) save_swrc_model(fit, dir = tempdir(), name = "model_test") } ## End(Not run)
Fit and apply a column-wise min-max scaler to a data frame or matrix, mapping each feature to [0, 1].
Summary method for swrc_fit
## S3 method for class 'swrc_fit' summary(object, ...)## S3 method for class 'swrc_fit' summary(object, ...)
object |
An |
... |
Ignored. |
Invisibly returns object (called for its side effect of printing
a detailed summary including covariates, training parameters, and loss
weights to the console).
A synthetic but physically realistic soil characterisation dataset generated to illustrate the functions in soilFlux. The data mimics the structure of the Florida Soil Characterization Database (FSCD) used in Rodrigues et al. / Norouzi et al. (2025), with van Genuchten curves used to produce internally consistent water-content observations.
swrc_exampleswrc_example
A data frame with 4 800 rows and 14 columns:
Character. Unique soil profile identifier
(e.g. "P0001").
Numeric. Total sand content (%).
Numeric. Silt content (%).
Numeric. Clay content (%).
Numeric. Soil organic carbon (%).
Numeric. Bulk density (g/cm3).
Numeric. Very fine sand fraction (%).
Numeric. Fine sand fraction (%).
Numeric. Medium sand fraction (%).
Numeric. Coarse sand fraction (%).
Numeric. Matric head (cm H2O, positive).
Numeric. Volumetric water content (m3/m3).
Approximately 1% of values are NA to simulate missing
measurements.
Character. Depth interval (e.g. "0-5", "5-15",
etc.).
Character. USDA texture class (one of: Sand, Loamy Sand, Sandy Loam, Loam, Silt Loam, Clay Loam, Clay).
The dataset contains 120 unique profiles (PEDON_ID) across five
depth intervals (0–5, 5–15, 15–30, 30–60, 60–100 cm) and eight
matric-head points per depth (pF approximately 0, 1, 1.5, 2, 2.5, 3, 4.2, 7).
Profiles are evenly distributed across seven USDA texture classes.
Water-content observations were generated with the van Genuchten (1980) equation using parameters that vary realistically with texture and depth, then small Gaussian noise was added.
Synthetic dataset generated by
data-raw/create_example_data.R.
van Genuchten, M. T. (1980). A closed-form equation for predicting the hydraulic conductivity of unsaturated soils. Soil Science Society of America Journal, 44(5), 892–898.
data("swrc_example") head(swrc_example) table(swrc_example$Texture[!duplicated(swrc_example$PEDON_ID)]) # Prepare for modelling df <- prepare_swrc_data(swrc_example, depth_col = "depth")data("swrc_example") head(swrc_example) table(swrc_example$Texture[!duplicated(swrc_example$PEDON_ID)]) # Prepare for modelling df <- prepare_swrc_data(swrc_example, depth_col = "depth")
Returns R², RMSE, and MAE between observed and predicted volumetric water content (or any continuous response).
swrc_metrics(observed, predicted, na.rm = TRUE)swrc_metrics(observed, predicted, na.rm = TRUE)
observed |
Numeric vector of observed values. |
predicted |
Numeric vector of predicted values (same length). |
na.rm |
Logical; remove |
A tibble::tibble() with one row and columns R2, RMSE,
MAE, n (number of non-missing pairs).
obs <- c(0.30, 0.25, 0.20, 0.15, 0.10) pred <- c(0.28, 0.26, 0.22, 0.14, 0.11) swrc_metrics(obs, pred)obs <- c(0.30, 0.25, 0.20, 0.15, 0.10) pred <- c(0.28, 0.26, 0.22, 0.14, 0.11) swrc_metrics(obs, pred)
Applies swrc_metrics() within each level of one or more grouping
variables, returning a tidy data frame.
swrc_metrics_by_group(df, obs_col, pred_col, group_col, na.rm = TRUE)swrc_metrics_by_group(df, obs_col, pred_col, group_col, na.rm = TRUE)
df |
A data frame containing observed and predicted columns. |
obs_col |
Name of the observed-values column (character string). |
pred_col |
Name of the predicted-values column (character string). |
group_col |
Character vector of grouping column names. |
na.rm |
Logical; passed to |
A tibble with one row per group and columns: grouping variables,
R2, RMSE, MAE, n.
df <- data.frame( obs = c(0.30, 0.25, 0.20, 0.15, 0.10, 0.35, 0.28, 0.18), pred = c(0.28, 0.26, 0.22, 0.14, 0.11, 0.33, 0.27, 0.19), texture = c("Clay","Clay","Clay","Clay","Clay","Sand","Sand","Sand") ) swrc_metrics_by_group(df, "obs", "pred", "texture")df <- data.frame( obs = c(0.30, 0.25, 0.20, 0.15, 0.10, 0.35, 0.28, 0.18), pred = c(0.28, 0.26, 0.22, 0.14, 0.11, 0.33, 0.27, 0.19), texture = c("Clay","Clay","Clay","Clay","Clay","Sand","Sand","Sand") ) swrc_metrics_by_group(df, "obs", "pred", "texture")
Check whether a model directory contains a valid saved model
swrc_model_exists(dir = "./models/swrc", name = "swrc_model")swrc_model_exists(dir = "./models/swrc", name = "swrc_model")
dir |
Directory path. |
name |
Model stem name. |
Logical scalar.
Classify soil samples into USDA texture classes from sand, silt, and clay percentages, and create texture triangle plots.
Creates a USDA texture triangle with filled texture-class regions,
USDA class boundary lines, class name labels, and optional data points —
all using a pure ggplot2 ternary-to-Cartesian projection (Clay at top,
Sand at bottom-left, Silt at bottom-right). No external ternary-plot
package is required.
texture_triangle( df, sand_col = "sand_total", silt_col = "silt", clay_col = "clay", color_col = NULL, title = "Soil Texture Triangle", point_size = 3, alpha = 0.85 )texture_triangle( df, sand_col = "sand_total", silt_col = "silt", clay_col = "clay", color_col = NULL, title = "Soil Texture Triangle", point_size = 3, alpha = 0.85 )
df |
A data frame. |
sand_col |
Column name for sand (default |
silt_col |
Column name for silt (default |
clay_col |
Column name for clay (default |
color_col |
Column name for colouring data points (default |
title |
Plot title. |
point_size |
Point size (default |
alpha |
Point transparency (default |
The background is always coloured by USDA texture class (12-class
USDA-inspired palette). Data points are overlaid and can be mapped to a
separate variable via color_col.
A ggplot object.
df <- data.frame(sand_total = c(70, 20, 10, 40), silt = c(15, 50, 30, 40), clay = c(15, 30, 60, 20), Texture = c("Sandy Loam", "Silt Loam", "Clay", "Loam")) texture_triangle(df, color_col = "Texture")df <- data.frame(sand_total = c(70, 20, 10, 40), silt = c(15, 50, 30, 40), clay = c(15, 30, 60, 20), Texture = c("Sandy Loam", "Silt Loam", "Clay", "Loam")) texture_triangle(df, color_col = "Texture")
Returns 100 if the maximum value suggests percentage units (> 1.5), otherwise returns 1 (m3/m3 assumed).
theta_unit_factor(theta_vec)theta_unit_factor(theta_vec)
theta_vec |
Numeric vector of volumetric water content values. |
Numeric scalar: 100 (percentage) or 1 (m3/m3).
theta_unit_factor(c(0.1, 0.35, 0.5)) # returns 1 theta_unit_factor(c(10, 35, 50)) # returns 100theta_unit_factor(c(0.1, 0.35, 0.5)) # returns 1 theta_unit_factor(c(10, 35, 50)) # returns 100
High-level and low-level functions for training the physics-informed CNN1D model, including the eager-mode train step and the main fitting loop with early stopping.
Internal and exported helpers for pF conversion, depth parsing, and unit detection.