A deterministic key gives one
answer. But the inputs to that key carry error – analytical error on a
lab value, estimation error on a VLM-extracted colour, regional-average
error on a SoilGrids prior. classify_with_uncertainty()
propagates that error into a probability distribution over classes, and
– crucially – it scales the error to how each value was actually
obtained.
classification_robustness() (since v0.9.42) perturbs
every horizon cell by the same fixed fraction and reports one number:
the share of Monte-Carlo runs that kept the baseline class. That treats
a lab-measured clay value and a photo-guessed one as equally
trustworthy.
classify_with_uncertainty() does two things
differently:
It returns the whole posterior –
P(class = x) for every class the profile reached – not just
a single robustness percentage.
It weights the perturbation by the provenance
ledger. Each (horizon, attribute) cell is
perturbed by an amount set by its evidence grade:
| Grade | Source | Multiplicative half-width |
|---|---|---|
| A | measured | 3 % |
| B | spectra-predicted | 7 % |
| C | prior-inferred | 10 % |
| D | VLM-extracted | 17 % |
| E | user-assumed | 30 % |
(pH and Munsell columns get additive half-widths instead; see
get_perturbation_scale().)
So a classification that rests on assumptions is reported as genuinely uncertain, while one resting on measurements is reported as firm – even when both sit the same distance from a key boundary.
library(soilKey)
p <- make_ferralsol_canonical()
u <- classify_with_uncertainty(p, n = 200, system = "wrb2022")
u
#> <soilkey_uncertainty> system=wrb2022 level=rsg
#> baseline : Ferralsols
#> MC runs : 200 (200 successful)
#> entropy : 0.000
#> posterior :
#> Ferralsols 100.0%
#> most decisive attribute: ...The fields of the returned soilkey_uncertainty
object:
The same profile, classified twice – once with its clay values measured, once with them merely assumed:
# Mark every clay value as a bare assumption (grade E).
p_assumed <- make_ferralsol_canonical()
for (i in seq_len(nrow(p_assumed$horizons))) {
p_assumed$add_measurement(i, "clay_pct",
p_assumed$horizons$clay_pct[i],
source = "user_assumed", confidence = 0.2,
overwrite = TRUE)
}
classify_with_uncertainty(p, n = 200)$entropy # low -- measured
classify_with_uncertainty(p_assumed, n = 200)$entropy # high -- assumedNothing about the soil changed; only what we know about it did. That is exactly what a pedometric uncertainty estimate should capture.
classify_with_uncertainty() also runs a
leave-one-attribute-out pass: each attribute is held fixed in turn, and
the drop in flip rate measures how much that attribute drives the
instability.
The top row is the measurement that would most sharpen the classification – a direct, defensible answer to “what should I measure next?”.
classification_robustness() is unchanged by default. It
gains one new argument, provenance_aware, off by
default:
The Uncertainty tab of the Pro Shiny app
(run_classify_app()) wraps all of this: pick a system, set
the number of Monte-Carlo runs, and read off the posterior bar chart,
the entropy, and the attribute-sensitivity table.