A deterministic key gives one answer. But the inputs to that key carry error – analytical error on a lab value, estimation error on a VLM-extracted colour, regional-average error on a SoilGrids prior. classify_with_uncertainty() propagates that error into a probability distribution over classes, and – crucially – it scales the error to how each value was actually obtained.

The idea

classification_robustness() (since v0.9.42) perturbs every horizon cell by the same fixed fraction and reports one number: the share of Monte-Carlo runs that kept the baseline class. That treats a lab-measured clay value and a photo-guessed one as equally trustworthy.

classify_with_uncertainty() does two things differently:

It returns the whole posterior – P(class = x) for every class the profile reached – not just a single robustness percentage.
It weights the perturbation by the provenance ledger. Each (horizon, attribute) cell is perturbed by an amount set by its evidence grade:

Grade Source Multiplicative half-width

A measured 3 %

B spectra-predicted 7 %

C prior-inferred 10 %

D VLM-extracted 17 %

E user-assumed 30 %

(pH and Munsell columns get additive half-widths instead; see get_perturbation_scale().)

Grade	Source	Multiplicative half-width
A	measured	3 %
B	spectra-predicted	7 %
C	prior-inferred	10 %
D	VLM-extracted	17 %
E	user-assumed	30 %

So a classification that rests on assumptions is reported as genuinely uncertain, while one resting on measurements is reported as firm – even when both sit the same distance from a key boundary.

A first run

library(soilKey)

p <- make_ferralsol_canonical()
u <- classify_with_uncertainty(p, n = 200, system = "wrb2022")
u
#> <soilkey_uncertainty>  system=wrb2022  level=rsg
#>   baseline    : Ferralsols
#>   MC runs     : 200 (200 successful)
#>   entropy     : 0.000
#>   posterior   :
#>     Ferralsols                       100.0%
#>   most decisive attribute: ...

The fields of the returned soilkey_uncertainty object:

u$posterior     # named numeric vector, P(class), sums to 1
u$top1          # the modal class
u$entropy       # Shannon entropy -- 0 for a certain result
u$sensitivity   # data.table(attribute, importance)

Provenance changes the answer

The same profile, classified twice – once with its clay values measured, once with them merely assumed:

# Mark every clay value as a bare assumption (grade E).
p_assumed <- make_ferralsol_canonical()
for (i in seq_len(nrow(p_assumed$horizons))) {
  p_assumed$add_measurement(i, "clay_pct",
                            p_assumed$horizons$clay_pct[i],
                            source = "user_assumed", confidence = 0.2,
                            overwrite = TRUE)
}

classify_with_uncertainty(p,         n = 200)$entropy   # low  -- measured
classify_with_uncertainty(p_assumed, n = 200)$entropy   # high -- assumed

Nothing about the soil changed; only what we know about it did. That is exactly what a pedometric uncertainty estimate should capture.

The sensitivity ranking

classify_with_uncertainty() also runs a leave-one-attribute-out pass: each attribute is held fixed in turn, and the drop in flip rate measures how much that attribute drives the instability.

u$sensitivity
#>      attribute importance
#> 1     clay_pct      0.18
#> 2     cec_cmol      0.06
#> ...

The top row is the measurement that would most sharpen the classification – a direct, defensible answer to “what should I measure next?”.

Backward compatibility

classification_robustness() is unchanged by default. It gains one new argument, provenance_aware, off by default:

# Identical to every previous release:
classification_robustness(p, system = "wrb2022", n = 100)

# Opt in to grade-scaled perturbation:
classification_robustness(p, system = "wrb2022", n = 100,
                          provenance_aware = TRUE)

In the Shiny app

The Uncertainty tab of the Pro Shiny app (run_classify_app()) wraps all of this: pick a system, set the number of Monte-Carlo runs, and read off the posterior bar chart, the entropy, and the attribute-sensitivity table.