quivers.diagnostics

Adapter to the ArviZ Bayesian-analysis ecosystem. Converts quivers' inference records into xarray.DataTree objects consumable by every ArviZ 1.x function (plot_trace, plot_forest, plot_ppc, loo, compare, hdi, ...).

diagnostics

Adapter to the ArviZ Bayesian-analysis ecosystem.

Quivers' inference layer produces quivers.inference.MCMCResult records and quivers.inference.guides.base.Guide fits. The canonical 2026 Bayesian-analysis library is ArviZ, whose xarray.DataTree-based data model is the lingua franca for posterior-summary, diagnostic, posterior-predictive, model-comparison, and calibration tooling.

The functions in this subpackage convert quivers fits into ArviZ DataTrees and wrap the canonical ArviZ entry points (arviz.compare, arviz.loo, arviz.hdi, arviz.plot_ppc) with shape-aware, quivers-typed signatures. No posterior-analysis primitives are reimplemented here; ArviZ owns the analytics.

to_datatree

to_datatree(posterior: MCMCResult, *, observed_data: Mapping[str, Tensor] | None = None, posterior_predictive: Mapping[str, Tensor] | None = None, log_likelihood: Mapping[str, Tensor] | None = None, constant_data: Mapping[str, Tensor] | None = None, coords: Mapping[str, list] | None = None, dims: Mapping[str, list[str]] | None = None) -> DataTree

Convert an MCMCResult into an ArviZ-style DataTree.

PARAMETER DESCRIPTION
posterior

Sampler output. posterior.samples[name] carries an (num_chains, num_samples, *site_shape) tensor; each becomes a posterior variable. log_densities populates sample_stats/lp; acceptance_rates and divergence_counts populate sample_stats.

TYPE: MCMCResult

observed_data

Site name to observed tensor (the original data used at fit time). Becomes the observed_data group.

TYPE: Mapping[str, Tensor] DEFAULT: None

posterior_predictive

Site name to posterior-predictive draws of shape (num_chains, num_samples, *site_shape). Becomes the posterior_predictive group.

TYPE: Mapping[str, Tensor] DEFAULT: None

log_likelihood

Site name to per-observation log-likelihood of shape (num_chains, num_samples, *obs_shape). Becomes the log_likelihood group; required for arviz.loo / arviz.waic.

TYPE: Mapping[str, Tensor] DEFAULT: None

constant_data

Site name to fixed covariate tensor (e.g. design matrix). Becomes the constant_data group.

TYPE: Mapping[str, Tensor] DEFAULT: None

coords

Coordinate values per named axis (e.g. {"Verb": ["eat", "drink", "run"]}). Forwarded to ArviZ.

TYPE: Mapping[str, list] DEFAULT: None

dims

Per-site axis names (e.g. {"beta": ["Verb"]}). Forwarded to ArviZ.

TYPE: Mapping[str, list[str]] DEFAULT: None

RETURNS DESCRIPTION
DataTree

Canonical ArviZ DataTree consumable by every plotting and diagnostic function in the ArviZ 1.x API.

Source code in src/quivers/diagnostics/arviz_io.py
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
def to_datatree(
    posterior: MCMCResult,
    *,
    observed_data: Mapping[str, torch.Tensor] | None = None,
    posterior_predictive: Mapping[str, torch.Tensor] | None = None,
    log_likelihood: Mapping[str, torch.Tensor] | None = None,
    constant_data: Mapping[str, torch.Tensor] | None = None,
    coords: Mapping[str, list] | None = None,
    dims: Mapping[str, list[str]] | None = None,
) -> xr.DataTree:
    """Convert an `MCMCResult` into an ArviZ-style DataTree.

    Parameters
    ----------
    posterior : MCMCResult
        Sampler output.  ``posterior.samples[name]`` carries an
        ``(num_chains, num_samples, *site_shape)`` tensor; each
        becomes a posterior variable.  ``log_densities`` populates
        ``sample_stats/lp``; ``acceptance_rates`` and
        ``divergence_counts`` populate ``sample_stats``.
    observed_data : Mapping[str, torch.Tensor], optional
        Site name to observed tensor (the original data used at
        fit time).  Becomes the ``observed_data`` group.
    posterior_predictive : Mapping[str, torch.Tensor], optional
        Site name to posterior-predictive draws of shape
        ``(num_chains, num_samples, *site_shape)``.  Becomes the
        ``posterior_predictive`` group.
    log_likelihood : Mapping[str, torch.Tensor], optional
        Site name to per-observation log-likelihood of shape
        ``(num_chains, num_samples, *obs_shape)``.  Becomes the
        ``log_likelihood`` group; required for
        `arviz.loo` / `arviz.waic`.
    constant_data : Mapping[str, torch.Tensor], optional
        Site name to fixed covariate tensor (e.g. design matrix).
        Becomes the ``constant_data`` group.
    coords : Mapping[str, list], optional
        Coordinate values per named axis (e.g.
        ``{"Verb": ["eat", "drink", "run"]}``).  Forwarded to ArviZ.
    dims : Mapping[str, list[str]], optional
        Per-site axis names (e.g. ``{"beta": ["Verb"]}``).
        Forwarded to ArviZ.

    Returns
    -------
    xr.DataTree
        Canonical ArviZ DataTree consumable by every plotting and
        diagnostic function in the ArviZ 1.x API.
    """
    data: dict[str, dict] = {}

    posterior_group: dict = {}
    for name, t in posterior.samples.items():
        posterior_group[name] = _tensor_to_numpy(t)
    data["posterior"] = posterior_group

    sample_stats_group: dict = {
        "lp": _tensor_to_numpy(posterior.log_densities),
    }
    # acceptance_rate and diverging are per-chain scalars in
    # quivers; ArviZ expects per-draw arrays.  We broadcast.
    n_chains = posterior.num_chains
    n_samples = posterior.num_samples
    sample_stats_group["mean_acceptance_per_chain"] = (
        posterior.acceptance_rates.detach()
        .cpu()
        .numpy()
        .reshape(n_chains, 1)
        .repeat(n_samples, axis=1)
    )
    sample_stats_group["total_divergences_per_chain"] = (
        posterior.divergence_counts.detach()
        .cpu()
        .numpy()
        .reshape(n_chains, 1)
        .repeat(n_samples, axis=1)
    )
    data["sample_stats"] = sample_stats_group

    if observed_data:
        data["observed_data"] = {
            name: _tensor_to_numpy(t) for name, t in observed_data.items()
        }
    if posterior_predictive:
        data["posterior_predictive"] = {
            name: _tensor_to_numpy(t) for name, t in posterior_predictive.items()
        }
    if log_likelihood:
        data["log_likelihood"] = {
            name: _tensor_to_numpy(t) for name, t in log_likelihood.items()
        }
    if constant_data:
        data["constant_data"] = {
            name: _tensor_to_numpy(t) for name, t in constant_data.items()
        }

    return az.from_dict(
        data,
        coords=dict(coords) if coords is not None else None,
        dims=dict(dims) if dims is not None else None,
    )

compare

compare(fits: Mapping[str, DataTree], *, method: Literal['stacking', 'BB-pseudo-BMA', 'pseudo-BMA'] = 'stacking', var_name: str | None = None, reference: str | None = None) -> object

Rank candidate models by expected log predictive density.

Delegates to arviz.compare, which computes PSIS-LOO via arviz.loo on each fit's log_likelihood group and combines the resulting arviz.stats.ELPDData records into a ranked comparison table.

PARAMETER DESCRIPTION
fits

Per-model fit, each a DataTree produced by quivers.diagnostics.to_datatree. Every fit must carry a log_likelihood group; without it arviz.loo cannot compute elpd.

TYPE: Mapping[str, DataTree]

method

Stacking weight estimator. Default "stacking" follows Yao, Vehtari, Simpson, Gelman 2018.

TYPE: "stacking", "BB-pseudo-BMA", or "pseudo-BMA" DEFAULT: 'stacking'

var_name

Name of the observed variable in log_likelihood to compare on; required when a fit's log_likelihood group carries multiple variables.

TYPE: str DEFAULT: None

reference

Fit name to use as the reference for elpd-difference comparisons. Default is the top-ranked model.

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
DataFrame

ArviZ ranking table with columns rank, elpd_loo, p_loo, se, weight, ... and one row per model.

Source code in src/quivers/diagnostics/comparison.py
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
def compare(
    fits: Mapping[str, xr.DataTree],
    *,
    method: Literal["stacking", "BB-pseudo-BMA", "pseudo-BMA"] = "stacking",
    var_name: str | None = None,
    reference: str | None = None,
) -> object:
    """Rank candidate models by expected log predictive density.

    Delegates to `arviz.compare`, which computes PSIS-LOO
    via `arviz.loo` on each fit's ``log_likelihood`` group
    and combines the resulting `arviz.stats.ELPDData` records
    into a ranked comparison table.

    Parameters
    ----------
    fits : Mapping[str, xr.DataTree]
        Per-model fit, each a DataTree produced by
        [`quivers.diagnostics.to_datatree`][quivers.diagnostics.to_datatree].  Every fit must
        carry a ``log_likelihood`` group; without it
        `arviz.loo` cannot compute elpd.
    method : "stacking", "BB-pseudo-BMA", or "pseudo-BMA"
        Stacking weight estimator.  Default ``"stacking"`` follows
        [Yao, Vehtari, Simpson, Gelman 2018](https://doi.org/10.1214/17-BA1091).
    var_name : str, optional
        Name of the observed variable in ``log_likelihood`` to
        compare on; required when a fit's ``log_likelihood`` group
        carries multiple variables.
    reference : str, optional
        Fit name to use as the reference for elpd-difference
        comparisons.  Default is the top-ranked model.

    Returns
    -------
    pandas.DataFrame
        ArviZ ranking table with columns ``rank, elpd_loo, p_loo,
        se, weight, ...`` and one row per model.
    """
    return az.compare(
        dict(fits),
        method=method,
        var_name=var_name,
        reference=reference,
    )

posterior_predictive_check

posterior_predictive_check(idata: DataTree, *, observed_name: str, statistic: str | Callable[[ndarray], float] = 'mean', by: str | None = None) -> Mapping[str, float | ndarray | str]

Compute a posterior-predictive p-value (PPP-value) for a user-chosen test statistic.

PARAMETER DESCRIPTION
idata

Fit produced by quivers.diagnostics.to_datatree, with both observed_data and posterior_predictive groups populated.

TYPE: DataTree

observed_name

Name of the observed site (must appear in both groups).

TYPE: str

statistic

Either a key into STATISTICS or a user-supplied Callable[[np.ndarray], float].

TYPE: str or callable DEFAULT: 'mean'

by

If given, computes the statistic per group along the named dim (e.g. by="Verb"); the PPP-value becomes a numpy array of shape (|by|,).

TYPE: str DEFAULT: None

RETURNS DESCRIPTION
Mapping[str, float or ndarray]

{"observed": T(y), "predictive_mean": E[T(y_rep)], "ppp": Pr(T(y_rep) >= T(y))}. The PPP-value is the canonical posterior-predictive p-value; values near 0 or 1 indicate model mis-fit on the chosen statistic.

Source code in src/quivers/diagnostics/predictive_checks.py
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
def posterior_predictive_check(
    idata: xr.DataTree,
    *,
    observed_name: str,
    statistic: str | Callable[[np.ndarray], float] = "mean",
    by: str | None = None,
) -> Mapping[str, float | np.ndarray | str]:
    """Compute a posterior-predictive p-value (PPP-value) for a
    user-chosen test statistic.

    Parameters
    ----------
    idata : xr.DataTree
        Fit produced by [`quivers.diagnostics.to_datatree`][quivers.diagnostics.to_datatree],
        with both ``observed_data`` and ``posterior_predictive``
        groups populated.
    observed_name : str
        Name of the observed site (must appear in both groups).
    statistic : str or callable
        Either a key into `STATISTICS` or a user-supplied
        ``Callable[[np.ndarray], float]``.
    by : str, optional
        If given, computes the statistic per group along the named
        dim (e.g. ``by="Verb"``); the PPP-value becomes a numpy
        array of shape ``(|by|,)``.

    Returns
    -------
    Mapping[str, float or numpy.ndarray]
        ``{"observed": T(y), "predictive_mean": E[T(y_rep)],
        "ppp": Pr(T(y_rep) >= T(y))}``.  The PPP-value is the
        canonical [posterior-predictive p-value](https://en.wikipedia.org/wiki/Posterior_predictive_p-value);
        values near 0 or 1 indicate model mis-fit on the chosen
        statistic.
    """
    if callable(statistic):
        stat_fn = statistic
        stat_name = getattr(statistic, "__name__", "user_statistic")
    else:
        if statistic not in STATISTICS:
            raise ValueError(
                f"unknown statistic {statistic!r}; choices are "
                f"{sorted(STATISTICS)} or a user-supplied callable"
            )
        stat_fn = STATISTICS[statistic]
        stat_name = statistic

    observed = idata["observed_data"][observed_name].values
    pp = idata["posterior_predictive"][observed_name].values
    # pp shape: (chain, draw, *site_shape).  Flatten chains for stat
    # computation across all draws.
    chains, draws = pp.shape[0], pp.shape[1]
    pp_flat = pp.reshape(chains * draws, *pp.shape[2:])

    if by is None:
        t_obs = stat_fn(observed)
        t_rep = np.array([stat_fn(pp_flat[i]) for i in range(pp_flat.shape[0])])
        return {
            "statistic": stat_name,
            "observed": t_obs,
            "predictive_mean": float(t_rep.mean()),
            "ppp": float((t_rep >= t_obs).mean()),
        }

    # Per-group statistic.  Locate the `by` dim in the observed array.
    dims = idata["observed_data"][observed_name].dims
    if by not in dims:
        raise ValueError(
            f"posterior_predictive_check: dim {by!r} not found in "
            f"observed_data/{observed_name} dims {dims}"
        )
    group_axis = dims.index(by)
    observed_groups = np.split(observed, observed.shape[group_axis], axis=group_axis)
    pp_axis = group_axis + 2  # +2 for the leading (chain, draw) axes
    pp_groups = np.split(pp_flat, pp_flat.shape[pp_axis - 1], axis=pp_axis - 1)
    t_obs = np.array([stat_fn(g.squeeze(group_axis)) for g in observed_groups])
    t_rep = np.array(
        [
            np.array([stat_fn(g[i].squeeze(pp_axis - 2)) for i in range(g.shape[0])])
            for g in pp_groups
        ]
    )  # (groups, n_draws)
    return {
        "statistic": stat_name,
        "by": by,
        "observed": t_obs,
        "predictive_mean": t_rep.mean(axis=1),
        "ppp": (t_rep >= t_obs[:, None]).mean(axis=1),
    }