`quivers.formulas`¶

Brms-style formula frontend over the QVR DSL. A user writes a regression formula and gets a fitted Bayesian model without hand-writing .qvr source.

formulas ¶

Brms-style formula frontend over the QVR DSL.

A user writes a regression formula (Wilkinson notation, extended with brms-style random-effect groups) and gets a fitted Bayesian model without hand-writing .qvr source. The compiler emits programs that route through the existing QVR DSL surface:

fixed-effect linear predictors as morphism composition X >> beta over observed design-matrix morphisms,
random-effect groups as plate-gather of per-level latent draws,
responses as observe sites with the family-link kernel (Gaussian / Bernoulli / Binomial / Categorical / Poisson / NegativeBinomial / Cumulative / Beta / Gamma / Student-t / ZeroInflatedPoisson / HurdlePoisson / Mixture) registered in quivers.formulas.family.

The implementation reuses formulae for formula parsing (the Bambi team's pure-Python parser; supports brms-style (slope | group) random effects, smooth terms, and custom contrasts) and lifts the resulting DesignMatrices into a typed Formula didactic.api.Model.

The frontend is the formula→QVR direction of a panproto lens; the QVR DSL is the canonical source of truth, and the formula compiler is a structure-preserving translation from the smaller formula language to the QVR DSL. Future versions will register the formula schema as a panproto protocol so the get/put bidirectional machinery applies.

FormulaToQVRModule ¶

FormulaToQVRModule(family: Family, *, fixed_prior: str = 'Normal(0.0, 5.0)', random_scale_prior: str = 'HalfNormal(1.0)', user_priors: Mapping[str, str] | None = None, reparameterize: Literal['centered', 'noncentered'] = 'noncentered')

Bases: Lens[Formula, Module, FormulaData]

Translate a Formula to a QVR Module AST.

A typed didactic.api.Lens whose complement is a FormulaData carrier: just the fields of the source Formula that are not recoverable from the emitted Module. The per-row data arrays, the original (pre- _qvr_name) identifiers, the per-column term / name presentation labels, and the original formula string travel in the complement; everything else (which columns there are, the intercept / random-term structure, the family choice) is decoded back out of the Module by _decode_module.

PARAMETER	DESCRIPTION
`family`	Response family from `quivers.formulas.families`. TYPE: `Family`
`fixed_prior`	Default prior for fixed-effect coefficients, in the surface form `"Family(arg, arg, ...)"`; numeric args become floats, identifier args stay as variable references in the emitted program. TYPE: `str` DEFAULT: `'Normal(0.0, 5.0)'`
`random_scale_prior`	Default prior for random-effect scale parameters. TYPE: `str` DEFAULT: `'HalfNormal(1.0)'`
`user_priors`	Per-name prior overrides keyed by the latent's variable name in the emitted module. TYPE: `Mapping[str, str]` DEFAULT: `None`

Notes

GetPut: backward (forward(f)) = f for every Formula f. PutGet holds on pairs (t, c) for which t is in the image of forward and c is the corresponding FormulaData.

Source code in src/quivers/formulas/compile.py

def __init__(
    self,
    family: Family,
    *,
    fixed_prior: str = "Normal(0.0, 5.0)",
    random_scale_prior: str = "HalfNormal(1.0)",
    user_priors: Mapping[str, str] | None = None,
    reparameterize: Literal["centered", "noncentered"] = "noncentered",
) -> None:
    self._family = family
    self._fixed_prior = fixed_prior
    self._random_scale_prior = random_scale_prior
    self._user_priors: Mapping[str, str] = dict(user_priors or {})
    if reparameterize not in ("centered", "noncentered"):
        raise ValueError(
            f"FormulaToQVRModule: reparameterize must be 'centered' "
            f"or 'noncentered', got {reparameterize!r}"
        )
    self._reparameterize = reparameterize

fixed_column_observations ¶

fixed_column_observations(formula: Formula) -> dict[str, Tensor]

Per-column free-variable bindings for the host-data channel. One entry per non-intercept fixed column, shape (N,).

Source code in src/quivers/formulas/compile.py

def fixed_column_observations(self, formula: Formula) -> dict[str, torch.Tensor]:
    """Per-column free-variable bindings for the host-data
    channel.  One entry per non-intercept fixed column, shape
    ``(N,)``.
    """
    obs: dict[str, torch.Tensor] = {}
    for col in formula.fixed_columns:
        if col.is_intercept:
            continue
        obs[col.qvr_name] = torch.as_tensor(col.data.copy(), dtype=torch.float32)
    return obs

Family ¶

Bases: Model

Response family: observation kernel plus its link and any auxiliary parameters.

The compiler emits, for each family-keyed regression:

The auxiliary-parameter latent draws (intercept-only by default; one per parameter named in aux_params).
A let eta = <linear predictor> binding the location's linear predictor.
A let mu = <link.inverse_expr> binding the natural parameter.
A single observe y : Resp <- <observe_family>(mu, ...) step parameterised by mu and the auxiliary parameters.

Link ¶

Bases: Model

Inverse-link function bridging a linear predictor on :math:\mathbb{R} to the family's natural parameter space.

ATTRIBUTE	DESCRIPTION
`name`	Canonical name (e.g. `"identity"`, `"logit"`, `"log"`, `"probit"`, `"cloglog"`). TYPE: `str`
`inverse_expr`	QVR let-expression template applied to a linear predictor `eta` to produce the family parameter. Substitution token `{eta}` is replaced by the linear predictor's variable name. `"{eta}"` itself is the identity link. TYPE: `str`

BayesianFit ¶

Bases: Model

A fitted Bayesian regression: the compiled program, the parsed formula, the family, the user-supplied data, and the posterior samples.

ATTRIBUTE	DESCRIPTION
`formula`	Parsed formula IR. TYPE: `Formula`
`family`	Response family used at compile time. TYPE: `Family`
`program`	The compiled program. TYPE: `MonadicProgram`
`posterior`	`quivers.inference.MCMCResult` from NUTS / HMC, or a `quivers.inference.guides.base.Guide` from SVI. TYPE: `MCMCResult or Guide`
`observations`	Inference-time observations dict (response + per-column covariates + per-group plate indices). TYPE: `Mapping[str, Tensor]`

qvr_source `property` ¶

qvr_source: str

Lazily emit the AST-equivalent .qvr source for display.

dump_qvr ¶

dump_qvr(path: str | Path) -> Path

Write the AST-equivalent .qvr source to path and return the resulting Path.

Source code in src/quivers/formulas/_fit.py

def dump_qvr(self, path: str | Path) -> Path:
    """Write the AST-equivalent ``.qvr`` source to ``path`` and
    return the resulting `Path`.
    """
    out = Path(path)
    out.write_text(self.qvr_source)
    return out

Formula ¶

Bases: Model

A parsed regression formula plus the data it was parsed against.

ATTRIBUTE	DESCRIPTION
`formula`	Original formula string. TYPE: `str`
`response_name`	Name of the response column. TYPE: `str`
`fixed_columns`	One entry per design-matrix column (matches R/brms's one-coefficient-per-column convention). TYPE: `tuple[FixedColumn, ...]`
`random_terms`	Random-effect group specifications. TYPE: `tuple[RandomTerm, ...]`
`response_values`	Response column values, shape `(N,)`. TYPE: `ndarray`
`group_levels`	Canonical level ordering per grouping factor, used to derive deterministic plate-index tensors. TYPE: `Mapping[str, tuple[str, ...]]`
`group_indices`	Per-group integer index array, shape `(N,)`. TYPE: `Mapping[str, tuple[int, ...]]`

RandomTerm ¶

Bases: Model

One random-effect group, e.g. (1 | g) or (x | g).

ATTRIBUTE	DESCRIPTION
`slope`	`"Intercept"` for `(1 \| g)`; otherwise the slope variable name. TYPE: `str`
`group`	Grouping factor name. TYPE: `str`

fit ¶

fit(formula: str, *, data: IntoDataFrame, family: str | Family = 'gaussian', method: Literal['nuts', 'hmc', 'svi'] = 'nuts', num_warmup: int = 500, num_samples: int = 1000, num_chains: int = 4, fixed_prior: str = 'Normal(0.0, 5.0)', random_scale_prior: str = 'HalfNormal(1.0)', priors: Mapping[str, str] | None = None, guide: type | None = None, reparameterize: Literal['centered', 'noncentered'] = 'noncentered', seed: int = 0) -> BayesianFit

Compile a brms-style formula, fit it, and return the result.

See quivers.formulas for surface details. This entry point composes formula_from_data, FormulaToQVRModule, Compiler, and the inference layer in one call.

Source code in src/quivers/formulas/_fit.py

def fit(
    formula: str,
    *,
    data: IntoDataFrame,
    family: str | Family = "gaussian",
    method: Literal["nuts", "hmc", "svi"] = "nuts",
    num_warmup: int = 500,
    num_samples: int = 1000,
    num_chains: int = 4,
    fixed_prior: str = "Normal(0.0, 5.0)",
    random_scale_prior: str = "HalfNormal(1.0)",
    priors: Mapping[str, str] | None = None,
    guide: type | None = None,
    reparameterize: Literal["centered", "noncentered"] = "noncentered",
    seed: int = 0,
) -> BayesianFit:
    """Compile a brms-style formula, fit it, and return the result.

    See [`quivers.formulas`][quivers.formulas] for surface details.  This entry
    point composes `formula_from_data`, `FormulaToQVRModule`,
    `Compiler`, and the inference layer in one call.
    """
    if isinstance(family, str):
        if family not in families:
            raise ValueError(
                f"fit: unknown family {family!r}; choices are {sorted(families)}"
            )
        family_obj = families[family]
    else:
        family_obj = family

    parsed = formula_from_data(formula, data)
    lens = FormulaToQVRModule(
        family_obj,
        fixed_prior=fixed_prior,
        random_scale_prior=random_scale_prior,
        user_priors=priors,
    )
    module, _ = lens.forward(parsed)
    compiler = Compiler(module)
    program_runtime = compiler.compile()
    morphism = program_runtime.morphism
    if not isinstance(morphism, MonadicProgram):
        raise TypeError(
            f"fit: compiled morphism has type "
            f"{type(morphism).__name__}, expected MonadicProgram"
        )
    program = morphism

    observations: dict[str, torch.Tensor] = {}
    observations.update(lens.fixed_column_observations(parsed))
    response_name = parsed.response_name
    observations[response_name] = torch.as_tensor(
        parsed.response_values.copy(), dtype=torch.float32
    ).reshape(-1)
    for group, codes in parsed.group_indices.items():
        observations[f"{_qvr_name(group)}_idx"] = torch.as_tensor(
            list(codes), dtype=torch.long
        )

    torch.manual_seed(seed)
    if method == "svi":
        posterior = _fit_svi(program, observations, num_samples, guide_cls=guide)
    else:
        posterior = _fit_mcmc(
            program,
            observations,
            sampler=method,
            num_warmup=num_warmup,
            num_samples=num_samples,
            num_chains=num_chains,
        )

    return BayesianFit(
        formula=parsed,
        family=family_obj,
        program=program,
        posterior=posterior,
        observations=observations,
        reparameterize=reparameterize,
    )

formula_to_qvr ¶

formula_to_qvr(formula: str, *, data: IntoDataFrame, family: str | Family = 'gaussian', fixed_prior: str = 'Normal(0.0, 5.0)', random_scale_prior: str = 'HalfNormal(1.0)', priors: Mapping[str, str] | None = None, reparameterize: Literal['centered', 'noncentered'] = 'noncentered', path: str | Path | None = None) -> str

Emit .qvr source for a brms-style formula without fitting.

Builds the formula AST → QVR Module via FormulaToQVRModule, then serialises the module via quivers.dsl.emit.module_to_source. Optionally writes the result to path.

Source code in src/quivers/formulas/_fit.py

def formula_to_qvr(
    formula: str,
    *,
    data: IntoDataFrame,
    family: str | Family = "gaussian",
    fixed_prior: str = "Normal(0.0, 5.0)",
    random_scale_prior: str = "HalfNormal(1.0)",
    priors: Mapping[str, str] | None = None,
    reparameterize: Literal["centered", "noncentered"] = "noncentered",
    path: str | Path | None = None,
) -> str:
    """Emit ``.qvr`` source for a brms-style formula without fitting.

    Builds the formula AST → QVR Module via `FormulaToQVRModule`,
    then serialises the module via [`quivers.dsl.emit.module_to_source`][quivers.dsl.emit.module_to_source].
    Optionally writes the result to ``path``.
    """
    if isinstance(family, str):
        if family not in families:
            raise ValueError(
                f"formula_to_qvr: unknown family {family!r}; choices are "
                f"{sorted(families)}"
            )
        family_obj = families[family]
    else:
        family_obj = family
    parsed = formula_from_data(formula, data)
    lens = FormulaToQVRModule(
        family_obj,
        fixed_prior=fixed_prior,
        random_scale_prior=random_scale_prior,
        user_priors=priors,
        reparameterize=reparameterize,
    )
    module, _ = lens.forward(parsed)
    source = module_to_source(module)
    if path is not None:
        Path(path).write_text(source)
    return source

formula_from_data ¶

formula_from_data(formula: str, data: IntoDataFrame, *, extra_namespace: Mapping[str, object] | None = None) -> Formula

Build a typed Formula IR by lifting formulae.design_matrices over a dataframe.

This is an adapter, not a parser: the brms-style formula syntax is parsed by the formulae library; we lift its formulae.matrices.DesignMatrices result into a typed didactic record, augmented with deterministic per-group level orderings and integer-code arrays derived from the dataframe.

The R-style numeric transforms (log, exp, sqrt, abs, sin, cos, tan, log10, log2, log1p, expm1, asin, acos, atan, sinh, cosh, tanh) are pre-loaded into the formulae evaluation namespace so users coming from R / brms get the expected base R behaviour without explicit registration. Polynomial terms via poly(x, k) are orthogonal by default, matching R's stats::poly.

PARAMETER	DESCRIPTION
`formula`	Formula string in brms / lme4 syntax. TYPE: `str`
`data`	Pandas, polars, or any other Narwhals-compatible dataframe. TYPE: `IntoDataFrame`
`extra_namespace`	Additional names visible inside the formula's expression evaluation, merged on top of the R-style transforms. TYPE: `Mapping[str, object]` DEFAULT: `None`

Source code in src/quivers/formulas/formula.py

def formula_from_data(
    formula: str,
    data: IntoDataFrame,
    *,
    extra_namespace: Mapping[str, object] | None = None,
) -> Formula:
    """Build a typed `Formula` IR by lifting
    `formulae.design_matrices` over a dataframe.

    This is an adapter, not a parser: the brms-style formula syntax
    is parsed by the [`formulae`](https://bambinos.github.io/formulae/)
    library; we lift its `formulae.matrices.DesignMatrices`
    result into a typed didactic record, augmented with deterministic
    per-group level orderings and integer-code arrays derived from
    the dataframe.

    The R-style numeric transforms (``log``, ``exp``, ``sqrt``,
    ``abs``, ``sin``, ``cos``, ``tan``, ``log10``, ``log2``,
    ``log1p``, ``expm1``, ``asin``, ``acos``, ``atan``, ``sinh``,
    ``cosh``, ``tanh``) are pre-loaded into the formulae evaluation
    namespace so users coming from R / brms get the expected base
    R behaviour without explicit registration.  Polynomial terms via
    ``poly(x, k)`` are orthogonal by default, matching R's
    ``stats::poly``.

    Parameters
    ----------
    formula : str
        Formula string in brms / lme4 syntax.
    data : IntoDataFrame
        Pandas, polars, or any other Narwhals-compatible dataframe.
    extra_namespace : Mapping[str, object], optional
        Additional names visible inside the formula's expression
        evaluation, merged on top of the R-style transforms.
    """
    nw_df = nw.from_native(data, eager_only=True)
    pandas_df = nw_df.to_pandas()
    namespace: dict[str, object] = dict(_R_TRANSFORMS)
    if extra_namespace:
        namespace.update(extra_namespace)
    dm = fo.design_matrices(formula, data=pandas_df, extra_namespace=namespace)
    if dm.response is None:
        raise ValueError(
            f"formula_from_data: formula {formula!r} has no response "
            f"variable on the left of `~`"
        )
    response_name = dm.response.name
    n_obs = int(pandas_df.shape[0])

    fixed_columns: list[FixedColumn] = []
    if dm.common is not None:
        for term_name, term in dm.common.terms.items():
            fixed_columns.extend(_explode_term(term_name, term, n_obs))

    random_terms: list[RandomTerm] = []
    group_levels: dict[str, tuple[str, ...]] = {}
    group_indices: dict[str, tuple[int, ...]] = {}
    if dm.group is not None:
        for term_name in dm.group.terms.keys():
            if "|" not in term_name:
                raise ValueError(
                    f"formula_from_data: unexpected random term name "
                    f"{term_name!r}; expected `(slope | group)` syntax"
                )
            slope, group = term_name.split("|", 1)
            slope = slope.strip()
            group = group.strip()
            if slope == "1":
                slope = "Intercept"
            random_terms.append(RandomTerm(slope=slope, group=group))
            if group not in group_levels:
                levels = tuple(
                    str(v) for v in nw_df[group].drop_nulls().unique().sort().to_list()
                )
                group_levels[group] = levels
                level_index = {v: i for i, v in enumerate(levels)}
                codes = tuple(level_index[str(v)] for v in nw_df[group].to_list())
                group_indices[group] = codes

    response_values = (
        np.asarray(dm.response.design_matrix).reshape(-1).astype(np.float64)
    )

    return Formula(
        formula=formula,
        response_name=response_name,
        fixed_columns=tuple(fixed_columns),
        random_terms=tuple(random_terms),
        response_values=response_values,
        group_levels=group_levels,
        group_indices=group_indices,
    )

quivers.formulas¶

formulas ¶

FormulaToQVRModule ¶

fixed_column_observations ¶

Family ¶

Link ¶

BayesianFit ¶

qvr_source property ¶

dump_qvr ¶

Formula ¶

RandomTerm ¶

fit ¶

formula_to_qvr ¶

formula_from_data ¶

`quivers.formulas`¶

qvr_source `property` ¶