`quivers.formulas.formula`

Parsed-formula IR: Formula, FixedColumn, RandomTerm and FormulaData, plus the formula_from_data adapter over formulae.design_matrices.

formula

Parsed-formula IR: a typed didactic.api.Model wrapping the raw formulae.matrices.DesignMatrices so the rest of the formula frontend operates on typed values.

The Formula IR is the canonical source representation of the formula→QVR lens. Future versions can register it as a panproto protocol so the lens machinery applies; for now the compiler walks this IR directly.

Convention

Each fixed-effect term may produce one or more design-matrix columns (a single column for x, two for poly(x, 2), K for an unordered factor with K + 1 levels under the default reference coding when an intercept is present, etc.). R / brms assign one coefficient per column; this IR follows the same convention by exploding each term into a tuple of FixedColumn records. Multi-column terms therefore produce multiple named scalar latents downstream, with deterministic names {term}_1, {term}_2, ... that mirror R's poly(x, 2)1 / poly(x, 2)2 display.
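The naming rule can be sketched as a small function. `column_names` is a hypothetical helper written only to illustrate the convention; the module's own term-exploding code, which also builds the FixedColumn records, is not shown on this page.

```python
def column_names(term: str, n_columns: int) -> list[str]:
    """Return the per-column labels for one fixed-effect term (hypothetical helper).

    Single-column terms keep the term name; multi-column terms get 1-indexed
    suffixes, mirroring R's poly(x, 2)1 / poly(x, 2)2 display.
    """
    if n_columns < 1:
        raise ValueError(f"term {term!r} must produce at least one column")
    if n_columns == 1:
        return [term]
    return [f"{term}_{k + 1}" for k in range(n_columns)]


print(column_names("x", 1))           # ['x']
print(column_names("poly(x, 2)", 2))  # ['poly(x, 2)_1', 'poly(x, 2)_2']
```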

Polynomial default: the poly transform in formulae.design_matrices is orthogonal by default (matching R's stats::poly); raw monomials are available via I(x^2) / I(x**2). The R-style numeric transforms log, exp, sqrt, abs, sin, cos, tan, log10, log2, log1p, expm1, asin, acos, atan, sinh, cosh and tanh are wired through the formulae evaluation namespace, so users coming from R get the expected base R behaviour.
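To make "orthogonal by default" concrete, here is a pure-Python sketch of an orthonormal polynomial basis in the spirit of R's stats::poly. It is an illustration, not the formulae implementation: it centers x, orthogonalizes the powers 1, x, ..., x^k with modified Gram-Schmidt, drops the constant column and scales each remaining column to unit norm. For x = 1, 2, 3 and k = 2 it gives the columns (-0.7071, 0, 0.7071) and (0.4082, -0.8165, 0.4082), the values R prints for poly(1:3, 2); a raw monomial column such as I(x**2) would instead be (1, 4, 9).

```python
import math


def orthogonal_poly(x: list[float], degree: int) -> list[list[float]]:
    """Orthonormal polynomial columns of degree 1..degree (illustrative sketch).

    Each returned column has zero mean, unit Euclidean norm, and is
    orthogonal to every other returned column.
    """
    if not 1 <= degree < len(set(x)):
        raise ValueError("degree must be at least 1 and below the number of unique x values")
    mean = sum(x) / len(x)
    centered = [xi - mean for xi in x]  # centering improves numerical stability
    basis: list[list[float]] = []
    for p in range(degree + 1):
        v = [ci ** p for ci in centered]
        # Modified Gram-Schmidt: remove the projection onto each earlier unit column.
        for q in basis:
            dot = sum(vi * qi for vi, qi in zip(v, q))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis[1:]  # drop the constant column, as R's poly does


print(orthogonal_poly([1.0, 2.0, 3.0], 2))  # approx [[-0.7071, 0.0, 0.7071], [0.4082, -0.8165, 0.4082]]
```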

FixedColumn

Bases: Model

One column of the fixed-effects design matrix.

| Attribute | Type | Description |
| --- | --- | --- |
| `term` | `str` | Originating term name (e.g. `"poly(x, 2)"` or `"x"`). |
| `name` | `str` | Per-column label: equal to `term` for single-column terms, and `f"{term}_{k+1}"` (1-indexed, matching R's display) for multi-column terms such as `poly(x, 2)`. |
| `qvr_name` | `str` | QVR-legal identifier derived from `name` (alphanumerics and `_` only); used as the variable name in the emitted program. |
| `is_intercept` | `bool` | `True` for the constant-1 column. |
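The `qvr_name` derivation can be sketched with a single regular expression. This assumes the rule stated elsewhere on this page (non-alphanumeric characters become underscores); the module's own `_qvr_name` helper is not shown and may differ in details such as handling of leading digits.

```python
import re

# Assumption: any character other than ASCII letters, digits or "_" becomes "_".
_NON_IDENT = re.compile(r"[^0-9A-Za-z_]")


def qvr_name(name: str) -> str:
    """Map a per-column label to a QVR-legal identifier (illustrative sketch)."""
    return _NON_IDENT.sub("_", name)


print(qvr_name("x"))             # x
print(qvr_name("poly(x, 2)_1"))  # poly_x__2__1
```

Because distinct labels such as `a.b` and `a-b` both map to `a_b`, the mapping cannot be inverted from the identifier alone.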

RandomTerm

Bases: Model

One random-effect group, e.g. (1 | g) or (x | g).

| Attribute | Type | Description |
| --- | --- | --- |
| `slope` | `str` | `"Intercept"` for `(1 \| g)`; otherwise the slope variable name. |
| `group` | `str` | Grouping factor name. |
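The mapping from a formulae group-term name to the two RandomTerm fields follows the splitting logic in the formula_from_data source further down this page. The stand-alone function below restates it under a hypothetical name and returns a plain `(slope, group)` tuple instead of a RandomTerm.

```python
def parse_random_term(term_name: str) -> tuple[str, str]:
    """Split a group-term name such as "1|g" into (slope, group) (illustrative sketch).

    Text left of the first "|" is the slope, the remainder is the grouping
    factor, and a bare "1" slope is renamed to "Intercept".
    """
    if "|" not in term_name:
        raise ValueError(f"expected `(slope | group)` syntax, got {term_name!r}")
    slope, group = (part.strip() for part in term_name.split("|", 1))
    if slope == "1":
        slope = "Intercept"
    return slope, group


print(parse_random_term("1|g"))          # ('Intercept', 'g')
print(parse_random_term("x | subject"))  # ('x', 'subject')
```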

Formula

Bases: Model

A parsed regression formula plus the data it was parsed against.

| Attribute | Type | Description |
| --- | --- | --- |
| `formula` | `str` | Original formula string. |
| `response_name` | `str` | Name of the response column. |
| `fixed_columns` | `tuple[FixedColumn, ...]` | One entry per design-matrix column (matching the R / brms one-coefficient-per-column convention). |
| `random_terms` | `tuple[RandomTerm, ...]` | Random-effect group specifications. |
| `response_values` | `ndarray` | Response column values, shape `(N,)`. |
| `group_levels` | `Mapping[str, tuple[str, ...]]` | Canonical level ordering per grouping factor, used to derive deterministic plate-index tensors. |
| `group_indices` | `Mapping[str, tuple[int, ...]]` | Per-group integer codes, one per row (length `N`), each indexing into that group's entry in `group_levels`. |
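The relationship between `group_levels` and `group_indices` can be sketched in plain Python. The sketch restates what the formula_from_data source on this page does with Narwhals (sort the unique values, stringify after sorting, then map each row to its level position); `encode_groups` is a hypothetical name, and rejecting missing group values up front with a `ValueError` is a choice made for this sketch.

```python
def encode_groups(values: list[object]) -> tuple[tuple[str, ...], tuple[int, ...]]:
    """Derive (levels, codes) for one grouping factor (illustrative sketch).

    Levels are the sorted unique values, stringified after sorting so that
    numeric groups keep numeric order; codes give each row's position in
    that level tuple.
    """
    if any(v is None for v in values):
        raise ValueError("grouping factor contains missing values")
    levels = tuple(str(v) for v in sorted(set(values)))
    index = {level: i for i, level in enumerate(levels)}
    return levels, tuple(index[str(v)] for v in values)


print(encode_groups([10, 2, 10, 3]))  # (('2', '3', '10'), (2, 0, 2, 1))
```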

FormulaData

Bases: Model

The complement of a Formula under the quivers.formulas.compile.FormulaToQVRModule lens.

The emitted QVR quivers.dsl.ast_nodes.Module carries the structural skeleton of the formula (which columns there are, keyed by their QVR-legal identifier; whether each is an intercept; the random-effect group / slope pairs; the family; the response identifier in its QVR-legal form). It does not carry:

- the per-row data arrays (these flow through the host-data channel at fit time);
- the original names of the columns, grouping factors and response (the lens uses _qvr_name to normalize identifiers, which replaces non-alphanumeric characters with underscores and is therefore lossy);
- the per-column term label (presentation only; it does not appear in the lens's forward output);
- the original formula string (presentation only: the lens emits a canonical AST that does not record the user's whitespace or operator-precedence choices).

Those fields travel in the complement. backward(module, complement) decodes the structural fields from the Module and fuses them with this carrier to reproduce the original Formula verbatim.
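A toy version of the name-recovery half of this round trip shows why the original names must travel in the complement: the normalization cannot be inverted, but a lookup table keyed by the normalized identifier can be. `Column`, `to_qvr`, `forward` and `backward` here are hypothetical stand-ins, not the FormulaToQVRModule API, and the normalization rule is assumed from the description above. The collision check is also an addition of this sketch; this page does not say how the real lens treats two columns whose names normalize to the same identifier.

```python
import re
from dataclasses import dataclass

_NON_IDENT = re.compile(r"[^0-9A-Za-z_]")


def to_qvr(name: str) -> str:
    # Assumed normalization: non-alphanumeric characters become underscores.
    return _NON_IDENT.sub("_", name)


@dataclass(frozen=True)
class Column:  # hypothetical stand-in for FixedColumn
    term: str
    name: str


def forward(columns: list[Column]) -> tuple[list[str], dict[str, tuple[str, str]]]:
    """Split columns into the identifiers a Module would keep plus a complement."""
    view = [to_qvr(c.name) for c in columns]
    complement = {to_qvr(c.name): (c.term, c.name) for c in columns}
    if len(complement) != len(columns):
        raise ValueError("two columns normalize to the same QVR identifier")
    return view, complement


def backward(view: list[str], complement: dict[str, tuple[str, str]]) -> list[Column]:
    """Rebuild the original columns from the identifiers and the complement."""
    return [Column(*complement[q]) for q in view]


cols = [Column("x", "x"), Column("poly(x, 2)", "poly(x, 2)_1"), Column("poly(x, 2)", "poly(x, 2)_2")]
view, complement = forward(cols)
print(view)                                # ['x', 'poly_x__2__1', 'poly_x__2__2']
print(backward(view, complement) == cols)  # True
```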

| Attribute | Type | Description |
| --- | --- | --- |
| `formula` | `str` | Original formula string. |
| `response_name` | `str` | Original (pre-`_qvr_name`) response column name. |
| `response_values` | `ndarray` | Response column values, shape `(N,)`. |
| `fixed_column_names` | `Mapping[str, tuple[str, str]]` | Per-column `(term, name)` pairs keyed by `FixedColumn.qvr_name`. Lets the decoder recover `FixedColumn.term` and `FixedColumn.name` from the QVR name surfaced in the Module's latent declarations. |
| `fixed_column_data` | `Mapping[str, ndarray]` | Per-row predictor values, keyed by `FixedColumn.qvr_name`. |
| `group_original_names` | `Mapping[str, str]` | Per-group mapping `qvr_name → original group name`. |
| `group_levels` | `Mapping[str, tuple[str, ...]]` | Canonical per-group level ordering. Needed to populate `Formula.group_levels` from the integer-coded `object G : K` declarations recorded in the Module. |
| `group_indices` | `Mapping[str, tuple[int, ...]]` | Per-row integer codes for each grouping factor. |

formula_from_data

formula_from_data(formula: str, data: IntoDataFrame, *, extra_namespace: Mapping[str, object] | None = None) -> Formula

Build a typed Formula IR by lifting formulae.design_matrices over a dataframe.

This is an adapter, not a parser: the brms-style formula syntax is parsed by the formulae library; we lift its formulae.matrices.DesignMatrices result into a typed didactic record, augmented with deterministic per-group level orderings and integer-code arrays derived from the dataframe.

The R-style numeric transforms (log, exp, sqrt, abs, sin, cos, tan, log10, log2, log1p, expm1, asin, acos, atan, sinh, cosh, tanh) are pre-loaded into the formulae evaluation namespace so users coming from R / brms get the expected base R behaviour without explicit registration. Polynomial terms via poly(x, k) are orthogonal by default, matching R's stats::poly.

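| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `formula` | `str` | required | Formula string in brms / lme4 syntax. |
| `data` | `IntoDataFrame` | required | Pandas, polars, or any other Narwhals-compatible dataframe. |
| `extra_namespace` | `Mapping[str, object]` | `None` | Additional names visible inside the formula's expression evaluation, merged on top of the R-style transforms. |

Raises `ValueError` if the formula has no response variable on the left of `~`, or if a random-effect term name does not have the `(slope | group)` form.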

Source code in src/quivers/formulas/formula.py

```python
from collections.abc import Mapping

import formulae as fo
import narwhals as nw
import numpy as np
from narwhals.typing import IntoDataFrame

# _R_TRANSFORMS, _explode_term, FixedColumn, RandomTerm and Formula are
# defined elsewhere in this module.


def formula_from_data(
    formula: str,
    data: IntoDataFrame,
    *,
    extra_namespace: Mapping[str, object] | None = None,
) -> Formula:
    """Build a typed `Formula` IR by lifting
    `formulae.design_matrices` over a dataframe.

    This is an adapter, not a parser: the brms-style formula syntax
    is parsed by the [`formulae`](https://bambinos.github.io/formulae/)
    library; we lift its `formulae.matrices.DesignMatrices`
    result into a typed didactic record, augmented with deterministic
    per-group level orderings and integer-code arrays derived from
    the dataframe.

    The R-style numeric transforms (``log``, ``exp``, ``sqrt``,
    ``abs``, ``sin``, ``cos``, ``tan``, ``log10``, ``log2``,
    ``log1p``, ``expm1``, ``asin``, ``acos``, ``atan``, ``sinh``,
    ``cosh``, ``tanh``) are pre-loaded into the formulae evaluation
    namespace so users coming from R / brms get the expected base
    R behaviour without explicit registration.  Polynomial terms via
    ``poly(x, k)`` are orthogonal by default, matching R's
    ``stats::poly``.

    Parameters
    ----------
    formula : str
        Formula string in brms / lme4 syntax.
    data : IntoDataFrame
        Pandas, polars, or any other Narwhals-compatible dataframe.
    extra_namespace : Mapping[str, object], optional
        Additional names visible inside the formula's expression
        evaluation, merged on top of the R-style transforms.
    """
    # Accept any Narwhals-compatible frame; formulae itself consumes pandas.
    nw_df = nw.from_native(data, eager_only=True)
    pandas_df = nw_df.to_pandas()
    # R-style transforms first, so user-supplied names override them.
    namespace: dict[str, object] = dict(_R_TRANSFORMS)
    if extra_namespace:
        namespace.update(extra_namespace)
    dm = fo.design_matrices(formula, data=pandas_df, extra_namespace=namespace)
    if dm.response is None:
        raise ValueError(
            f"formula_from_data: formula {formula!r} has no response "
            f"variable on the left of `~`"
        )
    response_name = dm.response.name
    n_obs = int(pandas_df.shape[0])

    # Fixed effects: one FixedColumn per design-matrix column.
    fixed_columns: list[FixedColumn] = []
    if dm.common is not None:
        for term_name, term in dm.common.terms.items():
            fixed_columns.extend(_explode_term(term_name, term, n_obs))

    # Random effects: group-term names have the form "slope|group".
    random_terms: list[RandomTerm] = []
    group_levels: dict[str, tuple[str, ...]] = {}
    group_indices: dict[str, tuple[int, ...]] = {}
    if dm.group is not None:
        for term_name in dm.group.terms.keys():
            if "|" not in term_name:
                raise ValueError(
                    f"formula_from_data: unexpected random term name "
                    f"{term_name!r}; expected `(slope | group)` syntax"
                )
            slope, group = term_name.split("|", 1)
            slope = slope.strip()
            group = group.strip()
            if slope == "1":
                slope = "Intercept"
            random_terms.append(RandomTerm(slope=slope, group=group))
            if group not in group_levels:
                # Levels: sorted unique non-null values, stringified after
                # sorting so numeric groups keep their numeric order.
                levels = tuple(
                    str(v) for v in nw_df[group].drop_nulls().unique().sort().to_list()
                )
                group_levels[group] = levels
                # Row codes index into `levels`. level_index has no entry for
                # a missing value, so this lookup assumes the grouping column
                # contains no nulls.
                level_index = {v: i for i, v in enumerate(levels)}
                codes = tuple(level_index[str(v)] for v in nw_df[group].to_list())
                group_indices[group] = codes

    # Flatten the response design matrix to a float64 vector of shape (N,).
    response_values = (
        np.asarray(dm.response.design_matrix).reshape(-1).astype(np.float64)
    )

    return Formula(
        formula=formula,
        response_name=response_name,
        fixed_columns=tuple(fixed_columns),
        random_terms=tuple(random_terms),
        response_values=response_values,
        group_levels=group_levels,
        group_indices=group_indices,
    )
```
