quivers.formulas

Brms-style formula frontend over the QVR DSL. A user writes a regression formula and gets a fitted Bayesian model without hand-writing .qvr source.

formulas

Brms-style formula frontend over the QVR DSL.

A user writes a regression formula (Wilkinson notation, extended with brms-style random-effect groups) and gets a fitted Bayesian model without hand-writing .qvr source. The compiler emits programs that route through the existing QVR DSL surface:

  • fixed-effect linear predictors as morphism composition X >> beta over observed design-matrix morphisms,
  • random-effect groups as plate-gather of per-level latent draws,
  • responses as observe sites with the family-link kernel (Gaussian / Bernoulli / Categorical / Poisson / NegBin / Cumulative / Beta / Gamma / Student-t / ZIP / hurdle / mixture) registered in quivers.formulas.family.

The implementation reuses formulae for formula parsing (the Bambi team's pure-Python parser; supports brms-style (slope | group) random effects, smooth terms, and custom contrasts) and lifts the resulting DesignMatrices into a typed Formula didactic.api.Model.

The frontend is the formula→QVR direction of a panproto lens; the QVR DSL is the canonical source of truth, and the formula compiler is a structure-preserving translation from the smaller formula language to the QVR DSL. Future versions will register the formula schema as a panproto protocol so the get/put bidirectional machinery applies.

FormulaToQVRModule

FormulaToQVRModule(family: Family, *, fixed_prior: str = 'Normal(0.0, 5.0)', random_scale_prior: str = 'HalfNormal(1.0)', user_priors: Mapping[str, str] | None = None, reparameterize: Literal['centered', 'noncentered'] = 'noncentered')

Bases: Lens[Formula, Module, FormulaData]

Translate a Formula to a QVR Module AST.

A typed didactic.api.Lens whose complement is a FormulaData carrier: just the fields of the source Formula that are not recoverable from the emitted Module. The per-row data arrays, the original (pre- _qvr_name) identifiers, the per-column term / name presentation labels, and the original formula string travel in the complement; everything else (which columns there are, the intercept / random-term structure, the family choice) is decoded back out of the Module by _decode_module.

PARAMETER DESCRIPTION
family

Response family from quivers.formulas.families.

TYPE: Family

fixed_prior

Default prior for fixed-effect coefficients, in the surface form "Family(arg, arg, ...)"; numeric args become floats, identifier args stay as variable references in the emitted program.

TYPE: str DEFAULT: 'Normal(0.0, 5.0)'

random_scale_prior

Default prior for random-effect scale parameters.

TYPE: str DEFAULT: 'HalfNormal(1.0)'

user_priors

Per-name prior overrides keyed by the latent's variable name in the emitted module.

TYPE: Mapping[str, str] DEFAULT: None

Notes

GetPut: backward (forward(f)) = f for every Formula f. PutGet holds on pairs (t, c) for which t is in the image of forward and c is the corresponding FormulaData.

Source code in src/quivers/formulas/compile.py
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
def __init__(
    self,
    family: Family,
    *,
    fixed_prior: str = "Normal(0.0, 5.0)",
    random_scale_prior: str = "HalfNormal(1.0)",
    user_priors: Mapping[str, str] | None = None,
    reparameterize: Literal["centered", "noncentered"] = "noncentered",
) -> None:
    self._family = family
    self._fixed_prior = fixed_prior
    self._random_scale_prior = random_scale_prior
    self._user_priors: Mapping[str, str] = dict(user_priors or {})
    if reparameterize not in ("centered", "noncentered"):
        raise ValueError(
            f"FormulaToQVRModule: reparameterize must be 'centered' "
            f"or 'noncentered', got {reparameterize!r}"
        )
    self._reparameterize = reparameterize

fixed_column_observations

fixed_column_observations(formula: Formula) -> dict[str, Tensor]

Per-column free-variable bindings for the host-data channel. One entry per non-intercept fixed column, shape (N,).

Source code in src/quivers/formulas/compile.py
473
474
475
476
477
478
479
480
481
482
483
def fixed_column_observations(self, formula: Formula) -> dict[str, torch.Tensor]:
    """Per-column free-variable bindings for the host-data
    channel.  One entry per non-intercept fixed column, shape
    ``(N,)``.
    """
    obs: dict[str, torch.Tensor] = {}
    for col in formula.fixed_columns:
        if col.is_intercept:
            continue
        obs[col.qvr_name] = torch.as_tensor(col.data.copy(), dtype=torch.float32)
    return obs

Family

Bases: Model

Response family: observation kernel plus its link and any auxiliary parameters.

The compiler emits, for each family-keyed regression:

  1. The auxiliary-parameter latent draws (intercept-only by default; one per parameter named in aux_params).
  2. A let eta = <linear predictor> binding the location's linear predictor.
  3. A let mu = <link.inverse_expr> binding the natural parameter.
  4. A single observe y : Resp <- <observe_family>(mu, ...) step parameterised by mu and the auxiliary parameters.

Bases: Model

Inverse-link function bridging a linear predictor on :math:\mathbb{R} to the family's natural parameter space.

ATTRIBUTE DESCRIPTION
name

Canonical name (e.g. "identity", "logit", "log", "probit", "cloglog").

TYPE: str

inverse_expr

QVR let-expression template applied to a linear predictor eta to produce the family parameter. Substitution token {eta} is replaced by the linear predictor's variable name. "{eta}" itself is the identity link.

TYPE: str

BayesianFit

Bases: Model

A fitted Bayesian regression: the compiled program, the parsed formula, the family, the user-supplied data, and the posterior samples.

ATTRIBUTE DESCRIPTION
formula

Parsed formula IR.

TYPE: Formula

family

Response family used at compile time.

TYPE: Family

program

The compiled program.

TYPE: MonadicProgram

posterior

TYPE: MCMCResult or Guide

observations

Inference-time observations dict (response + per-column covariates + per-group plate indices).

TYPE: Mapping[str, Tensor]

qvr_source property

qvr_source: str

Lazily emit the AST-equivalent .qvr source for display.

dump_qvr

dump_qvr(path: str | Path) -> Path

Write the AST-equivalent .qvr source to path and return the resulting Path.

Source code in src/quivers/formulas/_fit.py
79
80
81
82
83
84
85
def dump_qvr(self, path: str | Path) -> Path:
    """Write the AST-equivalent ``.qvr`` source to ``path`` and
    return the resulting `Path`.
    """
    out = Path(path)
    out.write_text(self.qvr_source)
    return out

Formula

Bases: Model

A parsed regression formula plus the data it was parsed against.

ATTRIBUTE DESCRIPTION
formula

Original formula string.

TYPE: str

response_name

Name of the response column.

TYPE: str

fixed_columns

One entry per design-matrix column (matches R/brms's one-coefficient-per-column convention).

TYPE: tuple[FixedColumn, ...]

random_terms

Random-effect group specifications.

TYPE: tuple[RandomTerm, ...]

response_values

Response column values, shape (N,).

TYPE: ndarray

group_levels

Canonical level ordering per grouping factor, used to derive deterministic plate-index tensors.

TYPE: Mapping[str, tuple[str, ...]]

group_indices

Per-group integer index array, shape (N,).

TYPE: Mapping[str, tuple[int, ...]]

RandomTerm

Bases: Model

One random-effect group, e.g. (1 | g) or (x | g).

ATTRIBUTE DESCRIPTION
slope

"Intercept" for (1 | g); otherwise the slope variable name.

TYPE: str

group

Grouping factor name.

TYPE: str

fit

fit(formula: str, *, data: IntoDataFrame, family: str | Family = 'gaussian', method: Literal['nuts', 'hmc', 'svi'] = 'nuts', num_warmup: int = 500, num_samples: int = 1000, num_chains: int = 4, fixed_prior: str = 'Normal(0.0, 5.0)', random_scale_prior: str = 'HalfNormal(1.0)', priors: Mapping[str, str] | None = None, guide: type | None = None, reparameterize: Literal['centered', 'noncentered'] = 'noncentered', seed: int = 0) -> BayesianFit

Compile a brms-style formula, fit it, and return the result.

See quivers.formulas for surface details. This entry point composes formula_from_data, FormulaToQVRModule, Compiler, and the inference layer in one call.

Source code in src/quivers/formulas/_fit.py
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
def fit(
    formula: str,
    *,
    data: IntoDataFrame,
    family: str | Family = "gaussian",
    method: Literal["nuts", "hmc", "svi"] = "nuts",
    num_warmup: int = 500,
    num_samples: int = 1000,
    num_chains: int = 4,
    fixed_prior: str = "Normal(0.0, 5.0)",
    random_scale_prior: str = "HalfNormal(1.0)",
    priors: Mapping[str, str] | None = None,
    guide: type | None = None,
    reparameterize: Literal["centered", "noncentered"] = "noncentered",
    seed: int = 0,
) -> BayesianFit:
    """Compile a brms-style formula, fit it, and return the result.

    See [`quivers.formulas`][quivers.formulas] for surface details.  This entry
    point composes `formula_from_data`, `FormulaToQVRModule`,
    `Compiler`, and the inference layer in one call.
    """
    if isinstance(family, str):
        if family not in families:
            raise ValueError(
                f"fit: unknown family {family!r}; choices are {sorted(families)}"
            )
        family_obj = families[family]
    else:
        family_obj = family

    parsed = formula_from_data(formula, data)
    lens = FormulaToQVRModule(
        family_obj,
        fixed_prior=fixed_prior,
        random_scale_prior=random_scale_prior,
        user_priors=priors,
    )
    module, _ = lens.forward(parsed)
    compiler = Compiler(module)
    program_runtime = compiler.compile()
    morphism = program_runtime.morphism
    if not isinstance(morphism, MonadicProgram):
        raise TypeError(
            f"fit: compiled morphism has type "
            f"{type(morphism).__name__}, expected MonadicProgram"
        )
    program = morphism

    observations: dict[str, torch.Tensor] = {}
    observations.update(lens.fixed_column_observations(parsed))
    response_name = parsed.response_name
    observations[response_name] = torch.as_tensor(
        parsed.response_values.copy(), dtype=torch.float32
    ).reshape(-1)
    for group, codes in parsed.group_indices.items():
        observations[f"{_qvr_name(group)}_idx"] = torch.as_tensor(
            list(codes), dtype=torch.long
        )

    torch.manual_seed(seed)
    if method == "svi":
        posterior = _fit_svi(program, observations, num_samples, guide_cls=guide)
    else:
        posterior = _fit_mcmc(
            program,
            observations,
            sampler=method,
            num_warmup=num_warmup,
            num_samples=num_samples,
            num_chains=num_chains,
        )

    return BayesianFit(
        formula=parsed,
        family=family_obj,
        program=program,
        posterior=posterior,
        observations=observations,
        reparameterize=reparameterize,
    )

formula_to_qvr

formula_to_qvr(formula: str, *, data: IntoDataFrame, family: str | Family = 'gaussian', fixed_prior: str = 'Normal(0.0, 5.0)', random_scale_prior: str = 'HalfNormal(1.0)', priors: Mapping[str, str] | None = None, reparameterize: Literal['centered', 'noncentered'] = 'noncentered', path: str | Path | None = None) -> str

Emit .qvr source for a brms-style formula without fitting.

Builds the formula AST → QVR Module via FormulaToQVRModule, then serialises the module via quivers.dsl.emit.module_to_source. Optionally writes the result to path.

Source code in src/quivers/formulas/_fit.py
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
def formula_to_qvr(
    formula: str,
    *,
    data: IntoDataFrame,
    family: str | Family = "gaussian",
    fixed_prior: str = "Normal(0.0, 5.0)",
    random_scale_prior: str = "HalfNormal(1.0)",
    priors: Mapping[str, str] | None = None,
    reparameterize: Literal["centered", "noncentered"] = "noncentered",
    path: str | Path | None = None,
) -> str:
    """Emit ``.qvr`` source for a brms-style formula without fitting.

    Builds the formula AST → QVR Module via `FormulaToQVRModule`,
    then serialises the module via [`quivers.dsl.emit.module_to_source`][quivers.dsl.emit.module_to_source].
    Optionally writes the result to ``path``.
    """
    if isinstance(family, str):
        if family not in families:
            raise ValueError(
                f"formula_to_qvr: unknown family {family!r}; choices are "
                f"{sorted(families)}"
            )
        family_obj = families[family]
    else:
        family_obj = family
    parsed = formula_from_data(formula, data)
    lens = FormulaToQVRModule(
        family_obj,
        fixed_prior=fixed_prior,
        random_scale_prior=random_scale_prior,
        user_priors=priors,
        reparameterize=reparameterize,
    )
    module, _ = lens.forward(parsed)
    source = module_to_source(module)
    if path is not None:
        Path(path).write_text(source)
    return source

formula_from_data

formula_from_data(formula: str, data: IntoDataFrame, *, extra_namespace: Mapping[str, object] | None = None) -> Formula

Build a typed Formula IR by lifting formulae.design_matrices over a dataframe.

This is an adapter, not a parser: the brms-style formula syntax is parsed by the formulae library; we lift its formulae.matrices.DesignMatrices result into a typed didactic record, augmented with deterministic per-group level orderings and integer-code arrays derived from the dataframe.

The R-style numeric transforms (log, exp, sqrt, abs, sin, cos, tan, log10, log2, log1p, expm1, asin, acos, atan, sinh, cosh, tanh) are pre-loaded into the formulae evaluation namespace so users coming from R / brms get the expected base R behaviour without explicit registration. Polynomial terms via poly(x, k) are orthogonal by default, matching R's stats::poly.

PARAMETER DESCRIPTION
formula

Formula string in brms / lme4 syntax.

TYPE: str

data

Pandas, polars, or any other Narwhals-compatible dataframe.

TYPE: IntoDataFrame

extra_namespace

Additional names visible inside the formula's expression evaluation, merged on top of the R-style transforms.

TYPE: Mapping[str, object] DEFAULT: None

Source code in src/quivers/formulas/formula.py
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
def formula_from_data(
    formula: str,
    data: IntoDataFrame,
    *,
    extra_namespace: Mapping[str, object] | None = None,
) -> Formula:
    """Build a typed `Formula` IR by lifting
    `formulae.design_matrices` over a dataframe.

    This is an adapter, not a parser: the brms-style formula syntax
    is parsed by the [`formulae`](https://bambinos.github.io/formulae/)
    library; we lift its `formulae.matrices.DesignMatrices`
    result into a typed didactic record, augmented with deterministic
    per-group level orderings and integer-code arrays derived from
    the dataframe.

    The R-style numeric transforms (``log``, ``exp``, ``sqrt``,
    ``abs``, ``sin``, ``cos``, ``tan``, ``log10``, ``log2``,
    ``log1p``, ``expm1``, ``asin``, ``acos``, ``atan``, ``sinh``,
    ``cosh``, ``tanh``) are pre-loaded into the formulae evaluation
    namespace so users coming from R / brms get the expected base
    R behaviour without explicit registration.  Polynomial terms via
    ``poly(x, k)`` are orthogonal by default, matching R's
    ``stats::poly``.

    Parameters
    ----------
    formula : str
        Formula string in brms / lme4 syntax.
    data : IntoDataFrame
        Pandas, polars, or any other Narwhals-compatible dataframe.
    extra_namespace : Mapping[str, object], optional
        Additional names visible inside the formula's expression
        evaluation, merged on top of the R-style transforms.
    """
    nw_df = nw.from_native(data, eager_only=True)
    pandas_df = nw_df.to_pandas()
    namespace: dict[str, object] = dict(_R_TRANSFORMS)
    if extra_namespace:
        namespace.update(extra_namespace)
    dm = fo.design_matrices(formula, data=pandas_df, extra_namespace=namespace)
    if dm.response is None:
        raise ValueError(
            f"formula_from_data: formula {formula!r} has no response "
            f"variable on the left of `~`"
        )
    response_name = dm.response.name
    n_obs = int(pandas_df.shape[0])

    fixed_columns: list[FixedColumn] = []
    if dm.common is not None:
        for term_name, term in dm.common.terms.items():
            fixed_columns.extend(_explode_term(term_name, term, n_obs))

    random_terms: list[RandomTerm] = []
    group_levels: dict[str, tuple[str, ...]] = {}
    group_indices: dict[str, tuple[int, ...]] = {}
    if dm.group is not None:
        for term_name in dm.group.terms.keys():
            if "|" not in term_name:
                raise ValueError(
                    f"formula_from_data: unexpected random term name "
                    f"{term_name!r}; expected `(slope | group)` syntax"
                )
            slope, group = term_name.split("|", 1)
            slope = slope.strip()
            group = group.strip()
            if slope == "1":
                slope = "Intercept"
            random_terms.append(RandomTerm(slope=slope, group=group))
            if group not in group_levels:
                levels = tuple(
                    str(v) for v in nw_df[group].drop_nulls().unique().sort().to_list()
                )
                group_levels[group] = levels
                level_index = {v: i for i, v in enumerate(levels)}
                codes = tuple(level_index[str(v)] for v in nw_df[group].to_list())
                group_indices[group] = codes

    response_values = (
        np.asarray(dm.response.design_matrix).reshape(-1).astype(np.float64)
    )

    return Formula(
        formula=formula,
        response_name=response_name,
        fixed_columns=tuple(fixed_columns),
        random_terms=tuple(random_terms),
        response_values=response_values,
        group_levels=group_levels,
        group_indices=group_indices,
    )