dpyr

dplyr for Python. The tidyverse's verb vocabulary — filter, mutate, group_by, summarize, joins, across, tidyselect, window functions — as Python method chains, executing on polars or duckdb, with real IDE autocompletion and semantics verified against dplyr itself.

pip install dpyr        # or: uv add dpyr

from dpyr import read, col, n, desc

orders = read({
    "customer": ["ana", "ana", "bo", "bo", "bo", "cy"],
    "amount":   [12.5,  30.0,  8.0,  None, 22.0, 5.0],
    "region":   ["east", "east", "west", "west", "west", "east"],
})

(
    orders
    .filter(col.amount > 6)
    .group_by(col.region)
    .summarize(orders = n(), avg = col.amount.mean())
    .arrange(desc(col.avg))
)

Evaluate that in a notebook and you see rows immediately. Mistype a column name and the error arrives on that exact line, with a did-you-mean suggestion. Put the same chain in a pipeline that only calls .collect() at the end and the whole thing runs as one fused, pushed-down query.
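To make the two behaviours above concrete, here is a minimal standard-library sketch of the general pattern: column names are validated when a verb is called, while the actual work is deferred until collect(). This is not dpyr's implementation or API. `LazyTable`, its `filter` signature, and the error wording are hypothetical names chosen for illustration, and the suggestion logic simply assumes `difflib.get_close_matches` is good enough.

```python
import difflib


class LazyTable:
    """Toy stand-in (hypothetical): checks names eagerly, runs steps on collect()."""

    def __init__(self, rows, steps=()):
        self.rows = rows                              # list of dicts, one per row
        self.columns = list(rows[0]) if rows else []
        self.steps = tuple(steps)                     # deferred row predicates

    def filter(self, column, predicate):
        # Validate immediately, so a typo fails on the line that introduced it.
        if column not in self.columns:
            close = difflib.get_close_matches(column, self.columns, n=1)
            hint = f" Did you mean {close[0]!r}?" if close else ""
            raise LookupError(f"Unknown column {column!r}.{hint}")

        def step(row):
            # Missing values never satisfy a comparison.
            return row[column] is not None and predicate(row[column])

        # Record the step instead of running it; nothing touches the rows yet.
        return LazyTable(self.rows, self.steps + (step,))

    def collect(self):
        # One pass over the data that applies every recorded filter.
        return [row for row in self.rows if all(s(row) for s in self.steps)]


table = LazyTable([
    {"customer": "ana", "amount": 12.5},
    {"customer": "bo", "amount": None},
    {"customer": "cy", "amount": 5.0},
])

query = table.filter("amount", lambda v: v > 6)   # nothing has run yet
print(query.collect())                             # [{'customer': 'ana', 'amount': 12.5}]

try:
    table.filter("amuont", lambda v: v > 6)        # typo is caught at call time
except LookupError as err:
    print(err)                                     # Unknown column 'amuont'. Did you mean 'amount'?
```

A real engine would hand the recorded steps to a query optimizer rather than loop in Python, but the split between early validation and late execution is the same idea.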

Where to go

Get started — the single-table verbs in one sitting.
Reading & writing — every format and source: files, databases, URLs, in-memory objects.
Grouped data — group_by and everything it changes.
Joins — the six two-table verbs.
Window functions — lag, ranks, cumulatives, top-n per group.
Column-wise operations — tidyselect and across.
Reshaping — pivots, separate, unite.
Expressions & autocompletion — how the col proxy and typed completion work.
Backends — connecting and operating polars and duckdb, plus ML interop: Hugging Face datasets, numpy, torch, jax.

Why trust it

Every release passes three layers of verification: differential tests against real dplyr (committed golden files generated by R), a Hypothesis fuzzer that runs random verb chains on both engines and requires identical results, and a written record in SEMANTICS.md of every deliberate divergence from dplyr, so that none is left to chance.
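The fuzzing layer can be pictured with a small sketch. The version below is an assumption-laden toy, not the project's test suite: it uses the standard library's `random` module in place of Hypothesis, and two hand-written functions (`engine_a`, `engine_b`, both hypothetical) stand in for the polars and duckdb backends. Each computes a grouped mean in a different way, and the fuzzer requires them to agree on every randomly generated input.

```python
import random
from itertools import groupby


def engine_a(rows):
    """Grouped mean via dictionary accumulation, skipping missing values."""
    totals = {}
    for key, value in rows:
        if value is None:
            continue
        running_sum, count = totals.get(key, (0, 0))
        totals[key] = (running_sum + value, count + 1)
    return {key: s / c for key, (s, c) in totals.items()}


def engine_b(rows):
    """Same result computed differently: sort by key, then group neighbours."""
    present = sorted((r for r in rows if r[1] is not None), key=lambda r: r[0])
    result = {}
    for key, group in groupby(present, key=lambda r: r[0]):
        values = [value for _, value in group]
        result[key] = sum(values) / len(values)
    return result


def random_rows(rng, max_rows=20):
    """Generate (region, amount) pairs, with roughly half the amounts missing."""
    keys = ["east", "west", "north"]
    return [
        (rng.choice(keys), rng.choice([None, rng.randint(-50, 50)]))
        for _ in range(rng.randint(0, max_rows))
    ]


def fuzz(trials=500, seed=0):
    """Run both engines on random inputs and fail on the first disagreement."""
    rng = random.Random(seed)
    for _ in range(trials):
        rows = random_rows(rng)
        # Include the offending input in the failure message for reproduction.
        assert engine_a(rows) == engine_b(rows), rows
    return trials


fuzz()
```

Integer amounts keep the comparison exact here. With floating-point data, two engines can legitimately differ in the last bits, so a real harness would need a tolerance or a documented rule for such cases.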