dpyr
dplyr for Python. The tidyverse's verb vocabulary — filter, mutate,
group_by, summarize, joins, across, tidyselect, window functions — as
Python method chains, executing on polars or duckdb, with real IDE
autocompletion and semantics verified against dplyr itself.
pip install dpyr # or: uv add dpyr
from dpyr import read, col, n, desc
orders = read({
    "customer": ["ana", "ana", "bo", "bo", "bo", "cy"],
    "amount": [12.5, 30.0, 8.0, None, 22.0, 5.0],
    "region": ["east", "east", "west", "west", "west", "east"],
})
(
    orders
    .filter(col.amount > 6)
    .group_by(col.region)
    .summarize(orders = n(), avg = col.amount.mean())
    .arrange(desc(col.avg))
)
Evaluate that in a notebook and you see rows immediately. Mistype a column
name and the error arrives on that exact line, with a did-you-mean
suggestion. Put the same chain in a pipeline that only calls .collect()
at the end and the whole thing runs as one fused, pushed-down query.
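The last two behaviours, a column error raised on the line that introduces it and execution deferred until .collect(), can be pictured with a toy in plain Python. This is a sketch under assumptions, not dpyr's implementation: ToyLazyTable, UnknownColumnError, and the string-plus-predicate form of filter are hypothetical names invented for illustration, and the suggestion comes from the standard library's difflib rather than whatever dpyr uses internally.

```python
from difflib import get_close_matches


class UnknownColumnError(LookupError):
    """Hypothetical error type; dpyr's real exception may differ."""


class ToyLazyTable:
    """Hypothetical stand-in for a lazy table, not dpyr's real class."""

    def __init__(self, columns, rows, steps=()):
        self.columns = list(columns)  # schema is known before any row is read
        self._rows = rows             # list of dicts, one per row
        self._steps = tuple(steps)    # recorded (column, predicate) filters

    def filter(self, column, predicate):
        # Validate against the schema now, so a typo fails on this call
        # rather than later inside collect().
        if column not in self.columns:
            close = get_close_matches(column, self.columns, n=1)
            hint = f" Did you mean {close[0]!r}?" if close else ""
            raise UnknownColumnError(f"Unknown column {column!r}.{hint}")
        step = (column, predicate)
        return ToyLazyTable(self.columns, self._rows, self._steps + (step,))

    def collect(self):
        # Every recorded filter is applied in a single pass over the rows.
        return [
            row for row in self._rows
            if all(keep(row[name]) for name, keep in self._steps)
        ]


orders = ToyLazyTable(
    ["customer", "amount"],
    [{"customer": "ana", "amount": 12.5}, {"customer": "cy", "amount": 5.0}],
)

pending = orders.filter("amount", lambda v: v > 6)  # no rows touched yet
result = pending.collect()                          # work happens here

try:
    orders.filter("amuont", lambda v: v > 6)        # typo, caught immediately
    message = ""
except UnknownColumnError as err:
    message = str(err)

print(result)
print(message)
```

The design point is that the schema is available before any data is read, so name checks can run eagerly even when row processing is deferred.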
Where to go
- Get started — the single-table verbs in one sitting.
- Reading & writing — every format and source: files, databases, URLs, in-memory objects.
- Grouped data — group_by and everything it changes.
- Joins — the six two-table verbs.
- Window functions — lag, ranks, cumulatives, top-n per group.
- Column-wise operations — tidyselect and across.
- Reshaping — pivots, separate, unite.
- Expressions & autocompletion — how the col proxy and typed completion work.
- Backends — connecting and operating polars and duckdb, plus ML interop: Hugging Face datasets, numpy, torch, jax.
Why trust it
Every release passes a three-layer proof: differential tests against real dplyr (committed golden files generated by R), a Hypothesis fuzzer that runs random verb chains on both engines and demands identical results, and a written record of every deliberate divergence in SEMANTICS.md, so that none is left to chance.
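The fuzzing layer is a form of differential testing: generate a random verb chain, run it on two implementations, and fail on any disagreement. A minimal sketch of the idea follows, under loud assumptions: it uses the standard library's random module instead of Hypothesis, and run_eager, run_lazy, and the three toy verbs are hypothetical stand-ins for the two real engines, not dpyr code.

```python
import random
from itertools import islice

# Toy verbs over rows shaped like {"x": int or None}. Both "engines" are
# hypothetical stand-ins for two real backends.


def eager_step(rows, verb, arg):
    if verb == "filter_gt":
        # A missing value never satisfies the comparison, so the row is dropped.
        return [r for r in rows if r["x"] is not None and r["x"] > arg]
    if verb == "mutate_add":
        return [{"x": None if r["x"] is None else r["x"] + arg} for r in rows]
    if verb == "head":
        return rows[:arg]
    raise ValueError(verb)


def lazy_step(stream, verb, arg):
    # `arg` is a parameter here, so each generator keeps its own value.
    if verb == "filter_gt":
        return (r for r in stream if r["x"] is not None and r["x"] > arg)
    if verb == "mutate_add":
        return ({"x": None if r["x"] is None else r["x"] + arg} for r in stream)
    if verb == "head":
        return islice(stream, arg)
    raise ValueError(verb)


def run_eager(rows, chain):
    out = list(rows)
    for verb, arg in chain:
        out = eager_step(out, verb, arg)
    return out


def run_lazy(rows, chain):
    stream = iter(rows)
    for verb, arg in chain:
        stream = lazy_step(stream, verb, arg)
    return list(stream)  # rows are only pulled through the chain here


def random_case(rng):
    rows = [
        {"x": rng.choice([None, rng.randint(-5, 10)])}
        for _ in range(rng.randint(0, 8))
    ]
    verbs = ["filter_gt", "mutate_add", "head"]
    chain = [
        (rng.choice(verbs), rng.randint(0, 5))
        for _ in range(rng.randint(0, 5))
    ]
    return rows, chain


def fuzz(trials=500, seed=0):
    rng = random.Random(seed)  # seeded so a failure can be replayed
    for _ in range(trials):
        rows, chain = random_case(rng)
        # On disagreement, the failing input is reported with the assertion.
        assert run_eager(rows, chain) == run_lazy(rows, chain), (rows, chain)
    return trials


fuzz()
```

A property-based library such as Hypothesis adds input shrinking on top of this loop, which reduces a failing chain to a smaller reproducer.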