dpyr

dplyr for Python. The tidyverse's verb vocabulary — filter, mutate, group_by, summarize, joins, across, tidyselect, window functions — as Python method chains, executing on polars or duckdb, with real IDE autocompletion and semantics verified against dplyr itself.

pip install dpyr        # or: uv add dpyr

from dpyr import read, col, n, desc

orders = read({
    "customer": ["ana", "ana", "bo", "bo", "bo", "cy"],
    "amount":   [12.5,  30.0,  8.0,  None, 22.0, 5.0],
    "region":   ["east", "east", "west", "west", "west", "east"],
})

(
    orders
    .filter(col.amount > 6)
    .group_by(col.region)
    .summarize(orders=n(), avg=col.amount.mean())
    .arrange(desc(col.avg))
)
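
To make the expected semantics concrete, here is a plain-Python reference computation of what the chain above should return on this data. It is an illustrative sketch only, not how dpyr executes anything, and it assumes dplyr's rule that filter() drops rows where the condition is missing (bo's None amount).

```python
from statistics import mean

# The same toy data as the example above, as plain Python rows.
rows = [
    {"customer": "ana", "amount": 12.5, "region": "east"},
    {"customer": "ana", "amount": 30.0, "region": "east"},
    {"customer": "bo", "amount": 8.0, "region": "west"},
    {"customer": "bo", "amount": None, "region": "west"},
    {"customer": "bo", "amount": 22.0, "region": "west"},
    {"customer": "cy", "amount": 5.0, "region": "east"},
]

# filter(col.amount > 6): as in dplyr, a missing comparison counts as
# not-true, so the None row is dropped instead of raising an error.
kept = [r for r in rows if r["amount"] is not None and r["amount"] > 6]

# group_by(col.region) + summarize(orders=n(), avg=col.amount.mean())
groups: dict[str, list[float]] = {}
for r in kept:
    groups.setdefault(r["region"], []).append(r["amount"])
summary = [
    {"region": region, "orders": len(amounts), "avg": mean(amounts)}
    for region, amounts in groups.items()
]

# arrange(desc(col.avg)): largest average first.
summary.sort(key=lambda s: s["avg"], reverse=True)

for s in summary:
    print(s)
# {'region': 'east', 'orders': 2, 'avg': 21.25}
# {'region': 'west', 'orders': 2, 'avg': 15.0}
```

Both regions keep two rows after the filter: east loses cy's 5.0, and west loses the missing amount.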

Evaluate that in a notebook and you see rows immediately. Mistype a column name and the error arrives on that exact line, with a did-you-mean suggestion. Put the same chain in a pipeline that only calls .collect() at the end and the whole thing runs as one fused, pushed-down query.
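
A did-you-mean check of this kind can be approximated with the standard library's difflib. The sketch below is a hypothetical illustration, not dpyr's implementation: check_column and UnknownColumnError are names invented here, and dpyr's real error type and message may differ. What it shows is that validating a column name when the verb is called, rather than at .collect(), is what lets the error point at the offending line.

```python
import difflib


class UnknownColumnError(LookupError):
    """Hypothetical error type for a reference to a column that does not exist."""


def check_column(name: str, columns: list[str]) -> str:
    """Hypothetical helper: return `name` if known, else raise with a suggestion."""
    if name in columns:
        return name
    # get_close_matches keeps candidates whose similarity ratio is >= 0.6 by default.
    close = difflib.get_close_matches(name, columns, n=1)
    hint = f"; did you mean {close[0]!r}?" if close else ""
    raise UnknownColumnError(f"no column named {name!r}{hint}")


columns = ["customer", "amount", "region"]
try:
    check_column("amout", columns)  # typo for "amount"
except UnknownColumnError as err:
    print(err)  # no column named 'amout'; did you mean 'amount'?
```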

Why trust it

Every release passes a three-layer proof: differential tests against real dplyr (committed golden files generated by R), a Hypothesis fuzzer that runs random verb chains on both engines and demands identical results, and the deliberate divergences written down in SEMANTICS.md rather than left to chance.
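
The differential idea behind the fuzzer can be sketched in a few lines: run the same random verb chain through two independent implementations and require identical output. In this sketch the two "engines" are toy stand-ins written for illustration (a row-oriented and an index-oriented implementation of filter and arrange), and the standard library's random module replaces Hypothesis, so it shows the shape of the check rather than dpyr's actual test suite.

```python
import random

# A table is a dict of equal-length numeric columns; None marks a missing value.
# A chain is a list of (verb, column, arg) steps; for "arrange", arg means desc.


def row_engine(table, chain):
    """Toy engine 1: materialize row dicts and apply each verb to them."""
    names = list(table)
    rows = [dict(zip(names, vals)) for vals in zip(*table.values())]
    for verb, column, arg in chain:
        if verb == "filter":
            # A missing value never satisfies the predicate (dplyr's filter rule).
            rows = [r for r in rows if r[column] is not None and r[column] > arg]
        elif verb == "arrange":
            sign = -1 if arg else 1
            # Missing values sort last in both directions, as in dplyr's arrange().
            rows = sorted(rows, key=lambda r: (r[column] is None, sign * (r[column] or 0)))
    return rows


def index_engine(table, chain):
    """Toy engine 2: leave columns untouched and track selected row indices."""
    idx = list(range(len(next(iter(table.values())))))
    for verb, column, arg in chain:
        values = table[column]
        if verb == "filter":
            idx = [i for i in idx if values[i] is not None and values[i] > arg]
        elif verb == "arrange":
            present = [i for i in idx if values[i] is not None]
            missing = [i for i in idx if values[i] is None]
            # sorted() stays stable with reverse=True, so ties keep their order.
            idx = sorted(present, key=lambda i: values[i], reverse=arg) + missing
    return [{name: col[i] for name, col in table.items()} for i in idx]


def random_table(rng):
    n = rng.randint(0, 8)
    return {name: [rng.choice([None, *range(-3, 4)]) for _ in range(n)] for name in ("x", "y")}


def random_chain(rng, columns):
    chain = []
    for _ in range(rng.randint(1, 4)):
        column = rng.choice(columns)
        if rng.random() < 0.5:
            chain.append(("filter", column, rng.randint(-3, 3)))
        else:
            chain.append(("arrange", column, rng.random() < 0.5))
    return chain


def fuzz(trials=500, seed=0):
    """Demand identical results from both engines on random chains."""
    rng = random.Random(seed)
    for _ in range(trials):
        table = random_table(rng)
        chain = random_chain(rng, list(table))
        expected = row_engine(table, chain)
        actual = index_engine(table, chain)
        assert expected == actual, f"engines disagree on {chain} for {table}"
    return trials


print(fuzz(), "random chains agreed")
```

A shrinking fuzzer such as Hypothesis adds value on top of this loop by reducing a failing table and chain to a minimal counterexample.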