In-memory objects
Everything tabular that's already in your Python session goes through
the same door. read() dispatches on the object's type, so there are no flags to set:
from dpyr import read, col
read({"x": [1, 2], "y": ["a", "b"]}) # plain Python data
read(pandas_dataframe) # near-zero copy
read(polars_dataframe) # zero copy
read(arrow_table) # zero copy
read(numpy_2d_array) # columns column_0, column_1, ...
read(torch_or_jax_tensor) # CPU tensors
read(hf_dataset) # Hugging Face, arrow-backed
This is the everyday bridge from other libraries: whatever a colleague's
pandas script produces, read() it and continue with dpyr verbs.
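Dispatching on an object's type is the same idea as Python's standard `functools.singledispatch`. The sketch below is a hypothetical illustration of that pattern, not dpyr's implementation. `to_columns` is an invented name. A list of equal-length rows stands in for a 2-D array so the example needs only the standard library, and the fallback error only borrows the wording of dpyr's message.

```python
from functools import singledispatch
from typing import Any

# Hypothetical dispatcher: turns a supported object into {column name: values}.
# It illustrates type-based dispatch only; it is not dpyr's read().


@singledispatch
def to_columns(obj: Any) -> dict:
    # Fallback: no handler is registered for this object's type.
    raise TypeError(f"read() doesn't know what to do with {type(obj).__name__}")


@to_columns.register(dict)
def _(obj: dict) -> dict:
    # A dict of lists is already column-oriented: keys become column names.
    return {str(name): list(values) for name, values in obj.items()}


@to_columns.register(list)
def _(obj: list) -> dict:
    # A list of equal-length rows stands in for a 2-D array; columns get
    # positional names column_0, column_1, ...
    n_cols = len(obj[0]) if obj else 0
    return {f"column_{j}": [row[j] for row in obj] for j in range(n_cols)}


print(to_columns({"x": [1, 2]}))          # {'x': [1, 2]}
print(to_columns([[1, "a"], [2, "b"]]))   # {'column_0': [1, 2], 'column_1': ['a', 'b']}
```

Adding support for a new kind of object is then one more registered handler, which is why a single entry point needs no flags.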
Dicts: the quick way to test something
A dict of lists is the fastest table you can type, which makes it ideal for trying a verb or building a tiny example:
trees = read({
    "species": ["sugar maple", "red oak"],
    "height_m": [24.0, 19.5],
})
Keys become column names; lists must all have the same length (the error tells you which one doesn't match).
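A length check that names the offending column is simple to write. The helper below is a hypothetical sketch of that kind of validation, assuming the first column sets the expected length. `check_lengths` and its message format are invented for illustration and are not dpyr's.

```python
# Hypothetical helper: validates a dict of lists and names the first column
# whose length differs from the first column's length.
def check_lengths(columns: dict) -> int:
    lengths = {name: len(values) for name, values in columns.items()}
    if not lengths:
        return 0
    # The first column defines the expected length.
    first_name, expected = next(iter(lengths.items()))
    for name, n in lengths.items():
        if n != expected:
            raise ValueError(
                f"column {name!r} has length {n}, "
                f"but {first_name!r} has length {expected}"
            )
    return expected


print(check_lengths({"species": ["sugar maple", "red oak"], "height_m": [24.0, 19.5]}))  # 2
```

Calling it with `"height_m": [24.0]` instead raises `ValueError: column 'height_m' has length 1, but 'species' has length 2`.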
Hugging Face datasets
A DatasetDict holds named splits. As everywhere in dpyr where a
source holds several tables, the second argument picks one:
from datasets import load_dataset
dd = load_dataset("user/dataset")
train = read(dd, "train")
A single Dataset (already one split) needs no second argument.
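The rule "a source with several tables needs a name; a single table must not get one" can be sketched in a few lines. Everything here is hypothetical: `Splits` is an invented stand-in for a container of named tables such as a DatasetDict, and `pick_table` is not a dpyr function. A plain dict is deliberately treated as a single table, matching how read() treats a dict of lists.

```python
from typing import Any, Optional


class Splits(dict):
    """Hypothetical stand-in for a container of named tables (e.g. splits)."""


def pick_table(source: Any, table: Optional[str] = None) -> Any:
    # Multi-table source: a name is required and must exist.
    if isinstance(source, Splits):
        if table is None:
            raise ValueError(f"source holds several tables; pick one of {sorted(source)}")
        if table not in source:
            raise KeyError(f"no table named {table!r}; available: {sorted(source)}")
        return source[table]
    # Single-table source (a dict of lists, a dataframe, ...): no name allowed.
    if table is not None:
        raise TypeError("table=... only applies to sources that hold several tables")
    return source


splits = Splits(train=["train rows"], test=["test rows"])
print(pick_table(splits, "train"))   # ['train rows']
print(pick_table({"x": [1, 2]}))     # {'x': [1, 2]}
```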
Getting back out
collect() returns a polars DataFrame, and a dataframe can also exit directly
through to_pandas(), to_numpy(), to_torch(), and to_jax(). The backends guide
shows the ML round-trips in detail.
When things go wrong
- read() doesn't know what to do with <SomeType>: the object isn't one of the supported kinds, and the error lists them. Convert it to a dict or a pandas/polars dataframe first.
- read(table=...) only applies to database sources and Hugging Face dataset splits: the second argument means nothing for a dict or a dataframe, so drop it.
- GPU tensors: move them to CPU first with tensor.cpu(); dpyr reads CPU memory.