Skip to content

Databases

A duckdb file is the modern "folder of CSVs": one file, many typed tables, instant open, no server and no login. If your project has outgrown a pile of spreadsheets, this is the next step — and dpyr treats the file like any other path.

duckdb files (.db, .duckdb, .ddb)

A database holds many tables, so this is where read()'s second argument earns its keep — it names the table:

from dpyr import read, col, n

db = read("forest.db")            # the whole catalog
plots = read("forest.db", "plots")  # one table, as a frame

Reading the catalog gives a Database object: db.tables lists the table names, and each table is an attribute —

print(db.tables)                  # ['plots', 'tap_summary']
db.plots.filter(col.acres > 2).collect()

Printing db shows every table with its columns and types, so when you don't know what's inside a file, print(read("mystery.db")) is the answer.

Writing

Writing into a database requires a table name, and dpyr reminds you if you forget:

summary.write("forest.db", "tap_summary")   # creates/replaces the table
summary.write("forest.db")
# DpyrError: write('forest.db') needs a table name: write('forest.db', 'orders')

The file is created if it doesn't exist, so dataframe.write("new.db", "results") is all it takes to start a database.

SQLite files (.sqlite, .sqlite3)

Lots of instruments and older tools hand you SQLite. dpyr opens these read-only through duckdb's sqlite scanner — same shape as duckdb files:

db = read("legacy.sqlite")               # catalog
users = read("legacy.sqlite", "users")   # one table

The first use downloads duckdb's sqlite extension, so it needs network access once. If the file isn't actually SQLite (or the extension can't load), the error says so:

DpyrError: could not open 'legacy.sqlite' as sqlite via duckdb's sqlite extension: ...

There's no SQLite writer — write to a duckdb file instead.

Live connections

If you already hold a duckdb.connect() connection — in-memory work, a database with attached remotes, custom settings — read() takes it directly:

import duckdb
con = duckdb.connect("forest.db")

db = read(con)                    # catalog, like read("forest.db")
orders = read(con, "orders")      # one table
db.sql("SELECT * FROM plots WHERE acres > 2")   # arbitrary SQL -> frame

Connections to server databases (Postgres, MySQL) aren't directly supported — but duckdb can ATTACH them, and read(con) then sees their tables. The backends guide covers landing results inside an engine with to_table() and to_view().

When things go wrong

  • read_duckdb: no such file 'forest.db' — reading never creates a database (writing does). Check the path.
  • Unknown table nameprint(read("file.db")) lists what's actually in there; names are case-sensitive.
  • Joining tables from two different database files — dataframes from different connections can't meet in one query; read both tables from the same file/connection, or see Joins for how dpyr bridges across.