Excel & Google Sheets

Spreadsheets are where a lot of science lives. dpyr treats an .xlsx workbook like a small database: one sheet reads as a dataframe, several sheets open as a catalog, and writing adds or replaces one sheet at a time without destroying the rest. Google Sheets work the same way — paste the browser URL into read().

Setup (one time)

Excel support is an optional extra — install it once:

pip install 'dpyr[excel]'        # or: uv pip install 'dpyr[excel]'

If you forget, dpyr tells you exactly that:

DpyrError: reading .xlsx needs the excel extra: pip install 'dpyr[excel]'

Reading

A workbook with one sheet behaves like a CSV — you get the dataframe:

from dpyr import read, col

field = read("field_data.xlsx")          # the single sheet, as a frame

A workbook with several sheets opens as a Workbook catalog, the same shape as a database: sheets are attributes, with tab completion and did-you-mean on typos.

wb = read("report.xlsx")
print(wb)

# excel workbook: report.xlsx
#   2024 plots
#   notes
# read a sheet: wb.sheet(name), wb[name], or wb.<name>

wb.sheets                  # ['2024 plots', 'notes']
wb.notes                   # plain names work as attributes
wb["2024 plots"]           # names with spaces use [...] or .sheet(...)

To skip the catalog, name the sheet directly as read()'s second argument — and a wrong name tells you what the workbook holds:

read("report.xlsx", "2024 plots")
read("report.xlsx", "2024 plot")
# DpyrError: no sheet named '2024 plot' in 'report.xlsx'.
# Did you mean '2024 plots'? Sheets: 2024 plots, notes

Sheet names must match exactly as they appear on the tab in Excel, including spaces and capitalization.

Writing

The second argument names the worksheet; without it you get Sheet1:

summary.write("results.xlsx")             # one sheet named "Sheet1"
summary.write("results.xlsx", "by_site")  # one sheet named "by_site"

Writing into an existing workbook replaces only the named sheet and keeps the others, so building a multi-sheet report is just repeated writes:

plots.write("report.xlsx", "plots")
notes.write("report.xlsx", "notes")       # plots sheet survives
plots2.write("report.xlsx", "plots")      # replaced in place, notes survives

One honest caveat, and dpyr warns about it: the carried-over sheets are rewritten from their values, so any hand-applied cell formatting (colors, column widths, formulas) in them is lost. For formatting-heavy workbooks, write your data to its own file and link it from the formatted one.

Google Sheets

Copy the sheet's URL straight from the browser's address bar — any docs.google.com/spreadsheets/... link works, no API keys and no client libraries:

wb = read("https://docs.google.com/spreadsheets/d/1AbC.../edit?gid=0")
wb.sheets                          # same Workbook catalog as a local file
read("https://docs.google.com/spreadsheets/d/1AbC.../edit", "plots")

The whole workbook is fetched through Google's export endpoint, so sheet selection, the catalog, and the error messages are identical to local Excel. Two things to know:

The sheet must be link-readable. In Google Sheets: Share → "Anyone with the link" (Viewer). A private sheet fails with exactly that instruction — dpyr can't log into your Google account:
```
DpyrError: this Google Sheet is not link-readable: '...'. In Google
Sheets use Share -> 'Anyone with the link' (Viewer), or
File -> Download -> .xlsx and read the file
```
Reads are a snapshot. Each read() downloads the current state of the sheet; collaborators' later edits appear on the next read().

There is no writing back to Google Sheets — write an .xlsx and import it, or keep results in parquet/duckdb.

When things go wrong

Missing file — DpyrError: read('field_data.xlsx'): no such file. Check the spelling and your working directory.
Wrong sheet name — the error names the available sheets and suggests the closest match (see above).
Numbers come in as text (or dates as numbers) — usually the spreadsheet itself has mixed content in the column, often a stray note typed into a cell. Clean the column in Excel, or cast after reading: .mutate(x=col.x.cast(float)).
Headers aren't on row 1 — dpyr expects the column names in the first row. Title rows and merged cells above the data confuse the reader; delete them in Excel first.

Good to know

Excel files are read eagerly (the whole sheet is parsed up front), unlike CSV and parquet which are lazy scans. Fine for the sizes Excel handles anyway.
If a workbook is a one-time import, read it once and .write() to parquet — subsequent reads are faster and typed. For a many-table result that stays in the data world, a duckdb file is the better container; reach for multi-sheet Excel when the audience is a human with a spreadsheet program.