Skip to content

Excel & Google Sheets

Spreadsheets are where a lot of science lives. dpyr treats an .xlsx workbook like a small database: one sheet reads as a dataframe, several sheets open as a catalog, and writing adds or replaces one sheet at a time without destroying the rest. Google Sheets work the same way — paste the browser URL into read().

Setup (one time)

Excel support is an optional extra — install it once:

pip install 'dpyr[excel]'        # or: uv pip install 'dpyr[excel]'

If you forget, dpyr tells you exactly that:

DpyrError: reading .xlsx needs the excel extra: pip install 'dpyr[excel]'

Reading

A workbook with one sheet behaves like a CSV — you get the dataframe:

from dpyr import read, col

field = read("field_data.xlsx")          # the single sheet, as a frame

A workbook with several sheets opens as a Workbook catalog, the same shape as a database: sheets are attributes, with tab completion and did-you-mean on typos.

wb = read("report.xlsx")
print(wb)
# excel workbook: report.xlsx
#   2024 plots
#   notes
# read a sheet: wb.sheet(name), wb[name], or wb.<name>
wb.sheets                  # ['2024 plots', 'notes']
wb.notes                   # plain names work as attributes
wb["2024 plots"]           # names with spaces use [...] or .sheet(...)

To skip the catalog, name the sheet directly as read()'s second argument — and a wrong name tells you what the workbook holds:

read("report.xlsx", "2024 plots")
read("report.xlsx", "2024 plot")
# DpyrError: no sheet named '2024 plot' in 'report.xlsx'.
# Did you mean '2024 plots'? Sheets: 2024 plots, notes

Sheet names must match exactly as they appear on the tab in Excel, including spaces and capitalization.

Writing

The second argument names the worksheet; without it you get Sheet1:

summary.write("results.xlsx")             # one sheet named "Sheet1"
summary.write("results.xlsx", "by_site")  # one sheet named "by_site"

Writing into an existing workbook replaces only the named sheet and keeps the others, so building a multi-sheet report is just repeated writes:

plots.write("report.xlsx", "plots")
notes.write("report.xlsx", "notes")       # plots sheet survives
plots2.write("report.xlsx", "plots")      # replaced in place, notes survives

One honest caveat, and dpyr warns about it: the carried-over sheets are rewritten from their values, so any hand-applied cell formatting (colors, column widths, formulas) in them is lost. For formatting-heavy workbooks, write your data to its own file and link it from the formatted one.

Google Sheets

Copy the sheet's URL straight from the browser's address bar — any docs.google.com/spreadsheets/... link works, no API keys and no client libraries:

wb = read("https://docs.google.com/spreadsheets/d/1AbC.../edit?gid=0")
wb.sheets                          # same Workbook catalog as a local file
read("https://docs.google.com/spreadsheets/d/1AbC.../edit", "plots")

The whole workbook is fetched through Google's export endpoint, so sheet selection, the catalog, and the error messages are identical to local Excel. Two things to know:

  • The sheet must be link-readable. In Google Sheets: Share → "Anyone with the link" (Viewer). A private sheet fails with exactly that instruction — dpyr can't log into your Google account:

    DpyrError: this Google Sheet is not link-readable: '...'. In Google
    Sheets use Share -> 'Anyone with the link' (Viewer), or
    File -> Download -> .xlsx and read the file
    
  • Reads are a snapshot. Each read() downloads the current state of the sheet; collaborators' later edits appear on the next read().

There is no writing back to Google Sheets — write an .xlsx and import it, or keep results in parquet/duckdb.

When things go wrong

  • Missing fileDpyrError: read('field_data.xlsx'): no such file. Check the spelling and your working directory.
  • Wrong sheet name — the error names the available sheets and suggests the closest match (see above).
  • Numbers come in as text (or dates as numbers) — usually the spreadsheet itself has mixed content in the column, often a stray note typed into a cell. Clean the column in Excel, or cast after reading: .mutate(x=col.x.cast(float)).
  • Headers aren't on row 1 — dpyr expects the column names in the first row. Title rows and merged cells above the data confuse the reader; delete them in Excel first.

Good to know

  • Excel files are read eagerly (the whole sheet is parsed up front), unlike CSV and parquet which are lazy scans. Fine for the sizes Excel handles anyway.
  • If a workbook is a one-time import, read it once and .write() to parquet — subsequent reads are faster and typed. For a many-table result that stays in the data world, a duckdb file is the better container; reach for multi-sheet Excel when the audience is a human with a spreadsheet program.