
Add ADBC reader behind 'adbc' feature #422

Merged
georgestagg merged 16 commits into posit-dev:main from jimhester:pr1-adbc-reader
May 14, 2026

Conversation

@jimhester
Contributor

@jimhester jimhester commented May 1, 2026

Adds a generic `AdbcReader<D: Driver>` to `src/reader/`, behind a new off-by-default `adbc` feature. The reader is generic over `adbc_core::sync::Driver`, so concrete ADBC drivers (Flight SQL, Snowflake, BigQuery, etc.) compose at the call site.

Strategic context and the broader 4-PR plan live at #341 (comment).

Testing

  • Default unit tests (cargo test --features adbc --lib): in-process adbc_datafusion covers routing, conversion, dialect, multi-batch / null / empty-result handling.

  • SQLite ADBC equivalence suite (gated #[ignore]): loads the official Apache Arrow SQLite ADBC driver via ManagedDriver::load_from_name("sqlite", ...) and asserts that AdbcReader<sqlite> matches ggsql's existing SqliteReader on the same query against the same tempfile-backed SQLite database. Run locally with:

    # install dbc via instructions at https://docs.columnar.tech/dbc/
    curl -LsSf https://dbc.columnar.tech/install.sh | sh
    dbc install sqlite
    cargo test --features "adbc sqlite" --lib -- --ignored equivalence

    CI installs both and runs the equivalence suite automatically (see the new step in .github/workflows/build.yaml).

Scope

Rust API only — CLI / Jupyter URI plumbing for ADBC is deferred to a follow-up PR.

adbc is intentionally not added to all-readers here; happy to flip if you'd prefer it included.


jimhester added 11 commits May 1, 2026 12:33
New off-by-default 'adbc' feature. Implements ggsql's Reader trait over
any adbc_core::sync::Driver: execute_sql, register, unregister, dialect.
Bridges ADBC's &mut Statement API to Reader's &self via RefCell on the
Connection (same pattern as OdbcReader).

Tests are added in subsequent commits.
adbc_datafusion is added as a dev-dep so unit tests can construct an
in-process AdbcReader without dynamically loading any native driver.

Also enables the workspace `arrow` crate's `ipc` feature when `adbc`
is on, since the reader's IPC roundtrip uses both arrow-56 (workspace)
and arrow-58 (direct) IPC paths.
ggsql's execute pipeline emits CREATE OR REPLACE TEMP TABLE for
layer/stat materialization. adbc_datafusion 0.23 rejects that with
NotImplemented("Temporary tables not supported") and unwraps the
result inside Statement::execute, so the unit-test process panics.
The full pipeline works against drivers that support TEMP TABLE
(DuckDB, Trino, etc.) — see the equivalence-test path for that.
The spike-result note and PR-description draft live in the Netflix
fork's records and in this branch's git history; they don't need to
land in the upstream PR tree. Cleaner diff for review.
Loads the official Apache Arrow SQLite ADBC driver via
`ManagedDriver::load_from_name("sqlite", ...)` (installed by `dbc install
sqlite` to the platform manifest path) and asserts that
`AdbcReader<ManagedDriver>` produces output identical to ggsql's
`SqliteReader` on the same query against the same tempfile-backed SQLite
database.

Three tests cover:

- `equiv_simple_select`: scalar projection (int, text, float)
- `equiv_register_and_query`: register-then-query through the standard
  ADBC bulk-ingest path against a real C driver
- `equiv_nulls`: mixed null + typed values

All `#[ignore]`'d so the default `cargo test` is unchanged. Run with
`cargo test --features "adbc sqlite" -- --ignored equivalence` after
installing the driver.

`register()` now sets `IngestMode::Append` on each batch statement so the
SQLite ADBC driver doesn't try to re-CREATE the table our SQL DDL
already created. DataFusion 0.23 returns `Status::NotFound` from
`set_option(IngestMode, ...)`; that's tolerated and the driver's default
append behavior takes over.

Surfaces one pre-existing divergence: `SqliteReader` types an
exclusively-NULL column as `Utf8` (no values to infer from), while
`AdbcReader` correctly carries through the declared projection type. The
nulls test mixes NULLs with typed values to steer around this.
Adds two cargo test runs to the existing build job:
- `cargo test --features "adbc sqlite" --lib` — exercises the
  adbc_datafusion-backed unit tests
- `cargo test --features "adbc sqlite" --lib -- --ignored equivalence`
  — exercises the SQLite ADBC equivalence suite against the official
  Apache Arrow SQLite ADBC driver, installed via `dbc install sqlite`

The default `cargo test --lib --bins` step is unchanged, so the
existing test surface is untouched. dbc is installed via the official
curl-pipe-sh from dbc.columnar.tech.
dataders added a commit to dataders/ggsql that referenced this pull request May 8, 2026
Forward the CLI crate's optional adbc feature to ggsql/adbc so the CLI can be built against the ADBC reader from PR posit-dev#422 without adding URI, Snowflake, or dbt profile plumbing here.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collaborator

@georgestagg georgestagg left a comment


This looks great, thank you!

It looks like there is some contention between arrow 58 (from adbc) and 56 (from duckdb, parquet) types. If we can, I'd like to try and avoid the round trip through IPC by aligning everything in the workspace on the same arrow version.

There are some other changes I'd like to make before we take this:

  • Make adbc a default feature, but move the datafusion & duckdb drivers behind a flag.
  • Some additional work to CI to ensure the protobuf compiler is available everywhere we build ggsql.
  • Deal with the conflicts from main.
  • Some minor tweaks to comments and wording.

I am happy to take these changes on, and will attempt to push commits directly to this PR branch before merging.

@jimhester
Contributor Author

Sounds good. As long as you update the duckdb-rs version to pick up duckdb/duckdb-rs#702, the arrow versions should align. duckdb-rs hard-pins its arrow version, and we didn't want to force an update to duckdb-rs, which is how we ended up in this situation. But I agree version alignment is the right thing to do.

We also measured the IPC overhead on representative Trino result-set sizes (single-digit MB up to ~100 MB), and it was well under 5% of total query time — dominated by remote-fetch latency. So while alignment is the right destination, the IPC bridge isn't a major hot-path concern in the meantime; still, I agree it's worth removing.

@georgestagg
Collaborator

Annoyingly, I've started hitting duckdb/duckdb#22133 when I upgrade duckdb to the latest version. Tricky...

@georgestagg georgestagg merged commit 5d363eb into posit-dev:main May 14, 2026
2 checks passed