Add ADBC reader behind 'adbc' feature#422
Conversation
New off-by-default 'adbc' feature. Implements ggsql's Reader trait over any adbc_core::sync::Driver: execute_sql, register, unregister, dialect. Bridges ADBC's &mut Statement API to Reader's &self via RefCell on the Connection (same pattern as OdbcReader). Tests are added in subsequent commits.
adbc_datafusion is added as a dev-dep so unit tests can construct an in-process AdbcReader without dynamically loading any native driver. Also enables the workspace `arrow` crate's `ipc` feature when `adbc` is on, since the reader's IPC roundtrip uses both arrow-56 (workspace) and arrow-58 (direct) IPC paths.
ggsql's execute pipeline emits CREATE OR REPLACE TEMP TABLE for
layer/stat materialization. adbc_datafusion 0.23 rejects that with
NotImplemented("Temporary tables not supported") and unwraps the
result inside Statement::execute, so the unit-test process panics.
The full pipeline works against drivers that support TEMP TABLE
(DuckDB, Trino, etc.) — see the equivalence-test path for that.
The spike-result note and PR-description draft live in the Netflix fork's records and in this branch's git history; they don't need to land in the upstream PR tree. Cleaner diff for review.
Loads the official Apache Arrow SQLite ADBC driver via
`ManagedDriver::load_from_name("sqlite", ...)` (installed by `dbc install
sqlite` to the platform manifest path) and asserts that
`AdbcReader<ManagedDriver>` produces output identical to ggsql's
`SqliteReader` on the same query against the same tempfile-backed SQLite
database.
Three tests cover:
- `equiv_simple_select`: scalar projection (int, text, float)
- `equiv_register_and_query`: register-then-query through the standard
ADBC bulk-ingest path against a real C driver
- `equiv_nulls`: mixed null + typed values
All `#[ignore]`'d so the default `cargo test` is unchanged. Run with
`cargo test --features "adbc sqlite" -- --ignored equivalence` after
installing the driver.
`register()` now sets `IngestMode::Append` on each batch statement so the
SQLite ADBC driver doesn't try to re-CREATE the table our SQL DDL
already created. DataFusion 0.23 returns `Status::NotFound` from
`set_option(IngestMode, ...)`; that's tolerated and the driver's default
append behavior takes over.
Surfaces one pre-existing divergence: `SqliteReader` types an
exclusively-NULL column as `Utf8` (no values to infer from), while
`AdbcReader` correctly carries through the declared projection type. The
nulls test mixes NULLs with typed values to steer around this.
Adds two cargo test runs to the existing build job: - `cargo test --features "adbc sqlite" --lib` — exercises the adbc_datafusion-backed unit tests - `cargo test --features "adbc sqlite" --lib -- --ignored equivalence` — exercises the SQLite ADBC equivalence suite against the official Apache Arrow SQLite ADBC driver, installed via `dbc install sqlite` The default `cargo test --lib --bins` step is unchanged, so the existing test surface is untouched. dbc is installed via the official curl-pipe-sh from dbc.columnar.tech.
Forward the CLI crate's optional adbc feature to ggsql/adbc so the CLI can be built against the ADBC reader from PR posit-dev#422 without adding URI, Snowflake, or dbt profile plumbing here. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
This looks great, thank you!
It looks like there is some contention between arrow 58 (from adbc) and 56 (from duckdb, parquet) types. If we can, I'd like to try and avoid the round trip through IPC by aligning everything in the workspace on the same arrow version.
There are some other changes I'd like to make before we take this:
- Make
adbca default feature, but move the datafusion & duckdb drivers behind a flag. - Some additional work to CI to ensure the protobuf compiler is available everywhere we build ggsql.
- Deal with the conflicts from
main. - Some minor tweaks to comments and wording.
I am happy to take these change on, and will attempt to push commits directly to this PR branch before merging.
|
Sounds good, I think as long as you update the duckdb-rs version to take this change duckdb/duckdb-rs#702 then the arrow versions would align. duckdb-rs just hard-pins the arrow version, and we didn't want to force a update to duckdb-rs which is why we ended up with this situation. But agree version alignment is the right thing to do. We also measured the IPC overhead on representative Trino result-set sizes (single-digit MB up to ~100 MB), and it was well under 5% of total query time — dominated by remote-fetch latency. So while the alignment is the right destination, the IPC bridge isn't a major hot-path concern in the meantime, but agree still nice to remove it. |
|
Annoyingly, I've started hitting duckdb/duckdb#22133 when I upgrade duckdb to the latest version. Tricky... |
Adds a generic
AdbcReader<D: Driver>tosrc/reader/, behind a new off-by-defaultadbcfeature. Generic overadbc_core::sync::Driver, so concrete ADBC drivers (Flight SQL, Snowflake, BigQuery, etc.) compose at the call site.Strategic context and the broader 4-PR plan live at #341 (comment).
Testing
Default unit tests (
cargo test --features adbc --lib): in-processadbc_datafusioncovers routing, conversion, dialect, multi-batch / null / empty-result handling.SQLite ADBC equivalence suite (gated
#[ignore]): loads the official Apache Arrow SQLite ADBC driver viaManagedDriver::load_from_name("sqlite", ...)and asserts thatAdbcReader<sqlite>matches ggsql's existingSqliteReaderon the same query against the sametempfile-backed SQLite database. Run locally with:CI installs both and runs the equivalence suite automatically (see the new step in
.github/workflows/build.yaml).Scope
Rust API only — CLI / Jupyter URI plumbing for ADBC is deferred to its own thread.
adbcis intentionally not added toall-readershere; happy to flip if you'd prefer it included.Test plan
cargo test(default features) unchangedcargo test --features adbc --libpassescargo test --features "adbc sqlite" --lib -- --ignored equivalencepasses afterdbc install sqlite