feat(agent): transparent routing through agent tunnel#1741
feat(agent): transparent routing through agent tunnel#1741irvingouj@Devolutions (irvingoujAtDevolution) wants to merge 29 commits into
Conversation
8638365 to
ad3d3a0
Compare
76acc25 to
b70e278
Compare
There was a problem hiding this comment.
Pull request overview
Adds transparent target-based routing through the QUIC agent tunnel so the gateway can automatically forward connections via an enrolled agent when the destination matches advertised subnets/domains.
Changes:
- Introduces a shared agent-tunnel routing pipeline (
resolve_route/try_route) and wires it into forwarding (WS TCP/TLS), RDP clean path, and KDC proxy. - Extends route advertisements to support IPv4+IPv6 subnets and normalized domain suffix matching (longest domain suffix wins).
- Updates RDP clean-path server connection logic to support both TCP and QUIC transports via a concrete
ServerTransportenum (to preserveSend).
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| devolutions-gateway/src/rdp_proxy.rs | Updates Kerberos send function signature usage for CredSSP network requests. |
| devolutions-gateway/src/rd_clean_path.rs | Splits clean-path into authorization vs connect; adds TCP/QUIC ServerTransport for server side. |
| devolutions-gateway/src/proxy.rs | Tightens transport bounds to require Send for both sides. |
| devolutions-gateway/src/generic_client.rs | Integrates agent-tunnel routing into generic TCP forwarding path. |
| devolutions-gateway/src/api/rdp.rs | Plumbs agent_tunnel_handle into the RDP handler path. |
| devolutions-gateway/src/api/kdc_proxy.rs | Adds optional agent-tunnel routing to KDC proxy send path and generalizes reply reading. |
| devolutions-gateway/src/api/fwd.rs | Plumbs agent_tunnel_handle into WS forwarder and routes via tunnel when matched. |
| devolutions-gateway/src/agent_tunnel/routing.rs | New shared routing pipeline + unit tests. |
| devolutions-gateway/src/agent_tunnel/registry.rs | Adds target matching helpers and agent lookup by subnet/domain specificity; moves to IpNetwork. |
| devolutions-gateway/src/agent_tunnel/mod.rs | Exposes new routing module. |
| devolutions-agent/src/tunnel_helpers.rs | Extends tunnel target parsing/resolution to support IPv6 and IpNetwork. |
| devolutions-agent/src/tunnel.rs | Switches advertised subnets to IpNetwork and domains to normalized DomainName. |
| crates/agent-tunnel-proto/src/stream.rs | Refactors framing helpers placement and control stream split types. |
| crates/agent-tunnel-proto/src/lib.rs | Re-exports DomainName. |
| crates/agent-tunnel-proto/src/control.rs | Introduces DomainName and changes subnet advertisement type to IpNetwork (IPv4+IPv6). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
b70e278 to
80aed20
Compare
f323f30 to
3c49f7f
Compare
- routing.rs: when `explicit_agent_id` is set but the gateway has no
tunnel handle, return `Err` instead of silently falling back to a
direct connect. A token that names a specific `jet_agent_id` is
declaring a required network boundary; silent fallback would bypass
it.
- api/fwd.rs, generic_client.rs, rd_clean_path.rs, api/kdc_proxy.rs:
use `TargetAddr::as_addr()` (which brackets IPv6) instead of
`format!("{host}:{port}")` or `to_string()` (which includes scheme).
Fixes two real bugs: IPv6 targets were malformed (`::1:443` vs
`[::1]:443`), and kdc_proxy was passing `tcp://host:88` to the
tunnel target parser — which only accepts bare `host:port`.
- rdp_proxy.rs: add a `TODO(agent-tunnel)` documenting that CredSSP
Kerberos network requests cannot currently traverse the agent
tunnel because `send_network_request` hardcodes `None` for the
handle. Edge case (KDC behind a NAT'd site only reachable via an
enrolled agent); plumbing the handle through `RdpProxy` is a
follow-up.
- tests/agent_tunnel_routing.rs: replace a flaky `thread::sleep(10ms)`
(Windows timer resolution is ~16 ms) with an explicit
`set_received_at_for_test` helper. Adds two new tests for the new
explicit-agent-without-handle error path.
- registry.rs: expose `set_received_at_for_test` for the above.
- agent-tunnel-proto/control.rs: fix a stale doc comment that claimed
`subnets` is IPv4+IPv6 (it is IPv4-only; `Vec<Ipv4Network>`).
- routing.rs: when `explicit_agent_id` is set but the gateway has no
tunnel handle, return `Err` instead of silently falling back to a
direct connect. A token that names a specific `jet_agent_id` is
declaring a required network boundary; silent fallback would bypass
it.
- api/fwd.rs, generic_client.rs, rd_clean_path.rs, api/kdc_proxy.rs:
use `TargetAddr::as_addr()` (which brackets IPv6) instead of
`format!("{host}:{port}")` or `to_string()` (which includes scheme).
Fixes two real bugs: IPv6 targets were malformed (`::1:443` vs
`[::1]:443`), and kdc_proxy was passing `tcp://host:88` to the
tunnel target parser — which only accepts bare `host:port`.
- rdp_proxy.rs: add a `TODO(agent-tunnel)` documenting that CredSSP
Kerberos network requests cannot currently traverse the agent
tunnel because `send_network_request` hardcodes `None` for the
handle. Edge case (KDC behind a NAT'd site only reachable via an
enrolled agent); plumbing the handle through `RdpProxy` is a
follow-up.
- tests/agent_tunnel_routing.rs: replace a flaky `thread::sleep(10ms)`
(Windows timer resolution is ~16 ms) with an explicit
`set_received_at_for_test` helper. Adds two new tests for the new
explicit-agent-without-handle error path.
- registry.rs: expose `set_received_at_for_test` for the above.
- agent-tunnel-proto/control.rs: fix a stale doc comment that claimed
`subnets` is IPv4+IPv6 (it is IPv4-only; `Vec<Ipv4Network>`).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 32 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- routing.rs: when `explicit_agent_id` is set but the gateway has no
tunnel handle, return `Err` instead of silently falling back to a
direct connect. A token that names a specific `jet_agent_id` is
declaring a required network boundary; silent fallback would bypass
it.
- api/fwd.rs, generic_client.rs, rd_clean_path.rs, api/kdc_proxy.rs:
use `TargetAddr::as_addr()` (which brackets IPv6) instead of
`format!("{host}:{port}")` or `to_string()` (which includes scheme).
Fixes two real bugs: IPv6 targets were malformed (`::1:443` vs
`[::1]:443`), and kdc_proxy was passing `tcp://host:88` to the
tunnel target parser — which only accepts bare `host:port`.
- rdp_proxy.rs: add a `TODO(agent-tunnel)` documenting that CredSSP
Kerberos network requests cannot currently traverse the agent
tunnel because `send_network_request` hardcodes `None` for the
handle. Edge case (KDC behind a NAT'd site only reachable via an
enrolled agent); plumbing the handle through `RdpProxy` is a
follow-up.
- tests/agent_tunnel_routing.rs: replace a flaky `thread::sleep(10ms)`
(Windows timer resolution is ~16 ms) with an explicit
`set_received_at_for_test` helper. Adds two new tests for the new
explicit-agent-without-handle error path.
- registry.rs: expose `set_received_at_for_test` for the above.
- agent-tunnel-proto/control.rs: fix a stale doc comment that claimed
`subnets` is IPv4+IPv6 (it is IPv4-only; `Vec<Ipv4Network>`).
137a51f to
9f3321b
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 31 out of 32 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 33 out of 34 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
* `upstream::ConnectedUpstream::server_addr` for tunneled targets now reports the target IP:port (or `0.0.0.0:<port>` for a hostname target) instead of `0.0.0.0:0`. Logs / PCAP filenames / session info surface a meaningful peer address again. * `RoutePlan` and its methods are now `pub(crate)`; they were never meant to leak outside the upstream module. * `RoutingDecision::ExplicitAgentNotFound` is logged and degraded to Direct rather than panicking via `unreachable!` — the routing crate could grow new branches and we should not crash the gateway on a contract drift. * `/jet/tunnel/enrollment-string` now returns `400` when neither `quic_host` is provided nor a parseable host can be extracted from `api_base_url`. The previous silent fallback to `conf.hostname` was a Docker/K8s footgun (often a container ID the agent cannot dial), and the token store insert is now performed only after that check so a 400 leaves no orphan token. * `AgentManagementReadAccess` and `AgentManagementWriteAccess` reject with an error message that names the accepted scopes, easing integration debugging.
Earlier, when an RDP fwd request did not match the registry the only visible breadcrumb was the eventual `Connected to destination server` line — which fires for both Direct and ViaAgent paths and gave no way to tell whether the registry was even consulted. Adding a single debug! at the resolution call site lets an operator distinguish "target_host did not match any agent" from "registry never asked", which was already the difference between two real bug reports during smoke testing.
* `quic_host` field doc updated to match the implementation, which rejects with 400 instead of falling back to `conf.hostname`. * Reject overflowing `lifetime` values in `/jet/tunnel/enrollment-string`: with the prior `now_secs + lifetime_secs` an attacker-controlled lifetime could wrap and emit a token whose stored expiry is in the past, looking valid to the caller while it is in fact already redeemable as expired. Validate up-front so the in-memory store is never poisoned. * Mirror the gateway's dual-stack fallback on the agent's QUIC client socket: try `[::]:0` first, fall back to `0.0.0.0:0` with a warn when the host has IPv6 disabled (typical of stripped-down Linux containers). Prevents the agent from being stranded on hosts where the v6 bind itself fails. * Renewal CSR's CommonName was being filled with `tunnel_conf.gateway_endpoint` (a `host:port`), which only worked because the gateway ignores the CSR subject and trusts the mTLS-authenticated identity. Read the agent's CommonName from the existing certificate (the authoritative source for the registered name) and use it for the renewal CSR.
Refactoring the upstream connect path collapsed the two distinct "WebSocket-TCP forwarding" / "WebSocket-TLS forwarding" messages into a single "WebSocket forwarding" with a `mode` structured field. The TLS-anchoring integration test (and operators' existing log greps) key on the literal pre-refactor strings, so the test `cli::dgw::tls_anchoring::test::case_1_self_signed_correct_thumb` hung waiting for them and CI failed. Keep the structured field for new telemetry but emit the original message text per mode.
The `POST /jet/tunnel/enrollment-string` endpoint had the gateway generating, storing, and returning a UUID enrollment token wrapped in a `dgw-enroll:v1:<base64-json>` envelope. That put token issuance and state where it does not belong: in Devolutions' architecture, DVLS is the only authority for tokens (it holds the provisioner private key), the gateway is a stateless verifier (it holds the public key). The in-memory `EnrollmentTokenStore` also broke HA — an agent could redeem its token only against the specific gateway node that minted it, and a gateway restart silently invalidated all unredeemed tokens. The same `/jet/tunnel/enroll` handler already accepts a JWT scope token (`TunnelEnroll` / `Wildcard`) signed by the provisioner key, which is the correct path: DVLS signs, the agent presents the JWT, the gateway verifies statelessly. With the redundant path removed, the gateway no longer mints tokens at all. Removes the route, the request/response types, the `EnrollmentTokenStore` itself, and the corresponding branch from the `enroll` handler. The `gateway.agent.enroll` scope is kept (DVLS signs `TunnelEnroll` JWTs with that scope on its scope-token path) and so is the static `enrollment_secret` fallback for environments without DVLS.
Two scopes had grown for the same concept now that DVLS mints the enrollment JWT itself: `gateway.tunnel.enroll` was the scope on the JWT presented by the agent at `/jet/tunnel/enroll`, and `gateway.agent.enroll` was the scope DVLS used when calling the removed `/jet/tunnel/enrollment-string` endpoint. With that endpoint gone, the second meaning is dead and the first is the only one that actually authorizes anything on the wire. Drop `AccessScope::TunnelEnroll` and have the gateway-side validator accept `AgentEnroll | Wildcard`. DVLS signs with `agent.enroll`. The .NET side gets a new `EnrollmentClaims` class that mirrors the Rust `EnrollmentTokenClaims` shape (scope + jet_gw_url + optional jet_agent_name + optional jet_quic_endpoint) so `TokenUtils.Sign` can emit the JWT directly. NuGet bumped to 2025.10.3.
* The cert renewal check used to run only once, immediately after the QUIC connection was established. With a 30-day cert and a 15-day renewal threshold an agent that stays connected long enough never reaches the check again, and once the cert expires the next mTLS reconnect fails — the renewal request can no longer be sent because the tunnel cannot come up at all. Add an hourly tick in the main select! loop: when it detects the cert has entered the renewal window, close the connection and surface ConnectionOutcome::CertRenewed, routing the agent back through the existing pre-loop renewal block on reconnect (where the control stream is still un-split, so the request/response handshake is straightforward). * KDC routing in send_krb_message walked the agent-tunnel pipeline for any matching subnet/domain, but the agent only speaks ConnectRequest::tcp. A `udp://` KDC token whose host happened to match an enrolled agent would therefore be delivered to the agent as a TCP target — wrong protocol semantics that silently breaks UDP Kerberos deployments. Skip agent routing for non-tcp KDC schemes; fall through to the direct path, which honors udp/tcp correctly.
* `read_kdc_reply_message` no longer trusts the 4-byte length prefix blindly. A misbehaving (or malicious) tunnel peer that announces `u32::MAX` would otherwise cause us to pre-allocate ~4 GiB and OOM. Cap the announced length at 64 KiB (well above any realistic Kerberos reply), use checked arithmetic on the header, and surface oversize/overflow as `io::Error` instead of panicking. * `route_and_connect` returns `Err` on an empty candidate slice rather than `assert!`-ing. The function is a public API in a library crate; mis-calling it should not crash the gateway process. * `set_last_seen_for_test` and `set_received_at_for_test` are explicitly named test-only by the `_for_test` suffix. Add `#[doc(hidden)]` and a comment explaining why they remain `pub` (cross-crate integration tests need them; `cfg(test)` only fires in the declaring crate's own test build) so production callers do not pick them up by accident. * Fixed the misleading "skip hostname verification" comment in the agent-tunnel integration test — the test does not skip hostname validation, it just narrows the trusted-roots set to the test CA. * Added round-trip JSON tests for the new `EnrollmentClaims` (full claims set + null-omission of optional fields).
The agent's `up` subcommand could only persist subnets via CLI; domain advertisements still required hand-editing `agent.json` after enrollment, which made demos and one-off setups awkward. Mirror the existing `--advertise-subnets` shape: accept a comma-separated list and persist it on enrollment, falling back to the on-disk value when the flag is omitted (so an existing config is not silently wiped). `enroll.nu` at the project root wraps the full demo flow into one command — wipe previous state, run `up` with a JWT and the project's smoke-test advertise lists, then start the agent service in the foreground.
JWTs minted from the DVLS UI without an explicit name in the dialog do not carry a `jet_agent_name` claim, and the agent's `up` subcommand requires `--name`. Pass a sensible default so the demo helper works on any JWT, and let the caller override: nu enroll.nu "<JWT>" nu enroll.nu "<JWT>" my-agent
* Drop `enroll.nu` — it was a developer convenience for local smoke
tests, not part of the shipped product.
* Document `EnrollmentClaims.JetAgentName` end-to-end: explain that
the gateway never reads it (auth is by signature/scope), the
authoritative name is sent in the agent's enrollment request body,
and the JWT claim is read agent-side as the default for the
`--name` CLI flag — letting DVLS pre-fill the name typed in the
"Generate Enrollment String" dialog.
* Move the `agent_tunnel_*` integration tests out of
`devolutions-gateway/tests/` and into the `testsuite` crate's
central test binary (`testsuite/tests/agent_tunnel/{integration,
registry, routing}.rs`), where the rest of the cross-crate
integration tests already live. Drop the now-ineffective
`#![allow(unused_crate_dependencies)]` inner attributes (the lint
is crate-level only) and add the agent-tunnel-related dev deps to
`testsuite/Cargo.toml`.
Maintainer asked for the dual-stack listener change to ship as its own PR (#1741 review on `service.rs`). Restore the original v4-only bind on both ends so this PR is back to "transparent routing" scope only: * gateway listener — `Ipv4Addr::UNSPECIFIED` instead of `Ipv6Addr::UNSPECIFIED` * agent client — `0.0.0.0:0` instead of the `[::]:0` + IPv4 fallback * remove the `bind_dual_stack_endpoint` / `build_dual_stack_v6_socket` helpers and the `socket2` dependency from `agent-tunnel` The dual-stack work is being re-submitted on a separate branch.
PR #1741 was reviewed as too large. Reduce its scope to A+B (refactor + transparent routing) by backing out the cert-renewal additions (C) and the JWT-based enrollment pivot (D). Both will be opened as their own PRs against master. Cert renewal (C) removed: - Agent-side: drop the pre-loop expiry check, periodic cert_expiry_tick in the main select! loop, ConnectionOutcome enum, and the `is_cert_expiring` / `read_agent_name_from_cert` / `generate_csr_from_existing_key` helpers from enrollment.rs. - Gateway-side: drop the agent's ability to drive renewal; the CertRenewal proto messages stay (they exist on master from #1738) and the listener keeps the stub debug-and-drop arm. AGENT_CERT_VALIDITY_DAYS reverts to 365. JWT enrollment refactor (D) removed: - Gateway: revert token.rs (TunnelEnroll only, no AgentEnroll/AgentRead), extract.rs (no AgentManagement scope unions), and api/tunnel.rs to master (EnrollmentTokenStore-backed enroll handler with quic_endpoint in the response). - Agent-tunnel crate: restore enrollment_store module + handle getter + registration in bind(). - Agent CLI: revert main.rs and cli_tests.rs to before --advertise-domains (config-side advertise_domains support stays, only the CLI flag goes). Test JWTs go back to gateway.tunnel.enroll scope. - NuGet: delete EnrollmentClaims.cs, drop GatewayAgentEnroll/Read from AccessScope.cs, revert csproj version, drop the new JsonSerializationTests cases.
20bed8e to
b4ec360
Compare
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
mistake, will fix
| let ca_manager = agent_tunnel::cert::CaManager::load_or_generate(&data_dir) | ||
| .context("failed to initialize agent tunnel CA")?; | ||
|
|
||
| // Bind to the IPv6 unspecified address so the listener is dual-stack and |
There was a problem hiding this comment.
mistake, will fix
| let target_addr = TargetAddr::parse(request.url.as_str(), Some(88))?; | ||
|
|
||
| send_krb_message(&target_addr, &request.data) | ||
| // TODO(agent-tunnel): plumb `agent_tunnel_handle` through `RdpProxy` and pass it here |
There was a problem hiding this comment.
Double check later, address in follow up
PR #1741 was reviewed as too large. Reduce its scope to A+B (refactor + transparent routing) by backing out the cert-renewal additions (C) and the JWT-based enrollment pivot (D). Both will be opened as their own PRs against master. Cert renewal (C) removed: - Agent-side: drop the pre-loop expiry check, periodic cert_expiry_tick in the main select! loop, ConnectionOutcome enum, and the `is_cert_expiring` / `read_agent_name_from_cert` / `generate_csr_from_existing_key` helpers from enrollment.rs. - Gateway-side: drop the agent's ability to drive renewal; the CertRenewal proto messages stay (they exist on master from #1738) and the listener keeps the stub debug-and-drop arm. AGENT_CERT_VALIDITY_DAYS reverts to 365. JWT enrollment refactor (D) removed: - Gateway: revert token.rs (TunnelEnroll only, no AgentEnroll/AgentRead), extract.rs (no AgentManagement scope unions), and api/tunnel.rs to master (EnrollmentTokenStore-backed enroll handler with quic_endpoint in the response). - Agent-tunnel crate: restore enrollment_store module + handle getter + registration in bind(). - Agent CLI: revert main.rs and cli_tests.rs to before --advertise-domains (config-side advertise_domains support stays, only the CLI flag goes). Test JWTs go back to gateway.tunnel.enroll scope. - NuGet: delete EnrollmentClaims.cs, drop GatewayAgentEnroll/Read from AccessScope.cs, revert csproj version, drop the new JsonSerializationTests cases.
Trim missed agent-side D content — the JWT enrollment refactor lives in its own follow-up PR, so PR2 should not carry any of it: - enrollment.rs: restore EnrollResponse::quic_endpoint and the original enroll_agent / persist_enrollment_response signatures (no extra quic_endpoint or advertise_domains parameters). Drop the EnrollmentJwtClaims::jet_quic_endpoint claim — the enrollment JWT carries gw_url / agent_name only. - main.rs: drop --quic-endpoint CLI flag, drop UpCommand::quic_endpoint, drop the JWT jet_quic_endpoint extraction, restore the two-arg-shorter service-mode signature, restore the inline cli tests module. - cli_tests.rs: removed (the tests are back inline in main.rs at the master state).
… PRs
Tests (testsuite/tests/agent_tunnel/{integration,registry,routing}.rs) and
the read_cert_chain rewrite are not part of the routing/upstream feature
itself — they ship as their own PRs so this one stays focused on the
feature code:
- Cert PEM parsing fix → #1771
- Agent-tunnel test suite → follow-up PR (stacked on this one)
After this trim, PR2's diff is purely:
- Routing: agent-tunnel/{routing,registry,listener}.rs
- Upstream refactor: devolutions-gateway/upstream.rs and the proxy paths
(fwd, kdc_proxy, rdp, rdp_proxy, rd_clean_path, generic_client)
- Agent client: devolutions-agent/tunnel_helpers.rs (TargetAddr widening
to handle IPv6 alongside IPv4)
Revert kdc_proxy.rs to master. KDC tunnel routing covers two callers (the /jet/KdcProxy HTTP handler and the CredSSP/NLA path in rdp_proxy.rs::send_network_request), and both need to agree on how send_krb_message takes a session_id. Doing only the HTTP handler here forced a Uuid::new_v4() at the routing site for a "session" the KDC token has no notion of -- meaningless on the wire and on the agent log. Move the whole KDC-via-tunnel story (HTTP path + CredSSP path, plus the read_kdc_reply_message DoS cap) into DGW-384 where the API can be designed once with both call sites in view. Also drop the kdc_proxy.rs reference from routing.rs's module doc since this crate's caller is now only the upstream module family.
b4ec360 to
a2b2ff5
Compare
Address CBenoit's review on #1741: in addition to the `#[doc(hidden)]` marker and the `_for_test` naming, gate `AgentPeer::set_last_seen_for_test` and `set_received_at_for_test` behind `#[cfg(any(test, feature = "test-utils"))]` so production builds of `devolutions-gateway` and `devolutions-agent` don't compile these methods at all. Cross-crate test consumers (the workspace `testsuite` crate carrying the agent-tunnel integration tests, in #1772) opt in via `features = ["test-utils"]` on their `agent-tunnel` dev-dep.
Summary
Transparent routing through QUIC agent tunnel (PR 2 of 4, stacked on #1738).
When a connection target matches an agent's advertised subnets or domains, the gateway automatically routes through the QUIC tunnel instead of connecting directly.
Depends on: #1738 (must merge first)
Changes
ServerTransportenum (Tcp/Quic) inrd_clean_path.rsfor RDP tunnel supportPR stack
🤖 Generated with Claude Code