feat(server): add generate-certs subcommand; replace alpine PKI hook #1257

Open
TaylorMutch wants to merge 2 commits into main from tmutch/gateway-certgen

Conversation

@TaylorMutch
Collaborator

Summary

Replaces the alpine + openssl PKI hook with openshell-gateway generate-certs. The command runs in two output modes — Kubernetes Secrets (default) and filesystem (--output-dir). The Helm chart's pre-install hook now uses the gateway image itself; the RPM systemd path will switch over in a follow-up PR.

Changes

  • New generate-certs subcommand on openshell-gateway (crates/openshell-server/src/certgen.rs). Reuses openshell_bootstrap::pki::generate_pki and openshell_bootstrap::mtls::store_pki_bundle — no new cert-generation code.
  • Two output modes:
    • Kubernetes (default): creates two kubernetes.io/tls Secrets (tls.crt/tls.key/ca.crt) via kube-rs.
    • Filesystem (--output-dir <DIR>): writes the 6-file layout used by deploy/rpm/init-pki.sh (<dir>/{ca.crt, ca.key, server/tls.{crt,key}, client/tls.{crt,key}}); also copies client materials to $XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/ for CLI auto-discovery.
  • Server CLI gains optional subcommand support; the bare openshell-gateway invocation still runs the gateway. --db-url is now validated at the call site rather than marked required in clap, which avoids the flatten + required-field interaction.
  • Helm chart: templates/pki-hook.yaml deleted, replaced by templates/certgen.yaml. pkiInitJob.image.*, caValidityDays, and certValidityDays removed from values.yaml. serverDnsNames/serverIpAddresses defaults emptied (the gateway binary already includes the cluster SANs); the values are now additive overrides.
  • Idempotency contract is the same in both modes: all targets present → skip; partial → fail with a recovery hint; nothing → generate and write. Filesystem mode self-heals the CLI mTLS copy when the server-side PKI is intact but ~/.config was wiped.
  • Atomic filesystem writes via a sibling <dir>.certgen.tmp staging directory; 0o700 dirs, 0o600 keys.
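The idempotency contract described above can be read as a three-way decision over the set of targets. The following is a minimal sketch of that decision table, not the code in certgen.rs: the `Decision` enum and the `decide` function are hypothetical names introduced for illustration, and the targets are modeled as plain `(name, present)` pairs.

```rust
/// What the command should do, given which targets already exist.
/// Hypothetical type for illustration; not taken from the PR.
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    /// Every target is present: leave everything untouched.
    Skip,
    /// No target is present: generate the PKI and write it.
    Generate,
    /// Some targets are present and some are missing: refuse and report.
    FailPartial { missing: Vec<String> },
}

/// Apply the contract to a list of `(target name, is present)` pairs.
/// An empty list counts as "all present" and yields `Skip`.
fn decide(targets: &[(&str, bool)]) -> Decision {
    let missing: Vec<String> = targets
        .iter()
        .filter(|t| !t.1)
        .map(|t| t.0.to_string())
        .collect();

    if missing.is_empty() {
        Decision::Skip
    } else if missing.len() == targets.len() {
        Decision::Generate
    } else {
        Decision::FailPartial { missing }
    }
}
```

In Kubernetes mode the targets would be the two Secrets; in filesystem mode, the six files. Collecting the missing names in the partial case lets the caller print them next to the recovery hint.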

Testing

  • mise run pre-commit passes (clippy -D warnings, fmt, markdownlint, full unit suite).
  • 21 new unit tests in certgen and cli modules — k8s decision table, local decision table, layout, sibling temp path, write_local_bundle (incl. unix permission mode), CLI parse for both modes (with and without --db-url / kube flags), stale-temp recovery.
  • helm lint is clean for the Helm chart across all six CI overlays (values-{gateway,cert-manager,tls-disabled,skaffold,keycloak}.yaml plus default).
  • k3d end-to-end (k8s mode):
    • Fresh install → certgen Job runs, both Secrets created with kubernetes.io/tls type and 3 keys (tls.crt/tls.key/ca.crt).
    • openssl verify validates the chain; server cert SANs include all 6 cluster defaults with no duplicates.
    • Helm upgrade → Job logs PKI secrets already exist, skipping.; secret resourceVersions unchanged.
    • Delete one secret + upgrade → Job fails with × partial PKI state in namespace openshell: ... Recover with: kubectl delete secret -n openshell openshell-server-tls openshell-client-tls.
    • StatefulSet stabilizes; openshell sandbox create -- /bin/echo ... runs end-to-end (compute allocate, image pull, supervisor mTLS callback, command output, auto-delete).
  • Local mode end-to-end (binary invoked directly):
    • 6-file layout written; permissions verified (drwx------ on dirs, -rw------- on *.key).
    • CLI mTLS auto-copy populated at $XDG_CONFIG_HOME/openshell/gateways/openshell/mtls/.
    • Repeat invocation logs PKI files already exist, skipping.
    • Deleting one of the 6 files → × partial PKI state ... Recover with: rm -rf <dir> ...

Checklist

  • Follows Conventional Commits
  • Architecture docs updated (architecture/gateway.md PKI Bootstrap subsection)
  • Helm README updated (deploy/helm/openshell/README.md)
  • helm-dev-environment skill updated to reflect the new hook description

Follow-up

A separate PR will swap deploy/rpm/init-pki.sh for openshell-gateway generate-certs --output-dir %S/openshell/tls in openshell.spec's ExecStartPre and delete the shell script.

@copy-pr-bot

copy-pr-bot Bot commented May 7, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@TaylorMutch TaylorMutch added the test:e2e Requires end-to-end coverage label May 7, 2026
@github-actions

github-actions Bot commented May 7, 2026

Label test:e2e applied, but pull-request/1257 is at {"messa while the PR head is f7e72b0. A maintainer needs to comment /ok to test f7e72b03b593d516b2546c59e847bf51d8a9232a to refresh the mirror. Once the mirror catches up, re-run Branch E2E Checks from the Actions tab.

@TaylorMutch
Collaborator Author

/ok to test f7e72b0

Introduce `openshell-gateway generate-certs` modeled on envoyproxy/gateway's
certgen pattern. The Helm pre-install/pre-upgrade hook now runs the gateway
image instead of an alpine + openssl shell job — one image to mirror in
air-gapped environments, one PKI implementation, real test coverage.

Reuses `openshell_bootstrap::pki::generate_pki` for CA/server/client cert
generation. Idempotency contract preserved: both Secrets exist → skip; one
exists → fail with `kubectl delete` recovery hint; neither exists → POST
both `kubernetes.io/tls` Secrets.

The server CLI gains optional subcommand support: bare `openshell-gateway`
still runs the gateway, `generate-certs` runs the new path. `--db-url`
moved from clap-required to call-site validated to avoid the clap flatten +
required-field landmine.

Presence of `--output-dir <DIR>` switches the subcommand from Kubernetes
Secret writes to filesystem writes, making the kube flags optional.
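A rough sketch of the dispatch these paragraphs describe, using only the standard library. It assumes the arguments are already parsed; `Cli`, `Command`, `Action`, and `dispatch` are hypothetical names for illustration and do not reflect the actual clap definitions in the PR.

```rust
use std::path::PathBuf;

/// Parsed subcommand (hypothetical shape).
#[derive(Debug, PartialEq)]
enum Command {
    GenerateCerts { output_dir: Option<PathBuf> },
}

/// Parsed top-level arguments. `db_url` is optional at parse time so that
/// `generate-certs` can run without it.
#[derive(Debug, PartialEq)]
struct Cli {
    db_url: Option<String>,
    command: Option<Command>,
}

/// What the binary ends up doing.
#[derive(Debug, PartialEq)]
enum Action {
    RunGateway { db_url: String },
    CertsToSecrets,
    CertsToDir(PathBuf),
}

/// No subcommand runs the gateway and checks `--db-url` here, at the call
/// site. `generate-certs` picks its output mode from `--output-dir`.
fn dispatch(cli: Cli) -> Result<Action, String> {
    match cli.command {
        Some(Command::GenerateCerts { output_dir: Some(dir) }) => Ok(Action::CertsToDir(dir)),
        Some(Command::GenerateCerts { output_dir: None }) => Ok(Action::CertsToSecrets),
        None => cli
            .db_url
            .map(|db_url| Action::RunGateway { db_url })
            .ok_or_else(|| "--db-url is required to run the gateway".to_string()),
    }
}
```

Validating at the call site keeps the requirement attached to the one path that needs it, so the certificate path never has to supply a database URL.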

Local layout matches `deploy/rpm/init-pki.sh` exactly:
  <dir>/{ca.crt, ca.key, server/tls.{crt,key}, client/tls.{crt,key}}
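Expanded, the brace pattern above names six paths. A small sketch that enumerates them for a given output directory; `pki_layout` is a hypothetical helper name, not a function from the PR.

```rust
use std::path::{Path, PathBuf};

/// The six files of the local layout, relative to `dir`:
/// CA cert and key at the top level, then a cert/key pair under
/// `server/` and another under `client/`.
fn pki_layout(dir: &Path) -> [PathBuf; 6] {
    [
        dir.join("ca.crt"),
        dir.join("ca.key"),
        dir.join("server").join("tls.crt"),
        dir.join("server").join("tls.key"),
        dir.join("client").join("tls.crt"),
        dir.join("client").join("tls.key"),
    ]
}
```

Checking each of these paths for existence gives the "all six present / partial / nothing" input for the idempotency decision.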

Stages writes to a sibling `<dir>.certgen.tmp` and renames into place for
atomic per-file installation. Sets 0o700 on directories and 0o600 on key
files. Reuses `openshell_bootstrap::mtls::store_pki_bundle` to populate
the CLI auto-discovery directory at $XDG_CONFIG_HOME/openshell/gateways/
openshell/mtls/, mirroring init-pki.sh's local-CLI UX.
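The stage-then-rename step can be sketched with the standard library alone. This is an illustration under assumptions, not the PR's `write_local_bundle`: `sibling_temp_path`, `private_dir`, and `install_file` are hypothetical names, the code is Unix-only, and cleanup of the staging directory is left out.

```rust
use std::fs::{self, DirBuilder, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::{DirBuilderExt, OpenOptionsExt};
use std::path::{Path, PathBuf};

/// `<dir>.certgen.tmp`: a sibling of `dir`, so the rename normally stays on
/// the same filesystem. Returns `None` if `dir` has no final component.
fn sibling_temp_path(dir: &Path) -> Option<PathBuf> {
    let mut name = dir.file_name()?.to_os_string();
    name.push(".certgen.tmp");
    Some(dir.with_file_name(name))
}

/// Create `path` and any missing parents, requesting mode 0o700.
fn private_dir(path: &Path) -> io::Result<()> {
    DirBuilder::new().recursive(true).mode(0o700).create(path)
}

/// Write `contents` under the staging directory with `mode`, then rename
/// the staged file to `<dir>/<rel>`.
fn install_file(dir: &Path, rel: &str, contents: &[u8], mode: u32) -> io::Result<()> {
    let staging = sibling_temp_path(dir).ok_or_else(|| {
        io::Error::new(io::ErrorKind::InvalidInput, "output dir has no final component")
    })?;
    let staged = staging.join(rel);
    let target = dir.join(rel);
    for path in [&staged, &target] {
        if let Some(parent) = path.parent() {
            private_dir(parent)?;
        }
    }
    // Drop a stale staged file so `create_new` applies `mode` to a fresh file.
    match fs::remove_file(&staged) {
        Ok(()) => {}
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(mode)
        .open(&staged)?;
    file.write_all(contents)?;
    file.sync_all()?;
    fs::rename(&staged, &target)
}
```

Passing the mode at open time, rather than calling chmod afterwards, means a key file is never briefly readable with wider permissions. A call such as `install_file(dir, "server/tls.key", key_pem, 0o600)` would cover one of the six files.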

Same idempotency contract as the Kubernetes path: all six files present →
skip (and self-heal the CLI mTLS copy if missing); partial → fail with a
recovery hint; nothing → generate and write.

Sets up the seam for a follow-up PR that swaps init-pki.sh for the Rust
command in the systemd unit. The shell script and unit are untouched here.

@TaylorMutch TaylorMutch force-pushed the tmutch/gateway-certgen branch from f7e72b0 to e845d31 Compare May 8, 2026 04:34