
Added support for lifecycle.started for clusters #5150

Open

andrewnester wants to merge 8 commits into main from feat/lifecycle-started-clusters

Conversation

@andrewnester
Contributor

Changes

Adds lifecycle.started support for clusters in the direct deployment engine, mirroring the same feature for apps (#4672).

Why

Without this field, clusters defined in a bundle are always left in whatever state the API puts them in after creation.
Users have no way to declare "ensure this cluster is running after every deploy."

lifecycle.started: true guarantees the cluster is RUNNING after bundle deploy.
lifecycle.started: false creates the cluster but immediately terminates it, and subsequent deploys that detect drift (e.g., someone started the cluster manually) will stop it again.

Note: WaitAfterCreate always waits for RUNNING first, since real clusters start in the PENDING state and must be polled. For started=false, we wait for RUNNING and then terminate; this avoids races with the API, which would reject a terminate on a still-pending cluster.
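
For illustration, here is a minimal sketch of the wait-then-terminate flow described above, written against the databricks-sdk-go clusters API. The helper name, the polling interval, and the surrounding wiring are assumptions for this example, not the PR's actual code:

```go
package dresources

import (
	"context"
	"fmt"
	"time"

	"github.com/databricks/databricks-sdk-go"
	"github.com/databricks/databricks-sdk-go/service/compute"
)

// waitThenSettle waits for a freshly created cluster to reach RUNNING, then
// terminates it if the bundle declared lifecycle.started: false. Waiting first
// matters because a new cluster sits in PENDING, and the API rejects a
// terminate while the cluster is still starting up.
func waitThenSettle(ctx context.Context, w *databricks.WorkspaceClient, id string, desiredStarted bool) error {
	for {
		c, err := w.Clusters.GetByClusterId(ctx, id)
		if err != nil {
			return err
		}
		switch c.State {
		case compute.StateRunning:
			if desiredStarted {
				return nil // started=true: RUNNING is the desired end state
			}
			// started=false: creation left the cluster running, so stop it now.
			_, err := w.Clusters.Delete(ctx, compute.DeleteCluster{ClusterId: id})
			return err
		case compute.StateTerminated, compute.StateError:
			return fmt.Errorf("cluster %s reached state %s while waiting for RUNNING", id, c.State)
		}
		time.Sleep(10 * time.Second)
	}
}
```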

Tests

Added acceptance tests

@github-actions
Contributor

github-actions Bot commented Apr 30, 2026

Approval status: pending

/acceptance/bundle/ - needs approval

22 files changed
Suggested: @denik
Also eligible: @shreyas-goenka, @pietern, @janniklasrose, @lennartkats-db, @anton-107

/bundle/ - needs approval

8 files changed
Suggested: @denik
Also eligible: @shreyas-goenka, @pietern, @janniklasrose, @lennartkats-db, @anton-107

General files (require maintainer)

Files: libs/testserver/clusters.go, libs/testserver/handlers.go
Based on git history:

  • @denik -- recent work in bundle/deployplan/, bundle/direct/dresources/, libs/testserver/

Any maintainer (@anton-107, @denik, @pietern, @shreyas-goenka, @simonfaltum, @renaudhartert-db) can approve all areas.
See OWNERS for ownership rules.

@andrewnester force-pushed the feat/lifecycle-started-clusters branch from d14c9b4 to 75e051f May 1, 2026 12:22
@andrewnester temporarily deployed to test-trigger-is May 1, 2026 12:23 with GitHub Actions
Comment threads on bundle/direct/dresources/cluster.go (outdated)
@andrewnester requested a review from denik May 4, 2026 12:40
@andrewnester temporarily deployed to test-trigger-is May 4, 2026 12:40 with GitHub Actions
type ClusterState struct {
compute.ClusterSpec

ClusterId string `json:"cluster_id,omitempty"`
Contributor

Do we really need this? Surprising that the direct deployment framework does not directly provide the id to WaitAfterCreate.

Contributor Author

Yes, we do need this to pass the ClusterId. WaitAfterCreate accepts the state; in many cases the Id is already in the state, but not for clusters, where the Id/ClusterId is not part of compute.ClusterSpec.
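
For context, a minimal sketch of what that looks like, assuming the SDK's WaitGetClusterRunning helper; the method signature here is illustrative, not necessarily the engine's real interface:

```go
// The engine hands WaitAfterCreate only the typed state, so the ID has to live
// in ClusterState itself: compute.ClusterSpec has no cluster_id field.
func (r *ResourceCluster) WaitAfterCreate(ctx context.Context, state *ClusterState) error {
	_, err := r.client.Clusters.WaitGetClusterRunning(ctx, state.ClusterId, 20*time.Minute, nil)
	return err
}
```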

// ClusterRemote extends compute.ClusterDetails with a synthetic Lifecycle field so that
// RemoteType satisfies TestRemoteSuperset (every field in ClusterState exists in ClusterRemote).
// Lifecycle.Started is populated by DoRead from the cluster's running state.
type ClusterRemote struct {
Contributor

can we use the same struct for ClusterRemote and ClusterState?
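
One concrete reason a single struct is awkward: the two sides wrap different SDK types. A sketch of the shape the doc comment above implies (the Lifecycle fields here are assumptions inferred from that comment, not the PR's exact code):

```go
// Desired side: what the bundle declares; compute.ClusterSpec is the create payload.
type ClusterState struct {
	compute.ClusterSpec
	ClusterId string    `json:"cluster_id,omitempty"`
	Lifecycle Lifecycle `json:"lifecycle,omitempty"`
}

// Observed side: what the API reports; compute.ClusterDetails carries read-only
// fields (state, driver info, and so on) that never appear in a spec. The
// synthetic Lifecycle field exists so every ClusterState field has a remote
// counterpart, which is what TestRemoteSuperset checks.
type ClusterRemote struct {
	compute.ClusterDetails
	Lifecycle Lifecycle `json:"lifecycle,omitempty"`
}

type Lifecycle struct {
	Started bool `json:"started,omitempty"`
}
```

DoRead can then derive the synthetic field from the live state, e.g. remote.Lifecycle.Started = (details.State == compute.StateRunning).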

// lifecycle.started=true: fire Start; WaitAfterUpdate polls for RUNNING.
_, err := r.client.Clusters.Start(ctx, compute.StartCluster{ClusterId: id})
return nil, err
} else if !desiredStarted && alreadyRunning {
Contributor

Should we also call delete on other states? Like PENDING | RESTARTING | RESIZING | UNKNOWN | ERROR? And poll waiting for the state transition if the state is TERMINATING?

Contributor

Do we guarantee TERMINATED if started = false?

Contributor Author

It's a good question. I believe we should only explicitly manipulate the cluster if it's in a known good state remotely; if it's not, it's better not to.
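
A sketch of that conservative policy (illustrative; the PR's actual branching may differ): only the two unambiguous settled states trigger an action, and anything transitional or unhealthy is left untouched.

```go
switch remote.State {
case compute.StateTerminated:
	if desiredStarted {
		// lifecycle.started=true: fire Start; WaitAfterUpdate polls for RUNNING.
		_, err := r.client.Clusters.Start(ctx, compute.StartCluster{ClusterId: id})
		return err
	}
case compute.StateRunning:
	if !desiredStarted {
		_, err := r.client.Clusters.Delete(ctx, compute.DeleteCluster{ClusterId: id})
		return err
	}
default:
	// PENDING, RESTARTING, RESIZING, TERMINATING, ERROR, UNKNOWN: the cluster
	// is mid-transition or in a bad state, so forcing a start or terminate
	// here would race the API; leave it alone.
}
return nil
```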

Comment thread bundle/direct/dresources/cluster.go
// cluster_id is stored in state for informational purposes only; it must not appear in plan output.
// PrepareState never sets it (input has no ID), so after the first deploy ch.Old="<id>" while ch.New="",
// causing a spurious Skip entry. Drop it unconditionally so it never pollutes plan JSON.
if path == "cluster_id" {
Contributor

why do we need this?

Contributor Author

Why do we need cluster_id, or why do we need to skip it? We use cluster_id later on to wait for the cluster status. We need to skip it because cluster_id is not part of the bundle config, and we want a clean plan as a result anyway.
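
For illustration, the kind of filter this implies; the surrounding diff-walk types (FieldChange, changes) are hypothetical stand-ins, not the PR's actual plumbing:

```go
// Drop cluster_id from the computed changes before rendering the plan.
// PrepareState never sets it on the desired side, so after the first deploy
// the old value is "<id>" and the new value is "", which would otherwise
// surface as a spurious Skip entry in the plan JSON.
filtered := make(map[string]FieldChange, len(changes))
for path, ch := range changes {
	if path == "cluster_id" {
		continue // informational only: used to wait on cluster status, never user config
	}
	filtered[path] = ch
}
```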

