Added support for lifecycle.started for clusters#5150
Added support for lifecycle.started for clusters#5150andrewnester wants to merge 8 commits intomainfrom
Conversation
Approval status: pending
|
d14c9b4 to
75e051f
Compare
| type ClusterState struct { | ||
| compute.ClusterSpec | ||
|
|
||
| ClusterId string `json:"cluster_id,omitempty"` |
There was a problem hiding this comment.
Do we really need this? Suprising that the direct deployment framework does not directly provide id to WaitAfterCreate
There was a problem hiding this comment.
Yes, we do need this to pass the ClusterId. WaitAfterCreate accepts the state, in many cases Id is in the state already but not for clusters where Id/ClusterId is not part of compute.ClusterSpec
| // ClusterRemote extends compute.ClusterDetails with a synthetic Lifecycle field so that | ||
| // RemoteType satisfies TestRemoteSuperset (every field in ClusterState exists in ClusterRemote). | ||
| // Lifecycle.Started is populated by DoRead from the cluster's running state. | ||
| type ClusterRemote struct { |
There was a problem hiding this comment.
can we use the same struct for ClusterRemote and ClusterState?
| // lifecycle.started=true: fire Start; WaitAfterUpdate polls for RUNNING. | ||
| _, err := r.client.Clusters.Start(ctx, compute.StartCluster{ClusterId: id}) | ||
| return nil, err | ||
| } else if !desiredStarted && alreadyRunning { |
There was a problem hiding this comment.
Should we also call delete on other states? Like PENDING | RESTARTING | RESIZING | UNKNOWN | ERROR? And poll waiting for the state transition if the state is TERMINATING?
There was a problem hiding this comment.
Do we guarentee TERMINATED if started = false?
There was a problem hiding this comment.
It's a good question. I believe we should only explicitly manpulate it if it's in a known good state remotely, if it's not it's better not to
| // cluster_id is stored in state for informational purposes only; it must not appear in plan output. | ||
| // PrepareState never sets it (input has no ID), so after the first deploy ch.Old="<id>" while ch.New="", | ||
| // causing a spurious Skip entry. Drop it unconditionally so it never pollutes plan JSON. | ||
| if path == "cluster_id" { |
There was a problem hiding this comment.
why do we need this?
There was a problem hiding this comment.
Why do we need cluster_id or why do we need to skip it? We use cluster_id to later on wait for cluster status. We need to skip this because cluster_id is not part of the bundle config and we want to have a clean plan as a result anyway
Changes
Adds lifecycle.started support for clusters in the direct deployment engine, mirroring the same feature for apps (#4672).
Why
Without this field, clusters defined in a bundle are always left in whatever state the API puts them in after creation.
Users have no way to declare "ensure this cluster is running after every deploy."
lifecycle.started: trueguarantees the cluster is RUNNING after bundle deploy.lifecycle.started: falsecreates the cluster but immediately terminates it, and subsequent deploys that detect drift (e.g., someone started the cluster manually) will stop it again.Note:
WaitAfterCreatealways waits for RUNNING first — real clusters start in PENDING state and must be polled. Forstarted=false, we wait for RUNNING then terminate; this avoids races with the API that would reject a terminate on a still-pending cluster.Tests
Added acceptance tests