docs(website): add kubernetes autoscaling guide #25732
Your preview site for vector.dev will be ready in a few minutes; please allow time for it to build. Here's your preview link:
Your preview site for the Rust Doc will be ready in a few minutes; please allow time for it to build. Here's your preview link:
Your preview site for the VRL Playground will be ready in a few minutes; please allow time for it to build. Here's your preview link:
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 40025bd3dd
    variable "ssh_public_key_path" {
      type    = string
      default = "~/.ssh/vector_tests.pub"
Expand default SSH key paths before use
When users rely on the default ~/.ssh/vector_tests.pub, it is passed directly to file(var.ssh_public_key_path) in main.tf. Terraform’s pathexpand docs identify that function as the one that expands a leading ~; without it here, Terraform looks for a literal ~ path from the module and fails before creating the AWS key pair. Use file(pathexpand(...)) or make the default an absolute path.
    kube scale deployment vector -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
    kube autoscale deployment vector -n "$NAMESPACE" \
      --cpu-percent=70 --min=1 --max=8 >/dev/null 2>&1
Wait for scale-down before starting HPA
In the end-to-end script, Phase 3 leaves the Deployment at 8 replicas, and the kubectl scale reference describes this command as setting a new size rather than waiting for the controller to finish reconciling it. Creating the HPA immediately can let it observe the still-running 8-pod, low-CPU state and settle by scaling down instead of exercising the intended 1→N scale-up timeline, so Phase 4 timing and results can be invalid. Wait until the Deployment actually has one available/current replica before creating the HPA.
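One way to add the suggested wait is sketched below. It assumes `kube` is the script's thin wrapper around `kubectl` (a stand-in definition is included here), and `wait_for_replicas` is a hypothetical helper name, not something from the PR. It polls the Deployment's `status.replicas` and `status.readyReplicas` until both equal the requested count or a deadline passes.

```shell
#!/usr/bin/env bash
# Assumption: `kube` is the script's kubectl wrapper; this is a plain stand-in.
kube() { kubectl "$@"; }

# Hypothetical helper: poll until the `vector` Deployment reports exactly
# $want current and ready replicas, or give up after $timeout seconds.
# Returns 0 once the counts match, 1 on timeout.
wait_for_replicas() {
  local ns=$1 want=$2 timeout=${3:-120} interval=${4:-2}
  local deadline=$(( SECONDS + timeout ))
  local current ready
  while (( SECONDS < deadline )); do
    current=$(kube get deployment vector -n "$ns" \
      -o jsonpath='{.status.replicas}' 2>/dev/null)
    ready=$(kube get deployment vector -n "$ns" \
      -o jsonpath='{.status.readyReplicas}' 2>/dev/null)
    if [[ "$current" == "$want" && "$ready" == "$want" ]]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

In the script this could sit between the two commands, for example `wait_for_replicas "$NAMESPACE" 1 || exit 1` after `kube scale` and before `kube autoscale`, so the HPA is only created once the extra pods are gone.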
    if [[ "$stable_count" -ge 5 && "$elapsed" -gt 120 && -n "$cpu_avg" ]]; then
      if [[ "$cpu_avg" -ge 63 && "$cpu_avg" -le 77 ]]; then
Add a timeout to the HPA wait loop
When the target cluster never enters this hard-coded 63–77% CPU window—for example, slower nodes leave 8 replicas above 77%, metrics are unavailable, or the steady state lands just outside the band—the loop has no maximum elapsed time and the script never reaches the results table. Add a bounded timeout or treat max-replica/metrics-failure states as terminal so reproducing the guide cannot hang indefinitely.
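A bounded version of the check might look like the sketch below. `hpa_wait_state` is a hypothetical helper, and the 900-second default for `max_wait` is an assumed value, not one taken from the PR. The settle condition reuses the thresholds quoted above; anything that has not settled by `max_wait` is reported as a timeout so the caller can stop waiting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify one polling iteration of the HPA wait loop.
# Prints "settled", "timeout", or "waiting".
#   $1 elapsed seconds, $2 consecutive stable samples, $3 average CPU percent
#   (may be empty when metrics are unavailable), $4 max wait in seconds
#   (assumed default: 900).
hpa_wait_state() {
  local elapsed=$1 stable_count=$2 cpu_avg=$3 max_wait=${4:-900}
  if [[ "$stable_count" -ge 5 && "$elapsed" -gt 120 && -n "$cpu_avg" ]] &&
     [[ "$cpu_avg" -ge 63 && "$cpu_avg" -le 77 ]]; then
    echo settled
  elif [[ "$elapsed" -ge "$max_wait" ]]; then
    echo timeout
  else
    echo waiting
  fi
}
```

The loop body can then branch on the result with a `case` statement: break normally on `settled`, print a warning and break on `timeout`, and keep polling on `waiting`. Either way the script reaches the results table.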
    total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $replicas:.2f}')")
    total_eps=$(python3 -c "print(f'{float(\"$eps_per_pod\") * $replicas:.0f}')")
Sum all pods instead of scaling one sample
For multi-pod phases this reports total throughput by measuring only the first running Vector pod and multiplying it by the replica count. If request distribution is uneven during rollout/HPA, or the selected pod joined or drained traffic differently during the 30-second window, the table can over- or under-state total MiB/s and events/s; measure every running Vector pod over the same window and sum the deltas instead.
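The summing step could be as small as the sketch below. It assumes the script has already collected one per-pod rate per line (MiB/s or events/s, measured over the same window); how those per-pod values are gathered is outside this sketch, and `sum_per_pod` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one numeric per-pod value per line on stdin
# and print the total with two decimal places.
sum_per_pod() {
  awk '{ total += $1 } END { printf "%.2f\n", total }'
}
```

With per-pod rates of 1.5 and 2.25 MiB/s, `printf '1.5\n2.25\n' | sum_per_pod` prints `3.75`, which replaces the single-sample-times-replicas estimate with an actual total.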
    # Install K3s — include the public IP in the TLS SAN so kubectl works directly
    # Poll IMDS until the public IP is available (avoids a race on first boot)
    until PUBLIC_IP=$(curl -s --max-time 3 http://169.254.169.254/latest/meta-data/public-ipv4) && [ -n "$PUBLIC_IP" ]; do
Fetch metadata with an IMDSv2 token
In AWS accounts or AMIs that require IMDSv2, this IMDSv1 request does not return the public IP; because curl is not using --fail, a 401 body can be treated as a non-empty $PUBLIC_IP and written into the K3s TLS SAN, or the loop can stall without installing K3s. Either way, the generated kubeconfig will not work reliably against https://<public_ip>:6443; request an IMDSv2 token before reading metadata or avoid IMDS here.
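A token-based lookup could be sketched as follows. `fetch_public_ip` is a hypothetical helper name, and the 60-second token TTL and the IPv4 sanity check are assumptions added for illustration. The token endpoint and the two `X-aws-ec2-metadata-token` headers are the standard IMDSv2 request flow; `-f` makes `curl` fail on HTTP errors so an error body is never mistaken for an address.

```shell
#!/usr/bin/env bash
IMDS=http://169.254.169.254/latest

# Hypothetical helper: print the instance's public IPv4 using IMDSv2.
# Returns non-zero if the token request fails, the metadata request fails,
# or the response does not look like an IPv4 address.
fetch_public_ip() {
  local token ip
  token=$(curl -sf --max-time 3 -X PUT "$IMDS/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  ip=$(curl -sf --max-time 3 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS/meta-data/public-ipv4") || return 1
  [[ "$ip" =~ ^[0-9]+(\.[0-9]+){3}$ ]] || return 1
  printf '%s\n' "$ip"
}
```

The existing polling loop would then become `until PUBLIC_IP=$(fetch_public_ip); do sleep 2; done`, ideally with an upper bound on the number of attempts so first boot cannot stall forever.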
Summary
Adds a new level-up guide walking through Kubernetes autoscaling with Vector, including Terraform, manifests, and an experiment script. Also adds an embed shortcode for embedding in-tree files into Markdown docs.
Vector configuration
NA
How did you test this PR?
Ran the website locally with make serve and verified the guide renders correctly, including the embedded configuration files.
Change Type
Is this a breaking change?
Does this PR include user facing changes?
no-changelog label to this PR.
References
NA