docs(website): add kubernetes autoscaling guide by thomasqueirozb · Pull Request #25732 · vectordotdev/vector

thomasqueirozb · 2026-07-01T19:26:36Z

Summary

Adds a new level-up guide walking through Kubernetes autoscaling with Vector, including Terraform, manifests, and an experiment script. Also adds an embed shortcode for embedding in-tree files into Markdown docs.

Vector configuration

NA

How did you test this PR?

Ran the website locally with `make serve` and verified that the guide renders correctly, including the embedded configuration files.

Change Type

Is this a breaking change?

Yes
No

Does this PR include user facing changes?

Yes. Please add a changelog fragment based on our guidelines.
No. A maintainer will apply the no-changelog label to this PR.

References

NA

github-actions · 2026-07-01T19:27:05Z

Your preview site for vector.dev will be ready in a few minutes; please allow time for it to build.

Here's your preview link:
vector.dev preview

github-actions · 2026-07-01T19:27:15Z

Your preview site for the Rust Doc will be ready in a few minutes; please allow time for it to build.

Here's your preview link:
Rust Doc preview

github-actions · 2026-07-01T19:27:17Z

Your preview site for the VRL Playground will be ready in a few minutes; please allow time for it to build.

Here's your preview link:
VRL Playground preview

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40025bd3dd

chatgpt-codex-connector · 2026-07-02T20:56:31Z

+
+variable "ssh_public_key_path" {
+  type    = string
+  default = "~/.ssh/vector_tests.pub"


Expand default SSH key paths before use

When users rely on the default ~/.ssh/vector_tests.pub, it is passed directly to file(var.ssh_public_key_path) in main.tf. Terraform’s pathexpand docs identify that function as the one that expands a leading ~; without it here, Terraform looks for a literal ~ path from the module and fails before creating the AWS key pair. Use file(pathexpand(...)) or make the default an absolute path.


chatgpt-codex-connector · 2026-07-02T20:56:31Z

+  kube scale deployment vector -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
+  kube autoscale deployment vector -n "$NAMESPACE" \
+    --cpu-percent=70 --min=1 --max=8 >/dev/null 2>&1


Wait for scale-down before starting HPA

In the end-to-end script, Phase 3 leaves the Deployment at 8 replicas, and the kubectl scale reference describes this command as setting a new size rather than waiting for the controller to finish reconciling it. Creating the HPA immediately can let it observe the still-running 8-pod, low-CPU state and settle by scaling down instead of exercising the intended 1→N scale-up timeline, so Phase 4 timing and results can be invalid. Wait until the Deployment actually has one available/current replica before creating the HPA.
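One way to implement the suggested wait is to poll the Deployment status until it reports the target replica count. The following is a minimal sketch, not code from the PR: `wait_for_replicas` and the `KUBECTL` override are hypothetical names, and it assumes `.status.replicas` is an acceptable signal that the scale-down has been reconciled.

```bash
#!/usr/bin/env bash
# Sketch only: KUBECTL and wait_for_replicas are illustrative names, not from the PR.
KUBECTL="${KUBECTL:-kubectl}"

# Poll until the Deployment reports exactly <want> replicas in .status.replicas,
# or give up once <timeout> seconds have been spent sleeping.
# Usage: wait_for_replicas <deployment> <namespace> <want> [timeout] [interval]
wait_for_replicas() {
  local deploy="$1" namespace="$2" want="$3" timeout="${4:-120}" interval="${5:-2}"
  local waited=0 current
  while :; do
    # An empty result (field absent or command failed) is treated as 0 replicas.
    current=$("$KUBECTL" get deployment "$deploy" -n "$namespace" \
      -o jsonpath='{.status.replicas}' 2>/dev/null) || true
    if [[ "${current:-0}" -eq "$want" ]]; then
      return 0
    fi
    if (( waited >= timeout )); then
      echo "timed out waiting for $deploy to reach $want replicas (currently ${current:-0})" >&2
      return 1
    fi
    sleep "$interval"
    waited=$(( waited + interval ))
  done
}

# Example: call between the scale-down and the HPA creation.
#   wait_for_replicas vector "$NAMESPACE" 1 || exit 1
```

Failing loudly on timeout keeps Phase 4 from starting against a Deployment that is still at 8 replicas.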


chatgpt-codex-connector · 2026-07-02T20:56:31Z

+    if [[ "$stable_count" -ge 5 && "$elapsed" -gt 120 && -n "$cpu_avg" ]]; then
+      if [[ "$cpu_avg" -ge 63 && "$cpu_avg" -le 77 ]]; then


Add a timeout to the HPA wait loop

When the target cluster never enters this hard-coded 63–77% CPU window—for example, slower nodes leave 8 replicas above 77%, metrics are unavailable, or the steady state lands just outside the band—the loop has no maximum elapsed time and the script never reaches the results table. Add a bounded timeout or treat max-replica/metrics-failure states as terminal so reproducing the guide cannot hang indefinitely.
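A bounded version of the wait could look like the sketch below. It is an illustration under stated assumptions rather than the PR's loop: `wait_for_steady_cpu` and `read_cpu_avg` are hypothetical names, the HPA is assumed to be named `vector` with a CPU resource metric as its first metric, and the stability counter from the original condition is omitted for brevity.

```bash
#!/usr/bin/env bash
# Sketch only: function names are illustrative, not from the PR.

# Assumption: the HPA is called "vector" and its first metric is CPU utilization.
# Prints the current average CPU percentage, or nothing if metrics are unavailable.
read_cpu_avg() {
  kubectl get hpa vector -n "${NAMESPACE:-default}" \
    -o jsonpath='{.status.currentMetrics[0].resource.current.averageUtilization}' 2>/dev/null
}

# Wait until CPU sits inside [low, high], but never longer than max_wait seconds.
# Usage: wait_for_steady_cpu <low> <high> <max_wait_seconds> [interval_seconds]
wait_for_steady_cpu() {
  local low="$1" high="$2" max_wait="$3" interval="${4:-5}"
  local start=$SECONDS cpu_avg=""
  while (( SECONDS - start < max_wait )); do
    cpu_avg=$(read_cpu_avg) || true
    if [[ -n "$cpu_avg" ]] && (( cpu_avg >= low && cpu_avg <= high )); then
      echo "steady at ${cpu_avg}%"
      return 0
    fi
    sleep "$interval"
  done
  echo "timed out after ${max_wait}s (last CPU reading: ${cpu_avg:-none})" >&2
  return 1
}

# Example: wait_for_steady_cpu 63 77 900 || echo "continuing with unsettled HPA" >&2
```

Returning a non-zero status on timeout lets the caller decide whether to abort or to record the phase as unsettled and still print the results table.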


chatgpt-codex-connector · 2026-07-02T20:56:31Z

+  total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $replicas:.2f}')")
+  total_eps=$(python3    -c "print(f'{float(\"$eps_per_pod\")   * $replicas:.0f}')")


Sum all pods instead of scaling one sample

For multi-pod phases this reports total throughput by measuring only the first running Vector pod and multiplying it by the replica count. If request distribution is uneven during rollout/HPA, or the selected pod joined or drained traffic differently during the 30-second window, the table can over- or under-state total MiB/s and events/s; measure every running Vector pod over the same window and sum the deltas instead.
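The summing approach can be sketched as follows. This is a hedged illustration, not the PR's code: `sum_pod_throughput` is a hypothetical helper, and it assumes the caller supplies a measurement command that prints `<MiB/s> <events/s>` for a single pod.

```bash
#!/usr/bin/env bash
# Sketch only: sum_pod_throughput is an illustrative name, not from the PR.

# Run <measure_cmd> once per pod, all in parallel so every pod is sampled over
# the same window, then add up the per-pod figures.
# <measure_cmd> <pod> must print one line: "<MiB/s> <events/s>".
# Prints: "<total MiB/s> <total events/s>".
# Usage: sum_pod_throughput <measure_cmd> <pod>...
sum_pod_throughput() {
  local measure_cmd="$1" pod
  shift
  {
    for pod in "$@"; do
      "$measure_cmd" "$pod" &
    done
    wait   # let every background sample finish before the pipe closes
  } | awk '{ mib += $1; eps += $2 } END { printf "%.2f %.0f\n", mib, eps }'
}

# Example (measure_pod is a placeholder for the script's own sampling logic):
#   read -r total_mibps total_eps < <(sum_pod_throughput measure_pod "${pods[@]}")
```

Using `awk` for the floating-point sum also avoids spawning a `python3` process per figure.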


chatgpt-codex-connector · 2026-07-02T20:56:31Z

+
+    # Install K3s — include the public IP in the TLS SAN so kubectl works directly
+    # Poll IMDS until the public IP is available (avoids a race on first boot)
+    until PUBLIC_IP=$(curl -s --max-time 3 http://169.254.169.254/latest/meta-data/public-ipv4) && [ -n "$PUBLIC_IP" ]; do


Fetch metadata with an IMDSv2 token

In AWS accounts or AMIs that require IMDSv2, this IMDSv1 request does not return the public IP; because curl is not using --fail, a 401 body can be treated as a non-empty $PUBLIC_IP and written into the K3s TLS SAN, or the loop can stall without installing K3s. Either way, the generated kubeconfig will not work reliably against https://<public_ip>:6443; request an IMDSv2 token before reading metadata or avoid IMDS here.
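An IMDSv2-aware lookup could be sketched like this. It is not the PR's user-data script: `imds_public_ipv4`, `IMDS_BASE`, and `CURL` are hypothetical names, and the 60-second token TTL is an arbitrary choice. The sketch uses `curl -f` so HTTP errors fail the call, and it rejects any response that does not look like an IPv4 address.

```bash
#!/usr/bin/env bash
# Sketch only: names below are illustrative, not from the PR.
IMDS_BASE="${IMDS_BASE:-http://169.254.169.254/latest}"
CURL="${CURL:-curl}"

# Print the instance's public IPv4 address using an IMDSv2 session token.
# Returns non-zero if the token request fails, the metadata request fails,
# or the response is not a dotted-quad address (for example an error body).
imds_public_ipv4() {
  local token ip
  token=$("$CURL" -sf --max-time 3 -X PUT "$IMDS_BASE/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 60") || return 1
  ip=$("$CURL" -sf --max-time 3 "$IMDS_BASE/meta-data/public-ipv4" \
    -H "X-aws-ec2-metadata-token: $token") || return 1
  [[ "$ip" =~ ^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$ ]] || return 1
  printf '%s\n' "$ip"
}

# Example: retry on first boot until the address is available.
#   until PUBLIC_IP=$(imds_public_ipv4); do sleep 2; done
```

The retry loop in the example is still unbounded; a maximum attempt count would address the stall the review mentions.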


docs(website): add k8s autoscaling guide and embed shortcode

1d01889

github-actions Bot added the "docs review on hold" label (The documentation team reviews PRs only after a PR is approved by the COSE team.) Jul 1, 2026

thomasqueirozb added the "no-changelog" label (Changes in this PR do not need user-facing explanations in the release changelog) Jul 1, 2026

fmt

f07d73e

thomasqueirozb added 7 commits July 2, 2026 16:08

Rework guide sections

4d5e19b

Fix markdownlint errors

7d69324

Remove unused vars in variables.tf

489d1d2

improve wording, fix helm command

b63cfe1

Fix phase 4 results

b6bd2f5

Add summary table

6aa5eb5

docs(website): make run-experiment.sh embed collapsible by default

40025bd

thomasqueirozb marked this pull request as ready for review July 2, 2026 20:44

thomasqueirozb requested review from a team as code owners July 2, 2026 20:44

thomasqueirozb changed the title ~~docs(website): add k8s autoscaling guide and embed shortcode~~ docs(website): add k8s autoscaling guide Jul 2, 2026

thomasqueirozb changed the title ~~docs(website): add k8s autoscaling guide~~ docs(website): add kubernetes autoscaling guide Jul 2, 2026

chatgpt-codex-connector Bot reviewed Jul 2, 2026
