docs(website): add kubernetes autoscaling guide #25732

Open
thomasqueirozb wants to merge 9 commits into master from website-load-balancing-guide

@thomasqueirozb

Member

Summary

Adds a new level-up guide walking through Kubernetes autoscaling with Vector, including Terraform, manifests, and an experiment script. Also adds an embed shortcode for embedding in-tree files into Markdown docs.

Vector configuration

NA

How did you test this PR?

Ran the website locally with make serve and verified the guide renders correctly, including the embedded configuration files.

Change Type

  • Bug fix
  • New feature
  • Non-functional (chore, refactoring, docs)
  • Performance
  • Dependencies

Is this a breaking change?

  • Yes
  • No

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

References

NA

@github-actions github-actions Bot added the docs review on hold The documentation team reviews PRs only after a PR is approved by the COSE team. label Jul 1, 2026
@thomasqueirozb thomasqueirozb added the no-changelog Changes in this PR do not need user-facing explanations in the release changelog label Jul 1, 2026
@github-actions

github-actions Bot commented Jul 1, 2026

Contributor

Your preview site for vector.dev will be ready in a few minutes; please allow time for it to build.

Here's your preview link:
vector.dev preview

@github-actions

github-actions Bot commented Jul 1, 2026

Contributor

Your preview site for the Rust Doc will be ready in a few minutes; please allow time for it to build.

Here's your preview link:
Rust Doc preview

@github-actions

github-actions Bot commented Jul 1, 2026

Contributor

Your preview site for the VRL Playground will be ready in a few minutes; please allow time for it to build.

Here's your preview link:
VRL Playground preview

@thomasqueirozb thomasqueirozb marked this pull request as ready for review July 2, 2026 20:44
@thomasqueirozb thomasqueirozb requested review from a team as code owners July 2, 2026 20:44
@thomasqueirozb thomasqueirozb changed the title docs(website): add k8s autoscaling guide and embed shortcode docs(website): add k8s autoscaling guide Jul 2, 2026
@thomasqueirozb thomasqueirozb changed the title docs(website): add k8s autoscaling guide docs(website): add kubernetes autoscaling guide Jul 2, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 40025bd3dd

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".


variable "ssh_public_key_path" {
  type    = string
  default = "~/.ssh/vector_tests.pub"

[P2] Expand default SSH key paths before use

When users rely on the default ~/.ssh/vector_tests.pub, it is passed directly to file(var.ssh_public_key_path) in main.tf. Terraform’s pathexpand docs identify that function as the one that expands a leading ~; without it here, Terraform looks for a literal ~ path from the module and fails before creating the AWS key pair. Use file(pathexpand(...)) or make the default an absolute path.

Useful? React with 👍 / 👎.
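
Until the module itself is changed, a user-side workaround in the spirit of the "absolute path" suggestion above might look like the sketch below. It assumes the variable is supplied through Terraform's TF_VAR_ environment-variable mechanism; expand_tilde is a hypothetical helper name, not part of the PR.

```shell
# expand_tilde PATH
# Hypothetical helper: replace a leading "~" or "~/" with $HOME.
# Any other path is printed unchanged.
expand_tilde() {
  local path rest
  path=$1
  case "$path" in
    '~')
      printf '%s\n' "$HOME"
      ;;
    '~/'*)
      rest=${path#\~/}            # strip the literal "~/" prefix
      printf '%s\n' "$HOME/$rest"
      ;;
    *)
      printf '%s\n' "$path"
      ;;
  esac
}

# Possible usage (assumption: the variable name matches the hunk above):
#   export TF_VAR_ssh_public_key_path="$(expand_tilde '~/.ssh/vector_tests.pub')"
#   terraform apply
```

This only sidesteps the problem for one invocation; the fix proposed in the review, file(pathexpand(...)) in main.tf, is still what would make the default work for everyone.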

Comment on lines +152 to +154
kube scale deployment vector -n "$NAMESPACE" --replicas=1 >/dev/null 2>&1
kube autoscale deployment vector -n "$NAMESPACE" \
  --cpu-percent=70 --min=1 --max=8 >/dev/null 2>&1

[P2] Wait for scale-down before starting HPA

In the end-to-end script, Phase 3 leaves the Deployment at 8 replicas, and the kubectl scale reference describes this command as setting a new size rather than waiting for the controller to finish reconciling it. Creating the HPA immediately can let it observe the still-running 8-pod, low-CPU state and settle by scaling down instead of exercising the intended 1→N scale-up timeline, so Phase 4 timing and results can be invalid. Wait until the Deployment actually has one available/current replica before creating the HPA.

Useful? React with 👍 / 👎.
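
One way to act on the suggestion above is to poll the Deployment status between the two commands. The sketch below is an assumption-laden illustration, not the PR's code: wait_for_replicas is a hypothetical helper, it calls kubectl directly rather than the script's kube wrapper, and it treats "current and available replicas both equal the target" as the readiness condition.

```shell
# wait_for_replicas NAMESPACE DEPLOYMENT WANT [TIMEOUT_SECONDS] [INTERVAL_SECONDS]
# Hypothetical helper: returns 0 once .status.replicas and
# .status.availableReplicas both equal WANT, or 1 after TIMEOUT_SECONDS.
wait_for_replicas() {
  local ns deploy want timeout interval waited status
  ns=$1
  deploy=$2
  want=$3
  timeout=${4:-120}
  interval=${5:-2}
  waited=0
  while :; do
    # Prints e.g. "8/8" while old pods are still running, "1/1" when done.
    status=$(kubectl get deployment "$deploy" -n "$ns" \
      -o jsonpath='{.status.replicas}/{.status.availableReplicas}' 2>/dev/null) || status=""
    if [ "$status" = "$want/$want" ]; then
      return 0
    fi
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep "$interval"
    waited=$((waited + interval))
  done
}

# Possible placement in the script (assumption):
#   kube scale deployment vector -n "$NAMESPACE" --replicas=1
#   wait_for_replicas "$NAMESPACE" vector 1 || exit 1
#   kube autoscale deployment vector -n "$NAMESPACE" --cpu-percent=70 --min=1 --max=8
```

If the script's kube wrapper adds a kubeconfig or context flag, the kubectl call inside the helper would need the same treatment.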

Comment on lines +187 to +188
if [[ "$stable_count" -ge 5 && "$elapsed" -gt 120 && -n "$cpu_avg" ]]; then
  if [[ "$cpu_avg" -ge 63 && "$cpu_avg" -le 77 ]]; then

[P2] Add a timeout to the HPA wait loop

When the target cluster never enters this hard-coded 63–77% CPU window—for example, slower nodes leave 8 replicas above 77%, metrics are unavailable, or the steady state lands just outside the band—the loop has no maximum elapsed time and the script never reaches the results table. Add a bounded timeout or treat max-replica/metrics-failure states as terminal so reproducing the guide cannot hang indefinitely.

Useful? React with 👍 / 👎.
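
The terminal conditions described above could be factored into a small decision function that the loop consults on every poll. This is a sketch under assumptions: hpa_wait_state, max_wait, max_replicas, and the state names are hypothetical, the 63 to 77 band and the stable_count/elapsed thresholds are copied from the hunk, and stable_count is assumed to count consecutive polls with an unchanged replica count.

```shell
# hpa_wait_state ELAPSED MAX_WAIT REPLICAS MAX_REPLICAS STABLE_COUNT [CPU_AVG]
# Hypothetical helper. Prints one of:
#   stable     replica count settled and CPU is inside the 63-77% band
#   saturated  replica count settled at MAX_REPLICAS but CPU is outside the band
#   timeout    MAX_WAIT seconds elapsed without reaching either state
#   continue   keep polling
hpa_wait_state() {
  local elapsed max_wait replicas max_replicas stable_count cpu_avg
  elapsed=$1
  max_wait=$2
  replicas=$3
  max_replicas=$4
  stable_count=$5
  cpu_avg=${6:-}   # may be empty when metrics are unavailable

  if [ "$stable_count" -ge 5 ] && [ "$elapsed" -gt 120 ]; then
    if [ -n "$cpu_avg" ] && [ "$cpu_avg" -ge 63 ] && [ "$cpu_avg" -le 77 ]; then
      echo stable
      return 0
    fi
    if [ "$replicas" -ge "$max_replicas" ]; then
      echo saturated
      return 0
    fi
  fi
  if [ "$elapsed" -ge "$max_wait" ]; then
    echo timeout
    return 0
  fi
  echo continue
}
```

A loop using it would break on any state other than continue and record which state ended the wait, so the results table can flag runs that timed out or saturated instead of hanging.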

Comment on lines +135 to +136
total_mibps=$(python3 -c "print(f'{float(\"$mibps_per_pod\") * $replicas:.2f}')")
total_eps=$(python3 -c "print(f'{float(\"$eps_per_pod\") * $replicas:.0f}')")

[P2] Sum all pods instead of scaling one sample

For multi-pod phases this reports total throughput by measuring only the first running Vector pod and multiplying it by the replica count. If request distribution is uneven during rollout/HPA, or the selected pod joined or drained traffic differently during the 30-second window, the table can over- or under-state total MiB/s and events/s; measure every running Vector pod over the same window and sum the deltas instead.

Useful? React with 👍 / 👎.
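
A minimal sketch of the "measure every pod, then sum" approach follows. It assumes the script already has some per-pod measurement command that prints a single number; sum_over_pods and the measure command passed to it are hypothetical names, and awk is used here for the floating-point sum where the PR uses python3.

```shell
# sum_over_pods MEASURE_CMD POD...
# Hypothetical helper: runs MEASURE_CMD once per pod, all in parallel so the
# samples cover the same time window, and prints the total with two decimals.
# MEASURE_CMD must print exactly one number on stdout.
sum_over_pods() {
  local measure pod
  measure=$1
  shift
  for pod in "$@"; do
    "$measure" "$pod" &     # background job; its stdout feeds the pipe below
  done | awk '{ total += $1 } END { printf "%.2f\n", total }'
}

# Possible usage (assumptions: measure_pod_mibps exists and pods carry an
# app=vector label; neither is confirmed by the PR excerpt):
#   pods=$(kubectl get pods -n "$NAMESPACE" -l app=vector \
#            --field-selector=status.phase=Running -o name)
#   total_mibps=$(sum_over_pods measure_pod_mibps $pods)
```

awk only sees end-of-input once every background job has exited, so the total is printed after the slowest pod's measurement finishes.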


# Install K3s — include the public IP in the TLS SAN so kubectl works directly
# Poll IMDS until the public IP is available (avoids a race on first boot)
until PUBLIC_IP=$(curl -s --max-time 3 http://169.254.169.254/latest/meta-data/public-ipv4) && [ -n "$PUBLIC_IP" ]; do

[P2] Fetch metadata with an IMDSv2 token

In AWS accounts or AMIs that require IMDSv2, this IMDSv1 request does not return the public IP; because curl is not using --fail, a 401 body can be treated as a non-empty $PUBLIC_IP and written into the K3s TLS SAN, or the loop can stall without installing K3s. Either way, the generated kubeconfig will not work reliably against https://<public_ip>:6443; request an IMDSv2 token before reading metadata or avoid IMDS here.

Useful? React with 👍 / 👎.
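
The IMDSv2 flow mentioned above is a two-step exchange: PUT to the token endpoint, then send the token as a header on the metadata request. The sketch below uses the documented endpoint and header names; fetch_public_ip, the 300-second TTL, and the retry interval in the usage comment are illustrative assumptions.

```shell
IMDS_BASE="http://169.254.169.254/latest"

# fetch_public_ip
# Hypothetical helper: prints the instance's public IPv4 address using an
# IMDSv2 session token. Returns non-zero if either request fails; --fail (-f)
# makes curl treat HTTP errors such as 401 as failures instead of printing
# the error body.
fetch_public_ip() {
  local token
  token=$(curl -sf --max-time 3 -X PUT "$IMDS_BASE/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300") || return 1
  [ -n "$token" ] || return 1
  curl -sf --max-time 3 -H "X-aws-ec2-metadata-token: $token" \
    "$IMDS_BASE/meta-data/public-ipv4"
}

# Possible replacement for the polling loop in the hunk above (assumption):
#   until PUBLIC_IP=$(fetch_public_ip) && [ -n "$PUBLIC_IP" ]; do
#     sleep 2
#   done
```

Because both requests use --fail, an error response can no longer end up in PUBLIC_IP and from there in the K3s TLS SAN.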
