experiment(internal telemetry): gate allocation tracing behind a Cargo feature #25709

Draft
pront wants to merge 1 commit into master from feat/gate-allocation-tracing

pront (Member) commented Jun 30, 2026

⚠️ Experiment — do not merge

This branch exists to measure the cost of always compiling Vector's per-component allocation tracking into release builds. It is not a merge candidate. If SMP shows a meaningful win we will follow up with a real proposal; if it does not, this branch will be closed.

Motivation

Vector's #[global_allocator] is GroupedTraceableAllocator<Jemalloc>. It is always compiled in for unix release builds, and on every alloc/dealloc it does a relaxed atomic load + branch on TRACK_ALLOCATIONS to decide whether to do the per-component bookkeeping. The runtime --allocation-tracing flag toggles that bool, but the wrapper itself — and therefore the load + branch — is unconditional.
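For readers unfamiliar with this pattern, here is a minimal sketch of the kind of hot path described above. It is not Vector's implementation: it wraps `std::alloc::System` instead of Jemalloc, the `TracingAllocator` type, the two byte counters, and the `allocated_during` helper are hypothetical, and the per-component grouping is omitted. Only the `TRACK_ALLOCATIONS` name is taken from the description.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Runtime switch, named after the flag in the PR description.
static TRACK_ALLOCATIONS: AtomicBool = AtomicBool::new(false);
/// Hypothetical global counters standing in for per-component bookkeeping.
static ALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);
static DEALLOCATED_BYTES: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical wrapper; the inner allocator is `System` here, not Jemalloc.
struct TracingAllocator<A>(A);

unsafe impl<A: GlobalAlloc> GlobalAlloc for TracingAllocator<A> {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let ptr = unsafe { self.0.alloc(layout) };
        // This relaxed load + branch runs on every allocation,
        // even while tracking is switched off at runtime.
        if TRACK_ALLOCATIONS.load(Ordering::Relaxed) && !ptr.is_null() {
            ALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        ptr
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // Same unconditional check on the deallocation path.
        if TRACK_ALLOCATIONS.load(Ordering::Relaxed) {
            DEALLOCATED_BYTES.fetch_add(layout.size(), Ordering::Relaxed);
        }
        unsafe { self.0.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: TracingAllocator<System> = TracingAllocator(System);

/// Runs `f` with tracking enabled and returns the bytes allocated meanwhile.
fn allocated_during(f: impl FnOnce()) -> usize {
    let before = ALLOCATED_BYTES.load(Ordering::Relaxed);
    TRACK_ALLOCATIONS.store(true, Ordering::Relaxed);
    f();
    TRACK_ALLOCATIONS.store(false, Ordering::Relaxed);
    ALLOCATED_BYTES.load(Ordering::Relaxed) - before
}

fn main() {
    let tracked = allocated_during(|| {
        black_box(Vec::<u8>::with_capacity(4096));
    });
    let freed = DEALLOCATED_BYTES.load(Ordering::Relaxed);
    println!("tracked {tracked} allocated bytes, {freed} deallocated bytes");
}
```

With the flag off, the bookkeeping is skipped but the load and branch remain in every `alloc`/`dealloc` call, which is the cost this experiment tries to isolate.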

Open questions this experiment is meant to answer:

  1. Is the always-on AtomicBool check on every allocation measurable in throughput/latency across our component matrix, or is the branch predictor hiding it?
  2. If it is measurable, is the per-component grouping feature worth that permanent tax, or should we move to a model where the wrapper is only present in opt-in builds (and possibly replace it with sampled jemalloc prof.active for users who just want heap visibility)?

See https://github.com/vectordotdev/vector/blob/master/src/internal_telemetry/allocations/allocator/tracing_allocator.rs for the hot path being measured.

What this PR does

Wraps the entire allocation-tracing surface in #[cfg(feature = "allocation-tracing")]:

  • #[global_allocator] in src/lib.rs (wrapped vs plain Jemalloc)
  • init code in src/main.rs
  • --allocation-tracing / --allocation-tracing-reporting-interval-ms CLI flags in src/cli.rs
  • AllocationLayer in src/trace.rs
  • acquire_allocation_group_id call sites in src/topology/running.rs
  • get_allocation_tracing_status gRPC RPC in src/api/grpc/service.rs
  • the allocations module itself in src/internal_telemetry/mod.rs

Feature is off by default, so the SMP comparison build runs with a plain tikv_jemallocator::Jemalloc and no wrapper at all. Baseline is current master (wrapper always on).
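As an illustration of the gating pattern, the sketch below selects the global allocator at compile time. It is a simplified stand-in, not the diff itself: it uses `std::alloc::System` in place of `tikv_jemallocator::Jemalloc` to stay dependency-free, and the `allocations` module contents, `TracingAllocator`, and `allocation_tracing_compiled_in` are hypothetical names. It assumes a Cargo feature named `allocation-tracing` is declared in the crate's manifest.

```rust
/// Only compiled when the feature is enabled, mirroring how the PR gates
/// the `allocations` module. The wrapper body is a hypothetical placeholder.
#[cfg(feature = "allocation-tracing")]
mod allocations {
    use std::alloc::{GlobalAlloc, Layout};

    pub struct TracingAllocator<A>(pub A);

    unsafe impl<A: GlobalAlloc> GlobalAlloc for TracingAllocator<A> {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            // Per-allocation bookkeeping would go here.
            unsafe { self.0.alloc(layout) }
        }

        unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
            unsafe { self.0.dealloc(ptr, layout) }
        }
    }
}

// Feature on: the wrapper is the global allocator.
#[cfg(feature = "allocation-tracing")]
#[global_allocator]
static ALLOC: allocations::TracingAllocator<std::alloc::System> =
    allocations::TracingAllocator(std::alloc::System);

// Feature off: the plain allocator, with no wrapper in the binary at all.
#[cfg(not(feature = "allocation-tracing"))]
#[global_allocator]
static ALLOC: std::alloc::System = std::alloc::System;

/// Reports at runtime which variant was compiled in.
pub const fn allocation_tracing_compiled_in() -> bool {
    cfg!(feature = "allocation-tracing")
}

fn main() {
    println!(
        "allocation tracing compiled in: {}",
        allocation_tracing_compiled_in()
    );
}
```

The same attribute can be placed on struct fields, function calls, and `mod` declarations, which is how the other items in the list above (CLI flags, call sites, the gRPC RPC) would be excluded from a feature-off build.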

How tested

  • cargo check --no-default-features --features default (feature off): clean
  • cargo check --no-default-features --features "default,allocation-tracing" (feature on): clean
  • cargo clippy --no-default-features --features default --lib --bin vector -- -D warnings: clean
  • SMP regression run dispatched: https://github.com/vectordotdev/vector/actions/runs/28452960292

Change Type

  • Bug fix
  • New feature
  • Dependencies
  • Non-functional (chore, refactoring, docs)
  • Performance

Is this a breaking change?

  • Yes
  • No

(Not landing as-is; if a follow-up does land, default builds would lose the runtime --allocation-tracing flag and the component_allocated_bytes_total / component_deallocated_bytes_total / component_allocated_bytes internal metrics unless the feature is added to the release feature set.)

Does this PR include user facing changes?

  • Yes. Please add a changelog fragment based on our guidelines.
  • No. A maintainer will apply the no-changelog label to this PR.

Build/Cargo-feature wiring only.

…ture

Wrap the entire allocation tracing surface (global allocator wrapper,
init code, CLI flags, AllocationLayer, topology call sites, gRPC status
RPC) in `#[cfg(feature = "allocation-tracing")]`. Feature is off by
default; release builds opt in by adding it to the feature set.

Motivation: when the feature is compiled in, every alloc/dealloc pays a
relaxed atomic load + branch on `TRACK_ALLOCATIONS` even when the
runtime flag is off. This commit lets us run an SMP experiment to
measure the cost of that always-on path.
github-actions bot added the `domain: topology` label (Anything related to Vector's topology code) Jun 30, 2026
datadog-vectordotdev bot commented Jun 30, 2026

⚠️ Warnings

🚦 2 Pipeline jobs failed

Changelog | validate-changelog

PR Title Check | Check PR Title

ℹ️ Info

No other issues found

🧪 All tests passed
❄️ No new flaky tests detected

This comment will be updated automatically if new data arrives.
🔗 Commit SHA: ab79a0c

pront changed the title from "chore(internal telemetry): gate allocation tracing behind a Cargo feature" to "experiment(internal telemetry): gate allocation tracing behind a Cargo feature" Jun 30, 2026
pront (Member, Author) commented Jun 30, 2026

/ci-run-regression

Labels

domain: topology Anything related to Vector's topology code

1 participant