The code in this directory provides the prometheus-tcl package: a pure Tcl library for instrumenting Tcl scripts with Prometheus metrics.
After running the provided install make target, load the prometheus-tcl package as usual:
package require prometheus-tcl
This provides a number of procs in the ::prom namespace (abbreviated for ease of typing).
prometheus-tcl is written for Tcl 8.6 and requires a minimal set of dependencies:
- cmdline (argument parsing)
- TclOO (organizes the code providing the client API)
- Thread (thread safe metric operations)
- zlib (compressing HTTP replies)
- tls (only required if exposing metrics over HTTPS, either push or pull)
- http (only required if pushing metrics)
- base64 (only required if pushing metrics)
- uri (only required if pushing metrics)
And, for the unit tests:
- struct::list
- math
Before using a metric, declare it with one of the new procs:
# Create a Counter named messages_processed_total without any labels
prom::counter::new messages_processed_total -help "Number of input messages processed"
# Create a Counter named http_requests_total with some labels
prom::counter::new http_requests_total -help "HTTP requests by method and code" -labels {method code}
# Create a Gauge named rate_limit_queue_size
prom::gauge::new rate_limit_queue_size -help "Number of enqueued messages waiting to be processed"
# Create a Histogram with a method label
prom::histogram::new http_request_duration_seconds -labels {method} -buckets {0.05 0.1 0.2 0.5 1}
# Create an Info metric to track build info
prom::info::new application_build -help "Build info for application" -labels {branch version}
# Create a Summary for request duration
prom::summary::new microservice_rpc_duration_seconds -help "Microservice RPC duration in seconds"
When defining a metric with labels, only the label keys should be provided.
Calling new more than once for the same metric name throws an error (unless a suitable policy has been set to override that behavior).
Likewise, trying to use a metric without a prior call to new throws an error (unless a suitable policy has been set to override that behavior).
Also, providing an invalid metric name or label key throws an error.
If no errors occur, the new procs return the empty string.
After declaring a metric, manipulate its value by passing its name to one of the procs detailed in a later section.
prom::counter::new metricName ?-help helpText? ?-namespace metricNamePrefix? ?-labels labelKeys? ?-timestamp?
prom::gauge::new metricName ?-help helpText? ?-namespace metricNamePrefix? ?-labels labelKeys? ?-timestamp? ?-mergePolicy policy?
prom::histogram::new metricName ?-help helpText? ?-namespace metricNamePrefix? ?-labels labelKeys? ?-timestamp? ?-buckets bucketBoundaries?
prom::info::new metricName ?-help helpText? ?-namespace metricNamePrefix? ?-labels labelKeys? ?-timestamp?
prom::summary::new metricName ?-help helpText? ?-namespace metricNamePrefix? ?-labels labelKeys? ?-timestamp?
Given a valid metric name, metricName, new creates a Counter, Gauge, Histogram, Info or Summary metric.
To set the HELP description for the metric, provide the optional -help argument. If no -help is provided, the metricName will be used. The Prometheus documentation is fairly strict about requiring help text, but prometheus-tcl is not. Help text is recommended but not required.
To specify the metric's labels, provide a list of label keys, labelKeys, to the -labels argument. The order of elements in labelKeys is important: label values must be passed in that same order when calling the procs detailed below for using the metric, e.g., incrementing it or observing a value.
Optionally the -namespace argument can contain a string metricNamePrefix that will be prepended (along with an underscore) to the metricName passed to a new proc. Although a -namespace can be provided to new, the preferred way is to set a default global namespace value for metrics created with new.
Lastly, if -timestamp is provided, then each modification to the metric will also be accompanied by an epoch timestamp in milliseconds. This timestamp will show up in the data provided at scrape time, e.g., by calling prom::collect.
If using prometheus-tcl in a multi-threaded application, it is possible to merge the metrics across some subset of Tcl threads. In that case, merging metrics is fairly straightforward, except for gauges, where it is not always clear what the best option is. For that situation, the -mergePolicy option exists. For policy it accepts max (the default), min or sum.
The default buckets for a histogram are {.005 .01 .025 .05 .1 .25 .5 1 2.5 5 10}. These are taken from the golang client. See the link for further explanation.
To override the defaults, provide a list of numbers, bucketBoundaries, sorted in increasing order to the -buckets argument.
bucketBoundaries must contain at least one value excluding Inf.
The Inf bucket does not need to be explicitly provided.
Two helper procs ease common-case bucket creation.
Both helper procs return a list of bucket boundaries.
For consecutive bucket boundaries separated by a common difference
prom::linear_buckets start width count
count must be greater than or equal to 1
For consecutive bucket boundaries separated by a common factor
prom::exponential_buckets start factor count
start must be greater than 0
count must be greater than or equal to 1
factor must be greater than 1
# Example of linear buckets
prom::linear_buckets 0 5 10
# Returns the following list
0 5 10 15 20 25 30 35 40 45
# Example of exponential buckets
prom::exponential_buckets 1 2 5
# Returns the following list
1 2 4 8 16
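To make the arithmetic behind the two helpers concrete, here is a minimal pure-Tcl sketch of how such boundaries could be computed. The procs sketch_linear_buckets and sketch_exponential_buckets are hypothetical stand-ins written for illustration; they are not the package's implementation, which may differ in validation and number formatting.

```tcl
# Hypothetical stand-in for illustration: boundaries separated by a common difference
proc sketch_linear_buckets {start width count} {
    if {$count < 1} {
        error "count must be greater than or equal to 1"
    }
    set buckets {}
    for {set i 0} {$i < $count} {incr i} {
        # Each boundary is the start plus i widths
        lappend buckets [expr {$start + $i * $width}]
    }
    return $buckets
}

# Hypothetical stand-in for illustration: boundaries separated by a common factor
proc sketch_exponential_buckets {start factor count} {
    if {$start <= 0} { error "start must be greater than 0" }
    if {$factor <= 1} { error "factor must be greater than 1" }
    if {$count < 1} { error "count must be greater than or equal to 1" }
    set buckets {}
    set boundary $start
    for {set i 0} {$i < $count} {incr i} {
        lappend buckets $boundary
        # Each boundary is the previous one multiplied by the factor
        set boundary [expr {$boundary * $factor}]
    }
    return $buckets
}

puts [sketch_linear_buckets 0 5 10]
puts [sketch_exponential_buckets 1 2 5]
```

With the same arguments as the examples above, the sketch yields the same two lists. A list produced this way could be passed to the -buckets argument of prom::histogram::new.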
Info metrics are not part of the Prometheus standard. Behind the scenes, they are actually Gauges with a value of 1.0.
However, it is common practice to use this style of metric for exposing information like [software versions](https://www.robustperception.io/exposing-the-software-version-to-prometheus) or machine roles.
The prom::info::new proc takes the same arguments as the others (except for Histograms), but adds two wrinkles:
- At least one label is required. An error will be thrown if the -labels option value is an empty list.
- If the metricName does not end with _info, prometheus-tcl will automatically append _info to it. There is no way to disable this, so use a Gauge explicitly if this is not desired.
Note that summary metric types in prometheus-tcl do NOT calculate Phi-quantiles. Instead, they only provide the two Counters, _sum and _count. As stated in the link above:
Overall summarys without quantiles are a nice cheap way to track latencies, amount of data transferred per request, records accessed etc. as it only uses two time series per labelset.
When declaring new metrics, a global namespace, which acts as a metric name prefix, can be set using the prom::set_namespace proc, which takes a single argument, namespace. By default the namespace is the empty string. If it is set to a non-empty value, that value and an underscore are prefixed to every metric name provided to a new proc.
When a non-empty namespace has been set, the full name of the metric is exposed when prom::collect is called. Outside of that, though, the namespace value should not be provided in any prometheus-tcl proc calls requiring a metric name.
# Before declaring metrics, set the namespace
prom::set_namespace flightaware
# Create a metric whose full name will be flightaware_departures_total, but only as seen by Prometheus
prom::counter::new departures_total -help "Count total departures issued by FlightAware" -labels {airline adhoc}
# Use the metric without needing to mention the namespace
prom::counter::inc departures_total $airline $adhoc
# Providing the full name of the counter throws an error
prom::counter::inc flightaware_departures_total $airline $adhoc
# What is displayed during a scrape by Prometheus:
# HELP flightaware_departures_total Count total departures issued by FlightAware
# TYPE flightaware_departures_total counter
flightaware_departures_total{airline="1",adhoc="1"} 1.0
When defining and using metrics, following the Prometheus naming best practices can lead to fairly verbose metric names. Typing the full metric name can become tedious, and when used with labels, can lead to very long line lengths. To help address this problem somewhat, prometheus-tcl allows for setting a callback for controlling metric names that will be invoked at metric creation time, i.e., when calling a new proc:
prom::metric_name_callback callback
Set the callback by passing a non-empty string to callback.
Unset any previously set callback by passing in "" as callback.
The callback will be passed two arguments, metricType and metricName.
The callback must return a string representing the full metric name to use (excluding any namespace).
When a metric name callback has been set and a metric has been declared, use the same value passed to new when interacting with the metric.
For example, to automatically append _total to any counter metrics:
proc counter_naming {metricType metricName} {
if {$metricType eq "counter"} {
set metricName ${metricName}_total
}
return $metricName
}
prom::metric_name_callback counter_naming
# When exposed to Prometheus, the name will be example_total
prom::counter::new example
# When using the counter, pass the same name provided to new
prom::counter::inc example
# Do not use the name returned by the metric name callback
# It will throw an error
prom::counter::inc example_total; # error
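The way a namespace and a metric name callback combine into the exposed name can be sketched in plain Tcl. This is only an illustration under assumptions: sketch_full_name and sketch_counter_naming are hypothetical procs, not part of prometheus-tcl, and the ordering (callback first, then namespace prefix) is inferred from the statement that the callback returns the name excluding any namespace.

```tcl
# Hypothetical helper, not part of prometheus-tcl: compute the name that
# would be exposed for a metric declared with new.
proc sketch_full_name {namespace callback metricType metricName} {
    # Assumption: the callback runs first and returns the name without a namespace
    if {$callback ne ""} {
        set metricName [{*}$callback $metricType $metricName]
    }
    # Assumption: a non-empty namespace is then prefixed along with an underscore
    if {$namespace ne ""} {
        set metricName ${namespace}_$metricName
    }
    return $metricName
}

# Hypothetical callback that appends _total to counters only
proc sketch_counter_naming {metricType metricName} {
    if {$metricType eq "counter"} {
        return ${metricName}_total
    }
    return $metricName
}

puts [sketch_full_name flightaware sketch_counter_naming counter departures]
```

In this sketch, application code would keep referring to the metric as departures, while the longer composed name is what a scrape would show.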
The default behavior of new is to throw an error if the metric name being declared already exists. This can be modified by setting the name conflict policy to ignore:
prom::set_name_conflict_policy policyName
It supports a policyName of error (the default) or ignore, which silently returns without doing anything.
By default prometheus-tcl registers every metric created with new into a default registry created at package load (a registry is a prom::Registry object). This should cover the majority of cases.
For more advanced scenarios where an alternate registry is needed, use
prom::set_collection_registry registryObject
After setting the collection registry, all subsequent calls to prom::collect will use the provided registryObject.
Once a metric has been declared with new, use its name to access the expected Prometheus operations.
Label values are provided after the metric name and the operation's expected argument(s).
Label values must be provided in the same order that their corresponding label keys were provided to the new command.
prom::counter::inc metricName ?-amount amount? ?labelValue ...?
Increment metricName either by 1 (the default) or by some other non-negative amount provided to the -amount argument. amount can be any value recognized by string is double -strict.
prom::gauge::inc metricName ?-amount amount? ?labelValue ...?
Increment metricName either by 1 (the default) or by some other non-negative amount using the -amount option. amount can be any value recognized by string is double -strict.
prom::gauge::dec metricName ?-amount amount? ?labelValue ...?
Decrement metricName either by 1 (the default) or by a non-negative amount using the -amount option. amount can be any value recognized by string is double -strict.
prom::gauge::set_value metricName value ?labelValue ...?
Set metricName to a particular numeric value. value can be anything recognized by string is double -strict.
prom::gauge::set_to_current_time metricName ?labelValue ...?
Set metricName to the current epoch timestamp in seconds.
prom::gauge::time metricName script ?labelValue ...?
Run script using Tcl's time command, convert the result to seconds and set that value in the gauge named metricName.
prom::histogram::observe metricName value ?labelValue ...?
Observe a value for the Histogram named metricName.
prom::histogram::time metricName script ?labelValue ...?
Run script using Tcl's time command, convert the result to seconds and observe that value in the histogram named metricName.
prom::summary::observe metricName value ?labelValue ...?
Observe a value for the Summary named metricName.
prom::summary::time metricName script ?labelValue ...?
Run script using Tcl's time command, convert the result to seconds and observe that value in the summary named metricName.
prom::info::labels metricName labelValue ?labelValue ...?
Set the label values for the Info metric metricName.
Since labels are required for creating Info metrics, at least one labelValue value must be provided.
For any proc detailed in the section above, when providing the label values for a given metric name, the values MUST be in the order provided to the -labels argument to new.
For example, assume we declare a Counter named total with three label keys, k1, k2, and k3, represented by the Tcl list {k1 k2 k3}:
prom::counter::new total -help "Example of the importance of labelValue order" -labels {k1 k2 k3}
With total declared, assume we now want to increment total with label values k2="v2", k3="v3", and k1="v1". Since we declared total with the labelKeys list {k1 k2 k3}, we need to provide label values in that order:
prom::counter::inc total v1 v2 v3
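The positional matching described above can be illustrated with a small pure-Tcl sketch. The proc sketch_label_pairs is hypothetical and exists only to show why order matters; it is not how prometheus-tcl stores labels internally.

```tcl
# Hypothetical helper: pair label keys with label values by position
proc sketch_label_pairs {labelKeys labelValues} {
    if {[llength $labelKeys] != [llength $labelValues]} {
        error "expected [llength $labelKeys] label values, got [llength $labelValues]"
    }
    set pairs {}
    foreach key $labelKeys value $labelValues {
        lappend pairs "$key=\"$value\""
    }
    return [join $pairs ,]
}

# Values given in declaration order line up with their keys
puts [sketch_label_pairs {k1 k2 k3} {v1 v2 v3}]
# Values given in any other order are silently attached to the wrong keys
puts [sketch_label_pairs {k1 k2 k3} {v2 v3 v1}]
```

The second call does not fail; it simply produces a different label set, which is why the ordering rule has to be enforced by the caller.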
The time procs provided by prom::gauge, prom::histogram and prom::summary only support timing in seconds.
Restricting the units to seconds conforms with Prometheus client guidelines and the metric naming best practices.
time only provides the wall time it took to execute the script argument. If you need CPU time, see times from the TclX extension.
Importantly, these procs do NOT catch exceptions; however, if an error is thrown while evaluating script, metricName will still be updated with script's execution time.
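As a rough illustration of the behavior described here, the sketch below converts a timing to seconds and records it even when the script throws. It is an assumption-laden sketch, not the package's code: sketch_time_seconds and record are hypothetical names, and it uses clock microseconds with try/finally, whereas the package is documented as using Tcl's time command.

```tcl
# Tcl's time command reports microseconds per iteration; dividing by 1e6 gives seconds
set elapsed [expr {[lindex [time {after 20}] 0] / 1e6}]

# Hypothetical sketch: run a script in the caller's scope, then pass the
# elapsed wall time in seconds to a recorder, even if the script throws.
proc sketch_time_seconds {script recorder} {
    set start [clock microseconds]
    try {
        uplevel 1 $script
    } finally {
        # finally runs on success and on error; the error still propagates
        set seconds [expr {([clock microseconds] - $start) / 1e6}]
        {*}$recorder $seconds
    }
}

# Hypothetical recorder standing in for an observe or set_value operation
set observed {}
proc record {seconds} { lappend ::observed $seconds }

sketch_time_seconds {after 20} record
# The caller is responsible for catching errors from the timed script
catch {sketch_time_seconds {error "boom"} record}
puts "recorded [llength $observed] durations"
```

Because the error is re-raised after the duration is recorded, callers that expect failures should wrap the timing call in catch or try themselves.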
By default it is an error to use one of the procs in this section without a prior call to new.
prometheus-tcl can also silently ignore any attempt to manipulate an undeclared metric name by setting the missing name policy
prom::set_name_missing_policy policy
To ignore any attempt to use an undeclared metric name, pass a value of ignore as the policy argument.
For getting the metrics in Prometheus' text-based exposition format use
prom::collect
which takes no arguments and returns a Prometheus formatted string of the current metrics created using prometheus-tcl.
prometheus-tcl only supports the Prometheus text format.
To expose a HTTP port for Prometheus to scrape use
prom::expose ?-address address? ?-port port? ?-tls? ?-tlsArgs args? ?-path path?
Requires entering the event loop.
By default prometheus-tcl listens on the wildcard address on port 1347 for path /metrics where path is the request-target of RFC7230.
The -address option supports the same values that the Tcl socket command does for its -myaddr option.
TLS support can be enabled with the boolean -tls argument. To configure TLS, pass a single list of arguments to the -tlsArgs option. It takes all arguments accepted by tcltls' tls::import command.
Importantly, if you use TLS with prom::expose and a request fails, e.g., a client does not even use TLS for a request, the background exception handler is called, so be sure to set one when using this option. The message argument passed to the background error handler for these protocol failures starts with the string SSL channel.
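A background error handler for this case could look like the following minimal sketch. It relies only on Tcl's standard interp bgerror command; the proc name sketch_bgerror and the log wording are hypothetical, and whether to ignore or log these failures is an application decision.

```tcl
# Hypothetical handler: the proc name and log format are illustrative only
proc sketch_bgerror {message options} {
    if {[string match "SSL channel*" $message]} {
        # Protocol failure from a client, e.g. a plain HTTP request to a TLS port
        puts stderr "ignoring TLS protocol failure: $message"
        return
    }
    # Anything else is reported as an ordinary background error
    puts stderr "background error: $message"
}

# Install the handler for the current interpreter
interp bgerror {} sketch_bgerror
```

The handler should be installed before entering the event loop so that failed TLS handshakes do not fall through to the default handler.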
path can be any glob pattern. If a request is made to a URI that doesn't match path, a 400 Bad Request is sent.
prom::collect_to_file filePath
writes the results of prom::collect to a file in a non-blocking way. It returns 0 if an error occurred or 1 if the file was written successfully.
This can be used, for instance, with the node_exporter's Textfile Collector.
For pushing metrics to a PushGateway the following procs are provided:
prom::push_to_gateway gateway job ?-groupingKey labelsDict? ?-timeout timeoutMS?
prom::pushadd_to_gateway gateway job ?-groupingKey labelsDict? ?-timeout timeoutMS?
prom::delete_from_gateway gateway job ?-groupingKey labelsDict? ?-timeout timeoutMS?
Given a gateway hostname, job value and an optional dict of labels, labelsDict, send an HTTP(s) PUT (prom::push_to_gateway), POST (prom::pushadd_to_gateway) or DELETE (prom::delete_from_gateway) to the PushGateway.
The gateway hostname should be the PushGateway's domain name or IP address. It can optionally include a URI scheme of http or https (http is assumed if none is specified) along with a : and port number (9091 is assumed as the port if none is specified).
On success, meaning the PushGateway responded with a status code indicating success, the prom::*_gateway procs return 1; otherwise they return 0.
These procs will throw an error if the gateway is not provided in an acceptable format.
To make clear the accepted gateway values in the above procs, consider an imagined host gateway.com running a PushGateway on port 12345:
# Since no scheme provided, http used by default
# If no port is provided, 9091 is assumed by default
gateway.com:12345
# Alternate way of writing the above
http://gateway.com:12345
# For TLS, must explicitly specify https
https://gateway.com:12345
If an https scheme is provided, tcltls is used to encrypt the connection to the gateway.
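The defaulting rules for the gateway argument can be sketched in plain Tcl as follows. The proc sketch_normalize_gateway is a hypothetical helper written for illustration; the package's own parsing may differ, and this sketch does not attempt to handle IPv6 literals or paths.

```tcl
# Hypothetical helper illustrating the documented defaults
proc sketch_normalize_gateway {gateway} {
    # http is assumed unless a scheme is given explicitly
    set scheme http
    if {[regexp {^(https?)://(.+)$} $gateway -> scheme rest]} {
        set gateway $rest
    }
    # 9091 is assumed unless a port is given explicitly
    set port 9091
    if {[regexp {^(.+):(\d+)$} $gateway -> host explicitPort]} {
        set gateway $host
        set port $explicitPort
    }
    return [dict create scheme $scheme host $gateway port $port]
}

puts [sketch_normalize_gateway gateway.com:12345]
puts [sketch_normalize_gateway https://gateway.com]
```

Under these assumptions, gateway.com:12345 and http://gateway.com:12345 normalize to the same scheme, host and port.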
prometheus-tcl supports a multi-threaded mode of operation. This is off by default but can be enabled by setting a multi-threaded collection policy of mt using the proc
prom::set_collection_policy policy
By default the policy is st, or single-threaded. That means that prom::collect only collects metrics from the current Tcl interpreter.
If a policy of mt is specified, then prom::collect will aggregate metrics from all threads specified for collection and merge their values (for any given metric name and labels).
The mt policy assumes a Tcl program using threads in the context of the Tcl threading model. In a multi-threaded application, it is possible that each thread could expose its metrics on a separate port. However, this has several drawbacks, one of which could be the creation of too many values for the Prometheus-added instance label. It is also likely that each thread will share some (if not all) of the same metric names. In that case, it is desirable for prom::expose to return merged metrics across all threads with metrics to share. That is the multi-threaded collection policy supported by prometheus-tcl.
Note that in the multi-threaded context, prometheus-tcl does not have safeguards against multiple threads using the same metric name but different label keys. If that occurs, the behavior is undefined. It is up to application developers to enforce this, preferably by doing metric creation in a proc shared by all threads to enforce uniformity.
In passing it is worth mentioning that it would be possible to avoid merging metrics by creating, for instance, a cpptcl extension or tsvs for metric instrumentation in a multi-threaded setting. Either one of those could be fine solutions and were considered, but are not used in this package.
When collecting metrics in a multi-threaded setting it is possible that one of the threads could block indefinitely and stall the thread that called prom::collect. To avoid this situation, prometheus-tcl uses thread::send -async along with an after event timeout and vwait to set an upper bound on how long collection can take before giving up.
By default a collection timeout of 10 milliseconds is used, but, depending on workload and number of threads, this may need adjusting. To set a different value in milliseconds use
prom::set_mt_collection_timeout timeoutMS
To see the current collection timeout value
prom::get_mt_collection_timeout
The timeout value covers the total time taken to collect from all threads specified for collection, not the time for an individual thread.
If some threads fail to return a value before the timeout no error will result. Any collected metrics will be exposed.
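The bounded-wait pattern described in this section can be sketched without threads by simulating workers with after callbacks. Everything here is hypothetical: sketch_collect, sketch_reply and the ::sketch array are illustrative names, and in the package the replies would arrive from thread::send -async rather than from timers.

```tcl
# Hypothetical sketch: wait for replies, but never longer than timeoutMS overall
proc sketch_collect {delaysMS timeoutMS} {
    set ::sketch(results) {}
    set ::sketch(pending) [llength $delaysMS]
    set ::sketch(done) 0
    if {$::sketch(pending) == 0} { return {} }

    # Simulate each worker replying after its own delay
    set workerTimers {}
    set id 0
    foreach delay $delaysMS {
        lappend workerTimers [after $delay [list sketch_reply worker$id]]
        incr id
    }
    # The timeout covers the whole collection, not each worker
    set timer [after $timeoutMS {set ::sketch(done) 1}]
    vwait ::sketch(done)
    after cancel $timer
    foreach t $workerTimers { after cancel $t }
    # Whatever arrived in time is returned; late workers are not an error
    return $::sketch(results)
}

proc sketch_reply {name} {
    lappend ::sketch(results) $name
    # Finish early once every worker has replied
    if {[incr ::sketch(pending) -1] == 0} {
        set ::sketch(done) 1
    }
}

puts [sketch_collect {1 2 500} 50]
```

In this run the third simulated worker is slower than the timeout, so only the first two replies are returned, mirroring how partially collected metrics are still exposed.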
By default, prometheus-tcl's multi-threaded collection policy collects from all threads returned by thread::names. To set only some threads for collection use
prom::set_mt_collection_threads threadIDs
When merging metrics the following rules are followed:
- Metrics with the same name and labels (includes keys and values) can be merged
- If threads provide different label keys for the same metric name, the result is undefined
- Counters have their values added together
- Gauges by default take the maximum value seen. However, this can be modified with the -mergePolicy option for prom::gauge::new. Supported merge policies are max, min, and sum
- Histograms have the count for each bucket and the overall count and total summed
- Summaries have their count and total values summed
- For metrics with timestamps (this is controlled at metric creation time), typically the maximum timestamp seen is taken. This is not the case, however, for gauges where it depends on the merge policy. A sum policy uses the maximum timestamp seen but max or min use the timestamp of the maximum or minimum value seen
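The value-merging rules for counters and gauges can be illustrated with a short pure-Tcl sketch. The proc sketch_merge is hypothetical and handles only the value for a single metric name and label set; it ignores timestamps, histograms and summaries, and is not the package's implementation.

```tcl
# Hypothetical helper: merge per-thread values for one metric and label set
proc sketch_merge {metricType values {mergePolicy max}} {
    switch -- $metricType {
        counter {
            # Counters have their values added together
            return [tcl::mathop::+ {*}$values]
        }
        gauge {
            # Gauges follow the merge policy chosen at creation time
            switch -- $mergePolicy {
                max { return [tcl::mathfunc::max {*}$values] }
                min { return [tcl::mathfunc::min {*}$values] }
                sum { return [tcl::mathop::+ {*}$values] }
                default { error "unknown merge policy: $mergePolicy" }
            }
        }
        default {
            error "metric type not covered by this sketch: $metricType"
        }
    }
}

# Values reported by three threads for the same metric name and labels
puts [sketch_merge counter {3 7 5}]
puts [sketch_merge gauge {3 7 5}]
puts [sketch_merge gauge {3 7 5} min]
```

The choice of policy matters most for gauges that represent per-thread quantities: sum suits things like queue sizes that add up across threads, while max or min suit values where only the extreme is meaningful.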
Compliance with Prometheus Client Guidelines
prometheus-tcl complies with most if not all of the client library recommendations.
For instance:
- It is thread safe for Tcl's threading model
- It offers Counter and Gauge and both Summary and Histogram
- It has a default registry (the client API hides this detail from the user)
- Using the different classes defined in the prom namespace, it is possible to use a different registry
- All metrics have the mandatory methods and names (except for prom::gauge::set_value to avoid collision with ::set)
- Unit tests are included (can run them with the test make target)
If you would like additional documentation on the package and have Doxygen installed, a docs Make target will run doxygen. This will generate output in the docs/ sub-directory of the current directory.