Notes on Prometheus
By Ian Fisher, 21 October 2024
I have to admit that it took me quite a while to understand Prometheus, the open-source metrics toolkit – not because of any inherent complexity, but because I had the wrong idea about some of the fundamentals.
This article is my best attempt to summarize what I wish had been explained to me from the outset, in the hopes that it will save someone else the trouble I went through.
- Prometheus is for time-series data. A time series is a stream of samples – timestamped numeric measurements. Think memory usage: record the percentage of RAM in use every 5 seconds, and you have a time series. Prometheus is not a general-purpose data-store for any kind of metric – only for those with a time component.
- Measure rates with a counter time-series. Often you want to answer questions like, "How many HTTP requests does my server receive per second?" and "How many database queries did my application make in the last 5 minutes?" Don't try to record the rates themselves; just count the total number of events. Since the samples are timestamped, Prometheus will calculate rates and intervals for you.
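The "count events, don't record rates" idea can be sketched in a few lines of plain Python. This is a toy model, not the real client library: the application only ever increments a counter, and the rate falls out of two timestamped samples the way a Prometheus query would compute it.

```python
# Toy illustration: the app records only a monotonically increasing
# counter; the rate is derived later from timestamped samples.
samples = [
    (1000.0, 120),  # (unix timestamp, http_requests_total)
    (1300.0, 720),  # 5 minutes later
]

(t0, v0), (t1, v1) = samples
rate_per_sec = (v1 - v0) / (t1 - t0)  # requests per second over the window
print(rate_per_sec)  # 600 requests / 300 s = 2.0
```

Because the counter is monotonic and every sample is timestamped, the rate over any window can be recovered after the fact, which is exactly why you store counts rather than rates.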
- A metric is a named collection of time series. It is a collection and not a single time series because of labels: each unique combination of label values is its own time series. `http_requests_total{method="GET", status_code="200"}` and `http_requests_total{method="POST", status_code="500"}` are two different time series that both belong to the `http_requests_total` metric.
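The metric-to-series relationship can be modeled as a dictionary keyed by label combinations (a toy sketch, not the client library's internal representation):

```python
# Toy model of one metric: one dict entry per unique label
# combination, i.e. one entry per time series.
metric = {}  # (method, status_code) -> current counter value

def inc(method, status_code):
    key = (method, status_code)
    metric[key] = metric.get(key, 0) + 1

inc("GET", "200")
inc("GET", "200")
inc("POST", "500")

# Two label combinations -> two distinct time series under one name.
print(len(metric))             # 2
print(metric[("GET", "200")])  # 2
```

This is also why high-cardinality labels (user IDs, request paths) are dangerous: every new label value silently creates another time series.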
- The Prometheus server does not collect samples. It only scrapes and stores them. To actually collect them you have two choices: use a pre-existing exporter program (like `node_exporter`, `postgres_exporter`, etc.), or turn your application into an exporter itself using a Prometheus client library. The latter allows you to record your own application-specific metrics.
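What an exporter actually does is serve a plain-text payload in the Prometheus exposition format. A real exporter serves this over HTTP at `/metrics`; the sketch below (the helper name and series data are made up) just builds the payload so you can see the shape of it:

```python
# Sketch of the Prometheus text exposition format an exporter serves.
def render(name, help_text, series):
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} counter"]
    for (method, code), value in sorted(series.items()):
        lines.append(f'{name}{{method="{method}",status_code="{code}"}} {value}')
    return "\n".join(lines) + "\n"

payload = render(
    "http_requests_total",
    "Total HTTP requests.",
    {("GET", "200"): 1024, ("POST", "500"): 3},
)
print(payload)
```

In practice you would not write this by hand: a client library maintains the values and renders this format for you.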
- Samples are created when a target is scraped. I wrongly thought that a sample was created when I called, e.g., `gauge.set(x)` using the client library. But `set()` just sets the current value; it isn't recorded as a sample until the Prometheus server scrapes it. So if I call `gauge.set(x)` and `gauge.set(y)` in quick succession, the value `x` may never be sampled at all.
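A toy gauge (not the real client library) makes the overwrite behavior concrete: `set()` only replaces the current value, and a sample exists only at the moment a scrape reads it.

```python
# Toy gauge: set() overwrites the current value; nothing becomes a
# sample until a scrape reads it.
class Gauge:
    def __init__(self):
        self.value = None

    def set(self, v):
        self.value = v  # overwrite in place; nothing is recorded

g = Gauge()
stored_samples = []

g.set(1.0)  # x: overwritten before any scrape, so never sampled
g.set(2.0)  # y: this is the value the scrape will see
stored_samples.append((1700000000, g.value))  # the scrape happens here

print(stored_samples)  # only y survives: [(1700000000, 2.0)]
```

If every individual event matters, a gauge is the wrong tool; use a counter or histogram, which accumulate between scrapes instead of overwriting.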
- PromQL has two time-series data types: instant vectors and range vectors. An instant vector is a set of time series, each with a single sample at the same timestamp. A range vector is a set of time series, each with potentially multiple samples within the same time range. `http_requests_total` is an instant vector; `http_requests_total[5m]` is a range vector.
- Only instant vectors can be graphed. This sounded backwards to me – doesn't a graph look like a range vector? But it becomes clearer when you understand how the graph is drawn. For each step from the graph's minimum time to its maximum, the PromQL query is evaluated at that instant (an x-coordinate) to produce an instant vector, which is a set of numeric values (the y-coordinates).
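The graph-drawing loop can be sketched directly (the query function here is a stand-in for PromQL evaluation, with fake data):

```python
# Sketch of how a graph is drawn: evaluate the query once per step;
# each evaluation yields an instant vector (one y-value per series).
def query_at(t):
    # Stand-in for evaluating a PromQL expression at instant t.
    return {"series_a": float(t % 7)}

t_min, t_max, step = 0, 30, 10
points = [(t, query_at(t)["series_a"]) for t in range(t_min, t_max + 1, step)]
print(points)  # [(0, 0.0), (10, 3.0), (20, 6.0), (30, 2.0)]
```

Each step supplies one x-coordinate, and the instant vector evaluated at that step supplies the y-coordinates, which is why the thing being graphed must be an instant vector.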
- `rate` turns a range vector into an instant vector. `rate(http_requests_total[5m])` evaluates to the average number of HTTP requests per second over the past five minutes. Notice that `http_requests_total[5m]` is a range vector, while the whole expression is an instant vector (whose instant is the end of the 5-minute interval). To make matters more confusing, if you use VictoriaMetrics's MetricsQL instead of PromQL, you can write `rate(http_requests_total)`, which is translated to `rate(http_requests_total[$__interval])` under the hood, where `$__interval` is the interval you chose for the graph.
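A simplified `rate` is just the counter increase over the window divided by the elapsed time. (Real Prometheus also extrapolates to the window boundaries and handles counter resets; this sketch ignores both.)

```python
# Simplified rate(): counter increase over the window / elapsed time.
# Ignores the extrapolation and counter-reset handling Prometheus does.
def simple_rate(window):
    (t0, v0), (t1, v1) = window[0], window[-1]
    return (v1 - v0) / (t1 - t0)

# Samples of http_requests_total inside a 5-minute window:
window = [(0, 100), (60, 160), (120, 220), (180, 280), (240, 340), (300, 400)]
print(simple_rate(window))  # 300 requests over 300 s -> 1.0 per second
```

The input is many samples per series (a range vector); the output is one number per series at the window's end (an instant vector).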
- `sum` aggregates instant vectors. `sum(metric)` takes an instant vector and returns another instant vector in which the samples of the different time series are added together. It does not sum over time, only across different series at the same instant.
- Histograms are just time series. Histograms track the distribution of events, like the time taken to serve an HTTP request. A histogram metric like `http_request_time_secs` is represented with a counter time series for each bucket. When you call `histogram.observe(x)`, the exact value `x` is not recorded anywhere. All that happens is that the bucket `x` falls into is incremented.
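A toy histogram shows how `observe()` discards the exact value. (Bucket counts are kept non-cumulative here for clarity; Prometheus actually stores them cumulatively, with each bucket labeled by its `le` upper bound.)

```python
import bisect

# Toy histogram: each bucket is just a counter; observe() increments
# the bucket a value falls into and throws the exact value away.
bounds = [0.1, 0.2, 0.5, 1.0, float("inf")]  # bucket upper bounds ("le")
counts = [0] * len(bounds)

def observe(x):
    # First bucket whose upper bound is >= x.
    counts[bisect.bisect_left(bounds, x)] += 1

for latency in [0.05, 0.15, 0.15, 0.3, 2.0]:
    observe(latency)

print(counts)  # [1, 2, 1, 0, 1] -- the exact latencies are gone
```

Each slot in `counts` corresponds to one counter time series in the real metric, e.g. `http_request_time_secs_bucket{le="0.2"}`.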
- Histograms are imprecise for calculating percentiles. You may want to know the P95 latency of your web server. A histogram metric cannot answer that question precisely, because calculating an arbitrary percentile requires storing every observed sample, which would consume an ever-growing amount of memory. Summary metrics are an alternative to histograms that let you choose the percentiles you are interested in and the error you are willing to tolerate, at the cost of more client-side overhead. Histogram metrics are a perfect fit if all you are interested in is, e.g., the percentage of requests that took more than 200 milliseconds.
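The threshold question a histogram answers exactly can be computed straight from the cumulative bucket counts (made-up numbers below; in Prometheus the `+Inf` bucket holds the total observation count):

```python
# Cumulative bucket counts as Prometheus stores them:
# key = "le" upper bound in seconds, value = observations <= that bound.
cumulative = {0.1: 40, 0.2: 90, 0.5: 98, float("inf"): 100}

total = cumulative[float("inf")]  # the +Inf bucket counts everything
over_200ms = (total - cumulative[0.2]) / total
print(over_200ms)  # 10 of 100 requests took more than 200 ms -> 0.1
```

This is exact because 200 ms happens to be a bucket boundary; an arbitrary percentile, by contrast, generally falls inside a bucket and must be interpolated, which is where the imprecision comes from.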
- Multi-process applications require special consideration. Python web servers commonly use multiple processes to serve requests. If you naively embed the Prometheus client library into your web application, each process will have its own set of metric values, and the Prometheus server will get an arbitrary process's metrics depending on which process happens to serve the scrape request. I solved this with multiprocess mode, though there are other options as well. ∎
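The idea behind multiprocess mode can be sketched as follows: each worker persists its own counts in a shared directory, and the scrape handler aggregates across all of them. (The real client library uses mmap'd files keyed by PID; the JSON file layout here is made up purely for illustration.)

```python
import json
import os
import tempfile

# Sketch of multiprocess mode: per-process files in a shared directory,
# summed at scrape time. Not the real client library's file format.
metrics_dir = tempfile.mkdtemp()

def worker_inc(pid, amount):
    path = os.path.join(metrics_dir, f"counter_{pid}.json")
    current = 0
    if os.path.exists(path):
        with open(path) as f:
            current = json.load(f)
    with open(path, "w") as f:
        json.dump(current + amount, f)

def scrape():
    # Sum the counter across every process's file, so the result does
    # not depend on which process happens to serve the scrape request.
    total = 0
    for name in os.listdir(metrics_dir):
        with open(os.path.join(metrics_dir, name)) as f:
            total += json.load(f)
    return total

worker_inc(pid=1, amount=3)  # "process 1" serves 3 requests
worker_inc(pid=2, amount=5)  # "process 2" serves 5 requests
print(scrape())  # 8, whichever process answers the scrape
```

Other options include running a separate exporter process per worker or pushing to an aggregating gateway; the common thread is that someone has to merge per-process state before Prometheus sees it.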