Metrics in Effect
In complex and highly concurrent applications, managing various interconnected components can be quite challenging. Ensuring that everything runs smoothly and avoiding application downtime becomes crucial in such setups.
Now, let’s imagine we have a sophisticated infrastructure with numerous services. These services are replicated and distributed across our servers. However, we often lack insight into what’s happening across these services, including error rates, response times, and service uptime. This lack of visibility can make it challenging to identify and address issues effectively. This is where Effect Metrics comes into play; it allows us to capture and analyze various metrics, providing valuable data for later investigation.
Effect Metrics offers support for five different types of metrics:
| Metric | Description |
| --- | --- |
| Counter | Counters are used to track values that increase over time, such as request counts. They help us keep tabs on how many times a specific event or action has occurred. |
| Gauge | Gauges represent a single numerical value that can fluctuate up and down over time. They are often used to monitor metrics like memory usage, which can vary continuously. |
| Histogram | Histograms are useful for tracking the distribution of observed values across different buckets. They are commonly used for metrics like request latencies, allowing us to understand how response times are distributed. |
| Summary | Summaries provide insight into a sliding window of a time series and offer metrics for specific percentiles of the time series, often referred to as quantiles. This is particularly helpful for understanding latency-related metrics, such as request response times. |
| Frequency | Frequency metrics count the occurrences of distinct string values. They are useful when you want to keep track of how often different events or conditions are happening in your application. |
In the world of metrics, a Counter is a metric that represents a single numerical value that can be both incremented and decremented over time. Think of it like a tally that keeps track of changes, such as the number of a particular type of request received by your application, whether it’s increasing or decreasing.
Unlike some other types of metrics (like gauges), where we’re interested in the value at a specific moment, with counters, we care about the cumulative value over time. This means it provides a running total of changes, which can go up and down, reflecting the dynamic nature of certain metrics.
Some typical use cases for counters include:
- Request Counts: Monitoring the number of incoming requests to your server.
- Completed Tasks: Keeping track of how many tasks or processes have been successfully completed.
- Error Counts: Counting the occurrences of errors in your application.
To create a counter, you can use the `Metric.counter` constructor.
Example (Creating a Counter)
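A minimal sketch using the `Metric.counter` constructor (the metric name and description below are placeholders):

```ts
import { Metric } from "effect"

// Create a counter named "request_count", with a description for later inspection
const requestCount = Metric.counter("request_count", {
  description: "A counter for tracking requests"
})
```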
Once created, the counter can accept an effect that returns a `number`, which will increment or decrement the counter.
Example (Using a Counter)
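A possible sketch of using a counter; the `Metric.value` accessor used to read the counter's state is an assumption here:

```ts
import { Effect, Metric } from "effect"

const requestCount = Metric.counter("request_count")

const program = Effect.gen(function* () {
  // Applying the counter to an effect updates it by the returned number
  yield* requestCount(Effect.succeed(2))  // +2
  yield* requestCount(Effect.succeed(-1)) // -1
  // Read the counter's current state
  const state = yield* Metric.value(requestCount)
  console.log(state.count) // Expected: 1
})

Effect.runPromise(program)
```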
You can specify whether the counter tracks a `number` or a `bigint`.
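For example, a counter backed by `bigint` values might be created like this (the `bigint: true` option name is assumed):

```ts
import { Metric } from "effect"

// A counter that accepts bigint inputs instead of number
const bigCounter = Metric.counter("big_count", {
  bigint: true
})
```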
If you need a counter that only increments, you can use the `incremental: true` option.
Example (Using an Increment-Only Counter)
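A sketch of an increment-only counter, again assuming `Metric.value` to read its state:

```ts
import { Effect, Metric } from "effect"

// With `incremental: true`, negative updates are ignored
const taskCount = Metric.counter("task_count", {
  incremental: true
})

const program = Effect.gen(function* () {
  yield* taskCount(Effect.succeed(5))  // count becomes 5
  yield* taskCount(Effect.succeed(-2)) // ignored: count stays 5
  const state = yield* Metric.value(taskCount)
  console.log(state.count) // Expected: 5
})

Effect.runPromise(program)
```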
In this configuration, the counter only accepts positive values. Any attempts to decrement will have no effect, ensuring the counter strictly counts upwards.
You can configure a counter to always increment by a fixed value each time it is invoked.
Example (Constant Input)
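A sketch of a constant-input counter; the `Metric.withConstantInput` combinator used here is an assumption:

```ts
import { Effect, Metric } from "effect"

// Always increment by 1, regardless of the tracked effect's result
const taskCount = Metric.counter("task_count").pipe(
  Metric.withConstantInput(1)
)

const task = Effect.succeed(42)

const program = Effect.gen(function* () {
  // Each tracked execution bumps the counter by exactly 1
  yield* taskCount(task)
  yield* taskCount(task)
  const state = yield* Metric.value(taskCount)
  console.log(state.count) // Expected: 2
})

Effect.runPromise(program)
```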
In the world of metrics, a Gauge is a metric that represents a single numerical value that can be set or adjusted. Think of it as a dynamic variable that can change over time. One common use case for a gauge is to monitor something like the current memory usage of your application.
Unlike counters, where we’re interested in cumulative values over time, with gauges, our focus is on the current value at a specific point in time.
Gauges are the best choice when you want to monitor values that can both increase and decrease, and you’re not interested in tracking their rates of change. In other words, gauges help us measure things that have a specific value at a particular moment.
Some typical use cases for gauges include:
- Memory Usage: Keeping an eye on how much memory your application is using right now.
- Queue Size: Monitoring the current size of a queue where tasks are waiting to be processed.
- In-Progress Request Counts: Tracking the number of requests currently being handled by your server.
- Temperature: Measuring the current temperature, which can fluctuate up and down.
To create a gauge, you can use the `Metric.gauge` constructor.
Example (Creating a Gauge)
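A minimal sketch using the `Metric.gauge` constructor (name and description are placeholders):

```ts
import { Metric } from "effect"

// Create a gauge named "memory_usage"
const memoryUsage = Metric.gauge("memory_usage", {
  description: "A gauge for memory usage"
})
```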
Once created, a gauge can be updated by passing an effect that produces the value you want to set for the gauge.
Example (Using a Gauge)
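A possible sketch, assuming `Random.nextIntBetween` to simulate readings and `Metric.value` to inspect the gauge:

```ts
import { Effect, Metric, Random } from "effect"

const temperature = Metric.gauge("temperature")

// Simulate a temperature reading between -10 and 10
const getTemperature = Random.nextIntBetween(-10, 10)

const program = Effect.gen(function* () {
  // Each call sets the gauge to the value produced by the effect
  yield* temperature(getTemperature)
  yield* temperature(getTemperature)
  const state = yield* Metric.value(temperature)
  console.log(state.value) // The most recently recorded reading
})

Effect.runPromise(program)
```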
You can specify whether the gauge tracks a `number` or a `bigint`.
A Histogram is a metric used to analyze how numerical values are distributed over time. Instead of focusing on individual data points, a histogram groups values into predefined ranges, called buckets, and tracks how many values fall into each range.
When a value is recorded, it gets assigned to one of the histogram’s buckets based on its range. Each bucket has an upper boundary, and the count for that bucket is increased if the value is less than or equal to its boundary. Once recorded, the individual value is discarded, and the focus shifts to how many values have fallen into each bucket.
Histograms also track:
- Total Count: The number of values that have been observed.
- Sum: The sum of all the observed values.
- Min: The smallest observed value.
- Max: The largest observed value.
Histograms are especially useful for calculating percentiles, which can help you estimate specific points in a dataset by analyzing how many values are in each bucket.
This concept is inspired by Prometheus, a well-known monitoring and alerting toolkit.
Histograms are particularly useful in performance analysis and system monitoring. By examining how response times, latencies, or other metrics are distributed, you can gain valuable insights into your system’s behavior. This data helps you identify outliers, performance bottlenecks, or trends that may require optimization.
Common use cases for histograms include:
- Percentile Estimation: Histograms allow you to approximate percentiles of observed values, like the 95th percentile of response times.
- Known Ranges: If you can estimate the range of values in advance, histograms can organize the data into predefined buckets for better analysis.
- Performance Metrics: Use histograms to track metrics like request latencies, memory usage, or throughput over time.
- Aggregation: Histograms can be aggregated across multiple instances, making them ideal for distributed systems where you need to collect data from different sources.
Example (Histogram With Linear Buckets)
In this example, we define a histogram with linear buckets, where the values range from `0` to `100` in increments of `10`. Additionally, we include a final bucket for values greater than `100`, referred to as the “Infinity” bucket. This configuration is useful for tracking numeric values, like request latencies, within specific ranges.
The program generates random numbers between `1` and `120`, records them in the histogram, and then prints the histogram’s state, showing the count of values that fall into each bucket.
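A sketch of such a histogram; `MetricBoundaries.linear`, `Random.nextIntBetween`, and `Metric.value` are assumed from the `effect` package:

```ts
import { Effect, Metric, MetricBoundaries, Random } from "effect"

// Buckets at 0, 10, 20, ..., 100, plus an implicit "Infinity" bucket
const latencyHistogram = Metric.histogram(
  "request_latency",
  MetricBoundaries.linear({ start: 0, width: 10, count: 11 })
)

// Record 100 random values between 1 and 120, then read the histogram's state
const program = latencyHistogram(Random.nextIntBetween(1, 120)).pipe(
  Effect.repeatN(99),
  Effect.andThen(Metric.value(latencyHistogram))
)

Effect.runPromise(program).then((state) => console.log(state))
```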
In this example, we demonstrate how to use a timer metric to track the duration of specific workflows. The timer captures how long certain tasks take to execute, storing this information in a histogram, which provides insights into the distribution of these durations.
We generate random values to simulate varying wait times, record the durations in the timer, and then print out the histogram’s state.
Example (Tracking Workflow Durations with a Timer Metric)
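A sketch of a timer-based workflow tracker; `Metric.timerWithBoundaries`, `Metric.trackDuration`, and `Array.range` are assumptions about the API:

```ts
import { Array, Effect, Metric, Random } from "effect"

// A timer whose underlying histogram uses boundaries 1 through 10
const timer = Metric.timerWithBoundaries("task_timer", Array.range(1, 10))

// Simulate a task whose duration varies between 1 and 10 milliseconds
const task = Effect.gen(function* () {
  const n = yield* Random.nextIntBetween(1, 10)
  yield* Effect.sleep(`${n} millis`)
})

const program = Effect.gen(function* () {
  // Record the duration of 100 task executions
  yield* Metric.trackDuration(task, timer).pipe(Effect.repeatN(99))
  const state = yield* Metric.value(timer)
  console.log(state)
})

Effect.runPromise(program)
```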
A Summary is a metric that gives insights into a series of data points by calculating specific percentiles. Percentiles help us understand how data is distributed. For instance, if you’re tracking response times for requests over the past hour, you may want to examine key percentiles such as the 50th, 90th, 95th, or 99th to better understand your system’s performance.
Summaries are similar to histograms in that they observe `number` values, but with a different approach. Instead of immediately sorting values into buckets and discarding them, a summary holds onto the observed values in memory. However, to avoid storing too much data, summaries use two parameters:
- `maxAge`: The maximum age a value can have before it’s discarded.
- `maxSize`: The maximum number of values stored in the summary.
This creates a sliding window of recent values, so the summary always represents a fixed number of the most recent observations.
Summaries are commonly used to calculate quantiles over this sliding window. A quantile is a number between `0` and `1` that represents the percentage of values less than or equal to a certain threshold. For example, a quantile of `0.5` (or 50th percentile) is the median value, while `0.95` (or 95th percentile) would represent the value below which 95% of the observed data falls.
Quantiles are helpful for monitoring important performance metrics, such as latency, and for ensuring that your system meets performance goals (like Service Level Agreements, or SLAs).
The Effect Metrics API also allows you to configure summaries with an error margin. This margin introduces a range of acceptable values for quantiles, improving the accuracy of the result.
Summaries are particularly useful in cases where:
- The range of values you’re observing is not known or estimated in advance, making histograms less practical.
- You don’t need to aggregate data across multiple instances or average results. Summaries calculate their results on the application side, meaning they focus on the specific instance where they are used.
Example (Creating and Using a Summary)
In this example, we will create a summary to track response times. The summary will:
- Hold up to `100` samples.
- Discard samples older than `1 day`.
- Have a `3%` error margin when calculating quantiles.
- Report the `10%`, `50%`, and `90%` quantiles, which help track response time distributions.
We’ll apply the summary to an effect that generates random integers, simulating response times.
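A sketch of such a summary, assuming the `Metric.summary` constructor accepts these options by name:

```ts
import { Effect, Metric, Random } from "effect"

const responseTimeSummary = Metric.summary({
  name: "response_time_summary",
  maxAge: "1 day",            // discard samples older than one day
  maxSize: 100,               // keep at most 100 samples
  error: 0.03,                // 3% error margin for quantile calculation
  quantiles: [0.1, 0.5, 0.9], // report the 10th, 50th, and 90th percentiles
  description: "Measures the distribution of response times"
})

// Record 100 simulated response times between 1 and 120, then read the state
const program = responseTimeSummary(Random.nextIntBetween(1, 120)).pipe(
  Effect.repeatN(99),
  Effect.andThen(Metric.value(responseTimeSummary))
)

Effect.runPromise(program).then((state) => console.log(state.quantiles))
```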
Frequencies are metrics that help count the occurrences of specific values. Think of them as a set of counters, each associated with a unique value. When new values are observed, the frequency metric automatically creates new counters for those values.
Frequencies are particularly useful for tracking how often distinct string values occur. Some example use cases include:
- Counting the number of invocations for each service in an application, where each service has a logical name.
- Monitoring how frequently different types of failures occur.
Example (Tracking Error Occurrences)
In this example, we’ll create a `Frequency` to observe how often different error codes occur. This can be applied to effects that return a `string` value:
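A sketch of such a frequency metric; the `Metric.frequency` constructor and `Metric.value` accessor are assumed:

```ts
import { Effect, Metric, Random } from "effect"

const errorFrequency = Metric.frequency("error_frequency")

// Simulate tasks that report one of ten error codes
const task = Effect.gen(function* () {
  const n = yield* Random.nextIntBetween(1, 10)
  return `Error-${n}`
})

const program = Effect.gen(function* () {
  // Record 100 observed error codes
  yield* errorFrequency(task).pipe(Effect.repeatN(99))
  const state = yield* Metric.value(errorFrequency)
  console.log(state.occurrences) // A map from error code to count
})

Effect.runPromise(program)
```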
Tags are key-value pairs you can add to metrics to provide additional context. They help categorize and filter metrics, making it easier to analyze and monitor specific aspects of your application’s performance or behavior.
You can tag individual metrics using the `Metric.tagged` function.
This allows you to add specific tags to a single metric, providing detailed context without applying tags globally.
Example (Tagging an Individual Metric)
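A minimal sketch using `Metric.tagged`:

```ts
import { Metric } from "effect"

// Tag this counter with the environment it reports from
const requestCount = Metric.counter("request_count").pipe(
  Metric.tagged("environment", "production")
)
```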
Here, the `request_count` metric is tagged with `"environment": "production"`, allowing you to filter or analyze metrics by this tag later.
You can use `Effect.tagMetrics` to apply tags to all metrics within the same context. This is useful when you want to apply common tags, like the environment (e.g., “production” or “development”), across multiple metrics.
Example (Tagging Multiple Metrics)
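A sketch using `Effect.tagMetrics`; the `Metric.increment` helper is an assumption:

```ts
import { Effect, Metric } from "effect"

const requestCount = Metric.counter("request_count")
const errorCount = Metric.counter("error_count")

const program = Effect.gen(function* () {
  yield* Metric.increment(requestCount)
  yield* Metric.increment(errorCount)
}).pipe(
  // Both counters receive the "environment": "production" tag
  Effect.tagMetrics("environment", "production")
)

Effect.runPromise(program)
```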
If you only want to apply tags within a specific scope, you can use `Effect.tagMetricsScoped`. This limits the tag application to metrics within that scope, allowing for more precise tagging control.
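Example (Tagging Metrics Within a Scope)
A sketch of scoped tagging; wrapping the workflow in `Effect.scoped` and the `Metric.increment` helper are assumptions:

```ts
import { Effect, Metric } from "effect"

const requestCount = Metric.counter("request_count")

// Only metric updates made while the scope is open receive the tag
const program = Effect.scoped(
  Effect.gen(function* () {
    yield* Effect.tagMetricsScoped("environment", "production")
    yield* Metric.increment(requestCount) // tagged
  })
)
// Metric updates performed outside this scope are not tagged

Effect.runPromise(program)
```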