
Map Prometheus metrics to observe resource utilization and saturation

This guide explains how to map Prometheus metrics so that Asserts can observe resource utilization and saturation. With these mappings, Asserts can identify trends, anomalies, and potential performance issues.

Resource utilization

Some resources have a finite limit. For these resources, you can:

  • Express their current level of utilization as a ratio against the limit.
  • Observe their utilization and get early warning of their saturation.

For these cases, record the asserts:resource metric.

For example, consider the number of clients connected to a Redis server. Redis has a configuration setting, maxclients, that limits the maximum number of clients. Both this limit and the current number of clients are available through metrics exposed by the Redis exporter.

Metric                     Details
redis_connected_clients    Number of active client connections
redis_config_maxclients    Maximum number of client connections allowed

Because the number of clients is a finite resource, it's useful to track its utilization and receive an early warning before it saturates. To do this, record the asserts:resource metric:

```yaml
- record: asserts:resource
  # Highest utilization across all instances of each service.
  expr: >
    max by (asserts_env, asserts_site, namespace, service, job) (
      redis_connected_clients / redis_config_maxclients
    )
  labels:
    asserts_entity_type: Service
    asserts_resource_type: client_connections
    asserts_source: redis_exporter
```
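Prometheus and Prometheus-compatible rulers load recording rules from rule groups rather than as standalone list items. The following is a minimal sketch of a rules file that holds the rule above. The file name and the group name redis-asserts-resources are hypothetical, and where you load the file depends on your setup.

```yaml
# Hypothetical rules file, for example redis-asserts-rules.yml.
groups:
  - name: redis-asserts-resources   # hypothetical group name
    rules:
      - record: asserts:resource
        expr: >
          max by (asserts_env, asserts_site, namespace, service, job) (
            redis_connected_clients / redis_config_maxclients
          )
        labels:
          asserts_entity_type: Service
          asserts_resource_type: client_connections
          asserts_source: redis_exporter
```

If you manage the file yourself, running promtool check rules redis-asserts-rules.yml validates its syntax before you load it.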

With this rule, client connection utilization is normalized to a value between 0 and 1. Next, define the warning and critical thresholds for saturation:

```yaml
- record: asserts:resource:threshold
  expr: 0.8
  labels:
    asserts_resource_type: client_connections
    asserts_severity: warning

- record: asserts:resource:threshold
  expr: 0.9
  labels:
    asserts_resource_type: client_connections
    asserts_severity: critical
```
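As a worked example, assume a Redis server configured with maxclients set to 10000 (the Redis default) that currently has 8500 connected clients. The recorded utilization and its comparison with the thresholds above would be:

```latex
u = \frac{\texttt{redis\_connected\_clients}}{\texttt{redis\_config\_maxclients}}
  = \frac{8500}{10000} = 0.85,
\qquad 0.8 < 0.85 < 0.9
```

In this case, the warning threshold is exceeded but the critical threshold is not.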

You can also manage the thresholds in the user interface by navigating to Observability > Asserts > Rules > Threshold and then clicking Resource.

Understand the Asserts meta labels and aggregation

  • asserts_env and asserts_site: The same Redis service may be deployed in multiple environments. In Asserts, all services and other infrastructure components are grouped by environment and site, for example, asserts_env = prod and asserts_site = us-west-2. The metrics, alerts, and services discovered from these metrics are scoped by these labels.

  • namespace and service: There may be multiple deployments of Redis for different purposes, for example, redis-payments and redis-orders. In K8s environments, these are deployed as separate stateful services. Asserts uses the namespace and service labels in the metric to uniquely identify each service in a given environment.

  • job: In non-K8s environments, each deployment of Redis has a different job label in its Prometheus scrape configuration. Asserts uses the job label in the metric to uniquely identify each service in a given environment.

  • asserts_entity_type: Redis may be clustered, and saturation may occur on only one or more of its instances. The earlier rule reports saturation at the Service level, so the instance detail is lost. To observe and report saturation at the instance level, write the rule so that the expression keeps the instance label, and set the entity type to ServiceInstance:

    ```yaml
    - record: asserts:resource
      # No aggregation, so each series keeps the instance label of its Redis instance.
      expr: >
        redis_connected_clients / redis_config_maxclients
      labels:
        asserts_entity_type: ServiceInstance
        asserts_resource_type: client_connections
        asserts_source: redis_exporter
    ```

  • asserts_resource_type: Asserts models the resources it observes as different types. For example, cpu:usage, memory:usage, and disk:usage are different resource types. In this example, the resource being observed is client_connections. You can set it to any value that best describes the resource and signal being observed.

  • asserts_source: The same or similar metrics are sometimes available from more than one instrumentation. The asserts_source meta label indicates which exporter is the source of the metric, which is helpful when investigating an alert.

You now have a recording rule that tracks utilization, and you understand the different parts of that rule. Asserts automatically observes the utilization of client connections and raises alerts when the warning or critical threshold is exceeded. Asserts has a default threshold for all resource utilization. You can override the saturation thresholds for each resource type with asserts:resource:threshold rules, as shown earlier, or in the user interface.

Automatic anomaly detection on resource usage

When you map resource metrics to asserts:resource:gauge, Asserts automatically detects anomalies in them. For example, while observing the utilization and saturation of client connections, it may also be useful to watch for anomalies in the number of connections, such as a sudden drop or spike.

Gauge resource metric

To observe anomalies such as those described above, record the number of client connections as a resource gauge:

```yaml
- record: asserts:resource:gauge
  expr: redis_connected_clients
  labels:
    asserts_resource_type: client_connections
    asserts_entity_type: Service
    asserts_source: redis_exporter
```
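The rule above reports the gauge with the Service entity type. Following the same convention described for asserts_entity_type, the following is a minimal sketch of an instance-level variant that sets the entity type to ServiceInstance. It mirrors the instance-level utilization rule and is meant to be used in place of the Service-level gauge rule, not alongside it. Check that per-instance reporting fits how you want anomalies surfaced before adopting it.

```yaml
- record: asserts:resource:gauge
  # No aggregation: each series keeps the instance label of its Redis instance.
  expr: redis_connected_clients
  labels:
    asserts_resource_type: client_connections
    asserts_entity_type: ServiceInstance
    asserts_source: redis_exporter
```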