Configure CloudWatch metric streams
CloudWatch metric streams require an access policy token with the `metrics:write` permission from Grafana Cloud, multiple AWS IAM roles, a CloudWatch metric stream, an Amazon Data Firehose stream, and an AWS resource metadata scrape job. You can configure CloudWatch metric streams using a combination of the Cloud Provider UI and CloudFormation or Terraform.
Before you begin
In your Grafana Cloud portal, expand Observability > Cloud provider in the main menu, then select AWS, the Configuration tab, and the CloudWatch metric streams card. Find and copy the following information, which you need to configure CloudWatch metric streams:
- The API token with the `metrics:write` permission. Create a Grafana.com token by entering a name for the token and clicking Create token.

If you are using Terraform, store this information in your list of variables.
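If you do store these values as Terraform variables, the declarations might look like the following sketch. This is illustrative only: the variable names match the `-var` flags used with `terraform apply` later in this guide, but the descriptions and `sensitive` settings are assumptions.

```hcl
# Illustrative variable declarations for the values gathered above.
# Names match the -var flags used with `terraform apply` later in this guide.
variable "grafana_cloud_stack_slug" {
  description = "Slug of the Grafana Cloud stack, the <slug> in https://<slug>.grafana.net"
  type        = string
}

variable "cloud_provider_token" {
  description = "Grafana.com token used to create the AWS resource metadata scrape job"
  type        = string
  sensitive   = true
}

variable "metrics_write_token" {
  description = "Grafana.com token with the metrics:write permission"
  type        = string
  sensitive   = true
}
```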
Configure metric streams with CloudFormation
Configuring a CloudWatch metric stream with CloudFormation requires an access policy token with the `metrics:write` permission from Grafana Cloud and a CloudFormation stack that includes the ARN of an AWS IAM role used to set up the AWS resource metadata scrape job.
Launch CloudFormation stack
- Click the Launch CloudFormation stack button for step 2.
- Complete the steps to create all of the AWS components in CloudFormation.
- Copy the ARN from the AWS IAM role generated in the CloudFormation stack to use when creating the AWS metadata scrape job.
- Update the MetricsWriteToken field with the Grafana.com token you generated.
Set static labels in AWS
Setting static labels offers you an additional way to filter and group your metrics in Grafana Cloud.
Set static labels using the `X-Amz-Firehose-Common-Attributes` header.
To set static labels in AWS:
- Navigate to your Amazon Data Firehose.
- Select the Configuration tab.
- Select the Edit button for Destination settings.
- Select the Add parameter button for Parameters - optional.
- Enter a key-value pair in the corresponding text boxes.

Label keys must be prefixed with `lbl_`, and the label keys and values must be compatible with the Prometheus data model specification.
When you query the static labels in Grafana, do not include the `lbl_` prefix, as in the following example query:
{job=~"cloud/aws/.+", label1="value1", label2="value2"}
Create an AWS resource metadata scrape job in the UI
Metrics pushed to Grafana Cloud by a metric stream only contain the region and dimensions as labels. To enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags, create an AWS Resource Metadata scrape job.
Connect to AWS account
- Enter the name of your account in the Account name (optional) field. Give your account a unique name that contains only alphanumeric characters, dashes, and underscores.
- Paste the ARN you copied from the AWS IAM role that was generated when you launched your CloudFormation stack.
- Select the regions where you have services you want to monitor from the AWS Regions drop-down menu.
- Click Add account to ensure the connection is working and to save your new account.
Add resource metadata job options
- Enter a name for your resource metadata job. Give your scrape job a unique name that contains only alphanumeric characters, dashes, and underscores.
- Optionally, add static labels for easier filtering and grouping. These labels are added to all metrics exported by this scrape job.
- Choose the services you want to scrape. You can search in the search box or browse in the list of services.
- Click Edit next to a service if you want to customize the metadata collected for that service or namespace.
- Select the scrape interval.
- Add tag filters you want to include.
- Click Save service settings.
- Click Create job.
Configure metric streams with Terraform
Configuring a CloudWatch metric stream with Terraform requires an access policy token with the `metrics:write` permission from Grafana Cloud and multiple AWS components.
After you have configured the metric stream, you need to configure an AWS resource metadata scrape job to enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags.
Download the example CloudWatch metric stream Terraform file as a starting point for configuring your metric stream.
- Download the CloudWatch metric stream Terraform snippet file.
- Complete the sections labeled with FILLME and replace the placeholder values with your own.
- Run terraform apply, including the required variables, as in the following example:

terraform apply \
  -var="grafana_cloud_stack_slug=<The slug of the Grafana Cloud stack to use for the AWS resource metadata scrape job, the <slug> in https://<slug>.grafana.net>" \
  -var="cloud_provider_token=<The Grafana.com token used for creating the AWS resource metadata scrape job>" \
  -var="cloud_provider_url=<The URL to call Grafana Cloud's Cloud Provider API>" \
  -var="metrics_write_token=<The Grafana.com token used to write metrics to Mimir>" \
  -var="include_namespaces=<A list of AWS namespaces to include in the metric stream>"
The following instructions explain the different parts of the example file.
Configure the AWS and Grafana Providers
To configure the AWS and Grafana Providers, you need to create a Grafana Cloud access policy token and obtain the regional Cloud Provider API endpoint.
Create a Grafana Cloud access policy token
To create an access policy for your organization in the Grafana Cloud portal, refer to the Create an access policy for a stack steps.
In step 6, add the following scopes:
- integration-management:read
- integration-management:write
- stacks:read
After you create the policy, click Add token to generate a token to authenticate the provider with the Cloud Provider API. Give your token an appropriate name and select an Expiration date. We recommend you select a specific expiration date and do not set the Expiration date to No expiry, as this can create a security vulnerability.
Use this access policy token to call the Grafana Cloud API for the stack name and other properties of the stack. You can also use it to call the Cloud Provider API to manage the AWS account and scrape job resources including AWS resource metadata scrape jobs.
Obtain the regional Cloud provider API endpoint
- Use the following script to return a list of all of the Grafana Cloud stacks you own, along with their respective Cloud Provider API hostnames:

curl -sH "Authorization: Bearer <Access Token from previous step>" "https://grafana.com/api/instances" | \
  jq '[.items[]|{stackName: .slug, clusterName: .clusterSlug, cloudProviderAPIURL: "https://cloud-provider-api-\(.clusterSlug).grafana.net"}]'
- Select the hostname for the stack you want to manage.
In the following example, the hostname for the herokublogpost stack is https://cloud-provider-api-prod-us-central-0.grafana.net:

[
  {
    "stackName": "herokublogpost",
    "clusterName": "prod-us-central-0",
    "cloudProviderAPIURL": "https://cloud-provider-api-prod-us-central-0.grafana.net"
  }
]

Use this API endpoint to call the Cloud Provider API.
Example Terraform
The following snippet is an example configuration of the Grafana and AWS Providers using the Grafana Cloud access token and Cloud Provider API endpoint you obtained:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
grafana = {
source = "grafana/grafana"
version = ">= 3.24.1"
}
}
}
provider "aws" {
// FILLME: AWS region
region = ""
// FILLME: local AWS profile to use
profile = ""
}
provider "grafana" {
cloud_provider_access_token = var.cloud_provider_token // Grafana Cloud access policy token used to call the Grafana Cloud stack data source for getting the stack name and other properties of the stack.
cloud_access_policy_token = var.cloud_provider_token // Grafana Cloud access policy token used to call the Cloud Provider API to manage the AWS account and scrape job resources.
cloud_provider_url = var.cloud_provider_url // Cloud Provider API URL
}
Refer to the Terraform documentation for more details on each of the following providers:
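The snippets in the rest of this guide reference a Grafana Cloud stack data source (as data.grafana_cloud_stack.main) and the current AWS region (as data.aws_region.current) without defining them. A minimal sketch of those lookups, assuming the stack slug is supplied as a variable:

```hcl
// Looks up the Grafana Cloud stack so later resources can read its ID,
// Prometheus URL, and Prometheus user ID (referenced as data.grafana_cloud_stack.main).
data "grafana_cloud_stack" "main" {
  slug = var.grafana_cloud_stack_slug
}

// The AWS region of the current provider configuration
// (referenced as data.aws_region.current).
data "aws_region" "current" {}
```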
Create a CloudWatch metric and Data Firehose delivery stream
You must create the following infrastructure in your AWS account for sending the metrics to Grafana Cloud.
Authentication components
Create an IAM role and policy that the Data Firehose stream can assume and allows it to emit error logs and back up to an S3 bucket, as in the following example:
// Batches whose delivery failed are written here
resource "aws_s3_bucket" "fallback" {
  bucket = var.fallback_bucket_name
}

// Main IAM role used by the Firehose stream for writing failed batches to S3
resource "aws_iam_role" "firehose" {
  name               = format("Firehose-%s", var.metric_stream_name)
  assume_role_policy = data.aws_iam_policy_document.firehose_assume_role.json
}

data "aws_iam_policy_document" "firehose_assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

# Allow Firehose to emit error logs and back up to S3
resource "aws_iam_role_policy" "firehose" {
  name = format("Firehose-%s", var.metric_stream_name)
  # Attach to the Firehose role
  role = aws_iam_role.firehose.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      # Allow Firehose to write error logs
      {
        Effect   = "Allow"
        Resource = ["*"]
        Action   = ["logs:PutLogEvents"]
      },
      # Allow Firehose to back up events to S3
      {
        "Sid" : "s3Permissions",
        "Effect" : "Allow",
        "Action" : [
          "s3:AbortMultipartUpload",
          "s3:GetBucketLocation",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:PutObject",
        ],
        "Resource" : [
          aws_s3_bucket.fallback.arn,
          "${aws_s3_bucket.fallback.arn}/*",
        ]
      },
    ]
  })
}
Create an IAM role and policy that’s assumed by the CloudWatch Metric stream to allow it to push metrics to the Data Firehose stream, as in the following example:
// IAM role used by the CloudWatch metric stream for forwarding metrics to Firehose
resource "aws_iam_role" "metric_stream_role" {
  name = format("metric-stream-role-%s", var.cluster)
  # Allow the metric stream to assume this role
  assume_role_policy = data.aws_iam_policy_document.metric_stream_assume_role.json
}

data "aws_iam_policy_document" "metric_stream_assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["streams.metrics.cloudwatch.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role_policy" "metric_stream_role" {
  name = "AWSCloudWatchMetricStreamPolicy"
  role = aws_iam_role.metric_stream_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      // Allow the metric stream to write to Firehose
      {
        Action = ["firehose:PutRecord", "firehose:PutRecordBatch"]
        Effect = "Allow"
        Resource = [
          aws_kinesis_firehose_delivery_stream.stream.arn,
        ]
      },
    ]
  })
}
Create an IAM role and policy that’s assumed by Grafana to access only your CloudWatch metadata, as in the following example:
// IAM resources needed to authorize Grafana Cloud to scrape AWS resource metadata
data "aws_iam_policy_document" "trust_grafana" {
  statement {
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.grafana_cloud_sts_aws_account_id}:root"]
    }
    actions = ["sts:AssumeRole"]
    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [data.grafana_cloud_stack.main.prometheus_user_id]
    }
  }
}

resource "aws_iam_role" "grafana_cloud_aws_resource_metadata" {
  name        = "GrafanaAWSResourceMetadataScrapeJobAccess"
  description = "Role used by Grafana CloudWatch integration."
  # Allow Grafana Labs' AWS account to assume this role.
  assume_role_policy = data.aws_iam_policy_document.trust_grafana.json
}

resource "aws_iam_role_policy" "grafana_cloud_aws_resource_metadata" {
  name = "GrafanaAWSResourceMetadataScrapeJobAccess"
  role = aws_iam_role.grafana_cloud_aws_resource_metadata.id
  # This policy allows the role to discover resources via tags and API calls.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "tag:GetResources",
          "apigateway:GET",
          "aps:ListWorkspaces",
          "autoscaling:DescribeAutoScalingGroups",
          "dms:DescribeReplicationInstances",
          "dms:DescribeReplicationTasks",
          "ec2:DescribeTransitGatewayAttachments",
          "ec2:DescribeSpotFleetRequests",
          "shield:ListProtections",
          "storagegateway:ListGateways",
          "storagegateway:ListTagsForResource"
        ]
        Resource = "*"
      }
    ]
  })
}

// Allow some time for IAM (global) changes to propagate
resource "time_sleep" "wait_iam_propagation" {
  depends_on = [
    aws_iam_role.grafana_cloud_aws_resource_metadata,
    aws_iam_role_policy.grafana_cloud_aws_resource_metadata
  ]
  create_duration = "10s"
}
Data Firehose delivery stream component
Create the Data Firehose stream that sends metrics to the configured Grafana Cloud endpoint, as in the following example:
locals {
// If the target endpoint is not explicitly provided, then convert the stack's Prometheus URL
// to the Grafana Cloud AWS Metric Streaming ingest endpoint.
// Ex: https://prometheus-prod-03-prod-us-central-0.grafana.net
// becomes https://aws-metric-streams-prod-03.grafana.net/aws-metrics/api/v1/push
target_endpoint = var.target_endpoint != "" ? var.target_endpoint : format("%s/aws-metrics/api/v1/push", replace(
replace(data.grafana_cloud_stack.main.prometheus_url, "prometheus", "aws-metric-streams"),
"-${data.grafana_cloud_stack.main.cluster_slug}",
""
))
}
resource "aws_kinesis_firehose_delivery_stream" "stream" {
name = format("%s-firehose", var.metric_stream_name)
destination = "http_endpoint"
http_endpoint_configuration {
url = local.target_endpoint
name = "Grafana AWS Metric Stream Destination"
access_key = format("%s:%s",data.grafana_cloud_stack.main.prometheus_user_id, var.metrics_write_token)
// Buffer incoming data to the specified size, in MBs, before delivering it to the destination
buffering_size = 1
// Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
// Setting to 1 minute to keep a low enough latency between metric production and actual time they are processed
buffering_interval = 60
role_arn = aws_iam_role.firehose.arn
s3_backup_mode = "FailedDataOnly"
request_configuration {
content_encoding = "GZIP"
}
// This block configures the fallback S3 bucket destination
s3_configuration {
role_arn = aws_iam_role.firehose.arn
bucket_arn = aws_s3_bucket.fallback.arn
buffering_size = 5
buffering_interval = 300
compression_format = "GZIP"
}
// Optional block for writing delivery failures to a CloudWatch log group
// this assumes the target log group has been created, or is created in this same snippet
dynamic "cloudwatch_logging_options" {
for_each = var.log_delivery_errors ? [1] : []
content {
enabled = true
log_group_name = var.errors_log_group_name
log_stream_name = var.errors_log_stream_name
}
}
}
}
Set static labels in Terraform
Setting static labels offers you an additional way to filter and group your metrics in Grafana Cloud.
Set static labels using the X-Amz-Firehose-Common-Attributes
header.
Configure the request_configuration
block, with a common_attributes
sub-block of the Amazon Data Firehose delivery stream configuration, as in the following example:
http_endpoint_configuration {
url = local.target_endpoint
name = "Grafana AWS Metric Stream Destination"
access_key = format("%s:%s",data.grafana_cloud_stack.main.prometheus_user_id, var.metrics_write_token)
// Buffer incoming data to the specified size, in MBs, before delivering it to the destination
buffering_size = 1
// Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
// Setting to 1 minute to keep a low enough latency between metric production and actual time they are processed
buffering_interval = 60
role_arn = aws_iam_role.firehose.arn
s3_backup_mode = "FailedDataOnly"
request_configuration {
content_encoding = "GZIP"
common_attributes {
name = "lbl_testname"
value = "testvalue"
}
common_attributes {
name = "lbl_testname2" // static label names must be prefixed with `lbl_`
value = "testvalue2" // static label names and values must be compatible with the Prometheus data model specification
}
}
...
}
Label names must be prefixed with lbl_
and the label names and values must be compatible with the Prometheus data model specification.
When you query the static labels in Grafana, do not include the `lbl_` prefix, as in the following example query:
{job=~"cloud/aws/.+", label1="value1", label2="value2"}
CloudWatch metric stream component
Create a CloudWatch metric stream with include
and exclude
filters to define which metrics to push into the Data Firehose stream, as in the following example:
resource "aws_cloudwatch_metric_stream" "metric_stream" {
name = var.metric_stream_name
role_arn = aws_iam_role.metric_stream_role.arn
firehose_arn = aws_kinesis_firehose_delivery_stream.stream.arn
output_format = "opentelemetry1.0"
dynamic "include_filter" {
// Stream all metrics from the specified namespaces
for_each = var.include_namespaces
content {
namespace = include_filter.value
}
}
}
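The example above streams all metrics from the namespaces listed in var.include_namespaces. A metric stream can instead use exclude_filter blocks to stream everything except the listed namespaces; a stream accepts include filters or exclude filters, but not both. The following is a hypothetical variant (the resource name and excluded namespace are illustrative):

```hcl
// Hypothetical variant: stream all CloudWatch metrics except the listed namespaces.
// A metric stream accepts include_filter or exclude_filter blocks, not both.
resource "aws_cloudwatch_metric_stream" "metric_stream_exclude" {
  name          = var.metric_stream_name
  role_arn      = aws_iam_role.metric_stream_role.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.stream.arn
  output_format = "opentelemetry1.0"

  exclude_filter {
    namespace = "AWS/Usage"
  }
}
```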
Configure an AWS resource metadata scrape job in Terraform
Metrics pushed to Grafana Cloud by a metric stream only contain the region and dimensions as labels. To enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags, create an AWS Resource Metadata scrape job.
Configure the services, scrape interval, and tag filters you want included using the grafana_cloud_provider_aws_resource_metadata_scrape_job
resource documentation, as in the following example:
resource "grafana_cloud_provider_aws_account" "main" {
depends_on = [
time_sleep.wait_iam_propagation
]
stack_id = data.grafana_cloud_stack.main.id
role_arn = aws_iam_role.grafana_cloud_aws_resource_metadata.arn
regions = [data.aws_region.current.name]
}
resource "grafana_cloud_provider_aws_resource_metadata_scrape_job" "main" {
stack_id = data.grafana_cloud_stack.main.id
name = "aws-resource-metadata-scraper"
aws_account_resource_id = grafana_cloud_provider_aws_account.main.resource_id
dynamic "service" {
for_each = var.include_namespaces
content {
name = service.value
}
}
}
Grafana Terraform provider resource descriptions
You can define the following resources and data sources with the Grafana Terraform provider.
| Resource name | Documentation reference | Description |
| --- | --- | --- |
| grafana_cloud_provider_aws_account | Doc | Represents an AWS IAM role that authorizes Grafana Cloud to pull Amazon CloudWatch metrics for a set of regions. Usually, there’s one of these resources per configured AWS account. |
| grafana_cloud_provider_aws_resource_metadata_scrape_job | Doc | Represents a Grafana AWS Resource Metadata scrape job. This resource configures Grafana to fetch resource metadata for one or multiple AWS services, for a given grafana_cloud_provider_aws_account. |