
Configure CloudWatch metric streams

CloudWatch metric streams require an access policy token with the metrics:write permission from Grafana Cloud, multiple AWS IAM roles, a CloudWatch metric stream, a Data Firehose stream, and an AWS resource metadata scrape job. You can configure CloudWatch metric streams using a combination of the Cloud Provider UI and CloudFormation or Terraform.

Before you begin

In your Grafana Cloud portal, expand Observability > Cloud provider in the main menu, then select AWS, the Configuration tab, and the CloudWatch metric streams card to find and copy the information you need to configure CloudWatch metric streams:

  • The API token with the metrics:write permission.
    Create a Grafana.com token by entering a name for the token and clicking Create token.

If you are using Terraform, store this information in your list of variables.
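
If you manage this configuration with Terraform, you might declare these values as input variables. The following is a hedged sketch using the variable names that appear in the example Terraform file later on this page; adjust names and descriptions to your setup:

```terraform
# Sketch of a variables.tf holding the values collected above.
# Variable names match those used in the example Terraform file on this page.
variable "metrics_write_token" {
  description = "Grafana Cloud access policy token with the metrics:write scope"
  type        = string
  sensitive   = true
}

variable "cloud_provider_token" {
  description = "Grafana Cloud access policy token used by the Grafana provider"
  type        = string
  sensitive   = true
}

variable "cloud_provider_url" {
  description = "Regional Cloud Provider API endpoint for your stack"
  type        = string
}
```

Marking the tokens as sensitive keeps them out of plan output and logs.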

Configure metric streams with CloudFormation

Configuring a CloudWatch metric stream with CloudFormation requires an access policy token with metrics:write permissions from Grafana Cloud and a CloudFormation stack that includes an ARN from an AWS IAM role used to set up the AWS resource metadata scrape job.

Launch CloudFormation stack

  1. Click the Launch CloudFormation stack button for step 2.
  2. Complete the steps to create all of the AWS components in CloudFormation.
  3. Copy the ARN from the AWS IAM role generated in the CloudFormation stack to use when creating the AWS metadata scrape job.
  4. Update the MetricsWriteToken field with the Grafana.com token you generated.
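
If you work from the CLI, you can also read the generated role ARN back from the stack outputs. The following is a hedged sketch; replace <your-stack-name> with the name you chose when launching the stack, and note that the exact output key depends on the template:

```shell
# List the outputs of the launched CloudFormation stack; the IAM role ARN
# used for the AWS metadata scrape job appears among them.
aws cloudformation describe-stacks \
  --stack-name <your-stack-name> \
  --query "Stacks[0].Outputs"
```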

Set static labels in AWS

Setting static labels offers you an additional way to filter and group your metrics in Grafana Cloud. Set static labels using the X-Amz-Firehose-Common-Attributes header.

To set static labels in AWS:

  1. Navigate to your Amazon Data Firehose.
  2. Select the Configuration tab.
  3. Select the Edit button for Destination settings.
  4. Select the Add parameter button for Parameters - optional.
  5. Enter a key-value pair in the corresponding text boxes.

Label keys must be prefixed with lbl_, and the label keys and values must be compatible with the Prometheus data model specification.
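
If you want to sanity-check a key before entering it in the console, the following is a small shell sketch of that rule; lbl_environment is a hypothetical example key:

```shell
# Validate a static label key: it must start with lbl_ and the remainder
# must match the Prometheus label-name pattern [a-zA-Z_][a-zA-Z0-9_]*
key="lbl_environment"   # hypothetical example key
if printf '%s' "$key" | grep -Eq '^lbl_[a-zA-Z_][a-zA-Z0-9_]*$'; then
  echo "valid"
else
  echo "invalid"
fi
```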

When you query in Grafana for the static labels, do not include the lbl_ prefix, as in the following example query:

{job=~"cloud/aws/.+", label1="value1", label2="value2"}

Create an AWS resource metadata scrape job in the UI

Metrics pushed to Grafana Cloud by metric streams contain only the region and dimensions as labels. To enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags, create an AWS Resource Metadata scrape job.

Connect to AWS account

  1. Enter a name for your account in the Account name (optional) field. Give your account a unique name that contains only alphanumeric characters, dashes, and underscores.
  2. Paste the ARN you copied from the AWS IAM role that was generated when you launched your CloudFormation stack.
  3. Select the regions where you have services you want to monitor from the AWS Regions drop-down menu.
  4. Click Add account to ensure the connection is working and to save your new account.

Add resource metadata job options

  1. Enter a name for your resource metadata job. Give your scrape job a unique name that contains only alphanumeric characters, dashes, and underscores.
  2. Optionally, add static labels for easier filtering and grouping. These labels are added to all metrics exported by this scrape job.
  3. Choose the services you want to scrape. You can search in the search box or browse in the list of services.
  4. Click Edit next to the service if you want to customize the metadata that is collected for that service or namespace.
    1. Select the scrape interval.
    2. Add tag filters you want to include.
    3. Click Save service settings.
  5. Click Create job.

Configure metric streams with Terraform

Configuring a CloudWatch metric stream with Terraform requires an access policy token with metrics:write permissions from Grafana Cloud and multiple AWS components. After you have configured the metric stream, you need to configure an AWS resource metadata scrape job to enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags.

Download the example CloudWatch metric stream Terraform file as a starting point for configuring your metric stream.

The following instructions explain the different parts of the example file.

Configure the AWS and Grafana Providers

To configure the AWS and Grafana Providers, you need to create a Grafana Cloud access policy token and obtain the regional Cloud Provider API endpoint.

Create a Grafana Cloud access policy token

To create an access policy for your organization in the Grafana Cloud portal, refer to the Create an access policy for a stack steps.

In step 6, add the following scopes:

  • integration-management:read
  • integration-management:write
  • stacks:read

After you create the policy, click Add token to generate a token to authenticate the provider with the Cloud Provider API. Give your token an appropriate name and select an Expiration date. We recommend you select a specific expiration date and do not set the Expiration date to No expiry, as this can create a security vulnerability.

Use this access policy token to call the Grafana Cloud API for the stack name and other properties of the stack. You can also use it to call the Cloud Provider API to manage the AWS account and scrape job resources including AWS resource metadata scrape jobs.

Obtain the regional Cloud provider API endpoint

  1. Use the following script to return a list of all of the Grafana Cloud stacks you own, along with their respective Cloud Provider API hostnames:
    bash
    curl -sH "Authorization: Bearer <Access Token from previous step>" "https://grafana.com/api/instances" | \
    jq '[.items[]|{stackName: .slug, clusterName:.clusterSlug, cloudProviderAPIURL: "https://cloud-provider-api-\(.clusterSlug).grafana.net"}]'
  2. Select the hostname for the stack you want to manage.
    In the following example, the hostname for the herokublogpost stack is https://cloud-provider-api-prod-us-central-0.grafana.net:
    json
    [
      {
        "stackName": "herokublogpost",
        "clusterName": "prod-us-central-0",
        "cloudProviderAPIURL": "https://cloud-provider-api-prod-us-central-0.grafana.net"
      }
    ]
    Use this API endpoint to call the Cloud Provider API.

Example Terraform

The following snippet is an example configuration of the Grafana and AWS Providers using the Grafana Cloud access token and Cloud Provider API endpoint you obtained:

terraform
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    grafana = {
      source  = "grafana/grafana"
      version = ">= 3.24.1"
    }
  }
}

provider "aws" {
  // FILLME: AWS region
  region = ""

  // FILLME: local AWS profile to use
  profile = ""
}

provider "grafana" {
  cloud_provider_access_token = var.cloud_provider_token // Grafana Cloud access policy token used to call the Cloud Provider API to manage the AWS account and scrape job resources.
  cloud_access_policy_token   = var.cloud_provider_token // Grafana Cloud access policy token used to call the Grafana Cloud stack data source for getting the stack name and other properties of the stack.
  cloud_provider_url          = var.cloud_provider_url   // Cloud Provider API URL
}

Refer to the Terraform documentation for more details on the AWS and Grafana providers.

Create a CloudWatch metric and Data Firehose delivery stream

You must create the following infrastructure in your AWS account to send metrics to Grafana Cloud.

Authentication components

  1. Create an IAM role and policy that the Data Firehose stream can assume, allowing it to emit error logs and back up failed batches to an S3 bucket, as in the following example:

    terraform
    // Batches whose delivery failed are written here
    resource "aws_s3_bucket" "fallback" {
      bucket = var.fallback_bucket_name
    }

    // Main IAM role used by the Firehose stream for writing failed batches to S3
    resource "aws_iam_role" "firehose" {
      name = format("Firehose-%s", var.metric_stream_name)

      assume_role_policy = data.aws_iam_policy_document.firehose_assume_role.json
    }

    data "aws_iam_policy_document" "firehose_assume_role" {
      statement {
        effect = "Allow"

        principals {
          type        = "Service"
          identifiers = ["firehose.amazonaws.com"]
        }

        actions = ["sts:AssumeRole"]
      }
    }

    # Allow Firehose to emit error logs and back up to S3
    resource "aws_iam_role_policy" "firehose" {
      name = format("Firehose-%s", var.metric_stream_name)

      # Attach the policy to the Firehose role
      role = aws_iam_role.firehose.id

      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          # Allow Firehose to write error logs
          {
            Effect   = "Allow"
            Resource = ["*"]
            Action   = ["logs:PutLogEvents"]
          },
          # Allow Firehose to back up events to S3
          {
            Sid    = "s3Permissions"
            Effect = "Allow"
            Action = [
              "s3:AbortMultipartUpload",
              "s3:GetBucketLocation",
              "s3:GetObject",
              "s3:ListBucket",
              "s3:ListBucketMultipartUploads",
              "s3:PutObject",
            ]
            Resource = [
              aws_s3_bucket.fallback.arn,
              "${aws_s3_bucket.fallback.arn}/*",
            ]
          },
        ]
      })
    }
  2. Create an IAM role and policy that the CloudWatch metric stream assumes, allowing it to push metrics to the Data Firehose stream, as in the following example:

    terraform
    // IAM role used by CloudWatch metric stream for forwarding metrics to Firehose
    resource "aws_iam_role" "metric_stream_role" {
      name = format("metric-stream-role-%s", var.cluster)
    
      # allow metric stream to assume this role
      assume_role_policy = data.aws_iam_policy_document.metric_stream_assume_role.json
    }
    
    data "aws_iam_policy_document" "metric_stream_assume_role" {
      statement {
        effect = "Allow"

        principals {
          type        = "Service"
          identifiers = ["streams.metrics.cloudwatch.amazonaws.com"]
        }

        actions = ["sts:AssumeRole"]
      }
    }

    resource "aws_iam_role_policy" "metric_stream_role" {
      name = "AWSCloudWatchMetricStreamPolicy"
      role = aws_iam_role.metric_stream_role.id

      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          // Allow the metric stream to write to Firehose
          {
            Action = ["firehose:PutRecord", "firehose:PutRecordBatch"]
            Effect = "Allow"
            Resource = [
              aws_kinesis_firehose_delivery_stream.stream.arn,
            ]
          },
        ]
      })
    }
  3. Create an IAM role and policy that’s assumed by Grafana to access only your CloudWatch metadata, as in the following example:

    terraform
    // IAM resources needed to authorize Grafana Cloud to scrape AWS resource metadata
    data "aws_iam_policy_document" "trust_grafana" {
      statement {
        effect = "Allow"
        principals {
          type        = "AWS"
          identifiers = ["arn:aws:iam::${var.grafana_cloud_sts_aws_account_id}:root"]
        }
        actions = ["sts:AssumeRole"]
        condition {
          test     = "StringEquals"
          variable = "sts:ExternalId"
          values   = [data.grafana_cloud_stack.main.prometheus_user_id]
        }
      }
    }
    
    resource "aws_iam_role" "grafana_cloud_aws_resource_metadata" {
      name        = "GrafanaAWSResourceMetadataScrapeJobAccess"
      description = "Role used by Grafana CloudWatch integration."
      # Allow Grafana Labs' AWS account to assume this role.
      assume_role_policy = data.aws_iam_policy_document.trust_grafana.json
    }
    
    resource "aws_iam_role_policy" "grafana_cloud_aws_resource_metadata" {
      name = "GrafanaAWSResourceMetadataScrapeJobAccess"
      role = aws_iam_role.grafana_cloud_aws_resource_metadata.id
      # This policy allows the role to discover resources via tags and API calls.
      policy = jsonencode({
        Version = "2012-10-17"
        Statement = [
          {
            Effect = "Allow"
            Action = [
              "tag:GetResources",
              "apigateway:GET",
              "aps:ListWorkspaces",
              "autoscaling:DescribeAutoScalingGroups",
              "dms:DescribeReplicationInstances",
              "dms:DescribeReplicationTasks",
              "ec2:DescribeTransitGatewayAttachments",
              "ec2:DescribeSpotFleetRequests",
              "shield:ListProtections",
              "storagegateway:ListGateways",
              "storagegateway:ListTagsForResource"
            ]
            Resource = "*"
          }
        ]
      })
    }
    
    // Allow some time for IAM (global) changes to propagate
    resource "time_sleep" "wait_iam_propagation" {
      depends_on = [
        aws_iam_role.grafana_cloud_aws_resource_metadata,
        aws_iam_role_policy.grafana_cloud_aws_resource_metadata
      ]
      create_duration = "10s"
    }

Data Firehose delivery stream component

Create the Data Firehose stream that sends metrics to the configured Grafana Cloud endpoint, as in the following example:

terraform
locals {
  // If the target endpoint is not explicitly provided, then convert the stack's Prometheus URL
  // to the Grafana Cloud AWS Metric Streaming ingest endpoint.
  // Ex: https://prometheus-prod-03-prod-us-central-0.grafana.net
  // becomes https://aws-metric-streams-prod-03.grafana.net/aws-metrics/api/v1/push
  target_endpoint = var.target_endpoint != "" ? var.target_endpoint : format("%s/aws-metrics/api/v1/push", replace(
    replace(data.grafana_cloud_stack.main.prometheus_url, "prometheus", "aws-metric-streams"),
    "-${data.grafana_cloud_stack.main.cluster_slug}",
    ""
  ))
}
resource "aws_kinesis_firehose_delivery_stream" "stream" {
  name        = format("%s-firehose", var.metric_stream_name)
  destination = "http_endpoint"

  http_endpoint_configuration {
    url        = local.target_endpoint
    name       = "Grafana AWS Metric Stream Destination"
    access_key = format("%s:%s", data.grafana_cloud_stack.main.prometheus_user_id, var.metrics_write_token)
    // Buffer incoming data to the specified size, in MBs, before delivering it to the destination
    buffering_size = 1
    // Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
    // Set to 1 minute to keep latency low between metric production and the time metrics are processed
    buffering_interval = 60
    role_arn           = aws_iam_role.firehose.arn
    s3_backup_mode     = "FailedDataOnly"

    request_configuration {
      content_encoding = "GZIP"
    }

    // This block configures the fallback S3 bucket destination
    s3_configuration {
      role_arn           = aws_iam_role.firehose.arn
      bucket_arn         = aws_s3_bucket.fallback.arn
      buffering_size     = 5
      buffering_interval = 300
      compression_format = "GZIP"
    }

    // Optional block for writing delivery failures to a CloudWatch log group
    // This assumes the target log group has been created, or is created in this same snippet
    dynamic "cloudwatch_logging_options" {
      for_each = var.log_delivery_errors ? [1] : []
      content {
        enabled         = true
        log_group_name  = var.errors_log_group_name
        log_stream_name = var.errors_log_stream_name
      }
    }
  }
}

Set static labels in Terraform

Setting static labels offers you an additional way to filter and group your metrics in Grafana Cloud. Set static labels using the X-Amz-Firehose-Common-Attributes header. Configure a common_attributes sub-block within the request_configuration block of the Amazon Data Firehose delivery stream configuration, as in the following example:

terraform
http_endpoint_configuration {
  url        = local.target_endpoint
  name       = "Grafana AWS Metric Stream Destination"
  access_key = format("%s:%s", data.grafana_cloud_stack.main.prometheus_user_id, var.metrics_write_token)
  // Buffer incoming data to the specified size, in MBs, before delivering it to the destination
  buffering_size = 1
  // Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
  // Set to 1 minute to keep latency low between metric production and the time metrics are processed
  buffering_interval = 60
  role_arn           = aws_iam_role.firehose.arn
  s3_backup_mode     = "FailedDataOnly"

  request_configuration {
    content_encoding = "GZIP"

    common_attributes {
      name  = "lbl_testname"
      value = "testvalue"
    }
    common_attributes {
      name  = "lbl_testname2" // static label names must be prefixed with `lbl_`
      value = "testvalue2"    // static label names and values must be compatible with the Prometheus data model specification
    }
  }
  // ...
}

Label names must be prefixed with lbl_, and the label names and values must be compatible with the Prometheus data model specification.

When you query in Grafana for the static labels, do not include the lbl_ prefix, as in the following example query:

{job=~"cloud/aws/.+", label1="value1", label2="value2"}

CloudWatch metric stream component

Create a CloudWatch metric stream with include and exclude filters to define which metrics to push into the Data Firehose stream, as in the following example:

terraform
resource "aws_cloudwatch_metric_stream" "metric_stream" {
  name          = var.metric_stream_name
  role_arn      = aws_iam_role.metric_stream_role.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.stream.arn
  output_format = "opentelemetry1.0"

  dynamic "include_filter" {
    // Stream all metrics from the specified namespaces
    for_each = var.include_namespaces
    content {
      namespace = include_filter.value
    }
  }
}
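
The snippet above streams only the namespaces listed in var.include_namespaces. To instead stream everything except certain namespaces, an exclude_filter block follows the same shape; note that a metric stream uses either include or exclude filters, not both. In the following sketch, var.exclude_namespaces is a hypothetical variable:

```terraform
resource "aws_cloudwatch_metric_stream" "metric_stream_exclude" {
  name          = var.metric_stream_name
  role_arn      = aws_iam_role.metric_stream_role.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.stream.arn
  output_format = "opentelemetry1.0"

  dynamic "exclude_filter" {
    // Stream all metrics except those from the specified namespaces
    for_each = var.exclude_namespaces // hypothetical variable, e.g. ["AWS/Billing"]
    content {
      namespace = exclude_filter.value
    }
  }
}
```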

Configure an AWS resource metadata scrape job in Terraform

Metrics pushed to Grafana Cloud by metric streams contain only the region and dimensions as labels. To enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags, create an AWS Resource Metadata scrape job.

Configure the services, scrape interval, and tag filters you want included using the grafana_cloud_provider_aws_resource_metadata_scrape_job resource documentation, as in the following example:

terraform
resource "grafana_cloud_provider_aws_account" "main" {
  depends_on = [
    time_sleep.wait_iam_propagation
  ]

  stack_id = data.grafana_cloud_stack.main.id
  role_arn = aws_iam_role.grafana_cloud_aws_resource_metadata.arn
  regions  = [data.aws_region.current.name]
}

resource "grafana_cloud_provider_aws_resource_metadata_scrape_job" "main" {
  stack_id                = data.grafana_cloud_stack.main.id
  name                    = "aws-resource-metadata-scraper"
  aws_account_resource_id = grafana_cloud_provider_aws_account.main.resource_id

  dynamic "service" {
    for_each = var.include_namespaces
    content {
      name = service.value
    }
  }
}

Grafana Terraform provider resource descriptions

You can define the following resources and data sources with the Grafana Terraform provider.

| Resource name | Documentation reference | Description |
| --- | --- | --- |
| grafana_cloud_provider_aws_account | Doc | Represents an AWS IAM role that authorizes Grafana Cloud to pull Amazon CloudWatch metrics for a set of regions. Usually, there’s one of these resources per configured AWS account. |
| grafana_cloud_provider_aws_resource_metadata_scrape_job | Doc | Represents a Grafana AWS Resource Metadata Scrape Job. This resource configures Grafana to fetch resource metadata for one or multiple AWS services, for a given grafana_cloud_provider_aws_account. |