Configure CloudWatch metric streams
CloudWatch metric streams require an access policy token with the `metrics:write` permission from Grafana Cloud, multiple AWS IAM roles, a CloudWatch metric stream, an Amazon Data Firehose stream, and an AWS resource metadata scrape job. You can configure CloudWatch metric streams using a combination of the Cloud Provider UI and CloudFormation or Terraform.
Before you begin
In your Grafana Cloud portal, expand Observability > Cloud provider in the main menu, then select AWS, the Configuration tab, and the CloudWatch metric streams card. Find and copy the following information, which you need to configure CloudWatch metric streams:
- The API token with the `metrics:write` permission. Create a Grafana.com token by entering a name for the token and clicking Create token.

If you are using Terraform, store this information in your list of variables.
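If you do store these values as Terraform variables, the declarations might look like the following sketch. This is illustrative only: the variable names match the `-var` flags used with `terraform apply` later in this guide, but the descriptions and `sensitive` settings are assumptions.

```hcl
# Illustrative variable declarations for the values gathered above.
# Names match the -var flags used with `terraform apply` later in this guide.
variable "grafana_cloud_stack_slug" {
  description = "Slug of the Grafana Cloud stack, the <slug> in https://<slug>.grafana.net"
  type        = string
}

variable "cloud_provider_token" {
  description = "Grafana.com token used to create the AWS resource metadata scrape job"
  type        = string
  sensitive   = true
}

variable "metrics_write_token" {
  description = "Grafana.com token with the metrics:write permission"
  type        = string
  sensitive   = true
}
```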
Configure metric streams with CloudFormation
Configuring a CloudWatch metric stream with CloudFormation requires an access policy token with the `metrics:write` permission from Grafana Cloud and a CloudFormation stack that includes the ARN of an AWS IAM role used to set up the AWS resource metadata scrape job.
Launch CloudFormation stack
- Click the Launch CloudFormation stack button for step 2.
- Complete the steps to create all of the AWS components in CloudFormation.
- Copy the ARN from the AWS IAM role generated in the CloudFormation stack to use when creating the AWS metadata scrape job.
- Update the MetricsWriteToken field with the Grafana.com token you generated.
Set static labels in AWS
Setting static labels offers you an additional way to filter and group your metrics in Grafana Cloud.
Set static labels using the `X-Amz-Firehose-Common-Attributes` header.
To set static labels in AWS:
- Navigate to your Amazon Data Firehose.
- Select the Configuration tab.
- Select the Edit button for Destination settings.
- Select the Add parameter button for Parameters - optional.
- Enter a key-value pair in the corresponding text boxes.

Label keys must be prefixed with `lbl_`, and the label keys and values must be compatible with the Prometheus data model specification.
When you query the static labels in Grafana, do not include the `lbl_` prefix, as in the following example query:
{job=~"cloud/aws/.+", label1="value1", label2="value2"}
Create an AWS resource metadata scrape job in the UI
Metrics pushed to Grafana Cloud by a metric stream only contain the region and dimensions as labels. To enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags, create an AWS Resource Metadata scrape job.
Connect to AWS account
- Enter the name of your account in the Account name (optional) field. Give your account a unique name that contains only alphanumeric characters, dashes, and underscores.
- Paste the ARN you copied from the AWS IAM role that was generated when you launched your CloudFormation stack.
- Select the regions where you have services you want to monitor from the AWS Regions drop-down menu.
- Click Add account to ensure the connection is working and to save your new account.
Add resource metadata job options
- Enter a name for your resource metadata job. Give your scrape job a unique name that contains only alphanumeric characters, dashes, and underscores.
- Optionally, add static labels for easier filtering and grouping. These labels are added to all metrics exported by this scrape job.
- Choose the services you want to scrape. You can search in the search box or browse in the list of services.
- Click Edit next to a service if you want to customize the metadata collected for that service or namespace.
- Select the scrape interval.
- Add tag filters you want to include.
- Click Save service settings.
- Click Create job.
Configure metric streams with Terraform
Configuring a CloudWatch metric stream with Terraform requires an access policy token with the `metrics:write` permission from Grafana Cloud and multiple AWS components.
After you have configured the metric stream, you need to configure an AWS resource metadata scrape job to enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags.
Download the example CloudWatch metric stream Terraform file as a starting point for configuring your metric stream.
- Download the CloudWatch metric stream Terraform snippet file.
- Complete the sections labeled with FILLME and replace the placeholder values with your own.
- Run terraform apply, including the required variables, as in the following example:

terraform apply \
  -var="grafana_cloud_stack_slug=<The slug of the Grafana Cloud stack to use for the AWS resource metadata scrape job, the <slug> in https://<slug>.grafana.net>" \
  -var="cloud_provider_token=<The Grafana.com token used for creating the AWS resource metadata scrape job>" \
  -var="cloud_provider_url=<The URL to call Grafana Cloud's Cloud Provider API>" \
  -var="metrics_write_token=<The Grafana.com token used to write metrics to Mimir>" \
  -var="include_namespaces=<A list of AWS namespaces to include in the metric stream>"
The following instructions explain the different parts of the example file.
Configure the AWS and Grafana Providers
To configure the AWS and Grafana Providers, you need to create a Grafana Cloud access policy token and obtain the regional Cloud Provider API endpoint.
Create a Grafana Cloud access policy token
To create an access policy for your organization in the Grafana Cloud portal, refer to the Create an access policy for a stack steps.
In step 6, add the following scopes:
- integration-management:read
- integration-management:write
- stacks:read
After you create the policy, click Add token to generate a token to authenticate the provider with the Cloud Provider API. Give your token an appropriate name and select an Expiration date. We recommend you select a specific expiration date and do not set the Expiration date to No expiry, as this can create a security vulnerability.
Use this access policy token to call the Grafana Cloud API for the stack name and other properties of the stack. You can also use it to call the Cloud Provider API to manage the AWS account and scrape job resources including AWS resource metadata scrape jobs.
Obtain the regional Cloud provider API endpoint
- Use the following script to return a list of all of the Grafana Cloud stacks you own, along with their respective Cloud Provider API hostnames:

curl -sH "Authorization: Bearer <Access Token from previous step>" "https://grafana.com/api/instances" | \
  jq '[.items[]|{stackName: .slug, clusterName: .clusterSlug, cloudProviderAPIURL: "https://cloud-provider-api-\(.clusterSlug).grafana.net"}]'
- Select the hostname for the stack you want to manage.
In the following example, the hostname for the herokublogpost stack is https://cloud-provider-api-prod-us-central-0.grafana.net:

[
  {
    "stackName": "herokublogpost",
    "clusterName": "prod-us-central-0",
    "cloudProviderAPIURL": "https://cloud-provider-api-prod-us-central-0.grafana.net"
  }
]

Use this API endpoint to call the Cloud Provider API.
Example Terraform
The following snippet is an example configuration of the Grafana and AWS Providers using the Grafana Cloud access token and Cloud Provider API endpoint you obtained:
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
grafana = {
source = "grafana/grafana"
version = ">= 3.24.1"
}
}
}
provider "aws" {
// FILLME: AWS region
region = ""
// FILLME: local AWS profile to use
profile = ""
}
provider "grafana" {
cloud_provider_access_token = var.cloud_provider_token // Grafana Cloud access policy token used to call the Grafana Cloud stack data source for getting the stack name and other properties of the stack.
cloud_access_policy_token = var.cloud_provider_token // Grafana Cloud access policy token used to call the Cloud Provider API to manage the AWS account and scrape job resources.
cloud_provider_url = var.cloud_provider_url // Cloud Provider API URL
}
Refer to the Terraform documentation for more details on each of the following providers:
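The snippets in the rest of this guide reference a Grafana Cloud stack data source (as data.grafana_cloud_stack.main) and the current AWS region (as data.aws_region.current) without defining them. A minimal sketch of those lookups, assuming the stack slug is supplied as a variable:

```hcl
// Looks up the Grafana Cloud stack so later resources can read its ID,
// Prometheus URL, and Prometheus user ID (referenced as data.grafana_cloud_stack.main).
data "grafana_cloud_stack" "main" {
  slug = var.grafana_cloud_stack_slug
}

// The AWS region of the current provider configuration
// (referenced as data.aws_region.current).
data "aws_region" "current" {}
```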
Create a CloudWatch metric and Data Firehose delivery stream
You must create the following infrastructure in your AWS account for sending the metrics to Grafana Cloud.
Authentication components
Create an IAM role and policy that the Data Firehose stream can assume and allows it to emit error logs and back up to an S3 bucket, as in the following example:
// Batches whose delivery failed are written here
resource "aws_s3_bucket" "fallback" {
  bucket = var.fallback_bucket_name
}

// Main IAM role used by the Firehose stream for writing failed batches to S3
resource "aws_iam_role" "firehose" {
  name               = format("Firehose-%s", var.metric_stream_name)
  assume_role_policy = data.aws_iam_policy_document.firehose_assume_role.json
}

data "aws_iam_policy_document" "firehose_assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["firehose.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

# Allow Firehose to emit error logs and back up to S3
resource "aws_iam_role_policy" "firehose" {
  name = format("Firehose-%s", var.metric_stream_name)
  # Attach to the Firehose role
  role = aws_iam_role.firehose.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      # Allow Firehose to write error logs
      {
        Effect   = "Allow"
        Resource = ["*"]
        Action   = ["logs:PutLogEvents"]
      },
      # Allow Firehose to back up events to S3
      {
        "Sid" : "s3Permissions",
        "Effect" : "Allow",
        "Action" : [
          "s3:AbortMultipartUpload",
          "s3:GetBucketLocation",
          "s3:GetObject",
          "s3:ListBucket",
          "s3:ListBucketMultipartUploads",
          "s3:PutObject",
        ],
        "Resource" : [
          aws_s3_bucket.fallback.arn,
          "${aws_s3_bucket.fallback.arn}/*",
        ]
      },
    ]
  })
}
Create an IAM role and policy that’s assumed by the CloudWatch Metric stream to allow it to push metrics to the Data Firehose stream, as in the following example:
// IAM role used by the CloudWatch metric stream for forwarding metrics to Firehose
resource "aws_iam_role" "metric_stream_role" {
  name = format("metric-stream-role-%s", var.cluster)
  # Allow the metric stream to assume this role
  assume_role_policy = data.aws_iam_policy_document.metric_stream_assume_role.json
}

data "aws_iam_policy_document" "metric_stream_assume_role" {
  statement {
    effect = "Allow"
    principals {
      type        = "Service"
      identifiers = ["streams.metrics.cloudwatch.amazonaws.com"]
    }
    actions = ["sts:AssumeRole"]
  }
}

resource "aws_iam_role_policy" "metric_stream_role" {
  name = "AWSCloudWatchMetricStreamPolicy"
  role = aws_iam_role.metric_stream_role.id
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      // Allow the metric stream to write to Firehose
      {
        Action = ["firehose:PutRecord", "firehose:PutRecordBatch"]
        Effect = "Allow"
        Resource = [
          aws_kinesis_firehose_delivery_stream.stream.arn,
        ]
      },
    ]
  })
}
Create an IAM role and policy that’s assumed by Grafana to access only your CloudWatch metadata, as in the following example:
// IAM resources needed to authorize Grafana Cloud to scrape AWS resource metadata
data "aws_iam_policy_document" "trust_grafana" {
  statement {
    effect = "Allow"
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${var.grafana_cloud_sts_aws_account_id}:root"]
    }
    actions = ["sts:AssumeRole"]
    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [data.grafana_cloud_stack.main.prometheus_user_id]
    }
  }
}

resource "aws_iam_role" "grafana_cloud_aws_resource_metadata" {
  name        = "GrafanaAWSResourceMetadataScrapeJobAccess"
  description = "Role used by Grafana CloudWatch integration."
  # Allow Grafana Labs' AWS account to assume this role.
  assume_role_policy = data.aws_iam_policy_document.trust_grafana.json
}

resource "aws_iam_role_policy" "grafana_cloud_aws_resource_metadata" {
  name = "GrafanaAWSResourceMetadataScrapeJobAccess"
  role = aws_iam_role.grafana_cloud_aws_resource_metadata.id
  # This policy allows the role to discover resources via tags and API calls.
  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "tag:GetResources",
          "apigateway:GET",
          "aps:ListWorkspaces",
          "autoscaling:DescribeAutoScalingGroups",
          "dms:DescribeReplicationInstances",
          "dms:DescribeReplicationTasks",
          "ec2:DescribeTransitGatewayAttachments",
          "ec2:DescribeSpotFleetRequests",
          "shield:ListProtections",
          "storagegateway:ListGateways",
          "storagegateway:ListTagsForResource"
        ]
        Resource = "*"
      }
    ]
  })
}

// Allow some time for IAM (global) changes to propagate
resource "time_sleep" "wait_iam_propagation" {
  depends_on = [
    aws_iam_role.grafana_cloud_aws_resource_metadata,
    aws_iam_role_policy.grafana_cloud_aws_resource_metadata
  ]
  create_duration = "10s"
}
Data Firehose delivery stream component
Create the Data Firehose stream that sends metrics to the configured Grafana Cloud endpoint, as in the following example:
locals {
// If the target endpoint is not explicitly provided, then convert the stack's Prometheus URL
// to the Grafana Cloud AWS Metric Streaming ingest endpoint.
// Ex: https://prometheus-prod-03-prod-us-central-0.grafana.net
// becomes https://aws-metric-streams-prod-03.grafana.net/aws-metrics/api/v1/push
target_endpoint = var.target_endpoint != "" ? var.target_endpoint : format("%s/aws-metrics/api/v1/push", replace(
replace(data.grafana_cloud_stack.main.prometheus_url, "prometheus", "aws-metric-streams"),
"-${data.grafana_cloud_stack.main.cluster_slug}",
""
))
}
resource "aws_kinesis_firehose_delivery_stream" "stream" {
name = format("%s-firehose", var.metric_stream_name)
destination = "http_endpoint"
http_endpoint_configuration {
url = local.target_endpoint
name = "Grafana AWS Metric Stream Destination"
access_key = format("%s:%s",data.grafana_cloud_stack.main.prometheus_user_id, var.metrics_write_token)
// Buffer incoming data to the specified size, in MBs, before delivering it to the destination
buffering_size = 1
// Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
// Setting to 1 minute to keep a low enough latency between metric production and actual time they are processed
buffering_interval = 60
role_arn = aws_iam_role.firehose.arn
s3_backup_mode = "FailedDataOnly"
request_configuration {
content_encoding = "GZIP"
}
// This block configures the fallback S3 bucket destination
s3_configuration {
role_arn = aws_iam_role.firehose.arn
bucket_arn = aws_s3_bucket.fallback.arn
buffering_size = 5
buffering_interval = 300
compression_format = "GZIP"
}
// Optional block for writing delivery failures to a CloudWatch log group
// this assumes the target log group has been created, or is created in this same snippet
dynamic "cloudwatch_logging_options" {
for_each = var.log_delivery_errors ? [1] : []
content {
enabled = true
log_group_name = var.errors_log_group_name
log_stream_name = var.errors_log_stream_name
}
}
}
}
Set static labels in Terraform
Setting static labels offers you an additional way to filter and group your metrics in Grafana Cloud.
Set static labels using the X-Amz-Firehose-Common-Attributes
header.
Configure the request_configuration
block, with a common_attributes
sub-block of the Amazon Data Firehose delivery stream configuration, as in the following example:
http_endpoint_configuration {
url = local.target_endpoint
name = "Grafana AWS Metric Stream Destination"
access_key = format("%s:%s",data.grafana_cloud_stack.main.prometheus_user_id, var.metrics_write_token)
// Buffer incoming data to the specified size, in MBs, before delivering it to the destination
buffering_size = 1
// Buffer incoming data for the specified period of time, in seconds, before delivering it to the destination
// Setting to 1 minute to keep a low enough latency between metric production and actual time they are processed
buffering_interval = 60
role_arn = aws_iam_role.firehose.arn
s3_backup_mode = "FailedDataOnly"
request_configuration {
content_encoding = "GZIP"
common_attributes {
name = "lbl_testname"
value = "testvalue"
}
common_attributes {
name = "lbl_testname2" // static label names must be prefixed with `lbl_`
value = "testvalue2" // static label names and values must be compatible with the Prometheus data model specification
}
}
...
}
Label names must be prefixed with lbl_
and the label names and values must be compatible with the Prometheus data model specification.
When you query the static labels in Grafana, do not include the `lbl_` prefix, as in the following example query:
{job=~"cloud/aws/.+", label1="value1", label2="value2"}
CloudWatch metric stream component
Create a CloudWatch metric stream with include
and exclude
filters to define which metrics to push into the Data Firehose stream, as in the following example:
resource "aws_cloudwatch_metric_stream" "metric_stream" {
name = var.metric_stream_name
role_arn = aws_iam_role.metric_stream_role.arn
firehose_arn = aws_kinesis_firehose_delivery_stream.stream.arn
output_format = "opentelemetry1.0"
dynamic "include_filter" {
// Stream all metrics from the specified namespaces
for_each = var.include_namespaces
content {
namespace = include_filter.value
}
}
}
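The example above streams all metrics from the namespaces listed in var.include_namespaces. A metric stream can instead use exclude_filter blocks to stream everything except the listed namespaces; a stream accepts include filters or exclude filters, but not both. The following is a hypothetical variant (the resource name and excluded namespace are illustrative):

```hcl
// Hypothetical variant: stream all CloudWatch metrics except the listed namespaces.
// A metric stream accepts include_filter or exclude_filter blocks, not both.
resource "aws_cloudwatch_metric_stream" "metric_stream_exclude" {
  name          = var.metric_stream_name
  role_arn      = aws_iam_role.metric_stream_role.arn
  firehose_arn  = aws_kinesis_firehose_delivery_stream.stream.arn
  output_format = "opentelemetry1.0"

  exclude_filter {
    namespace = "AWS/Usage"
  }
}
```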
Configure an AWS resource metadata scrape job in Terraform
Metrics pushed to Grafana Cloud by a metric stream only contain the region and dimensions as labels. To enrich your metrics with additional metadata as labels, such as the associated resource’s ARN and resource tags, create an AWS Resource Metadata scrape job.
Configure the services, scrape interval, and tag filters you want included using the grafana_cloud_provider_aws_resource_metadata_scrape_job
resource documentation, as in the following example:
resource "grafana_cloud_provider_aws_account" "main" {
depends_on = [
time_sleep.wait_iam_propagation
]
stack_id = data.grafana_cloud_stack.main.id
role_arn = aws_iam_role.grafana_cloud_aws_resource_metadata.arn
regions = [data.aws_region.current.name]
}
resource "grafana_cloud_provider_aws_resource_metadata_scrape_job" "main" {
stack_id = data.grafana_cloud_stack.main.id
name = "aws-resource-metadata-scraper"
aws_account_resource_id = grafana_cloud_provider_aws_account.main.resource_id
dynamic "service" {
for_each = var.include_namespaces
content {
name = service.value
}
}
}
Grafana Terraform provider resource descriptions
You can define the following resources and data sources with the Grafana Terraform provider.
| Resource name | Documentation reference | Description |
| --- | --- | --- |
| grafana_cloud_provider_aws_account | Doc | Represents an AWS IAM role that authorizes Grafana Cloud to pull Amazon CloudWatch metrics for a set of regions. Usually, there’s one of these resources per configured AWS account. |
| grafana_cloud_provider_aws_resource_metadata_scrape_job | Doc | Represents a Grafana AWS Resource Metadata scrape job. This resource configures Grafana to fetch resource metadata for one or multiple AWS services, for a given grafana_cloud_provider_aws_account. |