Troubleshoot the Kubernetes Monitoring Helm chart configuration
Grafana Alloy has a web user interface that shows every configuration
component the Alloy instance is using and the component status.
By default, the web UI runs on each Alloy Pod on port 12345.
Because the UI is typically not exposed outside the Cluster, you can access it with port forwarding:

```bash
kubectl port-forward svc/grafana-k8s-monitoring-alloy 12345:12345
```

Then open a browser to http://localhost:12345 to view the UI.
Specific Cluster platform providers
Certain Kubernetes Cluster platforms require specific configuration for this Helm chart. If your Cluster runs on one of these platforms, see the corresponding example for the changes required to run this Helm chart.
Common issues
The following are frequently seen problems related to configuring this Helm chart.
Authentication error: invalid scope requested
To deliver telemetry data to Grafana Cloud, you use an Access Policy Token with the appropriate scopes.
Scopes define the actions that can be performed on a specific data type.
For example, `metrics:write` permits writing metrics.
When sending data to Grafana Cloud, this Helm chart uses the `<data>:write` scopes to deliver data.
If your token does not have the correct scope, you will see errors in the Grafana Alloy logs.
For example, when trying to deliver profiles to Pyroscope without the `profiles:write` scope:

```
msg="final error sending to profiles to endpoint" component=pyroscope.write.profiles_service endpoint=http://tempo-prod-1-prod-eu-west-2.grafana.net:443 err="unauthenticated: authentication error: invalid scope requested"
```
The following table shows the scopes required for various actions done by this chart:
| Data type | Server | Scope for writing | Scope for reading |
|---|---|---|---|
| Metrics | Grafana Cloud Metrics (Prometheus or Mimir) | `metrics:write` | `metrics:read` |
| Logs & Cluster Events | Grafana Cloud Logs (Loki) | `logs:write` | `logs:read` |
| Traces | Grafana Cloud Traces (Tempo) | `traces:write` | `traces:read` |
| Profiles | Grafana Cloud Profiles (Pyroscope) | `profiles:write` | `profiles:read` |
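For reference, the following is a minimal sketch of how the Access Policy token and these scopes fit together, assuming this chart's `externalServices` values layout. The hosts, instance IDs, and token shown are placeholders, and the exact keys may differ depending on your chart version:

```yaml
externalServices:
  prometheus:
    host: https://prometheus-prod-01-eu-west-0.grafana.net  # placeholder host
    basicAuth:
      username: "123456"                 # placeholder: metrics instance ID
      password: "<access-policy-token>"  # token must include the metrics:write scope
  loki:
    host: https://logs-prod-eu-west-0.grafana.net  # placeholder host
    basicAuth:
      username: "654321"                 # placeholder: logs instance ID
      password: "<access-policy-token>"  # token must include the logs:write scope
```

A single Access Policy token can carry several write scopes, so the same token is often reused across these sections.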
Kepler Pods crashing on AWS Graviton Nodes
Kepler cannot run on AWS Graviton (ARM-based) Nodes, and its Pods on those Nodes will enter CrashLoopBackOff. To prevent this, add a Node selector to the Kepler deployment so it only runs on amd64 Nodes:

```yaml
kepler:
  nodeSelector:
    kubernetes.io/arch: amd64
```
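Alternatively, if you don't need Kepler's energy metrics at all, you can turn the feature off. This is a minimal sketch assuming your chart version exposes Kepler behind a `kepler.enabled` toggle; verify the key against your values.yaml:

```yaml
kepler:
  enabled: false  # skip deploying Kepler entirely (assumed toggle; check your chart version)
```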
ResourceExhausted error when sending traces
If you have traces enabled, you might see log entries in your Alloy instance that look like this:
```
Permanent error: rpc error: code = ResourceExhausted desc = grpc: received message after decompression larger than max (5268750 vs. 4194304)" dropped_items=11226
ts=2024-09-19T19:52:35.16668052Z level=info msg="rejoining peers" service=cluster peers_count=1 peers=6436336134343433.grafana-k8s-monitoring-alloy-cluster.default.svc.cluster.local.:12345
```
This error occurs when an exported batch of spans is larger than the maximum gRPC message size the endpoint accepts. To fix this, reduce the maximum batch size:
```yaml
receivers:
  processors:
    batch:
      maxSize: 2000
```
Start with 2000 and adjust as needed.
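The `maxSize` value caps the number of spans per exported batch, which indirectly keeps each gRPC message under the endpoint's decompressed size limit (4194304 bytes in the log above). If your chart version also exposes a target batch `size` setting, keep `maxSize` greater than or equal to it. A sketch, assuming both keys exist in your values.yaml:

```yaml
receivers:
  processors:
    batch:
      size: 2000     # target spans per batch (assumed key; verify in your values.yaml)
      maxSize: 2000  # hard cap per batch; keep this >= size when both are set
```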