Semantic conventions for GPU metrics

Status: Development

GPU metrics hw.gpu.*

Graphics Processing Unit (discrete).

hw.type MUST be set to "gpu".

All GPU metrics may include the following attributes:

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.driver_version | Development | Recommended | string | Driver version for the hardware component | 10.2.1-3 |
| hw.firmware_version | Development | Recommended | string | Firmware version of the hardware component | 2.0.1 |
| hw.model | Development | Recommended | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| hw.serial_number | Development | Recommended | string | Serial number of the hardware component | CNFCP0123456789 |
| hw.vendor | Development | Recommended | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo |
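
As an illustration of the attribute table above, the following minimal sketch builds the shared attribute set for a GPU metric in plain Python. The helper name `gpu_attributes` and the sample values are hypothetical and not part of the convention; only the attribute keys and the Required status of hw.id come from the table.

```python
from typing import Dict, Optional

# Recommended attribute keys taken from the shared GPU attribute table.
RECOMMENDED_KEYS = frozenset({
    "hw.driver_version",
    "hw.firmware_version",
    "hw.model",
    "hw.name",
    "hw.parent",
    "hw.serial_number",
    "hw.vendor",
})


def gpu_attributes(hw_id: str,
                   recommended: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build the shared attribute set for a GPU metric (hypothetical helper)."""
    if not hw_id:
        # hw.id is the only Required attribute in the shared table.
        raise ValueError("hw.id is Required and must be non-empty")
    attrs = {"hw.id": hw_id}
    for key, value in (recommended or {}).items():
        if key not in RECOMMENDED_KEYS:
            raise KeyError(f"not a shared GPU attribute: {key}")
        attrs[key] = value
    return attrs


# Example with made-up values.
attrs = gpu_attributes(
    "gpu_0",
    {"hw.vendor": "ExampleVendor", "hw.driver_version": "10.2.1-3"},
)
```

Rejecting unknown keys is a design choice of this sketch; an instrumentation could equally pass additional attributes through.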

Metric: hw.errors (GPU)

This metric is recommended.

Number of errors encountered by the GPU.

When using this metric for GPU errors, the following attribute requirements apply:

  • hw.type MUST be set to "gpu" to indicate that the errors are from a GPU.
  • error.type SHOULD be set to one of the following values to indicate the type of error:
    • "corrected": Errors that were detected and corrected by the GPU.
    • "uncorrected": Errors that were detected but could not be corrected by the GPU.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.errors | Counter | {error} | Number of errors encountered by the component. | Development | |

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.type | Development | Required | string | Type of the component [1] | battery; cpu; disk_controller |
| error.type | Stable | Conditionally Required: if and only if an error has occurred | string | The type of error encountered by the component. [2] | uncorrected; zero_buffer_credit; crc; bad_sector |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| network.io.direction | Development | Recommended | string | Direction of network traffic for network errors. [3] | receive; transmit |

[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.

[2] error.type: The error.type SHOULD match the error code reported by the component, the canonical name of the error, or another low-cardinality error identifier. Instrumentations SHOULD document the list of errors they report.

[3] network.io.direction: This attribute SHOULD only be used when hw.type is set to "network" to indicate the direction of the error.


error.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| _OTHER | A fallback error value to be used when the instrumentation doesn’t define a custom value. | Stable |

hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| battery | Battery | Development |
| cpu | CPU | Development |
| disk_controller | Disk controller | Development |
| enclosure | Enclosure | Development |
| fan | Fan | Development |
| gpu | GPU | Development |
| logical_disk | Logical disk | Development |
| memory | Memory | Development |
| network | Network | Development |
| physical_disk | Physical disk | Development |
| power_supply | Power supply | Development |
| tape_drive | Tape drive | Development |
| temperature | Temperature | Development |
| voltage | Voltage | Development |

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| receive | receive | Development |
| transmit | transmit | Development |
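
To make the hw.errors requirements for GPUs concrete, here is a minimal sketch in plain Python. `ErrorCounter` is a toy stand-in for a monotonic Counter instrument and is not an OpenTelemetry API; `gpu_error_attributes` is a hypothetical helper. Mapping any unlisted error type to "_OTHER" is an assumption of this sketch, since the convention also allows custom values.

```python
from typing import Dict, Tuple

# The two error.type values listed for GPU errors.
GPU_ERROR_TYPES = ("corrected", "uncorrected")

AttrKey = Tuple[Tuple[str, str], ...]


def gpu_error_attributes(hw_id: str, error_type: str) -> Dict[str, str]:
    """Attributes for one hw.errors data point reported by a GPU."""
    # Assumption: anything outside the listed types falls back to "_OTHER".
    if error_type not in GPU_ERROR_TYPES:
        error_type = "_OTHER"
    return {"hw.id": hw_id, "hw.type": "gpu", "error.type": error_type}


class ErrorCounter:
    """Toy stand-in for a monotonic Counter instrument named hw.errors."""

    def __init__(self) -> None:
        self._totals: Dict[AttrKey, int] = {}

    def add(self, amount: int, attributes: Dict[str, str]) -> None:
        if amount < 0:
            raise ValueError("a Counter only accepts non-negative increments")
        key = tuple(sorted(attributes.items()))
        self._totals[key] = self._totals.get(key, 0) + amount

    def value(self, attributes: Dict[str, str]) -> int:
        return self._totals.get(tuple(sorted(attributes.items())), 0)


# Example with made-up error counts for a GPU identified as "gpu_0".
hw_errors = ErrorCounter()
hw_errors.add(3, gpu_error_attributes("gpu_0", "corrected"))
hw_errors.add(1, gpu_error_attributes("gpu_0", "uncorrected"))
hw_errors.add(2, gpu_error_attributes("gpu_0", "corrected"))
```

Each distinct attribute set accumulates its own total, so corrected and uncorrected errors remain separately queryable.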

Metric: hw.gpu.io

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.io | Counter | By | Received and transmitted bytes by the GPU. | Development | |

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| network.io.direction | Development | Required | string | The network IO operation direction. | receive; transmit |
| hw.driver_version | Development | Recommended | string | Driver version for the hardware component | 10.2.1-3 |
| hw.firmware_version | Development | Recommended | string | Firmware version of the hardware component | 2.0.1 |
| hw.model | Development | Recommended | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| hw.serial_number | Development | Recommended | string | Serial number of the hardware component | CNFCP0123456789 |
| hw.vendor | Development | Recommended | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo |

network.io.direction has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| receive | receive | Development |
| transmit | transmit | Development |
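
Because network.io.direction is Required on hw.gpu.io, a single reading of the GPU's cumulative byte totals yields two data points, one per direction. The sketch below shows that shape with plain dictionaries; the function name `gpu_io_points` and the dictionary layout are illustrative assumptions, not an OpenTelemetry data model.

```python
from typing import Dict, List


def gpu_io_points(hw_id: str, received_bytes: int,
                  transmitted_bytes: int) -> List[Dict]:
    """Turn cumulative byte totals into one hw.gpu.io point per direction."""
    totals = {"receive": received_bytes, "transmit": transmitted_bytes}
    points = []
    for direction, total in totals.items():
        if total < 0:
            raise ValueError("cumulative byte totals cannot be negative")
        points.append({
            "name": "hw.gpu.io",
            "unit": "By",
            "value": total,
            "attributes": {
                "hw.id": hw_id,
                "network.io.direction": direction,
            },
        })
    return points


# Example with made-up totals.
io_points = gpu_io_points("gpu_0", received_bytes=1_000_000, transmitted_bytes=250_000)
```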

Metric: hw.gpu.memory.limit

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.memory.limit | UpDownCounter | By | Size of the GPU memory. | Development | |

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.driver_version | Development | Recommended | string | Driver version for the hardware component | 10.2.1-3 |
| hw.firmware_version | Development | Recommended | string | Firmware version of the hardware component | 2.0.1 |
| hw.model | Development | Recommended | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| hw.serial_number | Development | Recommended | string | Serial number of the hardware component | CNFCP0123456789 |
| hw.vendor | Development | Recommended | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo |

Metric: hw.gpu.memory.utilization

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.memory.utilization | Gauge | 1 | Fraction of GPU memory used. | Development | |

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.driver_version | Development | Recommended | string | Driver version for the hardware component | 10.2.1-3 |
| hw.firmware_version | Development | Recommended | string | Firmware version of the hardware component | 2.0.1 |
| hw.model | Development | Recommended | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| hw.serial_number | Development | Recommended | string | Serial number of the hardware component | CNFCP0123456789 |
| hw.vendor | Development | Recommended | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo |

Metric: hw.gpu.memory.usage

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.memory.usage | UpDownCounter | By | GPU memory used. | Development | |

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.driver_version | Development | Recommended | string | Driver version for the hardware component | 10.2.1-3 |
| hw.firmware_version | Development | Recommended | string | Firmware version of the hardware component | 2.0.1 |
| hw.model | Development | Recommended | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| hw.serial_number | Development | Recommended | string | Serial number of the hardware component | CNFCP0123456789 |
| hw.vendor | Development | Recommended | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo |
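
The three memory metrics are naturally related. The sketch below assumes that hw.gpu.memory.utilization equals hw.gpu.memory.usage divided by hw.gpu.memory.limit; the convention describes utilization only as the "fraction of GPU memory used", so this formula is an interpretation. The function name `gpu_memory_points` and the dictionary layout are hypothetical.

```python
from typing import Dict, List


def gpu_memory_points(hw_id: str, used_bytes: int, total_bytes: int) -> List[Dict]:
    """Derive the three GPU memory data points from one raw reading."""
    if total_bytes <= 0:
        raise ValueError("total GPU memory must be positive")
    if not 0 <= used_bytes <= total_bytes:
        raise ValueError("used memory must lie between 0 and the total")
    attributes = {"hw.id": hw_id}
    return [
        {"name": "hw.gpu.memory.limit", "unit": "By",
         "value": total_bytes, "attributes": attributes},
        {"name": "hw.gpu.memory.usage", "unit": "By",
         "value": used_bytes, "attributes": attributes},
        # Assumed relation: utilization = usage / limit, a unitless fraction.
        {"name": "hw.gpu.memory.utilization", "unit": "1",
         "value": used_bytes / total_bytes, "attributes": attributes},
    ]


# Example with made-up values: 4 GiB used out of 16 GiB.
GIB = 1024 ** 3
memory_points = gpu_memory_points("gpu_0", used_bytes=4 * GIB, total_bytes=16 * GIB)
```

Note that utilization is reported as a fraction between 0 and 1 (unit `1`), not as a percentage.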

Metric: hw.gpu.utilization

This metric is recommended.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.gpu.utilization | Gauge | 1 | Fraction of time spent in a specific task. | Development | |

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.driver_version | Development | Recommended | string | Driver version for the hardware component | 10.2.1-3 |
| hw.firmware_version | Development | Recommended | string | Firmware version of the hardware component | 2.0.1 |
| hw.gpu.task | Development | Recommended | string | Type of task the GPU is performing | decoder; encoder; general |
| hw.model | Development | Recommended | string | Descriptive model name of the hardware component | PERC H740P; Intel(R) Core(TM) i7-10700K; Dell XPS 15 Battery |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |
| hw.serial_number | Development | Recommended | string | Serial number of the hardware component | CNFCP0123456789 |
| hw.vendor | Development | Recommended | string | Vendor name of the hardware component | Dell; HP; Intel; AMD; LSI; Lenovo |

hw.gpu.task has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| decoder | Decoder | Development |
| encoder | Encoder | Development |
| general | General | Development |
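
One possible way to produce hw.gpu.utilization is to report a separate gauge value per hw.gpu.task. The sketch below assumes the instrumentation can observe how many seconds the GPU was busy with each task during a sampling interval; that input, the function name `gpu_utilization_points`, and the dictionary layout are all assumptions made for illustration.

```python
from typing import Dict, List


def gpu_utilization_points(hw_id: str, busy_seconds: Dict[str, float],
                           interval_seconds: float) -> List[Dict]:
    """Build one hw.gpu.utilization gauge point per GPU task."""
    if interval_seconds <= 0:
        raise ValueError("the sampling interval must be positive")
    points = []
    for task, busy in busy_seconds.items():
        if not 0 <= busy <= interval_seconds:
            raise ValueError(f"busy time for {task!r} is outside the interval")
        points.append({
            "name": "hw.gpu.utilization",
            "unit": "1",
            # Fraction of the interval spent in this task, between 0 and 1.
            "value": busy / interval_seconds,
            "attributes": {"hw.id": hw_id, "hw.gpu.task": task},
        })
    return points


# Example with made-up busy times over a 10-second interval.
utilization_points = gpu_utilization_points(
    "gpu_0", {"general": 7.5, "encoder": 2.5}, interval_seconds=10.0
)
```

Task names are not restricted to the well-known values here, since the convention allows custom values when none of them applies.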

Metric: hw.status (GPU)

This metric is recommended.

Operational status: 1 (true) or 0 (false) for each of the possible states.

When using this metric for GPU status, the following attributes MUST be set:

  • hw.type MUST be set to "gpu" to indicate that the status is for a GPU.
  • hw.state MUST be set to one of the following values to indicate the GPU state:
    • "ok": The GPU is operating normally.
    • "degraded": The GPU is operating with reduced functionality or performance.
    • "failed": The GPU has failed and is not operational.
    • "predicted_failure": The GPU is currently operational but is predicted to fail soon.

| Name | Instrument Type | Unit (UCUM) | Description | Stability | Entity Associations |
| --- | --- | --- | --- | --- | --- |
| hw.status | UpDownCounter | 1 | Operational status: 1 (true) or 0 (false) for each of the possible states. [1] | Development | |

[1]: hw.status is currently specified as an UpDownCounter but would ideally be represented using a StateSet as defined in OpenMetrics. This semantic convention will be updated once StateSet is specified in OpenTelemetry. This planned change is not expected to have any consequence on the way users query their timeseries backend to retrieve the values of hw.status over time.

Attributes:

| Key | Stability | Requirement Level | Value Type | Description | Example Values |
| --- | --- | --- | --- | --- | --- |
| hw.id | Development | Required | string | An identifier for the hardware component, unique within the monitored host | win32battery_battery_testsysa33_1 |
| hw.state | Development | Required | string | The current state of the component | degraded; failed; needs_cleaning |
| hw.type | Development | Required | string | Type of the component [1] | battery; cpu; disk_controller |
| hw.name | Development | Recommended | string | An easily-recognizable name for the hardware component | eth0 |
| hw.parent | Development | Recommended | string | Unique identifier of the parent component (typically the hw.id attribute of the enclosure, or disk controller) | dellStorage_perc_0 |

[1] hw.type: Describes the category of the hardware component for which hw.state is being reported. For example, hw.type=temperature along with hw.state=degraded would indicate that the temperature of the hardware component has been reported as degraded.


hw.state has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| degraded | Degraded | Development |
| failed | Failed | Development |
| needs_cleaning | Needs Cleaning | Development |
| ok | OK | Development |
| predicted_failure | Predicted Failure | Development |
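
The "1 (true) or 0 (false) for each of the possible states" encoding of hw.status can be sketched as follows: every GPU state gets its own data point, and only the point for the current state carries the value 1. The function name `gpu_status_points` and the dictionary layout are hypothetical; the four states are the ones listed for GPU status in this section.

```python
from typing import Dict, List

# States listed for GPU status in this section.
GPU_STATES = ("ok", "degraded", "failed", "predicted_failure")


def gpu_status_points(hw_id: str, current_state: str) -> List[Dict]:
    """Emit one hw.status point per state: 1 for the current state, else 0."""
    if current_state not in GPU_STATES:
        raise ValueError(f"unknown GPU state: {current_state!r}")
    return [
        {
            "name": "hw.status",
            "unit": "1",
            "value": 1 if state == current_state else 0,
            "attributes": {
                "hw.id": hw_id,
                "hw.type": "gpu",
                "hw.state": state,
            },
        }
        for state in GPU_STATES
    ]


# Example: a GPU currently reported as degraded.
status_points = gpu_status_points("gpu_0", "degraded")
```

Emitting the zero-valued points as well keeps every state's time series continuous, so a backend can query, for example, the points with hw.state set to "failed" and see 0 until a failure occurs.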

hw.type has the following list of well-known values. If one of them applies, then the respective value MUST be used; otherwise, a custom value MAY be used.

| Value | Description | Stability |
| --- | --- | --- |
| battery | Battery | Development |
| cpu | CPU | Development |
| disk_controller | Disk controller | Development |
| enclosure | Enclosure | Development |
| fan | Fan | Development |
| gpu | GPU | Development |
| logical_disk | Logical disk | Development |
| memory | Memory | Development |
| network | Network | Development |
| physical_disk | Physical disk | Development |
| power_supply | Power supply | Development |
| tape_drive | Tape drive | Development |
| temperature | Temperature | Development |
| voltage | Voltage | Development |