Nvidia by Zabbix agent 2
Template group: Templates/Applications (Zabbix export version: 7.4)

Overview:
This template is designed for Nvidia GPU monitoring and doesn't require any external scripts.
Setup:
1. Set up and configure Zabbix agent 2 compiled with the Nvidia monitoring plugin.
2. Create a host with a Zabbix agent interface and attach the template to it.
All Nvidia GPUs will be discovered automatically. Set the discovery filter macros if you want to override the default filter parameters.
You can discuss this template or leave feedback on our forum https://www.zabbix.com/forum/zabbix-suggestions-and-feedback.
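Before linking the template, it can help to confirm that the agent and the Nvidia plugin respond to the template's item keys. A minimal check from the Zabbix server or proxy, assuming passive checks are enabled and the example address below is replaced with the address of the monitored host:

    # Query the plugin's top-level items directly (the address is a hypothetical example).
    zabbix_get -s 192.0.2.10 -k nvml.device.count
    zabbix_get -s 192.0.2.10 -k nvml.system.driver.version
    zabbix_get -s 192.0.2.10 -k nvml.device.get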
Generated by the official Zabbix template tool "Templator". Vendor: Zabbix, version 7.2-1.

Items (every trigger defined by this template displays the operational data "current value: {ITEM.LASTVALUE1}"):
- Number of devices (key: nvml.device.count; update interval: 1h)
  Retrieves the number of compute devices in the system. A compute device is a single GPU. For all Nvidia products.
  Preprocessing: discard unchanged with heartbeat (1d). Tags: component:nvidia.
  Trigger "Nvidia: Number of devices has changed" (WARNING, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.count) <> 0
  Description: Number of devices has changed. Check if this was intentional.
- Get devices (key: nvml.device.get; update interval: 1h; history: 0; type: TEXT)
  Retrieves a list of Nvidia devices in the system.
  Tags: component:nvidia, component:raw.
- Driver version (key: nvml.system.driver.version; update interval: 1h; type: CHAR)
  Retrieves the version of the system's graphics driver. For all Nvidia products.
  Preprocessing: discard unchanged with heartbeat (1d). Tags: component:nvidia.
  Trigger "Nvidia: Driver version has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.system.driver.version) <> 0
  Description: Driver version has changed. Check the Nvidia website for the specific driver version: https://www.nvidia.com/en-us/drivers/
- NVML library version (key: nvml.version; update interval: 1h; type: CHAR)
  Retrieves the version of the NVML library. For all Nvidia products.
  Preprocessing: discard unchanged with heartbeat (1d). Tags: component:nvidia.
  Trigger "Nvidia: NVML library has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.version) <> 0
  Description: NVML library version has changed. Check the changelog for details: https://docs.nvidia.com/deploy/nvml-api/change-log.html
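Discovery below is driven by the Get devices item: the GPU Discovery rule parses its output and maps $.device_name and $.device_uuid to the {#NAME} and {#UUID} LLD macros. The exact payload depends on the plugin version; an illustrative (not authoritative) shape consistent with those JSONPaths:

    [
        {"device_name": "NVIDIA GeForce RTX 3080", "device_uuid": "GPU-3f7aa1c0-example"},
        {"device_name": "NVIDIA A100-SXM4-40GB", "device_uuid": "GPU-9b2c44d1-example"}
    ]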
Discovery rules:
- GPU Discovery (dependent item; key: nvml.device.discovery)
  Nvidia GPU discovery in the system.
  Filter (AND): {#NAME} matches {$NVIDIA.NAME.MATCHES}; {#NAME} does not match {$NVIDIA.NAME.NOT_MATCHES}; {#UUID} matches {$NVIDIA.UUID.MATCHES}; {#UUID} does not match {$NVIDIA.UUID.NOT_MATCHES}.

Item prototypes:
- [{#UUID}]: Decoder utilization (key: nvml.device.decoder.utilization["{#UUID}"]; units: %)
  Retrieves the current utilization for the Decoder. For Nvidia Kepler or newer fully supported devices.
  Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Decoder utilization exceeded critical threshold" (AVERAGE): min(/Nvidia by Zabbix agent 2/nvml.device.decoder.utilization["{#UUID}"],3m) > {$NVIDIA.DECODER.UTIL.CRIT}
  Trigger "Nvidia: [{#UUID}]: Decoder utilization exceeded warning threshold" (WARNING, depends on the critical trigger): min(/Nvidia by Zabbix agent 2/nvml.device.decoder.utilization["{#UUID}"],3m) > {$NVIDIA.DECODER.UTIL.WARN}
  Both triggers indicate abnormally high decoder utilization; change the corresponding macro in case of a false positive.
- [{#UUID}]: Encoder average FPS (dependent item; key: nvml.device.encoder.stats.fps["{#UUID}"]; units: !fps)
  Retrieves the trailing average FPS of all active encoder sessions for a given device. For Nvidia Maxwell or newer fully supported devices.
  Preprocessing: JSONPath $.average_fps. Master item: nvml.device.encoder.stats.get["{#UUID}"]. Tags: component:encoder, component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Encoder stats (key: nvml.device.encoder.stats.get["{#UUID}"]; history: 0; type: TEXT)
  Retrieves the current encoder statistics for a given device. For Nvidia Maxwell or newer fully supported devices.
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Encoder average latency (dependent item; key: nvml.device.encoder.stats.latency["{#UUID}"]; type: FLOAT; units: s)
  Retrieves the current encode latency for a given device. For Nvidia Maxwell or newer fully supported devices.
  Preprocessing: JSONPath $.average_latency_ms, then custom multiplier 0.001. Master item: nvml.device.encoder.stats.get["{#UUID}"]. Tags: component:encoder, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Encoder average latency is high" (WARNING): last(/Nvidia by Zabbix agent 2/nvml.device.encoder.stats.latency["{#UUID}"]) > (2 * avg(/Nvidia by Zabbix agent 2/nvml.device.encoder.stats.latency["{#UUID}"],3m))
  Description: Encoder average latency is 2x higher than usual.
- [{#UUID}]: Encoder sessions (dependent item; key: nvml.device.encoder.stats.sessions["{#UUID}"])
  Retrieves the current count of active encoder sessions for a given device. For Nvidia Maxwell or newer fully supported devices.
  Preprocessing: JSONPath $.session_count. Master item: nvml.device.encoder.stats.get["{#UUID}"]. Tags: component:encoder, component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Encoder utilization (key: nvml.device.encoder.utilization["{#UUID}"]; units: %)
  Retrieves the current utilization for the Encoder. For Nvidia Kepler or newer fully supported devices.
  Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Encoder utilization exceeded critical threshold" (AVERAGE): min(/Nvidia by Zabbix agent 2/nvml.device.encoder.utilization["{#UUID}"],3m) > {$NVIDIA.ENCODER.UTIL.CRIT}
  Trigger "Nvidia: [{#UUID}]: Encoder utilization exceeded warning threshold" (WARNING, depends on the critical trigger): min(/Nvidia by Zabbix agent 2/nvml.device.encoder.utilization["{#UUID}"],3m) > {$NVIDIA.ENCODER.UTIL.WARN}
  Both triggers indicate abnormally high encoder utilization; change the corresponding macro in case of a false positive.
- [{#UUID}]: Energy consumption (key: nvml.device.energy.consumption["{#UUID}"]; type: FLOAT; units: J)
  Retrieves the total energy consumption of this GPU in joules since the last driver reload. For Nvidia Volta or newer fully supported devices.
  Preprocessing: custom multiplier 0.001. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
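The three encoder metrics above are dependent items that share the Encoder stats master item, so a single request to the device is parsed three ways (a JSONPath per metric, plus a 0.001 multiplier for latency to convert milliseconds to seconds). An illustrative payload in which only the field names come from the JSONPaths above and the values are made up:

    {"average_fps": 60, "average_latency_ms": 8.4, "session_count": 2}

With the 0.001 multiplier applied, the 8.4 ms reading would be stored as 0.0084 s in nvml.device.encoder.stats.latency["{#UUID}"].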
- [{#UUID}]: Memory ECC errors, corrected (dependent item; key: nvml.device.errors.memory.corrected["{#UUID}"])
  Retrieves the count of GPU device memory errors that were corrected. For ECC errors, these are single-bit errors; for texture memory, these are errors fixed by resend. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: JSONPath $.corrected. Master item: nvml.device.errors.memory["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Number of corrected memory ECC errors has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.errors.memory.corrected["{#UUID}"]) <> 0
  Description: An increasing number of corrected ECC errors can indicate (but does not necessarily mean) aging or degradation of memory.
- [{#UUID}]: Memory ECC errors, uncorrected (dependent item; key: nvml.device.errors.memory.uncorrected["{#UUID}"])
  Retrieves the count of GPU device memory errors that were not corrected. For ECC errors, these are double-bit errors; for texture memory, these are errors where the resend fails. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: JSONPath $.uncorrected. Master item: nvml.device.errors.memory["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Number of uncorrected memory ECC errors has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.errors.memory.uncorrected["{#UUID}"]) <> 0
  Description: An increasing number of uncorrected ECC errors can indicate potential issues such as data corruption, system instability, or hardware faults.
- [{#UUID}]: Memory ECC errors, get (key: nvml.device.errors.memory["{#UUID}"]; history: 0; type: TEXT)
  Retrieves the GPU device memory error counters for the device. For Nvidia Fermi or newer fully supported devices.
  Requires NVML_INFOROM_ECC version 2.0 or higher to report aggregate location-based memory error counts, and version 1.0 or higher to report all other memory error counts. Only applicable to devices with ECC. Requires ECC mode to be enabled.
  Preprocessing: check for not supported value; if the error matches "The requested operation is not available on target device", set the custom error "No ECC on the device or ECC mode is turned off."
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Register file errors, corrected (dependent item; key: nvml.device.errors.register.corrected["{#UUID}"])
  Retrieves the count of GPU register file errors that were corrected. For ECC errors, these are single-bit errors; for texture memory, these are errors fixed by resend. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: JSONPath $.corrected. Master item: nvml.device.errors.register["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Number of corrected register file errors has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.errors.register.corrected["{#UUID}"]) <> 0
  Description: An increasing number of corrected register file errors can indicate (but does not necessarily mean) wearing, aging, or degradation of memory.
- [{#UUID}]: Register file errors, uncorrected (dependent item; key: nvml.device.errors.register.uncorrected["{#UUID}"])
  Retrieves the count of GPU register file errors that were not corrected. For ECC errors, these are double-bit errors; for texture memory, these are errors where the resend fails. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: JSONPath $.uncorrected. Master item: nvml.device.errors.register["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Number of uncorrected register file errors has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.errors.register.uncorrected["{#UUID}"]) <> 0
  Description: An increasing number of uncorrected register file errors can indicate potential issues such as data corruption, system instability, or hardware degradation.
- [{#UUID}]: Register file errors, get (key: nvml.device.errors.register["{#UUID}"]; history: 0; type: TEXT)
  Retrieves the GPU register file error counters for the device. For Nvidia Fermi or newer fully supported devices.
  Requires NVML_INFOROM_ECC version 2.0 or higher to report aggregate location-based error counts, and version 1.0 or higher to report all other error counts. Only applicable to devices with ECC. Requires ECC mode to be enabled.
  Preprocessing: check for not supported value; if the error matches "The requested operation is not available on target device", set the custom error "No ECC on the device or ECC mode is turned off."
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
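Both error "get" items return a small JSON document that the corrected/uncorrected dependent items split via the $.corrected and $.uncorrected JSONPaths. On a healthy ECC-enabled board the counters simply stay flat; an illustrative payload (values made up):

    {"corrected": 0, "uncorrected": 0}

On devices without ECC, or with ECC mode disabled, the "check for not supported value" preprocessing step replaces the raw NVML error text with the friendlier custom message noted above.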
- [{#UUID}]: Fan speed (key: nvml.device.fan.speed.avg["{#UUID}"]; units: %)
  Retrieves the intended operating speed of the specified device fan. For all Nvidia discrete products with dedicated fans.
  Note: the reported speed is the intended fan speed. If the fan is physically blocked and unable to spin, the output will not match the actual fan speed.
  The fan speed is expressed as a percentage of the product's maximum noise-tolerance fan speed. In certain cases, this value may exceed 100%.
  Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Fan speed exceeded critical threshold" (AVERAGE): min(/Nvidia by Zabbix agent 2/nvml.device.fan.speed.avg["{#UUID}"],3m) > {$NVIDIA.FAN.SPEED.CRIT}
  Trigger "Nvidia: [{#UUID}]: Fan speed exceeded warning threshold" (WARNING, depends on the critical trigger): min(/Nvidia by Zabbix agent 2/nvml.device.fan.speed.avg["{#UUID}"],3m) > {$NVIDIA.FAN.SPEED.WARN}
  Both triggers indicate an abnormally high fan speed; change the corresponding macro in case of a false positive.
- [{#UUID}]: Graphics frequency (key: nvml.device.graphics.frequency["{#UUID}"]; units: Hz)
  Retrieves the current graphics clock speed for the device. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: custom multiplier 1000000. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: BAR1 memory, free (dependent item; key: nvml.device.memory.bar1.free["{#UUID}"]; units: B)
  Unallocated BAR1 memory on the device. For Nvidia Kepler or newer fully supported devices.
  Preprocessing: JSONPath $.free_memory_bytes. Master item: nvml.device.memory.bar1.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: BAR1 memory, get (key: nvml.device.memory.bar1.get["{#UUID}"]; history: 0; type: TEXT)
  Gets the total, available, and used size of BAR1 memory. BAR1 is used to map the FB (device memory) so that it can be directly accessed by the CPU or by third-party devices (peer-to-peer on the PCIe bus). For Nvidia Kepler or newer fully supported devices.
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: BAR1 memory, total (dependent item; key: nvml.device.memory.bar1.total["{#UUID}"]; units: B)
  Total BAR1 memory on the device. For Nvidia Kepler or newer fully supported devices.
  Preprocessing: JSONPath $.total_memory_bytes. Master item: nvml.device.memory.bar1.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Total BAR1 memory has changed" (WARNING, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.memory.bar1.total["{#UUID}"]) <> 0
  Description: Total BAR1 memory has changed. This could indicate memory degradation, hardware configuration changes, or memory reservation by the system or software.
- [{#UUID}]: BAR1 memory, used (dependent item; key: nvml.device.memory.bar1.used["{#UUID}"]; units: B)
  Allocated (used) BAR1 memory on the device. For Nvidia Kepler or newer fully supported devices.
  Preprocessing: JSONPath $.used_memory_bytes. Master item: nvml.device.memory.bar1.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
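The BAR1 free/total/used prototypes all hang off the BAR1 "get" master item, and the FB memory items described next follow the same pattern. An illustrative master-item payload, consistent with the JSONPaths used here (the values are made up):

    {"total_memory_bytes": 268435456, "used_memory_bytes": 8388608, "free_memory_bytes": 260046848}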
- [{#UUID}]: FB memory, free (dependent item; key: nvml.device.memory.fb.free["{#UUID}"]; units: B)
  Unallocated memory on the device. For all Nvidia products.
  Preprocessing: JSONPath $.free_memory_bytes. Master item: nvml.device.memory.fb.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: FB memory, get (key: nvml.device.memory.fb.get["{#UUID}"]; history: 0; type: TEXT)
  Retrieves the amount of used, free, reserved, and total memory available on the device. For all Nvidia products.
  Enabling ECC reduces the amount of total available memory due to the extra required parity bits. Under WDDM, most of the device memory is allocated and managed on startup by Windows. Under Linux and Windows TCC, the reported amount of used memory is equal to the sum of memory allocated by all active channels on the device.
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: FB memory, reserved (dependent item; key: nvml.device.memory.fb.reserved["{#UUID}"]; units: B)
  Memory reserved for system use (driver or firmware) on the device. For all Nvidia products.
  Preprocessing: JSONPath $.reserved_memory_bytes; on failure, set the custom error "NVML library too old to support this metric." Master item: nvml.device.memory.fb.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: FB memory, total (dependent item; key: nvml.device.memory.fb.total["{#UUID}"]; units: B)
  Total physical memory on the device. For all Nvidia products.
  Preprocessing: JSONPath $.total_memory_bytes. Master item: nvml.device.memory.fb.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Total FB memory has changed" (WARNING, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.memory.fb.total["{#UUID}"]) <> 0
  Description: Total FB memory has changed. This could indicate memory degradation, hardware configuration changes, or memory reservation by the system or software.
- [{#UUID}]: FB memory, used (dependent item; key: nvml.device.memory.fb.used["{#UUID}"]; units: B)
  Allocated memory on the device. For all Nvidia products.
  Preprocessing: JSONPath $.used_memory_bytes. Master item: nvml.device.memory.fb.get["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
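The template alerts when the FB total changes, but not when FB memory fills up. If you need that, a trigger prototype along the following lines could be added to a cloned template or the host; this is only a sketch, not part of the shipped template, and the 90% threshold is an arbitrary example:

    last(/Nvidia by Zabbix agent 2/nvml.device.memory.fb.used["{#UUID}"]) /
    last(/Nvidia by Zabbix agent 2/nvml.device.memory.fb.total["{#UUID}"]) * 100 > 90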
- [{#UUID}]: Memory frequency (key: nvml.device.memory.frequency["{#UUID}"]; units: Hz)
  Retrieves the current memory clock speed for the device. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: custom multiplier 1000000. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: PCIe utilization, Rx (dependent item; key: nvml.device.pci.utilization.rx.rate["{#UUID}"]; units: bps)
  The PCIe Rx (receive) throughput over a 20 ms interval on the device. For Nvidia Maxwell or newer fully supported devices.
  Preprocessing: JSONPath $.rx_rate_kb_s, then custom multiplier 1024. Master item: nvml.device.pci.utilization["{#UUID}"]. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: PCIe utilization, Tx (dependent item; key: nvml.device.pci.utilization.tx.rate["{#UUID}"]; units: bps)
  The PCIe Tx (transmit) throughput over a 20 ms interval on the device. For Nvidia Maxwell or newer fully supported devices.
  Preprocessing: JSONPath $.tx_rate_kb_s, then custom multiplier 1024. Master item: nvml.device.pci.utilization["{#UUID}"]. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: PCIe utilization, get (key: nvml.device.pci.utilization["{#UUID}"]; history: 0; type: TEXT)
  Retrieves PCIe utilization information. For Nvidia Maxwell or newer fully supported devices.
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
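The Rx/Tx prototypes are again dependent items over a single "get" master item; the 1024 multiplier scales the per-kilobyte figures reported under $.rx_rate_kb_s and $.tx_rate_kb_s into the stored rate. An illustrative payload (values made up):

    {"rx_rate_kb_s": 2048, "tx_rate_kb_s": 512}

With the 1024 multiplier applied, the 2048 reading would be stored as 2097152 in the Rx rate item.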
- [{#UUID}]: Performance state (key: nvml.device.performance.state["{#UUID}"])
  Retrieves the current performance state for the device. For Nvidia Fermi or newer fully supported devices.
  Value map: Performance state. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Power limit (key: nvml.device.power.limit["{#UUID}"]; update interval: 1h; type: FLOAT; units: watts)
  Retrieves the power management limit associated with this device. For Nvidia Fermi or newer fully supported devices.
  The power limit defines the upper boundary for the card's power draw. If the card's total power draw reaches this limit, the power management algorithm kicks in. This reading is only available if power management mode is supported.
  Preprocessing: custom multiplier 0.001. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Power limit has changed" (INFO, manual close): change(/Nvidia by Zabbix agent 2/nvml.device.power.limit["{#UUID}"]) <> 0
  Description: Power limit for the device has changed. Check if this was intentional.
- [{#UUID}]: Power usage (key: nvml.device.power.usage["{#UUID}"]; type: FLOAT; units: watts)
  Retrieves power usage for this GPU (in watts) and its associated circuitry (e.g. memory). For Nvidia Fermi or newer fully supported devices.
  On Fermi and Kepler GPUs, the reading is accurate to within +/- 5% of current power draw. On Ampere (except GA100) or newer GPUs, the API returns power averaged over a 1-second interval. On GA100 and older architectures, instantaneous power is returned.
  Preprocessing: custom multiplier 0.001. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Serial number (key: nvml.device.serial["{#UUID}"]; update interval: 1h; type: CHAR)
  Retrieves the globally unique board serial number associated with this device's board. For all products with an inforom. This number matches the serial number tag that is physically attached to the board.
  Preprocessing: check for not supported value; if the error matches "The requested operation is not available on target device", set the custom error "The device does not support the operation to retrieve the serial number."
  Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: SM frequency (key: nvml.device.sm.frequency["{#UUID}"]; units: Hz)
  Retrieves the current SM clock speed for the device. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: custom multiplier 1000000. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Temperature (key: nvml.device.temperature["{#UUID}"]; units: C)
  Retrieves the current temperature readings for the device, in degrees C. For all Nvidia products.
  Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Temperature exceeded critical threshold" (AVERAGE): min(/Nvidia by Zabbix agent 2/nvml.device.temperature["{#UUID}"],3m) > {$NVIDIA.TEMPERATURE.CRIT}
  Trigger "Nvidia: [{#UUID}]: Temperature exceeded warning threshold" (WARNING, depends on the critical trigger): min(/Nvidia by Zabbix agent 2/nvml.device.temperature["{#UUID}"],3m) > {$NVIDIA.TEMPERATURE.WARN}
  Both triggers indicate an abnormally high temperature; change the corresponding macro in case of a false positive.
- [{#UUID}]: GPU utilization (dependent item; key: nvml.device.utilization.gpu["{#UUID}"]; units: %)
  Percentage of time over the past sampling period during which one or more kernels were running on the GPU. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: JSONPath $.device. Master item: nvml.device.utilization["{#UUID}"]. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: GPU utilization exceeded critical threshold" (AVERAGE): min(/Nvidia by Zabbix agent 2/nvml.device.utilization.gpu["{#UUID}"],3m) > {$NVIDIA.GPU.UTIL.CRIT}
  Trigger "Nvidia: [{#UUID}]: GPU utilization exceeded warning threshold" (WARNING, depends on the critical trigger): min(/Nvidia by Zabbix agent 2/nvml.device.utilization.gpu["{#UUID}"],3m) > {$NVIDIA.GPU.UTIL.WARN}
  Both triggers indicate abnormally high GPU utilization; change the corresponding macro in case of a false positive.
- [{#UUID}]: Memory utilization (dependent item; key: nvml.device.utilization.memory["{#UUID}"]; units: %)
  Percentage of time over the past sampling period during which global (device) memory was being read or written. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: JSONPath $.memory. Master item: nvml.device.utilization["{#UUID}"]. Tags: component:memory, component:nvidia, device:{#NAME}, device:{#UUID}.
  Trigger "Nvidia: [{#UUID}]: Memory utilization exceeded critical threshold" (AVERAGE): min(/Nvidia by Zabbix agent 2/nvml.device.utilization.memory["{#UUID}"],3m) > {$NVIDIA.MEMORY.UTIL.CRIT}
  Trigger "Nvidia: [{#UUID}]: Memory utilization exceeded warning threshold" (WARNING, depends on the critical trigger): min(/Nvidia by Zabbix agent 2/nvml.device.utilization.memory["{#UUID}"],3m) > {$NVIDIA.MEMORY.UTIL.WARN}
  Both triggers indicate abnormally high memory utilization; change the corresponding macro in case of a false positive.
- [{#UUID}]: Device utilization, get (key: nvml.device.utilization["{#UUID}"]; history: 0; type: TEXT)
  Retrieves the current utilization rates for the device's major subsystems. For Nvidia Fermi or newer fully supported devices.
  Tags: component:nvidia, component:raw, device:{#NAME}, device:{#UUID}.
- [{#UUID}]: Video frequency (key: nvml.device.video.frequency["{#UUID}"]; units: Hz)
  Retrieves the current video encoder/decoder clock speed for the device. For Nvidia Fermi or newer fully supported devices.
  Preprocessing: custom multiplier 1000000. Tags: component:nvidia, device:{#NAME}, device:{#UUID}.

Additional trigger prototypes:
- "Nvidia: [{#UUID}]: Power usage exceeded critical threshold" (AVERAGE): (min(/Nvidia by Zabbix agent 2/nvml.device.power.usage["{#UUID}"],3m) * 100 / last(/Nvidia by Zabbix agent 2/nvml.device.power.limit["{#UUID}"])) > {$NVIDIA.POWER.UTIL.CRIT}
- "Nvidia: [{#UUID}]: Power usage exceeded warning threshold" (WARNING, depends on the critical trigger): (min(/Nvidia by Zabbix agent 2/nvml.device.power.usage["{#UUID}"],3m) * 100 / last(/Nvidia by Zabbix agent 2/nvml.device.power.limit["{#UUID}"])) > {$NVIDIA.POWER.UTIL.WARN}
  Both triggers indicate abnormally high power usage relative to the configured power limit; change the corresponding macro in case of a false positive.
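These power triggers compare the 3-minute minimum of the power draw against the current power limit, as a percentage. A worked example with made-up readings:

    220 W draw, 250 W limit  ->  220 * 100 / 250 = 88
    88 > 80 ({$NVIDIA.POWER.UTIL.WARN}), 88 < 90 ({$NVIDIA.POWER.UTIL.CRIT})

So with the default macros, this would raise the warning trigger but not the critical one.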
Graph prototypes:
- Nvidia: [{#UUID}]: BAR1 memory (stacked): nvml.device.memory.bar1.total["{#UUID}"], nvml.device.memory.bar1.used["{#UUID}"], nvml.device.memory.bar1.free["{#UUID}"]
- Nvidia: [{#UUID}]: Fan speed: nvml.device.fan.speed.avg["{#UUID}"]
- Nvidia: [{#UUID}]: FB memory (stacked): nvml.device.memory.fb.total["{#UUID}"], nvml.device.memory.fb.used["{#UUID}"], nvml.device.memory.fb.reserved["{#UUID}"], nvml.device.memory.fb.free["{#UUID}"]
- Nvidia: [{#UUID}]: Memory ECC errors: nvml.device.errors.memory.corrected["{#UUID}"], nvml.device.errors.memory.uncorrected["{#UUID}"]
- Nvidia: [{#UUID}]: PCIe utilization: nvml.device.pci.utilization.rx.rate["{#UUID}"], nvml.device.pci.utilization.tx.rate["{#UUID}"]
- Nvidia: [{#UUID}]: Performance state (fixed Y axis, maximum 15): nvml.device.performance.state["{#UUID}"]
- Nvidia: [{#UUID}]: Power usage: nvml.device.power.limit["{#UUID}"], nvml.device.power.usage["{#UUID}"]
- Nvidia: [{#UUID}]: Register file errors: nvml.device.errors.register.corrected["{#UUID}"], nvml.device.errors.register.uncorrected["{#UUID}"]
GPU Discovery rule details:
- Master item: nvml.device.get. LLD macros: {#NAME} = $.device_name, {#UUID} = $.device_uuid.
- Preprocessing: discard unchanged with heartbeat (1d).

Host tags: class:hardware, target:nvidia.

Macros used:
- {$NVIDIA.DECODER.UTIL.CRIT} = 90 - Critical threshold for decoder utilization, in %.
- {$NVIDIA.DECODER.UTIL.WARN} = 80 - Warning threshold for decoder utilization, in %.
- {$NVIDIA.ENCODER.UTIL.CRIT} = 90 - Critical threshold for encoder utilization, in %.
- {$NVIDIA.ENCODER.UTIL.WARN} = 80 - Warning threshold for encoder utilization, in %.
- {$NVIDIA.FAN.SPEED.CRIT} = 90 - Critical threshold for fan speed, in %.
- {$NVIDIA.FAN.SPEED.WARN} = 80 - Warning threshold for fan speed, in %.
- {$NVIDIA.GPU.UTIL.CRIT} = 90 - Critical threshold for overall GPU utilization, in %.
- {$NVIDIA.GPU.UTIL.WARN} = 80 - Warning threshold for overall GPU utilization, in %.
- {$NVIDIA.MEMORY.UTIL.CRIT} = 90 - Critical threshold for memory utilization, in %.
- {$NVIDIA.MEMORY.UTIL.WARN} = 80 - Warning threshold for memory utilization, in %.
- {$NVIDIA.NAME.MATCHES} = .* - Filter to include GPUs by name in discovery.
- {$NVIDIA.NAME.NOT_MATCHES} = CHANGE IF NEEDED - Filter to exclude GPUs by name in discovery.
- {$NVIDIA.POWER.UTIL.CRIT} = 90 - Critical threshold for power usage, in % of the power limit.
- {$NVIDIA.POWER.UTIL.WARN} = 80 - Warning threshold for power usage, in % of the power limit.
- {$NVIDIA.TEMPERATURE.CRIT} = 90 - Critical threshold for temperature, in degrees C.
- {$NVIDIA.TEMPERATURE.WARN} = 80 - Warning threshold for temperature, in degrees C.
- {$NVIDIA.UUID.MATCHES} = .* - Filter to include GPUs by UUID in discovery.
- {$NVIDIA.UUID.NOT_MATCHES} = CHANGE IF NEEDED - Filter to exclude GPUs by UUID in discovery.

Dashboard "Nvidia: Overview":
- Page "Summary": graphs for GPU utilization, Temperature, Memory utilization, and Power usage.
- Page "Frequencies": graphs for SM frequency, Video frequency, Graphics frequency, and Memory frequency.
- Page "Memory errors": graphs for Memory ECC errors (corrected), Register file errors (corrected), Memory ECC errors (uncorrected), and Register file errors (uncorrected).
- Page "Memory, PCI, fan": graph prototypes for BAR1 memory, PCIe utilization, FB memory, and Fan speed.
- Page "Encoders": graphs for Encoder utilization, Encoder average FPS, Encoder sessions, and Encoder average latency.

Value maps:
- Performance state: 0 = Maximum, 1-4 = High, 5-10 = Average, 11-14 = Low, 15 = Minimum, 32 = Unknown.
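To narrow discovery to specific boards, override the filter macros on the host (or at the template level). For example, to drop a display-only GPU from monitoring while keeping everything else, host-level macro values along these lines would work; the device name in the regular expression is only an illustration:

    {$NVIDIA.NAME.NOT_MATCHES} = ^NVIDIA GeForce GT 710$
    {$NVIDIA.UUID.MATCHES}     = .*

After changing the filter macros, the next run of GPU Discovery will stop discovering the excluded devices; existing entities are then removed according to the discovery rule's "keep lost resources" setting.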