GPU Usage Metrics in LLM Systems
GPU Metrics
- Utilization = Output achieved ÷ Capacity paid for
- GPU Allocation Utilization = GPU-seconds running application code ÷ GPU-seconds paid for
- GPU Kernel Utilization = GPU-seconds running kernels ÷ GPU-seconds paid for
When using GPUs, a few metrics indicate how much you are getting out of them. GPU Allocation Utilization measures how much of the GPU time you pay for is spent with application code actually running on an allocated GPU, rather than being lost to reserving the GPU, setting it up, and placing code onto it. Once code is sitting on the GPU, GPU Kernel Utilization measures how much of the paid-for time goes to SIMD processing, commonly known as kernel compute, which is what the GPU is best at.
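As a rough illustration, the two GPU ratios defined above can be computed from GPU-second counts. This is a minimal sketch, not a measurement tool: the `GpuUsage` record, its field names, and the sample numbers are all hypothetical, and in practice the counts would come from your billing and monitoring systems.

```python
from dataclasses import dataclass


@dataclass
class GpuUsage:
    """Hypothetical accounting record for one billing window, in GPU-seconds."""
    paid_for: float         # GPU-seconds billed
    running_app: float      # GPU-seconds with application code running on an allocated GPU
    running_kernels: float  # GPU-seconds with kernels executing on the GPU


def allocation_utilization(u: GpuUsage) -> float:
    """GPU-seconds running application code / GPU-seconds paid for."""
    if u.paid_for <= 0:
        raise ValueError("paid_for must be positive")
    return u.running_app / u.paid_for


def kernel_utilization(u: GpuUsage) -> float:
    """GPU-seconds running kernels / GPU-seconds paid for."""
    if u.paid_for <= 0:
        raise ValueError("paid_for must be positive")
    return u.running_kernels / u.paid_for


# Made-up numbers: one GPU-hour billed, 45 minutes running application code,
# 15 minutes of that with kernels executing.
usage = GpuUsage(paid_for=3600.0, running_app=2700.0, running_kernels=900.0)
print(f"allocation utilization: {allocation_utilization(usage):.0%}")  # 75%
print(f"kernel utilization:     {kernel_utilization(usage):.0%}")      # 25%
```

Both ratios share the same denominator, so kernel utilization can never exceed allocation utilization: kernels only run while application code is running.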
Just because an allocated GPU is running application code doesn’t mean it is running code on the GPU. The term of art for “code that runs on the GPU” in the popular CUDA programming model is “kernel”, so we call the fraction of time spent running code on the GPU the GPU Kernel Utilization.
This utilization metric is reported by, among others, the beloved nvidia-smi command line tool, which wraps the NVIDIA Management Library (NVML) for NVIDIA GPU hardware, and so it is commonly checked and cited.
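One way to read this number programmatically is to call nvidia-smi with its CSV query options and parse the result. The sketch below assumes nvidia-smi is on the `PATH` of a machine with an NVIDIA driver installed; the helper names `parse_gpu_util` and `query_gpu_util` are illustrative, not part of any NVIDIA tooling. Note that nvidia-smi reports utilization over its recent sampling window, not against the GPU-seconds you paid for, so it only approximates the GPU Kernel Utilization defined above.

```python
import subprocess
from typing import List, Optional


def parse_gpu_util(csv_text: str) -> List[Optional[int]]:
    """Parse one utilization percentage per line (one line per GPU).

    Values that are not plain integers (for example "[N/A]") become None.
    """
    values: List[Optional[int]] = []
    for line in csv_text.splitlines():
        field = line.strip()
        if not field:
            continue
        try:
            values.append(int(field))
        except ValueError:
            values.append(None)
    return values


def query_gpu_util() -> List[Optional[int]]:
    """Ask nvidia-smi for per-GPU utilization (requires an NVIDIA driver on the host)."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=utilization.gpu", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_util(result.stdout)


# Parsing a sample of the expected output shape for a two-GPU machine
# (the values here are made up):
print(parse_gpu_util("37\n92\n"))  # [37, 92]
```

On a GPU machine, `query_gpu_util()` returns one entry per device; sampling it periodically and averaging gives a coarse view of how busy the GPUs are over time.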