Introduction
NVIDIA's latest generation of GPUs, based on the Kepler architecture, contains a hardware-based H.264 video encoder (henceforth referred to as NVENC). This document describes the capabilities of the hardware encoder and presents relevant quality and performance data.
Before Kepler GPUs, NVIDIA's only video encoding solution was its CUDA-based encoder, exposed through the NVCUVENC API. One disadvantage of the CUDA-based encoder is that it uses a combination of the CPU and the GPU's 3D engine for encoding, leaving very little processing power for other tasks. This approach also increases overall system power consumption.
NVENC, being dedicated H.264 hardware, does not use the 3D engine and hence consumes much less power than the CUDA-based encoder. It also leaves the CPU free to perform other tasks. The hardware is optimized to provide excellent quality at high performance, enabling a wide range of applications that require video encoding capabilities. The NVENC hardware encoder improves encoding performance by almost a factor of 4 compared to the CUDA encoder* (at equivalent quality).
Note that an application can encode using both the NVENC hardware and NVIDIA's legacy CUDA encoder in parallel, without either affecting the other. However, video pre-processing algorithms may require CUDA, which will reduce the performance of the CUDA encoder.
* CUDA encoder profiled with Core 2 Duo (2.6 GHz) + Tesla C2050 with GF100
Performance
The NVENC hardware is designed to support up to 8X real-time HD video encoding (1080p @30 fps). This means that the hardware can encode 240 frames per second of 1920 × 1080 progressive video. The application can trade performance for encoded picture quality.
A more common setting of the encoder (internally referred to as HQ – High Quality) produces an encoded bitstream of very good quality. At this setting, NVENC can encode 1080p video at 4X real-time, i.e. at 120 fps (no B frames).
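The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of any NVENC API, and the per-frame time budget and pixel rate are derived values, not measurements from this document. "Real-time" is taken to mean 30 fps, as stated above.

```python
# Illustrative arithmetic only; these helpers are hypothetical, not NVENC API calls.
HD_WIDTH, HD_HEIGHT = 1920, 1080   # 1080p frame size from the text
REAL_TIME_FPS = 30                 # "real-time" as defined in the text


def encode_fps(realtime_multiple, base_fps=REAL_TIME_FPS):
    """Frames per second implied by an 'N x real-time' figure."""
    return realtime_multiple * base_fps


def frame_budget_ms(fps):
    """Average time available per frame, in milliseconds, at a given throughput."""
    return 1000.0 / fps


def pixel_rate(fps, width=HD_WIDTH, height=HD_HEIGHT):
    """Pixels processed per second at a given frame rate and resolution."""
    return fps * width * height


if __name__ == "__main__":
    for label, multiple in (("peak", 8), ("HQ", 4)):
        fps = encode_fps(multiple)
        print(label, fps, "fps,",
              round(frame_budget_ms(fps), 2), "ms/frame,",
              pixel_rate(fps), "pixels/s")
```

At the 8X figure this works out to an average budget of roughly 4.2 ms per 1080p frame, and about 8.3 ms per frame at the 4X HQ setting.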
When B frames are included in the encoding, performance is lower and depends on the exact GOP structure.
The encoding latency is currently one frame (without B frames); however, the software supports slice-based encoding, and subsequent software API releases will expose this feature to applications.
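To make "GOP structure" concrete, the following Python sketch builds a simple I/B/P pattern and shows the order in which frames must be coded. It assumes a generic closed GOP in which each run of B frames references the next I or P frame; it does not describe how NVENC schedules frames internally, and the function names are hypothetical.

```python
# Generic GOP illustration; not a description of NVENC internals.

def display_order_types(gop_length, b_frames):
    """Frame types in display order: one I frame, then repeated runs of
    up to `b_frames` B frames, each run closed by a P frame."""
    types = ["I"]
    while len(types) < gop_length:
        remaining = gop_length - len(types)
        # Leave room for the P frame that the B frames reference.
        run = min(b_frames, remaining - 1)
        types.extend(["B"] * run)
        types.append("P")
    return types


def coding_order(types):
    """Display indices in coding order: each I/P frame is coded before
    the B frames that precede it in display order."""
    order, pending_b = [], []
    for index, frame_type in enumerate(types):
        if frame_type == "B":
            pending_b.append(index)
        else:
            order.append(index)
            order.extend(pending_b)
            pending_b = []
    return order


if __name__ == "__main__":
    types = display_order_types(gop_length=7, b_frames=2)
    print("display:", types)
    print("coding: ", coding_order(types))
```

With no B frames the coding order equals the display order, so each frame can be encoded as soon as it arrives. With B frames, the encoder must wait for a later reference frame before coding the B frames that precede it.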
The hardware has been extensively tested and verified to yield the advertised performance at all settings. The performance has been measured using the sample application provided with the NVENC SDK [1], using a single encode session and multiple concurrent encode sessions. Figure 2 shows the measured encoding performance of NVENC with various sample video clips using several presets.
Although the performance benchmarking results below use motion video, performance does not differ for synthetic content (e.g. gameplay, desktop). Note, however, that the quality constraints for such synthetic content can vary significantly from application to application, which may indirectly affect performance.
Quality
NVENC hardware has been designed to provide quality comparable to x264 (an open-source H.264 encoding library) at much higher performance. The x264 preset used for the quality comparison between NVENC, x264, and other competing solutions is as follows (refer to the x264 documentation):