CUDA is the computing engine in NVIDIA graphics processing units (GPUs), accessible to software developers through variants of industry-standard programming languages.

CUDA's aim is to expose the massive data-parallel processing power of GPUs to general computation problems. The CUDA model consists of a CPU that handles general computation and one or more GPUs that handle specially marked portions of the code, called kernels. Kernels are written in a C-like language, which is compiled to an assembly suitable for execution on a GPU. Each CUDA execution engine consists of a large number of execution units, which in turn support a large number of concurrently executing threads. The typical execution of a kernel involves allocating memory on the graphics device, copying data from main system memory to it, launching the kernel and then copying the results back. CUDA hides several intricacies, for example splitting the load among thread blocks and synchronising threads at predefined points.
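The workflow above can be sketched in CUDA C. This is a minimal vector-addition example (all names are illustrative); it follows the allocate / copy / launch / copy-back pattern just described:

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: each GPU thread adds one pair of elements.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // 1. Allocate memory on the graphics device.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);

    // 2. Copy input data from main system memory to the device.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // 3. Launch the kernel; the work is split among thread blocks.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // 4. Copy the results back to host memory.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

Note that the kernel never loops over the array: each of the million elements is handled by its own thread, which is the data-parallel style CUDA is built around.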


CUDA has been adopted by a wide range of applications in various fields. As one of the first technologies (debuted in 2007) that allowed general-purpose GPU programming, it has seen widespread use in scientific, medical, financial and consumer products. NVIDIA produces machines (rack-mounted and workstations) that can execute massively parallel computations efficiently. The second-largest supercomputer in the world (Nebulae, in China) uses CUDA-capable GPUs, an indication that CUDA might be ready for HPC in the real world.

CUDA can act as a base layer for many other technologies, such as C++ container libraries (Thrust and the Thrust Graph Library), mathematical libraries (CUBLAS), bioinformatics applications (GROMACS) and others. OpenCL has also been implemented on top of CUDA.
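To illustrate the kind of abstraction these layers provide, here is a short sketch using Thrust, which offers STL-style containers and algorithms that run on the GPU without the programmer writing any kernels:

```cpp
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cstdio>

int main() {
    // A vector that lives in GPU memory, initialised to 1000 ones.
    thrust::device_vector<int> v(1000, 1);

    // Sum it on the GPU; Thrust generates and launches the
    // underlying CUDA kernels behind the scenes.
    int sum = thrust::reduce(v.begin(), v.end(), 0);
    std::printf("sum = %d\n", sum);
    return 0;
}
```

Memory allocation, host/device copies and kernel launches are all hidden inside the container and the algorithm call.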

CUDA is probably the most widely adopted GPU acceleration architecture. The availability of free (and high-quality) software tools, together with a large target audience (NVIDIA is already the largest graphics card maker), has led to a large number of developers using it. CUDA runtime bindings exist for several languages, so one can use CUDA indirectly, through a scripting language such as Python. However, CUDA is not an industry-supported standard and no implementations outside NVIDIA exist, so applications are tied to a single provider.
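As a sketch of such indirect use, the following assumes the third-party PyCUDA binding and an NVIDIA GPU (the package name and API are PyCUDA's, not part of CUDA itself); the kernel source embedded in the script is still CUDA C:

```python
import numpy as np
import pycuda.autoinit            # initialises a CUDA context on import
import pycuda.gpuarray as gpuarray
from pycuda.compiler import SourceModule

# Compile a CUDA C kernel at runtime from Python.
mod = SourceModule("""
__global__ void doubled(float *a) {
    int i = threadIdx.x;
    a[i] *= 2.0f;
}
""")
doubled = mod.get_function("doubled")

# Copy a NumPy array to the GPU, run the kernel, fetch the result.
a = gpuarray.to_gpu(np.ones(32, dtype=np.float32))
doubled(a, block=(32, 1, 1), grid=(1, 1))
print(a.get()[:4])   # each element doubled on the GPU
```

The scripting layer handles compilation and memory transfers, while the performance-critical part still executes as a native CUDA kernel.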

License: Proprietary