kramann.info
© Guido Kramann

GPU test

cudatest.zip -- contains the test program sample_cuda.cu used below and the compiled binary sample_cuda.

In addition to a quad-core CPU, the Jetson Nano has a GPU with 128 CUDA cores.
These cores are to be used to parallelize processing tasks.
To do so, the CPU program must transfer data to the GPU and launch the function that runs there (the kernel).
The results of all GPU computations are then processed further by the CPU part of the program.
CUDA makes it possible to write both the CPU part and the GPU part in a single source file.
CUDA stands for "Compute Unified Device Architecture".
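The number of CUDA cores can be checked on the board itself with the CUDA runtime API. The following is a minimal sketch. It assumes that CUDA is installed and set up as in Code 0-2 and that the GPU is device 0. The helper coresPerSM is hypothetical and only covers the Maxwell architecture (128 cores per multiprocessor), because the runtime reports the number of multiprocessors but not the number of cores. On a Jetson Nano it should report compute capability 5.3 and one multiprocessor.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: CUDA cores per streaming multiprocessor (SM).
// Only Maxwell (compute capability 5.x, the Jetson Nano's architecture)
// is filled in; all other architectures return -1 ("unknown").
int coresPerSM(int major) {
    return (major == 5) ? 128 : -1;
}

int main() {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, 0);  // assumption: GPU is device 0
    if (err != cudaSuccess) {
        std::printf("cudaGetDeviceProperties failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("Name:                  %s\n", prop.name);
    std::printf("Compute capability:    %d.%d\n", prop.major, prop.minor);
    std::printf("Multiprocessors (SMs): %d\n", prop.multiProcessorCount);
    std::printf("Warp size:             %d\n", prop.warpSize);
    std::printf("Max threads per block: %d\n", prop.maxThreadsPerBlock);
    std::printf("Integrated GPU:        %s\n", prop.integrated ? "yes" : "no");
    // The core count is derived, not queried: SMs x cores per SM
    int perSM = coresPerSM(prop.major);
    if (perSM > 0) {
        std::printf("CUDA cores (derived):  %d\n", perSM * prop.multiProcessorCount);
    }
    return 0;
}
```

It compiles with nvcc in the same way as sample_cuda.cu. The deviceQuery sample shipped with CUDA does the same lookup with a complete architecture table.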

The following link provides a test program that works without any modification, using exactly the compiler commands given there:

https://maker.pro/nvidia-jetson/tutorial/introduction-to-cuda-programming-with-jetson-nano

The page also shows a diagram of the Jetson Nano architecture with its CPU and GPU.

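#include <iostream>
#include <cuda_runtime.h>
#include <math.h>
#include <stdlib.h>
// Kernel function to add the elements of two arrays.
// Grid-stride loop: each thread starts at its global index and advances by the
// total number of threads, so the 32 x 64 threads share the work instead of
// every thread processing the whole array.
__global__ void addNums(int *output, int *x, int *y, int num_iters) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < num_iters; i += stride) {
        output[i] = x[i] + y[i];
    }
}
int main() {
    // Declare the variables
    int num_iters = 12000000;
    int *x;
    int *y;
    int *output;
    // Seed the random number generator
    srand(10);
    std::cout << "Hello World, this is CUDA sample code" << std::endl;
    // Allocate the arrays in unified memory, accessible from both CPU and GPU
    cudaMallocManaged(&x, num_iters*sizeof(int));
    cudaMallocManaged(&y, num_iters*sizeof(int));
    cudaMallocManaged(&output, num_iters*sizeof(int));
    // Initialize with random numbers; limiting the range to 0..999
    // prevents signed integer overflow in x[i] + y[i]
    for (int i = 0; i < num_iters; i++) {
        output[i] = 0;
        x[i] = rand() % 1000;
        y[i] = rand() % 1000;
    }
    // Run the kernel on the GPU with 32 blocks of 64 threads each
    addNums<<<32, 64>>>(output, x, y, num_iters);
    // Synchronize CPU and GPU (the CPU waits for the GPU to finish before accessing the memory)
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::cerr << "CUDA error: " << cudaGetErrorString(err) << std::endl;
        return 1;
    }
    // Release the memory
    cudaFree(x);
    cudaFree(y);
    cudaFree(output);
    std::cout << "Code Execution Completed" << std::endl;
    return 0;
}

Code 0-1: Source code sample_cuda.cu from https://maker.pro/nvidia-jetson/tutorial/introduction-to-cuda-programming-with-jetson-nano, with these fixes. The kernel now uses a grid-stride loop, because in the original every thread processed the entire array. The output array is now freed as well. The random values are limited so that their sum cannot overflow. The result of cudaDeviceSynchronize is now checked.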
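The sample uses unified memory (cudaMallocManaged), so the CPU and GPU work through the same pointers. The data flow described at the beginning can be made explicit with cudaMalloc and cudaMemcpy. In that model, data is copied to the GPU, the kernel is launched, and the results are fetched back. The following is a minimal sketch; addKernel and addOnGpu are hypothetical names. On the Jetson Nano the CPU and GPU share the same physical memory, so this variant is not necessarily faster. It only makes the transfers visible.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Element-wise addition with a grid-stride loop: every launched thread
// handles the indices i, i + stride, i + 2*stride, ...
__global__ void addKernel(int *out, const int *x, const int *y, int n) {
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        out[i] = x[i] + y[i];
    }
}

// Hypothetical helper: copies x and y into GPU memory, runs the kernel and
// copies the result back into out. Returns false if a CUDA call fails.
bool addOnGpu(const std::vector<int> &x, const std::vector<int> &y,
              std::vector<int> &out) {
    if (y.size() != x.size()) return false;
    int n = static_cast<int>(x.size());
    out.assign(n, 0);
    if (n == 0) return true;  // nothing to do (and a 0-block launch would be invalid)
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *dX = nullptr, *dY = nullptr, *dOut = nullptr;
    // 1. Allocate GPU memory and copy the input data from the CPU
    bool ok = cudaMalloc(&dX, bytes) == cudaSuccess &&
              cudaMalloc(&dY, bytes) == cudaSuccess &&
              cudaMalloc(&dOut, bytes) == cudaSuccess &&
              cudaMemcpy(dX, x.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dY, y.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        // 2. Launch the kernel: 64 threads per block, enough blocks for n elements
        const int threads = 64;
        int blocks = (n + threads - 1) / threads;
        addKernel<<<blocks, threads>>>(dOut, dX, dY, n);
        // 3. Copy the results back; cudaMemcpy waits for the kernel to finish
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out.data(), dOut, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    // cudaFree on a null pointer does nothing, so partial allocations are safe here
    cudaFree(dX);
    cudaFree(dY);
    cudaFree(dOut);
    return ok;
}

int main() {
    std::vector<int> x = {1, 2, 3, 4};
    std::vector<int> y = {10, 20, 30, 40};
    std::vector<int> out;
    if (!addOnGpu(x, y, out)) {
        std::printf("CUDA error\n");
        return 1;
    }
    // 4. The CPU part continues with the results
    for (int v : out) std::printf("%d ", v);
    std::printf("\n");
    return 0;
}
```

Compiled with nvcc like sample_cuda.cu, the program should print 11 22 33 44.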

1. Add the CUDA paths to the shell environment:
# Open the ~/.bashrc file
~$ gedit ~/.bashrc
# Append these two lines to the file
export PATH=${PATH}:/usr/local/cuda/bin
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/cuda/lib64
# Source the file
~$ source ~/.bashrc
# Confirm the compiler version
~$ nvcc --version
2. Save the code above in a file called sample_cuda.cu. The extension .cu marks the file as CUDA source code.
3. Compile the code: ~$ nvcc sample_cuda.cu -o sample_cuda
4. Execute the code: ~$ ./sample_cuda

Code 0-2: Steps to compile and run sample_cuda.cu. Source is also: https://maker.pro/nvidia-jetson/tutorial/introduction-to-cuda-programming-with-jetson-nano
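In the sample, the return values of most CUDA calls are not checked. A failed kernel launch can therefore go unnoticed, and the program still prints "Code Execution Completed". The following is a minimal sketch, assuming the same setup as in Code 0-2, that checks every runtime call, times the kernel with CUDA events and verifies the results on the CPU. CUDA_CHECK, blocksFor and addElementwise are hypothetical names. Unlike the fixed 32 x 64 launch in the sample, this kernel uses one thread per element. The measured time of a first launch may include one-off overheads.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro: prints file, line and CUDA's error text and
// terminates the program if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                  \
    do {                                                                  \
        cudaError_t err_ = (call);                                        \
        if (err_ != cudaSuccess) {                                        \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,       \
                         cudaGetErrorString(err_));                       \
            std::exit(EXIT_FAILURE);                                      \
        }                                                                 \
    } while (0)

// Number of blocks so that blocks * threads >= n (integer ceiling division).
int blocksFor(int n, int threads) {
    return (n + threads - 1) / threads;
}

// One thread per element; threads with an index beyond n do nothing.
__global__ void addElementwise(int *output, const int *x, const int *y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) output[i] = x[i] + y[i];
}

int main() {
    const int n = 12000000;  // same array length as in sample_cuda.cu
    const int threads = 256;
    int *x, *y, *output;
    CUDA_CHECK(cudaMallocManaged(&x, n * sizeof(int)));
    CUDA_CHECK(cudaMallocManaged(&y, n * sizeof(int)));
    CUDA_CHECK(cudaMallocManaged(&output, n * sizeof(int)));
    for (int i = 0; i < n; i++) {
        x[i] = i % 1000;  // small values: no integer overflow in the sum
        y[i] = 1;
    }

    // CUDA events measure the GPU time between two points in the stream
    cudaEvent_t start, stop;
    CUDA_CHECK(cudaEventCreate(&start));
    CUDA_CHECK(cudaEventCreate(&stop));
    CUDA_CHECK(cudaEventRecord(start));
    addElementwise<<<blocksFor(n, threads), threads>>>(output, x, y, n);
    CUDA_CHECK(cudaGetLastError());       // reports launch errors
    CUDA_CHECK(cudaEventRecord(stop));
    CUDA_CHECK(cudaDeviceSynchronize());  // waits and reports errors during execution
    float ms = 0.0f;
    CUDA_CHECK(cudaEventElapsedTime(&ms, start, stop));
    std::printf("Kernel time: %.3f ms\n", ms);

    // Check the results on the CPU
    int errors = 0;
    for (int i = 0; i < n; i++) {
        if (output[i] != i % 1000 + 1) errors++;
    }
    std::printf("Errors: %d\n", errors);

    CUDA_CHECK(cudaEventDestroy(start));
    CUDA_CHECK(cudaEventDestroy(stop));
    CUDA_CHECK(cudaFree(x));
    CUDA_CHECK(cudaFree(y));
    CUDA_CHECK(cudaFree(output));
    return errors == 0 ? 0 : 1;
}
```

The CPU only touches the managed arrays after cudaDeviceSynchronize. On GPUs like the Jetson Nano's, which do not support concurrent managed access, the CPU must not access managed memory while a kernel may still be running.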

More links to examples, tutorials and documentation on CUDA with the Jetson Nano:

https://docs.nvidia.com/cuda/index.html

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html

https://jfrog.com/connect/post/installing-cuda-on-nvidia-jetson-nano/

https://www.seeedstudio.com/blog/2020/07/29/install-cuda-11-on-jetson-nano-and-xavier-nx/

https://smist08.wordpress.com/2019/04/03/playing-with-cuda-on-my-nvidia-jetson-nano/