The function is very small: it just adds two 1024-dimensional vectors into a third vector. It ran twice in my program, sequentially at first and then in many threads on the GPU, so I could compare the processing time between the two.
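To give an idea of scale, here is roughly what such a function looks like on both sides. This is a sketch rather than the tutorial's exact code, and the names (vectorAdd, vectorAddCpu) are mine:

```cuda
#include <cuda_runtime.h>

// GPU kernel: each thread adds one element of a and b into c.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

// The sequential version used for the CPU run.
void vectorAddCpu(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}
```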
As it turns out, the parallel version was a little better, but not as good as I expected: only 6.5 times faster than the sequential one. So I did some very simple profiling, timing each CUDA function. What surprised me most was that cudaMalloc took up most of the processing time, almost 70%. That means 70% of the time was not spent on calculating at all.
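The profiling was nothing fancier than a host-side stopwatch around each call, something like this (variable names are mine; the same pattern goes around cudaMemcpy and the kernel launch):

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    float *d_buf = nullptr;
    size_t bytes = 1024 * sizeof(float);

    // Time a single cudaMalloc with the host clock. The very first CUDA
    // call in a process also does one-time runtime setup, which is part
    // of why it shows up as so expensive.
    auto t0 = std::chrono::high_resolution_clock::now();
    cudaMalloc(&d_buf, bytes);
    auto t1 = std::chrono::high_resolution_clock::now();

    printf("cudaMalloc: %.3f ms\n",
           std::chrono::duration<double, std::milli>(t1 - t0).count());

    cudaFree(d_buf);
    return 0;
}
```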
I guessed that, like a cache, GPU memory also needs a warm-up. So I ran the function in parallel once more, right after the second run. Bang! It ran 22.33 times faster than the sequential one, and the cudaMalloc time was very close to zero.
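Continuing the sketch above in the same file (the vectorAdd kernel is assumed to be defined there), the experiment boils down to calling the same parallel pass twice and timing each call; only the second call is free of the setup cost. runParallelAdd is an illustrative helper, not my exact code:

```cuda
#include <chrono>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// One full parallel pass: allocate, copy in, launch, copy out, free.
void runParallelAdd(const float *a, const float *b, float *c, int n)
{
    float *d_a, *d_b, *d_c;
    size_t bytes = n * sizeof(float);
    cudaMalloc(&d_a, bytes);                               // allocate device memory
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, a, bytes, cudaMemcpyHostToDevice);     // copy inputs to the GPU
    cudaMemcpy(d_b, b, bytes, cudaMemcpyHostToDevice);
    vectorAdd<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n); // one thread per element
    cudaMemcpy(c, d_c, bytes, cudaMemcpyDeviceToHost);     // copy the result back
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
}

int main()
{
    const int n = 1024;
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n);

    auto time_call = [&] {
        auto t0 = std::chrono::high_resolution_clock::now();
        runParallelAdd(a.data(), b.data(), c.data(), n);
        cudaDeviceSynchronize();                           // wait for the GPU before stopping the clock
        auto t1 = std::chrono::high_resolution_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    };

    printf("first parallel run:  %.3f ms\n", time_call()); // pays the one-time setup cost
    printf("second parallel run: %.3f ms\n", time_call()); // the warmed-up run
    return 0;
}
```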
Actually, I didn't write my program from scratch; I modified the source code from a YouTube tutorial. Below is the CUDA workflow, and cudaMalloc is responsible for step 1. What my program actually did was warm up the GPU memory on the first parallel run, so the next run got all the benefit of GPU multi-threading.