c++ - Sequential programming in CUDA


I am trying to implement a simple loop in CUDA:

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= n; j++) {
            x[i][j] = (x0[i][j] + a * (x[i - 1][j] + x[i + 1][j] + x[i][j - 1] + x[i][j + 1])) / c;
        }
    }

The problem is that computing, for example, x[i][j] requires the new values of x[i - 1][j] and x[i][j - 1]. That is easy on the CPU, where the calculations are sequential. On the GPU the calculations run in parallel, so the results I get from the CPU and the GPU are different. I found some information about dynamic parallelism in CUDA and cudaDeviceSynchronize(), and I believe they could be useful, but I still have no idea how to implement this loop in a kernel. I would be grateful for any help.
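To make the difference between the two orderings concrete, here is a minimal host-side C++ sketch. It is an illustration, not code from the post: the names Grid, idx, sweep_in_place and sweep_buffered are hypothetical, and the flat row-major layout with a fixed boundary ring around the n x n interior is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed layout: row-major (n + 2) x (n + 2) grid whose outer ring is a
// fixed boundary, so indices 1..n address the interior.
using Grid = std::vector<double>;

inline std::size_t idx(std::size_t i, std::size_t j, std::size_t n) {
    return i * (n + 2) + j;
}

// In-place sweep, as in the loop above: by the time cell (i, j) is computed,
// cells (i - 1, j) and (i, j - 1) already hold their new values.
void sweep_in_place(Grid& x, const Grid& x0, double a, double c, std::size_t n) {
    assert(x.size() == (n + 2) * (n + 2) && x0.size() == x.size());
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            x[idx(i, j, n)] = (x0[idx(i, j, n)]
                + a * (x[idx(i - 1, j, n)] + x[idx(i + 1, j, n)]
                     + x[idx(i, j - 1, n)] + x[idx(i, j + 1, n)])) / c;
        }
    }
}

// Buffered sweep: every cell reads only old values from `x` and writes to a
// separate grid, so the result does not depend on the order of the cells.
Grid sweep_buffered(const Grid& x, const Grid& x0, double a, double c, std::size_t n) {
    assert(x.size() == (n + 2) * (n + 2) && x0.size() == x.size());
    Grid out = x;  // the copy keeps the boundary ring unchanged
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            out[idx(i, j, n)] = (x0[idx(i, j, n)]
                + a * (x[idx(i - 1, j, n)] + x[idx(i + 1, j, n)]
                     + x[idx(i, j - 1, n)] + x[idx(i, j + 1, n)])) / c;
        }
    }
    return out;
}
```

For n = 2, a = 1, c = 5, x0 equal to 1 in the interior and x starting at zero, one buffered sweep yields 0.2 in every interior cell, while one in-place sweep yields 0.2, 0.24, 0.24 and 0.296. A single pass therefore gives different numbers depending on whether neighbours are read before or after they are updated.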

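The comments above are right: the loop as written is purely sequential, so to run it in parallel you need a separate copy of the data to read from. Here is the kernel (without memory management code or further details):

    __global__ void update(...)
    {
        // Grid-stride loops: each thread starts at its global index and
        // advances by the total number of threads in that dimension.
        for (int i = threadIdx.x + blockDim.x * blockIdx.x; i <= n; i += blockDim.x * gridDim.x) {
            for (int j = threadIdx.y + blockDim.y * blockIdx.y; j <= n; j += blockDim.y * gridDim.y) {
                // Reads only from input and writes only to output.
                output[i][j] = update_func(input, i, j);
            }
        }
    }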

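You can invoke it from the host using

    update<<<dim3(64, 64), dim3(16, 16)>>>(input, output, width, height);

replacing the launch configuration with whatever values suit your hardware. The first argument is the grid size in blocks and the second is the block size in threads; a block can hold at most 1024 threads.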
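The separate input and output buffers only describe a single sweep. As a hedged host-side C++ sketch of how repeated sweeps could be driven, the code below swaps the roles of the two buffers after each pass. The names Grid, idx, update and relax are hypothetical, update stands in for one launch of the kernel, and the flat row-major layout with a fixed boundary ring is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Assumed layout: row-major (n + 2) x (n + 2) grid with a fixed outer ring.
using Grid = std::vector<double>;

inline std::size_t idx(std::size_t i, std::size_t j, std::size_t n) {
    return i * (n + 2) + j;
}

// One order-independent sweep: reads `in`, writes the interior of `out`.
// This plays the role of a single kernel launch with input/output buffers.
void update(const Grid& in, Grid& out, const Grid& x0,
            double a, double c, std::size_t n) {
    assert(in.size() == (n + 2) * (n + 2));
    assert(out.size() == in.size() && x0.size() == in.size());
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= n; ++j) {
            out[idx(i, j, n)] = (x0[idx(i, j, n)]
                + a * (in[idx(i - 1, j, n)] + in[idx(i + 1, j, n)]
                     + in[idx(i, j - 1, n)] + in[idx(i, j + 1, n)])) / c;
        }
    }
}

// Repeats the sweep, swapping the two buffers after each pass so the
// previous output becomes the next input.
Grid relax(Grid x, const Grid& x0, double a, double c,
           std::size_t n, int sweeps) {
    Grid next = x;  // second copy; its boundary ring is never overwritten
    for (int s = 0; s < sweeps; ++s) {
        update(x, next, x0, a, c, n);
        std::swap(x, next);
    }
    return x;
}
```

If the coefficients satisfy c = 1 + 4a with a > 0 (an assumption, not stated in the post), the system is strictly diagonally dominant, so repeated buffered sweeps and repeated in-place sweeps approach the same solution even though individual sweeps differ. On the GPU the same pattern would mean one kernel launch per sweep with the device pointers swapped in between; launches issued to the same stream run in order.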

