c++ - Sequential programming in CUDA
I am implementing a simple loop in CUDA:

    for (int i = 1; i <= n; i++) {
        for (int j = 1; j <= n; j++) {
            x[i][j] = (x0[i][j] + a * (x[i-1][j] + x[i+1][j] + x[i][j-1] + x[i][j+1])) / c;
        }
    }
The problem is that to compute, e.g., x[i,j] I need to know the new values of x[i-1,j] and x[i,j-1]. That is easy on the CPU, where the calculations run sequentially, but the GPU calculates in parallel, so the results I get from the CPU and the GPU are different. I found some information about dynamic parallelism in CUDA and cudaDeviceSynchronize(), which I believe could be useful, but I still have no idea how to implement this loop in a kernel. I'd be grateful for any help.
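To make the difference concrete, here is a host-side C++ sketch (not CUDA) comparing the loop as written, which updates x in place, with a version that reads only the previous sweep's values from one buffer and writes to a second one. The function names, the std::vector grid type, and the assumption that rows and columns 0 and n+1 hold fixed boundary values are illustrative choices, not taken from the question.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// x and x0 are (n+2) x (n+2); rows/columns 0 and n+1 hold boundary values
// that the sweeps never overwrite (an assumption about how the loop is used).
using Grid = std::vector<std::vector<double>>;

// The original loop (Gauss-Seidel order): x is updated in place, so x[i-1][j]
// and x[i][j-1] already hold values from THIS sweep when x[i][j] is computed.
void gauss_seidel_sweep(Grid& x, const Grid& x0, int n, double a, double c) {
    assert(static_cast<int>(x.size()) == n + 2 && x0.size() == x.size());
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            x[i][j] = (x0[i][j] + a * (x[i-1][j] + x[i+1][j] + x[i][j-1] + x[i][j+1])) / c;
}

// Parallel-friendly version (Jacobi order): every point reads only values from
// the PREVIOUS sweep (cur) and writes to a separate buffer (next), so no point
// depends on another point of the same sweep; each (i, j) could be one GPU thread.
void jacobi_sweep(const Grid& cur, Grid& next, const Grid& x0, int n, double a, double c) {
    assert(static_cast<int>(cur.size()) == n + 2 && next.size() == cur.size());
    for (int i = 1; i <= n; i++)
        for (int j = 1; j <= n; j++)
            next[i][j] = (x0[i][j] + a * (cur[i-1][j] + cur[i+1][j] + cur[i][j-1] + cur[i][j+1])) / c;
}

// Repeats Jacobi sweeps, swapping the two buffers after each one
// (on the GPU this would correspond to swapping input/output pointers between launches).
Grid jacobi(Grid x, const Grid& x0, int n, double a, double c, int sweeps) {
    Grid next = x;  // copy so the boundary values exist in both buffers
    for (int s = 0; s < sweeps; s++) {
        jacobi_sweep(x, next, x0, n, a, c);
        std::swap(x, next);
    }
    return x;
}
```

For example, with n = 2, a = 1, c = 5, x0 = 1 everywhere and all other values starting at 0, one in-place sweep gives x[2][2] = 0.296 while one Jacobi sweep gives 0.2. Repeating either kind of sweep converges to 1/3 at every interior point: the two orders generally disagree after any fixed number of sweeps, but when both converge (for instance when |c| > 4|a|, which makes the system strictly diagonally dominant) they approach the same solution.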
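The comments above are right: the loop as written is purely sequential, because each x[i][j] depends on values updated earlier in the same sweep. To run it in parallel you need to copy the data: read the old values from one array and write the new values to another. Here is a kernel (without memory management code or further details):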
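    __global__ void update(const float* input, float* output, int width, int height)
    {
        // update_func (not shown) must read only from input, never from output
        for (int i = 1 + threadIdx.x + blockDim.x * blockIdx.x; i < height - 1; i += blockDim.x * gridDim.x) {
            for (int j = 1 + threadIdx.y + blockDim.y * blockIdx.y; j < width - 1; j += blockDim.y * gridDim.y) {
                output[i * width + j] = update_func(input, i, j);
            }
        }
    }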
which you can invoke from the host using
    update<<<dim3(64, 64), dim3(16, 16)>>>(input, output, width, height);
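replacing the launch configuration (grid and block sizes) with whatever values suit your hardware; note that a single block may contain at most 1024 threads.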
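A host-side C++ sketch of how one might pick the launch configuration for a grid-stride kernel like this and check it: grid_for rounds the grid size up so every index gets its own thread, and visit_counts emulates the two grid-stride loops to confirm that each index is processed exactly once, even when the grid is smaller than the data. Dim2, grid_for and visit_counts are hypothetical helpers, and the loops here run over 0 .. extent-1 rather than the kernel's exact bounds.

```cpp
#include <cassert>
#include <vector>

// Stand-in for CUDA's dim3 restricted to two dimensions (host-side sketch only).
struct Dim2 { int x, y; };

// Smallest grid that gives every index its own thread, rounding up so a
// partial block at the edge is still launched.
Dim2 grid_for(int extent_x, int extent_y, Dim2 block) {
    // 1024 threads per block is the limit for compute capability 2.0 and later.
    assert(block.x > 0 && block.y > 0 && block.x * block.y <= 1024);
    return {(extent_x + block.x - 1) / block.x, (extent_y + block.y - 1) / block.y};
}

// Emulates the kernel's two grid-stride loops on the host, over indices
// 0 .. extent_x-1 and 0 .. extent_y-1, and counts how often each (i, j) is visited.
std::vector<int> visit_counts(Dim2 grid, Dim2 block, int extent_x, int extent_y) {
    assert(grid.x > 0 && grid.y > 0 && block.x > 0 && block.y > 0);
    std::vector<int> count(extent_x * extent_y, 0);
    for (int bx = 0; bx < grid.x; bx++)          // plays the role of blockIdx.x
      for (int by = 0; by < grid.y; by++)        // blockIdx.y
        for (int tx = 0; tx < block.x; tx++)     // threadIdx.x
          for (int ty = 0; ty < block.y; ty++)   // threadIdx.y
            for (int i = tx + block.x * bx; i < extent_x; i += block.x * grid.x)
              for (int j = ty + block.y * by; j < extent_y; j += block.y * grid.y)
                count[i * extent_y + j]++;
    return count;
}
```

For a 1000 x 1000 range and 16 x 16 blocks, grid_for returns a 63 x 63 grid, which would be passed as dim3(63, 63) in the launch; extent_x is the range the kernel steps through with blockDim.x. Because of the grid-stride loops, a smaller grid is still correct, just with more work per thread, which also helps stay within gridDim.y's limit of 65535.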