caching - Why did I fail to speed up my program by making it cache-friendly?


I used Intel's ifort to compile my program.

There are a lot of nested loops like the one below:

    do i=1,imax
      do j=1,jmax
        if (a(i,j).ne.1) cycle
        .....
        .....
        .....
      enddo
    enddo

as well as loops like

    do j=1,jmax
      do i=1,imax
        if (a(i,j).ne.1) cycle
        .....
        .....
        .....
      enddo
    enddo

I noticed that Fortran stores arrays in column-major order, so I put the j loop outside of the i loop.

It confused me that there seems to be a drop in performance.

In contrast, I tried putting the i loop outside of the j loop for the other loops. There is still a performance loss, although a slight one.


    program p

    integer ::i, j, k, s
    real ::a(1000,1000)

    do i=1,1000
      do j=1,1000
        call random_number(a(i,j))
      enddo
    enddo

    do k=1,10000
      do j=1,1000
        do i=1,1000
          if (a(i,j) .ge. 111) s = s+a(i,j)
        enddo
      enddo
    enddo

    print *,s

    end

    ifort -fPIC -no-prec-div -O3 -xCORE-AVX2 a.f90 -o a

    time ./a
        10921820

    real    0m2.221s
    user    0m2.221s
    sys     0m0.001s

    program p

    integer ::i, j, k, s
    real ::a(1000,1000)

    do i=1,1000
      do j=1,1000
        call random_number(a(i,j))
      enddo
    enddo

    do k=1,10000
      do i=1,1000
        do j=1,1000
          if (a(i,j) .ge. 111) s = s+a(i,j)
        enddo
      enddo
    enddo

    print *,s

    end

    ifort -fPIC -no-prec-div -O3 -xCORE-AVX2 a.f90 -o a

    time ./a
        10923324

    real    0m4.459s
    user    0m4.457s
    sys     0m0.003s

The performance difference is stable, which may prove that the optimization works. However, when I applied it to my real project, I failed to get a speedup.

Thank you again. I'm quite new to Stack Overflow, so thanks for your help.

As you mention correctly, Fortran is column-major. This means that the column is the major index, and that the columns of a matrix are stored one after the other in memory. You probably assumed that column-major means that looping over the column index traverses the matrix through elements that are contiguous in memory, but this is not true. The matrix

    | a11, a12, a13 |
    | a21, a22, a23 |
    | a31, a32, a33 |

is stored in column-major order as

    a11, a21, a31, a12, a22, a32, a13, a23, a33
    --column 1---  --column 2---  --column 3---
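The layout above can be checked with a few lines of Fortran. This is only a minimal sketch: the program name and the two-digit labelling of the elements are made up for illustration. It relies on the fact that printing a whole array writes its elements in storage order.

```fortran
program column_major_demo
  implicit none
  integer :: a(3,3), i, j

  ! Label each element with its indices: a(i,j) = 10*i + j, so a(2,3) = 23.
  do j = 1, 3
    do i = 1, 3
      a(i,j) = 10*i + j
    end do
  end do

  ! A whole-array reference in an output list is written in array element
  ! order, i.e. the order in which the elements sit in memory.
  print '(9(i0,1x))', a
  ! prints: 11 21 31 12 22 32 13 23 33
end program column_major_demo
```

The first index changes fastest in the output, which matches the column-by-column storage shown above.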

This means that when you create a two-dimensional loop in Fortran, the first index, which represents the row index, should be the innermost one for best performance, because then you traverse the matrix in memory order as I wrote above. Therefore

    do j=1,1000
        do i=1,1000
            if (a(i,j) .ge. 111) s = s + a(i,j)
        enddo
    enddo

gives, as you found as well, the best performance.
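To compare the two loop orders inside a single run, something like the following could be used. It is a rough sketch rather than a careful benchmark: the program name, the repetition count, the 0.5 threshold and the double-precision accumulator are assumptions of mine and not taken from the question. An optimizing compiler may also interchange or vectorize the loops on its own, so the measured gap can differ from the timings above.

```fortran
program loop_order_timing
  implicit none
  integer, parameter :: n = 1000, reps = 100   ! illustrative sizes only
  real :: a(n,n)
  double precision :: s
  integer :: i, j, k
  integer(kind=8) :: t0, t1, rate

  call random_number(a)   ! fill the whole array with values in [0,1)

  ! First index innermost: consecutive iterations touch adjacent memory.
  s = 0.0d0
  call system_clock(t0, rate)
  do k = 1, reps
    do j = 1, n
      do i = 1, n
        if (a(i,j) >= 0.5) s = s + a(i,j)
      end do
    end do
  end do
  call system_clock(t1)
  print *, 'i innermost:', real(t1 - t0) / real(rate), 's, sum =', s

  ! Second index innermost: consecutive iterations are n elements apart.
  s = 0.0d0
  call system_clock(t0, rate)
  do k = 1, reps
    do i = 1, n
      do j = 1, n
        if (a(i,j) >= 0.5) s = s + a(i,j)
      end do
    end do
  end do
  call system_clock(t1)
  print *, 'j innermost:', real(t1 - t0) / real(rate), 's, sum =', s
end program loop_order_timing
```

Printing the sum after each timed section keeps the compiler from discarding the loops as dead code. Timing both orders in one executable also avoids differences caused by separate builds.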

