
Fortran intrinsic timing routines: which is better, cpu_time or system_clock?

Source: https://www.devze.com 2023-03-24 11:22 (reposted from the web)

When timing a Fortran program I usually just use the call cpu_time(t).

Then I stumbled across call system_clock([count,count_rate,count_max]), which seems to do the same thing, though in a more difficult manner. My knowledge of these comes from old Intel documentation.

I wasn't able to find it on Intel's homepage. See my markup below.

  1. Which is more accurate, or are they similar?
  2. Does one of them count cache misses (or the like) and the other not, or do either of them?
  3. Or is the only difference the one marked in my markup below?

Those are my questions; below I have supplied code for you to see some timings and usages. It has shown me that they are very similar in output, and thus seem to be similar in implementation.

I should note that I will probably always stick with cpu_time, and that I don't really need more precise timings.

In the code below I have tried to compare them. (I have also tried more elaborate things, but will not include them for brevity.) So basically my result is that:

  • cpu_time

    1. Is easier to use; you don't need the initialization calls.
    2. Gives the time directly as a difference of two reals.
    3. Is also compiler specific, but there is no way to query the precision (milliseconds is the norm).
    4. Reports the sum of thread times, i.e. it is not recommended for parallel runs.
  • system_clock

    1. Needs pre-initialization (querying count_rate and count_max).
    2. Requires post-processing in the form of a divide. (A small thing, but nonetheless a difference.)
    3. Is compiler specific. On my PC I found the following:
      • Intel 12.0.4 uses a count rate of 10000, due to the INTEGER precision.
      • gcc-4.4.5 uses 1000; I do not know why this differs.
    4. Is prone to wraparound (i.e. c1 > c2) because of count_max.
    5. Measures time from a fixed reference point, so it yields the actual elapsed time rather than a sum over threads.
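Point 4 (wraparound) can be handled explicitly: the count runs from 0 to count_max and then wraps back to 0, so a negative difference can be corrected by adding count_max + 1. A minimal sketch (the arithmetic is done in reals to avoid integer overflow, since count_max is typically huge(0)):

    PROGRAM wrap_demo
      IMPLICIT NONE
      INTEGER :: c1,c2,cr,cm
      REAL :: elapsed

      CALL system_clock(count_rate=cr,count_max=cm)
      CALL system_clock(c1)
      ! ... work being timed ...
      CALL system_clock(c2)

      ! If the counter wrapped, c2 < c1; the full counter span is count_max+1.
      IF (c2 < c1) THEN
         elapsed = (REAL(c2-c1) + REAL(cm) + 1.0)/REAL(cr)
      ELSE
         elapsed = REAL(c2-c1)/REAL(cr)
      END IF
      WRITE(*,*) "elapsed (s): ",elapsed
    END PROGRAM wrap_demo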

Code:

PROGRAM timer
  IMPLICIT NONE
  REAL :: t1,t2,rate 
  INTEGER :: c1,c2,cr,cm,i,j,n,s
  INTEGER , PARAMETER :: x=20000,y=15000,runs=1000
  REAL :: array(x,y),a_diff,diff

  ! First initialize the system_clock
  CALL system_clock(count_rate=cr)
  CALL system_clock(count_max=cm)
  rate = REAL(cr)
  WRITE(*,*) "system_clock rate ",rate

  diff = 0.0
  a_diff = 0.0
  s = 0
  DO n = 1 , runs
     CALL CPU_TIME(t1)
     CALL SYSTEM_CLOCK(c1)
     FORALL(i = 1:x,j = 1:y)
        array(i,j) = REAL(i)*REAL(j) + 2
     END FORALL
     CALL CPU_TIME(t2)
     CALL SYSTEM_CLOCK(c2)
     array(1,1) = array(1,2)     
     IF ( (c2 - c1)/rate < (t2-t1) ) s = s + 1
     diff = (c2 - c1)/rate - (t2-t1) + diff
     a_diff = ABS((c2 - c1)/rate - (t2-t1)) + a_diff
  END DO

  WRITE(*,*) "system_clock : ",(c2 - c1)/rate
  WRITE(*,*) "cpu_time     : ",(t2-t1)
  WRITE(*,*) "sc < ct      : ",s,"of",runs
  WRITE(*,*) "mean diff    : ",diff/runs
  WRITE(*,*) "abs mean diff: ",a_diff/runs
END PROGRAM timer

To complete the picture, here is the output from my Intel 12.0.4 and gcc-4.4.5 compilers.

  • Intel 12.0.4 with -O0

    system_clock rate    10000.00    
    system_clock :    2.389600    
    cpu_time     :    2.384033    
    sc < ct      :            1 of        1000
    mean diff    :   4.2409324E-03
    abs mean diff:   4.2409897E-03
    
    real    42m5.340s
    user    41m48.869s
    sys 0m12.233s
    
  • gcc-4.4.5 with -O0

    system_clock rate    1000.0000    
    system_clock :    1.1849999    
    cpu_time     :    1.1840820    
    sc < ct      :          275 of        1000  
    mean diff    :   2.05709646E-03  
    abs mean diff:   2.71424348E-03  
    
    real    19m45.351s  
    user    19m42.954s  
    sys 0m0.348s  
    

Thanks for reading...


These two intrinsics report different types of time. system_clock reports "wall time" or elapsed time. cpu_time reports time used by the CPU. On a multi-tasking machine these could be very different, e.g., if your process shared the CPU equally with three other processes and therefore received 25% of the CPU and used 10 cpu seconds, it would take about 40 seconds of actual elapsed or wall clock time.
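The distinction is easy to demonstrate: time something that sleeps rather than computes. Wall time advances by about a second while CPU time barely moves. A sketch (execute_command_line is Fortran 2008; the sleep command assumes a Unix-like system):

    PROGRAM wall_vs_cpu
      IMPLICIT NONE
      INTEGER :: c1,c2,cr
      REAL :: t1,t2

      CALL system_clock(count_rate=cr)
      CALL cpu_time(t1)
      CALL system_clock(c1)

      ! Sleeping consumes wall time but almost no CPU time.
      CALL execute_command_line("sleep 1")

      CALL cpu_time(t2)
      CALL system_clock(c2)
      WRITE(*,*) "wall time (s): ",REAL(c2-c1)/REAL(cr)  ! roughly 1.0
      WRITE(*,*) "cpu time  (s): ",t2-t1                 ! close to 0.0
    END PROGRAM wall_vs_cpu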


cpu_time() usually has a resolution of about 0.01 second on Intel-compatible CPUs. This means that a smaller time interval may count as zero time. Most current compilers for Linux make the resolution of system_clock() depend on the data types of the arguments, so integer(int64) will give better than 1 microsecond resolution, as well as permitting counting over a significant time interval. gfortran for Windows was changed recently (during 2015) to make system_clock() equivalent to query_performance calls. ifort on Windows, however, still shows about 0.01 resolution for system_clock, even after omp_get_wtime was changed to use query_performance. I would discount previous comments about measuring cpu_time or system_clock resolution in clock ticks, particularly if that may be thought to relate to CPU or data bus ticks, such as the rdtsc instruction could report.
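The resolution point can be checked directly: query count_rate once with a default-integer argument and once with an integer(int64) argument, and compare. A sketch using iso_fortran_env (Fortran 2008); the actual rates are compiler- and platform-dependent:

    PROGRAM clock_resolution
      USE iso_fortran_env, ONLY: int64
      IMPLICIT NONE
      INTEGER        :: cr32
      INTEGER(int64) :: cr64

      ! The kind of the arguments selects the clock resolution
      ! with gfortran and recent compilers on Linux.
      CALL system_clock(count_rate=cr32)
      CALL system_clock(count_rate=cr64)
      WRITE(*,*) "default integer count_rate: ",cr32  ! often 1000 (milliseconds)
      WRITE(*,*) "int64 count_rate:           ",cr64  ! often much higher, e.g. nanoseconds
    END PROGRAM clock_resolution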


I find itime (a GNU extension; see the gfortran manual) to be a good alternative to system_clock for timing Fortran programs. It is very easy to use:

integer, dimension(3) :: time
call itime(time)
print *, 'Hour:  ', time(1)
print *, 'Minute:', time(2)
print *, 'Second:', time(3)
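Since itime reports the wall-clock time of day rather than a counter, an interval has to be computed by the caller. A sketch; note the 1-second resolution and that a naive difference wraps at midnight:

    PROGRAM itime_elapsed
      IMPLICIT NONE
      INTEGER, DIMENSION(3) :: t1,t2
      INTEGER :: elapsed

      CALL itime(t1)
      ! ... work being timed ...
      CALL itime(t2)

      ! Convert hh:mm:ss to seconds of the day and subtract.
      elapsed = (t2(1)*3600 + t2(2)*60 + t2(3)) &
              - (t1(1)*3600 + t1(2)*60 + t1(3))
      WRITE(*,*) "elapsed (s): ",elapsed
    END PROGRAM itime_elapsed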


I find secnds() to be the easiest way to get wall time. Its usage is almost identical to cpu_time(). Note that secnds (a GNU/DEC extension) takes and returns a default real, and the second call must also be to secnds, not "seconds":

real :: t1, delta
t1 = secnds(0.0)
! Do stuff
delta = secnds(t1)
