
Performance profiler: How to use unsigned long to measure time in microseconds?


I am using unsigned long to collect performance data while profiling an application, as follows:

  1. unsigned long time_taken = (unsigned long) ((finish_time_in_nano_sec - start_time_in_nano_sec)/1000);

  2. Write time_taken to a file. Do this for every function that gets called in my app.

  3. Now, after app execution, have a separate program read the above file and sum up the time taken for every function, including call counts, such as:

func_name aggregate_time call_counts

The problem I have is that for some functions, the aggregate_time field is a 20-digit value, i.e. around the maximum value an unsigned long can hold. This cannot be right, because I measure time in microseconds and don't run my app for more than about 20 seconds. How can it then be a 20-digit value?

Do you see mistakes in Steps 1, 2, and 3?

Regards, Krishna

EDIT:

1) Time measurement:

clock_gettime(CLOCK_REALTIME, &start_time);
clock_gettime(CLOCK_REALTIME, &finish_time);
unsigned long time_taken = (unsigned long)((finish_time.tv_nsec - start_time.tv_nsec)/1000);

2) File write: fwrite(&time_taken, sizeof(unsigned long), 1, datafile);

3) File read: fread(&time_taken, sizeof(long), 1, datafile);


The problem is in your time measurement code:

clock_gettime(CLOCK_REALTIME, &start_time);
clock_gettime(CLOCK_REALTIME, &finish_time);
unsigned long time_taken = (unsigned long)((finish_time.tv_nsec - start_time.tv_nsec)/1000);

If a 1-second clock boundary is crossed between start_time and finish_time, your subtraction can produce a negative value, which is then converted to a very large positive value by the cast to unsigned.

You need to take into account the tv_sec member of the times as well:

unsigned long time_taken;
time_taken = (unsigned long)((finish_time.tv_sec - start_time.tv_sec) * 1000000);
time_taken += (unsigned long)(finish_time.tv_nsec / 1000);
time_taken -= (unsigned long)(start_time.tv_nsec / 1000);

(e.g. a clock time of 23989032.452 seconds would be represented as a struct timespec with .tv_sec = 23989032 and .tv_nsec = 452000000)
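
For reference, here is a minimal self-contained sketch that puts the whole measurement together (the helper name elapsed_us is mine, not from the question; on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

/* Elapsed time in microseconds between two timespecs,
   taking both the tv_sec and tv_nsec members into account. */
static unsigned long elapsed_us(struct timespec start, struct timespec finish)
{
    unsigned long us = (unsigned long)(finish.tv_sec - start.tv_sec) * 1000000UL;
    us += (unsigned long)(finish.tv_nsec / 1000);
    us -= (unsigned long)(start.tv_nsec / 1000);
    return us;
}

int main(void)
{
    struct timespec start_time, finish_time;

    clock_gettime(CLOCK_REALTIME, &start_time);
    /* ... code being profiled ... */
    clock_gettime(CLOCK_REALTIME, &finish_time);

    printf("time taken: %lu us\n", elapsed_us(start_time, finish_time));
    return 0;
}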


The problem isn't in the code you show - though it would be interesting to know the types of the times in nanoseconds.

Therefore, the problem is in the code you don't show:

  • writing the data to file
  • reading the data from file
  • obtaining the times in nanoseconds
  • processing the times in microseconds
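
On the writing and reading points, one classic pitfall is a type or size mismatch between the two programs. A minimal sketch that sidesteps this with a fixed-width type (the function names are mine, not from the question):

#include <stdint.h>
#include <stdio.h>

/* Writer: store each sample as a fixed-width 64-bit value. */
void write_sample(FILE *datafile, uint64_t time_taken)
{
    fwrite(&time_taken, sizeof time_taken, 1, datafile);
}

/* Reader: read back with exactly the same type and size. */
int read_sample(FILE *datafile, uint64_t *time_taken)
{
    return fread(time_taken, sizeof *time_taken, 1, datafile) == 1;
}

In the code from the question, fwrite uses sizeof(unsigned long) and fread uses sizeof(long); those are guaranteed to be the same size, so the mismatch, if any, is more likely in how the reading program interprets or accumulates the values.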


Actually, it is possible in some cases to obtain negative times. Considering that the time intervals are very small, they most probably come from the RDTSC instruction, which reads a counter that increments with every clock of the CPU. On some multicore AMD processors, each core has its own counter, which is not synchronized with the others, so the difference between two consecutive readings taken on different cores can be negative.

This can be solved easily by forcing the program to execute on a single core, for example with the Win32 API SetProcessAffinityMask.
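
A minimal sketch of that approach, assuming Windows (an affinity mask of 1 pins the process to the first logical processor; adjust the mask for your machine):

#include <stdio.h>
#include <windows.h>

int main(void)
{
    /* Bit 0 set: the process may only run on logical processor 0. */
    if (!SetProcessAffinityMask(GetCurrentProcess(), 1)) {
        fprintf(stderr, "SetProcessAffinityMask failed: %lu\n", GetLastError());
        return 1;
    }

    /* ... run the profiled code here ... */
    return 0;
}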

EDIT: Be careful how you measure times. Most time APIs have horrible resolution, so measuring small intervals with them makes no sense. Try using RDTSC explicitly.
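
If you do want the raw counter, most compilers expose it through the __rdtsc intrinsic (<intrin.h> on MSVC, <x86intrin.h> on GCC/Clang). A sketch; note the result is in CPU cycles, not nanoseconds, and is subject to the per-core caveats above:

#include <stdio.h>
#ifdef _MSC_VER
#include <intrin.h>
#else
#include <x86intrin.h>
#endif

int main(void)
{
    unsigned long long start = __rdtsc();   /* raw cycle count */
    /* ... code being profiled ... */
    unsigned long long finish = __rdtsc();

    /* The delta is in CPU cycles; divide by the TSC frequency for seconds. */
    printf("%llu cycles\n", finish - start);
    return 0;
}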

And most important: file I/O is so slow that it can completely destroy the useful information. For example, if you take measurements for both the caller and the callee, and the callee writes its result to the file, then the time measured for the caller will include the time spent in fwrite. Also monitor each write, so that you can see how much time is spent there (to avoid recursion, you can save the time taken by one fwrite and log it with the next fwrite).
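
A sketch of that deferred-logging idea, reusing the clock_gettime measurement from the first answer (all names and the record layout here are mine, not from the question):

#include <stdio.h>
#include <time.h>

/* Overhead of the previous fwrite, logged together with the next record. */
static unsigned long pending_fwrite_us = 0;

void log_sample(FILE *datafile, unsigned long time_taken)
{
    struct timespec a, b;
    unsigned long record[2] = { time_taken, pending_fwrite_us };

    clock_gettime(CLOCK_REALTIME, &a);
    fwrite(record, sizeof record[0], 2, datafile);
    clock_gettime(CLOCK_REALTIME, &b);

    /* Remember the cost of this write; it is logged with the next call. */
    pending_fwrite_us = (unsigned long)(b.tv_sec - a.tv_sec) * 1000000UL
                      + (unsigned long)(b.tv_nsec / 1000)
                      - (unsigned long)(a.tv_nsec / 1000);
}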

