I'm trying to determine the CPU utilization of specific LWPs in specific processes in Solaris 10 using data from the /proc filesystem. The problem I have is that sometimes a utilization counter decreases.
Here's the gist of it:
// we'll be reading from the file named /proc/<pid>/lwp/<lwpid>/lwpusage
std::stringstream filename;
filename << "/proc/" << pid << "/lwp/" << lwpid << "/lwpusage";
int fd = open(filename.str().c_str(), O_RDONLY);
// error checking
while(1)
{
prusage_t usage;
ssize_t readResult = pread(fd, &usage, sizeof(prusage_t), 0);
// error checking
std::cout << "sec=" << usage.pr_stime.tv_sec
<< " nsec=" << usage.pr_stime.tv_nsec << std::endl;
// wait
}
close(fd);
The number of nanoseconds reported in the prusage_t struct is derived from timestamps recorded each time an LWP changes state. This feature is called microstate accounting. Sounds good, but every so often the "system call cpu time" counter decreases by roughly 1-10 milliseconds.
Update: it's not just the "system call cpu time" counter; I've since seen other counters decrease as well.
Another curiosity is that it always seems to be exactly one sample that's bogus - never two near each other. All the other samples are monotonically increasing at the expected rate. This seems to rule out the possibility that the counter is somehow reset in the kernel.
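To make the symptom concrete, here is a minimal sketch of how the bogus samples can be flagged. It assumes each reading has already been copied out of the prusage_t field into a plain seconds/nanoseconds pair; `Sample`, `toNanos`, and `findRegressions` are hypothetical names for illustration, not part of any Solaris API.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the tv_sec/tv_nsec pair read from a prusage_t counter.
struct Sample {
    std::int64_t sec;
    std::int64_t nsec;
};

// Collapse a sample into a single nanosecond count so samples compare directly.
std::int64_t toNanos(const Sample& s) {
    return s.sec * 1000000000LL + s.nsec;
}

// Return the indices of samples that are lower than the highest value seen so far.
// A regressed sample does not update the high-water mark, so a single bogus
// reading is reported once and the samples after it are judged against the
// last good value.
std::vector<std::size_t> findRegressions(const std::vector<Sample>& samples) {
    std::vector<std::size_t> bad;
    if (samples.empty()) {
        return bad;
    }
    std::int64_t highWater = toNanos(samples[0]);
    for (std::size_t i = 1; i < samples.size(); ++i) {
        const std::int64_t now = toNanos(samples[i]);
        if (now < highWater) {
            bad.push_back(i);
        } else {
            highWater = now;
        }
    }
    return bad;
}
```

Comparing against a high-water mark rather than the immediately preceding sample is a design choice: it keeps one low reading from making the following, correct reading look like an unusually large jump.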
Any clues as to what's going on here?
> uname -a
SunOS cdc-build-sol10u7 5.10 Generic_139556-08 i86pc i386 i86pc
If you are on a multicore machine, you might check whether this is occurring when the process is migrated from one processor core to a different one. If your processes are running, prstat will show the CPU on which they are running. To minimize lock contention, frequently updated data is sometimes updated in a processor-specific memory area and then synchronized with any copies of the data for other processors.
Just a guess. You might want to temporarily disable NTP and see if the problem still appears.