
Loop line by line and wall clock time with a Python Filter

Developer https://www.devze.com 2023-02-16 12:33 Source: web

I have a simple loop like:

#Core Loop
chunk_size=1000
while True:
    line_c = 0 
    chunk_array = []
    while True:
        line = sys.stdin.readline()
        line_c +=1
        m = line_regex.match(line)
        if m:   
            chunk_array.append(m.groupdict())
        if line_c >= chunk_size:
            #print top_value(chunk_array, 'HTTP_HOST', 10)
            print stats(chunk_array, 'HTTP_HAPROXY_TT')
            break

The script is called as a unix filter, for example:

tail -f /var/log/web/stackoverflow.log | python logFilter.py

Instead of printing every X lines, what would be a good way to refactor this loop so that it prints every X seconds?

Reference:

Stats function:

def stats(l, value):
    '''stats of an integer field'''
    m = []
    for line in l:
        if line[value].isdigit():
            m.append(int(line[value]))
    return "Mean: %s Min: %s Max: %s StdDev: %s" % (mean(m), amin(m), amax(m), std(m))

The input is lines from a web log file; line_regex turns each line into field/value pairs (groupdict). With the stats function, the output looks like this:

tail -f /var/log/web/stackoverflow.log | python logFilter.py -f HTTP_HAPROXY_TR -t stats
Mean: 183.43919598 Min: 0 Max: 3437 StdDev: 321.673112066
Mean: 182.768304915 Min: 0 Max: 2256 StdDev: 255.039386654
Mean: 142.672064777 Min: 0 Max: 1919 StdDev: 208.870675922

So those stat lines are printed every time the script has received 1000 lines. Instead of doing it every X lines, I would like to change the loop so that this happens every, say, 10 seconds.


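Wrap the input in a generator that yields the lines collected so far about once every period seconds:

import sys
import time

def time_chunk(some_source, period=10):
    '''Yield lists of lines, one list about every `period` seconds.'''
    start = time.time()
    buffer = []
    for line in some_source:
        buffer.append(line)
        # The clock is only checked when a line arrives.
        if time.time() - start >= period:
            start = time.time()
            yield buffer
            buffer = []
    yield buffer  # flush what is left at end of input

for chunk in time_chunk(sys.stdin):
    # line_regex and stats are the ones from the question
    matches = [m.groupdict() for m in map(line_regex.match, chunk) if m]
    print(stats(matches, 'HTTP_HAPROXY_TT'))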
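
One limitation of the generator above is that it only looks at the clock when a new line arrives, so a quiet log delays the next chunk until more input shows up. A possible variant, sketched here for Python 3, moves the reading into a background thread and waits on a queue with a timeout, so a chunk (possibly empty) is emitted about every period seconds regardless of traffic. The names time_chunk_idle and _reader are made up for this sketch and are not part of the original answer.

```python
import queue
import threading
import time


def _reader(source, q):
    """Push every line from `source` onto the queue, then a None sentinel."""
    for line in source:
        q.put(line)
    q.put(None)  # signals end of input


def time_chunk_idle(source, period=10.0):
    """Yield a list of lines about every `period` seconds, even when idle."""
    q = queue.Queue()
    threading.Thread(target=_reader, args=(source, q), daemon=True).start()
    buffer = []
    deadline = time.monotonic() + period
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            # The period is over: hand out what was collected (maybe nothing).
            yield buffer
            buffer = []
            deadline = time.monotonic() + period
            continue
        try:
            line = q.get(timeout=remaining)
        except queue.Empty:
            continue  # timed out; the deadline check above emits the chunk
        if line is None:
            yield buffer  # end of input: flush the remainder and stop
            return
        buffer.append(line)


# Hypothetical usage, with line_regex and stats taken from the question:
#
#   import sys
#   for chunk in time_chunk_idle(sys.stdin, period=10):
#       matches = [m.groupdict() for m in map(line_regex.match, chunk) if m]
#       if matches:
#           print(stats(matches, 'HTTP_HAPROXY_TT'))
```

Because idle periods produce empty chunks, the consumer should skip them before computing statistics, as the usage comment does.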


To do something every, say, 5 seconds in Python, you can use the signal module. To fire the timer every 5 seconds, use

signal.setitimer(signal.ITIMER_REAL, 5.0, 5.0)

and install a handler which is called in this interval by

signal.signal(signal.SIGALRM, handler)

where handler is the function you want to be called in this interval.
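
Put together, the two calls could be used roughly as follows. This is a minimal Python 3 sketch, not a definitive implementation: the class name PeriodicFlusher and its methods are invented for illustration, and it assumes a Unix system, since signal.setitimer and SIGALRM are not available on Windows. The handler runs in the main thread between bytecode instructions, so it empties the shared list in place rather than rebinding it.

```python
import signal


class PeriodicFlusher:
    """Collect items and pass them to `callback` every `period` seconds."""

    def __init__(self, callback, period=5.0):
        self.callback = callback
        self.period = period
        self.buffer = []

    def _handler(self, signum, frame):
        # Copy, then clear in place, so the reading loop keeps appending
        # to the same list object and no item is lost.
        chunk = list(self.buffer)
        self.buffer.clear()
        self.callback(chunk)

    def start(self):
        signal.signal(signal.SIGALRM, self._handler)
        # First firing after `period` seconds, then every `period` seconds.
        signal.setitimer(signal.ITIMER_REAL, self.period, self.period)

    def stop(self):
        signal.setitimer(signal.ITIMER_REAL, 0)  # zero seconds clears the timer
        signal.signal(signal.SIGALRM, signal.SIG_DFL)

    def add(self, item):
        self.buffer.append(item)


# Hypothetical usage, with line_regex and stats taken from the question:
#
#   import sys
#   def report(chunk):
#       if chunk:
#           print(stats(chunk, 'HTTP_HAPROXY_TT'))
#
#   flusher = PeriodicFlusher(report, period=5.0)
#   flusher.start()
#   for line in sys.stdin:
#       m = line_regex.match(line)
#       if m:
#           flusher.add(m.groupdict())
```

The callback should stay short, because the reading loop is paused while the handler runs.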
