I'm developing a logger daemon for Squid that stores the logs in a MongoDB database, but I'm seeing very high CPU utilization. How can I optimize this code?
from sys import stdin
from pymongo import Connection

connection = Connection()
db = connection.squid
logs = db.logs

buffer = []

# Field-name strings bound once up front, so the dict keys
# are not rebuilt on every line.
a = 'timestamp'
b = 'resp_time'
c = 'src_ip'
d = 'cache_status'
e = 'reply_size'
f = 'req_method'
g = 'req_url'
h = 'username'
i = 'dst_ip'
j = 'mime_type'
L = 'L'

while True:
    l = stdin.readline()
    if not l:  # EOF: readline() returns an empty string
        break
    if l[0] == L:
        l = l[1:].split()
        buffer.append({
            a: float(l[0]),
            b: int(l[1]),
            c: l[2],
            d: l[3],
            e: int(l[4]),
            f: l[5],
            g: l[6],
            h: l[7],
            i: l[8],
            j: l[9]
        })
    # Flush to MongoDB in batches of 1000 documents.
    if len(buffer) == 1000:
        logs.insert(buffer)
        buffer = []

# Write out any partial batch before closing the connection.
if buffer:
    logs.insert(buffer)
connection.disconnect()
This might be a better question for a Python profiler. There are a few built-in Python profiling modules, such as cProfile; you can read more about it in the profile/cProfile section of the standard-library documentation.
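For example, one quick way to locate the hot spot is to run the daemon under cProfile against a captured log sample (the script and file names here are hypothetical):

python -m cProfile -s cumulative squid_logger.py < sample.log

Or profile just the read/parse/insert loop programmatically, assuming you wrap it in a function:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()
main_loop()  # hypothetical wrapper around the while-loop above
profiler.disable()
pstats.Stats(profiler).sort_stats('cumulative').print_stats(10)  # top 10 entries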
I'd suspect it might actually be readline() causing the CPU utilization. Try running the same code with readline() replaced by a constant line you provide, and try running it with the database inserts commented out. Establish which one of these is the culprit.
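A minimal sketch of that isolation test; the sample log line below is made up, so substitute a real one from your Squid access log:

# Variant 1: take readline() out of the picture by parsing a constant line.
sample = 'L1302868278.101 250 10.0.0.1 TCP_MISS 1024 GET http://example.com/ bob 192.0.2.1 text/html\n'
for _ in range(1000000):
    l = sample  # stand-in for stdin.readline()
    if l[0] == 'L':
        fields = l[1:].split()  # same parsing work as the real loop

# Variant 2: keep stdin.readline() but comment out logs.insert(buffer).
# Whichever variant still pegs the CPU tells you where the time is going.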
The CPU usage comes from that active while True loop. How many lines per minute are you processing? Move the
if len(buffer) == 1000:
    logs.insert(buffer)
    buffer = []
check so that it runs only right after the buffer.append (see the sketch below).
I can tell you more once you report how many insertions per minute you are getting so far.
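A minimal sketch of that rearrangement, reusing the field names from the question, so the length check runs only on iterations that actually appended a document rather than on every line read:

while True:
    l = stdin.readline()
    if not l:  # EOF
        break
    if l[0] == L:
        parts = l[1:].split()
        buffer.append({
            a: float(parts[0]), b: int(parts[1]), c: parts[2],
            d: parts[3], e: int(parts[4]), f: parts[5],
            g: parts[6], h: parts[7], i: parts[8], j: parts[9],
        })
        # Checked only right after an append, not on every iteration.
        if len(buffer) == 1000:
            logs.insert(buffer)
            buffer = []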