Python: working with files as they are written

开发者 https://www.devze.com 2023-03-05 18:09 Source: Internet
So I'm trying to create a little script to deal with some logs. I'm just learning python, but know about loops and such in other languages. It seems that I don't understand quite how the loops work in python.

I have a raw log from which I'm trying to isolate just the external IP addresses. An example line:

05/09/2011 17:00:18 192.168.111.26 192.168.111.255 Broadcast packet dropped udp/netbios-ns 0 0 X0 0 0 N/A

And here's the code I have so far:

import os,glob,fileinput,re

def parseips():
    f = open("126logs.txt",'rb')
    r = open("rawips.txt",'r+',os.O_NONBLOCK)

    for line in f:
        rf = open("rawips.txt",'r+',os.O_NONBLOCK)
        ip = line.split()[3]
        res=re.search('192.168.',ip)
        if not res:
            rf.flush()
            for line2 in rf:
                if ip not in line2:
                    r.write(ip+'\n')
                    print 'else write'
                else:
                    print "no"
    f.close()
    r.close()
    rf.close()  

parseips()

I have it parsing out the external IPs just fine. But, thinking like a ninja, I thought: how cool would it be to handle dupes? The idea was that I can check the file that the IPs are being written to against the current line for a match, and if there is a match, don't write. But this produces many times more dupes than before :) I could probably use something else, but I'm liking python and it makes me look busy.

Thanks for any insider info.


DISCLAIMER: Since you are new to python, I am going to try to show off a little, so you can look up some interesting "python things".

I'm going to print all the external IPs to the console:

def parseips():
    with open("126logs.txt", 'r') as f:
        for line in f:
            ip = line.split()[3]
            # external means it does not start with the private prefix
            if not ip.startswith('192.168.'):
                print(ip)

You might also want to look into:

with open("126logs.txt", 'r') as f:
    IPs = [line.split()[3] for line in f if not line.split()[3].startswith('192.168.')]

Hope this helps. Enjoy Python!


Something along the lines of this might do the trick:

import os

def parseips():
    prefix = '192.168.'
    # Preload the IPs already recorded in the output file, if it exists.
    if os.path.exists('rawips.txt'):
        with open('rawips.txt', 'rt') as f:
            seen_ips = set(ip.strip() for ip in f)
    else:
        seen_ips = set()

    with open('126logs.txt', 'rt') as infile, open('rawips.txt', 'at') as outfile:
        for line in infile:
            ip = line.split()[3]
            # External IPs are the ones that do not start with the private prefix.
            if not ip.startswith(prefix) and ip not in seen_ips:
                seen_ips.add(ip)
                outfile.write(ip + '\n')

parseips()


Rather than looping through the file you're writing, you might try just using a set. It might consume more memory, but your code will be much nicer, so it's probably worth it unless you run into an actual memory constraint.
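
The reason the loop-over-the-output-file idea multiplies dupes is that it writes the IP once for every existing line that doesn't match it. A minimal sketch of the set approach instead, assuming (as in the question's code) that the IP of interest is the fourth whitespace-separated field and that "external" just means it doesn't start with 192.168.; the function name `unique_external_ips` and the second and third sample lines are made up for illustration:

```python
def unique_external_ips(lines, prefix='192.168.'):
    """Return external IPs from log lines, each once, in first-seen order."""
    seen = set()      # IPs already collected
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        ip = fields[3]
        if not ip.startswith(prefix) and ip not in seen:
            seen.add(ip)
            result.append(ip)
    return result


sample = [
    "05/09/2011 17:00:18 192.168.111.26 192.168.111.255 Broadcast packet dropped udp/netbios-ns 0 0 X0 0 0 N/A",
    "05/09/2011 17:00:19 192.168.111.26 8.8.8.8 made-up line for illustration",
    "05/09/2011 17:00:20 192.168.111.26 8.8.8.8 made-up line for illustration",
]
print(unique_external_ips(sample))  # prints ['8.8.8.8']
```

With a real log you can pass the open file object as `lines`, since iterating over a file yields its lines one at a time.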


Assuming you're just trying to avoid duplicate external IPs, consider creating an additional data structure in order to keep track of which IPs have already been written. Since they're in string format, a dictionary would be good for this.

externalIPDict = {}
# code to detect external IPs goes here; when you get one:
if externalIPString in externalIPDict:
    pass  # do nothing, you found a dupe
else:
    externalIPDict[externalIPString] = 1  # key on the IP string, not the dict itself
    # your code to add the external IP to your file goes here
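
To make that concrete, here is a rough sketch with the two placeholder comments filled in. It assumes, as in the question, that the IP is the fourth field and that anything not starting with 192.168. counts as external. `write_unique_ips` and its `out` parameter are hypothetical names; `out` can be anything with a `write` method, such as an open file.

```python
import io


def write_unique_ips(lines, out, prefix='192.168.'):
    """Write each external IP to `out` the first time it is seen."""
    externalIPDict = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # not enough fields to contain an IP
        externalIPString = fields[3]
        if externalIPString.startswith(prefix):
            continue  # internal address, ignore it
        if externalIPString in externalIPDict:
            pass  # do nothing, you found a dupe
        else:
            externalIPDict[externalIPString] = 1
            out.write(externalIPString + '\n')
    return externalIPDict


# io.StringIO stands in for the output file so the sketch runs without touching disk
buf = io.StringIO()
write_unique_ips(["d t src 1.2.3.4", "d t src 1.2.3.4", "d t src 192.168.0.9"], buf)
print(buf.getvalue())
```

Because the IP is written at the moment it is first seen, the output file never has to be read back while it is still being written.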