Parsing data from a live website in Python: enumerate problem!

开发者 https://www.devze.com 2023-03-05 14:15 Source: web

The following script is supposed to fetch a specific line number from a live website and parse it. It works for about 30 loops, but then enumerate(f) seems to stop working correctly... the "i" in the for loop stops around line 130 instead of somewhere past 200. Could this be due to the website I'm trying to fetch data from, or something else? Thanks!!

import sgmllib

class MyParser(sgmllib.SGMLParser):
    "A simple parser class."

    def parse(self, s):
        "Parse the given string 's'."
        self.feed(s)
        self.close()

    def __init__(self, verbose=0):
        "Initialise an object, passing 'verbose' to the superclass."
        sgmllib.SGMLParser.__init__(self, verbose)
        self.divs = []
        self.descriptions = []
        self.inside_div_element = 0

    def start_div(self, attributes):
        "Record a div's 'id' and note that we are inside a div."
        for name, value in attributes:
            if name == "id":
                self.divs.append(value)
                self.inside_div_element = 1

    def end_div(self):
        "Record the end of a div."
        self.inside_div_element = 0

    def handle_data(self, data):
        "Collect the textual 'data' found inside a div."
        if self.inside_div_element:
            self.descriptions.append(data)

    def get_div(self):
        "Return the list of div ids."
        return self.divs

    def get_descriptions(self, check):
        "Return the list of descriptions."
        if check == 1:
            self.descriptions.pop(0)
        return self.descriptions

    def rm_descriptions(self):
        "Remove the last description."
        self.descriptions.pop()

import urllib
import linecache
import sgmllib


tempLine = ""
tempStr = " "
tempStr2 = ""
myparser = MyParser()
count = 0
user = ['']
oldUser = ['none']
oldoldUser = [' ']
array = [" ", 0]
index = 0
found = 0
k = 0
j = 0
posIndex = 0
a = 0
firstCheck = 0
fCheck = 0

while a < 1000:
    print a
    f = urllib.urlopen("SITE")
    a = a + 1

    for i, line in enumerate(f):
        if i == 187:
            print i
            tempLine = line
            print line

            myparser.parse(line)
            if fCheck == 1:
                result = oldUser[0] is oldUser[1]
                u1 = oldUser[0]
                u2 = oldUser[1]
                tempStr = oldUser[1]
                if u1 == u2:
                    result = 1
            else:
                result = user is oldUser
            fCheck = 1

            user = myparser.get_descriptions(firstCheck)
            tempStr = user[0]
            firstCheck = 1

            if result:
                array[index + 1] = array[index + 1] + 0
            else:
                j = 0
                for z in array:
                    k = j + 2
                    tempStr2 = user[0]
                    if k < len(array) and tempStr2 == array[k]:
                        array[j + 3] = array[j + 3] + 1
                        index = j + 2
                        found = 1
                        break
                    j = j + 1
                if found == 0:
                    array.append(tempStr)
                    array.append(0)

            oldUser = user
            found = 0
            print array

        elif i > 200:
            print "HERE"
            break

    print array
    f.close()
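Note that sgmllib exists only in Python 2 (it was removed in Python 3); the nearest stdlib replacement is html.parser.HTMLParser. A minimal sketch of the same div-collecting parser under Python 3 (the class name and sample HTML here are illustrative, not from the post):

```python
# Python 3 sketch: html.parser.HTMLParser stands in for sgmllib.
from html.parser import HTMLParser

class DivParser(HTMLParser):
    """Collect div ids and the text found inside divs."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.divs = []
        self.descriptions = []
        self.inside_div = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs, like sgmllib's attributes.
        if tag == "div":
            for name, value in attrs:
                if name == "id":
                    self.divs.append(value)
                    self.inside_div = True

    def handle_endtag(self, tag):
        if tag == "div":
            self.inside_div = False

    def handle_data(self, data):
        if self.inside_div:
            self.descriptions.append(data)

p = DivParser()
p.feed('<div id="user42">hello</div>')
print(p.divs)          # ['user42']
print(p.descriptions)  # ['hello']
```

Unlike sgmllib's start_div/end_div naming convention, HTMLParser dispatches every tag through handle_starttag/handle_endtag, so the tag name must be checked explicitly.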


Perhaps the number of lines on that web page is smaller than you think? What does this give you?:

print max(i for i, _ in enumerate(urllib.urlopen("SITE")))
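Two caveats about that one-liner: it makes a second request (the page may change between fetches), and max() raises ValueError on an empty response. A safer sketch counts the lines of a single fetch; io.StringIO stands in for the urlopen response here so the example runs without a network, since the response object is iterable the same way:

```python
import io

def count_lines(f):
    # sum() over a generator is safe on an empty stream,
    # unlike max() over an empty enumerate.
    return sum(1 for _ in f)

page = io.StringIO("line 0\nline 1\nline 2\n")
print(count_lines(page))                 # 3
print(count_lines(io.StringIO("")))      # 0
```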


Aside: Your indentation is stuffed after the while a < 1000: line. Excessive empty lines and one-letter names don't assist the understanding of your code.

enumerate is not broken. Instead of such speculation, inspect your data. Suggestion: replace

for i, line in enumerate(f):

by

lines = list(f)
print "=== a=%d linecount=%d ===" % (a, len(lines))
for i, line in enumerate(lines):
    print "   a=%d i=%d line=%r" % (a, i, line)

Examine the output carefully.
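The suggested diagnostic can be packaged as a small function; running it on a known input shows that enumerate itself never skips or stops early (Python 3 print syntax, with io.StringIO standing in for the response):

```python
import io

def dump_lines(f, a=0):
    lines = list(f)  # drain the stream once so we can count and iterate it
    print("=== a=%d linecount=%d ===" % (a, len(lines)))
    for i, line in enumerate(lines):
        print("   a=%d i=%d line=%r" % (a, i, line))
    return len(lines)

# enumerate yields exactly one index per line, 0 through len(lines) - 1:
n = dump_lines(io.StringIO("first\nsecond\nthird\n"))
print(n)  # 3
```

If linecount drops below 188 on later iterations, the site is returning a shorter page (or an error page), which would explain why i stops early.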

