List problem with extracting data from Twitter XML page

Developer https://www.devze.com 2023-03-21 19:50 Source: Internet
With my function I can extract usernames from a Twitter XML search page for a friend finder app I am building as a project. The problem, though, is that when I grab the usernames and put them into a list, something strange happens. Instead of having each username as a separate element within one list, each username ends up in its own list.

So I instead get 20 or so lists. Here is an example of what my code produces: list = ["twitter.com/username"], ["twitter.com/username1"], ["twitter.com/username2"]

So you see every single username is its own list. Instead of having one list with three values I have three lists with one value each in them. This is an absolute nightmare to iterate through. How can I make it so I have one list with three elements?

Code is here:

def get_names(search_term = raw_input("What term do you want to search for?")):
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(''.join(doc))
    data = soup.findAll("uri")
    for uri in soup.findAll('uri'):
        data = []
        uri = str(uri.extract())
        data.append(uri[5:-6] 
        print data


You're making a new list, called data, for each URI. If you move the data = [] line out of the for uri in soup.findAll('uri'): loop, you should end up with one list instead of a list of lists.
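To make the difference concrete, here is a minimal sketch that leaves out the network call and BeautifulStoneSoup entirely. The sample strings and both function names are made up for illustration; only the placement of data = [] matters.

```python
# Hypothetical sample input standing in for str(uri.extract()) results
uris = ["<uri>twitter.com/username</uri>", "<uri>twitter.com/username1</uri>"]

def collect_reset_each_time(items):
    """Mimics the question: data is re-created on every pass of the loop."""
    results = []
    for item in items:
        data = []                 # a brand-new list each iteration
        data.append(item[5:-6])   # strip "<uri>" (5 chars) and "</uri>" (6 chars)
        results.append(data)      # so every username sits in its own list
    return results

def collect_once(items):
    """Mimics the fix: data is created once, before the loop."""
    data = []
    for item in items:
        data.append(item[5:-6])
    return data

print(collect_reset_each_time(uris))
print(collect_once(uris))
```

The first call gives one single-element list per username, and the second gives one flat list.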

In addition, you've got some other problems. There is a syntax error on your next-to-last line: you're missing a close-parenthesis at the end of the line. You've also got duplicate lines. Try removing the first data = [] line, as well as the data = soup.findAll("uri") line, since you're just doing findAll again in the for loop. Finally, you shouldn't put raw_input in the function signature, because a default argument is evaluated when you define the function, not when you call it.
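The default-argument point is easy to check for yourself. The sketch below uses a hypothetical fake_input function in place of raw_input so it runs without prompting; eager and lazy are illustrative names too.

```python
calls = []

def fake_input():
    # Hypothetical stand-in for raw_input: records each time it is invoked
    calls.append("called")
    return "python"

def eager(term=fake_input()):   # fake_input() runs right here, once, at definition
    return term

def lazy(term=None):
    if term is None:
        term = fake_input()     # runs on every call that omits term
    return term

calls_after_definition = len(calls)   # eager's default has already been computed

lazy()
lazy()
calls_after_lazy = len(calls)         # two more invocations, one per call

eager()
eager()
calls_after_eager = len(calls)        # unchanged: the stored default is reused
```

With the real raw_input in the signature, the prompt would appear once when the module is loaded, and every later call would silently reuse that first answer.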

Try this:

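import urllib
from BeautifulSoup import BeautifulStoneSoup  # assuming BeautifulSoup 3

def get_names():
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += raw_input("What term do you want to search for?")
    # read() returns the whole response body as a single string
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(doc)
    # Slice off the surrounding <uri> and </uri> tags from each element
    data = [str(uri.extract())[5:-6] for uri in soup.findAll('uri')]
    return data

names = get_names()
print(names)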

Edit: You also don't need ''.join(doc): read() returns a single string, not a sequence of strings. data can be assembled with a list comprehension.
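One more thought on the [5:-6] slice: it only works while the tag is exactly <uri> with no attributes. As an alternative sketch, and not what the code above does, the standard library's xml.etree.ElementTree can return the element text directly. The SAMPLE_FEED below is a made-up fragment shaped like an Atom feed, and get_uris is a hypothetical helper name. I'm assuming the real feed uses the standard Atom namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data in the general shape of an Atom feed
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><author><name>user one</name><uri>http://twitter.com/username</uri></author></entry>
  <entry><author><name>user two</name><uri>http://twitter.com/username1</uri></author></entry>
</feed>"""

# Assumption: elements live in the standard Atom namespace
ATOM_NS = "{http://www.w3.org/2005/Atom}"

def get_uris(xml_text):
    """Return the text of every <uri> element as one flat list."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter(ATOM_NS + "uri")]

names = get_uris(SAMPLE_FEED)
print(names)
```

Since the list comprehension builds a single list, the result is again one list with one string per username.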


The problem is you're sort of all over the place in your assignments to data; I'd suggest changing that code to:

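def get_names(search_term=None):
    # Prompt at call time; a raw_input() default would run once, at definition
    if search_term is None:
        search_term = raw_input("What term do you want to search for?")
    search_page = "http://search.twitter.com/search.atom?q="
    search_page += search_term
    data = []  # one list, created once before the loop
    doc = urllib.urlopen(search_page).read()
    soup = BeautifulStoneSoup(doc)
    for uri in soup.findAll('uri'):
        uri = str(uri.extract())
        data.append(uri[5:-6])  # drop the <uri> and </uri> tags
    print data
    return data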

(untested since I don't know what BeautifulStoneSoup is referring to)

HTH

Pacific
