Problem with Python CSV putting each letter in new field

Developer https://www.devze.com 2023-03-16 16:05 Source: Internet

I'm trying to put a list of URLs into a csv file that I'm scraping from a webpage using urllib2 and BeautifulSoup. I have tried writing the links to a csv file as unicode and also converted to utf-8. In both cases, each letter is inserted into a new field.

Here's my code (I've tried it at least these two ways):

f = open('filename','wb')
w = csv.writer(f,delimiter=',')
for link in links:
    w.writerow(link['href'])

And:

f = open('filename','wb')
w = csv.writer(f,delimiter=',')
for link in links:
    w.writerow(link['href'].encode('utf-8'))

links is a list that looks like this:

[<a href="#Flyout1" accesskey="2" class="quicklinks" tabindex="1" title="Skip to content">Quick Links: Skip to main page content</a>, <a href="#search" class="quicklinks" tabindex="1" title="Skip to search">Skip to Search</a>, <a href="#News" class="quicklinks" tabindex="1" title="Skip to Section table of contents">Skip to Section Content Menu</a>, <a href="#footer" class="quicklinks" tabindex="1" title="Skip to site options">Skip to Common Links</a>, <a href="http://www.hhs.gov"><img src="/ucm/groups/fdagov-public/@system/documents/system/img_fdagov_hhs_gov.png" alt="www.hhs.gov link" style="width:112px; height:18px;" border="0" /></a>]

Not all the links have an 'href' key but I check for that in code not shown here. In both cases, the correct strings are written to the csv file, but each letter is in a new field.

Any thoughts?


From the docs: "A row must be a sequence of strings or numbers ..." You are passing a single string, not a sequence of strings, so it treats each letter as an item. Put your string in a list.

So change w.writerow(link['href']) to w.writerow([link['href']]).
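
To make the fix concrete, here is a minimal sketch of the corrected loop. It is written for Python 3 rather than the Python 2 used in the question, and it makes several assumptions: the `links` list is a hypothetical stand-in built from plain dicts instead of BeautifulSoup tags, `write_hrefs` is a made-up helper name, and an in-memory buffer replaces the real file. With a real file in Python 3 you would open it as `open('filename', 'w', newline='')`.

```python
import csv
import io

# Hypothetical stand-ins for the scraped tags; plain dicts keep the sketch self-contained.
links = [
    {"href": "#Flyout1"},
    {"title": "no href here"},
    {"href": "http://www.hhs.gov"},
]


def write_hrefs(links, out):
    """Write one href per row, skipping entries that lack an 'href' key."""
    writer = csv.writer(out, delimiter=",")
    for link in links:
        if "href" in link:
            # The one-element list makes the whole URL a single field.
            writer.writerow([link["href"]])


buf = io.StringIO()  # stands in for the output file
write_hrefs(links, buf)
csv_text = buf.getvalue()
print(csv_text)
```

Each URL now lands in one field on its own row, and the entry without an 'href' key is skipped.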

Note: A csv file with a single column looks exactly like a flat text file. Maybe you don't need csv.
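
If a flat text file is enough, something like the following sketch would do, with one URL per line and no csv module involved. The `hrefs` list and the `write_lines` name are illustrative assumptions, and an in-memory buffer stands in for the real file (Python 3).

```python
import io

# Hypothetical sample data; in the question these would come from link['href'].
hrefs = ["#Flyout1", "#search", "http://www.hhs.gov"]


def write_lines(hrefs, out):
    """Write each href on its own line, with no CSV quoting or escaping."""
    for href in hrefs:
        out.write(href + "\n")


buf = io.StringIO()  # stands in for open('filename', 'w')
write_lines(hrefs, buf)
text = buf.getvalue()
print(text)
```

One trade-off: a URL that contains a comma or a double quote is written as-is here, whereas csv.writer would quote it, so keep csv if the file will later be read back as CSV.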


I think by "each letter inserted into a new field" you mean something like this, right?

h,t,t,p,:,/,/,w,w,w,.,g,o,o,g,l,e,.,c,o,m

If so, then writerow() is iterating over the characters in your string, and interpreting those as distinct columns. Try using writerow([link['href']]) instead.

Edit: Looks like @Steven Rumbalski beat me to the punch on this!


According to the docs, writerow() takes an iterable object and, iterating over it, writes out the CSV representation of it. Your problem is that a string is itself an iterable object. If I have:

mystring = 'foo'

Python will let me iterate over it like so:

for c in mystring:
    print c

And I'll get:

f
o
o

That's a handy feature, but it's working against you in this case.
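
A small sketch of what this means for the csv module, assuming Python 3 and an in-memory buffer (`row_as_csv` is a hypothetical helper, not part of the csv module): passing a bare string produces the same row as passing a list of its characters, while a one-element list keeps the string whole.

```python
import csv
import io


def row_as_csv(row):
    """Return the text that csv.writer emits for a single writerow() call."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


from_string = row_as_csv("foo")           # iterated character by character
from_chars = row_as_csv(["f", "o", "o"])  # three one-letter fields
from_list = row_as_csv(["foo"])           # a single field

print(repr(from_string))
print(repr(from_chars))
print(repr(from_list))
```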

You don't want writerow() to iterate over the string; you want it to iterate over a list of strings, so that commas separate the strings rather than the characters. To do that, wrap the string in a list like so:

w.writerow([link['href']])