IOErrors for regular expression in Python


开发者 https://www.devze.com 2023-03-19 19:39 Source: Internet


Related topics: python, regex

I've been working on a program, but because of Mac OS X's difficulties in updating Python, I've been writing it in both 3.2 and 2.6. Both versions of the script give me errors, though different ones. Here's the script:

This is the 3.2 version:

import sys
import os 
import re 
import urllib 
import urllib.request

## opens the URL as a bytes object
urlfilebytes = urllib.request.urlopen('http://www.reddit.com/r/fffffffuuuuuuuuuuuu')
## saves the bytes object to a string
urlfile = urlfilebytes.read().decode('utf-8')
## saves list of matches for pattern
matches = re.findall(r'[http://imgur.com/][\s]+"', open(urlfile).read())

This returns the error: TypeError: invalid file:

The 2.6 version on the other hand:

import sys
import os
import re
import urllib
urlfilebytes = urllib.urlopen('http://www.reddit.com/r/fffffffuuuuuuuuuuuu')
urlfile = urlfilebytes.read().decode('utf-8')
matches = re.findall(r'[http://imgur.com/][\s]+"', open(urlfile).read())

This returns the error:

IOError: [Errno 63] File name too long: u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"><html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en" ><head><title>FFFFFFFUUUUUUUUUUUU-</title><meta name="keywords" content=" r **ETC ETC ETC**

I'm kind of stumped here, can anyone help me out?

You call open() on the string, which tries to open a file whose name is whatever the string contains, in this case <!DOCTYPE.... That is neither a valid filename nor an existing file. If you replace open(urlfile).read() with just urlfile, it should work.
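
A minimal sketch of that fix, written for Python 3. To keep it runnable offline, a short made-up HTML snippet (sample_html, hypothetical) stands in for the decoded page that urlopen(...).read().decode('utf-8') would return. The pattern is the question's own, left unchanged, so the only point here is that re.findall receives the text itself rather than a file opened under that name.

```python
import re

# Hypothetical stand-in for the decoded page; in the real script this would be
# urllib.request.urlopen(url).read().decode('utf-8').
sample_html = '<a href="http://imgur.com/abc123">rage comic</a>'

# Pattern copied from the question, deliberately left as it was.
pattern = r'[http://imgur.com/][\s]+"'

# Pass the string itself to re.findall; no open(...).read() around it.
matches = re.findall(pattern, sample_html)

# For contrast: open() treats the page text as a file name and fails.
try:
    open(sample_html)
    open_failed = False
except OSError:
    open_failed = True
```

With this change the call runs without an I/O error, although the pattern itself still finds nothing useful in the sample.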

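Also, you might want to escape or remove the []s in the regexp, or it won't do what you want: square brackets define a character class, so [http://imgur.com/] matches one character from that set rather than the literal URL.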
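
A small sketch of why the brackets matter, using made-up sample strings. Inside [...] the characters are treated as a set, so the question's pattern asks for one character out of that set, then whitespace, then a double quote.

```python
import re

# Hypothetical sample text, not taken from the real page.
text = 'see http://imgur.com/abc123" now'

# The bracketed part matches ONE character from the set
# h, t, p, :, /, i, m, g, u, r, ., c, o.
one_char = re.findall(r'[http://imgur.com/]', 'hello')

# The whole pattern from the question therefore matches things like 'm "':
# one character from the set, then whitespace, then a double quote.
odd_match = re.findall(r'[http://imgur.com/][\s]+"', 'com "')

# Without the brackets the pattern is read as a literal sequence
# (the dot is escaped here so it only matches a real ".").
literal = re.findall(r'http://imgur\.com/', text)
```

Here one_char picks out only the letters of 'hello' that belong to the set, odd_match shows what the original pattern actually accepts, and literal finds the URL prefix in the sample text.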

Are you sure you don't want to just do this?

re.findall(r'[http://imgur.com/][\s]+"', urlfile)

And I bet the regexp doesn't do what you think it does. Perhaps you need to ask another question about that.

Perhaps something like this:

re.findall(r'(http://imgur.com/\S+)"', urlfile)

or this:

re.findall(r'http://imgur.com/(\S+)"', urlfile)
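
The two patterns differ only in where the capturing group sits. When a pattern has one group, re.findall returns the group's text rather than the whole match, so the first form yields full URLs and the second only the part after the slash. A short sketch on a made-up snippet (sample_html is hypothetical, not the real page):

```python
import re

# Hypothetical stand-in for the decoded page contents.
sample_html = (
    '<a href="http://imgur.com/abc123">first</a> '
    '<a href="http://imgur.com/xyz789">second</a>'
)

# Group around the whole URL: findall returns the full links.
full_urls = re.findall(r'(http://imgur.com/\S+)"', sample_html)

# Group around the tail only: findall returns just the image ids.
ids_only = re.findall(r'http://imgur.com/(\S+)"', sample_html)
```

Note that \S+ is greedy and also matches quote characters, so on markup with several quotes and no whitespace in between it can capture more than one attribute; a tighter class such as [^"\s]+ is one possible alternative.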
