Python: replace urls with title names from a string

Developer https://www.devze.com 2022-12-29 09:44 Source: Internet

I would like to remove URLs from a string and replace them with the titles of the pages they point to.

For example:

mystring = "Ah I like this site: http://www.stackoverflow.com. Also I must say I like http://www.digg.com"

sanitize(mystring) # it becomes "Ah I like this site: Stack Overflow. Also I must say I like Digg - The Latest News Headlines, Videos and Images"

To replace a URL with its title, I have written this snippet:

import urllib          # Python 2 standard library
import BeautifulSoup   # BeautifulSoup 3

#get_title: string -> string
def get_title(url):
    """Returns the title of the input URL"""

    output = BeautifulSoup.BeautifulSoup(urllib.urlopen(url))
    return output.title.string

What I still need is a way to find the URLs inside a string and replace each one with the result of get_title.
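
For readers on Python 3, where urllib.urlopen and the old BeautifulSoup module are not available, get_title could be sketched roughly as below. This is only an illustration: it swaps BeautifulSoup for the standard library's html.parser, and the names TitleParser and title_from_html are made up for this example, not part of the original snippet.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def title_from_html(html):
    """Return the title found in an HTML string, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    title = "".join(parser.parts).strip()
    return title or None


def get_title(url):
    """Fetch url and return its title. Performs a network request."""
    with urlopen(url, timeout=10) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return title_from_html(html)
```

Splitting the parsing out of the download step means the title extraction can be tried on a plain HTML string without touching the network.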

Here is a question with information on validating a URL in Python: How do you validate a URL with a regular expression in Python?

The urlparse module is probably your best bet. You will still have to decide what constitutes a valid URL in the context of your application.

To check the string for URLs, iterate over each word in the string, check whether it is a valid URL, and if so replace it with the title.

Example code (you will need to write valid_url):

def sanitize(mystring):
  for word in mystring.split(" "):
    if valid_url(word):
      mystring = mystring.replace(word, get_title(word))
  return mystring
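
One possible valid_url, sketched for Python 3 (where urlparse lives in urllib.parse), might look like the following. The rule used here, "an http or https scheme plus a host", is an assumption; your application may need something stricter. The sanitize variant below also differs from the one above in two ways that are my own additions: it takes the title lookup as a parameter, and it strips trailing sentence punctuation so that "http://www.stackoverflow.com." is not looked up with the final period attached.

```python
from urllib.parse import urlparse


def valid_url(word):
    """Return True if word looks like an absolute http(s) URL."""
    try:
        parts = urlparse(word)
    except ValueError:
        # urlparse can reject malformed input such as a broken IPv6 host.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def sanitize(mystring, get_title):
    """Replace each URL-like word with get_title(url), keeping punctuation."""
    words = []
    for word in mystring.split(" "):
        # Separate trailing punctuation that belongs to the sentence.
        core = word.rstrip(".,;:!?")
        if valid_url(core):
            word = get_title(core) + word[len(core):]
        words.append(word)
    return " ".join(words)
```

Passing get_title in as an argument makes it easy to try the function with a stand-in, for example a dictionary's get method, before wiring up real network requests.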

You can probably solve this using regular expressions and substitution (re.sub accepts a function, which will be passed the Match object for each occurrence and returns the string to replace it with):

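import re
url = re.compile(r"https?://\S+")  # rough pattern; may include trailing punctuation
# sub passes a Match object, so hand get_title the matched text
text = url.sub(lambda m: get_title(m.group(0)), text)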

The difficult part is writing a regular expression that matches a URL exactly, no more and no less.
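
As a rough illustration of the re.sub approach in Python 3, the sketch below uses a pattern that refuses to end a match on common sentence punctuation. The pattern is an assumption that works for the example string, not a general URL grammar, and replace_urls and URL_PATTERN are names invented here. The title lookup is passed in as get_title so the substitution can be tried without network access.

```python
import re

# Scheme, then non-space characters, ending on a character that is not
# typical sentence punctuation. Deliberately simple, not a full URL grammar.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']*[^\s<>\"'.,;:!?)]")


def replace_urls(text, get_title):
    """Replace every URL in text with get_title(url); keep the URL on failure."""

    def substitute(match):
        url = match.group(0)  # re.sub passes a Match object, not a string
        try:
            # Fall back to the URL itself when no title is found.
            return get_title(url) or url
        except Exception:
            # Broad on purpose for this sketch: any lookup error leaves the URL as is.
            return url

    return URL_PATTERN.sub(substitute, text)
```

Leaving the original URL in place when the lookup fails is a design choice for this sketch; raising the error instead may be preferable if a missing title should stop processing.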
