python search from tag

Developer https://www.devze.com 2023-01-04 21:52 Source: Internet
I need help with Python programming: I need a command that can find all the words between tags in a text file. For example, the text file contains <concept> food </concept>. I need to find all the words between <concept> and </concept> and display them. Can anybody help, please?


  1. Load the text file into a string.
  2. Search the string for the first occurrence of <concept> using pos1 = s.find('<concept>')
  3. Search for </concept> using pos2 = s.find('</concept>', pos1)

The words you seek are then s[pos1+len('<concept>'):pos2]
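The steps above find only the first <concept> pair. Here is a minimal sketch that repeats the search, starting just after each closing tag, to collect every pair. The function name extract_between and the sample text are illustrative assumptions, not part of the original answer, and nested tags are not handled:

```python
def extract_between(s, start_tag="<concept>", end_tag="</concept>"):
    """Return the stripped text between every start_tag/end_tag pair in s,
    using plain str.find (no regular expressions). Hypothetical helper."""
    results = []
    pos = 0
    while True:
        pos1 = s.find(start_tag, pos)
        if pos1 == -1:
            break  # no more opening tags
        start = pos1 + len(start_tag)
        pos2 = s.find(end_tag, start)
        if pos2 == -1:
            break  # opening tag without a matching closing tag: stop
        results.append(s[start:pos2].strip())
        pos = pos2 + len(end_tag)  # resume searching after this closing tag
    return results


text = "We eat <concept> food </concept> and drink <concept>water</concept>."
print(extract_between(text))  # ['food', 'water']
```

To search a file, pass its contents, e.g. extract_between(open('myfile.txt').read()), where myfile.txt is a placeholder name.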


BeautifulSoup is a good library for traversing HTML/XML. With the current version (the bs4 package) and Python 3:

from bs4 import BeautifulSoup

with open('myfile.xml') as f:
    soup = BeautifulSoup(f.read(), 'html.parser')
for t in soup.find_all('concept'):
    print(t.string)


Have a look at regular expressions. http://docs.python.org/library/re.html

If, for example, you want the text inside <i> tags, try:

import re
text = "text to search. <i>this</i> is the word and also <i>that</i> end"
re.findall(r"<i>(.*?)</i>", text)  # returns ['this', 'that']

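Here's a short explanation of how findall works: it scans the given string for all non-overlapping matches of the given regular expression and, because the pattern contains one group, returns a list of what that group captured. The regular expression is <i>(.*?)</i>:

  • <i> matches the opening tag <i> literally
  • (.*?) is a capturing group that matches as few characters as possible (non-greedy), so it stops at the first </i> rather than the last one
  • </i> matches the closing tag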
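To see why the ? matters, here is a small comparison of the non-greedy pattern with its greedy counterpart on the same example string:

```python
import re

text = "text to search. <i>this</i> is the word and also <i>that</i> end"

# Non-greedy: .*? stops at the first </i> after each <i>, giving one match per tag pair.
lazy = re.findall(r"<i>(.*?)</i>", text)

# Greedy: .* runs as far as it can, so the single match spans from the
# first <i> to the last </i> and swallows everything in between.
greedy = re.findall(r"<i>(.*)</i>", text)

print(lazy)    # ['this', 'that']
print(greedy)  # ['this</i> is the word and also <i>that']
```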

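Note that the above solution does not match something like

<i> here's a line
break </i>

because . does not match a newline by default. Since you only wanted to extract words, that is probably fine.

However, it is also possible to match across lines by passing the re.DOTALL flag: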

re.findall(r"<i>(.*?)</i>", text, re.DOTALL)
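Applied to the original question, here is a minimal sketch that extracts the contents of every <concept> tag, including tag pairs that span lines. The helper names find_concepts and find_concepts_in_file are hypothetical, the file path is a placeholder, and whitespace around each match is stripped:

```python
import re

# re.DOTALL lets "." match newlines, so a tag pair split across lines still matches.
CONCEPT_RE = re.compile(r"<concept>(.*?)</concept>", re.DOTALL)


def find_concepts(text):
    """Return the stripped contents of every <concept>...</concept> pair in text."""
    return [m.strip() for m in CONCEPT_RE.findall(text)]


def find_concepts_in_file(path):
    """Read a text file (path is a placeholder) and extract its concepts."""
    with open(path, encoding="utf-8") as f:
        return find_concepts(f.read())


sample = "<concept> food </concept> and <concept>line\nbreak</concept>"
print(find_concepts(sample))  # ['food', 'line\nbreak']
```

Usage might look like: for word in find_concepts_in_file('concepts.txt'): print(word)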