开发者

extract a sentence using python

开发者 https://www.devze.com 2023-01-21 19:49 出处:网络
开发者_如何学GoI would like to extract the exact sentence if a particular word is present in that sentence. Could anyone let me know how to do it with python. I used concordance() but it only prints l

开发者_如何学GoI would like to extract the exact sentence if a particular word is present in that sentence. Could anyone let me know how to do it with python. I used concordance() but it only prints lines where the word matches.


Just a quick reminder: Sentence breaking is actually a pretty complex thing, there's exceptions to the period rule, such as "Mr." or "Dr." There's also a variety of sentence ending punctuation marks. But there's also exceptions to the exception (if the next word is Capitalized and is not a proper noun, then Dr. can end a sentence, for example).

If you're interested in this (it's a natural language processing topic) you could check out:
the natural language tool kit's (nltk) punkt module.


If you have each sentence in a string you can use find() on your word and if found return the sentence. Otherwise you could use a regex, something like this

pattern = "\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, yourwholetext)
if match != None:
    sentence = match.group("sentence")

I havent tested this but something along those lines.

My test:

import re
text = "muffins are good, cookies are bad. sauce is awesome, veggies too. fmooo mfasss, fdssaaaa."
pattern = "\.?(?P<sentence>.*?good.*?)\."
match = re.search(pattern, text)
if match != None:
    print match.group("sentence")


dutt did a good job answering this. just wanted to add a couple things

import re

text = "go directly to jail. do not cross go. do not collect $200."
pattern = "\.(?P<sentence>.*?(go).*?)\."
match = re.search(pattern, text)
if match != None:
    sentence = match.group("sentence")

obviously, you'll need to import the regex library (import re) before you begin. here is a teardown of what the regular expression actually does (more info can be found at the Python re library page)

\. # looks for a period preceding sentence.
(?P<sentence>...) # sets the regex captured to variable "sentence".
.*? # selects all text (non-greedy) until the word "go".

again, the link to the library ref page is key.

0

精彩评论

暂无评论...
验证码 换一张
取 消