Python re: match, findall, or search, and then NLP (what to do with it?)

Developer https://www.devze.com 2023-03-01 05:42 Source: Internet
I am starting to write code that captures parts of a sentence ("types") and, if they match a criterion, starts a specific Python script that deals with that "type." I am "finding" :) that findall works better for what I am doing, hence:

import re

m = re.compile(r'([0-9] days from now)')  # [0-9] matches one digit; [0-9]+ would also cover "10 days"
print(m.match("i think maybe 7 days from now i hope"))
# None
f = m.findall("i think maybe 7 days from now i hope")
print(f[0])
# 7 days from now

This seems to give me the part of the sentence that I was looking for. I can then pass it to, for example, the pyparsing module, using its example datetime conversion script, which returns a datetime from a similar natural-language statement. (I know there are other modules, but they are rigid in the input statements they can handle.)

Then I could do a db insert into my online diary, for example, or into a hosted web app if other parts of the sentence matched another "type", i.e. appointments, deadlines, etc.
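
The flow described so far (capture a fragment, decide its "type", hand it to a type-specific handler) could be sketched roughly as follows. This is only an illustration: the handler functions, the `HANDLERS` table, and the appointment pattern are hypothetical names made up here, and the handlers just return tuples where a real version would parse a date or write to a database.

```python
import re

# Hypothetical handlers: stand-ins for the per-type scripts.
def handle_relative_date(fragment):
    return ("date", fragment)

def handle_appointment(fragment):
    return ("appointment", fragment)

# Each entry pairs a compiled pattern with the handler for that "type".
HANDLERS = [
    (re.compile(r"[0-9]+ days? from now"), handle_relative_date),
    (re.compile(r"\b(?:appointment|meeting) with \w+"), handle_appointment),
]

def dispatch(sentence):
    """Call every handler whose pattern is found somewhere in the sentence."""
    results = []
    for pattern, handler in HANDLERS:
        found = pattern.search(sentence)
        if found:
            results.append(handler(found.group(0)))
    return results

print(dispatch("appointment with Bob 7 days from now"))
```

Keeping the patterns and handlers in one table means a new "type" can be added without touching the dispatch loop.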

I am just tinkering here, but slowly I am building something useful. Is this structure/process logical, or are there better methods? That is what I am asking myself now. Any feedback is appreciated.


The reason why m.match() fails is that it expects the match to start at the beginning of the string.

findall() makes sense if you expect more than one (non-overlapping) match in your string. Otherwise, use the search() method (which will return the first match it finds).

This is all well covered in the docs.
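
A small side-by-side comparison of the three methods may help. The sample sentence is extended with a second date phrase purely for illustration, and the pattern uses `[0-9]+` so that multi-digit numbers match as well.

```python
import re

pattern = re.compile(r"[0-9]+ days from now")
text = "i think maybe 7 days from now i hope, or 14 days from now"

# match() only succeeds if the pattern matches at position 0.
at_start = pattern.match(text)

# search() scans the string and returns the first match object (or None).
first = pattern.search(text)

# findall() returns every non-overlapping match as a list of strings.
every = pattern.findall(text)

print(at_start)        # None
print(first.group(0))  # 7 days from now
print(every)           # ['7 days from now', '14 days from now']
```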


From my knowledge of search interfaces, it seems like you'd need an awful lot of regular expressions to capture the great variety of ways in which people express themselves. For a feeling for just how many, see this writeup on "the vocabulary problem".

So, if you're just doing date/time stuff, and you're tying very specific actions to them that it would suck to get wrong, then REs seem like a good way to go. On the other hand, if you're just trying to detect a "date" expression vs. e.g. an "email" expression or a "note" expression, then it might be worth a try to POS-tag the sentences using NLTK and match patterns at the part-of-speech level.
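
To make the part-of-speech idea concrete, here is a minimal sketch of matching a tag sequence instead of literal words. It assumes the sentence has already been tagged: the `(word, tag)` pairs below are written by hand in Penn Treebank style for illustration, where in practice they would come from a tagger such as `nltk.pos_tag(nltk.word_tokenize(sentence))`. The `DATE_TAGS` pattern and the `find_tag_sequence` helper are hypothetical names, not part of NLTK.

```python
# Hand-written (word, tag) pairs; a real tagger may assign different tags.
tagged = [
    ("i", "PRP"), ("think", "VBP"), ("maybe", "RB"), ("7", "CD"),
    ("days", "NNS"), ("from", "IN"), ("now", "RB"), ("i", "PRP"),
    ("hope", "VBP"),
]

# Hypothetical tag pattern for a relative-date phrase:
# cardinal number, plural noun, preposition, adverb.
DATE_TAGS = ["CD", "NNS", "IN", "RB"]

def find_tag_sequence(tagged_tokens, tag_pattern):
    """Return the words of the first window whose tags equal tag_pattern."""
    width = len(tag_pattern)
    for start in range(len(tagged_tokens) - width + 1):
        window = tagged_tokens[start:start + width]
        if [tag for _, tag in window] == tag_pattern:
            return [word for word, _ in window]
    return None

print(find_tag_sequence(tagged, DATE_TAGS))  # ['7', 'days', 'from', 'now']
```

Because the pattern is expressed in tags rather than words, the same four-tag sequence would also cover phrases such as "3 weeks from now" without a new regular expression, at the cost of depending on the tagger's accuracy.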
