I am trying to do named entity recognition in Python using NLTK. I want to extract my own list of skills. I already have the list of skills and would like to search for them in a requisition and tag them. I noticed that NLTK has NER tags for predefined categories such as Person, Location, etc. Is there an external gazetteer tagger in Python that I can use? Any idea how to do this in a more sophisticated way than a plain term search (some of the terms are multi-word)?
Thanks, Assaf
I haven't used NLTK much recently, but if you already have a list of words that you know are skills, you don't need NER; a text search is enough.
Maybe use Lucene or some other search library to find the text, and then annotate it? That's a lot of work, but if you are working with a lot of data it might be worth it. Alternatively, you could hack together a regex search, which will be slower but will probably work fine for smaller amounts of data and will be much easier to implement.
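A minimal sketch of the regex approach, using only the standard library. The skill list, the function names, and the sample sentence are made up for illustration; longer terms are tried first so that a multi-word skill is not shadowed by a shorter one it contains.

```python
import re

# Hypothetical gazetteer of skills; replace with your own list.
SKILLS = ["machine learning", "Python", "SQL", "project management"]

def build_skill_pattern(skills):
    """Compile one case-insensitive regex that matches any skill in the list."""
    # Longest terms first, so a longer term wins over a shorter one it contains.
    ordered = sorted(skills, key=len, reverse=True)
    # Escape each word and allow any run of whitespace between the words.
    alternatives = [
        r"\s+".join(re.escape(word) for word in skill.split())
        for skill in ordered
    ]
    # Lookarounds instead of \b so that terms such as "C++" also work.
    return re.compile(
        r"(?<!\w)(?:" + "|".join(alternatives) + r")(?!\w)", re.IGNORECASE
    )

def find_skills(text, pattern):
    """Return (start, end, matched_text) for every skill mention in text."""
    return [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]

if __name__ == "__main__":
    pattern = build_skill_pattern(SKILLS)
    text = "Looking for a Python developer with machine learning and SQL skills."
    for start, end, skill in find_skills(text, pattern):
        print(start, end, skill)
```

The character offsets can be used to annotate the original text. For a very large skill list or corpus, a search library as suggested above is likely to scale better than one big alternation.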
Have a look at RegexpTagger and possibly RegexpParser; I think that's exactly what you are looking for.
You can create your own POS tags, i.e. map skills to a custom tag, and then easily define a grammar over those tags.
Some sample code for the tagger is in this pdf.
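As a rough illustration of the idea (not the code from the PDF), here is a sketch that tags skill tokens with a custom SKILL tag via NLTK's RegexpTagger and then groups adjacent SKILL tokens into chunks with RegexpParser. The patterns, the tag names SKILL and O, the chunk label SKILLS, and the naive whitespace tokenizer are all assumptions made for the example.

```python
from nltk.chunk import RegexpParser
from nltk.tag import RegexpTagger

# Hypothetical patterns: every token that is part of a skill gets the tag SKILL.
patterns = [
    (r"(?i)^(python|java|sql)$", "SKILL"),
    (r"(?i)^(machine|learning)$", "SKILL"),
    (r".*", "O"),  # catch-all so that no token is left with a None tag
]
tagger = RegexpTagger(patterns)

# Grammar over the custom tags: one or more adjacent SKILL tokens form a chunk.
parser = RegexpParser("SKILLS: {<SKILL>+}")

def extract_skills(text):
    """Return the skill phrases found in text, in order of appearance."""
    tokens = text.split()  # naive tokenizer; punctuation stays glued to words
    tree = parser.parse(tagger.tag(tokens))
    return [
        " ".join(token for token, _tag in subtree.leaves())
        for subtree in tree.subtrees()
        if subtree.label() == "SKILLS"
    ]

if __name__ == "__main__":
    print(extract_skills("We need Python and machine learning experience"))
```

Because the tagger works token by token, two different skills that happen to be adjacent would be merged into a single chunk; a stricter grammar or a separate tag per skill would be needed to keep them apart.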