I need to take an input text file containing a single word. I then need to find the lemma_names, definition and examples of each synset of the word using WordNet. I have gone through the books "Python Text Processing with NLTK 2.0 Cookbook" and "Natural Language Processing using NLTK" to help me in this direction. Though I have understood how this can be done using the terminal, I'm not able to do the same using a text editor.
For example, if the input text has the word "flabbergasted", the output needs to be in this fashion:
flabbergasted (verb) flabbergast, boggle, bowl over - overcome with amazement ; "This boggles the mind!" (adjective) dumbfounded , dumfounded , flabbergasted , stupefied , thunderstruck , dumbstruck , dumbstricken - as if struck dumb with astonishment and surprise; "a circle of policement stood dumbfounded by her denial of having seen the accident"; "the flabbergasted aldermen were speechless"; "was thunderstruck by the news of his promotion"
The synsets, definitions and example sentences are obtained from WordNet directly!
I have the following piece of code:
from __future__ import division
import nltk
from nltk.corpus import wordnet as wn
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fp = open("inpsyn.txt")
data = fp.read()
# tokenize the input text into sentences and print them
print '\n-----\n'.join(tokenizer.tokenize(data))
# tokenize the input text into words
tokens = nltk.wordpunct_tokenize(data)
text = nltk.Text(tokens)
words = [w.lower() for w in text]
print words  # print the tokens
for a in words:
    print a
    syns = wn.synsets(a)
    print "synsets:", syns
    for s in syns:
        for l in s.lemmas:
            print l.name
        print s.definition
        print s.examples
I get the following output:
flabbergasted
['flabbergasted']
flabbergasted
synsets: [Synset('flabbergast.v.01'), Synset('dumbfounded.s.01')]
flabbergast
boggle
bowl_over
overcome with amazement
['This boggles the mind!']
dumbfounded
dumfounded
flabbergasted
stupefied
thunderstruck
dumbstruck
dumbstricken
as if struck dumb with astonishment and surprise
['a circle of policement stood dumbfounded by her denial of having seen the accident', 'the flabbergasted aldermen were speechless', 'was thunderstruck by the news of his promotion']
Is there a way to retrieve the part of speech along with the group of lemma names?
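One possible way to get the output format from the question, sketched under the assumption of NLTK 3, where `pos()`, `lemma_names()`, `definition()` and `examples()` are methods (in NLTK 2 they are attributes, e.g. `s.pos`). The names `POS_NAMES`, `format_synset` and `describe` are made up for this illustration; `format_synset` only relies on those four methods, so it accepts any object that provides them.

```python
# Map WordNet's one-letter part-of-speech tags to readable names.
# 's' marks a "satellite adjective", reported here simply as an adjective.
POS_NAMES = {'n': 'noun', 'v': 'verb', 'a': 'adjective',
             's': 'adjective', 'r': 'adverb'}


def format_synset(synset):
    """Return '(pos) lemma, lemma - definition; "example"; ...' for one synset."""
    pos = POS_NAMES.get(synset.pos(), synset.pos())
    # WordNet joins multi-word lemmas with underscores, e.g. 'bowl_over'.
    lemmas = ', '.join(name.replace('_', ' ') for name in synset.lemma_names())
    parts = [synset.definition()]
    parts.extend('"%s"' % example for example in synset.examples())
    return '(%s) %s - %s' % (pos, lemmas, '; '.join(parts))


def describe(word):
    """Build the full line for a word; needs NLTK and the WordNet corpus."""
    from nltk.corpus import wordnet as wn  # imported lazily on purpose
    return ' '.join([word] + [format_synset(s) for s in wn.synsets(word)])
```

Calling `print(describe('flabbergasted'))` should then give a single line that starts with the word and continues with a `(verb) ...` group and an `(adjective) ...` group, one per synset listed in the output above.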
def synset(word):
    wn.synsets(word)
doesn't return anything, so by default you get None.
You should write:
def synset(word):
    return wn.synsets(word)
Extracting lemma names (NLTK 2.x syntax, where lemmas and name are attributes; in NLTK 3 they are methods, so write s.lemmas() and l.name()):
>>> from nltk.corpus import wordnet
>>> syns = wordnet.synsets('car')
>>> syns[0].lemmas[0].name
'car'
>>> [s.lemmas[0].name for s in syns]
['car', 'car', 'car', 'car', 'cable_car']
>>> [l.name for s in syns for l in s.lemmas]
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
Here I have created a module that can easily be imported: pass it a string and it returns all the lemma names of that word.
Module:
#!/usr/bin/python2.7
'''Pass a string to this function (e.g. 'car') and it will give you a list of
the words related to it, i.e. the lemma names of all its synsets.'''
from nltk.corpus import wordnet as wn

# Collect the lemma names of every synset of the given word.
def lemmalist(word):
    syn_set = []
    for synset in wn.synsets(word):
        # NLTK 2.x: lemma_names is an attribute; in NLTK 3 call synset.lemma_names()
        for item in synset.lemma_names:
            syn_set.append(item)
    return syn_set
Usage:
Note: module name is lemma.py hence "from lemma import lemmalist"
>>> from lemma import lemmalist
>>> lemmalist('car')
['car', 'auto', 'automobile', 'machine', 'motorcar', 'car', 'railcar', 'railway_car', 'railroad_car', 'car', 'gondola', 'car', 'elevator_car', 'cable_car', 'car']
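The list above repeats 'car' because each synset contributes its own lemma names. If only the distinct names are wanted, a minimal sketch could drop the duplicates while keeping first-seen order. The helper name `unique_in_order` is hypothetical and not part of NLTK; the input list is copied from the output above.

```python
def unique_in_order(names):
    """Return names without duplicates, keeping the first occurrence of each."""
    seen = set()
    result = []
    for name in names:
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


lemmas = ['car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
          'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
          'car', 'elevator_car', 'cable_car', 'car']
print(unique_in_order(lemmas))
```

A plain `set(lemmas)` would also remove the duplicates, but it would lose the order in which WordNet returned the names.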
Cheers!
from nltk.corpus import wordnet

# NLTK 3 syntax: lemmas() and name() are methods
synonyms = []
for syn in wordnet.synsets("car"):
    for l in syn.lemmas():
        synonyms.append(l.name())
print(synonyms)
In NLTK 3.0, lemma_names has been changed from an attribute to a method. So if you get an error saying:
TypeError: 'method' object is not iterable
You can fix it using:
>>> from nltk.corpus import wordnet as wn
>>> [item for synset in wn.synsets('car') for item in synset.lemma_names()]
This will output:
[
'car', 'auto', 'automobile', 'machine', 'motorcar', 'car',
'railcar', 'railway_car', 'railroad_car', 'car', 'gondola',
'car', 'elevator_car', 'cable_car', 'car'
]
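If the same script has to run against both NLTK 2 and NLTK 3, one option is to check whether `lemma_names` is callable before using it. This is only a sketch: `lemma_names_of` and `all_lemma_names` are hypothetical helper names, and any iterable of synsets (for example the result of `wn.synsets('car')`) can be passed in.

```python
def lemma_names_of(synset):
    """Return the lemma names of a synset under either NLTK API style."""
    names = synset.lemma_names
    # NLTK 3 exposes a method, NLTK 2 a plain list attribute.
    return list(names() if callable(names) else names)


def all_lemma_names(synsets):
    """Flatten the lemma names of several synsets into one list."""
    return [name for synset in synsets for name in lemma_names_of(synset)]
```

With this in place, `all_lemma_names(wn.synsets('car'))` avoids the TypeError regardless of which of the two API styles the installed NLTK uses.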