I have just started using the Natural Language Toolkit (NLTK) as part of my engineering college project. Can anybody please tell me how to read an input paragraph of text and
1) break it down into its textual components, i.e. the number of sentences, number of words, number of characters, and number of polysyllabic or complex words in the given paragraph,
and
2) print the values determined above?
Where is the input paragraph coming from? A file? The console? That's more of a Python question than an NLTK one.
For the rest, look at the nltk.tokenize module and nltk.probability.FreqDist.
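A minimal sketch of that approach, assuming the paragraph is already in a string and that the 'punkt' tokenizer data has been downloaded with nltk.download('punkt'); the sample text and variable names are just placeholders:

import nltk
from nltk.probability import FreqDist

paragraph = "NLTK makes tokenizing easy. It ships with sentence and word tokenizers."

# Requires the 'punkt' tokenizer models: nltk.download('punkt')
sentences = nltk.sent_tokenize(paragraph)                           # list of sentence strings
words = [w for w in nltk.word_tokenize(paragraph) if w.isalpha()]   # word tokens only

print("sentences:", len(sentences))
print("words:", len(words))
print("characters:", len(paragraph))  # includes spaces and punctuation

# FreqDist gives per-token counts, handy for any further analysis
fdist = FreqDist(w.lower() for w in words)
print("most common words:", fdist.most_common(3))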
From a discussion on the NLTK Google group:
from curses.ascii import isdigit
import nltk
from nltk.corpus import cmudict

# CMU Pronouncing Dictionary: maps a word to one or more pronunciations, each a
# list of phoneme strings; vowel phonemes end in a stress digit (0, 1 or 2).
# Requires the pronunciation data: nltk.download('cmudict')
d = cmudict.dict()

def nsyl(word):
    # One syllable count per pronunciation: count the phonemes ending in a digit
    return [len([ph for ph in pron if isdigit(ph[-1])]) for pron in d[word.lower()]]
This should be able to give you a syllable count for each word. Hope this helps.
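A hedged usage sketch building on nsyl() above to count polysyllabic words: the three-syllable threshold, the sample text, and the KeyError fallback for words missing from the CMU dictionary are my assumptions, not part of the original answer. It also assumes the 'punkt' and 'cmudict' data have been downloaded.

import nltk

paragraph = "The university administration announced an extraordinary decision."
words = [w for w in nltk.word_tokenize(paragraph) if w.isalpha()]

polysyllabic = []
for w in words:
    try:
        # A word can have several pronunciations; take the largest syllable count
        if max(nsyl(w)) >= 3:
            polysyllabic.append(w)
    except KeyError:
        pass  # not in the CMU Pronouncing Dictionary; skip it

print("polysyllabic words:", len(polysyllabic), polysyllabic)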