How do I find text features and print them?

Developer (https://www.devze.com), 2023-02-10 09:54, source: the web
I have just started using the Natural Language Toolkit (NLTK) as part of my engineering college project. Can anybody please tell me how to read an input paragraph of text and

1) break it down into its textual components, i.e. the number of sentences, words, characters, and polysyllabic (complex) words in the given paragraph

and

2) print the values determined above?


Where's the input paragraph coming from? A file? The console? That's more of a Python issue than an NLTK one.
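Reading the paragraph really is plain Python. A minimal sketch, assuming UTF-8 input; `read_paragraph` is a hypothetical helper name, not part of NLTK:

```python
import sys


def read_paragraph(path=None):
    """Return the input paragraph as one string.

    With a file path, read the whole file (UTF-8 assumed). Without one,
    read everything from standard input, i.e. text piped in or typed at
    the console (end console input with Ctrl-D on Unix, Ctrl-Z then Enter
    on Windows).
    """
    if path is not None:
        with open(path, encoding="utf-8") as f:
            return f.read()
    return sys.stdin.read()
```

From a script you could call `read_paragraph(sys.argv[1] if len(sys.argv) > 1 else None)`, so both `python script.py paragraph.txt` and `python script.py < paragraph.txt` work.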

For the rest, look at the nltk.tokenize module & nltk.probability.FreqDist.
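Here is a rough sketch of the whole count-and-print step. To keep it runnable without downloading NLTK data, it uses simple regexes as stand-ins for `nltk.tokenize` and `collections.Counter` in place of `FreqDist` (which subclasses `Counter`). The vowel-group syllable counter is a crude heuristic, not NLTK. The rule that "complex" means three or more syllables is an assumption borrowed from readability formulas such as the Gunning fog index. Helper names are hypothetical.

```python
import re
from collections import Counter

# Naive stand-ins (an assumption, not NLTK) for nltk.tokenize.sent_tokenize and
# word_tokenize: fine for plain prose, but they mis-split text like "Dr. Smith".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
WORD = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")


def count_syllables(word):
    """Rough heuristic: count runs of vowels, dropping a silent final 'e'."""
    w = word.lower()
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)  # every word gets at least one syllable


def text_features(paragraph, min_syllables=3):
    """Count sentences, words, non-whitespace characters and complex words."""
    sentences = [s for s in SENTENCE_END.split(paragraph.strip()) if s]
    words = WORD.findall(paragraph)
    # nltk.probability.FreqDist is a Counter subclass, so most_common() works the same way.
    freq = Counter(w.lower() for w in words)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "characters": sum(1 for c in paragraph if not c.isspace()),
        "complex_words": sum(1 for w in words if count_syllables(w) >= min_syllables),
        "most_common": freq.most_common(3),
    }


paragraph = "I have just started using NLTK. It is a natural language toolkit for Python!"
for name, value in text_features(paragraph).items():
    print(f"{name}: {value}")
```

For real text, you would probably swap in `nltk.sent_tokenize` and `nltk.word_tokenize` (after fetching the tokenizer models with `nltk.download()`), plus a dictionary-based syllable count like the cmudict one below. Note that "characters" here excludes whitespace. Whether to count spaces and punctuation depends on what your project needs.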


From a discussion on the NLTK Google Group:

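from nltk.corpus import cmudict

# Requires the CMU Pronouncing Dictionary; download it once with nltk.download('cmudict').
d = cmudict.dict()

def nsyl(word):
    # Vowel phonemes in cmudict end in a stress digit (0, 1 or 2), so counting
    # them gives the syllable count. Returns one count per listed pronunciation
    # and raises KeyError for words that are not in the dictionary.
    return [len([ph for ph in pron if ph[-1].isdigit()]) for pron in d[word.lower()]]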

This should give you the syllable count for each word (one count per pronunciation listed in cmudict). Hope this helps.
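The `nsyl` approach returns a list (one entry per pronunciation) and fails on words cmudict doesn't know, such as "NLTK". Here is a sketch of turning it into a polysyllabic-word check. It applies the same stress-digit rule but uses `dict.get` so unknown words don't raise. To stay runnable without the NLTK data, it uses a few hand-written, illustrative entries in the same shape as `cmudict.dict()`; in real use you'd pass `cmudict.dict()` instead. Taking the maximum count and treating unknown words as "not complex" are assumptions, as are the helper names.

```python
# Hypothetical sample entries in the same shape as nltk.corpus.cmudict.dict():
# lowercase word -> list of pronunciations, each a list of ARPAbet phonemes.
SAMPLE_PRONUNCIATIONS = {
    "natural": [["N", "AE1", "CH", "ER0", "AH0", "L"], ["N", "AE1", "CH", "R", "AH0", "L"]],
    "language": [["L", "AE1", "NG", "G", "W", "AH0", "JH"]],
    "toolkit": [["T", "UW1", "L", "K", "IH2", "T"]],
}


def syllable_counts(word, pron_dict):
    """Same rule as nsyl: count phonemes ending in a stress digit, per pronunciation.

    Returns an empty list instead of raising KeyError for unknown words.
    """
    return [sum(ph[-1].isdigit() for ph in pron) for pron in pron_dict.get(word.lower(), [])]


def is_polysyllabic(word, pron_dict, min_syllables=3):
    counts = syllable_counts(word, pron_dict)
    # Policy choice (an assumption): use the largest count when a word has several
    # pronunciations; words missing from the dictionary are treated as not complex.
    return bool(counts) and max(counts) >= min_syllables


words = ["natural", "language", "toolkit", "NLTK"]
complex_words = [w for w in words if is_polysyllabic(w, SAMPLE_PRONUNCIATIONS)]
print(len(complex_words), complex_words)
```

Instead of ignoring unknown words, you could fall back to a heuristic syllable counter for them.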
