Project Gutenberg Python problem?

开发者 https://www.devze.com 2023-01-12 14:08 Source: Internet

I am trying to process various texts with regular expressions and Python's NLTK, which is documented at http://www.nltk.org/book. I am trying to create a random text generator and I am having a hard time with one problem. First, the steps I want to perform:

1. Enter a sentence as input (this is called the trigger string).
2. Get the longest word in the trigger string.
3. Search the whole Project Gutenberg database for sentences that contain this word, regardless of uppercase or lowercase.
4. Return the longest sentence that contains the word from step 3.
5. Append the sentences from step 1 and step 4 together.
6. Repeat the process. Note that I then have to get the longest word in the second sentence, and so on.

So far I have been able to do this for the first two sentences, but I cannot perform a case-insensitive search. The entire sentence database of Project Gutenberg is available via the gutenberg.sents() function, but a case-insensitive regex search seems practically impossible, since gutenberg.sents() returns the sentences of each book in a list-of-lists format, as follows:

EXAMPLE: all the sentences of Shakespeare's Macbeth are returned by typing

import nltk

from nltk.corpus import gutenberg 

gutenberg.sents('shakespeare-macbeth.txt')

into the Python shell, and the output is:

[['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by', 'William', 'Shakespeare', '1603', ']'], 
['Actus', 'Primus', '.'], .......]

with "[The Tragedie of Macbeth by William Shakespeare 1603]" and "Actus Primus." being the first two sentences.

How can I find the word I'm looking for regardless of whether it is uppercase or lowercase? I'm desperately in need of help, since I have been tinkering with this for the past two days and it's starting to wear on my nerves. Thanks a lot.

Given a list L of words, and a target word t,

any(t.lower()==w.lower() for w in L)

tells you whether L contains the word t, ignoring case. It's faster, of course, to do

lt = t.lower()
any(lt==w.lower() for w in L)

since Python does not "hoist" the constant computation out of the loop and, unless you hoist it yourself, it will be performed repeatedly.
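As a rough illustration of the hoisted version, the check can be wrapped in a small helper. The name contains_word and the sample token list are made up for this example; the tokens mimic the gutenberg.sents() output shown in the question.

```python
def contains_word(tokens, target):
    """Return True if target occurs in tokens, ignoring case."""
    lt = target.lower()  # lowercase the target once, outside the loop
    return any(lt == w.lower() for w in tokens)


# Sample sentence in the list-of-tokens shape from the question.
sentence = ['[', 'The', 'Tragedie', 'of', 'Macbeth', 'by',
            'William', 'Shakespeare', '1603', ']']

print(contains_word(sentence, 'MACBETH'))  # True
print(contains_word(sentence, 'hamlet'))   # False
```

Because the comparison is done token by token, no regex is needed for this step.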

Given a list of lists lol, the longest sub-list including t can be found by

longest = max((L for L in lol if any(lt==w.lower() for w in L)), key=len)

If multiple sub-lists include t and are of the same maximal length, this will give you the first one, as it happens.
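One caveat: max() raises ValueError when no sub-list contains t. Here is a minimal sketch that returns None in that case, assuming Python 3.4 or later for the default argument of max(). The helper name longest_sentence_with and the sample data are hypothetical.

```python
def longest_sentence_with(sentences, target):
    """Return the longest token list containing target (ignoring case), or None."""
    lt = target.lower()
    matches = (s for s in sentences if any(lt == w.lower() for w in s))
    # default=None avoids a ValueError when nothing matches
    return max(matches, key=len, default=None)


# Hypothetical sample in the list-of-lists shape returned by gutenberg.sents().
lol = [
    ['Actus', 'Primus', '.'],
    ['When', 'shall', 'we', 'three', 'meet', 'againe', '?'],
    ['WHEN', 'the', 'Battaile', "'", 's', 'lost', ',', 'and', 'wonne', '.'],
]

print(longest_sentence_with(lol, 'when'))    # the third, ten-token sentence
print(longest_sentence_with(lol, 'hamlet'))  # None
```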

How about using the built-in string method str.lower()? It returns a copy of the string converted to lowercase.

Then just compare the strings.
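Putting the pieces together, the steps from the question could be sketched roughly as follows. This is only an illustration under several assumptions: the function names generate and longest_word, the rounds parameter, and the sample sentences are all invented here; the trigger string is split on ASCII letters only; and already-used sentences are skipped so the chain cannot loop on the same sentence forever. In principle the sample list could be replaced by gutenberg.sents(), provided the NLTK corpus is installed.

```python
import re


def longest_word(tokens):
    """Return the longest alphabetic token, or None if there is none."""
    words = [w for w in tokens if w.isalpha()]
    return max(words, key=len, default=None)


def generate(trigger, sentences, rounds=3):
    """Chain sentences: each round looks up the longest word of the previous one."""
    chain = [trigger]
    tokens = re.findall(r"[A-Za-z]+", trigger)
    used = set()
    for _ in range(rounds):
        word = longest_word(tokens)
        if word is None:
            break
        lw = word.lower()
        # Skip sentences already used so the chain cannot repeat itself.
        candidates = (
            (i, s) for i, s in enumerate(sentences)
            if i not in used and any(lw == w.lower() for w in s)
        )
        best = max(candidates, key=lambda pair: len(pair[1]), default=None)
        if best is None:
            break
        index, tokens = best
        used.add(index)
        chain.append(" ".join(tokens))
    return " ".join(chain)


# Hypothetical sample data in the shape returned by gutenberg.sents().
sentences = [
    ['Actus', 'Primus', '.'],
    ['The', 'Tragedie', 'of', 'Macbeth', '.'],
    ['MACBETH', 'shall', 'neuer', 'vanquish', "'", 'd', 'be', '.'],
    ['Who', 'can', 'vanquish', 'him', '?'],
]

print(generate("I saw Macbeth", sentences, rounds=2))
```

Joining tokens with spaces leaves punctuation detached, as in the raw token lists; reattaching it would need extra handling.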
