How to make a concept representation with the help of a bag of words

Developer https://www.devze.com 2022-12-22 07:46 Source: web
Thanks for stopping to read my question :) This is a very sweet place full of GREAT people!

I have a question about "creating sentences with words". NO NO, it is not about English grammar :)

Let me explain. Suppose I have a bag of words like

"person apple apple person person a eat person will apple eat hungry apple hungry"

and I want to generate a sentence from it, something like

"hungry person eat apple"

I don't know which field this topic belongs to, or where I should look for an answer. I tried searching Google, but I only found English grammar stuff :)

Can anybody tell me which algorithm would work for this problem? Or any program?

Thanks

P.S. It is not an assignment :) If it were, I would ask for source code! I don't even know which field I should look in :)


Most successful linguistic parsers today are statistically based, and this is (for example) how Google Translate works. What you do is get a large semantically marked-up corpus and start walking the word chart. The set of linguistically valid English sentences is larger than what a generative grammar (an older approach) will give you, but a large corpus will get you a huge number of viable sentence templates. You can make sentences from your bag with any data traversal technique, from a random walk to genetic algorithms. Let us know what you do!

Here's a great set of resources to start: Stanford statistical natural language processing and corpus-based computational linguistics resources

In response to the OP's comment below: To generate a sentence you must have an abstract representation of valid sentences. A simple example is SUBJECT VERB OBJECT in generative grammar. You might also get SUBJECT VERB ADJECTIVE OBJECT. The problem is that you can fill it out with grammatically correct nonsense, such as "I ate hungry apple." What statistical analysis will tell you is that "hungry apple" is a combination you almost never see: it's very unlikely to appear in real English (your corpus), and so without even having to know the meaning I can eliminate that as a possible sentence. If you were writing a grammar checker you might underline that word pair as being questionable.
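To make the "almost never see" idea concrete, here is a minimal Python sketch that counts adjacent word pairs in a corpus and flags the pairs it has never seen. The three-sentence `TOY_CORPUS` and the function names are made up for illustration; a real project would count over a large corpus instead.

```python
from collections import Counter

# Toy stand-in for a real corpus (an assumption for this example only).
TOY_CORPUS = [
    "the hungry person will eat an apple",
    "a hungry person eats a red apple",
    "the person will eat the red apple",
]

def bigram_counts(sentences):
    """Count adjacent word pairs across all sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(zip(words, words[1:]))
    return counts

def questionable_pairs(sentence, counts, min_count=1):
    """Return adjacent pairs seen fewer than min_count times in the corpus."""
    words = sentence.split()
    return [pair for pair in zip(words, words[1:]) if counts[pair] < min_count]

counts = bigram_counts(TOY_CORPUS)
print(questionable_pairs("the person will eat the hungry apple", counts))
```

With this toy corpus the only flagged pair is ("hungry", "apple"), which is the pair a grammar checker would underline.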

Since you are writing a sentence generator, you would just need to reverse that process. One simple possibility is to generate a large set of random combinations of the words and then check them against your database to see if the word chains all meet a certain threshold of likelihood, such as 80%. Another option is to treat individual word chains as genes in a genetic algorithm, and after a few generations chains like "hungry apple" will die out in favor of more successful genes like "red apple." With a small "word bag" like the one you mentioned you don't need to get that fancy; you can probably test every possible sentence with numwords < n with no problem. You only need to get fancy in your sentence search algorithm when your word bag is too huge to search exhaustively.
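For a bag this small, the exhaustive version could look roughly like the Python sketch below. The `PAIR_SCORE` table is entirely hypothetical (hand-picked numbers standing in for real corpus statistics), and the sketch uses each distinct word at most once per sentence, which ignores how many copies of a word the bag holds.

```python
from itertools import permutations

BAG = "person apple apple person person a eat person will apple eat hungry apple hungry"

# Hypothetical pair likelihoods in [0, 1]; real values would come from a corpus.
PAIR_SCORE = {
    ("hungry", "person"): 0.9,
    ("person", "eat"): 0.85,
    ("person", "will"): 0.9,
    ("will", "eat"): 0.95,
    ("eat", "apple"): 0.9,
    ("eat", "a"): 0.9,
    ("a", "hungry"): 0.8,
    ("a", "person"): 0.9,
    ("hungry", "apple"): 0.05,
}

def plausible_sentences(bag, length, threshold=0.8):
    """Try every ordering of `length` distinct bag words and keep those
    whose adjacent pairs all meet the likelihood threshold."""
    vocab = sorted(set(bag.split()))
    results = []
    for words in permutations(vocab, length):
        pairs = zip(words, words[1:])
        # Pairs missing from the table are treated as never seen (score 0).
        if all(PAIR_SCORE.get(pair, 0.0) >= threshold for pair in pairs):
            results.append(" ".join(words))
    return results

for sentence in plausible_sentences(BAG, 4):
    print(sentence)
```

With six distinct words there are only 360 four-word orderings to check, so brute force is fine here; random sampling or a genetic algorithm only becomes worthwhile once the vocabulary is too large to enumerate.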

The link above does have several marked-up corpora you can download and use, as well as plenty of sample programs for marking up corpora of your own. But you do want to keep it simple if this is just a project of idle curiosity. Let me make another suggestion: one of the largest corpora available is Google's index of the web. Any sentence or phrase you put in quotes in a Google search will return a number of hits. "red apple" returns over a million hits, for example, whereas "hungry apple" returns a mere 11,000. You can use this to build a small statistical markup for the validity of your sentences with a small word bag. If the statistical process turns out to be too complicated for you to implement, instead think of marking up your word bag with parts of speech (research part-of-speech markup) and providing your program with a variety of abstract sentence templates. You will still get sentences like "A person will eat a hungry apple," but depending on your needs that may be enough. :)
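The template fallback can be sketched in a few lines of Python. The part-of-speech tags in `LEXICON` are assigned by hand for this bag and the two templates are just examples; both are assumptions for illustration, and a real project would get the tags from a part-of-speech tagger.

```python
from itertools import product

# Hand-assigned tags for the example bag (an assumption, not tagger output).
LEXICON = {
    "NOUN": ["person", "apple"],
    "VERB": ["eat"],
    "AUX": ["will"],
    "ADJ": ["hungry"],
    "DET": ["a"],
}

# Example abstract sentence templates, written as sequences of tags.
TEMPLATES = [
    ["ADJ", "NOUN", "VERB", "NOUN"],
    ["DET", "NOUN", "AUX", "VERB", "DET", "ADJ", "NOUN"],
]

def fill_templates(templates, lexicon):
    """Fill every template slot with every word carrying that slot's tag."""
    sentences = []
    for template in templates:
        choices = [lexicon[tag] for tag in template]
        for words in product(*choices):
            sentences.append(" ".join(words))
    return sentences

for sentence in fill_templates(TEMPLATES, LEXICON):
    print(sentence)
```

As expected, the output mixes sensible sentences such as "hungry person eat apple" with grammatical nonsense such as "a person will eat a hungry apple", since nothing here checks how likely the word pairs are.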

P.S. Without the word "an" in your word bag you look limited to Tarzan grammar and the world of man-eating apples :)


I think you might be thinking of Generative Grammars, but I'm not too sure.
