I know that AS3 has some powerful new text search capabilities, especially when combined with regex.
I don't even know if this is possible, but I would like to search any block of text and return all nouns, adjectives and verbs.
What would be the best (most efficient) way to do this? Is regex an option, or would I have to load in some sort of open-source dictionary (as used in spellcheckers) to compare against?
After I've pulled all the nouns, adjectives and verbs, I need to count and prioritize them by their frequency.
Any suggestions welcome...
No regular expression has any concept of grammatical syntax or parts of speech. Regular expressions are just a way to search strings for patterns.
To do what you want, you would need to plug in, as you say, "some sort of open sourced dictionary". The amount of work involved is likely to be significant.
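For what it's worth, here is a rough sketch of the part a regex can help with: splitting the text into word tokens before any dictionary lookup. The function name tokenize and the pattern are my own choices for illustration, and the pattern assumes plain English text (letters and apostrophes only).

```actionscript
// Split a block of text into lowercase word tokens.
// The regex only finds runs of letters/apostrophes; it knows nothing
// about whether a token is a noun, a verb or an adjective.
function tokenize( text:String ):Array {
    var found:Array = text.toLowerCase().match( /[a-z']+/g );
    // guard in case no array comes back
    return found ? found : [];
}

trace( tokenize( "Mary had a little lamb; her fleece was white as snow." ) );
```

Classifying each token still needs word-list data from somewhere else.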
I came across this open-source full-text search engine:
http://www.servebox.org/actionscript-foundry/actionscript-foundry-documentation/full-text-search-tree/
The sequence of steps, as I see it, is:
1) Create or obtain a list of all English nouns, verbs and adjectives (any tips on obtaining or creating this list much appreciated!)
2) Search the data source to see if a match exists for the 1st dictionary word.
3) If a match exists, add the word to an index that stores its number of occurrences.
4) Move on to the 2nd word in the dictionary and repeat steps 2 & 3.
5) Repeat until every word in the dictionary has been used to search.
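The steps above could be sketched roughly like this. It is only an illustration: countListedWords is a made-up name, wordList stands in for whatever word list you manage to obtain, and it assumes the words contain letters only (anything with regex special characters would need escaping first).

```actionscript
// For each word in the list, search the text and record how often it occurs.
function countListedWords( text:String, wordList:Array ):Array {
    var index:Array = [];
    for each ( var word:String in wordList ) {
        // \b word boundaries keep "ran" from matching inside "orange"
        var pattern:RegExp = new RegExp( "\\b" + word + "\\b", "gi" );
        var hits:Array = text.match( pattern );
        if ( hits && hits.length > 0 ) {
            // steps 3 and 4: store the count, then move on to the next word
            index.push( { word: word, count: hits.length } );
        }
    }
    return index;
}

// usage with a tiny stand-in word list
var sample:Array = countListedWords( "The lamb ran and ran", [ 'lamb', 'ran', 'zebra' ] );
for each ( var item:Object in sample ) {
    trace( item.word + ": " + item.count );
}
```

Note that this scans the whole text once per dictionary word, so with a large word list it does far more work than walking the text once and looking each word up in a table.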
So @Robusto is correct: you will need some sort of dictionary data that has words and associates them as either nouns, verbs or adjectives. However, if you can find that or build it yourself (might take a while), you can use a Dictionary object in AS3 to build your result arrays:
import flash.utils.Dictionary;

//dummy data
var testString:String = "Mary had a little lamb her fleece was white as snow";
var testString2:String = "The blue zebra had a rad jacket";
var nouns:Array = ['cup', 'Mary', 'phone', 'lamb', 'jacket', 'fleece', 'snow', 'zebra'];
var verbs:Array = ['had', 'was', 'ran', 'jumped', 'read'];
var adj:Array = ['awesome', 'rad', 'little', 'tall', 'white', 'blue', 'red'];
//SETUP
//Create the dictionaries, in a more complex setting you might load data in from an XML file
//here I'm just pulling the data from the arrays created above
var nounDict:Dictionary = createDictionary( nouns );
var verbDict:Dictionary = createDictionary( verbs );
var adjDict:Dictionary = createDictionary( adj );
//Creates a dictionary based on an Array of words
function createDictionary( wordData:Array ):Dictionary {
    //the keys are Strings, so weak keys are not needed here
    var dict:Dictionary = new Dictionary();
    for ( var i:uint = 0; i < wordData.length; i++ ) {
        //add the word as a key to the dictionary
        dict[ wordData[i] ] = true;
    }
    return dict;
}
//SEARCHING
//str is the string you want to search through
//dict is the dictionary you want to use to search against the string
//note: the lookup is case-sensitive, so 'Mary' matches but 'mary' would not
function searchDictionary( str:String, dict:Dictionary ):Array {
    //break up the words by the spaces (you can figure out how to deal with punctuation)
    var words:Array = str.split(' ');
    //store the matching words in the matches array
    var matches:Array = [];
    for ( var i:uint = 0; i < words.length; i++ ) {
        //strict check, so only words we stored as keys count as matches
        if ( dict[ words[i] ] === true ) {
            matches.push( words[i] );
        }
    }
    return matches;
}
//TEST IT OUT
trace( searchDictionary( testString, nounDict ) );
trace( searchDictionary( testString, verbDict ) );
trace( searchDictionary( testString, adjDict ) );
trace( searchDictionary( testString2, nounDict ) );
trace( searchDictionary( testString2, verbDict ) );
trace( searchDictionary( testString2, adjDict ) );
You can pop this code into a new FLA file and see how it works out.
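Since the question also asks about counting and prioritizing by frequency, here is one way that part might look. Treat it as a sketch under assumptions: rankByFrequency is a hypothetical helper, it lowercases everything (so it is case-insensitive, unlike the lookup above), and it strips punctuation with a simple letters-and-apostrophes pattern.

```actionscript
// Count how often each listed word occurs in the text,
// then sort the results so the most frequent word comes first.
function rankByFrequency( text:String, wordList:Array ):Array {
    // lookup table keyed by lowercase word
    var known:Object = {};
    for each ( var entry:String in wordList ) {
        known[ entry.toLowerCase() ] = true;
    }
    // walk the text once, counting only words found in the table
    var counts:Object = {};
    var tokens:Array = text.toLowerCase().match( /[a-z']+/g );
    if ( tokens ) {
        for each ( var token:String in tokens ) {
            if ( known.hasOwnProperty( token ) ) {
                counts[ token ] = counts.hasOwnProperty( token ) ? counts[ token ] + 1 : 1;
            }
        }
    }
    // turn the counts into an array of { word, count } objects and sort it
    var ranked:Array = [];
    for ( var key:String in counts ) {
        ranked.push( { word: key, count: counts[ key ] } );
    }
    ranked.sortOn( "count", Array.NUMERIC | Array.DESCENDING );
    return ranked;
}

// usage: highest count first
var result:Array = rankByFrequency( "The lamb saw a lamb and a zebra.", [ 'lamb', 'zebra', 'cup' ] );
for each ( var row:Object in result ) {
    trace( row.word + ": " + row.count );
}
```

You would call it once per word list (nouns, verbs, adjectives) to get a ranked array for each.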
Thanks for the suggestion!
Another approach I was considering was to first remove all pronouns and prepositions from the source collection, and then index ALL the remaining words.
What should be left over is an index list of all nouns, verbs and adverbs.
I think the total list of all pronouns and prepositions (and conjunctions?) is much smaller than the total list of all nouns, verbs and adverbs, so this elimination-type search should be much quicker for any given collection...
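A rough sketch of that elimination idea, in case it helps. The stopWords array here is a tiny made-up sample, not a complete list, and indexWithoutStopWords is just a name I picked.

```actionscript
// Illustrative stop-word list only; a real one would be much longer
// and would also need articles, conjunctions and so on.
var stopWords:Array = [ 'a', 'an', 'the', 'and', 'or', 'but', 'as', 'of', 'in', 'on', 'to', 'with', 'her', 'his', 'she', 'he', 'it', 'they' ];

// Count every word in the text that is NOT in the stop list.
function indexWithoutStopWords( text:String, stopList:Array ):Object {
    var stop:Object = {};
    for each ( var s:String in stopList ) {
        stop[ s.toLowerCase() ] = true;
    }
    var counts:Object = {};
    var tokens:Array = text.toLowerCase().match( /[a-z']+/g );
    if ( tokens ) {
        for each ( var token:String in tokens ) {
            if ( !stop.hasOwnProperty( token ) ) {
                counts[ token ] = counts.hasOwnProperty( token ) ? counts[ token ] + 1 : 1;
            }
        }
    }
    return counts;
}

// usage
var remaining:Object = indexWithoutStopWords( "Mary had a little lamb her fleece was white as snow", stopWords );
for ( var w:String in remaining ) {
    trace( w + ": " + remaining[ w ] );
}
```

One thing to keep in mind: this cannot tell you which of the remaining words is a noun, a verb or an adjective. Everything that is not on the stop list ends up in the same index.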