I have a Ruby-on-Rails model:
class Candidate < ActiveRecord::Base
  validates_presence_of :application_essay
  validate :validate_length_of_application_essay
  protected
  def validate_length_of_application_essay
    return if application_essay.blank? # don't add a second error message if they didn't fill it out
    errors.add(:application_essay, :too_long) unless ...
  end
end
Without dropping into C, what is the fastest way to check that the application_essay contains no more than 500 words? You can assume that most essays will be at least 200 words, are unlikely to be more than 5000 words, and are in English (or the pseudo-English sometimes called "business-ese"). You can also classify anything you want as a "word" as long as your classification would be immediately obvious to a typical user. (NB: this is not the place to debate what a "typical user" is :) )
In Rails 3, using the :tokenizer option with a lambda works too:
validates_length_of :essay, :minimum => 100, :too_short => "Your essay must be at least 100 words.", :tokenizer => lambda {|str| str.scan(/\w+/) }
It may not be the fastest, but it is certainly the cleanest way.
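To see what the :tokenizer lambda actually counts as a word, here is a plain-Ruby sketch (no Rails needed) applying the same `scan(/\w+/)` call. The `word_count` helper is a hypothetical name for illustration. Note that `\w+` splits contractions and hyphenated words into several tokens, which may or may not match what a typical user expects.

```ruby
# Same tokenizer as in the validation above, usable outside Rails.
# /\w+/ matches runs of letters, digits and underscores, so "It's"
# becomes "It" + "s" and "best-in-class" becomes three tokens.
TOKENIZER = lambda { |str| str.scan(/\w+/) }

# Hypothetical helper: number of tokens the tokenizer produces.
def word_count(text)
  TOKENIZER.call(text).length
end

puts word_count("Synergy drives value.")          # => 3
puts word_count("It's a best-in-class offering")  # => 7
```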
You're not going to get any faster than a linear search, sorry (unless this is for some sort of text editor and you can keep track incrementally).
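The scan is still linear, but it does not have to read the whole essay: once 501 words have been seen, the answer is already known. A minimal sketch of that early exit, assuming a word is any run of non-whitespace characters; `within_word_limit?` is a hypothetical name, not part of Rails.

```ruby
# Hypothetical helper: counts whitespace-separated words, but stops scanning
# once limit + 1 words have been seen, so very long essays are rejected early.
def within_word_limit?(text, limit = 500)
  # to_enum(:scan, /\S+/) yields each match on demand; Enumerable#first(n)
  # breaks out of the underlying scan after n matches.
  text.to_enum(:scan, /\S+/).first(limit + 1).length <= limit
end

puts within_word_limit?("word " * 500)  # => true
puts within_word_limit?("word " * 501)  # => false
```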
I would just use something like:
string.split(" ").length <= 500
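One detail makes this one-liner more robust than it looks: in Ruby, `split(" ")` with a single-space string is special-cased to split on runs of any whitespace and to ignore leading and trailing whitespace. A small sketch with a hypothetical `essay_word_count` wrapper:

```ruby
# Hypothetical helper wrapping the one-liner above.
# With a single-space argument, String#split splits on runs of any
# whitespace (spaces, tabs, newlines) and ignores leading/trailing
# whitespace, so blank lines between paragraphs add no empty "words".
def essay_word_count(text)
  text.split(" ").length
end

essay = "First paragraph here.\n\n  Second   paragraph.\t"
puts essay_word_count(essay)         # => 5
puts essay_word_count(essay) <= 500  # => true
```

In the question's model, the expression would be applied to `application_essay`.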
What performance issue are you seeing? A string of 500 or so words shouldn't be much of a problem.
You could estimate the typical length of a word and guess the number of words by dividing the character count by it.
Some hints here: http://blogamundo.net/lab/wordlengths/
You could try something like 5.1 and see how accurate you are by running a few tests.
Well, probably divide by 6.1 instead, since there are also spaces between words.
Keep in mind that you would be assuming your text is not just a huge amount of whitespace or something similar. But if you are really just interested in making sure it has no more than x words, you could try a low per-word length, maybe 5: if the text has fewer than 5x characters, you can be pretty sure it does not have more than x words.
So you may be better off doing a linear search, as stated in the other answers. A linear search isn't that bad at all; it just depends on what you want to do.
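The length-based estimate can be sketched as follows. This approximates rather than counts; 6.1 is the figure suggested above, and both helper names are hypothetical. The pre-check is only a heuristic at 5 characters per word, whereas a bound of 2 characters per word is strict, since every word needs at least one character plus a separator.

```ruby
# Sketch of the length-based estimate; it approximates, it does not count.
# 6.1 is the suggested average: ~5.1 letters plus one space.
AVG_CHARS_PER_WORD = 6.1

def estimated_word_count(text)
  (text.length / AVG_CHARS_PER_WORD).round
end

# Cheap pre-check: a text shorter than max_words * min_chars characters
# very probably has at most max_words words. With min_chars = 5 this is only
# a heuristic (many one-letter words break it); min_chars = 2 is a strict
# bound, because every word needs at least one character plus a separator.
def probably_within_limit?(text, max_words, min_chars = 5)
  text.length < max_words * min_chars
end

puts estimated_word_count("x" * 3050)            # => 500
puts probably_within_limit?("word " * 100, 500)  # => true
```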
There's a plugin for that, though I haven't used it myself :)
http://code.google.com/p/validates-word-count/
That plugin collapses each run of adjacent "word characters" into a single character, then removes all non-word characters and counts what remains. Not sure if it's the fastest, though.
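The collapse-then-count idea can be re-implemented in a couple of lines; this is a sketch of the technique as described, not the plugin's actual code, and `collapsed_word_count` is a hypothetical name. It returns the same number as `text.scan(/\w+/).length`, and because it builds two intermediate strings it is probably not faster.

```ruby
# Re-implementation of the described approach (not the plugin's source):
# 1. replace every run of word characters with a single marker "w",
# 2. delete every remaining non-word character,
# 3. the length of what is left is the word count.
def collapsed_word_count(text)
  text.gsub(/\w+/, "w").gsub(/\W/, "").length
end

puts collapsed_word_count("Synergy-driven, best-in-class value!")  # => 6
```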
Here is a nice article that you might like
http://dotnetperls.com/word-count