I want to determine if a string is the name of a month and I want to do it relatively quickly. The function that is currently stuck in my brain is something like:
boolean isaMonth( String str ) {
String[] months = DateFormatSymbols.getInstance().getMonths();
String[] shortMonths = DateFormatSymbols.getInstance().getShortMonths();
int i;
for( i = 0; i<months.length(); ++i;) {
if( months[i].equals(str) ) return true;
if( shortMonths[i].equals(str ) return true;
}
return false;
}
However, I will be processing lots of text, passed one string at a time to this function, and most of the time I will be getting the worst case of going through the entire loop and returning false.
I saw another question that talked about a Regex to match a month name and a year which could be adapted for this situation. Would the Regex be faster? Is there any other solution that migh开发者_StackOverflow中文版t be faster?
Why not store the month names in a HashSet
? This will give you constant time lookup instead of the linear time lookup you are getting from your loop.
import java.util.HashSet;
import java.util.Collections;
import java.text.DateFormatSymbols;
class Test {
public static void main(String[] args) {
HashSet<String> months = new HashSet<String>(24);
Collections.addAll(months, DateFormatSymbols.getInstance().getMonths());
Collections.addAll(months, DateFormatSymbols.getInstance().getShortMonths());
System.out.println(months.contains(args[0]));
}
}
Merge months and shortMonths into a single sorted array and do a binary search on the array. Or merge them both into a Set (HashSet) and use contains. Change all the month names to lowercase and do the same with the search value, if you want to be case insensitive.
If you want to be able to retrieve the number of the month, merge them all into a Map (HashMap) with the value being the month number.
HashSet is a good general purpose solution - but I think you can do better. Take a look at the first letter of the months - jfmasond - if you pre-filter on those, and only do the HashSet check if it passes, it will take care of a huge number of your 'return false' scenarios.
You can set this up a couple of ways - one super easy way to do it is to use a switch statement, although a lookup table would be faster. Note also that you only need to do the check if the first character is between a and s, so a lookup table doesn't have to have the full unicode (or UTF-8 depending on the requirements) code space.
To make this even more effective, you can construct your lookup table so it contains the first 2 characters of each month - the resulting lookup table isn't too big, and this would drastically reduce the number of words that need to be checked against the hashset.
PS - before you do any of this, you should do some profiling and make sure that this is the area of your code that is actually the bottleneck.
精彩评论