I'm running a comparison program, and at the minute it does a direct 'string-to-string' comparison: if the two strings are an exact match, it outputs that they are a match.
Well, I was hoping to add a feature that allows for 'similarity'...
so for example:
String em1 = "52494646";
String em2 = "52400646";
if (em1.equals(em2)){
    output.writeUTF(dir + filenames[i]);
}
This is sort of a snippet of the code. I'd like it so that it skips over the "00" and still recognises it as 'almost' the same number and still outputs it.
I'd imagine it would look something like String em2 = "524" + ## + "646", but that's obviously just a concept.
Does anyone know if there is a way to have this kind of 'wildcard' (a term I've picked up from SQL at uni), or another way to do this kind of similarity check?
Thanks :)
You can use regular expressions:
if (em1.matches("524[0-9]{2}646")) {
    // do stuff
}
For Java-specific documentation, see the Pattern class. For some uses of regular expressions (such as in the sample above), there are shortcut methods in String: matches(), replaceAll()/replaceFirst() and split().
regular-expressions.info has good documentation on regular expressions in general.
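The String shortcuts compile the regular expression again on every call. If the check runs inside a loop over many file names, one option is to compile the pattern once with Pattern.compile and reuse it. A minimal sketch, where the class name, the similarTo method and the sample strings are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class PrecompiledPatternDemo {
    // Compiled once and reused; String.matches() compiles its pattern on every call.
    private static final Pattern SIMILAR = Pattern.compile("524[0-9]{2}646");

    // Returns the candidates that match the whole pattern (same semantics as String.matches).
    static List<String> similarTo(List<String> candidates) {
        List<String> result = new ArrayList<>();
        for (String s : candidates) {
            if (SIMILAR.matcher(s).matches()) {
                result.add(s);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> names = List.of("52494646", "52400646", "5249464", "52494647");
        System.out.println(similarTo(names)); // prints [52494646, 52400646]
    }
}
```

In the original loop, the body would call SIMILAR.matcher(...).matches() instead of em1.equals(em2).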
You can solve it easily using regular expressions, for instance:
if (em1.matches("524..646"))
(The . is a wildcard that stands for any character. You could replace it with \\d if you'd like to restrict the wildcard to digits.)
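To see the difference between the two wildcards, here is a small sketch; "524ab646" is a hypothetical input chosen because it has letters in the wildcard positions:

```java
public class DotVersusDigit {
    // "." matches any single character (except a line terminator)
    static boolean matchesAnyChar(String s) {
        return s.matches("524..646");
    }

    // "\\d" in Java source is the regex \d, which matches exactly one digit
    static boolean matchesDigits(String s) {
        return s.matches("524\\d\\d646");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"52494646", "524ab646"}) {
            System.out.println(s + " any=" + matchesAnyChar(s) + " digits=" + matchesDigits(s));
        }
        // prints:
        // 52494646 any=true digits=true
        // 524ab646 any=true digits=false
    }
}
```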
Here is a more general variant that treats every "0" in em2 as a wildcard for any digit:
String em1 = "52494646";
String em2 = "52400646";
if (em1.matches(em2.replaceAll("0", "\\\\d"))){
    System.out.println("Matches");
}
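The replaceAll trick works because em2 contains only digits; a template containing characters such as "." or "+" would have them interpreted as regex syntax. Using "0" as the wildcard also means a real zero can never be required at any position. A slightly more defensive sketch, under those assumptions, quotes the literal parts with Pattern.quote and lets you pick the wildcard character (toRegex is a hypothetical helper name):

```java
import java.util.regex.Pattern;

public class TemplateToRegex {
    // Builds a regex from a template in which every occurrence of `wildcard`
    // stands for exactly one digit; all other characters are matched literally.
    static String toRegex(String template, char wildcard) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : template.toCharArray()) {
            if (c == wildcard) {
                // flush the pending literal run, quoted so metacharacters lose their meaning
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                regex.append("\\d");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        return regex.toString();
    }

    public static void main(String[] args) {
        String em1 = "52494646";
        String em2 = "52400646";
        String regex = toRegex(em2, '0');
        System.out.println(regex);              // prints \Q524\E\d\d\Q646\E
        System.out.println(em1.matches(regex)); // prints true
    }
}
```

With '?' as the wildcard, toRegex("1.5?", '?') yields a pattern that accepts "1.53" but not "1x53", because the "." is matched literally.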
Usually you can use startsWith, endsWith or contains to find out whether a String starts with, ends with or contains another string. You can combine them, like:
number.startsWith("524") && number.endsWith("646");
A regular expression is likely to be the better choice in the vast majority of cases, but it is more expensive.
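One caveat with startsWith/endsWith alone: they don't check what, or how much, lies between the prefix and the suffix, so "524646" and "524123646" would also pass. A sketch that adds a length check (matchesAround is a hypothetical helper; like ".", it accepts any characters in the gap, not just digits):

```java
public class PrefixSuffixCheck {
    // True if s starts with prefix, ends with suffix and has exactly `gap`
    // characters between them. The length check is what rejects strings
    // such as "524646" or "524123646", which startsWith/endsWith alone accept.
    static boolean matchesAround(String s, String prefix, String suffix, int gap) {
        return s.length() == prefix.length() + gap + suffix.length()
                && s.startsWith(prefix)
                && s.endsWith(suffix);
    }

    public static void main(String[] args) {
        System.out.println(matchesAround("52494646", "524", "646", 2));  // true
        System.out.println(matchesAround("524646", "524", "646", 2));    // false
        System.out.println(matchesAround("524123646", "524", "646", 2)); // false
    }
}
```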
I think the problem with the aforementioned regex solutions is that you're not interested in numbers that are identical except at two fixed positions, but in numbers that are identical except for one or two digits anywhere.
That is a somewhat more intricate problem, but you basically want to compute the Hamming distance (http://en.wikipedia.org/wiki/Hamming_distance) between your two strings. It's a well-known algorithm used for lots of problems, so you should find plenty of examples, but I fear the standard library won't do it for you. Still, it's just a for loop and a counter, so implementing it shouldn't be a problem. Compared with the built-in equality check you lose only a small optimization (it can return immediately when both references point to the same string object), and for strings that are equal, both approaches have to walk the whole string anyway.
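A minimal Java sketch of the Hamming-distance approach, assuming both strings have the same length and that "similar" means "differs in at most maxDiff positions" (the threshold and the method names are illustrative choices, not from the question):

```java
public class HammingDistance {
    // Number of positions at which two equal-length strings differ.
    static int hamming(String a, String b) {
        if (a.length() != b.length()) {
            throw new IllegalArgumentException("Hamming distance needs strings of equal length");
        }
        int distance = 0;
        for (int i = 0; i < a.length(); i++) {
            if (a.charAt(i) != b.charAt(i)) {
                distance++;
            }
        }
        return distance;
    }

    // Treats two strings as "similar" if they have the same length
    // and differ in at most maxDiff positions.
    static boolean similar(String a, String b, int maxDiff) {
        return a.length() == b.length() && hamming(a, b) <= maxDiff;
    }

    public static void main(String[] args) {
        System.out.println(hamming("52494646", "52400646"));    // 2
        System.out.println(similar("52494646", "52400646", 2)); // true
        System.out.println(similar("52494646", "12400646", 2)); // false (3 differences)
    }
}
```

Unlike the regex versions, this accepts differences at any position, e.g. "12494646" vs "52494646" with maxDiff 1.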
Regular expressions would be the way to do this. For your example, you would want something like "524\\d{2}646". See the Java API for regex (the Pattern class).
Also take a look at the useful Apache Commons IO library, since it sounds like you are dealing with files: https://commons.apache.org/proper/commons-io/javadocs/api-release/index.html?org/apache/commons/io/package-summary.html
You should use regular expressions for this.
Well, unfortunately, I believe Apache Commons' StringUtils doesn't have any wildcard operation.
If I remember correctly, there's a StringUtils class in the MySQL JDBC connector that has a method for comparing strings with wildcards.
Or you can try using some fuzzy logic: http://jfuzzylogic.sourceforge.net/html/index.html
Why are people reluctant to just write a simple & direct algorithm?
boolean equals(String s1, String s2, char wildcard) {
    // strings of different length can never match
    if (s1.length() != s2.length())
        return false;
    for (int i = 0; i < s1.length(); i++) {
        char c1 = s1.charAt(i), c2 = s2.charAt(i);
        // a wildcard on either side matches any character
        if (c1 != wildcard && c2 != wildcard && c1 != c2)
            return false;
    }
    return true;
}
If you are looking for a different way to express a wildcard, here is an option:
String em1 = "52494646";
String em2 = "52400646";
if (em2.startsWith("524")){
    output.writeUTF(dir + filenames[i]);
}