开发者

Hashcode comparison problem

开发者 https://www.devze.com 2022-12-21 09:41 出处:网络
I have list of a an object 开发者_Python百科which is termed as rule in our case, this object itself is a list of field for which I have to do hashcode comparison as we can\'t duplicate rule in the

I have list of a an object 开发者_Python百科which is termed as rule in our case, this object itself is a list of field for which I have to do hashcode comparison as we can't duplicate rule in the system.

i.e Let say I have two Rules R1 and R2 with fields A & B.

Now if values of A & B in R1 are 7 and 2 respectively.

And in R2 it's 3 and 4 respectively then the process I have used to check the duplicity of Rules in the system that is hashcode comparison fails

the method which I have used is

for(Rule rule : rules){
changeableAttrCode=0;

fieldCounter=1;

attributes = rule.getAttributes();

for(RuleField ruleField : attributes){

changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());

fieldCounter++;

}
parameters = rule.getParameters();

for(RuleField ruleField : parameters){

changeableAttrCode = changeableAttrCode + (fieldCounter * ruleField.getValue().hashCode());

fieldCounter++;

}

changeableAttrCodes.add(changeableAttrCode);

here changeableAttrCodes where we store the hashcode of all the rules.

so can please suggest me better method so that this kind of problem does not arise in future as well as duplicity of rules in system can be seen.

Thanks in advance


hashcode() is not meant to be used to check for equality. return 42; is a perfectly valid implementation of hashcode(). Why don't you overwrite equals() (and hashcode() for that matter) in the rules objects and use that to check whether two rules are equal? You could still use the hashcode to check which objects you need to investigate, since two equal() objects should always have the same hashcode, but that is a performance improvement that you may or may not need, depending on your system.


  • Implement hashCode and equals in class Rule.
  • Implementation of equals has to compare its values.

Then use a HashSet<Rule> and ask if(mySet.contains(newRule))

HashSet + equals implementation solves the problem of the non-uniqueness of the hash. It uses hash for classifying and speed but it uses equals at the end to ensure that two Rules with same hash are the same Rule or not.

More on hash: if you want to do it by hand, use the prime number sudggestion, and review the JDK code for string hashcodes. If you want to make a clean implementation try to retrieve the hashcode of the elements, make some kind of array of ints and use Arrays.hashCode(int[]) to get a hashcode for the combination of them.


Updated Your hashing algorithm is not producing a good spread of hash values - it gives the same value for (7, 2) and (3, 4):

1 * 7 + 2 * 2 = 11
1 * 3 + 2 * 4 = 11

It would also give the same value for (11, 0), (-1, 6), ... and one can trivially make up an endless number of similar equivalence classes based on your current algorithm.

Of course you can not avoid collisions - if you have enough instances, hash collision is inevitable. However, you should aim to minimize the chance for collisions. Good hashing algorithms strive to spread hash values equally over a wide range of values. A typical way to achieve this is to generate the hash value for an object containing n independent fields as an n-digit number with a base big enough to hold the different hash values for the individual fields.

In your case, instead of multiplying with fieldCounter you should multiply with a prime constant, e.g. 31 (that would be the base of your number). And add another prime constant to the result, e.g. 17. This gives you a better spread of hash values. (Of course the concrete base depends on what values can your fields take - I have no info about that.)

Also if you implement hashCode, you are strongly advised to implement equals as well - and in fact, you should use the latter to test for equality.

Here is an article about implementing hashCode.


I don't understand what you are trying to do here. With most hash function scenarios, collision is inevitable, because there are way more objects to hash than there are possible hash values (it's a pigeonhole principle).

It is generally the case that two different objects may have the same hash value. You cannot rely on hash functions alone to eliminate duplicates.

Some hash functions are better than others in minimizing collisions, but it's still an inevitability.


That said, there are some simple guidelines that usually gives a good enough hash function. Joshua Bloch gives the following in his book Effective Java 2nd Edition:

  • Store some constant nonzero value, say 17, in an int variable called result.
  • Compute an int hashcode c for each field:
    • If the field is a boolean, compute (f ? 1 : 0)
    • If the field is a byte, char, short, int, compute (int) f
    • If the field is a long, compute (int) (f ^ (f >>> 32))
    • If the field is a float, compute Float.floatToIntBits(f)
    • If the field is a double, compute Double.doubleToLongBits(f), then hash the resulting long as in above.
    • If the field is an object reference and this class's equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If the value of the field is null, return 0.
    • If the field is an array, treat it as if each element is a separate field. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
  • Combine the hashcode c into result as follows: result = 31 * result + c;


I started to write that the only way you can achieve what you want is with Perfect Hashing.

But then I thought about the fact that you said you can't duplicate objects in your system.

Edit based on thought-provoking comment from helios:

Your solution depends on what you meant when you wrote that you "can't duplicate rules".

If you meant that literally you cannot, that there is guaranteed to be only one instance of a rule with a particular set of values, then your problem is trivial: you can do identity comparison, in which case you can do identity comparison using ==.

On the other hand, you meant that you shouldn't for some reason (performance), then your problem is also trivial: just do value comparisons.

Given the way you've defined your problem, under no circumstances should you be considering the use of hashcodes as a substitute for equality. As others have noted, hashcodes by their nature yield collisions (false equality), unless you go to a Perfect Hashing solution, but why would you in this case?

0

精彩评论

暂无评论...
验证码 换一张
取 消

关注公众号