#include <iostream>
#include <iomanip>
#include <string>
#include <vector>
using namespace std;
class Item {
public:
Item(const string & v): value(v), next(0) { }
string value;
Item * next;
};
int hash_function(const string & s)
{
unsigned int hashval = 0;
int i = s.length();
while (i > 0)
{
hashval += s[--i];
}
return hashval%101;
}
main()
{
string name;
int index;
Item * p;
vector<Item *> bucket(101);
for (index = 0; index < 101; index++)
bucket[index] = 0;
while (cin >> name) {
p = new Item(name);
index = hash_function(name);
// push front
if (bucket[index] != 0)
p->next = bucket[index];
bucket[index] = p;
}
for (index = 0; index < 101; index++)
if (bucket[index] != 0) {
cout << setw(3) << index << ": ";
p = bucket[index];
while (p != 0) {
cout << p->value << " ";
p = p->next;
}
cout << endl;
}
Item * temp;
for (index = 0; index < 101; index++) {
p = bucket[index];
while (p != 0) {
temp = p;
p = p->next;
delete temp;
}
}
}
which contains two very simple hash functions. I'm trying to work on the one which is not commented out, as it seems like the better of the two when tested. I want a set of names that is input to be distributed evenly in it's own bucket and so far, that seems to 开发者_StackOverflowbe working, with the exception of names which begin with the same letter. For example, Amy and Alice will appear in the same bucket and so on.
Here is a sample input/output:
Alice
Amy
Barry
Carrie
David
Garret
Edward
Henry
Ingrid
Fred
65: Amy Alice
66: Barry
67: Carrie
68: David
69: Edward
70: Fred
71: Garret
72: Henry
73: Ingrid
What can I add to my algorithm that would allow Amy and Alice to be placed in their own bucket?
Your function hash_function
isn't actually returning a value. You should pay more attention to your compiler's warnings!
Apparently it happens to have the effect of returning the first character in the string. This is purely arbitrary. On another platform it might always return zero, or cause your computer to explode. (Probably not actually the latter.)
As for making a better hash function: once you fix this bug, you'll no longer find that the hash value depends only on the first character. However, you will find e.g. that "Brian" and "Brain" hash to the same value. That's the next thing you should think about.
Instead of blindly adding each letter, give some weight to each, so that cpp
, pcp
, ppc
all could produce different hashvalue.
Here is little improved version:
int hash_function(const string & s)
{
double hashval = 0;
int i = s.length();
double weight = 1.0;
while (i > 0)
{
hashval += weight * s[--i];
weight *= 1.5;
}
return (int) hashval;
}
Assuming the string s
is not too long, otherwise there will be overflow!
Check these out(suggested by the google sparsehash): Bob Jenkins: http://burtleburtle.net/bob/hash/ or Paul Hsieh: http://www.azillionmonkeys.com/qed/hash.html
Try weighting the different letters differently. In your current implementation (assuming it worked, as mentioned above), the name ab would hash to the same value as ba. Something like:
for (int i = 0 to str.len())
hash = hash + hash + str[i]
would return different values for two strings with the same letters, yet is still very simple.
精彩评论