开发者

Consistent String#hash based only on the string's content

开发者 https://www.devze.com 2023-03-16 17:57 出处:网络
GOAL: Map every URL handled by a server to 0, 1, 2, or 3, distributing as uniformly as possible. While the documentation for ruby\'s String#hash method says it will \"return a hash based on the strin

GOAL: Map every URL handled by a server to 0, 1, 2, or 3, distributing as uniformly as possible.

While the documentation for ruby's String#hash method says it will "return a hash based on the string‘s length an开发者_高级运维d content," this clearly isn't the whole story. A given string's hash is not consistent across invocations of the interpreter:

$ irb
ruby-1.9.2-p180 :001 > "foo".hash
 => 360517580588231756 
ruby-1.9.2-p180 :002 > ^D

$ irb
ruby-1.9.2-p180 :001 > "foo".hash
 => -2716152678666510148 

This means a particular string's hash value may differ across, say, servers. Rails uses String#hash internally to map a URL path to one of four asset hosts (if the app's asset_host is so configured), but this feature is a lot less efficient than it could be because of the cross-machine inconsistencies; different servers may map the same URL to different asset hosts, reducing the effectiveness of caches, clouding skies, cooling cups of tea prematurely, besmirching the reputations of otherwise fine programmers.

Can you suggest an alternate hash function that could effectively and speedily distribute hashes across a typical app's URL space, preferably one that produces a Fixnum since, in the end, I'll want to map it into one of four asset hosts?


there are lot of such functionality in ruby's digest module: http://ruby-doc.org/stdlib/libdoc/digest/rdoc/index.html

simple example:

require 'digest/sha1'
Digest::SHA1.hexdigest("some string")


The easiest (and consistent) way may be this (and it's fast):

"https://www.example.com/abc/def/123?hij=345".sum % 4

That will always produce an integer 0 - 3, is quite fast, and should be fairly well distributed (though I haven't actually run tests on distribution).


There is tiny library xxHash:

XXhash.xxh32('qwe') #=> 2396643526
XXhash.xxh64('qwe') #=> 9343136760830690622

Maybe it will have more collisions but it is 10x faster than SHA1:

Benchmark.bm do |x|
  n = 100_000
  str = 'qweqweqwe'
  x.report('xxhash32') { n.times { XXhash.xxh32(str) } }
  x.report('xxhash64') { n.times { XXhash.xxh64(str) } }
  x.report('hexadigest') { n.times { Digest::SHA1.hexdigest(str) } }
end;1

#       user     system      total        real
# xxhash32  0.020000   0.000000   0.020000 (  0.021948)
# xxhash64  0.040000   0.000000   0.040000 (  0.036340)
# hexadigest  0.240000   0.030000   0.270000 (  0.276443)


You can try to_i(36).

"Hash me please :(".to_i(36)
=> 807137
0

精彩评论

暂无评论...
验证码 换一张
取 消