Ruby: hash that doesn't remember key values

Developer https://www.devze.com 2023-03-08 11:04 Source: web
Is there a hash implementation around that doesn't remember key values? I have to make a giant hash but I don't care what the keys are.

Edit:

Ruby's hash implementation stores the key's value. I would like a hash that doesn't remember the key's value: it just uses the hash function to store your value and then forgets the key. The reason is that I need to build a hash for about 5 GB of data, and I don't care what the key values are after creating it. I only want to be able to look up the values based on other keys.

Edit Edit:

The language is kind of confusing. By key's value I mean this:

hsh['value'] = data

I don't care what 'value' is after the hash function stores data in the hash.

Edit^3:

Okay so here's what I am doing: I am generating every 35-letter (nucleotide) kmer for a set of multiple genes. Each gene has an ID. The hash looks like this:

kmers = { 'A...G' => [1, 5, 3], 'G...T' => [4, 9, 9, 3]  }

So the hash key is the kmer, and the value is an array containing IDs for the gene(s)/string(s) that have that kmer.
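As a rough illustration of that structure, here is a minimal sketch of how such an index might be built. The `genes` input (gene ID mapped to sequence) and the shortened k of 3 are assumptions made for the example, not details from the question.

```ruby
# Hypothetical toy input: gene ID => nucleotide sequence.
genes = { 1 => "ACGTAC", 2 => "GTACGG" }
k = 3 # the question uses 35; shortened here so the result is easy to read

# Each new kmer starts with an empty array of gene IDs.
kmers = Hash.new { |hash, kmer| hash[kmer] = [] }

genes.each do |gene_id, sequence|
  # Slide a window of length k over the sequence.
  (0..sequence.length - k).each do |start|
    kmers[sequence[start, k]] << gene_id
  end
end

p kmers["GTA"]            # => [1, 2]
p kmers["CGT"]            # => [1]
p kmers.fetch("TTT", [])  # => [] (fetch avoids inserting an empty entry on a miss)
```

An ID is appended once per occurrence, so a gene that contains the same kmer twice would appear twice in the array, as in the [4, 9, 9, 3] example.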

I am querying the hash for kmers in another dataset to quickly find matching genes. I don't care what the hash keys are, I just need to get the array of numbers from a kmer.

>> kmers['A...G']
=> [1, 5, 3]

>> kmers.keys.first
=> "Sorry Dave, I can't do that"
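The behaviour described above could be approximated with a thin wrapper that stores only an integer digest of each key. This is a sketch under assumptions: `KeylessIndex` is a hypothetical class, not part of Ruby or any gem, and it accepts that two different kmers with the same `String#hash` value would share one bucket.

```ruby
# Hypothetical class: keeps the integer digest of each key, never the key itself.
class KeylessIndex
  def initialize
    @buckets = {}
  end

  # Record that gene `id` contains `key`; only key.hash is stored.
  def add(key, id)
    (@buckets[key.hash] ||= []) << id
  end

  # Look up by recomputing the digest; the original string is never needed.
  def [](key)
    @buckets.fetch(key.hash, [])
  end
end

index = KeylessIndex.new
index.add("ACGT", 1)
index.add("ACGT", 5)
index.add("GGTT", 4)

p index["ACGT"] # => [1, 5]
p index["GGTT"] # => [4]
```

Two caveats: `String#hash` is seeded per Ruby process, so these digests cannot be saved and reused in a later run, and whether this actually uses less memory than a plain Hash with string keys is something to measure rather than assume.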


I guess you want a set, although it stores only unique keys and no values. It has the same fast lookup time as a hash. Set is included in the standard library.

require 'set'
s = Set.new
s << 'aaa'
p s.merge(['ccc', 'ddd'])  #=> #<Set: {"aaa", "ccc", "ddd"}>
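For completeness, a short sketch of the lookup side, using only standard Set methods. It shows that a Set can answer whether an element is present but has no associated value to return.

```ruby
require 'set'

seen = Set.new(['aaa', 'ccc'])

p seen.include?('aaa') # => true
p seen.include?('zzz') # => false

seen << 'aaa'          # adding an existing element changes nothing
p seen.size            # => 2
```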


Even if there was an oddball hash that just recorded existence (which is how I understand the question) you probably wouldn't want to use it, as the built-in Hash would be simpler, faster, not require a gem, etc. So just set...

 h[k] = k

...and call it a day...


I assume the 5 GB string is a genome, and the kmers are 35 base pair nucleotide sequences.

What I'd probably do (slightly simplified) is:

require 'set'

K = 35  # kmer length

# String is not Enumerable and has no each_cons, so slice each window by index.
human_genome = File.read("human_genome.txt")
human_kmers = Set.new
(0..human_genome.length - K).each do |start|
  human_kmers << human_genome[start, K]  # Set#<< already ignores duplicates
end

unknown_gene = File.read("unknown_gene.txt")
related_to_humans = (0..unknown_gene.length - K).any? do |start|
  human_kmers.include?(unknown_gene[start, K])
end


I have to make a giant hash but I don't care what the keys are.

That is called an array. Just use an array. A hash without keys is not a hash at all and loses its value. If you don't need key-value lookup then you don't need a hash.


Use an Array. An Array indexes by integers instead of keys. http://www.ruby-doc.org/core/classes/Array.html

a = []
a << "hello"
p a #=> ["hello"]