Is there a hash implementation around that doesn't remember key values? I have to make a giant hash but I don't care what the keys are.
Edit:
Ruby's hash implementation stores the key's value. I would like a hash that doesn't remember the key's value. It just uses the hash function to store your value and forgets the key. The reason for this is that I need to make a hash for about 5 GB of data and I don't care what the key values are after creating it. I only want to be able to look up the values based on other keys.
Edit Edit:
The language is kind of confusing. By key's value I mean this:
hsh['value'] = data
I don't care what 'value' is after the hash function stores data in the hash.
Edit^3:
Okay so here's what I am doing: I am generating every 35-letter (nucleotide) kmer for a set of multiple genes. Each gene has an ID. The hash looks like this:
kmers = { 'A...G' => [1, 5, 3], 'G...T' => [4, 9, 9, 3] }
So the hash key is the kmer, and the value is an array containing IDs for the gene(s)/string(s) that have that kmer.
I am querying the hash for kmers in another dataset to quickly find matching genes. I don't care what the hash keys are; I just need to get the array of numbers from a kmer.
>> kmers['A...G']
=> [1, 5, 3]
>> kmers.keys.first
=> "Sorry Dave, I can't do that"
I guess you want a Set, although it stores only unique keys and no values. It has the same fast lookup time as a Hash. Set is included in the standard library.
require 'set'
s = Set.new
s << 'aaa'
p s.merge(['ccc', 'ddd']) #=> #<Set: {"aaa", "ccc", "ddd"}>
Even if there was an oddball hash that just recorded existence (which is how I understand the question) you probably wouldn't want to use it, as the built-in Hash would be simpler, faster, not require a gem, etc. So just set...
h[k] = k
...and call it a day...
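For the kmer case described in the question, the same built-in Hash can hold the gene IDs directly. A minimal sketch, assuming the genes arrive as an `{ id => sequence }` mapping; the method name `build_kmer_index` and the sample data are made up for illustration, and `k` is 4 here only to keep the example small (the question uses 35):

```ruby
# Build a kmer => [gene IDs] index with the built-in Hash.
# `genes` is a hypothetical { id => sequence } mapping.
def build_kmer_index(genes, k)
  index = Hash.new { |hash, key| hash[key] = [] }
  genes.each do |id, seq|
    # Slide a window of k characters across the sequence.
    (0..seq.length - k).each do |i|
      ids = index[seq[i, k]]
      # Genes are processed one at a time, so checking the last
      # entry is enough to avoid repeating an ID within one gene.
      ids << id unless ids.last == id
    end
  end
  index
end

genes = { 1 => "ACGTACGT", 2 => "CGTACGTT" }
kmers = build_kmer_index(genes, 4)
p kmers["ACGT"]           #=> [1, 2]
p kmers.fetch("TTTT", []) #=> []
```

Querying with `fetch` and a default avoids creating an empty entry for every kmer that is not in the index, which matters when the query dataset is large.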
I assume the 5 GB string is a genome, and the kmers are 35-base-pair nucleotide sequences.
What I'd probably do (slightly simplified) is:
require 'set'

human_genome = File.read("human_genome.txt")
human_kmers = Set.new
# String has no each_cons, so slide a 35-character window by index.
(0..human_genome.length - 35).each do |i|
  human_kmers << human_genome[i, 35] # a Set ignores duplicates on its own
end
unknown_gene = File.read("unknown_gene.txt")
# True as soon as any 35-mer of the unknown gene is found in the genome.
related_to_humans = (0..unknown_gene.length - 35).any? do |i|
  human_kmers.include?(unknown_gene[i, 35])
end
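If the goal is to avoid keeping the kmer strings around as keys, one option is to pack each kmer into an Integer and use that as the key instead. This is only a sketch under assumptions: the sequences contain nothing but A, C, G and T, every kmer has the same length, and the names `pack_kmer`, `build_packed_index` and `lookup_ids` are invented for this example. Note that 35 bases need 70 bits, which is more than a small immediate integer holds on 64-bit CRuby, so whether this actually saves memory would need measuring.

```ruby
# Pack a kmer into one Integer, 2 bits per base.
# Assumes only A, C, G, T occur and that all kmers share one length
# (kmers of different lengths could otherwise pack to the same number).
def pack_kmer(kmer)
  kmer.each_char.reduce(0) do |packed, base|
    code = "ACGT".index(base)
    raise ArgumentError, "unexpected base: #{base.inspect}" if code.nil?
    (packed << 2) | code
  end
end

# Build a packed-kmer => [gene IDs] index.
# `genes` is a hypothetical { id => sequence } mapping.
def build_packed_index(genes, k)
  index = Hash.new { |hash, key| hash[key] = [] }
  genes.each do |id, seq|
    (0..seq.length - k).each do |i|
      ids = index[pack_kmer(seq[i, k])]
      ids << id unless ids.last == id
    end
  end
  index
end

# Look up gene IDs for a query kmer without adding entries on a miss.
def lookup_ids(index, kmer)
  index.fetch(pack_kmer(kmer), [])
end

index = build_packed_index({ 1 => "ACGTACGT", 2 => "CGTACGTT" }, 4)
p lookup_ids(index, "ACGT") #=> [1, 2]
p index.keys.first.class    #=> Integer
```

Unlike keying on `String#hash`, the packing is lossless, so two different kmers never share a key and lookups cannot return false matches.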
I have to make a giant hash but I don't care what the keys are.
That is called an array. Just use an array. A hash without keys is not a hash at all and loses its value. If you don't need key-value lookup then you don't need a hash.
Use an Array. An Array indexes by integers instead of keys. http://www.ruby-doc.org/core/classes/Array.html
a = []
a << "hello"
p a #=> ["hello"]