开发者

Array-like data structure efficient in terms of insert & find time

开发者 https://www.devze.com 2023-03-03 08:00 出处:网络
I need a data structure, implemented in Ruby language, to house a huge amount of distinct urls (e.g. 10**10 pieces), so the performance is my concern.

I need a data structure, implemented in Ruby language, to house a huge amount of distinct urls (e.g. 10**10 pieces), so the performance is my concern. I am using an Array and keep its elements ordered, so I can perform binary search to find if one already exists or where to insert a url rapidly. I wrote this:

class UrlStore
    def initialize(*args)
        @urls = []
        args.each { |e| add(e) unless e.class != String }
    end

    def add(url)
        p = find(url)
        @urls.insert(p, url) unless p.class == String
    end


    def find(url)
        l, m, h = 0, 0, @urls.size - 1

        while l <= h do
            m = l + (h - l) / 2
            b = url <=> @urls[m]
            if b == 0
                return m.to_s
            elsif b == 1
                l = m + 1
            else
                h = m - 1
            end
        end

        return l
    end
end

The find method will, if found, return the position of the url in hosting array, but in String form in order to distinguish from those positions found to be inserted; Otherwise, return an integer(Fixnum) telling wher开发者_运维技巧e the url should go to keep the array ordered.

But note that, I use Array#insert to add an element at a specified position. My intuition tells me that this method will move all elements after insert-point a step backward, which may cause severe performance deduction. The Array module is not in the standard library, its Ruby's intrinsic data type, so I don't know if I am right.

May be it's so naive a data structure for hosting such a task. So can any one share an awesome one.


If, as Phrogz has suggested you do manage to get 370GB of memory, or you realise you don't actually need to store that many URLs, you might want to look into using a SortedSet.


You might want to look at the growing number of Open source NoSQL solutions including MongoDB, Cassandra, Kyoto Cabinet or Redis.

MongoHQ provides a free hosting service for MongoDB. RedisToGo provides a free hosting service for Redis. Both have very easy to use Ruby bindings. I have used both and recommend them.

I have heard good things about Cassandra and Kyoto Cabinet but have not used them in any production app.

0

精彩评论

暂无评论...
验证码 换一张
取 消