
Reading strings from one file and adding to another file with suffix to make unique

Developer https://www.devze.com 2023-03-07 15:44 Source: web

I am processing documents in Ruby.

I have a document from which I extract specific strings using a regexp and then add them to another file. Each string must be unique in the destination file, so if it already exists there I add a simple suffix, e.g. <word>_1. Eventually I want to reference the strings by name, so a random number or a date-based string is no good.

At present I store each added word in an array, and every time I add a word I check that the string doesn't already exist in the array. That is fine if there is only one duplicate, but there might be two or more, so I need to check for the initial string and then loop, incrementing the suffix until the result doesn't exist. (I have simplified my code, so there may be bugs.)

def add_word(word)
  if @added_words.include? word
    suffix = 1
    suffixed_word = word
    while @added_words.include? suffixed_word
      suffixed_word = word + "_" + suffix.to_s
      suffix += 1
    end
    word = suffixed_word
  end
  @added_words << word
end

It looks messy. Is there a better algorithm or a more Ruby-like way of doing this?


Make @added_words a Set (don't forget to require 'set'). Lookup is faster because sets are implemented with hashes, and you still use include? to check for membership. It's also easy to extract the highest used suffix:

>> require 'set'; s = Set.new
#=> #<Set: {}>
>> s << 'foo'
#=> #<Set: {"foo"}>
>> s << 'foo_1'
#=> #<Set: {"foo", "foo_1"}>
>> word = 'foo'
#=> "foo"
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }
#=> "foo_1"
>> s << 'foo_12'
#=> #<Set: {"foo", "foo_1", "foo_12"}>
>> s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }
#=> "foo_12"

The suffix is compared as an integer, so foo_12 ranks above foo_9. Now, to get the next value you can insert, you could just do the following (imagine you already had 12 foos, so the next should be foo_13):

>> s << s.max_by { |w| w =~ /#{word}_?(\d+)?/ ; $1.to_i }.next
#=> #<Set: {"foo", "foo_1", "foo_12", "foo_13"}>

Sorry if the examples are a bit confused; I had anesthesia earlier today. They should be enough to give you an idea of how sets could help, though (most of this would work with an array too, but sets have faster lookup).
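
To make the "highest used suffix" idea a bit more concrete, here is a minimal sketch. The helper name next_name is made up for illustration and is not part of the answer above. It anchors the pattern so that a different word such as "food" is not mistaken for a suffixed "foo", and it returns the bare word when nothing has been added yet.

```ruby
require 'set'

# Hypothetical helper (an assumption, not from the answer above): returns the
# next free name for `word`, given the names already stored in `names`.
def next_name(names, word)
  return word unless names.include?(word)

  # Match only "<word>_<digits>", so "food" is not treated as a suffixed "foo".
  pattern = /\A#{Regexp.escape(word)}_(\d+)\z/
  # Collect the numeric suffixes already in use; 0 if there are none yet.
  highest = names.map { |n| n[pattern, 1] }.compact.map(&:to_i).max || 0
  "#{word}_#{highest + 1}"
end

s = Set.new(%w[foo foo_1 foo_9 foo_12 food])
puts next_name(s, 'foo') # foo_13
puts next_name(s, 'bar') # bar
```

Note that this scans the whole set on every call, so it trades speed for simplicity.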


Change @added_words to a Hash with a default of zero. Then you can do:

@added_words = Hash.new(0)

def add_word(word)
  @added_words[word] += 1
end

# put it to work:

list = %w(test foo bar test bar bar)
names = list.map do |w|
  "#{w}_#{add_word(w)}"
end
p @added_words
#=> {"test"=>2, "foo"=>1, "bar"=>3}
p names
#=> ["test_1", "foo_1", "bar_1", "test_2", "bar_2", "bar_3"]
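
If the first occurrence should stay unsuffixed, as in the question's <word>_1 scheme, the same counter can be used slightly differently. This is only a sketch: the class name UniqueNamer is an assumption, and it does not guard against an input that is literally "test_1".

```ruby
# Hypothetical class name, used only for illustration.
class UniqueNamer
  def initialize
    @counts = Hash.new(0) # how many times each base word has been seen
  end

  # Returns the word unchanged the first time, then word_1, word_2, ...
  def add_word(word)
    seen = @counts[word]
    @counts[word] += 1
    seen.zero? ? word : "#{word}_#{seen}"
  end
end

namer = UniqueNamer.new
names = %w[test foo bar test bar bar].map { |w| namer.add_word(w) }
p names
#=> ["test", "foo", "bar", "test_1", "bar_1", "bar_2"]
```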


In that case, I'd probably use a set or hash:

# in your class:
require 'set'
require 'forwardable'
extend Forwardable # only needed to keep your previous API

# wherever you set up your instance variable (it's probably [] at the moment):
def initialize
  @added_words = Set.new
end

# then, instead of `def add_word(word); @added_words.add(word); end`:
def_delegator :@added_words, :add, :add_word
# or just change the calling loop to use @added_words.add('word') rather than add_word('word').
# @added_words.add('word') does nothing if 'word' already exists in the set.
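
Put together as one runnable piece, the delegation might look like the sketch below. The class name WordStore and the extra words delegator are assumptions added for the demo. Set#add? is also shown, since it returns nil when the element is already present, which is handy if you need to know whether a word was a duplicate.

```ruby
require 'set'
require 'forwardable'

# Hypothetical class name, used only for illustration.
class WordStore
  extend Forwardable

  # add_word(w) forwards to @added_words.add(w)
  def_delegator :@added_words, :add, :add_word
  # words forwards to @added_words.to_a (added here just to inspect the set)
  def_delegator :@added_words, :to_a, :words

  def initialize
    @added_words = Set.new
  end
end

store = WordStore.new
store.add_word('foo')
store.add_word('foo') # silently ignored, 'foo' is already in the set
p store.words         #=> ["foo"]

# Set#add? reports duplicates: it returns nil if the element was already there.
p Set.new(['foo']).add?('foo') #=> nil
```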

If you've got some attributes that you're grouping via these sections, then a hash might be better:

# wherever you set up your instance variable (it's probably [] at the moment):
def initialize
  @added_words = {}
end

def add_word(word, attrs = {})
  @added_words[word] ||= []
  @added_words[word].push(attrs)
end
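
As a rough usage sketch of the hash version: each word maps to an array of attribute hashes, so the position within that array can serve as the suffix. The wrapper class WordIndex, the unique_names method, and the line attribute are all made up for this example.

```ruby
# Hypothetical wrapper class, only here to make the snippet runnable.
class WordIndex
  attr_reader :added_words

  def initialize
    @added_words = {}
  end

  # Same method as above: group each occurrence's attributes under its word.
  def add_word(word, attrs = {})
    @added_words[word] ||= []
    @added_words[word].push(attrs)
  end

  # Derive unique names from each occurrence's position within its group.
  def unique_names
    @added_words.flat_map do |word, occurrences|
      occurrences.each_index.map { |i| i.zero? ? word : "#{word}_#{i}" }
    end
  end
end

index = WordIndex.new
index.add_word('foo', { line: 3 })
index.add_word('bar', { line: 5 })
index.add_word('foo', { line: 9 })
p index.unique_names #=> ["foo", "foo_1", "bar"]
```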


Doing it the "wrong way", but in slightly nicer code:

def add_word(word) 
  if @added_words.include? word
    suffixed_word = 1.upto(Float::INFINITY) do |suffix|
      candidate = [word, suffix].join("_")
      break candidate unless @added_words.include?(candidate)
    end
    word = suffixed_word
  end
  @added_words << word
end
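
The loop above restarts from suffix 1 every time. One possible refinement, sketched here under the assumption that the words live in a Set, is to remember the next suffix to try for each base word. The class name SuffixedNames and the @next_suffix hash are made up for illustration; the method returns the name that was actually stored.

```ruby
require 'set'

# Hypothetical class name, used only for illustration.
class SuffixedNames
  def initialize
    @added_words = Set.new
    @next_suffix = Hash.new(1) # next suffix to try for each base word
  end

  # Stores `word`, or the first free "word_N", and returns the stored name.
  def add_word(word)
    candidate = word
    # Set#add? returns nil when the candidate is already taken.
    until @added_words.add?(candidate)
      candidate = "#{word}_#{@next_suffix[word]}"
      @next_suffix[word] += 1
    end
    candidate
  end
end

names = SuffixedNames.new
p %w[foo foo foo_1 foo].map { |w| names.add_word(w) }
#=> ["foo", "foo_1", "foo_1_1", "foo_2"]
```

Because every candidate is still checked against the set, an input that already looks like a suffixed name (such as the literal "foo_1" here) cannot collide with a generated one.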