
How do I efficiently reject Strings in an array if they (regex) match Strings in a second array in Ruby?

Developer (https://www.devze.com), 2022-12-13 13:56, source: web

I have two arrays of Strings, for example sentences and words. If any word is found in a sentence (e.g. sentence =~ /#{word}/), I want to reject the sentence from the sentence array. This is easy to do with a double loop, but I'm wondering if there is a more efficient way of doing this, maybe with logical operators?


Array subtraction is your friend here:

words.each do |word|
  # Regexp.escape makes the word match literally, not as a pattern
  sentences -= sentences.grep(/#{Regexp.escape(word)}/)
end

It has the same basic time complexity as the double loop (and is probably less efficient overall, since every grep and every subtraction builds a new array), but it saves you from writing the double loop out explicitly.
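A self-contained sketch of the array-subtraction loop, wrapped in a hypothetical helper `reject_matching` and run on sample data invented for illustration. It uses `Regexp.escape` so that each word is matched as a literal string rather than interpreted as a regex pattern.

```ruby
# Remove every sentence that contains any of the words, using Array#- and
# Enumerable#grep instead of an explicit nested loop.
def reject_matching(sentences, words)
  remaining = sentences.dup
  words.each do |word|
    # Drop all sentences that contain this word (matched literally).
    remaining -= remaining.grep(/#{Regexp.escape(word)}/)
  end
  remaining
end

sentences = ["the cat sat", "a dog barked", "birds sing"]
words = ["cat", "bird"]
p reject_matching(sentences, words) # prints ["a dog barked"]
```

Note that `Array#-` removes every element equal to a matched sentence, so duplicate sentences are all dropped together.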

Be aware that with this solution, words need not match entire whitespace-separated words in the sentence; any substring match counts. So the word "cat" would knock out the sentence "String concatenation is gross."
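If only whole words should count, one option is to wrap each escaped word in `\b` word-boundary anchors. This is a minimal sketch; the helper name `reject_whole_words` and the sample sentences are illustrative assumptions.

```ruby
# Reject sentences that contain any of the words as a whole word.
# \b anchors require a word/non-word boundary on both sides of the match.
def reject_whole_words(sentences, words)
  sentences.reject do |sentence|
    words.any? { |word| sentence =~ /\b#{Regexp.escape(word)}\b/ }
  end
end

sentences = ["String concatenation is gross", "My cat is asleep"]
p reject_whole_words(sentences, ["cat"]) # prints ["String concatenation is gross"]
```

`\b` is defined in terms of word characters, so words that begin or end with punctuation (such as "c++") may need a different boundary rule.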


Joining strings into a single Regexp is a pretty bad idea: backtracking can slow matching down horribly, and you quickly run into limits on regex size (though it may work well in practice if wordarray is small).

Consider using one of the Ruby Quiz solutions for DictionaryMatcher (a class you define yourself; it is not part of Ruby's standard library).

Then you can operate as follows:

dm = DictionaryMatcher.new
wordarray.each { |w| dm << w }
sentencearray.reject { |s| s =~ dm }
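For illustration, here is a hypothetical minimal DictionaryMatcher, not taken from any particular Ruby Quiz solution. It stores the words in a trie built from nested hashes and defines `<<` and `=~`. The `s =~ dm` call works because `String#=~`, when given an argument that is not a Regexp, calls that object's `=~` with the string. This is a plain trie scan, simpler (and asymptotically slower in the worst case) than a full Aho-Corasick automaton.

```ruby
# Hypothetical minimal DictionaryMatcher: stores words in a trie (nested
# hashes) and reports whether any stored word occurs inside a string.
class DictionaryMatcher
  END_MARK = :end # marks a node where a complete word ends

  def initialize
    @root = {}
  end

  # Adds a word to the trie and returns self. Empty words are ignored here
  # (an assumption; an empty word would otherwise match every string).
  def <<(word)
    return self if word.empty?
    node = @root
    word.each_char { |ch| node = (node[ch] ||= {}) }
    node[END_MARK] = true
    self
  end

  # Returns the leftmost index where a stored word starts, or nil.
  # String#=~ delegates here when its argument is not a Regexp.
  def =~(string)
    string.length.times do |start|
      node = @root
      (start...string.length).each do |i|
        node = node[string[i]]
        break unless node
        return start if node[END_MARK]
      end
    end
    nil
  end
end

dm = DictionaryMatcher.new
%w[cat bird].each { |w| dm << w }
sentences = ["the cat sat", "a dog barked", "birds sing"]
p sentences.reject { |s| s =~ dm } # prints ["a dog barked"]
```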


You could join all the words together into one regex, with the words separated by the "|" character.

sentence =~ /word1|word2|..../

You can build a suitable regex from the word array with Regexp.new(words.join("|")).

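If the words are likely to contain regex metacharacters, escape each word first. Wrapping each word in non-capturing groups such as (?:word1) only groups it; it does not stop characters like . or * from being interpreted. Regexp.union escapes the strings and joins them with "|" in one step:

sentence =~ Regexp.union(words)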

Using a single regex should be more efficient than looping through the words array in Ruby, since the regex is compiled once and each sentence is checked against all the words in a single call into the regex engine.
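A runnable sketch of the single-regex approach built with Ruby's core `Regexp.union`, which escapes each string and combines them into one alternation. The helper name `reject_with_union` and the sample data are illustrative assumptions; the sample shows why escaping matters, since an unescaped "1.0" would also match "100".

```ruby
# Build one alternation from all the words; Regexp.union escapes each
# string, so metacharacters such as "." or "+" are matched literally.
def reject_with_union(sentences, words)
  pattern = Regexp.union(words)
  sentences.reject { |sentence| sentence.match?(pattern) }
end

sentences = ["version 1.0 released", "version 100 released", "c++ rocks"]
p reject_with_union(sentences, ["1.0", "c++"]) # prints ["version 100 released"]
```

With an empty word list, `Regexp.union([])` returns `/(?!)/`, which matches nothing, so every sentence is kept.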


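words = [...]      # the words to look for
sentences = [....] # the sentences to filter

# Keep only the sentences that contain none of the words (matched literally)
result = sentences.reject { |sentence| words.any? { |word| sentence =~ /#{Regexp.escape(word)}/ } }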
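When the words are plain strings rather than patterns, `String#include?` performs the same literal substring test without building a Regexp for every sentence/word pair. A minimal sketch, with the helper name `reject_containing` and the sample data assumed for illustration:

```ruby
# Reject every sentence that contains any of the words as a substring.
def reject_containing(sentences, words)
  sentences.reject { |sentence| words.any? { |word| sentence.include?(word) } }
end

p reject_containing(["the cat sat", "a dog barked"], ["cat"]) # prints ["a dog barked"]
```

`any?` stops at the first matching word, so each sentence is scanned only until a match is found.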
