I'm trying to return a prefix of a string. My related question is here, but I've run into a new problem.
How to return the string prefix from regexp
Basically I have strings like
23430-BL 23430BZ 23430BK/BL
The extensions I'm trying to remove are
strip_ext = BK/BL|BZ|BL
The regular expression I'm using to get the string without the extension is
prefix = sample_data[/(.*[^-])-?(?:#{strip_ext})/,1]
This is returning
23430 23430 23430-BK
In theory, I understand that the regexp finds the BL match, and for some reason selects that as the match over the BK/BL. But is there a way to get the regexp to find BK/BL rather than BL?
Unfortunately, there isn't always a dash before the part that I am looking to strip.
I gave the original strip_ext list as a simplified example to make the problem easy to understand. An actual strip_ext list looks like the one below and changes with the sample data provided, so unfortunately it isn't as easy as Mu's answer below.
AM/DB|AM/BN|RD/BK|PR/WT|YP/BN|YP/CH|YP/DB|PK/BN|PK/CH|PK/DB|SF/BN|SF/CH|SF/DB|AM/CH|BN/CH|BN/DB|CH/BN|CH/DB|DB/BN|DB/CH|BN/BN|CH/CH|MR/BN|MR/CH|MR/DB|DB/DB|AM/AB|DIC/BN|DIC/CH|DIC/DB|BN|DB|WT|BN/WT|BK|WT/BN|BK/BN|BK/DB|BL/BN|BL/DB|BK/CH|BL/CH|AM|CH|FR|SB|AM/BK|AM/WT|PT/CH|BG/CH|BG/DB|MF/CH|MF/DB|YR/CH|YR/DB|WT/DB|pt/bn
Make the first quantifier non-greedy (lazy).
(.*?[^-])-?(?:BK/BL|BZ|BL)
See it here on Regexr
The ? after .* makes .*? match as little as possible, so the prefix stops at the first position where one of the extensions can follow.
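As a rough check of the lazy quantifier against the three sample strings from the question, here is a minimal sketch. The helper name lazy_prefix and the constant STRIP_EXT are made up for illustration, and building the alternation with Regexp.union is my own choice so that the "/" in the extensions needs no manual escaping.

```ruby
# Hypothetical names; the extension list and sample strings come from the question.
STRIP_EXT = %w[BK/BL BZ BL].freeze

def lazy_prefix(sample, extensions = STRIP_EXT)
  # Regexp.union escapes each string and joins them with "|".
  alternation = Regexp.union(extensions)
  # .*? is lazy, so the shortest prefix that can be followed by an
  # optional dash and one of the extensions is captured.
  sample[/(.*?[^-])-?(?:#{alternation})/, 1]
end

p %w[23430-BL 23430BZ 23430BK/BL].map { |s| lazy_prefix(s) }
# => ["23430", "23430", "23430"]
```

Note that the pattern is not anchored. If the extension codes can also appear inside the part you want to keep, appending \z to the pattern would restrict the match to the end of the string; whether that is needed depends on the real data.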
You could mix a negative look-behind into your BL matcher:
/(.*[^-])-?(?:BK\/BL|BZ|(?<!BK\/)BL)/
Adding (?<!BK\/) indicates that you want to match BL except when it is preceded by BK/.
A quick test:
>> %w{23430-BL 23430GR 23430BK/BL}.map { |s| s[/(.*[^-])-?(?:BK\/BL|BZ|(?<!BK\/)BL)/,1] }
=> ["23430", nil, "23430"]
Your sample output doesn't match your input, though. Is "GR" a typo in your inputs, or is "BZ" a typo in your regex?
Given that your patterns are not fixed, you could bypass regular expressions completely and fall back on simple string wrangling. Here's a better example of what I mentioned in my comment:
require 'set'
# The suffix list that you get from somewhere.
suffixes = [ 'BK/BL', 'BZ', 'BL' ]
# We want to do a couple things at once here. For each suffix, we
# want both the suffix and the suffix with a leading '-' attached,
# the `map` and `flatten` stuff does that. Then we group them by
# length to get a hash like:
#
# { 2 => ['BZ','BL'], 3 => ['-BZ', '-BL'], 5 => ['BK/BL'], ... }
#
by_length = suffixes.map { |suffix| [suffix, '-' + suffix ] }.flatten.group_by(&:length)
# Now we reorganize our suffixes into sets with the set of longest
# suffixes first and the set of shortest suffixes last. The result
# will be:
#
# [#<Set: {"-BK/BL"}>, #<Set: {"BK/BL"}>, #<Set: {"-BZ", "-BL"}>, #<Set: {"BZ", "BL"}>]
#
sets = by_length.keys.sort { |a,b| b <=> a }.map { |k| Set.new(by_length[k]) }
# Then we can just spin through sets, pull off the suffix of the
# appropriate length from the string, and see if it is in our set.
# If it is then chop the suffix off the string, do whatever is to be
# done with chopped string, and break out for the next string.
#
%w{ 23430-BL 23430BZ 23430BK/BL }.each do |string|
  # `set` holds suffixes of one length; named differently from the
  # outer `suffixes` array to avoid shadowing it.
  sets.each do |set|
    len = set.first.length
    sfx = string[string.length - len, len]
    if set.include?(sfx)
      puts string[0 .. -(len + 1)]
      break
    end
  end
end
That's just an "off the top of my head" illustration of the algorithm.
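The same idea (try the longest suffixes first, with and without a leading dash) can also be sketched more compactly with String#end_with?. This is only one possible variation, not the code above rewritten by its author; strip_suffix is a hypothetical helper name, and returning nil when nothing matches is my assumption, chosen to mirror what the regexp versions return.

```ruby
# strip_suffix is a hypothetical helper, shown only to illustrate the algorithm.
def strip_suffix(string, suffixes)
  # Consider each suffix with and without a leading dash, longest first,
  # so that "BK/BL" is tried before "BL".
  candidates = suffixes.flat_map { |s| ["-#{s}", s] }.sort_by { |s| -s.length }
  match = candidates.find { |s| string.end_with?(s) }
  # nil when no suffix matches; otherwise the string minus the matched suffix.
  match && string[0...-match.length]
end

suffixes = %w[BK/BL BZ BL]
%w[23430-BL 23430BZ 23430BK/BL 23430GR].each do |s|
  p strip_suffix(s, suffixes)
end
# prints "23430" three times, then nil for the unmatched "23430GR"
```

Because the candidates are rebuilt from whatever list is passed in, this works unchanged when the suffix list varies with the sample data.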