I never tried regex before today, and I like it so far, but I'm lost on some things.
I have a string that looks like this:
Type OtherType ThirdType - SubType AnotherSubType QuiteTheType
I want two regexes, both of which care about the '-' character.
I plan to use them with gsub to turn the string into two arrays of strings, which is why I need two regular expressions.
So far I have this: ([a-zA-z]{1,}) (?=-)
but that only gets me the word right before the dash, i.e. ThirdType.
If I just use ([a-zA-z]{1,})
I get all words highlighted, but that includes the ones AFTER the -
which I don't want yet.
How can I get all occurrences of [a-zA-z]{1,}
that happen before -
but not necessarily IMMEDIATELY before it?
s = "Type OtherType ThirdType - SubType AnotherSubType QuiteTheType"
words_before, words_after = s.split(/\s*-\s*/).map do |t|
t.split(/\s+/)
end
p words_before # => ["Type", "OtherType", "ThirdType"]
p words_after # => ["SubType", "AnotherSubType", "QuiteTheType"]
Here's how this works:
s.split(/\s*-\s*/)
This splits the string in two, using a regular expression as the delimiter. The delimiter means "any amount of whitespace, then a dash, then any amount of whitespace." The result is an array with two strings in it: the part to the left of the delimiter, and the part to the right.
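To see that first step on its own, here is a small sketch using the string from the question (the variable name halves is just something I made up for illustration):

```ruby
s = "Type OtherType ThirdType - SubType AnotherSubType QuiteTheType"

# Split on a dash plus any whitespace around it, so neither half
# keeps a stray leading or trailing space.
halves = s.split(/\s*-\s*/)
p halves # => ["Type OtherType ThirdType", "SubType AnotherSubType QuiteTheType"]
```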
...map do |t|
...
end
map takes an array and transforms it into another array with the same number of elements. It takes each element of the array, passes it to the block, and uses the return value from the block as the new value for that element. We'll use it to transform the two strings into two arrays of words.
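If map is new to you, a tiny stand-alone sketch may help. The input array and the upcase call are only for illustration and have nothing to do with the final solution:

```ruby
# map calls the block once per element and collects the return values
# into a new array of the same length.
shouted = ["Type OtherType", "SubType"].map do |t|
  t.upcase
end
p shouted # => ["TYPE OTHERTYPE", "SUBTYPE"]
```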
So, what's in the block?
t.split(/\s+/)
It's another split. This time we'll split on one or more whitespace characters. That results in an array of words.
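One thing to be aware of, shown in the sketch below with a made-up padded string: splitting on /\s+/ keeps an empty first element when the string starts with whitespace, while split with no arguments ignores leading whitespace. It doesn't matter here because the first split already consumed the whitespace around the dash.

```ruby
padded = " Type OtherType "

# A regex delimiter leaves an empty string in front of the leading space.
with_regex = padded.split(/\s+/)
p with_regex # => ["", "Type", "OtherType"]

# With no argument, split ignores leading and trailing whitespace.
with_default = padded.split
p with_default # => ["Type", "OtherType"]
```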
Since the map applies that split to first the left side and then the right side, the result of the entire s.split...
expression is an array of two arrays.
Now we'll use one of Ruby's fun syntaxes:
words_before, words_after = s.split...
Whenever you have multiple variables on the left side of an assignment, Ruby will "decompose" the array on the right side, assigning the first element of the array to the first variable, the second element of the array to the second variable, and so on. Since our array has two elements (the first being an array of words from the left side, and the second being an array of words from the right side), we'll use two variables to hold them.
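Here is a short sketch of that assignment by itself, plus what happens when the input has no dash at all. The variable names are mine, and the no-dash input is an assumption about data you might run into, not something from the question:

```ruby
# Each variable receives the element at the matching position.
first, second = [["Type", "OtherType"], ["SubType"]]
p first  # => ["Type", "OtherType"]
p second # => ["SubType"]

# With no dash there is only one piece, so the second variable ends up nil.
only_before, missing = "Type OtherType".split(/\s*-\s*/).map { |t| t.split(/\s+/) }
p only_before # => ["Type", "OtherType"]
p missing     # => nil
```

If a missing dash is possible in your data, check the second variable for nil before using it.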
I don't know exactly how Ruby's regex implementation works, but the following regex in Perl should get you what you want:
/^([a-zA-Z\s]+) \- ([a-zA-Z\s]+)$/
For example:
perl -e '$_="Type OtherType ThirdType - SubType AnotherSubType QuiteTheType";
if(/^([a-zA-Z\s]+) \- ([a-zA-Z\s]+)$/){print "$1\n";print "$2\n";}'
produces
Type OtherType ThirdType
SubType AnotherSubType QuiteTheType
ETA: To explain what's going on, the initial ^ denotes the beginning of the line and the final $ denotes the end of the line. So ^([a-zA-Z\s]+) starts at the beginning and (greedily) matches all of the words from the beginning of the line up to the space before the dash. The dash is escaped with a backslash here, but outside a character class - is an ordinary character, so the escape is optional and harmless. Likewise, ([a-zA-Z\s]+)$ matches everything from the space after the dash to the end of the line.
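For what it's worth, the same pattern should carry over to Ruby more or less unchanged. The sketch below is my own translation, not tested by the Perl answer's author. It uses \A and \z instead of ^ and $, because in Ruby ^ and $ match at every line boundary rather than only at the ends of the string, and it drops the optional backslash before the dash:

```ruby
s = "Type OtherType ThirdType - SubType AnotherSubType QuiteTheType"

# Group 1 captures everything before " - ", group 2 everything after it.
if (m = s.match(/\A([a-zA-Z\s]+) - ([a-zA-Z\s]+)\z/))
  words_before = m[1].split
  words_after  = m[2].split
end

p words_before # => ["Type", "OtherType", "ThirdType"]
p words_after  # => ["SubType", "AnotherSubType", "QuiteTheType"]
```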
You can use look-ahead:
(\w+)(?=.*?-)
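As a rough sketch of how that could be used in Ruby: scan returns every match, and dropping the capture group keeps the result a flat array of strings. The second pattern, with a negative look-ahead for the words after the dash, is my own addition and assumes the string is on a single line:

```ruby
s = "Type OtherType ThirdType - SubType AnotherSubType QuiteTheType"

# Words that still have a dash somewhere after them.
words_before = s.scan(/\w+(?=.*?-)/)
p words_before # => ["Type", "OtherType", "ThirdType"]

# Words with no dash anywhere after them.
words_after = s.scan(/\w+(?!.*-)/)
p words_after # => ["SubType", "AnotherSubType", "QuiteTheType"]
```

Note that if the string contains more than one dash, the first pattern matches every word before the last dash, not just the words before the first one.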
While regex is powerful and useful, it often leads to a more complicated solution than you need, and a complicated solution means more work and more maintenance.
sentence = 'Type OtherType ThirdType - SubType AnotherSubType QuiteTheType'
sentence.split('-') # => ["Type OtherType ThirdType ", " SubType AnotherSubType QuiteTheType"]
sentence.scan(/[^-]+/) # => ["Type OtherType ThirdType ", " SubType AnotherSubType QuiteTheType"]
If the whitespace surrounding the hyphen is annoying, pass the returned sections through strip:
sentence.split('-').map{ |w| w.strip } # => ["Type OtherType ThirdType", "SubType AnotherSubType QuiteTheType"]
sentence.scan(/[^-]+/).map{ |w| w.strip } # => ["Type OtherType ThirdType", "SubType AnotherSubType QuiteTheType"]
If you want the individual words, and not the sentences before and after the hyphen:
sentence.split('-').map{ |w| w.strip.split(' ') } # => [["Type", "OtherType", "ThirdType"], ["SubType", "AnotherSubType", "QuiteTheType"]]
sentence.scan(/[^-]+/).map{ |w| w.strip.split(' ') } # => [["Type", "OtherType", "ThirdType"], ["SubType", "AnotherSubType", "QuiteTheType"]]