I've got a string like this:
<block trace="true" name="AssignResources: Append Resources">
I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).
I tried several regex patterns, but all my attempts return the word with the "delimiter" characters included, like <block.
I'm sure it's not that hard, but I've not found the solution yet.
Anybody's got a hint?
Thanks.
Btw: I want to replace the pattern matches with gsub.
EDIT:
Solved it with the following regexes:
1) /\s(\w+)="(.*?)"/ matches all attributes and their values in $1 and $2.
2) /<!--.*-->/ matches comments.
3) /<([\/|!|\?]?)([A-Za-z0-9]+)[^\s|>|\/]*/ matches all tag names, whether they're in a closing tag, a self-closing tag, a <?xml> tag or a DTD tag. $1 holds the optional prefix /, ! or ? (or nothing) and $2 contains the tag name.
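For anyone who wants to try the three patterns, here is a minimal sketch using String#scan. The helper name summarize_markup and the sample string are made up for illustration. The tag pattern is written without the | characters inside the brackets, on the assumption that they were meant as separators (inside [...] a | is just a literal character).

```ruby
# Patterns from the edit above; TAG_RE drops the literal "|" from the
# character classes (an assumption about what was intended).
ATTR_RE    = /\s(\w+)="(.*?)"/
COMMENT_RE = /<!--.*-->/
TAG_RE     = /<([\/!?]?)([A-Za-z0-9]+)[^\s>\/]*/

# Hypothetical helper: collect what each pattern finds in a string.
def summarize_markup(text)
  {
    attributes: text.scan(ATTR_RE),    # [[name, value], ...]
    comments:   text.scan(COMMENT_RE), # whole comment strings
    tags:       text.scan(TAG_RE)      # [[prefix, tag name], ...]
  }
end

sample = '<!-- header --><block trace="true" name="AssignResources: Append Resources"><item/></block>'
summary = summarize_markup(sample)
p summary[:attributes]
p summary[:comments]
p summary[:tags]
```

Note that the comment pattern is greedy, so with two comments on the same line it would match from the first <!-- to the last -->; .*? would probably be the safer choice there.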
It looks a lot like parsing HTML with regex to me.
Ruby has a very good HTML parser called Nokogiri.
Here is how to use it for that:
require 'nokogiri'
html = Nokogiri::HTML('<block trace="true" name="AssignResources: Append Resources">')
# The HTML parser wraps the fragment in <html><body>, so only look at the children of <body>
html.xpath("//body/*").each do |s|
  puts s.node_name # block
  puts s.keys      # trace, name
  puts s.values    # true, AssignResources: Append Resources
end
You can try:
<([^ ]*)\s([^=]*)=
'<block trace="true" name="AssignResources: Append Resources">'[/<(\w+)/, 1]
#=> "block"
If you pass a regex and an index i to String#[], it'll return the value of the ith capturing group.
Edit:
In 1.9 you can use /(?<=<)\w+/ to require the presence of the < without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and access that group in the replacement, like this:
"lo<la li".gsub(/(<)(\w+)/, '\1 --\2--')
#=> "lo< --la-- li"
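To make the 1.9 variant concrete, here is a rough sketch that uses a lookbehind for the tag name and a lookahead for the attribute names, so gsub only touches the words and never the < or =. The helper name shout_names is hypothetical, and upcasing is just a stand-in for whatever replacement you need.

```ruby
# Ruby 1.9+ only: lookbehind is not available in 1.8.
TAG_NAME_RE  = /(?<=<)\w+/ # a word directly preceded by "<"
ATTR_NAME_RE = /\w+(?==)/  # a word directly followed by "="

# Hypothetical helper: upcase tag and attribute names, leave delimiters alone.
def shout_names(markup)
  markup.gsub(TAG_NAME_RE) { |name| name.upcase }
        .gsub(ATTR_NAME_RE) { |name| name.upcase }
end

puts shout_names('<block trace="true" name="AssignResources: Append Resources">')
# prints: <BLOCK TRACE="true" NAME="AssignResources: Append Resources">
```

One caveat: the attribute pattern would also hit a word followed by = inside an attribute value, so this only holds up for simple input like the string in the question.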
<block trace="true" name="AssignResources: Append Resources">
<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>
#result:
$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources
Update: I don't know Ruby, but based on the description of gsub, I believe that something like the following should do the trick.
str = '<block trace="true" name="AssignResources: Append Resources">'
repl = str.gsub(/<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/,
"tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl
Most probably you should go with Nokogiri or something similar. I couldn't fit it into one gsub, but it works with two:
>> s = '<block trace="true" name="AssignResources: Append Resources">'
>> m,r=0,["<blockie ", " tracie=", " namie="]
>> s.gsub(/<.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) {|ma| m+=1; r[m]}
=> "<blockie tracie=\"true\" namie=\"AssignResources: Append Resources\">"
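If Ruby 1.9+ is an option, a single gsub seems possible by combining the lookbehind mentioned in another answer with a lookahead for = and looking each match up in a rename table. This is only a sketch: RENAMES, NAME_RE and rename_names are names I made up, and the table just mirrors the replacements used above.

```ruby
# Hypothetical rename table mirroring the replacements above.
RENAMES = { 'block' => 'blockie', 'trace' => 'tracie', 'name' => 'namie' }

# A word right after "<" (tag name) or right before "=" (attribute name).
# Needs Ruby 1.9+ because of the lookbehind.
NAME_RE = /(?<=<)\w+|\w+(?==)/

# Replace each matched name via the table; unknown names are kept as they are.
def rename_names(markup, renames = RENAMES)
  markup.gsub(NAME_RE) { |name| renames.fetch(name, name) }
end

puts rename_names('<block trace="true" name="AssignResources: Append Resources">')
# prints: <blockie tracie="true" namie="AssignResources: Append Resources">
```

The block with fetch is used instead of passing the hash straight to gsub, because with a hash argument any match missing from the table would be replaced by an empty string.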