
Ruby Regex matching string before and after certain characters

Developer https://www.devze.com 2022-12-21 15:44 Source: Web

I've got a string like this:

<block trace="true" name="AssignResources: Append Resources">

I need to get the word (or the characters up to the next whitespace) after < (in this case block) and the words before = (here trace and name).

I tried several regex patterns, but every attempt returns the word with the delimiter characters included, like <block.

I'm sure it's not that hard, but I haven't found the solution yet.

Does anybody have a hint?

Thanks.

By the way, I want to replace the matches using gsub.

EDIT:

Solved it with the following regexes:

1) /\s(\w+)="(.*?)"/ matches every attribute and its value, capturing them in $1 and $2.

2) /<!--.*-->/ matches comments

3) /<([\/!?]?)([A-Za-z0-9]+)[^\s>\/]*/ matches all tag names, whether they're in a closing tag, a self-closing tag, an <?xml?> tag, or a DTD tag. $1 holds the optional leading /, !, or ? (or is empty), and $2 holds the tag name.
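A small sketch of how these three regexes can be used with scan and gsub. The sample string is a hypothetical input built from the question's tag. The stray | characters are dropped from the character classes, because inside [...] a | is a literal pipe, not alternation.

```ruby
# Hypothetical sample input: a comment, an opening tag and a closing tag.
xml = '<!-- note --><block trace="true" name="AssignResources: Append Resources"></block>'

ATTR    = /\s(\w+)="(.*?)"/                   # 1) attribute name and value
COMMENT = /<!--.*-->/                         # 2) comments (greedy .* spans to the last --> on a line)
TAG     = /<([\/!?]?)([A-Za-z0-9]+)[^\s>\/]*/ # 3) optional prefix and tag name

# scan returns one array of capture groups per match.
attrs = xml.scan(ATTR)
tags  = xml.scan(TAG)

# Inside a gsub block, $1 and $2 refer to the groups of the current match,
# so the start of each tag can be rebuilt with a modified name.
renamed  = xml.gsub(TAG) { "<#{$1}#{$2.upcase}" }
stripped = xml.gsub(COMMENT, "")

p attrs
p tags
puts renamed
puts stripped
```

Rebuilding the tag from $1 and $2 only works cleanly here because the trailing [^\s>\/]* matches nothing for plain alphanumeric names; anything it does consume would be dropped by this replacement.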


It looks a lot like parsing HTML with regex to me.

Ruby has a very good HTML parser called Nokogiri.

Here is how to do it with Nokogiri:

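require 'nokogiri'

# Nokogiri::HTML wraps input in <html><body>, so "//*" would also yield
# html and body. Parsing as a fragment returns just the <block> element.
fragment = Nokogiri::HTML.fragment('<block trace="true" name="AssignResources: Append Resources">')

fragment.element_children.each do |node|
    puts node.node_name # block
    puts node.keys      # trace, name
    puts node.values    # true, AssignResources: Append Resources
end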


You can try:

<([^ ]*)\s([^=]*)=
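A quick check of that pattern in Ruby, using the string from the question. This is only a sketch: the pattern captures the tag name and the first attribute name, not every attribute.

```ruby
str = '<block trace="true" name="AssignResources: Append Resources">'

# ([^ ]*) takes everything after "<" up to the first space;
# ([^=]*) takes everything after that whitespace up to the first "=".
m = str.match(/<([^ ]*)\s([^=]*)=/)
tag, attr = m.captures

puts tag
puts attr
```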


'<block trace="true" name="AssignResources: Append Resources">'[/<(\w+)/, 1]
#=> "block"

If you pass a regex and an index i to String#[], it returns the text matched by the i-th capturing group.
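A few more String#[] calls in the same spirit, as a sketch. String#[] also accepts a group name instead of an index; for every attribute name rather than only the first, String#scan is used instead.

```ruby
str = '<block trace="true" name="AssignResources: Append Resources">'

tag   = str[/<(\w+)/, 1]             # group 1 only, without the "<"
first = str[/\s(\w+)=/, 1]           # first attribute name, without the "="
named = str[/<(?<tag>\w+)/, "tag"]   # a named group selected by its name
names = str.scan(/\s(\w+)=/).flatten # every attribute name

p [tag, first, named, names]
```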

Edit:

In Ruby 1.9 you can use /(?<=<)\w+/ to require the presence of the < without matching it. In 1.8 there is no way to do that. The best you can do is to put the part you don't want to replace in a capturing group and reference that group in the replacement, like this:

"lo<la li".gsub(/(<)(\w+)/, '\1 --\2--')
 #=> "lo< --la-- li"
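A sketch of the lookaround approach on the question's string, assuming Ruby 1.9 or later. A lookbehind (?<=<) handles the word after "<" and a lookahead (?==) handles the words before "=", so neither delimiter is part of the match and gsub only rewrites the names.

```ruby
str = '<block trace="true" name="AssignResources: Append Resources">'

# (?<=<) requires a preceding "<" without consuming it;
# (?==) requires a following "=" without consuming it.
tag   = str[/(?<=<)\w+/]
attrs = str.scan(/\w+(?==)/)

# The delimiters stay untouched because they are not part of any match.
replaced = str.gsub(/(?<=<)\w+|\w+(?==)/) { |name| name.upcase }

puts tag
p attrs
puts replaced
```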


<block trace="true" name="AssignResources: Append Resources">

<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>

#result:

$1 block
$2 trace
$3 true
$4 name
$5 AssignResources: Append Resources
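In Ruby, those five groups can be read from the MatchData object returned by Regexp#match. A minimal sketch, assuming the tag has exactly two attributes, as the pattern itself does:

```ruby
str = '<block trace="true" name="AssignResources: Append Resources">'
re  = /<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/

m = re.match(str)
# captures returns the groups in order, equivalent to $1..$5 after the match.
tag, attr1, val1, attr2, val2 = m.captures
attributes = { attr1 => val1, attr2 => val2 }

puts tag
p attributes
```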

Update: I don't know Ruby, but based on the documentation for gsub, I believe something like the following should do the trick.

str = '<block trace="true" name="AssignResources: Append Resources">'
repl = str.gsub(/<([^\s]+)\s+([^=]+)="([^"]*)"\s+([^=]+)="([^"]*)"\s*>/, 
    "tag name: \\1\n\\2 is \\3 and \\4 is \\5\n")
print repl


Most probably you should go with Nokogiri or something similar. I couldn't fit it into one gsub, but it works with two:

>> s = '<block trace="true" name="AssignResources: Append Resources">'
>> m, r = 0, ["<blockie ", " tracie=", " namie="]
>> s.gsub(/<.*?([^\s]+)\s/, r[0]).gsub(/\s([^=]+)=/) { |ma| m += 1; r[m] }
=> "<blockie tracie=\"true\" namie=\"AssignResources: Append Resources\">"
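The counter trick can be avoided by passing a Hash to gsub: each match is looked up in the hash and replaced by its value. This sketch combines that with the lookbehind/lookahead pattern (Ruby 1.9 or later); the renames hash is hypothetical and simply mirrors the names used in the irb session.

```ruby
s = '<block trace="true" name="AssignResources: Append Resources">'

# Hypothetical old-name => new-name mapping, mirroring the irb example.
renames = { "block" => "blockie", "trace" => "tracie", "name" => "namie" }

# With a Hash as the second argument, each match is replaced by renames[match];
# a match that is not a key in the hash is replaced by an empty string.
result = s.gsub(/(?<=<)\w+|\w+(?==)/, renames)

puts result
```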
