A character that can be used to separate a set of regexes and that is never used inside a regex

Developer (https://www.devze.com), 2023-01-24 06:45. Source: the web
I am defining a set of regexes in an XML file, and I will then read them from that file to match them against a string. The problem is that, to define several of them in the XML file, I need one special character that separates them, and that character must not be used in any regex. For example, if I use $ as the separator, there will be a problem whenever $ appears in a regex defined in the XML. So which character can be used to separate the regexes defined in the XML file and is never used in a regex?


Not sure what you mean by a special character, but if the issue is including regular expressions within XML, you should use a character data (CDATA) section, <![CDATA[My unparsed data goes here.]]>:

<regexes>
  <regex><![CDATA[/^\w+$/]]></regex>
  <regex><![CDATA[/^\d+/]]></regex>
  <regex><![CDATA[/^[a-zA-Z]+\@example\.com$/]]></regex>
</regexes>
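
To read such a file back, a minimal Ruby sketch could look like the following. It assumes the REXML library that ships with standard Ruby installs; the names XML_SOURCE, strip_delimiters and load_regexes are made up for this illustration, and the third pattern is an extra example added to show that < and > survive inside CDATA.

```ruby
require 'rexml/document'

# Hypothetical input: one <regex> element per pattern, each wrapped in CDATA
# so that characters such as < and & need no XML escaping.
XML_SOURCE = <<~'XML'
  <regexes>
    <regex><![CDATA[/^\w+$/]]></regex>
    <regex><![CDATA[/^\d+/]]></regex>
    <regex><![CDATA[/<b>.*?<\/b>/]]></regex>
  </regexes>
XML

# Drop the surrounding /.../ delimiters, if present, because Regexp.new
# expects only the pattern source.
def strip_delimiters(text)
  if text.length >= 2 && text.start_with?('/') && text.end_with?('/')
    text[1..-2]
  else
    text
  end
end

# Parse the XML and compile every <regex> element into a Regexp.
def load_regexes(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '/regexes/regex').map do |element|
    Regexp.new(strip_delimiters(element.text))
  end
end

REGEXES = load_regexes(XML_SOURCE)
```

Because every pattern lives in its own element, no separator character is needed at all: the XML structure itself keeps the patterns apart.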


In your regex strings, just replace every space (' ') with its equivalent Unicode escape ('\u0020'), and then join the strings with a regular space.

irb(main):001:0> regexps = [/^a b/, /end$/, /[ ]+/]
=> [/^a b/, /end$/, /[ ]+/]
irb(main):002:0> attr = regexps.map{|r| r.source.gsub(' ', '\u0020')}.join(' ')
=> "^a\\u0020b end$ [\\u0020]+"

irb(main):006:0> regexps = attr.split(' ').map{|s| Regexp.new(s)}
=> [/^a\u0020b/, /end$/, /[\u0020]+/]
irb(main):008:0> regexps[0] =~ 'a bdbbb'
=> 0


You'll go around in circles trying to find a character. Regular expressions have already solved that problem: The first character is the delimiter, which must also end the expression. If this delimiter is to be used literally inside the expression, it needs to be escaped.

/foo/
^   ^
delimiter

/foo\/bar/
     ^
escaped literal '/' character
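
As a rough sketch of the same idea applied to a list of patterns, the Ruby below joins pattern sources with '/' and escapes any unescaped '/' inside them, then splits them apart again. The helper names escape_delimiter, pack_patterns and unpack_patterns are hypothetical, and the sketch assumes every pattern is non-empty and is itself a valid regex source.

```ruby
# Escape every unescaped '/' in a pattern source. Existing backslash pairs
# (for example \d, \\ or an already escaped \/) are copied through unchanged.
def escape_delimiter(source)
  source.gsub(%r{\\.|/}m) { |m| m == '/' ? '\\/' : m }
end

# Join several pattern sources into one string, using '/' as the separator.
def pack_patterns(sources)
  sources.map { |s| escape_delimiter(s) }.join('/')
end

# Split the packed string on unescaped '/' only and compile each piece.
# An escaped \/ stays inside its piece, where the regex engine reads it
# as a literal '/'.
def unpack_patterns(packed)
  packed.scan(%r{(?:\\.|[^\\/])+}m).map { |s| Regexp.new(s) }
end

packed  = pack_patterns(['^a/b$', 'x\\/y', '\\d+'])
regexes = unpack_patterns(packed)
```

The escaped delimiter does not have to be removed when unpacking, since \/ inside a pattern already means a literal slash.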

The same goes for XML: The <, > and & characters have special meanings, so if you want to use them as part of the content, you need to escape them as &lt;, &gt; and &amp; respectively.

Alternatively, you can wrap a section in a <![CDATA[ ]]> block, but then you cannot use the sequence ]]> inside it without escaping (you see, it's going around in circles).

As such, you will need to escape the expression delimiter and the XML special characters in your expressions:

<regex>/foo&amp;\/bar&lt;baz&gt;/</regex>   (to match 'foo&/bar<baz>')
<regex><![CDATA[/foo&\/bar<baz>/]]></regex>  (alternative)
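
Both forms can be generated rather than typed by hand. The Ruby below is a minimal sketch with hypothetical helper names (escape_xml, wrap_cdata); it covers only the three characters mentioned above, and it handles a literal ]]> by ending the CDATA section after ]] and opening a new one, which is a common workaround rather than a real escape.

```ruby
# Map of the XML special characters discussed above to their entities.
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Entity-escape text so it can be placed directly inside an element.
def escape_xml(text)
  text.gsub(/[&<>]/, XML_ESCAPES)
end

# Wrap text in a CDATA section. A literal "]]>" cannot appear inside CDATA,
# so it is split across two adjacent sections.
def wrap_cdata(text)
  '<![CDATA[' + text.gsub(']]>', ']]]]><![CDATA[>') + ']]>'
end

pattern = '/foo&\\/bar<baz>/'
escaped_element = "<regex>#{escape_xml(pattern)}</regex>"
cdata_element   = "<regex>#{wrap_cdata(pattern)}</regex>"
```

An XML parser should return the same pattern text from either element, so the choice between them is mostly about readability of the file.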