Help with regular expression to create definition list

Developer https://www.devze.com 2022-12-11 01:12 Source: Internet
I'm having trouble wrapping my head around this. I need to parse the following string with a regular expression to create the definition list below.

Width=3/8 in|Length=1 in|Thread - TPI or Pitch=|Bolt/Screw Length=|Material=|Coating=|Type=Snap-On|Used With=|Quantity=5000 per pack|Wt.=20 lb|Color=

The result would be something like this

<dt>Width</dt>
<dd>3/8 in</dd>
<dt>Length </dt>
<dd>1 Inch</dd>
<dt>Thread - TPI or Pitch</dt>
<dd></dd>
<dt>Quantity</dt>
<dd>5000 a pack</dd>
<dt>Wt.</dt>
<dd>20 lb</dd>


If you don't need to reorder items or change their values, and are confident the values themselves don't contain the equals signs or vertical bars used as markup in the input, you could apply a series of regular expressions to introduce the HTML. Using Java's String class from Scala, this could be a dense but effective one-liner:

"Escape test=&<>|Width=3/8 in|Length=1 in|Thread - TPI or Pitch=|Bolt/Screw Length=|Material=|Coating=|Type=Snap-On|Used With=|Quantity=5000 per pack|Wt.=20 lb|Color=".
replaceAll("&","&amp;").
replaceAll("<","&lt;").
replaceAll(">","&gt;").
replaceAll("^","<dl>\n\t<dt>").
replaceAll("=","</dt>\n\t<dd>").
replaceAll("\\|","</dd>\n\n\t<dt>").
replaceAll("$","</dd>\n</dl>")

which yields

<dl>
<dt>Escape test</dt>
<dd>&amp;&lt;&gt;</dd>

<dt>Width</dt>
<dd>3/8 in</dd>

<dt>Length</dt>
<dd>1 in</dd>

<dt>Thread - TPI or Pitch</dt>
<dd></dd>

<dt>Bolt/Screw Length</dt>
<dd></dd>

<dt>Material</dt>
<dd></dd>

<dt>Coating</dt>
<dd></dd>

<dt>Type</dt>
<dd>Snap-On</dd>

<dt>Used With</dt>
<dd></dd>

<dt>Quantity</dt>
<dd>5000 per pack</dd>

<dt>Wt.</dt>
<dd>20 lb</dd>

<dt>Color</dt>
<dd></dd>
</dl>


You can use

([^=|]+)=([^|]+)(?=\||$)

Apply with the "global" flag.

Explanation:

(             # start match group 1
  [^=|]+      # any character that's not a "=" or "|", at least once
)             # end match group 1
=             # a literal "="
(             # start match group 2
  [^|]+       # any character that's not a "|", at least once
)             # end match group 2
(?=           # look-ahead: followed by
  \|          # either a literal "|"
  |           # or…
  $           # the end of the string
)             # end look-ahead

The string parts you are interested in are in match groups 1 and 2, respectively. For me the above matches:

  1. Width = 3/8 in
  2. Length = 1 in
  3. Type = Snap-On
  4. Quantity = 5000 per pack
  5. Wt. = 20 lb

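Your example is inconsistent in the "Thread - TPI or Pitch" case: it is the only key with an empty value that you kept in the output. The expression above skips every key whose value is empty.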
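In Java, for instance, matching with the "global" flag corresponds to calling Matcher.find() in a loop. The following is only a minimal sketch, assuming the input is a single line and using the look-ahead form of the expression from the explanation above. The class name DefinitionList and the method toDl are made up for illustration, and values are not HTML-escaped here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefinitionList {
    // Key, "=", non-empty value, then a look-ahead for "|" or the end of the input.
    private static final Pattern PAIR = Pattern.compile("([^=|]+)=([^|]+)(?=\\||$)");

    // Hypothetical helper: builds a <dl> from "key=value|key=value" input.
    // Pairs with an empty value never match, so they are skipped.
    static String toDl(String input) {
        StringBuilder html = new StringBuilder("<dl>\n");
        Matcher m = PAIR.matcher(input);
        while (m.find()) { // each find() call advances to the next pair
            html.append("<dt>").append(m.group(1)).append("</dt>\n");
            html.append("<dd>").append(m.group(2)).append("</dd>\n");
        }
        return html.append("</dl>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toDl("Width=3/8 in|Length=1 in|Thread - TPI or Pitch=|Type=Snap-On"));
    }
}
```

With that sample input, the loop emits entries for Width, Length and Type, and leaves out Thread - TPI or Pitch because its value is empty.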


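Something like this, applied with the global flag (wrapping the pair in a repeated outer group would leave only the last pair in the capture groups):

/(.*?)=(.*?)(?:\||$)/g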
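If pairs with empty values should be kept (as in the Thread - TPI or Pitch entry of the question), a lazy pattern of this shape can be matched one pair at a time with find(), without a repeated outer group. This is a rough Java sketch, assuming single-line input. The class name, the method names and the small escaping helper are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DefinitionListWithEmpties {
    // Lazy key, "=", lazy (possibly empty) value, then "|" or the end of the input.
    private static final Pattern PAIR = Pattern.compile("(.*?)=(.*?)(?:\\||$)");

    // Hypothetical helper: escapes the characters that are special in HTML text.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Builds a <dl>, keeping pairs whose value is empty.
    static String toDl(String input) {
        StringBuilder html = new StringBuilder("<dl>\n");
        Matcher m = PAIR.matcher(input);
        while (m.find()) { // one key/value pair per iteration
            html.append("<dt>").append(escape(m.group(1))).append("</dt>\n");
            html.append("<dd>").append(escape(m.group(2))).append("</dd>\n");
        }
        return html.append("</dl>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toDl("Escape test=&<>|Width=3/8 in|Color="));
    }
}
```

Escaping each captured group separately, rather than the whole string up front, keeps the markup characters in the values from being confused with the generated tags.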