Find MathML fragments in a String using Java

Developer https://www.devze.com 2023-03-08 07:40 Source: Internet
I have a big string that contains multiple MathML fragments. I want to extract all of them into a string array. I am using a regex to find them, but something is missing in the regex, so it doesn't give any output.

What is the regex for MathML?

Example string

Find sum of «math xmlns=\"http://www.w3.org/1998/Math/MathML\"»«mroot»«mrow»«mi»#«/mi»«mi»a«/mi»«/mrow»«mn»3«/mn»«/mroot»«mo»=«/mo»«mroot»«mrow»«mi»#«/mi»«mi»b«/mi»«/mrow»«mn»3«/mn»«/mroot»«/math» and «math xmlns=\"http://www.w3.org/1998/Math/MathML\"»«mo»=«/mo»«msup»«mfenced»«mrow»«mi»#«/mi»«mi»b«/mi»«/mrow»«/mfenced»«mfrac»«mn»1«/mn»«mn»3«/mn»«/mfrac»«/msup»«/math»

From this string I want to get the 2 MathML fragments.


You can't do that with Java's regex engine since this is valid input:

<math>
  <apply>
    <plus/>
    <apply>
      <times/>
      <ci>a</ci>
      <apply>
        <power/>
        <ci>x</ci>
        <cn>2</cn>
      </apply>
    </apply>
    <apply>
      <times/>
      <ci>b</ci>
      <ci>x</ci>
    </apply>
    <ci>c</ci>
  </apply>
</math>

That is, tags can be nested arbitrarily, and Java's regex engine has no ability to match recursive patterns. You will have to resort to a parser to handle MathML input.
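If you go the parser route, something along these lines might work. This is only a rough sketch using the JDK's built-in DOM parser, and it assumes the text around the MathML is itself well-formed XML (no stray `<` or `&`) and that the tags use ordinary `<` and `>`. The class name `MathMlDomExtractor` and the wrapping `<root>` element are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: parses the text as XML and serializes every <math> element.
class MathMlDomExtractor {

    static List<String> extract(String text) {
        try {
            // Wrap in an illustrative <root> element so that text mixed with
            // several <math> elements parses as a single document.
            String wrapped = "<root>" + text + "</root>";
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));

            // Identity transformer, used only to turn DOM nodes back into strings.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            NodeList mathNodes = doc.getElementsByTagName("math");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < mathNodes.getLength(); i++) {
                StringWriter out = new StringWriter();
                serializer.transform(new DOMSource(mathNodes.item(i)), new StreamResult(out));
                result.add(out.toString());
            }
            return result;
        } catch (Exception e) {
            // Thrown when the input is not well-formed XML or cannot be serialized.
            throw new IllegalArgumentException("Could not parse or serialize input", e);
        }
    }
}
```

Because the parser tracks open and close tags itself, nesting depth inside each `<math>` element does not matter here. The serialized strings may differ cosmetically from the input (attribute quoting, whitespace), so do not rely on byte-for-byte equality.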

EDIT

Can I treat the entire thing as a string and search for a matching pattern? That is what I am trying to do. There will not be any recursive tags inside another tag; they will all be at the same level.

In that case, try this pattern:

<math[>\s](?s).*?</math>

or as a String literal:

"<math[>\\s](?s).*?</math>"

which means:

<math[>\s]   # match `<math` followed by a space or `>`
(?s).*?      # reluctantly match zero or more chars (`(?s)` causes `\r` 
             # and `\n` also to be matched)
</math>      # match `</math>`
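To collect the matches into a string array, a minimal sketch with `java.util.regex` could look like the following. The class name `MathMlRegexExtractor` and the sample text in `main` are made up for illustration, and the sketch assumes the tags are written with ordinary `<` and `>`. If the real input uses `«` and `»` as in the example string, the pattern would need those characters instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the pattern shown above.
class MathMlRegexExtractor {

    // (?s) turns on DOTALL from that point on, so '.' also matches line breaks.
    private static final Pattern MATH = Pattern.compile("<math[>\\s](?s).*?</math>");

    static String[] extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = MATH.matcher(text);
        // find() advances to the next non-overlapping match on each call.
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Illustrative input: two sibling <math> elements, one spanning several lines.
        String text = "Find sum of <math xmlns=\"http://www.w3.org/1998/Math/MathML\"><mn>1</mn></math>"
                + " and <math>\n<mn>2</mn>\n</math>";
        for (String fragment : extract(text)) {
            System.out.println(fragment);
        }
    }
}
```

The reluctant `.*?` stops at the first `</math>` it meets, which is why this only holds up as long as `<math>` elements are never nested inside one another.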