XSL substrings with pattern

Developer https://www.devze.com 2023-02-21 16:58 Source: web
I would like to retrieve a value from my XML file with an XSL template, but I have no idea how to retrieve this value...

I have this XML file

<BlastOutput_iterations>
<Iteration>
  <Iteration_iter-num>1</Iteration_iter-num>
  <Iteration_query-ID>Query_1</Iteration_query-ID>
  <Iteration_query-def>Teg18as_antisens</Iteration_query-def>
  <Iteration_query-len>865</Iteration_query-len>
  <Iteration_hits>
    <Hit>
      <Hit_num>1</Hit_num>
      <Hit_id>gnl|BL_ORD_ID|30</Hit_id>
      <Hit_def>gi|150392480|ref|NC_009632.1| Staphylococcus aureus subsp. aureus JH1, complete genome</Hit_def>
      <Hit_accession>2233</Hit_accession>
      <Hit_len>2906507</Hit_len>
      <Hit_hsps>
        <Hsp>
          <Hsp_num>1</Hsp_num>
          <Hsp_bit-score>1561.20031714635</Hsp_bit-score>
          <Hsp_score>1730</Hsp_score>
          <Hsp_evalue>0</Hsp_evalue>
          <Hsp_query-from>1</Hsp_query-from>
          <Hsp_query-to>865</Hsp_query-to>
          <Hsp_hit-from>355668</Hsp_hit-from>
          <Hsp_hit-to>356532</Hsp_hit-to>
          <Hsp_query-frame>1</Hsp_query-frame>
          <Hsp_hit-frame>1</Hsp_hit-frame>
          <Hsp_identity>865</Hsp_identity>
          <Hsp_positive>865</Hsp_positive>
          <Hsp_gaps>0</Hsp_gaps>
          <Hsp_align-len>865</Hsp_align-len>
          <Hsp_qseq>CAACTCGTTAGGACAATCACGATGATTGTCTACAGTTGCAGGTGGATTTGAATATACTACTAGTTATTTGTTGTCTAGGATAATAGATTTAGTATGTTGATAAGTTTGACTCAGATTTGTATTTTCTAATAAATGATAACTCACGATATCGATTAAAAAGAGTGTCGCAATTTGTGTGTTGATAAATTGATGGTCGGTATTACGCGATTGATCCGTTGTTAAAAGTACTAAATCTGCACAATCTGTAAGTTTACTACCTTCGAAATTTGTGATGGCAACGACATATGCACCATGAGATTTGGCGACTTCCGCTGCTGAAATTAATTCCGAAGTATTACCACTATTTGACATAGCAATAAACATGTCCGAATGAGATAGTAGGGATGCCGATATTTTCATTAAATGTGAATCGGTAGTAACATTACCTTTTAGCCCCATACGAATCATACGATAATAAAATTCAGTCGCTGATAAACCAGAGCTACCTAGTCCAGCAAAGAGTATATGTCGACTTGATTGGAGTTTGTCGATAAAGGTTTGGATAATGTCGTTATCAATAAATTCACCAGTTTGTTGAATGATTTGTTGATGATATTTATGAATTCTTTGAATAATTGGGCTATTTTCAATAACTGTCTCTGTCATTTCTTGTTGAATATTAAATTTTAAATCTTGGAAATTCTCATAATCTAGCTTATGACTAAAGCGTGTCATCGTTGCTGGTGATGTACCAATCGCATGGGCTAAGGAGTTAATCGTTGAAAAGGCATCGCTATAACCATTTTGTCTTATATAATTGACGATGCGTTTATCAGTTTTTGTAAATAAATGTTGATAACGTTGAACACGATTCTCAAATTTCATT</Hsp_qseq>
          <Hsp_hseq>CAACTCGTTAGGACAATCACGATGATTGTCTACAGTTGCAGGTGGATTTGAATATACTACTAGTTATTTGTTGTCTAGGATAATAGATTTAGTATGTTGATAAGTTTGACTCAGATTTGTATTTTCTAATAAATGATAACTCACGATATCGATTAAAAAGAGTGTCGCAATTTGTGTGTTGATAAATTGATGGTCGGTATTACGCGATTGATCCGTTGTTAAAAGTACTAAATCTGCACAATCTGTAAGTTTACTACCTTCGAAATTTGTGATGGCAACGACATATGCACCATGAGATTTGGCGACTTCCGCTGCTGAAATTAATTCCGAAGTATTACCACTATTTGACATAGCAATAAACATGTCCGAATGAGATAGTAGGGATGCCGATATTTTCATTAAATGTGAATCGGTAGTAACATTACCTTTTAGCCCCATACGAATCATACGATAATAAAATTCAGTCGCTGATAAACCAGAGCTACCTAGTCCAGCAAAGAGTATATGTCGACTTGATTGGAGTTTGTCGATAAAGGTTTGGATAATGTCGTTATCAATAAATTCACCAGTTTGTTGAATGATTTGTTGATGATATTTATGAATTCTTTGAATAATTGGGCTATTTTCAATAACTGTCTCTGTCATTTCTTGTTGAATATTAAATTTTAAATCTTGGAAATTCTCATAATCTAGCTTATGACTAAAGCGTGTCATCGTTGCTGGTGATGTACCAATCGCATGGGCTAAGGAGTTAATCGTTGAAAAGGCATCGCTATAACCATTTTGTCTTATATAATTGACGATGCGTTTATCAGTTTTTGTAAATAAATGTTGATAACGTTGAACACGATTCTCAAATTTCATT</Hsp_hseq>
          <Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||开发者_C百科||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
        </Hsp>
      </Hit_hsps>
    </Hit>

  </Iteration_hits>

</Iteration>
</BlastOutput_iterations>

and I would like to retrieve NC_009632.1 from "gi|150392480|ref|NC_009632.1| Staphylococcus aureus subsp. aureus JH1, complete genome". The structure of this line is always "xx|number|xxx|value_to_retrieve| ".

Thanks for your help.


<xsl:template match="Hit_def">
  <xsl:value-of select="substring-before(
                           substring-after(
                              substring-after(
                                 substring-after(., '|'),
                                 '|'
                              ),
                              '|'
                           ),
                           '|'
                        )"/>
</xsl:template>

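That should do with XSLT 1.0. With XSLT 2.0,

<xsl:template match="Hit_def">
  <xsl:value-of select="tokenize(., '\|')[4]"/>
</xsl:template>

is easier. Note that the second argument of tokenize() is a regular expression, so the pipe must be escaped as '\|'. An unescaped '|' is a pattern that matches the empty string, which tokenize() rejects with an error.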
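
Used on their own, the templates above rely on the built-in template rules, which also copy the text of every other element (scores, sequences and so on) to the output. A minimal sketch of a complete XSLT 1.0 stylesheet follows. The empty text() template, the text output method and the newline after each value are assumptions added here for illustration, not part of the answer above:

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <!-- Assumption: plain-text output is wanted -->
 <xsl:output method="text"/>

 <!-- Assumption: suppress every text node that no other template handles -->
 <xsl:template match="text()"/>

 <!-- Emit the fourth '|'-delimited field of each Hit_def, one per line -->
 <xsl:template match="Hit_def">
  <xsl:value-of select="substring-before(
                           substring-after(
                              substring-after(
                                 substring-after(., '|'),
                                 '|'
                              ),
                              '|'
                           ),
                           '|'
                        )"/>
  <xsl:text>&#10;</xsl:text>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the XML in the question, this should print NC_009632.1 followed by a newline and nothing else.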


Here is a simple tokenization. One can use the result to access any token by position:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <!-- Emits one <t> element per '|'-delimited token of the text -->
 <xsl:template match="text()" name="tokenize">
  <!-- The trailing '|' ensures the last token is also terminated -->
  <xsl:param name="pText" select="concat(.,'|')"/>
  <xsl:if test="string-length($pText)">
   <t>
    <xsl:value-of select="substring-before($pText, '|')"/>
   </t>
   <!-- Recurse on whatever follows the first delimiter -->
   <xsl:call-template name="tokenize">
    <xsl:with-param name="pText" select="substring-after($pText, '|')"/>
   </xsl:call-template>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

When applied to:

<t>a|b|c|d|e|f|g|h</t>

it produces:

<t>a</t>
<t>b</t>
<t>c</t>
<t>d</t>
<t>e</t>
<t>f</t>
<t>g</t>
<t>h</t>

Finally, we use this tokenization technique to solve the original problem:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>

 <xsl:template match="/">
  <!-- Collect the tokens of Hit_def as a result tree fragment -->
  <xsl:variable name="vrtfTokens">
   <xsl:call-template name="tokenize">
    <xsl:with-param name="pText" select="/*/*/*/*/Hit_def"/>
   </xsl:call-template>
  </xsl:variable>

  <!-- Convert the fragment to a node-set (EXSLT) and pick the fourth token -->
  <xsl:value-of select="ext:node-set($vrtfTokens)/*[4]"/>
 </xsl:template>

 <!-- Emits one <t> element per '|'-delimited token of $pText -->
 <xsl:template match="text()" name="tokenize">
  <xsl:param name="pText" select="concat(.,'|')"/>
  <xsl:if test="string-length($pText)">
   <t>
    <xsl:value-of select="substring-before($pText, '|')"/>
   </t>
   <xsl:call-template name="tokenize">
    <xsl:with-param name="pText" select="substring-after($pText, '|')"/>
   </xsl:call-template>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<BlastOutput_iterations>
    <Iteration>
        <Iteration_iter-num>1</Iteration_iter-num>
        <Iteration_query-ID>Query_1</Iteration_query-ID>
        <Iteration_query-def>Teg18as_antisens</Iteration_query-def>
        <Iteration_query-len>865</Iteration_query-len>
        <Iteration_hits>
            <Hit>
                <Hit_num>1</Hit_num>
                <Hit_id>gnl|BL_ORD_ID|30</Hit_id>
                <Hit_def>gi|150392480|ref|NC_009632.1| Staphylococcus aureus subsp. aureus JH1, complete genome</Hit_def>
                <Hit_accession>2233</Hit_accession>
                <Hit_len>2906507</Hit_len>
                <Hit_hsps>
                    <Hsp>
                        <Hsp_num>1</Hsp_num>
                        <Hsp_bit-score>1561.20031714635</Hsp_bit-score>
                        <Hsp_score>1730</Hsp_score>
                        <Hsp_evalue>0</Hsp_evalue>
                        <Hsp_query-from>1</Hsp_query-from>
                        <Hsp_query-to>865</Hsp_query-to>
                        <Hsp_hit-from>355668</Hsp_hit-from>
                        <Hsp_hit-to>356532</Hsp_hit-to>
                        <Hsp_query-frame>1</Hsp_query-frame>
                        <Hsp_hit-frame>1</Hsp_hit-frame>
                        <Hsp_identity>865</Hsp_identity>
                        <Hsp_positive>865</Hsp_positive>
                        <Hsp_gaps>0</Hsp_gaps>
                        <Hsp_align-len>865</Hsp_align-len>
                        <Hsp_qseq>CAACTCGTTAGGACAATCACGATGATTGTCTACAGTTGCAGGTGGATTTGAATATACTACTAGTTATTTGTTGTCTAGGATAATAGATTTAGTATGTTGATAAGTTTGACTCAGATTTGTATTTTCTAATAAATGATAACTCACGATATCGATTAAAAAGAGTGTCGCAATTTGTGTGTTGATAAATTGATGGTCGGTATTACGCGATTGATCCGTTGTTAAAAGTACTAAATCTGCACAATCTGTAAGTTTACTACCTTCGAAATTTGTGATGGCAACGACATATGCACCATGAGATTTGGCGACTTCCGCTGCTGAAATTAATTCCGAAGTATTACCACTATTTGACATAGCAATAAACATGTCCGAATGAGATAGTAGGGATGCCGATATTTTCATTAAATGTGAATCGGTAGTAACATTACCTTTTAGCCCCATACGAATCATACGATAATAAAATTCAGTCGCTGATAAACCAGAGCTACCTAGTCCAGCAAAGAGTATATGTCGACTTGATTGGAGTTTGTCGATAAAGGTTTGGATAATGTCGTTATCAATAAATTCACCAGTTTGTTGAATGATTTGTTGATGATATTTATGAATTCTTTGAATAATTGGGCTATTTTCAATAACTGTCTCTGTCATTTCTTGTTGAATATTAAATTTTAAATCTTGGAAATTCTCATAATCTAGCTTATGACTAAAGCGTGTCATCGTTGCTGGTGATGTACCAATCGCATGGGCTAAGGAGTTAATCGTTGAAAAGGCATCGCTATAACCATTTTGTCTTATATAATTGACGATGCGTTTATCAGTTTTTGTAAATAAATGTTGATAACGTTGAACACGATTCTCAAATTTCATT</Hsp_qseq>
                        <Hsp_hseq>CAACTCGTTAGGACAATCACGATGATTGTCTACAGTTGCAGGTGGATTTGAATATACTACTAGTTATTTGTTGTCTAGGATAATAGATTTAGTATGTTGATAAGTTTGACTCAGATTTGTATTTTCTAATAAATGATAACTCACGATATCGATTAAAAAGAGTGTCGCAATTTGTGTGTTGATAAATTGATGGTCGGTATTACGCGATTGATCCGTTGTTAAAAGTACTAAATCTGCACAATCTGTAAGTTTACTACCTTCGAAATTTGTGATGGCAACGACATATGCACCATGAGATTTGGCGACTTCCGCTGCTGAAATTAATTCCGAAGTATTACCACTATTTGACATAGCAATAAACATGTCCGAATGAGATAGTAGGGATGCCGATATTTTCATTAAATGTGAATCGGTAGTAACATTACCTTTTAGCCCCATACGAATCATACGATAATAAAATTCAGTCGCTGATAAACCAGAGCTACCTAGTCCAGCAAAGAGTATATGTCGACTTGATTGGAGTTTGTCGATAAAGGTTTGGATAATGTCGTTATCAATAAATTCACCAGTTTGTTGAATGATTTGTTGATGATATTTATGAATTCTTTGAATAATTGGGCTATTTTCAATAACTGTCTCTGTCATTTCTTGTTGAATATTAAATTTTAAATCTTGGAAATTCTCATAATCTAGCTTATGACTAAAGCGTGTCATCGTTGCTGGTGATGTACCAATCGCATGGGCTAAGGAGTTAATCGTTGAAAAGGCATCGCTATAACCATTTTGTCTTATATAATTGACGATGCGTTTATCAGTTTTTGTAAATAAATGTTGATAACGTTGAACACGATTCTCAAATTTCATT</Hsp_hseq>
                        <Hsp_midline>|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||</Hsp_midline>
                    </Hsp>
                </Hit_hsps>
            </Hit>
        </Iteration_hits>
    </Iteration>
</BlastOutput_iterations>

the wanted result is produced:

NC_009632.1
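
Note that when /*/*/*/*/Hit_def selects several nodes, the XPath string functions use only the first one in document order, so the transformation above reports a single accession even if the file contains several hits. A possible variant is sketched below. It assumes that every Hit_def follows the same "xx|number|xxx|value| " layout, that the processor supports the EXSLT node-set() function, and that one value per line as plain text is the desired output (the for-each loop and the newline are additions for illustration):

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:ext="http://exslt.org/common"
 exclude-result-prefixes="ext">
 <!-- Assumption: plain-text output, one accession per line -->
 <xsl:output method="text"/>

 <xsl:template match="/">
  <!-- Visit every Hit_def instead of only the first one -->
  <xsl:for-each select="//Hit_def">
   <xsl:variable name="vrtfTokens">
    <xsl:call-template name="tokenize">
     <!-- Append a delimiter so the last token is terminated too -->
     <xsl:with-param name="pText" select="concat(., '|')"/>
    </xsl:call-template>
   </xsl:variable>
   <!-- Fourth token of the current Hit_def -->
   <xsl:value-of select="ext:node-set($vrtfTokens)/*[4]"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>

 <!-- Same recursive tokenizer as above, as a named template only -->
 <xsl:template name="tokenize">
  <xsl:param name="pText"/>
  <xsl:if test="string-length($pText)">
   <t>
    <xsl:value-of select="substring-before($pText, '|')"/>
   </t>
   <xsl:call-template name="tokenize">
    <xsl:with-param name="pText" select="substring-after($pText, '|')"/>
   </xsl:call-template>
  </xsl:if>
 </xsl:template>
</xsl:stylesheet>
```

Processors without EXSLT support usually offer an equivalent extension function under a different namespace, so the ext: prefix binding may need to be adapted.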