How to get a part of html file?

Developer https://www.devze.com 2023-04-03 17:08 Source: Internet


Related topic: parsing

I just want to extract one part of a website, including all the HTML tags inside it:

<table></table>
...
<div><font>some <b>kind</b> of <i>individual</i> text I need</font></div>
...
<div>other things I don't need</div>

-> I only want this: <font>some <b>kind</b> of <i>individual</i> text I need</font>

My goal is to display this part, with its bold tags and images, in a UIWebView. I've tried some XPath parsers, but they skipped the tags that I wanted to display in the web view. On Stack Overflow I found a solution using JavaScript (extract-part-of-html-in-c-objective-c), but I don't know how it could help me in my iOS application.

Hopefully someone can help me.

You may find this useful: (see the Demo inside this article)

http://api.jquery.com/html/

It's almost everything you need, except the "make tags bold" part.

Update: this includes getting the content from a separate URL:

http://api.jquery.com/jQuery.get/

$.get("http://www.website_i_need_to_parce.com", function (data) {
  // "data" is the response as an HTML string, so parse it into a
  // detached element first; then it can be searched like a document
  var htmlStr = $("<div>").html(data).find("#someDiv").html();
});

Once the callback has run, htmlStr contains the contents of the div with id="someDiv". To paste these contents into your page as HTML (inside the callback, where htmlStr is in scope), use:

  $('#div_on_my_site_where_I_Want_to_paste_code').html(htmlStr);
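For the iOS case in the question, jQuery is not strictly needed. The same extraction can be sketched with plain DOM calls, and the returned string could then be shown in a UIWebView, for example with loadHTMLString:baseURL:. This is only a minimal sketch: the helper name extractInnerHtml is hypothetical, and it assumes the page is already loaded into a DOM document.

```javascript
// Hypothetical helper: returns the markup inside the first element that
// matches `selector`, or null when nothing matches.
// `doc` is assumed to be a DOM Document (e.g. `document` inside a web view).
function extractInnerHtml(doc, selector) {
  var el = doc.querySelector(selector);
  // innerHTML keeps the nested tags (<b>, <i>, <img> ...) but not the
  // matched element's own opening and closing tags.
  return el ? el.innerHTML : null;
}
```

Inside a loaded page this would be called as extractInnerHtml(document, '#someDiv').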

Assuming that the context node is the parent of the div, and that the div is the first div child of the context node (you haven't provided the complete source XML), this XPath expression selects the wanted nodes:

div[1]/node()

XSLT-based verification:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/t">
     <xsl:copy-of select="div[1]/node()"/>
 </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML fragment (wrapped in a single top node to make it a well-formed XML document):

<t>
    <table></table>
    ...    
    <div>
        <font>some 
            <b>kind</b> of 
            <i>individual</i> text I need
        </font>
    </div>
     ...    
    <div>other things I don't need</div>
</t>

the wanted, correct result is produced:

<font>some 
            <b>kind</b> of 
            <i>individual</i> text I need
        </font>

Explanation: The XPath expression above selects all child nodes of the first div child of the context node. This is exactly what is wanted: all children of the div element, excluding the div element itself.
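The same expression can also be evaluated from JavaScript in a DOM that supports XPath through document.evaluate. The sketch below is an illustration under assumptions, not part of the answer above: the helper name selectDivChildrenMarkup is hypothetical, and contextNode is assumed to be the parent of the wanted div.

```javascript
// Hypothetical helper: evaluates `div[1]/node()` relative to `contextNode`
// and concatenates the markup of every selected node.
// `doc` is assumed to be a DOM Document that provides document.evaluate.
function selectDivChildrenMarkup(doc, contextNode) {
  var ORDERED_NODE_SNAPSHOT_TYPE = 7; // same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE
  var ELEMENT_NODE = 1;               // same value as Node.ELEMENT_NODE
  var TEXT_NODE = 3;                  // same value as Node.TEXT_NODE
  var result = doc.evaluate('div[1]/node()', contextNode, null,
                            ORDERED_NODE_SNAPSHOT_TYPE, null);
  var markup = '';
  for (var i = 0; i < result.snapshotLength; i++) {
    var node = result.snapshotItem(i);
    if (node.nodeType === ELEMENT_NODE) {
      markup += node.outerHTML;  // element including its own tags
    } else if (node.nodeType === TEXT_NODE) {
      markup += node.nodeValue;  // raw text, appended as-is (not re-escaped)
    }
    // Other node types (comments etc.) are skipped in this sketch.
  }
  return markup;
}
```

Because the expression stops at node(), the div itself is never part of the result; only its children are serialized.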