Convert links in blockquotes to plain text

开发者 (Developer) https://www.devze.com 2023-03-16 07:58 Source: the web
So, I've been asking a lot of Xpath questions recently. Sorry, but I've only just started using it, and I'm working on a kind of hard project. You see, at the moment I'm parsing HTML like this (not a copy and paste, just an example):

<span id="no153434"></span>
<blockquote>Text here.<br/>More text.<br/>Some more text.</blockquote>

And I'm using

//span[starts-with(@id, 'no')]/following::*[1][name()='blockquote']//node()

to get the text inside. It's working fine, although it's very frustrating. I need to manually check for <br/>, then manually combine the strings before and after the br, add a newline, and so on. But it still works. Until there is a link in the text, that is. Then the code is like this:

<span id="no153434"></span>
<blockquote>Text here.<br/>Text.<br/><font class = "unkfunc"><a href="linkhere" class="link">linkhere</a></font></blockquote>

I have absolutely NO idea where to go from here, as the link is included as a completely separate item (twice) in the array. At least with the br I knew where it had to be moved to. Really contemplating giving up on this project after all this effort.


You can use this XPath to obtain the text inside the element: //span[starts-with(@id, 'no')]/following::*[1][name()='blockquote']//text()

This returns the following result:

  1. Text here.
  2. Text.
  3. linkhere


If you want only the text nodes and the br elements:

 //span
  [starts-with(@id, 'no')]/
  following::*[1][name()='blockquote']
   //node()
   [ count(.|..//text()) = count(..//text())
     or 
     name()='br'
   ]

returns

Text here.
<br />
Text.
<br />
linkhere
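
To turn a node sequence like this into a single string, the text nodes can be concatenated in document order with each br replaced by a newline. The following is a minimal Python sketch of that idea, not the XPath itself. It assumes the fragment is well-formed XML (as the example in the question is) so that the standard-library xml.etree.ElementTree can parse it; the names SAMPLE and flatten_text are hypothetical.

```python
import xml.etree.ElementTree as ET

# The blockquote from the question, which happens to be well-formed XML.
SAMPLE = (
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class = "unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote>'
)

def flatten_text(elem):
    """Return the text under elem in document order, with each <br> as a newline."""
    parts = []
    if elem.tag == "br":
        parts.append("\n")
    if elem.text:
        # Text that appears before the first child element.
        parts.append(elem.text)
    for child in elem:
        parts.append(flatten_text(child))
        if child.tail:
            # Text that follows the child inside the same parent.
            parts.append(child.tail)
    return "".join(parts)

blockquote = ET.fromstring(SAMPLE)
print(flatten_text(blockquote))
```

Because the link text is just another text node nested deeper in the tree, it needs no special handling: the font and a wrappers contribute nothing except the text they contain.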


The answer is to not use XPath for this kind of work. Got it working 1,000,000x easier with Objective-C-HTML-Parser.
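
For comparison, here is a rough sketch of the same parser-based idea in Python, using the standard-library html.parser module as a stand-in; it does not reproduce the API of Objective-C-HTML-Parser. The class name BlockquoteText and the helper blockquotes_to_text are hypothetical, and the sketch collects every blockquote rather than only those that follow a span whose id starts with 'no'.

```python
from html.parser import HTMLParser

class BlockquoteText(HTMLParser):
    """Collect the plain text of each <blockquote>, one string per element."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # nesting level of open <blockquote> tags
        self.buffer = []    # text pieces of the blockquote being read
        self.results = []   # finished plain-text strings

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <br/>.
        if tag == "blockquote":
            self.depth += 1
        elif tag == "br" and self.depth:
            self.buffer.append("\n")

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer))
                self.buffer = []

    def handle_data(self, data):
        # Link text arrives here like any other text, so <a> and <font>
        # wrappers are dropped without special handling.
        if self.depth:
            self.buffer.append(data)

def blockquotes_to_text(html):
    """Return a list with the plain text of every blockquote in html."""
    parser = BlockquoteText()
    parser.feed(html)
    parser.close()
    return parser.results

page = (
    '<span id="no153434"></span>\n'
    '<blockquote>Text here.<br/>Text.<br/>'
    '<font class = "unkfunc"><a href="linkhere" class="link">linkhere</a></font>'
    '</blockquote>'
)
print(blockquotes_to_text(page))
```

A streaming parser like this tolerates markup that is not well-formed XML, which is usually the case for pages scraped from the web.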
