Find specific div with RegEx and print content

Developer https://www.devze.com 2023-04-11 22:44 Source: Internet

I'm trying to pull some text from an external website using this script.

It works perfectly, but it gets the entire page. I want to take only the content inside a specific div with the class 'content'. The entire page is put inside the variable 'data', and then this function is created to strip some tags:

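function filterData(data){
  // strip opening and closing <body> tags
  data = data.replace(/<\/?body[^>]*>/g,'');
  // remove line breaks
  data = data.replace(/[\r\n]+/g,'');
  // remove HTML comments
  data = data.replace(/<!--[\S\s]*?-->/g,'');
  // remove <noscript> and <script> blocks
  data = data.replace(/<noscript[^>]*>[\S\s]*?<\/noscript>/g,'');
  data = data.replace(/<script[^>]*>[\S\s]*?<\/script>/g,'');
  // remove self-closing <script ... /> tags without running past the tag
  data = data.replace(/<script[^>]*\/>/g,'');
  return data;
}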

How would I go about finding the div with the class 'content' and only viewing the content inside that?

UPDATE: Sorry about using RegExes — can you help me to get the content without using RegEx? So, this is my HTML file:

<a href="http://www.eurest.dk/kantiner/228/all.asp?a=9" class="ajaxtrigger">erg</a>
<div id="target" style="width:200px;height:500px;"></div>
<div id="code" style="width:200px;height:200px;"></div>
<script src="http://code.jquery.com/jquery.min.js"></script>
<script>
$(document).ready(function(){
  var container = $('#target');
  $('.ajaxtrigger').click(function(){
    doAjax($(this).attr('href'));
    return false;
  });
  function doAjax(url){
    if(url.match('^http')){
      $.getJSON("http://query.yahooapis.com/v1/public/yql?"+
                "q=select%20*%20from%20html%20where%20url%3D%22"+
                encodeURIComponent(url)+
                "%22&format=xml&callback=?",
        function(data){
          if(data.results[0]){
            var tree = string2dom(data.results[0]);
            container.html($("div.content", tree.doc));
            tree.destroy();
          } else {
            var errormsg = '<p>Error: could not load the page.</p>';
            container.html(errormsg);
          }
        }
      );
    } else {
      $('#target').load(url);
    }
  }
  function filterData(data){

    return tree;
  }
});
</script>


Try something like this:

var matches = data.match(/<div class="content">([^<]*)<\/div>/);

if (matches) 
    return matches[1]; // div content


Try this:

<div\b[^>]*class="content"[^>]*>([\s\S]*?)<\/div>


Try this:

<div[^>]*?class='content'[^>]*?>(.*?)</div>

Capture group 1 (\1) will contain your content. That said, you shouldn't be doing this with regexes :)
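
For reference, here is a minimal sketch of how one of the patterns above could be applied in JavaScript and how the first capture group is read. The sample markup is made up for illustration; in the question the input would be the fetched page held in 'data'.

```javascript
// Hypothetical sample input standing in for the fetched page.
const data = '<html><body><div id="nav">menu</div>' +
  '<div class="content"><p>Lunch: soup</p></div></body></html>';

// Double-quoted pattern from the earlier answer: lazy match up to the
// first closing </div> after the opening tag.
const pattern = /<div\b[^>]*class="content"[^>]*>([\s\S]*?)<\/div>/;

const match = data.match(pattern);
// match[0] is the whole div; match[1] is capture group 1 (the inner HTML).
const content = match ? match[1] : null;

console.log(content); // <p>Lunch: soup</p>
```

This only behaves as expected when the target div contains no nested div elements and the class attribute is written exactly as class="content".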


This may help you:

    var divtxt = data.match(/<div[^>]*class="content"[^>]*>.*<\/div>/);

but it may stop at the wrong </div>.
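
To make the closing-tag problem concrete, here is a small sketch comparing a lazy and a greedy version of such a pattern on markup with a nested div. The markup and variable names are invented for the example.

```javascript
// Hypothetical markup: the target div contains a nested div and is
// followed by another, unrelated div.
const nested = '<div class="content"><div class="inner">a</div>b</div>' +
  '<div class="footer">c</div>';

// Lazy: stops at the first </div>, which closes the inner div.
const lazy = /<div[^>]*class="content"[^>]*>([\s\S]*?)<\/div>/;
// Greedy: runs to the last </div> in the input, past the target div.
const greedy = /<div[^>]*class="content"[^>]*>([\s\S]*)<\/div>/;

const lazyResult = nested.match(lazy)[1];
const greedyResult = nested.match(greedy)[1];

console.log(lazyResult);   // <div class="inner">a
console.log(greedyResult); // <div class="inner">a</div>b</div><div class="footer">c
```

Neither result is the real inner content (<div class="inner">a</div>b), which is why a DOM-based approach is usually the safer choice.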

You should use jQuery or Prototype to turn the markup into a DOM object and use selectors to find the right div. Using jQuery, you would do something like:

    var divtxt = $(data).find(".content").first().html();

Remember to load the jQuery library first.
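
If you would rather not depend on a library, the same idea can be sketched with the browser's built-in DOMParser. This is only an illustration, not part of the original answer: the function name extractContentHtml is made up, and the code assumes it runs in a browser where DOMParser is available.

```javascript
// Sketch: parse the fetched markup into a document and query it with a
// CSS selector instead of a regular expression.
// Assumption: a browser environment that provides DOMParser.
function extractContentHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // First div with the class "content", or null if there is none.
  const node = doc.querySelector('div.content');
  return node ? node.innerHTML : null;
}
```

You could then call it with the fetched page, for example container.html(extractContentHtml(data)), where 'container' and 'data' are the variables from the question. Because the markup goes through a real HTML parser, nested div elements inside the target are handled correctly.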
