Using JavaScript and a regular expression to get the content inside the HTML body [duplicate]

Source: https://www.devze.com, 2023-03-17 18:33 (reposted from the web)
This question already has answers here: Closed 11 years ago.

Possible Duplicate:

how to extract body contents using regexp

I have response text that contains a full page (html, head, body). I want only the content inside the body. How can I achieve this using a regex? Please help.


A DOM parser is the most reliable way to extract data like this, but a regex can do a pretty decent job if the HTML is sane (i.e. the text <body or </body does not occur inside comments, scripts, stylesheets, CDATA sections or attribute values, and the attributes of the BODY start tag do not contain the > character). This regex captures the contents of the first innermost BODY element (there should only ever be one):

var bodytext = '';
var m = text.match(/<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i);
if (m) bodytext = m[1];
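
To make the snippet above easier to try, it can be wrapped in a small helper. This is only a sketch: the function name extractBody and the sample strings are assumptions made for illustration and are not part of the original answer. The regex itself is unchanged.

```javascript
// Hypothetical helper wrapping the regex from the answer.
// Returns the contents of the BODY element, or '' if no match is found.
function extractBody(text) {
  var m = text.match(/<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i);
  return m ? m[1] : '';
}

// Illustrative input (assumed, not from the question).
var page = '<html><head><title>t</title></head>' +
           '<body class="main"><p>Hello</p></body></html>';
console.log(extractBody(page));   // <p>Hello</p>

// Caveat named in the answer: "</body" inside a comment ends the match early,
// so only the text before it is captured.
var tricky = '<body><!-- </body> --><p>Hi</p></body>';
console.log(JSON.stringify(extractBody(tricky))); // "<!-- "
```

The second call shows why the "sane HTML" condition matters: the regex has no notion of comments, so it stops at the first </body it sees.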

It implements Jeffrey Friedl's "Unrolling-the-Loop" efficiency technique, so it is quite fast.
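
The unrolled pattern has the shape normal* (special normal*)*: here [^<]* is the "normal" run, and (?!<\/?body)< is the "special" case, a < that does not begin a BODY tag. As a rough comparison, the sketch below sets it beside a simpler pattern that checks the lookahead before every single character. The variable names and the sample string are assumptions for illustration. On sane input both capture the same text, but the unrolled form only evaluates the lookahead at each < character.

```javascript
// Unrolled form from the answer: lookahead runs only at '<' characters.
var unrolled = /<body[^>]*>([^<]*(?:(?!<\/?body)<[^<]*)*)<\/body\s*>/i;

// Simpler equivalent for comparison: lookahead runs before every character.
var simple = /<body[^>]*>((?:(?!<\/?body)[\s\S])*)<\/body\s*>/i;

// Illustrative input (assumed).
var sample = '<html><body id="b">\n<h1>Title</h1>\n<p>Text</p>\n</body></html>';

var a = sample.match(unrolled);
var b = sample.match(simple);
console.log(a[1] === b[1]); // true
```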
