Regexp - search for text which doesn't contain whole word

Developer https://www.devze.com 2022-12-19 04:04 Source: Internet

I have text similar to this:

<html>this is the text and this is another text</html>

and I need to get this text using regexp

this is the text

The problem is that when I use a simple regexp like this (<html>.*), I get the whole text up to the last occurrence of the closing tag.

Can anyone help me?

Thanks, lennyd

You need a non-greedy match:

<html>.*?</p>

Also, you might want to consider using an HTML parser instead of regular expressions for this task.
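As a rough illustration of the parser route, here is a minimal sketch using Python's standard-library html.parser. The sample input is an assumption: the markup in the question appears to have lost its inner tags, so the <p> elements are a guess based on the answers, and ParagraphCollector is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Assumed sample input; the <p> tags are a guess, not taken from the question.
text = "<html><p>this is the text</p> and <p>this is another text</p></html>"


class ParagraphCollector(HTMLParser):
    """Hypothetical helper: collects the text content of each <p> element."""

    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")  # start a new paragraph buffer

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        # Text outside <p> (here " and ") is ignored.
        if self.in_p:
            self.paragraphs[-1] += data


collector = ParagraphCollector()
collector.feed(text)
collector.close()

first_paragraph = collector.paragraphs[0]
print(first_paragraph)
```

Because the parser tracks element boundaries itself, inline tags inside a paragraph do not need any special handling here; their text is simply appended to the current paragraph.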

By default, regular expression quantifiers are greedy, i.e. you get the longest possible match. To get an 'un-greedy' (lazy) match, you have to ask for it explicitly by using .*?
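The difference between the two quantifiers can be seen in a short Python sketch. The input string is an assumption (the question's markup seems to have lost its inner tags, so the <p> elements are guessed from the answers).

```python
import re

# Assumed sample input; the <p> tags are a guess, not taken from the question.
text = "<html><p>this is the text</p> and <p>this is another text</p></html>"

# Greedy: .* runs as far as it can, so the match ends at the LAST </p>.
greedy = re.search(r"<p>(.*)</p>", text).group(1)

# Lazy: .*? expands one character at a time and stops at the FIRST </p>.
lazy = re.search(r"<p>(.*?)</p>", text).group(1)

print(greedy)
print(lazy)
```

Here the greedy pattern captures everything between the first <p> and the last </p>, while the lazy one captures only the first paragraph.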

To capture the data between paragraph tags you can use a regexp with a positive look-ahead assertion, /(.*)(?=<\/p>)/. It is greedier than .*? (the .* still runs up to the last </p>) and runs slower, but it may be helpful for you. Also make sure that your HTML is valid, which means:

All paragraph tags are closed. Browsers close an open paragraph tag implicitly when they enter another block.
Paragraph tags are not nested :) Otherwise you will have problems with any regex.
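To see how the look-ahead pattern behaves, here is a small Python sketch. The input is an assumed sample with two well-formed, non-nested paragraphs, and the second pattern (look-behind plus a lazy quantifier) is an added variation, not part of the answer above.

```python
import re

# Assumed sample input with closed, non-nested <p> tags.
text = "<p>this is the text</p> and <p>this is another text</p>"

# The look-ahead only checks that </p> follows; .* is still greedy,
# so the capture starts at the beginning and runs to the last </p>.
greedy_lookahead = re.search(r"(.*)(?=</p>)", text).group(1)

# Variation (an assumption, not from the answer): anchor the start with a
# look-behind and make the quantifier lazy to get only the first paragraph.
lazy_lookaround = re.search(r"(?<=<p>)(.*?)(?=</p>)", text).group(1)

print(greedy_lookahead)
print(lazy_lookaround)
```

Neither look-around consumes the tags, so the captured text contains no leading <p> or trailing </p> in the second case.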

Silly question: if you are sticking with pure regex, why not just strip any <..> inside the paragraphs first? Then grab the phrases using something like [^<].
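One possible reading of this suggestion, sketched in Python: first remove every tag that is not <p> or </p>, then capture each paragraph with a negated character class. The sample input (including the inline <b> tag) and both patterns are assumptions made for illustration.

```python
import re

# Assumed sample input; the inline <b> tag is added to show the stripping step.
text = "<p>this is <b>the</b> text</p> and <p>this is another text</p>"

# Step 1: strip every tag except <p> and </p>.
# (?!p\b) rejects a tag whose name is exactly "p"; <pre> and the like still match.
stripped = re.sub(r"</?(?!p\b)[a-zA-Z][^>]*>", "", text)

# Step 2: [^<]* cannot cross a "<", so each capture stops at its own </p>
# without needing a lazy quantifier.
paragraphs = re.findall(r"<p>([^<]*)</p>", stripped)

print(stripped)
print(paragraphs)
```

The negated class works only because step 1 guarantees that no "<" is left inside a paragraph; on unstripped input with inline tags, the second pattern would skip those paragraphs.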