Can't manage to write regex for this [duplicate]

Developer https://www.devze.com 2023-03-27 12:19 Source: Internet

This question already has answers here: Closed 11 years ago. Possible Duplicate: What RSS parser should I use in PHP?

Related topics: cdata php regex

Here is the code:

<item>
<title><![CDATA[OLK: The statement of shareholders for shares sale and for shares purchase]]></title>
<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250910</link>
<description><![CDATA[<pre></pre>]]></description>
<pubDate>2011-08-12 16:25:00</pubDate>
<guid>250910</guid>
</item>
<item>
<title><![CDATA[ZMP: Pranešimas apie sandorius susijusį su emitento vertybiniais popieriais]]></title>
<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250907</link>
<description><![CDATA[<pre></pre>]]></description>
<pubDate>2011-08-12 16:12:00</pubDate>
<guid>250907</guid>
</item>

I need to get the values OLK and ZMP, which sit between <title><![CDATA[ and the colon. What is the fastest and most efficient way to do this with a PHP regex? And why is CDATA here? NOTE: I'm getting the news_id= too.

You should use an XML parser (e.g. SimpleXML) to get at the tag content, and then apply a regular expression to that content.

This is the most efficient solution, because:

an XML parser is the most efficient way to parse XML documents,
if you really need a regular expression, you should apply it only to the data contained within the CDATA section.

As for the part of your question about CDATA, you can see more info about it here.
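The parser-first approach described above could look roughly like this. It is a minimal sketch, not a definitive implementation: the <channel> wrapper is an assumption added so the fragment from the question is well-formed, the second title is abbreviated, and the variable names ($xml, $feed, $results) are made up for illustration. LIBXML_NOCDATA asks libxml to merge CDATA sections into ordinary text nodes.

```php
<?php
// Sample data: the two items from the question, wrapped in a hypothetical
// <channel> root element (an assumption) so the string parses as XML.
$xml = <<<'XML'
<channel>
<item>
<title><![CDATA[OLK: The statement of shareholders for shares sale and for shares purchase]]></title>
<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250910</link>
<guid>250910</guid>
</item>
<item>
<title><![CDATA[ZMP: Pranešimas apie sandorius]]></title>
<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250907</link>
<guid>250907</guid>
</item>
</channel>
XML;

// Parse the document; CDATA content is exposed as plain text.
$feed = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

$results = [];
foreach ($feed->item as $item) {
    $title = (string) $item->title;

    // The regex only has to deal with the title text, not with the XML.
    $ticker = preg_match('/^\s*([^:]+):/', $title, $m) ? $m[1] : null;

    // The parser has already decoded &amp; to &, so the query string
    // can be split with the standard URL helpers.
    $query = [];
    parse_str((string) parse_url((string) $item->link, PHP_URL_QUERY), $query);

    $results[] = [
        'ticker'  => $ticker,
        'news_id' => $query['news_id'] ?? null,
    ];
}

print_r($results);
```

With this split, the regular expression never sees angle brackets or CDATA markers, which is why it can stay so short.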

This is a great guide to parsing XML properly with PHP: http://www.kirupa.com/web/xml_php_parse_beginner.htm It is what I used when I started with PHP to figure out how the XML parser works.

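Consider using an XML parser. CDATA allows you to use special characters (such as < and &) inside the value without escaping them. If you insist on using regex, try the following, which captures whatever sits between <title><![CDATA[ and the first colon:

/<title><!\[CDATA\[([^:\]]+):/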
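For illustration, a pattern of this kind can be run over the raw feed with preg_match_all. This is a sketch under stated assumptions: $feed holds an abbreviated copy of the two items from the question, and the capture group takes whatever precedes the first colon, so it is not tied to a single ticker. The second pattern for news_id is likewise an assumption about what the asker wants.

```php
<?php
// Abbreviated copy of the feed from the question (assumed to be in a string).
$feed = '<item>'
    . '<title><![CDATA[OLK: The statement of shareholders]]></title>'
    . '<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250910</link>'
    . '</item>'
    . '<item>'
    . '<title><![CDATA[ZMP: Notice]]></title>'
    . '<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250907</link>'
    . '</item>';

// Group 1 captures the text between "<title><![CDATA[" and the first colon.
preg_match_all('/<title><!\[CDATA\[([^:\]]+):/', $feed, $titleMatches);
$tickers = $titleMatches[1];

// Group 1 captures the digits that follow "news_id=".
preg_match_all('/news_id=(\d+)/', $feed, $idMatches);
$newsIds = $idMatches[1];

print_r($tickers);
print_r($newsIds);
```

Note that this works on the markup exactly as written; if the feed ever drops the CDATA wrapper or adds whitespace inside <title>, the pattern stops matching, which is the usual argument for the parser.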

If you really want to go with regex, then I would recommend zero-width look-ahead and look-behind assertions. They let you state an expression that must appear at the start and at the end of the match without being included in the result.
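A minimal sketch of the look-around idea, assuming the feed is available as a string in a hypothetical $feed variable (abbreviated here). PCRE look-behind assertions must have a fixed length, which holds for the literal "<title><![CDATA[" prefix.

```php
<?php
// Abbreviated copy of the feed from the question (assumed to be in a string).
$feed = '<item>'
    . '<title><![CDATA[OLK: The statement of shareholders]]></title>'
    . '<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250910</link>'
    . '</item>'
    . '<item>'
    . '<title><![CDATA[ZMP: Notice]]></title>'
    . '<link>http://www.nasdaqomxbaltic.com/market/?pg=news&amp;news_id=250907</link>'
    . '</item>';

// (?<=...) requires the prefix before the match and (?=:) requires a colon
// after it; neither is part of the matched text.
preg_match_all('/(?<=<title><!\[CDATA\[)[^:\]]+(?=:)/', $feed, $titleMatches);
$tickers = $titleMatches[0];

// The same idea for the id: only the digits after "news_id=" are matched.
preg_match_all('/(?<=news_id=)\d+/', $feed, $idMatches);
$newsIds = $idMatches[0];

print_r($tickers);
print_r($newsIds);
```

Because the assertions consume nothing, the whole match (index 0) is already the wanted value and no capture group is needed.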