
Best practices for parsing and manipulating XML files of 1000 MB or more [duplicate]

Developer (https://www.devze.com), 2023-02-24 04:56, source: web
This question already has answers here: Closed 11 years ago.

Possible Duplicates:

PHP what is the best approach to using XML? Need to create and parse XML responses

Parse big XML in PHP

Hello Community,

I am writing an application that needs to parse XML files that can be 1000 MB or more in size.

I have tried some code that is available on the internet. Because the files are so large, they contain a huge number of XML tags, and loop performance degrades as parsing goes on.

So I need a parser that:

  1. Keeps performing well over time while parsing
  2. Doesn't load the whole XML file into memory

I know about the following XML parsers, but I'm not sure which one to use, or why:

  1. XML Parser
  2. SimpleXML
  3. XMLReader

I am using PHP 5.3, so please help me, guys and gals, to choose a parser.

You can also suggest other options or classes.

Thanks.

EDIT

I would also like to know about SAX (Simple API for XML) and StAX implementations for PHP.


First of all, you can't load that much XML into memory. It depends on your machine, but an XML file of more than 10-20 MB is generally too much. The server may be able to handle more, but it's not a good idea to fill all the memory with one script. So you can rule out SimpleXML and DOM from the start.

The other two options, XML Parser and XMLReader, will both work well. XMLReader is the newer extension, so it is probably the better choice. Be warned, though, that XMLReader also lets you load everything into memory. Don't do that. Instead, use it as a node-by-node parser and read and process your data in small pieces.
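To illustrate the node-by-node approach, here is a minimal sketch with XMLReader. The helper name `streamRecords` and its parameters are made up for this example and are not part of any library; it assumes the file consists of many repeated record elements and that a single record is small enough to hold in memory.

```php
<?php
// Hypothetical helper (not from the original answer): streams a large XML
// file and hands each <$recordName> element to $onRecord, one at a time.
function streamRecords($path, $recordName, $onRecord)
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open XML file: $path");
    }

    $count = 0;
    $more = $reader->read();
    while ($more) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $recordName) {
            // Only this single record is turned into a SimpleXMLElement.
            $record = simplexml_load_string($reader->readOuterXml());
            call_user_func($onRecord, $record);
            $count++;
            // Skip the subtree just handled and move to the node after it.
            $more = $reader->next();
        } else {
            // Not a record: step to the next node in document order.
            $more = $reader->read();
        }
    }

    $reader->close();
    return $count;
}
```

For example, `streamRecords('/path/to/huge.xml', 'item', function ($item) use (&$total) { $total += (float) $item->price; });` would sum a `price` child over all `item` elements (both names are assumptions) while only ever holding one record at a time.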

Your problem may go beyond choosing a parser if you need most of the data from the XML. You should also make sure that you don't accumulate it all in memory and only use it at the end of the script. Instead, use each piece as you get it and dispose of it once you no longer need it.
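The "use it as you get it" idea can also be sketched with the event-based XML Parser extension, which is PHP's SAX-style option. The function name `countElements` and the chunk size are assumptions chosen for illustration; the point is that the file is fed to the parser in small chunks and only a running tally is kept.

```php
<?php
// Hypothetical helper (not from the original answer): counts how often each
// element name occurs, feeding the file to the parser in fixed-size chunks.
function countElements($path, $chunkSize = 8192)
{
    $counts = array();

    $parser = xml_parser_create();
    // Keep element names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        // Start-tag handler: update the tally, keep nothing else around.
        function ($parser, $name, $attrs) use (&$counts) {
            $counts[$name] = isset($counts[$name]) ? $counts[$name] + 1 : 1;
        },
        // End-tag handler: nothing to do in this sketch.
        function ($parser, $name) {
        }
    );

    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open XML file: $path");
    }

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($handle))) {
            $error = xml_error_string(xml_get_error_code($parser));
            $line = xml_get_current_line_number($parser);
            fclose($handle);
            throw new RuntimeException("XML error on line $line: $error");
        }
    }

    fclose($handle);
    return $counts;
}
```

Compared with XMLReader, this style pushes events into callbacks rather than letting you pull nodes in a loop, so any state you need between tags has to be tracked by hand.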


Load your giant XML files into an XML database and perform your queries and manipulations through its XQuery/XSLT interfaces.

http://www.xml.com/pub/a/2003/10/22/embed.html
