Dealing with large amounts of data via XML API

So, I searched some here, but couldn't find anything good, apologies if my search-fu is insufficient...

So, what I have today is that my users upload a CSV text file through a form to my PHP script, and I then import that file into a database after validating every line in it. The text file can be up to about 70,000 lines long, and each line contains 24 fields of values. That in itself is not a problem to handle. Every line needs to be validated, and I also check the DB for duplicates (according to a dynamic key generated from the data) to determine whether the data should be inserted or updated.
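For context, here is a minimal sketch of what such an import routine can look like in PHP. The table name, the 24-field check, and the way the duplicate key is derived are assumptions for illustration, not the asker's actual code.

```php
<?php
// Sketch: line-by-line CSV import with validation and insert-or-update.
// Table/column names and the key derivation are hypothetical.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
$stmt = $pdo->prepare(
    'INSERT INTO posts (dedup_key, payload) VALUES (:k, :p)
     ON DUPLICATE KEY UPDATE payload = VALUES(payload)'
);

$fh = fopen('upload.csv', 'r');
while (($fields = fgetcsv($fh)) !== false) {
    if (count($fields) !== 24) {
        continue; // reject lines that do not have exactly 24 fields
    }
    // Hypothetical "dynamic key": hash of the fields identifying a record
    $key = sha1(implode('|', array_slice($fields, 0, 3)));
    $stmt->execute([':k' => $key, ':p' => json_encode($fields)]);
}
fclose($fh);
```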

Right, but my clients are now requesting an automatic API for this, so they don't have to manually create and upload a text file. Sure, but how would I do it?

If I were to use a REST server, memory would run out pretty quickly if one request contained XML for 70k posts to be inserted, so that's pretty much out of the question.

So, how should I do it? I have thought about three options; please help me decide, or add more options to the list:

  1. One post per request. Not all clients have 70k posts, but an update to the DB could result in the API handling 70k requests in a short period, and it would probably happen daily either way.

  2. X posts per request. Cap the number of posts the API handles per request at, say, 100 at a time. For 70,000 posts, that means 700 requests.

  3. The API requires the client script to upload a CSV file that is ready to import using the current routine. This seems "fragile" and not very modern.

Any other ideas?


If you read up on SAX processing (http://en.wikipedia.org/wiki/Simple_API_for_XML) and HTTP chunked transfer encoding (http://en.wikipedia.org/wiki/Chunked_transfer_encoding), you will see that it should be feasible to parse the XML document while it is being sent.
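As a hedged illustration of that idea in PHP: XMLReader is a pull parser in the same streaming spirit as SAX, and it can read the request body incrementally from php://input, so each record is materialized, handled, and discarded one at a time. The `<post>` element name and the handlePost() helper are assumptions.

```php
<?php
// Stream-parse the incoming XML body one <post> at a time, so a
// 70k-record document never has to fit in memory all at once.
// The <post> element name and handlePost() are hypothetical.
$reader = new XMLReader();
$reader->open('php://input');

// Skip forward to the first <post> element
while ($reader->read() && $reader->name !== 'post') {
}

while ($reader->name === 'post') {
    // Materialize only the current record, then validate/store it
    $post = new SimpleXMLElement($reader->readOuterXml());
    handlePost($post);
    $reader->next('post'); // jump to the next sibling <post>
}
$reader->close();
```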


I have now solved this by imposing a limit of 100 posts per request, and I am using REST through PHP to handle the data. Uploading 36,000 posts takes about two minutes with all the validation.
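For anyone wiring up the client side of such a limit, a rough sketch of the batched upload follows; the endpoint URL and the JSON payload format are assumptions for illustration, not the actual API.

```php
<?php
// Client-side sketch: split the records into batches of 100 and POST
// each batch to the API. The URL and payload format are hypothetical.
$batches = array_chunk($posts, 100); // $posts holds all records to upload

foreach ($batches as $batch) {
    $ch = curl_init('https://api.example.com/import');
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => ['Content-Type: application/json'],
        CURLOPT_POSTFIELDS     => json_encode($batch),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    if (curl_getinfo($ch, CURLINFO_HTTP_CODE) !== 200) {
        // A real client would log and retry the failed batch here
    }
    curl_close($ch);
}
```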


First of all, don't use XML for this! Use JSON; it is faster than XML.

On my own project I import from XLS files. The files are very large, but the script works fine; the client just has to create the files with the same structure for the import.
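To make the JSON route concrete, here is a minimal sketch of what the receiving endpoint could look like in PHP; the 100-post cap mirrors the accepted approach above, and the field checks are placeholders.

```php
<?php
// Minimal JSON batch endpoint sketch: decode the body, enforce the
// per-request limit, and validate each post. Checks are placeholders.
$posts = json_decode(file_get_contents('php://input'), true);

if (!is_array($posts) || count($posts) > 100) {
    http_response_code(400);
    exit(json_encode(['error' => 'Send an array of at most 100 posts']));
}

foreach ($posts as $post) {
    if (!is_array($post) || count($post) !== 24) {
        http_response_code(422);
        exit(json_encode(['error' => 'Each post must contain 24 fields']));
    }
    // insert-or-update as in the CSV routine
}

echo json_encode(['imported' => count($posts)]);
```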
