I need to write a script that takes a link and parses the HTML of the linked page to pull in the title and a few other pieces of data like potentially a short description much like when you link to something on Facebook.
It will be called when a user adds a link to the site, so could see a decent number of hits when the client launches the site.
I am curious if I should do this on the server side with PHP or the end user side with Javascript? I have been writing the logic behind trying to figure out which areas of the markup are filled with potential content and it made me wonder if the load would be too much if I continue in PHP.
The client has just the one decent web server and I worry parsing/ana开发者_JAVA技巧lyzing HTML pages may be too much load where we could do it in Javascript and farm it out to the user adding the link.
Any advice or thoughts on the matter would be awesome. Thank you.
Edit: This data is not going straight into the database, it is used to help the user by auto filling the description of their link which still goes through my regular vetting before being stored to the DB.
Well, this is an easy one, because performing this from the client-side purely with JavaScript just plain isn't an option at all due to the same origin policy.
Parsing HTML isn't that heavy of a task, you should be fine doing it in PHP.
I would offload this to the end-user via javascript, with a listener you could then bind it back to the server. The reasons why are simple:
- This is a helper to the front-end not the backend (values aren't stored or manipulated on the backend directly.)
- The load is better spread around than localized on your server, also you'll probably give a better user experience here if the end-user is only pulling 1 url vs. the server pulling thousands.
- Processing in the front-end also mitigates the possibility of malicious code being executed directly on your server.
If you're thinking about having the client actually got and fetch some random site, parse it for you in Javascript, grab the title, description and other data and then submit that in your form for you, your form's submit time is going to be held hostage to your user's network connection speed for fetching that page and whatever overhead (likely miniscule) for parsing the data. If you do that server side using cURL, the hit will be in parsing the document for what you need. the best speed solution would probably be to let the person enter the URL, get it back in PHP, have PHP hand it off to a Perl script (which has some wicked fast DOM parsers) and get the required data back for the PERL script. From personal experience, the Perl scripts outperform cURL all day long, and cURL generally outperforms javascript AJAX gets by a wide margin just by nature of being on a bigger pipe than a home user.
You can do both....
1) PHP:
- checkout HTML DOM Parser, could be helpful
- or use php curl and then parse with DOMDocument
2) JavaScript:
- you don't have to bother your server (pro)
- parsing content with jQuery is easy (pro)
- you need to handle cross domain policy (cons)
精彩评论