Should I use Perl or PHP or something else for this project?

Developer https://www.devze.com 2022-12-24 07:01 Source: Internet

I'm about to embark on a project that will need to:

  • Process XML
  • Heavy text parsing of non-xml documents
  • Insertion of data from xml and non-xml documents into a relational DB.
  • Present processed data to user from db using webpages.
  • Must handle load very well.

The website will be subject to short periods of very heavy loads to pages (300+ visitors a minute for several minutes), but most of the time will be idle (a dozen or so visitors a minute).

I have a very strong background in Java and web services, but I do not want to use Java for this project as I'd like to diversify my skill set.

I'm not looking for your opinion on which language you think is best. What are some pros and cons from using these languages that you might recognize from your own experiences?


I'd go with Perl. The LibXML series of modules gives a variety of interfaces (DOM, XPath, XSLT, etc.) backed by a fast C parser.

Perl's regex support for slicing and dicing text is pretty much unmatched by any other language. If you expect to do lots of arbitrary text processing, and are at least a little familiar with regex, you will thank yourself.
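To make the "slicing and dicing" point concrete, here is a minimal sketch of regex-based field extraction. It is written in Python only so it can be run anywhere; the same pattern syntax (named groups aside) carries over to Perl. The line format and the names `LINE_RE` and `parse_line` are invented for illustration and are not from the question.

```python
import re

# Hypothetical line format (date, level, message), invented for this example.
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None

print(parse_line("2010-05-01 ERROR disk full"))
print(parse_line("not a record"))
```

Lines that do not fit the pattern come back as `None`, so the caller can decide whether to skip or log them.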

There are also a series of great web frameworks for Perl, including the simple but powerful Mojolicious framework, and the comprehensive Catalyst framework. There's always the ancient and stable CGI library, but Mojolicious or Catalyst would probably be better choices.


Since I'm a PHP guy, here is what I can offer about PHP

  • PHP scales well due to its shared-nothing architecture
  • PHP has native support for various XML libs
  • PHP has native support for a number of RDBMS
  • PHP has native support for caching
  • PHP has native support for webservices
  • PHP is a templating engine

So the requirements to a language from your question are met by PHP.

However, Perl, Python, Ruby or even server-side JavaScript (...) should all be capable of doing what you are asking for as well. PHP has its quirks, and so do the other languages. If you are a Java guy, you might like Ruby for its syntax, but then again, only you can decide.


  • Perl scales well
  • Perl supports various XML libs
  • Perl supports a large number of RDBMSs via DBI
  • Perl supports caching
  • Perl supports web services such as SOAP, XML-RPC etc.
  • Perl has many template engines

Therefore, every single item on your list can be done using both languages. You should choose the one you believe will make you most productive taking into account your own strengths and weaknesses.


It is, indeed, very much a subjective question. I can totally conceive that in 2010, Perl or PHP (and even Python or Ruby) could equally serve you for such a project. The difference is not going to come from the language itself as much as the tools, best practices and community.

Among these languages, I am most familiar with Perl, so let me try to offer an answer from that perspective, regarding your needs.

Text and XML parsing: Perl has very robust support for text parsing of even very long files (as long as you don't slurp), and allows powerful, clear and easy regex programming. It has clear built-in Unicode support and standard trans-encoding tools (the Encode module), which is very handy when it comes to user interfaces. It also has a direct binding for libxml2 in the form of a standard, fast and well-maintained module: XML::LibXML.
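The "don't slurp" advice applies in any language: read the file one line at a time instead of loading it whole. A minimal sketch in Python, assuming a plain text input; the function name `count_matching_lines` and the sample data are hypothetical.

```python
import io

def count_matching_lines(stream, needle):
    """Count lines containing `needle`, reading one line at a time."""
    count = 0
    for line in stream:  # iterating a file object never loads the whole file
        if needle in line:
            count += 1
    return count

# An in-memory stream stands in for a very long file here.
sample = io.StringIO("alpha\nbeta\nalpha beta\n")
print(count_matching_lines(sample, "alpha"))
```

Memory use stays roughly constant regardless of file size, because only the current line is held at any time.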

Relational DB Support: In addition to the standard database interface (DBI) which allows direct SQL queries to a number of DBMSes, there are a number of frameworks to make DB-to-Webdoc management easier while still powerful. The most famous probably being Catalyst.
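For the XML-to-database step, the general shape is the same whichever language you pick: parse, collect rows, insert with a parameterized query. Below is a rough Python sketch using only the standard library (`xml.etree.ElementTree` and `sqlite3`). The document layout, the `items` table and the `load_items` helper are assumptions made up for the example, not something from the question.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document shape, invented for illustration.
SAMPLE_XML = """
<items>
  <item id="1"><name>first</name></item>
  <item id="2"><name>second</name></item>
</items>
"""

def load_items(conn, xml_text):
    """Parse <item> elements and insert them with a parameterized query."""
    root = ET.fromstring(xml_text.strip())
    rows = [(int(item.get("id")), item.findtext("name"))
            for item in root.iter("item")]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, name TEXT)"
    )
    # Placeholders keep document content from being interpreted as SQL.
    conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
load_items(conn, SAMPLE_XML)
print(conn.execute("SELECT id, name FROM items ORDER BY id").fetchall())
```

In Perl the equivalent would go through DBI with placeholders in the same way; the in-memory SQLite database is used here only to keep the sketch self-contained.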

HTML Document presentation: Mason is my favorite web application delivery engine. The integration with Perl is so elegant, yet it does not sacrifice templating patterns or language features.

Heavy load handling: There are as many solutions as there are load problems to solve. Perl offers bindings for memcached: Cache::Memcached (written in Perl) and Cache::Memcached::Fast (written in C).
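To show what a cache buys you during a traffic spike, here is a tiny in-process sketch in Python. It is not memcached and does not use any memcached client API; `TTLCache` and `get_or_compute` are hypothetical names, and a real deployment would share the cache between processes.

```python
import time

class TTLCache:
    """Tiny in-process cache with per-entry expiry; a stand-in for memcached."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock          # injectable so expiry can be tested
        self.store = {}

    def get_or_compute(self, key, compute):
        now = self.clock()
        entry = self.store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]         # still fresh: skip the expensive work
        value = compute()           # e.g. a database query plus rendering
        self.store[key] = (now + self.ttl, value)
        return value

cache = TTLCache(ttl_seconds=30)
page = cache.get_or_compute("front-page", lambda: "<html>rendered page</html>")
print(page)
```

With a 30-second expiry, a burst of several hundred requests for the same page triggers the expensive work once rather than once per visitor.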

Balance that out with your personal preferences regarding syntax and general language philosophy, and you could very much join the Enlightened Perl community quite soon :)


As it appears the bulk of your work will be processing data rather than presentation, in my opinion this is what Perl does best. Perl performs very well with regular expressions, and the vast array of modules on CPAN can help you parse commonplace formats. There are also a good few frameworks in Perl that will make life easier in the presentation of the data. The major disadvantage for a newcomer is that, with tens of distributions on CPAN for each of the various problems you may encounter (XML parsing, web framework, ORM, etc.), it can be hard to decide which one to use. Thanks to Plack/PSGI, talking to web servers with Perl has gotten much, much better in recent times.

It's important to note that "load" is a problem that is completely language agnostic: it is not the language you choose but how you engineer your system that will determine how well it handles increased load. Perl, Java and PHP have all been used in small setups all the way through to some of the most heavily trafficked websites on the net. If growth is among your future needs, decouple where appropriate and design for future expansion first. Multiple database servers, caching and message/work queues can be used at small scale, and putting them in while things are small is easier than having to rewrite or quickly hack them in when more resources are suddenly needed.
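As an illustration of the decoupling idea, here is a minimal work-queue sketch in Python using the standard `queue` and `threading` modules. It only shows the pattern (producers enqueue, workers drain); `run_jobs` is a hypothetical helper, the `upper()` call is a placeholder for real parsing work, and a production setup would more likely use a separate queue service.

```python
import queue
import threading

def run_jobs(jobs, worker_count=2):
    """Process jobs on background workers and return the sorted results."""
    pending = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = pending.get()
            if job is None:          # sentinel: no more work for this worker
                pending.task_done()
                return
            outcome = job.upper()    # placeholder for real parsing work
            with lock:
                results.append(outcome)
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for job in jobs:
        pending.put(job)
    for _ in threads:
        pending.put(None)            # one sentinel per worker
    pending.join()
    for t in threads:
        t.join()
    return sorted(results)

print(run_jobs(["doc-a", "doc-b", "doc-c"]))
```

The web front end only has to enqueue work, so a spike in visitors does not translate directly into a spike in parsing load.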


Your architecture and algorithms will have more impact on speed and scalability than choice of language.

Perl, PHP or Java will all do the job.

I'd do this in Perl since I know it well and prefer it to PHP (which I also know well). Your mileage may vary.


As far as I'm aware, PHP's regex library, PCRE (which I would assume is what you'll use), takes its syntax from Perl. So if you have a lot of non-XML parsing, you need to test both and see which one runs faster. I'm not sure which one is faster for your needs.

They both handle XML well (finally).

However, PHP simply has a massive community. There is no other scripting language on the planet as large. So if that matters to you, use PHP, since you can find everything under the sun about it.

However, Perl also has a large following and I'm sure there are plenty of tutorials for everything you would want to do.

Python is also a language you might want to look into. Heck, since everyone realized Ruby was God's gift to the world, it has exploded too! You can honestly do what you want in any language, so you need to look at the syntax of each of them and figure out which one you like best. From there you can run a simple example benchmark in each one to see which language is the fastest for your needs.

Whatever you do, don't use a "framework" like WordPress or Drupal. They are CMSs, not frameworks, and are slow and bloated. WordPress takes 8MB just to load the index page!

We had a PHP project, and a guy from a Java background joined us and was up and running in a week or two once he got the hang of everything.


Why don't you try Ruby on Rails?

Coming back to your question, I would say PHP, since you want to learn something new and at the same time you should have a great community where you can find support.

PHP does everything you have requested.


All the languages mentioned should be usable for your purpose. But as far as I know, PHP can be a little bit tricky regarding UTF-8 strings (e.g. getting the right string length when a UTF-8 character consists of multiple bytes). But I'm sure some guys will provide good solutions for PHP via comments soon :-)
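The length problem comes down to counting bytes versus counting characters. In PHP, `strlen` counts bytes, while `mb_strlen` from the mbstring extension counts characters. The sketch below shows the same distinction in Python, where the two counts are easy to put side by side; `char_and_byte_length` is a hypothetical helper name.

```python
def char_and_byte_length(text):
    """Return (number of code points, number of UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-8"))

# "\u00ef" is i-with-diaeresis and "\u00e9" is e-acute: two bytes each in UTF-8.
print(char_and_byte_length("na\u00efve caf\u00e9"))
```

Any code that truncates, pads or validates by length needs to be clear about which of the two numbers it means.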

My personal favorite is Ruby, as it covers all your needs with really easy and powerful libraries (so-called gems).


I would use Common Lisp.

  • Closure XML for parsing XML
  • cl-ppcre is a perl-compatible regular expression library, but depending on what kind of text you want to parse, you can perhaps find specialized parsers at the Common Lisp Directory.
  • I don't know what database you want to use, but Postmodern is very nice for Postgres. There is also the more generic CLSQL.
  • You can use Hunchentoot as a webserver and, e.g., CL-WHO to produce HTML pages. 5 pages per second should be no problem.
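The "5 pages per second" figure follows from the numbers in the question: 300 visitors a minute divided by 60 seconds. A small Python sketch of that arithmetic, assuming one page view per visitor (the `pages_per_visitor` parameter is an assumption added here, not something stated in the question):

```python
def requests_per_second(visitors_per_minute, pages_per_visitor=1):
    """Convert a visitors-per-minute figure into page requests per second."""
    return visitors_per_minute * pages_per_visitor / 60

print(requests_per_second(300))  # peak load from the question
print(requests_per_second(12))   # "a dozen or so visitors a minute"
```

If each visitor loads several pages or assets, scale the result accordingly before comparing it with what a server can handle.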


Use Perl, if you have experience with neither and your goal is to make yourself more marketable.

It's much easier to fake PHP experience if you need to defend both entries in your 'professional experience' section.


Depending on your needs, you may want to consider a framework that already supports caching; Drupal is one example, but there are many others. Most frameworks are extensible, so you can add plugins to handle all the parsing and presentation.

I think language is less important than the framework you choose. I would personally choose PHP over Perl, because I think it is more applicable in the real world. Python is another beautiful scripting language, but PHP has the most traction in the web world. If your goal is to make your skill set more marketable, go with PHP.


OK, so since everyone has been subjective in their answers, I'll add mine too.

Use Java: the core supports all you need (no frameworks needed), it's free and open source, and it's 2 to 3 times faster than Perl or PHP.

Seriously... PHP is designed for web projects, it's easy, and it supports all you need to do (try Zend Framework). It has a decent learning curve (Java is harder to learn), and there is a huge community of developers out there to help you if you run into something unexpected (bigger than Perl's and Java's). On performance, it's a little slower than Perl (I'm talking about plain old PHP scripts, no weird voodoo optimizations), but it's enough for what you probably need.

In the end, I'm pretty sure you will get a smaller, more consistent app if you use PHP (and if you follow all the coding and design best practices) than you will ever get using Perl.

(Java is way better... but I don't want to be verbally lynched by some PHP zealot)
