Rendering large collections of articles to PDF fails in MediaWiki with mwlib

I have installed the Mediawiki Collection Extension and mwlib to render articles (or collections of articles) to PDF. This works very well for single articles and collections with up to 20 articles.

When I render larger collections, the progress counter on the parsing page (which counts up to 100% when rendering succeeds) is stuck at 1%.

Looking at mwrender.log, I see an Error 32 (Broken pipe). Searching the internet suggests that Error 32 occurs when the receiving process (the one reading from the pipe) crashes or stops responding.
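
For what it's worth, the same error is easy to reproduce outside mwlib, which at least confirms the mechanism: once the reading end of a pipe goes away, the writer's next write() fails with errno 32.

    # The reader (head) exits after one line; the writer (yes) then hits EPIPE.
    # strace prints the failing syscall on stderr:
    strace -e trace=write yes | head -n 1
    #   write(1, "y\ny\ny\n"..., 8192) = -1 EPIPE (Broken pipe)
    #   --- SIGPIPE {si_signo=SIGPIPE, ...} ---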

From here it is hard to proceed. Where should I look for more clues? Could it be that the connection to the MySQL server is dying?
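
In case it helps, these are the checks I know to run from the VM. The paths are my guesses based on the Debian layout TurnKey uses and may differ on other installs:

    # Watch the render log while queueing a large collection:
    tail -f /var/log/mwlib/mwrender.log      # location is a guess; check your mw-serve setup
    # Look for dropped or refused MySQL connections under load:
    tail -f /var/log/mysql/error.log
    mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Aborted_c%'"
    mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections'"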

The whole appliance is running on a TurnKey Linux MediaWiki VM.


I'm using the PDF Export extension and it works with more than 20 articles. Maybe try that?


I figured out the problem myself.

mw-render spawns a parallel request for every article in a collection, so a collection of 50 articles triggers 50 simultaneous requests. Apache could handle this, but MediaWiki's MySQL database could not.

You can limit the number of threads mw-render spawns with the --num-threads=NUM option. I couldn't find where mw-serve invokes mw-render, so I limited the maximum number of workers Apache could spawn to 10 instead.
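
For reference, a sketch of both knobs. The mw-render invocation uses placeholder values (wiki URL, output path, article title) that you would adapt to your setup; the Apache snippet assumes the prefork MPM, and on Apache 2.4 the directive is MaxRequestWorkers rather than MaxClients:

    # Option A: cap mw-render's own parallelism (placeholder URL/article):
    mw-render --config=http://wiki.example.org/w/ --writer=rl \
              --output=out.pdf --num-threads=5 "Some article"

    # Option B: cap Apache workers, e.g. in /etc/apache2/apache2.conf:
    <IfModule mpm_prefork_module>
        MaxClients 10    # at most 10 simultaneous requests reach MediaWiki/MySQL
    </IfModule>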

mw-render automatically retries article requests that fail, so this approach worked.

I rendered a PDF of 185 articles within 4 minutes; the resulting document had 300+ pages.
