Invoking the MediaWiki Page Parser to get HTML?

Developer https://www.devze.com 2022-12-13 09:26 Source: Internet

Related topics: mediawiki php

I'd like to get the HTML for a MediaWiki page; that is, I want to run the MediaWiki markup through the parser. I know I could use an external parser, but most of them do not support transclusion or (naturally) extensions, so my output would be different.

As I have access to the MediaWiki installation, I wonder if I can just use the built-in parser to render the page for me. I don't want to do screen scraping because of all the other stuff on the page (navigation, sidebar, JavaScript and CSS includes, etc.); I literally just want the body.

If it matters, it is running MediaWiki 1.12 on PHP 5.2.

Use action=render, which returns only the rendered page body, e.g. index.php?title=Article_title&action=render
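As a rough illustration of the action=render approach, the following sketch builds such a URL and fetches it. The function names and the wiki address used in the usage comment are hypothetical, and it assumes allow_url_fopen is enabled so that file_get_contents() can read HTTP URLs.

```php
<?php
// Build the action=render URL for a page.
// $scriptUrl is the location of the wiki's index.php (an assumption;
// adjust it to your installation).
function buildRenderUrl( $scriptUrl, $pageTitle ) {
    $query = http_build_query( array(
        // MediaWiki page names use underscores in place of spaces.
        'title'  => str_replace( ' ', '_', $pageTitle ),
        'action' => 'render',
    ), '', '&' );
    return $scriptUrl . '?' . $query;
}

// Fetch only the rendered body of the page; returns false on failure.
function fetchRenderedHtml( $scriptUrl, $pageTitle ) {
    return @file_get_contents( buildRenderUrl( $scriptUrl, $pageTitle ) );
}

// Usage (hypothetical wiki address):
//   $html = fetchRenderedHtml( 'http://wiki.example/index.php', 'Article title' );
```

Since the request goes through the wiki itself, transclusion and installed extensions should be applied just as they are on a normal page view.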

Yes, you can do that. As a matter of fact, I remember doing this very thing in many of my extensions, available here.

I found one of my extensions that does this: SecureTransclusion.

A snippet follows:

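public function mg_strans( &$parser, $page, $errorMessage = null, $timeout = 5 ) {

    // Bail out if the access check on the calling page fails.
    if ( !self::checkExecuteRight( $parser->mTitle ) ) {
        return 'SecureTransclusion: ' . wfMsg( 'badaccess' );
    }

    // Resolve the requested page name to a Title object.
    $title = Title::newFromText( $page );
    if ( !is_object( $title ) ) {
        return 'SecureTransclusion: ' . wfMsg( 'badtitle' ) . " ($page)";
    }

    // Transcludable interwiki titles are fetched remotely, all others locally.
    if ( $title->isTrans() ) {
        $content = $this->getRemotePage( $parser, $title, $errorMessage, $timeout );
    } else {
        $content = $this->getLocalPage( $title, $errorMessage );
    }

    // Run the wikitext through the MediaWiki parser and take the HTML.
    $po = $parser->parse( $content, $parser->mTitle, new ParserOptions() );
    $html = $po->getText();

    // Return the result marked as HTML so it is not parsed a second time.
    return array( $html, 'noparse' => true, 'isHTML' => true );
}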

How about using the current MediaWiki parser? Just grab the converted output from the full page, say from the start of the content up to either <div class="printfooter"> or the "NewPP limit report" comment. The latter begins the preprocessor's statistics. That way all the side frames and banners are omitted.
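The cutting described above might be sketched as follows. The start marker is not given in the post, so it is left as a parameter here; extractBody() is a hypothetical helper, and the markup a given skin emits may differ from what this assumes. Because the "NewPP limit report" text sits inside an HTML comment, the sketch also drops the dangling "<!--" that cutting at that point leaves behind.

```php
<?php
// Return the part of $pageHtml between $startMarker and the earliest of
// the $endMarkers, or null if the start marker is not found.
function extractBody( $pageHtml, $startMarker, $endMarkers ) {
    $start = strpos( $pageHtml, $startMarker );
    if ( $start === false ) {
        return null;
    }
    $start += strlen( $startMarker );

    // Cut at whichever end marker appears first after the start.
    $end = strlen( $pageHtml );
    foreach ( $endMarkers as $marker ) {
        $pos = strpos( $pageHtml, $marker, $start );
        if ( $pos !== false && $pos < $end ) {
            $end = $pos;
        }
    }
    $body = trim( substr( $pageHtml, $start, $end - $start ) );

    // Remove a comment opener left over from cutting inside a comment.
    if ( substr( $body, -4 ) === '<!--' ) {
        $body = rtrim( substr( $body, 0, -4 ) );
    }
    return $body;
}
```

A call would pass the full page HTML, whatever start marker the skin in use emits, and array( '<div class="printfooter">', 'NewPP limit report' ) as the end markers. This remains screen scraping, so it is more fragile than asking the wiki for the body directly.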