I am trying to rework many pages across many sites. The pages may contain JavaScript, PHP, or ASP code in addition to HTML. The problem I'm encountering is that the module rewrites things I don't want rewritten. I've managed to handle most of the symbols (e.g., ", <, >) in HTML tags like <script>, but in the PHP sections they get changed into entities (e.g., &quot;, &lt;, &gt;). Plus, the PHP tags are stripped out at the same time.
If I have a PHP file that looks like this:
<html>
<head><title>My Page</title></head>
<body>
<p>Some cruft which I want to repeat</p>
<form name="foo"> (form content to be replaced)
</form>
<script type="JavaScript">
<!--
Some javaScript to be left alone
-->
</script>
<a href="somepage.php">Link to be removed</a>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
if(isset($_REQUEST['nr']))
{
$numRows = $_REQUEST['nr'];
....
?>
</body>
</html>
I want the final result to look like:
<html>
<head><title>My Page</title></head>
<body>
<p>Some cruft which I want to repeat</p>
<ul><li>List replacing form</li>
</ul>
<script type="JavaScript">
<!--
Some javaScript to be left alone
-->
</script>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or <a href=\"database_search_keyword.htm\">Search again?</a></p>";
if(isset($_REQUEST['nr']))
{
$numRows = $_REQUEST['nr'];
....
?>
</body>
</html>
As I said, I'm able to get everything working except the PHP. It gets mangled, so the result is:
<html>
<head><title>My Page</title></head>
<body>
<p>Some cruft which I want to repeat</p>
<ul><li>List replacing form</li>
</ul>
<script type="JavaScript">
<!--
Some javaScript to be left alone
-->
</script>
<?php
if (strlen($txtKeyword) > 2)
{
echo " or ";
if(isset($_REQUEST['nr']))
{
$numRows = $_REQUEST['nr'];
....
?>
</body>
</html>
I have been working with HTML::TreeBuilder 3.23. I've tried the developer release 3.23_3, but it gives an error message because of the PHP code (e.g., <a> has an invalid attribute name '"&section_id' ' . $section_id . ').
Example code for what I've done so far (with the filesystem walking, etc. chopped out) is:
#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder;
# Set up replacement forms
my $artistSearch = HTML::Element->new ('~literal', 'text', <<EOF);
<p>Please select from the list below.</p>
<ul>
<li><a href="http://firstlink.com/">item 1</a></li>
<li><a href="http://secondlink.com/">item 1</a></li>
</ul>
EOF
my $filename = "AFA.php";
my $file = HTML::TreeBuilder->new();
$file->store_comments(1);
$file->ignore_ignorable_whitespace(1);
$file->no_space_compacting(1);
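my $tree = $file->parse_file($filename) or die "Cannot parse $filename: $!";
my $form = $tree->find_by_tag_name('form');
# Pages with no form, or a form with no name, fall through to the else branch
my $fname = $form ? ($form->attr('name') // '') : '';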
if ($fname eq 'mainform') {
$form->delete;
} elsif ($fname eq 'artist_search') {
$form->replace_with($artistSearch)->delete;
} else {
# It's a form we're not changing
}
my $printout = $file->as_HTML("", " ", {});
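# Three-argument open with a lexical handle; report failures instead of ignoring them
open(my $page, '>', $filename) or die "Cannot write $filename: $!";
print $page $printout;
close($page) or die "Cannot close $filename: $!";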
$file->delete;
I am open to any suggestions, examples, etc. I'm not necessarily tied to any particular module, but I'm not exactly an expert programmer.
Thank you!
The problem here is obviously the <?php .. ?> tag. You can work around it with a preprocessing pass that pulls the PHP out before HTML::TreeBuilder ever sees it. I'll use a simple regex for this:
use strict;
use warnings;
undef $/;
$_=<>;
my @phps;
# \s+ and \s* accept newlines as well as spaces; /s lets (.*?) span several lines
push @phps, $1 while s/<\?php\s+(.*?)\s*\?>/__PHP_CODE__/s;
use Data::Dumper;
die Dumper [$_, \@phps];
You can try it (with the script above saved as filter.pl):
echo "foo<?php phpfoo ?> bar <?php phpbar ?> baz" | perl filter.pl
$VAR1 = [
'foo__PHP_CODE__ bar __PHP_CODE__ baz',
[
'phpfoo',
'phpbar'
]
];
Now, when you're done with the tree, you can just do the reverse to get the PHP code out of the @phps array and back into the output in the proper order:
my $count = 0;
s/__PHP_CODE__/<?php $phps[$count++] ?>/g;
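A slightly sturdier variant of the same hack, sketched here as an illustration and not tested against real pages: keep each PHP block verbatim (delimiters and whitespace included) and give every block its own numbered placeholder. With one shared __PHP_CODE__ token and a running counter, deleting an element that happens to contain a placeholder (say, inside a form you throw away) would push every later block into the wrong slot; numbered tokens avoid that. The names protect_php and restore_php are hypothetical helpers, not part of any module.

```perl
use strict;
use warnings;

# Replace every <?php ... ?> block with __PHP_CODE_<n>__ and return the
# rewritten text plus the stashed blocks, kept exactly as written.
sub protect_php {
    my ($text) = @_;
    my @blocks;
    # /s lets .*? cross newlines; .*? stops at the first ?> it finds.
    $text =~ s{(<\?php\b.*?\?>)}{ push @blocks, $1; '__PHP_CODE_' . $#blocks . '__' }gse;
    return ($text, \@blocks);
}

# Put each block back by its index; placeholders with no stashed block
# are left untouched rather than replaced with an empty string.
sub restore_php {
    my ($text, $blocks) = @_;
    $text =~ s{__PHP_CODE_(\d+)__}{ defined $blocks->[$1] ? $blocks->[$1] : "__PHP_CODE_${1}__" }ge;
    return $text;
}

# Tiny demo on an inline page; in practice $page would be the file contents.
my $page = "<p>before</p>\n<?php\nif (\$n > 2) { echo \"<b>hi</b>\"; }\n?>\n<p>after</p>\n";
my ($safe, $blocks) = protect_php($page);
print $safe;    # the PHP block is now __PHP_CODE_0__
# ... hand $safe to HTML::TreeBuilder, edit the tree, call as_HTML ...
print restore_php($safe, $blocks) eq $page ? "round trip ok\n" : "mismatch\n";
```

In the question's script you would read the file into a string, run protect_php on it, pass the result to $file->parse_content(...) instead of parse_file, and run restore_php on the string returned by as_HTML before writing it out. The placeholders contain only letters, digits, and underscores, so as_HTML has nothing in them to encode.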
Make no mistake about it, this is a hack; but it will get your job done quite effectively without much thought, and it is simple to implement. I can think of plenty of better ways to do this, such as extending HTML::Element to include a pseudo <?php .. ?> element. What you don't want is to undo HTML::Element's mangling (like the character encoding) afterwards in TT (Template Toolkit); that sounds like a far worse idea to me. You could even implement the step that turns the __PHP_CODE__ token back into the real PHP code as a Template Toolkit filter.
It should be noted that this doesn't handle short tags (<? ... ?>), though it easily could. I'm also not sure of the exact logic that triggers the PHP interpreter (how you would escape <?php or ?>, for instance). And, though it should be obvious, I'll disclose that this pays no respect to PHP code like this:
echo '?>';
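The regex above only looks for <?php. Here is a sketch that also catches short tags (<? ... ?>) and the <?= ... ?> echo shortcut, assuming any <?xml ... ?> declarations in your pages should be left for the HTML parser (the lookahead skips them). It keeps the answer's single __PHP_CODE__ token and in-order restore, but stashes each whole block so the opening tag comes back exactly as written. protect_php_all and restore_php_all are hypothetical names. Like the original, it stops at the first ?>, so it still breaks on echo '?>'; as noted above.

```perl
use strict;
use warnings;

# Matches <?php ... ?>, <?= ... ?> and bare short tags <? ... ?>,
# skipping XML declarations and processing instructions (<?xml ...).
# Like the original regex, it ends at the FIRST ?>, even inside a PHP string.
my $php_block = qr{<\?(?!xml\b).*?\?>}s;

# Swap each block for __PHP_CODE__, stashing the whole block (delimiters
# included) so short tags are not rewritten as <?php on the way back.
sub protect_php_all {
    my ($text) = @_;
    my @blocks;
    $text =~ s{($php_block)}{push @blocks, $1; '__PHP_CODE__'}ge;
    return ($text, \@blocks);
}

# Put the blocks back in the same order they were taken out.
sub restore_php_all {
    my ($text, $blocks) = @_;
    my $i = 0;
    $text =~ s{__PHP_CODE__}{$blocks->[$i++]}ge;
    return $text;
}
```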