How to remove a line that matches a pattern at a specific occurrence?

Developer https://www.devze.com 2023-03-22 09:24 Source: Internet
I would like to delete the line that matches <li><p><a href="anti\/recent.html"> at its fourth occurrence.

I asked a similar question here before, but that case was a bit different: at that time I only had to match <ul>.

At that time I got this answer:

    awk '/<ul>/ {ul++} ul == 6 { getline } 1' file

However, that cannot be applied directly to <li><p><a href="anti\/recent.html">. This is what I tried:

awk '/<li><p><a href="anti\/recent.html">/ {lipa href="anti\/recent.html"++} lipa href="anti\/recent.html" == 4 { getline } 1' file

That returns a syntax error. Can anyone help? Thanks.


The errors are:

awk: /<li><p><a href="anti\/recent.html">/ {lipa href="anti\/recent.html"++} lip                                                                                                                               a href="anti\/recent.html" == 4 { getline } 1
awk:                                                 ^ syntax error
awk: warning: escape sequence `\/' treated as plain `/'
awk: /<li><p><a href="anti\/recent.html">/ {lipa href="anti\/recent.html"++} lip                                                                                                                               a href="anti\/recent.html" == 4 { getline } 1
awk:                                                                                                                                                                                                                 ^ syntax error
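
A likely cause of the syntax error: awk variable names may contain only letters, digits and underscores, so `lipa href="anti\/recent.html"` is not parsed as one counter name. The `\/` warning probably appears because awk then reads `"anti\/recent.html"` as a string constant. Here is a minimal sketch with a valid counter name that only counts the matching lines. The name `hits` and the sample input are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sample input with three matching lines and one non-matching line.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<li><p><a href="anti/recent.html">one</a></p></li>
<li><p><a href="other.html">skip</a></p></li>
<li><p><a href="anti/recent.html">two</a></p></li>
<li><p><a href="anti/recent.html">three</a></p></li>
EOF

# "hits" is an arbitrary but valid awk identifier (assumption for illustration).
# Inside /.../ the slash must be escaped as \/, which is fine in a regex.
count=$(awk '/<li><p><a href="anti\/recent.html">/ { hits++ } END { print hits + 0 }' "$sample")
echo "$count"
rm -f "$sample"
```

If this prints the expected number of matches, the pattern itself is fine and only the counter name needed changing.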

Update: Thanks to everyone who helped. The awk command here seems to have a bug. Given this input (shown with non-printing characters visible):

^I^I^I^I^I^I^I^I^I<li><p><a href="anti/recent.html">4 Jul 2011 - Fraudulent email purporting to be related to Standard Chartered Bank (Hong Kong) Limited</a></p></li>$
                                      <!--<li>There is no phishing attack at this  moment.</li>-->$
^I^I^I^I^I^I^I^I    </ul>$

It deletes the </ul> as well, although that is on a different line.

I have edited the code and tested it:
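
One possible explanation, assuming the command followed the same `getline` pattern as the earlier `<ul>` answer: once the counter reaches 4, the condition stays true on every following line until a fifth match, so `getline` keeps replacing the current line with the next one and lines such as `</ul>` are lost. Here is a sketch that uses `next` instead, so only the fourth matching line is skipped. The counter name `n` and the sample input are assumptions for illustration:

```shell
#!/bin/sh
# Hypothetical sample input: five matching <li> lines inside a <ul>.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<ul>
<li><p><a href="anti/recent.html">one</a></p></li>
<li><p><a href="anti/recent.html">two</a></p></li>
<li><p><a href="anti/recent.html">three</a></p></li>
<li><p><a href="anti/recent.html">four</a></p></li>
<li><p><a href="anti/recent.html">five</a></p></li>
</ul>
EOF

# ++n runs only when the regex matches; on the fourth match "next" skips
# that single line. The trailing 1 prints every other line unchanged.
result=$(awk '/<li><p><a href="anti\/recent.html">/ && ++n == 4 { next } 1' "$sample")
printf '%s\n' "$result"
rm -f "$sample"
```

Because the counter is only compared at the moment it is incremented, lines after the fourth match, including `</ul>`, pass through untouched.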

#!/bin/bash
i=1 cat test2.html | while read -r
do
    if [ "$(echo $REPLY | grep -E '<li><p><a href=\"anti/recent.html\">')" ]
    then
        let i++;
        if [ ! "$i" -eq 4 ]
        then
            echo "$REPLY";
        fi;
    else
        echo "$REPLY";
    fi;
done > test2.html;

Is this correct or not? When I execute the code and look at the resulting test2.html, the page is empty: no HTML code and no text. Thanks.
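
The empty page is most likely caused by the redirection: the loop reads test2.html and writes to test2.html, and the shell truncates the output file when it sets up `> test2.html`, usually before `cat` has read anything. Here is a sketch of the usual workaround, writing to a temporary file and then renaming it. The `grep -v` filter and the three-line sample file are stand-ins, not the real page:

```shell
#!/bin/sh
# Hypothetical stand-in for the real test2.html.
workdir=$(mktemp -d)
page="$workdir/test2.html"
printf '%s\n' 'line 1' 'line 2' 'line 3' > "$page"

# Write the filtered output to a temporary file first, and replace the
# original only if the filter succeeded.
tmp=$(mktemp "$workdir/tmp.XXXXXX")
grep -v 'line 2' "$page" > "$tmp" && mv "$tmp" "$page"

result=$(cat "$page")
printf '%s\n' "$result"
rm -rf "$workdir"
```

The same pattern applies to any filter in place of `grep -v`, including the loop above: redirect to a different file, then move it over the original.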


Do you have to do it with awk? If not, this code may be clearer.

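i=0
while IFS= read -r line
do
    # Check whether the current line contains the target markup.
    if printf '%s\n' "$line" | grep -qF '<li><p><a href="anti/recent.html">'
    then
        i=$((i + 1))
        # Print every matching line except the fourth one.
        if [ "$i" -ne 4 ]
        then
            printf '%s\n' "$line"
        fi
    else
        printf '%s\n' "$line"
    fi
done < some_file > fixed_file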