
sed: removing alphanumeric words from a file

Developer (https://www.devze.com), 2023-01-30 14:44, source: web

I have a file with a lot of text, and I want to remove all alphanumeric words.

Example of words to be removed:

gr8
2006
sdlfj435ljsa
232asa
asld213
ladj2343asda
asd!32

What is the best way I can do this?


If you want to remove all words that mix letters and digits, leaving words that consist only of digits or only of letters (this relies on GNU sed extensions such as \+, \?, \| and \<):

sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g' inputfile

Example:

$ echo 'abc def ghi 111 222 ab3 a34 43a a34a 4ab3' | sed 's/\<\([[:alpha:]]\+[[:digit:]]\+[[:alnum:]]*\|[[:digit:]]\+[[:alpha:]]\+[[:alnum:]]*\) \?//g'
abc def ghi 111 222
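For readability, the same expression can be written as an extended regex with `sed -E`, which drops most of the backslashes. This is a minimal sketch that assumes GNU sed (for `\<`) and uses a hypothetical helper name, `strip_mixed_words`. Note that on the question's sample it keeps `asd!32`, because the `!` splits that word into two runs, and each run contains only letters or only digits:

```shell
#!/bin/sh
# Hypothetical helper: the first answer's expression rewritten as an extended
# regex (sed -E), so grouping and alternation need no backslashes.
# Assumes GNU sed, where \< (start of word) is supported.
strip_mixed_words() {
  sed -E 's/\<([[:alpha:]]+[[:digit:]]+[[:alnum:]]*|[[:digit:]]+[[:alpha:]]+[[:alnum:]]*) ?//g'
}

# The question's sample, one word per line.
printf '%s\n' gr8 2006 sdlfj435ljsa 232asa asld213 ladj2343asda 'asd!32' |
  strip_mixed_words
```

Mixed words become empty lines, while `2006` and `asd!32` survive.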


Assuming the only output you want from your sample text is 2006, and that you have one word per line, delete every line that contains both a letter and a digit:

 sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' /path/to/alnum/file

Input

$ cat alnum
gr8
2006
sdlFj435ljsa
232asa
asld213
ladj2343asda
asd!32
alpha

Output

$ sed '/[[:alpha:]]\+/{/[[:digit:]]\+/d}' ./alnum
2006
alpha
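The `\+` above is not needed, because a single letter or digit is enough to match, and `\+` in a basic regex is a GNU extension. Some non-GNU seds (for example BSD/macOS sed) also expect a `;` before the closing brace. Here is a slightly more portable sketch of the same filter, wrapped in a hypothetical helper, `drop_mixed_lines`:

```shell
#!/bin/sh
# Hypothetical helper: delete every line that contains at least one letter
# AND at least one digit; all other lines pass through unchanged.
# No \+ (GNU-only in basic regexes), and ';' before '}' for stricter seds.
drop_mixed_lines() {
  sed '/[[:alpha:]]/{/[[:digit:]]/d;}'
}

printf '%s\n' gr8 2006 sdlFj435ljsa 232asa asld213 ladj2343asda 'asd!32' alpha |
  drop_mixed_lines
```

On the sample above this should print only `2006` and `alpha`, matching the GNU sed output.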


If the goal is actually to remove all alphanumeric words (strings consisting entirely of letters and digits), then this sed command will work: it replaces every run of alphanumeric characters with nothing.

sed 's/[[:alnum:]]*//g' < inputfile

Note that other character classes besides alnum are also available (see man 7 regex).

For your given example data, this leaves only 6 blank lines and a single ! (since that is the only non-alphanumeric character in the example data). Is this actually what you're trying to do?
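Because this command ends up deleting every alphanumeric character, tr with the POSIX class [:alnum:] does the same job. This is a minimal sketch; the helper name `delete_alnum` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: tr -d deletes every character in the given set, so
# with [:alnum:] it matches the effect of sed 's/[[:alnum:]]*//g'.
delete_alnum() {
  tr -d '[:alnum:]'
}

printf '%s\n' gr8 2006 sdlfj435ljsa 232asa asld213 ladj2343asda 'asd!32' |
  delete_alnum
```

As with the sed version, the sample should reduce to six empty lines followed by a line containing only `!`.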


AWK solution:

BEGIN { # Statement that will be executed once at the beginning.
    FS="[ \t]" # Treat every single space or tab as a field separator (runs of blanks therefore produce empty fields).
}
# The block below runs once for each input line.
{
    x=1  # Start at field 1 ($0 holds the whole line, not a word).
    fw=1 # Flag: nothing printed yet on this line; controls the separating space and the final newline.
    while ( x<=NF )
    {
        gsub(/[ \t]+/,"",$x) # Remove any spaces/tabs from the field (a safeguard: with this FS, fields never contain them).
        if (!match($x,"^[A-Za-z0-9]*$")) # Print the word only if it is not made up purely of ASCII letters and digits (empty fields also match, so they are skipped).
        {
            if (fw == 0)
            {
                printf (" %s", $x) # Not the first word printed on this line, so precede it with a space.
            }
            else
            {
                printf ("%s", $x) # Print word as is...
                fw=0 # ...and indicate that future matches are not first occurrences
            }
        }
        x++ # Increase word index number.
    }
    if (fw == 0) # Print newline only if we had matched some words and printed something.
    {
        printf ("\n")
    }
}

Assuming you have this script in `script.awk` and your data in `data.txt`, invoke `awk` like this:

awk -f ./script.awk ./data.txt

For your file it will produce:

asd!32

For more complex cases like this:

gr8
2006
sdlfj435ljsa
232asa  he!he lol
asld213  f
ladj2343asda
asd!32  ab acd!s

... it will produce this:

he!he
asd!32 acd!s
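The same idea can be written more compactly if you use awk's default field separator. That separator splits on runs of blanks and ignores leading and trailing blanks, so there are no empty fields and no need for the gsub call. This is a sketch, not a drop-in replacement for the script above; `keep_non_alnum_words` is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: print, for each line, only the words that are not
# made up purely of ASCII letters and digits, separated by single spaces.
keep_non_alnum_words() {
  awk '{
    out = ""
    for (i = 1; i <= NF; i++) {
      # keep only words that are not purely ASCII letters and digits
      if ($i !~ /^[A-Za-z0-9]+$/)
        out = (out == "" ? $i : out " " $i)
    }
    # lines with no kept words produce no output at all
    if (out != "")
      print out
  }'
}

printf '%s\n' gr8 2006 sdlfj435ljsa '232asa  he!he lol' 'asld213  f' ladj2343asda 'asd!32  ab acd!s' |
  keep_non_alnum_words
```

On the more complex example this should print the same two lines, `he!he` and `asd!32 acd!s`.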

Hope it helps. Good luck!
