Regex to remove lines in file(s) that end with the same or defined letters

Developer https://www.devze.com 2023-04-06 10:02 Source: web
I need a bash script for Mac OS X that works this way:

./script.sh * folder/to/files/ 
#
# or #
#
./script.sh xx folder/to/files/

This script should:

  • read a list of files
  • open each file and read each line
  • if a line ends with the same letters ('*' mode) or with custom letters ('xx'), remove the line and re-save the file
  • back up the original file

My first approach to do this:

#!/bin/bash

# ck init params
if [ $# -le 0 ]
then
  echo "Usage: $0 <letters>"
  exit 0
fi

# list files in current dir
list=`ls BRUTE*` 
for i in $list 
do 

  # prepare regex    
  case $1 in
       "*") REGEXP="^.*(.)\1+$";;
       *) REGEXP="^.*[$1]$";;
  esac    
  FILE=$i

  # backup file
  cp $FILE $FILE.bak

  # removing line with same letters
  sed -Ee "s/$REGEXP//g" -i '' $FILE
  cat $FILE | grep -v "^$"

done

exit 0

But it doesn't work the way I want.

What's wrong?

How can I fix this script?


Example:

$cat BRUTE02.dat BRUTE03.dat
aa
ab
ac
ad
ee
ef
ff
hhh
$

If I use '*', I want all files cleaned of the lines that end with the same letters.

If I use 'ff', I want all files cleaned of the lines that end with 'ff'.


Ah, this is on Mac OS X. Remember that its sed is a little different from the classic Linux sed.

man sed

     sed [-Ealn] command [file ...]
     sed [-Ealn] [-e command] [-f command_file] [-i extension] [file ...]

DESCRIPTION
     The sed utility reads the specified files, or the standard input if no
     files are specified, modifying the input as specified by a list of
     commands. The input is then written to the standard output.

     A single command may be specified as the first argument to sed.
     Multiple commands may be specified by using the -e or -f options. All
     commands are applied to the input in the order they are specified
     regardless of their origin.

     The following options are available:

     -E      Interpret regular expressions as extended (modern) regular
             expressions rather than basic regular expressions (BRE's). The
             re_format(7) manual page fully describes both formats.

     -a      The files listed as parameters for the ``w'' functions are
             created (or truncated) before any processing begins, by
             default. The -a option causes sed to delay opening each file
             until a command containing the related ``w'' function is
             applied to a line of input.

     -e command
             Append the editing commands specified by the command argument
             to the list of commands.

     -f command_file
             Append the editing commands found in the file command_file to
             the list of commands. The editing commands should each be
             listed on a separate line.

     -i extension
             Edit files in-place, saving backups with the specified
             extension. If a zero-length extension is given, no backup will
             be saved. It is not recommended to give a zero-length extension
             when in-place editing files, as you risk corruption or partial
             content in situations where disk space is exhausted, etc.

     -l      Make output line buffered.

     -n      By default, each line of input is echoed to the standard output
             after all of the commands have been applied to it. The -n
             option suppresses this behavior.

     The form of a sed command is as follows:

           [address[,address]]function[arguments]

     Whitespace may be inserted before the first address and the function
     portions of the command.

     Normally, sed cyclically copies a line of input, not including its
     terminating newline character, into a pattern space, (unless there is
     something left after a ``D'' function), applies all of the commands
     with addresses that select that pattern space, copies the pattern space
     to the standard output, appending a newline, and deletes the pattern
     space.

     Some of the functions use a hold space to save all or part of the
     pattern space for subsequent retrieval.

Anything else?

Is my problem clear?

Thanks.
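As a rough illustration of the -i option and the address/function form quoted from the man page, the sketch below deletes matching lines with the d function instead of blanking them with s///. It assumes a scratch directory made with mktemp and the sample data from the question. It also assumes that the attached-suffix form -i.bak is accepted by both BSD and GNU sed, and that back-references work under -E, which is an extension rather than something POSIX guarantees.

```bash
#!/bin/bash
# Scratch copy of the sample data (tmpdir is an assumption, not part of the original script).
tmpdir=$(mktemp -d)
printf '%s\n' aa ab ac ad ee ef ff hhh > "$tmpdir/BRUTE02.dat"

# -i.bak edits in place and keeps the original as BRUTE02.dat.bak.
# The address /(.)\1$/ selects lines whose last two characters are identical;
# the d function deletes them, so no empty lines are left behind.
sed -i.bak -E -e '/(.)\1$/d' "$tmpdir/BRUTE02.dat"

cleaned=$(cat "$tmpdir/BRUTE02.dat")
backup_lines=$(wc -l < "$tmpdir/BRUTE02.dat.bak" | tr -d ' ')
printf '%s\n' "$cleaned"
rm -rf "$tmpdir"
```

Because sed writes the backup itself here, the separate cp step and the grep -v "^$" clean-up pass would no longer be needed.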


I don't know the bash shell well, so I can't say where the script itself fails.
This is only an observation about the regexes as I understand them (I may be wrong).

The * mode regex looks OK:
^.*(.)\1+$ matches lines that end with the same letters.

But the literal mode might not do what you think.
Current: ^.*[$1]$ is meant to match lines that end with the literal string.
It shouldn't use a character class, because [xx] matches a single character from the set, not the string xx.

Change it to: ^.*$1$

Keep in mind that the string in $1 should be escaped before it goes into the regex,
in case it contains any regex metacharacters.

Otherwise, do you intend to have a character class?
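One way the escaping step could look in bash is sketched below. The helper name escape_ere and the sample suffix a.b are made up for illustration, and the filter uses grep -Ev rather than the sed call from the question.

```bash
#!/bin/bash
# escape_ere (hypothetical helper): put a backslash in front of every
# extended-regex metacharacter so the text is matched literally.
escape_ere() {
  printf '%s' "$1" | sed 's/[][(){}.*+?^$|\\]/\\&/g'
}

suffix='a.b'                              # sample suffix containing a metacharacter
pattern="$(escape_ere "$suffix")\$"       # becomes a\.b$
kept=$(printf '%s\n' 'xa.b' 'xaxb' | grep -Ev -e "$pattern")

printf 'pattern: %s\n' "$pattern"
printf 'kept: %s\n' "$kept"
```

Without the escaping, the unescaped pattern a.b$ would also match xaxb, and both lines would be dropped.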


perl -ne '
    BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
    /$re/ && next || print
'

Example:

echo "aa
ab
ac
ad
ee
ef
ff" | perl -ne '
    BEGIN {$arg = shift; $re = $arg eq "*" ? qr/([[:alpha:]])\1$/ : qr/$arg$/}
    /$re/ && next || print
' '*'

produces

ab
ac
ad
ef


A possible issue:

  • When you put an unquoted * on the command line, the shell replaces it with the names of all the files in the current directory, so $1 will be the first file name rather than *. Quote it ('*') to pass it to the script literally.
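The expansion can be seen with a small experiment. The function first_arg and the scratch directory are stand-ins for the script and the data folder, not part of the original post.

```bash
#!/bin/bash
# first_arg (hypothetical stand-in for script.sh): print what arrives as $1.
first_arg() { printf '%s\n' "$1"; }

tmpdir=$(mktemp -d)
touch "$tmpdir/BRUTE02.dat" "$tmpdir/BRUTE03.dat"

# Unquoted: the shell expands * to the file names before the call.
unquoted=$(cd "$tmpdir" && first_arg *)
# Quoted: the asterisk is passed through unchanged.
quoted=$(cd "$tmpdir" && first_arg '*')

printf 'unquoted: %s\n' "$unquoted"
printf 'quoted: %s\n' "$quoted"
rm -rf "$tmpdir"
```

With the two sample files present, the unquoted call receives BRUTE02.dat as its first argument, while the quoted call receives the asterisk itself.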

And some tips:

  • You can replace this:

# list files in current dir
list=`ls BRUTE*` 
for i in $list 

with:

for i in BRUTE*

  • And this:

cat $FILE | grep -v "^$"

with:

grep -v "^$" $FILE

Besides that possible issue, nothing else jumps out at me. What do you mean by "clean"? Can you give an example of what a file should look like before and after, and what the command would look like?
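Taken together, the points above suggest a shape like the following sketch. It is only one reading of what "clean" means (matching lines are removed outright). The helper name clean_file and the scratch directory are assumptions, and grep -Ev is swapped in for the sed substitution used in the question.

```bash
#!/bin/bash
# clean_file MODE FILE (hypothetical helper): back up FILE, then drop matching lines.
clean_file() {
  local mode=$1 file=$2 regexp
  case $mode in
    '*') regexp='(.)\1$' ;;      # last two characters are identical
    *)   regexp="${mode}\$" ;;   # literal suffix; assumes no regex metacharacters
  esac
  cp "$file" "$file.bak" || return 1
  grep -Ev -e "$regexp" "$file.bak" > "$file"
  [ $? -le 1 ]                   # grep exits 1 when every line was removed
}

# Demo on a scratch copy of the sample data from the question.
tmpdir=$(mktemp -d)
printf '%s\n' aa ab ac ad ee ef ff hhh > "$tmpdir/BRUTE02.dat"
for f in "$tmpdir"/BRUTE*; do    # glob directly instead of parsing ls
  clean_file '*' "$f"
done
star_result=$(cat "$tmpdir/BRUTE02.dat")

printf '%s\n' aa ab ac ad ee ef ff hhh > "$tmpdir/BRUTE02.dat"
clean_file 'ff' "$tmpdir/BRUTE02.dat"
ff_result=$(cat "$tmpdir/BRUTE02.dat")
rm -rf "$tmpdir"
```

In a real script the mode would come from "$1", and the caller would have to quote the asterisk on the command line.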


This is the problem!

grep '\(.\)\1[^\r\n]$' *

On Mac OS X, ( ), { }, etc. must be quoted (escaped with a backslash)!
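The escaping requirement comes from grep's default basic regular expressions, where grouping is written \( \). With -E the parentheses are written bare. The comparison below is a sketch on made-up sample lines; back-references under -E are an extension, assumed here to be available in both GNU and BSD grep.

```bash
#!/bin/bash
sample=$(printf '%s\n' aa ab ee ef ff hhh)

# Basic regex (default grep): the group must be written \( \).
bre=$(printf '%s\n' "$sample" | grep -v -e '\(.\)\1$')

# Extended regex (grep -E): the same group is written with bare ( ).
ere=$(printf '%s\n' "$sample" | grep -Ev -e '(.)\1$')

printf 'BRE keeps:\n%s\n' "$bre"
printf 'ERE keeps:\n%s\n' "$ere"
```

Both forms keep the same lines, so the choice is mostly a matter of which syntax the rest of the script uses.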

Solved, thanks.
