A Linux Shell Script Problem

Developer https://www.devze.com 2023-01-24 19:47 Source: web
I have a dot-separated string in a Linux shell:

example=This.is.My.String

I want to

1. Add some string before the last dot. For example, I want to add "Good.Long" before the last dot, so I get:

This.is.My.Goood.Long.String

2. Get the part after the last dot, so I will get

String

3. Turn the dots into underscores, except the last dot, so I will get

This_is_My.String

If you have time, please explain a little bit; I am still learning regular expressions.

Thanks a lot!


I don't know what you mean by 'Linux Shell', so I will assume bash. This solution will also work in zsh and similar shells:

example=This.is.My.String
before_last_dot=${example%.*}
after_last_dot=${example##*.}
echo "${before_last_dot}.Goood.Long.${after_last_dot}"
This.is.My.Goood.Long.String

echo "${before_last_dot//./_}.${after_last_dot}"
This_is_My.String

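The interim variables before_last_dot and after_last_dot should explain my usage of the % and ## operators: ${example%.*} removes the shortest match of .* from the end of the value, and ${example##*.} removes the longest match of *. from the beginning. These patterns are shell globs, not regular expressions. The // form replaces every occurrence of the pattern (here a literal dot) with the replacement (an underscore). I'd be happy to clarify if you have any questions.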

This doesn't use sed (or even regular expressions), only bash's built-in parameter expansion. I prefer to stick to just one language per script, with as few forks as possible :-)
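
A minimal sketch of the same expansions wrapped in functions, assuming bash. The function names are made up for illustration and are not part of the answer above. The case guard is an addition: when the input contains no dot, both ${s%.*} and ${s##*.} return the whole string, so the unguarded one-liners would print it twice.

```shell
# Requires bash (or zsh/ksh) for the ${var//pattern/replacement} form.

# Hypothetical helper: insert $2 before the last dot-separated field of $1.
insert_before_last() {
    local s=$1 extra=$2
    case $s in
        *.*) printf '%s.%s.%s\n' "${s%.*}" "$extra" "${s##*.}" ;;
        *)   printf '%s\n' "$s" ;;   # no dot: nothing to split on
    esac
}

# Hypothetical helper: print the field after the last dot.
after_last() {
    printf '%s\n' "${1##*.}"
}

# Hypothetical helper: replace every dot except the last with an underscore.
underscore_all_but_last() {
    local s=$1 head
    case $s in
        *.*) head=${s%.*}
             printf '%s.%s\n' "${head//./_}" "${s##*.}" ;;
        *)   printf '%s\n' "$s" ;;
    esac
}

insert_before_last 'This.is.My.String' 'Goood.Long'   # This.is.My.Goood.Long.String
after_last 'This.is.My.String'                        # String
underscore_all_but_last 'This.is.My.String'           # This_is_My.String
```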


Other users have given good answers for #1 and #2. There are some disadvantages to some of the answers for #3. In one case, you have to run the substitution twice. In another, if your string has other underscores they might get clobbered. This command works in one go and only affects dots:

sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//'

  1. Splits the line before the last dot by inserting a newline, and copies the result into hold space:

    s/\(.*\)\./\1\n./;h

  2. Removes everything up to and including the newline from the copy in pattern space, then swaps hold space and pattern space:

    s/[^\n]*\n//;x

  3. Removes everything after and including the newline from the copy that's now in pattern space:

    s/\n.*//

  4. Changes all dots into underscores in the copy in pattern space, and appends hold space onto the end of pattern space:

    s/\./_/g;G

  5. Removes the newline that the append operation adds:

    s/\n//


Then the sed script is finished and the pattern space is output.

At the end of each numbered step (some consist of two actual steps):

Step    Pattern Space          Hold Space
  1.    This.is.My\n.String    This.is.My\n.String
  2.    This.is.My\n.String    .String
  3.    This.is.My             .String
  4.    This_is_My\n.String    .String
  5.    This_is_My.String      .String
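
A small sketch for trying the hold-space command on a few inputs. It assumes GNU sed, because \n in the replacement part of the first s command is a GNU extension. The function name is hypothetical, and the case guard is an addition: on a line without any dot the G command still appends the unchanged hold space, so the bare command would print such a line twice over.

```shell
# Assumes GNU sed (\n in a replacement is a GNU extension).
# Hypothetical wrapper around the hold-space one-liner.
hold_space_underscores() {
    case $1 in
        *.*) printf '%s\n' "$1" |
                 sed 's/\(.*\)\./\1\n./;h;s/[^\n]*\n//;x;s/\n.*//;s/\./_/g;G;s/\n//' ;;
        *)   printf '%s\n' "$1" ;;   # no dot: pass the line through untouched
    esac
}

hold_space_underscores 'This.is.My.String'       # This_is_My.String
hold_space_underscores 'This.is.my.string_foo'   # This_is_my.string_foo
```

The second call shows the property the answer is after: an underscore that follows the last dot is left alone.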


Solution

  1. Two versions of this, too:
    • Complex: sed 's/\(.*\)\([.][^.]*$\)/\1.Goood.Long\2/'
    • Simple: sed 's/.*\./&Goood.Long./' - thanks Dennis Williamson
  2. What do you want?
    • Complex: sed 's/.*[.]\([^.]*\)$/\1/'
    • Simpler: sed 's/.*\.//' - thanks, glenn jackman.
  3. sed 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/g'

With 3, you probably need to run the substitute (in its entirety) at least twice, in general.

Explanation

Remember, in sed, the notation \(...\) is a 'capture' that can be referenced as '\1' or similar in the replacement text.

  1. Capture everything up to a string starting with a dot followed by a sequence of non-dots (which you also capture); replace by what came before the last dot, the new material, and the last dot and what came after it.

  2. Ignore everything up to the last dot followed by a capture of a sequence of non-dots; replace with the capture only.

  3. Find and capture a sequence of non-dots, a dot (not captured), followed by a sequence of non-dots and a dot; replace the first dot with an underscore. This is done globally, but each match consumes the text it covers, so the second and subsequent matches won't touch anything already matched. Therefore, I think you need ceil(log2(N+1)) passes, where N is the number of dots to be replaced. One pass deals with 1 dot to replace; two passes deal with 2 or 3; three passes deal with 4-7, and so on.
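
Rather than counting passes, one option is to let sed repeat the substitution until it stops matching. This is a sketch of that idea, not the answer's own command: it drops the g flag and uses a label with the t command, which branches back only when the preceding s made a replacement. The label name and the function name are arbitrary.

```shell
# Hypothetical helper: repeat the substitution until no dot is followed by another dot.
loop_underscores() {
    printf '%s\n' "$1" |
        sed -e ':again' \
            -e 's/\([^.]*\)[.]\([^.]*[.]\)/\1_\2/' \
            -e 't again'
}

loop_underscores 'This.is.My.String'   # This_is_My.String
loop_underscores 'a.b.c.d.e.f'         # a_b_c_d_e.f
```

Each iteration replaces the leftmost dot that still has another dot after it, so the loop runs once per replaced dot and then exits when only the last dot remains.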


Here's a version that uses Bash's regex matching (Bash 3.2 or greater).

[[ $example =~ ^(.*)\.(.*)$ ]]
echo "${BASH_REMATCH[1]//./_}.${BASH_REMATCH[2]}"
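
A hedged variant of the regex approach that checks whether the match succeeded before reading BASH_REMATCH. The function name is hypothetical; keeping the pattern in a variable is a common way to avoid quoting surprises inside [[ ]].

```shell
# Requires bash 3.2 or later.
# Hypothetical helper: underscore every dot except the last, using =~.
regex_underscores() {
    local s=$1
    local re='^(.*)\.(.*)$'   # greedy first group, so the \. is the last dot
    if [[ $s =~ $re ]]; then
        printf '%s.%s\n' "${BASH_REMATCH[1]//./_}" "${BASH_REMATCH[2]}"
    else
        printf '%s\n' "$s"    # no dot: the regex did not match, print as-is
    fi
}

regex_underscores 'This.is.My.String'   # This_is_My.String
```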

Here's a Bash version that uses IFS (Internal Field Separator).

saveIFS=$IFS
IFS=.
array=($example)                  # *   split the string at each dot
lastword=${array[@]: -1}
unset "array[${#array[@]}-1]"     # *   drop the last element
IFS=_
# Inside double quotes, the * subscript joins the elements with the first
# character of IFS (an underscore in this case).
echo "${array[*]}.$lastword"
IFS=$saveIFS

* Use declare -p array after these steps to see what the array looks like.
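
A sketch of the same IFS idea inside a function, assuming bash. It uses read -a with a temporary IFS for the split, which avoids pathname expansion of the unquoted string, and local IFS for the join, so the global IFS never has to be saved and restored. The function name is made up for illustration.

```shell
# Requires bash (arrays, here-strings, local).
# Hypothetical helper: split on dots, join all but the last field with underscores.
ifs_underscores() {
    local s=$1 parts last
    IFS=. read -r -a parts <<< "$s"       # IFS=. applies to this read only
    if (( ${#parts[@]} < 2 )); then
        printf '%s\n' "$s"                # no dot: nothing to join
        return
    fi
    last=${parts[${#parts[@]}-1]}
    unset "parts[${#parts[@]}-1]"         # drop the last element
    local IFS=_                           # "${parts[*]}" joins with this character
    printf '%s.%s\n' "${parts[*]}" "$last"
}

ifs_underscores 'This.is.My.String'   # This_is_My.String
```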


1.

$ echo 'This.is.my.string' | sed 's}[^\.][^\.]*$}Good Long.&}'
This.is.my.Good Long.string

The pattern matches the final run of non-dot characters, up to the end of the line. In the replacement, & stands for whatever the pattern matched.

2.

$ echo 'This.is.my.string' | sed 's}.*\.}}'
string

sed matches greedily, so it will extend the first closure (.*) as far as possible, i.e. to the last dot.

3.

$ echo 'This.is.my.string' | tr . _ | sed 's/_\([^_]*\)$/\.\1/'
This_is_my.string

convert all dots to _, then turn the last _ to a dot.

(caveat: this will turn 'This.is.my.string_foo' to 'This_is_my_string.foo', not 'This_is_my.string_foo')
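
One way around that caveat, sketched under the assumption that splitting the string in the shell first is acceptable: run tr only on the part before the last dot, so underscores after the last dot are never touched. The function name is hypothetical, and this is a different technique from the tr | sed pipeline above.

```shell
# Hypothetical helper: translate dots to underscores only before the last dot.
tr_underscores() {
    case $1 in
        *.*) printf '%s.%s\n' "$(printf '%s' "${1%.*}" | tr . _)" "${1##*.}" ;;
        *)   printf '%s\n' "$1" ;;   # no dot: print unchanged
    esac
}

tr_underscores 'This.is.my.string'       # This_is_my.string
tr_underscores 'This.is.my.string_foo'   # This_is_my.string_foo
```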


You don't need regular expressions at all (those complex things hurt my eyes!) if you use Awk and are a little creative.

1. echo "$example" | awk -v ins="Good.long" -F . '{OFS="."; $NF = ins"."$NF; print}'

What this does:
-v ins="Good.long" tells awk to create a variable called 'ins' with "Good.long" as content,
-F . tells awk to use the dot as the separator for your input fields,
OFS="." (an assignment inside the program, not a command-line option) tells awk to use the dot as the separator for your output fields,
NF is the number of fields, so $NF represents the last field,
the $NF=... part replaces the last field: it prepends what you want to insert (the variable called "ins" declared earlier) and a dot to the current last field.

2. echo "$example" | awk -F . '{print $NF}'

$NF is the last field, so that's all!

3. echo "$example" | awk -F . '{OFS="_"; $(NF-1) = $(NF-1)"."$NF; NF=NF-1; print}'

Here we have to be creative, as Awk AFAIK doesn't have a command for deleting fields. Of course, we set the output field separator to underscore.

$(NF-1) = $(NF-1)"."$NF: First, we replace the second-to-last field with itself glued to the last field, with a dot between.
Then, we trick awk into thinking the number of fields is one less than it really is, hence deleting the last field!

Note you can't say $NF="", because then the output would end with a trailing underscore.
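
A sketch of the awk approach wrapped in shell functions, with hypothetical names. The first sets OFS in a BEGIN block so it is in place before any record is rebuilt. The second builds the output string in a loop instead of decrementing NF; that is a swapped-in technique, chosen here on the assumption that not every awk implementation rebuilds the record when NF is reduced.

```shell
# Hypothetical helper: insert $2 before the last dot-separated field of $1.
awk_insert() {
    printf '%s\n' "$1" |
        awk -v ins="$2" -F . 'BEGIN { OFS = "." } { $NF = ins OFS $NF; print }'
}

# Hypothetical helper: underscore every dot except the last, without touching NF.
awk_underscores() {
    printf '%s\n' "$1" | awk -F . '{
        out = $1
        for (i = 2; i < NF; i++) out = out "_" $i   # join the middle fields
        if (NF > 1) out = out "." $NF               # keep the last dot
        print out
    }'
}

awk_insert 'This.is.My.String' 'Good.long'   # This.is.My.Good.long.String
awk_underscores 'This.is.My.String'          # This_is_My.String
```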
