Make sed replace ONLY exact strings

Developer https://www.devze.com 2023-03-22 20:26 Source: Internet
I have a CSS file like the following:

    #layout.one-column  #menu-secondary{background: #3c3c3c; height: 20px; font-family: 'Trebuchet MS'; font-weight: bold; font-size: 15px; padding: 10px;}     
    #layout.one-column  #menu-secondary a {color: #FFF; text-decoration: none;}
    #layout.one-column  #menu-secondary ul {}   
    #layout.one-column  #menu-secondary ul li {display: block; height: 30px; float: left; margin: 0 20px 0 0;}  
    .ofr h2 {font-size: 17px; height: 35px; margin: 0 10px 10px 10px;}  
    .ofr h2 a {color: #2a2a2a; text-decoration: none;}      
    #layout.one-column  #menu-secondary ul li.active {background: url(../img/selected.gif) no-repeat bottom center;}
    #layout.one-column  #menu-secondary ul li a {display: block; float: left; padding: 0 10px;}     
    #layout.one-column  #menu-secondary ul li a:hover {text-decoration: underline;}  

As you can see, every line begins with tabs or a couple of blank spaces, and the string starts with .whatever/#whatever. I've coded a little script which at one point runs:

find css/myCSS.css -name "*.css" -type f -exec sed -i "s/\<$pattern\>/$replacer/g" {} \;

where $pattern could be #layout and $replacer could be #LAYOUT. What I would like to do, and obviously I'm doing it the wrong way, is to replace #layout by #LAYOUT if the strings are

  • equal (blank spaces/tabs before and after the $pattern)
  • equal (blank spaces/tabs just before the $pattern) followed by dot plus whatever (#pattern.whatever)
  • equal (blank spaces/tabs just before the $pattern) followed by # plus whatever (#pattern#whatever)
  • like #whatever.pattern or #whatever#pattern (blank spaces/tabs just before the #whatever and after #pattern).

I hope I have now made it crystal clear :)

Here are some examples, in every line #pattern or .pattern should get replaced:

#pattern     <- blank spaces/tabs before and after the string  
#pattern.bar <- blank spaces/tabs before #pattern  and after .bar  
.pattern#bar <- blank spaces/tabs before .pattern  and after #bar  
#foo.pattern <- blank spaces/tabs before #foo and after .pattern  
.foo#pattern <- blank spaces/tabs before .foo and after #pattern  
.pattern     <- blank spaces/tabs before and after the string   

I've been trying to do it using sed but I can't get it to work, and I thought it could be "easy" for someone who's working daily with sed. Thanks again :)


If you want to redefine word boundaries to suit your needs, you need to enumerate them. One approach is to capture the boundary character and put it back at the end of the replacement:

echo "well #menu not #menu-foo #menu" | sed -r 's/#menu([ \t\n\r.!?,]|$)/#MENU\1/g'
well #MENU not #menu-foo #MENU

The |$ is there to catch the case where the pattern is the last thing on the line (end of input).

I still don't know what role the leading # plays, but you can apply the same idea if you also need to capture a delimiter in front of the pattern, using something like \1MENU\2.

update 28.07, 23:45:

  • equal (blank/tab before and after the $pattern): [ \t]pattern[ \t]
  • equal (blank/tab just before the $pattern) followed by dot plus whatever (#pattern.whatever): [ \t]pattern\.[^ \t]+ An exhaustive description of 'whatever' would be better. Are additional dots allowed? Is - allowed? How do we recognize that 'whatever' has ended? By whitespace?
  • equal (blank/tab just before the $pattern) followed by # plus whatever (#pattern#whatever): [ \t]pattern#[^ \t]+ Okay, that's the same as above, just with a hash instead of a dot.
  • like #whatever.pattern or #whatever#pattern (blank/tab just before the #whatever and after #pattern): [ \t]#[^ \t]+\.pattern[ \t] or [ \t]#[^ \t]+#pattern[ \t]

No. 2 and 3 are nearly the same. If we mean A or B, we can simply form a bracket expression [#.]. Inside the brackets we don't need to escape the dot, because there it is a literal dot, not a wildcard.
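As a small illustration of the bracket expression point, the sketch below runs the same substitution with and without brackets. The sample line and the variable names are made up for this example.

```shell
# Hypothetical sample line; inside [...] the dot is literal, not a wildcard.
sample='a.b aXb a#b'

# Only "a.b" and "a#b" are hit, because [#.] means "a hash or a literal dot".
bracket_result=$(printf '%s\n' "$sample" | sed -E 's/a[#.]b/HIT/g')

# For comparison: an unbracketed dot matches any character, so "aXb" is hit too.
wildcard_result=$(printf '%s\n' "$sample" | sed -E 's/a.b/HIT/g')

printf '%s\n%s\n' "$bracket_result" "$wildcard_result"
# prints:
# HIT aXb HIT
# HIT HIT HIT
```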

No. 2 and 3 combined is therefore

[ \t]pattern[#.][^ \t]+[ \t]

But! You don't do anything with 'whatever'. Whatever it is, it isn't changed. So we add # and . just to the list of delimiters (blank and tab) and return them (or blank or tab), whatever they are:

[ \t]pattern([#. \t])

A simple test:

echo "well #menu not #menu-false #menu.dot #menu#hash" \
| sed -r 's/[ \t]#menu([#. \t])/ #MENU\1/g' 
well #MENU not #menu-false #MENU.dot #MENU#hash

This would always turn whatever is in front of #menu, be it blank or tab, into a blank. We could capture it too, if wanted:

| sed -r 's/([ \t])#menu([#. \t])/\1#MENU\2/g' 
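One caveat with consuming delimiters on both sides: when two tokens are separated by a single blank, that blank is used up by the first match, so a single /g pass skips the second token. The sketch below shows this and one possible workaround, a label-and-branch loop. The sample line, the added ^ and $ alternatives, and the POSIX class [[:space:]] in place of [ \t] are my assumptions, not part of the answer above.

```shell
# Assumed sample input: two adjacent "#menu" tokens separated by one space.
line='a #menu #menu b'

# One pass with /g: the space consumed as the trailing delimiter of the first
# match is no longer available as the leading delimiter of the second token.
single_pass=$(printf '%s\n' "$line" |
    sed -E 's/(^|[[:space:]])#menu([[:space:]#.]|$)/\1#MENU\2/g')

# Loop variant: substitute once, then branch back to the label as long as a
# substitution succeeded. It terminates because "#MENU" no longer matches "#menu".
looped=$(printf '%s\n' "$line" |
    sed -E -e ':again' \
           -e 's/(^|[[:space:]])#menu([[:space:]#.]|$)/\1#MENU\2/' \
           -e 'tagain')

printf '%s\n%s\n' "$single_pass" "$looped"
# prints:
# a #MENU #menu b
# a #MENU #MENU b
```

The loop only ends if the replacement text cannot match the pattern again, so it is not safe for a case-insensitive match or for a replacement that contains the pattern itself.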

But what about the last rule, number 4, where 'whatever' precedes 'pattern'? We can combine dot and hash:

[ \t]#[^ \t]+[.#]menu[ \t]

Combining this case into our regex would allow #foo#pattern#bar. That's getting complicated. We'd better start a fresh, new command:

s/([ \t]#[^ \t]+[.#])menu[ \t]/\1MENU /g

which can be appended with ';' after the previous one:

| sed -r 's/[ \t]#menu([#. \t])/ #MENU\1/g;s/([ \t]#[^ \t]+[.#])menu[ \t]/\1MENU /g'

So I guess I have covered your 4 rules, but the example at the top only addresses two of them. And your attempt again includes \< and \>, which is only confusing.

Here is my own example, including a case for rule 4:

echo "well #bar.menu and #foo#menu #menu not #menu-false #menu.dot #menu#hash" \
| sed -r 's/[ \t]#menu([#. \t])/ #MENU\1/g;s/([ \t]#[^ \t]+[#.])menu[ \t]/\1MENU /g'

well #bar.MENU and #foo#MENU #MENU not #menu-false #MENU.dot #MENU#hash


Rewritten based on question rewrite. Warning, some quoting games are played herein:

pattern="layout"
replace="FOO"

sed 's/\([ \t#.]\)'"$pattern"'\([ \t#.]\)/\1'"$replace"'\2/g'  << EXAMPLE

 #layout  #layout.whatever #layout#whatever
 #whatever.layout #whatever#layout
 .layout .layout.whatever .layout#whatever
EXAMPLE

produces

 #FOO  #FOO.whatever #FOO#whatever
 #whatever.FOO #whatever#layout
 .FOO .FOO.whatever .FOO#whatever
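Note that the last token on the second line, #whatever#layout, is left alone in the output above, because the trailing group needs an actual character and there is none at the end of the line. A possible variant, sketched here with extended regexes (-E), the POSIX class [[:space:]] instead of [ \t], and an extra |$ alternative (all of which are my assumptions, not part of the answer), would also catch that case:

```shell
pattern="layout"
replace="FOO"

# Same quoting game as above; "|$" lets the trailing group match the end of
# the line as well as a blank, "#" or ".".
result=$(printf '%s\n' ' #whatever.layout #whatever#layout' |
    sed -E 's/([[:space:]#.])'"$pattern"'([[:space:]#.]|$)/\1'"$replace"'\2/g')

printf '%s\n' "$result"
# prints " #whatever.FOO #whatever#FOO"
```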


UPDATE 2

OK, you need to match whole words that start with a # or a . and are a valid CSS identifier, and then may end with a CSS chain or whitespace. And they may be at the end of a CSS chain as well?

sed -E -i "s/(\s+|[#.][a-z_][a-z0-9_-]*)#pattern(\s+|[#.:])/\1#PATTERN\2/"

That is ugly and has everything spelled out. I checked the CSS spec to make sure I had the pattern right for selector IDENTs. There is a : in the terminal group because of pseudo-selectors.
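To plug this into a script with $pattern and $replacer as in the question, the idea could be wrapped in a small shell function. The following is only a sketch under several assumptions: the function name rename_selector is hypothetical, the identifier pattern is a simplification of the CSS grammar, the set of trailing delimiters (whitespace, #, ., :, comma, {, end of line) is my guess at what may follow a selector, and the two helper sed calls escape characters that are special in an extended regex or in the replacement text.

```shell
# Hypothetical helper: rename_selector OLD NEW, filtering stdin to stdout.
rename_selector() {
    # Escape characters that are special in an ERE or clash with the "/" delimiter.
    old=$(printf '%s\n' "$1" | sed 's,[][\\.*^$+?(){}|/],\\&,g')
    # Escape characters that are special in the replacement part.
    new=$(printf '%s\n' "$2" | sed 's,[\\&/],\\&,g')
    # Leading context: line start, whitespace, or a preceding #ident/.ident chain.
    # Trailing context: whitespace, the start of a chained selector, or line end.
    sed -E "s/(^|[[:space:]]|[#.][A-Za-z_][A-Za-z0-9_-]*)${old}([[:space:]#.:,{]|\$)/\1${new}\2/g"
}

css_line='    #layout.one-column  #menu-secondary ul {}'
renamed=$(printf '%s\n' "$css_line" | rename_selector '#layout' '#LAYOUT')
untouched=$(printf '%s\n' "$css_line" | rename_selector '#menu' '#MENU')
chained=$(printf '%s\n' '.foo#layout {x}' | rename_selector '#layout' '#LAYOUT')

printf '%s\n%s\n%s\n' "$renamed" "$untouched" "$chained"
# prints:
#     #LAYOUT.one-column  #menu-secondary ul {}
#     #layout.one-column  #menu-secondary ul {}
# .foo#LAYOUT {x}
```

To edit a file in place, the same sed expression could be given -i and a file name instead of reading standard input; try it on a copy first.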

OLDER STUFF

\b won't work for you, because you consider #menu-foo to be a single item and \b sees it as four things: #, menu, - and foo.
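A quick way to see both effects, assuming GNU sed (\b, \< and \> are GNU extensions) and made-up sample lines:

```shell
# \b finds a boundary between "u" and "-", so "#menu-foo" is changed as well.
boundary_result=$(printf '%s\n' '#menu #menu-foo' | sed 's/#menu\b/#MENU/g')

# \< needs a word character right after it, and "#" is not one, so a pattern
# of the form \<#layout\> cannot match anything.
angle_result=$(printf '%s\n' ' #layout.one-column' | sed 's/\<#layout\>/#LAYOUT/g')

printf '%s\n%s\n' "$boundary_result" "$angle_result"
# prints:
# #MENU #MENU-foo
#  #layout.one-column
```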

You need to be clearer about what you consider a "word break" to be before we can help you. At minimum you might try your sed like this if whitespace breaks are what you consider enough:

sed -i "s/\(\s\)#menu\(\s\)/\1#MENU\2/"

Alternatively, you are going to have to specify what a word break consists of. Rather than \s you might need \(^\|[[:space:]"']\) (GNU sed syntax) for the beginning and something really ugly for the end condition.

Based on your comment, if each token you care about is between HTML tags then you could do something like the following. Note that -i makes sed edit the file in place; take it out of the sed line if you would rather see the result on standard output. At this point, my only question would be whether there are also line breaks in your data. Is all the HTML on one line of text?

sed -i "s/>#menu</>#MENU</"

or, fancier and including possible line breaks:

sed -E -i 's/(^|>)#menu($|<)/\1#MENU\2/'

We may need sample data to get beyond this...
