I have a bash function that runs Python and prints every match of a regex found in stdin:
function find-all() {
    python -c "import re
import sys
print('\n'.join(re.findall('$1', sys.stdin.read())))"
}
When I run it as find-all 'href="([^"]*)"' < index.html
it should return the first group of the regex (the value of each href attribute in index.html).
How can I write this in sed or awk?
I suggest you use grep -o.
-o, --only-matching
Show only the part of a matching line that matches PATTERN.
E.g.:
$ cat > foo
test test test
test
bar
baz test
$ grep -o test foo
test
test
test
test
test
Update
If you were extracting href attributes from html files, using a command like:
$ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html
href="style.css"
href="iehacks.css"
href="old/"
You could extract the values by using cut and sed like this:
$ grep -o -E 'href="([^"]*)"' /usr/share/vlc/http/index.html| cut -f2 -d'=' | sed -e 's/"//g'
style.css
iehacks.css
old/
But you'd be better off using html/xml parsers for reliability.
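If you stay with the regex approach, note that cut -f2 -d'=' truncates any value that itself contains an equals sign (for example a query string). Here is a minimal sketch that lets sed strip the wrapper instead. The function name href_values and the sample input are made up for illustration, and it assumes a grep that supports -o and a sed that supports -E (both GNU and BSD versions do):

```shell
# Hypothetical helper: print every double-quoted href value read from stdin,
# one per line.
href_values() {
    # grep -o emits each href="..." match on its own line;
    # sed then keeps only what sits between the quotes.
    grep -oE 'href="[^"]*"' | sed -E 's/^href="(.*)"$/\1/'
}

# Sample input (invented): the first value contains '=', which would
# confuse the cut-based pipeline.
printf '%s\n' '<a href="page?id=1">x</a> <a href="old/">y</a>' | href_values
```

This still only recognises double-quoted attributes written exactly as href="...", so the advice about using a real parser stands.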
Here's a gawk implementation, saved as find_all.sh (not tested with other awks; the three-argument form of match() used here is a gawk extension):
awk -v "patt=$1" '
    function find_all(str, patt) {
        while (match(str, patt, a) > 0) {
            for (i=0; i in a; i++) print a[i]
            str = substr(str, RSTART+RLENGTH)
        }
    }
    $0 ~ patt {find_all($0, patt)}
' -
Then:
echo 'asdf href="href1" asdf asdf href="href2" asdfasdf
asdfasdfasdfasdf href="href3" asdfasdfasdf' |
find_all.sh 'href="([^"]+)"'
outputs:
href="href1"
href1
href="href2"
href2
href="href3"
href3
Change i=0 to i=1 if you only want to print the captured groups. With i=0 you'll get output even if you have no parentheses in your pattern.
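For awks without gawk's three-argument match(), a rough sketch of the whole-match case (the equivalent of printing only a[0]) could look like the following. The name find_all_posix is hypothetical, captured groups are not available this way, and the RLENGTH check is an added guard so that a pattern matching the empty string does not loop forever:

```shell
# Hypothetical portable variant: print every whole match of the pattern
# found on stdin, one per line. No capture groups.
find_all_posix() {
    awk -v patt="$1" '{
        str = $0
        # Two-argument match() sets RSTART and RLENGTH in any POSIX awk.
        while (match(str, patt) > 0 && RLENGTH > 0) {
            print substr(str, RSTART, RLENGTH)
            # Continue searching after the end of this match.
            str = substr(str, RSTART + RLENGTH)
        }
    }'
}

echo 'asdf href="href1" asdf asdf href="href2" asdfasdf' |
find_all_posix 'href="[^"]+"'
```

Because the pattern is passed through -v, awk processes backslash escapes in it, so patterns containing backslashes may need them doubled.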