In grep on Ubuntu, how can I display only the string that matched the regular expression?

Developer (https://www.devze.com), 2023-01-10 07:05. Source: web

I am grepping with a regular expression. In the output, I would like to see only the strings that match my regular expression.

In a bunch of XML files (mostly single-line files with huge amounts of data in one line), I would like to get all the words that start with MAIL_.

Also, I would like the grep command on the shell to give only the words that matched and not the entire line (which is the entire file in this case).

How do I do this?

I have tried

grep -Gril MAIL_* .
grep -Grio MAIL_* .
grep -Gro MAIL_* .

First of all, with GNU grep, which ships with Ubuntu, the -G flag (basic regular expressions) is the default, so you can omit it. Even better, use extended regular expressions with -E.

The -r flag searches recursively through the files of a directory, which is what you need.

You are right to use the -o flag to print only the matching part of a line. To omit file names as well, add the -h flag.
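As a rough illustration of how -r, -o and -h combine, the sketch below runs both variants on a throwaway directory. The directory layout, the file name a.xml and its contents are made up for the demonstration and are not from the question.

```shell
# Hypothetical scratch directory and file, created only for this demonstration.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '%s\n' '<a>MAIL_HOST and MAIL_PORT</a>' > "$dir/sub/a.xml"

# -r descends into sub/ and -o prints only the matches; each match is
# prefixed with the name of the file it came from.
with_names=$(grep -Ero 'MAIL_[[:alnum:]_]*' "$dir")

# Adding -h drops the file-name prefix.
without_names=$(grep -Ehro 'MAIL_[[:alnum:]_]*' "$dir")

printf '%s\n' "$with_names" "$without_names"
rm -rf "$dir"
```

The first two lines printed carry the path of a.xml followed by a colon; the last two are the bare matches MAIL_HOST and MAIL_PORT.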

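The only mistake is the regular expression itself: there is no character specification before the *. In MAIL_*, the * applies to the underscore, so the pattern matches MAIL followed by zero or more underscores. Left unquoted, the pattern may also be expanded by the shell as a file glob before grep ever sees it. Your command should look like this: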

grep -Ehro 'MAIL_[^[:space:]]*' .

Sample output (not recursive):

$ echo "Some garbage MAIL_OPTION comes MAIL_VALUE here" | grep -Eho 'MAIL_[^[:space:]]*'
MAIL_OPTION
MAIL_VALUE
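To see the difference between the two patterns side by side, here is a small sketch that runs both on the same sample line. The variable names are only for illustration; both patterns are quoted so the shell passes them to grep unchanged.

```shell
line='Some garbage MAIL_OPTION comes MAIL_VALUE here'

# MAIL_* means "MAIL followed by zero or more underscores", so each match
# stops right after the underscore.
short=$(printf '%s\n' "$line" | grep -o 'MAIL_*')

# With a character class before *, the match runs up to the next whitespace.
full=$(printf '%s\n' "$line" | grep -Eo 'MAIL_[^[:space:]]*')

printf 'MAIL_* gives:\n%s\n' "$short"
printf 'MAIL_[^[:space:]]* gives:\n%s\n' "$full"
```

The first pattern prints MAIL_ twice, while the second prints MAIL_OPTION and MAIL_VALUE.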

Try the following command:

grep -Eo 'MAIL_[[:alnum:]_]*'
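The choice of character class matters for XML, where a word is often followed directly by a quote or a tag rather than by whitespace. The sketch below compares the two classes on a made-up XML fragment (the element and attribute names are assumptions, not taken from the question).

```shell
# Hypothetical single-line XML fragment with no whitespace after the words.
xml='<opt name="MAIL_HOST">MAIL_PORT</opt>'

# "Anything but whitespace" keeps going through quotes and tags.
greedy=$(printf '%s\n' "$xml" | grep -Eo 'MAIL_[^[:space:]]*')

# "Letters, digits and underscore" stops at the first quote or angle bracket.
words=$(printf '%s\n' "$xml" | grep -Eo 'MAIL_[[:alnum:]_]*')

printf '%s\n' "$greedy"
printf '%s\n' "$words"
```

Here the first pattern returns a single match, MAIL_HOST">MAIL_PORT</opt>, while the second returns MAIL_HOST and MAIL_PORT on separate lines.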

grep -o (or --only-matching) prints only the matching text instead of complete lines. The problem could be that your regex is not restrictive enough (too greedy) and actually matches the whole file.
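A minimal sketch of that failure mode, using an invented one-line input: with a pattern such as MAIL_.*, the .* consumes everything up to the end of the line, so -o has nothing to trim after the first MAIL_.

```shell
# Hypothetical one-line input standing in for a single-line XML file.
line='<a>MAIL_A</a> filler <b>MAIL_B</b> more'

# .* is greedy: the single match runs from the first MAIL_ to the end of the line.
loose=$(printf '%s\n' "$line" | grep -o 'MAIL_.*')

# A restrictive character class yields one short match per word.
tight=$(printf '%s\n' "$line" | grep -Eo 'MAIL_[[:alnum:]_]*')

printf 'loose: %s\n' "$loose"
printf 'tight:\n%s\n' "$tight"
```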

From your comment on Thor's answer, it seems you also want to distinguish whether the MAIL_.* text is a text node or an attribute, not just isolate it wherever it appears in the XML document. grep cannot parse XML; you need a proper XML parser for that.

xmlstarlet is a command-line XML parser. It is packaged in Ubuntu.

Using it on this example file:

$ cat test.xml 
<some_root>
    <test a="MAIL_as_attribute">will be printed if you want matching attributes</test>
    <bar>MAIL_as_text will be printed if you want matching text nodes</bar>
    <MAIL_will_not_be_printed>abc</MAIL_will_not_be_printed>
</some_root>

For selecting text nodes you can use:

$ xmlstarlet sel -t -m '//*' -v 'text()' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_text

And for selecting attributes:

$ xmlstarlet sel -t -m '//*[@*]' -v '@*' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_attribute

Brief explanations:

//* is an XPath expression that selects all elements in the document, and text() outputs the value of their child text nodes, so everything except text nodes is filtered out.
//*[@*] is an XPath expression that selects all elements that have at least one attribute, and @* then outputs the attribute values.