While reading a few articles about Unix commands, I noticed that their examples were of little practical use. It turns out that many of us do not know how to use the tools that actually matter.

Before that

Three years ago I was asked to interview applicants for the position of Unix system administrator. There were eight applicants, and two of them were top-rated on a freelance marketplace. I never require sysadmins to know configs by heart. I think we can always get familiar with the necessary software when needed, provided that we are ready to read a lot and want to use system tools properly. Therefore, I asked the applicants to solve the following tasks:

To my surprise, none of them managed to solve the tasks! Two of them knew nothing about grep at all.

So, let’s talk about it.

To begin with, everything below applies to:

# grep --version | grep grep
grep (GNU grep) 2.5.1-FreeBSD

This matters because:

# man grep | grep -iB 2 freebsd
       -P, --perl-regexp
              Interpret PATTERN as a Perl regular expression.  This option  is
              not supported in FreeBSD.

First of all, here's how we usually grep files:

[email protected]:/ # cat /var/run/dmesg.boot | grep CPU:
CPU: Intel® Core(TM)2 Quad CPU    Q9550  @ 2.83GHz (2833.07-MHz K8-class CPU)

But why run cat at all? grep takes the file name as an argument:

[email protected]:/ # grep CPU: /var/run/dmesg.boot
CPU: Intel® Core(TM)2 Quad CPU    Q9550  @ 2.83GHz (2833.07-MHz K8-class CPU)

Or like this (I hate this construction):

[email protected]:/ # 

For some reason, people count the matching lines with the help of wc:

[email protected]:/ # grep WARNING /var/run/dmesg.boot | wc -l
       3

Though grep can count them itself with -c:

[email protected]:/ # grep WARNING /var/run/dmesg.boot -c
3
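Keep in mind that -c counts matching lines rather than individual matches. Here is a minimal sketch of the difference; the sample file and its contents are made up for illustration and merely stand in for dmesg.boot:

```shell
# Hypothetical sample data standing in for /var/run/dmesg.boot.
sample=$(mktemp)
printf '%s\n' \
  'WARNING: one WARNING: two' \
  'all good here' \
  'WARNING: three' > "$sample"

# -c counts the lines that match, not the individual matches.
line_count=$(grep -c WARNING "$sample")

# -o prints every match on its own line, so wc -l counts occurrences.
# tr strips the padding that some wc implementations add.
match_count=$(grep -o WARNING "$sample" | wc -l | tr -d ' ')

echo "lines: $line_count, matches: $match_count"
rm -f "$sample"
```

So if you need the number of occurrences rather than the number of lines, -o piped into wc -l is one way to get it.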

Let’s create a test file:

[email protected]:/ # grep ".*" test.txt
one two three
seven eight one eight three
thirteen fourteen fifteen
 sixteen seventeen eighteen seven
sixteen seventeen eighteen
        twenty seven
one 504 one
one 503 one
one     504     one
one     504 one
#comment UP
twentyseven
        #comment down
twenty1
twenty3
twenty5
twenty7
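
If you want to follow along, something like this should recreate the file. It is only a sketch: I am assuming that the wide gaps in the listing above are tabs and that the single leading space is a plain space, and the file name is whatever mktemp hands out.

```shell
# Recreate the test file.
# ASSUMPTION: the wide gaps in the listing are tabs (\t).
testfile=$(mktemp)
{
  printf 'one two three\n'
  printf 'seven eight one eight three\n'
  printf 'thirteen fourteen fifteen\n'
  printf ' sixteen seventeen eighteen seven\n'
  printf 'sixteen seventeen eighteen\n'
  printf '\ttwenty seven\n'
  printf 'one 504 one\n'
  printf 'one 503 one\n'
  printf 'one\t504\tone\n'
  printf 'one\t504 one\n'
  printf '#comment UP\n'
  printf 'twentyseven\n'
  printf '\t#comment down\n'
  printf 'twenty1\ntwenty3\ntwenty5\ntwenty7\n'
} > "$testfile"

# Sanity checks: total line count and lines with the whole word "seven".
total=$(wc -l < "$testfile" | tr -d ' ')
seven_lines=$(grep -cw 'seven' "$testfile")
echo "$total lines, $seven_lines of them contain the word seven"
```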

Now let's get down to searching.

The -w option matches whole words only:

[email protected]:/ # grep -w 'seven' test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
        twenty seven

But what if we need to match the beginning or the end of a word? GNU grep has \< and \> for that. For example, words that end in seven:

[email protected]:/ # grep 'seven\>' test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
        twenty seven
twentyseven
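
Both word anchors side by side, as a sketch on a made-up word list. \< and \> are GNU extensions, so I am assuming GNU grep (or a grep that copies its behaviour) here:

```shell
# Hypothetical word list, one word per line.
words=$(mktemp)
printf '%s\n' 'seven' 'seventeen' 'twentyseven' > "$words"

# \< anchors the match at the start of a word.
starts=$(grep '\<seven' "$words")

# \> anchors the match at the end of a word.
ends=$(grep 'seven\>' "$words")

# Both anchors together behave much like -w.
whole=$(grep '\<seven\>' "$words")

printf 'starts with seven:\n%s\n' "$starts"
printf 'ends with seven:\n%s\n' "$ends"
printf 'whole word seven:\n%s\n' "$whole"
rm -f "$words"
```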

What about a word at the beginning or at the end of a line?

[email protected]:/ # grep '^seven' test.txt
seven eight one eight three
[email protected]:/ # grep 'seven$' test.txt
 sixteen seventeen eighteen seven
        twenty seven
twentyseven
[email protected]:/ #

Want to see the lines around the match?

[email protected]:/ # grep -C 1 twentyseven test.txt
#comment UP
twentyseven
        #comment down

Only the lines after it, or only the lines before it?

[email protected]:/ # grep -A 1 twentyseven test.txt
twentyseven
        #comment down
[email protected]:/ # grep -B 1 twentyseven test.txt
#comment UP
twentyseven

A bracket expression matches any one character from a range:

[email protected]:/ # grep "twenty[1-4]" test.txt
twenty1
twenty3

Or any one character outside that range:

[email protected]:/ # grep "twenty[^1-4]" test.txt
        twenty seven
twentyseven
twenty5
twenty7
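
One subtlety: a negated bracket expression still has to match some character, so a bare twenty at the end of a line matches neither pattern. A sketch on made-up input:

```shell
# Hypothetical input; the first line has nothing after "twenty".
nums=$(mktemp)
printf '%s\n' 'twenty' 'twenty2' 'twenty9' 'twenty x' > "$nums"

# One character from the range 1-4 must follow.
in_range=$(grep 'twenty[1-4]' "$nums")

# One character that is NOT in 1-4 must follow (a space counts too).
out_of_range=$(grep 'twenty[^1-4]' "$nums")

printf 'in range:\n%s\n' "$in_range"
printf 'out of range:\n%s\n' "$out_of_range"
rm -f "$nums"
```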

Of course, grep also supports the other basic quantifiers, metacharacters and the rest of the regular expression syntax.

Some examples:

[email protected]:/ # cat /etc/resolv.conf
#options edns0
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

Select only the lines with IP:

[email protected]:/ # grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /etc/resolv.conf
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

It works, but this way is better, since the pattern is shorter and \b pins it to word boundaries:

[email protected]:/ # grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

Want to drop the commented line?

[email protected]:/ # grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf | grep -v '#'
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

And now extract only the IP addresses:

[email protected]:/ # grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf | grep -v '#'
127.0.0.1
8.8.8.8
77.88.8.8
8.8.4.4

Oops! The commented address came back. With -o, the first grep prints only the matched text, so the # never reaches grep -v. What should we do? Filter out the comments first:

[email protected]:/ # grep -v '#' /etc/resolv.conf | grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b'
8.8.8.8
77.88.8.8
8.8.4.4
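
The order matters because -o throws away everything except the matched text. A sketch on a made-up resolv.conf; anchoring the comment filter as ^[[:blank:]]*# is my own tweak, so that only lines that start with a comment are dropped:

```shell
# Hypothetical resolv.conf with one commented-out entry.
conf=$(mktemp)
printf '%s\n' \
  '#nameserver 127.0.0.1' \
  'nameserver 8.8.8.8' \
  'nameserver 8.8.4.4' > "$conf"

ip_re='\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b'

# Wrong order: -o has already dropped the '#', so grep -v rejects nothing.
wrong=$(grep -oE "$ip_re" "$conf" | grep -v '#')

# Right order: remove comment lines first, then extract the addresses.
right=$(grep -v '^[[:blank:]]*#' "$conf" | grep -oE "$ip_re")

printf 'wrong order:\n%s\n' "$wrong"
printf 'right order:\n%s\n' "$right"
rm -f "$conf"
```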

Let’s take a closer look at inverting the search with the -v option.

Suppose we need to run «ps -afx | grep ttyv»:

[email protected]:/ # ps -afx | grep ttyv
 1269 v1  Is+       0:00.00 /usr/libexec/getty Pc ttyv1
 1270 v2  Is+       0:00.00 /usr/libexec/getty Pc ttyv2
 1271 v3  Is+       0:00.00 /usr/libexec/getty Pc ttyv3
 1272 v4  Is+       0:00.00 /usr/libexec/getty Pc ttyv4
 1273 v5  Is+       0:00.00 /usr/libexec/getty Pc ttyv5
 1274 v6  Is+       0:00.00 /usr/libexec/getty Pc ttyv6
 1275 v7  Is+       0:00.00 /usr/libexec/getty Pc ttyv7
48798  2  S+        0:00.00 grep ttyv

Okay, but we do not need the “48798 2 S+ 0:00.00 grep ttyv” line. Use -v:

[email protected]:/ # ps -afx | grep ttyv | grep -v grep
 1269 v1  Is+       0:00.00 /usr/libexec/getty Pc ttyv1
 1270 v2  Is+       0:00.00 /usr/libexec/getty Pc ttyv2
 1271 v3  Is+       0:00.00 /usr/libexec/getty Pc ttyv3
 1272 v4  Is+       0:00.00 /usr/libexec/getty Pc ttyv4
 1273 v5  Is+       0:00.00 /usr/libexec/getty Pc ttyv5
 1274 v6  Is+       0:00.00 /usr/libexec/getty Pc ttyv6
 1275 v7  Is+       0:00.00 /usr/libexec/getty Pc ttyv7

Not pretty? How about this:

[email protected]:/ # ps -afx | grep "[t]tyv"
 1269 v1  Is+       0:00.00 /usr/libexec/getty Pc ttyv1
 1270 v2  Is+       0:00.00 /usr/libexec/getty Pc ttyv2
 1271 v3  Is+       0:00.00 /usr/libexec/getty Pc ttyv3
 1272 v4  Is+       0:00.00 /usr/libexec/getty Pc ttyv4
 1273 v5  Is+       0:00.00 /usr/libexec/getty Pc ttyv5
 1274 v6  Is+       0:00.00 /usr/libexec/getty Pc ttyv6
 1275 v7  Is+       0:00.00 /usr/libexec/getty Pc ttyv7
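
The trick works because the regular expression [t]tyv still matches the text ttyv, while the command line of the grep process itself now contains the literal characters [t]tyv, which that expression does not match. A sketch with canned ps output (the sample lines are abbreviated copies of the output above, not a live process list):

```shell
# Canned "ps" output: one getty line plus the line of the grep process itself.
ps_plain=$(printf '%s\n' \
  ' 1269 v1  Is+  0:00.00 /usr/libexec/getty Pc ttyv1' \
  '48798  2  S+   0:00.00 grep ttyv')
ps_bracket=$(printf '%s\n' \
  ' 1269 v1  Is+  0:00.00 /usr/libexec/getty Pc ttyv1' \
  '48798  2  S+   0:00.00 grep [t]tyv')

# The plain pattern also matches grep's own command line.
plain_hits=$(printf '%s\n' "$ps_plain" | grep -c 'ttyv')

# "[t]tyv" matches "ttyv" but not the literal text "[t]tyv".
bracket_hits=$(printf '%s\n' "$ps_bracket" | grep -c '[t]tyv')

echo "plain: $plain_hits, bracket: $bracket_hits"
```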

Do not forget about | (OR):

[email protected]:/ # vmstat -z | grep -E "(sock|ITEM)"
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
socket:                 696, 130295,      30,      65,   43764,   0,   0

The same, but with a basic regular expression, where | has to be escaped:

[email protected]:/ # vmstat -z | grep "sock\|ITEM"
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
socket:                 696, 130295,      30,      65,   43825,   0,   0
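
There is a third spelling: repeat -e once per pattern. A sketch on canned data (the lines imitate vmstat -z output, and the mbuf line is made up). Since \| in a basic regular expression is an extension rather than POSIX, -e is probably the safer choice in scripts:

```shell
# Canned data imitating "vmstat -z" output.
stats=$(mktemp)
printf '%s\n' \
  'ITEM SIZE LIMIT' \
  'socket: 696, 130295' \
  'mbuf: 256, 0' > "$stats"

# Extended regular expression: plain | means OR.
ere=$(grep -E 'sock|ITEM' "$stats")

# Basic regular expression: \| means OR (an extension, not POSIX).
bre=$(grep 'sock\|ITEM' "$stats")

# One -e per pattern: a line matches if any pattern matches.
multi=$(grep -e sock -e ITEM "$stats")

printf '%s\n' "$multi"
rm -f "$stats"
```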

While many of you remember that grep supports regular expressions, some still forget about POSIX character classes, which are just as handy:

POSIX:

[:alpha:] Any alphabetical character, regardless of case
[:digit:] Any numerical character
[:alnum:] Any alphabetical or numerical character
[:blank:] Space or tab characters
[:xdigit:] Hexadecimal characters; any digit or A–F or a–f
[:punct:] Any punctuation symbol
[:print:] Any printable character (not control characters)
[:space:] Any whitespace character
[:graph:] Any printable character except whitespace
[:upper:] Any uppercase letter
[:lower:] Any lowercase letter
[:cntrl:] Control characters

Let's grep lines with uppercase characters:

[email protected]:/ # grep "[[:upper:]]" test.txt
#comment UP

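Cannot see the match clearly? Let’s highlight it (GNU grep does this with the --color option):

[Screenshot: grep output with the matched uppercase characters highlighted]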
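
A few more class-based patterns, as a sketch on made-up input. Remember that the class name goes inside a bracket expression, hence the double brackets:

```shell
# Hypothetical input covering a few character classes.
classes=$(mktemp)
printf '%s\n' 'abc' 'ABC' '12345' '  indented' > "$classes"

# Lines that consist of digits only.
digits_only=$(grep -E '^[[:digit:]]+$' "$classes")

# Lines that contain at least one uppercase letter.
has_upper=$(grep '[[:upper:]]' "$classes")

# Lines that start with whitespace.
indented=$(grep '^[[:space:]]' "$classes")

printf 'digits: %s\nupper: %s\nindented: %s\n' \
  "$digits_only" "$has_upper" "$indented"
rm -f "$classes"
```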

Some more tricks. The first one is rather academic: I haven't needed it in 15 years.

Select lines containing six, seven or eight.

It is simple.

[email protected]:/ # grep -E "(six|seven|eight)" test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
sixteen seventeen eighteen
        twenty seven
twentyseven

And now select only the lines in which the same word (six, seven or eight) occurs more than once. This feature is called backreferences:

[email protected]:/ # grep -E "(six|seven|eight).*\1" test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
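
To see that \1 demands the same word again rather than any of the three, here is a sketch on made-up lines. As far as I know, backreferences combined with -E are a GNU extension, so treat this as GNU grep behaviour:

```shell
# Hypothetical input: only the first and third lines repeat a word.
nums=$(mktemp)
printf '%s\n' \
  'seven eight one eight three' \
  'six seven eight' \
  'sixteen and six' > "$nums"

# \1 must repeat exactly what the group (six|seven|eight) matched.
repeated=$(grep -E '(six|seven|eight).*\1' "$nums")

printf '%s\n' "$repeated"
rm -f "$nums"
```

The second line contains all three words, but each only once, so it is not selected.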

Here is a second trick which is much more useful.

Print the lines in which 504 has a tab on both sides (PCRE support would be great here…).

POSIX classes do not help, because [[:blank:]] matches spaces as well as tabs:

[email protected]:/ # grep "[[:blank:]]504[[:blank:]]" test.txt
one 504 one
one     504     one
one     504 one

The [CTRL+V][TAB] key sequence, which types a literal tab on the command line, comes in handy:

[email protected]:/ # grep "     504     " test.txt
one     504     one
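
In a script you cannot press CTRL+V, so one option is to let printf produce the tab and keep it in a variable. A sketch; the variable and file names are mine:

```shell
# Three lines that differ only in the whitespace around 504.
data=$(mktemp)
printf 'one 504 one\none\t504\tone\none\t504 one\n' > "$data"

# Command substitution keeps the tab (only trailing newlines are stripped).
tab=$(printf '\t')

# Match 504 only when there is a literal tab on both sides.
tabs_only=$(grep "${tab}504${tab}" "$data")
tab_count=$(grep -c "${tab}504${tab}" "$data")

printf '%s\n' "$tabs_only"
rm -f "$data"
```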

What have I missed? grep can, of course, search files and directories recursively. Let’s find the code that lets the Intel ixgbe driver work with unsupported third-party SFP modules. I don’t remember which way round it is written: allow_unsupported_sfp or unsupported_allow_sfp. Anyway, that is grep’s problem:

[email protected]:/ # grep -rni allow /usr/src/sys/dev/ | grep unsupp
/usr/src/sys/dev/ixgbe/README:75:of unsupported modules by setting the static variable 'allow_unsupported_sfp'
/usr/src/sys/dev/ixgbe/ixgbe.c:322:static int allow_unsupported_sfp = TRUE;
/usr/src/sys/dev/ixgbe/ixgbe.c:323:TUNABLE_INT("hw.ixgbe.unsupported_sfp", &allow_unsupported_sfp);
/usr/src/sys/dev/ixgbe/ixgbe.c:542:     hw->allow_unsupported_sfp = allow_unsupported_sfp;
/usr/src/sys/dev/ixgbe/ixgbe_type.h:3249:       bool allow_unsupported_sfp;
/usr/src/sys/dev/ixgbe/ixgbe_phy.c:1228:                                if (hw->allow_unsupported_sfp == TRUE) {
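
The second grep in the pipeline can be folded into the first one with an alternation that accepts either word order. A sketch on a throwaway directory tree (the file names and contents are invented, not the real ixgbe sources):

```shell
# Throwaway source tree with one relevant and one irrelevant file.
src=$(mktemp -d)
mkdir -p "$src/ixgbe"
printf 'static int allow_unsupported_sfp = TRUE;\n' > "$src/ixgbe/demo.c"
printf 'int allow_other = 1;\n' > "$src/ixgbe/other.c"

# -r recurse, -n show line numbers, -i ignore case,
# -E so that | separates the two possible word orders.
hits=$(grep -rniE 'allow.*unsupp|unsupp.*allow' "$src")

printf '%s\n' "$hits"
rm -rf "$src"
```

Unlike the pipeline, this version only accepts lines where both words appear, in whichever order.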

I hope you are not bored, because this is just the tip of the grep iceberg.

Happy grepping!

Published in

*nix


2 comments

Federico Ramirez
Interesting article, I learnt some stuff about grep! Thanks. Anyway, I think you could have explained a bit more on some topics, like saying «That works but this is better»: why is it better? Cheers.
Kukuruku Hub
I agree. In this specific case it's better because the regex is a little bit shorter and more precise.
[email protected]:/ # grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf
The \b is a word boundary anchor, which means that it matches at the start and at the end of an alphanumeric sequence. To be honest, this regular expression does not describe an IP address as per the RFC. For instance, the rule [0-9]{1,3} will also match 745, which is well over 255. But usually you do not care too much, as you can tell a real IP address from something like 745.983.001.874 at a glance.
