Reading articles about Unix commands, I have noticed that their examples are rarely of any practical use. It turns out that many of us do not know how to use the tools that really are useful.

Before that

Three years ago I was asked to interview applicants for a Unix system administrator position. There were eight of them, and two were top-rated on a freelance marketplace. I never require sysadmins to know configs by heart: I think anyone can get familiar with the necessary software when needed, provided they are willing to read a lot and want to use system tools properly. Therefore, I asked the applicants to solve the following tasks:

To my surprise, none of them coped with the tasks! Two of them knew nothing about grep at all.

So, let’s talk about it.

To begin with, everything mentioned below is true for

# grep --version | grep grep
grep (GNU grep) 2.5.1-FreeBSD

This matters because of the following:

# man grep | grep -iB 2 freebsd
       -P, --perl-regexp
              Interpret PATTERN as a Perl regular expression.  This option  is
              not supported in FreeBSD.

First of all, here's how we usually grep files:

root@nm3:/ # cat /var/run/dmesg.boot | grep CPU:
CPU: Intel® Core(TM)2 Quad CPU    Q9550  @ 2.83GHz (2833.07-MHz K8-class CPU)

But why? We can also do it this way:

root@nm3:/ # grep CPU: /var/run/dmesg.boot
CPU: Intel® Core(TM)2 Quad CPU    Q9550  @ 2.83GHz (2833.07-MHz K8-class CPU)

Or like this (I hate this construction):

root@nm3:/ # 

For some reason, people often count the matching lines with wc:

root@nm3:/ # grep WARNING /var/run/dmesg.boot | wc -l
       3

Though grep can count them itself with -c:

root@nm3:/ # grep WARNING /var/run/dmesg.boot -c
3
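Keep in mind that -c, like wc -l, counts matching lines, not matches. If you need the number of occurrences, one common idiom is -o (print every match on its own line; supported by GNU and BSD grep, though not required by POSIX) piped into wc -l. A minimal sketch on a made-up sample, assuming a POSIX sh:

```sh
#!/bin/sh
# Made-up sample: three lines, two of which match, one of them twice.
sample='WARNING: disk full, WARNING: fan stopped
all good
WARNING: high temperature'

# -c counts matching LINES (same result as grep ... | wc -l)
lines=$(printf '%s\n' "$sample" | grep -c WARNING)

# -o prints each match on its own line, so wc -l now counts OCCURRENCES;
# $(( )) strips the leading padding that BSD wc adds
occurrences=$(( $(printf '%s\n' "$sample" | grep -o WARNING | wc -l) ))

echo "lines: $lines, occurrences: $occurrences"
```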

Let’s take a test file (grep ".*" matches, and therefore prints, every line of it):

root@nm3:/ # grep ".*" test.txt
one two three
seven eight one eight three
thirteen fourteen fifteen
 sixteen seventeen eighteen seven
sixteen seventeen eighteen
        twenty seven
one 504 one
one 503 one
one     504     one
one     504 one
#comment UP
twentyseven
        #comment down
twenty1
twenty3
twenty5
twenty7
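If you want to follow along, the file can be recreated with printf. This is a sketch under an assumption: the wide gaps in the two "504" lines are taken to be tabs (the tab trick later in the article depends on that), while the other indented lines are assumed to use eight spaces.

```sh
#!/bin/sh
# Recreates test.txt in the current directory.
# ASSUMPTION: the gaps in the two "504" lines are tabs; other indentation is spaces.
printf '%s\n' \
  'one two three' \
  'seven eight one eight three' \
  'thirteen fourteen fifteen' \
  ' sixteen seventeen eighteen seven' \
  'sixteen seventeen eighteen' \
  '        twenty seven' \
  'one 504 one' \
  'one 503 one' > test.txt
# tab / 504 / tab, then tab / 504 / space
printf 'one\t504\tone\none\t504 one\n' >> test.txt
printf '%s\n' \
  '#comment UP' \
  'twentyseven' \
  '        #comment down' \
  'twenty1' 'twenty3' 'twenty5' 'twenty7' >> test.txt

# An empty pattern matches every line, so this prints the line count
grep -c '' test.txt
```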

Now let’s get down to searching:

The -w option matches whole words only:

root@nm3:/ # grep -w 'seven' test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
        twenty seven

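But what if we need to match the beginning or the end of a word? In GNU grep, \< anchors the pattern to the start of a word and \> to its end. For example, all words ending in "seven":

root@nm3:/ # grep 'seven\>' test.txt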
seven eight one eight three
 sixteen seventeen eighteen seven
        twenty seven
twentyseven
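GNU grep's \< and \> anchors can be compared side by side on a few made-up lines; this sketch assumes a POSIX sh and GNU grep:

```sh
#!/bin/sh
# Made-up sample lines.
sample='seven eight
seventeen
twentyseven'

# \<seven : a word that BEGINS with "seven"
starts=$(printf '%s\n' "$sample" | grep '\<seven')
# seven\> : a word that ENDS with "seven"
ends=$(printf '%s\n' "$sample" | grep 'seven\>')
# \<seven\> : the whole word "seven"
whole=$(printf '%s\n' "$sample" | grep '\<seven\>')

printf 'starts:\n%s\nends:\n%s\nwhole:\n%s\n' "$starts" "$ends" "$whole"
```

Used together, \<seven\> behaves much like grep -w seven.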

Or a word at the beginning or the end of a line? Use ^ and $:

root@nm3:/ # grep '^seven' test.txt
seven eight one eight three
root@nm3:/ # grep 'seven$' test.txt
 sixteen seventeen eighteen seven
        twenty seven
twentyseven
root@nm3:/ #

Want to see the lines around the match? Use -C (context):

root@nm3:/ # grep -C 1 twentyseven test.txt
#comment UP
twentyseven
        #comment down

Only the lines after (-A) or before (-B) it?

root@nm3:/ # grep -A 1 twentyseven test.txt
twentyseven
        #comment down
root@nm3:/ # grep -B 1 twentyseven test.txt
#comment UP
twentyseven

Bracket expressions match any single character from a set or range:

root@nm3:/ # grep "twenty[1-4]" test.txt
twenty1
twenty3

Or any character outside it, with ^:

root@nm3:/ # grep "twenty[^1-4]" test.txt
        twenty seven
twentyseven
twenty5
twenty7
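A negated bracket expression is not the same as inverting the whole search with -v: [^1-4] still has to match exactly one character, so a line that ends right after "twenty" is not selected by it. A small sketch with made-up lines, assuming a POSIX sh:

```sh
#!/bin/sh
# Made-up sample lines.
sample='twenty
twenty2
twenty7
twenty seven'

# [^1-4] must consume ONE character, so the bare "twenty" is not selected
negated=$(printf '%s\n' "$sample" | grep 'twenty[^1-4]')
# -v inverts the whole match instead, so the bare "twenty" IS selected
inverted=$(printf '%s\n' "$sample" | grep -v 'twenty[1-4]')

printf 'negated:\n%s\ninverted:\n%s\n' "$negated" "$inverted"
```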

Of course, grep supports the other basic quantifiers, metacharacters and regular-expression constructs as well.

Some examples:

root@nm3:/ # cat /etc/resolv.conf
#options edns0
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

Select only the lines that contain an IP address:

root@nm3:/ # grep -E "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" /etc/resolv.conf
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

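It works, but this is better: \b limits the match to word boundaries, and (\.[0-9]{1,3}){3} repeats the dot-and-octet group three times instead of spelling it out: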

root@nm3:/ # grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4
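Both patterns still accept impossible addresses such as 999.1.1.1, because [0-9]{1,3} allows any one to three digits. If that matters, each octet can be spelled out as a 0 to 255 alternation. The sketch below is one way to do it with GNU grep -E (the octet and ipv4 variable names are just for illustration); as a side effect it rejects octets with leading zeros:

```sh
#!/bin/sh
# One octet, 0-255: 250-255 | 200-249 | 100-199 | 0-99 (no leading zeros)
octet='(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])'
# Four octets separated by dots, anchored at word boundaries (\b is GNU)
ipv4='\b'"$octet"'(\.'"$octet"'){3}\b'

valid=$(printf '%s\n' 8.8.8.8 192.168.1.255 745.983.001.874 300.1.1.1 \
  | grep -oE "$ipv4")
printf '%s\n' "$valid"
```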

Want to remove the commented-out line?

root@nm3:/ # grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf | grep -v '#'
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4

And now print only the IP addresses, using -o:

root@nm3:/ # grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf | grep -v '#'
127.0.0.1
8.8.8.8
77.88.8.8
8.8.4.4

Oops! The commented-out address is back. With -o, the first grep outputs only the matched addresses, so the "#" never reaches the second grep and grep -v '#' has nothing to filter. The fix is to drop the comments first:

root@nm3:/ # grep -v '#' /etc/resolv.conf | grep -oE '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b'
8.8.8.8
77.88.8.8
8.8.4.4
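The ordering issue is easy to reproduce without touching /etc/resolv.conf. This sketch inlines the file contents shown above and runs both pipelines, assuming a POSIX sh and GNU grep:

```sh
#!/bin/sh
# The resolv.conf contents shown above, inlined so the sketch runs anywhere
conf='#options edns0
#nameserver 127.0.0.1
nameserver 8.8.8.8
nameserver 77.88.8.8
nameserver 8.8.4.4'

ip='\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b'

# Wrong order: -o strips the "#" before the second grep ever sees it
wrong=$(printf '%s\n' "$conf" | grep -oE "$ip" | grep -v '#')
# Right order: drop comment lines first, then extract the addresses
right=$(printf '%s\n' "$conf" | grep -v '#' | grep -oE "$ip")

printf 'wrong:\n%s\nright:\n%s\n' "$wrong" "$right"
```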

Let’s take a closer look at inverting the match with -v.

Suppose we need to run «ps -afx | grep ttyv»:

root@nm3:/ # ps -afx | grep ttyv
 1269 v1  Is+       0:00.00 /usr/libexec/getty Pc ttyv1
 1270 v2  Is+       0:00.00 /usr/libexec/getty Pc ttyv2
 1271 v3  Is+       0:00.00 /usr/libexec/getty Pc ttyv3
 1272 v4  Is+       0:00.00 /usr/libexec/getty Pc ttyv4
 1273 v5  Is+       0:00.00 /usr/libexec/getty Pc ttyv5
 1274 v6  Is+       0:00.00 /usr/libexec/getty Pc ttyv6
 1275 v7  Is+       0:00.00 /usr/libexec/getty Pc ttyv7
48798  2  S+        0:00.00 grep ttyv

Okay, but we do not need the “48798 2 S+ 0:00.00 grep ttyv” line: that is grep finding its own process. Use -v:

root@nm3:/ # ps -afx | grep ttyv | grep -v grep
 1269 v1  Is+       0:00.00 /usr/libexec/getty Pc ttyv1
 1270 v2  Is+       0:00.00 /usr/libexec/getty Pc ttyv2
 1271 v3  Is+       0:00.00 /usr/libexec/getty Pc ttyv3
 1272 v4  Is+       0:00.00 /usr/libexec/getty Pc ttyv4
 1273 v5  Is+       0:00.00 /usr/libexec/getty Pc ttyv5
 1274 v6  Is+       0:00.00 /usr/libexec/getty Pc ttyv6
 1275 v7  Is+       0:00.00 /usr/libexec/getty Pc ttyv7

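Two greps look clumsy? What about this? The bracket expression [t] matches only "t", so the pattern still matches ttyv, but grep's own command line now contains the literal text [t]tyv, which the pattern does not match: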

root@nm3:/ # ps -afx | grep "[t]tyv"
 1269 v1  Is+       0:00.00 /usr/libexec/getty Pc ttyv1
 1270 v2  Is+       0:00.00 /usr/libexec/getty Pc ttyv2
 1271 v3  Is+       0:00.00 /usr/libexec/getty Pc ttyv3
 1272 v4  Is+       0:00.00 /usr/libexec/getty Pc ttyv4
 1273 v5  Is+       0:00.00 /usr/libexec/getty Pc ttyv5
 1274 v6  Is+       0:00.00 /usr/libexec/getty Pc ttyv6
 1275 v7  Is+       0:00.00 /usr/libexec/getty Pc ttyv7
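To see why this works, the sketch below feeds grep two made-up ps-style lines, one for getty and one for the grep process itself, as its command line would show up in ps (POSIX sh assumed):

```sh
#!/bin/sh
# Hypothetical ps-like output: a getty line and grep's own line
ps_like='1269 v1  Is+  0:00.00 /usr/libexec/getty Pc ttyv1
48798 2  S+   0:00.00 grep [t]tyv'

# [t] matches a single "t", so the pattern means "ttyv"; the literal
# text "[t]tyv" on grep's own line does not contain "ttyv"
found=$(printf '%s\n' "$ps_like" | grep '[t]tyv')
printf '%s\n' "$found"
```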

Do not forget about alternation, | (OR):

root@nm3:/ # vmstat -z | grep -E "(sock|ITEM)"
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
socket:                 696, 130295,      30,      65,   43764,   0,   0

The same in basic regex syntax, where GNU grep accepts an escaped \| as the OR operator:

root@nm3:/ # vmstat -z | grep "sock\|ITEM"
ITEM                   SIZE  LIMIT     USED     FREE      REQ FAIL SLEEP
socket:                 696, 130295,      30,      65,   43825,   0,   0
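Besides -E with | and GNU's \| in basic syntax, the portable POSIX way is to pass several patterns with repeated -e options; a line matching any of them is selected. A quick sketch comparing the three on a made-up sample:

```sh
#!/bin/sh
# Made-up sample resembling vmstat -z output
sample='ITEM SIZE LIMIT
socket: 696
mbuf: 256'

a=$(printf '%s\n' "$sample" | grep -E 'sock|ITEM')   # ERE: | is an operator
b=$(printf '%s\n' "$sample" | grep 'sock\|ITEM')     # BRE: \| is a GNU extension
c=$(printf '%s\n' "$sample" | grep -e sock -e ITEM)  # POSIX: several -e patterns

printf '%s\n---\n%s\n---\n%s\n' "$a" "$b" "$c"
```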

While many of you remember to use regular expressions with grep, some forget about POSIX character classes, which are handy as well. Note that a class name only works inside a bracket expression, hence [[:upper:]] rather than [:upper:]:

POSIX:

[:alpha:] Any alphabetical character, regardless of case
[:digit:] Any numerical character
[:alnum:] Any alphabetical or numerical character
[:blank:] Space or tab characters
[:xdigit:] Hexadecimal characters; any number or A–F or a–f
[:punct:] Any punctuation symbol
[:print:] Any printable character (not control characters)
[:space:] Any whitespace character
[:graph:] Any printable character except space
[:upper:] Any uppercase letter
[:lower:] Any lowercase letter
[:cntrl:] Control characters
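Like any bracket expression, classes combine with anchors and quantifiers. For instance, ^[[:space:]]*# catches comment lines whether or not they are indented. A minimal sketch on made-up lines, assuming a POSIX sh:

```sh
#!/bin/sh
# Made-up sample lines
sample='#comment UP
twentyseven
        #comment down
one 504 one # trailing note'

# First non-blank character is "#": whole-line comments, indented or not
comments=$(printf '%s\n' "$sample" | grep '^[[:space:]]*#')
# At least one digit anywhere on the line
digits=$(printf '%s\n' "$sample" | grep '[[:digit:]]')
# At least one uppercase letter
upper=$(printf '%s\n' "$sample" | grep '[[:upper:]]')

printf '%s\n---\n%s\n---\n%s\n' "$comments" "$digits" "$upper"
```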

Let's grep lines with uppercase characters:

root@nm3:/ # grep "[[:upper:]]" test.txt
#comment UP

Can’t see it clearly? Let’s highlight it:

[Image: grep output with the matches highlighted]

A couple more tricks. The first one is rather academic: I haven't used it in 15 years.

Select lines containing six, seven or eight.

It is simple.

root@nm3:/ # grep -E "(six|seven|eight)" test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
sixteen seventeen eighteen
        twenty seven
twentyseven

And now select only the lines in which the same one of six, seven or eight occurs at least twice. This feature is called a backreference: \1 matches the exact text captured by the first parenthesized group:

root@nm3:/ # grep -E "(six|seven|eight).*\1" test.txt
seven eight one eight three
 sixteen seventeen eighteen seven
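A more everyday use of backreferences is catching doubled words, a classic typo. This is a sketch assuming GNU grep, which supports \b and backreferences in -E mode:

```sh
#!/bin/sh
# Made-up sample lines
sample='this is is a typo
this is fine
the the end'

# ([a-z]+) captures a word; " \1" requires the very same word right after it
doubled=$(printf '%s\n' "$sample" | grep -E '\b([a-z]+) \1\b')
printf '%s\n' "$doubled"
```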

Here is a second trick which is much more useful.

Print the lines in which 504 is surrounded by tabs (PCRE support, with its \t, would be great here, but as shown above, -P is not supported in FreeBSD).

POSIX classes do not help, because [[:blank:]] matches spaces as well as tabs:

root@nm3:/ # grep "[[:blank:]]504[[:blank:]]" test.txt
one 504 one
one     504     one
one     504 one

Typing a literal tab with the [CTRL+V][TAB] sequence comes in handy:

root@nm3:/ # grep "     504     " test.txt
one     504     one
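If typing [CTRL+V][TAB] is awkward (or the tab gets lost when pasting), the tab can be produced by printf inside the pattern. A sketch assuming a POSIX sh; bash, ksh and zsh also accept $'\t504\t'. The sample lines assume the same spacing as the 504 lines of the test file (spaces, tabs, tab plus space):

```sh
#!/bin/sh
# A literal tab character (command substitution keeps it; only newlines are trimmed)
tab=$(printf '\t')

# Made-up lines: spaces only, tab on both sides, tab then space
lines=$(printf 'one 504 one\none\t504\tone\none\t504 one\n')

tabbed=$(printf '%s\n' "$lines" | grep "${tab}504${tab}")
printf '%s\n' "$tabbed"
```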

What else have I missed? grep can, of course, search files and directories recursively (-r; -n adds line numbers, -i ignores case). Let’s find the code that allows Intel network cards to use unsupported SFP modules. I don’t remember whether it is allow_unsupported_sfp or unsupported_allow_sfp, but that is grep’s problem:

root@nm3:/ # grep -rni allow /usr/src/sys/dev/ | grep unsupp
/usr/src/sys/dev/ixgbe/README:75:of unsupported modules by setting the static variable 'allow_unsupported_sfp'
/usr/src/sys/dev/ixgbe/ixgbe.c:322:static int allow_unsupported_sfp = TRUE;
/usr/src/sys/dev/ixgbe/ixgbe.c:323:TUNABLE_INT("hw.ixgbe.unsupported_sfp", &allow_unsupported_sfp);
/usr/src/sys/dev/ixgbe/ixgbe.c:542:     hw->allow_unsupported_sfp = allow_unsupported_sfp;
/usr/src/sys/dev/ixgbe/ixgbe_type.h:3249:       bool allow_unsupported_sfp;
/usr/src/sys/dev/ixgbe/ixgbe_phy.c:1228:                                if (hw->allow_unsupported_sfp == TRUE) {
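When you do know the spelling variants, a single extended pattern avoids the second grep, and GNU grep's --include restricts a recursive search to matching file names. The sketch below builds a hypothetical miniature source tree in a temporary directory (file contents are made up):

```sh
#!/bin/sh
# Hypothetical miniature source tree in a temporary directory
dir=$(mktemp -d)
mkdir "$dir/ixgbe"
printf 'static int allow_unsupported_sfp = 1;\n' > "$dir/ixgbe/ixgbe.c"
printf 'you may allow unsupported modules\n' > "$dir/ixgbe/README"

# One pass instead of two: either spelling, case-insensitive, with line numbers
matches=$(grep -rniE 'allow_unsupported|unsupported_allow' "$dir")
# --include limits the recursive search to files whose names match the glob
c_files=$(grep -rl --include='*.c' allow "$dir")

printf '%s\n%s\n' "$matches" "$c_files"
rm -r "$dir"
```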

I hope you are not bored, as this is just the tip of the grep iceberg.

Happy grepping!

Published in *nix


2 comments

Federico Ramirez
Interesting article, I learnt some stuff about grep! Thanks. Anyway, I think you could have explained some topics a bit more: when you say «That works but this is better», why is it better? Cheers.
Kukuruku Hub
I agree. In this specific case it's better because the regex is a little bit shorter and more precise.
root@nm3:/ # grep -E '\b[0-9]{1,3}(\.[0-9]{1,3}){3}\b' /etc/resolv.conf
\b is a word-boundary anchor: it matches at the start and at the end of an alphanumeric sequence. To be honest, this regular expression does not strictly validate IP addresses as per the RFC. For instance, [0-9]{1,3} also matches 745, which is far over 255. But usually you do not care much, as you can visually tell a real IP address from something like 745.983.001.874.
