In the previous post, we talked about the sed command and saw many examples of using it for text
processing. It does that job well, but it has some limitations. Sometimes you need something more
powerful that gives you more control over how data is processed. This is where the awk command comes in.
The awk command, or GNU awk specifically, provides a scripting language for text processing. With
the awk scripting language, you can do the following:
Define variables.
Use string and arithmetic operators.
Use control flow and loops.
Generate formatted reports.
In practice, you can process log files that contain millions of lines and turn them into a readable
report.
Table of Contents
1 Awk Options
2 Read AWK Scripts
3 Using Variables
4 Using Multiple Commands
5 Reading The Script From a File
6 Awk Preprocessing
7 Awk Postprocessing
8 Built-in Variables
9 More Variables
10 User Defined Variables
11 Structured Commands
11.1 While Loop
11.2 The for Loop
12 Formatted Printing
13 Built-In Functions
13.1 Mathematical Functions
14 String Functions
15 User Defined Functions
Awk Options
The awk command is used like this:
We will see how to process files and print results using awk.
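As a rough sketch, the general shape of an invocation is awk, then any options, then the quoted program, then the input file. The sample file and its contents below are hypothetical:

```shell
# General shape: awk [options] 'program' input-file
usage_file=$(mktemp)
printf 'alpha beta\ngamma delta\n' > "$usage_file"

# Print the second whitespace-separated field of every line.
usage_out=$(awk '{print $2}' "$usage_file")
echo "$usage_out"

rm -f "$usage_file"
```

Commonly used options include -F (field separator), -f (read the program from a file) and -v (set a variable before the program runs).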
Read AWK Scripts
To define an awk script, use braces surrounded by single quotation marks like this:
To terminate the program, press Ctrl+D. It may look tricky, but don't panic; the best is yet to come.
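A minimal sketch of such a script follows. The message text and the piped input line are placeholders. Because the input is piped in, awk does not wait on the keyboard, so Ctrl+D is not needed here:

```shell
# The program in braces runs once for every input line.
hello_out=$(printf 'any input line\n' | awk '{print "Welcome to awk command tutorial"}')
echo "$hello_out"
```

If you run the awk command without the pipe, it reads from the terminal and prints the message each time you press Enter.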
Using Variables
With awk, you can process text files. Awk assigns some variables for each data field found:
Whitespace, such as a space or a tab, is the default field separator in awk.
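To make the field variables concrete, here is a small sketch. The input line is made up; $0 holds the whole line, and $1, $2 and so on hold the individual fields:

```shell
# Print the first field, the fourth field, then the whole line.
fields_out=$(printf 'This is a test\n' | awk '{print $1; print $4; print $0}')
echo "$fields_out"
```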
Sometimes the separator in a file is neither a space nor a tab. You can specify it with the
-F option:
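For instance, records in /etc/passwd are separated by colons. The sketch below uses one made-up record in that format instead of reading the real file:

```shell
# -F: tells awk to split fields on the colon.
sep_out=$(printf 'root:x:0:0:root:/root:/bin/bash\n' | awk -F: '{print $1}')
echo "$sep_out"
```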
Using Multiple Commands
The first command sets the $2 field to Adam. The second command prints the entire line.
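The two commands described above can be tried with a sketch like the following, where the commands are separated by a semicolon. The input string is an assumption:

```shell
# First command replaces the second field; second command prints the rebuilt line.
multi_out=$(echo "Hello Tom" | awk '{$2 = "Adam"; print $0}')
echo "$multi_out"
```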
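Reading The Script From a File
You can put the awk script in a file and pass it to awk with the -f option. For example, the file can
contain:
{
    text = " home at "
    print $1 text $6
}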
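Here is a sketch of running a script file with -f. The temporary file names and the sample record are hypothetical; on a real system you would typically pass /etc/passwd as the input:

```shell
script_file=$(mktemp)
data_file=$(mktemp)

# Write the awk script to a file.
printf '{\n    text = " home at "\n    print $1 text $6\n}\n' > "$script_file"
printf 'alice:x:1000:1000:Alice:/home/alice:/bin/bash\n' > "$data_file"

# -F: sets the separator, -f names the script file.
file_out=$(awk -F: -f "$script_file" "$data_file")
echo "$file_out"

rm -f "$script_file" "$data_file"
```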
Awk Preprocessing
If you need a title or a header for your results, use the BEGIN keyword. Its block runs before any
data is processed:
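A short sketch with made-up input lines and header text:

```shell
# The BEGIN block runs once, before the first input line is read.
begin_out=$(printf 'line one\nline two\n' | awk 'BEGIN {print "The File Contents:"} {print $0}')
echo "$begin_out"
```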
Awk Postprocessing
To run a script after processing the data, use the END keyword:
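A short sketch, again with made-up input lines and footer text:

```shell
# The END block runs once, after the last input line has been processed.
end_out=$(printf 'line one\nline two\n' | awk '{print $0} END {print "End of File"}')
echo "$end_out"
```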
You can combine both in a script file:
BEGIN {
    print "Users and their corresponding home"
    print " UserName \t HomePath"
    print "___________ \t __________"
    FS=":"
}
{
    print $1 " \t " $6
}
END {
    print "The end"
}
First, the BEGIN block prints the header and sets FS. Then the main block prints each record, and
the END block prints the footer.
Built-in Variables
We saw that the data field variables $1, $2, $3, and so on extract data fields, and we also dealt
with the field separator FS.
These are not the only variables; awk has more built-in variables.
By default, the OFS variable is a space. You can set OFS to the output separator you
need:
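For example, the sketch below reads colon-separated fields and writes them joined by a dash. The sample record is made up:

```shell
# FS controls how input is split; OFS is placed between comma-separated print arguments.
ofs_out=$(printf 'alice:x:1000:1000:Alice:/home/alice:/bin/bash\n' |
    awk 'BEGIN {FS=":"; OFS="-"} {print $1, $6, $7}')
echo "$ofs_out"
```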
Suppose a file contains fixed-width data with no separator between the fields:
1235.96521
927-8.3652
36257.8157
With three widths assigned to FIELDWIDTHS, each line is split into three fields, and the length of
each field is exactly the width we assigned.
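The command that produces this split is not shown here, so the following is only a sketch. The widths "3 4 3" are an assumption, and because FIELDWIDTHS is a GNU awk extension the sketch calls gawk explicitly and does nothing if gawk is not installed:

```shell
# FIELDWIDTHS is a gawk extension; skip the demo when gawk is unavailable.
if command -v gawk >/dev/null 2>&1; then
    fw_out=$(printf '1235.96521\n927-8.3652\n36257.8157\n' |
        gawk 'BEGIN {FIELDWIDTHS="3 4 3"} {print $1, $2, $3}')
    echo "$fw_out"
fi
```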
Suppose that your data are distributed on different lines like the following:
Person Name
123 High Street
(222) 466-1234

Another person
487 High Street
(523) 643-8754
In the above example, awk fails to process the fields properly because they are separated by
newlines, not spaces.
You need to set FS to the newline (\n) and RS to an empty string, so that blank lines are
treated as record separators.
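A sketch of that setup, using the address data from above and printing the name and phone number of each record:

```shell
# RS="" makes blank lines separate records; FS="\n" makes each line a field.
rs_out=$(printf 'Person Name\n123 High Street\n(222) 466-1234\n\nAnother person\n487 High Street\n(523) 643-8754\n' |
    awk 'BEGIN {FS="\n"; RS=""} {print $1, $3}')
echo "$rs_out"
```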
More Variables
There are some other variables that help you to get more information:
You can review the previous post on shell scripting to learn more about these variables.
The ENVIRON variable retrieves the shell environment variables like this:
$ awk '
BEGIN {
    print ENVIRON["PATH"]
}'
You can also pass bash variables to awk without the ENVIRON variable, like this:
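One way to do this is the -v option, which assigns a value to an awk variable before the program runs. The variable names and the path below are hypothetical:

```shell
# A plain shell variable, not exported to the environment.
user_dir="/home/alice"

# -v copies its value into the awk variable named home.
v_out=$(echo | awk -v home="$user_dir" '{print "My home is " home}')
echo "$v_out"
```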
The NF variable holds the number of fields in the current record, so you can reach the last field
without knowing its position:
NF can be used as a data field variable if you write it as $NF.
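A quick sketch with a made-up colon-separated record, printing the field count, the first field and the last field:

```shell
# NF is the number of fields; $NF is the value of the last field.
nf_out=$(printf 'alice:x:1000:1000:Alice:/home/alice:/bin/bash\n' | awk -F: '{print NF, $1, $NF}')
echo "$nf_out"
```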
Let’s look at an example that shows the difference between the FNR and NR variables:
$ awk '
BEGIN {FS=","}
{print $1, "FNR=" FNR, "NR=" NR}
END {print "Total", NR, "processed lines"}' myfile myfile
The FNR variable resets to 1 when awk reaches the second file, but the NR variable keeps counting across files.
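To see the counters side by side, here is a runnable sketch that passes the same two-line file twice. The file contents are made up:

```shell
nr_file=$(mktemp)
printf 'a,1\nb,2\n' > "$nr_file"

# FNR restarts for each file; NR counts every record read so far.
nr_out=$(awk 'BEGIN {FS=","}
{print $1, "FNR=" FNR, "NR=" NR}
END {print "Total", NR, "processed lines"}' "$nr_file" "$nr_file")
echo "$nr_out"

rm -f "$nr_file"
```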
User Defined Variables
You can also define your own variables in an awk script:
$ awk '
BEGIN {
    test = "Welcome to LikeGeeks website"
    print test
}'
Structured Commands
The awk scripting language supports the if conditional statement.
Assume that testfile contains the following:
10
15
33
45
$ awk '{
    if ($1 > 30)
    {
        x = $1 * 3
        print x
    }
}' testfile
You can use else statements like this:
$ awk '{
    if ($1 > 30)
    {
        x = $1 * 3
        print x
    } else
    {
        x = $1 / 2
        print x
    }
}' testfile
Or write the whole statement on one line, with a semicolon after the if branch, like this:
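A sketch of the one-line form, fed with the four sample numbers instead of reading testfile:

```shell
# The semicolon ends the if branch so that else can follow on the same line.
if_out=$(printf '10\n15\n33\n45\n' | awk '{if ($1 > 30) print $1 * 3; else print $1 / 2}')
echo "$if_out"
```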
While Loop
You can use the while loop to iterate over data while a condition holds.
The example below assumes that each line of testfile contains at least four numeric fields:
$ awk '{
    sum = 0
    i = 1
    # add fields $1 through $4
    while (i < 5)
    {
        sum += $i
        i++
    }
    average = sum / 4
    print "Average:", average
}' testfile
The while loop runs as long as i is less than 5. On each pass it adds the value of field $i to the sum variable and increments i.
You can exit the loop using break command like this:
$ awk '{
    tot = 0
    i = 1
    while (i < 5)
    {
        tot += $i
        # stop after adding the third field
        if (i == 3)
            break
        i++
    }
    average = tot / 3
    print "Average is:", average
}' testfile
The for Loop
The awk scripting language supports for loops:
$ awk '{
    total = 0
    for (var = 1; var < 5; var++)
    {
        total += $var
    }
    # four fields are summed, so divide by 4
    avg = total / 4
    print "Average:", avg
}' testfile
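To try a for loop without a data file, here is a sketch that averages the first four fields of two made-up input lines:

```shell
# Sum fields $1 through $4 on each line, then divide by the number of fields summed.
for_out=$(printf '10 20 30 40\n1 2 3 4\n' | awk '{
    total = 0
    for (var = 1; var < 5; var++)
        total += $var
    print "Average:", total / 4
}')
echo "$for_out"
```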
Formatted Printing
The printf command in awk lets you print formatted output using format specifiers, which are
written like this:
%[modifier]control-letter
This list shows the format specifiers you can use with printf:
Here is an example that prints a number in scientific notation:
$ awk 'BEGIN {
    x = 100 * 100
    printf "The result is: %e\n", x
}'
We are not going to try every format specifier; you get the idea.
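Still, a short sketch of a few common specifiers may help. The values are arbitrary: %s prints a string, %d an integer, %f a floating-point number and %e scientific notation, while the modifier sets the width, alignment or precision:

```shell
# %-6s: left-aligned string in 6 columns; %5d: integer in 5 columns; %.2f: two decimals.
printf_out=$(awk 'BEGIN {printf "%-6s|%5d|%.2f|%e\n", "awk", 42, 3.14159, 100 * 100}')
echo "$printf_out"
```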
Built-In Functions
Awk provides several built-in functions like:
Mathematical Functions
If you love math, you can use these functions in your awk scripts:
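As a small sketch, here are a few of the standard mathematical functions (sqrt, int, exp and log) applied to arbitrary values:

```shell
# sqrt: square root; int: truncate to integer; exp: e to the power; log: natural logarithm.
math_out=$(awk 'BEGIN {print sqrt(16), int(7.9), exp(0), log(1)}')
echo "$math_out"
```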
String Functions
There are many string functions. You can check the list, but we will examine one of them as an
example, and the rest work the same way:
The toupper function converts the passed string to upper case.
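A minimal sketch of toupper; the input string is a placeholder:

```shell
# toupper returns a copy of the string with lower-case letters converted to upper case.
upper_out=$(awk 'BEGIN {x = "likegeeks"; print toupper(x)}')
echo "$upper_out"
```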
User Defined Functions
You can define your own functions and call them from the script:
$ awk '
function myfunc()
{
    printf "The user %s has home path at %s\n", $1, $6
}
BEGIN {FS=":"}
{
    myfunc()
}' /etc/passwd
Here we define a function called myfunc, then use it in our script to print output with the printf
command.
Thank you.