
AWK

AWK is the Swiss army knife of the UNIX toolkit. The awk command made a late entry into
the UNIX system in 1977 to augment the toolkit with suitable report-formatting capabilities.
Named after its authors, Aho, Weinberger, and Kernighan, AWK was, until the advent of
Perl, the most powerful utility for text manipulation. AWK appears as GAWK
(GNU AWK) in Linux. The revised version of the language (often called nawk) became widely
available with SVR3.1. AWK works best on text files organised into records (lines) and
fields, such as the database file used in the examples here. Unlike other filters, AWK
operates at the field level. It can easily access, transform and format individual fields
in a record. It also accepts regular expressions for pattern matching, and has C-type
programming constructs, variables and several built-in functions.

AWK follows this general syntax:

awk options 'pattern {action}' file(s)

Every AWK command line starts with the word awk. The input is a text file of records, here
a database file. By default AWK works like a loop: it reads the input one record at a time
and applies the program to each record. The pattern is matched against every record of
input. It can be a regular expression, a relational expression or a logical expression.
The two special patterns are:
BEGIN
END
Now let us take an example to understand it better:
• To display details of all the directors.
$ awk -F "|" '/director/ {print}' emp.lst
The pattern section (/director/) selects the lines that are processed in the action
section ({print}). If the pattern is missing, the action applies to all lines of the file. If the
action is missing, the entire line is printed. Either the pattern or the action may be
omitted, but not both, and the program must be enclosed within a pair of single quotes.
The print statement, when used without any field specifiers, prints the entire line. A $
followed by a number in an AWK script refers to that particular field.

TIP: An AWK program must have either pattern or action, or both, but within
single quotes.
The following three forms can be considered equivalent:
• $ awk '/director/' emp.lst               # Printing is the default action.
• $ awk '/director/ { print }' emp.lst     # White space is permitted.
• $ awk '/director/ { print $0 }' emp.lst  # $0 is the complete line.
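The equivalence can be checked on a small file. The sketch below is only an illustration: the
contents of emp.lst are hypothetical, and the field layout
(id|name|designation|department|date of birth|salary) is an assumption inferred from the
field numbers used in this handout.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

# The same selection written in the three forms.
a=$(awk '/director/' "$dir/emp.lst")
b=$(awk '/director/ { print }' "$dir/emp.lst")
c=$(awk '/director/ { print $0 }' "$dir/emp.lst")

# Report whether the three outputs are identical, then show one of them.
[ "$a" = "$b" ] && [ "$b" = "$c" ] && echo "all three forms print the same lines"
printf '%s\n' "$a"
```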

SPLITTING A LINE INTO FIELDS


AWK uses the special variable $0 to indicate the entire line. It identifies the fields by
$1, $2, $3 and so on. Since these parameters also have a special meaning to the shell, single
quoting an AWK program protects them from interpretation by the shell.
Unlike other UNIX filters that operate on fields, AWK by default treats a contiguous sequence
of spaces and tabs as a single delimiter. But the sample database uses the |, so we must
use the -F option to specify it in our program.
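The difference between the default delimiter and -F can be seen in a short sketch. The
input lines here are made up for illustration only.

```shell
# Default splitting: any run of spaces and tabs counts as one delimiter.
second=$(printf 'alpha   beta\tgamma\n' | awk '{ print $2 }')
echo "$second"

# With -F "|" the pipe is the delimiter, so the name stays in one field.
name=$(printf '1001|anil kumar|director\n' | awk -F "|" '{ print $2 }')
echo "$name"

# Without -F the same line is split on the space inside the name instead.
wrong=$(printf '1001|anil kumar|director\n' | awk '{ print $2 }')
echo "$wrong"
```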
• To display name, designation, department and salary of all sales people.
$ awk -F "|" '/sales/ {print $2, $3, $4, $6}' emp.lst
• To display name, designation and salary of all employees whose salary > 7000.
$ awk -F "|" '($6 > 7000) {print $2, $3, $6}' emp.lst
To put extra space between two fields, place a quoted string of spaces between them.
• $ awk -F "|" '($6 > 7000) {print $2, " " $3, " " $6}' emp.lst
• Write an AWK script that prints the name and designation of all employees along with a
heading.
$ awk -F "|" 'BEGIN { print "NAME DESIGNATION" }
{ print $2, $3 }' emp.lst
The special patterns BEGIN and END separate parts of your AWK script from the
normal AWK loop that examines each line of input. BEGIN is executed before the input file
is parsed, so it is an ideal place to initialise variables. END is executed after the input
file is parsed, so it is an ideal place for a summary report.

TIP: A BEGIN or END pattern without any action makes no sense. Always place the
opening brace on the same line on which the section (BEGIN or END) begins.
• Display the total salary of all employees.
$ awk -F "|" 'BEGIN { total = 0 }
{ total = total + $6 }
END { print "Total salary is", total }' emp.lst
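The order in which the three sections run can be seen by combining a heading, a body and a
summary in one program. This is a minimal sketch on hypothetical data; the file contents and
the field layout (id|name|designation|department|date of birth|salary) are assumptions.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

report=$(awk -F "|" '
BEGIN { print "NAME DESIGNATION"; total = 0 }   # runs once, before any input
      { print $2, $3; total = total + $6 }      # runs for every record
END   { print "Total salary is", total }        # runs once, after the last record
' "$dir/emp.lst")
printf '%s\n' "$report"
```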

BUILT-IN FUNCTION AVAILABLE IN AWK


AWK has several built-in functions, performing both arithmetic and string operations. The
parameters are passed to a function in C style, delimited by commas and enclosed by a
matched pair of parentheses.
FUNCTION            DESCRIPTION
int(x)              Returns the integer value of x.
sqrt(x)             Returns the square root of x.
length(x)           Returns the length of x.
substr(s1,s2,s3)    Returns the portion of string s1 of length s3, starting from position s2.
index(s1,s2)        Returns the position of string s2 in s1.
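Each function in the table can be tried from a BEGIN section, which needs no input file.
The argument values below are arbitrary and chosen only for illustration.

```shell
result=$(awk 'BEGIN {
    print int(7.9)                   # fractional part is dropped
    print sqrt(16)                   # square root
    print length("director")         # number of characters in the string
    print substr("director", 1, 3)   # 3 characters starting at position 1
    print index("director", "rec")   # position where "rec" starts
}')
printf '%s\n' "$result"
```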

• To display all records whose length is greater than 60.
$ awk -F "|" 'length > 60 {print}' emp.lst
• To test the length of a particular field instead:
$ awk -F "|" 'length($3) < 20 {print}' emp.lst
• Display the salary of every employee whose salary is greater than 7000 and less
than 8000.
$ awk -F "|" '$6 > 7000 && $6 < 8000 {print $6}' emp.lst
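The sketch below runs a length test on one field and a combined relational test on made-up
data, so the effect of each pattern is visible. The file contents and the field layout
(id|name|designation|department|date of birth|salary) are assumptions.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

# Names of employees whose designation is shorter than 9 characters.
short=$(awk -F "|" 'length($3) < 9 { print $2 }' "$dir/emp.lst")
printf '%s\n' "$short"

# Both conditions must hold, so a salary of exactly 7000 is not selected.
mid=$(awk -F "|" '$6 > 7000 && $6 < 8000 { print $2, $6 }' "$dir/emp.lst")
printf '%s\n' "$mid"
```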

TIP: No type declarations or initial values are required for user-defined variables
used in an AWK program. AWK identifies their type and initialises them to zero or a
null string.
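A short sketch of this behaviour, using two hypothetical variable names (count and name) that
are never declared or assigned before use:

```shell
out=$(awk 'BEGIN {
    print count + 0        # unset variable used as a number
    print "[" name "]"     # unset variable used as a string
    count++                # incrementing works without any declaration
    print count
}')
printf '%s\n' "$out"
```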

BUILT-IN VARIABLES
AWK has several built-in variables. They are all assigned automatically, though it is also
possible for a user to reassign them.
VARIABLE    FUNCTION
NR          Cumulative number of records read.
FS          Input field separator.
OFS         Output field separator.
NF          Number of fields in the current record.
RS          Record separator.
FILENAME    Current input file.
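Three of these variables can be printed for every record to see how they change. This is a
sketch on hypothetical data; the file contents and the six-field layout are assumptions.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

# For each record: the file being read, the record number, the field count.
out=$(cd "$dir" && awk -F "|" '{ print FILENAME, NR, NF }' emp.lst)
printf '%s\n' "$out"
```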

For identification of fields, AWK uses a contiguous string of spaces and tabs as the default
field delimiter. FS redefines the field separator. When used at all, it must be set in the
BEGIN section so that the body of the program knows its value before it starts processing.
BEGIN { FS = "|" }
This is an alternative to the -F option of the command, which does the same thing.
The OFS variable: when you use the print statement with comma-separated arguments,
each argument is separated from the others by a space. This is AWK's default output
field separator, and it can be reassigned using the variable OFS in the BEGIN section.
BEGIN { OFS = "~" }
When you reassign this variable to ~, AWK uses this character to delimit the
print arguments.
FILENAME stores the name of the current file being processed.
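Setting both separators in the BEGIN section can be sketched as follows. The data is
hypothetical, and the field layout (id|name|designation|department|date of birth|salary) is
an assumption.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

# FS replaces the -F option; OFS is what print puts between its arguments.
out=$(awk 'BEGIN { FS = "|"; OFS = "~" }
/sales/ { print $2, $3, $6 }' "$dir/emp.lst")
printf '%s\n' "$out"
```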

TIP: Unlike most filters, which split fields on a single delimiter character, AWK uses
white space (any run of spaces and tabs) as the default delimiter.

POSITIONAL PARAMETERS
An AWK program can also use the shell's positional parameters, provided each one is placed
in its own pair of single quotes inside the program. The quotes let the shell substitute the
parameter, so that it is not mistaken for a field identifier. The entire AWK command should
then be stored in a shell script, and the parameter supplied as an argument to the script.
$ awk -F "|" 'BEGIN { print "Employee" }
($6 > 7500) {
count++
print $2, $3, $4 }
END { print "The count is", count }' emp.lst
So instead of having ($6 > 7500) as the line specifier, you should generalise it as
$6 > '$1'. If you now have the script empabs.awk containing the entire AWK command,
you can invoke the command with an argument, say 8000:
$ empabs.awk 8000
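Put together, the script might look like the sketch below. The contents of emp.lst are
hypothetical, the field layout (id|name|designation|department|date of birth|salary) is an
assumption, and the limit 7500 is chosen only so that the made-up data produces some output.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

# Write the script. The quotes around $1 close and reopen the awk program
# string, so the shell substitutes the argument before awk sees the program.
cat > "$dir/empabs.awk" <<'EOF'
#!/bin/sh
awk -F "|" 'BEGIN { print "Employee" }
$6 > '$1' {
    count++
    print $2, $3, $4
}
END { print "Count:", count }' emp.lst
EOF
chmod +x "$dir/empabs.awk"

# Run the script from the directory that holds emp.lst, with 7500 as the limit.
out=$(cd "$dir" && ./empabs.awk 7500)
printf '%s\n' "$out"
```

A different way to pass the value, not used in this handout, is awk's -v option
(awk -v limit="$1" '$6 > limit ...'), which avoids breaking the quoted program.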
MAJOR DRAWBACK OF AWK
AWK has no simple statement for reading input from the keyboard. To take input from the
terminal, AWK uses getline, which has several syntax forms. To read one line from the
terminal into a variable:
getline variable < "/dev/tty"
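A minimal sketch of getline reading a salary limit before the records are processed. So that
it can run unattended, the limit is read from a small file (answer.txt, a hypothetical
stand-in for the keyboard); the variable names src and limit, the sample data and the field
layout are likewise assumptions.

```shell
# Hypothetical sample data; the real emp.lst is not shown in this handout.
dir=$(mktemp -d)
cat > "$dir/emp.lst" <<'EOF'
1001|anil kumar|director|production|12/03/50|7000
1002|meena rao|manager|sales|19/04/63|7600
1003|ravi shah|director|sales|11/05/47|7800
1004|sara khan|executive|marketing|30/08/56|5400
EOF

# Stand-in for the keyboard. Set src=/dev/tty to type the limit instead.
printf '7500\n' > "$dir/answer.txt"

out=$(awk -F "|" -v src="$dir/answer.txt" '
BEGIN { getline limit < src }      # read one line from src into limit
$6 > limit { print $2, $6 }        # use the value like any other variable
' "$dir/emp.lst")
printf '%s\n' "$out"
```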

QUESTIONS FOR PRACTICE:


The file contains the following fields:
COUNTRY      POPULATION   AREA     CONTINENT
USSR         12548        86565    ASIA
CANADA       12598        56532    NORTH AMERICA
BRAZIL       54487        55452    AFRICA
FRANCE       98878        15457    EUROPE
ARGENTINA    69852        55645    SOUTH AMERICA

• Display names of all countries and continents.
• Display records of all countries in Asia.
• Display records of all countries whose area is > 25400.
• Display names of countries which are not in Asia.
• Display the total population of all the countries in North America.
