You are on page 1of 144

What is Awk

Awk is a programming language used for manipulating data and


generating reports

The data may come from standard input, one or more files, or as
output from a process

Awk scans a file (or input) line by line, from the first to the last line,
searching for lines that match a specified pattern and performing
selected actions (enclosed in curly braces) on those lines.

If there is a pattern with no specific action, all lines that match the
pattern are displayed;

If there is an action with no pattern, all input lines specified by the


action are executed upon.

Which Awk

The command is awk if using the old


version, nawk if using the new version, and
gawk is using the gnu version

Awks format

An awk program consists of:

the awk command

the program instructions enclosed in quotes (or a fle) , and

the name of the input fle

If an input fle is not specifed, input comes from


standard input (stdin), the keyboard

Awk instructions consists of

patterns,

actions, or

a combination of patterns and actions

A pattern is a statement consisting of an


epression of some type

Awks format (continue.)

Actions consist of one or more statements


separated by semicolons or new lines and
enclosed in curly braces

Patterns cannot be enclosed in curly braces, and


consist of regular epressions enclosed in
forward slashes or epressions consisting of one
or more of the many operators provided by awk

awk commands can be typed at the command


line or in awk script fles

The input lines can come from fles, pipes, or


standard input

Awks format (continue.)

!ormat"
nawk 'pattern' filename
nawk '{action}' filename
nawk 'pattern {action}' filename

Input from !iles

#ample $"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '/Tom/' employees
!om "ill# 4/12/45 913-92-4536 $102
$

Input from !iles (continue.)

#ample %"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '{print $1}' employees
Chen
!om
$arr#
"ill
'te(e

Awks format (continue.)

#ample &"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '/Steve/{print $1, $2}' employees
'te(e )nn

"he print function

"he default action is to print the lines that are


matched to the screen

"he print function can also be eplicitly used in


the action part of awk as {print}

"he print function accepts arguments as

variables,

computed values, or

string constants

#tring must be enclosed in double $uotes

%ommas are used to separate the arguments: if


commas are not pro&ided, the arguments are
concatenated together

"he print function (continue.)

The comma evaluates to the value of the


output feld separator (OFS), which is by
default a space

The output of the print function can be


redirected or piped to another program, and
another program can be piped to awk for
printing

"he print function (continue.)

#ample"
$ date
*ri *e+ 9 0,49,2& -'! 2001
$ date | nawk '{ print "Mont! " $2
""n#ear! ", $$}'
.onth, *e+
/ear, 2001

'scape se$uences

#scape sequences are represented by a


bac'slash and a letter or number
#scape
sequence
(eaning
)b *ac'space
)f !orm feed
)n +ewline
)r ,arriage return
)t Tab
)-./ 0ctal value ./, a single quote
)c c represents any other character,
e.g., )1

'scape se$uences (continue.)

#ample"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '/%nn/{print ""t"t&ave a nice day, " $1,
$2 ""'"}' employees &ave a nice day, Steve %nn'

"he printf !unction

The printf function can be used for formatting fancy


output

The printf function returns a formatted string to standard


output, li'e the printf statement in ,.

2nli'e the print function, printf does not provide a


newline. The escape, \n, must be provided if a newline is
desired

3hen an argument is printed, the place where the output


is printed is called the feld, and when the width of the
feld is the number of characters contained in that feld

"he printf !unction (continue.)

#ample $"
$ eco "()*+" | nawk ' {print, "|-.1/s|"n",
$1}'
01234 0
$ eco "()*+" | nawk '{print, "|-1/s|"n",
$1}'
0 12340
$

"he printf !unction (continue.)

#ample %"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '{print, "Te name is! -.1/s *0 is
-1d"n", $1, $2}' employees
!he name i5, Chen 36 i5 5
!he name i5, !om 36 i5 4
!he name i5, $arr# 36 i5 11
!he name i5, "ill 36 i5 1
!he name i5, 'te(e 36 i5 9
$

"he printf !unction (continue.)
%on&ersion
%haracter
(efnition
c ,haracter
s 4tring
d 5ecimal number
ld 6ong decimal number
u 2nsigned decimal
number
lu 6ong unsigned decimal
number

"he printf !unction (continue.)
%on&ersion %haracter (efnition
7eadecimal number
l 6ong headecimal number
o 0ctal number
lo 6ong octal number
e !loating point number in
scientifc notation (e8
notation)
f !loating point number
g !loating point number using
either e or f conversion,
whichever ta'es the least
space

"he printf !unction (continue.)
)odifer %haracter (efnition
8 6eft89ustifcation modifer
: ;ntegers in octal form are
displayed with a leading -<
integers in headecimal
form are displayed with a
leading -.
= !or conversions using d,e,f,
and g, integers are
displayed with a numeric
sign = or 8
- The displayed value is
padded with >eros instead of
white space

"he printf !unction (continue.)
Printf !ormat #pecifer What it (oes
?iven @AAB,y@$C,>@%.&, and D$ @ *ob 4mith"
Ec Prints a single A4,;; character.
printf("The character is %c\n",x) prints"
The character is A
Ed Prints a decimal number
printf("The !" is %d "ears !ld\n", ")
prints" The !" is #$ "ears !ld
Ee Prints the e notation of a number
printf("% is %f\n", %) prints" > is &'(e)*#
Ef Prints a Foating point number
printf("% is %e\n", %) prints" > is +',*****

"he printf !unction (continue.)
Printf !ormat
#pecifer
What it (oes
?iven @AAB,y@$C,>@%.&, and D$ @ *ob 4mith"
Eo Prints the octal value of a number
printf("" is %!\n",") prints" " is #,
Es Prints a string of characters
printf("The na-e !f the c.lprit is
%s\n", /#) prints" The na-e !f the
c.lprit is 0! S-ith
E Prints the hex 1al.e of a number
printf("" is %x\n", ") prints" " is F

awk commands from within
a
fle (continue.)

;f awk commands are placed in a fle, the


2f option is used with the name of the awk
fle, followed by the name of the input fle
to be processed

A record is read into awkGs buHer and each


of the commands in the awk fle are tested
and eecuted for that record

;f an action is not controlled by a pattern,


the default behavior is to print the entire
record

;f a pattern does not have an action


associated with it, the default is to print
the record where the pattern matches an
input line
awk commands from within a file
(continue.)

awk commands from within a
fle (continue.)

#ample"
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ cat awk,ile
/3Steve/{print "&ello Steve'"}
{print $1, $2, $2}
$ nawk ., awk,ile employees
Chen Cho 5/19/63
!om "ill# 4/12/45
$arr# %hite 11/2/54
"ill Clinton 1/14/60
7ello 'te(e8
'te(e )nn 9/15/1

*ecords

*y default, each line is called a rec!rd


and is terminated with a newline

"he *ecord #eparator

*y default, the output and input record


separator (line separator) is a carriage
return, stored in the built8in awk variables
O3S and 3S, respectively

The O3S and 3S values can be changed, but


only in a limited fashion

"he /* +ariable

An entire record is referenced as /* by awk

3hen /* is changed by substitution or


assignment, the value of 4F, the number of
felds, may be changed

The newline value is stored in awkGs built8in


variable 3S, a carriage return by default

"he /* +ariable (continue.)

#ample"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '{print $4}' employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&

"he 43 +ariable

The number of each record is stored in


awkGs built8in variable, 43

After a record has been processed, the


value of 43 is incremented by one

"he 43 +ariable (continue.)

#ample"
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '{print )5, $4}' employees
1 Chen Cho 5/19/63 203-344-1234 $6
2 !om "ill# 4/12/45 913-92-4536 $102
3 $arr# %hite 11/2/54 90&-65-23&9 $54
4 "ill Clinton 1/14/60 654-56-4114 $201
5 'te(e )nn 9/15/1 202-545-&&99 $5&

!ields

#ach record consists of words called felds


which, by default, are separated by white
space, that is, blan' spaces or tabs. #ach
of these words is called a feld, an awk
'eeps trac' of the number of felds in its
built8in variable, 4F

The value of 4F can vary from line to line,


and the limit is implementation8
dependent, typically $-- felds per line

!ields (continue.)

#ample $"
$1 $2 $3 $4 $5
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '{print )5, $1, $2, $/}' employees
1 Chen Cho $6
2 !om "ill# $102
3 $arr# %hite $54
4 "ill Clinton $201
5 'te(e )nn $5&

!ields (continue.)

#ample %"
nawk '{print $4, )6}' employees
Chen Cho 5/19/63 203-344-1234 $6 5
!om "ill# 4/12/45 913-92-4536 $102 5
$arr# %hite 11/2/54 90&-65-23&9 $54 5
"ill Clinton 1/14/60 654-56-4114 $201 5
'te(e )nn 9/15/1 202-545-&&99 $5& 5

"he Input !ield #eparator

awk's builtin !ariable, FS, holds the !alue of the


input field separator.

"hen the default !alue of #$ is used, awk separates


fields by spaces and%or tabs, stripping leading blan&s
and tabs

The FS can be changed by assigning new !alue to it,


either'
(
in a BEGIN statement, or
(
at the command line

"he Input !ield #eparator (continue.)

To change the !alue of FS at the command line, the F


option is used, followed by the character representing
the new separator

"he Input !ield #eparator (continue.)

)xample'
$ cat employees
Chen Cho,5/19/63,203-344-1234,$6
!om "ill#,4/12/45,913-92-4536,$102
$arr# %hite,11/2/54,90&-65-23&9,$54
"ill Clinton,1/14/60,654-56-4114,$201
'te(e )nn,9/15/1,202-545-&&99,$5&
$ nawk .6! '/Tom 7illy/{print $1, $2}'
employees
!om "ill# 4/12/45

,sing )ore than -ne !ield
#eparator

*ou may specify more than one input separator

If more than one character is used for the field


separator, #$, then the string is a regular expression
and is enclosed in s+uare brac&ets

)xample
$ nawk .6'8 !"t9' '{print $1, $2, $2}: employees
Chen Cho 5/19/63
!om "ill# 4/12/45
$arr# %hite 11/2/54
"ill Clinton 1/14/60
'te(e )nn 9/15/1

"he -utput !ield #eparator

The default output field separator is a single space and is


stored in awk's internal !ariable, OFS

The OFS will not be e!aluated unless the comma separates


the fields

)xample'
$ cat employees
Chen Cho,5/19/63,203-344-1234,$6
!om "ill#,4/12/45,913-92-4536,$102
$arr# %hite,11/2/54,90&-65-23&9,$54
"ill Clinton,1/14/60,654-56-4114,$201
'te(e )nn,9/15/1,202-545-&&99,$5&
$ nawk .6! '/Tom 7illy/{print $1 $2 $2 $;}'
employees
!om "ill#4/12/45913-92-4536$102

.atterns

, pattern consists of
(
a regular expression,
(
an expression resulting in a true or false condition, or
(
a combination of these

"hen reading a pattern expression, there is an implied


if statement

.atterns (continue.)

)xample'
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '/Tom/' employees
!om "ill# 4/12/45 913-92-4536 $102
$ nawk '$; < ;4' employees
Chen Cho 5/19/63 203-344-1234 $6
'te(e )nn 9/15/1 202-545-&&99 $5&

Actions

,ctions are statements enclosed within curly braces and


separated by semicolons

,ctions can be simple statements or complex groups of


statements

$tatements are separated


(
by semicolons, or
(
by a newline if placed on their own line

*egular 'pressions

, regular expression to awk is a pattern that consists of


characters enclosed in forward slashes

)xample -'

)xample .'
$ nawk '/Steve/' employees
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '/Steve/{print $1, $2}' employees
'te(e )nn

*egular 'pression
)eta characters
5 (atches at the beginning of string
/ (atches at the end of string
' (atches for a single character
6 (atches >ero or more of preceding
character
) (atches for one or more of preceding
character
7 (atches for >ero or one of preceding
character
8A09: (atches for any one character in the set of
characters, i.e., A,0, or 9

*egular 'pression )eta characters
(continue.)
85A09: (atches characters not in the set of
characters, i.e.,A,0 or 9
8A2;: (atches for any character in the
range from A to ;
A<0 (atches either A or 0
(A0)) (atches one or more sets of A0
\6 (atches for a literal asteris'
= 2sed in the replacement to represent
what was found in the search string

*egular 'pressions (continue.)

)xample /'

)xample 0'
$ nawk '/3Steve/' employees
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '/38%.=98a.>9? /' employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&

"he )atch -perator

The match operator, the tilde (1), is used to match an


expression within a record or a field

)xample -'
$ cat employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&
$ nawk '$1 @ /87A9ill/' employees
"ill Clinton 1/14/60 654-56-4114 $201

"he )atch -perator (continue.)

)xample .'
$ nawk '$1 '@ /lee$/' employees
Chen Cho 5/19/63 203-344-1234 $6
!om "ill# 4/12/45 913-92-4536 $102
$arr# %hite 11/2/54 90&-65-23&9 $54
"ill Clinton 1/14/60 654-56-4114 $201
'te(e )nn 9/15/1 202-545-&&99 $5&

awk %ommands in a #cript !ile

"hen you ha!e multiple awk pattern%action statements,


it is often easier to put the statements in a script

The script file is a file containing awk comments and


statements

If statements and actions are on the same line, they are


separated by semicolons

2omments are preceded by a pound (3) sign



awk %ommands in a #cript !ile
(continue.)

)xample'
$ cat employees
Chen Cho,5/19/63,203-344-1234,$6
!om "ill#,4/12/45,913-92-4536,$102
$arr# %hite,11/2/54,90&-65-23&9,$54
"ill Clinton,1/14/60,654-56-4114,$201
'te(e )nn,9/15/1,202-545-&&99,$5&
$ cat in,o
9 .# fir5t awk 5cript +# )+:el5hako;r )+;<nei:
9 'cript name, info= 6ate, *e+r;ar# 09> 2001
/!om/{print ?!om'5 +irth:a# i5 ?$3}
/"ill/{print 2@> $0}
/A'te(e/{print ?7i 'te(eB ? $1 ? ha5 a 5alar# of ? $4
?B?}
9-n: of info 5cript

awk %ommands in a #cript !ile
(continue.)

)xample (continue.)'

To !iew info script, clic& here


$ nawk .6! ., in,o employees
!om'5 +irth:a# i5 913-92-4536
2 !om "ill#,4/12/45,913-92-4536,$102
4 "ill Clinton,1/14/60,654-56-4114,$201
7i 'te(eB 'te(e )nn ha5 a 5alar# of $5&B


%omparison 'pressions

2omparison expressions match lines where if the


condition is true, the action is performed

The !alue of the expression e!aluates true, and 0 if false

2omparison expressions match lines where if the


condition is true, the action is performed

The !alue of the expression e!aluates true, and 4 if false



*elational -perators
0perato
r
(eaning #ampl
e
> 6ess than I y
>? 6ess than or equal to I@ y
?? #qual to @@ y
@? +ot equal to J@ y
A? ?reater than or equal to K@ y
A ?reater than K y
B (atched by regular epression L M yM
@B +ot matched by regular
epression
JL MyM

*elational -perators (continue.)

)xample'
$ cat employees
Chen Cho 5/19/63 203-344-1234 6
!om "ill# 4/12/45 913-92-4536 102
$arr# %hite 11/2/54 90&-65-23&9 54
"ill Clinton 1/14/60 654-56-4114 201
'te(e )nn 9/15/1 202-545-&&99 5&
$ nawk '$/ BB 241' employees
"ill Clinton 1/14/60 654-56-4114 201
$ nawk '$/ C 144' employees
!om "ill# 4/12/45 913-92-4536 102
$ nawk '$2 @ /%nn/ ' employees
'te(e )nn 9/15/1 202-545-&&99 5&

*elational -perators (continue.)

)xample (continue)'
$ nawk '$2 '@ /%nn/ ' employees
Chen Cho 5/19/63 203-344-1234 6
!om "ill# 4/12/45 913-92-4536 102
$arr# %hite 11/2/54 90&-65-23&9 54
"ill Clinton 1/14/60 654-56-4114 201

%onditional 'pressions

, conditional expression uses two symbols, the


+uestion mar& and the colon, to e!aluate expression

#ormat'
con:itional eCpre55ion1 D eCpre55ion2 , eCpre55ion3

%onditional 'pressions (continue.)

)xample'
$ nawk '{maDBE$1 C $2F G $1 ! $2H print maD}' employees
Cho
!om
%hite
Clinton
'te(e

%omputation

awk performs all arithmetic in floating point


-perator )eaning 'ample
) Add = y
2 4ubstract N y
6 (ultiply O y
C 5ivide M y
% (odulus E y
5 #ponentiati
on
P y

%omputation (continue.)

)xample'
$ nawk '$2 I $; C /44' ,ilename

%ompound .atterns

2ompounds patterns are expressions that combine


patterns with logical operators
0perator (eaning #ample
QQ 6ogical A+5 a QQ b
RR 6ogical 0S a RR b
J +0T Ja

%ompound .atterns (continue.)

)xample '
$ nawk '$2 C / JJ $2 <B 1/' employees
$
$ nawk '$/ BB 1444 || $2 C /4' employees
'te(e )nn 9/15/1 202-545-&&99 5&
$

*ange .atterns

5ange patterns match from the first occurrence of one


pattern to the first occurrence of the second pattern, then
match for the next occurrence of the second pattern, etc

If the first pattern is matched and the second pattern is


not found, awk will display all lines to the end of the file

)xample '
$ nawk '/Tom/,/Steve/' employees
!om "ill# 4/12/45 913-92-4536 102
$arr# %hite 11/2/54 90&-65-23&9 54
"ill Clinton 1/14/60 654-56-4114 201
'te(e )nn 9/15/1 202-545-&&99 5&
$


/umeric and #tring %onstants

6umeric constants can be represented as


(
Integer li&e .0/
(
#loating point numbers li&e /.-0, or
(
6umbers using scientific notation li&e .7./)- or
/.0

$trings, such as Hello are enclosed in double +uotes



Initiali0ation and "ype %oercion

, !ariable can be
(
a string
(
a number, or
(
both

"hen it is set, it becomes the type of the expression


on the righthand side of the e+ual sign

Initiali8ed !ariables ha!e the !alue 8ero or the !alue


9 9, depending on the context in which they are used

,ser1(efned +ariables

:serdefined !ariables consist of letters, digits, and


underscores, and cannot begin with a digit

;ariables in awk are not declared

If the !ariable is not initiali8ed, awk initiali8es string


!ariables to null and numeric !ariables to 8ero

;ariables are assigned !alues with awk's assignment


operators

)xample '
$ nawk '$1 @ /Tom/ {waKe B $/ I ;4H print
waKe}' employees
40&0

Increment and (ecrement
-perators

The expression x++ is e+ui!alent to x=x+1

The expression x( is e+ui!alent to x=x1

*ou can use the increment and decrement operators either


preceding operator, as in ++x, or after the operator, as x++
{x = 1;y = x++; print x, y}
name=Nancy name is string
x++ x is a number; x is
initialized to zero and
incremented by 1
number=3 number is a number

,ser1(efned +ariables at the
%ommand line

, !ariable can be assigned a !alue at the command


line and passed into awk script

)xample'
$ nawk F: -f awkscript month=4 year=1999
filename

"he v -ption (nawk)

The ! option pro!ided by nawk allows command line


arguments to be processed within a <)=I6 statement

#or each argument passed at the command line, there


must be a (! option preceding it

2uilt1in +ariables

<uiltin !ariables ha!e uppercase names. They can be


used in expressions and can be reset
+ariable
/ame
+ariable %ontents
A3D9 +umber of command line argument
A3DE Array of command line arguments
FFGH4AIH +ame of current input fle
F43 Secord number in current fle
FS The input feld separator, by default a
space

2uilt1in +ariables (continue.)
+ariable /ame +ariable %ontents
4F +umber of felds in current record
43 +umber of record so far
OFIT 0utput format for numbers
OFS 0utput feld separator
O3S 0utput record separator
3GH4DTJ 6ength of string matched by -atch
function
3S ;nput record separator
3STA3T 0Hset of string matched by -atch function
SK0SHP 4ubscript separator

2uilt1in +ariables (continue.)

)xample'
$ nawk .6! '$1 BB "Steve %nn"{print )5, $1, $2,
$)6}' employees2
5 'te(e )nn 9/15/1 $5&

0HDF4 .atterns

The BEGIN pattern is followed by an action bloc&


that is executed before awk processes any lines from the
input file

The BEGIN action is often used to change the !alue


of the builtin !ariables, OFS, "S, FS, and so forth, to
assign initial !alues to userdefined !ariables, and to
print headers or titles as part of the output

0HDF4 .atterns (continue.)

)xample - '
$ nawk '7LM*){6SB"!"H N6SB""t"H N5SB""n"n"}
{print $1,$2,$2}' employees2
Chen Cho 5/19/63 203-344-1234
!om "ill# 4/12/45 913-92-4536
$arr# %hite 11/2/54 90&-65-23&9
"ill Clinton 1/14/60 654-56-4114
'te(e )nn 9/15/1 202-545-&&99
$

0HDF4 .atterns (continue.)

)xample . '
$ nawk '7LM*){print "Make #ear"}'
.ake /ear

H4L .atterns

EN# patterns do not match any input lines, but executes any
actions that are associated with the EN# pattern. EN# patterns
are handled af$er all lines of input ha!e been processed

)xamples'
$ nawk 'L)0{print "Te nOmAer o, records is "
)5 }' employees
!he n;m+er of recor:5 i5 5
$ nawk '/Steve/{coOnt??}L)0{print "Steve was ,oOnd
" coOnt " timesP"}' employees
'te(e wa5 fo;n: 1 time5B
$

-utput *edirection

"hen redirecting output from within awk to a %NI& file, the


shell redirection operators are used

The filename must be enclosed in double +uotes

>nce the file is opened, it remains opened until explicitly closed


or the awk program terminates

)xample'
$ nawk '$5 >= 70 {print $1, $2 > "passingfile" !' emplo"ees
$ cat passingfile
2hen 2ho
Tom <illy
<ill 2linton

"he getline !unction

5eads input from


(
The standard input,
(
a pipe, or
(
a file other than from the current file being processed

It gets the next line of input and sets the NF, N" and the
FN" builtin !ariables

The ge$line function returns


(
1 if a record is found
(
0 if )># (end of file)
(
1 if there is an error

"he getline !unction (continue.)

)xamples '
$ nawk '7LM*){ "date" | Ketline dH print d}'
employees2
*ri *e+ 9 09,39,53 -'! 2001
$ nawk '7LM*){ "date" | Ketline dH splitE d, monFH
print mon829}' employees
*e+
$ nawk '7LM*){wileE"ls" | KetlineF print}
1234
(arfile
(arfile2
(arfile3
(arfile4
(arfile5
(arfile6

"he getline !unction (continue.)

)xamples '
$ nawk '7LM*){ print "Qat is yoOr nameG" H"
C Ketline name < "/dev/tty"}"
C $1 @ name {print "6oOnd " name " on line ",
)5 "P"}"
C L)0{print "See ya, " name "P"}' employees
%hat i5 #o;r nameD
a+:;l
'ee #a> a+:;lB
$

.ipes

If you open a pipe in an awk program, you must


close it before opening another one

The command on the righthand side of the pipe


symbol is enclosed in double +uotes

.ipes (continue.)

)xample'
$ cat names
Ehon 5mith
alice che+a
ton# tram
:an 5a(aFe
eli<a Fol:+orF
$ nawk '{print $1, $2 | "sort .r ?1 .2 ?4 .1
"}' names
ton# tram
Ehon 5mith
:an 5a(aFe
eli<a Fol:+orF
alice che+a
$

%losing !iles and .ipes

The pipe remains opened until awk exits

$tatements in the EN# bloc& will also be affected by


the pipe. The first line in the EN# bloc& closes the pipe

)xample'
! "n script#
{ print $1, $%, $3 & sort 'r +1 '% +( '1}
)N*{
+lose!sort 'r +1 '% +( '1#
,rest o- statement. }

"he #ystem !unction

The builtin s's$e( function ta&es a :6I? (operating


system command) command as its argument, executes the
command, and returns the exit status to the awk program

The :6I? command must be enclosed in double


+uotes

)xample'
! "n script#
{
/ystem ! cat $1 #
/ystem ! clear #
}

Ff #tatement

#ormat'
If )expression* +
s$a$e(en$, s$a$e(en$, -
.

FfCelse #tatement

#ormat'
+If )expression* +
s$a$e(en$, s$a$e(en$, -
.
else +
s$a$e(en$, s$a$e(en$, -
.
.

FfCelse #tatement

)xample'
$ nawk '{i,E$$ C /4F print $1 "Too iK"H "
C else print "5anKe is NR"}' names
@anFe i5 GH
@anFe i5 GH
@anFe i5 GH
@anFe i5 GH
@anFe i5 GH
$

FfCelse else if #tatement

#ormat'
+ If )expression* +
s$a$e(en$, s$a$e(en$, -
.
else if )expression* +
s$a$e(en$, s$a$e(en$, -
.
else if )expression* +
s$a$e(en$, s$a$e(en$, -
.
else +
s$a$e(en$, s$a$e(en$, -
.
.

3oops

@oops are used to iterate through the field within a


record and to loop through the elements of an array in
the EN# bloc&

Mhile 3oop

The first step in using a w/ile loop is to set a !ariable


to an initial !alue

The 0o1w/ile loop is similar to the while loop, except


that the expression is not tested until the body of the
loop is executed at least once

Mhile 3oop (continue.)

)xample'
$ nawk '{ i B 1H wile Ei <B )6 F { print )6,
$iH i??}}' names
2 Ehon
2 5mith
2 alice
2 che+a
2 ton#
2 tram
2 :an
2 5a(aFe
2 eli<a
2 Fol:+orF
$

f!r 3oop

for loop re+uires three expressions within the


parentheses' the initiali8ation expression, the test
expression and the expression to update the !ariables
within the test expression

The first statement within the parentheses of the for


loop can perform only one initiali8ation

f!r 3oop (continue.)

)xample'
$ nawk '{ i B 1H wile Ei <B )6 F { print )6,
$iH i??}}' names
2 Ehon
2 5mith
2 alice
2 che+a
2 ton#
2 tram
2 :an
2 5a(aFe
2 eli<a
2 Fol:+orF
$

reak and c!ntin.e #tatement

The 2reak statement lets you brea& out of a loop if a


certain condition is true

The 3on$inue statement causes the loop to s&ip any


statement that follow if a certain condition is true, and returns
control to the top of the loop, starting at the next iteration

)xample'
(In $cript)
A if (B- CeterD AnextDD
else AprintD
D

next #tatement

The nex$ statement gets the next line of input from the
input file, restarting execution at the top of the awk script

)xample'
(In $cript)
- Afor ( x E /; x FE 6#; xGG)
if ( Bx F 4) Aprint H<ottomed outIJ; #reakD
3 brea&s out of the loop
D
. Afor ( x E /; x FE 6#; xGG )
if ( Bx EE 4 ) A print H=et next itemJ; contin$eD
3 starts next iteration of the for loop
D

exit #tatement

The exi$ statement is used to terminate the awk


program. It stops processing records, but does not s&ip
o!er an EN# statement

If the exit statement is gi!en a !alue between 4 and


.KK as an argument (exit -), this !alue can be printed at the
command line to indicate success or failure by typing'

exit #tatement (continue.)

)xample'
)In S3rip$*
+exi$ )1*.
)4/e 5o((an0 6ine*
7 e3/o 8s$a$us )3s/*
1
8 e3/o 89 )s/1ks/*
1

Arrays

,rrays in awk are called asso3ia$i!e arrays because



the subscripts can be either
(
number, or
(
string

The &eys and !alues are stored internally is a table


where a hashing algorithm is applied to the !alue of
the &ey in +uestion

,n array is created by using it, and awk can infer


whether or not is used to store numbers or strings

Arrays (continue.)

,rray elements are initiali8ed with


(
numeric !alue, and
(
string !alue null

*ou do not ha!e to declare the si8e of an aw array

awk arrays are used to collect information from


records and may be used for accumulating totals,
counting words, trac&ing the number of times a
pattern occurred

Arrays (continue.)

)xample'
$ cat employees
Chen Cho 5/19/63 203-344-1234 6
!om "ill# 4/12/45 913-92-4536 102
$arr# %hite 11/2/54 90&-65-23&9 54
"ill Clinton 1/14/60 654-56-4114 201
'te(e )nn 9/15/1 202-545-&&99 5&
$ nawk '{name8D??9B$2}HL)0{,orEiB4H i<)5Hi??F"
C print i, name8i9}' employees
0 Cho
1 "ill#
2 %hite
3 Clinton
4 )nn

Arrays (continue.)

)xample'
$ nawk '{id8)59B$2}HL)0{,orED B 1H D<B )5H
D??F"
C print id8D9}' employees
5/19/63
4/12/45
11/2/54
1/14/60
9/15/1
$

"he special f!r 3oop

The special for loop is used to read through an


associati!e array when strings are used as subscripts or the
subscripts are not consecuti!e numbers

"hen strings are used as subscripts or the subscripts


are not consecuti!e numbers

"he special f!r 3oop (continue.)

)xample'
$ cat dA
!om Ione5
.ar# ):am5
'all# ChanF
"ill# "lack
!om 'a(aFe
$ nawk '/3Tom/{name8)59B$1}H"
C L)0{,orE iB1H i <B )5H i?? Fprint name8i9}' dA
!om
!om
$

"he special f!r 3oop (continue.)

)xample'
$ nawk '/3Tom/{name8)59B$1}H"
C L)0{,orEi in nameF{print name8i9}}' dA
!om
!om
$

,sing #trings as Array #ubscripts

, subscript may consist of a !ariable containing a string or


literal string

If the string is a literal, it must be enclosed in double


+uotes

)xample
$ cat dA
!om Ione5
.ar# ):am5
'all# ChanF
"ill# "lack
!om 'a(aFe
$ nawk ., awkscript dA
!here are 2 !om'5 in the file an: 1 .ar#'5 in the fileB

,sing !ield +alues as Array
#ubscripts (continue.)

,ny expression can be used as a subscript in an array.


Therefore, fields can be used

)xample -'
$ cat dA
!om Ione5
.ar# ):am5
'all# ChanF
"ill# "lack
!om 'a(aFe
$ nawk '{coOnt8$29??}L)0{,orEname in coOntFprint name,
coOnt8name9 }' dA
ChanF 1
"lack 1
Ione5 1
'a(aFe 1
):am5 1
$

,sing !ield +alues as Array
#ubscripts (continue.)

)xample .'
$ cat dA
!om Ione5
.ar# ):am5
'all# ChanF
"ill# "lack
!om 'a(aFe
$ nawk '{coOnt8$29??}L)0{,orEname in coOntFprint name,
coOnt8name9 }' dA
ChanF 1
"lack 1
Ione5 1
'a(aFe 1
):am5 1
$

Arrays and the split !unction

awkLs builtin spli$ function allows you to split string


into words and store them in an array

*ou can define the field separator or use the !alue


currently stored in FS

#ormat'
split!string, array, -ield separator#
split!string, array#

"he delete !unction

The 0ele$e function remo!es an array elements



)ultidimensional Arrays: 4awk

Multidimensional array is done by concatenating the


indices into a string separated by the !alue of a special
builtin !ariable, S%BSE:

The $:<$)C !ariable contains the !alue HN4/0J, an


unprintable character

The expression (a$rix;<=>? is really the array


(a$rix;< S%BSE: >? which e!aluates to (a$rix;@<A0BC>D?

A3DE

2ommand line arguments are a!ailable to nawk with


the builtin array called A"GE

These arguments include the command nawk, but not


any of the options passed to nawk

The index of the ,5=; array starts at 8ero



A3D9

A"G5 is a builtin !ariable that contains the number of


command line arguments

)xample'
$ cat myscript
STis script is called myscript
7LM*){
,or E i B 4H i< %5MTH i??F{
print,E"arKv8-d9 is -s"n", i,
%5MU8i9F
}
print,E"Te nOmAer o, arKOments, %5MTB-d"n",
%5MTF
}

A3D9 (continue.)

)xample '
$ cat %5MUS
STis script is called arKvs
7LM*){
,or E iB4H i < %5MTH i?? F {
print,E"arKv8-d9 is -s"n", i, %5MU8i9F
}
print,E"Te nOmAer o, arKOments, %5MTB-d"n", %5MTF
}
$
$ nawk ., %5MUS data,ile
arF(J0K i5 nawk
arF(J1K i5 :atafile
!he n;m+er of arF;ment5> )@LCM2
$

A3D9 (continue.)

)xample '
$ nawk ., %5MUS data,ile "Veter Van" 12
arF(J0K i5 nawk
arF(J1K i5 :atafile
arF(J2K i5 Neter Nan
arF(J3K i5 12
!he n;m+er of arF;ment5> )@LCM4
$

A3D9 (continue.)

)xample '
$ cat arKinK
STis script is called arKin
7LM*){6SB"!"H nameB%5MU829
print "%5MU829 is "%5MU829
}
$1 @ name { print $4 }
$ nawk ., arKinK employees2 "Ten To"
)@LOJ2K i5 Chen Cho
Chen Cho,5/19/63,203-344-1234,$6
nawk, can't open file Chen Cho
inp;t recor: n;m+er 5> file Chen Cho
5o;rce line n;m+er 1

A3D9 (continue.)

)xample '
$ cat arKinK1
STis script is called arKin
7LM*){6SB"!"H nameB%5MU829
print "%5MU829 is "%5MU829
delete %5MU829
}
$1 @ name { print $4 }
$ nawk ., arKinK1 employees2 "Ten To"
)@LOJ2K i5 Chen Cho
Chen Cho,5/19/63,203-344-1234,$6

"he s. and gs. !unctions

The su2 function matches the regular expression for


the largest and leftmost substring in the record, and
replaces that substring with the substitution string

If a target string is specified, the regular expression is


matched for the largest and leftmost substring in the target
string, and the substring is replaced with the substitution
string

If a target string is not specified, the entire record is


used

"he s. and gs. !unctions (continue.)

#ormat'
5;+ PreF;lar eCpre55ion> 5;+5tit;tion 5trinFQ=
5;+ PreF;lar 5trinF> 5;+5tit;tion 5trinF>
tarFet 5trinFQ

#ormat'
F5;+ PreF;lar eCpre55ion> 5;+5tit;tion 5trinFQ
5;+ PreF;lar 5trinF> 5;+5tit;tion 5trinF>
tarFet 5trinFQ

"he index !unction

The in0ex function returns the first position where a


substring is found in a string

>ffset starts at position -



"he length !unction

The leng$/ function returns the number of


characters in a string

"ithout an argument, the leng$/ function returns the


number of characters in a record

#ormat'
lenFth P5trinFQ
lenFth

"he length !unction

)xample
$ nawk '{ print lenKtE"ello"F }' employees
5
5
5
5
5
$

"he s.str !unction

The su2s$r function returns the substring of a string


starting at a position where the first position is one

If the length of the substring is gi!en, that part of the


string is returned

If the specified length exceeds the actual string, the


string is returned

#ormat'
5;+5tr P5trinF> 5tartinF po5itionQ
5;+5tr P5trinF> 5tartinF po5ition>
lenFth of 5trinF

"he -atch !unction

The (a$3/ function returns the index where the


regular expression is found in the string, or 8ero if not
found

The match function sets the builtin !ariable "S4A"4


to the starting position of the substring within the string,
and "6ENG4H to the number of characters to the end of
the substring

#ormat'
match P5trinF> reF;lar eCpre55ionQ

"he -atch !unction (continue.)

)xample '
$ nawk 'L)0{startBmatcE"Mood ole (S%", /8%.=9
?$/FH"
C print 5ST%5T, 5WL)MT&}' employees
10 3

"he split !unction

The spli$ function splits a string into an array using


whate!er field separator is designated as the third
parameter

If the third parameter is not pro!ided, awk will use


the current !alue of #$

#ormat'
5plit P5trinF> arra#> fiel: 5eparatorQ
5plit P5trinF> arra#Q

"he sprintf !unction

The sprin$f function returns an expression in a


specified format. It allows you to apply the format
specifications of the prin$f function

#ormat'
Oaria+leM5printfPR5trinF with format
5pecifier5S eCpr1> eCpr2> T> eCprnQ

"he sprintf !unction (continue.)

)xample '
$ awk '{line B sprint, E "-.1/s -$P2, ", $1 ,
$2 FH"
C print line}' employees
Chen 0B00
!om 0B00
$arr# 0B00
"ill 0B00
'te(e 0B00

2uilt1in Arithmetic
!unctions
/ame +alue returned
atan%(,y) Arctangent of yM in the
range
cos() ,osine of , with in
radians
ep() #ponential function of ,
e
int() ;nteger part of <
truncated toward - when
K-
log() +atural (ase e) logarithm
of
rand() Sandom number r, where
-IrI$

2uilt1in Arithmetic !unctions
(continue.)
/ame +alue returned
sin() 4ine of , with radians
sqrt() 4quare root of
srand() T is a new seed for
rand()

Fnteger !unction

The in$ function truncates any digits to the right of


the decimal point to create a whole number. There is no
rounding off

"he rand !unction

The ran0 function generates a pseudorandom floating point


number greater than or e+ual to 8ero and less than one

)xample
$ nawk '{print randEF}' employees
0B513&1
0B1526
0B30&634
0B534532
0B9463
$ nawk '{print randEF}' employees
0B513&1
0B1526
0B30&634
0B534532
0B9463

"he srand !unction

The sran0 function without an argument uses the time of day to


generate the seed for the ran0 function

sran0)x* uses x as the seed. 6ormally, x should !ary during the run
of the program

)xample
$ nawk '7LM*){srandEF}H{print randEF}' employees
0B54&53
0B392254
0B9242
0B&2149
0B15322
$

"he srand !unction (continue.)

)xample
$ nawk '7LM*){srandEF}H{print randEF}' employees
4P/;1X/2
4P2Y22/;
4PYX2;X2
4P121;YX
4P1/2X22
$ nawk '7LM*){srandEF}H{print randEF}' employees
4P44/2X12$
4P212X1;
4P22X22
4P122422
4P1X24;

,ser1(efned !unctions (nawk)

, userdefined function can be placed anywhere in


the script that a pattern action rule can

#ormat
f;nction name Pparameter> parameter>
parameter> TQ{
5tatement5
ret;rn eCpre55ion
Pthe ret;rn 5tatement an: eCpre55ion
are optionalQ
}

,ser1(efned !unctions (nawk)
(continue.)

;ariables are passed by !alue and are local to the


function where they are used

,rrays are passed by address or by references, so


array
elements can be directly changed within the function

,ny !ariable used within the function that has not


been passed in the parameter list is considered a global
!ariable that is, it is !isible to the entire awk program,
and if changed in the function, is changed throughout
the program

,ser1(efned !unctions (nawk)
(continue.)

The only way to pro!ide local !ariables within a function


is to include them in the parameter list

If there is not a formal parameter pro!ided in the function


call, the parameter is initially set to null

The return statement returns control and possibly a !alue


to the caller

)xample
$ cat Krades
44 55 66 22 99
100 22 99 33 66
55 66 100 99 && 45
$

,ser1(efned !unctions (nawk)
(continue.)

)xample
$ cat sorter
SScript is called sorter
S *t sorts nOmAers in ascendinK order
,Onction sort E scores, nOmZelements, temp, i, [ F
{
S temp, i, and [ will Ae local and private,
Swit an initila valOe o, nOllP
,orE i B 2H i <B nOmZelements H i?? F
{
,or E [ B iH scores 8[.19 C scores8[9H ..[F
{
temp B scores8[9
scores8[9 B scores8[.19
scores8[.19 B temp
}
}
}

,ser1(efned !unctions (nawk)
(continue.)

)xample

To !iew sorter script, clic& here


{,or E i B 1H i <B )6H i??F
Krades8i9B$i
sortEKrades, )6F
,or E[ B 1H [ <B )6H [?? F
print,E "-d ", Krades8[9 F
print,E""n"F
}
$ nawk ., sorter Krades
22 44 55 66 99
22 33 66 99 100
45 55 66 && 99 100

"he s.str function

In the following example, the fields are of fixed


width, but are not separated by a field separator. The
su2s$r function is used to create field

'mpty !ields

If the data is stored in fixedwidth fields, it is possible that


some of the fields are empty. In the following example, the
su2s$r function is used to preser!e the fields, whether or not
they contain data

)xample
$ cat ,ile
CCC CCC
CCC a+c CCC
CCC a +++
CCC CC

'mpty !ields (continue.)
$ cat awk,iD
SVreservinK empty ,ieldsP 6ield widt is ,iDedP
{
,819BsOAstrE$4,1,2F
,829BsOAstrE$4,/,2F
,829BsOAstrE$4,Y,2F
lineBsprint,E"-.;s-.;s-.;s"n", ,819, ,829, ,829F
print line
}
$ nawk ., awk,iD ,ile
CCC CCC
CCC a+c CCC
CCC a +++
CCC CC

'mpty !ields (continue.)

To !iew aw&fix script, clic& here



Seferences

2+;T 47#664 *U #TA(P6# *U #66;#


V2;?6#U

2+;T !0S PS0?SA((#S4 A+5


24#S4 *U ?. ?6A44 A+5 W A*6#4

2+;T 47#66 PS0?SA((;+? *U 4.


W0,7A+ A+5 P. 3005

You might also like