
awk

data transformation, report generation language


Command
SYNOPSIS
awk [-F ere] [-f prog] [-v var=value ...] [program] [var=value ...] [file ...]
DESCRIPTION
awk is a file-processing language which is well suited to data manipulation and
retrieval of information from text files. This reference page provides a full
technical description of awk. If you are unfamiliar with the language, you may
find it helpful to read the online AWK Tutorial before reading the following
material.
An awk program consists of any number of user-defined functions and rules in the
form:
pattern {action}
There are two ways to specify the awk program:
Directly on the command line. In this case, the program is a single command line
argument, usually enclosed in apostrophes (') to prevent the shell from
attempting to expand it.
By using the -f prog option.
You can only specify program directly on the command line if you do not use any
-f prog arguments.
When you specify files on the command line, those files provide the input data
for awk to manipulate. If you specify no such files or you specify - as a file,
awk reads data from the standard input.
You can initialize variables on the command line using
var=value
You can intersperse such initializations with the names of input files on the
command line. awk processes initializations and input files in the order they
appear on the command line. For example, the command
awk -f progfile a=1 f1 f2 a=2 f3
sets a to 1 before reading input from f1 and sets a to 2 before reading input
from f3.
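The ordering can be checked with a small sketch. The three input files below are throwaway files created in a scratch directory purely for illustration, and the one-line program simply prints each record next to the current value of a.

```shell
# Sketch: three one-line input files in a temporary directory (illustrative names).
dir=$(mktemp -d)
echo x > "$dir/f1"; echo y > "$dir/f2"; echo z > "$dir/f3"

# a is 1 while f1 and f2 are read, and 2 while f3 is read.
out=$(awk '{ print $0, a }' a=1 "$dir/f1" "$dir/f2" a=2 "$dir/f3")
echo "$out"
rm -rf "$dir"
```

This should print x 1, y 1 and z 2 on separate lines.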
Variable initializations that appear before the first file on the command line
are performed immediately after the BEGIN action. Initializations appearing after
the last file are performed immediately before the END action. For more
information on BEGIN and END, see Patterns.
The -v option lets you assign a value to a variable before the awk program
begins running (that is, before the BEGIN action). For example, in
awk -v v1=10 -f prog datafile
awk assigns the variable v1 its value before the BEGIN action of the program
(but after default assignments made to built-in variables like FS and OFMT;
these built-in variables have special meaning to awk, as described in later
sections).
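A minimal sketch of the difference between -v and a var=value operand, as seen from inside BEGIN. The variable name v1 follows the example above; the bracketed output format is only an illustration.

```shell
# With -v, the value is already set when the BEGIN action runs.
with_v=$(awk -v v1=10 'BEGIN { print "v1=[" v1 "]" }')

# As an operand, the assignment is processed only after BEGIN,
# so BEGIN still sees the uninitialized (empty) value.
as_operand=$(awk 'BEGIN { print "v1=[" v1 "]" }' v1=10 </dev/null)

echo "$with_v"
echo "$as_operand"
```

The first command should print v1=[10] and the second v1=[].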
awk divides input into records. By default, newline characters separate records;
however, you may specify a different record separator if you want.

One at a time, and in order, awk compares each input record with the pattern of
every rule in the program. When a pattern matches, awk performs the action part
of the rule on that input record. Patterns and actions often refer to separate
fields within a record. By default, white space (usually blanks, newlines, or
horizontal tab characters) separates fields; however, you can specify a
different field separator string using the -F ere option (see Input).
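As a rough illustration of field splitting, the sketch below reads one record with the default white-space separator and one with -F. The sample input lines are made up for the example.

```shell
# Default: runs of white space separate fields, so $2 is "beta".
by_space=$(echo 'alpha beta   gamma' | awk '{ print $2 }')

# -F ':' makes the colon the field separator, so $3 is "0".
by_colon=$(echo 'root:x:0:0' | awk -F ':' '{ print $3 }')

echo "$by_space"
echo "$by_colon"
```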
You can omit the pattern or action part of an awk rule (but not both). If you
omit pattern, awk performs the action on every input record (that is, every
record matches). If you omit action, awk writes every record matching the
pattern to the standard output.
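The two shortened rule forms can be sketched as follows. The three-line input is invented for the example.

```shell
input=$(printf 'apple 3\nbanana 12\ncherry 7')

# Pattern only: every matching record is written unchanged.
pattern_only=$(printf '%s\n' "$input" | awk '$2 > 5')

# Action only: the action runs for every record.
action_only=$(printf '%s\n' "$input" | awk '{ print $1 }')

echo "$pattern_only"
echo "$action_only"
```

The first command keeps the banana and cherry records; the second prints the first field of all three records.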
awk considers everything after a # in a program line to be a comment. For
example:
# This is a comment
To continue program lines on the next line, add a backslash (\) to the end of
the line. Statement lines ending with a comma (,), double or-bars (||), or double
ampersands (&&) continue automatically on the next line.
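A short sketch of both continuation styles, together with comments, in one program. The input values are illustrative only.

```shell
out=$(printf '5 a\n15 b\n25 c\n' | awk '
# A line ending in && continues on the next line automatically.
$1 > 10 &&
$1 < 20 { print $2 }
# A backslash at the end of a line continues any other line.
$1 == 25 { print \
    "last:", $2 }
')
echo "$out"
```

This should print b for the second record and last: c for the third.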
Options
-F ere
specifies an extended regular expression to use as the field separator.
-f prog
runs the awk program contained in the file prog. When more than one -f option
appears on the command line, the resulting program is a concatenation of all
programs you specify.
-v var=value
assigns value to var before running the program. You can specify this option a
number of times.
Variables and Expressions
There are three types of variables in awk: identifiers, fields and array
elements.
An identifier is a sequence of letters, digits and underscores beginning with a
letter or an underscore.
For a description of fields, see the Input subsection.
Arrays are associative collections of values called the elements of the array.
Constructs of the form,
identifier[subscript]
where subscript has the form expr or expr,expr,... refer to array elements.
Each such expr can have any string value. For multiple expr subscripts, awk
concatenates the string values of all exprs with a separate character SUBSEP
between each. The initial value of SUBSEP is set to \034 (ASCII field separator).
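The way multiple subscripts collapse into a single key can be sketched as below. The array name grid and the helper array part are hypothetical names chosen for the example.

```shell
out=$(awk 'BEGIN {
    grid[2, 3] = "x"                  # stored under the key "2" SUBSEP "3"
    for (key in grid) {
        n = split(key, part, SUBSEP)  # recover the individual subscripts
        print n, part[1], part[2]
    }
    print ((2 SUBSEP 3) in grid)      # same element, key built by hand
}')
echo "$out"
```

The loop should report two parts, 2 and 3, and the final test should print 1.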
Fields and identifiers are sometimes referred to as scalar variables to
distinguish them from arrays.
You do not declare awk variables and you do not need to initialize them. The
value of an uninitialized variable is the empty string in a string context and
the number 0 in a numeric context.
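A minimal sketch of an uninitialized variable in both contexts. The name unset_var is hypothetical and is never assigned.

```shell
out=$(awk 'BEGIN {
    print "[" unset_var "]"   # string context: the empty string
    print unset_var + 0       # numeric context: 0
    print length(unset_var)   # the empty string has length 0
}')
echo "$out"
```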
Expressions consist of constants, variables, functions, regular expressions and
subscript in array conditions (described later) combined with operators. Each
variable and expression has a string value and a corresponding numeric value; awk
uses the value appropriate to the context.
When converting a numeric value to its corresponding string value, awk performs
the equivalent of a call to the sprintf function (see Built-In String
Functions). The one and only expr argument is the numeric value and the fmt
argument is either %d (if the numeric value is an integer) or the value of the
variable CONVFMT (if the numeric value is not an integer). The default value of
CONVFMT is %.6g. If you use a string in a numeric context, and awk cannot
interpret the contents of the string as a number, it treats the value of the
string as zero.
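The conversion rules above can be sketched as follows. Concatenating a number with the empty string forces the numeric-to-string conversion; the sample values are arbitrary.

```shell
out=$(awk 'BEGIN {
    x = 3.14159265
    print (x "")        # non-integer: converted with CONVFMT, default %.6g
    CONVFMT = "%.2f"
    print (x "")        # now converted with %.2f
    print (17 "")       # integer value: converted as if with %d
    print ("abc" + 0)   # non-numeric string in a numeric context
}')
echo "$out"
```

This should print 3.14159, 3.14, 17 and 0 on separate lines.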
Numeric constants are sequences of decimal digits.
String constants are quoted, as in "a literal string". Literal strings can
contain the escape sequences shown in Table 1, Escape Sequences in awk Literal
Strings.
awk supports extended regular expressions (see regexp). When awk reads a
program, it compiles characters enclosed in slash characters (/) as regular
expressions. In addition, when literal strings and variables appear on the right
side of a ~ or !~ operator, or as certain arguments to built-in matching and
substitution functions, awk interprets them as dynamic regular expressions.
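As a rough illustration, the sketch below matches the same text once with a regular expression written between slashes and once with a dynamic regular expression held in a string. Because the backslash is also the string escape character, the string form doubles each backslash. The two input lines are invented for the example.

```shell
input=$(printf 'see e.g. the manual\nsee eXgY the manual')

# Regular expression constant between slashes.
literal=$(printf '%s\n' "$input" | awk '/e\.g\./')

# Dynamic regular expression: a string on the right side of ~.
dynamic=$(printf '%s\n' "$input" | awk '$0 ~ "e\\.g\\."')

echo "$literal"
echo "$dynamic"
```

Both commands should select only the first line, since the escaped dots match literal periods.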
Escape    Character
\a        audible bell
\b        backspace
\f        formfeed
\n        newline
\r        carriage return
\t        horizontal tab
\v        vertical tab
\ooo      octal value ooo
\xdd      hexadecimal value dd
\/        slash
\"        quote
Table 1: Escape Sequences in awk Literal Strings
Note:
When you use literal strings as regular expressions, you need extra backslashes
to escape regular expression metacharacters, since the backslash is also the
literal string escape character. For example, the regular expression,
/e\.g\./
when written as a string is:
"e\\.g\\."
awk defines the subscript in array condition as:
index in array
where index looks like expr or (expr,...,expr). This condition evaluates to 1 if
the string value of index is a subscript of array, and to 0 otherwise. This is
a way to determine if an array element exists. When the element does not exist,
this condition does not create it.
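The following sketch contrasts the in condition with a plain element reference. The array name seen and the keys are hypothetical; the plain reference at the end is included only to show that, unlike the in condition, it does create the element.

```shell
out=$(awk 'BEGIN {
    seen["k"] = 1
    print ("k" in seen), ("z" in seen)   # existing key, then missing key
    n = 0; for (key in seen) n++
    print n                              # the in test did not create seen["z"]
    junk = seen["z"]                     # a plain reference does create it
    n = 0; for (key in seen) n++
    print n
}')
echo "$out"
```

This should print 1 0, then 1, then 2.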
Symbol Table
You can access the symbol table through the built-in array SYMTAB.
SYMTAB[expr]
is equivalent to the variable named by the evaluation of expr. For example,

SYMTAB["var"]
is a synonym for the variable var.
Environment
An awk program can determine its initial environment by examining the ENVIRON
array. If the environment consists of entries of the form:
name=value
then
ENVIRON[name]
has string value
"value"
For example, the following program is equivalent to the default output of env:
BEGIN {
    for (i in ENVIRON)
        printf("%s=%s\n", i, ENVIRON[i])
    exit
}
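A single entry can be looked up the same way. In this sketch the environment variable GREETING is a made-up name set only for the one command.

```shell
# GREETING is a hypothetical variable placed in the environment of this awk run.
out=$(GREETING=hello awk 'BEGIN { print ENVIRON["GREETING"] }')
echo "$out"
```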
Operators
awk follows the usual precedence order of arithmetic operations, unless
overridden with parentheses; a table giving the order of operations appears later
in this section.
The unary operators are +, -, ++ and --, where you can use the ++ and --
operators as either postfix or prefix operators, as in C. The binary arithmetic
operators are +, -, *, /, % and ^.
The conditional operator
expr ? expr1 : expr2
evaluates to expr1 if the value of expr is non-zero, and to expr2 otherwise.
If two expressions are not separated by an operator, awk concatenates their
string values.
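A brief sketch of the arithmetic operators, the conditional operator and concatenation working together. The variable name label and the input values are illustrative.

```shell
out=$(printf '3\n0\n' | awk '
BEGIN { print 2 ^ 3, 7 % 3 }           # exponentiation and remainder
{
    label = $1 ? "non-zero" : "zero"   # conditional operator
    print $1 ":" label                 # adjacent expressions are concatenated
}')
echo "$out"
```

This should print 8 1, then 3:non-zero, then 0:zero.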
