Concept
Awk scans ASCII files or standard input. It can easily search for strings and offers many
ways to process the matching lines and output them in a new format. It does not
change the input file but sends its results to standard output.
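A minimal sketch of that idea (the sample text is made up): awk reads each line, selects the lines matching a pattern, and reprints chosen fields in a new format, leaving the input untouched.

```shell
# Select lines containing "error" and reprint two of their fields.
printf 'ok start\nerror disk full\nok end\n' |
awk '/error/ { print "found:", $2, $3 }'
# prints: found: disk full
```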
awk/nawk/gawk
Awk is the original awk, nawk is "new awk", and gawk is GNU awk. Gawk can do the
most, but it is not available everywhere. It is therefore safest to use only features that
nawk supports; a system that lacks even nawk is not well equipped anyway.
Variables
Awk does not distinguish between strings and numbers. One may simply put anything into
a variable with varname = othervar or varname = "string". To get it out of the variable, just
write its name as a function argument or on the right side of any operator.
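A small sketch of this behaviour (the variable names are invented): a value assigned as a string can be used directly in arithmetic, and back again in string context.

```shell
awk 'BEGIN {
    a = "10"        # assigned as a string
    b = a + 5       # used as a number
    print b, a "!"  # b printed as a number, a concatenated as a string
}'
# prints: 15 10!
```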
Multiline awk in a shell script
Everything between the single quotes is awk. With -v a="$var" awk receives a shell
variable (an assignment placed after the program, like a=$var, only takes effect after the
BEGIN block has already run). The action is to print variable a into a file.
awk -v a="$var" '
BEGIN { print a > "testfile" }
'
awk '
BEGIN { myvalue = 1700 }
/debt/ { myvalue -= $4 }
/want/ { myvalue += $4 }
END { print myvalue }
' infile
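Fed a made-up infile over standard input (awk reads stdin when no file name is given), the script above keeps a running balance: start with 1700, subtract field 4 on "debt" lines, add it on "want" lines.

```shell
printf 'debt rent Jan 200\nwant salary Jan 500\n' |
awk '
BEGIN { myvalue = 1700 }
/debt/ { myvalue -= $4 }
/want/ { myvalue += $4 }
END { print myvalue }
'
# prints: 2000
```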
awk '
$1 ~ /fred/ && $4 !~ /ok/ {
    print "Fred has not yet paid " $3
}
' infile
awk '
BEGIN { count = 0 }
/myline/ {
    for(i=1;i<=NF;i++){
        if(substr($i,3,2) == "ae"){
            count++
            bla = "Found it on line: "
            print bla NR " in field: " i
        }
    }
}
END { print "Found " count " instances of it" }
' infile
awk '
{ for(i=1;i<=NF;i++){
len = length($i)
for(j=len;j>0;j--){
char = substr($i,j,1)
tmp = tmp char
}
$i = tmp
tmp = ""
}
print
}
' infile
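Run against an invented line, the block above reverses every word in place (assigning back to $i rebuilds the record, which print then emits):

```shell
printf 'abc def\n' |
awk '{
    for(i=1;i<=NF;i++){
        len = length($i)
        tmp = ""
        for(j=len;j>0;j--){
            tmp = tmp substr($i,j,1)   # collect characters back to front
        }
        $i = tmp
    }
    print
}'
# prints: cba fed
```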
#!/usr/bin/ksh
{ while read -r line;do
    print -r -- "$line"
done } |\
tee -a /path/mymailfile |\
awk '
/^From/ || /^Reply/ {
    for(i=1;i<=NF;i++){
        if($i ~ /@/){
            print $i
        }
    }
}
' |\
sed '
s/[<>]//g;
s/[()]//g;
s/"//g;
# ...more substitutions for really extracting the email only...
' |\
{ while read -r addr;do
    : # ...process each address...
done }
With #!/usr/bin/nawk -f the whole script is interpreted entirely as an awk script and no
shell escapes are needed any more, but then everything can and must be done in awk itself.
It is nawk because of the getline function.
while iterates until the expression becomes false or until a break is encountered.
gsub() is for global string substitution.
getline reads in a line each time it is called.
system() executes a unix command.
">>" appends to a file.
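A tiny sketch of two of these, gsub() and getline (the input lines are made up): gsub() returns how many substitutions it made, and getline fetches the next input line on demand.

```shell
printf 'a <b@c> d\nsecond line\n' |
awk '{
    n = gsub(/[<>]/, "")          # strip angle brackets from $0, count them
    print n, $0
    if((getline nextline) > 0)    # pull in the following line ourselves
        print "peeked:", nextline
}'
# prints: 2 a b@c d
#         peeked: second line
```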
This script is an example only. For really extracting email addresses several special cases
would have to be considered...
#!/usr/bin/nawk -f
# Lines from a mail are dropping in over stdin. Append every line to a
# file before checking anything.
{ print >> "/path/mymailfile" }
/^From:/ || /^Reply/ {
    # Find fields with @. Iterate over the fields and check for @
    for(i=1;i<=NF;i++){
        if($i ~ /@/){
            gsub(/[<>()"]/,"",$i)
            # antiaddr (e.g. one's own address) can be preset with -v antiaddr=...
            if($i == antiaddr){
                break
            }else{
                print $i
            }
        }
    }
}
#!/usr/bin/nawk -f
! /(foo|boo)/ { print }
# Rearrange and calculate with columns, but only on lines with foo or boo
/(foo|boo)/ {
# Extract fields
mytype = $1
var1 = $2
var2 = $3
var3 = $4
# Calculate
if(mytype == "foo"){
var1 *= 10
var2 += 20
var3 = log(var3)
}
if(mytype == "boo"){
var1 *= 4
var2 += 10
var3 = cos(var3)
}
printf("%-4s%10.3f%10.3f%10.3f\n",mytype,var3,var2,var1)
}
#!/usr/bin/ksh
var="term1 term2 term3 term4 term5"
awk -v myvar="$var" '
BEGIN { split(myvar,myarr) }
{
    for(val in myarr){
        if($0 ~ myarr[val]){
            print
        }
    }
}
' file
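With an invented term list and input, the script above keeps every line matching any of the terms (the variable is passed with -v so the BEGIN block can see it):

```shell
printf 'a red car\ngreen grass\nblue sky\n' |
awk -v myvar="red blue" '
BEGIN { split(myvar,myarr) }
{
    for(val in myarr){
        if($0 ~ myarr[val]){
            print
        }
    }
}
'
# prints: a red car
#         blue sky
```

Note that a line matching several of the terms would be printed once per matching term.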
Functions
This example substitutes the first three occurrences of "searchterm" with a different term
each time; from the fourth occurrence on it just prints the line as it is.
#!/usr/bin/nawk -f
BEGIN{
mysub1 = "first_sub"
mysub2 = "second_sub"
mysub3 = "third_sub"
mycount = 1
find = "searchterm"
}
{
if($0 ~ find){
if(mycount == 1){ replace(mysub1); }
if(mycount == 2){ replace(mysub2); }
if(mycount == 3){ replace(mysub3); }
if(mycount > 3){ print; }
mycount++
}else{
print
}
}
function replace(mysub) {
    sub(find,mysub)
    print
}
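With four invented lines containing the search term, the script behaves like this:

```shell
printf 'pay searchterm now\npay searchterm now\npay searchterm now\npay searchterm now\n' |
awk '
BEGIN {
    mysub1 = "first_sub"; mysub2 = "second_sub"; mysub3 = "third_sub"
    mycount = 1; find = "searchterm"
}
{
    if($0 ~ find){
        if(mycount == 1){ replace(mysub1) }
        if(mycount == 2){ replace(mysub2) }
        if(mycount == 3){ replace(mysub3) }
        if(mycount > 3){ print }
        mycount++
    }else{
        print
    }
}
function replace(mysub) { sub(find,mysub); print }
'
# prints: pay first_sub now
#         pay second_sub now
#         pay third_sub now
#         pay searchterm now
```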
String functions
sub(regexp,sub) Substitute sub for regexp in $0
sub(regexp,sub,var) Substitute sub for regexp in var
gsub(regexp,sub) Globally substitute sub for regexp in $0
gsub(regexp,sub,var) Globally substitute sub for regexp in var
split(var,arr) Split var on white space into arr
split(var,arr,sep) Split var into arr with sep as separator
index(bigvar,smallvar) Position of smallvar in bigvar (0 if absent)
match(bigvar,expr) Position of regexp expr in bigvar (sets RSTART, RLENGTH)
length(var) Number of characters in var
substr(var,num) Extract chars from position num to end
substr(var,num1,num2) Extract num2 chars starting at position num1
sprintf(format,vars) Format vars into a string
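A few of these in action (the sample string is invented):

```shell
awk 'BEGIN {
    s = "hello awk world"
    print substr(s, 7, 3)    # 3 chars starting at position 7
    print index(s, "awk")    # position where "awk" starts
    n = split(s, a, " ")
    print n, a[2]            # number of pieces and the second piece
    print length(s)
}'
# prints: awk
#         7
#         3 awk
#         15
```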
Perl can do far more than awk, but awk is present on any standard unix
system, whereas perl first has to be installed. And for short commands awk is often
more practical. The auto-split mode of perl puts the pieces into $F[0] through
$F[$#F], which is not as nice as awk's $1 through $NF, where the whole line is retained
in $0 at the same time.
To get the first column of any file in awk and in perl:
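A sketch of both one-liners ("file" stands for any whitespace-separated file):

```shell
# awk: the first field of every line
awk '{ print $1 }' file
# perl: auto-split mode, first piece
perl -lane 'print $F[0]' file
```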