Professional Documents
Culture Documents
UFSC - CTC
Unix/perl
Prof. Mario Dantas
mario@inf.ufsc.br
http://www.inf.ufsc.br
Motivation
The large amount of computing resources
available in many organizations can be gather to
solve a large amount of problems from several
research areas.
Biology represents an example of an areas that
can improve experiments through the use of
these distributed resources.
Motivation
Workflow:
Motivation
Grid Information Service
R2
Grid Resource Broker
R3
R5
R4
RN
R6
R1
Several
Virtual
Organizations
Each organization
develops your
ontology
Motivation
The Architecture Proposal
Motivation
Motivation
Motivation
Graphic interface for editing
queries
Motivation
Case Study 1 (matching semantic)
Motivation
Case Study 2 (checking queries)
Motivation
SuMMIT
11
Motivation
SuMMIT Mobile GUI
Monitoring Interface
GRID 2007 - September 20, 2007
12
Motivation
SuMMIT Agent
13
Motivation
SuMMIT Workflow Manager
(4/7)
14
Motivation
SuMMIT Operation
Automation
and
Coordination
15
Motivation
SuMMIT Resource Selector
16
Objective
In this course will introduce the basics of
UNIX/perl programming, therefore in the
end participants will be able to write
simple scripts programms.
References
There are several books and sites that
can help in the task to develop and
improve your knowledge about Unix and
perl, some examples are:
Books
http://proquest.safaribooksonline.com
Books
Interesting Sites
http://directory.fsf.org/project/emacs/
http://stein.cshl.org/genome_informatics/
http://www.cs.usfca.edu/~parrt/course/601/l
ectures/unix.util.html
Interesting Sites
http://people.genome.duke.edu/~jes12//co
urses/perl_duke_2005/
http://www.pasteur.fr/~tekaia/BCGA_useful
_links.html
http://google.com/linux
Course Outline
Introduction
Course Outline
Introduction
In the Beginning
PDP-11
Commercial Success
AIX
SunOS, Solaris
Ultrix, Digital Unix
HP-UX
Irix
UnixWare -> Novell -> SCO -> Caldera ->SCO
Xenix:
-> SCO
Standardization (Posix, X/Open)
Openlook
Motif
BSD Lite
Popular Success!
Darwin
Apple abandoned old Mac OS for UNIX
Technical strengths!
Research, not commercial
PDP-11 was popular with an unusable OS
AT&Ts legal concerns
Not allowed to enter computer business but
needed to write software to help with switches
Licensed cheaply or free
..continued
Reusable code
Good programmers write good code;
great programmers borrow good code
print $(who | awk '{print $1}' | sort | uniq) | sed 's/ /,/g'
who
755
awk
3,412
sort
2,614
uniq
302
sed
2,093
9,176 lines
Silence is golden
Only report if something is wrong
Think hierarchically
Software tools
Development tools
Scripting languages
TCP/IP
Kernel Basics
The kernel is
a program loaded into memory during the
boot process, and always stays in physical
memory.
responsible for managing CPU and memory
for processes, managing file systems, and
interacting with devices.
User Space
utilities
C programs
compilers
Kernel
scheduler
Devices
printer
swapper
RAM
Kernel Subsystems
Process management
Schedule processes to run on CPU
Inter-process communication (IPC)
Memory management
Virtual memory
Paging and swapping
I/O system
File system
Device drivers
Buffer cache
System Calls
Interface to the kernel
Over 1,000 system calls available on Linux
3 main categories
File/device manipulation
e.g. mkdir(), unlink()
Process control
e.g. fork(), execve(), nice()
Information manipulation
e.g. getuid(), time()
Logging In
Need an account and password first
Enter at login: prompt
Password not echoed
After successful login, you will see a shell prompt
Entering commands
At the shell prompt, type in commands
Typical format: command options arguments
Examples: who, date, ls, cat myfile, ls l
Case sensitive
exit to log out
Remote Login
Use Secure Shell
(SSH)
Windows
e.g. PuTTY
UNIX-like OS
ssh name@access.cims.nyu.edu
UNIX on Windows
Two recommended UNIX emulation environments:
UWIN (AT&T)
http://www.research.att.com/sw/tools/uwin
Cygwin (GPL)
http://www.cygwin.com
Linux Distributions
Course Outline
Introduction
c programs
scripts
ls
ksh
gcc
find
open()
fork()
exec()
kernel
hardware
Kernel Subsystems
File system
Deals with all input and output
Includes files and terminals
Integration of storage devices
Process management
Deals with programs and program interaction
What is a shell?
The user interface to the operating system
Functionality:
Execute other programs
Manage files
Manage processes
Command history
Command line editing
File expansion (tab completion)
Command expansion
Key bindings
Spelling correction
Job control
Shell Scripting
A set of shell commands that
constitute an executable program
A shell script is a regular text file that
contains shell or UNIX commands
Very useful for automating repetitive task and
administrative tools and for storing
commands for later execution
Simple Commands
simple command: sequence of non blanks
arguments separated by blanks or tabs.
1st argument (numbered zero) usually specifies
the name of the command to be executed.
Any remaining arguments:
Are passed as arguments to that command.
Arguments may be filenames, pathnames, directories
or special options (up to command)
Special characters are interpreted by shell
A simple example
$ ls l /bin
-rwxr-xr-x 1 root
$
prompt
command
sys
43234 Sep 26
2001 date
arguments
Types of Arguments
$ tar c v f archive.tar main.c main.h
Options/Flags
Convention: -X or --longname
Parameters
May be files, may be strings
Depends on command
1. Commands
2. System calls
3. Subroutines
4. Special files
5. File format and conventions
6. Games
http://en.wikipedia.org/wiki/Unix_manual
USER
COMMANDS
ls ( 1 )
NAME
ls - list files and/or directories
SYNOPSIS
ls [ options ] [ file ... ]
DESCRIPTION
For each directory argument ls lists the contents; for each file argument the name and requested information are listed. The
current directory is listed if no file arguments appear. The listing is sorted by file name by default, except that file arguments
are listed before directories.
.
OPTIONS
a, --all
List entries starting with .; turns off --almost-all.
-F, --classify
Append a character for typing each entry.
-l, --long|verbose
Use a long listing format.
-r, --reverse
Reverse order while sorting.
-R, --recursive
List subdirectories recursively.
-
SEE ALSO
chmod(1), find(1), getconf(1), tw(1)
Fundamentals of Security
UNIX systems have one or more users,
identified with a number and name.
A set of users can form a group. A user
can be a member of multiple groups.
A special user (id 0, name root) has
complete control.
Each user has a primary (default)
group.
A simple example
$ ls l /bin
-rwxr-xr-x 1 root
$
read
write
sys
execute
43234 Sep 26
2001 date
Definition: Filename
A sequence of characters other than slash.
Case sensitive.
tmp
usr
dmr
wm4
foo
etc
bin
who
date
foo
.profile
who
date
.profile
Definition: Directory
Holds a set of files or other directories.
Case sensitive.
tmp
usr
dmr
wm4
foo
etc
bin
who
date
etc
.profile
usr
dmr
bin
Definition: Pathname
A sequence of directory names followed by a simple
filename, each separated from the previous one by a /
/
tmp
usr
dmr
wm4
.profile
foo
etc
bin
who
date
/usr/wm4/.profile
usr
dmr
wm4
.profile
foo
etc
bin
who
date
usr
dmr
wm4
.profile
foo
etc
bin
who
date
Tilde Expansion
Each user has a home directory
Most shells (ksh, csh) support ~ operator:
~ expands to my home directory
~/myfile
/home/kornj/myfile
/home/unixtool/file2
external device
/
a
a
b
Device
Mount Point
/
/a/b
b
a
Mount table
pwd
print process working dir
ed, vi, emacs
create/edit files
ls
list contents of directory
rm
remove file
mv
rename file
cp
copy a file
touch
create an empty file or update
mkdir and rmdir create and remove dir
wc
counts the words in a file
file
determine file contents
du
directory usage
File Permissions
UNIX provides a way to protect files based on
users and groups.
Three types of permissions:
read, process may read contents of file
write, process may write contents of file
execute, process may execute file
Directory permissions
Same types and sets of permissions as for
files:
read: process may a read the directory
contents (i.e., list files)
write: process may add/remove files in the
directory
execute: process may open files in directory
or subdirectories
chmod
change file permissions
chown
change file owner
chgrp
change file group
umask
user file creation mode mask
only owner or super-user can change file
attributes
upon creation, default permissions given
to file modified by process umask value
Chmod command
Symbolic access modes {u,g,o} /
{r,w,x}
example: chmod +r file
read
write
execute
0
1
2
3
4
5
6
7
no
no
no
no
yes
yes
yes
yes
no
no
yes
yes
no
no
yes
yes
no
yes
no
yes
no
yes
no
yes
Mode
Count
Position
1023
1331
read
read/write
1
2
0
50
Standard in/out/err
The first three entries in the file descriptor
table are special by convention:
cat
Devices
Besides files, input and output can go
from/to various hardware devices
Redirection
Before a command is executed, the input and
output can be changed from the default
(terminal) to a file
Shell modifies file descriptors in child process
The child program knows nothing about this
ls
ls
Redirection of input/ouput
Redirection of output: >
example:$ ls > my_files
Using Devices
Redirection works with devices (just like
files)
Special files in /dev directory
Example: /dev/tty
Example: /dev/lp
Example: /dev/null
cat big_file > /dev/lp
cat big_file > /dev/null
Links
Directories are a list of files and directories.
Each directory entry links to a file on the disk
mydir
hello
file2
subdir
Hello
World!
ln command
Symbolic links
Symbolic links are different than regular links (often
called hard links). Created with ln -s
Can be thought of as a directory entry that points to
the name of another file.
Does not change link count for file
When original deleted, symbolic link remains
Contents of file
Hard link
symlink
dirent
Symbolic Link
Contents of file
Example
usr tmp etc bin
dmr wm4
.profile
etc
foo
who date
Hard Link
usr tmp etc bin
dmr wm4
.profile
etc
foo
who date
Symbolic Link
usr tmp etc bin
dmr wm4
foo
who date
.profile
etc
/usr/wm4/.profile
dmr wm4
foo
who date
.profile
etc
cat
Tree Walking
How can do we find a set of files in the
hierarchy?
One possibility:
ls l R /
What about:
All files below a given directory in the hierarchy?
All files since Jan 1, 2001?
All files larger than 10K?
find utility
find pathlist expression
find recursively descends through pathlist
and applies expression to every file.
expression can be:
-name pattern
true if file name matches pattern. Pattern may
include shell patterns such as *, must be in quotes
to suppress shell interpretation.
Eg: find / -name '*.c'
many more
op1 -a op2
op1 -o op2
( )
find: actions
-print
-exec cmd
Executes cmd, where cmd must be terminated by an
escaped semicolon (\; or ';').
If you specify {} as a command line argument, it is
replaced by the name of the current file just found.
exec executes cmd once per file.
Example:
find -name "*.o" -exec rm "{}" ";"
find Examples
Find all files beneath home directory beginning with f
find ~ -name 'f*' -print
apples
oranges
grapes
comm
Reads two files and outputs three columns:
Lines in first file only
Lines in second file only
Lines in both files
Must be sorted
Options: fields to suppress ( [-123] )
Course Outline
Introduction
User Space
Code
Code
Code
Data
Data
Data
Process
Info
Process
Info
Process
Info
Open File
Table
Kernel Space
Process
Table
Unix Processes
Process: An entity of execution
Definitions
program: collection of bytes stored in a file that can
be run
image: computer execution environment of program
process: execution of an image
Process Creation
Interesting trait of UNIX
fork system call clones the current process
A
Process Setup
All of the per process information is copied
with the fork operation
Working directory
Open files
/*createchild*/
/*parentwaits*/
/*childexecs*/
in it
execs
I n it
execs
g e tty
g e tty
execs
g e tty
lo g in
execs
/ b in / s h
Background Jobs
By default, executing a command in the
shell will wait for it to exit before printing
out the next prompt
Trailing a command with & allows the shell
and command to run simultaneously
$ /bin/sleep 10 &
[1] 3424
$
Program Arguments
When a process is started, it is sent a list
of strings
argv, argc
Ending a process
When a process ends, there is a return
code associated with the process
This is a positive integer
0 means success
>0 represent various kinds of failure, up to
process
Process group id
number used to identify set of processes
Parent process id
process id of the process that created the process
Environment variables
Environment of a Process
A set of name-value pairs associated with
a process
Keys and values are strings
Passed to children processes
Cannot be passed back up
Common examples:
PATH: Where to search for programs
TERM: Terminal type
Example:
$ myprogram
sh: myprogram not found
$ PATH=/bin:/usr/bin:/home/kornj/bin
$ myprogram
hello!
$ ./foo
Hello, foo.
Shell Variables
Shells have several mechanisms for creating
variables. A variable is a name representing a
string value. Example: PATH
Shell variables can save time and reduce typing
errors
Variables (cont)
Syntax varies by shell
varname=value
set varname = value
# sh, ksh
# csh
Environmental Variables
NAMEMEANING
$HOME Absolute pathname of your home
directory
$PATH A list of directories to search for
$MAIL Absolute pathname to mailbox
$USER Your user id
$SHELL Absolute pathname of login shell
$TERM Type of your terminal
$PS1
Prompt
Inter-process Communication
Ways in which processes communicate:
Passing arguments, environment
Read/write regular files
Exit values
Signals
Pipes
Signals
Signal: A message a process can send to a process
or process group, if it has appropriate permissions.
Message type represented by a symbolic name
For each signal, the receiving process can:
Explicitly ignore signal
Specify action to be taken upron receipt (signal handler)
Otherwise, default action takes place (usually process is
killed)
Common signals:
SIGKILL, SIGTERM, SIGINT
SIGSTOP, SIGCONT
SIGSEGV, SIGBUS
An Example of Signals
When a child exists, it sends a
SIGCHLD signal to its parent.
If a parent wants to wait for a child to
exit, it tells the system it wants to catch
the SIGCHLD signal
When a parent does not issue a wait, it
ignores the SIGCHLD signal
Pipes
Pipes
General idea: The input of one program
is the output of the other, and vice versa
Pipes (2)
Often, only one end of the pipe is used
standard out
standard in
File Approach
Run first program, save output into file
Run second program, using file as input
process 1
process 2
If the pipe fills up, UNIX puts the writer to sleep until
the reader frees up space (by doing a read)
standard out
standard in
Interprocess Communication
For Unrelated Processes
FIFO (named pipes)
A special file that when opened represents
pipe
System V IPC
p1
message queues
semaphores
shared memory
Sockets (client/server model)
p2
Pipelines
Output of one program becomes input to
another
Uses concept of UNIX pipes
Example: $ who | wc -l
counts the number of users logged in
An Extra Process
$ cat file | command
cat
command
Introduction to Filters
A class of Unix tools called filters.
Utilities that read from standard input,
transform the file, and write to standard out
Examples of Filters
Sort
Input: lines from a file
Output: lines from the file sorted
Grep
Input: lines from a file
Output: lines that match the argument
Awk
Programmable filter
cat file*
ls | cat -n
head
Display the first few lines of a specified file
Syntax: head [-n] [filename...]
-n - number of lines to display, default is 10
filename... - list of filenames to display
tail
Displays the last part of a file
Syntax: tail +|-number [lbc] [f] [filename]
or: tail +|-number [l] [rf] [filename]
+number - begins copying at distance number from
beginning of file, if number isnt given, defaults to 10
-number - begins from end of file
l,b,c - number is in units of lines/block/characters
r - print in reverse order (lines only)
f - if input is not a pipe, do not terminate after end of
file has been copied but loop. This is useful to
monitor a file being written by another process
tee
Unix Command
Standard output
file-list
tee cont
Syntax: tee [ -ai ] file-list
-a - append to output file rather than overwrite,
default is to overwrite (replace) the output file
-i - ignore interrupts
file-list - one or more file names for capturing
output
Examples
ls | head 10 | tee first_10 | tail 5
who | tee user_list | wc
99
75
50
95
33
76
Pipe-separated
COMP1011|2252424|Abbot, Andrew John |3727|1|M
COMP2011|2211222|Abdurjh, Saeed |3640|2|M
COMP1011|2250631|Accent, Aac-Ek-Murhg |3640|1|M
COMP1021|2250127|Addison, Blair |3971|1|F
COMP4012|2190705|Allen, David Peter |3645|4|M
COMP4910|2190705|Allen, David Pater |3645|4|M
Colon-separated
root:ZHolHAHZw8As2:0:0:root:/root:/bin/ksh
jas:nJz3ru5a/44Ko:100:100:John Shepherd:/home/jas:/bin/ksh
cs1021:iZ3sO90O5eZY6:101:101:COMP1021:/home/cs1021:/bin/bash
cs2041:rX9KwSSPqkLyA:102:102:COMP2041:/home/cs2041:/bin/csh
cs3311:mLRiCIvmtI9O2:103:103:COMP3311:/home/cs3311:/bin/sh
Some options:
-f listOfCols: print only the specified columns (tabseparated) on output
-c listOfPos: print only chars in the specified positions
-d c: use character c as the column separator
cut examples
cut -f 1 < data
cut -f 1-3 < data
cut -f 1,4 < data
cut -f 4- < data
cut -d'|' -f 1-3 < data
cut -c 1-4 < data
Unfortunately, there's no way to refer to "last
column" without counting the columns.
1
2
3
4
5
6
paste example
cut -f 1 < data > data1
cut -f 2 < data > data2
cut -f 3 < data > data3
paste data1 data3 data2 > newdata
sort: Options
Syntax: sort [-dftnr] [-o filename] [filename(s)]
-dDictionary order, only letters, digits, and whitespace
are significant in determining sort order
-f Ignore case (fold into lower case)
-t Specify delimiter
-n Numeric order, sort by arithmetic value instead of
first digit
-r Sort in reverse order
-ofilename - write output to filename, filename can be
the same as one of the input files
Exclusive
Start from 0 (unlike cut, which starts at 1)
New way:
-k f[.c][options][,f[.c][options]]
-k2.1 k0,1 k3n
Inclusive
Start from 1
sort Examples
sort +2nr < data
sort k2nr data
sort -t: +4 /etc/passwd
sort -o mydata mydata
Count lines
Count words
Count characters
tr (continued)
tr reads from standard input.
Any character that does not match a character in
string1 is passed to standard output unchanged
Any character that does match a character in string1
is translated into the corresponding character in
string2 and then passed to standard output
Examples
tr s z replaces all instances of s with z
tr so zx replaces all instances of s with z and o with
x
tr a-z A-Z
replaces all lower case characters
with
upper case characters
tr d a-c
deletes all a-c characters
tr uses
Change delimiter
tr | :
Rewrite numbers
tr ,. .,
xargs
Unix limits the size of arguments and
environment that can be passed down to child
What happens when we have a list of 10,000
files to send to a command?
xargs solves this problem
Reads arguments as standard input
Sends them to commands that take file lists
May invoke program several times depending on size
of arguments
cmd a1 a2
a1 a300
xargs
cmd
out
xargs invokes wc 1 or more times
wc -l a b c d e f g
wc -l h i j k l m n o
Next Time
Regular Expressions
Allow you to search for text in files
grep command
Previously
Basic UNIX Commands
Files: rm, cp, mv, ls, ln
Processes: ps, kill
Unix Filters
Subtleties of commands
Today
Regular Expressions
Allow you to search for text in files
grep command
Stream manipulation:
sed
But first, a command we didnt cover last time
xargs
Unix limits the size of arguments and
environment that can be passed down to child
What happens when we have a list of 10,000
files to send to a command?
xargs handles this problem
Reads arguments as standard input
Sends them to commands that take file lists
May invoke program several times depending on size
of arguments
cmd a1 a2
a1 a300
xargs
cmd
wc -l a b c d e f g
wc -l h i j k l m n o
Regular Expressions
Regular Expressions
The simplest regular expressions are a
string of literal characters to match.
The string matches the regular
expression if it contains the substring.
regular expression
c k s
Regular Expressions
A regular expression can match a string in
more than one place.
regular expression
a p p l e
match 2
Regular Expressions
The . regular expression can be used to
match any character.
regular expression
o .
match 2
Character Classes
Character classes [] can be used to
match any specific set of characters.
regular expression
b [eor] a t
match 2
match 3
regular expression
b [^eo] a t
[[:alpha:]]
[[:alnum:]]
[45[:lower:]]
Anchors
Anchors are used to match at the beginning or end of a line (or both).
^ means beginning of the line
$ means end of the line
^ b [eor] a t
regular expression
regular expression
b [eor] a t $
^word$
^$
Repetition
The * is used to define zero or more
occurrences of the single regular
expression preceding it.
y a * y
regular expression
regular expression
o a * o
.*
Match length
A match will be the longest string that
satisfies the regular expression.
regular expression
a . * e
no
yes
Repetition Ranges
Ranges can also be specified
{ } notation can specify a range of
repetitions for the immediately preceding
regex
{n} means exactly n occurrences
{n,} means at least n occurrences
{n,m} means at least n occurrences but no
more than m occurrences
Example:
.{0,} same as .*
a{2,} same as aaa*
Subexpressions
If you want to group part of an expression so
that * or { } applies to more than just the
previous character, use ( ) notation
Subexpresssions are treated like a single
character
a* matches 0 or more occurrences of a
abc* matches ab, abc, abcc, abccc,
(abc)* matches abc, abcabc, abcabcabc,
(abc){2,3} matches abcabc or abcabcabc
grep
grep comes from the ed (Unix text editor)
search command global regular expression
print or g/re/p
This was such a useful command that it was
written as a standalone utility
There are two other variants, egrep and fgrep
that comprise the grep family
grep is the answer to the moments where you
know you want the file that contains a specific
phrase but you cant remember its name
Family Differences
grep - uses regular expressions for pattern
matching
fgrep - file grep, does not use regular
expressions, only matches fixed strings but can
get search strings from a file
egrep - extended grep, uses a more powerful
set of regular expressions but does not support
backreferencing, generally the fastest member
of the grep family
agrep approximate grep; not standard
Syntax
Regular expression concepts we have seen
so far are common to grep and egrep.
grep and egrep have slightly different syntax
grep: BREs
egrep: EREs (enhanced features we will discuss)
Protecting Regex
Metacharacters
Since many of the special characters used
in regexs also have special meaning to the
shell, its a good idea to get in the habit of
single quoting your regexs
This will protect any special characters from
being operated on by the shell
If you habitually do it, you wont have to worry
about when it is necessary
Egrep: Alternation
Regex also provides an alternation character |
for matching one or another subexpression
(T|Fl)an will match Tan or Flan
^(From|Subject): will match the From and Subject
lines of a typical email message
It matches a beginning of line followed by either the characters
From or Subject followed by a :
Grep: Backreferences
Sometimes it is handy to be able to refer to a
match that was made earlier in a regex
This is done using backreferences
\n is the backreference specifier, where n is a number
Time of day
(1[012]|[1-9]):[0-5][0-9] (am|pm)
grep Family
Syntax
grep [-hilnv] [-e expression] [filename]
egrep [-hilnv] [-e expression] [-f filename] [expression]
[filename]
fgrep [-hilnxv] [-e string] [-f filename] [string] [filename]
-h Do not display filenames
-i Ignore case
-l List only filenames containing matching lines
-n Precede each matching line with its line number
-v Negate matches
-x Match whole line only (fgrep only)
-e expression Specify expression as option
-f filename
Take the regular expression (egrep)
or a list of strings (fgrep) from filename
grep Examples
egrep has its limits: For example, it cannot match all lines that
contain a number divisible by 7.
25,000 words
Other Notes
Use /dev/null as an extra file name
Will print the name of the file that matched
grep test bigfile
This is a test.
input line
regular expression
grep, egrep
grep
egrep
Quick
Reference
A Unix filter
Superset of previously mentioned tools
Sed Architecture
Input
Input line
(Pattern Space)
Output
scriptfile
Scripts
A script is nothing more than a file of commands
Each command consists of up to two addresses
and an action, where the address can be a
regular expression or line number.
address
action
address
action
address
action
address
action
address
action
command
script
cmd 1
cmd 2
...
cmd n
print command
input
output
output
only without -n
sed Syntax
Syntax: sed [-n] [-e] [command] [file]
sed [-n] [-f scriptfile] [file]
-n - only print lines specified with the print
command (or the p flag of the substitute (s)
command)
-f scriptfile - next argument is a filename
containing editing commands
-e command - the next argument is an editing
command rather than a filename, useful if multiple
commands are specified
If the first line of a scriptfile is #n, sed acts as
though -n had been specified
sed Commands
sed commands have the general form
[address[, address]][!]command [arguments]
Addressing
An address can be either a line number or
a pattern, enclosed in slashes (
/pattern/ )
A pattern is described using regular
expressions (BREs, as in grep)
If no pattern is specified, the command will
be applied to all lines of the input file
To refer to the last line: $
Addressing (continued)
Most commands will accept two addresses
If only one address is given, the command operates
only on that line
If two comma separated addresses are given, then
the command operates on a range of lines between
the first and second address, inclusively
Commands
command is a single letter
Example: Deletion: d
[address1][,address2]d
Delete the addressed line(s) from the pattern
space; line(s) not passed to standard output.
A new line of input is read and editing
resumes with the first command of the script.
d
6d
Multiple Commands
Braces {} can be used to apply multiple commands to
an address
[/pattern/[,/pattern/]]{
command1
command2
command3
}
Strange syntax:
The opening brace must be the last character on a line
The closing brace must be on a line by itself
Make sure there are no spaces following the braces
Sed Commands
Although sed contains many editing commands,
we are only going to cover the following subset:
s - substitute
a - append
i - insert
c - change
d - delete
p - print
y - transform
q - quit
Print
The Print command (p) can be used to force
the pattern space to be output, useful if the -n
option has been specified
Syntax: [address1[,address2]]p
Note: if the -n option has not been specified, p
will cause the line to be output twice!
Examples:
1,5p will display lines 1 through 5
/^$/,$p will display the lines from the first
blank line through the last line of the file
Substitute
Syntax: [address(es)]s/pattern/replacement/
[flags]
pattern - search pattern
replacement - replacement string for pattern
flags - optionally any of the following
Substitute Examples
s/Puff Daddy/P. Diddy/
Substitute P. Diddy for the first occurrence of Puff
Daddy in pattern space
s/Tom/Dick/2
Substitutes Dick for the second occurrence of Tom in
the pattern space
s/wood/plastic/p
Substitutes plastic for the first occurrence of wood and
outputs (prints) pattern space
Replacement Patterns
Substitute can use several special
characters in the replacement string
& - replaced by the entire string matched in
the regular expression for pattern
\n - replaced by the nth substring (or
subexpression) previously specified using \(
and \)
\ - used to escape the ampersand (&) and
the backslash (\)
Example:
Change
Unlike Insert and Append, Change can be
applied to either a single line address or a range
of addresses
When applied to a range, the entire range is
replaced by text specified with change, not each
line
Exception: If the Change command is executed with
other commands enclosed in { } that act on a range
of lines, each line will be replaced with text
Change Examples
Remove mail headers, ie;
/^From /,/^$/c\
the address specifies a
<Mail Headers Removed>
range of lines beginning
with a line that begins with
From until the first blank
/^From /,/^$/{
line.
s/^From //p
The first example replaces
all lines with a single
occurrence of <Mail Header
Removed>.
The second example
replaces each line with
<Mail Header Removed>
c\
<Mail Header Removed>
}
Using !
If an address is followed by an exclamation point
(!), the associated command is applied to all lines
that dont match the address or address range
Examples:
1,5!d would delete all lines except 1 through 5
/black/!s/cow/horse/ would substitute
horse for cow on all lines except those that
contained black
The brown cow -> The brown horse
The black cow -> The black cow
Transform
The Transform command (y) operates like tr, it
does a one-to-one or character-to-character
replacement
Transform accepts zero, one or two addresses
[address[,address]]y/abc/xyz/
every a within the specified address(es) is
transformed to an x. The same is true for b to y and
c to z
y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQR
STUVWXYZ/ changes all lower case characters on the
Quit
Quit causes sed to stop reading new input lines
and stop sending them to standard output
It takes at most a single line address
Once a line matching the address is reached, the
script will be terminated
This can be used to save time when you only want
to process some portion of the beginning of a file
Pattern
Hold
out
Sed Advantages
Regular expressions
Fast
Concise
Sed Drawbacks
Hard to remember text from one line to
another
Not possible to go backward in the file
No way to do forward references
like /..../+1
No facilities to manipulate numbers
Cumbersome syntax
Course Outline
Introduction
What is a shell?
The user interface to the operating system
Functionality:
Execute other programs
Manage files
Manage processes
Shell History
There are
many choices
for shells
Shell features
evolved as
UNIX grew
C shell
Enhanced C Shell
The Bourne Shell / POSIX shell
Korn shell
Korn shell clone, from GNU
Scripting
A set of shell commands that constitute an
executable program
Means of output:
Return status code [control information]
Standard out [data]
Standard error [error messages]
Shell Scripts
A shell script is a regular text file that
contains shell or UNIX commands
Before running it, it must have execute
permission:
chmod +x filename
Shell Scripts
When a script is run, the kernel determines
which shell it is written for by examining the first
line of the script
If 1st line starts with #!pathname-ofshell, then it invokes pathname and sends
the script as an argument to be interpreted
If #! is not specified, the current shell
assumes it is a script in its own language
leads to problems
Simple Example
#!/bin/sh
echo Hello World
The C Shell
C-like syntax (uses { }'s)
Inadequate for scripting
http://www.faqs.org/faqs/unix-faq/shell/csh-whynot
Simple Commands
simple command: sequence of non blanks
arguments separated by blanks or tabs.
1st argument (numbered zero) usually specifies
the name of the command to be executed.
Any remaining arguments:
Are passed as arguments to that command.
Arguments may be filenames, pathnames, directories
or special options
ls l /
/bin/ls
-l
/
Background Commands
Any command ending with "&" is run in the
background.
firefox &
Complex Commands
The shell's power is in its ability to hook
commands together
We've seen one example of this so far
with pipelines:
cut d: -f2 /etc/passwd | sort | uniq
Redirection of input/ouput
Redirection of output: >
example:$ ls -l > my_files
Multiple Redirection
cmd 2>file
send standard error to file
standard output remains the same
Here Documents
Shell provides alternative ways of supplying
standard input to commands (an anonymous file)
Shell allows in-line input redirection using <<
called here documents
Syntax:
command [arg(s)] << arbitrary-delimiter
command input
:
:
arbitrary-delimiter
arbitrary-delimiter should be a string that does
not appear in text
Shell Variables
To set:
name=value
Read: $var
Variables can be local or environment.
Environment variables are part of UNIX
and can be accessed by child processes.
Turn local variable into environment:
export variable
Variable Example
#!/bin/sh
MESSAGE="Hello World"
echo $MESSAGE
Environmental Variables
NAMEMEANING
$HOME Absolute pathname of your home
directory
$PATH A list of directories to search for
$MAIL Absolute pathname to mailbox
$USER Your login name
$SHELL Absolute pathname of login shell
$TERM Type of your terminal
$PS1
Prompt
Parameters
A parameter is one of the following:
A variable
A positional parameter, starting from 1
A special parameter
To get the value of a parameter: ${param}
Can be part of a word (abc${foo}def)
Works within double quotes
Positional Parameters
The arguments to a shell script
$1, $2, $3
Special Parameters
$#
$$?
$$
$!
process
$*
"$@"
Command Substitution
Used to turn the output of a command into a string
Used to create arguments or variables
Command is placed with grave accents ` ` to
capture the output of command
$ date
Wed Sep 25 14:40:56 EDT 2001
$ NOW=`date`
$ grep `generate_regexp` myfile.c
$ sed "s/oldtext/`ls | head -1`/g"
$ PATH=`myscript`:$PATH
File Expansion
If multiple matches, all are returned $ /bin/ls
and treated as separate arguments: file1 file2
NOT
$ cat file1
a
$ cat file2
b
$ cat file*
a
dont bsee the wildcards)
argv[0]: /bin/cat
argv[1]: file*
Compound Commands
Multiple commands
Separated by semicolon or newline
Command groupings
pipelines
Subshell
( command1; command2 ) > file
Boolean operators
Control structures
Boolean Operators
Exit value of a program (exit system call) is a
number
0 means success
anything else is a failure code
cmd1 || cmd2
executes
cmd2 if cmd1
is not successful
$ ls bad_file
> /dev/null
&& date
$ ls bad_file > /dev/null || date
Wed Sep 26 07:43:23 2006
Control Structures
if expression
then
command1
else
command2
fi
What is an expression?
Any UNIX command. Evaluates to true if the
exit code is 0, false if the exit code > 0
Special command /bin/test exists that does
most common expressions
String compare
Numeric comparison
Check file properties
Examples
if test "$USER" = "kornj"
then
echo "I know you"
else
echo "I dont know you"
fi
if [ -f /tmp/stuff ] && [ `wc l < /tmp/stuff` -gt 10 ]
then
echo "The file has more than 10 lines in it"
else
echo "The file is nonexistent or small"
fi
test Summary
Numeric tests
int1 eq int2
int1 ne int2
-gt, -ge, -lt, -le
File tests
-r
-w
-f
-d
-s
file
file
file
file
file
Logic
!
-a, -o
( expr )
Length of string is 0
Length of string is not 0
Strings are identical
Strings differ
String is not NULL
First int equal to second
First int not equal to second
greater, greater/equal, less, less/equal
File exists and is readable
File exists and is writable
File is regular file
File is directory
file exists and is not empty
Negate result of expression
and operator, or operator
groups an expression
Arithmetic
No arithmetic built in to /bin/sh
Use external command /bin/expr
expr expression
Evaluates expression and sends the result to
standard output.
Yields a numeric or string result
expr 4 "*" 12
Particularly
expr "("
4 + with
3 ")"
"*" 2
useful
command
substitution
X=`expr $X + 2`
for loops
Different than C:
for var in list
do
command
done
Case statement
Like a C switch statement for strings:
case $var in
opt1) command1
command2
;;
opt2) command
;;
*)
command
;;
esac
Case Example
#!/bin/sh
for INPUT in "$@"
do
case $INPUT in
hello)
echo "Hello there."
;;
bye)
echo "See ya later."
;;
*)
echo "I'm sorry?"
;;
esac
done
echo "Take care."
Case Options
opt can be a shell pattern, or a list of shell
patterns delimited by |
Example:
case $name in
*[0-9]*)
echo "That doesn't seem like a name."
;;
J*|K*)
echo "Your name starts with J or K, cool."
;;
*)
echo "You're not special."
;;
esac
Types of Commands
All behave the same way
Programs
Most that are part of the OS in /bin
Built-in commands
Functions
Aliases
Built-in Commands
Built-in commands are internal to the shell and
do not create a separate process. Commands
are built-in because:
They are intrinsic to the language (exit)
They produce side effects on the current process (cd)
They perform faster
No fork/exec
Special built-ins
: . break continue eval exec export exit
readonly return set shift trap unset
:
replaces shell with program
change working directory
:
rearrange positional parameters
set positional parameters
:
wait for background proc. to exit
:
change default file permissions
:
quit the shell
:
parse and execute string
:
run command and print times
:
put variable into environment
:
set signal handlers
:
:
:
:
:
continue in loop
break in loop
return from function
true
read file of commands into
current shell; like #include
Functions
Functions are similar to scripts and other
commands except:
They can produce side effects in the callers script.
Variables are shared between caller and callee.
The positional parameters are saved and restored
when invoking a function.
Syntax:
name ()
{
commands
}
Aliases
Like macros (#define in C)
Shorter to define than functions, but more
limited
Not recommended for scripts
Example:
alias rm='rm i'
Shell Comments
Special Characters
The shell processes the following characters specially
unless quoted:
| & ( ) < > ; " ' $ ` space tab newline
Token Types
The shell uses spaces and tabs to split the
line or lines into the following types of
tokens:
Control operators (||)
Redirection operators (<)
Reserved words (if)
Assignment tokens
Word tokens
Operator Tokens
Operator tokens are recognized everywhere
unless quoted. Spaces are optional before and
after operator tokens.
I/O Redirection Operators:
> >> >| >& < << <<- <&
Each I/O operator can be immediately preceded by a
single digit
Control Operators:
| & ; ( ) || && ;;
Shell Quoting
Quoting causes characters to loose special
meaning.
\ Unless quoted, \ causes next character to
be quoted. In front of new-line causes lines to
be joined.
''
Literal quotes. Cannot contain '
""
Removes special meaning of all
characters except $, ", \ and `. The \ is only
special before one of these characters and newline.
Quoting Examples
$ cat file*
a
b
$ cat "file*"
cat: file* not found
$ cat file1 > /dev/null
$ cat file1 ">" /dev/null
a
cat: >: cannot open
FILES="file1 file2"
$ cat "$FILES"
cat: file1 file2 not found
Simple Commands
A simple command consists of three types of
tokens:
Example:
foo=bar z=`date`
echo $HOME
x=foobar > q$$ $xyz z=3
Word Splitting
After parameter expansion, command
substitution, and arithmetic expansion,
the characters that are generated as a
result of these expansions that are not
inside double quotes are checked for split
characters
Default split character is space or tab
Split characters are defined by the value
of the IFS variable (IFS="" disables)
IFS=x v=exit
echo exit $v "$v"
exit e it exit
Pathname Expansion
After word splitting, each field that
contains pattern characters is replaced by
the pathnames that match
Quoting prevents expansion
set o noglob disables
Not in original Bourne shell, but in POSIX
Parsing Example
DATE=`date` echo $foo > \
/dev/null
DATE=`date` echo $foo > /dev/null
assignment
word
param
redirection
hello there
split by IFS
/dev/null
/dev/null
Script Examples
Rename files to lower case
Strip CR from files
Emit HTML for directory contents
Rename files
#!/bin/sh
for file in *
do
lfile=`echo $file | tr A-Z a-z`
if [ $file != $lfile ]
then
mv $file $lfile
fi
done
Generate HTML
$ dir2html.sh > dir.html
The Script
#!/bin/sh
[ "$1" != "" ] && cd "$1"
cat <<HUP
<html>
<h1> Directory listing for $PWD </h1>
<table border=1>
<tr>
HUP
num=0
for file in *
do
genhtml $file
# this function is on next
page
done
cat <<HUP
</tr>
</table>
</html>
Function genhtml
genhtml()
{
file=$1
echo "<td><tt>"
if [ -f $file ]
then
echo "<font color=blue>$file</font>"
elif [ -d $file ]
then
echo "<font color=red>$file</font>"
else
echo "$file"
fi
echo "</tt></td>"
num=`expr $num + 1`
if [ $num -gt 4 ]
then
echo "</tr><tr>"
num=0
fi
}
Command Substitution
Better syntax with $(command)
Allows nesting
x=$(cat $(generate_file_list))
Expressions
Expressions are built-in with the [[ ]] operator
if [[ $var = "" ]]
string == pattern
string != pattern
string1 < string2
file1 nt file2
file1 ot file2
file1 ef file2
&&, ||
Patterns
Can be used to do string matching:
if [[ $foo = *a* ]]
if [[ $foo = [abc]* ]]
Variables
Variables can be arrays
foo[3]=test
echo ${foo[3]}
Indexed by number
${#arr} is length of the array
Multiple array elements can be set at once:
set A foo a b c d
echo ${foo[1]}
Set command can also be used for positional
params: set a b c d; print $2
Printing
Built-in print command to replace echo
Much faster
Allows options:
-u#
Functions
Alternative function syntax:
function name {
commands
}
Additional Features
Built-in arithmetic: Using $((expression ))
e.g., print $(( 1 + 1 * 8 / x ))
$HOME
home directory of user
$PWD
$OLDPWD
Course Outline
Introduction
Shell Quoting
Quoting causes characters to loose special
meaning.
\
Quoting Examples
$ cat file*
a
b
$ cat "file*"
cat: file* not found
$ cat file1 > /dev/null
$ cat file1 ">" /dev/null
a
cat: >: cannot open
FILES="file1 file2"
$ cat "$FILES"
cat: file1 file2 not found
Shell Comments
Special Characters
The shell processes the following characters specially
unless quoted:
| & ( ) < > ; " ' $ ` space tab newline
Token Types
The shell uses spaces and tabs to split the
line or lines into the following types of
tokens:
Control operators (||)
Redirection operators (<)
Reserved words (if)
Assignment tokens
Word tokens
Operator Tokens
Operator tokens are recognized everywhere
unless quoted. Spaces are optional before and
after operator tokens.
I/O Redirection Operators:
> >> >| >& < << <<- <&
Each I/O operator can be immediately preceded by a
single digit
Control Operators:
| & ; ( ) || && ;;
Simple Commands
A simple command consists of three types of
tokens:
Assignments (must come first)
Command word tokens
Redirections: redirection-op + word-op
Word Splitting
After parameter expansion, command
substitution, and arithmetic expansion,
the characters that are generated as a
result of these expansions that are not
inside double quotes are checked for split
characters
Default split character is space or tab
Split characters are defined by the value
of the IFS variable (IFS="" disables)
IFS=x v=exit
echo exit $v "$v"
exit e it exit
Pathname Expansion
After word splitting, each field that
contains pattern characters is replaced by
the pathnames that match
Quoting prevents expansion
set o noglob disables
Not in original Bourne shell, but in POSIX
Parsing Example
DATE=`date` echo $foo > \
/dev/null
DATE=`date` echo $foo > /dev/null
assignment
word
param
redirection
hello there
split by IFS
/dev/null
/dev/null
trap command
trap specifies command that should be evaled
when the shell receives a signal of a particular
value.
trap [ [command] {signal}+]
If command is omitted, signals are ignored
Especially useful for cleaning up temporary files
trap 'echo "please, dont interrupt!"' SIGINT
trap 'rm /tmp/tmpfile' EXIT
Reading Lines
read is used to read a line from a file and
to store the result into shell variables
read r prevents special processing
Uses IFS to split into words
If no variable specified, uses REPLY
read
read r NAME
read FIRSTNAME LASTNAME
Script Examples
Rename files to lower case
Strip CR from files
Emit HTML for directory contents
Rename files
#!/bin/sh
for file in *
do
lfile=`echo $file | tr A-Z a-z`
if [ $file != $lfile ]
then
mv $file $lfile
fi
done
Generate HTML
$ dir2html.sh > dir.html
The Script
#!/bin/sh
[ "$1" != "" ] && cd "$1"
cat <<HUP
<html>
<h1> Directory listing for $PWD </h1>
<table border=1>
<tr>
HUP
num=0
for file in *
do
genhtml $file
# this function is on next
page
done
cat <<HUP
</tr>
</table>
</html>
Function genhtml
genhtml()
{
file=$1
echo "<td><tt>"
if [ -f $file ]
then
echo "<font color=blue>$file</font>"
elif [ -d $file ]
then
echo "<font color=red>$file</font>"
else
echo "$file"
fi
echo "</tt></td>"
num=`expr $num + 1`
if [ $num -gt 4 ]
then
echo "</tr><tr>"
num=0
fi
}
Command Substitution
Better syntax with $(command)
Allows nesting
x=$(cat $(generate_file_list))
Expressions
Expressions are built-in with the [[ ]] operator
if [[ $var = "" ]]
string == pattern
string != pattern
string1 < string2
file1 nt file2
file1 ot file2
file1 ef file2
&&, ||
Patterns
Can be used to do string matching:
if [[ $foo = *a* ]]
if [[ $foo = [abc]* ]]
Variables
Variables can be arrays
foo[3]=test
echo ${foo[3]}
Indexed by number
${#arr} is length of the array
Multiple array elements can be set at once:
set A foo a b c d
echo ${foo[1]}
Set command can also be used for positional
params: set a b c d; print $2
Functions
Alternative function syntax:
function name {
commands
}
Additional Features
Built-in arithmetic: Using $((expression ))
e.g., print $(( 1 + 1 * 8 / x ))
$HOME
home directory of user
$PWD
$OLDPWD
KornShell 93
Variable Attributes
By default attributes hold strings of unlimited length
Attributes can be set with typeset:
Name References
A name reference is a type of variable that
references another variable.
nameref is an alias for typeset -n
Example:
user1="jeff"
user2="adam"
typeset n name="user1"
print $name
jeff
Patterns Extended
Additional pattern
types so that shell
patterns are
equally
expressive as
regular
expressions
Used for:
file expansion
[[ ]]
case statements
parameter
expansion
Patterns
Regular Expressions
ANSI C Quoting
$'' Uses C escape sequences
$'\t'
$'Hello\nthere'
Extensions
Associative Arrays
Network Application
Client application and server application
communicate via a network protocol
A protocol is a set of rules on how the client and
server communicate
web
client
HTTP
web
server
kernel
user
TCP/IP Suite
application
layer
client
TCP/UDP
transport
layer
server
TCP/UDP
internet
layer
IP
drivers/
hardware
IP
drivers/
hardware
(ethernet)
Data Encapsulation
Data
ApplicationLayer
H1
Data
H2
H1
Data
H2
H1
Data
TransportLayer
InternetLayer
Network
Access
Layer
H3
Internet Layer
Transport Layer
Transport Layer
Host-host layer
Provides error-free, point-to-point connection between
hosts
Ports
Both TCP and UDP use 16-bit port numbers
A server application listen to a specific port for
connections
Ports used by popular applications are well-defined
SSH (22), SMTP (25), HTTP (80)
1-1023 are reserved (well-known)
Name Service
Every node on the network normally has a
hostname in addition to an IP address
Domain Name System (DNS) maps IP
addresses to names
e.g. 128.122.81.155 is access1.cims.nyu.edu
Sockets
Sockets provide access to TCP/IP on UNIX
systems
Sockets are communications endpoints
Invented in Berkeley UNIX
Allows a network connection to be opened as a
file (returns a file descriptor)
machine 1
machine 2
Ksh93: /dev/tcp
Files in the form
/dev/tcp/hostname/port result in a
HTTP
Hypertext Transfer Protocol
Use port 80
QuickTime and a
TIFF (LZW) decompressor
are needed to see this picture.
HTTPrequest:
Getmethisdocument
HTTPresponse:
Hereisyourdocument
Resources
Web servers host web resources, including
HTML files, PDF files, GIF files, MPEG movies,
etc.
Each web object has an associated MIME type
HTML document has type text/html
JPEG image has type image/jpeg
host
port
resource
HTTP Transactions
HTTP request to web server
GET/v40images/nyu.gifHTTP/1.1
Host:www.nyu.edu
request
HTTP/1.1200OK
Date:Wed,19Oct200506:59:49GMT
Server:Apache/2.0.49(Unix)mod_perl/1.99_14Perl/v5.8.4
mod_ssl/2.0.49OpenSSL/0.9.7emod_auth_kerb/4.13PHP/5.0.0RC3
LastModified:Thu,12Sep200217:09:03GMT
ContentLength:163
ContentType:text/html;charset=ISO88591
response
<!DOCTYPEHTMLPUBLIC"//IETF//DTDHTML//EN">
<html>
<head>
<title></title>
<metaHTTPEQUIV="Refresh"CONTENT="0;URL=csweb/index.html">
<body>
</body>
</html>
Status Codes
Status code in the HTTP response indicates if a
request is successful
Some typical status codes:
200
302
OK
Found; Resource in different
URI
401
403
404
Authorization required
Forbidden
Not Found
Gateways
Interface between resource and a web
server
QuickTime and a
TIFF (LZW) decompressor
are needed to see this picture.
http WebServer
Gateway
resource
CGI
Common Gateway Interface is a standard interface for
running helper applications to generate dynamic
contents
Specify the encoding of data passed to programs
CGI Diagram
HTTPrequest
QuickTime and a
TIFF (LZW) decompressor
are needed to see this picture.
WebServer
HTTPresponse
spawnprocess
Document
Script
HTML
Document format used on the web
<html>
<head>
<title>SomeDocument</title>
</head>
<body>
<h2>SomeTopics</h2>
ThisisanHTMLdocument
<p>
Thisisanotherparagraph
</body>
</html>
HTML
HTML is a file format that describes a web
page.
These files can be made by hand, or
generated by a program
A good way to generate an HTML file is by
writing a shell script
Forms
HTML forms are used to collect user input
Data sent via HTTP request
Server launches CGI script to process data
<formmethod=POST
action=http://www.cs.nyu.edu/~unixtool/cgi
bin/search.cgi>
Enteryourquery:<inputtype=textname=Search>
<inputtype=submit>
</form>
Input Types
Text Field
<inputtype=textname=zipcode>
Radio Buttons
<inputtype=radioname=sizevalue=S>Small
<inputtype=radioname=sizevalue=M>Medium
<inputtype=radioname=sizevalue=L>Large
Checkboxes
<inputtype=checkboxname=extrasvalue=lettuce>Lettuce
<inputtype=checkboxname=extrasvalue=tomato>Tomato
Text Area
<textareaname=addresscols=50rows=4>
</textarea>
Submit Button
Submits the form for processing by the
CGI script specified in the form tag
<inputtype=submitvalue=SubmitOrder>
HTTP Methods
Determine how form data are sent to web
server
Two methods:
GET
Form variables stored in URL
POST
Form variables sent as content of HTTP request
Special characters are replaced with %## (2digit hex number), spaces replaced with +
e.g. 10/20Wed is encoded as 10%2F20+Wed
GET/POST examples
GET:
GET/cgibin/myscript.pl?name=Bill%20Gates&
company=MicrosoftHTTP/1.1
HOST:www.cs.nyu.edu
POST:
POST/cgibin/myscript.plHTTP/1.1
HOST:www.cs.nyu.edu
other headers
name=Bill%20Gates&company=Microsoft
GET or POST?
GET method is useful for
Retrieving information, e.g. from a database
Embedding data in URL without form element
DOCUMENT_ROOT
HTTP_HOST
HTTP_REFERER
HTTP_USER_AGENT
HTTP_COOKIE
REMOTE_ADDR
REMOTE_HOST
REMOTE_USER
REQUEST_METHOD
SERVER_NAME
SERVER_PORT
Debugging
Debugging can be tricky, since error
messages don't always print well as HTML
One method: run interactively
$ QUERY_STRING='birthday=10/15/03'
$ ./birthday.cgi
Content-type: text/html
<html>
Your birthday is <tt>10/15/02</tt>.
</html>
CGI Benefits
Simple
Language independent
UNIX tools are good for this because
Work well with text
Integrate programs well
Easy to prototype
No compilation (CGI scripts)
What is Perl?
Practical Extraction and Report Language
Scripting language created by Larry Wall in the
mid-80s
Functionality and speed somewhere between
low-level languages (like C) and high-level
ones (like shell)
Influence from awk, sed, and C Shell
Easy to write (after you learn it), but sometimes
hard to read
Widely used in CGI scripting
Data Types
Basic types: scalar, lists, hashes
Support OO programming and userdefined types
What Type?
Type of variable determined by special
leading character
$foo
scalar
@foo
list
%foo
hash
&foo
function
Scalars
Can be numbers
$num = 100;
$num = 223.45;
$num = -1.3e38;
Can be strings
$str
$str
$str
$str
=
=
=
=
unix tools;
Who\s there?;
good evening\n;
one\ttwo;
Name of script
Default variable
Current PID
Status of last pipe or system call
System error message
Input record separator
Input record number
Acts like 0 or empty string
Operators
Numeric: + - * / % **
String concatenation: .
$state = New . York;
# NewYork
String repetition: x
print bla x 3;
# blablabla
Binary assignments:
$val = 2; $val *= 3;
$state .= City;
# $val is 6
# NewYorkCity
Comparison Operators
Comparison
Numeri
c
String
Equal
==
eq
Not Equal
!=
ne
Greater than
>
gt
Less than
<
lt
<=
le
>=
ge
Boolean Values
if ($ostype eq unix) { }
if ($val) { }
No boolean data type
undef is false
0 is false; Non-zero numbers are true
and 0 are false; other strings are true
The unary not (!) negates the boolean
value
Array/List Assignment
@teams=(Knicks,Nets,Lakers);
print $teams[0];
# print Knicks
$teams[3]=Celtics;# add new elt
@foo = ();
# empty list
@nums = (1..100);
# list of 1-100
@arr = ($x, $y*6);
($a, $b) = (apple, orange);
($a, $b) = ($b, $a); # swap $a $b
@arr1 = @arr2;
# $num gets 8
Hashes
Associative arrays - indexed by strings (keys)
$cap{Hawaii} = Honolulu;
%cap = ( New York, Albany, New Jersey,
Trenton, Delaware, Dover );
# New Jersey
Hash Functions
keys returns a list of keys
@state = keys %cap;
Merging Hashes
Method 1: Treat them as lists
%h3 = (%h1, %h2);
Subroutines
sub myfunc { }
$name=Jane;
sub print_hello {
print Hello $name\n; # global $name
}
&print_hello;
# print Hello Jane
print_hello;
# print Hello Jane
print_hello();
# print Hello Jane
Arguments
Parameters are assigned to the special array @_
Individual parameter can be accessed as $_[0],
$_[1],
sub sum {
my $x;
# private variable $x
foreach (@_) { # iterate over params
$x += $_;
}
return $x;
}
$n = &sum(3, 10, 22); # n gets 35
Return Values
The return value of a subroutine is the last
expression evaluated, or the value returned by
the return operator
sub myfunc {
my $x = 1;
$x + 2; #returns 3
}
sub myfunc {
my $x = 1;
return $x + 2;
} @somelist;
Can also return a list: return
Lexical Variables
Variables can be scoped to the enclosing block
with the my operator
sub myfunc {
my $x;
my($a, $b) = @_;
# copy params
use strict
The use strict pragma enforces some
good programming rules
All new variables need to be declared with
my
#!/usr/bin/perl -w
use strict;
$n = 1;
sub dec_by_one {
my @ret = @_;
for my $n (@ret) { $n-- }
return @ret;
}
sub minus_one {
for (@_) { $_-- }
}
# @res=(0, 1, 2, 3)
# (@nums,$num)=(1, 2, 3, 4)
# (@nums,$num)=(0, 1, 2, 3)
# make a copy
<>
Diamond operator < > helps Perl programs
behave like standard Unix utilities (cut, sed, )
Lines are read from list of files given as command
line arguments (@ARGV), otherwise from stdin
while (<>) {
chomp;
print Line $. from $ARGV is $_\n;
}
./myprog file1 file2 Read from file1, then file2, then standard input
$ARGV is the current filename
Filehandles
Use open to open a file for reading/writing
open
open
open
open
LOG,
LOG,
LOG,
LOG,
syslog;
<syslog;
>syslog;
>>syslog;
# read
# read
# write
# append
Errors
When a fatal error is encountered, use die to
print out error message and exit program
die Something bad happened\n if .;
Always check return value of open
open LOG, >>syslog
or die Cannot open log: $!;
Writing to a File
open LOG, >/tmp/log
or die Cannot create log: $!;
print LOG Some log messages\n
printf LOG %d entries processed.\n,
$num;
close LOG;
no comma after filehandle
-e
-z
-s
$_isthedefaultoperand
if - elsif - else
if elsif else
if ( $x
print
}
elsif (
print
}
else {
print
}
> 0 ) {
x is positive\n;
$x < 0 ) {
x is negative\n;
x is zero\n;
unless
Like the opposite of if
unless ($x < 0) {
print $x is non-negative\n;
}
unlink $file unless -A $file < 100;
for
for (init; test; incr) { }
# sum of squares of 1 to 5
for ($i = 1; $i <= 5; $i++) {
$sum += $i*$i;
}
next
next skips the remaining of the current
iteration (like continue in C)
# only print non-blank lines
while (<>) {
if ( $_ eq \n) { next; }
else { print; }
}
last
last exits loop immediately (like break in
C)
# print up to first blank line
while (<>) {
if ( $_ eq \n) { last; }
else { print; }
}
Logical AND/OR
Logical AND : &&
if (($x > 0) && ($x < 10)) { }
Logical OR : ||
if
Ternary Operator
Same as the ternary operator (?:) in C
expr1 ? expr2 : expr3
Like if-then-else: If expr1 is true, expr2 is
used; otherwise expr3 is used
$weather=($temp>50)?warm:cold;
Regular Expressions
Use EREs (egrep style)
Plus the following character classes
\w
word characters: [A-Za-z0-9_]
\d
digits: [0-9]
\s
whitespaces: [\f\t\n\r ]
\b
word boundaries
\W, \D, \S, \B are complements of the corresponding
classes above
Backreferences
Support backreferences
Subexpressions are referred to using \1, \
2, etc. in the RE and $1, $2, etc. outside
RE
if (/^this (red|blue|green) (bat|ball) is \1/)
{
($color, $object) = ($1, $2);
}
Matching
Pattern match operator: /RE/ is shortcut of m/RE/
Returns true if there is a match
Match against $_
Can also use m(RE), m<RE>, m!RE!, etc.
if (/^\/usr\/local\//) { }
if (m%/usr/local/%) { }
Case-insensitive match
if (/new york/i) { };
Matching cont.
To match an RE against something other than
$_, use the binding operator =~
if ($s =~ /\bblah/i) {
print Found blah!
}
!~ negates the match
while (<STDIN> !~ /^#/) { }
\Substitutions
Sed-like search and replace with s///
s/red/blue/;
$x =~ s/\w+$/$`/;
RE Functions
split string using RE (whitespace by default)
@fields = split /:/, ::ab:cde:f;
# gets (,,ab,cde,f)
# gets --ab-cde-f
Capturing Output
If output from another program needs to
be collected, use the backticks
my $files = `ls *.c`;
Collect all output lines into a single string
my @files = `ls *.c`;
Each element is an output line
Environment Variables
Environment variables are stored in the
special hash %ENV
$ENV{PATH} = /usr/local/bin:
$ENV{PATH};
s2p
Translates a sed script to Perl
perldoc
$
$
$
$
Modules
Perl modules are libraries of reusable
code with specific functionalities
Standard modules are distributed with
Perl, others can be obtained from
Include modules in your program with use,
e.g. use CGI incorporates the CGI module
Each module has its own namespace
CGI Programming
Forms
HTML forms are used to collect user input
Data sent via HTTP request
Server launches CGI script to process data
<form method=POST
action=http://www.cs.nyu.edu/~unixtool/cgibin/search.cgi>
Enter your query: <input type=text name=Search>
<input type=submit>
</form>
Input Types
Text Field
<input type=text name=zipcode>
Radio Buttons
<input type=radio name=size value=S> Small
<input type=radio name=size value=M> Medium
<input type=radio name=size value=L> Large
Checkboxes
<input type=checkbox name=extras value=lettuce> Lettuce
<input type=checkbox name=extras value=tomato> Tomato
Text Area
<textarea name=address cols=50 rows=4>
</textarea>
Submit Button
Submits the form for processing by the
CGI script specified in the form tag
<input type=submit value=Submit Order>
HTTP Methods
Determine how form data are sent to web
server
Two methods:
GET
Form variables stored in URL
POST
Form variables sent as content of HTTP request
Special characters are replaced with %## (2digit hex number), spaces replaced with +
e.g. 11/8 Wed is encoded as 11%2F8+Wed
GET or POST?
GET method is useful for
Retrieving information, e.g. from a database
Embedding data in URL without form element
</html>
CGI Benefits
Simple
Language independent
UNIX tools are good for this because
Work well with text
Integrate programs well
Easy to prototype
No compilation (CGI scripts)