Partially funded by DOE/MICS Field Work Proposal on Internet End-to-end Performance Monitoring (IEPM), also supported by IUPAP
Overview
Why is measurement difficult yet important?
LAN vs WAN
SNMP
Effects of measurement interval
Passive
Active
Tools
Trouble shooting
Tools, how to find things & who to tell
ISPs worried about providing access to core, making results public, & privacy issues
The phone connection oriented model (Poisson distributions of session length etc.) does not work for Internet traffic (heavy tails, self similar behavior, multi-fractals etc.)
LAN vs WAN
Measuring the LAN
Network admin has control so:
Can read MIBs from devices
Can within limits passively sniff traffic
Know the routes between devices
Manually for small networks
Automated for large networks
So typically have to make do with what can be measured from end to end, with very limited information from intermediate equipment hops.
Remote Network Monitoring (RMON) is a management tool for passively watching line traffic
SNMP communication protocol to read out data and set parameters
Polling protocol, manager asks questions & agent responds
[Diagram: TCP/IP net connecting four managed nodes, each with an Agent and MIB]
NMS contains manager software to send & receive SNMP messages to Agents
Agent is a software component residing on a managed node, responds to SNMP queries, performs updates & reports problems
MIB resides on nodes and at NMS and is a logical description of all network management data.
MIB variables must be polled separately, i.e. entire MIB cannot be fetched with single command
SNMPv2 and v3 attempt to address these and other limitations
Despite limitations, SNMP has been a big success
Provides device and link utilization (bytes, packets) and errors
Lot of facilities/tools built around SNMP to provide reports for sites
Security concerns limit access typically to very limited set of owner/admins
E.g. ISPs won't let you poll their devices
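The utilization figures mentioned above are normally derived from two successive polls of an interface octet counter (for example the MIB-II `ifInOctets` object, a 32-bit counter). The sketch below is illustrative only: the function name, the sample readings and the 10 Mbit/s link speed are assumptions, and it assumes at most one counter wrap between polls.

```python
def utilization(octets_t0, octets_t1, interval_s, link_bps, counter_bits=32):
    """Return link utilization (0..1) between two polls of an octet counter."""
    modulus = 1 << counter_bits
    # Modular subtraction absorbs a single counter wrap between the two polls.
    delta_octets = (octets_t1 - octets_t0) % modulus
    return (delta_octets * 8) / (interval_s * link_bps)


# Hypothetical readings: two polls 300 s apart on a 10 Mbit/s link.
print(utilization(1_000_000, 38_500_000, 300, 10_000_000))

# Same calculation when the 32-bit counter wrapped between polls.
print(utilization(2**32 - 100, 275, 300, 10_000_000))
```

With a 5 minute polling interval a fast link can wrap a 32-bit counter more than once, in which case this calculation under-reports; a shorter interval or a 64-bit counter avoids that.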
SNMP Examples
Using MRTG to display Router bits/s MIB variable
Averaging intervals
Typical measurements of utilization are made over 5 minute intervals or longer so as not to create much impact.
Interactive human use requires second or sub-second response.
So it is interesting to see the difference between measurements made with different time frames.
[Figure: utilization plots for averaging intervals of 5 secs, 5 mins and 1 hour]
Averages vs maxima
Maximum of all 5 sec samples can be factor of 2 or more greater than the average over 5 minutes
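A small numerical illustration of the point above, using made-up data: sixty 5 second utilization samples (5 minutes in total) in which the link is mostly quiet but has a few bursts. The sample values and the function name are assumptions chosen only to show how an average hides peaks.

```python
def summarize(samples):
    """Return (average, maximum, max/average ratio) for utilization samples."""
    average = sum(samples) / len(samples)
    peak = max(samples)
    return average, peak, peak / average


# Hypothetical link: 50 quiet samples at 10% and 10 bursts at 90% utilization.
five_second_samples = [0.1] * 50 + [0.9] * 10
average, peak, ratio = summarize(five_second_samples)
print(f"5 min average = {average:.2f}, max 5 s sample = {peak:.2f}, ratio = {ratio:.1f}")
```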
Active:
provides explicit control on the generation of packets for measurement scenarios
testing what you want, when you need it.
Injects extra artificial traffic
Passive tools
SNMP
Hardware probes, e.g. Sniffer, NetScout; can be stand-alone or remotely accessed from a central management station
Software probes: snoop, tcpdump; require promiscuous access to the NIC, i.e. root/sudo access
Flow measurement: netramet, OCxMon/CoralReef, Cisco/Netflow
[Network diagram: rtr-msfc-dmz, swh-dmz, swh-root, NTON, slac-rt1.es.net, ESnet; links marked (*) and (#)]
(*) Upgrade to OC12 has been requested
(#) This link will be replaced with an OC48 POS card for the 6500 when available
[Figure: 2 Days SSH]
Time series
[Figure: time series plots labeled UDP, TCP, Outgoing]
Flow sizes
[Figure: flow size plots labeled SNMP, Real A/V]
Flow lengths
Distribution of netflow lengths for SLAC border
Log-log plots, linear trendline = power law
Netflow ties off flows after 30 minutes
TCP, UDP & ICMP flows are ~log-log linear for longer (hundreds to 1500 seconds) flows (heavy-tails)
There are some peaks in TCP distributions, timeouts?
Web server CGI script timeouts (300s), TCP connection establishment (default 75s), TIME_WAIT (default 240s), tcp_fin_wait (default 675s)
[Figure: log-log flow length distributions for ICMP, TCP and UDP]
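The statement that a linear trendline on a log-log plot means a power law can be checked numerically: if count = C * duration^a, then log10(count) = log10(C) + a * log10(duration), so a straight-line fit in log-log space recovers the exponent a. The sketch below uses synthetic data with an assumed exponent of -1.5; it is not the SLAC netflow data, and the function name is illustrative.

```python
import math


def loglog_fit(xs, ys):
    """Least-squares line through (log10 x, log10 y); returns (slope, intercept)."""
    lx = [math.log10(x) for x in xs]
    ly = [math.log10(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic flow-length histogram: count = 1e6 * duration**-1.5 (assumed values).
durations = [100, 200, 400, 800, 1500]
counts = [1e6 * d ** -1.5 for d in durations]
slope, intercept = loglog_fit(durations, counts)
print(f"power-law exponent ~ {slope:.2f}, log10(C) ~ {intercept:.2f}")
```

On real data, peaks such as the timeouts listed above would show up as points well above the fitted line.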
Flow lengths
60% of TCP flows less than 1 second
Would expect TCP streams longer lived
But 60% of UDP flows over 10 seconds, maybe due to heavy use of AFS
Alternative synack, but can look like DoS attack
Sting: measures one way loss
Traceroute
How it works, what it provides
Reverse traceroute servers
Traceroute archives
Ping
ICMP client/server application built on IP
Client sends ICMP echo request, server sends reply
Server usually in kernel, so reliable & fast
ICMP echo request layout (bit offsets 0 to 31):
bits 0-7: Type=8 | bits 8-15: Code | bits 16-31: Checksum
bits 0-15: Identifier | bits 16-31: Sequence number
Optional data
User can specify number of data bytes.
Client puts timestamp in data bytes.
Compares timestamp with time when echo comes back to get RTT
Many flavors (e.g. fping) and options
packet length, number of tries, timeout, separation
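The packet layout and timestamp trick described above can be sketched as follows. This only builds the bytes of an echo request; it does not send anything, since raw ICMP sockets need root privileges. The function names, the identifier value and the 56-byte payload size are illustrative assumptions, and the checksum is the standard 16-bit one's-complement Internet checksum.

```python
import struct
import time


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload_len: int = 56) -> bytes:
    """Build an ICMP echo request (Type=8, Code=0) with a timestamp in the data."""
    timestamp = struct.pack("!d", time.time())  # 8-byte send time
    payload = timestamp + bytes(max(0, payload_len - len(timestamp)))
    # Checksum is computed with the checksum field set to zero, then filled in.
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1)
print(len(packet), "bytes, checksum verifies:", internet_checksum(packet) == 0)
```

When the echo comes back, the client unpacks the first 8 data bytes and subtracts them from the current time to get the RTT.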
Ping example
Callouts: -c 6 is the repeat count, -s 64 the packet size, thumper.bellcore.com the remote host; time= is the RTT; icmp_seq=1 is a missing seq #; the last lines are the summary.
syrup:/home$ ping -c 6 -s 64 thumper.bellcore.com
PING thumper.bellcore.com (128.96.41.1): 64 data bytes
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
--- thumper.bellcore.com ping statistics ---
6 packets transmitted, 5 packets received, 16% packet loss
round-trip min/avg/max = 482.1/880.5/1447.4 ms
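The summary figures in the ping example can be reproduced from the per-packet lines. The sketch below parses the reply lines from the example, finds the missing sequence number, and recomputes loss and min/avg/max RTT. The function name and regular expression are illustrative and assume this particular output format; other ping implementations print slightly different lines.

```python
import re

PING_REPLIES = """\
72 bytes from 128.96.41.1: icmp_seq=0 ttl=240 time=641.8 ms
72 bytes from 128.96.41.1: icmp_seq=2 ttl=240 time=1072.7 ms
72 bytes from 128.96.41.1: icmp_seq=3 ttl=240 time=1447.4 ms
72 bytes from 128.96.41.1: icmp_seq=4 ttl=240 time=758.5 ms
72 bytes from 128.96.41.1: icmp_seq=5 ttl=240 time=482.1 ms
"""


def summarize_ping(output: str, sent: int) -> dict:
    """Recompute loss and RTT statistics from ping reply lines."""
    replies = re.findall(r"icmp_seq=(\d+) .*?time=([\d.]+) ms", output)
    seen = {int(seq) for seq, _ in replies}
    rtts = [float(rtt) for _, rtt in replies]
    return {
        "missing": sorted(set(range(sent)) - seen),
        "loss_pct": 100.0 * (sent - len(rtts)) / sent,
        "min": min(rtts),
        "avg": round(sum(rtts) / len(rtts), 1),
        "max": max(rtts),
    }


print(summarize_ping(PING_REPLIES, sent=6))
```

The computed loss is 1 packet in 6, about 16.7%, which ping reports as 16%.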
Traceroute
UDP/ICMP tool to show route packets take from local to remote host
Callouts: -q 1 sets probes/hop, -m 20 the max hops, lhr.comsats.net.pk is the remote host; the * at hop 13 means no response: lost packet or router ignores.
17cottrell@flora06:~>traceroute -q 1 -m 20 lhr.comsats.net.pk
traceroute to lhr.comsats.net.pk (210.56.16.10), 20 hops max, 40 byte packets
1 RTR-CORE1.SLAC.Stanford.EDU (134.79.19.2) 0.642 ms
2 RTR-MSFC-DMZ.SLAC.Stanford.EDU (134.79.135.21) 0.616 ms
3 ESNET-A-GATEWAY.SLAC.Stanford.EDU (192.68.191.66) 0.716 ms
4 snv-slac.es.net (134.55.208.30) 1.377 ms
5 nyc-snv.es.net (134.55.205.22) 75.536 ms
6 nynap-nyc.es.net (134.55.208.146) 80.629 ms
7 gin-nyy-bbl.teleglobe.net (192.157.69.33) 154.742 ms
8 if-1-0-1.bb5.NewYork.Teleglobe.net (207.45.223.5) 137.403 ms
9 if-12-0-0.bb6.NewYork.Teleglobe.net (207.45.221.72) 135.850 ms
10 207.45.205.18 (207.45.205.18) 128.648 ms
11 210.56.31.94 (210.56.31.94) 762.150 ms
12 islamabad-gw2.comsats.net.pk (210.56.8.4) 751.851 ms
13 *
14 lhr.comsats.net.pk (210.56.16.10) 827.301 ms
Reverse traceroute server runs as CGI script in web server
Allows measurement of route from other end. Important for asymmetric routes.
See e.g. www.slac.stanford.edu/comp/net/wan-mon/traceroute-srv.html
CAIDA map of reverse traceroute servers: www.caida.org/analysis/routing/reversetrace/
Pingroute
Run traceroute, then ping each router n times
helps identify where in route the problems start to occur
Routers may not respond to pings, or may treat pings directed at them differently from other packets
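One way to use pingroute results, sketched under assumptions: given the loss percentage measured to each hop in route order, look for the first hop from which loss persists all the way to the destination. Loss at a single intermediate router that later hops do not share is ignored, since that router may simply be deprioritizing pings addressed to it. The function name, the 5% threshold and the loss figures are all hypothetical.

```python
def first_persistent_loss(loss_by_hop, threshold=5.0):
    """Return the 1-based hop where loss >= threshold starts and continues
    to the final hop, or None if the last hop shows no such loss."""
    start = None
    for hop, loss in enumerate(loss_by_hop, start=1):
        if loss >= threshold:
            if start is None:
                start = hop
        else:
            start = None  # loss did not persist, so discard the earlier candidate
    return start


# Hypothetical per-hop ping loss (%): hop 3 ignores pings, real trouble begins at hop 6.
losses = [0, 0, 100, 0, 0, 12, 15, 14]
print(first_persistent_loss(losses))
```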
Path characterization
Pathchar
sends multiple packets of varying sizes to each router along route
measures minimum response time
plot min RTT vs packet size to get bandwidth
calculate differences to get individual hop characteristics
measures for each hop: BW, queuing, delay/hop
can take a long time
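A simplified sketch of the calculation behind the Pathchar steps above, not Pathchar's actual code. It assumes the minimum RTT to hop n grows linearly with packet size, with slope equal to the sum of 1/bandwidth over the links up to that hop, so the difference between the slopes of successive hops gives one link's bandwidth. The data are synthetic (a 100 Mbit/s link followed by a 10 Mbit/s link) and the function names are illustrative.

```python
def fit_slope(sizes, rtts):
    """Least-squares slope of RTT (seconds) against packet size (bytes)."""
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_r = sum(rtts) / n
    num = sum((s - mean_s) * (r - mean_r) for s, r in zip(sizes, rtts))
    den = sum((s - mean_s) ** 2 for s in sizes)
    return num / den


def per_hop_bandwidth(sizes, min_rtts_by_hop):
    """Estimate each link's bandwidth (bits/s) from slope differences."""
    bandwidths = []
    previous_slope = 0.0
    for rtts in min_rtts_by_hop:
        slope = fit_slope(sizes, rtts)  # seconds per byte up to this hop
        bandwidths.append(8.0 / (slope - previous_slope))
        previous_slope = slope
    return bandwidths


# Synthetic minimum RTTs: fixed latency plus per-byte serialization time.
sizes = [64, 512, 1024, 1500]
hop1 = [0.0005 + s * 8 / 100e6 for s in sizes]
hop2 = [0.0020 + s * 8 / 100e6 + s * 8 / 10e6 for s in sizes]
print([round(bw / 1e6, 2) for bw in per_hop_bandwidth(sizes, [hop1, hop2])])
```

Taking the minimum over many probes per size is what filters out queuing delay, which is also why the real tool takes a long time.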
Pipechar
Also sends back-to-back packets and measures separation on return
Much faster
Finds bottleneck
[Diagram: back-to-back packets passing through the bottleneck; min spacing is set at the bottleneck]
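The back-to-back packet idea reduces to one division: the bottleneck link spreads two packets apart by the time it takes to transmit one of them, so bandwidth is roughly packet size divided by the minimum observed spacing. The sketch below is a minimal illustration with assumed numbers and an illustrative function name; it takes the minimum over several pairs on the assumption that cross traffic mostly widens the gap.

```python
def bottleneck_bandwidth_bps(packet_bytes, spacings_s):
    """Estimate bottleneck bandwidth from packet-pair arrival spacings."""
    return packet_bytes * 8 / min(spacings_s)


# Hypothetical spacings (seconds) measured for several 1500-byte packet pairs.
spacings = [0.0013, 0.0012, 0.0015]
print(f"{bottleneck_bandwidth_bps(1500, spacings) / 1e6:.1f} Mbit/s")
```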
Network throughput
Iperf
Client generates & sends UDP or TCP packets
Server receives packets
Can select port, maximum window size, duration, Mbytes to send etc.
Client/server communicate packets seen etc.
Reports on throughput
Requires server to be installed at remote site, i.e. friendly administrators or logon account and password
Iperf example
Callouts: -p 5008 is the TCP port, -w 512K the max window size, -P 3 gives 3 parallel streams, -c names the remote host.
25cottrell@flora06:~>iperf -p 5008 -w 512K -P 3 -c sunstats.cern.ch
------------------------------------------------------------
Client connecting to sunstats.cern.ch, TCP port 5008
TCP window size: 512 KByte
------------------------------------------------------------
[ 6] local 134.79.16.101 port 57582 connected with 192.65.185.20 port 5008
[ 5] local 134.79.16.101 port 57581 connected with 192.65.185.20 port 5008
[ 4] local 134.79.16.101 port 57580 connected with 192.65.185.20 port 5008
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec
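With parallel streams, the number of interest is usually the aggregate across streams. The sketch below sums the per-stream bandwidth figures from the example output; the function name and regular expression are illustrative and assume the report format shown in the example.

```python
import re

IPERF_REPORT = """\
[ 4] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 5] 0.0-10.3 sec 19.6 MBytes 15.3 Mbits/sec
[ 6] 0.0-10.3 sec 19.7 MBytes 15.3 Mbits/sec
"""


def aggregate_mbits(report: str):
    """Return (number of streams, summed bandwidth in Mbits/sec)."""
    rates = [float(rate) for rate in re.findall(r"([\d.]+) Mbits/sec", report)]
    return len(rates), sum(rates)


streams, total = aggregate_mbits(IPERF_REPORT)
print(f"{streams} streams, {total:.1f} Mbits/sec aggregate")
```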
http://amp.nlanr.net/AMP/
AMP uses dedicated PCs as monitors, ~ 115 (June, 2001)
Oriented to Internet 2, ~ 10 countries
Does mainly full mesh pinging
PingER
Measurements from
32 monitors in 14 countries
Over 600 remote hosts
Over 72 countries
Over 3300 monitor-remote site pairs
Measurements go back to Jan-95
Reports on RTT, loss, reachability, jitter, reorders, duplicates
PingER cont.
Monitor timestamps and sends ping to remote site at regular intervals (typically about every 30 minutes)
Remote site echoes the ping back
Monitor notes current and send time and gets RTT
Discussing installing monitor site in Pakistan
provide real experience of using techniques
get real measurements to set expectations, identify problem areas, make recommendations
provide access to data for developing new analysis techniques, for statisticians etc.
NIMI (National Internet Measurement Infrastructure) is more of an infrastructure for measurements and some tools (i.e. currently does not have publicly available, regularly updated data)
Mainly full mesh measurements on demand
Skitter
Makes ping & route measurements to tens of thousands of sites around the world. Site selection varies based on web site hits.
Provide loss & RTTs
Skitter & PingER are the main 2 sites to monitor the developing world.
Trouble shooting
Ping to localhost, ping to gateway & to remote host
Use IP address to avoid nameserver problems
Look for connectivity, loss & RTT
May need to run for a long time to see some pathologies (e.g. bursty loss due to DSL loss of sync)
Use synack or sting if ICMP blocked
Traceroute to remote host
Reverse traceroute from remote host to you
Ping routers along route
Look at history plots (PingER, AMP, Surveyor), when did problem start, how big an effect is it?
Trouble shooting
Try user application
Iperf to test throughput
Where is a host?
Where is host?
Name server lookup to find hostname given IP address
47cottrell@netflow:~>nslookup 210.56.16.10
Server: localhost
Address: 127.0.0.1
Name: lhr.comsats.net.pk
Address: 210.56.16.10
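A lookup from IP address to hostname is a DNS PTR query on a name built by reversing the address octets under in-addr.arpa. The sketch below shows that name for the address in the example using Python's standard `ipaddress` module. The helper `lookup_hostname` wraps the standard `socket.gethostbyaddr` call and is defined but not invoked here, because it needs a working resolver and network access; both function names are illustrative.

```python
import ipaddress
import socket


def ptr_name(ip: str) -> str:
    """Return the DNS name queried for a reverse (PTR) lookup of this address."""
    return ipaddress.ip_address(ip).reverse_pointer


def lookup_hostname(ip: str) -> str:
    """Ask the system resolver for the hostname; requires network access."""
    hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    return hostname


print(ptr_name("210.56.16.10"))
```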
Get contacts for ISPs (if know ISP or AS): http://puck.nether.net/netops/nocs.cgi
Gives ISP name, web page, phone number, email, hours etc.
Look at real-time information about the global routing system from the perspectives of several different locations around the Internet
Use route views at www.antc.uoregon.edu/route-views/
More Information
Tutorial on monitoring
www.slac.stanford.edu/comp/net/wan-mon/tutorial.html
Ping
http://www.ping127001.com/pingpage.htm
Ames IXP: approximately 60-65% was HTTP, about 13% was NNTP
Uwisc: 34% HTTP, 24% FTP, 13% Napster