Professional Documents
Culture Documents
ZMap source code and documentation are available for download at https://zmap.io/.
Eric Wustrow
University of Michigan
J. Alex Halderman
University of Michigan
zakir@umich.edu
ewust@umich.edu
jhalderm@umich.edu
Abstract
Internet-wide network scanning has numerous security
applications, including exposing new vulnerabilities and
tracking the adoption of defensive mechanisms, but probing the entire public address space with existing tools is
both difficult and slow. We introduce ZMap, a modular,
open-source network scanner specifically architected to
perform Internet-wide scans and capable of surveying
the entire IPv4 address space in under 45 minutes from
user space on a single machine, approaching the theoretical maximum speed of gigabit Ethernet. We present
the scanner architecture, experimentally characterize its
performance and accuracy, and explore the security implications of high speed Internet-scale network surveys, both
offensive and defensive. We also discuss best practices for
good Internet citizenship when performing Internet-wide
surveys, informed by our own experiences conducting a
long-term research survey over the past year.
2.1
Addressing Probes
State
&
Config
Packet
Generation
Address Generation
Packet
Transmission
Probe Scheduler
Framework Monitoring
Output Handler
Response
Interpretation
Result Processing
Receipt &
Validation
Figure 1: ZMap Architecture ZMap is an open-source network scanner optimized for efficiently performing
Internet-scale network surveys. Modular packet generation and response interpretation components (blue) support
multiple kinds of probes, including TCP SYN scans and ICMP echo scans. Modular output handlers (red) allow users
to output or act on scan results in application-specific ways. The architecture allows sending and receiving components
to run asynchronously and enables a single source machine to comprehensively scan every host in the public IPv4
address space for a particular open TCP port in under 45 mins using a 1 Gbps Ethernet link.
specific case, we know that 3 is a primitive root of
(Z/4, 294, 967, 311Z) .
Because we know that the generators of (Z p1 , +) are
{s|(s, p 1) = 1}, we can efficiently find the generators
of the additive group by precalculating and storing the
factorization of p 1 and checking addresses against the
factorization at random until we find one that is coprime
with p 1 and then map it into (Zp , ). Given that there
exist approximately 109 generators, we expect to make
four tries before finding a primitive root. While this process introduces complexity at the beginning of a scan, it
adds only a small amount of one-time overhead.
Once a primitive root has been generated, we can easily
iterate through the address space by applying the group
operation to the current address (in other words, multiplying the current address by the primitive root modulo
232 + 15). We detect that a scan has completed when we
reach the initially scanned IP address. This technique
allows the sending thread to store the selected permutation and progress through it with only three integers: the
primitive root used to generate the multiplicative group,
the first scanned address, and the current address.
2.2
Upon receipt of a packet, we check the source and destination port, discard packets clearly not initiated by the
scan, and pass the remaining packets to the active probe
module for interpretation.
While the sending and receiving components of ZMap
operate independently, we ensure that the receiver is initialized prior to sending probes and that the receiver continues to run for a period of time (by default, 8 seconds)
after the sender has completed in order to process any
delayed responses.
2.3
Probe Modules
2.4
Output Modules
Checking Response Integrity ZMaps receiving components need to determine whether received packets are
valid responses to probes originating from the scanner
or are part of other background traffic. Probe modules perform this validation by encoding host- and scaninvocationspecific data into mutable fields of each probe
packet, utilizing fields that will have recognizable effects
on fields of the corresponding response packets in a manner similar to SYN cookies [4].
For each scanned host, ZMap computes a MAC of the
destination address keyed by a scan-specific secret. This
MAC value is then spread across any available fields by
4
1.02
Hitrate
1.01
0.98
0.97
0.96
0.95
0.94
0
00
00 0
15 00
00 0
14 00
00 0
13 00
00 0
12 00
00 0
11 00
00
10 00
00
75 00
00
50 00
00
25 00
00
10 0
00
50 0
00
25 0
00
10
00
50
00
25
00
10
3.1
1
0.99
Figure 2: Hit rate vs. Scan rate We find no correlation between hit rate (positive responses/hosts probed)
and scan rate (probes sent/second). Shown are means and
standard deviations over ten trials. This indicates that
slower scanning does not reveal additional hosts.
89000
Hosts Found
88500
88000
87500
87000
86500
86000
85500
85000
10
15
20
Unique SYN Packets Sent
25
30
In order to determine whether our scanner and our upstream network can handle scanning at gigabit line speed,
we examine whether the scan rate, the rate at which ZMap
sends probe packets, has any effect on the hit rate, the
fraction of probed hosts that respond positively (in this
case, with a SYN-ACK). If libpcap, the Linux kernel, our
institutional network, or our upstream provider are unable
to adequately handle the traffic generated by the scanner
at full speed, we would expect packets to be dropped and
the hit rate to be lower than at slower scan rates.
We experimented by sending TCP SYN packets to
random 1% samples of the IPv4 address space on port
443 at varying scan rates. We conducted 10 trials at each
of 16 scan rates ranging from 1,000 to 1.4 M packets per
second. The results are shown in Figure 2.
We find no statistically significant correlation between
scan rate and hit rate. This shows that our ZMap setup
is capable of handling scanning at 1.4 M packets per
1.02
Hitrate
0.98
0.96
0.94
0.92
:0
22
:0
20
:0
18
:0
16
:0
14
:0
:0
12
00
10
00
8:
00
6:
4:
00
2:
:0
:0
00
22
Time of Day
Figure 4: Diurnal Effect on Hosts Found We observed a 3.1% variation in ZMaps hit rate depending
on the time of day the scan was performed. (Times EST.)
5
3.3
second and that scanning at lower rates provides no benefit in terms of identifying additional hosts. From an
architectural perspective, this validates that our receiving
infrastructure based on libpcap is capable of processing
responses generated by the scanner at full speed and that
kernel modules such as PF_RING [8] are not necessary
for gigabit-speed network scanning.
3.2
3.4
Scan Type
Coverage
(normalized)
Duration
(mm:ss)
0.978
0.814
1.000
0.987
45:03
24:12
00:11
00:10
116.3 days
62.5 days
2:12:35
1:09:45
Table 1: ZMap vs. Nmap Comparison We scanned 1 million hosts on TCP port 443 using ZMap and Nmap and
averaged over 10 trials. Despite running hundreds of times faster, ZMap finds more listening hosts than Nmap, due to
Nmaps low host timeout. Times for ZMap include a fixed 8 second delay to wait for responses after the final probe.
second configuration that sends two SYN probes to each
host (-P 2). We repeated this process for 10 trials over a
12 hour period and report the averages in Table 1.
The results show that ZMap scanned much faster than
Nmap and found more listening hosts than either Nmap
configuration. The reported durations for ZMap include
time sent sending probes as well as a fixed 8-second delay
after the sending process completes, during which ZMap
waits for late responses. Extrapolating to the time required for an Internet-wide scan, the fastest tested ZMap
configuration would complete approximately 1300 times
faster than the fastest Nmap configuration.1
0.8
0.6
3.5
1.0
0.4
0.2
0.0
0
0.2
0.4
0.6
response time (seconds)
0.8
Scan
Date
2010/12
2011/10
2012/06
2013/05
TLS Servers
All Certs
Trusted Certs
16.2 M
28.9 M
31.8 M
34.5 M
7.7 M
12.8 M
19.0 M
22.8 M
4.0 M
5.8 M
7.8 M
8.6 M
1.46 M
1.96 M
2.95 M
3.27 M
Table 2: Comparison with Prior Internet-wide HTTPS Surveys Due to growth in HTTPS deployment, ZMap
finds almost three times as many TLS servers as the SSL Observatory did in late 2010, yet this process takes only
10 hours to complete from a single machine using a ZMap-based workflow, versus three months on three machines.
To compare ZMaps performance for this task, we used
it to conduct comprehensive scans of port 443 and used
a custom certificate fetcher based on libevent [24] and
OpenSSL [37] to retrieve TLS certificates from each responsive host. With this methodology, we were able to
discover hosts, perform TLS handshakes, and collect and
parse the resulting certificates in under 10 hours from a
single machine.
As shown in Table 2, we find significantly more TLS
servers than previous work78% more than Heninger
et al. and 196% more than the SSL Observatorylikely
due to increased HTTPS deployment since those studies
were conducted. Linear regression shows an average
growth in HTTPS deployment of about 540,000 hosts
per month over the 29 month period between the SSL
Observatory scan and our most recent dataset. Despite
this growth, ZMap is able to collect comprehensive TLS
certificate data in a fraction of the time and cost needed
in earlier work. The SSL Observatory took roughly 650
times as much machine time to acquire the same kind of
data, and Heninger et al. took about 65 times as much.
Organization
GoDaddy.com, Inc.
GeoTrust Inc.
Comodo CA Limited
VeriSign, Inc.
Thawte, Inc.
DigiCert Inc
GlobalSign
Starfield Technologies
StartCom Ltd.
Entrust, Inc.
913,416
586,376
374,769
317,934
228,779
145,232
117,685
94,794
88,729
76,929
(31.0%)
(19.9%)
(12.7%)
(10.8%)
(7.8%)
(4.9%)
(4.0%)
(3.2%)
(3.0%)
(2.6%)
4.1
Certificates
4.2
Port
80
7547
443
21
23
22
25
3479
8080
53
Service
HTTP
CWMP
HTTPS
FTP
Telnet
SSH
SMTP
2-Wire RPC
HTTP-alt/proxy
DNS
1.77
1.12
0.93
0.77
0.71
0.57
0.43
0.42
0.38
0.38
Trusted Certificates
1.25
HTTPS Hosts
Unique Certificates
Trusted
Certificates
1.2
Alexa Top 1 Mil. Domains
E.V. Certificates
Netcraft HTTP Hosts
1.15
4.3
1.1
1.05
1
0.95
3
/1
3
/1
05
3
/1
04
3
/1
03
3
/1
02
2
/1
01
2
/1
12
2
/1
11
2
/1
10
2
/1
09
2
/1
08
2
/1
07
06
Scan Date
0.62
0.6
0.58
0.56
0.54
0.52
10
13
/
05
13
/
03
13
/
01
12
/
11
12
/
09
12
/
07
Scan Date
4.4
The ability to perform comprehensive Internet scans implies the potential to uncover unadvertised services that
were previously only accessible with explicit knowledge
of the host name or address. For example, Tor bridges
are intentionally not published in order to prevent ISPs
and government censors from blocking connections to
the Tor network [35]. Instead, the Tor Project provides
users with the IP addresses of a small number of bridges
based on their source address. While Tor developers have
acknowledged that bridges can in principle be found by
Internet-wide scanning [9], the set of active bridges is constantly changing, and the data would be stale by the time
a long running scan was complete. However, high-speed
scanning might be used to mount an effective attack.
2 Moore reported many more UPnP hosts [25] but acknowledges that
his scans occurred over a 5 month period and did not account for hosts
being counted multiple times due to changing IP addresses.
10
4.6
4.5
Table 5: Recommended Practices We offer these suggestions for other researchers conducting fast Internetwide scans as guidelines for good Internet citizenship.
5.1
User Responses
Small/Medium Business
Home User
Other Corporation
Academic Institution
Government/Military
Internet Service Provider
Unknown
Total Entities
41
38
17
22
15
2
10
145
Related Work
Future Work
Scanning IPv6 While ZMap is capable of rapidly scanning the IPv4 address space, brute-force scanning methods will not suffice in the IPv6 address space, which
is far too large to be fully enumerated [7]. This places
current researchers in a window of opportunity to take
advantage of fast Internet-wide scanning methodologies
before IPv6-only services become common place. New
methodologies will need to be developed specifically for
performing surveys of the IPv6 address space.
Acknowledgments
The authors thank the exceptional sysadmins at the University of Michigan for their help and support throughout
this project. This research would not have been possible
without Kevin Cheek, Laura Fink, Paul Howell, Don Winsor, and others from ITS, CAEN, and DCO. We thank
Michael Bailey for advice on many aspects of the work
and Oguz Durumeric for his discussion of generating permutations of the IPv4 address space. We also thank Brad
Campbell, Peter Eckersley, James Kasten, Pat Pannuto,
Amir Rahmati, Michael Rushanan, and Seth Schoen. This
work was supported in part by NSF grant CNS-1255153
and by an NSF Graduate Research Fellowship.
Scanning Exclusion Standards If Internet-wide scanning becomes more widespread, it will become increasingly burdensome for system operators who do not want
to receive such probe traffic to manually opt out from
all benign sources. Further work is needed to standardize an exclusion signaling mechanism, akin to HTTPs
robots.txt [19]. For example, a host could use a combination of protocol flags to send a do-not-scan signal,
perhaps by responding to unwanted SYNs with the SYN
and RST flags, or a specific TCP option set.
References
[1] Anonymous. Internet census 2012. http://census2012.
sourceforge.net/paper.html, March 2013.
Conclusion
[2] G. Bartlett, J. Heidemann, and C. Papadopoulos. Understanding passive and active service discovery. In 7th ACM
SIGCOMM conference on Internet measurement (IMC),
pages 5770, 2007.
14
[9] R. Dingledine. Research problems: Ten ways to discover Tor bridges. http://blog.torproject.org/blog/researchproblems-ten-ways-discover-tor-bridges, October 2011.
[10] P. Eckersley and J. Burns. An observatory for the SSLiverse. Talk at Defcon 18 (2010). https://www.eff.org/files/
DefconSSLiverse.pdf.
[14] N. Heninger, Z. Durumeric, E. Wustrow, and J. A. Halderman. Mining your Ps and Qs: Detection of widespread
weak keys in network devices. In 21st USENIX Security
Symposium, August 2012.
http://
15