Google platform

The Google platform refers to the computer software and large hardware resources Google uses to provide its services. This article describes the technological infrastructure behind Google's websites as presented in the company's public announcements.

Google's first production server rack, circa 1998

1 Hardware

1.1 Original hardware

The original hardware (circa 1998) that was used by Google when it was located at Stanford University included:[1]

• Sun Microsystems Ultra II with dual 200 MHz processors and 256 MB of RAM. This was the main machine for the original Backrub system.
• 2 × 300 MHz dual Pentium II servers donated by Intel, with 512 MB of RAM and 10 × 9 GB hard drives between the two. It was on these that the main search ran.
• F50 IBM RS/6000 donated by IBM, with 4 processors, 512 MB of memory and 8 × 9 GB hard disk drives.
• Two additional boxes with 3 × 9 GB hard drives and 6 × 4 GB hard disk drives respectively (the original storage for Backrub). These were attached to the Sun Ultra II.
• IBM disk expansion box with another 8 × 9 GB hard disk drives, donated by IBM.
• Homemade disk box which contained 10 × 9 GB SCSI hard disk drives.

1.2 Production hardware

Google uses commodity-class x86 server computers running customized versions of Linux. The goal is to purchase CPU generations that offer the best performance per dollar, not absolute performance. How this is measured is unclear, but it is likely to incorporate running costs of the entire server, and CPU power consumption could be a significant factor.[2] Servers as of 2009-2010 consisted of custom-made open-top systems containing two processors (each with two cores[3]), a considerable amount of RAM spread over 8 DIMM slots housing double-height DIMMs, and two SATA hard disk drives connected through a non-standard ATX-sized power supply unit.[4] According to CNET and to a book by John Hennessy, each server had a novel 12-volt battery to reduce costs and improve power efficiency.[3][5]

According to Google, the electrical power drawn by its global data center operation ranges between 500 and 681 megawatts.[6][7] The combined processing power of these servers might have reached from 20 to 100 petaflops in 2008.[8]

1.3 Network topology

Details of the Google worldwide private networks are not publicly available, but Google publications[9][10] make references to the Atlas Top 10 report that ranks Google as the third largest ISP behind Level 3.[11]

In order to run such a large network with direct connections to as many ISPs as possible at the lowest possible cost, Google has a very open peering policy.[12]

From the PeeringDB listing we can see that the Google network can be accessed from 67 public exchange points and 69 different locations across the world. As of May 2012, Google had 882 Gbit/s of public connectivity (not counting private peering agreements that Google has with the largest ISPs). This public network is used to distribute content to Google users as well as to crawl the Internet to build its search indexes.

The private side of the network is a secret, but a recent disclosure from Google[13] indicates that they use custom-built high-radix switch-routers (with a capacity of 128 × 10 Gigabit Ethernet ports) for the wide area network. Running no fewer than two routers per datacenter (for redundancy), we can conclude that the Google network scales in the terabit per second range (with two fully loaded routers the bisectional bandwidth amounts to 1,280 Gbit/s).

These custom switch-routers are connected to DWDM devices to interconnect data centers and points of presence (PoP) via dark fibre.

From a datacenter view, the network starts at the rack level, where 19-inch racks are custom-made and contain 40 to 80 servers (20 to 40 1U servers on either side, while new servers are 2U rackmount systems).[14] Each rack has a switch. Servers are connected via a 1 Gbit/s Ethernet link to the top of rack switch (TOR). TOR switches are then connected to a gigabit cluster switch using multiple gigabit or ten gigabit uplinks.[15] The cluster switches themselves are interconnected and form the datacenter interconnect fabric (most likely using a dragonfly design rather than a classic butterfly or flattened butterfly layout[16]).

From an operation standpoint, when a client computer attempts to connect to Google, several DNS servers resolve www.google.com into multiple IP addresses via a round-robin policy. This acts as the first level of load balancing and directs the client to different Google clusters. A Google cluster has thousands of servers, and once the client has connected to the server, additional load balancing is done to send the queries to the least loaded web server. This makes Google one of the largest and most complex content delivery networks.[17]

Google has numerous data centers scattered around the world. At least 12 significant Google data center installations are located in the United States. The largest known centers are located in The Dalles, Oregon; Atlanta, Georgia; Reston, Virginia; Lenoir, North Carolina; and Moncks Corner, South Carolina.[18] In Europe, the largest known centers are in Eemshaven and Groningen in the Netherlands and Mons, Belgium.[18] Google's Oceania Data Center is claimed to be located in Sydney, Australia.[19]

1.4 Project 02

One of the largest Google data centers is located in the town of The Dalles, Oregon, on the Columbia River, approximately 80 miles from Portland. Codenamed "Project 02", the $600 million[20] complex was built in 2006 and is approximately the size of two American football fields, with cooling towers four stories high.[21] The site was chosen to take advantage of inexpensive hydroelectric power, and to tap into the region's large surplus of fiber optic cable, a remnant of the dot-com boom. A blueprint of the site appeared in 2008.[22]

1.5 Summa papermill

In February 2009, Stora Enso announced that they had sold the Summa paper mill in Hamina, Finland to Google for 40 million euros.[23][24] Google plans to invest 200 million euros on the site to build a data center.[25] Google chose this location due to the availability and proximity of renewable energy sources.[26]

1.6 Modular container data centers

Since 2005,[27] Google has been moving to a containerized modular data center. Google filed a patent application for this technology in 2003.[28]

2 Software

Most of the software stack that Google uses on their servers was developed in-house.[29] According to a well-known Google employee, C++, Java, Python and (more recently) Go are favored over other programming languages.[30] For example, the back end of Gmail is written in Java and the back end of Google Search is written in C++.[31] Google has acknowledged that Python has played an important role from the beginning, and that it continues to do so as the system grows and evolves.[32]

The software that runs the Google infrastructure includes:[33]

• Google Web Server (GWS): custom Linux-based Web server that Google uses for its online services.
• Storage systems:
  • Google File System and its successor, Colossus[34][35]
  • BigTable: structured storage built upon GFS/Colossus[34]
  • Spanner: planet-scale structured storage system, next generation of BigTable stack[34][36]
  • Google F1: a distributed, quasi-SQL DBMS based on Spanner, substituting a custom version of MySQL.[37]
• Chubby lock service
• MapReduce and Sawzall programming language

• Indexing/search systems:
  • TeraGoogle: Google's large search index (launched in early 2006), designed by Anna Patterson of Cuil fame.[38]
  • Caffeine (Percolator): continuous indexing system (launched in 2010).[39]
  • Hummingbird: major search index update, including complex search and voice search.[40]

Google has developed several abstractions which it uses for storing most of its data:[41]

• Protocol Buffers: "Google's lingua franca for data",[42] a binary serialization format which is widely used within the company.
• SSTable (Sorted Strings Table): a persistent, ordered, immutable map from keys to values, where both keys and values are arbitrary byte strings. It is also used as one of the building blocks of BigTable.[43]
• RecordIO: a sequence of variable sized records.[41][44][45]

2.1 Software development practices

Most operations are read-only. When an update is required, queries are redirected to other servers, so as to simplify consistency issues. Queries are divided into sub-queries, where those sub-queries may be sent to different ducts in parallel, thus reducing the latency time.[14]

To lessen the effects of unavoidable hardware failure, software is designed to be fault tolerant. Thus, when a system goes down, data is still available on other servers, which increases reliability.

3 Search infrastructure

3.1 Index

Like most search engines, Google indexes documents by building a data structure known as an inverted index. Such an index allows obtaining a list of documents by a query word. The index is very large due to the number of documents stored in the servers.[17]

The index is partitioned by document IDs into many pieces called shards. Each shard is replicated onto multiple servers. Initially, the index was being served from hard disk drives, as is done in traditional information retrieval (IR) systems. Google dealt with the increasing query volume by increasing the number of replicas of each shard and thus increasing the number of servers. Soon they found that they had enough servers to keep a copy of the whole index in main memory (although with low replication or no replication at all), and in early 2001 Google switched to an in-memory index system. This switch radically changed many design parameters of their search system, and allowed for a significant increase in throughput and a large decrease in latency of queries.[46]

In June 2010, Google rolled out a next-generation indexing and serving system called "Caffeine" which can continuously crawl and update the search index. Previously, Google updated its search index in batches using a series of MapReduce jobs. The index was separated into several layers, some of which were updated faster than the others, and the main layer wouldn't be updated for as long as two weeks. With Caffeine the entire index is updated incrementally on a continuous basis. Later Google revealed a distributed data processing system called "Percolator"[47] which is said to be the basis of the Caffeine indexing system.[39][48]

3.2 Server types

Google's server infrastructure is divided into several types, each assigned to a different purpose:[14][17][49][50][51]

• Web servers coordinate the execution of queries sent by users, then format the result into an HTML page. The execution consists of sending queries to index servers, merging the results, computing their rank, retrieving a summary for each hit (using the document server), asking for suggestions from the spelling servers, and finally getting a list of advertisements from the ad server.
• Data-gathering servers are permanently dedicated to spidering the Web. Google's web crawler is known as GoogleBot. They update the index and document databases and apply Google's algorithms to assign ranks to pages.
• Each index server contains a set of index shards. They return a list of document IDs ("docid"), such that documents corresponding to a certain docid contain the query word. These servers need less disk space, but suffer the greatest CPU workload.
• Document servers store documents. Each document is stored on dozens of document servers. When performing a search, a document server returns a summary for the document based on query words. They can also fetch the complete document when asked. These servers need more disk space.
• Ad servers manage advertisements offered by services like AdWords and AdSense.
• Spelling servers make suggestions about the spelling of queries.
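
The division of labour between web, index and document servers can be illustrated with a small Python sketch. It is not Google's implementation: the class names, the modulo rule for assigning a docid to a shard, the three-word summary, and the use of docid order in place of real ranking are all assumptions made for illustration. It only shows the general idea of an inverted index partitioned by document ID, with a coordinating function that queries every shard and merges the results.

```python
from collections import defaultdict


class IndexServer:
    """Holds one index shard: an inverted index from word to docids."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, docid, text):
        for word in text.lower().split():
            self.postings[word].add(docid)

    def lookup(self, word):
        # .get() avoids creating empty entries for unknown words.
        return self.postings.get(word.lower(), set())


class DocumentServer:
    """Stores full documents and returns a short summary per hit."""

    def __init__(self, documents):
        self.documents = documents

    def summary(self, docid, max_words=3):
        # Hypothetical summary: the first few words of the document.
        return " ".join(self.documents[docid].split()[:max_words])


def web_server_query(index_servers, doc_server, word):
    """Send the query to every shard, merge docids, fetch summaries."""
    docids = set()
    for server in index_servers:
        docids |= server.lookup(word)
    # Sorting by docid stands in for the ranking step (an assumption).
    return [(docid, doc_server.summary(docid)) for docid in sorted(docids)]


documents = {
    1: "google file system stores large files",
    2: "google web server handles requests",
    3: "an inverted index maps words to documents",
    4: "web search uses an index",
}
NUM_SHARDS = 2  # illustrative value; docid modulo sharding is assumed
index_servers = [IndexServer() for _ in range(NUM_SHARDS)]
for docid, text in documents.items():
    index_servers[docid % NUM_SHARDS].add(docid, text)

doc_server = DocumentServer(documents)
results = web_server_query(index_servers, doc_server, "index")
print(results)
```

Because each shard holds only the documents whose IDs map to it, the shards can be queried independently, which is what allows the sub-queries to run in parallel on separate machines.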

References

[1] "Google Stanford Hardware" at the Wayback Machine (archived February 9, 1999). Stanford University (provided by Internet Archive). Retrieved on July 10, 2006.
[2] Tawfik Jelassi and Albrecht Enders (2004). "Case study 16 - Google". Strategies for E-business. Pearson Education. p. 424. ISBN 978-0-273-68840-2.
[3] Computer Architecture, Fifth Edition: A Quantitative Approach, ISBN 978-0123838728; Chapter Six; 6.7 "A Google Warehouse-Scale Computer", page 471: "Designing motherboards that only need a single 12-volt supply so that the UPS function could be supplied by standard batteries associated with each server".
[4] Google's secret power supplies on YouTube
[5] Google on-server 12V UPS, 1 April 2009.
[6] Google Green infographics
[7] Analytics Press: Growth in data center electricity use 2005 to 2010
[8] Google Surpasses Supercomputer Community, Unnoticed?, May 20, 2008.
[9] "Fiber Optic Communication Technologies: What's Needed for Datacenter Network Operations", Research, Google
[10] "FTTH look ahead - technologies & architectures", Research, Google
[11] James Pearn. "How many servers does Google have?". plus.google.com.
[12] kumara ASN15169, Peering DB
[13] Urs Hölzle, Speakers, Open Network Summit
[14] Web Search for a Planet: The Google Cluster Architecture (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)
[15] Warehouse size computers
[16] Dennis Abts, High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
[17] Fiach Reid (2004). "Case Study: The Google search engine". Network Programming in .NET. Digital Press. pp. 251-253. ISBN 978-1-55558-315-6.
[18] Rich Miller (March 27, 2008). "Google Data Center FAQ". Data Center Knowledge. Retrieved 2009-03-15.
[19] Brett Winterford (March 5, 2010). "Found: Google Australia's secret data network". ITNews. Retrieved 2010-03-20.
[20] Google "The Dalles, Oregon Data Center". Retrieved on January 3, 2011.
[21] Markoff, John; Hansell, Saul. "Hiding in Plain Sight, Google Seeks More Power." New York Times. June 14, 2006. Retrieved on October 15, 2008.
[22] Strand, Ginger. "Google Data Center". Harper's Magazine. March 2008. Retrieved on October 15, 2008.
[23] "Stora Enso divests Summa Mill premises in Finland for EUR 40 million". Stora Enso. 2009-02-12. Retrieved 2009-02-12.
[24] "Stooora yllätys: Google ostaa Summan tehtaan". Kauppalehti (in Finnish) (Helsinki). 2009-02-12. Retrieved 2009-02-12.
[25] "Google investoi 200 miljoonaa euroa Haminaan". Taloussanomat (in Finnish) (Helsinki). 2009-02-04. Retrieved 2009-03-15.
[26] "Finland - First Choice for Siting Your Cloud Computing Data Center". Accessed 4 August 2010.
[27] http://www.theregister.co.uk/2009/04/10/google_data_center_video
[28] "United States Patent: 7278273". Patft.uspto.gov. Retrieved 2012-02-17.
[29] Mark Levene (2005). An Introduction to Search Engines and Web Navigation. Pearson Education. p. 73. ISBN 978-0-321-30677-7.
[30] "Python Status Update". Artima. 2006-01-10. Retrieved 2012-02-17.
[31] "Warning". Panela. Blog-city. Retrieved 2012-02-17.
[32] "Quotes about Python". Python. Retrieved 2012-02-17.
[33] "Google Architecture". High Scalability. 2008-11-22. Retrieved 2012-02-17.
[34] Fikes, Andrew (July 29, 2010), "Storage Architecture and Challenges", TechTalk (PDF), Google
[35] "Colossus: Successor to the Google File System (GFS)". Highly Scalable Systems. 2013-02-05. Retrieved 2014-03-10.
[36] Dean, Jeffrey 'Jeff' (2009), "Design, Lessons and Advice from Building Large Distributed Systems", Ladis (keynote talk presentation), Cornell
[37] Shute, Jeffrey 'Jeff'; Oancea, Mircea; Ellner, Stephan; Handy, Benjamin 'Ben'; Rollins, Eric; Samwel, Bart; Vingralek, Radek; Whipkey, Chad; Chen, Xin; Jegerlehner, Beat; Littlefield, Kyle; Tong, Phoenix (2012), "F1 - the Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business", Research (presentation), Sigmod: Google
[38] "Anna Patterson - CrunchBase Profile". Crunchbase.com. Retrieved 2012-02-17.
[39] The Register. "Google Caffeine jolts worldwide search machine"
[40] "Google official release note". Google.com. Retrieved 2013-09-28.
[41] "Google Developing Caffeine Storage System | TechWeekEurope UK". Eweekeurope.co.uk. 2009-08-18. Retrieved 2012-02-17.
[42] "Developer Guide - Protocol Buffers - Google Code". Code.google.com. Retrieved 2012-02-17.
[43]
[44] windley on June 24, 2008 1:10 PM (2008-06-24). "Phil Windley's Technometria | Velocity 08: Storage at Scale". Windley.com. Retrieved 2012-02-17.
[45] "Message limit - Protocol Buffers | Google Groups". Groups.google.com. Retrieved 2012-02-17.
[46] "Jeff Dean's keynote at WSDM 2009" (PDF). Retrieved 2012-02-17.
[47] Daniel Peng, Frank Dabek (2010). "Large-scale Incremental Processing Using Distributed Transactions and Notifications". Proceedings of the 9th USENIX Symposium on Operating Systems Design and Implementation.
[48] The Register. "Google Percolator - global search jolt sans MapReduce comedown"
[49] Chandler Evans (2008). "Google Platform". Future of Google Earth. Madison Publishing Company. p. 299. ISBN 978-1-4196-8903-1.
[50] Chris Sherman (2005). "How Google Works". Google Power. McGraw-Hill Professional. pp. 10-11. ISBN 978-0-07-225787-8.
[51] Michael Miller (2007). "How Google Works". Googlepedia. Pearson Technology Group. pp. 17-18. ISBN 978-0-7897-3639-0.

Further reading
• L.A. Barroso, J. Dean, and U. Hölzle (March-April 2003). "Web search for a planet: The Google cluster architecture" (PDF). IEEE Micro 23 (2): 22-28. doi:10.1109/MM.2003.1196112.
• Shankland, Stephen, CNET news, "Google uncloaks once-secret server". April 1, 2009.

External links
• Google Research Publications
• Web Search for a Planet: The Google Cluster Architecture (Luiz André Barroso, Jeffrey Dean, Urs Hölzle)
• Underneath the Covers at Google: Current Systems and Future Directions (talk given by Jeff Dean at Google I/O conference in May 2008)

Text and image sources, contributors, and licenses

7.1 Text

Google platform Source: http://en.wikipedia.org/wiki/Google_platform?oldid=666160878 Contributors: AxelBoldt, Edward, Ixfd64, CesarB, Ihcoyc, Ronabop, Bogdangiusca, Nikola Smolenski, Ed g2s, Joy, Tranquileye, JesseW, Mattaschen, Acm, Centrx, Philwiki, Christopher Parham, Bkonrad, Mcapdevila, Bobblewik, Neilc, Joeblakesley, Karol Langner, Joyous!, Ojw, Zondor, Mike Rosoft, Jrp, Discospinster, Smyth, Antaeus Feldspar, ReiVaX, Bender235, ZeroOne, Haxwell, Smalljim, Duk, Cmdrjameson, Pearle, Next2u, Lmviterbo, Demi,
B3virq3b, Samohyl Jan, Velella, Rick Sidwell, Suruena, Omphaloscope, R6MaY89, Jesvane, H2g2bob, Computerjoe, Kusma, KUsam,
RHaworth, Uncle G, Splintax, Zippo, WadeSimMiser, Pchov, Bbatsell, Zzyzx11, TNLNYC, Windchaser, TheDaveRoss, Rune.welsh,
DevastatorIIC, Kickboy, AmritTuladhar, Atratus, Cjdyer, Cyberprog, RussBot, Arado, Amaltsev, Gardar Rurak, Stephenb, Debackerl,
Schoen, Tungsten, Wiki alf, ColdFusion650, Derek.cashman, Iambk, Petri Krohn, JLaTondre, Eaefremov, Sinan Taifour, NeilN, Bask,
Dee00, SmackBot, Ashenai, Reedy, Tarret, Anastrophe, Cacuija, Xaosux, Gilliam, Hmains, Lakshmin, Remohammadi, Tree Biting
Conspiracy, Mnemoc, Metalim, Can't sleep, clown will eat me, Frap, Landon, Masteroftherealm, Adamantios, Radagast83, Mig30m6,
Doodle77, A5b, Ligulembot, Ohconfucius, Eliyak, Platonides, JirkaV, Agencius, Candamir, Mcnumara, Optimale, DavidParrish, Hulkster, DangerousPanda, CmdrObot, Raysonho, Chrisahn, Tex, Mikapell, ShoobyD, Gogo Dodo, Rocket000, Bear475, Joshvf, EdJohnston,
Porqin, AntiVandalBot, Seaphoto, List of marijuana slang terms, Mnp~enwiki, VoABot II, Recurring dreams, Not a dog, Pkrecker, Zsh, InnocuousPseudonym, Ckielstra, Slogsweep, Uncle Dick, Nsigniacorp, Grosscha, WebHamster, KylieTastic, DMCer, Beanman252, Wikieditor06, Orgads, Robo45h, Drestros power, Everything counts, Q Chris, Andy Dingley, Haseo9999, JohnWallaby, Aednichols, Wraithvefa,
Sound-Mind, FBachofner, Flyer22, Belinrahs, LightSpeed2, Gyrferret, Sfan00 IMG, ClueBot, Shaded0, Justin545, Sun Creator, Papa6,
Dsimic, Addbot, Crimsontaco99, TheNeutroniumAlchemist, Sbelza, NailPuppy, Jasper Deng, Jarble, Margin1522, Yobot, TaBOT-zerem,
Bugnot, Richard.e.morton, AnomieBOT, Jim1138, Bluerasberry, Citation bot, Simrider, Visualpiano, Omnipaedista, FrescoBot, W Nowicki, X7q, Chaim Shel, Spindocter123, Arndbergmann, Citation bot 1, Btilm, Sasha Beluj, Full-date unlinking bot, Mai0907, RjwilmsiBot,
Nevknown, John of Reading, GoingBatty, Ludovic.ferre, Chris Kuehl, ClueBot NG, JetBlast, 10k, Juiceman74, Alpha7248, Tkhemani,
Helpful Pixie Bot, Cowthief, Compfreak7, Tony Tan, Ericzqma, Klilidiplomus, Paburr, BattyBot, Wtb435, James Pearn, Alipoor90, Frosty,
Inna Z, Cdwn, Masida, Epicgenius, Drewp, Facker~enwiki, CosmosSoup, Antrocent and Anonymous: 255

7.2 Images

File:Ambox_current_red.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/98/Ambox_current_red.svg License: CC0


Contributors: self-made, inspired by Gnome globe current event.svg, using Information icon3.svg and Earth clip art.svg Original artist:
Vipersnake151, penubag, Tkgd2007 (clock)
File:Commons-logo.svg Source: https://upload.wikimedia.org/wikipedia/en/4/4a/Commons-logo.svg License: ? Contributors: ? Original
artist: ?
File:Folder_Hexagonal_Icon.svg Source: https://upload.wikimedia.org/wikipedia/en/4/48/Folder_Hexagonal_Icon.svg License: Cc-bysa-3.0 Contributors: ? Original artist: ?
File:Googles_First_Server.jpg Source: https://upload.wikimedia.org/wikipedia/commons/2/25/Googles_First_Server.jpg License: CC
BY 2.0 Contributors: Googles First Server Original artist: Erik Pitti from San Diego, CA, USA
File:People_icon.svg Source: https://upload.wikimedia.org/wikipedia/commons/3/37/People_icon.svg License: CC0 Contributors: OpenClipart Original artist: OpenClipart
File:Portal-puzzle.svg Source: https://upload.wikimedia.org/wikipedia/en/f/fd/Portal-puzzle.svg License: Public domain Contributors: ?
Original artist: ?
File:Wikiversity-logo.svg Source: https://upload.wikimedia.org/wikipedia/commons/9/91/Wikiversity-logo.svg License: CC BY-SA 3.0
Contributors: Snorky (optimized and cleaned up by verdy_p) Original artist: Snorky (optimized and cleaned up by verdy_p)

7.3 Content license

Creative Commons Attribution-Share Alike 3.0