Professional Documents
Culture Documents
Topics
CSE 454
Aaron Kimball
Cloudera Inc.
3 Dec 2009
About me
Aaron
Designed/taught
CSE 490H
A lecture about
An outline?
Big
Big
Big
Computing Environments
Databases
MySQL,
Store
Pro:
Con:
...
Example table
mysql> use corp;
Database changed
mysql> describe employees;
+------------+-------------+------+-----+---------+----------------+
| Field
| Type
| Null | Key | Default | Extra
|
+------------+-------------+------+-----+---------+----------------+
| id
| int(11)
| NO
| PRI | NULL
| auto_increment |
| firstname
| varchar(32) | YES
| NULL
| lastname
| varchar(32) | YES
| NULL
| jobtitle
| varchar(64) | YES
| NULL
| start_date | date
| YES
| NULL
| dept_id
| YES
| NULL
| int(11)
+------------+-------------+------+-----+---------+----------------+
Grouping
Enforcing
data quality
The problems
A
Performing
Databases
How
How
How
What we need
An
*(Serious ber-hacker)
Reliability Demands
Support
partial failure
Reliability Demands
Data
Recoverability
Reliability Demands
Individual
Recoverability
Reliability Demands
Consistency
Reliability Demands
Scalability
never
Programmer
Data
Locality
Fault Tolerance
Factors
logic
MapReduce Provides:
Automatic
Fault-tolerance
Status
A
Programming Model
Borrows
Users
map
map
map (in_key, in_value) ->
(intermediate_key, int_value) list
reduce
reduce (intermediate_key, int_value list) ->
(out_key, out_value) list
HDFS / GFS
Storage assumptions
GFS Architecture
When
An evolving ecosystem
Hardware as a service:
virtualization
Amazon EC2: Machines for rent by the hour, on demand.
where
Virtualization in a nutshell
Just
Software
A fully-virtualized machine
All
applications run in a VM
One
Clouds: high-level
Interface
EC2
AppEngine
My
High
Download: http://cloudera.com/hadoop-training-virtualmachine
Youll
Its
You
The
Want
Send
resumes/inquiries/etc to aaron@cloudera.com
Questions: aaron@cloudera.com
(c) 2008 Cloudera, Inc. or its licensors. "Cloudera" is a registered trademark of Cloudera, Inc.. All rights reserved. 1.0