
Main Topics

CCB-400 is designed to test a candidate's fluency with the concepts and skills
in the following areas:
Core HBase Concepts
Recognize the fundamental characteristics of Apache HBase and its role in a big
data ecosystem. Identify differences between Apache HBase and a traditional
RDBMS. Describe the relationship between Apache HBase and HDFS. Given a
scenario, identify application characteristics that make the scenario an
appropriate application for Apache HBase.
Data Model
Describe how an Apache HBase table is physically stored on disk. Identify the
differences between a Column Family and a Column Qualifier. Given a data loading
scenario, identify how Apache HBase will version the rows. Describe how Apache
HBase cells store data. Detail what happens to data when it is deleted.
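The versioning and delete objectives above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not HBase code: the class `VersionedCell` and its methods are hypothetical, and it models just one kind of delete (a marker that masks every version at or below its timestamp). It shows that a delete records a marker rather than erasing data, and that masked versions are physically dropped only by a later major compaction.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of the versions stored for one cell; hypothetical, not the HBase API.
final class VersionedCell {
    // timestamp -> value, iterated newest timestamp first
    private final TreeMap<Long, String> versions = new TreeMap<>(Comparator.reverseOrder());
    // Timestamp of the latest delete marker, or -1 if there is none.
    private long tombstone = -1L;

    void put(long ts, String value) { versions.put(ts, value); }

    // A delete only records a marker; the stored versions are left in place.
    void delete(long ts) { tombstone = Math.max(tombstone, ts); }

    // Returns up to maxVersions values that are not masked, newest first.
    List<String> get(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (out.size() == maxVersions) break;
            if (e.getKey() > tombstone) out.add(e.getValue());
        }
        return out;
    }

    // Models a major compaction: masked versions and the marker are removed.
    void majorCompact() {
        versions.keySet().removeIf(ts -> ts <= tombstone);
        tombstone = -1L;
    }

    int storedVersions() { return versions.size(); }

    public static void main(String[] args) {
        VersionedCell cell = new VersionedCell();
        cell.put(1L, "a");
        cell.put(2L, "b");
        cell.put(3L, "c");
        cell.delete(2L);
        System.out.println(cell.get(3) + " visible, " + cell.storedVersions() + " stored");
        cell.majorCompact();
        System.out.println(cell.storedVersions() + " stored after compaction");
    }
}
```

In this model a read after the delete sees only the version written at timestamp 3, while all three versions remain stored until `majorCompact()` runs.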
Architecture
Identify the major components of an Apache HBase cluster. Recognize how regions
work and their benefits under various scenarios. Describe how a client finds a
row in an HBase table. Understand the function and purpose of minor and major
compactions. Given a region server crash scenario, describe how Apache HBase
fails over to another region server. Describe RegionServer splits.
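As a rough illustration of how a row is located: each region covers a contiguous range of row keys beginning at its start key, so the region holding a row is the one with the greatest start key that is less than or equal to that row key. The sketch below models only this lookup rule with a sorted map. The class `ToyRegionMap` and the server names are hypothetical, and it compares Java strings, which matches HBase's byte ordering only for plain ASCII keys. A real client obtains this mapping from the cluster's catalog table and caches it.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of region lookup by row key; hypothetical, not the HBase client API.
final class ToyRegionMap {
    // Maps each region's start key to the name of the server hosting it.
    private final TreeMap<String, String> serverByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        serverByStartKey.put(startKey, server);
    }

    // The owning region has the greatest start key <= rowKey.
    String locate(String rowKey) {
        Map.Entry<String, String> entry = serverByStartKey.floorEntry(rowKey);
        if (entry == null) {
            throw new IllegalStateException("no region covers row " + rowKey);
        }
        return entry.getValue();
    }

    public static void main(String[] args) {
        ToyRegionMap regions = new ToyRegionMap();
        regions.addRegion("", "rs1");   // first region starts at the empty key
        regions.addRegion("m", "rs2");  // second region starts at "m"
        System.out.println(regions.locate("apple"));
        System.out.println(regions.locate("zebra"));
    }
}
```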
Schema Design
Describe the factors to be considered when creating Column Families. Given an
access pattern, define the row keys for optimal read performance. Given an
access pattern, define the row keys for locality.
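One common row key pattern for "newest first" reads is an identifier followed by a reversed timestamp. HBase sorts rows by the unsigned byte order of their keys, so storing `Long.MAX_VALUE - timestamp` makes later events sort earlier within the same identifier. The following is a minimal sketch using only the Java standard library. The class `ClickRowKey` is hypothetical, and it assumes fixed-length source ids and non-negative timestamps.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper for building row keys; not part of the HBase API.
final class ClickRowKey {
    // Builds <sourceId><Long.MAX_VALUE - timestamp> as raw big-endian bytes.
    static byte[] of(String sourceId, long timestampMillis) {
        byte[] id = sourceId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(id.length + Long.BYTES)
                .put(id)
                .putLong(Long.MAX_VALUE - timestampMillis)
                .array();
    }

    // Unsigned lexicographic comparison, the order applied to row keys.
    static int compare(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        byte[] older = of("src42", 1_000L);
        byte[] newer = of("src42", 2_000L);
        // A negative result means the newer click sorts before the older one.
        System.out.println(compare(newer, older) < 0);
    }
}
```

Because the identifier leads the key, all clicks for one source stay adjacent, and a scan over that prefix returns them from most recent to oldest.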
API
Describe the functions and purpose of the HBaseAdmin class. Given a table and
rowkey, use the get() operation to return specific versions of that row.
Describe the behavior of the checkAndPut() method.
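The check-and-put behavior can be pictured as an atomic compare-then-write: the write is applied only if the cell currently holds the expected value, and the return value reports whether it was applied. The sketch below is a toy in-memory model, not the HBase client. The class `CheckAndPutModel` and its string cell addresses are hypothetical, and it simplifies by writing to the same cell that was checked, whereas an HBase Put may write other columns of the same row.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model of check-and-put semantics; hypothetical, not the HBase API.
final class CheckAndPutModel {
    private final Map<String, byte[]> cells = new HashMap<>();

    // Unconditional write, used here only to seed the table.
    synchronized void put(String cell, byte[] value) {
        cells.put(cell, value);
    }

    // Writes newValue only if the current value equals expected.
    // A null expected value means "the cell must not exist yet".
    // Returns true if the write was applied, false otherwise.
    synchronized boolean checkAndPut(String cell, byte[] expected, byte[] newValue) {
        byte[] current = cells.get(cell);
        boolean matches = (expected == null)
                ? current == null
                : Arrays.equals(current, expected);
        if (matches) {
            cells.put(cell, newValue);
        }
        return matches;
    }

    synchronized byte[] get(String cell) {
        return cells.get(cell);
    }

    public static void main(String[] args) {
        CheckAndPutModel table = new CheckAndPutModel();
        String cell = "rowkey/colfam/qualifier";
        table.put(cell, "barvalue".getBytes());
        System.out.println(table.checkAndPut(cell, "barvalue".getBytes(), "new".getBytes()));
        System.out.println(table.checkAndPut(cell, "barvalue".getBytes(), "other".getBytes()));
    }
}
```

The check and the write happen under one lock here, mirroring the point that no other writer can change the cell between the comparison and the put.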
Administration
Recognize how to create, describe, and access data in tables from the shell.
Describe how to bulk load data into Apache HBase. Recognize the benefits of
managed region splits.
Sample Questions
Question 1
You want to store clickstream data in HBase. Your data consists of the
following: the source id, the name of the cluster, the URL of the click, and the
timestamp for each click.
Which rowkey would you use if you wanted to retrieve the source ids with a scan,
sorted with the most recent first?
A. <(Long)timestamp>
B. <source_id><Long.MAX_VALUE - (Long)timestamp>
C. <timestamp><Long.MAX_VALUE>
D. <Long.MAX_VALUE><timestamp>
Question 2
Your application needs to retrieve 200 to 300 non-sequential rows from a table
with one billion rows. You know the rowkey of each of the rows you need to
retrieve. Which does your application need to implement?
A. Scan without range
B. Scan with start and stop row
C. HTable.get(Get get)
D. HTable.get(List<Get> gets)
Question 3
You perform a check and put operation from within an HBase application using the
following:
table.checkAndPut(Bytes.toBytes("rowkey"),
    Bytes.toBytes("colfam"),
    Bytes.toBytes("qualifier"),
    Bytes.toBytes("barvalue"), newrow);
Which describes this check and put operation?
A. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is
equal to newrow. Then return true.
B. Check if rowkey/colfam/qualifier exists and the cell value "barvalue" is NOT
equal to newrow. Then return true.
C. Check if rowkey/colfam/qualifier exists and has the cell value "barvalue". If
so, put the values in newrow and return false.
D. Check if rowkey/colfam/qualifier exists and has the cell value "barvalue". If
so, put the values in newrow and return true.
Question 4
What is the advantage of using the bulk load API over doing individual Puts for
bulk insert operations?
A. Writes bypass the HLog/MemStore, reducing load on the RegionServer.
B. Users doing bulk writes may disable writing to the WAL, which results in
possible data loss.
C. HFiles created by the bulk load API are guaranteed to be co-located with the
RegionServer hosting the region.
D. HFiles written out via the bulk load API are more space efficient than those
written out of RegionServers.
Question 5
You have a WebLog table in HBase. The Row Keys are the IP Addresses. You want to
retrieve all entries that have an IP Address of 75.67.12.146. The shell command
you would use is:
A. get 'WebLog', '75.67.12.146'
B. scan 'WebLog', '75.67.12.146'
C. get 'WebLog', {FILTER => '75.67.12.146'}
D. scan 'WebLog', {COLFAM => 'IP', FILTER => '75.67.12.146'}
Answers
Question 1: B
Question 2: D
Question 3: D
Question 4: A
Question 5: A
Disclaimer: These exam preparation pages are intended to provide information
about the objectives covered by each exam, related resources, and recommended
reading and courses. The material contained within these pages is not intended
to guarantee a passing score on any exam. Cloudera recommends that a candidate
thoroughly understand the objectives for each exam and utilize the resources and
training courses recommended on these pages to gain a thorough understanding of
the domain of knowledge related to the role the exam evaluates.
