Cloudera DS-200 : Practice Test
Question No : 1
Answer: B
Question No : 2
A. ^A (Control-A)
B. , (comma)
C. \t (tab)
D. : (colon)
Answer: A
Reference: http://blog.spryinc.com/2013/10/four-useful-tricks-for-working-with-hive.html (change the delimiter when exporting a Hive table)
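Control-A (the byte 0x01) is Hive's default field delimiter for text-format tables, which is why exported files often need re-delimiting. The following is a minimal sketch of that step; the row contents are hypothetical, and it assumes the export is read line by line as plain text.

```python
CTRL_A = "\x01"  # Control-A, Hive's default field delimiter for text tables

# Hypothetical row, as it might appear in a text-format Hive table file
raw_line = CTRL_A.join(["42", "alice", "2013-10-01"])

# Split on Control-A to recover the individual column values
fields = raw_line.split(CTRL_A)

# Re-delimit with commas for tools that expect CSV-style input
csv_line = ",".join(fields)
print(csv_line)
```

A plain join like this is only safe when no field contains a comma; otherwise Python's csv module is the better choice.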
Question No : 3
A. Naive Bayes
B. Linear Regression
C. Survival analysis
D. Sequence alignment
Answer: B
Question No : 4
You are working with a logistic regression model to predict the probability that a user will
click on an ad. Your model has hundreds of features, and you're not sure if all of those features
are helping your prediction. Which regularization technique should you use to prune features
that aren't contributing to the model?
A. Convex
B. Uniform
C. L2
D. L1
Answer: D
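To illustrate why an L1 penalty prunes features while an L2 penalty only shrinks them, the sketch below applies the proximal (shrinkage) step of each penalty to a handful of coefficients. The coefficient values and the penalty strength are hypothetical, and the sketch shows only the shrinkage step, not a full logistic regression fit.

```python
def soft_threshold(w, lam):
    """Proximal step for the L1 penalty lam * |w|: shrink by lam, clip at zero."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def ridge_shrink(w, lam):
    """Proximal step for the L2 penalty (lam / 2) * w**2: rescale toward zero."""
    return w / (1.0 + lam)

weights = [2.0, -0.3, 0.05, -1.5, 0.2]  # hypothetical coefficients
lam = 0.5                                # hypothetical penalty strength

l1_weights = [soft_threshold(w, lam) for w in weights]
l2_weights = [ridge_shrink(w, lam) for w in weights]

# Indices of features that still have a non-zero coefficient
l1_kept = [i for i, w in enumerate(l1_weights) if w != 0.0]
l2_kept = [i for i, w in enumerate(l2_weights) if w != 0.0]
print("L1 keeps features:", l1_kept)
print("L2 keeps features:", l2_kept)
```

Small coefficients are set exactly to zero under the L1 step, so the corresponding features drop out of the model. The L2 step leaves every coefficient non-zero.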
Question No : 5
A. A
B. B
C. C
Answer: A
Question No : 6
A. A
B. B
C. C
Answer: C
Question No : 7
A. A
B. B
C. C
Answer: B
Question No : 8
A. When the volume of input data is so large and diverse that a 2nd-order optimization
technique can be fit to a sample of the data
B. When the model's estimates must be updated in real-time in order to account for
new observations.
C. When the input data can easily fit into memory on a single machine, but we want to
calculate confidence intervals for all of the parameters in the model.
D. When we are required to find the parameters that return the optimal value of the
objective function.
Answer: A,B
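Option B describes updating a model's estimates as each new observation arrives. The question stem does not name the method, so the following is only a minimal sketch of such a per-observation update. It assumes a stochastic gradient step on a one-feature linear model with squared error, and both the data stream and the learning rate are hypothetical.

```python
def sgd_update(w, b, x, y, lr=0.1):
    """One stochastic gradient step on squared error for a single observation."""
    error = (w * x + b) - y
    return w - lr * error * x, b - lr * error

w, b = 0.0, 0.0

# Hypothetical stream of (x, y) observations generated by y = 2x
stream = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)] * 500

for x, y in stream:
    # The estimate is refreshed immediately after every observation,
    # without revisiting earlier data.
    w, b = sgd_update(w, b, x, y)

print(round(w, 3), round(b, 3))
```

Each update touches only the current observation, so the cost per step does not grow with the amount of data already seen.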
Question No : 9
What is the result of the following command (the database username is foo and password is
bar)?
A. sqoop lists only those tables in the specified MySQL database that have not already been
imported into HDFS
B. sqoop returns an error
C. sqoop lists the available tables from the database
D. sqoop imports all the tables from SQL into HDFS
Answer: C
Reference: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-15/getting-sqoop
Question No : 10
Answer: C
Question No : 11
The makeup of the groups is as follows:
Answer: D
Question No : 12
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute
myeloid leukemia (AML), both variants of a blood cancer.
The makeup of the groups is as follows:
Each individual has an expression value for each of 10000 different genes. The expression
value for each gene is a continuous value between -1 and 1.
You want to use the data from the 52 patients in the scenario to improve the ability of
doctors to distinguish between ALL and AML. What type of data science problem
is this?
A. Classification
B. Regression
C. Clustering
D. Filtering
Answer: A
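Distinguishing ALL from AML using patients whose diagnosis is already known amounts to predicting a label from expression values. The sketch below shows one simple way to frame that, a nearest-centroid rule. It is a stand-in rather than the method the exam has in mind, and the toy data (3 genes instead of 10000) and all names in it are hypothetical.

```python
import math

def centroid(rows):
    """Mean expression value per gene across a group of patients."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def classify(sample, centroids):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(sample, centroids[label]))

# Hypothetical toy data: each row is one patient, values lie in [-1, 1]
all_patients = [[0.8, -0.2, 0.1], [0.6, -0.4, 0.3]]
aml_patients = [[-0.7, 0.5, 0.0], [-0.5, 0.3, -0.2]]

centroids = {"ALL": centroid(all_patients), "AML": centroid(aml_patients)}

# Predict the label for a new, unlabeled patient
prediction = classify([0.7, -0.1, 0.2], centroids)
print(prediction)
```

With 10000 genes and only 52 patients, a real model would also need feature selection or regularization to avoid overfitting.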
Question No : 13
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute
myeloid leukemia (AML), both variants of a blood cancer.
Each individual has an expression value for each of 10000 different genes. The expression
value for each gene is a continuous value between -1 and 1.
With which type of plot can you encode the largest amount of the data visually?
Answer: C
Question No : 14
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute
myeloid leukemia (AML), both variants of a blood cancer.
Each individual has an expression value for each of 10000 different genes. The expression
value for each gene is a continuous value between -1 and 1.
With which type of plot can you encode the largest amount of the data visually?
Answer: C,D,F
Question No : 15
There are 20 patients with acute lymphoblastic leukemia (ALL) and 32 patients with acute
myeloid leukemia (AML), both variants of a blood cancer.
The makeup of the groups is as follows:
Each individual has an expression value for each of 10000 different genes. The expression
value for each gene is a continuous value between -1 and 1.
With which type of plot can you encode the largest amount of the data visually?
A. ~ 800 MB
B. ~ 400 MB
C. ~ 160 KB
D. ~ 4 MB
Answer: B
Question No : 16