OOZIE
What is OOZIE?
Apache Oozie is a Java web application used to schedule Apache Hadoop jobs. Oozie combines
multiple jobs sequentially into one logical unit of work. It is integrated with the Hadoop stack and
supports Hadoop jobs for Apache MapReduce, Apache Pig, Apache Hive, and Apache Sqoop. Oozie is a
workflow scheduler system to manage Apache Hadoop jobs. Oozie workflow jobs are Directed
Acyclic Graphs (DAGs) of actions. Oozie coordinator jobs are recurrent Oozie workflow jobs
triggered by time (frequency) and data availability.
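To make the DAG idea concrete, the following is a minimal Python sketch, not Oozie code: it models a workflow as a mapping from each action to the actions it depends on and derives one valid execution order. The action names are hypothetical, and the standard-library `graphlib` module (Python 3.9+) stands in for Oozie's own scheduling.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each action maps to the set of actions it depends on.
workflow = {
    "import-data": set(),
    "clean-data": {"import-data"},
    "pig-report": {"clean-data"},
    "hive-report": {"clean-data"},
    "export": {"pig-report", "hive-report"},
}


def execution_order(dag):
    """Return one valid execution order for the DAG.

    Raises graphlib.CycleError if the graph contains a cycle, which is
    why a workflow must be acyclic to be schedulable.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
print(order)
```

A graph with a cycle (for example, two actions that depend on each other) makes `execution_order` raise `CycleError`, which mirrors why a workflow definition cannot loop back on itself.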
Oozie is integrated with the rest of the Hadoop stack, supporting several types of Hadoop jobs out of
the box (such as Java map-reduce, streaming map-reduce, Pig, Hive, Sqoop and DistCp) as well as
system-specific jobs (such as Java programs and shell scripts). Oozie is a scalable, reliable and
extensible system.
After completing this chapter, you will understand:
- What is Oozie
- Features of Oozie
- Oozie components
- Basic types of Oozie jobs
- How Oozie works
- Different ways to run a workflow from the command line
- Pig and Hive operations through Oozie
Prerequisites: basic understanding of Java or any scripting language, basics of BigData and Hadoop, and basics of XML.
Apache Pig – High-level language for expressing data analysis programs
Apache Flume – Distributed service for collecting and aggregating log and event data
[Figure: BigData stack diagram. Recoverable labels: Oozie, UI Framework, SDK, HBase, Flume/Sqoop, Coordination (ZooKeeper).]
OOZIE Features
Major flexibility – start, stop, suspend and re-run jobs
Restart after a failure – re-run from the failed nodes or skip them
Java client API/Command Line Interface – launch, control and monitor jobs from your Java apps
Web service API
Run periodic jobs – jobs that need to run every hour, day or week
Receive an email when a job is complete
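As a rough illustration of the web service API feature, the sketch below only builds the HTTP method and URL for a few job-management requests; it does not contact a server. The host name and job ID are hypothetical, and the paths follow the `v1` job endpoints of the Oozie web services API as I understand them, so check them against the documentation for your Oozie version.

```python
from urllib.parse import urlencode

# Assumption: hypothetical server address; 11000 is the commonly used Oozie port.
OOZIE_URL = "http://oozie.example.com:11000/oozie"

# Job-management actions covered by this sketch.
JOB_ACTIONS = {"start", "suspend", "resume", "kill"}


def job_info_request(job_id):
    """Build the (method, url) pair that asks for a job's status."""
    query = urlencode({"show": "info"})
    return "GET", f"{OOZIE_URL}/v1/job/{job_id}?{query}"


def job_action_request(job_id, action):
    """Build the (method, url) pair that changes a job's state."""
    if action not in JOB_ACTIONS:
        raise ValueError(f"unsupported action: {action}")
    query = urlencode({"action": action})
    return "PUT", f"{OOZIE_URL}/v1/job/{job_id}?{query}"


# Hypothetical workflow job ID, used only to show the URL shape.
job_id = "0000001-200101000000000-oozie-oozi-W"
print(job_info_request(job_id))
print(job_action_request(job_id, "suspend"))
```

Any HTTP client can send the resulting requests; the same operations are also exposed through the Java client API and the command line interface listed above.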
Control Flow
- Start, end, kill
- Decision
- Fork, join
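The control-flow nodes listed above can be sketched as a workflow definition. The Python snippet below generates an illustrative `workflow-app` XML document that uses every node type: start, fork, join, decision, kill and end. The node names, the `/tmp/demo` paths and the `proceed` property are hypothetical, the two placeholder actions are simple `fs` mkdir actions, and the schema version `0.5` is an assumption that may differ on your installation.

```python
import xml.etree.ElementTree as ET


def fs_mkdir_action(name, path, ok_to, error_to):
    """Placeholder action node: an fs action that creates a directory."""
    action = ET.Element("action", {"name": name})
    fs = ET.SubElement(action, "fs")
    ET.SubElement(fs, "mkdir", {"path": path})
    ET.SubElement(action, "ok", {"to": ok_to})
    ET.SubElement(action, "error", {"to": error_to})
    return action


def build_control_flow(app_name):
    """Build a workflow that forks into two branches, joins, then decides."""
    app = ET.Element(
        "workflow-app", {"name": app_name, "xmlns": "uri:oozie:workflow:0.5"}
    )
    ET.SubElement(app, "start", {"to": "split"})

    # fork: start both branches in parallel
    branches = ("branch-a", "branch-b")
    fork = ET.SubElement(app, "fork", {"name": "split"})
    for branch in branches:
        ET.SubElement(fork, "path", {"start": branch})
    for branch in branches:
        path = "${nameNode}/tmp/demo/" + branch
        app.append(fs_mkdir_action(branch, path, "merge", "fail"))

    # join: wait for every forked branch before continuing
    ET.SubElement(app, "join", {"name": "merge", "to": "check"})

    # decision: pick the next node from a job property (hypothetical name)
    decision = ET.SubElement(app, "decision", {"name": "check"})
    switch = ET.SubElement(decision, "switch")
    case = ET.SubElement(switch, "case", {"to": "end"})
    case.text = "${wf:conf('proceed') eq 'true'}"
    ET.SubElement(switch, "default", {"to": "fail"})

    # kill and end: the two terminal nodes
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Failed at [${wf:lastErrorNode()}]"
    ET.SubElement(app, "end", {"name": "end"})
    return app


app = build_control_flow("control-flow-demo")
ET.indent(app)
print(ET.tostring(app, encoding="unicode"))
```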
Actions
- Map-reduce
- Java
- Pig
- HDFS
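Each action type above corresponds to an element inside an `action` node of the workflow definition (for example `map-reduce`, `java`, `pig` and `fs`). As a hedged sketch, the Python helper below builds one Pig action node with its `ok` and `error` transitions; the node names and the script file name are hypothetical, and `${jobTracker}` and `${nameNode}` are assumed to be supplied as job properties.

```python
import xml.etree.ElementTree as ET


def pig_action(name, script, ok_to, error_to):
    """Build an <action> node that runs a Pig script.

    ok_to and error_to name the nodes the workflow moves to when the
    action succeeds or fails.
    """
    action = ET.Element("action", {"name": name})
    pig = ET.SubElement(action, "pig")
    # Cluster endpoints are left as properties resolved at submission time.
    ET.SubElement(pig, "job-tracker").text = "${jobTracker}"
    ET.SubElement(pig, "name-node").text = "${nameNode}"
    ET.SubElement(pig, "script").text = script
    ET.SubElement(action, "ok", {"to": ok_to})
    ET.SubElement(action, "error", {"to": error_to})
    return action


node = pig_action("pig-report", "report.pig", ok_to="end", error_to="fail")
ET.indent(node)
print(ET.tostring(node, encoding="unicode"))
```

Other action types follow the same outer shape: only the element nested inside `action` and its children change.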
There are two basic types of Oozie jobs:
- Oozie Workflow jobs are Directed Acyclic Graphs (DAGs), specifying a sequence of actions
to execute. The workflow job has to wait
- Oozie Coordinator jobs are recurrent Oozie workflow jobs that are triggered by time and
data availability.
Oozie Bundle provides a way to package multiple coordinator and workflow jobs and to manage
the lifecycle of those jobs.
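The two coordinator triggers, time and data availability, can be illustrated with a small simulation. This is a simplified Python sketch under stated assumptions, not Oozie's implementation: one run is planned per nominal time at a fixed frequency, and a run is marked READY only if its input data has arrived, otherwise WAITING. The function names and the idea of representing available data as a set of timestamps are illustrative choices.

```python
from datetime import datetime, timedelta


def nominal_times(start, end, frequency):
    """Yield the planned run times in [start, end) at the given frequency."""
    current = start
    while current < end:
        yield current
        current += frequency


def run_states(start, end, frequency, available):
    """Map each planned run time to READY or WAITING.

    `available` is the set of run times whose input data has arrived.
    """
    return {
        t: ("READY" if t in available else "WAITING")
        for t in nominal_times(start, end, frequency)
    }


start = datetime(2024, 1, 1, 0, 0)
end = datetime(2024, 1, 1, 3, 0)
hourly = timedelta(hours=1)

# Data has arrived for the 00:00 and 02:00 runs but not yet for 01:00.
available = {datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 2, 0)}

states = run_states(start, end, hourly, available)
for when, state in states.items():
    print(when.isoformat(), state)
```

In this model the time trigger decides when a run is planned, and the data trigger decides whether that planned run may actually start.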
[Figure: Oozie architecture. Recoverable labels: Command Line, WS API, Oozie UI, Tomcat, Oozie, DB, and Hadoop, Pig, Hive.]
[Figure: workflow diagram. Recoverable node labels: Kill, End.]
Appendix
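[Figure: example workflow. Start -> RunHiveScript -> (OK) -> RunSqoopExport -> (OK) -> End; on Error, either action transitions to Kill. Other recoverable labels: Start, MR1 job, Java.]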
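The OK and Error transitions between RunHiveScript, RunSqoopExport, End and Kill can be traced with a small Python sketch. It is a toy walker over a transition table, not how Oozie executes jobs; the `outcomes` argument (which actions succeed or fail) is a hypothetical input used to drive the example.

```python
# Transition table: each action names its OK and Error targets.
TRANSITIONS = {
    "RunHiveScript": {"ok": "RunSqoopExport", "error": "Kill"},
    "RunSqoopExport": {"ok": "End", "error": "Kill"},
}

TERMINAL_NODES = ("End", "Kill")


def run_workflow(outcomes, first_action="RunHiveScript"):
    """Follow ok/error transitions until End or Kill is reached.

    `outcomes` maps an action name to True (succeeded) or False (failed);
    actions not listed are treated as successful.
    Returns the list of nodes visited, beginning with Start.
    """
    path = ["Start"]
    node = first_action
    while node not in TERMINAL_NODES:
        path.append(node)
        succeeded = outcomes.get(node, True)
        node = TRANSITIONS[node]["ok" if succeeded else "error"]
    path.append(node)
    return path


print(run_workflow({}))
print(run_workflow({"RunSqoopExport": False}))
```

When every action succeeds the walk ends at End; a failure in either action routes the workflow to Kill without running the remaining actions.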