
DataStage Naming Standards InfoTech Enterprises

Document Reference: DataStage Proposed Naming Standards
Version: 1.1
Reason for Issue: Updated for V8.1 new stages
Produced by: Data Integration Platform

05/02/2012 DataStage Development Standards and Guidelines Page 1 of 17

1. Version History
This table records the status and version history of this document.

Version: 0.1
Author: Suresh R
Status (draft, peer review etc): Initial Draft


2. Table of Contents
1. Version History
2. Table of Contents
3. Introduction
4. DataWarehousing Training Schedule
   4.1.1 Introduction to SQL
   4.1.2 Introduction To Infosphere Datastage
   4.1.3 Introduction To UNIX Operating System
5. Development Standards
   5.1 DataStage Objects Naming Standards
   5.2 Table Definitions Naming Conventions


3. Introduction
This document provides the set of naming standards for IBM DataStage jobs. While it contains some recommendations specific to release 8.1, most of the topics will remain appropriate for future releases. It also includes a training schedule for DataWarehousing resources.


4. DataWarehousing Training Schedule


4.1.1 Introduction to SQL

Days 1-10

What can SQL do? SQL can execute queries against a database, retrieve data from a database, insert records in a database, update records in a database, delete records from a database, create new databases, create new tables in a database, create stored procedures in a database, create views in a database, and set permissions on tables, procedures, and views.

SQL Basic: SQL Intro, SQL Syntax, SQL Select, SQL Distinct, SQL Where, SQL And & Or, SQL Order By, SQL Insert, SQL Update, SQL Delete

SQL Functions: SQL avg(), SQL count(), SQL first(), SQL last(), SQL max(), SQL min(), SQL sum(), SQL Group By, SQL Having, SQL ucase(), SQL lcase(), SQL mid(), SQL len(), SQL round()

SQL Advanced: SQL Like, SQL Wildcards, SQL In, SQL Between, SQL Alias, SQL Joins, SQL Inner Join, SQL Left Join, SQL Right Join, SQL Full Join, SQL Union, SQL Not Null, SQL Unique, SQL Primary Key, SQL Foreign Key, SQL Drop
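The statements in the Basic and Functions topics can be exercised end-to-end against any SQL database. As an illustrative sketch (not part of the course material; the table and data are made up), Python's built-in sqlite3 module is enough:

```python
import sqlite3

# In-memory database used to exercise the Select / Where / Order By /
# Group By / Insert / Update / Delete topics listed above.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Emp_Table "
            "(EmpNo INTEGER PRIMARY KEY, EmpName TEXT, Dept TEXT, Salary REAL)")
cur.executemany("INSERT INTO Emp_Table VALUES (?, ?, ?, ?)",
                [(1, "Anita", "EXT", 4200.0),
                 (2, "Ravi", "EXT", 3900.0),
                 (3, "Mohan", "CUS", 4500.0)])

# SQL Select + Where + Order By
names = [r[0] for r in cur.execute(
    "SELECT EmpName FROM Emp_Table WHERE Dept = 'EXT' ORDER BY Salary DESC")]
print(names)  # ['Anita', 'Ravi']

# Aggregate functions (count, max) + Group By + Having
by_dept = cur.execute(
    "SELECT Dept, COUNT(*), MAX(Salary) FROM Emp_Table "
    "GROUP BY Dept HAVING COUNT(*) > 1").fetchall()
print(by_dept)  # [('EXT', 2, 4200.0)]

# SQL Update and Delete
cur.execute("UPDATE Emp_Table SET Salary = Salary * 1.1 WHERE EmpNo = 2")
cur.execute("DELETE FROM Emp_Table WHERE EmpNo = 3")
print(cur.execute("SELECT COUNT(*) FROM Emp_Table").fetchone()[0])  # 2
```

The same statements run unchanged (apart from minor dialect differences in the scalar functions) on Oracle or DB2, the databases referenced later in this document.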


4.1.2 Introduction To Infosphere Datastage

Day 1. Datastage modules - an overview of the DataStage components and modules, with screenshots.

Day 2. Designing jobs - datastage palette - a list of all stages and activities used in
Datastage.

Day 3. Extracting and loading data - ODBC and ORACLE stages - description and use of
the ODBC and ORACLE stages (ORAOCI9) used for data extraction and data load. Covers ODBC input and output links, Oracle update actions and best practices.

Day 4. Extracting and loading data - sequential files - description and use of the sequential
files (flat files, text files, CSV files) in datastage.

Day 5, 6, 7. Transforming and filtering data - use of transformers to perform data conversions, mappings, validations and data refining. Design examples of the most commonly used Datastage jobs.

Day 8. Performing lookups in Datastage - how to use database stages as a lookup source.

Day 9, 10, 11. Implementing ETL process in Datastage - a step-by-step guide on how to
implement the ETL process efficiently in Datastage. Contains tips on how to design and run a set of jobs executed on a daily basis, and how to develop and use the Containers.

Day 12, 13, 14. SCD implementation in Datastage - illustrates how to implement SCDs (slowly changing dimensions) in Datastage; contains job designs, screenshots and sample data. All the Slowly Changing Dimension types are described in separate articles below: SCD Type 1, SCD Type 2, SCD Type 3 and 4.

Day 15. Datastage jobs in Canada Lands Project - a set of examples of job designs resolving
real-life problems implemented in production datawarehouse environments in various companies.

Day 16. Header and trailer file processing - a sample Datastage job which processes a text file organized in a header and trailer format.


4.1.3 Introduction To UNIX Operating System

Day 1. What is UNIX? Files and processes. The Directory Structure. Starting a UNIX terminal.

Days 2-5. Listing files and directories. Making Directories. Changing to a different Directory. The directories . and .. Pathnames. More about home directories and pathnames. Copying Files. Moving Files. Removing Files and directories. Displaying the contents of a file on the screen. Searching the contents of a file. Redirection. Redirecting the Output. Redirecting the Input. Pipes. Wildcards. Filename Conventions. Getting Help. File system security (access rights). Changing access rights. Processes and Jobs.

Day 6. Listing suspended and background processes. Killing a process. Other Useful UNIX commands.


5. Development Standards
This section outlines the standards that must be followed for DataStage job development on all projects.

5.1 DataStage Objects Naming Standards

DataStage Object: Projects
Naming Standard: XXXXXXXX-<purpose>
Description: Project names may be up to 18 characters long and may contain underscores and/or hyphens.
Examples: CGSBM-Dev_Phase1, CGSBM-Test, CGSBM-Prod

DataStage Object: Categories


Category hierarchy (non-standard sub-categories are broken out):

Data Elements
Jobs
  QA
    EXT
      EXT1000
      EXT2000
    INI, CUS, LNK, CHQ, TAX, INV, ARQ, STO, DFN (each subdivided by module, as for EXT)
  USERS
    User1
      TestJob1
      TestJob2
    User2

There will be two top-level job categories, USERS and QA.

The USERS area is where all development is carried out. Under the USERS area each developer has a subcategory named as their UNIX login, and developer unit testing is performed there. Once a developer has completed development and testing of their module they will move it into the QA area, into the relevant subcategory. Each developer is free to use whatever subcategories they wish in their own category; however, it is strongly recommended that they follow the module category convention as per the QA area.

The QA area is where jobs are placed to be logged into version control for system testing. In this example the QA area is subdivided into functional areas: Extraction (EXT), Initialisation (INI), Customer (CUS), Link (LNK), Cheque (CHQ), Tax (TAX), Investment (INV), Automated Request (ARQ), Standing Order (STO), Daily Function (DFN), Scheduler (SHD). Each functional area is further subdivided into categories corresponding to the module names. The module names are the functional area code with a four-digit numeric suffix in the thousands range, e.g. EXT1000 and EXT2000 would be used to contain jobs in modules EXT1000 and EXT2000 respectively.
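The module-to-category convention is mechanical enough to derive in code. The sketch below is hypothetical (it is not a DataStage facility, and the Jobs/QA/... path spelling is an assumption based on the hierarchy described above):

```python
import re

# Functional area codes listed in this document.
FUNCTIONAL_AREAS = {"EXT", "INI", "CUS", "LNK", "CHQ", "TAX",
                    "INV", "ARQ", "STO", "DFN", "SHD"}

def qa_category_path(module: str) -> str:
    """Return the QA category path for a module name such as 'EXT1000'.

    Modules are a three-letter functional area code plus a numeric
    suffix in the thousands range; the path is Jobs/QA/<area>/<module>.
    """
    m = re.fullmatch(r"([A-Z]{3})([1-9]000)", module)
    if not m or m.group(1) not in FUNCTIONAL_AREAS:
        raise ValueError(f"not a valid module name: {module!r}")
    return f"Jobs/QA/{m.group(1)}/{module}"

print(qa_category_path("EXT1000"))  # Jobs/QA/EXT/EXT1000
print(qa_category_path("CUS2000"))  # Jobs/QA/CUS/CUS2000
```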

DataStage Object: Jobs
Naming Standard: jbFFFNNNNxyz
Description: jb is a required prefix. FFF is the functional area code of the job, e.g. EXT, INI, CUS, LNK, CHQ, TAX, INV, ARQ, STO, DFN or SHD. NNNN is the job number, which must start in the same 1000s range as the module number; jobs increase in 10s and the first job in a module is job 10. xyz is an optional descriptive purpose using Init Caps notation, e.g. AccountsFile.
Examples: jbEXT1010AccountsFile - the first extraction job in module EXT1000 (the first EXT module)
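The jbFFFNNNNxyz rule can be checked automatically. The validator below is an illustrative sketch based only on the rules above, not a DataStage tool:

```python
import re

AREAS = "EXT|INI|CUS|LNK|CHQ|TAX|INV|ARQ|STO|DFN|SHD"
JOB_NAME = re.compile(rf"jb({AREAS})(\d{{4}})([A-Z][A-Za-z0-9]*)?")

def is_valid_job_name(name: str, module_number: int) -> bool:
    """Check a job name against the jbFFFNNNNxyz convention.

    The job number must sit in the same 1000s range as the module
    number and step in tens, starting at 10 (1010, 1020, ...).
    """
    m = JOB_NAME.fullmatch(name)
    if not m:
        return False
    job_no = int(m.group(2))
    same_range = job_no // 1000 == module_number // 1000
    steps_in_tens = job_no % 10 == 0 and job_no % 1000 >= 10
    return same_range and steps_in_tens

print(is_valid_job_name("jbEXT1010AccountsFile", 1000))  # True
print(is_valid_job_name("jbEXT1015AccountsFile", 1000))  # False (not a multiple of 10)
print(is_valid_job_name("jbEXT2010", 1000))              # False (wrong 1000s range)
```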

DataStage Object: Sequencers
Naming Standard: sqFFFNNNN
Description: sq is a prefix to identify Sequencers (a sequencer name in DataStage must start with an alphabetic character). FFF is the functional area. NNNN is the optional module number, required for sub-sequencers that run just that module; the sequencer for an entire functional area does not require it.
Examples: sqEXT - the sequencer for extraction; sqCUS2000 - the sub-sequencer for the second Customer module

DataStage Object: Stages
Naming Standard: <Stage ID> + <source/target name OR action verb> + <S or T>
Description: <Stage ID> indicates the type of stage (see the table below). <action verb> applies to active stages and indicates what action the stage is performing. <source/target name> applies to passive stages and consists of the table or file name. S or T is a suffix applicable to passive stages to indicate whether the stage is a source or a target.
Examples: or_IssueRatingS (Oracle OCI Source), xf_FilterReformats (Filters Reformat Transformer)

Stages can be categorized into the following types:
1. Passive Stages: source and/or target stages that connect to sources of data, including relational databases, ODBC, datasets, hashed files and sequential files. Passive stages use verbs to describe their action and the source and/or target to which the action applies.
2. Active Stages: transformation stages that contain the bulk of the transformation logic and business rules, and lookup or reference stages that connect to a lookup or reference data source. This type of stage is used principally as a translation mechanism for any transformation logic. It can also be used to pass information from one job to another.

Stage Type - Stage ID
Aggregator - ag
Basic Transformer - bt
Change Apply - ca
Change Capture - cc
Checksum - cs
Column Export - cx
Column Gen - cg
Column Import - ci
Combine Records - cr
Compare - cm
Complex Flat File - cf
Compress - co
Container (local) - lc
Container (shared) - sc
Copy - cp
Dataset - ds
Decode - de
Dedup - dd
Difference - di
External Source - es
External Target - et
Fileset - fs
Filter - fi
FTP - ft
Funnel - fu
Generic - ge
Hashed - hf
Head - hd
Join - jn
Lookup - lu
Lookup Fileset - lf
Make Subrecord - ms
Make Vector - mv
Merge - mg
Modify - mo
MQSeries - mq
OraBulk - ob
OraOCI - or
Parallel Transformer - xf
Peek - pk
Pivot - pt
Promote Subrecord - ps
Remove Duplicates - rd
Row Gen - rg
Sample - sm
Sequential File - sq
Slowly Changing Dimension - sd
Sort - st
Split Subrecord - ss
Split Vector - sv
Surrogate Key Generator - sk
Switch - sw
Tail - tl
TeraBulk - tb
Teradata - td
Transformer - xf
Wr Range Map - wr
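Putting the Stage ID table and the suffix rule together, a stage name can be assembled mechanically. The helper below is a hypothetical sketch, not a DataStage facility; it follows the underscore style of the examples above (or_IssueRatingS), and only a few Stage IDs are shown:

```python
# Abbreviated mapping; the full Stage Type / Stage ID table is above.
STAGE_IDS = {"Aggregator": "ag", "OraOCI": "or",
             "Parallel Transformer": "xf", "Sequential File": "sq"}

def stage_name(stage_type, subject, role=""):
    """Build a stage name: <Stage ID>_<subject><S or T>.

    subject is a source/target name for passive stages or an action
    verb for active stages; role is "S" or "T" for passive stages only.
    """
    suffix = role if role in ("S", "T") else ""
    return f"{STAGE_IDS[stage_type]}_{subject}{suffix}"

print(stage_name("OraOCI", "IssueRating", "S"))               # or_IssueRatingS
print(stage_name("Parallel Transformer", "FilterReformats"))  # xf_FilterReformats
```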

DataStage Object: Stage Variables
Naming Standard: sv<VARIABLENAME>
Description: sv to denote the Stage Variable, then the variable name in UPPERCASE.

DataStage Object: Links
Naming Standard: lk<action verb>Yyyyyy
Description: lk is a prefix to identify a Link. <action verb> describes what is happening to the data flowing through the link. Yyyyyy is an optional name of the table definition, field, or other specific instance of a general stage object flowing through this link.
Examples: lkSortAccountsData


Links (or workflows) connect stages and carry data from sources through any transform stages into the target stage(s). All links are named with respect to the source of the data on the link; in other words, all links are named with respect to the output stage of that link, not the input stage.

Server jobs support two types of links:
Stream Link: a link representing the flow of data from a source stage to a transform stage and a target stage. This link is represented by a solid line.
Reference Link: a link representing a table (data) lookup. It is used to provide information that might affect the way data is changed, but does not supply the data to be changed. This link is represented by a dotted line.

The output/input link to a passive stage should describe, at a lower level (such as the table), the data that is flowing through it.

DataStage Object: Routines
Naming Standard: <prefix>Xxxxxxxxx
Description: rt is a prefix to identify a routine; rtx is a prefix for an external routine. Xxxxxxxxx is a description of the function of the routine.

Routines should be used for complex transformations. A routine accepts a series of arguments and returns an answer. A routine may include an Action argument to indicate different return answers. Argument names should be changed from the default arg1, arg2, arg3, etc. to names relevant to the data the argument supplies. If the routine includes an Action argument, then the Action argument should accept relevant names indicating the return answer.

DataStage Object: Build Ops

Naming Standard: boGgggggggg
Description: A Build Op name starts with bo, and Ggggggggg is a short description of its function. Build Ops are custom operators developed in C++. They can be used to implement specialist functionality that cannot be easily developed using a combination of the standard operators available in the GUI.

DataStage Object: Table Definitions
Metadata/table definitions are described in the next section.

DataStage Object: Dataset Names
Dataset file-naming conventions are described in later sections. Dataset stages must have the same name as the filename, prefixed by ds and excluding the file extension, i.e. ds<FunctionalArea><ModuleNumber><purpose>, e.g. dsEXT1000SourceAccountsData.

DataStage Object: Job Parameters
Note that, on the DataStage canvas, job parameters all have a lowercase p prefix; the rest of the parameter name is upper case.

Naming Standard - Description
pORACLE_SID / pDB2_SID - SID for Oracle/DB2.
pORACLE_USER / pDB2_USER - Username to connect to Oracle/DB2.
pORACLE_PASSWORD / pDB2_PASSWORD - Password to connect to Oracle/DB2.
pDSPATH - Datasets directory.
pTARGET_FILE_DIR - Directory for saving sequential files.
pEFFECTIVE_DATE - Effective date for processing.
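Both the dataset-stage rule and the parameter-prefix rule reduce to simple string checks. The helpers below are an illustrative sketch, not part of DataStage:

```python
import os
import re

def dataset_stage_name(filename):
    """Name a Dataset stage after its file: ds + filename minus extension."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    return "ds" + stem

def is_valid_parameter_name(name):
    """Job parameters: lowercase p prefix, rest upper case (digits and _ allowed)."""
    return re.fullmatch(r"p[A-Z][A-Z0-9_]*", name) is not None

print(dataset_stage_name("EXT1000SourceAccountsData.ds"))  # dsEXT1000SourceAccountsData
print(is_valid_parameter_name("pORACLE_SID"))              # True
print(is_valid_parameter_name("pTargetDir"))               # False
```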

DataStage Object: Logic & Functions

Logical Statements: All logical statements should follow this standard: the first letter in upper case and the following letters in lower case. Example: If (InputDate = ) Then Step1 Else Step2

SQL Statements: SQL statements used in a Database stage should follow the same standard: the first letter of each keyword in upper case and the following letters in lower case. Example: Select EmpNo, EmpName From Emp_Table Where EmpNo Is NOT NULL

Logical Operators: Logical operators should be in upper case, e.g. AND, OR, NOT, NOR.

DataStage Functions: DataStage functions should be in InitCaps; each word in the function should start with upper case. Example: Trim(), Str(), IsNull(), NullToEmpty(), NullToValue(), NullToZero()

5.2 Table Definitions Naming Conventions

Table definitions define the format of the data to be used at each stage of a DataStage job. They are stored in the Repository and can be copied by all the jobs in a project. Table definitions are created in the following ways:
1. Imported from an external source or target, such as a database table, CSV file or CFD.
2. Saved from a user-defined set of columns from a Dataset, sequential file or fileset definition.
3. Manually created.
The original source metadata will be imported using the utility in DataStage Manager. These column definitions must not be changed.


\Datasets\staging - contains the table definitions of datasets that are stored between jobs within a functional area. Staging datasets must not be used to pass data across functional areas or as a direct source for data to be sent through Connect:Direct.

\Datasets\deliver - contains the table definitions of datasets that are produced from the output of a functional area and are intended to be used by a downstream functional area.

