
We have to start with the project architecture. Our projects mainly follow the onsite/offshore model.

In this project we have one staging area between the source and target databases. Some projects do not use a staging area, but having one simplifies the process.

Architecture phases: Analysis and Requirement Gathering, Design, Development, Testing, Production

Analysis and Requirement Gathering:
Output: Analysis Doc, Subject Area
Roles: Business Analyst, Project Manager
100% onsite. Gather the useful information for the DSS, identify the subject areas, identify the schema objects, and so on.

Design:
Output: Technical Design Docs, HLD, UTP
Roles: ETL Lead, BA and Data Architect
80% onsite: schema design in Erwin, implementing it in the database, and preparing the technical design document for ETL.
20% offshore: HLD and UTP. Based on the technical specs, developers have to create the HLD (high-level design). It contains the Informatica flow chart and lists the transformations required for each mapping. Some companies do not have an HLD and create mappings directly from the technical specs. The HLD covers only about 75% of the requirement.
UTP (Unit Test Plan): write the test cases based on the requirement, both positive and negative.

Development:
Output: Bug-free code, UTR, Integration Test Plan
Roles: ETL Team and offshore BA
100% offshore

You create the mappings based on the HLD. After that, another team member does a code review and a code standards review, and you update the mapping based on the review comments. Unit testing follows the UTP: you fill in the UTP, enter the expected values, and name the result the UTR (Unit Test Results). Code review and unit testing are each conducted twice in this phase. The code is then migrated to the testing repository. The integration test plan has to be prepared by the senior people.

Testing:
Output: ITR, UAT, Deployment Doc and User Guide
Roles: Testing Team, Business Analyst and Client
80% offshore: based on the integration test plan, the team tests the application and gives the bug list to the developers. Developers fix the bugs in the development repository and migrate the code to the testing repository again. Testing repeats until the code is bug free.
20% onsite: UAT (User Acceptance Testing). The client does the UAT. This is the last phase of the ETL project. If the client is satisfied with the product, the next step is deployment in the production environment.

Production:
50% offshore, 50% onsite. Work is distributed between offshore and onsite based on the run time of the application. Mapping bugs need to be fixed by the development team. The development team provides support for a warranty period of 90 days, or for the number of days in the agreement.

ETL projects have three repositories. Access permissions and location differ for each repository.
Development: E1
Testing: E2
Production: E3

For every phase above, I have given the outcome of the phase as Output, followed by the roles involved and the percentage of involvement.

In my project we do not call them dimensions and facts. We call them standard tables (dimensions) and relationship tables (fact tables).

The dimensions are Account, Customer, Involved Party, Distributor, etc. The facts are the relationships between those tables: ACCT_X_CUST, IP_X_ACCT, IP_X_IP.
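To make the idea of a relationship table concrete, here is a minimal Python sketch. It only illustrates how a table like ACCT_X_CUST links two standard tables. The column layout, IDs and names are assumptions made up for the example, not the project's real schema.

```python
# Hypothetical standard tables (dimensions): id -> descriptive value.
accounts = {101: "Savings", 102: "Checking"}
customers = {1: "Asha", 2: "Ravi"}

# Hypothetical ACCT_X_CUST relationship table: one row per
# (account id, customer id) pair, so a joint account appears twice.
acct_x_cust = [(101, 1), (101, 2), (102, 2)]


def customers_for_account(acct_id):
    """Return the names of all customers linked to one account."""
    return [customers[c] for a, c in acct_x_cust if a == acct_id]


def accounts_for_customer(cust_id):
    """Return the account types linked to one customer."""
    return [accounts[a] for a, c in acct_x_cust if c == cust_id]


print(customers_for_account(101))
print(accounts_for_customer(2))
```

The relationship table holds only keys, so the same pattern would work for IP_X_ACCT or IP_X_IP.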

It is not mandatory that your project has dimension and fact tables, because nowadays Informatica is used to automate any kind of process: some validations, converting a flat file to XML, and so on. Once clients have purchased the Informatica software, they try to use the tool for other projects as well.

In interviews you can say: "My project is not a data warehousing project. We automate some of the client's processes. My requirement is to validate the data and populate the target with data that is valid and useful to the client."

Q) Did you ever load a snowflake dimension? If yes, can I know the process of doing ETL mapping for a Level 4 snowflake dimension?

A) Snowflake dimensions? A Level 4 snowflake dimension? As far as I know there is no such term. Maybe it means a snowflake schema. This is the first time I am hearing of level 4 and snowflake dimensions.

Q) Did you do the mapping document? If not, who does that? How is it done?

A) Mapping documents (technical design documents) are developed by the onsite ETL Lead, the Business Analyst, and sometimes the offshore Lead and Senior Software Engineer. Depending on your knowledge and confidence level, you can say you were involved in that. Preparing mapping docs is not hard: based on the source structure and the target structure, we map the matching fields. Some fields need validation or a lookup on other tables, and we mention that in the related column of the doc. This logic is provided by the Business Analysts. If some target columns have no related fields in the source, we check with the Architect what those fields represent and what kind of data needs to be populated in them. Based on the design, the Architect confirms whether to use hardcoded values or nulls.

Q) How do you do the initial load (for example, 5 years of data) when we have only 3 years of data in the OLTP system, all attributes of the dimensions are updated, and the source system has only the latest information for populating the dimension?

A) The initial load loads all the historical data from the start up to the current date. If you want 5 years of data but only have 3 years, then first of all it is not valid data: you need all the previous data from day one. OLTP systems keep only a few years of data. The older data is moved to flat files and kept as historical data. In that case we load the flat files first as the initial load and then load the OLTP data. Once everything up to the current date is loaded, we run the incremental load daily from the next day onwards.

Mapplet:

A mapplet is a reusable object that we create in the Mapplet Designer. It contains a set of transformations and allows us to reuse the transformation logic in multiple mappings.

Mapplet restrictions:

The following cannot be used in a mapplet:

1. Non-reusable Sequence Generator transformations
2. External Stored Procedure transformations
3. Normalizer transformations
4. COBOL sources
5. XML Source Qualifier transformations
6. XML sources
7. Target definitions
8. Other mapplets

Worklet:

A worklet is a reusable object that we create in the Worklet Designer. It groups a set of tasks and allows us to reuse the task logic in different workflows.

Profiling:

Profiling is the process of examining data to check whether it meets the standards, and inspecting it for errors, inconsistencies, redundancies and incomplete information.
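As a rough illustration of what such a check looks for, here is a minimal Python sketch. It is not how an Informatica profiling tool works internally. The `profile` function, the field names and the sample rows are all hypothetical.

```python
from collections import Counter


def profile(rows, key, required):
    """Summarise basic quality problems in a list of dict records.

    key      -- field expected to be unique (redundancy check)
    required -- fields that must not be empty (completeness check)
    """
    key_counts = Counter(r.get(key) for r in rows)
    duplicates = sorted(k for k, n in key_counts.items() if k is not None and n > 1)
    missing = {f: sum(1 for r in rows if r.get(f) in (None, "")) for f in required}
    return {"row_count": len(rows), "duplicate_keys": duplicates, "missing": missing}


# Hypothetical sample data with one duplicate key and two empty values.
rows = [
    {"cust_id": 1, "name": "Asha", "city": "Pune"},
    {"cust_id": 2, "name": "", "city": "Delhi"},
    {"cust_id": 2, "name": "Ravi", "city": None},
]

report = profile(rows, key="cust_id", required=["name", "city"])
print(report)
```

A real profile would add more checks (data types, value ranges, patterns), but duplicates and missing values cover the redundancy and incompleteness cases mentioned above.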

Version:

A version is any one of multiple copies of a versioned object stored in the repository. The repository uses version numbers to differentiate versions.

CUME Function:

Returns a running total. A running total means CUME returns the total so far each time it adds a value.

ex:

sal      return_value
10000    10000
15000    25000
13000    38000

In the first row return_value is 10000, in the second row it is 25000 (10000 + 15000), in the third row it is 38000 (25000 + 13000), and so on.
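The same running total can be reproduced in plain Python. This is only an analogue for checking the arithmetic, not Informatica expression syntax.

```python
from itertools import accumulate

# The sal column from the example above.
salaries = [10000, 15000, 13000]

# accumulate() yields the sum so far after each value, like a running total.
running_totals = list(accumulate(salaries))

for sal, total in zip(salaries, running_totals):
    print(sal, total)
```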

Parameters and variables:

Mapping parameters and variables:

A mapping parameter is a value that stays constant during a session run. Create mapping parameters in a mapping or mapplet to reference values that remain constant throughout a session. Create mapping variables in a mapping or mapplet to write expressions referencing values that change from session to session.
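The difference can be sketched in Python as a simulation. This is not Informatica code: `REGION`, `saved_state` and `run_session` are hypothetical names, and the dictionary only stands in for the value the repository would keep between session runs.

```python
# Stand-in for the value saved between session runs (the "variable").
saved_state = {"last_loaded_day": 0}

# A "parameter": fixed for the whole run (hypothetical name and value).
REGION = "EAST"


def run_session(source_rows, state, region):
    """Load rows for one region that are newer than the saved variable."""
    start = state["last_loaded_day"]  # variable value at session start
    loaded = [r for r in source_rows if r["region"] == region and r["day"] > start]
    if loaded:
        # Save the new high-water mark for the next session run.
        state["last_loaded_day"] = max(r["day"] for r in loaded)
    return loaded


source = [
    {"day": 1, "region": "EAST", "amount": 50},
    {"day": 2, "region": "WEST", "amount": 70},
    {"day": 2, "region": "EAST", "amount": 20},
]

first = run_session(source, saved_state, REGION)   # picks up days 1 and 2
source.append({"day": 3, "region": "EAST", "amount": 90})
second = run_session(source, saved_state, REGION)  # picks up only day 3

print(len(first), len(second), saved_state)
```

The parameter is read the same way in both runs, while the variable carries the last loaded day forward. The second run can therefore behave as an incremental load.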

Workflow Variables: Workflow variables are used within a workflow to write expressions referencing values that change from one workflow run to the next.

Q) About variables

A) There are two types of variables.

1) Local Variable
2) Global Variable

Local Variable
----------------
Local variables are used within the mapping itself. The Expression, Aggregator and Rank transformations have variable ports, and these ports are local to the particular transformation. The scope of the variable is LOCAL.

Global Variable
-------------------

Global variables can be used across the mapping, session and workflow. Mapping variables are called global variables. The scope of the variable is GLOBAL. Mapping variables support three aggregation types: MAX, MIN and COUNT.

Q) What is the difference between a view and a materialized view? Explain both.

A) Normal view: It is a virtual table. It stores only the data definition (the query), not the data, so it does not occupy storage space for the data. A normal view depends on its base table.

Materialized view: A materialized view stores the result of a query over a set of attributes in a table. It stores the data as well as the data definition, so it occupies space. It can be refreshed on demand, on commit, or at a set interval. Once populated, it can be queried independently of the base table.
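The contrast can be demonstrated with Python's built-in sqlite3 module. SQLite has normal views but no materialized views, so the sketch below imitates one with a snapshot table that is rebuilt by a hypothetical `refresh_snapshot` helper, as an on-demand refresh. The table and column names are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, sal INTEGER)")
conn.executemany("INSERT INTO emp VALUES (?, ?)", [("a", 100), ("b", 200)])

# Normal view: only the query is stored, so it always reads the base table.
conn.execute("CREATE VIEW v_total AS SELECT SUM(sal) AS total FROM emp")


def refresh_snapshot(conn):
    """Rebuild the stored copy that stands in for a materialized view."""
    conn.execute("DROP TABLE IF EXISTS mv_total")
    conn.execute("CREATE TABLE mv_total AS SELECT SUM(sal) AS total FROM emp")


refresh_snapshot(conn)

# Change the base table after the snapshot was taken.
conn.execute("INSERT INTO emp VALUES ('c', 300)")

view_total = conn.execute("SELECT total FROM v_total").fetchone()[0]
stale_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]

refresh_snapshot(conn)
fresh_total = conn.execute("SELECT total FROM mv_total").fetchone()[0]

print(view_total, stale_total, fresh_total)
```

The view picks up the new row immediately. The stored copy shows the old total until it is refreshed.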

Example: materialized views are used in reporting tools when you create cubes and dimensions. Some information is very important and should not be shared, such as a credit card number. In this case a materialized view is used on that table.

Q) What is data reconciliation?

A) Data reconciliation is element-level checking that each element is valid. This includes matching the target against the source and confirming that it reflects an accurate, valid value.
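A minimal Python sketch of element-level reconciliation is shown below. It assumes that source and target rows can be matched on a key field. The `reconcile` function, the field names and the sample rows are hypothetical.

```python
def reconcile(source_rows, target_rows, key):
    """Compare source and target element by element, matched on a key."""
    target_by_key = {r[key]: r for r in target_rows}
    issues = []
    for src in source_rows:
        tgt = target_by_key.get(src[key])
        if tgt is None:
            issues.append((src[key], "<row>", "missing in target"))
            continue
        # Check every element of the source row against the target row.
        for field, value in src.items():
            if tgt.get(field) != value:
                issues.append(
                    (src[key], field, f"source={value!r} target={tgt.get(field)!r}")
                )
    return issues


# Hypothetical data: row 2 has a wrong amount, row 3 never reached the target.
source_rows = [{"id": 1, "amt": 10}, {"id": 2, "amt": 20}, {"id": 3, "amt": 30}]
target_rows = [{"id": 1, "amt": 10}, {"id": 2, "amt": 25}]

issues = reconcile(source_rows, target_rows, key="id")
for issue in issues:
    print(issue)
```

An empty list would mean every source element was found in the target with a matching value.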