
ETL TESTING

This guide provides the following sections:
1. Data warehouse concepts
2. Etl development life cycle
3. Etl test plan
4. Etl testing life cycle (or) Etl test process
5. Types of etl testing
6. Types of etl bugs
7. Bug reporting
8. Testing templates (test case, bug reporting, etc.)
9. Etl performance testing
10. Etl interview questions
11. Project with example
12. SQL
13. Unix

1. Data warehouse concepts

A data warehouse is a relational database that is a subject-oriented, integrated, time-variant and non-volatile collection of data used to support the strategic decision-making process.

[Figure: Data warehouse architecture]

2. Etl development life cycle

To learn ETL testing, SQL is mandatory, and you should also have some knowledge of Unix. Anyway, I will guide you through both in the last sections.

ETL Testing: ETL testing is similar to manual testing; we have to do it manually, with human interaction.

Once the ETL developer has inserted or updated the data in the data mart, we test that data mart before it is loaded into the centralized data warehouse. This test is called ETL testing.

Etl development life cycle:

1. Requirement analysis
2. High level design
3. Low level design
4. Development
5. SIT (system integration testing)
6. Review
7. Testing ---> ETL testing life cycle
8. UAT (user acceptance testing)
9. Production

3. Etl test plan

Test plan for a banking project:

Introduction: Banking
Background: Informatica, Oracle 10g
Test items: Fixed deposits, withdrawals
Features to be tested: Non-secure fields (secure fields such as passwords are not tested)
Approach: Types of ETL testing
Testing levels: Sanity, smoke
Pass & fail criteria: How many test cases pass and how many fail
Suspension criteria: The company will make some rules
Test environment: Staging server, client server (Alpha), production server (Beta), live server
Test deliverables: Test cases, bug logging, test procedure
Scheduled tasks: A timetable of the project or module
Staff & training: Required persons
Risk and mitigation: General holidays, sick leaves
Sign off: Higher authority
Features not to be tested: Secure fields, tables

4. Etl testing life cycle (or) Etl test process

[Figure: ETL testing life cycle]

5. Types of etl testing

1) Constraint Testing:

In the constraint testing phase, the test engineer identifies whether the data is mapped from source to target or not. The test engineer checks the following constraints in the ETL testing process:
a) NOT NULL
b) UNIQUE
c) Primary Key
d) Foreign key
e) Check
f) Default
g) NULL

2) Source to Target Count Testing:

In this testing, the tester checks whether the number of records in the source matches the number in the target. Whether the data is in ascending or descending order doesn't matter; only the count is required. When time is short, a tester can follow this type of testing.
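Here is a minimal sketch of the count check. It assumes the source and target are reachable from one connection. It uses Python's built-in sqlite3 module with hypothetical tables src_emp and tgt_emp so that it runs anywhere. Against Oracle, you would run the same SELECT COUNT(*) on each side.

```python
import sqlite3


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # Table names cannot be bound as SQL parameters, so pass only trusted names.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def count_check(conn: sqlite3.Connection, source: str, target: str) -> tuple:
    """Return (counts_match, source_count, target_count); row order does not matter."""
    src = row_count(conn, source)
    tgt = row_count(conn, target)
    return src == tgt, src, tgt


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_emp (eno INTEGER, ename TEXT, sal REAL)")
    conn.execute("CREATE TABLE tgt_emp (eno INTEGER, ename TEXT, sal REAL)")
    rows = [(1, "SMITH", 800.0), (2, "ALLEN", 1600.0), (3, "WARD", 1250.0)]
    conn.executemany("INSERT INTO src_emp VALUES (?, ?, ?)", rows)
    # Simulate a load that dropped one row.
    conn.executemany("INSERT INTO tgt_emp VALUES (?, ?, ?)", rows[:2])
    print(count_check(conn, "src_emp", "tgt_emp"))  # (False, 3, 2)
```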

3) Source to Target Data Validation Testing:

In this testing, the tester validates each and every value of the source data against the target data. In most financial projects, the tester also checks the decimal values.
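The sketch below compares every value with EXCEPT, using hypothetical src_emp/tgt_emp tables in SQLite (Oracle's equivalent operator is MINUS). A truncated decimal shows up as one row missing from the target and one unexpected row in it.

```python
import sqlite3


def data_diff(conn: sqlite3.Connection, source: str, target: str, columns: list) -> tuple:
    """Row-by-row comparison using EXCEPT (Oracle's equivalent operator is MINUS).

    Returns (missing_in_target, unexpected_in_target) as sorted lists of rows.
    Names are interpolated into the SQL, so pass only trusted table/column names.
    """
    cols = ", ".join(columns)
    missing = conn.execute(
        f"SELECT {cols} FROM {source} EXCEPT SELECT {cols} FROM {target}"
    ).fetchall()
    unexpected = conn.execute(
        f"SELECT {cols} FROM {target} EXCEPT SELECT {cols} FROM {source}"
    ).fetchall()
    return sorted(missing), sorted(unexpected)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_emp (eno INTEGER, sal REAL)")
    conn.execute("CREATE TABLE tgt_emp (eno INTEGER, sal REAL)")
    conn.executemany("INSERT INTO src_emp VALUES (?, ?)", [(1, 800.0), (2, 1600.5)])
    # The load truncated the decimal part of one salary.
    conn.executemany("INSERT INTO tgt_emp VALUES (?, ?)", [(1, 800.0), (2, 1600.0)])
    print(data_diff(conn, "src_emp", "tgt_emp", ["eno", "sal"]))
    # ([(2, 1600.5)], [(2, 1600.0)])
```

EXCEPT returns distinct rows, so this check does not flag duplicates. The count and duplicate checks cover that case.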

4) Threshold/Data Integrated Testing:

In this testing, the test engineer checks the ranges of the data. It is usually applied to population calculations, share marketing and business finance analysis (quarterly, half-yearly, yearly). For example:

MIN 4
MAX 10
RANGE 6 (MAX - MIN)
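Here is a minimal threshold check that reproduces the MIN 4 / MAX 10 / RANGE 6 example. It assumes a hypothetical metric table, plus business limits low/high that the requirements would supply.

```python
import sqlite3


def range_check(conn: sqlite3.Connection, table: str, column: str, low, high) -> dict:
    """MIN, MAX and RANGE (= MAX - MIN) of a column, plus a threshold verdict.

    Assumes the column has at least one non-NULL value; low/high are business thresholds.
    """
    mn, mx = conn.execute(f"SELECT MIN({column}), MAX({column}) FROM {table}").fetchone()
    return {"min": mn, "max": mx, "range": mx - mn, "within": low <= mn and mx <= high}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE metric (val INTEGER)")
    conn.executemany("INSERT INTO metric VALUES (?)", [(4,), (7,), (10,)])
    print(range_check(conn, "metric", "val", low=0, high=10))
    # {'min': 4, 'max': 10, 'range': 6, 'within': True}
```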

5) Field to Field Testing:

In field-to-field testing, the test engineer checks how much space each field occupies in the database, that is, whether the data fits the table's column data types.

NOTE: Also check the order of the columns, and that each source column maps to the correct target column.
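One way to automate the column-order and data type comparison in SQLite is PRAGMA table_info, which lists columns in order with their declared types. MAX(LENGTH(...)) then shows how much space the stored data actually occupies. The table names are hypothetical. On Oracle, the same metadata comes from the data dictionary (for example USER_TAB_COLUMNS).

```python
import sqlite3
from itertools import zip_longest


def column_layout(conn: sqlite3.Connection, table: str) -> list:
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk), in column order.
    return [(row[1].lower(), row[2].upper()) for row in conn.execute(f"PRAGMA table_info({table})")]


def layout_mismatches(conn: sqlite3.Connection, source: str, target: str) -> list:
    """(position, source_column, target_column) wherever the name, type or order differs."""
    src = column_layout(conn, source)
    tgt = column_layout(conn, target)
    return [(pos, s, t) for pos, (s, t) in enumerate(zip_longest(src, tgt)) if s != t]


def max_length(conn: sqlite3.Connection, table: str, column: str):
    # Longest stored value; SQLite does not enforce VARCHAR(n), so compare this with n yourself.
    return conn.execute(f"SELECT MAX(LENGTH({column})) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_emp (eno INTEGER, ename VARCHAR(10), sal NUMERIC)")
    conn.execute("CREATE TABLE tgt_emp (eno INTEGER, sal NUMERIC, ename VARCHAR(10))")
    for mismatch in layout_mismatches(conn, "src_emp", "tgt_emp"):
        print(mismatch)
```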

6) Duplicate Check Testing:

In this phase of ETL testing, a tester encounters duplicate values very frequently. Because a huge amount of data is present in the source and target tables, the tester uses database queries to find them, for example:

SELECT ENO, ENAME, SAL, COUNT(*)
FROM EMP
GROUP BY ENO, ENAME, SAL
HAVING COUNT(*) > 1;
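Here is the same query run through Python's sqlite3 module. It is a sketch using a hypothetical three-row EMP table in which one employee was loaded twice.

```python
import sqlite3

# The duplicate-check query from the text, reformatted only.
DUPLICATE_SQL = """
SELECT ENO, ENAME, SAL, COUNT(*)
FROM EMP
GROUP BY ENO, ENAME, SAL
HAVING COUNT(*) > 1
"""


def find_duplicates(conn: sqlite3.Connection) -> list:
    # Each row is (ENO, ENAME, SAL, occurrences); sorted so results are stable.
    return sorted(conn.execute(DUPLICATE_SQL).fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE EMP (ENO INTEGER, ENAME TEXT, SAL REAL)")
    conn.executemany(
        "INSERT INTO EMP VALUES (?, ?, ?)",
        [(7369, "SMITH", 800.0), (7369, "SMITH", 800.0), (7499, "ALLEN", 1600.0)],
    )
    print(find_duplicates(conn))  # [(7369, 'SMITH', 800.0, 2)]
```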

Note:
1) If there are mistakes in the primary key, or no primary key is assigned, duplicates may arise.
2) Sometimes a developer makes mistakes while transferring the data from source to target; at that time, duplicates may arise.
3) Duplicates also arise due to environment mistakes (for example, improper plugins in the tool).

7) Error/Exception Logical Testing:
1) The delimiter is available in valid tables.
2) The delimiter is not available in invalid tables (exception tables).

8) Incremental and Historical Process Testing:

When incremental data is loaded, the historical data must not be corrupted. If the historical data is corrupted, that is the condition in which bugs are raised.
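Here is a sketch of the historical check. It snapshots the history, runs the load, and confirms the snapshot is unchanged. It assumes a hypothetical load_date column that separates history from the new increment. The load is passed in as a function, so a real ETL job could be substituted.

```python
import sqlite3
from collections import Counter


def history_snapshot(conn: sqlite3.Connection, table: str, cutoff: str) -> Counter:
    # Rows with load_date before the cutoff are treated as historical data.
    # Assumes ISO-8601 date strings, which compare correctly as text.
    rows = conn.execute(f"SELECT * FROM {table} WHERE load_date < ?", (cutoff,)).fetchall()
    return Counter(rows)  # multiset, so duplicated rows are also compared


def history_intact(conn: sqlite3.Connection, table: str, cutoff: str, run_incremental_load) -> bool:
    """Snapshot history, run the incremental load, and confirm history is unchanged."""
    before = history_snapshot(conn, table, cutoff)
    run_incremental_load(conn)
    after = history_snapshot(conn, table, cutoff)
    return before == after


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (eno INTEGER, sal REAL, load_date TEXT)")
    conn.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                     [(1, 800.0, "2024-01-01"), (2, 1600.0, "2024-01-01")])
    # A correct incremental load only appends the new day's rows.
    ok = history_intact(conn, "emp", "2024-02-01",
                        lambda c: c.execute("INSERT INTO emp VALUES (3, 1250.0, '2024-02-01')"))
    print(ok)  # True
```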

9) Control Columns and Defect Values Testing:

This was introduced by IBM.

10) Navigation Testing:

Navigation testing is testing from the end user's point of view. If an end user cannot follow the application easily, the navigation is called bad or poor navigation. During testing, the tester identifies such navigation scenarios so that unnecessary navigation can be avoided.

11) Initialization Testing: Testing the application on the combination of hardware and software installed on the platform is called initialization testing.

12) Transformation Testing: When data is mapped from the source table to the target table, the test engineer checks that each transformation follows the mapping conditions. If it does not, the test engineer raises a bug.
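One way to sketch transformation testing is to recompute the expected target from the source using the mapping rules, then subtract the actual target. The two rules below (upper-casing names, annual salary = monthly salary * 12) are hypothetical examples, not rules from this project.

```python
import sqlite3

# Hypothetical mapping rules, for illustration only:
#   tgt_emp.ename      = UPPER(TRIM(src_emp.ename))
#   tgt_emp.annual_sal = src_emp.sal * 12
EXPECTED_SQL = "SELECT eno, UPPER(TRIM(ename)), sal * 12 FROM src_emp"
ACTUAL_SQL = "SELECT eno, ename, annual_sal FROM tgt_emp"


def transformation_bugs(conn: sqlite3.Connection) -> list:
    """Rows the mapping rules predict but the target does not contain."""
    return sorted(conn.execute(f"{EXPECTED_SQL} EXCEPT {ACTUAL_SQL}").fetchall())


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE src_emp (eno INTEGER, ename TEXT, sal REAL)")
    conn.execute("CREATE TABLE tgt_emp (eno INTEGER, ename TEXT, annual_sal REAL)")
    conn.executemany("INSERT INTO src_emp VALUES (?, ?, ?)", [(1, " smith ", 800.0), (2, "allen", 1600.0)])
    # Row 2 was loaded without applying the UPPER rule.
    conn.executemany("INSERT INTO tgt_emp VALUES (?, ?, ?)", [(1, "SMITH", 9600.0), (2, "allen", 19200.0)])
    print(transformation_bugs(conn))  # [(2, 'ALLEN', 19200.0)]
```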

13) Regression Testing: Modifying code to fix a bug or to implement new functionality can introduce new errors. These introduced errors are called regressions, and testing to identify regression effects is called regression testing.

14) Retesting: Re-executing the failed test cases after the bug has been fixed.

15) System Integration Testing:

Integration testing: After the programming process is complete, the developer integrates the modules. There are 3 models:
a) Top down
b) Bottom up
c) Hybrid

6. Types of etl bugs
1. User interface bugs/cosmetic bugs: Related to the GUI of the application: navigation, spelling mistakes, font style, font size, colors, alignment.
2. BVA related bugs: Minimum and maximum values.
3. ECP related bugs: Valid and invalid types.
4. Input/output bugs: Valid values not accepted; invalid values accepted.
5. Calculation bugs: Mathematical errors; the final output is wrong.
6. Load condition bugs: Does not allow multiple users; does not allow the customer-expected load.
7. Race condition bugs: System crash & hang; the system cannot run on client platforms.
8. Version control bugs: No logo matching; no version information available. This usually occurs in regression testing.
9. H/W bugs: The device is not responding to the application.
10. Source bugs: Mistakes in help documents.
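BVA and ECP stand for boundary value analysis and equivalence class partitioning. Here is a minimal sketch of how each could derive test values for a numeric field. It assumes a hypothetical field that must accept 4 to 10 inclusive.

```python
def boundary_values(minimum: int, maximum: int) -> list:
    """BVA: values just below, on, and just above each edge of the valid range."""
    return [minimum - 1, minimum, minimum + 1, maximum - 1, maximum, maximum + 1]


def equivalence_classes(minimum: int, maximum: int) -> dict:
    """ECP: one representative value per class (invalid-low, valid, invalid-high)."""
    return {
        "invalid_low": minimum - 1,
        "valid": (minimum + maximum) // 2,
        "invalid_high": maximum + 1,
    }


def accepts(value: int, minimum: int, maximum: int) -> bool:
    # Expected behaviour of a field that accepts only minimum..maximum inclusive.
    return minimum <= value <= maximum


if __name__ == "__main__":
    # Hypothetical field that must accept 4..10.
    for v in boundary_values(4, 10):
        print(v, "accept" if accepts(v, 4, 10) else "reject")
```

A BVA related bug shows up when the application's actual accept/reject result differs from accepts() at one of these edges.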

7. Bug reporting

Bug Life Cycle (or) Defect Tracking Process:

Detect defect -> Reproduce defect -> Report defect -> Bug fixing -> Bug resolving -> Bug closing
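Here is a minimal sketch of the life cycle above as an ordered state machine. It models only the linear path shown. Many defect trackers also allow states such as reopened or rejected, which are not modelled here.

```python
from enum import Enum


class BugState(Enum):
    # Stages of the bug life cycle, in the order listed above.
    DETECTED = 1
    REPRODUCED = 2
    REPORTED = 3
    FIXED = 4
    RESOLVED = 5
    CLOSED = 6


ORDER = list(BugState)  # Enum iteration follows definition order


def next_state(state: BugState) -> BugState:
    """Advance a bug one step along the linear life cycle."""
    position = ORDER.index(state)
    if position == len(ORDER) - 1:
        raise ValueError("a closed bug has no next state")
    return ORDER[position + 1]


if __name__ == "__main__":
    state = BugState.DETECTED
    while state is not BugState.CLOSED:
        state = next_state(state)
        print(state.name)
```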

8. Testing templates

1. Issue log/Clarification template
2. Test case template
3. Bug reporting template
4. Metrics template

Issue log/Clarification template:

Reference (Doc name) | Issue description | Clarification provider | Status | Raised date | Clarified date | Clarified by | Remarks

Test case template:

S.No | TC_ID | Description | Expected result | Status | Query | Comment

Bug reporting template:

Defect_ID | Description | Build_ID | Version_ID | Severity | Priority | Status | Assigned to | Detected by

Metrics template:

Date | No. of test cases designed | No. of test cases executed | No. of test cases failed | No. of test cases on hold | No. of defects logged | Comments
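The metrics template maps naturally onto a record type. The percentage formulas in the sketch below are assumptions: execution % = executed / designed and pass % = (executed - failed) / executed are common conventions but are not part of the template. The hold count is only recorded.

```python
from dataclasses import dataclass


@dataclass
class DailyMetrics:
    # One row of the metrics template.
    date: str
    designed: int
    executed: int
    failed: int
    hold: int
    defects_logged: int
    comments: str = ""

    def passed(self) -> int:
        # Assumption: every executed test case that did not fail counts as passed.
        return self.executed - self.failed

    def execution_pct(self) -> float:
        return 100.0 * self.executed / self.designed if self.designed else 0.0

    def pass_pct(self) -> float:
        return 100.0 * self.passed() / self.executed if self.executed else 0.0


if __name__ == "__main__":
    day = DailyMetrics("2024-01-15", designed=40, executed=30, failed=6, hold=4, defects_logged=5)
    print(day.passed(), day.execution_pct(), day.pass_pct())  # 24 75.0 80.0
```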

9. Etl performance testing

ETL Performance Tuning:

In the ETL performance testing phase, a tester can be involved at the database level or the core database level. Along with the database tester, the ETL tester can also be involved in performance tuning. Performance tuning is server-side work.

What is performance testing?

Performance testing tests the server's response under different user loads. The purpose of performance testing is to find bottlenecks in the application.

What is a bottleneck?

A bottleneck is the break point at which the server is at its peak, that is, the pinpoint (or break point) at which the server is busy with user requests.

ETL Performance Life Cycle:

1. Workflow requirements
2. Performance objective
3. Performance testing
4. Performance measurements
5. Performance tuning

ETL Workflow Requirements: In the workflow requirements phase, the ETL tester identifies the performance scenarios: how the database connects to the server, which environment supports performance testing, the front-end and back-end environments, batch jobs, data merging, file system components and, finally, reporting events.

Performance Objective: The performance objective is to start end-to-end performance testing. Most of the time, the performance objective is decided by the client.

Performance Testing: To calculate the speed of the project, the ETL tester tests at the database level, that is, whether the database loads the target properly. When the ETL developer does not load the data under proper conditions, the performance of the system suffers.

Performance Measurements: During performance execution, we need to measure the following metrics:
1. Client-side metrics
2. Hits/sec
3. Throughput
4. Memory allocation
5. Process resources
6. Database statistics (database user conditions)
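Throughput can be measured as rows loaded per second. Here is a minimal sketch that assumes a hypothetical tgt_emp table in SQLite and a single batch insert. Real ETL tools report their own statistics, and the numbers this produces depend entirely on the machine.

```python
import sqlite3
import time


def measure_load(conn: sqlite3.Connection, rows: list) -> dict:
    """Time a batch insert and report rows loaded, elapsed seconds and throughput (rows/sec)."""
    conn.execute("CREATE TABLE IF NOT EXISTS tgt_emp (eno INTEGER, ename TEXT, sal REAL)")
    before = conn.execute("SELECT COUNT(*) FROM tgt_emp").fetchone()[0]
    start = time.perf_counter()
    with conn:  # commits once when the block ends, like a single batch job
        conn.executemany("INSERT INTO tgt_emp VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    loaded = conn.execute("SELECT COUNT(*) FROM tgt_emp").fetchone()[0] - before
    throughput = loaded / elapsed if elapsed > 0 else float("inf")
    return {"rows": loaded, "seconds": elapsed, "rows_per_sec": throughput}


if __name__ == "__main__":
    sample = [(i, f"EMP{i}", 1000.0) for i in range(50_000)]
    print(measure_load(sqlite3.connect(":memory:"), sample))
```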

Performance Tuning:

Performance tuning is the mechanism for fixing performance-related issues. As performance testers, we give recommendations to the tuning department:

Code level ---------- Developer
Database level ------ DBA
Network level ------- Administrator
System level -------- S/A
Server level -------- Server-side people
10. Etl interview questions:

1. What is the difference between OLAP and OLTP?
2. Tell me about your ETL workflow process.
3. What is the difference between an operational database and a warehouse?
4. What type of approach do you follow in your project?
5. What is the difference between a data mart and a data warehouse?
6. In your project, which type of database are you using, and how much space?
7. Explain the test case template.
8. What is the difference between severity and priority?
9. What is the difference between SDLC and STLC?
10. What is the difference between an issue log and a clarification log?
11. What types of bugs have you faced in your project?
12. What is banking?
13. Explain the types of banking.
14. What is the difference between a dimension table and a fact table?
15. Explain SCDs and their types. How are they used?
16. Explain bug reporting.
17. Are you using any models in SDLC?
18. Which process is used in ETL testing?
19. What is unit testing? Who does it?
20. What is the difference between an incremental load and an initial load?
21. Through which document have you done your project?
22. Are you using the Requirement tab in QC?

11. Project with example

Here I take the emp table as an example. I will write test scenarios and test cases for it, which means we are testing the emp table.

Checklist or test scenarios:
1. To validate the data in the table (emp).
2. To validate the table structure.
3. To validate the null values of the table.
4. To validate the null values of every attribute.
5. To check the duplicate values of the table.
6. To check the duplicate values of each attribute of the table.
7. To check the field value or space (length of the field size).
8. To check the constraints (foreign key, primary key).
9. To check the names of the employees who have not earned any commission.
10. To check all employees who work in dept no. (accounts dept, sales dept).
11. To check the row count of each attribute.
12. To check the row count of the table.
13. To check the max salary from the emp table.
14. To check the min salary from the emp table.
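Here is a sketch of several checklist items written as queries. It assumes a hypothetical emp table modelled on Oracle's classic EMP demo schema, where deptno 10 is ACCOUNTING and 30 is SALES. "No commission" is taken to mean comm is NULL or 0.

```python
import sqlite3

# One query per checklist item. Assumptions: "no commission" means comm IS NULL or 0,
# and deptno 10 = ACCOUNTING, 30 = SALES as in Oracle's classic EMP/DEPT demo tables.
CHECKS = {
    "row_count": "SELECT COUNT(*) FROM emp",
    "null_enames": "SELECT COUNT(*) FROM emp WHERE ename IS NULL",
    "duplicate_enos": "SELECT eno FROM emp GROUP BY eno HAVING COUNT(*) > 1",
    "no_commission": "SELECT ename FROM emp WHERE comm IS NULL OR comm = 0 ORDER BY ename",
    "accounts_or_sales": "SELECT ename FROM emp WHERE deptno IN (10, 30) ORDER BY ename",
    "max_sal": "SELECT MAX(sal) FROM emp",
    "min_sal": "SELECT MIN(sal) FROM emp",
}


def run_checks(conn: sqlite3.Connection) -> dict:
    # Run every check and collect the raw result rows by check name.
    return {name: conn.execute(sql).fetchall() for name, sql in CHECKS.items()}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE emp (eno INTEGER, ename TEXT, sal REAL, comm REAL, deptno INTEGER)")
    conn.executemany(
        "INSERT INTO emp VALUES (?, ?, ?, ?, ?)",
        [
            (7369, "SMITH", 800.0, None, 20),
            (7499, "ALLEN", 1600.0, 300.0, 30),
            (7782, "CLARK", 2450.0, None, 10),
            (7844, "TURNER", 1500.0, 0.0, 30),
        ],
    )
    for name, result in run_checks(conn).items():
        print(name, result)
```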
Introduction to database:

Data: The properties of anything are called data. Ex: meaningful facts, text, graphics, images, sound, video segments.
Information: Data processed to be useful in decision making. Ex: a student got 1st rank.
Database: Used to store the information.

In earlier days, we used flat file systems to store information, such as:
1. Spreadsheets
2. Folders
3. Ledgers
4. Lists
These storage methods are called flat file systems.

Disadvantages:
1. Data redundancy
2. Limited data sharing
3. Excessive program maintenance

File system approach: for each program, we have to maintain a separate file.

To avoid these drawbacks, RDBMS came into the picture.

RDBMS:

RDBMS is an advanced version of DBMS with relationships. It is also used to store and manage data, more efficiently than a DBMS.

[Figure: RDBMS approach]

You can't connect directly to the database; it won't allow that, so we use an RDBMS.

SQL (Structured Query Language): Its purpose is to store (or) manage the information in a relational database. SQL is a set of standards maintained by the ANSI group.

Installation procedures for Oracle 10g, 11g:
1. Installation of Oracle 10g on Windows XP
2. Installation of Oracle 11g on Windows 7

Once SQL is installed, prepare the content below and practice it simultaneously.

DATAWAREHOUSE-BASICS

What is a data warehouse? Why do we need a data warehouse?

According to Ralph Kimball: A data warehouse is a relational database that is designed for querying and analyzing the business, not for transaction processing. It usually contains historical data derived from transactional data (different source systems).

According to W.H. Inmon: A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data used to support the strategic decision-making process.

Characteristic features of a data warehouse:
1. Subject oriented
2. Integrated
3. Non-volatile
4. Time variant

Note: The first data warehousing system was implemented in 1987 by W.H. Inmon.

Subject oriented: Data warehouses are designed to be subject-oriented; they are used to analyze the business by top-level management, middle-level management, or an individual department in an enterprise.

[Figure: Transactional storage (process oriented) vs. data warehouse storage (subject oriented)]

For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?" This ability to define a data warehouse by subject matter (sales, in this case) makes the data warehouse subject oriented.

Integrated: A data warehouse is an integrated database which contains the business information collected from various operational data sources.

[Figure: Integration of data, from transactional storage to data warehouse storage]
Encoding: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y -> warehouse: M, F
Unit of attributes: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline mcf -> warehouse: pipeline cm
Physical attributes: Appl. A - balance dec(13,2); Appl. B - balance PIC 9(9)V99; Appl. C - balance float -> warehouse: balance dec(13,2)
Naming conventions: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance -> warehouse: balance
Data consistency: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute) -> warehouse: date (Julian)

Time variant: A data warehouse is a time-variant database, which allows you to analyze and compare the business with respect to various time periods (year, quarter, month, week, day), because it maintains historical data.

[Figure: Current data in transactional storage vs. historical data in data warehouse storage]

Non-volatile: A data warehouse is a non-volatile database. That means once data has entered the data warehouse, it cannot change. It doesn't reflect the changes taking place in the operational database; hence the data is static.

[Figure: Volatile vs. non-volatile]

According to Babcock: A data warehouse is a repository of data summarized or aggregated in simplified form from operational systems. End-user-oriented data access and reporting tools let users get at the data for decision support.

Why do we need a data warehouse?
1. To store large volumes of historical detail data from mission-critical applications
2. Better business intelligence for end users
3. Data security: to prevent unauthorized access to sensitive data
4. Replacement of older, less-responsive decision support systems
5. Reduction in time to locate, access, and analyze information

Evolution:
1. 60s: Batch reports
   1. Hard to find and analyze information
   2. Inflexible and expensive; every new request has to be reprogrammed
2. 70s: Terminal-based DSS and EIS (executive information systems)
   1. Still inflexible, not integrated with desktop tools
3. 80s: Desktop data access and analysis tools
   1. Query tools, spreadsheets, GUIs
   2. Easier to use, but only access operational databases
4. 90s: Data warehousing with integrated OLAP engines and tools

What is an operational system? Or, what is OLTP?
1. Operational systems are the systems that help us run day-to-day enterprise operations.
2. Online transaction processing (OLTP) systems are not built to hold history data.
3. The data in these systems is current data only.
4. The data in these systems is maintained in 3NF. The data is used for running the business, not for analyzing the business.
5. Examples are online reservations, credit-card authorizations, and ATM withdrawals.

Difference between OLTP and data warehouse (OLAP): In general, we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.

Operational system (OLTP) | Data warehouse (OLAP)
Designed to support business transaction processing | Designed to support the decision-making process
Application-oriented data | Subject-oriented data
Current data | Historical data
Detailed data | Summary data
Volatile data | Non-volatile data
Less history (36 months) | More history (5-10 years)
Normalized data | Denormalized data
Designed for running the business | Designed for analyzing the business
Supports ER modeling | Supports dimensional modeling
Clerical users can access this data | Knowledge users can access this data
DB size 100 MB-GB | DB size 100 GB-TB
Few indexes | Many indexes
Many joins | Some joins

Advantages of data warehousing:
1. High query performance
2. Queries not visible outside the warehouse
3. Can operate when sources are unavailable
4. Can query data not stored in a DBMS
5. Extra information at the warehouse:
   1. Modify, summarize (store aggregates)
   2. Add historical information
6. Improves the quality and accessibility of data.
7. Reduces the requirements of users to access operational data.
8. Allows new reports and studies to be introduced without disrupting operational systems.
9. Increases the amount of information available to users.

Types of data warehouse: There are three types of data warehouses:
1. Centralized data warehouse: A centralized DW is one in which data is stored in a single, large primary database. This database can be queried directly or used to feed data marts.
2. Functional data warehouse: A functional DW is dedicated to a subset of the business, such as a marketing or finance business function.
   1. Separate DWs for different business capabilities
   2. Easier to build initially
