
1. Introduction

1.1. Business Intelligence


Business intelligence (BI) refers to computer-based techniques used
in identifying, extracting, and analyzing business data, such as sales
revenue by product or department, or associated costs and incomes.
BI technologies provide historical, current, and predictive views
of business operations. Common functions of business intelligence
technologies are reporting, online analytical processing, analytics, data
mining, business performance management, benchmarking, text
mining, and predictive analytics.
Business intelligence aims to support better business decision-making. Thus a BI system can be called a decision support system (DSS).
BI uses technologies, processes, and applications to analyze mostly
internal, structured data and business processes, while competitive
intelligence gathers, analyzes, and disseminates information with a topical
focus on company competitors. Understood broadly, business intelligence
includes competitive intelligence as a subset.

1.2. Data Warehousing
A data warehouse (DW) is a database used for reporting. Data is
offloaded from the operational systems and may pass through an
operational data store for additional operations before it is loaded
into the DW. A data warehouse maintains its functions in
three layers: staging, integration, and access. The staging layer stores raw
data for use by developers (analysis and support). The integration layer
integrates the data and provides a level of abstraction from users. The
access layer delivers data to users.
This definition of the data warehouse focuses on data storage. Data
from the source systems is cleaned, transformed, catalogued, and made
available for use by managers and other business professionals for data
mining, online analytical processing, market research, and decision
support (Marakas & O'Brien 2009). However, the means to retrieve and
analyze data, to extract, transform and load data, and to manage the data
dictionary are also considered essential components of a data
warehousing system. Many references to data warehousing use this
broader context. Thus, an expanded definition of data warehousing
includes business intelligence tools, tools to extract, transform and load
data into the repository, and tools to manage and retrieve metadata.

1.3. Microsoft SQL Server 2008 R2


Microsoft SQL Server is a relational database server produced
by Microsoft. Its primary query languages are T-SQL and ANSI SQL. SQL
Server 2008 R2 adds certain features to SQL Server 2008. These include
Master Data Services, a master data management system that provides
central management of master data entities and hierarchies, and Multi
Server Management, a centralized console for managing multiple SQL Server
2008 instances and services, including relational databases, Reporting
Services, Analysis Services, and Integration Services.

1.4. Business Intelligence Development Studio


Business Intelligence Development Studio (BIDS) is the IDE from
Microsoft used for developing data analysis and business intelligence
solutions with Microsoft SQL Server Analysis Services, Reporting
Services, and Integration Services. It is based on the Microsoft Visual
Studio development environment, customized with SQL Server
service-specific extensions and project types, including tools, controls,
and projects for reports, ETL data flows, OLAP cubes, and data mining
structures.

2. Personal Loans
Personal loans are unsecured loans which people can use for a
variety of purposes, such as paying tax bills, covering school tuition, or
making car repairs. Many banks and other lenders offer personal loans to
people with good credit records who can demonstrate an ability to repay
them. This type of loan is often touted as a useful tool for consolidating
debt, for people who have multiple outstanding accounts which are
difficult to manage. By using a single loan to pay off debt, people can
consolidate their debt into one monthly payment, and they may also
achieve a lower interest rate, which is a distinct benefit. Consolidating
debt also tends to increase one's credit rating.
There are two types of personal loans. A closed-end loan is a one-time
loan of a set amount, with a fixed rate and repayment schedule. This
type of loan often has a repayment period of one to two years, depending
on the amount which is borrowed, and borrowers can choose to make
additional payments to pay the loan off more quickly. For one-time
expenses, a closed-end loan can be very useful.

3. Problem Definition
The objective is to perform Extract, Transform & Load (ETL)
operations on a set of input files containing the details of personal loans
at a particular bank. Each input file has a specific format, such as XML,
TXT, or CSV. The first part of an ETL process involves extracting the data
from these sources and carrying out transformations on it.
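The extract and transform steps can be sketched in Python as follows. This is a minimal illustration, not the SSIS packages used in the project: the sample contents, column names, and tag names (`loan_id`, `customer_id`, `amount`, `<payment>`) are hypothetical, since the report does not list the real file layouts.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample contents; the real column and tag names are not given in the report.
ALLOCATION_CSV = """loan_id,customer_id,amount
L001,C01,5000
L002,C02,12000
"""

PAYMENTS_XML = """<payments>
  <payment loan_id="L001" amount="250.50"/>
  <payment loan_id="L002" amount="600"/>
</payments>"""


def extract_csv(text):
    """Extract: parse CSV text into a list of dict records."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_xml(text):
    """Extract: parse XML text into a list of dict records, one per <payment>."""
    root = ET.fromstring(text)
    return [dict(node.attrib) for node in root.iter("payment")]


def transform(records, numeric_field):
    """Transform: trim whitespace in every field and convert one field to float."""
    cleaned = []
    for rec in records:
        row = {key: value.strip() for key, value in rec.items()}
        row[numeric_field] = float(row[numeric_field])
        cleaned.append(row)
    return cleaned


allocations = transform(extract_csv(ALLOCATION_CSV), "amount")
payments = transform(extract_xml(PAYMENTS_XML), "amount")
print(allocations[0])
print(payments[0])
```

Both sources end up as lists of dictionaries with the same shape, so the later load step does not need to know which file format a record came from.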
The load phase loads the data into the end target, usually the data
warehouse (DW). Because the load phase interacts with a database, the
constraints defined in the database schema, as well as in triggers
activated upon data load, apply (for example, uniqueness, referential
integrity, mandatory fields). These constraints also contribute to the
overall data quality of the ETL process.
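The effect of schema constraints during the load phase can be sketched as below. This is an illustration under stated assumptions: SQLite (through Python's `sqlite3` module) stands in for the SQL Server target, and the `customers` and `loans` tables with their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE customers (
    customer_id TEXT PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE loans (
    loan_id     TEXT PRIMARY KEY,
    customer_id TEXT NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
""")
conn.execute("INSERT INTO customers VALUES ('C01', 'Sample Customer')")

# Hypothetical rows coming out of the transform step.
rows = [
    ("L001", "C01", 5000.0),  # valid
    ("L001", "C01", 7000.0),  # duplicate key: uniqueness violation
    ("L002", "C99", 1200.0),  # unknown customer: referential integrity violation
    ("L003", "C01", None),    # missing amount: mandatory field violation
]

loaded, rejected = [], []
for row in rows:
    try:
        conn.execute("INSERT INTO loans VALUES (?, ?, ?)", row)
        loaded.append(row)
    except sqlite3.IntegrityError as exc:
        # Keep the rejected row and the reason so it can be reviewed later.
        rejected.append((row, str(exc)))
conn.commit()

print(len(loaded), "loaded;", len(rejected), "rejected")
```

Collecting rejected rows with their error messages, rather than aborting the whole load, is one possible design choice; a real pipeline might instead redirect such rows to an error table.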

4. Stage I: Building The Warehouse


The inputs to the ETL process are the following files:
1. allocation.csv (contains the details of the loans allocated)
2. customers.txt (contains the details of the customers who have availed loans)
3. employees.txt (contains the organizational details of the employees)
4. payments.xml (contains the payments or dues made by the customers)
Below is the structure of each input file:

Classic Car Retailer (customers, products, sales orders, sales order line items, etc.)

1.) Customer Table
2.) Employees Table
3.) Offices Table
4.) Orderdetails Table
5.) Orders Table
6.) Payments Table
7.) Productlines Table
8.) Products Table

Data Flow (All Tables)
Customers Data Flow
Employees Data Flow
Offices Data Flow
Products Data Flow
Productline Data Flow
Orders Data Flow
Orderdetails Data Flow
Payments Data Flow

Running Tables

Data Source View

Cubes

Problem 1:

Retrieve the product code and product name from the products table,
together with the text description of each product line from the productlines table.

Query 1:
SELECT productCode, productName, textDescription
FROM
products T1
INNER JOIN productlines T2 ON T1.productline = T2.productline
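The behavior of this inner join can be checked on a tiny data set. The sketch below uses SQLite through Python's `sqlite3` module as a stand-in for the project database, and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE productlines (productLine TEXT PRIMARY KEY, textDescription TEXT);
CREATE TABLE products (productCode TEXT PRIMARY KEY, productName TEXT, productLine TEXT);
INSERT INTO productlines VALUES ('Classic Cars', 'Sample description A'),
                                ('Planes', 'Sample description B');
INSERT INTO products VALUES ('P1', 'Sample Car', 'Classic Cars'),
                            ('P2', 'Sample Item', 'Ships');
""")

# Same shape as Query 1: only products whose product line exists in productlines are returned.
inner_rows = conn.execute("""
    SELECT productCode, productName, textDescription
    FROM products T1
    INNER JOIN productlines T2 ON T1.productLine = T2.productLine
""").fetchall()
print(inner_rows)
```

Product `P2` is dropped because its product line has no match in `productlines`, and the `Planes` line is dropped because no product refers to it.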

Problem 2:
Each order in the orders table must belong to a customer in the customers table.
Each customer in the customers table can have zero or more orders in the orders table.
To find all orders that belong to each customer, you can use the LEFT JOIN clause as follows:

Query 2:
SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
FROM
customers c
LEFT JOIN
orders o ON c.customerNumber = o.customerNumber

Problem 3:
To find all customers who have not ordered any products, you can use the following query:

Query 3:
SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
FROM customers c
LEFT JOIN orders o ON c.customerNumber = o.customerNumber
WHERE o.orderNumber IS NULL
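The difference between the plain LEFT JOIN and the LEFT JOIN filtered on NULL can be seen on a small example. This sketch assumes SQLite (via Python's `sqlite3`) as a stand-in database and uses invented sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, customerNumber INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'Sample Customer A'), (2, 'Sample Customer B');
INSERT INTO orders VALUES (100, 1, 'Shipped');
""")

LEFT_JOIN = """
    SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
    FROM customers c
    LEFT JOIN orders o ON c.customerNumber = o.customerNumber
"""

# Every customer appears; customers without orders get NULL (None) order columns.
all_rows = conn.execute(LEFT_JOIN + " ORDER BY c.customerNumber").fetchall()

# Keeping only the NULL rows leaves the customers who never ordered.
no_orders = conn.execute(LEFT_JOIN + " WHERE o.orderNumber IS NULL").fetchall()

print(all_rows)
print(no_orders)
```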

Problem 4:
Count the orders in each delivery status by grouping on status:

Query 4:
SELECT
status, COUNT(*) AS orderCount
FROM
orders
GROUP BY status
ORDER BY status DESC;
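A sort direction written inside GROUP BY is accepted only by some engines (older MySQL versions, for example); elsewhere the ordering goes in a separate ORDER BY clause. The sketch below shows that portable form, assuming SQLite via Python's `sqlite3` as a stand-in and invented status values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)",
    [("Shipped",), ("Shipped",), ("Cancelled",),
     ("In Process",), ("Shipped",), ("In Process",)],
)

# One row per status, with the statuses sorted in descending alphabetical order.
status_counts = conn.execute("""
    SELECT status, COUNT(*) AS orderCount
    FROM orders
    GROUP BY status
    ORDER BY status DESC
""").fetchall()
print(status_counts)
```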

Problem 5:
We can use the GROUP BY clause to get the order number, the number of items sold
per order, and the total sales for each order. The total is the sum of quantity
times unit price over the order's line items:

Query 5:
SELECT
orderNumber,
SUM(quantityOrdered) AS itemsCount,
SUM(quantityOrdered * priceEach) AS total
FROM
orderdetails
GROUP BY orderNumber;
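Total sales for an order is the sum of quantity times unit price over its line items; summing the unit price alone gives a different number. The sketch below compares the two on invented rows, assuming SQLite via Python's `sqlite3` as a stand-in database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orderdetails (orderNumber INTEGER, quantityOrdered INTEGER, priceEach REAL)"
)
conn.executemany(
    "INSERT INTO orderdetails VALUES (?, ?, ?)",
    [(10100, 30, 100.0), (10100, 50, 20.0), (10101, 10, 5.0)],
)

# total multiplies quantity by unit price per line item; priceSum only adds unit prices.
order_totals = conn.execute("""
    SELECT orderNumber,
           SUM(quantityOrdered)             AS itemsCount,
           SUM(quantityOrdered * priceEach) AS total,
           SUM(priceEach)                   AS priceSum
    FROM orderdetails
    GROUP BY orderNumber
    ORDER BY orderNumber
""").fetchall()
print(order_totals)
```

For order 10100 the line items are worth 30 × 100 + 50 × 20 = 4000, while the unit prices alone add up to only 120.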
