Introduction:
1.2. Data Warehousing
A data warehouse (DW) is a database used for reporting and analysis. Its
data is offloaded from the operational systems and may pass through an
operational data store for additional processing before it reaches the DW.
A data warehouse organizes its functions in three layers: staging,
integration, and access. The staging layer stores raw data for use by
developers (analysis and support). The integration layer integrates the
data and provides a level of abstraction from users. The access layer
delivers the data to users.
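The three layers can be illustrated with a minimal sketch using Python's built-in sqlite3 module. The table names (stg_loans, int_loans), the view name (v_loan_totals) and the sample rows are hypothetical and chosen only for illustration; a real warehouse would use its own schema and tooling.

```python
import sqlite3

# Hypothetical table, view and column names, used only for illustration.
conn = sqlite3.connect(":memory:")

# Staging layer: raw data exactly as received, everything stored as text.
conn.execute("CREATE TABLE stg_loans (loan_id TEXT, customer TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO stg_loans VALUES (?, ?, ?)",
    [("1", " alice ", "1000.50"), ("2", "BOB", "250"), ("3", "carol", "n/a")],
)

# Integration layer: cleaned, typed data; rows with unusable amounts are skipped.
conn.execute(
    "CREATE TABLE int_loans ("
    " loan_id INTEGER PRIMARY KEY,"
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)
for loan_id, customer, amount in conn.execute("SELECT * FROM stg_loans").fetchall():
    try:
        value = float(amount)
    except ValueError:
        continue  # the bad row stays in staging for developers to inspect
    conn.execute(
        "INSERT INTO int_loans VALUES (?, ?, ?)",
        (int(loan_id), customer.strip().title(), value),
    )

# Access layer: a view that exposes only what report users need.
conn.execute(
    "CREATE VIEW v_loan_totals AS "
    "SELECT customer, SUM(amount) AS total_amount "
    "FROM int_loans GROUP BY customer"
)

report = conn.execute("SELECT * FROM v_loan_totals ORDER BY customer").fetchall()
staged_rows = conn.execute("SELECT COUNT(*) FROM stg_loans").fetchone()[0]
print(report, staged_rows)
```

The raw row with the unusable amount remains available in staging, while users only ever query the view.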
This definition of the data warehouse focuses on data storage. The
data from the main sources is cleaned, transformed, catalogued and made
available for use by managers and other business professionals for data
mining, online analytical processing, market research and decision
support (Marakas & O'Brien 2009). However, the means to retrieve and
analyze data, to extract, transform and load data, and to manage the data
dictionary are also considered essential components of a data
warehousing system, and many references to data warehousing use this
broader context. An expanded definition of data warehousing therefore
includes business intelligence tools, tools to extract, transform and load
data into the repository, and tools to manage and retrieve metadata.
2. Personal Loans
Personal loans are unsecured loans that people can use for a
variety of purposes, such as paying tax bills, covering school tuition, or
making car repairs. Many banks and other lenders offer personal loans to
people with good credit records who can demonstrate an ability to repay
them. This type of loan is often promoted as a tool for consolidating
debt for people who have multiple outstanding accounts that are
difficult to manage. By using a single loan to pay off several debts, people
can combine them into one monthly payment, and they may also
obtain a lower interest rate. Consolidating debt may also improve
one's credit rating.
There are two types of personal loans. A closed-end loan is a one-time
loan of a set amount, with a fixed rate and repayment schedule. This
type of loan often has a repayment period of one to two years, depending
on the amount borrowed, and borrowers can choose to make
additional payments to pay the loan off more quickly. For one-time
expenses, a closed-end loan can be very useful.
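As a worked illustration of a fixed rate and repayment schedule, the sketch below uses the standard annuity formula M = P * r / (1 - (1 + r)^-n), where P is the amount borrowed, r the monthly rate and n the number of monthly payments. Monthly compounding, the function names and the sample figures are assumptions made for this example; the text does not say how any particular lender computes payments.

```python
def monthly_payment(principal, annual_rate, months):
    """Fixed monthly payment under the standard annuity formula.

    Assumes monthly compounding at annual_rate / 12 (an assumption of
    this sketch, not something the text specifies).
    """
    r = annual_rate / 12
    if r == 0:
        return principal / months
    return principal * r / (1 - (1 + r) ** -months)


def months_to_repay(principal, annual_rate, payment):
    """Count the monthly payments needed to clear the balance."""
    r = annual_rate / 12
    balance = principal
    months = 0
    while balance > 0.005:  # stop once less than half a cent remains
        interest = balance * r
        if payment <= interest:
            raise ValueError("payment never covers the interest")
        balance = balance + interest - payment
        months += 1
    return months


# Hypothetical example: 10,000 borrowed at 12% per year over 12 months.
base = monthly_payment(10_000, 0.12, 12)
on_schedule = months_to_repay(10_000, 0.12, base)
with_extra = months_to_repay(10_000, 0.12, base + 100)
print(round(base, 2), on_schedule, with_extra)
```

Paying more than the scheduled amount each month shortens the repayment period, which is the effect of the additional payments mentioned above.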
3. Problem Definition
The objective is to perform Extract, Transform & Load (ETL)
operations on a set of input files containing the details of the personal
loans of a particular bank. Each input file has a specific format, such as
XML, TXT, or CSV. The first part of an ETL process involves extracting the
data from these sources and carrying out transformations on it.
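The extract and transform steps can be sketched with Python's standard csv and xml.etree.ElementTree modules. The field names (loan_id, customer, amount), the XML layout and the sample data are hypothetical, since the actual layout of the bank's input files is not given here.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample inputs; real files would be opened from disk instead.
CSV_TEXT = "loan_id,customer,amount\n1, alice ,1000.50\n2,BOB,250\n"
XML_TEXT = """<loans>
  <loan id="3"><customer>carol</customer><amount>75.25</amount></loan>
</loans>"""


def extract_csv(text):
    """Extract: read CSV rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_xml(text):
    """Extract: read <loan> elements into the same dictionary shape."""
    root = ET.fromstring(text)
    return [
        {
            "loan_id": loan.get("id"),
            "customer": loan.findtext("customer"),
            "amount": loan.findtext("amount"),
        }
        for loan in root.findall("loan")
    ]


def transform(record):
    """Transform: convert types and normalize the customer name."""
    return {
        "loan_id": int(record["loan_id"]),
        "customer": record["customer"].strip().title(),
        "amount": round(float(record["amount"]), 2),
    }


records = [transform(r) for r in extract_csv(CSV_TEXT) + extract_xml(XML_TEXT)]
print(records)
```

Mapping every source format to one common record shape before transforming keeps the transformation logic independent of where a record came from.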
The load phase loads the data into the end target, usually the data
warehouse (DW). As the load phase interacts with a database, the
constraints defined in the database schema as well as in triggers
activated upon data load apply (for example, uniqueness, referential
integrity, mandatory fields), which also contribute to the overall data
quality performance of the ETL process.
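How schema constraints act as a quality gate during the load phase can be shown with a small sketch using Python's built-in sqlite3 module. The customers and loans tables and the sample rows are hypothetical; the target warehouse and its constraints would differ in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE loans ("
    " loan_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " amount REAL NOT NULL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.commit()

# Hypothetical rows arriving from the transform step.
rows = [
    (10, 1, 500.0),   # valid
    (10, 1, 900.0),   # duplicate key: uniqueness violation
    (11, 99, 300.0),  # unknown customer: referential integrity violation
    (12, 1, None),    # missing amount: mandatory field violation
]

loaded, rejected = [], []
for row in rows:
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO loans VALUES (?, ?, ?)", row)
        loaded.append(row)
    except sqlite3.IntegrityError as exc:
        rejected.append((row, str(exc)))

print(len(loaded), "loaded;", len(rejected), "rejected")
for row, reason in rejected:
    print(row, "->", reason)
```

Collecting rejected rows with the database's error message, rather than aborting the whole load, is one possible design; it lets the rejects be reviewed and corrected separately.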
1.) Customers Table
2.) Employees Table
3.) Offices Table
4.) Orderdetails Table
5.) Orders Table
6.) Payments Table
7.) Productlines Table
8.) Products Table
Running Tables
Cubes
Problem 1:
Retrieve the product code and product name from the products table,
together with the text description of each product line from the productlines table.
Query 1:
SELECT T1.productCode, T1.productName, T2.textDescription
FROM
products T1
INNER JOIN productlines T2 ON T1.productLine = T2.productLine;
Problem 2:
Each order in the orders table must belong to a customer in the customers table.
Each customer in the customers table can have zero or more orders in the orders table.
To find all orders that belong to each customer, you can use the LEFT JOIN clause as follows:
Query 2:
SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
FROM
customers c
LEFT JOIN
orders o ON c.customerNumber = o.customerNumber;
Problem 3:
To find all customers who have not ordered any products, keep only the rows in which the LEFT JOIN found no matching order:
Query 3:
SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
FROM customers c
LEFT JOIN orders o ON c.customerNumber = o.customerNumber
WHERE o.orderNumber IS NULL;
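The behaviour of the LEFT JOIN in Problems 2 and 3 can be checked on a few rows with Python's built-in sqlite3 module. The sample customers and orders are made up for this sketch and keep only the columns the queries use; the real tables hold more columns and data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customerNumber INTEGER PRIMARY KEY, customerName TEXT);
CREATE TABLE orders (orderNumber INTEGER PRIMARY KEY,
                     customerNumber INTEGER, status TEXT);
-- Hypothetical sample rows: customer 3 has placed no orders.
INSERT INTO customers VALUES (1, 'Alpha Ltd'), (2, 'Beta Inc'), (3, 'Gamma Co');
INSERT INTO orders VALUES (100, 1, 'Shipped'), (101, 1, 'Cancelled'),
                          (102, 2, 'Shipped');
""")

# Every customer appears at least once; order columns are NULL when no order exists.
left_join = conn.execute("""
SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
FROM customers c
LEFT JOIN orders o ON c.customerNumber = o.customerNumber
ORDER BY c.customerNumber, o.orderNumber
""").fetchall()

# Filtering on the NULL order number keeps only customers without orders.
no_orders = conn.execute("""
SELECT c.customerNumber, c.customerName, o.orderNumber, o.status
FROM customers c
LEFT JOIN orders o ON c.customerNumber = o.customerNumber
WHERE o.orderNumber IS NULL
""").fetchall()

for row in left_join:
    print(row)
print("without orders:", no_orders)
```

An INNER JOIN on the same data would drop customer 3 entirely, which is why the LEFT JOIN is needed to find customers with no orders.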
Problem 4:
Count the orders in each delivery status by grouping on status:
Query 4:
SELECT
status, COUNT(*)
FROM
orders
GROUP BY status
ORDER BY status DESC;
Problem 5:
We can use the GROUP BY clause to get the order number, the number of items sold per order, and
the total sales for each order:
Query 5:
SELECT
orderNumber,
SUM(quantityOrdered) AS itemsCount,
SUM(quantityOrdered * priceEach) AS total
FROM
orderdetails
GROUP BY orderNumber;
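The per-order aggregation can be verified on a few rows with Python's built-in sqlite3 module. The sample order lines are hypothetical. The sketch also computes the sum of the unit prices alone, to show that an order's sales total needs each price multiplied by the quantity ordered.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                           quantityOrdered INTEGER, priceEach REAL);
-- Hypothetical sample order lines.
INSERT INTO orderdetails VALUES
  (100, 'A', 2, 10.0),
  (100, 'B', 1, 5.5),
  (101, 'A', 3, 10.0);
""")

rows = conn.execute("""
SELECT orderNumber,
       SUM(quantityOrdered)             AS itemsCount,
       SUM(priceEach)                   AS unitPriceSum,  -- ignores quantities
       SUM(quantityOrdered * priceEach) AS total          -- sales per order
FROM orderdetails
GROUP BY orderNumber
ORDER BY orderNumber
""").fetchall()

for order_number, items, unit_price_sum, total in rows:
    print(order_number, items, unit_price_sum, total)
```

For order 100 the unit prices add up to 15.5, while the sales total is 2 * 10.0 + 1 * 5.5 = 25.5.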