
Term Paper

on
Data Mining for Business Intelligence
&
Data Warehousing

Submitted to –
Md. Fahami Ahsan Mazmum
Assistant Professor
Department of Management Information Systems
Faculty of Business Studies

Course Code: EMIS 517


Department of Management Information Systems
University of Dhaka
Letter of Transmittal

December 14th, 2017

Md. Fahami Ahsan Mazmum

Assistant Professor

Department of Management Information Systems

Faculty of Business Studies

University of Dhaka

Subject: Submission of Term Paper

Dear Sir

With your permission and per your instructions, we are submitting the assignment
that we were given on ‘Data Mining for Business Intelligence’ and ‘Data
Warehousing’. The assignment was completed using the knowledge that we
gathered from the course “Decision Support System”.

We are thankful for all the knowledge and experience we gained while doing this
assignment. We have tried our best to complete it meaningfully and as correctly
as possible. We would be grateful if you would read the report carefully and
give us your valuable feedback.

Thanking you.

Yours obediently,
Contents
1.1 Opening Vignette: Data Mining goes to Hollywood
1.1.1 Summary
1.1.2 Questions and Answers
1.2 End of Chapter Application Case: Data Mining helps develop
Custom-Tailored Product Portfolios for Telecommunication
Companies
1.2.1 Summary
1.2.2 Questions and Answers
1.3 Opening Vignette: DirecTV Thrives with Active Data
Warehousing
1.3.1 Summary
1.3.2 Questions and Answers
1.4 End of Chapter Application Case: Continental Airlines Flies
High with its Real Time Data Warehouse
1.4.1 Summary
1.4.2 Questions and Answers
Chapter-5: Data Mining for Business Intelligence

1.1 Opening Vignette: Data Mining goes to Hollywood

1.1.1 Summary: Predicting the box-office success of a movie is a very difficult task,
which makes the movie business in Hollywood a risky endeavor. In an attempt to solve
this challenging real-world problem, Ramesh Sharda and Dursun Delen explored
the use of data mining to predict the financial performance of a movie at the box office
even before it enters the production stage. They first developed a prediction model in
which the forecasting (regression) problem was converted into a classification problem.
That is, they classified a movie by its box-office receipts into one of nine
categories (ranging from flop to blockbuster) rather than forecasting a point estimate
of its box-office receipts. The required data were collected from a variety of movie-related
databases and then consolidated into a single data set. After the data collection, the
prediction models were developed by Sharda and Delen using various data mining
methods like neural networks, decision trees, support vector machines and three types
of ensembles. The data from 1998 to 2005 were used as training data to build the
prediction models, and the data from 2006 were used as test data to assess and
compare the models’ prediction accuracy. The prediction results of all three
data mining methods, as well as the results of the three different ensembles, were then
published. The researchers claimed that these prediction results were better than any
reported in the published literature for this problem domain. Beyond the attractive
accuracy of their prediction results of the box-office receipts, these models could also
be used to further analyze (and potentially optimize) the decision variables in order to
maximize the financial return. Specifically, the parameters used for modeling could be
altered using the already trained prediction models in order to better understand the
impact of different parameters on the end results. During this process, which is
commonly referred to as sensitivity analysis, the decision maker of a given
entertainment firm could find out, with a fairly high accuracy level, how much value a
specific actor brought to the financial success of a film, making the underlying system
an invaluable decision aid. Data mining is therefore a prime
candidate for predicting and explaining the financial outlook of a movie, which most
people still regard as a form of art that cannot be forecast.
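
The conversion from a regression problem to a classification problem, and the split of the data by release year, can be sketched as follows. This is only an illustration under stated assumptions, not the researchers' own code: the bin edges (in millions of dollars), the names `BIN_EDGES`, `revenue_class`, and `split_by_year`, and the sample records are all hypothetical.

```python
from bisect import bisect_right

# Hypothetical upper bin edges in millions of dollars (assumed for illustration).
# Eight edges give nine ordered classes: 1 = flop ... 9 = blockbuster.
BIN_EDGES = [1, 10, 20, 40, 65, 100, 150, 200]


def revenue_class(receipts_millions):
    """Map a box-office figure to a class label from 1 to 9."""
    return bisect_right(BIN_EDGES, receipts_millions) + 1


def split_by_year(movies, test_year=2006):
    """Train on movies released before test_year and test on test_year itself."""
    train = [m for m in movies if m["year"] < test_year]
    test = [m for m in movies if m["year"] == test_year]
    return train, test


# Made-up sample records, not data from the study.
movies = [
    {"title": "A", "year": 1999, "receipts": 0.4},
    {"title": "B", "year": 2003, "receipts": 72.0},
    {"title": "C", "year": 2006, "receipts": 310.0},
]
for movie in movies:
    movie["label"] = revenue_class(movie["receipts"])

train, test = split_by_year(movies)
```

With class labels in place of dollar amounts, any classifier (neural network, decision tree, support vector machine) can be trained on `train` and scored on `test`.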
1.1.2 Questions and Answers:

Q1. Why should Hollywood decision makers use data mining?

Ans: Hollywood decision makers should use data mining because:

• Data mining supports and improves predictability when enough is known about the
situation to identify the predictors (independent variables) and to build the model.

• Data mining can improve the accuracy of predicting box-office receipts, which are
critical to a studio’s financial success.

• With data mining, decisions are based on a data-driven forecasting model and a
classification model rather than on hunches and wild guesses.

• Importantly, predictive models are effective in the early stages of movie production,
before huge investments have been made. Minimizing investments in
flops improves profitability.

• Data mining can predict the financial
performance of a motion picture at the box office before it even enters
production (while the movie is nothing more than a conceptual idea).

Q2. What are the top challenges for Hollywood managers? Can you think of
other industry segments that face similar problems?

Ans: Hollywood managers have to allocate their scarce resources (budgets, actors,
facilities, directors, etc.) to get the highest return on investment. Movies are capital
investments for Hollywood. Studios invest in movies for the same reason that other types
of companies (manufacturers, retailers, service, entertainment, and financial firms)
make investments: to maximize return on investment (ROI). The top challenges facing
all of those industry sectors are to identify which investments and which combinations
of investments will maximize ROI at a particular time, and which variables (predictors)
to consider in evaluating alternatives.
Q3. Do you think the researchers used all of the relevant data to build
prediction models?

Ans: No, I don’t think the researchers used all of the relevant data to build the
prediction models.

Q4. Why do you think the researchers chose to convert a regression problem
into a classification problem?

Ans: The researchers chose to convert the regression problem into a classification problem
because an organization holds huge volumes of data, and analyzing every
record one by one is a very tedious job. Classification
assigns each record to a particular class based on its attributes,
which in turn helps in building a model that can be used for predictive analysis.

Q5. How do you think these prediction models can be used? Can you think of
a good production system for such models?

Ans: The classification process assigns large numbers of records to a particular
class as accurately as possible. It relies on a training set and a test set of
data: the training set is used to build a model, and the test set is used
to validate the model, that is, to determine its accuracy. Once a model has been developed,
it can be used for predictive analysis or for forecasting behavior.
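
The validation step on the test set can be illustrated with a small accuracy calculation. This is a minimal sketch: the function name `accuracy`, the optional `tolerance` parameter (which counts a prediction as correct when it is within a given number of classes of the actual one), and the class labels below are assumptions made for illustration, not figures from the case.

```python
def accuracy(actual, predicted, tolerance=0):
    """Share of test records whose predicted class lies within
    `tolerance` classes of the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


# Made-up class labels (1 = flop ... 9 = blockbuster) for five test movies.
actual = [1, 6, 9, 4, 5]
predicted = [1, 5, 9, 4, 7]

exact = accuracy(actual, predicted)               # exact class matches only
near = accuracy(actual, predicted, tolerance=1)   # also accepts adjacent classes
```

Because the classes are ordered, reporting both figures shows how often the model is exactly right and how often it is at least close.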

Q6. Do you think the decision makers would easily adapt to such an
information system?

Ans: Making the right decisions when managing large amounts of money is critical to the
success of many companies in this marketplace, and the main aim of decision
makers in every organization is to maximize the company’s profit.
Using the right technology to make the right decisions can be a competitive
advantage over rivals. The model described in this case can help Hollywood decision makers
predict the financial performance of a motion picture at the box office
before it even enters production. It also provides a way to improve the accuracy of
predicting box-office receipts, which are critical to financial success. So I think
decision makers would adapt to such an information system gradually.
Q7. What can be done to further improve the prediction models explained in
this case?

Ans: To further improve the prediction models, the researchers should train them on
more data.
1.2 End of Chapter Application Case: Data Mining helps develop
Custom-Tailored Product Portfolios for Telecommunication
Companies

1.2.1 Summary: Argonauten360°, a consulting group, helps businesses build and
improve successful strategies for customer relationship management (CRM). It uses
Relevanz-Marketing to create value by facilitating dialogue with relevant customers, and
its clients include BMW, Allianz, Deutsche Bank, Gerling, and Coca-Cola. The group
applies advanced analytic technologies for client scoring, clustering, and lifetime-value
computations. The requirements for these tasks are demanding because each project typically
presents a new and specific set of circumstances, data scenarios, obstacles, and analytic
challenges. The problem was that the existing toolset needed to be augmented with
effective, cutting-edge, yet flexible data mining capabilities. Another critical
consideration was that the solution yield a quick return on investment: it had to be
easy to apply, with a fast learning curve, so that analysts could quickly take ownership.
The need to learn different tools for different modeling tasks had significantly hindered
the efficiency and effectiveness of the company’s consultants. After 12 months of
evaluating a wide range of data mining tools, the company chose Statistica Data Miner
(by StatSoft, Inc.) because it provided the ideal combination of features to satisfy almost
every analyst’s needs and requirements with user-friendly interfaces. The call-by-call
services business is very competitive in Europe, where the service is popular with cell
phone users as well as with regular phone users. The success of a call-by-call
telecommunications provider depends greatly on attractive per-minute calling rates. The
Argonauten360° consultants analyzed the available data with Statistica’s data mining tool
and disproved the popular wisdom that, because of the competition, there was no
elasticity, or that any elasticity that did exist was unpredictable. This analysis, which
was based on data describing minute-by-minute phone traffic, won them the business of a
leading provider of call-by-call services. Before the models derived via data mining were
applied, heuristic “expert opinions” had been used to forecast the expected volume of
minutes (of airtime) for the following 2 months. With Statistica Data Miner, the accuracy
of these prognoses improved significantly and the error rate was cut in half, a result
that provides clear proof of the efficacy and potential benefits of advanced analytic
strategies when applied to problems of this type. This is an excellent example of a
successful application of data mining technologies to help a company gain competitive
advantage in a highly competitive business environment.
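
What it means for a forecast error rate to be "cut in half" can be shown with a short calculation. The case does not say which error measure was used, so the mean absolute percentage error (MAPE) below is an assumption, and the traffic volumes and forecasts are made-up numbers chosen only to make the arithmetic easy to follow.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, returned as a fraction (0.2 = 20%)."""
    if not actual or len(actual) != len(forecast):
        raise ValueError("actual and forecast must be non-empty and equal in length")
    return sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical airtime volumes (minutes) and two competing forecasts.
actual_minutes = [100.0, 200.0, 400.0]
expert_forecast = [120.0, 160.0, 480.0]  # heuristic "expert opinion", 20% off each time
model_forecast = [110.0, 180.0, 440.0]   # data mining model, 10% off each time

expert_error = mape(actual_minutes, expert_forecast)
model_error = mape(actual_minutes, model_forecast)
reduction = 1 - model_error / expert_error  # share of the error that was removed
```

In this made-up example the model's error is half the expert error, which is the kind of improvement the case reports.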

1.2.2 Questions and Answers:

Q1. Why do you think that consulting companies are more likely to use data
mining tools and techniques? What specific value proposition do they offer?

Ans: Consulting companies use data mining tools and techniques because the results
are valuable to their clients. Consulting companies can develop data mining expertise
and invest in the hardware and software and then earn a return on those investments by
selling those services. Data mining can lead to insights that provide a competitive
advantage to their clients.

Q2. Why was it important for Argonauten360° to employ a comprehensive tool
that has all modeling capabilities?

Ans: To offer a comprehensive set of intelligence services, the company needed
a comprehensive tool; otherwise its analysts would have had to learn many different tools.
After 12 months of evaluating a wide range of data mining tools, the company chose Statistica
Data Miner because it provided the ideal combination of features to satisfy almost every
analyst’s needs and requirements with user-friendly interfaces.

Q3. What was the problem that Argonauten360° helped solve for a call-by-call
provider?

Ans: The call-by-call business is very competitive, and the success of a call-by-call
telecommunications provider depends greatly on attractive per-minute calling rates.
Rankings of those rates are widely published, and the key is to be ranked somewhere
among the five lowest-cost providers while maintaining the best possible margins.
Argonauten360° used data mining to forecast the expected volume of minutes more
accurately, cutting the forecast error rate in half.

Q4. Can you think of other problems for telecommunication companies that
are likely to be solved with data mining?

Ans: Yes, such as predicting customer churn (lost customers), predicting demand for
capacity, predicting the volume of calls for customer service based on time of day and
predicting demographic shifts.
Chapter-8: Data Warehousing

1.3 Opening Vignette: DirecTV Thrives with Active Data
Warehousing

1.3.1 Summary: DirecTV is a perfect example of how an interactive data
warehousing and business intelligence product can spread across the enterprise. It is
known for its direct television broadcast satellite service and has been a regular
contributor to the evolution of TV with its advanced HD programming and other
attractive features. Along the way, however, the company faced the challenge of
accommodating a large volume of data coming from the escalating number of daily
customer calls, as well as from changing market conditions. Although its
early data warehouse implementation addressed the company’s needs fairly
well, it was time consuming to use and strained the system. DirecTV therefore
used software solutions from Teradata and GoldenGate, which allowed the integration
of a range of data management systems and platforms. The result was a major business
benefit: DirecTV could measure its churn in real time. With
fresh data at their fingertips, call center sales personnel were able to contact a customer
who had just asked to be disconnected and make a new sales offer to retain the customer
just hours later the same day. The system has also been set up to log customer service
calls, reporting back constantly on technical problems that are reported in the field. This
allowed management to better evaluate and react to field reports. Through this method,
DirecTV was able to leverage its data assets spread throughout the enterprise to be used
by knowledge workers wherever and whenever they were needed. The key lesson here
is that a real time, enterprise-level active data warehouse combined with a strategy for
its use in decision support can result in significant benefits for an organization.
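
The same-day retention scenario described above can be sketched as a simple filter over freshly loaded call records. This is an illustration only: the event layout, the function name `retention_queue`, the four-hour window, and the sample records are all hypothetical and do not describe DirecTV's actual system.

```python
from datetime import datetime, timedelta


def retention_queue(events, now, window_hours=4):
    """Return IDs of customers who asked to disconnect within the last
    `window_hours`, oldest request first, so agents can call them back the same day."""
    cutoff = now - timedelta(hours=window_hours)
    recent = [
        e for e in events
        if e["type"] == "disconnect_request" and cutoff <= e["time"] <= now
    ]
    recent.sort(key=lambda e: e["time"])
    return [e["customer_id"] for e in recent]


# Made-up call log entries, not DirecTV data.
now = datetime(2017, 12, 14, 15, 0)
events = [
    {"customer_id": "C1", "type": "disconnect_request", "time": datetime(2017, 12, 14, 13, 30)},
    {"customer_id": "C2", "type": "service_call", "time": datetime(2017, 12, 14, 14, 0)},
    {"customer_id": "C3", "type": "disconnect_request", "time": datetime(2017, 12, 13, 9, 0)},
    {"customer_id": "C4", "type": "disconnect_request", "time": datetime(2017, 12, 14, 11, 30)},
]
queue = retention_queue(events, now)
```

A filter like this is only useful when the warehouse receives the records within minutes; with a nightly batch load, the requests would not be visible until the next day.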

1.3.2 Questions and Answers:

Q1. Why is it important for DirecTV to have an active data warehouse?

Ans: DirecTV is a leading worldwide provider of satellite-based television services.
With annual revenue of $21.5 billion, the company serves approximately 18.5 million
U.S. customers and more than 6.5 million Latin American customers. In the U.S.,
DirecTV ranked higher in customer satisfaction than cable nine years running (based
on data from the 2001–2009 American Customer Satisfaction Index). It is very
important for DirecTV to have an active data warehouse because:

A data warehouse is subject oriented (e.g., customers, patients, students, products).

It focuses on the modeling and analysis of data for decision makers, not on daily
operations or transaction processing.

It provides a simple and concise view of particular subject issues by excluding data
that are not useful in the decision support process.

It provides integrated information: consistent naming conventions, formats, and encoding
structures across multiple data sources.

It is time-variant (time series): trends and changes in the data can be studied over
various time periods.

Data remain for a long period of time without modification.

It offers summarized information and reporting facilities.

Operational updates of data do not occur in the data warehouse environment.

Q2. What were the challenges DirecTV faced on its way to having an integrated
active data warehouse?

Ans: DirecTV needed to maintain high-quality customer care for its millions of satellite
television customers to support growth and prevent customer churn in a highly
competitive market. This included improving overall customer service levels by
optimizing field technician routes for new installations and service calls.

DirecTV also wanted to reduce fraud by proactively alerting field service teams to avoid
truck rolls to new customers who are potentially fraudulent. In addition, the company
wanted to attract new customers through improved execution of targeted marketing
campaigns.

DirecTV also had to satisfy the changing needs of its customer service agents,
who handle an average of more than 600,000 customer calls per day, and to manage large
data volumes (10-plus terabytes).
Q3. Identify the major differences between a traditional data warehouse and an
active data warehouse, such as the one implemented at DirecTV.

Ans: The differences are:

Traditional data warehouse | Active data warehouse
Strategic decisions only | Strategic and tactical decisions
Results sometimes hard to measure | Results measured with operations
Daily, weekly, monthly data currency is acceptable; summaries are often appropriate | Only comprehensive detailed data available within minutes is acceptable
Highly restrictive reporting used to confirm or check existing processes and patterns; often uses predeveloped summary tables or data marts | Flexible ad hoc reporting as well as machine-assisted modeling (e.g., data mining) to discover new hypotheses and relationships
Moderate user concurrency | High number of users accessing and querying the system simultaneously
Power users, knowledge workers, internal users | Operational staff, call centers, external users

Q4. What strategic advantage can DirecTV derive from the real-time system
as opposed to a traditional information system?

Ans: In today’s business world, fast response and quick decision making are key to
success. This is where a data warehouse comes into the picture. A data warehouse is
designed to support decision making. It contains both current and historical data that
are useful for decision makers and managers. Some of the advantages of implementing
a data warehouse are:

Direct Benefits –

• Give end users freedom to carry out wide-ranging analysis in various ways
• Provide a “single version of truth”
• Provide superior and more up-to-date information
• Increase system performance
• Simplify the process of data access
• Identify market trends

Indirect Benefits –

• Increase competitive advantage
• Improve business opportunities
• Enhance the opportunity to attract new customers and retain existing customers
• Improve customer service and satisfaction
• Reduce operating costs
• Allow business process redesign and alignment with business strategy

Clearly, a data warehouse offers a competitive edge to organizations both directly and
indirectly. It is a central repository where all data are stored and can be viewed as well
as analyzed extensively.

Q5. Why do you think large organizations like DirecTV cannot afford not to
have a capable data warehouse?

Ans: Large organizations like DirecTV cannot afford not to have a capable data
warehouse because of the following challenges, which arise in data warehouse
management systems:

User expectation: As the volume of data in a warehouse increases, warehouse
management systems need to dig deeper into the warehouse for data analysis. End
users in these cases demand and expect more accurate and refined results from
processing, but warehouse systems do not always deliver them: performance
decreases as data volumes explode, and so the efficiency of the system falls.

Systems optimization: Heavy use of business intelligence tools requires
frequent maintenance and fine-tuning of the whole system in order to meet users’
expectations. A carefully designed and configured data analysis tool helps in providing
better results for effective business development decisions.

Data structuring: Proper processing of data requires structuring it in a desired format
so that further operations can be executed. As the volume of data increases, the task of
structuring the unstructured data slows down the processing capabilities of the
system and eventually makes it burdensome for the system manager to qualify the data for
analytic purposes.

Prefabricated vs. custom warehouse: The variety of warehouses available in the
market creates ambiguity about which type to choose. Whereas a custom
warehouse cuts the time needed to build the warehouse from various operational
databases from weeks to days or even hours, a prefabricated warehouse saves the time
of initial configuration and installation.

Resource balancing: In order to benefit from data warehousing, most
departments inside an organization tend to access the processing capabilities of the
warehouse. This eventually reduces the performance of the system and decreases its
efficiency as the load on the system increases. Access control and security are
techniques that can be used to maintain a balance between the utilization and
performance of warehouse systems.
1.4 End of Chapter Application Case: Continental Airlines Flies
High with its Real Time Data Warehouse

1.4.1 Summary: Continental Airlines went from worst to best using real time data
warehousing and business intelligence. As business intelligence (BI) becomes a critical
component of daily operations, real-time data warehouses that provide end users with
rapid updates and alerts generated from transactional systems are increasingly being
deployed. Several years ago, however, Continental was in deep financial trouble after
multiple bankruptcies. The reason behind this was the low quality of service provided to
customers. Continental’s fortunes began to change when Gordon Bethune
became CEO and initiated the Go Forward plan, which consisted of four interrelated
parts to be implemented simultaneously. In 1999, the airline chose to integrate its marketing,
IT, revenue, and operational data sources into a single, in-house enterprise data
warehouse (EDW). Before that, it had been slow and ineffective
in processing queries and launching marketing programs for its high-value customers.
Soon the warehouse was handling real-time data feeds, extracts of data from
legacy systems, and tactical queries that required
almost immediate response times. Continental identified and eliminated over $7 million
in fraud and reduced costs by $41 million. With a $30 million investment in hardware
and software over 6 years, Continental has reached over $500 million in increased
revenues and cost savings in marketing, fraud detection, demand forecasting and
tracking, and improved data center management. Continental is now identified as a
leader in real-time BI, based on its scalable and extensible architecture, practical
decisions on what data are captured in real time, strong relationships with end users, a
small and highly competent data warehouse staff, sensible weighing of strategic and
tactical decision support requirements, understanding of the synergies between decision
support and operations, and changed business processes that use real-time data.
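
The scale of the return reported above can be checked with simple arithmetic. The sketch below uses only the two figures quoted in the summary ($30 million invested and over $500 million in increased revenues and cost savings); because the benefit is stated as "over" $500 million, the results are lower bounds, and the variable names are illustrative.

```python
# Figures from the case summary, in millions of dollars.
investment = 30   # hardware and software over 6 years
benefit = 500     # increased revenues and cost savings (stated as "over 500")

net_gain = benefit - investment         # benefit left after recovering the investment
benefit_ratio = benefit / investment    # dollars returned per dollar invested
roi_percent = net_gain / investment * 100

print(f"Net gain: at least ${net_gain} million")
print(f"Benefit per dollar invested: at least {benefit_ratio:.1f}")
print(f"ROI: at least {roi_percent:.0f}%")
```

Even as a lower bound, the investment returned more than sixteen dollars for every dollar spent.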
1.4.2 Questions and Answers:

Q1. Describe the benefits of implementing the Continental Go Forward
Strategy.

Ans: The Go Forward Strategy consisted of four interrelated parts to be implemented
simultaneously, which together acted as a supporting tool for making the right decisions
in the organization.

Some of the benefits are:

• It helped Continental Airlines establish more actionable ways to alter its
industry status from worst to first and then from first to favorite. Technology
became increasingly critical for supporting these new initiatives.
• The strategy produced substantial strategic value.
• It reduced costs.
• It helped eliminate fraud.

Q2. Explain why it is important for an airline to use a real time data
warehouse.

Ans: A real-time data warehouse is important for an airline because:

• It provides end users with rapid updates and alerts
generated from transactional systems.
• It supports revenue management and accounting.
• It supports customer relationship management.
• It supports crew operations and payroll.
• It supports security and fraud detection.
• It supports flight operations.

Q4. Identify the major differences between the traditional data warehouse
and a real time data warehouse, as was implemented at Continental.

Ans: Some of the major differences between a traditional data warehouse (TDW) and a
real-time data warehouse (RDW) are:
• A TDW is used for strategic decisions (and
sometimes tactical ones); an RDW is used for strategic and tactical (and sometimes
operational) ones.
• The results of using a TDW can be hard to measure; the results of using an RDW
are measured by operational data.
• Acceptable TDW refresh rates range from daily to monthly; RDW data must
be up to the minute.
• TDW summaries are often appropriate; RDWs must supply detailed data.
• A small user community at upper organizational levels means a TDW supports
few concurrent users; an RDW must support many, perhaps over a thousand.
• TDWs typically use restrictive reporting to confirm or check patterns, often with
predefined summary tables; RDWs need flexible, ad hoc reporting.
• The TDW user community generally consists of power users, knowledge workers,
managers, and other internal users; RDWs are used by operational staff, call
centers, and perhaps external users.

Q5. What strategic advantage can Continental derive from the real-time
system as opposed to a traditional information system?

Ans: Strategic advantages:

• Strong relationships with end users.
• A small and highly competent data warehouse staff.
• Improvement in customer-valued performance.
• Improvement in security and fraud detection.
• Continuous updating of information.
