on
Data Mining for Business Intelligence
&
Data Warehousing
Submitted to –
Md. Fahami Ahsan Mazmum
Assistant Professor
Department of Management Information Systems
Faculty of Business Studies
University of Dhaka
Dear Sir
With your permission and as per your instructions, we are submitting the assignment
that we were assigned on ‘Data Mining for Business Intelligence’ and ‘Data
Warehousing’. We completed the assignment using the knowledge that we
gathered from the course “Decision Support System”.
We are thankful for all the knowledge and experience we gained while doing this
assignment. We have tried our level best to complete this assignment meaningfully and
as correctly as possible. We would be grateful if you would read the report carefully and
give us your valuable feedback.
Thanking you.
Yours obediently,
Contents
1.1 Opening Vignette: Data Mining goes to Hollywood
1.1.1 Summary:
1.1.2 Questions and Answers:
1.2 End of Chapter Application Case: Data Mining helps develop Custom-Tailored Product Portfolios for Telecommunication Companies
1.2.1 Summary:
1.2.2 Questions and Answers:
1.3 Opening Vignette: DirecTV Thrives with Active Data Warehousing
1.3.1 Summary:
1.3.2 Questions and Answers:
1.4 End of Chapter Application Case: Continental Airlines Flies High with its Real Time Data Warehouse
1.4.1 Summary:
1.4.2 Questions and Answers:
Chapter-5: Data Mining for Business Intelligence
1.1 Opening Vignette: Data Mining goes to Hollywood
1.1.1 Summary: Predicting the box-office success of a movie is a very difficult task,
which makes the movie business in Hollywood a risky endeavor. In an attempt to solve
this challenging real-world problem, Ramesh Sharda and Dursun Delen explored
the use of data mining to predict the financial performance of a movie at the box office
even before it enters the production stage. They first developed a prediction model in
which the forecasting (regression) problem was converted into a classification problem:
rather than forecasting a point estimate of box-office receipts, they classified each movie,
based on its box-office receipts, into one of nine categories ranging from flop to blockbuster.
The required data were collected from a variety of movie-related databases and then
consolidated into a single data set.

After the data collection, Sharda and Delen developed prediction models using various
data mining methods: neural networks, decision trees, support vector machines and three
types of ensembles. The data from 1998 to 2005 were used as training data to build the
prediction models, and the data from 2006 were used as test data to assess and
compare the models’ prediction accuracy. The prediction results of all three data mining
methods, as well as the results of the three different ensembles, were then published.
The researchers claimed that these results were better than any reported in the
published literature for this problem domain.

Beyond the attractive accuracy of their box-office predictions, these models could also
be used to further analyze (and potentially optimize) the decision variables in order to
maximize the financial return. Specifically, the parameters used for modeling could be
altered using the already trained prediction models in order to better understand the
impact of different parameters on the end results. During this process, which is
commonly referred to as sensitivity analysis, the decision maker of a given
entertainment firm could find out, with a fairly high level of accuracy, how much value a
specific actor brought to the financial success of a film, making the underlying system
an invaluable decision aid. Therefore, data mining is a prime candidate for predicting
and explaining the financial outlook of a movie, something most people still regard as
a form of art that cannot be forecast.
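The conversion described above, from a point estimate of receipts to one of nine ordered classes, can be sketched as a simple binning step. This is a minimal illustration only: the class boundaries below are hypothetical placeholders, since the summary does not state the thresholds Sharda and Delen actually used, and only the two end labels (flop and blockbuster) come from the case.

```python
from bisect import bisect_right

# Hypothetical class boundaries in millions of US dollars (illustrative only).
# A value equal to a boundary falls into the higher class.
CLASS_BOUNDARIES = [1, 10, 20, 40, 65, 100, 150, 200]

# Class 1 is the "flop" end of the scale and class 9 the "blockbuster" end.
CLASS_LABELS = {1: "flop", 9: "blockbuster"}


def receipts_to_class(receipts_millions: float) -> int:
    """Map a box-office receipt figure to one of nine ordinal classes (1-9)."""
    if receipts_millions < 0:
        raise ValueError("receipts cannot be negative")
    # bisect_right counts the boundaries <= the value (0..8); +1 gives 1..9.
    return bisect_right(CLASS_BOUNDARIES, receipts_millions) + 1


if __name__ == "__main__":
    for receipts in [0.5, 35.0, 250.0]:
        cls = receipts_to_class(receipts)
        print(receipts, cls, CLASS_LABELS.get(cls, "intermediate"))
```

Once every historical movie is labelled this way, any classifier can be trained to predict the class label instead of the exact dollar amount.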
1.1.2 Questions and Answers:
Ans: Hollywood decision makers should make use of data mining because:
Data mining supports and improves predictability when enough is known about the
situation to identify the predictors (independent variables) and to build the model.
Data mining can improve the accuracy of predicting box-office receipts, which are
critical to their financial success.
With data mining, decisions are based on data-driven forecasting and classification
models rather than on hunches and wild guesses.
Data mining lets decision makers predict the financial performance of a motion
picture at the box office before it even enters production (while the movie is
nothing more than a conceptual idea).
Q2. What are the top challenges for Hollywood managers? Can you think of
other industry segments that face similar problems?
Ans: Hollywood managers have to allocate their scarce resources (budget, actors,
facilities, directors, etc.) to get the highest return on investment. Movies are capital
investments for Hollywood. Studios invest in movies for the same reason that other types
of companies (manufacturers, retailers, and service, entertainment and financial firms)
make investments: to maximize return on investment (ROI). The top challenges facing
all of these industry sectors are to identify which investments, and which combination
of investments, will maximize ROI at a particular time, and which variables (predictors)
to consider in evaluating alternatives.
Q3. Do you think the researchers used all of the relevant data to build
prediction models?
Ans: No, I don’t think the researchers used all of the relevant data to build the
prediction models.
Q4. Why do you think the researchers chose to convert a regression problem
into a classification problem?
Q5. How do you think these prediction models can be used? Can you think of
a good production system for such models?
Ans: Classification assigns large numbers of records to predefined classes as
accurately as possible. It uses two sets of data: a training set and a test set. The
training set is used to build the model, and the test set is used to validate it, that is,
to determine its accuracy. Once a model has been developed and validated, it is used
for predictive analysis, that is, for forecasting behavior.
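The training/test procedure, and the what-if use of a trained model, can be sketched in a few lines. The split mirrors the case (films up to 2005 for training, 2006 for testing), but everything else is an assumption made for illustration: the two features, the made-up records, and the use of a 1-nearest-neighbour classifier as a stand-in for the neural networks, decision trees, support vector machines and ensembles the researchers actually used. The hypothetical `sensitivity` function varies one input of a film and records how the predicted class changes, which is the idea behind sensitivity analysis.

```python
from dataclasses import dataclass, replace
import math


@dataclass(frozen=True)
class Movie:
    year: int
    budget: float          # hypothetical feature: budget in millions of USD
    screens: float         # hypothetical feature: opening screens in thousands
    box_office_class: int  # target: 1 (flop) .. 9 (blockbuster)


def temporal_split(movies, test_year):
    """Train on films released before test_year; test on films from test_year."""
    train = [m for m in movies if m.year < test_year]
    test = [m for m in movies if m.year == test_year]
    return train, test


def predict_1nn(train, movie):
    """Stand-in classifier: copy the class of the closest training film."""
    nearest = min(
        train,
        key=lambda t: math.dist((t.budget, t.screens), (movie.budget, movie.screens)),
    )
    return nearest.box_office_class


def accuracy(train, test):
    """Share of test films whose class is predicted exactly."""
    if not test:
        raise ValueError("test set is empty")
    hits = sum(predict_1nn(train, m) == m.box_office_class for m in test)
    return hits / len(test)


def sensitivity(train, movie, field, values):
    """What-if analysis: vary one input of a film, keep the rest fixed."""
    return {v: predict_1nn(train, replace(movie, **{field: v})) for v in values}


if __name__ == "__main__":
    # Entirely made-up records, used only to exercise the code.
    movies = [
        Movie(1999, 5, 0.5, 1),
        Movie(2001, 40, 2.5, 5),
        Movie(2003, 150, 4.0, 9),
        Movie(2005, 60, 3.0, 6),
        Movie(2006, 45, 2.7, 5),
        Movie(2006, 140, 3.9, 9),
    ]
    train, test = temporal_split(movies, test_year=2006)
    print(f"train={len(train)} test={len(test)} accuracy={accuracy(train, test):.2f}")
    print(sensitivity(train, test[0], "budget", [45, 140]))
```

In a real model the features would be scaled (budget dominates the distance here) and the classifier would be far stronger; the sketch only shows the order of operations: split by year, fit on the past, score on the held-out year, then ask what-if questions of the trained model.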
Q6. Do you think the decision makers would easily adapt to such an
information system?
Ans: Making the right decisions about large amounts of money is critical to the
success of many companies in this marketplace, and the main goal of decision
makers in every organization is to maximize the company’s profit as much as possible.
Using the right kind of technology to make the right decisions can act as a competitive
advantage over competitors. The model developed for Hollywood decision makers
helps them predict the financial performance of a motion picture at the box office
before it even enters production, and it improves the accuracy of predicting
box-office receipts, which are critical to their financial success. So I think
decision makers would adapt to such an information system, though gradually.
Q7. What can be done to further improve the prediction models explained in
this case?
Ans: To further improve the prediction models, researchers should incorporate more
data.
1.2 End of Chapter Application Case: Data Mining helps develop
Custom-Tailored Product Portfolios for Telecommunication
Companies
Q1. Why do you think that consulting companies are more likely to use data
mining tools and techniques? What specific value proposition do they offer?
Ans: Consulting companies use data mining tools and techniques because the results
are valuable to their clients. Consulting companies can develop data mining expertise
and invest in the hardware and software and then earn a return on those investments by
selling those services. Data mining can lead to insights that provide a competitive
advantage to their clients.
Ans: In order to offer a comprehensive set of intelligence services, the company needed
a comprehensive tool; otherwise its analysts would have needed to learn many different
tools. After 12 months of evaluating a wide range of data mining tools, the company chose
Statistica Data Miner because it provided the ideal combination of features to satisfy
almost every analyst’s needs and requirements with user-friendly interfaces.
Q3. What was the problem that Argonauten360° helped solve for a call-by-call
provider?
Q4. Can you think of other problems for telecommunication companies that
are likely to be solved with data mining?
Ans: Yes, such as predicting customer churn (lost customers), predicting demand for
capacity, predicting the volume of calls for customer service based on time of day and
predicting demographic shifts.
Chapter-8: Data Warehousing
1.3 Opening Vignette: DirecTV Thrives with Active Data Warehousing
Subject-oriented: Focuses on the modeling and analysis of data for decision makers, not
on daily operations or transaction processing.
Provides a simple and concise view of particular subject issues by excluding data
that are not useful in the decision support process.
Time-variant (time series): Makes it possible to study trends and changes in data over
various time periods.
Nonvolatile: Operational updates of data do not occur in the data warehouse environment.
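The time-variant and nonvolatile properties can be illustrated with a toy append-only table in which every row carries a time period, loads only ever add rows, and trends are read by grouping on that period. The class name, the regions and the subscriber counts are hypothetical, and a real warehouse would of course use a database rather than a Python list.

```python
from collections import defaultdict
from typing import NamedTuple


class Snapshot(NamedTuple):
    period: str       # e.g. "2008-Q1"; the time key makes the data time-variant
    region: str
    subscribers: int


class AppendOnlyWarehouse:
    """Toy store: rows can be loaded and queried but never updated or deleted."""

    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Nonvolatile: a load only appends; history from earlier loads stays intact.
        self._rows.extend(rows)

    def trend(self, region):
        """Total subscribers per period for one region, in period order."""
        totals = defaultdict(int)
        for row in self._rows:
            if row.region == region:
                totals[row.period] += row.subscribers
        return sorted(totals.items())


if __name__ == "__main__":
    dw = AppendOnlyWarehouse()
    dw.load([Snapshot("2008-Q1", "West", 100), Snapshot("2008-Q1", "East", 80)])
    dw.load([Snapshot("2008-Q2", "West", 120)])
    print(dw.trend("West"))  # [('2008-Q1', 100), ('2008-Q2', 120)]
```

Because the rows are immutable tuples and the class exposes no update method, the only way to reflect a change is to load a new, later snapshot, which is what preserves the history needed for trend analysis.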
Q2. What were the challenges DirecTV faced on its way to having an integrated
active data warehouse?
Ans: DirecTV needed to maintain high-quality customer care for its millions of satellite
television customers to support growth and prevent customer churn in a highly
competitive market. This included improving overall customer service levels by
optimizing field technician routes for new installations and service calls.
DirecTV also wanted to reduce fraud by proactively alerting field service teams to avoid
truck rolls to new customers who are potentially fraudulent. In addition, the company
wanted to attract new customers through improved execution of targeted marketing
campaigns.
DirecTV was also challenged to satisfy the changing needs of its customer service agents,
who handle an average of more than 600,000 customer calls per day, and to manage large
data volumes (10-plus terabytes).
Q3. Identify the major difference between a traditional data warehouse and an
active data warehouse, such as the one implemented at DirecTV.
Q4. What strategic advantage can DirecTV derive from the real-time system
as opposed to a traditional information system?
Ans: In today’s business world, fast response and quick decision making are key to
success. This is where a data warehouse comes into the picture. A data warehouse is
designed to support decision making. It contains both current and historical data that
are useful for decision makers and managers. Some of the advantages of implementing
a data warehouse are:
Direct Benefits –
Give end users freedom to carry out wide-ranging analysis in various ways
Provide a “single version of the truth”
Provide superior and more up-to-date information
Increase system performance
Simplify the process of data access
Identify market trends
Indirect Benefits –
Q5. Why do you think large organizations like DirecTV cannot afford not to
have a capable data warehouse?
Ans: Large organizations like DirecTV cannot afford not to have a capable data
warehouse because of the following challenges encountered in data warehouse
management systems:
1.4 End of Chapter Application Case: Continental Airlines Flies High with its Real Time
Data Warehouse
1.4.1 Summary: Continental Airlines went from worst to best using real-time data
warehousing and business intelligence. As business intelligence (BI) becomes a critical
component of daily operations, real-time data warehouses that provide end users with
rapid updates and alerts generated from transactional systems are increasingly being
deployed. Several years ago, however, Continental was in deep financial trouble after
multiple bankruptcies. The reason was the poor quality of its service to customers.
Continental saw a ray of hope when Gordon Bethune became CEO and initiated the Go
Forward plan, which consisted of four interrelated parts to be implemented
simultaneously. In 1999, Continental chose to integrate its marketing, IT, revenue and
operational data sources into a single, in-house EDW (Enterprise Data Warehouse);
before that, processing queries and launching marketing programs for its high-value
customers had been slow and ineffective. Soon it began loading real-time data feeds
and extracts of data from legacy systems into the warehouse, and running tactical
queries against the warehouse that required almost immediate response times.
Continental identified and eliminated over $7 million in fraud and reduced costs by
$41 million. With a $30 million investment in hardware and software over 6 years,
Continental has realized over $500 million in increased revenues and cost savings in
marketing, fraud detection, demand forecasting and tracking, and improved data center
management. Continental is now identified as a leader in real-time BI, based on its
scalable and extensible architecture, practical decisions on what data are captured in
real time, strong relationships with end users, a small and highly competent data
warehouse staff, sensible weighing of strategic and tactical decision support
requirements, understanding of the synergies between decision support and operations,
and changed business processes that use real-time data.
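The figures above also allow a rough return-on-investment check. This is a back-of-the-envelope reading under two stated assumptions: the “over $500 million” in increased revenues and cost savings is treated as the whole gross benefit of the $30 million investment, and both are taken over the same six years. Because the benefit is stated as a lower bound, the resulting ratio is a lower bound as well.

```latex
\text{ROI} = \frac{\text{benefit} - \text{investment}}{\text{investment}}
\;\geq\; \frac{500 - 30}{30} \approx 15.7 \quad (\text{roughly } 1{,}570\,\%)
```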
1.4.2 Questions and Answers:
Q2. Explain why it is important for an airline to use a real-time data
warehouse.
Ans: A real-time data warehouse provides end users with rapid updates and alerts
generated from transactional systems. At an airline, such timely data supports areas
such as:
Revenue management and accounting.
Customer Relationship Management.
Crew Operations and payroll.
Security and fraud.
Flight Operations.
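The idea of “rapid updates and alerts generated from transactional systems” can be sketched as a trickle-feed loader that writes each operational event to the warehouse as it arrives and checks a rule at the same moment, instead of waiting for a nightly batch. The event fields, the 30-minute delay threshold, the flight numbers and the `notify` callback are all hypothetical, chosen only to illustrate the flight-operations use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FlightEvent:
    flight: str
    delay_minutes: int


class TrickleFeedWarehouse:
    """Toy real-time warehouse: each event is stored and checked as it arrives."""

    def __init__(self, delay_threshold: int, notify: Callable[[str], None]):
        self.delay_threshold = delay_threshold  # hypothetical alert rule
        self.notify = notify                    # hypothetical alert channel
        self.events: List[FlightEvent] = []

    def ingest(self, event: FlightEvent) -> None:
        # Store the event immediately, so queries see it without waiting for a batch...
        self.events.append(event)
        # ...and evaluate the alert rule at the same moment.
        if event.delay_minutes > self.delay_threshold:
            self.notify(f"ALERT: {event.flight} delayed {event.delay_minutes} min")


if __name__ == "__main__":
    alerts = []
    dw = TrickleFeedWarehouse(delay_threshold=30, notify=alerts.append)
    for e in [FlightEvent("FL100", 5), FlightEvent("FL200", 45)]:
        dw.ingest(e)
    print(alerts)  # ['ALERT: FL200 delayed 45 min']
```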
Q4. Identify the major differences between the traditional data warehouse
and a real time data warehouse, as was implemented at Continental.
Ans: Some of the major differences between the traditional data warehouse and the
real-time data warehouse are:
A traditional data warehouse (TDW) is used for strategic decisions (and
sometimes tactical); an RDW for strategic and tactical (sometimes
operational) ones.
The results of using a TDW can be hard to measure; results of using an RDW
are measured by operational data.
Acceptable TDW refresh rates range from daily to monthly; RDW data must
be up to the minute.
TDW summaries are often appropriate; RDWs must supply detailed data.
Small user community at upper organizational levels means a TDW supports
few concurrent users; an RDW must support many, perhaps over a thousand.
TDWs typically use restrictive reporting to confirm or check patterns, often
predefined summary tables; RDWs need flexible, ad hoc reporting.
TDW user community generally consists of power users, knowledge workers,
managers, other internal users; RDWs are used by operational staff, call
centers, perhaps external users.
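The refresh-rate difference in the list above can be made concrete by comparing what a query sees under each loading style. In this sketch the traditional warehouse only picks up events at scheduled batch loads (nightly loads are assumed), while the real-time warehouse loads each event as it occurs. The function name, the events and the times (hours since an arbitrary start) are hypothetical.

```python
def visible_events(events, query_time, batch_times=None):
    """Return the events a warehouse query can see at query_time.

    events       -- list of (event_time, payload) pairs from the operational system
    batch_times  -- scheduled load times for a traditional warehouse;
                    None models a real-time (trickle-fed) warehouse.
    """
    if batch_times is None:
        # Real-time: every event that has already happened is loaded.
        cutoff = query_time
    else:
        # Traditional: only events captured by the most recent completed batch.
        past_loads = [t for t in batch_times if t <= query_time]
        if not past_loads:
            return []
        cutoff = max(past_loads)
    return [payload for t, payload in events if t <= cutoff]


if __name__ == "__main__":
    events = [(20, "booking A"), (26, "booking B"), (38, "cancellation C")]
    # Query at hour 40; nightly loads at hours 24 and 48.
    print(visible_events(events, 40, batch_times=[24, 48]))  # ['booking A']
    print(visible_events(events, 40))  # ['booking A', 'booking B', 'cancellation C']
```

The traditional warehouse is not wrong, only stale: it still misses everything that happened after the last load, which is why a refresh rate of daily to monthly is acceptable for strategic analysis but not for operational decisions.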
Q5. What strategic advantage can Continental derive from the real-time
system as opposed to a traditional information system?