
WINTER 2013 ASSIGNMENT MC0088 - DATA WAREHOUSING & DATA MINING

1. Differentiate between Data Mining and Data Warehousing.

Ans. Data mining is the analysis of data. It is the computer-assisted process of digging through and analyzing enormous sets of data that have either been compiled by the computer or entered into it. In data mining, the computer analyzes the data and extracts meaning from it. It also looks for hidden patterns within the data and tries to predict future behavior. Data mining is mainly used to find and show relationships among the data.

The purpose of data mining, also known as knowledge discovery, is to let businesses see these behaviors, trends and relationships and factor them into their decisions. This allows businesses to make proactive, knowledge-driven decisions. The term comes from the fact that searching for relationships between data is similar to mining and searching for precious materials. Data mining tools use artificial intelligence, machine learning, statistics and database systems to find correlations in the data. These tools can help answer business questions that were traditionally too time-consuming to resolve. Data mining involves several steps: the raw analysis step, database and data management aspects, data preprocessing, model and inference considerations, interestingness metrics, complexity considerations, post-processing of discovered structures, visualization and online updating.

Data warehousing, in contrast, is a different activity, although the two are interrelated. Data warehousing is the process of compiling information or data into a data warehouse. A data warehouse is a database used to store data: a central repository in which data from various sources is kept. The warehouse is then used for reporting and data analysis, for example trend reports for senior management such as annual and quarterly comparisons. The purpose of a data warehouse is to give the user flexible access to the data.

Data warehousing generally refers to the combination of many different databases across an entire enterprise. The main difference is that data warehousing is the process of compiling and organizing data into one common database, whereas data mining is the process of extracting meaningful information from that database. Data mining is therefore usually performed after the data has been warehoused.
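The split between the two activities can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two sources, their field names (`customer`, `cust_id`, `items`, `basket`) and the helper functions are hypothetical, and counting item pairs stands in for a real mining technique.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records from two separate operational sources (assumed layouts).
store_sales = [
    {"customer": "C1", "items": ["bread", "milk"]},
    {"customer": "C2", "items": ["bread", "butter"]},
]
web_sales = [
    {"cust_id": "C1", "basket": "milk,butter,bread"},
    {"cust_id": "C3", "basket": "milk,bread"},
]

def build_warehouse(store_rows, web_rows):
    """Warehousing step: compile both sources into one common layout."""
    warehouse = []
    for row in store_rows:
        warehouse.append({"customer": row["customer"], "source": "store",
                          "items": sorted(row["items"])})
    for row in web_rows:
        warehouse.append({"customer": row["cust_id"], "source": "web",
                          "items": sorted(row["basket"].split(","))})
    return warehouse

def frequent_pairs(warehouse, min_count):
    """Mining step: find item pairs that are bought together often."""
    counts = Counter()
    for record in warehouse:
        counts.update(combinations(record["items"], 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

warehouse = build_warehouse(store_sales, web_sales)
pairs = frequent_pairs(warehouse, min_count=2)
print(pairs)
```

The first function only compiles and organizes; the second only looks for a relationship in what has been compiled.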

2. Describe the key features of a Data Warehouse

Ans. Data warehousing combines technologies, methodologies, tools and techniques, user management systems and data manipulation systems. In the dictionary sense, warehousing means gathering data from many different kinds of sources and interrelating the compiled information. Many leading organizations maintain their own separate warehouse for collecting and maintaining data.

Historical Background:
Data warehousing currently holds an important position in the market for the organizations that use it, and it occupies a large area of the present economy. The best-known figures in the founding and origination of data warehousing are Ralph Kimball and Bill Inmon, who are collectively regarded as its pioneers. Before the arrival of data warehousing there was no established concept for storing and synchronizing data according to need. Many research papers were published in 2002, the year warehousing is said to have been officially introduced, although the core concept had been developed around 1990 by Bill Inmon.

Features of Data Warehousing:

Data warehousing has several distinctive and significant properties. The major ones are as follows.

Decision making support:
Warehousing provides strong support for the entire decision-making process, because its core components cover the major plans, methodologies and techniques that will be implemented to achieve the goal. Data in a conceptualized and compiled form helps in taking quick and accurate decisions.

Subject orientation:
Another important characteristic of warehousing is that it is subject-oriented. The data is gathered from different sources, and each source has a different background and application-specific characteristics. This helps in smoothing the company's regular operations, because grounded knowledge about everything required is available with the help of warehousing.

Integration:
Another fundamental characteristic of the warehouse is the integration of data. The data is gathered from different sources and, after compilation, merged into a single database, which is dynamically and diversely applicable.

Time flexibility:
All data stored in the warehouse is identified with a specific time period, according to the need of the data.
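Tagging every stored row with its time period is what makes period comparisons, such as quarterly reports, straightforward. A minimal sketch, assuming hypothetical sales rows of the form (date, amount) and helper names chosen only for this example:

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: every row carries the date it refers to.
sales_facts = [
    (date(2013, 1, 15), 120.0),
    (date(2013, 2, 3), 80.0),
    (date(2013, 4, 20), 200.0),
    (date(2013, 11, 5), 50.0),
]

def quarter_of(day):
    """Return a period label such as '2013-Q1' for a date."""
    return f"{day.year}-Q{(day.month - 1) // 3 + 1}"

def totals_by_quarter(facts):
    """Sum the amounts of all rows that fall in the same quarter."""
    totals = defaultdict(float)
    for day, amount in facts:
        totals[quarter_of(day)] += amount
    return dict(totals)

report = totals_by_quarter(sales_facts)
print(report)
```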

Non-volatile form of data:
Before warehousing arrived, secondary storage was considered the best way to save information, but warehousing also supports the integration, cohesiveness and multidimensional application of the data. Warehousing is one of the finest ways to preserve knowledge for effective use in the future. The data stored in the warehouse remains stable and safe, which makes it more reliable.

Bulk storage:
Data can be stored in large volumes, according to the size of the warehouse. It depends on the organization what kind and amount of data it needs to store for future use.

Accurate and grounded:
Another property of the data stored in the warehouse is that it is accurate and grounded, containing all the practically possible theories and techniques. We can say that the essence of the related field is stored in the warehouse. A number of technologies are involved in preserving the data, which makes it discrete, effective and multidimensional.

Future perception:
Warehousing is said to have been officially introduced in 2002, and it is becoming more popular every day. At present many organizations, especially larger ones, have their own warehouses for preserving different types of data. Engineers and programmers are working to set up online warehousing systems that make access to data quicker and more efficient. Warehousing is one of the most effective techniques for saving large and dynamic amounts of data.

3. Differentiate between Data Integration and Transformation

Ans.

4. Differentiate between database management systems (DBMS) and data mining.

Ans. Database Management System (DBMS) is the software that manages data on physical storage devices.
Data Mining: Data mining is the process of discovering relationships among data in the database.

Area: DBMS
Task: Extraction of detailed and summary data
Type of result: Information
Method: Deduction (ask the question, verify the data)
Example question: Who purchased mutual funds in the last 3 years?

Area: Data mining
Task: Knowledge discovery of hidden patterns and insights
Type of result: Insight and prediction
Method: Induction (build the model, apply it to new data, get the result)
Example question: Who will buy a mutual fund in the next 6 months and why?
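The contrast between deduction and induction can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the customer table, the reference year 2013 (so "the last 3 years" starts in 2011), and the choice of a one-feature threshold rule as the "model". It is not how any particular DBMS or mining tool works.

```python
# Hypothetical customer table: (name, savings in thousands, purchase year or None).
customers = [
    ("Asha", 90, 2012),
    ("Ben", 20, None),
    ("Chen", 75, 2011),
    ("Dina", 30, None),
    ("Eli", 60, 2009),
]

def purchased_since(rows, first_year):
    """DBMS-style deduction: ask the question, read the answer from stored data."""
    return [name for name, _, year in rows
            if year is not None and year >= first_year]

def fit_threshold(rows):
    """Data-mining-style induction: learn the savings threshold that best
    separates past buyers from non-buyers (a one-feature decision stump)."""
    best_threshold, best_correct = None, -1
    for _, candidate, _ in rows:
        correct = sum((savings >= candidate) == (year is not None)
                      for _, savings, year in rows)
        if correct > best_correct:
            best_threshold, best_correct = candidate, correct
    return best_threshold

def predict_buyer(threshold, savings):
    """Apply the learned rule to a customer who is not in the table."""
    return savings >= threshold

recent_buyers = purchased_since(customers, first_year=2011)
threshold = fit_threshold(customers)
print(recent_buyers, threshold, predict_buyer(threshold, 70))
```

The query can only report what is already stored, while the learned rule can be applied to a new customer, which is the sense in which data mining yields a prediction.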

Data mining is concerned with finding hidden relationships in business data so that businesses can make predictions for future use. It is the data-driven extraction of not-so-obvious but useful information from large databases. The aim of data mining is to extract implicit, previously unknown and potentially useful (or actionable) patterns from data. Data mining draws on many up-to-date techniques, such as classification (decision trees, naive Bayes classifier, k-nearest neighbor and neural networks), clustering (k-means, hierarchical clustering and density-based clustering) and association (one-dimensional, multidimensional, multilevel and constraint-based association).

Data warehousing is defined as a process of centralized data management and retrieval. A data warehouse is a relational database system designed to support very large databases (VLDB) at a significantly higher level of performance and manageability. A data warehouse is an environment, not a product. It is an architectural construct that provides information that is hard to access or present in traditional operational data stores.
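As one concrete instance of the classification techniques named above, a k-nearest-neighbor classifier can be sketched as follows. The training points, the labels "low" and "high", and the function name `knn_predict` are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical labelled points: ((feature 1, feature 2), class label).
training = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"),
    ((9.0, 7.5), "high"),
    ((8.5, 9.0), "high"),
]

print(knn_predict(training, (2.0, 2.0)))
print(knn_predict(training, (8.0, 9.0)))
```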

5. Differentiate between K-means and Hierarchical clustering

Ans. Hierarchical clustering is the sort that you might apply when there is a "tree" structure to the data. Think of the classification of living things. At the top are all of them, which then split into plants, animals and other things such as fungi. Once you are on the animal branch, this splits into mammals, reptiles, etc., and you can keep going until you get down to individual species. At no time, once things have been split off from the rest of the data onto one of the branches, do subsets ever move to other branches. You might think about whether this is appropriate for your data. Once you have split your data into two sets this split is final, and the process only subdivides further: nothing from set one ever moves into set two.

K-means clustering does not assume a tree structure. In its pure form you might ask the computer to split these data values into three groups or four groups, but you can't guarantee that merging two groups from the four-group solution will produce the same result as the three-group solution.

If you have only two or three dimensions (or can sensibly reduce your data by factor analysis) you can plot it and see what sort of relationships you have. Are you looking for nice spherical clusters, or are long chains more suitable? You might consider that your data values were generated from multivariate normal random variables from groups with different means, and you might consider how best to identify these groups and their means. Sometimes data values fall into such clear groups that almost all clustering methods will find the same clusters. Where the boundaries are fuzzy, the solutions may be very different.

I'll end with a little parable. Suppose I have a very willing idiot working for me, and I ask him to arrange my books nicely. He might do this by author or by subject, or by the colour of the cover, or the size of the book, or by weight, or by date of publication. If I simply ask for a "nice arrangement" I ought not to complain about any of these, and I might find one or more useful. If you just ask SPSS to use cluster analysis to produce a "nice arrangement" then, according to the method chosen, the order of the data and a possible random element, you might get one of many rather different nice arrangements, and the "best" of these depends on what you want the clustering for.
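The nesting property can be seen in a small Python sketch on one-dimensional data. The assumptions are mine: the data values and starting centers are invented, the hierarchical method shown is bottom-up single linkage (one of several variants), and k-means is plain Lloyd iteration whose result depends on the starting centers.

```python
def kmeans_1d(values, centers, max_iterations=100):
    """Lloyd's algorithm on 1-D data, starting from the given centers."""
    centers = list(centers)
    groups = []
    for _ in range(max_iterations):
        new_groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            new_groups[nearest].append(v)
        if new_groups == groups:
            break  # assignments stopped changing
        groups = new_groups
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return [sorted(g) for g in groups]

def agglomerative_1d(values, k):
    """Single-linkage hierarchical clustering: repeatedly merge the two
    closest clusters until k remain. A merge is never undone."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > k:
        # In sorted 1-D data the closest pair of clusters is always adjacent.
        gaps = [clusters[i + 1][0] - clusters[i][-1]
                for i in range(len(clusters) - 1)]
        i = gaps.index(min(gaps))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters

values = [1, 2, 3, 10, 11, 12, 20, 21]
print(agglomerative_1d(values, 3))
print(agglomerative_1d(values, 2))
print(kmeans_1d(values, centers=[1, 10, 20]))
print(kmeans_1d(values, centers=[1, 21]))
```

With these values the hierarchical two-cluster result is the three-cluster result with two clusters merged. The k-means run started from centers 1 and 21 instead settles on a two-group split that cuts through the middle group of the three-group solution, so it is not a merge of that solution. Other starting centers may give a different answer.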

6. Differentiate between Web content mining and Web usage mining.


Ans.

Web content mining:
Web content mining targets knowledge discovery in which the main objects are the traditional collections of multimedia documents, such as images, video and audio, that are embedded in or linked to web pages. It differs from data mining because Web data are mainly semi-structured or unstructured, while data mining deals primarily with structured data. Web content mining can be approached from two points of view: the agent-based approach and the database approach. The first aims at improving information finding and filtering. The second aims at modeling the data on the Web in a more structured form, so that standard database querying mechanisms and data mining applications can be used to analyze it.

Web usage mining:
Web usage mining focuses on techniques that can predict the behavior of users while they interact with the WWW. It discovers user navigation patterns from web data: it tries to extract useful information from the secondary data derived from users' interactions while surfing the Web. Web usage mining collects data from Web log records to discover user access patterns of web pages. Several research projects and commercial tools analyze those patterns for different purposes. The resulting knowledge can be used in personalization, system improvement, site modification, business intelligence and usage characterization. The only information left behind by many users visiting a Web site is the path through the pages they have accessed. Most Web information retrieval tools use only the textual information and ignore the link information, which could be very valuable. In general, four kinds of data mining techniques are applied in the web mining domain to discover user navigation patterns: association rule mining, sequential pattern mining, clustering and classification.
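The idea of recovering navigation paths from log records can be sketched in Python. The log format used here ("visitor, seconds, page") is a simplified, hypothetical one and not a real server log format; counting direct page-to-page transitions is only a very basic stand-in for sequential pattern mining.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified log lines: "<visitor> <seconds since start> <page>".
log_lines = [
    "10.0.0.1 0 /home",
    "10.0.0.2 5 /home",
    "10.0.0.1 12 /products",
    "10.0.0.2 20 /products",
    "10.0.0.1 30 /cart",
    "10.0.0.2 41 /about",
    "10.0.0.3 50 /home",
    "10.0.0.3 65 /products",
]

def paths_by_visitor(lines):
    """Rebuild each visitor's click path, ordered by time."""
    hits = defaultdict(list)
    for line in lines:
        visitor, seconds, page = line.split()
        hits[visitor].append((int(seconds), page))
    return {v: [page for _, page in sorted(h)] for v, h in hits.items()}

def transition_counts(paths):
    """Count how often one page is followed directly by another."""
    counts = Counter()
    for path in paths.values():
        counts.update(zip(path, path[1:]))
    return counts

paths = paths_by_visitor(log_lines)
transitions = transition_counts(paths)
print(transitions.most_common(1))
```

In this made-up log the most frequent transition is from /home to /products, the kind of access pattern that could feed site modification or personalization.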
