MOHSIN ALI
EP-1651023
MCS FINAL (EVE)
DATA MINING
Data mining is a process of knowledge discovery in large and complex data sets; it refers to
extracting, or “mining”, knowledge from large amounts of data.
DATA SET
It is a collection of data, usually presented in tabular form. Each column represents a
particular variable, and each row corresponds to one record of the data set.
PRE-PROCESSING
Data mining requires substantial pre-processing of the data; this was especially the case for
the behavioural data. To make the data comparable, all of it needs to be normalized.
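As a minimal sketch of the normalization step, the function below rescales every column of a small table to the range [0, 1] using min-max scaling. The text does not say which normalization was used, so the choice of min-max scaling (rather than, say, z-score standardization), the function name `min_max_normalize`, and the sample data are assumptions for illustration.

```python
def min_max_normalize(rows):
    """Rescale every column of a list-of-rows table to the range [0, 1].

    Min-max scaling is an assumed choice here; z-score standardization
    is another common option.
    """
    columns = list(zip(*rows))  # transpose: one tuple per column (variable)
    scaled_columns = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # A constant column carries no spread; map it to 0.0.
            scaled_columns.append([0.0] * len(col))
        else:
            scaled_columns.append([(x - lo) / span for x in col])
    return [list(row) for row in zip(*scaled_columns)]  # transpose back to rows


# Hypothetical data set: each row is a record, each column a variable
# (age in years, monthly site visits).
data = [
    [25, 10],
    [40, 30],
    [55, 20],
]
print(min_max_normalize(data))  # [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
```

After scaling, an age difference and a visit-count difference are expressed on the same 0 to 1 scale, which is what makes the columns comparable.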
GENERAL RESULTS
This activity concerns the overall assessment of the effort, in order to find out whether any
important issues might have been overlooked.
DECISION TREES
Decision trees are powerful and popular tools for classification and prediction. Their
attractiveness is due to the fact that decision trees represent rules.
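To illustrate why a decision tree can be read as a set of rules, the sketch below hand-builds a small tree and turns every root-to-leaf path into an IF-THEN rule. The "play outside?" example, its attribute names, and the nested-dict representation are hypothetical; the sketch does not show how a tree is learned from data (algorithms such as ID3 or CART do that).

```python
# A node is either a leaf label (str) or a dict:
# {"attribute": name, "branches": {value: subtree, ...}}
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {
            "attribute": "humidity",
            "branches": {"high": "no", "normal": "yes"},
        },
        "overcast": "yes",
        "rain": {
            "attribute": "windy",
            "branches": {"true": "no", "false": "yes"},
        },
    },
}


def extract_rules(node, conditions=()):
    """Walk every root-to-leaf path and return one IF-THEN rule per leaf."""
    if isinstance(node, str):  # leaf: the path so far is the rule's condition
        condition = " AND ".join(f"{a} = {v}" for a, v in conditions) or "TRUE"
        return [f"IF {condition} THEN play = {node}"]
    rules = []
    for value, subtree in node["branches"].items():
        rules.extend(extract_rules(subtree, conditions + ((node["attribute"], value),)))
    return rules


def classify(node, record):
    """Follow the branch matching the record's attribute value until a leaf."""
    while not isinstance(node, str):
        node = node["branches"][record[node["attribute"]]]
    return node


for rule in extract_rules(tree):
    print(rule)
print(classify(tree, {"outlook": "rain", "windy": "false"}))  # yes
```

Each of the five leaves yields exactly one rule, and classifying a record is the same as finding the one rule whose conditions it satisfies.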
ASSOCIATION RULES
Association rules describe events that tend to occur together. They are formal statements of
the form X => Y, meaning that if X happens, Y is likely to happen (Márquez et al., 2008).
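A rule X => Y is usually judged by its support (how often X and Y occur together) and its confidence (how often Y occurs when X does). These standard measures are not defined in the text above; the sketch below computes them for a single rule over a hypothetical set of shopping baskets. Searching for all rules above given thresholds (for example with the Apriori algorithm) is beyond this sketch.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Estimated P(Y | X): support of X together with Y, divided by support of X."""
    sx = support(x, transactions)
    if sx == 0:
        return 0.0
    return support(set(x) | set(y), transactions) / sx


# Hypothetical market-basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
# Rule {bread} => {milk}
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.67
```

Here bread appears in three of the four baskets and bread with milk in two, so the rule holds in two out of three cases where its left-hand side occurs.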
CLASSIFICATION
It is the task of generalizing known structure so that it can be applied to new data. For
example, an e-mail program may classify an incoming message as legitimate or as spam.
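A common way to implement the spam example is a naive Bayes classifier, which learns word frequencies from labelled messages and assigns a new message to the class with the higher posterior probability. The text does not name a method, so the choice of multinomial naive Bayes, the whitespace tokenizer, and the tiny training set are all assumptions; real filters use far more data and better tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    """Lower-case whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Two-class multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        """Record one labelled message."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def predict(self, text):
        """Return the label with the highest log posterior score."""
        vocab = set().union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.LABELS:
            if self.doc_counts[label] == 0:
                continue  # no training data for this class
            # Work in log space so many small probabilities do not underflow.
            score = math.log(self.doc_counts[label] / total_docs)
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                if word not in vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (n_words + len(vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training messages.
spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win money now", "spam")
spam_filter.train("cheap money offer", "spam")
spam_filter.train("meeting at noon", "ham")
spam_filter.train("project report attached", "ham")

print(spam_filter.predict("win cheap money"))        # spam
print(spam_filter.predict("project meeting notes"))  # ham
```

The "known structure" here is the per-class word frequencies learned from the labelled messages; generalizing it to a new message means scoring that message against each class.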
REGRESSION
It tries to find a function that models the data with the least error.
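For the simplest case, fitting a straight line, "least error" is usually taken to mean the least sum of squared errors, and the best line then has a closed-form solution. The sketch below assumes a linear model y = a + b*x and the squared-error criterion; the text itself does not fix either choice, and the sample points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x.

    Returns (a, b) minimizing the sum of squared errors sum((y - (a + b*x))**2).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = sxy / sxx          # slope
    a = mean_y - b * mean_x  # intercept: the line passes through the means
    return a, b


# Hypothetical noisy measurements that lie roughly on y = 1 + 2x.
print(fit_line([1, 2, 3, 4], [2.8, 5.3, 6.9, 9.0]))  # roughly (0.95, 2.02)
```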
WEB MINING
The application of data mining techniques to discover patterns from the Web (WWW) is known as
Web mining. It covers the categorical extraction and evaluation of filtered information for
knowledge discovery from sophisticated Web data and the associated Web services.
During data preparation for Web usage mining, the Web content and the Web site topology are
used as information sources, which links Web usage mining with Web content mining and Web
structure mining. Web usage mining is divided into three distinct phases:
Pre-processing
Pattern Discovery
Pattern Analysis
USES:
The first use is usage processing, which is used to complete pattern discovery. It is also the
most difficult, because only fragments of information such as IP addresses, user information,
and site clicks are available.
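One typical usage-processing step is sessionization: grouping raw clicks, often identified only by IP address, into visits. The sketch below uses a fixed 30-minute inactivity timeout, a widely used heuristic; the timeout, the (ip, timestamp, url) record format, the function name, and the sample log are assumptions. Treating one IP address as one user is a known weakness, since proxies and shared machines place several users behind one address, which is part of why this step is difficult.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold


def sessionize(clicks):
    """Group (ip, timestamp, url) clicks into per-visitor sessions.

    Clicks from the same IP belong to one session until a gap longer than
    SESSION_TIMEOUT occurs; then a new session starts.
    """
    sessions = []
    last_seen = {}  # ip -> timestamp of that visitor's previous click
    current = {}    # ip -> list of URLs in the visitor's open session
    for ip, ts, url in sorted(clicks, key=lambda c: c[1]):
        if ip in current and ts - last_seen[ip] <= SESSION_TIMEOUT:
            current[ip].append(url)
        else:
            if ip in current:
                sessions.append((ip, current[ip]))  # close the old session
            current[ip] = [url]
        last_seen[ip] = ts
    sessions.extend(current.items())  # close every session still open
    return sessions


# Hypothetical web-server log entries.
clicks = [
    ("10.0.0.1", datetime(2024, 1, 1, 9, 0), "/index"),
    ("10.0.0.1", datetime(2024, 1, 1, 9, 5), "/products"),
    ("10.0.0.2", datetime(2024, 1, 1, 9, 7), "/index"),
    ("10.0.0.1", datetime(2024, 1, 1, 10, 0), "/index"),
]
for session in sessionize(clicks):
    print(session)
```

The 55-minute gap before the last click splits visitor 10.0.0.1 into two sessions, giving three sessions in total.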
The second use is content processing, which converts Web information such as text, images,
scripts and other content into useful forms.
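As one example of content processing, the sketch below converts an HTML page into a term-frequency vector (a "bag of words"), a form that text-mining algorithms can work with. It uses only Python's standard `html.parser` module; dropping `<script>` and `<style>` contents, the simple punctuation stripping, and the sample page are assumptions. Images and other media would need separate processing that is not shown.

```python
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def page_to_term_counts(html):
    """Convert raw HTML into a bag-of-words term-frequency Counter."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    words = " ".join(parser.chunks).lower().split()
    words = [w.strip(".,;:!?\"'()") for w in words]  # strip surrounding punctuation
    return Counter(w for w in words if w)


html = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<h1>Data Mining</h1><p>Mining the web, mining data.</p>"
    "<script>var x = 1;</script></body></html>"
)
print(page_to_term_counts(html).most_common(2))  # [('mining', 3), ('data', 2)]
```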
The third use is structure processing, which analyzes the structure of each page contained in a
Web site.
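Structure processing often starts by extracting the hyperlinks of each page and assembling them into a site graph, whose in-link counts hint at which pages are most central. The sketch below assumes the pages are already downloaded into a dict keyed by path and that internal links are written as the same site-relative paths; the sample pages are hypothetical, and a real crawler would also resolve relative URLs (for example with `urllib.parse.urljoin`).

```python
from collections import defaultdict
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def site_link_structure(pages):
    """Build out-links and in-link counts for a dict {path: html}.

    Only links that point to other pages of the same site (keys of `pages`)
    are kept; external links and self-links are ignored.
    """
    out_links = {}
    in_degree = defaultdict(int)
    for url, html in pages.items():
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()
        targets = sorted({link for link in parser.links if link in pages and link != url})
        out_links[url] = targets
        for target in targets:
            in_degree[target] += 1
    return out_links, dict(in_degree)


# Hypothetical three-page site.
pages = {
    "/index": '<a href="/products">Products</a> <a href="/about">About</a>',
    "/products": '<a href="/index">Home</a> <a href="https://example.com">Ext</a>',
    "/about": '<a href="/index">Home</a>',
}
out_links, in_degree = site_link_structure(pages)
print(out_links)
print(in_degree)
```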
TARGETED ADVERTISING
Ads are a major source of revenue for web portals, web sites and e-commerce sites.
Internet advertising is probably the “hottest” web mining application today.
FRAUD
Maintain a signature for each user based on the user's buying patterns on the web. If the
buying pattern changes significantly, signal possible fraud.
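A minimal sketch of the signature idea: summarize each user's past purchase amounts by a running mean and standard deviation, and flag a new purchase whose z-score exceeds a threshold. Using only the purchase amount, the threshold of 3 standard deviations, the five-purchase warm-up, and the names `BuyingSignature` and `check_purchase` are all assumptions; a production system would track richer features (merchant, location, time of day) and tune thresholds on labelled data.

```python
import math


class BuyingSignature:
    """Running mean and variance of one user's purchase amounts (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the mean

    def update(self, amount):
        self.n += 1
        delta = amount - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (amount - self.mean)

    def std(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def check_purchase(signatures, user, amount, threshold=3.0, min_history=5):
    """Return True if `amount` deviates strongly from the user's signature.

    The purchase is folded into the signature only when it is not flagged,
    so a fraudulent burst does not immediately become 'normal'.
    """
    sig = signatures.setdefault(user, BuyingSignature())
    if sig.n >= min_history and sig.std() > 0:
        z = abs(amount - sig.mean) / sig.std()
        if z > threshold:
            return True
    sig.update(amount)
    return False


signatures = {}
for amount in [20, 25, 22, 30, 24]:  # hypothetical purchase history
    check_purchase(signatures, "alice", amount)
print(check_purchase(signatures, "alice", 500))  # True
print(check_purchase(signatures, "alice", 26))   # False
```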
PERFORMANCE MANAGEMENT
Annual bandwidth demand is increasing tenfold on average, while annual bandwidth supply is
rising only by a factor of three.
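Taking these figures at face value, the gap compounds: demand divided by supply grows by a factor of 10/3, about 3.3, each year, so a network that is exactly provisioned today would face roughly 11 times more demand than capacity after two years. The short calculation below assumes both growth rates stay constant and that demand and supply start out equal.

```python
DEMAND_GROWTH = 10.0  # yearly multiplier for bandwidth demand (figure from the text)
SUPPLY_GROWTH = 3.0   # yearly multiplier for bandwidth supply (figure from the text)


def demand_supply_ratio(years, start_ratio=1.0):
    """Ratio of demand to supply after `years`, starting from `start_ratio`."""
    return start_ratio * (DEMAND_GROWTH / SUPPLY_GROWTH) ** years


for year in range(4):
    print(year, round(demand_supply_ratio(year), 1))
# 0 1.0
# 1 3.3
# 2 11.1
# 3 37.0
```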
FAULT MANAGEMENT
Analyze alarm and traffic data to carry out root cause analysis of faults.
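Root cause analysis can be approached in many ways; one very simple heuristic is to group alarms that arrive close together in time into a burst and to treat the element that alarmed first as the likely root cause, since a failing device often triggers alarms in the devices that depend on it. The sketch below implements only that heuristic; the 60-second gap, the (time, element) alarm format, and the element names are assumptions, and real systems also draw on network topology and traffic data.

```python
from collections import Counter

BURST_GAP = 60  # seconds; assumed maximum gap between alarms of the same incident


def candidate_root_causes(alarms, gap=BURST_GAP):
    """Group (time_s, element) alarms into bursts and count burst initiators.

    An alarm that arrives within `gap` seconds of the previous alarm joins
    the same burst. The first alarm of each burst is counted as the
    candidate root cause of that burst.
    """
    counts = Counter()
    last_time = None
    for time_s, element in sorted(alarms):
        if last_time is None or time_s - last_time > gap:
            counts[element] += 1  # this alarm starts a new burst
        last_time = time_s
    return counts.most_common()


# Hypothetical alarm log: (seconds since midnight, network element).
alarms = [
    (0, "router-A"), (5, "switch-B"), (8, "server-C"),
    (600, "router-A"), (610, "server-C"),
    (1500, "switch-B"),
]
print(candidate_root_causes(alarms))  # [('router-A', 2), ('switch-B', 1)]
```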