A Study of Application of Web Mining For E-Commerce: Tools and Methodology

SaiMing Au
Department of Information System,
City University of Hong Kong,
Tak Chee Avenue,
Hong Kong
International Journal of The Computer, The Internet and Management, Vol. 10, No.3, 2002, p 1 - 14

Abstract

Internet commerce, or e-commerce, brings together consumers and merchants all over the world in a virtual marketplace where customization, direct marketing, market segmentation and customer relationship management can take place. In this new marketplace, most marketers find the behavior of their customers difficult to understand. Web site mining enables a better understanding of the customers, the discovery of meaningful business correlations and trends, and the provision of better sales and marketing services over the Web. It is an active research area, and its tools and methodology are still evolving. This study reviews its applications, tools and methodology to form a knowledge base for future research in the area. There are many successful commercial application cases and tools available. The web mining methodology is more involved; in its generic form, it comprises data pre-processing, domain knowledge elicitation, methodology identification, pattern discovery and knowledge post-processing.

Key words: e-commerce, web mining, application, tools, methodology.

Introduction

Today the Web is more than a place for information exchange. It is an important marketplace for e-commerce. With the Web, every aspect of commerce, from sales pitch to final delivery, can be automated and made available 24 hours a day all over the world. Companies can use their e-commerce platform to improve sales, increase customer satisfaction or reduce cost. E-commerce changes B2B and B2C relationships, enabling new business models and strategies to develop. For instance, B2B developers can form vertical partnerships and co-branding as innovative business solutions, and B2C marketers can find new channels to sell directly to their customers.

As only an effective web site can fulfill what it is intended to achieve, marketers and web designers find it necessary to understand the effectiveness of their sites and to take appropriate action when they fall short. They want to know who their customers are and how they react to their web sites. Although they cannot meet their customers face-to-face, web surfers fortunately leave many footprints that make it possible to study customer behavior. The key is the computer log. With the enormous amount of data available on web surfers, web mining can be used to learn rules relating to the behavior of customers, turning data into valuable knowledge and untapped business opportunities.

Owing to the great impact of web mining on e-commerce, both the academic and the commercial sectors are doing a great deal of research and application work in this area. With such cross-disciplinary efforts, there is a need to summarize the current research directions and results. This study performs a comprehensive survey of the application areas and cases of web mining in e-commerce, and looks into examples of the tools and the details of the methodology.

Application of web mining in e-commerce

The web environment is ideal for interactive communication and flexible transactions between sellers and buyers. Customers can place orders anywhere at any time. More proactively, many web miners can base their offers on visitor profiles to create new products that match the results of their analysis. There are many application areas, which include the digital library, browsing enhancement, customized marketing, personalization, customer relationship management, web advertising and web site quality improvement.

Digital libraries are essentially data management and information management systems that have to interoperate on the web. Due to the large amount of data, integration of mass storage with data management will be critical. Data mining is needed to extract information from the database, and it can help users find information on the web. Commercial sites such as Questia store over 35,000 books and deliver them on-line. They use intelligent agents to match the queries of the users with their stored materials.

Browsing enhancement software can be dated back as early as 1994, when Letizia was produced as a user interface agent that assists a user browsing the World Wide Web. As the user operates a conventional Web browser such as Netscape, the agent tracks user behavior and attempts to anticipate items of interest by doing concurrent, autonomous exploration of links from the user's current position. The agent automates a browsing strategy consisting of a best-first search augmented by heuristics inferring user interest from browsing behavior. It learns user preferences and discovers Web information sources that correspond to these preferences.

Customized marketing is one key aspect of e-commerce. The Web servers function as pushers, with the document to be pushed being determined by a set of association rules mined from a sample of the access log of the Web server [1]. For instance, Perkowitz and Etzioni [2] mine the data buried in Web server logs to produce adaptive Web sites that automatically improve their organization and presentation by learning from visitor access patterns. This allows the service provider to customize and adapt the site's interface for the individual user, and to improve the site's static structure within the underlying hypertext system.
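The idea of choosing the document to push from association rules mined out of the access log can be illustrated with a minimal sketch. It is not the method of [1] or [2]: the rule form (single page implies single page), the thresholds, the function names and the sample sessions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(sessions, min_support=0.4, min_confidence=0.6):
    """Mine rules 'visitor saw page X -> push page Y' from page co-occurrence.

    sessions is a list of page lists, one per visit (hypothetical input format).
    Returns sorted tuples (seen, push, support, confidence).
    """
    n = len(sessions)
    page_count = Counter()
    pair_count = Counter()
    for pages in sessions:
        unique = sorted(set(pages))          # count each page once per session
        page_count.update(unique)
        pair_count.update(combinations(unique, 2))
    rules = []
    for (a, b), together in pair_count.items():
        support = together / n               # share of sessions containing both pages
        if support < min_support:
            continue
        for seen, push in ((a, b), (b, a)):
            confidence = together / page_count[seen]
            if confidence >= min_confidence:
                rules.append((seen, push, support, confidence))
    return sorted(rules)


def pages_to_push(rules, current_page):
    """Pages the server could push to a visitor currently viewing current_page."""
    return [push for seen, push, _, _ in rules if seen == current_page]


# Hypothetical sample of sessions taken from an access log.
sessions = [
    ["/home", "/cameras", "/tripods"],
    ["/home", "/cameras"],
    ["/cameras", "/tripods"],
    ["/home", "/books"],
    ["/cameras", "/tripods", "/bags"],
]
rules = mine_pair_rules(sessions)
print(pages_to_push(rules, "/cameras"))
```

With these sample sessions, a visitor viewing the camera page would be offered the tripod page. A production system would mine larger itemsets and would refresh the rules as the log grows.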
Web personalization tailors the Web experience according to the users' preferences. A good example of an e-commerce site using personalization is Amazon.com, in which customer profiles are stored in the database and appropriate recommendations are pushed to different customers. Most customers welcome this service, as very often they find that the recommended products really meet their needs. The technology applies collaborative filtering to recommend items liked by similar users. Users are first grouped by shared interests. A user is then recommended items that similar users have rated highly and that he has not seen before. Recent developments like the Inductive Logic Programming based INDWEB help Internet users browse the Web by learning a model of their preferences [3].

Customer relationship management adopts a total quality management approach to serve the customers. Marketing experts divide the customer relationship life cycle into three distinct steps, which cover attraction, retention and cross sales. Buchner and Mulvenna [4] suggest using adaptive web sites to attract customers, using sequential patterns to display special offers dynamically to keep a customer's interest in the site, and using customer segments for cross-selling.

Advertising accounts for the highest sales revenue in e-commerce. At present, there are several commercial services and software tools for evaluating the effectiveness of Web advertising in terms of the traffic and sales it drives. They use metrics such as click-through rates and ad banner ROI. Commercial agents such as NetZero track subscribers' traffic patterns throughout their online sessions and use the information collected to display advertisements and content that may be of interest to subscribers. Advances in the area, such as the use of the Latent Semantic Analysis (LSA) information retrieval technique by Murray and Durrell [5], are applied to targeting advertisements. They construct a vector space to represent the usage data associated with each Internet user of interest. This enables the marketer to infer the demographic attributes of the Web users.

Web mining can be helpful in developing a strategy to improve web sites. Spiliopoulou et al [6] propose a new methodology based on the discovery and comparison of the navigation patterns of customers and non-customers. The comparison leads to rules on how the site's topology should be improved. Web caching, prefetching and swapping can be applied to improve access efficiency. The problem of classifying customers can also be solved by using a clustering method based on the access pattern [7]. Using attribute-oriented induction, the sessions are generalized according to a page hierarchy, thereby organizing pages based on their contents. These generalized sessions are finally clustered using a hierarchical clustering method.

Overview of web mining tools

Web mining is more than a simple application of ordinary data mining to web data. The lack of structure and the dynamic nature of Web content add difficulty to data extraction and mining. Moreover, instead of using conventional market research data or customer databases showing demographics, researchers need to rebuild the profiles of their customers using computer logs, web content and new transaction variables. Luckily the log data are relatively easy and cheap to collect. The recent rapid development of, and growing interest in, Web mining for e-commerce is aided by technical advances in the use of scripting and CGI, which replace static web pages with dynamic content through web page generation on request and applet-like applications. This allows for better logging of the truly personalized and interactive web behaviors of the customers.

There are many computer programs (Table 1) that log the visitors and provide some statistical information. They are not web mining software, as they provide little analysis and no data mining facilities. They provide basic statistics pertaining to visitor categories (by visit frequency), referral, browsing pattern, traffic pattern, entry and leaving pattern, etc.
Table 1: Some Web log / Web site traffic analysis programs
Product | Author / Company | Feature | Function
Analog | University of Cambridge Statistical Laboratory | Measures the usage on web server. It tells which pages are most popular, which countries people are visiting from, which sites they tried to follow broken links from. | Log file analyzer
Webalizer | GNU project | Supports standard Common Log file Format server logs. In addition, several variations of the Combined Log file Format are supported, allowing statistics to be generated for referring sites and browser types as well. | Web server log analyzer
NetTracker | Sane Solution | Analyze multiple web sites, as well as proxy server and firewall log files to monitor the organizations' web surfing patterns, plus FTP log files | Web server log analyzer
Weblog | Webscripts | Relies on Datalog-like rules to represent web documents | Web content mining
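The kind of basic statistics these log analyzers report can be sketched in a few lines. The example below is only an illustration and does not reproduce any of the products in Table 1; it assumes log lines in the Common Log Format, and the regular expression, the function name and the sample lines are hypothetical.

```python
import re
from collections import Counter

# One Common Log Format line: host ident user [time] "request" status size
CLF = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def summarize(lines):
    """Count successful page requests, requests per host and broken links (404)."""
    pages, hosts, broken = Counter(), Counter(), Counter()
    for line in lines:
        m = CLF.match(line)
        if m is None:
            continue                          # skip lines that are not valid CLF
        hosts[m["host"]] += 1
        if m["status"] == "404":
            broken[m["path"]] += 1
        elif m["status"].startswith("2"):
            pages[m["path"]] += 1
    return pages, hosts, broken


# Hypothetical log excerpt.
log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" 200 2326',
    '10.0.0.1 - - [10/Oct/2000:13:57:12 -0700] "GET /catalog.html HTTP/1.0" 200 5120',
    '10.0.0.2 - - [10/Oct/2000:13:58:40 -0700] "GET /old-page.html HTTP/1.0" 404 210',
]
pages, hosts, broken = summarize(log)
print(pages.most_common(1), dict(broken))
```

Counts like these describe traffic but reveal no patterns, which is why such programs are treated here as statistics tools and not as web mining software.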
For genuine web mining software, the taxonomy of web mining identifies three types according to their main purpose, viz. web content mining, web link structure mining and web usage mining.

Web content mining is about extracting important knowledge from non-structured or less structured text files. It is useful in information retrieval for indexing documents and assisting users to locate information. Many applications have been developed to serve this or related purposes (Table 2). Content mining techniques draw heavily on work in information retrieval, databases, intelligent agents, etc. Content mining is used for web page summarization and search engine result summarization, discovering information and extracting knowledge from text documents. In e-commerce applications, it is used for classifying the type of web pages that the surfer often visits before web site personalization can be done.
Table 2: Content mining programs
Product | Author / Company | Feature | Function
Intelligent Miner for Text, TextAnalyst | IBM | Implements a variety of analysis functions based on utilizing an automatically created semantic network of the investigated text. | Content mining
MetaCrawler | Selberg & Etzioni, 1995 | Provides an interface for specifying a query to several engines in parallel | Search agent
WebWatcher | Armstrong et al, 1995 | An agent helping the users to locate the desired information, users input required | Personal agent
Letizia | Lieberman | Uses the idle processing time available when the user is reading a document to explore links from the current position | Behavior-based interface agents, Personal agent
SiteHelper | Ngu and Wu, 1997 | Use log data to identify the pages viewed by a given user in previous visits to the sites | Page recommendation
LexiBot | Bright Planet | Search agent capable of identifying, retrieving, classifying and organizing "surface" and "deep" Web content. | Search agent
Webdoggie | MIT | Collaborative approach to suggest new WWW documents to the user based on WWW documents in which the user has expressed an interest in the past | Information filtering agent
The second form of web mining, web structure mining, establishes structures amongst many web pages. It identifies authoritative web pages or hubs to improve the overall structure of a series of web pages. Essentially, the Web is a body of hypertext of approximately 300 million pages that continues to grow at roughly a million pages per day. The set of web pages lacks a unifying structure and shows far more authoring style and content variation than that seen in traditional text-document collections. This level of complexity makes an "off-the-shelf" database-management and information-retrieval solution impossible and calls for mining the link structure of the web pages. One way of structure mining takes advantage of the collective judgment of web page quality expressed in the form of hyperlinks. The most frequently visited paths in a Web site, which path analysis can determine, are used as an objective assessment of the quality of web sites as perceived by the customers. This principle is used by popular programs like PageRank and CLEVER (see Table 3). Another popular program for structure mining is WebViz by Pitkow et al [8]. It is a system for visualizing WWW access patterns. It allows the analyst to selectively analyze the portion of the Web that is of interest by filtering out the irrelevant portions.
Table 3: Programs for web structure mining
Product | Author / Company | Feature | Function
PageRank | Larry Page | Using the Web link structure as an indicator of an individual page's value. In essence, it interprets a link from page A to page B as a vote | Search engine
CLEVER | IBM Almaden Research Center. | Incorporates several algorithms that make use of hyperlink structure for discovering high-quality information on the Web | Hypertext Classification, Mining Communities
WebViz | Tamara Munzner, Paul Burchard | Information hierarchy visualization. Web as a graph: nodes are documents, edges are links | 3D graphical representation of the structure of the Web
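The PageRank entry in Table 3, which reads a link from page A to page B as a vote, can be made concrete with a small power-iteration sketch. This is a textbook-style illustration, not the production algorithm of any search engine; the damping factor of 0.85, the iteration count, the handling of pages without outgoing links and the four-page link graph are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Iteratively distribute each page's rank over the pages it links to.

    links maps a page to the list of pages it links to (hypothetical format).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link carries an equal share of the page's vote.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page without links spreads its vote evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical site: C is linked from A, B and D; nothing links to D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))
```

In this small graph the page that collects the most votes ranks first and the page nobody links to ranks last, which is the sense in which the link structure acts as a collective judgment of page quality.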
The third form, mining for usage patterns, is the key to discovering marketing intelligence in e-commerce. It helps in tracking general access patterns, personalizing web links or web content, and customizing adaptive sites. It can disclose the properties of, and inter-relationships between, potential customers, users and markets, so as to improve Web performance, on-line promotion and personalization activities. There are many popular programs for usage pattern mining (see Table 4). Web Log Mining [9] uses KDD techniques to understand general access patterns and trends, to shed light on better structure and grouping of resource providers. WEBMINER [10] discovers association rules and sequential patterns automatically from server access logs. The commercial software WebAnalyst by Megaputer learns the interests of the visitors, based on their interaction with the website. User profiles are modified in real time as more information is learned. Clementine and DB2 Intelligent Miner for Data are two general-purpose data mining tools, which can be used for web usage mining with suitable data preprocessing.
Table 4: Some web usage mining programs
Product | Author / Company | Feature | Function
WebMate | Chen & Sycara, 1998 | The user profile is inferred from training examples | Proxy agent
WebLogMiner | Zaiane et al | Use data mining and OLAP on treated and transformed web access files. | Mining web server log files
SpeedTracer | IBM | Use the referrer page and the URL of the requested page as a traversal step and reconstructs the user traversal paths for session identification | Mining web server log files
Web usage miner (WUM) | Myra Spiliopoulou | To analyze the navigational behavior of users, appropriate for sequential pattern discovery in any type of log. It discovers patterns comprised of not necessarily adjacent events | Discovers navigation patterns in the form of graphs
WEBMINER | R. Cooley and J. Srivastava | A general and flexible framework for Web usage mining, the application of data mining techniques, such as the discovery of association rules and sequential patterns, to extract relationships from data collected in large Web data repositories | Restructuring a Web site, and analyzing user access patterns to dynamically present information tailored to specific groups of users
Clementine | SPSS | To browse data using interactive graphics to find important features and relationships | CRM
WebAnalyst | Megaputer | Integrates the data and text mining capabilities of analytical software directly | Profiles the website resources and dynamically identifies the most appropriate resources to serve each visitor
DB2 Intelligent Miner for Data | IBM | Provides a single framework for database mining using proven, parallel mining techniques | User database miner
Methodology of web mining

The web mining process can be divided into 4 stages: data preprocessing, domain knowledge elicitation, methodology identification and knowledge post processing. The details of these steps differ with the purpose of the web mining, and they are cross-tabulated in Table 5.
Table 5: Methodology of web mining
Stages | Content mining | Web link mining | Web usage mining
Data preprocessing | Linguistic processing: tagger, expression, terminology, semantics; feature extraction; document analysis (content-only); feature selection; feature weighting | Definition of session; reorganization of log entries supported by meta data; manipulation of date and time related fields; removal of futile entries | Selection of data source; definition of session; processing of web log content; transaction identification (content / navigation-content), e.g. page representation
Domain knowledge elicitation (feature selection) | Incorporation of linguistic, lexical and contextual techniques | Traffic analysis; mining dynamic log file | Syntactic constraint; navigation template; network topologies; concept hierarchy
Methodology identification | To build an n-dimensional web log cube; application of OLAP | Sequence association; build data cubes from Web server logs for data mining | Build data cubes from Web server logs for OLAP and data mining; research on query language
Knowledge post processing | Directory hierarchy; search result | Path and node representation | Graph and visualization; rule extraction; model validation
Given the complexity of the methodology concerned, the following discussion does not cover all the aspects but uses web usage mining to highlight the procedure involved. Web usage mining is chosen because it is a more popular research area relating to e-commerce and it is useful for understanding customer behavior. As a matter of fact, web usage mining incorporates input from both content mining and link structure mining.

There are two approaches to web usage mining, viz. an OLAP data warehousing approach and a holistic integrated approach.

Zaiane et al's WebLogMiner [9] adopts an OLAP data warehousing approach, building up a data warehouse or data mart from Internet log files and then using OLAP or data mining techniques. The most important step is to obtain a good data mart, which is summarized in Fig. 1. With so many data fields and records, this calls for sophisticated techniques for data warehousing and data mining.
[Figure: traffic data, sales data, customer data, product data, impression data and hyperlink semantic labels pass through data preprocessing and domain knowledge elicitation into a data mart for e-commerce analysis.]
Fig. 1: Data mart preparation
On the other hand, the holistic approach proposed by Buchner et al [11] is an integrated approach with emphasis on the interaction of different sub-systems and agents to extract information from various sources. They use MiDAS (Mining Internet Data for Associative Sequences) to discover marketing intelligence from Internet data. The data sources include server data (server log, error log and cookie log), marketing data and knowledge (customers, products, transactions, domain expertise), and web meta data. They emphasize the involvement of the different experts required for the different sub-systems: a web administrator, a marketing expert and a data-mining specialist. The Web Site Information Filter (WEBSIFT) by Cooley et al [10] contains an architecture elaborated on the same theme. Spiliopoulou's [12] sequence discoverer for web data, the Web Utilization Miner (WUM), is very similar. It pre-processes the data and organizes the log into sessions according to user-specified criteria. Subsequently, an aggregation service transforms the log of sequences into a tree structure, where sequences with the same prefix are merged. WUM processes this reduced-size structure and applies further heuristics to improve performance. Aggregated trees are generated from log files in order to discover user-driven navigation patterns.

The 4 stages of web usage mining, viz. data preprocessing, domain knowledge elicitation, methodology identification and knowledge post processing, are more distinctive in the holistic approach, and they are discussed in the subsequent sections.

First step: data preprocessing

Data preprocessing involves data source selection, session identification and transaction identification. Firstly, the researchers need to select data sources from web server logs, referral logs, registration files, cookie logs, transaction logs, index server logs, the query data sent to a web server, etc. The server log contains information about the click-streams that shows how a web site is navigated and used by its visitors. The transaction log relates the click-stream data of the users to actual purchases and is useful for understanding the effectiveness of marketing and merchandising efforts. In addition, the transaction log can be merged with external databases containing data from customer surveys, point-of-sale terminals, inventory databases and product-mix information.

Secondly, researchers need to define a session to overcome the difficulty of identifying individual users on the web, as many web servers only record the IP address of clients, which may be shared by more than one user. A session is defined as an episode of interaction between a Web user and the Web server, consisting of the pages the user visited and the time spent on each page. It is formed by grouping together the consecutive pages requested by the same user. The use of a properly defined session is still limited by many other factors, and the related results must be interpreted carefully. For instance, the session cannot tell the exact browsing time of the users, as the actual time spent on each page always differs from the estimate because of network traffic, server load, user reading speed, etc. Moreover, because of caching functions and proxy servers, the server log does not reflect the actual browsing history.

Thirdly, researchers need to identify transactions by extracting appropriate fields and merging them into meaningful clusters of references for each user. This process can be extended into multiple merge or divide steps in order to create transactions for a given data mining task. One way is to classify a given page as either content-only or navigation-content, based on the time spent on it [9].

Content-only transactions consist of all of the content references for a given user session. These transactions can be used to discover associations between the content pages of a site. This kind of "page typing" is delineated by Pirolli et al [13], using various page types such as index pages, personal home pages, etc. in the discovery of user patterns. Their page representation describes the content of the visited pages using a vector space model in which documents are represented as real-valued vectors. Each element corresponds to the frequency of occurrence of a particular term in the document.

Navigation-content transactions consist of a single content reference and all of the navigation references in the traversal path leading to the content reference. These transactions can be used to mine for path traversal patterns. They are represented in many different ways:

Maximal forward reference is defined by Chen et al [14] as the set of pages in the path from the first page in the log for a user up to the page before a backward reference is made. A new transaction is started when the next forward reference is made. A forward reference is defined to be a page not already in the set of pages for the current transaction. A backward reference is defined to be a page that is already contained in the set of pages for the current transaction. The WEBMINER system [11] currently has reference length, maximal forward reference and time window divide modules, and a time window merge module, to achieve the transaction definition. This can identify the most traversed paths through a Web locality.

Page Interest Estimator (PIE) argues that the History, Bookmarks and Links of the users indicate their interest [15]. In the History, a higher frequency of visits and more recent visits to a URL indicate stronger user interest in that URL. For the Bookmarks, pages that are bookmarked are of strong interest to the user. Similarly, a higher percentage of links visited from a page indicates a stronger user interest in that page. Considering all these factors together can thus form an index reflecting the interest of the users.

Web Access Graph (WAG) looks at the path traversal patterns of the customers instead of looking at the content of individual pages. A WAG is a weighted directed graph that represents a user's access behavior. Each vertex in the graph represents a web page and stores the access frequency of that page. The intensity of a vertex indicates the interest level of the corresponding page, and the thickness of an edge depicts the degree of association between two different pages. Chan estimates the user's interest in a web page by locating multi-word phrases to enrich the common bag-of-words representation for text documents [15]. The PIE then learns to predict the user's interest in any web page, and a WAG is used to summarize the web page access patterns of a user. The user profile can be utilized to analyze search results and recommend new and interesting pages.

Sequential patterns are special patterns discovered from the log, which can be constrained by a number of factors, such as support and minimum and maximum time gaps (between user sessions). These patterns are then recorded and mined. This allows any number of log files to be combined in any order determined by the analyst. Sequential patterns were first proposed by Agrawal & Srikant, using the GSP algorithm [16]. The MiDAS algorithm proposed by Baumgarten et al is an improvement over the GSP algorithm [11], using depth-dependent pattern trees.

Hypertext probabilistic grammar is used by Borges and Levene to capture user navigation behavior patterns [17]. Higher probability strings are used to reflect the users' preferred trails. They propose using entropy as an estimator of the grammar's statistical properties.

Second step: Domain knowledge specification

Effective data mining relies on further processing of the transaction log and the web page content, supported by external knowledge. In order to discover web-specific sequential patterns, domain knowledge may be incorporated with the objective of constraining the search space of the algorithm, reducing the quantity of patterns discovered and increasing the quality of the discovered patterns. From the raw data, the modeler can generate some relationships. For instance, it can report the most frequent visitors to a set of web pages by demographic classification. Sometimes business models may be useful. For example, visitors may be classified as short time visitors, active investigators or customers as appropriate, and the data will be drawn and reported accordingly. A popular group of tools uses flexible navigation templates for domain knowledge representation, as summarized by Baumgarten et al [11]:

Syntactic constraint uses threshold sextuples representing, among others, the minimum support and minimum confidence. It makes it possible to eliminate shallow navigational patterns and enables the researchers to focus on more important navigation patterns.

Navigation template can specify the pattern at the start, middle and end with constants, wildcards, and predicates restricting the permissible values of the pattern. This can prepare tuples that are specified by the researchers.

Network topologies can derive the topology from log files, based on all site-internal http referrer - URL document name links.

Concept hierarchy redefines page relationships other than the URL document name links. A typical application is the topological organization of Internet domain levels. In addition, marketing-related hierarchies, such as product categorizations or customer locations, can also be used.
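Before turning to pattern discovery, the session and maximal forward reference definitions of the first step can be made concrete with a short sketch. The forward and backward reference rule follows the definition attributed to Chen et al [14]; everything else is an assumption for illustration, including the 30-minute timeout used to split sessions, the (user, timestamp, page) record format and the sample data.

```python
def sessionize(requests, timeout=1800):
    """Group requests into sessions per user, split on long idle gaps.

    requests holds (user, timestamp_in_seconds, page) tuples; the timeout
    of 1800 seconds is an assumed heuristic, not a value from the study.
    """
    sessions = []
    latest = {}                                # user -> (last timestamp, page list)
    for user, ts, page in sorted(requests, key=lambda r: r[1]):
        previous = latest.get(user)
        if previous is None or ts - previous[0] > timeout:
            pages = []
            sessions.append((user, pages))     # start a new session
        else:
            pages = previous[1]
        pages.append(page)
        latest[user] = (ts, pages)
    return sessions


def maximal_forward_references(pages):
    """Split one traversal into maximal forward references."""
    path, result, moving_forward = [], [], False
    for page in pages:
        if page in path:
            # Backward reference: emit the path if we had been moving forward,
            # then back up to the revisited page.
            if moving_forward:
                result.append(list(path))
            path = path[: path.index(page) + 1]
            moving_forward = False
        else:
            path.append(page)                  # forward reference
            moving_forward = True
    if moving_forward and path:
        result.append(list(path))
    return result


traversal = list("ABCDCBEGHGWAOUOV")           # single-letter page names
for reference in maximal_forward_references(traversal):
    print("".join(reference))
```

Each printed line is one transaction: the traversal is cut at every point where the visitor goes back to a page already on the current path. As noted in the first step, sessions identified this way remain approximate because of shared IP addresses, caching and proxy servers.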
Classification rules (CHAID), cluster
Third step: Methodology identification analysis and neural network.
and pattern discovery
Fourth step: knowledge post
After the data are preprocessed, the data processing
will be transformed, cleansed, normalized,
integrated with some well-established Knowledge post processing is important
procedures. The processed data then can be to convey the finding to the decision makers.
used for On-Line Analytical Processing The managers like decision rules, as they are
(OLAP) or data mining. easier to understand and apply. Very often,
some other summarization and visualization
OLAP is a special category of query and reporting tools that can be used
to pull data out of a database. These tools are designed to support
complex, multi-dimensional and multi-level on-line analysis of large
volumes of data stored in data warehouses. OLAP can also be used in path
analysis to determine frequent traversal patterns or large reference
sequences from the physical layout type of graph, and so to determine the
most frequently visited paths in a Web site. The WEBMINER system [18]
proposes an SQL-like query mechanism for querying the discovered
knowledge (in the form of association rules and sequential patterns).
In addition, new algorithms such as MiDas support sequence discovery from
multidimensional data, detecting sequences across monitored attributes
such as URLs and HTTP referrers [12]. The mechanism has been incorporated
in an SQL-like query language (called MINT); together they form the key
components of the Web Utilization Analysis platform.

Data mining tools work in a way very similar to that of statistical
tools, but the user is much less active in the analysis process than when
using statistical tools. Because of its own objectives and data
representation, web mining employs a special subset of the general data
mining tools. Web mining tools can be grouped according to the desired
outcomes: classification, sequence detection, data dependency analysis
and deviation analysis. Some of the more popular tools for web mining
include the Association rule,
techniques are required to make the results more meaningful to the market
managers. In addition, the results of web mining can be processed with
XML, a standard language for formatting the responses from database
systems so that web clients can understand the results.

Concluding remarks

The potential of using a website as a data collection tool for e-commerce
is enormous, because of its interactiveness, simplicity and
unobtrusiveness. The results of the data mining would ideally be
integrated into the dynamic website to provide an automated, end-to-end
functional system for target marketing and customer relationship
management. Most web mining tools are still evolving, and the present web
mining techniques still have room for improvement before they can prevail
in e-commerce. The need for greater integration, the scalability problem
and the need for better mining tools are among the problems mentioned by
many researchers [19].

Sharpening the mining tools in several respects is important for future
development in this area:

Web usage mining must handle the integration of offline data with e-business
analytic tools, RDBMS, catalogs of products and services and other
applications.

Some new variables or logs should be sought that can be used for finding
more natural, meaningful and useful patterns.

New tools are sought that will not use up too many resources or too much
processing time during the web mining process.

There will always be a need for benchmark tests to improve the
performance of mining algorithms: once the efficiency and effectiveness
of a mining algorithm can be measured, a better tool for web data mining
can be chosen.

It is important to improve visualization, as much of the data is
unorganized and difficult for the user to understand.

Web mining is a new and rapidly developing research and application area.
With more collaborative research across different disciplines such as
databases, artificial intelligence, statistics and marketing, we will be
able to develop web mining applications that are very useful to the
e-commerce community.

References

1. Lan, B., Stephane Bressan, Beng Chin Ooi and Y.C. Tay (1999): Making
   Web Servers Pushier, International WEBKDD99 Workshop, San Diego, CA,
   USA, August 1999. Revised Papers.

2. Perkowitz, M. and O. Etzioni (2000): Towards Adaptive Web Sites:
   Conceptual Framework and Case Study, Artificial Intelligence, 118
   (2000) 245-275.

3. Jacquenet, F. and P. Brenot (1998): Learning User Preferences on the
   WEB, Research and Development in Knowledge Discovery and Data Mining,
   Second Pacific-Asia Conference, PAKDD-98, Melbourne, Australia, 1998
   Proceedings.

4. Buchner, A., Maurice Mulvenna, Sarabjot Anand and John Hughes (2000):
   An Internet-enabled Knowledge Discovery Process, (Internet resources,
   to be identified later)

5. Murray, D. and Kevan Durrell (1999): Inferring Demographic Attributes
   of Anonymous Internet Users, International WEBKDD99 Workshop, San
   Diego, CA, USA, August 1999. Revised Papers.

6. Berendt, B. and Myra Spiliopoulou (2000): Analysis of Navigation
   Behavior in Web Sites Integrating Multiple Information Systems, The
   VLDB Journal 9:56-75.

7. Fu, Y., Kanwalpreet Sandhu and Ming-Yi Shih (1999): A
   Generalization-Based Approach to Clustering of Web Usage Sessions,
   International WEBKDD99 Workshop, San Diego, CA, USA, August 1999.
   Revised Papers.

8. Pitkow, J. and Krishna K. Bharat (1994): Webviz: A tool for
   world-wide web access log analysis, First International WWW
   Conference.

9. Zaiane, O.R., M. Xin and J. Han (1998): Discovering Web Access
   Patterns and Trends by Applying OLAP and Data Mining Technology on Web
   Logs, Proc. Advances in Digital Libraries Conf. (1998) 19-29.

10. Cooley, R., Bamshad Mobasher and Jaideep Srivastava (2000): Web
    Mining: Information and Pattern Discovery on the World Wide Web,
    http://maya.cs.depaul.edu/~mobasher/webminer/survey/survey.html

11. Baumgarten, M., Alex Buchner, Sarabjot Anand, Maurice Mulvenna and
    John Hughes (1999): User-Driven Navigation Pattern Discovery from
    Internet Data, International WEBKDD99 Workshop, San Diego, CA, USA,
    August 1999. Revised Papers.

12. Spiliopoulou, M., Carsten Pohle and Lukas Faulstich (1999): Improving
    the Effectiveness of a Web Site with Web Usage Mining, International
    WEBKDD99 Workshop, San Diego, CA, USA, August 1999. Revised Papers.

13. Pirolli, P., J. Pitkow and R. Rao (1996): Silk from a sow's ear:
    Extracting usable structures from the web, Proc. of 1996 Conference
    on Human Factors in Computing Systems (CHI-96), Vancouver, British
    Columbia, Canada, 1996.

14. Chen, M.S., J.S. Park and P.S. Yu (1996): Data mining for path
    traversal patterns in a web environment, In Proceedings of the 16th
    International Conference on Distributed Computing Systems, pages
    385-392, 1996.

15. Chan, P.K. (1999): Constructing Web User Profiles: A Non-invasive
    Learning Approach, International WEBKDD99 Workshop, San Diego, CA,
    USA, August 1999. Revised Papers.

16. Agrawal, R. and R. Srikant (1995): Mining Sequential Patterns, Proc.
    Intl Conf. on Data Engineering, 3-15.

17. Borges, J. and M. Levene (1999): Data Mining of User Navigation
    Patterns, International WEBKDD99 Workshop, San Diego, CA, USA, August
    1999. Revised Papers.

18. Mobasher, B., H. Dai, T. Luo, Y. Sun and J. Zhu (1999): Integrating
    Web Usage and Content Mining for More Effective Personalization,
    International WEBKDD99 Workshop, San Diego, CA, USA, August 1999.
    Revised Papers.

19. Torrent Systems Inc. (2000): Driving e-Commerce Profitability from
    Online and Offline Data, Torrent Systems White Paper.
