
1. On what kinds of data can data mining be performed?

Or
Explain the different data repositories on which data mining tasks
can be performed.

● Database-oriented data sets and applications


● Relational database, data warehouse, transactional database
● Advanced data sets and advanced applications
● Data streams and sensor data
● Time-series data, temporal data, sequence data (incl. bio-sequences)
● Structured data, graphs, social networks and multi-linked data
● Object-relational databases
● Heterogeneous databases and legacy databases
● Spatial data and spatiotemporal data
● Multimedia database
● Text databases
● The World Wide Web

● Some database applications require efficient data structures and scalable methods for
handling complex object structures; variable-length records; semi-structured or
unstructured data; text, spatiotemporal, and multimedia data; and database schemas
with complex structures and dynamic changes.

● For this reason, advanced database systems and application-specific
database systems have been developed.

Relational Database system

● A database system consists of a collection of interrelated data, known as a database,
and a set of software programs to access and manage the data.
● A relational database is a collection of tables, each of which is assigned a unique
name. Each table consists of a set of attributes.
● An ER model is often constructed for a relational database; it represents the database
as a set of entities and their relationships.
● Relational data can be accessed by executing SQL queries.
● A relational query is transformed into a set of relational operations, such as join,
selection, and projection, and is then optimized for efficient processing.
● Relational queries:
● Show me a list of all items that were sold in the last quarter.
● Show me the total sales of the last month, grouped by branch
(using aggregate functions such as sum, avg, min, and max).
● A mining system can analyze customer data to predict the credit risk of new
customers based on their income, age, and previous credit information.
● A mining system can detect deviations, such as items whose sales are far from those
expected in comparison with the previous year.
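
The two relational queries above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `sales` table, its columns, the sample rows, and the fixed date ranges standing in for "last quarter" and "last month" are all assumptions, not part of the notes.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (item TEXT, branch TEXT, sale_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("printer", "North", "2024-03-05", 120.0),
        ("computer", "North", "2024-03-18", 900.0),
        ("printer", "South", "2024-03-20", 130.0),
        ("camera", "South", "2024-01-11", 300.0),
    ],
)

# Selection + projection: distinct items sold in a given date range
# (a fixed range stands in for "last quarter").
items_sold = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT item FROM sales "
        "WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31' "
        "ORDER BY item"
    )
]

# Aggregation: total sales for one month, grouped by branch.
monthly_totals = dict(
    conn.execute(
        "SELECT branch, SUM(amount) FROM sales "
        "WHERE sale_date BETWEEN '2024-03-01' AND '2024-03-31' "
        "GROUP BY branch"
    )
)

print(items_sold)
print(monthly_totals)
```

Queries like these only retrieve and summarize stored facts; predicting credit risk or detecting deviations needs a mining step on top of them.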

Data warehouse data

● Analysis of historical data facilitates decision making.
● A data warehouse is integrated, subject-oriented, nonvolatile, and time-referenced.
● It supports multidimensional analysis of data.
● Data are organized in a separate schema architecture.
Data mining
● Can analyze transactional, spatial, textual, and multimedia data that are difficult to
model with current multidimensional technology.
Transaction Database

● Consists of a file where each record represents a transaction.
● A transaction typically includes a unique transaction identity number and a list of the
items making up the transaction, along with other information such as the date of the
transaction, the customer ID number, the salesperson ID number, and the branch at
which the sale occurred.
● Most relational DBMSs do not support nested relational structures, whereas a
transactional database stores a list of item IDs for each transaction.
● A simple query such as "Show me all items purchased by Ashok" can be performed
on a transactional database.

Data Mining
● A regular data retrieval system is not able to answer queries such as "Which items
sold well together?"
● Market basket analysis would enable you to bundle groups of items together as a
strategy for maximizing sales.
● For example, printers are commonly purchased together with computers; you could
offer an expensive model of printer at a discount to customers buying selected
computers, in the hope of selling more of the expensive printers.
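
A minimal sketch of the "which items sold well together?" question, assuming transactions are held as plain lists of item IDs. The sample transactions, the `pair_support` helper, and the 0.5 support threshold are hypothetical, and counting item pairs is only the simplest form of market basket analysis.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each is the list of item IDs in one sale.
transactions = [
    ["computer", "printer", "paper"],
    ["computer", "printer"],
    ["computer", "mouse"],
    ["printer", "paper"],
]


def pair_support(transactions):
    """Return the fraction of transactions containing each item pair."""
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair has one canonical form.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    total = len(transactions)
    return {pair: n / total for pair, n in counts.items()}


support = pair_support(transactions)
# Keep only pairs that appear in at least half of the transactions.
frequent = {pair: s for pair, s in support.items() if s >= 0.5}
print(frequent)
```

With this sample data, the computer/printer pair meets the threshold, which is the kind of evidence that would justify the bundling strategy described above.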

Spatial and Spatiotemporal Databases

● Spatial databases contain spatially related information. Examples include
geographic (map) databases, very large-scale integration (VLSI) or computer-aided
design databases, and medical and satellite image databases.
● Spatial data may be represented in raster format, consisting of n-dimensional bit maps
or pixel maps.
● Example: a 2-D satellite image may be represented as raster data, where each pixel
registers the rainfall in a given area.
● Maps can be represented in vector format, where roads, bridges, buildings, and lakes
are represented as unions or overlays of basic geometric constructs, such as
points, polygons, and the networks formed by these components.
● Geographic databases have numerous applications, ranging from forestry and
ecology planning to providing public service information regarding the location of
telephone and electric cables, pipes, and sewage systems.
● They are also used in vehicle navigation and dispatching systems. Example: a taxi
service would store a city map with information regarding one-way streets, suggested
routes, and the locations of restaurants and hospitals, as well as the current location
of each driver.
What kind of mining can be performed on spatial data?
● Discover patterns describing the characteristics of houses located near a specified
kind of location, such as a park.
● Describe the climate of mountainous areas located at various altitudes
(heights).
● Describe the change in trend of metropolitan poverty rates based on city distances
from major highways.

● A spatial database that stores spatial objects that change with time is called a
spatiotemporal database, from which interesting information can be mined.
● For example, mining may group the trends of moving objects and identify some
strangely moving vehicles, or distinguish a bioterrorist attack from a normal outbreak
of the flu based on the geographic spread of a disease with time.

Text Database and Multimedia database

● Text databases are databases that contain word descriptions for objects. These
descriptions are usually not simple keywords but long sentences or paragraphs, such
as product specifications, error or bug reports, warning messages, and summary
reports or notes.
● What can data mining on a text database uncover?
● It can uncover general and concise descriptions of the text documents, content
associations, and the clustering behavior of text documents.
● To do this, standard data mining methods need to be integrated with information
retrieval techniques.

Multimedia database

● Multimedia databases store image, audio, and video data.
● They are used in applications such as picture content-based retrieval, voice-mail
systems, video-on-demand systems, the World Wide Web, and speech-based user
interfaces that recognize spoken commands.
● Specialized storage and search techniques need to be integrated with standard
data mining methods. Because video and audio data require real-time retrieval at a
steady and predetermined rate in order to avoid picture or sound gaps and system
buffer overflows, such data are referred to as continuous-media data.
● Approaches to multimedia data mining include the construction of multimedia data
cubes, the extraction of multiple features from multimedia data, and similarity-based
pattern matching.

Heterogeneous and legacy databases

● A heterogeneous database consists of a set of interconnected, autonomous
component databases. The components communicate in order to exchange
information and answer queries.
● Heterogeneous data are not easily integrated.
● A legacy database is a group of heterogeneous databases that combines different
kinds of data systems. The heterogeneous databases in a legacy database may be
connected by intra- or inter-computer networks.
● Information exchange across such databases is difficult because it would require
precise transformation rules from one representation to another, considering diverse
semantics.
● School example: data mining provides an interesting solution to the information
exchange problem by performing statistical data distribution and correlation analysis
and transforming the given data into higher, more generalized, conceptual levels (such
as fair, good, or excellent), from which information exchange can then more easily be
performed.
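
The school example can be illustrated with a small sketch that maps scores from two different grading scales onto shared conceptual levels. The two schools, their scales, and the 60% and 80% cut-offs are assumptions made for illustration; a real mining system would derive the levels from the statistical distribution of the data rather than from fixed thresholds.

```python
def generalize(score, max_score):
    """Map a raw score onto a shared conceptual level.

    The cut-offs (60% and 80%) are arbitrary assumptions for illustration.
    """
    ratio = score / max_score
    if ratio >= 0.8:
        return "excellent"
    if ratio >= 0.6:
        return "good"
    return "fair"


# Two hypothetical schools with different grading scales.
school_a = {"student_1": 92, "student_2": 65}    # marks out of 100
school_b = {"student_3": 3.5, "student_4": 2.0}  # grade points out of 4.0

levels = {name: generalize(s, 100) for name, s in school_a.items()}
levels.update({name: generalize(s, 4.0) for name, s in school_b.items()})
print(levels)
```

Once both schools' records are expressed as fair, good, or excellent, they can be exchanged and compared without a rule for converting marks into grade points.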

Data streams

● Many applications involve the generation and analysis of a new kind of data, where
data flow in and out of an observation platform dynamically.
● Features of data streams: huge or possibly infinite volume, dynamically changing,
flowing in and out in a fixed order, allowing only one or a small number of scans, and
demanding fast response time.
● Data streams include various kinds of scientific and engineering data, time-series
data, and data produced in power supply, network traffic, stock exchange,
telecommunications, video surveillance, and weather or environmental monitoring.
● Data streams are normally not stored in any kind of data repository, so efficient
management and analysis of stream data is a challenging task.
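
Because a stream may be scanned only once and cannot be stored in full, stream processing typically keeps a small summary of recent data. The sketch below is one possible illustration: the `monitor` function, its window size, and its alert factor are hypothetical choices, and the readings are invented.

```python
from collections import deque


def monitor(stream, window=3, factor=2.0):
    """Scan a stream once, yielding values far above the recent average.

    Only the last `window` readings are kept, so memory stays bounded
    no matter how long the stream runs.
    """
    recent = deque(maxlen=window)
    for value in stream:
        if len(recent) == window:
            mean = sum(recent) / window
            if value > factor * mean:
                yield value
        recent.append(value)


# Hypothetical readings, e.g. messages per second on a network link.
readings = iter([10, 12, 11, 50, 12, 11, 13])
alerts = list(monitor(readings))
print(alerts)
```

Each reading is examined exactly once and then discarded when it leaves the window, which matches the single-scan and fast-response constraints listed above.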

World Wide Web

● The World Wide Web and its associated information services, such as Yahoo and
Google, provide rich, worldwide, online information services, where data objects are
linked together to facilitate interactive access.
● Understanding user access patterns will not only help improve system design but also
lead to better marketing decisions.
● Capturing user access patterns in such distributed information environments is called
web usage mining.

● The data model for stream data is the continuous query model, in which predefined
queries constantly evaluate incoming streams, collect aggregate data, report the
current status of the data streams, and respond to their changes.
● Mining data streams involves the efficient discovery of general patterns and dynamic
changes within stream data.
● Example: an intrusion into a computer network may be detected based on an anomaly
in message flow, which may be discovered by clustering data streams, dynamically
constructing stream models, or comparing the current frequent patterns with those
at a certain previous time.
● On the Web, data mining can often provide help beyond what web search services
offer.
● Web page analysis based on linkages among pages can help rank web pages based
on their importance, influence, and topics.
● Automated web page clustering and classification help group and arrange web pages
in a multidimensional manner based on their content.
● Web community analysis helps identify hidden web social networks and communities
and observe their evolution.
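
As one illustration of ranking pages by their linkages, the sketch below uses a basic PageRank-style power iteration. The notes do not name a specific algorithm, so PageRank is swapped in here as a well-known example; the four-page link graph, the damping factor, and the iteration count are assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> outgoing links."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # Split this page's rank evenly among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page site.
links = {
    "home": ["about", "products"],
    "about": ["home"],
    "products": ["home", "about"],
    "orphan": ["home"],
}
ranks = pagerank(links)
print(max(ranks, key=ranks.get))
```

In this toy graph the "home" page ranks highest because every other page links to it, while "orphan" ranks lowest because nothing links to it.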

Object Relational Database

● Constructed based on the object-relational data model.
● This model extends the relational model by providing rich data types for handling
complex objects and object orientation.
● Most sophisticated database applications need to handle complex objects and
structures, so this model has become popular in industry and applications.
● Conceptually, the object-relational data model inherits the essential concepts of
object-oriented databases, where each entity is considered an object.
● Data and code relating to an object are encapsulated into a single unit. Each object is
associated with a set of variables that describe the object, a set of messages that the
object can use to communicate with other objects or with the rest of the database
system, and a set of methods, where each method holds the code to implement a
message.
● Objects that share a common set of properties can be grouped into an object class;
each object is an instance of its class.
● Example: an employee class can contain variables such as name, address, and birth
date. A subclass of this class, salesperson, inherits all the variables of its superclass.
● Data mining techniques need to be developed for handling complex object structures,
complex data types, class and subclass hierarchies, property inheritance, and
methods.
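
The employee and salesperson example can be sketched with ordinary Python classes. This shows the object-oriented concepts only, not how an object-relational DBMS stores such objects; the `age_on` method, the `commission_rate` variable, and the sample values are hypothetical additions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Employee:
    """Variables describing an employee object."""
    name: str
    address: str
    birth_date: date

    def age_on(self, day: date) -> int:
        """Method implementing a hypothetical 'get age' message."""
        had_birthday = (day.month, day.day) >= (
            self.birth_date.month,
            self.birth_date.day,
        )
        return day.year - self.birth_date.year - (0 if had_birthday else 1)


@dataclass
class Salesperson(Employee):
    """Subclass: inherits name, address, and birth_date from Employee."""
    commission_rate: float = 0.0  # hypothetical extra variable


s = Salesperson("Asha", "12 Main Street", date(1990, 6, 15), commission_rate=0.05)
print(s.name, s.age_on(date(2024, 6, 14)))
```

The `Salesperson` instance responds to `age_on` without redefining it, which is the property inheritance that mining techniques for such data must take into account.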

Temporal, sequence and Time series database

● A temporal database stores relational data that include time-related attributes. These
attributes may involve several timestamps, each having different semantics.
● Data mining techniques can be used to find the characteristics of object evolution or
the trend of changes for objects in the database. Such information is useful in decision
making and strategy planning.
● Example: mining bank data may aid in scheduling bank tellers according to the
volume of customer traffic.
● A sequence database stores sequences of ordered events, with or without a concrete
notion of time (for example, customer shopping sequences).
● A time-series database stores sequences of values or events obtained over repeated
measurements of time (for example, hourly, daily, or weekly).
● Example: data collected from the stock exchange, inventory control, and the
observation of natural phenomena (such as temperature and wind).
● As with temporal data, data mining techniques can be used to find the characteristics
of object evolution or the trend of changes for objects in the database.
● Stock exchange data can be mined to discover trends that could help you plan
investment strategies (for example, when is the best time to purchase AllElectronics
stock?). Such analysis often requires defining multiple granularities of time.
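
To illustrate what multiple granularities of time can mean in practice, the sketch below averages one price series by week and by month. The prices are invented and the `average_by` helper is hypothetical; it shows only the regrouping step, not a trend-mining method.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily closing prices, invented for illustration.
prices = [
    (date(2024, 1, 2), 100.0),
    (date(2024, 1, 3), 104.0),
    (date(2024, 1, 9), 98.0),
    (date(2024, 2, 1), 110.0),
    (date(2024, 2, 2), 114.0),
]


def average_by(prices, key):
    """Average the series after grouping each date with `key`."""
    groups = defaultdict(list)
    for day, price in prices:
        groups[key(day)].append(price)
    return {k: sum(v) / len(v) for k, v in groups.items()}


# The same series viewed at two granularities.
weekly = average_by(prices, lambda d: tuple(d.isocalendar())[:2])  # (ISO year, ISO week)
monthly = average_by(prices, lambda d: (d.year, d.month))
print(weekly)
print(monthly)
```

A dip that is visible in the weekly view can disappear in the monthly average, which is why the choice of granularity matters for this kind of analysis.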
