By Varsha Gaikwad
M-Tech (IT)
Contents
Needs
Factors affecting data quality
Faulty instruments
Human or computer errors (disguised missing data)
Errors in data transmission
Technology limitations
Duplicate tuples
Incomplete information
Timeliness
Believability
Interpretability
Data Cleaning
Data
Noisy data:
Data smoothing techniques
Binning
Regression
Outlier analysis
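As a rough illustration of binning as a smoothing technique, the sketch below sorts the values, splits them into equal-frequency bins, and replaces every value with the mean of its bin. The function name `smooth_by_bin_means`, the bin size, and the sample prices are assumptions made for this example and are not taken from the slides.

```python
def smooth_by_bin_means(values, bin_size):
    """Smooth data by equal-frequency binning, replacing values with bin means."""
    ordered = sorted(values)
    smoothed = []
    # Walk through the sorted data one bin at a time.
    for start in range(0, len(ordered), bin_size):
        bin_values = ordered[start:start + bin_size]
        mean = sum(bin_values) / len(bin_values)
        # Every value in the bin is replaced by the bin mean.
        smoothed.extend([mean] * len(bin_values))
    return smoothed


# Hypothetical sample data (nine prices, three values per bin).
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
print(smooth_by_bin_means(prices, 3))
# [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Smoothing by bin medians or bin boundaries would follow the same structure, changing only the value substituted within each bin.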
1. Discrepancy detection
detection tools
Transformation tools:
ETL (extraction/transformation/loading) tools
2. Redundancy
Correlation analysis
Nominal data: χ² (chi-square) test
Numeric attributes: correlation coefficient and covariance
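The correlation measures listed above can be sketched in plain Python as follows. This is a minimal illustration, assuming population (divide-by-n) covariance, the Pearson correlation coefficient, and a contingency table given as a list of rows of observed counts. All function names and the sample numbers are hypothetical.

```python
import math


def covariance(xs, ys):
    """Population covariance of two equally long numeric attributes."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Pearson correlation coefficient: covariance scaled by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


def chi_square(table):
    """Chi-square statistic for a contingency table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two nominal attributes were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical numeric attributes that rise together.
print(correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))
# Hypothetical 2x2 table of observed counts for two nominal attributes.
print(chi_square([[250, 200], [50, 1000]]))
```

A correlation coefficient near +1 or -1, or a chi-square statistic that is large relative to the critical value for the table's degrees of freedom, suggests that one of the two attributes may be redundant.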
Data Reduction
Data reduction:
Numerosity reduction:
compression:
Discretization: a form of data transformation where the raw values of a numeric attribute (e.g., age) are replaced by interval labels (e.g., 0-10, 11-20, etc.) or conceptual labels (e.g., youth, adult, senior).
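The age example can be sketched as two small mapping functions, one producing interval labels and one producing conceptual labels. The function names and the cut-off ages for youth, adult, and senior are assumptions chosen only for illustration; the slides do not specify them.

```python
def interval_label(age, width=10):
    """Map a non-negative age to an interval label such as '0-10' or '11-20'."""
    if age <= width:
        return f"0-{width}"
    # Later intervals start one past a multiple of the width: 11-20, 21-30, ...
    low = ((age - 1) // width) * width + 1
    return f"{low}-{low + width - 1}"


def concept_label(age):
    """Map an age to a conceptual label (cut-offs are hypothetical)."""
    if age <= 20:
        return "youth"
    if age <= 60:
        return "adult"
    return "senior"


for age in [7, 11, 20, 35, 70]:
    print(age, interval_label(age), concept_label(age))
```

Either mapping replaces many distinct raw values with a small set of labels, which is what makes the discretized attribute easier to summarize and mine.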
Discretization process
Top-down vs. bottom-up
Concept Hierarchy Generation
Raw data are replaced by a smaller number of interval or concept labels.
OLTP [On-line Transaction Processing]
OLAP [On-line Analytical Processing]
1. Data collection
2. Presentation
3. Manipulation
4. Interactive multi-level mining
5. Other miscellaneous information
THANK YOU!