
Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing

Slices Mining Based on Singular Value

Lijie Zhang, Haili Yin, Hui Liu

School of Sciences, Qingdao Technological University, Qingdao, Shandong, China, 266033
zh_1112@163.com

Abstract

Slice is one of the major operations in on-line analytical processing (OLAP), which plays an important role in decision support applications. Based on the data cube, this paper proposes a method that extracts the inner rules of a system's movement by mining the maximum singular value of the slices. Algebraic theory shows that the method is feasible, and a numerical experiment demonstrates that it is efficient.

1. Introduction

Our capabilities for both generating and collecting data have been increasing rapidly, owing to the swift development of computer technology, the popularization of the World Wide Web, the wide-ranging collection of data and the widespread use of databases. Whether this tremendous amount of data is useful depends on whether valuable information and knowledge can be mined from it.

Faced with huge amounts of data of different classifications and distinct architectures, no single technique can be used to search for interesting knowledge. Questions such as what we want to find, where to start and how to begin are becoming major research subjects in economic analysis, business management and scientific exploration.

Data warehouses, introduced in the late 1980s, are usually modeled by a multidimensional database structure whose actual physical structure may be a relational data store or a multidimensional data cube. A data warehouse is subject-oriented, integrated, time-variant and nonvolatile. By providing preprocessed data for data mining, data warehouses answer the question of where to start. In particular, they also provide on-line analytical processing (OLAP) tools for the interactive analysis of multidimensional data of varied granularities, supporting summarization, consolidation and aggregation as well as the ability to view information from different angles.[1]

The concept of OLAP was proposed by E. F. Codd in 1993. Typical OLAP operations consist of roll-up, drill-down, slice, dice and pivot. Much recent research on data mining has focused on OLAP.[2-7]

2. Data Cube and Slice

Usually, the information gained about an object is the result of its movement at a specific time or in a specific space. Many factors commonly influence the movement of objects, and these factors together with the movement constitute a system. A multidimensional data space is typically organized around a central theme, which is viewed as a single dimension. The number of dimensions of the data space is determined by how strongly the factors affect the system, or by the targets of data mining. A data cube can then be formed by the roll-up and drill-down operations.

As one of the basic operations, the slice operation performs a selection on one dimension of the given cube, resulting in a subcube.

Figure 1 shows the data cube and slices. Data cube B has three dimensions D1, D2 and D3. Dimension D1 is organized into k equally spaced partitions, and dimensions D2 and D3 are organized into n and m partitions, respectively. Thus k slices are obtained, corresponding to k m×n matrices.

Within the same system, the distributions of data in different slices along the same dimension show some correlation or uniformity. Discovering the distributive characteristics inside the slices and comparing the slices with each other is therefore very helpful for finding potential information and knowledge.

Commonly, the distributive characteristics shared by subcubes or slices in the same dimension structure represent the dynamic traits of the inner rules of the system. By contrast, slices whose properties differ obviously from the others deserve close attention and deeper analysis.
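The slice operation described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the cube is held as a 3-D array whose axis 0 plays the role of D1, the remaining two axes form the m×n slice, and the helper name `slices_along` is hypothetical.

```python
import numpy as np


def slices_along(cube: np.ndarray, axis: int) -> list:
    """Return the 2-D slices of a 3-D cube, one per index of `axis`.

    Fixing one coordinate of the chosen dimension is the slice operation;
    each result is a matrix over the two remaining dimensions.
    """
    if cube.ndim != 3:
        raise ValueError("expected a 3-D data cube")
    return [np.take(cube, idx, axis=axis) for idx in range(cube.shape[axis])]


# Hypothetical cube B: k = 4 partitions on D1 (axis 0); the other two axes
# have sizes m = 3 and n = 2, so every slice is an m x n matrix.
k, m, n = 4, 3, 2
cube_b = np.arange(k * m * n, dtype=float).reshape(k, m, n)

slices = slices_along(cube_b, axis=0)
print(len(slices), slices[0].shape)   # prints: 4 (3, 2)
```

Passing a different `axis` slices the cube along another dimension; for example, `axis=2` yields n slices of size k×m.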

0-7695-2909-7/07 $25.00 © 2007 IEEE


DOI 10.1109/SNPD.2007.268
Figure 1. OLAP operations on data cube (cubes A, B and C over dimensions D1, D2 and D3, linked by roll-up, drill-down and slice operations)

3. Data Analysis of Slice

A group of slices along a target dimension shares the same numerical structure, which indicates the characteristics of the movement of the system, so the slices should show some similarities. An abnormal numerical character in a slice can reflect changes, or even anomalies, in the system. It is therefore significant to mine the similarities or the abnormalities from the slices.

A popular method to detect abnormalities among slices with different data elements is to estimate the first-order moment

$$\bar{x} = \frac{1}{N}\sum_{i,j} a_{ij}$$

and the second-order moment

$$s^2 = \frac{1}{N-1}\sum_{i,j}\left(a_{ij} - \bar{x}\right)^2$$

of each slice, where $a_{ij}$ are the entries of the slice and $N$ is the number of entries. This method is useful and efficient. Note, however, that data sources are multiform: if slices have similar data elements but different data distributions along a target dimension of a data cube, it is difficult to distinguish them using the moments of order one or two. As an example, two slices are displayed in Figure 2.

$$A_1 = \begin{pmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{pmatrix}, \qquad A_2 = \begin{pmatrix} 8 & 9 & 7 \\ 6 & 5 & 4 \\ 3 & 2 & 1 \end{pmatrix}$$

Figure 2. Two slices

For these two slices the moments give only $\bar{x}_1 = \bar{x}_2 = 5$ and $s_1^2 = s_2^2 = 7.5$, so they cannot tell the slices apart.

4. Eigenvalues of Slice

Systems are classified into linear and nonlinear systems. Since many nonlinear systems can be transformed into linear ones, this paper focuses on linear systems.

Suppose that $V(P)$ is a linear space over the number field $P$ and $T \in L(V)$ is a linear transformation on $V(P)$, where $L(V)$ is the space of linear transformations on $V$. A number $\lambda$ is called an eigenvalue of the linear transformation $T$ associated with the eigenvector $\xi$ if there exist $\lambda \in P$ and a nonzero $\xi \in V(P)$ such that $T\xi = \lambda\xi$. The eigenvalue $\lambda$ exhibits a property of the linear transformation $T$, such as the population growth rate $a$ in Malthus's equation $Dy = ay$, where $D \in L(V)$ is the differential operator (the function $y = e^{at}$ is an eigenvector of $D$ with eigenvalue $a$).[8]

If $V(P)$ is an $n$-dimensional linear space, then there exists a basis $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$ of $V(P)$ such that, for any $\xi \in V(P)$,

$$\xi = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)X, \qquad T(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n) = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)A,$$

where $X = (x_1, x_2, \ldots, x_n)^{T} \in P^n$ and $A$ is the matrix of the linear transformation $T$ relative to the basis $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$. We therefore have

$$T\xi = \lambda\xi \iff T(\varepsilon_1, \ldots, \varepsilon_n)X = \lambda(\varepsilon_1, \ldots, \varepsilon_n)X \iff (\varepsilon_1, \ldots, \varepsilon_n)AX = (\varepsilon_1, \ldots, \varepsilon_n)\lambda X \iff AX = \lambda X,$$

where the last equivalence holds because the basis vectors are linearly independent.

The matrix $A$ of a linear transformation $T$ relative to a given basis $\{\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n\}$ is unique. The eigenvalues of $A$, which are the same as those of $T$, indicate the characteristics of the linear transformation $T$. For example, in the vibration system $Kx = \omega^2 x$, where $K$ represents the stiffness matrix of the dynamic system, the squared vibration frequency $\omega^2$ is exactly an eigenvalue of the system.[8]

5. Eigenvalues Mining

Let $A \in R^{m\times n}$ be an $m\times n$ matrix, where $R^{m\times n}$ is the linear space of $m\times n$ real matrices. The eigenvalues of an $n\times n$ real matrix may be complex, and the matrices generated by slices lie in $R^{m\times n}$, so in general they are not square and have no eigenvalues at all. Therefore, this paper focuses on mining eigenvalue-based quantities, namely the singular values defined below, which exhibit the traits of movement of the slices in the system.
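The argument of Sections 3 and 5 can be checked on the two slices of Figure 2. The sketch below uses Python with NumPy, which is our choice of tool rather than the authors' MATLAB code, and the helper names are hypothetical. It computes the first- and second-order moments over all entries (with the N − 1 divisor) together with the singular values of each slice.

```python
import numpy as np

# The two 3 x 3 slices shown in Figure 2.
A1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
A2 = np.array([[8, 9, 7], [6, 5, 4], [3, 2, 1]], dtype=float)


def moments(a: np.ndarray) -> tuple:
    """First-order moment (mean) and second-order moment (variance with an
    N - 1 divisor), both taken over all N entries of the slice."""
    return float(a.mean()), float(a.var(ddof=1))


def singular_values(a: np.ndarray) -> np.ndarray:
    """Singular values of a slice, returned by NumPy in non-increasing order."""
    return np.linalg.svd(a, compute_uv=False)


for name, a in (("slice 1", A1), ("slice 2", A2)):
    mean, var = moments(a)
    print(name, "mean:", mean, "variance:", var,
          "singular values:", np.round(singular_values(a), 2))
```

Both slices contain the values 1 to 9, so any statistic that ignores position, such as the mean or variance, must coincide; the singular values depend on how the values are arranged in the matrix and therefore separate the two slices.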

Definition 1. Assume that $A \in C^{m\times n}$ has rank $r$, and let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_r > 0$ be the nonzero eigenvalues of $A^{H}A$. The numbers $\sigma_i = \sqrt{\lambda_i}$ $(i = 1, 2, \ldots, r)$ are called the singular values of the matrix $A$. Here $C^{m\times n}$ is the set of $m\times n$ complex matrices, $A^{H}$ is the conjugate transpose of $A$, and $r$ is the rank of $A$.

Definition 2. Let $A \in C^{n\times n}$. The matrix $A$ is called a unitary matrix if $A^{H}A = E$, where $E$ is the identity matrix. We use $\Sigma^{n\times n}$ to denote the set of $n\times n$ unitary matrices.
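Definitions 1 and 2 can be illustrated numerically. The following Python/NumPy sketch is only an illustration under our own assumptions: the helper names are hypothetical, the random rank-2 complex matrix is placeholder data, and the relative tolerance used to decide which eigenvalues count as nonzero is our choice.

```python
import numpy as np


def singular_values_via_eigs(a: np.ndarray, tol: float = 1e-10) -> np.ndarray:
    """Definition 1: sigma_i = sqrt(lambda_i), where lambda_1 >= ... >= lambda_r > 0
    are the nonzero eigenvalues of A^H A."""
    gram = a.conj().T @ a                      # A^H A is Hermitian positive semidefinite
    lam = np.linalg.eigvalsh(gram)[::-1]       # eigvalsh sorts ascending; reverse to descending
    lam = lam[lam > tol * max(lam[0], 1.0)]    # drop numerically zero eigenvalues, keep r of them
    return np.sqrt(lam)


def is_unitary(u: np.ndarray, tol: float = 1e-10) -> bool:
    """Definition 2: a square matrix U is unitary if U^H U = E (the identity)."""
    n = u.shape[0]
    return u.shape == (n, n) and np.allclose(u.conj().T @ u, np.eye(n), atol=tol)


rng = np.random.default_rng(0)
# A 4 x 3 complex matrix of rank 2, built as the product of 4 x 2 and 2 x 3 factors.
left = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
right = rng.standard_normal((2, 3)) + 1j * rng.standard_normal((2, 3))
a = left @ right

sigma = singular_values_via_eigs(a)
print("via eigenvalues of A^H A:", np.round(sigma, 6))
print("via SVD:                 ", np.round(np.linalg.svd(a, compute_uv=False), 6))

# The Q factor of a QR decomposition of a square complex matrix is unitary.
q, _ = np.linalg.qr(rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3)))
print("Q unitary:", is_unitary(q))
```

Forming $A^{H}A$ squares the condition number, so in practice a dedicated SVD routine is the more accurate way to obtain small singular values; the eigenvalue route is shown only because it mirrors Definition 1.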
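Figure 3. Results(1)

Theorem 1 (Singular Value Decomposition Theorem). Let $A \in C_r^{m\times n}$, that is, an $m\times n$ complex matrix of rank $r$. Then there exist two unitary matrices $U \in \Sigma^{m\times m}$ and $V \in \Sigma^{n\times n}$ such that

$$U^{H}AV = \begin{pmatrix} D_r & 0_{r\times(n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{pmatrix},$$

where $D_r = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r)$, $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$, and $r$ is the rank of $A$.

Theorem 2. Let $A \in C^{m\times n}$. Then the maximum singular value of $A$ satisfies $\sigma_1 = \|A\|_2$, which is called the spectral norm of $A$.[9]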
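Theorems 1 and 2 can be checked numerically, and Theorem 2 gives a direct way to compute the quantity this paper mines: the maximum singular value of a slice is its spectral norm. The Python/NumPy sketch below is an illustration under our own assumptions, not the authors' MATLAB code. The helper names are hypothetical, and the 17 × 3 × 4 cube is random placeholder data standing in for 17 slices along a target dimension; it is not the Shanghai data of Section 6.

```python
import numpy as np


def satisfies_theorem_1(a: np.ndarray) -> bool:
    """Check that NumPy's SVD has the form of Theorem 1: U and V unitary and
    U^H A V equal to diag(sigma_1, ..., sigma_r) padded with zero blocks."""
    u, s, vh = np.linalg.svd(a)             # full SVD: u is m x m, vh = V^H is n x n
    m, n = a.shape
    block = np.zeros((m, n))
    block[: s.size, : s.size] = np.diag(s)  # D_r in the top-left corner (zero sigmas pad it)
    v = vh.conj().T
    unitary = np.allclose(u.conj().T @ u, np.eye(m)) and np.allclose(v.conj().T @ v, np.eye(n))
    return unitary and np.allclose(u.conj().T @ a @ v, block)


def max_singular_value_per_slice(cube: np.ndarray, axis: int) -> np.ndarray:
    """sigma_1 of every slice along the target dimension, using sigma_1 = ||A||_2 (Theorem 2)."""
    return np.array([np.linalg.norm(np.take(cube, idx, axis=axis), ord=2)
                     for idx in range(cube.shape[axis])])


rng = np.random.default_rng(1)
print(satisfies_theorem_1(rng.standard_normal((3, 5))))                           # wide matrix
print(satisfies_theorem_1(np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])))  # rank 2

# Hypothetical cube: 17 slices along axis 0 (the target dimension), each 3 x 4.
cube = rng.random((17, 3, 4))
sigma1 = max_singular_value_per_slice(cube, axis=0)
print(np.round(sigma1, 3))
```

`np.linalg.norm(A, ord=2)` returns the largest singular value of a 2-D array, so only $\sigma_1$ is kept for each slice, which yields one curve value per slice along the target dimension.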
According to Theorem 1, applying singular value decomposition to slices is an effective method of algebraic feature extraction: it reveals the basic structure and the algebraic essence of the data matrix.

Theorem 2 indicates that the first singular value of a matrix is its maximum singular value and can represent the algebraic features of the matrix. Therefore, restricting the search to the maximum singular values of the slices in the same class does not distort the comparison of similarity and abnormality among the slices. Furthermore, singular values are not difficult to compute. For example, the singular values of the two slices shown in Figure 2 are (16.85, 1.07, 0.00) and (16.81, 1.49, 0.36), respectively.

6. Experiment and Analysis

Hardware and software: 1.3 GHz CPU, 128 MB memory, MATLAB 6.5.

Data source: To analyze the singular values, we chose data on population, number of families, area and population density for 10 blocks among the 19 regions of Shanghai from 1951 to 2001. Of these blocks, 60 percent are located in the city centre and 40 percent in the outskirts. Time was chosen as the target dimension, with each subcube covering 3 years. Finding the maximum singular values of the 17 slices of each region took 1.05 seconds. The results are shown in Figure 3.

Figure 3. Results(2)

Analysis of Results

From 1951 to 2001, the population of the ten blocks increased from 3,400,000 to 6,920,000, the area from 2,412 km² to 2,963 km², and the population density from 1,410 to 2,336 people per km². According to the distribution of the singular values, we observe the following:
(1) The singular-value curves of the outskirt regions whose area is stable are rising, for example in Nanhui and Chongming. In particular, in Chongming, located in the Yangtze River delta, the population is increasing while the area does not change.
(2) The singular-value curves of the regions in the city centre, such as Hongkou, Zhabei, Yangpu, Luwan and Xuhui, tend to decrease. The changes concentrate on the years 1960 and 1980, since Shanghai's administrative areas were re-divided in 1960 and 1984. The singular-value curves reveal the timing and the effect of this regulation.
(3) The most abnormal region is Huangpu. As the economic and cultural centre of Shanghai, Huangpu has always been highly sensitive to economic and political conditions.

In this example, by analyzing the singular values of the slices and searching for their similarities and abnormalities, we learned the tendency of the movement and found the reasons for it. The results of this example
indicate that it is valuable to study the singular values of slices in data mining.

7. Conclusion

Based on the data cube, this paper has proposed a method that extracts the inner rules of a system's movement by mining the maximum singular value of the slices. Algebraic theory shows that the method is feasible, and the numerical experiment demonstrates that it is efficient.

Further research is needed on the following problems:
(1) Because it reflects the inner algebraic nature of slices, singular value mining is valuable for analyzing the mechanism of movement. However, we are not sure whether it performs well on data cubes built from different sources, especially those with no evident inner traits. In other words, it remains open which types of data cube are suited to singular value mining.
(2) Because singular values are unchanged by transformations such as transposition, rotation and shifting of rows or columns, this mining method is flexible. However, generating slices depends on the choice of the target dimension in the data cube, which affects computation and efficiency. How to determine the target dimension in a data cube, and how to perform roll-up and drill-down, remain open problems.
(3) Noise in the data appears as a perturbation of the slice matrices, and even a small perturbation changes the eigenvalues. It is important to investigate how to compute the singular values when it is unknown whether the matrix has been perturbed.

8. References

[1] Jiawei Han, Micheline Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann Publishers, 2000, pp. 58-62.

[2] T. Imielinski, L. Khachiyan, and A. Abdulghani, Cubegrades: Generalizing association rules, In: Proc. of the 8th Int'l Conf. on Data Mining and Knowledge Discovery, Edmonton: ACM Press, 2002, pp. 219-257.

[3] V. S. Lakshmanan, J. Pei, and J. W. Han, Quotient cube: How to summarize the semantics of a data cube, In: Proc. of the 28th Int'l Conf. on Very Large Data Bases, Hong Kong: Morgan Kaufmann Publishers, 2002, pp. 778-789.

[4] S. Sarawagi, R. Agrawal, and N. Megiddo, Discovery-driven exploration of OLAP data cubes, In: Proc. of the Int'l Conf. on Extending Database Technology, LNCS 1377, Springer-Verlag, 1998, pp. 168-182.

[5] S. Sarawagi, Explaining differences in multidimensional aggregates, In: Proc. of the 25th Int'l Conf. on Very Large Data Bases, Edinburgh: Morgan Kaufmann Publishers, 1999, pp. 42-53.

[6] S. Sarawagi, User-adaptive exploration of multidimensional data, In: Proc. of the 26th Int'l Conf. on Very Large Data Bases, Cairo: Morgan Kaufmann Publishers, 2000, pp. 307-316.

[7] G. Sathe, S. Sarawagi, Intelligent rollups in multidimensional OLAP data, In: Proc. of the 27th Int'l Conf. on Very Large Data Bases, Roma: Morgan Kaufmann Publishers, 2001, pp. 531-540.

[8] D. G. Zill and M. R. Cullen, Differential Equations with Boundary-Value Problems, Thomson Learning, 2001, pp. 167-176. (in Chinese)

[9] Shufang Xu, Theories and Methods of Matrix Computing, Beijing University Press, 1995, pp. 1-21. (in Chinese)
