Infinite streams.
Offline Component
Uses only the summary statistics to do the clustering.
Micro-Clusters
What is a Micro-Cluster
A Micro-Cluster is a set of individual data points that are
close to each other and will be treated as a single unit in
further offline Macro-clustering.
Snapshot
A merged cluster keeps the IDs of all of its component clusters.
Macro-Cluster Creation
Based on the additivity property of the cluster feature vector.
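The additivity property can be illustrated with a minimal sketch. It assumes a simplified cluster feature vector holding only the point count, the per-dimension linear sum, and the per-dimension sum of squares; the class name `CF` and its methods are hypothetical and not taken from the slides. Merging two clusters is then a component-wise addition, so no raw points are needed.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass(frozen=True)
class CF:
    """Simplified cluster feature vector (hypothetical layout)."""
    n: int                  # number of points absorbed
    ls: Tuple[float, ...]   # per-dimension linear sum
    ss: Tuple[float, ...]   # per-dimension sum of squares

    @staticmethod
    def from_points(points: Iterable[Sequence[float]]) -> "CF":
        pts = [tuple(p) for p in points]
        dims = len(pts[0])
        return CF(
            n=len(pts),
            ls=tuple(sum(p[d] for p in pts) for d in range(dims)),
            ss=tuple(sum(p[d] ** 2 for p in pts) for d in range(dims)),
        )

    def __add__(self, other: "CF") -> "CF":
        # Additivity: the CF of a union of disjoint point sets is the
        # component-wise sum of the individual CFs.
        return CF(
            n=self.n + other.n,
            ls=tuple(a + b for a, b in zip(self.ls, other.ls)),
            ss=tuple(a + b for a, b in zip(self.ss, other.ss)),
        )

    def centroid(self) -> Tuple[float, ...]:
        return tuple(s / self.n for s in self.ls)


a = CF.from_points([(1.0, 2.0), (3.0, 4.0)])
b = CF.from_points([(5.0, 6.0)])
merged = a + b
```

Here `merged` equals the feature vector computed directly from all three points, which is why the offline component can build macro-clusters from summaries alone.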
The current time is T and the window size is h. That means the user wants to
find the clusters formed in (T-h, T).
Approach:
1. Find the snapshot for T and get the micro-cluster set S(T).
2. Find the snapshot for T-h and get the micro-cluster set S(T-h).
3. Compute S(T) - S(T-h).
Specifically, suppose we have a merged cluster with ID list (C1, C2, C3) in S(T)
and a cluster with ID C1 in S(T-h). Then we use
CFT(C1,C2,C3) - CFT(C1) = CFT(C2,C3), because C1 was formed before
T-h and thus should not contribute to the micro-clusters formed in (T-h, T).
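The subtraction step above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a snapshot is modelled as a dictionary from a frozen set of cluster IDs to a simplified feature vector, and the names `CF`, `Snapshot`, and `clusters_in_window` are hypothetical. The numbers in the example are made up for demonstration.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class CF:
    """Simplified cluster feature vector (hypothetical layout)."""
    n: int
    ls: Tuple[float, ...]   # per-dimension linear sum
    ss: Tuple[float, ...]   # per-dimension sum of squares

    def __sub__(self, other: "CF") -> "CF":
        # Subtraction follows from additivity: remove the older part.
        return CF(
            n=self.n - other.n,
            ls=tuple(a - b for a, b in zip(self.ls, other.ls)),
            ss=tuple(a - b for a, b in zip(self.ss, other.ss)),
        )


Snapshot = Dict[FrozenSet[str], CF]


def clusters_in_window(s_now: Snapshot, s_old: Snapshot) -> Snapshot:
    """Approximate S(T) - S(T-h) using the ID lists to match clusters."""
    result: Snapshot = {}
    for ids, cf in s_now.items():
        remaining_ids = ids
        for old_ids, old_cf in s_old.items():
            # An old cluster contributes to this one if all of its IDs
            # appear in the current ID list.
            if old_ids <= ids:
                cf = cf - old_cf
                remaining_ids = remaining_ids - old_ids
        if cf.n > 0:
            # If every ID already existed at T-h but the cluster kept
            # growing, keep the original ID list as the key.
            result[remaining_ids or ids] = cf
    return result


# Made-up one-dimensional example mirroring the (C1, C2, C3) case.
s_old = {frozenset({"C1"}): CF(2, (4.0,), (10.0,))}
s_now = {frozenset({"C1", "C2", "C3"}): CF(5, (19.0,), (85.0,))}
window = clusters_in_window(s_now, s_old)
```

In this example `window` holds one cluster keyed by (C2, C3) with three points, which is the part of the merged cluster formed inside (T-h, T).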
Example and Result
[Figure: micro-cluster snapshots at Time T and at Time T-h]
Di and Charu's project deals with:
1. Deterministic Clusters
2. Clusters with Arbitrary Shapes
3. Real Expirations
4. Disk Version
5. Outlier Detection for Free