
Data Mining

Entropy Weighting Genetic k-Means Algorithm for Subspace Clustering

Data Mining

Reverse Nearest Neighbors Search in Ad-hoc Subspaces

Networking

On Wireless Scheduling Algorithms for Minimizing the Queue-Overflow Probability

Image Processing

Cluster-Based Retrieval of Images by Unsupervised Learning

To eliminate commercial breaks.

Virtual Smartphone over IP

ROPE: Robust Position Estimation in Wireless Sensor Networks

Mobile Computing

SmartCam - Smart Phone Web Camera

Profiling Bank Customers' Behaviour Using Cluster Analysis for Profitability

Implementation of Polynomial Neural Network in Web Usage Mining

A Method for Estimating the Precision of Placename Matching

Customer Segmentation - Predictive Knowledge & Data Mining

Fuzzy Logic = Computing with Words

Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches

CLUE: Cluster-Based Retrieval of Images by Unsupervised Learning

On Wireless Scheduling Algorithms for Minimizing the Queue-Overflow Probability

Reverse Nearest Neighbors Search in Ad-hoc Subspaces

Shape Representation and Classification Using the Poisson Equation

Texture Segmentation by Multiscale Aggregation of Filter Responses and Shape Elements

Unsupervised Learning of Human Motion

Content-based Image Retrieval Using Gabor Texture Features

Relevance Feedback: A Power Tool for Interactive Content-Based Image Retrieval

This paper presents a genetic k-means algorithm for clustering high-dimensional objects in subspaces. High-dimensional data suffers from the data sparsity problem. In this algorithm, the k-means clustering process calculates a weight for each dimension in each cluster and uses the weight values to identify the subsets of important dimensions that characterize different clusters. This is achieved by including the weight entropy in the objective function that is minimized during the k-means clustering process. Further, the use of a genetic algorithm helps the search converge toward the global optimum. Experiments on UCI data show that this algorithm can generate better clustering results than other subspace clustering algorithms.
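
The following is a minimal Python sketch of an entropy-weighted k-means iteration of the kind described above, without the genetic search layer: per-cluster dimension weights are recomputed from within-cluster dispersions and softened by an entropy parameter. The exact objective form, the variable names, and the value of gamma are illustrative assumptions, not the paper's precise formulation.

```python
# Minimal sketch of entropy-weighted k-means (no genetic search layer).
# Assumes an EWKM-style objective: weighted within-cluster dispersion plus
# gamma * sum(w * log w); names and the gamma value are illustrative.
import numpy as np

def ewkm_step(X, centers, weights, gamma=0.5):
    k, d = centers.shape
    # 1. Assign each point to the cluster with the smallest weighted distance.
    dist = np.array([((X - centers[l]) ** 2 * weights[l]).sum(axis=1) for l in range(k)])
    labels = dist.argmin(axis=0)
    # 2. Update cluster centers as per-dimension means of their members.
    for l in range(k):
        if (labels == l).any():
            centers[l] = X[labels == l].mean(axis=0)
    # 3. Update dimension weights: a smaller within-cluster dispersion on a
    #    dimension yields a larger weight, softened by the entropy term (gamma).
    for l in range(k):
        D = ((X[labels == l] - centers[l]) ** 2).sum(axis=0)  # dispersion per dimension
        w = np.exp(-D / gamma)
        weights[l] = w / w.sum()
    return labels, centers, weights

# Usage: iterate ewkm_step until the labels stop changing.
X = np.random.rand(100, 8)
centers = X[np.random.choice(len(X), 3, replace=False)].copy()
weights = np.full((3, 8), 1.0 / 8)
for _ in range(20):
    labels, centers, weights = ewkm_step(X, centers, weights)
```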

Given an object q, modeled by a multidimensional point, a reverse nearest neighbors (RNN) query returns the set of objects in the database that have q as their nearest neighbor. In this paper, we study an interesting generalization of the RNN query, where not all dimensions are considered, but only an ad-hoc subset thereof. The rationale is that (i) the dimensionality might be too high for the result of a regular RNN query to be useful, (ii) missing values may implicitly define a meaningful subspace for RNN retrieval, and (iii) analysts may be interested in the query results only for a set of (ad-hoc) problem dimensions (i.e., object attributes). We consider a suitable storage scheme and develop appropriate algorithms for projected RNN queries, without relying on multidimensional indexes. Given the significant cost difference between random and sequential data accesses, our algorithms are based on applying sequential accesses only on the projected atomic values of the data at each dimension, to progressively derive a set of RNN candidates. Whether these candidates are actual RNN results is then validated via an optimized refinement step. In addition, we study variants of the projected RNN problem, including RkNN search, bichromatic RNN, and RNN retrieval for the case where sequential accesses are not possible. Our methods are experimentally evaluated with real and synthetic data.
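
As a rough illustration of the query semantics only (not of the paper's sequential-access algorithms), a brute-force projected RNN check might look like the sketch below; the data and the chosen subspace are arbitrary.

```python
# Brute-force baseline illustrating projected RNN query semantics only; the
# paper's algorithms use sorted per-dimension sequential accesses, which this
# sketch does not attempt to reproduce.
import numpy as np

def projected_rnn(q, data, dims):
    """Return indices of points whose nearest neighbor (in subspace `dims`) is q."""
    P = data[:, dims]
    qp = np.asarray(q)[dims]
    result = []
    for i, p in enumerate(P):
        d_pq = np.linalg.norm(p - qp)
        # distance from p to every other database point in the same subspace
        others = np.delete(P, i, axis=0)
        d_min = np.min(np.linalg.norm(others - p, axis=1)) if len(others) else np.inf
        if d_pq < d_min:          # q is strictly closer than any other point
            result.append(i)
    return result

data = np.random.rand(200, 6)
print(projected_rnn(np.random.rand(6), data, dims=[0, 2, 5]))
```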

In this paper, we are interested in wireless scheduling algorithms for the downlink of a single cell that can minimize the queue-overflow probability. Specifically, in a large-deviation setting, we are interested in algorithms that maximize the asymptotic decay-rate of the queue-overflow probability, as the queue-overflow threshold approaches infinity. We first derive an upper bound on the decay-rate of the queue-overflow probability over all scheduling policies. We then focus on a class of scheduling algorithms collectively referred to as the α-algorithms. For a given α ≥ 1, the α-algorithm picks the user for service at each time that has the largest product of the transmission rate multiplied by the backlog raised to the power α. We show that when the overflow metric is appropriately modified, the minimum-cost-to-overflow under the α-algorithm can be achieved by a simple linear path, and it can be written as the solution of a vector-optimization problem. Using this structural property, we then show that when α approaches infinity, the α-algorithms asymptotically achieve the largest decay-rate of the queue-overflow probability. Finally, this result enables us to design scheduling algorithms that are both close-to-optimal in terms of the asymptotic decay-rate of the overflow probability and empirically shown to maintain small queue-overflow probabilities over queue-length ranges of practical interest.
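
A minimal sketch of the scheduling rule itself follows: at each slot the α-algorithm serves the user maximizing rate × backlog^α. The arrival and rate processes here are invented for illustration, not taken from the paper.

```python
# Sketch of the alpha-algorithm scheduling rule described above.
import numpy as np

def alpha_schedule(rates, backlogs, alpha=2.0):
    """Pick the user with the largest rate * backlog**alpha product."""
    scores = rates * backlogs ** alpha
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
backlogs = np.zeros(4)
for t in range(1000):
    rates = rng.integers(1, 4, size=4)        # per-slot transmission rates (illustrative)
    backlogs += rng.poisson(1.0, size=4)      # per-slot arrivals (illustrative)
    u = alpha_schedule(rates, backlogs)
    backlogs[u] = max(0.0, backlogs[u] - rates[u])  # serve the chosen user
```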

In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. From a computational perspective, a typical CBIR system views the query image and images in the database (target images) as a collection of features and ranks the relevance between the query image and any target image in proportion to a similarity measure calculated from the features. In this sense, these features, or signatures of images, characterize the content of images.
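
To make the ranking step concrete, here is a toy sketch in which each image's signature is a color histogram (an assumption chosen purely for illustration; any feature set could be substituted) and target images are sorted by cosine similarity to the query.

```python
# Toy sketch of CBIR ranking: compute a signature per image, then sort
# targets by similarity to the query signature.
import numpy as np

def color_histogram(image, bins=8):
    """Very simple signature: a normalized joint histogram over RGB values."""
    hist, _ = np.histogramdd(image.reshape(-1, 3), bins=bins, range=[(0, 256)] * 3)
    return hist.ravel() / hist.sum()

def rank_by_similarity(query_feat, target_feats):
    # cosine similarity between the query signature and each target signature
    sims = [np.dot(query_feat, f) / (np.linalg.norm(query_feat) * np.linalg.norm(f))
            for f in target_feats]
    return np.argsort(sims)[::-1]            # most similar targets first

query = np.random.randint(0, 256, (64, 64, 3))
targets = [np.random.randint(0, 256, (64, 64, 3)) for _ in range(10)]
order = rank_by_similarity(color_histogram(query), [color_histogram(t) for t in targets])
```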

Using Matlab, eliminate the commercial breaks between two intervals to make a continuous episode.

The number of smartphone users and mobile application offerings is growing rapidly. A smartphone is often expected to offer PC-like functionality. In this paper, we present a Virtual Smartphone over IP system that allows users to create virtual smartphone images in the mobile cloud and to customize each image to meet different needs. Users can easily and freely tap into the power of the data center by installing the desired mobile applications remotely in one of these images. Because the mobile applications are controlled remotely, they are not constrained by the limited processing power, memory, and battery life of a physical smartphone.

We address the problem of secure location determination, known as Secure Localization, and the problem of verifying the location claim of a node, known as Location Verification, in Wireless Sensor Networks (WSN). We propose a robust positioning system we call ROPE that allows sensors to determine their location without any centralized computation. In addition, ROPE provides a location verification mechanism that verifies the location claims of the sensors before data collection.

Turns a camera phone (S60, WinMo6.x, Android, Samsung Bada) with Bluetooth or WiFi into a handy webcam ready to use with your PC.
Analyzing information about a bank's customers for customer behavior modeling is a difficult, multi-dimensional problem. This study was carried out at a bank to analyze customer behavior, considering not only the customers' benefit from their general use of banking services but also the charges of the channels they use. In this study, k-means is used to segment customers according to their important characteristics and their behavior in using a channel. The resulting segmentation of customer profiles according to behavior will help the bank with strategies for retaining current customers and attracting new ones.
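
A hedged illustration of the segmentation step using scikit-learn's k-means follows; the behavioural features (channel usage counts and average balance) are invented placeholders, not the variables used in the study.

```python
# Hypothetical k-means segmentation of customers on behavioural features.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# rows = customers, columns = [ATM uses, online logins, branch visits, avg balance]
X = np.array([[12, 40, 1,   900.0],
              [ 2,  5, 8, 15000.0],
              [30, 60, 0,   300.0],
              [ 1,  2, 6, 22000.0]])

X_scaled = StandardScaler().fit_transform(X)     # put features on a common scale
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(segments)    # segment label per customer
```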

Education, banking, various businesses, and many essential human needs are made available on the Internet. Day by day, the numbers of users and providers of these services are growing exponentially. People face the challenge of reaching their target among the enormous amount of information on the web, while on the other side web site owners strive to retain their visitors against their competitors. Personalized attention to a user is one of the best ways to meet these challenges. Thousands of papers have been published about personalization; most differ in how they gather user logs, how they preprocess the web logs, or in the mining algorithm. In this paper, a simple codification is performed to filter the valid web logs. The codified logs are preprocessed with polynomial vector preprocessing and then trained with backpropagation algorithms. The computational effort is measured for various sets of usage logs. The results demonstrate that the algorithm performs better than the conventional methods.
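
The pipeline could be sketched roughly as follows: codified log entries become numeric vectors, are expanded with polynomial terms, and are fed to a network trained by backpropagation. The sample data, the encoding, and the network size are assumptions for illustration only.

```python
# Rough sketch: codified web-log vectors -> polynomial expansion -> backprop network.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.neural_network import MLPClassifier

# toy codified web-log features: [page id, time on page, clicks] (invented)
X = np.array([[1, 30, 2], [3, 5, 1], [2, 120, 8], [1, 60, 4], [3, 10, 0], [2, 90, 6]])
y = np.array([0, 0, 1, 1, 0, 1])     # e.g. a "returning visitor" label (invented)

X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
net.fit(X_poly, y)                   # trained with backpropagation
print(net.predict(X_poly))
```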

Information in digital libraries and information systems frequently refers to locations or objects in geographic space. Digital gazetteers are commonly employed to match the referred placenames with actual locations in information integration and data cleaning procedures. This process may fail due to missing information in the gazetteer, multiple matches, or false positive matches. We have analyzed the cases of success and reasons for failure of the mapping process to a gazetteer. Based on these, we present a statistical model that permits estimating 1) the completeness of a gazetteer with respect to the specific target area and application, 2) the expected precision and recall of one-to-one mappings of source placenames to the gazetteer, 3) the semantic inconsistency that remains in one-to-one mappings, and 4) the degree to which the precision and recall are improved under knowledge of the identity of higher levels in a hierarchy of places. The presented model is based on statistical analysis of the mapping process of a large set of placenames itself and does not require any other background data. The statistical model assumes that a gazetteer is populated by a stochastic process. The paper discusses how future work could take deviations from this assumption into account. The method has been applied to a real case.

Customer segmentation is a process that divides customers into smaller groups called segments. Segments should be homogeneous within and, desirably, heterogeneous between. In other words, customers in the same segment possess the same or a similar set of attributes, while customers in different segments have differing sets of attributes. The segmentation process can be very complicated, so it is best to use advanced analytic tools.

As its name suggests, computing with words (CW) is a methodology in which words are used in place of numbers for computing and reasoning. The point of this note is that fuzzy logic plays a pivotal role in CW and vice versa. Thus, as an approximation, fuzzy logic may be equated to CW. There are two major imperatives for computing with words. First, computing with words is a necessity when the available information is too imprecise to justify the use of numbers; and second, when there is a tolerance for imprecision which can be exploited to achieve tractability, robustness, low solution cost, and better rapport with reality. Exploitation of the tolerance for imprecision is an issue of central importance in CW. In CW, a word is viewed as a label of a granule, that is, a fuzzy set of points drawn together by similarity, with the fuzzy set playing the role of a fuzzy constraint on a variable. The premises are assumed to be expressed as propositions in a natural language. For purposes of computation, the propositions are expressed in canonical forms which serve to place in evidence the fuzzy constraints that are implicit in the premises. Then, the rules of inference in fuzzy logic are employed to propagate the constraints from premises to conclusions. At this juncture, the techniques of computing with words underlie, in one way or another, almost all applications of fuzzy logic. In coming years, computing with words is likely to evolve into a basic methodology in its own right, with wide-ranging ramifications on both basic and applied levels.
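
A tiny example of the core idea that a word labels a fuzzy constraint: "warm" and "humid" are modeled as fuzzy sets, and the conjunction "warm and humid" is evaluated with the min rule. The membership functions are invented for illustration.

```python
# Words as fuzzy constraints: membership functions plus the min conjunction rule.
def triangular(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def warm(temp_c):       # the word "warm" as a fuzzy set over temperature
    return triangular(temp_c, 15, 25, 35)

def humid(rh_percent):  # the word "humid" as a fuzzy set over relative humidity
    return triangular(rh_percent, 50, 80, 100)

# degree to which "warm and humid" holds for a concrete observation
temp, rh = 28.0, 70.0
degree = min(warm(temp), humid(rh))
print(degree)   # warm = 0.7, humid = 0.667 -> conjunction = 0.667
```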

Case-based reasoning is a recent approach to problem solving and learning that has received a lot of attention over the last few years. Originating in the US, the basic idea and underlying theories have spread to other continents, and we are now within a period of highly active research in case-based reasoning in Europe, as well. This paper gives an overview of the foundational issues related to case-based reasoning, describes some of the leading methodological approaches within the field, and exemplifies the current state through pointers to some systems. Initially, a general framework is defined, to which the subsequent descriptions and discussions will refer. The framework is influenced by recent methodologies for knowledge level descriptions of intelligent systems. The methods for case retrieval, reuse, solution testing, and learning are summarized, and their actual realization is discussed in the light of a few example systems that represent different CBR approaches. We also discuss the role of case-based methods as one type of reasoning and learning method within an integrated system architecture.
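
A minimal sketch of the retrieve and reuse steps of the CBR cycle on an invented case base follows; real systems use richer case representations, revision, and retention.

```python
# Minimal retrieve/reuse sketch of the CBR cycle; the case base is invented.
import math

case_base = [
    # (problem features, solution)
    ({"rooms": 3, "area": 80},  {"price": 210_000}),
    ({"rooms": 5, "area": 140}, {"price": 380_000}),
    ({"rooms": 2, "area": 55},  {"price": 150_000}),
]

def distance(a, b):
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def solve(query):
    # RETRIEVE: the nearest stored case; REUSE: copy its solution as a first answer.
    best = min(case_base, key=lambda case: distance(query, case[0]))
    proposed = dict(best[1])
    # A real system would then REVISE the solution and RETAIN the new case.
    return proposed

print(solve({"rooms": 4, "area": 100}))
```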

In a typical content-based image retrieval (CBIR) system, target images (images in the database) are sorted by feature similarities with respect to the query. Similarities among target images are usually ignored. This paper introduces a new technique, cluster-based retrieval of images by unsupervised learning (CLUE), for improving user interaction with image retrieval systems by fully exploiting the similarity information. CLUE retrieves image clusters by applying a graph-theoretic clustering algorithm to a collection of images in the vicinity of the query. Clustering in CLUE is dynamic. In particular, clusters formed depend on which images are retrieved in response to the query. CLUE can be combined with any real-valued symmetric similarity measure (metric or nonmetric). Thus, it may be embedded in many current CBIR systems, including relevance feedback systems. The performance of an experimental image retrieval system using CLUE is evaluated on a database of around 60,000 images from COREL. Empirical results demonstrate improved performance compared with a CBIR system using the same image similarity measure. In addition, results on images returned by Google's Image Search reveal the potential of applying CLUE to real-world image data and integrating CLUE as a part of the interface for keyword-based image retrieval systems.

In this paper, we are interested in wireless scheduling algorithms for the downlink of a single cell that can minimize the queue-overflow probability. Specifically, in a large-deviation setting, we are interested in algorithms that maximize the asymptotic decay-rate of the queue-overflow probability, as the queue-overflow threshold approaches infinity. We first derive an upper bound on the decay-rate of the queue-overflow probability over all scheduling policies. We then focus on a class of scheduling algorithms collectively referred to as the α-algorithms. For a given α ≥ 1, the α-algorithm picks the user for service at each time that has the largest product of the transmission rate multiplied by the backlog raised to the power α. We show that when the overflow metric is appropriately modified, the minimum-cost-to-overflow under the α-algorithm can be achieved by a simple linear path, and it can be written as the solution of a vector-optimization problem. Using this structural property, we then show that when α approaches infinity, the α-algorithms asymptotically achieve the largest decay-rate of the queue-overflow probability. Finally, this result enables us to design scheduling algorithms that are both close-to-optimal in terms of the asymptotic decay-rate of the overflow probability and empirically shown to maintain small queue-overflow probabilities over queue-length ranges of practical interest.

Given an object q, modeled by a multidimensional point, a reverse nearest neighbors (RNN) query returns the set of objects in the database that have q as their nearest neighbor. In this paper, we study an interesting generalization of the RNN query, where not all dimensions are considered, but only an ad-hoc subset thereof. The rationale is that (i) the dimensionality might be too high for the result of a regular RNN query to be useful, (ii) missing values may implicitly define a meaningful subspace for RNN retrieval, and (iii) analysts may be interested in the query results only for a set of (ad-hoc) problem dimensions (i.e., object attributes). We consider a suitable storage scheme and develop appropriate algorithms for projected RNN queries, without relying on multidimensional indexes. Given the significant cost difference between random and sequential data accesses, our algorithms are based on applying sequential accesses only on the projected atomic values of the data at each dimension, to progressively derive a set of RNN candidates. Whether these candidates are actual RNN results is then validated via an optimized refinement step. In addition, we study variants of the projected RNN problem, including RkNN search, bichromatic RNN, and RNN retrieval for the case where sequential accesses are not possible. Our methods are experimentally evaluated with real and synthetic data.

Silhouettes contain rich information about the shape of objects that can be used for recognition and classification. We present a novel approach that allows us to reliably compute many useful properties of a silhouette. Our approach assigns to every internal point of the silhouette a value reflecting the mean time required for a random walk beginning at the point to hit the boundaries. This function can be computed by solving Poisson's equation, with the silhouette contours providing the boundary conditions. We show how this function can be used to reliably extract various shape properties, including part structure and rough skeleton, local orientation and aspect ratio of different parts, and convex and concave sections of the boundaries. In addition, we discuss properties of the solution and show how to compute it efficiently using multigrid algorithms. We demonstrate the utility of the extracted properties by using them for shape classification.
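
A small sketch of the Poisson-based measure: for each interior pixel, U approximates the mean number of random-walk steps needed to reach the silhouette boundary. Plain Jacobi iteration is used here for brevity instead of the multigrid solver the paper advocates, and the silhouette is a toy disc.

```python
# Poisson-based shape measure via Jacobi iteration (illustrative, not multigrid).
import numpy as np

def poisson_interior(mask, iters=500):
    """mask: boolean array, True inside the silhouette (should not touch the
    image border).  Returns U with U = 0 outside the silhouette."""
    U = np.zeros(mask.shape, dtype=float)
    for _ in range(iters):
        # average of the 4-neighbors; outside values are 0, enforcing the
        # boundary condition U = 0 on the silhouette contour
        up    = np.roll(U,  1, axis=0)
        down  = np.roll(U, -1, axis=0)
        left  = np.roll(U,  1, axis=1)
        right = np.roll(U, -1, axis=1)
        U_new = 1.0 + 0.25 * (up + down + left + right)
        U = np.where(mask, U_new, 0.0)
    return U

# toy silhouette: a filled disc
yy, xx = np.mgrid[:64, :64]
mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 20 ** 2
U = poisson_interior(mask)
print(U.max())   # largest mean hitting time, attained deep inside the shape
```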

Texture segmentation is a difficult problem, as is apparent from camouflage pictures. A textured region can contain texture elements of various sizes, each of which can itself be textured. We approach this problem using a bottom-up aggregation framework that combines structural characteristics of texture elements with filter responses. Our process adaptively identifies the shape of texture elements and characterizes them by their size, aspect ratio, orientation, brightness, etc., and then uses various statistics of these properties to distinguish between different textures. At the same time, our process uses the statistics of filter responses to characterize textures. In our process, the shape measures and the filter responses cross-talk extensively. In addition, a top-down cleaning process is applied to avoid mixing the statistics of neighboring segments. We tested our algorithm on real images and demonstrate that it can accurately segment regions that contain challenging textures.

An unsupervised learning algorithm that can obtain a probabilistic model of an object composed of a collection of parts (a moving human body in our examples) automatically from unlabeled training data is presented. The training data include both useful foreground features and features that arise from irrelevant background clutter; the correspondence between parts and detected features is unknown. The joint probability density function of the parts is represented by a mixture of decomposable triangulated graphs, which allow for fast detection. To learn the model structure as well as the model parameters, an EM-like algorithm is developed in which the labeling of the data (part assignments) is treated as hidden variables. The unsupervised learning technique is not limited to decomposable triangulated graphs. The efficiency and effectiveness of our algorithm are demonstrated by applying it to generate models of human motion automatically from unlabeled image sequences, and by testing the learned models on a variety of sequences.

The Gabor wavelet proves to be very useful for texture analysis and is widely adopted in the literature. In this paper, we present an image retrieval method based on Gabor filters. Texture features are found by calculating the mean and variance of the Gabor-filtered image. Rotation normalization is realized by a circular shift of the feature elements so that all images have the same dominant direction. Image indexing and retrieval are conducted on textured images and natural images. Experimental results are shown and discussed.
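
A short sketch of this kind of feature extraction using scikit-image's Gabor filter: the mean and variance of each filter response form the texture signature, and a circular shift aligns the dominant orientation. The filter-bank frequencies and orientation count are illustrative choices, not the paper's settings.

```python
# Gabor texture features: mean/variance of filter responses plus a circular
# shift for rotation normalization.
import numpy as np
from skimage.filters import gabor

def gabor_features(image, frequencies=(0.1, 0.2), n_orient=6):
    means, variances = [], []
    for f in frequencies:
        for k in range(n_orient):
            theta = k * np.pi / n_orient
            real, imag = gabor(image, frequency=f, theta=theta)
            mag = np.hypot(real, imag)        # magnitude of the complex response
            means.append(mag.mean())
            variances.append(mag.var())
    return np.array(means), np.array(variances)

def rotation_normalize(means, variances, n_orient=6):
    # circular shift so the dominant orientation (largest mean energy) comes first
    m = means.reshape(-1, n_orient)
    v = variances.reshape(-1, n_orient)
    shift = -int(np.argmax(m.sum(axis=0)))
    return np.roll(m, shift, axis=1).ravel(), np.roll(v, shift, axis=1).ravel()

image = np.random.rand(64, 64)
feat = rotation_normalize(*gabor_features(image))
```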
