
Department of Computer Science and Engineering, National Institute of Technology, Warangal 506004

Learning Bayesian Classifiers


Using Differential Evolution algorithm for Variable Ordering

Project Guide: Dr. S. G. Sanjeevi, Associate Professor and Head of the Department

Shruti B (8772), Mouli C R K (8792), Divya B V (8773)

12/31/2011


CONTENTS
1. Introduction
   a. Bayesian Network
   b. K2 Algorithm
   c. Learning Variable Ordering (VO)
2. Previous Experiments
   a. Evolutionary Algorithms (EAs)
   b. VOGA (Variable Ordering Genetic Algorithm)
      i. What is VOGA?
      ii. How is it implemented?
      iii. Experiment
3. Scope
   a. Differential Evolution
      i. Algorithm
4. Conclusion
5. References



INTRODUCTION
Bayesian Network:
A Bayesian Network (BN) has a directed acyclic graph (DAG) structure G. Each node in the graph corresponds to a discrete random variable in the domain. An edge Y → X in the graph describes a parent-child relation in which Y is the parent and X is the child. All parents of X constitute the parent set of X, denoted by π(X). In addition to the graph, each node has a conditional probability table (CPT) specifying the probability of each possible state of the node given each possible combination of states of its parents. If a node has no parents, the table gives the marginal probabilities of the node.

When learning BNs from data, the BN variables represent the dataset attributes (or features). When using algorithms based on heuristic search, the initial order of the dataset attributes may be an important issue: some of these algorithms depend on this ordering to determine the arc directions, such that an earlier attribute (in an ordered list) is a possible parent only of the later ones.

Instead of encoding a joint probability distribution over a set of random variables, a Bayesian Classifier (BC) aims at correctly predicting the value of a designated discrete class variable given a vector of attributes (predictors). Methods for learning Bayesian Networks may be used to induce a BC, which is what is done in this work. The BN learning algorithm applied in our experiments is based on the K2 algorithm, which constructs a BN from data using a heuristic search.
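As a small illustration (the toy domain, variable names and probabilities below are ours, not from the report), a BN of this kind can be represented as a parent map plus one CPT per node:

# Minimal sketch of a Bayesian Network as data structures (toy example):
# a DAG given by parent sets plus one CPT per node.
parents = {
    "Rain": [],                          # root node: no parents
    "Sprinkler": ["Rain"],
    "WetGrass": ["Rain", "Sprinkler"],
}

# CPTs: a root node holds marginal probabilities; other nodes map each
# combination of parent states to a distribution over their own states.
cpt = {
    "Rain": {(): {"yes": 0.2, "no": 0.8}},
    "Sprinkler": {
        ("yes",): {"on": 0.01, "off": 0.99},
        ("no",): {"on": 0.40, "off": 0.60},
    },
    "WetGrass": {
        ("yes", "on"): {"wet": 0.99, "dry": 0.01},
        ("yes", "off"): {"wet": 0.80, "dry": 0.20},
        ("no", "on"): {"wet": 0.90, "dry": 0.10},
        ("no", "off"): {"wet": 0.00, "dry": 1.00},
    },
}

def prob(node, value, parent_values):
    """P(node = value | parents(node) = parent_values)."""
    return cpt[node][tuple(parent_values)][value]

print(prob("WetGrass", "wet", ["yes", "off"]))  # -> 0.8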

K2 Algorithm:
The K2 algorithm constructs a BN from data using a heuristic search. It receives as input a complete database and a VO. Given these inputs, the K2 algorithm searches for the BN structure that best represents the database. This algorithm is commonly applied due to its performance in terms of computational complexity (time) and its good results when an adequate VO is supplied. The attribute-preorder assumption is used to reduce the number of possible structures to be learned. In this sense, K2 uses an ordered list (containing all the attributes, including the class), which asserts that only the attributes positioned before a given attribute A may be parents of A. Hence, the first attribute in the list has no parents, i.e. it is a root node in the BN.
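As an illustration of this restriction (the attribute names are hypothetical), the following snippet shows how a VO bounds the candidate parents K2 may consider for each attribute:

# Sketch: given a variable ordering, only earlier attributes are candidate
# parents of a later one. The first attribute (here the class) has none.
ordering = ["class", "A1", "A2", "A3"]

def candidate_parents(attribute, ordering):
    """Attributes positioned before `attribute` in the ordering."""
    return ordering[:ordering.index(attribute)]

for a in ordering:
    print(a, "<-", candidate_parents(a, ordering))
# class <- []                      (root node)
# A1 <- ['class']
# A2 <- ['class', 'A1']
# A3 <- ['class', 'A1', 'A2']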


The algorithm uses a greedy method to search for the best structure. It begins as if every node had no parents. Then, beginning with the second attribute from the ordered list (the first one is a root node), the possible parents are tested and those that maximize the probability of the whole structure are added to the network. This process is repeated for all attributes in order to get the best possible structure. The K2 metric used to test each possible parent set for each variable is defined by the following equation:

g(x_i, \pi_i) = \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!

where each attribute x_i has r_i possible values \{v_{i1}, \dots, v_{i r_i}\}, D is a dataset with m objects, x_i has a set of parents \pi_i, q_i is the number of instantiations of \pi_i, and N_{ijk} is the number of objects in D in which x_i has the value v_{ik} and \pi_i is instantiated as w_{ij} (w_{ij} represents the j-th instantiation, relative to D, of the parents of x_i). Finally, N_{ij} = \sum_{k=1}^{r_i} N_{ijk}.
With the best structure already defined, the network conditional probabilities are determined. It is done using a Bayesian estimation of the (predefined) network structure probability. When dataset D has a distinguished class variable, K2 may be used as a BC learning algorithm. This is exactly our assumption.
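For concreteness, a minimal Python sketch of this score (computed in log space to avoid factorial overflow) and of K2's greedy parent-selection step is given below. The dataset layout (a list of attribute-to-value dicts), the `values` map and the cap on the number of parents are assumptions of ours, not part of the report.

import math
from itertools import product

def log_g(data, x, parents, values):
    """Log of the K2 metric g(x, parents), computed from dataset counts.
    data: list of records (dict attribute -> value); values: dict attribute -> list of states."""
    r = len(values[x])
    parent_states = list(product(*(values[p] for p in parents)))  # [()] when parents is empty
    score = 0.0
    for ps in parent_states:
        rows = [rec for rec in data if all(rec[p] == s for p, s in zip(parents, ps))]
        n_ij = len(rows)
        score += math.lgamma(r) - math.lgamma(n_ij + r)   # log[(r-1)! / (N_ij + r - 1)!]
        for v in values[x]:
            n_ijk = sum(1 for rec in rows if rec[x] == v)
            score += math.lgamma(n_ijk + 1)               # log(N_ijk!)
    return score

def k2_parents(data, x, predecessors, values, max_parents=3):
    """Greedy K2 step: repeatedly add the predecessor that most improves log g."""
    parents = []
    best = log_g(data, x, parents, values)
    while len(parents) < max_parents:
        candidates = [c for c in predecessors if c not in parents]
        if not candidates:
            break
        scored = [(log_g(data, x, parents + [c], values), c) for c in candidates]
        top_score, top_cand = max(scored)
        if top_score <= best:
            break
        best, parents = top_score, parents + [top_cand]
    return parents, best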

Learning Variable Ordering (VO):


Learning a Bayesian Network (BN) from data has become a very active research topic in the last decade. The search space for a BN with n variables grows exponentially with n, so finding the BN structure that best represents the dependences among the variables is not a trivial task. The problem is NP-complete, which makes it hard to identify the best solution for all application problems. To reduce the search space, some restrictions are usually imposed, and the resulting algorithms often obtain good results with acceptable computational effort. A very common restriction when learning a BN is the definition of a prior variable ordering (VO). The same situation arises when learning a Bayesian Classifier (BC) from data. We present a genetic algorithm, namely VOGA (Variable Ordering Genetic Algorithm), that optimizes the process of learning a BC from data by identifying a suitable VO. In general, genetic algorithms are able to identify and explore aspects of the environment in which the problem is embedded and to converge globally to excellent, or approximately excellent, solutions. Therefore, genetic algorithms are considered an efficient search and optimization tool for many different types of problems.


Several works propose hybrid GA/Bayes methods that use a GA to define an adequate VO:

- One work presented a genetic algorithm to search for the best variable ordering, in which each element of the population is a possible ordering and the fitness function is the K2 metric.
- Another implemented a GA for the problem of permutation of variables in BN learning and inference.
- A third considers a subgroup of the set of dependence/independence relations to obtain the variable ordering; the process is guided by genetic algorithms and simulated annealing.

Even with a number of works dealing with this issue, most of them are designed to learn unrestricted BNs. Our GA/Bayes hybrid approach (VOGA), on the other hand, is devoted to learning Bayesian Classifiers from data. In this sense, the class variable may play an interesting role in the definition of the variable ordering.



PREVIOUS EXPERIMENTS
Genetic algorithms such as VOGA and VOGA+ have been used to optimize the process of learning a BC from data by identifying a suitable VO. In these genetic algorithms, each element of the population is a possible ordering and its fitness is the K2 metric (g value). Evolutionary algorithms with canonical crossover and mutation have also been used to find an appropriate VO.

Evolutionary Algorithms (EAs):


EAs are computational models that solve a given problem by maintaining a changing population of chromosomes, each with its own level of fitness. A fitness function is used to measure the quality of each chromosome. Genetic algorithms are the most popular models of EAs; Differential Evolution algorithms are also a class of evolutionary algorithms.

VOGA (Variable Ordering Genetic Algorithm):


What is VOGA?
The main idea of the proposed method is to use a GA and the class variable information to optimize the variable ordering (VO) which will be used as input to learn a BC from data. In this sense, we fix the class variable as the first one in the VO. Subsequently, the GA is used to try to find the best ordering for the remaining variables. Our method uses a GA in which the chromosomes represent possible variable orderings. Each variable's identification (ID) is codified as an integer number. Therefore, each chromosome has (n − 1) genes, where n is the number of variables (including the class variable), and each gene is instantiated with a variable ID. Thus, each possible ordering may form a chromosome. The fitness function is given by the Bayesian score (the g function) defined in the K2 algorithm.

How is it implemented?
VOGA generates a random initial population. Each chromosome is evaluated by the K2 algorithm, whose g function is used as the fitness function. The best chromosomes are selected, and the next generation is produced using crossover and mutation operators. The process is repeated and, for each generation, the best ordering is stored. If there is no improvement after 10 generations, the algorithm stops and returns the best ordering found. A sketch of this loop is given below; the flowchart that follows summarizes the whole process.
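The report does not name the selection, crossover or mutation operators it uses, so the truncation selection, order-based crossover and swap mutation in this sketch, as well as the `k2_score` callback standing in for the K2 evaluation of an ordering, are our own assumptions.

import random

def voga(attributes, class_var, k2_score, pop_size=20, patience=10, max_gens=200):
    """GA sketch: evolve orderings of `attributes`; the class variable is fixed first.
    `k2_score(ordering)` is assumed to return the g value K2 obtains for that VO."""

    def random_chromosome():
        perm = attributes[:]          # a chromosome is a permutation of the non-class variables
        random.shuffle(perm)
        return perm

    def fitness(chrom):
        return k2_score([class_var] + chrom)

    def crossover(p1, p2):            # order-based crossover: child stays a valid permutation
        a, b = sorted(random.sample(range(len(p1) + 1), 2))
        middle = p1[a:b]
        rest = [g for g in p2 if g not in middle]
        return rest[:a] + middle + rest[a:]

    def mutate(chrom, rate=0.1):      # swap mutation: exchange two genes
        chrom = chrom[:]
        if random.random() < rate and len(chrom) > 1:
            i, j = random.sample(range(len(chrom)), 2)
            chrom[i], chrom[j] = chrom[j], chrom[i]
        return chrom

    population = [random_chromosome() for _ in range(pop_size)]
    best, best_fit, stagnant = None, float("-inf"), 0
    for _ in range(max_gens):
        ranked = sorted(population, key=fitness, reverse=True)
        top_fit = fitness(ranked[0])
        if top_fit > best_fit:
            best, best_fit, stagnant = ranked[0], top_fit, 0
        else:
            stagnant += 1
        if stagnant >= patience:      # no improvement for `patience` generations: stop
            break
        parents = ranked[: max(2, pop_size // 2)]        # keep the best chromosomes
        population = [mutate(crossover(*random.sample(parents, 2)))
                      for _ in range(pop_size)]
    return [class_var] + best, best_fit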



[Flow chart: Start → Read data → Initial population generation → Chromosomes evaluation → Stop? → if not, Selection → Crossover and Mutation → Chromosomes evaluation → Stop? again; if so, return the best VO → End]

In addition to the aforementioned VOGA algorithm, a slightly different version, namely VOGA+, was implemented, in which the initial population is not randomly generated. In VOGA+, more information about the class variable is used to optimize the initial population and, therefore, to try to obtain better BC structures (mainly in domains having many attributes). In order to define the VO of the initial population chromosomes, the χ² (chi-squared) statistical test is performed between each variable and the class variable (for this reason, VOGA+ can only be applied in a classification context, where there is a distinguished variable, namely the class variable). Thus, the strength of the dependence relationship between each variable and the class can be measured. Subsequently, the variables are sorted in decreasing order of their χ² scores. The first variable in the ordered list has the highest χ² score, i.e. it is the most dependent on the class. Obviously, the relation between the χ² statistical test and the best VO may not hold strictly, but the experiments show that good results can be achieved using this heuristic. Having defined the VO given by the χ² statistical test, all initial population chromosomes are defined using this VO (all chromosomes are identical).
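A sketch of this χ²-based initialization follows; the dataset layout (a list of attribute-to-value dicts) is an assumption on our part.

from collections import Counter

def chi2_statistic(data, attr, class_var):
    """Pearson chi-squared statistic between one attribute and the class variable."""
    joint = Counter((rec[attr], rec[class_var]) for rec in data)
    attr_tot = Counter(rec[attr] for rec in data)
    class_tot = Counter(rec[class_var] for rec in data)
    n = len(data)
    stat = 0.0
    for a, a_cnt in attr_tot.items():
        for c, c_cnt in class_tot.items():
            expected = a_cnt * c_cnt / n
            observed = joint[(a, c)]
            stat += (observed - expected) ** 2 / expected
    return stat

def chi2_ordering(data, attributes, class_var):
    """Attributes sorted by decreasing chi-squared score against the class (VOGA+ style VO)."""
    return sorted(attributes, key=lambda a: chi2_statistic(data, a, class_var), reverse=True)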



Experiment:
Seven domains were used in our simulations: two well-known Bayesian Network domains (Engine Fuel System and Asia) and five benchmark problems from the U. C. Irvine (UCI) repository, namely Balance, Breast-w, Congressional Voting Records (Voting), Vehicle and Iris, were used in the VO and classification tasks. The following table summarizes the dataset features.

Data        AT    IN      CL
Asia         8    15000    2
Balance      5      625    3
Breast-w    10      683    2
Engine       9    15000    2
Iris         5      150    3
Vehicle     19      846    4
Voting      17      232    2

Datasets Description with dataset name (Data), number of attributes plus class (AT), number of instances (IN) and number of classes (CL).

The experiments were conducted following the steps below.

1. Initially, the datasets were used as input to the K2 algorithm. The VO was the original one given in the file. The Bayesian score (g) obtained for each dataset was stored.
2. The same datasets used in step 1 were used as input to VOGA and VOGA+. The Bayesian score (g) obtained for each dataset and the number of generations needed to reach the solution were stored.

The results achieved in steps 1 and 2 are presented in the following tables, respectively.

            K2        VOGA      VOGA+
Asia       -33610    -33610    -33608*
Balance     -4457*    -4457*    -4457*
Breast-w    -8159*    -8159*    -8159*
Engine     -33809    -33755*   -33755*
Iris        -2026*    -2026*    -2026*
Vehicle    -10357    -10006     -9956*
Voting      -1749     -1727     -1724*

Bayesian Score (g function) of each achieved Bayesian Network structure. The best results for each dataset are marked with an asterisk (*).

Analyzing the results presented in the table above, it is possible to infer that, as far as the Bayesian score (g function) is concerned, in all performed experiments VOGA produced results at least as good as the ones produced by K2, and in 3 out of the 7 datasets VOGA improved the results obtained using K2.


In addition, VOGA+ performed at least as well as VOGA, and in 3 out of the 7 datasets VOGA+ improved the results obtained using VOGA. Another interesting point revealed by the scores is that the datasets with a higher number of attributes, namely Vehicle (19 attributes) and Voting (17 attributes), favored the proposed method (VOGA), mainly when using the enhanced version VOGA+.

            VOGA    VOGA+
Asia         11      19
Balance      11      11
Breast-w     11      11
Engine       13      12
Iris         11      11
Vehicle      11      15
Voting       38       6

Number of generations needed until convergence.

Where the number of generations is concerned, in 4 out of the 7 datasets (Balance, Breast-w, Engine and Iris) VOGA and VOGA+ presented (mostly) the same results. The other 3 datasets (Asia, Vehicle and Voting) revealed that, when the number of generations was not the same for VOGA and VOGA+, the Bayesian score obtained by the latter was always better.



SCOPE
Scope: replacing the Genetic Algorithm with the Differential Evolution algorithm, aiming at better convergence and, possibly, a better Variable Ordering.

Differential Evolution:
A basic variant of the DE algorithm works by having a population of candidate solutions (called agents). These agents are moved around in the search-space by using simple mathematical formulae to combine the positions of existing agents from the population. If the new position of an agent is an improvement it is accepted and forms part of the population, otherwise the new position is simply discarded.

Algorithm:
Let x designate a candidate solution (agent) in the population. The basic DE algorithm can then be described as follows:

1. Initialize all agents x with random positions in the search space.
2. Until a termination criterion is met (e.g. number of iterations performed, or adequate fitness reached), repeat the following for each agent x in the population:
   a. Pick three agents a, b and c from the population at random; they must be distinct from each other as well as from agent x.
   b. Pick a random index R ∈ {1, ..., n} (n being the dimensionality of the problem to be optimized).
   c. Compute the agent's potentially new position y = (y1, ..., yn) as follows: for each i, pick a uniformly distributed random number ri ∈ (0, 1); if ri < CR or i = R then set yi = ai + F(bi − ci), otherwise set yi = xi.


   d. If f(y) < f(x) then replace the agent with the improved candidate solution, that is, replace x with y in the population.
3. Pick the agent from the population that has the highest fitness (or lowest cost) and return it as the best found candidate solution.

Note that F is called the differential weight and CR is called the crossover probability; both these parameters are selectable by the practitioner, along with the population size.
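For reference, a minimal Python sketch of this basic DE variant is given below; the sphere cost function and the parameter values are purely illustrative and not taken from the report.

import random

def differential_evolution(cost, dim, bounds, pop_size=20, f=0.8, cr=0.9, iterations=200):
    """Basic DE sketch: minimize `cost` over a `dim`-dimensional box given by `bounds`.
    f is the differential weight F, cr the crossover probability CR."""
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(iterations):
        for idx, x in enumerate(pop):
            # three agents a, b, c distinct from each other and from x
            a, b, c = random.sample([p for i, p in enumerate(pop) if i != idx], 3)
            r_index = random.randrange(dim)          # index R that is always taken from the mutant
            y = [a[i] + f * (b[i] - c[i]) if (random.random() < cr or i == r_index) else x[i]
                 for i in range(dim)]
            if cost(y) < cost(x):                    # keep the new position only if it improves
                pop[idx] = y
    return min(pop, key=cost)

# Illustrative use on the sphere function, whose minimum is at the origin.
print(differential_evolution(lambda v: sum(t * t for t in v), dim=5, bounds=(-5.0, 5.0)))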

CONCLUSION
We plan experiments on the use of differential evolution to find a suitable variable ordering, possibly improving on the genetic algorithm, and to extend the results to Bayesian networks in general.



REFERENCES
SANTOS, E. B.; HRUSCHKA JR., E. R.; EBECKEN. Evolutionary Algorithm Using Random Multi-point Crossover Operator for Learning Bayesian Network Structures. In: 9TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, 2010.

SANTOS, E. B.; HRUSCHKA JR., E. R. VOGA: Variable Ordering Genetic Algorithm for Learning Bayesian Classifiers. In: 6TH INTERNATIONAL CONFERENCE ON HYBRID INTELLIGENT SYSTEMS - HIS 2006, 2006, Auckland.

