A SIMPLE INTRODUCTION TO WEKA
Contents
What is WEKA?
WEKA Explorer
Preprocessing the data
Classification
Clustering
Association Rules
Attribute Selection
Data Visualization
1 What is WEKA?
Waikato Environment for Knowledge Analysis

Developed by the Department of Computer Science, University of Waikato, New Zealand.
Weka is also a bird found only on the
islands of New Zealand.
A collection of machine learning algorithms
for data mining tasks.

Download and Install WEKA

Website: http://www.cs.waikato.ac.nz/~ml/weka/index.html

Platform independent
WEKA GUI

Explorer: exploratory data analysis

Experimenter: experimental environment

KnowledgeFlow: new process-model-inspired interface

Simple CLI: command line interface


2 WEKA Explorer
Preprocessing the data

Classification

Clustering

Association Rules

Attribute Selection

Data Visualization
Pre-Processing the data

Data can be imported from a file in various formats.
ARFF - Attribute-Relation File Format
CSV - Comma Separated Values

Data can also be read from a URL or from an SQL database.

Filters are used for pre-processing


WEKA only deals with flat files

@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
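To make the ARFF layout above concrete, here is a minimal sketch of a reader written in plain Python. It is not WEKA's own loader: the function name parse_arff is hypothetical, and the sketch assumes only numeric and nominal attributes, unquoted values, and "?" for missing values, as in the example.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Assumption: only 'numeric' and nominal '{a, b}' attributes, no quoting,
    no sparse rows. Real ARFF files can contain more than this handles.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = {}
            for (name, kind), value in zip(attributes, values):
                if value == "?":              # "?" marks a missing value
                    row[name] = None
                elif kind == "numeric":
                    row[name] = float(value)
                else:
                    row[name] = value         # nominal value kept as a string
            rows.append(row)
            continue
        keyword, _, rest = line.partition(" ")
        keyword = keyword.lower()
        if keyword == "@relation":
            relation = rest.strip()
        elif keyword == "@attribute":
            name, _, spec = rest.strip().partition(" ")
            spec = spec.strip()
            if spec.startswith("{"):          # nominal: list of allowed values
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            else:
                kind = spec.lower()
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


# A shortened version of the heart-disease example, used only for illustration.
SAMPLE = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute cholesterol numeric
@attribute class { present, not_present}
@data
63,male,233,not_present
38,female,?,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation)
print(rows[1])
```

The header lines describe the columns and everything after @data is one instance per line, which is why the file counts as "flat".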
University of Waikato
4/13/17 13
Building Classifiers

Classifiers in WEKA are models for predicting nominal or numeric quantities.

Implemented learning schemes include:

Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, and Bayes nets.
Decision Tree Induction: Training Dataset
Output: A Decision Tree for buys_computer
age?
    <=30: student?
        no: no
        yes: yes
    31..40: yes
    >40: credit rating?
        excellent: no
        fair: yes
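A decision tree like the one for buys_computer is just a nested sequence of tests. The sketch below hand-codes it as a Python function. It assumes the usual textbook reading of the tree (age 31..40 always predicts "yes", an "excellent" credit rating predicts "no" for age over 40). The function name and argument types are chosen for illustration only.

```python
def buys_computer(age, student, credit_rating):
    """Follow the buys_computer tree from the root to a leaf.

    age: integer years; student: bool; credit_rating: "excellent" or "fair".
    """
    if age <= 30:
        # Left branch: the decision depends on whether the person is a student.
        return "yes" if student else "no"
    if age <= 40:
        # Middle branch (31..40) is a leaf.
        return "yes"
    # Right branch (>40): the decision depends on the credit rating.
    return "yes" if credit_rating == "fair" else "no"


print(buys_computer(25, True, "fair"))
print(buys_computer(45, False, "excellent"))
```

A learner such as a decision tree inducer builds this structure from the training data. Here it is written by hand only to show how a finished tree is applied to a new instance.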
Clustering data

Finding groups of similar instances in a dataset

Implemented schemes in WEKA are:

k-Means, EM, Cobweb, X-means, FarthestFirst
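As an illustration of what "finding groups of similar instances" means, here is a minimal k-means sketch in plain Python. It is not WEKA's implementation: the initialisation (first k points as starting centroids), the squared Euclidean distance, and the toy points are all assumptions made for this example.

```python
def kmeans(points, k, iterations=100):
    """Cluster numeric tuples into k groups; returns (centroids, assignment)."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points seed the clusters
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_assignment == assignment:        # converged: nothing moved
            break
        assignment = new_assignment
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical 2-d data with two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 10.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
print(centroids)
```

The two steps in the loop alternate until the assignment stops changing. The result depends on the starting centroids, which is why practical implementations usually randomise them.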
Finding Associations
WEKA contains an implementation of the Apriori algorithm for learning association rules.

Works only with discrete data

Can identify statistical dependencies between groups of attributes.
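The idea behind Apriori can be sketched in a few lines: count how often item sets occur (support), grow them level by level while keeping only frequent ones, then score a rule by its confidence. The code below is a simplified illustration, not WEKA's Apriori class. The function names and the basket data are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search for item sets whose support is at least min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    current = [frozenset([i]) for i in items]   # candidate sets of size 1
    frequent = {}
    size = 1
    while current:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        size += 1
        # Join: merge frequent sets into candidates one item larger.
        candidates = {a | b for a in level for b in level if len(a | b) == size}
        # Prune: every subset one item smaller must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent


def confidence(frequent, antecedent, consequent):
    """Confidence of antecedent => consequent: support(both) / support(antecedent)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


# Hypothetical shopping baskets, for illustration only.
baskets = [
    {"milk", "bread"},
    {"milk", "butter", "bread"},
    {"bread", "eggs"},
    {"milk", "butter"},
    {"milk", "butter", "bread", "eggs"},
]
frequent = frequent_itemsets(baskets, min_support=0.4)
print(confidence(frequent, {"butter"}, {"milk"}))
```

Each basket is a set of discrete items, which reflects the restriction to discrete data: numeric attributes would have to be discretized before this kind of counting makes sense.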
Attribute Selection
Used to determine the most predictive attributes

Consists of two parts

1. A search method: best-first, forward selection, random, exhaustive, genetic algorithm, etc.

2. An evaluation method: correlation-based, wrapper, information gain, etc.
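The two parts can be seen in a small sketch: information gain serves as the evaluation method, and a plain ranking of single attributes stands in for the search method. This is an illustration in plain Python rather than WEKA's code. The function names and the four-row dataset are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Evaluation method: entropy reduction from splitting on one attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def rank_attributes(rows, target):
    """Search method (simplest possible): score each attribute and sort."""
    names = [a for a in rows[0] if a != target]
    scored = [(a, information_gain(rows, a, target)) for a in names]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical data: 'student' determines the class, 'credit' tells nothing.
rows = [
    {"student": "yes", "credit": "fair", "class": "yes"},
    {"student": "yes", "credit": "excellent", "class": "yes"},
    {"student": "no", "credit": "fair", "class": "no"},
    {"student": "no", "credit": "excellent", "class": "no"},
]
ranked = rank_attributes(rows, "class")
print(ranked)
```

Ranking scores attributes one at a time. Search methods such as best-first or forward selection instead evaluate subsets of attributes, which lets them account for redundancy between attributes.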
Data Visualization
WEKA can visualize single attributes (1-d) and
pairs of attributes (2-d)

Color-coded class values

Use of jitter option

Zoom-in function
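Jitter helps when many instances share exactly the same coordinates and would otherwise be drawn on top of each other. A rough sketch of the idea, assuming uniform random noise of a chosen size (the function name and noise model are illustrative, not taken from WEKA):

```python
import random


def jitter(values, amount, seed=0):
    """Return a copy of values with uniform noise in [-amount, amount] added.

    Only the plotted positions change; the underlying data stays untouched.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Four instances with the same attribute value would overlap in a scatter plot.
original = [2.0, 2.0, 2.0, 2.0]
jittered = jitter(original, amount=0.1)
print(jittered)
```

After jittering, the four points land at slightly different positions, so the plot shows how many instances sit at that value.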
The End