
Algorithm for Constructing a Decision Tree



TreeGrowth(TR, F)
  if stopping_cond(TR, F) = true then
    leaf = createNode()
    leaf.label = Classify(TR)
    return leaf
  else
    root = createNode()
    root.test_cond = find_best_split(TR, F)
    let V = {v | v is a possible outcome of root.test_cond}
    for each v ∈ V do
      TRv = {tr | root.test_cond(tr) = v and tr ∈ TR}
      child = TreeGrowth(TRv, F)
      add child as a descendant of root and label the edge (root → child) as v
    end for
  end if
  return root
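As a concrete illustration, here is a minimal Python sketch of the same procedure. The names Node and tree_growth, and the toy helpers stopping_cond, classify, and find_best_split, are this sketch's own; in particular, find_best_split is only a placeholder that picks the first attribute instead of computing the best split.

from collections import Counter

class Node:
    """A decision tree node: a leaf with a label, or an internal node with a test."""
    def __init__(self, label=None, test_cond=None):
        self.label = label          # class label (leaf nodes only)
        self.test_cond = test_cond  # attribute name tested at this node
        self.children = {}          # outcome value v -> child Node

def stopping_cond(TR, F):
    # Stop when no attributes remain or all records share one class label.
    return len(F) == 0 or len({label for _, label in TR}) <= 1

def classify(TR):
    # Majority class label among the records in TR.
    return Counter(label for _, label in TR).most_common(1)[0][0]

def find_best_split(TR, F):
    # Placeholder: a real implementation would pick the attribute giving
    # the largest impurity reduction (e.g. Gini or entropy gain).
    return F[0]

def tree_growth(TR, F):
    # TR: list of (record_dict, class_label) pairs; F: list of attribute names.
    if stopping_cond(TR, F):
        return Node(label=classify(TR))
    root = Node(test_cond=find_best_split(TR, F))
    # Simplification for this sketch: drop the chosen attribute from F;
    # the pseudocode above passes F along unchanged.
    remaining = [f for f in F if f != root.test_cond]
    V = {rec[root.test_cond] for rec, _ in TR}          # possible outcomes
    for v in V:
        TRv = [(rec, lab) for rec, lab in TR if rec[root.test_cond] == v]
        root.children[v] = tree_growth(TRv, remaining)  # edge labelled v
    return root

# Toy usage: two binary attributes, class follows attribute "a".
data = [({"a": 1, "b": 0}, "+"), ({"a": 0, "b": 0}, "-"),
        ({"a": 1, "b": 1}, "+"), ({"a": 0, "b": 1}, "-")]
tree = tree_growth(data, ["a", "b"])
print(tree.test_cond)   # "a"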
Characteristics of Decision Tree Induction



• 1. Decision tree induction (DTI) is a nonparametric approach for building classification models.
– It does not require any prior assumptions about the distribution of the data.

• 2. The algorithm presented so far uses a greedy, top-down, recursive partitioning strategy to induce a reasonable, though not necessarily optimal, solution.

• 3. Techniques developed for constructing decision trees are computationally inexpensive, making it possible to build models quickly even when the data sets are very large.
• 4. Decision trees, especially small ones, are relatively easy to interpret.
• 5. Decision trees are robust to the presence of noise.
• 6. The presence of redundant attributes does not adversely affect the accuracy of the decision tree.
• An attribute is redundant if it is strongly correlated with another attribute in the data.
• If the data contains irrelevant attributes, feature selection techniques can help improve the accuracy of the decision tree by eliminating such attributes during preprocessing (a small correlation check is sketched below).
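One simple way to flag a potentially redundant attribute is to check its correlation with another attribute. A small illustrative sketch follows; pearson_corr and the sample columns attr_a and attr_b are made up for this example.

def pearson_corr(xs, ys):
    # Pearson correlation between two attribute columns.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# attr_b is (almost) a rescaled copy of attr_a, so it carries little
# extra information for the classification task.
attr_a = [1.0, 2.0, 3.0, 4.0, 5.0]
attr_b = [2.1, 3.9, 6.0, 8.1, 9.9]
print(pearson_corr(attr_a, attr_b))   # close to 1.0 -> redundant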



• 7. Data Fragmentation: the number of instances gets smaller as you traverse down the tree.
• The number of instances at the leaf nodes could be too small to make any statistically significant decision.
• One possible solution is to disallow further splitting when the number of records falls below a certain threshold, as sketched below.
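A hedged sketch of such a threshold, written as a variant of the stopping_cond helper from the earlier TreeGrowth sketch; MIN_RECORDS and its value of 20 are purely illustrative.

MIN_RECORDS = 20   # illustrative threshold, not a value from the slides

def stopping_cond(TR, F, min_records=MIN_RECORDS):
    # Stop splitting when the node is pure, no attributes remain, or the
    # number of records has fallen below the threshold, so that leaves
    # are not based on statistically insignificant sample sizes.
    labels = {label for _, label in TR}
    return len(F) == 0 or len(labels) <= 1 or len(TR) < min_records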



9. Decision Boundary
[Figure: training records plotted in the (x, y) unit square alongside a decision tree with tests x < 0.43 and y < 0.47; the two leaves shown have pure class distributions (4:0 and 0:4).]
• The test conditions considered so far involve only a single attribute at a time.
• Tree construction can therefore be viewed as partitioning the attribute space into disjoint regions until each region contains only records of the same class label.
• The border between two neighboring regions of different classes is known as the decision boundary.
• The decision boundary is parallel to the axes because each test condition involves a single attribute at a time. This limits the expressiveness of the decision tree representation for modeling complex relationships among continuous attributes.
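For illustration, the single-attribute tests from the figure can be written as nested comparisons. The thresholds 0.43 and 0.47 come from the figure, while the class labels returned below are assumptions for this sketch, since the full tree is not recoverable from the slide.

def classify_point(x, y):
    # Each test involves one attribute, so every decision boundary is a
    # line parallel to an axis (x = 0.43 or y = 0.47) and the attribute
    # space is carved into rectangular regions.
    if x < 0.43:                # vertical boundary at x = 0.43
        if y < 0.47:            # horizontal boundary at y = 0.47
            return "+"          # assumed label for this rectangle
        return "o"              # assumed label
    return "o"                  # assumed label for the region x >= 0.43

print(classify_point(0.2, 0.3))   # "+"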
10. Oblique Decision Trees

[Figure: an oblique decision tree with root test x + y < 1; the "yes" branch leads to Class = + and the "no" branch to the other class (label truncated on the slide).]

• The data shown in the figure cannot be classified effectively by a decision tree that uses a single-attribute test condition at a time.

• In an oblique decision tree, the test condition may involve multiple attributes.
• This gives a more expressive representation.
• However, finding the optimal test condition is computationally expensive.
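A corresponding sketch of the oblique test from the figure; the label on the "no" branch is assumed, since it is truncated on the slide.

def oblique_classify(x, y):
    # The root test x + y < 1 involves both attributes, so the decision
    # boundary is the diagonal line x + y = 1, not an axis-parallel cut.
    if x + y < 1:
        return "+"      # "yes" branch: Class = + (as in the figure)
    return "-"          # "no" branch: the other class (label assumed)

print(oblique_classify(0.3, 0.3))   # "+"
print(oblique_classify(0.8, 0.9))   # "-"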
11. Tree Replication

[Figure: a decision tree rooted at P with internal nodes Q, R, and S; the same subtree rooted at S (with leaves 0 and 1) appears in more than one branch of the tree.]

• The same subtree appears in multiple branches.
• This makes the decision tree more difficult to interpret.
• Such a situation arises because the construction of the tree relies on a single attribute test condition at each internal node.
• 8. Expressiveness: decision trees provide an expressive representation for learning discrete-valued functions.
– But they do not generalize well to certain types of Boolean functions.
• Example: the parity function (a small sketch follows this list):
– Class = 1 if there is an even number of Boolean attributes with truth value = True.
– Class = 0 if there is an odd number of Boolean attributes with truth value = True.
• For accurate modeling, the tree must be a complete tree with 2^d nodes, where d is the number of Boolean attributes.
• (Q: Draw a decision tree for a parity function with 4 Boolean attributes A, B, C, D.)

• Decision trees are not expressive enough for modeling continuous variables.
– Particularly when the test condition involves only a single attribute at a time.
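A small sketch that generates the parity labelling for the four Boolean attributes A, B, C, D from the exercise above; parity_label is an illustrative helper, not something from the slides.

from itertools import product

def parity_label(record):
    # Class = 1 if an even number of attributes are True, else 0.
    return 1 if sum(record.values()) % 2 == 0 else 0

# All 2**4 = 16 combinations of the four Boolean attributes A, B, C, D.
# Flipping any single attribute flips the class, so no test on a proper
# subset of the attributes can separate the classes early; a full tree
# is required.
for values in product([False, True], repeat=4):
    record = dict(zip("ABCD", values))
    print(record, "->", parity_label(record))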
• 12. Studies have shown that the choice of impurity measure does not significantly affect the performance of decision tree induction algorithms.
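For reference, a small sketch comparing two common impurity measures, the Gini index and entropy, on the same class distributions; the helper names and example counts are illustrative.

import math

def gini(counts):
    # Gini index: 1 - sum of squared class proportions.
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    # Entropy: -sum of p * log2(p) over classes with nonzero count.
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

for counts in [(10, 0), (7, 3), (5, 5)]:
    print(counts, "gini =", round(gini(counts), 3),
          "entropy =", round(entropy(counts), 3))

Both measures are zero for a pure node and maximal for a uniform class distribution, and they rank the three example distributions identically, which is consistent with the observation above.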

