C4.5 Classification

C4.5 is a suite of algorithms for classification problems in machine learning and data mining. It is targeted at supervised learning: given an attribute-valued dataset in which each instance is described by a collection of attributes and belongs to exactly one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes that can be applied to classify new, unseen instances.

C4.5, designed by J. Ross Quinlan, is so named because it is a descendant of the ID3 approach to inducing decision trees, which in turn is the third incarnation in a series of “iterative dichotomizers.” A decision tree is a series of questions arranged systematically so that each question queries an attribute and branches on the value of that attribute. The leaves of the tree hold predictions of the class variable. The algorithm is given below.

    Input: an attribute-valued dataset D
    Tree = {}
    if D is “pure” OR other stopping criteria met then
        terminate
    end if
    for all attributes a ∈ D do
        compute information-theoretic criteria if we split on a
    end for
    a_best = best attribute according to the criteria computed above
    Tree = create a decision node that tests a_best in the root
    D_v = induced sub-datasets from D based on a_best
    for all D_v do
        Tree_v = C4.5(D_v)
        attach Tree_v to the corresponding branch of Tree
    end for
    return Tree
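
To make the recursion concrete, here is a minimal Python sketch of the steps above. It rests on several assumptions that are not part of the pseudocode: all attributes are categorical, the "information-theoretic criteria" is taken to be gain ratio (the criterion usually associated with C4.5), and a stopped branch returns the majority class as a leaf. The function names (`entropy`, `partition`, `gain_ratio`, `build_tree`, `classify`) and the toy dataset are hypothetical, chosen only for illustration. The full C4.5 system also handles continuous attributes, missing values, and pruning, none of which is attempted here.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(values).values())

def partition(rows, attr):
    """Group rows into sub-datasets D_v, one per observed value v of attr."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups

def gain_ratio(rows, attr, target):
    """Information gain from splitting on attr, divided by split information."""
    total = len(rows)
    remainder = sum(len(g) / total * entropy([r[target] for r in g])
                    for g in partition(rows, attr).values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, attrs, target):
    """Recursively induce a tree; leaves are class labels, nodes are dicts."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria (assumed): D is pure, or no attributes are left.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Score every attribute and keep the best one.
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best, target) <= 0:
        return majority  # no split is informative, so stop here
    remaining = [a for a in attrs if a != best]
    # Recurse on each induced sub-dataset and attach it to its branch.
    branches = {v: build_tree(sub, remaining, target)
                for v, sub in partition(rows, best).items()}
    return {"attr": best, "branches": branches, "default": majority}

def classify(tree, instance):
    """Follow branches until a leaf; fall back to the node's majority class
    when the instance has an attribute value not seen in training."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(instance[tree["attr"]], tree["default"])
    return tree

# Toy dataset, invented for this example.
data = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "overcast", "windy": "yes", "play": "yes"},
    {"outlook": "rainy",    "windy": "no",  "play": "yes"},
    {"outlook": "rainy",    "windy": "yes", "play": "no"},
]
tree = build_tree(data, ["outlook", "windy"], "play")
prediction = classify(tree, {"outlook": "rainy", "windy": "yes"})
```

On this toy data the root tests `outlook`, since it has the higher gain ratio, and only the `rainy` branch needs a second test on `windy`. Returning a majority-class leaf is one way to read the "terminate" step; the pseudocode itself does not say what a terminated branch returns.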
