C4.5 is a suite of algorithms for classification problems in machine learning and data mining. It is targeted at supervised learning: given an attribute-valued dataset in which instances are described by collections of attributes and belong to one of a set of mutually exclusive classes, C4.5 learns a mapping from attribute values to classes that can be applied to classify new, unseen instances.

C4.5, designed by J. Ross Quinlan, is so named because it is a descendant of the ID3 approach to inducing decision trees, which in turn is the third incarnation in a series of “iterative dichotomizers.” A decision tree is a series of questions systematically arranged so that each question queries an attribute and branches based on the value of that attribute. The leaves of the tree hold predictions of the class variable. The algorithm is given below.
Input: an attribute-valued dataset D
Tree = {}
if D is “pure” OR other stopping criteria met then
    terminate
end if
for all attributes a ∈ D do
    Compute information-theoretic criteria if we split on a
end for
a_best = best attribute according to the criteria computed above
Tree = create a decision node that tests a_best in the root
D_v = induced sub-datasets from D based on a_best
for all D_v do
    Tree_v = C4.5(D_v)
    Attach Tree_v to the corresponding branch of Tree
end for
return Tree
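
The recursion above can be sketched in Python. This is a minimal illustration under stated assumptions, not Quinlan's implementation: it handles only categorical attributes, uses gain ratio as the information-theoretic criterion, stops when a node is pure, when no attributes remain, or when no split yields positive gain, and returns a majority-class leaf where the listing says "terminate". The names build_tree, gain_ratio, partition, and classify, the dictionary layout of the tree, and the toy dataset are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def partition(rows, attr):
    """Group rows into sub-datasets D_v, one per observed value v of attr."""
    parts = {}
    for row in rows:
        parts.setdefault(row[attr], []).append(row)
    return parts


def gain_ratio(rows, attr, target):
    """Information gain of splitting on attr, divided by the split information."""
    total = len(rows)
    parts = partition(rows, attr)
    remainder = sum(len(p) / total * entropy([r[target] for r in p])
                    for p in parts.values())
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attrs, target):
    """Recursively induce a tree; a leaf is a class label, a node is a dict."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria (assumed): pure node, or no attributes left to test.
    if len(set(labels)) == 1 or not attrs:
        return majority
    # Score every candidate attribute and keep the best one (a_best).
    best = max(attrs, key=lambda a: gain_ratio(rows, a, target))
    if gain_ratio(rows, best, target) <= 0:
        return majority
    remaining = [a for a in attrs if a != best]
    # Recurse on each induced sub-dataset and attach it to its branch.
    branches = {value: build_tree(subset, remaining, target)
                for value, subset in partition(rows, best).items()}
    return {"attr": best, "branches": branches, "default": majority}


def classify(tree, instance):
    """Follow branches until a leaf; fall back to the node's majority class."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(instance[tree["attr"]], tree["default"])
    return tree


# Toy dataset, invented for illustration only.
data = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "overcast", "windy": False, "play": "yes"},
    {"outlook": "overcast", "windy": True, "play": "yes"},
    {"outlook": "rainy", "windy": False, "play": "yes"},
    {"outlook": "rainy", "windy": True, "play": "no"},
]
tree = build_tree(data, ["outlook", "windy"], "play")
print(classify(tree, {"outlook": "rainy", "windy": True}))  # prints "no"
```

On this toy data the root tests outlook, and only the rainy branch needs a second test on windy. The sketch leaves out continuous attributes, missing values, and pruning, all of which a full C4.5 implementation would need to address.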