Interpreting Fuzzy Models: the Discriminative Power of Input Features
Abstract: An important part of interpreting a decision process lies in ascertaining the influence of the input features, that is, how much the implemented model relies on a given input feature to perform the desired task. Recently, data analysis techniques based on fuzzy logic have gained attention because of their interpretability. Many real-world applications, however, have very high dimensionality and require very complex decision boundaries. In such cases the number of fuzzy rules can proliferate, and the easy interpretability of the fuzzy model can progressively disappear.
A method is presented that quantifies the discriminative power of the input features in a fuzzy model. The proposed quantification helps the interpretation of fuzzy models constructed on high-dimensional and very fragmented training sets. First, a measure of the information contained in the fuzzy model is defined on the basis of its fuzzy rules. The classification is then performed along one of the input features, that is, the fuzzy rules are split according to that feature's linguistic values. For each linguistic value, a fuzzy sub-model is generated from the original fuzzy model. The average information contained in these fuzzy sub-models is measured, and its comparison with the information measure of the original fuzzy model quantifies the information gain derived from the classification performed on the selected input feature. This information gain characterizes the discriminative power of that input feature. The proposed information gain can therefore be used to gain better insight into the selected fuzzy classification strategy, even in very high-dimensional cases, and possibly to reduce the input dimension.
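The procedure described above can be sketched in a few lines. The abstract does not specify the information measure, so this sketch assumes the Shannon entropy of the class distribution, with each rule contributing a weight to its class. The rule layout (`antecedent`, `cls`, `weight`) and the function names `entropy` and `information_gain` are hypothetical and chosen only for illustration; the paper's actual measure may differ.

```python
from collections import defaultdict
from math import log2


def entropy(rules):
    """Assumed information measure: Shannon entropy (in bits) of the
    class distribution, with each rule contributing its weight."""
    class_weight = defaultdict(float)
    for rule in rules:
        class_weight[rule["cls"]] += rule["weight"]
    total = sum(class_weight.values())
    if total == 0:
        return 0.0
    return -sum(
        (w / total) * log2(w / total) for w in class_weight.values() if w > 0
    )


def information_gain(rules, feature):
    """Split the rules by the linguistic value of `feature`, average the
    entropy of the resulting sub-models (weighted by their share of the
    total rule weight), and subtract it from the entropy of the full model."""
    total = sum(rule["weight"] for rule in rules)
    if total == 0:
        return 0.0
    sub_models = defaultdict(list)
    for rule in rules:
        sub_models[rule["antecedent"][feature]].append(rule)
    average = sum(
        sum(r["weight"] for r in sub) / total * entropy(sub)
        for sub in sub_models.values()
    )
    return entropy(rules) - average


# Hypothetical toy model: the class depends only on x1, never on x2.
rules = [
    {"antecedent": {"x1": "low", "x2": "low"}, "cls": "A", "weight": 1.0},
    {"antecedent": {"x1": "low", "x2": "high"}, "cls": "A", "weight": 1.0},
    {"antecedent": {"x1": "high", "x2": "low"}, "cls": "B", "weight": 1.0},
    {"antecedent": {"x1": "high", "x2": "high"}, "cls": "B", "weight": 1.0},
]

gain_x1 = information_gain(rules, "x1")
gain_x2 = information_gain(rules, "x2")
```

In this toy model, splitting on `x1` yields sub-models that each contain a single class, so the gain equals the full entropy of the model, whereas splitting on `x2` leaves both classes mixed in every sub-model and the gain is zero. Under these assumptions, `x1` would be ranked as the discriminative feature and `x2` as a candidate for removal.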
Several analyses of artificial and real-world data are reported as examples to illustrate the characteristics and potential of the proposed algorithm. As real-world examples, the most informative electrocardiographic measures are identified for an arrhythmia classification problem, and the role of duration, amplitude and pitch variations of syllabic nuclei in American English spoken sentences is investigated for prosodic stress classification.