Feature selection is an important topic in machine-learning-based diagnosis: it aims to remove irrelevant features so that diagnostic systems perform well. The behaviour of diagnostic models can be sensitive to the number of features, and a subset of significant features can represent the problem better than the entire set. Consequently, algorithms that identify these features are valuable contributions. This work addresses the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches in which the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the resulting clusters are selected as representative attributes. The selection algorithm proposes centroid candidates through a random process, which brings in the exploration inherent in random search. A hierarchical procedure is proposed for implementing this algorithm. At each level of the hierarchy, the entire set of available attributes is split into disjoint subsets and the selection process is applied to each subset. Once the significant attributes have been proposed for each subset, they form a new set of available attributes and the selection process runs again at the next level. The hierarchical implementation aims to refine the search at each level over a reduced set of selected attributes, while also reducing computation time. The approach is tested with real data collected from a test bed; the results show that the diagnosis precision of a Random Forest based classifier is over 98 % with only 12 % of the attributes from the available set.
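The hierarchical procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the formulas, so the relative dependency measure (distinct values of one attribute divided by distinct value pairs), the symmetric dissimilarity, the cost used to compare centroid candidates, and all names and parameters (`group_size`, `k`, `target`, `trials`) are assumptions made for illustration only.

```python
import random


def relative_dependency(data, a, b):
    """Assumed measure: distinct values of column a over distinct (a, b) pairs."""
    distinct_a = {row[a] for row in data}
    distinct_ab = {(row[a], row[b]) for row in data}
    return len(distinct_a) / len(distinct_ab)


def dissimilarity(data, a, b):
    """Assumed symmetric dissimilarity: 0 when each attribute determines the other."""
    return 1.0 - 0.5 * (relative_dependency(data, a, b)
                        + relative_dependency(data, b, a))


def select_centroids(data, attrs, k, trials, rng):
    """Random search: propose k centroid candidates several times and keep the
    set with the lowest total dissimilarity from each attribute to its nearest
    centroid (an assumed cost function)."""
    if len(attrs) <= k:
        return list(attrs)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        candidates = rng.sample(attrs, k)
        cost = sum(min(dissimilarity(data, a, c) for c in candidates)
                   for a in attrs)
        if cost < best_cost:
            best, best_cost = candidates, cost
    return best


def hierarchical_select(data, attrs, group_size, k, target, trials=20, seed=0):
    """At each level, split the available attributes into disjoint subsets,
    select centroids in each subset, and pass their union to the next level."""
    rng = random.Random(seed)
    current = list(attrs)
    while len(current) > target:
        rng.shuffle(current)
        groups = [current[i:i + group_size]
                  for i in range(0, len(current), group_size)]
        selected = []
        for group in groups:
            selected.extend(select_centroids(data, group, k, trials, rng))
        if len(selected) >= len(current):
            break  # this level did not reduce the set, so stop
        current = selected
    return sorted(current)
```

Here `data` is a list of rows of discrete attribute values and `attrs` is a list of column indices. The loop stops once the number of remaining attributes is at or below `target`, so the result may be smaller than `target`.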
Bibliographical note
Funding Information:
The authors wish to express their deep gratitude to the Secretary of Higher Education, Science, Technology and Innovation (SENESCYT) of the Republic of Ecuador and the Prometeo program for their support of this research work. They also acknowledge the support of the GIDTEC research group of the Universidad Politécnica Salesiana in Cuenca, Ecuador, in carrying out this research.
© 2015, Springer Science+Business Media New York.
- Attribute clustering
- Feature selection
- Gear fault diagnosis
- Relative dependency
- Rough sets