TY - JOUR
T1 - Hierarchical feature selection based on relative dependency for gear fault diagnosis
AU - Cerrada, Mariela
AU - Sánchez, René Vinicio
AU - Pacheco, Fannia
AU - Cabrera, Diego
AU - Zurita, Grover
AU - Li, Chuan
N1 - Publisher Copyright: © 2015, Springer Science+Business Media New York.
PY - 2016/4/1
Y1 - 2016/4/1
N2 - Feature selection is an important topic in machine-learning-based diagnosis; it aims to remove irrelevant features so that diagnostic systems reach good performance. The behaviour of diagnostic models can be sensitive to the number of features, and a set of significant features can represent the problem better than the entire set. Consequently, algorithms to identify these features are valuable contributions. This work addresses the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches in which the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the created clusters are selected as representative attributes. The selection algorithm uses a random process to propose centroid candidates, so the exploration inherent in random search is included. A hierarchical procedure is proposed for implementing this algorithm. At each level of the hierarchy, the entire set of available attributes is split into disjoint subsets and the selection process is applied to each subset. Once the significant attributes are proposed for each subset, a new set of available attributes is created and the selection process runs again at the next level. The hierarchical implementation aims to refine the search space at each level over a reduced set of selected attributes, while also reducing computation time. The approach is tested with real data collected from a test bed; results show that the diagnosis precision of a Random Forest based classifier is over 98% using only 12% of the attributes from the available set.
AB - Feature selection is an important topic in machine-learning-based diagnosis; it aims to remove irrelevant features so that diagnostic systems reach good performance. The behaviour of diagnostic models can be sensitive to the number of features, and a set of significant features can represent the problem better than the entire set. Consequently, algorithms to identify these features are valuable contributions. This work addresses the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches in which the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the created clusters are selected as representative attributes. The selection algorithm uses a random process to propose centroid candidates, so the exploration inherent in random search is included. A hierarchical procedure is proposed for implementing this algorithm. At each level of the hierarchy, the entire set of available attributes is split into disjoint subsets and the selection process is applied to each subset. Once the significant attributes are proposed for each subset, a new set of available attributes is created and the selection process runs again at the next level. The hierarchical implementation aims to refine the search space at each level over a reduced set of selected attributes, while also reducing computation time. The approach is tested with real data collected from a test bed; results show that the diagnosis precision of a Random Forest based classifier is over 98% using only 12% of the attributes from the available set.
KW - Attribute clustering
KW - Feature selection
KW - Gear fault diagnosis
KW - Relative dependency
KW - Rough sets
UR - http://www.scopus.com/inward/record.url?scp=84960411565&partnerID=8YFLogxK
U2 - 10.1007/s10489-015-0725-3
DO - 10.1007/s10489-015-0725-3
M3 - Article
AN - SCOPUS:84960411565
SN - 0924-669X
VL - 44
SP - 687
EP - 703
JO - Applied Intelligence
JF - Applied Intelligence
IS - 3
ER -