Hierarchical feature selection based on relative dependency for gear fault diagnosis

Mariela Cerrada; René Vinicio Sánchez; Fannia Pacheco; Diego Cabrera; Grover Zurita; Chuan Li

doi:10.1007/s10489-015-0725-3

Research output: Contribution to journal › Article › peer-review

58 Scopus citations

Abstract

Feature selection is an important topic in machine-learning-based diagnosis; it aims to remove irrelevant features so that diagnostic systems perform well. The behaviour of diagnostic models can be sensitive to the number of features, and a set of significant features can represent the problem better than the entire set. Consequently, algorithms that identify these features are valuable contributions. This work addresses the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches in which the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the resulting clusters are selected as representative attributes. The selection algorithm uses a random process to propose centroid candidates, so it inherits the exploration typical of random search. A hierarchical procedure is proposed for implementing this algorithm. At each level of the hierarchy, the entire set of available attributes is split into disjoint subsets and the selection process is applied to each subset. Once the significant attributes have been proposed for each subset, a new set of available attributes is created and the selection process runs again at the next level. The hierarchical implementation aims to refine the search space at each level over a reduced set of selected attributes, while also reducing computation time. The approach is tested with real data collected from a test bed. The results show that diagnosis precision with a Random Forest based classifier is over 98 % using only 12 % of the attributes in the available set.
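The abstract does not give the exact formulas, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes discrete attribute values, uses the rough-set relative dependency in its common counting form (number of distinct values of one attribute divided by the number of distinct value pairs of both), symmetrises it by averaging (an assumption), and picks centroids by plain random search. All function names and parameters (`subset_size`, `k`, `trials`, `target`) are hypothetical.

```python
import random


def relative_dependency(rows, a, b):
    """Distinct values of column a divided by distinct (a, b) pairs; lies in (0, 1]."""
    distinct_a = {row[a] for row in rows}
    distinct_ab = {(row[a], row[b]) for row in rows}
    return len(distinct_a) / len(distinct_ab)


def dissimilarity(rows, a, b):
    """Assumed dissimilarity: one minus the average of both dependency directions."""
    dep = 0.5 * (relative_dependency(rows, a, b) + relative_dependency(rows, b, a))
    return 1.0 - dep


def select_centroids(rows, attrs, k, rng, trials=20):
    """Random search: keep the k candidates with the lowest total
    distance from every attribute to its nearest candidate centroid."""
    if len(attrs) <= k:
        return list(attrs)
    best, best_cost = None, float("inf")
    for _ in range(trials):
        candidates = rng.sample(attrs, k)
        cost = sum(min(dissimilarity(rows, a, c) for c in candidates) for a in attrs)
        if cost < best_cost:
            best, best_cost = candidates, cost
    return best


def hierarchical_select(rows, attrs, subset_size, k, rng, target):
    """Split the attributes into disjoint subsets, select centroids in each,
    pool the selected attributes, and repeat at the next level."""
    attrs = list(attrs)
    while len(attrs) > target:
        rng.shuffle(attrs)
        chunks = [attrs[i:i + subset_size] for i in range(0, len(attrs), subset_size)]
        selected = [c for chunk in chunks for c in select_centroids(rows, chunk, k, rng)]
        if len(selected) >= len(attrs):  # no further reduction is possible
            break
        attrs = selected
    return sorted(attrs)


if __name__ == "__main__":
    # Toy data: 16 objects described by 8 binary attributes.
    rows = [tuple((i >> j) & 1 for j in range(8)) for i in range(16)]
    rng = random.Random(0)
    print(hierarchical_select(rows, range(8), subset_size=4, k=2, rng=rng, target=2))
```

With `subset_size=4` and `k=2`, the toy run reduces 8 attributes to 4 at the first level and to 2 at the second. Real vibration features are continuous, so they would need to be discretised before a counting-based dependency like this one could be applied.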

Original language	English
Pages (from-to)	687-703
Number of pages	17
Journal	Applied Intelligence
Volume	44
Issue number	3
DOIs	https://doi.org/10.1007/s10489-015-0725-3
State	Published - 1 Apr 2016

Bibliographical note

Funding Information:
The authors express their deep gratitude to the Secretary of Higher Education, Science, Technology and Innovation (SENESCYT) of the Republic of Ecuador and to the Prometeo program for their support of this research work. The authors also acknowledge the support of the GIDTEC research group of the Universidad Politécnica Salesiana in Cuenca, Ecuador.

Publisher Copyright:
© 2015, Springer Science+Business Media New York.

Keywords

Attribute clustering
Feature selection
Gear fault diagnosis
Relative dependency
Rough sets

Access to Document

10.1007/s10489-015-0725-3

Cite this

@article{84b20f72b025499ea972e099fdde3ac1,
title = "Hierarchical feature selection based on relative dependency for gear fault diagnosis",
abstract = "Feature selection is an important aspect under study in machine learning based diagnosis, that aims to remove irrelevant features for reaching good performance in the diagnostic systems. The behaviour of diagnostic models could be sensitive with regard to the amount of features, and significant features can represent the problem better than the entire set. Consequently, algorithms to identify these features are valuable contributions. This work deals with the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches, where the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the created clusters are selected as representative attributes. The selection algorithm uses a random process for proposing centroid candidates, in this way, the inherent exploration in random search is included. A hierarchical procedure is proposed for implementing this algorithm. In each level of the hierarchy, the entire set of available attributes is split in disjoint sets and the selection process is applied on each subset. Once the significant attributes are proposed for each subset, a new set of available attributes is created and the selection process runs again in the next level. The hierarchical implementation aims to refine the search space in each level on a reduced set of selected attributes, while the computational time-consumption is improved also. The approach is tested with real data collected from a test bed, results show that the diagnosis precision by using a Random Forest based classifier is over 98 % with only 12 % of the attributes from the available set.",
keywords = "Attribute clustering, Feature selection, Gear fault diagnosis, Relative dependency, Rough sets",
author = "Mariela Cerrada and S{\'a}nchez, {Ren{\'e} Vinicio} and Fannia Pacheco and Diego Cabrera and Grover Zurita and Chuan Li",
note = "Publisher Copyright: {\textcopyright} 2015, Springer Science+Business Media New York.",
year = "2016",
month = apr,
day = "1",
doi = "10.1007/s10489-015-0725-3",
language = "English",
volume = "44",
pages = "687--703",
journal = "Applied Intelligence",
issn = "0924-669X",
publisher = "Springer Netherlands",
number = "3",
}

TY - JOUR
T1 - Hierarchical feature selection based on relative dependency for gear fault diagnosis
AU - Cerrada, Mariela
AU - Sánchez, René Vinicio
AU - Pacheco, Fannia
AU - Cabrera, Diego
AU - Zurita, Grover
AU - Li, Chuan
PY - 2016/4/1
Y1 - 2016/4/1
AB - Feature selection is an important aspect under study in machine learning based diagnosis, that aims to remove irrelevant features for reaching good performance in the diagnostic systems. The behaviour of diagnostic models could be sensitive with regard to the amount of features, and significant features can represent the problem better than the entire set. Consequently, algorithms to identify these features are valuable contributions. This work deals with the feature selection problem through attribute clustering. The proposed algorithm is inspired by existing approaches, where the relative dependency between attributes is used to calculate dissimilarity values. The centroids of the created clusters are selected as representative attributes. The selection algorithm uses a random process for proposing centroid candidates, in this way, the inherent exploration in random search is included. A hierarchical procedure is proposed for implementing this algorithm. In each level of the hierarchy, the entire set of available attributes is split in disjoint sets and the selection process is applied on each subset. Once the significant attributes are proposed for each subset, a new set of available attributes is created and the selection process runs again in the next level. The hierarchical implementation aims to refine the search space in each level on a reduced set of selected attributes, while the computational time-consumption is improved also. The approach is tested with real data collected from a test bed, results show that the diagnosis precision by using a Random Forest based classifier is over 98 % with only 12 % of the attributes from the available set.

KW - Attribute clustering
KW - Feature selection
KW - Gear fault diagnosis
KW - Relative dependency
KW - Rough sets
UR - http://www.scopus.com/inward/record.url?scp=84960411565&partnerID=8YFLogxK
U2 - 10.1007/s10489-015-0725-3
DO - 10.1007/s10489-015-0725-3
M3 - Article
AN - SCOPUS:84960411565
SN - 0924-669X
VL - 44
SP - 687
EP - 703
JO - Applied Intelligence
JF - Applied Intelligence
IS - 3
ER -
