TY - JOUR
T1 - Lung Cancer Detection
T2 - A Classification Approach Utilizing Oversampling and Support Vector Machines
AU - Jara Gavilanes, Adolfo
AU - Robles Bykbaev, Vladimir
N1 - Publisher Copyright:
© 2023, The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd.
PY - 2024/1
Y1 - 2024/1
N2 - Lung cancer is the type of cancer that causes the most deaths each year. It is also cancer with the lowest survival rate. This represents a health problem worldwide. Lung cancer has two subtypes: Non-Small Cell Lung Cancer (NSCLC) and Small Cell Lung Cancer (SCLC). For doctors, it can be hard to detect and differentiate them. Therefore, in this work, we present a method to help doctors with this issue. It consists of three phases: image preprocessing is the first phase. It starts gathering the data. After that, PET scans are selected. Then, all the scans are converted to grayscale images, and finally, all the images are joined to create a video from each patient’s scan. Next, the data extraction phase starts. In this phase, some frames are extracted from each video, and they are flattened and blended to create a row of information from each frame. Thus, a dataframe is created where each row represents a patient, and each column is a pixel value. To obtain better results, an oversampling technique is applied. In this manner, the classes are balanced. Following this, a dimensionality reduction technique is applied to reduce the number of columns produced by the previous steps and to check if this technique improves the results yielded by each model. Subsequently, the model evaluation phase begins. At this stage, two models are created: a Support Vector Machine (SVM), and a Random Forest. Ultimately, the findings are unveiled, revealing that the SVM emerged as the top-performing model, boasting an impressive 97% accuracy, 98% precision, and 97% sensitivity. Eventually, this method can be applied to detect and classify different diseases that involve PET scans.
AB - Lung cancer is the type of cancer that causes the most deaths each year. It is also cancer with the lowest survival rate. This represents a health problem worldwide. Lung cancer has two subtypes: Non-Small Cell Lung Cancer (NSCLC) and Small Cell Lung Cancer (SCLC). For doctors, it can be hard to detect and differentiate them. Therefore, in this work, we present a method to help doctors with this issue. It consists of three phases: image preprocessing is the first phase. It starts gathering the data. After that, PET scans are selected. Then, all the scans are converted to grayscale images, and finally, all the images are joined to create a video from each patient’s scan. Next, the data extraction phase starts. In this phase, some frames are extracted from each video, and they are flattened and blended to create a row of information from each frame. Thus, a dataframe is created where each row represents a patient, and each column is a pixel value. To obtain better results, an oversampling technique is applied. In this manner, the classes are balanced. Following this, a dimensionality reduction technique is applied to reduce the number of columns produced by the previous steps and to check if this technique improves the results yielded by each model. Subsequently, the model evaluation phase begins. At this stage, two models are created: a Support Vector Machine (SVM), and a Random Forest. Ultimately, the findings are unveiled, revealing that the SVM emerged as the top-performing model, boasting an impressive 97% accuracy, 98% precision, and 97% sensitivity. Eventually, this method can be applied to detect and classify different diseases that involve PET scans.
KW - Lung cancer
KW - Oversampling
KW - PET scan
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=85182411248&partnerID=8YFLogxK
U2 - 10.1007/s42979-023-02432-6
DO - 10.1007/s42979-023-02432-6
M3 - Article
AN - SCOPUS:85182411248
SN - 2662-995X
VL - 5
JO - SN Computer Science
JF - SN Computer Science
IS - 1
M1 - 74
ER -