Objective: Breast cancer is a deadly disease that causes many deaths among women. Early diagnosis of breast cancer with appropriate tumor biomarkers may facilitate early treatment of the disease, thus reducing the mortality rate. The purpose of the current study is to improve early diagnosis of breast cancer by proposing a two-stage classification of breast tumor biomarkers for a sample of Iraqi women.
Methods: In this study, a two-stage classification system is proposed and tested with four machine learning classifiers. In the first stage, cases described by breast features (demographic, blood and salivary-based attributes) are classified as normal or abnormal; in the second stage, the abnormal cases are further classified as malignant or benign. The 20 collected breast cancer features are used to test the performance of the proposed classification system with Leave-One-Out (LOO) cross-validation, and the Synthetic Minority Over-Sampling Technique (SMOTE) is used to balance the classes. Furthermore, correlation-based feature selection (CFS) was employed in an exploratory analysis to find the best features for the two-stage classification system.
Results: A classification accuracy of 94% for stage-1 and 100% for stage-2 was achieved with a Naïve Bayes classifier, which outperformed the other three methods. In addition, CFS selected a small subset of five features as the best of the 20 features for both stage-1 and stage-2.
Conclusion: We achieved a high classification accuracy, which shows promise for improving the early diagnosis of breast tumors. The outcome of this study also shows the importance of CA15-3 protein in saliva and blood, carcinoembryonic antigen level and total protein in blood, and estrogen hormone level in saliva for predicting breast tumors.
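The two-stage scheme described under Methods can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a Gaussian variant of Naïve Bayes, assumes the second stage is trained only on abnormal cases, omits SMOTE and CFS, and uses invented two-feature toy values in place of the 20 collected features. All function and variable names are hypothetical.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: per-class prior, feature means and variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    def log_posterior(prior, means, variances):
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=lambda label: log_posterior(*model[label]))

def two_stage_predict(stage1, stage2, x):
    """Stage 1: normal vs. abnormal. Stage 2: benign vs. malignant."""
    if predict_gnb(stage1, x) == "normal":
        return "normal"
    return predict_gnb(stage2, x)

def loo_accuracy(X, labels):
    """Leave-one-out: hold out each sample, train both stages on the rest."""
    correct = 0
    for i in range(len(X)):
        X_tr, y_tr = X[:i] + X[i + 1:], labels[:i] + labels[i + 1:]
        y_stage1 = ["normal" if l == "normal" else "abnormal" for l in y_tr]
        stage1 = fit_gnb(X_tr, y_stage1)
        # Assumption: stage 2 sees only the abnormal training cases.
        X_abn = [x for x, l in zip(X_tr, y_tr) if l != "normal"]
        y_abn = [l for l in y_tr if l != "normal"]
        stage2 = fit_gnb(X_abn, y_abn)
        correct += two_stage_predict(stage1, stage2, X[i]) == labels[i]
    return correct / len(X)

# Invented toy data (two features per case), for illustration only.
X_toy = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
         [5.0, 1.0], [5.2, 1.2], [4.9, 0.8],
         [5.0, 6.0], [5.1, 6.3], [4.8, 5.8]]
y_toy = ["normal"] * 3 + ["benign"] * 3 + ["malignant"] * 3

accuracy = loo_accuracy(X_toy, y_toy)
print(f"LOO accuracy on toy data: {accuracy:.2f}")
```

In this sketch a sample reaches the second classifier only when the first one flags it as abnormal, so a stage-1 error cannot be corrected later. If oversampling such as SMOTE were added, it would normally be applied to the training portion inside each leave-one-out fold so that the held-out case stays untouched.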