The dataset I used in this project comes from the UC Irvine Machine Learning Repository.
The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.
The dataset contains 569 observations and 32 attributes (ID, diagnosis, and 30 real-valued input features), with no missing values. The diagnosis column was recoded (M = malignant as 1, B = benign as 0), and the data was made model-ready using Python libraries.
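A minimal sketch of the diagnosis recoding, assuming pandas was the library used. The three-row table and the column names `id`, `diagnosis`, and `radius_mean` are illustrative stand-ins, not the project's actual file.

```python
import pandas as pd

# Tiny stand-in for the real table; values and column names are assumptions.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [17.99, 13.54, 20.57],
})

# Recode the label: M (malignant) -> 1, B (benign) -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# The ID column carries no predictive signal, so drop it before modeling.
df = df.drop(columns=["id"])

# Confirm there are no missing values left after the mapping.
assert df.isna().sum().sum() == 0
print(df)
```

Using `map` with an explicit dictionary turns any unexpected label into `NaN`, so the missing-value check also catches typos in the diagnosis column.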
Selected 10 of the 30 features for use in a predictive model.
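The selection method is not stated here, so the following is only one possible approach: a univariate ANOVA F-test via scikit-learn's `SelectKBest`. It runs on the copy of the same 569-row dataset bundled with scikit-learn, and the features it picks are not necessarily the 10 used in this project.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# Bundled copy of the Wisconsin diagnostic data (569 rows, 30 features).
# Note: scikit-learn encodes the target as 0 = malignant, 1 = benign,
# the reverse of the M = 1 / B = 0 encoding described above.
data = load_breast_cancer()
X, y = data.data, data.target

# Keep the 10 features with the highest ANOVA F-score (an assumed criterion).
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

# get_support() returns a boolean mask over the original 30 columns.
selected_names = data.feature_names[selector.get_support()]
print(X_selected.shape)
print(list(selected_names))
```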
Tested five approaches for prediction: LogisticRegression, trees (RandomForestClassifier), KNN, SVM, and GridSearch. The best model for prediction (detecting the breast cancer type) was SVM.
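A comparison like this could be sketched with cross-validation as follows. The hyperparameters, the parameter grid, the scaling step, and the use of all 30 features from scikit-learn's bundled copy of the data are assumptions for illustration. The scores it prints are not the project's reported results.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)


def scaled(estimator):
    """Standardize features before fitting a scale-sensitive estimator."""
    return make_pipeline(StandardScaler(), estimator)


candidates = {
    "LogisticRegression": scaled(LogisticRegression(max_iter=1000)),
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": scaled(KNeighborsClassifier()),
    "SVM": scaled(SVC()),
    # GridSearchCV is a hyperparameter search wrapped around the SVM pipeline;
    # make_pipeline names the SVC step "svc", hence the "svc__" prefixes.
    "SVM + GridSearch": GridSearchCV(
        scaled(SVC()),
        param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
        cv=3,
    ),
}

# Mean 5-fold cross-validated accuracy for each candidate.
scores = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, which keeps test-fold statistics out of the preprocessing.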
Built a text-mining model that accesses the Entrez database through the PubMed API using Biopython.
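The PubMed access step might look roughly like the sketch below, which uses Biopython's `Entrez.esearch`, `Entrez.efetch`, and `Medline.parse`. The function names, the search term, and the `retmax` value are hypothetical, and the project's actual query is not shown here.

```python
def fetch_pubmed_records(term, email, retmax=20):
    """Search PubMed for `term` and return parsed MEDLINE records.

    This performs network calls to NCBI. Biopython is imported lazily so the
    helpers in this sketch can be defined even where it is not installed.
    """
    from Bio import Entrez, Medline

    Entrez.email = email  # NCBI asks API users to identify themselves

    # Step 1: search for matching PubMed IDs.
    search = Entrez.esearch(db="pubmed", term=term, retmax=retmax)
    ids = Entrez.read(search)["IdList"]
    search.close()
    if not ids:
        return []

    # Step 2: fetch the matching records in MEDLINE text format and parse them.
    handle = Entrez.efetch(
        db="pubmed", id=",".join(ids), rettype="medline", retmode="text"
    )
    records = list(Medline.parse(handle))
    handle.close()
    return records


def records_to_texts(records):
    """Join each record's title (TI) and abstract (AB) into one string."""
    return [f"{rec.get('TI', '')} {rec.get('AB', '')}".strip() for rec in records]
```

A call such as `fetch_pubmed_records("breast cancer diagnosis", "you@example.com")` would return dictionary-like records (the e-mail address is a placeholder), and `records_to_texts` then yields one string per paper for further text mining.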
Ranked the most frequent words in the titles and abstracts of papers related to breast cancer diagnosis.
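Word ranking of this kind can be sketched with the standard library alone. The stopword list, the minimum word length, and the two sample titles are illustrative assumptions, not the project's actual corpus or filtering rules.

```python
import re
from collections import Counter

# A deliberately small stopword list; a real one would be longer.
STOPWORDS = {"the", "of", "and", "in", "a", "for", "to", "with", "on", "is"}


def top_words(texts, n=10):
    """Return the n most frequent non-stopword words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase, then keep runs of letters only (drops digits/punctuation).
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Hypothetical titles standing in for fetched PubMed titles and abstracts.
sample = [
    "Diagnosis of breast cancer with fine needle aspiration",
    "Machine learning for breast cancer diagnosis",
]
print(top_words(sample, n=3))
```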
Created a Flask API that predicts the breast cancer type from a new patient's parameter data.
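A prediction endpoint along these lines is sketched below. The `/predict` route, the JSON payload shape, and the response fields are assumptions. To stay self-contained, the model is trained at startup on the first 10 columns of scikit-learn's bundled copy of the data as a stand-in for the project's 10 selected features. A real service would more likely load a previously saved model.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_FEATURES = 10  # stand-in for the 10 selected features

data = load_breast_cancer()
X = data.data[:, :N_FEATURES]
# scikit-learn uses 0 = malignant, 1 = benign; flip to malignant = 1, benign = 0.
y = 1 - data.target
model = make_pipeline(StandardScaler(), SVC()).fit(X, y)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    """Expect JSON like {"features": [f1, ..., f10]} and return a diagnosis."""
    payload = request.get_json(silent=True)
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != N_FEATURES:
        return jsonify(error=f"expected {N_FEATURES} numeric features"), 400
    try:
        row = [float(v) for v in features]
    except (TypeError, ValueError):
        return jsonify(error="features must be numeric"), 400
    label = int(model.predict([row])[0])
    return jsonify(diagnosis="malignant" if label == 1 else "benign", label=label)
```

The endpoint can be exercised without starting a server through Flask's test client, for example `app.test_client().post("/predict", json={"features": [...]})` with 10 numeric values.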
Data and results are displayed on a data table page and in plots.
Used Tableau to analyze the data and identify the important features.