Final project

The dataset I used in this project comes from the UC Irvine Machine Learning Repository.

Data Collection

The data was collected from the UC Irvine Machine Learning Repository. The features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass and describe characteristics of the cell nuclei present in the image.

Data Cleaning

The dataset contains 569 observations and 32 attributes (ID, diagnosis, and 30 real-valued input features) with no missing values. The diagnosis was encoded as a binary label (M = malignant set to 1, B = benign set to 0), and the data was made model-ready using Python libraries.

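The cleaning step can be sketched with pandas as below. This is an illustrative sketch, not the project's code: the column names `id` and `diagnosis` are assumptions (if the raw UCI file has no header row, names have to be assigned when loading, e.g. `pd.read_csv(path, header=None, names=...)`), and `prepare_data` is a hypothetical helper.

```python
import pandas as pd

# Assumed (hypothetical) column names: "id", "diagnosis", then the 30 feature columns.
DIAGNOSIS_CODES = {"M": 1, "B": 0}  # malignant -> 1, benign -> 0


def prepare_data(df: pd.DataFrame):
    """Split a raw breast-mass DataFrame into a feature matrix X and a binary target y."""
    # The dataset is documented as having no missing values; fail loudly if that changes.
    if df.isna().any().any():
        raise ValueError("unexpected missing values")
    # Encode the diagnosis; any label other than 'M' or 'B' maps to NaN.
    y = df["diagnosis"].map(DIAGNOSIS_CODES)
    if y.isna().any():
        raise ValueError("diagnosis must be 'M' or 'B'")
    # The ID carries no predictive information, so it is dropped with the target column.
    X = df.drop(columns=["id", "diagnosis"])
    return X, y.astype(int)
```

Raising on unexpected values, rather than silently dropping rows, keeps the "no missing values" assumption checked every time the data is reloaded.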
Feature and Model Selection

Selected 10 of the 30 features for use in the predictive model.
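The selection method is not stated here. One common approach, shown only as an illustrative sketch, ranks features with a univariate ANOVA F-test (scikit-learn's `SelectKBest` with `f_classif`) and keeps the top 10. It assumes scikit-learn's bundled copy of the UCI Breast Cancer Wisconsin (Diagnostic) data, loaded with `load_breast_cancer`; that copy encodes 0 = malignant and 1 = benign, so the labels are flipped to match the encoding above. `select_top_features` is a hypothetical helper name.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif


def select_top_features(k=10):
    """Return the names of the k features with the highest ANOVA F-scores."""
    data = load_breast_cancer()
    # scikit-learn encodes 0 = malignant, 1 = benign; flip to M = 1, B = 0.
    y = 1 - data.target
    selector = SelectKBest(score_func=f_classif, k=k).fit(data.data, y)
    mask = selector.get_support()  # boolean mask over the 30 input features
    return [str(name) for name in data.feature_names[mask]]


if __name__ == "__main__":
    for name in select_top_features():
        print(name)
```

Fitting the selector on the full dataset before cross-validation lets the validation folds influence which features are kept; placing `SelectKBest` inside a scikit-learn `Pipeline`, so it is refit on each training fold, avoids that leakage.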

Tested five approaches for prediction: LogisticRegression, a tree-based model (RandomForestClassifier), KNN, SVM, and hyperparameter tuning with GridSearch. SVM was the best model for detecting the breast cancer type (malignant vs. benign).

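A minimal sketch of such a comparison, assuming 5-fold cross-validated accuracy as the criterion (the metric used in the project is not stated), scikit-learn's bundled copy of the dataset, and the 10 features picked by `SelectKBest` inside each pipeline. The hyperparameters and the search grid are illustrative assumptions, not the project's values.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def load_data():
    data = load_breast_cancer()
    # scikit-learn encodes 0 = malignant, 1 = benign; flip to M = 1, B = 0.
    return data.data, 1 - data.target


def make_model(estimator, k=10):
    # Scaling and feature selection are refit on each training fold, so no validation data leaks in.
    return make_pipeline(StandardScaler(), SelectKBest(f_classif, k=k), estimator)


def compare_models(X, y, cv=5):
    """Mean cross-validated accuracy for each candidate classifier."""
    candidates = {
        "LogisticRegression": LogisticRegression(max_iter=1000),
        "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
        "KNN": KNeighborsClassifier(n_neighbors=5),
        "SVM": SVC(kernel="rbf"),
    }
    return {
        name: cross_val_score(make_model(est), X, y, cv=cv).mean()
        for name, est in candidates.items()
    }


def tune_svm(X, y, cv=5):
    """Grid-search C and gamma for the SVM pipeline; returns best params and score."""
    # make_pipeline names the SVC step "svc", hence the "svc__" prefix.
    param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
    search = GridSearchCV(make_model(SVC(kernel="rbf")), param_grid, cv=cv)
    search.fit(X, y)
    return search.best_params_, search.best_score_


if __name__ == "__main__":
    X, y = load_data()
    for name, score in sorted(compare_models(X, y).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {score:.3f}")
    print(tune_svm(X, y))
```

Scaling matters for KNN, the RBF SVM, and regularized logistic regression, which are sensitive to feature magnitudes; the random forest does not need it but stays in the same pipeline so every model sees identical preprocessing.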
NLP Text Mining

Built a text-mining model that accesses the PubMed database through NCBI's Entrez API using Biopython.

Ranked the most frequent words in the titles and abstracts of papers related to breast cancer diagnosis.

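The two steps can be sketched as follows, assuming Biopython's `Bio.Entrez` module (`esearch` to find PubMed IDs, `efetch` to download the records as XML, `Entrez.read` to parse them) and a simple regex tokenizer. The query, `retmax`, and the small stop-word list are illustrative assumptions, and `fetch_pubmed_texts` and `top_words` are hypothetical helper names. NCBI asks clients to identify themselves through `Entrez.email`.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption; a fuller list could be used instead).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "for", "with", "on",
              "to", "by", "is", "are", "using", "from", "was", "were"}


def top_words(texts, n=20):
    """Return the n most frequent non-stop-words across titles and abstracts."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS and len(w) > 2)
    return counts.most_common(n)


def fetch_pubmed_texts(query, email, retmax=50):
    """Return titles and abstract sections of PubMed papers matching `query`."""
    from Bio import Entrez  # imported here so top_words() works without Biopython installed

    Entrez.email = email  # contact address requested by NCBI
    handle = Entrez.esearch(db="pubmed", term=query, retmax=retmax)
    try:
        ids = Entrez.read(handle)["IdList"]
    finally:
        handle.close()
    if not ids:
        return []

    handle = Entrez.efetch(db="pubmed", id=",".join(ids), retmode="xml")
    try:
        records = Entrez.read(handle)
    finally:
        handle.close()

    texts = []
    for article in records["PubmedArticle"]:
        info = article["MedlineCitation"]["Article"]
        texts.append(str(info["ArticleTitle"]))
        # Some records have no abstract; AbstractText may hold several labeled sections.
        abstract = info.get("Abstract", {})
        texts.extend(str(part) for part in abstract.get("AbstractText", []))
    return texts
```

Typical use would be `texts = fetch_pubmed_texts("breast cancer diagnosis", email="you@example.com")` followed by `top_words(texts, 20)`; keeping the network call separate from the counting makes the counting easy to test offline.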
Model Prediction: Flask API and Heroku

Created a Flask API that predicts the breast cancer type (malignant or benign) from a new patient's feature values.

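A minimal sketch of such an endpoint, assuming a fitted scikit-learn-style model (any object with a `predict` method) and a JSON request body keyed by feature name. The `/predict` route, the `create_app` factory, and the response fields are hypothetical names chosen for illustration.

```python
from flask import Flask, jsonify, request

LABELS = {1: "malignant", 0: "benign"}  # encoding used during data cleaning


def create_app(model, feature_names):
    """Build a Flask app exposing POST /predict for a fitted classifier."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        # Reject requests that do not supply every feature the model was trained on.
        missing = [name for name in feature_names if name not in payload]
        if missing:
            return jsonify({"error": f"missing features: {missing}"}), 400
        try:
            row = [float(payload[name]) for name in feature_names]
        except (TypeError, ValueError):
            return jsonify({"error": "feature values must be numeric"}), 400
        # The model expects a 2-D input: one row per patient.
        label = int(model.predict([row])[0])
        return jsonify({"prediction": label, "diagnosis": LABELS[label]})

    return app
```

A client POSTs a JSON object with one numeric value per feature name and receives, for example, `{"prediction": 1, "diagnosis": "malignant"}`. Passing the model into a factory, instead of loading it at import time, keeps the endpoint easy to test with Flask's `test_client()`.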
Data Visualization

The data and results were displayed on a data table page and in plots.

Tableau Analysis and Dashboard Creation

Used Tableau to analyze the data and identify the most important features.