Analysis of Breast Cancer Wisconsin (Diagnostic) Data



The dataset I used in this project comes from the UC Irvine Machine Learning Repository.

    Project Summary:
  1. Data Collection

    Data was collected from the UC Irvine Machine Learning Repository. Features are computed from a digitized image of a fine needle aspirate (FNA) of a breast mass. They describe characteristics of the cell nuclei present in the image.

  2. Data Cleaning

    The dataset contains 569 observations and 32 attributes (ID, diagnosis, and 30 real-valued input features) and has no missing values. The diagnosis column was recoded (M = malignant as 1, B = benign as 0), and the data was made model-ready using Python libraries.
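The recoding step above can be sketched with pandas. This is a minimal illustration, not the project's actual cleaning script: the three rows are made up, the column names (`id`, `diagnosis`, `radius_mean`, `texture_mean`) are assumptions, and dropping the ID column is an assumed choice.

```python
import io

import pandas as pd

# Three made-up rows in the shape of the raw file (ID, diagnosis, features).
# Only two of the 30 feature columns are shown to keep the sketch short;
# the column names are assumptions, not taken from the project.
raw = io.StringIO(
    "id,diagnosis,radius_mean,texture_mean\n"
    "1,M,17.99,10.38\n"
    "2,B,13.54,14.36\n"
    "3,M,20.57,17.77\n"
)
df = pd.read_csv(raw)

# Confirm there are no missing values before modelling.
assert df.isna().sum().sum() == 0

# Encode the label: malignant -> 1, benign -> 0.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# Assumption: the ID column carries no predictive signal, so it is dropped.
X = df.drop(columns=["id", "diagnosis"])
y = df["diagnosis"]
print(y.tolist())  # [1, 0, 1]
```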

  3. Feature and Model Selection

    Selected 10 of the 30 features for use in the predictive model.
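The summary does not say how the 10 features were chosen. One common option is univariate selection, sketched here with scikit-learn's `SelectKBest` and the ANOVA F-score. Both the method and the use of scikit-learn's bundled copy of the dataset are assumptions made for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

# scikit-learn ships a copy of the same 569 x 30 dataset.
data = load_breast_cancer()
X, feature_names = data.data, data.feature_names

# scikit-learn encodes malignant as 0 and benign as 1; flip the labels
# so they match the encoding used in this project (M = 1, B = 0).
y = 1 - data.target

# Assumption: rank features by ANOVA F-score and keep the top 10.
selector = SelectKBest(score_func=f_classif, k=10)
X_selected = selector.fit_transform(X, y)

selected_features = list(feature_names[selector.get_support()])
print(selected_features)
```

Other approaches, such as model-based importances or recursive feature elimination, may pick a different subset.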

    Tested five approaches for prediction: LogisticRegression, Trees (RandomForestClassifier), KNN, SVM, and GridSearch. The best model for predicting the breast cancer type (malignant or benign) was SVM.
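A comparison like the one above could look roughly like the following sketch. It assumes 5-fold cross-validated accuracy as the metric, feature scaling before each model, and a small illustrative hyperparameter grid for the SVM. None of these choices is confirmed by the project, and for brevity it uses all 30 features from scikit-learn's bundled copy of the data.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

data = load_breast_cancer()
X = data.data
y = 1 - data.target  # flip so that malignant = 1, benign = 0

models = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

# Mean 5-fold cross-validated accuracy, scaling features inside each fold.
scores = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5).mean()

# Grid search over an illustrative (assumed) SVM hyperparameter grid.
grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01]},
    cv=5,
)
grid.fit(X, y)
scores["SVM + GridSearchCV"] = grid.best_score_

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name}: {score:.3f}")
```

Accuracy alone can hide false negatives, so recall on the malignant class is worth checking alongside it for a diagnostic task.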

  4. NLP Text Mining

    Built a text-mining model that accesses the Entrez database through the PubMed API using Biopython.

    Ranked the most frequent words in the titles and abstracts of papers related to breast cancer diagnosis.
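The word-ranking step can be sketched with the standard library alone. In the project the input strings would come from records fetched through Biopython's `Bio.Entrez` module. Here two made-up titles stand in for them, and the `top_words` helper and the stop-word list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list; a real one would be longer.
STOPWORDS = {"a", "an", "and", "by", "for", "in", "is", "of", "on", "the", "to", "with"}


def top_words(texts, n=10):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lowercase and keep alphabetic tokens only.
        words = re.findall(r"[a-z]+", text.lower())
        counts.update(word for word in words if word not in STOPWORDS)
    return counts.most_common(n)


# Made-up titles standing in for fetched PubMed titles and abstracts.
sample_texts = [
    "Breast cancer diagnosis with machine learning",
    "Machine learning for cancer detection",
]
print(top_words(sample_texts, n=3))
```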

  5. Model Prediction Flask API and Heroku

    Created a Flask API that predicts the breast cancer type from a new patient's parameter data.
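A prediction endpoint of this kind might be shaped like the sketch below. The route name `/predict`, the JSON field `radius_mean`, and the `predict_diagnosis` function are all hypothetical. In particular, the threshold rule is only a placeholder for the trained model and has no medical meaning.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_diagnosis(features):
    """Hypothetical stand-in for the trained classifier.

    A real deployment would load the fitted model and call its predict
    method; this placeholder threshold has no medical meaning.
    """
    return 1 if float(features.get("radius_mean", 0.0)) > 15.0 else 0


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON object mapping feature names to values.
    payload = request.get_json(silent=True)
    if not isinstance(payload, dict):
        return jsonify(error="Expected a JSON object of feature values"), 400
    label = predict_diagnosis(payload)
    return jsonify(
        prediction=label,
        diagnosis="malignant" if label == 1 else "benign",
    )


# Exercise the endpoint in-process with Flask's built-in test client.
with app.test_client() as client:
    response = client.post("/predict", json={"radius_mean": 17.99})
    print(response.status_code, response.get_json())
```

The test client keeps the sketch runnable without starting a server. A deployed version would be served by a WSGI server instead.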

  6. Data Visualization

    Data and results are displayed on a data table page and in plots.

  7. Tableau Analysis and Dashboard creation

    Used Tableau to analyze the data and identify the important features.