Breast Cancer Prediction Using Machine Learning

Introduction

Breast cancer is one of the most common life-threatening diseases affecting women worldwide, and early detection plays a crucial role in improving survival rates. This Breast Cancer Prediction Project uses machine learning algorithms to classify a tumor as benign or malignant based on the 30 numeric features in the dataset. The data comes from the Breast Cancer Wisconsin (Diagnostic) dataset, which is bundled with the scikit-learn library.


Dataset Information

📌 Dataset Source: Breast Cancer Wisconsin Diagnostic Dataset
📌 Number of Features: 30
📌 Problem Type: Binary Classification (Benign or Malignant)

Key Features in the Dataset:

  • Mean radius, texture, perimeter, and area of the cell nuclei
  • Mean smoothness, compactness, concavity, concave points, symmetry, and fractal dimension
  • Standard error and worst (largest) values of each of these ten measurements
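
As a quick check of the figures above, the following minimal sketch loads the copy of the dataset bundled with scikit-learn and groups the feature names into their three families. The variable names are illustrative and not taken from the project's source code.

```python
from sklearn.datasets import load_breast_cancer

# Load the copy of the dataset that ships with scikit-learn.
data = load_breast_cancer()
X, y = data.data, data.target

print(X.shape)                      # (569, 30): 569 samples, 30 features
print(data.target_names.tolist())   # ['malignant', 'benign'], so 0 = malignant, 1 = benign

# Group the 30 feature names into the three families of measurements.
names = [str(n) for n in data.feature_names]
mean_features = [n for n in names if n.startswith("mean ")]
error_features = [n for n in names if n.endswith(" error")]
worst_features = [n for n in names if n.startswith("worst ")]
print(len(mean_features), len(error_features), len(worst_features))  # 10 10 10
```

Note that in scikit-learn's encoding the label 0 means malignant and 1 means benign, which matters later when choosing the positive class for precision and recall.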

Machine Learning Models Used

To improve prediction accuracy, multiple machine learning models were trained, and the best-performing model was selected for deployment. The following classification algorithms were used:

Logistic Regression
Support Vector Machine (SVM)
Gaussian Naive Bayes
Random Forest Classifier
Gradient Boosting
Decision Tree
Neural Network (MLP – Multi-Layer Perceptron)

After model evaluation, the best-performing model was chosen for final predictions.
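
One way to compare these algorithms is sketched below. It assumes scikit-learn's implementations, uses the classifier form of the random forest, and ranks the models by 5-fold cross-validated accuracy. The hyperparameters, the `models` dictionary, and the use of cross-validation are assumptions made for illustration; the project's own `model_trainer.py` may do this differently.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Candidate models (default or lightly adjusted settings, chosen for illustration).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Neural Network (MLP)": MLPClassifier(max_iter=1000, random_state=0),
}

# Score every model with the same scaling step and the same 5 folds.
scores = {}
for name, model in models.items():
    pipeline = make_pipeline(StandardScaler(), model)
    scores[name] = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy").mean()

# Report the models from best to worst and keep the name of the winner.
best_name = max(scores, key=scores.get)
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<22} {score:.3f}")
print("Best model:", best_name)
```

Wrapping each model in a pipeline keeps the scaler inside the cross-validation loop, so the scaling statistics are never computed on the held-out fold.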


Project Structure

The project is organized into separate components for data ingestion, data transformation, model training, and prediction, so each stage can be run and maintained independently.

Main Files & Directories:

📂 requirements.txt – Lists all necessary Python libraries and dependencies.
📂 application.py – Flask application file responsible for hosting the web-based interface.
📂 notebooks/ – Contains Jupyter Notebooks for data exploration, visualization, and model training.
📂 setup.py – Packaging configuration that allows the project to be installed as a Python package.
📂 src/ – The core source code of the project, including:

  • logs/ – Stores log files generated during execution.
  • components/ – Contains key components of the machine learning pipeline:
    • data_ingestion.py – Loads and prepares the dataset.
    • data_transformation.py – Performs preprocessing and feature engineering.
    • model_trainer.py – Trains and evaluates machine learning models.
  • pipelines/ – Defines ML pipelines for training and prediction:
    • training_pipeline.py – Handles training pipeline execution.
    • prediction_pipeline.py – Defines the pipeline for making predictions using the trained model.
  • exception.py – Manages user-defined errors for better debugging.
  • logger.py – Records and tracks logs for monitoring application performance.
  • utils.py – Contains utility functions used across the project.

📂 artifacts/ – Stores raw, train, and test datasets, along with the best-trained model in a pickle file.
📂 templates/ – Contains HTML files used for user input forms, allowing users to enter test data for prediction via Flask.
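
To make the roles of exception.py and logger.py more concrete, here is a minimal sketch of what such helpers often look like. The class name `CustomException`, the logger name, and the `safe_divide` function are hypothetical and are not taken from the project's code.

```python
import logging
import sys


class CustomException(Exception):
    """Hypothetical user-defined error that records where the failure occurred."""

    def __init__(self, message):
        super().__init__(message)
        # When created inside an `except` block, exc_info() exposes the active traceback.
        _, _, tb = sys.exc_info()
        if tb is not None:
            self.location = f"{tb.tb_frame.f_code.co_filename}:{tb.tb_lineno}"
        else:
            self.location = "unknown location"
        self.message = f"{message} (raised at {self.location})"

    def __str__(self):
        return self.message


# A logger.py module would typically configure logging once for the whole project.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("breast_cancer_project")


def safe_divide(a, b):
    """Toy function showing how a component might log and re-raise an error."""
    try:
        return a / b
    except ZeroDivisionError as err:
        logger.error("division failed: %s", err)
        raise CustomException(str(err)) from err
```

A pipeline component would wrap its own work the same way, so that every failure is logged once and surfaces with its file name and line number attached.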


How It Works

1️⃣ Data Preprocessing:

  • The dataset is loaded and cleaned (handling missing values, encoding categorical features, and scaling numerical values).
  • Features are extracted and transformed to optimize model training.
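
The preprocessing step can be sketched as follows. scikit-learn's copy of the dataset is fully numeric and contains no missing values, so in this sketch scaling is the step that does the work. The 80/20 split, the `random_state`, and the choice of `StandardScaler` are assumptions, not settings confirmed by the project.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Confirm there is nothing to impute in this copy of the dataset.
print("missing values:", int(np.isnan(X).sum()))

# Hold out 20% of the samples, keeping the benign/malignant ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then apply it to both parts.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```

Fitting the scaler on the training split alone avoids leaking information from the test set into the model.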

2️⃣ Model Training:

  • Multiple classification models are trained and evaluated.
  • The best-performing model is selected based on accuracy, precision, recall, and F1-score.
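
The four metrics named above can be computed as in the sketch below, shown here for a single logistic regression model. The `evaluate` helper is hypothetical. Because scikit-learn encodes malignant as 0, the sketch sets `pos_label=0` so that precision, recall, and F1 describe how well malignant tumors are detected; that choice is an assumption about what the project intends.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)


def evaluate(model, X_eval, y_eval):
    """Return the four selection metrics, treating malignant (label 0) as positive."""
    y_pred = model.predict(X_eval)
    return {
        "accuracy": accuracy_score(y_eval, y_pred),
        "precision": precision_score(y_eval, y_pred, pos_label=0),
        "recall": recall_score(y_eval, y_pred, pos_label=0),
        "f1": f1_score(y_eval, y_pred, pos_label=0),
    }


model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

metrics = evaluate(model, X_test, y_test)
for name, value in metrics.items():
    print(f"{name:<10} {value:.3f}")
```

In a screening setting, recall on the malignant class is usually the metric to watch most closely, since a missed malignant tumor is the costliest error.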

3️⃣ Web Application:

  • A Flask-based web interface allows users to enter medical parameters.
  • The trained model predicts whether the tumor is benign or malignant.
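
A minimal sketch of the prediction endpoint is shown below. To stay self-contained it accepts JSON instead of the HTML forms in templates/, and it trains a small model at startup instead of loading the pickled one from artifacts/. The `/predict` route and the `features` field are hypothetical names, not the project's actual interface.

```python
from flask import Flask, jsonify, request
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in for the trained model that the project would load from artifacts/.
data = load_breast_cancer()
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(data.data, data.target)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Expect a JSON body such as {"features": [30 numeric values]}.
    payload = request.get_json(silent=True) or {}
    features = payload.get("features", [])
    if len(features) != data.data.shape[1]:
        return jsonify(error="expected 30 feature values"), 400
    label = int(model.predict([features])[0])
    return jsonify(prediction=str(data.target_names[label]))


# Exercise the endpoint with Flask's built-in test client (no server needed).
client = app.test_client()
response = client.post("/predict", json={"features": data.data[0].tolist()})
print(response.status_code, response.get_json())
```

The test client keeps the example runnable without starting a server; calling `app.run()` would start Flask's development server instead.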

4️⃣ Model Deployment:

  • The trained model is saved and deployed for real-time breast cancer prediction.
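
Saving and reloading the model with pickle can be sketched as follows. A temporary directory stands in for the artifacts/ folder, and the file name `model.pkl` is an assumption.

```python
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

# Train a stand-in for the best-performing model, scaler included.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Save the whole pipeline to disk, then load it back as the web app would.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"  # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# The restored pipeline should reproduce the original predictions.
same = bool((restored.predict(X) == model.predict(X)).all())
print("predictions match after reload:", same)
```

Pickling the scaler together with the classifier ensures that new inputs are scaled exactly as the training data was. Pickle files should only be loaded from trusted sources, and ideally with the same scikit-learn version that created them.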

This Breast Cancer Prediction Project demonstrates how machine learning can assist in the early detection of breast cancer. By comparing several classification algorithms and keeping the best-performing one, the system produces benign/malignant predictions that can help medical professionals make informed decisions.

🚀 Future Enhancements:
🔹 Deploy the model as a cloud-based API for broader accessibility.
🔹 Train the model using a larger, more diverse dataset for improved accuracy.
🔹 Integrate deep learning (CNNs & LSTMs) for enhanced feature extraction and prediction.

