Fake News Detection Using Machine Learning Project with Source Code - Fake News Detection

Fake News Detection System Using Machine Learning Project with Source Code



Fake news often spreads faster than real news, and that is a serious problem in today’s digital world. Social media, blogs, and messaging apps let anyone share information in seconds, but not all of it is true. That makes a Fake News Detection System using Machine Learning and NLP a relevant project to build as a CS/IT student in 2026. It combines Python, TF-IDF, Logistic Regression, Linear SVM, DistilBERT, and a Streamlit web interface into one complete solution to a real-world problem, which gives you plenty to discuss in placement interviews and vivas.


Project Overview

Project Name: Fake News Detection System
Language: Python 3.8+
Technology: Machine Learning, NLP, Deep Learning
Models Used: Logistic Regression, Linear SVM, DistilBERT
Text Features: TF-IDF Vectorization
Frontend: Streamlit
Dataset: Labelled Real and Fake News Articles (balanced)
Difficulty: Intermediate to Advanced
Best For: BCA, MCA, B.Tech CS/IT Final Year Students Globally

Key Features

  • Three ML models in one project — Logistic Regression, Linear SVM, and DistilBERT all trained and deployed side by side, letting students compare performance across classical ML and Deep Learning approaches
  • Single article prediction — paste any news article text and get an instant REAL or FAKE prediction with a confidence score from any of the three models
  • Batch CSV upload — upload a CSV file of multiple articles for bulk analysis, making it easy to demo large-scale detection during viva
  • Model comparison dashboard — compare predictions from Logistic Regression, SVM, and BERT side by side on the same article to understand how different approaches handle the same text
  • Download prediction results — export bulk analysis results as a CSV file for documentation and project reports
  • Clean Streamlit web interface — fully built in Python with no HTML or CSS required; shows students how to deploy an ML model as a working web application
  • Balanced real-world dataset — trained on labelled real news from trusted sources and fake news from flagged websites with equal class distribution to avoid biased predictions

Technologies Used

Layer | Technology | Purpose
Language | Python 3.8+ | Core ML pipeline, preprocessing, and model training
NLP Features | TF-IDF Vectorizer (scikit-learn) | Convert news text into numerical feature vectors
Model 1 | Logistic Regression (scikit-learn) | Fast, lightweight binary classification baseline
Model 2 | Linear SVM (scikit-learn) | Higher accuracy on large noisy text feature spaces
Model 3 | DistilBERT (HuggingFace Transformers) | Transformer-based deep learning for contextual understanding
Deep Learning | PyTorch (torch) | Backend engine for running the DistilBERT model
Web Interface | Streamlit | Deploy and interact with the ML model via a browser
Data | pandas + numpy | Dataset loading, preprocessing, and manipulation

How the Three Models Work

Model 1 — TF-IDF + Logistic Regression

Text is converted into a TF-IDF matrix in which each word’s score rises with how often it appears in an article and falls with how many articles in the dataset contain it, so very common words carry little weight. Logistic Regression then performs binary classification (REAL or FAKE) on these features and outputs a probability that can serve as the confidence score. It is fast, lightweight, and easy to explain in viva, which makes it a good baseline model for students just starting with NLP.
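To make the scoring concrete, here is a minimal pure-Python sketch of TF-IDF. The function name tf_idf and the three sample headlines are made up for illustration. The IDF term uses the smoothed form that scikit-learn applies by default, but scikit-learn's TfidfVectorizer also works on raw counts and L2-normalizes each row, so its numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document (illustrative helper)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        doc_scores = {}
        for term, count in counts.items():
            tf = count / len(tokens)  # share of this document taken by the term
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            doc_scores[term] = tf * idf
        scores.append(doc_scores)
    return scores


# Hypothetical sample headlines, not taken from the project dataset.
docs = [
    "officials confirm the budget report",
    "shocking miracle cure stuns the doctors",
    "the council publishes the annual report",
]
weights = tf_idf(docs)
```

In the second headline, "the" appears in every document and "miracle" in only one, so weights[1]["miracle"] comes out higher than weights[1]["the"] even though both words occur once.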

Model 2 — TF-IDF + Linear SVM

The same TF-IDF features are passed to a Linear Support Vector Machine, which finds the maximum-margin hyperplane separating fake and real news in the high-dimensional feature space. Linear SVMs are well suited to sparse, high-dimensional text data, and in this project the SVM delivers noticeably higher accuracy than Logistic Regression on the news dataset.
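A minimal sketch of this model with scikit-learn, using four made-up headlines rather than the project dataset. The names svm_model, margin, and prediction are illustrative. Note that LinearSVC has no predict_proba method, so the confidence score shown in the app would need an extra step such as probability calibration; how the project does that is not stated here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy data: 1 = FAKE, 0 = REAL.
texts = [
    "shocking miracle cure doctors hate",
    "secret miracle trick shocking results",
    "council approves annual budget report",
    "ministry publishes quarterly budget report",
]
labels = [1, 1, 0, 0]

# TF-IDF features feed straight into the linear SVM.
svm_model = make_pipeline(TfidfVectorizer(), LinearSVC(random_state=0))
svm_model.fit(texts, labels)

# decision_function returns a signed score: positive values fall on the
# FAKE side of the hyperplane, negative values on the REAL side.
margin = svm_model.decision_function(["shocking miracle cure"])[0]
prediction = "FAKE" if margin > 0 else "REAL"
```

The size of the margin indicates how far the article sits from the decision boundary, but it is not a probability.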

Model 3 — DistilBERT (Deep Learning)

DistilBERT is a lighter, faster distilled version of Google’s BERT transformer model. Unlike TF-IDF, which only scores word frequencies, it builds context-dependent representations of words, so it can pick up on sentence structure, tone, and language patterns that TF-IDF misses. In this project it gives the highest accuracy of the three models, which makes it a good fit for students who want to explore Deep Learning and transformer architecture.
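One concrete thing word-frequency features cannot see is word order. The small sketch below, with two made-up sentences and an illustrative bag_of_words helper, shows that single-word counts (the default unit for scikit-learn's TfidfVectorizer) are identical for two sentences with opposite meanings. A contextual model like DistilBERT reads the tokens in sequence and can tell them apart.

```python
from collections import Counter


def bag_of_words(text):
    """Count single words, ignoring their order (illustrative helper)."""
    return Counter(text.lower().split())


# Hypothetical sentences: same words, opposite meaning.
a = "the mayor accused the reporter of lying"
b = "the reporter accused the mayor of lying"

# Both sentences produce exactly the same word counts, so any model built
# only on single-word frequencies receives identical input for them.
same_features = bag_of_words(a) == bag_of_words(b)
```

Adding word pairs (bigrams) to TF-IDF recovers some local order, but not the sentence-level context a transformer models.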


How It Works

Single article prediction flow

  1. User opens the Streamlit web interface in the browser at http://localhost:8501
  2. User pastes a news article into the text input box and selects the model (LR, SVM, or BERT)
  3. The text is cleaned: lowercased, punctuation stripped, and stopwords removed
  4. For LR and SVM, the cleaned text is vectorized using the saved TF-IDF transformer; for BERT, it is tokenized using the DistilBERT tokenizer
  5. The selected model runs the prediction and returns REAL or FAKE with a confidence percentage displayed on screen
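The cleaning step in this flow could look roughly like the sketch below. The function name clean_text and the tiny stopword set are assumptions for illustration. The actual project may use a fuller stopword list from an NLP library.

```python
import string

# Tiny illustrative stopword list; a real project would likely use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on", "that"}

# Translation table that deletes ASCII punctuation characters.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text):
    """Lowercase, strip ASCII punctuation, and drop stopwords."""
    text = text.lower()
    text = text.translate(_PUNCT_TABLE)
    tokens = [token for token in text.split() if token not in STOPWORDS]
    return " ".join(tokens)


cleaned = clean_text("BREAKING: The Senate passed the bill on Tuesday!")
```

The same function has to be applied at training time and at prediction time, otherwise the saved TF-IDF vocabulary will not match the incoming text.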

Batch CSV prediction flow

  1. User uploads a CSV file containing a column of news article texts
  2. The app reads each row, preprocesses the text, and runs the selected model on every article
  3. Results are shown in a table — article text, predicted label (REAL/FAKE), and confidence score for each row
  4. User clicks Download Results to export the prediction table as a CSV file for project documentation
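The batch flow above can be sketched with Python's standard csv module. Everything named here is hypothetical: predict is a keyword-based stand-in for whichever trained model is selected, and the column name "text" is an assumption about the uploaded file. In the Streamlit app, the input would presumably come from st.file_uploader and the output would go to st.download_button.

```python
import csv
import io


def predict(text):
    """Hypothetical stand-in for the selected model: returns (label, confidence)."""
    is_fake = "miracle" in text.lower()
    return ("FAKE", 0.90) if is_fake else ("REAL", 0.80)


def predict_batch(csv_text, text_column="text"):
    """Read articles from CSV text and return a results CSV as a string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[text_column, "label", "confidence"],
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        label, confidence = predict(row[text_column])
        writer.writerow(
            {
                text_column: row[text_column],
                "label": label,
                "confidence": f"{confidence:.2f}",
            }
        )
    return out.getvalue()


uploaded = "text\nMiracle cure found in kitchen\nCouncil approves annual budget\n"
results_csv = predict_batch(uploaded)
```

Returning the results as a string keeps the function independent of the web framework, so the same code can be reused from a script or a notebook.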

Model training pipeline

  1. Raw dataset of labelled real and fake news articles is loaded using pandas
  2. Text is preprocessed — tokenized, stopwords removed, TF-IDF vectorizer fitted on the training split
  3. Logistic Regression and SVM are trained on TF-IDF features using scikit-learn’s fit() method
  4. DistilBERT is fine-tuned on the same labelled dataset using HuggingFace Transformers and PyTorch
  5. All three trained models and the TF-IDF vectorizer are saved to disk using pickle / joblib for loading at inference time
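For the classical part of this pipeline, a minimal sketch with scikit-learn and joblib might look like the following. The eight headlines, the file names tfidf.joblib and logreg.joblib, and the temporary output folder are all assumptions for illustration. The real train.py, dataset columns, and DistilBERT fine-tuning step are not shown here.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy dataset: 1 = FAKE, 0 = REAL.
texts = [
    "shocking miracle cure doctors hate",
    "shocking miracle diet melts fat overnight",
    "miracle pill shocking secret revealed",
    "shocking miracle cure banned by elites",
    "council approves annual budget report",
    "ministry publishes quarterly budget report",
    "court upholds ruling in budget dispute",
    "parliament debates budget report today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit the vectorizer on the training split only, then reuse it for the test split.
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train_vec, y_train)

# Save both artifacts, then load them back as the app would at inference time.
model_dir = tempfile.mkdtemp()
joblib.dump(vectorizer, os.path.join(model_dir, "tfidf.joblib"))
joblib.dump(clf, os.path.join(model_dir, "logreg.joblib"))

loaded_vec = joblib.load(os.path.join(model_dir, "tfidf.joblib"))
loaded_clf = joblib.load(os.path.join(model_dir, "logreg.joblib"))

proba = loaded_clf.predict_proba(loaded_vec.transform(["shocking miracle cure"]))[0]
label = "FAKE" if proba[1] >= 0.5 else "REAL"
```

Saving the fitted vectorizer alongside the classifier matters because the classifier's weights only make sense for the exact vocabulary learned during training.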

How to Run This Project

Step 1 — Enter the project folder

cd fake-news-detection

Step 2 — Create a virtual environment (recommended)

# Windows
python -m venv venv
venv\Scripts\activate

# macOS / Linux
python3 -m venv venv
source venv/bin/activate

Step 3 — Install all dependencies

pip install -r requirements.txt

The requirements.txt includes all packages needed:

pandas
numpy
scikit-learn
transformers
torch
streamlit

Step 4 — Train the models (first-time only)

python train.py

This trains all three models on the dataset and saves the trained files to the models/ folder.

Step 5 — Launch the Streamlit app

streamlit run app.py

Open http://localhost:8501 in your browser. Paste any news article and get a REAL or FAKE prediction instantly.


Download Full Source Code


Get the complete project with full source code, dataset, trained model files, and setup guide. Remote support is available if you face any issues during installation or deployment.


Future Scope and Improvements

  • Multi-language detection — extend the model beyond English to detect fake news in Hindi, Spanish, or other languages
  • Image and video analysis — detect fake news embedded in media content using computer vision
  • Source credibility scoring — rank news sources by historical reliability and flag low-trust domains
  • Browser extension — detect fake news in real time as users browse news websites
  • AI-generated content detection — flag articles written by ChatGPT or other generative AI tools
  • Research paper scope — the three-model comparison makes this project suitable for publishing in college journals or IEEE conferences

Why This is a Great Final Year Project

  • Three ML models in one codebase — Logistic Regression, SVM, and DistilBERT together make this project stand out from every single-model submission in your class
  • NLP fundamentals fully covered — tokenization, stopword removal, TF-IDF vectorization, and transformer tokenization are all demonstrated in real working code
  • Streamlit deployment is a highly demanded, industry-relevant skill that most students never learn — this project teaches it from scratch
  • BERT and transformer architecture shows examiners you understand modern deep learning, not just classical ML algorithms
  • Real-world problem — fake news detection is actively researched by companies like Google, Meta, and Twitter; building it as a student project directly connects to what the industry needs
  • Batch CSV analysis and downloadable results make the project feel like a complete product, not just a notebook
  • Strong resume value — Python + NLP + Transformers + Streamlit deployment is a data science and AI skill set that companies actively hire for
