Fake Review Detection System using NLP and ML

🔍 Introduction

In the age of online shopping, product reviews play a crucial role in influencing customer decisions. However, not all reviews are genuine: some are fake, written either to promote a product or to sabotage a competitor. To tackle this issue, we developed a Fake Review Detection System using Natural Language Processing (NLP) and Machine Learning, wrapped in a user-friendly Streamlit web interface.

This project enables users to upload a CSV file of product reviews and receive two separate downloadable files: one containing the reviews classified as real and the other containing those classified as fake.


🎓 What You Will Learn

  • How to preprocess review data
  • How to train an NLP-based ML model
  • How to classify fake vs. real reviews
  • How to create a web interface using Streamlit
  • How to handle file upload and download in a web app

🏠 Tech Stack

  • Frontend: Streamlit (Python-based web framework)
  • Backend: Logistic Regression with TF-IDF Vectorizer
  • Language: Python
  • Libraries: Pandas, NumPy, scikit-learn, re, string

🌐 Streamlit App Flow

1. Upload the CSV file

Users upload a CSV containing product review data.

2. NLP Model Processes Reviews

The system preprocesses the text (lowercasing, punctuation and digit removal) and uses a pre-trained TF-IDF + Logistic Regression model to classify reviews.
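
To make the preprocessing step concrete, here is a minimal sketch that uses only the re and string modules from the library list. The function name preprocess and the final whitespace cleanup are assumptions made for illustration; the app's actual code may differ.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text) -> str:
    """Lowercase the text, then strip punctuation and digits."""
    text = str(text).lower()
    text = text.translate(_PUNCT_TABLE)  # remove punctuation
    text = re.sub(r"\d+", "", text)      # remove digits
    # Collapsing leftover whitespace is an extra step assumed here for tidiness.
    return re.sub(r"\s+", " ", text).strip()


cleaned = preprocess("Love this! Well made, sturdy.")
```

Applying the same function to both the training text and the uploaded reviews keeps the two consistent.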

3. Download the Results

Two downloadable CSVs are generated: real_reviews.csv and fake_reviews.csv.
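
One way the split into two files could work is sketched below. The helper name split_to_csv and the temporary prediction column are hypothetical; the sketch only assumes that the model outputs 1 for real and 0 for fake, as described in this article.

```python
import pandas as pd


def split_to_csv(df: pd.DataFrame, predictions) -> tuple[bytes, bytes]:
    """Return (real_csv, fake_csv) as UTF-8 bytes, where 1 = real and 0 = fake."""
    flagged = df.assign(prediction=list(predictions))
    real = flagged[flagged["prediction"] == 1].drop(columns="prediction")
    fake = flagged[flagged["prediction"] == 0].drop(columns="prediction")
    return (
        real.to_csv(index=False).encode("utf-8"),
        fake.to_csv(index=False).encode("utf-8"),
    )


# Demo with the two example reviews from this article and hand-written predictions.
reviews = pd.DataFrame(
    {"text_": ["Love this! Well made, sturdy.", "Missing information on how to use it."]}
)
real_csv, fake_csv = split_to_csv(reviews, [1, 0])
```

In a Streamlit app, each payload could then be passed to st.download_button together with file_name="real_reviews.csv" or file_name="fake_reviews.csv" and mime="text/csv".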


📂 Required CSV Format

Ensure your file follows this structure:

category | rating | label | text_
Home_and_Kitchen_5 | 5 | CG | Love this! Well made, sturdy.
Home_and_Kitchen_5 | 1 | OR | Missing information on how to use it.

  • category: Product category
  • rating: Star rating (1-5)
  • label: CG for genuine, OR for other
  • text_: The review content
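
A column check along these lines could catch badly formatted uploads early. This is a minimal sketch: the helper name load_reviews and the error message are assumptions, and the demo rows are the two examples shown above.

```python
import io

import pandas as pd

REQUIRED_COLUMNS = ["category", "rating", "label", "text_"]


def load_reviews(csv_source) -> pd.DataFrame:
    """Read a review CSV and fail fast if a required column is missing."""
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    return df


# Demo: the two example rows from this article, written as CSV text.
sample = io.StringIO(
    "category,rating,label,text_\n"
    'Home_and_Kitchen_5,5,CG,"Love this! Well made, sturdy."\n'
    "Home_and_Kitchen_5,1,OR,Missing information on how to use it.\n"
)
reviews = load_reviews(sample)
```

Note that a review containing a comma has to be quoted in the CSV, as in the first demo row.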

🔮 How Fake/Real Is Determined

We used TF-IDF (Term Frequency-Inverse Document Frequency) to transform text data into numerical vectors. Then, we trained a Logistic Regression model using labeled data:

  • Label CG is treated as real (1)
  • Others are treated as fake (0)
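
The training step could look roughly like the sketch below, which chains scikit-learn's TfidfVectorizer and LogisticRegression in a pipeline. The function name train_model, the max_iter setting, and the four-row toy dataset are assumptions for illustration only (two of the toy reviews are made up); a real model needs a much larger labeled dataset.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_model(df: pd.DataFrame):
    """Fit TF-IDF + Logistic Regression on labeled reviews."""
    # Label mapping as described above: CG -> 1 (real), anything else -> 0 (fake).
    y = (df["label"] == "CG").astype(int)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(df["text_"], y)
    return model


# Toy placeholder data, far too small for a useful model.
toy = pd.DataFrame(
    {
        "label": ["CG", "CG", "OR", "OR"],
        "text_": [
            "Love this! Well made, sturdy.",
            "Great quality and fast shipping.",
            "Missing information on how to use it.",
            "Stopped working after a week.",
        ],
    }
)
model = train_model(toy)
preds = model.predict(["Well made and sturdy."])  # array containing a single 0 or 1
```

Bundling the vectorizer and classifier in one pipeline means the same TF-IDF vocabulary is reused automatically at prediction time.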

🌐 Full Streamlit Code Overview

The app is a single Python file:

  • Loads and trains a model using a sample dataset
  • Allows CSV upload
  • Validates column structure
  • Preprocesses text
  • Classifies reviews
  • Generates download buttons for real and fake reviews
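
Put together, the steps above might be arranged as in the following skeleton. It is a sketch under stated assumptions, not the app's actual source: the helper names, the widget labels, and the sample dataset path sample_reviews.csv are all hypothetical.

```python
import re
import string

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

REQUIRED_COLUMNS = ["category", "rating", "label", "text_"]
SAMPLE_PATH = "sample_reviews.csv"  # hypothetical file name for the labeled sample dataset


def preprocess(text) -> str:
    """Lowercase, then remove punctuation and digits."""
    text = str(text).lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\d+", "", text)


def train_model(sample: pd.DataFrame):
    """Fit TF-IDF + Logistic Regression; CG -> 1 (real), others -> 0 (fake)."""
    y = (sample["label"] == "CG").astype(int)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    return model.fit(sample["text_"].map(preprocess), y)


def classify(model, df: pd.DataFrame):
    """Return (real, fake) DataFrames based on the model's predictions."""
    preds = model.predict(df["text_"].map(preprocess))
    return df[preds == 1], df[preds == 0]


def main():
    # Imported here so the helpers above can be reused without Streamlit installed.
    import streamlit as st

    st.title("Fake Review Detection System")
    uploaded = st.file_uploader("Upload a review CSV", type="csv")
    if uploaded is None:
        return  # nothing to do until a file is uploaded

    df = pd.read_csv(uploaded)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        st.error(f"Missing required columns: {missing}")
        return

    # The model is retrained on every rerun here, purely to keep the sketch short.
    model = train_model(pd.read_csv(SAMPLE_PATH))
    real, fake = classify(model, df)
    st.download_button("Download real reviews", real.to_csv(index=False),
                       file_name="real_reviews.csv", mime="text/csv")
    st.download_button("Download fake reviews", fake.to_csv(index=False),
                       file_name="fake_reviews.csv", mime="text/csv")


if __name__ == "__main__":
    main()
```

Keeping the data logic in plain functions and the Streamlit calls in main makes the classification code easy to exercise without launching the web interface.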

🚀 How to Run the App

  1. Save the app as fake_review_app.py
  2. Install dependencies:
     pip install streamlit pandas numpy scikit-learn
  3. Run the Streamlit app:
     streamlit run fake_review_app.py
  4. Open your browser at http://localhost:8501
  5. Upload your review CSV and download results!

Report

The report will include:

  • Abstract
  • Introduction (Overview, Problem Statement, Motivation)
  • Literature Review
  • Existing System & Drawbacks
  • Proposed System
  • System Architecture (Diagrams)
  • System Specifications
  • Experimental Design Diagrams
  • Implementation (Setup, Modules, Sample Code)
  • System Testing
  • Results & Screenshots
  • Conclusion & Future Scope
  • References

[Images: Fake Review Detection System]