Fake Review Detection System using NLP and ML
🔍 Introduction
In the age of online shopping, product reviews play a crucial role in influencing customer decisions. However, not all reviews are genuine: some are fake, written either to promote a product or to sabotage a competitor. To tackle this issue, we developed a Fake Review Detection System using Natural Language Processing (NLP) and Machine Learning, wrapped in a user-friendly Streamlit web interface.
This project enables users to upload a CSV file of product reviews and receive two separate downloadable files: one containing real reviews and the other with fake reviews.
🎓 What You Will Learn
- How to preprocess review data
- How to train an NLP-based ML model
- How to classify fake vs. real reviews
- How to create a web interface using Streamlit
- How to handle file upload and download in a web app
🏠 Tech Stack
- Frontend: Streamlit (Python-based web framework)
- Backend: Logistic Regression with TF-IDF Vectorizer
- Language: Python
- Libraries: Pandas, NumPy, scikit-learn, re, string
🌐 Streamlit App Flow
1. Upload the CSV file
Users upload a CSV containing product review data.
2. NLP Model Processes Reviews
The system preprocesses the text (lowercasing, removing punctuation and digits) and classifies each review with a TF-IDF + Logistic Regression model that has already been trained on labeled data.
3. Download the Results
Two downloadable CSVs are generated: real_reviews.csv and fake_reviews.csv.
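The preprocessing in step 2 can be sketched with only the standard library. This is a minimal illustration, not the app's actual code: the function name clean_text is hypothetical, and collapsing extra whitespace is an added assumption beyond the three steps named above.

```python
import re
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_text(text: str) -> str:
    """Lowercase the text, drop punctuation and digits, tidy whitespace."""
    text = str(text).lower()
    text = text.translate(_PUNCT_TABLE)   # remove punctuation
    text = re.sub(r"\d+", "", text)       # remove digits
    # Assumption: collapse the gaps left behind by the removed characters.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("Love this! Well made, sturdy. 5 stars"))
# love this well made sturdy stars
```

Note that deleting punctuation outright joins hyphenated words ("well-made" becomes "wellmade"); replacing punctuation with a space instead is a reasonable alternative.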
📂 Required CSV Format
Ensure your file follows this structure:
| category | rating | label | text_ |
|---|---|---|---|
| Home_and_Kitchen_5 | 5 | CG | Love this! Well made, sturdy. |
| Home_and_Kitchen_5 | 1 | OR | Missing information on how to use it. |
- category: Product category
- rating: Star rating (1-5)
- label: CG for genuine, OR for other
- text_: The review content
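A column check for this format might look like the sketch below. It assumes Pandas (listed in the tech stack); the names REQUIRED_COLUMNS and load_reviews are hypothetical and chosen for illustration, and the sample rows are the two from the table above.

```python
import io

import pandas as pd

# Column names taken from the required CSV format described above.
REQUIRED_COLUMNS = ["category", "rating", "label", "text_"]


def load_reviews(csv_source) -> pd.DataFrame:
    """Read a CSV and fail early if any required column is missing."""
    df = pd.read_csv(csv_source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"Missing required column(s): {', '.join(missing)}")
    return df


# In-memory stand-in for an uploaded file.
sample = io.StringIO(
    "category,rating,label,text_\n"
    'Home_and_Kitchen_5,5,CG,"Love this! Well made, sturdy."\n'
    "Home_and_Kitchen_5,1,OR,Missing information on how to use it.\n"
)
reviews = load_reviews(sample)
print(reviews.shape)  # (2, 4)
```

Raising a ValueError with the missing column names lets the web interface show the user exactly what to fix in their file.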
🔮 How Fake/Real Is Determined
We used TF-IDF (Term Frequency-Inverse Document Frequency) to transform text data into numerical vectors. Then, we trained a Logistic Regression model using labeled data:
- Label CG is treated as real (1)
- Others are treated as fake (0)
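As a rough sketch of this training step, the snippet below fits a TF-IDF + Logistic Regression pipeline with scikit-learn. The four texts are toy data made up for illustration (the first two come from the sample table), the label mapping follows the convention stated above, and the hyperparameters are library defaults rather than the app's actual settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only; a real model needs far more rows.
texts = [
    "love this well made sturdy",
    "missing information on how to use it",
    "great quality works perfectly",
    "stopped working after two days",
]
labels = ["CG", "OR", "CG", "OR"]

# Mapping as described above: CG -> real (1), anything else -> fake (0).
# Adjust this if your dataset defines the labels differently.
y = [1 if label == "CG" else 0 for label in labels]

# The pipeline vectorizes the text with TF-IDF, then fits the classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, y)

new_reviews = ["sturdy and well made"]
predictions = model.predict(new_reviews)        # array of 0s and 1s
probabilities = model.predict_proba(new_reviews)  # columns: P(fake), P(real)
print(predictions, probabilities.shape)
```

Wrapping both stages in one pipeline means new reviews are vectorized with the same vocabulary that was learned during training.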
🌐 Full Streamlit Code Overview
The app is a single Python file:
- Loads and trains a model using a sample dataset
- Allows CSV upload
- Validates column structure
- Preprocesses text
- Classifies reviews
- Generates download buttons for real and fake reviews
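The final step in the list above, producing the two downloads, could be sketched as follows. The helper name split_reviews and the temporary prediction column are assumptions made for this example; the sketch uses Pandas only, so it can be tried outside Streamlit.

```python
import pandas as pd


def split_reviews(df: pd.DataFrame, predictions) -> tuple[bytes, bytes]:
    """Return (real_csv, fake_csv) as UTF-8 bytes; 1 = real, 0 = fake."""
    labelled = df.copy()
    labelled["prediction"] = list(predictions)  # one prediction per row

    real = labelled[labelled["prediction"] == 1].drop(columns="prediction")
    fake = labelled[labelled["prediction"] == 0].drop(columns="prediction")

    def to_csv_bytes(frame: pd.DataFrame) -> bytes:
        return frame.to_csv(index=False).encode("utf-8")

    return to_csv_bytes(real), to_csv_bytes(fake)


# Toy input for illustration only.
uploaded = pd.DataFrame({"text_": ["Great blender", "Stopped working"]})
real_csv, fake_csv = split_reviews(uploaded, [1, 0])
print(real_csv.decode("utf-8"))
```

In the Streamlit app, each byte string can be passed as the data argument of st.download_button, with file_name set to real_reviews.csv or fake_reviews.csv and mime set to "text/csv".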
🚀 How to Run the App
- Save the app as fake_review_app.py
- Install dependencies: pip install streamlit pandas numpy scikit-learn
- Run the Streamlit app: streamlit run fake_review_app.py
- Open your browser at http://localhost:8501
- Upload your review CSV and download results!
Report
The report will include:
✅ Abstract
✅ Introduction (Overview, Problem Statement, Motivation)
✅ Literature Review
✅ Existing System & Drawbacks
✅ Proposed System
✅ System Architecture (Diagrams)
✅ System Specifications
✅ Experimental Design Diagrams
✅ Implementation (Setup, Modules, Sample Code)
✅ System Testing
✅ Results & Screenshots
✅ Conclusion & Future Scope
✅ References