Fake Review Detection System using NLP and ML
Introduction
These days, product reviews play a big part in how people decide what to buy online. But some reviews are fake, written either to make a product look better or to harm others. To tackle this problem, we built a Fake Review Detection System using NLP and Machine Learning, with an easy-to-use web app made in Streamlit. Users upload a CSV file of product reviews, the system classifies each review, and it then offers two files for download: one with the real reviews and one with the fake ones.
What You Will Learn
- How to preprocess review data
- How to train an NLP-based ML model
- How to classify fake vs. real reviews
- How to create a web interface using Streamlit
- How to handle file upload and download in a web app
Tech Stack
- Frontend: Streamlit (Python-based web framework)
- Backend: Logistic Regression with TF-IDF Vectorizer
- Language: Python
- Libraries: Pandas, NumPy, scikit-learn, plus Python's standard-library modules re and string
Streamlit App Flow
1. Upload the CSV file
Users upload a CSV containing product review data.
2. NLP Model Processes Reviews
The system preprocesses the text (lowercasing, punctuation and digit removal) and uses a pre-trained TF-IDF + Logistic Regression model to classify reviews.
3. Download the Results
Two downloadable CSVs are generated: real_reviews.csv and fake_reviews.csv.
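The preprocessing in step 2 can be sketched as a small function. This is a minimal sketch assuming the app uses the `re` and `string` modules listed in the tech stack; the function name `preprocess` and the final whitespace-collapsing step are assumptions for illustration, not necessarily what the original app does.

```python
import re
import string


def preprocess(text: str) -> str:
    """Lowercase a review and strip punctuation and digits (hypothetical helper)."""
    text = str(text).lower()
    # Remove every ASCII punctuation character listed in string.punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove digits.
    text = re.sub(r"\d+", "", text)
    # Assumption: collapse the extra spaces left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()


print(preprocess("Love this! Well made, sturdy. 10/10"))  # love this well made sturdy
```

Applying the same function to both the training data and uploaded reviews keeps the two vocabularies consistent.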
Required CSV Format
Ensure your file follows this structure:
category | rating | label | text_ |
---|---|---|---|
Home_and_Kitchen_5 | 5 | CG | Love this! Well made, sturdy. |
Home_and_Kitchen_5 | 1 | OR | Missing information on how to use it. |
- category: Product category
- rating: Star rating (1-5)
- label: CG for genuine, OR for other
- text_: The review content
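A column check like the one the app performs on upload could look like the sketch below. It assumes all four columns above are mandatory; the names `REQUIRED_COLUMNS` and `missing_columns` are hypothetical, and the sample CSV simply reuses the two example rows from the table.

```python
import io

import pandas as pd

# Columns listed in the required CSV format (assumed to be mandatory).
REQUIRED_COLUMNS = {"category", "rating", "label", "text_"}


def missing_columns(df: pd.DataFrame) -> list[str]:
    """Return the required columns absent from df, sorted; empty means valid."""
    return sorted(REQUIRED_COLUMNS - set(df.columns))


sample = io.StringIO(
    "category,rating,label,text_\n"
    'Home_and_Kitchen_5,5,CG,"Love this! Well made, sturdy."\n'
    "Home_and_Kitchen_5,1,OR,Missing information on how to use it.\n"
)
df = pd.read_csv(sample)
print(missing_columns(df))  # []
print(missing_columns(df.drop(columns=["text_"])))  # ['text_']
```

Note that review text containing commas must be quoted in the CSV, as in the first row, or pandas will split it into extra fields.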
How Fake/Real Is Determined
We used TF-IDF (Term Frequency-Inverse Document Frequency) to transform text data into numerical vectors. Then, we trained a Logistic Regression model using labeled data:
- Label CG is treated as real (1)
- All other labels are treated as fake (0)
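A minimal sketch of this training setup with scikit-learn, assuming a `TfidfVectorizer` feeding a `LogisticRegression` inside a pipeline. The helper name `train_model`, the `max_iter` value and the four-review toy dataset are assumptions for illustration; the real app trains on its own sample dataset and applies its own preprocessing first.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def train_model(df: pd.DataFrame):
    """Fit a TF-IDF + Logistic Regression pipeline on labeled reviews."""
    # Label convention from this article: CG -> real (1), anything else -> fake (0).
    y = (df["label"] == "CG").astype(int)
    model = make_pipeline(
        TfidfVectorizer(),  # lowercases by default, outputs L2-normalized TF-IDF rows
        LogisticRegression(max_iter=1000),
    )
    model.fit(df["text_"].astype(str), y)
    return model


# Hypothetical four-review toy set standing in for the app's sample dataset.
toy = pd.DataFrame({
    "label": ["CG", "CG", "OR", "OR"],
    "text_": [
        "love this well made sturdy",
        "great quality love it",
        "missing information on how to use it",
        "broke after one day missing parts",
    ],
})
model = train_model(toy)
print(model.predict(["love it great quality"]))
```

By default, scikit-learn computes a smoothed inverse document frequency, idf(t) = ln((1 + n) / (1 + df(t))) + 1, so a word that appears in every review still gets a small nonzero weight.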
Full Streamlit Code Overview
The app is a single Python file:
- Loads and trains a model using a sample dataset
- Allows CSV upload
- Validates column structure
- Preprocesses text
- Classifies reviews
- Generates download buttons for real and fake reviews
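The classify-and-split step behind the two download buttons can be sketched without Streamlit itself. This minimal sketch assumes a fitted scikit-learn pipeline; the function name `split_reviews` and the toy training and upload data are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def split_reviews(df: pd.DataFrame, model) -> tuple[str, str]:
    """Classify each row's text_ and return (real_csv, fake_csv) as CSV strings."""
    preds = model.predict(df["text_"].astype(str))
    real = df.loc[preds == 1]  # rows predicted real (1)
    fake = df.loc[preds == 0]  # rows predicted fake (0)
    return real.to_csv(index=False), fake.to_csv(index=False)


# Hypothetical training data; the real app trains on its own sample dataset.
train = pd.DataFrame({
    "label": ["CG", "CG", "OR", "OR"],
    "text_": [
        "love this well made sturdy",
        "great quality love it",
        "missing information on how to use it",
        "broke after one day missing parts",
    ],
})
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train["text_"], (train["label"] == "CG").astype(int))

# Hypothetical uploaded file, already loaded into a DataFrame.
uploaded = pd.DataFrame({
    "category": ["Home_and_Kitchen_5", "Home_and_Kitchen_5"],
    "rating": [5, 1],
    "label": ["CG", "OR"],
    "text_": ["great quality love it", "missing information on how to use it"],
})
real_csv, fake_csv = split_reviews(uploaded, model)
print(real_csv)
```

In the Streamlit app, each string could then be passed to `st.download_button`, for example `st.download_button("Download real reviews", real_csv, file_name="real_reviews.csv", mime="text/csv")`.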
How to Run the App
- Save the app as fake_review_app.py
- Install dependencies:
  pip install streamlit pandas numpy scikit-learn
- Run the Streamlit app:
  streamlit run fake_review_app.py
- Open your browser at http://localhost:8501 (Streamlit's default port)
- Upload your review CSV and download the results!
Report
The report will include:
✅ Abstract
✅ Introduction (Overview, Problem Statement, Motivation)
✅ Literature Review
✅ Existing System & Drawbacks
✅ Proposed System
✅ System Architecture (Diagrams)
✅ System Specifications
✅ Experimental Design Diagrams
✅ Implementation (Setup, Modules, Sample Code)
✅ System Testing
✅ Results & Screenshots
✅ Conclusion & Future Scope
✅ References