Fake Review Detection System using NLP and ML
 Introduction
These days, when people shop online, product reviews really help them decide what to buy. But some reviews are fake — either to make a product look better or to harm others.To solve this problem, we made a Fake Review Detection System using NLP and Machine Learning. It comes with an easy-to-use web app made with Streamlit.Users can upload a CSV file with product reviews, and the system will check them. Then, it gives two download files — one with real reviews and one with fake ones.
What You Will Learn
- How to preprocess review data
- How to train an NLP-based ML model
- How to classify fake vs. real reviews
- How to create a web interface using Streamlit
- How to handle file upload and download in a web app
Heart Attack Prediction Using Machine Learning : Click here
Tech Stack
- Frontend: Streamlit (Python-based web framework)
- Backend: Logistic Regression with TF-IDF Vectorizer
- Language: Python
- Libraries: Pandas, NumPy, scikit-learn, re, string
Streamlit App Flow
1. Upload the CSV file
Users upload a CSV containing product review data.
2. NLP Model Processes Reviews
The system preprocesses the text (lowercasing, punctuation and digit removal) and uses a pre-trained TF-IDF + Logistic Regression model to classify reviews.
3. Download the Results
Two downloadable CSVs are generated: real_reviews.csv and fake_reviews.csv.
New Real World Projects : Click Here
Required CSV Format
Ensure your file follows this structure:
| category | rating | label | text_ |
|---|---|---|---|
| Home_and_Kitchen_5 | 5 | CG | Love this! Well made, sturdy. |
| Home_and_Kitchen_5 | 1 | OR | Missing information on how to use it. |
category: Product categoryrating: Star rating (1-5)label: CG for genuine, OR for othertext_: The review content
How Fake/Real Is Determined
We used TF-IDF (Term Frequency-Inverse Document Frequency) to transform text data into numerical vectors. Then, we trained a Logistic Regression model using labeled data:
- Label
CGis treated as real (1) - Others are treated as fake (0)
Full Streamlit Code Overview
The app is a single Python file:
- Loads and trains a model using a sample dataset
- Allows CSV upload
- Validates column structure
- Preprocesses text
- Classifies reviews
- Generates download buttons for real and fake reviews
How to Run the App
- Save the app as
fake_review_app.py - Install dependencies:
pip install streamlit pandas numpy scikit-learn
- Run the Streamlit app:
streamlit run fake_review_app.py
- Open your browser at
http://localhost:8501 - Upload your review CSV and download results!
Report Structure
✅ Abstract
-
Short summary of the project (100–200 words).
-
Should include objective, methodology (ML/Deep Learning/Tools), and key results.
✅ Introduction
-
Overview: General context (e.g., healthcare, environment, salary prediction, etc.).
-
Problem Statement: The main issue being solved.
-
Motivation: Why this project is important and useful.
✅ Literature Review
-
Previous works, research papers, or existing studies in the same area.
-
Comparisons of different approaches.
-
Gaps in existing research that your project addresses.
✅ Existing System & Drawbacks
-
How current systems/methods work.
-
Their limitations (accuracy, scalability, cost, user experience, etc.).
✅ Proposed System
-
Description of your solution.
-
Features, advantages, and innovations over existing system.
✅ System Architecture (Diagrams)
-
High-level architecture diagram.
-
Data flow diagram (DFD).
-
Use-case diagram (if needed).
✅ System Specifications
-
Hardware Requirements: CPU, RAM, GPU, Storage.
-
Software Requirements: OS, Python, Libraries, Frameworks, Database.
✅ Experimental Design Diagrams
-
Workflow diagrams showing data preprocessing, training, testing, and prediction pipeline.
-
Flowchart/ER diagram (if using database).
✅ Implementation
-
Setup process (environment installation).
-
Explanation of modules.
-
Sample code snippets with explanation.
✅ System Testing
-
Types of testing done (Unit, Integration, System, User testing).
-
Test cases and results.
✅ Results & Screenshots
-
Output graphs, charts, prediction results.
-
Screenshots of web/app interface.
✅ Conclusion & Future Scope
-
Summary of achievements.
-
Limitations of the project.
-
Future improvements (adding more features, real-time deployment, cloud integration).
✅ References
-
Research papers, articles, official documentation, datasets.
-
Follow IEEE or APA referencing style.
