Apriori Algorithm in Machine Learning

When you walk into a supermarket and spot a combo offer like “Buy Bread and Get Butter at 50% Off,” there’s a good chance the offer came out of association rule mining, and the Apriori algorithm is one of its fundamental techniques. The algorithm digs into transaction data to uncover relationships between items, answering questions like: Which items are most frequently purchased together? What patterns of behavior can be identified across thousands of customer transactions?

🌟 What is the Apriori Algorithm?

The Apriori algorithm, first presented by Rakesh Agrawal and Ramakrishnan Srikant in 1994, is used to find frequent itemsets and to produce association rules from them. It searches level by level (breadth-first), growing itemsets one item at a time, and uses a hash tree structure to count candidate itemsets efficiently.

It is widely used on large transactional datasets, particularly in:

  • Market basket analysis: finding items that are frequently purchased together.
  • Healthcare: finding connections between medications and therapies.

🧮 What is a Frequent Itemset?

A frequent itemset is a set of items that appear together in a transactional database at least as often as a user-defined minimum support threshold.

Example:

Let’s say we have:

  • Transaction A = {1,2,3,4,5}
  • Transaction B = {2,3,7}

Here, the itemset {2,3} appears in both transactions, so its support count is 2. It is considered a frequent itemset if that support meets the threshold.

Note: Understanding support, confidence, and lift is key to mastering Apriori and association rules.
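
As a rough illustration of the three measures, here is a minimal sketch in plain Python. The toy transactions and the helper names (support, confidence, lift) are made up for this example and are not part of any library.

```python
# Toy transactions (hypothetical data, not the article's dataset).
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support; values above 1 suggest a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)

print(f"support(bread, butter)     = {support({'bread', 'butter'}, transactions):.2f}")
print(f"confidence(bread → butter) = {confidence({'bread'}, {'butter'}, transactions):.2f}")
print(f"lift(bread → butter)       = {lift({'bread'}, {'butter'}, transactions):.2f}")
```

In this toy data, bread and butter appear together in 2 of 4 transactions (support 0.50), butter is bought in 2 of the 3 bread transactions (confidence about 0.67), and the lift is about 1.33.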

🪜 Steps of the Apriori Algorithm

  1. Compute the support of each candidate itemset and remove those below the minimum support.
  2. Build longer candidate itemsets (pairs, then triplets, and so on) from the frequent itemsets that remain, and repeat step 1.
  3. From the frequent itemsets, generate rules, compute each rule’s confidence, and eliminate rules below the minimum confidence.
  4. Sort the remaining rules by lift to prioritize the strongest associations.
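
The first two steps can be sketched as a level-wise loop. This is a simplified illustration under stated assumptions, not a production implementation: the function name frequent_itemsets is hypothetical, support is an absolute count, and candidates are counted with a plain scan rather than a hash tree.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Return {itemset: count} for every itemset found in at least `min_count` transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(current)

    k = 2
    while current:
        # Join: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: drop any candidate with an infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Count the surviving candidates and keep the frequent ones.
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_count}
        result.update(current)
        k += 1
    return result

# Hypothetical example data.
demo = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for itemset, count in sorted(frequent_itemsets(demo, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what gives the algorithm its name: any itemset with an infrequent subset cannot itself be frequent, so it is discarded before the database is scanned again.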

🧠 How Apriori Works: A Step-by-Step Example

Let’s walk through a hypothetical example:

Step 1: Generate C1 and L1

  • C1: Count the frequency (support) of individual items.
  • L1: Keep only the items whose support ≥ minimum support (say, a count of 2).
    Items with lower support, such as E in this example, are removed.
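
A minimal sketch of this step follows. The nine transactions are invented for illustration (the walkthrough does not list its data); they are chosen so that E falls below the threshold.

```python
from collections import Counter

# Hypothetical transactions over items A, B, C and E.
transactions = [
    {"A", "B", "C"}, {"A", "B", "C"},
    {"A", "B"}, {"A", "B"},
    {"B", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "C"},
    {"E"},
]
min_support = 2  # absolute count

# C1: support count of every individual item.
c1 = Counter(item for t in transactions for item in t)

# L1: only the items that meet the minimum support.
l1 = {item: count for item, count in c1.items() if count >= min_support}

print(sorted(c1.items()))  # [('A', 6), ('B', 6), ('C', 6), ('E', 1)]
print(sorted(l1))          # ['A', 'B', 'C']
```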

Step 2: Generate C2 and L2

  • Form candidate pairs from the frequent items in L1.
  • Count how often each pair appears in transactions.
  • Keep only frequent pairs in L2.

Step 3: Generate C3 and L3

  • Combine the frequent pairs in L2 into candidate triplets.
  • Find frequent triplets (e.g., {A, B, C}) that meet the support threshold.
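
Steps 2 and 3 might look like the sketch below. It assumes a made-up set of transactions in which A, B and C are the frequent single items; the helper name count is hypothetical. For brevity, candidate triplets are enumerated from the L1 items and then pruned against L2, which yields the same candidates as joining the frequent pairs.

```python
from itertools import combinations

# Hypothetical transactions; E is infrequent and was dropped in Step 1.
transactions = [
    {"A", "B", "C"}, {"A", "B", "C"},
    {"A", "B"}, {"A", "B"},
    {"B", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "C"},
    {"E"},
]
min_support = 2
l1 = ["A", "B", "C"]  # frequent single items from Step 1

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions)

# C2 -> L2: every pair of frequent items, filtered by support.
c2 = [frozenset(pair) for pair in combinations(l1, 2)]
l2 = {pair: count(pair) for pair in c2 if count(pair) >= min_support}

# C3: triplets whose 2-item subsets are all in L2 (the Apriori pruning rule).
c3 = [
    frozenset(triple) for triple in combinations(l1, 3)
    if all(frozenset(sub) in l2 for sub in combinations(triple, 2))
]
# L3: candidate triplets that meet the support threshold.
l3 = {triple: count(triple) for triple in c3 if count(triple) >= min_support}

print({tuple(sorted(k)): v for k, v in l2.items()})  # {('A', 'B'): 4, ('A', 'C'): 4, ('B', 'C'): 4}
print({tuple(sorted(k)): v for k, v in l3.items()})  # {('A', 'B', 'C'): 2}
```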

Step 4: Generate Association Rules

From {A, B, C}, generate rules and calculate the confidence of each:

  • A ∧ B → C
  • B ∧ C → A
  • A ∧ C → B

Example Output:

Rule          Confidence
A ∧ B → C     50%
B ∧ C → A     50%
A ∧ C → B     50%

Rules whose confidence meets the minimum confidence threshold are kept as strong association rules.
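
The rule-generation step can be sketched as follows. The transactions are hypothetical, chosen so that each two-item antecedent gives 50% confidence as in the table above; rules_from and count are illustrative names, not library functions.

```python
from itertools import combinations

# Hypothetical transactions (same toy shape as the walkthrough: A, B, C frequent, E rare).
transactions = [
    {"A", "B", "C"}, {"A", "B", "C"},
    {"A", "B"}, {"A", "B"},
    {"B", "C"}, {"B", "C"},
    {"A", "C"}, {"A", "C"},
    {"E"},
]

def count(itemset):
    """Number of transactions that contain every item of `itemset`."""
    return sum(frozenset(itemset) <= t for t in transactions)

def rules_from(itemset, min_confidence):
    """List (antecedent, consequent, confidence, lift) for rules that meet `min_confidence`."""
    itemset = frozenset(itemset)
    n = len(transactions)
    rules = []
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            consequent = itemset - antecedent
            conf = count(itemset) / count(antecedent)
            if conf >= min_confidence:
                lift = conf / (count(consequent) / n)
                rules.append((sorted(antecedent), sorted(consequent), conf, lift))
    return rules

for antecedent, consequent, conf, lift in rules_from({"A", "B", "C"}, min_confidence=0.5):
    print(f"{' & '.join(antecedent)} -> {' & '.join(consequent)} | "
          f"confidence {conf:.0%} | lift {lift:.2f}")
```

With this data, only the three rules with two-item antecedents reach 50% confidence; rules such as A → B ∧ C come in at about 33% and are dropped. The lift of the surviving rules is 0.75 in this toy set, a reminder that confidence alone does not show a positive association, which is why the rules are ranked by lift afterwards.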

✅ Advantages of the Apriori Algorithm

  • Simple to understand and implement
  • Pruning infrequent itemsets early keeps the search manageable on many real datasets
  • Easily integrates with business intelligence tools

❌ Disadvantages

  • Computationally expensive: requires multiple scans of the database
  • High worst-case time and space complexity: O(2^D), where D is the number of distinct items
  • Performance degrades as the number of items and transactions grows, especially with a low support threshold
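
To get a feel for the O(2^D) bound, this small sketch enumerates every possible non-empty itemset for a handful of items and then prints how quickly that count grows. The function name all_itemsets is made up for the example.

```python
from itertools import combinations

def all_itemsets(items):
    """Every non-empty subset of `items`, i.e. the worst-case candidate space."""
    items = sorted(items)
    return [set(c) for k in range(1, len(items) + 1) for c in combinations(items, k)]

print(len(all_itemsets("ABCD")))  # 15, which is 2**4 - 1

# The count doubles with every extra item.
for d in (10, 20, 30):
    print(d, "items ->", 2**d - 1, "possible itemsets")
```

Apriori rarely visits all of these in practice because pruning removes most candidates, but with many frequent items or a very low support threshold the candidate sets can still grow very large.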

💻 Python Implementation of Apriori

Let’s use the apyori library to implement Apriori in Python.

📌 Step 1: Install and Import Libraries

pip install apyori

import pandas as pd
from apyori import apriori

📌 Step 2: Load and Preprocess the Data

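dataset = pd.read_csv('Market_Basket_data1.csv', header=None)
# Build one list of item names per row, skipping any empty (NaN) cells
# so that "nan" is not treated as a purchased item.
transactions = [
    [str(item) for item in row if pd.notna(item)]
    for row in dataset.values
]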

Each row of the file is one transaction. apyori expects a list of lists, so each row is converted into a list of item names.

📌 Step 3: Apply the Apriori Algorithm

rules = apriori(
    transactions=transactions,
    min_support=0.003,
    min_confidence=0.2,
    min_lift=3,
    min_length=2,
    max_length=2
)
results = list(rules)

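  • min_support: 0.003 (0.3% of transactions, roughly 22 out of 7,501)
  • min_confidence: 0.2 (20%)
  • min_lift: 3 keeps only rules whose items occur together at least three times as often as would be expected if they were independent
  • max_length: 2 limits the itemsets to pairs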

📌 Step 4: Display the Results

for item in results:
    print(f"Rule: {item.items}")
    for stat in item.ordered_statistics:
        print(f"   {set(stat.items_base)} -> {set(stat.items_add)} | Confidence: {stat.confidence:.2f} | Lift: {stat.lift:.2f}")

Sample Output:

Rule: frozenset({'chicken', 'light cream'})
   {'light cream'} -> {'chicken'} | Confidence: 0.29 | Lift: 4.84
Rule: frozenset({'escalope', 'pasta'})
   {'pasta'} -> {'escalope'} | Confidence: 0.37 | Lift: 4.70

These insights can help retailers run smarter promotions like “Buy Pasta, Get Escalope Discount!”

📈 Final Thoughts

The Apriori Algorithm is one of the most intuitive and practical tools in data mining. From retail to healthcare, it helps organizations uncover hidden patterns that drive smarter decisions. While it may not be the fastest, its simplicity and reliability make it a strong starting point in association rule learning.

🚀 If you’re exploring data mining or e-commerce analytics, the Apriori algorithm is your go-to tool for discovering hidden patterns in data.

