Python Code Snippets for Data Science Projects

Updategadh August 4, 2024 6 min read

Python is the go-to language for data science because of its simplicity and its powerful libraries. Whether you're a beginner or an experienced data scientist, a collection of handy code snippets can save you time and boost your productivity. Here are 10 Python code snippets for data science projects that you should know.

1. Importing Essential Libraries

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats

Explanation: This snippet imports the essential libraries for data manipulation, statistical analysis, and visualization.

2. Loading a Dataset

# Load dataset from a CSV file
df = pd.read_csv('data.csv')

Explanation: Use pandas to read a CSV file into a DataFrame for easy data manipulation.
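
Real files often need an extra option or two. Here is a minimal sketch, assuming a hypothetical file in which the text "missing" marks absent values. The CSV content is held in a string only so the example runs without a file on disk; the column names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'data.csv' (an assumption for illustration)
csv_text = """id,height,city
1,170.5,Pune
2,missing,Delhi
3,165.0,
"""

# na_values lists extra strings that pandas should treat as missing
df_demo = pd.read_csv(io.StringIO(csv_text), na_values=["missing"])

print(df_demo.shape)           # number of rows and columns
print(df_demo.isnull().sum())  # missing values per column
```

With a real file, you would pass the file path instead of the io.StringIO object.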

3. Handling Missing Values

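# Fill missing values in numeric columns with the mean of each column
# (numeric_only=True skips text columns, which have no mean)
df.fillna(df.mean(numeric_only=True), inplace=True)

Explanation: This snippet fills missing values in the numeric columns of a DataFrame with the mean of the respective column. Non-numeric columns are left unchanged.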
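
Datasets with text columns usually need a second strategy, since a text column has no mean. The sketch below is one possible approach, not the only one: numeric columns get the column mean, and a text column gets its most frequent value. The small DataFrame and its column names are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in numeric and text columns
df_demo = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 30.0],
    "score": [1.0, 2.0, np.nan, 3.0],
    "city": ["Pune", "Pune", None, "Delhi"],
})

# Numeric columns: fill with the mean of each column
numeric_cols = df_demo.select_dtypes(include="number").columns
df_demo[numeric_cols] = df_demo[numeric_cols].fillna(df_demo[numeric_cols].mean())

# Text column: fill with the most frequent value (the mode)
df_demo["city"] = df_demo["city"].fillna(df_demo["city"].mode().iloc[0])

print(df_demo)
```

Whether the mean, the median, or the mode is the right choice depends on the data, so treat this as a starting point.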

4. Basic Data Exploration

# Display the first 5 rows of the dataset
print(df.head())

# Get summary statistics
print(df.describe())

# Check for missing values
print(df.isnull().sum())

Explanation: Quickly explore your dataset with these basic commands to understand its structure and identify any missing values.

5. Data Visualization

# Plot a histogram for a specific column
plt.hist(df['column_name'], bins=20)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of column_name')
plt.show()

Explanation: Visualize the distribution of data in a specific column using a histogram.

6. Correlation Matrix

# Compute the correlation matrix of the numeric columns and visualize it
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

Explanation: Use a heatmap to visualize the correlation matrix and understand the relationships between different features.
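
When a dataset has many features, a heatmap can be hard to read, and a ranked list of feature pairs may help. The following is a rough sketch on synthetic data: the columns x, y, z, and label are made up, and y is built from x on purpose so that one strong correlation exists.

```python
import numpy as np
import pandas as pd

# Synthetic data: y depends on x, z is independent noise
rng = np.random.default_rng(0)
x = rng.normal(size=200)
df_demo = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.1, size=200),
    "z": rng.normal(size=200),
    "label": ["a", "b"] * 100,  # text column, ignored by corr(numeric_only=True)
})

corr = df_demo.corr(numeric_only=True)

# Keep only the upper triangle so each pair appears once and the diagonal is dropped
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna()

# Sort pairs by absolute correlation, strongest first
pairs = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(pairs)
```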

7. Feature Scaling

from sklearn.preprocessing import StandardScaler

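# Scale the numeric features (StandardScaler cannot process text columns)
numeric_df = df.select_dtypes(include='number')
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(numeric_df), columns=numeric_df.columns, index=numeric_df.index)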

Explanation: Standardize features by removing the mean and scaling to unit variance using StandardScaler from sklearn.
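
If the scaled data will later be used to evaluate a model, a common practice is to fit the scaler on the training rows only and reuse it on the test rows, so that no information from the test set leaks into the scaling. A minimal sketch, assuming synthetic data with made-up column names:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic features on very different scales
rng = np.random.default_rng(42)
features = pd.DataFrame({
    "height_cm": rng.normal(170, 10, size=100),
    "income": rng.normal(50_000, 15_000, size=100),
})

train, test = train_test_split(features, test_size=0.2, random_state=42)

scaler = StandardScaler()
# Learn the mean and standard deviation from the training rows only
train_scaled = pd.DataFrame(scaler.fit_transform(train), columns=train.columns, index=train.index)
# Apply the same training statistics to the test rows
test_scaled = pd.DataFrame(scaler.transform(test), columns=test.columns, index=test.index)

print(train_scaled.mean().round(6))       # close to 0 for each column
print(train_scaled.std(ddof=0).round(6))  # close to 1 for each column
```

The test set will usually not have a mean of exactly 0 after scaling, which is expected because its statistics were never used.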

8. Splitting the Dataset

from sklearn.model_selection import train_test_split

# Split the dataset into training and testing sets
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

Explanation: Split your dataset into training and testing sets to evaluate the performance of your models.
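
A quick sanity check after splitting is to look at the resulting shapes and confirm that the split is reproducible. The sketch below uses a small made-up DataFrame; feature_a, feature_b, and target are placeholder names.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset with 50 rows
df_demo = pd.DataFrame({
    "feature_a": np.arange(50),
    "feature_b": np.arange(50) * 2.0,
    "target": np.arange(50) % 5,
})

X = df_demo.drop("target", axis=1)
y = df_demo["target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)  # 80% of rows for training, 20% for testing

# The same random_state gives the same split on every run
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = X_test.index.equals(X_test_again.index)
print(same_split)
```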

9. Building a Simple Machine Learning Model

from sklearn.linear_model import LinearRegression

# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

Explanation: Build and train a simple linear regression model using sklearn.
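
After fitting, the learned coefficients show how each feature contributes to the prediction. Here is a small sketch on synthetic data where the true relationship is known (y = 3*x1 - 2*x2 + 5 plus a little noise), so the fitted values can be compared with it. The feature names x1 and x2 are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data with a known linear relationship
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=200), "x2": rng.normal(size=200)})
y = 3 * X["x1"] - 2 * X["x2"] + 5 + rng.normal(scale=0.01, size=200)

model = LinearRegression()
model.fit(X, y)

# Pair each coefficient with its feature name for readability
coefs = pd.Series(model.coef_, index=X.columns)
print(coefs)             # should be close to 3 and -2
print(model.intercept_)  # should be close to 5
```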

10. Model Evaluation

from sklearn.metrics import mean_squared_error, r2_score

# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')

Explanation: Evaluate your model's performance using metrics like Mean Squared Error (MSE) and R-squared (R^2) score.
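
To see what these metrics measure, it can help to compute them by hand on a few values and compare with sklearn. The numbers below are invented for illustration. The sketch also derives the Root Mean Squared Error (RMSE), which is in the same units as the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Made-up true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 10.0])

mse = mean_squared_error(y_true, y_hat)
r2 = r2_score(y_true, y_hat)
rmse = np.sqrt(mse)  # square root of MSE, same units as the target

# MSE by hand: the average of the squared errors
mse_manual = np.mean((y_true - y_hat) ** 2)

# R^2 by hand: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse, mse_manual)  # both 0.375
print(r2, r2_manual)    # both 0.925
print(rmse)
```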

Complete Code

# Importing Essential Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import scipy.stats as stats

# Loading a Dataset
# Replace 'data.csv' with your dataset file
df = pd.read_csv('data.csv')

# Handling Missing Values
# Fill missing values in numeric columns with the mean of each column
df.fillna(df.mean(numeric_only=True), inplace=True)

# Basic Data Exploration
# Display the first 5 rows of the dataset
print("First 5 rows of the dataset:")
print(df.head())

# Get summary statistics
print("\nSummary statistics:")
print(df.describe())

# Check for missing values
print("\nMissing values count:")
print(df.isnull().sum())

# Data Visualization
# Plot a histogram for a specific column
# Replace 'column_name' with the column you want to plot
plt.hist(df['column_name'], bins=20)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of column_name')
plt.show()

# Correlation Matrix
# Compute the correlation matrix of the numeric columns and visualize it
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()

# Feature Scaling
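# Scale the numeric features (StandardScaler cannot process text columns)
numeric_df = df.select_dtypes(include='number')
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(numeric_df), columns=numeric_df.columns, index=numeric_df.index)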

# Splitting the Dataset
# Replace 'target' with the name of your target variable
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Building a Simple Machine Learning Model
# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Model Evaluation
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')

Instructions:

Dataset: Make sure to replace 'data.csv' with the path to your dataset file.
Column Names: Replace 'column_name' with the name of the column you want to plot in the histogram.
Target Variable: Replace 'target' with the name of your target variable.

Conclusion

These snippets cover a range of data science tasks, from loading and exploring data to building and evaluating machine learning models. Incorporating them into your workflow can streamline your projects and leave more time for deriving insights and making decisions. Keep them handy, and you'll be well equipped for many common data science tasks.