Python Code Snippets for Data Science Projects
Python is a go-to language for data science thanks to its simplicity and its powerful libraries. Whether you’re a beginner or an experienced data scientist, a collection of handy code snippets can save you time and boost your productivity. Here are 10 Python code snippets for data science projects that you should know.
1. Importing Essential Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import scipy.stats as stats
- Explanation: This snippet imports the essential libraries: NumPy and pandas for data manipulation, SciPy for statistical analysis, and Matplotlib and seaborn for visualization.
2. Loading a Dataset
# Load dataset from a CSV file
df = pd.read_csv('data.csv')
- Explanation: Use pandas to read a CSV file into a DataFrame for easy data manipulation.
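Real files often need more than the defaults of read_csv. As a minimal sketch, the example below reads from an in-memory string (a stand-in for 'data.csv') and passes a few commonly used options. The column names, the sample values, and the 'missing' placeholder are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv'
csv_text = """date,city,price
2024-01-01,Paris,10.5
2024-01-02,Berlin,missing
2024-01-03,Paris,12.0
"""

df = pd.read_csv(
    io.StringIO(csv_text),        # a file path such as 'data.csv' works the same way
    parse_dates=["date"],         # convert the 'date' column to datetime
    na_values=["missing"],        # treat the string 'missing' as a missing value
    dtype={"city": "category"},   # store repeated labels compactly
)

print(df.dtypes)
```

Declaring the placeholder in na_values lets pandas parse the price column as numeric instead of text.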
3. Handling Missing Values
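# Fill missing values in numeric columns with the mean of the column
df.fillna(df.mean(numeric_only=True), inplace=True)
- Explanation: This snippet fills missing values in the numeric columns of a DataFrame with the mean of the respective column. Passing numeric_only=True keeps df.mean() from raising an error on text columns in recent pandas versions; non-numeric columns are left unchanged.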
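Mean imputation only applies to numeric columns, so text columns need a different strategy. The sketch below shows one possible approach, not the only one: the median for numeric columns and the most frequent value for the rest. The sample DataFrame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in a numeric and a text column
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Berlin", None, "Paris"],
})

# Numeric columns: fill with the column median (less sensitive to outliers than the mean)
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(df[num_cols].median())

# Non-numeric columns: fill with the most frequent value (the mode)
cat_cols = df.select_dtypes(exclude="number").columns
for col in cat_cols:
    df[col] = df[col].fillna(df[col].mode().iloc[0])

print(df)
```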
4. Basic Data Exploration
# Display the first 5 rows of the dataset
print(df.head())
# Get summary statistics
print(df.describe())
# Check for missing values
print(df.isnull().sum())
- Explanation: Quickly explore your dataset with these basic commands to understand its structure and identify any missing values.
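It can also help to gather the key facts about each column in one table. The following is a small sketch on made-up data; the summary column names (dtype, n_unique, missing_frac) are arbitrary choices for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "city": ["Paris", "Berlin", "Paris", "Paris"],
})

# One row per column: data type, number of distinct values, share of missing values
summary = pd.DataFrame({
    "dtype": df.dtypes.astype(str),
    "n_unique": df.nunique(),
    "missing_frac": df.isnull().mean(),
})

print(summary)
```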
5. Data Visualization
# Plot a histogram for a specific column
plt.hist(df['column_name'], bins=20)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of column_name')
plt.show()
- Explanation: Visualize the distribution of data in a specific column using a histogram.
6. Correlation Matrix
# Compute and visualize the correlation matrix (numeric columns only)
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
- Explanation: Use a heatmap to visualize the correlation matrix and understand the relationships between different features.
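When the DataFrame has many columns, a heatmap gets crowded, and a ranked list of correlations with the target can be easier to read. This is a minimal sketch on made-up data; the column names x1, x2, and target are assumptions for the example.

```python
import pandas as pd

# Hypothetical data: x1 moves with the target, x2 mostly against it
df = pd.DataFrame({
    "x1": [1, 2, 3, 4, 5],
    "x2": [5, 3, 4, 1, 2],
    "target": [2, 4, 6, 8, 10],
})

# Correlation of every feature with the target, strongest (by absolute value) first
corr_with_target = (
    df.corr(numeric_only=True)["target"]
    .drop("target")
    .sort_values(key=abs, ascending=False)
)

print(corr_with_target)
```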
7. Feature Scaling
from sklearn.preprocessing import StandardScaler
# Scale features
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
- Explanation: Standardize features by removing the mean and scaling to unit variance using StandardScaler from sklearn. This assumes that every column in df is numeric.
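If the scaled features feed a model, a common practice is to fit the scaler on the training split only and reuse its statistics on the test split, so no information from the test data leaks into preprocessing. Here is a sketch under that assumption, using randomly generated data in place of a real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical feature matrix: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(100, 3))

X_train, X_test = train_test_split(X, test_size=0.2, random_state=42)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mean and std from the training data only
X_test_scaled = scaler.transform(X_test)        # apply the training statistics to the test data

print(X_train_scaled.mean(axis=0).round(3), X_train_scaled.std(axis=0).round(3))
```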
8. Splitting the Dataset
from sklearn.model_selection import train_test_split
# Split the dataset into training and testing sets
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
- Explanation: Split your dataset into training and testing sets to evaluate the performance of your models.
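To see what test_size and random_state do, the short sketch below splits ten made-up samples and repeats the split with the same seed. The data is a placeholder, not a real dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 made-up samples with 2 features each
y = np.arange(10)

# test_size=0.2 holds out 20% of the rows (2 of 10) for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state selects the same rows again
_, _, _, y_test_again = train_test_split(X, y, test_size=0.2, random_state=42)
print((y_test == y_test_again).all())
```

Fixing random_state makes results reproducible between runs, which is useful when comparing models.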
9. Building a Simple Machine Learning Model
from sklearn.linear_model import LinearRegression
# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
- Explanation: Build and train a simple linear regression model using sklearn.
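After fitting, the learned weights are available on the model as coef_ and intercept_. The sketch below fits a model to synthetic, noise-free data generated from a known formula (an assumption made purely for illustration) so the recovered values can be checked.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 3*x1 - 2*x2 + 5 (no noise)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 5.0

model = LinearRegression().fit(X, y)

# The fitted coefficients and intercept should match the generating formula
print(model.coef_, model.intercept_)
```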
10. Model Evaluation
from sklearn.metrics import mean_squared_error, r2_score
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')
- Explanation: Evaluate your model’s performance using metrics like Mean Squared Error (MSE) and R-squared (R^2) score.
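To make these metrics less of a black box, the sketch below computes MSE and R^2 by hand with NumPy and compares them with the sklearn functions. The true and predicted values are made up for the example; the root mean squared error (RMSE) is included because it is expressed in the same units as the target.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Hypothetical true values and predictions
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 10.0])

# MSE: average of the squared errors; RMSE: its square root
mse_manual = np.mean((y_true - y_hat) ** 2)
rmse = np.sqrt(mse_manual)

# R^2: 1 minus (residual sum of squares / total sum of squares)
ss_res = np.sum((y_true - y_hat) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(mse_manual, mean_squared_error(y_true, y_hat))
print(r2_manual, r2_score(y_true, y_hat))
```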
Complete Code
# Importing Essential Libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import scipy.stats as stats
# Loading a Dataset
# Replace 'data.csv' with your dataset file
df = pd.read_csv('data.csv')
# Handling Missing Values
# Fill missing values in numeric columns with the mean of the column
df.fillna(df.mean(numeric_only=True), inplace=True)
# Basic Data Exploration
# Display the first 5 rows of the dataset
print("First 5 rows of the dataset:")
print(df.head())
# Get summary statistics
print("\nSummary statistics:")
print(df.describe())
# Check for missing values
print("\nMissing values count:")
print(df.isnull().sum())
# Data Visualization
# Plot a histogram for a specific column
# Replace 'column_name' with the column you want to plot
plt.hist(df['column_name'], bins=20)
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.title('Histogram of column_name')
plt.show()
# Correlation Matrix
# Compute and visualize the correlation matrix (numeric columns only)
corr_matrix = df.corr(numeric_only=True)
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix')
plt.show()
# Feature Scaling
# Scale features
scaler = StandardScaler()
scaled_df = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
# Splitting the Dataset
# Replace 'target' with the name of your target variable
X = df.drop('target', axis=1)
y = df['target']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Building a Simple Machine Learning Model
# Create and train a linear regression model
model = LinearRegression()
model.fit(X_train, y_train)
# Make predictions
y_pred = model.predict(X_test)
# Model Evaluation
# Evaluate the model
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f'Mean Squared Error: {mse}')
print(f'R^2 Score: {r2}')
Instructions:
- Dataset: Replace 'data.csv' with the path to your dataset file.
- Column Names: Replace 'column_name' with the name of the column you want to plot in the histogram.
- Target Variable: Replace 'target' with the name of your target variable.
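Note that the complete code computes scaled_df but trains the model on the unscaled df. One way to make sure scaling is applied to the model's inputs, and fitted on the training split only, is a scikit-learn pipeline. The following is a minimal sketch on synthetic data; the data-generating formula is an assumption for illustration, not part of the original script.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 200 samples, 3 features, target depends on the first two
rng = np.random.default_rng(0)
X = rng.normal(loc=50, scale=10, size=(200, 3))
y = 2.0 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# The pipeline fits the scaler on the training data, then trains the model on the scaled features
pipe = make_pipeline(StandardScaler(), LinearRegression())
pipe.fit(X_train, y_train)

# score() scales X_test with the training statistics and returns R^2
score = pipe.score(X_test, y_test)
print(score)
```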
Conclusion
These Python code snippets cover a range of data science tasks, from loading and exploring data to building and evaluating machine learning models. Incorporating them into your workflow can streamline your projects and leave more time for deriving insights. Keep them handy as a starting point for your next data science project.