A5 - Data Analytics-2

✅ Tested and working as intended.

Pre-requisites

Install required libraries: pandas, numpy, matplotlib, seaborn & scikit-learn

pip install pandas numpy matplotlib seaborn
pip install -U scikit-learn

Save the dataset Assignment-A5-Social_Network_Ads.csv in the same directory as this Jupyter notebook.

Code blocks

Import libraries:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

Tip

Press the Tab key while typing a library name (or anything else) to trigger auto-completion in Jupyter Notebook.

Load the dataset from a CSV file into a pandas DataFrame:

df = pd.read_csv("Assignment-A5-Social_Network_Ads.csv")
df.head()  # Display the first 5 rows

Print column names of the DataFrame:

df.columns
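
Before modelling, it can help to confirm that the expected columns are present and that no values are missing. The following is a minimal sketch that uses a few made-up rows in place of the real CSV. The row values and the names `csv_text`, `sample` and `expected` are illustrative assumptions; only the four column names come from the code in this README.

```python
import io
import pandas as pd

# Hypothetical rows standing in for the real CSV file (values are made up)
csv_text = """Gender,Age,EstimatedSalary,Purchased
Male,19,19000,0
Female,35,20000,0
Female,47,25000,1
Male,45,26000,1
"""
sample = pd.read_csv(io.StringIO(csv_text))

# Columns used later as features and target
expected = ["Gender", "Age", "EstimatedSalary", "Purchased"]
missing_columns = [c for c in expected if c not in sample.columns]

# Number of missing (NaN) values in each column
missing_values = sample.isna().sum()

print("Missing columns:", missing_columns)
print(missing_values)
```

With the real dataset, the same two checks can be run on `df` directly.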

Convert Gender to numeric; split the data (75% training, 25% test):

# Convert Gender to numeric
df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})

# Features and Target
X = df[['Gender', 'Age', 'EstimatedSalary']]
y = df['Purchased']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
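
Two details of this step are easy to miss: `Series.map` yields `NaN` for any value that is not a key in the dictionary, and `test_size=0.25` holds out a quarter of the rows for testing. The sketch below checks both on a small made-up DataFrame. The data, the `demo` names and the use of `stratify` (which keeps the class ratio similar in both parts) are assumptions for illustration, not part of the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows with balanced classes (values are made up)
demo = pd.DataFrame({
    "Gender": ["Male", "Female"] * 10,
    "Age": list(range(20, 40)),
    "Purchased": [0] * 10 + [1] * 10,
})

# Same mapping as above; any unexpected label would become NaN
demo["Gender"] = demo["Gender"].map({"Male": 0, "Female": 1})
unmapped = int(demo["Gender"].isna().sum())

X_demo = demo[["Gender", "Age"]]
y_demo = demo["Purchased"]

# stratify is optional; it is added here only to illustrate the option
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print("Unmapped Gender values:", unmapped)
print("Train rows:", len(X_tr), "Test rows:", len(X_te))
```

A non-zero `unmapped` count would suggest that the column contains spellings other than `Male` and `Female`.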

Feature scaling:

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
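
The scaler is fitted on the training set only, and the test set is transformed with the statistics learned there, so no information from the test set leaks into training. As a rough illustration, the arithmetic behind standardization can be sketched in plain NumPy. The numbers and variable names are made up, and the sketch assumes the population standard deviation (`ddof=0`), which is what `StandardScaler` uses by default as far as its documentation describes.

```python
import numpy as np

# Hypothetical single-feature data (values are made up)
train = np.array([[20.0], [30.0], [40.0]])
test = np.array([[30.0], [50.0]])

# Statistics are learned from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population standard deviation (ddof=0)

# The same mean and std are applied to both sets
train_scaled = (train - mean) / std
test_scaled = (test - mean) / std

print(train_scaled.ravel())
print(test_scaled.ravel())
```

After scaling, the training column has mean 0 and standard deviation 1; the test column generally does not, because it is scaled with the training statistics.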

Train model and make predictions:

# Train the model
classifier = LogisticRegression()
classifier.fit(X_train, y_train)
# Make predictions
y_pred = classifier.predict(X_test)
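
`predict` returns hard class labels. If the underlying probabilities are of interest, `predict_proba` exposes them; for a binary logistic regression the predicted label corresponds to a 0.5 cut-off on the probability of class 1. The sketch below shows this on synthetic data; the data and the names `X_demo`, `y_demo` and `model` are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data (made up for illustration)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 2))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)

model = LogisticRegression()
model.fit(X_demo, y_demo)

# One row per sample; columns are P(class 0) and P(class 1)
proba = model.predict_proba(X_demo)

# Hard labels, and the same labels rebuilt from the probabilities
labels = model.predict(X_demo)
labels_from_proba = (proba[:, 1] > 0.5).astype(int)

print(proba[:5])
print(labels[:5], labels_from_proba[:5])
```

Choosing a cut-off other than 0.5 is one way to trade precision against recall.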

Evaluate the model:

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", cm)

# Extract values
TN, FP, FN, TP = cm.ravel()

# Metrics
accuracy = accuracy_score(y_test, y_pred)
error_rate = 1 - accuracy
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(f"True Positives (TP): {TP}")
print(f"False Positives (FP): {FP}")
print(f"True Negatives (TN): {TN}")
print(f"False Negatives (FN): {FN}")
print(f"Accuracy: {accuracy:.2f}")
print(f"Error Rate: {error_rate:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
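
For binary labels 0 and 1, scikit-learn's confusion matrix puts actual classes in rows and predicted classes in columns, which is why `cm.ravel()` unpacks in the order TN, FP, FN, TP. The four metrics can also be derived directly from these counts, which is a useful cross-check on the library functions. The counts below are hypothetical and are not the results of this notebook.

```python
# Hypothetical counts (made up; not the output of the model above)
tn, fp, fn, tp = 50, 10, 5, 35

total = tn + fp + fn + tp

accuracy_manual = (tp + tn) / total    # share of correct predictions
error_rate_manual = (fp + fn) / total  # share of wrong predictions
precision_manual = tp / (tp + fp)      # correct among predicted positives
recall_manual = tp / (tp + fn)         # correct among actual positives

print(f"Accuracy: {accuracy_manual:.2f}")
print(f"Error Rate: {error_rate_manual:.2f}")
print(f"Precision: {precision_manual:.2f}")
print(f"Recall: {recall_manual:.2f}")
```

Substituting the real `TN`, `FP`, `FN` and `TP` values should reproduce the numbers printed by `accuracy_score`, `precision_score` and `recall_score`.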

Visualize:

sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.title("Confusion Matrix")
plt.show()

References

Jupyter notebook ❌❌❌ (no longer referenced)
Dataset source
