sppu-te-comp-content/DataScienceAndBigDataAnalytics

Kshitij 2e4da857cd

Added code for A5 (data analytics 2), i.e. logistic regression.

2025-03-28 11:19:04 +05:30


A5 - Data Analytics-2

Pre-requisites

Install required libraries: pandas & scikit-learn

pip install pandas
pip install -U scikit-learn

Save the dataset Social_Network_Ads.csv in the same directory as this Jupyter notebook.

Code blocks

Import libraries:

import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

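Load the dataset from a CSV file into a pandas DataFrame and encode the Gender column as integers (Male = 0, Female = 1):

df = pd.read_csv("Social_Network_Ads.csv")
# Assign the encoded column back; in-place replace on a selected column is unreliable in recent pandas
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
df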
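The encoding step can be tried on a small made-up frame before running it on the real file. This is only a sketch: the three rows below are invented for illustration and the name toy is hypothetical. It uses Series.map, which turns any label missing from the dictionary into NaN, so a quick isna() count shows whether every row was encoded.

```python
import pandas as pd

# Hypothetical three-row stand-in for Social_Network_Ads.csv (values invented for illustration)
toy = pd.DataFrame({"Gender": ["Male", "Female", "Male"], "Age": [19, 35, 26]})

# Map the two categories to integers; labels missing from the dict would become NaN
toy["Gender"] = toy["Gender"].map({"Male": 0, "Female": 1})

# Count rows that could not be encoded
unmapped = int(toy["Gender"].isna().sum())
encoded = toy["Gender"].tolist()
print(encoded, unmapped)
```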

Print columns of the DataFrame:

df.columns

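Defining the feature set (x) and the target variable (y):

# 'User ID' is an arbitrary identifier with no predictive meaning, so it is left out of the features
x = df[['Gender', 'Age', 'EstimatedSalary']]
# Select the target as a 1-D Series; a one-column DataFrame triggers a DataConversionWarning in fit()
y = df['Purchased']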

Splitting the dataset into training and testing sets (75% training, 25% testing):

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=29)
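To see what the split produces, here is a minimal sketch on synthetic data. The arrays X_demo and y_demo are random stand-ins, not the real dataset, and the 400-row size is an arbitrary choice. The stratify argument is an optional addition not used above; it keeps the class proportions similar in both parts.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 400 rows, 3 features, binary target (not the real dataset)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(400, 3))
y_demo = rng.integers(0, 2, size=400)

# Same 75/25 split and seed as above; stratify is an optional extra
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=29, stratify=y_demo
)

train_rows, test_rows = X_tr.shape[0], X_te.shape[0]
print(train_rows, test_rows)
```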

Creating an instance of the Logistic Regression model & fitting the model to the training data:

model = LogisticRegression()
model.fit(x_train,y_train)
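What fit() learns is a weight per feature (coef_) and a bias (intercept_). The sketch below, on synthetic stand-in data with hypothetical names, checks that the probability of class 1 is the logistic (sigmoid) function applied to the linear score. It is meant as an illustration of the model, not a replacement for the code above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: two features and a noisy linear decision rule
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] + 0.5 * X_demo[:, 1] + noise > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Linear score z = X w + b, followed by the sigmoid 1 / (1 + exp(-z))
z = X_demo @ clf.coef_.ravel() + clf.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# Column 1 of predict_proba holds the probability of class 1
matches = bool(np.allclose(manual_proba, clf.predict_proba(X_demo)[:, 1]))
print(matches)
```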

Making and displaying predictions on the test set using the trained model:

y_pred = model.predict(x_test)
y_pred
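For a binary target, predict() amounts to thresholding the class-1 probability at 0.5. A small sketch on synthetic stand-in data (names and data are assumptions for illustration only):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with a noisy linear decision rule
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] - X_demo[:, 1] + noise > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)

# Probability of class 1 for each row, then a 0.5 cut-off
proba_class1 = clf.predict_proba(X_demo)[:, 1]
thresholded = (proba_class1 > 0.5).astype(int)

# The thresholded probabilities agree with the labels returned by predict()
same = bool(np.array_equal(thresholded, clf.predict(X_demo)))
print(same)
```

Working with predict_proba directly is useful when a cut-off other than 0.5 is wanted.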

Evaluating the model's accuracy on the training set (score returns the mean accuracy):

model.score(x_train,y_train)

Evaluating the model's accuracy on the entire dataset (training and test rows combined, so this is not an unbiased estimate of performance on new data):

model.score(x,y)
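For a classifier, score() is the same number as accuracy_score applied to its own predictions, and scoring on the held-out test rows gives the figure that reflects unseen data. A minimal sketch on synthetic stand-in data (all names here are hypothetical):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data with a noisy linear decision rule
rng = np.random.default_rng(2)
X_demo = rng.normal(size=(400, 2))
noise = rng.normal(scale=0.5, size=400)
y_demo = (X_demo[:, 0] + X_demo[:, 1] + noise > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=29)
clf = LogisticRegression().fit(X_tr, y_tr)

# Accuracy on rows the model has never seen, computed two ways
test_score = clf.score(X_te, y_te)
test_accuracy = accuracy_score(y_te, clf.predict(X_te))
print(test_score, test_accuracy)
```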

Generating and displaying the confusion matrix to evaluate the model's predictions:

cm = confusion_matrix(y_test,y_pred)
cm

Unpacking the confusion matrix into true negatives (tn), false positives (fp), false negatives (fn), and true positives (tp), then printing them:

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(tn, fp, fn, tp)
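The order tn, fp, fn, tp follows from the matrix layout: rows are the true classes, columns are the predicted classes, and ravel() reads the matrix row by row. A sketch with eight hand-made labels (invented for illustration) makes the layout easy to verify by counting:

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels, small enough to count by eye (not from the real dataset)
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 0, 0, 1, 0]

# Rows = true class (0, 1), columns = predicted class (0, 1)
cm_demo = confusion_matrix(y_true, y_hat)

# Row-major flattening gives tn, fp, fn, tp for a binary problem
tn_d, fp_d, fn_d, tp_d = (int(v) for v in cm_demo.ravel())
print(cm_demo.tolist())        # [[3, 1], [2, 2]]
print(tn_d, fp_d, fn_d, tp_d)  # 3 1 2 2
```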

Calculating and displaying the accuracy score of the model on the test set:

a = accuracy_score(y_test,y_pred)
a

Calculating and displaying the error rate (1 - accuracy):

e = 1 - a
e

Calculating the precision score of the model (the fraction of predicted positives that are actually positive):

precision_score(y_test,y_pred)

Calculating the recall score of the model (the fraction of actual positives that the model finds):

recall_score(y_test,y_pred)
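Accuracy, error rate, precision, and recall can all be derived from the four confusion-matrix counts. The sketch below recomputes them by hand on eight invented labels and compares the results with the scikit-learn functions. The labels and variable names are assumptions for illustration only.

```python
from math import isclose
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

# Hand-made labels (not from the real dataset)
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_hat = [0, 1, 0, 1, 0, 0, 1, 0]

tn_d, fp_d, fn_d, tp_d = (int(v) for v in confusion_matrix(y_true, y_hat).ravel())
total = tn_d + fp_d + fn_d + tp_d

# Definitions in terms of the four counts
accuracy_manual = (tp_d + tn_d) / total   # correct predictions / all predictions
error_manual = (fp_d + fn_d) / total      # equals 1 - accuracy
precision_manual = tp_d / (tp_d + fp_d)   # correct positives / predicted positives
recall_manual = tp_d / (tp_d + fn_d)      # correct positives / actual positives

# Compare against the library implementations
agree = (
    isclose(accuracy_manual, accuracy_score(y_true, y_hat))
    and isclose(precision_manual, precision_score(y_true, y_hat))
    and isclose(recall_manual, recall_score(y_true, y_hat))
)
print(accuracy_manual, error_manual, precision_manual, recall_manual, agree)
```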

References