Compare commits
2 Commits
d003991559...422014bec7

| Author | SHA1 | Date |
|---|---|---|
| | 422014bec7 | |
| | 97ed1414ea | |
130
Codes/Code-A5 (Data Analytics-2).md
Normal file
@ -0,0 +1,130 @@
# A5 - Data Analytics-2

---

## Pre-requisites:

- Install required libraries: `pandas` & `scikit-learn`

```shell
pip install pandas
pip install -U scikit-learn
```

- Save the dataset [Social_Network_Ads.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/Social_Network_Ads.csv) in the same directory as this Jupyter notebook.

---

## Code blocks:

1. Import libraries:

```python
import pandas as pd

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
```
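
2. Load the dataset from the CSV file into a pandas DataFrame and encode `Gender` as numbers:

```python
df = pd.read_csv("Social_Network_Ads.csv")
# Assign the result back instead of calling replace(..., inplace=True) on a column selection,
# which newer pandas versions warn about and may not apply to df.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
df
```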

3. Print the columns of the DataFrame:

```python
df.columns
```

4. Define the feature set (`x`) and the target variable (`y`):

```python
x = df[['User ID', 'Gender', 'Age', 'EstimatedSalary']]
# Select the target as a 1-D Series; a one-column DataFrame makes fit() emit a DataConversionWarning.
y = df['Purchased']
```

5. Split the dataset into training and testing sets (75% training, 25% testing):

```python
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=29)
```
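
To make the effect of `test_size` and `random_state` concrete, here is a minimal sketch on stand-in data. The 400-row frame, the `demo_*` and `dx_*`/`dy_*` names, and the random values are all hypothetical and are not taken from the real CSV.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data (NOT the real CSV): 400 rows, two features, binary target.
rng = np.random.default_rng(0)
demo_x = pd.DataFrame({
    "Age": rng.integers(18, 60, size=400),
    "EstimatedSalary": rng.integers(15000, 150000, size=400),
})
demo_y = pd.Series(rng.integers(0, 2, size=400), name="Purchased")

# test_size=0.25 holds out a quarter of the rows for testing.
dx_train, dx_test, dy_train, dy_test = train_test_split(
    demo_x, demo_y, test_size=0.25, random_state=29
)
print(dx_train.shape, dx_test.shape)

# Reusing the same random_state reproduces exactly the same split.
_, dx_test_again, _, _ = train_test_split(
    demo_x, demo_y, test_size=0.25, random_state=29
)
print(dx_test.index.equals(dx_test_again.index))
```

With 400 rows, 300 go to training and 100 to testing. Fixing `random_state` is what keeps the scores in the later steps repeatable between runs.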

6. Create an instance of the Logistic Regression model and fit it to the training data:

```python
model = LogisticRegression()
model.fit(x_train, y_train)
```
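
The features here sit on very different scales (ages in the tens, salaries in the tens of thousands), which can slow or prevent solver convergence for logistic regression. The walkthrough itself does not scale features; the following is only a sketch of one common option, a `StandardScaler` + `LogisticRegression` pipeline, run on hypothetical toy data. The names `features`, `labels`, `scaled_model`, and the toy labelling rule are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data (NOT the real CSV): age and salary on very different scales.
rng = np.random.default_rng(0)
age = rng.integers(18, 60, size=200)
salary = rng.integers(15000, 150000, size=200)
features = np.column_stack([age, salary])

# Made-up rule for the toy target: label 1 when age is above 40.
labels = (age > 40).astype(int)

# The pipeline standardises each column (zero mean, unit variance) before fitting.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(features, labels)

train_accuracy = scaled_model.score(features, labels)
print(train_accuracy)
```

Because the scaler is part of the pipeline, calling `scaled_model.predict(...)` on new rows applies the same scaling learned from the training data.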

7. Predict on the test set with the trained model and display the predictions:

```python
y_pred = model.predict(x_test)
y_pred
```

8. Evaluate the model's mean accuracy on the training set:

```python
model.score(x_train, y_train)
```

9. Evaluate the model's mean accuracy on the entire dataset (training and testing rows combined):

```python
model.score(x, y)
```

10. Generate and display the confusion matrix for the test-set predictions:

```python
cm = confusion_matrix(y_test, y_pred)
cm
```

11. Unpack the confusion matrix into true negatives (tn), false positives (fp), false negatives (fn), and true positives (tp), then print them:

```python
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(tn, fp, fn, tp)
```
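
The unpacking order works because, for binary labels 0 and 1, scikit-learn lays the matrix out as `[[tn, fp], [fn, tp]]` (rows are true labels, columns are predicted labels). A small sketch with hand-made labels shows this; `toy_true` and `toy_pred` are made-up values, not results from the dataset.

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels so each cell can be counted by eye.
toy_true = [0, 0, 0, 1, 1, 1, 1, 0]
toy_pred = [0, 1, 0, 1, 0, 1, 1, 0]

toy_cm = confusion_matrix(toy_true, toy_pred)
print(toy_cm)
# [[3 1]
#  [1 3]]

# ravel() flattens row by row, giving tn, fp, fn, tp in that order.
toy_tn, toy_fp, toy_fn, toy_tp = toy_cm.ravel()
print(toy_tn, toy_fp, toy_fn, toy_tp)  # 3 1 1 3
```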

12. Calculate and display the accuracy score of the model on the test set:

```python
a = accuracy_score(y_test, y_pred)
a
```

13. Calculate and display the error rate (1 - accuracy):

```python
e = 1 - a
e
```

14. Calculate the precision score of the model on the test set:

```python
precision_score(y_test, y_pred)
```

15. Calculate the recall score of the model on the test set:

```python
recall_score(y_test, y_pred)
```
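
The four metrics from the last steps can also be derived directly from the confusion-matrix counts. The sketch below uses a hypothetical helper, `metrics_from_counts`, and made-up labels to check the hand formulas against scikit-learn; it assumes at least one predicted positive and one actual positive, otherwise the divisions would fail.

```python
import math
from sklearn.metrics import accuracy_score, precision_score, recall_score

def metrics_from_counts(tn, fp, fn, tp):
    """Hypothetical helper: derive the metrics from the four counts.

    Assumes tp + fp > 0 and tp + fn > 0 (no zero-division handling).
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "precision": tp / (tp + fp),  # of predicted positives, how many were right
        "recall": tp / (tp + fn),     # of actual positives, how many were found
    }

# Made-up labels: counted by hand, tp=3, fn=1, fp=1, tn=5.
toy_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
toy_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

by_hand = metrics_from_counts(tn=5, fp=1, fn=1, tp=3)
print(by_hand)

# The hand formulas agree with scikit-learn's functions.
print(math.isclose(by_hand["accuracy"], accuracy_score(toy_true, toy_pred)))
print(math.isclose(by_hand["precision"], precision_score(toy_true, toy_pred)))
print(math.isclose(by_hand["recall"], recall_score(toy_true, toy_pred)))
```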

---

## References

1. [Jupyter notebook](https://github.com/ganimtron-10/SPPU-2019-TE-DSBDA-Lab/blob/master/Group-A/Q5.ipynb)
2. [Dataset source](https://www.kaggle.com/datasets/akram24/social-network-ads)

---

@ -15,8 +15,9 @@

 ### Codes

-1. [Code-A9 (Data visualisation-2)](Codes/Code-A9%20%28Data%20visualisation-2%29.md)
-1. [Code-A10 (Data visualisation-3)](Codes/Code-A10%20%28Data%20visualisation-3%29.md)
+1. [Code-A9 (Data Visualisation-2)](Codes/Code-A9%20%28Data%20Visualisation-2%29.md)
+2. [Code-A10 (Data Visualisation-3)](Codes/Code-A10%20%28Data%20Visualisation-3%29.md)
+3. [Code-A5 (Data Analytics-2)](Codes/Code-A5%20%28Data%20Analytics-2%29.md)

 ### Notebooks
