Changed filenames to title case in code, updated those links in readme and added link for code a5.

Added code for A5 (data analytics 2), i.e. logistic regression.
2025-03-28 11:10:57 +05:30 · 2025-03-28 11:09:11 +05:30
4 changed files with 133 additions and 2 deletions
--- a/Visualisation-3).md
+++ b/Visualisation-3).md
--- a/Analytics-2).md
+++ b/Analytics-2).md
@ -0,0 +1,130 @@
+# A5 - Data Analytics-2
+
+---
+
+## Pre-requisites:
+
+- Install required libraries: `pandas` & `scikit-learn`
+
+```shell
+pip install pandas
+pip install -U scikit-learn
+```
+
+- Save the dataset [Social_Network_Ads.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/Social_Network_Ads.csv) in the same directory as this Jupyter notebook.
+
+---
+
+## Code blocks:
+
+1. Import libraries:
+
+```python
+import pandas as pd
+
+from sklearn.model_selection import train_test_split
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
+```
+
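+2. Load the dataset from a CSV file into a pandas DataFrame and encode the `Gender` column as numbers (Male = 0, Female = 1):
+
+```python
+df = pd.read_csv("Social_Network_Ads.csv")
+# Assign back explicitly: inplace=True on df["Gender"] is chained assignment, which newer pandas versions warn about
+df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})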
+df
+```
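A minimal sketch of the `Gender` encoding on a hypothetical three-row sample (not the real dataset). `Series.map` returns a new Series that is assigned back to the column. Any label missing from the mapping becomes `NaN`, so a quick `notna()` check catches unexpected spellings:

```python
import pandas as pd

# Hypothetical three-row sample, not the real dataset
df = pd.DataFrame({"Gender": ["Male", "Female", "Female"]})

# map() returns a new Series; labels not in the dict would become NaN
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})

# Fails if any value was missing from the mapping (e.g. a typo such as "male")
assert df["Gender"].notna().all()
print(df["Gender"].tolist())  # [0, 1, 1]
```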
+
+3. Display the columns of the DataFrame:
+
+```python
+df.columns
+```
+
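+4. Defining the feature set (`x`) and the target variable (`y`). `User ID` is left out because it only identifies rows and carries no meaningful information about purchasing:
+
+```python
+x = df[['Gender', 'Age', 'EstimatedSalary']]
+y = df['Purchased']  # 1-D Series, the shape scikit-learn expects for the target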
+```
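The bracket style matters. `df[['Purchased']]` selects a one-column DataFrame, while `df['Purchased']` selects a 1-D Series. scikit-learn expects a 1-D target, and passing the 2-D version to `fit` typically raises a `DataConversionWarning`. A small hypothetical frame shows the difference:

```python
import pandas as pd

# Hypothetical rows using two of the dataset's column names
df = pd.DataFrame({"Age": [19, 35, 47], "Purchased": [0, 0, 1]})

print(df[["Purchased"]].shape)  # (3, 1): one-column DataFrame (2-D)
print(df["Purchased"].shape)    # (3,): Series (1-D), the form expected for y
```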
+
+5. Splitting the dataset into training and testing sets (75% training, 25% testing):
+
+```python
+x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=29)
+```
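The snippet below sketches what the split does. It uses synthetic data (a hypothetical stand-in of 400 rows and three features) with the same `test_size=0.25` and `random_state=29`. It also passes `stratify=y`, which the original step does not use. This optional parameter keeps the share of each class roughly equal in the training and test sets:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical): 400 rows, 3 features, binary target
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
y = (X[:, 0] + rng.normal(scale=0.5, size=400) > 0).astype(int)

# test_size=0.25 holds out a quarter of the rows; stratify=y preserves the class ratio
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=29, stratify=y
)
print(X_train.shape, X_test.shape)  # (300, 3) (100, 3)
```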
+
+6. Creating an instance of the Logistic Regression model & fitting the model to the training data:
+
+```python
+model = LogisticRegression()
+model.fit(x_train, y_train)
+```
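The features here sit on very different scales: age is in tens, while salary is in tens of thousands. scikit-learn's logistic regression applies L2 regularisation by default, and the default `lbfgs` solver can converge slowly on unscaled inputs. Standardising the features first is therefore a common choice, although the original steps do not do it. The sketch below uses synthetic age/salary values (made up for illustration, not taken from the dataset). It wraps both steps in `make_pipeline`, so the scaler is learned from whatever data is passed to `fit` and reapplied automatically at prediction time:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, hypothetical age/salary values on very different scales
rng = np.random.default_rng(1)
age = rng.integers(18, 61, size=300)
salary = rng.integers(15_000, 150_001, size=300)
X = np.column_stack([age, salary]).astype(float)
# Hypothetical target: older, higher-earning users are more likely to buy
y = ((age - 40) / 10 + (salary - 80_000) / 30_000 + rng.normal(size=300) > 0).astype(int)

# Scale each column to zero mean / unit variance, then fit logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))
```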
+
+7. Making and displaying predictions on the test set using the trained model:
+
+```python
+y_pred = model.predict(x_test)
+y_pred
+```
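Under the hood, `predict` starts from a linear score z = w.x + b (`decision_function`). The sigmoid 1 / (1 + e^(-z)) turns that score into the probability of class 1 (`predict_proba`). The predicted label is 1 when z > 0, i.e. when that probability exceeds 0.5. The sketch below checks these relationships on synthetic data (a hypothetical stand-in for the dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical): 200 rows, 2 features, binary target
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
model = LogisticRegression().fit(X, y)

z = model.decision_function(X)    # linear score w.x + b for each row
p = model.predict_proba(X)[:, 1]  # probability of class 1

print(np.allclose(z, X @ model.coef_[0] + model.intercept_[0]))  # True
print(np.allclose(p, 1 / (1 + np.exp(-z))))                       # True
print(np.array_equal(model.predict(X), (z > 0).astype(int)))      # True
```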
+
+8. Evaluating the model's accuracy on the training set (for classifiers, `score` returns mean accuracy):
+
+```python
+model.score(x_train, y_train)
+```
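For classifiers, `score` is simply the accuracy of `predict` against the given labels, so it matches `accuracy_score`. A quick check on synthetic data (a hypothetical stand-in for the dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical)
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + rng.normal(scale=0.8, size=400) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=29)
model = LogisticRegression().fit(X_train, y_train)

# score() and accuracy_score() on the same predictions give the same number
train_acc = model.score(X_train, y_train)
same = accuracy_score(y_train, model.predict(X_train))
print(train_acc == same)  # True
```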
+
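+9. Evaluating the model's accuracy on the entire dataset (this includes the training rows, so it is not an independent estimate of performance on unseen data):
+
+```python
+model.score(x, y)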
+```
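Because `x` and `y` contain both the training and the test rows, the whole-dataset score is just the row-weighted average of the two separate scores. With a 75/25 split, the training accuracy dominates it. The sketch below confirms this on synthetic data (a hypothetical stand-in for the dataset):

```python
import math

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (hypothetical), split 75/25 as in step 5
rng = np.random.default_rng(4)
X = rng.normal(size=(400, 2))
y = (X[:, 0] + rng.normal(scale=0.8, size=400) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=29)
model = LogisticRegression().fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
whole_acc = model.score(X, y)

# Whole-dataset accuracy = (correct on train + correct on test) / all rows
weighted = (len(y_train) * train_acc + len(y_test) * test_acc) / len(y)
print(math.isclose(whole_acc, weighted))  # True
```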
+
+10. Generating and displaying the confusion matrix to evaluate the model's predictions:
+
+```python
+cm = confusion_matrix(y_test, y_pred)
+cm
+```
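scikit-learn's `confusion_matrix` puts the actual classes in rows and the predicted classes in columns, both in sorted label order. For 0/1 labels, the layout is therefore `[[TN, FP], [FN, TP]]`. A hand-made example (labels invented for illustration):

```python
from sklearn.metrics import confusion_matrix

# Invented labels: 1 = purchased, 0 = not purchased
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
print(cm)
# [[4 2]
#  [1 3]]
```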
+
+11. Unpacking the confusion matrix into true negatives (tn), false positives (fp), false negatives (fn) and true positives (tp), then printing them:
+
+```python
+tn, fp, fn, tp = cm.ravel()
+print(tn, fp, fn, tp)
+```
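`ravel()` flattens the 2×2 matrix row by row. Because rows are actual classes and columns are predicted classes, the values come out in the order tn, fp, fn, tp. Counting each outcome directly on made-up labels confirms the order:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Invented labels for illustration only
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 0, 1, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Count each outcome directly with boolean masks
manual = (
    int(np.sum((y_true == 0) & (y_pred == 0))),  # true negatives
    int(np.sum((y_true == 0) & (y_pred == 1))),  # false positives
    int(np.sum((y_true == 1) & (y_pred == 0))),  # false negatives
    int(np.sum((y_true == 1) & (y_pred == 1))),  # true positives
)
print(tuple(int(v) for v in (tn, fp, fn, tp)) == manual)  # True
```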
+
+12. Calculating and displaying the accuracy score of the model on the test set:
+
+```python
+a = accuracy_score(y_test, y_pred)
+a
+```
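Accuracy is the share of correct predictions, (TP + TN) / (TP + TN + FP + FN). Computing it from the confusion-matrix counts on invented labels gives the same value as `accuracy_score`:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Invented labels for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# Correct predictions divided by all predictions
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)
print(manual_accuracy)                  # 0.7
print(accuracy_score(y_true, y_pred))   # 0.7
```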
+
+13. Calculating and displaying the error rate (1 - accuracy):
+
+```python
+e = 1 - a
+e
+```
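The error rate can also be computed directly from the misclassified rows, (FP + FN) / (TP + TN + FP + FN). This equals 1 - accuracy, up to floating-point rounding. On invented labels:

```python
import math

from sklearn.metrics import accuracy_score, confusion_matrix

# Invented labels for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
# Misclassified rows divided by all rows
error_rate = (fp + fn) / (tp + tn + fp + fn)
print(error_rate)  # 0.3
print(math.isclose(error_rate, 1 - accuracy_score(y_true, y_pred)))  # True
```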
+
+14. Calculating the precision score on the test set, i.e. the fraction of predicted purchases (`Purchased` = 1) that were actual purchases:
+
+```python
+precision_score(y_test, y_pred)
+```
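Precision is TP / (TP + FP): of the rows predicted as purchases, how many really were. If the model predicts no positives at all, the ratio is undefined. The `zero_division` parameter of `precision_score` sets the value returned in that case. On invented labels:

```python
from sklearn.metrics import confusion_matrix, precision_score

# Invented labels for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # 3 / (3 + 2)
print(precision)                        # 0.6
print(precision_score(y_true, y_pred))  # 0.6

# No positive predictions: precision is undefined, so zero_division decides the result
print(precision_score([0, 1], [0, 0], zero_division=0))  # 0.0
```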
+
+15. Calculating the recall score on the test set, i.e. the fraction of actual purchases that the model predicted correctly:
+
+```python
+recall_score(y_test, y_pred)
+```
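Recall is TP / (TP + FN): of the actual purchases, how many the model caught. The same kind of invented labels give:

```python
from sklearn.metrics import confusion_matrix, recall_score

# Invented labels for illustration only
y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 0, 0, 1, 0, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
recall = tp / (tp + fn)  # 3 / (3 + 1)
print(recall)                        # 0.75
print(recall_score(y_true, y_pred))  # 0.75
```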
+
+---
+
+## References
+
+1. [Jupyter notebook](https://github.com/ganimtron-10/SPPU-2019-TE-DSBDA-Lab/blob/master/Group-A/Q5.ipynb)
+2. [Dataset source](https://www.kaggle.com/datasets/akram24/social-network-ads)
+
+---
--- a/Visualisation-2).md
+++ b/Visualisation-2).md
--- a/README.md
+++ b/README.md
@ -15,8 +15,9 @@

 ### Codes

-1. [Code-A9 (Data visualisation-2)](Codes/Code-A9%20%28Data%20visualisation-2%29.md)
-1. [Code-A10 (Data visualisation-3)](Codes/Code-A10%20%28Data%20visualisation-3%29.md)
+1. [Code-A9 (Data Visualisation-2)](Codes/Code-A9%20%28Data%20Visualisation-2%29.md)
+2. [Code-A10 (Data Visualisation-3)](Codes/Code-A10%20%28Data%20Visualisation-3%29.md)
+3. [Code-A5 (Data Analytics-2)](Codes/Code-A5%20%28Data%20Analytics-2%29.md)

 ### Notebooks
Author	SHA1	Message	Date
Kshitij	422014bec7	Changed filenames to title case in code, updated those links in readme and added link for code a5.	2025-03-28 11:10:57 +05:30
Kshitij	97ed1414ea	Added code for A5 (data analytics 2), i.e. logistic regression.	2025-03-28 11:09:11 +05:30