# A5 - Data Analytics-2

---

## Pre-requisites

- Install required libraries: `pandas` & `scikit-learn`

```shell
pip install pandas
pip install -U scikit-learn
```

- Save the dataset [Social_Network_Ads.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/Social_Network_Ads.csv) in the same directory as this Jupyter notebook.

---

## Code blocks

1. Import libraries:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
```

2. Load the dataset from a CSV file into a pandas DataFrame and encode the `Gender` column as integers:

```python
df = pd.read_csv("Social_Network_Ads.csv")
# Assign the result back instead of using inplace=True on a column selection,
# which may not modify the DataFrame in recent pandas versions.
df["Gender"] = df["Gender"].map({"Male": 0, "Female": 1})
df
```

3. Print columns of the DataFrame:

```python
df.columns
```

4. Define the feature set (x) and the target variable (y):

```python
# Note: 'User ID' is an identifier rather than a meaningful predictor; consider dropping it.
x = df[['User ID', 'Gender', 'Age', 'EstimatedSalary']]
# Select the target as a 1-D Series, which is the shape scikit-learn expects for y.
y = df['Purchased']
```

5. Split the dataset into training and testing sets (75% training, 25% testing):

```python
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=29)
```

6. Create an instance of the Logistic Regression model and fit it to the training data:

```python
model = LogisticRegression()
model.fit(x_train, y_train)
```

7. Make and display predictions on the test set using the trained model:

```python
y_pred = model.predict(x_test)
y_pred
```

8. Compute the model's mean accuracy on the training set:

```python
model.score(x_train, y_train)
```

9. Compute the model's mean accuracy on the entire dataset (this includes the rows the model was trained on):

```python
model.score(x, y)
```

10. Generate and display the confusion matrix to evaluate the model's predictions on the test set:

```python
cm = confusion_matrix(y_test, y_pred)
cm
```

11. Unpack the confusion matrix into true negatives (tn), false positives (fp), false negatives (fn), and true positives (tp), then print them:

```python
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print(tn, fp, fn, tp)
```

12. Calculate and display the accuracy score of the model on the test set:

```python
a = accuracy_score(y_test, y_pred)
a
```

13. Calculate and display the error rate (1 - accuracy):

```python
e = 1 - a
e
```

14. Calculate the precision score of the model on the test set:

```python
precision_score(y_test, y_pred)
```

15. Calculate the recall score of the model on the test set:

```python
recall_score(y_test, y_pred)
```

---

## References

1. [Jupyter notebook](https://github.com/ganimtron-10/SPPU-2019-TE-DSBDA-Lab/blob/master/Group-A/Q5.ipynb)
2. [Dataset source](https://www.kaggle.com/datasets/akram24/social-network-ads)

---
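
Accuracy, error rate, precision, and recall all derive from the same four confusion-matrix counts (tn, fp, fn, tp). As a minimal sketch of that arithmetic, the snippet below computes them by hand. The function name `metrics_from_counts` is hypothetical, and the counts passed to it are made-up illustrative values, not results from `Social_Network_Ads.csv`.

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive basic classification metrics from confusion-matrix counts."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total          # share of all predictions that are correct
    error_rate = 1 - accuracy             # share of all predictions that are wrong
    # Return 0.0 when a denominator is zero (a convention chosen for this sketch).
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # correct among predicted positives
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # found among actual positives
    return {
        "accuracy": accuracy,
        "error_rate": error_rate,
        "precision": precision,
        "recall": recall,
    }


# Illustrative counts only (not taken from the dataset).
metrics = metrics_from_counts(tn=50, fp=10, fn=5, tp=35)
print(metrics)
```

With the counts printed by the notebook's `tn, fp, fn, tp` unpacking, the values should agree with `accuracy_score`, `precision_score`, and `recall_score` for the positive class, which can serve as a quick sanity check.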