Added codes, datasets and Jupyter notebooks directory.

2025-06-11 13:48:53 +05:30
parent b3a22e9b79
commit 76dc1de8db
32 changed files with 8930 additions and 0 deletions
@@ -0,0 +1,126 @@
+# A5 - Data Analytics-2
+
+✅ Tested and working as intended.
+
+---
+
+## Pre-requisites
+
+- Install required libraries: `pandas`, `numpy`, `matplotlib`, `seaborn` & `scikit-learn`
+
+```shell
+pip install pandas numpy matplotlib seaborn
+pip install -U scikit-learn
+```
+
+- Save the dataset [Assignment-A5-Social_Network_Ads.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/Assignment-A5-Social_Network_Ads.csv) in the same directory as this Jupyter notebook.
+
+---
+
+## Code blocks
+
+1. Import libraries:
+
+```python3
+import pandas as pd
+import numpy as np
+import matplotlib.pyplot as plt
+import seaborn as sns
+from sklearn.model_selection import train_test_split
+from sklearn.preprocessing import StandardScaler
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score
+```
+
+> [!TIP]
+> Hit `Tab` key while typing library names (or anything else) to activate auto-complete in Jupyter notebook.
+
+2. Load the dataset from a CSV file into a pandas DataFrame:
+
+```python3
+df= pd.read_csv("Assignment-A5-Social_Network_Ads.csv")
+df.head() # Print first 5 rows
+```
+
+3. Print column names of the DataFrame:
+
+```python3
+df.columns
+```
+
+4. Convert `Gender` to numeric; Splot data (25%, 75%):
+
+```python3
+# Convert Gender to numeric
+df['Gender'] = df['Gender'].map({'Male': 0, 'Female': 1})
+
+# Features and Target
+X = df[['Gender', 'Age', 'EstimatedSalary']]
+y = df['Purchased']
+
+# Split the data
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
+```
+
+5. Feature scaling:
+
+```python3
+sc = StandardScaler()
+X_train = sc.fit_transform(X_train)
+X_test = sc.transform(X_test)
+```
+
+6. Train model and make predictions:
+
+```python3
+# Train the model
+classifier = LogisticRegression()
+classifier.fit(X_train, y_train)
+# Make predictions
+y_pred = classifier.predict(X_test)
+```
+
+7. Evaluate the model:
+
+```python3
+# Confusion Matrix
+cm = confusion_matrix(y_test, y_pred)
+print("Confusion Matrix:\n", cm)
+
+# Extract values
+TN, FP, FN, TP = cm.ravel()
+
+# Metrics
+accuracy = accuracy_score(y_test, y_pred)
+error_rate = 1 - accuracy
+precision = precision_score(y_test, y_pred)
+recall = recall_score(y_test, y_pred)
+
+print(f"True Positives (TP): {TP}")
+print(f"False Positives (FP): {FP}")
+print(f"True Negatives (TN): {TN}")
+print(f"False Negatives (FN): {FN}")
+print(f"Accuracy: {accuracy:.2f}")
+print(f"Error Rate: {error_rate:.2f}")
+print(f"Precision: {precision:.2f}")
+print(f"Recall: {recall:.2f}")
+```
+
+8. Visualize:
+
+```python3
+sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
+plt.xlabel("Predicted")
+plt.ylabel("Actual")
+plt.title("Confusion Matrix")
+plt.show()
+```
+
+---
+
+## References
+
+1. [Jupyter notebook](https://github.com/ganimtron-10/SPPU-2019-TE-DSBDA-Lab/blob/master/Group-A/Q5.ipynb) ❌❌❌ (not referring anymore)
+2. [Dataset source](https://www.kaggle.com/datasets/akram24/social-network-ads)
+
+---