Files
MachineLearning/Codes/Code-2.md
T

2.8 KiB
Raw Blame History

Practical-2 (Spam Email Detection)

Problem Statement: Classify the email using the binary classification method. Email Spam detection has two states: a) Normal State Not Spam, b) Abnormal State Spam. Use K-Nearest Neighbors and Support Vector Machine for classification. Analyze their performance.

Note

Dataset available in Datasets directory.


Steps

  1. Import libraries
  2. Load dataset
  3. Data splitting (training and testing)
  4. KNN
  5. SVM
  6. Plotting

Code

1. Import libraries:

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import matplotlib.pyplot as plt
import seaborn as sns

2. Load dataset:

df = pd.read_csv("emails.csv", encoding="ISO-8859-1")  # Adjust path if needed

# Drop unnecessary columns if present
if "Email No." in df.columns:
    df = df.drop(columns=["Email No."])

# Ensure label is integer
df["Prediction"] = df["Prediction"].astype(int)

# Features & target
X = df.drop(columns=["Prediction"])
y = df["Prediction"]

# Print basic info
print(df.columns)
print(df.head(5))

3. Data splitting (training and testing):

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

4. KNN:

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred_knn = knn.predict(X_test)

print("\n--- KNN Performance ---")
print("Accuracy:", accuracy_score(y_test, y_pred_knn))
print("Classification Report:\n", classification_report(y_test, y_pred_knn))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_knn))

5. SVM:

svm = SVC(kernel='linear', random_state=42)  # Linear kernel for binary classification
svm.fit(X_train, y_train)
y_pred_svm = svm.predict(X_test)

print("\n--- SVM Performance ---")
print("Accuracy:", accuracy_score(y_test, y_pred_svm))
print("Classification Report:\n", classification_report(y_test, y_pred_svm))
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred_svm))

6. Plotting:

fig, ax = plt.subplots(1, 2, figsize=(12, 5))

sns.heatmap(confusion_matrix(y_test, y_pred_knn), annot=True, fmt="d", cmap="Blues", ax=ax[0])
ax[0].set_title("KNN Confusion Matrix")
ax[0].set_xlabel("Predicted")
ax[0].set_ylabel("Actual")

sns.heatmap(confusion_matrix(y_test, y_pred_svm), annot=True, fmt="d", cmap="Greens", ax=ax[1])
ax[1].set_title("SVM Confusion Matrix")
ax[1].set_xlabel("Predicted")
ax[1].set_ylabel("Actual")

plt.show()

Miscellaneous