# Practical-3a (Convolutional Neural Network - Plant Diseases)

Problem Statement: Convolutional Neural Network (CNN): Use any dataset of plant disease and design a plant disease detection system using CNN.

> [!NOTE]
> Download the dataset directly from the [source](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset/data).
> Haven't added it to the `/Datasets` directory due to its large size.
> tbh the dataset doesn't really matter in this case; you just need to ensure the dataset directory contains `train` and `valid` sub-directories.
> Refer to the above dataset to understand the required directory structure.

---

## Prerequisites

1. Install packages using `pip`: `pip install tensorflow keras numpy opencv-python matplotlib seaborn scikit-learn` (`tensorflow` requires Python 3.9 - 3.12)
2. Download and unzip the dataset in the same directory as the Jupyter notebook.
3. Ensure your unzipped dataset has the required directory structure:

```shell
New Plant Diseases Dataset(Augmented)/
├── train
│   ├── Apple___Apple_scab
│   ├── Apple___Black_rot
│   ├── Apple___Cedar_apple_rust
├── valid
│   ├── Apple___Apple_scab
│   ├── Apple___Black_rot
│   ├── Apple___Cedar_apple_rust
```
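
Before running the notebook, it can help to confirm the layout programmatically. The following is a minimal sketch, not part of the original practical: `check_dataset_layout` is a hypothetical helper name, and it only assumes that each class is a folder directly under `train` and `valid`, as in the tree above.

```python
from pathlib import Path


def check_dataset_layout(root):
    """Return the sorted class names shared by `train` and `valid` under `root`.

    Raises FileNotFoundError if either sub-directory is missing.
    """
    root = Path(root)
    class_names = {}
    for split in ("train", "valid"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing sub-directory: {split_dir}")
        # Each class is expected to be a folder directly under the split
        class_names[split] = {p.name for p in split_dir.iterdir() if p.is_dir()}

    # Report class folders that exist in only one of the two splits
    only_in_one = class_names["train"] ^ class_names["valid"]
    if only_in_one:
        print("Classes present in only one split:", sorted(only_in_one))
    return sorted(class_names["train"] & class_names["valid"])
```

Calling `check_dataset_layout("./New Plant Diseases Dataset(Augmented)")` from the notebook directory should return the class names if the dataset was unzipped as described.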

## Steps

1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Split into Training and Testing Sets
5. Build the CNN Model
6. Compile the Model
7. Train the Model
8. Evaluate the Model on Test Data
9. Plot Training vs Validation Accuracy
10. Plot Training vs Validation Loss
11. Confusion Matrix and Classification Report

---

## Code

### 1. Import Libraries:

```python3
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
```

### 2. Load Dataset:

```python3
data = []
labels = []

# Path to dataset folder containing one subfolder per disease class
path = './New Plant Diseases Dataset(Augmented)/train/'
categories = sorted(os.listdir(path))  # sort for consistent label ordering

# Map each category name to a numeric index
label_dict = {category: idx for idx, category in enumerate(categories)}
print("Classes found:", len(categories))

max_per_class = 200  # cap images per class to avoid RAM overflow on large datasets

for category in categories:
    folder = os.path.join(path, category)
    count = 0
    for img_name in os.listdir(folder):
        if count >= max_per_class:
            break
        img_path = os.path.join(folder, img_name)
        img_array = cv2.imread(img_path)
        if img_array is not None:  # skip unreadable files
            img_array = cv2.resize(img_array, (64, 64))  # resize to fixed 64x64 pixels
            data.append(img_array)
            labels.append(label_dict[category])
            count += 1

data = np.array(data) / 255.0  # normalize pixel values from [0,255] to [0,1]
labels = np.array(labels)
print("Dataset shape:", data.shape)
print("Labels shape:", labels.shape)
```
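
To see why `max_per_class` matters, here is a rough back-of-the-envelope sketch of the array size. It is only an estimate: `array_megabytes` is a hypothetical helper, and the class count of 38 is an assumption (use the value printed by "Classes found"). Dividing a `uint8` array by `255.0` produces `float64` in NumPy, i.e. 8 bytes per value; casting with `.astype(np.float32)` would halve that.

```python
def array_megabytes(num_images, height=64, width=64, channels=3, bytes_per_value=8):
    """Approximate size in MB of a dense image array of the given shape."""
    return num_images * height * width * channels * bytes_per_value / 1e6


num_classes_assumed = 38  # assumption: replace with the printed "Classes found" value
max_per_class = 200       # same cap as in the loading loop above
num_images = num_classes_assumed * max_per_class

print(f"float64: {array_megabytes(num_images):.0f} MB")
print(f"float32: {array_megabytes(num_images, bytes_per_value=4):.0f} MB")
```

Raising `max_per_class` or the image size scales this figure linearly, so it is worth re-running the estimate before changing either.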

### 3. Exploratory Data Analysis (EDA):

```python3
print("Total images:", len(data))
print("Image shape:", data[0].shape)
print("Number of classes:", len(categories))

# Class distribution bar chart
class_counts = {cat: int((labels == idx).sum()) for cat, idx in label_dict.items()}
plt.figure(figsize=(14, 5))
plt.bar(class_counts.keys(), class_counts.values())
plt.xticks(rotation=90)
plt.title("Number of Images per Disease Class")
plt.xlabel("Class")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

# Sample images from first 5 classes
plt.figure(figsize=(15, 3))
for i, category in enumerate(categories[:5]):
    idx = np.where(labels == label_dict[category])[0][0]  # index of first image in class
    plt.subplot(1, 5, i + 1)
    plt.imshow(cv2.cvtColor((data[idx] * 255).astype(np.uint8), cv2.COLOR_BGR2RGB))
    plt.title(category[:15], fontsize=8)
    plt.axis('off')
plt.suptitle("Sample Images per Class")
plt.show()
```

### 4. Split into Training and Testing Sets:

```python3
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

num_classes = len(categories)
# One-hot encode labels: e.g. class 2 of 5 → [0, 0, 1, 0, 0]
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print("Train samples:", X_train.shape[0])
print("Test samples: ", X_test.shape[0])
```
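
The one-hot encoding mentioned in the comment above can be illustrated without Keras. This is a plain-Python sketch of the idea only; `one_hot` and `decode` are hypothetical helper names, not the `to_categorical` implementation.

```python
def one_hot(label, num_classes):
    """Return a list of length num_classes with 1.0 at position `label`."""
    if not 0 <= label < num_classes:
        raise ValueError(f"label {label} is outside [0, {num_classes})")
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


def decode(vector):
    """Recover the class index: the position of the largest entry."""
    return max(range(len(vector)), key=vector.__getitem__)


encoded = one_hot(2, 5)
print(encoded)          # [0.0, 0.0, 1.0, 0.0, 0.0]
print(decode(encoded))  # 2
```

`decode` mirrors what `np.argmax(..., axis=1)` does later when the one-hot labels are turned back into class indices.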

### 5. Build the CNN Model:

```python3
model = Sequential()

model.add(Input(shape=(64, 64, 3)))  # input: 64x64 image with 3 color channels (BGR order, as loaded by OpenCV)
model.add(Conv2D(32, (3, 3), activation='relu'))  # 32 filters, detect basic features
model.add(MaxPooling2D(2, 2))  # downsample by 2x

model.add(Conv2D(64, (3, 3), activation='relu'))  # 64 filters, detect complex features
model.add(MaxPooling2D(2, 2))

model.add(Flatten())  # convert 2D feature maps to 1D vector

model.add(Dense(128, activation='relu'))  # fully connected layer
model.add(Dropout(0.5))  # randomly drop 50% neurons to reduce overfitting

model.add(Dense(num_classes, activation='softmax'))  # output: probability for each class

model.summary()
```
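
The shapes and parameter counts that `model.summary()` reports can be reproduced by hand. The sketch below assumes the Keras defaults used above (stride-1 convolutions with `'valid'` padding, 2x2 pooling with stride 2) and, as an assumption, 38 output classes; the helper names are hypothetical.

```python
def conv_out(size, kernel=3):
    """Spatial size after a stride-1 convolution with 'valid' padding."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Spatial size after non-overlapping max pooling (any remainder is dropped)."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Kernel weights plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


size = 64
size = pool_out(conv_out(size))  # Conv2D(32) then MaxPooling2D: 64 -> 62 -> 31
size = pool_out(conv_out(size))  # Conv2D(64) then MaxPooling2D: 31 -> 29 -> 14
flat = size * size * 64          # Flatten: 14 * 14 * 64 values

num_classes_assumed = 38  # assumption: use len(categories) from the notebook
total = (conv_params(3, 32) + conv_params(32, 64)
         + dense_params(flat, 128) + dense_params(128, num_classes_assumed))

print("Flatten size:", flat)
print("Trainable parameters:", total)
```

Almost all of the parameters sit in the first `Dense` layer, which is why the `Flatten` size has such a large effect on model size.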

### 6. Compile the Model:

```python3
# categorical_crossentropy: standard loss for multi-class classification with one-hot labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
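
For intuition, categorical cross-entropy for a single sample is `-sum(t * log(p))` over the classes, which with a one-hot target reduces to the negative log of the probability assigned to the true class. The following is a minimal plain-Python sketch of that formula, not the Keras implementation; the `eps` guard against `log(0)` is an illustrative choice.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Loss for one sample: -sum(t * log(p)) over all classes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))


y_true = [0.0, 0.0, 1.0]          # one-hot target: true class is index 2
confident = [0.05, 0.05, 0.90]    # most probability on the true class
unsure = [0.40, 0.40, 0.20]       # little probability on the true class

print(f"confident: {categorical_crossentropy(y_true, confident):.4f}")
print(f"unsure:    {categorical_crossentropy(y_true, unsure):.4f}")
```

The loss is small when the softmax output puts most of its mass on the correct class and grows quickly as that probability shrinks.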

### 7. Train the Model:

```python3
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```
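
Note that `validation_split` takes the validation data from the last samples of the arrays passed to `fit`, before any shuffling. Because `train_test_split` shuffles by default, that tail is already a random slice here. A rough sketch of the slicing follows; `split_last_fraction` is a hypothetical helper, and the exact rounding Keras applies is an assumption.

```python
def split_last_fraction(samples, validation_split=0.2):
    """Hold out the last `validation_split` fraction of `samples` for validation."""
    split_at = int(len(samples) * (1.0 - validation_split))
    return samples[:split_at], samples[split_at:]


indices = list(range(10))
train_part, val_part = split_last_fraction(indices)
print("train:", train_part)  # first 80% of the samples
print("val:  ", val_part)    # last 20% of the samples
```

If the arrays were ordered by class instead of shuffled, this tail would contain only the last few classes, which is worth keeping in mind when changing the loading or splitting code.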

### 8. Evaluate the Model on Test Data:

```python3
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy*100:.2f}%")
```

### 9. Plot Training vs Validation Accuracy:

```python3
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('CNN Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
```

### 10. Plot Training vs Validation Loss:

```python3
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('CNN Model Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()
```

### 11. Confusion Matrix and Classification Report:

```python3
y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class index
y_true = np.argmax(y_test, axis=1)  # actual class index (from one-hot)

cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(14, 12))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=categories, yticklabels=categories)
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

print("\nClassification Report:\n")
print(classification_report(y_true, y_pred, target_names=categories))
```
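
The precision and recall columns of the classification report follow directly from the confusion matrix, whose rows are actual classes and whose columns are predicted classes. Below is a small plain-Python sketch on a made-up 3-class matrix; `per_class_metrics` is a hypothetical helper and the numbers are illustrative, not results from this model.

```python
def per_class_metrics(cm):
    """Return (precision, recall) per class for cm[actual][predicted]."""
    metrics = []
    for k in range(len(cm)):
        true_pos = cm[k][k]
        predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
        actual_k = sum(cm[k])                    # row sum: everything that really is k
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        metrics.append((precision, recall))
    return metrics


# Made-up counts for illustration only
toy_cm = [[8, 2, 0],
          [1, 7, 2],
          [0, 1, 9]]

for k, (precision, recall) in enumerate(per_class_metrics(toy_cm)):
    print(f"class {k}: precision={precision:.2f} recall={recall:.2f}")
```

Off-diagonal cells show which pairs of classes get confused with each other, which the heatmap above makes easy to spot.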

---

## Miscellaneous

- [Dataset source](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset)

---