Compare commits
11 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| | 451d0cd299 | | |
| | aac5138b33 | | |
| | 879a46011a | | |
| | 90086d4bfa | | |
| | c3504f2743 | | |
| | a5914df5c7 | | |
| | fd71b0ff24 | | |
| | 811033a359 | | |
| | 0c674d02f9 | | |
| | b54e56669d | | |
| | 0c028cb9c4 | | |
+186
@@ -0,0 +1,186 @@
# Practical-1 (Linear Regression using Deep Neural Network)

Problem Statement: Linear regression using a deep neural network: implement the Boston housing price prediction problem as linear regression using a deep neural network. Use the Boston house price prediction dataset.

> [!NOTE]
> Dataset available in the [Datasets](../Datasets/boston.csv) directory.

---

## Prerequisites

1. Install packages using `pip`: `pip install tensorflow keras pandas numpy scikit-learn matplotlib seaborn` (`tensorflow` requires Python 3.9 - 3.12)
2. Copy the `boston.csv` dataset into the same directory as the Jupyter notebook.

## Steps

1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Check for Missing Values
5. Correlation Heatmap
6. Separate Features and Target
7. Split into Training and Testing Sets
8. Feature Scaling (Standardization)
9. Build the Neural Network Model
10. Compile the Model
11. Train the Model
12. Evaluate the Model on Test Data
13. Make Predictions
14. Plot Training vs Validation Loss
15. Plot Predicted vs Actual Prices

---

## Code

### 1. Import Libraries:

```python3
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from keras import Input
from keras.models import Sequential
from keras.layers import Dense
```

### 2. Load Dataset:

```python3
data = pd.read_csv('boston.csv')
print(data.head())
```

### 3. Exploratory Data Analysis (EDA):

```python3
print("Shape:", data.shape)  # number of rows and columns
print("\nData Types:\n", data.dtypes)
print("\nStatistical Summary:\n", data.describe())  # min, max, mean, std, etc.
```

### 4. Check for Missing Values:

```python3
print("Missing values per column:\n", data.isnull().sum())

# Drop rows with missing values (if any)
data = data.dropna()
print("\nShape after dropping nulls:", data.shape)
```

### 5. Correlation Heatmap:

```python3
plt.figure(figsize=(12, 8))
sns.heatmap(data.corr(), annot=True, fmt=".2f", cmap="coolwarm")  # show correlation between all feature pairs
plt.title("Feature Correlation Heatmap")
plt.tight_layout()
plt.show()
```

### 6. Separate Features and Target:

```python3
X = data.drop('MEDV', axis=1)  # all columns except house price
y = data['MEDV']               # target: median house price
```

### 7. Split into Training and Testing Sets:

```python3
# 80% train, 20% test; random_state=42 ensures reproducible split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
```

### 8. Feature Scaling (Standardization):

```python3
scaler = StandardScaler()

X_train = scaler.fit_transform(X_train)  # learn mean/std from train, then scale
X_test = scaler.transform(X_test)        # apply same mean/std to test (no leakage)
```
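The scaling step above can be traced by hand on a single column. The sketch below is not scikit-learn's implementation; it only mirrors the idea with the standard library, and the `train_rooms` and `test_rooms` values are hypothetical numbers chosen for illustration. It uses the population standard deviation, which is what `StandardScaler` computes.

```python
import statistics


def fit_standardizer(column):
    """Return (mean, std) learned from a training column (population std)."""
    return statistics.fmean(column), statistics.pstdev(column)


def standardize(column, mean, std):
    """Scale values with a mean/std that were learned elsewhere."""
    return [(x - mean) / std for x in column]


# Hypothetical values for one feature (not taken from boston.csv)
train_rooms = [5.0, 6.0, 7.0]
test_rooms = [6.0, 8.0]

mean, std = fit_standardizer(train_rooms)             # statistics come from train only
train_scaled = standardize(train_rooms, mean, std)    # centered on 0
test_scaled = standardize(test_rooms, mean, std)      # reuses the train statistics

print(train_scaled)
print(test_scaled)
```

The test column is not re-centered on its own mean, so its scaled values need not average to zero. That is the point of calling `transform` rather than `fit_transform` on the test set.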

### 9. Build the Neural Network Model:

```python3
model = Sequential()
model.add(Input(shape=(X_train.shape[1],)))  # input shape = number of features
model.add(Dense(64, activation='relu'))      # hidden layer 1: 64 neurons
model.add(Dense(32, activation='relu'))      # hidden layer 2: 32 neurons
model.add(Dense(1, activation='linear'))     # output layer: single value (house price)

model.summary()
```
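To sanity-check the parameter counts that `model.summary()` prints, the arithmetic for fully connected layers can be reproduced by hand. This is a rough sketch that assumes 13 input features (the usual Boston housing column count without `MEDV`); the real value is whatever `X_train.shape[1]` is in your notebook.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 13            # assumption: feature count of the CSV after dropping MEDV
layer_units = [64, 32, 1]  # the three Dense layers from the model above

counts = []
n_inputs = n_features
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units       # this layer's output feeds the next layer

total = sum(counts)
print(counts, total)
```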

### 10. Compile the Model:

```python3
# adam: adaptive optimizer; mse: standard regression loss; mae: human-readable error metric
model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
```
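The difference between the loss (MSE) and the metric (MAE) is easier to see on a few numbers. The following is a plain-Python sketch of both formulas, with hypothetical prices; it is meant as an illustration, not as the Keras implementation.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences; large errors are penalized heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences; same unit as the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical actual and predicted prices
y_true = [24.0, 21.5, 34.5]
y_pred = [25.0, 20.5, 30.5]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(mse, mae)
```

The single error of 4 contributes 16 of the 18 squared-error units, which is why MSE is sensitive to outliers while MAE reads as "off by about 2 on average".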

### 11. Train the Model:

```python3
# validation_split=0.2 reserves 20% of training data to monitor val loss each epoch
history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
```
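It can help to work out how many rows each stage actually sees. The sketch below assumes a dataset of 506 rows (the size of the classic Boston data; your CSV may differ after `dropna()`), and it assumes the rounding shown in the comments. Keras takes the validation rows from the end of the training data before shuffling; treat the exact counts as an estimate to compare against the progress bar.

```python
import math


def split_sizes(n_rows, test_size=0.2, validation_split=0.2, batch_size=32):
    """Estimate row counts for the test set, the fitting set and the validation set."""
    n_test = math.ceil(n_rows * test_size)          # assumed: test share is rounded up
    n_train_full = n_rows - n_test
    n_fit = math.floor(n_train_full * (1.0 - validation_split))  # assumed: rounded down
    n_val = n_train_full - n_fit                    # held out from the end of the train set
    return {
        "test": n_test,
        "fit": n_fit,
        "validation": n_val,
        "steps_per_epoch": math.ceil(n_fit / batch_size),
    }


sizes = split_sizes(506)  # 506 is an assumption, not read from boston.csv
print(sizes)
```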

### 12. Evaluate the Model on Test Data:

```python3
loss, mae = model.evaluate(X_test, y_test)
print(f"Test Loss (MSE): {loss:.4f}")
print(f"Test Mean Absolute Error: {mae:.4f}")
```

### 13. Make Predictions:

```python3
predictions = model.predict(X_test)
print("First 5 Predicted Prices:", predictions[:5].flatten())
print("First 5 Actual Prices:   ", y_test.values[:5])
```

### 14. Plot Training vs Validation Loss:

```python3
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Epochs')
plt.ylabel('Loss (MSE)')
plt.xlabel('Epoch')
plt.legend()
plt.grid(True)
plt.show()
```

### 15. Plot Predicted vs Actual Prices:

```python3
plt.figure(figsize=(8, 6))
plt.scatter(y_test, predictions, alpha=0.7)  # each point = one test sample
plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()], 'r--', label='Ideal Fit')  # diagonal = perfect prediction
plt.xlabel('Actual Price')
plt.ylabel('Predicted Price')
plt.title('Actual vs Predicted House Prices')
plt.legend()
plt.grid(True)
plt.show()
```

---

## Miscellaneous

- [Dataset source](https://www.kaggle.com/datasets/fedesoriano/the-boston-houseprice-data)

---

@@ -0,0 +1,202 @@
# Practical-2b (Classification using Deep Neural Network - IMDB Dataset)

Problem Statement: Binary classification using Deep Neural Networks. Example: Classify movie reviews into "positive" reviews and "negative" reviews, based only on the text content of the reviews. Use the IMDB dataset.

> [!NOTE]
> Dataset available in the [Datasets](../Datasets/IMDB%20Dataset.csv) directory.

---

## Prerequisites

1. Install packages using `pip`: `pip install tensorflow keras pandas numpy scikit-learn matplotlib seaborn` (`tensorflow` requires Python 3.9 - 3.12)
2. Copy the `IMDB Dataset.csv` dataset into the same directory as the Jupyter notebook.

## Steps

1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Data Cleaning - Strip HTML Tags
5. Encode Labels and Separate Features
6. Tokenize and Pad Text Sequences
7. Split into Training and Testing Sets
8. Build the Neural Network Model
9. Compile the Model
10. Train the Model
11. Evaluate the Model on Test Data
12. Plot Training vs Validation Accuracy
13. Plot Training vs Validation Loss
14. Confusion Matrix and Classification Report

---

## Code

### 1. Import Libraries:

```python3
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
```

### 2. Load Dataset:

```python3
data = pd.read_csv('IMDB Dataset.csv')
print(data.head())
```

### 3. Exploratory Data Analysis (EDA):

```python3
print("Shape:", data.shape)
print("\nMissing Values:\n", data.isnull().sum())
print("\nClass Distribution:\n", data['sentiment'].value_counts())

# Visualize class distribution
sns.countplot(x='sentiment', data=data)
plt.title('Sentiment Class Distribution')
plt.show()

# Sample reviews
print("\nSample positive review:\n", data[data['sentiment'] == 'positive']['review'].iloc[0][:300])
print("\nSample negative review:\n", data[data['sentiment'] == 'negative']['review'].iloc[0][:300])
```

### 4. Data Cleaning - Strip HTML Tags:

```python3
def clean_text(text):
    text = re.sub(r'<.*?>', '', text)  # remove HTML tags like <br />
    text = text.lower().strip()        # lowercase and trim whitespace
    return text

data['review'] = data['review'].apply(clean_text)
print("Sample cleaned review:\n", data['review'].iloc[0][:300])
```

### 5. Encode Labels and Separate Features:

```python3
label_encoder = LabelEncoder()
data['sentiment'] = label_encoder.fit_transform(data['sentiment'])  # positive=1, negative=0

X = data['review'].values     # input: review text
y = data['sentiment'].values  # output: 0 or 1
```

### 6. Tokenize and Pad Text Sequences:

```python3
vocab_size = 10000  # keep only top 10,000 most frequent words
max_length = 200    # truncate/pad all reviews to 200 words

tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>')  # <OOV> handles unknown words
tokenizer.fit_on_texts(X)  # build word index from all reviews (this runs before the train/test split)

sequences = tokenizer.texts_to_sequences(X)  # convert each word to its integer index
padded_sequences = pad_sequences(sequences, maxlen=max_length,
                                 padding='post', truncating='post')  # pad/truncate to fixed length
```
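What the tokenizer and `pad_sequences` do is easier to follow on a toy corpus. The sketch below is a simplified stand-in written with the standard library, not the Keras implementation: it splits on spaces only, and the helper names (`build_word_index`, `texts_to_sequences`, `pad_post`) are hypothetical. It does keep the same conventions as the step above: index 0 is padding, index 1 is the out-of-vocabulary token, and only indices below `num_words` are kept.

```python
from collections import Counter

OOV_INDEX = 1  # index 0 is left for padding, index 1 stands for unknown words


def build_word_index(texts):
    """Rank words by frequency; the most frequent word gets index 2."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=2)}


def texts_to_sequences(texts, word_index, num_words):
    """Map words to indices; unseen or too-rare words become OOV_INDEX."""
    sequences = []
    for text in texts:
        seq = []
        for word in text.split():
            idx = word_index.get(word, OOV_INDEX)
            seq.append(idx if idx < num_words else OOV_INDEX)
        sequences.append(seq)
    return sequences


def pad_post(seq, maxlen):
    """Cut off the tail of long sequences and zero-pad the tail of short ones."""
    return seq[:maxlen] + [0] * (maxlen - len(seq))


texts = ["good movie", "bad movie", "good good plot"]  # toy reviews
word_index = build_word_index(texts)
sequences = texts_to_sequences(texts, word_index, num_words=5)
padded = [pad_post(seq, 4) for seq in sequences]
print(word_index)
print(padded)
```

With `num_words=5`, the least frequent word (`plot`, index 5) falls outside the vocabulary and is mapped to the OOV index, which is the same effect `vocab_size` has on rare words in the real reviews.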

### 7. Split into Training and Testing Sets:

```python3
X_train, X_test, y_train, y_test = train_test_split(padded_sequences, y, test_size=0.2, random_state=42)
```

### 8. Build the Neural Network Model:

```python3
model = Sequential()
model.add(Embedding(vocab_size, 16))       # maps each word index to a 16-dim vector
model.add(GlobalAveragePooling1D())        # averages all word vectors into one vector
model.add(Dense(24, activation='relu'))    # hidden layer: 24 neurons
model.add(Dense(1, activation='sigmoid'))  # output: probability between 0 and 1 (binary)

model.summary()
```
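The embedding-plus-pooling idea can be illustrated without TensorFlow. In this sketch the `embedding` table is a hypothetical 2-dimensional lookup with made-up values (the real layer learns 16-dimensional vectors), and `global_average_pool` only mimics what averaging over the sequence axis does.

```python
def global_average_pool(vectors):
    """Average a list of equal-length vectors into a single vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


# Hypothetical 2-dim embedding table: word index -> vector
embedding = {
    2: [1.0, 0.5],
    3: [-1.0, 0.5],
}

sequence = [2, 3, 2, 2]   # a tokenized review
shuffled = [2, 2, 3, 2]   # same words, different order

pooled = global_average_pool([embedding[i] for i in sequence])
pooled_shuffled = global_average_pool([embedding[i] for i in shuffled])
print(pooled, pooled_shuffled)
```

Both orderings pool to the same vector, which shows that this architecture treats a review as a bag of words: it is cheap and often a reasonable baseline, but it cannot use word order.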

### 9. Compile the Model:

```python3
# binary_crossentropy: standard loss for binary classification; pairs with the sigmoid output
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
```
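For intuition about the loss chosen above, binary cross-entropy can be computed by hand for a couple of predictions. This is a minimal sketch of the formula; the clipping constant `eps` is an assumption used here to avoid `log(0)`, and the probabilities are hypothetical.

```python
import math


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean of -[t*log(p) + (1-t)*log(1-p)] over all samples."""
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # keep p away from exactly 0 or 1
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)


confident_right = binary_crossentropy([1, 0], [0.9, 0.1])
confident_wrong = binary_crossentropy([1], [0.1])
print(confident_right, confident_wrong)
```

A confident wrong prediction costs far more than a confident correct one saves, which pushes the sigmoid output toward well-calibrated probabilities.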

### 10. Train the Model:

```python3
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```

### 11. Evaluate the Model on Test Data:

```python3
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy*100:.2f}%")
```

### 12. Plot Training vs Validation Accuracy:

```python3
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy Over Epochs')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend()
plt.grid(True)
plt.show()
```

### 13. Plot Training vs Validation Loss:

```python3
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Epochs')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend()
plt.grid(True)
plt.show()
```

### 14. Confusion Matrix and Classification Report:

```python3
y_pred = (model.predict(X_test) > 0.5).astype(int)  # threshold 0.5: prob > 0.5 = positive

cm = confusion_matrix(y_test, y_pred)
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()

print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
```
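To read the heatmap and the report correctly, it helps to build a tiny confusion matrix by hand. The sketch below uses hypothetical labels and probabilities and follows the same layout as scikit-learn for labels 0 and 1: rows are actual classes, columns are predicted classes, so the matrix is `[[TN, FP], [FN, TP]]`.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Threshold probabilities and count TN, FP, FN, TP."""
    tn = fp = fn = tp = 0
    for actual, prob in zip(y_true, y_prob):
        predicted = 1 if prob > threshold else 0
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1 and predicted == 0:
            fn += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        else:
            tn += 1
    return [[tn, fp], [fn, tp]]


# Hypothetical ground truth and model outputs
y_true = [1, 1, 0, 0, 1]
y_prob = [0.9, 0.4, 0.2, 0.7, 0.8]

matrix = confusion_counts(y_true, y_prob)
(tn, fp), (fn, tp) = matrix
precision = tp / (tp + fp)  # of the reviews predicted positive, how many were positive
recall = tp / (tp + fn)     # of the positive reviews, how many were found
print(matrix, precision, recall)
```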

---

## Miscellaneous

- [Dataset source](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)

---

@@ -0,0 +1,240 @@
# Practical-3a (Convolutional Neural Network - Plant Diseases)

Problem Statement: Convolutional Neural Network (CNN): Use any dataset of plant disease and design a plant disease detection system using CNN.

> [!NOTE]
> Download the dataset directly from the [source](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset/data).
> It hasn't been added to the `/Datasets` directory due to its large size.
> The specific dataset doesn't really matter in this case; you just need to ensure the dataset directory contains `train` and `valid` sub-directories.
> Refer to the above dataset to understand the required directory structure.

---

## Prerequisites

1. Install packages using `pip`: `pip install tensorflow keras numpy opencv-python matplotlib seaborn scikit-learn` (`tensorflow` requires Python 3.9 - 3.12)
2. Download and unzip the dataset into the same directory as the Jupyter notebook.
3. Ensure your unzipped dataset has the required directory structure:

```text
New Plant Diseases Dataset(Augmented)/
├── train
│   ├── Apple___Apple_scab
│   ├── Apple___Black_rot
│   ├── Apple___Cedar_apple_rust
├── valid
│   ├── Apple___Apple_scab
│   ├── Apple___Black_rot
│   ├── Apple___Cedar_apple_rust
```

## Steps

1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Split into Training and Testing Sets
5. Build the CNN Model
6. Compile the Model
7. Train the Model
8. Evaluate the Model on Test Data
9. Plot Training vs Validation Accuracy
10. Plot Training vs Validation Loss
11. Confusion Matrix and Classification Report

---

## Code

### 1. Import Libraries:

```python3
import os
import numpy as np
import cv2
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from tensorflow.keras.utils import to_categorical
```

### 2. Load Dataset:

```python3
data = []
labels = []

# Path to dataset folder containing one subfolder per disease class
path = './New Plant Diseases Dataset(Augmented)/train/'
categories = sorted(os.listdir(path))  # sort for consistent label ordering

# Map each category name to a numeric index
label_dict = {category: idx for idx, category in enumerate(categories)}
print("Classes found:", len(categories))

max_per_class = 200  # cap images per class to avoid RAM overflow on large datasets

for category in categories:
    folder = os.path.join(path, category)
    count = 0
    for img_name in os.listdir(folder):
        if count >= max_per_class:
            break
        img_path = os.path.join(folder, img_name)
        img_array = cv2.imread(img_path)
        if img_array is not None:  # skip unreadable files
            img_array = cv2.resize(img_array, (64, 64))  # resize to fixed 64x64 pixels
            data.append(img_array)
            labels.append(label_dict[category])
            count += 1

data = np.array(data) / 255.0  # normalize pixel values from [0,255] to [0,1]
labels = np.array(labels)
print("Dataset shape:", data.shape)
print("Labels shape:", labels.shape)
```

### 3. Exploratory Data Analysis (EDA):

```python3
print("Total images:", len(data))
print("Image shape:", data[0].shape)
print("Number of classes:", len(categories))

# Class distribution bar chart
class_counts = {cat: int((labels == idx).sum()) for cat, idx in label_dict.items()}
plt.figure(figsize=(14, 5))
plt.bar(class_counts.keys(), class_counts.values())
plt.xticks(rotation=90)
plt.title("Number of Images per Disease Class")
plt.xlabel("Class")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

# Sample images from first 5 classes
plt.figure(figsize=(15, 3))
for i, category in enumerate(categories[:5]):
    idx = np.where(labels == label_dict[category])[0][0]  # index of first image in class
    plt.subplot(1, 5, i + 1)
    plt.imshow(cv2.cvtColor((data[idx] * 255).astype(np.uint8), cv2.COLOR_BGR2RGB))
    plt.title(category[:15], fontsize=8)
    plt.axis('off')
plt.suptitle("Sample Images per Class")
plt.show()
```

### 4. Split into Training and Testing Sets:

```python3
X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)

num_classes = len(categories)
# One-hot encode labels: e.g. class 2 of 5 → [0, 0, 1, 0, 0]
y_train = to_categorical(y_train, num_classes)
y_test = to_categorical(y_test, num_classes)
print("Train samples:", X_train.shape[0])
print("Test samples: ", X_test.shape[0])
```
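One-hot encoding and its inverse can be sketched in a few lines of plain Python. This is only an illustration of what `to_categorical` produces and of the `argmax` decoding used later for the confusion matrix; the helper names `one_hot` and `decode` are hypothetical.

```python
def one_hot(labels, num_classes):
    """Turn integer class indices into rows with a single 1.0."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)] for label in labels]


def decode(rows):
    """Recover class indices by taking the position of the largest value."""
    return [row.index(max(row)) for row in rows]


encoded = one_hot([2, 0], num_classes=5)
decoded = decode(encoded)
print(encoded)
print(decoded)
```

The same `decode` logic applies to softmax outputs: the predicted class is the index with the highest probability.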

### 5. Build the CNN Model:

```python3
model = Sequential()

model.add(Input(shape=(64, 64, 3)))               # input: 64x64 image, 3 channels (BGR order, as loaded by cv2)
model.add(Conv2D(32, (3, 3), activation='relu'))  # 32 filters, detect basic features
model.add(MaxPooling2D(2, 2))                     # downsample by 2x

model.add(Conv2D(64, (3, 3), activation='relu'))  # 64 filters, detect complex features
model.add(MaxPooling2D(2, 2))

model.add(Flatten())                              # convert 2D feature maps to 1D vector

model.add(Dense(128, activation='relu'))          # fully connected layer
model.add(Dropout(0.5))                           # randomly drop 50% neurons to reduce overfitting

model.add(Dense(num_classes, activation='softmax'))  # output: probability for each class

model.summary()
```
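The shapes reported by `model.summary()` for this network can be derived with simple arithmetic. The sketch below assumes the Keras defaults used above (no padding and stride 1 for `Conv2D`, pooling stride equal to the pool size); the helper names are hypothetical.

```python
def conv_valid(size, kernel):
    """Spatial size after a stride-1 convolution without padding."""
    return size - kernel + 1


def pool(size, window):
    """Spatial size after non-overlapping pooling (leftover rows/columns are dropped)."""
    return size // window


def conv_params(kernel, in_channels, filters):
    """One kernel x kernel x in_channels weight block plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


size = 64
size = conv_valid(size, 3)  # first Conv2D
size = pool(size, 2)        # first MaxPooling2D
size = conv_valid(size, 3)  # second Conv2D
size = pool(size, 2)        # second MaxPooling2D

flat_features = size * size * 64              # Flatten output length
conv1 = conv_params(3, 3, 32)
conv2 = conv_params(3, 32, 64)
dense128 = flat_features * 128 + 128          # Dense(128) weights and biases
print(size, flat_features, conv1, conv2, dense128)
```

Almost all of the parameters sit in the first dense layer after `Flatten`, which is one reason the `Dropout(0.5)` is placed right behind it.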

### 6. Compile the Model:

```python3
# categorical_crossentropy: standard loss for multi-class classification with one-hot labels
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
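As a small worked example of the loss named above, softmax and categorical cross-entropy can be computed by hand for one sample. This is a sketch of the formulas with hypothetical logits for three classes; `eps` is an assumed clipping constant that avoids `log(0)`.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_row, probs, eps=1e-7):
    """-sum(t * log(p)); only the true class contributes."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_row, probs))


probs = softmax([2.0, 1.0, 0.0])                            # hypothetical scores
loss_if_class0 = categorical_crossentropy([1, 0, 0], probs)  # true class has the top score
loss_if_class2 = categorical_crossentropy([0, 0, 1], probs)  # true class has the lowest score
print(probs, loss_if_class0, loss_if_class2)
```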

### 7. Train the Model:

```python3
history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
```

### 8. Evaluate the Model on Test Data:

```python3
loss, accuracy = model.evaluate(X_test, y_test)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy*100:.2f}%")
```

### 9. Plot Training vs Validation Accuracy:

```python3
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('CNN Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
```

### 10. Plot Training vs Validation Loss:

```python3
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('CNN Model Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()
```

### 11. Confusion Matrix and Classification Report:

```python3
y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class index
y_true = np.argmax(y_test, axis=1)                 # actual class index (from one-hot)

cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(14, 12))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=categories, yticklabels=categories)
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.xticks(rotation=90)
plt.tight_layout()
plt.show()

print("\nClassification Report:\n")
print(classification_report(y_true, y_pred, target_names=categories))
```

---

## Miscellaneous

- [Dataset source](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset)

---

@@ -0,0 +1,233 @@
# Practical-3b (Convolutional Neural Network - MNIST Fashion Dataset)

Problem Statement: Convolutional Neural Network (CNN): Use MNIST Fashion Dataset and create a classifier to classify fashion clothing into categories.

> [!NOTE]
> Download the dataset directly from the [source](https://www.kaggle.com/datasets/zalando-research/fashionmnist).
> Dataset available in the [Datasets](../Datasets/fashionmnist.zip) directory.
> In the code, the dataset is downloaded directly from Keras/TensorFlow in the 2nd step (Load Dataset).

---

## Prerequisites

1. Install packages using `pip`: `pip install tensorflow keras numpy matplotlib seaborn scikit-learn` (`tensorflow` requires Python 3.9 - 3.12)

## Steps

1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Preprocess Data
5. Build the CNN Model
6. Compile the Model
7. Train the Model
8. Evaluate the Model on Test Data
9. Plot Training vs Validation Accuracy
10. Plot Training vs Validation Loss
11. Confusion Matrix and Classification Report
12. Visualize Sample Predictions

---

## Code

### 1. Import Libraries:

```python3
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, AvgPool2D, GlobalAveragePooling2D, Dense
from tensorflow.keras.utils import to_categorical
from sklearn.metrics import confusion_matrix, classification_report
```

### 2. Load Dataset:

```python3
# Fashion MNIST is built into Keras, downloads automatically on first run
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()

# Alternative, disabled inside a string literal: load from local .gz files instead
'''
import numpy as np
import gzip
import os

def load_fashion_mnist(path):
    """Load Fashion MNIST from local .gz files (Kaggle Zalando format)."""
    files = {
        'X_train': 'train-images-idx3-ubyte.gz',
        'y_train': 'train-labels-idx1-ubyte.gz',
        'X_test': 't10k-images-idx3-ubyte.gz',
        'y_test': 't10k-labels-idx1-ubyte.gz',
    }

    with gzip.open(os.path.join(path, files['X_train'])) as f:
        X_train = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)
    with gzip.open(os.path.join(path, files['y_train'])) as f:
        y_train = np.frombuffer(f.read(), np.uint8, offset=8)
    with gzip.open(os.path.join(path, files['X_test'])) as f:
        X_test = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)
    with gzip.open(os.path.join(path, files['y_test'])) as f:
        y_test = np.frombuffer(f.read(), np.uint8, offset=8)

    return (X_train, y_train), (X_test, y_test)

# Replace the Keras load line with:
(X_train, y_train), (X_test, y_test) = load_fashion_mnist('./fashion-mnist/')
'''

print("Training set shape:", X_train.shape)  # (60000, 28, 28)
print("Test set shape:    ", X_test.shape)   # (10000, 28, 28)
print("Classes:", np.unique(y_train))
```
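The `offset=16` and `offset=8` values in the disabled local loader come from the IDX file layout: image files start with four big-endian 32-bit integers (magic number, image count, rows, columns) and label files with two (magic number, label count). The sketch below builds tiny in-memory files to show this; the 2x2 "images" and the helper names are hypothetical, and only the standard library is used.

```python
import struct


def parse_idx_images(raw):
    """Read the 16-byte header, then split the remaining bytes into images."""
    magic, count, rows, cols = struct.unpack(">IIII", raw[:16])
    if magic != 2051:  # 0x00000803 marks an unsigned-byte, 3-dimensional IDX file
        raise ValueError("not an IDX image file")
    size = rows * cols
    body = raw[16:]
    return [list(body[i * size:(i + 1) * size]) for i in range(count)]


def parse_idx_labels(raw):
    """Read the 8-byte header, then one byte per label."""
    magic, count = struct.unpack(">II", raw[:8])
    if magic != 2049:  # 0x00000801 marks an unsigned-byte, 1-dimensional IDX file
        raise ValueError("not an IDX label file")
    return list(raw[8:8 + count])


# Two hypothetical 2x2 images and their labels, packed the way IDX files are laid out
image_bytes = struct.pack(">IIII", 2051, 2, 2, 2) + bytes(range(8))
label_bytes = struct.pack(">II", 2049, 2) + bytes([9, 0])

images = parse_idx_images(image_bytes)
labels = parse_idx_labels(label_bytes)
print(images, labels)
```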

### 3. Exploratory Data Analysis (EDA):

```python3
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Class distribution
unique, counts = np.unique(y_train, return_counts=True)
plt.figure(figsize=(10, 4))
plt.bar([class_names[i] for i in unique], counts)
plt.xticks(rotation=45, ha='right')
plt.title("Training Set Class Distribution")
plt.ylabel("Count")
plt.tight_layout()
plt.show()

# Sample images (one per class)
plt.figure(figsize=(15, 3))
for i, cls in enumerate(class_names):
    idx = np.where(y_train == i)[0][0]  # index of first image for this class
    plt.subplot(1, 10, i + 1)
    plt.imshow(X_train[idx], cmap='gray')
    plt.title(cls, fontsize=7)
    plt.axis('off')
plt.suptitle("Sample Image per Class")
plt.tight_layout()
plt.show()
```

### 4. Preprocess Data:

```python3
# Reshape to add channel dimension: (samples, 28, 28) -> (samples, 28, 28, 1)
X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0  # normalize to [0,1]
X_test = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0

# One-hot encode labels: e.g. class 3 of 10 -> [0,0,0,1,0,0,0,0,0,0]
y_train_cat = to_categorical(y_train, num_classes=10)
y_test_cat = to_categorical(y_test, num_classes=10)

print("X_train shape:", X_train.shape)
print("y_train_cat shape:", y_train_cat.shape)
```

### 5. Build the CNN Model:

```python3
model = Sequential()

model.add(Input(shape=(28, 28, 1)))  # input: 28x28 grayscale image
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same'))  # 64 filters, extract features
model.add(AvgPool2D(pool_size=(2, 2)))  # downsample to 14x14

model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same'))  # 32 filters, refine features
model.add(AvgPool2D(pool_size=(2, 2)))  # downsample to 7x7

model.add(GlobalAveragePooling2D())         # average each feature map to single value
model.add(Dense(10, activation='softmax'))  # output: probability for each of 10 classes

model.summary()
```
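A quick parameter count shows why `GlobalAveragePooling2D` keeps this model small. The sketch below redoes the arithmetic by hand for the layers above and compares the classifier head against a hypothetical `Flatten` variant, which is not part of the notebook and is included only for contrast.

```python
def conv_params(kernel, in_channels, filters):
    """Weights of all filters plus one bias per filter."""
    return kernel * kernel * in_channels * filters + filters


def dense_params(n_inputs, n_units):
    """Fully connected weights plus biases."""
    return n_inputs * n_units + n_units


conv1 = conv_params(3, 1, 64)    # grayscale input, 64 filters
conv2 = conv_params(3, 64, 32)   # 64 input channels, 32 filters

# padding='same' keeps 28x28, each 2x2 pooling halves it: 28 -> 14 -> 7
final_size = 28 // 2 // 2

head_with_gap = dense_params(32, 10)                                  # one averaged value per feature map
head_with_flatten = dense_params(final_size * final_size * 32, 10)    # hypothetical alternative head

total = conv1 + conv2 + head_with_gap
print(conv1, conv2, head_with_gap, head_with_flatten, total)
```

Averaging each 7x7 feature map down to one number cuts the output layer from roughly 15.7k to 330 parameters, at the cost of discarding where in the image a feature appeared.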
|
||||||
|
|
||||||
|
### 6. Compile the Model:
|
||||||
|
|
||||||
|
```python3
|
||||||
|
# categorical_crossentropy: standard loss for multi-class one-hot classification
|
||||||
|
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
|
||||||
|
```
|
||||||
|
|
||||||
|
### 7. Train the Model:
|
||||||
|
|
||||||
|
```python3
|
||||||
|
# validation_data uses test set to monitor performance after each epoch
|
||||||
|
history = model.fit(X_train, y_train_cat, epochs=10, validation_data=(X_test, y_test_cat))
|
||||||
|
```
|
||||||
|
|
||||||
|
### 8. Evaluate the Model on Test Data:
|
||||||
|
|
||||||
|
```python3
|
||||||
|
loss, accuracy = model.evaluate(X_test, y_test_cat)
|
||||||
|
print(f"Test Loss: {loss:.4f}")
|
||||||
|
print(f"Test Accuracy: {accuracy*100:.2f}%")
|
||||||
|
```

### 9. Plot Training vs Validation Accuracy:

```python3
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
```

### 10. Plot Training vs Validation Loss:

```python3
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()
```

### 11. Confusion Matrix and Classification Report:

```python3
y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class index

cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()

print("\nClassification Report:\n")
print(classification_report(y_test, y_pred, target_names=class_names))
```
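
For readers who want to see what the matrix and the report are counting, here is a small dependency-free sketch. The helper names and the toy labels are made up for illustration; rows are actual classes and columns are predicted classes, matching the axis labels in the heatmap above.

```python3
def confusion_matrix_counts(y_true, y_pred, n_classes):
    """cm[actual][predicted] counts how often each pair occurs."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for actual, predicted in zip(y_true, y_pred):
        cm[actual][predicted] += 1
    return cm


def precision_recall(cm, k):
    """Precision and recall for class k, read off the confusion matrix."""
    true_positives = cm[k][k]
    predicted_k = sum(row[k] for row in cm)  # column sum: everything predicted as k
    actual_k = sum(cm[k])                    # row sum: everything that really is k
    precision = true_positives / predicted_k if predicted_k else 0.0
    recall = true_positives / actual_k if actual_k else 0.0
    return precision, recall


toy_true = [0, 0, 1, 1, 2, 2]
toy_pred = [0, 1, 1, 1, 2, 0]
toy_cm = confusion_matrix_counts(toy_true, toy_pred, 3)
precision_1, recall_1 = precision_recall(toy_cm, 1)

print(toy_cm)
print(precision_1, recall_1)
```

Off-diagonal entries are the misclassifications, so a row with large off-diagonal counts points to a class the model often confuses with another.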

### 12. Visualize Sample Predictions:

```python3
# batch predict all test images, then pick 10 random ones to display
all_preds = np.argmax(model.predict(X_test), axis=1)
random_indices = np.random.choice(len(X_test), 10, replace=False)

plt.figure(figsize=(20, 4))
for i, idx in enumerate(random_indices):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
    predicted = class_names[all_preds[idx]]
    actual = class_names[y_test[idx]]
    color = 'green' if predicted == actual else 'red'  # green = correct, red = wrong
    plt.title(f"P: {predicted}\nA: {actual}", fontsize=8, color=color)
    plt.axis('off')
plt.suptitle("Sample Predictions (Green=Correct, Red=Wrong)")
plt.tight_layout()
plt.show()
```

---

## Miscellaneous

- [Dataset source](https://www.kaggle.com/datasets/zalando-research/fashionmnist)

---
+276
@@ -0,0 +1,276 @@
# Practical-4 (Recurrent Neural Network - Google Stock Price Dataset)

Problem Statement: Recurrent neural network (RNN): Use the Google stock prices dataset and design a time series analysis and prediction system using RNN.

> [!NOTE]
> Dataset available in [Datasets](../Datasets/GOOG.csv) directory.
> In the code, the dataset is downloaded directly from Yahoo Finance using `yfinance` in the 2nd step (Load Dataset).

---

## Prerequisites

1. Install packages using `pip`: `pip install tensorflow keras numpy pandas matplotlib scikit-learn yfinance` (`tensorflow` requires Python 3.9 - 3.12)

## Steps

1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Visualize Closing Price Over Time
5. Preprocess Data - Normalize Closing Price
6. Create Sequences for RNN Input
7. Build the RNN Model
8. Train the Model
9. Plot Training vs Validation Loss
10. Make Predictions and Inverse Scale
11. Evaluate the Model
12. Plot Actual vs Predicted Stock Price
13. Forecast Next 30 Days

---

## Code

### 1. Import Libraries:

```python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import yfinance as yf
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error, mean_absolute_error
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, SimpleRNN, Dropout
from tensorflow.keras.callbacks import EarlyStopping
```

### 2. Load Dataset:

```python3
# Downloads GOOGL stock data from Yahoo Finance for the given date range
ticker = "GOOGL"
df = yf.download(ticker, start="2018-01-01", end="2024-01-01")

# --- Offline alternative (comment out the yf.download above and use this instead if using local dataset) ---
# The file in the Datasets directory is named GOOG.csv; adjust the path if you renamed it
# df = pd.read_csv('GOOG.csv', index_col='Date', parse_dates=True)
# df = df.sort_index()  # ensure chronological order

# yfinance returns MultiIndex columns; flatten to a single level
df.columns = df.columns.get_level_values(0)

print(f"Dataset Shape: {df.shape}")
print(f"Date Range: {df.index.min().date()} to {df.index.max().date()}")
print(df.head())
```

### 3. Exploratory Data Analysis (EDA):

```python3
print("=== Dataset Info ===")
print(df.info())
print("\n=== Statistical Summary ===")
print(df.describe())
print("\n=== Missing Values ===")
print(df.isnull().sum())
```

### 4. Visualize Closing Price Over Time:

```python3
plt.figure(figsize=(16, 6))
plt.plot(df.index, df['Close'], color='steelblue', linewidth=1.5, label='Close Price')
plt.title('Google (GOOGL) Stock Closing Price (2018–2024)')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

### 5. Preprocess Data - Normalize Closing Price:

```python3
data = df[['Close']].values  # use only Close price for prediction

scaler = MinMaxScaler(feature_range=(0, 1))
data_scaled = scaler.fit_transform(data)  # scale values to [0, 1]

print(f"Original data range: [{data.min():.2f}, {data.max():.2f}]")
print(f"Scaled data range: [{data_scaled.min():.4f}, {data_scaled.max():.4f}]")
print(f"Total data points: {len(data_scaled)}")
```
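
The scaling step boils down to a simple formula and its inverse, sketched below in plain Python for the default `(0, 1)` range. The helper names and toy prices are illustrative only; `MinMaxScaler` does the same arithmetic per column.

```python3
def min_max_fit(values):
    """Remember the minimum and maximum seen during fitting."""
    return min(values), max(values)


def min_max_transform(values, lo, hi):
    """scaled = (x - min) / (max - min)"""
    return [(v - lo) / (hi - lo) for v in values]


def min_max_inverse(scaled, lo, hi):
    """x = scaled * (max - min) + min"""
    return [s * (hi - lo) + lo for s in scaled]


toy_prices = [50.0, 75.0, 100.0, 150.0]
lo, hi = min_max_fit(toy_prices)
toy_scaled = min_max_transform(toy_prices, lo, hi)
toy_restored = min_max_inverse(toy_scaled, lo, hi)

print(toy_scaled)
print(toy_restored)
```

Note that the step above fits the scaler on the full series, so the test period influences the minimum and maximum; fitting on the training portion only would be a stricter alternative.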

### 6. Create Sequences for RNN Input:

```python3
def create_sequences(data, time_steps=60):
    X, y = [], []
    for i in range(time_steps, len(data)):
        X.append(data[i - time_steps:i, 0])  # window of past `time_steps` days
        y.append(data[i, 0])  # next day's price
    return np.array(X), np.array(y)

TIME_STEPS = 60  # use past 60 days to predict the next day

# 80/20 train-test split (manual, to preserve time order)
train_size = int(len(data_scaled) * 0.80)
train_data = data_scaled[:train_size]
test_data = data_scaled[train_size - TIME_STEPS:]  # overlap gives the first test target a full 60-day window

X_train, y_train = create_sequences(train_data, TIME_STEPS)
X_test, y_test = create_sequences(test_data, TIME_STEPS)

# Reshape to [samples, time_steps, features], the input format RNN layers expect
X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
X_test = X_test.reshape((X_test.shape[0], X_test.shape[1], 1))

print(f"Training samples: {X_train.shape}")
print(f"Testing samples: {X_test.shape}")
```
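
A toy run makes the windowing and the train/test overlap easier to follow. The sketch below mirrors `create_sequences` on a plain list with a window of 3 instead of 60; `make_windows` and the toy series are illustrative names, not part of the practical.

```python3
def make_windows(series, time_steps):
    """Pair each value with the `time_steps` values that precede it."""
    X, y = [], []
    for i in range(time_steps, len(series)):
        X.append(series[i - time_steps:i])
        y.append(series[i])
    return X, y


toy = list(range(10))  # stand-in for the scaled price series: 0, 1, ..., 9
steps = 3
split = 8              # first 8 points are used for training

X_tr, y_tr = make_windows(toy[:split], steps)
# Start the test slice `steps` points early so the first test target has a full window
X_te, y_te = make_windows(toy[split - steps:], steps)

print(X_tr, y_tr)
print(X_te, y_te)
```

With the overlap, the test targets are exactly the points after the split, which is why the test dates can later be taken as `df.index[train_size:]`.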

### 7. Build the RNN Model:

```python3
model = Sequential()

model.add(Input(shape=(TIME_STEPS, 1)))  # input: sequence of 60 days
model.add(SimpleRNN(units=64, return_sequences=True))  # first RNN layer, passes output to next
model.add(Dropout(0.2))  # drop 20% neurons to reduce overfitting
model.add(SimpleRNN(units=64, return_sequences=False))  # second RNN layer, outputs single vector
model.add(Dropout(0.2))
model.add(Dense(units=32, activation='relu'))  # fully connected layer
model.add(Dense(units=1))  # output: single predicted price

model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
model.summary()
```
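
The parameter counts that `model.summary()` prints can be checked by hand. The sketch below uses the standard formula for a simple recurrent layer (input weights, recurrent weights, and a bias per unit) and assumes default bias settings; the helper names are illustrative.

```python3
def simple_rnn_params(input_dim, units):
    """Input kernel + recurrent kernel + bias: units * (input_dim + units + 1)."""
    return units * (input_dim + units + 1)


def dense_params(in_features, units):
    """One weight per input feature plus one bias, for each unit."""
    return (in_features + 1) * units


rnn1 = simple_rnn_params(input_dim=1, units=64)   # sees 1 feature per time step
rnn2 = simple_rnn_params(input_dim=64, units=64)  # sees the 64 outputs of the first layer
dense1 = dense_params(64, 32)
dense2 = dense_params(32, 1)
total = rnn1 + rnn2 + dense1 + dense2

print(rnn1, rnn2, dense1, dense2, total)
```

Dropout layers add no parameters. The count does not depend on `TIME_STEPS`, because the same weights are reused at every time step.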

### 8. Train the Model:

```python3
# EarlyStopping stops training if val_loss doesn't improve for 10 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    epochs=60,
    batch_size=32,
    validation_split=0.1,  # use 10% of training data for validation
    callbacks=[early_stop],
    verbose=1
)
print(f"\nTraining stopped at epoch: {len(history.history['loss'])}")
```
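
The patience logic can be illustrated without Keras. The following is a simplified sketch of the idea, not the callback's actual implementation: it ignores options such as `min_delta`, and the function name and toy losses are made up for the example.

```python3
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch), both 1-based, for per-epoch validation losses."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch             # patience exhausted: stop here
    return len(val_losses), best_epoch               # ran through every epoch


toy_losses = [0.9, 0.5, 0.4, 0.45, 0.41, 0.42, 0.39, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(toy_losses, patience=3)

print(stop_epoch, best_epoch)
```

In this toy run training halts three epochs after the last improvement, and `restore_best_weights=True` corresponds to rolling the model back to the weights from `best_epoch`.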

### 9. Plot Training vs Validation Loss:

```python3
plt.plot(history.history['loss'], label='Train Loss', color='royalblue')
plt.plot(history.history['val_loss'], label='Val Loss', color='tomato')
plt.title('Model Training Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('MSE Loss')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

### 10. Make Predictions and Inverse Scale:

```python3
y_pred_scaled = model.predict(X_test)

# Convert scaled predictions back to original USD price range
y_pred = scaler.inverse_transform(y_pred_scaled)
y_actual = scaler.inverse_transform(y_test.reshape(-1, 1))

print(f"Sample predictions (first 5): {y_pred[:5].flatten().round(2)}")
print(f"Actual values (first 5): {y_actual[:5].flatten().round(2)}")
```

### 11. Evaluate the Model:

```python3
mse = mean_squared_error(y_actual, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_actual, y_pred)
mape = np.mean(np.abs((y_actual - y_pred) / y_actual)) * 100  # mean absolute percentage error

print("=" * 40)
print(" MODEL EVALUATION METRICS")
print("=" * 40)
print(f" MSE : {mse:.4f}")
print(f" RMSE : {rmse:.4f}")
print(f" MAE : {mae:.4f}")
print(f" MAPE : {mape:.2f}%")
print("=" * 40)
```
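
The four metrics are straightforward to compute by hand, which helps when sanity-checking the printed values. This is a dependency-free sketch on made-up numbers; `regression_metrics` is an illustrative helper, not a scikit-learn function.

```python3
import math


def regression_metrics(actual, predicted):
    """MSE, RMSE, MAE and MAPE (in percent) for two equal-length lists."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n * 100
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae, "mape": mape}


toy_actual = [100.0, 200.0, 400.0]
toy_predicted = [110.0, 190.0, 400.0]
toy_metrics = regression_metrics(toy_actual, toy_predicted)

print(toy_metrics)
```

MSE, RMSE and MAE are in price units (squared for MSE), while MAPE is relative, so the same 10 USD miss counts for more on a 100 USD price than on a 200 USD one.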

### 12. Plot Actual vs Predicted Stock Price:

```python3
test_dates = df.index[train_size:]  # align dates with test predictions

plt.figure(figsize=(16, 6))
plt.plot(test_dates, y_actual, label='Actual Price', color='steelblue', linewidth=1.5)
plt.plot(test_dates, y_pred, label='Predicted Price', color='tomato', linewidth=1.5, linestyle='--')
plt.title('Google Stock Price: Actual vs Predicted (RNN)')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()
```

### 13. Forecast Next 30 Days:

```python3
n_future = 30  # number of future days to predict

# Seed the forecast with the last TIME_STEPS days of known data
future_input = data_scaled[-TIME_STEPS:].reshape(1, TIME_STEPS, 1)
future_predictions = []

for _ in range(n_future):
    pred = model.predict(future_input, verbose=0)
    future_predictions.append(pred[0, 0])
    # Slide the window: drop oldest day, append new prediction
    future_input = np.append(future_input[:, 1:, :], pred.reshape(1, 1, 1), axis=1)

# Inverse scale forecasted prices back to USD
future_prices = scaler.inverse_transform(np.array(future_predictions).reshape(-1, 1))

# Generate business day dates starting from the day after last known date
last_date = df.index[-1]
future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_future, freq='B')

plt.figure(figsize=(16, 6))
plt.plot(df.index[-120:], scaler.inverse_transform(data_scaled[-120:]),
         label='Historical', color='steelblue', linewidth=1.5)
plt.plot(future_dates, future_prices,
         label='30-Day Forecast', color='orange', linewidth=1.5)
plt.axvline(x=last_date, color='gray', linestyle='--', label='Forecast Start')
plt.title('Google Stock: 30-Day Future Price Forecast (RNN)')
plt.xlabel('Date')
plt.ylabel('Price (USD)')
plt.legend()
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

print(f"\nForecasted price range: {future_prices.min():.2f} USD - {future_prices.max():.2f} USD")
```
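
The sliding-window loop above is a recursive (autoregressive) forecast: each prediction is fed back in as the newest input. The sketch below isolates that mechanism with a stand-in predictor; `mean_of_window` is a hypothetical placeholder for `model.predict`, chosen only so the example runs without a trained model.

```python3
from collections import deque


def recursive_forecast(history, time_steps, n_future, predict_next):
    """Forecast n_future steps, feeding each prediction back into the window."""
    window = deque(history[-time_steps:], maxlen=time_steps)
    forecasts = []
    for _ in range(n_future):
        next_value = predict_next(list(window))
        forecasts.append(next_value)
        window.append(next_value)  # maxlen drops the oldest value automatically
    return forecasts


def mean_of_window(window):
    """Hypothetical stand-in for the trained model: predicts the window average."""
    return sum(window) / len(window)


toy_forecast = recursive_forecast([1.0, 2.0, 3.0, 4.0], time_steps=2,
                                  n_future=3, predict_next=mean_of_window)

print(toy_forecast)
```

After `time_steps` iterations the window consists entirely of predictions, so errors can compound; the later part of a 30-day forecast should be read with more caution than the first few days.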

---

## Miscellaneous

- [Dataset source](https://www.kaggle.com/datasets/henryshan/google-stock-price)

---
File diff suppressed because one or more lines are too long
@@ -8,6 +8,30 @@ This repository gathers comprehensive material for the SPPU Computer Engineering

### Notes

### Codes

1. [Code-1 (Linear Regression using Deep Neural Network)](Codes/Code-1.md)
2. [Code-2b (Classification using Deep Neural Network)](Codes/Code-2b.md)
3. [Code-3a (Convolutional Neural Network - Plant Diseases)](Codes/Code-3a.md)
4. [Code-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Codes/Code-3b.md)
5. [Code-4 (Recurrent Neural Network - Google Stock Price Dataset)](Codes/Code-4.md)

### Jupyter Notebooks

1. [Notebook-1 (Linear Regression using Deep Neural Network)](Notebooks/Notebook-1.ipynb)
2. [Notebook-2b (Classification using Deep Neural Network)](Notebooks/Notebook-2b.ipynb)
3. [Notebook-3a (Convolutional Neural Network - Plant Diseases)](Notebooks/Notebook-3a.ipynb)
4. [Notebook-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Notebooks/Notebook-3b.ipynb)
5. [Notebook-4 (Recurrent Neural Network - Google Stock Price Dataset)](Notebooks/Notebook-4.ipynb)

### Datasets

1. [Dataset for Practical-1 (Boston House Price)](Datasets/boston.csv)
2. [Dataset for Practical-2b (IMDB Reviews)](Datasets/IMDB%20Dataset.csv)
3. [Dataset for Practical-3b (MNIST Fashion)](Datasets/fashionmnist.zip)
4. [Dataset for Practical-4 (Google Stock Price)](Datasets/GOOG.csv)

### Assignments

- [Questions - Assignment 1 and 2](Assignments/DL%20-%20Assignments-1+2%20%28Questions%29.pdf)