chore: add links for codes, Jupyter notebooks and datasets in README.

add Jupyter notebook for practical 4; rnn @ google stock price.
add Jupyter notebook for practical 3b; cnn fashion dataset.
2026-05-03 23:24:10 +05:30 · 2026-05-03 23:21:31 +05:30 · 2026-05-03 23:21:14 +05:30 · 2026-05-03 23:20:59 +05:30 · 2026-05-03 23:20:37 +05:30 · 2026-05-03 23:20:19 +05:30
11 changed files with 4277 additions and 0 deletions
@@ -0,0 +1,186 @@
 # Practical-1 (Linear Regression using Deep Neural Network)
 Problem Statement: Linear regression using a Deep Neural Network: Implement the Boston housing price prediction problem as linear regression using a Deep Neural Network. Use the Boston House price prediction dataset.
 > [!NOTE]
 > Dataset available in [Datasets](../Datasets/boston.csv) directory.
 ---
 ## Prerequisites
 1. Install packages using `pip`: `pip install tensorflow keras pandas numpy scikit-learn matplotlib seaborn` (`tensorflow` requires Python 3.9 - 3.12)
 2. Copy the `boston.csv` dataset into the same directory as the Jupyter notebook.
 ## Steps
 1. Import Libraries
 2. Load Dataset
 3. Exploratory Data Analysis (EDA)
 4. Check for Missing Values
 5. Correlation Heatmap
 6. Separate Features and Target
 7. Split into Training and Testing Sets
 8. Feature Scaling (Standardization)
 9. Build the Neural Network Model
 10. Compile the Model
 11. Train the Model
 12. Evaluate the Model on Test Data
 13. Make Predictions
 14. Plot Training vs Validation Loss
 15. Plot Predicted vs Actual Prices
 ---
 ## Code
 ### 1. Import Libraries:
 ```python3
 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 import seaborn as sns
 from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import StandardScaler
 from keras import Input
 from keras.models import Sequential
 from keras.layers import Dense
 ```
 ### 2. Load Dataset:
 ```python3
 data = pd.read_csv('boston.csv')
 print(data.head())
 ```
 ### 3. Exploratory Data Analysis (EDA):
 ```python3
 print("Shape:", data.shape)          # number of rows and columns
 print("\nData Types:\n", data.dtypes)
 print("\nStatistical Summary:\n", data.describe())  # min, max, mean, std, etc.
 ```
 ### 4. Check for Missing Values:
 ```python3
 print("Missing values per column:\n", data.isnull().sum())
 # Drop rows with missing values (if any)
 data = data.dropna()
 print("\nShape after dropping nulls:", data.shape)
 ```
 ### 5. Correlation Heatmap:
 ```python3
 plt.figure(figsize=(12, 8))
 sns.heatmap(data.corr(), annot=True, fmt=".2f", cmap="coolwarm")  # show correlation between all feature pairs
 plt.title("Feature Correlation Heatmap")
 plt.tight_layout()
 plt.show()
 ```
 ### 6. Separate Features and Target:
 ```python3
 X = data.drop('MEDV', axis=1)   # all columns except house price
 y = data['MEDV']                 # target: median house price
 ```
 ### 7. Split into Training and Testing Sets:
 ```python3
 # 80% train, 20% test; random_state=42 ensures reproducible split
 X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
 ```
 ### 8. Feature Scaling (Standardization):
 ```python3
 scaler = StandardScaler()
 X_train = scaler.fit_transform(X_train)  # learn mean/std from train, then scale
 X_test = scaler.transform(X_test)        # apply same mean/std to test (no leakage)
 ```
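To make the "no leakage" comment concrete, here is a minimal NumPy sketch of what standardization does. The toy arrays `train` and `test` are made-up values for illustration, and `StandardScaler` itself is not used; the sketch only assumes that it scales with the population standard deviation (`ddof=0`).

```python
import numpy as np

# Toy data (hypothetical values): 4 training rows and 2 test rows, 2 features each
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
test = np.array([[2.5, 25.0], [5.0, 50.0]])

# Statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)  # population std (ddof=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training mean/std on the test rows

print("Train mean after scaling:", train_scaled.mean(axis=0))
print("Train std after scaling: ", train_scaled.std(axis=0))
print("Scaled test rows:\n", test_scaled)
```

The scaled test rows are not guaranteed to have zero mean or unit variance, which is expected: the test set is treated as unseen data.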
 ### 9. Build the Neural Network Model:
 ```python3
 model = Sequential()
 model.add(Input(shape=(X_train.shape[1],)))  # input shape = number of features
 model.add(Dense(64, activation='relu'))       # hidden layer 1: 64 neurons
 model.add(Dense(32, activation='relu'))       # hidden layer 2: 32 neurons
 model.add(Dense(1, activation='linear'))      # output layer: single value (house price)
 model.summary()
 ```
 ### 10. Compile the Model:
 ```python3
 # adam: adaptive optimizer; mse: standard regression loss; mae: human-readable error metric
 model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
 ```
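As a rough illustration of the two quantities named in the comment above, the following sketch computes MSE and MAE by hand. The arrays `y_true` and `y_pred` hold hypothetical prices, not results from this model.

```python
import numpy as np

# Hypothetical target and predicted prices (same units as MEDV)
y_true = np.array([24.0, 21.5, 34.5, 33.0])
y_pred = np.array([26.0, 20.5, 30.5, 33.0])

errors = y_pred - y_true
mse = np.mean(errors ** 2)      # squaring makes large misses dominate the loss
mae = np.mean(np.abs(errors))   # average miss, in the target's own units

print(f"MSE: {mse:.4f}")
print(f"MAE: {mae:.4f}")
```

The single error of 4 contributes 16 of the 21 total squared error here, which is why MSE is more sensitive to outliers than MAE.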
 ### 11. Train the Model:
 ```python3
 # validation_split=0.2 reserves 20% of training data to monitor val loss each epoch
 history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2)
 ```
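The Keras documentation states that `validation_split` takes the validation rows from the end of the arrays passed to `fit`, before any shuffling. A minimal sketch of that slicing, using a hypothetical 10-row array `X` rather than the housing data:

```python
import numpy as np

X = np.arange(10).reshape(10, 1)   # 10 hypothetical samples, labelled 0..9
validation_split = 0.2

# The last 20% of rows are held out; the rest are used for weight updates
split_at = int(len(X) * (1 - validation_split))
X_fit, X_val = X[:split_at], X[split_at:]

print("Rows used for fitting:   ", X_fit.ravel())
print("Rows used for validation:", X_val.ravel())
```

Because `train_test_split` already shuffled the rows in step 7, the tail of `X_train` is effectively a random sample here.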
 ### 12. Evaluate the Model on Test Data:
 ```python3
 loss, mae = model.evaluate(X_test, y_test)
 print(f"Test Loss (MSE): {loss:.4f}")
 print(f"Test Mean Absolute Error: {mae:.4f}")
 ```
 ### 13. Make Predictions:
 ```python3
 predictions = model.predict(X_test)
 print("First 5 Predicted Prices:", predictions[:5].flatten())
 print("First 5 Actual Prices:   ", y_test.values[:5])
 ```
 ### 14. Plot Training vs Validation Loss:
 ```python3
 plt.plot(history.history['loss'], label='Training Loss')
 plt.plot(history.history['val_loss'], label='Validation Loss')
 plt.title('Model Loss Over Epochs')
 plt.ylabel('Loss (MSE)')
 plt.xlabel('Epoch')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 15. Plot Predicted vs Actual Prices:
 ```python3
 plt.figure(figsize=(8, 6))
 plt.scatter(y_test, predictions, alpha=0.7)                        # each point = one test sample
 plt.plot([y_test.min(), y_test.max()],
         [y_test.min(), y_test.max()], 'r--', label='Ideal Fit')   # diagonal = perfect prediction
 plt.xlabel('Actual Price')
 plt.ylabel('Predicted Price')
 plt.title('Actual vs Predicted House Prices')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ---
 ## Miscellaneous
 - [Dataset source](https://www.kaggle.com/datasets/fedesoriano/the-boston-houseprice-data)
 ---
@@ -0,0 +1,202 @@
 # Practical-2b (Classification using Deep Neural Network - IMDB Dataset)
 Problem Statement: Binary classification using Deep Neural Networks. Example: Classify movie reviews into "positive" reviews and "negative" reviews, just based on the text content of the reviews. Use the IMDB dataset.
 > [!NOTE]
 > Dataset available in [Datasets](../Datasets/IMDB%20Dataset.csv) directory.
 ---
 ## Prerequisites
 1. Install packages using `pip`: `pip install tensorflow keras pandas numpy scikit-learn matplotlib seaborn` (`tensorflow` requires Python 3.9 - 3.12)
 2. Copy the `IMDB Dataset.csv` dataset into the same directory as the Jupyter notebook.
 ## Steps
 1. Import Libraries
 2. Load Dataset
 3. Exploratory Data Analysis (EDA)
 4. Data Cleaning - Strip HTML Tags
 5. Encode Labels and Separate Features
 6. Tokenize and Pad Text Sequences
 7. Split into Training and Testing Sets
 8. Build the Neural Network Model
 9. Compile the Model
 10. Train the Model
 11. Evaluate the Model on Test Data
 12. Plot Training vs Validation Accuracy
 13. Plot Training vs Validation Loss
 14. Confusion Matrix and Classification Report
 ---
 ## Code
 ### 1. Import Libraries:
 ```python3
 import re
 import pandas as pd
 import numpy as np
 import matplotlib.pyplot as plt
 import seaborn as sns
 from sklearn.model_selection import train_test_split
 from sklearn.preprocessing import LabelEncoder
 from sklearn.metrics import confusion_matrix, classification_report
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D
 from tensorflow.keras.preprocessing.text import Tokenizer
 from tensorflow.keras.preprocessing.sequence import pad_sequences
 ```
 ### 2. Load Dataset:
 ```python3
 data = pd.read_csv('IMDB Dataset.csv')
 print(data.head())
 ```
 ### 3. Exploratory Data Analysis (EDA):
 ```python3
 print("Shape:", data.shape)
 print("\nMissing Values:\n", data.isnull().sum())
 print("\nClass Distribution:\n", data['sentiment'].value_counts())
 # Visualize class distribution
 sns.countplot(x='sentiment', data=data)
 plt.title('Sentiment Class Distribution')
 plt.show()
 # Sample reviews
 print("\nSample positive review:\n", data[data['sentiment'] == 'positive']['review'].iloc[0][:300])
 print("\nSample negative review:\n", data[data['sentiment'] == 'negative']['review'].iloc[0][:300])
 ```
 ### 4. Data Cleaning - Strip HTML Tags:
 ```python3
 def clean_text(text):
    text = re.sub(r'<.*?>', '', text)   # remove HTML tags like <br />
    text = text.lower().strip()         # lowercase and trim whitespace
    return text
 data['review'] = data['review'].apply(clean_text)
 print("Sample cleaned review:\n", data['review'].iloc[0][:300])
 ```
 ### 5. Encode Labels and Separate Features:
 ```python3
 label_encoder = LabelEncoder()
 data['sentiment'] = label_encoder.fit_transform(data['sentiment'])  # positive=1, negative=0
 X = data['review'].values    # input: review text
 y = data['sentiment'].values # output: 0 or 1
 ```
 ### 6. Tokenize and Pad Text Sequences:
 ```python3
 vocab_size = 10000   # keep only top 10,000 most frequent words
 max_length = 200     # truncate/pad all reviews to 200 words
 tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>')  # <OOV> handles unknown words
 tokenizer.fit_on_texts(X)                        # build word index from all review text (fitted before the train/test split)
 sequences = tokenizer.texts_to_sequences(X)      # convert each word to its integer index
 padded_sequences = pad_sequences(sequences, maxlen=max_length,
                                 padding='post', truncating='post')  # pad/truncate to fixed length
 ```
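The tokenize-and-pad step can be pictured with a small pure-Python sketch. This is a simplified illustration, not the Keras implementation: the three toy reviews, the tiny `vocab_size`, and the helper `encode` are all hypothetical, and tie-breaking between equally frequent words may differ from Keras.

```python
from collections import Counter

texts = ["the movie was great", "the movie was bad", "great acting"]  # toy reviews
vocab_size = 6   # hypothetical small vocabulary for the demo
max_length = 5
PAD_INDEX, OOV_INDEX = 0, 1   # 0 is used for padding, 1 for out-of-vocabulary words

# Rank words by frequency; the most frequent word gets index 2
counts = Counter(word for text in texts for word in text.split())
word_index = {word: rank for rank, (word, _) in enumerate(counts.most_common(), start=2)}

def encode(text):
    # Unknown words, and words ranked at or beyond vocab_size, become OOV
    ids = [word_index.get(word, OOV_INDEX) for word in text.split()]
    ids = [i if i < vocab_size else OOV_INDEX for i in ids]
    ids = ids[:max_length]                                # truncate at the end ('post')
    return ids + [PAD_INDEX] * (max_length - len(ids))    # pad at the end ('post')

for text in texts:
    print(f"{text!r} -> {encode(text)}")
```

Every review ends up as a list of exactly `max_length` integers, which is what lets the reviews be stacked into one 2D array for the model.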
 ### 7. Split into Training and Testing Sets:
 ```python3
 X_train, X_test, y_train, y_test = train_test_split(padded_sequences, y, test_size=0.2, random_state=42)
 ```
 ### 8. Build the Neural Network Model:
 ```python3
 model = Sequential()
 model.add(Embedding(vocab_size, 16))        # maps each word index to a 16-dim vector
 model.add(GlobalAveragePooling1D())          # averages all word vectors into one vector
 model.add(Dense(24, activation='relu'))      # hidden layer: 24 neurons
 model.add(Dense(1, activation='sigmoid'))    # output: probability between 0 and 1 (binary)
 model.summary()
 ```
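To show what the first two layers compute, here is a NumPy sketch of an embedding lookup followed by averaging over the sequence axis. The random `embedding_matrix` stands in for learned weights, and the sizes are toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 6, 4    # toy sizes (the model above uses 10000 and 16)
embedding_matrix = rng.normal(size=(vocab_size, embed_dim))  # stand-in for learned weights

# Two padded integer sequences of length 5 (0 = padding)
padded = np.array([[2, 3, 4, 5, 0],
                   [5, 1, 0, 0, 0]])

word_vectors = embedding_matrix[padded]   # row lookup -> (batch, max_length, embed_dim)
pooled = word_vectors.mean(axis=1)        # average over the sequence -> (batch, embed_dim)

print("After embedding:", word_vectors.shape)
print("After pooling:  ", pooled.shape)
```

In this sketch the padded positions (index 0) are part of the average. As far as I can tell the model above behaves the same way, since the `Embedding` layer is created without `mask_zero=True`.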
 ### 9. Compile the Model:
 ```python3
 # binary_crossentropy: standard loss for binary classification; sigmoid output
 model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
 ```
 ### 10. Train the Model:
 ```python3
 history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
 ```
 ### 11. Evaluate the Model on Test Data:
 ```python3
 loss, accuracy = model.evaluate(X_test, y_test)
 print(f"Test Loss: {loss:.4f}")
 print(f"Test Accuracy: {accuracy*100:.2f}%")
 ```
 ### 12. Plot Training vs Validation Accuracy:
 ```python3
 plt.plot(history.history['accuracy'], label='Training Accuracy')
 plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
 plt.title('Model Accuracy Over Epochs')
 plt.ylabel('Accuracy')
 plt.xlabel('Epoch')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 13. Plot Training vs Validation Loss:
 ```python3
 plt.plot(history.history['loss'], label='Training Loss')
 plt.plot(history.history['val_loss'], label='Validation Loss')
 plt.title('Model Loss Over Epochs')
 plt.ylabel('Loss')
 plt.xlabel('Epoch')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 14. Confusion Matrix and Classification Report:
 ```python3
 y_pred = (model.predict(X_test) > 0.5).astype(int)  # threshold 0.5: prob > 0.5 = positive
 cm = confusion_matrix(y_test, y_pred)
 sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
 plt.title('Confusion Matrix')
 plt.ylabel('Actual')
 plt.xlabel('Predicted')
 plt.show()
 print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=['Negative', 'Positive']))
 ```
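For reference, the thresholding and the four confusion-matrix cells can be reproduced by hand. The probabilities and labels below are hypothetical, not model output.

```python
import numpy as np

probs = np.array([0.91, 0.12, 0.55, 0.49, 0.80, 0.30])  # hypothetical sigmoid outputs
y_true = np.array([1, 0, 0, 1, 1, 0])                    # hypothetical labels

y_hat = (probs > 0.5).astype(int)   # same 0.5 threshold as above

tn = int(np.sum((y_true == 0) & (y_hat == 0)))
fp = int(np.sum((y_true == 0) & (y_hat == 1)))
fn = int(np.sum((y_true == 1) & (y_hat == 0)))
tp = int(np.sum((y_true == 1) & (y_hat == 1)))

# Same layout as sklearn's confusion_matrix: rows = actual, columns = predicted
cm_manual = np.array([[tn, fp],
                      [fn, tp]])
precision = tp / (tp + fp)   # of the predicted positives, how many were right
recall = tp / (tp + fn)      # of the actual positives, how many were found

print(cm_manual)
print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")
```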
 ---
 ## Miscellaneous
 - [Dataset source](https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews)
 ---
@@ -0,0 +1,240 @@
 # Practical-3a (Convolutional Neural Network - Plant Diseases)
 Problem Statement: Convolutional Neural Network (CNN): Use any dataset of plant disease and design a plant disease detection system using CNN.
 > [!NOTE]
 > Download dataset directly from [source](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset/data).
 > It is not included in the `/Datasets` directory due to its large size.
 > The exact dataset doesn't really matter in this case; you just need to ensure the dataset directory contains `train` and `valid` sub-directories.
 > Refer to the above dataset to understand the required directory structure.
 ---
 ## Prerequisites
 1. Install packages using `pip`: `pip install tensorflow keras numpy opencv-python matplotlib seaborn scikit-learn` (`tensorflow` requires Python 3.9 - 3.12)
 2. Download and unzip the dataset into the same directory as the Jupyter notebook.
 3. Ensure your unzipped dataset has the required directory structure:
 ```shell
 New Plant Diseases Dataset(Augmented)/
 ├── train
 │   ├── Apple___Apple_scab
 │   ├── Apple___Black_rot
 │   ├── Apple___Cedar_apple_rust
 ├── valid
 │   ├── Apple___Apple_scab
 │   ├── Apple___Black_rot
 │   ├── Apple___Cedar_apple_rust
 ```
 ## Steps
 1. Import Libraries
 2. Load Dataset
 3. Exploratory Data Analysis (EDA)
 4. Split into Training and Testing Sets
 5. Build the CNN Model
 6. Compile the Model
 7. Train the Model
 8. Evaluate the Model on Test Data
 9. Plot Training vs Validation Accuracy
 10. Plot Training vs Validation Loss
 11. Confusion Matrix and Classification Report
 ---
 ## Code
 ### 1. Import Libraries:
 ```python3
 import os
 import numpy as np
 import cv2
 import matplotlib.pyplot as plt
 import seaborn as sns
 from sklearn.model_selection import train_test_split
 from sklearn.metrics import confusion_matrix, classification_report
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
 from tensorflow.keras.utils import to_categorical
 ```
 ### 2. Load Dataset:
 ```python3
 data = []
 labels = []
 # Path to dataset folder containing one subfolder per disease class
 path = './New Plant Diseases Dataset(Augmented)/train/'
 categories = sorted(os.listdir(path))  # sort for consistent label ordering
 # Map each category name to a numeric index
 label_dict = {category: idx for idx, category in enumerate(categories)}
 print("Classes found:", len(categories))
 max_per_class = 200  # cap images per class to avoid RAM overflow on large datasets
 for category in categories:
    folder = os.path.join(path, category)
    count = 0
    for img_name in os.listdir(folder):
        if count >= max_per_class:
            break
        img_path = os.path.join(folder, img_name)
        img_array = cv2.imread(img_path)
        if img_array is not None:                          # skip unreadable files
            img_array = cv2.resize(img_array, (64, 64))   # resize to fixed 64x64 pixels
            data.append(img_array)
            labels.append(label_dict[category])
            count += 1
 data = np.array(data) / 255.0   # normalize pixel values from [0,255] to [0,1]
 labels = np.array(labels)
 print("Dataset shape:", data.shape)
 print("Labels shape:", labels.shape)
 ```
 ### 3. Exploratory Data Analysis (EDA):
 ```python3
 print("Total images:", len(data))
 print("Image shape:", data[0].shape)
 print("Number of classes:", len(categories))
 # Class distribution bar chart
 class_counts = {cat: int((labels == idx).sum()) for cat, idx in label_dict.items()}
 plt.figure(figsize=(14, 5))
 plt.bar(class_counts.keys(), class_counts.values())
 plt.xticks(rotation=90)
 plt.title("Number of Images per Disease Class")
 plt.xlabel("Class")
 plt.ylabel("Count")
 plt.tight_layout()
 plt.show()
 # Sample images from first 5 classes
 plt.figure(figsize=(15, 3))
 for i, category in enumerate(categories[:5]):
    idx = np.where(labels == label_dict[category])[0][0]  # index of first image in class
    plt.subplot(1, 5, i + 1)
    plt.imshow(cv2.cvtColor((data[idx] * 255).astype(np.uint8), cv2.COLOR_BGR2RGB))
    plt.title(category[:15], fontsize=8)
    plt.axis('off')
 plt.suptitle("Sample Images per Class")
 plt.show()
 ```
 ### 4. Split into Training and Testing Sets:
 ```python3
 X_train, X_test, y_train, y_test = train_test_split(data, labels, test_size=0.2, random_state=42)
 num_classes = len(categories)
 # One-hot encode labels: e.g. class 2 of 5 → [0, 0, 1, 0, 0]
 y_train = to_categorical(y_train, num_classes)
 y_test  = to_categorical(y_test,  num_classes)
 print("Train samples:", X_train.shape[0])
 print("Test samples: ", X_test.shape[0])
 ```
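One-hot encoding can be sketched in plain NumPy by indexing an identity matrix. The `labels` array here is a hypothetical example, and this is an illustration of the idea rather than how `to_categorical` is implemented.

```python
import numpy as np

labels = np.array([2, 0, 4, 1])   # hypothetical class indices
num_classes = 5

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes)[labels]
recovered = one_hot.argmax(axis=1)  # argmax undoes the encoding

print(one_hot)
print("Recovered labels:", recovered)
```

The `argmax` round trip is the same operation used later to turn softmax outputs and one-hot test labels back into class indices.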
 ### 5. Build the CNN Model:
 ```python3
 model = Sequential()
 model.add(Input(shape=(64, 64, 3)))               # input: 64x64 color image, 3 channels (BGR order from cv2.imread)
 model.add(Conv2D(32, (3, 3), activation='relu'))  # 32 filters, detect basic features
 model.add(MaxPooling2D(2, 2))                      # downsample by 2x
 model.add(Conv2D(64, (3, 3), activation='relu'))  # 64 filters, detect complex features
 model.add(MaxPooling2D(2, 2))
 model.add(Flatten())                              # convert 2D feature maps to 1D vector
 model.add(Dense(128, activation='relu'))          # fully connected layer
 model.add(Dropout(0.5))                           # randomly drop 50% neurons to reduce overfitting
 model.add(Dense(num_classes, activation='softmax'))  # output: probability for each class
 model.summary()
 ```
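The shapes printed by `model.summary()` can be checked with a little arithmetic. The sketch below assumes the Keras defaults used above ('valid' padding for `Conv2D`, non-overlapping 2x2 pooling); the helper names `conv_out` and `pool_out` are made up for this illustration.

```python
def conv_out(size, kernel=3):
    # 'valid' padding: the kernel must fit entirely inside the image
    return size - kernel + 1

def pool_out(size, pool=2):
    # non-overlapping windows; a leftover row/column is dropped
    return size // pool

size = 64
size = conv_out(size)   # 62 after the first 3x3 convolution
size = pool_out(size)   # 31 after the first 2x2 pooling
size = conv_out(size)   # 29 after the second 3x3 convolution
size = pool_out(size)   # 14 after the second 2x2 pooling

flat_features = size * size * 64           # Flatten: 14 * 14 * 64 feature values
dense_params = flat_features * 128 + 128   # weights + biases of Dense(128)

print("Final feature map:", size, "x", size)
print("Flattened features:", flat_features)
print("Dense(128) parameters:", dense_params)
```

Most of the model's parameters sit in the first `Dense` layer, which is one reason the images are resized to a small 64x64 input.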
 ### 6. Compile the Model:
 ```python3
 # categorical_crossentropy: standard loss for multi-class classification with one-hot labels
 model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
 ```
 ### 7. Train the Model:
 ```python3
 history = model.fit(X_train, y_train, epochs=10, batch_size=32, validation_split=0.2)
 ```
 ### 8. Evaluate the Model on Test Data:
 ```python3
 loss, accuracy = model.evaluate(X_test, y_test)
 print(f"Test Loss: {loss:.4f}")
 print(f"Test Accuracy: {accuracy*100:.2f}%")
 ```
 ### 9. Plot Training vs Validation Accuracy:
 ```python3
 plt.plot(history.history['accuracy'], label='Training Accuracy')
 plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
 plt.title('CNN Model Accuracy Over Epochs')
 plt.xlabel('Epoch')
 plt.ylabel('Accuracy')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 10. Plot Training vs Validation Loss:
 ```python3
 plt.plot(history.history['loss'], label='Training Loss')
 plt.plot(history.history['val_loss'], label='Validation Loss')
 plt.title('CNN Model Loss Over Epochs')
 plt.xlabel('Epoch')
 plt.ylabel('Loss')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 11. Confusion Matrix and Classification Report:
 ```python3
 y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class index
 y_true = np.argmax(y_test, axis=1)                  # actual class index (from one-hot)
 cm = confusion_matrix(y_true, y_pred)
 plt.figure(figsize=(14, 12))
 sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=categories, yticklabels=categories)
 plt.title('Confusion Matrix')
 plt.ylabel('Actual')
 plt.xlabel('Predicted')
 plt.xticks(rotation=90)
 plt.tight_layout()
 plt.show()
 print("\nClassification Report:\n")
 print(classification_report(y_true, y_pred, target_names=categories))
 ```
 ---
 ## Miscellaneous
 - [Dataset source](https://www.kaggle.com/datasets/vipoooool/new-plant-diseases-dataset)
 ---
@@ -0,0 +1,233 @@
 # Practical-3b (Convolutional Neural Network - MNIST Fashion Dataset)
 Problem Statement: Convolutional Neural Network (CNN): Use MNIST Fashion Dataset and create a classifier to classify fashion clothing into categories.
 > [!NOTE]
 > Download the dataset directly from [source](https://www.kaggle.com/datasets/zalando-research/fashionmnist).
 > The dataset is also available in the [Datasets](../Datasets/fashionmnist.zip) directory.
 > In the code, the dataset is downloaded directly from Keras/TensorFlow in the 2nd step (Load Dataset).
 ---
 ## Prerequisites
 1. Install packages using `pip`: `pip install tensorflow keras numpy matplotlib seaborn scikit-learn` (`tensorflow` requires Python 3.9 - 3.12)
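 ## Steps
 1. Import Libraries
 2. Load Dataset
 3. Exploratory Data Analysis (EDA)
 4. Preprocess Data
 5. Build the CNN Model
 6. Compile the Model
 7. Train the Model
 8. Evaluate the Model on Test Data
 9. Plot Training vs Validation Accuracy
 10. Plot Training vs Validation Loss
 11. Confusion Matrix and Classification Report
 12. Visualize Sample Predictions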
 ---
 ## Code
 ### 1. Import Libraries:
 ```python3
 import numpy as np
 import matplotlib.pyplot as plt
 import seaborn as sns
 import tensorflow as tf
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import Input, Conv2D, AvgPool2D, GlobalAveragePooling2D, Dense
 from tensorflow.keras.utils import to_categorical
 from sklearn.metrics import confusion_matrix, classification_report
 ```
 ### 2. Load Dataset:
 ```python3
 # Fashion MNIST is built into Keras, downloads automatically on first run
 (X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
 '''
 import numpy as np
 import gzip
 import os
 def load_fashion_mnist(path):
    """Load Fashion MNIST from local .gz files (Kaggle Zalando format)."""
    files = {
        'X_train': 'train-images-idx3-ubyte.gz',
        'y_train': 'train-labels-idx1-ubyte.gz',
        'X_test':  't10k-images-idx3-ubyte.gz',
        'y_test':  't10k-labels-idx1-ubyte.gz',
    }
    with gzip.open(os.path.join(path, files['X_train'])) as f:
        X_train = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)
    with gzip.open(os.path.join(path, files['y_train'])) as f:
        y_train = np.frombuffer(f.read(), np.uint8, offset=8)
    with gzip.open(os.path.join(path, files['X_test'])) as f:
        X_test = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)
    with gzip.open(os.path.join(path, files['y_test'])) as f:
        y_test = np.frombuffer(f.read(), np.uint8, offset=8)
    return (X_train, y_train), (X_test, y_test)
 # Replace the Keras load line with:
 (X_train, y_train), (X_test, y_test) = load_fashion_mnist('./fashion-mnist/')
 '''
 print("Training set shape:", X_train.shape)   # (60000, 28, 28)
 print("Test set shape:    ", X_test.shape)    # (10000, 28, 28)
 print("Classes:", np.unique(y_train))
 ```
 ### 3. Exploratory Data Analysis (EDA):
 ```python3
 class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']
 # Class distribution
 unique, counts = np.unique(y_train, return_counts=True)
 plt.figure(figsize=(10, 4))
 plt.bar([class_names[i] for i in unique], counts)
 plt.xticks(rotation=45, ha='right')
 plt.title("Training Set Class Distribution")
 plt.ylabel("Count")
 plt.tight_layout()
 plt.show()
 # Sample images (one per class)
 plt.figure(figsize=(15, 3))
 for i, cls in enumerate(class_names):
    idx = np.where(y_train == i)[0][0]   # index of first image for this class
    plt.subplot(1, 10, i + 1)
    plt.imshow(X_train[idx], cmap='gray')
    plt.title(cls, fontsize=7)
    plt.axis('off')
 plt.suptitle("Sample Image per Class")
 plt.tight_layout()
 plt.show()
 ```
 ### 4. Preprocess Data:
 ```python3
 # Reshape to add channel dimension: (samples, 28, 28) -> (samples, 28, 28, 1)
 X_train = X_train.reshape(-1, 28, 28, 1).astype('float32') / 255.0  # normalize to [0,1]
 X_test  = X_test.reshape(-1, 28, 28, 1).astype('float32') / 255.0
 # One-hot encode labels: e.g. class 3 of 10 -> [0,0,0,1,0,0,0,0,0,0]
 y_train_cat = to_categorical(y_train, num_classes=10)
 y_test_cat  = to_categorical(y_test,  num_classes=10)
 print("X_train shape:", X_train.shape)
 print("y_train_cat shape:", y_train_cat.shape)
 ```
 ### 5. Build the CNN Model:
 ```python3
 model = Sequential()
 model.add(Input(shape=(28, 28, 1)))                                          # input: 28x28 grayscale image
 model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', padding='same')) # 64 filters, extract features
 model.add(AvgPool2D(pool_size=(2, 2)))                                        # downsample to 14x14
 model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', padding='same')) # 32 filters, refine features
 model.add(AvgPool2D(pool_size=(2, 2)))                                        # downsample to 7x7
 model.add(GlobalAveragePooling2D())                                           # average each feature map to single value
 model.add(Dense(10, activation='softmax'))                                    # output: probability for each of 10 classes
 model.summary()
 ```
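The parameter counts reported by `model.summary()` for this small network can be reproduced by hand. This is a sketch of the standard counting formulas; the helper names `conv_params` and `dense_params` are hypothetical.

```python
def conv_params(kernel, in_channels, filters):
    # each filter has kernel*kernel*in_channels weights plus one bias
    return kernel * kernel * in_channels * filters + filters

def dense_params(inputs, units):
    # one weight per input-unit pair plus one bias per unit
    return inputs * units + units

conv1 = conv_params(3, 1, 64)    # Conv2D(64) on a 1-channel image
conv2 = conv_params(3, 64, 32)   # Conv2D(32) on 64 feature maps
dense = dense_params(32, 10)     # GlobalAveragePooling2D leaves 32 values per image
total = conv1 + conv2 + dense    # pooling layers have no trainable parameters

print("Conv2D(64):", conv1)
print("Conv2D(32):", conv2)
print("Dense(10): ", dense)
print("Total:     ", total)
```

Using `GlobalAveragePooling2D` instead of `Flatten` keeps the final `Dense` layer tiny, since it sees 32 inputs rather than 7 * 7 * 32.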
 ### 6. Compile the Model:
 ```python3
 # categorical_crossentropy: standard loss for multi-class one-hot classification
 model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
 ```
 ### 7. Train the Model:
 ```python3
 # validation_data uses test set to monitor performance after each epoch
 history = model.fit(X_train, y_train_cat, epochs=10, validation_data=(X_test, y_test_cat))
 ```
 ### 8. Evaluate the Model on Test Data:
 ```python3
 loss, accuracy = model.evaluate(X_test, y_test_cat)
 print(f"Test Loss:     {loss:.4f}")
 print(f"Test Accuracy: {accuracy*100:.2f}%")
 ```
 ### 9. Plot Training vs Validation Accuracy:
 ```python3
 plt.plot(history.history['accuracy'], label='Training Accuracy')
 plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
 plt.title('Model Accuracy Over Epochs')
 plt.xlabel('Epoch')
 plt.ylabel('Accuracy')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 10. Plot Training vs Validation Loss:
 ```python3
 plt.plot(history.history['loss'], label='Training Loss')
 plt.plot(history.history['val_loss'], label='Validation Loss')
 plt.title('Model Loss Over Epochs')
 plt.xlabel('Epoch')
 plt.ylabel('Loss')
 plt.legend()
 plt.grid(True)
 plt.show()
 ```
 ### 11. Confusion Matrix and Classification Report:
 ```python3
 y_pred = np.argmax(model.predict(X_test), axis=1)  # predicted class index
 cm = confusion_matrix(y_test, y_pred)
 plt.figure(figsize=(10, 8))
 sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names, yticklabels=class_names)
 plt.title('Confusion Matrix')
 plt.ylabel('Actual')
 plt.xlabel('Predicted')
 plt.xticks(rotation=45, ha='right')
 plt.tight_layout()
 plt.show()
 print("\nClassification Report:\n")
 print(classification_report(y_test, y_pred, target_names=class_names))
 ```
 ### 12. Visualize Sample Predictions:
 ```python3
 # batch predict all test images, then pick 10 random ones to display
 all_preds = np.argmax(model.predict(X_test), axis=1)
 random_indices = np.random.choice(len(X_test), 10, replace=False)
 plt.figure(figsize=(20, 4))
 for i, idx in enumerate(random_indices):
    plt.subplot(2, 5, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray')
    predicted = class_names[all_preds[idx]]
    actual    = class_names[y_test[idx]]
    color = 'green' if predicted == actual else 'red'   # green = correct, red = wrong
    plt.title(f"P: {predicted}\nA: {actual}", fontsize=8, color=color)
    plt.axis('off')
 plt.suptitle("Sample Predictions (Green=Correct, Red=Wrong)")
 plt.tight_layout()
 plt.show()
 ```
 ---
 ## Miscellaneous
 - [Dataset source](https://www.kaggle.com/datasets/zalando-research/fashionmnist)
 ---
@@ -0,0 +1,276 @@
 # Practical-4 (Recurrent Neural Network - Google Stock Price Dataset)
 Problem Statement: Recurrent neural network (RNN): Use the Google stock prices dataset and design a time series analysis and prediction system using RNN.
 > [!NOTE]
 > Dataset available in [Datasets](../Datasets/GOOG.csv) directory.
 > In the code, the dataset is downloaded directly from Yahoo Finance (via `yfinance`) in the 2nd step (Load Dataset).
 ---
 ## Prerequisites
 1. Install packages using `pip`: `pip install tensorflow keras numpy pandas matplotlib scikit-learn yfinance` (`tensorflow` requires Python 3.9 - 3.12)
 ## Steps
 1. Import Libraries
 2. Load Dataset
 3. Exploratory Data Analysis (EDA)
 4. Visualize Closing Price Over Time
 5. Preprocess Data - Normalize Closing Price
 6. Create Sequences for RNN Input
 7. Build the RNN Model
 8. Train the Model
 9. Plot Training vs Validation Loss
 10. Make Predictions and Inverse Scale
 11. Evaluate the Model
 12. Plot Actual vs Predicted Stock Price
 13. Forecast Next 30 Days
 ---
 ## Code
 ### 1. Import Libraries:
 ```python3
 import numpy as np
 import pandas as pd
 import matplotlib.pyplot as plt
 import yfinance as yf
 from sklearn.preprocessing import MinMaxScaler
 from sklearn.metrics import mean_squared_error, mean_absolute_error
 from tensorflow.keras.models import Sequential
 from tensorflow.keras.layers import Input, Dense, SimpleRNN, Dropout
 from tensorflow.keras.callbacks import EarlyStopping
 ```
 ### 2. Load Dataset:
 ```python3
 # Downloads GOOGL stock data from Yahoo Finance for the given date range
 ticker = "GOOGL"
 df = yf.download(ticker, start="2018-01-01", end="2024-01-01")
 # --- Offline alternative (comment out the yf.download above and use this instead if using local dataset) ---
 # df = pd.read_csv('GOOGL.csv', index_col='Date', parse_dates=True)
 # df = df.sort_index()  # ensure chronological order
 # yfinance returns MultiIndex columns — flatten to single level
 df.columns = df.columns.get_level_values(0)
 print(f"Dataset Shape: {df.shape}")
 print(f"Date Range: {df.index.min().date()} to {df.index.max().date()}")
 print(df.head())
 ```
 ### 3. Exploratory Data Analysis (EDA):
 ```python3
 print("=== Dataset Info ===")
 print(df.info())
 print("\n=== Statistical Summary ===")
 print(df.describe())
 print("\n=== Missing Values ===")
 print(df.isnull().sum())
 ```
 ### 4. Visualize Closing Price Over Time:
 ```python3
 plt.figure(figsize=(16, 6))
 plt.plot(df.index, df['Close'], color='steelblue', linewidth=1.5, label='Close Price')
 plt.title('Google (GOOGL) Stock Closing Price (2018–2024)')
 plt.xlabel('Date')
 plt.ylabel('Price (USD)')
 plt.legend()
 plt.grid(alpha=0.3)
 plt.tight_layout()
 plt.show()
 ```
 ### 5. Preprocess Data - Normalize Closing Price:
 ```python3
 data = df[['Close']].values   # use only Close price for prediction
 scaler = MinMaxScaler(feature_range=(0, 1))
 data_scaled = scaler.fit_transform(data)  # scale values to [0, 1]
 print(f"Original data range: [{data.min():.2f}, {data.max():.2f}]")
 print(f"Scaled data range:   [{data_scaled.min():.4f}, {data_scaled.max():.4f}]")
 print(f"Total data points:   {len(data_scaled)}")
 ```
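Min-max scaling and its inverse are simple enough to write out by hand. The sketch below uses a hypothetical four-value price column and does not call `MinMaxScaler`.

```python
import numpy as np

prices = np.array([[100.0], [120.0], [90.0], [150.0]])  # hypothetical closing prices

lo, hi = prices.min(), prices.max()
scaled = (prices - lo) / (hi - lo)      # map [lo, hi] onto [0, 1]
restored = scaled * (hi - lo) + lo      # inverse transform back to USD

print("Scaled:  ", scaled.ravel())
print("Restored:", restored.ravel())
```

Note that the step above fits the scaler on the full series, so the test period's minimum and maximum influence the scaling. Fitting on the training portion only is a common alternative if that matters for your evaluation.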
 ### 6. Create Sequences for RNN Input:
 ```python3
 def create_sequences(data, time_steps=60):
    X, y = [], []
    for i in range(time_steps, len(data)):
        X.append(data[i - time_steps:i, 0])  # window of past `time_steps` days
        y.append(data[i, 0])                  # next day's price
    return np.array(X), np.array(y)
 TIME_STEPS = 60  # use past 60 days to predict the next day
 # 80/20 train-test split (manual, to preserve time order)
 train_size = int(len(data_scaled) * 0.80)
 train_data = data_scaled[:train_size]
 test_data  = data_scaled[train_size - TIME_STEPS:]  # overlap ensures test sequences start correctly
 X_train, y_train = create_sequences(train_data, TIME_STEPS)
 X_test,  y_test  = create_sequences(test_data,  TIME_STEPS)
 # Reshape to [samples, time_steps, features] — required format for RNN layers
 X_train = X_train.reshape((X_train.shape[0], X_train.shape[1], 1))
 X_test  = X_test.reshape((X_test.shape[0],   X_test.shape[1],  1))
 print(f"Training samples: {X_train.shape}")
 print(f"Testing samples:  {X_test.shape}")
 ```
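The windowing and the overlapping test slice are easier to follow on a tiny series. This sketch mirrors the logic above on the integers 0 to 9 with a window of 3; the helper `make_windows` and the toy sizes are made up for the illustration.

```python
import numpy as np

def make_windows(series, time_steps):
    X, y = [], []
    for i in range(time_steps, len(series)):
        X.append(series[i - time_steps:i])  # the previous `time_steps` values
        y.append(series[i])                  # the value to predict
    return np.array(X), np.array(y)

series = np.arange(10)   # toy "prices": 0, 1, ..., 9
time_steps = 3
train_size = 8

X_tr, y_tr = make_windows(series[:train_size], time_steps)
# Start the test slice `time_steps` early so the first test target is series[train_size]
X_te, y_te = make_windows(series[train_size - time_steps:], time_steps)

print("Train targets:", y_tr)
print("Test windows:\n", X_te)
print("Test targets: ", y_te)
```

Thanks to the overlap, the number of test targets equals `len(series) - train_size`, which is what allows `df.index[train_size:]` to line up with the predictions later.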
 ### 7. Build the RNN Model:
 ```python3
 model = Sequential()
 model.add(Input(shape=(TIME_STEPS, 1)))                               # input: sequence of 60 days
 model.add(SimpleRNN(units=64, return_sequences=True))                 # first RNN layer, passes output to next
 model.add(Dropout(0.2))                                               # drop 20% neurons to reduce overfitting
 model.add(SimpleRNN(units=64, return_sequences=False))                # second RNN layer, outputs single vector
 model.add(Dropout(0.2))
 model.add(Dense(units=32, activation='relu'))                         # fully connected layer
 model.add(Dense(units=1))                                             # output: single predicted price
 model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
 model.summary()
 ```
 ### 8. Train the Model:
 ```python3
 # EarlyStopping stops training if val_loss doesn't improve for 10 consecutive epochs
 early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
 history = model.fit(
    X_train, y_train,
    epochs=60,
    batch_size=32,
    validation_split=0.1,   # use 10% of training data for validation
    callbacks=[early_stop],
    verbose=1
 )
 print(f"\nTraining ran for {len(history.history['loss'])} epochs")
 ```
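
The epoch count printed above is the number of epochs that ran, not the epoch whose weights are kept. The following is a simplified sketch of patience-based early stopping in plain Python, not the Keras implementation itself. The function name and the loss values are made up for illustration.

```python
def early_stopping_epochs(val_losses, patience):
    """Return (epochs_run, best_epoch), both 1-based, for a list of validation losses."""
    best = float("inf")
    best_epoch = 0
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0  # improvement resets the counter
        else:
            wait += 1
            if wait >= patience:                     # no improvement for `patience` epochs
                return epoch, best_epoch
    return len(val_losses), best_epoch               # ran out of epochs without stopping

toy_val_losses = [0.90, 0.50, 0.40, 0.45, 0.41, 0.42, 0.43]
epochs_run, best_epoch = early_stopping_epochs(toy_val_losses, patience=3)
print(f"epochs run: {epochs_run}, best epoch: {best_epoch}")
```

With `restore_best_weights=True`, the model ends up with the weights from the best epoch, so the two numbers typically differ by the patience value when early stopping triggers.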
 ### 9. Plot Training vs Validation Loss:
 ```python3
 plt.plot(history.history['loss'], label='Train Loss', color='royalblue')
 plt.plot(history.history['val_loss'], label='Val Loss', color='tomato')
 plt.title('Training vs Validation Loss')
 plt.xlabel('Epoch')
 plt.ylabel('MSE Loss')
 plt.legend()
 plt.grid(alpha=0.3)
 plt.tight_layout()
 plt.show()
 ```
 ### 10. Make Predictions and Inverse Scale:
 ```python3
 y_pred_scaled = model.predict(X_test)
 # Convert scaled predictions back to original USD price range
 y_pred   = scaler.inverse_transform(y_pred_scaled)
 y_actual = scaler.inverse_transform(y_test.reshape(-1, 1))
 print(f"Sample predictions (first 5): {y_pred[:5].flatten().round(2)}")
 print(f"Actual values      (first 5): {y_actual[:5].flatten().round(2)}")
 ```
 ### 11. Evaluate the Model:
 ```python3
 mse  = mean_squared_error(y_actual, y_pred)
 rmse = np.sqrt(mse)
 mae  = mean_absolute_error(y_actual, y_pred)
 mape = np.mean(np.abs((y_actual - y_pred) / y_actual)) * 100  # mean absolute percentage error
 print("=" * 40)
 print("     MODEL EVALUATION METRICS")
 print("=" * 40)
 print(f"  MSE  : {mse:.4f}")
 print(f"  RMSE : {rmse:.4f}")
 print(f"  MAE  : {mae:.4f}")
 print(f"  MAPE : {mape:.2f}%")
 print("=" * 40)
 ```
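
To make the four metrics concrete, here is a small NumPy-only sketch on made-up prices (the three values are illustrative, not results from this practical). It also checks that both arrays have the same shape, since subtracting an `(n, 1)` array from an `(n,)` array would broadcast to `(n, n)` and silently distort every metric.

```python
import numpy as np

# Toy prices in USD, shaped (n, 1) like the inverse-transformed arrays above
actual = np.array([[100.0], [200.0], [400.0]])
pred   = np.array([[110.0], [190.0], [400.0]])

assert actual.shape == pred.shape  # guard against accidental broadcasting

err  = actual - pred
mse  = np.mean(err ** 2)                     # mean squared error
rmse = np.sqrt(mse)                          # same units as the price
mae  = np.mean(np.abs(err))                  # mean absolute error
mape = np.mean(np.abs(err / actual)) * 100   # percentage error, undefined if actual == 0

print(f"MSE  : {mse:.4f}")
print(f"RMSE : {rmse:.4f}")
print(f"MAE  : {mae:.4f}")
print(f"MAPE : {mape:.2f}%")
```

RMSE and MAE are in USD, while MAPE is scale-free, which makes it the easier figure to compare across stocks with different price levels.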
 ### 12. Plot Actual vs Predicted Stock Price:
 ```python3
 test_dates = df.index[train_size:]   # align dates with test predictions
 plt.figure(figsize=(16, 6))
 plt.plot(test_dates, y_actual, label='Actual Price',    color='steelblue', linewidth=1.5)
 plt.plot(test_dates, y_pred,   label='Predicted Price', color='tomato',    linewidth=1.5, linestyle='--')
 plt.title('Google Stock Price: Actual vs Predicted (RNN)')
 plt.xlabel('Date')
 plt.ylabel('Price (USD)')
 plt.legend()
 plt.grid(alpha=0.3)
 plt.tight_layout()
 plt.show()
 ```
 ### 13. Forecast Next 30 Days:
 ```python3
 n_future = 30  # number of future days to predict
 # Seed the forecast with the last TIME_STEPS days of known data
 future_input       = data_scaled[-TIME_STEPS:].reshape(1, TIME_STEPS, 1)
 future_predictions = []
 for _ in range(n_future):
    pred = model.predict(future_input, verbose=0)
    future_predictions.append(pred[0, 0])
    # Slide the window: drop oldest day, append new prediction
    future_input = np.append(future_input[:, 1:, :], pred.reshape(1, 1, 1), axis=1)
 # Inverse scale forecasted prices back to USD
 future_prices = scaler.inverse_transform(np.array(future_predictions).reshape(-1, 1))
 # Generate business day dates starting from the day after last known date
 last_date    = df.index[-1]
 future_dates = pd.date_range(start=last_date + pd.Timedelta(days=1), periods=n_future, freq='B')
 plt.figure(figsize=(16, 6))
 plt.plot(df.index[-120:], scaler.inverse_transform(data_scaled[-120:]),
         label='Historical', color='steelblue', linewidth=1.5)
 plt.plot(future_dates, future_prices,
         label='30-Day Forecast', color='orange', linewidth=1.5)
 plt.axvline(x=last_date, color='gray', linestyle='--', label='Forecast Start')
 plt.title('Google Stock — 30-Day Future Price Forecast (RNN)')
 plt.xlabel('Date')
 plt.ylabel('Price (USD)')
 plt.legend()
 plt.grid(alpha=0.3)
 plt.tight_layout()
 plt.show()
 print(f"\nForecasted price range: {future_prices.min():.2f} USD - {future_prices.max():.2f} USD")
 ```
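
The sliding-window update in the forecast loop can be traced by hand with a stand-in predictor. In this sketch `stub_predict` is a hypothetical replacement for `model.predict` that returns the window mean. The window length of 4 and the input values are made up.

```python
import numpy as np

def stub_predict(window):
    """Hypothetical stand-in for model.predict: (1, time_steps, 1) -> (1, 1)."""
    return window.mean(axis=1)

time_steps = 4
window = np.array([1.0, 2.0, 3.0, 4.0]).reshape(1, time_steps, 1)

forecast = []
for _ in range(3):
    pred = stub_predict(window)
    forecast.append(float(pred[0, 0]))
    # Drop the oldest value and append the new prediction as the latest step
    window = np.concatenate([window[:, 1:, :], pred.reshape(1, 1, 1)], axis=1)

print(forecast)
print(window[0, :, 0])
```

After the first step every window contains at least one predicted value, so errors feed back into later predictions. Forecasts far into the horizon should be read with that in mind.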
 ---
 ## Miscellaneous
 - [Dataset source](https://www.kaggle.com/datasets/henryshan/google-stock-price)
 ---
@@ -8,6 +8,30 @@ This repository gathers comprehensive material for the SPPU Computer Engineering
 ### Notes
 ### Codes
 1. [Code-1 (Linear Regression using Deep Neural Network)](Codes/Code-1.md)
 2. [Code-2b (Classification using Deep Neural Network)](Codes/Code-2b.md)
 3. [Code-3a (Convolutional Neural Network - Plant Diseases)](Codes/Code-3a.md)
 4. [Code-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Codes/Code-3b.md)
 5. [Code-4 (Recurrent Neural Network - Google Stock Price Dataset)](Codes/Code-4.md)
 ### Jupyter Notebooks
 1. [Notebook-1 (Linear Regression using Deep Neural Network)](Notebooks/Notebook-1.ipynb)
 2. [Notebook-2b (Classification using Deep Neural Network)](Notebooks/Notebook-2b.ipynb)
 3. [Notebook-3a (Convolutional Neural Network - Plant Diseases)](Notebooks/Notebook-3a.ipynb)
 4. [Notebook-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Notebooks/Notebook-3b.ipynb)
 5. [Notebook-4 (Recurrent Neural Network - Google Stock Price Dataset)](Notebooks/Notebook-4.ipynb)
 ### Datasets
 1. [Dataset for Practical-1 (Boston House Price)](Datasets/boston.csv)
 2. [Dataset for Practical-2b (IMDB Reviews)](Datasets/IMDB%20Dataset.csv)
 3. [Dataset for Practical-3b (MNIST Fashion)](Datasets/fashionmnist.zip)
 4. [Dataset for Practical-4 (Google Stock Price)](Datasets/GOOG.csv)
 ### Assignments
 - [Questions - Assignment 1 and 2](Assignments/DL%20-%20Assignments-1+2%20%28Questions%29.pdf)
Author	SHA1	Message	Date
notkshitij	451d0cd299	chore: add links for codes, Jupyter notebooks and datasets in README.	2026-05-03 23:24:10 +05:30
notkshitij	aac5138b33	add Jupyter notebook for practical 4; rnn @ google stock price.	2026-05-03 23:21:31 +05:30
notkshitij	879a46011a	add Jupyter notebook for practical 3b; cnn fashion dataset.	2026-05-03 23:21:14 +05:30
notkshitij	90086d4bfa	add Jupyter notebook for practical 3a; cnn plant diseases.	2026-05-03 23:20:59 +05:30
notkshitij	c3504f2743	add Jupyter notebook for practical 2b; classification.	2026-05-03 23:20:37 +05:30
notkshitij	a5914df5c7	add Jupyter notebook for practical 1; linear regression.	2026-05-03 23:20:19 +05:30
notkshitij	fd71b0ff24	add code blocks for practical 4; rnn @ google stock price.	2026-05-03 23:15:04 +05:30
notkshitij	811033a359	add code blocks for practical 3b; cnn fashion dataset.	2026-05-03 23:14:29 +05:30
notkshitij	0c674d02f9	add code blocks for practical 3a; cnn plant diseases.	2026-05-03 23:14:20 +05:30
notkshitij	b54e56669d	add code blocks for practical 2b; classification.	2026-05-03 23:13:22 +05:30
notkshitij	0c028cb9c4	add code blocks for practical 1; linear regression.	2026-05-03 23:11:39 +05:30