Compare commits

19 Commits

Author SHA1 Message Date
notkshitij f497756963 add answers for may-june 2025 + november-december 2025 pyqs for unit 6 (Reinforcement Learning) 2026-05-26 00:58:41 +05:30
notkshitij 0a597c5917 add answers for may-june 2025 + november-december 2025 pyqs for unit 5 (Deep Generative Models) 2026-05-26 00:51:39 +05:30
notkshitij 03baa08d8a add answers for may-june 2025 + november-december 2025 pyqs for unit 4 (Recurrent Neural Network) 2026-05-26 00:38:48 +05:30
notkshitij 2a71605050 add answers for may-june 2025 + november-december 2025 pyqs for unit 3 (Convolution Neural Network) 2026-05-26 00:38:21 +05:30
notkshitij 702fe6cda6 add may-june 2025 + november-december 2025 pyqs for end-sem. 2026-05-25 23:19:28 +05:30
notkshitij fc477ab15b add link for end-sem pyq answers in README. 2026-05-21 20:45:05 +05:30
notkshitij 628d9f171e add end-sem pyq answers for unit 6 (Reinforcement Learning) 2026-05-21 20:44:23 +05:30
notkshitij 6b81388d0e add end-sem pyq answers for unit 5 (Deep Generative Models) 2026-05-21 20:42:56 +05:30
notkshitij 7bb921a482 add end-sem pyq answers for unit 4 (Recurrent Neural Network) 2026-05-21 20:41:14 +05:30
notkshitij 6135de00dd add end-sem pyq answers for unit 3 (Convolution Neural Network) 2026-05-21 20:34:24 +05:30
notkshitij 8489b2d5aa add end-sem pyqs for DL (may june 2023, nov-dec 2023, may-june 2024) 2026-05-15 01:41:24 +05:30
notkshitij 3f6ece863d chore: add single line of comment above local data loading lines of code @ practical 3b. 2026-05-04 16:35:07 +05:30
notkshitij deb41dfdc8 fix: remove function for loading dataset locally and add basic lines of code to load csv instead for practical 3b notebook. 2026-05-04 16:27:37 +05:30
notkshitij 73f8c867b7 fix: remove function for loading dataset locally and add basic lines of code to load csv instead for practical 3b. 2026-05-04 16:27:06 +05:30
notkshitij a201069006 chore: add links for codes, Jupyter notebooks and datasets in README @ practical 2a. 2026-05-04 12:27:26 +05:30
notkshitij 740030163a fix: heading in notebook @ practical 2a. 2026-05-04 12:25:44 +05:30
notkshitij a4a86b25eb add dataset for practical 2a; letter recognition. 2026-05-04 12:23:56 +05:30
notkshitij ed595a29cb add code blocks for practical 2a; multiclass classification. 2026-05-04 12:23:10 +05:30
notkshitij 89350b362f add Jupyter notebook for practical 2a; multiclass classification. 2026-05-04 12:22:47 +05:30
16 changed files with 980 additions and 90 deletions
+1
@@ -3,3 +3,4 @@ Datasets/boston.csv filter=lfs diff=lfs merge=lfs -text
Datasets/IMDB[[:space:]]Dataset.csv filter=lfs diff=lfs merge=lfs -text
Datasets/fashionmnist.zip filter=lfs diff=lfs merge=lfs -text
Datasets/GOOG.csv filter=lfs diff=lfs merge=lfs -text
Datasets/letter+recognition.zip filter=lfs diff=lfs merge=lfs -text
+219
@@ -0,0 +1,219 @@
# Practical-2a (Classification using Deep Neural Network - OCR Letter Recognition)
Problem Statement: Multiclass classification using Deep Neural Networks. Example: use the OCR letter recognition dataset.
> [!NOTE]
> Dataset available in [Datasets](../Datasets/letter+recognition.zip) directory.
---
## Prerequisites
1. Install packages using `pip`: `pip install tensorflow keras numpy pandas matplotlib seaborn scikit-learn` (`tensorflow` requires Python 3.9 - 3.12)
2. Download and unzip the `letter+recognition.zip` dataset in the same directory as the Jupyter notebook.
## Steps
1. Import Libraries
2. Load Dataset
3. Exploratory Data Analysis (EDA)
4. Visualize Class Distribution
5. Encode Labels and Separate Features
6. Split into Training and Testing Sets
7. Feature Scaling (Standardization)
8. One-Hot Encode Labels
9. Build the Deep Neural Network Model
10. Compile the Model
11. Train the Model
12. Evaluate the Model on Test Data
13. Plot Training vs Validation Accuracy
14. Plot Training vs Validation Loss
15. Confusion Matrix and Classification Report
---
## Code
### 1. Import Libraries:
```python3
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.metrics import confusion_matrix, classification_report
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Dense, Dropout
from tensorflow.keras.utils import to_categorical
```
### 2. Load Dataset:
```python3
# Dataset has no header row — define column names manually based on UCI documentation
col_names = ['letter', 'x-box', 'y-box', 'width', 'high', 'onpix',
'x-bar', 'y-bar', 'x2bar', 'y2bar', 'xybar',
'x2ybr', 'xy2br', 'x-ege', 'xegvy', 'y-ege', 'yegvx']
data = pd.read_csv('./letter+recognition/letter-recognition.data', header=None, names=col_names)
print("Shape:", data.shape)
print(data.head())
```
### 3. Exploratory Data Analysis (EDA):
```python3
print("Data Types:\n", data.dtypes)
print("\nMissing Values:\n", data.isnull().sum())
print("\nStatistical Summary:\n", data.describe())
```
### 4. Visualize Class Distribution:
```python3
plt.figure(figsize=(14, 4))
data['letter'].value_counts().sort_index().plot(kind='bar')
plt.title("Number of Samples per Letter Class")
plt.xlabel("Letter")
plt.ylabel("Count")
plt.tight_layout()
plt.show()
```
### 5. Encode Labels and Separate Features:
```python3
label_encoder = LabelEncoder()
data['letter'] = label_encoder.fit_transform(data['letter']) # A=0, B=1, ..., Z=25
X = data.drop('letter', axis=1).values # 16 numeric features
y = data['letter'].values # class index 0-25
num_classes = len(label_encoder.classes_)
print("Classes:", label_encoder.classes_)
print("Number of classes:", num_classes)
```
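The integer mapping produced by `LabelEncoder` follows the sorted order of the distinct labels. A minimal NumPy-only sketch of the same idea, using hypothetical toy labels rather than the real `letter` column:

```python
import numpy as np

# Hypothetical toy labels standing in for the 'letter' column
letters = np.array(['C', 'A', 'B', 'A', 'C'])

# np.unique sorts the distinct labels; return_inverse gives each
# sample's index into that sorted array (the encoded class)
classes, encoded = np.unique(letters, return_inverse=True)

print(classes)   # distinct labels in sorted order
print(encoded)   # integer class index per sample

# Decoding is indexing back into the sorted class array
decoded = classes[encoded]
print(decoded)
```

This is only an illustration of the mapping; the notebook itself keeps using `LabelEncoder`, whose `inverse_transform` does the decoding step.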
### 6. Split into Training and Testing Sets:
```python3
# 80% train, 20% test; stratify=y keeps each letter's share of samples the same in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)
print("Train samples:", X_train.shape[0])
print("Test samples: ", X_test.shape[0])
```
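To see what stratification buys, here is a hedged NumPy-only sketch that splits each class separately and compares class shares afterwards. The toy labels and the per-class loop are assumptions for illustration, not how `train_test_split` is implemented internally:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical imbalanced labels: 80 samples of class 0, 20 of class 1
y_toy = np.array([0] * 80 + [1] * 20)

test_idx = []
for cls in np.unique(y_toy):
    cls_idx = np.flatnonzero(y_toy == cls)    # positions of this class
    rng.shuffle(cls_idx)                      # shuffle within the class
    n_test = int(round(0.2 * len(cls_idx)))   # 20% of each class goes to test
    test_idx.extend(cls_idx[:n_test])

test_idx = np.array(test_idx)
train_mask = np.ones(len(y_toy), dtype=bool)
train_mask[test_idx] = False

# Class shares are identical in both subsets
train_share = np.bincount(y_toy[train_mask]) / train_mask.sum()
test_share = np.bincount(y_toy[test_idx]) / len(test_idx)
print(train_share, test_share)
```

Without stratification, a random split of a small or imbalanced dataset can leave a class under-represented in the test set, which distorts per-class metrics.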
### 7. Feature Scaling (Standardization):
```python3
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train) # learn mean/std from train, then scale
X_test = scaler.transform(X_test) # apply same mean/std to test (no leakage)
```
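The scaling step amounts to subtracting the training mean and dividing by the training standard deviation, column by column. A small sketch on hypothetical toy matrices (the names `X_tr` and `X_te` are made up for this example):

```python
import numpy as np

# Hypothetical toy feature matrices with 2 features
X_tr = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_te = np.array([[2.0, 40.0]])

# Statistics come from the training data only
mean = X_tr.mean(axis=0)
std = X_tr.std(axis=0)   # population std (ddof=0), as StandardScaler uses

X_tr_scaled = (X_tr - mean) / std
X_te_scaled = (X_te - mean) / std   # reuse the training statistics on test data

print(X_tr_scaled.mean(axis=0), X_tr_scaled.std(axis=0))
print(X_te_scaled)
```

The scaled test values need not have zero mean or unit variance; that is expected, because the test set never contributes to the statistics.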
### 8. One-Hot Encode Labels:
```python3
# e.g. class 2 of 26 -> [0, 0, 1, 0, ..., 0]
y_train_cat = to_categorical(y_train, num_classes)
y_test_cat = to_categorical(y_test, num_classes)
```
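One-hot encoding can be pictured as picking rows from an identity matrix. The sketch below uses hypothetical toy labels and plain NumPy; it illustrates the idea behind `to_categorical` rather than its actual implementation:

```python
import numpy as np

# Hypothetical integer labels for a 3-class problem
y_toy = np.array([0, 2, 1, 2])
n_cls = 3

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(n_cls)[y_toy]
print(one_hot)

# argmax along the class axis recovers the original integer labels
recovered = one_hot.argmax(axis=1)
print(recovered)
```

The `argmax` round trip is the same operation used later to turn the model's softmax output back into a predicted class index.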
### 9. Build the Deep Neural Network Model:
```python3
model = Sequential()
model.add(Input(shape=(X_train.shape[1],))) # input: 16 features
model.add(Dense(256, activation='relu')) # hidden layer 1: 256 neurons
model.add(Dropout(0.3)) # drop 30% neurons to reduce overfitting
model.add(Dense(128, activation='relu')) # hidden layer 2: 128 neurons
model.add(Dropout(0.3))
model.add(Dense(64, activation='relu')) # hidden layer 3: 64 neurons
model.add(Dense(num_classes, activation='softmax')) # output: probability for each of 26 letters
model.summary()
```
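As a sanity check on `model.summary()`, the trainable parameter count of this stack of `Dense` layers can be worked out by hand: each layer has `inputs * units` weights plus `units` biases, and `Dropout` adds none. A short sketch, assuming 16 input features and 26 output classes as in the steps above:

```python
# Layer widths: input features, three hidden layers, output classes
layer_sizes = [16, 256, 128, 64, 26]

# Each Dense layer has (inputs * units) weights plus one bias per unit
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

print(params_per_layer)  # [4352, 32896, 8256, 1690]
print(total_params)      # 47194
```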
### 10. Compile the Model:
```python3
# categorical_crossentropy: standard loss for multi-class one-hot classification
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
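For intuition about the loss, here is a hedged NumPy sketch of softmax followed by categorical cross-entropy on hypothetical logits. The helper names and the `eps` clipping value are assumptions for this example; Keras handles the computation internally:

```python
import numpy as np

def softmax(logits):
    # Subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def categorical_crossentropy(y_true, y_prob, eps=1e-7):
    # Clip to avoid log(0), then average -log(p of the true class) over samples
    y_prob = np.clip(y_prob, eps, 1.0)
    return -(y_true * np.log(y_prob)).sum(axis=1).mean()

# Hypothetical raw scores for 2 samples and 3 classes
logits = np.array([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

probs = softmax(logits)
loss = categorical_crossentropy(y_true, probs)
print(probs.sum(axis=1), loss)

# A model that predicts uniformly over k classes has loss log(k)
uniform_loss = categorical_crossentropy(y_true, np.full((2, 3), 1 / 3))
print(uniform_loss, np.log(3))
```

By the same reasoning, an untrained 26-class model should start near a loss of `log(26)`, roughly 3.26, which is a useful reference point when reading the first epoch's output.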
### 11. Train the Model:
```python3
history = model.fit(
    X_train, y_train_cat,
    epochs=50,
    batch_size=32,
    validation_split=0.2 # use 20% of training data to monitor val loss each epoch
)
```
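The sample counts behind these settings can be worked out ahead of time. The sketch below assumes the UCI file contains 20,000 rows (an assumption; check `data.shape` from step 2) and uses integer arithmetic for the two 20% splits:

```python
import math

n_total = 20000                      # assumption: rows in letter-recognition.data
n_test = n_total * 20 // 100         # 20% held out by train_test_split
n_train_full = n_total - n_test

n_val = n_train_full * 20 // 100     # validation_split=0.2 inside model.fit
n_fit = n_train_full - n_val         # samples actually used for weight updates

steps_per_epoch = math.ceil(n_fit / 32)   # batches per epoch with batch_size=32

print(n_test, n_train_full, n_val, n_fit, steps_per_epoch)
# 4000 16000 3200 12800 400
```

Note that Keras takes the validation samples from the end of the arrays passed to `fit`, before shuffling; since `train_test_split` already shuffled the data, that tail slice is still a random sample here.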
### 12. Evaluate the Model on Test Data:
```python3
loss, accuracy = model.evaluate(X_test, y_test_cat)
print(f"Test Loss: {loss:.4f}")
print(f"Test Accuracy: {accuracy*100:.2f}%")
```
### 13. Plot Training vs Validation Accuracy:
```python3
plt.plot(history.history['accuracy'], label='Training Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.title('Model Accuracy Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend()
plt.grid(True)
plt.show()
```
### 14. Plot Training vs Validation Loss:
```python3
plt.plot(history.history['loss'], label='Training Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title('Model Loss Over Epochs')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.grid(True)
plt.show()
```
### 15. Confusion Matrix and Classification Report:
```python3
y_pred = np.argmax(model.predict(X_test), axis=1) # predicted class index
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(16, 14))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=label_encoder.classes_,
            yticklabels=label_encoder.classes_)
plt.title('Confusion Matrix')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.tight_layout()
plt.show()
print("\nClassification Report:\n")
print(classification_report(y_test, y_pred, target_names=label_encoder.classes_))
```
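The numbers in the classification report come straight from the confusion matrix. A minimal NumPy sketch on hypothetical labels for a 3-class problem shows how per-class precision and recall are read off it:

```python
import numpy as np

# Hypothetical true and predicted class indices
y_true = np.array([0, 0, 1, 1, 2, 2])
y_hat = np.array([0, 1, 1, 1, 2, 0])
n_cls = 3

# Rows are actual classes, columns are predicted classes
cm = np.zeros((n_cls, n_cls), dtype=int)
np.add.at(cm, (y_true, y_hat), 1)
print(cm)

tp = np.diag(cm)                  # correct predictions per class
precision = tp / cm.sum(axis=0)   # column sums: everything predicted as the class
recall = tp / cm.sum(axis=1)      # row sums: everything that truly is the class
print(precision, recall)
```

In the 26-letter heatmap, large off-diagonal cells point to letter pairs the model confuses, and those same cells are what pull a letter's precision or recall down.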
---
## Miscellaneous
- [Dataset source](https://archive.ics.uci.edu/ml/datasets/letter%2Brecognition)
---
+8 -28
@@ -38,34 +38,14 @@ from sklearn.metrics import confusion_matrix, classification_report
# Fashion MNIST is built into Keras, downloads automatically on first run
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
'''
import numpy as np
import gzip
import os
def load_fashion_mnist(path):
"""Load Fashion MNIST from local .gz files (Kaggle Zalando format)."""
files = {
'X_train': 'train-images-idx3-ubyte.gz',
'y_train': 'train-labels-idx1-ubyte.gz',
'X_test': 't10k-images-idx3-ubyte.gz',
'y_test': 't10k-labels-idx1-ubyte.gz',
}
with gzip.open(os.path.join(path, files['X_train'])) as f:
X_train = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)
with gzip.open(os.path.join(path, files['y_train'])) as f:
y_train = np.frombuffer(f.read(), np.uint8, offset=8)
with gzip.open(os.path.join(path, files['X_test'])) as f:
X_test = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)
with gzip.open(os.path.join(path, files['y_test'])) as f:
y_test = np.frombuffer(f.read(), np.uint8, offset=8)
return (X_train, y_train), (X_test, y_test)
# Replace the Keras load line with:
(X_train, y_train), (X_test, y_test) = load_fashion_mnist('./fashion-mnist/')
'''
# --- Offline alternative (comment out tf.keras line above and use this instead) ---
# import pandas as pd
# train_df = pd.read_csv('fashion-mnist_train.csv')
# test_df = pd.read_csv('fashion-mnist_test.csv')
# y_train = train_df['label'].values
# y_test = test_df['label'].values
# X_train = train_df.drop('label', axis=1).values.reshape(-1, 28, 28) # unflatten pixels to 28x28
# X_test = test_df.drop('label', axis=1).values.reshape(-1, 28, 28)
print("Training set shape:", X_train.shape) # (60000, 28, 28)
print("Test set shape: ", X_test.shape) # (10000, 28, 28)
Binary file not shown.
File diff suppressed because one or more lines are too long
+4 -51
@@ -51,58 +51,11 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": null,
"id": "859cbc0f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Training set shape: (60000, 28, 28)\n",
"Test set shape: (10000, 28, 28)\n",
"Classes: [0 1 2 3 4 5 6 7 8 9]\n"
]
}
],
"source": [
"# 2. Load Dataset\n",
"# Fashion MNIST is built into Keras, downloads automatically on first run\n",
"(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n",
"\n",
"'''\n",
"import numpy as np\n",
"import gzip\n",
"import os\n",
"\n",
"def load_fashion_mnist(path):\n",
" \"\"\"Load Fashion MNIST from local .gz files (Kaggle Zalando format).\"\"\"\n",
" files = {\n",
" 'X_train': 'train-images-idx3-ubyte.gz',\n",
" 'y_train': 'train-labels-idx1-ubyte.gz',\n",
" 'X_test': 't10k-images-idx3-ubyte.gz',\n",
" 'y_test': 't10k-labels-idx1-ubyte.gz',\n",
" }\n",
"\n",
" with gzip.open(os.path.join(path, files['X_train'])) as f:\n",
" X_train = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)\n",
" with gzip.open(os.path.join(path, files['y_train'])) as f:\n",
" y_train = np.frombuffer(f.read(), np.uint8, offset=8)\n",
" with gzip.open(os.path.join(path, files['X_test'])) as f:\n",
" X_test = np.frombuffer(f.read(), np.uint8, offset=16).reshape(-1, 28, 28)\n",
" with gzip.open(os.path.join(path, files['y_test'])) as f:\n",
" y_test = np.frombuffer(f.read(), np.uint8, offset=8)\n",
"\n",
" return (X_train, y_train), (X_test, y_test)\n",
"\n",
"# Replace the Keras load line with:\n",
"(X_train, y_train), (X_test, y_test) = load_fashion_mnist('./fashion-mnist/')\n",
"'''\n",
"\n",
"print(\"Training set shape:\", X_train.shape) # (60000, 28, 28)\n",
"print(\"Test set shape: \", X_test.shape) # (10000, 28, 28)\n",
"print(\"Classes:\", np.unique(y_train))"
]
"outputs": [],
"source": "# 2. Load Dataset\n# Fashion MNIST is built into Keras — downloads automatically on first run\n(X_train, y_train), (X_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()\n\n# --- Offline alternative (comment out tf.keras line above and use this instead) ---\n# import pandas as pd\n# train_df = pd.read_csv('fashion-mnist_train.csv')\n# test_df = pd.read_csv('fashion-mnist_test.csv')\n# y_train = train_df['label'].values\n# y_test = test_df['label'].values\n# X_train = train_df.drop('label', axis=1).values.reshape(-1, 28, 28) # unflatten pixels to 28x28\n# X_test = test_df.drop('label', axis=1).values.reshape(-1, 28, 28)\n\nprint(\"Training set shape:\", X_train.shape) # (60000, 28, 28)\nprint(\"Test set shape: \", X_test.shape) # (10000, 28, 28)\nprint(\"Classes:\", np.unique(y_train))"
},
{
"cell_type": "code",
@@ -597,4 +550,4 @@
},
"nbformat": 4,
"nbformat_minor": 5
}
}
+16 -11
@@ -11,26 +11,29 @@ This repository gathers comprehensive material for the SPPU Computer Engineering
### Codes
1. [Code-1 (Linear Regression using Deep Neural Network)](Codes/Code-1.md)
2. [Code-2b (Classification using Deep Neural Network)](Codes/Code-2b.md)
3. [Code-3a (Convolutional Neural Network - Plant Diseases)](Codes/Code-3a.md)
4. [Code-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Codes/Code-3b.md)
5. [Code-4 (Recurrent Neural Network - Google Stock Price Dataset)](Codes/Code-4.md)
2. [Code-2a (Classification using Deep Neural Network - OCR Letter Recognition)](Codes/Code-2a.md)
3. [Code-2b (Classification using Deep Neural Network)](Codes/Code-2b.md)
4. [Code-3a (Convolutional Neural Network - Plant Diseases)](Codes/Code-3a.md)
5. [Code-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Codes/Code-3b.md)
6. [Code-4 (Recurrent Neural Network - Google Stock Price Dataset)](Codes/Code-4.md)
### Jupyter Notebooks
1. [Notebook-1 (Linear Regression using Deep Neural Network)](Notebooks/Notebook-1.ipynb)
2. [Notebook-2b (Classification using Deep Neural Network)](Notebooks/Notebook-2b.ipynb)
3. [Notebook-3a (Convolutional Neural Network - Plant Diseases)](Notebooks/Notebook-3a.ipynb)
4. [Notebook-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Notebooks/Notebook-3b.ipynb)
5. [Notebook-4 (Recurrent Neural Network - Google Stock Price Dataset)](Notebooks/Notebook-4.ipynb)
2. [Notebook-2a (Classification using Deep Neural Network - OCR Letter Recognition)](Notebooks/Notebook-2a.ipynb)
3. [Notebook-2b (Classification using Deep Neural Network)](Notebooks/Notebook-2b.ipynb)
4. [Notebook-3a (Convolutional Neural Network - Plant Diseases)](Notebooks/Notebook-3a.ipynb)
5. [Notebook-3b (Convolutional Neural Network - MNIST Fashion Dataset)](Notebooks/Notebook-3b.ipynb)
6. [Notebook-4 (Recurrent Neural Network - Google Stock Price Dataset)](Notebooks/Notebook-4.ipynb)
### Datasets
1. [Dataset for Practical-1 (Boston House Price)](Datasets/boston.csv)
2. [Dataset for Practical-2b (IMDB Reviews)](Datasets/IMDB%20Dataset.csv)
3. [Dataset for Practical-3b (MNIST Fashion)](Datasets/fashionmnist.zip)
4. [Dataset for Practical-4 (Google Stock Price)](Datasets/GOOG.csv)
2. [Dataset for Practical-2a (Letter Recognition)](Datasets/letter+recognition.zip)
3. [Dataset for Practical-2b (IMDB Reviews)](Datasets/IMDB%20Dataset.csv)
4. [Dataset for Practical-3b (MNIST Fashion)](Datasets/fashionmnist.zip)
5. [Dataset for Practical-4 (Google Stock Price)](Datasets/GOOG.csv)
### Assignments
@@ -56,6 +59,8 @@ This repository gathers comprehensive material for the SPPU Computer Engineering
### [IN-SEM PYQ Answers](Notes/IN-SEM%20PYQ%20Answers/)
### [END-SEM PYQ Answers](Notes/END-SEM%20PYQ%20Answers/)
### [Question Bank](DL%20-%20Question%20Bank.pdf)
---