# Practical-1 (Linear Regression using Deep Neural Network) Problem Statement: Linear regression by using Deep Neural network: Implement Boston housing price prediction problem by Linear regression using Deep Neural network. Use Boston House price prediction dataset. > [!NOTE] > Dataset available in [Datasets](../Datasets/boston.csv) directory. --- ## Pre-requisities 1. Install packages using `pip`: `pip install tensorflow keras pandas numpy scikit-learn matplotlib seaborn` (`tensorflow` requires Python 3.9 - 3.12) 2. Copy the `boston.csv` dataset in the same directory as the Jupyter notebook. ## Steps 1. Import Libraries 2. Load Dataset 3. Exploratory Data Analysis (EDA) 4. Check for Missing Values 5. Correlation Heatmap 6. Separate Features and Target 7. Split into Training and Testing Sets 8. Feature Scaling (Standardization) 9. Build the Neural Network Model 10. Compile the Model 11. Train the Model 12. Evaluate the Model on Test Data 13. Make Predictions 14. Plot Training vs Validation Loss 15. Plot Predicted vs Actual Prices --- ## Code ### 1. Import Libraries: ```python3 import pandas as pd import numpy as np import matplotlib.pyplot as plt import seaborn as sns from sklearn.model_selection import train_test_split from sklearn.preprocessing import StandardScaler from keras import Input from keras.models import Sequential from keras.layers import Dense ``` ### 2. Load Dataset: ```python3 data = pd.read_csv('boston.csv') print(data.head()) ``` ### 3. Exploratory Data Analysis (EDA): ```python3 print("Shape:", data.shape) # number of rows and columns print("\nData Types:\n", data.dtypes) print("\nStatistical Summary:\n", data.describe()) # min, max, mean, std, etc. ``` ### 4. Check for Missing Values: ```python3 print("Missing values per column:\n", data.isnull().sum()) # Drop rows with missing values (if any) data = data.dropna() print("\nShape after dropping nulls:", data.shape) ``` ### 5. Correlation Heatmap: ```python3 plt.figure(figsize=(12, 8)) sns.heatmap(data.corr(), annot=True, fmt=".2f", cmap="coolwarm") # show correlation between all feature pairs plt.title("Feature Correlation Heatmap") plt.tight_layout() plt.show() ``` ### 6. Separate Features and Target: ```python3 X = data.drop('MEDV', axis=1) # all columns except house price y = data['MEDV'] # target: median house price ``` ### 7. Split into Training and Testing Sets: ```python3 # 80% train, 20% test; random_state=42 ensures reproducible split X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42) ``` ### 8. Feature Scaling (Standardization): ```python3 scaler = StandardScaler() X_train = scaler.fit_transform(X_train) # learn mean/std from train, then scale X_test = scaler.transform(X_test) # apply same mean/std to test (no leakage) ``` ### 9. Build the Neural Network Model: ```python3 model = Sequential() model.add(Input(shape=(X_train.shape[1],))) # input shape = number of features model.add(Dense(64, activation='relu')) # hidden layer 1: 64 neurons model.add(Dense(32, activation='relu')) # hidden layer 2: 32 neurons model.add(Dense(1, activation='linear')) # output layer: single value (house price) model.summary() ``` ### 10. Compile the Model: ```python3 # adam: adaptive optimizer; mse: standard regression loss; mae: human-readable error metric model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae']) ``` ### 11. Train the Model: ```python3 # validation_split=0.2 reserves 20% of training data to monitor val loss each epoch history = model.fit(X_train, y_train, epochs=100, batch_size=32, validation_split=0.2) ``` ### 12. Evaluate the Model on Test Data: ```python3 loss, mae = model.evaluate(X_test, y_test) print(f"Test Loss (MSE): {loss:.4f}") print(f"Test Mean Absolute Error: {mae:.4f}") ``` ### 13. Make Predictions: ```python3 predictions = model.predict(X_test) print("First 5 Predicted Prices:", predictions[:5].flatten()) print("First 5 Actual Prices: ", y_test.values[:5]) ``` ### 14. Plot Training vs Validation Loss: ```python3 plt.plot(history.history['loss'], label='Training Loss') plt.plot(history.history['val_loss'], label='Validation Loss') plt.title('Model Loss Over Epochs') plt.ylabel('Loss (MSE)') plt.xlabel('Epoch') plt.legend() plt.grid(True) plt.show() ``` ### 15. Plot Predicted vs Actual Prices: ```python3 plt.figure(figsize=(8, 6)) plt.scatter(y_test, predictions, alpha=0.7) # each point = one test sample plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--', label='Ideal Fit') # diagonal = perfect prediction plt.xlabel('Actual Price') plt.ylabel('Predicted Price') plt.title('Actual vs Predicted House Prices') plt.legend() plt.grid(True) plt.show() ``` --- ## Miscellaneous - [Dataset source](https://www.kaggle.com/datasets/fedesoriano/the-boston-houseprice-data) ---