sppu-te-comp-content/DataScienceAndBigDataAnalytics

notkshitij 76dc1de8db

Added codes, datasets and Jupyter notebooks directory.

2025-06-11 13:48:53 +05:30

A1 - Data Wrangling-1

✅ Tested and working as intended.

Pre-requisites

Install required libraries: pandas & numpy

pip install pandas numpy

Save the dataset iris.csv in the same directory as this Jupyter notebook.

Code blocks

Import libraries:

import pandas as pd
import numpy as np

Load the dataset from a CSV file into a pandas DataFrame:

df = pd.read_csv('iris.csv')
df.describe()  # Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns

Print first and last 5 rows:

print("First 5 rows:\n", df.head())
print("Last 5 rows:\n", df.tail())

Flag duplicated rows (True for each row that repeats an earlier one):

df.duplicated()
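
Since df.duplicated() returns one True/False flag per row, a count or a de-duplicated copy is often more useful than the raw flags. A minimal sketch on a hypothetical three-row frame (the toy data is an assumption, not part of iris.csv):

```python
import pandas as pd

# Hypothetical toy data; the third row repeats the second one.
toy = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 4.9],
    "variety": ["Setosa", "Setosa", "Setosa"],
})

dup_mask = toy.duplicated()          # True for each repeat of an earlier row
n_duplicates = int(dup_mask.sum())   # each True counts as 1
deduped = toy.drop_duplicates()      # new frame keeping the first occurrence of each row

print("Duplicate rows:", n_duplicates)
print(deduped)
```

The same two calls should work unchanged on df once iris.csv is loaded.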

Check every cell for null values (True if null, False otherwise):

df.isnull()

Print summary of DataFrame:

df.info()

Print shape as a (rows, columns) tuple:

df.shape

Check the sepal.length column for null values (True if null, False otherwise):

df["sepal.length"].isnull()

Drop the petal.length column:

y = df.drop(["petal.length"], axis=1)  # axis=1 drops a column; axis=0 would drop a row by index label
print(y)
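
Note that drop returns a new DataFrame and leaves the original untouched, which is why the result is stored in a separate variable. A small sketch with hypothetical data illustrating this:

```python
import pandas as pd

# Hypothetical two-row frame using the same column names as iris.csv.
toy = pd.DataFrame({
    "sepal.length": [5.1, 4.9],
    "petal.length": [1.4, 1.4],
    "variety": ["Setosa", "Setosa"],
})

# columns=[...] is equivalent to passing the labels with axis=1.
without_petal = toy.drop(columns=["petal.length"])

print(list(without_petal.columns))  # the returned copy lacks petal.length
print(list(toy.columns))            # the original frame still has it
```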

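In the variety column, replace Setosa with 0 and Virginica with 1 (other values, such as Versicolor, are left unchanged):

# Assign the result back instead of calling replace(..., inplace=True) on a column selection,
# which newer pandas versions warn about and which may not modify df at all.
df['variety'] = df['variety'].replace(['Setosa', 'Virginica'], [0, 1])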
print(df)
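
A minimal sketch of the same replacement on a hypothetical three-row frame, using a dictionary instead of two parallel lists. It shows that values not listed in the mapping are kept as they are. The toy data and the explicit object dtype are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical data; object dtype lets the column hold both strings and integers.
toy = pd.DataFrame({
    "variety": pd.Series(["Setosa", "Versicolor", "Virginica"], dtype=object)
})

# Map old value -> new value, then assign the result back to the column.
toy["variety"] = toy["variety"].replace({"Setosa": 0, "Virginica": 1})

print(toy["variety"].tolist())
```

If every class should be encoded, the remaining label can be added to the dictionary in the same way.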

Print the number of null values in each column:

df.isnull().sum()
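
To illustrate what the per-column count looks like when values are actually missing, here is a short sketch on a hypothetical frame with two gaps (the data is made up for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value in each column.
toy = pd.DataFrame({
    "sepal.length": [5.1, np.nan, 4.7],
    "variety": ["Setosa", None, "Setosa"],
})

null_counts = toy.isnull().sum()       # Series: column name -> number of nulls
total_nulls = int(null_counts.sum())   # nulls across the whole frame

print(null_counts)
print("Total nulls:", total_nulls)
```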