A1 - Data Wrangling-1

Tested and working as intended.


Pre-requisites

  • Install required libraries: pandas & numpy
pip install pandas numpy
  • Save the dataset iris.csv in the same directory as this Jupyter notebook.
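Both prerequisites can be checked before running the notebook. This is a minimal sketch, not part of the original assignment: the helper name `check_prerequisites` is hypothetical, and it assumes the notebook's working directory is the directory that holds iris.csv.

```python
from importlib.util import find_spec
from pathlib import Path


def check_prerequisites(csv_path="iris.csv", packages=("pandas", "numpy")):
    """Return a dict mapping each prerequisite to True/False (hypothetical helper)."""
    # find_spec returns None when a top-level package cannot be imported.
    status = {name: find_spec(name) is not None for name in packages}
    # A relative path is resolved against the current working directory.
    status[str(csv_path)] = Path(csv_path).is_file()
    return status


print(check_prerequisites())
```

Any entry that comes back False points to the step to repeat: the pip install for a package, or copying the CSV next to the notebook.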

Code blocks

  1. Import libraries:
import pandas as pd
import numpy as np
  1. Load the dataset from a CSV file into a pandas DataFrame and show summary statistics:
df = pd.read_csv('iris.csv')
df.describe() # Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
  1. Print the first and last 5 rows:
print("First 5 rows:\n", df.head())
print("Last 5 rows:\n", df.tail())
  1. Flag duplicated rows (True for each row that repeats an earlier one):
df.duplicated()
  1. Flag null values (True/False for every cell):
df.isnull()
  1. Print summary of DataFrame:
df.info()
  1. Print shape as a (rows, columns) tuple:
df.shape
  1. Flag null values (True/False) in the sepal.length column:
df["sepal.length"].isnull()
  1. Drop the petal.length column (returns a new DataFrame; df itself is unchanged):
y = df.drop(["petal.length"], axis=1) # axis=1 drops columns; axis=0 drops rows
print(y)
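  1. In the variety column, replace Setosa with 0 and Virginica with 1 (any other value is left unchanged):
df['variety'] = df['variety'].replace(['Setosa', 'Virginica'], [0, 1]) # Assign back; inplace=True on a selected column may not update df in newer pandas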
print(df)
  1. Print the number of null values in each column:
df.isnull().sum()
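
The boolean outputs of duplicated() and isnull() are easier to read once they are summarized. The following is a minimal sketch that uses a small made-up sample in place of iris.csv: the rows are illustrative and not taken from the dataset, and only the column names follow the steps above.

```python
import io

import pandas as pd

# Made-up sample with the same column names as the steps above (not real iris data).
sample_csv = io.StringIO(
    "sepal.length,sepal.width,petal.length,petal.width,variety\n"
    "5.1,3.5,1.4,0.2,Setosa\n"
    "5.1,3.5,1.4,0.2,Setosa\n"
    ",3.0,5.9,2.1,Virginica\n"
    "6.4,3.2,4.5,1.5,Versicolor\n"
)
sample = pd.read_csv(sample_csv)

# True counts as 1, so summing the boolean Series counts the flagged rows.
n_duplicates = int(sample.duplicated().sum())

# Null count per column, then a single yes/no answer for the whole frame.
nulls_per_column = sample.isnull().sum()
has_nulls = bool(sample.isnull().any().any())

print("Duplicate rows:", n_duplicates)
print("Nulls per column:\n", nulls_per_column)
print("Any nulls:", has_nulls)
```

The same expressions should work on df once the real dataset is loaded.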