# A1 - Data Wrangling-1

✅ Tested and working as intended.

---

## Pre-requisites

- Install the required libraries, `pandas` and `numpy`:

```shell
pip install pandas numpy
```

- Save the dataset [iris.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/iris.csv) in the same directory as this Jupyter notebook.

---

## Code blocks

1. Import libraries:

```python3
import pandas as pd
import numpy as np
```

2. Load the dataset from a CSV file into a pandas DataFrame:

```python3
df = pd.read_csv('iris.csv')
df.describe()  # Summary statistics (count, mean, std, min, quartiles, max) of the numeric columns
```
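If `iris.csv` is not at hand, the loading step can be tried on an in-memory sample. This is a minimal sketch, assuming the column names used elsewhere in this README; the three data rows are made up for illustration and are not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample that mimics the layout of iris.csv
csv_text = """sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.3,3.3,6.0,2.5,Virginica
"""

sample = pd.read_csv(io.StringIO(csv_text))  # read_csv also accepts file-like objects
stats = sample.describe()                    # by default only numeric columns are summarised
print(stats)
```

The text column `variety` does not appear in `stats`, because `describe()` skips non-numeric columns unless asked otherwise.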

3. Print the first and last 5 rows:

```python3
print("First 5 rows:\n", df.head())
print("Last 5 rows:\n", df.tail())
```
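Both `head()` and `tail()` default to 5 rows and accept a row count. A small sketch on a made-up single-column frame (the values are illustrative only):

```python
import pandas as pd

# Hypothetical sample with seven rows
sample = pd.DataFrame({"sepal.length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6]})

first_two = sample.head(2)   # rows with index 0 and 1
last_three = sample.tail(3)  # rows with index 4, 5 and 6
print(first_two)
print(last_three)
```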

4. Flag duplicate rows (`True` for every row that repeats an earlier one):

```python3
df.duplicated()
```
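The boolean Series returned by `duplicated()` is easier to use once it is counted or applied. The following sketch uses a made-up three-row frame in which the last row repeats the first; the data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample: row 2 is an exact copy of row 0
sample = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 5.1],
    "variety": ["Setosa", "Setosa", "Setosa"],
})

dup_mask = sample.duplicated()      # True for rows that repeat an earlier row
n_dups = int(dup_mask.sum())        # True counts as 1, so the sum is the duplicate count
deduped = sample.drop_duplicates()  # returns a new frame; sample itself is unchanged
print(n_dups)
print(deduped)
```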

5. Check every cell for null values (`True` where a value is missing):

```python3
df.isnull()
```

6. Print summary of DataFrame:

```python3
df.info()
```

7. Print the shape as a `(rows, columns)` tuple:

```python3
df.shape
```
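Because `shape` is a plain tuple, it can be unpacked into separate row and column counts. A short sketch on a made-up frame:

```python
import pandas as pd

# Hypothetical sample with 3 rows and 2 columns
sample = pd.DataFrame({
    "sepal.length": [5.1, 7.0, 6.3],
    "variety": ["Setosa", "Versicolor", "Virginica"],
})

n_rows, n_cols = sample.shape  # shape is an attribute, not a method, so no parentheses
print(f"{n_rows} rows, {n_cols} columns")
```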

8. Check the `sepal.length` column for null values (`True`/`False` per row):

```python3
df["sepal.length"].isnull()
```
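The per-row mask can also be used as a filter to show only the rows where the value is missing. A minimal sketch, assuming a made-up frame with one missing `sepal.length`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row has no sepal.length
sample = pd.DataFrame({
    "sepal.length": [5.1, np.nan, 4.7],
    "variety": ["Setosa", "Setosa", "Virginica"],
})

mask = sample["sepal.length"].isnull()  # boolean Series, one entry per row
missing_rows = sample[mask]             # boolean indexing keeps rows where mask is True
print(missing_rows)
```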

9. Delete/Drop `petal.length` column:

```python3
y = df.drop(["petal.length"], axis=1)  # axis=1 drops a column; axis=0 would drop a row. df itself is unchanged
print(y)
```
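`drop()` returns a new DataFrame and leaves the original untouched. The sketch below illustrates this on a made-up frame, using the `columns=` keyword as an equivalent of passing `axis=1`:

```python
import pandas as pd

# Hypothetical two-row sample
sample = pd.DataFrame({
    "sepal.length": [5.1, 7.0],
    "petal.length": [1.4, 4.7],
    "variety": ["Setosa", "Versicolor"],
})

without_petal = sample.drop(columns=["petal.length"])  # same effect as drop([...], axis=1)

print(list(without_petal.columns))  # petal.length is gone in the copy
print(list(sample.columns))         # the original still has all three columns
```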

10. In `variety` column, replace `Setosa` with `0` and `Virginica` with `1`:

```python3
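# Assign the result back: calling replace(..., inplace=True) on df['variety'] may not update df in recent pandas versions
df['variety'] = df['variety'].replace(['Setosa', 'Virginica'], [0, 1])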
print(df)
```
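Replacing only `Setosa` and `Virginica` leaves any other label in the column as a string, so the column ends up with mixed types. One possible alternative, offered as a sketch rather than as the method this assignment asks for, is `Series.map` with a dictionary that covers every class. The code `2` for `Versicolor` and the column name `variety_code` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical sample containing all three class labels
sample = pd.DataFrame({"variety": ["Setosa", "Versicolor", "Virginica", "Setosa"]})

# Assumed encoding: 0 and 1 follow the step above, 2 for Versicolor is made up here
codes = {"Setosa": 0, "Virginica": 1, "Versicolor": 2}

# map() looks up each value in the dict; labels missing from the dict would become NaN
sample["variety_code"] = sample["variety"].map(codes)
print(sample)
```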

11. Print the number of null values in each column:

```python3
df.isnull().sum()
```
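The per-column counts can be reduced further to a single total or to a yes/no answer per column. A minimal sketch on a made-up frame with one missing value in each column:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing value per column
sample = pd.DataFrame({
    "sepal.length": [5.1, np.nan, 4.7],
    "petal.length": [1.4, 1.4, np.nan],
    "variety": ["Setosa", None, "Setosa"],
})

per_column = sample.isnull().sum()   # null count for each column
total = int(per_column.sum())        # null count for the whole frame
has_missing = sample.isnull().any()  # True for columns with at least one null
print(per_column)
print(total)
```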

---