Added codes, datasets and Jupyter notebooks directory.
@@ -0,0 +1,92 @@
# A1 - Data Wrangling-1

✅ Tested and working as intended.

---

## Pre-requisites

- Install required libraries: `pandas` & `numpy`

```shell
pip install pandas numpy
```

- Save the dataset [iris.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/iris.csv) in the same directory as this Jupyter notebook.

---

## Code blocks

1. Import libraries:

```python3
import pandas as pd
import numpy as np
```

2. Load the dataset from a CSV file into a pandas DataFrame:

```python3
df = pd.read_csv('iris.csv')
df.describe()  # Display summary statistics for the numeric columns
```
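As a minimal sketch of the loading step without `iris.csv` on disk, the block below reads a small inline sample instead. The four rows are made up for illustration, and the `sepal.width` and `petal.width` column names are assumptions (only `sepal.length`, `petal.length` and `variety` are named in this guide). It also shows that `describe()` returns a DataFrame, which has to be printed explicitly outside a notebook.

```python
import io

import pandas as pd

# Hypothetical four-row sample standing in for iris.csv.
# The two width column names are assumed, not taken from this guide.
csv_text = """sepal.length,sepal.width,petal.length,petal.width,variety
5.1,3.5,1.4,0.2,Setosa
4.9,3.0,1.4,0.2,Setosa
7.0,3.2,4.7,1.4,Versicolor
6.3,3.3,6.0,2.5,Virginica
"""

# read_csv accepts a file path or any file-like object.
sample = pd.read_csv(io.StringIO(csv_text))

# describe() returns a new DataFrame of summary statistics
# (count, mean, std, min, quartiles, max) for the numeric columns.
summary = sample.describe()
print(summary)
```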

3. Print the first and last 5 rows:

```python3
print("First 5 rows:\n", df.head())
print("Last 5 rows:\n", df.tail())
```
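Both `head()` and `tail()` default to 5 rows but accept a row count. A short sketch, using a hypothetical ten-row DataFrame in place of the iris data:

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame: ten rows, one column.
sample = pd.DataFrame({"sepal.length": [float(i) for i in range(10)]})

first_three = sample.head(3)  # rows with index 0, 1, 2
last_two = sample.tail(2)     # rows with index 8, 9

print(first_three)
print(last_two)
```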

4. Flag duplicated rows (`True` marks a row that repeats an earlier one):

```python3
df.duplicated()
```
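Since `duplicated()` returns one boolean per row, it is often easier to count the flags or drop the repeats. A possible sketch on a made-up four-row sample (the values are illustrative, not taken from `iris.csv`):

```python
import pandas as pd

# Hypothetical sample: row 2 is an exact repeat of row 0.
sample = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 5.1, 6.3],
    "variety": ["Setosa", "Setosa", "Setosa", "Virginica"],
})

dup_mask = sample.duplicated()       # True only for later repeats
dup_count = int(dup_mask.sum())      # number of duplicated rows
deduped = sample.drop_duplicates()   # new DataFrame without the repeats

print(dup_mask)
print("Duplicated rows:", dup_count)
print(deduped)
```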

5. Check every cell for null values (`True`/`False`):

```python3
df.isnull()
```

6. Print a summary of the DataFrame (column names, non-null counts and dtypes):

```python3
df.info()
```

7. Print the shape, i.e. `(rows, columns)`:

```python3
df.shape
```
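`shape` is a plain tuple, so it can be unpacked into separate row and column counts. A small sketch with a hypothetical three-row sample:

```python
import pandas as pd

# Hypothetical sample with 3 rows and 2 columns.
sample = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0],
    "variety": ["Setosa", "Setosa", "Versicolor"],
})

# shape is (number_of_rows, number_of_columns).
n_rows, n_cols = sample.shape
print(f"{n_rows} rows, {n_cols} columns")
```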

8. Check the `sepal.length` column for null values (`True`/`False`):

```python3
df["sepal.length"].isnull()
```

9. Drop the `petal.length` column (this returns a new DataFrame; `df` itself is unchanged):

```python3
y = df.drop(["petal.length"], axis=1)  # axis=1 drops a column; axis=0 drops a row
print(y)
```
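The sketch below illustrates that `drop()` leaves the original DataFrame untouched unless the result is assigned back. It uses the `columns=` keyword, which is equivalent to passing `axis=1`; the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical two-row sample using column names from this guide.
sample = pd.DataFrame({
    "sepal.length": [5.1, 6.3],
    "petal.length": [1.4, 6.0],
    "variety": ["Setosa", "Virginica"],
})

# drop() returns a new DataFrame; `sample` keeps all three columns.
without_petal = sample.drop(columns=["petal.length"])

print(list(sample.columns))
print(list(without_petal.columns))
```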

10. In the `variety` column, replace `Setosa` with `0` and `Virginica` with `1`:

```python3
# Assign the result back instead of using inplace=True on a selected column,
# which newer pandas versions warn about.
df['variety'] = df['variety'].replace(['Setosa', 'Virginica'], [0, 1])
print(df)
```
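The replacement above leaves any other label in `variety` as a string. One alternative, sketched here under the assumption that the column has a third class named `Versicolor` (as in the standard iris data) and that coding it as `2` is acceptable, is to map every label through a dictionary. The `codes` mapping and the `variety_code` column are hypothetical names introduced for this example.

```python
import pandas as pd

# Hypothetical sample; "Versicolor" as a third class is an assumption.
sample = pd.DataFrame({
    "variety": ["Setosa", "Versicolor", "Virginica", "Setosa"],
})

# Illustrative label-to-integer mapping (the code 2 is not from this guide).
codes = {"Setosa": 0, "Virginica": 1, "Versicolor": 2}

# map() looks each label up in the dict; labels missing from it become NaN.
sample["variety_code"] = sample["variety"].map(codes)
print(sample)
```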

11. Print the number of null values in each column:

```python3
df.isnull().sum()
```
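To see what the per-column count looks like when data is actually missing, here is a small sketch on a made-up sample with gaps (the values and the missing entries are illustrative only):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with deliberately missing entries.
sample = pd.DataFrame({
    "sepal.length": [5.1, np.nan, np.nan],
    "petal.length": [1.4, 1.4, 4.7],
    "variety": ["Setosa", None, "Versicolor"],
})

# isnull() gives True/False per cell; sum() counts the True values per column.
null_counts = sample.isnull().sum()

# Summing again gives the total number of missing cells.
total_nulls = int(null_counts.sum())

print(null_counts)
print("Total missing cells:", total_nulls)
```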

---