Added codes, datasets and Jupyter notebooks directory.

2025-06-11 13:48:53 +05:30
parent b3a22e9b79
commit 76dc1de8db
32 changed files with 8930 additions and 0 deletions
@@ -0,0 +1,92 @@
+# A1 - Data Wrangling-1
+
+✅ Tested and working as intended.
+
+---
+
+## Pre-requisites
+
+- Install required libraries: `pandas` & `numpy`
+
+```shell
+pip install pandas numpy
+```
+
+- Save the dataset [iris.csv](https://git.kska.io/sppu-te-comp-content/DataScienceAndBigDataAnalytics/src/branch/main/Datasets/iris.csv) in the same directory as this Jupyter notebook.
+
+---
+
+## Code blocks
+
+1. Import libraries:
+
+```python3
+import pandas as pd
+import numpy as np
+```
+
+2. Load the dataset from a CSV file into a pandas DataFrame:
+
+```python3
+df=pd.read_csv('iris.csv')
+df.describe() # Print description of DataFrame
+```
+
+3. Print first and last 5 values:
+
+```python3
+print("First 5 values:\n", df.head())
+print ("Last 5 values:\n", df.tail())
+```
+
+4. Print duplicated values:
+
+```python3
+df.duplicated()
+```
+
+5. Print null values true/false:
+
+```python3
+df.isnull()
+```
+
+6. Print summary of DataFrame:
+
+```python3
+df.info()
+```
+
+7. Print shape, i.e. rows + columns:
+
+```python3
+df.shape
+```
+
+8. Print null (true/false) values in `sepal.length` column:
+
+```python3
+df["sepal.length"].isnull()
+```
+
+9. Delete/Drop `petal.length` column:
+
+```python3
+y = df.drop(["petal.length"], axis=1) # axis=1 column. For row, axis=0
+print(y)
+```
+
+10. In `variety` column, replace `Setosa` with `0` and `Virginica` with `1`:
+
+```python3
+df['variety'].replace(['Setosa', 'Virginica'], [0,1], inplace=True)
+print(df)
+```
+
+11. Print sum of NULL values in each column:
+
+```python3
+df.isnull().sum()
+```
+
+---