# A10 - Data Visualization-3

✅ Tested and working as intended.

---

## Pre-requisites

- Install the required libraries: `pandas`, `seaborn`, and `matplotlib`

```shell
pip install pandas matplotlib seaborn
```
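The histogram step further down relies on `Axes.bar_label`, which matplotlib added in version 3.4, so it can be worth confirming what was actually installed before going on. A minimal sketch (the variable name `has_bar_label` is only illustrative):

```python
import matplotlib
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

# Report the versions that pip actually installed
print("pandas:", pd.__version__)
print("matplotlib:", matplotlib.__version__)
print("seaborn:", sns.__version__)

# bar_label (used for the histogram count labels) exists from matplotlib 3.4 onward
has_bar_label = hasattr(Axes, "bar_label")
print("Axes.bar_label available:", has_bar_label)
```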

---

1. Import libraries:

```python3
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

2. Load the dataset into a pandas DataFrame:

```python3
df = pd.read_csv('iris.csv')  # Expects iris.csv in the current working directory
df.head()
```
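The later steps assume that the last column of `iris.csv` holds the class label (named `variety` in the violin plot) and that every other column is numeric. The sketch below checks that layout on a tiny in-memory sample. The measurement column names (`sepal.length` and so on) are an assumption about the file, and the three rows are only illustrative.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the layout the walkthrough assumes;
# the real iris.csv may use different header names.
sample_csv = io.StringIO(
    "sepal.length,sepal.width,petal.length,petal.width,variety\n"
    "5.1,3.5,1.4,0.2,Setosa\n"
    "7.0,3.2,4.7,1.4,Versicolor\n"
    "6.3,3.3,6.0,2.5,Virginica\n"
)
sample_df = pd.read_csv(sample_csv)

# The loops over df.columns[:-1] only work if the label column comes last
label_column = sample_df.columns[-1]
feature_columns = list(sample_df.columns[:-1])
print("Label column:", label_column)
print("Feature columns:", feature_columns)
```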

3. Features and their datatypes:

```python3
print("Features and their types:")
print(df.dtypes)
```
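Once the dtypes are known, the numeric features can be picked by type rather than by position. This is an alternative to the `df.columns[:-1]` slicing used in the later steps, not what the original code does, and the small frame here is a made-up stand-in for `df`.

```python
import pandas as pd

# Stand-in for df with assumed column names
demo_df = pd.DataFrame(
    {
        "sepal.length": [5.1, 7.0, 6.3],
        "petal.length": [1.4, 4.7, 6.0],
        "variety": ["Setosa", "Versicolor", "Virginica"],
    }
)

# select_dtypes keeps only the columns whose dtype is numeric
numeric_columns = demo_df.select_dtypes(include="number").columns.tolist()
print(numeric_columns)
```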

4. Histogram for each numerical feature:

```python3
plt.figure(figsize=(12, 6))

for i, column in enumerate(df.columns[:-1]):  # Exclude the 'variety' label column
    ax = plt.subplot(2, 2, i + 1)

    # hist() returns (counts, bin_edges, patches); patches is the bar container
    _, _, bars = ax.hist(df[column], edgecolor="black")
    ax.bar_label(bars, fmt='%d')  # Add count labels
    ax.set_title(f"Histogram of {column}")
    ax.set_xlabel(column)
    ax.set_ylabel("Frequency")

plt.tight_layout()
plt.show()
```
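The count labels drawn by `bar_label` are the per-bin frequencies. With default settings `plt.hist` uses 10 equal-width bins, the same default as `numpy.histogram`, so the numbers can be reproduced without plotting. A sketch with made-up values:

```python
import numpy as np

# Made-up measurements standing in for one feature column
values = np.array([4.9, 5.0, 5.1, 5.1, 5.4, 5.8, 6.3, 6.4, 7.0, 7.1])

# Default is 10 equal-width bins spanning min(values) to max(values)
counts, bin_edges = np.histogram(values)

# Each bin is half-open [left, right), except the last, which also includes its right edge
for left, right, count in zip(bin_edges[:-1], bin_edges[1:], counts):
    print(f"{left:.2f} to {right:.2f}: {count}")
```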

5. Boxplot (used here to identify outliers):

```python3
plt.figure(figsize=(12, 6))
for i, column in enumerate(df.columns[:-1]):  # Exclude the 'variety' label column
    ax = plt.subplot(2, 2, i + 1)
    # sns.boxplot draws onto the given Axes and returns that same Axes
    sns.boxplot(x=df[column], ax=ax, color='salmon')
    ax.set_title(f"Boxplot of {column}")

plt.tight_layout()
plt.show()
```
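Points drawn beyond the whiskers are the candidate outliers. By default the whiskers reach to the farthest data point within 1.5 × IQR of the box (`whis=1.5` in seaborn). The same kind of statistics can be inspected numerically with matplotlib's `cbook.boxplot_stats`; the data below is made up for illustration.

```python
import numpy as np
from matplotlib import cbook

# Made-up values with one obvious extreme point
values = np.array([1, 2, 3, 4, 5, 100])

# boxplot_stats returns one dict per dataset with the quartiles, whisker ends and fliers
stats = cbook.boxplot_stats(values, whis=1.5)[0]

print("Q1:", stats["q1"], "Median:", stats["med"], "Q3:", stats["q3"])
print("Whiskers:", stats["whislo"], "to", stats["whishi"])
print("Fliers (points beyond the whiskers):", stats["fliers"].tolist())
```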

6. Detecting outliers with the IQR rule:

```python3
for column in df.columns[:-1]:  # Exclude the 'variety' label column
    Q1 = df[column].quantile(0.25)
    Q3 = df[column].quantile(0.75)

    IQR = Q3 - Q1

    # Values more than 1.5 * IQR beyond the quartiles count as outliers
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    outliers = df[(df[column] < lower_bound) | (df[column] > upper_bound)][column]

    print(f"\nFeature: {column}")
    print(f"  Mean: {df[column].mean():.2f}, Median: {df[column].median():.2f}, Std Dev: {df[column].std():.2f}")
    print(f"  Outliers Detected: {'Yes' if not outliers.empty else 'No'}")
    if not outliers.empty:
        print(f"  Outlier Values: {outliers.tolist()}")
    print("-" * 40)
```
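The same IQR rule can be wrapped in a small helper that returns the outlying values per feature instead of printing them, which makes the result easier to reuse. The function name `iqr_outliers` and the demo data are hypothetical.

```python
import pandas as pd


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values of `series` lying more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


# Made-up frame: 'a' has one extreme value, 'b' has none
demo_df = pd.DataFrame({"a": [1, 2, 3, 4, 5, 100], "b": [10, 11, 12, 13, 14, 15]})
outliers_by_column = {col: iqr_outliers(demo_df[col]).tolist() for col in demo_df.columns}
print(outliers_by_column)
```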

7. Violin plot:

```python3
plt.figure(figsize=(12, 8))
for i, column in enumerate(df.columns[:-1]):  # Exclude the 'variety' label column
    ax = plt.subplot(2, 2, i + 1)
    # hue repeats x so that each variety gets its own palette colour
    sns.violinplot(data=df, x="variety", y=column, hue="variety", palette="Set2", ax=ax)
    ax.set_title(f"Violin Plot of {column} by variety")

plt.tight_layout()
plt.show()
```
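A violin plot shows the shape of each distribution; a grouped summary gives the matching numbers. A sketch using `groupby` on a made-up frame (the `petal.length` column name is an assumption, `variety` follows the plot above):

```python
import pandas as pd

# Made-up sample: two rows per variety
demo_df = pd.DataFrame(
    {
        "petal.length": [1.4, 1.6, 4.5, 4.7, 5.8, 6.0],
        "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
    }
)

# Min, median and max of the feature within each variety
summary = demo_df.groupby("variety")["petal.length"].agg(["min", "median", "max"])
print(summary)
```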

---

## References

- [Dataset source-1](https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data) *(not exactly, but yes, kinda)*
- [Dataset source-2](https://archive.ics.uci.edu/dataset/53/iris) *(not exactly, but yes, kinda)*

---