# A9 - Data Visualization-2
✅ Tested and working as intended.
---
## Pre-requisites
- Install required libraries: `seaborn` & `matplotlib`
```shell
pip install matplotlib seaborn
```
---
## Code blocks
1. Import libraries:
```python3
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
```
2. Load built-in dataset:
```python3
# Titanic passenger data; seaborn fetches it online on first use, then caches it
df = sns.load_dataset('titanic')
df.head()
```
3. Describe:
```python3
# Describe
print(df.describe())
# Describe - transposed, i.e. rows and columns swapped
print(df.describe().transpose())
```
4. Mean, median, mode: **(NOT SURE IF THIS IS NEEDED)**
```python3
# Mean, median, mode
age_data = df['age'].dropna() # Drop missing values in age & store in age_data var
sorted_age_data = sorted(age_data) # Store sorted age_data
n = len(sorted_age_data) # Store length of age_data
# Calculate mean
mean_age = sum(age_data) / len(age_data)
# Calculate median
if n % 2 == 1:  # odd: take the middle value
    median_age = sorted_age_data[n // 2]
else:  # even: average the two middle values
    median_age = (sorted_age_data[n // 2 - 1] + sorted_age_data[n // 2]) / 2
# Calculate mode
age_counts = Counter(age_data) # Count occurrences of each age
mode_age = age_counts.most_common(1)[0][0] # Get the most common value
# Print
print(f"The mean age is: {mean_age}")
print(f"The median age is: {median_age}")
print(f"The mode age is: {mode_age}")
```
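As a sanity check on step 4, the same manual calculation can be compared against Python's built-in `statistics` module. This is only a minimal sketch: `manual_stats` and `sample_ages` are hypothetical names, and the sample values are made up for illustration (they are not taken from the Titanic dataset).

```python
import statistics
from collections import Counter


def manual_stats(values):
    """Return (mean, median, mode) using the same manual steps as step 4."""
    ordered = sorted(values)
    n = len(ordered)
    mean = sum(ordered) / n
    if n % 2 == 1:  # odd: take the middle value
        median = ordered[n // 2]
    else:  # even: average the two middle values
        median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    mode = Counter(values).most_common(1)[0][0]  # most frequent value
    return mean, median, mode


# Hypothetical sample ages, for illustration only
sample_ages = [22.0, 38.0, 26.0, 35.0, 35.0, 54.0]

mean_age, median_age, mode_age = manual_stats(sample_ages)

# Cross-check against the standard library
assert mean_age == statistics.mean(sample_ages)
assert median_age == statistics.median(sample_ages)
assert mode_age == statistics.mode(sample_ages)

print(f"mean={mean_age}, median={median_age}, mode={mode_age}")
```

On the real data, passing `list(df['age'].dropna())` instead of `sample_ages` should work the same way, assuming `df` was loaded as in step 2.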
5. Boxplot:
```python3
plt.figure(figsize=(8,4)) # 8 by 4 inches
sns.boxplot(x="sex", y="age", hue="survived", data=df, palette="viridis")
plt.title("Distribution of age by sex and survival status")
plt.xlabel("Sex")
plt.ylabel("Age")
plt.show()
```
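To read the boxplot from step 5, it may help to see roughly which numbers each box encodes. The sketch below computes the quartiles, the whisker ends, and the outliers for a plain list, assuming the usual 1.5 × IQR whisker rule (which is the default in matplotlib and seaborn, as far as we know). `box_summary` and `sample_ages` are hypothetical names, and the values are made up for illustration.

```python
import statistics


def box_summary(values, whis=1.5):
    """Return the numbers a box plot draws: quartiles, whisker ends, outliers."""
    ordered = sorted(values)
    # "inclusive" interpolates linearly between neighbouring data points
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range = height of the box
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme data points still inside the fences
    inside = [v for v in ordered if low_fence <= v <= high_fence]
    outliers = [v for v in ordered if v < low_fence or v > high_fence]
    return {
        "whisker_low": inside[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical sample ages, for illustration only
sample_ages = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(box_summary(sample_ages))
```

In the plot from step 5, one such summary is drawn for every combination of `sex` and `survived`.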
6. Violin plot:
```python3
sns.violinplot(x='sex', y='age', hue='survived', data=df)
plt.show()
```
7. Catplot:
```python3
sns.catplot(x="sex", hue="survived", data=df, kind="count")
plt.show()
```
---