A3 - Descriptive Statistics

✅ Tested and working as intended.

Pre-requisites

Install required libraries: pandas

pip install pandas

Code blocks

Problem Statement - Part 1 (data.csv)

Import library

import pandas as pd

Generate data and load into DataFrame:

# Generate data
data = {
    'age': [25, 30, 22, 40, 55, 60, 33, 28, 45, 50],
    'income': [50000, 60000, 45000, 70000, 80000, 90000, 65000, 62000, 75000, 85000],
    'age_group': ['20-30', '30-40', '20-30', '40-50', '50-60', '50-60', '30-40', '20-30', '40-50', '50-60']
}

# Define data in DataFrame
df = pd.DataFrame(data)

Group data by age_group, compute statistics for income + print:

# Group the data by age_group and compute summary statistics for 'income'
summary_stats = df.groupby('age_group')['income'].describe()

# Print summary
print(summary_stats)

Group the data by age_group; Select income column for each of the groups created; Calculate median for income:

# Group the data by age_group; Select income column for each of the groups created; Calculate median for income
median_income = df.groupby('age_group')['income'].median()

# Print dat median
print("Median Income by Age Group:")
print(median_income)

Print column names:

print("Column Names:", df.columns)

Modified dataset with repeated values; define in DataFrame:

# Modified dataset with repeated values
data = {
    'age': [25, 30, 25, 40, 55, 60, 33, 28, 45, 50, 25, 30, 28, 30, 25],
    'income': [50000, 60000, 50000, 70000, 80000, 90000, 65000, 62000, 75000, 85000, 50000, 60000, 62000, 70000, 75000],
    'age_group': ['20-30', '30-40', '20-30', '40-50', '50-60', '50-60', '30-40', '20-30', '40-50', '50-60', '20-30', '30-40', '20-30', '30-40', '20-30']
}

# Define data in DataFrame
df = pd.DataFrame(data)

Calculate mode:

# Calculate the mode for each column
mode_age = df['age'].mode()
mode_income = df['income'].mode()
print(f"Mode of Age: {mode_age.values}")
print(f"Mode of Income: {mode_income.values}")

Problem Statement - Part 2 (iris.csv)

Save the dataset iris.csv in the same directory as this Jupyter notebook.

Load dataset and print first 5 rows:

# Load iris.csv in the DataFrame
df = pd.read_csv('iris.csv')

print(df.head()) # Print first 5 columns

Group data; Compute percentiles; Display:

# Group the data by species and display summary statistics
summary_stats_species = df.groupby('variety').describe()

# Compute specific percentiles and statistics
percentiles = df.groupby('variety').quantile([0.25, 0.5, 0.75])

# Display summary statistics and percentiles
summary_stats_species = df.groupby('variety').describe()

print("\nPercentiles by Species:")
print(percentiles)

Group the data by variety; Select sepal.width column for each of the groups created; Display summary statistics:

# Group the data by variety; Select sepal.width column for each of the groups created; Display summary statistics
summary_stats_species = df.groupby('variety')['sepal.width'].describe()

print("\nSummary Statistics by Species for Sepal Width:")
print(summary_stats_species)

Group by variety and compute the median for numeric columns:

# Group by variety and compute the median for numeric columns
median_values = df.groupby('variety').median()

print("Median Values by Species:")
print(median_values)

Group the data by variety; Select sepal.width column for each of the groups created; Display median:

# Group the data by variety; Select sepal.width column for each of the groups created; Display median
median_sepal_length = df.groupby('variety')['sepal.length'].median()
print("Median Sepal Length by Species:")
print(median_sepal_length)

Calculate & print mode for sepal.width:

# Calculate & print mode for sepal.width
mode_width = df['sepal.width'].mode()
print(f"Mode of Width: {mode_width.values}")

4.1 KiB Raw Permalink Blame History

A3 - Descriptive Statistics

Pre-requisites

Code blocks

Problem Statement - Part 1 (data.csv)

Problem Statement - Part 2 (iris.csv)

4.1 KiB

Raw Permalink Blame History