🚀
Advanced
ndarray · Broadcasting · DataFrame · GroupBy · Matplotlib
Week 8 — NumPy & Pandas
Accelerate numerical computation with NumPy arrays and explore tabular data with Pandas. Combine them with Matplotlib to visualize results.
numpy · pandas · matplotlib · DataFrame · vectorized
Duration
⏱ 3 hours
Level
📊 Advanced
Prerequisite
🎯 Basic Weeks 6–7
OUTCOME
Analyze a CSV dataset: clean, group, aggregate, and plot the results
What you'll learn
- Create and manipulate NumPy arrays with vectorized operations
- Understand broadcasting rules
- Load, clean, and filter data with Pandas DataFrames
- Aggregate with groupby and pivot_table
- Create line, bar, and scatter plots with Matplotlib
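Broadcasting, named in the list above, is how NumPy combines arrays of different shapes in one operation. Shapes are compared from the last dimension backward, and two dimensions are compatible when they are equal or one of them is 1. The sketch below is only an illustration; the array values are made up.

```python
import numpy as np

# Illustrative data: 3 rows, 4 columns
m = np.arange(12).reshape(3, 4)          # shape (3, 4)
row = np.array([10, 20, 30, 40])         # shape (4,)
col = np.array([[100], [200], [300]])    # shape (3, 1)

# (3, 4) + (4,): row is applied to each of the 3 rows
by_row = m + row

# (3, 4) + (3, 1): col is applied to each of the 4 columns
by_col = m + col

# (3, 1) + (4,): both operands are stretched to (3, 4)
grid = col + row

print(by_row.shape, by_col.shape, grid.shape)

# Mismatched trailing dimensions (4 vs 3) raise ValueError
try:
    m + np.array([1, 2, 3])
except ValueError as err:
    print("cannot broadcast:", err)
```

No data is copied to make the shapes match, which is why broadcasting is usually preferred over building the larger array by hand.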
1. NumPy Arrays
python
import numpy as np
a = np.array([1, 2, 3, 4, 5])
print(a * 2) # [ 2  4  6  8 10] (vectorized)
print(a[a > 3]) # [4 5] — boolean indexing
# 2D array
m = np.arange(9).reshape(3, 3)
print(m)
print(m.T) # transpose
print(m @ m) # matrix multiply
2. Pandas DataFrame
python
import pandas as pd
df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "dept": ["Eng", "Eng", "HR", "HR"],
    "score": [92, 85, 78, 88],
})
print(df.describe())
print(df[df["score"] >= 85])
print(df.groupby("dept")["score"].mean())
3. Data Cleaning
python
df = pd.read_csv("data.csv")
# Inspect
print(df.shape, df.dtypes)
print(df.isnull().sum())
# Clean
df["score"] = pd.to_numeric(df["score"], errors="coerce") # non-numeric values become NaN
df = df.dropna(subset=["score"]) # drop rows with a missing or non-numeric score
df = df[df["score"].between(0, 100)] # valid range
df = df.drop_duplicates()
💻 Examples
Run these examples and check the output yourself.
01_analysis.py: Full mini data analysis pipeline
CODE
import numpy as np
import pandas as pd
# Create sample data
np.random.seed(42)
df = pd.DataFrame({
    "name": [f"Student{i}" for i in range(50)],
    "math": np.random.randint(40, 100, 50),
    "eng": np.random.randint(40, 100, 50),
    "sci": np.random.randint(40, 100, 50),
})
df["avg"] = df[["math", "eng", "sci"]].mean(axis=1)
# Bins are right-inclusive: (0, 60] is F, (60, 70] is D, and so on
df["grade"] = pd.cut(df["avg"], bins=[0, 60, 70, 80, 90, 100],
                     labels=["F", "D", "C", "B", "A"])
# observed=False keeps grades that no student received in the report
print(df.groupby("grade", observed=False).agg(
    count=("name", "count"), mean_avg=("avg", "mean")))
print("\nTop 5 students:")
print(df.nlargest(5, "avg")[["name","avg","grade"]])
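The groupby report above aggregates along a single key. When there are two keys, pivot_table (listed in the learning goals) lays the result out as a grid. This is a minimal sketch on made-up data: the `level` column and all values are hypothetical and not part of the example file.

```python
import pandas as pd

# Hypothetical data: the "level" column and all values are made up
df = pd.DataFrame({
    "dept":  ["Eng", "Eng", "Eng", "HR", "HR", "HR"],
    "level": ["junior", "senior", "senior", "junior", "junior", "senior"],
    "score": [80, 92, 88, 70, 78, 85],
})

# One row per dept, one column per level, mean score in each cell
table = df.pivot_table(index="dept", columns="level",
                       values="score", aggfunc="mean")
print(table)
```

Each cell is the mean of the rows that share that dept and level, so the Eng/senior cell averages 92 and 88.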
📝 Exercises
Try them yourself first, then open the solution to compare.
Exercise 1
Sales Analysis
Goal: Load sales.csv and produce a monthly revenue report.
Requirements
- Load CSV with pd.read_csv
- Parse date column, extract year-month
- Group by month, sum revenue
- Plot bar chart with matplotlib
- Print top 3 months
SOLUTION
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("sales.csv", parse_dates=["date"])
df["month"] = df["date"].dt.to_period("M")
monthly = df.groupby("month")["revenue"].sum().sort_index()
print("Top 3 months:")
print(monthly.nlargest(3))
monthly.plot(kind="bar", title="Monthly Revenue")
plt.tight_layout()
plt.savefig("monthly_revenue.png")
print("Chart saved.")
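The solution assumes that sales.csv has a `date` column and a `revenue` column. To try it without real data, a stand-in file can be generated first. This is only a sketch: the helper name `make_sample_sales`, the year 2024, and the random revenue values are all made up for illustration.

```python
import numpy as np
import pandas as pd

def make_sample_sales(path="sales.csv", n_rows=200, seed=0):
    """Write a synthetic sales file with 'date' and 'revenue' columns."""
    rng = np.random.default_rng(seed)
    # Random day offsets within one year (2024 is an arbitrary choice)
    days = rng.integers(0, 366, size=n_rows)
    dates = pd.Timestamp("2024-01-01") + pd.to_timedelta(days, unit="D")
    df = pd.DataFrame({
        "date": dates.strftime("%Y-%m-%d"),
        "revenue": rng.uniform(10, 500, size=n_rows).round(2),
    })
    df.to_csv(path, index=False)
    return df

sample = make_sample_sales()
print(sample.head())
```

Running this once in the working directory creates the file, after which the solution script can be run unchanged.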
Example code / lecture materials
All lecture materials and example code are openly available on GitHub.