
Intermediate

datetime · pathlib · collections · itertools · random

Week 4 — Standard Library

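Python's standard library comes "batteries included": many everyday tasks need no third-party package. This week covers the most useful modules: datetime for dates and times, pathlib for files and paths, collections for specialized data structures, itertools for efficient looping, and random for generating random data.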

datetime · pathlib · collections · itertools · stdlib

Duration: ⏱ 2.5 hours

Level: 📊 Intermediate

Prerequisite: 🎯 Week 3

OUTCOME

Build a file report tool using pathlib and collections.Counter

What you'll learn

1. Parse and format dates with datetime
2. Navigate the filesystem with pathlib.Path
3. Use Counter, defaultdict, and deque from collections
4. Apply itertools.chain, groupby, combinations, permutations
5. Generate random data with the random module

1. datetime

python

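from datetime import datetime, timedelta, date

# Current date and time, formatted as a string
now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))

# Age in whole years: subtract 1 if this year's birthday hasn't happened yet
# (dividing .days by 365 drifts because of leap years)
bday = date(1990, 6, 15)
today = date.today()
age = today.year - bday.year - ((today.month, today.day) < (bday.month, bday.day))
print(f"Age: {age} years")

# Date arithmetic with timedelta
next_week = now + timedelta(days=7)
print(next_week.date())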
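The block above only formats dates; the objectives also mention parsing. A minimal sketch of the reverse direction with strptime and fromisoformat, using made-up date strings:

```python
from datetime import date, datetime

# strptime: parse text using explicit format codes (the same codes strftime uses)
meeting = datetime.strptime("2024-03-15 14:30", "%Y-%m-%d %H:%M")
print(meeting)                        # 2024-03-15 14:30:00

# fromisoformat: shortcut for ISO 8601 strings such as "YYYY-MM-DD"
deadline = date.fromisoformat("2024-12-31")
print(deadline.strftime("%d/%m/%Y"))  # 31/12/2024

# A string that doesn't match the format raises ValueError
try:
    datetime.strptime("15/03/2024", "%Y-%m-%d")
except ValueError as err:
    print("Could not parse:", err)
```

Parsing and formatting are mirror images: the same format string works in both directions.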

2. pathlib

python

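from pathlib import Path

# List the .py files in the current directory with their sizes
p = Path(".")
for f in p.glob("*.py"):
    print(f.name, f.stat().st_size, "bytes")

# Write a small text file, read it back, then delete it
data = Path("data.txt")
data.write_text("Hello, pathlib!", encoding="utf-8")
print(data.read_text(encoding="utf-8"))
data.unlink()  # delete the file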
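The / operator and the name, stem, suffix, and parent attributes cover most day-to-day path handling. A sketch using a hypothetical reports/2024/summary.csv path, written inside a temporary directory so nothing is left behind:

```python
import tempfile
from pathlib import Path

# The / operator joins segments with the right separator for the OS
report = Path("reports") / "2024" / "summary.csv"   # hypothetical path
print(report.name, report.stem, report.suffix)      # summary.csv summary .csv
print(report.parent)   # reports/2024 (backslashes on Windows)

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / report
    # Create any missing parent folders before writing
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("name,size\n", encoding="utf-8")
    content = target.read_text(encoding="utf-8")
# The temporary directory (and the file) is deleted when the with-block ends
```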

3. collections

python

from collections import Counter, defaultdict, deque

# Counter: count occurrences
words = "the cat sat on the mat".split()
print(Counter(words).most_common(3))   # [('the', 2), ('cat', 1), ('sat', 1)]

# defaultdict: a missing key gets a default (here an empty list) instead of raising KeyError
dd = defaultdict(list)
dd["fruits"].append("apple")
print(dict(dd))   # {'fruits': ['apple']}

# deque: efficient append/pop at both ends
q = deque([1, 2, 3])
q.appendleft(0)
q.append(4)
print(list(q))   # [0, 1, 2, 3, 4]
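Two more idioms in the same spirit, sketched with made-up values: a bounded deque as a sliding window or FIFO queue, and Counter.update() for merging tallies.

```python
from collections import Counter, deque

# deque(maxlen=N) drops the oldest item once full: a "last N" sliding window
recent = deque(maxlen=3)
for reading in [10, 20, 30, 40, 50]:
    recent.append(reading)
print(list(recent))   # [30, 40, 50]

# popleft() removes from the front in O(1): a simple FIFO queue
q = deque(["task1", "task2"])
first = q.popleft()
print(first, list(q))   # task1 ['task2']

# Counter.update() adds new counts to the existing tallies
c = Counter("banana")
c.update("bandana")
print(c["a"], c["n"], c["d"])   # 6 4 1
```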

4. itertools

python

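from itertools import chain, combinations, permutations, groupby

print(list(chain([1, 2], [3, 4], [5])))   # [1, 2, 3, 4, 5]
print(list(combinations("ABC", 2)))       # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(list(permutations([1, 2, 3], 2)))   # [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2)]

# groupby: this data is already sorted by its first element
data = [("a", 1), ("a", 2), ("b", 3), ("b", 4)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))   # a [('a', 1), ('a', 2)], then b [('b', 3), ('b', 4)]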
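The learning objectives also list the random module. A minimal sketch of its most common calls, seeded with an arbitrary value (42) so each run repeats the same sequence; no outputs are shown because the exact values depend on the seed:

```python
import random

random.seed(42)  # fixed seed: the same "random" sequence every run

dice = random.randint(1, 6)                  # int, both ends inclusive
pick = random.choice(["red", "green", "blue"])
hand = random.sample(range(1, 53), 5)        # 5 distinct values, no repeats
score = random.uniform(0.0, 100.0)           # float between 0.0 and 100.0

deck = list(range(10))
random.shuffle(deck)                         # shuffles in place, returns None

print(dice, pick, hand, round(score, 2), deck)
```

For passwords or tokens, use the secrets module instead; random is not cryptographically secure.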

When to reach for what (and the gotchas)

These modules cover jobs you could hand-roll, but the standard-library versions are tested, usually faster, and clearer to read. Here's a quick decision table, plus the trap each one hides.

Need	Reach for	Common gotcha
Calendar dates / deadlines	datetime.date	Subtracting dates gives a timedelta — read .days, don't compare strings
Date + time of day	datetime.datetime	Naive vs timezone-aware: don't mix them in subtraction
Filesystem paths	pathlib.Path	Path objects, not strings — wrap with str(p) only at the boundary
Counting occurrences	collections.Counter	Ties keep insertion order, not alphabetical
Missing-key defaults	collections.defaultdict	Reading a missing key CREATES it
Fast both-end queue	collections.deque	Indexing the middle (q[5000]) is O(n), while a list indexes anywhere in O(1)
Grouping rows	itertools.groupby	Only groups CONSECUTIVE keys — sort first
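The datetime row's gotcha is easy to trigger. A small sketch with two invented timestamps; treating the naive one as UTC is an assumption made only for the example:

```python
from datetime import datetime, timezone

naive = datetime(2024, 1, 1, 12, 0)                       # no tzinfo
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)  # tzinfo attached

# Mixing the two in subtraction raises TypeError
try:
    naive - aware
    mixed_error = None
except TypeError as err:
    mixed_error = err
    print("TypeError:", err)

# One fix: attach a timezone to the naive value.
# ASSUMPTION: we know the naive timestamp was recorded in UTC.
fixed = naive.replace(tzinfo=timezone.utc)
print(aware - fixed)   # 0:00:00
```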

💡

Rule of thumb: if you're about to write a loop that counts, buckets, or pairs things up, there's usually a one-liner in collections or itertools that does it faster and reads better.
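To make the rule of thumb concrete, here is a hand-rolled loop next to its collections equivalent, once for counting and once for bucketing. The word list is invented for illustration:

```python
from collections import Counter, defaultdict

words = ["apple", "bob", "avocado", "cat", "banana"]

# Counting by first letter, hand-rolled...
counts = {}
for w in words:
    counts[w[0]] = counts.get(w[0], 0) + 1
# ...and as a Counter one-liner
print(counts == Counter(w[0] for w in words))   # True

# Bucketing words by first letter, hand-rolled...
buckets = {}
for w in words:
    buckets.setdefault(w[0], []).append(w)
# ...and with defaultdict (no "is the key there yet?" check)
dd = defaultdict(list)
for w in words:
    dd[w[0]].append(w)
print(dict(dd))   # {'a': ['apple', 'avocado'], 'b': ['bob', 'banana'], 'c': ['cat']}
```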

Common Mistakes (FAQ)

Q. datetime or date — which should I use?

Use date for calendar days (birthdays, deadlines, 'days until'); use datetime when the time of day matters. Subtracting either gives a timedelta — read .days or .total_seconds(), never compare formatted strings.
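A quick sketch of what subtraction gives you, with arbitrary example dates, and why comparing formatted strings goes wrong:

```python
from datetime import date, datetime

gap = date(2024, 3, 1) - date(2024, 2, 1)
print(gap.days)                        # 29 (2024 is a leap year)

span = datetime(2024, 3, 1, 18, 30) - datetime(2024, 3, 1, 9, 0)
print(span.days)                       # 0 (.days ignores the leftover hours)
print(span.total_seconds() / 3600)     # 9.5

# Strings compare character by character, so "9" > "1" decides this
print("9/1/2024" < "10/1/2024")        # False
```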

Q. defaultdict created a key I only meant to read — why?

Indexing a missing key on a defaultdict runs the factory and INSERTS it. If you only want to check, use d.get(key) or `key in d` instead of d[key].
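A minimal demonstration with a hypothetical scores dictionary: membership tests and .get() leave the dict alone, while indexing inserts the default.

```python
from collections import defaultdict

scores = defaultdict(int)   # missing keys default to int() == 0
scores["alice"] += 10

# Safe checks: neither of these inserts a key
print("bob" in scores)      # False
print(scores.get("bob"))    # None

# Indexing runs the factory and stores the result
print(scores["bob"])        # 0
print(dict(scores))         # {'alice': 10, 'bob': 0}
```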

Q. itertools.groupby returned weird/empty groups.

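groupby only groups CONSECUTIVE items with the same key. Sort the data by that same key first (sorted(data, key=...)); otherwise identical keys scattered through the list become separate groups. Empty groups have a second cause: each group is a lazy iterator that shares its source with groupby, so once groupby moves on to the next key (for example, because you called list(groupby(...)) first), earlier groups are exhausted. Consume each group, e.g. with list(group), inside the loop.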
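Both failure modes in one sketch: unsorted input splits a key into several groups, and materializing groupby before reading the groups leaves them empty. The rows are made up; itemgetter(0) is just a tidy way to write lambda r: r[0].

```python
from itertools import groupby
from operator import itemgetter

rows = [("b", 1), ("a", 2), ("b", 3)]
by_key = itemgetter(0)   # group on the first field

# Unsorted input: "b" shows up as two separate groups
print([k for k, _ in groupby(rows, key=by_key)])   # ['b', 'a', 'b']

# Sorted by the same key first: one group per key
grouped = {k: list(g) for k, g in groupby(sorted(rows, key=by_key), key=by_key)}
print(grouped)   # {'a': [('a', 2)], 'b': [('b', 1), ('b', 3)]}

# Pitfall: list(groupby(...)) advances past every group before you read them
stale = list(groupby(sorted(rows, key=by_key), key=by_key))
print([list(g) for _, g in stale])   # [[], []]
```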

Q. Counter.most_common() ties aren't alphabetical — is that a bug?

No. Equal counts keep their first-seen (insertion) order in Python 3.7+. If you need a deterministic tie-break, sort with a secondary key: sorted(c.items(), key=lambda kv: (-kv[1], kv[0])).
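A sketch of the difference on a made-up sentence: most_common() keeps first-seen order among ties, while the sort key above breaks ties alphabetically.

```python
from collections import Counter

c = Counter("the cat sat on the mat by the hat".split())

# Ties at count 1 keep first-seen order: cat, sat, on, mat, by, hat
print(c.most_common(3))   # [('the', 3), ('cat', 1), ('sat', 1)]

# Deterministic: highest count first, then alphabetical
ranked = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked[:3])         # [('the', 3), ('by', 1), ('cat', 1)]
```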

Q. pathlib or os.path?

Prefer pathlib — it's the modern, object-oriented API (p / 'sub' / 'file.txt', p.read_text(), p.glob(...)). Reach for os.path only when an older library hands you string paths.
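For comparison, the same operations side by side: os.path works on strings while pathlib returns Path objects, and converting at the boundary is a one-liner either way. The path segments are invented:

```python
import os.path
from pathlib import Path

legacy = os.path.join("data", "raw", "log.txt")   # string in, string out
modern = Path("data") / "raw" / "log.txt"          # Path object

print(os.path.splitext(legacy)[1], modern.suffix)  # .txt .txt
print(os.path.basename(legacy), modern.name)       # log.txt log.txt

# Converting at the boundary: Path(str) in, str(path) out
print(Path(legacy) == modern)   # True
print(str(modern) == legacy)    # True
```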

💻 Examples

Run these examples and check the output yourself.

01_file_report.py: File type report using pathlib + Counter

CODE

from pathlib import Path
from collections import Counter

# Collect every file below the current directory (rglob("*") recurses into subfolders)
path = Path(".").resolve()
all_files = [f for f in path.rglob("*") if f.is_file()]

# Tally extensions; files such as "Makefile" have an empty suffix
exts = Counter(f.suffix or "(no ext)" for f in all_files)
total_size = sum(f.stat().st_size for f in all_files)

print(f"Total files: {len(all_files)}")
print(f"Total size:  {total_size/1024:.1f} KB")
print("\nTop file types:")
for ext, cnt in exts.most_common(5):
    print(f"  {ext:<12} {cnt}")
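To reuse the report on other folders (or test it), the same logic can be wrapped in a function. A sketch: the names file_report and in_hidden_dir, merging extensions case-insensitively, and skipping dot-folders such as .git are our own choices, not part of 01_file_report.py:

```python
import tempfile
from collections import Counter
from pathlib import Path


def in_hidden_dir(path, root):
    """True if any folder between root and path starts with '.' (e.g. .git, .venv)."""
    return any(part.startswith(".") for part in path.relative_to(root).parts[:-1])


def file_report(root=".", top=5):
    """Return (file count, total bytes, most common extensions) for files under root."""
    root = Path(root)
    files = [f for f in root.rglob("*") if f.is_file() and not in_hidden_dir(f, root)]
    # Lowercase so ".PY" and ".py" count together (an assumption; drop .lower() to keep them apart)
    exts = Counter(f.suffix.lower() or "(no ext)" for f in files)
    total = sum(f.stat().st_size for f in files)
    return len(files), total, exts.most_common(top)


# Usage on a throwaway folder so the result is predictable
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "a.py").write_text("print(1)\n")
    (base / "B.PY").write_text("x = 2\n")
    (base / "notes").write_text("hi")
    (base / ".git").mkdir()
    (base / ".git" / "config").write_text("[core]\n")  # skipped: inside a dot-folder
    count, size, top = file_report(base)

print(count, top)   # 3 [('.py', 2), ('(no ext)', 1)]
```

Returning values instead of printing keeps the function easy to test; the caller decides how to display them.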

02_date_calc.py: Date arithmetic with datetime

CODE

from datetime import date, timedelta

today = date.today()
print(f"Today:      {today}")
print(f"100 days later: {today + timedelta(days=100)}")

# Days until New Year
new_year = date(today.year + 1, 1, 1)
print(f"Days to New Year: {(new_year - today).days}")

📝 Exercises

Try them yourself first, then open the solution to compare.

Exercise 1

Word Frequency Analyzer

Goal: Read a text file and produce a frequency report using Counter.

Requirements

Read the file with pathlib
Tokenize into words (lowercase, strip punctuation)
Print the top 10 words with their counts and an ASCII bar chart


SOLUTION

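from pathlib import Path
from collections import Counter
import re

# Read and lowercase the text; keep only runs of letters (drops punctuation and digits)
text = Path("sample.txt").read_text(encoding="utf-8", errors="ignore").lower()
words = re.findall(r"[a-z]+", text)
top = Counter(words).most_common(10)

if not top:
    print("No words found.")
else:
    # Scale each bar so the most frequent word gets 20 blocks
    max_c = top[0][1]
    for word, count in top:
        bar = "█" * (count * 20 // max_c)
        print(f"{word:<15} {count:5d} {bar}")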

Example code / lecture materials

All lecture materials and example code are openly available on GitHub.
