Intermediate
datetime · pathlib · collections · itertools · random

Week 4 — Standard Library

Python's standard library comes with batteries included. This week covers its most useful modules: datetime for dates and times, pathlib for files and paths, collections for specialized data structures, itertools for efficient looping, and random for generating random data.

datetime · pathlib · collections · itertools · stdlib
Duration: 2.5 hours
Level: 📊 Intermediate
Prerequisite: 🎯 Week 3
Outcome: Build a file report tool using pathlib and collections.Counter

What you'll learn

  1. Parse and format dates with datetime
  2. Navigate the filesystem with pathlib.Path
  3. Use Counter, defaultdict, and deque from collections
  4. Apply itertools.chain, groupby, combinations, permutations
  5. Generate random data with the random module

1. datetime

python
from datetime import datetime, timedelta, date

now = datetime.now()
print(now.strftime("%Y-%m-%d %H:%M:%S"))

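bday = date(1990, 6, 15)
today = date.today()
# Dividing days by 365 drifts because of leap years, so compare calendar fields:
# subtract 1 if this year's birthday has not happened yet.
age = today.year - bday.year - ((today.month, today.day) < (bday.month, bday.day))
print(f"Age: {age} years")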

next_week = now + timedelta(days=7)
print(next_week.date())
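
The block above only formats dates. Parsing goes the other way, with datetime.strptime. The following is a minimal sketch: the input string raw and the chosen formats are illustrative assumptions, not part of the lesson's files.

```python
from datetime import datetime

# Hypothetical input, e.g. a timestamp read from a log line.
raw = "2024-03-09 14:30"

# strptime: string -> datetime, using the same format codes as strftime.
parsed = datetime.strptime(raw, "%Y-%m-%d %H:%M")
print(parsed.year, parsed.hour)   # 2024 14

# strftime: datetime -> string in another layout.
pretty = parsed.strftime("%d/%m/%Y")
print(pretty)                     # 09/03/2024

# ISO 8601 strings round-trip without a format string.
iso = parsed.isoformat()
round_trip = datetime.fromisoformat(iso)
print(iso)                        # 2024-03-09T14:30:00
```

If the string does not match the format, strptime raises ValueError, so wrap it in try/except when the input is untrusted.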

2. pathlib

python
from pathlib import Path

p = Path(".")
for f in p.glob("*.py"):
    print(f.name, f.stat().st_size, "bytes")

data = Path("data.txt")
data.write_text("Hello, pathlib!")
print(data.read_text())
data.unlink()  # delete

3. collections

python
from collections import Counter, defaultdict, deque

# Counter: count occurrences
words = "the cat sat on the mat".split()
print(Counter(words).most_common(3))

# defaultdict: no KeyError on missing key
dd = defaultdict(list)
dd["fruits"].append("apple")

# deque: efficient append/pop at both ends
q = deque([1, 2, 3])
q.appendleft(0)
q.append(4)
print(list(q))   # [0, 1, 2, 3, 4]

4. itertools

python
from itertools import chain, combinations, permutations, groupby

print(list(chain([1,2], [3,4], [5])))
print(list(combinations("ABC", 2)))
print(list(permutations([1,2,3], 2)))

data = [("a",1),("a",2),("b",3),("b",4)]
for key, group in groupby(data, key=lambda x: x[0]):
    print(key, list(group))
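
The random module from the learning goals follows the same import-and-call pattern. This is a minimal sketch rather than lesson code: the seed 42 and the sample values are arbitrary assumptions, and a seeded random.Random instance is used so that repeated runs give the same results.

```python
import random

# A private, seeded generator keeps results repeatable between runs.
rng = random.Random(42)

roll = rng.randint(1, 6)                    # integer from 1 to 6, both ends included
pick = rng.choice(["red", "green", "blue"]) # one element from a sequence
hand = rng.sample(range(1, 53), 5)          # 5 distinct values, no repeats

deck = list(range(10))
rng.shuffle(deck)                           # shuffles the list in place

print(roll, pick, hand, deck)
```

The random module is meant for simulations and test data; for passwords or tokens, the standard library's secrets module is the appropriate tool.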

When to reach for what (and the gotchas)

These modules overlap with things you could hand-roll, but the standard library versions are well tested, usually faster, and clearer to read. The table below pairs each need with the module to reach for and the trap it hides.

| Need | Reach for | Common gotcha |
|---|---|---|
| Calendar dates / deadlines | datetime.date | Subtracting dates gives a timedelta: read .days, don't compare strings |
| Date + time of day | datetime.datetime | Naive vs timezone-aware: don't mix them in subtraction |
| Filesystem paths | pathlib.Path | Path objects, not strings: wrap with str(p) only at the boundary |
| Counting occurrences | collections.Counter | Ties keep insertion order, not alphabetical |
| Missing-key defaults | collections.defaultdict | Reading a missing key CREATES it |
| Fast both-end queue | collections.deque | Random indexing (q[5000]) is O(n), unlike a list |
| Grouping rows | itertools.groupby | Only groups CONSECUTIVE keys, so sort first |
💡

Rule of thumb: if you're about to write a loop that counts, buckets, or pairs things up, there's usually a one-liner in collections or itertools that does it faster and reads better.
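
To make the rule of thumb concrete, here is a small sketch of counting, bucketing, and pairing done with the standard tools. The names list is made-up sample data.

```python
from collections import Counter, defaultdict
from itertools import combinations

names = ["ann", "bob", "amy", "ben", "ann"]  # made-up sample data

# Counting: replaces a dict plus "if key in d ... else ..." loop.
counts = Counter(names)
print(counts["ann"])          # 2

# Bucketing: group names by first letter without checking for missing keys.
by_initial = defaultdict(list)
for name in names:
    by_initial[name[0]].append(name)
print(dict(by_initial))       # {'a': ['ann', 'amy', 'ann'], 'b': ['bob', 'ben']}

# Pairing: every unordered pair of distinct names, no nested loops.
pairs = list(combinations(sorted(set(names)), 2))
print(len(pairs), pairs[0])   # 6 ('amy', 'ann')
```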

Common Mistakes (FAQ)

Q. datetime or date — which should I use?

Use date for calendar days (birthdays, deadlines, 'days until'); use datetime when the time of day matters. Subtracting either gives a timedelta — read .days or .total_seconds(), never compare formatted strings.
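
A short sketch of reading a timedelta, plus the naive/aware trap from the table. The specific dates are arbitrary examples.

```python
from datetime import date, datetime, timezone

# date - date gives a timedelta; read whole days from .days.
start = date(2025, 2, 1)
deadline = date(2025, 3, 1)
gap = deadline - start
print(gap.days)                 # 28

# datetime - datetime also gives a timedelta; use total_seconds() for durations.
begin = datetime(2025, 2, 1, 9, 0)
end = datetime(2025, 2, 1, 17, 30)
hours = (end - begin).total_seconds() / 3600
print(hours)                    # 8.5

# Mixing a timezone-aware datetime with a naive one raises TypeError.
aware = datetime(2025, 2, 1, 9, 0, tzinfo=timezone.utc)
try:
    aware - begin
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                 # False
```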

Q. defaultdict created a key I only meant to read — why?

Indexing a missing key on a defaultdict runs the factory and INSERTS it. If you only want to check, use d.get(key) or `key in d` instead of d[key].
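
The difference is easy to see in a few lines. This is an illustrative sketch; the scores mapping and its keys are made up.

```python
from collections import defaultdict

scores = defaultdict(list)
scores["alice"].append(90)

# Safe checks: neither of these inserts anything.
has_bob = "bob" in scores        # False
bob_scores = scores.get("bob")   # None
keys_before = list(scores)
print(keys_before)               # ['alice']

# Indexing runs the factory (list) and stores the result under the key.
_ = scores["bob"]
keys_after = list(scores)
print(keys_after)                # ['alice', 'bob']
```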

Q. itertools.groupby returned weird/empty groups.

groupby only groups CONSECUTIVE items with the same key. Sort the data by that same key first (sorted(data, key=...)), otherwise identical keys scattered through the list become separate groups.
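
A sketch of the same rows grouped with and without sorting. The rows list is sample data, and operator.itemgetter is used here in place of a lambda for the key.

```python
from itertools import groupby
from operator import itemgetter

rows = [("a", 1), ("b", 3), ("a", 2), ("b", 4)]  # keys are NOT consecutive
first = itemgetter(0)

# Without sorting: each run of equal keys becomes its own group.
unsorted_groups = [(k, [v for _, v in g]) for k, g in groupby(rows, key=first)]
print(unsorted_groups)  # [('a', [1]), ('b', [3]), ('a', [2]), ('b', [4])]

# Sort by the same key first, then group.
ordered = sorted(rows, key=first)
sorted_groups = [(k, [v for _, v in g]) for k, g in groupby(ordered, key=first)]
print(sorted_groups)    # [('a', [1, 2]), ('b', [3, 4])]
```

Each group is consumed inside the comprehension on purpose: groups share the underlying iterator, so they should be read before moving on to the next key.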

Q. Counter.most_common() ties aren't alphabetical — is that a bug?

No. Equal counts keep their first-seen (insertion) order in Python 3.7+. If you need a deterministic tie-break, sort with a secondary key: sorted(c.items(), key=lambda kv: (-kv[1], kv[0])).
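
A quick sketch of both orderings, reusing the sentence from the collections section:

```python
from collections import Counter

c = Counter("the cat sat on the mat".split())

# Default: ties appear in first-seen order.
default_order = c.most_common(3)
print(default_order)  # [('the', 2), ('cat', 1), ('sat', 1)]

# Deterministic: highest count first, then alphabetical among ties.
alphabetical = sorted(c.items(), key=lambda kv: (-kv[1], kv[0]))[:3]
print(alphabetical)   # [('the', 2), ('cat', 1), ('mat', 1)]
```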

Q. pathlib or os.path?

Prefer pathlib — it's the modern, object-oriented API (p / 'sub' / 'file.txt', p.read_text(), p.glob(...)). Reach for os.path only when an older library hands you string paths.
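
A small sketch of the object-oriented style. The paths are hypothetical and nothing here touches the disk; it only builds and inspects Path objects.

```python
from pathlib import Path

base = Path("project")                        # hypothetical directory name
target = base / "data" / "report.final.csv"   # "/" joins path segments

print(target.name)     # report.final.csv
print(target.stem)     # report.final
print(target.suffix)   # .csv
print(target.parent == base / "data")  # True

# Derive a sibling path with a different extension.
renamed = target.with_suffix(".json")
print(renamed.name)    # report.final.json

# Convert to a string only when an older API insists on one.
as_text = str(target)
```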

💻 Examples

Run these examples and check the output yourself.

01_file_report.py: File type report using pathlib + Counter
CODE
from pathlib import Path
from collections import Counter

path = Path(".").resolve()
all_files = [f for f in path.rglob("*") if f.is_file()]
exts = Counter(f.suffix or "(no ext)" for f in all_files)
total_size = sum(f.stat().st_size for f in all_files)

print(f"Total files: {len(all_files)}")
print(f"Total size:  {total_size/1024:.1f} KB")
print("\nTop file types:")
for ext, cnt in exts.most_common(5):
    print(f"  {ext:<12} {cnt}")

02_date_calc.py: Date arithmetic with datetime
CODE
from datetime import date, timedelta

today = date.today()
print(f"Today:      {today}")
print(f"100 days later: {today + timedelta(days=100)}")

# Days until New Year
new_year = date(today.year + 1, 1, 1)
print(f"Days to New Year: {(new_year - today).days}")

📝 Exercises

Try them yourself first, then open the solution to compare.

Exercise 1

Word Frequency Analyzer

Goal: Read a text file and produce a frequency report using Counter.

Requirements
  • Read file with pathlib
  • Tokenize into words (lowercase, strip punctuation)
  • Print top-10 words with counts and a bar chart (ASCII)
SOLUTION
from pathlib import Path
from collections import Counter
import re

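# Read the whole file as lowercase text; undecodable bytes are skipped.
text = Path("sample.txt").read_text(encoding="utf-8", errors="ignore").lower()
# Keep runs of ASCII letters only, which drops punctuation and digits.
words = re.findall(r"[a-z]+", text)
top = Counter(words).most_common(10)
if not top:
    raise SystemExit("No words found in sample.txt")
max_c = top[0][1]  # highest count; the longest bar is 20 characters
for word, count in top:
    bar = "█" * (count * 20 // max_c)
    print(f"{word:<15} {count:5d} {bar}")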
Example code / lecture materials

All lecture materials and example code are openly available on GitHub.
