Practice Projects · ★★★★
Markov chains · Text processing · Probability

Project 15 — Markov Chain Chatbot

Build a text-generating chatbot with Markov chains. Train it on any text corpus, then have it generate statistically plausible responses.

Markov chain · NLP · text generation · probability · dict
Duration: 3 hours
Level: 📊 Advanced Applied
Prerequisite: 🎯 Intermediate Weeks 2–4
Outcome: A Markov chain text generator trained on a text corpus

What you'll learn

  • Build an n-gram frequency model from text
  • Generate text by sampling from the model
  • Handle sentence boundaries naturally
  • Train on multiple corpora and blend responses
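One hedged way to approach the last goal, blending corpora, is to keep one follower table per corpus and give each corpus a fixed weight. Both choices, and the helper names `build_model`, `blended_next` and `generate_blended`, are assumptions of this sketch rather than the project's own method. At each step it picks a corpus in proportion to its weight, among the corpora that have seen the current state, and then samples a follower from that corpus.

```python
from collections import defaultdict
import random


def build_model(text: str, n: int = 2) -> dict:
    """Map each n-word state to the list of words that followed it."""
    model = defaultdict(list)
    words = text.split()
    for i in range(len(words) - n):
        model[tuple(words[i:i + n])].append(words[i + n])
    return model


def blended_next(models, weights, state, rng=random):
    """Pick a corpus by weight among those that know `state`, then sample a follower."""
    # Only corpora that saw this state, with a positive weight, are eligible.
    candidates = [(m, w) for m, w in zip(models, weights) if state in m and w > 0]
    if not candidates:
        return None  # no eligible corpus has seen this state
    chosen = rng.choices([m for m, _ in candidates],
                         weights=[w for _, w in candidates], k=1)[0]
    return rng.choice(chosen[state])


def generate_blended(models, weights, seed, length=30, n=2, rng=random):
    """Generate up to `length` words; all models must share the same n."""
    output = list(seed)
    while len(output) < length:
        word = blended_next(models, weights, tuple(output[-n:]), rng)
        if word is None:
            break  # dead end in every corpus
        output.append(word)
    return " ".join(output)


corpus_a = build_model("to be or not to be that is the question")
corpus_b = build_model("to be a pirate is to be free")
print(generate_blended([corpus_a, corpus_b], [0.7, 0.3], ("to", "be")))
```

Raising a corpus's weight makes its phrasing dominate the output; a corpus with weight 0 is never sampled.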

Algorithm

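A Markov chain model predicts the next word from only the previous N words (an N-gram of context, called the chain's state). For N=2, given the pair (word1, word2), it samples word3 at random from the words that followed that pair in the training data. A follower seen twice as often is twice as likely to be picked. The model does not always choose the single most likely word, which would make the output deterministic and repetitive.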
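To make "sample from the training data" concrete, here is a small sketch that stores counts per state in a `Counter`. The names `build_table` and `sample_next` are hypothetical. Storing counts is an assumption of this example: the `MarkovChain` class on this page keeps duplicate followers in a list instead, which gives the same probabilities.

```python
from collections import Counter, defaultdict
import random


def build_table(text: str, n: int = 2) -> dict:
    """Map each n-word state to a Counter of the words that followed it."""
    words = text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        table[state][words[i + n]] += 1
    return table


def sample_next(table: dict, state: tuple, rng: random.Random) -> str:
    """Draw the next word with probability proportional to its count."""
    followers = table.get(state)
    if not followers:
        raise KeyError(f"state {state!r} never appeared in training")
    return rng.choices(list(followers), weights=list(followers.values()), k=1)[0]


corpus = "the cat sat . the cat ran . the cat sat down"
table = build_table(corpus)
print(table[("the", "cat")])  # Counter({'sat': 2, 'ran': 1})
print(sample_next(table, ("the", "cat"), random.Random()))  # "sat" about 2/3 of the time
```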

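```python
from collections import defaultdict
import random


class MarkovChain:
    """Word-level Markov chain whose state is the previous n words."""

    def __init__(self, n: int = 2):
        self.n = n
        # Maps each n-word tuple to every word that followed it in training.
        # Duplicates are kept on purpose: random.choice() over this list then
        # picks each follower in proportion to how often it was observed.
        self.model = defaultdict(list)

    def train(self, text: str) -> None:
        words = text.split()
        # Slide an n-word window over the text; the word after it is the follower.
        for i in range(len(words) - self.n):
            key = tuple(words[i:i + self.n])
            self.model[key].append(words[i + self.n])

    def generate(self, length: int = 50, seed=None) -> str:
        if not self.model:
            raise ValueError("call train() before generate()")
        if seed is None:
            # Start from a random state observed during training.
            seed = random.choice(list(self.model.keys()))
        elif isinstance(seed, str):
            seed = seed.split()  # accept "the cat" as well as ("the", "cat")
        output = list(seed)
        while len(output) < length:
            key = tuple(output[-self.n:])
            nexts = self.model.get(key)
            if not nexts:
                break  # dead end: this state never had a follower
            output.append(random.choice(nexts))
        return ' '.join(output)
```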
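The class above starts at an arbitrary state and stops mid-sentence. One hedged way to handle sentence boundaries is to record which states open a sentence and to stop once a word ends with terminal punctuation. The sketch assumes that any token ending in `.`, `!` or `?` closes a sentence, so abbreviations such as "Dr." are misread. The class name `SentenceMarkov` is hypothetical.

```python
from collections import defaultdict
import random

END = (".", "!", "?")  # assumption: these characters end a sentence


class SentenceMarkov:
    """Markov chain that starts at a sentence opening and stops at a sentence end."""

    def __init__(self, n: int = 2):
        self.n = n
        self.model = defaultdict(list)
        self.starts = []  # n-word states that opened a sentence in training

    def train(self, text: str) -> None:
        words = text.split()
        for i in range(len(words) - self.n):
            self.model[tuple(words[i:i + self.n])].append(words[i + self.n])
        # A sentence opens at the start of the text and after every end token.
        for i in range(len(words)):
            if i == 0 or words[i - 1].endswith(END):
                state = tuple(words[i:i + self.n])
                if len(state) == self.n:
                    self.starts.append(state)

    def generate(self, max_words: int = 50, rng=random) -> str:
        if not self.starts:
            raise ValueError("train() on text with at least one sentence first")
        output = list(rng.choice(self.starts))
        # Keep going until the last word closes a sentence (or we hit a limit).
        while len(output) < max_words and not output[-1].endswith(END):
            nexts = self.model.get(tuple(output[-self.n:]))
            if not nexts:
                break  # dead end: no follower was seen for this state
            output.append(rng.choice(nexts))
        return " ".join(output)


bot = SentenceMarkov(n=2)
bot.train("I like cats. I like dogs. You like cats.")
print(bot.generate())
```

`max_words` is still needed as a safety cap, because a corpus with very long sentences or cycles may not reach an end token quickly.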

📝 Exercises

Try them yourself first, then open the solution to compare.

Exercise 1

Build the Markov chatbot

Goal: Train a Markov model on a text file and create a chatbot loop.

Requirements
  • Build a bigram model (n=2) from the corpus
  • Generate 50-word responses
  • Run an interactive chat loop
  • Load the corpus from a file
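For comparison after attempting the exercise, here is one possible sketch. It is not the project's official solution. It assumes a UTF-8 text file passed on the command line; the script name `chatbot.py` and the helpers `build_model`, `respond` and `main` are hypothetical. As a design choice, it seeds each reply with the last two words of the user's message when the model has seen that pair, and otherwise falls back to a random state.

```python
import random
import sys
from collections import defaultdict
from pathlib import Path


def build_model(text: str, n: int = 2) -> dict:
    """Map each n-word state to the list of words that followed it."""
    model = defaultdict(list)
    words = text.split()
    for i in range(len(words) - n):
        model[tuple(words[i:i + n])].append(words[i + n])
    return model


def respond(model: dict, prompt: str, n: int = 2, length: int = 50, rng=random) -> str:
    """Seed from the prompt's last n words if the model knows them, else randomly."""
    tail = tuple(prompt.split()[-n:])
    state = tail if tail in model else rng.choice(list(model))
    output = list(state)
    while len(output) < length:
        nexts = model.get(tuple(output[-n:]))
        if not nexts:
            break  # dead end: stop early
        output.append(rng.choice(nexts))
    return " ".join(output)


def main(path: str) -> None:
    model = build_model(Path(path).read_text(encoding="utf-8"))
    if not model:
        sys.exit(f"{path} is too short to train on")
    print("Type a message (empty line to quit).")
    while True:
        try:
            prompt = input("you> ").strip()
        except EOFError:
            break
        if not prompt:
            break
        print("bot>", respond(model, prompt))


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print("usage: python chatbot.py CORPUS_FILE")
    else:
        main(sys.argv[1])
```

Because the reply begins with the user's last two words, it tends to read as a continuation of the message. Drop that seeding step if you prefer fully random replies.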
Example code / lecture materials

All lecture materials and example code are openly available on GitHub.
