🛠
Practice Projects · ★★★★
Markov chains · Text processing · Probability
Project 15 — Markov Chain Chatbot
Build a text generator chatbot using Markov chains. Train it on any text corpus and generate statistically plausible responses.
Markov chain · NLP · text generation · probability · dict
Duration: ⏱ 3 hours
Level: 📊 Advanced Applied
Prerequisite: 🎯 Intermediate Weeks 2–4
OUTCOME
A Markov chain text generator trained on a text corpus
What you'll learn
1. Build an n-gram frequency model from text
2. Generate text by sampling from the model
3. Handle sentence boundaries naturally
4. Train on multiple corpora and blend responses
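Goals 3 and 4 go beyond the basic model. One possible approach is sketched here, with hypothetical names (`START`, `END`, `split_sentences`, `train`, `generate_sentence`, `weight`) that are not part of the project's reference code. It assumes each sentence is padded with sentinel tokens so that generation starts and stops at sentence boundaries, and that a corpus can be given more influence in a blended model by repeating its transitions `weight` times.

```python
import random
import re
from collections import defaultdict

# Hypothetical sentinel tokens marking sentence start and end.
START, END = "<s>", "</s>"


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def train(model, text, n=2, weight=1):
    """Add one corpus to the model; weight > 1 repeats its transitions."""
    for sentence in split_sentences(text):
        tokens = [START] * n + sentence.split() + [END]
        for i in range(len(tokens) - n):
            key = tuple(tokens[i:i + n])
            model[key].extend([tokens[i + n]] * weight)


def generate_sentence(model, n=2, max_words=30):
    """Walk from the start state until END is drawn or max_words is reached.

    Assumes the model was trained on at least one sentence with the same n.
    """
    state = (START,) * n
    words = []
    while len(words) < max_words:
        nxt = random.choice(model[state])
        if nxt == END:
            break
        words.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(words)
```

For example, calling `train(model, novel_text, weight=3)` and then `train(model, chat_log_text)` on the same `defaultdict(list)` would make the first corpus count three times as much wherever both corpora share a state. Both variable names are placeholders.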
Algorithm
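A Markov chain model predicts the next word based only on the previous N words, so each state is a tuple of N words. For N=2: given (word1, word2), draw word3 at random from the words that followed that pair in the training data. Continuations that occurred more often are chosen more often, but the most likely word is not always picked.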
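To make the sampling step concrete, here is a small sketch on a toy corpus (the sentence, `counts`, and `sample_next` are illustrative assumptions, not part of the project materials). It counts how often each word follows each pair of words and then draws a successor with probability proportional to that count.

```python
import random
from collections import Counter, defaultdict

# Toy corpus chosen for illustration only.
corpus = "the cat sat on the mat and the cat sat on the sofa"
words = corpus.split()

# For N=2, count how often each word follows each pair of words.
counts = defaultdict(Counter)
for w1, w2, w3 in zip(words, words[1:], words[2:]):
    counts[(w1, w2)][w3] += 1


def sample_next(state):
    """Draw a successor of `state` with probability proportional to its count."""
    successors = counts[state]
    return random.choices(list(successors), weights=list(successors.values()))[0]


print(dict(counts[("on", "the")]))   # {'mat': 1, 'sofa': 1}
print(dict(counts[("the", "cat")]))  # {'sat': 2}
```

Storing every observed successor in a plain list, duplicates included, and calling `random.choice` on it gives the same distribution without explicit counts.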
```python
from collections import defaultdict
import random


class MarkovChain:
    def __init__(self, n=2):
        self.n = n  # number of preceding words that form a state
        self.model = defaultdict(list)  # state tuple -> observed next words

    def train(self, text: str):
        words = text.split()
        for i in range(len(words) - self.n):
            key = tuple(words[i:i + self.n])
            # Duplicates are kept, so random.choice samples by frequency.
            self.model[key].append(words[i + self.n])

    def generate(self, length=50, seed=None):
        if seed is None:
            seed = random.choice(list(self.model.keys()))
        output = list(seed)
        for _ in range(length - self.n):
            key = tuple(output[-self.n:])
            nexts = self.model.get(key)
            if not nexts:  # dead end: this state was never followed by a word
                break
            output.append(random.choice(nexts))
        return ' '.join(output)
```
📝 Exercises
Try them yourself first, then open the solution to compare.
Exercise 1
Build the Markov chatbot
Goal: Train a Markov model on a text file and create a chatbot loop.
Requirements
- Build bigram model from corpus
- Generate 50-word responses
- Interactive chat loop
- Load corpus from file
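One way the four requirements could fit together is sketched below. This is not the course's solution. The function names (`build_model`, `respond`, `chat`), the `quit` command, and seeding the reply from a word in the user's prompt are all assumptions made for illustration. The `read` and `write` parameters exist only so the loop can be exercised without a terminal.

```python
import random
from collections import defaultdict
from pathlib import Path


def build_model(text, n=2):
    """Map each n-word state to the list of words observed after it."""
    model = defaultdict(list)
    words = text.split()
    for i in range(len(words) - n):
        model[tuple(words[i:i + n])].append(words[i + n])
    return model


def respond(model, prompt, n=2, length=50):
    """Generate up to `length` words, seeded from the prompt when possible."""
    if not model:
        return ""  # corpus was too short to produce any state
    # Assumption: prefer a state whose first word appears in the prompt,
    # otherwise fall back to a random state.
    prompt_words = set(prompt.split())
    candidates = [k for k in model if k[0] in prompt_words]
    output = list(random.choice(candidates or list(model)))
    while len(output) < length:
        nexts = model.get(tuple(output[-n:]))
        if not nexts:  # dead end: no word ever followed this state
            break
        output.append(random.choice(nexts))
    return " ".join(output)


def chat(corpus_path, read=input, write=print):
    """Load the corpus from a file and answer until the user types 'quit'."""
    model = build_model(Path(corpus_path).read_text(encoding="utf-8"))
    while True:
        prompt = read("You: ")
        if prompt.strip().lower() == "quit":
            break
        write("Bot: " + respond(model, prompt))
```

Running `chat("corpus.txt")` would start the loop, where `corpus.txt` is a placeholder for whatever text file you train on.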
Example code / lecture materials
All lecture materials and example code are openly available on GitHub.