# Making a Markov Chain Twitter Bot in Python

The study of Markov Chains is an interesting topic that has many applications. Such techniques can be used to model the progression of diseases, the weather, or even board games. We are going to introduce and motivate the concept mathematically, and then build a “Markov bot” for Twitter in Python. The resulting bot is available on GitHub.

## Introduction to Markov Chains

A Markov Chain is used to model events whose outcome only depend on the current state. That is to say, what the system does next only depends on where it currently is. In many ways, a Markov Chain can describe how I play most strategic games… Anyways, if our state at time $t$ is represented by $X_t$, then this is the same as writing \begin{equation} P(X_t = x | \text{Where we are + where we’ve been}) = P(X_t = x | \text{Where we are now}). \end{equation}

This can of course be written more mathematically, but it gets the point across. When you have this situation, you can arrange these probabilities into a matrix. To construct such a matrix, the rows and columns will be all the possible states, and the $state_i, state_j$ entry is the probability of moving from state $i$ to state $j$.

Let’s pretend our mood is governed by a Markov Chain each day. If we are happy, then there’s a 75% change for us to be happy the next day, other wise we’re sad. If we’re sad, there’s a 10% change we’ll become happy the next day, and 90% change we’ll stay the same. Putting this into a matrix we have

If we start happy, we can represent that as the row vector $s_0 = [1, 0]$. If we want to know how we’ll feel tomorrow, we can find the probability distribution like this:

so we have a 75% change to still be happy, and a 25% otherwise. After another day…

[ 0.5875  0.4125]


And in 100 days…

[ 0.28571429  0.71428571]


And a thousand days…

[ 0.28571429  0.71428571]


Wait, the distribution is the same? We’ve stumbled onto what is called a limiting distribution. For special transition matrices, these limiting distributions exist, which mean no matter what state you start at, you’ll end up at the same distribution eventually. In this case, the interpretation is this person will be sad approximately 3/4 of the time.

Where does this distribution come from? It’s the eigenvector of $P^T$ associated with the eigenvalue $1$ scaled to sum to 1!

[ 0.28571429  0.71428571]


Limiting distributions are related to stationary distributions. It turns out that as long as your transition matrix P satisfies a few qualities, it’s largest eigenvalue will equal 1, the (left) eigenvector associated with 1 will have entries of the same sign, and all other eigenvalues will be strictly smaller than 1 in absolute value. The mathematical underpinning of this guarantee is a special case of the Perron-Frobenius Theorem. We call this left eigenvector the stationary distribution. Of course, it’s clear that for some left eigenvector $\pi$ associated with $1$, that $\pi P = \pi$, but how does a stationary distribution $\pi$ (new $\pi$) come about? The stationary distribution $\pi$ is a vector such that for an arbitrary vector $v$, \begin{equation} \lim_{n\to\infty} v P^n = \pi. \end{equation}

Now we will look at an example of this behavior with right eigenvalues to accustom ourselves to idea of limiting distributions. Let $M$ be a 2-by-2 matrix which has eigenvalues $1, \epsilon$ with $% $ and eigenvectors $e_1, e_2$ which span $\mathbb{R}^2$. This means any vector $x$ in $R^2$ can be written $x = c_1 e_1 + c_2 e_2$. Another way to put it is the eigenvectors form a basis for the space. This isn’t always true, but assume we have a matrix where this is true.

Then, look what happens when we start multiplying $x$ by our matrix $M$:

\begin{equation} M x = M( c_1 e_1 + c_2 e_2) = c_1 M e_1 + c_2 M e_2 = c_1 e_1 + c_2 \epsilon e_2 \end{equation} which follows since $e_1$ and $e_2$ are eigenvectors. Notice, we can keep doing this for the relation: \begin{equation} M^n x = M^n( c_1 e_1 + c_2 e_2) = c_1 M^n e_1 + c_2 M^n e_2 = c_1 (1)^n e_1 + c_2 (\epsilon)^n e_2 \end{equation}

As $n\to \infty$, the term $c_2 (\epsilon)^n e_2 \to 0$ and the other term will be the only term left. So, \begin{equation} \lim_{n\to \infty} M^n x = \pi \end{equation} for all $x\in \mathbb{R}^2$. This gives us an idea of how limiting distributions can come about in the wild, but this is not the whole story!

It’s interesting to note that if a transition matrix $P$ has a stationary distribution, it may not necessarily be a limiting distribution. However, if we have an idealized case like our matrix $M$ , then the stationary distribution is the limiting distribution.

## Implicitly Building a Transition Matrix From Text

With the theory out of the way, let’s talk about how you can actually build such a matrix (implicitly). We’re going to use a phrase from “Smells Like Teen Spirit” from Nirvana to do an example.

Hello, hello, hello, how low?


In the case of text Markov Chain, a state is what word you’re currently at and the transition matrix is determined by what comes after that word in the whole “corpus” you are using (in this case, this line of the song).

Think about what the transition matrix of this sentence would look like when you ignore punctuation and capitalization. If you are at the word “hello”, there is 66% chance you’ll end up back the word “hello” and a 34% chance you’ll end up at “how”. If you are at “how”, you have a 100% chance to get to “low”. For text, it’s usually easy to actually represent this as a dictionary in Python to avoid calculating everything:

{'low': ['hello'], 'hello': ['hello', 'hello', 'how'], 'how': ['low']}


Now, this isn’t quite a matrix, but we can use it in the same way.

## Sampling from the Chain

Let’s say we want to “sample” a sentence from corpus that is typical of the underlying distribution of the corpus (via it’s transition matrix). We first choose a starting state (word) and then we keep applying the transition matrix (via the dictionary) until we have our final sentence. We are really using a Markov Chain Monte Carlo technique in hopes to sample from the limiting distribution. So, if your corpus really has a limiting distribution, you would end up with realization of the chain, or a path, which samples each word from the limiting distribution, if it exists.

For this experiment we will use the full song without repeating any parts:

a dirty word

Hello, hello, hello, hello, hello, how low?
Hello, hello, how low?
Hello, hello, hello, hello, how low?
Hello, hello, hello!

With the lights out, it's hard to lose and self assured
Oh no, I taste
Oh yeah, I feel blessed
Our little group has always will until the lights out, it's hard to find
Oh well,


Normally, the end of these chains are fun to read in a nonsensical way. With a larger corpus, you begin to get better combinations.

just why I taste
Oh yeah, I guess it hard, it's hard to find
Oh well, whatever, never mind

It's fun to pretend
She's over bored and self assured
Oh no, I feel blessed
Our little group has always will until the end

And I guess it hard, it's less dangerous
Here we are now, entertain us
A muchacho
An albino
A mosquito
My libido
Yeah, hey, yay

I'm worse at what I taste
Oh yeah, I feel blessed
Our little group has always been
And always been
And always will until the end

And I guess it makes me smile
I found it makes me smile
I found it hard, it's less dangerous
Here


Leaving in the new lines makes it more humorous because you get two words that “go” together in the song. We can actually expand our state space to be pairs of words rather than single words to get better results. But who’s going to see our results? The answer… the world:

# Assembling the Bot

Now, we have explored how to use a Markov Chain to generate gibberish (essentially), and now we’re going to unleash that gibberish on the world with an automated twitter bot. In a brief post, we piece together the components necessary to have a bot. The resulting bot is available on GitHub.

To build a bot, there’s really two steps:

1. Sample from a Markov Chain

The bot will allow you to input a series of text files as a corpora and use all of them to create your tweets. Having multiple documents complicates things a little when building the transition matrix, but it’s not too bad.

## The Basics of the Bot

First, I have a class called Bot. Bot not only stores the transition dictionary but it also can generate tweets and post to twitter via the TwitterAPI. This class’s init gives you all of the options and creates some basic instance level variables that will get filled out later.

The main loop of the bot is the following function. In an eternal loop, it samples a tweet, posts a tweet, and waits– before starting all over again for all eternity.

## Cleaning Up the Output

In our last post, we would sample text but did nothing to clean it up to make it more readable. There’s a lot that can be done to clean it up and make it more sensical. First, you can leave in the punctuation to make it flow better on the page. Another trick for a larger corpus is to make your state two words rather than just a single word. This will make the output make more sense because the words will follow a more natural pattern.

### Transition Matrix with Two States

In this next snippet of code, we have a corpora stored in self._documents. We follow the advice of the previous paragraph to create a transition matrix (dictionary) with two states. The code is slightly more complicated due to the special case of having one word in one corpus and one word in another, but it’s not so bad.

### Generating the Text in a Nicer Way

This part is more of an art than a science. We apply a few heuristics to make the text flow better. As before we have a burn in section to get ourselves further into the state space. Then we make sure our tweet begins at the first word of a sentence and add proper capitalization.