Making Fantasy Football Projections Via A Monte Carlo Simulation

In this post, we are going to use historic data from the nflgame package in Python to make projections on total points for a fantasy football team using a Monte Carlo simulation. We’ll also discuss a statistical technique to shrink the standard deviation of our projection. As opposed to typical fantasy football projections, our simulation focuses on projecting the score of the team accurately rather than the players, which could give you an edge for selecting your roster.

In our simulation, a fantasy team can be thought of as a sum of random variables, one for each member of a team. An example might be: \(X_{team} = X_{qb} + X_{wr1} + X_{rb1} + \dots\)

Each player level random variable outputs a fantasy score, which is sampled from historic games. However, the historic games we consider only come from 2013-2015 to keep the scores relevant. We also will give different sampling weights to different years using a heuristic. This method will not work for first year players.

Important Point: We are choosing to simulate in this fashion. 
We could extend this approach to simulate the games themselves 
and estimate each player's outcome in each game. 

By defining our fantasy team this way, we can see directly how to use a Monte Carlo simulation to project the score of our team. For each of $N$ experiments, we will sample a score for each player and sum the scores to get a team score. It is important that we are doing team-level projections, and here’s why: the team is a sum of random variables. The Central Limit Theorem implies that such sums are approximately normally distributed, with the approximation improving as more variables are summed together. We can check that most team projections from this method are in fact approximately normal, even though a team has far fewer members than the rule-of-thumb sample size of 30.
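
As a rough check of the Central Limit Theorem claim, here is a minimal sketch that compares the skewness of one player's scores with the skewness of a seven-player sum. It does not use nflgame data: the exponential score distribution, the 12-point average and the `skewness` helper are all assumptions made up for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def skewness(xs):
    """Sample skewness: the mean cubed z-score (0 for a symmetric distribution)."""
    mu = statistics.mean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mu) / sd) ** 3 for x in xs) / len(xs)

N_EXPERIMENTS = 20000
TEAM_SIZE = 7        # same size as the example team in this post
MEAN_POINTS = 12.0   # hypothetical average fantasy score per player

# One skewed "player": exponential scores (an assumption, not real data)
player_scores = [random.expovariate(1 / MEAN_POINTS)
                 for _ in range(N_EXPERIMENTS)]

# One "team": the sum of TEAM_SIZE independent skewed players
team_scores = [
    sum(random.expovariate(1 / MEAN_POINTS) for _ in range(TEAM_SIZE))
    for _ in range(N_EXPERIMENTS)
]

player_skew = skewness(player_scores)
team_skew = skewness(team_scores)
print('Single player skewness: %.2f' % player_skew)
print('Seven-player team skewness: %.2f' % team_skew)
```

Under these assumptions the theoretical skewness drops from 2 for a single player to \(2 / \sqrt{7}\), about 0.76, for the team. The sum is noticeably more symmetric but still somewhat right-skewed, which is why "approximately normal" is the right level of confidence. Real players are also not fully independent (a quarterback and his receiver, for example), which this sketch ignores.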

Because the team variable is approximately normal, the sample variance estimator is (approximately) independent of the sample mean estimator. This gives us more confidence that a bias in our projections won’t show up in both the estimate and its standard deviation.

The next important point is that we will use the expectation of the sample mean as our projection, not the random variable itself. Why? The expectation of the sample mean is the same as the expectation of the original random variable, but the sample mean has a smaller standard deviation. Viewing the problem this way gives us a tighter confidence interval on the projection. The standard error in this case is estimated by \(S = s / \sqrt{ n }\) where \(s\) is the sample standard deviation and \(n\) is the team size. Luckily, the expectation of the sample mean equals the expectation of the random variable, so we can estimate it by simply taking the average of the team scores from our experiments.
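
To make the formula concrete, here is a small sketch that computes the projection, \(s\) and \(S\) from a handful of team totals. The five totals are made-up numbers, not simulation output, and `team_size` follows this post's convention of using the team size for \(n\).

```python
import math
import statistics

# Hypothetical team totals from five experiments (made-up numbers)
team_totals = [112.4, 131.0, 98.7, 125.3, 119.6]
team_size = 7  # n in the formula above, following this post's convention

projection = statistics.mean(team_totals)  # estimate of the expected team score
s = statistics.stdev(team_totals)          # sample standard deviation
S = s / math.sqrt(team_size)               # S = s / sqrt(n)

print('Projection: %.2f' % projection)
print('s = %.2f, S = %.2f' % (s, S))
```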

Selecting The Team

The first thing we have to do is get the Player objects from nflgame for our team. I wrote a simple function to grab the objects and print them to verify that I have the correct members:

import nflgame

team = ['Drew Brees',
        'Antonio Brown',
        'Allen Robinson',
        'Adrian Peterson',
        'Doug Martin',
        'Gary Barnidge',
        'Keenan Allen']

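def make_team(team):
    """Look up each name in nflgame and keep the matching skill-position players."""
    positions = {'QB', 'WR', 'TE', 'RB'}
    tm = []
    for p in team:
        for plr in nflgame.find(p):
            # Skip other positions and players with an empty status
            if plr.position not in positions or plr.status == '':
                continue
            tm.append(plr)
    return tm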

def validate_team(team):
    for t in team:
        print(t.full_name, t.team)

tm = make_team(team)
validate_team(tm)
Drew Brees NO
Antonio Brown PIT
Allen Robinson JAC
Adrian Peterson MIN
Doug Martin TB
Gary Barnidge CLE
Keenan Allen SD

Scoring nflgame’s output

In order to write a simulation, we’re going to need fantasy points for each player. The nflgame package does not provide fantasy points, so I wrote a simple scoring function below.

Essentially, nflgame’s methods can return a list of players, each with a “_stats” attribute that contains game-level stats, for example “rushing_yds”, which is self-explanatory. I collected the important stats into a dict that maps each stat name to a scoring function (given by a lambda) which takes the game stat as input and outputs the fantasy score.

From this point, the scoring function becomes very simple as you can see below. The scoring is roughly the same as DraftKings.

scoring = {
    'passing_yds' : lambda x : x*.04 +
                        (3. if x >= 300 else 0),
    'passing_tds' : lambda x : x*4., 
    'passing_ints' : lambda x : -1.*x,
    'rushing_yds' : lambda x : x*.1 + (3 if x >= 100 else 0),
    'rushing_tds' : lambda x : x*6.,
    'kickret_tds' : lambda x : x*6.,
    'receiving_tds' : lambda x : x*6.,
    'receiving_yds' : lambda x : x*.1,
    'receiving_rec' : lambda x : x,
    'fumbles_lost' : lambda x : -1*x,
    'passing_twoptm'  : lambda x : 2*x,
    'rushing_twoptm' : lambda x : 2*x,
    'receiving_twoptm' : lambda x : 2*x
}

def score_player(player):
    score = 0
    for stat in player._stats:
        if stat in scoring:
            score += scoring[stat](getattr(player,stat))    
    return score
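
As a quick sanity check of the scoring rules, here is a minimal sketch that scores one made-up quarterback stat line without touching nflgame. `FakePlayerStats` is a hypothetical stand-in that only mimics the two things `score_player` relies on (a `_stats` mapping and one attribute per stat), and the dict is a trimmed copy of the passing rules above.

```python
# Trimmed copy of the scoring rules from this post (passing stats only)
scoring = {
    'passing_yds': lambda x: x * .04 + (3. if x >= 300 else 0),
    'passing_tds': lambda x: x * 4.,
    'passing_ints': lambda x: -1. * x,
}

class FakePlayerStats(object):
    """Hypothetical stand-in for an nflgame player-stats object."""
    def __init__(self, **stats):
        self._stats = stats               # mimics the `_stats` attribute
        for name, value in stats.items():
            setattr(self, name, value)    # each stat is also an attribute

def score_player(player):
    score = 0
    for stat in player._stats:
        if stat in scoring:               # stats without a rule are ignored
            score += scoring[stat](getattr(player, stat))
    return score

# Made-up stat line: 320 yards, 2 TDs, 1 interception, plus an unscored stat
qb_game = FakePlayerStats(passing_yds=320, passing_tds=2, passing_ints=1,
                          passing_att=41)
points = score_player(qb_game)
print('Fantasy points: %.1f' % points)
```

The arithmetic is 12.8 for the yards, 3 for the 300-yard bonus, 8 for the two touchdowns and -1 for the interception, for a total of 22.8. The `passing_att` stat has no entry in the dict, so it contributes nothing.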

Simulating the score for a single player

Now that we can score the output from nflgame, we need to select a game for a player and score it. We’re going to implement this below. It will take games from 2013-2015 and a random week, and then it will score the game. If for any reason the player has no stats, it will try again (as you can see by the return on the last line of the function).

import numpy as np
def get_score_for_player(player):
    
    # Sample the year and week
    year = np.random.choice([2013,2014,2015],
                            p=[.2,.3,.5])
    week = np.random.randint(1,18)
    
    # Find the player and score them for the given week/year   
    for p in get_games(year,week):
        if p.player is None:
            continue
        if player == p.player:
            return score_player(p)
        
    return get_score_for_player(player) # Retry due to bye weeks / failure for any other reason
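
One consequence of retrying through recursion is that a player with no games at all in 2013-2015 (a first-year player, for instance) would retry until Python hits its recursion limit. Below is a hedged sketch of a bounded alternative, not the code used in this post. It is written against the standard library rather than nflgame: `sample_score`, `lookup` and `max_attempts` are hypothetical names, and `lookup` stands in for "find this player's score for a given year and week, or return None".

```python
import random

def sample_score(lookup, max_attempts=100, rng=random):
    """Draw (year, week) pairs until `lookup` returns a score.

    `lookup` is a hypothetical callable: (year, week) -> score or None.
    Raises LookupError instead of retrying forever.
    """
    for _ in range(max_attempts):
        # Same year weights as in the post: recent seasons are sampled more often
        year = rng.choices([2013, 2014, 2015], weights=[.2, .3, .5])[0]
        week = rng.randint(1, 17)  # inclusive on both ends, unlike np.random.randint
        score = lookup(year, week)
        if score is not None:
            return score
    raise LookupError('no scored game found in %d attempts' % max_attempts)

# Made-up data: a player whose only recorded games are 2015, weeks 1-8
def fake_lookup(year, week):
    return 10.0 + week if (year == 2015 and week <= 8) else None

# A first-year player: there is never any history to sample from
def rookie_lookup(year, week):
    return None

random.seed(1)
score = sample_score(fake_lookup)
print('Sampled score: %.1f' % score)
```

Calling `sample_score(rookie_lookup)` raises `LookupError` after 100 attempts, which makes the "no history" case explicit instead of failing deep inside a recursion.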

Defining the get_games function and using the LRU Cache decorator for performance

The get_games function is a wrapper around nflgame, defined below. It can be costly to call because nflgame stores data in zipped files on disk (if it is not pinging the NFL servers).

I use the lru_cache decorator to set up a cache for the games returned, so the code won’t have to ask nflgame for data it has already accessed. This is a simple way to deal more efficiently with a library whose function or data access calls may be costly.

The inputs of the function must be hashable for this to work. Under the covers, lru_cache keeps a dict that maps inputs to outputs. If you call the function with the same inputs as a previous call that is still in the cache, it returns the stored output without actually calling the function.

from functools import lru_cache
@lru_cache(200) # Define a cache with 200 empty slots
def get_games(year,week):
    g = nflgame.games(year,week=week)
    return nflgame.combine_game_stats(g)
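
To see the cache at work without nflgame, here is a small self-contained sketch. `slow_lookup` is a hypothetical stand-in for an expensive call, and the `calls` list exists only to record how often the function body actually runs.

```python
from functools import lru_cache

calls = []  # records every time the function body actually runs

@lru_cache(maxsize=200)
def slow_lookup(year, week):
    """Hypothetical stand-in for an expensive nflgame call."""
    calls.append((year, week))
    return 'stats for %d week %d' % (year, week)

slow_lookup(2015, 3)  # miss: the body runs
slow_lookup(2015, 3)  # hit: served from the cache
slow_lookup(2014, 3)  # miss: different arguments

info = slow_lookup.cache_info()
print(len(calls), info.hits, info.misses)
```

This prints `2 1 2`: three calls were made, but the body ran only twice. The `cache_info()` method is a handy way to confirm that a simulation is actually reusing cached weeks.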

Simulation and Results

Now, we’ll look at the final simulation function. This function will create a pandas data frame of all the player scores for each experiment. This way, if you would like to swap out players for any reason you’ll have access to their data. The simulation is straightforward from our building blocks defined above:

import pandas as pd
def simulate(team, exps=10):
    scores = pd.DataFrame(data=np.zeros((exps,len(team))),
                          columns = [p.name for p in team])
    for n in range(exps):
        for player in team:
            scores.loc[n,player.name] += get_score_for_player(player)
    return scores


outcome = simulate(tm, exps=100)
outcome.head()
   Drew Brees  Antonio Brown  Allen Robinson  Adrian Peterson  Doug Martin  Gary Barnidge  Keenan Allen
0       13.10           31.0            16.3              6.4          4.1           19.4          23.5
1       14.50           12.5            14.8             23.2          2.6           19.5          35.1
2       24.20           36.7            16.3             25.3         12.8           22.5          29.7
3       14.36           40.6             3.7              6.7         10.7            1.9          23.5
4       19.68           35.8            16.0             29.0          8.8           11.9          17.2

Projecting Fantasy Points

We can calculate some projections easily from our simulation data. Again, we are defining our simulator’s projection to be the expected value of the sample mean of the team random variable. Therefore, the standard deviation of our projection is the standard error of the team random variable.

game_points = outcome.sum(axis=1, skipna=True) # Sum the player scores together

print('Team projection: %s' % game_points.mean())
print('Standard Deviations: %s' % (game_points.std()/np.sqrt(len(outcome.columns))))
Team projection: 119.5094
Standard Deviations: 8.61532541363
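
Since the team score is treated as approximately normal, the two numbers above can be turned into a rough interval. The sketch below is an illustration rather than part of the original simulation: it reuses the printed values and assumes the usual normal multiplier of 1.96 for an approximate 95% band.

```python
# Numbers printed by the simulation run in this post
projection = 119.5094
std_error = 8.61532541363

# Rough ~95% band, assuming the projection is approximately normal
z = 1.96
low = projection - z * std_error
high = projection + z * std_error
print('Approximate 95%% interval: %.1f to %.1f' % (low, high))
```

With these inputs the band runs from roughly 102.6 to 136.4 points.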

As mentioned, you may also look at player level stats.

outcome.mean() # Point projections for each player
Drew Brees         23.1486
Antonio Brown      23.2828
Allen Robinson     17.1940
Adrian Peterson    17.1430
Doug Martin        13.0310
Gary Barnidge      12.1940
Keenan Allen       13.5160
dtype: float64

outcome.std()  # Standard deviation in point projections for each player
Drew Brees          8.768458
Antonio Brown      11.413678
Allen Robinson      8.935615
Adrian Peterson     8.293532
Doug Martin         9.705395
Gary Barnidge       7.577823
Keenan Allen        8.965317
dtype: float64

What’s Next?

This is a very basic simulator, and it could be extended in a variety of ways to improve the projections. However, it’s a good baseline for creating projections for your own team, which is usually not the focus of fantasy football websites, as they prefer to make projections at the player level.

Some Caveats

Scoring

In this tutorial, I use PPR scoring and do not include special teams. This could certainly be extended to handle those cases.

nflgame version

The nflgame package is written for Python 2.7. I typically prefer 3.4+, and so I cloned a repo that had futurized the package to Python 3.5. You can find this in the pull requests of the main repo on GitHub.