Aidan Coco

Nov 3, 2020

Daily Fantasy and Data Science: a Match Made in Degenerate Heaven

Disclaimer: If you struggle with gambling addiction, get the help you need. I just play some lineups here or there for a few bucks. Doing a little data analysis does NOT make it not gambling.


I was planning to write this about making a web scraper to get data on player prices and stats from daily fantasy sites so you could easily query certain lineups. Then I went on FanDuel and realized they straight up let you download a CSV of all the players with this data in one click. Oh man, no wonder I always lose at these things: they literally created this degenerate gambling activity disguised as fantasy football specifically for data scientists. I am sorry, I cannot show you how to win, but I will show you why you should not even bother playing DFS without some basic pandas or Python skills. As my skills grow, I would like to build my own model to solve this complex problem on this blog, so if you follow along maybe you will eventually learn how to rake in the cash.

Why Use Analytics at All?

This is what lighting my money on fire looks like

Humans are not very good at predicting things. Humans plus analytics are a little better. So by accepting your limitations as an emotional being, you can get an edge. Sports might be one of the ultimate examples of this. We watch sports to connect with the stories. We have idealized visions of the ultimate football player and how good they look in shorts. Nick Foles is an American hero and I refuse to accept he is almost the sole reason an elite Bears roster is starting to show its flaws. Daily fantasy is all about choosing between people we love to watch play a sport we love, so no NFL fan can truly make an unbiased selection. I’m not saying watching film or speculating on news events has no place in figuring out what’s going to happen, but scrolling through FanDuel and clicking on people is just not going to cut it. Furthermore, a lot of what some data science skills will do for your lineup is just arithmetic. Even if you only use this blog to generate a pile of different possible lineups and pick some that you like, you’re still better off, since it will open you up to possibilities you never would have noticed.

How Much Data Science Do You Need to Know?

I know Medium attracts data scientists the way a light attracts flies, but if anybody is here just to learn how to optimize a DFS lineup, this section covers the few skills you will need to acquire. I used Python and a Python package called pandas for everything in this blog. That’s probably going to sound like Greek to you, and if it does, I encourage you to look at taking some free courses on the basics of Python. Flatiron’s free bootcamp prep is what got me hooked. If you know what pandas is, I think you should be able to follow along with the rest of this blog, even if that’s just copying and pasting.

No Webscraping Required

In my old click-and-pray days, I never thought much of this innocuous little button, but it’s why I would say some basic data science is almost required to play DFS. Just click the download players list button and you will get a CSV with all the data you need. I am sure the pros have bots that do this for them. How could you ever hope to compete?

Pandas Time

import pandas as pd
df = pd.read_csv("FanDuel-NFL-2020-11-01-51232-players-list.csv")

First step: import pandas as pd, then run pd.read_csv on wherever you saved that CSV we just got. If you don’t know what any of that means, just run the same code as above and make sure the CSV is in the same folder as the notebook you are using. You will have to update the date in the file name for future weeks.
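If updating the file name by hand every week gets old, here is a minimal sketch of one way around it. The helper name latest_players_list and the glob pattern are my own assumptions, based only on the one file name shown in this post, so check them against what FanDuel actually gives you.

```python
from pathlib import Path


def latest_players_list(folder="."):
    """Return the most recently modified FanDuel players-list CSV in folder."""
    # Assumed pattern, guessed from the single file name used in this post.
    matches = list(Path(folder).glob("FanDuel-NFL-*-players-list.csv"))
    if not matches:
        raise FileNotFoundError("no players-list CSV found in " + str(folder))
    # Pick the newest file by modification time.
    return max(matches, key=lambda p: p.stat().st_mtime)
```

You could then call pd.read_csv(latest_players_list()) instead of typing the dated file name.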

df = df[df.columns[0:10]]

You can drop about half the columns, which are not relevant for today and are mostly NaNs, and get your DataFrame down to this:

With some basic feature engineering, we can create a column for fantasy points per game (FPPG) divided by salary and see who has been the best return on investment this season.
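Here is a rough sketch of what that value column could look like. The players and numbers are made up, the column names fppg and salary are assumed to match the lowercase names used in the rest of this post, and scaling salary to thousands is just my choice to make the numbers readable.

```python
import pandas as pd

# Toy stand-in for the FanDuel players list; names and numbers are made up.
players = pd.DataFrame({
    "nickname": ["Player A", "Player B", "Player C"],
    "fppg": [24.0, 12.0, 11.25],
    "salary": [8000, 6000, 4500],
})

# Fantasy points per game for every $1,000 of salary.
players["value"] = players.fppg / (players.salary / 1000)

# Best return on investment first.
best_value = players.sort_values("value", ascending=False)
print(best_value)
```

A cheap player with a decent FPPG can beat a star on this measure, which is exactly the kind of thing that is hard to see by scrolling.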

df_playable = df[(df.fppg > 1) & (df.played > 5) & (df.played < 8)]

I narrowed things down a little further to people who had played more than five games and had more than one FPPG. You can filter however you like, but I strongly recommend that you keep the df.played < 8 filter, since FanDuel includes last year’s stats for players who have not played this year. I also made each position into its own DataFrame since this will make it easier for us to build a lineup.

df_RB = df_playable[df_playable.position == 'RB']
df_WR = df_playable[df_playable.position == 'WR']
df_QB = df_playable[df_playable.position == 'QB']
df_TE = df_playable[df_playable.position == 'TE']
df_FLEX = df_playable[df_playable.position.isin(['RB', 'WR', 'TE'])]
df_D = df_playable[df_playable.position == 'D']

This is Where Things Get a Little Gross

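lineup = []
for x in range(10000):
    RB = df_RB.sample(2)
    WR = df_WR.sample(3)
    FLEX = df_FLEX.sample(1)
    QB = df_QB.sample(1)
    DEF = df_D.sample(1)
    TE = df_TE.sample(1)
    # DataFrame.append was removed in pandas 2.0, so combine with pd.concat
    df1 = pd.concat([RB, WR, FLEX, QB, DEF, TE])
    # skip lineups where the FLEX pick repeats a player already chosen
    if df1.index.is_unique:
        lineup.append(df1)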

I needed a way to generate a big batch of possible lineups, and this is what I came up with. It’s slow and gross, but it works. It goes through and draws roughly 10,000 random lineups.
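To get a feel for why random sampling is a reasonable shortcut, here is a quick back-of-the-envelope count of how many ways the roster slots could be filled. The pool sizes are hypothetical numbers I picked for illustration, not counts from the actual CSV.

```python
import math

# Hypothetical pool sizes after filtering; not taken from the real data.
n_rb, n_wr, n_qb, n_te, n_def = 40, 60, 25, 30, 25

# Choose 2 RBs, 3 WRs, 1 QB, 1 TE and 1 defense.
base = math.comb(n_rb, 2) * math.comb(n_wr, 3) * n_qb * n_te * n_def

# The FLEX can be any RB, WR or TE not already picked (2 + 3 + 1 are used).
flex_choices = n_rb + n_wr + n_te - 6

# This counts slot assignments, so the same nine players with a different
# FLEX count as separate lineups. It is a rough size, not an exact count.
total = base * flex_choices
print(f"{total:,} ways to fill the slots")
```

With pools anywhere near this size, 10,000 draws only scratch the surface, so the best lineup in the sample is not necessarily the best lineup overall.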

for x in lineup:
    x['salary_total'] = x.salary.sum()

Here’s some more gross for loops, sorry pandas aficionados. This code goes through all the lineups we made and totals up each one’s salary. From there you filter out the ones that are above the salary cap and sort the rest by FPPG. After this you can do whatever you want, and you can access the lineups by calling them from the list. You can get the one with the most FPPG, look through the top ten and analyze them, or run further queries on this list. Let’s see how the best lineup this quick code returned did last week:
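For anyone copying along, here is a minimal sketch of the filter-and-sort step. The function name rank_lineups is mine, the 60,000 cap is what I believe FanDuel uses for full-roster NFL contests (treat it as an assumption and check your contest), and the toy lineups are made up.

```python
import pandas as pd

SALARY_CAP = 60000  # assumed FanDuel NFL cap; check the contest rules


def rank_lineups(lineups, cap=SALARY_CAP):
    """Keep lineups at or under the cap, best total FPPG first."""
    legal = [lu for lu in lineups if lu.salary.sum() <= cap]
    return sorted(legal, key=lambda lu: lu.fppg.sum(), reverse=True)


# Toy two-player "lineups" just to show the behavior; numbers are made up.
cheap = pd.DataFrame({"salary": [20000, 25000], "fppg": [15.0, 18.0]})
better = pd.DataFrame({"salary": [30000, 28000], "fppg": [22.0, 19.0]})
too_expensive = pd.DataFrame({"salary": [35000, 30000], "fppg": [30.0, 25.0]})

ranked = rank_lineups([cheap, better, too_expensive])
best = ranked[0]
print(best)
```

With the real data you would pass in the list of random lineups, then look at the first entry or the first ten.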

  • QB: Russell Wilson, 28 pts
  • RB: Rex Burkhead, 3 pts
  • RB: Alvin Kamara, 20.8 pts
  • WR: JuJu, 13 pts
  • WR: Zach Pascal, 7 pts
  • WR: Tyler Lockett, 7 pts
  • FLEX: Jimmy Graham, 3 pts
  • TE: Travis Kelce, 27 pts
  • D: LA Chargers, 2 pts

Wow, those are some god-awful picks. The idea of flexing Jimmy Graham makes me physically ill, but it still scored 112 points, which outperformed my galaxy-brain 49ers stack and would have been enough to cash in a double up I played in this week.

Future Steps

There’s a ton wrong with this simple attempt at creating an optimal lineup, but I think it shows how much easier data science can make it to build a lineup, especially if you want to enter a bunch into a contest instead of a double up. Adding other data is my main step forward, such as standard deviation, expert projections and opponent data. After that, the end goal is to use some data to validate picks and look for trends, working towards a model. Hopefully I can address those in a future blog. It’s pretty complicated because you would need to run the model for each player to predict their points, then use a method like the one in this blog to optimize based on salary. This was just to show some basic pandas and how easy it is to apply this stuff to DFS.